Robots.txt

What is Robots.txt?

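Robots.txt was introduced in 1994 as the Robots Exclusion Protocol. Under this protocol, a web crawler first requests the Robots.txt file of a website and reads its contents before it crawls any page of that site. The rules in the file tell the bot which areas it may crawl. Compliance is voluntary: reputable search engine crawlers follow the rules, but nothing technically forces a bot to do so.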

So that the bots can find the file, it must be placed at the top level of the website's root directory, where it can be reached under the path /robots.txt. The file itself must be a plain text file, hence the .txt extension. Only one file with this name may exist in the directory.
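Because the file always sits at the top level, its address can be derived from any page URL of the same host. The following is a minimal sketch of that derivation; the function name robots_txt_url and the example.com address are assumptions made for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL for the host that serves page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host (including any port), replace the path,
    # and drop the query string and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical page URL, used only for illustration.
print(robots_txt_url("https://www.example.com/shop/shoes/index.html?page=2"))
# prints https://www.example.com/robots.txt
```

Note that each host (including each subdomain) has its own file, so the sketch deliberately keeps the host part unchanged.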

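The file name must be written in lowercase (robots.txt). Inside the file, the keywords such as user-agent or disallow are not case-sensitive, but the paths are: /Page1/ and /page1/ are two different paths to a bot. Paths must therefore be written exactly as they appear in the URL.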

Notice:

Even though the crawlers of Google, Bing and Yahoo adhere to the instructions in the Robots.txt, blocked areas can still end up in the index, for example when other websites link to them.

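To prevent a page from being indexed, the robots meta tag with the value noindex must be included in the HEAD section of that page. The page must then not be blocked in the Robots.txt, because the crawler has to read the page in order to see the tag. Furthermore, Robots.txt does not provide protection against unauthorized access.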
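Whether a page actually carries this tag can be checked with a few lines of code. The sketch below uses Python's standard html.parser module; the class name RobotsMetaFinder, the helper has_noindex and the sample page are assumptions for illustration, and for simplicity the sketch does not verify that the tag sits inside the HEAD section.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives of <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # The content attribute is a comma-separated list of directives.
            for directive in content.split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the HTML contains a robots meta tag with noindex."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives


# Hypothetical sample page, used only for illustration.
page = '<html><head><meta name="robots" content="noindex, follow"></head><body></body></html>'
print(has_noindex(page))  # prints True
```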

What instructions are in the Robots.txt?

This text file contains instructions that tell the bots which areas of a website they may crawl. With this text file, website developers can easily exclude the complete website, entire areas of it, unimportant subdirectories or individual files, such as images, from crawling.

What exactly do these instructions look like?

It should be repeated here that the bots read paths case-sensitively, so every path must be written exactly as it appears in the URL.

First of all the most important keywords or characters for the instructions:

user-agent

This keyword names the bot, or bots, to which the following rules apply.

disallow

Prevents the bots from crawling directories, files or pages.

allow

Explicitly permits the bots to crawl files, directories or pages.

sitemap

Shows the bots the path to the sitemap.

*

The asterisk is a so-called wildcard. After user-agent it indicates that all bots are meant by this statement; within a path it stands for any sequence of characters.

$

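The dollar sign is not a wildcard but marks the end of a URL: a path ending in $ only matches URLs that end exactly at that point. For example, /*.pdf$ matches all URLs that end in .pdf.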
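How the two special characters behave inside a path can be illustrated by translating a path pattern into a regular expression. This is a minimal sketch, assuming the matching behavior that the major search engines document for * and $; the function names pattern_to_regex and rule_matches are hypothetical, and real crawlers may differ in details.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regular expression."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape every literal part, then let each '*' match any characters.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if the rule pattern applies to the given URL path."""
    # re.match anchors at the start, so a pattern without '$' acts as a prefix.
    return pattern_to_regex(pattern).match(path) is not None


print(rule_matches("/*.pdf$", "/docs/file.pdf"))      # prints True
print(rule_matches("/*.pdf$", "/docs/file.pdf?x=1"))  # prints False
print(rule_matches("/page1/", "/page1/sub.html"))     # prints True
```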

Examples of statements in the Robots.txt file:

user-agent: googlebot
user-agent: bingbot
disallow: /image-directory/image.jpg
allow: /image-directory/image2.jpg

Here you instruct the bots of Google and Bing not to crawl the file image.jpg in the image directory, while explicitly permitting them to crawl the file image2.jpg in the same directory.

user-agent: *
disallow: /page1/

Here you instruct all crawlers to ignore the directory page1, including its subpages.
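Rules like the two examples above can be tested before they go live. The sketch below uses urllib.robotparser from the Python standard library; the directory name /image-directory/, the host www.example.com and the bot name anybot are assumptions for illustration. Python's parser applies the first matching rule and does not interpret * or $ inside paths, so its verdict may differ from that of a real search engine crawler for more complex files.

```python
from urllib.robotparser import RobotFileParser


def parse_rules(text: str) -> RobotFileParser:
    """Build a parser from robots.txt content given as a string."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


# First example: rules for the bots of Google and Bing only.
image_rules = parse_rules("""\
user-agent: googlebot
user-agent: bingbot
disallow: /image-directory/image.jpg
allow: /image-directory/image2.jpg
""")

# Second example: one rule for all bots.
page_rules = parse_rules("""\
user-agent: *
disallow: /page1/
""")

base = "https://www.example.com"  # hypothetical host
print(image_rules.can_fetch("googlebot", base + "/image-directory/image.jpg"))  # prints False
print(image_rules.can_fetch("bingbot", base + "/image-directory/image2.jpg"))   # prints True
print(page_rules.can_fetch("anybot", base + "/page1/subpage.html"))             # prints False
print(page_rules.can_fetch("anybot", base + "/page2/"))                         # prints True
```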

Robots.txt and search engine optimization

When it comes to search engine optimization, it is important to be careful with the Robots.txt. If you accidentally instruct the bots not to crawl the entire website, for example with disallow: / for all bots, this can lead to a loss in ranking.
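Such an accident can be caught with a small automated check. The following is a minimal sketch using urllib.robotparser from the Python standard library; the function name blocks_entire_site is hypothetical, and the check only tests whether the home page / may be crawled, which is what a blanket disallow: / prevents.

```python
from urllib.robotparser import RobotFileParser


def blocks_entire_site(robots_txt: str, user_agent: str = "*") -> bool:
    """Return True if the rules forbid the given bot from crawling '/'."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, "/")


print(blocks_entire_site("user-agent: *\ndisallow: /\n"))        # prints True
print(blocks_entire_site("user-agent: *\ndisallow: /page1/\n"))  # prints False
```

A check like this could run whenever the file is changed, so that a blanket block is noticed before the crawlers see it.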

Therefore, in case of unexplained ranking losses, it is always advisable to take a look at the Robots.txt file. As an SEO agency in Munich, we would be happy to help you with this.
