As a fresher, there is a lot to learn in digital marketing: search engine optimization (SEO), paid search (PPC), email marketing, social media marketing, digital display advertising, web analytics and reporting, mobile marketing, and more. Alongside these components there is one small piece that is equally important to understand when learning digital marketing, and that is the robots.txt file.
The robots.txt file is one of the important things you need to look at for your website's search visibility.

It is a list of instructions for search engine bots, or web crawlers, indicating which areas of your website or which web pages you do not want search engines to crawl. A mistake here can lead to your website disappearing from the search results entirely!

The robots.txt file is used to communicate with search engines such as Google, Yahoo, and Bing.

This is done for a number of reasons, one of which is to prevent crawling of duplicate content or of web pages that don't benefit your visitors.

By using the robots.txt file you can instruct web crawlers to ignore certain areas of your website, whether that is specific files or anything else you don't want indexed by search engines.
The basic syntax for blocking all crawlers:

User-agent: *
Disallow: [URL not to be crawled]
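As a concrete sketch of this syntax (the /admin/ path is just an illustration, not a recommendation for any particular site), blocking every crawler from a private directory would look like this:

```
User-agent: *
Disallow: /admin/
```

Note that the path is relative to the site root, so this rule covers every URL that begins with /admin/.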
Here, "User-agent" specifies which search engine crawler the instructions are meant for.
If you want to give instructions to one specific search engine crawler, then instead of using '*' use the name of the bot you want to command. For example:

User-agent: Googlebot (this means: Google, follow these instructions)
After this, use

Allow: or Disallow:

to tell web crawlers which pages to crawl and which not to crawl.
Let's take an example. Suppose we give the following instructions to search engine crawlers:

User-agent: *
Allow: /
Disallow: /loginpage

In this example, "User-agent: *" means you are giving instructions to all web crawlers. The "Allow: /" instruction allows everything on your website to be crawled, except the login page of your site: to keep that page out of the index, we use the "Disallow: /loginpage" instruction.
One more important thing to keep in mind: you can also point web crawlers to your sitemap from the robots.txt file, which is good SEO practice for your website.
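As a sketch of how that looks (example.com and the sitemap path are placeholders for your own site), a Sitemap line is simply added to the file with the full URL:

```
User-agent: *
Disallow: /loginpage

Sitemap: https://example.com/sitemap.xml
```

The Sitemap directive is independent of any User-agent group, so it can appear anywhere in the file.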
Mistakes you need to avoid in the robots.txt file:
Make a mistake in your robots.txt file and you can lose your search visibility entirely. The file name is case-sensitive and must be exactly "robots.txt": make sure to type robots.txt and not Robots.txt or robot.txt.
Lastly, remember that this file is a guide: it is not 100% guaranteed that these instructions will always be followed by all web crawlers.
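If you want to check how crawlers that do respect the file will interpret your rules, Python's standard library ships a robots.txt parser. This is a minimal sketch using the login-page example from above; the example.com URLs are placeholders:

```python
# Check robots.txt rules with Python's built-in parser.
from urllib.robotparser import RobotFileParser

rules = RobotFileParser()
# Instead of fetching a live robots.txt, parse an in-memory example.
rules.parse("""
User-agent: *
Disallow: /loginpage
""".splitlines())

# can_fetch(user_agent, url) applies the parsed rules to a URL.
print(rules.can_fetch("*", "https://example.com/loginpage"))  # False (blocked)
print(rules.can_fetch("*", "https://example.com/blog"))       # True (allowed)
```

This is a handy way to test a draft robots.txt before uploading it, since a typo in a Disallow path is much cheaper to catch here than in your search rankings.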