The purpose of a robots.txt file, also known as the robots exclusion protocol, is to give webmasters control over which pages robots (commonly called crawlers or spiders) can crawl and index on their website. A typical robots.txt file, placed on your site’s server, should include your sitemap’s URL and any other directives you wish to set up.
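For example, a minimal robots.txt along these lines (the blocked folder and sitemap URL are purely illustrative) would cover both of those elements:

User-agent: *
Disallow: /admin/
Sitemap: http://www.domain.com/sitemap.xml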
When a robot wants to visit a page on your site, it first checks your robots.txt file (placed at www.domain.com/robots.txt, which is case sensitive: if you call it Robots.TXT it won’t work) and sees whether your robots.txt file contains the following exclusion:
User-agent: *
Disallow: /
The ‘User-agent: *’ tells the robot that this rule applies to all robots, not just Googlebot or other search engine crawlers.
The ‘Disallow: /’ tells the robot that it is not allowed to visit any pages on this domain. When creating your robots.txt file you should be careful about which directives you set, because if your robots.txt looks like the example above, your site will not be crawled by Google!
Note: Some robots will ignore your robots.txt file, as it is only a directive, and will still access pages on your site anyway. These are usually malicious bots that may harvest information from your site. Even if you add a section to your robots.txt file to exclude such a bot from crawling your site, it is likely to be ineffective, because these robots usually ignore the robots.txt file altogether. Blocking the bot’s IP address is another option, but as these spammers tend to use different IP addresses it can be a time-consuming process.
Robots.txt Options
You have a range of options when it comes to your robots.txt file and what you want it to contain; below are some examples that may help you create yours.
Case Sensitivity
Robots.txt directives are case sensitive, so if you disallow /logo-image.gif the directive will block http://www.domain.com/logo-image.gif, but http://www.domain.com/Logo-Image.gif would still be accessible to robots.
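If you wanted to block both capitalisations of that file, you would need a separate Disallow line for each (the file path is illustrative):

User-agent: *
Disallow: /logo-image.gif
Disallow: /Logo-Image.gif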
Allow all robots to crawl your entire site
User-agent: *
Disallow:
Block all robots (malicious bots and Googlebot alike) from your entire site
User-agent: *
Disallow: /
Block a specific robot from a specific folder/file on your site
User-agent: Examplebot
Disallow: /no-robots/
Note: You can only have one folder/file per “Disallow:” line; if there is more than one area you want to block, you need to add more Disallow lines.
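For example, to block the same robot from two separate folders (the bot name and folder names are illustrative), you would use:

User-agent: Examplebot
Disallow: /no-robots/
Disallow: /private-files/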
Allow one specific robot and block all other robots
User-agent: Googlebot
Disallow:
User-agent: *
Disallow: /

Exclude a specific robot

User-agent: SpamBot
Disallow: /
Declaring your sitemap in your robots.txt file
User-agent: *
Disallow:
Sitemap: http://www.domain.com/sitemap.xml
Note: The sitemap declaration must use an absolute URL, not a relative URL.
Block all robots from an entire folder apart from one file/image
User-agent: *
Disallow: /my-photographs
Allow: /my-photographs/logo.jpg
Meta Robots Tag
From an SEO perspective, if you want to prevent Google from indexing a specific page on your site and showing it in its search results pages, then it is best practice to use a meta robots tag to tell crawlers that they are allowed to access this page but must not show it in the SERPs. Your meta robots tag should look like this and be placed in the <head> section of your page:
<meta name="robots" content="noindex">
If you want to stop a crawler from indexing the content on your page and prevent it from following any of the links, your meta robots tag would look like this:
<meta name="robots" content="noindex, nofollow">
Meta Robots tag versus Robots.txt
Generally, if you need to deindex a page or directory from Google’s search results, we recommend that you use a “noindex” meta tag rather than a robots.txt directive, because with this method the page will be deindexed the next time your site is crawled, meaning you won’t have to submit a URL removal request. However, you can still use a robots.txt directive paired with a Webmaster Tools page removal to achieve this.
Using a meta robots tag also ensures that your link equity isn’t lost, when it is used with the ‘follow’ directive.
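For example, a tag along these lines removes the page from search results while still letting crawlers follow (and pass value through) its links:

<meta name="robots" content="noindex, follow">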
Robots.txt files are best for disallowing a whole section of a site, such as a category, whereas a meta tag is more efficient at disallowing single files and pages. You could choose to use both a meta robots tag and a robots.txt file, as neither has authority over the other, but “noindex” always takes precedence over “index” requests.
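For example, a robots.txt rule like this (the folder name is illustrative) keeps all compliant robots out of an entire category at once, while any individual page elsewhere on the site can still be handled with a noindex meta tag:

User-agent: *
Disallow: /category/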