Intro: The docs have an example of how to block every page using robots.txt, but I wanted to make a robots.txt template that blocks only pages with a certain front matter param. As others have pointed out in the comments and elsewhere, robots.txt is not a sure thing. Web crawlers have to be set to ho…
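To make the selection logic concrete, here is a minimal sketch in Python rather than in Hugo's template language. It is not the template from this tutorial: the page list and the `robots_disallow` param name are hypothetical stand-ins for whatever front matter flag you choose.

```python
# Language-neutral sketch of the filtering idea; this is not Hugo template code.
# The page dictionaries and the "robots_disallow" param name are hypothetical.

def build_robots_txt(pages):
    """Return robots.txt text that disallows only the flagged pages."""
    lines = ["User-agent: *"]
    for page in pages:
        # Pages without the flag are left out, so crawlers may index them.
        if page.get("params", {}).get("robots_disallow"):
            lines.append(f"Disallow: {page['url']}")
    return "\n".join(lines) + "\n"


pages = [
    {"url": "/about/", "params": {}},
    {"url": "/thank-you/", "params": {"robots_disallow": True}},
]
robots_txt = build_robots_txt(pages)
print(robots_txt)
```

Only the flagged page ends up in a `Disallow` line; everything else stays crawlable by default.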

Very useful. Thanks @RickCogley.

For those who may not be familiar with robots.txt: please remember that it does not actually prevent access to information. It only discourages well-behaved search engines from indexing your data. There are lots of badly behaved search engines and even more bad actors who actively hunt…
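A small sketch of why this is only advisory, using Python's standard `urllib.robotparser`. The rules, the `ExampleBot` user agent, and the example.com URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules, as a site might publish them in robots.txt.
rules = """\
User-agent: *
Disallow: /thank-you/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# A well-behaved crawler asks before fetching a URL.
polite_may_fetch = parser.can_fetch("ExampleBot", "https://example.com/thank-you/")
print(polite_may_fetch)

# The check is voluntary: a client that never calls can_fetch() can still
# request the URL, and the web server will answer as usual.
```

Nothing in the file itself enforces the rule; anything that must stay private needs real access control on the server.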

Good point. Looks like you’d need to do something like: Allow taxonomy with limitations (support): No. But you can hide these pages so that search engines cannot index them. Replicate your taxonomies as a folder structure under /content/ and then in the _index.md add…

It is important to keep robots.txt and sitemap.xml in sync: if a page is blocked in robots.txt but is still listed in sitemap.xml, Google's webmaster tools report a warning.
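One way to catch that mismatch before Google does is to compare the two generated files. The following is a rough sketch using only the Python standard library; the function name `blocked_urls_in_sitemap` and the sample data are assumptions for illustration, and in practice you would read the files from your built site.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

# Namespace used by standard sitemap.xml files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def blocked_urls_in_sitemap(robots_txt, sitemap_xml, user_agent="*"):
    """Return sitemap URLs that robots.txt disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    root = ET.fromstring(sitemap_xml)
    urls = [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical sample data standing in for the generated files.
robots = "User-agent: *\nDisallow: /thank-you/\n"
sitemap = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/thank-you/</loc></url>
</urlset>"""

conflicts = blocked_urls_in_sitemap(robots, sitemap)
print(conflicts)
```

Any URL the function returns is listed in the sitemap while also being disallowed, which is the situation that triggers the warning.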

Thanks @TotallyInformation, @Mikhail, @alexandros, @bep for the various bits of info that went into it. Let me know if I’m missing something and I’ll edit the mini tutorial. Hopefully if this is good enough, I can get it into the proper docs, as I think this is a pretty universal need.

Please forgive me for this one, but I HAVE to ask this question. Why not refrain from putting the information on the website, entirely - if it’s not something you want indexed/searched/found? Is robots.txt used so that you can have a few Easter eggs on your site?

KalikaKay: Why not refrain from putting the information on the website, entirely - if it’s not something you want indexed/searched/found? Is robots.txt used so that you can have a few Easter eggs on your site? Well, in my case I am hiding a Thank You HTML page that is displayed once a…

Lol! That’s the last thing on my mind, currently. You might want those terms and conditions pages to be indexed/searched, though. I know I would, if I were to sign up for something. I forget those policies and don’t always know what it is I’ve signed up for. It’d be nice if they’…

KalikaKay: You might want those terms and conditions pages to be indexed/searched, though. I know I would, if I were to sign up for something. I forget those policies and don’t always know what it is I’ve signed up for. It’d be nice if they’d pop up before I used your results. Actu…

That’s a perfect example of the use of the robots.txt file, and I appreciate that now I not only know the purpose of it, I also know how and when to use it. I can also choose not to use it, and if I do enable the setting, I’ll still need to add some configurations to get it working properly. […

Custom Robots.txt and sitemap.xml Templates

tips & tricks

RickCogley May 8, 2018, 1:20pm 4

Good point. Looks like you’d need to do something like:

Related topics

Topic	Replies	Views	Activity
Do not index certain pages (support)	8	931	July 2, 2020
Custom robots.txt template not working (support, taxonomy)	5	1227	March 21, 2022
Where to put enableRobotsTXT (support)	7	1954	June 22, 2020
Multilingual site always get a noindex tag in root index.html (support)	4	2152	September 4, 2018
Crawling is blocked	1	743	February 2, 2020