HUGO

Custom Robots.txt and sitemap.xml Templates


Lol! That’s the last thing on my mind, currently.

You might want those terms and conditions pages to be indexed/searched, though.
I know I would, if I were to sign up for something.
I forget those policies and don’t always know what it is I’ve signed up for.
It’d be nice if they’d pop up before I used your results.

Actually, the links to these pages are displayed prominently on the Newsletter subscription page; the user is required to agree to the terms if she wishes to subscribe to the Newsletter.

Also there are links to these pages in the footer of every HTML page of a Hugo site I manage.

So the user can find them quite easily, if she wants to read them.

Keeping these pages out of search results is just a matter of taste.

That’s a perfect example of the use of the robots.txt file, and I appreciate that now I not only know its purpose, but also how and when to use it.

I can also choose not to use it, and if I do enable the setting, I’ll still need to add some configuration to get it working properly.
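
(For anyone skimming this thread, here is a minimal sketch of that setup; the robotsdisallow param name follows the tutorial being discussed, and the rest is illustrative, so adjust it for your own site.)

    # config.toml -- lets Hugo generate /robots.txt from a template
    enableRobotsTXT = true

    # layouts/robots.txt -- minimal custom template ("#" lines remain plain comments in the output)
    User-agent: *
    {{ range .Site.Pages }}{{ if .Params.robotsdisallow }}Disallow: {{ .RelPermalink }}
    {{ end }}{{ end }}Sitemap: {{ "sitemap.xml" | absURL }}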

Thank you for this great tutorial. When I wrote
{{ with .Params.robotsdisallow }}<meta name="robots" content="noindex">{{ end }}

in head.html, I get the meta tag on pages that have the robotsdisallow param set in the front matter:
<meta name="robots" content="noindex">

But all other pages need a different tag:
<meta name="robots" content="index, follow, archive">

@Joerg, extending my {{ with ...}}, I would do:

{{ with .Params.robotsdisallow }}<meta name="robots" content="noindex, nofollow, noarchive">{{ else }}<meta name="robots" content="index, follow, archive">{{ end }}

Just tested and it works. If you just need content="index, follow", then just leave the “else” off, because that is the default, as I understand it.

(updated the tutorial, thanks @Joerg :smile: )

Extended the tutorial a bit with a section on setting sitemap changefreq and priority in front matter. See above.
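
(For readers following along, roughly what that looks like; this is only a sketch with example values, using Hugo’s per-page sitemap table in front matter.)

    +++
    # page front matter (TOML) -- per-page sitemap overrides, example values only
    [sitemap]
      changefreq = "weekly"
      priority = 0.8
    +++

In a custom layouts/sitemap.xml those values surface as {{ .Sitemap.ChangeFreq }} and {{ .Sitemap.Priority }}.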

Just as a warning to the unfamiliar: this would still only work for well-behaved search crawlers. People can still download anything they want from the command line using wget and other free tools, or paid apps like Screaming Frog. To the previous point: if you want something truly hidden, you’ll need to hide it behind a login…

Yeah, all these things are “polite requests” only. :wink:

Extended the tutorial, updating the custom sitemap template to add a default “x-default” hreflang, as recommended by Google and Yoast.
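
(Roughly, the addition looks like the following; this is only a sketch, not the tutorial’s exact code. It assumes the <urlset> element declares xmlns:xhtml="http://www.w3.org/1999/xhtml", and here x-default simply points at the page’s own permalink.)

    <!-- inside the <url> entry of layouts/sitemap.xml -->
    {{ if .IsTranslated }}{{ range .Translations }}
    <xhtml:link rel="alternate" hreflang="{{ .Language.Lang }}" href="{{ .Permalink }}" />{{ end }}
    <xhtml:link rel="alternate" hreflang="x-default" href="{{ .Permalink }}" />{{ end }}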

Hi, Rick. I’ve gotten confused with my robots.txt. I’m trying to follow the article Custom Robots.txt and sitemap.xml Templates, but something is going wrong. I cannot find my robots.txt, and I have three layouts folders (take a look at the scan: layouts_scan). Is that OK? Every time, I get a 404 page instead.

No; there can be a layouts folder in the root of your project and in the theme, but it appears you have one in content as well. I have not tested whether having one in content has a negative effect, but it is not the usual way.

If you want specific help, please see Requesting Help and share your repo etc with the community in a new thread.

Thanks a lot @RickCogley. I followed your tutorial, but I have two issues:

  • I don’t understand the following part: “Assuming you set that param in your 404”. What am I supposed to put in my 404, and how?
  • When I visit http://localhost:1313/sitemap.xml, I have the following error:

    It’s listing anything that is coming from my blog page. Would you know why it happens and how I can fix it?
    Thanks

Hi - that was confusing. I rewrote it. It can be any page you want to exclude. Once you add the param, it should show up in the robots.txt.
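
(Concretely, a sketch using the param name from the tutorial; the title is just an example. The front matter of any page you want excluded gets:)

    +++
    title = "Terms and Conditions"   # example page
    robotsdisallow = true            # picked up by the custom robots.txt template
    +++

The range in layouts/robots.txt then emits a Disallow line for that page.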

Regarding the other question, could you make a new post please. From what you pasted it is hard to tell, and I want to ask if you could please have a look at Requesting Help and provide some more details, in the new post? Thanks.

@RickCogley
Loving your code, thanks.

I noticed an issue: Multilingual pages, like example.ru.md, do not get added to robots.txt.

I have enableRobotsTXT = "true" in the front matter of both files, example.md and example.ru.md.

I expect to get two entries in robots.txt:


/example/
/ru/example/

But I only get one entry: /example/

What do you think is the problem?

Hmm, I imagine it is something with the range statement. I don’t have time to set up a test, but can you try {{ .Permalink }} instead? Or maybe it needs absLangURL.
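
(In other words, something like this in the custom robots.txt template; just sketching the suggestion above, untested:)

    {{ range .Site.Pages }}{{ if .Params.robotsdisallow }}Disallow: {{ .Permalink }}
    {{ end }}{{ end }}

or pipe through absLangURL instead of using .RelPermalink directly.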

Thanks, will give it a try later.

Tried using {{ .Permalink }} instead of {{ .RelPermalink }}. Did not work. It just gave me an absolute URL output.

Then I tried using {{ .RelPermalink | relLangURL }} and absLangURL. Did not work either.

Did some more research but can not figure it out yet. Any ideas?

@RickCogley

Hi,

I have a sitemap.xml and robots.txt given by my client’s SEO person.

As I’m not sure how to connect the steps given by you to my files,

kindly help on the same

sitemap.xml

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
    <!-- Generated by Web-Site-Map.com -->
    <url>
        <loc>https://www.prabhatha.com/</loc>
        <changefreq>weekly</changefreq>
        <priority>1.00</priority>
    </url>
    <url>
        <loc>https://www.prabhatha.com/join-prabhat/register/</loc>
        <changefreq>weekly</changefreq>
        <priority>0.85</priority>
    </url>
    <url>
        <loc>https://www.prabhatha.com/about/about_prabhat/</loc>
        <changefreq>weekly</changefreq>
        <priority>0.85</priority>
    </url>
    <url>
        <loc>https://www.prabhatha.com/repertoire/home/</loc>
        <changefreq>weekly</changefreq>
        <priority>0.85</priority>
    </url>
    <url>
        <loc>https://www.prabhatha.com/training/home/</loc>
        <changefreq>weekly</changefreq>
        <priority>0.85</priority>
    </url>
    <url>
        <loc>https://www.prabhatha.com/costumes/home/</loc>
        <changefreq>weekly</changefreq>
        <priority>0.85</priority>
    </url>
    <url>
        <loc>https://www.prabhatha.com/gallery/home/</loc>
        <changefreq>weekly</changefreq>
        <priority>0.85</priority>
    </url>
</urlset>

and robots.txt

User-agent: *

Sitemap: https://www.prabhatha.com/sitemap.xml

@TenSketch Best would be to open a new topic/request, and share your repo etc. with what you tried, so someone can assist.