Disabling RSS, Robots, and Taxonomies

I’m migrating a website to a Hugo theme, and I’ve mostly finished the homepage. Before I continue, I want to disable a few things, such as the sitemap and the RSS XML files. This is the output when I build the website:

$ hugo
Building sites …
                | EN
+------------------+----+
Pages            | 18
Paginator pages  |  0
Non-page files   |  3
Static files     |  5
Processed images |  6
Aliases          |  4
Sitemaps         |  1
Cleaned          |  0

Total in 258 ms

The “public” directory looks like this:

public
│   404.html
│   index.html
│   index.xml
│   sitemap.xml
│
├───authors
│   ├───user1
│   │       avatar.jpg
│   │       avatar_huf02e5d50a52a53a70f7136bfb44f76bd_2065045_150x150_fill_q90_lanczos_center.jpg
│   │       avatar_huf02e5d50a52a53a70f7136bfb44f76bd_2065045_250x250_fill_q90_lanczos_center.jpg
│   │       index.html
│   │       index.xml
│   │
│   ├───user2
│   │       avatar.jpg
│   │       avatar_hu96c3b8f0b5d7436e6b2e1b90d656cca3_2970_150x150_fill_q90_lanczos_center.jpg
│   │       avatar_hu96c3b8f0b5d7436e6b2e1b90d656cca3_2970_250x250_fill_q90_lanczos_center.jpg
│   │       index.html
│   │       index.xml
│   │
│   └───user3
│           avatar.png
│           avatar_hu3f1bb60fbaddb8aa1ca857a1ea0283e4_26632_150x150_fill_lanczos_center_2.png
│           avatar_hu3f1bb60fbaddb8aa1ca857a1ea0283e4_26632_250x250_fill_lanczos_center_2.png
│           index.html
│           index.xml
│
├───css
│       academic.min.d5019aa0d4e10dd08df80508af89deae.css
│
├───img
│       icon-192.png
│       icon-32.png
│       icon-512.png
│
├───js
│   │   academic.min.dc856155b640fa1cd8bd8b7b068fe79c.js
│   │
│   └───vendor
│       └───reveal.js
│           └───plugin
│               └───notes
│                       notes.html
│                       notes.js
│
└───page
    └───1
            index.html

These are the contents of sitemap.xml:

<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
  xmlns:xhtml="http://www.w3.org/1999/xhtml">
  
  <url>
    <loc>/index.html</loc>
    <priority>0</priority>
  </url>
  
  <url>
    <loc>/index.html</loc>
    <priority>0</priority>
  </url>
  
  <url>
    <loc>/index.html</loc>
    <priority>0</priority>
  </url>
  
  <url>
    <loc>/index.html</loc>
    <priority>0</priority>
  </url>
  
  <url>
    <loc>/authors/user1/</loc>
  </url>
  
  <url>
    <loc>/authors/user2/</loc>
  </url>
  
  <url>
    <loc>/authors/user3/</loc>
  </url>
  
  <url>
    <loc>/</loc>
    <priority>0</priority>
  </url>
  
</urlset>

Why are there four entries for index.html?

In my “config/_default/config.toml” file, I have the following settings:

# disable the taxonomy generations
[taxonomies]
  tag = ""
  group = ""
  category = ""

disableKinds = ["taxonomy", "taxonomyTerm", "RSS", "sitemap", "robotsTXT", "404"]

With all of this, I’m confused as to why any of the XML documents are generated, as well as the 404.html page. My understanding is that the disableKinds setting should prevent Hugo from generating these things. Additionally, what is “page/1/index.html” doing there? It is practically empty apart from some metadata.

Unfortunately, these things might be caused by the theme I’m using (https://github.com/gcushen/hugo-academic), but I don’t know enough about the interactions between themes and the overriding “_default” files to understand what’s happening.

Hugo doesn’t delete your public folder by default, so files from earlier builds can linger there. I don’t know if that applies in your case, as I can’t see your code.

Start a new site and use a minimal theme, and test your configuration settings.

There is a --cleanDestinationDir flag for the hugo command:

hugo --cleanDestinationDir [more flags]

Its help text describes it as: remove files from destination not found in static directories

I use a rm -rf public command in my deploy script:

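# Remove macOS metadata files; the pattern is quoted so the shell does not expand it
find . -name '*.DS_Store' -type f -delete
# Start every build from an empty output directory
rm -rf public/
# Pass --nm (or nm) as the first argument to skip minification
case "$1" in
    --nm | nm) hugo --gc;;
    *) hugo --minify --gc;;
esac
git push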

The case switch just lets me choose whether the site should be minified. It can be left out, of course.

Hope it helps.

Thanks for the replies everyone. To clear up any confusion, I’ve been manually deleting the public folder between each build when I was trying to figure this out.

I’ll try with a minimal theme and see how that works.

When I first read your response about the --cleanDestinationDir flag, it made me think about some kind of post-processing script I could put into my pipeline until I figure out how to configure Hugo to do what I want.

If you shared your site, we’d figure it out immediately. You’d have at least a dozen :nerd:s replicating your issue.

Thanks for the reply. The repository is here: https://gitlab.com/princeton-infosec/www

I see the output your site generates, and I tried various disableKinds configs on a fresh site.

I don’t have answers, but my testing shows that without a theme those settings work correctly (and in your instance taxonomies are not being created, they work as expected).

What I’ve noticed is the 404 is created when Academic is chosen as the theme, which makes sense as it contains a 404.html layout, except disabling the 404 should build the site sans the 404 page.

Same deal with sitemap. I’m missing something, because I can’t figure out how Academic is overriding that, and I can’t reproduce it without Academic.

@zwbetz and @alexandros: either of you folks run into this?

@maiki Once I’m at my laptop I’ll give this a go

Okay, this one has bitten me before too :slightly_smiling_face:.

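So you need to move your disableKinds line (currently line 68 of your config) above all of your TOML tables. For example, try moving it to line 41. In TOML, any key written after a table header such as [taxonomies] belongs to that table, so where it sits now the setting is not read as a top-level one. Once it is moved, it should work as expected.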
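
As a rough sketch of that ordering, using only the settings quoted earlier in this thread (the real config.toml has more keys and tables, which are left out here, so the exact position is an assumption):

```toml
# Sketch only: the rest of the real config is omitted.

# Top-level keys go first, before any [table] header.
disableKinds = ["taxonomy", "taxonomyTerm", "RSS", "sitemap", "robotsTXT", "404"]

# Everything below this header belongs to the taxonomies table.
[taxonomies]
  tag = ""
  group = ""
  category = ""

# Had disableKinds been written down here instead, TOML would parse it as
# taxonomies.disableKinds rather than as a site-level setting.
```

A table only ends at the next table header or at the end of the file, so indentation and blank lines make no difference to which table a key lands in.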

That is some arcane configuration magic right there. It worked. One thing to note is that the correctly working “disableKinds = […“taxonomy”…]” entry prevented the “people” widget from working, so I removed that entry from the disableKinds setting.

Thanks everyone for all the help. I doubt I would have figured it out without you!