When I build and deploy my Hugo website, some XML files are generated and deployed along with it. Google Search Console shows me that these XML files can't be indexed properly (the error is 'Crawled - currently not indexed').
This isn't a big problem, but I think it means that something isn't configured properly.
Does it make sense for the XML files to be generated and deployed?
Is the right solution to disallow crawling of XML files in the robots.txt file?
Or maybe to add a noindex directive to those XML files?
What is and is not indexed by Google is strictly up to Google's algorithms to decide. You can generate XML files to tell Google about your links, but it is at the algorithm's discretion whether to index them or not.
Currently, for Google, content is what matters. Your website does not have much content yet. Keep writing and you will see that change.
But,
why do you have this in your robots.txt?
Disallow: /*.xml$
Treat robots.txt as a guidance file, not a mandate that bots must obey. Many crawlers simply ignore it if they find it unhelpful.
Why do you want to disallow XML files in robots.txt, or set noindex?
I don’t think that “generated XML files cause crawling errors in Google Search Console”.
There is nothing wrong with any of these files (index.xml or /en/sitemap.xml, for example). Just add them to Search Console, keep working on your content, and see if this changes over the next month or so. It has nothing to do with Hugo itself.
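For reference, index.xml is Hugo's built-in RSS feed and sitemap.xml is its built-in sitemap; both are generated by default, so their presence is expected. If you really did not want the RSS output, a minimal sketch of the relevant config (assuming TOML; the file may be hugo.toml or config.toml depending on your setup) would look like this:

```toml
# Minimal sketch: limit output formats so the RSS feeds (index.xml) are not built.
# The defaults are ["html", "rss"] for both home and section pages.
[outputs]
  home    = ["html"]
  section = ["html"]

# The sitemap is a separate built-in output; it could be turned off with
# disableKinds = ["sitemap"] at the top level, but keeping it helps Google
# discover your pages.
```

That said, keeping the defaults is perfectly fine; these files do not hurt anything.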
You are forbidding access to all XML files, including sitemap.xml. I think Google is decent about respecting things in robots.txt, so I suggest you change it.
If you really want to block access to some XML files, at least add an Allow statement for your sitemaps.
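Something along these lines (just a sketch; the Sitemap URL is a placeholder, and note that Google generally applies the most specific matching rule while other crawlers may handle wildcards differently):

```
User-agent: *
# Explicitly allow the sitemaps...
Allow: /sitemap.xml
Allow: /en/sitemap.xml
# ...while blocking every other URL ending in .xml
Disallow: /*.xml$

# Tell crawlers where the sitemap lives (placeholder domain)
Sitemap: https://example.com/sitemap.xml
```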
The reason I have Disallow: /*.xml$ in my robots.txt is that this is how I'm trying to 'fix' the issue. The crawling error is from before I added this to the robots.txt file.
The reason I raise this is that I'd expect things to work with the default configuration. The fact that I got the 'Crawled - currently not indexed' warning made me think that something isn't configured right.
I'm fine with reverting my change to the robots.txt file - I'm trying to understand what the best course of action would be here, so I can 'play nice' with Google.
Your Disallow rule will not fix anything, as there is nothing to fix in the first place. It will only cause more issues.
Things work (from Hugo's perspective) and there is nothing that needs fixing.
'Crawled - currently not indexed' is strictly at the discretion of Google's algorithms. If they find your links useful and your content desirable, they will be indexed. Indexing takes time as well.
Everything on your site (apart from the robots.txt thing) is configured right.
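If you do revert, one clean option (a sketch, assuming you set enableRobotsTXT = true in your Hugo config) is a template at layouts/robots.txt that allows everything and simply points crawlers at the sitemap:

```
User-agent: *

Sitemap: {{ "sitemap.xml" | absURL }}
```

Hugo builds the Sitemap line from your baseURL, so the generated robots.txt stays correct if the domain ever changes.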