Google ignoring pages with alias

Hello, I am seeing some weird behavior in how Google treats the following page:

It is a fairly ordinary Hugo-generated blog post with one twist: it uses Hugo's aliases feature. Here is its front matter:

---
type: "post"
aliases:
- /2013/07/hidden-gems-of-xterm.html
date: "2013-07-17T00:00:00Z"
tags:
- linux
- fedora
title: Hidden gems of xterm
---

The alias is the post's path on my old blog, which was not powered by Hugo. Hugo generates the alias page from its stock layout (rendered locally):

<!DOCTYPE html>
<html lang="en-us">
  <head>
    <title>http://localhost:1313/posts/2013/hidden-gems-of-xterm/</title>
    <link rel="canonical" href="http://localhost:1313/posts/2013/hidden-gems-of-xterm/">
    <meta name="robots" content="noindex">
    <meta charset="utf-8">
    <meta http-equiv="refresh" content="0; url=http://localhost:1313/posts/2013/hidden-gems-of-xterm/">
  </head>
</html>
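To see exactly what a crawler receives, the three signals in the page above (the robots meta, the canonical link and the meta refresh target) can be pulled out programmatically. This is a minimal sketch using only the standard library's `html.parser`; `AliasPageInspector` and `inspect_alias_page` are hypothetical names, and the code only looks at those three tags, which is all a Hugo alias page contains.

```python
from html.parser import HTMLParser


class AliasPageInspector(HTMLParser):
    """Collect the crawler-relevant signals from a redirect (alias) page."""

    def __init__(self):
        super().__init__()
        self.robots = None       # content of <meta name="robots" ...>
        self.canonical = None    # href of <link rel="canonical" ...>
        self.refresh_url = None  # target of <meta http-equiv="refresh" ...>

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names; values may be None.
        a = {name: (value or "") for name, value in attrs}
        if tag == "meta" and a.get("name", "").lower() == "robots":
            self.robots = a.get("content", "")
        elif tag == "link" and a.get("rel", "").lower() == "canonical":
            self.canonical = a.get("href")
        elif tag == "meta" and a.get("http-equiv", "").lower() == "refresh":
            # The content attribute looks like "0; url=https://example.com/target/".
            _, _, rest = a.get("content", "").partition(";")
            rest = rest.strip()
            if rest.lower().startswith("url="):
                self.refresh_url = rest[len("url="):].strip()


def inspect_alias_page(html):
    """Return whether the page is noindex, plus its canonical and refresh targets."""
    parser = AliasPageInspector()
    parser.feed(html)
    parser.close()
    directives = [d.strip().lower() for d in (parser.robots or "").split(",")]
    return {
        "noindex": "noindex" in directives,
        "canonical": parser.canonical,
        "refresh_url": parser.refresh_url,
    }
```

Fed the stock alias page above, it reports `noindex` as set while both the canonical link and the refresh target point at the real post, which is the combination the rest of this thread is about.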

Everything works fine, except that Google does not index the original page. Google Search Console gives the reason as a noindex directive in the page metadata:

I know I should be asking Google rather than the Hugo community, but I am curious whether anyone has experienced this or can check whether the same happens on a different site.

I should also mention that my blog is hosted on GitHub and sits behind a Cloudflare cache.

Did you run ‘hugo’ and look at the generated page(s)?

Sorry, what?

Running the ‘hugo’ command instead of ‘hugo server’ writes the site to a ‘public’ folder you can explore. Clear that folder first so no stale files remain, then analyze the actual output from Hugo. This might solve your mystery.
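A sketch of that workflow, assuming Hugo's default output directory `public` and assuming alias pages are the only generated pages that contain a meta refresh: after `hugo` finishes, walk the output and list every redirect page that also carries a robots `noindex`. The helper name `find_noindex_aliases` is hypothetical, and the regular expressions are a rough heuristic suited to Hugo's small, machine-generated alias pages, not a general HTML parser.

```python
import re
from pathlib import Path

# Rough heuristics tuned to Hugo's tiny, machine-generated alias pages;
# this is not a general-purpose HTML parser.
REFRESH_RE = re.compile(r"""<meta\s[^>]*http-equiv\s*=\s*["']?refresh""", re.IGNORECASE)
NOINDEX_RE = re.compile(
    r"""<meta\s[^>]*name\s*=\s*["']?robots["']?[^>]*noindex""", re.IGNORECASE
)


def find_noindex_aliases(public_dir="public"):
    """List generated pages that redirect via meta refresh AND carry robots noindex."""
    root = Path(public_dir)
    hits = []
    for path in sorted(root.rglob("*.html")):
        text = path.read_text(encoding="utf-8", errors="replace")
        if REFRESH_RE.search(text) and NOINDEX_RE.search(text):
            # Report paths relative to the output folder, with forward slashes.
            hits.append(path.relative_to(root).as_posix())
    return hits
```

On a site built with the stock alias layout, `find_noindex_aliases("public")` would be expected to list entries such as `2013/07/hidden-gems-of-xterm.html`, while the real post under `posts/` should not appear.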

Everything is fine from Hugo's side: it generates a proper alias page, the redirect works, all is good.

It is just that Google indexes neither the alias (the client-side redirect page) NOR the actual page. I am curious whether others see the same; DuckDuckGo, for example, does index it correctly.

Google says the metadata marks it as ‘noindex’. I ran ‘curl’ against your old URL and this is the result:

<!doctype html><html lang=en-us><head><title>https://lukas.zapletalovi.com/posts/2013/hidden-gems-of-xterm/</title>
<link rel=canonical href=https://lukas.zapletalovi.com/posts/2013/hidden-gems-of-xterm/><meta name=robots content="noindex"><meta charset=utf-8><meta http-equiv=refresh content="0; url=https://lukas.zapletalovi.com/posts/2013/hidden-gems-of-xterm/"></head></html>

Your old URL serves a redirect with a noindex instruction. Your canonical page does not have a noindex instruction (if I am correct), so that page should be indexed.

Exactly, I do not understand…

I found another user reporting the same thing:

If this can be verified, it should have been logged as a bug over six years ago.

If you search Google for “hidden gems of xterm” right now, you will find zero links to my blog. I have just updated my blog with a custom alias template that omits that noindex and asked Google to recrawl the page. By tomorrow, my page should be back.

I will report back if it helps.

Edit: I can confirm that Google immediately picked up the mentioned page.
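The actual template used here is not shown in the thread. As a sketch, assuming the override lives in `layouts/alias.html` (the location Hugo's documentation gives for customizing the alias page) and that `.Permalink` is the only field it needs, it can simply be the stock layout with the robots line removed. The Python wrapper and `write_alias_override` are hypothetical conveniences that only write the template into a site directory; the HTML string is the part that matters, and `lang="en-us"` is hardcoded to match the output shown earlier.

```python
from pathlib import Path

# Hypothetical override: Hugo's stock alias page minus <meta name="robots" content="noindex">.
# Hugo passes the alias template .Permalink, the URL of the page being aliased to.
# lang is hardcoded to match the output shown earlier; adjust it for your site.
ALIAS_TEMPLATE = """<!DOCTYPE html>
<html lang="en-us">
  <head>
    <title>{{ .Permalink }}</title>
    <link rel="canonical" href="{{ .Permalink }}">
    <meta charset="utf-8">
    <meta http-equiv="refresh" content="0; url={{ .Permalink }}">
  </head>
</html>
"""


def write_alias_override(site_dir):
    """Write the override to <site_dir>/layouts/alias.html and return its path."""
    target = Path(site_dir) / "layouts" / "alias.html"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(ALIAS_TEMPLATE, encoding="utf-8")
    return target
```

After rebuilding with `hugo`, the generated redirect pages keep the canonical link and the refresh but no longer carry `noindex`.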

Are we sure that the alias file won’t be indexed as well? This isn’t just about how Google handles it; it’s about every crawler.

What exactly should not be indexed? There is literally no content on the alias page. In other words: what would I have to search for to make such a page appear in the results? The URL in the HTML title tag is the only content I can see.

I am fully aware that Google is not the only search engine, and it is not my search engine of choice, but I ran my blog for several years without knowing that Google users could not find any of my migrated content. I think it makes sense to optimize for Google first.

More information, though not terribly helpful…

Google’s documentation does not state what happens when you use meta refresh and noindex at the same time:

Wow, it's been a long time since I posted about this.

I am still of the opinion that the default template is wrong, even after all this time. I’ve been using my own alias template, without the noindex, with no problems whatsoever.

The key evidence is how Google Search Console treats the page with and without the noindex. There are now two of us saying exactly the same thing.

Yeah, I did the same research; unfortunately, this behavior has not been officially documented.

As Jonathan said, Google Search Console is extremely confusing, so this is something Google should probably look at. I would not hold my breath waiting for a fix, though; there is probably a reason why they do this.

There is absolutely a reason. If you noindex a page, Google ignores it. If they ignore it, how can they find the canonical to know to redirect it?

“how can they find the canonical to know to redirect it?”

I thought, perhaps naively, that the sitemap would provide this.

The sitemap, which Hugo generates by default?

A sitemap just shows them the page, which they are then told to ignore and not process.
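One way to see what the sitemap actually advertises is to read the `public/sitemap.xml` produced by a default Hugo build and list its `<loc>` entries. A minimal sketch with the standard library's `xml.etree.ElementTree`, assuming the standard sitemaps.org namespace; `sitemap_urls` is a hypothetical helper name and the example.com URLs in the check are made up.

```python
import xml.etree.ElementTree as ET

# Namespace used by sitemaps.org-style sitemaps, including Hugo's default one.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return every <loc> listed under <url> entries in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]
```

For example, `sitemap_urls(open("public/sitemap.xml", encoding="utf-8").read())`; comparing that list with the canonical and alias URLs of a post shows which of the two the sitemap actually points crawlers at.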