Get URLs of published pages with "build": {"list":"never"} parameter

Hi,

I have some normal content pages and some pages with the "build": {"list":"never"} parameter. After building the site, I need to get ALL URLs of ALL pages, including the ones with "build": {"list":"never"}.

I've tried all variations of methods such as ".Site.AllPages" and "site.Pages", and I tried getting the URLs with the "hugo list all" command after the build. The "debug" command option does not return a list of URLs either.
Pages with "build": {"list":"never"} always seem to be excluded from the results.

Is there a built-in way to get a list of all URLs that were built and published, including the ones with the "build": {"list":"never"} parameter?

Thanks

In fact that’s the meaning of never. :wink:

You could use a ScratchPad to store needed information when the page is rendered and retrieve that later.

Here's some code based on the theme skeleton generated by hugo new theme (requires Hugo v0.139.0 or later).

single.html

The second-to-last line stores the current page in the site Store.

{{ define "main" }}
   <h1>{{ .Title }}</h1>

   {{ $dateMachine := .Date | time.Format "2006-01-02T15:04:05-07:00" }}
   {{ $dateHuman := .Date | time.Format ":date_long" }}
   <time datetime="{{ $dateMachine }}">{{ $dateHuman }}</time>

   {{ .Content }}
   {{ partial "terms.html" (dict "taxonomy" "tags" "page" .) }}
   {{- site.Store.Add "AllSinglePages" (slice $) -}}
{{ end }}

home.html

The second range (inside the <ul> element) retrieves the stored pages and prints their permalinks.

{{ define "main" }}
   {{ .Content }}
   {{ range site.RegularPages }}
      <h2><a href="{{ .RelPermalink }}">{{ .LinkTitle }}</a></h2>
      {{ .Summary }}
   {{ end }}
   <ul>
      {{- range site.Store.Get "AllSinglePages" -}}
         <li>{{ .Permalink }}</li>
      {{- end -}}
   </ul>
{{ end }}

What is your goal when setting list = 'never'? It sounds like you want to exclude a page from some page collections, but not all page collections? Is that accurate?

@irkode Thanks for the proposed workaround. It worked for me and I achieved what I needed.

@jmooring My goal is to hide these pages from everywhere on the site (sections, taxonomies, sitemaps, etc.) but still build them. If I know the URL I can access the page, but if I don't know the URL there is no way to find it through the website.

I was expecting Hugo to have some built-in way to list all pages that were built and published, including the ones with "build": {"list":"never"}. However, I haven't found any option to do that (except the proposed workaround).
So even though Hugo builds the page (I have the output .html file and can access it if I know the URL), I need to either scan the filesystem with a script or use the workaround above to get a list of ALL pages and URLs that were built.

Thanks.

May I kindly ask what the purpose is of publishing pages that no one can see?

@irkode It's related to my specific task. Technically these pages are still visible and can be accessed if you know the link, but I don't want them to be discoverable through the website. For example, I can still index them on Google via their Indexing API, but they cannot be crawled by bots or found by anyone else (because there are no references to these pages anywhere on the site).

How many of these pages are there? Is there any pattern to their path? And why do you need to list them if you don’t want them to appear in lists?

@jmooring There are like 50k of these pages.
Yes, there is a URL pattern like site_url/section1/term_level_1/term_level_2/page. Does it matter?

I'm currently testing the workaround by adding the permalinks of the pages to the store via {{- site.Store.Add "AllSinglePages" (slice .Permalink) -}}. It looks like it increased the build time, but I'm not sure yet by how much.

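After testing, I found that using {{- site.Store.Add "AllSinglePages" (slice .Permalink) -}} in the single page template increases cumulative allocation considerably: TotalAlloc grows about 1.6x and the number of GC cycles rises, although Alloc and Sys do not increase.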

Before (without it):
Alloc = 7.6 GB
TotalAlloc = 395.4 GB
Sys = 14.3 GB
NumGC = 197

After:
Alloc = 5.9 GB
TotalAlloc = 621.0 GB
Sys = 14.1 GB
NumGC = 260

In the end, I found a compromise by putting site.Store.Add inside an if statement, so it only adds pages that have a particular parameter in front matter (the same pages that have the "build": {"list":"never"} param).
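
The compromise described above might look roughly like the following in the single template. This is only a sketch: "unlisted" is a hypothetical front matter parameter standing in for whatever marker the hidden pages actually carry, and only the permalink string is stored rather than the whole page.

```go-html-template
{{ define "main" }}
   <h1>{{ .Title }}</h1>
   {{ .Content }}
   {{/* "unlisted" is a hypothetical front matter parameter (an assumption);
        replace it with the marker your hidden pages already use. */}}
   {{- if .Params.unlisted -}}
      {{- site.Store.Add "AllSinglePages" (slice .Permalink) -}}
   {{- end -}}
{{ end }}
```

Because the store then holds plain strings instead of pages, a template that reads it back would print each entry with {{ . }} rather than {{ .Permalink }}.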

It would be nice if the "hugo list" command had a flag to list all pages, including the "build": {"list":"never"} pages, because technically they are still published, just not listed in sections, etc.

My example stored the complete page. You could try storing just the value(s) you need, which might reduce the memory footprint, but this depends on internals, so test it.

You could also use warnf in templates to print values to the log and parse that in a post-build step.
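
A minimal sketch of the warnf idea, assuming it is placed in the single template. "PUBLISHED_URL" is an arbitrary marker chosen here for illustration, not something Hugo defines.

```go-html-template
{{ define "main" }}
   <h1>{{ .Title }}</h1>
   {{ .Content }}
   {{/* "PUBLISHED_URL" is a hypothetical marker (an assumption) that makes
        these lines easy to pick out of the build output. */}}
   {{- warnf "PUBLISHED_URL %s" .Permalink -}}
{{ end }}
```

Each rendered page should then produce one warning line containing the marker and its permalink, so a post-build step can filter the captured build output on that marker. The exact log prefix may differ between Hugo versions, and with tens of thousands of pages the log will be large, so check both on a small build first.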

Yes.

If the pattern has something unique when compared to the other pages, you can control the build option in your site configuration (by environment) instead of in front matter. Then you can do:

hugo -e development list all

A common pattern could be a string used in all paths, or even the number of path segments.

config/
├── _default/
│   └── hugo.toml
└── production/
    └── hugo.toml

config/_default/hugo.toml

baseURL = 'https://example.org/'
languageCode = 'en-US'
title = 'My New Site'

config/production/hugo.toml

[[cascade]]
[cascade.build]
list = 'never'
[cascade._target]
# This targets all paths with four segments.
path = '{/*/*/*/*}'

You can cascade values down the content tree from either front matter or your site configuration.

https://gohugo.io/content-management/front-matter/#cascade
