Two or more pages collide when they have the same publication path.
In my scenario, I need to generate pages based on my content files and then add some data from a remote URL. This takes Hugo forever to process, which I assume is because I have 10k posts and the remote data lives in one single .json file:
{{ $postTitle := .context.Title }}
{{ $url := "https://example.com" }}
{{ $cacheKey := print $url (now.Format "2006-01-02") }}
{{ $resource := resources.GetRemote $url (dict "key" $cacheKey) }}
{{ with $resource }}
  {{ $data := . | transform.Unmarshal }}
  {{ range $data }}
    {{ if eq .remoteItemTitle $postTitle }}
      {{ .remoteItemPrice }}
    {{ end }}
  {{ end }}
{{ end }}
Interestingly, Node.js does this much more quickly. I assume content adapters could be much faster than .GetRemote.
Are there any ways to improve this with content adapters, or is my code just not optimized?
I have 10,000 local posts. Besides the standard .Params data, I need a “price” value for each post from a remote JSON file. I tried using Hugo’s .GetRemote to fetch a single large JSON containing all prices, then generate the final pages—but it takes around five minutes for Hugo to process all 10,000 posts.
To speed things up, I wrote a Node.js script that preprocesses the prices by injecting them directly into .Params before building Hugo. This brought the build time down to under 30 seconds, including the preprocessing part.
Now I’m wondering if I can replace that Node.js approach with Hugo’s content adapters. I don’t need to build entire pages from remote data; I only want to include the prices alongside existing content. However, I saw a warning about page collisions—does that prevent me from mixing local and remote content?
Content adapters are for creating pages from data, not for adding data to existing content. And adding data to existing content will not cause a page collision.
Your question is purely about performance.
Without access to your project I have no idea what improvements could be made, but the first place I’d look is the potential for caching partial templates.
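For example, moving the fetch and unmarshal into a cached partial means the remote file is downloaded and decoded once per build instead of once per page. A minimal sketch, assuming a partial named prices.html and a placeholder URL, reusing the field names from your snippet:

{{/* layouts/partials/prices.html -- fetch and decode the remote JSON once */}}
{{ $data := slice }}
{{ $url := "https://example.com/prices.json" }}
{{ with resources.GetRemote $url }}
  {{ $data = . | transform.Unmarshal }}
{{ end }}
{{ return $data }}

{{/* in the page template: partialCached evaluates the partial once per build */}}
{{ $prices := partialCached "prices.html" . }}
{{ range where $prices "remoteItemTitle" .Title }}
  {{ .remoteItemPrice }}
{{ end }}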
And if I understand your code above, with 10,000 data records and 10,000 corresponding content files there would be 100 million iterations.
You can cut the number of iterations by ~50% by breaking out of the loop once you find a matching record.
Or better yet, use where to find a matching record instead of iterating over the data. For example, with 1000 content files and 1000 matching records in a data file…
{{ with resources.Get "data/people.json" }}
  {{ with . | transform.Unmarshal }}
    {{ $t := debug.Timer "Test compare" }}
    {{ range . }}
      {{ if eq .title $.Title }}
        {{ .description }}
        {{ break }}
      {{ end }}
    {{ end }}
    {{ $t.Stop }} --> approximately 3 ms per iteration
  {{ end }}
{{ end }}
{{ with resources.Get "data/people.json" }}
  {{ with . | transform.Unmarshal }}
    {{ $t := debug.Timer "Test where" }}
    {{ with where . "title" $.Title }}
      {{ (index . 0).description }}
    {{ end }}
    {{ $t.Stop }} --> approximately 0.7 ms per iteration
  {{ end }}
{{ end }}
It is actually 4 times that because I have 4 external sources of data for each file.
Now I get why Node.js is quicker: it iterates over the data only once.
Let me try your suggestions; I'll get back to you with the results. <3
Another option that I have not tested is to wrangle the data into a map instead of a slice of maps, with the title as the key for each entry. Then you'd have an indexed data structure to use with the index function, which should be faster still.
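A rough sketch of what I mean, reusing the data file and field names from the examples above:

{{ with resources.Get "data/people.json" }}
  {{ with . | transform.Unmarshal }}
    {{/* Re-key the slice of maps into one map indexed by title */}}
    {{ $byTitle := dict }}
    {{ range . }}
      {{ $byTitle = merge $byTitle (dict .title .) }}
    {{ end }}
    {{/* Keyed lookup instead of scanning the slice */}}
    {{ with index $byTitle $.Title }}
      {{ .description }}
    {{ end }}
  {{ end }}
{{ end }}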
Edit: just tested this; performance is about the same as the range/break construct due to the time required to transform the data structure. Use where unless your data is already structured with a key.