Improving performance when working with remote data

iBasher · January 7, 2025, 9:13pm

According to Content adapters docs:

Two or more pages collide when they have the same publication path.

In my scenario, I need to generate pages based on my content files and then add some data from a remote URL. I assume this is because I have 10k posts and remote data lives in one single .json file, which takes Hugo forever to process:

{{ $postTitle := .context.Title }}

{{ $url := "https://example.com" }}
{{ $cacheKey := print $url (now.Format "2006-01-02") }}
{{ $resource := resources.GetRemote $url (dict "key" $cacheKey) }}

{{ with $resource }}

	{{ $data := . | transform.Unmarshal }}
	{{ range $data }}
		{{ if eq .remoteItemTitle $postTitle }}

			{{ .remoteItemPrice }}
            
		{{ end }}

	{{ end }}
{{ end }}

Interestingly, node.js does this much quicker. I assume the content adapters could be much quicker than .GetRemote

Are there any ways to improve this with content adapters or maybe my code is just not optimized?

jmooring · January 8, 2025, 3:26am

I’ve read your post several times, and I don’t understand how this is related to either content adapters or page collisions.

Nor do I understand the problem.

iBasher · January 9, 2025, 4:01pm

Let me clarify:

I have 10,000 local posts. Besides the standard .Params data, I need a “price” value for each post from a remote JSON file. I tried using Hugo’s .GetRemote to fetch a single large JSON containing all prices, then generate the final pages—but it takes around five minutes for Hugo to process all 10,000 posts.

To speed things up, I wrote a Node.js script that preprocesses the prices by injecting them directly into .Params before building Hugo. This brought the build time down to under 30 seconds, including the preprocessing part.

Now I’m wondering if I can replace that Node.js approach with Hugo’s content adapters. I don’t need to build entire pages from remote data; I only want to include the prices alongside existing content. However, I saw a warning about page collisions—does that prevent me from mixing local and remote content?

jmooring · January 9, 2025, 4:07pm

Content adapters are for creating pages from data, not for adding data to existing content. And adding data to existing content will not cause a page collision.

Your question is purely about performance.

Without access to your project I have no idea what improvements could be made, but the first place I’d look is the potential for caching partial templates.

jmooring · January 9, 2025, 4:20pm

And if I understand your code above, with 10,000 data records and 10,000 corresponding content files there would be 100 million iterations.

You can cut the number of iterations by ~50% by breaking out of the loop once you find a matching record.

Or better yet, use where to find a matching record instead of iterating over the data. For example, with 1000 content files and 1000 matching records in a data file…

{{ with resources.Get "data/people.json" }}
  {{ with . | transform.Unmarshal }}
    {{ $t := debug.Timer "Test compare" }}
    {{ range . }}
      {{ if eq .title $.Title }}
        {{ .description }}
        {{ break }}
      {{ end }}
    {{ end }}
    {{ $t.Stop }} --> approximately 3 ms per iteration
  {{ end }}
{{ end }}

{{ with resources.Get "data/people.json" }}
  {{ with . | transform.Unmarshal }}
    {{ $t := debug.Timer "Test where" }}
    {{ with where . "title" $.Title }}
      {{ (index . 0).description }}
    {{ end }}
    {{ $t.Stop }} --> --> approximately 0.7 ms per iteration
  {{ end }}
{{ end }}

iBasher · January 9, 2025, 5:05pm

It is actually 4 times that because I have 4 external sources of data for each file.
Now I get why Node.js is quicker - because I iterate it only once.

Let me try your suggestions, will get back to you with the results. <3

jmooring · January 9, 2025, 5:07pm

Another option that I have not tested is to wrangle the data into a map instead of a slice of maps, and the key for each map would be the title. Now you’d have an indexed data file to use with the index function ~~which should be faster still~~.

Edit: just tested this; performance is about the same as the range/break construct due to the time required to transform the data structure. Use where unless your data is already structured with a key.

iBasher · January 9, 2025, 6:34pm

{{ break }} is actually improving time by ~50%

But the actual lifesaver was index. I don’t even have to use where.

In my scenario, with a slightly different data structure (map[%itemName:%description]), that translates to:

{{ with resources.GetRemote $url $options }}
	{{ $t := debug.Timer "index" }}

	{{ $data := . | transform.Unmarshal }}

	{{ $description := index $data $.Title }}
	{{ $description }}
	
	{{ $t.Stop }}
{{ end }}

Pretty eloquent logs:

"range"         average 65.3ms

"range + break" average 35.3ms -> ~2 times faster

"index"         average 0.18ms -> ~362 times faster O_O

Thanks @jmooring you are the best as always.

That’s funny that my original data is already “the key for each map would be the title”

jmooring · January 9, 2025, 6:56pm

That’s great. My index tests were slower because I had to transform the structure of the test data.

system · January 11, 2025, 6:56pm

This topic was automatically closed 2 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Content adapters: examples and performance tips & tricks content-adapters	43	2290	November 4, 2024
Sharing Content Between Hugo Sites (Bulk Content Adapter Pages) support content-adapters	2	39	February 25, 2025
Content adapters: using local data support content-adapters	2	253	June 16, 2024
Content adapters: caching support content-adapters	4	474	June 4, 2024
Related content with content adapters support content-adapters	6	38	January 13, 2025

Improving performance when working with remote data

Related topics