Architecture / Code Overview?

Hi,
the contribution guides are very detailed about general things like cloning and building the code, but I can’t find anything that explains the overall architecture of Hugo.

Is there anything that explains the overall design (what components interact how) and code flow (what is called when and what happens in parallel)?

Trying to understand the entire code base would probably take weeks without assistance, so I would really appreciate any pointers.

I am trying to estimate how much effort it would take to adjust Hugo to my needs. From studying the docs alone, there seem to be quite a few blockers, so I’m hesitant: the last migration of my project from Gitbook to Jekyll was very much one step forward, two steps back. But I might simply lack a “Hugo mindset” to see that all the tools are already there, ready to use. :slightly_smiling_face:

There is nothing that explains the overall design, code flow and mindset. It’s a Golang app. It’s not a finished product according to Semver rules (0.60 was just released; note the big 0 in front). I have heard of Golang people who look at the code and understand it :wink: If you don’t speak Golang, don’t try to adjust Hugo to your needs. If you want to create a static website, use it. It’s perfect for that, and speedy.

Finished or not, there must have been design decisions in the making of Hugo. And it’s hard to reason about them, as you can’t read the unwritten code. Also, there is great software which hasn’t reached 1.0 or never will. The version number doesn’t say much about quality or stability.

Here’s the situation: we already have a static site generator (the second in fact), customized to some extent, acceptable performance. What we don’t have is someone with extensive Ruby knowledge for Jekyll, so fixing bugs and implementing additional features is quite a struggle. We do have a dedicated Go team on the other hand. But I am the one who has to assess possible alternatives and convince management to invest time and money for yet another migration. And so far there are no real reasons to switch, because Hugo lacks too many things:

  • Compliant Markdown parser (blackfriday has some critical bugs)
  • Plugin system? (custom Go code, not templating)
  • Access to AST for advanced validation and ability to modify nodes on the fly
  • Extensibility for non-text output formats

There are also dozens of small things that are a must, but I’m sure most of them are supported by Hugo. I have to check in detail though.

0.60.0 was just released. Goldmark is now the default markdown parser.

What specifically do you need a plugin system for? Maybe it can be solved another way.

Do you have a specific example in mind?

Great to hear about the parser switch. We use deeply nested lists a lot, so blackfriday’s list-related bugs were a real blocker.

We have semi-structured data files from which a lot of content is generated. It requires some preprocessing including a custom parser with a state machine of sorts and recursive calls to get the data into the correct shape for rendering. In Jekyll this is done through a plug-in script with a few hundred lines of code at build time.

Other extensions implement versioning and a version switcher (the data part, the actual switcher uses JavaScript), a two-tiered navigation, previous/next links etc. There is also a third-party plug-in to extract text content for indexing.

I would like to access the AST to find broken markup, e.g. **emphasis* with an asterisk missing at the end. Grepping for such mistakes is difficult (many false-positives or inefficient regex). It would be much easier to detect in context (e.g. ignore inline and fenced code).

Output formats would be PDF and ePUB. Paged media requires some automatic content changes / different rendering to work properly, e.g. converting external links to text plus footnotes with the URL.

See this GitHub issue: Build pages from data source (gohugoio/hugo, Issue #5074)

If you mean versioning as in “let me read the latest version of your docs, or some previous version”, then the things you mentioned can all be done with Hugo + JavaScript.


Others will have to chime in on AST and PDF/ePUB.

I second that. An overview of the architectural structure of the Hugo project would be very helpful in bringing new contributors on board. I myself struggle to fully grasp the code structure: where the right place for an addition would be, where to look for the possible source of a bug, etc.

A good example I know of would be the structure documentation of the Blender3D project.
https://wiki.blender.org/wiki/Source/File_Structure

See also this article on ARCHITECTURE.md

This could be improved, but there is a list of package descriptions, based on the package godocs, at pkg.go.dev.

Architecture Design of Hugo

Hugo’s architecture is easy to understand at a high level. It is divided into three major blocks: the configuration module, the site module (HugoSites) and the dependency module (Deps).

Configuration Module

The first thing Hugo parses is the configuration file of the user project, config.toml. The configLoader reads the file from disk, parses it, and stores the result as a key-value object.

configLoader has three main jobs:

  1. Load the user project’s configuration file to learn the user’s custom settings.
  2. Fill in the default configuration (Defaults Config) so that the other modules can operate normally.
  3. Generate the module configuration, starting from the user project, which becomes the first module (the project module). In our example there is also a second module, the theme module mytheme.
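
The three steps above can be sketched roughly as follows. This is only an illustration, not Hugo’s actual code: the Config type, the defaults and load functions, and the keys used are all hypothetical names, and the real configLoader does considerably more.

```go
package main

import "fmt"

// Config is a hypothetical flat key-value store standing in for
// the parsed configuration. It is not Hugo's real type.
type Config map[string]any

// defaults returns assumed default values (step 2).
func defaults() Config {
	return Config{"publishDir": "public", "theme": ""}
}

// load overlays the user's settings on the defaults (steps 1 and 2)
// and derives the module list (step 3): the project module always
// comes first, followed by the theme module if one is configured.
func load(user Config) (Config, []string) {
	cfg := defaults()
	for k, v := range user {
		cfg[k] = v
	}
	modules := []string{"project"}
	if theme, ok := cfg["theme"].(string); ok && theme != "" {
		modules = append(modules, theme)
	}
	return cfg, modules
}

func main() {
	cfg, modules := load(Config{"theme": "mytheme", "title": "My Site"})
	fmt.Println(cfg["publishDir"], modules) // public [project mytheme]
}
```

The user never set publishDir here, so the default survives the merge, while the theme setting adds a second module after the project module.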

Modules depend on one another, and each module has exactly one Owner. The project module is special: because it is the initial module, it does not belong to any other module.

type Module interface {
	...
	// Owner In the dependency tree, this is the first 
	// module that defines this module as a dependency.
	Owner() Module
	...
}
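
To make the Owner relationship concrete, here is a minimal sketch. It assumes a hypothetical module struct with path and owner fields, and it reproduces only the Owner method of the interface quoted above; the elided parts stay elided.

```go
package main

import "fmt"

// Module repeats only the Owner method from the interface above.
type Module interface {
	Owner() Module
}

// module is a hypothetical implementation used for illustration.
type module struct {
	path  string
	owner *module
}

// Owner returns the module that first declared this one as a
// dependency, or nil for the project module.
func (m *module) Owner() Module {
	if m.owner == nil {
		return nil // return an untyped nil, not a nil *module in an interface
	}
	return m.owner
}

func main() {
	project := &module{path: "project"}
	theme := &module{path: "mytheme", owner: project}

	fmt.Println(project.Owner() == nil)           // true
	fmt.Println(theme.Owner() == Module(project)) // true
}
```

The explicit nil check matters in Go: returning a nil *module directly would yield a non-nil interface value, and callers testing Owner() == nil would get the wrong answer.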

Once all the information is collected, it is exposed to the rest of the code as the config.Provider service, through which configuration items can be queried and updated.

HugoSites Module

This is the core module for building a site, comparable to the aggregate root in domain-driven design (DDD). Internally it organizes all the information needed to build a site; externally it provides the site-building services.

Initializing HugoSites depends on DepsCfg and Site. Yes, there are two site types: HugoSites relates to Site one-to-many, and Site relates to Language one-to-one, so a multilingual project creates one Site per language, and together these form HugoSites.
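
The cardinalities described here might look like this in Go. The struct fields and the newHugoSites constructor are assumptions made for illustration only; Hugo’s real types carry far more state.

```go
package main

import "fmt"

// Language, Site and HugoSites are simplified stand-ins.
type Language struct {
	Lang string
}

// Site holds exactly one Language (one-to-one).
type Site struct {
	Language Language
}

// HugoSites holds one Site per language (one-to-many).
type HugoSites struct {
	Sites []*Site
}

// newHugoSites is a hypothetical constructor that creates
// one Site for each configured language code.
func newHugoSites(langs []string) *HugoSites {
	h := &HugoSites{}
	for _, l := range langs {
		h.Sites = append(h.Sites, &Site{Language: Language{Lang: l}})
	}
	return h
}

func main() {
	h := newHugoSites([]string{"en", "de"})
	for _, s := range h.Sites {
		fmt.Println(s.Language.Lang)
	}
}
```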

Language items are created by DepsCfg but stored in config.Provider, which is why they are marked in light yellow in the diagram. Initializing DepsCfg depends on Fs and config.Provider. Fs records the source location and the publish location. The source files come from the user project, i.e. the actual file system on disk. The publish location is read from config.Provider and defaults to the public folder; the folder is checked for existence and created if it is missing. Finally, newly established values such as workingDir are synchronized back to config.Provider.

As can be seen, the dependency chain is HugoSites <- Site <- Language <- DepsCfg <- Fs.

Deps Module

Hugo refers to all the services and objects needed to build a site as dependencies, and puts them all in Deps.

While the dependencies are being built, a TemplateProvider, which supplies the templates, is created. The input and output media types (MediaType) and the output formats (OutputFormats) are determined and written back to config.Provider.

Preparations are also made for collecting the site content, with a Page Collection to help gather it. The Publisher is the service used when the site is finally published. Both are stored on the Site.

At the same time, resources need to be managed in a unified way with clear specifications, which keeps them convenient to use and follows the single responsibility principle of object-oriented design. Deps therefore contains a Path Spec, which provides a unified, standard file structure service; a Resources Spec, which holds all media type and output format information; a Content Spec, which provides services for content information; and a Source Spec, which helps define resource policies such as filtering functions.

With the help of Deps, everything needed to build the site, such as raw materials, rules and output formats, is prepared.

Hope it helps: