Hi,
I would like to convert three existing web sites of small non-profit organizations to Hugo. It is very important that the URLs of the HTML files and Markdown files are preserved, as they have been stable for more than 20 years (`*.html`) and 10 years (`*.md`) respectively, and several deep links appear in printed documents. However, I am unable to find a way to configure Hugo so that it does not rename files or move them around. After spending several days trying different configuration options and reading the documentation and many messages in this community forum, I am coming here to ask for help, or for confirmation that Hugo is not suitable for importing existing web sites.
Here is a simplified view of the directory structure of one of the sites that I would like to convert to Hugo:
.
├── index.html
├── index.md
├── legal.html
├── legal.md
├── news/
│   ├── 1995-01-13-foobar.html
│   ├── 1995-01-13-foobar.md
...
│   ├── 2020-10-01-quux.html
│   ├── 2020-10-01-quux.md
│   ├── index.html
│   ├── index.md
│   ├── team.html
│   └── team.md
├── reports/
│   ├── 2005-01-31/
│   │   ├── index.html
│   │   ├── index.md
│   │   ├── graph1.png
│   │   └── graph2.png
....
│   └── 2020-08-31/
│       ├── index.html
│       └── index.md
...
├── topic1/
│   ├── article1.html
│   ├── article1.md
│   ├── article2.html
│   ├── article2.md
│   ├── index.html
│   ├── index.md
│   └── sub-topic12/
│       ├── image1.jpg
│       ├── index.html
│       ├── index.md
│       └── sub-sub-topic123/
...
└── topic2/
...
Copying the `*.md` files next to the `*.html` files generated by Hugo in its output directory is a minor issue that can easily be solved. But preserving the existing directory structure seems to be impossible, with or without `uglyURLs` and other settings. I have read many articles here describing similar problems, but most of the answers can be summarized as: adapt your directory structure to Hugo instead of trying to adapt Hugo to your web site.
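For illustration, the requirement can be stated as a mechanical check: every source `*.md` file must produce an `*.html` file at the same relative path. The sketch below is not part of the original setup. The function names are hypothetical, and it assumes that `_index.md` (if that rename is used for the build) should still be published as `index.html`.

```python
from pathlib import Path, PurePosixPath


def expected_html(md_path: str) -> str:
    """Map a source Markdown path to the HTML path it must keep.

    Assumption: a file temporarily renamed to _index.md for the build
    must still be published as index.html in the same directory.
    """
    p = PurePosixPath(md_path)
    name = "index" if p.stem == "_index" else p.stem
    return (p.parent / f"{name}.html").as_posix()


def compare(md_paths, html_paths):
    """Return (missing, unexpected) HTML paths, both sorted.

    missing:    required by a source file but not generated
    unexpected: generated but not matching any source file
    """
    expected = {expected_html(p) for p in md_paths}
    actual = set(html_paths)
    return sorted(expected - actual), sorted(actual - expected)


def scan(root: Path, pattern: str) -> list[str]:
    """List files under root matching pattern, as relative POSIX paths."""
    return [f.relative_to(root).as_posix() for f in root.rglob(pattern)]
```

A possible use after a build, assuming Hugo's default `content` and `public` directory names, is `compare(scan(Path("content"), "*.md"), scan(Path("public"), "*.html"))`. A non-empty first list means that at least one URL has changed. Pages generated without a source file, such as list pages, will show up in the second list.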
If necessary, I can rename some `index.md` files to `_index.md` when running Hugo and then publish them as `index.md`. But this is not sufficient because no matter how I configure Hugo, this does not work: Hugo generates the `*.html` files in a different path than their source `*.md` files. For example, `/news/_index.md` generates `/news.html` instead of `/news/index.html`, while `/news/index.md` appears to work but blocks the generation of other contents under `/news/` because Hugo treats it as a leaf and then ignores `/news/team.md`. There are several other unexpected renames, such as Hugo stripping the leading date from some file names or moving some files one level up in the directory structure. I also tried to fix the date-stripping issue by playing with the `[permalinks]` configuration but then some files under `/news/` or `/reports/` such as `/news/team.md` get an unwanted date prepended to their HTML output.
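The leaf-versus-branch behaviour described above follows from Hugo's page bundle rules: a directory containing `index.md` is a leaf bundle, and other content files below it are treated as resources rather than pages, while `_index.md` marks a branch bundle that can have child pages. As a rough sketch (the function name is hypothetical, and this only identifies which `index.md` files would need the `_index.md` rename; it does not address the `/news.html` output path), the affected directories could be listed like this:

```python
from pathlib import PurePosixPath


def needs_branch_bundle(md_paths):
    """Return directories whose index.md would hide other Markdown pages.

    A directory is flagged when it contains index.md and at least one
    other .md file exists in it or anywhere below it. Directories where
    index.md is the only Markdown file (for example a report with its
    images) are left alone, since a leaf bundle is enough for them.
    """
    paths = [PurePosixPath(p) for p in md_paths]
    index_dirs = {p.parent for p in paths if p.name == "index.md"}

    flagged = set()
    for p in paths:
        if p.name == "index.md":
            # An index.md only counts as content for the directories above it.
            ancestors = list(p.parent.parents)
        else:
            ancestors = [p.parent, *p.parent.parents]
        for directory in ancestors:
            if directory in index_dirs:
                flagged.add(directory)
    return sorted(d.as_posix() for d in flagged)
```

With paths taken from the simplified tree, `news` and `topic1` would be flagged because they contain pages besides `index.md`, whereas `reports/2005-01-31` would not. The site root is reported as `.`.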
This is rather frustrating. I wish there was a setting `preserveOriginalURLs = true` or `preserveFilePaths = true`.
Here is my latest attempt at a `config.toml`, which still does not work:
baseURL = "(...)"
title = "(...)"
theme = "(...)"
uglyURLs = true
disablePathToLower = true
disableKinds = [ "taxonomy", "term", "RSS", "sitemap", "robotsTXT", "404" ]
[taxonomies]
[frontmatter]
date = [":filename", "date", "publishDate", ":fileModTime", ":default"]
lastmod = [":git", ":fileModTime", ":default"]
...
A bit of history: the oldest of these three web sites was created in late 1993, when the web was still very young and was competing with FTP and Gopher. Its domain name, directory structure and URLs have remained stable since 1996. It started as a set of static HTML files, then after a few years it had its contents managed by the now-obsolete Website Meta Language (WML), then around 2005 its contents were converted to Markdown with some Makefiles and scary Perl scripts to convert these Markdown files back to HTML again, update the indexes and lists, etc. It was decided to publish the Markdown files to make it easier for third parties to extract and convert the contents of the site. The other web sites that I mentioned are a bit more recent but have a similar history and a similar directory structure containing both `*.md` and `*.html` files. The custom Perl scripts that generate these web sites are difficult to maintain and inconvenient for Windows users (the majority of the current contributors to these sites), so I would like to simplify this system and replace it with Hugo. Alas, it looks like Hugo is unable to preserve the existing directory structure.
Because of that history and because of the way these web sites are currently managed, I have the following constraints:
- URLs must not change. The `*.md` files are published alongside the `*.html` files that they generate.
- I cannot configure the web servers to rewrite URLs.
- I cannot override the URLs in the front matter of the Markdown files. In fact, I would like to avoid having to set anything in the front matter, for two reasons: on the one hand, some of these files are edited by people with very little knowledge of computers and who could accidentally make some content unavailable if they copy-paste or edit the front matter without understanding it. And on the other hand, some of the Markdown files are also fetched and processed by (old) third-party tools that are unable to understand or skip the front matter.
- If any weird tricks are needed, they can be in `config.toml`, in `layouts` (I also played with that, without much success), in a custom theme or in other files that are preferably outside the `content` tree.
I like the single-binary approach of Hugo because it is easier to use for Windows users who do not have to install a whole language framework such as Perl, Python or JavaScript/Node.js and who do not have to care about script dependencies. However, I am about to give up because it looks like Hugo wants to force web sites to be organized in specific ways instead of preserving the existing file names. So this cry for help is my last attempt before switching to another static site generator…
Thanks for reading my long rant. Any help about how to configure Hugo would be appreciated, or a confirmation that Hugo is not suitable for this task.