Thinking long-term (long post)

I keep coming back to this - so you have a discussion forum, and an issue tracker, and a chat room, and your open source projects are small but chugging along just fine. Lots of people are dropping by, and there’s a community making it all worthwhile. Oh, and you’re blogging and documenting with Hugo, surely.

The context for me is self-hosted open-source development (software & hardware).

3 years later … Some projects are still going strong, others have become comatose (nothing really dies in open-source land, does it?). Your sites are starting to look a bit dated, some of the fancy new features are not available unless you upgrade in a major way, or maybe even migrate to a new system. Which rarely happens, because it’s such a chore and feels more like moving sideways than forward.

6 years later … Some of the tools you use have stopped evolving, and you start looking around for alternatives. The options don’t look good, almost everything whispers “major migration!” in your ear. Sooo… you abandon some of your beloved setups, perhaps create a messy/partial/static snapshot to try and keep the information available for Googling, and … start over with some new tool and maybe also a new (sub)domain. Life is good again, and the old information is sort of still available; almost all URLs and embedded images are still intact.

9 years later - That migration you did? Well, turns out it’s partly broken now. Some URLs are not working (much of it not your fault, that’s simply how the web ages), and the look of those static snapshots is confusing, giving the impression that the snapshot is still where the action is. The new tools and site are going ok, but even more projects have become either completely outdated, or again comatose, for lack of progress. Bit-rot is setting in.

12 years later - You have moved on. Personal changes after over a decade make all this work a fair bit less important, and you’re really not spending much time with most of the old projects anymore. Some have changed hands, and are being maintained and taken further by others, elsewhere.


Very loosely speaking, the above describes my situation, and keep in mind that I’ve been moving more and more to a self-hosted setup. Which is not the main point of my story - even when hosted on cloud services, the above scenarios could still pan out in the same way. In my view, it doesn’t become someone else’s problem just because the bits live on someone else’s server.

The point of my story is that I would like to bring some attention to these long-term aspects of our digitally-enhanced (-reduced?) lives. And the reason for bringing it up in this forum is that “the Hugo approach” stands out as the shining light in terms of long-term perspective, in my opinion: static files require no particular infrastructure, can be served by the weakest systems, and can end up on archive.org or on anyone else’s machine with no extra effort.

If you find yourself nodding in agreement with some of the above, then perhaps you will also be willing to think about the following:

  • can we have a commenting system which is also static-file based?
  • … this requires re-generating comment views after each new entry (a sketch of what that rendering could look like follows this list)
  • likewise for forums, i.e. can we have forum systems as static pages?
  • and again for issue tracking, as a specialised variant of forums?
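
To make the first two bullets concrete, here is a minimal sketch of what the rendering side could look like as a Hugo partial. Everything here is invented for illustration: it assumes each comment is dropped into data/comments/<page-slug>/ as a small YAML or JSON file, and every `hugo` run then re-generates the comment views as plain static HTML.

```go-html-template
{{/* layouts/partials/comments.html: a sketch, not a finished system.
     Assumes comments live in data/comments/<page-slug>/ as YAML or
     JSON files with author/date/body fields, and that this partial
     is called with the page as its context. */}}
{{ $slug := .File.ContentBaseName }}
{{ with index .Site.Data.comments $slug }}
<section class="comments">
  {{ range . }}
  <article>
    <strong>{{ .author }}</strong> <time>{{ .date }}</time>
    {{ .body | markdownify }}
  </article>
  {{ end }}
</section>
{{ end }}
```

New comments would still need to arrive somehow (mail, a pull request, a tiny endpoint), and each arrival means a re-build; that is exactly the re-generation cost the second bullet mentions.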

At the moment, most of these live in dynamic systems, which need to be kept operational, secure, and up to date, and which keep processes running even when hardly anything changes anymore. GitHub’s issue tracker comes to mind.

Chat rooms (e.g. Gitter or Slack) are different in my view, since they are not necessarily intended to stay around forever, and are not always open to full public reading either. As such, I see less need for “static chats”, if such a thing were even possible. Just as voice conversations are how we make things happen, find consensus, and take action - without saving the audio itself.

Anyway, I hope this is food for thought. Now I need to get back and ponder on how (and whether!) to keep all my ageing server setups on life-support …

2 Likes

I’m a long-term advocate of “future proof” docs, and have put a lot of my company’s stuff in plain text rather than Word or PowerPoint.

3 Likes

Take the situation you describe above and magnify it to sites with hundreds of thousands of pages, huge distributed authorships, and millions of dollars of continual investment, and you’ll see why I feel like my job as a content strategist has (reasonably) good job security (at least in the context of tech). More importantly, I hope you’ll realize that everyone has these issues, so don’t beat yourself up about them and lose perspective on all the great work you’re doing. Note too that sometimes losing perspective means trying to shoehorn inappropriate use cases into a tool you’re currently digging because it’s beautiful and easy to use (i.e. Hugo).

Loving Hugo doesn’t mean you have to make everything static. Relational databases and other solutions were created with hard-fought wisdom to solve specific problems—and they often do a damn good job at that. Your best bet is to be strategic about which intellectual property you think needs to be most portable and which will provide you (or others) the most value in the long run; it’s the distinction between being a curator and a hoarder. Good luck with all your projects and welcome to Hugo!

You make a good point. I remember reading somewhere about IBM putting its bazillion pages of manuals in DITA format, because it makes sense for them and how they manage their massive docs requirement. It’s indeed text, although XML, and quite different from markdown. At least, if you want to, you can chuck it in a git repo.

1 Like

Thx for the comments. @rdwatters - I don’t think we’re talking about the same thing. Keeping services alive is one thing, and yes, that needs well-chosen and elaborate software solutions. My point was rather that once site activity ends, why keep these dynamic systems up at all? It’s a bit like having a book read to you when you want to look up something, versus storing the book on a shelf, with virtually no maintenance, and picking it up occasionally, when needed. But then with a website, and on a worldwide scale.

@RickCogley - As for representation formats for static data: yes, that too is important, but not really my point. To bring up the (dead wood) book analogy again: that’s more about material, layout, and font choices. Pick whatever works, IMO.

My concern is that a lot of information ends up in active systems (i.e. dynamic-content servers), even though that information will not ever change again, after a certain point. It’s there as historical fact, and for reference only. Do we really want to make access to such stored data reliant on specific software solutions, which tend to evolve and become obsolete all the time?

We’ve solved it for weblogs and documentation: Markdown, XML, plain text, HTML - whatever, they’re all readable and likely to remain accessible decades from now, without further migration efforts. The point of Hugo and similar static site generators is that the results they produce don’t depend on them for future use. Again, just as with real books: you don’t need the publisher once the job is done. The book, and the website, will live on.

But we haven’t solved this for other valuable sources of information and knowledge: comments, discussions, issues. These become inaccessible the moment people stop maintaining the systems they were created on.

Can (and should) we not make our public information work in such a way that when the dynamics end, all that remains is a static rendering which continues to offer access, and can easily be archived and replicated?

There may be light at the end of the tunnel, as frameworks such as React and Svelte make it possible to pre-render pages on the server side. Store that output on disk as a snapshot, and the loss of dynamism would leave a usable “after-image”.

There are some systems, like staticman, that try to address this, with a small dynamic component collecting the comments and then pushing them to the static site’s repo. It feels like a reasonable compromise. It would be nice if Hugo could somehow handle that natively, but it feels out of Hugo’s wheelhouse. I have one blog using Disqus, but I want to move all of that to some kind of static comment system. I don’t want my comments in any system which prevents me from searching or easily removing my comments and discussions.
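
The dynamic component in such a setup can be very small. Here is a hypothetical sketch in Go (the route, paths, and field names are all my own invention; a real system like staticman adds moderation, spam checks, and the push to the site’s repo):

```go
// comment-collector: a sketch of the "small dynamic component" idea.
// It accepts a POSTed comment and writes it into the static site's
// data directory; the rebuild and the commit/push are left out.
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"os"
	"path/filepath"
	"time"
)

type Comment struct {
	Slug   string `json:"slug"` // which page the comment belongs to
	Author string `json:"author"`
	Body   string `json:"body"`
}

func handle(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodPost {
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
		return
	}
	var c Comment
	if err := json.NewDecoder(r.Body).Decode(&c); err != nil || c.Slug == "" {
		http.Error(w, "bad request", http.StatusBadRequest)
		return
	}
	// filepath.Base crudely prevents path traversal via the slug.
	dir := filepath.Join("data", "comments", filepath.Base(c.Slug))
	if err := os.MkdirAll(dir, 0o755); err != nil {
		http.Error(w, "server error", http.StatusInternalServerError)
		return
	}
	// One comment per file, ready for the static site generator.
	name := fmt.Sprintf("%d.json", time.Now().UnixNano())
	f, err := os.Create(filepath.Join(dir, name))
	if err != nil {
		http.Error(w, "server error", http.StatusInternalServerError)
		return
	}
	defer f.Close()
	if err := json.NewEncoder(f).Encode(&c); err != nil {
		http.Error(w, "server error", http.StatusInternalServerError)
		return
	}
	w.WriteHeader(http.StatusCreated)
}

func main() {
	http.HandleFunc("/comments", handle)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

A rebuild hook, or even a cron job running `hugo` plus a git commit, would pick the files up from there.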

A few things come to mind that might be relevant:

  • seems to me it will always cost money to host content no matter what it is - the domain, hosting, DNS, storage, bandwidth. Granted, putting a site on S3 when it has almost no traffic and exists as an archive is super cheap. Who will pay, and what is the real price?
  • I mirror some text content to services like tilde.town, because it is free to me and I don’t feel like I’m abusing their system by putting a light site there.
  • I have a database which accepts markdown in its unstructured text fields, and we use it to store lots and lots of operational data. This is an example of critical info that needs to be searchable, so I took some time to do a scripted extraction of these many, many fields into markdown files, which I commit to git (a sketch of such an extraction follows this list). At least now I have the markdown files, and any attachments, in a move-able format.
  • depending on the system, it may be expensive to move a dynamic system to a static one. In enterprise projects, “data migration” is always a major pain point, and something we try to avoid. It’s such a pain and so expensive that it often ends up much cheaper to just keep a license for old system X alive (compared to spending a large amount of cash on the migration), so we can search when needed.
  • some services leverage those kinds of dynamic bits of info, like issues and comments, making them purposefully hard or costly to retrieve, to make the service more sticky. When you transfer a GitHub repo, the last time I checked, the issues don’t transfer, which necessitates using a service that can do it, or writing a script.
  • I don’t trust any of the major SNS systems with longer-form writing (here I am, ironically, writing a longish post too). I am OK using FB etc. to communicate with my family, but I don’t want to invest my time in putting any real content in that I then cannot even search or export…
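
For what it’s worth, the scripted extraction mentioned above can stay quite small. A hypothetical Go sketch, with invented table and column names, that dumps markdown text fields to one file per record for committing to git:

```go
// export-notes: a sketch of pulling markdown text fields out of a
// database into one file per record, so the content can live in git.
// The table and column names (notes, id, title, body_md) are invented.
package main

import (
	"database/sql"
	"fmt"
	"log"
	"os"

	_ "github.com/mattn/go-sqlite3" // swap in the driver for your DB
)

func main() {
	db, err := sql.Open("sqlite3", "ops.db")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	rows, err := db.Query(`SELECT id, title, body_md FROM notes`)
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()

	if err := os.MkdirAll("export", 0o755); err != nil {
		log.Fatal(err)
	}
	for rows.Next() {
		var id int
		var title, body string
		if err := rows.Scan(&id, &title, &body); err != nil {
			log.Fatal(err)
		}
		// A tiny frontmatter block keeps the title with the text.
		out := fmt.Sprintf("---\ntitle: %q\n---\n\n%s\n", title, body)
		name := fmt.Sprintf("export/note-%04d.md", id)
		if err := os.WriteFile(name, []byte(out), 0o644); err != nil {
			log.Fatal(err)
		}
	}
	if err := rows.Err(); err != nil {
		log.Fatal(err)
	}
}
```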

In my opinion, yes, that would be the best way.

1 Like

In my opinion, choosing Hugo (or another SSG) is far more future-proof than the other systems out there. The whole site is compiled once and stays readable for as long as browsers can interpret HTML. What I don’t like is the “default” document structure: every CMS tries to cram content into one “field”, like WordPress with Gutenberg and its 10+ year-old database structure. I don’t see actual HTML in the post_content DB field as future-proof, and the same applies to markdown, especially once weird regexes and other tricks are needed to build different views (e.g. XML, AMP) from the same content.

In my newer projects I have found YAML frontmatter fields much more efficient for storing content as a structured object. Based on that, you can always change the final HTML/XML/TXT/whatever output, so the content is as portable as possible. To be clear, I don’t blame markdown; I even use markdown inside frontmatter for simple formatting like headings, bold text, links, … But this frontmatter approach requires a few extra steps to stay compatible with things like {{ .Description }}. In the future I can see content becoming compiler-independent as well as CMS-independent, so you will be able to swap out different parts of the ecosystem.
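
To illustrate the frontmatter approach with a minimal sketch of my own (the field names are invented): the front matter holds the structure, and one template per output format decides how it is rendered.

```go-html-template
{{/* layouts/_default/single.html: a sketch. Assumes front matter like

     sections:
       - heading: "Intro"
         body: "Some **markdown** text."
       - heading: "Details"
         body: "More text, with a [link](https://example.org/)."

   The same fields could just as well feed an XML or AMP template. */}}
{{ range .Params.sections }}
<h2>{{ .heading }}</h2>
{{ .body | markdownify }}
{{ end }}
```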

@zzamboni are you using forestry? While not compulsory, that is one usage pattern it seems to favour. The only problem I’ve found is that, unless the YAML (or other semi-readable format) is “smartly” created and laid out, it becomes kind of an unreadable mess (though manageable), which can undercut its use as a “graceful static fallback” - which seems to be what @jcw is after?

@HenrySkup I have used other systems besides Forestry. Forestry is good, but sucks in some ways. I’ve seen a few other open-source solutions where you can map your CMS to the content, which is similar to mapping content to an output view, so it works in a kind of bi-directional way: one source of truth satisfying both the CMS and the output. I believe new CMSes following this pattern are coming, which will give you the ability to switch with just an initial config so the new CMS can read your content. For instance, WordPress is full of settings which are serialized or JSON-encoded; it has been taking advantage of this pattern for years. You can also easily convert structured data into any format, like JSON, which is supported by almost any programming language. API outputs are in this format too, and more and more applications rely on them. The Internet of Things uses this pattern, and Google reads your website content from JSON-LD to better understand it, or to provide a spoken answer to a user asking a question aloud. So why not keep the whole content structured as well?

I am using Hugo. Hugo is the best among static site generators. But I am using it hesitantly. I have a mildly successful, long-running blog on WP, and several business-related sites too. Yes, it takes time to manage and is not simple. And it is not cheap. But I do not see myself migrating anywhere soon.

Allow me to be blunt.

Without a Windows desktop GUI app, Hugo will be yet another dead project, in far less than six years.

Yes, you read that right: “Windows” and “GUI” and “desktop”. No amount of Linux CLI magic will ever be enough to make Hugo a success the market notices.

Until “end users” (aka “normal people”) start using it in the millions, no open-source project will even be noticed on the market. And that’s without mentioning the well-organized technical-support army required behind it.

WordPress is a sorry PHP+JavaScript mish-mash on top of a mountain of technical debt. The WP database schema is laughable kindergarten SQL. But WP is mature. The market is conquered by WP. WP rules. WP generates huge revenue for a huge number of people and companies. For them, this Hugo, Hexo, or whatever we are super excited about is simply non-existent. They do not care about Drupal, Joomla, or Wix at all either.

WP is also winning the enterprise (or has it already won?). Multi-million SharePoint licences on “server farms” are buried deep under the thick carpets of boardroom blame, shame, and finger-pointing.

Azure web apps are few and far between, and of course none of them is running SharePoint (btw, the SharePoint DB schema is even more laughable than the WP one). There are a lot of Azure, AWS, etc. VMs running WP sites. A lot. It is hard to even imagine such a huge number of websites. Hugo?

Sorry for rocking the boat (wildly), but I dare to think it is time to present a clear Hugo vision, strategy, and roadmap. That might convince someone to invest…

I think that is the big picture.

Forestry.io is an MVP. Read: unusable, unless one has a lot of time and likes pain.

IBM DITA? Here are the amazingly fitting lyrics for that situation:

We have traveled pain and love 
To call ourselves high born 
Living in a maze so crazed, lunacy is legend 
Lunacy is legend 

Words I fear that clutch my crutch 
And drive your senses crazy 
Men or women too get blue, 
so don't make living hazy 
No don't make living hazy

Blue Man and Women, living hazy? Oh my …

Source

Unfortunately a native Hugo GUI is no longer on the roadmap. It used to be. But currently there just aren’t enough skilled developers contributing to Hugo, and the ones who do contribute have other priorities which are also quite important (if not more important than a Hugo GUI client).

At some point I’m sure that we will get there, but it just doesn’t seem likely to have a Hugo GUI client in the foreseeable future.

Also, regarding Hugo’s survival… hm, let me just point out that, the way things are at the moment with all the trendy JS-based SSGs that are dog slow, Hugo’s survival is guaranteed.

However, people who want to learn Hugo, as things are at the moment, have to be willing to invest the time.

It took me about a year and a half to get to grips with Hugo and I’m still learning something new every time I develop another project that has a different requirement.

Also I do not come from a Computer Science background. But the ride has been fun so far. And it’s also a bonus that now I have a basic grasp of programming concepts like variables, Booleans, conditionals etc. thanks to Hugo.

2 Likes

Go is one very big plus for Hugo, even though users do not need to use it, except in templates. It is nice that users get to learn from Hugo.

It makes more sense for a different project to build such a tool on top of Hugo. I want Hugo to focus on web tech and templating, rather than desktop quirks (any more than it already has).

3 Likes

You’re talking about user-generated content - i.e. users who do not have “backend” access.

You can and do have user-generated content systems that store content statically instead of using a database.

That isn’t the problem. The problem is that any user-generated content system has to be a dynamic system because you are creating/editing backend data, which brings in security, authentication, and other issues.

But side-by-side systems are emerging which allow for separation of Content Store, Generation, and Access (not special terms).

Relevant to your query are projects like Netlify CMS - the open-source JAMstack editor that provides an Access layer to the Content Store.

Netlify CMS does rely on a git backend and an authentication server, but it is at least agnostic as to the Generator and somewhat agnostic as to the Content Store (although it has to be markdown).

Netlify CMS does not currently support user-generated content like you are thinking of, being focused on dev and editorial roles, but it could in theory do so (and will likely do so in the future).

I expect this space will fill up with more options in the next few years.

We’ll get there.

If I can add my two cents: I believe Hugo is and should remain primarily a mechanism for the presentation layer. As others have said, I do not think it should have a GUI or a built-in mechanism for capturing user-generated content. IMHO, for user-generated content, just put that data in a database; that’s what it’s meant for.

That said, I think the biggest issue holding Hugo back from greater adoption is that it lacks a feature to generate pages from data sources. If you look at a project like GatsbyJS, which exploded in popularity, it puts a large emphasis on the ability to consume any API to generate pages, and I think that has really helped propel it. (There’s also the fact that it uses React, which is really popular as well, but I think the point stands.)

People are using Gatsby with the Shopify API, Contentful, Magento, WordPress, Firebase, or whatever they feel like. Just make a Gatsby source plugin, or use one someone else made, and you’re good to go.

I think if Hugo added the ability to generate content from an API it could be huge, because let’s face it, Gatsby’s build times are freaking slow. Hugo could generate an eCommerce site with, say, 5,000 products in seconds, while Gatsby would be working forever. I understand Gatsby has a bunch of other features (gatsby-image, it generates a SPA, etc.), and they are leveraging those features very well. Hugo, on the other hand, provides the ability to generate a very large static website very easily and quickly, which, combined with the ability to use multiple data sources (like I’m saying), makes it a very nice alternative.

I’m aware that there’s already an open issue for this feature. However, I’m not sure what the timeline looks like. I do believe a feature like this should be high on the priority list.


Sidenote: I’ve used Hugo with APIs myself on a couple of websites, but it has always required me to convert the API response to markdown files with some custom code. I think most people won’t usually do that.
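
For the record, that custom code does not need to be much. A hypothetical Go sketch (the endpoint and JSON fields are invented for illustration) that converts an API response into ordinary Hugo content files before running `hugo`:

```go
// api2md: a sketch of pre-converting an API response into Hugo
// content files, one markdown file per product. The endpoint and
// the JSON fields (id, name, description) are made up.
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"os"
)

type Product struct {
	ID          string `json:"id"`
	Name        string `json:"name"`
	Description string `json:"description"`
}

func main() {
	resp, err := http.Get("https://api.example.com/products")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	var products []Product
	if err := json.NewDecoder(resp.Body).Decode(&products); err != nil {
		log.Fatal(err)
	}

	if err := os.MkdirAll("content/products", 0o755); err != nil {
		log.Fatal(err)
	}
	for _, p := range products {
		// Minimal front matter plus the body; Hugo treats these as
		// ordinary content files on the next build.
		md := fmt.Sprintf("---\ntitle: %q\n---\n\n%s\n", p.Name, p.Description)
		path := fmt.Sprintf("content/products/%s.md", p.ID)
		if err := os.WriteFile(path, []byte(md), 0o644); err != nil {
			log.Fatal(err)
		}
	}
}
```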

1 Like

Some Additional Thoughts:

I think build time in general is the biggest limitation of using static site generators at scale, and Hugo is probably the closest to overcoming it. Smashing Magazine has shown how Hugo can work for large publications and the like. So if Hugo wants to think long-term, it should consider what kinds of features matter to the dev teams of publications the size of Smashing Magazine or larger. Could something like partial rebuilds help? Generating pages from data sources also lets publications use the CMS of their choice, which is a further incentive as well.

I wasn’t talking about sites which grow. I was talking about sites which stay low-activity and eventually cease to be updated. The point of Hugo in this context is that the tool is not needed to keep the site online. Think decades. Like books, which can remain useful long after the publisher is gone.

1 Like