Encryption for content

Hi everybody,

I am new to the hugo discourse group, but a long-time hugo user. Today I went digging for a way to encrypt content with hugo, and it was quite a journey, which finally led me to write this forum post in the hope of having a civil discussion with the community.

I read the contributing guide and I read the disclaimer for the lack of working-hours for maintainers. I understand that hugo is completely free thus nothing is due. I want to thank every developer, maintainer, translator and contributor out there and in this forum for the amazing work. This post is meant not to criticize the work of anybody within this great community, but to start a discussion about a feature discussed in issue #7547 and issue #8969.

TL;DR of the issues is the request for a new template encryption function, which would enable users to encrypt content using a pre-shared-key at hugo “website-generation time”.

Now, I get this is a niche use case, but I want to point out that a workaround written by another user, hugo_encryptor, has 114 stars and 24 forks. I know this is small peanuts compared to the number of people hugo is serving, but please hear me out.

To me, it seems very clear that there is a use case and that people need it. They are already using the workaround above, which means launching a python script after hugo has run and sticking with a poorly suited cryptographic algorithm (AES-CBC).

I want to clarify my use case, which is the same one exposed by github user Lednerb in issue #7547. Quoting:

In my case I’m hacking machines on HackTheBox and want to publish some Write-Ups for machines / challenges. To ensure that only users can read that Write-Up / Walkthrough that have already hacked the machine by themselfs, the content should be encrypted with a password / token / flag / hash that is found when the machine was hacked / the challenge was solved.

One way to provide this would be to upload a password-protected PDF or 7z file for downloading, but after a while the HTB machines get “retired” and the walkthroughs can be published without any protection. In this case, it would be possible to re-publish the content.

Github user moorereason proposed a solution in pull request #7605, which, after a short discussion with @bep, was closed by the author of the pull request himself.

In the pull request @bep asks some important questions, quoting:

I think we need to demonstrate “usefulness” here before we add this. The only type of bugs I’m worried about here is what we’ve seen in Outlook (I think) where they have for some odd reason sent encrypted messages as plain text.

“We have test”, you say.

Yes, but I have seen stranger things happen where people comment out code paths and tests to debug something, then forgets to enable it …

I will try to address the points raised by @bep.

First and foremost, I hope I have already demonstrated the “usefulness” (or at least the potential usefulness) in the previous paragraphs: sharing CTF write-ups. I know this is niche, but I am sure that there are many others out there, and adding a tpl function to symmetrically encrypt content would make their day.

It's hard to contest the second point: mistakes are human, and I think it is really hard to write 100% bug-free code. Most of the time code interacts with code written by others and runs in an environment in which other programs are running. CPUs have been running in a non-deterministic fashion for the better part of the last half-century. That being said, I think this argument could be applied to any part of the code base.

I get that with encryption involved, something important is probably at stake, but limiting the capabilities of software for the fear of possible bugs, without any lead on where those may come from feels… strict.

That being said, I looked into current tpl crypto functions and saw that some (such as HMAC) have changed over time, but I think that is not a big deal and coding a working “bug free” implementation would be trivial.

I hope that this thread is well received by the community. As I already stated, it is not meant as criticism but as a possible starting point to talk about this issue, which is very much heartfelt to CTF players :slight_smile:

Thanks for your time and attention,
antipatico

You want to share posts with specific readers only, don't you?

This is the objective of this experiment too:

Maybe you’ll find interesting bits out there

Thanks for the reference, but unfortunately this does not fit my use case, please read the first quotation to understand my scenario.

I read the PR you linked to, the discussion, and your post. I think there are two concerns in terms of a bug-free implementation:

  1. The initial implementation. This is probably the easy part and least likely to have problems.
  2. Ongoing maintenance of the code. A developer or contributor whose main focus is not this section of code has to deal with something that affects or requires alteration to it (e.g. changes to HMAC functions, or generic interfaces) and makes a mistake at that time. Over the lifetime of the code this is significant, and most people view a crypto function as offering some type of promise around the privacy of what is encrypted.
    It is therefore much riskier than code that generates known open results.

As for usefulness; I’m not sure a niche case or small collection of niche cases justifies the risk. Would it be ‘handy’ at times? Probably; if it worked correctly. Is that sufficient for the risk associated with the human tendency to make mistakes? I remain to be convinced.

I get your points and I think they are valid, but I think that by leveraging the NaCl secretbox package from Go's extended library (golang.org/x/crypto), the implementation would be so trivial that little to no maintenance would be required.

The implementation proposed in the pull request is way too complicated, and the patch can be narrowed down to two functions. An example implementation follows. Please be gentle while reviewing, since I am not an actual hugo contributor and my code may stink:

package secretbox

import (
	"crypto/rand"
	"fmt"
	"io"

	"github.com/spf13/cast"
	"golang.org/x/crypto/argon2"
	"golang.org/x/crypto/nacl/secretbox"
)

// The salt used by the argon2id Key Derivation Function.
// Optimally it is unique to each website (maybe we could use the domain?).
// Even a constant one is fine for the purpose of symmetric key encryption.
const salt string = "saltypeanuts"

// The following are cryptographic constants or variables, which are set to the
// values suggested by cryptographers as of October 2022.

// CPU cost variable of the argon2id Key Derivation Function.
const time uint32 = 1

// Memory usage variable of the argon2id KDF, 64 MB.
const memory uint32 = 64 * 1024

// Parallelism variable of the argon2id KDF, 4 threads.
const threads uint8 = 4

// Key size of the XSalsa20 cipher used by NaCl secretbox, in bytes (constant).
const keySize uint32 = 32

// Nonce size of the XSalsa20 cipher used by NaCl secretbox, in bytes (constant).
const nonceSize uint32 = 24

// New returns a new instance of the secretbox-namespaced template functions.
func New() *Namespace {
	return &Namespace{}
}

// Namespace provides template functions for the "secretbox" namespace.
type Namespace struct{}

// Seal encrypts small messages using the NaCl secretbox framework. The password
// is used to generate a strong cryptographic key using argon2id. A 24-byte
// random nonce is prepended to the resulting output.
func (ns *Namespace) Seal(password, message any) (string, error) {
	passwordString, err := cast.ToStringE(password)
	if err != nil {
		return "", fmt.Errorf("failed to convert password to string: %w", err)
	}
	messageString, err := cast.ToStringE(message)
	if err != nil {
		return "", fmt.Errorf("failed to convert message to string: %w", err)
	}

	// We need to derive a strong cryptographic key from the user provided
	// password. We leverage well known cryptographic KDF argon2id.
	var key [keySize]byte
	derivedKey := argon2.IDKey([]byte(passwordString), []byte(salt), time, memory, threads, keySize)
	copy(key[:], derivedKey)

	// To preserve the uniqueness of each message and to have sound
	// cryptography, we generate a completely random nonce for each box.
	var nonce [nonceSize]byte
	if _, err = io.ReadFull(rand.Reader, nonce[:]); err != nil {
		return "", fmt.Errorf("failed to generate random nonce: %w", err)
	}

	ciphertext := secretbox.Seal(nonce[:], []byte(messageString), &nonce, &key)
	return string(ciphertext), nil
}

// Open decrypts and authenticates small messages using the NaCl secretbox
// framework. This implementation expects to find the 24-byte nonce at the
// start of the box content. The password is used to generate a strong
// cryptographic key using argon2id.
func (ns *Namespace) Open(password, ciphertext any) (string, error) {
	passwordString, err := cast.ToStringE(password)
	if err != nil {
		return "", fmt.Errorf("failed to convert password to string: %w", err)
	}
	ciphertextString, err := cast.ToStringE(ciphertext)
	if err != nil {
		return "", fmt.Errorf("failed to convert ciphertext to string: %w", err)
	}

	// Key derivation.
	// We must use the same parameters used during sealing: if the user
	// supplied password matches the one given at encryption-time, the derived
	// keys will also match.
	var key [keySize]byte
	derivedKey := argon2.IDKey([]byte(passwordString), []byte(salt), time, memory, threads, keySize)
	copy(key[:], derivedKey)

	// The Seal function wrapper prepends the nonce to the ciphertext. We
	// extract it and use it as is. There is no need to "protect" or
	// obfuscate the nonce, since it can be public.
	// Reject inputs that are too short to contain a nonce; slicing them
	// below would otherwise panic.
	if len(ciphertextString) < int(nonceSize) {
		return "", fmt.Errorf("failed to open sealed secretbox: ciphertext is shorter than the nonce")
	}
	var nonce [nonceSize]byte
	copy(nonce[:], []byte(ciphertextString))

	cleartext, success := secretbox.Open(nil, []byte(ciphertextString[nonceSize:]), &nonce, &key)
	if !success {
		return "", fmt.Errorf("failed to open sealed secretbox: bad password / ciphertext combination")
	}
	return string(cleartext), nil
}

This is better than the proposed implementation in two ways:

  1. The nonce is randomly generated and there is no way the user can input a bad nonce, limiting user error such as nonce re-usage.
  2. It uses a strong cryptographic key derived from the user password instead of using the password as a cryptographic key.

I know this does not in any way solve the second point of your post; it is just meant to show how simple an actual working implementation is. I know it can look nasty, especially with all those hard-coded constants, but that is just the way this is supposed to be. Those constants will have to be changed over time, as our computational power and knowledge of cryptography increase.

Moreover, there is no need to implement the HexEncode and the HexDecode functions, base64 can be used to encode and decode the resulting ciphertext.
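To illustrate the base64 point, here is a minimal sketch using only Go's standard library. The helper names encodeBox and decodeBox are made up for this example and are not part of hugo or of the pull request; they just show how the raw bytes returned by a Seal-style function could be turned into printable text and back before being handed to an Open-style function.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// encodeBox (hypothetical helper) turns the raw bytes returned by a
// Seal-style function into printable text that is safe to embed in a page.
func encodeBox(box string) string {
	return base64.StdEncoding.EncodeToString([]byte(box))
}

// decodeBox (hypothetical helper) reverses encodeBox and returns the raw
// bytes as a string, ready to be passed to an Open-style function.
func decodeBox(encoded string) (string, error) {
	raw, err := base64.StdEncoding.DecodeString(encoded)
	if err != nil {
		return "", fmt.Errorf("failed to decode base64 box: %w", err)
	}
	return string(raw), nil
}

func main() {
	// Stand-in for real ciphertext: arbitrary bytes, some non-printable.
	box := string([]byte{0x00, 0xff, 0x10, 0x80})

	encoded := encodeBox(box)
	decoded, err := decodeBox(encoded)
	if err != nil {
		panic(err)
	}
	fmt.Println("encoded:", encoded)
	fmt.Println("round trip ok:", decoded == box)
}
```

In a template, the same role could presumably be played by hugo's existing base64 functions, so no extra encoding helpers would need to ship with the feature.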

I get that this is neither the first priority nor the simplest code to audit; I am just trying to add a functionality I desire to a piece of software I really like. If this is not meant to be, I completely get it and I will move on. I am still grateful to the whole community for the amazing free software that is being produced here.

This is a neat idea, but it should be a new Go application that goes through an output folder and encrypts in place, not a Hugo feature. Part of the new app is a JavaScript library to make it work, which doesn’t really fit for Hugo. You can use Go’s HTML parser to find tags like <template data-encrypt> and replace it and its contents with <script type="encrypted/html">. It’s a cool idea, but not very Hugo.

I hear you, but the idea is to just port the encryption / decryption functions and leave the javascript out.

I guess that having post-run hook scripts would be a fair tradeoff too, but as somebody already pointed out, it seems there is no support for them either.

It’s interesting that the question came from a user from the hugo-encryptor project.

If there’s no JS, it’s not very useful to end users. It doesn’t have to be a full thing, just the decoder.

I agree though that a post-run hook would be useful, e.g., for https://pagefind.app. There’s a post-run CSS purge step now, but it’s sort of ad hoc. It could be more general.

Once the encryption is done at website generation time, any js library could be used to decrypt the content on the client side, or to send it to a microservice that decrypts it for the user.

I guess multiple ciphers may be supported, but for now I would limit it to a single cipher and KDF.