Sort slice with accented characters

Does the sort function have a parameter to define the language used for collation?

{{ $s := slice "côte" "cote" "coter" "coté" "cotée" "côté"}}
{{ $sFR := slice "cote" "coté" "côte" "côté" "cotée" "coter" }}
Unsorted: {{ $s }}<br>
Want : {{ $sFR }}<br>
Have: {{ sort $s }}<br>

Unsorted: [côte cote coter coté cotée côté]
Want : [cote coté côte côté cotée coter]
Have: [cote coter coté cotée côte côté]

The Hugo sort func seems to sort on the rune value, which is not the natural French order.

In my Go scripts I use the collate and sort packages, which take the locale into account (the French locale in this case). I don’t see how to pass these arguments to the Hugo sort func.

Edit: Short example in Go:

package main

import (
	"fmt"
	"sort"

	"golang.org/x/text/collate"
	"golang.org/x/text/language"
)

func main() {

	s := []string{"côte", "cote", "coter", "coté", "cotée", "côté"}
	fmt.Println(s)

	cl := collate.New(language.French)
	sort.Slice(s, func(i, j int) bool {
		return cl.CompareString(s[i], s[j]) == -1
	})
	fmt.Println(s)
}

Output:

[côte cote coter coté cotée côté]
[cote coté côte côté cotée coter]

We don’t currently. This is the function used for almost all string sorting (Go’s template package has some map sorting that is a little bit out of control):

When I added that, we did discuss what you describe, but opted for the simpler solution, which was still much better than what we had.

We do a lot of sorting in Hugo, and I’m hesitant to add that kind of sorting for all/default … But we should do something about it.

It would be a nice complement to the handy lang.NumFmt. We could imagine an i18n lang.SortSlice func that takes the locale (a language.xxx language tag) and the sort order as arguments, and possibly also an argument to specify whether the original order of equal elements must be preserved (sort.SliceStable() versus sort.Slice() in Go).


TL;DR

The current Hugo sort func is really fast but does not provide a correct sort for non-ASCII characters (accented strings).

See benchmark results below.

Struct used for the test:

type SortFr struct {
	Order int
	Fr    string
}

buildStruct() just builds a slice of that struct from a French dictionary containing 80864 entries.

ASCII (plain basic sort, rune based - output NOT correct for accented strings)

func sortAscii() {
	s := buildStruct()
	sort.SliceStable(s, func(i, j int) bool {
		return s[i].Fr < s[j].Fr
	})
}

using the collation package, not preserving the original order of equal elements (see the post above)

func sortSlice() {
	s := buildStruct()
	cl := collate.New(language.French)
	sort.Slice(s, func(i, j int) bool {
		return cl.CompareString(s[i].Fr, s[j].Fr) == -1
	})
}

using the collation package, original order of equal elements preserved

func sortSliceStable() {
	s := buildStruct()
	cl := collate.New(language.French)
	sort.SliceStable(s, func(i, j int) bool {
		return cl.CompareString(s[i].Fr, s[j].Fr) == -1
	})
}

using the current Hugo compareFold() func posted above (output NOT correct for accented strings)

func sortSliceHugo() {
	s := buildStruct()
	sort.Slice(s, func(i, j int) bool {
		return compareFold(s[i].Fr, s[j].Fr) == -1
	})
}

benchmark results:

BenchmarkSortAscii-8         	       7	 154803670 ns/op	27932514 B/op	  161801 allocs/op
BenchmarkSortSliceStable-8   	       3	 463536229 ns/op	80057130 B/op	  568987 allocs/op
BenchmarkSortSlice-8         	       4	 327012995 ns/op	87800362 B/op	  629481 allocs/op
BenchmarkSortSliceHugo-8     	      15	  75025927 ns/op	27932520 B/op	  161801 allocs/op
PASS
ok  	jeanluc/csv2json/test	9.520s