I’ve been using Automatic Content Summaries, and they don’t seem to work quite right, so I looked at the code and found a few issues. I’ll discuss the most serious one here.
For non-CJK languages, a function called TruncateWordsToWholeSentence (in helpers/content.go) is used to split the page content at or soon after the number of words defined by summaryLength. It locates the end of a sentence simply by looking for a full stop (.), exclamation mark (!), question mark (?), newline (\n) or double quote (").
In fact, locating the end of a sentence (in English at least) is notoriously difficult (see, for example, https://stackoverflow.com/questions/4576077/how-can-i-split-a-text-into-sentences) and effectively impossible for this application.
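To illustrate why matching on those characters goes wrong, here is a minimal sketch. It is not Hugo's actual code: naiveSentenceEnd is a hypothetical helper that only mimics the idea of stopping at the first . ! ? \n or " it finds, and the sample strings are made up.

```go
package main

import (
	"fmt"
	"strings"
)

// naiveSentenceEnd is a hypothetical helper, NOT Hugo's implementation.
// It returns the text up to and including the first character that the
// simple approach treats as the end of a sentence.
func naiveSentenceEnd(s string) string {
	// All five terminators are single-byte ASCII, so i+1 is a safe cut point.
	if i := strings.IndexAny(s, ".!?\n\""); i >= 0 {
		return s[:i+1]
	}
	return s
}

func main() {
	// An abbreviation is mistaken for the end of a sentence.
	fmt.Println(naiveSentenceEnd("Dr. Smith paid $3.50 for the book.")) // prints: Dr.
	// An opening quotation mark is mistaken for the end of a sentence.
	fmt.Println(naiveSentenceEnd("He said \"hello\" and left.")) // prints: He said "
}
```

Abbreviations, decimal numbers and quoted speech all contain these characters in the middle of a sentence, which is why the summary can end in odd places.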
I suggest instead that the function should cut off the text after a certain number of words and add an ellipsis (…, HTML entity &hellip;) to indicate that it has done so, which is what other well-known website generation systems do.
This is simple and reliable (and I’ve written some code if anyone else thinks this is a good idea).
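For anyone who wants to see the shape of the idea, a rough sketch might look like the following. This is not the code mentioned above and not a patch against helpers/content.go: truncateWords is a hypothetical name, and the sketch assumes plain text with whitespace-separated words (it does not handle HTML tags, and it collapses runs of whitespace to single spaces).

```go
package main

import (
	"fmt"
	"strings"
)

// truncateWords is a hypothetical helper, not part of Hugo.
// It keeps at most max whitespace-separated words and appends an
// ellipsis only if something was actually cut off. The boolean
// reports whether the text was truncated.
func truncateWords(s string, max int) (string, bool) {
	if max < 0 {
		max = 0
	}
	words := strings.Fields(s) // splits on any run of whitespace
	if len(words) <= max {
		return strings.Join(words, " "), false
	}
	return strings.Join(words[:max], " ") + "…", true
}

func main() {
	summary, truncated := truncateWords("The quick brown fox jumps over the lazy dog", 4)
	fmt.Println(summary, truncated) // prints: The quick brown fox… true

	summary, truncated = truncateWords("Short text", 4)
	fmt.Println(summary, truncated) // prints: Short text false
}
```

Returning the boolean alongside the text would let a template decide whether to show a "read more" link, but that is a design choice for the sketch rather than anything the current code requires.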
It would be a ‘breaking’ change in the sense that it would slightly change the appearance of many websites, but I think it would be a significant improvement.
This doesn’t apply to CJK languages – they are handled by a different function, and I’m not qualified to tell if the function works correctly or not. I also don’t know if an ellipsis is suitable for use with other non-Latin scripts.
What does the panel think?