WordCount and TotalWords count a comma as a word

Hi. I thought I’d rather write this here, as it might not be an issue, though I see there have been problems with this. Might be related…

These guys:

// TotalWords returns an int of the total number of words in a given content.
func TotalWords(s string) int {
	return len(strings.Fields(s))
}

// WordCount takes content and returns a map of words and count of each word.
func WordCount(s string) map[string]int {
	m := make(map[string]int)
	for _, f := range strings.Fields(s) {
		m[f]++
	}

	return m
}

Are counting fields. That means they count a standalone comma, or anything else surrounded by whitespace, as a word. So TotalWords returns 3 for a sentence like this: One , Two.
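To make that concrete, here is a minimal sketch that runs TotalWords (copied from above) on that input. The input strings are only illustrative, and the example sentence is taken without its trailing period.

```go
package main

import (
	"fmt"
	"strings"
)

// TotalWords is copied from the snippet quoted in this post.
func TotalWords(s string) int {
	return len(strings.Fields(s))
}

func main() {
	// strings.Fields splits only on Unicode whitespace, so the
	// standalone comma comes back as a field of its own.
	fmt.Printf("%q\n", strings.Fields("One , Two")) // ["One" "," "Two"]
	fmt.Println(TotalWords("One , Two"))            // 3

	// Punctuation attached to a word stays part of that word,
	// so the usual spelling is counted as two fields.
	fmt.Printf("%q\n", strings.Fields("One, Two")) // ["One," "Two"]
	fmt.Println(TotalWords("One, Two"))            // 2
}
```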

Is there a reason for this? Is it because some strings are encoded differently? A different approach could be used for them, like defining words with the Is* functions from https://golang.org/pkg/unicode/.
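A rough sketch of what I mean, assuming a word is any run of letters and digits. The names countWords and isSeparator are made up for illustration and are not part of the existing code; only strings.FieldsFunc, unicode.IsLetter and unicode.IsNumber come from the standard library.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// isSeparator is a hypothetical helper: it reports whether r should
// split words, i.e. whether r is neither a letter nor a number.
func isSeparator(r rune) bool {
	return !unicode.IsLetter(r) && !unicode.IsNumber(r)
}

// countWords is a hypothetical alternative to TotalWords that treats
// every run of letters and digits as one word.
func countWords(s string) int {
	return len(strings.FieldsFunc(s, isSeparator))
}

func main() {
	fmt.Println(countWords("One , Two."))  // 2
	fmt.Println(countWords("One & Two"))   // 2
	fmt.Println(countWords("don't stop")) // 3: the apostrophe splits "don't"
}
```

The last line shows the trade-off: this definition drops standalone punctuation, but it also splits contractions, so it is not obviously more correct.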

Thanks,
Gergely.

Shouldn’t the above be

One, two.

I’m Norwegian, so my English may not be perfect.

Well, yes. :slight_smile: However, what I’m trying to say is that the comma will be counted as a word, which it shouldn’t be.

A better example, then, would be this: One & Two. This will also be counted as 3 words, since the & is counted individually, and the map will look like this:

{"One": 1, "Two": 1, "&": 1}.
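For reference, a small sketch running WordCount (copied from the first post) on that input. I am assuming the input is One & Two without a trailing period; the second call shows what happens if the period is included.

```go
package main

import (
	"fmt"
	"strings"
)

// WordCount is copied from the snippet quoted in the first post.
func WordCount(s string) map[string]int {
	m := make(map[string]int)
	for _, f := range strings.Fields(s) {
		m[f]++
	}

	return m
}

func main() {
	// fmt prints map keys in sorted order, so "&" comes first.
	fmt.Println(WordCount("One & Two")) // map[&:1 One:1 Two:1]

	// With a trailing period, the period sticks to the last word.
	fmt.Println(WordCount("One & Two.")) // map[&:1 One:1 Two.:1]
}
```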

I’m trying to say that you are raising a problem that isn’t really a big one. The word counter is an approximation.

To get an exact count (whatever that is), we would have to spend a lot of time and energy on this, time better spent on more important issues. It isn’t worth it.

Someone might even argue that & (ampersand or “and”) is a word.

Okay then. Also, there are a couple of other functions that try to count words by runes, so I’m guessing these will be deprecated after a while.

No, “words” and runes are not compatible in this discussion. But that is a long discussion. It is partly covered in some GitHub issues.

Yeah, sorry, that was truncatewordsbyrunes. Something unrelated.