Hi all, I’m interested in possibly migrating my WordPress blog to Hugo. I’ve had preliminary success migrating my content into Hugo’s structure, but I’ve encountered an issue for which Google yields no solution: some of my posts include emoji and/or Hebrew text in their titles, and so far I can’t get these posts to be successfully served by hugo serve, python -m SimpleHTTPServer, or an old version of httpd running on my personal server.
To elaborate:
I started with
- A WordPress blog with 3,269 posts that I’ve been running since 2004 on various versions of PHP, MySQL, Linux, etc
- Currently running WordPress 4.7 on Ubuntu 14.04.1 LTS with PHP 5.5.9-1ubuntu4.5 and MySQL 14.14 Distrib 5.5.40 (whatever that means)
- My MySQL DB is so old and has been migrated so many times that it’s a wonder it kinda-sorta still works. I would be shocked if it used a sensible character encoding setting, etc. It’s probably a mess.
I did
- Used the Jekyll Exporter WordPress plugin to generate and download my site in Jekyll’s structure
- Discovered via jekyll serve that a bunch of my earlier posts included “invalid” byte sequences (that is, byte sequences that are not valid Unicode)
- Used a rudimentary shell script invoking iconv to remove all the invalid byte sequences
- Used hugo import jekyll to import the site from the Jekyll structure to the Hugo structure
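For anyone who wants to reproduce the cleanup step without my shell script, here is a minimal Python sketch of the same idea. It assumes the files are meant to be UTF-8 and simply drops any bytes that do not decode. The names strip_invalid_utf8 and clean_file are made up for this illustration, and this is an approximation of what the iconv pass did, not the script itself.

```python
from pathlib import Path


def strip_invalid_utf8(data: bytes) -> bytes:
    """Drop every byte that is not part of a valid UTF-8 sequence.

    Assumption: the content is supposed to be UTF-8. Valid multi-byte
    characters (emoji, Hebrew, accented letters) are left untouched.
    """
    return data.decode("utf-8", errors="ignore").encode("utf-8")


def clean_file(path: Path) -> bool:
    """Rewrite one file in place; return True if anything was removed."""
    original = path.read_bytes()
    cleaned = strip_invalid_utf8(original)
    if cleaned != original:
        path.write_bytes(cleaned)
        return True
    return False
```

Something like `for p in Path("_posts").glob("*.md"): clean_file(p)` would apply it to a whole directory; the `_posts` directory name is an assumption about the Jekyll export layout.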
Now I’ve got
- Most of the posts look great, work great.
- The posts with emoji/Hebrew in their titles render correctly in the index
- Except for that … showing up at the end. I have no idea what that is. Many of the posts have that at the end, and it seems to be more common with more recent posts
- Maybe it’s related to the plugin I’ve been using to import tweets
- But I can’t navigate to those pages. I get a 404
- As I wrote above, I tried this with various webservers, using the dynamic hugo serve and also with a published version of the site
- The URL for that post above, for example, is http://localhost:1313/post/🤔-if-one-works-as-a-member-of-a-collaborative-team/
- I mean the URL as generated in the index page
- The path of that post in my Hugo site, as per ls via Bash in my macOS terminal, is content/post/2016-10-19-%f0%9f%a4%94-if-one-works-as-a-member-of-a-collaborative-team.md
- Within that file, as per cat, the line with title reads: title: "\U0001F914 if one works as a member of a collaborative team…"
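To make the details above easier to compare, here is a small Python sketch that puts the three spellings side by side: the literal percent-escapes in the file name, the emoji in the generated URL, and the \U0001F914 escape in the title. It only works on strings copied from this post and does not touch the file system. The helper name decoded_file_name is made up for the illustration, and I am not claiming this explains the 404; it just shows that the file name on disk is spelled differently from the URL in the index.

```python
from urllib.parse import quote, unquote

# Strings copied from the post above.
FILE_NAME = "2016-10-19-%f0%9f%a4%94-if-one-works-as-a-member-of-a-collaborative-team.md"
URL_SLUG = "🤔-if-one-works-as-a-member-of-a-collaborative-team"
TITLE_CHAR = "\U0001F914"  # Python reads this escape as the single character U+1F914


def decoded_file_name(name: str) -> str:
    """Turn literal percent-escapes in a file name back into characters."""
    return unquote(name)  # unquote() decodes the escaped bytes as UTF-8 by default


decoded = decoded_file_name(FILE_NAME)

# After decoding, the file name contains the same character as the title and the URL.
print(TITLE_CHAR in decoded)
print(URL_SLUG in decoded)

# The percent-encoded form of the emoji, as it would appear in an encoded URL.
print(quote(TITLE_CHAR))
```

So the file name stores the emoji as the eleven literal characters %f0%9f%a4%94 plus the rest of the slug, while the link in the index uses the emoji itself.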
So…
I’m kinda out of time writing this up… Although I’m a software developer, I’m out of my depth here: I just don’t know what’s going on or how I might resolve this.
I’d very much appreciate any suggestions!
Thank you!