[RFC] Alpha WordPress Conversion Tool

After far more work than I intended, instead of just a quick and dirty modification of exitwp-to-hugo to use Python 3, I’ve written a completely new Python script called wpxr-to-static.

I am pleased to announce that it’s ready for eyeballs and initial comments, although not for regular use, especially by less technical users (hence the alpha moniker).

I hope that some folks will find the current state interesting enough to look at. I will be continuing to work on it, along with my theme and debug stuff for Hugo, over the coming month.

In any event you can find it at:

or

There is no documentation except the source and config file comments at the moment, but that will be improved soon.

[EDIT]: Two additional notes:

  1. It seems I managed to introduce a couple of whitespace errors between my last test and the initial release; that has been fixed now. Such are the problems with trying to finalize things when falling asleep at the keyboard…
  2. The results are almost usable as an initial converted version of the site, just by adding a theme, but the site is currently missing the homepage. I’m wondering if I should add some kind of root _index.md so that most themes will work correctly. What do folks think?

Great work! I used (adapted) the exit-wp script when I brought a website over from WordPress to Hugo a couple of years ago. For my purposes, what I wanted was the content (posts and pages in WordPress speak) to come across with whatever metadata was useful. I expected that I’d need to spend time setting up the homepage, as you normally need to do when just changing a theme in WordPress.


Thank you for reminding me of that. Hugo themes seem to have their own homepage ‘quirks’ as well, so I’m not going to spend a lot of time building something. I think I will do an _index.md that has links to all the top-level sections, just so there is something there. I think it won’t make things worse, and for some themes it’ll mean that one can spin up a server to get a general sense of whether there are any major issues with the first pass.
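To make the idea concrete, here is a minimal sketch of generating such a root _index.md. It is not code from wpxr-to-static: the function names, the YAML front matter, and the assumption that every directory directly under the content directory is a top-level section are all mine, for illustration only.

```python
from pathlib import Path


def build_root_index(content_dir: Path, title: str = "Home") -> str:
    """Return the text of a root _index.md that links to each top-level section.

    Assumption: every directory directly under content_dir is a section.
    """
    sections = sorted(p.name for p in content_dir.iterdir() if p.is_dir())
    lines = ["---", f'title: "{title}"', "---", ""]
    # One Markdown list item per section, linking to the section's list page.
    lines += [f"- [{name}](/{name}/)" for name in sections]
    return "\n".join(lines) + "\n"


def write_root_index(content_dir: Path, title: str = "Home") -> Path:
    """Write the generated index to content_dir/_index.md and return its path."""
    target = content_dir / "_index.md"
    target.write_text(build_root_index(content_dir, title), encoding="utf-8")
    return target
```

Calling `write_root_index(Path("content"))` after a conversion would leave a bare list of section links. The root-relative links assume the site is served from the root of its domain; whether a given theme renders the page body at all still depends on the theme.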

Before that, though, comes getting the image URLs dealt with. In my case I prefer to copy the images from the server as part of my site backup, rather than downloading them from a site that still exists. So my initial image work will be based on two scenarios: one where the images can be copied from a local filesystem, and one where Hugo’s directories don’t have the images, but the absolute URL from the WPXR (export XML) is just rewritten according to some regex.
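The two scenarios could be sketched roughly as follows. This is an illustration under stated assumptions, not the tool’s actual implementation: the function names, the `/wp-content/uploads/` marker, and the example host are hypothetical choices of mine.

```python
import re
import shutil
from pathlib import Path
from typing import Optional
from urllib.parse import urlsplit

# Assumption: uploaded media live under the usual WordPress uploads path.
UPLOADS_MARKER = "/wp-content/uploads/"


def rewrite_image_urls(text: str, pattern: str, replacement: str) -> str:
    """Scenario 2: rewrite absolute image URLs in converted content via a regex."""
    return re.sub(pattern, replacement, text)


def relative_upload_path(url: str) -> Optional[str]:
    """Return the part of the URL path after the uploads marker, or None."""
    _, found, rest = urlsplit(url).path.partition(UPLOADS_MARKER)
    return rest if found and rest else None


def copy_from_backup(url: str, backup_uploads: Path, dest_root: Path) -> Optional[Path]:
    """Scenario 1: copy the file behind `url` from a local backup of the uploads.

    Returns the destination path, or None if the URL is not an upload URL
    or the file is missing from the backup.
    """
    rel = relative_upload_path(url)
    if rel is None:
        return None
    src = backup_uploads / rel
    if not src.is_file():
        return None
    dest = dest_root / rel
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dest)  # copy2 also preserves file timestamps
    return dest
```

For example, `rewrite_image_urls(body, r"https?://old\.example\.com/wp-content/uploads/", "/images/")` would turn an absolute link into a site-relative one, while `copy_from_backup` would populate something like Hugo’s `static/images` directory from the backup.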

Ironically, because my old WP sites still exist, the conversion I’ve got so far results in pages with properly loading images (from the live site). I have discovered, though, that the WP data has signs of damage (visible in the XML, although the XML is valid, so the problem is probably the DB), so I’ll have to do some manual fixing of the pages and posts. That was not unexpected, and WP misbehaviour was one of the reasons for my wanting to switch back to Hugo.

I’ve debated converting the ‘Media Library’ pages (items with the attachment post_type) for the various media, but I think that’s overkill for my needs. I’m not trying to imitate the WP site, just get the important bits for having a good Hugo site :sunny:

EDIT: I did decide the download was not so much overkill, and will soon release a version with full image handling, including the option of either downloading the image files or copying them from local paths.
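For anyone curious what picking out the attachments looks like, here is a small sketch using only the standard library. It is not taken from wpxr-to-static. It assumes the export declares the `wp` namespace as `http://wordpress.org/export/1.2/` (the version in the URI varies between exports, so check the root element of your own file) and that each attachment item carries a `wp:attachment_url` element; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

# Assumption: the namespace URI must match the one declared in your export file.
DEFAULT_WP_NS = "http://wordpress.org/export/1.2/"


def attachment_urls(export_xml: str, wp_ns: str = DEFAULT_WP_NS) -> List[str]:
    """Return the URLs of all items whose wp:post_type is 'attachment'."""
    ns = {"wp": wp_ns}
    root = ET.fromstring(export_xml)
    urls = []
    for item in root.iter("item"):
        post_type = item.findtext("wp:post_type", default="", namespaces=ns)
        if post_type.strip() != "attachment":
            continue  # posts, pages, nav menu items, and so on
        url = item.findtext("wp:attachment_url", default="", namespaces=ns).strip()
        if url:
            urls.append(url)
    return urls
```

The resulting list is what a download or local-copy step would iterate over; everything else about the attachment pages can be ignored if only the files matter.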

I think I’ve got things ready for a more official release now, though I need other testers before I declare it a done deal. The tag below

should be able to become a release tag (probably as 0.2.0-beta.1). I’ve got things to the point where I can do a conversion (including downloads, if the server permits urllib3), add a theme, and have a site that only needs tweaks for unavoidable minor issues (like not having a homepage).

I will add docs before tagging as a release version, but right now I have some sites to switch away from WordPress…

