Need to convert HTML to Markdown

I’m looking to convert an old hand-coded HTML site to Hugo. The problem is that it’ll take forever to manually convert the pages from HTML to Markdown. Is there a tool I can use to do this? The only thing I want to keep from the old pages is the content of the and tags. Any ideas, or am I in for a long haul?

This converter should do the job.

Pandoc can do it too. See example 12.

@johnblood
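Assuming you have Homebrew installed:

  1. brew install pandoc
  2. cd into directory/with/html/files
  3. Run this loop:

for f in *.html
do
  # Strip the extension to build the output name
  filename="${f%.*}"
  echo "Converting $f to $filename.md"
  # Run pandoc directly (no backticks) and quote names so spaces are safe
  pandoc "$f" -f html -t markdown -o "$filename.md"
done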

Hope that helps. Cheers.

On Windows 7 I use pandoc and this .bat file:

html2mark4folder.bat
for /r %1 %%f in (*.html) do (pandoc --wrap=none -s -f html -w markdown_mmd+yaml_metadata_block-raw_html -o "%%~dpnf.md" "%%f")

This file should be placed in the folder "C:\Users\USERNAME\AppData\Roaming\Microsoft\Windows\SendTo".
After that, you can right-click any folder and send it to html2mark4folder.bat to convert all the HTML files in that folder and its subfolders.

Unfortunately, I have Win7 installed.

Brett Terpstra has a nice online converter at http://fuckyeahmarkdown.com

Nice. I gave it a try and it looks great. Thanks.

@johnblood No worries. You can install and use pandoc on Windows as well:

http://pandoc.org/getting-started.html

Unfortunately, I won’t be of much use with PowerShell or the Windows command line, but it looks like @Mikhail has got that covered for you. Terpstra’s tool seems really cool too, but if you have a ton of pages you might want to run a script over them recursively. That, or just wait until you can run bash on Windows :wink: .
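
For the recursive case, here is a minimal sketch that walks a directory tree and runs the same pandoc command on every .html file it finds. It assumes a POSIX shell (Git Bash should do) and pandoc on the PATH; `convert_tree` is just a hypothetical helper name, not something pandoc provides.

```shell
#!/bin/sh
# Sketch: convert every .html file under a directory tree to Markdown.
# convert_tree is a hypothetical helper name, not part of pandoc.
convert_tree() {
  root="${1:-.}"
  # find hands the file names to the inner shell as arguments,
  # so names containing spaces are passed through intact.
  find "$root" -type f -name '*.html' -exec sh -c '
    for f do
      out="${f%.html}.md"
      echo "Converting $f to $out"
      pandoc "$f" -f html -t markdown -o "$out"
    done
  ' sh {} +
}

# Usage: convert_tree path/to/old/site
```

Each .md file is written next to its .html source, so you may want to try it on a copy of the site first.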

It will be a cold day in H E double hockey stick before I upgrade to Win10. If I have to I’ll switch to Manjaro Linux first. I use Bash on Win7 with Git Bash. Better than sliced bread.

Doesn’t work yet :frowning: . The timer_create syscall handler has not been implemented yet.

pandoc: timer_create: Invalid argument

My philosophy is why risk goofing up a Windows install that works (those licences cost $$$$$$$$$$$$$$$$$$$$$$$$$) when you can buy a cheap laptop and install Manjaro.

I tried to use the .bat file, but it did not work. When I ran it, the command prompt appeared briefly and then disappeared. When I tried to convert a test file from the command prompt (pandoc -s earlyyears.html -o earlyyears.md), I got this error:

pandoc.exe: Cannot decode byte '\xa9': Data.Text.Internal.Encoding.Fusion.stream Utf8: Invalid UTF-8 stream.

Any ideas?

http://tex.stackexchange.com/questions/97843/how-to-interpret-message-invalid-utf-8-stream-when-trying-to-convert-a-tex-fil
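
The byte '\xa9' in that error is the copyright sign in Latin-1 and Windows-1252, which suggests the page was saved in one of those encodings rather than UTF-8, and pandoc expects UTF-8 input. A minimal sketch of re-encoding first with iconv and piping the result into pandoc follows. The Windows-1252 source encoding is an assumption (check what your editor saved the files as), and `to_utf8` and `html_to_md` are hypothetical helper names.

```shell
#!/bin/sh
# Re-encode stdin to UTF-8 on stdout.
# Assumption: the old pages were saved as Windows-1252 (CP1252).
to_utf8() {
  iconv -f CP1252 -t UTF-8
}

# html_to_md INPUT.html OUTPUT.md
# pandoc reads from stdin when no input file is given.
html_to_md() {
  to_utf8 < "$1" | pandoc -f html -t markdown -o "$2"
}

# Usage: html_to_md earlyyears.html earlyyears.md
```

If the files turn out to be in some other encoding, only the -f argument to iconv should need to change.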

Thanks for this. Extremely useful :slight_smile:

See also: Useful cmd for Windows users. Convert page.md to page/index.md and back; slug to filename, etc.
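
That thread is not reproduced here. As a rough idea of the page.md to page/index.md direction only, here is a sketch for a POSIX shell such as Git Bash. `to_bundles` is a hypothetical name and the logic is my own assumption, not taken from the linked thread.

```shell
#!/bin/sh
# Sketch: move DIR/page.md to DIR/page/index.md for each Markdown file in DIR.
# to_bundles is a hypothetical helper name.
to_bundles() {
  dir="${1:-.}"
  for f in "$dir"/*.md; do
    [ -e "$f" ] || continue            # no .md files matched
    name=$(basename "$f" .md)
    # Leave index.md and _index.md where they are
    case "$name" in index|_index) continue ;; esac
    # Do not overwrite a bundle that already exists
    if [ -e "$dir/$name/index.md" ]; then
      echo "Skipping $f: $dir/$name/index.md already exists"
      continue
    fi
    mkdir -p "$dir/$name"
    mv "$f" "$dir/$name/index.md"
  done
}

# Usage: to_bundles content/posts
```

It only handles one directory level, which keeps it easy to check by eye before running it on real content.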