I’m looking to convert an old hand-coded HTML site to Hugo. The problem is that it’ll take forever to manually convert the pages from HTML to Markdown. Is there a tool I can use to do this? The only thing I want to keep from the old pages is the content of the and tags. Any ideas, or am I in for a long haul?
This converter should do the job.
Pandoc can do it too. See example 12.
@johnblood
Assuming you have homebrew installed:
brew install pandoc
cd directory/with/html/files
for f in *.html
do
  filename="${f%.*}"
  echo "Converting $f to $filename.md"
  pandoc "$f" -t markdown -o "$filename.md"
done
Hope that helps. Cheers.
On Windows 7 I use pandoc
with this .bat file:
html2mark4folder.bat
for /r "%~1" %%f in (*.html) do (pandoc --wrap=none -s -f html -w markdown_mmd+yaml_metadata_block-raw_html -o "%%~dpnf.md" "%%f")
This file should be placed in the folder "C:\Users\USERNAME\AppData\Roaming\Microsoft\Windows\SendTo".
After that, you can right-click any folder and send it
to html2mark4folder.bat to convert all the HTML files in that folder.
Unfortunately, I have Win7 installed.
Brett Terpstra has a nice online converter at http://fuckyeahmarkdown.com
Nice. I gave it a try and it looks great. Thanks.
@johnblood No worries. You can install and use pandoc on Windows as well:
http://pandoc.org/getting-started.html
Unfortunately, I won’t be of much use with PowerShell or the Windows command line, but it looks like @Mikhail has got that covered for you. Terpstra’s tool seems really cool too, but if you have a ton of pages you might want to hit this with a script recursively. That or just wait until you can run bash on Windows.
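For the recursive case, here is a rough sketch of how it could be scripted. The function name convert_tree and the PANDOC override variable are made up for illustration; they are not part of pandoc. It assumes pandoc is on your PATH and writes each .md file next to its .html source.

```shell
#!/bin/sh
# Sketch: convert every .html file under a directory tree to Markdown.
# convert_tree and PANDOC are illustrative names, not pandoc features.

convert_tree() {
  # $1 is the root directory to walk; defaults to the current directory.
  # PANDOC may be set to an alternative pandoc binary; defaults to "pandoc".
  PANDOC="${PANDOC:-pandoc}" find "${1:-.}" -type f -name '*.html' -exec sh -c '
    for f do
      out="${f%.html}.md"
      echo "Converting $f to $out"
      "$PANDOC" -f html -t markdown -o "$out" "$f"
    done
  ' sh {} +
}
```

Calling convert_tree path/to/site walks the whole tree. Handing the file names to find -exec, rather than looping over the output of ls, keeps names with spaces intact.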
It will be a cold day in H E double hockey stick before I upgrade to Win10. If I have to I’ll switch to Manjaro Linux first. I use Bash on Win7 with Git Bash. Better than sliced bread.
Doesn’t work yet. The timer_create syscall handler has not been implemented yet.
pandoc: timer_create: Invalid argument
My philosophy is why risk goofing up a Windows install that works (those licences cost $$$$$$$$$$$$$$$$$$$$$$$$$) when you can buy a cheap laptop and install Manjaro.
I tried to use the .bat file, but it did not work. When I ran it, the command prompt appeared briefly and then disappeared. When I tried to convert a test file from the command prompt (pandoc -s earlyyears.html -o earlyyears.md), I got this error:
pandoc.exe: Cannot decode byte '\xa9': Data.Text.Internal.Encoding.Fusion.stream Utf8: Invalid UTF-8 stream
Any ideas?
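That error usually means the file is not saved as UTF-8, which is what pandoc expects for its input. Byte 0xA9 is the copyright sign in Latin-1 and Windows-1252, so the page is probably in one of those encodings. A possible workaround, sketched below, is to re-encode with iconv before handing the file to pandoc. The helper name html_to_utf8 is made up, and CP1252 is only a guess at the real encoding. Since you already have Git Bash, iconv should be available there.

```shell
#!/bin/sh
# Sketch: re-encode a legacy HTML file to UTF-8 on stdout.
# html_to_utf8 is an illustrative helper name.
# $1 is the input file; $2 is the source encoding (CP1252 is an assumed default).

html_to_utf8() {
  iconv -f "${2:-CP1252}" -t UTF-8 "$1"
}
```

Then something like html_to_utf8 earlyyears.html | pandoc -s -f html -t markdown -o earlyyears.md should get past the decoding error, since pandoc reads from standard input when no input file is given.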
Thanks for this. Extremely useful