Show HN: HTML-to-Markdown – convert entire websites to Markdown with Golang/CLI
(github.com) 356 points by JohannesKauf 5 days ago | 48 comments
Hey HN!
I originally built “html-to-markdown” back in 2018 (while still in high school) to handle complex HTML conversions where other libraries struggled.
Now I've released v2, a complete rewrite designed to handle even more edge cases. It can convert entire websites with high accuracy.
Example use: I've used it in my RSS reader to strip HTML down to clean Markdown, similar to the "Reader Mode" in your browser.
It can be used as a Golang package or as a CLI.
Give it a try & tell me what edge cases you encounter!
miki123211 5 days ago
If you need this sort of thing in any other language, there's a free, no-auth, no-api-key-required, no-strings-attached API that can do this at https://jina.ai/reader/
You just fetch a URL like `https://r.jina.ai/https://www.asimov.press/p/mitochondria`, and get a markdown document for the "inner" URL.
I've actually used this and it's not perfect: there are websites (mostly those behind Cloudflare and other such proxies) that it can't handle. But it does 90% of the job, and it's a one-liner in most languages with a decent HTTP requests library.
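For example, here's a minimal Go sketch of that fetch using only the standard library. The helper names `readerURL` and `fetchMarkdown` are made up for illustration, the 30-second timeout is an arbitrary choice, and treating any non-200 status as a failure is my assumption rather than documented behavior of the service.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

// readerPrefix is the endpoint mentioned above; the target URL is appended as-is.
const readerPrefix = "https://r.jina.ai/"

// readerURL (hypothetical helper) builds the URL to request for a given page.
func readerURL(target string) string {
	return readerPrefix + target
}

// fetchMarkdown (hypothetical helper) requests the reader URL and returns the body as text.
func fetchMarkdown(client *http.Client, target string) (string, error) {
	resp, err := client.Get(readerURL(target))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	// Assumption: anything other than 200 OK means the page could not be converted.
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("reader returned %s", resp.Status)
	}

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return string(body), nil
}

func main() {
	client := &http.Client{Timeout: 30 * time.Second}

	md, err := fetchMarkdown(client, "https://www.asimov.press/p/mitochondria")
	if err != nil {
		// Sites behind Cloudflare and similar proxies may end up here.
		fmt.Fprintln(os.Stderr, "fetch failed:", err)
		return
	}
	fmt.Println(md)
}
```

The whole thing boils down to `client.Get(readerPrefix + target)`; the rest is error handling for the sites it can't reach.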