web-to-markdown

A Go utility that downloads an HTML page, extracts its main article content, converts it to Markdown, and concurrently downloads every inline image to local disk.

Features

  • Boilerplate Removal: Uses go-readability to extract the main article, stripping out ads, navbars, and footers.
  • Concurrent Image Downloading: Uses goroutines to download all images in parallel rather than one at a time.
  • Markdown Conversion: Uses html-to-markdown to generate clean, readable Markdown.
  • Flat Output: Saves the article as index.md and renames each image based on its alt text or original filename. Images are saved alongside the Markdown file, so the output directory stays flat.
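
The concurrent download step can be pictured with the sketch below. It is not the tool's actual code: the names downloadAll, fetchFunc, and result are hypothetical, and the HTTP call is replaced by an injected function so the example runs without network access. It assumes one goroutine per image and a sync.WaitGroup to wait for all of them.

```go
package main

import (
	"fmt"
	"sync"
)

// fetchFunc is a hypothetical stand-in for the real HTTP download step.
type fetchFunc func(url string) ([]byte, error)

// result pairs an image URL with its downloaded bytes or the error it hit.
type result struct {
	URL  string
	Data []byte
	Err  error
}

// downloadAll starts one goroutine per URL and blocks until all finish.
// Each goroutine writes only to its own slice slot, so no mutex is needed.
func downloadAll(urls []string, fetch fetchFunc) []result {
	results := make([]result, len(urls))
	var wg sync.WaitGroup
	for i, u := range urls {
		wg.Add(1)
		go func(i int, u string) {
			defer wg.Done()
			data, err := fetch(u)
			results[i] = result{URL: u, Data: data, Err: err}
		}(i, u)
	}
	wg.Wait()
	return results
}

func main() {
	// Fake fetcher (assumption) so the sketch runs offline.
	fake := func(url string) ([]byte, error) {
		if url == "" {
			return nil, fmt.Errorf("empty URL")
		}
		return []byte("bytes of " + url), nil
	}
	urls := []string{"https://example.com/a.png", ""}
	for _, r := range downloadAll(urls, fake) {
		fmt.Printf("%q: %d bytes, err=%v\n", r.URL, len(r.Data), r.Err)
	}
}
```

Collecting a per-image error instead of aborting lets one broken image link fail without losing the rest of the article. Whether the tool itself behaves this way is not stated here.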

Build Requirements

  • Go 1.25.0+ (managed through go.mod)

To compile the application:

go build -o web-to-markdown main.go

Usage

Provide the URL of the article you want to convert as the positional argument. You can also pass optional flags.

./web-to-markdown [options] "<url>"

Options

  • -title "Custom Title": Override the title parsed from the article. This affects both the title in the Markdown front matter and the name of the generated output folder.
  • -out "/path/to/save": Set the base directory in which the output folder is created. Defaults to the current directory (.).
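
As an illustration of how these options could be read, here is a minimal sketch using Go's standard flag package. That the tool uses this package is an assumption, and parseArgs and config are hypothetical names. The flag package stops parsing at the first non-flag argument, which matches the usage line above, where options come before the URL.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"io"
)

// config holds the parsed command line (hypothetical type).
type config struct {
	Title string // value of -title, empty if not given
	Out   string // value of -out, "." by default
	URL   string // the single positional argument
}

// parseArgs parses args (without the program name) into a config.
func parseArgs(args []string) (config, error) {
	var cfg config
	fs := flag.NewFlagSet("web-to-markdown", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep the sketch quiet on parse errors
	fs.StringVar(&cfg.Title, "title", "", "override the parsed article title")
	fs.StringVar(&cfg.Out, "out", ".", "base output directory")
	if err := fs.Parse(args); err != nil {
		return cfg, err
	}
	if fs.NArg() != 1 {
		return cfg, errors.New("expected exactly one URL argument")
	}
	cfg.URL = fs.Arg(0)
	return cfg, nil
}

func main() {
	// Fixed arguments so the sketch runs as-is.
	args := []string{
		"-title", "Go Concurrency Guide",
		"-out", "./my-docs",
		"https://example.com/interesting-article",
	}
	cfg, err := parseArgs(args)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```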

Example

./web-to-markdown -title "Go Concurrency Guide" -out "./my-docs" "https://example.com/interesting-article"

Output Structure

The tool derives a short, filesystem-safe folder slug from the article title (or from the -title value). Inside that folder you will find index.md alongside the renamed images:

my-docs/
└── go-concurrency-guide/
    ├── index.md
    ├── diagram-of-goroutines-a1b2c3.jpg
    └── author-profile-pic-f6g7h8.png