2026-05-17 11:51:04 +00:00

web-to-markdown

A Go utility that downloads an HTML page, extracts its main article content, converts it to Markdown, and downloads all inline images concurrently, saving them locally.

Features

  • Boilerplate Removal: Uses go-readability to extract the main article, stripping out ads, navbars, and footers.
  • Concurrent Image Downloading: Uses goroutines to download all images in parallel rather than one at a time.
  • Markdown Conversion: Uses html-to-markdown to generate clean, readable Markdown.
  • Flat Output: Saves the article as index.md and renames each image based on its alt text or original filename. Images are saved alongside the Markdown file, so the output directory stays flat.

Build Requirements

  • Go 1.25.0 or later (the required version is managed through go.mod)

To compile the application:

go build -o web-to-markdown main.go

Usage

Pass the URL of the article you want to convert as the positional argument, preceded by any optional flags.

./web-to-markdown [options] "<url>"

Options

  • -title "Custom Title": Override the default parsed article title. This affects both the title in the Markdown front matter and the name of the generated output folder.
  • -out "/path/to/save": Change the base output directory where the folder will be created. Defaults to the current directory (.).
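
For readers curious how a command line of this shape could be handled, here is a minimal sketch using Go's standard flag package. Whether the tool actually uses that package is an assumption, and the config struct and parseArgs function are hypothetical names. Only the flag names, the default of "." for -out, and the single positional URL come from the description above.

```go
package main

import (
	"flag"
	"fmt"
)

// config holds the parsed command line (hypothetical type for this sketch).
type config struct {
	title string // value of -title, empty when not given
	out   string // value of -out, defaults to "."
	url   string // the single positional argument
}

// parseArgs (hypothetical helper) parses flags followed by exactly one URL.
func parseArgs(args []string) (config, error) {
	var c config
	fs := flag.NewFlagSet("web-to-markdown", flag.ContinueOnError)
	fs.StringVar(&c.title, "title", "", "override the parsed article title")
	fs.StringVar(&c.out, "out", ".", "base output directory")
	if err := fs.Parse(args); err != nil {
		return c, err
	}
	if fs.NArg() != 1 {
		return c, fmt.Errorf("expected exactly one URL argument, got %d", fs.NArg())
	}
	c.url = fs.Arg(0)
	return c, nil
}

func main() {
	c, err := parseArgs([]string{
		"-title", "Go Concurrency Guide",
		"-out", "./my-docs",
		"https://example.com/interesting-article",
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("title=%q out=%q url=%q\n", c.title, c.out, c.url)
}
```

The flag package stops parsing at the first non-flag argument, so under this assumption the options have to come before the URL, which matches the usage line above.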

Example

./web-to-markdown -title "Go Concurrency Guide" -out "./my-docs" "https://example.com/interesting-article"

Output Structure

The tool derives a short, filesystem-safe folder slug from the article title (or from your -title flag). Inside that directory, you will find index.md alongside the renamed images:

my-docs/
└── go-concurrency-guide/
    ├── index.md
    ├── diagram-of-goroutines-a1b2c3.jpg
    └── author-profile-pic-f6g7h8.png