commit 2a23ac883ebd12558fba0232a31ee04f0e03a47f
Author: wompmacho
Date:   Sun May 17 11:51:04 2026 +0000

    initial commit

diff --git a/README.md b/README.md
new file mode 100644
index 0000000..b0a7751
--- /dev/null
+++ b/README.md
@@ -0,0 +1,50 @@
+# web-to-markdown
+
+A highly efficient Golang utility that downloads an HTML page, extracts its main article content, converts it to Markdown, and concurrently downloads all inline images locally.
+
+## Features
+
+- **Boilerplate Removal:** Uses `go-readability` to extract the main article, stripping out ads, navbars, and footers.
+- **Concurrent Image Downloading:** Uses native Goroutines to download all images simultaneously for maximum efficiency.
+- **Markdown Conversion:** Uses `html-to-markdown` to generate clean, readable Markdown.
+- **Intelligent Flat Output:** Saves the article as `index.md` and intelligently renames all images based on their `alt` text or original filenames. Images are saved alongside the markdown file for a clean, flat directory structure.
+
+## Build Requirements
+
+- Go 1.25.0+ (Will be managed by `go mod`)
+
+To compile the application:
+```bash
+go build -o web-to-markdown main.go
+```
+
+## Usage
+
+Provide the URL of the article you want to convert as the positional argument. You can also pass optional flags.
+
+```bash
+./web-to-markdown [options] ""
+```
+
+### Options
+
+- `-title "Custom Title"`: Override the default parsed article title. This affects both the `title` in the markdown frontmatter and the generated output folder name.
+- `-out "/path/to/save"`: Change the base output directory where the folder will be created. Defaults to the current directory (`.`).
+
+### Example
+
+```bash
+./web-to-markdown -title "Go Concurrency Guide" -out "./my-docs" "https://example.com/interesting-article"
+```
+
+## Output Structure
+
+The tool will use a brief version of the title (or your `-title` flag) to create a safe, short folder slug. Inside that directory, you will find `index.md` alongside all intelligently named images:
+
+```text
+my-docs/
+└── go-concurrency-guide/
+    ├── index.md
+    ├── diagram-of-goroutines-a1b2c3.jpg
+    └── author-profile-pic-f6g7h8.png
+```
\ No newline at end of file
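
The commit adds only the README, so the implementation itself is not visible here. As a rough sketch of two steps the README describes (deriving a folder slug from the title, and downloading images concurrently with goroutines), the approach might look something like the following. The names `slugify`, `downloadAll`, and `fetch`, the `image-N` file naming, and the hard-coded `my-docs` directory are all hypothetical and do not come from the repository; the real tool may truncate slugs and name images differently.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"path"
	"path/filepath"
	"regexp"
	"strings"
	"sync"
)

// nonAlnum matches every run of characters that is not a lowercase letter or digit.
var nonAlnum = regexp.MustCompile(`[^a-z0-9]+`)

// slugify (hypothetical helper) lowercases s, replaces each run of
// non-alphanumeric characters with one hyphen, and trims edge hyphens.
func slugify(s string) string {
	return strings.Trim(nonAlnum.ReplaceAllString(strings.ToLower(s), "-"), "-")
}

// fetch (hypothetical helper) downloads url and writes the body to dest.
func fetch(url, dest string) error {
	resp, err := http.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("GET %s: %s", url, resp.Status)
	}
	f, err := os.Create(dest)
	if err != nil {
		return err
	}
	defer f.Close()
	_, err = io.Copy(f, resp.Body)
	return err
}

// downloadAll (hypothetical helper) starts one goroutine per image and waits
// for all of them. images maps a source URL to its target file name inside
// dir. The result holds an entry for each URL that failed.
func downloadAll(dir string, images map[string]string) map[string]error {
	var (
		wg   sync.WaitGroup
		mu   sync.Mutex // guards errs, which several goroutines may write to
		errs = make(map[string]error)
	)
	for url, name := range images {
		wg.Add(1)
		go func(url, name string) {
			defer wg.Done()
			if err := fetch(url, filepath.Join(dir, name)); err != nil {
				mu.Lock()
				errs[url] = err
				mu.Unlock()
			}
		}(url, name)
	}
	wg.Wait()
	return errs
}

func main() {
	urls := os.Args[1:]
	if len(urls) == 0 {
		fmt.Println("usage: sketch <image-url>...")
		return
	}
	// Assumed layout: <base>/<slug>/, mirroring the README's example tree.
	dir := filepath.Join("my-docs", slugify("Go Concurrency Guide"))
	if err := os.MkdirAll(dir, 0o755); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	images := make(map[string]string, len(urls))
	for i, u := range urls {
		// Placeholder naming; the README says the tool uses alt text instead.
		images[u] = fmt.Sprintf("image-%d%s", i, path.Ext(u))
	}
	for u, err := range downloadAll(dir, images) {
		fmt.Fprintf(os.Stderr, "%s: %v\n", u, err)
	}
}
```

One goroutine per image keeps the sketch short. For pages with many images, a bounded worker pool or a semaphore would be a reasonable alternative to avoid opening too many connections at once.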