# web-to-markdown

A Go command-line utility that downloads an HTML page, extracts its main article content, converts it to Markdown, and concurrently downloads all inline images to local files.

## Features

- **Boilerplate Removal:** Uses `go-readability` to extract the main article, stripping out ads, navigation bars, and footers.
- **Concurrent Image Downloading:** Uses goroutines to download all images simultaneously rather than one after another.
- **Markdown Conversion:** Uses `html-to-markdown` to generate clean, readable Markdown.
- **Intelligent Flat Output:** Saves the article as `index.md` and renames each image based on its `alt` text or original filename. Images are saved alongside the Markdown file, keeping the directory structure flat.

## Build Requirements

- Go 1.25.0 or later. The version is declared in `go.mod`, so Go 1.21 or newer can download the required toolchain automatically.

To compile the application:

```bash
go build -o web-to-markdown main.go
```

## Usage

Pass the URL of the article to convert as the positional argument, preceded by any optional flags:

```bash
./web-to-markdown [options] "<url>"
```

### Options

- `-title "Custom Title"`: Override the article title parsed from the page. This sets both the `title` field in the Markdown frontmatter and the name of the generated output folder.
- `-out "/path/to/save"`: Set the base output directory in which the article folder is created. Defaults to the current directory (`.`).
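The two options above could be parsed with Go's standard `flag` package, as in this sketch. It is an assumption about the implementation, not the tool's code; `config` and `parseArgs` are hypothetical names. The `-out` default of `.` matches the documented behavior.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"io"
	"os"
)

// config holds the parsed command-line options (hypothetical type).
type config struct {
	Title  string // -title: overrides the parsed article title
	OutDir string // -out: base output directory
	URL    string // the single positional argument
}

// parseArgs parses args (without the program name) into a config.
func parseArgs(args []string) (config, error) {
	var cfg config
	fs := flag.NewFlagSet("web-to-markdown", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // report errors via the return value instead of printing
	fs.StringVar(&cfg.Title, "title", "", "override the parsed article title")
	fs.StringVar(&cfg.OutDir, "out", ".", "base output directory")
	if err := fs.Parse(args); err != nil {
		return cfg, err
	}
	// flag stops at the first non-flag argument, so options must precede the URL.
	if fs.NArg() != 1 {
		return cfg, errors.New(`usage: web-to-markdown [options] "<url>"`)
	}
	cfg.URL = fs.Arg(0)
	return cfg, nil
}

func main() {
	cfg, err := parseArgs(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("%+v\n", cfg)
}
```

If the tool does rely on `flag`, parsing stops at the first non-flag argument, which is why options go before the URL: `./web-to-markdown "<url>" -out docs` would leave `-out` and `docs` as extra positional arguments, which this sketch rejects.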

### Example

```bash
./web-to-markdown -title "Go Concurrency Guide" -out "./my-docs" "https://example.com/interesting-article"
```

## Output Structure

The tool derives a short, filesystem-safe folder name (a slug) from the article title, or from the `-title` value if you pass one. Inside that folder you will find `index.md` alongside the renamed images:

```text
my-docs/
└── go-concurrency-guide/
    ├── index.md
    ├── diagram-of-goroutines-a1b2c3.jpg
    └── author-profile-pic-f6g7h8.png
```