title: Scraping web articles for AI
description: Scraping web articles has never been this easy. Here are some tools to make it even easier and help feed your AI more data.
author: wompmacho
date: 2026-05-17T12:15:13-04:00
lastmod: 2026-05-17
tags: golang, automation, markdown, gemini-cli, skills

Introduction

Managing knowledge often involves capturing articles and documentation from the web. To streamline this workflow, I developed web-to-markdown, a specialized utility written in Go that extracts article content, converts it to clean Markdown, and downloads inline images locally (structured to complement my Hugo site).

It is well suited to quickly grabbing and sanitizing web content so that agents have more context to work with. Adding this tool as a skill complements the planning stage of a project.

The web-to-markdown Utility

The core utility (git repos) is a highly optimized CLI application built with Go 1.25.0+. It is designed to be fast, reliable, and to produce a clean output structure.

Key Features

  • Boilerplate Removal: The tool leverages the go-readability library to intelligently isolate the main article content, automatically stripping out distracting elements such as advertisements, navigation bars, and footers.
  • Concurrent Image Downloading: The tool downloads all inline images concurrently using goroutines, which significantly reduces the time required to process image-heavy articles.
  • Markdown Conversion: The sanitized HTML is converted into readable Markdown using the html-to-markdown package.
  • Intelligent Output Structure: The utility generates a flat directory structure. It saves the main article as index.md and renames each downloaded image based on its alt text, <figcaption>, or surrounding heading context. The Markdown links are automatically rewritten to point to these local, contextualized image names.
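
The concurrent download step can be sketched with one goroutine per image and a sync.WaitGroup. This is a minimal illustration, not the tool's actual code: fetchAll, fetchOne, fetchResult, and newDemoServer are hypothetical names, and the local httptest server only stands in for a remote image host so the example runs without network access.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sync"
)

// fetchResult holds the outcome of one image download (hypothetical type).
type fetchResult struct {
	URL  string
	Data []byte
	Err  error
}

// fetchOne downloads a single URL and returns its body or an error.
func fetchOne(client *http.Client, u string) fetchResult {
	resp, err := client.Get(u)
	if err != nil {
		return fetchResult{URL: u, Err: err}
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fetchResult{URL: u, Err: fmt.Errorf("unexpected status %s", resp.Status)}
	}
	data, err := io.ReadAll(resp.Body)
	return fetchResult{URL: u, Data: data, Err: err}
}

// fetchAll starts one goroutine per URL and waits for all of them.
// Results come back in the same order as the input slice.
func fetchAll(client *http.Client, urls []string) []fetchResult {
	results := make([]fetchResult, len(urls))
	var wg sync.WaitGroup
	for i, u := range urls {
		wg.Add(1)
		go func(i int, u string) {
			defer wg.Done()
			results[i] = fetchOne(client, u) // each goroutine owns one slot
		}(i, u)
	}
	wg.Wait()
	return results
}

// newDemoServer returns a local server that answers every request with a
// small fake payload. It exists only to make the demo self-contained.
func newDemoServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "bytes for %s", r.URL.Path)
	}))
}

func main() {
	srv := newDemoServer()
	defer srv.Close()

	urls := []string{srv.URL + "/a.jpg", srv.URL + "/b.png"}
	for _, r := range fetchAll(srv.Client(), urls) {
		fmt.Println(r.URL, len(r.Data), r.Err)
	}
}
```

Because each goroutine writes only to its own index in results, no mutex is needed. A production version would likely also cap the number of simultaneous requests.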

Usage Example

The utility is executed via the command line, accepting the target URL and optional flags for customization.

./web-to-markdown -title "Go Concurrency Guide" -out "./docs/go" "https://example.com/go-concurrency"

This command generates a structured output similar to the following:

docs/go/
└── go-concurrency-guide/
    ├── index.md
    ├── diagram-of-goroutines-a1b2c3.jpg
    └── author-profile-pic-f6g7h8.png
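
The names in the tree above suggest a slug-plus-suffix scheme. The following is a rough sketch of how such names could be derived, under stated assumptions: slugify, articleDir, and imageName are hypothetical helpers, and the six-character suffix is assumed here to be the start of a SHA-256 hash of the image URL. The real tool may derive it differently.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"net/url"
	"path"
	"path/filepath"
	"regexp"
	"strings"
)

// nonAlnum matches every run of characters outside a-z and 0-9.
var nonAlnum = regexp.MustCompile(`[^a-z0-9]+`)

// slugify lowercases s and collapses each run of other characters into a
// single hyphen. Non-ASCII letters are treated as separators in this sketch.
func slugify(s string) string {
	return strings.Trim(nonAlnum.ReplaceAllString(strings.ToLower(s), "-"), "-")
}

// articleDir returns the directory that would hold index.md for a given
// -out base directory and -title value.
func articleDir(outDir, title string) string {
	return filepath.Join(outDir, slugify(title))
}

// imageName builds a local file name from an image's context text
// (alt text, figcaption, or nearby heading) and its source URL.
// Assumption: the suffix is the first six hex characters of the URL's SHA-256.
func imageName(context, srcURL string) string {
	sum := sha256.Sum256([]byte(srcURL))
	suffix := hex.EncodeToString(sum[:])[:6]

	// Take the extension from the URL path so query strings are ignored.
	ext := ""
	if u, err := url.Parse(srcURL); err == nil {
		ext = strings.ToLower(path.Ext(u.Path))
	}

	base := slugify(context)
	if base == "" {
		base = "image" // fallback when no usable context text exists
	}
	return base + "-" + suffix + ext
}

func main() {
	fmt.Println(articleDir("./docs/go", "Go Concurrency Guide"))
	fmt.Println(imageName("Diagram of Goroutines", "https://example.com/img/fig1.JPG?w=800"))
}
```

Hashing the URL keeps names stable across runs and avoids collisions when two images share the same alt text.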

Extending the Agent with a Custom Skill

While the CLI tool is powerful on its own, manually executing it interrupts the creative or research flow. To solve this, I developed a custom Gemini CLI skill.

Skills in Gemini CLI are modular packages that inject specialized procedural knowledge and workflows into the agent's context window.

The web-to-markdown Skill

The custom skill instructs the Gemini CLI agent on how and when to use the local web-to-markdown binary.

When a user issues a command like, "Grab me a copy of https://example.com/article," the skill triggers the following automated workflow:

  1. Configuration Gathering: The agent pauses and asks the user where the article should be saved and what the desired title should be.
  2. Execution: Once the parameters are confirmed, the agent autonomously executes the run_shell_command tool, invoking the web-to-markdown utility with the correct flags and URL.
  3. Verification: The agent verifies the success of the command by checking the target directory and informs the user that the Markdown file is ready for review.
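
In practice the agent performs the execution and verification steps through its shell tool. Written out as Go purely for illustration, the equivalent logic might look like the sketch below. buildCommand and verifyArticle are hypothetical helpers. The flags are the ones shown in the usage example, and the check assumes that success means a non-empty index.md exists in the target directory.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
)

// buildCommand assembles the web-to-markdown invocation from the values
// gathered in step 1. It does not run the command.
func buildCommand(binary, title, outDir, articleURL string) *exec.Cmd {
	return exec.Command(binary, "-title", title, "-out", outDir, articleURL)
}

// verifyArticle reports an error unless dir contains a non-empty index.md.
func verifyArticle(dir string) error {
	info, err := os.Stat(filepath.Join(dir, "index.md"))
	if err != nil {
		return fmt.Errorf("index.md not found in %s: %w", dir, err)
	}
	if info.IsDir() || info.Size() == 0 {
		return fmt.Errorf("index.md in %s is not a non-empty file", dir)
	}
	return nil
}

func main() {
	cmd := buildCommand("./web-to-markdown", "Go Concurrency Guide",
		"./docs/go", "https://example.com/go-concurrency")
	fmt.Println(cmd.Args)

	// Demonstrate the check against a temporary directory instead of
	// running the real binary.
	dir, err := os.MkdirTemp("", "article-*")
	if err != nil {
		fmt.Println(err)
		return
	}
	defer os.RemoveAll(dir)

	fmt.Println("empty dir passes:", verifyArticle(dir) == nil)
	if err := os.WriteFile(filepath.Join(dir, "index.md"), []byte("# Title\n"), 0o644); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("dir with index.md passes:", verifyArticle(dir) == nil)
}
```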

Skill Creation Process

The skill was generated using the built-in skill-creator. The process involved:

  1. Initialization: Running the initialization script to scaffold the skill directory structure.
  2. Drafting Instructions: Writing the SKILL.md file, which includes the YAML frontmatter (defining the trigger description) and the step-by-step workflow instructions for the agent.
  3. Packaging and Installation: Bundling the skill into a .skill archive and installing it into the user's ~/.gemini/skills/ directory.