
Free web scraper that converts HTML to markdown using AI. Simple API, LLM-powered content extraction, easy self-hosting.


Xpos587/markfetch


Markfetch ⚡📝

Fast API-only web scraper that converts websites to LLM-ready markdown using AI.

Features 🚀

  • API-only: no web interface, pure API service
  • Convert any website into clean markdown
  • AI-powered content processing with Google Gemini
  • Multi-page crawling (configurable from 1 to 10 pages)
  • Secure API key authentication
  • Fast async browser automation with Camoufox
  • Containerized deployment with images published automatically to GitHub Container Registry

Quick Start

Using Docker (Recommended)

# Pull the image
docker pull ghcr.io/xpos587/markfetch:latest

# Run with environment variables
docker run -d \
  -p 8000:8000 \
  -e GOOGLE_API_KEY=your_google_api_key \
  -e MARKFETCH_API_KEY=your_secret_api_key \
  ghcr.io/xpos587/markfetch:latest

Local Development

# Clone the repository
git clone https://github.com/xpos587/markfetch.git
cd markfetch

# Install dependencies with uv
pip install uv
uv sync

# Run the server
export GOOGLE_API_KEY=your_google_api_key
export MARKFETCH_API_KEY=your_secret_api_key
uv run python -m src.app

API Usage

Base URL

http://localhost:8000/

Authentication

All requests require an API key in the Authorization header:

Authorization: Bearer YOUR_API_KEY

Parameters

  • url (required): Website URL to convert
  • ai_processing (optional, default: false): Enable AI-powered content filtering
  • crawl_pages (optional, default: 1): Number of pages to crawl (1-10)
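Because the target website is itself passed as a query parameter, it should be percent-encoded: an unescaped `&` inside the target URL would otherwise be split off as a separate parameter. The sketch below builds the query string with `urllib.parse.urlencode` and mirrors the documented 1-10 range for `crawl_pages` on the client side; the function name `build_query` is hypothetical.

```python
from urllib.parse import urlencode


def build_query(url: str, ai_processing: bool = False, crawl_pages: int = 1) -> str:
    """Encode Markfetch query parameters (hypothetical client-side helper)."""
    if not url:
        raise ValueError("url is required")
    # Mirror the documented limit so bad input fails locally before any request.
    if not 1 <= crawl_pages <= 10:
        raise ValueError("crawl_pages must be between 1 and 10")
    params = {
        "url": url,  # urlencode percent-encodes ':', '/', '?', '&' and '='
        "ai_processing": "true" if ai_processing else "false",
        "crawl_pages": crawl_pages,
    }
    return urlencode(params)


print("http://localhost:8000/?" + build_query("https://example.com/?a=1&b=2", True, 5))
```

Passing booleans as the strings "true"/"false" matches the curl examples above.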

Examples

Single page (markdown response):

curl -H "Authorization: Bearer YOUR_API_KEY" \
     "http://localhost:8000/?url=https://example.com"

Single page with AI processing:

curl -H "Authorization: Bearer YOUR_API_KEY" \
     "http://localhost:8000/?url=https://example.com&ai_processing=true"

Multiple pages with AI (JSON response):

curl -H "Authorization: Bearer YOUR_API_KEY" \
     "http://localhost:8000/?url=https://github.com&ai_processing=true&crawl_pages=5"

Response Types

  • Single page: plain-text markdown
  • Multiple pages: a JSON array of results, each with url and md fields

JSON Response Example:

[
  {
    "url": "https://example.com",
    "md": "# Example Domain\n\nThis domain is for use in documentation examples..."
  }
]
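Since a single page comes back as plain markdown while multiple pages come back as JSON, a client may want to normalize both into one shape. This sketch assumes the server labels JSON responses with an application/json Content-Type (FastAPI's default for JSON responses) and treats anything else as markdown; the `parse_response` name and wrapping the single-page case in the same {"url", "md"} shape are illustrative choices.

```python
import json


def parse_response(body: str, content_type: str, requested_url: str) -> list[dict[str, str]]:
    """Normalize a Markfetch response into a list of {"url": ..., "md": ...} dicts.

    Assumption: multi-page responses are sent as application/json and
    single-page responses as plain-text markdown.
    """
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "application/json":
        pages = json.loads(body)
        # Keep only the two documented fields of each result object.
        return [{"url": page["url"], "md": page["md"]} for page in pages]
    # Plain markdown: wrap it so callers can handle both cases the same way.
    return [{"url": requested_url, "md": body}]


sample = '[{"url": "https://example.com", "md": "# Example Domain"}]'
for page in parse_response(sample, "application/json", "https://example.com"):
    print(page["url"], len(page["md"]))
```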

Environment Variables

  • GOOGLE_API_KEY: Google Gemini API key (required)
  • GEMINI_API_KEY: Alternative Gemini API key (optional)
  • MARKFETCH_API_KEY: Your API key for authentication (required)
  • PRODUCTION: Set to true to disable hot reload (optional)
  • PORT: Port to listen on (optional, default: 8000)
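Before starting the server or container, it can help to fail early on missing configuration. The hypothetical pre-flight check below encodes only what the list above states: GOOGLE_API_KEY and MARKFETCH_API_KEY are required, and PORT, if set, must be a valid port number. It does not reproduce how Markfetch itself reads these variables.

```python
from collections.abc import Mapping

# Variables the README lists as required.
REQUIRED = ("GOOGLE_API_KEY", "MARKFETCH_API_KEY")


def check_env(env: Mapping[str, str]) -> list[str]:
    """Return a list of configuration problems (empty if none were found)."""
    problems = [f"{name} is not set" for name in REQUIRED if not env.get(name)]
    port = env.get("PORT")
    if port is not None:
        # PORT is optional (default 8000) but must be a TCP port number if given.
        if not port.isdecimal() or not 1 <= int(port) <= 65535:
            problems.append(f"PORT must be an integer in 1-65535, got {port!r}")
    return problems
```

Call it as check_env(os.environ) before launching, and abort if the returned list is non-empty.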

Deployment

Automated Deployment

GitHub Actions automatically builds and publishes containers to GitHub Container Registry.

Registry:

  • ghcr.io/xpos587/markfetch

Tags:

  • latest: Latest main branch build
  • vX.Y.Z: Semantic version tags

Manual Build

# Build image
docker build -f Containerfile -t markfetch .

# Run container
docker run -d \
  --name markfetch \
  -p 8000:8000 \
  --restart unless-stopped \
  -e GOOGLE_API_KEY=$GOOGLE_API_KEY \
  -e MARKFETCH_API_KEY=$MARKFETCH_API_KEY \
  markfetch

Docker Compose

# compose.yaml
services:
  markfetch:
    image: ghcr.io/xpos587/markfetch:latest
    ports:
      - "8000:8000"
    environment:
      - GOOGLE_API_KEY=your_google_api_key
      - MARKFETCH_API_KEY=your_secret_api_key
    restart: unless-stopped

# Start the service
docker compose up -d
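Both docker run -d and docker compose up -d return as soon as the container starts, not when the API is ready to answer. A hypothetical readiness probe in Python could poll the base URL until the server responds. Any HTTP status, including a 401 for a probe sent without a key, counts as "up"; only connection errors and timeouts count as "not yet". The function name and default intervals are assumptions.

```python
import time
import urllib.error
import urllib.request


def wait_until_up(base_url: str, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll base_url until the server sends any HTTP response or timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(base_url, timeout=interval):
                return True
        except urllib.error.HTTPError as err:
            # An error status (e.g. 401 without a key) still proves the server is listening.
            err.close()
            return True
        except (urllib.error.URLError, OSError):
            pass  # connection refused, reset, or timed out: not ready yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Usage: wait_until_up("http://localhost:8000/", timeout=120)
```

HTTPError is caught before URLError because it is a subclass of it.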

Tech Stack

  • Backend: FastAPI with Python 3.12
  • Browser: Camoufox (async Firefox-based)
  • AI: Google Gemini 2.0 Flash
  • Container: Multi-stage Docker build with Camoufox pre-initialization
  • Package Manager: uv

Security

  • API key authentication required for all requests
  • No web interface - API only
  • Header-based authentication: API keys are never passed as URL parameters

Support

Star this repository to support development! ⭐
