Fast API-only web scraper that converts websites to LLM-ready markdown using AI.
- API-only - No web interface, pure API service
- Convert any website into clean markdown
- AI-powered content processing with Google Gemini
- Crawl multiple pages per request (configurable, 1-10)
- Secure API key authentication
- Fast async browser automation with Camoufox
- Containerized deployment with automatic GitHub publishing
# Pull the image
docker pull ghcr.io/xpos587/markfetch:latest
# Run with environment variables
docker run -d \
-p 8000:8000 \
-e GOOGLE_API_KEY=your_google_api_key \
-e MARKFETCH_API_KEY=your_secret_api_key \
ghcr.io/xpos587/markfetch:latest
# Clone the repository
git clone https://github.com/xpos587/markfetch.git
cd markfetch
# Install dependencies with uv
pip install uv
uv sync
# Run the server
export GOOGLE_API_KEY=your_google_api_key
export MARKFETCH_API_KEY=your_secret_api_key
python -m src.app
http://localhost:8000/
All requests require an API key in the Authorization header:
Authorization: Bearer YOUR_API_KEY
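As an illustration of the header above, here is a minimal Python sketch that attaches the bearer token using only the standard library. The helper names `make_request` and `fetch` are hypothetical and not part of MarkFetch; the endpoint URL is the local default shown earlier.

```python
import urllib.request


def make_request(endpoint_url: str, api_key: str) -> urllib.request.Request:
    """Build a GET request that carries the API key in the Authorization header."""
    return urllib.request.Request(
        endpoint_url,
        headers={"Authorization": f"Bearer {api_key}"},
    )


def fetch(endpoint_url: str, api_key: str, timeout: float = 60.0) -> str:
    """Send the request and return the response body as text.

    Hypothetical helper: the 60-second timeout is an arbitrary choice,
    not a value documented by the service.
    """
    request = make_request(endpoint_url, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With a server running locally, `fetch("http://localhost:8000/?url=https://example.com", "YOUR_API_KEY")` should return the converted page. The key travels only in the header, never in the query string.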
- url (required): Website URL to convert
- ai_processing (optional, default: false): Enable AI-powered content filtering
- crawl_pages (optional, default: 1): Number of pages to crawl (1-10)
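The parameters above can be assembled into a request URL programmatically. The sketch below is one possible client-side helper (the name `build_request_url` is hypothetical); it percent-encodes the target URL, which matters when the target itself contains `?` or `&`, and rejects `crawl_pages` values outside the documented 1-10 range before any request is sent.

```python
from urllib.parse import urlencode


def build_request_url(
    base: str,
    url: str,
    ai_processing: bool = False,
    crawl_pages: int = 1,
) -> str:
    """Return the GET URL for a conversion request."""
    if not 1 <= crawl_pages <= 10:
        raise ValueError("crawl_pages must be between 1 and 10")

    params = {"url": url}
    # Only send non-default values so the URL stays short.
    if ai_processing:
        params["ai_processing"] = "true"
    if crawl_pages != 1:
        params["crawl_pages"] = str(crawl_pages)

    # urlencode percent-encodes the target URL (":" and "/" included).
    return f"{base.rstrip('/')}/?{urlencode(params)}"
```

For example, `build_request_url("http://localhost:8000", "https://example.com", ai_processing=True, crawl_pages=5)` yields a URL with all three parameters set.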
Single page (markdown response):
curl -H "Authorization: Bearer YOUR_API_KEY" \
"http://localhost:8000/?url=https://example.com"
Single page with AI processing:
curl -H "Authorization: Bearer YOUR_API_KEY" \
"http://localhost:8000/?url=https://example.com&ai_processing=true"
Multiple pages with AI (JSON response):
curl -H "Authorization: Bearer YOUR_API_KEY" \
"http://localhost:8000/?url=https://github.com&ai_processing=true&crawl_pages=5"
- Single page: Plain text markdown
- Multiple pages: JSON with array of results
JSON Response Example:
[
  {
    "url": "https://example.com",
    "md": "# Example Domain\n\nThis domain is for use in documentation examples..."
  }
]
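Because the response format depends on the number of pages requested, a client may want to normalize both cases into one shape. The following sketch assumes the behavior described above (plain markdown for a single page, a JSON array of `url`/`md` objects otherwise); the function name `normalize_response` is hypothetical.

```python
import json


def normalize_response(
    body: str,
    requested_url: str,
    crawl_pages: int = 1,
) -> list[dict[str, str]]:
    """Return a list of {"url", "md"} records for either response format."""
    if crawl_pages == 1:
        # Single page: the body is the markdown itself.
        return [{"url": requested_url, "md": body}]

    # Multiple pages: the body is a JSON array of results.
    results = json.loads(body)
    return [{"url": item["url"], "md": item["md"]} for item in results]
```

Callers can then iterate over the result the same way regardless of how many pages were crawled.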
- GOOGLE_API_KEY: Google Gemini API key (required)
- GEMINI_API_KEY: Alternative Gemini API key (optional)
- MARKFETCH_API_KEY: Your API key for authentication (required)
- PRODUCTION: Set to true to disable hot reload (optional)
- PORT: Custom port (optional, default: 8000)
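A pre-flight check can catch missing variables before the server or container starts. The sketch below is not MarkFetch's own configuration code; `Settings` and `load_settings` are hypothetical names, and treating GEMINI_API_KEY as a fallback when GOOGLE_API_KEY is unset is an assumption based on its description as an alternative key.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    gemini_key: str
    api_key: str
    production: bool
    port: int


def load_settings(env: Mapping[str, str]) -> Settings:
    """Validate the documented environment variables and apply defaults."""
    # Assumption: GEMINI_API_KEY is used only when GOOGLE_API_KEY is unset.
    gemini_key = env.get("GOOGLE_API_KEY") or env.get("GEMINI_API_KEY")
    if not gemini_key:
        raise RuntimeError("Set GOOGLE_API_KEY (or GEMINI_API_KEY)")

    api_key = env.get("MARKFETCH_API_KEY")
    if not api_key:
        raise RuntimeError("Set MARKFETCH_API_KEY")

    return Settings(
        gemini_key=gemini_key,
        api_key=api_key,
        production=env.get("PRODUCTION", "").strip().lower() == "true",
        port=int(env.get("PORT", "8000")),
    )
```

Calling `load_settings(os.environ)` (after `import os`) in a startup script raises a clear error if a required key is missing.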
GitHub Actions automatically builds and publishes containers to GitHub Container Registry.
Available Registry:
ghcr.io/xpos587/markfetch
Tags:
- latest: Latest main branch build
- vX.Y.Z: Semantic version tags
# Build image
docker build -f Containerfile -t markfetch .
# Run container
docker run -d \
--name markfetch \
-p 8000:8000 \
--restart unless-stopped \
-e GOOGLE_API_KEY=$GOOGLE_API_KEY \
-e MARKFETCH_API_KEY=$MARKFETCH_API_KEY \
markfetch
services:
  markfetch:
    image: ghcr.io/xpos587/markfetch:latest
    ports:
      - "8000:8000"
    environment:
      - GOOGLE_API_KEY=your_google_api_key
      - MARKFETCH_API_KEY=your_secret_api_key
    restart: unless-stopped
docker compose up -d
- Backend: FastAPI with Python 3.12
- Browser: Camoufox (async Firefox-based)
- AI: Google Gemini 2.0 Flash
- Container: Multi-stage Docker build with Camoufox pre-initialization
- Package Manager: UV
- API key authentication required for all requests
- No web interface - API only
- Secure header-based authentication
- No API keys in URL parameters