A web-based multi-tenant crawler for SEO analysis and website auditing.
🌐 Website: librecrawl.com | Try the live demo: crawl.librecrawl.com
LibreCrawl crawls websites and gives you detailed information about pages, links, SEO elements, and performance. It's built as a web application using Python Flask with a modern web interface supporting multiple concurrent users.
- 🚀 Multi-tenancy - Multiple users can crawl simultaneously with isolated sessions
- 🎨 Custom CSS styling - Personalize the UI with your own CSS themes
- 💾 Browser localStorage persistence - Settings saved per browser
- 🔄 JavaScript rendering for dynamic content (React, Vue, Angular, etc.)
- 📊 SEO analysis - Extract titles, meta descriptions, headings, etc.
- 🔗 Link analysis - Track internal and external links with detailed relationship mapping
- 📈 PageSpeed Insights integration - Analyze Core Web Vitals
- 💾 Multiple export formats - CSV, JSON, or XML
- 🔍 Issue detection - Automated SEO issue identification
- ⚡ Real-time crawling progress with live statistics
The easiest way to run LibreCrawl - just run the startup script and it handles everything:
Windows:
```
start-librecrawl.bat
```

Linux/Mac:

```
chmod +x start-librecrawl.sh
./start-librecrawl.sh
```

What it does automatically:
- Checks for Docker - if found, runs LibreCrawl in a container (recommended)
- If no Docker, checks for Python - if not found, downloads and installs it (automatic Python installation on Windows is temporarily disabled because it caused issues with the batch script)
- Installs all dependencies automatically (`pip install -r requirements.txt`)
- Installs Playwright browsers for JavaScript rendering
- Starts LibreCrawl in local mode (no authentication)
- Opens your browser to http://localhost:5000
If you prefer to set things up yourself or want more control:
Requirements:
- Docker and Docker Compose
Steps:
```
# Clone the repository
git clone https://github.com/PhialsBasement/LibreCrawl.git
cd LibreCrawl

# Copy environment file
cp .env.example .env

# Start LibreCrawl
docker-compose up -d

# Open browser to http://localhost:5000
```

By default, LibreCrawl runs in local mode for easy personal use. The .env file controls this:

```
# .env file
LOCAL_MODE=true
HOST_BINDING=127.0.0.1
```

For production deployment with user authentication, edit your .env file:

```
# .env file
LOCAL_MODE=false
HOST_BINDING=0.0.0.0
```

Requirements:

- Python 3.8 or later
- Modern web browser (Chrome, Firefox, Safari, Edge)
- Clone or download this repository

- Install dependencies:

```
pip install -r requirements.txt
```

- For JavaScript rendering support (optional; see the rendering sketch after these steps):

```
playwright install chromium
```

- Run the application:

```
# Standard mode (with authentication and tier system)
python main.py

# Local mode (all users get admin tier, no rate limits)
python main.py --local
# or
python main.py -l
```

- Open your browser and navigate to:
  - Local: http://localhost:5000
  - Network: http://<your-ip>:5000
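The optional Playwright step above is what powers JavaScript rendering for dynamic sites. Purely as an illustration of the approach (not LibreCrawl's actual crawler code), a headless render-and-extract step looks roughly like this:

```python
# Rough sketch of rendering a JavaScript-heavy page with Playwright and reading
# SEO elements from the rendered DOM. Illustrative only; LibreCrawl's crawler
# (src/crawler.py) may do this differently.
from playwright.sync_api import sync_playwright

def render_and_extract(url: str) -> dict:
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")  # wait for client-side rendering to settle
        title = page.title()
        meta = page.query_selector('meta[name="description"]')
        description = meta.get_attribute("content") if meta else None
        h1_texts = [h.inner_text() for h in page.query_selector_all("h1")]
        browser.close()
    return {"url": url, "title": title, "meta_description": description, "h1": h1_texts}
```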
Standard Mode (default):
- Full authentication system with login/register
- Tier-based access control (Guest, User, Extra, Admin)
- Guest users limited to 3 crawls per 24 hours (IP-based; see the sketch below)
- Ideal for public-facing demos or shared hosting
Local Mode (--local or -l):
- All users automatically get admin tier access
- No rate limits or tier restrictions
- Perfect for personal use or single-user self-hosting
- Recommended for local development and testing
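Purely as an illustration of the guest limit described above (not LibreCrawl's actual implementation), IP-based rate limiting of this kind can be sketched as:

```python
# Hypothetical sketch of IP-based guest rate limiting (3 crawls per 24 hours).
# Names like GUEST_CRAWL_LIMIT and check_guest_limit are illustrative only.
import time
from collections import defaultdict

GUEST_CRAWL_LIMIT = 3          # crawls allowed per window
WINDOW_SECONDS = 24 * 60 * 60  # 24 hours

_crawl_log = defaultdict(list)  # ip -> timestamps of recent crawl starts

def check_guest_limit(ip: str) -> bool:
    """Return True if this IP may start another crawl, False if the limit is reached."""
    now = time.time()
    # Keep only crawls that started inside the current 24-hour window
    _crawl_log[ip] = [t for t in _crawl_log[ip] if now - t < WINDOW_SECONDS]
    if len(_crawl_log[ip]) >= GUEST_CRAWL_LIMIT:
        return False
    _crawl_log[ip].append(now)
    return True
```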
Click "Settings" to configure:
- Crawler settings: crawl depth and limits (up to 5M URLs), delays, external links
- Request settings: user agent, timeouts, proxy, robots.txt
- JavaScript rendering: browser engine, wait times, viewport size
- Filters: file types and URL patterns to include/exclude
- Export options: formats and fields to export
- Custom CSS: personalize the UI appearance with custom styles
- Issue exclusion: patterns to exclude from SEO issue detection
For PageSpeed analysis, add a Google API key in Settings > Requests to get higher rate limits (25,000 requests per day with a key versus a much lower anonymous quota).
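For reference, the underlying Google PageSpeed Insights v5 API can be queried directly as sketched below; LibreCrawl's own integration may differ, and the PAGESPEED_API_KEY environment variable is just an example:

```python
# Minimal sketch of calling the Google PageSpeed Insights v5 API directly.
import os
import requests

PSI_ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"

def fetch_core_web_vitals(url: str) -> dict:
    params = {
        "url": url,
        "strategy": "mobile",
        "category": "performance",
    }
    api_key = os.environ.get("PAGESPEED_API_KEY")  # optional, raises the daily quota
    if api_key:
        params["key"] = api_key
    response = requests.get(PSI_ENDPOINT, params=params, timeout=60)
    response.raise_for_status()
    data = response.json()
    # Real-user Core Web Vitals live under loadingExperience in the response
    return data.get("loadingExperience", {}).get("metrics", {})
```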
- CSV: Spreadsheet-friendly format
- JSON: Structured data with all details
- XML: Markup format for other tools
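As a rough illustration of what the exports contain (the field names below are examples, not necessarily LibreCrawl's exact columns):

```python
# Illustrative sketch of writing crawl results to CSV and JSON exports.
import csv
import json

def export_results(pages, csv_path, json_path):
    """pages is a list of dicts, one per crawled URL."""
    fields = ["url", "status_code", "title", "meta_description"]
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fields, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(pages)
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(pages, f, indent=2, ensure_ascii=False)
```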
LibreCrawl supports multiple concurrent users with isolated sessions:
- Each browser session gets its own crawler instance and data
- Settings are stored in browser localStorage (persistent across restarts)
- Custom CSS themes are per-browser
- Sessions expire after 1 hour of inactivity
- Crawl data is isolated between users
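Conceptually, the per-session isolation and 1-hour expiry described above can be sketched like this (illustrative only, not LibreCrawl's actual internals):

```python
# Conceptual sketch of per-session crawler isolation with a 1-hour inactivity
# expiry. Names such as get_crawler and SESSION_TTL are hypothetical.
import time
import uuid
from flask import Flask, session

app = Flask(__name__)
app.secret_key = "change-me"

SESSION_TTL = 60 * 60   # expire idle sessions after 1 hour
_crawlers = {}          # session id -> (crawler state, last-used timestamp)

def get_crawler():
    """Return the crawler state bound to this browser session, creating it if needed."""
    sid = session.setdefault("sid", uuid.uuid4().hex)
    now = time.time()
    # Drop crawler instances that have been idle past the TTL
    for key, (_, last_used) in list(_crawlers.items()):
        if now - last_used > SESSION_TTL:
            del _crawlers[key]
    state, _ = _crawlers.get(sid, ({"results": [], "settings": {}}, now))
    _crawlers[sid] = (state, now)
    return state
```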
- PageSpeed API has rate limits (works better with API key)
- Large sites may take time to crawl completely
- JavaScript rendering is slower than HTTP-only crawling
- Settings stored in localStorage (cleared if browser data is cleared)
- `main.py` - Main application and Flask server
- `src/crawler.py` - Core crawling engine
- `src/settings_manager.py` - Configuration management
- `web/` - Frontend interface files
MIT License - see LICENSE file for details.