Automate your browser with natural language commands - Open source browser AI agent solution
ChromeAiAgent is a Chrome extension that turns your browser into an AI-driven automation platform. Using natural language commands, it can automate a wide range of browser tasks, from simple repetitive actions to complex multi-step workflows.
- Automatic Discovery: Instantly detects 6 popular local AI servers (Ollama, LM Studio, Jan.ai, LocalAI, Text-Gen-WebUI, GPT4All)
- Real-Time Status: Live monitoring of server status and model availability
- One-Click Switching: Switch between AI models with automatic configuration
- Smart UI: Visual indicators showing server status and available model count
- Enhanced UX: Modern gradients, improved error handling, and contextual help
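As a rough illustration of how automatic discovery of local AI servers could work, the sketch below probes a few well-known default endpoints and reports which ones respond. It is not ChromeAiAgent's actual implementation: the `LocalServer` shape, the `discoverServers` function, and the port and path values are assumptions based on common defaults, and only three of the six servers are shown.

```typescript
// Hypothetical descriptor for a local AI server (not part of ChromeAiAgent's API).
interface LocalServer {
  name: string;
  baseUrl: string;
  modelsPath: string;
}

// Assumed default ports and model-listing paths; a given machine may differ.
const CANDIDATES: LocalServer[] = [
  { name: "Ollama", baseUrl: "http://localhost:11434", modelsPath: "/api/tags" },
  { name: "LM Studio", baseUrl: "http://localhost:1234", modelsPath: "/v1/models" },
  { name: "LocalAI", baseUrl: "http://localhost:8080", modelsPath: "/v1/models" },
];

// Minimal fetch-like signature so the sketch can run with a real or a fake fetch.
type FetchLike = (url: string) => Promise<{ ok: boolean }>;

interface ServerStatus {
  name: string;
  online: boolean;
}

// Probe every candidate in parallel; a failed request counts as "offline".
async function discoverServers(
  fetchFn: FetchLike,
  candidates: LocalServer[] = CANDIDATES
): Promise<ServerStatus[]> {
  const checks = candidates.map(async (server): Promise<ServerStatus> => {
    try {
      const res = await fetchFn(server.baseUrl + server.modelsPath);
      return { name: server.name, online: res.ok };
    } catch (err) {
      // Connection refused, timeout, or CORS failure: treat as not running.
      return { name: server.name, online: false };
    }
  });
  return Promise.all(checks);
}
```

In a browser context the global `fetch` could be passed as `fetchFn`; re-running the probe on a timer would be one way to approximate the live status monitoring described above.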
- 🧠 Natural Language Control: Command your browser in plain English - no coding required
- 🤖 AI-Powered Intelligence: 30+ MCP tools that understand context and adapt to your needs
- ⚡ Multi-Step Automation: Execute complex workflows with single commands
- 🔄 Universal Compatibility: Works with any website - no special setup needed
- 📊 Smart Data Extraction: Automatically collect and organize information from web pages
- 🎯 Precision Actions: Click, fill, scroll, and interact with elements using AI vision
- 📝 Form Automation: Fill out forms, submit data, and handle complex interactions
- 🖼️ Visual Understanding: AI can see and understand page content for intelligent automation
- 🔧 Developer Friendly: Open source with extensive API for custom automation
- 🚀 Lightning Fast: Execute automation tasks in seconds, not minutes
- Smart Content Analysis: Extract structured data from any webpage
- Price Monitoring: Track prices across multiple e-commerce sites
- Research Automation: Gather information from multiple sources automatically
- Visual Element Detection: AI can see and interact with page elements
- Form Automation: Fill out complex forms with intelligent field mapping
- Dynamic Content Handling: Adapt to changing page layouts and content
- Text Highlighting & Summarization: Automatically highlight and summarize important content
- Document Processing: Extract and organize information from web documents
- Smart Note-Taking: Capture and organize insights from web browsing
- AI-Powered Organization: Automatically group and organize tabs by topic
- Smart Tab Switching: Find and switch between tabs using natural language
- Multi-Window Coordination: Manage complex workflows across multiple browser windows
ChromeAiAgent supports multiple AI providers, giving you flexibility in choosing the best model for your needs:
- GitHub Models (Free tier available) - GPT-5, Claude, Llama models
- OpenAI - GPT-4.1, GPT-4o, latest models
- Anthropic Claude - Claude 3.5 Sonnet, advanced reasoning
- Google Gemini - Multimodal capabilities
- DeepSeek - Cost-effective, strong reasoning
- Azure OpenAI - Enterprise-grade
- Ollama - Run models locally, complete privacy
- LM Studio - User-friendly local AI interface
- LocalAI - Self-hosted OpenAI-compatible API
- Text Generation WebUI - Community solution
- Custom API - Connect any OpenAI-compatible endpoint
📚 See LLM-PROVIDERS.md for detailed setup instructions
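To make "OpenAI-compatible endpoint" concrete, here is a minimal sketch of the request such an endpoint typically expects: a POST to `/chat/completions` with a `model` and a `messages` array. The `buildChatRequest` helper and its parameter names are hypothetical and not taken from ChromeAiAgent; the base URL is assumed to already include any `/v1` prefix.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Build (but do not send) a chat-completions request for an
// OpenAI-compatible server. Hypothetical helper for illustration only.
function buildChatRequest(
  baseUrl: string,
  model: string,
  messages: ChatMessage[],
  apiKey?: string
): ChatRequest {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  // Local servers often need no key; hosted providers usually expect a bearer token.
  if (apiKey) {
    headers["Authorization"] = "Bearer " + apiKey;
  }
  return {
    // Strip trailing slashes so the path is joined exactly once.
    url: baseUrl.replace(/\/+$/, "") + "/chat/completions",
    method: "POST",
    headers,
    body: JSON.stringify({ model, messages }),
  };
}
```

The returned object can be handed to `fetch(req.url, req)` as-is. Keeping request construction separate from sending makes it easy to point the same code at a local server or a hosted provider by changing only `baseUrl` and `apiKey`.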
- Open ChromeAiAgent
  - Press ⌘+M (Mac) or Ctrl+M (Windows/Linux)
  - Or click the ChromeAiAgent icon in your toolbar
- Start Automating
  - Type /ai to start AI automation chat
  - Use natural language: "Click the login button", "Fill out this form"
  - Try complex workflows: "Research React best practices and save to notes"
We love contributions! Here's how you can help make ChromeAiAgent even better:
📖 For detailed development setup, build instructions, and contribution guidelines, please see DEVELOPMENT.md
- 🏗️ Local Development: See DEVELOPMENT.md#local-development-setup
- 🔧 Building: See DEVELOPMENT.md#building-for-production
- 🤝 Contributing: See DEVELOPMENT.md#how-to-contribute
- 📊 Project Status: See DEVELOPMENT.md#development-status
🗂️ Tab Management - 8 tools
Complete tab control and navigation:
- get_all_tabs: Get all open tabs across all windows
- get_current_tab: Get information about the currently active tab
- switch_to_tab: Switch to a specific tab by ID
- create_new_tab: Create a new tab with the specified URL
- get_tab_info: Get detailed information about a specific tab
- duplicate_tab: Duplicate an existing tab
- close_tab: Close a specific tab
- get_current_tab_content: Get the visible text content of the current tab
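The tab tools plausibly sit on top of the `chrome.tabs` extension API. The sketch below shows one way `get_all_tabs` and `get_current_tab` could be expressed; it is an assumption about the approach, not the project's code. `TabsApi` is a hypothetical minimal interface covering only the promise-based `query` call, so the functions can be exercised without a browser.

```typescript
// Minimal subset of a tab object, modeled loosely on chrome.tabs.Tab.
interface TabLike {
  id?: number;
  title?: string;
  url?: string;
  active: boolean;
  windowId: number;
}

// Hypothetical interface covering only the query call used here.
interface TabsApi {
  query(queryInfo: { active?: boolean; currentWindow?: boolean }): Promise<TabLike[]>;
}

interface TabSummary {
  id: number;
  title: string;
  url: string;
  active: boolean;
  windowId: number;
}

function toSummary(tab: TabLike): TabSummary {
  return {
    id: tab.id as number,
    title: tab.title || "",
    url: tab.url || "",
    active: tab.active,
    windowId: tab.windowId,
  };
}

// An empty query matches every tab in every window.
async function getAllTabs(tabs: TabsApi): Promise<TabSummary[]> {
  const all = await tabs.query({});
  // Tabs without an ID cannot be targeted later, so they are skipped.
  return all.filter((t) => t.id !== undefined).map(toSummary);
}

// The active tab of the current window, or undefined if none is reported.
async function getCurrentTab(tabs: TabsApi): Promise<TabSummary | undefined> {
  const matches = await tabs.query({ active: true, currentWindow: true });
  const first = matches[0];
  return first && first.id !== undefined ? toSummary(first) : undefined;
}
```

Inside an extension with the appropriate permissions, `chrome.tabs` would be the natural object to pass as `tabs`; the `title` and `url` fields are only populated when the extension is allowed to read them.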
📄 Page Content & Interaction - 16 tools
Content extraction, analysis, and page interaction:
- get_page_metadata: Get page metadata including title, description, keywords
- extract_page_text: Extract text content with word count and reading time
- get_page_links: Get all links from the current page
- search_page_text: Search for text on the current page
- get_interactive_elements: Get all interactive elements (links, buttons, inputs)
- get_interactive_elements_optimized: Optimized version for complex pages
- click_element: Click an element using a CSS selector
- summarize_page: Summarize page content with key points
- fill_input: Fill an input field with text
- clear_input: Clear the content of an input field
- get_input_value: Get the current value of an input field
- submit_form: Submit a form using a CSS selector
- get_form_elements: Get all form elements and input fields
- scroll_to_element: Scroll to a DOM element and center it
- highlight_element: Permanently highlight DOM elements
- highlight_text_inline: Highlight specific words or phrases within text
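For a sense of what "word count and reading time" in extract_page_text might involve, here is a minimal sketch. The `textStats` function is hypothetical, and the default of 200 words per minute is an assumed reading speed, not a value taken from the project.

```typescript
interface TextStats {
  wordCount: number;
  readingTimeMinutes: number;
}

// Count whitespace-separated words and estimate reading time.
// wordsPerMinute is an assumed average; adjust for the audience.
function textStats(text: string, wordsPerMinute: number = 200): TextStats {
  const trimmed = text.trim();
  // Splitting an empty string would yield [""], so handle it explicitly.
  const wordCount = trimmed === "" ? 0 : trimmed.split(/\s+/).length;
  // Round up so that any non-empty text reports at least one minute.
  const readingTimeMinutes = Math.ceil(wordCount / wordsPerMinute);
  return { wordCount, readingTimeMinutes };
}
```

In a content script, the input could come from something like `document.body.innerText`. Splitting on whitespace is a simplification that undercounts for languages written without spaces between words.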
⬇️ Downloads & Files - 4 tools
Download control and file management:
- download_text_as_markdown: Download text content as a markdown file
- download_image: Download an image from base64 data
- download_chat_images: Download multiple images from chat messages
- download_current_chat_images: Download all images from the current AI chat
📸 Screenshots - 3 tools
Visual capture and screenshot management:
- capture_screenshot: Capture a screenshot of the current visible tab
- capture_tab_screenshot: Capture a screenshot of a specific tab by ID
- capture_screenshot_to_clipboard: Capture a screenshot and save it to the clipboard
🔧 Advanced Features - 3+ tools
Advanced browser automation and utilities:
- Additional specialized tools for enhanced browser control
- AI-powered content analysis and processing
- Custom automation workflows
This project is licensed under the MIT License - see the LICENSE file for details.
- 🐛 Found a bug? Open an issue
- 💡 Have a feature request? Start a discussion
- 🤝 Want to contribute? See our Contributing Guide
- 💬 Need help? Join our community discussions
Thank you to all the amazing contributors who help make ChromeAiAgent better:
- ropzislaw: 56 commits
- Codexiaoyi: 10 commits
- guberm: 5 commits
Total Contributors: 3 | Total Commits: 71
Want to contribute? Check out our Contributing Guide and help make ChromeAiAgent even better!