Transform documents, images, and websites into engaging podcast episodes using AI
Convert long-form content into natural podcast dialogues that capture attention and make information easier to absorb by listening on the go.
- Versatile Input Support: Upload documents (PDF, DOCX, TXT) or images with OCR (JPG, JPEG, PNG), paste text directly, or convert websites via URL
- 🤖 AI-Powered Dialogue Generation: Uses OpenAI's GPT-4.1-mini to create natural, engaging podcast conversations from your content
- 🎵 Professional Audio: Uses OpenAI's text-to-speech models, routed through Mr.π AI Hub, for high-quality, lifelike voices
- Multi-Language Support: Generate podcasts in English, Chinese (Traditional), or Cantonese with optimized voice synthesis
- 💰 Cost Transparency: Real-time TTS cost calculation and tracking for English, Chinese, and Cantonese
- 🖥️ User-Friendly Interface: Gradio-based web interface for easy interaction
- 💾 Smart History Management: Browse and reload previous podcasts stored in your browser (IndexedDB + localStorage)
- 🔧 Resilient Processing: Retry mechanisms and error handling for reliable conversion
- ⚡ FastAPI Backend: Robust server architecture with a deployment-ready setup
The project includes sample inputs:
- PDF documents (e.g., "Intangible cultural heritage item.pdf")
- Images with text (e.g., "JUPAS Guide.jpg")
- URL extraction from web pages
Requirements:
- Python >= 3.12
- uv package manager (recommended)
Clone the repository:
git clone https://github.com/bentwnghk/mr5-podcast-ai.git
cd mr5-podcast-ai
Install dependencies:
uv sync
Environment Setup:
- Get an API key from Mr.π AI Hub
- Set environment variables:
export OPENAI_API_KEY="your-api-key-here"
export OPENAI_BASE_URL="https://api.mr5ai.com/v1"  # Mr.π AI Hub endpoint
- Optional: Configure Sentry for error monitoring:
export SENTRY_DSN="your-sentry-dsn"
Launch the Application:
uv run python main.py
The Gradio interface will then be available in your browser at http://localhost:8000
Generate Podcasts:
- Upload Files: Select PDF, DOCX, TXT, or image files
- Paste Text: Directly input text content
- From URL: Convert web pages to podcasts
- Choose language: English, Chinese, or Cantonese
- Enter your API key (auto-saved to browser)
- Click "Generate Podcast"
View Results:
- Listen to the generated MP3 podcast
- Review the dialogue transcript
- Check TTS costs
- Access previous podcasts in the History panel
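The README doesn't document how the generated dialogue is represented internally. As a purely illustrative sketch, assuming each turn is a speaker label plus a line of text (the `DialogueItem` class and `format_transcript` helper are hypothetical names, not taken from main.py), the transcript shown with the results could be rendered like this:

```python
from dataclasses import dataclass


@dataclass
class DialogueItem:
    """One turn of the podcast conversation (hypothetical schema)."""

    speaker: str  # e.g. "host" or "guest"
    text: str


def format_transcript(dialogue: list[DialogueItem]) -> str:
    """Render dialogue turns as 'Speaker: text' lines for display."""
    return "\n".join(f"{item.speaker.capitalize()}: {item.text}" for item in dialogue)


if __name__ == "__main__":
    demo = [
        DialogueItem("host", "Welcome to the show!"),
        DialogueItem("guest", "Thanks for having me."),
    ]
    print(format_transcript(demo))
```

Keeping speaker and text as separate fields (rather than one pre-formatted string) makes it easy to pick a different TTS voice per speaker later.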
Supported file formats:
- PDF: Digital documents, scans, reports
- DOCX: Word documents
- TXT: Plain text files
- Images: JPG/PNG with text extraction via OpenAI Vision API
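The formats above map naturally onto a dispatch by file extension. A minimal sketch, assuming uploads are routed on their suffix alone (the `classify_upload` function and the route names are illustrative, not the app's actual code):

```python
from pathlib import Path

# Supported upload extensions, grouped by how their text would be extracted.
# The route names are illustrative assumptions.
EXTRACTORS: dict[str, str] = {
    ".pdf": "pdf",
    ".docx": "docx",
    ".txt": "text",
    ".jpg": "image_ocr",  # text extracted via a vision model
    ".jpeg": "image_ocr",
    ".png": "image_ocr",
}


def classify_upload(filename: str) -> str:
    """Return the extraction route for an uploaded file, or raise for unsupported types."""
    suffix = Path(filename).suffix.lower()  # case-insensitive: "REPORT.DOCX" -> ".docx"
    try:
        return EXTRACTORS[suffix]
    except KeyError:
        supported = ", ".join(sorted(EXTRACTORS))
        raise ValueError(
            f"Unsupported file type {suffix!r}; expected one of: {supported}"
        ) from None
```

For example, `classify_upload("JUPAS Guide.jpg")` returns `"image_ocr"`, while an unsupported file such as `notes.md` raises a `ValueError` that can be shown to the user directly.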
Text input:
- Paste any text content
- Supports dialogues of up to ~8,000 tokens
URL input:
- Convert web articles to podcasts
- Automatic content extraction with fallbacks
- Supports major news sites and blogs
Language support:
- English: Standard OpenAI TTS models
- Chinese (繁體): Traditional Chinese with optimized output
- Cantonese: Specialized voice support through Mr.π AI Hub
Architecture:
- Frontend: Gradio web interface
- Backend: FastAPI server with async processing
- Storage: Temporary file management with auto-cleanup
- AI Services: OpenAI GPT-4.1-mini + TTS via Mr.π AI Hub
- Database: Browser-based history (IndexedDB/localStorage)
- Deployment: Ready for Docker/Uvicorn
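The README mentions async processing in the backend but not how audio generation is scheduled. A sketch of one plausible pattern, assuming each dialogue turn is synthesized separately: turns are sent concurrently with a cap on in-flight requests, and results are kept in dialogue order. `fake_tts` is a hypothetical stand-in for a real TTS call, and joining bytes here only stands in for whatever audio-merging step the app actually uses.

```python
import asyncio


async def fake_tts(text: str, voice: str) -> bytes:
    """Stand-in for a real TTS request (hypothetical); returns placeholder bytes."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"[{voice}] {text}".encode()


async def synthesize_dialogue(
    turns: list[tuple[str, str]], max_concurrency: int = 4
) -> bytes:
    """Synthesize (voice, text) turns concurrently and join segments in original order."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def one(voice: str, text: str) -> bytes:
        async with semaphore:  # cap the number of in-flight TTS requests
            return await fake_tts(text, voice)

    # gather() returns results in the order the awaitables were passed in,
    # so segments line up with the dialogue even if requests finish out of order.
    segments = await asyncio.gather(*(one(voice, text) for voice, text in turns))
    return b"".join(segments)


if __name__ == "__main__":
    audio = asyncio.run(synthesize_dialogue([("alloy", "Hi!"), ("echo", "Hello!")]))
    print(audio)
```

The semaphore matters for rate-limited APIs: without it, a long script would fire every request at once.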
TTS costs vary by language:
- English: ~$0.15 per 1M characters
- Chinese: ~$0.30 per 1M characters (x2 multiplier)
- Cantonese: ~$0.75 per 1M characters (x8 multiplier)
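As a worked example of these rates, here is a sketch assuming cost is simply the character count multiplied by the listed per-million rate (how the app actually counts characters isn't specified here, and the multipliers in parentheses aren't modeled separately; the sketch uses the per-million figures directly). For a 20,000-character script, English comes to 20,000 × $0.15 / 1,000,000 = $0.003, and Cantonese to 20,000 × $0.75 / 1,000,000 = $0.015.

```python
# Approximate USD per 1M characters, copied from the list above.
RATES_PER_MILLION_CHARS: dict[str, float] = {
    "english": 0.15,
    "chinese": 0.30,
    "cantonese": 0.75,
}


def estimate_tts_cost(text: str, language: str) -> float:
    """Estimate TTS cost in USD from the character count of the dialogue text.

    len() counts Unicode code points, so each Chinese character counts as one.
    """
    rate = RATES_PER_MILLION_CHARS[language.lower()]
    return len(text) * rate / 1_000_000
```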
Project structure:
.
├── main.py             # Application entry point
├── description.md      # UI descriptions
├── head.html           # Custom HTML/JS for browser features
├── static/             # Web assets (logo, icon)
├── examples/           # Sample files for testing
├── pyproject.toml      # Python dependencies
├── uv.lock             # Dependency lock file
├── Dockerfile          # Container configuration
├── docker-compose.yml  # Docker Compose configuration
├── LICENSE             # Apache 2.0 License
└── README.md           # This file
Environment variables:

| Variable | Description | Required |
|---|---|---|
| OPENAI_API_KEY | Mr.π AI Hub API key | Yes |
| OPENAI_BASE_URL | Mr.π AI Hub endpoint URL | Yes |
| SENTRY_DSN | Sentry monitoring DSN | No |
The application is designed to work with Mr.π AI Hub-compatible endpoints. Set OPENAI_BASE_URL to:
- Production: https://api.mr5ai.com/v1
- Local: http://localhost:3000/v1 (if you run a hub instance locally)
Troubleshooting:
- API Key Issues: Make sure your Mr.π AI Hub key is valid and has sufficient credits
- File Upload Errors: Check file size limits and supported formats
- URL Processing: Some websites block scraping; try a different source
- TTS Failures: Usually caused by request timeouts; the app retries failed requests automatically
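The README says the app retries failed requests but not how. A generic sketch of retry with exponential backoff, assuming timeouts and connection errors are the transient cases worth retrying; the `with_retries` helper and its defaults are illustrative, not the app's settings. With the OpenAI SDK you would pass the SDK's own timeout and connection exception classes in `retry_on`.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: tuple[type[BaseException], ...] = (TimeoutError, ConnectionError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying transient failures with exponential backoff and jitter."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: re-raise the last error
            # Delays grow 1x, 2x, 4x, ... base_delay, plus up to 10% random jitter
            # so that many clients don't retry in lockstep.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay * (1 + random.uniform(0, 0.1)))
    raise RuntimeError("unreachable")  # the loop always returns or raises
```

The injectable `sleep` parameter lets tests record delays instead of actually waiting.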
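For detailed logs, enable debug-level logging inside the application process itself, for example by adding logging.basicConfig(level=logging.DEBUG, force=True) near the top of main.py before the app starts. Running that call in a separate python -c command does not change the server's logging.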
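Because logging configuration only applies to the process that runs it, the setup belongs inside the app. A minimal sketch, assuming you can edit main.py (the format string is an arbitrary choice). If main.py starts the server with `uvicorn.run`, passing `log_level="debug"` there also raises uvicorn's own verbosity.

```python
import logging

# Place near the top of main.py, before the app and its libraries start logging.
# force=True removes any handlers a previous basicConfig() call installed.
logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    force=True,
)

logging.getLogger(__name__).debug("Debug logging enabled")
```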
Contributing:
- Fork the repository
- Create a feature branch: git checkout -b feature-name
- Make your changes with tests
- Submit a pull request
Licensed under the Apache License 2.0. See LICENSE for details.
Support:
- Open an issue on GitHub for bug reports or feature requests
- Check the examples directory for sample inputs
- Check the browser console for detailed error messages