Mr.🆖 PodcastAI 🎙️🎧

Transform documents, images, and websites into engaging podcast episodes using AI

Convert long-form content into natural podcast dialogues that capture attention and make information more accessible for auditory learning on the go.

Features

  • Versatile Input Support: Upload documents (PDF, DOCX, TXT), images with OCR (JPG, JPEG, PNG), paste text directly, or convert websites via URL
  • 🤖 AI-Powered Dialogue Generation: Uses OpenAI's GPT-4.1-mini to create natural, engaging podcast conversations from your content
  • 🎵 Professional Audio: Leverages OpenAI's text-to-speech models for high-quality, lifelike voices through Mr.🆖 AI Hub routing
  • 🌐 Multi-Language Support: Generate podcasts in English, Chinese (Traditional), or Cantonese with optimized voice synthesis
  • 💰 Cost Transparency: Real-time TTS cost calculation and tracking (English, Chinese, Cantonese)
  • 🖥️ User-Friendly Interface: Gradio-based web interface for easy interaction
  • 💾 Smart History Management: Browse and reload previous podcasts stored in your browser (IndexedDB + localStorage)
  • 🔧 Resilient Processing: Retry mechanisms and error handling for reliable conversion
  • ⚡ FastAPI Backend: Robust server architecture with deployment-ready setup

Demo Examples

The project includes sample inputs:

  • PDF documents (e.g., "Intangible cultural heritage item.pdf")
  • Images with text (e.g., "JUPAS Guide.jpg")
  • URL extraction from web pages

Installation

Prerequisites

  • Python >= 3.12
  • uv package manager (recommended)

Quick Setup

  1. Clone the repository:

    git clone https://github.com/bentwnghk/mr5-podcast-ai.git
    cd mr5-podcast-ai
  2. Install dependencies:

    uv sync
  3. Environment Setup:

    • Get an API key from Mr.🆖 AI Hub
    • Set environment variables:
      export OPENAI_API_KEY="your-api-key-here"
      export OPENAI_BASE_URL="https://api.mr5ai.com/v1"  # Mr.🆖 AI Hub endpoint
    • Optional: Configure Sentry for error monitoring:
      export SENTRY_DSN="your-sentry-dsn"

Usage

Quick Start

  1. Launch the Application:

    uv run python main.py

    The Gradio interface will open in your browser at http://localhost:8000

  2. Generate Podcasts:

    • Upload Files: Select PDF, DOCX, TXT, or image files
    • Paste Text: Directly input text content
    • From URL: Convert web pages to podcasts
    • Choose language: English, Chinese, or Cantonese
    • Enter your API key (auto-saved to browser)
    • Click "Generate Podcast"
  3. View Results:

    • Listen to the generated MP3 podcast
    • Review the dialogue transcript
    • Check TTS costs
    • Access previous podcasts in the History panel

Input Methods

1. File Upload

  • PDF: Digital documents, scans, reports
  • DOCX: Word documents
  • TXT: Plain text files
  • Images: JPG/PNG with text extraction via OpenAI Vision API
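
One way to route an uploaded file to the right processing path is a lookup on its extension. The sketch below is illustrative only: the handler labels and the `classify_upload` name are hypothetical and are not taken from the project's code; the extensions are the ones listed above.

```python
from pathlib import Path

# Extensions come from the supported formats listed above.
# The handler labels are hypothetical, chosen only for this example.
HANDLERS = {
    ".pdf": "pdf",
    ".docx": "docx",
    ".txt": "text",
    ".jpg": "image_ocr",
    ".jpeg": "image_ocr",
    ".png": "image_ocr",
}


def classify_upload(filename: str) -> str:
    """Return the handler label for a file, or raise ValueError if unsupported."""
    suffix = Path(filename).suffix.lower()  # case-insensitive match
    try:
        return HANDLERS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file type: {suffix or '(none)'}") from None
```

Rejecting unknown extensions up front keeps unsupported files from reaching the more expensive extraction and OCR steps.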

2. Text Input

  • Paste any text content
  • Supports dialogues of up to ~8,000 tokens

3. URL Processing

  • Convert web articles to podcasts
  • Automatic content extraction with fallbacks
  • Supports major news sites and blogs
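
The README does not say how the extraction fallbacks work. As a rough illustration of the general idea, the sketch below tries progressively broader sources of text: the `<article>` element first, then all `<p>` elements, then any visible text. The class and function names and the `min_chars` threshold are assumptions made for this example, and only the standard library is used.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collect visible text, noting whether it sits inside <article> or <p>."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self) -> None:
        super().__init__()
        self.article_depth = 0
        self.p_depth = 0
        self.skip_depth = 0
        self.article: list[str] = []
        self.paragraphs: list[str] = []
        self.everything: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag == "article":
            self.article_depth += 1
        elif tag == "p":
            self.p_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag == "article":
            self.article_depth = max(0, self.article_depth - 1)
        elif tag == "p":
            self.p_depth = max(0, self.p_depth - 1)

    def handle_data(self, data):
        text = data.strip()
        if not text or self.skip_depth:
            return  # ignore whitespace and script/style content
        self.everything.append(text)
        if self.article_depth:
            self.article.append(text)
        if self.p_depth:
            self.paragraphs.append(text)


def extract_text(html: str, min_chars: int = 200) -> str:
    """Return the first candidate with at least min_chars of text."""
    parser = _TextCollector()
    parser.feed(html)
    parser.close()
    for candidate in (parser.article, parser.paragraphs, parser.everything):
        text = " ".join(candidate)
        if len(text) >= min_chars:
            return text
    # Last resort: whatever visible text was found, even if short.
    return " ".join(parser.everything)
```

Sites that block scraping or render content with JavaScript will still return little or nothing under this approach.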

Language Options

  • English: Standard OpenAI TTS models
  • Chinese (繁體): Traditional Chinese with optimized output
  • Cantonese: Specialized voice support through Mr.🆖 AI Hub

Architecture

  • Frontend: Gradio web interface
  • Backend: FastAPI server with async processing
  • Storage: Temporary file management with auto-cleanup
  • AI Services: OpenAI GPT-4.1-mini + TTS via Mr.🆖 AI Hub
  • Database: Browser-based history (IndexedDB/localStorage)
  • Deployment: Ready for Docker/Uvicorn

Cost Estimation

TTS costs vary by language:

  • English: ~$0.15 per 1M characters
  • Chinese: ~$0.30 per 1M characters (x2 multiplier)
  • Cantonese: ~$0.75 per 1M characters (x8 multiplier)
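
A quick way to estimate the TTS cost of a script is to multiply its character count by the per-million rate. The sketch below uses the dollar rates listed above as given. Note that the listed Cantonese rate is 5 times the English rate, not the stated x8, so the multipliers are deliberately not used here. Counting characters with `len()` and the `estimate_tts_cost` name are assumptions, not the app's actual billing logic.

```python
# USD per 1M characters, copied from the list above.
RATES_PER_MILLION = {
    "english": 0.15,
    "chinese": 0.30,
    "cantonese": 0.75,
}


def estimate_tts_cost(text: str, language: str) -> float:
    """Estimate the TTS cost in USD for `text` in the given language."""
    rate = RATES_PER_MILLION[language.lower()]  # KeyError for unknown languages
    return len(text) * rate / 1_000_000
```

At these rates, a 12,000-character English dialogue works out to 12,000 × 0.15 / 1,000,000 = $0.0018.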

Project Structure

.
├── main.py              # Application entry point
├── description.md       # UI descriptions
├── head.html            # Custom HTML/JS for browser features
├── static/              # Web assets (logo, icon)
├── examples/            # Sample files for testing
├── pyproject.toml       # Python dependencies
├── uv.lock              # Dependency lock file
├── Dockerfile           # Container configuration
├── docker-compose.yml   # Docker composition
├── LICENSE              # Apache 2.0 License
└── README.md            # This file

Configuration

Environment Variables

Variable           Description                   Required
OPENAI_API_KEY     Mr.🆖 AI Hub API key           Yes
OPENAI_BASE_URL    Mr.🆖 AI Hub endpoint URL      Yes
SENTRY_DSN         Sentry monitoring DSN         No
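
The variables above could be read and validated at startup so that a missing key fails early with a clear message. This is a minimal sketch: the `Settings` class and `load_settings` function are hypothetical names for illustration and may not match how main.py reads its configuration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    api_key: str
    base_url: str
    sentry_dsn: Optional[str] = None  # optional, per the table above


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, failing fast if any are missing."""
    required = ("OPENAI_API_KEY", "OPENAI_BASE_URL")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    return Settings(
        api_key=env["OPENAI_API_KEY"],
        base_url=env["OPENAI_BASE_URL"],
        sentry_dsn=env.get("SENTRY_DSN") or None,
    )
```

Passing the environment in as a mapping makes the loader easy to test without touching the real process environment.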

Custom API Endpoints

The application is designed to work with Mr.🆖 AI Hub compatible endpoints. Set OPENAI_BASE_URL to:

  • Production: https://api.mr5ai.com/v1
  • Local: http://localhost:3000/v1 (if running locally)

Troubleshooting

Common Issues

  • API Key Issues: Ensure your Mr.🆖 AI Hub key is valid and has sufficient credits
  • File Upload Errors: Check file size limits and supported formats
  • URL Processing: Some websites block scraping; try a different source
  • TTS Failures: Request timeouts can occur; the app has retry mechanisms for them
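
The README does not describe how the retry mechanisms are implemented. A common pattern for transient timeouts is retrying with exponential backoff, sketched below. The `call_with_retry` name, its defaults, and the choice to retry on `TimeoutError` are all assumptions made for this example.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    func: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError,),
    sleep: Callable[[float], object] = time.sleep,
) -> T:
    """Call func, retrying on the given exceptions with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            # Delay doubles each time: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Injecting `sleep` as a parameter lets tests run instantly while production code keeps the real delay.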

Debug Mode

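For detailed logs, enable debug logging inside the application process: add import logging and logging.basicConfig(level=logging.DEBUG) near the top of main.py before launching. Running that snippet as a separate python -c command has no effect on the app, because logging configuration applies only to the process that sets it.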
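
A minimal sketch of in-process debug logging, assuming it is placed near the top of main.py. The format string and the "podcast" logger name are illustrative and not taken from the project.

```python
import logging


def enable_debug_logging() -> None:
    """Configure the root logger to emit DEBUG-level messages."""
    logging.basicConfig(
        level=logging.DEBUG,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
        force=True,  # replace handlers set up earlier (Python 3.8+)
    )


enable_debug_logging()
logging.getLogger("podcast").debug("debug logging is on")
```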

Contributing

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature-name
  3. Make your changes with tests
  4. Submit a pull request

License

Licensed under the Apache License 2.0. See LICENSE for details.

Support

  • Create issues on GitHub for bugs/feature requests
  • Check the examples directory for sample inputs
  • Review browser console for detailed error messages

Transform your content into podcasts that engage and inform 🎙️
