Scout is a privacy-focused, cross-platform voice transcription application built with Tauri v2, React/TypeScript, and Rust. It provides real-time voice-to-text transcription with advanced model management and file upload capabilities.
Download Scout v0.4.0: Latest Release • Only 12MB! 🪶
- Download the DMG for your Mac (Apple Silicon or Intel)
- Drag Scout to Applications
- Launch and download a Whisper model from Settings
- Press Cmd+Shift+Space to start recording!
- 🪶 Tiny Bundle Size: Just 12MB for the entire application!
- ✅ 100% Test Pass Rate: All tests passing, ensuring reliability
- ⚡ <300ms Latency: Near real-time transcription
- 💾 <215MB Memory: Efficient resource usage
- Local-First Processing: All audio processing and transcription happens locally on your device
- Cross-Platform: Works on macOS, Windows, and Linux (iOS support planned)
- Real-Time Transcription: Low-latency voice-to-text conversion using Whisper models
- File Upload Support: Drag & drop or upload audio files for transcription
- Model Management: Download and switch between different Whisper models (tiny, base, small, medium, large)
- Smart Model Selection: Automatic model downloads and intelligent fallback handling
- Privacy-Focused: No cloud dependencies or telemetry
- Push-to-Talk Interface: Recording with global hotkeys (Cmd+Shift+Space)
- Native macOS Overlay: Minimal recording indicator with position customization
- Transcript Management: Save, search, export, and manage your transcriptions locally
- Export Options: Download transcripts in JSON, Text, or Markdown formats
- Settings System: Comprehensive settings with hotkey customization and model selection
- Audio Format Support: Handles various audio formats with automatic conversion
- Background Processing: Queued processing system for file uploads
- Clean UI: Modern, VSCode-inspired interface with dark mode support
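To illustrate the export options above, here is a minimal sketch of converting a transcript record into the three formats. The `Transcript` shape and function names are hypothetical; Scout's actual schema and export code may differ.

```typescript
// Hypothetical transcript shape for illustration only.
interface Transcript {
  id: number;
  createdAt: string; // ISO 8601 timestamp
  text: string;
}

// JSON export: pretty-printed serialization of the record.
function toJson(t: Transcript): string {
  return JSON.stringify(t, null, 2);
}

// Plain-text export: just the transcribed text.
function toText(t: Transcript): string {
  return t.text;
}

// Markdown export: a heading with metadata, then the text.
function toMarkdown(t: Transcript): string {
  return `## Transcript ${t.id} (${t.createdAt})\n\n${t.text}\n`;
}
```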
- Frontend: React with TypeScript + Vite
- Backend: Rust with Tauri v2
- Audio Processing: cpal (Cross-platform Audio Library) with Voice Activity Detection
- Transcription:
- Built-in Mode: whisper-rs with CoreML support for optimized performance
- External Service Mode: Standalone transcriber service with multiple AI models (Whisper, Parakeet MLX, Hugging Face)
- Database: SQLite with sqlx for local transcript storage
- Settings: JSON-based configuration system with hot-reload support
- File Processing: Background queue system with audio format conversion
- Service Management: macOS LaunchAgent integration for external transcription services
See docs/architecture/project-structure.md for detailed directory layout.
scout/
├── src/ # React frontend
│ ├── components/ # React components (ModelManager, Overlay, etc.)
│ ├── hooks/ # Custom React hooks
│ ├── contexts/ # React context providers
│ ├── lib/ # Utilities and helpers
│ └── types/ # TypeScript type definitions
├── src-tauri/ # Rust backend
│ ├── src/
│ │ ├── audio/ # Audio recording and conversion
│ │ ├── transcription/ # Whisper transcription engine
│ │ ├── db/ # SQLite database layer
│ │ ├── llm/ # LLM processing pipeline
│ │ ├── service_manager.rs # External service management
│ │ └── macos/ # macOS-specific overlay implementation
│ └── Cargo.toml # Rust dependencies
├── transcriber/ # Standalone transcription service
│ ├── src/ # Rust service core
│ ├── python/ # Python ML workers
│ └── README.md # Service documentation
├── docs/ # Technical documentation
│ ├── architecture/ # System design and structure
│ ├── features/ # Feature specifications
│ └── development/ # Development guides and testing
├── config/ # Build and development configuration
├── scripts/ # Setup and utility scripts
├── models/ # Downloaded Whisper model files
└── package.json # Node.js dependencies
- Node.js (v16 or later)
- pnpm (v8 or later): install with `npm install -g pnpm`
- Rust (latest stable)
- CMake (for building whisper.cpp)
- macOS, Windows, or Linux
- Download the latest release from the Releases page:
  - macOS (Apple Silicon): Scout_0.4.0_aarch64.dmg
  - macOS (Intel): Scout_0.4.0_x86_64.dmg
- Open the DMG file and drag Scout to your Applications folder
- On first launch, you'll need to download a Whisper model:
  - Open Scout
  - Go to Settings → Transcription Models
  - Download at least one model (Base recommended to start)
Note for macOS: You may need to right-click and select "Open" the first time to bypass Gatekeeper if the app isn't code-signed.
- Clone the repository:

  ```bash
  git clone https://github.com/arach/scout.git
  cd scout
  ```

- Install dependencies:

  ```bash
  pnpm install
  ```

- Run in development mode:

  ```bash
  pnpm tauri dev
  ```
The application will automatically download a Whisper model on first launch.
To build for production:

```bash
pnpm tauri build
```

This will create platform-specific binaries in src-tauri/target/release/bundle/.
- Launch the application
- Click the "Start Recording" button or use the global hotkey (Cmd+Shift+Space)
- Speak clearly into your microphone
- The native overlay shows recording status
- Click "Stop Recording" or press the hotkey again to end recording
- The transcript will appear automatically after processing
- Drag and drop audio files onto the recording area, or
- Click "Upload Audio File" to select files
- Supported formats: WAV, MP3, M4A, FLAC, and more
- Files are processed in the background queue
- View progress and results in the transcript list
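The background queue described above can be sketched as a simple FIFO that transcribes one file at a time in upload order. This is an illustrative model, not Scout's actual Rust implementation:

```typescript
// Illustrative FIFO background queue: jobs are processed sequentially,
// in the order they were enqueued. The `process` callback stands in for
// the real transcription step.
type Job = { file: string };

class ProcessingQueue {
  private jobs: Job[] = [];
  private running = false;
  readonly results: string[] = [];

  constructor(private process: (job: Job) => Promise<string>) {}

  enqueue(job: Job): void {
    this.jobs.push(job);
    void this.drain(); // kick off processing if idle
  }

  private async drain(): Promise<void> {
    if (this.running) return; // already draining
    this.running = true;
    while (this.jobs.length > 0) {
      const job = this.jobs.shift()!;
      this.results.push(await this.process(job));
    }
    this.running = false;
  }
}
```

Because `drain` refuses to run twice concurrently, files uploaded while a transcription is in flight simply wait their turn.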
- Open Settings to access the Model Manager
- Download different Whisper models based on your needs:
- Tiny (39MB): Fastest, basic accuracy
- Base (74MB): Good balance of speed and accuracy
- Small (244MB): Better accuracy, slower processing
- Medium (769MB): High accuracy
- Large (1550MB): Best accuracy, slowest
- Switch between models by clicking "Use This Model"
- The active model is shown with a green "Active" badge
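The size/accuracy trade-off in the list above can be captured in a small helper that picks the most accurate model fitting a download budget. The sizes come from the list; the helper itself is illustrative and not part of Scout:

```typescript
// Approximate download sizes (MB) from the model list above.
const MODEL_SIZES_MB = {
  tiny: 39,
  base: 74,
  small: 244,
  medium: 769,
  large: 1550,
} as const;

type ModelName = keyof typeof MODEL_SIZES_MB;

// Models ordered from most to least accurate.
const BY_ACCURACY: ModelName[] = ["large", "medium", "small", "base", "tiny"];

// Pick the most accurate model whose download fits the budget,
// or undefined if even `tiny` is too large.
function pickModel(budgetMb: number): ModelName | undefined {
  return BY_ACCURACY.find((m) => MODEL_SIZES_MB[m] <= budgetMb);
}
```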
For enhanced performance and additional model options:
- Install the Scout Transcriber Service
- Open Settings → Transcription → Switch to "Advanced Mode"
- Choose from advanced models:
- Parakeet MLX: NVIDIA's Parakeet model running on Apple's MLX framework, optimized for Apple Silicon
- Whisper Large V3 Turbo: OpenAI's optimized large Whisper model, available via Hugging Face
- Custom Models: Bring your own fine-tuned models
- Configure worker processes for parallel transcription
- Monitor service health in real-time
- Global Hotkeys: Customize the recording shortcut
- Overlay Position: Move the recording indicator to different screen positions
- Transcription Mode:
- Integrated: Use built-in Whisper models (simple, no setup)
- Advanced: Connect to external transcriber service (better performance, more models)
- Model Selection: Choose which model to use for transcription
- Voice Activity Detection: Enable/disable automatic silence detection
- External Service Configuration: Set up distributed transcription with custom ports and worker counts
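The settings above are stored in a JSON file; a rough sketch of what that shape might look like follows. Field names and defaults here are hypothetical, and Scout's actual schema may differ:

```typescript
// Hypothetical settings shape mirroring the options listed above.
interface ScoutSettings {
  hotkey: string; // e.g. "Cmd+Shift+Space"
  overlayPosition: "top" | "bottom" | "left" | "right";
  transcriptionMode: "integrated" | "advanced";
  model: string; // active Whisper model, e.g. "base"
  vadEnabled: boolean;
  // Only relevant in "advanced" mode.
  externalService?: {
    port: number;
    workers: number;
  };
}

const defaults: ScoutSettings = {
  hotkey: "Cmd+Shift+Space",
  overlayPosition: "top",
  transcriptionMode: "integrated",
  model: "base",
  vadEnabled: true,
};
```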
- Core Application: Tauri v2 project with React/TypeScript frontend
- Audio Recording: High-quality recording with cpal and Voice Activity Detection
- Transcription: Full whisper-rs integration with CoreML optimization
- Model Management: Download, switch, and manage multiple Whisper models
- File Upload: Drag & drop support with automatic audio format conversion
- Settings System: JSON-based configuration with hotkey customization
- Database: SQLite storage with full transcript management
- Native Overlay: macOS-specific recording indicator with positioning
- Background Processing: Queued file processing system
- Search & Export: Full-text search and export in multiple formats
- Global Hotkeys: Customizable shortcuts for hands-free operation
- UI/UX: VSCode-inspired theme with responsive design
- Testing Infrastructure: Comprehensive test suite with 100% passing rate
- Advanced VAD tuning and noise reduction
- Real-time streaming transcription
- Cloud sync options (optional)
- Plugin system for custom workflows
- Multiple language support beyond English
- Custom model training utilities
- Team collaboration features
- API endpoint for external integrations
```bash
# Development
pnpm dev          # Start Vite dev server
pnpm tauri dev    # Run full app in development mode

# Build
pnpm build        # TypeScript + Vite build
pnpm tauri build  # Build production binaries

# Testing
cd src-tauri && cargo test  # Run Rust tests
```
Scout has comprehensive test coverage, with a 97.6% success rate (163/167 tests passing):
```bash
# Frontend tests (Vitest + React Testing Library)
pnpm test

# Rust tests
cd src-tauri
cargo test

# Run tests with coverage
pnpm test --coverage
```
- Frontend: ESLint and Prettier
- Backend: rustfmt and clippy
- User-perceived latency: <300ms
- Memory usage: <215MB for base model
- Processing efficiency: 0.1-0.5 RTF (real-time factor) for small models
- User-perceived latency: <200ms with Parakeet MLX
- Parallel processing: 2-8 concurrent transcriptions
- Memory usage: ~500MB base + 200MB per worker
- Processing efficiency: 0.03-0.1 RTF with optimized models
- Automatic failover to built-in mode if service unavailable
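RTF (real-time factor) is processing time divided by audio duration, so values below 1.0 mean transcription runs faster than the audio plays. A quick worked example:

```typescript
// Real-time factor: processing time / audio duration.
// RTF < 1.0 means the transcriber keeps ahead of real time.
function rtf(processingSeconds: number, audioSeconds: number): number {
  return processingSeconds / audioSeconds;
}

// A 60-second clip processed in 6 seconds gives RTF 0.1,
// the upper end of the 0.03-0.1 range quoted above.
```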
- All processing is done locally
- No network requests for transcription
- Audio files are stored temporarily and deleted after processing
- Database is stored in the app's local data directory
[License information to be added]
[Contributing guidelines to be added]