Real-time audio transcription with a Go backend and a web frontend. The app captures microphone audio, streams it over WebSocket, and provides live transcription using Google Cloud Speech-to-Text, with summarization by Gemini models on Vertex AI.
- Go 1.24+ installed
- Google Cloud account with an active project
- Google Cloud CLI (gcloud) installed
Install and configure gcloud CLI:
# Install gcloud CLI (if not already installed)
# Visit: https://cloud.google.com/sdk/docs/install
# Initialize gcloud and authenticate
gcloud init
# Set up Application Default Credentials
gcloud auth application-default login
Create/configure your GCP project:
# Set your project ID (replace with your actual project)
export GCP_PROJECT_ID=your-project-id
export GCP_LOCATION=us-central1
# Set the project as default
gcloud config set project $GCP_PROJECT_ID
# Enable required APIs
gcloud services enable speech.googleapis.com
gcloud services enable aiplatform.googleapis.com
Set environment variables:
# Add these to your shell profile (.bashrc, .zshrc, etc.)
export GCP_PROJECT_ID=your-project-id
export GCP_LOCATION=us-central1
# AI Model Configuration
export GEMINI_MODEL=gemini-2.5-flash # Gemini model to use (default: gemini-2.5-flash)
# Logging Configuration
export LOG_LEVEL=INFO # DEBUG, INFO, WARN, ERROR (default: INFO)
export LOG_FORMAT=JSON # JSON, TEXT (default: JSON)
# Preset Configuration
export PRESET_DIRECTORY=./presets # Directory containing preset files (default: ./presets)
# Optional: Set custom port (default: 8080)
export PORT=8080
# Install Go dependencies
go mod tidy
# Run the server
go run main.go
Open your browser and navigate to: http://localhost:8080
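The environment variables listed above, with their documented defaults, could be read along these lines. This is a minimal sketch, not the project's actual code: the `Config` type, the `LoadConfig` function, and the decision to treat a missing `GCP_PROJECT_ID` as an error are all assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// Config mirrors the environment variables documented above.
// The type and its field names are hypothetical.
type Config struct {
	ProjectID       string // GCP_PROJECT_ID
	Location        string // GCP_LOCATION
	GeminiModel     string // GEMINI_MODEL
	LogLevel        string // LOG_LEVEL
	LogFormat       string // LOG_FORMAT
	PresetDirectory string // PRESET_DIRECTORY
	Port            string // PORT
}

// LoadConfig reads settings through getenv (normally os.Getenv) and
// applies the defaults stated in the README.
func LoadConfig(getenv func(string) string) (Config, error) {
	// orDefault returns the variable's value, or fallback when unset or empty.
	orDefault := func(key, fallback string) string {
		if v := getenv(key); v != "" {
			return v
		}
		return fallback
	}

	cfg := Config{
		ProjectID:       getenv("GCP_PROJECT_ID"),
		Location:        getenv("GCP_LOCATION"), // no default is documented
		GeminiModel:     orDefault("GEMINI_MODEL", "gemini-2.5-flash"),
		LogLevel:        orDefault("LOG_LEVEL", "INFO"),
		LogFormat:       orDefault("LOG_FORMAT", "JSON"),
		PresetDirectory: orDefault("PRESET_DIRECTORY", "./presets"),
		Port:            orDefault("PORT", "8080"),
	}

	// Assumption: a missing project ID is treated as a fatal error.
	if cfg.ProjectID == "" {
		return Config{}, errors.New("GCP_PROJECT_ID is not set")
	}
	return cfg, nil
}

func main() {
	cfg, err := LoadConfig(os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, "config error:", err)
		os.Exit(1)
	}
	fmt.Printf("%+v\n", cfg)
}
```

Passing the lookup function in as a parameter keeps the loader testable without touching the real process environment.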
For secure connections, the server automatically detects SSL certificate files and enables HTTPS:
Note: A secure connection (HTTPS) is required to use the app from any host other than localhost, because browsers only allow microphone access in secure contexts.
# Generate a private key
openssl genrsa -out server.key 2048
# Generate a self-signed certificate (valid for 365 days)
openssl req -new -x509 -key server.key -out server.crt -days 365
# You'll be prompted for certificate details:
# - Country Name: US
# - State: Your State
# - City: Your City
# - Organization: Your Organization
# - Organizational Unit: IT Department
# - Common Name: localhost (IMPORTANT: use 'localhost' for local development)
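# - Email: your-email@domain.com
Note: When prompted for "Common Name", enter localhost so the certificate matches the host name used during local development. Browsers may still warn that a self-signed certificate is untrusted until you accept or trust it.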
Once certificate files (server.crt and server.key) are present in the project directory:
- Server automatically starts in HTTPS mode
- Access via: https://localhost:8080
- WebSocket connections use: wss://localhost:8080/ws
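The certificate auto-detection described above might look roughly like the following. This is only a sketch under assumptions: the helper names `tlsAvailable` and `schemes` are hypothetical, and the real server may check for the files differently. It uses only the standard library's `http.ListenAndServe` and `http.ListenAndServeTLS`.

```go
package main

import (
	"log"
	"net/http"
	"os"
)

// fileExists reports whether path names an existing regular file.
func fileExists(path string) bool {
	info, err := os.Stat(path)
	return err == nil && info.Mode().IsRegular()
}

// tlsAvailable reports whether both the certificate and the key are present.
func tlsAvailable(certFile, keyFile string) bool {
	return fileExists(certFile) && fileExists(keyFile)
}

// schemes returns the page and WebSocket URL schemes for the chosen mode.
func schemes(useTLS bool) (page, socket string) {
	if useTLS {
		return "https", "wss"
	}
	return "http", "ws"
}

func main() {
	const certFile, keyFile = "server.crt", "server.key"
	const addr = ":8080"

	mux := http.NewServeMux() // the application's routes would be registered here

	useTLS := tlsAvailable(certFile, keyFile)
	page, socket := schemes(useTLS)
	log.Printf("serving %s://localhost%s (WebSocket: %s://localhost%s/ws)",
		page, addr, socket, addr)

	var err error
	if useTLS {
		err = http.ListenAndServeTLS(addr, certFile, keyFile, mux)
	} else {
		err = http.ListenAndServe(addr, mux)
	}
	log.Fatal(err)
}
```

Requiring both files before switching modes avoids a startup failure when only one of them has been generated.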
- Real-time speech transcription with interim/final results
- Multi-language support with auto-detection
- AI-powered summarization with configurable Gemini models
- Audio visualization and session statistics
- Copy transcripts and summaries to clipboard
- Pre-configured prompt presets for different use cases (meetings, interviews, lectures)
- Configurable logging with structured output
- Languages: Configure BCP-47 codes (default: en-US,fr-FR,es-ES)
- Logging: Set LOG_LEVEL (DEBUG/INFO/WARN/ERROR) and LOG_FORMAT (JSON/TEXT)
- Port: Set PORT environment variable (default: 8080)
- Audio: 16kHz LINEAR16 mono format
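LINEAR16 is uncompressed 16-bit signed little-endian PCM, so 16 kHz mono audio amounts to 32,000 bytes per second. The source does not say where the conversion to this format happens; the sketch below only illustrates the encoding itself, turning normalized floating-point samples into LINEAR16 bytes. The function name `floatToLinear16` is hypothetical.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// bytesPerSecond is the raw data rate of 16 kHz, 16-bit, mono audio.
const bytesPerSecond = 16000 * 2 * 1

// floatToLinear16 converts normalized samples in [-1, 1] to 16-bit signed
// little-endian PCM. Out-of-range values are clamped and NaN becomes silence.
func floatToLinear16(samples []float32) []byte {
	out := make([]byte, 2*len(samples))
	for i, s := range samples {
		v := math.Round(float64(s) * math.MaxInt16)
		if math.IsNaN(v) {
			v = 0
		}
		v = math.Max(math.MinInt16, math.Min(math.MaxInt16, v))
		binary.LittleEndian.PutUint16(out[2*i:], uint16(int16(v)))
	}
	return out
}

func main() {
	pcm := floatToLinear16([]float32{0, 0.5, -1, 1})
	fmt.Printf("% x\n", pcm)
	fmt.Println("bytes per second:", bytesPerSecond)
}
```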
The application supports prompt presets for different use cases. Presets are stored as text files in the presets directory (configurable via PRESET_DIRECTORY environment variable).
- General Summary: Basic conversation summarization
- Meeting Summary: Business meeting focused with decisions and action items
- Interview Summary: Job interview evaluation and assessment
- Lecture Notes: Educational content with key concepts and takeaways
Create custom presets by:
- Setting PRESET_DIRECTORY environment variable to your custom directory
- Creating .txt files following this format:
Title: Your Preset Title
Summary: Your summary prompt content here...
This can span multiple lines.
Conclusion: Your conclusion prompt content here...
This can also span multiple lines.
- Restarting the application to load new presets
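A parser for the preset format shown above could be sketched as follows. The application's real parsing rules are not documented here, so this is an illustration under assumptions: the `Preset` type and `ParsePreset` function are hypothetical, field markers are assumed to start at the beginning of a line, and a preset without a title is rejected.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Preset holds the three fields of a preset file. The type is hypothetical.
type Preset struct {
	Title      string
	Summary    string
	Conclusion string
}

// ParsePreset reads "Title:", "Summary:" and "Conclusion:" fields.
// Lines without a field marker continue the most recent field.
func ParsePreset(text string) (Preset, error) {
	var p Preset
	fields := []struct {
		prefix string
		dst    *string
	}{
		{"Title:", &p.Title},
		{"Summary:", &p.Summary},
		{"Conclusion:", &p.Conclusion},
	}

	var current *string // field that continuation lines are appended to
	for _, line := range strings.Split(text, "\n") {
		line = strings.TrimRight(line, "\r")
		matched := false
		for _, f := range fields {
			if rest, ok := strings.CutPrefix(line, f.prefix); ok {
				*f.dst = strings.TrimSpace(rest)
				current = f.dst
				matched = true
				break
			}
		}
		if !matched && current != nil {
			*current += "\n" + line
		}
	}

	p.Title = strings.TrimSpace(p.Title)
	p.Summary = strings.TrimSpace(p.Summary)
	p.Conclusion = strings.TrimSpace(p.Conclusion)

	// Assumption: a preset without a title is considered invalid.
	if p.Title == "" {
		return Preset{}, errors.New("preset has no Title field")
	}
	return p, nil
}

func main() {
	text := "Title: Standup\nSummary: List each update.\nKeep it short.\nConclusion: List blockers.\n"
	p, err := ParsePreset(text)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%q\n%q\n%q\n", p.Title, p.Summary, p.Conclusion)
}
```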
- GET / - Web interface
- GET /api/default-prompt - Returns the default summary prompt as JSON
- GET /api/presets - Returns available preset names and titles as JSON
- GET /api/presets/{name} - Returns specific preset content (title, summary, conclusion)
- WebSocket /ws - Real-time audio streaming and transcription
go build -o live_transcription main.go
./live_transcription