Live Audio Transcription

Real-time audio transcription with a Go backend and a web frontend. The app captures microphone audio in the browser, streams it to the server over WebSocket, transcribes it live with Google Cloud Speech-to-Text, and summarizes the transcript with Gemini models on Vertex AI.

Quick Start

Prerequisites

Go 1.24+ installed
Google Cloud account with an active project
Google Cloud CLI (gcloud) installed

1. Google Cloud Setup

Install and configure gcloud CLI:

# Install gcloud CLI (if not already installed)
# Visit: https://cloud.google.com/sdk/docs/install

# Initialize gcloud and authenticate
gcloud init

# Set up Application Default Credentials
gcloud auth application-default login

Create/configure your GCP project:

# Set your project ID (replace with your actual project)
export GCP_PROJECT_ID=your-project-id
export GCP_LOCATION=us-central1

# Set the project as default
gcloud config set project $GCP_PROJECT_ID

# Enable required APIs
gcloud services enable speech.googleapis.com
gcloud services enable aiplatform.googleapis.com

Set environment variables:

# Add these to your shell profile (.bashrc, .zshrc, etc.)
export GCP_PROJECT_ID=your-project-id
export GCP_LOCATION=us-central1

# AI Model Configuration
export GEMINI_MODEL=gemini-2.5-flash  # Gemini model to use (default: gemini-2.5-flash)

# Logging Configuration
export LOG_LEVEL=INFO    # DEBUG, INFO, WARN, ERROR (default: INFO)
export LOG_FORMAT=JSON   # JSON, TEXT (default: JSON)

# Preset Configuration
export PRESET_DIRECTORY=./presets  # Directory containing preset files (default: ./presets)

# Optional: Set custom port (default: 8080)
export PORT=8080
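
As an illustration of how the variables above and their documented defaults could be read on the Go side, here is a minimal sketch. The `envOr` helper, the `Config` struct, and `loadConfig` are hypothetical names chosen for this example and are not taken from the project's source; only the variable names and default values come from this README.

```go
package main

import (
	"fmt"
	"os"
)

// envOr returns the value of key, or fallback when the variable is unset or empty.
// Hypothetical helper for this example only.
func envOr(key, fallback string) string {
	if v := os.Getenv(key); v != "" {
		return v
	}
	return fallback
}

// Config groups the settings documented above (hypothetical struct).
type Config struct {
	ProjectID   string
	Location    string
	GeminiModel string
	LogLevel    string
	LogFormat   string
	PresetDir   string
	Port        string
}

// loadConfig applies the defaults listed in this README.
// GCP_PROJECT_ID and GCP_LOCATION have no documented default, so they stay empty when unset.
func loadConfig() Config {
	return Config{
		ProjectID:   os.Getenv("GCP_PROJECT_ID"),
		Location:    os.Getenv("GCP_LOCATION"),
		GeminiModel: envOr("GEMINI_MODEL", "gemini-2.5-flash"),
		LogLevel:    envOr("LOG_LEVEL", "INFO"),
		LogFormat:   envOr("LOG_FORMAT", "JSON"),
		PresetDir:   envOr("PRESET_DIRECTORY", "./presets"),
		Port:        envOr("PORT", "8080"),
	}
}

func main() {
	fmt.Printf("%+v\n", loadConfig())
}
```

Treating an empty value the same as an unset one is a design choice of this sketch; the actual server may behave differently.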

2. Run the Application

# Install Go dependencies
go mod tidy

# Run the server
go run main.go

3. Access the Web Interface

Open your browser and navigate to: http://localhost:8080

HTTPS Support (Optional)

For secure connections, the server automatically detects SSL/TLS certificate files and enables HTTPS.

Note: A secure connection is required to reach the server from any host other than localhost, because browsers only allow microphone capture in a secure context.

Generate Self-Signed Certificate

# Generate a private key
openssl genrsa -out server.key 2048

# Generate a self-signed certificate (valid for 365 days)
openssl req -new -x509 -key server.key -out server.crt -days 365

# You'll be prompted for certificate details:
# - Country Name: US
# - State: Your State  
# - City: Your City
# - Organization: Your Organization
# - Organizational Unit: IT Department
# - Common Name: localhost (IMPORTANT: use 'localhost' for local development)
# - Email: your-email@domain.com

Note: When prompted for "Common Name", enter localhost so the certificate matches the hostname used during local development. The browser will still flag the certificate as self-signed.

HTTPS Access

Once certificate files (server.crt and server.key) are present in the project directory:

Server automatically starts in HTTPS mode
Access via: https://localhost:8080
WebSocket connections use: wss://localhost:8080/ws
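
The automatic switch described above can be sketched with the standard library alone. This is an assumption about how such detection might work, not the project's actual code: `fileExists`, `schemeFor`, and `serve` are hypothetical names, while `http.ListenAndServeTLS` and `http.ListenAndServe` are the standard `net/http` functions.

```go
package main

import (
	"fmt"
	"net/http"
	"os"
)

// fileExists reports whether path names an existing regular file or similar non-directory entry.
func fileExists(path string) bool {
	info, err := os.Stat(path)
	return err == nil && !info.IsDir()
}

// schemeFor returns "https" only when both the certificate and the key are present.
func schemeFor(certFile, keyFile string) string {
	if fileExists(certFile) && fileExists(keyFile) {
		return "https"
	}
	return "http"
}

// serve starts a TLS listener when certificates are present, plain HTTP otherwise.
// It blocks until the server stops, so it is not called from main in this sketch.
func serve(addr, certFile, keyFile string, h http.Handler) error {
	if schemeFor(certFile, keyFile) == "https" {
		return http.ListenAndServeTLS(addr, certFile, keyFile, h)
	}
	return http.ListenAndServe(addr, h)
}

func main() {
	// Only report the decision here; call serve(":8080", ...) to actually listen.
	fmt.Println("scheme:", schemeFor("server.crt", "server.key"))
}
```

Requiring both files before enabling TLS avoids a startup failure when only one of them has been generated.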

Certificate Security

⚠️ Self-signed certificates will show browser warnings. For production use, obtain certificates from a trusted Certificate Authority (CA) like Let's Encrypt.

Features

Real-time speech transcription with interim/final results
Multi-language support with auto-detection
AI-powered summarization with configurable Gemini models
Audio visualization and session statistics
Copy transcripts and summaries to clipboard
Pre-configured prompt presets for different use cases (meetings, interviews, lectures)
Configurable logging with structured output

Configuration

Languages: Configure BCP-47 codes (default: en-US,fr-FR,es-ES)
Logging: Set LOG_LEVEL (DEBUG/INFO/WARN/ERROR) and LOG_FORMAT (JSON/TEXT)
Port: Set PORT environment variable (default: 8080)
Audio: 16 kHz, LINEAR16, mono
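
To make the audio format concrete: LINEAR16 means uncompressed 16-bit signed little-endian PCM, so one second of 16 kHz mono audio is 32,000 bytes. The sketch below converts floating-point samples in the range [-1, 1] to that byte layout. It is illustrative only: this README does not say where the conversion happens (browser or server), and `floatToLinear16` is a hypothetical function name.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const sampleRate = 16000 // samples per second, mono

// floatToLinear16 converts samples in [-1, 1] to 16-bit signed little-endian PCM.
// Out-of-range samples are clamped before scaling.
func floatToLinear16(samples []float32) []byte {
	out := make([]byte, 2*len(samples))
	for i, s := range samples {
		if s > 1 {
			s = 1
		} else if s < -1 {
			s = -1
		}
		v := int16(s * 32767)
		binary.LittleEndian.PutUint16(out[2*i:], uint16(v))
	}
	return out
}

func main() {
	pcm := floatToLinear16([]float32{0, 0.5, -0.5, 1})
	fmt.Printf("%d samples -> %d bytes\n", 4, len(pcm))
	fmt.Printf("bytes per second at %d Hz mono: %d\n", sampleRate, sampleRate*2)
}
```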

Presets

The application supports prompt presets for different use cases. Presets are stored as text files in the presets directory (configurable via PRESET_DIRECTORY environment variable).

Built-in Presets

General Summary: Basic conversation summarization
Meeting Summary: Business meeting focused with decisions and action items
Interview Summary: Job interview evaluation and assessment
Lecture Notes: Educational content with key concepts and takeaways

Custom Presets

Create custom presets by:

Setting PRESET_DIRECTORY environment variable to your custom directory
Creating .txt files following this format:

Title: Your Preset Title
Summary: Your summary prompt content here...
This can span multiple lines.

Conclusion: Your conclusion prompt content here...
This can also span multiple lines.

Restarting the application to load new presets
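
For readers who want to validate a preset file before restarting the server, the format above can be parsed along these lines. This is a sketch under assumptions: the README shows the three field labels but not the exact parsing rules, so treating every unlabelled line as a continuation of the preceding Summary or Conclusion field is a guess, and `Preset` and `parsePreset` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// Preset holds the three fields shown in the file format above (hypothetical type).
type Preset struct {
	Title      string
	Summary    string
	Conclusion string
}

// parsePreset reads "Title:", "Summary:" and "Conclusion:" fields.
// Lines without a label are appended to the field that was opened last (assumed rule).
func parsePreset(text string) Preset {
	var p Preset
	var summary, conclusion []string
	section := ""
	for _, line := range strings.Split(text, "\n") {
		line = strings.TrimRight(line, "\r")
		switch {
		case strings.HasPrefix(line, "Title:"):
			p.Title = strings.TrimSpace(strings.TrimPrefix(line, "Title:"))
			section = ""
		case strings.HasPrefix(line, "Summary:"):
			section = "summary"
			summary = append(summary, strings.TrimSpace(strings.TrimPrefix(line, "Summary:")))
		case strings.HasPrefix(line, "Conclusion:"):
			section = "conclusion"
			conclusion = append(conclusion, strings.TrimSpace(strings.TrimPrefix(line, "Conclusion:")))
		case section == "summary":
			summary = append(summary, line)
		case section == "conclusion":
			conclusion = append(conclusion, line)
		}
	}
	p.Summary = strings.TrimSpace(strings.Join(summary, "\n"))
	p.Conclusion = strings.TrimSpace(strings.Join(conclusion, "\n"))
	return p
}

func main() {
	p := parsePreset("Title: Standup\nSummary: List blockers.\nKeep it short.\n\nConclusion: List next steps.\n")
	fmt.Printf("%+v\n", p)
}
```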

API Endpoints

GET / - Web interface
GET /api/default-prompt - Returns the default summary prompt as JSON
GET /api/presets - Returns available preset names and titles as JSON
GET /api/presets/{name} - Returns specific preset content (title, summary, conclusion)
WebSocket /ws - Real-time audio streaming and transcription
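
A small client can exercise the JSON endpoints listed above. The sketch assumes the server is running on http://localhost:8080 and that a preset named "meeting" exists; both are assumptions for illustration. Because the README does not document the exact JSON shape, the response is decoded into a generic value instead of a typed struct. `presetURL` and `fetchJSON` are hypothetical helper names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
	"strings"
	"time"
)

// presetURL builds the GET /api/presets/{name} URL, escaping the preset name.
func presetURL(base, name string) string {
	return strings.TrimRight(base, "/") + "/api/presets/" + url.PathEscape(name)
}

// fetchJSON performs a GET request and decodes the body as arbitrary JSON.
func fetchJSON(client *http.Client, target string) (any, error) {
	resp, err := client.Get(target)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("GET %s: unexpected status %s", target, resp.Status)
	}
	var body any
	if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
		return nil, err
	}
	return body, nil
}

func main() {
	client := &http.Client{Timeout: 5 * time.Second}
	// "meeting" is a placeholder; list real names with GET /api/presets first.
	body, err := fetchJSON(client, presetURL("http://localhost:8080", "meeting"))
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Printf("%v\n", body)
}
```

When the server runs in HTTPS mode with a self-signed certificate, the default client will reject the connection unless the certificate is trusted on the machine running the client.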

Build

go build -o live_transcription main.go
./live_transcription

Repository: owulveryck/live_transcription
