A Go-based audio transcription tool that uses Gemini on Google Cloud Vertex AI to transcribe audio files with timecoded speaker identification and to generate summaries.
Google Cloud Setup
```bash
# Install gcloud CLI and authenticate
gcloud auth application-default login

# Set your project ID
export GCP_PROJECT="your-project-id"
```
Install Dependencies
- Go 1.19+
- ffmpeg (for audio splitting)
- Clone and build:

```bash
git clone https://github.com/owulveryck/audiotranscribe.git
cd audiotranscribe
go build -o audiotranscribe .
```
Usage

Single audio file:

```bash
./audiotranscribe audio.m4a
```

Multiple audio files with output file:

```bash
./audiotranscribe -o transcript.md audio1.m4a audio2.m4a
```

Large files (auto-split into 25-minute chunks):

```bash
./split_and_transcribe.sh large_audio.m4a
```

Environment Variables

- `GCP_PROJECT` (required): your Google Cloud project ID
- `GEMINI_MODEL` (optional): Gemini model to use (default: `"gemini-2.0-flash"`)
- `GCP_REGION` (optional): GCP region (default: `"europe-west9"`)
The tool generates markdown files with:
- Timestamped transcripts with speaker identification
- Combined summaries for multiple files
- Structured format for easy reading
Output is placed in the same directory as the input files.