Real-time Voice-Controlled 3D Avatar with Multimodal AI
Your 3D AI companion that never stops listening, never stops caring.
Transform conversations into immersive experiences with AI-powered 3D avatars that see, hear, and respond naturally.
- 🎤 Real-time Voice Activity Detection - Advanced VAD with configurable sensitivity (a minimal sketch follows this list)
- 🗣️ Speech-to-Text - Powered by OpenAI Whisper (tiny model) for instant transcription
- 👁️ Vision Understanding - SmolVLM2-256M-Video-Instruct for multimodal comprehension
- 🔊 Natural Text-to-Speech - Kokoro TTS with native word-level timing
- 🎭 3D Avatar Animation - Lip-sync and emotion-driven animations using TalkingHead
- 📹 Camera Integration - Real-time image capture with voice commands
- ⚡ Streaming Responses - Chunked audio generation for minimal latency
- 🎬 Native Timing Sync - Perfect lip-sync using Kokoro's native timing data
- 🎨 Draggable Camera View - Floating, resizable camera interface
- 📊 Real-time Analytics - Voice energy visualization and transmission tracking
- 🔌 WebSocket Communication - Low-latency bidirectional data flow
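The energy-based VAD above can be pictured as a threshold over per-frame RMS energy. The sketch below is a minimal illustration assuming float32 audio frames; the function names, threshold, and frame handling are made up for this example, not the project's actual implementation.

```python
import numpy as np

def frame_energy(frame: np.ndarray) -> float:
    """RMS energy of one audio frame (float32 samples in [-1, 1])."""
    return float(np.sqrt(np.mean(frame ** 2)))

def detect_speech(frames, energy_threshold=0.02, min_speech_frames=5):
    """Yield concatenated runs of frames whose energy exceeds the threshold.

    energy_threshold and min_speech_frames are illustrative defaults,
    not the project's tuned values.
    """
    run = []
    for frame in frames:
        if frame_energy(frame) >= energy_threshold:
            run.append(frame)
        elif run:
            if len(run) >= min_speech_frames:
                yield np.concatenate(run)
            run = []
    if len(run) >= min_speech_frames:
        yield np.concatenate(run)
```

Raising `energy_threshold` makes the detector less sensitive to background noise, which is exactly the "configurable sensitivity" knob the feature list refers to.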
- 🧠 AI Models from HuggingFace 🤗 (loading sketch below):
  - openai/whisper-tiny - Speech recognition
  - HuggingFaceTB/SmolVLM2-256M-Video-Instruct - Vision-language understanding
  - Kokoro TTS - High-quality voice synthesis
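For orientation, the Whisper checkpoint can be pulled from the Hub with the standard `transformers` pipeline. This is a generic loading sketch, not the project's actual code (which may add device placement or Flash Attention 2); `audio.wav` is a placeholder path.

```python
from transformers import pipeline

# Downloads openai/whisper-tiny from the HuggingFace Hub on first use.
asr = pipeline("automatic-speech-recognition", model="openai/whisper-tiny")

# Transcribe a recording; "audio.wav" is a placeholder file path.
result = asr("audio.wav")
print(result["text"])
```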
- ⚡ Framework: FastAPI with WebSocket support (endpoint sketch below)
- 🧠 Processing: PyTorch, Transformers, Flash Attention 2
- 🎵 Audio: SoundFile, NumPy for real-time processing
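A minimal FastAPI WebSocket endpoint of the kind this stack implies could look like the following. The `/ws` route name and message shapes are assumptions for illustration, not the project's actual protocol.

```python
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()

@app.websocket("/ws")  # route name is an assumption
async def audio_stream(websocket: WebSocket):
    """Accept a socket, receive audio chunks, and stream replies back."""
    await websocket.accept()
    try:
        while True:
            chunk = await websocket.receive_bytes()  # raw PCM from the browser
            # ... VAD -> Whisper -> VLM -> Kokoro TTS would run here ...
            await websocket.send_json({"received_bytes": len(chunk)})
    except WebSocketDisconnect:
        pass  # client closed the connection
```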
- 🖼️ Framework: Next.js 15 with TypeScript
- 🎨 UI: Tailwind CSS + shadcn/ui components
- 🎭 3D Rendering: TalkingHead library
- 🎙️ Audio: Web Audio API with AudioWorklet
- 📡 Communication: Native WebSocket with React Context
- 📦 Package Management: UV (Python) + PNPM (Node.js)
- 🎨 Code Formatting:
  - Backend: Black (Python)
  - Frontend: Prettier (TypeScript/React)
- 🔍 Quality Control: Husky for pre-commit hooks
- OS: Windows 11 (Linux/macOS support coming soon via a Docker image)
- GPU: NVIDIA RTX 3070 (8GB VRAM)
- Node.js 20+
- PNPM
- Python 3.10
- UV (Python package manager)
# Sets up both frontend and backend (requires the prerequisites above)
pnpm run monorepo-setup
# Format code before committing (recommended)
pnpm format
Start Development Servers
# Run both frontend and backend from root
pnpm dev
# Or run individually
pnpm dev:client # Frontend (http://localhost:3000)
pnpm dev:server # Backend (http://localhost:8000)
- Allow microphone access when prompted
- Enable camera for multimodal interactions
- Click "Connect" to establish WebSocket connection
- Start Voice Control and begin speaking!
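To sanity-check the backend outside the browser, a tiny client built on the `websockets` package can connect and push one silent chunk. The endpoint path and payload format are assumptions; check the server code for the real contract.

```python
import asyncio
import websockets

async def main():
    # ws://localhost:8000/ws is an assumed endpoint path.
    async with websockets.connect("ws://localhost:8000/ws") as ws:
        await ws.send(b"\x00" * 3200)  # ~100 ms of silent 16-bit PCM at 16 kHz
        print(await ws.recv())         # whatever the server sends back

asyncio.run(main())
```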
- Drag to move camera window
- Resize with maximize/minimize buttons
- Toggle on/off as needed
- Energy Threshold: Adjust sensitivity to background noise
- Pause Duration: How long to wait before processing speech
- Min/Max Speech: Control segment length limits (grouped in the config sketch below)
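Conceptually these knobs form a single configuration object. A hypothetical grouping, with illustrative names and defaults rather than the project's actual ones:

```python
from dataclasses import dataclass

@dataclass
class VADConfig:
    """Hypothetical container for the tunable VAD settings listed above."""
    energy_threshold: float = 0.02  # higher -> less sensitive to background noise
    pause_duration_s: float = 0.8   # silence required before a segment is finalized
    min_speech_s: float = 0.3       # discard blips shorter than this
    max_speech_s: float = 15.0      # force a cut on very long utterances
```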
- TalkingHead (met4citizen) for 3D avatar rendering and lip-sync
- yeyu2 (Multimodal-local-phi4) for multimodal implementation inspiration
⭐ Star this repo if you find it useful! ⭐
Made with ❤️ by Kiranbaby14