An audio-first interface for AI conversation, built with Next.js and OpenAI.
This project provides a simple web interface to interact with an AI assistant using your voice. It handles audio recording, transcription, chat completion, and speech synthesis to create a seamless conversational experience.
- Voice-to-Text: Transcribes your speech in real time using OpenAI's `whisper-1` model.
- AI Chat: Generates intelligent, conversational responses using OpenAI's `gpt-4o-mini` model.
- Text-to-Speech: Converts the AI's text response into natural-sounding audio using OpenAI's `tts-1` model.
- Message Playback: Each AI response has a play button to replay the audio anytime.
- Audio Caching: Audio responses are cached to avoid repeated API calls for the same message.
- Conversation Persistence: Chat history is saved locally and persists across page refreshes.
- Responsive UI: A clean and simple interface that displays the conversation history and provides clear status indicators.
- Modern Tech Stack: Built with Next.js App Router, TypeScript, and Tailwind CSS.
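The first three features form one pipeline per voice turn: the recorded audio is transcribed, the transcript prompts the chat model, and the reply is synthesized to speech. A minimal sketch of that flow, with each OpenAI call injected as a plain function so the shape stands on its own (the function and type names here are illustrative, not the project's actual code):

```typescript
// Each stage is passed in as a function, so the pipeline's shape is visible
// without an API key: transcribe (whisper-1) -> complete (gpt-4o-mini)
// -> synthesize (tts-1).
type Transcribe = (audio: Uint8Array) => Promise<string>;
type Complete = (prompt: string) => Promise<string>;
type Synthesize = (text: string) => Promise<Uint8Array>;

async function voiceTurn(
  audio: Uint8Array,
  transcribe: Transcribe,
  complete: Complete,
  synthesize: Synthesize,
): Promise<{ transcript: string; reply: string; speech: Uint8Array }> {
  const transcript = await transcribe(audio); // "Transcribing..."
  const reply = await complete(transcript);   // "Thinking..."
  const speech = await synthesize(reply);     // "Speaking..."
  return { transcript, reply, speech };
}
```

Because the stages are sequential, the UI can surface a distinct status message as each one runs, which is exactly what the status indicators below reflect.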
Follow these steps to get the project running on your local machine.
You will need an OpenAI API key to use the service.
1. Install dependencies:

   ```bash
   npm install
   # or
   yarn install
   # or
   pnpm install
   # or
   bun install
   ```

2. Set up environment variables: Create a file named `.env.local` in the root of the project and add your OpenAI API key:

   ```bash
   OPENAI_API_KEY="your_openai_api_key_here"
   ```

   Note: This project uses the OpenAI API, which may incur costs.

3. Run the development server:

   ```bash
   npm run dev
   # or
   yarn dev
   # or
   pnpm dev
   # or
   bun dev
   ```

4. Open http://localhost:3000 with your browser to see the result.
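On the server, Next.js loads `.env.local` into `process.env` automatically. A small sanity check like the following (an illustrative helper, not part of the project's code) fails fast with a clear message instead of a confusing API error when the key is missing:

```typescript
// Read the OpenAI key from the environment and fail loudly if it's absent.
// Next.js loads .env.local into process.env on the server side.
function requireApiKey(): string {
  const key = process.env.OPENAI_API_KEY;
  if (!key) {
    throw new Error("Missing OPENAI_API_KEY: add it to .env.local");
  }
  return key;
}
```

A server route handler would call this once before constructing the OpenAI client, so a misconfigured deployment is caught on the first request.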
The interface is designed to be simple and intuitive:
- Click the microphone button to start recording your voice. Your browser may ask for microphone permission the first time.
- The button will turn red and pulse, and a timer will show the recording duration.
- Speak your question or message.
- Click the microphone button again to stop recording.
- The application will process your audio. You will see status updates on the screen: "Transcribing...", "Thinking...", and "Speaking...".
- The AI's response will be spoken out loud, and the full conversation will appear on the screen.
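The record-stop-process loop above can be sketched with the browser's standard `MediaRecorder` API. This is a browser-only sketch, not the project's actual handler; the callback name is illustrative, and the globals are declared loosely so the snippet also type-checks outside the browser:

```typescript
// Browser-only sketch of the mic button: the first click starts recording,
// the second click stops it and hands the captured audio chunks to a callback.
// `navigator` and `MediaRecorder` are standard Web APIs, declared loosely here.
declare const navigator: any;
declare const MediaRecorder: any;

let recorder: any = null;
const chunks: unknown[] = [];

async function onMicClick(onAudioReady: (chunks: unknown[]) => void): Promise<void> {
  if (recorder && recorder.state === "recording") {
    recorder.stop(); // fires onstop below, which hands the audio off
    return;
  }
  // Triggers the browser's microphone-permission prompt on first use.
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  recorder = new MediaRecorder(stream);
  chunks.length = 0;
  recorder.ondataavailable = (e: any) => chunks.push(e.data);
  recorder.onstop = () => {
    stream.getTracks().forEach((t: any) => t.stop()); // release the mic
    onAudioReady(chunks); // e.g. wrap in a Blob and POST to the transcription route
  };
  recorder.start();
}
```

Checking `recorder.state` is what lets a single button toggle between starting and stopping, matching the flow described above.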
- Each AI response in the conversation history has a play button (▶️) that you can click to replay that specific message.
- When playing, the button changes to a pause button (⏸️); click it to stop playback.
- Only one message can play at a time; starting a new one automatically stops the current playback.
- Play buttons are disabled during recording or when the system is processing audio.
- Your conversation history is automatically saved and will persist when you refresh the page.
- Use the Clear button (🗑️) in the conversation header to delete all messages and start fresh.
- Audio responses are cached locally, so replaying messages doesn't make additional API calls.
The easiest way to deploy your Next.js app is to use the Vercel Platform from the creators of Next.js.
When deploying, remember to add your `OPENAI_API_KEY` as an environment variable in your Vercel project settings.
Check out our Next.js deployment documentation for more details.