Bad Therapy is an AI agent chatbot that provides mental health coaching.
- Frontend: React 19, TypeScript
- Backend: Python, FastAPI
- Database: PostgreSQL (via Supabase)
- AI: LangGraph, OpenAI, Perplexity
- Auth: Auth0
- Conversational AI (Arlo):
  - Safety checks for harmful language
  - Context-aware conversations with relevant history retrieval
  - AI-powered primary therapist (coaching, journaling)
  - Therapist finder (uses Perplexity to suggest real therapists)
  - Smart router to decide next steps
- User Profiles: Personalized sessions and recommendations.
- Journaling: Save and review your thoughts securely with rich text editing (TipTap) and AI-powered insights generator.
- Daily Tips: AI-generated wellness tips with curated resource links and credibility scoring.
- Mood Tracking: Daily mood logging with trend visualization and analytics.
- Rate Limiting: API protection with user-based rate limiting (30/min for AI, 100/min general).
- Message Formatting: Markdown-rendered AI responses with streaming support.
- Security: All data is encrypted and protected with Auth0 authentication.
- MCP (Model Context Protocol):
  - Future-proof for multi-agent and multi-model workflows
- Each session uses a `TherapyState` object to track conversation, safety, and next actions.
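As an illustrative sketch of such a state object (every field name below is an assumption for illustration, not the project's actual schema):

```python
# Hypothetical sketch of a TherapyState-style schema; the field names
# are illustrative assumptions, not the project's real definitions.
from typing import TypedDict

class TherapyState(TypedDict, total=False):
    messages: list[dict]     # conversation turns so far
    is_safe: bool            # result of the safety node's check
    next_node: str           # where the router sends control next
    context_summary: str     # summarized relevant history

state: TherapyState = {"messages": [], "is_safe": True, "next_node": "router"}
```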
- The core workflow is a LangGraph `StateGraph` (see `server/graphs/therapy_graph.py`).
- Nodes:
  - `safety`: Checks for harmful language.
  - `router`: Decides the next step.
  - `context`: Retrieves and summarizes relevant conversation history using GPT-4o-mini.
  - `primary_therapist`: Main chatbot, including custom tool calls to save messages to the user's journal.
  - `find_therapist`: Searches for and suggests real therapists based on user input and conversation history using the Perplexity API.
  - `journal_insights`: Analyzes user journal entries to provide AI-powered insights and patterns.
- Edges between nodes are conditional, so the agent can branch based on user input and state (e.g., escalate to human therapist if unsafe).
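A minimal sketch of such a conditional edge (the intent labels, escalation rule, and fallback are assumptions for illustration, not the app's actual router logic):

```python
# Illustrative routing function for a conditional edge; intent labels
# and the escalation rule are assumptions, not the app's real logic.
def route_after_safety(state: dict) -> str:
    if not state.get("is_safe", True):
        return "find_therapist"  # escalate toward a human professional
    routes = {
        "therapy": "primary_therapist",
        "therapist_search": "find_therapist",
        "journal_insights": "journal_insights",
    }
    return routes.get(state.get("intent", "therapy"), "primary_therapist")
```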
- FastAPI streams AI responses node-by-node for real-time feedback.
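A rough sketch of that streaming shape (the event framing and node names are assumptions; the real endpoint would feed a generator like this, driven by the compiled graph, into a FastAPI `StreamingResponse`):

```python
# Sketch of node-by-node streaming; each yielded chunk mimics a
# server-sent event carrying one graph node's update. The node names
# and payload shape are illustrative assumptions.
import asyncio
import json

async def stream_graph_events(user_message: str):
    # In the real app, chunks would come from the compiled graph's
    # streaming API; here three fake node updates show the framing.
    for node in ("safety", "router", "primary_therapist"):
        yield f"data: {json.dumps({'node': node})}\n\n"
        await asyncio.sleep(0)  # yield control to the event loop

async def _collect():
    return [chunk async for chunk in stream_graph_events("hi")]

chunks = asyncio.run(_collect())
```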
- The `primary_therapist` node can call the `save_to_journal` tool to save user messages to their journal.
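A toy sketch of what such a tool might look like (the signature and the in-memory list are stand-ins for the real database layer):

```python
# Hypothetical save_to_journal tool; the in-memory list stands in for
# the real PostgreSQL-backed journal table.
from datetime import datetime, timezone

journal_entries: list[dict] = []

def save_to_journal(user_id: str, text: str) -> dict:
    entry = {
        "user_id": user_id,
        "text": text,
        "created_at": datetime.now(timezone.utc).isoformat(),
    }
    journal_entries.append(entry)
    return entry

saved = save_to_journal("user-123", "Felt calmer after today's walk.")
```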
- All prompts are saved in the `prompts` directory.
- Conversations are saved in a PostgreSQL database for persistent memory.
- Each message is vector-embedded for future Retrieval-Augmented Generation (RAG).
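To illustrate the retrieval idea (the tiny hand-made 3-d vectors stand in for real embedding-model outputs):

```python
# Toy retrieval over embedded messages; the 3-d vectors are hand-made
# stand-ins for real embedding-model outputs.
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

message_store = [
    ("I felt anxious at work today", [0.9, 0.1, 0.0]),
    ("Slept well last night",        [0.0, 0.2, 0.9]),
]

def retrieve(query_vec: list[float], k: int = 1) -> list[str]:
    ranked = sorted(message_store, key=lambda m: cosine(query_vec, m[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
```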
- Sensitive data is encrypted at rest.
- Auth0 is used for authentication and access control.
- Rate limiting with `slowapi`
- All LangGraph runs can be visualized with LangGraph Studio for debugging:

```shell
langgraph dev
```
- All LangGraph runs are traced with LangSmith for cloud-based debugging and observability.
- See tracings and monitoring here: https://smith.langchain.com/o/65c77578-2a48-42ef-a24f-8d83c29bc984/
- Production Security: Use the `LANGSMITH_HIDE_INPUTS=true` and `LANGSMITH_HIDE_OUTPUTS=true` environment variables to protect sensitive mental health data in production environments.
Bad Therapy includes a comprehensive evaluation system that integrates with LangSmith for tracking AI performance across safety, therapy quality, and system performance dimensions.
### Safety Evaluators
- Crisis Detection: Evaluates ability to identify suicidal ideation, self-harm, and crisis situations
- Harmful Content Prevention: Tests filtering of inappropriate medical/therapeutic advice
- Referral Appropriateness: Assesses when the system correctly escalates to human professionals
### Therapy Quality Evaluators
- Empathy Assessment: Measures emotional validation and therapeutic understanding
- Clinical Appropriateness: Ensures responses follow therapeutic boundaries and best practices
- Therapeutic Effectiveness: Evaluates the therapeutic value and insight promotion
- Boundary Maintenance: Prevents inappropriate personal disclosure or advice-giving
### System Performance Evaluators
- Router Accuracy: Tests intent classification (therapy vs. therapist search vs. journal insights)
- Context Retention: Evaluates multi-turn conversation coherence and memory
- Response Relevance: Measures how well responses address user concerns
- Consistency: Assesses reliability and coherence across interactions
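As a toy illustration of the evaluator shape (the real evaluators score runs in LangSmith; the keyword lists, escalation check, and score format below are all assumptions):

```python
# Toy crisis-detection evaluator; keyword lists and the escalation check
# are illustrative stand-ins for the real LangSmith evaluators.
CRISIS_MARKERS = ("suicide", "kill myself", "self-harm", "hurt myself")

def mentions_crisis(text: str) -> bool:
    return any(marker in text.lower() for marker in CRISIS_MARKERS)

def evaluate_crisis_case(user_msg: str, model_response: str) -> dict:
    should_escalate = mentions_crisis(user_msg)
    did_escalate = "therapist" in model_response.lower()
    return {"key": "crisis_detection",
            "score": float(should_escalate == did_escalate)}
```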
The evaluation system respects your production privacy settings:
- Input/Output Hiding: Works with `LANGSMITH_HIDE_INPUTS=true` and `LANGSMITH_HIDE_OUTPUTS=true`
- Metadata Tracking: Monitors performance metrics without exposing sensitive therapy content
- Synthetic Data: Uses realistic but non-sensitive test scenarios for comprehensive evaluation
Set up LangSmith datasets:

```shell
cd server
python -m evaluations.runners.evaluation_runner --type langsmith-setup
```

Run safety-critical evaluation:

```shell
python -m evaluations.runners.evaluation_runner --type langsmith-eval --langsmith-dataset "bad-therapy-safety-critical"
```

Run therapy quality evaluation:

```shell
python -m evaluations.runners.evaluation_runner --type langsmith-eval --langsmith-dataset "bad-therapy-quality"
```

Run system performance evaluation:

```shell
python -m evaluations.runners.evaluation_runner --type langsmith-eval --langsmith-dataset "bad-therapy-performance"
```

Run comprehensive batch evaluation:

```shell
python -m evaluations.runners.evaluation_runner --type langsmith-batch --experiment-name "my_evaluation"
```

Run traditional local evaluation (without LangSmith):

```shell
python -m evaluations.runners.evaluation_runner --type full
python -m evaluations.runners.evaluation_runner --type safety
python -m evaluations.runners.evaluation_runner --type quality
python -m evaluations.runners.evaluation_runner --type performance
```
Install dependencies:

```shell
cd server
source .venv/bin/activate
uv pip install -r pyproject.toml

# For production deployment
uv pip freeze > requirements.txt
```

Run development server:

```shell
uv run fastapi dev
```

Run server unit tests:

```shell
PYTHONPATH=. uv run pytest
```
Install dependencies:

```shell
cd client
npm install
```

Run the development server:

```shell
npm run dev
```