A secure, Docker-based Python sandbox server using the Model Context Protocol (MCP) for isolated code execution and advanced healthcare analytics. This project enables secure processing of Synthea synthetic healthcare data with PostgreSQL OMOP CDM integration and LLM-powered analytics.
- 🔒 Secure Sandboxing: Isolated Docker containers with resource limits and user isolation
- 🏥 Healthcare Data Pipeline: Synthea-to-PostgreSQL with OMOP CDM mapping
- 🤖 LLM Integration: Natural language queries for healthcare analytics
- 📊 Advanced Analytics: Structured and LLM-friendly data exploration
- 🔧 MCP Support: Tools exposed over the Model Context Protocol for AI agent integration
- 🐳 Docker Integration: Containerized PostgreSQL database with data persistence
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   MCP Client    │───▶│  FastMCP Server  │───▶│ Docker Sandbox  │
│   (AI Agent)    │    │    (main.py)     │    │   (Isolated)    │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                                │                       │
                                ▼                       ▼
                       ┌──────────────────┐    ┌─────────────────┐
                       │  PostgreSQL DB   │    │   Synthea CSV   │
                       │   (OMOP CDM)     │    │ (Mounted Data)  │
                       └──────────────────┘    └─────────────────┘
- Python 3.8+ with pip
- Docker & Docker Compose
- Synthea CSV files (optional, for healthcare data processing)
This project is configured to use uv for environment management. uv creates and manages Python virtual environments and can install the dependencies declared in pyproject.toml under tool.uv.
Quick start using uv:
# Install uv (see https://astral.sh/uv for instructions)
# Then create a uv-managed venv and install dependencies:
scripts/setup_uv.sh
source .venv/bin/activate
If you prefer not to use uv, you can still create a regular venv and install the packages listed in pyproject.toml or requirements.txt.
git clone https://github.com/fastomop/omcp_py.git
cd omcp_py
# Install dependencies
pip install -r requirements.txt
# Start the OMOP database
docker-compose up -d db
# Verify it's running
docker-compose ps
Place your Synthea CSV files in the synthetic_data/ directory:
synthetic_data/
├── patients.csv # Patient demographics
├── encounters.csv # Healthcare encounters
├── conditions.csv # Medical conditions
└── ...
# Set Python path
export PYTHONPATH=src
# Start the server
python src/omcp_py/main.py
Use MCP Inspector or your preferred MCP client:
# Install MCP Inspector
npm install -g @modelcontextprotocol/inspector
# Connect to the server
mcp-inspector python src/omcp_py/main.py
Then open http://127.0.0.1:6274 in your browser.
# 1. Create sandbox and install packages
sandbox_id = await mcp.create_sandbox()
await mcp.install_package(sandbox_id, "pandas psycopg2-binary sqlalchemy")
# 2. Create OMOP CDM schema
await mcp.create_omop_schema(sandbox_id)
# 3. Load Synthea data
await mcp.load_synthea_to_postgres(sandbox_id, "/synthetic_data")
# 4. Run analytics
await mcp.analyze_omop_data(sandbox_id, "basic")
await mcp.llm_dataframe_operation(sandbox_id, "Count total patients")
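The workflow above leaves the sandbox running if any step raises. A minimal sketch of one way to guarantee cleanup, assuming the client object exposes the tools as async methods exactly as in the snippet above. The sandbox_session helper and the FakeClient stand-in are hypothetical and not part of this project:

```python
import asyncio
from contextlib import asynccontextmanager


@asynccontextmanager
async def sandbox_session(mcp, packages=""):
    """Create a sandbox, optionally install packages, and always remove it."""
    sandbox_id = await mcp.create_sandbox()
    try:
        if packages:
            await mcp.install_package(sandbox_id, packages)
        yield sandbox_id
    finally:
        # Runs on success and on error, so no sandbox is left behind.
        await mcp.remove_sandbox(sandbox_id, force=True)


class FakeClient:
    """Hypothetical in-memory stand-in used only to exercise the helper."""

    def __init__(self):
        self.calls = []

    async def create_sandbox(self):
        self.calls.append("create_sandbox")
        return "sandbox-1"

    async def install_package(self, sandbox_id, packages):
        self.calls.append(f"install_package:{packages}")

    async def remove_sandbox(self, sandbox_id, force=False):
        self.calls.append(f"remove_sandbox:force={force}")


async def demo():
    client = FakeClient()
    try:
        async with sandbox_session(client, "pandas") as sandbox_id:
            # Simulate a failing analysis step inside the sandbox.
            raise RuntimeError(f"analysis failed in {sandbox_id}")
    except RuntimeError:
        pass
    return client.calls


calls = asyncio.run(demo())
print(calls)
```

With a real client, the body of the async with block would hold the create_omop_schema, load_synthea_to_postgres, and analytics calls.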
| Tool | Description | Example |
|---|---|---|
| create_sandbox | Create isolated Python environment | create_sandbox() |
| install_package | Install Python packages | install_package(sandbox_id, "pandas") |
| create_omop_schema | Create OMOP CDM database schema | create_omop_schema(sandbox_id) |
| load_synthea_to_postgres | Load Synthea CSV to PostgreSQL | load_synthea_to_postgres(sandbox_id, "/synthetic_data") |
| analyze_omop_data | Run structured analytics | analyze_omop_data(sandbox_id, "basic") |
| llm_dataframe_operation | Natural language queries | llm_dataframe_operation(sandbox_id, "Count patients") |
| execute_sql_in_sandbox | Direct SQL execution | execute_sql_in_sandbox(sandbox_id, "SELECT COUNT(*) FROM person") |
| remove_sandbox | Clean up sandbox | remove_sandbox(sandbox_id, force=True) |
{
"total_patients": 1000,
"total_visits": 5000,
"total_conditions": 8000
}
[
{
"gender_concept_id": 8507,
"patient_count": 500,
"avg_age": 45.2
}
]
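Results in this shape are straightforward to post-process on the client side. A small sketch, assuming the tools return JSON text like the two samples above. The helper names are hypothetical, and the labels come from the standard OMOP gender concepts (8507 is MALE, 8532 is FEMALE):

```python
import json

# Standard OMOP gender concept IDs.
GENDER_CONCEPTS = {8507: "MALE", 8532: "FEMALE"}


def per_patient_rates(summary_json):
    """Derive per-patient averages from the summary counts."""
    summary = json.loads(summary_json)
    patients = summary["total_patients"]
    if patients == 0:
        # Avoid dividing by zero on an empty database.
        return {"visits_per_patient": 0.0, "conditions_per_patient": 0.0}
    return {
        "visits_per_patient": summary["total_visits"] / patients,
        "conditions_per_patient": summary["total_conditions"] / patients,
    }


def label_genders(rows_json):
    """Attach a readable gender label to each demographic row."""
    rows = json.loads(rows_json)
    return [
        {**row, "gender": GENDER_CONCEPTS.get(row["gender_concept_id"], "UNKNOWN")}
        for row in rows
    ]


summary_text = '{"total_patients": 1000, "total_visits": 5000, "total_conditions": 8000}'
rows_text = '[{"gender_concept_id": 8507, "patient_count": 500, "avg_age": 45.2}]'

rates = per_patient_rates(summary_text)
labeled = label_genders(rows_text)
print(rates)
print(labeled)
```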
# These work with natural language
await mcp.llm_dataframe_operation(sandbox_id, "Count total patients")
await mcp.llm_dataframe_operation(sandbox_id, "Show age distribution")
await mcp.llm_dataframe_operation(sandbox_id, "Count unique conditions")
await mcp.llm_dataframe_operation(sandbox_id, "Show gender distribution")
Create a .env file or set environment variables:
# Sandbox Configuration
SANDBOX_TIMEOUT=300
MAX_SANDBOXES=10
DOCKER_IMAGE=fastomop/sandbox:python-3.11-slim # recommended prebuilt sandbox image
DEBUG=false
LOG_LEVEL=INFO
# Database Configuration
DB_HOST=localhost
DB_PORT=5432
DB_USER=omop_user
DB_PASSWORD=omop_pass
DB_NAME=omop
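As an illustration of how these variables might be consumed, here is a minimal sketch that reads them with the sample values above as defaults and builds a PostgreSQL connection URL. The Settings class and load_settings function are hypothetical and are not the project's actual configuration loader:

```python
import os
from dataclasses import dataclass
from urllib.parse import quote_plus


@dataclass(frozen=True)
class Settings:
    sandbox_timeout: int
    max_sandboxes: int
    debug: bool
    db_host: str
    db_port: int
    db_user: str
    db_password: str
    db_name: str

    @property
    def database_url(self):
        # Escape credentials so special characters do not break the URL.
        user = quote_plus(self.db_user)
        password = quote_plus(self.db_password)
        return f"postgresql://{user}:{password}@{self.db_host}:{self.db_port}/{self.db_name}"


def load_settings(env=None):
    """Read settings from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    return Settings(
        sandbox_timeout=int(env.get("SANDBOX_TIMEOUT", "300")),
        max_sandboxes=int(env.get("MAX_SANDBOXES", "10")),
        debug=env.get("DEBUG", "false").strip().lower() in ("1", "true", "yes"),
        db_host=env.get("DB_HOST", "localhost"),
        db_port=int(env.get("DB_PORT", "5432")),
        db_user=env.get("DB_USER", "omop_user"),
        db_password=env.get("DB_PASSWORD", "omop_pass"),
        db_name=env.get("DB_NAME", "omop"),
    )


# Pass an explicit mapping here so the example does not depend on the shell.
settings = load_settings({"DB_PORT": "5433", "DEBUG": "true"})
print(settings.database_url)
```

Converting to int and bool at load time surfaces a malformed value such as DB_PORT=abc at startup rather than at first use.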
The docker-compose.yml provides:
- PostgreSQL 15 with OMOP database
- Persistent data storage
- Synthea data directory mounting
python tests/test_synthea_integration.py
./scripts/demo.sh
We provide a prebuilt sandbox Dockerfile and a convenience demo script to run an end-to-end local demo.
- Build the prebuilt sandbox image (optional but recommended):
docker build -t fastomop/sandbox:python-3.11-slim -f docker/sandbox/Dockerfile .
- Run the demo (builds image, starts DB, launches server, runs a local client and prints DB counts):
./scripts/demo.sh
If you have a DuckDB snapshot at synthetic_data/synthea.duckdb and want the demo to load Synthea into Postgres, run:
./scripts/demo.sh --load-duckdb
If port 5432 on your host is already in use, pass an alternate host port to the demo script or set DB_PORT in your environment (or .env) before running:
# Use port 5433 for the host mapping
./scripts/demo.sh --db-port 5433 --load-duckdb
# or export DB_PORT beforehand
export DB_PORT=5433
./scripts/demo.sh --load-duckdb
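To decide which host port to pass, you can probe for a listener first. A minimal sketch using only the standard library; the function names are hypothetical and the demo script does not necessarily perform this check itself:

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success instead of raising.
        return sock.connect_ex((host, port)) == 0


def first_free_port(start=5432, attempts=10):
    """Scan upward from start and return the first port with no listener."""
    for port in range(start, start + attempts):
        if not port_in_use(port):
            return port
    raise RuntimeError(f"no free port found in {start}-{start + attempts - 1}")


print(first_free_port())
```

The printed value can then be passed as --db-port or exported as DB_PORT. Another process could still claim the port between the check and the demo starting, so treat the result as a hint.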
Notes:
- The sandbox manager will auto-join the docker-compose network (if detected) so sandboxes can resolve the db service name when running under docker compose.
- If you use a host Postgres instance, set DB_HOST=host.docker.internal or enable host-gateway resolution.
# Test file structure
python -c "import src.omcp_py.main; print('✅ Main module loads successfully')"
# Test Docker Compose
docker-compose config
- Container Isolation: Each sandbox runs in its own isolated Docker container
- Resource Limits: CPU and memory restrictions per sandbox
- User Isolation: Non-root user execution
- Network Security: Controlled network access
- File System: Read-only filesystem with temporary mounts
- Capability Dropping: Removed dangerous Linux capabilities
- Auto-cleanup: Automatic removal of inactive sandboxes
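For readers unfamiliar with how such restrictions are expressed, the sketch below assembles a docker run command line that applies comparable limits using standard Docker CLI flags. It is an illustration only: the function name, the default limits, and the choice of flags are assumptions, and the project's sandbox manager may configure containers differently.

```python
import shlex


def build_sandbox_command(image, memory="512m", cpus="1.0",
                          user="1000:1000", network="none"):
    """Return an argv list for a locked-down container (nothing is executed)."""
    return [
        "docker", "run", "--rm", "--detach",
        "--read-only",                           # read-only root filesystem
        "--tmpfs", "/tmp",                       # writable scratch space in memory
        "--cap-drop", "ALL",                     # drop all Linux capabilities
        "--security-opt", "no-new-privileges",   # block privilege escalation
        "--user", user,                          # run as a non-root UID:GID
        "--memory", memory,                      # memory limit
        "--cpus", cpus,                          # CPU limit
        "--network", network,                    # "none" disables networking
        image,
        "sleep", "infinity",                     # keep the container alive
    ]


command = build_sandbox_command("fastomop/sandbox:python-3.11-slim")
print(shlex.join(command))
```

A sandbox that must reach the db service would pass the docker-compose network name instead of "none".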
- Synthea Usage Guide - Detailed workflow documentation
- API Reference - Complete tool documentation
- Configuration Guide - Environment and deployment setup
- Architecture Overview - System design and components
Extend the Synthea-to-OMOP mapping in load_synthea_to_postgres:
synthea_mappings = {
    'custom_data.csv': {
        'table': 'omop_cdm.custom_table',
        'columns': {
            'custom_id': 'person_id',
            'custom_date': 'birth_datetime'
        }
    }
}
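To make the role of the columns entry concrete, here is a minimal sketch of applying such a source-to-target column mapping to CSV rows. The remap_rows helper is hypothetical and only illustrates the idea; it is not how load_synthea_to_postgres is implemented:

```python
import csv
import io


def remap_rows(csv_text, column_map):
    """Rename mapped CSV columns to their OMOP names and drop the rest."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [col for col in column_map if col not in (reader.fieldnames or [])]
    if missing:
        raise KeyError(f"CSV is missing mapped columns: {missing}")
    return [
        {target: row[source] for source, target in column_map.items()}
        for row in reader
    ]


mapping = {"custom_id": "person_id", "custom_date": "birth_datetime"}
sample = "custom_id,custom_date,ignored\n1,1980-05-17,x\n2,1992-11-02,y\n"

rows = remap_rows(sample, mapping)
print(rows)
```

Values stay as strings here; a real loader would also cast them to the column types of the target table.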
Extend the schema to include more OMOP CDM tables:
drug_exposure
procedure_occurrence
measurement
observation
Create domain-specific analytics:
# Custom Python code in sandbox
code = '''
import pandas as pd
from sqlalchemy import create_engine
# Example credentials; substitute the DB_USER, DB_PASSWORD, and DB_NAME values from your configuration
engine = create_engine('postgresql://omcp:postgres@db:5432/omcp')
df = pd.read_sql("SELECT * FROM omop_cdm.person", engine)
# Your custom analysis here
result = df.groupby('gender_concept_id').agg({
    'person_id': 'count',
    'birth_datetime': lambda x: pd.Timestamp.now().year - pd.to_datetime(x).dt.year.mean()
}).to_dict()
print(result)
'''
await mcp.execute_python_code(sandbox_id, code)
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests
- Submit a pull request
MIT License - see LICENSE file for details.
- Model Context Protocol for the MCP specification
- FastMCP for the Python MCP implementation
- Synthea for synthetic healthcare data
- OMOP CDM for healthcare data standards
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: Wiki
Built by Zhangshu Joshua Jiang and the wider FastOMCP team