OMCP Python Sandbox Server

Overview

A secure, Docker-based Python sandbox server using the Model Context Protocol (MCP) for isolated code execution and advanced healthcare analytics. This project enables secure processing of Synthea synthetic healthcare data with PostgreSQL OMOP CDM integration and LLM-powered analytics.

🚀 Key Features

🔒 Secure Sandboxing: Isolated Docker containers with resource limits and user isolation
🏥 Healthcare Data Pipeline: Synthea-to-PostgreSQL with OMOP CDM mapping
🤖 LLM Integration: Natural language queries for healthcare analytics
📊 Advanced Analytics: Structured and LLM-friendly data exploration
🔧 MCP Protocol: Model Context Protocol for AI agent integration
🐳 Docker Integration: Containerized PostgreSQL database with data persistence

🏗️ Architecture

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   MCP Client    │───▶│  FastMCP Server  │───▶│ Docker Sandbox  │
│  (AI Agent)     │    │   (main.py)      │    │  (Isolated)     │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                                │                        │
                                ▼                        ▼
                       ┌──────────────────┐    ┌─────────────────┐
                       │ PostgreSQL DB    │    │ Synthea CSV     │
                       │ (OMOP CDM)       │    │ (Mounted Data)  │
                       └──────────────────┘    └─────────────────┘

📋 Prerequisites

Python 3.8+ with pip
Docker & Docker Compose
Synthea CSV files (optional, for healthcare data processing)

Using UV for environment management

This project is configured to use uv for environment management. uv creates and manages Python virtual environments and can install the dependencies declared in pyproject.toml under tool.uv.

Quick start using uv:

# Install uv (see https://astral.sh/uv for instructions)
# Then create a uv-managed venv and install dependencies:
scripts/setup_uv.sh
source .venv/bin/activate

If you prefer not to use uv, you can still create a regular venv and install the packages listed in pyproject.toml or requirements.txt.

🚀 Quick Start

1. Clone and Setup

git clone https://github.com/fastomop/omcp_py.git
cd omcp_py

# Install dependencies
pip install -r requirements.txt

2. Start PostgreSQL Database

# Start the OMOP database
docker-compose up -d db

# Verify it's running
docker-compose ps

3. Prepare Data (Optional)

Place your Synthea CSV files in the synthetic_data/ directory:

synthetic_data/
├── patients.csv      # Patient demographics
├── encounters.csv    # Healthcare encounters  
├── conditions.csv    # Medical conditions
└── ...
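Before loading data, it can help to confirm that the expected files are present. The sketch below is a minimal check. It treats the three files named in the layout above as an assumed minimum set, and `missing_synthea_files` is a hypothetical helper, not part of the project.

```python
from __future__ import annotations

from pathlib import Path

# Files shown in the directory layout above; treated here as an assumed minimum set.
EXPECTED_SYNTHEA_FILES = ("patients.csv", "encounters.csv", "conditions.csv")


def missing_synthea_files(data_dir: str | Path = "synthetic_data") -> list[str]:
    """Return the expected Synthea CSV names that are absent from data_dir (hypothetical helper)."""
    root = Path(data_dir)
    return [name for name in EXPECTED_SYNTHEA_FILES if not (root / name).is_file()]
```

For example, `missing_synthea_files("synthetic_data")` returns an empty list once all three files are in place.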

4. Start the MCP Server

# Set Python path
export PYTHONPATH=src

# Start the server
python src/omcp_py/main.py

5. Connect with MCP Client

Use MCP Inspector or your preferred MCP client:

# Install MCP Inspector
npm install -g @modelcontextprotocol/inspector

# Connect to the server
mcp-inspector python src/omcp_py/main.py

Then open http://127.0.0.1:6274 in your browser.

🏥 Healthcare Data Workflow

Complete Synthea-to-PostgreSQL Pipeline

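# Pseudocode: `mcp` stands for a connected MCP client session; each call invokes the server tool of the same name.
# 1. Create sandbox and install packages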
sandbox_id = await mcp.create_sandbox()
await mcp.install_package(sandbox_id, "pandas psycopg2-binary sqlalchemy")

# 2. Create OMOP CDM schema
await mcp.create_omop_schema(sandbox_id)

# 3. Load Synthea data
await mcp.load_synthea_to_postgres(sandbox_id, "/synthetic_data")

# 4. Run analytics
await mcp.analyze_omop_data(sandbox_id, "basic")
await mcp.llm_dataframe_operation(sandbox_id, "Count total patients")
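The calls above use a placeholder `mcp` object. With the official MCP Python SDK (`pip install mcp`), the same sequence could be driven over stdio roughly as follows. This is a sketch under assumptions. The README does not document the tools' parameter names, so `bind_positional` reads them from each tool's input schema in property order. The response format of `create_sandbox` is also unknown, so `extract_sandbox_id` is a hypothetical parser you may need to adapt.

```python
import asyncio  # usage: asyncio.run(run_pipeline())
import json
import os


def bind_positional(input_schema: dict, *values) -> dict:
    """Map positional values onto a tool's parameter names in schema property order.

    Assumption: the server lists properties in function-signature order, as schemas
    generated from Python signatures usually do. Confirm with list_tools() first.
    """
    names = list(input_schema.get("properties", {}))
    if len(values) > len(names):
        raise ValueError(f"expected at most {len(names)} arguments, got {len(values)}")
    return dict(zip(names, values))


def extract_sandbox_id(text: str) -> str:
    """Hypothetical parser: accept a JSON object with a "sandbox_id" key, a JSON string, or a bare ID."""
    try:
        payload = json.loads(text)
    except json.JSONDecodeError:
        return text.strip()
    if isinstance(payload, dict) and "sandbox_id" in payload:
        return str(payload["sandbox_id"])
    if isinstance(payload, str):
        return payload.strip()
    return text.strip()


async def run_pipeline() -> None:
    # Imported lazily so the helpers above work without the SDK installed.
    from mcp import ClientSession, StdioServerParameters
    from mcp.client.stdio import stdio_client

    server = StdioServerParameters(
        command="python",
        args=["src/omcp_py/main.py"],
        env={**os.environ, "PYTHONPATH": "src"},
    )
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            listed = await session.list_tools()
            schemas = {tool.name: tool.inputSchema for tool in listed.tools}

            async def call(tool_name: str, *values) -> str:
                arguments = bind_positional(schemas[tool_name], *values)
                result = await session.call_tool(tool_name, arguments=arguments)
                # Join text content items; non-text items are ignored in this sketch.
                return "\n".join(getattr(item, "text", "") for item in result.content)

            sandbox_id = extract_sandbox_id(await call("create_sandbox"))
            await call("install_package", sandbox_id, "pandas psycopg2-binary sqlalchemy")
            await call("create_omop_schema", sandbox_id)
            await call("load_synthea_to_postgres", sandbox_id, "/synthetic_data")
            print(await call("analyze_omop_data", sandbox_id, "basic"))
            print(await call("llm_dataframe_operation", sandbox_id, "Count total patients"))
            await call("remove_sandbox", sandbox_id)
```

Reading parameter names from `list_tools()` avoids hard-coding names that may not match the server. If the schema order differs from the positional order in the table, pass explicit dictionaries to `session.call_tool` instead.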

Available MCP Tools

Tool	Description	Example
`create_sandbox`	Create isolated Python environment	`create_sandbox()`
`install_package`	Install Python packages	`install_package(sandbox_id, "pandas")`
`create_omop_schema`	Create OMOP CDM database schema	`create_omop_schema(sandbox_id)`
`load_synthea_to_postgres`	Load Synthea CSV to PostgreSQL	`load_synthea_to_postgres(sandbox_id, "/synthetic_data")`
`analyze_omop_data`	Run structured analytics	`analyze_omop_data(sandbox_id, "basic")`
`llm_dataframe_operation`	Natural language queries	`llm_dataframe_operation(sandbox_id, "Count patients")`
`execute_sql_in_sandbox`	Direct SQL execution	`execute_sql_in_sandbox(sandbox_id, "SELECT COUNT(*) FROM person")`
`remove_sandbox`	Clean up sandbox	`remove_sandbox(sandbox_id, force=True)`

📊 Analytics Examples

Basic Counts

{
  "total_patients": 1000,
  "total_visits": 5000,
  "total_conditions": 8000
}

Demographics Analysis

[
  {
    "gender_concept_id": 8507,
    "patient_count": 500,
    "avg_age": 45.2
  }
]
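Both result shapes above can come from simple aggregate queries over the OMOP `person`, `visit_occurrence`, and `condition_occurrence` tables. The sketch below reproduces them against an in-memory SQLite database standing in for PostgreSQL, using a few made-up rows. The table and column names follow OMOP CDM. The server's actual SQL is not shown in this README, so treat these queries as an illustration. Age is approximated from `year_of_birth`. In OMOP, concept 8507 is male and 8532 is female.

```python
from __future__ import annotations

import sqlite3
from datetime import date

# In-memory SQLite stands in for the PostgreSQL omop_cdm schema (illustrative rows only).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, gender_concept_id INTEGER, year_of_birth INTEGER);
    CREATE TABLE visit_occurrence (visit_occurrence_id INTEGER PRIMARY KEY, person_id INTEGER);
    CREATE TABLE condition_occurrence (condition_occurrence_id INTEGER PRIMARY KEY, person_id INTEGER);
    INSERT INTO person VALUES (1, 8507, 1980), (2, 8532, 1995), (3, 8507, 2000);
    INSERT INTO visit_occurrence VALUES (1, 1), (2, 1), (3, 2);
    INSERT INTO condition_occurrence VALUES (1, 1), (2, 3);
    """
)


def basic_counts(conn: sqlite3.Connection) -> dict:
    """Row counts shaped like the "Basic Counts" example."""
    queries = {
        "total_patients": "SELECT COUNT(*) FROM person",
        "total_visits": "SELECT COUNT(*) FROM visit_occurrence",
        "total_conditions": "SELECT COUNT(*) FROM condition_occurrence",
    }
    return {key: conn.execute(sql).fetchone()[0] for key, sql in queries.items()}


def demographics(conn: sqlite3.Connection, current_year: int) -> list[dict]:
    """Patient count and mean approximate age per gender concept."""
    rows = conn.execute(
        """
        SELECT gender_concept_id, COUNT(*), AVG(? - year_of_birth)
        FROM person
        GROUP BY gender_concept_id
        ORDER BY gender_concept_id
        """,
        (current_year,),
    ).fetchall()
    return [
        {"gender_concept_id": gender, "patient_count": count, "avg_age": round(avg, 1)}
        for gender, count, avg in rows
    ]


print(basic_counts(conn))
print(demographics(conn, date.today().year))
```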

LLM Natural Language Queries

# These work with natural language
await mcp.llm_dataframe_operation(sandbox_id, "Count total patients")
await mcp.llm_dataframe_operation(sandbox_id, "Show age distribution")
await mcp.llm_dataframe_operation(sandbox_id, "Count unique conditions")
await mcp.llm_dataframe_operation(sandbox_id, "Show gender distribution")

🔧 Configuration

Environment Variables

Create a .env file or set environment variables:

# Sandbox Configuration
SANDBOX_TIMEOUT=300
MAX_SANDBOXES=10
DOCKER_IMAGE=fastomop/sandbox:python-3.11-slim  # recommended prebuilt sandbox image
DEBUG=false
LOG_LEVEL=INFO

# Database Configuration
DB_HOST=localhost
DB_PORT=5432
DB_USER=omop_user
DB_PASSWORD=omop_pass
DB_NAME=omop
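A minimal sketch of reading these variables with the defaults shown above. `Settings` and `from_env` are hypothetical names, not the project's own configuration class. The sketch reads only the process environment; loading the `.env` file itself would need a separate step, such as the python-dotenv package. The password is percent-encoded so that characters such as `@` do not break the connection URL.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from dataclasses import dataclass
from urllib.parse import quote_plus


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings object mirroring the variables above."""

    sandbox_timeout: int = 300
    max_sandboxes: int = 10
    docker_image: str = "fastomop/sandbox:python-3.11-slim"
    debug: bool = False
    log_level: str = "INFO"
    db_host: str = "localhost"
    db_port: int = 5432
    db_user: str = "omop_user"
    db_password: str = "omop_pass"
    db_name: str = "omop"

    @classmethod
    def from_env(cls, env: Mapping[str, str] | None = None) -> Settings:
        """Build settings from a mapping (os.environ by default), falling back to the defaults."""
        env = os.environ if env is None else env
        d = cls()
        return cls(
            sandbox_timeout=int(env.get("SANDBOX_TIMEOUT", d.sandbox_timeout)),
            max_sandboxes=int(env.get("MAX_SANDBOXES", d.max_sandboxes)),
            docker_image=env.get("DOCKER_IMAGE", d.docker_image),
            debug=str(env.get("DEBUG", d.debug)).strip().lower() in {"1", "true", "yes"},
            log_level=env.get("LOG_LEVEL", d.log_level).upper(),
            db_host=env.get("DB_HOST", d.db_host),
            db_port=int(env.get("DB_PORT", d.db_port)),
            db_user=env.get("DB_USER", d.db_user),
            db_password=env.get("DB_PASSWORD", d.db_password),
            db_name=env.get("DB_NAME", d.db_name),
        )

    def database_url(self) -> str:
        """SQLAlchemy-style URL; user and password are percent-encoded."""
        return (
            f"postgresql://{quote_plus(self.db_user)}:{quote_plus(self.db_password)}"
            f"@{self.db_host}:{self.db_port}/{self.db_name}"
        )
```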

Docker Compose

The docker-compose.yml provides:

PostgreSQL 15 with OMOP database
Persistent data storage
Synthea data directory mounting

🧪 Testing

Run Integration Tests

python tests/test_synthea_integration.py

Run Workflow Demo

./scripts/demo.sh

Demo and prebuilt sandbox image

We provide a Dockerfile for the prebuilt sandbox image and a convenience demo script that runs an end-to-end local demo.

Build the prebuilt sandbox image (optional but recommended):

docker build -t fastomop/sandbox:python-3.11-slim -f docker/sandbox/Dockerfile .

Run the demo (builds image, starts DB, launches server, runs a local client and prints DB counts):

./scripts/demo.sh

If you have a DuckDB snapshot at synthetic_data/synthea.duckdb and want the demo to load Synthea into Postgres, run:

./scripts/demo.sh --load-duckdb

If port 5432 on your host is already in use, pass an alternate host port to the demo script or set DB_PORT in your environment (or .env) before running:

# Use port 5433 for the host mapping
./scripts/demo.sh --db-port 5433 --load-duckdb

# or export DB_PORT beforehand
export DB_PORT=5433
./scripts/demo.sh --load-duckdb
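To decide whether you need `--db-port`, a quick check like the one below reports whether anything already accepts connections on a port. This is a convenience sketch and is not part of `demo.sh`. It only detects listeners reachable on 127.0.0.1, and the candidate ports are arbitrary choices.

```python
from __future__ import annotations

import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if a TCP listener already accepts connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        return sock.connect_ex((host, port)) == 0


def first_free_port(candidates: tuple[int, ...] = (5432, 5433, 5434)) -> int | None:
    """Return the first candidate port with no listener, or None if all are taken."""
    for port in candidates:
        if not port_in_use(port):
            return port
    return None
```

If `first_free_port()` returns 5433, for example, pass `--db-port 5433` to the demo script.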

Notes:

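The sandbox manager automatically joins the docker-compose network (when one is detected), so code in a sandbox can reach the database by its service name, db, when running under docker compose.
If you use a Postgres instance on the host instead, set DB_HOST=host.docker.internal. On Linux this name resolves only if the container maps it to the host gateway (host.docker.internal:host-gateway).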
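The notes above reflect one detail: inside a container, `localhost` refers to the container itself. The default `DB_HOST=localhost` therefore works for the MCP server on the host, but not for code running in a sandbox. The hypothetical helper below sketches the host choice the notes describe. It is an illustration, not the sandbox manager's actual logic.

```python
LOOPBACK_NAMES = {"localhost", "127.0.0.1", "::1"}


def sandbox_db_host(configured_host: str, on_compose_network: bool, compose_service: str = "db") -> str:
    """Pick a database hostname that resolves from inside a sandbox container (hypothetical helper)."""
    if configured_host not in LOOPBACK_NAMES:
        return configured_host  # an explicit hostname is used as-is
    # Loopback inside a container means the container itself, so redirect it:
    # to the compose service when on the compose network, otherwise to the host.
    return compose_service if on_compose_network else "host.docker.internal"
```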

Test Individual Components

# Check that the main module imports (run from the repository root)
PYTHONPATH=src python -c "import omcp_py.main; print('✅ Main module loads successfully')"

# Test Docker Compose
docker-compose config

🔒 Security Features

Container Isolation: Each sandbox runs in its own isolated Docker container
Resource Limits: CPU and memory restrictions per sandbox
User Isolation: Non-root user execution
Network Security: Controlled network access
File System: Read-only filesystem with temporary mounts
Capability Dropping: Removed dangerous Linux capabilities
Auto-cleanup: Automatic removal of inactive sandboxes
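The features above map naturally onto container options. The sketch below builds keyword arguments for `client.containers.run()` in the Docker SDK for Python (`pip install docker`). The option names are real docker-py parameters. The values, including the UID:GID, memory size, and PID limit, are illustrative assumptions and are not the project's actual settings.

```python
from __future__ import annotations


def sandbox_run_options(memory: str = "512m", cpus: float = 1.0, network: str | None = None) -> dict:
    """Keyword arguments for docker-py's client.containers.run() (illustrative values)."""
    options = {
        "mem_limit": memory,                          # memory cap per sandbox
        "nano_cpus": int(cpus * 1_000_000_000),       # CPU quota in units of 1e-9 CPUs
        "pids_limit": 128,                            # limit process count (fork bombs)
        "user": "1000:1000",                          # non-root UID:GID (assumed values)
        "read_only": True,                            # read-only root filesystem
        "tmpfs": {"/tmp": "rw,size=64m"},             # small writable scratch area
        "cap_drop": ["ALL"],                          # drop all Linux capabilities
        "security_opt": ["no-new-privileges"],        # block setuid privilege escalation
        "detach": True,
    }
    if network is None:
        options["network_disabled"] = True            # no network unless one is requested
    else:
        options["network"] = network                  # e.g. the docker-compose network
    return options
```

Usage would look like `docker.from_env().containers.run(image, command, **sandbox_run_options(network=...))`, where the network name depends on your compose project.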

📚 Documentation

Synthea Usage Guide - Detailed workflow documentation
API Reference - Complete tool documentation
Configuration Guide - Environment and deployment setup
Architecture Overview - System design and components

🚀 Advanced Usage

Custom Data Mapping

Extend the Synthea-to-OMOP mapping in load_synthea_to_postgres:

synthea_mappings = {
    'custom_data.csv': {
        'table': 'omop_cdm.custom_table',
        'columns': {
            'custom_id': 'person_id',
            'custom_date': 'birth_datetime'
        }
    }
}
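The loader's internals are not shown in this README. The sketch below only illustrates how a `columns` entry reads: each source CSV header on the left maps to an OMOP column on the right. `map_rows` and `insert_statement` are hypothetical helpers. The INSERT uses psycopg2's `%s` placeholder style.

```python
from __future__ import annotations

import csv
import io

# The mapping entry from above: CSV header (left) -> OMOP column (right).
synthea_mappings = {
    "custom_data.csv": {
        "table": "omop_cdm.custom_table",
        "columns": {"custom_id": "person_id", "custom_date": "birth_datetime"},
    }
}


def map_rows(csv_text: str, entry: dict) -> list[dict]:
    """Rename mapped columns and drop the rest (raises KeyError if a mapped header is missing)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{omop: row[src] for src, omop in entry["columns"].items()} for row in reader]


def insert_statement(entry: dict) -> str:
    """Parameterised INSERT for the mapped columns, using psycopg2-style %s placeholders."""
    targets = list(entry["columns"].values())
    placeholders = ", ".join(["%s"] * len(targets))
    return f"INSERT INTO {entry['table']} ({', '.join(targets)}) VALUES ({placeholders})"
```

With a psycopg2 cursor, the rows could then be written with `cursor.executemany(insert_statement(entry), [tuple(r.values()) for r in rows])`.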

Additional OMOP Tables

Extend the schema to include more OMOP CDM tables:

drug_exposure
procedure_occurrence
measurement
observation
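As a starting point, the sketch below generates `CREATE TABLE` statements for the four tables listed above. It includes only the columns that OMOP CDM v5.4 marks as required; all optional fields are left out. OHDSI publishes official DDL for each CDM version, and that is the safer source for a real schema. `create_table_sql` is a hypothetical helper.

```python
# Required (NOT NULL) columns per OMOP CDM v5.4; optional columns are omitted in this sketch.
OMOP_REQUIRED_COLUMNS = {
    "drug_exposure": ["drug_exposure_id", "person_id", "drug_concept_id",
                      "drug_exposure_start_date", "drug_exposure_end_date", "drug_type_concept_id"],
    "procedure_occurrence": ["procedure_occurrence_id", "person_id", "procedure_concept_id",
                             "procedure_date", "procedure_type_concept_id"],
    "measurement": ["measurement_id", "person_id", "measurement_concept_id",
                    "measurement_date", "measurement_type_concept_id"],
    "observation": ["observation_id", "person_id", "observation_concept_id",
                    "observation_date", "observation_type_concept_id"],
}


def column_type(name: str) -> str:
    """Date columns end in _date; every other required column here is an integer ID."""
    return "DATE" if name.endswith("_date") else "INTEGER"


def create_table_sql(table: str, schema: str = "omop_cdm") -> str:
    """CREATE TABLE statement with the required columns; the first column is the primary key."""
    columns = OMOP_REQUIRED_COLUMNS[table]
    body = ",\n    ".join(f"{col} {column_type(col)} NOT NULL" for col in columns)
    return (
        f"CREATE TABLE IF NOT EXISTS {schema}.{table} (\n"
        f"    {body},\n"
        f"    PRIMARY KEY ({columns[0]})\n"
        ")"
    )
```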

Custom Analytics

Create domain-specific analytics:

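# Custom Python code to run inside the sandbox
code = '''
import os

import pandas as pd
from sqlalchemy import create_engine

# Credentials default to the values in the Configuration section above;
# "db" is the docker-compose service name, reachable when the sandbox joins that network.
user = os.environ.get("DB_USER", "omop_user")
password = os.environ.get("DB_PASSWORD", "omop_pass")
db_name = os.environ.get("DB_NAME", "omop")
engine = create_engine(f"postgresql://{user}:{password}@db:5432/{db_name}")

df = pd.read_sql("SELECT * FROM omop_cdm.person", engine)

# Your custom analysis here: patient count and mean age per gender concept
df["age"] = pd.Timestamp.now().year - pd.to_datetime(df["birth_datetime"]).dt.year
result = df.groupby("gender_concept_id").agg(
    patient_count=("person_id", "count"),
    avg_age=("age", "mean"),
).to_dict(orient="index")

print(result)
'''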

await mcp.execute_python_code(sandbox_id, code)

🤝 Contributing

Fork the repository
Create a feature branch
Make your changes
Add tests
Submit a pull request

📄 License

MIT License - see LICENSE file for details.

🙏 Acknowledgments

Model Context Protocol for the MCP specification
FastMCP for the Python MCP implementation
Synthea for synthetic healthcare data
OMOP CDM for healthcare data standards

📞 Support

Issues: GitHub Issues
Discussions: GitHub Discussions
Documentation: Wiki

Built by Zhangshu Joshua Jiang and the wider FastOMCP team
