Here is the complete content for the README.md
file, formatted correctly in Markdown, with the tables and code blocks properly enclosed.
# 🧬 DEEP-GOMS: Cohort-Driven Predictive Modeling of Immunotherapy Response
DEEP-GOMS (Deep learning for Gut-Omni-Immunotherapy Modeling System) is an integrated framework designed to predict patient response to Immune Checkpoint Inhibitors (ICI) and provide mechanistic insights into the gut microbiome-immune system-tumor axis.
This version of DEEP-GOMS shifts focus to leveraging **integrated, harmonized multi-omic cohort datasets** to train and interpret predictive models. This streamlines the pipeline, improving accessibility and bypassing the need for raw sequencing data analysis (e.g., taxonomic profiling with Kraken2 or HUMAnN3).
---
## Key Features (Updated)
* **Cohort Data Integration:** Aggregate and harmonize processed microbiome, single-cell, and immunotherapy response data from multiple public and private cohort studies (e.g., RaCInG, NRCO\_GOMS, PRECISE-, LORIS, etc.).
* **Multi-omic Feature Extraction:** Utilizes pre-calculated patient-level features, including gut microbiome profiles, intratumoral immune cell composition, and specialized **ILRI (Immunotherapy Ligand-Receptor Interaction) network scores**.
* **Machine Learning:** Trains deep learning models to predict immunotherapy response (e.g., durable clinical benefit) using harmonized multi-omic features across cohorts.
* **Interpretability:** Derives patient-specific **ILRI fingerprints** linking gut microbiome strains, immune cells, and tumor interactions for mechanistic insight.
---
## 💻 Setup and Installation
DEEP-GOMS requires a standard environment for data science and deep learning. No specialized bioinformatics tools are needed.
### 1. Environment Setup
We recommend using a virtual environment (e.g., `conda`) for dependency management.
```bash
# Clone the repository
git clone [https://github.com/your-repo/deep-goms.git](https://github.com/your-repo/deep-goms.git)
cd deep-goms
# Create and activate the conda environment
conda create -n deepgoms python=3.10
conda activate deepgoms
The primary machine learning, data handling, and network analysis components are run in Python (
Package | Version | Purpose |
---|---|---|
PyTorch | Latest Stable | Core Deep Learning Model (DEEP-GOMS) training. |
MetaPhlAn | Latest Stable | Taxonomic Profiling of raw WGS/16S data. |
GraPhlAn | Latest Stable | Visualization of taxonomic and phylogenic trees. |
scikit-learn | Latest Stable | Model evaluation, splitting, and general ML utilities. |
Numpy | 1.16.6+ | Numerical computing. |
Scipy | 1.7.1+ | Scientific computing (includes scipy.optimize and kernel methods). |
Pandas | 1.2.4+ | Data manipulation and handling. |
networkx | Latest Stable | Core graph analysis for ILRI feature computation. |
Seaborn | 0.11.2+ | Data visualization. |
Matplotlib | 3.4.3+ | Plotting and data visualization. |
Adjusttext | 0.7.3+ | Automated text placement in Matplotlib plots. |
To install the Python requirements:
# Install required Python packages
pip install torch scikit-learn pandas numpy scipy matplotlib seaborn networkx adjusttext
R version (
Package | Version | Purpose |
---|---|---|
liana | 0.1.10 | Ligand-Receptor Interaction (LRI) analysis. |
OmnipathR | 3.7.0 | Accessing molecular networks and pathway data. |
immunedeconv | 2.1.0 | Unified interface for multiple deconvolution methods. |
easier | 1.4.0 | Ensemble immune signature analysis. |
EPIC | 1.1.5 | Immune cell deconvolution method. |
MCPcounter | 1.2.0 | Immune cell deconvolution method. |
quantiseqr | 1.6.0 | Immune cell deconvolution method. |
xCell | 1.1.0 | Immune cell deconvolution method. |
ConsensusTME | 0.0.1.9000 | Consensus Tumor Microenvironment estimation. |
dplyr | 1.0.10 | Data manipulation and structure. |
ggplot2 | 3.4.0 | Data visualization. |
corrplot | 0.92 | Visualization of correlation matrices. |
This quick start guides you through downloading the integrated data and running a basic prediction/interpretation task using the combined cohort features.
The DEEP-GOMS pipeline relies on a pre-processed and harmonized feature matrix that integrates data from the specified cohort studies.
# This script downloads the pre-processed HDF5 file (or similar format)
# containing the harmonized multi-omic features and clinical outcomes.
python src/data/download_cohort_data.py
# Output: data/harmonized_features_all_cohorts.h5
Use the integrated data to initialize and train the predictive model.
import pandas as pd
from src.model.deepgoms import DEEPGOMS
from sklearn.model_selection import train_test_split
# 1. Load the harmonized feature matrix
data = pd.read_hdf('data/harmonized_features_all_cohorts.h5', key='features')
# Assuming features and labels are defined
X = data.drop(columns=['Response_Label', 'Cohort_ID'])
y = data['Response_Label']
# Split data for basic testing (Cross-validation across cohorts is recommended for robust evaluation)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# 2. Initialize and Train the Model
input_dim = X_train.shape[1]
model = DEEPGOMS(input_dim=input_dim, hidden_dim=64, num_layers=2)
model.train_model(X_train, y_train, epochs=50, learning_rate=0.001)
# 3. Evaluate and Predict
predictions = model.predict_proba(X_test)
print(f"Sample prediction probabilities (Non-Response vs. Response):\n{predictions[:5]}")
Generate the core mechanistic insights that link specific microbiome and immune features to the model's prediction for a given patient.
from src.interpret.fingerprint import generate_ilri_fingerprint
# Select a patient's features from the test set for interpretation
patient_features = X_test.iloc[0]
# Generate the patient-specific interpretation
GOMS = generate_goms_signatures(model, patient_features)
print("--- Patient-Specific GOMS ---")
print(f"Predicted Response Probability: {predictions[0][1]:.4f}")
print("Top 5 Positive Predictors (Microbe-Immune-Tumor Interactions):")
print(fingerprint['Positive_Drivers'].head())
The following steps detail the full DEEP-GOMS predictive pipeline. Users primarily interact with the outputs of Data Harmonization and proceed to Model Training.
Step | Description | Dependencies/Tools | Output |
---|---|---|---|
Data Acquisition | Download processed multi-omic and clinical data from specified cohorts. | src/data/download_cohort_data.py |
Raw Cohort Data Files |
Data Harmonization | Standardize features, units, and metadata across cohorts. Correct batch effects using techniques like Harmony. | R (Harmony ), Python (pandas ) |
harmonized_features_all_cohorts.h5 |
Feature Engineering | Construct ILRI network features (e.g., graph centrality) from harmonized immune and microbiome data. | networkx , igraph , src/features/ilri_engineer.py |
Integrated Feature Matrix |
Model Training | Train the DEEP-GOMS deep learning architecture using the integrated multi-omic and GOMS features. | PyTorch , scikit-learn |
Trained deepgoms_model.pth |
Prediction & Interpretation | Output patient-specific immunotherapy response probabilities and generate detailed GOMS. | src/model/predict.py , src/interpret/signatures.py |
Predictions, Interpretive Reports |