Building open, reproducible frameworks for multi-omic data integration, spatial transcriptomics, and computational pathology.
My work focuses on developing interpretable and reproducible AI frameworks for cancer genomics that unite biological prior knowledge with multi-omic and spatial data. The central goal is to replace black-box prediction with mechanistic understanding: models that not only perform well but also explain how genomic alterations, perturbations, and drug responses reshape cellular states. Each framework in this series emphasizes pathway- and network-level interpretability, cross-dataset generalization, and transparent benchmarking, with the aim of establishing reproducible standards for computational oncology. Through this approach, I aim to bridge machine learning, systems biology, and translational research, advancing models that predict, explain, and validate biological mechanisms.
The following key projects are part of the MM-KPNN framework family, a unified effort to develop concept-bottleneck AI models that embed biological knowledge directly into the network architecture to support interpretability, reproducibility, and mechanistic insight across multi-omic and spatial data.
A modular and interpretable graph framework for spatial transcriptomics in the tumor microenvironment.
- Combines Graph Attention Networks (GAT) with knowledge-primed decoding
- Explains immune exclusion, stromal remodeling, and therapy-induced rewiring
- Outputs attention maps, pathway overlays, and ligand–receptor driver rankings
2. MM-KPNN 
Interpretable multimodal neural network integrating scRNA-seq and scATAC-seq using biological priors.
- Decoder constrained by pathway and TF nodes
- Provides mechanistic attributions at the pathway and regulator levels
- Offers a reproducible framework for multimodal interpretability and benchmarking
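
A minimal sketch of what a pathway-constrained decoder can look like, assuming the constraint is implemented as a binary gene-membership mask applied to the decoder weights. This illustrates the general knowledge-primed idea and is not the repository's implementation; the helper names (`make_pathway_mask`, `masked_decode`) and the toy gene sets are hypothetical.

```python
import numpy as np

def make_pathway_mask(genes, pathways):
    """Binary (n_pathways, n_genes) mask: 1 where the gene belongs to the pathway."""
    gene_index = {g: j for j, g in enumerate(genes)}
    mask = np.zeros((len(pathways), len(genes)))
    for i, members in enumerate(pathways.values()):
        for g in members:
            if g in gene_index:
                mask[i, gene_index[g]] = 1.0
    return mask

def masked_decode(z, weights, mask):
    """Decode pathway activities z (cells x pathways) into genes (cells x genes).

    Multiplying the weights by the mask removes every pathway-to-gene
    connection that the prior knowledge does not support.
    """
    return z @ (weights * mask)

# Toy, illustrative gene sets (not curated annotations).
genes = ["TP53", "MDM2", "CDK4", "STAT1"]
pathways = {"p53_signaling": ["TP53", "MDM2"], "cell_cycle": ["CDK4", "TP53"]}

mask = make_pathway_mask(genes, pathways)
rng = np.random.default_rng(0)
weights = rng.normal(size=mask.shape)          # stand-in for learned weights
z = rng.normal(size=(3, len(pathways)))        # 3 cells, 2 pathway nodes
x_hat = masked_decode(z, weights, mask)
```

Because each reconstructed gene depends only on the pathways it belongs to, a pathway node's activity can be read as an attribution for its member genes. In this toy example `STAT1` belongs to no pathway, so its reconstruction is identically zero.
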
Pathway-bottleneck graph neural network for drug-sensitivity prediction across pharmacogenomic panels.
- Integrates multi-omic features, drug descriptors, and prior knowledge graphs
- Focuses on cross-panel generalization (e.g., CCLE → GDSC)
- Provides pathway-level interpretability and reproducible benchmarking
Extends MM-KPNN to model drug and CRISPR perturbation responses at single-cell resolution.
- Implements pathway and TF bottlenecks for interpretability
- Measures attribution stability and supports counterfactual pathway editing
- Designed for robust, cross-dataset perturbation benchmarking
A modular framework for computational analysis of organoid systems.
- Addresses reproducibility, heterogeneity, fidelity, integration, and prediction
- Integrates RNA and protein modalities with interpretable ML
- Demonstrates end-to-end reproducibility through documented, result-embedded notebooks
Spatial mapping of tumor and metastatic architecture using 10x Visium transcriptomics.
- Integrates curated gene programs to define epithelial, immune, stromal, and proliferative regions
- Reveals spatial organization and regional heterogeneity across breast tumors and lymph node metastases
- Fully documented, end-to-end notebook with embedded results and biological interpretation
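
One simple way to turn gene programs into region labels is to score each spot by the mean z-scored expression of a program's genes and assign the top-scoring program. The sketch below assumes that approach; the notebook may use a different scoring method, and the function name, marker lists, and toy matrix are illustrative only.

```python
import numpy as np

def program_scores(expr, genes, programs):
    """Mean z-scored expression per program for each spot (spots x programs)."""
    expr = np.asarray(expr, dtype=float)
    std = expr.std(axis=0)
    # Avoid division by zero for genes with constant expression.
    z = (expr - expr.mean(axis=0)) / np.where(std > 0, std, 1.0)
    index = {g: j for j, g in enumerate(genes)}
    columns = []
    for members in programs.values():
        present = [index[g] for g in members if g in index]
        if present:
            columns.append(z[:, present].mean(axis=1))
        else:
            columns.append(np.zeros(expr.shape[0]))  # no program genes measured
    return np.column_stack(columns)

# Toy spots x genes matrix with two small, illustrative marker programs.
genes = ["EPCAM", "KRT8", "PTPRC", "CD3E"]
programs = {"epithelial": ["EPCAM", "KRT8"], "immune": ["PTPRC", "CD3E"]}
expr = np.array([[10, 8, 0, 1],
                 [9, 9, 1, 0],
                 [0, 1, 7, 9],
                 [1, 0, 8, 8]])

scores = program_scores(expr, genes, programs)
labels = scores.argmax(axis=1)  # index of the top-scoring program per spot
```

Here the first two spots are labeled epithelial and the last two immune. Real Visium data would first need normalization and usually a background gene set to control for overall expression level.
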
End-to-end pipeline for structural variant discovery and annotation using PacBio long-read sequencing.
- Implements clinical annotation (ACMG/AMP) and variant filtering
- Includes functional scoring and visualization modules
- Designed for scalable deployment in HPC environments
Modular framework for rare-variant burden analysis in genomic cohorts.
- Supports SKAT, SKAT-O, and extended statistical methods
- Implements functional weighting and population correction
- Provides reproducible variant filtering and QC workflows
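
To make the weighting step concrete, here is a minimal sketch of a weighted burden score, assuming allele-frequency-based Beta(1, 25) weights, a common choice in the SKAT literature. It is a plain weighted sum rather than the SKAT variance-component test, the frequency weights stand in for the functional weighting mentioned above, and the function names and toy genotypes are hypothetical.

```python
import math
import numpy as np

def beta_weights(maf, a=1.0, b=25.0):
    """Beta(a, b) density evaluated at each minor allele frequency.

    With a=1, b=25 rarer variants receive larger weights.
    """
    maf = np.asarray(maf, dtype=float)
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * maf ** (a - 1) * (1 - maf) ** (b - 1)

def burden_scores(genotypes, weights):
    """Weighted sum of minor-allele counts per sample (samples x variants)."""
    return np.asarray(genotypes, dtype=float) @ weights

# Toy genotype matrix: 4 samples x 3 variants, coded as 0/1/2 minor alleles.
genotypes = np.array([[0, 1, 0],
                      [2, 0, 0],
                      [0, 0, 1],
                      [0, 0, 0]])
maf = genotypes.mean(axis=0) / 2      # sample minor allele frequencies
w = beta_weights(maf)
scores = burden_scores(genotypes, w)  # one burden score per sample
```

The per-sample scores would then be tested against the phenotype in a regression model that includes covariates for population structure.
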
Systems biology workflow for reconstructing gene-regulatory networks.
- Integrates TF–target priors and expression-based inference
- Performs network topology and modularity analysis
- Identifies functionally enriched regulatory modules
Gene co-expression analysis pipeline using WGCNA.
- Identifies expression modules and hub genes
- Evaluates biological function and module preservation
- Applies to bulk and single-cell RNA-seq datasets
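
The core WGCNA step behind module and hub-gene detection is the soft-thresholded adjacency, which for an unsigned network is the absolute correlation raised to a power. The sketch below shows that calculation in NumPy on simulated data. It is not the WGCNA R package; the `power=6` default is an assumed value that would normally be chosen from a scale-free topology fit, and the function names are hypothetical.

```python
import numpy as np

def soft_threshold_adjacency(expr, power=6):
    """Unsigned adjacency |cor|^power from a samples x genes matrix."""
    cor = np.corrcoef(expr, rowvar=False)
    adj = np.abs(cor) ** power
    # Zero the diagonal so connectivity does not count self-connections.
    np.fill_diagonal(adj, 0.0)
    return adj

def connectivity(adj):
    """Sum of each gene's connection strengths to all other genes."""
    return adj.sum(axis=0)

# Simulated data: genes 0-2 share a common signal, genes 3-4 are pure noise.
rng = np.random.default_rng(1)
base = rng.normal(size=50)
co_expressed = [base + 0.1 * rng.normal(size=50) for _ in range(3)]
noise = [rng.normal(size=50) for _ in range(2)]
expr = np.column_stack(co_expressed + noise)

adj = soft_threshold_adjacency(expr)
k = connectivity(adj)
hub = int(np.argmax(k))  # most connected gene, a candidate hub
```

Raising correlations to a power suppresses weak, noisy associations while preserving strong ones, so the three co-expressed genes end up with much higher connectivity than the noise genes.
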
Workflow for secure, efficient genomic data transfer using Globus.
- Integrates HPC environments and folder structuring
- Enables checksum validation and metadata tracking
- Ensures reproducible data sharing for collaborative projects
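
Checksum validation can be sketched independently of the transfer tool: build a manifest of SHA-256 digests before the transfer and re-verify it at the destination. The example below uses only the Python standard library and makes no Globus API calls; the function names and the manifest layout are assumptions for illustration.

```python
import hashlib
import tempfile
from pathlib import Path

def sha256sum(path, chunk_size=1 << 20):
    """SHA-256 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(root):
    """Map each file's path relative to root onto its SHA-256 digest."""
    root = Path(root)
    return {
        str(p.relative_to(root)): sha256sum(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }

def verify_manifest(root, manifest):
    """Return relative paths that are missing or whose digest has changed."""
    root = Path(root)
    return [
        rel for rel, expected in manifest.items()
        if not (root / rel).is_file() or sha256sum(root / rel) != expected
    ]

# Demonstration on a temporary directory with one small file.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.vcf"
    sample.write_text("##fileformat=VCFv4.2\n")
    manifest = build_manifest(tmp)
    clean = verify_manifest(tmp, manifest)      # nothing changed yet
    sample.write_text("corrupted\n")            # simulate a damaged copy
    changed = verify_manifest(tmp, manifest)    # now reports the file
```

Saving the manifest alongside the data (for example as JSON) lets collaborators confirm that what they received matches what was sent.
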
Sally Yepes
📧 sallyepes233@gmail.com
🔗 GitHub: Sally332