Exploring cell-to-cell variability and functional insights through differentially variable gene analysis

Gatlin, Victoria; Gupta, Shreyan; Romero, Selim; Chapkin, Robert S.; Cai, James J.

doi:10.1038/s41540-025-00507-z


Article
Open access
Published: 20 March 2025


npj Systems Biology and Applications volume 11, Article number: 29 (2025)


Abstract

Single-cell RNA sequencing (scRNA-seq) has revolutionized our understanding of cellular variability by capturing gene expression profiles of individual cells. The importance of cell-to-cell variability in determining and shaping cell function has been widely appreciated. Nevertheless, differential expression (DE) analysis remains a cornerstone method in analytical practice. Current computational analyses overlook the rich information encoded by variability within the single-cell gene expression data by focusing exclusively on mean expression. To offer a deeper understanding of cellular systems, there is a need for approaches to assess data variability rather than just the mean. Here we present spline-DV, a statistical framework for differential variability (DV) analysis using scRNA-seq data. The spline-DV method identifies genes exhibiting significantly increased or decreased expression variability among cells derived from two experimental conditions. Case studies show that DV genes identified using spline-DV are representative and functionally relevant to tested cellular conditions, including obesity, fibrosis, and cancer.


Introduction

It is well established that transcription of many genes occurs in bursts^1,2, resulting in large variability in mRNA numbers between cells, even when the cells are in the same underlying state. The importance of cell-to-cell gene expression variability in determining and shaping cell function has been recognized for some time^3,4,5. Increased variability in gene expression has been associated with the differentiation of embryonic stem cells⁶, reprogramming of induced pluripotent stem cells⁷, circadian rhythm regulation^8,9, differentiation^10,11, and aging^12,13. In the single-cell omics era, investigators now have higher-resolution tools, such as single-cell RNA sequencing (scRNA-seq), to reveal gene expression variability between cells and better understand its role. In a recent study¹⁴, we demonstrated that single-cell gene expression variability is intrinsically linked to the function of genes. Individual cells of the same cell type exhibit stochastic gene expression beyond technical noise. This variability is primarily driven by highly variable genes (HVGs), which may or may not be highly expressed in all cells but show significant variation between cells and predominantly contribute to cell type-specific functions. Given that variability is central to cellular function, as our findings suggest, the development of analytical methods and computational tools for single-cell data should take a variability-centric view. Unfortunately, in practice, the importance of variability is often overlooked. Many single-cell data analytical frameworks are mean-centric and neglect gene expression variability by design. Significant refinement is required in many methods developed under non-variability-centric principles.

Differential expression (DE) analysis has been a mainstay of gene expression studies since the days of microarray and bulk RNA-seq. DE analysis focuses on identifying genes that are up- or down-regulated (with increased or decreased expression) across conditions, using a mean-difference approach, which is also applied to scRNA-seq data using R packages like DESeq2, edgeR, and limma¹⁵. Despite the development of single-cell-specific DE methods¹⁶, these do not consistently outperform bulk-oriented methods¹⁷. These methods often treat cell-to-cell variability and dropout as mere technical noise, overlooking the crucial biological insights embedded within this variability. Recent studies have demonstrated that dropout events can be as biologically informative as expression levels^18,19, further suggesting that DE methods may miss key aspects of gene expression regulation. Moreover, the inconsistent identification of DE genes across existing methods²⁰ underscores the need for approaches that capture the full spectrum of biologically relevant expression variability.

Here, we advocate for moving beyond methods that focus solely on detecting mean expression differences in scRNA-seq data to embrace gene expression variability as a crucial dimension of cellular biology. We acknowledge the central role of gene expression variability, challenging the current dominance of mean-based DE analysis in single-cell studies. This shift becomes particularly urgent in light of recent efforts, such as the study by Squair et al.²¹, which proposes analyzing pooled, pseudobulk scRNA-seq data to mimic bulk RNA-seq for DE analysis. While such approaches may seem appealing, they disregard the primary strength of scRNA-seq: capturing cell-to-cell variability. Critically, Squair et al.²¹ use bulk RNA-seq DE results as a reference to evaluate the performance of scRNA-seq DE methods, reinforcing a mean-centric view that considers bulk RNA-seq as the gold standard, despite its limitations in capturing cellular variability. Thus, the assumption that the pseudobulk method accurately recovers bulk RNA-seq expression patterns needs further investigation, as these techniques likely reveal distinct biological phenomena. Mean-centric thinking in molecular biology, a legacy of microarray and bulk RNA-seq analysis, risks obscuring key biological phenomena when applied to scRNA-seq. Such a focus may miss critical insights that a variability-driven approach could uncover. We propose that prioritizing variability in gene expression offers equally, if not more, valuable insights into cellular function by capturing a larger spectrum of biological heterogeneity.

Results

Illustrating the “variation-is-function” concept

The “variation-is-function” hypothesis proposed by Dueck et al.⁴ posits that cell-to-cell gene expression variability is key to population-level cellular functions. Our previous study¹⁴ empirically supports this hypothesis, showing that highly variable genes (HVGs) in homogeneous cellular populations participate in biological processes and provide molecular functions specific to their respective cell types. While both mean-based and variability-based analyses may identify enriched functions, our findings suggest that variability-based approaches may be particularly effective in capturing the functional relevance of within-cell-type variation. Interestingly, most HVGs are not highly expressed, whereas highly expressed genes (such as housekeeping genes and marker genes) tend not to convey cell-type-specific functions. Figure 1 illustrates the “variation-is-function” concept through a scatter plot, where genes are represented using three summary statistics: mean, coefficient of variation (CV), and dropout rate. We applied a spline-fit algorithm from scGEAToolbox²² to generate a curve for all genes, identifying HVGs as those that deviate most from this curve. In the simulated data (Fig. 1A), no HVGs emerge, as all genes align closely along the spline-fit curve. By contrast, human embryonic stem cells (hESCs) and human umbilical vein endothelial cells (HUVECs) show gene expression variability that increases with differentiation: undifferentiated hESCs display fewer detected HVGs (Fig. 1B), while differentiated HUVECs exhibit higher variability and more HVGs (Fig. 1C). As undifferentiated cells (e.g., hESCs) differentiate into more distinct, mature forms (e.g., HUVECs), more genes are expressed with increasing variability and become HVGs. One such gene is ANKRD1 (shown in Fig. 1C), encoding the ankyrin repeat domain 1 transcription factor, which localizes to the nucleus of endothelial cells to play its role.
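The three per-gene summary statistics named above can be illustrated with a minimal sketch. It is not taken from scGEAToolbox; the function name `gene_summary_stats`, the genes-by-cells layout, and the use of the population standard deviation are assumptions made only for this example.

```python
import numpy as np

def gene_summary_stats(counts):
    """Return per-gene mean, CV, and dropout rate.

    counts: 2D array with genes in rows and cells in columns
    (an assumed layout for this sketch).
    """
    counts = np.asarray(counts, dtype=float)
    mean = counts.mean(axis=1)
    sd = counts.std(axis=1)  # population SD; the sample SD is an equally valid choice
    # CV = SD / mean; genes with zero mean get NaN instead of a division error
    cv = np.divide(sd, mean, out=np.full_like(mean, np.nan), where=mean > 0)
    # Dropout rate = fraction of cells in which the gene has zero counts
    dropout = (counts == 0).mean(axis=1)
    return mean, cv, dropout

# Toy matrix: 3 genes x 4 cells
toy = np.array([
    [2, 2, 2, 2],   # constant expression: CV = 0, dropout = 0
    [0, 0, 4, 4],   # on/off expression:   CV = 1, dropout = 0.5
    [0, 0, 0, 0],   # never detected:      CV undefined, dropout = 1
])
mean, cv, dropout = gene_summary_stats(toy)
```

Each gene then corresponds to one point (mean, CV, dropout rate), which is the representation used for the scatter plot in Fig. 1.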

**Fig. 1: Single-cell gene expression variability, driven by highly variable genes (HVGs), increases with cell differentiation level.**

Identification of differentially variable genes

We introduce the spline-DV method, a nonparametric, model-free framework for analyzing changes in single-cell gene expression variability between two conditions. The aim is to identify genes showing differential variability (DV), which are functionally more active or transcriptionally more engaged in one condition than the other. By focusing on variability changes independent of mean gene expression, DV analysis provides a new, distinct perspective on cellular state transitions across conditions. Figure 2 illustrates the spline-DV method. This approach uses three gene-level metrics (mean expression, CV, and dropout rate) as x, y, and z coordinates, respectively, to create a 3D model for estimating gene expression variability. Within this 3D space, two spline-fit curves are generated for the two conditions independently (Fig. 2A) and then merged and visualized together for comparative assessment (Fig. 2B). For a given gene (e.g., Gene A in Fig. 2A, B), its position in this 3D space is determined by the observed mean expression, CV, and dropout rate. A vector originating at the point on the spline curve nearest to the gene’s position represents the gene’s deviation from the expected expression statistics. For two conditions, two vectors $\vec{v}_{1}$ and $\vec{v}_{2}$ are defined. The magnitude of each vector is used to quantify the level of deviation, i.e., the expression variability of the gene in each condition. To obtain the level of DV between two conditions, the difference between $\vec{v}_{1}$ and $\vec{v}_{2}$ is computed (Fig. 2C). The resulting DV vector, $\vec{dv}=\vec{v}_{2}-\vec{v}_{1}$, captures the difference in variability between the two conditions. In this way, a fair comparison between conditions is made by first comparing each gene to its expected statistics rather than making a direct comparison. The magnitude of $\vec{dv}$ is called the DV score, which is used to quantify the level of DV of a gene between two conditions (Methods). Finally, spline-DV ranks genes by their DV scores. Top DV genes across conditions are prioritized for further investigation, as they likely play crucial roles in the biological processes under study.
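The vector construction described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: each spline-fit curve is assumed to be available as a dense set of sampled 3D points, the nearest curve point is found by brute-force search, and any axis transformation or scaling described in Methods is omitted. The names `deviation_vector` and `dv_score` are hypothetical.

```python
import numpy as np

def deviation_vector(point, curve):
    """Vector from the nearest sampled curve point to the gene's position.

    point: (3,) array holding (mean, CV, dropout rate) for one gene.
    curve: (n, 3) array of points sampled along a fitted curve (assumed input).
    """
    point = np.asarray(point, dtype=float)
    curve = np.asarray(curve, dtype=float)
    nearest = curve[np.argmin(np.linalg.norm(curve - point, axis=1))]
    return point - nearest

def dv_score(point1, curve1, point2, curve2):
    """Return the DV score |v2 - v1| and the DV vector for one gene."""
    v1 = deviation_vector(point1, curve1)  # deviation in condition 1
    v2 = deviation_vector(point2, curve2)  # deviation in condition 2
    dv = v2 - v1
    return float(np.linalg.norm(dv)), dv

# Toy curves: straight lines standing in for the two condition-specific fits
t = np.linspace(0, 1, 11)
curve1 = np.column_stack([t, np.zeros_like(t), np.zeros_like(t)])
curve2 = np.column_stack([t, np.full_like(t, 0.1), np.zeros_like(t)])

p1 = [0.5, 0.1, 0.0]   # gene position in condition 1
p2 = [0.5, 0.5, 0.4]   # gene position in condition 2
score, dv = dv_score(p1, curve1, p2, curve2)  # dv is about (0, 0.3, 0.4), score about 0.5
```

Because each deviation is measured against its own condition's curve, a shift of the whole baseline between conditions does not by itself produce a large DV score.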

**Fig. 2: Illustration of the spline-DV method proposed in this study.**

Simulated data analysis for method validation

To evaluate the performance of spline-DV, we first simulated scRNA-seq data with controlled variability modifications. Simulated data was chosen to mimic real-world scRNA-seq challenges, including sparse expression patterns, variations in dropout rates, and relevant changes in variability. This controlled environment allowed us to rigorously test spline-DV’s ability to detect DV under conditions representative of actual biological systems. A gene-by-cell matrix was generated with 500 genes and 1000 cells, divided into two equal groups (A and B). Ten genes were then randomly selected, with seven assigned increased variability and three assigned decreased variability in group B compared to group A. Variability modifications included scaling expression values to emulate biologically relevant changes in variability, incorporating changes in the CV and dropout rates (“Methods”). Spline-DV successfully identified all “ground-truth” DV genes, highlighting its accuracy and robustness (Supplementary Fig. 1).
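A simulation of this general shape can be sketched as below. The count model (gamma-Poisson), the scaling factors, and the way variability is altered (stretching or shrinking deviations around each gene's mean) are assumptions chosen for illustration; the actual procedure is the one described in Methods.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, n_cells = 500, 1000
half = n_cells // 2  # cells [0, half) form group A, the rest group B

# Baseline counts: gamma-Poisson draws (an assumed model, not the paper's)
gene_means = rng.lognormal(mean=1.0, sigma=1.0, size=n_genes)
rates = rng.gamma(shape=2.0, scale=gene_means[:, None] / 2.0,
                  size=(n_genes, n_cells))
counts = rng.poisson(rates)

# Ten ground-truth genes: seven with increased and three with decreased
# variability in group B relative to group A
dv_genes = rng.choice(n_genes, size=10, replace=False)
up, down = dv_genes[:7], dv_genes[7:]

group_b = counts[:, half:].astype(float)
for genes, factor in ((up, 3.0), (down, 1.0 / 3.0)):  # hypothetical factors
    block = group_b[genes]
    centre = block.mean(axis=1, keepdims=True)
    # Stretch or shrink deviations around the gene mean, then round and
    # clip at zero so the values remain valid counts
    group_b[genes] = np.clip(np.rint(centre + factor * (block - centre)), 0, None)

counts_a = counts[:, :half]
counts_b = group_b.astype(int)
```

Clipping at zero also changes the dropout rate of the stretched genes, so this toy modification affects both CV and dropout, in the spirit of the description above.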

Case studies

Case study 1—More insightful genes identified in adipocytes in response to diet-induced obesity

We applied the spline-DV method to real scRNA-seq data sets to assess its performance. Our first case study utilized a scRNA-seq data set from a study on diet-induced obesity²³, in which adipocytes dissociated from adipose tissues in mice fed either a low-fat diet (LFD) or a high-fat diet (HFD) for 18 weeks (Fig. 3A) were collected and compared (Fig. 3B). Using spline-DV, we identified 249 DV genes (Supplementary Table 1) showing differential variability between the two conditions. The top genes include Plpp1, Thrsp, Blcap, Nnat, and Lyz2 (Fig. 3C). Supplementary videos showcase the differential deviation of Plpp1 and Thrsp from spline curves across two conditions (Supplementary Videos 1 and 2). Plpp1, which encodes a protein of the phosphatidic acid phosphatase family, showed increased variability in the HFD sample (Fig. 3D). Plpp1 deletion increases endogenous lipid lysophosphatidic acid concentrations in hepatocytes and reduces glucose production²⁴. Thrsp, which encodes a thyroid hormone-inducible hepatic protein, exhibited decreased variability under HFD conditions (Fig. 3D). Thrsp deletion decreases mitochondrial respiration and fatty acid oxidation, which is known to contribute to metabolic dysfunction in obese adipose tissue²⁵. Functional enrichment analysis revealed that DV genes are enriched in core adipocyte pathways related to lipid metabolism, insulin response, and fatty acid biosynthesis (Supplementary Table 2). In particular, DV analysis uniquely identified Hadh in the fatty acid biosynthesis pathway, a gene not flagged by DE analysis (see Section “Overlap between DV and DE genes” for more details). Hadh, encoding hydroxyacyl-CoA dehydrogenase, regulates fatty acid breakdown and tryptophan metabolism (Fig. 3E). Ptgis and Nr1h3, which are critical regulators of adipogenesis, lipid metabolism, and inflammation, and Acsl1, Slc27a1, and Nr1h3, which are PPAR signaling genes essential for fatty acid transport and metabolism, were also identified.

**Fig. 3: Application of the spline-DV method in analyzing scRNA-seq data from a mouse nutrition study.**

Case study 2 – Enhanced knowledge of gene activity in hepatic stellate cells with simulated chronic liver fibrosis

Our second case study for the spline-DV method utilized a scRNA-seq data set from a study focusing on potential drivers of liver fibrosis²⁶. The study was designed with a well-established chronic injury model using carbon tetrachloride administration in mice (Fig. 4A). Carbon tetrachloride effectively mimics human centrilobular fibrosis, allowing researchers to investigate disease mechanisms. Using this publicly available data set, we compared the healthy control and 6-week chronic injury hepatic stellate cell samples (Fig. 4B). We identified 205 DV genes (Supplementary Table 3), including Acta2, Gpx3, and Dpt (Fig. 4C). DV genes identified using the spline-DV method were more representative of genes associated with the progression of fibrosis than DE genes were. For example, the DV genes Col1a1, Col3a1, Col6a3, and Col8a1, which are directly linked to fibrosis progression, were not captured by DE analysis. The products of these collagen genes are the major protein component of the extracellular matrix that accumulates excessively in fibrosis²⁷. Additionally, we identified DV genes encoding enzymes critical for extracellular matrix remodeling, including Mmp2, Mmp14, Adamts4, and Adamts5, which are involved in the proteolytic processes that regulate fibrosis progression²⁸. The presence of Timp1 and Timp3, which encode tissue inhibitors of metalloproteinases, indicates a regulatory mechanism that balances extracellular matrix degradation and synthesis, crucial in fibrosis progression²⁸. Furthermore, Gas6, involved in cell survival and inflammation, showed increased variability, suggesting its role in modulating hepatic stellate cell activation and inflammation in the fibrotic liver²⁹. Functional enrichment analysis revealed that DV genes were enriched in core fibrotic processes, particularly in collagen fibril organization, extracellular matrix organization, and regulation of fibroblast proliferation (Fig. 4D, Supplementary Table 4). For instance, genes like Pdgfra, Tgfb3, and Csf1, which are crucial for hepatic stellate cell proliferation and activation during liver fibrosis, were uniquely identified as DV genes, demonstrating the spline-DV method’s ability to capture the dynamic changes in gene expression variability that occur in response to chronic injury. Other DV-unique genes of interest included Loxl1, which encodes an enzyme involved in collagen cross-linking, playing a vital role in maintaining extracellular matrix integrity and stability^30,31, and Clec11a, which has been implicated in cell proliferation and differentiation. The increased variability of these genes indicates their potential roles in the activation and function of hepatic stellate cells during fibrosis development.

**Fig. 4: Application of the spline-DV method in analyzing scRNA-seq data from a chronic liver fibrosis study.**

Case study 3—Deeper understanding of driver genes in epithelial cells affected by colorectal cancer

Our third case study utilized a scRNA-seq data set from a study focused on the malignant transformation of colorectal cancer (CRC)³² (Fig. 5A). In comparing epithelial cells from unaffected and cancerous patient samples (Fig. 5B), 197 DV genes (Supplementary Table 5) were identified. These include genes associated with CRC progression such as MACF1, SMOC2, and FGFR2 (Fig. 5C), which are critical in cell adhesion, extracellular matrix remodeling, and fibroblast growth factor signaling, all processes known to support cancer cell proliferation and metastatic potential^33,34,35,36. Additionally, functional enrichment analysis revealed that DV genes represented by JAG1, GMDS, SORBS2, and RBPJ are involved in Notch signaling, a pathway known to enhance colonic epithelial cell survival and promote tumorigenesis³⁷. The complete list of enriched pathways is provided in Supplementary Table 6. Notably, high JAG1 expression correlates with poor post-surgical prognosis in CRC patients³⁸. Furthermore, DV analysis revealed distinct genes and nuanced regulatory functions that are missed by DE analysis (see Section “Overlap between DV and DE genes” for more details). For instance, DV analysis identified genes (CEACAM1, PDE3B, SORBS1, LPIN2) involved in insulin response, which is relevant to CRC as insulin signaling is implicated in cell growth, energy metabolism, and survival of cancer cells³⁹. Similarly, the DV-specific gene TMSB4X is known to be involved in ATP biosynthetic and metabolic processes, in line with the increased energy demands and metabolic shifts characteristic of cancer cells⁴⁰. Additional DV-specific genes, such as RNF213, RUBCNL, and RORA, were enriched in lipid metabolism pathways, further supporting the role of altered lipid metabolism in CRC progression by providing essential structural components for rapidly dividing cells and influencing intracellular signaling⁴¹. In immune-related pathways, VAV3 and MALT1 in antigen receptor-mediated signaling were identified, indicating potential interactions between cancer cells and immune modulation in the tumor microenvironment. This DV-derived gene-level insight refines our understanding of CRC biology, revealing unique targets across critical regulatory functions (including ERBB signaling, the ERK1/ERK2 cascade, protein kinase activity, and cell motility) while also mapping metabolic and immune pathways integral to the complex network of CRC progression and metastasis.

**Fig. 5: Application of the spline-DV method in analyzing scRNA-seq data from a colorectal cancer (CRC) study.**

Overlap between DV and DE genes

We conducted an overlap analysis between DV and DE genes identified from the same data sets. We included three different DE methods, namely the Wilcoxon rank-sum test, DESeq2⁴², and MAST⁴³, and applied them to the data sets of the three case studies. For each comparison, the overlap between the top 200 DV genes and the top 200 up- or down-regulated DE genes is shown (Fig. 6). While DE and DV analyses revealed overlapping genes, the numbers were limited: in most cases, regardless of data set and DE method, fewer than 25% of DV genes were also identified as top up- or down-regulated genes. The limited overlap between DE and DV genes is not unexpected, as the two analyses quantify different properties of gene expression (the mean versus the variability). Supplementary Table 7 provides complete lists of genes that were exclusively identified as significant DV genes but not DE genes by the three DE methods. These unique DV genes highlighted the distinct roles of gene expression variability in shaping respective cellular diversity and provided unique insights into gene functions and cellular states. For example, in the case of diet-induced obesity in mouse adipocytes (Fig. 6A), there are 123, 126, and 121 exclusive DV genes when compared to DE genes identified by the Wilcoxon rank-sum test, DESeq2, and MAST, respectively. They are enriched in significant pathways including the inflammatory response pathway, extracellular matrix remodeling, fatty acid biosynthesis, and white fat cell differentiation. The data of case studies 2 (Fig. 6B) and 3 (Fig. 6C) revealed a similarly large proportion of unique DV genes along with respective significant pathways. These findings indicate that DV genes, identified through spline-DV’s non-mean-centric analysis, capture a broader functional spectrum across comparisons.
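The top-N overlap comparison described above amounts to simple set arithmetic. The sketch below assumes each method yields a gene-to-score mapping in which larger scores mean stronger evidence (for DE, this could be an absolute test statistic); the function names and the toy scores are hypothetical.

```python
def top_n(scores, n):
    """Names of the n highest-scoring genes from a {gene: score} mapping."""
    return set(sorted(scores, key=scores.get, reverse=True)[:n])

def overlap_summary(dv_scores, de_scores, n=200):
    """Fraction of top-n DV genes also among top-n DE genes, plus DV-only genes."""
    dv_top = top_n(dv_scores, n)
    de_top = top_n(de_scores, n)
    shared = dv_top & de_top
    return len(shared) / len(dv_top), dv_top - de_top

# Toy example with n = 2 instead of 200
dv_scores = {"g1": 5.0, "g2": 4.0, "g3": 3.0, "g4": 1.0}
de_scores = {"g1": 9.0, "g4": 8.0, "g2": 1.0, "g3": 0.5}
fraction, dv_only = overlap_summary(dv_scores, de_scores, n=2)
# fraction is 0.5 and dv_only is {"g2"}
```

The DV-only set corresponds to the kind of gene list reported in Supplementary Table 7.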

**Fig. 6: Limited overlap between DV genes and DE genes across three case studies.**

Spline-DV leverages dropout rate to gain additional insights

While dropout can arise from technical factors, it also captures meaningful biological signals, such as biologically relevant sparsity in gene expression^18,19. This dual role makes dropout a critical dimension for understanding cellular heterogeneity, and spline-DV uniquely leverages this property to detect genes with sparse yet functionally significant expression patterns. To evaluate the added value of incorporating dropout as a third dimension in spline-DV, we compared its results to the spline-2d method, which is a truncated version of spline-DV that models gene variability using only mean and CV. Across three case studies, spline-DV consistently identified biologically significant genes that were missed by spline-2d. These findings highlight the importance of integrating dropout to capture condition-specific variability patterns in single-cell data.

In case study 1, spline-DV identified genes such as SERPINE1, CD74, and CCL2, which play pivotal roles in inflammatory signaling within adipose tissue. These genes were not detected by spline-2d because their mean and CV values overlapped with other genes, making them indistinguishable in a 2D space. Similarly, in case study 2, spline-DV uniquely identified key fibrosis-associated genes like CXCL10 and MMP14, which are critical for immune modulation and extracellular matrix remodeling in liver fibrosis. These genes had sparse yet biologically relevant expression patterns that were resolved only through the incorporation of dropout. Finally, in case study 3, dropout incorporation allowed for the detection of PPARG and CDH17, genes critical for colorectal cancer progression. These genes, with sparse but biologically relevant expression patterns, were indistinguishable using spline-2d. Importantly, the genes identified with dropout were highly enriched in condition-specific pathways, such as pro-inflammatory signaling, fibrosis progression, and metabolic reprogramming, underscoring their functional relevance.

Spline-DV and BASiCS

We further compared the results obtained from spline-DV with those derived from BASiCS⁴⁴, which is another established DV analysis method. BASiCS takes batch information and uses a Bayesian hierarchical model to estimate expression variability; DV is measured by residual dispersion distance, which quantifies how far a gene’s variability deviates from a global mean/over-dispersion trend, comparing the degree of variability between two samples.

We benchmarked BASiCS and spline-DV on the diet-induced obesity and colorectal cancer data sets. The BASiCS-based differential expression analysis with default parameters (see Methods) identified only a small number of genes; 1179 genes were identified in the diet-induced obesity data set (915 common with spline-DV) and 1194 genes were identified in the colorectal cancer data set (762 common with spline-DV). For these genes that had a significant residual dispersion using BASiCS, we observed strong agreement in gene rankings between BASiCS (based on residual dispersion distances) and spline-DV (based on DV scores) for the genes they had in common (Supplementary Fig. 2). Next, we performed an enrichment analysis with the top 200 unique DV genes from spline-DV that were not identified by BASiCS and found biologically relevant pathways for each experiment (Supplementary Tables 8 and 9). For example, Regulation of Angiogenesis and Regulation of Smooth Muscle Cell Proliferation were significantly enriched for the colorectal cancer data set. Acetyl-CoA Metabolic Process, Lipid Biosynthetic Process, and Fatty Acid Metabolic Process were significantly enriched for the diet-induced obesity data set. These confirm that unique genes captured by spline-DV are functionally relevant.

More importantly, we found that, for a given gene, residual dispersion measured in BASiCS is equivalent to the DV score in spline-DV. The difference is that spline-DV assumes no prior distribution and uses separate trend estimates for each condition, whereas residual dispersion in BASiCS models variability relative to a global mean/over-dispersion trend, which can overlook condition-specific differences. Such differences are often key to understanding dynamic gene regulation under distinct biological conditions. Separate trend estimates ensure that the variability patterns specific to each condition are captured, allowing spline-DV to identify genes whose variability may be masked by the assumption of a global trend in BASiCS. By incorporating mean, CV, and dropout, spline-DV captures a multi-dimensional view of gene expression variability, which better reflects biological heterogeneity than single-dimensional metrics like residual dispersion. The enrichment of unique DV genes in functionally relevant pathways further supports this approach as biologically meaningful.

Notably, spline-DV demonstrated faster runtimes, processing the diet-induced obesity data set in 3.7 s and the colorectal cancer data set in 5.6 s on a 14-core Intel Core i5-13500 CPU (2.50 GHz) with 32 GB of RAM. In contrast, BASiCS took more than 11 h to process the diet-induced obesity data set and more than 44 h to process the colorectal cancer data set on a computer cluster (“Methods”).

Spline-DV method is robust against sampling bias

Last, to assess the stability of the spline-DV method, we performed a sampling analysis experiment. Our goal was to evaluate how much the DV estimate between cells from two different conditions would fluctuate if, instead, it was based on cells sampled from the same condition. We used a subsampling strategy. For cells from two conditions, for example, condition A and condition B, we subsampled cells randomly into A-1 and A-2 as well as B-1 and B-2. Each subsample contains half of the cells from its original group. Then spline-DV was applied to cells within groups (i.e., A-1 vs. A-2 and B-1 vs. B-2) as well as between groups (e.g., A-1 vs. B-1 or A-1 vs. B-2). This process was repeated 100 times to capture variability across random permutations of cells and account for inherent variability within the samples. The violin plots in Fig. 7 show the differences between within-group and between-group estimates of DV scores for top DV genes. With data sets from all three case studies, the violin plots demonstrate a clear separation between within-group and between-group estimates. High DV scores indicate greater DV levels, with “between” scores exhibiting larger spreads than “within” scores, suggesting greater DV levels between groups than within. The differences in DV scores highlight genes with significant changes in expression variability attributable to the difference between conditions, allowing for the identification of condition-specific variability patterns in gene expression. Together, these results indicate that the spline-DV method effectively captures condition-specific DV with negligible sampling bias in the identified DV genes.
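The subsampling design above can be sketched as follows. The scoring function is passed in as a parameter; in this example it is a deliberately simple stand-in (the absolute difference in CV), not spline-DV itself, and the helper names `split_half`, `within_between`, and `cv_gap` are hypothetical.

```python
import numpy as np

def split_half(n_cells, rng):
    """Randomly split cell indices into two disjoint halves."""
    perm = rng.permutation(n_cells)
    half = n_cells // 2
    return perm[:half], perm[half:2 * half]

def within_between(counts_a, counts_b, score_fn, n_rep=100, seed=0):
    """Collect per-gene scores for within-group and between-group comparisons."""
    rng = np.random.default_rng(seed)
    within, between = [], []
    for _ in range(n_rep):
        a1, a2 = split_half(counts_a.shape[1], rng)
        b1, b2 = split_half(counts_b.shape[1], rng)
        within.append(score_fn(counts_a[:, a1], counts_a[:, a2]))   # A-1 vs. A-2
        within.append(score_fn(counts_b[:, b1], counts_b[:, b2]))   # B-1 vs. B-2
        between.append(score_fn(counts_a[:, a1], counts_b[:, b1]))  # A-1 vs. B-1
        between.append(score_fn(counts_a[:, a1], counts_b[:, b2]))  # A-1 vs. B-2
    return np.array(within), np.array(between)

def cv_gap(x, y):
    """Stand-in score: absolute difference in per-gene CV (not spline-DV)."""
    cv = lambda m: m.std(axis=1) / m.mean(axis=1)
    return np.abs(cv(x) - cv(y))

# Toy data: 5 genes x 200 cells per condition; gene 0 is bimodal in condition B
rng = np.random.default_rng(1)
counts_a = rng.poisson(10, size=(5, 200))
counts_b = rng.poisson(10, size=(5, 200))
counts_b[0] = rng.choice([0, 20], size=200)

within, between = within_between(counts_a, counts_b, cv_gap, n_rep=10)
```

For the altered gene, the between-group scores should sit well above the within-group scores, which is the kind of separation the violin plots in Fig. 7 display.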

**Fig. 7: DV scores estimated with cells from between- and within-group sampling.**

Discussion

Our goal was to develop a robust method, both biologically and statistically optimized, to identify transcriptomic changes at a single-cell level that extends beyond average gene expression changes. Our solution, the spline-DV method, is an analytical framework for comparative gene expression variability analysis, based on single-cell gene expression mean, CV, and dropout rate. Spline-DV is expected to identify novel, functionally relevant genes that are typically overlooked by traditional DE methods.

Statisticians have undoubtedly been influenced by biologists in developing statistical methods for solving biological questions. DE methods are designed to estimate mean expression changes as accurately as possible. The strategy often involves complex statistical modeling and fitting of individual gene expression distributions to precisely estimate the mean. However, concentrating solely on mean differences may be misguided if variability in gene expression contributes significantly to cellular function. Based on this premise, a comparison of mean expression levels between samples would not represent the full biological complexity of the data. Unfortunately, traditional methods are centered on DE analysis, and efforts to develop DV analysis have been relatively limited^44,45,46,47. It is worth noting that Liu et al.⁴⁸ have previously proposed a method using scRNA-seq data to detect changes in overall variability between two groups of cells. However, that analysis focuses on assessing the variability of cell populations rather than of individual genes, which is the focus of our spline-DV method.

A shift in focus from expression mean to variability is essential for a more comprehensive understanding of cellular function. A promising research direction, once DV genes have been identified, is to investigate the causes of DV to gain a deeper understanding of cellular function. Indeed, evidence from coupled scRNA-seq and scATAC-seq data (i.e., simultaneous profiling of the transcriptome and chromatin accessibility from the same cell/nucleus) shows that the expression of HVGs is strongly regulated⁴⁹. Thus, changes in single-cell gene expression variability may be further evaluated using information from correlated single-cell open chromatin profiling.

Spline-DV uses two distinct data-driven spline-fit curves (baselines) for the two conditions. This approach was designed to capture gene expression changes relative to the characteristic spline-fit curve of each condition, rather than emphasizing only the differences in baseline gene expression. By doing so, our method provides a second-order measure of variability, reflecting relative shifts in gene expression dynamics between the two conditions. Thus, spline-DV estimates variability by leveraging the observed CV and dropout rates of genes. With advances in scRNA-seq technology, this approach suggests that imputation and empirical distribution estimation offer minimal improvement in modeling scRNA-seq data. Instead, real observations provide reliable measures of gene expression variability.

The spline-DV method has several advantages in addition to its ability to identify significant genes that could be more functionally relevant to the specific cell type under study. One of the advantages is its robustness against the batch effect of input data, which is a common issue in scRNA-seq analyses. For example, we know that the baseline expression of genes often tends to shift between different biological treatment conditions, resulting in a batch effect. The spline-DV is resilient to such bias and provides a fair means to compare scRNA-seq data from two treatment groups. This is because the 3D spline curve, the building block of spline-DV, is computed in a treatment-specific manner, i.e., two conditions are processed independently. Other advantages of the spline-DV method are related to its impressive computational efficiency and its unsupervised methodology. The spline-fit method itself has been considered one of the best methods for feature selection⁵⁰.

In conclusion, we propose the spline-DV method for comparing changes in expression variability in single-cell data between different conditions. The approach, termed “differential variability”, can potentially provide additional insight into the role of gene function within the continuum of cell state transitions compared to traditional DE analysis. We demonstrated the effectiveness of spline-DV in case studies using real scRNA-seq data. By embracing cell-to-cell variability, we can gain a deeper understanding of cellular processes and dynamics within complex tissues. Beyond the case studies presented here, spline-DV’s ability to capture condition-specific gene expression variability positions it as a valuable tool for a wide range of future applications, including disease modeling, drug response studies, and understanding complex tissue heterogeneity. Spline-DV is also computationally efficient, processing large data sets in significantly less time than existing methods. This scalability makes it applicable to increasingly complex and high-dimensional single-cell data sets, from small-scale exploratory analyses to large-scale studies that require rapid and robust results. As scRNA-seq data sets grow in complexity, spline-DV’s scalable, robust framework can serve as a foundation for future analyses across diverse biological domains.

Methods

Single-cell gene expression data sets

The scRNA-seq data sets for the case studies were sourced from three models: the HFD-LFD mouse adipose model by Sarvari et al.²³, the mouse liver model by Dobie et al.²⁶, and the human colon model by Becker et al.³². The first case study involves the analysis of data from adipocytes of mouse adipose tissue²³. The data were downloaded from the NCBI GEO database using accession numbers GSM4878207 and GSM4878210. The second case study involves the analysis of data from hepatic stellate cells in mouse liver²⁶. The data were downloaded from the NCBI GEO database using accession numbers GSM4085625 and GSM4085623. The third case study involves the analysis of data from cancerous and unaffected epithelial cells in the human colon³². The data were downloaded from the NCBI GEO database using accession numbers GSM6061702 and GSM6061683. The real scRNA-seq data sets used for generating Fig. 1 were derived from human embryonic stem cells (hESCs) and human umbilical vein endothelial cells (HUVECs), respectively. The hESC data set was obtained from Xu et al.⁵¹, which focused on the role of FLI1 in the hESC-EC system. The HUVEC data set was obtained from the GEO database using accession number GSM7511518. This scRNA-seq study investigated the role of vascular endothelial growth factor (VEGF), with the data derived from a control sample not treated with human recombinant VEGF-A165.

Processing of scRNA-seq data for case studies

The downloaded scRNA-seq UMI count matrices were imported into scGEAToolbox²² for pre-processing before running spline-DV. The default quality control filters were applied: a minimum library size of 1000 reads per cell, a maximum mitochondrial DNA ratio of 15% per cell, a minimum of 15 nonzero cells per gene, and a minimum of 500 nonzero genes per cell. The cells were then embedded using t-SNE, clustered using the K-means algorithm, and annotated using marker genes from the PanglaoDB database. Functional enrichment analysis of genes was conducted using Enrichr⁵².
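The four quality control thresholds above can be illustrated with a minimal Python sketch. This is not the scGEAToolbox implementation: the function name `qc_filter`, the order in which the filters are applied (cells first, then genes), and the use of an `mt-` gene-name prefix to identify mitochondrial genes are all assumptions made for illustration.

```python
def qc_filter(counts, gene_names, min_lib=1000, max_mt=0.15,
              min_cells_per_gene=15, min_genes_per_cell=500, mt_prefix="mt-"):
    """Filter a genes-by-cells count matrix (list of per-gene lists).

    Hypothetical helper: the thresholds default to the values quoted in the
    text, but the filtering order and the mitochondrial prefix are assumptions.
    """
    n_cells = len(counts[0])
    is_mt = [g.lower().startswith(mt_prefix) for g in gene_names]

    keep_cells = []
    for j in range(n_cells):
        col = [row[j] for row in counts]
        lib = sum(col)                                   # library size
        mt = sum(c for c, m in zip(col, is_mt) if m)     # mitochondrial counts
        n_genes = sum(1 for c in col if c > 0)           # detected genes
        if lib >= min_lib and n_genes >= min_genes_per_cell and mt <= max_mt * lib:
            keep_cells.append(j)

    # Keep genes detected in enough of the retained cells.
    keep_genes = [i for i, row in enumerate(counts)
                  if sum(1 for j in keep_cells if row[j] > 0) >= min_cells_per_gene]

    filtered = [[counts[i][j] for j in keep_cells] for i in keep_genes]
    return filtered, [gene_names[i] for i in keep_genes], keep_cells
```

Calling `qc_filter(counts, gene_names)` with the default arguments applies the thresholds listed in the text; smaller thresholds are convenient for testing on toy matrices.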

Simulating scRNA-seq data

The simulated scRNA-seq data for Fig. 1A were generated using the algorithm proposed by Lun, Bach, and Marioni⁵³. Briefly, for each gene i in cell j, the expression count Y_ij was sampled from a negative binomial distribution with mean θ_jλ_i, where θ_j is a cell-specific variability term with log2(θ_j) ~ N(0, 0.25), and λ_i is a gene-specific constant sampled from a gamma distribution with shape and rate parameters both set to 2. The negative binomial dispersion was set to φ_i = 0.1 for every gene.
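The sampling scheme described above can be sketched in plain Python as follows. Several choices here are assumptions rather than details given in the text: 0.25 is read as the variance of log2(θ_j), so the standard deviation is 0.5; the negative binomial is parameterized by its mean and dispersion (variance = mean + φ·mean²) and drawn as a gamma-Poisson mixture; and the function names are hypothetical.

```python
import math
import random


def poisson(rate, rng):
    """Draw a Poisson variate with Knuth's method (fine for small rates)."""
    limit = math.exp(-rate)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_counts(n_genes, n_cells, phi=0.1, log2_theta_sd=0.5, seed=0):
    """Return a genes-by-cells matrix of simulated counts (list of lists)."""
    rng = random.Random(seed)
    # lambda_i ~ Gamma(shape=2, rate=2); gammavariate takes a scale (= 1/rate).
    lam = [rng.gammavariate(2.0, 0.5) for _ in range(n_genes)]
    # log2(theta_j) ~ Normal(0, sd); sd = 0.5 assumes 0.25 is a variance.
    theta = [2.0 ** rng.gauss(0.0, log2_theta_sd) for _ in range(n_cells)]

    counts = []
    for i in range(n_genes):
        row = []
        for j in range(n_cells):
            mean = theta[j] * lam[i]
            # NB(mean, dispersion phi) as Poisson with a gamma-distributed rate:
            # shape = 1/phi and scale = phi * mean give E[rate] = mean.
            rate = rng.gammavariate(1.0 / phi, phi * mean)
            row.append(poisson(rate, rng))
        counts.append(row)
    return counts
```

For example, `simulate_counts(500, 1000)` produces a 500-gene by 1000-cell matrix; fixing `seed` makes the draw reproducible.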

To evaluate the performance of spline-DV, we simulated scRNA-seq data using the Splatter R package⁵⁴. A gene-by-cell matrix was generated with 500 genes and 1000 cells, divided into two groups, A and B, with equal proportions (50%) and a baseline variability parameter (bcv.common = 0.2). These settings provided a controlled environment for testing differential variability detection. To introduce realistic variability changes, 10 genes were randomly selected, with 7 assigned increased variability and 3 assigned decreased variability in group B compared to group A. Variability modifications were implemented by scaling expression values, where genes with increased variability were scaled up by a factor of 4, with slight adjustments to mean values and a dropout fraction of 10%, and genes with decreased variability were scaled down by a factor of 0.25, with adjustments to mean values and a dropout fraction of 30%. These transformations emulated biologically relevant changes in variability, incorporating modifications in CV and dropout rate, both critical metrics utilized by spline-DV.

Differential expression analysis and BASiCS analysis

Differential expression analysis was conducted using the Wilcoxon rank-sum test, DESeq2⁴², and MAST⁴³, all implemented with their default parameters using the Seurat R package⁵⁵. BASiCS⁴⁴ analysis was performed with its R implementation on the Texas A&M High-Performance Research Computing platform, using a 48-core Intel Xeon 6248R (Cascade Lake) compute node with a 3.0 GHz processor and 128 GB of random-access memory (RAM). BASiCS was executed for each treatment group, using the default parameters as suggested in its tutorial, with 20,000 iterations, a thinning interval of 20, and a burn-in period of 10,000 iterations. The BASiCS differential test was run using the default parameters, with $\varepsilon^{M}$ and $\varepsilon^{D}$ set to $\log_{2}1.5$, and an FDR cutoff of 0.1.

The spline-DV framework

The input of the spline-DV is two lists of three statistics: mean, CV, and dropout rate, which together depict the expression profile of genes across cells in two conditions. These statistics are defined as follows:

Mean ($\mu_{i}$): $\mu_{i}=\frac{\sum_{j=1}^{n_{\rm{cells}}}X_{ij}}{n_{\rm{cells}}}$, representing the average gene expression.

CV ($C_{\sigma,i}$): $C_{\sigma,i}=\frac{\sigma_{i}}{\mu_{i}}$, representing the normalized variability of gene expression.

Dropout rate ($Dr_{i}$): $Dr_{i}=\frac{\sum_{j=1}^{n_{\rm{cells}}}I(X_{ij}=0)}{n_{\rm{cells}}}$, capturing the proportion of cells where the gene is not detected.

Here, $\sigma_{i}=\sqrt{\sum_{j=1}^{n_{\rm{cells}}}(X_{ij}-\mu_{i})^{2}/(n_{\rm{cells}}-1)}$ is the standard deviation of gene $i$ across cells, and $I(X_{ij}=0)$ is an indicator function that equals 1 if $X_{ij}=0$ (indicating a dropout event) and 0 otherwise. $i$ denotes the gene index and $j$ denotes the cell index.
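As a minimal illustration of the three summary statistics defined above, the following Python sketch computes them for a single gene. The function name `gene_summary` is hypothetical, and returning NaN for the CV of a gene with zero mean is an assumption; the text does not say how such genes are handled.

```python
import statistics


def gene_summary(counts):
    """Return (mean, CV, dropout rate) for one gene's counts across cells."""
    n_cells = len(counts)
    mu = statistics.mean(counts)
    sigma = statistics.stdev(counts)      # sample SD, divides by n_cells - 1
    cv = sigma / mu if mu > 0 else float("nan")
    dropout = sum(1 for x in counts if x == 0) / n_cells
    return mu, cv, dropout
```

Applying `gene_summary` to every row of a genes-by-cells matrix, separately for each condition, yields the two lists of statistics that serve as input to spline-DV.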

For each condition, these three summary statistics are processed independently. The genes are sorted by mean expression, and a 3D spline-fit curve is constructed to represent the cumulative sum of the logarithmic differences between successive elements:

$$\delta {\mu }_{i}=\mathrm{ln}\left({\mu }_{i+1}+1\right)-\mathrm{ln}\left({\mu }_{i}+1\right)$$

$$\delta {C}_{\sigma ,i}=\mathrm{ln}\left({C}_{\sigma ,i+1}+1\right)-\mathrm{ln}\left({C}_{\sigma ,i}+1\right)$$

$$\delta {{Dr}}_{i}={{Dr}}_{i+1}-{{Dr}}_{i}$$

Based on these metrics, the 3D spline-fit curve encapsulates the expected behavior of all genes within a condition. This expected variability is used as a baseline for comparison and is defined as:

$$s_{j}=\sum_{i=1}^{j}\sqrt{\left(\delta\mu_{i}\right)^{2}+\left(\delta C_{\sigma,i}\right)^{2}+\left(\delta Dr_{i}\right)^{2}},\quad \forall j\in\{1,\ldots,n_{\rm{genes}}-1\}$$
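The cumulative sum $s_{j}$ can be computed directly from the three successive-difference formulas. The sketch below assumes the per-gene statistics are already sorted by mean expression; the function name is hypothetical, and the subsequent spline fit itself is not shown.

```python
import math
from itertools import accumulate


def cumulative_arc_length(mu, cv, dr):
    """Return [s_1, ..., s_{n-1}] for statistics sorted by mean expression."""
    steps = []
    for i in range(len(mu) - 1):
        d_mu = math.log1p(mu[i + 1]) - math.log1p(mu[i])   # ln(x + 1) differences
        d_cv = math.log1p(cv[i + 1]) - math.log1p(cv[i])
        d_dr = dr[i + 1] - dr[i]                           # dropout is not logged
        steps.append(math.sqrt(d_mu ** 2 + d_cv ** 2 + d_dr ** 2))
    return list(accumulate(steps))
```

The returned values increase monotonically along the curve, which makes them a natural parameter for fitting a smooth curve through the three coordinates.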

The spline-DV method uses condition-specific spline-fit curves as baselines to account for the unique variability patterns in each condition. This approach avoids assumptions about a global trend, which could mask condition-specific differences in variability. By constructing separate baselines, spline-DV ensures that deviations (${\vec{v}}_{1}$ and ${\vec{v}}_{2}$) accurately reflect variability relative to the expected behavior within each condition.

Gene statistics ${\vec{{gs}}}_{i}=\left({\mu }_{i},{C}_{\sigma ,i},D{r}_{i}\right)$ are compared to their closest point on the spline-fit curve $({\vec{s}}_{{fit},i})$, which represents the “expected” variability. The deviation from the expected variability for each condition is calculated as: ${\vec{v}}_{1}={\vec{gs}}_{i}-{\vec{s}}_{{{fit}},i}$ (for condition 1) and ${\vec{v}}_{2}={\vec{{gs}}^{\prime} }_{i}-{\vec{s}^{\prime} }_{{{fit}},i}$ (for condition 2).

The difference in the variability between the two conditions is given by:

$$\vec{{dv}}={\vec{v}}_{2}-{\vec{v}}_{1}$$

and the magnitude of this difference ($\Vert\vec{dv}\Vert$) is the DV score, a measure of relative changes in variability between the two conditions.
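The DV score for a single gene can be sketched as follows. As a simplifying assumption, each fitted curve is represented here by a list of sampled 3D points, and the "closest point" is taken to be the nearest sample rather than the exact projection onto a continuous spline. The function names are hypothetical, and the gene statistics are assumed to be expressed in the same coordinates as the curve points.

```python
import math


def nearest_point(point, curve):
    """Return the sampled curve point closest to `point` (Euclidean distance)."""
    return min(curve, key=lambda c: math.dist(point, c))


def dv_score(gs1, curve1, gs2, curve2):
    """Return (DV score, direction) for one gene.

    gs1, gs2: (mean, CV, dropout rate) of the gene in conditions 1 and 2.
    curve1, curve2: sampled points of the condition-specific fitted curves.
    """
    fit1 = nearest_point(gs1, curve1)
    fit2 = nearest_point(gs2, curve2)
    v1 = [a - b for a, b in zip(gs1, fit1)]    # deviation in condition 1
    v2 = [a - b for a, b in zip(gs2, fit2)]    # deviation in condition 2
    dv = [b - a for a, b in zip(v1, v2)]       # dv = v2 - v1
    score = math.hypot(*dv)                    # ||dv||
    direction = math.hypot(*v1) - math.hypot(*v2)
    return score, direction
```

A negative `direction` means the gene lies farther from its baseline in condition 2 than in condition 1.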

Identifying differentially variable genes

Genes with a significantly greater DV score are considered DV genes. The direction of the variability change between conditions is given by $\Vert\vec{v}_{1}\Vert-\Vert\vec{v}_{2}\Vert$, indicating whether variability increased or decreased. To assess the confidence of DV scores, the magnitude of the difference ($\Vert\vec{dv}\Vert$) is standardized using z-scores. Assuming a Gaussian error distribution, the cumulative distribution function is used to compute p-values, identifying genes with distinct variability changes in the right tail of the distribution. A p-value threshold of <0.05, rather than a more stringent cutoff, was used to obtain sufficiently large sets of DV genes in each case study, facilitating the interpretation of gene functions. Notably, the Benjamini-Hochberg procedure⁵⁶ or the Benjamini-Yekutieli correction⁵⁷, the latter applicable when test statistics are correlated, can be employed to control the false discovery rate.
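The scoring step can be illustrated with a short Python sketch that converts DV scores to right-tail p-values under a Gaussian assumption and then applies the Benjamini-Hochberg adjustment. The function names are hypothetical, and the use of the sample standard deviation for the z-scores is an assumption; the text does not specify which estimator is used.

```python
import math
import statistics


def dv_pvalues(scores):
    """Right-tail Gaussian p-values for standardized DV scores."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)              # sample SD (assumed)
    z = [(s - mean) / sd for s in scores]
    # P(Z > z) for a standard normal equals 0.5 * erfc(z / sqrt(2)).
    return [0.5 * math.erfc(v / math.sqrt(2)) for v in z]


def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Genes whose p-value (or adjusted p-value, if multiple-testing control is desired) falls below the chosen threshold would then be reported as DV genes.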

Data availability

The data sets analyzed during the current study are either publicly available or can be obtained from the corresponding author of the original study that generated the data set. Details of the publicly available data sets, including their accession codes or identifiers, are provided in the “Methods” section.

Code availability

Code implementing the spline-DV method in both R and MATLAB is available at https://github.com/cailab-tamu/Spline-DV along with example data sets for testing.

References

Raj, A. et al. Stochastic mRNA synthesis in mammalian cells. PLoS Biol. 4, e309 (2006).
Suter, D. M. et al. Mammalian genes are transcribed with widely different bursting kinetics. Science 332, 472–474 (2011).
Altschuler, S. J. & Wu, L. F. Cellular heterogeneity: do differences make a difference? Cell 141, 559–563 (2010).
Dueck, H., Eberwine, J. & Kim, J. Variation is function: are single cell differences functionally important?: Testing the hypothesis that single cell variation is required for aggregate function. Bioessays 38, 172–180 (2016).
Losick, R. & Desplan, C. Stochasticity and cell fate. Science 320, 65–68 (2008).
Stumpf, P. S. et al. Stem cell differentiation as a non-Markov stochastic process. Cell Syst. 5, 268–282.e7 (2017).
Buganim, Y. et al. Single-cell expression analyses during cellular reprogramming reveal an early stochastic and a late hierarchic phase. Cell 150, 1209–1222 (2012).
Droin, C. et al. Space-time logic of liver gene expression at sub-lobular scale. Nat. Metab. 3, 43–58 (2021).
Phillips, N. E. et al. The circadian oscillator analysed at the single-transcript level. Mol. Syst. Biol. 17, e10135 (2021).
Richard, A. et al. Single-cell-based analysis highlights a surge in cell-to-cell molecular variability preceding irreversible commitment in a differentiation process. PLoS Biol. 14, e1002585 (2016).
Mojtahedi, M. et al. Cell fate decision as high-dimensional critical state transition. PLoS Biol. 14, e2000640 (2016).
Bahar, R. et al. Increased cell-to-cell variation in gene expression in ageing mouse heart. Nature 441, 1011–1014 (2006).
Martinez-Jimenez, C. P. et al. Aging increases cell-to-cell transcriptional variability upon immune stimulation. Science 355, 1433–1436 (2017).
Osorio, D. et al. Single-cell expression variability implies cell function. Cells 9, 14 (2019).
Luecken, M. D. & Theis, F. J. Current best practices in single-cell RNA-seq analysis: a tutorial. Mol. Syst. Biol. 15, e8746 (2019).
Das, S., Rai, A. & Rai, S. N. Differential expression analysis of single-cell RNA-seq data: current statistical approaches and outstanding challenges. Entropy (Basel) 24, 995 (2022).
Van den Berge, K. et al. Observation weights unlock bulk RNA-seq tools for zero inflation and single-cell applications. Genome Biol. 19, 24 (2018).
Qiu, P. Embracing the dropouts in single-cell RNA-seq analysis. Nat. Commun. 11, 1169 (2020).
Bouland, G. A., Mahfouz, A. & Reinders, M. J. T. Consequences and opportunities arising due to sparser single-cell RNA-seq datasets. Genome Biol. 24, 86 (2023).
Wang, T. et al. Comparative analysis of differential gene expression analysis tools for single-cell RNA sequencing data. BMC Bioinforma. 20, 40 (2019).
Squair, J. W. et al. Confronting false discoveries in single-cell differential expression. Nat. Commun. 12, 5692 (2021).
Cai, J. J. scGEAToolbox: a Matlab toolbox for single-cell RNA sequencing data analysis. Bioinformatics 36, 1948–1949 (2019).
Sarvari, A. K. et al. Plasticity of epididymal adipose tissue in response to diet-induced obesity at single-nucleus resolution. Cell Metab. 33, 437–453.e5 (2021).
Taddeo, E. P. et al. Lysophosphatidic acid counteracts glucagon-induced hepatocyte glucose production via STAT3. Sci. Rep. 7, 127 (2017).
Ahonen, M. A. et al. Insulin-inducible THRSP maintains mitochondrial function and regulates sphingolipid metabolism in human adipocytes. Mol. Med. 28, 68 (2022).
Dobie, R. et al. Single-cell transcriptomics uncovers zonation of function in the mesenchyme during liver fibrosis. Cell Rep. 29, 1832–1847.e8 (2019).
Ma, F. et al. Systems-based identification of the Hippo pathway for promoting fibrotic mesenchymal differentiation in systemic sclerosis. Nat. Commun. 15, 210 (2024).
Raeeszadeh-Sarmazdeh, M., Do, L. D. & Hritz, B. G. Metalloproteinases and their inhibitors: potential for the development of new therapeutics. Cells 9, 1313 (2020).
Bellan, M. et al. Gas6/TAM system: a key modulator of the interplay between inflammation and fibrosis. Int. J. Mol. Sci. 20, 5070 (2019).
Yang, A. et al. Selective depletion of hepatic stellate cells-specific LOXL1 alleviates liver fibrosis. FASEB J. 35, e21918 (2021).
Yang, A. et al. Hepatic stellate cells-specific LOXL1 deficiency abrogates hepatic inflammation, fibrosis, and corrects lipid metabolic abnormalities in non-obese NASH mice. Hepatol. Int. 15, 1122–1135 (2021).
Becker, W. R. et al. Single-cell analyses define a continuum of cell state and composition changes in the malignant transformation of polyps to colorectal cancer. Nat. Genet. 54, 985–995 (2022).
Shvab, A. et al. Induction of the intestinal stem cell signature gene SMOC-2 is required for L1-mediated colon cancer progression. Oncogene 35, 549–557 (2016).
Li, P. et al. FGFR2 promotes expression of PD-L1 in colorectal cancer via the JAK/STAT3 signaling pathway. J. Immunol. 202, 3065–3075 (2019).
Jang, B. G. et al. SMOC2, an intestinal stem cell marker, is an independent prognostic marker associated with better survival in colorectal cancers. Sci. Rep. 10, 14591 (2020).
Mohamed, D. A. W. et al. miR-34a-5p suppresses colorectal cancer cell proliferation through silencing microtubule actin crosslinking factor 1 (MACF1) gene. Gene Rep. 25, 101416 (2021).
Suman, S. et al. Targeting notch signaling in colorectal cancer. Curr. Colorectal Cancer Rep. 10, 411–416 (2014).
Sugiyama, M. et al. High expression of the Notch ligand Jagged-1 is associated with poor prognosis after surgery for colorectal cancer. Cancer Sci. 107, 1705–1716 (2016).
Kasprzak, A. Insulin-like growth factor 1 (IGF-1) signaling in glucose metabolism in colorectal cancer. Int. J. Mol. Sci. 22, 6434 (2021).
Xiao, J. et al. Integrating spatial and single-cell transcriptomics reveals tumor heterogeneity and intercellular networks in colorectal cancer. Cell Death Dis. 15, 326 (2024).
Chen, X. et al. The effects of metabolism on the immune microenvironment in colorectal cancer. Cell Death Discov. 10, 118 (2024).
Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
Finak, G. et al. MAST: a flexible statistical framework for assessing transcriptional changes and characterizing heterogeneity in single-cell RNA sequencing data. Genome Biol. 16, 278 (2015).
Eling, N. et al. Correcting the mean-variance dependency for differential variability testing using single-cell RNA sequencing data. Cell Syst. 7, 284–294.e12 (2018).
Ho, J. W. et al. Differential variability analysis of gene expression and its application to human diseases. Bioinformatics 24, i390–i398 (2008).
Ran, D. & Daye, Z. J. Gene expression variability and the analysis of large-scale RNA-seq studies with the MDSeq. Nucleic Acids Res. 45, e127 (2017).
Wang, K. et al. Differential Shannon entropy and differential coefficient of variation: alternatives and augmentations to differential expression in the search for disease-related genes. Int. J. Comput Biol. Drug Des. 7, 183–194 (2014).
Liu, J., Kreimer, A. & Li, W. V. Differential variability analysis of single-cell gene expression data. Brief. Bioinform. 24, bbad294 (2023).
Zhong, Y. et al. Controlled noise: evidence of epigenetic regulation of single-cell expression variability. Bioinformatics 40, btae457 (2024).
Sheng, J. & Li, W. V. Selecting gene features for unsupervised analysis of single-cell gene expression data. Brief. Bioinform. 22, bbab295 (2021).
Xu, X. et al. Single-cell RNA-seq analysis of a human embryonic stem cell to endothelial cell system based on transcription factor overexpression. Stem Cell Rev. Rep. 19, 2497–2509 (2023).
Xie, Z. et al. Gene set knowledge discovery with enrichr. Curr. Protoc. 1, e90 (2021).
Lun, A. T., Bach, K. & Marioni, J. C. Pooling across cells to normalize single-cell RNA sequencing data with many zero counts. Genome Biol. 17, 75 (2016).
Zappia, L., Phipson, B. & Oshlack, A. Splatter: simulation of single-cell RNA sequencing data. Genome Biol. 18, 174 (2017).
Butler, A. et al. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat. Biotechnol. 36, 411–420 (2018).
Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate—a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B 57, 289–300 (1995).
Benjamini, Y. & Yekutieli, D. The control of the false discovery rate in multiple testing under dependency. Ann. Stat. 29, 1165–1188 (2001).


Acknowledgements

We are grateful to Dr. Liang Hu for sharing the hESC-EC scRNA-seq data. We acknowledge the use of advanced computing resources provided by Texas A&M High-Performance Research Computing in conducting parts of this research. This research was supported in part by grants from the U.S. Department of Defense (DoD, GW200026) and the National Institute for Environmental Health Sciences (P30 ES029067) for J.J.C., Allen Endowed Chair in Nutrition & Chronic Disease Prevention for R.S.C., and the Cancer Prevention & Research Institute of Texas (CPRIT, RP230204) for J.J.C. and R.S.C.

Author information

Authors and Affiliations

Department of Veterinary Integrative Biosciences, School of Veterinary Medicine and Biomedical Sciences, Texas A&M University, College Station, TX, 77843, USA
Victoria Gatlin, Shreyan Gupta, Selim Romero & James J. Cai
CPRIT Single Cell Data Science Core, Texas A&M University, College Station, TX, 77843, USA
Victoria Gatlin, Shreyan Gupta, Selim Romero, Robert S. Chapkin & James J. Cai
Department of Nutrition, Texas A&M University, College Station, TX, 77843, USA
Selim Romero & Robert S. Chapkin
Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, 77843, USA
James J. Cai

Authors

Victoria Gatlin
Shreyan Gupta
Selim Romero
Robert S. Chapkin
James J. Cai

Corresponding author

Correspondence to James J. Cai.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information


Supplementary Video 1. Video showing the differential deviation of Plpp1 from two spline curves of conditions.

Supplementary Video 2. Video showing the differential deviation of Thrsp from two spline curves of conditions.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article

Cite this article

Gatlin, V., Gupta, S., Romero, S. et al. Exploring cell-to-cell variability and functional insights through differentially variable gene analysis. npj Syst Biol Appl 11, 29 (2025). https://doi.org/10.1038/s41540-025-00507-z


Received: 28 August 2024
Accepted: 26 February 2025
Published: 20 March 2025
Version of record: 20 March 2025
DOI: https://doi.org/10.1038/s41540-025-00507-z