SCORPION: Enhancing Population-Level Gene Regulatory Network Analysis with Single-Cell Data

Introduction

Single-cell RNA sequencing (scRNA-seq) characterizes transcriptomes at the level of individual cells within a heterogeneous population. Unlike traditional bulk RNA sequencing, which provides an averaged gene expression profile across a mixture of cells, scRNA-seq offers high-resolution insights into cellular diversity. Analysis of scRNA-seq data typically begins with dimensionality reduction, which simplifies complex gene expression data into more manageable forms and aids subsequent unsupervised clustering. Algorithms such as K-means clustering and community detection methods such as Louvain or Leiden then facilitate the identification of distinct cell populations or clusters. Analyzing the gene expression patterns of these clusters provides valuable insights into cellular functions, disease mechanisms, developmental pathways, and gene regulatory networks.

Gene regulatory networks are systems of interacting genes, transcription factors, and other molecular components that govern gene expression levels within a cell. These networks consist of nodes (genes and proteins) and edges (regulatory interactions) that together orchestrate the cellular processes necessary for development, function, and response to environmental stimuli. In a recent publication in Nature Computational Science, Osorio et al. (2024) introduce SCORPION, an R package for reconstructing gene regulatory networks from scRNA-seq data in a way that enables detailed population-level comparisons. This tool aims to improve our understanding of gene regulation across different biological conditions and diseases. By revealing regulatory interactions between genes from single-cell data, SCORPION provides detailed insights into the mechanisms that drive health and disease at a molecular level.

Single-Cell Analysis with SCORPION

SCORPION first addresses the sparsity of high-throughput single-cell/nuclei RNA-seq data using a technique called coarse-graining. This process collapses similar cells into SuperCells, which reduces the number of samples, lowers the proportion of zero counts, and allows relationships between the expression levels of genes to be captured more reliably.
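
The effect of coarse-graining on sparsity can be illustrated with a minimal Python sketch. It assumes the grouping of similar cells is already known and simply averages expression within each group; the function names, the toy counts, and the group labels are hypothetical, and this is not SCORPION's own code (SCORPION is an R package and decides for itself which cells are similar).

```python
from collections import defaultdict


def coarse_grain(expression, groups):
    """Average per-cell expression profiles that share a group label.

    expression: list of per-cell lists (cells x genes)
    groups:     one label per cell (assumed to be given; hypothetical here)
    returns:    dict mapping each label to its averaged "SuperCell" profile
    """
    sums = {}
    counts = defaultdict(int)
    for profile, label in zip(expression, groups):
        if label not in sums:
            sums[label] = [0.0] * len(profile)
        for j, value in enumerate(profile):
            sums[label][j] += value
        counts[label] += 1
    return {
        label: [total / counts[label] for total in totals]
        for label, totals in sums.items()
    }


def zero_fraction(profiles):
    """Fraction of entries equal to zero across a collection of profiles."""
    values = [v for profile in profiles for v in profile]
    return sum(1 for v in values if v == 0) / len(values)


# Toy data: 4 cells x 3 genes, mostly zeros as in real scRNA-seq counts.
cells = [
    [0, 2, 0],
    [1, 0, 0],
    [0, 0, 4],
    [0, 0, 2],
]
labels = ["a", "a", "b", "b"]  # hypothetical grouping of similar cells

supercells = coarse_grain(cells, labels)
print(supercells)                          # {'a': [0.5, 1.0, 0.0], 'b': [0.0, 0.0, 3.0]}
print(zero_fraction(cells))                # 8 of 12 entries are zero
print(zero_fraction(supercells.values()))  # 3 of 6 entries are zero
```

In this toy case the share of zero entries drops from two thirds to one half after collapsing four cells into two SuperCells, which is the kind of sparsity reduction the coarse-graining step is meant to achieve.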

SCORPION then uses a message-passing algorithm called PANDA (Passing Attributes between Networks for Data Assimilation), which starts from three initial networks:

  1. Co-regulatory network: Represents co-expression patterns between genes, constructed using correlation analyses of the coarse-grained transcriptomic data.
  2. Cooperativity network: Accounts for known protein-protein interactions between transcription factors, with data sourced from the STRING database.
  3. Regulatory network: Describes the relationship between transcription factors and their target genes using transcription factor footprint motifs found in the promoter regions of genes.
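
As a rough illustration of the first of these networks, the sketch below builds a gene-by-gene co-expression matrix from Pearson correlations across SuperCells. The gene names and expression values are made up, the helper names are hypothetical, and Pearson correlation is an assumption made for this example; the exact correlation analysis SCORPION runs may differ.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def coregulation_matrix(profiles):
    """Build a symmetric gene x gene correlation matrix.

    profiles: dict mapping gene name -> expression values across SuperCells
    returns:  nested dict, matrix[gene_a][gene_b] = correlation
    """
    genes = sorted(profiles)
    return {g: {h: pearson(profiles[g], profiles[h]) for h in genes} for g in genes}


# Hypothetical coarse-grained expression for three genes in four SuperCells.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],   # rises with geneA
    "geneC": [4.0, 3.0, 2.0, 1.0],   # falls as geneA rises
}

coreg = coregulation_matrix(profiles)
for gene, row in coreg.items():
    print(gene, {other: round(value, 2) for other, value in row.items()})
```

Genes whose expression moves together across SuperCells receive weights near 1, and genes that move in opposite directions receive weights near -1.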

PANDA then iteratively refines the regulatory network by integrating information from the co-regulatory and cooperativity networks. The iterations continue until the process converges on a refined regulatory network that describes the relationships between transcription factors and their target genes.
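
The iterative refinement can be sketched in simplified form as follows. This is an illustration in the spirit of PANDA-style message passing, not the SCORPION or PANDA implementation: it assumes a Tanimoto-style similarity, a small update rate `alpha`, and it holds the cooperativity and co-regulatory networks fixed, whereas the published algorithm also updates those two networks and normalizes edge weights. All function names, parameters, and toy matrices are hypothetical.

```python
import math


def tanimoto(x, y):
    """Tanimoto-style similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    denom = sum(a * a for a in x) + sum(b * b for b in y) - abs(dot)
    return dot / math.sqrt(denom) if denom > 0 else 0.0


def column(matrix, j):
    """Return column j of a list-of-lists matrix."""
    return [row[j] for row in matrix]


def refine(W, P, C, alpha=0.1, tol=1e-4, max_iter=100):
    """Iteratively refine a TF x gene regulatory network W.

    P: TF x TF cooperativity network (held fixed in this sketch)
    C: gene x gene co-regulatory network (held fixed in this sketch)
    Returns the refined network and the number of iterations performed.
    """
    n_tf, n_gene = len(W), len(W[0])
    for step in range(1, max_iter + 1):
        updated, change = [], 0.0
        for i in range(n_tf):
            new_row = []
            for j in range(n_gene):
                # Support for edge i -> j from TFs that cooperate with TF i.
                responsibility = tanimoto(P[i], column(W, j))
                # Support for edge i -> j from genes co-expressed with gene j.
                availability = tanimoto(W[i], column(C, j))
                value = (1 - alpha) * W[i][j] + alpha * 0.5 * (responsibility + availability)
                change += abs(value - W[i][j])
                new_row.append(value)
            updated.append(new_row)
        W = updated
        if change / (n_tf * n_gene) < tol:  # mean absolute change is small
            break
    return W, step


# Toy inputs: 2 transcription factors, 3 genes.
motif_prior = [[1, 0, 0],   # TF0 has a motif in the promoter of gene 0
               [0, 1, 0]]   # TF1 has a motif in the promoter of gene 1
cooperativity = [[1.0, 0.5],
                 [0.5, 1.0]]
coregulation = [[1.0, 0.0, 0.8],   # gene 0 and gene 2 are co-expressed
                [0.0, 1.0, 0.0],
                [0.8, 0.0, 1.0]]

refined, steps = refine(motif_prior, cooperativity, coregulation)
for row in refined:
    print([round(v, 3) for v in row])
print("iterations:", steps)
```

In this toy setup the edge from TF0 to gene 2 starts at zero in the motif prior but picks up weight during refinement, because gene 2 is co-expressed with gene 0, a gene TF0 already targets. That is the intuition behind letting the co-regulatory and cooperativity networks inform the regulatory network.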

SCORPION was benchmarked against 12 existing gene regulatory network reconstruction techniques and was reported to outperform them, accurately identifying differences in regulatory networks between wild-type and transcription factor-perturbed cells. It is also scalable, as demonstrated by its application to a single-cell RNA-sequencing atlas of 200,436 cells from colorectal cancer and adjacent healthy tissues. Furthermore, SCORPION is compatible with the popular single-cell analysis R package Seurat for data loading and clustering, and it allows for potential integration with other types of data, such as bulk RNA-seq and ATAC-seq, to enhance gene regulatory network analysis. It can also be combined with other R packages to support comprehensive multi-omics studies, providing a more complete view of cellular functions and interactions.

Revolutionizing Cancer Research

The ability to accurately reconstruct gene regulatory networks from single-cell data has profound implications for biomedical research and clinical applications. By enabling detailed comparisons across different cell populations, SCORPION can identify key regulatory mechanisms driving diseases like cancer. This tool has the potential to help guide the development of targeted therapies by pinpointing crucial transcription factors and regulatory interactions involved in disease progression.

In a case study, SCORPION was used to identify significant transcription factors and gene interactions associated with colorectal cancer progression and patient survival. These insights could lead to more personalized treatment strategies and better prognostic tools in clinical settings. Moreover, SCORPION’s ability to model changes in transcription factor activity provides a clearer understanding of how these factors influence gene expression in various conditions.

The Future of Single-Cell Analysis

SCORPION is an exciting new tool for the analysis of single-cell transcriptomic data. Its approach to gene regulatory network reconstruction allows for precise population-level studies and has the potential to enhance our understanding of complex gene regulatory mechanisms. One of the key strengths of SCORPION is its ability to integrate various layers of biological data. This is particularly important as single-cell technologies evolve and deliver transcriptomic data of increasing resolution and complexity. Tools like SCORPION are essential for translating these high-resolution datasets into meaningful biological insights, with significant implications for understanding cellular processes, identifying biomarkers, and developing personalized medicine approaches that tailor treatments to an individual’s specific gene regulatory network profile.

Outsourcing Bioinformatics Analysis: How Bridge Informatics (BI) Can Help

We are passionate about empowering life science companies with cutting-edge technologies. BI’s data scientists prioritize studying, understanding, and reporting on the latest developments so we can advise our clients confidently. Our bioinformaticians are trained bench biologists, so they understand the biological questions driving your computational analysis.

From pipeline development and software engineering to deploying your existing bioinformatic tools, BI can help you on every step of your research journey. As experts across data types from leading sequencing platforms, we can help you tackle the challenging computational tasks of storing, analyzing and interpreting genomic and transcriptomic data. Click here to schedule a free introductory call with a member of our team.


Tyler Kolisnik, PhD, Data Scientist, Bridge Informatics

In his role as Data Scientist, Tyler helps clients transform complex data into actionable insights. A specialist in bioinformatics, his expertise includes high-throughput sequencing, data analytics, pipeline development, SQL databasing, and R and Python programming.

Tyler previously worked as a Bioinformatician at Imagia-Canexia Health, Rancho Biosciences, and GenomeDx Biosciences. He completed his PhD at Massey University in Auckland, New Zealand in collaboration with the Genome Sciences Centre in Vancouver. His research focused on the development of machine learning models and tools for improving cancer prognosis and treatment. If you’re interested in reaching out, please email [email protected] or [email protected]
