By Conor León
February 3, 2022
Genome Annotation and scRNA-seq Data
Single-cell RNA sequencing (scRNA-seq) has revolutionized the field of transcriptomics, allowing researchers to see which genes are actively expressed in individual cells across a population.
So far, scRNA-seq and downstream analyses have relied on high-quality, annotated genomes. “Annotations” are essentially the notes researchers have attached to a genome describing where genes are and what they do; bioinformaticians use them in scRNA-seq analysis to identify the cell types in their data. For commonly studied species, genome annotation is achieved through homology analysis followed by manual curation or validation. This raises the question: how can researchers use scRNA-seq to study uncommon organisms, or genomic regions with little or no functional annotation?
Computational Tool in R
In a recently published Nature Communications article, Wang et al. propose borrowing a tool built for a different kind of sequencing data to solve this problem. Their procedure analyzes scRNA-seq data from a cell population of interest without relying on gene annotation, using groHMM, an R package originally developed to identify transcription units in GRO-seq (global run-on sequencing) data.
This tool identifies transcriptionally active regions (TARs) in single-cell data. Once TARs are identified, they can be compared against the existing annotation: transcribed regions that overlap annotated genes (annotated TARs, or aTARs) versus transcribed regions that lack annotation (unannotated TARs, or uTARs). The comparison reveals how much biologically relevant information lies outside the known gene annotations for the organism of interest.
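As its name suggests, groHMM calls transcribed regions with a hidden Markov model. To give a feel for what annotation-free detection involves, here is a minimal sketch that segments binned read counts along one strand into untranscribed and transcribed states with a two-state HMM and decodes them with the Viterbi algorithm. It is only in the spirit of groHMM: the Poisson emissions, the rates `lam_off` and `lam_on`, the 50 bp bins, and the names `viterbi_two_state` and `call_tars` are hypothetical choices for illustration, not groHMM's actual model, parameters, or R API.

```python
import math


def poisson_logpmf(k: int, lam: float) -> float:
    """Log-probability of observing k reads in a bin under a Poisson(lam) model."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def viterbi_two_state(counts, lam_off=0.5, lam_on=5.0, p_stay=0.95):
    """Most likely state per bin: 0 = untranscribed, 1 = transcribed (hypothetical model)."""
    if not counts:
        return []
    lams = (lam_off, lam_on)
    log_stay, log_switch = math.log(p_stay), math.log(1.0 - p_stay)
    # Equal prior probability for both states in the first bin.
    score = [math.log(0.5) + poisson_logpmf(counts[0], lam) for lam in lams]
    backpointers = []
    for k in counts[1:]:
        prev_state, new_score = [], []
        for s in (0, 1):
            stay = score[s] + log_stay
            switch = score[1 - s] + log_switch
            prev_state.append(s if stay >= switch else 1 - s)
            new_score.append(max(stay, switch) + poisson_logpmf(k, lams[s]))
        backpointers.append(prev_state)
        score = new_score
    # Trace back from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for prev_state in reversed(backpointers):
        state = prev_state[state]
        path.append(state)
    return path[::-1]


def call_tars(counts, bin_size=50, **hmm_params):
    """Merge consecutive transcribed bins into (start, end) TARs in base pairs."""
    labels = viterbi_two_state(counts, **hmm_params)
    tars, start = [], None
    for i, label in enumerate(labels + [0]):  # trailing 0 closes a final open TAR
        if label == 1 and start is None:
            start = i
        elif label == 0 and start is not None:
            tars.append((start * bin_size, i * bin_size))
            start = None
    return tars
```

For example, `call_tars([0, 1, 0, 0, 6, 7, 5, 8, 0, 0, 1, 0])` labels the four high-coverage bins as transcribed and returns `[(200, 400)]`. The transition probability `p_stay` is the main design knob: values close to 1 make switching states expensive, so a single noisy bin does not fragment a TAR.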
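Splitting TARs into aTARs and uTARs is essentially an interval-overlap question. The sketch below assumes half-open coordinates, strand-aware overlap, and a hypothetical `Interval` type with toy coordinates and read counts; it uses a brute-force loop for clarity, whereas a real pipeline would more likely use an interval tree or a tool such as `bedtools intersect`. The last helper estimates what share of TAR-assigned reads falls outside the annotation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Half-open genomic interval [start, end) on one strand (hypothetical type)."""
    chrom: str
    start: int
    end: int
    strand: str


def overlaps(a: Interval, b: Interval) -> bool:
    """Strand-aware test for at least 1 bp of overlap."""
    return (a.chrom == b.chrom and a.strand == b.strand
            and a.start < b.end and b.start < a.end)


def split_tars(tars, genes):
    """Return (aTARs, uTARs): TARs that do / do not overlap any annotated gene."""
    atars, utars = [], []
    for tar in tars:
        if any(overlaps(tar, gene) for gene in genes):
            atars.append(tar)
        else:
            utars.append(tar)
    return atars, utars


def utar_read_fraction(read_counts, utars):
    """Fraction of all TAR-assigned reads that fall in uTARs."""
    total = sum(read_counts.values())
    in_utars = sum(read_counts.get(tar, 0) for tar in utars)
    return in_utars / total if total else 0.0


# Toy example with made-up coordinates and counts.
genes = [Interval("chr1", 1_000, 5_000, "+")]
tars = [
    Interval("chr1", 4_500, 6_000, "+"),  # overlaps the gene -> aTAR
    Interval("chr1", 9_000, 9_800, "+"),  # intergenic -> uTAR
    Interval("chr1", 2_000, 3_000, "-"),  # antisense to the gene -> uTAR
]
atars, utars = split_tars(tars, genes)
reads = {tars[0]: 120, tars[1]: 60, tars[2]: 20}
print(len(atars), len(utars), utar_read_fraction(reads, utars))
```

Whether antisense transcription inside a gene should count as unannotated is a design choice; with a strand-unaware `overlaps`, the third TAR would become an aTAR instead.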
Using uTARs to Distinguish Between Cell Types
Cells of the same type can be grouped into clusters by the co-expression of well-characterized genes. Wang et al. found that certain uTARs also correlate strongly with specific cell types: they correctly identified embryonic chicken heart cells using 18 differentially expressed uTARs detected with groHMM. This opens up the exciting possibility that a refined version of the procedure could help distinguish between cell types in transcriptomic datasets.
The most obvious application is more robust analysis of scRNA-seq data from organisms with little gene annotation. The approach also has exciting potential for developing tissues, which are often less thoroughly annotated than adult tissues. Because gene expression in developing tissues is often transient, a better understanding of the developmental transcriptome would be extremely valuable.
The field of scRNA-seq data analysis is evolving rapidly. The authors hope their procedure can be used alongside other custom bioinformatic tools and pipelines to further the current understanding of transcriptional dynamics in cells.
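To illustrate how uTARs can act as cell-type markers, here is a minimal sketch that ranks features (for example uTARs) that are higher in one cluster than in all other cells. It uses a deliberately simple score, the difference in mean `log1p` counts, and is not the differential expression test Wang et al. used; a real analysis would add a statistical test such as the Wilcoxon rank-sum test with multiple-testing correction. The function name `rank_marker_features`, the `min_diff` cutoff, and the toy counts are all hypothetical.

```python
import math


def rank_marker_features(counts, labels, cluster, min_diff=0.5):
    """Rank features up-regulated in `cluster` versus all other cells.

    counts: dict mapping feature name -> per-cell counts, in the same order as labels.
    Returns (feature, diff, pct_in) tuples sorted by diff, where diff is the
    difference in mean log1p(count) between the cluster and the rest, and pct_in
    is the fraction of cluster cells with a nonzero count.
    """
    in_idx = [i for i, lab in enumerate(labels) if lab == cluster]
    out_idx = [i for i, lab in enumerate(labels) if lab != cluster]
    if not in_idx or not out_idx:
        raise ValueError("need at least one cell inside and outside the cluster")
    results = []
    for feature, values in counts.items():
        mean_in = sum(math.log1p(values[i]) for i in in_idx) / len(in_idx)
        mean_out = sum(math.log1p(values[i]) for i in out_idx) / len(out_idx)
        diff = mean_in - mean_out
        pct_in = sum(values[i] > 0 for i in in_idx) / len(in_idx)
        if diff >= min_diff:
            results.append((feature, diff, pct_in))
    return sorted(results, key=lambda r: r[1], reverse=True)


# Toy data: six cells, three hypothetical uTAR features.
labels = ["heart", "heart", "heart", "other", "other", "other"]
counts = {
    "uTAR_1": [5, 4, 6, 0, 0, 1],
    "uTAR_2": [1, 0, 1, 1, 0, 1],
    "uTAR_3": [0, 0, 0, 3, 4, 2],
}
for feature, diff, pct_in in rank_marker_features(counts, labels, "heart"):
    print(f"{feature}\t{diff:.2f}\t{pct_in:.0%}")
```

On this toy data only `uTAR_1` passes the cutoff: `uTAR_2` is expressed equally in both groups, and `uTAR_3` is higher outside the cluster.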
Conor León, Geneticist & Content Writer, Bridge Informatics
Conor is a Content Writer at Bridge Informatics, a professional services firm that helps biotech customers implement advanced techniques in the management and analysis of genomic data. Bridge Informatics focuses on data mining, machine learning, and various bioinformatic techniques to discover biomarkers and companion diagnostics. If you’re interested in reaching out, please email [email protected] or [email protected].