Using Neural Networks and Data-Driven Approaches to Upgrade our Understanding of Human Disease 

The Language of Disease

How should we think about a disease? Labels such as “depression” and “hay fever” are convenient, but they can hide mechanisms that different diseases share. How, then, can we understand diseases in a way that goes beyond treating them as monolithic, separate, discrete things?

One approach to making complex data of any kind human-comprehensible is to map it into an abstract space with several dimensions. Computational linguists, for example, organize words in a ‘meaning space’ where words with similar meanings cluster together, and biologists identify the different cell types in a sample by clustering analysis. Jia et al. (2023) recently used these methods to create an abstract, 20-dimensional ‘space of diseases.’
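To make the ‘meaning space’ idea concrete, here is a minimal Python sketch. The words and their three-dimensional vectors are hand-written for illustration only (real embeddings are learned from data and have far more dimensions), and cosine similarity is assumed as the measure of closeness; it is one common choice, not necessarily the one used in the study.

```python
import math

# Hypothetical toy vectors, hand-written for illustration only.
TOY_SPACE = {
    "happy":  [0.9, 0.1, 0.0],
    "joyful": [0.8, 0.2, 0.1],
    "sad":    [-0.9, 0.1, 0.0],
    "table":  [0.0, 0.1, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_neighbors(space, query):
    """Return every other item, sorted from most to least similar to `query`."""
    q = space[query]
    others = [
        (name, cosine_similarity(q, vec))
        for name, vec in space.items()
        if name != query
    ]
    return sorted(others, key=lambda pair: pair[1], reverse=True)


ranked = rank_neighbors(TOY_SPACE, "happy")
nearest, furthest = ranked[0][0], ranked[-1][0]
```

In this toy space the first entry of the ranking is the closest neighbor and the last entry is the most distant one, which is the same kind of lookup one can perform on diseases once they have been placed in a shared space.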

Building a State-Space

Constructing these high-dimensional spaces requires a lot of data. Rather than using biomarker measurements from patients, Jia and colleagues took a creative route: they built a language model from the text of electronic health insurance records of over 100 million people in the USA. They trained a simple neural network to predict randomly omitted diseases from each patient’s chronologically ordered list of diagnoses, then used principal component analysis (PCA) to map over 500 different diseases onto 20 ‘disease dimensions.’ The advantage of this approach is that, given any disease in the space, one can look up both its closest neighbors and the diseases that lie furthest from it (for example, the closest neighbor of multiple sclerosis (MS) was encephalitis and its furthest was asthma).
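The ‘predict the omitted disease’ training task can be sketched in a few lines of Python. This is a simplified illustration, not the authors’ pipeline: the function name `make_masked_example`, the patient history, and the choice to hide exactly one diagnosis per example are all assumptions made here, since the article does not describe the masking scheme or the network architecture in detail.

```python
import random


def make_masked_example(history, rng):
    """Hide one randomly chosen diagnosis; return (context, target).

    `history` is a chronologically ordered list of diagnoses. The remaining
    diagnoses keep their original order and form the model's input.
    """
    if len(history) < 2:
        raise ValueError("need at least two diagnoses to mask one")
    idx = rng.randrange(len(history))
    context = history[:idx] + history[idx + 1:]
    return context, history[idx]


# Hypothetical chronological record for one patient (not real data).
history = ["asthma", "hay fever", "eczema", "depression"]
rng = random.Random(0)  # seeded so the example is reproducible
context, target = make_masked_example(history, rng)
```

A model trained on many such pairs has to learn which diseases tend to appear together in the same records, and it is that learned structure that can then be compressed into a small number of dimensions.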

Grounding the Model in Genetic Data

Treating each disease dimension as a continuous genetic trait, they were able to map these new genetic traits using family records and associate them with genomic data from the UK Biobank. This new model was then verified against two other databases in the USA and Japan. Grounding their 20-dimensional disease space model in genomic data allowed them to examine the combined effect of numerous genetic variants on each dimension. In doing so, they discovered new variants (and constellations of variants) that may contribute to similar diseases, including diseases that previous taxonomies, influenced by arbitrary anatomical and cultural factors, may not necessarily view as similar. In summary, they were able to construct a model that cut against the grain of typical disease classification, challenging the “traditional, reductionist, one-disease-at-a-time approach” in favor of something more holistic. Though the consequences of these new connections remain to be seen and the model remains to be refined, the study makes a convincing argument for creative data-driven approaches to updating our understanding of human disease.
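One simple way to picture the ‘combined effect of numerous genetic variants’ on a continuous trait is an additive score: each variant contributes its effect size multiplied by the number of copies a person carries. The Python sketch below assumes this additive model purely for illustration; the article does not state which statistical method the authors used, and the effect sizes and genotype are invented numbers, not results from the study.

```python
def additive_score(allele_counts, effect_sizes):
    """Weighted sum of allele counts (0, 1 or 2 copies per variant)."""
    if len(allele_counts) != len(effect_sizes):
        raise ValueError("one effect size is needed per variant")
    return sum(count * effect for count, effect in zip(allele_counts, effect_sizes))


# Hypothetical effect sizes of four variants on one disease dimension.
effects = [0.5, -0.25, 0.125, 0.0]
# Hypothetical genotype: copies of the effect allele at each variant.
person = [2, 1, 0, 2]

score = additive_score(person, effects)  # 2*0.5 + 1*(-0.25) + 0 + 0 = 0.75
```

Under this assumption, a person’s position along a disease dimension shifts a little with each variant, so many small effects can add up to a noticeable difference even when no single variant stands out.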

Outsourcing Bioinformatics Analysis: How Bridge Informatics Can Help

This study is an excellent example of how modern data analysis techniques, carefully combined, can cut across established boundaries and make links that would not have been possible otherwise. Groundbreaking studies like this one are made possible by technological advances that make biological data generation, storage, and analysis faster and more accessible than ever before. From pipeline development and software engineering to deploying existing bioinformatics tools, Bridge Informatics can help you at every step of your research journey.

As experts across data types from leading sequencing platforms, we can help you tackle the challenging computational tasks of storing, analyzing, and interpreting genomic and transcriptomic data. Bridge Informatics’ bioinformaticians are trained bench biologists, so they understand the biological questions driving your computational analysis. Click here to schedule a free introductory call with a member of our team.



Fionn O’Sullivan, Neuroscientist & Content Writer, Bridge Informatics

Fionn is a Content Writer at Bridge Informatics, a professional services firm that helps biotech customers implement advanced techniques in the management and analysis of genomic data. Bridge Informatics focuses on data mining, machine learning, and various bioinformatic techniques to discover biomarkers and companion diagnostics. If you’re interested in reaching out, please email [email protected] or [email protected].
