Scalable TCR Analysis for Autoimmune Biomarker Discovery

Situation

A research-driven biotechnology company investigating T cell–mediated immune responses aimed to uncover disease-associated T cell receptor (TCR) clusters relevant to autoimmune conditions such as psoriatic arthritis (PsA). The Company hypothesized that certain TCR motifs and clonal expansions could serve as biomarkers or therapeutic entry points. However, extracting meaningful patterns from large, noisy datasets posed a significant analytical challenge, particularly when integrating millions of CDR3 sequences with gene expression profiles.

To overcome these hurdles, the Company engaged Bridge Informatics (BI) to deploy and validate machine learning tools capable of scaling with data complexity while maintaining biological interpretability.

Strategy

BI designed and executed a cloud-native analysis workflow built around three deep learning tools for TCR bioinformatics: TCR-DeepInsight, Mal-ID, and DeepTCR. Each applies a different neural architecture to detect sequence and phenotype patterns that conventional similarity- or motif-based methods can miss.

TCR-DeepInsight integrates a BERT transformer model for unsupervised sequence embedding with a variational autoencoder (VAE) trained on gene expression data. This hybrid framework clusters TCRs using both sequence similarity and phenotypic context, capturing shared immune functionality across donors and conditions.
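To make joint sequence-plus-phenotype clustering concrete, here is a minimal sketch. It assumes each TCR already has two precomputed vectors: one standing in for a sequence embedding (the BERT model's role) and one for a gene expression latent vector (the VAE's role). The weighted cosine similarity, the `joint_clusters` function, its `weight` and `threshold` parameters, and the single-linkage grouping are hypothetical choices for illustration, not TCR-DeepInsight's actual algorithm or API.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def joint_clusters(
    seq_emb: Dict[str, List[float]],
    gex_emb: Dict[str, List[float]],
    weight: float = 0.5,
    threshold: float = 0.9,
) -> List[List[str]]:
    """Hypothetical joint clustering: link two TCRs when the weighted mix of
    sequence-embedding and expression-embedding similarity reaches `threshold`,
    then return connected components (single linkage via union-find)."""
    ids = sorted(seq_emb)
    parent = {tcr: tcr for tcr in ids}

    def find(x: str) -> str:
        # Follow parent pointers to the component root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            sim = weight * cosine(seq_emb[a], seq_emb[b]) + (1 - weight) * cosine(
                gex_emb[a], gex_emb[b]
            )
            if sim >= threshold:
                parent[find(a)] = find(b)  # merge the two components

    components: Dict[str, List[str]] = {}
    for tcr in ids:
        components.setdefault(find(tcr), []).append(tcr)
    return sorted(components.values())


# Toy vectors standing in for model outputs (invented for illustration).
seq_emb = {"t1": [1.0, 0.0], "t2": [0.9, 0.1], "t3": [0.0, 1.0], "t4": [1.0, 0.0]}
gex_emb = {"t1": [1.0, 0.0], "t2": [1.0, 0.05], "t3": [1.0, 0.0], "t4": [0.0, 1.0]}
print(joint_clusters(seq_emb, gex_emb))  # [['t1', 't2'], ['t3'], ['t4']]
```

In the toy data, t1 and t2 cluster because they agree in both views; t3 shares t1's expression profile and t4 shares its sequence, but neither clears the threshold on one view alone. The all-pairs loop is quadratic, so a repertoire with millions of TCRs would need approximate nearest-neighbor search instead.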

Mal-ID is a supervised classifier trained to predict disease state directly from raw TCR sequences. It uses deep neural networks to learn complex, nonlinear features across large datasets. Unlike older tools that rely on simple sequence matching or motif enrichment, Mal-ID can identify subtle patterns associated with disease progression, immune evasion, or clonal expansion.
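The general idea of a supervised repertoire classifier can be illustrated with a deliberately simple baseline: turn each donor's CDR3 sequences into numeric features, then fit a model that outputs a disease probability. The sketch below is not Mal-ID's architecture. The k-mer frequency features, the logistic regression, the function names, and the toy repertoires are all hypothetical stand-ins chosen for clarity.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def kmer_features(cdr3s: List[str], k: int = 3) -> Dict[str, float]:
    """Hypothetical repertoire featurization: relative frequency of each CDR3 k-mer."""
    counts = Counter(s[i:i + k] for s in cdr3s for i in range(len(s) - k + 1))
    total = sum(counts.values()) or 1
    return {kmer: n / total for kmer, n in counts.items()}


def sigmoid(z: float) -> float:
    """Logistic function mapping a real-valued score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(
    samples: List[Tuple[List[str], int]], epochs: int = 500, lr: float = 0.5, k: int = 3
) -> Tuple[Dict[str, float], float]:
    """Fit a logistic regression (label 1 = disease, 0 = healthy) by batch gradient descent."""
    data = [(kmer_features(cdr3s, k), label) for cdr3s, label in samples]
    weights: Dict[str, float] = {}
    bias = 0.0
    for _ in range(epochs):
        grad_w: Dict[str, float] = {}
        grad_b = 0.0
        for x, y in data:
            err = sigmoid(bias + sum(weights.get(f, 0.0) * v for f, v in x.items())) - y
            grad_b += err
            for f, v in x.items():
                grad_w[f] = grad_w.get(f, 0.0) + err * v
        bias -= lr * grad_b / len(data)
        for f, g in grad_w.items():
            weights[f] = weights.get(f, 0.0) - lr * g / len(data)
    return weights, bias


def predict(weights: Dict[str, float], bias: float, cdr3s: List[str], k: int = 3) -> float:
    """Probability that a repertoire comes from a disease donor; unseen k-mers contribute 0."""
    x = kmer_features(cdr3s, k)
    return sigmoid(bias + sum(weights.get(f, 0.0) * v for f, v in x.items()))


# Invented toy repertoires: disease samples share a "GLY" motif, healthy ones "PTQ".
train = [
    (["CASSGLYF", "CASRGLYF"], 1),
    (["CATSGLYF", "CASSGLYF"], 1),
    (["CASSPTQF", "CASRPTQF"], 0),
    (["CATSPTQF", "CASSPTQF"], 0),
]
w, b = train_logistic(train)
print(predict(w, b, ["CAKSGLYF"]) > 0.5, predict(w, b, ["CAKSPTQF"]) < 0.5)  # True True
```

A linear model on k-mer frequencies can only learn additive motif effects; the nonlinear features described above are what deep networks add beyond a baseline like this one.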

DeepTCR provides a flexible platform for both supervised and unsupervised learning on TCR repertoires. It supports sequence-level classification, motif extraction, and visualization using convolutional neural networks (CNNs) and variational autoencoders. DeepTCR was used to cross-validate key clusters and extract interpretable motifs that reinforced findings from the other models.
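Motif extraction from a CNN rests on a standard mechanism: a filter slides along a one-hot-encoded CDR3, and the position of its strongest activation points to the subsequence the filter responds to. The sketch below shows that mechanism with a single hand-set filter. In DeepTCR the filters are learned during training; `conv_max_pool`, `extract_motif`, and the toy "GLY" filter are hypothetical illustrations, not DeepTCR's API.

```python
from typing import Dict, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(seq: str) -> List[List[float]]:
    """Encode a CDR3 as a (length x 20) one-hot matrix over the standard amino acids."""
    return [[1.0 if aa == a else 0.0 for a in AMINO_ACIDS] for aa in seq]


def conv_max_pool(seq: str, kernel: List[Dict[str, float]], bias: float = 0.0) -> Tuple[float, int]:
    """Slide one 1D convolutional filter along the sequence, apply ReLU, and
    global-max-pool. Returns (max activation, start position), or (0.0, -1)
    if the filter never fires."""
    x = one_hot(seq)
    width = len(kernel)
    best, best_pos = 0.0, -1
    for start in range(len(seq) - width + 1):
        act = bias
        for offset, weights in enumerate(kernel):
            row = x[start + offset]
            # Dot product of this filter column with the one-hot residue vector.
            act += sum(weights.get(a, 0.0) * row[j] for j, a in enumerate(AMINO_ACIDS))
        act = max(0.0, act)  # ReLU
        if act > best:
            best, best_pos = act, start
    return best, best_pos


def extract_motif(seq: str, kernel: List[Dict[str, float]], bias: float = 0.0) -> str:
    """Return the subsequence that most strongly activates the filter ('' if none)."""
    _, pos = conv_max_pool(seq, kernel, bias)
    return seq[pos:pos + len(kernel)] if pos >= 0 else ""


# Hypothetical width-3 filter that rewards G, L, Y in consecutive positions.
gly_filter = [{"G": 1.0}, {"L": 1.0}, {"Y": 1.0}]
print(conv_max_pool("CASSGLYF", gly_filter, bias=-1.5))  # (1.5, 4)
print(extract_motif("CASSGLYF", gly_filter, bias=-1.5))  # GLY
print(repr(extract_motif("CASSPTQF", gly_filter, bias=-1.5)))  # ''
```

The negative bias acts as a detection threshold: a window must match all three positions before the ReLU lets any signal through.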

BI’s implementation strategy included:

  1. Cloud Infrastructure and Pipeline Engineering
    Deployed a Snakemake-managed, Dockerized workflow on GPU-enabled AWS EC2 instances. Configured the environments for CUDA-accelerated deep neural network (DNN) execution, ensuring reproducibility and scalability.

  2. Model Training and Execution
    Ran TCR-DeepInsight on a curated PsA dataset to generate unsupervised clusters and compute similarity and disease-specificity scores. In parallel, applied Mal-ID to the same dataset, training its classifier on disease vs. healthy samples. Used DeepTCR to independently assess motif patterns and identify high-impact CDR3 features across key clones. Outputs included TCRs ranked by predictive importance and receiver operating characteristic (ROC) curves showing classification performance.

  3. Cross-Tool Comparison and Validation
    Benchmarked all three tools against published datasets and internal controls. Used Jaccard similarity to measure overlap between the TCR sets each tool prioritized, and area under the ROC curve (AUC) to assess predictive power. Found strong agreement: clusters identified by TCR-DeepInsight were highly enriched for sequences that both Mal-ID and DeepTCR flagged as disease-associated.

  4. Visualization and Interpretability
    Developed custom UMAP projections, network graphs, and Cytoscape-compatible outputs to visualize relationships among clusters, predictive scores, and disease labels. Delivered summary tables of enriched V/J gene usage, motif prevalence, and cluster-level disease enrichment.

  5. Knowledge Transfer and Strategic Support
    Held weekly working sessions to train the Company’s team to interpret DNN outputs, tune hyperparameters, and extend the tools to new indications. Provided detailed documentation and a roadmap for future publications and therapeutic exploration.
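The cross-tool validation in step 3 relies on two standard metrics. The sketch below computes Jaccard similarity between two sets of flagged sequences, and ROC AUC through its rank interpretation: the probability that a randomly chosen positive outscores a randomly chosen negative, with ties counted as half. The function names and toy inputs are hypothetical; in practice a library routine such as scikit-learn's `roc_auc_score` would typically be used for AUC.

```python
from typing import Iterable, List


def jaccard(a: Iterable[str], b: Iterable[str]) -> float:
    """|A ∩ B| / |A ∪ B| for two collections of TCR identifiers (1.0 if both are empty)."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def roc_auc(labels: List[int], scores: List[float]) -> float:
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly,
    counting ties as 0.5 (the Mann-Whitney U formulation). O(P*N); fine for a sketch."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented example: CDR3s in one cluster vs. CDR3s another tool flagged as disease-associated.
cluster = {"CASSGLYF", "CASRGLYF", "CATSGLYF"}
flagged = {"CASSGLYF", "CASRGLYF", "CASSPTQF"}
print(jaccard(cluster, flagged))  # 0.5

# Invented donor-level labels (1 = PsA, 0 = healthy) and classifier scores.
print(roc_auc([1, 1, 0, 0], [0.9, 0.8, 0.85, 0.1]))  # 0.75
```

Jaccard rewards overlap regardless of set size, so two tools that flag very different numbers of TCRs will score low even if the smaller set is fully contained in the larger one; an overlap coefficient is a common complement in that case.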

Results

In just six weeks, BI delivered a powerful framework for disease-focused TCR repertoire discovery:

Validated Deep Learning Outputs
TCR-DeepInsight revealed biologically meaningful clusters aligned with gene expression and disease phenotype. Mal-ID distinguished PsA from healthy donors with strong discriminative performance (AUC > 0.9), while DeepTCR supported these findings with interpretable motifs and feature-level visualizations.

Complementary Tool Synergy
While TCR-DeepInsight uncovered unsupervised structure, Mal-ID and DeepTCR independently validated its disease relevance. Together, the three tools formed a multi-angle approach that increased confidence in both the data and its interpretation.

Scalable, Production-Ready Infrastructure
Although run on AWS, all tools were packaged in a cloud-agnostic, Dockerized format, ready for reuse on new datasets and disease areas.

Conclusion

By integrating deep neural network–driven tools such as TCR-DeepInsight, Mal-ID, and DeepTCR, Bridge Informatics helped transform the Company’s TCR data into actionable immunological insights.

This project showcased how DNNs can detect intricate sequence relationships and condition-specific signatures that evade traditional pipelines. The result: a scalable, interpretable system for immune repertoire analysis with immediate relevance to biomarker discovery and immunotherapy development.

If you’re looking to bring AI-driven precision to your immunological research, our team at Bridge Informatics is ready to help. Let’s build the infrastructure that makes your science future-ready.

Originally published by Bridge Informatics. Reuse with attribution only.