The Situation
A research institute was investigating extrachromosomal circular DNA (eccDNA), a class of genomic elements increasingly implicated in gene regulation, cancer biology, and genome instability. Across prior engagements with Bridge Informatics, the team had made substantial progress, including:
- Quality control and preprocessing of sequencing data
- eccDNA detection and assembly using tools such as Circle-Map and CReSIL
- Annotation and motif analysis of assembled DNA sequences
- Analysis of matched RNA-seq data to assess read pile-ups relative to sequence motifs
As the project progressed, both teams identified an opportunity to deepen the analysis. Early findings highlighted the limitations of focusing solely on assembled eccDNA sequences, a subset of the total data that inherently reflects prior assumptions. To move beyond these constraints, we shifted toward a more foundational approach: analyzing DNA and RNA datasets together in their raw form to uncover sequence-dependent relationships without bias from earlier processing steps.
The Challenge
The core scientific question was straightforward:
Do DNA and RNA datasets exhibit sequence complementarity, and if so, where in the genome do these interactions occur?
Answering this required navigating several layers of complexity:
- Heterogeneous data types: Short-read Illumina and long-read ONT data required distinct alignment and preprocessing strategies
- Platform-specific artifacts: ONT data introduced shadow sequences that needed to be identified and removed
- Computational scale: Tens to hundreds of millions of reads per dataset made naive all-versus-all comparisons infeasible
- Tooling limitations: Existing methods were not designed for genome-scale complementarity analysis
A new approach was required, one that balanced computational efficiency with biological rigor.
Our Strategy
Rather than forcing existing tools beyond their limits, we designed a multi-stage analytical framework to systematically reduce the search space while preserving biological signals.
Establishing a strong analytical foundation:
We began by setting up a reproducible computational environment and rigorously preparing both sequencing datasets. This included removing platform-specific artifacts from long-read ONT data and independently aligning both DNA and RNA reads to the human reference genome.
Iterative method evaluation:
For the complementarity analysis, we tested multiple approaches one after another, allowing early results to inform subsequent steps. An initial method quickly revealed insufficient discriminatory power, an important finding that enabled us to pivot early and avoid unnecessary computational cost.
Deploying scalable, sequence-level approaches:
We then implemented sequence-level comparison methods that were both computationally tractable and biologically meaningful. These approaches revealed a consistent signal across early and late timepoint samples, providing confidence in the observed patterns across multiple analytical strategies.
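The case study does not name the specific comparison methods used. As a purely illustrative sketch of how a sequence-level complementarity search can avoid an all-versus-all comparison, the example below indexes k-mers from DNA reads and looks up the reverse complement of each RNA read in that index. The function names, the k-mer size, and the `min_shared` threshold are all assumptions made for illustration, not details from the project.

```python
from collections import defaultdict

# Complement table; RNA 'U' is treated like DNA 'T' (illustrative assumption).
_COMPLEMENT = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA or RNA sequence, as DNA."""
    return seq.translate(_COMPLEMENT)[::-1].upper()


def kmers(seq: str, k: int):
    """Yield every k-mer of seq (as DNA), skipping k-mers that contain N."""
    seq = seq.upper().replace("U", "T")
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if "N" not in kmer:
            yield kmer


def build_index(dna_reads: dict[str, str], k: int) -> dict[str, set[str]]:
    """Map each k-mer to the set of DNA read IDs that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for read_id, seq in dna_reads.items():
        for kmer in kmers(seq, k):
            index[kmer].add(read_id)
    return index


def complementary_pairs(dna_reads, rna_reads, k=8, min_shared=2):
    """Count distinct k-mers shared between each DNA read and the reverse
    complement of each RNA read; keep pairs with at least min_shared hits."""
    index = build_index(dna_reads, k)
    counts: dict[tuple[str, str], int] = defaultdict(int)
    for rna_id, seq in rna_reads.items():
        # A set ensures each distinct k-mer is counted once per RNA read.
        for kmer in set(kmers(reverse_complement(seq), k)):
            for dna_id in index.get(kmer, ()):
                counts[(dna_id, rna_id)] += 1
    return {pair: n for pair, n in counts.items() if n >= min_shared}


# Hypothetical toy reads: r1 is the exact reverse complement of d1.
dna = {"d1": "ACGTTGCAAGGCT"}
rna = {"r1": "AGCCUUGCAACGU", "r2": "GGGGGGGGGGGGG"}
hits = complementary_pairs(dna, rna)
```

Because each RNA read only touches the index entries for its own k-mers, the work scales with the total number of k-mers rather than with the product of the two read counts. A real analysis would also need to account for strand, sequencing error, and low-complexity sequence, none of which this sketch handles.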
Targeted high-resolution analysis:
With the signal localized, we shifted to a focused analysis of the most informative genomic regions. This allowed us to confirm true sequence-level relationships while also uncovering additional, previously unobserved patterns that informed the next phase of research.
Delivering structured, usable outputs:
Finally, all regions of complementarity were annotated by genomic context, quantified across samples, and delivered as browser-ready files, accompanied by a comprehensive suite of visualizations to support interpretation and downstream use.
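The text says only that outputs were "browser-ready" and does not specify a file format. Assuming BED, a common genome-browser format, a minimal sketch of turning annotated regions into BED lines might look like the following. The function name, the track name, and the example regions are hypothetical.

```python
def to_bed_lines(regions, track_name="complementarity"):
    """Format regions as BED5 lines (chrom, start, end, name, score).

    regions: iterable of (chrom, start, end, name, score) tuples using
    0-based, half-open coordinates, as the BED format expects.
    """
    lines = [f'track name="{track_name}"']
    # Sort by chromosome name (lexicographic), then by coordinates.
    for chrom, start, end, name, score in sorted(
        regions, key=lambda r: (r[0], r[1], r[2])
    ):
        if start < 0 or end <= start:
            raise ValueError(f"invalid interval for {name}: {start}-{end}")
        # BED scores are integers in the range 0-1000, so clamp to that range.
        bed_score = max(0, min(1000, int(score)))
        lines.append(f"{chrom}\t{start}\t{end}\t{name}\t{bed_score}")
    return lines


# Hypothetical regions, for illustration only.
example_regions = [
    ("chr2", 100, 200, "region_b", 1500),
    ("chr1", 10, 50, "region_a", 300),
]
bed_lines = to_bed_lines(example_regions)
```

Writing `"\n".join(bed_lines)` to a `.bed` file would give a track that most genome browsers can load directly. Per-sample quantification could be carried in the score column or in separate tracks, depending on how the results are meant to be viewed.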
The Outcome
What began as an open-ended exploration of DNA–RNA relationships resulted in a set of clear, biologically interpretable findings and a focused roadmap for future investigation. A key factor in this success was the ability to evaluate methods critically and adapt quickly. In exploratory genomics, analytical judgment is as important as technical execution. By iterating efficiently and prioritizing signal over sunk cost, we were able to deliver meaningful results within a fixed budget and timeline.
Importantly, the project was structured with flexibility in mind from the outset, ensuring alignment on how to handle evolving insights without disrupting progress.
The result: high-confidence conclusions, publication-ready visualizations, and a well-defined set of next questions, providing the foundation needed to advance this line of research.
Conclusion
If your team is tackling a complex biological question and needs a partner to design and execute the right analytical approach from the ground up, we’d welcome the conversation.
Click here to schedule a free introductory call with a member of our data science team.