Researchers at the Broad Institute have developed CODEC (Concatenating Original Duplex for Error Correction), a new DNA sequencing method that addresses key limitations of existing next-generation sequencing (NGS) methods. CODEC physically links (concatenates) the two complementary strands of each DNA molecule, so that true mutations can be distinguished more easily from sequencing errors, and it can be used seamlessly with existing NGS protocols. With up to 1000-fold higher accuracy than traditional NGS, CODEC enables more precise genetic testing and bioinformatic analysis, which is critical for the advancement of precision medicine.
Next-Generation vs. Third-Generation Sequencing
To understand the remarkable CODEC DNA sequencing method just published by researchers from the Broad Institute, you have to understand how DNA sequencing currently works, including the limitations of existing methods.
When you think of DNA sequencing, your first thought is likely next-generation sequencing (NGS). The dominant company in the NGS space is Illumina, which controls an estimated 80% of the market. In NGS, also referred to as massively parallel sequencing, DNA is fragmented and the two complementary strands are separated. The single-stranded fragments are amplified into clusters and then sequenced in parallel as new complementary strands are synthesized. This method is highly accurate for short reads, but it loses accuracy with longer reads, novel genomes, and very rare variants because of small errors introduced during the amplification and synthesis steps.
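To make the rare-variant problem concrete, here is a minimal Python sketch comparing, at a single genomic position, how many reads are expected to carry a true low-frequency variant versus how many carry a sequencing error that happens to produce the same base. The per-base error rate of 1e-3 (0.1%), the sequencing depth, and the assumption that substitution errors are spread evenly across the three alternative bases are illustrative assumptions, not figures from the CODEC paper.

```python
def expected_support(depth: int, variant_fraction: float, error_rate: float) -> tuple[float, float]:
    """Return (expected true-variant reads, expected error reads mimicking that variant).

    Assumes substitution errors are spread evenly over the 3 non-reference bases,
    so only error_rate / 3 of reads show the specific alternate base by chance.
    """
    true_reads = depth * variant_fraction
    mimicking_error_reads = depth * error_rate / 3
    return true_reads, mimicking_error_reads


# Illustrative numbers only (assumed, not measured):
depth = 10_000            # reads covering the position
variant_fraction = 1e-4   # a rare variant present in 1 of 10,000 molecules
error_rate = 1e-3         # assumed per-base NGS error rate (~0.1%)

true_reads, error_reads = expected_support(depth, variant_fraction, error_rate)
print(f"true variant reads: {true_reads:.1f}")
print(f"error reads with the same base: {error_reads:.1f}")
```

With these assumed numbers, chance errors outnumber the genuine variant reads roughly three to one, so read counts alone cannot reliably separate the variant from noise.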
Third-generation sequencing methods attempt to overcome those limitations by producing reads that are tens or hundreds of thousands of base pairs long, rather than the few hundred base pairs typical of NGS reads. Novel methods like Oxford Nanopore’s technique of passing native DNA molecules through a protein pore, across which a voltage is applied, allow DNA to be sequenced directly, without amplification and with less fragmentation. However, these methods are still significantly less accurate than NGS, though the technology is rapidly improving.
How Does CODEC Work?
In a new paper published in Nature Genetics, Bae et al. from the Broad Institute describe Concatenating Original Duplex for Error Correction (CODEC), a method that addresses current challenges in DNA sequencing to improve accuracy. One of the main benefits of reading both complementary strands of the same original DNA molecule, as in third-generation direct sequencing, is that true mutations can be distinguished more easily from errors introduced during sequencing. This is because a true mutation is mirrored on the other strand as the complementary base change, whereas a sequencing error usually affects only one strand.
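The duplex principle can be illustrated with a minimal Python sketch that classifies one aligned position from the bases read on each strand. The function name, the labels it returns, and the convention that the bottom-strand base is given as observed on the bottom strand (so it is complemented before comparison) are illustrative assumptions, not part of the CODEC software.

```python
# Watson-Crick base pairing, used to express a bottom-strand base in top-strand terms.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def classify_duplex_position(ref_base: str, top_base: str, bottom_base: str) -> str:
    """Classify one position using the base seen on each strand of the same duplex.

    ref_base and top_base are in top-strand orientation; bottom_base is the base
    observed on the complementary (bottom) strand at the same position.
    """
    bottom_as_top = COMPLEMENT[bottom_base]
    if top_base != bottom_as_top:
        # The strands disagree: a change seen on only one strand is most
        # likely a sequencing or amplification error.
        return "likely_error"
    if top_base == ref_base:
        return "reference"
    # Both strands carry complementary versions of the same change.
    return "candidate_mutation"


print(classify_duplex_position("A", "A", "T"))  # reference
print(classify_duplex_position("A", "G", "C"))  # candidate_mutation
print(classify_duplex_position("A", "G", "T"))  # likely_error
```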
CODEC takes advantage of this principle while staying within the existing NGS workflow by physically linking (concatenating) the two DNA strands. The researchers designed a new quadruplex adapter that attaches to both ends of a double-stranded DNA fragment (a duplex); the strands are then extended using the complementary strand as a template. The result is a single-stranded DNA molecule that contains both original strands: the first half is one original strand, the CODEC linker sits in the middle, and the second half is the other original complementary strand, with Illumina adapters on both ends.
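As a toy illustration of why concatenation helps, the Python sketch below joins the two strands of a duplex around a linker, splits the resulting sequence back apart, and builds a duplex consensus that keeps a base only where both original strands agree. The linker sequence, the function names, and the simplification that each strand is written 5' to 3' and read in full are hypothetical; the real CODEC adapter and read structure are more involved.

```python
# Hypothetical placeholder linker; NOT the actual CODEC adapter sequence.
LINKER = "GGGGGGGGGGGG"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def concatenate_duplex(top: str, bottom: str, linker: str = LINKER) -> str:
    """Join both original strands (each written 5' to 3') into one molecule."""
    return top + linker + bottom


def split_construct(construct: str, linker: str = LINKER) -> tuple[str, str]:
    """Recover the two strands, assuming the linker appears once and not at strand ends."""
    top, found, bottom = construct.partition(linker)
    if not found:
        raise ValueError("linker not found in construct")
    return top, bottom


def duplex_consensus(top: str, bottom: str) -> str:
    """Keep bases where top and reverse-complemented bottom agree; mask others with 'N'."""
    bottom_as_top = reverse_complement(bottom)
    if len(top) != len(bottom_as_top):
        raise ValueError("strands must cover the same fragment")
    return "".join(t if t == b else "N" for t, b in zip(top, bottom_as_top))


# Relative to a hypothetical reference ACGTACGTTA, this top strand carries a
# true A->G change at index 4 (mirrored on the bottom strand).
top = "ACGTGCGTTA"
bottom_with_error = "TAACGCACGA"   # bottom strand with one single-strand error
construct = concatenate_duplex(top, bottom_with_error)
strand1, strand2 = split_construct(construct)
print(duplex_consensus(strand1, strand2))  # NCGTGCGTTA
```

The true change at index 4 survives in the consensus because both strands support it, while the error present on only one strand is masked as `N`.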
CODEC Dramatically Improves DNA Sequencing Accuracy
What is remarkable about the CODEC method is that it can be used seamlessly in existing NGS protocols simply by swapping the typical adapters for CODEC adapters and adding sample indices earlier in the process. By reducing error rates, CODEC achieves up to 1000-fold higher accuracy than NGS, and it is broadly applicable, from targeted sequencing to whole-genome sequencing.
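A quick back-of-the-envelope calculation shows what a 1000-fold accuracy gain means in practice. The Python sketch below assumes a baseline per-base error rate of 1e-3 for standard NGS (an illustrative figure, not a value taken from the paper) and divides it by the fold improvement to estimate how many erroneous base calls to expect in one million sequenced bases.

```python
def expected_errors(bases_sequenced: int, error_rate: float) -> float:
    """Expected number of erroneous base calls, assuming independent per-base errors."""
    return bases_sequenced * error_rate


BASELINE_ERROR_RATE = 1e-3   # assumed standard NGS per-base error rate (illustrative)
FOLD_IMPROVEMENT = 1000      # "up to 1000-fold higher accuracy"
bases = 1_000_000

ngs_errors = expected_errors(bases, BASELINE_ERROR_RATE)
improved_errors = expected_errors(bases, BASELINE_ERROR_RATE / FOLD_IMPROVEMENT)
print(f"standard NGS: ~{ngs_errors:.0f} errors per million bases")
print(f"{FOLD_IMPROVEMENT}-fold improvement: ~{improved_errors:.0f} per million bases")
```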
CODEC detected genetic signals at the level of single DNA molecules, including individual mutated duplexes in tumor genomes and liquid biopsies, as well as microsatellite instability, using up to 100-fold fewer reads than duplex sequencing, an established error-correction method that relies on reading many copies of each original strand. This level of specificity will allow for remarkably precise genetic testing and analysis, which is critical for the advancement of precision medicine.
One current limitation of CODEC is that, like traditional NGS, it is limited to short fragments of around 300 base pairs. Once the two strands of a fragment are linked with CODEC, the total read length is up to 600 base pairs, which is near the maximum for NGS. NGS read lengths can vary, but the entirety of both strands must be read to form a consensus between the strands across the whole fragment. However, this is a new technique, and improvements are in progress.
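The read-length constraint is simple arithmetic: because both strands of a fragment must be read, the sequenced length is roughly twice the fragment length. The Python sketch below uses the 600-base total from the text as the read budget; the optional linker length is a hypothetical parameter, since the article does not say how much of the linker counts toward the read.

```python
def max_fragment_length(read_budget: int, linker_length: int = 0) -> int:
    """Longest fragment whose two strands (plus any linker) fit in the read budget."""
    if read_budget <= linker_length:
        return 0
    return (read_budget - linker_length) // 2


def fully_covered(fragment_length: int, read_budget: int, linker_length: int = 0) -> bool:
    """True if both strands of the fragment can be read end to end."""
    return fragment_length <= max_fragment_length(read_budget, linker_length)


print(max_fragment_length(600))   # 300
print(fully_covered(300, 600))    # True
print(fully_covered(350, 600))    # False
```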
Outsourcing Bioinformatics Analysis: How Bridge Informatics Can Help
Groundbreaking technological advances like these make biological data generation, storage, and analysis faster and more accessible than ever before. From pipeline development and software engineering to deploying existing bioinformatics tools, Bridge Informatics can help you at every step of your research journey.
As experts across data types from leading sequencing platforms, we can help you tackle the challenging computational tasks of storing, analyzing, and interpreting genomic and transcriptomic data. Bridge Informatics’ bioinformaticians are trained bench biologists, so they understand the biological questions driving your computational analysis.
Jane Cook, Biochemist & Content Writer, Bridge Informatics
Jane Cook, the leading Content Writer for Bridge Informatics, has written over 100 articles on the latest topics and trends for the bioinformatics community. Jane’s broad and deep interdisciplinary molecular biology experience ranges from developing biochemistry assays to genomics. Prior to joining Bridge, Jane held research assistant roles in biochemistry research labs across a variety of therapeutic areas.
While obtaining her B.A. in Biochemistry from Trinity College in Dublin, Ireland, Jane also studied journalism at New York University’s Arthur L. Carter Journalism Institute. As a native Texan, she embraces any challenge that comes her way. Jane hails from Dallas but returns to Ireland any and every chance she gets.