August 8, 2022
With all of the recent advances in genomic data analysis, it is worth examining where the raw sequence data actually comes from. Researchers have a few choices of sequencing platforms, from traditional, low-throughput Sanger sequencing to innovative next- and third-generation sequencing technologies. Platform choice will depend on the scale of the project, the cost of sequencing, and the ultimate research question being answered by downstream analysis.
Next Generation vs Third Generation
The dominant company in the next-generation sequencing space is Illumina. Illumina sequencing is primarily a short-read, sequencing-by-synthesis platform, and has become incredibly popular for high-throughput sequencing.
Slightly newer to the space is Pacific Biosciences (PacBio), whose platform is sometimes even called “third-generation” sequencing. It is still based on sequencing-by-synthesis but produces long reads. Both Illumina and PacBio are facing competition from another third-generation sequencing company, Oxford Nanopore, which relies on different fundamental principles.
Principles and Accuracy of Sequencing Platforms
Illumina sequencing-by-synthesis uses a proprietary platform to amplify fragments of the genome being sequenced, then identifies each base as it is added to the growing strand by detecting fluorescently tagged nucleotides. Sequencing-by-synthesis is a well-established technology, and Illumina’s adjustments have produced a high-throughput version of it with a per-base accuracy of over 99%.
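To put the accuracy figure in context, per-base accuracy is commonly expressed as a Phred quality score, Q = -10 × log10(error probability). The sketch below is illustrative only: the function names are hypothetical, the 99% value is taken from the text as a floor rather than a measured number, and the 150 bp read length is an assumption chosen from within the usual short-read range.

```python
import math


def phred_from_accuracy(accuracy: float) -> float:
    """Convert a per-base accuracy (between 0 and 1) to a Phred quality score."""
    error_probability = 1.0 - accuracy
    if not 0.0 < error_probability < 1.0:
        raise ValueError("accuracy must be strictly between 0 and 1")
    return -10.0 * math.log10(error_probability)


def expected_errors(read_length_bp: int, accuracy: float) -> float:
    """Expected number of miscalled bases in a read, assuming a uniform error rate."""
    return read_length_bp * (1.0 - accuracy)


if __name__ == "__main__":
    # 99% is the floor quoted in the text; 150 bp is an assumed read length.
    print(round(phred_from_accuracy(0.99), 1))   # Phred score at 99% accuracy
    print(round(expected_errors(150, 0.99), 2))  # expected errors in a 150 bp read
```

Under these assumptions, 99% accuracy corresponds to Q20, or about 1.5 expected miscalls in a 150 bp read. Real error rates vary along a read, so the uniform-rate assumption is a simplification.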
PacBio, on the other hand, has made different adjustments to the sequencing-by-synthesis method with its HiFi protocol. This method produces large, circular DNA molecules that can be sequenced continuously, unlike the fragments used in Illumina’s protocol. PacBio HiFi sequencing also has an accuracy rate of over 99% and was the sequencing technology of choice for the Telomere-to-Telomere project, which recently completed the human genome sequence.
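HiFi reads get their accuracy from reading the same circular molecule several times and combining the passes into a consensus. The sketch below is a deliberately simplified model, not PacBio’s actual algorithm. It assumes each pass calls a base correctly with the same independent probability and that the consensus is a simple majority vote. The 90% single-pass accuracy is an illustrative assumption, not a figure from this article, and the function name is hypothetical.

```python
from math import comb


def majority_consensus_accuracy(per_pass_accuracy: float, passes: int) -> float:
    """Probability that more than half of `passes` independent calls of a base are correct."""
    if passes < 1 or passes % 2 == 0:
        raise ValueError("use an odd, positive number of passes to avoid ties")
    if not 0.0 <= per_pass_accuracy <= 1.0:
        raise ValueError("per_pass_accuracy must be between 0 and 1")
    p = per_pass_accuracy
    needed = passes // 2 + 1  # smallest number of correct calls that forms a majority
    return sum(
        comb(passes, k) * p**k * (1.0 - p) ** (passes - k)
        for k in range(needed, passes + 1)
    )


if __name__ == "__main__":
    # Assumed single-pass accuracy of 90%, for illustration only.
    for n in (1, 3, 5, 9):
        print(n, round(majority_consensus_accuracy(0.90, n), 5))
```

In this toy model, consensus accuracy rises with each additional pair of passes, which is the intuition behind sequencing a circular molecule continuously.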
Long-Read vs Short-Read Technology
Though the underlying principles vary slightly between these two approaches, the main difference between Illumina and PacBio is that Illumina specializes in short-read raw sequence data while PacBio focuses on long-read raw sequence data.
Illumina sequencing primarily sequences small fragments of DNA, producing reads of 50-300 base pairs (bp) that are then assembled into a whole genome sequence using bioinformatics pipelines and reference genomes. This is called short-read technology, and it has been incredibly useful in genomics thus far. However, assembling these short reads correctly is very time- and labor-intensive, and assembly becomes even more challenging and less accurate when the organism lacks a high-quality reference genome or its genome contains many repeat sequences or rare variants.
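The repeat problem can be illustrated with a toy example. The sketch below uses an invented reference string that contains the same repeat unit in two places and counts how many positions an exact-match read could have come from. Real aligners tolerate mismatches and work very differently, so this is only a cartoon of the idea; all sequences and names here are made up for illustration.

```python
from typing import List


def find_placements(reference: str, read: str) -> List[int]:
    """Return every 0-based start position where `read` matches `reference` exactly."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        start = reference.find(read, start + 1)  # also catches overlapping matches
    return positions


# Invented toy sequence: two copies of one repeat separated by unique bases.
REPEAT = "ACGTACGTAC"
REFERENCE = "TTGCA" + REPEAT + "GGATC" + REPEAT + "CCTAG"

short_read = REPEAT[:8]             # lies entirely inside the repeat
long_read = "GCA" + REPEAT + "GGA"  # spans the repeat plus unique flanking bases

if __name__ == "__main__":
    print(find_placements(REFERENCE, short_read))  # two candidate positions
    print(find_placements(REFERENCE, long_read))   # one candidate position
```

Here the 8-base read fits equally well in both repeat copies, while the longer read is anchored by its unique flanking bases and can be placed only once.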
PacBio HiFi sequencing is a long-read technology, producing reads over 10,000 base pairs in length. The advantages of longer reads are easier genome assembly and higher accuracy in identifying rare variants and distinguishing repeat sequences.
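Read length also changes how many reads are needed for a given sequencing depth. A rough back-of-the-envelope relation is coverage = (number of reads × read length) / genome size. The sketch below rearranges it to estimate read counts. The 3.1 Gb genome size and 30x target depth are illustrative assumptions, not figures from this article, and the function name is hypothetical.

```python
import math


def reads_needed(genome_size_bp: int, read_length_bp: int, target_coverage: float) -> int:
    """Estimate the number of reads required to reach an average target coverage."""
    if genome_size_bp <= 0 or read_length_bp <= 0 or target_coverage <= 0:
        raise ValueError("all inputs must be positive")
    return math.ceil(genome_size_bp * target_coverage / read_length_bp)


if __name__ == "__main__":
    GENOME_SIZE_BP = 3_100_000_000  # assumed, roughly human-sized genome
    TARGET_COVERAGE = 30            # assumed target depth

    for label, read_length in (("short reads, 150 bp", 150), ("long reads, 10,000 bp", 10_000)):
        print(label, reads_needed(GENOME_SIZE_BP, read_length, TARGET_COVERAGE))
```

Under these assumptions, 150 bp reads require 620 million reads while 10,000 bp reads require 9.3 million. The estimate ignores uneven coverage, read-length variation, and quality filtering.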
It is worth noting that Illumina is breaking into the long-read space with its new Infinity assay. Infinity uses existing Illumina sequencing with different sample preparation steps to produce reads up to 10,000 base pairs in length.
PacBio is also branching out and is a popular provider of microbial sequencing services, a unique niche that, much like cancer genomics, is growing in research popularity but requires its own specific sequencing protocols.
Outsourcing Downstream Bioinformatic Analysis
The raw genomic data produced by these sequencing platforms has enormous potential to provide us with biological and health-related insights but requires significant downstream processing and analysis to extract this valuable information. Working with service providers like Bridge Informatics is a great option. We support your data storage, analysis, and pipeline development needs to eliminate common challenges associated with these downstream analysis tasks. Book a free discovery call with us if you’re interested in outsourcing your bioinformatic needs to Bridge Informatics.
Jane Cook, Journalist & Content Writer, Bridge Informatics
Jane is a Content Writer at Bridge Informatics, a professional services firm that helps biotech customers implement advanced techniques in the management and analysis of genomic data. Bridge Informatics focuses on data mining, machine learning, and various bioinformatic techniques to discover biomarkers and companion diagnostics. If you’re interested in reaching out, please email [email protected] or [email protected].