August 1, 2022
What is AlphaFold?
If it hasn’t come across your Twitter feed or LinkedIn yet, AlphaFold is the answer to one of the longest-standing problems in structural biology: how can you predict a protein’s final folded conformation from its primary amino acid sequence?
Researchers from DeepMind, the machine learning company owned by Google’s parent company Alphabet, released a machine learning network called AlphaFold last year, which can predict the folded structure of a protein from its primary sequence with high accuracy. Earlier this week, DeepMind announced that AlphaFold’s database has grown from the roughly 350,000 structures predicted last year to over 200 million: essentially, the shape of nearly every catalogued protein known to science.
Structural Biology vs Bioinformatics
AlphaFold is an interesting project in that it lies at the intersection of an extremely wet-lab-heavy field (structural biology) and the strictly in silico world of computational biology. Traditional methods for determining a protein’s structure, like X-ray crystallography, are time-, material-, and cost-intensive, and some proteins are next to impossible to crystallize.
AlphaFold, on the other hand, takes advantage of protein structures that have been previously experimentally determined as well as the incredible quantity of genetic sequence data available to researchers today to create a powerful predictive machine learning model. This makes searching for the structure of a protein as easy as a Google search.
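To make the "as easy as a search" idea concrete, here is a minimal Python sketch of looking up a predicted structure by UniProt accession. It assumes the public AlphaFold Protein Structure Database serves prediction metadata at the URL pattern shown; the endpoint, the helper names `prediction_url` and `fetch_prediction`, and the example accession are illustrative assumptions, so check the database's own documentation before relying on them.

```python
import json
from urllib.request import urlopen

# Assumption: the AlphaFold Protein Structure Database returns prediction
# metadata as JSON at this endpoint, keyed by UniProt accession.
API_TEMPLATE = "https://alphafold.ebi.ac.uk/api/prediction/{accession}"


def prediction_url(accession: str) -> str:
    """Build the lookup URL for a UniProt accession (hypothetical helper)."""
    cleaned = accession.strip().upper()
    if not cleaned.isalnum():
        raise ValueError(f"not a plausible UniProt accession: {accession!r}")
    return API_TEMPLATE.format(accession=cleaned)


def fetch_prediction(accession: str, timeout: float = 30.0):
    """Download and parse the metadata for one accession (needs network access)."""
    with urlopen(prediction_url(accession), timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_prediction("P69905")` (the accession for a human hemoglobin subunit) should return the parsed JSON for that entry, provided the endpoint behaves as assumed here.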
200 Million Protein Structures and Counting
With 200 million protein structures to search through, however, it’s natural to wonder where to begin. This is where modern bioinformatics comes in. Bioinformaticians can analyze genetic and transcriptomic data to extract biological insights; a prime example is identifying proteins that may be expressed at higher (or lower) levels under different conditions.
Now, based on sequence data alone, the structure of that protein can be visualized, even if it has yet to be determined experimentally. Does it have a domain that may interact with nearby proteins? Is it an enzyme whose active site can be modeled as a potential drug target? Protein structure is a highly useful piece of information to have when moving from sequence data to experiments in the lab.
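As a rough illustration of the "expressed at higher (or lower) rates" step that narrows down which proteins to look up, the Python sketch below ranks genes by log2 fold change between two conditions. The gene names, counts, pseudocount, and threshold are all made up for the example, and real analyses typically rely on normalization and statistical testing with dedicated tools such as DESeq2 or edgeR rather than a bare ratio of means.

```python
import math


def log2_fold_change(control, treated, pseudocount=1.0):
    """Log2 ratio of mean treated to mean control expression.

    The pseudocount avoids division by zero for unexpressed genes.
    """
    mean_control = sum(control) / len(control)
    mean_treated = sum(treated) / len(treated)
    return math.log2((mean_treated + pseudocount) / (mean_control + pseudocount))


def rank_by_change(expression, threshold=1.0):
    """Return (gene, log2FC) pairs with |log2FC| >= threshold, largest change first."""
    changes = {
        gene: log2_fold_change(control, treated)
        for gene, (control, treated) in expression.items()
    }
    hits = [(gene, lfc) for gene, lfc in changes.items() if abs(lfc) >= threshold]
    return sorted(hits, key=lambda pair: abs(pair[1]), reverse=True)


# Hypothetical read counts per gene: (control replicates, treated replicates).
example = {
    "GENE_A": ([7, 7, 7], [31, 31, 31]),
    "GENE_B": ([15, 15], [15, 15]),
    "GENE_C": ([63], [7]),
}

candidates = rank_by_change(example)
# candidates == [("GENE_C", -3.0), ("GENE_A", 2.0)]
```

Genes that pass the threshold are natural candidates for a structure lookup, since their predicted shape can hint at what the change in expression means biologically.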
Outsourcing Bioinformatics Analysis
Storing, analyzing, and interpreting genomic data of any kind for biological research is a challenging computational task. Our experts at Bridge Informatics can help design custom cloud infrastructure for your data storage needs, as well as high-quality, reproducible data analysis pipelines. Book a free discovery call today to discuss your project needs.
Jane Cook, Journalist & Content Writer, Bridge Informatics
Jane is a Content Writer at Bridge Informatics, a professional services firm that helps biotech customers implement advanced techniques in the management and analysis of genomic data. Bridge Informatics focuses on data mining, machine learning, and various bioinformatic techniques to discover biomarkers and companion diagnostics. If you’re interested in reaching out, please email [email protected] or [email protected].