SITUATION
A pharmaceutical company aimed to develop an AI-powered tool for classifying protein function from amino acid sequences but lacked the internal expertise to build a classification model. It needed a partner with experience designing supervised artificial intelligence (AI) models for sequence-based data classification, so the Chief Scientific Officer asked a trusted member of his network for a referral. A former client of ours referred him to Bridge Informatics (BI), and we were selected to design the classification model.
STRATEGY
The company wanted to draw on our expertise in deep learning for protein classification and relied on us to develop a workflow that would accomplish its goal. After meeting with the team to understand their short- and long-term objectives, we outlined the following approach in our Statement of Work:
Data Acquisition: We curated a comprehensive dataset of protein sequences and their corresponding domain-level information, which is predictive of binding partners and function, from UniProt. The data was carefully filtered to ensure a high-quality labeled training set with diverse protein types and well-annotated functions.
Feature Engineering: We transformed the raw protein sequences into numerical representations suitable for machine learning algorithms. This involved encoding techniques in which each amino acid is converted into a unique binary vector (commonly known as one-hot encoding).
Deep Neural Network Development: The processed data, with protein sequences as inputs and functional categories as labels, was used to train and test a deep neural network architecture designed specifically for protein sequence-based analysis. The network incorporated recurrent layers to capture the sequential nature of protein sequences, and its hyperparameters were tuned to maximize performance.
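To make the feature engineering step more concrete, here is a minimal sketch of how an amino acid sequence could be turned into unique binary vectors using one-hot encoding. It is an illustration only, not the code delivered in this project. The alphabet ordering, the function name `one_hot_encode`, the `max_len` padding parameter, and the handling of non-standard residues as all-zero vectors are all assumptions made for the example.

```python
# Assumption: the 20 standard amino acids, in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_TO_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence, max_len=None):
    """Return a list of 20-element binary vectors, one per residue.

    Residues outside the 20-letter alphabet (for example 'X') become
    all-zero vectors. If max_len is given, the result is truncated or
    zero-padded to exactly max_len rows so sequences can be batched.
    """
    sequence = sequence.upper()
    if max_len is not None:
        sequence = sequence[:max_len]

    encoded = []
    for residue in sequence:
        vector = [0] * len(AMINO_ACIDS)
        index = AA_TO_INDEX.get(residue)
        if index is not None:
            vector[index] = 1  # exactly one position is set per known residue
        encoded.append(vector)

    # Pad short sequences with all-zero rows (hypothetical padding choice).
    if max_len is not None:
        while len(encoded) < max_len:
            encoded.append([0] * len(AMINO_ACIDS))
    return encoded


if __name__ == "__main__":
    # Hypothetical four-residue fragment, padded to six positions.
    for row in one_hot_encode("MKTX", max_len=6):
        print("".join(str(bit) for bit in row))
```

Each row of the resulting matrix corresponds to one position in the sequence, which is the kind of ordered input a recurrent layer can read one step at a time.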
RESULTS
Within six months, BI delivered a powerful AI classifier trained on a large-scale protein sequence dataset. The model predicted protein function with high accuracy and precision, exceeding a threshold of 85% and achieving an F1 score of 0.9.
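For readers less familiar with these metrics, the sketch below shows how accuracy, per-class precision and recall, and an F1 score can be computed for a multi-class classifier. It is a generic illustration, not the evaluation code used in this project. The case study does not say how the F1 score was averaged across classes, so the macro-averaging here is an assumption, and the function names and the example labels (`kinase`, `protease`, `transporter`) are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label seen."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        denom = precision + recall
        # F1 is the harmonic mean of precision and recall.
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    scores = per_class_scores(y_true, y_pred)
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)


if __name__ == "__main__":
    # Hypothetical labels, for illustration only.
    y_true = ["kinase", "kinase", "protease", "protease", "transporter", "transporter"]
    y_pred = ["kinase", "kinase", "protease", "kinase", "transporter", "transporter"]
    print("accuracy:", round(accuracy(y_true, y_pred), 3))
    print("macro F1:", round(macro_f1(y_true, y_pred), 3))
```

Reporting F1 alongside accuracy is useful when functional classes are imbalanced, because a model can reach high accuracy by favoring the most common class while still performing poorly on rare ones.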
This tool, built using expertly curated protein sequence data, was capable of analyzing novel protein sequences and predicting their functions with relatively high confidence.
Interested in partnering with Bridge Informatics? Contact us to learn more about our team of bioinformaticians with experience at the bench whose core specialty is understanding and analyzing biological data.
Jessica Corrado, Head of Business Development & Commercial Operations, Bridge Informatics
As the Head of Business Development & Commercial Operations, Jessica is responsible for driving strategic growth initiatives and overseeing the company’s commercial activities. She has both a keen understanding of the life sciences industry and a strong track record in building successful partnerships.
Prior to joining Bridge, Jessica held a number of leadership roles across sales, marketing, and communications. Outside of work, Jessica handles most of the marketing and event planning for Shore Saves, a non-profit animal rescue. She enjoys reading and usually has at least two books of different genres in progress at a time. If you’re interested in reaching out, please email [email protected].