Using machine learning to map protein aggregation
Abnormal protein accumulations are a hallmark of a diverse group of disorders. The Switch lab has now developed CORDAX, an algorithm to better understand and predict amyloid formation, writes Nikos Louros.
January 19, 2022
Amyloid diseases are a diverse group of increasingly prevalent pathologies. They currently rank third among the leading causes of death worldwide (WHO), and their burden is expected to keep rising until 2050. Furthermore, most amyloid diseases remain incurable, as determining their major constituents is still a tricky process. The Switch Laboratory at VIB/KU Leuven has developed a tool that can help to better identify the key drivers of amyloid formation.
Amyloids are linked to non-infectious pathologies of the “modern era”, which are affecting more and more people as life expectancy increases. People often associate amyloids with neurodegenerative disorders, including Alzheimer’s and Parkinson’s, or with other diseases such as type II diabetes, atherosclerosis and even certain forms of cancer. However, the picture is not one-sided: amyloids also play useful roles, for example in the biosynthesis of pigments and oocyte fertilization in humans, or in the formation of biofilms in bacteria.
A complex problem
We know that amyloids are fibrillar protein agglomerates held together by “sticky” segments that are susceptible to clumping up, known as aggregation-prone regions. However, virtually any protein can form amyloids under certain conditions that cause it to misfold and expose its own “sticky” parts. Over the years, this complexity has made it hard to find common patterns among the amyloid-prone regions of different proteins, keeping us in the dark about the underlying mechanisms of amyloid formation.
Amyloid structures showed us the way
Thankfully, recent technological advances have brought new methods such as cryo-electron microscopy and microcrystal diffraction, which have helped determine the structural features of amyloid fibrils. Using these techniques, we have now gathered information on more than 80 different amyloid fibrils, while research efforts over more than 20 years have uncovered more than a thousand known sticky protein sequences.
Access to this public knowledge inspired us to test whether the information would now suffice to train a machine-learning model to accurately predict the sticky parts of other, unknown protein sequences. Using this approach, we developed Cordax (named after a provocative ancient Greek dance), a tool that detects aggregation-prone regions with high accuracy and also predicts the structural features of the amyloid fibrils they can form.
New properties of aggregation-prone regions come to light
We used the model to identify the aggregation-prone regions in more than 30 major amyloidogenic proteins. Through this process we found more than 80 new aggregation-prone regions that may play a key role in protein accumulation. A deeper look showed that these sticky sequences are not only exposed when a protein misfolds but can often form sticky patches on the protein surface, making them much more dangerous clumping hot spots.
Using this information, we were also able to reconstruct a map of all aggregation-prone segments known today. This comprehensive map revealed a range of common amyloid sequence patterns that are also linked to phase transition events (liquid-liquid phase separation) and to the formation of functional nanomaterials.
Our new tool was published in Nature Communications and is publicly accessible at https://cordax.switchlab.org.
Wouldn’t it be amazing if we could now target these sticky regions in order to prevent them from aggregating? Our lab is pulling this thread further by designing structure-based anti-amyloid therapeutics and by developing new peptide-based bionanomaterials.