
Professor Tuomas Knowles, lead author of the paper and a Fellow at St John’s College, said: “Bringing machine-learning technology into research into neurodegenerative diseases and cancer is an absolute game-changer. Ultimately, the aim will be to use artificial intelligence to develop targeted drugs to dramatically ease symptoms or to prevent dementia happening at all.”
The human body is home to thousands and thousands of proteins, and scientists don’t yet know the function of many of them. We asked a neural-network-based language model to learn the language of proteins.
Proteins are large, complex molecules that play many critical roles in the body. They do most of the work in cells and are required for the structure, function and regulation of the body’s tissues and organs. Antibodies, for example, are proteins that protect the body.
Alzheimer’s, Parkinson’s and Huntington’s diseases are three of the most common neurodegenerative diseases, but scientists believe there are several hundred.
In Alzheimer’s disease, which affects 50 million people worldwide, proteins go rogue, form clumps and kill healthy nerve cells. A healthy brain has a quality control system that effectively disposes of these potentially dangerous masses of proteins, known as aggregates.
Scientists now think that some disordered proteins also form liquid-like droplets called condensates, which don’t have a membrane and merge freely with each other. Unlike protein aggregates, which are irreversible, protein condensates can form and reform, and are often compared to blobs of shapeshifting wax in lava lamps.
Protein condensates have recently attracted a lot of attention in the scientific world because they control key events in the cell such as gene expression—how our DNA is converted into proteins—and protein synthesis—how the cells make proteins.
Any defects connected with these protein droplets can lead to diseases such as cancer. This is why bringing natural language processing technology into research into the molecular origins of protein malfunction is vital if we want to be able to correct the grammatical mistakes inside cells that cause disease.
We fed the algorithm all of the data held on the known proteins so it could learn and predict the language of proteins, in the same way these models learn about human language and the way WhatsApp knows how to suggest words for you to use.
Then we were able to ask it about the specific grammar that leads only some proteins to form condensates inside cells. It is a very challenging problem, and unlocking it will help us learn the rules of the language of disease.
Further use of machine learning could transform future cancer and neurodegenerative disease research. Discoveries could be made beyond what scientists currently know and speculate about diseases, and potentially even beyond what the human brain can understand without the help of machine learning.
The network has now been made freely available to researchers around the world so that more scientists can build on these advances.
Research link: https://www.pnas.org/content/118/15/e2019053118
