A Spanish AI deciphers how protein aggregates form

An artificial intelligence tool has taken an important step in translating the language that proteins use to determine whether they will form sticky aggregates, clumps of defective proteins stuck to one another, whose presence is linked to Alzheimer’s and to more than fifty other types of disease.

Unlike other ‘black box’ AI models, Canya can explain its decisions. In fact, it has revealed specific chemical patterns that drive or prevent the harmful aggregation of proteins.

The discovery, published in the journal Science Advances, offers new insight into the molecular mechanisms that cause this clumping, which is linked to diseases that affect 500 million people worldwide.

Protein agglomeration, also called amyloid aggregation, is a health hazard that disrupts the normal function of cells. When certain parts of proteins stick to one another, they form dense, fibrous masses that can cause health problems.

Impact on biotechnology and the pharmaceutical industry

Although the study has implications for research into neurodegenerative diseases, its most immediate impact will be on biotechnology, since many drugs are proteins and their function is often hindered by unwanted clumping.

“Protein aggregation is a major headache for pharmaceutical companies,” says Benedetta Bolognesi, lead co-author of the study and group leader at the Institute for Bioengineering of Catalonia (IBEC).

“If a therapeutic protein starts to aggregate, manufacturing batches can fail, which costs time and money. Canya can help guide efforts to design antibodies and enzymes that are less likely to clump together, and to reduce setbacks,” she adds.

Protein aggregates are governed by a little-understood language. Proteins are built from twenty different types of amino acids. Instead of the usual letters A, C, G, and T that make up the language of DNA, the language of a protein has twenty different letters, whose combinations form ‘words’ or ‘motifs’.

A mysterious language

Researchers have long tried to decipher which combinations cause amyloid aggregation and which allow proteins to fold without errors.

Artificial intelligence tools that treat amino acids as the alphabet of a mysterious language could help identify the specific words or motifs. However, the data needed to feed such models have been scarce in both quality and volume, or restricted to very small fragments.

The study addressed this challenge by conducting large-scale experiments. The authors created more than 100,000 random protein fragments from scratch, each 20 amino acids long.
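
To make the idea of a random fragment library concrete, here is a minimal in-silico sketch. It is not the authors' method (the study built and tested real fragments in the lab); the function name `random_fragments` and the fixed seed are assumptions made only for illustration.

```python
import random

# One-letter codes for the 20 standard amino acids (the protein "alphabet").
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def random_fragments(n, length=20, seed=0):
    """Return n random peptide sequences of the given length.

    Hypothetical helper: each position is drawn uniformly from the
    20 amino acids, which is an assumption of this sketch.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [
        "".join(rng.choice(AMINO_ACIDS) for _ in range(length))
        for _ in range(n)
    ]


# A small library of 20-residue fragments, far smaller than the study's.
library = random_fragments(1000)
print(len(library), library[0])
```

Because every position is drawn at random, most sequences produced this way would not match any protein found in nature, which is the point the researchers make about exploring beyond what evolution has sampled.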

The ability of each synthetic fragment to aggregate was tested in living yeast cells. If a fragment triggered aggregate formation, the yeast cells grew in a particular way that could be analyzed to determine cause and effect.

Around one in five protein fragments caused agglomeration, while the rest did not. The new data set captures a much larger catalog of the protein variants that can cause amyloid aggregation.

“We have created fragments of random proteins, including many versions that are not found in nature. Evolution has explored only a fraction of all possible protein sequences, while our approach lets us look at a much larger galaxy of possibilities, providing a large number of data points to help understand the more general laws of aggregation behavior,” explains Mike Thompson, author of the study and postdoctoral researcher at the Centre for Genomic Regulation (CRG).

A more transparent AI

The large amount of data generated was used to train Canya. The team decided to build it using the principles of “explainable AI”, so that its decision-making processes would be more transparent and understandable. This meant sacrificing some predictive power, which is usually greater in “black box” models. Despite this, Canya proved to be around 15% more accurate than existing models.

Specifically, Canya is a convolution-attention model, that is, a hybrid tool that borrows from two different areas of AI.

Convolutional models, such as those used in image recognition, scan photos for features such as an ear in order to identify a face. In the same way, Canya scans the protein to find significant features such as motifs or ‘words’.
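
As a rough illustration of how a convolutional filter scans a sequence, the sketch below slides a three-residue window along a fragment and scores each window. The filter weights are invented for this example (they simply reward water-repellent residues), and `make_filter` and `convolve` are hypothetical names; a trained model would learn its filters from data.

```python
# One-letter codes for the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Residues treated as water-repellent in this sketch (an illustrative choice).
HYDROPHOBIC = set("AVILMFWY")


def make_filter(width=3):
    """Build a toy filter: one weight per residue at each window position."""
    return [
        {aa: (1.0 if aa in HYDROPHOBIC else -1.0) for aa in AMINO_ACIDS}
        for _ in range(width)
    ]


def convolve(sequence, filt):
    """Slide the filter along the sequence and return one score per window."""
    width = len(filt)
    return [
        sum(filt[j][sequence[i + j]] for j in range(width))
        for i in range(len(sequence) - width + 1)
    ]


fragment = "DKEVILDKEG"
scores = convolve(fragment, make_filter())
best = scores.index(max(scores))
print(scores)                     # one score per three-residue window
print(fragment[best:best + 3])    # the window the filter responds to most
```

The highest score marks where the pattern the filter looks for occurs, which is how a convolutional layer locates a local feature without being told in advance where it sits.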

Language translation tools, on the other hand, use AI models to identify phrases in a sentence before deciding on the best translation. The team incorporated this technique to help Canya work out which motifs matter most across the entire protein.
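
The weighting idea can be sketched as a simple attention-style pooling step. This is a simplified stand-in, not Canya's architecture: the `relevance` values are made up for the example (a real model would learn them), and `attention_pool` is a hypothetical name.

```python
import math


def softmax(values):
    """Turn arbitrary scores into positive weights that sum to 1."""
    top = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - top) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features, relevance):
    """Combine per-position features, weighted by how relevant each one is."""
    weights = softmax(relevance)
    pooled = sum(w * f for w, f in zip(weights, features))
    return pooled, weights


# Illustrative motif scores along a fragment and made-up relevance values.
motif_scores = [-3.0, -1.0, 1.0, 3.0, 1.0]
relevance = [0.1, 0.2, 1.0, 2.5, 0.8]

pooled, weights = attention_pool(motif_scores, relevance)
probability = 1.0 / (1.0 + math.exp(-pooled))  # squash to a 0-1 score
print([round(w, 3) for w in weights], round(probability, 3))
```

Inspecting `weights` shows which positions dominated the final score, which is the kind of readout that makes an attention-based model easier to interpret than a black box.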

Together, these two approaches help the AI to see local motifs up close and, at the same time, to gauge their importance at a larger scale. This information can be used not only to predict which motifs in the protein chain encourage agglomeration, block it, or produce an intermediate state, but also to understand why.

For example, Canya showed that small water-repellent regions of amino acids are more likely to cause agglomeration, while some motifs have a greater impact on agglomeration if they sit toward the beginning of a protein sequence rather than the end. These observations agree with previous findings seen under the microscope in known amyloid fibrils.

But Canya also found new rules that govern protein aggregation. For example, certain basic components of proteins, the so-called charged amino acids, were thought to prevent agglomeration. It turns out that, in the context of other specific building blocks, they can actually promote it.

There is still work to do

In its current form, Canya describes protein aggregation in yes-or-no terms, that is, it works as a so-called ‘classifier’. In future work, the team wants to refine the system so that it can predict and compare aggregation rates rather than only the probability of aggregation.

This could help predict which protein variants form aggregates quickly and which do so more slowly, a vital factor in neurodegenerative diseases, in which the timing of amyloid formation is as important as the fact that it occurs.

“There are 1,024 quintillion ways to create a protein fragment 20 amino acids long. So far, we have trained an AI with only 100,000 fragments. We want to improve the process by creating more and larger fragments,” concludes Bolognesi.

“This project is a great example of how combining large-scale data generation with AI can accelerate research. It is also a very cost-effective way to generate data,” says ICREA Research Professor Ben Lehner, lead co-author of the study and group leader at the CRG and the Wellcome Sanger Institute.

“Using DNA synthesis and sequencing, we can perform hundreds of thousands of experiments in a single tube, generating the data we need to train AI models. This is an approach that we are applying to many difficult problems in biology, with the aim of making it predictable and programmable,” Lehner adds.

Reference:

Thompson et al., “Massive experimental quantification allows interpretable deep learning of protein aggregation”, Science Advances (2025).
