Machine learning technologies are everywhere. They’re used by search engines, social media, and even in online banking. But one area where this technology is still emerging is medicine.
Machine learning technologies could be very promising in medicine, and could be used for many applications, such as detecting signs of disease in cells, or discovering new drugs for rare diseases. But in order for a machine learning approach to be able to do such things, it needs to be both accurate and able to understand how cells work.
Our team has developed an accurate machine learning approach that can predict cell growth in a way that researchers can easily understand. The machine learning technique makes its predictions by looking at how cells change and act under different conditions. This method could someday be used to diagnose cancer, or predict how certain drugs may interact with a patient.
Interpreting machine learning predictions
In essence, machine learning is a form of artificial intelligence (AI) in which data is used to teach computers to make decisions on their own, without a person needing to be there to do it for them.
But one of the main weaknesses of machine learning techniques in biology and medicine is that they don’t incorporate biological knowledge, such as underlying cell biochemistry, in the learning process. In general, they also ignore this knowledge when making their predictions. This is because these systems treat biological information as mere numbers, without considering what those numbers actually mean biologically.
Such systems are often referred to as “black box” systems. These are AI systems that are fed data and give users a clear decision or prediction based on the patterns found in that data. However, the analysis is so complex that it’s usually unclear how the AI reached its decision.
Black box predictions aren’t a major issue in fields where high accuracy is the most important goal, such as software used to detect spam emails. But they’re a major disadvantage in biomedicine. Black box predictions are too complex for researchers to interpret, which leaves researchers with little understanding of how the AI algorithm reaches its prediction.
“White box” systems, on the other hand, may be slightly less accurate in their decisions or predictions, but the relationships they’ve inferred from the data are clearer to users. The benefit of white box systems is that users can understand what information the system used to make its prediction, and because it’s understandable, users can also interrogate the decision itself and interpret it from a biological point of view.
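To make the idea of a white box concrete, here is a minimal sketch in Python. It assumes the simplest kind of transparent model, a weighted sum, and is not the technique our team built. The feature names, weights and values are all hypothetical, chosen only to show how each input's share of a prediction can be read off directly.

```python
def explain_linear_prediction(weights, bias, features):
    """Return a prediction plus the contribution of each input feature.

    A weighted sum is a simple 'white box': every feature's share of the
    final prediction can be inspected one by one.
    """
    contributions = {name: weights[name] * value for name, value in features.items()}
    prediction = bias + sum(contributions.values())
    return prediction, contributions


# Hypothetical weights and measurements, for illustration only.
weights = {"reaction_1_flux": 0.5, "gene_A_expression": -0.25}
bias = 0.25
features = {"reaction_1_flux": 2.0, "gene_A_expression": 1.0}

prediction, contributions = explain_linear_prediction(weights, bias, features)

# List the features from most to least influential.
for name, share in sorted(contributions.items(), key=lambda item: -abs(item[1])):
    print(f"{name}: {share:+.2f}")
print(f"predicted growth (arbitrary units): {prediction:.2f}")
```

In a model like this, a researcher can see that the hypothetical reaction flux pushes the prediction up while the hypothetical gene pulls it down, and can then ask whether that makes biological sense.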
Machine learning predictions need to be interpretable and justifiable to be trustworthy and to work in biomedicine. In the case of detecting cancer, if the AI technique made a false-positive prediction, it could lead to unnecessary treatment—while false-negative predictions could lead to the disease being left untreated. Understanding the predictions made by machine learning algorithms will also help avoid false negatives when researching potential drugs and any side effects they might have.
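The two kinds of error described above can be counted separately, as in the short Python sketch below. The labels are made up for illustration (1 means disease present, 0 means absent) and do not come from our study.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = disease)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == 1 and truth == 1:
            tp += 1  # disease present and detected
        elif pred == 1 and truth == 0:
            fp += 1  # false positive: risk of unnecessary treatment
        elif pred == 0 and truth == 0:
            tn += 1  # disease absent and correctly ruled out
        else:
            fn += 1  # false negative: disease left untreated
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn}


# Made-up example labels, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 0, 1, 0]

counts = confusion_counts(y_true, y_pred)
false_positive_rate = counts["fp"] / (counts["fp"] + counts["tn"])
false_negative_rate = counts["fn"] / (counts["fn"] + counts["tp"])
print(counts, false_positive_rate, false_negative_rate)
```

Reporting the two rates separately matters because, as noted above, each kind of mistake has a different consequence for the patient.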
Predicting cell growth
In order for AI methods to work in biomedicine, we first needed to design a machine learning approach that could predict cell growth, and understand what was driving this growth. Understanding how cells grow and how their growth changes in different conditions is the first step in being able to design an AI that can detect the presence of a disease or predict how well certain treatments may work.
Our team evaluated 27 different machine learning approaches that looked at both gene expression profiles and mechanistic metabolic models. Gene expression profiles showed how the cell’s process of assembling proteins changed under a variety of conditions. Metabolic models showed how the underlying cell biochemistry works in each strain.
We then built our own white box machine learning technique, which would allow us to easily interpret how the AI made its decision, overcoming the shortfalls of previous machine learning techniques. We did this by teaching our AI to make decisions using data from both gene expression and metabolic models, something that hasn’t been done before.
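One simple way to picture learning from two sources at once is to place both sets of measurements side by side for each strain before any model sees them. The Python sketch below does only that. It is an assumption about how such data could be laid out, not a description of our actual pipeline, and the strain, gene and reaction names are hypothetical.

```python
def combine_features(expression, fluxes):
    """Merge two per-strain feature tables into one.

    Each feature name is prefixed with its source so that a later
    interpretation step can tell gene expression from metabolic flux.
    Only strains present in both tables are kept.
    """
    combined = {}
    for strain in sorted(expression.keys() & fluxes.keys()):
        row = {f"expr:{gene}": value for gene, value in expression[strain].items()}
        row.update({f"flux:{rxn}": value for rxn, value in fluxes[strain].items()})
        combined[strain] = row
    return combined


# Hypothetical measurements, for illustration only.
expression = {
    "strain_1": {"GENE_A": 2.5, "GENE_B": 0.5},
    "strain_2": {"GENE_A": 1.0, "GENE_B": 3.0},
    "strain_3": {"GENE_A": 0.0, "GENE_B": 1.5},  # no flux data, so it is dropped
}
fluxes = {
    "strain_1": {"reaction_1": 10.0},
    "strain_2": {"reaction_1": 4.0},
}

combined = combine_features(expression, fluxes)
for strain, row in combined.items():
    print(strain, row)
```

Keeping the source in each feature name is a design choice made for this sketch: it means that whatever a model later learns can be traced back to either a gene or a biochemical reaction.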
Using both data sources to build our machine learning approach improved predictive accuracy by up to 4% in some cases, compared to using gene expression data alone. Combining them also has the advantage of revealing previously unknown interactions between gene expression and metabolic activity.
We then tested our approach on more than 1,000 different strains of Saccharomyces cerevisiae, a species of yeast common in baking, brewing, and wine making. Data on this type of yeast is widely available, making it easy to evaluate the effectiveness of our machine learning approach.
The results from the yeast showed that with our white-box approach, we can maintain and in some cases improve the predictive accuracy of AI techniques. But importantly, we also offer an interpretation of these predictions, by explaining which biochemical reactions are active in the cell across various conditions.
Our approach incorporates information on biological mechanisms, such as cell biochemistry, in the learning process. This overcomes the black-box limitations of conventional data-driven approaches, and is a step towards the development of interpretable machine learning models.
The advantage of this is that machine learning models based on our approach will be more trustworthy. Our results show that combining data-driven and knowledge-driven models gives researchers more information about how cells grow and work in certain conditions.
While this will still need to be tested using human cells, it could have many promising applications in the future. For example, understanding how cancer cells are influenced by their genetic make-up and by environmental conditions is a major and pressing challenge in treating and preventing the disease.