Machine Learning Infers Microbial Relationships

Carnegie Mellon University’s Radu Marculescu is using machine learning to understand the microorganisms in the human gastrointestinal tract. His work could lay the groundwork for improved preventive medicine.

“It turns out, the interactions that happen in the human [GI] microbiome have far more implications than we originally thought,” said Marculescu, a professor in the Department of Electrical and Computer Engineering. “People associate changes in the microbiome to depression, infections, even cancer, so it’s sort of like a second brain for humans.”

Marculescu, along with ECE Ph.D. student Chieh Lo, has developed a machine learning algorithm — called MPLasso — that uses data to infer associations and interactions between microbes in the GI microbiome. MPLasso mines medical and scientific literature from the past few decades in search of experimental data from research focused on various types of microbial interactions and associations. MPLasso pulls this disparate information into a centralized dataset that catalogs microbial interactions within the human GI tract.

Machine learning is a novel approach for this type of investigation. Marculescu’s CMU-based System Level Design Group, which focuses on cyber-physical systems research, seemed like the right venue in which to tackle such a project. In doing so, he found a way to provide medical researchers and professionals with a catalog of inferred microbial interactions that can bolster the understanding of how those interactions affect human health.

Until now, it’s been challenging to get a good look at how microorganisms interact in the human GI tract. Marculescu knows it will be years before advanced technologies like engineered ingestible pills and bacteria are ready for mainstream adoption, but he sees MPLasso as a major step in helping researchers better understand how the microorganisms in the human GI tract co-exist.

Marculescu says this type of information is extremely valuable for preventive medicine because it lays the groundwork for uncovering how microbial interactions translate into a person being healthy or sick. If researchers first understand what microbes are present and how they behave together, they can then start establishing cause-and-effect relationships between microbial interactions and various types of ailments.

“Researchers also observe real experimentation. They observe microbial presence at, and interactions during, various events in the body,” Marculescu said. “Based on this, one can infer a network of interactions that is predictive in nature.”

MPLasso has been shown to be 95 percent accurate in the associations and interactions it infers, in part because it addresses the high-dimensionality and compositionality of human microbiome data. High-dimensionality means that the number of potential microbial associations and interactions is far larger than the number of samples available in any given library of data. Compositionality means that the data report each quantity as a percentage of a whole rather than as an absolute measurement.
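To make those two terms concrete, here is a minimal Python sketch. It is not MPLasso's code, and the counts are made-up illustrative numbers. It turns raw counts into relative abundances (compositional data), applies the centered log-ratio (CLR) transform, a common general-purpose way to work with compositional data that is used here only as an assumption for illustration, and counts how many pairwise associations exist for a given number of taxa. The function names are hypothetical.

```python
import math

# Hypothetical read counts: 3 samples (rows) x 4 microbial taxa (columns).
# These numbers are invented for illustration only.
counts = [
    [120, 30, 40, 10],
    [60, 15, 20, 5],
    [10, 50, 30, 10],
]


def to_relative_abundance(row):
    """Convert raw counts to fractions of the sample total (compositional data)."""
    total = sum(row)
    return [c / total for c in row]


def clr(proportions):
    """Centered log-ratio: log of each part minus the mean log of all parts."""
    logs = [math.log(p) for p in proportions]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def n_taxon_pairs(n_taxa):
    """Number of distinct taxon pairs, i.e. candidate pairwise associations."""
    return n_taxa * (n_taxa - 1) // 2


relative = [to_relative_abundance(row) for row in counts]
clr_values = [clr(row) for row in relative]

# Samples 0 and 1 differ twofold in absolute counts, yet their compositions
# match: percentages alone cannot reveal absolute abundance.
print(relative[0])
print(relative[1])

# With 1,000 taxa there are 499,500 candidate pairs, far more than the
# few hundred samples a typical study might collect.
print(n_taxon_pairs(1000))
```

The first two samples show why percentages hide absolute amounts, and the pair count shows how quickly candidate associations outgrow the number of samples.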

Marculescu and Lo have made MPLasso publicly available through GitHub. When researchers download it for their own use, they can also upload their own data to the platform. MPLasso offers a user-friendly interface for a database that continuously updates as it mines newly uploaded and relevant data.

“You’ll see the difference between yesterday and a few months ago because the algorithm is automatically collecting and improving your model,” Marculescu said. “Your model today is better than the one two weeks ago or two months ago simply because if something has been reported that aligns with what you’re looking at, the associations become that much stronger.”

Source: Carnegie Mellon University