A new approach developed by Zak Costello (left) and Hector Garcia Martin brings the speed and analytic power of machine learning to bioengineering. Credit: Marilyn Chung/Berkeley Lab

Researchers from the U.S. Department of Energy’s Lawrence Berkeley National Laboratory have created a new method using machine learning to accelerate the design of microbes that produce biofuel.

To speed up the production of biofuels, the scientists developed a computer algorithm that begins with abundant data about the proteins and metabolites in a biofuel-producing microbial pathway. The algorithm contains no information about how the pathway actually works; instead, it uses data from previous experiments to learn how the pathway will behave.

This new technique enables scientists to automatically predict the amount of biofuel produced by pathways that have been added to E. coli bacterial cells.

A pathway is a series of chemical reactions in a cell that produces a specific compound. Researchers have sought ways to re-engineer pathways and import them from one microbe to another, which could benefit a number of fields, including medicine, energy, manufacturing and agriculture.

While this often proves difficult, new synthetic biology tools like CRISPR-Cas9 enable scientists to conduct this research with greater precision.

“But there's a significant bottleneck in the development process,” Hector Garcia Martin, group lead at the DOE Agile BioFoundry and director of Quantitative Metabolic Modeling at the Joint BioEnergy Institute (JBEI), a DOE Bioenergy Research Center funded by DOE's Office of Science and led by Berkeley Lab, said in a statement.

“It's very difficult to predict how a pathway will behave when it's re-engineered. Trouble-shooting takes up 99 percent of our time,” he added. “Our approach could significantly shorten this step and become a new way to guide bioengineering efforts.”

Researchers currently predict a pathway’s dynamics using a set of differential equations that describe how the components of a system change over time. These kinetic models take months to develop, and their predictions do not always match experimental results.
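To give a sense of what such a kinetic model involves, here is a minimal sketch of a hypothetical two-step pathway (substrate S to intermediate I to product P). It assumes each step follows Michaelis-Menten kinetics and uses a simple forward Euler integration. The function names, parameter values and the pathway itself are illustrative assumptions, not the models used in the study.

```python
def michaelis_menten_rate(substrate, vmax, km):
    """Rate of one enzyme-catalyzed step under Michaelis-Menten kinetics."""
    return vmax * substrate / (km + substrate)


def simulate_two_step_pathway(s0, vmax1, km1, vmax2, km2, dt=0.01, steps=1000):
    """Integrate the toy pathway S -> I -> P with forward Euler.

    Returns a list of (time, S, I, P) tuples. All kinetic parameters are
    hypothetical; a real kinetic model needs measured or fitted values.
    """
    s, i, p = s0, 0.0, 0.0
    trajectory = [(0.0, s, i, p)]
    for n in range(1, steps + 1):
        v1 = michaelis_menten_rate(s, vmax1, km1)  # S -> I
        v2 = michaelis_menten_rate(i, vmax2, km2)  # I -> P
        s += -v1 * dt
        i += (v1 - v2) * dt
        p += v2 * dt
        trajectory.append((n * dt, s, i, p))
    return trajectory


# Example run with made-up parameter values.
trajectory = simulate_two_step_pathway(1.0, vmax1=1.0, km1=0.5, vmax2=0.8, km2=0.3)
final_time, final_s, final_i, final_p = trajectory[-1]
print(f"t={final_time:.1f}  S={final_s:.3f}  I={final_i:.3f}  P={final_p:.3f}")
```

Even in this toy case, every reaction step needs its own rate law and parameters, which hints at why full kinetic models of real pathways take so long to build.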

By using machine learning, scientists can train a computer algorithm to predict a pathway's behavior using data from related systems.

To test the new technique, the team applied it to two pathways added to E. coli cells: one designed to produce limonene, a bio-based jet fuel, and one that produces a gasoline replacement called isopentenol. The researchers fed the algorithm data from previous experiments showing how different versions of the pathways function in various E. coli strains.

The algorithm taught itself how the concentrations of metabolites in these pathways change over time, and how much biofuel the pathways produce, by analyzing data from two experimentally known pathways that produce small and large amounts of biofuel.
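One way to picture learning dynamics from data, rather than from hand-written rate laws, is the sketch below. It assumes the task can be framed as supervised learning: estimate rates of change from measured time courses, fit a model that maps concentration to rate, then integrate that learned model for a new starting condition. The data here are synthetic (a toy first-order decay), the model is a plain least-squares line, and all names are hypothetical. The study itself used real protein and metabolite measurements, and its algorithm is not reproduced here.

```python
import math


def finite_difference_rates(times, values):
    """Estimate d(value)/dt at interior time points with central differences."""
    states, rates = [], []
    for k in range(1, len(values) - 1):
        rate = (values[k + 1] - values[k - 1]) / (times[k + 1] - times[k - 1])
        states.append(values[k])
        rates.append(rate)
    return states, rates


def fit_linear_rate_model(states, rates):
    """Least-squares fit of rate ~ a * state + b; returns (a, b)."""
    n = len(states)
    mean_x = sum(states) / n
    mean_y = sum(rates) / n
    sxx = sum((x - mean_x) ** 2 for x in states)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(states, rates))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def predict_time_course(a, b, x0, times):
    """Integrate the learned rate model forward with Euler steps."""
    x, course = x0, [x0]
    for k in range(1, len(times)):
        x += (a * x + b) * (times[k] - times[k - 1])
        course.append(x)
    return course


# Synthetic "experiments" (toy data, not from the study): x' = -0.5 * x,
# observed from two different starting concentrations.
times = [0.05 * k for k in range(101)]
training_runs = [[x0 * math.exp(-0.5 * t) for t in times] for x0 in (1.0, 4.0)]

states, rates = [], []
for run in training_runs:
    run_states, run_rates = finite_difference_rates(times, run)
    states += run_states
    rates += run_rates

a, b = fit_linear_rate_model(states, rates)

# Predict a time course for a starting concentration the model never saw.
predicted = predict_time_course(a, b, 2.5, times)
```

The model never sees the underlying equation; it only sees concentrations and how fast they were changing. Swapping the straight-line fit for a more flexible regressor, and the single concentration for many proteins and metabolites, is the kind of extension a real application would need.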

The algorithm was then tested on a third set of mystery pathways it had never seen before. It accurately predicted their biofuel-production profiles, including that they produce a medium amount of fuel. In addition, the machine learning-derived predictions outperformed kinetic models.

“And the more data we added, the more accurate the predictions became,” said Garcia Martin. “This approach could expedite the time it takes to design new biomolecules. A project that today takes ten years and a team of experts could someday be handled by a summer student.”

The study was published in npj Systems Biology and Applications.