New statistical method helps write the instruction manual for biological complexity
Researchers take a mathematical and statistical approach to identify patterns in how proteins interact with each other across thousands of bacteria to build the collective whole.
Director of Communications, Biological Sciences Division
Imagine building a complicated Lego set, like the Star Wars Millennium Falcon model with more than 7,500 pieces. Given enough time, you could build it successfully if you follow the instruction manual carefully. But if someone handed you a box of 7,500 pieces with just a picture of the finished spaceship, you would never be able to do it. You might be able to piece together a few elements, like the landing gear or cockpit, based on what you see on the outside, but you could only guess at the thousands of joints and structures inside that hold the whole thing together.
For decades, scientists have been trying to reverse engineer complex systems in nature the same way. Using the available molecular biology and genetic tools, for example, they can deconstruct the genes and proteins involved in cell motility, or how cells move. But that approach can never scale to all the body parts and processes that make up an entire organism, or to hundreds of strains of bacteria interacting with each other in a microbiome. Instead, they need an instruction manual that tells them how individual genes make proteins, which in turn make cells, which form tissues, which then build organs, and so on, assembling the complete system from the inside out.
Connections at SCALES
Arjun Raman, MD, PhD, is a systems biologist at the University of Chicago who studies how complex systems grow and evolve in nature. This involves more than breaking down how each individual component works, because the whole of the system is always greater than the sum of its parts. Instead, he wants to learn the rules that helped the system evolve, so he can write a version of its instruction manual and engineer new systems to solve biological problems and treat disease.
“Using existing approaches, we’ve been able to come up with an artisanal, one-of-a-kind picture of what cell motility looks like. But good luck doing that for another function, because it will take that much more time,” Raman said. “Instead, we posed the question: How do you come up with a framework that tells you both about the individual parts and how they collect together to make a whole?”
In a new study published recently in eLife, Raman and a team of scientists from UChicago and Washington University in St. Louis used a mathematical and statistical approach to identify patterns in how proteins interact with each other across thousands of bacteria. The method, which they call SCALES (Spectral Correlation Analysis of Layered Evolutionary Signals), can help scientists create the instruction manual for a complex system, connecting the genotypes of organisms to actual biological properties and behaviors. True to its name, it also enables researchers to make these connections at different scales.
“We saw that if we sequenced a ton of bacteria and cells, we could statistically define emergent properties, meaning how these parts come together to create intermediate units of function that then ultimately create the collective function,” Raman said.
The team started building their new method by downloading genome data for more than 7,000 bacteria from a public repository and annotating their proteins. They then organized this data into a matrix, essentially an enormous spreadsheet with one row for each of the 7,000 bacteria and one column for each of the roughly 10,000 genes they collectively carry. The value in each cell represents how much the gene is expressed, and comparing how the numbers in different columns vary together from row to row can reveal how the genes interact. For example, if the values in two columns rise and fall together across the bacteria, those genes likely interact with each other. Sometimes only a subset of the rows varies in concert, and some pairs of genes appear more tightly linked statistically than others.
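To make the spreadsheet idea concrete, here is a minimal Python sketch. It is not the SCALES method itself, which relies on a spectral analysis of the full matrix; it only illustrates the simpler step described above: flag pairs of gene columns whose values rise and fall together across organisms, then group linked genes into clusters. The toy matrix, gene names, function names, and the 0.8 correlation threshold are all hypothetical, chosen for illustration.

```python
from itertools import combinations
from math import sqrt

# Hypothetical toy data: rows are organisms, columns are genes.
# Values stand in for whatever per-gene quantity each cell records.
GENES = ["geneA", "geneB", "geneC", "geneD", "geneE"]
MATRIX = [
    [1, 2, 1, 2, 4],
    [2, 4, 3, 6, 1],
    [3, 6, 2, 4, 4],
    [4, 8, 3, 6, 4],
    [5, 10, 1, 2, 1],
    [6, 12, 2, 4, 4],
]


def column(matrix, j):
    """Return column j: one gene's values across all organisms."""
    return [row[j] for row in matrix]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [v - mx for v in x]
    dy = [v - my for v in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return num / den if den else 0.0


def covarying_pairs(matrix, genes, threshold=0.8):
    """Gene pairs whose columns rise and fall together (correlation >= threshold)."""
    cols = [column(matrix, j) for j in range(len(genes))]
    pairs = []
    for i, j in combinations(range(len(genes)), 2):
        r = pearson(cols[i], cols[j])
        if r >= threshold:
            pairs.append((genes[i], genes[j], r))
    return pairs


def modules(genes, pairs):
    """Group genes into connected clusters using union-find over co-varying pairs."""
    parent = {g: g for g in genes}

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving keeps trees shallow
            g = parent[g]
        return g

    for a, b, _ in pairs:
        parent[find(a)] = find(b)

    groups = {}
    for g in genes:
        groups.setdefault(find(g), []).append(g)
    return list(groups.values())


if __name__ == "__main__":
    pairs = covarying_pairs(MATRIX, GENES)
    for a, b, r in pairs:
        print(f"{a} <-> {b}: r = {r:.2f}")
    print(modules(GENES, pairs))
```

Running it prints `geneA <-> geneB: r = 1.00` and `geneC <-> geneD: r = 1.00`, followed by `[['geneA', 'geneB'], ['geneC', 'geneD'], ['geneE']]`: two small co-varying units plus one unlinked gene. Plain pairwise correlation like this misses the layered, partial-row patterns the article mentions, which is the kind of structure a spectral approach is designed to pick up.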
Raman and his team compared the results from their tool with a well-characterized system of proteins involved in motility and found that they corresponded perfectly. The tool identified the same protein interactions that had taken scientists decades to find using traditional biochemical techniques. They also applied it to an understudied bacterium, Pseudomonas aeruginosa, and identified a previously unknown gene that affects its motility.
“This was in a part of the genome that is known to affect a specific type of motility, but the protein itself was uncharacterized,” Raman said. “So, here's an example of where deconstructing systems in the lab is going to be incomplete, because you would never have found that kind of needle in a haystack amongst the thousands of genes in that organism.”
Describing emergence
Since this technique is based on math, it can be applied to larger units of complexity. Instead of comparing gene and protein structure within single organisms, for example, you could use it to compare entire microbiomes made up of thousands of bacteria. Raman would like to use it to zero in on the combinations that give microbiomes different characteristics, like the ability to colonize a certain ecosystem or resist different pathogens.
“We're thinking about this as a paradigm for describing emergence in biology,” he said. “Our next question is to ask whether or not the statistical framework that we’ve described can be generative, meaning can it not just describe what’s out there, but can it also be used to make new systems.”
Additional authors on the paper, “Defining hierarchical protein interaction networks from spectral analysis of bacterial proteomes,” include Alexander S. Little, Fidel Haro, and Valeryia Aksianiuk from UChicago; and William J. Buchser, Aaron DiAntonio, Jeffrey I. Gordon, and Jeffrey Milbrandt from Washington University School of Medicine, St. Louis.