New work from Rice computer scientist Luay Nakhleh, graduate student Dingqiao Wen and postdoctoral researcher Yun Yu adds a Bayesian inference component to the Nakhleh lab’s long-running development of PhyloNet. PhyloNet is an open-source software package to determine aspects of evolution that wouldn’t show up on a standard evolutionary — or phylogenetic — tree but would appear as part of a network. A paper on the research appeared this month in PLOS Genetics.
The latest iteration of PhyloNet includes the update and will be the topic of an invited talk and a tutorial session by Nakhleh and his group at Evolution 2016, a joint conference of the American Society of Naturalists, the Society for the Study of Evolution and the Society of Systematic Biologists. The conference will be June 17-21 in Austin, Texas.
Bayesian inference is a statistical method that updates the probability of a hypothesis as data are taken into account. The massive number of genomic data sets becoming available through sequencing presents opportunities to look deeper into the evolutionary history of life on Earth, and Nakhleh and his group are positioning PhyloNet to help biologists take advantage of them.
Nakhleh said his group’s research has evolved since he joined Rice in 2004. “Our approaches originally were not statistical, but in the last five years or so we shifted toward likelihood-based ones,” he said.
Now methods based on maximum likelihood are giving way to Bayesian methods (named for 18th-century statistician Thomas Bayes). The Bayesian approach lets researchers specify, in probabilistic terms, “prior” knowledge about the models being inferred for connections that arise through hybridization or horizontal gene transfer. Collectively known as reticulate evolution, these processes sidestep parent-to-child transfer as ways to spread favorable genetic traits.
“A biologist says, ‘I want to infer the evolutionary history of this group of species.’ There might be some prior knowledge about them. In a Bayesian framework, there’s a formal way of specifying that,” Nakhleh said.
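To make the idea of a formal prior concrete, here is a minimal sketch of Bayes' rule applied to three hypothetical candidate histories. It is not PhyloNet code: the log-likelihood values, the `penalty` parameter and the choice of a prior that discounts each extra reticulation are all assumptions made up for illustration.

```python
import math

def posterior(log_likelihoods, log_priors):
    """Combine log-likelihoods and log-priors with Bayes' rule and normalize."""
    log_unnorm = [ll + lp for ll, lp in zip(log_likelihoods, log_priors)]
    # Subtract the maximum before exponentiating to avoid numerical underflow.
    m = max(log_unnorm)
    weights = [math.exp(x - m) for x in log_unnorm]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical candidates: histories with 0, 1 and 2 reticulation events.
reticulations = [0, 1, 2]
# Made-up log-likelihoods: more reticulations fit the toy data slightly better.
log_likelihoods = [-105.0, -100.0, -99.5]
# Assumed prior: each extra reticulation costs `penalty` log units,
# encoding a prior belief that simpler histories are more plausible.
penalty = 2.0
log_priors = [-penalty * k for k in reticulations]

post = posterior(log_likelihoods, log_priors)
for k, p in zip(reticulations, post):
    print(f"{k} reticulation(s): posterior probability {p:.3f}")
```

In this toy setup the prior tips the balance toward the one-reticulation history even though the two-reticulation history has a marginally higher likelihood, which is the kind of trade-off a prior is meant to express.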
The new method samples evolutionary histories, called phylogenetic networks, and suggests connections between closely related species, or even among multiple individuals within a species.
“That’s one of the nice things about the Bayesian approach and why we went in that direction,” he said. “We don’t give biologists a single answer. We don’t say, ‘Here’s your optimal evolutionary history.’ We say, ‘Here’s a sample of evolutionary histories the method found while walking the space of possible evolutionary histories.’”
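The notion of walking a space of possibilities and returning a sample rather than a single answer can be sketched with a generic Metropolis-style random walk. This is an assumption chosen for illustration, not a description of PhyloNet's actual sampler: the five "histories" are just numbered states on a ring, and their unnormalized weights are invented.

```python
import random
from collections import Counter

def metropolis_walk(weights, steps, seed=0):
    """Random walk over states 0..n-1 that visits each state in proportion
    to its (unnormalized) weight. Returns the fraction of steps per state."""
    rng = random.Random(seed)
    n = len(weights)
    current = rng.randrange(n)
    visits = Counter()
    for _ in range(steps):
        # Symmetric proposal: step to the left or right neighbor on a ring.
        proposal = (current + rng.choice((-1, 1))) % n
        # Always accept moves to a higher weight; accept moves to a lower
        # weight with probability equal to the weight ratio.
        if rng.random() < min(1.0, weights[proposal] / weights[current]):
            current = proposal
        visits[current] += 1
    return {state: visits[state] / steps for state in range(n)}

# Hypothetical unnormalized posterior weights for five candidate histories.
toy_weights = [1.0, 2.0, 8.0, 3.0, 1.0]
frequencies = metropolis_walk(toy_weights, steps=20000)
for state, freq in frequencies.items():
    print(f"history {state}: visited {freq:.1%} of the time")
```

The output is a distribution over candidates, so a user sees not only the most-visited history but also how much support the alternatives received.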
To test the technique, the team first ran synthetic data sets through the software to confirm its accuracy and then looked for links in known genomic data sets from bread wheat, mosquitoes and house mice.
“Sometimes the choice of data is based on one simple criterion: This is the data that is available to us,” Nakhleh said. “We don’t generate data, but we can look into the literature to see what is out there, what biologists have reported that we can use our method on. And these happened to be three recent data sets.”
The wheat and mosquito analyses involved data sets from multiple species, while the mouse study involved individuals from the same species. In each case, PhyloNet delivered a set of plausible connections in the species’ phylogenetic histories.
Including wheat data was a natural choice, Nakhleh said. “When people think about hybridization and biology, they usually think about plants,” he said. “Plants hybridize naturally and can be made to hybridize. You can see it in any grocery store that has plumcots,” a hybrid of plums and apricots.
Nakhleh is pleased that PhyloNet — which he calls “a testament to the quality of students at Rice” — is finding wide acceptance in the evolutionary biology community. “For these types of evolutionary relationships, we have the main piece of software out there for the community to use,” he said. “And from day one, it has been publicly available as open source. I want it to be open to the community because this is how I believe research should be done.”
Nakhleh is an associate professor of computer science, of ecology and evolutionary biology and of biochemistry and cell biology. The National Science Foundation (NSF) supported the research. The researchers used the Night Owls Time-Sharing Service supported by the NSF and administered by Rice’s Ken Kennedy Institute for Information Technology.