Computational biologists at Carnegie Mellon University have developed a more accurate computational method for reconstructing the full-length nucleotide sequences of the RNA products in cells, called transcripts, that transform information from a gene into proteins or other gene products.
A report on the method, called Scallop, by Carl Kingsford, associate professor of computational biology, and Mingfu Shao, Lane Fellow in the School of Computer Science’s Computational Biology Department, is being published online today by the journal Nature Biotechnology.
Scallop is a so-called transcript assembler, taking fragments of RNA sequences, called reads, that are produced by high-throughput RNA sequencing technologies (RNA-seq), and putting them back together, like pieces of a puzzle, to reconstruct complete RNA transcripts.
“There are many existing assemblers,” Shao said, “but these existing methods are still not accurate enough.”
When compared to two leading assemblers, StringTie and TransComb, Scallop is 34.5 percent and 36.3 percent more accurate, respectively, for transcripts consisting of multiple exons, the subunits of a gene that encode part of the gene product.
Like other reference-based assemblers, Scallop begins by constructing a graph to organize reads that are mapped to the corresponding locations on the gene’s DNA. Many alternative paths exist for connecting the reads together, however, so errors are easily made. Scallop improves its odds by using a novel algorithm to take full advantage of the information from reads that span several exons to guide it to the correct assembly paths.
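To make the idea concrete, the following is a minimal sketch and not Scallop's actual algorithm. It assumes a toy graph in which nodes are exons and edges are splice junctions, enumerates every possible path, and then keeps only the paths that contain a chain of exons seen together in a single read. The exon names, the graph, the reads and all function names are hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# Hypothetical toy splice graph (not taken from Scallop): each exon maps to
# the exons it can be spliced to, as suggested by mapped reads.
GRAPH: Dict[str, List[str]] = {
    "E1": ["E2a", "E2b"],
    "E2a": ["E3"],
    "E2b": ["E3"],
    "E3": ["E4a", "E4b"],
    "E4a": ["E5"],
    "E4b": ["E5"],
}

# Hypothetical reads that each span three exons. They link a choice made
# before E3 to a choice made after E3.
MULTI_EXON_READS: List[List[str]] = [
    ["E2a", "E3", "E4a"],
    ["E2b", "E3", "E4b"],
]


def enumerate_paths(graph: Dict[str, List[str]], source: str, sink: str) -> List[List[str]]:
    """Return every source-to-sink path in an acyclic graph."""
    if source == sink:
        return [[sink]]
    paths = []
    for nxt in graph.get(source, []):
        for tail in enumerate_paths(graph, nxt, sink):
            paths.append([source] + tail)
    return paths


def contains_chain(path: List[str], chain: List[str]) -> bool:
    """True if `chain` appears as a contiguous run of exons in `path`."""
    n = len(chain)
    return any(path[i:i + n] == chain for i in range(len(path) - n + 1))


def supported_paths(paths: List[List[str]], reads: List[List[str]]) -> List[List[str]]:
    """Keep only the paths that contain at least one multi-exon read chain."""
    return [p for p in paths if any(contains_chain(p, r) for r in reads)]


all_paths = enumerate_paths(GRAPH, "E1", "E5")
kept = supported_paths(all_paths, MULTI_EXON_READS)
```

In this toy case the graph alone allows four candidate transcripts, because every edge is supported by some read. The two reads spanning three exons rule out the two candidates that mix E2a with E4b or E2b with E4a, leaving two.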
Scallop proves particularly adept when assembling less abundant RNA transcripts, improving upon the accuracy of StringTie and TransComb by 67.5 percent and 52.3 percent, respectively.
“We’ve had more than 100 downloads already and, based on the feedback we’ve received, people are really using it,” Shao said. “We expect more users now that our paper is out.”
Source: Carnegie Mellon University