It works by first identifying presumed harmful variants in a patient’s genome. The algorithm then cross-references the various mutations against large databases linking genes and symptoms, and then determines the likelihood of any given gene variant being implicated in the patient’s disease.
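The three steps described above can be illustrated with a toy sketch. This is not the PVP implementation: the `Variant` class, the pathogenicity scores, the gene names, the phenotype terms, the Jaccard similarity and the way the two scores are multiplied are all assumptions chosen only to make the filter, cross-reference and rank flow concrete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    variant_id: str
    gene: str
    pathogenicity: float  # hypothetical harmfulness score in [0, 1]


def phenotype_similarity(patient, gene_phenotypes):
    """Jaccard overlap between two sets of phenotype terms (an assumed measure)."""
    if not patient or not gene_phenotypes:
        return 0.0
    return len(patient & gene_phenotypes) / len(patient | gene_phenotypes)


def rank_variants(variants, patient_phenotypes, gene_to_phenotypes,
                  min_pathogenicity=0.5):
    # Step 1: keep only the variants presumed to be harmful.
    candidates = [v for v in variants if v.pathogenicity >= min_pathogenicity]

    # Step 2: cross-reference each candidate's gene against a
    # gene-to-phenotype table and compare it with the patient's symptoms.
    scored = []
    for v in candidates:
        known = gene_to_phenotypes.get(v.gene, set())
        similarity = phenotype_similarity(patient_phenotypes, known)
        # Step 3: combine both signals into one score (a simple product here).
        scored.append((v, v.pathogenicity * similarity))

    # Most likely causative candidates first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Entirely made-up example data.
example_variants = [
    Variant("v1", "GENE_A", 0.90),
    Variant("v2", "GENE_B", 0.95),
    Variant("v3", "GENE_C", 0.20),
]
example_patient = {"low thyroid hormone", "delayed growth"}
example_database = {
    "GENE_A": {"low thyroid hormone", "delayed growth", "goiter"},
    "GENE_B": {"hearing loss"},
    "GENE_C": {"low thyroid hormone", "delayed growth"},
}

for variant, score in rank_variants(example_variants, example_patient,
                                    example_database):
    print(variant.variant_id, variant.gene, round(score, 2))
```

In this made-up example, the variant in `GENE_B` has the highest harmfulness score but ranks below the one in `GENE_A`, because its gene has no recorded link to the patient's symptoms. That is the difference between a variant that is merely deleterious and one that is plausibly causative.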
Other tools available to scour the genome for harmful mutations tend to rely solely on DNA sequence data. But the KAUST team’s new PhenomeNET Variant Predictor (PVP) system includes clinical information from a patient’s medical record as well. It also incorporates reams of phenotype data from systematic evaluations of mice and zebrafish that match DNA changes to disease features.
“Ours uses more information than other tools, and we look for potential causative variants, not just a deleterious variant,” explained Professor Robert Hoehndorf, who led the study, along with his Ph.D. student Imane Boudellioua.
In their new paper, the researchers used a retrospective dataset from the UK, together with the Supercomputing Laboratory at KAUST, to show that PVP accurately identified the causative gene variants responsible for congenital hypothyroidism. Mutations in a number of different genes are known to cause the disease, in which the thyroid gland in the neck under-produces the iodine-containing hormone needed for normal growth and development. As reported, PVP pinpointed the gene variants responsible for congenital hypothyroidism in individual patients, both in sequence datasets that spanned the entire genome and in those that included only the protein-coding portion.
Hoehndorf envisions the tool becoming part of the clinical geneticist’s diagnostic routine. For a patient with a suspected genetic disease, doctors could sequence that person’s genome, perform a full clinical workup and then run the algorithm. “PVP should be able to identify the variant or variants causing the patient’s phenotypes (symptoms) directly in most cases,” he said.
Still, there’s room for improvement. Hoehndorf explained that PVP can find pathogenic DNA variants in genes that have already been implicated in disease, either in people or in lab organisms; however, around two-thirds of the protein-coding genes in mice still await full characterization. While more genes have been characterized in zebrafish, the evolutionary distance between fish and humans (and differences in experimental protocols) makes this kind of cross-species comparison more challenging.
“We desperately need more high-quality phenotype data from model organisms, in particular the mouse, to improve our system,” Hoehndorf said.