Around 80% of rare diseases are thought to have a genetic component, but currently many patients experience long delays in diagnosis or never receive a diagnosis at all. Recent developments in our ability to obtain whole or partial genome sequences cheaply and efficiently now make it feasible for patients to benefit from this technology through cheaper, faster diagnosis of disease and the development of new therapies. Many international projects, such as the UK 100,000 Genomes Project, are currently seeking to capitalise on these developments by sequencing hundreds of thousands of individuals. However, a major challenge remains: how to associate changes in a patient’s DNA with their disease.
“The challenge for scientists is to identify which of the hundreds of thousands of genetic differences between a patient and an unaffected individual might be responsible for their disease,” says Dr Paul Schofield from the Department of Physiology, Development and Neuroscience at the University of Cambridge. “Given the huge complexity of this problem, it has been described as ‘looking for needles in stacks of needles’.”
Now, Dr Schofield and a team of researchers from the UK and Saudi Arabia have developed an algorithm, published this week in the journal PLOS Computational Biology, that can identify variants that modify the normal function of a gene associated with a particular disease.
A framework developed by the team, called PhenomeNET, matches a patient’s phenotype (symptoms) to a large database of gene-to-phenotype associations, including those from studies involving mice and zebrafish, in order to identify disease-causing genes.
Mice and zebrafish are commonly used when studying the biology underlying human diseases as they have a number of important genetic and biological similarities to us. For many years, data on the consequences of naturally occurring and experimentally induced genetic variants in these animal models have been collected, resulting in huge ‘Big Data’ resources that associate genetic makeup with phenotype, such as the Mouse Genome Database, which contains more than 60,000 of these associations.
By combining PhenomeNET with methods that find harmful variants in a genomic sequence, the team developed the PhenomeNET Variant Predictor (PVP) system, an algorithm that prioritises these variants by their likelihood of involvement in human disease.
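To give a feel for the general idea of combining phenotype matching with variant harmfulness, the short Python sketch below ranks a handful of made-up variants. It is only an illustration under stated assumptions and is not the published PVP method: the similarity measure (a plain Jaccard overlap of phenotype terms), the way the two scores are combined (a simple product), and every name and number in it (`Variant`, `rank_variants`, `GENE_A`, `P1`, the pathogenicity values) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """A candidate variant; all fields here are hypothetical toy values."""
    variant_id: str
    gene: str
    pathogenicity: float  # assumed harmfulness score in [0, 1]


def jaccard(a, b):
    """Overlap between two sets of phenotype terms (0 = none, 1 = identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_variants(patient_phenotypes, gene_phenotypes, variants):
    """Rank variants by phenotype match multiplied by harmfulness.

    Assumption: a simple product stands in for whatever model a real
    system would use to combine the two kinds of evidence.
    """
    scored = []
    for v in variants:
        known = gene_phenotypes.get(v.gene, set())
        match = jaccard(patient_phenotypes, known)
        scored.append((v.variant_id, match * v.pathogenicity))
    # Highest combined score first
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy inputs: placeholder phenotype terms and genes, not real data.
patient = {"P1", "P2", "P3"}
gene_phenotypes = {
    "GENE_A": {"P1", "P2", "P3", "P4"},
    "GENE_B": {"P5"},
    "GENE_C": {"P1"},
}
variants = [
    Variant("var1", "GENE_B", 0.9),
    Variant("var2", "GENE_A", 0.6),
    Variant("var3", "GENE_C", 0.9),
]

ranking = rank_variants(patient, gene_phenotypes, variants)
for variant_id, score in ranking:
    print(variant_id, round(score, 3))
```

In this toy example, a variant that looks harmful but sits in a gene with no phenotype overlap drops to the bottom of the list, while a moderately harmful variant in a well-matched gene rises to the top.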
“Our algorithm makes use of clinical and experimental data that have been collected for years and uses them to identify the genetic variants underlying the conditions of patients with genetic disorders,” adds Professor Robert Hoehndorf from King Abdullah University of Science and Technology (KAUST) in Saudi Arabia.
Working with Dr Nadia Schoenmakers at the Wellcome Trust-Medical Research Council Institute of Metabolic Science in Cambridge, the team was able to show that the new algorithm can identify genetic changes in patients with congenital thyroid disease, and can reveal candidate genetic changes in ‘Mendelian’ diseases where only a single gene is involved.
“We’ve shown that our algorithm works for simpler diseases and now the real test will be to determine whether a similar approach can be applied to complex diseases, such as diabetes, where multiple genes are involved,” says Professor George Gkoutos from the University of Birmingham.