As genomic information becomes more medically useful, storing it in medical records will inevitably make that data more vulnerable, Yale researchers suggest. While large-scale genetic data is invaluable for research, there is another side to the coin: a hacker can use subtle correlations implicit in the data to reveal sensitive facts about patients, such as disease diagnoses.
“In a sense, we are more at risk today than we thought for disclosure of individualized medical information from anonymous data sources,” said Mark Gerstein, the Albert L. Williams Professor of Biomedical Informatics and co-author of the study published Feb. 1 in the journal Nature Methods.
Gerstein and co-author Arif Harmanci describe how identifying information can be mined from private databases through “linking attacks.”
In one hypothetical scenario, a political opponent might sequence a candidate's DNA from saliva left on a cocktail glass and then find a matching genetic signature in an anonymous cancer research database. The political operative could use this link not only to identify the candidate as a cancer survivor, but also to pinpoint the candidate's predispositions to other illnesses, and even those of his or her family members.
The risk of such characterization from anonymous data sources will only grow as the amount of available genomic information increases, the authors say.
“The contribution of genomic information to different information sources may individually be for good and noble causes such as medical research, but when they are brought together they may create a toxic mixture,” Gerstein said.
Gerstein stressed that no one should stop volunteering genomic information for medical research, but that scientists should take additional steps to prevent this information from being disclosed through such linking attacks.
“If we say something is private, we should take the necessary steps to ensure information remains private,” he said.