Punctuating the Messages Encoded in the Human Genome with Transposable Elements

Georgia Tech researchers (l-r) Lu Wang, Emily Norris, King Jordan and Lavanya Rishishwar with a diagram showing Mammalian-wide Interspersed Repeats (MIRs). Their study found that MIRs can serve as genetic landmarks that help to target specific regulatory mechanisms to a large number of genomic sites and thereby lead to the coordinated regulation of the genes located nearby these sites. (Credit: John Toon, Georgia Tech)

Since the classical studies of Jacob and Monod in the early 1960s, it has been evident that genome sequences contain not only blueprints for genes and the proteins that they encode, but also the instructions for a coordinated regulatory program that governs when, where and to what extent these genes and proteins are expressed. The execution of this regulatory code is what allows for the creation of very different cell and tissue types from the same set of genetic instructions found in the nucleus of every cell.

A recent study published in the journal Proceedings of the National Academy of Sciences (PNAS) shows that critical aspects of this regulatory program are encoded by genomic sequence elements that were previously thought to be mere “junk DNA” with no important functions.

The vast majority of the human genome – about 98 percent of the total genetic information – does not encode proteins, and this non-coding sequence was initially designated junk DNA to underscore its lack of apparent function. Much of this so-called junk DNA has accumulated over evolutionary time through the activity of retrotransposable elements (RTEs), which can move (transpose) from one location in the genome to another, making copies of themselves as they do so.

These elements have long been considered genomic parasites that persist by virtue of their ability to replicate to high copy numbers within genomes without providing any benefit to the hosts in which they reside. However, recent studies have shown that RTEs can in fact encode important functions, and much of this functional activity relates to how genomes are regulated. RTEs have been linked to stem cell function, tissue differentiation, cancer progression and, ultimately, to aging and age-related pathologies.

The PNAS study provides a new perspective on the role that RTE-derived sequences play in the precise execution of the human genome’s regulatory program. The study found that one particular class of RTEs – Mammalian-wide Interspersed Repeats (MIRs) – can serve as genetic landmarks that help target specific regulatory mechanisms to a large number of genomic sites and thereby lead to the coordinated regulation of genes located near these sites.

This discovery was spearheaded by a team of computational biologists led by King Jordan, associate professor in the School of Biology at the Georgia Institute of Technology and director of Georgia Tech’s Bioinformatics Graduate Program. Jordan’s lab performed a “big data” analysis of massive datasets generated by hundreds of scientists from dozens of laboratories around the world working on the Encyclopedia of DNA Elements (ENCODE) project. This comprehensive, integrated analysis, conducted primarily by first author Jianrong Wang of Jordan’s team, allowed the researchers to pinpoint thousands of individual MIR elements in the human genome that appear to function as so-called “boundary elements” in T lymphocytes of the immune system.

Boundary elements are regulatory sequences that act at the epigenetic level, separating transcriptionally active regions of the human genome from transcriptionally silent regions in a cell-type-specific manner. In doing so, they help give different cell types distinct identities, even though every cell type carries an identical set of genetic information. The regulatory programs that underlie these cell- and tissue-specific functions and identities depend largely on how the genome is packaged.

Genes that should not be expressed in a given cell or tissue lie in tightly packaged regions of the genome, where they are inaccessible to the transcription factors that would otherwise turn them on. Boundary elements help establish the geography of genome packaging by marking the margins between silent regions, in which genes are not expressed, and active regions, in which they are. In this critical role, boundary elements help control the timing and extent of gene expression across the entire genome. As a result, defects in how boundary elements organize the genome are highly relevant to both normal physiology and disease.

“One thing that is particularly striking is the fact that these punctuation marks play a role that is deeply evolutionarily conserved,” said Jordan. “The same exact MIR sequences were able to function as boundaries in human CD4+ lymphocytes, in mouse cell models and in zebrafish.”

The paper built on earlier work by a San Francisco company, Aelan Cell Technologies, which had discovered that another class of retrotransposon, the SINE B2 element, can provide boundary function at the mouse growth hormone locus. The company’s research team collaborated with Jordan’s lab on the project.

“We randomly picked a handful of the MIR sequences predicted by the Jordan lab to serve as boundary elements and experimentally validated their activity in mouse cell lines and, with the help of our Spanish collaborators, in zebrafish during embryonic development,” said Dr. Victoria Lunyak, CEO of the company. “This testing revealed that MIR sequences can serve as punctuation marks within our genome that enable cells to correctly read and comprehend the message transmitted by the genomic sequences.”

Aging is characterized by a number of global changes in genome organization and function, and aging-associated defects in how our genome is packaged can have severe pathological consequences. In particular, age-related defects in genome packaging can greatly increase the genome’s susceptibility to damage. Building on the discoveries published in their PNAS paper, the Jordan lab at Georgia Tech, the Lunyak team at Aelan Cell Technologies and their partner Nuclea Biotechnologies are now working toward novel diagnostic and therapeutic strategies that target the critical roles of epigenetic regulators, such as human retrotransposons, in coordinating cell-type-specific regulatory programs.

“This is an important discovery because the understanding of how RTEs punctuate messages encoded in the human genome can help researchers to develop treatments for a wide variety of human diseases, including aging,” added Lunyak.

At Georgia Tech, the research was supported by an Alfred P. Sloan Research Fellowship in Computational and Evolutionary Molecular Biology and by a Georgia Tech Integrative BioSystems Institute pilot program grant.

CITATION: Jianrong Wang, et al., “MIR retrotransposon sequences provide insulators to the human genome,” (Proceedings of the National Academy of Sciences, 2015). http://www.pnas.org/content/early/2015/07/22/1507253112.abstract?sid=c90…