Researchers Explain Signals of CpG ‘Traffic Lights’ in DNA

Illustration. CpG traffic light for genes. Credit: @tsarcyanide/MIPT Press Office

A research team featuring bioinformaticians from the Moscow Institute of Physics and Technology (MIPT) has identified reliable markers of gene activity. The discovery has potential for future applications in clinical practice. The findings are reported in BMC Genomics.

In terms of macromolecular chemistry, DNA is a polymer, or polynucleotide, composed of four kinds of repeated units known as nucleotides. What makes the four types different are the associated nitrogenous bases: adenine (A), thymine (T), guanine (G), and cytosine (C). A DNA region with a C followed by G, connected via a phosphate (p), is known as a CpG dinucleotide (figure 1).

Figure 1. The colored region in the middle chart is a CpG dinucleotide. The DNA double helix is shown on the right. Credit: @tsarcyanide/MIPT Press Office
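To make the definition concrete, here is a minimal sketch that scans a DNA string for CpG dinucleotides, that is, every C immediately followed by a G on the same strand. The function name `find_cpg_sites` and the example sequence are illustrative assumptions, not taken from the study.

```python
def find_cpg_sites(sequence: str) -> list[int]:
    """Return the 0-based position of every C that is immediately followed by G."""
    seq = sequence.upper()  # accept lower-case input as well
    return [i for i in range(len(seq) - 1) if seq[i] == "C" and seq[i + 1] == "G"]


# Hypothetical example sequence, made up for illustration.
example = "ATCGGCGTACG"
print(find_cpg_sites(example))  # → [2, 5, 9]
```

Note that "GC" is not a CpG: the order along the strand matters, so only the cytosines at positions 2, 5, and 9 are reported.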

Genes are DNA regions carrying primary information about the RNA and proteins produced by a cell. The DNA sequence is usually the same across all cells in an organism, but the “working” genes actually involved in RNA synthesis are different for every cell type. This is enabled by regulatory mechanisms functioning as switches that launch or halt the production of specific RNAs. Once a certain level of organism complexity is reached in evolution, the number of genes no longer increases. Instead, more elaborate regulatory programs evolve to make sure the right genes are activated at the right time.

As of now, two major types of mechanisms regulating gene transcription (RNA synthesis) have been studied in detail: those involving so-called transcription factors and those involving epigenetic regulators. Transcription factors are regulatory proteins capable of binding to DNA by recognizing certain nucleotide sequences in the regulatory regions of genes. Once a transcription factor has bound to DNA, it engages special cell machinery to initiate RNA synthesis from that particular gene. A large number of such factors are known (over 1,500 in humans). Their combinations regulate how active transcription should be and whether it should happen at all.

Epigenetic regulation involves mechanisms that control how active a gene is without affecting the primary DNA structure. One such mechanism is DNA methylation. It is mostly achieved by attaching a methyl group (CH₃) to the cytosine in a CpG dinucleotide (figure 2). Once attached, the methyl group serves as a marker signaling which genes are active or repressed, ultimately determining the cell type. It is no surprise that DNA methylation is associated with numerous biological processes, both normal and pathological. Abnormal DNA methylation is observed in cancer, metabolic disorders, and cardiovascular, neurodegenerative, and other diseases.

Figure 2. Cytosine methylation: Methyl groups shown in purple and labeled as CH₃ or M bind to the cytosine nucleobase (C). Credit: @tsarcyanide/MIPT Press Office

By binding to cytosine nucleotides in distinct functional regions of the DNA, a methyl group can have distinct effects. For example, methylation of regions close to transcription start sites tends to suppress gene activity. Conversely, methylation of cytosines inside the gene usually serves to activate it (figure 3).

Figure 3. The red column denotes CpG dinucleotide traffic lights, whose methylation levels correlate strongly with gene activity. The diagram shows the promoter region of the gene, the transcription start site, and the gene body (at the top). Credit: @tsarcyanide/MIPT Press Office

“In our previous papers, we showed that methylation of certain CpG dinucleotides was strongly associated with gene activity. We called such dinucleotides CpG traffic lights. Now we have demonstrated that the methylation of CpG traffic lights is a better indicator of gene activity than promoter or gene body methylation. In addition, we’ve shown that enhancers — DNA regions located away from genes but regulating their activity — are enriched in CpG traffic lights,” explained Yulia Medvedeva, the senior author of the paper and associate professor of bioinformatics and systems biology at MIPT, who leads the regulatory transcriptomics and epigenomics group at the Research Center of Biotechnology of the Russian Academy of Sciences.

“We noted that CpG traffic lights are conserved in the course of evolution. That is, these positions are relatively less susceptible to mutations, supporting their functional significance,” added Anna Lioznova, the first author of the study and a doctoral student at the Research Center of Biotechnology, RAS.

“We used to think that the main role of CpG traffic lights was to switch the regions of transcription factor binding from the active to the passive state,” said study co-author Ivan Kulakovskiy, a researcher at the Engelhardt Institute of Molecular Biology and the Institute of Mathematical Problems of Biology, RAS. “Surprisingly, that previously explained and described mechanism proved to account for a relatively small share of the overall number of traffic lights. We suppose that the operation of traffic lights is tied to what’s known as an activity ‘map’ of DNA regions [also known as chromatin states], but the specific mechanisms are still to be discovered.”

CpG traffic lights are CpG dinucleotides whose methylation reflects the activation or repression of a gene encoded nearby. In other words, such dinucleotides signal whether RNA is to be synthesized from this gene, and ultimately whether the associated protein will be produced. By studying CpG traffic lights, the researchers hope to understand the mechanisms of gene regulation. In clinical practice, determining the status of cytosine methylation produces more reliable results than direct measurements of gene activity. This opens up prospects for the clinical use of CpG traffic lights as effective indicators of gene activity.
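The idea of a methylation level that tracks gene activity can be illustrated with a small sketch. It computes a Spearman rank correlation between the methylation of one CpG site and the expression of a nearby gene across several samples. The choice of Spearman correlation, the function names, and all sample values below are assumptions made for illustration; the text above does not specify the statistic or any threshold the researchers used.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: methylation fraction of one CpG site and expression
# of a nearby gene, measured in the same five samples (made-up numbers).
methylation = [0.9, 0.8, 0.6, 0.3, 0.1]
expression = [1.0, 2.5, 4.0, 9.0, 15.0]

rho = spearman(methylation, expression)
print(f"Spearman rho = {rho:.2f}")
```

In this toy data, expression rises steadily as methylation falls, so the correlation is strongly negative. A site with that kind of consistent relationship across many samples is the sort of CpG the article calls a traffic light.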

The study was conducted by researchers from MIPT, Lomonosov Moscow State University, King Abdullah University of Science and Technology in Saudi Arabia, and the following institutions of the Russian Academy of Sciences: the Research Center of Biotechnology, the Vavilov Institute of General Genetics, the Institute of Mathematical Problems of Biology, the Engelhardt Institute of Molecular Biology, and the Institute for Information Transmission Problems.

This research was supported by the Russian Foundation for Basic Research and the Russian Science Foundation.