A new paper published in Science reveals a second, ‘hidden’ layer of coding within the genome that directs regulation of gene expression, a finding with potentially significant implications for genomic medicine.
The DNA coding regions of the genome (the exome) have long been known to specify the amino acid chains that make up proteins via a triplet code, with each set of three nucleotides (a codon) specifying addition of a specific amino acid or termination of the amino acid chain. Now it seems that 15% of these codons, dubbed ‘duons’ by the researchers, also have a regulatory function by specifying transcription factor (TF) recognition sequences.
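As a rough illustration of the idea (not the researchers' method), the short Python sketch below splits a made-up coding sequence into codons and reports which codons overlap an occurrence of a TF recognition sequence. The sequence, the motif and the function names are all assumptions chosen for illustration. Real TF binding is usually modelled with position weight matrices and measured experimentally, not found by exact string matching.

```python
def split_codons(cds):
    """Split a coding sequence into consecutive three-nucleotide codons."""
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def codons_overlapping_motif(cds, motif):
    """Return sorted indices of codons that overlap any exact motif match."""
    hits = set()
    start = cds.find(motif)
    while start != -1:
        last_base = start + len(motif) - 1      # last nucleotide of the match
        # Integer division by 3 maps a nucleotide position to its codon index.
        hits.update(range(start // 3, last_base // 3 + 1))
        start = cds.find(motif, start + 1)      # look for further matches
    return sorted(hits)


# Hypothetical inputs: a toy coding sequence and an illustrative motif.
cds = "ATGGCTGACGTCATCTAA"
motif = "TGACGTCA"

codons = split_codons(cds)
dual_use = codons_overlapping_motif(cds, motif)

for index, codon in enumerate(codons):
    label = "codon + motif overlap" if index in dual_use else "codon only"
    print(index, codon, label)
```

In this toy case the codons flagged as overlapping the motif are the ones that would, on the paper's account, carry both amino acid and regulatory information.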
TFs are sequence-specific DNA-binding proteins that control gene expression. They have known binding sites outside coding regions: some lie close to the start of gene coding sequences, while others may be very distant but can still interact directly with the cellular gene expression machinery (with the intervening DNA looped out of the way) to influence the process.
The researchers mapped TF occupancy of the human exome in 81 different human cell types and found that ‘duons’ were highly conserved. However, genetic variants within duons resulted in significant functional variability, with 17% of single-nucleotide variants having an impact on TF binding. They conclude that ‘pervasive dual encoding of amino acid and regulatory information appears to be a fundamental feature of genome evolution’, and that the two forms of encoding have evolved together.
Lead researcher Dr John Stamatoyannopoulos of the University of Washington said: “For over 40 years we have assumed that DNA changes affecting the genetic code solely impact how proteins are made…Now we know that this basic assumption about reading the human genome missed half of the picture. These new findings highlight that DNA is an incredibly powerful information storage device, which nature has fully exploited in unexpected ways”.
The discovery is part of research within ENCODE, the Encyclopedia of DNA Elements Project (see previous news).
Comment: These findings add a new layer of complexity to the multiple elements of gene expression regulatory control already known to exist, such as alternative splicing and RNA structural elements, and open a new line of enquiry for researchers attempting to understand how specific mutations can cause disease. Harmful genetic changes within coding regions may directly disrupt control of gene expression as well as the protein coding sequence, if they affect a ‘duon’. However, many details about the process remain to be determined.