18 June 2007
The first report from the ENCODE (Encyclopedia of DNA Elements) consortium suggests that we might need to make some quite fundamental adjustments to our thinking about how the human genome functions [The ENCODE Project Consortium (2007) Nature 447, 799-816]. The consortium used a range of high-throughput techniques to study the functions of an estimated 1% of the human genome, corresponding to about 30 million base pairs of DNA. Where possible, a combination of techniques was used to address a particular question, thus strengthening the conclusions.
The biggest surprise was that, instead of a discrete and tidy set of genes that are transcribed into RNAs and translated into proteins, the genome appears to encode a vast network of overlapping transcripts, many of which have no known function as yet. A corollary to this finding is that many regulatory DNA elements that were thought to be far away from the genes whose expression they regulate are in fact quite close to the transcription start site for one or more of the newly identified transcripts. Regulatory sequences also turned up just as frequently ‘downstream’ as ‘upstream’ of the transcription start site, again overturning conventional wisdom. Studies on the short- and long-range ‘architecture’ of the genome confirmed many previous findings but also revealed a more sophisticated picture of the relationships between regulatory sites for transcription and DNA replication, chromatin packaging and histone modification.
Another ‘rule of thumb’ in genomics has been that functionally important DNA elements are likely to be evolutionarily conserved. The ENCODE results suggest that although there is indeed overlap between evolutionarily constrained sequences and functional sequences, some functional sequences appear to show considerable variability, not just among different mammals but even in different human populations. The ENCODE authors suggest that this pool of variable functional elements – if confirmed in higher-resolution studies – might serve as a ‘warehouse’ of raw material for natural selection.
Comment: The ENCODE results are an exciting milestone in functional genomics. As pointed out by Greally in an accompanying News and Views article [Greally JM (2007) Nature 447, 782-783], the findings relating to the transcriptional activity of the genome may prove to be particularly important in understanding how gene variants affect disease susceptibility, as many of the highly significant associations emerging from whole-genome association studies implicate single nucleotide polymorphisms (SNPs) not located within any known genes. But this vast piece of work is just the tip of the iceberg: we don’t yet know whether the 1% of the genome studied so far is typical of the other 99%, or what might emerge from studying a wider set of cell types.