2 October 2015
An international collaboration to study human genetic variation across the world has completed its final phase with the publication of two papers in Nature.
The 1000 Genomes Project provides the most comprehensive view of global human variation so far. Over the last eight years the project has analysed the genomes of 2504 individuals from 26 populations across five continents and 18 countries, exceeding the project's initial goal of 1000 genomes.
The initial aim was to find most genetic variants with frequencies of at least 1% in the populations studied, in order to better understand the history and evolution of genetic variation in the human population.
“We now have a public repository that describes the range and diversity of genetic variation around the world…we now know which genes rarely change and which are altered in different populations,” said Dr Gonçalo Abecasis, co-principal investigator for the main Nature study.
In the main Nature study, investigators identified 88 million variant sites in the human genome. Of these, only 12 million had common variants that were likely shared by many of the populations. African populations had the most variants, consistent with evidence that humans originated in Africa.
The second paper looked at the structure of the genome and found nearly 69,000 structural variants, with some occurring in genomic regions that have previously been associated with complex traits and disease. The new data is expected to be a starting point for future mechanistic studies.
Over the course of the project, the team have developed new and improved methods for large-scale DNA sequencing and for the analysis and interpretation of genomic information. These methods have served as a blueprint for larger genome studies in Alzheimer’s, autism and diabetes.
People who donated DNA to the project consented to the full release of their genetic data, with the understanding that no associated health data would be collected. This means that the current data sets and analyses are freely accessible to researchers across the world.
An immediate use of the dataset is in genome-wide association studies (GWAS), which compare the genomes of people with and without a disease in order to search for regions of the genome that contain variants associated with that disease. A GWAS can find several genomic regions associated with a disease, and many variants in each of these regions. Rather than sequencing the genomes of all the people in a study, researchers can use the 1000 Genomes Project data to find most of the variants in the regions that the GWAS has associated with the disease, saving considerable resources.
Looking ahead, Dr Adam Auton, senior author of the main Nature study, said: “Everyone now wants to know what these variants tell us about human disease”.