Theoretical analysis stresses importance of sample size in genome-wide association studies

23 August 2007   |   By Dr Alison Stewart   |   News story

Recent months have seen the publication of several research papers reporting striking success in the use of genome-wide association studies to find genetic variants (alleles) associated with common, complex diseases. These studies typically scan many thousands of single-nucleotide polymorphisms (SNPs) distributed across the genome to look for differences in allele frequency between cases (i.e. individuals with the disease) and controls. The International HapMap Project has facilitated some of this work by identifying ‘tagging’ SNPs that are representative of larger sets of SNPs (SNP haplotypes) in specific populations.

One problem with genome-wide association studies is that testing more SNPs in order to improve the chances of finding associations also increases the chances of false-positive results. This is sometimes known as the ‘multiple testing’ problem. A theoretical analysis published in the advance on-line publication section of the journal Human Molecular Genetics reports that, unless sample sizes (that is, the numbers of cases and controls) are sufficiently large, any advantage from increasing SNP coverage is wiped out by the increased risk of spurious associations [Nannya Y et al. Hum Mol Genet. 2007 Jul 31; (Epub ahead of print)]. The authors simulated large numbers of case-control panels based on empirical data from the HapMap project, and found that studies using around 1000 cases and 1000 controls are adequate for detecting variants with relative risks of at least 1.7. However, many gene-disease associations in common disease are expected to have relative risks considerably lower than this. Studies that aim to detect these weaker alleles while avoiding false positives should, according to Nannya et al, concentrate on achieving large sample sizes rather than denser SNP coverage.
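The arithmetic behind the ‘multiple testing’ problem can be illustrated with a short Python sketch. It is not the method used by Nannya et al; it assumes, for simplicity, that every SNP test is independent (real SNPs are correlated with their neighbours), and it uses the standard Bonferroni correction as one common way of tightening the significance threshold. The function names and the SNP counts in the loop are illustrative choices only.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive across n_tests independent tests,
    each carried out at significance level alpha (independence is assumed)."""
    return 1.0 - (1.0 - alpha) ** n_tests


def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of false positives if no SNP is truly associated."""
    return n_tests * alpha


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance level that keeps the overall error rate near alpha."""
    return alpha / n_tests


# Illustrative SNP counts, not figures taken from the study.
for n_snps in (1, 10_000, 100_000, 500_000):
    print(
        f"{n_snps:>7} SNPs: "
        f"P(at least one false positive) = {familywise_error_rate(n_snps):.3f}, "
        f"expected false positives = {expected_false_positives(n_snps):.0f}, "
        f"Bonferroni per-SNP threshold = {bonferroni_threshold(n_snps):.1e}"
    )
```

As the number of SNPs rises, the per-SNP threshold needed to control false positives becomes very strict, so a weak true association will only pass it if the sample is large enough. This is consistent with the article's point that sample size, rather than SNP density, tends to be the limiting factor for alleles with small relative risks.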