18 August 2009
In their study, Ng et al. have carried out targeted sequencing of all of the protein-coding regions of eight HapMap individuals, as well as four unrelated individuals with a rare autosomal dominant disorder – Freeman-Sheldon syndrome (FSS) – to demonstrate an approach for the discovery of rare highly penetrant variants [Ng et al. (2009) Nature doi:10.1038/nature08250]. They enriched the coding sequences from the genomes by targeted capture using microarrays; the captured exomes were then sequenced using high-throughput sequencing. The quality of the exomic data was assessed in a number of ways in order to validate the sensitivity and specificity of the technique in identifying variants.
The candidate gene for FSS was identified through a series of filtering steps designed to eliminate background non-causal variants. First, the authors counted the genes that carried one or more non-synonymous coding SNPs (i.e. those with potentially the highest impact on phenotype), splice site disruptions or coding indels in one or several FSS exomes. Filters were then applied to remove common variants present in the dbSNP catalogue (a public database of SNPs) or in the eight HapMap exomes. This narrowed the possible disease-causing candidates to a single gene, MYH3, which had previously been identified using a candidate gene approach. A disruption of this gene was observed in all four individuals with FSS but not in dbSNP or the HapMap exomes.
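The filtering logic described above can be sketched in a few lines of Python. This is an illustration only, not the authors' pipeline: the function name, the data layout and all variant and gene identifiers other than MYH3 are hypothetical, and the toy data are invented so that the filter has something to remove.

```python
def candidate_genes(case_exomes, known_variants):
    """Return the genes that carry at least one novel variant in every case.

    case_exomes: list of dicts, one per affected individual, mapping a gene
        name to the set of its non-synonymous, splice-site or indel variant IDs.
    known_variants: set of variant IDs already seen in a public catalogue
        or in control exomes (standing in for dbSNP and the HapMap exomes).
    """
    per_case = []
    for exome in case_exomes:
        # Keep a gene only if something remains after removing known variants.
        genes = {gene for gene, variants in exome.items()
                 if variants - known_variants}
        per_case.append(genes)
    # A candidate must survive the filter in all affected individuals.
    return set.intersection(*per_case) if per_case else set()


# Toy data (invented): four cases, two known common variants.
known = {"rs_a", "rs_b"}
cases = [
    {"MYH3": {"v1"}, "GENE_A": {"rs_a"}, "GENE_B": {"v9"}},
    {"MYH3": {"v1"}, "GENE_A": {"rs_a"}, "GENE_C": {"v8"}},
    {"MYH3": {"v2"}, "GENE_B": {"rs_b"}},
    {"MYH3": {"v3"}, "GENE_A": {"rs_a", "v7"}},
]

print(candidate_genes(cases, known))
```

Note that the cases need not share the same variant: the intersection is taken over genes, so different novel mutations in the same gene still point to one candidate.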
The authors suggest that “direct sequencing of exomes of small numbers of unrelated individuals with a shared monogenic disorder can serve as a genome-wide scan for the causative gene”. They further suggest that this strategy may be easier to apply to recessive diseases, as far fewer genes carry homozygous or compound heterozygous variants. The strategy may also be applied to complex common diseases, but this will require larger sample sizes and a better approach to assessing the impact of each mutation in order to cope with the greater extent of genetic heterogeneity. The authors point out that, although this approach is useful for discovering causal variants, one limitation is that it does not identify structural or non-coding variants, which may be found by whole genome sequencing.
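To see why a recessive model narrows the search, consider a minimal Python sketch of such a filter for a single exome. It is not taken from the paper: the function name, the "hom"/"het" labels and every identifier in the toy data are hypothetical, and it assumes that two distinct heterozygous variants in one gene lie on different chromosome copies, which real data could only confirm by phasing.

```python
def recessive_genes(exome, known_variants):
    """Return genes consistent with a recessive model in one individual.

    exome: dict mapping a gene name to a list of (variant_id, zygosity)
        pairs, where zygosity is "hom" or "het".
    known_variants: set of variant IDs to discard as common.
    """
    hits = set()
    for gene, calls in exome.items():
        novel = [(v, z) for v, z in calls if v not in known_variants]
        has_hom = any(z == "hom" for _, z in novel)
        het_ids = {v for v, z in novel if z == "het"}
        # Homozygous novel variant, or two distinct heterozygous ones
        # (assumed here to form a compound heterozygote).
        if has_hom or len(het_ids) >= 2:
            hits.add(gene)
    return hits


# Toy data (invented): three genes carry novel variants, but only two
# satisfy the recessive requirement.
toy_known = {"rs_c"}
toy_exome = {
    "GENE_X": [("n1", "het")],
    "GENE_Y": [("n2", "hom")],
    "GENE_Z": [("n3", "het"), ("n4", "het")],
    "GENE_W": [("rs_c", "hom")],
}

print(recessive_genes(toy_exome, toy_known))
```

In this toy exome a dominant model would retain GENE_X, GENE_Y and GENE_Z, whereas the recessive requirement drops GENE_X, leaving fewer candidates per individual.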