29 October 2018
As genomic medicine becomes further established in the NHS, a new generation of sequencers capable of reading exceptionally long stretches of DNA is causing a stir in genomics research and inching its way towards the clinic. In two complementary policy briefings, we provide an introduction to long-read sequencing, discuss the advantages it presents, and examine what impact it might have on clinical diagnostics.
Genomic sequencing is an increasingly important tool for the diagnosis of patients with rare diseases, and for informing the care of patients with cancer. Whole genome sequencing (WGS) has been undertaken as part of the 100,000 Genomes Project – with 87,231 genomes* sequenced so far – and will be available as a routine diagnostic for certain conditions in the NHS in England this year. The technologies used to sequence genomes within this, and other international sequencing projects, are primarily high-throughput short-read sequencing (SRS) systems, which are capable of reading millions of short strands of DNA in parallel. Indeed, SRS is the dominant form of DNA and RNA sequencing worldwide.
Genomic analysis was revolutionised by the introduction of high-throughput SRS systems in 2006. These have enabled production of the ‘$1000 genome’ and facilitated significant advances in our understanding of genomics as a whole. However, these technologies have important limitations, including difficulty in reading particular, but not uncommon, features of the genome and in detecting large mutations, alterations or rearrangements. This means there are gaps in our knowledge about some parts of our genome sequence. Deciphering these features is important for building a more complete and accurate picture of the ‘whole’ genome, and some of them could be clinically relevant.
Genomes are too long to be read in one piece, so shorter sections are sequenced in parallel, producing ‘reads’. Reconstructing a genome can be compared to completing a jigsaw puzzle, where reads are the jigsaw pieces. The larger the pieces, the easier many sections are to fill in.
Genomes are made up of a sequence of paired bases; thus, genomes and sections of DNA can be measured in base pairs (bp). Long-read, single-molecule sequencers such as those produced by Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (Nanopore) – arguably the two most talked-about long-read sequencing (LRS) technologies – are capable of producing reads hundreds of thousands of bp in length, with an average read length of around 10-100kbp. Long reads can span complex genomic features, allowing for greater resolution and making correct placing of these features into a reconstructed genome more feasible.
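The jigsaw analogy can be made concrete with a toy sketch. The example below is purely illustrative and not taken from any sequencing pipeline: the 40-base "genome", the repeated segment, and the function names are all made up for this purpose, and real reads contain errors that this sketch ignores. It counts what fraction of error-free reads of a given length occur exactly once in the genome, and so could be placed unambiguously.

```python
# Toy illustration (hypothetical data): why longer reads are easier to place.
# The flanking sequences are deliberately simple so the result is easy to
# check by hand; this is not a model of a real genome or a real sequencer.

REPEAT = "GATTACAGCA"  # a 10-base segment that appears twice
GENOME = "CCCC" + REPEAT + "TTTTTTTT" + REPEAT + "GGGGGGGG"


def count_occurrences(genome: str, read: str) -> int:
    """Count occurrences of `read` in `genome`, including overlapping ones."""
    count = 0
    start = genome.find(read)
    while start != -1:
        count += 1
        start = genome.find(read, start + 1)
    return count


def fraction_uniquely_placed(genome: str, read_length: int) -> float:
    """Fraction of all error-free reads of this length that occur exactly once."""
    reads = [
        genome[i:i + read_length]
        for i in range(len(genome) - read_length + 1)
    ]
    unique = sum(1 for read in reads if count_occurrences(genome, read) == 1)
    return unique / len(reads)


if __name__ == "__main__":
    for length in (4, 12):
        fraction = fraction_uniquely_placed(GENOME, length)
        print(f"read length {length:2d}: {fraction:.2f} of reads placed uniquely")
```

In this toy genome, 4-base reads that fall inside the repeated segment match two locations, so some reads cannot be placed with confidence. Reads of 12 bases are longer than the repeat, always include some of the distinguishing flanking sequence, and every one of them matches a single location.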
These advantages could, in theory, lead to simpler and more comprehensive sequencing. Long reads could be particularly useful in clinical diagnostics for, amongst others:
These systems also have the potential to consolidate laboratory processes by providing a range of information that often requires several pieces of equipment to obtain. In 2017, US doctors used PacBio LRS to diagnose a young patient with a rare genetic condition where it had not been possible to do so using other techniques.
Alongside the inherent advantages of longer reads, the differing foundational technologies of long-read sequencers offer different opportunities. The Nanopore MinION is highly portable, being only the size of a mobile phone and weighing less than 100g, which facilitates its use in areas with poor infrastructure and in mobile laboratories. PacBio is able to generate very high-accuracy consensus data, which can exceed the accuracy of conventional SRS and offers new insights into previously poorly understood areas of the genome. These technologies also offer short or flexible run times, which are useful for rapid sequencing of the pathogens behind infectious diseases such as tuberculosis.
Although LRS offers new opportunities and advantages, there are several areas in which it does not currently provide advantages over SRS, or falls short of other technologies. Some LRS systems have a higher sequencing error rate than SRS, whilst others have lower throughput. They are also fairly expensive on a per-run basis: some 2017 estimates for PacBio LRS are $5000-6000 per genome, versus $1000 for SRS. In addition, bioinformatics pipelines for analysis are not yet as developed as those for SRS. In many situations, a combination of LRS and SRS technologies is used (hybrid sequencing) to achieve a high-quality, high-coverage reconstructed genome; this comes at a greater cost.
Ultimately, the decision to use a technology depends upon the question being asked; long-read sequencing may not be the optimal choice in all situations. Selecting the optimal technology will depend very much on context: budget, desired outcomes, and logistical considerations. Indeed, how the technology is used is almost as important as the system itself; factors such as throughput, cost, and accuracy are all intertwined with original sample type and quality, pre-sequencing sample preparation, and bioinformatic techniques when sequencing a genome. The opportunities which LRS does present are currently evident for cancer, rare disease and infectious disease. It will be important to keep a close watch on LRS technologies as they continue to evolve rapidly.
* Correct as of 2 October 2018