Trends in Genomics Technology
Ever since the importance of studying individual genes and RNAs was first recognized, there has been a drive to obtain genomic information that is as detailed and complete as possible. Early technologies such as the hybridization-based Southern and Northern blotting methods were tremendous advances, but they allowed only a handful of genomic targets to be studied at a time. They generated analog information about length and abundance for a small number of targets.
These methods gave way to microarrays, another hybridization-based approach that generated analog data. Arrays provided information on many more RNAs or DNAs, and the age of genome-wide studies began. Array-based methods didn't measure RNA lengths and had drawbacks such as limited dynamic range and the need for prior sequence knowledge. Nevertheless, their tremendous throughput advantage meant that many more genes could be examined in parallel, which made up for these shortcomings. Some information content was sacrificed, but a more complete view of the nucleic acid universe resulted. Array technologies improved to the point that over a million SNPs, or every RNA of known sequence, could be assessed on a single array (subject to the limits of dynamic range).
As arrays became the standard for evaluating DNA variants and gene expression, DNA sequencing was advancing at an astounding rate, with costs dropping and throughput increasing. This technological transformation had a dramatic impact on genomics experiments, moving both established assays and previously impractical ones onto sequencing. Arrays were replaced by sequencing on whatever platform researchers had access to.
The digital nature of sequencing eliminated one of the major stumbling blocks of array-based methods: limited dynamic range. With arrays, it was impossible to sample the full range of expression levels or variant counts, because the system was either not sensitive enough at the low end or became saturated at the high end. To a first approximation, these problems disappeared with sufficient sequencing depth, and genes expressed at both low and high levels could be assessed in the same experiment. Sequencing is not entirely free of dynamic-range limits, however: error rates, library diversity and other factors can constrain the true range. Furthermore, even as costs fell, sequencing depth remained limited by how much a researcher could spend.
Cheap sequencing not only improved data quality but also opened up applications that had not previously been feasible. Genome-wide protein-binding maps, DNA modification mapping, chromatin accessibility, ancient DNA sequencing and other applications had been impractical with older methods; cheap, digital data changed that. As more applications were attempted with sequencing-based methods, the limitations of the technology became more apparent. Depending on the information needed, each application had its own problems, including amplification bias, short read lengths, error rates and throughput limitations. For example, the length information provided by Northern blotting was useful for understanding splicing and transcriptional start and stop sites, but these features were challenging to decipher with short-read technologies. Likewise, the high accuracy of allele calling with Sanger sequencing became murky in diploid or polyploid genomes. The desire to carry out even better sequencing experiments remained.
Advances in sequencing technology have slowed from a dizzying pace to a merely frantic one, allowing a greater focus on which instrument specifications matter most for each application. Researchers lobby for instruments and run conditions that provide the type of data ideal for their purposes, and it is now often possible to choose among a variety of instruments in a core facility or from contract sequencing providers that offer a range of services. Long reads are critical for genome assembly but of little importance for ChIP-Seq, where read count matters far more. Both features are important for RNA-Seq: length provides splice-junction information, while read count is critical for determining expression differences. However, the quantitative measurements of RNA-Seq and ChIP-Seq can be distorted by amplification bias. Single-molecule methods can eliminate this bias, but most are currently throughput-limited. Amplification bias affects absolute measurements, but if the bias is consistent across samples, it matters less for relative measurements. For many clinical applications, standard run times of many days are simply not viable, so smaller, faster instruments are more popular in the clinical realm. Thus, individual needs determine which instrument is most desirable.
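The point about consistent bias can be made concrete with a toy calculation. The sketch below is a hypothetical model, not a description of any real library-preparation chemistry: it assumes amplification simply multiplies a target's true abundance by an efficiency factor, and the helper names `observed` and `fold_change` and all numbers are invented for illustration. When the same factor applies to both samples, the absolute signal is off by that factor but the fold change is unchanged; when the factor differs between samples, the fold change is distorted.

```python
def observed(true_abundance: float, efficiency: float) -> float:
    """Observed signal under a toy model (hypothetical): amplification
    multiplies the true abundance by a target-specific efficiency factor."""
    return true_abundance * efficiency


def fold_change(control: float, treated: float) -> float:
    """Relative measurement: treated signal divided by control signal."""
    return treated / control


# Hypothetical true abundances of one target in two samples (a true 2-fold change).
true_control, true_treated = 100.0, 200.0

# Consistent bias: the same 3x amplification efficiency in both samples.
obs_control = observed(true_control, 3.0)
obs_treated = observed(true_treated, 3.0)
print(obs_control / true_control)             # 3.0: absolute measurement inflated 3x
print(fold_change(obs_control, obs_treated))  # 2.0: relative measurement unaffected

# Inconsistent bias: 3x in the control sample but only 2x in the treated sample.
skewed = fold_change(observed(true_control, 3.0), observed(true_treated, 2.0))
print(f"{skewed:.2f}")                        # 1.33: fold change distorted
```

Real biases are rarely this tidy (they can depend on GC content, fragment length and library complexity), so the model only illustrates why consistency, rather than absence, of bias is what relative comparisons rely on.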
While many applications are now well (though not perfectly) served by existing instruments, some methods are still difficult to carry out. Assembly of small, non-repetitive genomes is straightforward with long-read technologies, but larger, highly repetitive or polyploid genomes remain very problematic. These assembly issues highlight the need for instruments that can produce much longer reads than are currently generated. Analysis of heterogeneous samples such as metagenomes or tumors is limited by short reads because they cannot distinguish among closely related genomes; only much longer reads that span many distinguishing differences allow such distinctions. Heterogeneous samples also raise throughput and cost issues, because much higher coverage is needed to assess all the genomes in such mixtures.
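Why heterogeneous samples demand higher coverage can be sketched with a simple binomial calculation. This is an idealized model under stated assumptions: reads are sampled independently, sequencing errors are ignored, and a minor component (for example, a tumor subclone or a low-abundance strain) counts as detected once at least k reads come from it. The threshold of 3 reads, the 95% target and the function names are hypothetical choices. The sketch addresses only depth, not the read-length problem.

```python
from math import comb


def prob_detect(depth: int, fraction: float, k: int) -> float:
    """P(at least k of `depth` reads come from a component at `fraction`).

    Idealized binomial model (an assumption): independent reads, no errors.
    """
    # Sum P(X = i) for i < k, then take the complement.
    below_k = sum(
        comb(depth, i) * fraction**i * (1.0 - fraction) ** (depth - i)
        for i in range(k)
    )
    return 1.0 - below_k


def min_depth(fraction: float, k: int = 3, target: float = 0.95,
              max_depth: int = 100_000) -> int:
    """Smallest depth at which the detection probability reaches `target`."""
    for depth in range(k, max_depth + 1):
        if prob_detect(depth, fraction, k) >= target:
            return depth
    raise ValueError("target not reached within max_depth")


# Hypothetical component fractions, from an even mixture to a 1% minor component.
for fraction in (0.5, 0.2, 0.05, 0.01):
    print(f"fraction={fraction}: depth needed ~{min_depth(fraction)}x")
```

In this model the required depth grows roughly in proportion to 1/fraction, because the expected number of reads from a component is depth multiplied by its fraction; every additional, rarer genome in a mixture therefore pushes the total sequencing requirement up.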
Amplification-based and sequencing-by-synthesis methods are unlikely to achieve reads that are both cheap enough and long enough to fully analyze complex tumor genomes and microbiome samples. For such samples, molecular information on length scales greater than 100 kb is invaluable. Optical mapping methods can generate such information, but optical instruments are limited in resolution by the wavelength of light and in cost by the expense of lasers and detectors. These issues can be circumvented by solid-state, electronic nanodetectors now approaching commercialization. Novel developments like nanodetectors will continue to push the frontiers of genomic applications.
Although DNA sequencing has made great strides, no single system encompasses all desired attributes for all applications. While 100% accuracy is always welcome, it is not, in fact, necessary in many situations. Some samples require long reads, while others necessarily begin with short molecules, such as DNA from FFPE (formalin-fixed, paraffin-embedded) tissue, ancient DNA, miRNAs or circulating DNA, and thus can't be analyzed directly with long reads. Developing instruments that meet these varied requirements for read length, accuracy, cost and other specifications is an ongoing process that will continue to evolve. The improvements in sequencing cost and throughput over the last decade have, so far, only whetted researchers' appetites for even more sequencing power. They will continue to identify new applications and strive for better instruments to carry out experiments that have not yet been envisioned. New instrument capabilities will continue to drive these aspirations.