Groups closing the gap in reference materials for sequencing assays

William Check, PhD

March 2015—It’s a truism in the clinical laboratory that your results are only as good as the reference standards available to QC your assay. For small analytes like glucose, that’s not a problem.

However, in clinical laboratories the analyte in question increasingly is DNA. In the past five years, next-generation sequencing has been adopted to detect variants in small targeted regions of specific genes, which is useful in oncology and medical genetics. More ambitious applications of NGS—whole genome and whole exome sequencing—have recently begun to enter the clinical realm as well.

At 3 billion base pairs in length, with any of four bases at every position, human DNA presents an unprecedented challenge to analytic methods and to the development of reference standards. And because human DNA is diploid, every base position can be homozygous or heterozygous. Moreover, medical consequences can arise not just from a change in the base occupying a single position (single nucleotide variants) but also from the insertion of an extra stretch of DNA where it doesn’t belong or the deletion of a small or large segment of DNA. Small insertions and deletions are collectively called indels; larger rearrangements are classed as structural variants.

Dr. Church

Given this complexity, it’s impressive that NGS is as good as it is. But it still falls short for more demanding applications. “Right now whole genome approaches do not really give you whole genome analysis, because the whole genome is not accessible,” says Deanna Church, PhD, director of genomics and content at Personalis.

Or, as Arend Sidow, PhD, likes to say, “We kind of pretend to be sequencing whole genomes right now, but we’re really not.”

“Long-read technologies have the potential to get us much closer to that goal,” says Dr. Sidow, who is associate professor of pathology and of genetics at Stanford University. For long-read research, Dr. Sidow’s group works with the Oxford Nanopore platform.

“Right now we are really not very good at deciphering structural variants of intermediate size,” Dr. Sidow tells CAP TODAY. “We are good at detecting single nucleotide variants in those parts of the genome in which next-generation sequencing works—approximately 75 percent of the genome—and very large variants, hundreds of kilobases. But there is much genetic variation that is between those sizes, and current sequencing technology is not very good at supporting the discovery of those variants. That’s where long-read approaches come in.”

Dr. Sidow

In 2003, when the “complete sequence” of the human genome was announced, this deficiency was already recognized, although it was downplayed. The answer to a frequently asked question posted on the site of the National Human Genome Research Institute said: “Within the limits of today’s technology, the human genome is as complete as it can be. Small gaps that are unrecoverable in any current sequencing method remain….” (www.genome.gov/11006943).

Since then, appreciation for those “small gaps” has grown. As one research group wrote recently, “The human genome is arguably the most complete mammalian reference assembly, yet more than 160 euchromatic gaps remain and aspects of its structural variation remain poorly understood ten years after its completion” (Chaisson MJ, et al. Nature. 2015;517:608–611).

Added to this is the great heterogeneity among “normal” genomes. Justin Zook, PhD, of the National Institute of Standards and Technology, tells CAP TODAY: “It’s really amazing how much normal variation there is between individuals. We are just starting to understand it. Many of these variations fall into parts of the genome that are difficult to sequence. It can be particularly challenging to detect structural variants.”

All of these complexities in the structure of DNA make it difficult to establish a complete human reference genome. And without a complete reference genome, a laboratory can’t know whether it has identified all the variations in a patient’s DNA. In heritable conditions with no known cause, that complicates the task of pinpointing which variant is responsible for the disease.

Two different kinds of reference are involved: the “reference genome,” a model assembly of the human genome that includes a sampling of variation across the genomes of a number of individuals, and “reference material genomes,” which are single genomes well characterized for variants against the reference assembly. “These reference material genomes are being sufficiently characterized that they will be among the ‘best’ human genomes, and are intended to act as gold-standard benchmark samples,” says Marc Salit, PhD, of NIST.

Dr. Salit is leading a project called Genome in a Bottle, or GIAB, the aim of which is to generate highly accurate reference material genomes as a public-private-academic partnership. As part of this initiative, Dr. Salit, Dr. Zook, and colleagues determined just how much of the genome can be characterized accurately with current methods. NGS in its present incarnations, they estimate, can make high-confidence calls for 78 percent of the genome (Zook JM, et al. Nat Biotechnol. 2014;32:246–251). “[T]here is a need for a highly accurate set of genotypes across a genome that can be used as a benchmark” for human genome sequencing, the authors wrote. NIST will make samples of genomic DNA of these genomes available as reference materials from the NIST Standard Reference Materials Program.

Dr. Ashley

Euan Ashley, MRCP, DPhil, is a collaborator in the GIAB project. “This is a really important area,” he says. “My group is focused on genomics for clinical medicine.” Dr. Ashley is associate professor of medicine and genetics and, by courtesy, pathology at Stanford University; director of the Stanford Center for Inherited Cardiovascular Disease; and co-director of the Stanford Clinical Genomics Service. He finds “great excitement,” he says, in the use of this new technology. “We want to move this technology toward clinical grade. It is transformative for medicine. Yet we need to concentrate more on algorithms to bring the standard of sequencing up to the standard we expect for clinical medicine,” he says.

In addition to GIAB, several other projects aim to generate highly accurate human genome sequences. One group that has posted results already is the Genetic Testing Reference Materials Coordination Program, or GeT-RM (wwwn.cdc.gov/clia/Resources/GeTRM/default.aspx). Its coordinator is Lisa Kalman, PhD, health scientist in the Laboratory Research and Evaluation Branch of the Centers for Disease Control and Prevention.

“There are now tests for about 5,000 genetic conditions. But there are very, very few reference materials available for these tests, maybe 50 different materials in 50 different genes,” Dr. Kalman says. “That’s a huge gap. We need reference materials to develop new genetic tests, to validate genetic tests, for QC and also for proficiency testing or alternate assessment activities.”

Fifteen years ago the CDC recognized that gap and started projects to address it, including GeT-RM. “Our program’s purpose,” Dr. Kalman explains, “is to make publicly available highly characterized genomic DNA samples that laboratories can use. Having reference materials will enable labs to validate tests that look at variants in all parts of the genome and to assess the accuracy of the tests.”

Addressing the growth in our understanding of the complexity of the human genome in the past decade, Robert Sebra, PhD, says, “The information content of the genome just gets bigger and bigger as sequencing technology expands. We conduct a variety of R&D using PacBio [Pacific Biosciences] single-molecule sequencing and other long-read technologies toward better detection of structural variants involved in inherited disease.” He is assistant professor of genetics and genomic sciences at the Icahn School of Medicine at Mount Sinai in New York.

“What’s really important,” Dr. Sebra says, “is that no one technology is going to give you the answer to everything. In an ideal scenario, a single technology would offer long reads and accuracy and high throughput. That is not currently the situation. Right now we have to take every case and match it to the current technology that best addresses the types of variants necessary on a given genetic panel. We can do much with available platforms, but for many structural variants and for addressing the unresolved regions of the genome, we will need even longer reads.”

To initiate the GeT-RM effort, Dr. Kalman gathered clinical labs, next-generation sequencing companies, representatives of the National Institutes of Health, and other stakeholders on a conference call. “I asked, ‘How can we address this gap [in reference materials]? Labs are starting to do whole exome and whole genome sequencing but there are no materials to assess their assays.’”

CAP TODAY