PD-L1, other targeted therapies await more standardized IHC

Anne Paxton

February 2016—Immunohistochemistry is heading down a path toward more standardization, and that’s essential as it plays an increasing role in rapidly expanding immunotherapy, says David L. Rimm, MD, PhD, professor of pathology and of medicine (oncology) and director of translational pathology at Yale University School of Medicine. As a co-presenter of a webinar produced by CAP TODAY in collaboration with Horizon Diagnostics, titled “Immunohistochemistry Through the Lens of Companion Diagnostics” (http://j.mp/ihclens_webinar), he analyzes the core challenges of IHC’s adaptation to the needs of precision medicine: binary versus continuous IHC, measuring as opposed to counting or viewing by the pathologist, automation, and assay performance versus protein measurement.

“Immunohistochemistry is 99 percent binary already,” Dr. Rimm points out. “There are only a few assays in our labs—ER, PR, HER2, Ki-67, and maybe a few more—where we really are looking at a continuous curve or a level of expression.”

Fig. 1. The left panel shows a case that was called negative in one lab and positive in another. The right panel is a serial section of that case in which omitting the hematoxylin counterstain reveals definitive positive staining.

Two criteria in the 2010 ASCO/CAP guidelines on ER and PR testing in breast cancer patients are key, he says: 1) the percentage of cells staining and 2) any immunoreactivity. “The first is hard to estimate, but the guidelines recommend the use of greater than or equal to one percent of cells that are immunoreactive. That means they could have a tiny bit of signal or they could have a huge amount of signal and they would be considered immunoreactive, which thereby makes this a binary test.”

A binary test can be a problem for companion diagnostic purposes because whether any immunoreactivity is detected depends on the laboratory’s threshold and counterstain. For example, if two serial sections of the same spot on a tissue microarray were shown side by side, one with and one without the hematoxylin counterstain, “you might see the counterstain make this positive test into a negative by eye, which is a potential problem with IHC when you have a binary stain” (Fig. 1).

Dr. Rimm describes a small study in which three CLIA-certified labs, each using a different FDA-approved antibody, measured about 500 breast cancer cases on a tissue microarray. The study showed fairly significant discordance between labs, from 18 to 30 percent, in which cases were called positive. “In fact, if we look at outcome, 18 percent of the cases were called positive in Lab Two but were negative in Lab Three. Lab Three showed outcomes similar to the double positives whereas Lab Two had false-negatives.” This is an important problem that arises when we try to binarize our immunohistochemistry, he says.

Counting is more variable in a real-world setting due to the variability of the threshold for considering a case positive. “You can easily calculate that if your threshold was five percent, then you’d have 70 percent positive cells. And you would easily call this positive. But if you added more hematoxylin because that’s how your pathologist liked it, then perhaps you’d only have 30 percent positive. So this is the risk of using thresholds.” (Fig. 2).
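The threshold effect Dr. Rimm describes can be sketched in a few lines of Python. The per-cell intensities, the function name, and the two intensity thresholds below are invented for illustration. They are chosen so the result mirrors the 70 percent and 30 percent figures in the quote, and they are not data from Fig. 2.

```python
def percent_positive(cell_intensities, intensity_threshold):
    """Percentage of cells whose signal is at or above the threshold."""
    if not cell_intensities:
        raise ValueError("need at least one cell")
    positive = sum(1 for value in cell_intensities if value >= intensity_threshold)
    return 100.0 * positive / len(cell_intensities)


# Hypothetical signal per cell, in arbitrary units (not real data).
cells = [1, 2, 4, 6, 7, 9, 12, 30, 45, 60]

# A lighter counterstain is modeled here as a lower visibility threshold,
# a heavier counterstain as a higher one. Both values are assumptions.
light_counterstain = percent_positive(cells, intensity_threshold=5)
heavy_counterstain = percent_positive(cells, intensity_threshold=20)
print(light_counterstain, heavy_counterstain)
```

The cells are the same in both calls. Only the threshold for calling a cell positive moves, and the count moves with it.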

Although this is done in all of immunohistochemistry today, Dr. Rimm thinks it is an important consideration as IHC transitions to a more standardized form. “An H-score, intensity times area, which has been attempted many times, can’t be done by human beings. Pathologists try but have failed.”

“We can’t do those intensities by eye. We have to measure them with a machine. But we get a very different piece of information content when we measure intensity, as opposed to measuring the percentage of cells above a threshold. In sum, more information is present in a measurement than in counting.”
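For readers who want the arithmetic, the H-score as conventionally defined weights the percentage of cells at each intensity level (1 for weak, 2 for moderate, 3 for strong) and sums the result, giving a range of 0 to 300. The sketch below uses hypothetical function names and made-up cases to show why a percent-positive count carries less information than a score that includes intensity. It does not address whether a human reader can estimate those intensities reliably, which is Dr. Rimm's point.

```python
def h_score(pct_weak, pct_moderate, pct_strong):
    """Conventional H-score: 1*weak% + 2*moderate% + 3*strong% (0 to 300)."""
    percentages = (pct_weak, pct_moderate, pct_strong)
    if any(p < 0 for p in percentages) or sum(percentages) > 100:
        raise ValueError("percentages must be non-negative and total at most 100")
    return 1 * pct_weak + 2 * pct_moderate + 3 * pct_strong


def percent_stained(pct_weak, pct_moderate, pct_strong):
    """Counting only: a cell with any staining intensity counts as positive."""
    return pct_weak + pct_moderate + pct_strong


# Two made-up cases: 40 percent of cells stain in each,
# weakly in case A and strongly in case B.
case_a = (40, 0, 0)
case_b = (0, 0, 40)

for name, case in (("A", case_a), ("B", case_b)):
    print(name, percent_stained(*case), h_score(*case))
```

Counting treats the two cases as identical, while the intensity-weighted score separates them.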

Fig. 3. A) Comparison of a quantitative fluorescence score on the x axis versus an H-score on the y axis. Note the noncontinuous nature of human estimation of intensity times area (H-score). B) The survival curve in a population of lung cancer cases using the H-score. C) The survival curve in the same population using the quantitative score. (Source: David Rimm, MD, PhD)

Pathologists read slides for a living, so it’s uncomfortable to think about giving that up in order to use a machine to measure the slides. “But I think if we want to serve our clients and our patients, we really owe them the accuracy of the 21st century as opposed to the methods of the 20th century.” (Fig. 3).

Among the currently available quantitative measuring devices are the Visiopharm, VIAS (Ventana), Aperio (Leica), InForm (Perkin-Elmer), and Definiens platforms. “We use the platform invented in my lab, called Aqua [Automated Quantitative Analysis], but this is now owned by Genoptix/Novartis. Genoptix intends to provide commercial tests using Aqua internally,” Dr. Rimm says, “as well as enable platform and commercial testing through partnership with additional reference lab providers.

“There are many quantification platforms,” he adds, “and I believe that any of them, used properly, can be effective in measurement.”

(Of the 265 participants in the CAP PM2 Survey, 2015 B mailing, who reported using an imaging system for quantification, 4.6 percent use VIAS, 4.1 percent use ACIS, 0.8 percent use Applied Imaging, and 10 percent use “other” imaging systems. Of the 1,359 Survey participants who responded to the question about use of an imaging system to analyze hormone receptor slides, 1,094, or 80.5 percent, reported not using any imaging system for quantification.)

Says Dr. Rimm: “The first platform we used to try to quantitate some DAB stain slides was actually the Aperio Nuclear Image Analysis algorithm. But the problem with DAB is that you can’t see through it. And so inherently it’s physically flawed as a method for accurate measurement.” He compares DAB to looking at stacks of pennies from above, where their height and quantity can’t be surmised, as opposed to from the side, where their numbers can be accurately estimated. “This is why I don’t use, in general, DAB-type technologies or any chromogen.”

Fluorescence doesn’t have this problem, and that is the reason Dr. Rimm began using fluorescence as a quantitative method. “We try to be entirely quantitative without any feature extraction. So we define epithelial tumors using a mask of cytokeratin. We define a mask by bleeding and dilating, filling some holes, and then ultimately measure the intensity of each cell, or of each target we’re looking for. In this case, in a molecularly defined compartment.”

Compartments can be defined by any type of molecular interactions. “We defined DAPI-positive pixels as nuclei, and we measure the intensity of the estrogen receptor within the compartment. And that gives us an intensity over an area or the equivalent of a concentration.” Many other fluorescent tools can be used in this same manner, but he cautions against use of fluorescent tools that group and count. “That’s a second approach that can be used, but the result gives you a count instead of a measurement.”
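The idea of "intensity over an area" within a molecularly defined compartment can be illustrated with a toy example. This is a minimal sketch and not the Aqua implementation: the function name, the pixel values, and the DAPI threshold are all assumptions, and the cytokeratin tumor mask, dilation, and hole-filling steps mentioned above are left out.

```python
def compartment_score(compartment_image, target_image, compartment_threshold):
    """Mean target intensity over the pixels that define the compartment.

    Both images are equal-sized 2D lists of pixel intensities. A pixel belongs
    to the compartment when its value in compartment_image is at or above the
    threshold (for example, DAPI-positive pixels standing in for nuclei).
    """
    total_intensity = 0
    area = 0
    for comp_row, target_row in zip(compartment_image, target_image):
        for comp_value, target_value in zip(comp_row, target_row):
            if comp_value >= compartment_threshold:
                total_intensity += target_value
                area += 1
    if area == 0:
        raise ValueError("no pixels fall inside the compartment")
    return total_intensity / area


# Invented 3 x 4 pixel images in arbitrary units (not real data).
dapi = [
    [0, 0, 9, 9],
    [0, 8, 9, 0],
    [0, 0, 0, 0],
]
er = [
    [1, 1, 40, 60],
    [2, 50, 50, 3],
    [1, 1, 1, 1],
]

score = compartment_score(dapi, er, compartment_threshold=5)
print(score)
```

The result is a continuous value, summed signal divided by compartment area, and not a count of cells above a cutoff.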

When comparing a pathologist’s reading versus a quantitative immunofluorescence score, he notes, pathologists actually don’t generate a continuous score. Instead, pathologists tend to use groups. “We tend to use a 100 or a 200 or an even number. We never say, ‘Well, it’s 37 percent positive.’ We say, ‘It’s 40 percent positive,’ because we know we can’t reproducibly tell 37 from 38 from 40 percent positive.”

The result is a noncontinuous score, which doesn’t carry the information content of a quantitative measurement. A comparison of the two methods shows that, in some cases where quantitative measurement shows a significant difference in outcome, a nonquantitative measure such as the H-score may not. (Fig. 3 illustrates this concept.)

“Pathologists tend to group things, and we also tend to overestimate. It’s not that pathologists are bad readers. It’s just the tendency of the human eye because of our ability to distinguish different intensities and the subtle difference between intensities. But even if you compare two quantitative methods, you can see that the method where light absorbance occurs—that is the percent positive nuclei by Aperio, which is a chromogen-based method—tends to saturate. This is, in fact, amplified dramatically when you look at something with a wide dynamic range like HER2.” (Fig. 4).

In one study, researchers found less than one percent discordance—essentially no discordance—between two antibodies (Dekker TJ, et al. Breast Cancer Res. 2012;14[3]:R93). But looking at these results graphed quantitatively, you would see a very different result, Dr. Rimm says. “You can see a whole group of cases down below where there’s very low extracellular domain and very high cytoplasmic domain. In fact, some of these cases have essentially no extracellular domain, but high levels of cytoplasmic domain, and other cases have roughly equal levels of each” (Carvajal-Hausdorf DE, et al. J Natl Cancer Inst. 2015;107[8]:pii:djv136).

Recent studies by Dr. Rimm’s group have shown this to have clinical implications. He looked at patients treated with trastuzumab in the absence of chemotherapy, in an unusual study called the HeCOG (Hellenic Cooperative Oncology Group) trial.

“We found that patients who had high levels of both extracellular and intracellular domain have much more benefit than patients who are missing the extracellular domain and thereby missing the trastuzumab binding site.” Follow-up studies are being done to validate this finding in larger cohorts.

Preanalytical variables, Dr. Rimm emphasizes, can have significant effects on IHC results, and more than 175 of them have been identified. “These are basically all the things we can’t control, which is the ultimate argument for standardization.”

In a surprising study by Flory Nkoy, et al., he says, breast cancer specimens were more likely to be ER negative if the patient’s surgery was on a Friday rather than a Monday. “So how could that be? Well, it was clearly the fact that the tissue was sitting over the weekend. And when it sat over the weekend, the ER positivity rate was going down” (Arch Pathol Lab Med. 2010;134:606–612).

Another study showed that after one hour, four hours, and eight hours of storage at room temperature, you lose significant amounts of staining, Dr. Rimm says. “And perhaps the best nonquantitative study or H-score-based study of this phenomenon was done by Isil Yildiz-Aktas, et al., where a significant decrease in the estrogen receptor score was found after only three hours in delay to fixation” (Mod Pathol. 2012;25:1098–1105).

How long the slide is left to sit after it is cut is another preanalytical variable to be concerned with. “In the clinical lab, that’s not often a problem since we cut them, then stain them right away. But in a research setting, a fresh-cut slide can look very different from a slide that’s two days old, six days old, or 30 days old, where a 2+ spot on a breast cancer patient becomes negative after 30 days sitting on a lab bench. So those are both key variables to be mindful of.”

One solution for those preanalytic variables is trying to prevent delayed time to fixation. “And probably time to fixation is one of the main preanalytic variables, although it’s only one of the many hundreds of variables. The method we use to try to get around this problem is to use core biopsies or allow rapid and complete fixation, and then other things can be done.”

Finally, he warns, don’t cut your tissue until right before you stain it. “If you’re asked to send a tissue out to a collaborator or someone who is going to use it for research purposes later, we recommend coring and re-embedding the core, or sending the whole block. Unstained sections, when not properly stored in a vacuum, will ultimately be damaged by hydration or oxidation, both of which lead to loss of antigenicity.”

The crux of the matter is assay performance versus protein measurement, Dr. Rimm says. “In the last six to nine months, we really are faced with this problem in spades, as PD-L1 has become a very important companion diagnostic.”

There are now four PD-L1 drugs with complementary or companion diagnostic tests (Fig. 5). One of the FDA-approved drugs, nivolumab (Opdivo, Bristol-Myers Squibb), for example, uses a clone called 28-8, which Dako provides in a complementary diagnostic assay with the following suggested scoring system: one percent, five percent, or 10 percent. In contrast, pembrolizumab (Keytruda, Merck), which is also now FDA-approved, requires a companion diagnostic test that uses a different antibody, although on the same Dako Link 48 platform. This diagnostic has a different scoring system of less than one percent, one to 49 percent, and 50 percent and over.
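To make the difference between the two scoring systems concrete, the sketch below bins a percentage of stained cells under each set of cutoffs named above. The cutoff values come from the text. Everything else is an assumption made for illustration: the function and constant names, the treatment of the one, five, and 10 percent values as bin boundaries, and the rule that a value exactly at a cutoff falls in the higher bin. Feeding the same percentage into both schemes is itself a simplification, since the assays use different antibodies and may not yield the same percentage on the same tissue.

```python
from bisect import bisect_right


def bin_score(percent_stained, cutoffs, labels):
    """Return the label of the bin that percent_stained falls into.

    cutoffs must be sorted ascending; a value equal to a cutoff is placed
    in the higher bin (an assumption, not taken from any assay's labeling).
    """
    if not 0 <= percent_stained <= 100:
        raise ValueError("percent_stained must be between 0 and 100")
    if len(labels) != len(cutoffs) + 1:
        raise ValueError("need exactly one more label than cutoffs")
    return labels[bisect_right(cutoffs, percent_stained)]


# Cutoffs as described in the article; label wording is hypothetical.
SCHEME_1_5_10 = ([1, 5, 10], ["<1%", "1% to <5%", "5% to <10%", ">=10%"])
SCHEME_1_50 = ([1, 50], ["<1%", "1% to 49%", ">=50%"])

for pct in (0.5, 7, 30, 60):
    print(pct, bin_score(pct, *SCHEME_1_5_10), bin_score(pct, *SCHEME_1_50))
```

In this toy comparison, cases at 30 percent and 60 percent land in the same top bin under the first scheme but in different bins under the second.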

Two other companies, Roche/Genentech and AstraZeneca, also have drugs in trials that may or may not have companion diagnostic testing, though both have already identified a partner and a unique antibody (neither of those listed above) and companion diagnostic testing scores used in their clinical trials.

“So what’s a pathologist to do?” Dr. Rimm says. “Well, there are a few problems with this. First of all, what we really should be doing is measuring PD-L1. That’s the target and that’s what should ultimately predict response. But instead what we’re stuck with, through the intricacies of the way our field has grown and our legacy, is closed-system assays. While these probably do measure PD-L1, we do not know how these compare to each other.” Two parallel large multi-institutional studies are addressing this issue now, he says.

There are solutions for managing these closed-system assays to be sure the assay is working in your lab and that you can get the right answer, Dr. Rimm says. His laboratory uses a closed-system assay for PD-L1, relying not on the defined system but rather on a test system it has developed in doing a study with different investigators.

CAP TODAY