After nearly two decades, I recently retired from my positions as Chair of Pathology (University of Massachusetts – Baystate) and Professor of Pathology (Tufts University School of Medicine, University of Massachusetts School of Medicine). I also recently concluded my roles as President (2015-17) and Governor (2007-15) of the College of American Pathologists. As an academic pathologist heavily involved in both anatomic and clinical pathology, I can appreciate more than most the critical need for innovative non-invasive diagnostics to improve long-term allograft survival.
Over the past few decades, the improvements in the care and management of transplant patients have been dramatic, yet our inability to easily and reliably diagnose allograft rejection in its earliest and most reversible stages is holding back further improvement. With strong analytical and clinical validity data that match or improve upon predicate tests, Prospera is an important step in addressing this shortfall. The use of donor-derived cell-free DNA (dd-cfDNA) holds much promise for improvements in sensitivity, specificity, test availability, and sample accessibility, as well as patient satisfaction and comfort. The addition of this test to a clinician’s arsenal should lead to both improved efficiency and accuracy of the traditional biopsy-based results.
In my review of the clinical validation and analytical validation discussed in the proposed Prospera LCD (DL38041), I see the full value of the dd-cfDNA test as it should lead to better and more accurate biopsy-based results. Specifically, Prospera is:
- A more accurate measure than the standard-of-care
The current tools to diagnose kidney transplant rejection are either deeply invasive (tissue biopsy) or insensitive (serum creatinine). Biopsies are also subject to sampling error. We have a strong unmet need for better and earlier diagnostic tools to improve patient management and outcomes. Sigdel, et al., clearly showed more accurate performance of Prospera compared to serum creatinine through higher sensitivity (89% vs 52%), higher specificity (73% vs 70%) and higher AUC (0.87 vs 0.71).
- Validated in a large cohort with breadth and diverse sample types
The Sigdel study at the University of California, San Francisco (UCSF) investigated 217 biopsy-matched plasma samples from 178 unique kidney transplant patients with 38 cases of confirmed active rejection (AR). The cohort is twice the size of those in published dd-cfDNA studies in renal transplantation, indicating a more expansive study of the test. The broad ethnic diversity captured in the cohort is critical as transplant assessment varies by ethnicity. Including demographics such as Hispanic/Latinos, Asians, and African Americans in the validation studies provides greater assurance in the dependability of this test prior to biopsy.
I believe this LCD is a major step forward for Medicare as it helps advance personalized medicine by covering tests, like Prospera, to provide physicians with an improvement to the standard-of-care. It will provide us with comprehensive, accurate, earlier information before proceeding to a biopsy.
Thank you for the comment. It is reassuring to see that experienced clinicians are in agreement with our coverage decision. We attempt to ensure that Medicare beneficiaries have access to reasonable and necessary innovative services.
On behalf of the approximately 4,000 members of the American Society of Transplantation (AST), I am writing to express support for the continued development of donor-derived cell-free DNA testing used to detect active cellular rejection in kidney transplant recipients, as described in LCD DL38041. The AST appreciates the acknowledgement of the need for innovative means to monitor kidney allograft rejection, for the following reasons:
Lack of sensitive, specific and safer diagnostic tools
Kidney transplantation is now the renal replacement therapy of choice for patients with end-stage renal disease. However, kidney allograft management and survival continue to be a challenge due to the inability to safely and accurately monitor risk of rejection. The available diagnostics for the detection of kidney allograft rejection lack the needed performance, creating demand for better surveillance methodologies. The current tools for diagnosing organ transplant rejection are either invasive (biopsies) or insensitive (serum creatinine), creating a significant unmet need for better diagnostic tools to improve patient management and outcomes.
Need to improve longer term graft survival
With roughly 20,000 new renal allograft patients and nearly 200,000 living renal transplant recipients in the U.S. just in 2015, there is a significant gap in the effective management of this disease as 20-30% of kidney transplants fail within five years and approximately 50% fail within 10 years. Evaluating for all rejection types, including sub-clinical and for-cause rejection, is a key factor to improve long-term graft survival. Cell-free DNA assays, which can detect subclinical rejection with high sensitivity, as seen in Sigdel et al, enable earlier detection and better management of these patients. A more sensitive method to monitor rejection would allow nephrologists to administer new medications, dosing regimens, and other interventions prior to irreversible rejection-mediated renal injury.
More focus on patient-specific care through physician choice
Precision medicine approaches are needed in the field of transplantation. Physicians are best able to deliver high quality personalized care with a choice of products in the market. By leaving test use decisions to the physician, testing companies must continue to assess the needs of physicians and patients to consistently improve and deliver to stay relevant. New entrants and competition often spur innovation to increase value in the healthcare system and provide access to transplant patients who need optimal care. All in all, this may create a higher quality, lower cost dynamic in transplantation care.
AST agrees that there is well-validated evidence for donor-derived cell-free DNA as a potentially useful biomarker in the care of transplant patients.
As such, cell free donor derived assays will be welcomed by the transplant community as another new tool that addresses an important clinical issue for kidney transplant patients.
We thank you for the opportunity to provide comment and your collaboration in making transplant as available, safe and effective as possible for as many people as possible. We look forward to working with you in this area in the future. Please let us know if you have any questions or would like to discuss any transplant-related issues further.
Thank you for the comment. We are reassured to know that a major society supports the use of this kind of testing for patients with kidney transplants.
CareDx appreciates this opportunity to comment on the above-referenced MolDX proposed local coverage determination (LCD), which concerns the Prospera test marketed by Natera, Inc.
As you know, molecular diagnostic tests (MDTs) build on the work of the Human Genome Project and are important in providing 21st century personalized medical care to patients. The Molecular Diagnostic Services Program (MolDX) developed by Palmetto GBA is critical in ensuring that Medicare beneficiaries benefit from this new technology. As Palmetto has noted, MolDX success hinges on three components: Test registration and ID assignment, application review, and coverage determination and reimbursement.
Each final LCD issued by the MolDX Program providing coverage not only ensures the access of Medicare beneficiaries to the molecular diagnostic test involved, but also contributes, along with other federal efforts, to developing a regulatory paradigm to govern the development, marketing, and application of such tests by companies, academic institutions, and other entities. This regulatory paradigm has important effects on the decisions of scientists and developers to invest time and capital in the creation of new tests. It is therefore extremely important that the regulatory paradigm identify and consistently apply scientific standards.
CareDx has had very positive interactions with MolDX over the years and values the rigorous process through which MolDX approaches coverage and reimbursement determinations for high complexity molecular diagnostics. In October 2017, this review process led to the coverage and reimbursement of CareDx’s AlloSure, which is a next-generation sequencing assay for the quantification of donor-derived cell-free DNA (dd-cfDNA) in kidney transplant recipients to assess the probability of transplant rejection. AlloSure results may enable physicians to decide that their patients can avoid expensive and potentially harmful needle biopsies of the kidney. The LCD for AlloSure and our collaboration on further data development have led to significant use of our test in the kidney transplant community, excitement over the possibilities for dd-cfDNA to improve patient care, and ideas from clinicians for further study and application of dd-cfDNA in other indications.
The experience of CareDx, concerning the development and marketing of a dd-cfDNA diagnostic test in transplantation, places us in a unique position to comment on the proposed LCD for Natera’s test, Prospera. Below we discuss our concerns regarding Natera’s scientific conclusions as described in the draft LCD. These concerns relate to the clinical data on which Natera relies.
We emphasize at the outset that CareDx welcomes fair competition from Prospera or any similar test. Our goal is simply to ensure that a consistent scientific standard is applied to all molecular diagnostic tests seeking coverage and reimbursement.
I. MolDX Coverage Standards
MolDX requires that, in order to cover a molecular diagnostic test, an MDT developer must demonstrate the test’s analytical validity, clinical validity, and clinical utility. MolDX conducts a technical assessment (TA) of molecular assays that are laboratory developed tests; involve next generation sequencing technology; employ new or novel technology; or have undefined or unproven clinical utility. The TA process requires that an MDT developer submit a comprehensive dossier of scientific information. During this process, subject matter experts and MolDX determine whether an assay demonstrates clinical utility, fulfills Medicare’s “reasonable and necessary” criteria, and meets analytical and clinical validity standards.
CareDx believes that the clinical data supporting the Prospera test does not meet the high standards established by MolDX.
II. Technical Issues With Prospera Study Cast Doubt on Performance Claims for Prospera
The study on which Natera relies to obtain MolDX coverage (Prospera study or study report) has a number of technical deficiencies. As explained in more detail below, CareDx has the following concerns about the study:
- The study results are not representative of the kidney transplant population and transplant centers in the U.S. because the study was performed on samples from a pre-existing biobank from only one center. The study did not characterize how the test samples may perform at multiple centers across the United States that manage patients by different protocols and use different treatment regimens. In addition, the study results did not characterize test performance under commercial conditions such as multicenter use of kits to collect, label, and ship the fresh blood samples.
- It was a retrospective study whose data demonstrate bias in sample selection:
- It included pediatric samples but the findings were not consistent with the typical findings in past studies of children.
- It pooled data from “for cause” biopsy samples with data from “protocol” biopsy samples.
- It found eGFR to be a discriminating factor for kidney rejection, and used no-rejection samples with an unusually high mean eGFR.
- The definition of rejection conflicts with the label of protocol biopsy.
- It departed from the Banff classification rules for defining rejection.
Overall, there are numerous factors that lead to bias in selection of index cases and therefore questions arise regarding the stated performance of Prospera as applied to dd-cfDNA. Some of these same issues have been raised independently in an investor analyst report, included in this letter as an appendix.
B. Study Results Not Representative of U.S. Kidney-Transplant Population or U.S. Transplant Centers
The Prospera study was conducted at a single transplant center. This is a simple but important point. The findings from a single-center study always carry the risk that they do not apply beyond that single center. The results obtained from the Prospera study’s sample set from one center cannot be extrapolated and considered representative of the kidney-transplant population and transplant centers in the U.S., with the broad variety among centers in patient management practices and in patients’ medical conditions.
In defining the benefit of a new technology, it is important to distinguish between test performances estimated from samples from a single center’s pre-existing biobank versus samples collected in an unbiased manner across a representative multi-center population. The Prospera study results did not characterize how the test may perform under real-world situations, which would necessarily include the variety of monitoring protocols and immunosuppression regimens at transplant centers across the United States, as well as a diversity of pathologist perceptions when reading biopsies used as the comparator.
Transplant centers have a range of practices for managing patient care both among and within centers. For example, about 20 percent of transplant centers perform protocol surveillance biopsies, while the remainder only performs biopsies when clinically indicated. Another example is that different centers use different immunosuppressive regimens, with some ceasing the use of steroids by three months after the transplant while others continue with a low level of steroids.
Another consideration is that the rejection classification approaches of pathologists can differ among pathologists within a center and among centers. The well-known variance in pathology reading can cause a biopsy classified as a TCMR 1A rejection at one center to be classified as borderline TCMR at a second center and TCMR 1B at yet a third center.
Taking real world considerations into account, it is highly unlikely that the Prospera study is representative of the clinical care that will be encountered in test use across the country. The results of multi-center studies of dd-cfDNA as measured by Prospera are necessary before findings about the test’s performance can be considered generally applicable to the entire U.S. kidney-transplant population.
C. Retrospective Study Design Leads to Demonstrated Sample-Selection Bias
In a prospective study, the investigators design the study in advance and then execute it to detect the relation (if any) between a variable and patient status or outcomes. The study data are developed in accordance with the study design. In a retrospective study, the patient outcomes are already known at the beginning of the study; the clinical data already exist and thus were not developed for purposes of the study, but rather for some other purpose. Both types of studies have their advantages. Retrospective studies take less time and cost less money, but the research controls to determine the relation between the variable and the outcomes are not as rigorous. The data in retrospective studies are prone to bias (errors that result in an untrue association) and also to confounding factors that result in a misleading true association.
With retrospective selection of samples there is a risk that existing data can be used to identify subsets of samples that are the most likely to demonstrate effectiveness of the test. In a study of a test to detect rejection of a transplanted kidney, selection bias could occur for samples with additional evidence of rejection such as subsequent poor outcomes, which would be more likely to be “true rejection” than those simply biopsy-positive. Alternatively, selection bias could also occur for samples with no evidence of rejection, such as a complete patient history of no rejection instead of a single time point, which might be more likely true “no rejection” than samples with only “lack of positive biopsy”. In retrospective sample selection there is the risk that both of these selections might occur, resulting in selection of extreme examples to represent the two classes of rejection and no-rejection.
Natera conducted a retrospective clinical study concerning its Prospera test. The plasma samples that were analyzed were selected from a pre-existing sample archive at a single transplant center, which can have a significant impact on the estimate of performance. The study design prevented the Prospera investigators from prospectively defining when these samples were collected or analyzed. Moreover, the biopsies were analyzed by a single pathologist at that single center, with a risk of bias in defining rejection relative to peers across the country.
In the Prospera study report, the data have characteristics that demonstrate bias occurred in sample selection, as explained below.
1. Pediatric Samples Not Representative of the Pediatric Population
Of the samples analyzed, 20 percent were from children (individuals under the age of 18). None of these pediatric samples were classified as rejection. This is in stark contrast to an incidence of rejection in pediatric kidney transplantation that is published to be higher than in adult populations. The absence of any pediatric case of rejection indicates sample selection bias. The publication does not explain why the Prospera study did not include any children with rejection in the study cohort.
2. Combining Protocol Biopsy Data With For-Cause Biopsy Data
Samples were taken from kidney-transplant patients who had a for-cause biopsy (the patient had clinical signs of concern for possible rejection) and from patients who had a protocol biopsy (the patient did not have clinical indication of possible rejection at that time). As expected, the ratio of rejection samples to no-rejection samples was higher in the for-cause samples than in the protocol samples. Yet, the Prospera investigators combined the for-cause results with the protocol results, and they made no adjustments to take into account the different rates of rejection in the two types of samples.
Since protocol biopsy samples are not likely to have rejection, including them tends to inflate the estimate of Prospera’s specificity (the ability to identify true negatives). The protocol biopsy samples are more likely to be “easy” calls in that there is no rejection, inflating the appearance of Prospera’s accuracy.
3. eGFR Incorrectly Used as a Discriminator of Rejection; eGFR in No-Rejection Samples is Not Representative
The estimated glomerular filtration rate (eGFR) is an effective means of measuring the level of kidney function. It had not, however, been reported in the scientific literature as a discriminating factor for the presence or absence of kidney rejection in the face of compromised function until the Prospera study report did so. The report’s introduction includes the following:
Diagnosis of renal transplant rejection is generally dependent on an increase in serum creatinine levels or its algorithmic derivative, estimated glomerular filtration rate (eGFR), which indicates altered renal filtration functioning. Methods of estimating kidney rejection in allograft recipients based on serum creatinine or eGFR, however, lack sufficient accuracy.
As noted, eGFR as a discriminator of rejection had not been observed by other studies. The identification of eGFR as a discriminator of rejection in this study gives rise to the concern that there was selection bias in the Prospera study that may have led to choosing optimal samples for the analysis of the no-rejection (NR) group. The novelty of this finding is not addressed in the Prospera study report.
Moreover, the study report found a mean eGFR of 77 mL/min/1.73 m² in the NR group. This is a highly unusual finding for an NR cohort, which is further evidence of selection bias. In fact, it is difficult to explain how such results could be representative of the U.S. kidney transplant population.
4. Definition of Active Rejection is Applied to Protocol Biopsies
According to the methods section of the Prospera study report, the allograft injury and active rejections studied were defined by an increase of at least 20 percent in serum creatinine. This amount of creatinine increase is commonly a trigger to order a for-cause biopsy. Therefore, by standard clinical definition, the biopsies listed as “protocol” with rejection that were selected for inclusion in this study were de facto for-cause biopsies; there were no protocol biopsies positive for rejection.
D. Rejections Definitions Deviated from the Banff Rules
The Banff Classification of Allograft Pathology (Banff rules) is an international consensus classification for the reporting of biopsies from kidney transplants. These rules provide criteria for the diagnosis of types of kidney rejections and other pathology. The Banff rules are reviewed and updated every two years in light of the rapidly expanding information about kidney transplants. Generally, only kidney transplant rejection studies conducted in accordance with the Banff rules are accepted in the transplant research and scientific community.
The Prospera study report stated that it used the Banff 2017 criteria, but the report indicates significant deviations in the definition of rejection. The Banff working group has defined the criteria for pathologists to classify T cell-mediated rejection (TCMR) and antibody-mediated rejection (ABMR). The deviation in the Prospera study report concerned both TCMR and ABMR:
- First, the ABMR definitions specified in the methods section do not match those described for Banff 2017. They report two kinds of ABMR, C4d+ ABMR and C4d– ABMR and both definitions differ substantially from the subtypes of ABMR in the Banff 2017 definition, Active AMR and Chronic Active AMR.
- Second, the TCMR definition found in the methods section would exclude TCMR type IA as defined by Banff. Yet, in a supplemental table listing all rejections (Table S5 in the study report), some rejections are labeled TCMR IA. Since the methods section defines specific minimum criteria for TCMR for purposes of the study, it can be inferred that the study report labeled those samples TCMR IA because doing so was considered consistent with the study’s definitions. Application of the Banff grading criteria, however, would assign a higher rejection grade to those samples.
- Third, within the TCMR cohort, the study report includes cases with features of ABMR and refers to these cases as “borderline ABMR”. Banff 2017, however, does not recognize this concept, as there is no “borderline ABMR” definition in Banff 2017. This issue has been raised in an editorial in the American Journal of Transplantation as a factor very likely contributing to the overall sensitivity computed by the Prospera study report.
Although published in a nominally peer-reviewed journal, the selection of a general medical journal with an unproven reputation may have led to review by individuals with limited ability to assess the needs of a study in kidney transplantation. Ultimately, the analysis in the study report is confounded by numerous inconsistencies in defining the sample sets and is therefore difficult to use for defining characteristics and drawing conclusions about performance.
As discussed above, CareDx believes it is extremely important that the regulatory paradigm created by MolDX provide MDT developers with ways of identifying and consistently applying scientific standards that must be met to obtain coverage.
MolDX traditionally has applied high scientific standards to obtain coverage and CareDx urges the program to continue doing so. In particular, CareDx has substantial concerns about allowing reliance on a single retrospective study when that study was conducted with patient samples from a single center and when there is significant evidence that the study suffered from selection bias.
With over 20 years of dedication to the field of transplantation, CareDx has come to know transplant recipients well and we value the long-term outcomes for these patients. Our interactions with MolDX over the years give us confidence that you also have the best interest of patients as your most important mission.
Thank you for the comment and for raising concerns.
In general, when making national coverage determinations, MolDX and partner MACs evaluate relevant clinical evidence to determine whether or not the evidence is of sufficient quality to support a finding that an item or service falling within one or more benefit categories is reasonable and necessary for the diagnosis or treatment of illness or injury or to improve the functioning of a malformed body member (Title XVIII of the Social Security Act §1862(a)(1)(A)). For example, the evidence may consist of internal review of published studies, evidence-based guidelines, professional society position statements, expert opinion, and public comments.
The critical appraisal of the evidence during the development of a local coverage determination enables us to determine to what degree we are confident that: 1) the specific assessment of a clinical question relevant to the coverage request can be answered conclusively; and 2) the intervention will improve health outcomes for beneficiaries. An improved health outcome is one of several considerations in determining whether an item or service is reasonable and necessary.
MolDX must allow beneficiaries to have access to medically reasonable and necessary services while avoiding paying for services that are not reasonable and necessary and simultaneously avoiding the exercise of supervision or control over the practice of medicine. The coverage decision merely provides coverage for a service in the Medicare population, while deferring to the practicing transplant clinicians and the patients that they are treating to make individualized decisions about the most appropriate diagnostic approach to select for that beneficiary.
The data set in any study can be sliced countless ways based on sets of characteristics such that a patient with those characteristics was not included in the study. This is a practical limitation of all clinical research, and, as such, we must assess whether there is sufficient evidence to establish the population studied generalizes to a specific intended use population within the Medicare population. While most Medicare beneficiaries are older adults, the Medicare program may provide coverage to individuals with chronic kidney failure requiring dialysis of all ages. Furthermore, the inclusion of or enrichment of some atypical groups within a research sample study is common and does not negate the applicability of the study to the unenriched groups. Such inclusion or enrichment may alter positive and negative predictive values by altering the pre-test probability. However, pre-test probabilities do not necessarily impact sensitivity or specificity of a test.
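The distinction drawn above between predictive values and sensitivity or specificity can be illustrated with Bayes' theorem. The sketch below is illustrative only. It holds sensitivity and specificity fixed at the 89% and 73% quoted for Prospera in the comments above, and it varies the pre-test probability across three hypothetical values that are chosen purely for illustration and are not taken from any study. The function name `predictive_values` is likewise an assumption of this sketch.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test at a given pre-test probability (prevalence)."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Sensitivity and specificity as quoted in the comments above (held constant).
SENSITIVITY, SPECIFICITY = 0.89, 0.73

# Hypothetical pre-test probabilities, e.g. an unenriched versus an enriched sample.
for prevalence in (0.05, 0.25, 0.50):
    ppv, npv = predictive_values(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"pre-test probability {prevalence:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

Under these assumptions, enrichment raises the positive predictive value and lowers the negative predictive value, while the sensitivity and specificity supplied as inputs are unchanged.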
Additionally, as the editorial article by Bloom referenced by the third point in Section D of the comment alludes to, significant scientific uncertainty exists in the assessment of the rejection classification. However, clinicians must care for individual patients in the setting of this scientific uncertainty, and the individual clinicians are the best judges of how to apply limited information to the care of their specific patients. While Dr. Bloom does note that a change in classification of some patients in the Sigdel study would have altered the published assay performance, he does so in the context of making a point about fundamental difficulties that exist in transplantation for classifying patients accurately into antibody mediated or T-cell mediated rejection and gives the classification of patients in the Sigdel study as an example of one important question raised by these fundamental issues.
We appreciate the Palmetto MolDX Program’s published review and draft coverage determination for our Prospera donor-derived cell-free DNA transplant assessment test.
As discussed in our previous meetings, I am writing in response to feedback shared by paid advisors of CareDx during the open meeting of May 6 and other public venues regarding the merits of the studies referenced in the Prospera LCD DL38041.
1. Regarding single center. Sigdel et al is the largest, most diverse published study of donor-derived cell-free DNA in renal transplantation to date, and there is no study bias associated with the number of sites. The study included a meaningful number of patients across a variety of ages and racial demographics (African Americans, Hispanic/Latinos, Asians and Caucasians), which is clinically significant as key renal transplant metrics such as eGFR and graft survival are known to vary by race. Furthermore, a single pathologist reviewed all biopsy samples in the study to ensure consistency in the data. Finally, the study was conducted in collaboration with the University of California, San Francisco (UCSF), a leading institution in transplant medicine.
2. Regarding retrospective study design. Sigdel et al incorporated key elements of a strong study design, for the purposes of correlation analysis between dd-cfDNA and the gold standard of pathology.
- All blood samples and clinical data were collected and stored under IRB approved protocol.
- The pathology readings and analysis of blood samples were performed blinded to any other data in the study.
- The analysis only included same-day biopsy-matched samples and a prospectively selected cut-off of 1%.
- A broad range of organ injury phenotypes were included, to evaluate test performance in a range of different clinical scenarios.
3. Regarding BANFF classification standards. Classification of rejection status from biopsies strictly adhered to the most current 2017 BANFF guidelines. As the field of transplantation matures the guidelines evolve to incorporate the latest research findings and expert opinion. Using the latest guidelines to dictate care is not only clinically responsible but also needed to provide high quality, cost effective treatments to the healthcare system.
To reflect the latest care protocols, Sigdel et al assessed active rejection (AR) in the following categories: antibody-mediated rejection (ABMR), T-cell mediated rejection (TCMR), and mixed (ABMR/TCMR). As directed by BANFF 2017 criteria, biopsies classified as “borderline rejection” were not included in the AR group. Borderline cases were included in the non-rejection category. In fact, the median dd-cfDNA value of active rejection cases is substantially higher than that of borderline cases (2.32% vs 0.58%).
4. Regarding performance across sub-cohorts. In a sub-analysis, the data in the Sigdel et al study showed consistent performance, regardless of biopsy indication and age.
- Prospera performance in a protocol biopsy (subclinical rejection) and for-cause biopsy (clinical) was similar in sensitivity (92% versus 87%, respectively), specificity (75% versus 69%, respectively) and AUC (0.89 versus 0.85, respectively).
- Prospera performance in an adults-only cohort was similar in sensitivity (89%), specificity (76%) and AUC (0.88) to the overall study population.
- When looking at a cohort of only adults in for-cause situations, the predictive power (AUC) of Prospera (0.84) significantly outperformed eGFR (0.62) and serum creatinine (0.58) in detecting active rejection. This serum creatinine performance of 0.58 AUC is in line with other previously published serum creatinine performance metrics (0.54, 0.63).
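As a rough way to relate the sub-cohort figures above to a pooled figure, pooled specificity is the average of the stratum specificities weighted by the number of non-rejection samples in each stratum. The sketch below is a minimal illustration of that arithmetic. It uses the stratum specificities quoted above (75% for protocol biopsies, 69% for for-cause biopsies), but the non-rejection sample counts are hypothetical placeholders, since per-stratum counts are not given here, and the helper name `pooled_rate` is an assumption of this sketch.

```python
def pooled_rate(strata):
    """Pool a per-stratum rate.

    strata is a list of (rate, n) pairs, where n is the denominator count
    for that stratum (for specificity, the number of non-rejection samples).
    """
    total = sum(n for _, n in strata)
    return sum(rate * n for rate, n in strata) / total


# Stratum specificities as quoted above; the counts are HYPOTHETICAL and
# serve only to show how the weighting works.
protocol = (0.75, 100)
for_cause = (0.69, 50)

pooled = pooled_rate([protocol, for_cause])
print(f"pooled specificity: {pooled:.1%}")
```

Whatever counts are used, the pooled value always lies between the two stratum values, so the mix of protocol and for-cause samples moves the pooled specificity only within that range.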
In conclusion, the analytical and clinical validation studies were scientifically sound, and they support the intended use of the Prospera test as defined in the LCD: to help physicians rule in or rule out active rejection when assessing the need for a diagnostic biopsy.
Thank you for the comment. We as the contractor wish to note that no comments in this document were shared with external stakeholders prior to the release of the response to comments document, though this comment does address many of the issues raised in the preceding comment. We have concluded that the evidence supporting Prospera™ is of sufficient quality to support a finding that the service is reasonable and necessary.
I am Dr. XXXX, a practicing kidney transplant surgeon. I participated in the public meeting on May 6th, 2019 and humbly appreciate this opportunity to provide written comments to the above-referenced MolDX proposed local coverage determination (LCD), for the Prospera test marketed by Natera, Inc. Based on Natera’s published data, I have concerns that Natera’s claims of test performance may be invalid. The inextricably linked issues of poor study design, lack of rigor in analysis, and likely significant selection bias are the major issues.
Natera’s stated performance claims are based on a retrospective, single-center analysis of banked blood. This study design is prone to selection bias for extreme cases, and there is evidence in the published data that such selection bias may be present. Selection bias could tend to select extreme examples to represent the two cohorts of rejection and no-rejection.
In defining the benefit of a new technology, it is important to distinguish between test performance estimated from samples in a biobank derived from a single center and real-world test performance in diverse populations treated at multiple sites. Natera has only demonstrated their assay in that narrow and specific setting (i.e., single-center collection and handling of samples). Effectiveness would be better determined in a larger and more representative population. The Natera results did not characterize how the test may perform under pragmatic, real-world conditions. Those conditions would include having multiple technicians across the United States, using commercially available blood collection tubes and kits, collecting, labelling, and shipping fresh blood samples across the country. Thus, the analytical and clinical validation Natera has published is not representative of the practical methods and materials that will be encountered in commercial test use across the country.
It is quite possible that the results obtained are not generalizable to other populations and not representative of practical performance in a real-world setting. For example, none of the pediatric patients in the Sigdel study had rejection, although the likelihood of rejection is higher in pediatric recipients. Furthermore, the authors found eGFR to be a discriminator for the presence or absence of rejection, reporting an eGFR of 76 in the No Rejection cohort and 45 in the Rejection group. This finding of the utility of eGFR as a discriminator of rejection has not been observed by others, and gives credence to the concern that selection bias may have led to analysis of optimal samples for the NR group. If serum creatinine (or eGFR) were useful as a discriminator of the presence or absence of rejection, why would the lack of utility of creatinine and eGFR in this setting be so obvious to the transplant community and why would we have such a pressing need for molecular diagnostics to aid in surveillance of kidney recipients?
Independent of the fact that eGFR was found to be of discriminatory utility in the Sigdel paper, it is also important to consider the remarkable finding of a mean eGFR of 76 in the NR group. Even more remarkably, the authors report a mean eGFR in the "Stable" patients (one of three groups composing the “Non-Rejection” group) of 99.5 ml/min (range of 47.4 - 131.1). These are extremely high mean eGFRs in transplant recipient cohorts, and these findings make it extremely unlikely that significant selection bias did not occur. For example, Y. Huang et al examined trends in renal function in a large cohort of kidney transplant recipients (Understanding Trends in Kidney Function 1 Year after Kidney Transplant in the United States. J Am Soc Nephrol. 2017;28(8):2498). Huang et al evaluated average eGFR at 1 year after kidney transplant in the United States in a cohort of 189,944 patients undergoing transplant between 2001 and 2013, and found that "among deceased-donor KT (DDKT) recipients, average 1-year post-KT eGFR ranged from 54.8 to 56.5 ml/min per 1.73 m2 in 2001–2005, and 56.6 to 56.9 in 2011–2013."
Having a representative cohort of renal transplant recipients in long-term follow-up with a mean eGFR of 76 ml/min (or even more so, a population of “stable” patients with an eGFR of 99.5 ml/min) would be akin to randomly selecting a group of individuals from the general population for a pick-up basketball game and then showing up with a team on which all the players are over six feet, eight inches tall. In other words, this finding is extremely unlikely in a representative transplant population and provides powerful evidence of possible selection bias.
The data set for the Sigdel paper contains only 25 African American (AA) recipients (only 14% of the total study population). This underrepresents the proportion of AA (38%) transplanted in the U.S., and AA comprise an even larger fraction of Medicare covered renal transplant recipients. It would seem atypical for an LCD to be granted based on a single-center study cohort that significantly underrepresents AA recipients relative to the national waitlist or national transplant recipients.
In this study, for-cause and protocol biopsies were lumped together in the analysis, although these patient populations would be anticipated to have very different likelihoods of harboring rejection. The finding of eGFR as a discriminator for rejection may be due to the imbalanced combination of for-cause and protocol biopsies and/or it may be an indicator of sample selection for extremes.
The histopathologic analysis of the samples raises concern as well. The authors purport to follow Banff 2017, but then do not do so in at least three significant ways: (1) Their definition of ABMR is widely inconsistent with the Banff criteria. (2) They classify biopsies as “borderline ABMR” when no such category exists in the Banff 2017 classification. This is important because it would be anticipated to alter the calculated sensitivity of the assay. (3) They seemingly excluded TCMR 1A from their analysis, leaving only the more severe TCMR 1B. It may be appropriate to ask Natera to show the records of the work plan and notebook of the pathologist who read all the biopsies under Banff 2017 rules, since the name of the pathologist was not mentioned in the authorship of the manuscript and apparent inconsistencies in the histologic analysis relative to the Banff 2017 criteria directly impact the results of the analysis.
Ultimately, the Sigdel et al paper is confounded by multiple inconsistencies in study design, specimen selection and sample analysis, making it difficult to draw conclusions about the assay’s performance.
This is an exciting and critically important clinical arena in a vulnerable patient population. Molecular diagnostics will undoubtedly play an increasing role in clinical decision making in this arena. However, we need more robust validation and analysis for the Natera assay before considering it for CMS approval and certainly before we see widespread adoption of this assay by the transplant community.
Thank you for the commentary. We appreciate hearing concerns from an experienced clinician.
The issue of selection bias pertains specifically to what the population was in whom this test was studied and also the pretest probability of rejection in that population. However, since each individual patient carries his or her own pretest probability, we also believe that a clinician must make a decision for each patient on whether the test is appropriate to use in the first place and how to interpret the results in light of that pretest probability. This decision is an individualized decision that must be made for each patient and is not a decision to be dictated by coverage policy.
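The dependence of a result's meaning on pretest probability can be illustrated with a short Bayes' rule calculation. The following is a minimal sketch, not part of the policy or of any published analysis: the function name is hypothetical, the sensitivity (89%) and specificity (73%) are the figures quoted earlier in this document for Sigdel et al., and the pretest probabilities are arbitrary illustrative values rather than estimates for any real population.

```python
def post_test_probability(pretest, sensitivity, specificity, positive=True):
    """Probability of active rejection after a test result, via Bayes' rule.

    pretest     -- clinician's estimated probability of rejection before testing
    sensitivity -- P(test positive | rejection)
    specificity -- P(test negative | no rejection)
    positive    -- True for a positive result, False for a negative result
    """
    if positive:
        true_pos = sensitivity * pretest
        false_pos = (1 - specificity) * (1 - pretest)
        return true_pos / (true_pos + false_pos)
    false_neg = (1 - sensitivity) * pretest
    true_neg = specificity * (1 - pretest)
    return false_neg / (false_neg + true_neg)


if __name__ == "__main__":
    # Sensitivity and specificity as quoted in this document for Sigdel et al.
    SENS, SPEC = 0.89, 0.73
    # Hypothetical pretest probabilities chosen only for illustration.
    for pretest in (0.05, 0.25, 0.50):
        after_pos = post_test_probability(pretest, SENS, SPEC, positive=True)
        after_neg = post_test_probability(pretest, SENS, SPEC, positive=False)
        print(f"pretest {pretest:.2f}: positive -> {after_pos:.3f}, "
              f"negative -> {after_neg:.3f}")
```

Under these assumptions, the same positive or negative result yields quite different post-test probabilities as the pretest probability changes, which is why interpretation remains an individualized clinical judgment.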
I am a practicing transplant nephrologist and Medical Director of the Kidney Transplant Clinic at Vanderbilt University, a high-volume transplant center that performs over 200 kidney transplants per year with follow-up for the duration of the kidney allograft life. Vanderbilt University Medical Center is located in Nashville, Tennessee, in Palmetto Jurisdiction J. I would like to take this opportunity to provide written comments to the above-referenced MolDX proposed local coverage determination (LCD), which concerns the Prospera test marketed by Natera, Inc.
The manuscript describing the clinical validation of Natera’s test does not meet the burden of proof I would require to use a test clinically. This is of importance as every laboratory result comes with inherent risk due to actions taken by physicians based on the result. A laboratory result must provide sufficient and reliable information for physicians to make an educated assessment of the risks and benefits of further action or inaction.
The population used for the validation appears to carry the risk of affecting the results. The population is quite young and of immunologically “low risk”. They also excluded patients with infections and viruses (notably BK virus, the most common and concerning virus post-kidney transplant), both of which have the potential to cause elevated dd-cfDNA, as more severe cases lead to allograft cellular damage. In addition, there were no repeat transplants within the population. I am concerned that the results may lack external validity in a substantial portion of my transplant population (approximately 8%), which enhances the risk of the test being used beyond its proven capabilities and further puts our patients at risk of iatrogenic harm.
The pathologic criteria used in the manuscript are cited as the Banff 2017 criteria, yet there are important differences between the definitions that they use and the actual Banff 2017 criteria. It would be interesting to know why the investigators cite and claim the use of the standard criteria for pathological grading, yet fail to follow them exactly.
The patients that were included in the study had extremely high eGFR, much higher than that of standard kidney transplant recipients. I found it very interesting that they found that eGFR was able to discriminate rejection from no rejection in their model, although multiple previous analyses have shown that it cannot. When an analysis contradicts previous results, especially results that have been confirmed multiple times in well-designed, often prospective studies, it concerns me that the results of this analysis are due to biased study design.
Finally, their comparison in the discussion section to the previously published results of a different dd-cfDNA test is clearly biased. They do not disclose that the populations in the two studies are completely different, that the clinical situations in which the tests and biopsies are performed are different, and that the biopsies are graded on different Banff criteria (2013 vs. 2017).
In summary, I would request that the MolDX group further investigate the concerns that I have laid out prior to making this test available to the transplant community. I am extremely concerned that the test requires further validation in a well-designed study prior to being used in patient care. With this thought I remain…
Thank you for this comment. We recognize that with the significant degree of scientific uncertainty in the field of transplantation and the large variability in management protocols between transplant centers, different transplant clinicians may have different judgements about the appropriate use of this test in the patients for whom they care. While this LCD may provide specific indications for coverage, it is not intended to suggest to providers how to treat their own patients. Rather, providers must still use their own clinical judgement and scientific knowledge to make individualized decisions for each beneficiary that they are seeing regarding the services that are most appropriate for the management of that beneficiary’s condition and those that are not appropriate. As regards the consistency with prior studies, we see that the manuscript by Sigdel provides comparatively poor discrimination ability of eGFR as compared with donor-derived cell-free DNA.