The CMS Innovation Center must evaluate its models as part of its statutory authority. Evaluations look at provider and patient experiences with a model, model implementation, transformation of the health care marketplace around a model, and a model’s impact on spending and quality of care. Evaluation findings inform decisions related to model modification, expansion, or termination, as well as the design of future models.

Conducting model evaluations

Each model evaluation is carefully designed to ensure it will accurately assess model implementation and its impact on cost and quality. Some evaluation measures, such as total spending and hospital utilization, are standard across all models, while others, such as emergency department utilization, patient-reported outcomes, or spending during an episode of care, are customized to align with specific features of a model.

Model evaluation involves a team of CMS Innovation Center staff and contractors. CMS staff ensure that each model is designed so that it can be evaluated thoroughly, and an independent evaluation contractor designs and conducts the evaluation: it analyzes spending and quality data and produces evaluation reports. In addition to existing administrative and enrollment data, the contractor often collects and analyzes data from surveys, interviews, focus groups, or site visits to understand how participants implemented a model and to explain how they changed spending or quality for their patients.

Evaluation reports and additional data are published on the CMS Innovation Center website.

Model evaluation spending results versus model participant financial results

Evaluation spending results are commonly confused with model participant financial results. The two sets of results serve distinct purposes and are calculated at different times using different methods. Financial results are payouts from CMS to model participants for their participation, and they represent a cost to CMS of running a model. All models have financial results, but their structure and timing vary by model. These payouts are counted in the evaluation spending results for each performance period, which may last from several weeks to a year, depending on the nature of the model.


Table: Model evaluation spending results versus model participant financial results

| Method | Model Participant Financial Results | Model Evaluation Spending Results |
|---|---|---|
| Purpose | To determine incentive payments disbursed as shared savings or other bonus payments to model participants | To estimate the impact of the model on Medicare or Medicaid spending |
| Timing | Calculated before or soon after a performance period to create a timely, reliable payment mechanism | Calculated several months after a performance period, to allow claims data to be finalized and to leave time to create a comparison group and perform analyses |
| Comparison | Historical spending of participants, projected forward using trends from a reference population, to create a forecasted target or benchmark | A comparison group of a similar patient population not in the model, whose spending is measured before and after the model start |
| Risk adjustment | Adjustments to spending based on estimates of sickness levels of patients in the model | Adjustments to spending based on estimates of sickness levels of patients in the model and in the comparison group |
| Net savings or losses | Savings relative to the target or benchmark after deducting incentive payments disbursed to model participants | Changes in Medicare or Medicaid spending net of incentive and operational payments, such as care management fees |
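To make the distinction in the table concrete, the Python sketch below computes both kinds of results for one hypothetical group of beneficiaries. It is an illustrative simplification, not CMS's actual methodology: the function names, the single trend factor, the 50% sharing rate, and all dollar figures are assumptions, and risk adjustment is omitted. The financial result compares actual spending with a benchmark projected from participants' own history, while the evaluation result uses a difference-in-differences comparison against a comparison group and counts the incentive payment and care management fee as costs to Medicare.

```python
from dataclasses import dataclass


@dataclass
class PerCapitaSpending:
    """Average annual spending per beneficiary, in dollars (hypothetical figures)."""
    before: float  # baseline period
    after: float   # performance period


def financial_result(baseline: float, trend: float, actual: float,
                     sharing_rate: float) -> tuple[float, float]:
    """Benchmark approach: project participants' own history forward by a trend.

    Returns (shared savings payment to the participant, net savings to the payer).
    Risk adjustment and minimum savings rates are omitted for brevity.
    """
    benchmark = baseline * (1 + trend)
    gross_savings = benchmark - actual
    payment = max(gross_savings, 0.0) * sharing_rate  # no payment if the benchmark is missed
    return payment, gross_savings - payment


def evaluation_result(model: PerCapitaSpending, comparison: PerCapitaSpending,
                      incentive_payments: float, operational_payments: float) -> float:
    """Comparison-group approach: difference-in-differences on per-capita spending.

    Returns the net change in spending per beneficiary; a negative value means savings.
    Incentive and operational payments are added because they are costs of the model.
    """
    gross_change = (model.after - model.before) - (comparison.after - comparison.before)
    return gross_change + incentive_payments + operational_payments


# Hypothetical example: a 4% benchmark trend and a 50% sharing rate.
payment, net_savings = financial_result(baseline=10_000, trend=0.04, actual=10_300, sharing_rate=0.5)
change = evaluation_result(PerCapitaSpending(10_000, 10_300), PerCapitaSpending(9_800, 10_400),
                           incentive_payments=payment, operational_payments=100)
print(f"Shared savings payment: ${payment:,.2f}; net financial savings: ${net_savings:,.2f}")
print(f"Evaluation net spending change: ${change:,.2f} per beneficiary")
```

In this made-up example the two approaches disagree on the size of the savings because the assumed 4% benchmark trend is lower than the comparison group's actual spending growth. With a different comparison-group trend, the results could diverge further or even point in opposite directions.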

Considerations when evaluating a model 

  • Voluntary and mandatory models. In most models, providers voluntarily sign up to participate. Providers that voluntarily participate in a model may be more likely to perform well for reasons related to their experience, organizational structure, or the patient population in their health care market. Participants’ decisions to continue or end their participation in a model can also influence a model’s outcomes. Even when the evaluation accounts for these factors, model participants’ performance in voluntary models may not resemble how the model would fare if expanded to other providers or health care markets. To guard against these possibilities, some models are designed to be mandatory for a randomly selected set of providers.
  • Sample size. To measure convincingly whether spending or quality performance differs between model participants and their comparison group, a model must include enough patients and providers. Innovation Center staff calculate the minimum number of patients or providers needed as part of designing the model. Whether a sample is large enough ultimately depends on the number of participating providers, the type of performance measure, and the size of the effect participants have on the measure. For measures such as spending that vary widely across patients and providers, a model may need thousands of patients to generate credible evidence of its impact.
  • Data availability. Fee-for-service Medicare claims data are available after final claims are submitted, but similar data for Medicare Advantage or Medicaid are not always available, or become available only after a longer delay. Other data about model participants, the comparison group, or other payers that could help explain evaluation results are sometimes unavailable. These data may include a provider’s experiences that affect performance in a model; a patient’s living conditions, education level, or level of engagement with health care decisions; or the activities of commercial payers in a health care marketplace. The evaluation often has to balance the desire for more comprehensive data against the burden and costs of data collection or reporting.
  • Overlapping models. As the number of Innovation Center models has grown, so has the extent to which they may overlap across providers and patients. Different models can have competing or complementary incentives, so it’s important for the evaluation to capture which models overlap to help explain how overlap might affect an evaluation’s results.
  • Health care landscape. The evaluation measures changes in spending and quality in the real-life health care marketplace, which makes it a complicated laboratory in which to study model effects. Although the evaluation strives to account for the messiness of the health care landscape, it can be challenging to account for all the forces that affect spending and quality.
  • Short-term effects. The evaluation assesses model performance only while the model is operating, which means that a model’s longer-term effects on the way providers deliver care or on the health of patients cannot be captured by the evaluation.
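The sample size consideration above can be illustrated with a textbook power calculation. The sketch below uses the normal-approximation formula for comparing mean spending between two equally sized groups; it is an assumption for illustration, not necessarily the calculation Innovation Center staff perform. It ignores clustering of patients within providers, which would increase the required sample, and the standard deviation and effect sizes in the usage lines are hypothetical.

```python
import math
from statistics import NormalDist


def patients_per_group(sd: float, effect: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Minimum patients per group (model and comparison) needed to detect a difference
    in mean spending of size `effect`, for a two-sided, two-sample test with equal
    group sizes and a common standard deviation `sd` (normal approximation).
    Clustering of patients within providers is ignored.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = NormalDist().inv_cdf(power)           # quantile corresponding to the desired power
    n = 2 * (z_alpha + z_beta) ** 2 * sd ** 2 / effect ** 2
    return math.ceil(n)  # round up to a whole number of patients


# Hypothetical inputs: spending SD of $10,000 per beneficiary, two candidate effect sizes.
for effect in (1_000, 300):
    print(f"Effect ${effect:,}: {patients_per_group(sd=10_000, effect=effect):,} patients per group")
```

Because the required sample grows with the square of the ratio of variability to effect size, shrinking the detectable effect from $1,000 to $300 per beneficiary raises the requirement roughly elevenfold in this sketch.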

Evaluations and the CMS Innovation Center’s strategic direction

In 2021, the CMS Innovation Center unveiled a new strategy for the next decade, with the goal of achieving equitable outcomes through high-quality, affordable, person-centered care. Under this new strategy, the Center will continue to perform rigorous evaluations of its models. 

A critical component of the CMS Innovation Center’s strategy is designing models so that their evaluations can assess the models’ transformative impact on health system reform, underserved populations, and health equity, in alignment with efforts across CMS. The Innovation Center will develop standardized evaluation requirements and measures for assessing health impact and models’ effects on closing disparities in care and outcomes. The Innovation Center will also use a person-centered strategy when selecting quality measures, prioritizing those that matter most to patients, and will apply a consistent approach to measuring them.

Additional Information

CMMI White Paper: Synthesis of Evaluation Results Across 21 Medicare Models (PDF)  |  Findings At-a-Glance (PDF)  |  Slides (PDF)  |  Recording (MP4)  |  Transcript (PDF)

