Validity of Race and Ethnicity Codes in Medicare Administrative Data Compared With Gold-standard Self-reported Race Collected During Routine Home Health Care Visits

Background: Misclassification of Medicare beneficiaries’ race/ethnicity in administrative data sources is frequently overlooked and a limitation in health disparities research. Objective: To compare the validity of 2 race/ethnicity variables found in Medicare administrative data [enrollment database (EDB) and Research Triangle Institute (RTI) race] against a gold-standard source also available in the Medicare data warehouse: the self-reported race/ethnicity variable on the home health Outcome and Assessment Information Set (OASIS). Subjects: Medicare beneficiaries over the age of 18 who received home health care in 2015 (N=4,243,090). Measures: Percent agreement, sensitivity, specificity, positive predictive value, and Cohen κ coefficient. Results: The EDB and RTI race variable have high validity for black race and low validity for American Indian/Alaskan Native race. Although the RTI race variable has better validity than the EDB race variable for other races, κ values suggest room for future improvements in classification of whites (0.90), Hispanics (0.87), Asian/Pacific Islanders (0.77), and American Indian/Alaskan Natives (0.44). Discussion: The status quo of using “good-enough for government” race/ethnicity variables contained in Medicare administrative data for minority health disparities research can be improved through the use of self-reported race/ethnicity data, available in the Medicare data warehouse. Health services and policy researchers should critically examine the source of race/ethnicity variables used in minority health and health disparities research. Future work to improve the accuracy of Medicare beneficiaries’ race/ethnicity data should incorporate and augment the self-reported race/ethnicity data contained in assessment and survey data, available within the Medicare data warehouse.

Improving minority health and reducing health disparities is a national priority. 1,2 Recent attention has been placed on addressing confounding of observational data and the use of sophisticated causal modeling methods in health disparities research. 3 However, monitoring and reducing disparities require accurate data on race and ethnicity that are not consistently available. [4][5][6] Administrative data sources of race/ethnicity data are limited with regard to completeness and accuracy, making self-reported data the preferred source and gold standard. 7 Despite this, even when self-reported race/ethnicity data are available, an administrative data source is frequently used in research on disparities in health care quality and outcomes. [8][9] The completeness and accuracy of race/ethnicity data are especially problematic for Asian Americans and Pacific Islanders (AAPIs), and for American Indians and Alaskan Natives (AIANs). [10][11][12] As a result, incomplete and inaccurate race/ethnicity data limit our understanding of the sources of disparities in health care access, quality, and outcomes and evaluation of changes in minority health over time.
Administrative data, including insurance plan enrollment and demographic information, are contained in the Medicare Beneficiary Summary File (MBSF). The MBSF contains 2 separate race variables. The first is from the Medicare enrollment database (EDB) and originates from Social Security Administration (SSA) records. Before 1980, the SSA collected voluntary race data using the categories: white, black, other, and unknown (for people who did not respond). "A further limitation in the racial and ethnic data contained in Medicare beneficiary files is that when the Center for Medicare and Medicaid Services (CMS) obtains the enrollee information from the SSA master beneficiary record, it receives information only on the retiree, not the retiree's spouse. Instead, the race of the beneficiary is simply assigned to the spouse." 13 CMS has made multiple efforts to fill in missing data, including a postcard survey of people with Hispanic surname or country of birth, and use of race/ethnicity data from Medicaid for dual-eligible beneficiaries from 32 states. However, despite these efforts, the EDB race variable is known to severely undercount Hispanics, Asian Americans/Pacific Islanders, and AIANs. 14 Because of these limitations, analyses using race/ethnicity data from the enrollment file (EDB) are generally restricted to the identification of differences between black and white patient populations. 9,15 The second race variable was created a decade ago by researchers at the Research Triangle Institute (RTI) to improve classification of Hispanics and Asians/Pacific Islanders. 16,17 The RTI race imputation algorithm utilizes lists of Hispanic and Asian/Pacific Islander names from the US Census, and simple geography (residence in Puerto Rico or Hawaii) to improve on the EDB race code. 16 The RTI race variable is used by the Centers for Medicare & Medicaid Services in reports on health disparities in the Medicare population and in studies which include a focus on Hispanic and Asian/Pacific Islander populations. 18,19
In contrast to administrative data sources, national surveys of Medicare beneficiaries include self-reported race and ethnicity. Examples of survey datasets that contain self-reported race/ethnicity include the Medical Expenditure Panel Survey (MEPS), the Medicare Current Beneficiary Survey (MCBS), and the Health and Retirement Survey (HRS). In addition, the Consumer Assessment of Healthcare Providers and Systems (CAHPS) patient experience datasets contain self-reported race/ethnicity data. Finally, self-reported race/ethnicity data are collected as part of postacute and long-term care assessments including the Outcome and Assessment Information Set (OASIS) used in home health care (the gold standard in this study), the Minimum Dataset (MDS) used in nursing homes, the Inpatient Rehabilitation Facility-Patient Assessment Instrument (IRF-PAI), and the Medicare Health Outcomes Survey (HOS) used in Programs of All-Inclusive Care of the Elderly and with a random sample of Medicare Advantage plan subscribers.
Although patient experience survey data (CAHPS) have been used to validate race/ethnicity variables contained in administrative sources, the use of self-reported race/ethnicity data collected as a routine part of health care delivery has received less attention. The main objective of this analysis is to compare the agreement and accuracy of 2 sources of race and ethnicity information contained in the Medicare data warehouse: (1) the EDB race variable which originates from SSA data; (2) the RTI race variable imputed from name and geography; with a gold standard: the self-reported race and ethnicity data collected by Registered Nurses and Physical Therapists during routine home health care assessments as part of the OASIS. 20 For added context, the accuracy and agreement measures are stratified by sex, patterns of misclassification errors are explored, and we compare our findings with earlier studies using survey data as the gold standard.

Data Source and Patient Population
The study population included all Medicare beneficiaries, 18 years and older, who received home health care in 2015 (4,243,090 people). Two data sources containing 3 race/ ethnicity variables for our sample of Medicare beneficiaries were linked using the unique Chronic Conditions Warehouse beneficiary identification number for the entire study population: The 2015 MBSF containing the EDB race variable and RTI race variable; and the 2015 OASIS containing the "gold-standard" self-reported race/ethnicity for all home health care patients. All 3 race variables (EDB, RTI, and OASIS) were available for the entire study population.
During the initial home health care visit by a registered nurse or licensed physical therapist, race/ethnicity data are obtained by self-report as part of the standardized OASIS assessment (a caregiver may answer if the patient is unable), and multiple answers may be recorded. The directions for this question include the words "Mark all that apply" and the response choices are: (1) American Indian or Alaska Native; (2) Asian; (3) Black or African-American; (4) Hispanic or Latino; (5) Native Hawaiian or Pacific Islander; and (6) White.
For the purposes of this paper, and for consistency with the EDB and RTI race variable categories, beneficiaries who self-identified as either or both (1) Asian and (2) Native Hawaiian or Pacific Islander were classified as AAPI. The vast majority (99.73%) of home health beneficiaries had only a single race/ethnicity recorded, and we restricted our study to this population. Details of the remaining 11,720 people (0.27% of the study population) who identified with ≥ 2 racial/ethnic groups are included for the interested reader in a brief Appendix. Our final study sample consisted of 4,231,370 adult Medicare beneficiaries who received home health care in 2015. The study was approved by the Institutional Review Board of Rutgers, The State University of New Jersey.
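The classification rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the label strings, the `COLLAPSE` mapping, and the `classify` function are hypothetical names and are not taken from the OASIS data dictionary or from the authors' SAS or Stata code.

```python
from typing import Iterable, Optional

# Hypothetical labels for the six OASIS "Mark all that apply" response choices.
AIAN = "American Indian or Alaska Native"
ASIAN = "Asian"
BLACK = "Black or African-American"
HISPANIC = "Hispanic or Latino"
NHPI = "Native Hawaiian or Pacific Islander"
WHITE = "White"

# Asian and Native Hawaiian/Pacific Islander collapse into one AAPI category,
# matching the EDB and RTI category scheme described in the text.
COLLAPSE = {
    AIAN: "AIAN",
    ASIAN: "AAPI",
    NHPI: "AAPI",
    BLACK: "Black",
    HISPANIC: "Hispanic",
    WHITE: "White",
}


def classify(responses: Iterable[str]) -> Optional[str]:
    """Return the single collapsed group, or None if the person is excluded.

    None covers people who marked two or more distinct groups (described in
    the Appendix) and, as an assumption of this sketch, empty responses.
    """
    groups = {COLLAPSE[r] for r in responses}
    if len(groups) == 1:
        return next(iter(groups))
    return None


print(classify([ASIAN, NHPI]))      # both choices collapse to AAPI
print(classify([WHITE, HISPANIC]))  # two groups, so excluded (None)
```

Collapsing before counting groups is what lets a person who marks both Asian and Native Hawaiian or Pacific Islander remain in the single-group sample.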

Statistical Analyses
Datasets were linked at the patient level using the unique beneficiary identification code assigned for this purpose by CMS. For each person, the analytic file contained the 3 race variables (EDB, RTI, and OASIS) which were recoded (so that the value 1 had the same meaning in each dataset) and also dummy coded for calculation of single-race κ statistic. All analyses were completed by the second author using SAS statistical software (version 9.4) and the first author using Stata 15.0 to ensure reproducibility and confirm final results were error-free. We first assessed the agreement and validity of the EDB race and RTI race variables compared with self-reported race/ethnicity data from the home health OASIS. Sensitivity, specificity, positive predictive value, and Cohen's κ coefficient were calculated for the full sample and for each sex separately (Table 1). Cohen's kappa (κ) statistic is a measure of interrater reliability that takes into account the frequency or rarity of belonging to a different racial/ethnic group. Values range from 1 (complete agreement) to −1 (complete disagreement). 21 As a point of reference, Landis and Koch 22 have suggested a κ coefficient > 0.81 indicates excellent agreement. Both the overall κ statistic and the individual race κ statistics were calculated using the entire sample, including cases classified as other/unknown.
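For readers less familiar with these measures, the following minimal sketch shows how each one can be computed for a single dummy-coded race category from the four cells of a 2×2 table against the gold standard. The function name and the example counts are illustrative assumptions; the counts are invented and are not taken from Table 2, and this is not the authors' SAS or Stata code.

```python
def validity_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Validity of one dummy-coded race category against self-report.

    tp: administrative variable and self-report both place the person in the group
    fp: administrative variable says in the group, self-report says not
    fn: administrative variable says not in the group, self-report says in the group
    tn: both say the person is not in the group
    """
    n = tp + fp + fn + tn
    observed = (tp + tn) / n  # percent agreement
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    return {
        "agreement": observed,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "kappa": (observed - expected) / (1 - expected),
    }


# Invented counts for a small minority group, for illustration only.
example = validity_metrics(tp=40, fp=10, fn=20, tn=930)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

In this invented example, percent agreement is 97% and specificity is about 99%, yet sensitivity is only about 67% and κ is about 0.71, which is why agreement alone can flatter a variable that misses many members of a small group.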
In the second set of analyses, the pattern of race/ethnicity misclassifications was explored for both the EDB and RTI race variables compared with OASIS gold standard. Table 2 includes the raw data used to populate and calculate the overall sample statistics presented in Table 1. Next, we focus on the subset of cases which were misclassified, highlighting the improvement of the RTI race variable compared with the EDB race variable (Table 3).
In the third set of analyses, differences in race/ethnicity categorization of RTI compared with OASIS race/ethnicity are compared for a subset of beneficiaries with dementia or diabetes (Table 4). We determined dementia or diabetes diagnosis status for our subset study population from the MBSF Chronic Conditions Warehouse flags. This analysis highlights 1 aspect of race/ethnicity variable choice on study design and the resulting differences in frequency and prevalence of chronic disease burden within subpopulations.

Agreement and Accuracy of EDB and RTI Race Variables With Self-reported Race/Ethnicity From OASIS
Both the EDB and RTI race variables have mutually exclusive categories, meaning that a person who is categorized as white or black is considered to be non-Hispanic. For this reason, in the text and tables, the term "white" refers to non-Hispanic whites, the term "black" refers to non-Hispanic blacks and African Americans, the term "AAPI" refers to non-Hispanic Asians and Pacific Islanders, and the term "AIAN" refers to non-Hispanic American Indians and Alaskan Natives.
In our analyses using OASIS race as the validation standard (shown in Table 1), the sensitivity of EDB and RTI race variables for non-Hispanic whites was high (96.9-97.9); however, the specificity of EDB race was low (79.6) compared with RTI race (95.5). In contrast, among people who self-identified as non-Hispanic black, the EDB and RTI race variables both performed similarly well, with high sensitivity (96.6-97.0) and high specificity (99.2-99.4). Among people who self-identified as Hispanic, the original EDB variable had low sensitivity (36.2) but high specificity (99.8). In contrast, the RTI race variable had both good sensitivity (90.8) and high specificity (98.8). Among people who self-identified as non-Hispanic Asian, Hawaiian Native, or other Pacific Islander (AAPI), specificity of both the EDB and RTI race variables was high (99.6-99.8). However, the RTI race variable had better sensitivity (74.7) compared with the EDB race variable (62.6). Finally, among people who self-identified as non-Hispanic American Indian or Alaskan Native (AIAN), the sensitivity of the EDB and RTI race variables was low (43.0-43.2), whereas the specificity was high (99.8). The EDB classification of AIANs on the basis of tribal membership registration results in fewer than half of people who self-identify as AIAN being correctly classified in Medicare administrative race/ethnicity data.

Sex Differences in Accuracy and Agreement of Race/Ethnicity Variables
The EDB race variable, originating from SSA records, is slightly more accurate for women compared with men except among AIANs (κ = 0.44 vs. 0.46). In contrast, the RTI race variable, imputed from US Census name lists and residence in Hawaii or Puerto Rico, is less accurate for women compared with men among AAPIs (κ = 0.77 vs. 0.79), Hispanics (κ = 0.85 vs. 0.89), and AIANs (κ = 0.44 vs. 0.46). See Table 1 for all accuracy and agreement statistics stratified by sex.

Patterns of Over-classification and Misclassification by Race/Ethnicity Variables
The pattern of misclassification errors in the EDB and RTI race variables compared with self-reported race/ethnicity from the OASIS dataset is shown in Table 3. Using the original EDB race variable, 190,434 people were misclassified as non-Hispanic white, with the majority (167,495/190,434; 88%) self-identifying as Hispanic. In contrast, the RTI race variable mistakenly classifies a much smaller number (41,878) of minorities as being non-Hispanic white, with about half being Hispanic (21,941/41,878; 52.4%). However, the RTI race variable misassigned non-Hispanic whites as Hispanic more than 5 times as often as the original EDB race variable (37,670 vs. 6,695), accounting for 78% of people misassigned as Hispanic by RTI race. Although smaller in number, non-Hispanic whites also comprise 80% of people misassigned by the RTI race variable as black, 77% who are misassigned as AAPI, and 84% of people misassigned as AIAN (Table 3).
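The shares quoted in this paragraph follow directly from the counts given in the text. The short sketch below reproduces that arithmetic; the `share` helper and the variable names are hypothetical and are introduced only for illustration.

```python
def share(part: int, whole: int) -> float:
    """Percentage of `whole` made up by `part`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# People misclassified as non-Hispanic white who self-identified as Hispanic.
edb_hispanic_share = share(167_495, 190_434)  # EDB race variable
rti_hispanic_share = share(21_941, 41_878)    # RTI race variable

# Non-Hispanic whites misassigned as Hispanic: RTI count relative to EDB count.
rti_vs_edb_ratio = round(37_670 / 6_695, 1)

print(edb_hispanic_share, rti_hispanic_share, rti_vs_edb_ratio)
```

The first two values round to the 88% and 52.4% reported above, and the ratio is between 5 and 6, consistent with "more than 5 times as often."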

Dementia and Diabetes Frequency and Prevalence by Race/Ethnicity Variables
To illustrate the potential impact of race/ethnicity misclassification on the estimated size of health disparities and disease prevalence, we calculated the number of beneficiaries with dementia and diabetes using each of the 3 race/ethnicity variables. When comparing the numbers of people with a diagnosis of dementia or diabetes, the largest net differences were among the Hispanics, Asians/Pacific Islanders, and AIAN (Table 4). The net difference is important for study designs that draw their sampling frame from administrative data sources.
Using the RTI race variable (compared with OASIS) resulted in an overestimation of the number of Hispanics with dementia by a net difference of 4283 (4.8%) and diabetes by a net difference of 10,477 (5.4%). In contrast, the EDB race variable underestimated the number of Hispanics with dementia by a net difference of 48,407 (−54.8%) and diabetes by a net difference of 114,003 (−59.0%). However, the EDB race variable also produced falsely high estimates of the prevalence of dementia (34.1%) and diabetes (67.9%) in Hispanics. The RTI and OASIS race variables produced similar estimates of the prevalence of dementia (29.0%-29.6%) and diabetes (63.9%-64.9%) among Hispanics.
Among AAPIs, the number of people with dementia was underestimated by a net difference of 1853 (−6.4%) using the RTI race variable and by 6032 (−21.1%) using the EDB race variable. The pattern was similar for diabetes in AAPIs, which was underestimated by a net difference of 4391 (−8.2%) using the RTI race variable, and 12,113 (−22.6%) using the EDB race variable. When the prevalence of dementia and diabetes was calculated for AAPIs using each of the race/ethnicity variables, the pattern was similar to that seen for Hispanics, with EDB race overestimating chronic disease burden. The prevalence of dementia among AAPIs was 32.1%-32.6% using the RTI and OASIS variables, compared with 34.1% using EDB race. For diabetes, the prevalence among AAPIs was 59.9%-60.2% using the RTI and OASIS race variables, and 62.7% using EDB race. Full results are shown in Table 4.

DISCUSSION
If we believe the self-reported race is truly a "gold standard," we must consider more than overall accuracy (κ statistic > 0.81) and high specificity. Paraphrasing Statalist (statalist.org) expert Clyde Schechter, let us use a simple example: Lou Gehrig's disease or amyotrophic lateral sclerosis (ALS) is a very rare motor neuron disease. If a "test" to diagnose ALS simply results in everyone "not having it," that test will have high specificity, giving the correct answer for well over 99.9% of the population. However, it is useless for finding people who actually have ALS. To be useful, you really need to consider 2 different measures of validity: (1) sensitivity: the proportion of people who are positive under the gold standard who also test positive, and (2) specificity: the proportion of people who are negative under the gold standard who also test negative. Referring to Clyde's phony "test" for ALS, the test would have a specificity of 100% but a sensitivity of 0%. Evaluation of tests or measures for which a gold standard exists usually requires looking at both the sensitivity and specificity.
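The ALS thought experiment can be worked through numerically. The population size and case count below are invented to mimic a very rare condition and are assumptions of this sketch, not epidemiologic estimates.

```python
# Hypothetical population; the counts are invented for illustration only.
population = 1_000_000
with_condition = 50

# The phony "test" labels everyone negative, so nobody tests positive.
true_pos, false_pos = 0, 0
false_neg = with_condition
true_neg = population - with_condition

sensitivity = true_pos / (true_pos + false_neg)  # share of true cases detected
specificity = true_neg / (true_neg + false_pos)  # share of non-cases correctly cleared
accuracy = (true_pos + true_neg) / population    # share of all answers that are correct

print(f"sensitivity={sensitivity:.1%} specificity={specificity:.1%} accuracy={accuracy:.3%}")
```

Under these assumed counts the test is correct for more than 99.9% of people while detecting none of the true cases, which mirrors the situation of a race variable with high specificity but low sensitivity for a small group such as AIANs.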
Consistent with prior studies, we found the EDB and RTI race variables contained in Medicare administrative data undercount Hispanics, AAPIs, and AIANs (summarized in Table 5). 17,23,24 Although advances have been made in the Medicare Bayesian Improved Surname and Geocoding (MBISG 2.0) algorithm used to calculate racial and ethnic differences in Healthcare Effectiveness Data and Information Set (HEDIS) measures, [25][26][27] the accuracy statistics are reported as cross-validated Pearson correlations with self-report, in the form of probabilities, precluding direct comparison with current and prior studies listed in Table 5.
From a methodological standpoint, the choice of race/ethnicity data source is essential at the study design stage for health disparities research. The impact of race/ethnicity variable selection on estimates of disease prevalence is of special concern, as we found in the case of dementia prevalence among Hispanics shown in Table 4. When using the EDB race variable, the prevalence of dementia among Hispanics is 18% higher compared with when the RTI race variable is used, with an absolute difference of just over 5 percentage points. A smaller difference (1.5 percentage points) is seen for AAPIs, with virtually no difference for non-Hispanic whites, blacks, and AIANs. Compared with the EDB race variable, if the RTI variable were a "race-specific" antidementia drug for Hispanics, then it would be a blockbuster.
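The distinction between the relative and absolute differences quoted above can be made explicit with a short calculation. The 34.1% EDB figure comes from the Results; treating 29.0% as the RTI estimate is an assumption, since the Results report 29.0%-29.6% for RTI and OASIS together. The function name is hypothetical.

```python
def compare_prevalence(p_a: float, p_b: float) -> tuple:
    """Return (absolute difference in percentage points, relative difference in %)."""
    absolute = p_a - p_b
    relative = 100 * absolute / p_b
    return absolute, relative


# Dementia prevalence among Hispanics: 34.1% with EDB race (from the Results).
# 29.0% is assumed here to be the RTI estimate (the text gives a 29.0%-29.6% range).
abs_pp, rel_pct = compare_prevalence(34.1, 29.0)
print(f"absolute: {abs_pp:.1f} percentage points, relative: {rel_pct:.0f}% higher")
```

Under that assumption the absolute gap is 5.1 percentage points and the relative gap rounds to 18%, matching the figures in the paragraph above.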
For AAPI populations, our study findings have additional significance. Asian Americans/Pacific Islanders are the fastest-growing population in the United States, while being the most heterogeneous. Certain AAPI subgroups, such as Filipinos, may be more prone to misclassification using surname-based imputation methods because of the long history of Spanish colonization in the Philippines. Similarly, the Republic of China (Taiwan) was colonized by the Dutch and Spanish; India was colonized by the Portuguese, Dutch, and British; and Vietnam, Laos, and Cambodia were colonized by the French. In addition, interracial/intercultural marriages frequently result in women changing their last name to that of their husband's family, such that a woman who marries a Filipino-American man might be classified as Hispanic using name-based race algorithms.
Although the self-reported race/ethnicity data should always be the first choice, we found the RTI race variable to be very accurate for identifying Hispanics (κ = 0.89 for male individuals; κ = 0.85 for female individuals) and non-Hispanic whites (κ = 0.90) or blacks (κ = 0.96) of either sex (Table 1). For more granular analyses, and especially research that aims to disentangle race/ethnicity and socioeconomic status, a higher level of accuracy may be desired. Researchers who are working with linked administrative and assessment datasets should report racial/ethnic differences on the basis of the self-reported race variable. Reviewers and journal editors should question the source of race/ethnicity data and critically examine the rationale for research which uses the EDB race variable, as it is inappropriate for use beyond studies of black/white disparities. Similarly, studies of nursing home or home health patients should not use the EDB or RTI race variable, as self-reported race collected in the MDS and OASIS assessments is the gold standard. Finally, future advances in race/ethnicity imputation algorithms at CMS should include and augment self-reported race/ethnicity data from both survey (MCBS, HOS, and CAHPS) and assessment (OASIS and MDS) data sources.
This study has several limitations. First, the study population consisted only of Medicare beneficiaries who utilized home health care in the calendar year 2015. Second, blacks are slightly over-represented in the home health care population compared with the full Medicare population (estimated with the RTI race variable). Third, some older adults, especially AAPIs and Hispanics, may retire or seek supportive care outside of the United States, limiting their access and use of the Medicare home health care benefit, and the generalizability of findings for Medicare beneficiaries living outside the United States. Finally, AIANs who live on tribal reservations may be under-represented, in contrast to people who self-identify as American Indian but are not registered tribal members.
In conclusion, administrative datasets are commonly used in reports and studies of minority health and health disparities. Our study highlights the potential for bias and error introduced during the selection of race/ethnicity data source. Our work confirms the advantages of using the RTI race variable compared with EDB race variable. We also show that further reductions in error and bias can be gained by using self-reported race/ethnicity contained in assessment datasets. These findings have important implications for the design of future studies and the interpretation of prior published research on minority health and health disparities. Future work to improve imputation algorithms for Medicare beneficiaries' race/ethnicity should incorporate self-reported race/ethnicity data that is contained in assessment (eg, MDS, OASIS, IRF-PAI, HIS, and HOS) and survey data (CAHPS) to augment existing data sources (EDB and RTI).
ACKNOWLEDGMENT Authors would like to acknowledge Tina Dharamdasani and Julia Kang for assistance with preparation of revised manuscript, tables, and figure.

APPENDIX
Description and discussion of Medicare beneficiaries who self-identified with two or more race/ethnicity groups during home health assessment (OASIS dataset).
In this supplemental analysis, we focused on the 11,720 beneficiaries who self-identified with two or more racial/ethnic groups during their home health care assessment and were excluded from the main analysis. While this represents a very small fraction (0.28%) of the 4,243,090 Medicare beneficiaries who received home health care in 2015, the number of individuals with multi-racial/ethnic identities is rapidly growing. 28 Of these, 289 people (0.007%) identified with more than two races.
Researchers should be aware of this issue and of methods for classifying and modeling individuals who self-report multiple races/ethnicities. 28 For example, of the 4,568 Hispanic individuals who self-reported two races/ethnicities in OASIS, the corresponding RTI race/ethnicity variable correctly classified 3,194 (70%) as Hispanic but missed/undercounted 1,374 (30%). Additionally, among people who self-identified in OASIS as American Indian or Alaskan Native (AIAN), nearly one-sixth also identified with another race/ethnicity (2,919/18,891).