Current Challenges of Using Patient-Level Claims and Electronic Health Record Data for the Longitudinal Evaluation of Duchenne Muscular Dystrophy Outcomes

Data Sources

Two sources of “closed” insurance claims data were used; closed claims are derived from individual payers and include all relevant records of healthcare encounters for a given individual. The first source was Merative’s MarketScan Commercial databases, a set of large, nationally representative healthcare databases with data for employer-sponsored, privately insured employees and their families. The second was Merative’s MarketScan Multistate Medicaid claims databases, which include demographic and clinical information, inpatient and outpatient utilization data, and outpatient prescription data for 17 million individuals enrolled in Medicaid across multiple states in the USA. Clarivate, an “open” claims database in which records of healthcare encounters are derived from numerous sources including EHRs, was used as a third data source; it contains at least one claim for 200 million patients across inpatient, outpatient, or pharmacy settings [29]. The MarketScan and Clarivate data used for this study were de-identified and did not require institutional review board review. The authors obtained permission to access and use the data from the owners of the MarketScan and Clarivate databases.

Patient Selection

The study period was defined as April 1, 2013, through March 31, 2018 (MarketScan Commercial); January 1, 2013, through June 30, 2018 (MarketScan Medicaid); and January 1, 2011 through December 14, 2021 (Clarivate). The index date was defined as the date when the first eligible inpatient or outpatient visit with a relevant diagnostic code or medication prescription record appeared in the datasets during their respective study periods. More detailed information on diagnostic codes can be found in the Supplementary Materials section.
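
The index date rule above can be illustrated with a minimal sketch. The dataset labels and the function name are hypothetical and chosen only for illustration; the study period boundaries are those stated in the text, and the sketch assumes the eligible record dates have already been extracted.

```python
from datetime import date

# Study periods as stated in the text, keyed by hypothetical dataset labels.
STUDY_PERIODS = {
    "marketscan_commercial": (date(2013, 4, 1), date(2018, 3, 31)),
    "marketscan_medicaid": (date(2013, 1, 1), date(2018, 6, 30)),
    "clarivate": (date(2011, 1, 1), date(2021, 12, 14)),
}


def index_date(dataset, eligible_record_dates):
    """Return the earliest eligible record date that falls inside the
    dataset's study period, or None if no record falls inside it."""
    start, end = STUDY_PERIODS[dataset]
    in_window = [d for d in eligible_record_dates if start <= d <= end]
    return min(in_window) if in_window else None
```

Records dated before the start of a study period are ignored rather than clipped, so a patient whose only eligible records precede the window has no index date in that dataset.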

Patients were included if they were male, 30 years of age or younger, and met at least one of the following criteria at any time during the study period:

International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9-CM) diagnostic code for hereditary progressive muscular dystrophy (359.1)

ICD-10-CM code for muscular dystrophy (G71.0 or, if on or after October 1, 2018, G71.01) in claims filed on or after October 1, 2015

Systematized Nomenclature of Medicine (SNOMED) code for DMD (76670001) in the Clarivate dataset
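
As a rough illustration, the inclusion criteria could be expressed as the following sketch. It reflects one reading of the criteria and is not the study's actual implementation: G71.0 is accepted on or after October 1, 2015, and G71.01 only on or after October 1, 2018. The function names, the tuple layout, and the point at which age is measured are assumptions, and the separate rule for patients with PMO therapy records is not modeled here.

```python
from datetime import date

ICD10_START = date(2015, 10, 1)   # ICD-10-CM codes accepted from this date
G7101_START = date(2018, 10, 1)   # G71.01 accepted from this date


def is_qualifying_code(system, code, service_date, dataset):
    """Check one coded record against the diagnostic inclusion criteria."""
    if system == "ICD-9-CM":
        return code == "359.1"
    if system == "ICD-10-CM":
        if code == "G71.0":
            return service_date >= ICD10_START
        if code == "G71.01":
            return service_date >= G7101_START
        return False
    if system == "SNOMED":
        # The SNOMED criterion applies to the Clarivate dataset only.
        return code == "76670001" and dataset == "clarivate"
    return False


def meets_inclusion(sex, age_years, records, dataset):
    """records: iterable of (system, code, service_date) tuples (assumed layout)."""
    if sex != "M" or age_years > 30:
        return False
    return any(is_qualifying_code(s, c, d, dataset) for s, c, d in records)
```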

Patients were excluded if they met at least one of the following criteria:

≥ 2 medical claims for ventilator use separated by ≥ 180 days, before 6 years of age

≥ 1 medical claim with a Current Procedural Terminology (CPT) code or Healthcare Common Procedure Coding System (HCPCS) code for ankle foot orthosis or lower extremity surgery, before 3 years of age

≥ 1 medical claim for a power, power-assist, and/or manual wheelchair, before 5 years of age

≥ 1 medication fill (National Drug Code [NDC] 64406005801) or an injection code (HCPCS J2326, C9489) for nusinersen
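
The exclusion criteria above could be checked along the following lines. This is a sketch under stated assumptions: each claim is assumed to have already been mapped to a hypothetical category label ("ventilator", "afo", "lower_extremity_surgery", "wheelchair") using the CPT and HCPCS codes listed in the Supplementary Materials, the patient's age at each claim is assumed to be available, and both ventilator claims are assumed to fall before 6 years of age. Only the nusinersen codes are taken directly from the text.

```python
from datetime import date

# Nusinersen NDC and HCPCS injection codes as given in the text.
NUSINERSEN_CODES = {"64406005801", "J2326", "C9489"}


def is_excluded(claims):
    """claims: list of dicts with assumed keys 'category', 'code',
    'date' (datetime.date), and 'age' (age in years at the claim)."""
    # >= 2 ventilator claims separated by >= 180 days, before 6 years of age.
    # Some pair is >= 180 days apart exactly when the earliest and latest are.
    vent_dates = sorted(
        c["date"] for c in claims
        if c["category"] == "ventilator" and c["age"] < 6
    )
    if len(vent_dates) >= 2 and (vent_dates[-1] - vent_dates[0]).days >= 180:
        return True
    for c in claims:
        if c["category"] in ("afo", "lower_extremity_surgery") and c["age"] < 3:
            return True
        if c["category"] == "wheelchair" and c["age"] < 5:
            return True
        if c["code"] in NUSINERSEN_CODES:
            return True
    return False
```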

In addition to the above exclusions, specific exclusion criteria were applied to the Clarivate dataset. These criteria included the presence of SNOMED code 387732009 for Becker muscular dystrophy (BMD). An exception was made for patients with a BMD code who also had a record of a DMD-specific medication, so that patients with DMD-related indications were retained in the analysis even though they would otherwise have been excluded.

Patients with a record of prescribed DMD-specific phosphorodiamidate morpholino oligomer (PMO) therapies [18, 19, 30] were included in the cohort regardless of other inclusion/exclusion criteria. All included patients were followed from their index date until death (if known), deregistration, or the end of the study period.

DMD-Relevant Outcomes

A list of clinical and functional outcomes was ascertained from published systematic literature reviews (SLRs) [25, 31]. A total of 54 DMD-relevant outcomes were identified and independently assigned to one of five mutually exclusive categories: functional and clinical events, clinical measures, biomarkers, functional outcomes, and patient-reported outcomes (PROs) (Fig. 1). These categories provided a framework for assessing a dataset’s ability to capture and monitor DMD-relevant outcomes in individual patients. Data sources for clinical measures, functional outcomes, and PROs were identified from assessments or tests that measured the outcome of interest.

Fig. 1 DMD-relevant outcomes for which insurance claims and EHR data availability and suitability were examined

Each dataset was examined for data relevant to the prespecified outcomes using a comprehensive approach that accounted for structured and unstructured fields. Structured fields were defined as having data organized in a standardized, predefined format, and included ICD diagnosis and procedure codes, CPT and HCPCS procedure and/or equipment codes, and NDCs for medication dispensations. Unstructured fields were defined as having data that were not organized in a predefined manner and included clinical notes. The examination of structured and unstructured fields for available data of interest relied on direct or indirect methods depending on the DMD-relevant outcome of interest. Data were identified directly if codes or keywords were available that directly represented an outcome of interest (e.g., diagnoses of cardiomyopathy). If no such codes or keywords were available, data for an outcome of interest were identified indirectly using codes or keywords that could serve as a proxy (e.g., records of cardioprotective medication use, which could indirectly suggest that a cardiomyopathy diagnosis had occurred).
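
The distinction between direct and indirect identification can be sketched as follows. The function name and the code sets in the usage note are illustrative placeholders, not the study's code lists, which are provided in the Supplementary Material.

```python
def identify_outcome(patient_codes, direct_codes, proxy_codes):
    """Classify how an outcome is evidenced in a patient's coded records.

    Returns 'direct' if any code directly represents the outcome,
    'indirect' if only a proxy code is present, and None otherwise.
    """
    codes = set(patient_codes)
    if codes & set(direct_codes):
        return "direct"
    if codes & set(proxy_codes):
        return "indirect"
    return None
```

For example, with a hypothetical direct set containing a cardiomyopathy diagnosis code and a hypothetical proxy set containing a label for a cardioprotective medication fill, a patient with only the medication record would be classified as "indirect". Direct evidence takes precedence when both are present.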

Functional and clinical events were defined as events from which patients cannot regress once they have occurred, and included cardiomyopathy/heart failure, loss of ambulation (LOA), mortality, respiratory insufficiency, scoliosis, and wheelchair use. Cardiomyopathy and heart failure data were identified directly with ICD codes specific to each condition, and indirectly using codes for cardioprotective medications. Data for respiratory insufficiency were identified directly with ICD diagnosis codes for respiratory failure, and indirectly with HCPCS, CPT, and ICD procedural codes related to pulmonary management, tracheostomy, or assisted ventilation. Scoliosis data were identified directly using ICD diagnosis codes for scoliosis and HCPCS, CPT, and ICD procedural codes for spinal surgery. Despite the known lack of mortality data in most USA-based insurance claims datasets, mortality was recorded if indicated in the evaluated fields [32, 33]. Data availability and suitability were not assessed for LOA because billing codes cannot reliably determine full-time wheelchair use. The availability of wheelchair purchase and repair data was assessed to inform future research, but these data were considered insufficient proxies for part-time or full-time wheelchair use. Although a published cross-sectional study validated algorithms for identifying nonambulatory status in individuals with DMD, the algorithms did not determine the extent of mobility loss, the timing of LOA, or disease progression until LOA is reached [28]. One additional item was assessed that does not fall under the definition of functional and clinical events but is clinically significant for patients with DMD: ventilation use [23]. Ventilation use data were also checked to inform future research, and were identified directly using HCPCS, CPT, and ICD procedural codes for tracheostomy or assisted ventilation. For more information on codes used to identify DMD-relevant outcomes for functional and clinical events, refer to the Supplementary Material.

Data availability for clinical measures and biomarkers, which are administered clinically, was examined by identifying procedure codes that indicated an assessment or test was ordered. Structured fields in Clarivate EHR data were examined for the results of ordered tests or assessments; no similar fields for test results exist within the claims datasets. Data availability for biomarkers was assessed by searching for documentation of brain/B-type natriuretic peptide, creatine kinase (CK), or dystrophin levels. Functional outcomes, which are measured through clinically administered assessments, and PROs, which are not often administered in routine clinical practice, did not have structured fields available.

Unstructured data fields were available in the Clarivate EHR data and were examined by keyword search for information relevant to a test or assessment for clinical measures, biomarkers, functional outcomes, or PROs. The availability of data for these outcomes was assessed by reviewing information for items such as test or assessment results, panel names, test order names, and vital statistics.
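
A keyword search of unstructured text could look like the sketch below, shown here for the three biomarkers named in the text. The keyword patterns, dictionary keys, and function name are assumptions for illustration; the study's actual keyword list is not given in this section.

```python
import re

# Hypothetical keyword patterns for the biomarkers named in the text.
BIOMARKER_KEYWORDS = {
    "bnp": [r"b-type natriuretic peptide", r"brain natriuretic peptide", r"\bbnp\b"],
    "ck": [r"creatine kinase", r"\bck\b"],
    "dystrophin": [r"dystrophin"],
}


def find_biomarker_mentions(note_text):
    """Return the set of biomarker labels mentioned in a free-text note."""
    found = set()
    for name, patterns in BIOMARKER_KEYWORDS.items():
        if any(re.search(p, note_text, flags=re.IGNORECASE) for p in patterns):
            found.add(name)
    return found
```

Word boundaries (`\b`) around the short abbreviations keep "ck" from matching inside unrelated words such as "check". A mention found this way indicates only that a test was referenced, not that a result was recorded.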

Analysis Overview

Age at index date and median duration of follow-up from the index date were estimated for the cohorts from each dataset and summarized by age group using frequency and percentage of the population included in the dataset. Follow-up was summarized using median and interquartile range. Cumulative annual attrition was determined using a Kaplan–Meier curve that estimated the percentage of patients remaining in a dataset by the end of each post-index year. For each identified outcome of interest, the insurance claims and EHR data were assessed for availability and feasibility of longitudinal patient-level tracking of disease progression using the algorithm illustrated in Fig. 2. No additional statistical tests were conducted.
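
For readers unfamiliar with the Kaplan–Meier estimator, the following sketch computes the estimated percentage of patients remaining at the end of each post-index year. It assumes that the event is a patient leaving the dataset before the end of the study period and that patients still present at the end of the study period are censored; the section does not state how events and censoring were defined, and the function and argument names are hypothetical.

```python
def km_remaining(durations, left_dataset, horizons):
    """Kaplan-Meier estimate of the percentage of patients remaining.

    durations: follow-up time in years from the index date, per patient.
    left_dataset: True if the patient left the dataset (event),
                  False if follow-up was censored.
    horizons: time points (e.g., 1, 2, 3 years) at which to report.
    """
    event_times = sorted({t for t, e in zip(durations, left_dataset) if e})
    result = {}
    for h in horizons:
        surv = 1.0
        for t in event_times:
            if t > h:
                break
            # Patients still under follow-up just before time t.
            at_risk = sum(1 for d in durations if d >= t)
            events = sum(1 for d, e in zip(durations, left_dataset) if e and d == t)
            surv *= 1 - events / at_risk
        result[h] = 100 * surv
    return result
```

Each factor in the product is the share of at-risk patients who did not leave at that event time, so censored patients contribute to the denominator only while they are still under follow-up.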

Fig. 2 Algorithm used to assess the feasibility of insurance claims and EHR data to assess and track progression of DMD-relevant outcomes over time

Data relevant to the DMD outcomes of interest were also assessed for suitability for estimating disease severity and onset. Data were assessed for indicators that could distinguish more severe from less severe symptoms, as well as for indications of onset.

Finally, the overall feasibility of using available data to assess disease progression at the individual level was examined. The reporting of actual test results was required to inform whether data on the measures’ outcomes existed (as opposed to a reference to an assessment being requested or performed without the corresponding results). The prevalence of assessable measures was summarized overall for each dataset.
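
The requirement that actual results be reported can be illustrated with a short sketch. It is not the algorithm in Fig. 2, whose steps are not reproduced in this section. It only shows the distinction between a measure that is referenced (ordered or mentioned) and one that has at least one recorded result. The record layout and function names are assumptions.

```python
def assessable_measures(records):
    """Map each measure to True if at least one record carries a result.

    records: list of dicts with assumed keys 'measure' and 'result';
    'result' is None or '' when only an order or a mention exists.
    """
    status = {}
    for r in records:
        has_result = r["result"] not in (None, "")
        status[r["measure"]] = status.get(r["measure"], False) or has_result
    return status


def prevalence_assessable(status):
    """Percentage of measures with at least one recorded result."""
    return 100 * sum(status.values()) / len(status) if status else 0.0
```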
