Fracture risk assessment in the presence of competing risk of death

Study design and participants

We used data from the Dubbo Osteoporosis Epidemiology Study, whose design and protocols have been described in detail elsewhere [17]. Briefly, all community-dwelling women and men aged 60 years or older as of 30 June 1989 and living in Dubbo City, New South Wales, Australia, were invited to participate through the electoral roll and a media campaign. The entire Dubbo region is served by only one hospital and three radiology services. This centralized healthcare system, together with a geographically isolated community, allows complete ascertainment of all fractures and deaths among people aged 60 years or older in the whole Dubbo region, making censoring minimal [18]. The study was approved by St. Vincent’s Hospital Human Research Ethics Committee, New South Wales, Australia (HREC reference number: 13/254) and carried out according to the Australian National Health and Medical Research Council Guidelines, consistent with the Declaration of Helsinki. All participants provided written informed consent.

Visits were conducted biennially for a detailed and ongoing assessment of bone health. At recruitment and at each visit, a nurse coordinator administered a structured questionnaire to obtain anthropometric data, lifestyle factors, the number of falls during the previous 12 months, any prior fracture after the age of 50 years, chronic health disorders and prescribed medications. Bone mineral density (BMD) was measured at the lumbar spine and femoral neck by dual-energy X-ray absorptiometry (Lunar DPX-L; GE-Lunar).

Outcome assessments

The X-ray reports from all three radiology services for the entire Dubbo area were reviewed regularly to identify incident fractures occurring from recruitment onward. The circumstances surrounding each fracture were determined by a phone call after the fracture. The analysis included only minimal-trauma fractures, i.e., those resulting from trauma equivalent to or less than a fall from standing height. Fractures resulting from high trauma or from underlying diseases (e.g., cancer or Paget disease) and fractures of the digits, skull, or cervical spine were excluded. All deaths in the region were identified from funeral lists and obituary reviews, with verification from the State Registry of Births, Deaths, and Marriages.

Statistical analysis

Because all models included the same five predefined predictors [8] and the study aimed to quantify predictive performance in a validation cohort, the study population was randomly split into a development cohort (60%) and a validation cohort (40%) [19].
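
A random 60/40 split of this kind can be sketched as follows. This is a minimal illustration only: the function name `split_cohort`, the fixed seed and the example participant IDs are assumptions made for the sketch, not details reported for the study.

```python
import random


def split_cohort(participant_ids, development_fraction=0.6, seed=2024):
    """Randomly split participant IDs into development and validation cohorts.

    The seed is an arbitrary, hypothetical value used only to make the
    illustrative split reproducible.
    """
    ids = list(participant_ids)
    rng = random.Random(seed)  # private generator, leaves global state untouched
    rng.shuffle(ids)
    n_development = round(development_fraction * len(ids))
    return ids[:n_development], ids[n_development:]


# Hypothetical cohort of 1000 participants labelled 1..1000.
development, validation = split_cohort(range(1, 1001))
print(len(development), len(validation))
```

Every participant ends up in exactly one of the two cohorts, so models fitted in the development cohort never see the validation data.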

First, in the development cohort, we fitted four regression models that handle the competing risk of death in different ways: (i) the conventional Cox proportional hazards model, (ii) the cause-specific hazard model, (iii) the Fine-Gray sub-distribution hazard model, and (iv) the multistate model. The conventional model estimates the risk of fracture by right-censoring the competing death (i.e., death without a fracture), whereas the cause-specific hazard, Fine-Gray and multistate models apply different methods to account for the competing death (Supplemental Methods).

Briefly, the conventional approach models fracture risk under the assumption that individuals who die without a fracture would have had the same fracture risk as those who remain under follow-up, i.e., that the occurrence of fracture is independent of the occurrence of death without a fracture (Figure S1A). By contrast, the cause-specific hazard approach, as the name implies, models the cause-specific hazards for fracture and for death without a fracture separately and then combines the coefficients of the two models to obtain a valid estimate of the cumulative hazard for fracture (Figure S1B) [11, 12]. The Fine-Gray method treats individuals who have died without a fracture as if they were still at risk of fracture, representing “immortal” time, but assigns them gradually decreasing weights when modeling fracture risk (Figure S1C) [14]. Finally, the multistate model treats fracture and death without a fracture as two separate “states” while taking their complex inter-correlation into account (Figure S1D) [15]. Whereas the other approaches compute the cumulative incidence of fracture, the multistate model estimates the transition risk from the “event-free” state to the “fracture” state at a particular time t, which is technically the fraction of individuals with a fracture at time t.
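
The practical consequence of censoring the competing death can be illustrated without any covariates. The sketch below is not one of the four regression models: it compares two standard nonparametric estimators on hypothetical toy data, namely one minus the Kaplan-Meier estimate with deaths treated as censored, and the Aalen-Johansen cumulative incidence that accounts for deaths. The function name and the status codes (0 = censored, 1 = fracture, 2 = death without a fracture) are assumptions made for this illustration.

```python
def fracture_risk_estimates(times, statuses, horizon):
    """Return (naive risk, cumulative incidence) of fracture by `horizon`.

    naive risk           : 1 - Kaplan-Meier, censoring deaths without fracture
    cumulative incidence : Aalen-Johansen estimate with death as competing event
    statuses             : 0 = censored, 1 = fracture, 2 = death without fracture
    """
    event_times = sorted(
        {t for t, s in zip(times, statuses) if s != 0 and t <= horizon}
    )
    km_survival = 1.0           # fracture-free survival, deaths censored
    event_free = 1.0            # probability of being alive and fracture-free
    cumulative_incidence = 0.0
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        fractures = sum(1 for u, s in zip(times, statuses) if u == t and s == 1)
        deaths = sum(1 for u, s in zip(times, statuses) if u == t and s == 2)
        # A fracture at t can only occur among those still event-free before t.
        cumulative_incidence += event_free * fractures / at_risk
        event_free *= 1.0 - (fractures + deaths) / at_risk
        km_survival *= 1.0 - fractures / at_risk
    return 1.0 - km_survival, cumulative_incidence


# Hypothetical toy data: ten individuals, follow-up in years.
toy_times = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
toy_statuses = [2, 1, 2, 1, 2, 1, 2, 1, 0, 0]
naive, cif = fracture_risk_estimates(toy_times, toy_statuses, horizon=8)
print(f"naive 1-KM: {naive:.3f}, cumulative incidence: {cif:.3f}")
```

In this toy data set four of the ten individuals fracture within 8 years, and the cumulative incidence estimate matches that proportion, whereas the estimate that censors deaths is higher because it implicitly keeps the deceased at risk of fracture.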

Follow-up time was calculated from the recruitment date to the date of fracture for individuals who sustained a fracture, to the date of death for those who died without a fracture, and to the date of the last visit or 30 June 2018, whichever came first, for those who remained alive and fracture-free. To allow cross-comparison of their predictive performance, all four models used the same fracture predictors: sex, age, femoral neck BMD, the presence of falls during the last 12 months and the presence of a fracture after the age of 50 years but before study entry [8]. These predictor variables had no missing data. The proportional hazards assumption was checked graphically using the Schoenfeld residuals [20].
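
The follow-up rule above can be written as a small function. This is a sketch under stated assumptions: the function name, the status codes (1 = fracture, 2 = death without a fracture, 0 = censored) and the conversion of days to years by dividing by 365.25 are illustrative choices, not details taken from the study; only the 30 June 2018 cut-off comes from the text.

```python
from datetime import date

STUDY_END = date(2018, 6, 30)  # administrative end of follow-up given in the text


def follow_up(recruited, fracture=None, death=None, last_visit=None):
    """Return (follow-up in years, status) for one participant.

    status: 1 = fracture, 2 = death without a fracture, 0 = censored
    (hypothetical coding chosen for this sketch).
    """
    if fracture is not None:
        end, status = fracture, 1
    elif death is not None:
        end, status = death, 2
    else:
        # Alive and fracture-free: last visit or study end, whichever is first.
        end = STUDY_END if last_visit is None else min(last_visit, STUDY_END)
        status = 0
    years = (end - recruited).days / 365.25  # assumed day-to-year conversion
    return years, status


# Hypothetical participants.
print(follow_up(date(1990, 1, 1), fracture=date(1995, 1, 1)))
print(follow_up(date(1990, 1, 1), death=date(2001, 7, 15)))
print(follow_up(date(1990, 1, 1), last_visit=date(2019, 3, 1)))
```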

Second, we quantified the predictive accuracy of the four regression models in the validation cohort using discrimination and calibration analyses, both of which have been widely employed to validate existing fracture risk assessment tools for predicting fracture at clinically relevant time points [7]. Specifically, we examined the predicted absolute risks of fracture at 5 and 10 years of follow-up, with the primary focus on the 10-year risk, which is widely used in clinical practice to identify high-risk individuals [5, 6, 21].

Discrimination was primarily quantified using Harrell’s concordance (C) index [22], with a value closer to 1 indicating better discrimination. Harrell’s C index was calculated separately for each of the four models of interest. As the primary calibration measure, we used a flexible calibration curve with confidence limits for predicted group categorization [23]. This curve assesses moderate calibration, a level that has been shown to be realistic in epidemiologic research and is considered a pragmatic guarantee that decision-making based on the model is not clinically harmful [24]. The calibration curve was constructed for centiles of predicted fracture risk; the closer the curve of observed fracture rates against predicted fracture risks lies to the line of perfect prediction, the better the calibration. The predicted fracture risks were estimated from the prediction models for each participant in the validation cohort at a single time point, whereas the observed fracture rates were computed as the number of participants who sustained a fracture up to that time point divided by the total number of participants in each centile of predicted risk [24]. The calibration curve with its corresponding 95% confidence interval (CI) was then drawn with the average predicted fracture risk in each centile on the x-axis and the observed fracture rate in the same centile on the y-axis [24].

For quantitative comparison of calibration performance across the models, we reported the “calibration-in-the-large” index, which quantifies the overall difference between the average observed event rate and the average predicted risk [25], and the estimated calibration index, which summarizes a flexible calibration curve into a single value [26]. Ideally, the calibration-in-the-large index is zero. The prediction of fracture risk was considered accurate if the average predicted risk was not significantly different from the average observed fracture rate (i.e., the 95% CI of the calibration-in-the-large index included the reference value of zero) [25]. Similarly, the estimated calibration index, calculated as the average squared difference between the predicted risk and the observed event rate, is zero if the flexible calibration curve is perfect. The estimated calibration index has thus been recommended as a valid measure for easily comparing calibration performance across different prediction models [26]. Other secondary measures of the models’ discrimination and calibration performance were also reported (Table S1).
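
The performance measures described above can be sketched in a few functions. This is a simplified illustration rather than the analysis code: the function names are hypothetical, pairs with tied follow-up times are skipped in the concordance calculation, the outcome is treated as a binary indicator of fracture by the time point of interest (ignoring censoring), and the observed rate in each risk group stands in for the smoothed flexible calibration curve when computing the squared-difference index. Ten groups are used by default; `n_groups=100` would correspond to centiles.

```python
def harrell_c(times, events, risks):
    """Concordance index: a pair is usable if the earlier time is a fracture."""
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue
        for j in range(len(times)):
            if times[j] > times[i]:  # j was still fracture-free when i fractured
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    return concordant / comparable


def calibration_in_the_large(observed, predicted):
    """Average observed event rate minus average predicted risk (ideal: 0)."""
    return sum(observed) / len(observed) - sum(predicted) / len(predicted)


def grouped_calibration(observed, predicted, n_groups=10):
    """Return calibration points and a binned squared-difference index.

    points : list of (mean predicted risk, observed event rate) per risk group
    index  : average squared difference between each predicted risk and the
             observed rate of its group (binned stand-in for a smoothed curve)
    """
    order = sorted(range(len(predicted)), key=lambda k: predicted[k])
    n = len(order)
    points, squared_error = [], 0.0
    for g in range(n_groups):
        group = order[g * n // n_groups:(g + 1) * n // n_groups]
        if not group:
            continue  # more groups than participants
        mean_predicted = sum(predicted[k] for k in group) / len(group)
        observed_rate = sum(observed[k] for k in group) / len(group)
        points.append((mean_predicted, observed_rate))
        squared_error += sum((predicted[k] - observed_rate) ** 2 for k in group)
    return points, squared_error / n


# Hypothetical validation data: follow-up years, fracture indicator, predicted risk.
times = [1, 2, 3, 4, 5]
events = [1, 1, 1, 1, 0]
risks = [0.9, 0.8, 0.7, 0.6, 0.5]
print(harrell_c(times, events, risks))
print(calibration_in_the_large([1, 0, 0, 1], [0.5, 0.5, 0.5, 0.5]))
print(grouped_calibration([0, 0, 1, 1], [0.1, 0.1, 0.9, 0.9], n_groups=2))
```

Plotting the returned points against the diagonal gives a grouped version of the calibration plot; a curve lying on the diagonal corresponds to an index of zero.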

We conducted a sensitivity analysis that mimicked the “standard” data collection in a conventional longitudinal study. The sensitivity analysis therefore censored the follow-up time at the date of the last visit and included only outcome events (i.e., fracture or death) occurring at or before the “hypothetical” last visit, i.e., the visit that a participant would have attended had he or she neither died nor been lost to follow-up (Figure S2).

The analyses were performed using the R statistical environment on a Windows platform (R-4.0.2) [27]. This study followed the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) reporting guideline.
