Identifying thresholds for meaningful improvements in NTDT-PRO scores to support conclusions about treatment benefit in clinical studies of patients with non-transfusion-dependent beta-thalassaemia: analysis of pooled data from a phase 2, double-blind, placebo-controlled, randomised trial

Statistical analyses were conducted using SAS V.9.4 or higher (SAS Institute). Analyses were performed using blinded data for all randomised participants.

Clinically meaningful within-patient threshold for improvement

Consistent with FDA guidance, an anchor-based analysis was used as the primary approach to estimate clinically meaningful within-patient improvement in T/W and SoB scores.17 The anchor-based approach uses an external criterion to categorise patients into a priori-determined groups with different levels of self-reported treatment response (eg, improvement, no change, worsening). Appropriate anchors should be plainly understood, assess concepts similar to the concept measured by the target assessment (the NTDT-PRO in this case) and correlate sufficiently with the target PRO measure (correlation coefficient ≥0.3).

The use of multiple anchors is recommended by the FDA.17 Thus, the following clinical and PRO measures used in BEYOND alongside the NTDT-PRO were evaluated for their suitability as anchors for this analysis: haemoglobin level, PGI-S, PGI-C, FACIT-F Fatigue Subscale (FS), FACIT-F item HI7 (‘I feel fatigued the past 7 days’), FACIT-F item HI12 (‘I feel weak all over the past 7 days’), FACIT-F item An2 (‘I feel tired the past 7 days’), FACIT-F item An5 (‘I have energy the past 7 days’), SF-36v2 vitality, SF-36v2 item 9e (‘How much of the time during the past week did you have a lot of energy?’), SF-36v2 item 9g (‘How much of the time during the past week did you feel worn out?’) and SF-36v2 item 9i (‘How much of the time during the past week did you feel tired?’). Haemoglobin level was chosen because it is a well-established clinical outcome in NTDT and was used to define the primary efficacy endpoint in BEYOND.2 16 23 PGI-S and PGI-C, which measure the severity of overall NTDT-related symptoms and the change in overall symptoms, respectively, are anchors recommended by the FDA.17 The FACIT-F FS and SF-36v2 vitality scores are PRO domain scores that measure concepts related to the NTDT-PRO T/W domain and have previously established clinically meaningful within-patient change thresholds, described below. Finally, FACIT-F items HI7, HI12, An2 and An5, and SF-36v2 items 9e, 9g and 9i were chosen because they are single-item Verbal Rating Scales (VRSs), each measuring concepts similar to those targeted by the NTDT-PRO T/W domain and having response options that could be easily interpreted to indicate different levels of change.

Spearman’s rank correlation coefficients were calculated between changes in T/W and SoB domain scores from baseline to weeks 13–24 and changes in the 12 potential anchors over the same period. The exception was PGI-C, which is already a measure of change from the start of the study, so its absolute score at weeks 13–24 was used in the correlation calculations. The five potential anchors with the highest correlation coefficients (absolute value ≥0.3) with both NTDT-PRO T/W and SoB domains were chosen for the anchor-based analyses.24 25 Patients were then categorised by level of response on each of the five chosen anchors. For each level of response, descriptive statistics on the change in NTDT-PRO T/W and SoB scores were generated, together with the corresponding empirical cumulative distribution function (eCDF) and probability density function (PDF, using the kernel density estimator) curves. Levels of response were defined (see online supplemental table S1) based on the clinically meaningful within-patient improvement threshold on the anchors (for continuous scales), and their meaningfulness was confirmed on inspection of the eCDF curves. Meaningful improvement on the PGI-S was defined as a decrease of 1 point, and 4-point and 6.7-point increases on the FACIT-F FS and SF-36v2 vitality domains, respectively, were chosen to reflect meaningful improvements based on the findings by the instruments’ developers.20 26 For each of the FACIT-F and SF-36v2 VRS items included as anchors, a 1-point change (ie, one level change on the VRS) was defined as a meaningful change (also confirmed on inspection of the eCDF curves).
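The anchor screening step can be illustrated with a minimal Python sketch. It is not the SAS code used in the study: the function names (`average_ranks`, `spearman`, `screen_anchors`) and the example data are hypothetical, and the sketch screens anchors against a single domain, whereas the analysis required the correlation criterion to hold for both the T/W and SoB domains (which could be done by running the screen once per domain and keeping the anchors common to both).

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined when one variable is constant
    return cov / (vx * vy) ** 0.5


def screen_anchors(target_change, anchor_changes, min_abs_rho=0.3, top_n=5):
    """Keep up to top_n anchors with |rho| >= min_abs_rho, strongest first."""
    rhos = [(name, spearman(target_change, vals))
            for name, vals in anchor_changes.items()]
    kept = [(name, rho) for name, rho in rhos if abs(rho) >= min_abs_rho]
    return sorted(kept, key=lambda item: abs(item[1]), reverse=True)[:top_n]


# Hypothetical change scores for five patients (illustrative only).
tw_change = [1, 2, 3, 4, 5]
candidates = {
    "anchor_a": [5, 4, 3, 2, 1],
    "anchor_b": [3, 1, 4, 2, 5],
    "anchor_c": [2, 5, 1, 4, 3],
}
selected = screen_anchors(tw_change, candidates)
```

The absolute value of rho is used for screening because some anchors are scored in the opposite direction to the NTDT-PRO domains.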

Distribution-based estimates, suggested as a supportive approach by the FDA, were the SE of measurement (SEM, estimated using the method provided in online supplemental material 1) and half of the baseline SD of the NTDT-PRO T/W and SoB scores.17 27–30
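As a rough illustration of the two distribution-based quantities, the Python sketch below computes half of the baseline SD and an SEM. The exact SEM method used in the study is given in the supplemental material and is not reproduced here; the sketch assumes the commonly used formula SEM = SD × √(1 − reliability), and the function name, the reliability value and the scores are all hypothetical.

```python
from math import sqrt
from statistics import stdev


def distribution_based_estimates(baseline_scores, reliability):
    """Return half the baseline SD and the SEM for one domain.

    Assumption: SEM = SD * sqrt(1 - reliability), where reliability is a
    coefficient such as a test-retest ICC. The study's own method may differ.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    sd = stdev(baseline_scores)  # sample SD (n - 1 denominator)
    return {"half_sd": 0.5 * sd, "sem": sd * sqrt(1.0 - reliability)}


# Hypothetical baseline domain scores and reliability (illustrative only).
estimates = distribution_based_estimates([1, 2, 3, 4, 5], reliability=0.75)
```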

Mean and median changes in the NTDT-PRO T/W and SoB domain scores, obtained from the a priori-determined anchor group with the level(s) of improvement deemed to be meaningful (guided by the eCDF and PDF curves), were considered when triangulating the final clinically meaningful within-patient improvement thresholds. Estimates from the receiver operating characteristic (ROC) curve analyses and distribution-based analyses were considered supportive in determining the thresholds, or ranges of thresholds, for each NTDT-PRO domain.

Finally, to assess the appropriateness of the newly derived meaningful improvement thresholds, the percentages of patients who would be considered responders on the NTDT-PRO T/W and SoB domains when applying the thresholds were calculated among those patients who achieved an average change in haemoglobin of ≥1.0 g/dL (and ≥1.5 g/dL) from baseline to weeks 13–24.
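The responder calculation can be sketched in Python as follows. This is an illustration under stated assumptions, not the study code: the function name and data are hypothetical, the threshold value is a placeholder rather than a study result, and improvement is assumed to correspond to a decrease in the NTDT-PRO score (ie, higher scores indicate worse symptoms).

```python
def responder_rate(pro_changes, hb_changes, pro_threshold, hb_threshold=1.0):
    """Percentage of NTDT-PRO responders among haemoglobin responders.

    pro_changes: change from baseline in a domain score, one per patient.
    hb_changes: mean change in haemoglobin (g/dL) for the same patients.
    A patient counts as a PRO responder if the score fell by at least
    pro_threshold (assumes lower scores mean fewer symptoms).
    """
    subgroup = [p for p, h in zip(pro_changes, hb_changes) if h >= hb_threshold]
    if not subgroup:
        return None  # no patients met the haemoglobin criterion
    responders = sum(1 for p in subgroup if p <= -pro_threshold)
    return 100.0 * responders / len(subgroup)


# Hypothetical data for four patients (illustrative only).
pro = [-2.0, -0.5, -1.0, 0.0]
hb = [1.2, 1.6, 0.4, 1.0]
rate_hb_1_0 = responder_rate(pro, hb, pro_threshold=1.0, hb_threshold=1.0)
rate_hb_1_5 = responder_rate(pro, hb, pro_threshold=1.0, hb_threshold=1.5)
```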

Symptomatic threshold

To estimate the symptomatic threshold for the NTDT-PRO T/W domain score, ROC analysis was performed. All potential NTDT-PRO T/W score thresholds were assessed for their accuracy at classifying symptomatic and less/asymptomatic participants, as defined using FACIT-F FS (comprising 13 items specific to fatigue) and SF-36v2 vitality (comprising questions about patients’ perception of their energy levels and tiredness). These scales were selected as their concepts overlap with those that the NTDT-PRO T/W aims to capture (ie, T/W). Patients with FACIT-F FS score <43 or SF-36v2 vitality score <45 were defined as more symptomatic and those with FACIT-F FS score ≥43 or SF-36v2 vitality score ≥45 as less/asymptomatic.20 32

ROC analyses were conducted using pooled assessments from baseline and weeks 6, 12, 18 and 24. Area under the curve (AUC) values of 0.5, 0.7 and 1.0 were taken to indicate no diagnostic ability (ie, similar to random guessing), good diagnostic ability and perfect diagnostic accuracy, respectively. As in the analysis to estimate the meaningful within-patient improvement threshold, the NTDT-PRO T/W score that maximised Youden’s index was identified as the optimal cut-off above which scores indicate symptomatic disease. Only cut-off values from ROC analyses with AUC ≥0.70, indicating good performance, were considered.33 34
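A minimal Python sketch of this cut-off selection is given below. It is an illustration rather than the study code: the function name and data are hypothetical, a score at or above the cut-off is assumed to classify a participant as symptomatic, and the AUC is computed with the rank-based (Mann-Whitney) formulation.

```python
def roc_youden(scores, symptomatic, min_auc=0.70):
    """Return (auc, best_cutoff, youden_j, acceptable) for one anchor.

    scores: NTDT-PRO T/W scores (assumed higher = more symptomatic).
    symptomatic: booleans from the anchor definition (True = symptomatic).
    """
    pos = [s for s, y in zip(scores, symptomatic) if y]
    neg = [s for s, y in zip(scores, symptomatic) if not y]
    if not pos or not neg:
        raise ValueError("both groups must contain at least one assessment")

    # AUC: probability that a symptomatic score exceeds a non-symptomatic
    # score, counting ties as one half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg))

    # Evaluate every observed score as a candidate cut-off.
    best_cut, best_j = None, -1.0
    for cut in sorted(set(scores)):
        sensitivity = sum(p >= cut for p in pos) / len(pos)
        specificity = sum(n < cut for n in neg) / len(neg)
        j = sensitivity + specificity - 1.0  # Youden's index
        if j > best_j:
            best_cut, best_j = cut, j

    return auc, best_cut, best_j, auc >= min_auc


# Hypothetical pooled assessments (illustrative only).
tw_scores = [1, 2, 3, 4, 5, 6]
is_symptomatic = [False, False, False, True, True, True]
auc, cutoff, youden_j, acceptable = roc_youden(tw_scores, is_symptomatic)
```

When several cut-offs give the same Youden's index, this sketch keeps the lowest one; other tie-breaking rules are equally defensible.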
