Explainable Prediction of Long-Term Glycated Hemoglobin Response Change in Finnish Patients with Type 2 Diabetes Following Drug Initiation Using Evidence-Based Machine Learning Approaches

Introduction

The escalating prevalence of type 2 diabetes (T2D) poses a significant global health challenge, necessitating innovative approaches for personalized management and treatment. One pivotal aspect of T2D management is the monitoring of glycemic control, often assessed through the measurement of HbA1c levels. The management of T2D involves various treatment modalities, including lifestyle modifications, oral blood glucose lowering drugs or GLP-1 analogs, and insulin therapy.1 Predicting the changes in HbA1c levels is a crucial step for determining the most effective treatment for patients with T2D.

Randomized controlled trials (RCTs) serve as the gold standard for understanding the impact of treatments on clinical outcomes, and average treatment effects derived from RCTs are employed to inform evidence-based clinical decision-making for individual patients.2 However, applying population-level results to individual treatment selection may lead to suboptimal decisions, as these average treatment effects may only reflect the experiences of a specific subset of patients.3 In essence, treatment heterogeneity, a significant challenge in managing T2D, means that a one-size-fits-all approach may not be ideal for every patient. Understanding and addressing this variability is crucial for optimizing treatment strategies, as it allows for more personalized and effective care.

Precision medicine plays a pivotal role in addressing the complexities of managing T2D, particularly in relation to patient heterogeneity.4–9 By tailoring medical interventions to individual characteristics—such as genetic profiles, lifestyle choices, and clinical history—precision medicine aims to optimize treatment effectiveness while minimizing adverse effects. Machine learning (ML) enhances this approach by identifying patterns within individual-level data, capturing the diverse and often complex interactions between clinical, genetic, demographic, and treatment-related factors.

Unlike traditional statistical models that rely on predefined assumptions about data distribution or linear relationships, ML models can handle non-linear interactions and accommodate a wide range of variables. This flexibility allows ML to capture the inherent heterogeneity of T2D, which is shaped by multiple factors. By uncovering hidden patterns in patient data, ML models provide more personalized and accurate predictions of how individual patients will respond to specific treatments. Studies demonstrate that ML-based models reveal complex relationships that might be overlooked by conventional methods, enabling more tailored and effective treatment strategies for managing T2D.

The significance of tailoring treatment strategies using ML for individual patients has gained increasing recognition.10,11 However, existing predictive models used in clinical practice often fall short in capturing patient-specific variability, such as differences in treatment response, comorbidities, and demographic factors. These limitations can hinder their utility in accurately predicting long-term treatment outcomes, including changes in HbA1c levels. Accurately predicting these changes has significant clinical implications, such as enabling healthcare providers to tailor treatment strategies more effectively, minimize the risk of adverse effects, and ultimately improve overall patient outcomes.

This study leverages electronic health records (EHR), offering a more diverse and representative patient population from real-world clinical settings.12–16 Furthermore, this study models six different antidiabetic substances: metformin, GLP-1 analogs, DPP-4 (dipeptidyl peptidase 4) inhibitors, SGLT2 (sodium-glucose cotransporter 2) inhibitors, combinations of oral glucose-lowering drugs, and insulin. This broader scope enhances the generalizability and applicability of the findings to a wider range of treatment scenarios encountered in real-world clinical practice.17

This study aims to advance predictive modeling in T2D management by integrating values derived from RCTs as predictors during model training, creating an offset model that merges RCT-derived insights with real-world data. Offset modeling refers to the inclusion of external, evidence-based reference values (such as RCT outcomes) as baseline or contextual inputs in predictive models to enhance their relevance and robustness when applied to diverse real-world settings.

Furthermore, the value of these prediction models extends beyond their ability to accurately forecast changes in HbA1c levels for individual patients. It is crucial not only to provide precise predictions but also to elucidate the underlying factors driving these changes. By employing explainable artificial intelligence (XAI) techniques, this study seeks to improve the transparency and interpretability of its predictions. XAI refers to methods and tools that allow users to understand, trust, and verify the reasoning behind machine learning model predictions. These techniques offer clinicians valuable insights into the factors driving the predictions, thereby facilitating more informed and confident decision-making in T2D treatment.

This study has been supported by Next-Generation HTA (HTx, 2019–2024), which is a Horizon 2020 project supported by the European Union.

Materials and Methods

Study Design

Two distinct study designs were employed to assess the impact of antidiabetic treatment on individuals' HbA1c levels. The first design relied solely on baseline values as predictors, including the patients’ HbA1c levels before treatment initiation (baseline models). This design sought to establish a foundational understanding of the patients’ starting points, offering a snapshot of their metabolic status before intervention. By isolating baseline HbA1c values, this study aimed to discern initial patterns and trends among the patient cohort. Additionally, incorporating expected HbA1c changes derived from RCT values enriched the analysis, providing a standardized benchmark for anticipated treatment outcomes.18–25

In contrast, the second study design incorporated a follow-up measurement of HbA1c after drug initiation. This approach aimed to delve deeper into understanding how individual responses varied post-treatment (follow-up models). By utilizing the first follow-up HbA1c value post drug initiation, this study sought to elucidate the specific effects of treatment on individual patients for more precise prediction.

Data Description

Patients with T2D (ICD-10 code E11) diagnosed by the end of 2012 (N=10,139) were identified from the regional EHR of Siun sote, the Joint Municipal Authority for North Karelia Social and Health Services, Finland. The information collected covered patient-level records from both primary and specialized health care, including diagnostic and laboratory data for 2011–2019. These data were linked with medication purchase data for 1995–2019 obtained from the Finnish Prescription Register maintained by the Social Insurance Institution of Finland.

Antidiabetic drug initiations were identified from the Finnish Prescription Register. Patients initiating an antidiabetic drug in 2012–2018 as a new user, ie, with no previous use of the drug during 1995–2011, were included. The dataset included initiations of various antidiabetic drugs, such as metformin (ATC code A10BA02), GLP-1 analogs (A10BJ), DPP-4 inhibitors (A10BH), SGLT2 inhibitors (A10BK), combinations of oral blood glucose-lowering drugs (A10BD), and insulin (A10A). Each drug purchase was assumed to last for 90 days (about 3 months), and gaps of up to 90 days were allowed between the end date of the last purchase and the date of the subsequent purchase. The drug use period was considered discontinued when the gap exceeded 90 days, when another antidiabetic drug was initiated, when the patient died, or when the administrative end of follow-up (Dec 31, 2019) was reached. The drug use period was required to last for at least 12 months to be included in further analyses. Patients initiating more than one antidiabetic drug at a time were excluded. One patient could have several drug use periods with different antidiabetic drugs, but only the first period with a particular drug was included (ie, new user).
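The gap rule described above can be illustrated with a short Python sketch. It covers only the 90-day supply assumption, the 90-day permitted gap, and the administrative end date. The function names, the 365-day reading of "12 months", and the example purchase dates are hypothetical and are not taken from the study code; discontinuation due to death or initiation of another drug is not handled.

```python
from datetime import date, timedelta

SUPPLY_DAYS = 90                 # assumed duration of one purchase (from the text)
MAX_GAP_DAYS = 90                # longest gap that still continues the period
ADMIN_END = date(2019, 12, 31)   # administrative end of follow-up


def first_use_period(purchase_dates):
    """Return (start, end) of the first continuous use period for one drug."""
    dates = sorted(purchase_dates)
    start = dates[0]
    end = start + timedelta(days=SUPPLY_DAYS)
    for d in dates[1:]:
        if (d - end).days > MAX_GAP_DAYS:
            break  # gap too long: the period stops at the previous supply end
        end = max(end, d + timedelta(days=SUPPLY_DAYS))
    return start, min(end, ADMIN_END)


def lasts_at_least_12_months(start, end, min_days=365):
    """Inclusion check; 365 days is an assumed reading of '12 months'."""
    return (end - start).days >= min_days


# Hypothetical purchases: the last one follows a gap longer than 90 days.
purchases = [date(2015, 1, 1), date(2015, 4, 1), date(2015, 7, 15),
             date(2015, 10, 20), date(2016, 6, 1)]
start, end = first_use_period(purchases)
included = lasts_at_least_12_months(start, end)
```

In this example the final purchase falls outside the permitted gap, so it does not extend the first use period.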

Outcomes

HbA1c was routinely measured with the turbidimetric inhibition immunoassay method in the regional Eastern Finland laboratory (ISLAB, https://www.islab.fi). Values were standardized to the International Federation of Clinical Chemistry (IFCC) units (mmol/mol). Change in HbA1c was calculated as the difference between baseline HbA1c (measured within four months prior to or at the drug initiation) and 12-month HbA1c (the measurement closest to the 12-month time point, occurring 80–365 days after the drug initiation).
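As a hedged illustration of this outcome definition, the sketch below selects a baseline and a 12-month value from a list of dated measurements. Several details are assumptions rather than statements from the text: "four months" is taken as 120 days, the 12-month time point as day 365, the latest eligible pre-initiation value is used as baseline, and the change is signed as follow-up minus baseline (negative values indicate a reduction). The function name and example values are hypothetical.

```python
from datetime import date


def hba1c_change(initiation, measurements, baseline_window_days=120,
                 follow_min=80, follow_max=365, target_day=365):
    """measurements: list of (date, HbA1c in mmol/mol). Returns change or None."""
    offsets = [((d - initiation).days, v) for d, v in measurements]
    baseline = [(t, v) for t, v in offsets if -baseline_window_days <= t <= 0]
    follow = [(t, v) for t, v in offsets if follow_min <= t <= follow_max]
    if not baseline or not follow:
        return None  # patient cannot contribute an outcome
    # Assumption: use the latest measurement at or before initiation.
    _, base_value = max(baseline, key=lambda tv: tv[0])
    # Measurement closest to the assumed 12-month time point (day 365).
    _, follow_value = min(follow, key=lambda tv: abs(tv[0] - target_day))
    return follow_value - base_value


# Hypothetical patient record.
initiation = date(2015, 3, 1)
records = [(date(2014, 12, 15), 70), (date(2015, 2, 20), 68),
           (date(2015, 7, 1), 60), (date(2016, 2, 15), 55)]
change = hba1c_change(initiation, records)
```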

Potential Predictors

We utilized a diverse dataset consisting of patient information (age, sex, duration of T2D), drug details, comorbidities, and baseline HbA1c values to predict the change in HbA1c levels after the drug initiation. In addition to HbA1c, the data included the most recent laboratory measures before antidiabetic drug initiation on fasting plasma glucose (FPG), serum total cholesterol, low-density and high-density lipoprotein cholesterols (LDL and HDL, respectively), triglycerides, and creatinine extracted from the EHRs. From creatinine, we further calculated the estimated glomerular filtration rate (eGFR) based on the CKD-EPI formula. Recordings of body mass index (BMI; kg/m²) before antidiabetic drug initiation were also available from the EHRs. Information on the use of drugs other than antidiabetics within the year preceding antidiabetic initiation was gathered from the Finnish Prescription Register using the third level of the ATC code. Comorbidities occurring before antidiabetic initiation were identified and categorized with ICD-10 codes from the EHRs. A variable, days, was calculated to represent the time difference between the date of antidiabetic drug initiation and the 12-month follow-up HbA1c measurement, quantifying treatment response duration. This variable was employed to filter the dataset, ensuring inclusion of only those patients whose 12-month follow-up HbA1c measurement occurred between 80 and 365 days after drug initiation. Follow-up models had one additional predictor: the first follow-up measurement of HbA1c after drug initiation (occurring within 80–365 days). Therefore, the sample for follow-up models was restricted to patients who had at least two HbA1c measurements taken between 80 and 365 days after drug initiation. A comprehensive list of potential predictors with definitions is presented in Supplementary table S1.
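The text does not state which version of the CKD-EPI equation was used. As a sketch only, the following Python function implements the 2009 CKD-EPI creatinine equation without the race coefficient and assumes creatinine is reported in µmol/L (converted to mg/dL by dividing by 88.4). The function name and the example values are hypothetical.

```python
def egfr_ckd_epi_2009(creatinine_umol_l, age_years, female):
    """eGFR in mL/min/1.73 m² from the 2009 CKD-EPI creatinine equation.

    Assumptions: the 2009 version is used, the race coefficient is omitted,
    and serum creatinine is given in µmol/L.
    """
    scr = creatinine_umol_l / 88.4            # µmol/L -> mg/dL
    kappa = 0.7 if female else 0.9
    alpha = -0.329 if female else -0.411
    ratio = scr / kappa
    egfr = (141.0
            * min(ratio, 1.0) ** alpha
            * max(ratio, 1.0) ** -1.209
            * 0.993 ** age_years)
    if female:
        egfr *= 1.018
    return egfr


# Hypothetical example: 65-year-old woman with creatinine 80 µmol/L.
example_egfr = egfr_ckd_epi_2009(80.0, 65, female=True)
```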

The expected efficacy of an antidiabetic drug in lowering blood glucose levels was based on a literature review of meta-analyses and RCTs of the substances.18–25 A comprehensive list of the expected efficacies for the drugs used in this study, with their 95% confidence intervals, is presented in Supplementary table S2.

Data Preprocessing and Feature Selection

Data preprocessing steps were implemented to ensure data quality and model performance. Missing values were addressed through imputation and exclusion: rows with missing entries in the response column were removed, and the K-Nearest Neighbors (KNN) algorithm, considering two neighboring data points, was used for imputation.26 Columns with constant values across all rows were discarded to prevent redundancy and streamline the dataset. Outliers were detected and removed using an ordinary least squares (OLS) model.27 The process involved fitting an OLS model to the training set and calculating the standardized residuals. Observations with standardized residuals exceeding a threshold of 6.5 were identified as outliers and removed from the training and testing sets, so that model performance was not unduly influenced by extreme data points.
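A minimal sketch of these two steps on synthetic data is given below. It uses scikit-learn's KNNImputer with two neighbors and computes internally studentized residuals from an OLS fit with NumPy. The synthetic data, the helper name, and the use of a single dataset (rather than fitting on the training set and filtering both sets) are simplifying assumptions, and the exact residual standardization used in the study is not specified in the text.

```python
import numpy as np
from sklearn.impute import KNNImputer


def standardized_residuals(X, y):
    """Internally studentized residuals from an OLS fit with intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    q, _ = np.linalg.qr(A)                  # leverage = diagonal of hat matrix
    leverage = np.sum(q ** 2, axis=1)
    dof = A.shape[0] - A.shape[1]
    sigma = np.sqrt(resid @ resid / dof)
    return resid / (sigma * np.sqrt(1.0 - leverage))


# Synthetic data standing in for the predictors and the HbA1c change.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=1.0, size=200)
y[0] += 50.0                                 # plant one gross outlier
X[rng.random(X.shape) < 0.05] = np.nan       # remove about 5% of predictor values

X_imputed = KNNImputer(n_neighbors=2).fit_transform(X)
z = standardized_residuals(X_imputed, y)
keep = np.abs(z) <= 6.5                      # threshold reported in the text
X_clean, y_clean = X_imputed[keep], y[keep]
```

A threshold as high as 6.5 removes only very extreme observations, so most of the sample is retained.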

The dataset was randomly divided into a training set (70% of the data) and a test set (the remaining 30%). The split was performed with the GroupShuffleSplit function, which preserves the structure of related data points indicated by the subject ID column.28 GroupShuffleSplit ensures that groups of related data (ie, same Subject ID) stay together in either the training or the test set, avoiding information leakage. This approach enables assessment of the model’s performance on unseen data, mirroring real-world scenarios and maintaining the integrity of relationships within the dataset. Feature selection was done using the SelectKBest method, employing mutual_info_regression as the scoring function.28 This process selects the top k features with the highest mutual information with the target variable (ie, change in HbA1c levels). The number of selected features, k, was tuned because it directly affects prediction model performance.
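The split and the feature selection can be sketched with scikit-learn as follows. The synthetic data, the variable names, the random seeds, and k = 5 are illustrative assumptions. Note that test_size in GroupShuffleSplit refers to the share of groups (subjects), so the share of rows in the test set is only approximately 30%.

```python
from functools import partial

import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_regression
from sklearn.model_selection import GroupShuffleSplit

# Synthetic stand-in: 300 drug initiations from up to 100 subjects, 20 predictors.
rng = np.random.default_rng(0)
n = 300
subject_id = rng.integers(0, 100, size=n)    # several rows may share a subject
X = rng.normal(size=(n, 20))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=n)

# 70/30 split that keeps all rows of a subject on the same side.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.3, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups=subject_id))

# Fit the selector on the training rows only, then apply it to the test rows.
score_func = partial(mutual_info_regression, random_state=0)  # fixed seed
selector = SelectKBest(score_func=score_func, k=5)
X_train_sel = selector.fit_transform(X[train_idx], y[train_idx])
X_test_sel = selector.transform(X[test_idx])
selected = selector.get_support(indices=True)
```

Fitting the selector on the training rows only keeps information from the test set out of the feature ranking.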

Prediction Modelling and Explainability

For predicting treatment response among patients with T2D, five diverse machine learning models were enlisted: linear regression (LR), multi-layer perceptron (MLP) regression, ridge regression (RR), random forest regressor (RF), and extreme gradient boosting (XGB) regressor.28–33 LR elucidated linear associations between input variables and treatment response, providing a transparent interpretation of linear relationships. MLP regression was chosen to discern non-linear patterns within the data, identifying complex relationships beyond linear correlations. RR addressed potential multicollinearity concerns, enhancing model robustness through regularization. Additionally, the ensemble models, RF and XGB, were incorporated. RF leverages a collection of decision trees to capture intricate relationships and patterns in the data, while XGB optimizes predictive performance by boosting weak learners. Subsequent regression analysis facilitated the derivation of fitted lines, visually representing the connection between predictors and treatment response. The equations of the lines of best fit were calculated and displayed for interpretation. To gauge prediction uncertainty, 95% confidence and prediction intervals were plotted, providing a comprehensive understanding of plausible outcomes based on the predicted treatment responses across the diverse models employed.
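The confidence and prediction intervals around a fitted line can be computed as in the sketch below, which uses the standard formulas for simple linear regression. It assumes the fitted line relates two numeric series (for example, predicted and observed HbA1c change); the text does not give the exact plotting procedure, and the function name and synthetic data are hypothetical.

```python
import numpy as np
from scipy import stats


def line_with_intervals(x, y, x_new, level=0.95):
    """Fit y = a + b*x and return the fit plus CI and PI half-widths at x_new."""
    n = len(x)
    slope, intercept = np.polyfit(x, y, deg=1)
    resid = y - (intercept + slope * x)
    s = np.sqrt(resid @ resid / (n - 2))          # residual standard error
    sxx = np.sum((x - x.mean()) ** 2)
    t = stats.t.ppf(0.5 + level / 2.0, df=n - 2)
    fit = intercept + slope * x_new
    spread = 1.0 / n + (x_new - x.mean()) ** 2 / sxx
    ci_half = t * s * np.sqrt(spread)             # uncertainty of the mean line
    pi_half = t * s * np.sqrt(1.0 + spread)       # uncertainty of a new observation
    return fit, ci_half, pi_half


# Synthetic example.
rng = np.random.default_rng(0)
x = rng.uniform(-30.0, 10.0, size=100)
y = 0.6 * x - 2.0 + rng.normal(scale=5.0, size=100)
grid = np.linspace(-30.0, 10.0, 41)
fit, ci_half, pi_half = line_with_intervals(x, y, grid)
```

The prediction interval is always wider than the confidence interval because it also accounts for the scatter of individual observations around the line.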

The models were trained using the training dataset and optimized through hyperparameter tuning. A total of ten ML models (five algorithms × two study designs, baseline and follow-up) were trained and tested. Evaluation metrics such as R-squared score (R²) and Root Mean Squared Error (RMSE) were computed to assess model performance. R² measures how well the model’s predictions align with actual outcomes, with values closer to 1 indicating better predictive power. In contrast, RMSE summarizes the typical magnitude of prediction errors in the units of the outcome (mmol/mol), with lower values signifying more accurate models.
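Both metrics are available in scikit-learn; the short sketch below computes them for a handful of made-up observed and predicted HbA1c changes (the numbers are illustrative only and are not study data).

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical observed and predicted HbA1c changes (mmol/mol).
y_true = np.array([-12.0, -5.0, 3.0, -20.0, -8.0])
y_pred = np.array([-10.0, -6.0, 0.0, -17.0, -9.0])

r2 = r2_score(y_true, y_pred)                        # 1 - SS_res / SS_tot
rmse = np.sqrt(mean_squared_error(y_true, y_pred))   # same units as the outcome
```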

In this study, SHAP (SHapley Additive exPlanations) was applied to the baseline and follow-up MLP models to make their predictions interpretable and to validate the models’ findings. SHAP leverages cooperative game theory and Shapley values, where features act as players collectively contributing to predictions.34 Shapley values offer a fair distribution of each feature’s contribution. SHAP provides both local and global interpretability of ML models. Locally, it explains the role of each feature in specific healthcare instances, aiding decision-making. Globally, it aggregates insights over the entire dataset, revealing consistent feature importance. This transparency is crucial in healthcare, ensuring that ML models are understandable and trustworthy for informed decision-making and patient care.
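To make the Shapley idea concrete, the sketch below computes exact Shapley values for one sample by enumerating all feature coalitions. It is a conceptual illustration and not the SHAP library: features outside a coalition are simply replaced by the background mean, which is a simplification of how SHAP estimators treat missing features, and the toy linear model, weights, and data are hypothetical.

```python
from itertools import combinations
from math import factorial

import numpy as np


def shapley_values(predict, x, background):
    """Exact Shapley values for sample x; absent features take the background mean."""
    n = len(x)
    baseline = background.mean(axis=0)

    def value(subset):
        z = baseline.copy()
        idx = np.array(subset, dtype=int)
        z[idx] = x[idx]                  # features in the coalition take their real value
        return predict(z)

    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                phi[i] += weight * (value(subset + (i,)) - value(subset))
    return phi


# Toy linear model over three hypothetical features.
w = np.array([0.8, -0.5, 0.3])


def toy_model(z):
    return float(w @ z + 1.0)


background = np.array([[50.0, 7.0, 1.2], [70.0, 9.0, 1.0], [60.0, 8.0, 1.4]])
x = np.array([80.0, 10.0, 1.1])
phi = shapley_values(toy_model, x, background)
```

For a linear model each value reduces to the weight times the feature's deviation from the background mean, and the values sum to the difference between the prediction for the sample and the prediction at the baseline.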

Ethics Statement

Use of the data was approved by the ethics committee of the Northern Savonia Hospital District (diary number 81/2012). The study protocol was also approved by the register administrator, Siun sote, the Joint Municipal Authority for North Karelia Social and Health Services. A separate permission to link data on medication purchases and special reimbursements was obtained from the Social Insurance Institution of Finland (diary number 110/522/2018). Only pseudonymized register-based data were utilized and individuals in the registers were not contacted. In accordance with Finnish legislation, consent from the patients was not needed as the study was carried out entirely using register data without contacting the patients.

Results

The data preprocessing steps for both the baseline and follow-up models are illustrated in Figure 1, with more detailed information available in Appendix 1. Baseline characteristics of the cohort in the baseline model are shown in Table 1. Figure 2 presents the progression of HbA1c levels from 5 to 12 months before drug initiation to 12 months after drug initiation. The patients are categorized based on the type of new drug or combination therapy they received. The pre-processed data was randomly divided into training and testing sets to evaluate the models effectively.

Table 1 Baseline Characteristics of the Cohort in Baseline Model (N=1693)

Figure 1 Data pre-processing steps for baseline and follow-up models.

Figure 2 Progression of HbA1c in T2D patients: 5–12 months pre-drug initiation to 12 months post-drug initiation, stratified by the class of the initiated antidiabetic drug.

Table 2 summarizes the performance of all ML models for both the baseline and follow-up designs. The baseline models’ R² scores ranged from 0.48 to 0.55 on the testing set, with the MLP and LR models performing similarly (R² ≈ 0.54, RMSE ≈ 9.46), while the XGB model achieved the highest R² on the training set (0.99) but showed signs of overfitting on the testing set (R² = 0.47, RMSE = 10.14). In contrast, the follow-up models demonstrated improved performance across all metrics, with the MLP yielding an R² of 0.74 on the training set and 0.65 on the testing set (RMSE = 7.61). This suggests that incorporating follow-up HbA1c data enhances prediction accuracy. Additionally, the RF model showed improvement in the follow-up set, with a slightly higher R² (0.57) and reduced RMSE compared to the baseline model.

Table 2 Performance for All Baseline and Follow-up Models. The RCT Model is Represented Using the Mean Difference for Change in HbA1c of the Drug Based on RCT Values in Comparison to the Observed HbA1c Change

Table 3 provides a breakdown of the performance of the MLP model for different drug classes. Metformin achieved the highest R² (0.75 in the baseline model, 0.86 in the follow-up model) with low RMSE values (5.93 and 5.11, respectively), while GLP-1 analogs had the lowest R² scores (0.11 and 0.52 in baseline and follow-up models, respectively). The follow-up model significantly improved the prediction of Insulin response (R² = 0.71, RMSE = 10.52).

Table 3 Performance of MLP Regressor for Each Study Design by Drug Class on Test Set

The number of features, k, was set to 15 for the baseline model and 10 for the follow-up model after iterative testing from k = 1 to 150 for each study design. The SelectKBest function was used to select the most significant features for predicting changes in HbA1c following the initiation of anti-diabetic medications. For the baseline model, the selected features covered a wide array of important indicators, including HbA1c Baseline, Fasting Plasma Glucose (FPG), HDL, Insulin and its derivatives, Cancer or in situ carcinomas, and Mean HbA1c Change (RCT) (Figure 3(a)). Similarly, for the follow-up model, 10 key features were selected, as shown in Figure 3(b). The top features were HbA1c Baseline, Fasting Plasma Glucose (FPG), Follow-up HbA1c, HDL, Non-insulin glucose-lowering meds, and GLP-1 analog use in past year. It is noteworthy that while both the baseline and follow-up models included RCT-derived HbA1c change values as potential predictors (offset-modeling), only the baseline model identified this variable as a significant feature in the final selection.
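The text does not describe how each candidate k was scored. One possible way to run such a search is sketched below: a pipeline of SelectKBest and a regressor is evaluated for each k with grouped cross-validation. The use of cross-validation, GroupKFold, LinearRegression, three folds, and the synthetic data are all assumptions made for illustration and are not the study's procedure.

```python
from functools import partial

import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GroupKFold, cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic stand-in with 12 candidate predictors, three of them informative.
rng = np.random.default_rng(1)
n, p = 240, 12
groups = rng.integers(0, 80, size=n)         # hypothetical subject IDs
X = rng.normal(size=(n, p))
y = (3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.5 * X[:, 2]
     + rng.normal(scale=0.5, size=n))

score_func = partial(mutual_info_regression, random_state=0)
cv_r2 = {}
for k in range(1, p + 1):
    # Selection sits inside the pipeline so it is refit within each fold.
    pipe = make_pipeline(SelectKBest(score_func=score_func, k=k),
                         LinearRegression())
    scores = cross_val_score(pipe, X, y, groups=groups,
                             cv=GroupKFold(n_splits=3), scoring="r2")
    cv_r2[k] = scores.mean()

best_k = max(cv_r2, key=cv_r2.get)
```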

Figure 3 Feature importance plot for baseline (a) and follow-up (b) models.

The ML models were then trained and tested, generating regression lines with confidence intervals (CI) to provide estimates of true population parameters with 95% confidence (Figure 4). The baseline model achieved an R² of 0.52 for the training set and 0.55 for the testing set, with RMSE values of 9.27 and 9.50, respectively. The follow-up model exhibited superior performance, with an R² of 0.74 for the training set and 0.65 for the testing set, and RMSE values of 6.93 and 7.62 (Table 2). However, certain models, such as RF and XGB, showed signs of overfitting. The baseline model showed variability in performance across drug classes. For Metformin, it achieved an R² of 0.75 with low RMSE values, but for GLP-1 analogs, it had a lower R² of 0.11. The follow-up model demonstrated overall improved performance, especially in predicting Insulin response, with an R² of 0.71, significantly outperforming the baseline model.

Figure 4 Fitted regression lines and drug class distribution for MLP baseline (a) and follow-up (b) models. Regression lines represent the model’s fit, with samples differentiated by color to denote various drug classes. The Figure also includes 95% confidence intervals (CI) and 95% prediction limits.

Global SHAP explanations were employed to evaluate the overall feature importance of the model (Figure 5), providing an overview of how various features contributed to predictions. SHAP decision plots (Figure 6) then offered a more detailed, individualized perspective, illustrating the specific impact of each feature on individual predictions for four randomly selected samples. To further ensure transparency regarding the model’s decision-making process, Figures S1 and S2 display SHAP decision plots for four randomly selected samples (from those used in Figure 6). These plots present individual feature values in detail, allowing for a complete view of how these features influenced the model’s decisions for each sample.

Figure 5 Global SHAP explanations for MLP baseline (a) and follow-up (b) models. This Figure provides insights into the overall importance of features, offering a comprehensive view of the model’s interpretability.

Figure 6 SHAP decision plots with chosen samples for MLP baseline (a) and follow-up (b) models. The Figure shows SHAP decision plots for four selected samples. A legend accompanies these plots, indicating the actual HbA1c change, and the x-axis represents the predicted HbA1c change.

Performance comparisons between the observed, predicted, and expected HbA1c changes for all test samples are illustrated in Figure 7a and b. In the baseline model, out of 510 testing samples, predictions for 287 samples were more closely aligned with the actual values, while predictions for 223 samples were closer to the expected values from RCT data. In the follow-up model, with 348 test samples, predictions for 202 samples were more accurate relative to the actual values, while 146 predictions were closer to the expected RCT outcomes. Supplementary figures S1 and S2 provide additional insights into the predictions for individual patients, while Supplementary figure S3 presents a comparison of changes in HbA1c levels and BMI from baseline to 12 months across different drug classes.
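The counts above can be reproduced with a simple comparison per sample. The sketch below follows one reading of the text, in which a prediction is counted as aligned with the actual value when it lies closer to the observed change than to the RCT-expected change; a comparison of the model error against the RCT error would be an alternative reading. The function name and the example arrays are hypothetical.

```python
import numpy as np


def closer_to_observed(observed, predicted, expected_rct):
    """True where the prediction is nearer to the observed than to the RCT value."""
    dist_to_observed = np.abs(predicted - observed)
    dist_to_rct = np.abs(predicted - expected_rct)
    return dist_to_observed < dist_to_rct


# Hypothetical HbA1c changes (mmol/mol) for four test samples.
observed = np.array([-10.0, -2.0, -15.0, 4.0])
predicted = np.array([-9.5, -7.0, -12.0, -6.0])
expected_rct = np.full(4, -8.0)

mask = closer_to_observed(observed, predicted, expected_rct)
n_closer_observed = int(mask.sum())
n_closer_rct = int((~mask).sum())
```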

Figure 7 Observed, predicted, and expected HbA1c change for MLP baseline (a) and follow-up (b) models. This Figure presents a comprehensive overview, illustrating the observed, predicted, and expected (based on randomized controlled trials) HbA1c change for all test samples in the baseline and follow-up models.

Discussion

This study introduces several methodological advancements in ML-based T2D management. The use of real-world data captures a broad and representative patient population, enhancing the model’s relevance to diverse clinical settings. The study models six different antidiabetic substances, increasing the generalizability of its findings. The incorporation of RCT-derived values as predictors during model training introduces a novel offset model that blends controlled trial data with real-world patient outcomes, strengthening the model’s predictive capacity. Additionally, the application of XAI techniques provides insights into the key factors driving changes in HbA1c levels, fostering transparency and trust in the model’s predictions.

When the outputs of the baseline and follow-up models were compared to estimated average results from RCTs, valuable insights emerged regarding their performance. The findings suggest that ML models can generate more accurate and personalized predictions for individual patients than the broader expectations set by RCTs. Notably, ML models also leverage a richer feature set compared to traditional statistical models, allowing for the incorporation of a wider range of patient-specific variables, which further enhances predictive accuracy.35 This comprehensive data integration enables ML models to capture nuanced patterns that statistical methods may overlook, ultimately providing more precise results.

Furthermore, the follow-up models consistently outperformed the baseline models in predicting changes in HbA1c levels. This improvement aligns with the understanding that incorporating follow-up HbA1c measurements after drug initiation enhances the model’s accuracy and provides early indication of whether the drug’s effects are taking hold sooner than expected.

Baseline models exhibit both novelty and superiority in predicting HbA1c changes compared with previous research.11,17,36 Specifically, in a study utilizing penalized regression for predicting HbA1c outcomes, the R² score was reported as 0.30.11 The baseline MLP model in this study achieved higher performance, with an R² of 0.52 and 0.55 for the training and testing datasets, respectively. The baseline model’s performance varied across drug classes: it showed high accuracy in predicting metformin response but lower accuracy for GLP-1 analogs.

The follow-up MLP model further outperformed the baseline MLP model, achieving higher R² values (0.74 for training and 0.65 for testing), indicating more precise predictions. Most models generalized well to the testing dataset, though the RF and XGB models showed signs of overfitting during training. The follow-up MLP model, however, consistently demonstrated better performance, especially in predicting insulin response, with an R² of 0.71 for testing data.

The progression of mean HbA1c in T2D patients in Figure 2 shows that patients in all drug classes experienced a reduction in HbA1c levels after drug initiation, but distinct patterns emerged across drug classes. For example, patients initiating any drug class except GLP-1 analogs experienced an initial increase in HbA1c levels during the initiation phase (0–4 months before treatment), compared to the 5–12 months prior to initiation. GLP-1 analogs may be initiated not only to improve glycemic control but also to reduce weight and lower the risk of hypoglycemia, which might explain the smaller increase in HbA1c levels during the initiation phase compared with other treatment classes (see Supplementary figure S3). Furthermore, HbA1c levels showed little further reduction from the first follow-up measurement to the 12-month mark, implying that the most significant changes occur shortly after drug initiation.

This study demonstrates favorable outcomes for the use of GLP-1 analogs in patients with T2D, showing significant improvements in glycemic control and weight reduction with no initial HbA1c increase commonly seen with other drug classes. Given that GLP-1 analogs were primarily prescribed to patients with high BMI and elevated insulin levels—key indicators of obesity—these findings support their use even as the first-line therapy for T2D patients with comorbid obesity-related conditions. Current guidelines already prioritize GLP-1 analogs as the first-line treatment for patients with cardiovascular or renal comorbidities. Based on our data, expanding their use to include obese patients at an earlier stage could improve their outcomes. However, external validation in diverse cohorts is needed to confirm these findings and refine treatment guidelines for this population.

The analysis of the models using SHAP revealed important insights into the predictors of HbA1c changes. In the baseline MLP model (Figure 5a), several influential features were identified, including baseline HbA1c levels, the duration of T2D, and the mean HbA1c change from prior clinical trials. Specifically, higher baseline HbA1c levels were associated with more significant reductions in HbA1c following treatment, whereas a longer duration of T2D tended to correlate with less favorable treatment outcomes. Other notable factors included recent insulin use, the presence of comorbidities, cancer, and fasting plasma glucose. In the follow-up MLP model (Figure 5b), both baseline and follow-up HbA1c levels emerged as critical predictors of future outcomes. This analysis highlighted the significant roles of non-insulin glucose-lowering medications, fasting plasma glucose, and HDL cholesterol in influencing predictions.

The SHAP decision plots (Figure 6) provided further detail on how individual features affected the models’ predictions for a randomly selected set of patients. In the baseline model, higher baseline HbA1c levels and longer durations of T2D were linked to larger predicted decreases in HbA1c. However, it was also noted that extended durations of T2D might result in less favorable treatment responses. In the follow-up MLP model, the decision plots illustrated the significant influence of both baseline and follow-up HbA1c levels on predictions, while also underscoring the importance of fasting plasma glucose and the use of GLP-1 analogs. It is important to acknowledge that SHAP has limitations, especially when interpreting interactions between features or when model complexity increases.

The influence of the initiated drug class, baseline HbA1c levels, and diabetes history is rather obvious; however, the role of comorbidities such as cancer is less straightforward. Cancer has an impact on metabolic processes and overall health.37 A few studies suggest a relationship between cancer and malnutrition, leading to a catabolic status characterized by increased metabolism, including glucose metabolism.38,39 The increased glucose metabolism associated with cancer may lead to decreased insulin secretion and reduced effectiveness at post-receptor levels. In practical terms, this suggests that individuals with T2D who also suffer from cancer may experience challenges in managing their blood sugar levels effectively. This could manifest as difficulties in achieving glycemic control despite adherence to treatment regimens. Also, cancer treatments such as chemotherapy or radiation therapy may affect glucose metabolism and insulin sensitivity, potentially exacerbating glycemic control issues in individuals with T2D.38 This example highlights the importance of considering comorbidities when designing an individualized treatment regimen.

The feature combinations of lipid-modifying agents in baseline models are supported by relevant literature, including findings from the meta-analysis examining the influence of ezetimibe treatment on glycemic control. The meta-analysis investigated the effects of combining ezetimibe with statin therapy on glycemic parameters, shedding light on the potential influence of lipid-modifying agent combinations on HbA1c levels.40

The prescription of systemic antifungal medications in the follow-up model reflects the impact of fungal infections, which are more common in diabetic patients with high glucose levels.41 This feature is crucial for understanding the burden of infections in the study population and their association with glycemic control and other risk factors. Neurotic, somatoform, and stress disorders, including eating disorders, are also significant, as stress is a known contributor to inflammation and glucose dysregulation in diabetes.42 Stress can induce hyperglycemia through the release of stress hormones such as cortisol, exacerbating inflammation and insulin resistance.42 The inclusion of these disorders helps to explore the psychosomatic interface in diabetes management and the impact of mental health on glycemic control.42 Lastly, conditions such as back pain and degenerative disc disease are associated with chronic inflammation, which can aggravate insulin resistance and complicate diabetes management. The presence of back diseases in the follow-up model warrants a closer examination of the inflammatory pathways that might contribute to poor glycemic control and increased comorbid complications.

An intriguing observation in the data set is an increase in HbA1c levels in some patients after treatment (Figure 7), even though, theoretically, HbA1c should decrease after drug initiation. This warrants further investigation. A key factor contributing to this phenomenon could be treatment adherence, as some patients in the EHR might never have started taking the prescribed drug, leading to a lack of the expected HbA1c reduction.43 Other potential factors include individual patient responses or clinical complexities. Analyzing these findings further will offer valuable insights for refining our model and understanding the clinical implications of such unexpected trends, particularly in relation to adherence.

The integration of EHR with ML models offers significant advantages in healthcare research. EHRs provide access to large-scale, real-world data, enabling more personalized and accurate predictions of patient outcomes.12–16 However, EHR data can present challenges, particularly related to missing values, which can affect the quality and reliability of the models. In this study, data pre-processing led to a significant reduction in sample size due to missing values, as incomplete records were removed. This reduction can negatively affect ML model training by limiting the available data, which may decrease model robustness and increase the risk of overfitting. Overfitting can cause the model to perform well on training data but poorly on unseen data, reducing its predictive accuracy and generalizability. Despite efforts to mitigate overfitting through techniques like cross-validation and hyperparameter tuning, this remains a limitation that warrants further attention.
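The gap between training and validation error described above can be illustrated with a small k-fold cross-validation sketch. Everything here is an assumption made for illustration: the data are synthetic, the one-feature K-Nearest Neighbors regressor is a stand-in rather than any model configuration from this study, and the helper names (`knn_predict`, `cross_validated_rmse`) are hypothetical.

```python
import random
from math import sqrt


def knn_predict(train_x, train_y, query, k):
    """Average the targets of the k training points closest to the query."""
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - query))
    return sum(train_y[i] for i in order[:k]) / k


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def cross_validated_rmse(x, y, k_neighbors, n_folds=5):
    """Return mean training RMSE and mean validation RMSE over the folds."""
    indices = list(range(len(x)))
    folds = [indices[start::n_folds] for start in range(n_folds)]
    train_scores, val_scores = [], []
    for fold in folds:
        held_out = set(fold)
        train_idx = [i for i in indices if i not in held_out]
        train_x = [x[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        train_pred = [knn_predict(train_x, train_y, q, k_neighbors) for q in train_x]
        val_pred = [knn_predict(train_x, train_y, x[i], k_neighbors) for i in fold]
        train_scores.append(rmse(train_y, train_pred))
        val_scores.append(rmse([y[i] for i in fold], val_pred))
    return sum(train_scores) / n_folds, sum(val_scores) / n_folds


# Synthetic small sample: baseline HbA1c and a noisy HbA1c change (not study data).
rng = random.Random(0)
baseline = [rng.uniform(45.0, 100.0) for _ in range(60)]
change = [-0.4 * (b - 45.0) + rng.gauss(0.0, 5.0) for b in baseline]

# With one neighbor the model memorizes the training folds.
train_rmse, val_rmse = cross_validated_rmse(baseline, change, k_neighbors=1)
gap = val_rmse - train_rmse
print(train_rmse, val_rmse, gap)
```

With a single neighbor the training error is zero because each training point is its own nearest neighbor, while the held-out error is not. A gap of this kind is the signature of overfitting that cross-validation is meant to expose, and it tends to matter more as the sample shrinks.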

Missingness in EHR data is often due to patients not having their values measured within a specific time frame, which may vary based on variable definitions. For example, BMI might not be recorded if the patient is reluctant (eg, severely obese individuals) or if the patient has a normal weight, making the measurement seem unnecessary. Additionally, this study acknowledges the potential impact of unmeasured confounding variables such as medication adherence, lifestyle factors, and undetected comorbidities, which were not captured in the dataset. These factors could influence changes in HbA1c levels and may limit the study’s ability to fully explain patient outcomes.

The focus of this study on a 12-month follow-up period may also limit its ability to capture long-term treatment effects or changes in patient health status. Longer follow-up periods would provide a more comprehensive view of treatment durability and changes in health over time. Moreover, the performance of the models varied significantly across different drug classes, particularly with treatments like GLP-1 analogs, suggesting that predictive accuracy may be affected by patient heterogeneity within drug classes. Future research should explore subgroup-specific modeling approaches and include additional predictors to improve the performance and generalizability of these models. An important area for future research is evaluating the generalizability of the model across various healthcare settings. Testing the model in diverse clinical environments, such as primary care, specialty clinics, and hospitals across different geographic regions, will help assess its robustness and adaptability. This will allow us to understand how well the model performs in real-world settings beyond the scope of this study and identify potential adjustments needed for broader applicability.

As part of our future work, we are developing a personalized treatment selection model aimed at identifying the optimal therapy for type 2 diabetes patients. The model focuses on selecting between SGLT2 and DPP4 inhibitors based on individual patient characteristics and their relative effects on multiple health parameters as responses. These parameters include HbA1c, BMI, LDL, and HDL levels.

Conclusion

This study successfully used machine learning to predict changes in HbA1c levels following the initiation of different antidiabetic drugs in patients with T2D from the North Karelia region in Finland. Prediction accuracy was improved by incorporating expected HbA1c changes from RCTs into the baseline model, creating an offset model that leverages clinical trial data as a benchmark for real-world predictions.
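One way to read the offset idea is sketched below, under stated assumptions: the trial-derived expected change for the initiated drug is treated as a fixed offset, and a model is fitted only to the remaining deviation from it. The exact formulation used in this study may differ. The drug labels, the values in `RCT_EXPECTED_CHANGE`, the single predictor, and the toy data are hypothetical placeholders, not study results.

```python
def fit_simple_ols(x, y):
    """Closed-form least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical expected HbA1c change per drug class (placeholder values).
RCT_EXPECTED_CHANGE = {"drug_a": -8.0, "drug_b": -12.0}


def fit_offset_model(drugs, baseline_hba1c, observed_change):
    """Fit a model to the deviation of observed change from the RCT offset."""
    offsets = [RCT_EXPECTED_CHANGE[d] for d in drugs]
    residual = [obs - off for obs, off in zip(observed_change, offsets)]
    return fit_simple_ols(baseline_hba1c, residual)


def predict_offset_model(params, drug, baseline_hba1c):
    """Prediction = RCT offset + fitted deviation for this patient."""
    intercept, slope = params
    return RCT_EXPECTED_CHANGE[drug] + intercept + slope * baseline_hba1c


# Toy data constructed so the deviation is exactly 10 - 0.2 * baseline.
drugs = ["drug_a", "drug_b", "drug_a", "drug_b"]
baseline = [50.0, 60.0, 70.0, 80.0]
observed = [RCT_EXPECTED_CHANGE[d] + 10.0 - 0.2 * b for d, b in zip(drugs, baseline)]

params = fit_offset_model(drugs, baseline, observed)
prediction = predict_offset_model(params, "drug_b", 65.0)
print(params, prediction)
```

In this reading the trial evidence sets the starting point for each drug class and the real-world data only have to explain how individual patients depart from it.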

The findings of this study indicate that ML-based models enable more precise targeting of appropriate medications for individual patients. The use of follow-up models facilitates timely adjustments to treatment when anticipated therapeutic effects are not observed, potentially improving patient outcomes. The results also contribute to a better understanding of how antidiabetic drugs influence glycemic control, helping clinicians choose appropriate treatments for T2D patients. The study further emphasizes the interpretability and robustness of the ML models, demonstrating their stability and effectiveness across different dataset splits. The use of the SHAP library enhanced transparency by revealing the importance of individual features, making the model’s predictions more clinically relevant. Future research could explore additional predictors, incorporate longitudinal models, and use larger datasets to improve the performance and generalizability of these models.

Abbreviations

ATC, Anatomical Therapeutic Chemical; AI, artificial intelligence; BMI, body mass index; CI, confidence interval; eGFR, estimated glomerular filtration rate; EHR, electronic health records; FPG, fasting plasma glucose; HbA1c, glycated hemoglobin A1c; HTA, Health Technology Assessment; HDL, high-density lipoprotein; ICD, International Classification of Diseases; IFCC, International Federation of Clinical Chemistry; KNN, K-Nearest Neighbors; LDL, low-density lipoprotein; LR, linear regressor; ML, machine learning; MLP, multi-layer perceptron; OAD, oral antidiabetic drugs or GLP-1 analogs (incl. metformin, sulfonylureas, combinations of oral blood glucose lowering drugs, glitazones, DPP-4 inhibitors, glinides, GLP-1 analogs, and SGLT2 inhibitors); OLS, Ordinary Least Squares; RF, random forest regressor; RMSE, Root Mean Square Error; RR, ridge regressor; RCT, randomized controlled trials; R2, R-squared value; SHAP, Shapley additive explanations; SII, Social Insurance Institution of Finland; T2D, type 2 diabetes; XAI, explainable artificial intelligence; XGB, extreme gradient boosting.

Data Sharing Statement

Access to data is regulated by the European Union and Finnish laws and therefore, sharing of sensitive data is not possible and data are not publicly available. An anonymized version of the data is available for researchers who meet the criteria as required by the European Union and Finnish laws for access to confidential data with a data permit of an appropriate authority. Contact information: [email protected] for data requests from the Siun sote – Joint municipal authority for North Karelia social and health services and [email protected] for data requests from the Social Insurance Institute.

Ethics Approval and Informed Consent

Use of the data was approved by the Ethics Committee of the Northern Savonia Hospital District (diary number 81/2012). The study protocol was also approved by the register administrator, the Siun sote. A separate permission to link data on medication purchases and special reimbursements was obtained from the SII (diary number 110/522/2018). Only register-based data were utilized and thus, consent from the patients was not needed. This study complies with the Declaration of Helsinki.

Consent for Publication

All authors confirm that the details of any images, etc. can be published and that the persons providing consent have been shown the article contents to be published.

Acknowledgments

This paper was initially presented as a poster during the Arctic AI Days, showcasing interim findings. The poster was subsequently published on the funding project’s website: https://www.htx-h2020.eu/wp-content/uploads/2024/02/Poster_HbA1c_Response.pdf

Author Contributions

GC: Conception, study design, data processing, analysis, interpretation, drafting the manuscript, writing, and revisions. PL: Conception, data curation, interpretation, writing, revisions, and critical review of the study. TL: Conception, lead data curation, interpretation, writing, revisions, and critical review of the study. PS, ST, AI, and JM: Conception, interpretation, and critical review of the study. JR: Funding acquisition, lead supervision, and critical review of the study. All authors made a significant contribution to the work reported, whether that is in the conception, study design, execution, acquisition of data, analysis and interpretation, or in all these areas; took part in drafting, revising or critically reviewing the article; gave final approval of the version to be published; have agreed on the journal to which the article has been submitted; and agree to be accountable for all aspects of the work.

Funding

This study was partly supported by the Finnish Diabetes Association, the Research Committee of the Kuopio University Hospital Catchment Area for the State Research Funding (project QCARE, Joensuu, Finland), the Strategic Research Council at the Academy of Finland (project IMPRO, 312703), and the HTx project, which has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement N° 825162. The funders had no role in the design of the study and collection, analysis, and interpretation of data and in writing the manuscript.

Disclosure

JM is a founding partner of ESiOR Oy. This company was not involved in carrying out this research. GC, PL, PS, ST, AI, TL, and JR declare no conflicts of interest in this work.

References

1. The Finnish Medical Society Duodecim. Type 2 Diabetes. Current Care Guidelines. Working Group Set up by the Finnish Medical Society Duodecim, the Finnish Society of Internal Medicine and the Medical Advisory Board of the Finnish Diabetes Society. Helsinki; 2020. Available from: https://www.kaypahoito.fi/hoi50056#K. Accessed February 21, 2025

2. Shields BM, Dennis JM, Angwin CD, et al. Patient stratification for determining optimal second-line and third-line therapy for type 2 diabetes: the TriMaster study. Nat Med. 2023;29.

3. Ioannidis JPA, Lau J. The impact of high-risk patients on the results of clinical trials. J Clin Epidemiol. 1997;50(10):1089–1098. doi:10.1016/S0895-4356(97)00149-2

4. Jeon E. Precision medicine in type 2 diabetes. J Korean Diabetes. 2022;23(2):77–82. doi:10.4093/jkd.2022.23.2.77

5. Prasad RB, Groop L. Precision medicine in type 2 diabetes. J Internal Med. 2019;285(1):40–48. doi:10.1111/joim.12859

6. Fitipaldi H, McCarthy MI, Florez JC, Franks PW. A global overview of precision medicine in type 2 diabetes. Diabetes. 2018;67(10):1911–1922. doi:10.2337/dbi17-0045

7. Loscalzo J. Network medicine and type 2 diabetes mellitus: insights into disease mechanism and guide to precision medicine. Endocrine. 2019;66(3):456–459. doi:10.1007/s12020-019-02042-4

8. Mohan V, Radha V. Precision diabetes is slowly becoming a reality. Med Princ Pract. 2019;28(1):1–9. doi:10.1159/000497241

9. Williams DM, Jones H, Stephens JW. Personalized type 2 diabetes management: an update on recent advances and recommendations. Diabetes Metab Syndr Obes. 2022;15:281–295. doi:10.2147/DMSO.S331654

10. Murphy SA, Collins LM, Rush AJ. Customizing treatment to the patient: adaptive treatment strategies. Drug Alcohol Depend. 2007;88:S1–S3. doi:10.1016/j.drugalcdep.2007.02.001

11. Venkatasubramaniam A, Mateen BA, Shields BM, et al. Comparison of causal forest and regression-based approaches to evaluate treatment effect heterogeneity: an application for type 2 diabetes precision medicine. BMC Med Inform Decis Mak. 2023;23(1). doi:10.1186/s12911-023-02207-2.

12. Coquet J, Bievre N, Billaut V, et al. Assessment of a clinical trial-derived survival model in patients with metastatic castration-resistant prostate cancer. JAMA Network Open. 2021;4(1):e2031730. doi:10.1001/jamanetworkopen.2020.31730

13. Martinez AI, Perez-Vilar S, Shinde M, et al. Comparing outcomes in trial-eligible vs real-world COVID-19 patients: the case of invasive mechanical ventilation. Pharmacoepidemiol Drug Saf. 2021;30:97.

14. Rajkomar A, Patel S, Valencia V, et al. The data trial: a randomized controlled trial of next generation audit and feedback. J Gen Intern Med. 2017;32:S335.

15. Jones WS, Wruck LM, Harrington RA, Hernandez AF. Iterative approaches to the use of electronic health records data for large pragmatic studies. Contemp Clin Trials. 2022;117:106789. doi:10.1016/j.cct.2022.106789

16. Wu P, Zeng D, Fu H, Wang Y. On using electronic health records to improve optimal treatment rules in randomized trials. Biometrics. 2020;76(4):1075–1086. doi:10.1111/biom.13288

17. Dennis JM, Young KG, McGovern A, et al. Development of a treatment selection algorithm for SGLT2 and DPP-4 inhibitor therapies in people with type 2 diabetes: a retrospective cohort study. Lancet Digit Health. 2022;4(2):e873–83.

18. Zhang XW, Zhang XL, Xu B, Kang LN. Comparative safety and efficacy of insulin degludec with insulin glargine in type 2 and type 1 diabetes: a meta-analysis of randomized controlled trials. Acta Diabetol. 2018;55(5):429–441. doi:10.1007/s00592-018-1107-1

19. Tsapas A, Avgerinos I, Karagiannis T, et al. Comparative effectiveness of glucose-lowering drugs for type 2 diabetes: a systematic review and network meta-analysis. Ann Internal Med. 2020;173(4):278–286. doi:10.7326/M20-0864

20. Rosenstock J, Inzucchi SE, Seufert J, et al. Initial combination therapy with alogliptin and pioglitazone in drug-naïve patients with type 2 diabetes. Diabetes Care. 2010;33(11):2406–2408. doi:10.2337/dc10-0159

21. Rosenstock J, Hansen L, Zee P, et al. Dual add-on therapy in type 2 diabetes poorly controlled with metformin monotherapy: a randomized double-blind trial of saxagliptin plus dapagliflozin addition versus single addition of saxagliptin or dapagliflozin to metformin. Diabetes Care. 2015;38(3):376–383. doi:10.2337/dc14-1142

22. Goldstein BJ, Feinglos MN, Lunceford JK, Johnson J, Williams-Herman DE. Effect of initial combination therapy with sitagliptin, a dipeptidyl peptidase-4 inhibitor, and metformin on glycemic control in patients with type 2 diabetes. Diabetes Care. 2007;30(8):1979–1987. doi:10.2337/dc07-0627

23. Einhorn D, Rendell M, Rosenzweig J, et al. Pioglitazone hydrochloride in combination with metformin in the treatment of type 2 diabetes mellitus: a randomized, placebo-controlled study for the pioglitazone 027 study group. Lipids. 2000;22(12):1395–409.

24. De Fronzo RA, Burant CF, Fleck P, et al. Efficacy and tolerability of the DPP-4 inhibitor alogliptin combined with pioglitazone, in metformin-treated patients with type 2 diabetes. J Clin Endocrinol Metab. 2012;97(5):1615–22.

25. Cai X, Gao X, Yang W, Han X, Ji L. Efficacy and safety of initial combination therapy in treatment-naïve type 2 diabetes patients: a systematic review and meta-analysis. Diabetes Therapy. 2018;9(5):1995–2014. doi:10.1007/s13300-018-0493-2

26. Troyanskaya O, Cantor M, Sherlock G, et al. Missing value estimation methods for DNA microarrays. Bioinformatics. 2001;17(6):520–525. doi:10.1093/bioinformatics/17.6.520

27. Seabold S, Perktold J. Statsmodels: econometric and statistical modeling with python. Proc Python Sci Conf. 2010. doi:10.25080/majora-92bf1922-011

28. Pedregosa F, Varoquaux G, Gramfort A, et al. Scikit-learn: machine learning in python. J Mach Learn Res. 2011;12:2825–2830.

29. Kingma DP, Ba JL. Adam: a method for stochastic optimization. in 3rd International Conference on Learning Representations, ICLR 2015 - Conference Track Proceedings; 2015.

30. Glorot X, Bengio Y. Understanding the difficulty of training deep feedforward neural networks. J Mach Learn Res. 2010;9.

31. Hinton GE. Connectionist learning procedures. Artif Intell. 1989;40(1–3):185–234. doi:10.1016/0004-3702(89)90049-0

32. Breiman L. Random forests. Mach Learn. 2001;45:5–32.

33. Geurts P, Ernst D, Wehenkel L. Extremely randomized trees. Mach Learn. 2006;63(1):3–42. doi:10.1007/s10994-006-6226-1

34. Lundberg SM, Lee SI. A unified approach to interpreting model predictions. Adv Neural Inf Process Syst. 2017;2017(December).

35. Ihalapathirana AT, Chalkou K, Siirtola P, et al. Explainable Artificial Intelligence to predict clinical outcomes in type 1 diabetes and relapsing-remitting multiple sclerosis adult patients. Inform Med Unlocked. 2023;42:101349. doi:10.1016/j.imu.2023.101349

36. Home PD, Shen C, Hasan MI, et al. Predictive and explanatory factors of change in HbA1c in a 24-week observational study of 66,726 people with type 2 diabetes starting insulin analogs. Diabetes Care. 2014;37(5):1237–1245. doi:10.2337/dc13-2413

37. Xu C-X. Diabetes and cancer: associations, mechanisms, and implications for medical practice. World J Diabetes. 2014;5(3):372. doi:10.4239/wjd.v5.i3.372

38. Olatunde A, Nigam M, Singh RK, et al. Cancer and diabetes: the interlinking metabolic pathways and repurposing actions of antidiabetic drugs. Can Cell Inter. 2021;21(1). doi:10.1186/s12935-021-02202-5

39. Huxley R, Ansary-Moghaddam A, Berrington De González A, Barzi F, Woodward M. Type-II diabetes and pancreatic cancer: a meta-analysis of 36 studies. Br J Cancer. 2005;92(11):2076–2083. doi:10.1038/sj.bjc.6602619

40. Wu H, Shang H, Wu J. Effect of ezetimibe on glycemic control: a systematic review and meta-analysis of randomized controlled trials. Endocrine. 2018;60(2):229–239. doi:10.1007/s12020-018-1541-4

41. Mandal SM, Mahata D, Migliolo L, et al. Glucose directly promotes antifungal resistance in the fungal pathogen, Candida spp. J Biol Chem. 2014;289(37):25469–25473. doi:10.1074/jbc.C114.571778

42. Sharma K, Akre S, Chakole S, Wanjari MB. Stress-induced diabetes: a review. Cureus. 2022. doi:10.7759/cureus.29142

43. Carls GS, Tuttle E, Tan R-D, et al. Understanding the gap between efficacy in randomized controlled trials and effectiveness in real-world use of GLP-1 RA and DPP-4 therapies in patients with type 2 diabetes. Diabetes Care. 2017;40(11):1469–1478. doi:10.2337/dc16-2725
