MRI-based texture analysis for breast cancer subtype classification in a multi-ethnic population

This study aimed to identify the most effective texture features from specific MRI sequences for classifying breast cancer molecular subtypes within a multi-ethnic population (comprising Chinese, Malay, and Indian groups) using a hybrid Random Forest–recursive feature elimination (RF-RFE) machine learning algorithm. Our findings indicate that the most discriminative features for breast cancer subtype classification, based on the AUROC metric, were tumour size, shape, margin characteristics, and intensity-related features, including the pattern of intensity distribution within the tumour. The MRI sequences that contributed the largest number of selected features were the inversion recovery and T1 post-contrast sequences.

Previous studies have employed RF models for various prediction and classification tasks in breast cancer, such as predicting pathological complete response (pCR) after chemotherapy [23], classifying prognostic biomarkers and subtypes [24], and identifying metastatic brain tumour types [25]. As a supervised machine learning algorithm, RF is well suited for both regression and classification tasks. It is capable of handling datasets with a mix of continuous and categorical variables, making it particularly effective for classifying breast cancer subtypes, as demonstrated in our study.

The hybrid RF-RFE approach is particularly advantageous in radiomics due to its ability to manage high-dimensional, complex, and noisy data. This method has also been successfully applied in previous studies, such as the prediction of pCR after chemotherapy using MRI, where RF and RFE demonstrated high accuracy in classifying pCR following neoadjuvant chemotherapy [26]. In addition, Mylona et al. evaluated multiple machine learning models and identified RFE as one of the most effective feature selection techniques, alongside L1-LASSO, RF importance (RF-imp), and Boruta [27]. A previous study by Mitani, Ono [28] reported that RF-RFE outperformed XGBoost combined with RFE. Radiomic datasets, as used in this study, involve high-dimensional data with hundreds of features derived from MRI. The RF-RFE approach effectively reduced dimensionality by selecting only the most relevant features, thereby improving model performance and minimising overfitting. RF ranks feature importance using metrics such as the Gini impurity, while RFE iteratively removes the least important features so that correlated or redundant features do not degrade the model. At each iteration, RFE refits the model on the remaining features and discards the lowest-ranked ones until the most predictive feature set remains. Because the resulting feature set is small and interpretable, the model is better aligned with clinical applications, enabling clinicians to understand and trust its predictions.
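The RF-RFE loop described above can be sketched as follows. This is a minimal illustration using scikit-learn's `RFE` wrapper around a `RandomForestClassifier`, not the pipeline used in this study: the synthetic data from `make_classification`, the number of trees, and the target of five retained features are all assumptions chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

# Synthetic stand-in for a radiomics table: 120 lesions x 40 features,
# 3 classes. Hypothetical data, not the study cohort.
X, y = make_classification(
    n_samples=120,
    n_features=40,
    n_informative=6,
    n_redundant=4,
    n_classes=3,
    n_clusters_per_class=1,
    random_state=0,
)

# The forest supplies impurity-based (Gini) feature importances.
forest = RandomForestClassifier(n_estimators=100, random_state=0)

# RFE refits the forest repeatedly and drops the lowest-ranked feature
# each round (step=1) until n_features_to_select features remain.
selector = RFE(estimator=forest, n_features_to_select=5, step=1)
selector.fit(X, y)

selected = [i for i, keep in enumerate(selector.support_) if keep]
print("selected feature indices:", selected)
print("elimination ranking (1 = kept):", selector.ranking_.tolist())
```

In practice the number of retained features would typically be tuned by cross-validation rather than fixed in advance, which is an additional design choice not shown here.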

The sequence that yielded the greatest number of features after selection and reduction in our study was inversion recovery (IR), with two features for luminal, one for HER2-enriched, and two for TNBC. This was followed by the T1W post-contrast sequence, which provided two features for HER2-enriched and one for TNBC. The T1W pre-contrast sequence provided one feature for luminal and one for HER2-enriched. In summary, 7 of the 10 radiomics features were derived from non-contrast-enhanced sequences (IR and T1W pre-contrast). This highlights the relevance of non-contrast-enhanced sequences, specifically IR, which can provide comparable radiomics data without relying on angiogenesis or perfusion information. This is particularly relevant in cases where the use of gadolinium is contraindicated or not advised [29, 30].

Morphological tumour features, including shape and margin, play a crucial role in lesion characterisation in MRI breast, as outlined in the standardised reporting guidelines of the ACR-BI-RADS committee [31]. The shape-related features selected in this study, which are sphericity, volume in voxels, and compactness, align with these established radiological criteria. Therefore, it is not surprising that morphology emerged as an important radiomic feature for classifying HER2-enriched and TNBC subtypes in our study. This finding is consistent with previous research, which also reported significant differences in shape features between breast cancer subtypes [32]. Figure 10 shows sample cases from our study demonstrating the differences in features among the breast cancer subtypes.

Fig. 10. Case examples of MRI images (post-contrast) in different molecular subtypes from the study. A Luminal, B HER2-enriched and C triple-negative breast cancer

The differences in imaging and texture features observed among breast cancer subtypes in our study can be related to their known cellular and pathological characteristics. TNBCs typically demonstrate high cellularity and central tumour necrosis, which may contribute to the rim enhancement and increased texture heterogeneity seen on MRI. HER2-enriched tumours are associated with increased angiogenesis and higher cellular density, which is consistent with the stronger enhancement patterns observed and with texture parameters that indicate heterogeneity. In contrast, luminal subtypes generally have lower cellular density and slower proliferation rates, which are reflected in less aggressive enhancement patterns and lower heterogeneity on texture analysis. By aligning these imaging findings with the established biological behaviour of the subtypes, our study supports the potential utility of MRI-based texture analysis as a surrogate marker for tumour biology.

The present study highlighted the significance of sequence selection and model optimisation in enhancing accuracy and likelihood ratios for breast cancer subtype classification. Previous radiomics research using IR images from breast MRI has focussed on predicting lymphovascular invasion (LVI) [33] and axillary lymph node (ALN) status [34]. Gamal et al. [35] demonstrated that radiomic features extracted from non-contrast sequences, including T1W, T2W, and IR, can predict neoadjuvant chemotherapy response in breast cancer.

Visually perceptible (semantic) morphological features of the tumour, including shape and margin, are critical findings for lesion characterisation in MRI breast according to the standardised reporting guidelines of the ACR-BI-RADS committee [31]. Consequently, it is not unexpected that a morphological feature (shape) was among the radiomic features selected for classifying the HER2-enriched and TNBC subtypes in our study. Shape features have also been reported to vary considerably among subtypes [32]. In this study, volume, compactness, and sphericity were selected as key features for the HER2-enriched and TNBC subtypes. This is likely because tumour morphology and spatial characteristics are particularly relevant for these aggressive subtypes, which exhibit distinct morphological and growth patterns compared with the others [36]. Biologically, TNBCs are inherently more aggressive, making features such as compactness valuable for quantifying their shape. Compactness measures how closely a tumour’s shape resembles a perfect sphere, a property that distinguishes TNBC from the other subtypes [37]. It also reflects surface irregularity, with lower compactness values indicating greater irregularity.

In contrast, sphericity quantifies how closely a volume of interest (VOI) resembles a sphere overall, with lower sphericity values suggesting elongation or asymmetry. Volume in voxels and sphericity were identified as key features in the HER2-enriched subtype. Tumour size, a distinguishing feature between HER2-enriched and luminal subtypes in this study, was incorporated into the first step of the nomogram series. As previously reported, the HER2-enriched subtype exhibits distinct shape characteristics compared with other subtypes [38].
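To make the two shape descriptors concrete, the sketch below computes them from a volume V and surface area A. It assumes the definitions given in the Image Biomarker Standardisation Initiative (IBSI) nomenclature, sphericity = (36πV²)^(1/3) / A and compactness 2 = 36πV² / A³; the radiomics software used in this study may define compactness differently, and the two test shapes are hypothetical.

```python
import math


def sphericity(volume, surface_area):
    """(36*pi*V^2)^(1/3) / A. Equals 1.0 for a perfect sphere."""
    return (36.0 * math.pi * volume ** 2) ** (1.0 / 3.0) / surface_area


def compactness2(volume, surface_area):
    """36*pi*V^2 / A^3. Under this definition it is sphericity cubed."""
    return 36.0 * math.pi * volume ** 2 / surface_area ** 3


# Hypothetical shape 1: a sphere of radius 10 mm.
r = 10.0
sphere_v = 4.0 / 3.0 * math.pi * r ** 3
sphere_a = 4.0 * math.pi * r ** 2

# Hypothetical shape 2: an elongated 40 x 10 x 10 mm box.
box_v = 40.0 * 10.0 * 10.0
box_a = 2.0 * (40.0 * 10.0 + 40.0 * 10.0 + 10.0 * 10.0)

print("sphere:", sphericity(sphere_v, sphere_a), compactness2(sphere_v, sphere_a))
print("box:   ", sphericity(box_v, box_a), compactness2(box_v, box_a))
```

Both values fall below 1 as the shape departs from a sphere, which matches the reading in the text that lower values indicate elongation or irregularity.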

Radiomic intensity features, such as discretised AUC CSH (uniformity across classes using compact summarised histograms), skewness, and mean intensity, have shown correlations with specific breast cancer molecular subtypes in this study.
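As a brief illustration of two of the first-order intensity features named above, the sketch below computes mean intensity and skewness for a list of voxel values. It assumes the population (biased) skewness formula m3 / m2^1.5; the voxel values are invented for the example, and the discretised AUC CSH feature is not reproduced here.

```python
import math


def mean_intensity(values):
    """Arithmetic mean of the voxel intensities."""
    return sum(values) / len(values)


def skewness(values):
    """Population skewness m3 / m2**1.5 (undefined for constant input)."""
    mu = mean_intensity(values)
    m2 = sum((v - mu) ** 2 for v in values) / len(values)
    m3 = sum((v - mu) ** 3 for v in values) / len(values)
    return m3 / m2 ** 1.5


# Hypothetical voxel intensities inside two lesions.
symmetric = [1, 2, 3, 4, 5]   # evenly spread around the mean
bright_tail = [1, 1, 1, 5]    # a few bright voxels pull the tail right

print(mean_intensity(symmetric), skewness(symmetric))
print(mean_intensity(bright_tail), skewness(bright_tail))
```

A positive skewness indicates a histogram with a tail towards brighter voxels, while a value near zero indicates a roughly symmetric intensity distribution.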

In addition, our study highlights the importance of the spatial distribution of discretised grey levels within the tumour in distinguishing luminal subtype cancers from other subtypes. The grey-level run-length matrix (GLRLM), which quantifies the length of consecutive voxels with the same grey level along a defined direction, was identified as a key predictive feature. In GLRLM, fine textures exhibit shorter runs, whereas coarser textures have longer runs with varying intensity values. Prior studies have demonstrated the utility of run-length non-uniformity in differentiating benign from malignant breast lesions in dynamic contrast-enhanced MRI [39] and in distinguishing luminal A from luminal B subtypes [32].
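The run-length idea can be shown on a toy example. The sketch below builds a GLRLM along the horizontal direction only, for a small 2D image that is already discretised to integer grey levels, and derives the mean run length and run-length non-uniformity (the sum of squared run-length column totals divided by the number of runs). The function names and the two 4 x 4 test images are assumptions for illustration; radiomics software normally aggregates several directions in 3D.

```python
def glrlm_horizontal(image, n_levels):
    """Run-length matrix along rows.

    rlm[g][l - 1] counts runs of grey level g with length l.
    Assumes every row is non-empty and values lie in range(n_levels).
    """
    max_run = max(len(row) for row in image)
    rlm = [[0] * max_run for _ in range(n_levels)]
    for row in image:
        run_level, run_len = row[0], 1
        for value in row[1:]:
            if value == run_level:
                run_len += 1
            else:
                rlm[run_level][run_len - 1] += 1
                run_level, run_len = value, 1
        rlm[run_level][run_len - 1] += 1  # close the last run in the row
    return rlm


def mean_run_length(rlm):
    """Average run length over all runs in the matrix."""
    n_runs = sum(sum(r) for r in rlm)
    total = sum((j + 1) * r[j] for r in rlm for j in range(len(r)))
    return total / n_runs


def run_length_non_uniformity(rlm):
    """Sum over run lengths of (runs with that length)^2, divided by n_runs."""
    n_runs = sum(sum(r) for r in rlm)
    col_sums = [sum(r[j] for r in rlm) for j in range(len(rlm[0]))]
    return sum(c * c for c in col_sums) / n_runs


# Hypothetical textures with two grey levels (0 and 1).
fine = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]
coarse = [[0, 0, 0, 0], [1, 1, 1, 1], [0, 0, 0, 0], [1, 1, 1, 1]]

for name, img in (("fine", fine), ("coarse", coarse)):
    m = glrlm_horizontal(img, n_levels=2)
    print(name, m, mean_run_length(m), run_length_non_uniformity(m))
```

The checkerboard-like image yields only runs of length 1, whereas the banded image yields only runs of length 4, mirroring the fine versus coarse distinction described in the text.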

This study presents a novel approach compared to existing literature, as research on predictive modelling based on multi-ethnic patient cohorts remains limited. We focussed on invasive breast cancer cases and utilised multiple MRI sequences for molecular subtyping, enhancing the model’s generalisability and applicability across diverse populations. Notably, our findings demonstrate that texture features extracted from non-contrast-enhanced MRI sequences can effectively differentiate subtypes, reinforcing the potential of alternative imaging techniques. By testing four different MRI sequences, we provide further evidence supporting this observation.

However, this study has several limitations. It was conducted at a single centre with a relatively small sample size and an imbalanced subtype distribution. In addition, luminal A and luminal B subtypes were grouped as a single luminal category due to the absence of a standardised separation method in our dataset. We acknowledge the potential bias introduced by drawing the VOIs on post-contrast images and mapping them to the remaining sequences. The segmentation was performed by a single trained radiologist and reviewed by a senior breast radiologist. Multi-reader segmentation with inter-observer variability analysis would enhance methodological rigour; this will be considered in future prospective studies and external validation. Other potentially relevant clinical variables, such as risk factors and menopausal status, were not collected. We also acknowledge that precision and other threshold-dependent performance metrics were not reported, as our primary objective was to evaluate the overall discriminative ability of MRI-based texture features using AUROC, which is threshold-independent and appropriate for multi-class classification. Future studies with larger sample sizes, diverse imaging datasets, and external validation are necessary to confirm the reproducibility and robustness of our findings and to further assess model performance in real-world settings.
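For readers less familiar with multi-class AUROC, the sketch below shows one common way of computing it: a one-vs-rest AUROC per subtype, obtained from pairwise score comparisons, followed by a macro average. The text does not state which averaging scheme was used in this study, so the one-vs-rest choice, the function names, and the six predicted probability rows are assumptions for illustration only.

```python
def binary_auroc(labels, scores):
    """Probability that a random positive scores above a random negative.

    Ties count as half. Assumes at least one positive and one negative.
    """
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_ovr_auroc(y_true, prob_rows, classes):
    """One-vs-rest AUROC for each class, plus their unweighted mean."""
    per_class = {}
    for k, cls in enumerate(classes):
        labels = [1 if y == cls else 0 for y in y_true]
        scores = [row[k] for row in prob_rows]
        per_class[cls] = binary_auroc(labels, scores)
    return per_class, sum(per_class.values()) / len(per_class)


# Hypothetical predictions; columns follow the order of `classes`.
classes = ["luminal", "HER2", "TNBC"]
y_true = ["luminal", "luminal", "HER2", "HER2", "TNBC", "TNBC"]
probs = [
    [0.7, 0.2, 0.1],
    [0.4, 0.5, 0.1],
    [0.3, 0.6, 0.1],
    [0.5, 0.3, 0.2],
    [0.1, 0.2, 0.7],
    [0.2, 0.2, 0.6],
]

per_class, macro = macro_ovr_auroc(y_true, probs, classes)
print(per_class, macro)
```

No classification threshold appears anywhere in this calculation, which is the sense in which AUROC is threshold-independent; precision and similar metrics would require choosing one.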

Clinical implications and future direction

This study lays the groundwork for expanding the role of MRI breast in complementing or potentially replacing invasive techniques for molecular subtyping, thereby promoting less invasive and more efficient diagnostic strategies towards precision medicine. The study observed no significant differences in MRI-derived features among the three major ethnic groups, indicating that the imaging characteristics of breast cancer subtypes are largely consistent across ethnicities. Hence, our report supports the potential generalisability of imaging-based predictive models in multi-ethnic populations.

We acknowledge that the current study is preliminary, and for this method to be adopted clinically, further validation on a larger, multicentre dataset with diverse scanner settings is essential. While it is challenging to specify an exact number, we estimate that a dataset of at least 500–1000 patients with balanced subtype distribution, including external validation, would be necessary to refine the model, evaluate its generalisability, and establish robust performance metrics suitable for clinical implementation.

Future studies may benefit from exploring ensemble machine learning approaches that integrate multiple models, as well as implementing subtype-specific preprocessing pipelines. These strategies could further improve predictive accuracy and enhance the robustness of radiomics-based classification.

We envision that the proposed MRI-based texture analysis method could serve as a non-invasive adjunct tool in the clinical workflow to aid in the pre-treatment prediction of breast cancer subtypes. This approach may complement histopathological analysis, particularly in cases where biopsy results are inconclusive, repeated biopsies are not feasible, or when there is a need for additional non-invasive assessment for treatment planning and risk stratification.
