Macrotrabecular-massive subtype in hepatocellular carcinoma based on contrast-enhanced CT: deep learning outperforms machine learning

To achieve a non-invasive and objective diagnosis of the aggressive MTM subtype in HCC, we developed and validated a DL model (RVCL algorithm) based on contrast-enhanced CT, systematically evaluating its diagnostic performance for MTM. The results demonstrated that the RVCL model outperformed existing mainstream DL models, such as AlexNet, VGG, ResNet, ViT, and EfficientNet, in identifying the MTM subtype, showcasing superior diagnostic efficacy. Furthermore, compared to traditional ML models, including LR, RF, and SVM, the RVCL algorithm exhibited significant advantages, while the performance of the three traditional ML models in MTM identification showed no notable differences.

The superior diagnostic performance of the RVCL model over existing DL and ML models can be attributed to its innovative architectural design, which is reflected in three key aspects. First, by leveraging ResNet-50 to extract local features and integrating ViT to capture long-range dependencies, RVCL achieves synergistic modeling of both local and global features. This approach addresses the limitations of traditional CNNs in global context modeling and pure Transformers in local detail capture, significantly improving the Recall metric (external test set Recall of 75%, a notable increase compared to ViT’s 50%). Second, RVCL effectively combines residual learning with self-attention mechanisms. The residual connections mitigate the vanishing gradient problem, while the self-attention mechanism dynamically adjusts feature weights, enabling the model to focus more effectively on critical regions. This results in exceptional performance in F1 Score (78%) and AUC (93%). Finally, RVCL employs a hybrid loss function that combines binary cross-entropy (BCE) loss with triplet loss. The BCE loss ensures classification accuracy, while the triplet loss enhances feature discriminability, allowing the model to maintain high Precision (91%) while significantly improving Recall and F1 Score. This achieves a better balance between precision and recall. These design elements collectively enable RVCL to excel in fine-grained classification tasks.
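The hybrid loss described above can be illustrated with a minimal sketch in plain Python. The text does not state how the two terms are weighted or which margin is used, so the `weight` and `margin` parameters, the Euclidean distance, and all function names below are illustrative assumptions and not the authors' implementation.

```python
import math

def bce_loss(probs, labels, eps=1e-7):
    """Mean binary cross-entropy over predicted probabilities."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: pull the same-class pair together and
    push the different-class pair at least `margin` further apart."""
    return max(0.0, euclidean(anchor, positive)
               - euclidean(anchor, negative) + margin)

def hybrid_loss(probs, labels, triplets, weight=0.5, margin=1.0):
    """BCE plus weighted mean triplet loss.
    `weight` and `margin` are assumed values, not taken from the paper."""
    trip = sum(triplet_loss(a, p, n, margin) for a, p, n in triplets)
    return bce_loss(probs, labels) + weight * trip / len(triplets)
```

In this sketch the BCE term acts on the classifier output, while the triplet term acts on embedding vectors (anchor, same-class sample, different-class sample), which mirrors the division of roles between classification accuracy and feature discriminability described in the paragraph.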

Traditional CNNs, including architectures like AlexNet, VGG, and ResNet, demonstrate strong specificity, making them effective for excluding non-MTM cases. However, their sensitivity remains suboptimal, and they exhibit notable performance declines in external validation, suggesting susceptibility to overfitting and limited generalizability. ResNet shows relatively stable training performance but struggles with inconsistent precision in new datasets, raising concerns about false positives. ViT, relying on global feature analysis through self-attention, achieves exceptional specificity but suffers from poor sensitivity and imbalanced performance, requiring extensive datasets for optimal training. EfficientNet balances specificity and generalizability better than other baseline models, with a scalable architecture favoring deployment, though its modest sensitivity limits reliability in high-stakes MTM-HCC detection. In contrast, the hybrid RVCL model synergizes local feature extraction and global contextual understanding, delivering robust and balanced performance across sensitivity and specificity, alongside strong generalizability. However, its computational demands may hinder practical implementation in resource-limited environments. The integration of clinical data into RVCL (RVCL + C) further elevates diagnostic precision and sensitivity, achieving near-perfect performance, though reliance on standardized clinical inputs poses challenges in heterogeneous healthcare settings. While traditional CNNs and ViT prioritize specificity at the expense of sensitivity, EfficientNet offers a middle ground with deployability but compromises diagnostic accuracy. RVCL + C emerges as the gold standard for comprehensive diagnosis in well-resourced clinical workflows, whereas EfficientNet may suffice for initial screening where computational or data constraints exist. This underscores the importance of aligning model complexity, data quality, and clinical objectives to optimize real-world applicability.
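The trade-off between sensitivity and specificity discussed above follows directly from the confusion-matrix counts. The sketch below shows how the four metrics are derived; the function name and the counts in the usage example are hypothetical and are not results from this study.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Derive common diagnostic metrics from confusion-matrix counts.
    Positive class = MTM, negative class = non-MTM."""
    sensitivity = tp / (tp + fn)   # also called recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }

# Hypothetical counts chosen for illustration only.
metrics = diagnostic_metrics(tp=9, fp=1, fn=3, tn=37)
```

With these illustrative counts, a model can reach a specificity above 0.97 while missing a quarter of the positive cases (sensitivity 0.75), which is the pattern attributed to the baseline CNNs and ViT in the text.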

Compared to traditional ML models such as LR, RF, and SVM, the RVCL model demonstrated significantly superior diagnostic performance, while the three traditional ML models showed no notable differences in MTM identification. This finding exhibits some inconsistency with previous studies. For instance, Zhang et al [18], in a single-center study based on 232 magnetic resonance imaging (MRI) images, reported that LR outperformed other traditional ML algorithms (including K-nearest neighbor, Bayesian, decision tree (DT), and SVM), achieving AUC values of 0.766 and 0.739 on the training and test sets, respectively. Additionally, Feng et al [15] extracted radiomics features from contrast-enhanced CT images of 365 patients, selected features using least absolute shrinkage and selection operator (LASSO) regression, and incorporated them into an SVM classifier, yielding AUC values of 0.80 and 0.74 on the internal and external test sets, respectively. Cai et al [23] extracted radiomics features from MRI images of 127 patients and employed an RF algorithm to identify MTM-HCC, achieving AUC values of 0.916 and 0.833 on the training and validation sets, respectively. The discrepancies in these results may stem from differences in patient cohorts, imaging modalities (MRI vs CT), regions of interest (two-dimensional vs three-dimensional), and feature dimensionality reduction methods. In contrast, our study, through a head-to-head comparison, further substantiates the unique advantages of DL models in handling high-dimensional, non-linear medical imaging data, particularly in extracting complex features and modeling global contextual information. Traditional ML models, due to their limited capability in feature extraction, struggle to capture subtle differences in CT images, resulting in relatively modest performance in MTM identification tasks.
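Because the comparisons above rest on AUC values, it may help to recall what the metric measures: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The following is a minimal pairwise sketch of that definition; the function name and the scores in the usage example are hypothetical, and it is not the evaluation code used in any of the cited studies.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.
    Ties count as half a correct pair. O(n_pos * n_neg), fine for small sets."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))

# Hypothetical scores: three of the four positive/negative pairs are ordered correctly.
auc = roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

Since AUC depends only on the ranking of scores, values reported on different cohorts and imaging modalities, as in the studies cited above, are not directly comparable without a head-to-head evaluation on the same data.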

Our findings indicate that elevated serum AFP levels serve as an independent predictor of the MTM subtype, which aligns with previous reports [7, 15]. However, integrating AFP into the RVCL model did not significantly enhance its diagnostic performance (p ≥ 0.05), consistent with prior studies [15]. This may suggest that the RVCL model has already extracted sufficient diagnostic information from CT images, so that AFP adds little further information. However, this does not preclude the potential for further enhancing model performance by incorporating other clinical or imaging features, such as tumor markers, pathological data, or multimodal imaging. The fusion of multimodal data may provide the model with more comprehensive information, thereby enabling more precise diagnosis in complex cases.

Our study has several limitations. Firstly, our sample size, although multi-institutional, remains relatively small for a DL study. Larger multicenter datasets are needed to reduce potential biases and to further validate generalizability. Secondly, DL is often seen as a “black box”, and limited model interpretability can reduce clinical adoption. Future research could incorporate interpretability techniques (such as Grad-CAM) to visualize the model's decision-making process, thereby increasing clinicians’ trust in the model. Lastly, the use of different CT scanners at the two centers may have affected the radiomics features.

In conclusion, the RVCL model demonstrates significant advantages in the non-invasive diagnosis of the MTM subtype of HCC, leveraging an innovative architecture that synergistically integrates CNNs and ViTs. Its robust diagnostic performance highlights its potential as a clinical decision-support tool, paving the way for future applications in precision medicine and personalized treatment strategies.
