Innovations in artificial intelligence to minimize diagnostic error - a comparison with human interpretation of chest radiographs in the clinical context: a scoping review

Since the advent of the digital era, artificial intelligence (AI) has assumed a prominent role in various medical specialties, ranging from AI-assisted robotic surgery to its integration into diagnostic imaging, such as chest X-rays. As the practice of medicine has become increasingly complex, the adoption of advanced diagnostic support tools has grown accordingly. Although the clinical implementation of AI entails high costs, particularly when scaled to hospital-wide applications, it offers distinct advantages. These include the automation of routine and repetitive tasks, which paves the way for enhanced creative thinking and expedited decision-making [9].

4.1 Pulmonary nodules detected using artificial intelligence in chest X-rays

Five articles (Table 6) focused on the detection of pulmonary nodules in chest X-rays using AI compared with medical personnel. These studies employed various methodologies, including retrospective and observational retrospective designs, a single-center retrospective study, a pragmatic open-label randomized controlled trial, and a multicenter retrospective cohort study. Despite this methodological variability, all studies shared a common objective: to evaluate the effectiveness of AI in detecting pulmonary nodules and to compare its diagnostic performance with that of healthcare professionals.

Table 5 Sensitivity, specificity, and area under the curve in image reading with artificial intelligence

Table 6 Diagnostic performance comparison (AI vs. human interpretation) in pulmonary nodules

The sensitivity of AI in detecting pulmonary nodules ranged from 60% to 80%, with several factors influencing diagnostic accuracy, most notably nodule size. For instance, Takamatsu et al. stratified nodules by size (levels 1 to 4): level 4 nodules were fully detected by AI, whereas only 37% of level 1 nodules were identified, indicating a positive correlation between nodule size and detection sensitivity [10]. Nevertheless, AI demonstrated reduced sensitivity, as low as 52%, in detecting adenocarcinomas specifically [11].

In a retrospective study analyzing clinical records associated with chest radiographs, the AI software achieved a sensitivity of 66.3%, specificity of 98.0%, positive predictive value (PPV) of 35.7%, and negative predictive value (NPV) of 99.4%. In comparison, clinical reports yielded a sensitivity of 80.0%, specificity of 98.0%, PPV of 35.3%, and NPV of 99.7%, underscoring the continued importance of expert clinical interpretation [12]. The AI system produced 34 false negatives when benchmarked against assessments by a multidisciplinary team. These diagnostic errors were attributed to complex clinical conditions, including obstructive pulmonary collapse secondary to malignancy (18.2%), solitary or multiple malignant nodules (13.6%), malignant pleural effusion (9.1%), and persistent pulmonary consolidation (9.1%), highlighting the necessity for ongoing refinement and clinical training of AI models [13].
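For readers less familiar with these metrics, all four derive from a single two-by-two confusion matrix. The following Python sketch illustrates the calculation; the counts are hypothetical, chosen so that the derived values approximate the AI figures quoted above, since the cited studies report only the derived percentages rather than the raw matrices:

```python
# Illustrative calculation of the diagnostic metrics cited above.
# The counts below are hypothetical: they were chosen so the derived
# values approximate the reported AI figures, since the cited studies
# do not publish their underlying confusion matrices.

def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute sensitivity, specificity, PPV, and NPV from a 2x2 confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),  # true-positive rate
        "specificity": tn / (tn + fp),  # true-negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }

# Hypothetical screening cohort with low disease prevalence, as in
# nodule screening; low prevalence explains how a high NPV (99.4%)
# can coexist with a low PPV (35.7%).
metrics = diagnostic_metrics(tp=66, fp=119, fn=34, tn=5781)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

The low-prevalence setting is the key design point here: even a highly specific test yields many false positives relative to true positives when disease is rare, which is consistent with the modest PPVs reported in both arms above.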

As such, AI is best utilized as a clinical decision support tool rather than a stand-alone diagnostic modality [14]. In a retrospective study comparing diagnostic outcomes between physicians working with and without AI support, nodule detection rates were 0.59% and 0.25%, respectively, and false referral rates were comparable between the two groups: 45.9% with AI assistance versus 56.0% without. A multicenter study involving multiple clinical readers reported statistically significant improvements in sensitivity, specificity, and overall diagnostic accuracy with AI-enhanced interpretation, with sensitivity gains ranging from 9% to 12% [15]. These improvements were consistent across healthcare providers of varying experience levels, including general practitioners, radiology residents, and expert radiologists, reaffirming AI’s capacity to augment diagnostic precision. Furthermore, the integration of AI into clinical practice facilitates timely referrals in primary care settings, optimizing workflow and resource utilization [13,14,15]. Nonetheless, high implementation costs remain a substantial barrier, particularly in healthcare centers with limited access to specialized personnel. Iglesias López highlighted the logistical and infrastructural challenges of integrating AI-based diagnostic systems in countries such as Cuba, where national medical networks and technological infrastructure are underdeveloped. Notably, no studies to date have evaluated the costs or logistical processes of implementing AI systems in Colombia [9].

4.1.1 Lung cancer detected using artificial intelligence in chest X-rays

Two studies (Table 7) specifically addressed the diagnosis of lung cancer using artificial intelligence (AI) applied to chest radiographs. In both investigations, AI was employed as a diagnostic support tool for healthcare professionals, resulting in two comparison groups: one in which clinical interpretation was assisted by AI, and another in which it was conducted without AI support. Both studies concluded that the incorporation of AI enhanced clinicians’ diagnostic performance, reporting mean values for precision, sensitivity, and specificity of 90.67%, 91.33%, and 90%, respectively [16, 17]. However, it is important to note that neither study conducted a direct head-to-head comparison between the diagnostic accuracy of AI operating independently and that of medical personnel. This highlights a critical gap in the literature and underscores the need for further comparative research to evaluate the standalone performance of AI relative to human experts [16, 17].

Table 7 Diagnostic performance in respiratory pathologies (COVID-19, pneumonia, etc.)

4.2 Respiratory pathologies detected using artificial intelligence in chest X-rays

Three studies (Table 8) investigated the performance of artificial intelligence (AI) in interpreting respiratory pathologies, with a particular focus on COVID-19. All three utilized retrospective methodologies to assess the diagnostic accuracy of AI in chest radiographs in comparison to expert human interpretation.

Table 8 Diagnostic performance in other thoracic pathologies

The first study analyzed 300 images evaluated by five certified radiologists and an AI system (DeepCOVID-XR). The AI system achieved an overall accuracy of 83% and a sensitivity of 75%, whereas the radiologists demonstrated a slightly lower accuracy of 81% [18]. The second study evaluated another AI model, COV19NET, against three radiologists; it reported a sensitivity of 85% and a specificity of 81%, with the AI significantly outperforming the human readers (P = 0.01) [19]. In the third study, a deep learning model was directly compared to radiologists, with results indicating that experienced radiologists still outperformed the AI. Nonetheless, the model was recommended as a rapid, real-time diagnostic support tool, particularly beneficial in settings with limited medical resources [20].
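Comparisons of this kind are typically performed on paired reads of the same images, for which McNemar’s test is a standard choice. The sketch below illustrates the general procedure on synthetic data; the cited studies do not publish their per-image reads, and their exact statistical methods may differ:

```python
# Illustrative paired comparison of two readers (AI vs. radiologist)
# on the same images using McNemar's test. All data are synthetic;
# the cited studies do not release their per-image reads.
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

rng = np.random.default_rng(0)
n = 300                                 # number of images read by both
ai_correct = rng.random(n) < 0.83       # AI correct on ~83% of reads
reader_correct = rng.random(n) < 0.81   # radiologist correct on ~81%

# 2x2 agreement table; McNemar's test uses only the discordant cells,
# i.e., images where exactly one of the two readers was correct.
table = np.array([
    [np.sum(ai_correct & reader_correct),  np.sum(ai_correct & ~reader_correct)],
    [np.sum(~ai_correct & reader_correct), np.sum(~ai_correct & ~reader_correct)],
])
result = mcnemar(table, exact=True)  # exact binomial test on discordant pairs
print(f"discordant pairs: {table[0, 1]} vs {table[1, 0]}, p = {result.pvalue:.3f}")
```

Because both readers see the same images, the paired design has more power than comparing two independent accuracy estimates, which is why small absolute differences (83% vs. 81%) can still reach significance in sufficiently large reader studies.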

These findings underscore AI’s potential to enhance diagnostic capabilities for respiratory diseases such as COVID-19, particularly when used by less experienced clinicians or in resource-constrained environments. However, the effectiveness of AI remains closely tied to the expertise of the interpreting physician. Further research is necessary to optimize the integration of AI into clinical workflows and to better understand the dynamics of AI-human collaboration in diagnostic radiology.

4.3 Other pathologies detected using artificial intelligence in chest X-rays

Six studies (Table 9) examined radiographic findings across a range of pathologies, including pneumothorax, pleural effusion, pneumonia, mediastinal and hilar masses, and pulmonary nodules. These investigations focused on evaluating sensitivity, specificity, and area under the curve (AUC) when comparing AI systems used either autonomously or in combination with clinician interpretation.

Table 9 Diagnostic performance in other thoracic pathologies

One study reported that radiologists, including thoracic imaging specialists, general radiologists, and radiology residents, achieved adequate sensitivity and specificity without AI assistance. However, the introduction of AI improved sensitivity across all groups, except for thoracic radiologists evaluating mediastinal and hilar masses. In this subgroup, AI marginally reduced performance by 0.5 percentage points (95% CI: −1.7 to 0.5; P = 0.32), suggesting that experienced radiologists may be skeptical of AI, which could affect their diagnostic performance [21].

A South Korean study found that AI-CAD achieved a sensitivity of 95.3% with a false-positive rate of 61.7%. In contrast, on-call radiologists had a sensitivity of 66.6% (P < 0.001) and a false-positive rate of 18.9% (P < 0.001). While AI showed higher sensitivity, radiologists made fewer false-positive errors [22].

A study conducted in the United Kingdom highlighted the influence of clinical experience on diagnostic outcomes. Junior clinicians experienced a 12% increase in sensitivity (95% CI: 4–19%) for pulmonary nodule detection with AI support, while senior radiologists showed a 9% improvement (95% CI: 0.5–17%). Minor improvements in specificity were also observed in both groups [14]. Notably, intermediate-level readers outperformed senior radiologists when unaided, suggesting that AI support may be particularly valuable for less experienced clinicians [17].

Additionally, AI demonstrated superior diagnostic accuracy in detecting certain conditions, such as pneumothorax and pneumoperitoneum, but was less effective in identifying more complex pathologies such as consolidation and atelectasis. This variation may be attributed to factors such as image quality and diagnostic complexity [23]. One study comparing autonomous AI interpretation to unaided clinician performance reported a higher AUC for AI in detecting consolidation (0.93 vs. 0.71), with AI showing greater sensitivity across all pathologies, particularly pneumothorax [24]. Another study documented an average AI AUC of 0.80 (95% CI: 0.858–0.875), consistent with results from AI-assisted interpretations [21]. Collectively, these findings suggest that well-trained AI systems can serve as valuable diagnostic aids. However, the final interpretation and clinical decision-making should remain under the purview of qualified healthcare professionals. Current clinician trust in AI remains limited, largely due to the scarcity of real-world validation data and concerns over reliability in complex clinical scenarios.
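For context, an AUC such as the 0.93 versus 0.71 comparison above summarizes how reliably a model’s scores rank positive cases above negative ones across all decision thresholds. A minimal illustration of how such a value is computed from per-image scores follows; the labels and scores are synthetic, since the studies’ raw model outputs are not public:

```python
# Illustrative AUC computation from continuous model scores.
# Labels and scores are synthetic; the cited studies' per-image
# outputs are not publicly available.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)
n_pos, n_neg = 80, 220
# Positive cases (e.g., consolidation present) score higher on average.
scores = np.concatenate([
    rng.normal(loc=0.7, scale=0.15, size=n_pos),
    rng.normal(loc=0.4, scale=0.15, size=n_neg),
])
labels = np.concatenate([np.ones(n_pos), np.zeros(n_neg)])

auc = roc_auc_score(labels, scores)
print(f"AUC = {auc:.2f}")  # probability a random positive outranks a random negative
```

Unlike sensitivity and specificity, the AUC is threshold-independent, which is why it is often the preferred summary when comparing an autonomous AI system against readers who each apply their own implicit operating point.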

4.4 Ethical implications of AI in medicine

Given the significant potential of artificial intelligence (AI) in medical practices such as predictive analytics and support for clinical decision-making, it is essential to consider the complex ethical challenges associated with its use [25]. While AI demonstrates remarkable capabilities in performing specific tasks and analyzing clinical cases, it cannot replicate human soft skills, such as empathy and compassion, which are crucial components of medical care. As a result, medical management guided solely by AI risks lacking these essential humanistic dimensions [25].

Furthermore, questions of accountability must be addressed, including the extent of responsibility borne by the AI system, its developers, and the healthcare professionals involved in the diagnostic process [26]. The integration of AI into clinical practice also raises concerns about patient privacy [27], as these systems rely on the collection and processing of sensitive personal and medical data to train machine learning algorithms [28]. In many cases, the issue of obtaining individual informed consent for data use remains a significant ethical concern [29].

Nevertheless, AI presents a valuable opportunity for collaborative interaction between humans and machines. In this model, physicians retain full autonomy in determining whether to accept or reject the system’s recommendations, thereby maintaining professional judgment and accountability [25].

Finally, the perspectives of leading international organizations, such as the World Health Organization (WHO), UNESCO, and the World Medical Association (WMA), must be considered [30,31,32]. These institutions advocate for fair and equitable access to AI technologies, while emphasizing the need for algorithmic transparency and inclusivity for both healthcare providers and patients. They also underscore that the ethical integration of AI into healthcare is an ongoing process, requiring continuous assessment and strong regulatory frameworks [26].

4.5 Study limitations

Although the application of artificial intelligence to the reading and interpretation of chest X-rays has shown significant progress across a variety of international studies, it is important to highlight the following limitations:

Data size and representation: Most studies were conducted on populations from the United States and other developed countries, so there is no comprehensive global representation of these models in regions with economic constraints or limited technology. The accuracy of artificial intelligence algorithms may therefore vary in underrepresented populations.

Real medical environments: These systems have not been adequately evaluated in real-world clinical settings with diverse patient populations, varying image quality, and limited time for care. The performance of artificial intelligence was assessed in controlled, preselected environments, meaning that findings may not translate directly to real-world medical settings.

Human interaction in decision-making: A diagnosis made by an artificial intelligence system must always be validated by a physician; regardless of how precise it may be, the final decision regarding diagnosis and treatment should rest with a human. Some of the comparisons discussed above indicate that algorithm performance was undermined by physicians’ limited trust in artificial intelligence, as there is little evidence on decision-making based solely on AI-generated findings.

Lack of complete clinical context: These systems are trained to read specific pathologies, following limited, predefined patterns. Unlike human clinicians, artificial intelligence systems may have limited access to relevant patient information, such as detailed medical history and current symptoms. While some models can integrate contextual information, their capacity to do so remains limited compared to human decision-making.

Limited comparison among expert radiologists: Radiologists differ in their levels of experience, which can influence diagnostic perception. The studies discussed above accounted for these differences; however, no large-scale study involving a greater number of radiologists has yet been conducted.

Bias in training data: The datasets used to train artificial intelligence systems may be biased owing to limited information on clinical cases and patients. A system’s reduced ability to identify atypical or underrepresented cases could therefore compromise its effectiveness in real-world applications.

4.6 Beyond the numbers: the cost of implementing AI in radiology

The implementation of artificial intelligence (AI) software in hospital settings offers numerous advantages in terms of diagnostic accuracy and operational efficiency. However, its adoption also entails a range of costs that must be carefully considered by healthcare institutions.

Direct costs include several critical components. These encompass the number of software licenses required—often proportional to the size and complexity of the institution—as well as the personnel costs associated with installation, system configuration, and training of radiology staff, who are typically the primary users of these tools. Moreover, ongoing expenses such as maintenance contracts, technical support, and regular software updates must be factored into the budget to ensure long-term functionality and optimal performance.

Indirect costs, although less apparent, can have a substantial impact on hospital operations. For example, temporary interruptions in radiology services during the installation phase may result in delayed diagnoses and disruptions to patient care, leading to potential financial losses. Additionally, continuous staff training is necessary—particularly in institutions with high turnover or rotating personnel—which represents a recurring investment in both time and human resources.

In the case of computer-aided detection (CAD), screening costs vary depending on the software used. For instance, the per-screen cost for systems such as CAD4TB and InferRead ranges from USD 0.19 to USD 2.78. These figures are generally lower than the cost of radiologist readings, which fall between USD 0.70 and USD 0.93 in high-volume settings. Nevertheless, the estimated cumulative cost of implementing such AI programs nationwide over a four-year period could range from USD 2.65 million to USD 19.23 million [33].
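Applying those published per-screen unit costs to a hypothetical screening volume gives a rough sense of the scale involved. In the sketch below, the screening volume and time horizon are assumptions introduced for illustration, not figures from the cited study:

```python
# Back-of-the-envelope screening cost comparison using the per-screen
# unit costs cited above [33]. The screening volume and time horizon
# are hypothetical assumptions for illustration only.
CAD_COST_RANGE = (0.19, 2.78)     # USD per screen (CAD4TB, InferRead)
READER_COST_RANGE = (0.70, 0.93)  # USD per radiologist read, high-volume settings

screens_per_year = 500_000        # assumed national screening volume
years = 4                         # horizon matching the cited estimate

for name, (low, high) in [("AI-CAD", CAD_COST_RANGE),
                          ("Radiologist", READER_COST_RANGE)]:
    total_low = low * screens_per_year * years
    total_high = high * screens_per_year * years
    print(f"{name}: USD {total_low:,.0f} to {total_high:,.0f} over {years} years")
```

Note that this captures per-screen costs only; the direct and indirect expenses discussed above (licenses, installation, training, and service interruptions) would come on top, which helps explain why the cited nationwide estimates of USD 2.65 million to USD 19.23 million run higher than a simple unit-cost multiplication.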

Notably, the literature and official websites of widely used AI systems often lack precise cost breakdowns, reflecting the inherent complexity and variability of implementation-related expenses. This underscores the need for future research to produce comprehensive cost-effectiveness analyses, comparing initial and long-term expenditures with expected clinical and operational benefits.
