Sleep is regarded as a crucial component in maintaining homeostasis, mental integrity, cognitive well-being, and physical health.[1] A normal sleep-wake cycle helps provide the sleep needed to keep the body adequately rested.
Previous studies have shown that a person’s sleep habits are, in general, heritable, correlated with the structure and function of the brain, and fundamentally related to overall health.[2] Recent technological advances have further highlighted the integral role of sleep in physical and cognitive well-being.[3] Sleep disorders are increasingly prevalent among young individuals.[4,5]
The Pittsburgh Sleep Quality Index (PSQI), developed in 1989, has long been regarded as a gold standard for determining a person’s sleep quality. It is a self-administered questionnaire that covers several dimensions of sleep, including sleep duration, disturbances, and latency.[6]
Recently, artificial intelligence (AI) has emerged as an innovative tool in the healthcare sector. AI has been increasingly utilized in various aspects of sleep medicine, including scoring respiratory events, staging sleep, predicting circadian rhythms, diagnosing insomnia, and profiling obstructive sleep apnea (OSA).[7] These developments suggest that AI has considerable potential to change sleep assessment by offering more efficient and personalized evaluations.
Advanced machine learning algorithms can analyze complex sleep data, such as polysomnography recordings, to pinpoint patterns specific to sleep disorders. This improves the accuracy and effectiveness of diagnosis and paves the way for personalized treatment plans based on a person’s unique sleep profile.[8]
Devices such as the Belun Ring use AI to track the stages of sleep and identify conditions like OSA, offering real-time data that can inform customized interventions. This suggests that integrating AI into wearable technology can enhance the effectiveness of personalized sleep medicine. Furthermore, AI-enabled software may provide personalized recommendations for improving sleep hygiene by analyzing user data and suggesting behavioral modifications. These advances strengthen the potential of AI to redefine sleep medicine by enabling tailored approaches that address individual preferences and needs.[9]
Despite these developments, AI-based sleep assessment devices require stringent validation to ensure the reliability and accuracy of their reports. Studies that compare established sleep assessment tools, such as the PSQI, with AI-driven assessments are crucial for evaluating the efficacy of AI in sleep medicine.
The present study aims to analyze the performance of an AI-generated sleep quality assessment tool in comparison with the conventional sleep assessment tool, the PSQI, among undergraduate medical students. By evaluating the agreement between these two tools, we aim to explore the potential of AI in expanding personalized sleep medicine.
Aim
This study aims to analyze the effectiveness and performance of an AI-generated assessment tool for sleep quality by comparing it with the PSQI among undergraduate medical students and to investigate its potential integrated application in personalized sleep medicine.
MATERIAL AND METHODS
Ethical considerations
The current study was conducted after obtaining Institutional Ethics Committee clearance (IEC NO: 562/2022/IEC/ACSMCH). All participants were thoroughly informed about the study, and written informed consent was obtained from them prior to data collection. The anonymity and confidentiality of the participants were maintained.
Study design
The current study is a cross-sectional study that was conducted among 300 undergraduate medical students.
Study participants
Undergraduate medical students aged 18–30 years who gave written informed consent to participate were included in the study.
Undergraduate medical students with any history of OSA, acute or chronic cardiac or respiratory illness, other sleep disorders, substance use, chronic diseases such as diabetes or arthritis, or night shift work were excluded.
Assessment tools
Two different sleep quality assessment tools were used. The first is the traditional PSQI, a validated tool that assesses sleep quality over the past month.
The second is an AI-generated sleep quality assessment tool for undergraduate medical students, structured and designed by ChatGPT, which uses the same seven domains as the PSQI [Supplementary File].
ChatGPT was chosen as the AI tool because of its sophisticated natural language processing capabilities, adaptability, and easy accessibility for obtaining personalized questionnaire-based assessments.[10] Furthermore, it can facilitate human-like conversational interactions, allowing a more user-friendly and engaging experience than conventional static questionnaire formats.[11] Hence, we chose ChatGPT as a potential tool that could align with the PSQI.
Collection of data
The study participants were administered both the AI-generated and the PSQI sleep quality assessment questionnaires, which cover the same seven domains of sleep.
Statistical analysis of the collected data was performed using the Statistical Package for the Social Sciences (SPSS) software, version 20. Normality was first assessed using the Shapiro–Wilk test. Since the data were not normally distributed, the Wilcoxon signed-rank test was used to compare the scores of the two tools. Agreement between their sleep quality categorizations was assessed using Cohen’s weighted kappa coefficient. Cross-tabulation was performed to identify discrepancies in classification between the two tools.
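The first two steps of this analysis can be sketched in Python. This is only an illustration under stated assumptions: the study used SPSS, not Python; SciPy stands in for it here; the function name `compare_paired_scores`, the 0.05 threshold, and the synthetic scores are all hypothetical and are not the study data. Applying the Shapiro–Wilk test to each score set separately is also an assumption about how normality was checked.

```python
import numpy as np
from scipy import stats


def compare_paired_scores(scores_a, scores_b, alpha=0.05):
    """Check normality of each score set, then run a paired Wilcoxon test."""
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)

    # Shapiro-Wilk: a small p-value suggests the scores are not normally distributed.
    p_normal_a = stats.shapiro(a).pvalue
    p_normal_b = stats.shapiro(b).pvalue
    both_normal = bool(p_normal_a >= alpha and p_normal_b >= alpha)

    # Wilcoxon signed-rank test on the paired differences (a - b).
    wilcoxon = stats.wilcoxon(a, b)

    return {
        "p_normal_a": float(p_normal_a),
        "p_normal_b": float(p_normal_b),
        "both_normal": both_normal,
        "wilcoxon_statistic": float(wilcoxon.statistic),
        "wilcoxon_p": float(wilcoxon.pvalue),
    }


# Synthetic, made-up scores for illustration only (NOT the study data).
rng = np.random.default_rng(0)
psqi_scores = rng.integers(5, 18, size=60)
ai_scores = psqi_scores - rng.integers(0, 4, size=60)

result = compare_paired_scores(psqi_scores, ai_scores)
print(result)
```

A non-parametric paired test is the natural choice here because each participant completed both questionnaires and the scores did not pass the normality check.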
RESULTS
The scores of the AI-generated sleep assessment questionnaire and the PSQI questionnaire were found not to be normally distributed by the Shapiro–Wilk test (P < 0.0001); therefore, they were compared using the Wilcoxon signed-rank test. The results show a statistically significant difference between the two sleep assessment tools, with a Wilcoxon statistic of 8841.0 (P < 0.0001). This suggests that one of the two assessment tools systematically underestimates or overestimates sleep disturbances relative to the other.
The mean PSQI score was 11.94 ± 2.45, and the mean AI sleep assessment score was 10.65 ± 2.30. The AI tool score was therefore lower than the PSQI score. Because a higher score indicates poorer sleep, this means that the AI-generated tool tends to underestimate the severity of poor sleep when compared with the PSQI.
To assess the agreement between the AI-generated classification and the PSQI classification of sleep quality, Cohen’s weighted kappa coefficient was computed. The resulting value was 0.133, which indicates only slight agreement between the two classifications. Cross-tabulation showed that the AI-generated questionnaire often assigned participants to a different category than the PSQI did.
Table 1 shows the cross-tabulation of PSQI and AI tool sleep quality categories. The AI tool classified 45 participants with poor sleep quality on the PSQI as having good sleep. Of the participants with severe sleep disturbances on the PSQI, it classified 25 as having good sleep and 189 as having poor sleep; only four were identified by the AI tool as having severe sleep disturbances. This low agreement between the two tools explains the Cohen’s kappa value of 0.133.
Table 1: Cohen’s Kappa cross-tabulation.
| PSQI sleep category | AI: Good sleep | AI: Poor sleep | AI: Severe sleep disturbances |
| --- | --- | --- | --- |
| Poor sleep | 45 | 37 | 0 |
| Severe sleep disturbances | 25 | 189 | 4 |

This low agreement (kappa = 0.133) between the two tools indicates that the AI-generated questionnaire categorizes the sleep quality of participants differently from the PSQI. This could be due to a difference in scoring sensitivity that leads the AI tool to underestimate the severity of poor sleep. The analysis shows that the AI-generated questionnaire classifies participants with moderate sleep disturbance as having good sleep quality rather than poor sleep quality. This suggests a lower sensitivity in identifying individuals with severe sleep disturbances, potentially leading to discrepancies in classification.
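Cohen’s weighted kappa can be computed directly from the cell counts of a cross-tabulation such as Table 1. The sketch below is a minimal, hypothetical illustration rather than the authors’ SPSS procedure: the function name `weighted_kappa` is an assumption, and the all-zero "Good sleep" row for the PSQI is inferred from the fact that the two listed rows already sum to the 300 participants. The text does not state whether linear or quadratic weights were used, so both are offered, and the value obtained need not reproduce the reported 0.133 exactly.

```python
def weighted_kappa(table, weights="linear"):
    """Cohen's weighted kappa for a square table of ordered categories.

    table[i][j] is the number of participants placed in category i by the
    first tool (rows) and in category j by the second tool (columns).
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        # Penalty grows with the distance between the two categories.
        distance = abs(i - j)
        return distance if weights == "linear" else distance ** 2

    # Observed and chance-expected weighted disagreement.
    observed = sum(
        disagreement(i, j) * table[i][j] for i in range(k) for j in range(k)
    ) / n
    expected = sum(
        disagreement(i, j) * row_totals[i] * col_totals[j]
        for i in range(k)
        for j in range(k)
    ) / (n * n)
    return 1.0 - observed / expected


# Rows: PSQI category; columns: AI category (Good, Poor, Severe).
# The PSQI "Good" row is assumed to be all zeros (see the lead-in).
table_1 = [
    [0, 0, 0],     # PSQI: Good sleep (inferred, not listed in Table 1)
    [45, 37, 0],   # PSQI: Poor sleep
    [25, 189, 4],  # PSQI: Severe sleep disturbances
]

kappa_linear = weighted_kappa(table_1, weights="linear")
kappa_quadratic = weighted_kappa(table_1, weights="quadratic")
print(f"linear: {kappa_linear:.3f}, quadratic: {kappa_quadratic:.3f}")
```

Weighted kappa suits ordered categories because it penalizes a Severe-to-Good disagreement more heavily than a Severe-to-Poor one, which an unweighted kappa would treat as equal.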
DISCUSSION
In the current study [Figure 1], a statistically significant difference was observed between PSQI scores and AI-generated sleep quality assessment scores among undergraduate medical students. The mean score of the AI-generated assessment (10.65 ± 2.30) was lower than the mean PSQI score (11.94 ± 2.45), indicating that the AI tool might underestimate the severity of poor sleep relative to PSQI scoring.
In addition, a slight agreement was found between these two assessment scores, as indicated by a Cohen’s weighted kappa coefficient of 0.133. This again highlights the potential discrepancy in classification of sleep quality.
The PSQI is a well-known questionnaire for assessing sleep quality, with proven validity and reliability across various groups. Previous studies have reported good internal consistency, with Cronbach’s alpha values from 0.69 to 0.84, and good test-retest reliability.[12] In contrast, the AI-generated sleep quality assessment tool in the current study showed only slight agreement with the PSQI, as reflected in its low kappa coefficient.
This discrepancy suggests that the AI tool lacks sensitivity in identifying severe sleep disturbances, leading it to classify participants with moderate sleep disturbances as having good sleep quality.
Even though AI-driven sleep assessment has the advantages of scalability and systematization, its accuracy depends on the quality of the data and the robustness of the algorithms used. Recent AI-driven sleep quality assessment tools have shown promise, with a few models attaining noteworthy agreement in categorizing sleep quality.[13,14] For example, a previous study has shown that an AI model classified sleep stages with Cohen’s kappa values ranging from 0.70 to 0.84.[15] Nevertheless, these AI tools need large training datasets and refined modeling to meet the standards and reliability of existing validated tools like the PSQI.
Furthermore, the underestimation by the AI tool found in the present study highlights the limited ability of AI to analyze the subjective factors of sleep quality that are built into the PSQI. The algorithm of the AI tool may also be inadequate to capture individual variations in sleep patterns and the environmental factors that influence sleep quality. Table 2 summarizes the pros and cons of using the AI-generated sleep questionnaire versus the PSQI, based on the results of this study.
Table 2: Pros and cons of using the AI-generated sleep questionnaire versus PSQI.
| Key aspects | PSQI | AI-generated sleep questionnaire |
| --- | --- | --- |
| Validation of tool | Globally validated sleep assessment tool | Not yet validated |
| Accuracy | Proven reliability with high specificity and sensitivity | Likely to underestimate the severity of poor sleep, with low sensitivity and agreement (kappa = 0.133) in the current study |
| Reliability of sleep quality categorization | Proven to provide accurate categorization and traditionally used as a reference standard | Tends to misclassify moderate or severe sleep disturbances, with only slight agreement with the PSQI |
| Analysis of subjective component | Covers both objective and subjective components of sleep quality | Observed to have limited capability to assess the subjective component of sleep |
| Scalability | Conventionally scored manually and hence less scalable | Highly scalable because it is fully automated and can be used for large-scale population studies |
| Time consumption | Time-consuming to score manually and to interpret | Quick and easy to assess, analyze, and interpret |
| Price | Might need professional administration support or licensing to use | Becomes a cost-effective option after proper development and validation |
| Customization | Not customizable because of its static structure | Can be updated regularly and tailored to the study population |
| Transparency of analysis | Fully transparent in scoring and interpretation | Lacks transparency in analysis, since such tools are regarded as “black box” models |
| Role of data quality | Independent of training data because it has a validated scoring framework | Analysis and performance are highly dependent on model robustness and training data |

Limitation and future directions
The present study found that an AI-generated sleep quality assessment tool powered by ChatGPT tends to underestimate the severity of poor sleep when compared with the conventional, validated PSQI.
To enhance the specificity and sensitivity of AI-driven sleep assessment tools, several suggestions should be considered for future research.
The questions formulated by AI tools need to be refined rigorously to align with validated tools like the PSQI, so that they capture both objective and subjective aspects of sleep quality and improve accuracy as part of a personalized assessment. Integrating the questionnaire data with data from wearable devices, such as actigraphy, might improve categorization accuracy.[16,17] It is also essential to fine-tune the AI model with sleep datasets that cover different demographic groups, to increase its capability to identify even minor variations in sleep quality.
Frequent calibration and validation of AI-generated sleep assessment scores against gold-standard instruments like polysomnography are required to obtain a consistent and accurate categorization of sleep quality.[18]
Further, participants’ preference between the PSQI and the AI-generated sleep questionnaire was not assessed in the current study, which is a limitation and an area for further exploration.
Combined with ongoing advances in AI, these measures could make such tools more reliable, accurate, and clinically relevant, thereby facilitating their broader use in research and clinical settings of sleep medicine. Future studies should explore such refined AI tools to individualize and simplify AI-driven sleep evaluation at low cost and in less time.
CONCLUSION
Although an AI-generated questionnaire for sleep quality assessment has potential benefits in research owing to its scalability and automation, the present study underscores the need for careful training, refinement, and validation of AI tools.
With such refinement, these AI tools may come to provide sensitive, reliable, and accurate sleep quality assessments. Further research should focus on improving AI algorithms by using diverse data inputs so that they align more closely with existing, validated instruments such as the PSQI.
With such advancements, the AI-generated sleep questionnaire may complement, or even replace, conventional sleep tools such as the PSQI in research and clinical practice, paving the way for personalized sleep medicine.