Comparison of observed image quality and technical image quality parameters in 3D-FLAIR images

In this study, we have presented a methodology for comparing the observed image quality of MRI image volumes with technical image QC parameters derived from the same volumes. We applied the forced choice method to quantify observed image quality and to differentiate devices from each other. A limited correlation between observed image quality and technical parameters was shown, both by study and by device.

The complex relation between medical image quality and technical QC parameters is difficult to quantify; the key challenge lies in converting ambiguous differences in observed image quality into measurable parameters. With the presented methodology, based on blinded forced choice, we were able to show significant differences in the observed image quality between devices operating with similar MRI sequence acquisition parameters.

The grading method, based on a predetermined scale, requires experienced and calibrated observers and ultimately offers only limited specificity. These problems can be partly overcome by the forced choice method. By increasing the number of votes, the accuracy of the estimated P values can be increased and the uncertainty of the image quality estimate decreased. In the present study, we used a single experienced observer, but the method can accommodate multiple observers; bias between observers can be statistically controlled as part of the analysis [31]. Increasing the number of observers and votes can significantly scale up the survey, allowing it to reach study population sizes previously unfeasible with predetermined scale-based grading.
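As a rough illustration of how the uncertainty of a forced-choice preference estimate P shrinks with the number of votes, the following sketch computes a binomial (Wilson score) confidence interval; the vote counts and the 60% win rate are hypothetical, not data from this study:

```python
# Sketch: uncertainty of a forced-choice image quality estimate P as a
# function of the number of votes. All vote counts are illustrative.
from math import sqrt

def wilson_interval(wins: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score interval for the preference proportion P = wins / n."""
    p = wins / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

for n in (50, 500, 5000):
    wins = int(0.6 * n)  # assume one device wins 60% of the paired comparisons
    lo, hi = wilson_interval(wins, n)
    print(f"n={n:5d}  P~0.60  95% CI: ({lo:.3f}, {hi:.3f})")
```

The interval narrows roughly with the square root of the number of votes, which is why collecting thousands of fast votes meaningfully tightens the estimate.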

As the power of the method lies in the number of votes, a dedicated UI should be used to collect the maximum amount of data with a reasonable workload. With the presented UI, an experienced observer can cast a vote on a simplified question about an image volume within seconds, making a total yield of a few thousand votes reachable even with a single observer. The voting platform could be further improved to increase usability and engagement, for example by including gamification elements.

The applied methods for measuring the technical image quality of the image volumes have previously been shown to respond to changes in clinical images [26]. A limited R2 between the observed image quality and the technical image quality parameters was found in this study. While devices with better technical image quality also tended to receive a better estimated observed image quality, this correlation was not guaranteed. The study-specific R2 was weaker than that computed from device-specific median values. Study-specific R2 values for contrast-based QC metrics were generally higher than for MTF-based metrics, indicating the greater role of contrast in the observed image quality of image volumes.
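A minimal sketch of the kind of comparison described above, assuming per-device values of a contrast-based QC metric and the observed quality estimate P are available; all numbers below are invented for illustration:

```python
# Sketch: coefficient of determination R2 between a technical QC metric
# and the observed image quality estimate P. Values are hypothetical.
def r_squared(x: list[float], y: list[float]) -> float:
    """Square of the Pearson correlation between x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy**2 / (sxx * syy)

# hypothetical device-specific medians: contrast metric vs. observed quality P
contrast = [0.42, 0.47, 0.51, 0.55, 0.60]
observed_p = [0.35, 0.44, 0.52, 0.58, 0.61]
print(f"R2 = {r_squared(contrast, observed_p):.2f}")
```

The same computation can be repeated study by study, where the paper reports weaker R2 values than for the device-specific medians.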

The QC parameter related only to noise lacked correlation with the observed image quality, implying that the noise level was within acceptable limits for image interpretation. The relationship between image quality and noise may not be linear but instead appears to have a threshold effect.

The weaker study-specific correlation compared with the device-specific correlation is likely due to substantial study-by-study variation. In particular, nuisance features contributing to MTF-specific technical QC parameters may originate either from the device's technical performance or from patient motion artefacts. However, the same feature may have a fundamentally different impact on the observed image quality: a motion artefact can be perceived as a natural feature of an image, while reduced resolution due to technical performance may appear unnatural.

A better ranking in observed image quality was generally associated with more recently installed devices, based on the blinded quantification. This is an interesting finding supporting the effect of technological advancement. More specifically, scanners ID1–ID8 had an improved RF system with a higher number of coil channels and signal digitization closer to the patient, whereas scanners ID9–ID15, except ID14, did not. Scanner ID14 had an advanced RF system, but its acquisition voxel size deviated from the dominant setting. Digitization of the MRI scanner's RF system has previously been shown to improve the CNR of brain images [32].

The results show that the observed image quality of the devices varied significantly, even with almost identical scan protocol parameters. A higher main magnetic field strength is normally seen as a way to increase signal and contrast, especially in brain imaging. However, it did not guarantee better observed image quality in this study. Theoretically, a higher field strength should allow an increase in spatial resolution while maintaining acceptable CNR and consequently increase the observed image quality. The ETL affects the high spatial frequency components, which contribute edges to the image, and also the image contrast, due to increased T2 weighting and the decreased signal strength of later echoes in the train. In general, devices with a lower ETL obtained a better ranking in observed image quality. In our study, three devices (ID11, ID14 and ID15) showed considerable deviation from the typical settings in effective echo time, repetition time (TR), or inversion time (TI). For devices ID11 and ID15, this deviation was likely due to vendor-specific sequence design and default settings. The reason behind the TR and TI deviation of device ID14 remains unknown. In all three cases, the deviating sequence parameters may have contributed to lower rankings in observed image quality.

The accuracy of the observed image quality analysis is limited by the number of votes per study and by patient-by-patient variation in the sample data. In theory, the required number of votes can be determined from the planned maximum allowed variation in the image quality estimate. On the other hand, the variation can be calculated directly from the final image quality estimates. However, the relation between statistical variation and patient-induced variation is difficult to determine, especially if both are estimated to be of similar magnitude. The effect of statistical variation can be decreased by increasing the number of votes, and patient-by-patient variation may be reduced through a more informed choice of patient population. In this study, the patients on each scanner were considered a random sample from the same population. No demographic parameters were controlled other than ensuring all patients were adults, thus portraying a relevant clinical setting. The impact of demographic factors on image quality offers an interesting topic for further research, requiring a significantly larger study population than that presented here.

The statistics of forced choice experiments follow a binomial distribution, which approaches a normal distribution with a sufficiently large sample size. With the Shapiro–Wilk test, the image quality estimate P was shown to be normally distributed for all but three devices (IDs 2, 4, and 8). While the reason for this deviation is unclear, it may be related to patient-by-patient variation; for example, a contrast-decreasing motion artefact could reduce image quality and deform the distribution.
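The binomial-to-normal convergence underlying this normality check can be illustrated numerically; the vote count n = 400 and p = 0.5 below are hypothetical values, not taken from the study:

```python
# Sketch: a binomial vote count approaches a normal distribution for a
# large number of votes. Here the mass within one standard deviation of
# the mean is compared with the normal limit of ~0.683. Hypothetical n, p.
from math import comb, sqrt

def binom_pmf(k: int, n: int, p: float) -> float:
    """Binomial probability mass function."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 400, 0.5                      # hypothetical: 400 votes, even odds
mean = n * p
sd = sqrt(n * p * (1 - p))
within_1sd = sum(binom_pmf(k, n, p) for k in range(int(mean - sd), int(mean + sd) + 1))
print(f"mean={mean:.0f}, sd={sd:.1f}, P(|X-mean|<=1sd)={within_1sd:.3f}")
```

For small vote counts or a distribution deformed by outlier studies (e.g. strong motion artefacts), this approximation degrades, which is consistent with the deviations seen in the Shapiro–Wilk test.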
