
Title: Comparing two artificial intelligence software packages for normative brain volumetry in memory clinic imaging
Authors: Zaki et al.
Publication: Neuroradiology, 2022

The authors of this study compared two of the available AI software packages for evaluation of brain volumetry in patients with dementia.

There are many things to consider before acquiring any AI software. Two important ones are how the available software packages differ and how well each performs in clinical practice. To increase the understanding of AI software packages, this study compares Quantib® ND with one of its competitors.

What did the researchers do?

Sixty patients diagnosed by multidisciplinary expert consensus with mild cognitive impairment (MCI), frontotemporal dementia (FTD), or Alzheimer’s dementia (AD) were included. Twenty healthy controls were added to the dataset as well.

The scans of all participants were processed with both software packages to obtain the structured reports. The report of the other software package was extended to make the comparison with Quantib's structured report easier. While neither software package is intended to be used to make a diagnosis from the reports alone (i.e., without looking at the MRI scans or any other information), for this study two neuroradiologists were asked to classify each case as “normal” or “abnormal” and, if the latter, to indicate a diagnosis.

What did they find?

Inter-observer agreement (i.e., how often the readers made the same decisions) based on the reports of each software package was good for the decision “normal” vs. “abnormal” (Cohen's kappa of 0.73) and moderate for specific diagnoses (Cohen's kappa of 0.54). For example, Quantib’s software scored a 90% simple agreement for FTD diagnosis. Furthermore, there were no significant differences in accuracy, sensitivity, specificity, or diagnostic confidence between the software packages, meaning that the two packages performed comparably on these measures.
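
To make the difference between simple agreement and Cohen's kappa concrete, here is a minimal Python sketch. The ratings are hypothetical and chosen only for illustration; they are not data from the paper, and the function names are our own.

```python
from collections import Counter


def simple_agreement(rater_a, rater_b):
    """Fraction of cases on which the two raters gave the same label."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    observed = simple_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    labels = set(counts_a) | set(counts_b)
    # Chance agreement: probability that both raters pick the same label
    # if each rated independently with their own label frequencies.
    expected = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)
    if expected == 1.0:
        return 1.0  # degenerate case: both raters used one identical label
    return (observed - expected) / (1 - expected)


# Hypothetical ratings for ten reports (NOT data from the reviewed paper).
reader_1 = ["abnormal"] * 6 + ["normal"] * 4
reader_2 = ["abnormal"] * 5 + ["normal"] * 5

print(f"simple agreement: {simple_agreement(reader_1, reader_2):.2f}")
print(f"Cohen's kappa:    {cohens_kappa(reader_1, reader_2):.2f}")
```

In this toy example the readers agree on 9 of 10 reports, but because half of that agreement would be expected by chance, kappa comes out lower than the simple agreement. This is why a kappa value and a simple agreement percentage should not be compared directly.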

Quantib® ND   Accuracy (%)   Sensitivity (%)   Specificity (%)
AD            76.7           50                90
FTD           82.5           95                76.25

Figure 1. Quantib® ND’s accuracy, sensitivity, and specificity, averaged over the two raters, for the detailed diagnoses evaluated in this study. Numbers were calculated from the individual results in Table 3 of the reviewed paper.
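
As an illustration of how such rater-averaged numbers can be derived, the sketch below computes accuracy, sensitivity, and specificity from confusion-matrix counts and averages them over two raters. The counts are hypothetical placeholders, not the values from Table 3 of the paper, and the helper names are our own.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, and specificity (in percent) for one rater."""
    total = tp + fp + tn + fn
    return {
        "accuracy": 100 * (tp + tn) / total,
        "sensitivity": 100 * tp / (tp + fn),  # true positive rate
        "specificity": 100 * tn / (tn + fp),  # true negative rate
    }


def average_over_raters(per_rater):
    """Unweighted mean of each metric across a list of per-rater dicts."""
    return {
        key: sum(metrics[key] for metrics in per_rater) / len(per_rater)
        for key in per_rater[0]
    }


# Hypothetical counts for one diagnosis (NOT taken from the reviewed paper).
rater_1 = diagnostic_metrics(tp=18, fp=10, tn=50, fn=2)
rater_2 = diagnostic_metrics(tp=16, fp=14, tn=46, fn=4)

averaged = average_over_raters([rater_1, rater_2])
for name, value in averaged.items():
    print(f"{name}: {value:.2f}%")
```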

Comparing the two software packages does show some differences. First, agreement between the two packages on the “normal” vs. “abnormal” decision was only moderate. This decision was based on the reports and their reference curves, which come from very different reference populations. Each software package also represents its healthy range differently: 2 standard deviations from the mean vs. the 5th to 95th percentiles, or even the 1st to 99th percentiles for software packages not evaluated in this study. Furthermore, the reported percentage of intracranial volume differs significantly, because the software packages evaluated here calculate this measure in inherently different ways. Moreover, while the two packages evaluated here both report some measure of relative volume, other software packages on the market report absolute volumes.
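
The effect of the different healthy-range definitions can be sketched in a few lines of Python. This is an illustration under a strong assumption, namely that volumes in the reference population are normally distributed; real reference curves need not be, and the example volume is hypothetical.

```python
from statistics import NormalDist

# ASSUMPTION: reference volumes follow a normal distribution, so a volume
# can be expressed as a z-score (number of SDs from the reference mean).
standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Lower cut-offs of the healthy-range definitions, expressed as z-scores.
lower_cutoffs = {
    "mean - 2 SD": -2.0,
    "5th percentile": standard_normal.inv_cdf(0.05),
    "1st percentile": standard_normal.inv_cdf(0.01),
}

for name, z in lower_cutoffs.items():
    share_below = 100 * standard_normal.cdf(z)
    print(f"{name}: z = {z:.2f}, {share_below:.1f}% of references fall below")

# Hypothetical patient whose volume lies 1.8 SD below the reference mean.
z_volume = -1.8
flagged = {name: z_volume < z for name, z in lower_cutoffs.items()}
print(flagged)
```

Under this assumption, the same hypothetical volume falls outside the 5th to 95th percentile range but inside both the 2 standard deviation range and the 1st to 99th percentile range. The same scan can therefore look "abnormal" in one report and "normal" in another, even before differences in reference populations are taken into account.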

Software package characteristics - Table 2, Zaki et al.

Figure 2. Table as presented in the article. Quantib® ND is Software 1.

Conclusion

Based solely on reports, both AI software solutions show good inter-rater agreement, as well as good accuracy, sensitivity, and specificity. However, results from different software packages are not interchangeable. That results are not interchangeable or directly comparable is easy to accept for the different behavioral tests for dementia, but it is just as true for the many different methods of MRI analysis. This means that, before acquiring a software package, a centre should evaluate which one is best suited to its setting and clinical population. It is furthermore important to reiterate that these AI software packages are not intended as diagnostic tools in themselves, but as an extra source of information in a multidisciplinary setting that includes visual assessment of the MRI and evaluation of clinical and laboratory results.

That said, as the authors state, it is essential to understand the differences between the many AI software packages available, including their characteristics, clinical performance, and use-case focus, before deciding which software is right for your institution. Studies such as the one reviewed here are a step towards a greater understanding of AI software within the community.
