Extended Data Fig. 4: Types of errors found in the original reports and the AI-generated reports.
From: Collaboration between clinicians and vision–language models in radiology report generation
(a) During the error-correction evaluation, we asked expert raters to explain each identified issue using the following taxonomy: (i) incorrect finding, (ii) incorrect severity of finding (for example, mild vs. severe pulmonary edema) and (iii) incorrect location of finding (for example, left- vs. right-sided pleural effusion). The figure shows the distributions of these error types separately for normal and abnormal cases in the IND1 and MIMIC-CXR datasets. Data are presented as mean values with 95% confidence intervals across cases. In total, there are 34 normal and 272 abnormal cases from the MIMIC-CXR dataset, and 100 normal and 200 abnormal cases from the IND1 dataset. (b) Venn diagrams of error counts for reports that contain at least one error, for the MIMIC-CXR and IND1 datasets. The intersection of the blue and green segments indicates the number of cases in which both the AI-generated report and the ground-truth report contained errors. The red segment indicates the cases in which at least one clinically significant error was detected.
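The per-type rates in panel (a) can be summarized with a short script. The sketch below is illustrative only, not the authors' code: it assumes a hypothetical per-case table (`errors.csv` with columns `dataset`, `is_normal` and one boolean column per error type) and uses a percentile bootstrap across cases for the 95% confidence intervals; the paper's exact interval procedure may differ.

```python
# Illustrative sketch (assumptions labeled above): per-error-type rates with
# bootstrap 95% CIs, computed separately per dataset and normal/abnormal split.
import numpy as np
import pandas as pd

ERROR_TYPES = ["incorrect_finding", "incorrect_severity", "incorrect_location"]

def rate_with_ci(values: np.ndarray, n_boot: int = 10_000, seed: int = 0):
    """Mean rate across cases plus a percentile bootstrap 95% CI."""
    rng = np.random.default_rng(seed)
    boot_means = rng.choice(values, size=(n_boot, len(values)), replace=True).mean(axis=1)
    return values.mean(), np.percentile(boot_means, [2.5, 97.5])

df = pd.read_csv("errors.csv")  # hypothetical file: one row per case
for (dataset, is_normal), group in df.groupby(["dataset", "is_normal"]):
    label = "normal" if is_normal else "abnormal"
    for err in ERROR_TYPES:
        mean, (lo, hi) = rate_with_ci(group[err].to_numpy(dtype=float))
        print(f"{dataset} {label:8s} {err:20s} {mean:.2f} [{lo:.2f}, {hi:.2f}]")
```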