Extended Data Fig. 4: Types of errors found in the original reports and the AI-generated reports. | Nature Medicine


From: Collaboration between clinicians and vision–language models in radiology report generation


(a) During the error correction evaluation, we asked expert raters to explain the issues they identified in the reports using the following taxonomy: (i) incorrect finding, (ii) incorrect severity (for example, mild vs. severe pulmonary edema) and (iii) incorrect location of a finding (for example, left- vs. right-sided pleural effusion). The figure shows the distribution of these error types separately for normal and abnormal cases in the IND1 and MIMIC-CXR datasets. Data are presented as mean values with 95% confidence intervals across cases. In total, there are 34 normal and 272 abnormal cases from the MIMIC-CXR dataset and 100 normal and 200 abnormal cases from the IND1 dataset.

(b) Venn diagrams of error counts for reports that contain at least one error, for the MIMIC-CXR and IND1 datasets. The intersection of the blue and green segments indicates the number of cases in which both the AI-generated report and the ground-truth (original) report contained errors. The red segment indicates the cases in which at least one clinically significant error was detected.
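As a minimal sketch of how the two kinds of summary in this figure could be computed, the Python below assumes per-case error values (for example, 0/1 indicators or per-case counts of one error type) and uses a normal-approximation 95% confidence interval. The caption does not state how the intervals were derived, so that choice, and the helper names `mean_with_ci` and `venn_counts`, are hypothetical. The Venn helper covers only the two-set overlap between AI-generated and original reports; the red clinically significant segment is not modelled.

```python
import math
from statistics import mean, stdev


def mean_with_ci(values, z=1.96):
    """Mean of per-case values with a 95% CI.

    Assumption: normal approximation (mean +/- z * s / sqrt(n)); the
    figure's actual interval method is not stated in the caption.
    """
    m = mean(values)
    if len(values) < 2:
        # A single case gives no spread estimate; return a degenerate interval.
        return m, m, m
    half_width = z * stdev(values) / math.sqrt(len(values))
    return m, m - half_width, m + half_width


def venn_counts(ai_error_cases, original_error_cases):
    """Two-set Venn counts over case IDs that contain at least one error."""
    ai = set(ai_error_cases)
    original = set(original_error_cases)
    return {
        "ai_only": len(ai - original),
        "original_only": len(original - ai),
        "both": len(ai & original),  # overlap of the two segments
    }


# Hypothetical per-case counts of one error type, e.g. incorrect location.
print(mean_with_ci([0, 1, 0, 0, 2]))
# Hypothetical case IDs with at least one error in each report source.
print(venn_counts([1, 2, 3], [3, 4]))
```

Using case-ID sets keeps the Venn counts independent of how many errors each report has, matching the panel's restriction to reports with at least one error.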
