Fig. 3: Results of pairwise preference test for MIMIC-CXR and IND1. | Nature Medicine

From: Collaboration between clinicians and vision–language models in radiology report generation

a, Preferences for Flamingo-CXR reports relative to original clinician reports. Reports are grouped according to the level of agreement between reviewers. b, Clinician preferences for Flamingo-CXR reports depending on the location of the clinician, from either the US-based cohort or the India-based cohort. Note that there are two reviews from each location cohort, so in this case, unanimity corresponds to agreement between two clinicians rather than four in the full panel. c, Preferences for normal reports and, separately, for abnormal reports. In all panels, data are presented as mean values and error bars show 95% confidence intervals for the cumulative preference scores. d, Examples from MIMIC-CXR with varying degrees of inter-rater preference agreement; for two examples, all four radiologists unanimously preferred the AI report or the clinician’s report, whereas for the remaining one, the preferences were divided equally. AP, anterior–posterior; CABG, coronary artery bypass graft; IJ, internal jugular; PA-C, physician assistant - certified; SVC, superior vena cava.
