Table 1 Comparison of automatic report generation metrics on the MIMIC-CXR dataset
From: Collaboration between clinicians and vision–language models in radiology report generation
Model | Sections | CheXpert F1 (all) | CheXpert F1 (top 5) | RadGraph F1
---|---|---|---|---
CXR-RePaiR (ref. 11) | Findings only | 0.281 | – | 0.091
M2 Transformer (ref. 12) | Findings only | – | 0.567 | 0.220
RGRG (ref. 39) | Findings only | 0.447 | 0.547 | –
Med-PaLM M (ref. 22), 12B | Findings only | 0.514 | 0.565 | 0.252
R2Gen (ref. 10) | Findings + Impressions | 0.228 | 0.346 | 0.134
WCT (ref. 14) | Findings + Impressions | 0.294 | – | 0.143
CvT-21DistillGPT2 (ref. 13) | Findings + Impressions | 0.384 | – | 0.154
BioViL-T (ref. 15) | Findings + Impressions | 0.317 | – | –
R2GenGPT (ref. 29) | Findings + Impressions | 0.389 | – | –
Flamingo-CXR (ours) | Findings + Impressions | 0.519 | 0.580 | 0.205

CheXpert F1 (all), CheXpert F1 (top 5) and RadGraph F1 are the clinical metrics; dashes indicate values not reported for that model.
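For readers unfamiliar with the CheXpert F1 columns, the sketch below shows how such a score is typically computed: generated and reference reports are first converted to binary finding labels (for example, with the CheXpert labeler), and the per-class F1 scores are then averaged over either all 14 observation classes ("all") or a 5-class subset ("top 5"). The label lists, macro averaging and function names here are illustrative assumptions, not the exact evaluation code used in the paper.

```python
# Minimal sketch of a CheXpert-style F1 metric, assuming reports have already
# been converted to binary labels per finding (e.g. via the CheXpert labeler).
import numpy as np
from sklearn.metrics import f1_score

# The 14 CheXpert observation classes ("all"); the "top 5" subset is commonly
# taken to be the five CheXpert competition categories (assumption here).
ALL_LABELS = [
    "No Finding", "Enlarged Cardiomediastinum", "Cardiomegaly", "Lung Opacity",
    "Lung Lesion", "Edema", "Consolidation", "Pneumonia", "Atelectasis",
    "Pneumothorax", "Pleural Effusion", "Pleural Other", "Fracture",
    "Support Devices",
]
TOP5_LABELS = ["Atelectasis", "Cardiomegaly", "Consolidation", "Edema", "Pleural Effusion"]


def chexpert_f1(y_true: np.ndarray, y_pred: np.ndarray, labels: list[str]) -> float:
    """Average the per-class F1 over the chosen label subset.

    y_true, y_pred: binary arrays of shape (num_reports, len(ALL_LABELS)),
    holding the labeler output for reference and generated reports.
    """
    idx = [ALL_LABELS.index(label) for label in labels]
    per_class = [f1_score(y_true[:, i], y_pred[:, i], zero_division=0) for i in idx]
    return float(np.mean(per_class))


# Usage with random placeholder labels for 100 report pairs.
rng = np.random.default_rng(0)
y_ref = rng.integers(0, 2, size=(100, len(ALL_LABELS)))
y_gen = rng.integers(0, 2, size=(100, len(ALL_LABELS)))
print("CheXpert F1 (all):  ", chexpert_f1(y_ref, y_gen, ALL_LABELS))
print("CheXpert F1 (top 5):", chexpert_f1(y_ref, y_gen, TOP5_LABELS))
```

The macro average over classes is one common convention; the exact averaging and label handling follow the respective papers being compared.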