AURA: A Multi-Modal Medical Agent for Understanding, Reasoning & Annotation

Fathi, Nima; Kumar, Amar; Arbel, Tal

Computer Science > Computer Vision and Pattern Recognition

arXiv:2507.16940 (cs)

[Submitted on 22 Jul 2025]

Abstract: Recent advancements in Large Language Models (LLMs) have catalyzed a paradigm shift from static prediction systems to AI agents capable of reasoning, interacting with tools, and adapting to complex tasks. While LLM-based agentic systems have shown promise across many domains, their application to medical imaging remains in its infancy. In this work, we introduce AURA, the first visual-linguistic explainability agent designed specifically for comprehensive analysis, explanation, and evaluation of medical images. By enabling dynamic interactions, contextual explanations, and hypothesis testing, AURA represents a significant advancement toward more transparent, adaptable, and clinically aligned AI systems. We highlight the promise of agentic AI in transforming medical image analysis from static predictions to interactive decision support. Built on Qwen-32B, an LLM-based architecture, AURA integrates a modular toolbox comprising: (i) a segmentation suite with phrase grounding, pathology segmentation, and anatomy segmentation to localize clinically meaningful regions; (ii) a counterfactual image-generation module that supports reasoning through image-level explanations; and (iii) a set of evaluation tools, including pixel-wise difference-map analysis, classification, and other state-of-the-art components, to assess diagnostic relevance and visual interpretability.
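The abstract does not say how the pixel-wise difference-map analysis is computed. As a rough illustration only, the sketch below assumes the simplest reading: take the absolute per-pixel difference between an original image and its counterfactual, then measure how much of that change falls inside a segmentation mask. The function names, the nested-list image representation, and the "fraction inside mask" score are all hypothetical and are not taken from AURA.

```python
from typing import List

# Hypothetical representations, chosen only to keep the sketch dependency-free.
Image = List[List[float]]  # grayscale image, row-major
Mask = List[List[int]]     # 1 inside the region of interest, 0 outside


def difference_map(original: Image, counterfactual: Image) -> Image:
    """Absolute pixel-wise difference between two images of equal shape."""
    if len(original) != len(counterfactual) or any(
        len(a) != len(b) for a, b in zip(original, counterfactual)
    ):
        raise ValueError("images must have the same shape")
    return [
        [abs(a - b) for a, b in zip(row_o, row_c)]
        for row_o, row_c in zip(original, counterfactual)
    ]


def fraction_inside_mask(diff: Image, mask: Mask) -> float:
    """Share of the total change that lies inside the mask (0.0 if nothing changed)."""
    total = sum(sum(row) for row in diff)
    if total == 0:
        return 0.0
    inside = sum(
        d * m
        for row_d, row_m in zip(diff, mask)
        for d, m in zip(row_d, row_m)
    )
    return inside / total


# Toy 3x3 example: the counterfactual mostly alters the masked centre pixel.
original = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
counterfactual = [[0.0, 0.0, 0.0], [0.0, 0.25, 0.0], [0.0, 0.0, 0.25]]
mask = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]

diff = difference_map(original, counterfactual)
score = fraction_inside_mask(diff, mask)
print(diff[1][1], score)  # prints "0.75 0.75"
```

Under these assumptions, a score near 1 would suggest that the counterfactual edit is concentrated in the segmented region, while a low score would suggest changes elsewhere in the image. Whether AURA scores its counterfactuals this way is not stated in the abstract.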

Comments:	9 pages, 3 figures, International Conference on Medical Image Computing and Computer-Assisted Intervention
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multiagent Systems (cs.MA)
Cite as:	arXiv:2507.16940 [cs.CV]
	(or arXiv:2507.16940v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2507.16940

Submission history

From: Nima Fathi
[v1] Tue, 22 Jul 2025 18:24:18 UTC (17,263 KB)
