Learning A Low-Level Vision Generalist via Visual Task Prompt

Chen, Xiangyu; Liu, Yihao; Pu, Yuandong; Zhang, Wenlong; Zhou, Jiantao; Qiao, Yu; Dong, Chao

Abstract: Building a unified model for general low-level vision tasks holds significant research and practical value, but current methods encounter several critical issues. Multi-task restoration approaches can address multiple degradation-to-clean restoration tasks, but their applicability to tasks with different target domains (e.g., image stylization) is limited. Methods like PromptGIP can handle multiple input-target domains but rely on the Masked Autoencoder (MAE) paradigm. Consequently, they are tied to the ViT architecture, resulting in suboptimal image reconstruction quality. In addition, these methods are sensitive to prompt image content and often struggle with low-frequency information processing. In this paper, we propose a Visual task Prompt-based Image Processing (VPIP) framework to overcome these challenges. VPIP employs visual task prompts to manage tasks with different input-target domains and allows flexible selection of a backbone network suitable for general tasks. In addition, a new prompt cross-attention is introduced to facilitate interaction between the input and prompt information. Based on the VPIP framework, we train a low-level vision generalist model, namely GenLV, on 30 diverse tasks. Experimental results show that GenLV can successfully address a variety of low-level tasks, significantly outperforming existing methods both quantitatively and qualitatively. Code is available at this https URL.
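
The abstract does not specify the form of the prompt cross-attention. As a rough illustration only, the sketch below shows generic single-head scaled dot-product cross-attention, in which input features act as queries and prompt features supply the keys and values. The function names, the omission of learned projections, and the assignment of query and key/value roles are all assumptions for illustration and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(input_tokens, prompt_tokens):
    """Generic single-head cross-attention without learned projections.

    input_tokens:  list of d-dimensional vectors, used as queries (assumption)
    prompt_tokens: list of d-dimensional vectors, used as keys and values (assumption)
    Returns one d-dimensional vector per input token.
    """
    d = len(input_tokens[0])
    scale = 1.0 / math.sqrt(d)
    outputs = []
    for q in input_tokens:
        # Scaled dot-product similarity between this query and every prompt token.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in prompt_tokens]
        weights = softmax(scores)
        # Attention-weighted sum of the prompt tokens for this input position.
        out = [sum(w * v[j] for w, v in zip(weights, prompt_tokens)) for j in range(d)]
        outputs.append(out)
    return outputs


# Hypothetical toy features: two input tokens, three prompt tokens, d = 2.
x = [[1.0, 0.0], [0.0, 1.0]]
p = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = cross_attention(x, p)
```

Each output row is a convex combination of the prompt tokens, so every input position is re-expressed in terms of the prompt content it most resembles. A practical model would add learned query, key, and value projections and multiple heads.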

Comments:	Accepted to ACMMM24
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2408.08601 [cs.CV]
	(or arXiv:2408.08601v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2408.08601
