Multi-branch Collaborative Learning Network for 3D Visual Grounding

Qian, Zhipeng; Ma, Yiwei; Lin, Zhekai; Ji, Jiayi; Zheng, Xiawu; Sun, Xiaoshuai; Ji, Rongrong

arXiv:2407.05363 (cs)

[Submitted on 7 Jul 2024 (v1), last revised 10 Jul 2024 (this version, v2)]

Abstract: 3D referring expression comprehension (3DREC) and segmentation (3DRES) have overlapping objectives, indicating their potential for collaboration. However, existing collaborative approaches predominantly depend on the results of one task to make predictions for the other, limiting effective collaboration. We argue that employing separate branches for 3DREC and 3DRES tasks enhances the model's capacity to learn specific information for each task, enabling them to acquire complementary knowledge. Thus, we propose the MCLN framework, which includes independent branches for 3DREC and 3DRES tasks. This enables dedicated exploration of each task and effective coordination between the branches. Furthermore, to facilitate mutual reinforcement between these branches, we introduce a Relative Superpoint Aggregation (RSA) module and an Adaptive Soft Alignment (ASA) module. These modules significantly contribute to the precise alignment of prediction results from the two branches, directing the module to allocate increased attention to key positions. Comprehensive experimental evaluation demonstrates that our proposed method achieves state-of-the-art performance on both the 3DREC and 3DRES tasks, with an increase of 2.05% in Acc@0.5 for 3DREC and 3.96% in mIoU for 3DRES.
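
For readers unfamiliar with the two metrics named in the abstract, the following is a minimal sketch of how Acc@0.5 and mIoU are commonly computed. It is not the authors' evaluation code. It assumes axis-aligned boxes given as (xmin, ymin, zmin, xmax, ymax, zmax), per-point boolean masks, one prediction per referring expression, and a `>=` comparison at the threshold. All function names are hypothetical.

```python
from typing import Sequence

# Assumed box layout: (xmin, ymin, zmin, xmax, ymax, zmax), axis-aligned.
Box = Sequence[float]


def box_iou_3d(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned 3D boxes."""
    inter = 1.0
    for i in range(3):
        lo = max(a[i], b[i])
        hi = min(a[i + 3], b[i + 3])
        if hi <= lo:  # no overlap along this axis
            return 0.0
        inter *= hi - lo
    vol_a = (a[3] - a[0]) * (a[4] - a[1]) * (a[5] - a[2])
    vol_b = (b[3] - b[0]) * (b[4] - b[1]) * (b[5] - b[2])
    union = vol_a + vol_b - inter
    return inter / union if union > 0 else 0.0


def acc_at_iou(preds: Sequence[Box], gts: Sequence[Box],
               threshold: float = 0.5) -> float:
    """Fraction of predictions whose box IoU with the ground truth
    reaches the threshold (the >= convention is an assumption)."""
    hits = sum(box_iou_3d(p, g) >= threshold for p, g in zip(preds, gts))
    return hits / len(gts)


def mask_iou(pred: Sequence[bool], gt: Sequence[bool]) -> float:
    """IoU of two per-point boolean masks over the same point cloud."""
    inter = sum(1 for p, g in zip(pred, gt) if p and g)
    union = sum(1 for p, g in zip(pred, gt) if p or g)
    return inter / union if union else 0.0


def mean_iou(pred_masks: Sequence[Sequence[bool]],
             gt_masks: Sequence[Sequence[bool]]) -> float:
    """Mean of per-sample mask IoU (assumed per-sample averaging)."""
    ious = [mask_iou(p, g) for p, g in zip(pred_masks, gt_masks)]
    return sum(ious) / len(ious)
```

Under these assumptions, Acc@0.5 scores the box output of the 3DREC branch and mIoU scores the mask output of the 3DRES branch, so the two reported gains are measured on different kinds of prediction.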

Comments:	ECCV2024
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2407.05363 [cs.CV]
	(or arXiv:2407.05363v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2407.05363

Submission history

From: Zhipeng Qian
[v1] Sun, 7 Jul 2024 13:27:14 UTC (2,451 KB)
[v2] Wed, 10 Jul 2024 11:31:50 UTC (2,451 KB)
