Abstract
DEtection TRansformer (DETR) started a trend of using a group of learnable queries for unified visual perception. This work begins by applying this appealing paradigm to LiDAR-based point cloud segmentation and obtains a simple yet effective baseline. Although the naive adaptation obtains fair results, its instance segmentation performance is noticeably inferior to that of previous works. By diving into the details, we observe that instances in sparse point clouds are relatively small compared to the whole scene and often have similar geometry while lacking distinctive appearance for segmentation, phenomena that are rare in the image domain. Considering that instances in 3D are characterized more by their positional information, we emphasize this role during modeling and design a robust Mixed-parameterized Positional Embedding (MPE) to guide the segmentation process. It is embedded into backbone features and later guides the mask prediction and query update processes iteratively, leading to Position-Aware Segmentation (PA-Seg) and Masked Focal Attention (MFA). All these designs impel the queries to attend to specific regions and identify various instances. The method, named Position-guided Point cloud Panoptic segmentation transFormer (P3Former), outperforms previous state-of-the-art methods by 2.7% and 1.2% PQ on the SemanticKITTI and nuScenes datasets, respectively. The source code and models are available at https://github.com/OpenRobotLab/P3Former.
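The abstract does not specify how the Mixed-parameterized Positional Embedding combines coordinate parameterizations; the following is a toy sketch under the assumption that each point's Cartesian (x, y, z) and polar (r, θ, z) descriptions are linearly projected and summed into a single per-point embedding. The function name and the random stand-ins for learned projections are illustrative, not the paper's implementation.

```python
import numpy as np

def mixed_parameterized_pe(points, dim=32, rng=None):
    """Toy sketch of a mixed-parameterized positional embedding:
    describe each point in both Cartesian and polar coordinates,
    project each parameterization, and fuse them by summation."""
    rng = np.random.default_rng(0) if rng is None else rng
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    cart = points                                  # (N, 3): x, y, z
    polar = np.stack([np.hypot(x, y),              # radial distance
                      np.arctan2(y, x),            # azimuth angle
                      z], axis=1)                  # (N, 3): r, theta, z
    # Random stand-ins for learned linear projections.
    W_cart = rng.standard_normal((3, dim))
    W_polar = rng.standard_normal((3, dim))
    return cart @ W_cart + polar @ W_polar         # (N, dim)

pts = np.array([[1.0, 2.0, 0.5], [3.0, -1.0, 0.2]])
emb = mixed_parameterized_pe(pts)
print(emb.shape)  # (2, 32)
```

In the paper's pipeline such an embedding would be added to backbone features and reused when predicting masks and updating queries; here it only illustrates the idea that two coordinate parameterizations of the same point can be fused into one positional code.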
Data Availability
The datasets that support the findings of this study are all publicly available for research purposes.
Additional information
Communicated by Takayuki Okatani.
About this article
Cite this article
Xiao, Z., Zhang, W., Wang, T. et al. Position-Guided Point Cloud Panoptic Segmentation Transformer. Int J Comput Vis 133, 275–290 (2025). https://doi.org/10.1007/s11263-024-02162-z