
Position-Guided Point Cloud Panoptic Segmentation Transformer

Published in: International Journal of Computer Vision

Abstract

DEtection TRansformer (DETR) started a trend of using a group of learnable queries for unified visual perception. This work begins by applying this appealing paradigm to LiDAR-based point cloud segmentation and obtains a simple yet effective baseline. Although the naive adaptation achieves fair results, its instance segmentation performance is noticeably inferior to that of previous works. Delving into the details, we observe that instances in sparse point clouds are relatively small compared to the whole scene and often have similar geometry while lacking the distinctive appearance cues needed for segmentation, conditions that are rare in the image domain. Considering that instances in 3D are characterized more by their positional information, we emphasize its role during modeling and design a robust Mixed-parameterized Positional Embedding (MPE) to guide the segmentation process. The MPE is embedded into backbone features and subsequently guides the mask prediction and query update processes iteratively, leading to Position-Aware Segmentation (PA-Seg) and Masked Focal Attention (MFA). These designs impel the queries to attend to specific regions and identify various instances. The resulting method, named Position-guided Point cloud Panoptic segmentation transFormer (P3Former), outperforms previous state-of-the-art methods by 2.7% and 1.2% PQ on the SemanticKITTI and nuScenes datasets, respectively. The source code and models are available at https://github.com/OpenRobotLab/P3Former.
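
To make the mechanisms named in the abstract concrete, the sketch below illustrates the two core ideas in PyTorch: a positional embedding that mixes Cartesian and polar parameterizations of the point coordinates, and a masked attention step in which each query attends only to the region covered by its previous mask prediction. This is a minimal illustration of the general techniques, not the authors' implementation (see the linked repository for that); the module and function names, the two-MLP fusion, and the 0.5 threshold are all assumptions made for this sketch, and the full MPE, PA-Seg, and MFA designs in the paper involve additional components.

```python
# Minimal PyTorch sketch of the ideas described in the abstract. NOT the authors'
# implementation (https://github.com/OpenRobotLab/P3Former); all names here are
# hypothetical illustrations.
import torch
import torch.nn as nn


class MixedPositionalEmbedding(nn.Module):
    """Hypothetical mixed-parameterized positional embedding: encode each point
    under two coordinate parameterizations (Cartesian and polar) and fuse them
    into one embedding that can be added to backbone features."""

    def __init__(self, dim: int = 128):
        super().__init__()
        # One small MLP per parameterization; their outputs are summed.
        self.cartesian_mlp = nn.Sequential(nn.Linear(3, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.polar_mlp = nn.Sequential(nn.Linear(3, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, xyz: torch.Tensor) -> torch.Tensor:
        # xyz: (N, 3) point coordinates.
        x, y, z = xyz.unbind(-1)
        rho = torch.sqrt(x ** 2 + y ** 2)          # radial distance in the BEV plane
        phi = torch.atan2(y, x)                     # azimuth angle
        polar = torch.stack([rho, phi, z], dim=-1)
        return self.cartesian_mlp(xyz) + self.polar_mlp(polar)


def masked_attention(queries, keys, values, mask_logits, threshold: float = 0.5):
    """Hypothetical masked cross-attention step: each query attends only to
    points whose previously predicted mask probability exceeds `threshold`,
    focusing the query update on its own instance region."""
    # queries: (Q, C), keys/values: (N, C), mask_logits: (Q, N)
    attn = queries @ keys.t() / keys.shape[-1] ** 0.5   # (Q, N) attention logits
    keep = mask_logits.sigmoid() > threshold             # per-query foreground region
    attn = attn.masked_fill(~keep, float("-inf"))
    # Queries whose mask is empty fall back to attending everywhere (uniformly).
    empty = ~keep.any(dim=-1, keepdim=True)
    attn = torch.where(empty, torch.zeros_like(attn), attn)
    return attn.softmax(dim=-1) @ values                 # (Q, C) updated queries


if __name__ == "__main__":
    pts = torch.randn(1024, 3)                           # toy LiDAR points
    feats = torch.randn(1024, 128)                       # toy backbone features
    feats = feats + MixedPositionalEmbedding(128)(pts)   # position-guided features
    q = torch.randn(16, 128)                             # 16 learnable queries
    m = torch.randn(16, 1024)                            # previous-layer mask logits
    print(masked_attention(q, feats, feats, m).shape)    # torch.Size([16, 128])
```

The fallback for queries with empty masks mirrors a common guard in mask-attention decoders: a fully masked-out query would otherwise produce NaNs under the softmax.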


Data Availability

The datasets that support the findings of this study are all publicly available for research purposes.

References

  • Alonso, I., et al. (2020). 3D-MiniNet: Learning a 2D representation from point clouds for fast and efficient 3D LiDAR semantic segmentation. IEEE Robotics and Automation Letters, 5(4), 5432–5439.

  • Behley, J., et al. (2019). SemanticKITTI: A dataset for semantic scene understanding of LiDAR sequences. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 9297–9307).

  • Behley, J., Milioto, A., & Stachniss, C. (2021). A benchmark for LiDAR-based panoptic segmentation based on KITTI. In 2021 IEEE international conference on robotics and automation (ICRA) (pp. 13596–13603). IEEE.

  • Wu, B., et al. (2018). SqueezeSeg: Convolutional neural nets with recurrent CRF for real-time road-object segmentation from 3D LiDAR point cloud. In IEEE international conference on robotics and automation (ICRA) (pp. 1887–1893). IEEE.

  • Caesar, H., et al. (2020). nuScenes: A multimodal dataset for autonomous driving. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 11621–11631).

  • Carion, N., et al. (2020). End-to-end object detection with transformers. In Computer vision – ECCV 2020: 16th European conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part I (pp. 213–229). Springer.

  • Chen, L.-C., et al. (2017). DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(4), 834–848.

  • Cheng, B., et al. (2020). Panoptic-deeplab: A simple, strong, and fast baseline for bottom-up panoptic segmentation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 12475–12485).

  • Cheng, R., et al. (2021). (AF)2-S3Net: Attentive feature fusion with adaptive feature selection for sparse semantic segmentation network. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 12547–12556).

  • Cheng, B., et al. (2022). Masked-attention mask transformer for universal image segmentation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 1290–1299).

  • Cheng, B., Schwing, A., & Kirillov, A. (2021). Per-pixel classification is not all you need for semantic segmentation. Advances in Neural Information Processing Systems, 34, 17864–17875.

  • Engelmann, F., et al. (2020). 3D-MPA: Multi-proposal aggregation for 3D semantic instance segmentation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 9031–9040).

  • Fan, L., et al. (2022). Embracing single stride 3D object detector with sparse transformer. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 8458–8468).

  • Fong, W. K., et al. (2022). Panoptic nuScenes: A large-scale benchmark for LiDAR panoptic segmentation and tracking. IEEE Robotics and Automation Letters, 7(2), 3795–3802.

  • Gasperini, S., et al. (2021). Panoster: End-to-end panoptic segmentation of LiDAR point clouds. IEEE Robotics and Automation Letters, 6(2), 3216–3223.

  • Graham, B., Engelcke, M., & van der Maaten, L. (2018). 3D semantic segmentation with submanifold sparse convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 9224–9232).

  • Han, L., et al. (2020). OccuSeg: Occupancy-aware 3D instance segmentation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 2940–2949).

  • Hong, F., et al. (2021). LiDAR-based panoptic segmentation via dynamic shifting network. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 13090–13099).

  • Hou, Y., et al. (2022). Point-to-voxel knowledge distillation for LiDAR semantic segmentation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 8479–8488).

  • Hu, Q., et al. (2020). RandLA-Net: Efficient semantic segmentation of large-scale point clouds. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 11108–11117).

  • Kirillov, A., et al. (2019). Panoptic feature pyramid networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 6399–6408).

  • Kirillov, A., et al. (2019). Panoptic segmentation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 9404–9413).

  • Lang, A. H., et al. (2019). PointPillars: Fast encoders for object detection from point clouds. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 12697–12705).

  • Li, J., et al. (2022). Panoptic-PHNet: Towards real-time and high-precision LiDAR panoptic segmentation via clustering pseudo heatmap. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 11809–11818).

  • Li, Q., Qi, X., & Torr, P. H. S. (2020). Unifying training and inference for panoptic segmentation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 13320–13328).

  • Lin, T.-Y., et al. (2014). Microsoft COCO: Common objects in context. In Computer vision – ECCV 2014: 13th European conference, Zurich, Switzerland, September 6–12, 2014, Proceedings, Part V (pp. 740–755). Springer.

  • Lin, T.-Y., et al. (2017). Focal loss for dense object detection. In Proceedings of the IEEE international conference on computer vision (pp. 2980–2988).

  • Loshchilov, I., & Hutter, F. (2017). Decoupled weight decay regularization. arXiv Preprint retrieved from arXiv:1711.05101

  • Lyu, Y., Huang, X., & Zhang, Z. (2020). Learning to segment 3D point clouds in 2D image space. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 12255–12264).

  • Mao, J., Wang, X., & Li, H. (2019). Interpolated convolutional networks for 3D point cloud understanding. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 1578–1587).

  • Marcuzzi, R., et al. (2023). Mask-based panoptic LiDAR segmentation for autonomous driving. IEEE Robotics and Automation Letters, 8(2), 1141–1148.

  • Meng, H.-Y., et al. (2019). VV-Net: Voxel VAE net with group convolutions for point cloud segmentation. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 8500–8508).

  • Milioto, A., et al. (2019). RangeNet++: Fast and accurate LiDAR semantic segmentation. In IEEE/RSJ international conference on intelligent robots and systems (IROS) (pp. 4213–4220). IEEE.

  • Milioto, A., et al. (2020). LiDAR panoptic segmentation for autonomous driving. In 2020 IEEE/RSJ international conference on intelligent robots and systems (IROS) (pp. 8505–8512). IEEE.

  • MMDetection3D Contributors. (2020). MMDetection3D: OpenMMLab next-generation platform for general 3D object detection. https://github.com/openmmlab/mmdetection3d

  • Porzi, L., et al. (2019). Seamless scene segmentation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 8277–8286).

  • Qi, C. R., et al. (2017). PointNet: Deep learning on point sets for 3D classification and segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 652–660).

  • Qi, C. R., et al. (2017). PointNet++: Deep hierarchical feature learning on point sets in a metric space. Advances in Neural Information Processing Systems, 30.

  • Razani, R., et al. (2021). GP-S3Net: Graph-based panoptic sparse semantic segmentation network. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 16076–16085).

  • Sirohi, K., et al. (2021). EfficientLPS: Efficient LiDAR panoptic segmentation. IEEE Transactions on Robotics, 38(3), 1894–1914.

  • Su, S., et al. (2023). PUPS: Point cloud unified panoptic segmentation. arXiv Preprint retrieved from arXiv:2302.06185

  • Sudre, C. H., et al. (2017). Generalised Dice overlap as a deep learning loss function for highly unbalanced segmentations. In Deep learning in medical image analysis and multimodal learning for clinical decision support: Third international workshop, DLMIA 2017, and 7th international workshop, ML-CDS 2017, held in conjunction with MICCAI 2017, Québec City, QC, Canada, September 14, 2017, Proceedings (pp. 240–248). Springer.

  • Tang, H., et al. (2020). Searching efficient 3D architectures with sparse point-voxel convolution. In Computer vision – ECCV 2020: 16th European conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXVIII (pp. 685–702). Springer.

  • Thomas, H., et al. (2019). KPConv: Flexible and deformable convolution for point clouds. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 6411–6420).

  • Wang, X., et al. (2020). SOLOv2: Dynamic and fast instance segmentation. Advances in Neural Information Processing Systems, 33, 17721–17732.

  • Wu, W., Qi, Z., & Fuxin, L. (2019). PointConv: Deep convolutional networks on 3D point clouds. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 9621–9630).

  • Xiong, Y., et al. (2019). UPSNet: A unified panoptic segmentation network. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 8818–8826).

  • Xu, J., et al. (2021). RPVNet: A deep and efficient range-point-voxel fusion network for LiDAR point cloud segmentation. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 16024–16033).

  • Xu, S., et al. (2022). Sparse cross-scale attention network for efficient LiDAR panoptic segmentation. Proceedings of the AAAI Conference on Artificial Intelligence, 36(3), 2920–2928.

  • Zhang, W., et al. (2021). K-net: Towards unified image segmentation. Advances in Neural Information Processing Systems, 34, 10326–10338.

  • Zhou, Z., Zhang, Y., & Foroosh, H. (2021). Panoptic-PolarNet: Proposal-free LiDAR point cloud panoptic segmentation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 13194–13203).

  • Zhu, X., et al. (2021). Cylindrical and asymmetrical 3D convolution networks for LiDAR segmentation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 9939–9948).

Author information

Corresponding author

Correspondence to Jiangmiao Pang.

Additional information

Communicated by Takayuki Okatani.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Xiao, Z., Zhang, W., Wang, T. et al. Position-Guided Point Cloud Panoptic Segmentation Transformer. Int J Comput Vis 133, 275–290 (2025). https://doi.org/10.1007/s11263-024-02162-z

