
PointSea: Point Cloud Completion via Self-structure Augmentation

International Journal of Computer Vision

Abstract

Point cloud completion is a fundamental yet not well-solved problem in 3D vision. Current approaches often rely on 3D coordinate information and/or additional data (e.g., images and scanning viewpoints) to fill in missing parts. Unlike these methods, we explore self-structure augmentation and propose PointSea for global-to-local point cloud completion. In the global stage, consider how we inspect a defective region of a physical object: we observe it from various perspectives to understand it better. Inspired by this, PointSea augments the data representation with self-projected depth images rendered from multiple views. To reconstruct a compact global shape from this cross-modal input, we incorporate a feature fusion module that fuses features at both the intra-view and inter-view levels. In the local stage, to recover highly detailed structures, we introduce a point generator called the self-structure dual-generator, which integrates learned shape priors with geometric self-similarities for shape refinement. Unlike existing efforts that apply a unified strategy to all points, our dual-path design adapts the refinement strategy to the structural type of each point, addressing its specific incompleteness. Comprehensive experiments on widely used benchmarks demonstrate that PointSea effectively understands global shapes and generates local details from incomplete input, showing clear improvements over existing methods. Our code is available at https://github.com/czvvd/SVDFormer_PointSea.
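The self-projection idea in the abstract can be illustrated with a toy rendering of a point cloud into depth maps from several viewpoints. The sketch below uses simple axis-aligned orthographic projections with a z-buffer; the function name, view choice, and resolution are illustrative assumptions, not the paper's actual rendering pipeline.

```python
import numpy as np

def depth_images(points, res=64):
    """Render toy orthographic depth maps of a point cloud from three
    axis-aligned viewpoints (a minimal stand-in for self-projected
    multi-view depth input; not the paper's renderer)."""
    # Normalize the cloud into [-1, 1]^3 centered at the origin.
    pts = points - points.mean(axis=0)
    pts = pts / (np.abs(pts).max() + 1e-8)

    views = []
    for axis in range(3):
        # Looking along `axis`: the other two coordinates index pixels,
        # the coordinate along `axis` becomes depth in [0, 1].
        uv_axes = [a for a in range(3) if a != axis]
        uv = ((pts[:, uv_axes] * 0.5 + 0.5) * (res - 1)).astype(int)
        depth = pts[:, axis] * 0.5 + 0.5

        img = np.ones((res, res))  # background = far plane (depth 1)
        for (u, v), d in zip(uv, depth):
            if d < img[u, v]:       # z-buffer: keep the nearest point
                img[u, v] = d
        views.append(img)
    return np.stack(views)

# Example: three depth maps of a random blob of 2048 points.
cloud = np.random.randn(2048, 3)
maps = depth_images(cloud)
print(maps.shape)  # (3, 64, 64)
```

A learned pipeline would instead feed such per-view depth maps to a 2D backbone and fuse the resulting intra-view and inter-view features, as the abstract describes.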



Data Availability

All synthetic datasets can be accessed at https://github.com/yuxumin/PoinTr/blob/master/DATASET.md. All real-world data can be accessed at https://github.com/xuelin-chen/pcl2pcl-gan-pub. The scene datasets are publicly available at https://github.com/JinfengX/CasFusionNet.


Acknowledgements

This work was supported by the National Natural Science Foundation of China (Nos. T2322012, 62172218, 62032011), the Shenzhen Science and Technology Program (Nos. JCYJ20220818103401003, JCYJ20220530172403007), and the Guangdong Basic and Applied Basic Research Foundation (No. 2022A1515010170).

Author information

Corresponding author

Correspondence to Zhe Zhu.

Additional information

Communicated by Wanli Ouyang.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zhu, Z., Chen, H., He, X. et al. PointSea: Point Cloud Completion via Self-structure Augmentation. Int J Comput Vis 133, 4770–4794 (2025). https://doi.org/10.1007/s11263-025-02400-y

