Abstract
Point cloud completion is a fundamental yet unsolved problem in 3D vision. Current approaches often rely on 3D coordinate information and/or additional data (e.g., images and scanning viewpoints) to fill in missing parts. Unlike these methods, we explore self-structure augmentation and propose PointSea for global-to-local point cloud completion. In the global stage, consider how we inspect a defective region of a physical object: we observe it from several perspectives to better understand it. Inspired by this, PointSea augments the data representation with self-projected depth images from multiple views. To reconstruct a compact global shape from this cross-modal input, we incorporate a feature fusion module that fuses features at both the intra-view and inter-view levels. In the local stage, to recover highly detailed structures, we introduce a point generator called the self-structure dual-generator, which integrates learned shape priors and geometric self-similarities for shape refinement. Unlike existing methods that apply a single strategy to all points, our dual-path design conditions the refinement strategy on the structural type of each point, addressing its specific incompleteness. Comprehensive experiments on widely used benchmarks demonstrate that PointSea effectively understands global shapes and generates local details from incomplete input, showing clear improvements over existing methods. Our code is available at https://github.com/czvvd/SVDFormer_PointSea.
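To make the self-view augmentation idea concrete, below is a minimal sketch (not the authors' implementation) of how an incomplete point cloud can be rendered into depth images from multiple self-selected viewpoints using a simple orthographic z-buffer. The viewpoints, image resolution, and unit-sphere normalization here are illustrative assumptions; the actual PointSea pipeline feeds such depth images into its cross-modal fusion network.

```python
# A minimal sketch, assuming an orthographic camera and a point cloud
# normalized to the unit sphere; this is NOT the authors' renderer.
import numpy as np

def rotation_from_view(azimuth, elevation):
    """Rotation matrix bringing world coordinates into a camera frame
    that looks at the origin from the given azimuth/elevation (radians)."""
    ca, sa = np.cos(azimuth), np.sin(azimuth)
    ce, se = np.cos(elevation), np.sin(elevation)
    rot_az = np.array([[ca, 0.0, sa], [0.0, 1.0, 0.0], [-sa, 0.0, ca]])
    rot_el = np.array([[1.0, 0.0, 0.0], [0.0, ce, -se], [0.0, se, ce]])
    return rot_el @ rot_az

def render_depth(points, azimuth, elevation, res=224):
    """Orthographic z-buffer rendering of an (N, 3) point cloud into a
    (res, res) depth map; empty pixels are set to 0 (background)."""
    cam = points @ rotation_from_view(azimuth, elevation).T
    # Map x/y from [-1, 1] into pixel coordinates.
    px = ((cam[:, 0] + 1.0) * 0.5 * (res - 1)).astype(int)
    py = ((cam[:, 1] + 1.0) * 0.5 * (res - 1)).astype(int)
    depth = np.full((res, res), np.inf)
    # Keep the nearest point per pixel (z-buffer test).
    for x, y, z in zip(px, py, cam[:, 2]):
        if 0 <= x < res and 0 <= y < res and z < depth[y, x]:
            depth[y, x] = z
    depth[np.isinf(depth)] = 0.0
    return depth

# Example: three self-projected views of a random partial cloud.
cloud = np.random.randn(2048, 3)
cloud /= np.linalg.norm(cloud, axis=1).max()  # normalize to unit sphere
views = [render_depth(cloud, az, 0.3) for az in (0.0, 2.1, 4.2)]
```

Because the depth images are rendered from the incomplete input itself, no extra sensor data or ground-truth viewpoints are required, which is what distinguishes self-structure augmentation from image-assisted completion methods.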
Data Availability
All synthetic datasets can be accessed at https://github.com/yuxumin/PoinTr/blob/master/DATASET.md. All real-world data can be accessed at https://github.com/xuelin-chen/pcl2pcl-gan-pub. The scene datasets are publicly available at https://github.com/JinfengX/CasFusionNet.
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Nos. T2322012, 62172218, 62032011), the Shenzhen Science and Technology Program (Nos. JCYJ20220818103401003, JCYJ20220530172403007), and the Guangdong Basic and Applied Basic Research Foundation (No. 2022A1515010170).
Additional information
Communicated by Wanli Ouyang.
About this article
Cite this article
Zhu, Z., Chen, H., He, X. et al. PointSea: Point Cloud Completion via Self-structure Augmentation. Int J Comput Vis 133, 4770–4794 (2025). https://doi.org/10.1007/s11263-025-02400-y