
Implementation and Validation of Distributed Bundle Adjustment for Super Large Scale Datasets

International Journal of Computer Vision

Abstract

We propose a distributed bundle adjustment (DBA) method based on the Levenberg-Marquardt (LM) algorithm for super large-scale datasets. Most existing methods partition the global map into small submaps and conduct bundle adjustment within each submap. To fit the parallel framework, they use approximate solutions instead of the LM algorithm, which often yields sub-optimal results. In contrast, we use the LM algorithm to conduct global bundle adjustment, where the formation of the reduced camera system (RCS) is parallelized and executed in a distributed way. To store the large RCS, we compress it with a block-based sparse matrix compression (BSMC) format, which fully exploits its block structure. The BSMC format also enables distributed storage and updating of the global RCS. The proposed method is extensively evaluated and compared with state-of-the-art pipelines on both synthetic and real datasets. Preliminary results demonstrate the efficient memory usage and vast scalability of the proposed method compared with the baselines. For the first time, parallel bundle adjustment with the LM algorithm was conducted on a real dataset with 1.18 million images and a synthetic dataset with 10 million images (about 500 times the size handled by state-of-the-art LM-based BA) on a distributed computing system.



Data Availability

The code for BSMC is available at https://github.com/MozartZheng/DistributedBA. A public version of the whole pipeline will be released when it is ready.


Acknowledgements

This work is supported by the Natural Science Foundation of Hubei (No. 2024AFB680) and the Open Fund of Hubei Luojia Laboratory (No. 220100034).

Author information

Corresponding author

Correspondence to Maoteng Zheng.

Additional information

Communicated by Adrien Bartoli.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A Accessing Efficiency of the BSMC and CSR Format

To evaluate the accessing efficiency of a compression format, we consider two operations: (1) given an element or a block in the compressed structure, find its row and column indices in the full matrix; (2) given the row and column indices of an element or a block in the full matrix, find the corresponding entry in the compressed structure.

A.1 The First Operation

Given the index of an element in the first array of the CSR structure, its column index in the full matrix is identified immediately from the second array, as shown in Figure 15 (left). If the elements are accessed continuously, the row index in the full matrix can be obtained by row counting. Otherwise, the row index must be located in the third array by binary search, which takes at most \(\log _2 nc\) steps, since the size of the third array is nc, where n is the block number and c is the block width.

Given the index of a block in the first array of the BSMC structure, its column index in the full matrix is first identified from the second array, as shown in Figure 15 (right). Similar to the CSR format, the row index can be obtained by row counting if the blocks are accessed continuously; otherwise, the row index must be searched in the third array, which takes at most \(\log _2 n\) steps, since the size of the third array equals the block number n. Note that one access in the CSR structure retrieves only one element, whereas one access in the BSMC structure retrieves a whole block of \(c\times c\) elements.
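As an illustration, the row lookup of the first operation can be sketched in Python with a binary search over the row-pointer array; the arrays below are hypothetical and use a block width of one for brevity:

```python
import bisect

# CSR arrays for a hypothetical 4x4 matrix (block width c = 1):
#   values - non-zero entries (first array)
#   cols   - column index of each entry (second array)
#   rowptr - index in `values` where each row starts (third array)
values = [10, 20, 30, 40, 50]
cols   = [0, 2, 1, 0, 3]
rowptr = [0, 2, 3, 5, 5]   # row 0 holds entries 0..1, row 1 entry 2, row 2 entries 3..4

def csr_row_of(k):
    """First operation: given entry index k, recover its row by binary search in rowptr."""
    # bisect_right finds the first row whose start index exceeds k
    return bisect.bisect_right(rowptr, k) - 1

# entry 3 (value 40) lives in row 2, column cols[3] = 0
assert csr_row_of(3) == 2
```

The BSMC lookup is identical in shape, except the row-pointer array indexes whole blocks rather than scalar elements, so it is shorter by a factor of c.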

A.2 The Second Operation

Given the row and column indices of an element in the full matrix, to find its position in the CSR structure, we first locate the start and end column indices in the second array via the third array, using the row index in the full matrix, as shown in Figure 15 (left); the column index of the element is then searched in the first array between the start and end positions. The maximum number of search steps per element is \(\log _2 (c s_i)\), where \(s_i\) is the number of non-zero blocks in row i. To find a block, we only need to locate the first element of each of its rows, so the total number of search steps for a block is \(\sum _{i=1}^{c}\log _2 (c s_i)\).

Given the row and column indices of a block in the full matrix, to find its position in the BSMC structure, we first locate the start and end column indices of the block in the second array via the third array, using the row index, as shown in Figure 15 (right); the column index is then searched in the first array between the start and end positions, taking at most \(\log _2 s_i\) steps. Once the first element of the block is determined, the whole block is identified. Clearly, the BSMC is more efficient than the CSR for this operation.
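The second operation can likewise be sketched in Python: binary search for a block column within the window of one block row. The arrays below are hypothetical:

```python
import bisect

# BSMC-like block layout: for block-row i, cols[rowptr[i]:rowptr[i+1]]
# lists the (sorted) block-column IDs of its non-zero blocks.
cols   = [0, 2, 1, 3, 0, 1, 2]
rowptr = [0, 2, 4, 7]

def find_block(i, j):
    """Second operation: locate block (i, j); returns its index in the block array or -1."""
    lo, hi = rowptr[i], rowptr[i + 1]
    k = bisect.bisect_left(cols, j, lo, hi)   # binary search within block-row i only
    return k if k < hi and cols[k] == j else -1

assert find_block(1, 3) == 3   # block-row 1 holds columns [1, 3]
assert find_block(0, 1) == -1  # no block at (0, 1)
```

In the CSR case the same search must be repeated for every scalar row of the block, which is what produces the extra factor of c in Table 7.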

Fig. 15: Accessing the data in CSR (left) and BSMC (right) format

Table 7 Accessing efficiency of the BSMC and CSR formats

Table 7 shows the number of search steps of these two operations for the BSMC and CSR formats. As the table shows, if the elements or blocks in the compressed structure are accessed continuously, the first operation is immediate for both formats. In practice, the elements or blocks are accessed continuously in a matrix-vector product, whereas the second operation is applied frequently during matrix updating or aggregation. Since the accessing efficiency of BSMC is better than that of CSR for both operations, BSMC is strongly recommended for compressing block-based sparse matrices such as the Hessian and the RCS.

Fig. 16: The Jacobian matrix, where \(\textbf{A}_{i,j}\) and \(\textbf{B}_{j,i}\) are the camera and landmark parts of the Jacobian generated by the edge between camera i and landmark j

Appendix B Formation of the RCS

The formation of the RCS is a major step in BA and accounts for most of the computational cost. In this section, we examine the formation of an RCS in detail and show how to parallelize it. Specifically, given a dataset with 5 cameras and 3 landmarks, the Jacobian is shown in Figure 16, and the Hessian, obtained as the product of the Jacobian transpose with the Jacobian, is shown in Figure 17. The damping term is not shown in the figure, but it does not affect the structure of the Hessian.

The non-zero blocks of the Hessian are computed by Eqs. (B1), (B2) and (B3).

$$\begin{aligned} \textbf{A}_i^T \textbf{A}_i=\sum _{j=0}^{p_i}\textbf{A}_{i,j}^T\textbf{A}_{i,j} \end{aligned}$$
(B1)

where \(p_i\) is the number of landmarks seen by camera i.

$$\begin{aligned} \textbf{B}_j^T \textbf{B}_j=\sum _{i=0}^{q_j}\textbf{B}_{j,i}^T\textbf{B}_{j,i} \end{aligned}$$
(B2)

where \(q_j\) is the number of cameras seeing landmark j.

$$\begin{aligned} \textbf{P}_{i,j}=\textbf{A}_{i,j}^T \textbf{B}_{j,i} \end{aligned}$$
(B3)
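As an illustration of Eqs. (B1)-(B3), the block computations can be sketched with NumPy on a toy problem; the block shapes and the random Jacobian blocks below are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem: 2 cameras (6 params each), 2 landmarks (3 params each);
# every camera observes every landmark. A[i][j] and B[j][i] are the Jacobian
# blocks of the residual between camera i and landmark j (2D reprojection residual).
A = [[rng.standard_normal((2, 6)) for j in range(2)] for i in range(2)]
B = [[rng.standard_normal((2, 3)) for i in range(2)] for j in range(2)]

# Eq. (B1): camera diagonal block  A_i^T A_i = sum_j A_{i,j}^T A_{i,j}
U = [sum(A[i][j].T @ A[i][j] for j in range(2)) for i in range(2)]
# Eq. (B2): landmark diagonal block  B_j^T B_j = sum_i B_{j,i}^T B_{j,i}
V = [sum(B[j][i].T @ B[j][i] for i in range(2)) for j in range(2)]
# Eq. (B3): off-diagonal block  P_{i,j} = A_{i,j}^T B_{j,i}
P = [[A[i][j].T @ B[j][i] for j in range(2)] for i in range(2)]

assert U[0].shape == (6, 6) and V[0].shape == (3, 3) and P[0][0].shape == (6, 3)
```

Only the block pairs corresponding to actual observations contribute, which is what gives the Hessian its block-sparse structure in Figure 17.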

According to the equation in the main text, applying the Schur complement to the Hessian yields a matrix \(\textbf{Q}=\textbf{WV}^{-1} \textbf{W}^T\), as shown in Figure 19 (left). The RCS is then computed as \(\textbf{R}=\textbf{U}-\textbf{Q}\), as shown in Figure 19 (right). Computing the Schur complement therefore amounts to computing the matrix \(\textbf{Q}\). Each block \(\textbf{Q}_{i,j}\) of \(\textbf{Q}\) is computed according to Eq. (B4). Note that \(\textbf{Q}_{i,j}\) is a zero block if there is no common point between camera i and camera j.

Fig. 17: The Hessian matrix

Fig. 18: Structure of matrix \(\textbf{Q}_k\). The non-zero blocks are not identified since k is unknown

$$\begin{aligned} \textbf{Q}_{i,j}= & \sum _{k=0}^{t_{i,j}}\textbf{Q}_{i,j,k} \end{aligned}$$
(B4)
$$\begin{aligned} \textbf{Q}_{i,j,k}= & \textbf{P}_{i,k}(\textbf{B}_k^T\textbf{B}_k)^{-1}\textbf{P}_{j,k}^T \end{aligned}$$
(B5)

where k is the landmark ID and \(t_{i,j}\) is the number of common points between camera i and camera j.

$$\begin{aligned} \textbf{Q}=\sum _{k=0}^L\textbf{Q}_k \end{aligned}$$
(B6)

where the matrix \(\textbf{Q}_k\) is a component of the RCS, L is the number of 3D points, and k is the 3D point ID.

Applying the Schur complement point by point, each 3D point generates a matrix \(\textbf{Q}_k\), where k is the landmark ID, as shown in Figure 18. The matrix \(\textbf{Q}\) is the sum of the \(\textbf{Q}_k\), as shown in Eq. (B6). This indicates that the Schur complement can be computed point by point.

According to the Jacobian in Figure 16, 3D point 1 is seen by cameras 1, 2 and 3, 3D point 2 is seen by cameras 2, 3 and 4, and 3D point 3 is seen by cameras 3, 4 and 5; thus the actual structures of \(\textbf{Q}_1\), \(\textbf{Q}_2\) and \(\textbf{Q}_3\) are shown in Figure 20. The number and positions of the non-zero blocks are determined by the number and identities of the cameras that see the 3D point. For instance, if a landmark k is seen by m cameras, then \(\textbf{Q}_k\) has \(m^2\) non-zero blocks, whose positions are determined by the identities of these cameras.

The formation of an RCS can be summarized as follows:

(1) For each 3D point k, compute its Jacobian and Hessian blocks, and then apply the Schur complement to obtain \(\textbf{Q}_k\);

(2) Update the RCS with \(\textbf{Q}_k\);

(3) Repeat until all 3D points are processed.
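The point-by-point accumulation of Eqs. (B4)-(B6) can be sketched in Python; the camera-landmark visibility, block sizes, and random blocks below are hypothetical, mirroring the example of Figure 16:

```python
import numpy as np

rng = np.random.default_rng(1)

# observations[k] lists the cameras seeing landmark k (0-based, as in Fig. 16)
observations = {0: [0, 1, 2], 1: [1, 2, 3], 2: [2, 3, 4]}

def spd():
    # stand-in for B_k^T B_k: a symmetric positive definite 3x3 block
    M = rng.standard_normal((3, 3))
    return M.T @ M + np.eye(3)

# Hypothetical per-edge blocks: P[(i,k)] plays the role of P_{i,k} (6x3),
# Vinv[k] the role of (B_k^T B_k)^{-1} in Eq. (B5).
P = {(i, k): rng.standard_normal((6, 3))
     for k, cams in observations.items() for i in cams}
Vinv = {k: np.linalg.inv(spd()) for k in observations}

# Each landmark k contributes Q_k with m^2 non-zero 6x6 blocks,
# m being the number of cameras seeing it (Eqs. B4-B6).
Q = {}  # sparse block storage: (i, j) -> 6x6 block
for k, cams in observations.items():
    for i in cams:
        for j in cams:
            Qijk = P[(i, k)] @ Vinv[k] @ P[(j, k)].T
            Q[(i, j)] = Q.get((i, j), 0) + Qijk

# cameras 0 and 4 share no landmark, so block (0, 4) stays zero (absent)
assert (0, 4) not in Q and (0, 2) in Q
```

Because the loop body for each landmark touches only that landmark's blocks, the iterations over k are independent, which is exactly what makes the distributed formation of the RCS possible.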

Fig. 19: The structure of the matrices \(\textbf{Q}\) (left) and \(\textbf{R}\) (right). The filled blocks are non-zero and the unfilled ones are zero

Fig. 20: The structure of matrices \(\textbf{Q}_1\), \(\textbf{Q}_2\) and \(\textbf{Q}_3\), generated by landmarks 1, 2 and 3, respectively

According to the above procedure, the formation of the RCS can be divided into parallel tasks, each consisting of a group of 3D points. These tasks are independent of each other and can therefore be executed in a distributed way.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zheng, M., Chen, N., Zhu, J. et al. Implementation and Validation of Distributed Bundle Adjustment for Super Large Scale Datasets. Int J Comput Vis 133, 6712–6728 (2025). https://doi.org/10.1007/s11263-025-02505-4
