Abstract
We propose a distributed bundle adjustment (DBA) method using the Levenberg-Marquardt (LM) algorithm for super large-scale datasets. Most existing methods partition the global map into submaps and conduct bundle adjustment within each. To fit the parallel framework, they use approximate solutions instead of the LM algorithm; however, these methods often give sub-optimal results. In contrast, we use the LM algorithm to conduct global bundle adjustment, where the formation of the reduced camera system (RCS) is parallelized and executed in a distributed way. To store the large RCS, we compress it with a block-based sparse matrix compression (BSMC) format, which fully exploits its block structure. The BSMC format also enables the distributed storage and updating of the global RCS. The proposed method is extensively evaluated and compared with state-of-the-art pipelines using both synthetic and real datasets. Preliminary results demonstrate the efficient memory usage and vast scalability of the proposed method compared with the baselines. For the first time, we conducted parallel bundle adjustment using the LM algorithm on a real dataset with 1.18 million images and a synthetic dataset with 10 million images (about 500 times the size handled by state-of-the-art LM-based BA) on a distributed computing system.
Data Availability
The code for BSMC is available at https://github.com/MozartZheng/DistributedBA; the public version of the whole pipeline will also be released when it is ready.
Acknowledgements
This work is supported by the Natural Science Foundation of Hubei (No. 2024AFB680) and the Open Fund of Hubei Luojia Laboratory (No. 220100034).
Additional information
Communicated by Adrien Bartoli.
Appendices
Appendix A Accessing Efficiency of the BSMC and CSR Format
To evaluate the accessing efficiency of a compression format, we consider two operations. (1) Given an element or a block in the compressed structure, find its column and row identities in the full matrix; (2) Given the column and row identities of an element or a block in the full matrix, find the corresponding block in the compressed structure.
Appendix A.1 The First Operation
Given the ID of an element in the first array of the CSR structure, the column ID in the full matrix is identified immediately by the second array, as shown in Figure 15 (left). If the elements are accessed continuously, the row ID in the full matrix can be obtained by row counting. Otherwise, the row ID must be searched for in the third array. Using binary search, the maximum number of search steps is \(\log _2(nc)\), since the size of the third array is nc, where n is the block number and c is the block width.
Given the ID of a block in the first array of the BSMC structure, the column ID in the full matrix is first identified by the second array, as shown in Figure 15 (right). As with the CSR format, the row ID can be obtained by row counting if the blocks are accessed continuously; otherwise it must be searched for in the third array, taking at most \(\log _2n\) steps, since the size of the third array is the block number n. Note that one access to the CSR structure retrieves only a single element, whereas one access to the BSMC structure retrieves a whole block of \(c\times c\) elements.
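The first operation for CSR can be sketched in a few lines of Python; the toy arrays `values`, `cols`, and `row_ptr` below are illustrative, not data from the paper:

```python
import bisect

# Toy CSR structure for a hypothetical 3x4 sparse matrix with
# non-zeros at (0,1), (0,3), (1,0), (2,2), (2,3).
values  = [10, 20, 30, 40, 50]   # first array: the stored elements
cols    = [1, 3, 0, 2, 3]        # second array: column IDs
row_ptr = [0, 2, 3, 5]           # third array: start of each row

def csr_locate(idx):
    """Operation (1): index in the first array -> (row, col) in the
    full matrix. The column is read directly from the second array;
    the row is found by binary search in the third array."""
    col = cols[idx]
    row = bisect.bisect_right(row_ptr, idx) - 1
    return row, col

print(csr_locate(3))  # element 40 sits at row 2, column 2
```

The BSMC version is structurally identical, except that the binary search runs over the n block rows instead of the nc scalar rows, and each hit yields a whole \(c\times c\) block rather than a single element.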
Appendix A.2 The Second Operation
Given the row and column IDs of an element in the full matrix, to find its corresponding position in the CSR structure, we first use the row ID to locate the start and end column IDs in the second array via the third array, as shown in Figure 15 (left); the column ID of the element is then searched between the start and end column IDs to find its position in the first array. The maximum number of search steps per element is \(\log _2(cs_i)\). To find a block, we only need to find the first element of each of its c rows, so the total number of search steps for a block is \(\sum ^c_{i=1}\log _2(cs_i)\), where \(s_i\) is the number of non-zero blocks in row i.
Given the row and column IDs of a block in the full matrix, to find its corresponding position in the BSMC structure, we first use the row ID to locate the start and end column IDs of the block in the second array via the third array, as shown in Figure 15 (right); the column ID is then searched between the start and end column IDs to find its position in the first array, taking at most \(\log _2s_i\) steps. Once the first element of the block is located, the whole block is identified. Clearly, BSMC is more efficient than CSR for this operation.
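The second operation can be sketched the same way; `csr_find` is a hypothetical helper operating on the same toy CSR arrays as above (column IDs are assumed sorted within each row, as CSR layouts normally guarantee):

```python
import bisect

# Toy CSR structure for a hypothetical 3x4 sparse matrix.
cols    = [1, 3, 0, 2, 3]   # second array: column IDs, sorted per row
row_ptr = [0, 2, 3, 5]      # third array: start of each row

def csr_find(row, col):
    """Operation (2): (row, col) in the full matrix -> index in the
    first array, or -1 if the entry is structurally zero. The third
    array gives the row's slice in the second array; the column ID is
    then binary-searched inside that slice (at most log2(s_i) steps
    for s_i stored entries in row i)."""
    start, end = row_ptr[row], row_ptr[row + 1]
    idx = bisect.bisect_left(cols, col, start, end)
    if idx < end and cols[idx] == col:
        return idx
    return -1  # no stored entry at this position

print(csr_find(2, 3))  # the entry at row 2, column 3 is stored at index 4
```

For BSMC the search is the same, but it runs over the \(s_i\) block columns of the row rather than over \(cs_i\) element columns, which is where the factor-of-c saving in Table 7 comes from.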
Table 7 summarizes the search steps of these two operations for the BSMC and CSR formats. As shown in Table 7, if the elements or blocks in the compressed structure are accessed continuously, the first operation is immediate for both formats. In practice, the elements or blocks are accessed continuously in matrix-vector products, whereas the second operation is applied frequently in matrix updating or aggregation. Since the access efficiency of BSMC is better than that of CSR for both operations, BSMC is strongly recommended for compressing block-based sparse matrices such as the Hessian and the RCS.
Appendix B Formation of the RCS
The formation of the RCS is a major step in BA and accounts for most of the computational cost. In this section, we look into the details of forming an RCS and show how to parallelize it. Specifically, given a dataset with 5 cameras and 3 landmarks, the Jacobian is shown in Figure 16, and the Hessian obtained by forming the normal equations from the Jacobian is shown in Figure 17. The damping term is not shown in the figure, but it does not affect the structure of the Hessian.
The non-zero blocks of the Hessian are computed according to Eqs. (B1), (B2) and (B3), where \(\textbf{J}_{c,ik}\) and \(\textbf{J}_{p,ik}\) denote the camera and point Jacobian blocks of the observation of landmark k in camera i:

\(\textbf{U}_i=\sum ^{p_i}_{k=1}\textbf{J}_{c,ik}^T\textbf{J}_{c,ik}\)    (B1)

where \(p_i\) is the number of landmarks seen by camera i;

\(\textbf{V}_j=\sum ^{q_j}_{i=1}\textbf{J}_{p,ij}^T\textbf{J}_{p,ij}\)    (B2)

where \(q_j\) is the number of cameras seeing landmark j; and

\(\textbf{W}_{ij}=\textbf{J}_{c,ij}^T\textbf{J}_{p,ij}\)    (B3)
According to the equation in the main text, applying the Schur complement to the Hessian gives the matrix \(\textbf{Q}=\textbf{WV}^{-1} \textbf{W}^T\), as shown in Figure 19 (left). The RCS is then computed as \(\textbf{R}=\textbf{U}-\textbf{Q}\), as shown in Figure 19 (right). Computing the Schur complement therefore essentially amounts to computing \(\textbf{Q}\). Each block \(\textbf{Q}_{i,j}\) of \(\textbf{Q}\) is computed according to Eq. (B4),

\(\textbf{Q}_{i,j}=\sum _{k}\textbf{W}_{i,k}\textbf{V}_k^{-1}\textbf{W}_{j,k}^T\)    (B4)

where k is the landmark ID, running over the \(t_{i,j}\) common points between camera i and camera j. Note that \(\textbf{Q}_{i,j}\) is a zero block if there is no common point between camera i and camera j.

Applying the Schur complement point by point, each 3D point generates a matrix \(\textbf{Q}_k\), where k is the ID of the landmark, as shown in Figure 18, and \(\textbf{Q}\) is the sum of the \(\textbf{Q}_k\), as in Eq. (B6),

\(\textbf{Q}=\sum ^{L}_{k=1}\textbf{Q}_k\)    (B6)

where the matrix \(\textbf{Q}_k\) is a component of the RCS and L is the number of 3D points. This indicates that the Schur complement can be computed point by point.
According to the Jacobian shown in Figure 16, 3D point 1 is seen by cameras 1, 2 and 3, 3D point 2 is seen by cameras 2, 3 and 4, and 3D point 3 is seen by cameras 3, 4 and 5; the actual structures of the matrices \(\textbf{Q}_1\), \(\textbf{Q}_2\) and \(\textbf{Q}_3\) are therefore as shown in Figure 20. The number and positions of the non-zero blocks are determined by the number and identities of the cameras seeing the 3D point. For instance, if a landmark k is seen by m cameras, then \(\textbf{Q}_k\) has \(m^2\) non-zero blocks, whose positions are determined by the identities of these cameras.
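The block structure of one such \(\textbf{Q}_k\) can be illustrated with synthetic data; the 2-DoF camera block and random W blocks below are assumptions for brevity (real BA uses 6-DoF or larger camera blocks):

```python
import numpy as np

# Synthetic per-landmark Schur complement term Q_k.
c, n_cams = 2, 5                       # camera block width, camera count
cams = [0, 1, 2]                       # landmark k is seen by m = 3 cameras
rng = np.random.default_rng(0)
W = {i: rng.standard_normal((c, 3)) for i in cams}   # W_{i,k} blocks
V_k = np.eye(3) * 2.0                  # point Hessian block (kept trivially SPD)
V_inv = np.linalg.inv(V_k)

Q_k = np.zeros((n_cams * c, n_cams * c))
for i in cams:
    for j in cams:
        # one c-by-c block per ordered pair of observing cameras
        Q_k[i*c:(i+1)*c, j*c:(j+1)*c] = W[i] @ V_inv @ W[j].T

# m cameras see the landmark, so Q_k has exactly m*m non-zero blocks
nonzero_blocks = sum(
    np.any(Q_k[i*c:(i+1)*c, j*c:(j+1)*c])
    for i in range(n_cams) for j in range(n_cams)
)
print(nonzero_blocks)  # 9
```

Since \(\textbf{V}_k^{-1}\) is symmetric, \(\textbf{Q}_k\) is symmetric as well, so in practice only the upper (or lower) triangle of its blocks needs to be stored.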
The formation of an RCS can be summarized as follows:
(1) For each 3D point k, compute its Jacobian and Hessian, and then conduct the Schur complement to obtain \(\textbf{Q}_k\);
(2) Update the RCS with \(\textbf{Q}_k\);
(3) Repeat until all the 3D points have been processed.
According to the above procedure, the formation of the RCS can be divided into parallel tasks, each consisting of a group of 3D points. These tasks are independent of each other and can therefore be executed in a distributed way.
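The steps above can be sketched on synthetic data; `point_term` is a hypothetical stand-in for the per-point Jacobian/Hessian/Schur computation, and the grouped reduction shows why the distributed execution gives the same RCS as the serial one:

```python
import numpy as np

c, n_cams, n_pts = 2, 4, 12            # synthetic problem sizes
rng = np.random.default_rng(1)

def point_term(k):
    """Step (1): the per-point Schur complement term Q_k,
    built here from random blocks for illustration."""
    seen = rng.choice(n_cams, size=2, replace=False)   # cameras seeing k
    W = {i: rng.standard_normal((c, 3)) for i in seen}
    V_inv = np.linalg.inv(np.eye(3) * (k + 2.0))
    Q_k = np.zeros((n_cams * c, n_cams * c))
    for i in seen:
        for j in seen:
            Q_k[i*c:(i+1)*c, j*c:(j+1)*c] = W[i] @ V_inv @ W[j].T
    return Q_k

# Steps (2)-(3): update the RCS with every Q_k. Splitting the points
# into groups, reducing each group separately, and merging the partial
# sums yields the same matrix Q, which is what makes the distributed
# formation possible.
terms = [point_term(k) for k in range(n_pts)]
Q_serial = sum(terms)
Q_grouped = sum(sum(terms[g:g+4]) for g in range(0, n_pts, 4))
print(np.allclose(Q_serial, Q_grouped))  # True
```

In the actual pipeline, each group's partial sum would be accumulated in a BSMC-compressed structure on its own worker and merged into the global RCS, rather than materialized as the dense arrays used here.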
About this article
Cite this article
Zheng, M., Chen, N., Zhu, J. et al. Implementation and Validation of Distributed Bundle Adjustment for Super Large Scale Datasets. Int J Comput Vis 133, 6712–6728 (2025). https://doi.org/10.1007/s11263-025-02505-4