Abstract
Recent advances in deep supervised hashing have led to remarkable achievements in large-scale image retrieval. However, the main training paradigms of existing deep supervised hashing methods (i.e., pairwise and pointwise), which significantly affect performance in practical retrieval tasks, remain insufficiently explored. Motivated by the critical role of training paradigms in deep supervised hashing and the lack of comprehensive evaluations in this area, we systematically establish evaluation protocols and conduct an extensive study comprising 1,833 experiments, yielding 7,332 results across 12 datasets. Our key findings include: 1) Pointwise hashing methods tend to achieve higher retrieval accuracy with seen-class queries but underperform significantly with unseen-class queries. 2) Pointwise hashing methods show greater robustness with seen-class queries, whereas pairwise hashing methods with soft constraints excel when queries come from unseen classes. 3) The hash code dimension has minimal impact on the retrieval performance of pointwise hashing methods but a more pronounced effect on pairwise hashing methods, primarily due to suboptimal real-valued feature optimization. Code and training logs for all experiments are open-sourced and available at https://github.com/aassxun/DSH_Analysis.
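To make the two paradigms concrete, the following is a minimal sketch of representative training objectives, under the usual relaxation where hash codes are real-valued vectors in \([-1, 1]\). The function names and the specific loss forms are illustrative assumptions; each surveyed method uses its own variant.

```python
import numpy as np

def pointwise_loss(codes, labels, centers):
    """Pointwise paradigm (sketch): treat hashing as classification, here via
    cross-entropy over similarities to per-class hash centers."""
    logits = codes @ centers.T                            # (n, num_classes)
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def pairwise_soft_loss(codes, labels):
    """Pairwise paradigm with a soft constraint (sketch): the normalized inner
    product of two codes should match label agreement (+1 same class, -1 not)."""
    target = 2.0 * (labels[:, None] == labels[None, :]) - 1.0
    inner = codes @ codes.T / codes.shape[1]              # in [-1, 1]
    return ((inner - target) ** 2).mean()

# Toy usage with relaxed 16-bit codes for 4 classes.
rng = np.random.default_rng(0)
codes = np.tanh(rng.standard_normal((32, 16)))
labels = rng.integers(0, 4, size=32)
centers = np.sign(rng.standard_normal((4, 16)))
pt = pointwise_loss(codes, labels, centers)
pw = pairwise_soft_loss(codes, labels)
```

The pointwise objective only needs each sample's own label, while the pairwise objective supervises all sample pairs in a batch, which is one reason the two paradigms behave differently on seen- versus unseen-class queries.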
Availability of data and materials
Image data in Table 2, Table 4, Table 6, Table 13, Table 15, Table 17 and Table 24 are extracted from CIFAR-10 (Krizhevsky & Hinton, 2009), CIFAR-100 (Krizhevsky & Hinton, 2009) and ImageNet-1K (Deng et al., 2009). Image data in Table 3, Table 5, Table 14 and Table 16 are extracted from Flickr-25K (Huiskes & Lew, 2008), NUS-WIDE (Chua et al., 2009) and MS COCO (Lin et al., 2014). Image data in Table 7, Table 9, Table 11, Table 18, Table 20, Table 22, Table 25 are extracted from CUB200-2011 (Wah et al., 2011), Food101 (Bossard et al., 2014), VegFru (Hou et al., 2017). Image data in Table 8, Table 10, Table 12, Table 19, Table 21, Table 23, Table 26 are extracted from Stanford Dogs (Khosla et al., 2011), Aircraft (Maji et al., 2013) and NABirds (Van Horn et al., 2015).
References
Bossard, L., Guillaumin, M. & Gool, L.V. (2014) Food-101 – mining discriminative components with random forests. In: Proc. Eur. Conf. Comp. Vis., pp. 446–461
Cakir, F., He, K., Bargal, S. A., & Sclaroff, S. (2019). Hashing with mutual information. IEEE Trans. Pattern Anal. Mach. Intell., 41(10), 2424–2437.
Cakir, F., He, K. & Sclaroff, S. (2018) Hashing with binary matrix pursuit. In: Proc. Eur. Conf. Comp. Vis., pp. 332–348
Cao, Z., Long, M., Wang, J., & Yu, P.S. (2017) HashNet: Deep learning to hash by continuation. In: Proc. IEEE Int. Conf. Comp. Vis., pp. 5608–5617
Chen, W., Liu, Y., Wang, W., Bakker, E. M., Georgiou, T., Fieguth, P., Liu, L., & Lew, M. S. (2023). Deep learning for instance retrieval: A survey. IEEE Trans. Pattern Anal. Mach. Intell., 45(6), 7270–7292.
Chen, Z. D., Luo, X., Wang, Y., Guo, S., & Xu, X. S. (2022). Fine-grained hashing with double filtering. IEEE Trans. Image Process., 31, 1671–1683.
Christensen, H.I., & Phillips, P.J. (2002) Empirical evaluation methods in computer vision, vol. 50. World Scientific
Chua, T.S., Tang, J., Hong, R., Li, H., Luo, Z., & Zheng, Y. (2009) NUS-WIDE: A real-world web image database from national university of singapore. In: Proc. ACM Int. Conf. on Image and Video Retrieval, pp. 1–9
Cui, Q., Jiang, Q.Y., Wei, X.S., Li, W.J., Yoshie, O. (2020) ExchNet: A unified hashing network for large-scale fine-grained image retrieval. In: Proc. Eur. Conf. Comp. Vis., pp. 189–205
Dasgupta, A., Kumar, R., Sarlos, T. (2011) Fast locality-sensitive hashing. In: Proc. ACM SIGKDD Int. Conf. Knowledge Discovery & Data Mining, pp. 1073–1081
Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., & Fei-Fei, L. (2009) ImageNet: A large-scale hierarchical image database. In: Proc. IEEE Conf. Comp. Vis. Patt. Recogn., pp. 248–255
Divvala, S.K., Hoiem, D., Hays, J.H., Efros, A.A., & Hebert, M. (2009) An empirical study of context in object detection. In: Proc. IEEE Conf. Comp. Vis. Patt. Recogn., pp. 1271–1278
Dou, Z.Y., Xu, Y., Gan, Z., Wang, J., Wang, S., Wang, L., Zhu, C., Zhang, P., Yuan, L., Peng, N., et al. (2022) An empirical study of training end-to-end vision-and-language transformers. In: Proc. IEEE Conf. Comp. Vis. Patt. Recogn., pp. 18166–18176
Eitz, M., Hildebrand, K., Boubekeur, T., & Alexa, M. (2010). Sketch-based image retrieval: Benchmark and bag-of-features descriptors. IEEE Trans. Vis. Comput. Graph., 17(11), 1624–1636.
He, K., Cakir, F., Bargal, S.A., Sclaroff, S. (2018) Hashing as tie-aware learning to rank. In: Proc. IEEE Conf. Comp. Vis. Patt. Recogn., pp. 4023–4032
He, K., Zhang, X., Ren, S., & Sun, J. (2016) Deep residual learning for image recognition. In: Proc. IEEE Conf. Comp. Vis. Patt. Recogn., pp. 770–778
Hinton, G., Srivastava, N., & Swersky, K. (2012) Neural networks for machine learning, Lecture 6a: Overview of mini-batch gradient descent
Hoe, J.T., Ng, K.W., Zhang, T., Chan, C.S., Song, Y.Z., & Xiang, T. (2021) One loss for all: Deep hashing with a single cosine similarity based learning objective. In: Advances in Neural Inf. Process. Syst., pp. 24286–24298
Hong, W., Ren, W., Lao, J., Xie, L., Zhong, L., Wang, J., Chen, J., Liu, H., Chu, W. (2024) Training object detectors from scratch: An empirical study in the era of vision transformer. Int. J. Comput. Vision pp. 1–14
Hou, S., Feng, Y., & Wang, Z. (2017) VegFru: A domain-specific dataset for fine-grained visual categorization. In: Proc. IEEE Int. Conf. Comp. Vis., pp. 541–549
Huiskes, M.J. & Lew, M.S. (2008) The MIR Flickr retrieval evaluation. In: ACM Int. Conf. Multimedia Inf. Retr., pp. 39–43
Humenberger, M., Cabon, Y., Pion, N., Weinzaepfel, P., Lee, D., Guérin, N., Sattler, T., & Csurka, G. (2022). Investigating the role of image retrieval for visual localization: An exhaustive benchmark. Int. J. Comput. Vision, 130(7), 1811–1836.
Jiang, Q.Y. & Li, W.J. (2018) Asymmetric deep supervised hashing. In: Proc. Conf. AAAI, pp. 3342–3349
Jiang, X., Tang, H., & Li, Z. (2024). Global meets local: Dual activation hashing network for large-scale fine-grained image retrieval. IEEE Trans. Knowl. Data Eng.
Jin, S., Yao, H., Sun, X., Zhou, S., Zhang, L., & Hua, X. (2020). Deep saliency hashing for fine-grained retrieval. IEEE Trans. Image Process., 29, 5336–5351.
Khosla, A., Jayadevaprakash, N., Yao, B., & Li, F.F. (2011) Novel dataset for fine-grained image categorization: Stanford dogs. In: IEEE Conf. Comput. Vis. Pattern Recog. Worksh., pp. 1–2
Krause, J., Stark, M., Deng, J., Fei-Fei, L. (2013) 3D object representations for fine-grained categorization. In: Proc. IEEE Int. Conf. Comp. Vis. Worksh., pp. 554–561
Krizhevsky, A., & Hinton, G. (2009) Learning multiple layers of features from tiny images. Tech. Report, University of Toronto
Lai, H., Pan, Y., Liu, Y., Yan, S. (2015) Simultaneous feature learning and hash coding with deep neural networks. In: Proc. IEEE Conf. Comp. Vis. Patt. Recogn., pp. 3270–3278
Leng, C., Cheng, J., Wu, J., Zhang, X., & Lu, H. (2014) Supervised hashing with soft constraints. In: Proc. ACM Int. Conf. Information & Knowledge Management, pp. 1851–1854
Li, Q., Sun, Z., He, R., & Tan, T. (2020). A general framework for deep supervised discrete hashing. Int. J. Comput. Vision, 128(8), 2204–2222.
Li, W., Duan, L., Xu, D., & Tsang, I.W.H. (2011) Text-based image retrieval using progressive multi-instance learning. In: Proc. IEEE Int. Conf. Comp. Vis., pp. 2049–2055
Li, W.J., Wang, S., & Kang, W.C. (2015) Feature learning based deep supervised hashing with pairwise labels. arXiv preprint arXiv:1511.03855
Li, Y., & van Gemert, J. (2021) Deep unsupervised image hashing by maximizing bit entropy. In: Proc. Conf. AAAI, pp. 2002–2010
Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., & Zitnick, C.L. (2014) Microsoft COCO: Common objects in context. In: Proc. Eur. Conf. Comp. Vis., pp. 740–755
Liu, H., Wang, R., Shan, S., & Chen, X. (2016) Deep supervised hashing for fast image retrieval. In: Proc. IEEE Conf. Comp. Vis. Patt. Recogn., pp. 2064–2072
Lu, D., Wang, J., Zeng, Z., Chen, B., Wu, S., & Xia, S.T. (2021) SwinFGHash: Fine-grained image retrieval via transformer-based hashing network. In: Proc. British Machine Vis. Conf., pp. 1–13
Lu, X., Chen, S., Cao, Y., Zhou, X., & Lu, X. (2023) Attributes grouping and mining hashing for fine-grained image retrieval. In: ACM Int. Conf. Multimedia, pp. 6558–6566
Luo, X., Wang, H., Wu, D., Chen, C., Deng, M., Huang, J., & Hua, X. S. (2023). A survey on deep hashing methods. ACM Trans. Knowl. Discov. Data, 17(1), 1–50.
Lv, Q., Josephson, W., Wang, Z., Charikar, M., & Li, K. (2007) Multi-probe LSH: Efficient indexing for high-dimensional similarity search. In: Proc. Int. Conf. Very Large Data Bases, pp. 950–961
Ma, C., Lu, J., & Zhou, J. (2020). Rank-consistency deep hashing for scalable multi-label image search. IEEE Trans. Multimedia, 23, 3943–3956.
Ma, C., Tsang, I. W., Peng, F., & Liu, C. (2017). Partial hash update via hamming subspace learning. IEEE Trans. Image Process., 26(4), 1939–1951.
Van der Maaten, L., & Hinton, G. (2008) Visualizing data using t-SNE. J. Mach. Learn. Res., 9(11), 2579–2605
Mai, Z., Li, R., Jeong, J., Quispe, D., Kim, H., & Sanner, S. (2022). Online continual learning in image classification: An empirical survey. Neurocomputing, 469, 28–51.
Maji, S., Rahtu, E., Kannala, J., Blaschko, M., & Vedaldi, A. (2013) Fine-grained visual classification of aircraft. arXiv preprint arXiv:1306.5151
Phillips, P. J., & Bowyer, K. W. (1999). Empirical evaluation of computer vision algorithms. IEEE Trans. Pattern Anal. Mach. Intell., 21(4), 289–290.
Varshamov, R.R. (1957) Estimate of the number of signals in error correcting codes. Doklady Akad. Nauk SSSR, 117, 739–741
Rehman, M., Iqbal, M., Sharif, M., & Raza, M. (2012). Content based image retrieval: survey. World Applied Sciences Journal, 19(3), 404–412.
Robbins, H. & Monro, S. (1951) A stochastic approximation method. The Annals of Mathematical Statistics pp. 400–407
Shao, S., Chen, K., Karpur, A., Cui, Q., Araujo, A., Cao, B. (2023) Global features are all you need for image retrieval and reranking. In: Proc. IEEE Int. Conf. Comp. Vis., pp. 11036–11046
Shen, Y., Qin, J., Chen, J., Liu, L., Zhu, F., Shen, Z. (2019) Embarrassingly simple binary representation learning. In: Proc. IEEE Int. Conf. Comp. Vis. Worksh., pp. 1–10
Shen, Y., Sun, X., Wei, X.S., Jiang, Q.Y., & Yang, J. (2022) SEMICON: A learning-to-hash solution for large-scale fine-grained image retrieval. In: Proc. Eur. Conf. Comp. Vis., pp. 531–548
Shen, Y., Sun, X., Wei, X. S., Xu, A., & Gao, L. (2024). Equiangular basis vectors: A novel paradigm for classification tasks. Int. J. Comput. Vision, 133, 372–397.
Shi, X., Guo, Z., Xing, F., Liang, Y., & Yang, L. (2020). Anchor-based self-ensembling for semi-supervised deep pairwise hashing. Int. J. Comput. Vision, 128(8), 2307–2324.
Shi, Y., Nie, X., Liu, X., Yang, L., & Yin, Y. (2023). Zero-shot hashing via asymmetric ratio similarity matrix. IEEE Trans. Knowl. Data Eng., 35(5), 5426–5437.
Shrivastava, A. & Li, P. (2014) Densifying one permutation hashing via rotation for fast near neighbor search. In: Proc. Int. Conf. Mach. Learn., pp. 557–565
Skandarani, Y., Jodoin, P. M., & Lalande, A. (2023). Gans for medical image synthesis: An empirical study. Journal of Imaging, 9(3), 69.
Tu, R.C., Mao, X.L., Guo, J.N., Wei, W., & Huang, H. (2021) Partial-softmax loss based deep hashing. In: Int. World Wide Web Conf., pp. 2869–2878
Van Horn, G., Branson, S., Farrell, R., Haber, S., Barry, J., Ipeirotis, P., Perona, P., Belongie, S. (2015) Building a bird recognition app and large scale dataset with citizen scientists: The fine print in fine-grained dataset collection. In: Proc. IEEE Conf. Comp. Vis. Patt. Recogn., pp. 595–604
Van Horn, G., Mac Aodha, O., Song, Y., Cui, Y., Sun, C., Shepard, A., Adam, H., Perona, P., Belongie, S. (2018) The iNaturalist species classification and detection dataset. In: Proc. IEEE Conf. Comp. Vis. Patt. Recogn., pp. 8769–8778
Venkatesan, R., Koon, S.M., Jakubowski, M.H., Moulin, P. (2000) Robust image hashing. In: Proc. IEEE Int. Conf. Image Process., pp. 664–666
Vo, N., Jiang, L., Sun, C., Murphy, K., Li, L.J., Fei-Fei, L., Hays, J. (2019) Composing text and image for image retrieval-an empirical odyssey. In: Proc. IEEE Conf. Comp. Vis. Patt. Recogn., pp. 6439–6448
Wah, C., Branson, S., Welinder, P., Perona, P., Belongie, S. (2011) The Caltech-UCSD birds-200-2011 dataset. Tech. Report CNS-TR-2011-001
Wang, J., Zhang, T., Sebe, N., & Tao, S. H. (2017). A survey on learning to hash. IEEE Trans. Pattern Anal. Mach. Intell., 40(4), 769–790.
Wang, L., Pan, Y., Liu, C., Lai, H., Yin, J., Liu, Y. (2023) Deep hashing with minimal-distance-separated hash centers. In: Proc. IEEE Conf. Comp. Vis. Patt. Recogn., pp. 23455–23464
Wang, R., Wang, R., Qiao, S., Shan, S., & Chen, X. (2020) Deep position-aware hashing for semantic continuous image retrieval. In: Proc. Winter Conf. Applications of Comp. Vis., pp. 2493–2502
Wei, X.S., Cui, Q., Yang, L., Wang, P., & Liu, L. (2019) RPC: A large-scale retail product checkout dataset. arXiv preprint arXiv:1901.07249
Wei, X. S., Shen, Y., Sun, X., Wang, P., & Peng, Y. (2023). Attribute-aware deep hashing with self-consistency for large-scale fine-grained image retrieval. IEEE Trans. Pattern Anal. Mach. Intell., 45(11), 13904–13920.
Wei, X.S., Shen, Y., Sun, X., Ye, H.J., Yang, J. (2021) A\(^2\)-Net: Learning attribute-aware hash codes for large-scale fine-grained image retrieval. In: Advances in Neural Inf. Process. Syst., pp. 5720–5730
Wong, Y.M., Hoi, S.C., Lyu, M.R. (2007) An empirical study on large-scale content-based image retrieval. In: Int. Conf. Multimedia and Expo, pp. 2206–2209
Xia, R., Pan, Y., Lai, H., Liu, C., Yan, S. (2014) Supervised hashing for image retrieval via image representation learning. In: Proc. Conf. AAAI, pp. 2156–2162
Xu, C., Chai, Z., Xu, Z., Yuan, C., Fan, Y., Wang, J. (2022) HyP\(^2\) Loss: Beyond hypersphere metric space for multi-label image retrieval. In: ACM Int. Conf. Multimedia, pp. 3173–3184
Yu, Q., Song, J., Song, Y. Z., Xiang, T., & Hospedales, T. M. (2021). Fine-grained instance-level sketch-based image retrieval. Int. J. Comput. Vision, 129(2), 484–500.
Yuan, L., Wang, T., Zhang, X., Tay, F.E., Jie, Z., Liu, W., Feng, J. (2020) Central similarity quantization for efficient image and video retrieval. In: Proc. IEEE Conf. Comp. Vis. Patt. Recogn., pp. 3083–3092
Yuan, X., Ren, L., Lu, J., Zhou, J. (2018) Relaxation-free deep hashing via policy gradient. In: Proc. Eur. Conf. Comp. Vis., pp. 134–150
Zeng, X. & Zheng, Y. (2023) Cascading hierarchical networks with multi-task balanced loss for fine-grained hashing. arXiv preprint arXiv:2303.11274
Zhang, C., Chao, W.L. & Xuan, D.(2019) An empirical study on leveraging scene graphs for visual question answering. arXiv preprint arXiv:1907.12133
Zhang, M., Zhe, X., Chen, S., & Yan, H. (2021). Deep center-based dual-constrained hashing for discriminative face image retrieval. Pattern Recogn., 117, Article 107976.
Acknowledgements
This work was supported by the National Key R&D Program of China (2021YFA1001100), the National Natural Science Foundation of China (Grants 62272231 and 62472222), the Natural Science Foundation of Jiangsu Province (No. BK20240080), the CIE-Tencent Robotics X Rhino-Bird Focused Research Program, and the Fundamental Research Funds for the Central Universities (4009002401). This research was also supported by the Big Data Computing Center of Southeast University.
Ethics declarations
Conflicts of Interest
The authors declare that they have no conflict of interest.
Additional information
Communicated by Yue Gao
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Visualization of Real-valued Features and Binary Hash Codes on CIFAR-100
In this section, we present t-SNE (Van der Maaten & Hinton, 2008) visualizations on the CIFAR-100 dataset of the hash codes generated by all methods, along with the corresponding real-valued features before the hash mapping, except for \(\hbox {HyP}^2\) Loss, which is specifically designed for multi-label settings. Specifically, we use 16- and 64-bit hash codes; 5 categories are selected as unseen while the rest serve as seen (for ease of visualization, we sample 8 of the seen categories). The results in Figure 6\(\sim \) Figure 9 further support the conclusion in Section 5.5 of the paper, i.e., the fundamental reason for the poor performance of methods trained with the pairwise paradigm on (fine-grained) datasets at low hash code dimensions is the suboptimal optimization of real-valued features compared to that at higher dimensions. In addition, some methods trained with the pointwise paradigm also suffer from suboptimal optimization of real-valued features. In the following, we list the seen and unseen categories sampled from CIFAR-100. (Seen classes: dolphin, oak_tree, raccoon, girl, snake, cloud, leopard, road. Unseen classes: whale, willow_tree, wolf, woman, worm.)
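The visualization procedure described above can be sketched as follows. This is a minimal, illustrative pipeline assuming arrays of extracted real-valued features; the array names, dimensions, and t-SNE hyperparameters are assumptions, not the paper's exact settings.

```python
import numpy as np
from sklearn.manifold import TSNE

def embed_2d(x, perplexity=30, seed=0):
    """Project row vectors to 2-D with t-SNE (Van der Maaten & Hinton, 2008)."""
    return TSNE(n_components=2, perplexity=perplexity,
                init="pca", random_state=seed).fit_transform(x)

# Placeholder features standing in for the network's pre-hash outputs.
rng = np.random.default_rng(0)
features = rng.standard_normal((200, 64))     # real-valued features before hashing
hash_codes = np.sign(features)                # 64-bit binary codes in {-1, +1}

feat_2d = embed_2d(features)                  # embed both and plot side by side to
code_2d = embed_2d(hash_codes)                # compare feature vs. code structure
```

Plotting `feat_2d` and `code_2d` side by side (colored by class) is what reveals whether poorly separated hash codes stem from the binarization step or from already suboptimal real-valued features.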
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Shen, Y., Wang, P., Wei, XS. et al. An Empirical Study on Training Paradigms for Deep Supervised Hashing. Int J Comput Vis 133, 6729–6767 (2025). https://doi.org/10.1007/s11263-025-02506-3