
An Empirical Study on Training Paradigms for Deep Supervised Hashing

Published in: International Journal of Computer Vision

Abstract

Recent advances in deep supervised hashing have achieved remarkable results on the large-scale image retrieval task. However, the main training paradigms of existing deep supervised hashing methods (i.e., pairwise and pointwise), which significantly affect performance in practical retrieval tasks, remain insufficiently explored. Motivated by the critical role of training paradigms in deep supervised hashing and the lack of comprehensive evaluations in this area, we systematically establish evaluation protocols and conduct an extensive study comprising 1,833 experiments, yielding 7,332 results across 12 datasets. Our key findings include: 1) Pointwise hashing methods tend to exhibit higher retrieval accuracy with seen-class queries but underperform significantly with unseen-class queries. 2) Pointwise hashing methods show greater robustness with seen-class queries, whereas pairwise hashing methods with soft constraints excel when queries come from unseen classes. 3) The hash code dimension has minimal impact on the retrieval performance of pointwise hashing methods but a more pronounced one on pairwise hashing methods, primarily due to suboptimal optimization of the real-valued features. Code and training logs for all experiments are open-source and available at https://github.com/aassxun/DSH_Analysis.
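The pairwise and pointwise paradigms contrasted above can be illustrated as loss functions. The following is a minimal, hypothetical sketch (not the paper's implementation, and not any specific method from its study): the pointwise paradigm treats hashing as classification with a cross-entropy loss over class logits, while the pairwise paradigm constrains distances between pairs of codes, here in a contrastive form with a margin.

```python
import numpy as np

def pointwise_loss(logits, labels):
    # Pointwise paradigm (illustrative): treat hashing as classification;
    # cross-entropy between per-image class logits and ground-truth labels.
    z = logits - logits.max(axis=1, keepdims=True)          # stabilize exp
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def pairwise_loss(codes, labels, margin=2.0):
    # Pairwise paradigm (illustrative contrastive form): pull codes of
    # same-class pairs together, push different-class pairs at least
    # `margin` apart.
    n = len(codes)
    total, count = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            d = np.linalg.norm(codes[i] - codes[j])
            if labels[i] == labels[j]:
                total += d ** 2                     # similar pair: shrink d
            else:
                total += max(0.0, margin - d) ** 2  # dissimilar: enforce margin
            count += 1
    return total / count
```

Actual deep supervised hashing methods differ in how they relax the binary constraint and weight the pairs, but most fit one of these two supervision shapes.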



Availability of data and materials

Image data in Table 2, Table 4, Table 6, Table 13, Table 15, Table 17 and Table 24 are extracted from CIFAR-10 (Krizhevsky & Hinton, 2009), CIFAR-100 (Krizhevsky & Hinton, 2009) and ImageNet-1K (Deng et al., 2009). Image data in Table 3, Table 5, Table 14 and Table 16 are extracted from Flickr-25K (Huiskes & Lew, 2008), NUS-WIDE (Chua et al., 2009) and MS COCO (Lin et al., 2014). Image data in Table 7, Table 9, Table 11, Table 18, Table 20, Table 22, Table 25 are extracted from CUB200-2011 (Wah et al., 2011), Food101 (Bossard et al., 2014), VegFru (Hou et al., 2017). Image data in Table 8, Table 10, Table 12, Table 19, Table 21, Table 23, Table 26 are extracted from Stanford Dogs (Khosla et al., 2011), Aircraft (Maji et al., 2013) and NABirds (Van Horn et al., 2015).

References

  • Bossard, L., Guillaumin, M. & Gool, L.V. (2014) Food-101 – mining discriminative components with random forests. In: Proc. Eur. Conf. Comp. Vis., pp. 446–461

  • Cakir, F., He, K., Bargal, S. A., & Sclaroff, S. (2019). Hashing with mutual information. IEEE Trans. Pattern Anal. Mach. Intell., 41(10), 2424–2437.


  • Cakir, F., He, K. & Sclaroff, S. (2018) Hashing with binary matrix pursuit. In: Proc. Eur. Conf. Comp. Vis., pp. 332–348

  • Cao, Z., Long, M., Wang, J., & Yu, P.S. (2017) HashNet: Deep learning to hash by continuation. In: Proc. IEEE Int. Conf. Comp. Vis., pp. 5608–5617

  • Chen, W., Liu, Y., Wang, W., Bakker, E. M., Georgiou, T., Fieguth, P., Liu, L., & Lew, M. S. (2023). Deep learning for instance retrieval: A survey. IEEE Trans. Pattern Anal. Mach. Intell., 45(6), 7270–7292.


  • Chen, Z. D., Luo, X., Wang, Y., Guo, S., & Xu, X. S. (2022). Fine-grained hashing with double filtering. IEEE Trans. Image Process., 31, 1671–1683.


  • Christensen, H.I., & Phillips, P.J. (2002) Empirical evaluation methods in computer vision, vol. 50. World Scientific

  • Chua, T.S., Tang, J., Hong, R., Li, H., Luo, Z., & Zheng, Y. (2009) NUS-WIDE: A real-world web image database from national university of singapore. In: Proc. ACM Int. Conf. on Image and Video Retrieval, pp. 1–9

  • Cui, Q., Jiang, Q.Y., Wei, X.S., Li, W.J., Yoshie, O. (2020) ExchNet: A unified hashing network for large-scale fine-grained image retrieval. In: Proc. Eur. Conf. Comp. Vis., pp. 189–205

  • Dasgupta, A., Kumar, R., Sarlos, T. (2011) Fast locality-sensitive hashing. In: Proc. ACM SIGKDD Int. Conf. Knowledge Discovery & Data Mining, pp. 1073–1081

  • Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: A large-scale hierarchical image database. In: Proc. IEEE Conf. Comp. Vis. Patt. Recogn., pp. 248–255 (2009)

  • Divvala, S.K., Hoiem, D., Hays, J.H., Efros, A.A., Hebert, M.: An empirical study of context in object detection. In: Proc. IEEE Conf. Comp. Vis. Patt. Recogn., pp. 1271–1278 (2009)

  • Dou, Z.Y., Xu, Y., Gan, Z., Wang, J., Wang, S., Wang, L., Zhu, C., Zhang, P., Yuan, L., Peng, N., et al.: An empirical study of training end-to-end vision-and-language transformers. In: Proc. IEEE Conf. Comp. Vis. Patt. Recogn., pp. 18166–18176 (2022)

  • Eitz, M., Hildebrand, K., Boubekeur, T., & Alexa, M. (2010). Sketch-based image retrieval: Benchmark and bag-of-features descriptors. IEEE Trans. Vis. Comput. Graph., 17(11), 1624–1636.


  • He, K., Cakir, F., Bargal, S.A., Sclaroff, S. (2018) Hashing as tie-aware learning to rank. In: Proc. IEEE Conf. Comp. Vis. Patt. Recogn., pp. 4023–4032

  • He, K., Zhang, X., Ren, S., & Sun, J. (2016) Deep residual learning for image recognition. In: Proc. IEEE Conf. Comp. Vis. Patt. Recogn., pp. 770–778

  • Hinton, G., Srivastava, N., & Swersky, K. (2012) Neural networks for machine learning, Lecture 6a: Overview of mini-batch gradient descent

  • Hoe, J.T., Ng, K.W., Zhang, T., Chan, C.S., Song, Y.Z., Xiang, T (2021) One loss for all: Deep hashing with a single cosine similarity based learning objective. In: Advances in Neural Inf. Process. Syst., pp. 24286–24298

  • Hong, W., Ren, W., Lao, J., Xie, L., Zhong, L., Wang, J., Chen, J., Liu, H., Chu, W. (2024) Training object detectors from scratch: An empirical study in the era of vision transformer. Int. J. Comput. Vision pp. 1–14

  • Hou, S., Feng, Y., Wang, Z.: VegFru: A domain-specific dataset for fine-grained visual categorization. In: Proc. IEEE Int. Conf. Comp. Vis., pp. 541–549 (2017)

  • Huiskes, M.J. & Lew, M.S. (2008) The MIR Flickr retrieval evaluation. In: ACM Int. Conf. Multimedia Inf. Retr., pp. 39–43

  • Humenberger, M., Cabon, Y., Pion, N., Weinzaepfel, P., Lee, D., Guérin, N., Sattler, T., & Csurka, G. (2022). Investigating the role of image retrieval for visual localization: An exhaustive benchmark. Int. J. Comput. Vision, 130(7), 1811–1836.


  • Jiang, Q.Y. & Li, W.J. (2018) Asymmetric deep supervised hashing. In: Proc. Conf. AAAI, pp. 3342–3349

  • Jiang, X., Tang, H., & Li, Z. (2024). Global meets local: Dual activation hashing network for large-scale fine-grained image retrieval. IEEE Trans. Knowl. Data Eng.


  • Jin, S., Yao, H., Sun, X., Zhou, S., Zhang, L., & Hua, X. (2020). Deep saliency hashing for fine-grained retrieval. IEEE Trans. Image Process., 29, 5336–5351.


  • Khosla, A., Jayadevaprakash, N., Yao, B., & Li, F.F (2011) Novel dataset for fine-grained image categorization: Stanford dogs. In: IEEE Conf. Comput. Vis. Pattern Recog. Worksh., pp. 1–2

  • Krause, J., Stark, M., Deng, J., Fei-Fei, L. (2013) 3D object representations for fine-grained categorization. In: Proc. IEEE Int. Conf. Comp. Vis. Worksh., pp. 554–561

  • Krizhevsky, A. & Hinton, G. (2009) Learning multiple layers of features from tiny images. Technical report, University of Toronto

  • Lai, H., Pan, Y., Liu, Y., Yan, S. (2015) Simultaneous feature learning and hash coding with deep neural networks. In: Proc. IEEE Conf. Comp. Vis. Patt. Recogn., pp. 3270–3278

  • Leng, C., Cheng, J., Wu, J., Zhang, X., Lu, H.: Supervised hashing with soft constraints. In: Proc. ACM Int. Conf. Information & Knowledge Management, pp. 1851–1854 (2014)

  • Li, Q., Sun, Z., He, R., & Tan, T. (2020). A general framework for deep supervised discrete hashing. Int. J. Comput. Vision, 128(8), 2204–2222.


  • Li, W., Duan, L., Xu, D., Tsang, I.W.H.: Text-based image retrieval using progressive multi-instance learning. In: Proc. IEEE Int. Conf. Comp. Vis., pp. 2049–2055 (2011)

  • Li, W.J., Wang, S., Kang, W.C.: Feature learning based deep supervised hashing with pairwise labels. arXiv preprint arXiv:1511.03855 (2015)

  • Li, Y., van Gemert, J.: Deep unsupervised image hashing by maximizing bit entropy. In: Proc. Conf. AAAI, pp. 2002–2010 (2021)

  • Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft COCO: Common objects in context. In: Proc. Eur. Conf. Comp. Vis., pp. 740–755 (2014)

  • Liu, H., Wang, R., Shan, S., Chen, X.: Deep supervised hashing for fast image retrieval. In: Proc. IEEE Conf. Comp. Vis. Patt. Recogn., pp. 2064–2072 (2016)

  • Lu, D., Wang, J., Zeng, Z., Chen, B., Wu, S., Xia, S.T.: SwinFGHash: Fine-grained image retrieval via transformer-based hashing network. In: Proc. British Machine Vis. Conf., pp. 1–13 (2021)

  • Lu, X., Chen, S., Cao, Y., Zhou, X., Lu, X.: Attributes grouping and mining hashing for fine-grained image retrieval. In: ACM Int. Conf. Multimedia, pp. 6558–6566 (2023)

  • Luo, X., Wang, H., Wu, D., Chen, C., Deng, M., Huang, J., & Hua, X. S. (2023). A survey on deep hashing methods. ACM Trans. Knowl. Discov. Data, 17(1), 1–50.


  • Lv, Q., Josephson, W., Wang, Z., Charikar, M., Li, K.: Multi-probe LSH: Efficient indexing for high-dimensional similarity search. In: Proc. Int. Conf. Very Large Data Bases, pp. 950–961 (2007)

  • Ma, C., Lu, J., & Zhou, J. (2020). Rank-consistency deep hashing for scalable multi-label image search. IEEE Trans. Multimedia, 23, 3943–3956.


  • Ma, C., Tsang, I. W., Peng, F., & Liu, C. (2017). Partial hash update via hamming subspace learning. IEEE Trans. Image Process., 26(4), 1939–1951.


  • Van der Maaten, L. & Hinton, G. (2008) Visualizing data using t-SNE. J. Mach. Learn. Res., 9, 2579–2605

  • Mai, Z., Li, R., Jeong, J., Quispe, D., Kim, H., & Sanner, S. (2022). Online continual learning in image classification: An empirical survey. Neurocomputing, 469, 28–51.


  • Maji, S., Rahtu, E., Kannala, J., Blaschko, M., Vedaldi, A.: Fine-grained visual classification of aircraft. arXiv preprint arXiv:1306.5151 (2013)

  • Phillips, P. J., & Bowyer, K. W. (1999). Empirical evaluation of computer vision algorithms. IEEE Trans. Pattern Anal. Mach. Intell., 21(4), 289–290.


  • Varshamov, R.R. (1957) Estimate of the number of signals in error correcting codes. Doklady Akad. Nauk SSSR, 117, 739–741

  • Rehman, M., Iqbal, M., Sharif, M., & Raza, M. (2012). Content based image retrieval: survey. World Applied Sciences Journal, 19(3), 404–412.


  • Robbins, H. & Monro, S. (1951) A stochastic approximation method. The Annals of Mathematical Statistics pp. 400–407

  • Shao, S., Chen, K., Karpur, A., Cui, Q., Araujo, A., Cao, B. (2023) Global features are all you need for image retrieval and reranking. In: Proc. IEEE Int. Conf. Comp. Vis., pp. 11036–11046

  • Shen, Y., Qin, J., Chen, J., Liu, L., Zhu, F., Shen, Z. (2019) Embarrassingly simple binary representation learning. In: Proc. IEEE Int. Conf. Comp. Vis. Worksh., pp. 1–10

  • Shen, Y., Sun, X., Wei, X.S., Jiang, Q.Y., Yang, J.: SEMICON: A learning-to-hash solution for large-scale fine-grained image retrieval. In: Proc. Eur. Conf. Comp. Vis., pp. 531–548 (2022)

  • Shen, Y., Sun, X., Wei, X. S., Xu, A., & Gao, L. (2024). Equiangular basis vectors: A novel paradigm for classification tasks. Int. J. Comput. Vision, 133, 372–397.


  • Shi, X., Guo, Z., Xing, F., Liang, Y., & Yang, L. (2020). Anchor-based self-ensembling for semi-supervised deep pairwise hashing. Int. J. Comput. Vision, 128(8), 2307–2324.


  • Shi, Y., Nie, X., Liu, X., Yang, L., & Yin, Y. (2023). Zero-shot hashing via asymmetric ratio similarity matrix. IEEE Trans. Knowl. Data Eng., 35(5), 5426–5437.


  • Shrivastava, A. & Li, P. (2014) Densifying one permutation hashing via rotation for fast near neighbor search. In: Proc. Int. Conf. Mach. Learn., pp. 557–565

  • Skandarani, Y., Jodoin, P. M., & Lalande, A. (2023). Gans for medical image synthesis: An empirical study. Journal of Imaging, 9(3), 69.


  • Tu, R.C., Mao, X.L., Guo, J.N., Wei, W., & Huang, H.(2021) Partial-softmax loss based deep hashing. In: Int. World Wide Web Conf., pp. 2869–2878

  • Van Horn, G., Branson, S., Farrell, R., Haber, S., Barry, J., Ipeirotis, P., Perona, P., Belongie, S. (2015) Building a bird recognition app and large scale dataset with citizen scientists: The fine print in fine-grained dataset collection. In: Proc. IEEE Conf. Comp. Vis. Patt. Recogn., pp. 595–604

  • Van Horn, G., Mac Aodha, O., Song, Y., Cui, Y., Sun, C., Shepard, A., Adam, H., Perona, P., Belongie, S. (2018) The iNaturalist species classification and detection dataset. In: Proc. IEEE Conf. Comp. Vis. Patt. Recogn., pp. 8769–8778

  • Venkatesan, R., Koon, S.M., Jakubowski, M.H., Moulin, P. (2000) Robust image hashing. In: Proc. IEEE Int. Conf. Image Process., pp. 664–666

  • Vo, N., Jiang, L., Sun, C., Murphy, K., Li, L.J., Fei-Fei, L., Hays, J. (2019) Composing text and image for image retrieval-an empirical odyssey. In: Proc. IEEE Conf. Comp. Vis. Patt. Recogn., pp. 6439–6448

  • Wah, C., Branson, S., Welinder, P., Perona, P., Belongie, S. (2011) The Caltech-UCSD birds-200-2011 dataset. Tech. Report CNS-TR-2011-001

  • Wang, J., Zhang, T., Sebe, N., & Tao, S. H. (2017). A survey on learning to hash. IEEE Trans. Pattern Anal. Mach. Intell., 40(4), 769–790.


  • Wang, L., Pan, Y., Liu, C., Lai, H., Yin, J., Liu, Y. (2023) Deep hashing with minimal-distance-separated hash centers. In: Proc. IEEE Conf. Comp. Vis. Patt. Recogn., pp. 23455–23464

  • Wang, R., Wang, R., Qiao, S., Shan, S., & Chen, X. (2020) Deep position-aware hashing for semantic continuous image retrieval. In: Proc. Winter Conf. Applications of Comp. Vis., pp. 2493–2502

  • Wei, X.S., Cui, Q., Yang, L., Wang, P., Liu, L. (2019) Rpc: A large-scale retail product checkout dataset. arXiv preprint arXiv:1901.07249

  • Wei, X. S., Shen, Y., Sun, X., Wang, P., & Peng, Y. (2023). Attribute-aware deep hashing with self-consistency for large-scale fine-grained image retrieval. IEEE Trans. Pattern Anal. Mach. Intell., 45(11), 13904–13920.


  • Wei, X.S., Shen, Y., Sun, X., Ye, H.J., Yang, J. (2021) A\(^2\)-Net: Learning attribute-aware hash codes for large-scale fine-grained image retrieval. In: Advances in Neural Inf. Process. Syst., pp. 5720–5730

  • Wong, Y.M., Hoi, S.C., Lyu, M.R. (2007) An empirical study on large-scale content-based image retrieval. In: Int. Conf. Multimedia and Expo, pp. 2206–2209

  • Xia, R., Pan, Y., Lai, H., Liu, C., Yan, S. (2014) Supervised hashing for image retrieval via image representation learning. In: Proc. Conf. AAAI, pp. 2156–2162

  • Xu, C., Chai, Z., Xu, Z., Yuan, C., Fan, Y., Wang, J. (2022) HyP\(^2\) Loss: Beyond hypersphere metric space for multi-label image retrieval. In: ACM Int. Conf. Multimedia, pp. 3173–3184

  • Yu, Q., Song, J., Song, Y. Z., Xiang, T., & Hospedales, T. M. (2021). Fine-grained instance-level sketch-based image retrieval. Int. J. Comput. Vision, 129(2), 484–500.


  • Yuan, L., Wang, T., Zhang, X., Tay, F.E., Jie, Z., Liu, W., Feng, J. (2020) Central similarity quantization for efficient image and video retrieval. In: Proc. IEEE Conf. Comp. Vis. Patt. Recogn., pp. 3083–3092

  • Yuan, X., Ren, L., Lu, J., Zhou, J. (2018) Relaxation-free deep hashing via policy gradient. In: Proc. Eur. Conf. Comp. Vis., pp. 134–150

  • Zeng, X. & Zheng, Y. (2023) Cascading hierarchical networks with multi-task balanced loss for fine-grained hashing. arXiv preprint arXiv:2303.11274

  • Zhang, C., Chao, W.L. & Xuan, D.(2019) An empirical study on leveraging scene graphs for visual question answering. arXiv preprint arXiv:1907.12133

  • Zhang, M., Zhe, X., Chen, S., & Yan, H. (2021). Deep center-based dual-constrained hashing for discriminative face image retrieval. Pattern Recogn., 117, Article 107976.



Acknowledgements

This work was supported by National Key R&D Program of China (2021YFA1001100), National Natural Science Foundation of China under Grant (62272231, 62472222), Natural Science Foundation of Jiangsu Province (No. BK20240080), CIE-Tencent Robotics X Rhino-Bird Focused Research Program, and the Fundamental Research Funds for the Central Universities (4009002401). This research work is supported by the Big Data Computing Center of Southeast University.

Author information

Corresponding authors

Correspondence to Xiu-Shen Wei or Yazhou Yao.

Ethics declarations

Conflicts of Interest

The authors declare that they have no conflict of interest.

Additional information

Communicated by Yue Gao

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Visualization of Real-valued Features and Binary Hash Codes on CIFAR-100


In this section, we present t-SNE (Van der Maaten & Hinton, 2008) visualizations of the hash codes generated by all methods, together with the corresponding real-valued features before the hash mapping, on the CIFAR-100 dataset. \(\hbox {HyP}^2\) Loss is excluded, as it is specifically designed for multi-label settings. Specifically, we set the hash code lengths to 16 and 64 bits; 5 categories are selected as unseen and the rest as seen (for ease of visualization, we sample 8 of the seen categories). The results in Figures 6–9 further support the conclusion in Section 5.5 of the paper, i.e., the fundamental reason for the poor performance of methods trained with the pairwise paradigm on (fine-grained) datasets at low hash code dimensions is the suboptimal optimization of real-valued features compared to those of higher dimensions. In addition, some methods trained with the pointwise paradigm also suffer from suboptimal optimization of real-valued features. Below, we list the seen and unseen categories within CIFAR-100. (Seen classes: dolphin, oak_tree, raccoon, girl, snake, cloud, leopard, road. Unseen classes: whale, willow_tree, wolf, woman, worm.)
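A t-SNE visualization of this kind can be produced in a few lines; the sketch below assumes scikit-learn, and the code arrays are random stand-ins for the paper's features and hash codes (the `perplexity` value and the sign-based binarization are illustrative choices, not the paper's settings):

```python
import numpy as np
from sklearn.manifold import TSNE  # assumes scikit-learn is installed

def embed_2d(features, seed=0):
    """Project real-valued features or binary hash codes to 2-D with t-SNE."""
    tsne = TSNE(n_components=2, perplexity=5.0, init="random", random_state=seed)
    return tsne.fit_transform(np.asarray(features, dtype=np.float64))

# Hypothetical stand-ins for 16-bit codes of 20 images (not the paper's data).
rng = np.random.default_rng(0)
real_valued = rng.standard_normal((20, 16))   # features before the hash mapping
hash_codes = np.sign(real_valued)             # binarized hash codes
points = embed_2d(hash_codes)                 # one 2-D point per image
```

Plotting `points` colored by class label (e.g., with matplotlib) then shows how well same-class codes cluster, which is how the seen/unseen comparison in Figures 6–9 is read.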

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Shen, Y., Wang, P., Wei, XS. et al. An Empirical Study on Training Paradigms for Deep Supervised Hashing. Int J Comput Vis 133, 6729–6767 (2025). https://doi.org/10.1007/s11263-025-02506-3
