Abstract
Open-world visual recognition aims to empower models to identify objects in real-world settings, particularly when they encounter domains or categories that are not included in the training dataset. This paper proposes a specific open-world visual recognition task, i.e. Pattern-Expandable Image Copy Detection (PE-ICD). In realistic scenarios, the continuous emergence of novel tampering patterns necessitates fast upgrades to the ICD system to prevent confusion in already-trained models. Therefore, our PE-ICD focuses on two aspects, i.e., rehearsal-free upgrade and backward-compatible deployment: (1) The rehearsal-free upgrade utilizes only the new patterns to save time, as re-training on the old patterns can be very time-consuming. (2) The backward-compatible deployment allows for comparing the updated query features against the outdated gallery features, thereby avoiding the need to re-extract features for the very large gallery. To lay the foundation for PE-ICD research, we construct the first regulated pattern set, CrossPattern, and propose Pattern Stripping (P-Strip). CrossPattern regulates both base and novel patterns during the initial training and subsequent upgrades. Given a query, our P-Strip separates the tamper patterns by decomposing the query into an image feature and multiple pattern features. The advantage of P-Strip is that we can easily introduce new pattern features with minimal impact on the image feature and previously seen pattern features. Experimental results show that P-Strip supports both rehearsal-free upgrading and backward compatibility. Our code is publicly available at https://github.com/WangWenhao0716/PEICD.
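The backward-compatibility idea in the abstract can be illustrated with a minimal sketch. This is not the authors' P-Strip model; it only mimics the interface described above, with hypothetical dimensions and a toy extractor: each query is decomposed into one image feature plus per-pattern features, an upgrade appends novel-pattern features without touching the image feature, and matching against an outdated gallery uses only the shared image feature.

```python
import numpy as np

rng = np.random.default_rng(0)

D_IMG, D_PAT = 128, 32   # hypothetical feature dimensions
N_BASE, N_NOVEL = 4, 2   # base patterns at initial training, novel patterns at upgrade


def extract(x, n_patterns):
    """Toy stand-in for a P-Strip-style extractor (names hypothetical):
    splits a raw vector into one image feature and one feature per pattern."""
    img = x[:D_IMG]
    pats = [x[D_IMG + i * D_PAT: D_IMG + (i + 1) * D_PAT] for i in range(n_patterns)]
    return img, pats


# The (outdated) gallery was extracted when only the base patterns existed.
x = rng.standard_normal(D_IMG + (N_BASE + N_NOVEL) * D_PAT)
gallery_img, gallery_pats = extract(x[:D_IMG + N_BASE * D_PAT], N_BASE)

# After a rehearsal-free upgrade, the query side also emits novel-pattern
# features, but the image feature remains comparable to the old gallery.
query_img, query_pats = extract(x, N_BASE + N_NOVEL)

# Backward-compatible matching: compare the shared image features only,
# so the large gallery never needs feature re-extraction.
score = query_img @ gallery_img / (
    np.linalg.norm(query_img) * np.linalg.norm(gallery_img)
)
```

Because the upgrade only appends pattern features, the image feature is bit-identical across model versions in this sketch, so the query-to-gallery cosine score is unaffected by the upgrade.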
Additional information
Communicated by Zhun Zhong.
Demonstration of the Base and Novel Patterns
Tables 5, 6, and 7 display the names, detailed elaborations, and demonstrations of the base and novel patterns. Although we use four samples for illustration, in our PE-ICD the query images have no overlap with the training images, a basic requirement for image retrieval tasks.
About this article
Cite this article
Wang, W., Sun, Y. & Yang, Y. Pattern-Expandable Image Copy Detection. Int J Comput Vis 132, 5618–5634 (2024). https://doi.org/10.1007/s11263-024-02140-5