
Revisiting Deep Ensemble for Out-of-Distribution Detection: A Loss Landscape Perspective

International Journal of Computer Vision

Abstract

Existing Out-of-Distribution (OoD) detection methods aim to distinguish OoD samples from In-Distribution (InD) data mainly by exploiting differences in the features, logits and gradients of Deep Neural Networks (DNNs). In this work, we propose a new perspective for investigating OoD detection, based on the loss landscape and mode ensembles. In the optimization of DNNs, there exist many local optima in the parameter space, namely modes. Interestingly, we observe that these independent modes, all of which reach low-loss regions on InD data (training and test data), yield significantly different loss landscapes on OoD data. Such an observation provides a novel view on OoD detection from the loss landscape and further indicates that OoD detection performance fluctuates significantly across these modes. For instance, FPR values of the RankFeat method (Song et al., 2022) range from 46.58% to 84.70% among 5 modes, showing uncertain detection performance evaluations across independent modes. Motivated by such diversity in the OoD loss landscape across modes, we revisit the deep ensemble method for OoD detection through mode ensemble, leading to improved performance and reduced variance for OoD detectors. Extensive experiments covering varied OoD detectors and network structures illustrate the high variances across modes and validate the superiority of mode ensemble in boosting OoD detection. We hope this work draws attention to the independent modes in the loss landscape of OoD data and to more reliable evaluations of OoD detectors.


Notes

  1. https://github.com/fanghenshaometeor/ood-mode-ensemble.

  2. https://github.com/pytorch/examples/tree/main/imagenet.

  3. https://github.com/yitu-opensource/T2T-ViT.

References

  • Cimpoi, M., Maji, S., Kokkinos, I., et al. (2014). Describing textures in the wild. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 3606–3613.

  • Deng, J., Dong, W., Socher, R., et al. (2009). ImageNet: A large-scale hierarchical image database. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 248–255.

  • Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al. (2020). An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations.

  • Draxler, F., Veschgini, K., Salmhofer, M., et al. (2018). Essentially no barriers in neural network energy landscape. In: International Conference on Machine Learning, PMLR, pp 1309–1318.

  • Fang, K., Tao, Q., Wu, Y., et al. (2022a). On multi-head ensemble of smoothed classifiers for certified robustness. arXiv preprint arXiv:2211.10882.

  • Fang, K., Tao, Q., Wu, Y., et al. (2024). Towards robust neural networks via orthogonal diversity. Pattern Recognition, 149, 110281.

  • Fang, Z., Li, Y., Lu, J., et al. (2022). Is out-of-distribution detection learnable? Advances in Neural Information Processing Systems, 35, 37199–37213.

  • Fort, S., Hu, H., & Lakshminarayanan, B. (2019). Deep ensembles: A loss landscape perspective. arXiv preprint arXiv:1912.02757.

  • Garipov, T., Izmailov, P., Podoprikhin, D., et al. (2018). Loss surfaces, mode connectivity, and fast ensembling of DNNs. Advances in Neural Information Processing Systems, 31.

  • Goodfellow, I. J., Vinyals, O., & Saxe, A. M. (2015). Qualitatively characterizing neural network optimization problems. In: International Conference on Learning Representations.

  • Han, T., & Li, Y. F. (2022). Out-of-distribution detection-assisted trustworthy machinery fault diagnosis approach with uncertainty-aware deep ensembles. Reliability Engineering & System Safety, 226, 108648.

  • He, K., Zhang, X., Ren, S., et al. (2016). Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 770–778.

  • Hendrycks, D., & Gimpel, K. (2016). A baseline for detecting misclassified and out-of-distribution examples in neural networks. In: International Conference on Learning Representations.

  • Horváth, M. Z., Mueller, M. N., Fischer, M., et al. (2021). Boosting randomized smoothing with variance reduced classifiers. In: International Conference on Learning Representations.

  • Hsu, Y. C., Shen, Y., Jin, H., et al. (2020). Generalized ODIN: Detecting out-of-distribution image without learning from out-of-distribution data. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 10951–10960.

  • Huang, G., Liu, Z., Van Der Maaten, L., et al. (2017). Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 4700–4708.

  • Huang, R., Geng, A., & Li, Y. (2021). On the importance of gradients for detecting distributional shifts in the wild. Advances in Neural Information Processing Systems, 34, 677–689.

  • Krizhevsky, A. (2009). Learning multiple layers of features from tiny images. Master's thesis, University of Toronto.

  • Lakshminarayanan, B., Pritzel, A., & Blundell, C. (2017). Simple and scalable predictive uncertainty estimation using deep ensembles. Advances in Neural Information Processing Systems, 30.

  • Lee, J., & AlRegib, G. (2020). Gradients as a measure of uncertainty in neural networks. In: 2020 IEEE International Conference on Image Processing (ICIP), IEEE, pp 2416–2420.

  • Lee, K., Lee, K., Lee, H., et al. (2018). A simple unified framework for detecting out-of-distribution samples and adversarial attacks. Advances in Neural Information Processing Systems, 31.

  • Li, H., Xu, Z., Taylor, G., et al. (2018). Visualizing the loss landscape of neural nets. Advances in Neural Information Processing Systems, 31.

  • Liang, S., Li, Y., & Srikant, R. (2018). Enhancing the reliability of out-of-distribution image detection in neural networks. In: International Conference on Learning Representations.

  • Liu, W., Wang, X., Owens, J., et al. (2020). Energy-based out-of-distribution detection. Advances in Neural Information Processing Systems, 33, 21464–21475.

  • Van der Maaten, L., & Hinton, G. (2008). Visualizing data using t-SNE. Journal of Machine Learning Research, 9(11), 2579–2605.

  • Miller, J. P., Taori, R., Raghunathan, A., et al. (2021). Accuracy on the line: On the strong correlation between out-of-distribution and in-distribution generalization. In: International Conference on Machine Learning, PMLR, pp 7721–7735.

  • Netzer, Y., Wang, T., Coates, A., et al. (2011). Reading digits in natural images with unsupervised feature learning. In: Proceedings of the NIPS Workshop on Deep Learning and Unsupervised Feature Learning.

  • Nguyen, A., Yosinski, J., & Clune, J. (2015). Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 427–436.

  • Peebles, W., & Xie, S. (2023). Scalable diffusion models with transformers. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp 4195–4205.

  • Rame, A., Kirchmeyer, M., Rahier, T., et al. (2022). Diverse weight averaging for out-of-distribution generalization. Advances in Neural Information Processing Systems, 35, 10821–10836.

  • Shen, Z., Liu, J., He, Y., et al. (2021). Towards out-of-distribution generalization: A survey. arXiv preprint arXiv:2108.13624.

  • Song, Y., Sebe, N., & Wang, W. (2022). RankFeat: Rank-1 feature removal for out-of-distribution detection. Advances in Neural Information Processing Systems, 35, 17885–17898.

  • Sun, Y., Guo, C., & Li, Y. (2021). ReAct: Out-of-distribution detection with rectified activations. Advances in Neural Information Processing Systems, 34, 144–157.

  • Sun, Y., Ming, Y., Zhu, X., et al. (2022). Out-of-distribution detection with deep nearest neighbors. In: International Conference on Machine Learning, PMLR, pp 20827–20840.

  • Tonin, F., Pandey, A., Patrinos, P., et al. (2021). Unsupervised energy-based out-of-distribution detection using Stiefel-restricted kernel machine. In: 2021 International Joint Conference on Neural Networks (IJCNN), IEEE, pp 1–8.

  • Van Horn, G., Mac Aodha, O., Song, Y., et al. (2018). The iNaturalist species classification and detection dataset. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 8769–8778.

  • Vyas, A., Jammalamadaka, N., Zhu, X., et al. (2018). Out-of-distribution detection using an ensemble of self supervised leave-out classifiers. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 550–564.

  • Wang, H., Li, Z., Feng, L., et al. (2022). ViM: Out-of-distribution with virtual-logit matching. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 4921–4930.

  • Wang, R., Li, Y., & Liu, S. (2023). Exploring diversified adversarial robustness in neural networks via robust mode connectivity. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 2345–2351.

  • Wortsman, M., Horton, M. C., Guestrin, C., et al. (2021). Learning neural network subspaces. In: International Conference on Machine Learning, PMLR, pp 11217–11227.

  • Wortsman, M., Ilharco, G., Gadre, S. Y., et al. (2022). Model soups: Averaging weights of multiple fine-tuned models improves accuracy without increasing inference time. In: International Conference on Machine Learning, PMLR, pp 23965–23998.

  • Xiao, J., Hays, J., Ehinger, K. A., et al. (2010). SUN database: Large-scale scene recognition from abbey to zoo. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, IEEE, pp 3485–3492.

  • Xu, P., Ehinger, K. A., Zhang, Y., et al. (2015). TurkerGaze: Crowdsourcing saliency with webcam based eye tracking. arXiv preprint arXiv:1504.06755.

  • Xue, F., He, Z., Xie, C., et al. (2022). Boosting out-of-distribution detection with multiple pre-trained models. arXiv preprint arXiv:2212.12720.

  • Yang, D., Mai Ngoc, K., Shin, I., et al. (2021). Ensemble-based out-of-distribution detection. Electronics, 10(5), 567.

  • Yang, J., Zhou, K., Li, Y., et al. (2021b). Generalized out-of-distribution detection: A survey. International Journal of Computer Vision, pp 1–28.

  • Yang, J., Zhou, K., & Liu, Z. (2023). Full-spectrum out-of-distribution detection. International Journal of Computer Vision, pp 1–16.

  • Ye, H., Xie, C., Cai, T., et al. (2021). Towards a theoretical framework of out-of-distribution generalization. Advances in Neural Information Processing Systems, 34, 23519–23531.

  • Yu, F., Seff, A., Zhang, Y., et al. (2015). LSUN: Construction of a large-scale image dataset using deep learning with humans in the loop. arXiv preprint arXiv:1506.03365.

  • Yu, Y., Shin, S., Lee, S., et al. (2023). Block selection method for using feature norm in out-of-distribution detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 15701–15711.

  • Yuan, L., Chen, Y., Wang, T., et al. (2021). Tokens-to-token ViT: Training vision transformers from scratch on ImageNet. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp 558–567.

  • Zagoruyko, S., & Komodakis, N. (2016). Wide residual networks. In: British Machine Vision Conference 2016, British Machine Vision Association.

  • Zhang, C., Bengio, S., Hardt, M., et al. (2021). Understanding deep learning (still) requires rethinking generalization. Communications of the ACM, 64(3), 107–115.

  • Zhao, P., Chen, P. Y., Das, P., et al. (2020). Bridging mode connectivity in loss landscapes and adversarial robustness. In: International Conference on Learning Representations.

  • Zhou, B., Lapedriza, A., Khosla, A., et al. (2017). Places: A 10 million image database for scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(6), 1452–1464.

  • Zhou, Z. H. (2012). Ensemble methods: Foundations and algorithms. CRC Press.

  • Zhu, Y., Chen, Y., Xie, C., et al. (2022). Boosting out-of-distribution detection with typical features. Advances in Neural Information Processing Systems, 35, 20758–20769.

  • Zhu, Y., Chen, Y., Li, X., et al. (2024). Rethinking out-of-distribution detection from a human-centric perspective. International Journal of Computer Vision, pp 1–18.


Acknowledgements

This work is jointly supported by the National Natural Science Foundation of China (62376155, 62376153), the Shanghai Municipal Science and Technology Research Program (22511105600), and the Shanghai Municipal Science and Technology Major Project (2021SHZDZX0102). Jie Yang and Xiaolin Huang are the corresponding authors.

Author information


Corresponding authors

Correspondence to Xiaolin Huang or Jie Yang.

Additional information

Communicated by Hong Liu.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (pdf 214 KB)

Appendices

Appendix A Baseline OoD Detectors and Mode Ensemble

We outline how the scoring function \(S(\cdot )\) is designed in the selected baseline OoD detectors and elaborate on the corresponding mode-ensemble strategies for these detectors. In the following, the outputs of a mode \(f:\mathbb {R}^D\rightarrow \mathbb {R}^C\) are C-dimensional logits, corresponding to the C classes.

MSP (Hendrycks and Gimpel 2016) takes the maximum probability over the output logits as the scoring function. Given a new sample \(\varvec{x}\in \mathbb {R}^D\), its MSP score w.r.t. a single mode \(f(\cdot )\) is

$$\begin{aligned} S_\textrm{MSP}(\varvec{x})=\max \left( \textrm{softmax}(f(\varvec{x}))\right) . \end{aligned}$$
(A1)

Ensemble on MSP adopts the average logits from the N modes \(f_{s_i},i=1,\cdots ,N\) to calculate the maximum probability as the score for \(\varvec{x}\):

$$\begin{aligned} S_\mathrm{MSP\text {-}ens}(\varvec{x})=\max \left( \textrm{softmax}\left( \frac{1}{N}\sum _{i=1}^Nf_{s_i}(\varvec{x})\right) \right) . \end{aligned}$$
(A2)
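For concreteness, the two scores can be sketched in a few lines of PyTorch. This is a minimal sketch of Eqs. (A1) and (A2), not the released implementation; the function names and the `models` list of N trained modes are our own illustration:

```python
import torch
import torch.nn.functional as F

def msp_score(logits: torch.Tensor) -> torch.Tensor:
    # Eq. (A1): maximum softmax probability per sample.
    return F.softmax(logits, dim=1).max(dim=1).values

@torch.no_grad()
def msp_ensemble_score(models, x: torch.Tensor) -> torch.Tensor:
    # Eq. (A2): average the logits of the N modes, then take the max probability.
    avg_logits = torch.stack([m(x) for m in models]).mean(dim=0)
    return msp_score(avg_logits)
```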

ODIN (Liang et al. 2018) introduces temperature scaling and adversarial examples into MSP and proposes the following score:

$$\begin{aligned} S_\textrm{ODIN}(\varvec{x})=\max \left( \textrm{softmax}(\frac{f(\varvec{{\bar{x}}})}{T})\right) , \end{aligned}$$
(A3)

where T denotes the temperature and \(\varvec{{\bar{x}}}\) denotes the perturbed adversarial example of \(\varvec{x}\). Following the settings in Liang et al. (2018) and Song et al. (2022), we set \(T=1000\) and do not perturb \(\varvec{x}\) on the ImageNet-1K benchmark.

Ensemble on ODIN shares a similar scoring function with that of MSP-ensemble:

$$\begin{aligned} S_\mathrm{ODIN\text {-}ens}(\varvec{x})=\max \left( \textrm{softmax}(\frac{\sum _{i=1}^Nf_{s_i}(\varvec{{\bar{x}}})}{N\cdot T})\right) . \end{aligned}$$
(A4)

The adversarial attack is executed individually on each mode \(f_{s_i}\), and the ODIN score is then calculated from the average predictive logits of the perturbed inputs.
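A hedged sketch of the ensemble ODIN score follows; the input-perturbation step is omitted here, matching the ImageNet-1K setting described above where \(\varvec{x}\) is not perturbed, and the function name is illustrative:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def odin_ensemble_score(models, x: torch.Tensor, T: float = 1000.0) -> torch.Tensor:
    # Eq. (A4) without the adversarial perturbation: average the modes' logits,
    # apply temperature scaling, and take the maximum softmax probability.
    avg_logits = torch.stack([m(x) for m in models]).mean(dim=0)
    return F.softmax(avg_logits / T, dim=1).max(dim=1).values
```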

Energy (Liu et al. 2020) improves MSP via an energy function since energy is better aligned with the input probability density:

$$\begin{aligned} S_\textrm{energy}(\varvec{x})=\log \sum _{i=1}^C\exp (f^i(\varvec{x})), \end{aligned}$$
(A5)

where \(f^i(\varvec{x})\) denotes the i-th element in the C-dimension output logits.

Ensemble on Energy first averages the logits of the N modes and then computes the energy score:

$$\begin{aligned} \begin{aligned}&S_\mathrm{energy\text {-}ens}(\varvec{x})=\log \sum _{i=1}^C\exp (f_\textrm{ens}^i(\varvec{x})),\\&f_\textrm{ens}(\varvec{x})=\frac{1}{N}\sum _{i=1}^Nf_{s_i}(\varvec{x}). \end{aligned} \end{aligned}$$
(A6)
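A minimal sketch of Eq. (A6) in PyTorch, again assuming a `models` list of N trained modes:

```python
import torch

@torch.no_grad()
def energy_ensemble_score(models, x: torch.Tensor) -> torch.Tensor:
    # Eq. (A6): logsumexp over the class dimension of the averaged logits.
    avg_logits = torch.stack([m(x) for m in models]).mean(dim=0)
    return torch.logsumexp(avg_logits, dim=1)
```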

Mahalanobis (Lee et al. 2018) models the network outputs at different layers as a mixture of multivariate Gaussian distributions and uses the Mahalanobis distance as the scoring function:

$$\begin{aligned} S_\textrm{mahal}(\varvec{x})=\max _c\left( -(f(\varvec{x})-\varvec{\mu }_c)^\top \Sigma ^{-1}(f(\varvec{x})-\varvec{\mu }_c)\right) , \end{aligned}$$
(A7)

where \(\varvec{\mu }_c\) denotes the mean feature vector of class c and \(\Sigma \) denotes the covariance matrix shared across classes. In experiments, following the settings in Lee et al. (2018) and Song et al. (2022), adversarial examples generated from 500 randomly-selected clean samples are used to train the logistic regression, with a perturbation size of 0.05 on CIFAR10 and 0.001 on ImageNet-1K.

Ensemble on Mahalanobis leverages the average output features at the same layers in DNNs over N modes and attacks the N modes simultaneously to calculate the Mahalanobis score. Details can be found in the released code.
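As a simplified single-layer sketch of Eq. (A7): the full method combines several layers via a logistic regression and perturbs the inputs, and the ensemble variant averages the features across modes before scoring; `class_means` and `precision` are assumed to be estimated from training data:

```python
import torch

def mahalanobis_score(feats: torch.Tensor, class_means: torch.Tensor,
                      precision: torch.Tensor) -> torch.Tensor:
    # feats: (B, D) features; class_means: (C, D); precision: (D, D) inverse covariance.
    diffs = feats.unsqueeze(1) - class_means.unsqueeze(0)            # (B, C, D)
    dists = torch.einsum('bcd,de,bce->bc', diffs, precision, diffs)  # squared distances
    # Eq. (A7): the score is the largest negative distance over classes.
    return (-dists).max(dim=1).values
```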

KNN (Sun et al. 2022) is a simple but time-consuming and memory-intensive detector, since it performs a nearest-neighbor search over the \(\ell _2\)-normalized penultimate features between the test sample and all the training samples. The negative of the (k-th) smallest \(\ell _2\) distance between the features \(\varvec{h}^*\) of a new sample \(\varvec{x}^*\) and the training features \(\varvec{h}^i\) is set as the score:

$$\begin{aligned} S_\textrm{KNN}(\varvec{x}^*)=-\min _{i:1,\cdots ,n_\textrm{tr}}\left\| \frac{\varvec{h}^*}{\Vert \varvec{h}^*\Vert _2}-\frac{\varvec{h}^i}{\Vert \varvec{h}^i\Vert _2}\right\| _2, \end{aligned}$$
(A8)

where \(\varvec{h}\) denotes the penultimate features in the DNN, and \(\varvec{h}^i\) denotes the penultimate features of the i-th training sample in the training set of size \(n_\textrm{tr}\). The key to this detector is the \(\ell _2\) normalization of the features.

Ensemble on KNN improves performance by replacing the penultimate features from one single mode with the average penultimate features from the N modes:

$$\begin{aligned} \begin{aligned}&S_\mathrm{KNN\text {-}ens}(\varvec{x}^*)\\&\qquad =-\min _{j:1,\cdots ,n_\textrm{tr}} \left\| \frac{\sum _{i=1}^N\varvec{h}_{s_i}^*}{\Vert \sum _{i=1}^N\varvec{h}_{s_i}^*\Vert _2} -\frac{\sum _{i=1}^N\varvec{h}_{s_i}^j}{\Vert \sum _{i=1}^N\varvec{h}_{s_i}^j\Vert _2} \right\| _2. \end{aligned} \end{aligned}$$
(A9)
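A brute-force sketch of Eq. (A9) with k=1; the per-mode feature lists are assumptions of this sketch, and in practice an approximate nearest-neighbor index would be used for efficiency instead of the dense distance matrix below:

```python
import torch
import torch.nn.functional as F

def knn_ensemble_score(test_feats, train_feats) -> torch.Tensor:
    # test_feats / train_feats: per-mode lists of (B, D) / (n_tr, D) penultimate
    # features. Sum features over modes, l2-normalize, then take the negative
    # distance to the nearest training sample.
    z_test = F.normalize(torch.stack(test_feats).sum(dim=0), dim=1)
    z_train = F.normalize(torch.stack(train_feats).sum(dim=0), dim=1)
    dists = torch.cdist(z_test, z_train)  # (B, n_tr) pairwise l2 distances
    return -dists.min(dim=1).values
```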

RankFeat (Song et al. 2022) removes the rank-1 matrix from each individual sample feature matrix \(\textbf{X}\) in the mini-batch during forward propagation at test time, since the rank-1 feature drastically perturbs the predictions on OoD samples:

$$\begin{aligned} \begin{aligned}&\textbf{X}=\textbf{U}\textbf{S}\textbf{V}^\top ,\\&\textbf{X}^\prime =\textbf{X}-\varvec{s}_1\varvec{u}_1\varvec{v}_1^\top . \end{aligned} \end{aligned}$$
(A10)

In Eq. (A10), the singular value decomposition is first executed on the feature matrix \(\textbf{X}\) of an individual sample, yielding the left and right orthogonal singular vector matrices \(\textbf{U}\) and \(\textbf{V}\) and the rectangular diagonal matrix of singular values \(\textbf{S}\). Then, the rank-1 matrix is formed from the largest singular value \(\varvec{s}_1\) and the two corresponding singular vectors \(\varvec{u}_1\) and \(\varvec{v}_1\), and subtracted from the original \(\textbf{X}\). Such removals are recommended at the 3rd and 4th blocks in DNNs. Finally, an Energy score (Liu et al., 2020) is calculated on the resulting modified output logits.

Ensemble on RankFeat executes the rank-1 feature removal on each mode individually and then averages the N modified logits to compute the Energy score as in Eq. (A6). Details can be found in the released code.
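A minimal sketch of the rank-1 removal in Eq. (A10), assuming each per-sample feature map has already been reshaped into a (batch, C, H*W) matrix; the downstream Energy score then proceeds as above:

```python
import torch

def remove_rank1(X: torch.Tensor) -> torch.Tensor:
    # X: (B, C, H*W) feature matrices; subtract s1 * u1 * v1^T per sample (Eq. A10).
    U, S, Vh = torch.linalg.svd(X, full_matrices=False)
    rank1 = S[..., :1, None] * U[..., :, :1] @ Vh[..., :1, :]
    return X - rank1
```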

GradNorm (Huang et al. 2021) leverages gradient information for OoD detection by taking the \(\ell _1\) norm of the gradients w.r.t. a KL divergence loss as the score of \(\varvec{x}\):

$$\begin{aligned} S_\textrm{GN}(\varvec{x})=\left\| \frac{\partial \ \textrm{KL}(\varvec{u}\ \Vert \ \textrm{softmax}(f(\varvec{x})))}{\partial \ \varvec{\omega }}\right\| _1, \end{aligned}$$
(A11)

where \(\varvec{u}=[1/C,\cdots ,1/C]\in \mathbb {R}^C\) and \(\varvec{\omega }\) is set as the weight parameters of the last fully-connected layer in the DNN.

Ensemble on GradNorm first calculates the KL divergence between \(\varvec{u}\) and the softmax probability of the average logits of the N modes. The final score for \(\varvec{x}\) is the average gradient norm over the selected parameters \(\varvec{\omega }_{s_i}\) of each mode:

$$\begin{aligned} \begin{aligned}&S_\mathrm{GN\text {-}ens}(\varvec{x})\\&\qquad =\frac{1}{N}\sum _{i=1}^N\left\| \frac{\partial \ \textrm{KL}(\varvec{u}\ \Vert \ \textrm{softmax}(f_\textrm{ens}(\varvec{x})))}{\partial \ \varvec{\omega }_{s_i}}\right\| _1. \end{aligned} \end{aligned}$$
(A12)
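A sketch of Eq. (A12) for a single input; `fc_weights` is an assumed list holding each mode's last fully-connected weight tensor, and gradients flow through the averaged logits:

```python
import torch
import torch.nn.functional as F

def gradnorm_ensemble_score(models, fc_weights, x: torch.Tensor, C: int) -> float:
    # KL divergence between the uniform distribution u and the softmax of the
    # averaged logits; F.kl_div(log_p, u) computes KL(u || p).
    u = torch.full((1, C), 1.0 / C)
    avg_logits = torch.stack([m(x) for m in models]).mean(dim=0)
    loss = F.kl_div(F.log_softmax(avg_logits, dim=1), u, reduction='sum')
    # Gradients w.r.t. each mode's last fully-connected weights (Eq. A12).
    grads = torch.autograd.grad(loss, fc_weights)
    return sum(g.abs().sum().item() for g in grads) / len(grads)
```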
Table 12 Training time consumed by each individual model with specific GPUs in the experiments

Appendix B Experiment Setup Details

1.1 B.1 Details on Data Sets

The data sets used for OoD detection are described below. All settings follow previous works.

For the CIFAR10 benchmark, the InD data consists of the training and test sets of CIFAR10, with 50,000 and 10,000 \(32\times 32\times 3\) images of 10 categories, respectively. In this experiment, all images from the OoD data sets are resized to \(32\times 32\times 3\). The evaluated OoD data sets are introduced below.

  • SVHN (Netzer et al., 2011) data set includes images of street view house numbers. The test set of SVHN, with 26,032 images of digits 0-9, is adopted for OoD detection.

  • LSUN (Yu et al., 2015) data set targets large-scale scene classification and understanding. The test set of LSUN, with 10,000 images of 10 categories, is employed for OoD detection.

  • iSUN (Xu et al., 2015) data set consists of 8,925 images of gaze traces, all of which are employed for OoD detection.

  • Textures (Cimpoi et al., 2014) data set covers various types of surface texture with 5,640 images of 47 categories. The whole data set is adopted in the evaluation of OoD detection performance.

  • Places365 (Zhou et al., 2017) data set is for scene recognition. A subset of 328,500 images curated by Sun et al. (2022) is used for OoD detection.

For the ImageNet-1K benchmark, the InD data consists of the training and test sets of ImageNet-1K, with 1,281,167 and 50,000 images of 1,000 categories, respectively. In experiments, all images from the OoD data sets are resized to \(224\times 224\times 3\). The evaluated OoD data sets are introduced below.

  • iNaturalist (Van Horn et al., 2018) data set contains natural fine-grained images of different species of plants and animals. For OoD detection, 10,000 images are sampled from the selected concepts by Sun et al. (2022).

  • SUN (Xiao et al., 2010) data set covers a large variety of environmental scenes, places and the objects within. For OoD detection, 10,000 images are sampled from the selected concepts by Sun et al. (2022).

  • Places (Zhou et al., 2017) data set is for scene recognition. For OoD detection, 10,000 images are sampled by Sun et al. (2022).

  • Textures (Cimpoi et al., 2014) data set in the ImageNet-1K benchmark is the same as that described above in the CIFAR10 benchmark.

1.2 B.2 Details on Model Training

Training details of the modes adopted for OoD detection are elaborated below. In particular, obtaining multiple independent modes requires re-training multiple models from scratch on the training sets of CIFAR10 (Krizhevsky, 2009) and ImageNet-1K (Deng et al., 2009) with different random seeds. For reproducibility of the results reported in this paper, all the training and evaluation code and the trained checkpoints can be found at the publicly released link given in the main text.
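To illustrate how the independent modes are obtained, the seed handling could look like the following generic sketch; the `train` and `model_factory` calls are placeholders, not the released training script:

```python
import random

import numpy as np
import torch

def set_seed(seed: int) -> None:
    # Fix all random sources so each run is an independent mode
    # determined only by its seed.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

# e.g., 5 independent modes of the same architecture:
# for seed in range(5):
#     set_seed(seed)
#     train(model_factory(), seed=seed)  # placeholder training routine
```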

For the 10 independent modes of ResNet18 (He et al., 2016) and Wide ResNet28X10 (Zagoruyko & Komodakis, 2016) trained on CIFAR10 (Krizhevsky, 2009), each DNN is optimized via SGD for 150 epochs with a batch size of 256 and weight decay \(10^{-4}\). The initial learning rate is 0.1 and reduced via a cosine scheduler to 0 during training. Each DNN is trained on one single NVIDIA GeForce RTX 3090 GPU.

For the 5 independent modes of ResNet50 (He et al., 2016) and DenseNet121 (Huang et al., 2017) trained from scratch on ImageNet-1K (Deng et al., 2009), we follow the official training script provided by PyTorch (Footnote 2). Each DNN is optimized via SGD for 90 epochs with weight decay \(10^{-4}\). The initial learning rate is 0.1 and is reduced every 30 epochs by a factor of 10. The batch size for training ResNet50 and DenseNet121 is 1000 and 800, respectively. Each training run is executed in parallel on 4 NVIDIA V100 GPUs.

For the 3 independent modes of T2T-ViT-14 (Yuan et al., 2021) trained from scratch on ImageNet, we follow the training script provided in the official GitHub repository (Footnote 3) and adopt the default recommended settings. Each T2T-ViT-14 model is trained in parallel on 8 NVIDIA V100 GPUs for 310 epochs with a batch size of 64, an initial learning rate of \(5\times 10^{-4}\), and weight decay 0.05.

The final models obtained at the end of training are evaluated for OoD detection at inference time.

Computational overhead For training, ensemble methods inevitably incur extra time and memory costs for the multiple employed models compared with single-model-based methods (Horváth et al., 2021; Wortsman et al., 2022). Similarly, our proposed mode ensemble requires training multiple DNNs. Table 12 reports the time consumed to train a single model; our N-mode ensemble repeats such training N times. In implementation, given sufficient computational resources, the training runs can proceed in parallel to avoid the heavy cost of training sequentially; in this way, multiple DNNs can be obtained in the wall-clock time of training one. Once the models are trained, our OoD detectors are readily available and can be applied to any given data, where only inference is needed.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article


Cite this article

Fang, K., Tao, Q., Huang, X. et al. Revisiting Deep Ensemble for Out-of-Distribution Detection: A Loss Landscape Perspective. Int J Comput Vis 132, 6107–6126 (2024). https://doi.org/10.1007/s11263-024-02156-x


  • DOI: https://doi.org/10.1007/s11263-024-02156-x
