1 Introduction

Fig. 1

We generate virtual outliers directly by perturbing ID samples, e.g., by placing a cat’s face on a dog’s face or masking most of a deer, leaving only the tail. These virtual outliers therefore retain ID features, so we adopt a more nuanced label assignment strategy (see Sect. 4.3). This is more in line with human perception: a young child who knows a cat would judge a tiger to be more like a cat than a fish

Deep neural networks (DNNs), benefiting from the abundance of large-scale labeled samples, have achieved remarkable success across diverse fields. These models, however, are conventionally trained and evaluated under the assumption of a closed-set environment, where the label space remains consistent throughout the training and testing stages (Huang et al., 2017). This assumption often falls short in real-world applications, where samples from unseen classes can appear unexpectedly. This discrepancy has led to a surge of interest in out-of-distribution (OOD) detection (Bendale & Boult, 2016), a task that requires models to accurately classify in-distribution (ID) samples while effectively identifying OOD samples.

The seminal work proposes a score function for detecting OOD samples (Hendrycks & Gimpel, 2017), where the maximum softmax probability is leveraged as an indicator for OOD detection. In this context, samples with a low maximum softmax probability are considered OOD samples. However, this approach is limited by the tendency of DNNs to produce overly confident predictions for OOD samples (Nguyen et al., 2015; Hein et al., 2019). This overconfidence stems from the intrinsic similarity in patterns between OOD and ID samples, which can make ID and OOD samples indistinguishable even with well-designed scoring functions (Lee et al., 2018; Liu et al., 2020).

To mitigate the overconfidence issue, subsequent works propose to incorporate auxiliary OOD samples during training (Hendrycks et al., 2019; Wang et al., 2023), where the introduced outliers are assigned an equal likelihood of belonging to any ID class. Consequently, DNNs trained with these outliers are expected to produce low confidence on OOD samples. The performance of this approach is therefore inherently tied to the selection of outliers. However, identifying outliers that share patterns with ID samples poses a significant challenge.

To address this challenge, in this paper we introduce a novel approach, termed virtual outlier smoothing (VOSo). Instead of searching for additional OOD samples, VOSo constructs auxiliary outliers directly from ID samples, thus endowing the auxiliary outliers with patterns similar to those of ID samples and removing the need for an exhaustive search for natural OOD samples. A similar idea appears in VOS (Du et al., 2022) and NPOS (Tao et al., 2023), which obtain virtual outliers by sampling around boundary ID points in the feature space. Although compelling, this feature space depends heavily on the ID samples. For instance, in a binary classification task on cats and tables, the two classes are so different that the network can easily separate them, so the boundary between cats and tables does not really reflect the boundary between cats and non-cats. Features sampled around the cat class boundary are therefore not good OOD samples for cats. In contrast, VOSo mitigates this issue by perturbing ID samples in the image space. Specifically, VOSo constructs virtual OOD samples by perturbing the semantic regions of ID samples and infusing patterns from other ID samples. For instance, a virtual outlier could be an image combining a cat’s face with a dog’s nose, where the cat’s face serves as the primary semantic feature for model prediction. To efficiently locate semantically relevant regions in the images, we use Class Activation Maps (CAMs) (Zhou et al., 2016), a visualization technique that measures the contribution of different image regions to the model prediction. During training, we randomly select prediction-related regions of different sizes on the input images for perturbation.

Existing outlier exposure methods (Hendrycks et al., 2019; Tao et al., 2023) typically treat all auxiliary outliers equally and assign them an equal likelihood of belonging to every ID class, i.e., their labels are uniformly distributed over the ID classes. Unlike these methods, VOSo assigns different labels to the constructed virtual outliers. Because VOSo constructs virtual outliers from ID samples, the outliers retain patterns similar to those of ID samples. Consequently, an intuitive approach is to assign these virtual outliers labels that reflect the remaining ID patterns. In this way, the boundary between ID and OOD transitions more smoothly, as shown in Fig. 2. Specifically, we introduce a dynamic label assignment method in which the labels of the virtual OOD samples are adjusted based on the extent of semantic region perturbation. This reflects the fact that virtual outliers may still contain patterns similar to ID samples and thus require a nuanced labeling strategy, which is consistent with human intuition, as shown in Fig. 1.

Fig. 2

NPOS and VOS treat all virtual outliers equally, while VOSo assigns different soft labels to virtual outliers depending on the degree of perturbation. The color gradient from blue to red indicates the label assigned to each virtual outlier: the smaller the perturbation, the closer the label of the virtual outlier is to the original ID label (Color figure online)

Our main contributions are listed as follows:

  • We propose virtual outlier smoothing (VOSo), a method that creates auxiliary outliers resembling ID samples by perturbing the semantic regions of ID samples.

  • The constructed virtual outliers provide a novel direction for using outliers to smooth model prediction, where the labels assigned to virtual outliers are related to the extent of semantic region perturbation.

  • Extensive experiments show that the proposed VOSo strategy greatly improves the OOD uncertainty estimation, and an ablation study is conducted to understand the efficacy of VOSo.

2 Related Work

OOD detection. The goal of OOD detection is to enable the model to distinguish between ID and OOD samples while maintaining classification accuracy on ID samples. Many works try to mitigate the overconfidence of neural networks by designing different scoring functions, such as the maximum softmax probability (Hendrycks & Gimpel, 2017), the energy score (Liu et al., 2020), ReAct (Sun et al., 2021) and the GradNorm score (Huang et al., 2021). Despite their simplicity and convenience, these methods are essentially post-hoc fixes: they may increase detection time, and the effectiveness of a given scoring function can vary across scenarios, sometimes requiring manual selection in practical applications. Another promising line of work improves OOD detection through training-time regularization (Hendrycks et al., 2019; Tao et al., 2023; Wei et al., 2022; Wang et al., 2023; Du et al., 2022; Pinto et al., 2022). One group directly constrains the training process on ID data. For example, LogitNorm (Wei et al., 2022) enforces a constant vector norm on the logits during training, and RegMixup (Pinto et al., 2022) uses Mixup (Zhang et al., 2018) as an additional regularizer on top of the standard cross-entropy loss. Another group uses outliers to constrain the training process, regularizing the model to produce lower confidence on them. These outliers can be additional manually collected surrogate OOD data (Hendrycks et al., 2019; Wang et al., 2023) or virtual outliers generated by the model (Tao et al., 2023; Du et al., 2022; Kong & Ramanan, 2021). VOS (Du et al., 2022) and NPOS (Tao et al., 2023) synthesize virtual outliers from the low-likelihood region of the ID feature space, while OpenGAN (Kong & Ramanan, 2021) trains a GAN in the classifier’s feature space to generate virtual outliers. However, sampling virtual outliers in feature space relies on the quality of the feature space, and generative models often exhibit training instability. In contrast, VOSo circumvents these drawbacks by directly perturbing ID samples in pixel space to obtain virtual outliers (Zhang et al., 2022). Beyond these, reconstruction-based methods (Yang et al., 2022; Perera et al., 2020; Zhou, 2022; Gao et al., 2023) also rely on generative models. They assume that a generative model trained on ID samples cannot reconstruct OOD samples with high quality, but this assumption may not always hold (Nalisnick et al., 2019). For this reason, MOODCat (Yang et al., 2022) and DiffGuard (Gao et al., 2023) further use conditional synthesis to mitigate this problem. However, reconstructing samples with generative models at test time increases detection time.

Confidence calibration. Many previous works have shown that neural networks tend to be overconfident in their predictions (Hein et al., 2019; Nguyen et al., 2015). Some works address this problem with post-hoc methods such as temperature scaling (Platt et al., 1999). In addition, some methods focus on regularizing the model, for example through weight decay (Guo et al., 2017) or label smoothing (Szegedy et al., 2016).

Label smoothing. Label smoothing generates soft labels by taking a weighted average of the uniform distribution and the hard label. It has been shown to improve model calibration (Pereyra et al., 2017; Müller et al., 2019; Lukasik et al., 2020). Müller et al. (2019) show that label smoothing encourages the penultimate-layer features of training examples from the same class to form tight clusters. Lukasik et al. (2020) show that label smoothing can mitigate label noise. Yuan et al. (2020) show that part of knowledge distillation’s success derives from its soft labels acting as a regularizer, much as label smoothing does. Yuan et al. (2023) study the effectiveness of biased soft labels in knowledge distillation and weakly-supervised learning. In this paper, we assign soft labels to virtual outliers.

3 Preliminaries

Let \(\mathcal {X}\) and \(\mathcal {Y}=\{1,\ldots ,K\}\) represent the input image space and ID label space, respectively. Here, K represents the number of classes of ID samples. We consider the ID distribution \(D_{X_{\textrm{I}}Y_{\textrm{I}}}\) as a joint distribution defined over \(\mathcal {X} \times \mathcal {Y}\), where \(X_{\textrm{I}}\) and \(Y_{\textrm{I}}\) are random variables whose outputs are from spaces \(\mathcal {X}\) and \(\mathcal {Y}\). During testing time, there are some OOD joint distributions \(D_{X_\textrm{O}Y_\textrm{O}}\) defined over \(\mathcal {X}^C \times \mathcal {Y}^C\), where \(X_\textrm{O}\) and \(Y_\textrm{O}\) are random variables whose outputs are from semantic-shifted space \(\mathcal {X}^C\) and label-shifted space \(\mathcal {Y}^C\). Here, \(\mathcal {X}^C\) represents OOD data space in which the data may come from classes that are unknown during training, and \(\mathcal {Y}^C\) represents OOD label space with \(\mathcal {Y}^C \cap \mathcal {Y} = \emptyset \). Then following (Fang et al., 2022), OOD detection can be defined as follows:

Problem 1

(OOD Detection) Given the labeled ID samples

$$\begin{aligned} S_{I} = \{(\textbf{x}^1,\textbf{y}^1),\ldots ,(\textbf{x}^n,\textbf{y}^n) \} \sim D_{X_{\textrm{I}}Y_{\textrm{I}}}^n,~i.i.d., \end{aligned}$$

OOD detection aims to learn an OOD detector G using \(S_{I}\) such that for any test sample \(\textbf{x}\):

  • if \(\textbf{x}\) is drawn from \(D_{X_\textrm{I}}\), then G can classify \(\textbf{x}\) into correct ID classes;

  • if \(\textbf{x}\) is drawn from \(D_{X_\textrm{O}}\), then G can detect \(\textbf{x}\) as OOD sample.

Note that in Problem 1, we use the one-hot vector to represent the label \(\textbf{y}\).

Model and loss function. In this work, we utilize \(\textbf{f}_{\varvec{\theta }}(\cdot )\) to represent the deep neural network-based model with parameters \(\varvec{\theta }\in \varvec{\Theta }\), where \(\varvec{\Theta }\) denotes the parameter space. We also set \(\ell (\cdot ,\cdot ): \mathbb {R}^K \times \mathbb {R}^K \rightarrow \mathbb {R}_+\) to be the loss function.

Score-based OOD detection strategy. Many representative OOD detection methods (Hendrycks & Gimpel, 2017; Liang et al., 2018; Liu et al., 2020) follow a score-based strategy, i.e., given a model \(\textbf{f}_{\varvec{\theta }}\) trained using \(S_I\), a scoring function \(S(\cdot )\) and a threshold \(\tau \), then \(\textbf{x}\) is detected as ID data iff \(S(\textbf{x};\textbf{f}_{\varvec{\theta }})\ge \tau \), i.e.,

$$\begin{aligned} \begin{aligned} G_{\tau }(\textbf{x}) = \text {ID},~&\text {if}~S(\textbf{x};\textbf{f}_{\varvec{\theta }}) \ge \tau ;\\ G_{\tau }(\textbf{x}) = \text {OOD},~&\text {if}~S(\textbf{x};\textbf{f}_{\varvec{\theta }}) < \tau . \end{aligned} \end{aligned}$$
(1)

In this paper, we use maximum softmax probability (MSP) (Hendrycks & Gimpel, 2017) as the scoring function to design our OOD detector, i.e.,

$$\begin{aligned} S_\textrm{MSP}(\textbf{x};\textbf{f}_{\varvec{\theta }}) = \max _{k\in \mathcal {Y}} \ \textrm{softmax}_k (\textbf{f}_{\varvec{\theta }}(\textbf{x})), \end{aligned}$$
(2)

where \(\textrm{softmax}_k (\textbf{f}_{\varvec{\theta }}(\textbf{x}))\) is the k-th coordinate function of \(\textrm{softmax} (\textbf{f}_{\varvec{\theta }}(\textbf{x}))\).
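For concreteness, a minimal PyTorch-style sketch of the MSP score in Eq. (2) and the detector \(G_{\tau }\) in Eq. (1) is given below; the function names are illustrative and not taken from any released implementation.

```python
import torch
import torch.nn.functional as F

def msp_score(f_theta, x):
    """Maximum softmax probability (Eq. (2)): max_k softmax_k(f_theta(x))."""
    with torch.no_grad():
        logits = f_theta(x)                   # shape [batch, K]
        probs = F.softmax(logits, dim=-1)
    return probs.max(dim=-1).values           # shape [batch]

def detect(f_theta, x, tau):
    """Score-based detector G_tau in Eq. (1): a sample is ID iff S(x; f_theta) >= tau."""
    return msp_score(f_theta, x) >= tau       # True -> ID, False -> OOD
```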

Classical training strategy. In the majority of score-based strategies, researchers primarily emphasize the design of effective scoring functions, aiming to extract the detection potential of \(\textbf{f}_{\varvec{\theta }}(\cdot )\). In general, the model \(\textbf{f}_{\varvec{\theta }}\) is trained based on a unified learning strategy—empirical risk minimization (ERM) principle, i.e.,

$$\begin{aligned} \min _{\varvec{\theta }\in \varvec{\Theta }} \frac{1}{n} \sum _{i=1}^n \ell (\textbf{f}_{\varvec{\theta }}(\textbf{x}^i), \textbf{y}^i). \end{aligned}$$
(3)

In this work, our primary focus is to design a more effective training strategy that enhances the separation of ID and OOD samples for model \(\textbf{f}_{\varvec{\theta }}\).

4 Methodology

In this section, we mainly introduce our proposed method, virtual outlier smoothing (VOSo).

4.1 Outlier Exposure

Deep neural networks (DNNs) trained using cross-entropy loss tend to exhibit overconfidence when encountering OOD samples. To this end, outlier exposure (OE) (Hendrycks et al., 2019) proposes to regularize the training process by introducing an auxiliary OOD distribution \(D_{X_OY_O}^a\), which makes it possible to lower the prediction confidence on OOD samples. Specifically, to mitigate the overconfidence issue, OE mainly focuses on the following learning objective:

$$\begin{aligned} \begin{aligned} \mathcal {L}(\textbf{f}_{\varvec{\theta }})&= \mathbb {E}_{(\textbf{x},\textbf{y})\sim D_{X_IY_I}} \ell _\textrm{ce}(\textbf{f}_{\varvec{\theta }}(\textbf{x}), \textbf{y}) \\ {}&\quad + \ \lambda \mathbb {E}_{(\textbf{x},\textbf{y})\sim D_{X_OY_O}^a} \ell _\textrm{oe}(\textbf{f}_{\varvec{\theta }}(\textbf{x})), \end{aligned} \end{aligned}$$

where \(\ell _\textrm{ce}(\cdot ,\cdot )\) is the cross-entropy loss, \(\lambda \) is a hyperparameter, and \(\ell _\textrm{oe}(\cdot )\) represents the Kullback–Leibler divergence to the uniform distribution, i.e., \(-\sum _k \log {\text {softmax}}_k \textbf{f}_{\varvec{\theta }}(\textbf{x}) / K\). This means that the model is encouraged to produce low confidence on these auxiliary OOD samples. Thanks to the integration of auxiliary outliers during training, OE usually achieves reliable detection performance on the auxiliary OOD distribution.
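A minimal sketch of this objective, assuming a batch of ID images with hard labels and a batch of auxiliary outliers, is shown below; the variable names and the weighting \(\lambda \) are placeholders rather than the settings used in the cited works.

```python
import torch.nn.functional as F

def oe_loss(f_theta, x_id, y_id, x_out, lam=0.5):
    """Outlier-exposure objective: cross-entropy on ID samples plus a term that
    pushes outlier predictions toward the uniform distribution over K classes."""
    loss_id = F.cross_entropy(f_theta(x_id), y_id)

    log_probs_out = F.log_softmax(f_theta(x_out), dim=-1)
    # Cross-entropy to the uniform distribution: -(1/K) * sum_k log softmax_k
    loss_oe = -log_probs_out.mean(dim=-1).mean()
    return loss_id + lam * loss_oe
```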

4.2 Virtual Outlier Construction

OE is a promising approach to improve OOD detection. Despite its success, a significant limitation is that true OOD samples may deviate from the auxiliary OOD samples, leading to overconfident judgments when OOD samples bear resemblance to ID samples. To address this problem, one approach is to construct virtual outliers similar to ID samples directly from ID samples. The principle of constructing virtual outliers is simple but effective. Previous work (Du et al., 2022; Tao et al., 2023) obtains virtual outliers by sampling around marginal ID features in the feature space. However, as stated before, this approach relies on the quality of the feature space. To avoid this drawback, we propose a simple alternative: injecting an anomalous pattern (a randomly selected part of another image, or zeros) into the ID sample, which introduces little additional computational overhead. However, naive injection may not be effective; e.g., injecting the background region of one image into the background region of another does not build a valid OOD sample. To construct an effective OOD sample, we need to inject the anomalous pattern into the semantic region of the ID sample, so that the ID attribute is broken.

Therefore, the construction of outliers involves two steps: i) locating semantic regions and ii) injecting patterns. Thanks to the development of DNN visualization techniques, we can locate semantic regions in the image space through Class Activation Maps (CAMs), a weakly-supervised localization method that identifies discriminative regions, as shown in Fig. 3. Specifically, locating semantic regions can be formulated as:

$$\begin{aligned} M(\textbf{x};t)[i,j] = {\left\{ \begin{array}{ll}0, &{} \text {if } C_\textbf{x}[i,j] \ge t \\ 1, &{} \text {otherwise.}\end{array}\right. } \end{aligned}$$
(4)

where \(M(\textbf{x};t)\) is the mask of an input image \(\textbf{x}\), \(C_\textbf{x}\) is the CAM of \(\textbf{x}\), and \(t \in [0,1]\) denotes the threshold used to transform the CAM into a binary mask. Here, the CAM is re-scaled into the range [0, 1]. In each iteration, we sample the threshold t from a Beta distribution, i.e., \(t \sim \textit{Beta}(\alpha , \beta )\). Since t controls the size of the perturbed region, this sampling produces perturbed regions of varying sizes.

Fig. 3

CAM on CIFAR and the generated masked or mixed outliers

Various patterns can be injected into the located regions (image patches). For instance, one ideal approach is to inject semantic regions from images with different labels, e.g., merging a cat’s nose with a dog’s head. An on-the-fly approach is to randomly sample patches from different images. The simplest approach is to inject zeros or noise sampled from a Gaussian distribution, sharing the same spirit as random erasing (Zhong et al., 2020). This can be formulated as:

$$\begin{aligned} \tilde{\textbf{x}}(\textbf{z}) = M(\textbf{x},t) \odot \textbf{x} + (1 - M(\textbf{x},t)) \odot \textbf{z}, \end{aligned}$$
(5)

where \(\tilde{\textbf{x}}(\textbf{z})\) is the constructed virtual outlier, \(M(\textbf{x},t)\) stands for the mask indicating the semantic region, and \(\textbf{z}\) represents the injected patterns.
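The two steps in Eqs. (4)–(5) can be sketched as follows, assuming the CAM has already been computed, re-scaled to [0, 1], and is broadcastable to the image shape; the helper names are illustrative.

```python
import torch

def semantic_mask(cam, t):
    """Eq. (4): the mask is 0 on semantic regions (CAM >= t) and 1 elsewhere."""
    return (cam < t).float()

def make_virtual_outlier(x, cam, t, z):
    """Eq. (5): keep the non-semantic regions of x and inject the pattern z
    into the semantic region; z can be zeros, noise, or (a mix of) other ID images."""
    m = semantic_mask(cam, t)
    return m * x + (1.0 - m) * z

# Example: a masked (z_2-type) outlier with the threshold drawn from a Beta distribution.
# t = torch.distributions.Beta(alpha, beta).sample()
# x_tilde = make_virtual_outlier(x, cam, t, torch.zeros_like(x))
```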

4.3 Virtual Outlier Smoothing

In existing OE approaches, all OOD samples are assigned the same label, which can be inappropriate. For example, both tigers and fish can serve as OOD classes for cats, but in the feature space tigers should be closer to cats than fish, so it is not reasonable to assign the same OOD label to tigers and fish. In VOSo, virtual outliers are obtained by perturbing ID samples, so we should likewise assign appropriate OOD labels to virtual outliers according to the injected pattern. To this end, we propose a dynamic label assignment method.

In VOSo, the label of a virtual outlier varies with both the injected pattern \(\textbf{z}\) and the size of the masked region. Specifically, given the injected pattern \(\textbf{z}\), the label \({\varvec{\tilde{{\textbf {y}}}}}\) of the outlier is a function of the mask \(M(\textbf{x}, t)\) and the label \(\textbf{y}\),

$$\begin{aligned} {{\varvec{\tilde{{\textbf {y}}}}}} (\textbf{x}, \textbf{z}) = \phi _{\textbf{z}}(M(\textbf{x}, t), \textbf{y}), \end{aligned}$$
(6)

where \(\phi _{\textbf{z}}\) is the function required to be designed according to the type of injected pattern \(\textbf{z}\).

Given an image \(\textbf{x}\), we introduce three types of patterns \(\textbf{z}\) to enforce the model prediction to vary with the patterns appearing in the image. The simplest case is to set \(\textbf{z}_1=\textbf{x}\), in which case \({{\varvec{\tilde{{\textbf {y}}}}}}=\textbf{y}\): DNNs should predict the one-hot label for complete ID patterns. Meanwhile, we set \(\textbf{z}_2=\textbf{0}\), where the label is soft and varies with the size of the perturbed region, captured by \(\epsilon (t)\) and controlled by the threshold t in Eq. (4). This can be formulated as follows:

$$\begin{aligned} {\varvec{\tilde{{\textbf {y}}}}(\textbf{x},\textbf{z}_2)} = (1-\epsilon (t)) \cdot \textbf{y} + \epsilon (t) \cdot \textbf{u}, \end{aligned}$$
(7)

where \(\textbf{u} = [1/K,\ldots ,1/K]\in \mathbb {R}^{K}\) and the function \(\epsilon (t)\) is designed as:

$$\begin{aligned} \epsilon (t) = 1 - \exp ((t - 1) / T). \end{aligned}$$
(8)

Here, we introduce the temperature coefficient T. The most complex pattern is set as follows:

$$\begin{aligned} \textbf{z}_3 = \lambda \textbf{x} + (1 - \lambda ) \textbf{x}', \end{aligned}$$
(9)

where \(\lambda \) is drawn from \(\textrm{Beta}(\alpha ,\beta )\) and \(\textbf{x}'\) is a randomly selected ID sample. Accordingly, the label is designed as follows:

$$\begin{aligned} {\varvec{\tilde{{\textbf {y}}}}}(\textbf{x},\textbf{z}_{3}) = (1-\epsilon (t)) \cdot \textbf{y} + \epsilon (t) (\lambda \textbf{y} + (1 - \lambda ) \textbf{y}'), \end{aligned}$$
(10)

where \(\textbf{y}'\) is the label of the input \(\textbf{x}'\). As shown in Fig. 4, the most complex patterns are good outliers, because they are mainly in the low-likelihood region of the ID sample space.
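A sketch of this label assignment, using \(\epsilon (t)\) from Eq. (8) and one-hot label vectors, is given below; the helper names are illustrative.

```python
import torch

def epsilon(t, T):
    """Eq. (8): perturbation weight epsilon(t) = 1 - exp((t - 1) / T)."""
    return 1.0 - torch.exp(torch.as_tensor((t - 1.0) / T))

def soft_label_masked(y_onehot, t, T, K):
    """Eq. (7): label of a z_2-type (masked) virtual outlier."""
    u = torch.full_like(y_onehot, 1.0 / K)    # uniform distribution over the K ID classes
    eps = epsilon(t, T)
    return (1.0 - eps) * y_onehot + eps * u

def soft_label_mixed(y_onehot, y_prime_onehot, t, T, lam):
    """Eq. (10): label of a z_3-type (mixed) virtual outlier."""
    eps = epsilon(t, T)
    return (1.0 - eps) * y_onehot + eps * (lam * y_onehot + (1.0 - lam) * y_prime_onehot)
```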

According to the construction approach, introducing \(\textbf{z}_1\)-type patterns makes DNNs predict the one-hot label for ID patterns, since the complete ID patterns are given. In turn, introducing \(\textbf{z}_2\)-type patterns lowers the prediction confidence on ID samples due to the missing ID patterns, and introducing \(\textbf{z}_3\)-type patterns creates OOD samples that still contain ID patterns, lowering the prediction confidence on such samples. Therefore, the objective function can be written as:

$$\begin{aligned} \begin{aligned} \mathcal {L}(\textbf{f}_{\varvec{\theta }}) =&\frac{1}{n} \sum _{i=1}^n \ \Big [\ell _\textrm{ce}(\textbf{f}_{\varvec{\theta }}(\textbf{x}^i), \textbf{y}^i) \!+\! \gamma _1 \ell _\textrm{ce}(\textbf{f}_{\varvec{\theta }} ({{\varvec{\tilde{{\textbf {x}}}}}}^i(\textbf{z}_2)), {{\varvec{\tilde{{\textbf {y}}}}}}(\textbf{x}^i, \textbf{z}_2)) \\&+ \gamma _2\ell _\textrm{ce}(\textbf{f}_{\varvec{\theta }} ({\varvec{{\tilde{{\textbf {x}}}}}}^i(\textbf{z}_3)), {{\varvec{\tilde{{\textbf {y}}}}}}(\textbf{x}^i, \textbf{z}_3))\Big ], \end{aligned} \end{aligned}$$
(11)

where \(\gamma _1\) and \(\gamma _2\) are hyperparameters. Note that in Eq. (11), we omit the utilization of \(\textbf{z}_1\) for simplicity.

4.4 Tackling Distribution Shift

Fig. 4

t-SNE visualization of virtual outliers on CIFAR10. VOSo successfully creates OOD samples (black dots) in the low-likelihood region of the ID sample space (colored dots) (Color figure online)

Fig. 5

The auxiliary BN layers are employed to address the issue of distribution inconsistency between the mixed OOD samples and the original training samples

Fig. 6

An overview of VOSo. We use a pre-trained network to obtain the heat map of the input image and then sample a threshold. The parts of the image with heat values greater than this threshold are masked, and its soft label is computed according to the threshold. The original image (with a hard label) and the masked image (with a soft label) are used simultaneously to train the deep neural network

Training DNNs with outliers can degrade prediction accuracy on ID samples and thereby detection performance. This aligns with the rationale of domain adaptation (Gong et al., 2016): training models over data sampled from different distributions causes performance degradation. Thus, naively leveraging both ID and OOD samples to train DNNs degrades prediction accuracy on ID samples and leads to poor OOD detection performance. This is consistent with our experimental observations, where naive incorporation of virtual outliers into the training process resulted in only a limited improvement.

To address this problem, we employ a dual batch normalization layer (DuBN) (Xie et al., 2020) to model the two distributions simultaneously. The intuition behind this approach is illustrated in Fig. 5. Specifically, at each batch normalization layer, the features of ID-like samples (i.e., \({{\varvec{\tilde{{\textbf {x}}}}}}(\textbf{z}_1)\) and \({{\varvec{\tilde{{\textbf {x}}}}}}(\textbf{z}_2)\)) are fed into the normal BN, while the features of virtual outliers (i.e., \(\varvec{\tilde{{\textbf {x}}}}(\textbf{z}_3)\)) are fed into an auxiliary BN. In the testing stage, DNNs use only the normal BN for prediction.
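A minimal sketch of such a dual BN layer, in the spirit of Xie et al. (2020), is shown below; the module and flag names are our own illustration.

```python
import torch.nn as nn

class DualBatchNorm2d(nn.Module):
    """One BN position with two branches: a normal BN for ID-like inputs
    (z_1-, z_2-type) and an auxiliary BN for virtual outliers (z_3-type)."""

    def __init__(self, num_features):
        super().__init__()
        self.bn_normal = nn.BatchNorm2d(num_features)
        self.bn_aux = nn.BatchNorm2d(num_features)

    def forward(self, x, use_aux=False):
        # At test time only the normal BN is used (use_aux=False).
        return self.bn_aux(x) if use_aux else self.bn_normal(x)
```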

In each iteration, we sample masking thresholds and the mixing coefficient from Beta distributions, i.e., \(t_1 \sim \textrm{Beta}(\alpha _1, \beta _1), t_2 \sim \textrm{Beta}(\alpha _2, \beta _2), \lambda \sim \textrm{Beta}(\alpha _3, \beta _3)\), and compute the risk in Eq. (11), using gradient backpropagation to update our model \(\textbf{f}_{\varvec{\theta }}\). An overview of our method is presented in Fig. 6.
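Putting the pieces together, one training iteration can be sketched as follows, reusing the helpers from the earlier sketches (make_virtual_outlier, soft_label_masked, soft_label_mixed) and assuming the model exposes a use_aux switch that routes inputs through the auxiliary BN; all names and the exact interface are illustrative rather than the released implementation.

```python
import torch
import torch.nn.functional as F
from torch.distributions import Beta

def soft_ce(logits, target):
    """Cross-entropy with soft target distributions."""
    return -(target * F.log_softmax(logits, dim=-1)).sum(dim=-1).mean()

def voso_step(model, x, y_onehot, cam, x_prime, y_prime_onehot,
              gamma1, gamma2, T1, T2, a1, b1, a2, b2, a3, b3):
    """One iteration of Eq. (11) with Beta-sampled thresholds and mixing coefficient."""
    t1, t2 = Beta(a1, b1).sample(), Beta(a2, b2).sample()
    lam = Beta(a3, b3).sample()

    # Original samples with hard (one-hot) labels, normal BN.
    loss_id = soft_ce(model(x, use_aux=False), y_onehot)

    # z_2-type: mask semantic regions with zeros, soften the label (Eq. (7)), normal BN.
    x_mask = make_virtual_outlier(x, cam, t1, torch.zeros_like(x))
    y_mask = soft_label_masked(y_onehot, t1, T1, y_onehot.size(-1))
    loss_mask = soft_ce(model(x_mask, use_aux=False), y_mask)

    # z_3-type: inject a mixed image (Eq. (9)), soften the label (Eq. (10)), auxiliary BN.
    x_mix = make_virtual_outlier(x, cam, t2, lam * x + (1.0 - lam) * x_prime)
    y_mix = soft_label_mixed(y_onehot, y_prime_onehot, t2, T2, lam)
    loss_mix = soft_ce(model(x_mix, use_aux=True), y_mix)

    return loss_id + gamma1 * loss_mask + gamma2 * loss_mix
```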

5 Experiments

In this section, experiments are conducted to validate the effectiveness of the proposed method.

5.1 Experiment Setup

Datasets. Following the common benchmarks used in previous work (Zhang et al., 2023b), we use CIFAR10 and CIFAR100 (Krizhevsky et al., 2009) as our major ID datasets. We use five common benchmarks as the OOD test datasets: Textures (Cimpoi et al., 2014), SVHN (Netzer et al., 2011), iSUN (Xu et al., 2015), Places365 (Zhou et al., 2018) and LSUN (Yu et al., 2015). Besides, we further conduct experiments on large-scale datasets. Following NPOS, we use ImageNet-1k (Deng et al., 2009) dataset as the in-distribution data. For OOD datasets, we use iNaturalist (Horn et al., 2018), SUN (Xiao et al., 2010), Places (Zhou et al., 2018) and Texture (Cimpoi et al., 2014).

Evaluation metrics. For evaluation, we follow the commonly-used metrics in OOD detection: (1) the false positive rate of OOD samples when the true positive rate of ID samples is at \(95\%\) (FPR95), and (2) the area under the receiver operating characteristic curve (AUROC). We also report ID classification accuracy (ID-ACC) to reflect the preservation level of the performance for the original classification task on ID samples.
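For reference, FPR95 and AUROC can be computed from per-sample scores as sketched below, treating ID as the positive class and assuming higher scores indicate more ID-like samples (as with MSP); this is a generic sketch, not the evaluation code used for the reported numbers.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

def ood_metrics(scores_id, scores_ood):
    """FPR95 and AUROC from ID/OOD score arrays (higher score = more ID-like)."""
    labels = np.concatenate([np.ones_like(scores_id), np.zeros_like(scores_ood)])
    scores = np.concatenate([scores_id, scores_ood])

    auroc = roc_auc_score(labels, scores)
    fpr, tpr, _ = roc_curve(labels, scores)
    # FPR at the first threshold where the TPR on ID samples reaches 95%.
    fpr95 = fpr[np.argmax(tpr >= 0.95)]
    return fpr95, auroc
```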

OOD detection baselines. We use both post-hoc inference methods and training-time methods as baselines. For post-hoc methods, we take the MSP score (Hendrycks & Gimpel, 2017), ODIN (Liang et al., 2018), ReAct (Sun et al., 2021), the energy score (Liu et al., 2020) and ASH (Djurisic et al., 2023) as baselines. For training-time methods, we use MOODCAT (Yang et al., 2022), OpenGAN (Kong & Ramanan, 2021), RegMixup (Pinto et al., 2022), VOS (Du et al., 2022), LogitNorm (Wei et al., 2022) and NPOS (Tao et al., 2023) as baselines. We also compare the performance of VOSo under different OOD detection scoring functions. Besides, we further compare our method with OE methods (Hendrycks et al., 2019; Zhang et al., 2023a; Wang et al., 2023).

Table 1 OOD detection performance comparison between using softmax cross-entropy loss and VOSo loss
Table 2 OOD detection performance on CIFAR10 benchmarks
Table 3 OOD detection performance on CIFAR100 benchmarks

Training details. For our main results, we use ResNet18 (He et al., 2016) to train models on both the CIFAR10 and CIFAR100 datasets. Initially, we follow the standard training procedure to establish a baseline classification model. Subsequently, we train the final model using the proposed loss function (as detailed in Eq. (11)). The training spans 300 epochs, utilizing SGD with a momentum of 0.9, a weight decay of 0.0005, and a batch size of 128. We start with an initial learning rate of 0.1 with cosine decay  (Loshchilov & Hutter, 2017). For CIFAR10, hyperparameters \(\alpha _1\) and \(\beta _1\) are adjusted to 50 and 20, respectively, while \(\alpha _2\) and \(\beta _2\) are set to 0.5 each. The loss coefficients \(\gamma _1\) and \(\gamma _2\) are determined as 0.2 and 1. For the temperature coefficients \(T_1\) and \(T_2\), a value of 10 is chosen. In the case of CIFAR100, hyperparameters \(\alpha _1\) and \(\beta _1\) are adjusted to 30 and 18, respectively, with \(\alpha _2\) and \(\beta _2\) set to 0.5 each. Here, \(\gamma _1\) and \(\gamma _2\) are set at 0.6 and 1, respectively, with the temperature coefficients \(T_1\) and \(T_2\) set at 100. For the mixture sampler parameters \(\alpha _3\) and \(\beta _3\), in line with RegMixup (Pinto et al., 2022), we fix them at 10 each. To facilitate the selection of the temperature coefficient, we adjust CAMs to 0–255 and set the threshold based on this. These experiments are conducted using PyTorch on various NVIDIA GeForce RTX GPUs, including the 2080, 3090, and 4090 models.

5.2 Experimental Results

How does VOSo influence OOD detection performance? In Table 1, we present a comparison of the OOD detection performance between models trained using cross-entropy loss and those trained with VOSo. Throughout these tests, we consistently employed MSP as the OOD scoring function. Our observations reveal that VOSo significantly enhances OOD detection capabilities. Specifically, in the CIFAR10 benchmark, VOSo lowers average FPR95 from \(51.13\%\) to \(11.09\%\), marking a substantial improvement of \(40.04\%\). In the CIFAR100 benchmark, VOSo achieves an \(11.93\%\) improvement in average FPR95.

Comparison with other baselines. We conducted comparative experiments on CIFAR10 and CIFAR100 datasets. As illustrated in Tables 2 and 3, VOSo achieves the best average performance on CIFAR10. On the CIFAR100 dataset, VOSo also demonstrates strong performance. This superior performance underscores the effectiveness of VOSo. Training with the virtual outliers successfully mitigates overconfident predictions for OOD samples and enhances OOD detection.

VOSo makes the softmax scores more distinguishable between in- and out-of-distributions. To gain deeper insights into the effectiveness of VOSo, we visualize and compare the distributions of softmax confidence scores for ID and OOD samples. As illustrated in Fig. 7, models trained with cross-entropy loss tend to generate high softmax scores for both ID and OOD samples, leading to suboptimal OOD detection performance. In contrast, VOSo distinctly differentiates the softmax scores between ID and OOD samples, demonstrating its efficacy.

Comparison with OE methods. We also compare VOSo with OE methods on the CIFAR10 dataset, including OE (Hendrycks et al., 2019), MixOE (Zhang et al., 2023a), and DOE (Wang et al., 2023). It is important to note that this comparison may not be entirely fair, as VOSo does not utilize an additional auxiliary OOD dataset. Nevertheless, as demonstrated in Table 5, VOSo still achieves results comparable to these methods.

The effect of the soft labeling strategies. Previous methods assign hard OOD labels to all virtual outliers, essentially treating them as a uniform distribution across all ID classes. This ignores the connection between outliers and ID samples, leading to sub-optimal results. To verify this, we assign hard OOD labels, uniformly distributed over the ID classes, to all generated virtual outliers in our experiments. The results, presented in Table 7, show a significant improvement in performance when soft OOD labels are assigned to virtual outliers instead. We also test other perturbations, such as Gaussian noise and elastic transform, and explore their labeling strategies. Hard OOD labels for the outliers generated with these perturbations do not perform well because the outliers share patterns with ID samples; ignoring this disrupts training. We further investigate labeling strategies for vanilla OE (Hendrycks et al., 2019). The original paper assigns uniform labels to all outliers, which is inappropriate because some outliers resemble ID samples more than others. We instead use a pre-trained model such as CLIP to assign soft labels. Since this is slow, we fine-tune the network for 10 epochs instead of training from scratch. Table 7 confirms the effectiveness of soft labels.

Fig. 7

The MSP score distribution of training with cross-entropy loss and VOSo. VOSo makes the softmax scores more distinguishable between in- and out-of-distributions. Values are percentages

Experimental results in near-OOD setting. Besides the widely adopted far OOD setting, we further verify the proposed VOSo’s effectiveness in the near OOD scenario. Following OpenOOD (Zhang et al., 2023b), we use CIFAR10 as ID data while employing CIFAR100 and TinyImageNet as OOD data. As shown in Table 8, VOSo still achieves the best performance, demonstrating VOSo’s superiority in challenging scenarios.

Table 4 VOSo with different scoring functions
Table 5 Comparison with OE methods

Experimental results on large-scale datasets. To demonstrate VOSo’s scalability and generalizability, we further evaluate VOSo on a large-scale dataset, i.e., ImageNet-1k. In this context, we follow the experimental settings used in NPOS and conduct experiments on the ImageNet-1k dataset with CLIP (Radford et al., 2021). Instead of fine-tuning CLIP’s image encoder as NPOS does, VOSo leverages prompt learning, similar to CoOp (Zhou et al., 2022), as illustrated in Fig. 8, which requires less data and fewer computational resources. As shown in Table 9, VOSo achieves the best performance, demonstrating its robustness and applicability across various scales. However, since prompt learning has only a small number of learnable parameters, the improvements on ImageNet-1k are not as substantial as those on the CIFAR benchmarks. Greater performance gains could probably be obtained by training from scratch, but this is not practical on large-scale datasets.

Table 6 The effect of virtual outliers and DuBN
Table 7 VOSo with different labeling strategies
Table 8 VOSo in near-OOD settings

5.3 Ablation Study

In this section, we perform ablation experiments. Unless otherwise stated, experiments are conducted on the CIFAR10 dataset.

VOSo with different scoring functions. In Table 4, we present a comparison of OOD detection performance between neural networks trained with VOSo loss and those trained with cross-entropy loss, using various scoring functions. The experimental results indicate that the OOD detection performance of neural networks trained with cross-entropy loss varies depending on the scoring function used. In contrast, the performance of networks trained with VOSo loss remains consistently high across different scoring functions. Notably, with VOSo, employing MSP alone is sufficient to achieve robust OOD detection, thereby reducing testing time and eliminating the need to select among different scoring functions.

Table 9 OOD detection performance on ImageNet-1k as ID. Except for CoOp, other baseline results are from NPOS (Tao et al., 2023)

The effect of virtual outliers and DuBN. In Table 6, we illustrate how the integration of virtual outliers affects the OOD detection performance, focusing our analysis on the CIFAR10 dataset. Our results reveal that employing either masked samples or mixed samples independently as virtual outliers can enhance the model’s OOD detection capabilities. However, simply combining these two types of samples results in only a marginal improvement in performance. Notably, the addition of the DuBN module leads to a significant enhancement. In the final row, labeled VOSo+, masked samples are additionally processed through the auxiliary BN; as the results show, this variant tends to underperform.

Comparison with vanilla Mixup. Using CAMs to identify semantic regions adds computational cost. However, perturbing only part of these regions creates virtual outliers that resemble ID samples, making the OOD samples more challenging; such tailored outliers can regularize training more effectively. For comparison, we add ablation experiments using only mixed outliers, comparing VOSo with Mixup and RegMixup. Mixup trains the network with mixed samples to improve robustness, and RegMixup further shows that using both original and mixed samples improves robustness further. In VOSo, Mixup is used as one of the perturbation functions to generate virtual outliers. Consistent with our soft labeling strategy, RegMixup also mixes the labels of the samples. For a more thorough comparison, we additionally assign hard OOD labels to mixed samples. As shown in Table 10, vanilla Mixup and RegMixup do not improve OOD detection significantly, and assigning hard OOD labels to mixed samples fails completely, highlighting the importance of considering the relationship between outliers and ID samples. Further, given the distribution difference between mixed and ID samples, we employ DuBN to model the two distributions simultaneously, and the performance improves accordingly. Finally, we use CAM to guide the mixing of only part of the regions, making the generated outliers more challenging and further improving performance.

The effect of the sampling function. In Table 11, we present the impact of the sampling function’s parameters on the OOD detection performance, with the analysis conducted using the CIFAR10 dataset. Generally, a sampling distribution characterized by higher variance results in a greater diversity of samples. This increased diversity often necessitates a larger network capacity and extended training duration to effectively accommodate and learn from the training data.

The effect of temperature T. In Fig. 9, we further explore how the parameter T influences OOD detection performance, focusing our analysis on CIFAR datasets. On CIFAR10, we achieve the optimal OOD detection performance when T is set to 10. For the CIFAR100 dataset, a larger value of T is employed. This adjustment is made because networks dealing with a smaller number of classes tend to exhibit overconfidence, and in such scenarios, a greater penalty is beneficial.

Fig. 8

Training a model from scratch on ImageNet-1k is too costly. Instead, we leverage prompt learning: we use the proposed VOSo loss to learn a prompt’s context words as learnable vectors while all pre-trained parameters are kept fixed

Table 10 Comparison with vanilla Mixup
Table 11 The effect of the sampling function
Table 12 Test accuracy on ID samples
Fig. 9

The effect of the temperature T

VOSo improves the model’s classification performance. As shown in Table 12, VOSo also improves the model’s classification performance.

Comparison of training time. We study the training-time efficiency of the proposed VOSo to explore whether it introduces extra overhead. Specifically, we report the training time of our method and compare it with other representative works. The results are shown in Table 13. In terms of training time, VOS and NPOS are not superior because they require dense sampling of virtual outliers around marginal feature points. Hence, VOSo exhibits acceptable training-time efficiency.

Comparison with generative models. Virtual outliers can also be obtained with generative models, and such virtual outliers are usually more semantically realistic. As shown in Fig. 11, the virtual outliers generated by OpenGAN appear more realistic than those created by VOSo. However, realistic virtual outliers are not necessarily more effective than unrealistic ones; what matters more is that the outliers share features with ID samples, so that they can constrain the decision boundaries of the network. VOSo directly perturbs ID samples to obtain virtual outliers, which share a large number of features with ID samples, making the knowledge boundaries of the network more tightly constrained.

The effect of filling content. In our main results, the masked areas of the masked samples are filled with zero pixels. To explore the effect of the filling content, we conduct comparison experiments that fill with random pixels instead. As shown in Table 16, filling with random pixels has a similar effect to filling with zero pixels.

Table 13 Comparison of training time

5.4 Additional Experiments on Model Calibration and Data-Shift Robustness

To explore potential adverse effects as limitations, we evaluate the proposed VOSo under distribution-shift scenarios. Specifically, we evaluate the model trained with the VOSo loss on test data with different distribution shifts in Table 14. The results show that VOSo also improves distributional robustness compared to a vanilla classifier trained using only cross-entropy loss. We further measure the calibration error of VOSo using the Expected Calibration Error (ECE) (Guo et al., 2017) and report the results in Table 15. The results show that VOSo leads to poorer calibration of the model. To explore the cause, we visualize the relationship between average accuracy and confidence in Fig. 10. We find that models trained with the VOSo loss are overcautious, leading to poorer calibration. However, this careful decision-making allows the model to be more aware of its knowledge boundaries, as evidenced by VOSo’s good OOD detection performance. Besides, we further explore the performance of VOSo on corruption datasets. Specifically, we train the network with VOSo on the CIFAR10 dataset and test on CIFAR10C (Hendrycks & Dietterich, 2019). CIFAR10C contains 15 types of corruption with 5 severity levels, applied to images from the test set of the clean CIFAR10 dataset. We report the classification accuracy of the models under the largest corruption severity level 5. As shown in Table 17, VOSo achieves better classification performance than training the network with cross-entropy loss, demonstrating its robustness on corruption datasets.
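For completeness, a standard way to compute ECE from per-sample confidences (e.g., MSP values) and correctness indicators is sketched below; the bin count is a common default and not necessarily the setting used for Table 15.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=15):
    """ECE: weighted average of |accuracy - confidence| over equal-width confidence bins."""
    bin_edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            acc = correct[in_bin].mean()
            avg_conf = confidences[in_bin].mean()
            ece += in_bin.mean() * abs(acc - avg_conf)
    return ece
```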

Table 14 Evaluations on data-shift robustness (numbers are in %)
Fig. 10

Comparison of ECE score

Fig. 11

Comparison of virtual outliers generated by VOSo and OpenGAN

Table 15 Calibration performance
Table 16 VOSo with different filling content
Table 17 Robustness on corrupted datasets

6 Discussions

6.1 Selection of Hyperparameters

VOSo introduces a relatively large number of hyperparameters. Specifically, the size of the perturbed area in each epoch is controlled by a threshold sampled from a \(\textit{Beta}(\alpha , \beta )\) distribution, contributing the two hyperparameters \(\alpha \) and \(\beta \). Meanwhile, the temperature factor T controls the degree of label smoothing.

We can control the dispersion of the sampling distribution by adjusting \(\alpha \) and \(\beta \). When the sampling distribution is more dispersed, we obtain more varied samples. This diversity can improve the network’s adaptability to new data, but it also makes learning more challenging, leading to difficulties in convergence. In essence, a higher dispersion in the distribution expands the parameter space the network needs to adapt to during training, increasing the complexity of the learning process. Usually, we recommend designing a sampling distribution with a small variance for a small network to avoid the network failing to converge and a sampling function with a slightly larger variance for a large network to get more diverse OOD samples. The choice of temperature factor T is related to the number of classes. When the number of classes is smaller, the neural network is more likely to be overconfident, in which case we should give a larger penalty. This usually leads to better results. In our experiments, we incorporate virtual OOD samples without semantic information to construct a validation set to assist in hyperparameter selection.

6.2 Limitations

In this section, we report the limitations of the proposed VOSo.

  • VOSo uses CAM to guide the generation of virtual outliers and requires training the neural network with both the original images and the virtual outliers. This introduces additional computational overhead, which is a drawback of VOSo.

  • VOSo assigns appropriate labels to virtual outliers through a hand-designed link function. How to automatically assign appropriate labels to virtual outliers deserves further exploration, but has not been explored in the current version. We will investigate the possibility of designing a learnable link function in future work.

  • On large-scale datasets such as ImageNet-1k, we apply prompt learning on the CLIP model to realize VOSo, which greatly reduces the computational cost, but less training data and fewer learnable parameters may make the performance gains less noticeable than training from scratch.

7 Conclusion

In this paper, we propose virtual outlier smoothing (VOSo), a simple and effective method that constructs virtual outliers by perturbing the semantic regions of ID samples. Besides, considering that these virtual outliers still contain some ID patterns, we develop a more appropriate label assignment strategy for them. The virtual outliers are highly correlated with the ID samples and thus provide strong regularization of the training process. Extensive experiments show that VOSo can significantly improve the OOD detection performance of the model while maintaining the classification accuracy on ID samples.