Long-Tailed Online Anomaly Detection (LTOAD) Dataset
Description
Introduction
Anomaly detection (AD) identifies the defect regions in a given image. Recent works have studied AD under several settings: learning without abnormal images, learning from long-tailed training data, and using a single unified model for all classes. Online AD learning has also been explored. We extend both directions toward a more realistic setting by considering the novel task of long-tailed online AD (LTOAD).
To encourage follow-up work on LTOAD, we publicly release the dataset splits and testing sequences used in our paper (“Toward Long-Tailed Online Anomaly Detection through Class-Agnostic Concepts” by Chiao-An Yang, Kuan-Chuan Peng, and Raymond A. Yeh, ICCV 2025).
Files in the unzipped folder:
1. ./README.md: This Markdown file
2. ./data: Folder containing the long-tail splits and testing sequences of the four datasets. See below for details.
3. ./scripts: Folder containing the dataset preparation scripts for the four datasets used in our paper.
Preparation
- Please download the following AD datasets from their respective official sources.
- Please organize them as follows (a layout-check sketch is given after this list). Our `DATASET_ROOT` by default is `datasets`.
`DATASET_ROOT/`
|---`mvtec/`
|---|---`bottle/`
|---|---`cable/`
|---|---`capsule/`
|---|---`...`
|---`visa/`
|---|---`candle/`
|---|---`...`
|---`dagm/`
|---|---`Class 1/`
|---|---`...`
|---`uni_medical/`
|---|---`brain/`
|---|---`...`
- MVTec: (https://www.mvtec.com/company/research/datasets/mvtec-ad ; license: CC-BY-NC-SA-4.0)
Download MVTec AD. Unzip `mvtec_anomaly_detection.tar.xz` to your specified `DATASET_ROOT`.
We provide the following script to help you organize it:
`bash scripts/tools/prepare_mvtec.sh mvtec_anomaly_detection.tar.xz [DATASET_ROOT]`
- VisA: (https://github.com/amazon-science/spot-diff ; license: CC-BY-4.0)
Download VisA from their GitHub. Unzip `VisA_20220922.tar` to your specified `DATASET_ROOT`.
We provide the following script to help you organize it:
`bash scripts/tools/prepare_visa.sh VisA_20220922.tar [DATASET_ROOT]`
- DAGM: (https://www.kaggle.com/datasets/mhskjelvareid/dagm-2007-competition-dataset-optical-inspection ; license: CDLA-Sharing-1.0)
Download DAGM from Kaggle. Unzip `archive.zip` to your specified `DATASET_ROOT`.
We provide the following script to help you organize it:
`bash scripts/tools/prepare_dagm.sh archive.zip [DATASET_ROOT]`
- Uni-Medical: (https://github.com/DorisBao/BMAD ; license: CC-BY-NC-SA)
Download the organized Uni-Medical from ADer. Unzip `medical.zip` to your specified `DATASET_ROOT`.
We provide the following script to help you organize it:
`bash scripts/tools/prepare_medical.sh medical.zip [DATASET_ROOT]`
- Our Dataset (LTOAD):
`data/`
|---`mvtec/`
|---|---`metas/`
|---|---|---`train/`
|---|---|---|---`exp-100.json`
|---|---|---|---`exp-200.json`
|---|---|---|---`step-100.json`
|---|---|---|---`...`
|---|---|---`test/`
|---|---|---|---`B.json`
|---|---|---|---`B-HF.json`
|---|---|---|---`B-TF.json`
|---|---|---|---`...`
|---`visa/`
|---|---`...`
|---`dagm/`
|---|---`...`
|---`uni_medical/`
|---|---`...`
- Training configurations:
For training long-tailed (offline) anomaly detection on MVTec, VisA, and DAGM, we reformatted the meta files provided by [LTAD](https://zenodo.org/records/10854201).
For Uni-Medical, we follow LTAD's design logic and construct our own meta files. Since Uni-Medical has only 3 classes, we experiment with each of them as the head class.
The training meta files can be found under the `train/` folder of each dataset (a meta-file inspection sketch is given after this list).
- Testing configurations:
For each dataset (apart from Uni-Medical, due to its small number of classes), we provide both an offline meta file and a set of online meta files covering different class-arrival orderings.
The online meta files can be found under the `test/` folder of each dataset. Each dataset includes the following 8 configurations.
Please refer to our paper for more details.
- `B`: blurry (completely random)
- `B-HF`: blurry and head class(es) first
- `B-TF`: blurry and tail class(es) first
- `D2-HF`: disjoint with 2 sessions and head class(es) first
- `D2-TF`: disjoint with 2 sessions and tail class(es) first
- `D5-HF`: disjoint with 5 sessions and head class(es) first
- `D5-TF`: disjoint with 5 sessions and tail class(es) first
- `D5-M`: disjoint with 5 sessions and all classes mixed
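
As a sanity check after preparation, the snippet below verifies that `DATASET_ROOT` follows the layout shown above. This is a minimal sketch, not part of the released scripts: only the example class folders named in this README are listed, so extend `EXPECTED` with the full class lists of each dataset for a complete check.

```python
# Minimal layout sanity check (a sketch; only the example class folders
# shown in this README are listed).
from pathlib import Path

DATASET_ROOT = Path("datasets")  # default DATASET_ROOT used in this README

EXPECTED = {
    "mvtec": ["bottle", "cable", "capsule"],
    "visa": ["candle"],
    "dagm": ["Class 1"],
    "uni_medical": ["brain"],
}

for dataset, classes in EXPECTED.items():
    for cls in classes:
        path = DATASET_ROOT / dataset / cls
        status = "ok" if path.is_dir() else "MISSING"
        print(f"[{status}] {path}")
```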
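The snippet below shows one way to peek into the released meta files. It is a sketch under assumptions: the meta files are JSON (as the file extensions suggest), but their internal schema follows LTAD's format and is not reproduced in this README, so the code only loads two example files from the `data/mvtec/` tree above and prints their top-level structure.

```python
# Sketch for inspecting LTOAD meta files. The JSON schema is not documented
# in this README, so we only report the top-level structure of each file.
import json
from pathlib import Path

DATA_ROOT = Path("data")  # the unzipped data/ folder from this release

meta_files = [
    DATA_ROOT / "mvtec" / "metas" / "train" / "exp-100.json",  # long-tailed training split
    DATA_ROOT / "mvtec" / "metas" / "test" / "B.json",         # online testing sequence (blurry)
]

for meta_path in meta_files:
    with open(meta_path) as f:
        meta = json.load(f)
    if isinstance(meta, dict):
        print(f"{meta_path}: dict with keys {list(meta.keys())}")
    elif isinstance(meta, list):
        print(f"{meta_path}: list with {len(meta)} entries")
    else:
        print(f"{meta_path}: {type(meta).__name__}")
```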
Citation
If you use the LTOAD dataset in your research, please cite our paper:
@inproceedings{yang2025ltoad,
author = {Yang, Chiao-An and Peng, Kuan-Chuan and Yeh, Raymond A.},
title = {Toward Long-Tailed Online Anomaly Detection through Class-Agnostic Concepts},
booktitle = {The IEEE/CVF International Conference on Computer Vision (ICCV)},
year = {2025}
}
Copyright and License
The LTOAD dataset is released under the CC-BY-SA-4.0 license.
For the images and annotations from the MVTec, VisA, DAGM, and Uni-Medical datasets, please refer to their websites for their copyright and license terms.
Created by Mitsubishi Electric Research Laboratories (MERL), 2024-2025
SPDX-License-Identifier: CC-BY-SA-4.0
Files
| Name | Size | MD5 checksum |
|---|---|---|
| LTOAD_dataset.zip | 393.7 kB | 14423c25bcdd11f5d933c5b339bc9b9d |