Long-Tailed Online Anomaly Detection (LTOAD) Dataset
Description
Introduction
Anomaly detection (AD) identifies the defect regions in a given image. Recent works have studied AD under several settings: learning without abnormal images, learning from long-tailed training data, and using a single unified model for all classes. Online AD learning has also been explored. We extend both directions toward a more realistic setting by considering the novel task of long-tailed online AD (LTOAD).
To encourage follow-up work on LTOAD, we publicly release the dataset splits and testing sequences used in our paper (“Toward Long-Tailed Online Anomaly Detection through Class-Agnostic Concepts” by Chiao-An Yang, Kuan-Chuan Peng, and Raymond A. Yeh, ICCV 2025).
Files in the unzipped folder:
1. ./README.md: This Markdown file
2. ./data: Folder containing the long-tail splits and testing sequences of the four datasets. See below for details.
3. ./scripts: Folder containing the dataset preparation scripts for the four datasets used in our paper.
Preparation
- Please download the following AD datasets from their respective official sources.
- Please organize them as follows (a layout-check sketch is given after this list). Our `DATASET_ROOT` by default is `datasets`.
`DATASET_ROOT/`
|---`mvtec/`
|---|---`bottle/`
|---|---`cable/`
|---|---`capsule/`
|---|---`...`
|---`visa/`
|---|---`candle/`
|---|---`...`
|---`dagm/`
|---|---`Class 1/`
|---|---`...`
|---`uni_medical/`
|---|---`brain/`
|---|---`...`
- MVTec: (https://www.mvtec.com/company/research/datasets/mvtec-ad ; license: CC-BY-NC-SA-4.0)
Download MVTec AD. Unzip `mvtec_anomaly_detection.tar.xz` to your specified `DATASET_ROOT`.
We provide the following script to help you organize it:
`bash scripts/tools/prepare_mvtec.sh mvtec_anomaly_detection.tar.xz [DATASET_ROOT]`
- VisA: (https://github.com/amazon-science/spot-diff ; license: CC-BY-4.0)
Download VisA from their GitHub. Unzip `VisA_20220922.tar` to your specified `DATASET_ROOT`.
We provide the following script to help you organize it:
`bash scripts/tools/prepare_visa.sh VisA_20220922.tar [DATASET_ROOT]`
- DAGM: (https://www.kaggle.com/datasets/mhskjelvareid/dagm-2007-competition-dataset-optical-inspection ; license: CDLA-Sharing-1.0)
Download DAGM from Kaggle. Unzip `archive.zip` to your specified `DATASET_ROOT`.
We provide the following script to help you organize it:
`bash scripts/tools/prepare_dagm.sh archive.zip [DATASET_ROOT]`
- Uni-Medical: (https://github.com/DorisBao/BMAD ; license: CC-BY-NC-SA)
Download the organized Uni-Medical from ADer. Unzip `medical.zip` to your specified `DATASET_ROOT`.
We provide the following script to help you organize it:
`bash scripts/tools/prepare_medical.sh medical.zip [DATASET_ROOT]`
- Our Dataset (LTOAD):
`data/`
|---`mvtec/`
|---|---`metas/`
|---|---|---`train/`
|---|---|---|---`exp-100.json`
|---|---|---|---`exp-200.json`
|---|---|---|---`step-100.json`
|---|---|---|---`...`
|---|---|---`test/`
|---|---|---|---`B.json`
|---|---|---|---`B-HF.json`
|---|---|---|---`B-TF.json`
|---|---|---|---`...`
|---`visa/`
|---|---`...`
|---`dagm/`
|---|---`...`
|---`uni_medical/`
|---|---`...`
- Training configurations:
For training long-tailed (offline) anomaly detection on MVTec, VisA, and DAGM, we reformatted the meta files provided by [LTAD](https://zenodo.org/records/10854201).
For Uni-Medical, we follow LTAD's design logic and construct our own meta files. Since Uni-Medical has only 3 classes, we experiment with each of them as the head class.
The training meta files can be found under the `train/` folder of each dataset (a meta-file inspection sketch is given after this list).
- Testing configurations:
For each dataset (apart from Uni-Medical, due to its small number of classes), we provide both an offline meta file and a set of online meta files covering different class-arrival orderings.
The online meta files can be found under the `test/` folder of each dataset. Each dataset includes the following 8 configurations.
Please refer to our paper for more details.
- `B`: blurry (completely random)
- `B-HF`: blurry and head class(es) first
- `B-TF`: blurry and tail class(es) first
- `D2-HF`: disjoint with 2 sessions and head class(es) first
- `D2-TF`: disjoint with 2 sessions and tail class(es) first
- `D5-HF`: disjoint with 5 sessions and head class(es) first
- `D5-TF`: disjoint with 5 sessions and tail class(es) first
- `D5-M`: disjoint with 5 sessions and all classes mixed
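
As a sanity check after preparation, the snippet below verifies that `DATASET_ROOT` follows the layout shown above. This is a minimal sketch, not part of the released scripts: only the example class folders named in this README are listed, so extend `EXPECTED` with the full class lists of each dataset for a complete check.

```python
# Minimal layout sanity check (a sketch; only the example class folders
# shown in this README are listed).
from pathlib import Path

DATASET_ROOT = Path("datasets")  # default DATASET_ROOT used in this README

EXPECTED = {
    "mvtec": ["bottle", "cable", "capsule"],
    "visa": ["candle"],
    "dagm": ["Class 1"],
    "uni_medical": ["brain"],
}

for dataset, classes in EXPECTED.items():
    for cls in classes:
        path = DATASET_ROOT / dataset / cls
        status = "ok" if path.is_dir() else "MISSING"
        print(f"[{status}] {path}")
```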
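The snippet below shows one way to peek into the released meta files. It is a sketch under assumptions: the meta files are JSON (as the file extensions suggest), but their internal schema follows LTAD's format and is not reproduced in this README, so the code only loads two example files from the `data/mvtec/` tree above and prints their top-level structure.

```python
# Sketch for inspecting LTOAD meta files. The JSON schema is not documented
# in this README, so we only report the top-level structure of each file.
import json
from pathlib import Path

DATA_ROOT = Path("data")  # the unzipped data/ folder from this release

meta_files = [
    DATA_ROOT / "mvtec" / "metas" / "train" / "exp-100.json",  # long-tailed training split
    DATA_ROOT / "mvtec" / "metas" / "test" / "B.json",         # online testing sequence (blurry)
]

for meta_path in meta_files:
    with open(meta_path) as f:
        meta = json.load(f)
    if isinstance(meta, dict):
        print(f"{meta_path}: dict with keys {list(meta.keys())}")
    elif isinstance(meta, list):
        print(f"{meta_path}: list with {len(meta)} entries")
    else:
        print(f"{meta_path}: {type(meta).__name__}")
```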
Citation
If you use the LTOAD dataset in your research, please cite our paper:
@inproceedings{yang2025ltoad,
author = {Yang, Chiao-An and Peng, Kuan-Chuan and Yeh, Raymond A.},
title = {Toward Long-Tailed Online Anomaly Detection through Class-Agnostic Concepts},
booktitle = {The IEEE/CVF International Conference on Computer Vision (ICCV)},
year = {2025}
}
Copyright and License
The LTOAD dataset is released under the CC-BY-SA-4.0 license.
For the images and annotations from the MVTec, VisA, DAGM, and Uni-Medical datasets, please refer to their websites for their copyright and license terms.
Created by Mitsubishi Electric Research Laboratories (MERL), 2024-2025
SPDX-License-Identifier: CC-BY-SA-4.0
Files
| Name | Size | MD5 checksum |
|---|---|---|
| LTOAD_dataset.zip | 393.7 kB | 14423c25bcdd11f5d933c5b339bc9b9d |