Unsupervised Anomaly Detection in Multivariate Time Series across Heterogeneous Domains

The widespread adoption of digital services, along with the scale and complexity at which they operate, has made incidents in IT operations increasingly likely, diverse, and impactful. This has driven the rapid development of a central aspect of "Artificial Intelligence for IT Operations" (AIOps), focused on detecting anomalies in the vast amounts of multivariate time series data generated by service entities. In this paper, we begin by introducing a unifying framework for benchmarking unsupervised anomaly detection (AD) methods, and highlight the problem of shifts in normal behaviors that can occur in practical AIOps scenarios. To tackle anomaly detection under domain shift, we then cast the problem in the framework of domain generalization and propose a novel approach, Domain-Invariant VAE for Anomaly Detection (DIVAD), to learn domain-invariant representations for unsupervised anomaly detection. Our evaluation results using the Exathlon benchmark show that the two main DIVAD variants significantly outperform the best unsupervised AD method in maximum performance, with 20% and 15% improvements in maximum peak F1-scores, respectively. Evaluation using the Application Server Dataset further demonstrates the broader applicability of our domain generalization methods.

Please find:

  • Our technical report here, including details about the methods compared and hyperparameter grids considered.

  • The two main modules for our DIVAD method here and here.

  • The scripts we ran to generate our results here.

Please see the OmniAnomaly branch for our integration of the OmniAnomaly method.

Example Usage

Our experiments use the Exathlon benchmark, whose installation is described in the next section. Scripts to reproduce our Spark streaming experiments are located under scripts/spark.

For Dense DIVAD-GM, for instance, we ran:

./train_divad.sh weakly -1 "1 2 3 4 5 6 9 10" os_only 0 0 0.2 random 7 regular_scaling std 1 1 1 1 mean settings-rate dense gm 1 8 1.0 -1 "" "" "" "200" "64" "" "" "" "200" False False False False -1 -1 False "" "" "" "" "" "" 16 -1 -1 False False 0.0 0.0 0.0 normal 5.0 5.0 100 fixed 100000.0 adamw 1e-5 none -1 -1 0.01 1.0 128 300 val_loss 100
./train_divad.sh weakly -1 "1 2 3 4 5 6 9 10" os_only 0 0 0.2 random 7 regular_scaling std 1 1 1 1 mean settings-rate dense gm 1 8 1.0 -1 "" "" "" "200" "64" "" "" "" "200" False False False False -1 -1 False "" "" "" "" "" "" 16 -1 -1 False False 0.0 0.0 0.0 normal 5.0 5.0 100 fixed 100000.0 adamw 3e-5 none -1 -1 0.01 1.0 128 300 val_loss 100
./train_divad.sh weakly -1 "1 2 3 4 5 6 9 10" os_only 0 0 0.2 random 7 regular_scaling std 1 1 1 1 mean settings-rate dense gm 1 8 1.0 -1 "" "" "" "200" "64" "" "" "" "200" False False False False -1 -1 False "" "" "" "" "" "" 16 -1 -1 False False 0.0 0.0 0.0 normal 5.0 5.0 100 fixed 100000.0 adamw 1e-4 none -1 -1 0.01 1.0 128 300 val_loss 100
./train_divad.sh weakly -1 "1 2 3 4 5 6 9 10" os_only 0 0 0.2 random 7 regular_scaling std 1 1 1 1 mean settings-rate dense gm 1 8 1.0 -1 "" "" "" "200" "64" "" "" "" "200" False False False False -1 -1 False "" "" "" "" "" "" 16 -1 -1 False False 0.0 0.0 0.0 normal 5.0 5.0 100 fixed 100000.0 adamw 3e-4 none -1 -1 0.01 1.0 128 300 val_loss 100

We observed that the learning rate of 3e-4 achieved the minimum validation loss, so we then ran:

./evaluate_divad.sh weakly -1 "1 2 3 4 5 6 9 10" os_only 0 0 0.2 random 7 regular_scaling std 1 1 1 1 mean settings-rate dense gm 1 8 1.0 -1 "" "" "" "200" "64" "" "" "" "200" False False False False -1 -1 False "" "" "" "" "" "" 16 -1 -1 False False 0.0 0.0 0.0 normal 5.0 5.0 100 fixed 100000.0 adamw 3e-4 none -1 -1 0.01 1.0 128 300 val_loss 100 prior_nll_of_mean -1 -1 -1 -1
./evaluate_divad.sh weakly -1 "1 2 3 4 5 6 9 10" os_only 0 0 0.2 random 7 regular_scaling std 1 1 1 1 mean settings-rate dense gm 1 8 1.0 -1 "" "" "" "200" "64" "" "" "" "200" False False False False -1 -1 False "" "" "" "" "" "" 16 -1 -1 False False 0.0 0.0 0.0 normal 5.0 5.0 100 fixed 100000.0 adamw 3e-4 none -1 -1 0.01 1.0 128 300 val_loss 100 agg_post_nll_of_mean gm 8 -1 -1

This produces the evaluation results for this method, with anomaly scores based on the prior and the aggregated posterior, respectively.
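As a rough illustration of what a prior-based score means here, below is a minimal sketch of a negative log-likelihood anomaly score under a Gaussian mixture in a one-dimensional latent space. All names are hypothetical and the setup is deliberately simplified; DIVAD itself scores learned multivariate latent codes, and this is not its actual API:

```python
import math

def gm_nll(z, weights, means, variances):
    """Negative log-likelihood of a scalar latent code z under a
    Gaussian mixture (illustrative only, not DIVAD's implementation)."""
    density = 0.0
    for w, mu, var in zip(weights, means, variances):
        density += w * math.exp(-0.5 * (z - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)
    return -math.log(density)

# Two-component mixture standing in for the learned prior over normal data.
weights, means, variances = [0.5, 0.5], [-2.0, 2.0], [1.0, 1.0]

# A latent code near a mixture component gets a low score (normal);
# one far from every component gets a high score (anomalous).
score_normal = gm_nll(2.1, weights, means, variances)
score_anomalous = gm_nll(8.0, weights, means, variances)
```

Scoring against the aggregated posterior follows the same pattern, with the mixture fit to the encoded training data rather than taken from the prior.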

Similarly, scripts to reproduce our ASD experiments are located under scripts/asd.

Exathlon: A Benchmark for Explainable Anomaly Detection over Time Series

Access to high-quality data repositories and benchmarks has been instrumental in advancing the state of the art in many experimental research domains.

Exathlon is a benchmark for explainable anomaly detection over high-dimensional time series data, constructed based on real data traces from repeated executions of large-scale stream processing jobs on an Apache Spark cluster. For some of these executions, we introduced instances of six different types of anomalous events, for which we provide ground truth labels to evaluate a wide range of anomaly detection (AD) and explanation discovery (ED) methods.

This repository contains the labeled dataset and source code for comparing various AD and ED methods under our evaluation framework.

Description and documentation.

Project Configuration

The data traces and ground truth table were uploaded as zip files under the data/raw directory of the Exathlon repository. To extract them on Linux, macOS, or using Git Bash on Windows, execute the extract_data script from the project root folder:

$ ./extract_data.sh

This will extract all data files inside data/raw, preserving the original directory structure. The content of data/raw can then either be left there or moved to any other location. In all cases, the full path to the extracted raw data must be provided in the DATA_ROOT entry of the .env file described below.

Please refer to the dataset documentation for additional details regarding the dataset's content and format.

Create and activate a dedicated environment using virtualenv or conda by running from the project root folder:

## OPTION 1: virtualenv
$ virtualenv -p 3.9.19 venv
$ source venv/Scripts/activate # Windows with Git Bash or WSL
$ venv/Scripts/activate.bat # Windows with cmd
$ source venv/bin/activate # POSIX systems

## OPTION 2: conda
$ conda create -n exathlon python=3.9.19
$ conda activate exathlon

Then install the project and dependencies by running:

$ pip install -e .[all]

where [all] includes:

  • [dev]: development dependencies (formatting and testing).
  • [docs]: dependencies for building documentation.
  • [profiling]: memory profiling dependencies.
  • [notebooks]: dependencies only used for/in notebooks.

At the root of the project folder, create a .env file containing the lines:

OUTPUTS=path/to/pipeline/outputs
SPARK=path/to/extracted/data/raw
{OTHER_NAME}=path/to/other/data
...

"Outputs" refer to all the outputs that will be produced by Exathlon's pipeline, including intermediate and fully processed data, models, model information and final results.

Note: Running this Project on Windows

Some results and logging paths might exceed the historical Windows path length limit of 260 characters, leading to errors when running the pipeline. To avoid this, we advise disabling this limitation by following the procedure described in the official Python documentation:

Windows historically has limited path lengths to 260 characters. This meant that paths longer than this would not resolve and errors would result.

In the latest versions of Windows, this limitation can be expanded to approximately 32,000 characters. Your administrator will need to activate the “Enable Win32 long paths” group policy, or set LongPathsEnabled to 1 in the registry key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem.

This allows the open() function, the os module and most other path functionality to accept and return paths longer than 260 characters.

After changing the above option, no further configuration is required.

Changed in version 3.6: Support for long paths was enabled in Python.

Data License

The provided dataset is licensed under a CC BY-NC-SA 4.0 license.
