The ReSi benchmark provides a unified framework to quantitatively compare a wide array of representational similarity measures. It comprises 24 similarity measures, comes with 14 different architectures, and spans the Vision, Language, and Graph domains. [OpenReview]
In the following we explain:
- How to set up the benchmark
- How to use the benchmark (e.g., how to reproduce the results from the paper)
- How to extend the benchmark with additional measures
# Download the repository
git clone <REPOSITORY_URL> resi # Redacted for anonymity. You can download the code by clicking on "Download Repository" on the top right.
cd resi
# Create a virtual environment with Python 3.10
# With Conda ...
conda create -n ENVNAME python=3.10 cmake # cmake is required for rtd
conda activate ENVNAME
# ... or venv
python -m venv .venv
source .venv/bin/activate
# Afterwards install the requirements and repository
pip install -r requirements.txt
pip install -e .
The `REP_SIM` environment variable defines the `EXPERIMENT_PATH` in which all results and models are saved. If it is not specified, the `experiments/` subdirectory will be used as `EXPERIMENT_PATH` by default.
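Conceptually, the path resolution behaves like the sketch below (the authoritative logic lives in `repsim/benchmark/paths.py` and may differ in detail):

import os
from pathlib import Path

# Use REP_SIM if it is set, otherwise fall back to the experiments/ subdirectory.
EXPERIMENT_PATH = Path(os.environ.get("REP_SIM", "experiments"))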
If you want to rerun our experiments from scratch, or test a measure that you have implemented, you first need to download the necessary models and datasets.
Most datasets will be automatically downloaded from huggingface (SST2, MNLI), pytorch geometric (Cora, flickr), or ogb (ogbn-arxiv) once the first attempt to use them is made. All datasets are saved in `EXPERIMENT_PATH/datasets/{nlp,graphs,vision}`, depending on the domain.
For vision, you need to manually download the ImageNet100 dataset from kaggle due to license restrictions. After downloading, move `train.X1`-`train.X4` into one joint `train` folder and rename `val.X` to `val`, so that each of the two folders contains 100 class folders. The dataset should then be named `Imagenet100` and located in the directory specified by `VISION_DATA_PATH` (should be `<EXPERIMENT_PATH>/datasets/vision` -- see `repsim/benchmark/paths.py` for details).
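As a rough illustration, the restructuring could be scripted like this (a sketch assuming the archive was extracted into the `Imagenet100` directory and that the split folders are literally named `train.X1` to `train.X4` and `val.X`):

import shutil
from pathlib import Path

# Hypothetical dataset location; adjust to your VISION_DATA_PATH.
root = Path("experiments/datasets/vision/Imagenet100")

# Merge the four training splits into a single train/ folder.
train_dir = root / "train"
train_dir.mkdir(exist_ok=True)
for split in sorted(root.glob("train.X*")):
    for class_folder in split.iterdir():
        shutil.move(str(class_folder), str(train_dir / class_folder.name))
    split.rmdir()

# Rename the validation split.
(root / "val.X").rename(root / "val")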
To get all relevant models, you need to download the model files from Zenodo and unpack the zipped files into the corresponding subdirectories of `EXPERIMENT_PATH/models`:
- Language and graphs: https://doi.org/10.5281/zenodo.11565486. Move the content of `models/nlp` inside `nlp_data.zip` into `EXPERIMENT_PATH/models/nlp`, and move the models inside `graph_data.zip` into `EXPERIMENT_PATH/models/graphs` (see the sketch after this list). The SmolLM2 models need to be downloaded separately due to their size (414GB in total) from https://console.share.innkube.fim.uni-passau.de/browser/public/resi-benchmark%2F.
- Vision: Run the script `python vision/download_vision_model.py` to auto-download and extract all the models. Alternatively, you can manually download the files from Zenodo (Part 1/Part 2), extract them, and move them into a directory named `<VISION_MODEL_PATH>/vision_models_simbench/` (check `paths.py` from earlier for `VISION_DATA_PATH` details). The auto-download only fetches the ImageNet100 models; if you want the CIFAR 100 models, you need to manually download them from Zenodo (Part 3 (first CIFAR 100)/Part 4/Part 5/Part 6).
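For the Zenodo archives, unpacking could look roughly like the sketch below (an assumption-laden sketch: it presumes `nlp_data.zip` contains a `models/nlp` folder as described above and that the graph models sit at the top level of `graph_data.zip`; adjust file names and paths to your setup):

import shutil
import zipfile
from pathlib import Path

models_dir = Path("experiments/models")  # EXPERIMENT_PATH/models
models_dir.mkdir(parents=True, exist_ok=True)

# nlp_data.zip: move the models/nlp folder into EXPERIMENT_PATH/models/nlp.
with zipfile.ZipFile("nlp_data.zip") as zf:
    zf.extractall("nlp_data_unpacked")
shutil.move("nlp_data_unpacked/models/nlp", str(models_dir / "nlp"))

# graph_data.zip: the graph models are assumed to sit at the archive's top level.
with zipfile.ZipFile("graph_data.zip") as zf:
    zf.extractall(str(models_dir / "graphs"))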
The results from all our experiments are stored in a `results.parquet` file, which you can download from Zenodo. You need this file if you want to easily test a new similarity measure and compare it to the existing results. Download the file and place it in the `EXPERIMENT_PATH/results/` directory.
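Once the file is in place, it can be loaded like any other parquet file, e.g. with pandas (a minimal sketch; inspect the columns rather than relying on specific names):

import pandas as pd

# Load the precomputed benchmark results for comparison with your own runs.
results = pd.read_parquet("experiments/results/results.parquet")
print(results.shape)
print(results.columns.tolist())  # inspect the available columns
print(results.head())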
The main way to (re)run experiments from the benchmark is to set up a `config.yaml` file and then simply run
python3 -m repsim.run -c path/to/config.yaml
In the `configs/` subdirectory, you can find all the config files necessary to reproduce our experiments.
As a quick example, we also provide a demo config that runs the augmentation affinity test on Cora, using GCN, GraphSAGE, and GAT models, and applying all benchmark measures except PWCCA, which often fails to converge, and RSM norm difference and IMD score, which take relatively long to compute. This test should finish within a few minutes:
python3 -m repsim.run -c configs/demo_augmentation_test_cora.yaml
If you only want to run experiments on specific tests or domains, you need to modify an existing config or create a new one. This example config documents its different parts.
If you want to run multiple experiments in parallel, it is crucial that they NEVER write to the same results parquet file (specified by `raw_results_filename` in the configs) at the same time. Otherwise, the file can become corrupted and unreadable. It is, however, no issue to write to an already existing parquet file with a single process - this will simply append the new results.
Regarding the CSVs of (aggregated) results, which are specified in the configs under `table_creation` -> `filename` and `full_df_filename`, keep in mind that existing files will be overwritten.
NOTE: The config files provided in the `configs` directory were designed such that no such overwriting can occur, and thus they can safely be run in parallel.
For the graph domain, another option to (re)run individual tests for all the graph models (GCN, GraphSAGE, GAT) on a given dataset is to run
python3 -m repsim.run_graphs -t {test_name} -d {dataset} [-m {measures}]
Implicitly, this script creates a config file as described above, which is then used to run the test. The config files stored in the `configs` directory were also generated with this script.
Valid dataset names are `cora`, `flickr`, and `ogbn-arxiv`; valid test names are `label_test`, `shortcut_test`, `augmentation_test`, `layer_test`, and `output_correlation_test`, where the latter runs Tests 1 and 2 from our benchmark simultaneously.
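For example, a call like `python3 -m repsim.run_graphs -t augmentation_test -d cora` should rerun the augmentation test for all graph models on Cora with the default set of measures.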
The argument for measures is optional; by default, all measures that are registered under `ALL_MEASURES` in the `repsim.measures` module will be used. In this case, results will be saved into files called `graphs_{test_name}_{dataset}.parquet`, `graphs_{test_name}_{dataset}.csv` (`filename`), and `graphs_{test_name}_{dataset}_full.csv` (`full_df_filename`).
When the measures to use are specified explicitly, the corresponding measure names will be appended to the result file names to avoid files overwriting each other (cf. Section 2.3 above).
The name of the generated config file will follow the same pattern.
To merge all the parquet files you have produced into a single file, you can use this script.
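If you prefer to do the merge ad hoc, it boils down to concatenating the parquet files, e.g. (a sketch with hypothetical paths; the provided script remains the recommended way):

import glob

import pandas as pd

# Collect all per-run parquet files and write one combined file.
paths = glob.glob("experiments/results/*.parquet")
merged = pd.concat([pd.read_parquet(p) for p in paths], ignore_index=True)
merged.to_parquet("experiments/results/merged_results.parquet")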
To plot the results, the CSV files with the full results are used (`full_df_filename` in the config). The results from the paper are available in `experiments/paper_results`.
This notebook can be used to create the overview table as well as the plots of the rank distribution in the paper.
If you want to use our benchmark with a measure that has not been implemented yet, you can easily add your measure to the benchmark with the following steps:
Add a Python script `your_measure.py` to the `repsim.measures` module, in which your similarity measure will be implemented.
In your script, you need to implement your similarity measure in a function with the following signature:
def your_measure(
R: Union[torch.Tensor, npt.NDArray],
Rp: Union[torch.Tensor, npt.NDArray],
shape: SHAPE_TYPE,
) -> float:
where the `shape` parameter of type `SHAPE_TYPE = Literal["nd", "ntd", "nchw"]` defines the input format of the given representations: `"nd"` denotes plain two-dimensional matrices (instances x dimensions), `"ntd"` corresponds to language representations with an additional token dimension, and `"nchw"` corresponds to vision representations with channel, height, and width dimensions. If your measure only supports the `"nd"` format, you can use the `flatten` function that we provide in `repsim.measures.utils`. We further provide additional functions for preprocessing/normalizing inputs in this module.
You can skip this step if you implement the function inside the `__call__` method, as described in the next step.
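For illustration, a toy measure following this signature could look like the sketch below. It is not part of the benchmark, and the small `to_2d` helper only stands in for the `flatten` utility mentioned above:

from typing import Literal, Union

import numpy as np
import numpy.typing as npt
import torch

SHAPE_TYPE = Literal["nd", "ntd", "nchw"]


def to_2d(R: Union[torch.Tensor, npt.NDArray], shape: SHAPE_TYPE) -> np.ndarray:
    # Convert to a 2D (instances x features) numpy array; the benchmark's own
    # flatten function in repsim.measures.utils serves the same purpose.
    R = R.detach().cpu().numpy() if isinstance(R, torch.Tensor) else np.asarray(R)
    if shape in ("ntd", "nchw"):
        R = R.reshape(R.shape[0], -1)
    return R


def gram_frobenius_distance(
    R: Union[torch.Tensor, npt.NDArray],
    Rp: Union[torch.Tensor, npt.NDArray],
    shape: SHAPE_TYPE,
) -> float:
    # Frobenius norm of the difference between the instance-wise Gram matrices.
    # Lower values indicate more similar representations; the measure is symmetric.
    R, Rp = to_2d(R, shape), to_2d(Rp, shape)
    return float(np.linalg.norm(R @ R.T - Rp @ Rp.T, ord="fro"))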
Next, wrap your measure into a subclass of `RepresentationalSimilarityMeasure`, which can be imported from `repsim.benchmark.utils`. The class specifies several properties of the similarity measure, so that its values can be correctly interpreted in the benchmark framework. Particularly important are `larger_is_more_similar` and `is_symmetric`. If you are unsure about the other properties, specify them as `False`. To wrap your function into such a class, you can use the following template:
class YourMeasure(RepresentationalSimilarityMeasure):
    def __init__(self):
        super().__init__(
            sim_func=your_measure,
            larger_is_more_similar=False,  # Fill in True iff for your measure, higher values indicate more similarity
            is_metric=True,  # Fill in True iff your measure satisfies the properties of a distance metric
            is_symmetric=True,  # Fill in True iff your measure is symmetric, i.e., m(R, Rp) = m(Rp, R)
            invariant_to_affine=True,  # Fill in True iff your measure is invariant to affine transformations
            invariant_to_invertible_linear=True,  # Fill in True iff your measure is invariant to invertible linear transformations
            invariant_to_ortho=True,  # Fill in True iff your measure is invariant to orthogonal transformations
            invariant_to_permutation=True,  # Fill in True iff your measure is invariant to permutations
            invariant_to_isotropic_scaling=True,  # Fill in True iff your measure is invariant to isotropic scaling
            invariant_to_translation=True,  # Fill in True iff your measure is invariant to translations
        )

    def __call__(self, R: torch.Tensor | npt.NDArray, Rp: torch.Tensor | npt.NDArray, shape: SHAPE_TYPE) -> float:
        # Here you can, in principle, already conduct some preprocessing, such as aligning spatial dimensions for vision inputs.
        return self.sim_func(R, Rp, shape)
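As a quick sanity check, you can call the wrapped measure on random representations (a sketch assuming `sim_func` points to a function that accepts plain numpy arrays):

import numpy as np

measure = YourMeasure()
R = np.random.rand(50, 16)   # 50 instances, 16 feature dimensions
Rp = np.random.rand(50, 16)
print(measure(R, Rp, shape="nd"))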
Open `repsim/benchmark/__init__.py`, import the `YourMeasure` class, and append it to the `CLASSES` list defined in this script - this will also automatically append it to `ALL_MEASURES`, the list of measures considered in our benchmark. Your measure is now registered in our benchmark and can, for instance, be explicitly included in the `config.yaml` files via its class name.
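In code, this registration amounts to roughly the following (a sketch; the actual contents of `__init__.py` and the import path of your script will differ):

# repsim/benchmark/__init__.py (excerpt, sketch)
from repsim.measures.your_measure import YourMeasure  # assumes your script lives in repsim.measures

CLASSES = [
    # ... existing measure classes ...
    YourMeasure,
]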
Our code is available under CC-BY 4.0.