Data and environmental setup to run benchmarks against blosc2
!!! note
    This section is included for posterity only; the processed files are already provided in the repo.
First, install GDAL to parse the input files.
```
sudo dnf install gdal-devel
```
Grab the input files from the respective agencies, then run the processing scripts to produce the datasets:

```
create_rea6 TOT_PRECIP.2D.201512.grb
create_era5 data.grib
```
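The `create_*` scripts themselves are not reproduced here, but the idea is to flatten every 2-D sample in the GRIB input into one raw binary file of float64 values. The sketch below shows that assumed output layout using a zero-filled stand-in for a GRIB band; the GDAL calls that would supply real data are noted in comments only, and the file name is illustrative:

```python
from array import array

# Hypothetical sketch of the raw layout the create_* scripts are
# assumed to emit: each 2-D sample flattened row-major and appended
# as raw float64 values.  In the real scripts a sample would come
# from GDAL, roughly:
#   ds = gdal.Open("TOT_PRECIP.2D.201512.grb")
#   band = ds.GetRasterBand(i).ReadAsArray()
X, Y = 848, 824                        # COSMO-REA6 precipitation grid
sample = array("d", [0.0] * (X * Y))   # one flat sample, 8 bytes/value

with open("REA6_precip_demo.bin", "wb") as f:
    for _ in range(3):                 # 744 samples in the real dataset
        f.write(sample.tobytes())
```

Each sample then occupies exactly X × Y × 8 bytes on disk.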
The REA6_precip dataset is taken from *Breaking Down Memory Walls*; the code they used is here. Instead of evaluating only the small 20 KiB section of a single sample, we evaluate the entire 744-sample COSMO-REA6 precipitation dataset.
The other datasets are taken from the bytedelta analysis. The code they used no longer works, and they do not publish the English names of their datasets, so we made our best guess at which datasets they pulled. We use the following variables from the ERA5 reanalysis (our file prefixes in parentheses):
- 10 metre u wind component (ERA5_wind)
- Mean sea level pressure (ERA5_pressure)
- Total precipitation (ERA5_precip)
- Downward UV radiation at the surface (ERA5_flux)
- Snow density (ERA5_snow)
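For reference, these fields can be requested through the Copernicus CDS API. The request below is our assumption, not taken from the bytedelta analysis: the variable identifiers are our guesses at the CDS names and the date/time fields are placeholders, so verify both against the CDS catalogue before use.

```python
# Assumed CDS API request for the ERA5 fields listed above.
# Variable identifiers are guesses; date/time are placeholders.
request = {
    "product_type": "reanalysis",
    "variable": [
        "10m_u_component_of_wind",               # ERA5_wind
        "mean_sea_level_pressure",               # ERA5_pressure
        "total_precipitation",                   # ERA5_precip
        "downward_uv_radiation_at_the_surface",  # ERA5_flux
        "snow_density",                          # ERA5_snow
    ],
    "year": "2015",
    "month": "12",
    "day": "01",
    "time": "00:00",
    "format": "grib",
}

# The actual download (requires CDS credentials):
# import cdsapi
# cdsapi.Client().retrieve("reanalysis-era5-single-levels", request, "data.grib")
```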
| File prefix | Size (bytes) | X | Y |
|---|---|---|---|
| REA6_precip | 5590016 | 848 | 824 |
| ERA5_* | 8305920 | 1440 | 721 |
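The sizes in the table are consistent with each test file holding one X × Y grid of 8-byte (float64) values, i.e. size = X × Y × 8. A quick sanity check:

```python
# Each test file holds one X-by-Y grid of float64 values,
# so its size should be exactly X * Y * 8 bytes.
datasets = {
    "REA6_precip": (848, 824, 5_590_016),
    "ERA5_*":      (1440, 721, 8_305_920),
}
for name, (x, y, size) in datasets.items():
    assert x * y * 8 == size, name
```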
Next, create a Python virtual environment and install blosc2-btune into it:

```
pip install blosc2-btune
```
Run the benchmarks by passing the directory of the dataset to run against, for example:

```
./roundtrip /path/to/REA6_precip/
```
This writes a CSV file named `stats.csv` into the dataset directory containing the test files.
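The results can then be loaded with the standard `csv` module. The column names in this sketch are purely illustrative (the actual header written by `roundtrip` is not documented here); the snippet fabricates a two-row file just to show the loading pattern:

```python
import csv

# Illustrative columns only -- the real header written by
# ./roundtrip may differ.
with open("stats_demo.csv", "w", newline="") as f:
    w = csv.writer(f)
    w.writerow(["file", "codec", "ratio"])
    w.writerow(["ERA5_wind.bin", "zstd", "3.1"])

# Load the results as one dict per row, keyed by the header.
with open("stats_demo.csv", newline="") as f:
    rows = list(csv.DictReader(f))
```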