📌 Overview
calibr is an R package that provides tools for experimental calibration, enabling researchers to estimate and update measurement method rankings across various datasets.
We propose a principled approach based on combining Bayes Factors that compare the model induced by a given measurement against the null model, which is simply an intercept (the worst possible measurement method).
The package is particularly useful for combining evidence from multiple studies.
The infrastructure is a verbatim implementation of the methods in Nikolakopoulos, S., & Ntzoufras, I. (2021), "Meta analysis of Bayes factors", arXiv preprint arXiv:2103.13236. The logic underlying the choice of Bayes factors as indicators of retrodictive validity is described in Mancinelli, F., & Bach, D. (2025), "Experiment-based calibration: inference and decision-making".
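To make the model comparison concrete, here is a minimal sketch of how a Bayes factor against an intercept-only null can be approximated for a single measurement in a single dataset, using the standard BIC approximation. This is purely illustrative and is not calibr's implementation; the function name `approx_log_bf` and the simulated variables are made up for this example.

```r
# Illustration only (not calibr's implementation): BIC-based approximation to
# the Bayes factor comparing the model induced by a measurement against an
# intercept-only null model.
approx_log_bf <- function(measurement_value, standard_score) {
  full <- lm(measurement_value ~ standard_score)  # model induced by the measurement
  null <- lm(measurement_value ~ 1)               # intercept-only null
  # log BF10 ≈ (BIC_null - BIC_full) / 2; positive values favour the measurement model
  (BIC(null) - BIC(full)) / 2
}

# Toy data: a measurement whose mean shifts with a binary standard score
set.seed(1)
standard_score    <- rep(c(0, 1), each = 20)
measurement_value <- 50 + 25 * standard_score + rnorm(40, sd = 30)
approx_log_bf(measurement_value, standard_score)
```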
To install calibr from GitLab, run the following commands in R:

```r
# Install necessary dependencies
install.packages("devtools")

# Install from GitLab
devtools::install_git("https://gitlab.com/bachlab/methods/calibr.git")
```
In this example, three datasets from different studies are analyzed in a between-subjects setting: the binary standard score (0/1) splits each study into two groups. For every combination of study × standard score × measurement, we simulate 20 observations. “Measure_A” is simulated to separate the two groups more strongly than “Measure_B” (a larger mean difference at the same noise level), so it should be ranked as the better measurement.
```r
library(dplyr)

set.seed(123)

# Parameters
n_studies <- 3
n_per_group <- 20
datasets <- paste0("Study", 1:n_studies)
measurements <- c("Measure_A", "Measure_B")
standard_scores <- c(0, 1)

# Simulate 20 observations per study x standard score x measurement cell.
# Measure_A shifts by 25 points between groups, Measure_B by only 15 (same sd).
df <- expand.grid(
  dataset = datasets,
  standard_score = standard_scores,
  measurement_name = measurements
) %>%
  rowwise() %>%
  do(data.frame(
    dataset = rep(.$dataset, n_per_group),
    standard_score = rep(.$standard_score, n_per_group),
    measurement_name = rep(.$measurement_name, n_per_group),
    measurement_value =
      if (.$standard_score == 0) {
        if (.$measurement_name == "Measure_A") rnorm(n_per_group, mean = 45, sd = 30) else rnorm(n_per_group, mean = 40, sd = 30)
      } else {
        if (.$measurement_name == "Measure_A") rnorm(n_per_group, mean = 70, sd = 30) else rnorm(n_per_group, mean = 55, sd = 30)
      }
  )) %>%
  ungroup()
```
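A quick sanity check with plain dplyr/tidyr (optional, not part of calibr) confirms that Measure_A separates the two groups more widely than Measure_B in the simulated data:

```r
# Group means per measurement, and the difference between the two groups
df %>%
  group_by(measurement_name, standard_score) %>%
  summarise(mean_value = mean(measurement_value), .groups = "drop") %>%
  tidyr::pivot_wider(names_from = standard_score, values_from = mean_value,
                     names_prefix = "group_") %>%
  mutate(group_difference = group_1 - group_0)
```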
```r
library(calibr)

result <- calibration_summary(
  data = df,
  dataset_col = "dataset",
  measurement_name_col = "measurement_name",
  measurement_values_col = "measurement_value",
  standard_scores_col = "standard_score"
)

# Main summary table (ordered by log BF)
print(result$summary_table)

# Matrix of pairwise log BF differences
print(result$log_bayes_factor_diffs)

# Short statement about the best measurement
cat(result$best_measurement, "\n")

# Subject overlap (if subject IDs were provided)
print(result$subject_overlap)
```
I don’t understand the logic of subject overlap. When can I trust the computations?
`calibr` provides two functions to infer measurement rankings from data: `meta_bf_from_data` and `strict_meta_bf_from_data`.

`meta_bf_from_data` compares Bayes factors assuming data homogeneity. Since not all measurements are always available across datasets, the Bayes factors may not be based on exactly the same data. `strict_meta_bf_from_data` restricts comparisons to datasets where all relevant pairs of measurements are available.

The `subject_overlap` metric in `meta_bf_from_data` quantifies how reliable each comparison is: with complete overlap, Bayes factors are directly comparable.
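To inspect the two aggregation strategies directly, a sketch like the one below can be used. It assumes that `meta_bf_from_data` and `strict_meta_bf_from_data` accept the same data and column arguments as `calibration_summary`; check `?meta_bf_from_data` for the actual signatures.

```r
# Sketch only: argument names are assumed to mirror calibration_summary.
loose <- meta_bf_from_data(
  data = df,
  dataset_col = "dataset",
  measurement_name_col = "measurement_name",
  measurement_values_col = "measurement_value",
  standard_scores_col = "standard_score"
)

strict <- strict_meta_bf_from_data(
  data = df,
  dataset_col = "dataset",
  measurement_name_col = "measurement_name",
  measurement_values_col = "measurement_value",
  standard_scores_col = "standard_score"
)

# With the fully crossed simulation above (every measurement present in every
# study) the two rankings should coincide; they can diverge when some
# measurements are missing from some datasets.
```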
My BF ranking and Cohen’s d ranking don’t perfectly match. Why?
Bayes factors (BFs) and Cohen’s d are not in one-to-one correspondence. Each is combined across datasets with a different weighting scheme, and the two schemes emphasize different dataset features, which can lead to mismatched rankings.
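One general reason for such mismatches is that Bayesian evidence accumulates with sample size at a fixed effect size, whereas Cohen’s d does not. The toy calculation below uses the BIC approximation to the Bayes factor, purely for illustration and unrelated to calibr’s internals, to show how a measurement with a smaller d but much more data can end up with the stronger log Bayes factor:

```r
# Toy illustration (BIC approximation, not calibr's computation):
# log Bayes factor for a two-group mean difference, given Cohen's d and
# the per-group sample size (equal group sizes assumed).
log_bf_from_d <- function(d, n_per_group) {
  n  <- 2 * n_per_group
  t2 <- d^2 * n_per_group / 2                    # t^2 = d^2 * n1*n2 / (n1 + n2)
  # log BF10 ≈ (n/2) * log(1 + t^2 / (n - 2)) - (1/2) * log(n)
  (n / 2) * log(1 + t2 / (n - 2)) - 0.5 * log(n)
}

log_bf_from_d(d = 0.8, n_per_group = 20)   # large effect, small study:   ~ 1.3
log_bf_from_d(d = 0.4, n_per_group = 500)  # smaller effect, large study: ~ 16

# Ranked by Cohen's d the first measurement wins; ranked by log BF the second wins.
```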
What if my standard score isn’t binary?
If your standard score is not binary, Cohen’s d is not meaningful. You will still receive all other outputs.
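If you want to see this in practice, the snippet below jitters the binary standard score from the simulation above into a continuous one (an arbitrary choice, for illustration only) and reruns the same `calibration_summary` call:

```r
# Illustration only: continuous standard score via jittering the binary score
df_cont <- df %>%
  mutate(standard_score = standard_score + rnorm(n(), mean = 0, sd = 0.1))

result_cont <- calibration_summary(
  data = df_cont,
  dataset_col = "dataset",
  measurement_name_col = "measurement_name",
  measurement_values_col = "measurement_value",
  standard_scores_col = "standard_score"
)

# Cohen's d is not meaningful for a continuous standard score; the remaining
# outputs are still returned.
print(result_cont$summary_table)
```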