thu-ml · MischaPanch · Jun 23, 2025 · Mar 4, 2025
diff --git a/.devcontainer/devcontainer.json b/.devcontainer/devcontainer.json
@@ -0,0 +1,22 @@
+{
+    "name": "Tianshou",
+    "dockerFile": "../Dockerfile",
+    "workspaceFolder": "/workspaces/tianshou",
+    "runArgs": ["--shm-size=1g"],
+    "customizations": {
+      "vscode": {
+        "settings": {
+          "terminal.integrated.shell.linux": "/bin/bash",
+          "python.pythonPath": "/usr/local/bin/python"
+        },
+        "extensions": [
+          "ms-python.python",
+          "ms-toolsai.jupyter",
+          "ms-python.vscode-pylance"
+        ]
+      }
+    },
+    "forwardPorts": [],
+    "postCreateCommand": "poetry install --with dev",
+    "remoteUser": "root"
+  }
diff --git a/.dockerignore b/.dockerignore
@@ -0,0 +1,14 @@
+data
+logs
+test/log
+docs/jupyter_execute
+docs/.jupyter_cache
+.lsp
+.clj-kondo
+docs/_build
+coverage*
+__pycache__
+*.egg-info
+*.egg
+.*cache
+dist
diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md
@@ -2,6 +2,7 @@
 - [ ] I have provided a description of the changes in this Pull Request
 - [ ] I have added documentation for my changes and have listed relevant changes in CHANGELOG.md
 - [ ] If applicable, I have added tests to cover my changes.
+- [ ] If applicable, I have made sure that the determinism tests pass, meaning that my changes have not altered any aspect of training. See the contributing documentation for details.
 - [ ] I have reformatted the code using `poe format` 
 - [ ] I have checked style and types with `poe lint` and `poe type-check`
 - [ ] (Optional) I ran tests locally with `poe test` 

diff --git a/.gitignore b/.gitignore
@@ -158,4 +158,7 @@ docs/conf.py
 
 # temporary scripts (for ad-hoc testing), temp folder
 /temp
-/temp*.py
+/temp*.py
+
+# determinism test snapshots
+/test/resources/determinism/
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,23 +1,62 @@
 # Changelog
 
-## Unreleased
+## Upcoming Release 1.2.0
 
 ### Changes/Improvements
 
-- trainer:
+- `trainer`:
     - Custom scoring now supported for selecting the best model. #1202
-- highlevel:
+- `highlevel`:
     - `DiscreteSACExperimentBuilder`: Expose method `with_actor_factory_default` #1248 #1250
-
+    - `ActorFactoryDefault`: Fix parameters for hidden sizes and activation not being 
+      passed on in the discrete case (affects `with_actor_factory_default` method of experiment builders)
+    - `ExperimentConfig`: Do not inherit from other classes, as this breaks automatic handling by
+      `jsonargparse` when the class is used to define interfaces (as in high-level API examples)
+    - `AutoAlphaFactoryDefault`: Differentiate discrete and continuous action spaces
+      and allow coefficient to be modified, adding an informative docstring
+      (previous implementation was reasonable only for continuous action spaces)
+        - Adjust usage in `atari_sac_hl` example accordingly.
+    - `NPGAgentFactory`, `TRPOAgentFactory`: Fix optimizer instantiation including the actor parameters
+      (which was misleadingly suggested in the docstring in the respective policy classes; docstrings were fixed),
+      as the actor parameters are intended to be handled via natural gradients internally
+- `data`:
+    - `ReplayBuffer`: Fix collection of empty episodes being disallowed 
+    - Collection was slow due to `isinstance` checks on Protocols and due to buffer integrity validation. This was solved
+      by no longer performing `isinstance` checks on Protocols and by disabling the integrity validation by default.
+- Tests:
+    - We have introduced extensive **determinism tests** which make it possible to validate whether
+      training processes deterministically compute the same results across different development branches.
+      This is an important step towards ensuring reproducibility and consistency, which will be 
+      instrumental in supporting Tianshou developers in their work, especially in the context of
+      algorithm development and evaluation. 
+
 ### Breaking Changes
 
-- data:
-    - stats:
-        - `InfoStats` has a new non-optional field `best_score` which is used
-          for selecting the best model. #1202
+- `trainer`:
+    - `BaseTrainer.run` and `__iter__`: Resetting was never optional prior to running the trainer,
+      yet the recently introduced parameter `reset_prior_to_run` of `run` suggested that it _was_ optional.
+      In practice, the parameter was not respected, because `__iter__` would always call `reset(reset_collectors=True, reset_buffer=False)`
+      regardless. The parameter was removed; instead, the parameters of `run` now mirror the parameters of `reset`,
+      and the implicit `reset` call in `__iter__` was removed.     
+      This aligns with upcoming changes in Tianshou v2.0.0.  
+        * NOTE: If you have been using a trainer without calling `run` but by directly iterating over it, you
+          will need to call `reset` on the trainer explicitly before iterating over the trainer.
+        * Using a trainer as an iterator is considered deprecated and support for this will be removed in Tianshou v2.0.0.
+- `data`:
+    - `InfoStats` has a new non-optional field `best_score` which is used
+      for selecting the best model. #1202
+- `highlevel`:
+    - Seeding is handled differently: the mechanism introduced in v1.1.0
+      was completely revised.
+        - The `train_seed` and `test_seed` attributes were removed from `SamplingConfig`.
+          Instead, the seeds are derived from the seed defined in `ExperimentConfig`.
+        - Seed attributes of `EnvFactory` classes were removed. 
+          Instead, seeds are passed to methods of `EnvFactory`.
 
 ## Release 1.1.0
 
+**NOTE**: This release introduced (potentially severe) performance regressions in data collection; please switch to a newer release for better performance.
+
 ### Highlights
 
 #### Evaluation Package

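Reviewer note on the `BaseTrainer` breaking change listed in the changelog above: the sketch below illustrates the new contract with a toy class. `ToyTrainer` and all of its attributes are hypothetical and are not Tianshou's actual trainer; in particular, the `RuntimeError` guard is an assumption made for illustration. Only the idea is taken from the changelog entry: iteration no longer resets implicitly, while `run` resets and mirrors the parameters of `reset`.

```python
class ToyTrainer:
    """Hypothetical stand-in for a trainer; NOT Tianshou's BaseTrainer."""

    def __init__(self, max_epoch: int) -> None:
        self.max_epoch = max_epoch
        self.epoch = 0
        self._is_reset = False

    def reset(self, reset_collectors: bool = True, reset_buffer: bool = False) -> None:
        # A real trainer would reset collectors/buffers here; the toy only resets its counter.
        self.epoch = 0
        self._is_reset = True

    def __iter__(self) -> "ToyTrainer":
        # No implicit reset any more: the caller is responsible for calling reset().
        return self

    def __next__(self) -> int:
        if not self._is_reset:
            # Assumption for illustration: fail loudly if reset() was skipped.
            raise RuntimeError("call reset() before iterating over the trainer")
        if self.epoch >= self.max_epoch:
            raise StopIteration
        self.epoch += 1
        return self.epoch

    def run(self, reset_collectors: bool = True, reset_buffer: bool = False) -> list[int]:
        # run() mirrors the parameters of reset() and performs the reset itself.
        self.reset(reset_collectors=reset_collectors, reset_buffer=reset_buffer)
        return list(self)


# Preferred usage: run() takes care of resetting.
epochs_via_run = ToyTrainer(max_epoch=3).run()

# Deprecated iterator usage: reset() must now be called explicitly first.
trainer = ToyTrainer(max_epoch=2)
trainer.reset()
epochs_via_iter = [epoch for epoch in trainer]
```

Code that iterates over a trainer directly would need the explicit `reset()` call shown in the second usage; code that calls `run` needs no change beyond dropping `reset_prior_to_run`.
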
diff --git a/Dockerfile b/Dockerfile
@@ -0,0 +1,42 @@
+# Use the official Python image for the base image.
+FROM --platform=linux/amd64 python:3.11-slim
+
+# Set environment variables to make Python print directly to the terminal and avoid .pyc files.
+ENV PYTHONUNBUFFERED=1
+ENV PYTHONDONTWRITEBYTECODE=1
+
+# Install system dependencies required for the project.
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    curl \
+    build-essential \
+    git \
+    wget \
+    unzip \
+    libvips-dev \
+    gnupg2 \
+    && rm -rf /var/lib/apt/lists/*
+
+
+# Install pipx.
+RUN python3 -m pip install --no-cache-dir pipx \
+    && pipx ensurepath
+
+# Add poetry to the path
+ENV PATH="${PATH}:/root/.local/bin"
+
+# Install the latest version of Poetry using pipx.
+RUN pipx install poetry
+
+# Set the working directory. IMPORTANT: this must not be changed, as it needs to match the directory
+# into which the project is cloned in the codespace.
+WORKDIR /workspaces/tianshou
+
+# Copy pyproject.toml, poetry.lock (if available) and README.md into the image.
+COPY pyproject.toml poetry.lock* README.md /workspaces/tianshou/
+
+RUN poetry config virtualenvs.create false
+RUN poetry install --no-root --with dev
+
+# The entrypoint performs an editable install; the code is expected to be mounted into the container.
+# If you don't want to mount the code, override the entrypoint.
+ENTRYPOINT ["/bin/bash", "-c", "poetry install --with dev && poetry run jupyter trust notebooks/*.ipynb docs/02_notebooks/*.ipynb && $0 $@"]
diff --git a/docs/04_contributing/04_contributing.rst b/docs/04_contributing/04_contributing.rst
@@ -2,11 +2,12 @@ Contributing to Tianshou
 ========================
 
 
-Install Develop Version
------------------------
+Install Development Environment
+-------------------------------
 
 Tianshou is built and managed by `poetry <https://python-poetry.org/>`_. For example,
-to install all relevant requirements in editable mode you can simply call
+to install all relevant requirements (and install Tianshou itself in editable mode)
+you can simply call
 
 .. code-block:: bash
 
@@ -36,9 +37,9 @@ Please set up pre-commit by running
 in the main directory. This should make sure that your contribution is properly
 formatted before every commit.
 
-The code is inspected and formatted by `black` and `ruff`. They are executed as
-pre-commit hooks. In addition, `poe the poet` tasks are configured.
-Simply run `poe` to see the available tasks.
+The code is inspected and formatted by ``black`` and ``ruff``. They are executed as
+pre-commit hooks. In addition, ``poe the poet`` tasks are configured.
+Simply run ``poe`` to see the available tasks.
 E.g, to format and check the linting manually you can run:
 
 .. code-block:: bash
@@ -47,8 +48,8 @@ E.g, to format and check the linting manually you can run:
     $ poe lint
 
 
-Type Check
-----------
+Type Checks
+-----------
 
 We use `mypy <https://github.com/python/mypy/>`_ to check the type annotations. To check, in the main directory, run:
 
@@ -57,8 +58,8 @@ We use `mypy <https://github.com/python/mypy/>`_ to check the type annotations.
     $ poe type-check
 
 
-Test Locally
-------------
+Testing Locally
+---------------
 
 This command will run automatic tests in the main directory
 
@@ -67,6 +68,30 @@ This command will run automatic tests in the main directory
     $ poe test
 
 
+Determinism Tests
+~~~~~~~~~~~~~~~~~
+
+We implemented "determinism tests" for Tianshou's algorithms, which allow us to determine
+whether algorithms still compute exactly the same results even after large refactorings.
+These tests are applied by
+
+  1. creating a behavior snapshot in the old code branch before the changes and then
+  2. running the test in the new branch to ensure that the behavior is the same.
+
+Unfortunately, full determinism is difficult to achieve across different platforms and even different
+machines using the same platform and Python environment.
+Therefore, these tests are not carried out in the CI pipeline.
+Instead, it is up to the developer to run them locally and check the results whenever a change
+is made to the code base that could affect algorithm behavior.
+
+Technically, the two steps are handled by setting static flags in class ``AlgorithmDeterminismTest`` and then
+running either the full test suite or a specific determinism test (``test_*_determinism``, e.g. ``test_ddpg_determinism``)
+in the two branches to be compared.
+
+  1. On the old branch: (Temporarily) set ``ENABLED=True`` and ``FORCE_SNAPSHOT_UPDATE=True`` and run the test(s).
+  2. On the new branch: (Temporarily) set ``ENABLED=True`` and ``FORCE_SNAPSHOT_UPDATE=False`` and run the test(s).
+  3. Inspect the test results; you can find a summary in ``determinism_tests.log``.
+
 Test by GitHub Actions
 ----------------------
 

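Reviewer note on the "Determinism Tests" section added above: the two-step snapshot workflow can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation of `AlgorithmDeterminismTest`. The functions `train` and `check_determinism`, the JSON snapshot format, and the choice to create a missing snapshot automatically are all hypothetical; only the `force_snapshot_update` flag is modelled on the `FORCE_SNAPSHOT_UPDATE` setting described in the docs.

```python
import json
import random
import tempfile
from pathlib import Path


def train(seed: int, steps: int = 5) -> list[float]:
    """Toy 'training run': a seeded sequence standing in for logged training values."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(steps)]


def check_determinism(
    name: str,
    result: list[float],
    snapshot_dir: Path,
    force_snapshot_update: bool,
) -> bool:
    """Store a snapshot (old branch) or compare against it (new branch)."""
    snapshot_file = snapshot_dir / f"{name}.json"
    if force_snapshot_update or not snapshot_file.exists():
        # Step 1: record the behavior of the old code.
        snapshot_file.write_text(json.dumps(result))
        return True
    # Step 2: the new code must reproduce the stored values exactly.
    return json.loads(snapshot_file.read_text()) == result


with tempfile.TemporaryDirectory() as tmp:
    snapshots = Path(tmp)
    # "Old branch": create the snapshot.
    check_determinism("toy", train(seed=0), snapshots, force_snapshot_update=True)
    # "New branch" with unchanged behavior: comparison succeeds.
    same = check_determinism("toy", train(seed=0), snapshots, force_snapshot_update=False)
    # "New branch" with altered behavior (simulated by a different seed): comparison fails.
    changed = check_determinism("toy", train(seed=1), snapshots, force_snapshot_update=False)
```

The comparison is exact rather than tolerance-based, which matches the stated goal of detecting whether algorithms "still compute exactly the same results"; it is also why such snapshots are only meaningful when both runs happen on the same machine.
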
diff --git a/docs/autogen_rst.py b/docs/autogen_rst.py
@@ -114,8 +114,8 @@ def make_rst(src_root, rst_root, clean=False, overwrite=False, package_prefix=""
                 for f in files_in_dir
                 if os.path.isdir(os.path.join(root, dirname, f)) and not f.startswith("_")
             ]
-            if not module_names:
-                log.debug(f"Skipping {dirname} as it does not contain any .py files")
+            if not module_names and "__init__.py" not in files_in_dir:
+                log.debug(f"Skipping {dirname} as it does not contain any modules or __init__.py")
                 continue
             package_qualname = f"{base_package_qualname}.{dirname}"
             package_index_rst_path = os.path.join(

diff --git a/docs/create_toc.py b/docs/create_toc.py
@@ -3,6 +3,6 @@
 
 # This script provides a platform-independent way of making the jupyter-book call (used in pyproject.toml)
 toc_file = Path(__file__).parent / "_toc.yml"
-cmd = f"jupyter-book toc from-project docs -e .rst -e .md -e .ipynb  >{toc_file}"
+cmd = f'jupyter-book toc from-project docs -e .rst -e .md -e .ipynb  >"{toc_file}"'
 print(cmd)
 os.system(cmd)
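
Reviewer note on the quoting fix in `docs/create_toc.py`: quoting `{toc_file}` handles paths containing spaces when the command goes through a shell. An alternative, sketched below, avoids shell quoting altogether by passing an argument list to `subprocess.run` and redirecting stdout to the file from Python. This is not what the PR does; `run_to_file` is a hypothetical helper, and the demo runs the Python interpreter instead of `jupyter-book` so that it is self-contained.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_to_file(args: list[str], out_file: Path) -> int:
    """Run a command without a shell and write its stdout to out_file."""
    with out_file.open("w") as f:
        return subprocess.run(args, stdout=f, check=False).returncode


# For the script above, the argument list would presumably be:
# ["jupyter-book", "toc", "from-project", "docs", "-e", ".rst", "-e", ".md", "-e", ".ipynb"]

with tempfile.TemporaryDirectory() as tmp:
    # A target path containing spaces needs no quoting in this approach.
    target = Path(tmp) / "dir with spaces" / "_toc.yml"
    target.parent.mkdir()
    exit_code = run_to_file([sys.executable, "-c", "print('hello')"], target)
    written = target.read_text().strip()
```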