Experiments and analysis on reflection timing in reinforcement learning agents — exploring self-evaluation, meta-learning, and adaptive reflection intervals.

🔄 The paper is auto-compiled from LaTeX via GitHub Actions (workflow: Build LaTeX PDF Paper) using xu-cheng/latex-action.

SRPI – Self‑Reflective Policy Improvement (Repo Scaffold)

This is a minimal, runnable scaffold for the SRPI idea:

Treat an agent’s natural‑language reflections as a learnable advantage signal via a Language Advantage Critic (LAC) and blend it with environment‑based advantages for policy improvement.
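
In code, this blend is just a convex mix of the two advantage signals. A minimal sketch (the function name, lac_score, and the mixing weight lam are illustrative, not the scaffold's actual API):

# Illustrative only: mix the environment advantage with a LAC score.
# lac_score and lam are hypothetical names, not SRPI's real interface.
def blended_advantage(adv_env, reflection, lac_score, lam=0.5):
    adv_lang = lac_score(reflection)              # text -> scalar advantage
    return (1.0 - lam) * adv_env + lam * adv_lang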

Features

  • Clean Python package layout (srpi/).
  • Simple GridWorld environment with sparse reward.
  • Policy + LAC toy implementations (no GPUs required).
  • YAML configs, structured logging to CSV/JSON, and plots.
  • Repro scripts under scripts/.

Quickstart

# (Optional) create venv
python -m venv .venv && source .venv/bin/activate  # on Windows: .venv\Scripts\activate
pip install -r requirements.txt

# Run a tiny experiment
python -m srpi.train --config configs/gridworld_min.yaml

# Plot learning curve
python scripts/plot_learning_curve.py experiments/gridworld_min/metrics.csv plots/learning_curve.png
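
For orientation, a hypothetical stand-in for scripts/plot_learning_curve.py is sketched below; it stays matplotlib-only (see Notes), and the metrics.csv column names episode and return are assumptions, not verified against the repo:

# Hypothetical sketch of the learning-curve plotter; column names are assumed.
import csv, sys
import matplotlib.pyplot as plt

csv_path, out_path = sys.argv[1], sys.argv[2]
episodes, returns = [], []
with open(csv_path) as f:
    for row in csv.DictReader(f):
        episodes.append(int(row["episode"]))    # assumed column name
        returns.append(float(row["return"]))    # assumed column name

plt.plot(episodes, returns)
plt.xlabel("episode")
plt.ylabel("return")
plt.savefig(out_path)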

Repo layout

srpi/
  agents/        # policy & memory
  envs/          # gridworld
  lac/           # language advantage critic (text->advantage)
  utils/         # logging, config
  train.py       # training loop (entry point)
configs/
  gridworld_min.yaml
scripts/
  plot_learning_curve.py
experiments/     # auto-created outputs (metrics, checkpoints)
logs/            # run logs
plots/           # output figures

Notes

  • This is a toy implementation intended to validate the training loop and the logging/plotting pipeline. Replace the stub LAC with your preferred encoder (e.g., a transformer sentence embedding) and implement a real reflection generator; a sketch of such a swap follows this list.
  • The plotting script uses matplotlib only.
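
As an example of that swap, a LAC built on a sentence-embedding encoder could look like the sketch below. The class and method names are hypothetical (the scaffold's real LAC interface may differ), and all-MiniLM-L6-v2 is just one common sentence-transformers model:

# Hypothetical LAC replacement using a sentence-embedding encoder.
import torch
from sentence_transformers import SentenceTransformer

class EmbeddingLAC(torch.nn.Module):
    def __init__(self, model_name="all-MiniLM-L6-v2"):
        super().__init__()
        self.encoder = SentenceTransformer(model_name)
        self.head = torch.nn.Linear(384, 1)   # 384 = MiniLM embedding dim

    def forward(self, reflection):
        emb = torch.tensor(self.encoder.encode(reflection))  # text -> vector
        return self.head(emb)                 # vector -> scalar advantage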

Reflection Timing Experiment

Run the minimal study comparing reflection schedules (per-step, failure-only, success-only):

python scripts/run_reflection_timing.py --config configs/reflection_timing.yaml

# Then plot summaries
python scripts/plot_reflection_efficiency.py experiments/reflection_timing/reflection_timing_metrics.csv plots/reflection_timing.png

Each run logs mode, episode, success, steps, return, and reflections, and produces three plots (a sketch of the per-mode summaries behind them follows the list):

  • success vs reflections
  • success per reflection (efficiency)
  • mean steps per mode
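
Assuming the column names listed above, the per-mode summaries behind these plots reduce to a small pandas aggregation. A sketch (the CSV contents have not been verified against an actual run):

# Sketch only: aggregate the logged columns by mode.
import pandas as pd

df = pd.read_csv("experiments/reflection_timing/reflection_timing_metrics.csv")
summary = df.groupby("mode").agg(
    success_rate=("success", "mean"),
    mean_reflections=("reflections", "mean"),
    mean_steps=("steps", "mean"),
)
# Efficiency: success per reflection spent. A mode with zero reflections
# (e.g., a no-reflection baseline) comes out as inf/NaN by construction.
summary["success_per_reflection"] = summary["success_rate"] / summary["mean_reflections"]
print(summary)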

Final: Reflection Timing (with No-Reflection Baseline)

Run the 4-mode study:

python -m srpi.experiments.reflection_timing --config configs/reflection_timing.yaml
python scripts/plot_reflection_efficiency.py experiments/reflection_timing/reflection_timing_metrics.csv plots/reflection_timing.png

Modes: no_reflection, per_step, failure_only, success_only.
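
All four schedules reduce to a single predicate deciding when the agent reflects. A hypothetical sketch (the function name and signature are illustrative, not the module's actual interface):

# Illustrative predicate for when the agent reflects; names are hypothetical.
def should_reflect(mode, episode_done, success):
    if mode == "no_reflection":
        return False
    if mode == "per_step":
        return True                            # reflect after every step
    if mode == "failure_only":
        return episode_done and not success
    if mode == "success_only":
        return episode_done and success
    raise ValueError(f"unknown mode: {mode}")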

Auto-build the PDF on GitHub

Push the repo, then check the Actions tab for the paper_pdf artifact:

git init
git add .
git commit -m "Reflection timing with no-reflection baseline"
git branch -M main
git remote add origin https://github.com/<yourname>/reflection-timing.git
git push -u origin main
