🔄 The paper is auto-compiled from LaTeX via GitHub Actions using [`xu-cheng/latex-action`](https://github.com/xu-cheng/latex-action).
This is a minimal, runnable scaffold for the SRPI idea:
Treat an agent’s natural‑language reflections as a learnable advantage signal via a Language Advantage Critic (LAC) and blend it with environment‑based advantages for policy improvement.
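As a minimal sketch of the blending step (illustrative names, not the package's actual API), assuming per-step advantage arrays and a mixing coefficient `alpha`:

```python
import numpy as np

def blended_advantage(adv_env: np.ndarray,
                      adv_lac: np.ndarray,
                      alpha: float = 0.5) -> np.ndarray:
    """Convex combination of environment- and language-based advantages.

    adv_env: advantages estimated from environment returns (e.g., via GAE).
    adv_lac: scalar advantages the Language Advantage Critic predicts
             from the agent's natural-language reflections.
    alpha:   mixing weight; alpha=1.0 recovers the pure environment signal.
    """
    blended = alpha * adv_env + (1.0 - alpha) * adv_lac
    # Normalizing keeps the policy-gradient scale stable across signal sources.
    return (blended - blended.mean()) / (blended.std() + 1e-8)
```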
- Clean Python package layout (`srpi/`).
- Simple GridWorld environment with sparse reward (a toy sketch follows this list).
- Policy + LAC toy implementations (no GPUs required).
- YAML configs, structured logging to CSV/JSON, and plots.
- Repro scripts under `scripts/`.
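For intuition, a self-contained sparse-reward gridworld can be as small as the sketch below (a hypothetical stand-in, not the `srpi/envs` implementation):

```python
class SparseGridWorld:
    """Agent starts at (0, 0); reward is 1.0 only upon reaching the goal."""

    def __init__(self, size: int = 5, max_steps: int = 50):
        self.size, self.max_steps = size, max_steps
        self.goal = (size - 1, size - 1)

    def reset(self):
        self.pos, self.t = (0, 0), 0
        return self.pos

    def step(self, action: int):
        # Actions: 0=up, 1=down, 2=left, 3=right; moves are clipped at walls.
        dr, dc = [(-1, 0), (1, 0), (0, -1), (0, 1)][action]
        r = min(max(self.pos[0] + dr, 0), self.size - 1)
        c = min(max(self.pos[1] + dc, 0), self.size - 1)
        self.pos, self.t = (r, c), self.t + 1
        done = self.pos == self.goal or self.t >= self.max_steps
        reward = 1.0 if self.pos == self.goal else 0.0  # sparse: zero elsewhere
        return self.pos, reward, done
```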
```bash
# (Optional) create venv
python -m venv .venv && source .venv/bin/activate  # on Windows: .venv\Scripts\activate
pip install -r requirements.txt

# Run a tiny experiment
python -m srpi.train --config configs/gridworld_min.yaml

# Plot learning curve
python scripts/plot_learning_curve.py experiments/gridworld_min/metrics.csv plots/learning_curve.png
```
```
srpi/
  agents/            # policy & memory
  envs/              # gridworld
  lac/               # language advantage critic (text -> advantage)
  utils/             # logging, config
  train.py           # training loop (entry point)
configs/
  gridworld_min.yaml
scripts/
  plot_learning_curve.py
experiments/         # auto-created outputs (metrics, checkpoints)
logs/                # run logs
plots/               # output figures
```
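The `utils/` entry covers logging and config. A minimal sketch of what structured CSV/JSON logging can look like (hypothetical, not the repo's `srpi/utils` code):

```python
import csv
import json
import pathlib

class RunLogger:
    """One CSV row per episode, plus a JSON dump of the run's config."""

    def __init__(self, out_dir: str, fieldnames: list[str], config: dict):
        self.dir = pathlib.Path(out_dir)
        self.dir.mkdir(parents=True, exist_ok=True)
        (self.dir / "config.json").write_text(json.dumps(config, indent=2))
        self._file = open(self.dir / "metrics.csv", "w", newline="")
        self.writer = csv.DictWriter(self._file, fieldnames=fieldnames)
        self.writer.writeheader()

    def log(self, **row):
        self.writer.writerow(row)
        self._file.flush()  # keep metrics.csv readable mid-run

    def close(self):
        self._file.close()
```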
- This is a toy implementation intended to validate the training loop and the logging/plotting pipeline. Replace the stub LAC with your preferred encoder (e.g., a transformer sentence embedding; see the sketch below) and implement a real reflection generator.
- The plotting script uses only matplotlib.
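One plausible replacement for the stub LAC, using `sentence-transformers` (an assumed external dependency, not in `requirements.txt`) with a ridge-regression head:

```python
import numpy as np
from sentence_transformers import SentenceTransformer  # assumed dependency

class EmbeddingLAC:
    """Language Advantage Critic: reflection text -> scalar advantage.

    Embeds each reflection with a pretrained sentence encoder, then fits a
    linear head by ridge regression against observed advantage targets.
    """

    def __init__(self, model_name: str = "all-MiniLM-L6-v2", l2: float = 1.0):
        self.encoder = SentenceTransformer(model_name)
        self.l2 = l2
        self.w = None

    def fit(self, reflections: list[str], targets: np.ndarray) -> None:
        X = self.encoder.encode(reflections)  # (n, d) embedding matrix
        d = X.shape[1]
        self.w = np.linalg.solve(X.T @ X + self.l2 * np.eye(d), X.T @ targets)

    def predict(self, reflections: list[str]) -> np.ndarray:
        X = self.encoder.encode(reflections)
        return X @ self.w
```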
Run the minimal study comparing reflection schedules (per-step, failure-only, success-only):

```bash
python scripts/run_reflection_timing.py --config configs/reflection_timing.yaml

# Then plot summaries
python scripts/plot_reflection_efficiency.py experiments/reflection_timing/reflection_timing_metrics.csv plots/reflection_timing.png
```
This logs `mode, episode, success, steps, return, reflections` and produces three plots:

- success vs. reflections
- success per reflection (efficiency)
- mean steps per mode
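If you want the numbers behind those plots, a quick per-mode summary over the logged columns might look like this (pandas is an assumption about your tooling; the repo's plots come from the script above):

```python
import pandas as pd

df = pd.read_csv("experiments/reflection_timing/reflection_timing_metrics.csv")

# Aggregate the logged columns per reflection schedule.
summary = df.groupby("mode").agg(
    success_rate=("success", "mean"),
    mean_steps=("steps", "mean"),
    total_successes=("success", "sum"),
    total_reflections=("reflections", "sum"),
)
# Efficiency: successes achieved per reflection generated.
summary["success_per_reflection"] = (
    summary["total_successes"] / summary["total_reflections"].clip(lower=1)
)
print(summary)
```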
Run the 4-mode study:

```bash
python -m srpi.experiments.reflection_timing --config configs/reflection_timing.yaml
python scripts/plot_reflection_efficiency.py experiments/reflection_timing/reflection_timing_metrics.csv plots/reflection_timing.png
```
Modes: `no_reflection`, `per_step`, `failure_only`, `success_only`.
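One plausible gating function for these schedules (a hypothetical sketch; the actual logic lives in `srpi.experiments.reflection_timing`):

```python
def should_reflect(mode: str, episode_done: bool, success: bool) -> bool:
    """Decide whether to generate a reflection under a given schedule."""
    if mode == "no_reflection":
        return False
    if mode == "per_step":
        return True                          # reflect after every step
    if mode == "failure_only":
        return episode_done and not success  # reflect only on failed episodes
    if mode == "success_only":
        return episode_done and success      # reflect only on successes
    raise ValueError(f"unknown mode: {mode!r}")
```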
Push the repo and check Actions ➜ artifact `paper_pdf`:

```bash
git init
git add .
git commit -m "Reflection timing with no-reflection baseline"
git branch -M main
git remote add origin https://github.com/<yourname>/reflection-timing.git
git push -u origin main
```