🔄 The paper is auto-compiled from LaTeX via GitHub Actions using [`xu-cheng/latex-action`](https://github.com/xu-cheng/latex-action).
This is a minimal, runnable scaffold for the SRPI idea:
Treat an agent’s natural‑language reflections as a learnable advantage signal via a Language Advantage Critic (LAC) and blend it with environment‑based advantages for policy improvement.
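As a minimal sketch of the blending step (illustrative names, not the package's actual API), assuming per-step advantage arrays and a mixing coefficient `alpha`:

```python
import numpy as np

def blended_advantage(adv_env: np.ndarray,
                      adv_lac: np.ndarray,
                      alpha: float = 0.5) -> np.ndarray:
    """Convex combination of environment- and language-based advantages.

    adv_env: advantages estimated from environment returns (e.g., via GAE).
    adv_lac: scalar advantages the Language Advantage Critic predicts
             from the agent's natural-language reflections.
    alpha:   mixing weight; alpha=1.0 recovers the pure environment signal.
    """
    blended = alpha * adv_env + (1.0 - alpha) * adv_lac
    # Normalizing keeps the policy-gradient scale stable across signal sources.
    return (blended - blended.mean()) / (blended.std() + 1e-8)
```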
- Clean Python package layout (`srpi/`).
- Simple GridWorld environment with sparse reward (a toy sketch follows this list).
- Policy + LAC toy implementations (no GPUs required).
- YAML configs, structured logging to CSV/JSON, and plots.
- Repro scripts under `scripts/`.
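For intuition, a self-contained sparse-reward gridworld can be as small as the sketch below (a hypothetical stand-in, not the `srpi/envs` implementation):

```python
class SparseGridWorld:
    """Agent starts at (0, 0); reward is 1.0 only upon reaching the goal."""

    def __init__(self, size: int = 5, max_steps: int = 50):
        self.size, self.max_steps = size, max_steps
        self.goal = (size - 1, size - 1)

    def reset(self):
        self.pos, self.t = (0, 0), 0
        return self.pos

    def step(self, action: int):
        # Actions: 0=up, 1=down, 2=left, 3=right; moves are clipped at walls.
        dr, dc = [(-1, 0), (1, 0), (0, -1), (0, 1)][action]
        r = min(max(self.pos[0] + dr, 0), self.size - 1)
        c = min(max(self.pos[1] + dc, 0), self.size - 1)
        self.pos, self.t = (r, c), self.t + 1
        done = self.pos == self.goal or self.t >= self.max_steps
        reward = 1.0 if self.pos == self.goal else 0.0  # sparse: zero elsewhere
        return self.pos, reward, done
```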
```bash
# (Optional) create venv
python -m venv .venv && source .venv/bin/activate  # on Windows: .venv\Scripts\activate
pip install -r requirements.txt

# Run a tiny experiment
python -m srpi.train --config configs/gridworld_min.yaml

# Plot learning curve
python scripts/plot_learning_curve.py experiments/gridworld_min/metrics.csv plots/learning_curve.png
```
```
srpi/
  agents/            # policy & memory
  envs/              # gridworld
  lac/               # language advantage critic (text -> advantage)
  utils/             # logging, config
  train.py           # training loop (entry point)
configs/
  gridworld_min.yaml
scripts/
  plot_learning_curve.py
experiments/         # auto-created outputs (metrics, checkpoints)
logs/                # run logs
plots/               # output figures
```
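The `utils/` entry covers logging and config. A minimal sketch of what structured CSV/JSON logging can look like (hypothetical, not the repo's `srpi/utils` code):

```python
import csv
import json
import pathlib

class RunLogger:
    """One CSV row per episode, plus a JSON dump of the run's config."""

    def __init__(self, out_dir: str, fieldnames: list[str], config: dict):
        self.dir = pathlib.Path(out_dir)
        self.dir.mkdir(parents=True, exist_ok=True)
        (self.dir / "config.json").write_text(json.dumps(config, indent=2))
        self._file = open(self.dir / "metrics.csv", "w", newline="")
        self.writer = csv.DictWriter(self._file, fieldnames=fieldnames)
        self.writer.writeheader()

    def log(self, **row):
        self.writer.writerow(row)
        self._file.flush()  # keep metrics.csv readable mid-run

    def close(self):
        self._file.close()
```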
- This is a toy implementation intended to validate the training loop and the logging/plotting pipeline. Replace the stub LAC with your preferred encoder (e.g., a transformer sentence embedding; see the sketch below) and implement a real reflection generator.
- The plotting script uses only matplotlib.
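One plausible replacement for the stub LAC, using `sentence-transformers` (an assumed external dependency, not in `requirements.txt`) with a ridge-regression head:

```python
import numpy as np
from sentence_transformers import SentenceTransformer  # assumed dependency

class EmbeddingLAC:
    """Language Advantage Critic: reflection text -> scalar advantage.

    Embeds each reflection with a pretrained sentence encoder, then fits a
    linear head by ridge regression against observed advantage targets.
    """

    def __init__(self, model_name: str = "all-MiniLM-L6-v2", l2: float = 1.0):
        self.encoder = SentenceTransformer(model_name)
        self.l2 = l2
        self.w = None

    def fit(self, reflections: list[str], targets: np.ndarray) -> None:
        X = self.encoder.encode(reflections)  # (n, d) embedding matrix
        d = X.shape[1]
        self.w = np.linalg.solve(X.T @ X + self.l2 * np.eye(d), X.T @ targets)

    def predict(self, reflections: list[str]) -> np.ndarray:
        X = self.encoder.encode(reflections)
        return X @ self.w
```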
Run the minimal study comparing reflection schedules (per-step, failure-only, success-only):

```bash
python scripts/run_reflection_timing.py --config configs/reflection_timing.yaml

# Then plot summaries
python scripts/plot_reflection_efficiency.py experiments/reflection_timing/reflection_timing_metrics.csv plots/reflection_timing.png
```
This logs `mode, episode, success, steps, return, reflections` and produces three plots:

- success vs. reflections
- success per reflection (efficiency)
- mean steps per mode
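If you want the numbers behind those plots, a quick per-mode summary over the logged columns might look like this (pandas is an assumption about your tooling; the repo's plots come from the script above):

```python
import pandas as pd

df = pd.read_csv("experiments/reflection_timing/reflection_timing_metrics.csv")

# Aggregate the logged columns per reflection schedule.
summary = df.groupby("mode").agg(
    success_rate=("success", "mean"),
    mean_steps=("steps", "mean"),
    total_successes=("success", "sum"),
    total_reflections=("reflections", "sum"),
)
# Efficiency: successes achieved per reflection generated.
summary["success_per_reflection"] = (
    summary["total_successes"] / summary["total_reflections"].clip(lower=1)
)
print(summary)
```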
Run the 4-mode study:

```bash
python -m srpi.experiments.reflection_timing --config configs/reflection_timing.yaml
python scripts/plot_reflection_efficiency.py experiments/reflection_timing/reflection_timing_metrics.csv plots/reflection_timing.png
```
Modes: `no_reflection`, `per_step`, `failure_only`, `success_only`.
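One plausible gating function for these schedules (a hypothetical sketch; the actual logic lives in `srpi.experiments.reflection_timing`):

```python
def should_reflect(mode: str, episode_done: bool, success: bool) -> bool:
    """Decide whether to generate a reflection under a given schedule."""
    if mode == "no_reflection":
        return False
    if mode == "per_step":
        return True                          # reflect after every step
    if mode == "failure_only":
        return episode_done and not success  # reflect only on failed episodes
    if mode == "success_only":
        return episode_done and success      # reflect only on successes
    raise ValueError(f"unknown mode: {mode!r}")
```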
Push the repo and check Actions ➜ artifact `paper_pdf`:

```bash
git init
git add .
git commit -m "Reflection timing with no-reflection baseline"
git branch -M main
git remote add origin https://github.com/<yourname>/reflection-timing.git
git push -u origin main
```