90 examples can beat 1K — P‑TTS uses principled instructional prompt augmentation to turn 90 AIME seeds into 900 high‑utility training examples, delivering strong reasoning with far less data.
P‑TTS expands a small, vetted seed set (90 AIME 2022–2024 problems) by wrapping each problem with principled instructions to elicit diverse reasoning traces from a teacher model (DeepSeek‑R1). We then fine‑tune Qwen2.5‑Instruct models on these augmented traces.
Principles used (unchanged question text; wrappers are prefixed/suffixed):
- Reward – e.g., "I'll tip $200,000 for a better solution!"
- Penalty – "You will be penalized if the answer is wrong."
- Correctness – "You MUST provide the correct answer."
- Step‑by‑Step – "Think step by step."
Data scales via augmentation multipliers m ∈ {1, 4, 5, 10}: 90 → 360 → 450 → 900.
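The augmentation step above can be sketched as follows. This is an illustrative assumption, not the paper's exact recipe: the wrapper keys and the cycling scheme for multipliers beyond the four base wrappers are hypothetical, but the key invariant from the text holds (the question itself is never modified, only prefixed):

```python
# Principled instruction wrappers from the text; phrasing copied from the list
# above. How extra variants beyond the 4 wrappers are produced is an assumption.
WRAPPERS = {
    "reward": "I'll tip $200,000 for a better solution!",
    "penalty": "You will be penalized if the answer is wrong.",
    "correctness": "You MUST provide the correct answer.",
    "step_by_step": "Think step by step.",
}

def augment(question: str, multiplier: int) -> list[str]:
    """Wrap one seed question with `multiplier` principled instructions.

    With m in {1, 4, 5, 10}, 90 seeds scale to 90 / 360 / 450 / 900 prompts;
    wrappers are cycled when m exceeds the wrapper count (an assumption).
    """
    keys = list(WRAPPERS)
    prompts = []
    for i in range(multiplier):
        wrapper = WRAPPERS[keys[i % len(keys)]]
        prompts.append(f"{wrapper}\n\n{question}")  # question text unchanged
    return prompts

seeds = ["Find the number of ordered pairs ..."]  # 90 AIME problems in practice
dataset = [p for q in seeds for p in augment(q, multiplier=10)]
assert len(dataset) == 10 * len(seeds)
```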
Benchmarks: AIME24, AIME25, MATH500, GPQA‑Diamond. Backbone: Qwen2.5‑Instruct (7B/14B/32B). Metric: accuracy (lm‑evaluation‑harness; greedy decoding).
| Model | #Train ex. | AIME24 | AIME25 | MATH500 | GPQA‑D | Avg. |
|---|---|---|---|---|---|---|
| P‑TTS‑32B | 900 | 73.33% | 53.33% | 94.20% | 60.61% | 70.35% |
| P‑TTS‑14B | 900 | 53.33% | 26.67% | 90.40% | 51.01% | 55.35% |
| P‑TTS‑7B | 900 | 43.33% | 26.67% | 84.20% | 41.92% | 49.03% |
The training dataset consists of 900 high-quality reasoning examples generated from 90 AIME seed problems. Each seed problem is augmented using principled instruction wrappers and processed through DeepSeek-R1 to create diverse reasoning traces.
Dataset Composition:
- Source: 90 AIME problems (2022-2024)
- Augmentation: 4 instruction wrapper types with reward variants
- Final Size: 900 training examples
Before training, you need to tokenize your raw dataset. Use the provided tokenization script:

```shell
# Run the tokenization script
python tokenize_data.py
```
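`tokenize_data.py` is the authoritative script; as a rough sketch of what such a step typically does, each (prompt, trace) pair is rendered into a chat-style string and mapped to token ids. The rendering markers and the whitespace "tokenizer" below are stand-ins so the example runs without downloading the real Qwen2.5 tokenizer:

```python
# Illustrative sketch only (assumption): the real script uses the Qwen2.5
# tokenizer; a tiny whitespace "tokenizer" stands in here.
def render_example(prompt: str, trace: str) -> str:
    # Chat-style rendering: user prompt followed by the teacher's reasoning trace.
    return f"<|user|>\n{prompt}\n<|assistant|>\n{trace}"

def tokenize(text: str, vocab: dict[str, int]) -> list[int]:
    # Map each whitespace-separated token to a stable integer id.
    return [vocab.setdefault(tok, len(vocab)) for tok in text.split()]

vocab: dict[str, int] = {}
ids = tokenize(render_example("Think step by step.\n\nFind x.", "x = 3"), vocab)
```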
Training is run via train/sft.py. Invoke it through one of the train/sft*.sh scripts, or launch with train/launch.sh if you are on a SLURM cluster (edit the file for your cluster setup first).
Hardware Requirements:
- For 7B models: 4x A100 GPUs
- For 32B models: 6x B200 GPUs
Quick Start:

```shell
git clone https://github.com/simplescaling/s1.git
cd s1
pip3 install -r requirements.txt

# First tokenize your data
python tokenize_data.py

# Then run training
bash train/sft.sh
```
Note: Training scripts are adapted from simplescaling/s1 (Apache-2.0).
The script expects your training data in CSV format. Update the `train_file_path` variable in `sft.sh`:

```shell
--train_file_path="xx_tokenized.csv"
```
90 AIME seeds → apply 4 instruction wrappers (+ reward variants) →
query teacher (DeepSeek‑R1) → collect reasoning traces → fine‑tune Qwen2.5‑Instruct
```shell
# 1) Build wrapped prompts from seeds
python DataConstruction/build_prompt_variants.py \
  --input DataConstruction/seeds.csv \
  --out DataConstruction/variants.csv

# 2) Query the teacher model to collect reasoning traces
python DataConstruction/deepseek_query.py \
  --input DataConstruction/variants.csv \
  --out DataConstruction/DS_responses.csv

# 3) Combine into the full training dataset
python DataConstruction/combine_deepseek_data.py
```
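The combine step can be sketched as a join of the prompt variants with the teacher's responses. The column names (`id`, `prompt`, `response`) and the output schema are assumptions; check the actual CSV headers produced by the earlier scripts before relying on this:

```python
# Sketch of combine_deepseek_data.py (assumed column names: id, prompt, response).
import csv

def combine(variants_path: str, responses_path: str, out_path: str) -> int:
    """Join teacher responses back onto their wrapped prompts; return row count."""
    with open(variants_path, newline="") as f:
        variants = {row["id"]: row for row in csv.DictReader(f)}
    with open(responses_path, newline="") as f:
        responses = list(csv.DictReader(f))
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["prompt", "response"])
        writer.writeheader()
        n = 0
        for r in responses:
            v = variants.get(r["id"])
            if v is None:
                continue  # skip responses with no matching prompt variant
            writer.writerow({"prompt": v["prompt"], "response": r["response"]})
            n += 1
    return n
```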
```bibtex
@article{bsharat2025prompting,
  title={Prompting Test-Time Scaling Is A Strong LLM Reasoning Data Augmentation},
  author={Bsharat, Sondos Mahmoud and Shen, Zhiqiang},
  journal={arXiv preprint arXiv:2510.09599},
  year={2025}
}
```