End-to-end diffusion language modeling across TPU pre-training, GPU fine-tuning, evaluation, and serving.
CoDA is Salesforce AI Research's open diffusion language model. This repo contains a unified training pipeline from pre-training to post-training, evaluation harnesses, and a simple FastAPI-based serving backend.
Note: This repository is provided for research purposes only. Data release is subject to internal regulations.
Directory | Purpose |
---|---|
`CoDALanguageModel/` | Hugging Face model class |
`post-train/` | Supervised fine-tuning (SFT) pipeline |
`evaluation/` | Evaluation framework |
`pre-train/` | TPU-based pre-training pipeline |
`serving/` | Serving stack |
`run_sft.sh` | Launcher coupling pre-training checkpoints with the post-training diffusion trainer |
`save_hf_model.py` | Utility to convert checkpoints into the Hugging Face model class |
To avoid dependency conflicts, we recommend maintaining an isolated environment per subsystem and activating the corresponding environment before executing scripts in each subdirectory.
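As a sketch, one environment per subsystem might look like the following (the environment names are illustrative; only `serving/requirements.txt` is referenced elsewhere in this README — the other subsystems document their own installs):

```bash
# Serving: its requirements file is referenced in the quickstart below.
python3 -m venv .venv-serving
source .venv-serving/bin/activate
pip install -r serving/requirements.txt
deactivate

# Post-training: follow post-train/LLaMA-Factory/README.md inside its own env.
python3 -m venv .venv-posttrain
source .venv-posttrain/bin/activate
# ...install per the LLaMA-Factory instructions, then:
deactivate
```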
- Populate TPU metadata in `pre-train/env.example` and copy it to `pre-train/.env`.
- Run `pre-train/setup_tpu.sh` to provision dependencies and sync the repository to the TPU pod.
- Launch pre-training with the provided recipes (e.g., `pre-train/recipes/midtrain_v4_512.sh`) to produce CoDA checkpoints (GCS or local storage); a sketch of the full sequence follows this list.
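A typical launch sequence, sketched under the assumption that the setup and recipe scripts are plain shell scripts invoked from the repo root:

```bash
# 1. Copy the template and fill in TPU metadata (project, zone, pod name, ...).
cp pre-train/env.example pre-train/.env

# 2. Provision dependencies and sync the repository to the TPU pod.
bash pre-train/setup_tpu.sh

# 3. Kick off a recipe; checkpoints are written to GCS or local storage.
bash pre-train/recipes/midtrain_v4_512.sh
```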
- Install prerequisites following `post-train/LLaMA-Factory/README.md`.
- Configure dataset metadata in `post-train/LLaMA-Factory/data/dataset_info.json` and diffusion arguments in `post-train/LLaMA-Factory/examples/train_full/*.yaml`.
- Execute `./run_sft.sh` to fine-tune CoDA checkpoints with discrete denoising objectives, as sketched below.
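Assuming prerequisites are installed and the dataset is registered in `dataset_info.json`, a run reduces to activating the post-training environment and invoking the launcher:

```bash
# Activate the environment prepared for post-train/ (name is illustrative).
source .venv-posttrain/bin/activate

# run_sft.sh couples a pre-training checkpoint with the diffusion trainer;
# check the script itself for the checkpoint and YAML paths it expects.
./run_sft.sh
```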
- Choose a benchmark script such as `evaluation/lm_eval/eval_mbpp_humaneval.sh`.
- Update `MODEL_DIR` and the diffusion parameters (`diffusion_steps`, `temperature`, `top_p`) to match the target checkpoint.
- Run the script to gather metrics; logs are stored locally for aggregation and reporting. An example invocation follows this list.
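For illustration, a single benchmark run might look like this; the checkpoint path is a placeholder, and the script may expect `MODEL_DIR` to be edited in place rather than exported (it is shown as an environment variable here for brevity):

```bash
# Point the harness at the checkpoint under evaluation.
export MODEL_DIR=/path/to/coda-checkpoint

# Run the benchmark; metrics and logs are written locally.
bash evaluation/lm_eval/eval_mbpp_humaneval.sh
```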
Comparison of code-generation performance across standard and Plus-enhanced benchmarks. EvalPlus is the mean pass@1 over the enhanced (Plus) variants. Bold marks results where CoDA achieves the strongest diffusion-model performance.
Model | HumanEval Instruct | HumanEval Plus | MBPP Instruct | MBPP Plus | EvalPlus |
---|---|---|---|---|---|
CoDA-Base | 29.3 | 23.8 | 35.2 | 46.0 | 34.9 |
CoDA-Instruct | 54.3 | 47.6 | 47.2 | **63.2** | **55.4** |
Dream-Base | 56.7 | 50.0 | 68.7 | 57.4 | 53.7 |
Dream-7B-Instruct | 57.9 | 53.7 | 68.3 | 56.1 | 54.9 |
LLaDA-8B-Instruct | 35.4 | 31.7 | 31.5 | 28.6 | 30.2 |
Qwen3-1.7B | 66.5 | 61.6 | 46.2 | 65.9 | 63.8 |
Qwen2.5-Coder-1.5B | 43.9 | 36.6 | 69.2 | 58.6 | 47.6 |
Qwen2.5-Coder-1.5B-Instruct | 70.7 | 66.5 | 69.2 | 59.4 | 62.3 |
Gemma-3-1B-it | 39.6 | 35.4 | 39.4 | 63.5 | 49.5 |
LLaMA-3.2-1B-Instruct | 35.4 | 31.1 | 24.4 | 53.7 | 42.4 |
```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -r serving/requirements.txt
export HF_TOKEN="hf_..."
bash serving/fast-api/start_server.sh
```
The server will listen on http://localhost:8000.
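The CLI's `--model` flag suggests an OpenAI-style chat completions route; assuming that convention holds (an assumption, not confirmed by this README), a quick smoke test with curl would be:

```bash
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Salesforce/CoDA-v0-Instruct",
        "messages": [{"role": "user", "content": "Write a haiku about diffusion."}]
      }'
```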
To chat interactively with the served model, run:

```bash
python serving/fast-api/chat_cli.py --base-url http://localhost:8000 --model Salesforce/CoDA-v0-Instruct
```
Optional flags:

- `--stream` to stream tokens as they are generated
- `--show-meta` to display latency and token usage
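For example, combining both flags in a single session:

```bash
python serving/fast-api/chat_cli.py \
  --base-url http://localhost:8000 \
  --model Salesforce/CoDA-v0-Instruct \
  --stream --show-meta
```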
You can customize generation with these environment variables (defaults in parentheses):

- `MAX_TOKENS` (256)
- `TEMPERATURE` (0.0)
- `TOP_P` (unset)
- `TOP_K` (unset)
- `STEPS` (128)
- `ALG` ("entropy")
- `ALG_TEMP` (0.1)
- `BLOCK_LENGTH` (32)
Example:

```bash
export MAX_TOKENS=512
export TEMPERATURE=0.7
export TOP_P=0.9
export STEPS=128
export ALG=entropy
export ALG_TEMP=0.1
export BLOCK_LENGTH=32
bash serving/fast-api/start_server.sh
```
Coming soon