Hierarchical Balance Packing: Towards Efficient Supervised Fine-tuning for Long-Context LLM


Yongqiang Yao*, Jingru Tan*, Kaihuan Liang*, Feizhao Zhang, Jiahao Hu, Yazhe Niu, Shuo Wu, Ruihao Gong📧, Dahua Lin, Ningyi Xu📧 (* denotes equal contribution, 📧 denotes corresponding author.)

This is the official PyTorch implementation of our NeurIPS 2025 paper, Hierarchical Balance Packing (HBP).

Overview

⚠️ Problem

(Figure: problems of hybrid short/long-context training)

Hybrid training with short + long sequences causes:

  • 🚨 Workload imbalance (padding waste, uneven device utilization)
  • 🚨 Imbalanced attention computation (short vs. long variance)
  • 🚨 Wasted communication overhead (short data forced into SP)
  • 🚨 Training instability (loss normalization bias)

🛠️ HBP Framework

(Figure: overview of the HBP framework)

Key components of Hierarchical Balance Packing (HBP):

  • 📦 Hierarchical Packing Groups (16K, 32K, 128K)
  • ⚖️ Balanced Packing (GreedyFill + attention-balanced batching; a toy sketch follows this list)
  • 🔄 Dynamic Training Pipeline (adaptive SP + curriculum learning)
  • 📏 Stable Loss Normalizer (equal token contribution across batches)
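
The first two components can be illustrated with a small toy example. The sketch below is not the repository's implementation: the best-fit GreedyFill heuristic and balancing by total token count are simplifying assumptions, used only to show how samples can be routed to hierarchical packing groups (16K, 32K, 128K) and then packed so that per-pack workload stays close to balanced.

```python
# Toy sketch (not the repo's code): hierarchical grouping + greedy fill.
# Assumptions: each sample is represented by its token length, packs are
# balanced by total token count, and the group capacities are 16K/32K/128K.
from typing import Dict, List

GROUP_CAPACITIES = [16 * 1024, 32 * 1024, 128 * 1024]  # hierarchical packing groups


def assign_group(length: int) -> int:
    """Route a sample to the smallest group whose capacity can hold it."""
    for cap in GROUP_CAPACITIES:
        if length <= cap:
            return cap
    raise ValueError(f"sample of length {length} exceeds the largest group")


def greedy_fill(lengths: List[int], capacity: int) -> List[List[int]]:
    """Pack lengths (longest first) into bins of at most `capacity` tokens."""
    packs: List[List[int]] = []
    for length in sorted(lengths, reverse=True):
        # Best-fit: put the sample into the fullest pack that still has room,
        # which keeps the total token counts of the packs close to each other.
        best = None
        for pack in packs:
            if sum(pack) + length <= capacity and (best is None or sum(pack) > sum(best)):
                best = pack
        if best is None:
            packs.append([length])
        else:
            best.append(length)
    return packs


if __name__ == "__main__":
    lengths = [1200, 9000, 15000, 30000, 500, 120000, 7000]
    by_group: Dict[int, List[int]] = {}
    for n in lengths:
        by_group.setdefault(assign_group(n), []).append(n)
    for cap, group_lengths in sorted(by_group.items()):
        print(cap, greedy_fill(group_lengths, cap))
```

In this toy run, short samples never enter the 128K group, so they do not pay the sequence-parallel communication cost that only the long group needs.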

⚠️ Problem vs. 🛠️ Solution (HBP)

| Problem | HBP Solution |
| --- | --- |
| Workload imbalance: excessive padding in batches and uneven computation across devices | Hierarchical Packing Groups: assigns data to multi-level groups (16K, 32K, 128K) with optimal configs |
| Imbalanced attention computation: mixing short and long sequences causes high variance in attention cost | Balanced Packing: GreedyFill + attention-based batching to equalize computation load |
| Wasted communication overhead: short data forced into costly sequence parallelism (SP) | Optimized Grouping: short and long data trained in separate groups, reducing unnecessary SP communication |
| Training instability: loss normalization biased by sequence-length differences | Stable Loss Normalizer + Curriculum Learning: equal token contribution and a smooth transition from short to long sequences |

In short: HBP combines multi-level packing, balanced batching, adaptive SP, curriculum learning, and stable loss to achieve up to 2.4× faster training with no performance trade-off.
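
To make the stable loss normalizer concrete: if each packed sequence is averaged over its own token count, tokens in long packs weigh less than tokens in short packs. Below is a minimal sketch of the equal-token-contribution idea, assuming per-token cross-entropy values, a validity mask, and DDP-style gradient averaging across ranks; it illustrates the principle rather than the repository's implementation.

```python
# Minimal sketch of an equal-token-contribution loss normalizer (not the repo's code).
# Assumptions: `token_loss` holds per-token cross-entropy values, `mask` marks valid
# (non-padding, non-ignored) label positions, and DDP averages gradients across ranks.
import torch
import torch.distributed as dist


def stable_normalized_loss(token_loss: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Sum token losses and divide by the number of valid tokens in the global batch."""
    local_sum = (token_loss * mask).sum()
    local_tokens = mask.sum()
    if dist.is_available() and dist.is_initialized():
        global_tokens = local_tokens.clone()
        # Share the valid-token count across data-parallel ranks so every token
        # in the global batch contributes with the same weight.
        dist.all_reduce(global_tokens, op=dist.ReduceOp.SUM)
        # Multiply by world size to cancel DDP's mean reduction over ranks.
        return local_sum / global_tokens.clamp(min=1) * dist.get_world_size()
    return local_sum / local_tokens.clamp(min=1)
```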

Environment

```bash
pip install -r requirements.txt
```

Train

Calculate sample info

```bash
# sample_info_path in the config (./cache/data_llama3_1_128k_info.pkl) is the path where the generated sample info is written
sh ./tools/cal_length.sh ./config/llama3_1_8b_128k_isf.yaml
```
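
The length statistics are produced by tools/cal_length.sh from the YAML config. Purely as an illustration of what such a per-sample info cache typically contains (the token length of each sample), here is a hedged sketch; the dataset path, field name, and pickle schema are hypothetical and not necessarily the format this repo writes.

```python
# Hedged sketch: build a per-sample token-length cache.
# The dataset path, the "text" field, and the pickle schema are assumptions;
# the actual cache is produced by tools/cal_length.sh from the YAML config.
import json
import pickle

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B")  # example tokenizer

sample_info = []
with open("data/train.jsonl") as f:               # hypothetical dataset file
    for idx, line in enumerate(f):
        text = json.loads(line)["text"]           # hypothetical field name
        n_tokens = len(tokenizer(text, add_special_tokens=True)["input_ids"])
        sample_info.append({"index": idx, "length": n_tokens})

with open("./cache/data_llama3_1_128k_info.pkl", "wb") as f:
    pickle.dump(sample_info, f)
```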

Run ISF

```bash
sh ./scripts/train.sh ./config/llama3_1_8b_128k_isf.yaml 4
```

Run HBP

```bash
sh ./scripts/train.sh ./config/llama3_1_8b_128k_hbp.yaml 4
```

Eval

We conducted a comprehensive evaluation of the LLM's performance using OpenCompass on both general and long-context tasks, including MMLU, MMLU-Pro, CMMLU, BBH, MATH, GPQA Diamond, GSM8K, HellaSwag, MathBench, HumanEval, MBPP, IFEval, DROP, RULER, and NeedleBench. We use LongBench-Cite to measure citation quality and response correctness in long-context QA scenarios, and LongBench-Write to measure long-output quality and output length.

About OpenCompass

We use a specific pinned commit of OpenCompass for evaluation. First, you need to add the sglang API wrapper (provided in eval/sglang_api.py) to opencompass/opencompass/models.
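
The wrapper that OpenCompass actually calls is the one shipped in eval/sglang_api.py. Purely as an illustration of what talking to an sglang-served model looks like, here is a minimal sketch that queries an OpenAI-compatible chat endpoint; the host, port, and model name are placeholders, and the real wrapper in this repo may differ.

```python
# Hedged sketch: query an sglang server via its OpenAI-compatible endpoint.
# Host, port, and model name are placeholders, not values taken from this repo.
import requests

resp = requests.post(
    "http://127.0.0.1:30000/v1/chat/completions",
    json={
        "model": "default",
        "messages": [{"role": "user", "content": "Summarize this long document ..."}],
        "max_tokens": 64,
        "temperature": 0.0,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```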

Then, use the evaluation scripts to start the service and perform the evaluation:

```bash
sh eval/run_auto.sh /path/to/model mode_name num_node
```

If you need to evaluate long texts, you will need to replace the config_filename in run_request_sg.sh.

License

This repository is released under the Apache-2.0 license.

Acknowledgement

We learned a lot from the following projects when developing HBP.
