EFIM: Efficient Serving of LLMs for Infilling Tasks with Improved KV Cache Reuse



Set up Environment

We use conda to set up the environment. Please install conda before running the following command.

source scripts/setup.sh

After setting up the environment, run all following commands inside the conda environment euro_par_artifact. Please make sure the models (llama, llama-enhance, deepseek, deepseek-enhance) are properly downloaded under models/.
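
For example, a session might look like the sketch below; the environment name and model layout come from the paragraph above, while the exact activation step depends on what scripts/setup.sh actually does:

conda activate euro_par_artifact   # environment created by scripts/setup.sh
ls models/                         # expect: llama llama-enhance deepseek deepseek-enhance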

Launch vLLM

w/o prefix caching

export MODEL=llama # choose from [llama, llama-enhance, deepseek, deepseek-enhance]
./scripts/launch_server.sh

w/ prefix caching

export MODEL=llama # choose from [llama, llama-enhance, deepseek, deepseek-enhance]
./scripts/launch_prefix_server.sh
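
To confirm the server is ready before benchmarking, you can query it. The sketch below assumes the launch scripts start vLLM's OpenAI-compatible server on its default port 8000; check the scripts for the actual host and port.

curl http://localhost:8000/v1/models   # lists the served model once the server is up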

Evaluate Infilling and Subtoken Generation Ability

Launch vLLM before running the following commands.

HumanEval

w/FIM

python benchmark/async_benchmark_humaneval.py --model llama

w/EFIM

python benchmark/async_benchmark_humaneval.py --model llama --use-EFIM

CCEval

w/FIM

python benchmark/async_benchmark_cceval.py --model llama

w/EFIM

python benchmark/async_benchmark_cceval.py --model llama --use-EFIM
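
To collect all four accuracy numbers in one pass, a small wrapper like the sketch below runs both benchmarks with and without EFIM; the script paths and flags are taken from the commands above, while the loop itself is our addition:

for bench in humaneval cceval; do
  python benchmark/async_benchmark_${bench}.py --model llama
  python benchmark/async_benchmark_${bench}.py --model llama --use-EFIM
done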

Evaluate Inference Speedup

Launch vLLM before running the following commands.

w/FIM

python benchmark/async_benchmark_inference_speed.py --model llama --num-round 5 --num-user 16 

w/EFIM

python benchmark/async_benchmark_inference_speed.py --model llama-enhance --num-round 5 --num-user 16 --use-EFIM
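
To see how the speedup scales with load, you can sweep the number of concurrent users; the flags are as above and the sweep values are illustrative:

for users in 4 8 16 32; do
  python benchmark/async_benchmark_inference_speed.py --model llama-enhance --num-round 5 --num-user ${users} --use-EFIM
done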

FAQ

Q1: How do I resolve the AssertionError at assert completions[idx].success?

A1: One possible fix is to raise the allowed number of open files with ulimit -n 65535.
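
You can check the current limit first; note that ulimit only changes the current shell session, so run it in the same shell that launches the server and the benchmarks:

ulimit -n          # show the current open-file limit
ulimit -n 65535    # raise it for this shell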

Cite Our Work

@misc{guo2025efimefficientservingllms,
      title={EFIM: Efficient Serving of LLMs for Infilling Tasks with Improved KV Cache Reuse}, 
      author={Tianyu Guo and Hande Dong and Yichong Leng and Feng Liu and Cheater Lin and Nong Xiao and Xianwei Zhang},
      year={2025},
      eprint={2505.21889},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2505.21889}, 
}
