Latest News 🔥
- [2025/05/28]: We released our paper on arXiv: [2505.21889](https://arxiv.org/abs/2505.21889) 🥳
We use conda to set up the environment. Please install conda before executing the following instructions.

```shell
source scripts/setup.sh
```

After setting up the environment, execute the remaining instructions inside the `euro_par_artifact` conda environment (e.g. `conda activate euro_par_artifact`).
Please make sure the models (`llama`, `llama-enhance`, `deepseek`, `deepseek-enhance`) are properly downloaded under the `models` directory.
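Before launching anything, it can help to verify the model directories are in place. The following is a minimal sketch with a hypothetical helper (`missing_models` is not part of the repo), assuming each model lives in its own subdirectory of `models/`:

```python
from pathlib import Path

# Hypothetical helper: report which expected model directories are missing
# under a given root (assumes the layout models/<model-name>/).
def missing_models(root, names):
    root = Path(root)
    return [name for name in names if not (root / name).is_dir()]

MODELS = ["llama", "llama-enhance", "deepseek", "deepseek-enhance"]

if __name__ == "__main__":
    absent = missing_models("models", MODELS)
    if absent:
        print("Missing model directories:", ", ".join(absent))
    else:
        print("All model directories found.")
```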
```shell
export MODEL=llama # choose from [llama, llama-enhance, deepseek, deepseek-enhance]
./scripts/launch_server.sh
```
```shell
export MODEL=llama # choose from [llama, llama-enhance, deepseek, deepseek-enhance]
./scripts/launch_prefix_server.sh
```
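The launch scripts above select the model via the `MODEL` environment variable. As a sketch of the validation such a script might perform (this helper is illustrative, not taken from the repo):

```python
import os

# Allowed values, matching the comment in the launch instructions.
ALLOWED = {"llama", "llama-enhance", "deepseek", "deepseek-enhance"}

def resolve_model(env=None):
    """Read MODEL from the given environment mapping and validate it.

    The 'llama' default is an assumption for this sketch, not repo behavior.
    """
    env = os.environ if env is None else env
    model = env.get("MODEL", "llama")
    if model not in ALLOWED:
        raise ValueError(f"MODEL must be one of {sorted(ALLOWED)}, got {model!r}")
    return model

if __name__ == "__main__":
    print("Serving model:", resolve_model())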
Launch the vLLM server before running the following commands.
```shell
python benchmark/async_benchmark_humaneval.py --model llama
python benchmark/async_benchmark_humaneval.py --model llama --use-EFIM
python benchmark/async_benchmark_cceval.py --model llama
python benchmark/async_benchmark_cceval.py --model llama --use-EFIM
```
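Each benchmark script takes a `--model` choice and an optional `--use-EFIM` switch. A minimal `argparse` sketch of that CLI surface (the real scripts likely define additional options):

```python
import argparse

# Illustrative sketch of the benchmark command-line interface shown above.
def build_parser():
    p = argparse.ArgumentParser(description="EFIM benchmark (sketch)")
    p.add_argument("--model", required=True,
                   choices=["llama", "llama-enhance", "deepseek", "deepseek-enhance"])
    p.add_argument("--use-EFIM", action="store_true",
                   help="run with the EFIM-transformed infilling prompt format")
    return p

if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"model={args.model}, use_EFIM={args.use_EFIM}")
```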
Launch the vLLM server before running the following commands.
```shell
python benchmark/async_benchmark_inference_speed.py --model llama --num-round 5 --num-user 16
python benchmark/async_benchmark_inference_speed.py --model llama-enhance --num-round 5 --num-user 16 --use-EFIM
```
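The `--num-user` and `--num-round` options suggest a pattern of concurrent simulated users, each issuing a request per round. A toy `asyncio` sketch of that pattern, with `send_request` as a stand-in for a real call to the vLLM server (this is not the repo's implementation):

```python
import asyncio

async def send_request(user, round_):
    # Placeholder for an actual request to the inference server.
    await asyncio.sleep(0)
    return (user, round_)

async def run_benchmark(num_user, num_round):
    # Each user runs its rounds sequentially; users run concurrently.
    async def user_session(user):
        return [await send_request(user, r) for r in range(num_round)]

    sessions = await asyncio.gather(*(user_session(u) for u in range(num_user)))
    return sum(len(s) for s in sessions)

if __name__ == "__main__":
    total = asyncio.run(run_benchmark(num_user=16, num_round=5))
    print(f"completed {total} requests")
```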
Q1: How do I solve the `AssertionError` at `assert completions[idx].success`?

A1: One possible solution is to increase the allowed number of open files, e.g. via `ulimit -n 65535`.
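On POSIX systems the same limit can also be inspected and raised from Python with the standard `resource` module, which may be convenient when you cannot change the shell's `ulimit`. A sketch (the 65535 target mirrors the shell command above; an unprivileged process cannot raise the soft limit past the hard limit):

```python
import resource

# Inspect the current open-file limits (POSIX only).
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft limit: {soft}, hard limit: {hard}")

# Raise the soft limit toward 65535, capped at the hard limit.
target = hard if hard == resource.RLIM_INFINITY else min(65535, hard)
resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
```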
```bibtex
@misc{guo2025efimefficientservingllms,
      title={EFIM: Efficient Serving of LLMs for Infilling Tasks with Improved KV Cache Reuse},
      author={Tianyu Guo and Hande Dong and Yichong Leng and Feng Liu and Cheater Lin and Nong Xiao and Xianwei Zhang},
      year={2025},
      eprint={2505.21889},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2505.21889},
}
```