Official PyTorch implementation for the paper Hogwild! Inference: Parallel LLM Generation via Concurrent Attention
Install the packages from requirements.txt:

```bash
pip install -r requirements.txt
```
To try the inference procedure described in the paper, run the Jupyter notebooks from the notebooks/ folder:

- Simple example with a minimal prompt: basic_example.ipynb
- Hogwild! Inference with the full prompt: full_example.ipynb
- Minimal Colab example with Llama-3.2 3B and very limited collaboration: colab_example.ipynb (see the sketch below)
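The notebooks load the base model through the standard Hugging Face transformers API. For reference, here is a minimal sketch of ordinary single-stream generation with the same Llama-3.2 3B model the Colab example uses; the Hogwild! notebooks add the parallel workers and the concurrent attention cache on top of this baseline. The checkpoint id, prompt, and generation settings below are illustrative assumptions, not the exact notebook configuration:

```python
# Minimal sketch: ordinary single-stream generation with Llama-3.2 3B.
# This is NOT the Hogwild! mechanism itself -- the notebooks extend this
# baseline with a shared, concurrently updated attention cache so that
# several workers generate in parallel. Checkpoint id and sampling
# settings here are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-3.2-3B-Instruct"  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "Solve step by step: what is 17 * 24?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```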
To use the fast inference kernels, go to the inference_lib folder and install the module:

```bash
cd inference_lib
pip install -e .  # ensure the nvcc CUDA compiler is in PATH, or export CUDACXX=/TODO/path/to/nvcc
```
You can test it with the notebook hogwild_with_fast_kernels.ipynb. The kernels were optimized for the L40 and similar GPUs.
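Building the kernels requires a working CUDA toolchain. The following is a small, hypothetical sanity check (not part of the repository) that verifies PyTorch can see a GPU and that nvcc is reachable before you run the install:

```python
# Illustrative pre-build sanity check, not part of the repository.
# Confirms that PyTorch sees a CUDA device and that nvcc is on PATH;
# the actual build requirements may be stricter.
import shutil
import torch

assert torch.cuda.is_available(), "No CUDA device visible to PyTorch"
print("GPU:", torch.cuda.get_device_name(0))
print("nvcc:", shutil.which("nvcc") or "not on PATH -- set CUDACXX before building")
```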
If you find this work useful, please consider citing:

```bibtex
@misc{rodionov2025hogwildinferenceparallelllm,
      title={Hogwild! Inference: Parallel LLM Generation via Concurrent Attention},
      author={Gleb Rodionov and Roman Garipov and Alina Shutova and George Yakushev and Vage Egiazarian and Anton Sinitsin and Denis Kuznedelev and Dan Alistarh},
      year={2025},
      eprint={2504.06261},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2504.06261},
}
```