GPT-2 in Modular MAX

Usage

max serve --model openai-community/gpt2 --custom-architectures ../max-gpt-2
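
Once the server is running, it can be queried over HTTP. The sketch below is a minimal example, not part of this repository. It assumes that `max serve` exposes an OpenAI-compatible API at `http://localhost:8000/v1` (adjust `BASE_URL` if your setup differs) and that the plain `/completions` route is the right one for a base model like GPT-2. The helper names `build_completion_request` and `complete` are hypothetical.

```python
import json
import urllib.request

# Assumption: the server listens locally on port 8000 with an
# OpenAI-compatible API. Change this if your deployment differs.
BASE_URL = "http://localhost:8000/v1"
MODEL = "openai-community/gpt2"


def build_completion_request(prompt, max_tokens=32, model=MODEL):
    """Build (but do not send) a POST request for a text completion."""
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        f"{BASE_URL}/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, max_tokens=32):
    """Send the request and return the generated text of the first choice."""
    request = build_completion_request(prompt, max_tokens)
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With the server up, `complete("Lorem ipsum dolor sit amet,")` should return the model's continuation as a string.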

Features

GPU support
Paged KV Caching
Flash Attention

Performance

GPU: NVIDIA RTX 5090

Input prompt: first paragraph of lorem ipsum

Cold cache

Prompt processing: 3.7K tok/s

Token generation: 14.9 tok/s

Warm cache

Prompt processing: 30.7K tok/s

Token generation: 250.1 tok/s
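
To put the two sets of numbers side by side, the short sketch below computes the warm-over-cold speedup from the figures reported above. It is plain arithmetic on the README's numbers, reading "K" as thousands; it does not reproduce the benchmark itself, and the variable and function names are illustrative.

```python
# Throughput figures from this README, in tokens per second.
cold = {"prompt_processing": 3_700.0, "token_generation": 14.9}
warm = {"prompt_processing": 30_700.0, "token_generation": 250.1}


def speedup(cold_rates, warm_rates):
    """Return warm/cold throughput ratios for each measured phase."""
    return {name: warm_rates[name] / cold_rates[name] for name in cold_rates}


ratios = speedup(cold, warm)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x")
# prompt_processing: 8.3x
# token_generation: 16.8x
```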

Repository: tommyip/max-gpt-2
