tommyip/max-gpt-2

GPT-2 in Modular MAX

Usage

max serve --model openai-community/gpt2 --custom-architectures ../max-gpt-2
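Once the server is running, it can be queried over HTTP. The sketch below is a minimal client, not part of this repository. It assumes the server listens on http://localhost:8000 and exposes an OpenAI-style /v1/completions route; neither detail is stated in this README, so adjust both to match your setup. The helper names build_completion_request and complete are hypothetical.

```python
import json
import urllib.request

# Assumptions (not stated in the README): local address, port, and an
# OpenAI-style completions route.
BASE_URL = "http://localhost:8000/v1"
MODEL = "openai-community/gpt2"  # model name taken from the usage command


def build_completion_request(prompt, max_tokens=32):
    """Build (but do not send) a POST request for a text completion."""
    payload = {"model": MODEL, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        f"{BASE_URL}/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, max_tokens=32):
    """Send the request and return the generated text of the first choice."""
    req = build_completion_request(prompt, max_tokens)
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]


# Example (requires a running server):
# print(complete("Lorem ipsum dolor sit amet,"))
```

GPT-2 is a base model without a chat template, which is why the sketch targets a plain completions route rather than a chat route.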

Features

  • GPU support
  • Paged KV Caching
  • Flash Attention

Performance

GPU: NVIDIA RTX 5090

Input prompt: first paragraph of lorem ipsum

Cold cache

Prompt processing: 3.7K tok/s

Token generation: 14.9 tok/s

Warm cache

Prompt processing: 30.7K tok/s

Token generation: 250.1 tok/s
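To put the cold and warm figures side by side, the short calculation below derives the warm-over-cold speedup and the average time per generated token. It uses only the numbers reported above, reading "3.7K" and "30.7K" as 3,700 and 30,700 tok/s; the variable and function names are illustrative.

```python
# Throughput figures copied from the results above (tokens per second).
cold = {"prompt": 3_700.0, "generation": 14.9}
warm = {"prompt": 30_700.0, "generation": 250.1}


def speedup(warm_rate, cold_rate):
    """Ratio of warm-cache throughput to cold-cache throughput."""
    return warm_rate / cold_rate


def ms_per_token(rate):
    """Average time per token in milliseconds for a given tok/s rate."""
    return 1000.0 / rate


prompt_speedup = speedup(warm["prompt"], cold["prompt"])
generation_speedup = speedup(warm["generation"], cold["generation"])

print(f"Prompt processing speedup: {prompt_speedup:.1f}x")
print(f"Token generation speedup:  {generation_speedup:.1f}x")
print(f"Cold generation latency:   {ms_per_token(cold['generation']):.1f} ms/token")
print(f"Warm generation latency:   {ms_per_token(warm['generation']):.1f} ms/token")
```

With these inputs, the warm cache is roughly 8.3x faster for prompt processing and roughly 16.8x faster for token generation.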
