MLA (Multi-head Latent Attention) Implementation in C 🚀

A from-scratch C implementation of Multi-head Latent Attention with RoPE (Rotary Position Embedding) support.

Features ✨

  • Multi-head attention mechanism
  • RoPE (Rotary Position Embedding) implementation (sketched below)
  • Memory-efficient key-value caching (sketched below)
  • Content and positional attention scoring
  • Numerically stable softmax implementation (sketched below)
  • RMSNorm implementation for the query and key/value paths (sketched below)
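
RoPE rotates each consecutive pair of vector components by a position-dependent angle, so relative position falls out of the dot product between a rotated query and key. A minimal sketch follows; the function name, in-place rotation, and base frequency layout are illustrative assumptions, not necessarily how mla.c structures it:

```c
#include <math.h>
#include <stddef.h>

/* Rotary Position Embedding: rotate consecutive pairs (x[2i], x[2i+1])
 * by an angle that depends on the pair index and the token position.
 * `dim` must be even; 10000.0f is the base frequency from the RoPE paper. */
static void apply_rope(float *x, size_t dim, int pos) {
    for (size_t i = 0; i < dim; i += 2) {
        float freq  = 1.0f / powf(10000.0f, (float)i / (float)dim);
        float angle = (float)pos * freq;
        float c = cosf(angle), s = sinf(angle);
        float x0 = x[i], x1 = x[i + 1];
        x[i]     = x0 * c - x1 * s;
        x[i + 1] = x0 * s + x1 * c;
    }
}
```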
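
The memory saving in MLA comes from what gets cached: per the DeepSeek-V3 report, each token stores only the low-rank compressed KV latent plus the decoupled RoPE key part, rather than full per-head keys and values. A sketch of that cache layout, with hypothetical field names and dimensions (the caller is assumed to allocate the buffers):

```c
#include <stddef.h>
#include <string.h>

/* MLA-style KV cache sketch: instead of full per-head keys and values
 * (n_heads * head_dim floats each per token), store the down-projected
 * latent c_kv (d_latent floats per token, with d_latent much smaller)
 * plus the shared RoPE key part. Names/dims here are assumptions. */
typedef struct {
    float *kv_latent;   /* [max_seq_len * d_latent] compressed KV states  */
    float *k_rope;      /* [max_seq_len * d_rope] decoupled RoPE key part */
    size_t d_latent, d_rope, max_seq_len, len;
} KVCache;

static void kvcache_append(KVCache *c, const float *latent, const float *rope_k) {
    memcpy(c->kv_latent + c->len * c->d_latent, latent, c->d_latent * sizeof(float));
    memcpy(c->k_rope    + c->len * c->d_rope,   rope_k, c->d_rope   * sizeof(float));
    c->len++;
}
```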
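
The numerically stable softmax is the usual max-subtraction trick: shifting the logits by their maximum before exponentiating prevents expf() overflow without changing the result. A minimal sketch with an illustrative signature:

```c
#include <math.h>
#include <stddef.h>

/* Numerically stable softmax over scores[0..n-1], in place.
 * Subtracting the row maximum keeps expf() from overflowing
 * for large attention logits. */
static void softmax(float *scores, size_t n) {
    float max_val = scores[0];
    for (size_t i = 1; i < n; i++)
        if (scores[i] > max_val) max_val = scores[i];

    float sum = 0.0f;
    for (size_t i = 0; i < n; i++) {
        scores[i] = expf(scores[i] - max_val);
        sum += scores[i];
    }
    for (size_t i = 0; i < n; i++)
        scores[i] /= sum;
}
```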
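
RMSNorm scales a vector by the reciprocal of its root-mean-square, skipping LayerNorm's mean subtraction; here it sits on the query and key/value paths. A sketch with an assumed signature:

```c
#include <math.h>
#include <stddef.h>

/* RMSNorm: out_i = x_i / rms(x) * weight_i, where
 * rms(x) = sqrt(mean(x^2) + eps). Unlike LayerNorm it does not
 * subtract the mean, so it needs only one pass over the vector. */
static void rmsnorm(float *out, const float *x, const float *weight,
                    size_t dim, float eps) {
    float ss = 0.0f;
    for (size_t i = 0; i < dim; i++)
        ss += x[i] * x[i];
    float scale = 1.0f / sqrtf(ss / (float)dim + eps);
    for (size_t i = 0; i < dim; i++)
        out[i] = x[i] * scale * weight[i];
}
```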

Paper Implemented 📄

This implementation is based on the "DeepSeek-V3 Technical Report" by DeepSeek-AI.

Next To Do 📝

  • Add batch processing support
  • Optimize memory usage
  • Implement parallel processing
  • Performance benchmarking

Contribution 🤝

Feel free to contribute or suggest improvements!
