MLA (Multi-head Latent Attention) Implementation in C 🚀

A from-scratch C implementation of Multi-head Latent Attention with RoPE (Rotary Position Embedding) support.

Features ✨

  • Multi-head attention mechanism
  • RoPE (Rotary Position Embedding) implementation (sketched below)
  • Memory-efficient key-value caching (sketched below)
  • Content and positional attention scoring
  • Numerically stable softmax implementation (sketched below)
  • RMSNorm implementation for the query and key/value paths (sketched below)
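
RoPE rotates each consecutive pair of vector components by a position-dependent angle, so relative position falls out of the dot product between a rotated query and key. A minimal sketch follows; the function name, in-place rotation, and base frequency layout are illustrative assumptions, not necessarily how mla.c structures it:

```c
#include <math.h>
#include <stddef.h>

/* Rotary Position Embedding: rotate consecutive pairs (x[2i], x[2i+1])
 * by an angle that depends on the pair index and the token position.
 * `dim` must be even; 10000.0f is the base frequency from the RoPE paper. */
static void apply_rope(float *x, size_t dim, int pos) {
    for (size_t i = 0; i < dim; i += 2) {
        float freq  = 1.0f / powf(10000.0f, (float)i / (float)dim);
        float angle = (float)pos * freq;
        float c = cosf(angle), s = sinf(angle);
        float x0 = x[i], x1 = x[i + 1];
        x[i]     = x0 * c - x1 * s;
        x[i + 1] = x0 * s + x1 * c;
    }
}
```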
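
The memory saving in MLA comes from what gets cached: per the DeepSeek-V3 report, each token stores only the low-rank compressed KV latent plus the decoupled RoPE key part, rather than full per-head keys and values. A sketch of that cache layout, with hypothetical field names and dimensions (the caller is assumed to allocate the buffers):

```c
#include <stddef.h>
#include <string.h>

/* MLA-style KV cache sketch: instead of full per-head keys and values
 * (n_heads * head_dim floats each per token), store the down-projected
 * latent c_kv (d_latent floats per token, with d_latent much smaller)
 * plus the shared RoPE key part. Names/dims here are assumptions. */
typedef struct {
    float *kv_latent;   /* [max_seq_len * d_latent] compressed KV states  */
    float *k_rope;      /* [max_seq_len * d_rope] decoupled RoPE key part */
    size_t d_latent, d_rope, max_seq_len, len;
} KVCache;

static void kvcache_append(KVCache *c, const float *latent, const float *rope_k) {
    memcpy(c->kv_latent + c->len * c->d_latent, latent, c->d_latent * sizeof(float));
    memcpy(c->k_rope    + c->len * c->d_rope,   rope_k, c->d_rope   * sizeof(float));
    c->len++;
}
```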
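
The numerically stable softmax is the usual max-subtraction trick: shifting the logits by their maximum before exponentiating prevents expf() overflow without changing the result. A minimal sketch with an illustrative signature:

```c
#include <math.h>
#include <stddef.h>

/* Numerically stable softmax over scores[0..n-1], in place.
 * Subtracting the row maximum keeps expf() from overflowing
 * for large attention logits. */
static void softmax(float *scores, size_t n) {
    float max_val = scores[0];
    for (size_t i = 1; i < n; i++)
        if (scores[i] > max_val) max_val = scores[i];

    float sum = 0.0f;
    for (size_t i = 0; i < n; i++) {
        scores[i] = expf(scores[i] - max_val);
        sum += scores[i];
    }
    for (size_t i = 0; i < n; i++)
        scores[i] /= sum;
}
```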
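
RMSNorm scales a vector by the reciprocal of its root-mean-square, skipping LayerNorm's mean subtraction; here it sits on the query and key/value paths. A sketch with an assumed signature:

```c
#include <math.h>
#include <stddef.h>

/* RMSNorm: out_i = x_i / rms(x) * weight_i, where
 * rms(x) = sqrt(mean(x^2) + eps). Unlike LayerNorm it does not
 * subtract the mean, so it needs only one pass over the vector. */
static void rmsnorm(float *out, const float *x, const float *weight,
                    size_t dim, float eps) {
    float ss = 0.0f;
    for (size_t i = 0; i < dim; i++)
        ss += x[i] * x[i];
    float scale = 1.0f / sqrtf(ss / (float)dim + eps);
    for (size_t i = 0; i < dim; i++)
        out[i] = x[i] * scale * weight[i];
}
```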

Paper Implemented 📄

This implementation is based on the "DeepSeek-V3 Technical Report" by DeepSeek-AI.

Next To Do 📝

  • Add batch processing support
  • Optimize memory usage
  • Implement parallel processing
  • Performance benchmarking

Contribution 🤝

Feel free to contribute or suggest improvements!
