A C implementation of Multi-head Latent Attention (MLA) with RoPE (Rotary Position Embedding) support.
- Multi-head attention mechanism
- RoPE (Rotary Position Embedding) implementation
- Memory-efficient key-value caching
- Content and positional attention scoring
- Numerically stable softmax implementation
- RMSNorm implementation (for query and key/value paths)
This implementation is based on the "DeepSeek-V3 Technical Report" by DeepSeek-AI.
- Add batch processing support
- Optimize memory usage
- Implement parallel processing
- Performance benchmarking
Feel free to contribute or suggest improvements!