⚡️FFPA: Extend FlashAttention-2 with Split-D, achieve ~O(1) SRAM complexity for large headdim, 1.8x~3x↑ vs SDPA.🎉
Updated May 10, 2025 · CUDA
An open-source interface for using the multiple-precision semidefinite programming solver SDPA-GMP with YALMIP.
PyTorch implementation of YOLOv12 using Scaled Dot-Product Attention (SDPA) with a FlashAttention backend for fast, efficient object detection.
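For readers new to the topic, the SDPA that the repositories above accelerate is the standard attention formula softmax(QKᵀ/√d)V. A minimal, dependency-free reference sketch is below; the function name and list-of-rows layout are illustrative choices, and this naive version materializes the full score matrix, which is exactly the memory cost that FlashAttention-style fused kernels avoid.

```python
import math

def scaled_dot_product_attention(q, k, v):
    """Naive reference SDPA: softmax(Q K^T / sqrt(d)) V.

    q, k, v are lists of row vectors (seq_len x head_dim lists of floats).
    Illustrative only; fused kernels never build the full score matrix.
    """
    d = len(q[0])
    # Attention scores: Q K^T, scaled by 1/sqrt(head_dim)
    scores = [[sum(qi * ki for qi, ki in zip(qrow, krow)) / math.sqrt(d)
               for krow in k] for qrow in q]
    # Numerically stable row-wise softmax
    weights = []
    for row in scores:
        m = max(row)
        exps = [math.exp(s - m) for s in row]
        z = sum(exps)
        weights.append([e / z for e in exps])
    # Output: attention-weighted sum of the value rows
    return [[sum(w * vrow[j] for w, vrow in zip(wrow, v))
             for j in range(len(v[0]))] for wrow in weights]
```

In PyTorch this corresponds to `torch.nn.functional.scaled_dot_product_attention`, which dispatches to fused backends (including FlashAttention) when available.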