Stars
Share interesting, entry-level open source projects on GitHub.
MemOS (Preview) | Intelligence Begins with Memory
A high-throughput and memory-efficient inference and serving engine for LLMs
Distributed Compiler based on Triton for Parallel Systems
Efficient Triton Kernels for LLM Training
Run your own AI cluster at home with everyday devices 📱💻 🖥️⌚
Open-source implementation for "Helix: Serving Large Language Models over Heterogeneous GPUs and Network via Max-Flow"
🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading
Development repository for the Triton language and compiler
A collection of memory efficient attention operators implemented in the Triton language.
Flash attention tutorials written in Python, Triton, CUDA, and CUTLASS
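The two entries above both center on flash-attention-style kernels. Not part of the original descriptions, but as context for what "memory efficient attention" means: a minimal NumPy sketch of the online-softmax idea, which processes K/V in tiles and keeps a running max and normalizer per query row so the full N×N attention matrix is never materialized. Function names and the tile size are illustrative, not taken from any of the listed repos.

```python
import numpy as np

def tiled_attention(Q, K, V, tile=16):
    """Attention via online softmax: K/V are consumed tile by tile."""
    N, d = Q.shape
    out = np.zeros((N, d))
    m = np.full(N, -np.inf)          # running row max of the scores
    l = np.zeros(N)                  # running softmax normalizer
    scale = 1.0 / np.sqrt(d)
    for start in range(0, K.shape[0], tile):
        Kt = K[start:start + tile]
        Vt = V[start:start + tile]
        S = (Q @ Kt.T) * scale                 # scores for this tile only
        m_new = np.maximum(m, S.max(axis=1))   # updated row max
        p = np.exp(S - m_new[:, None])         # tile probabilities, shifted
        correction = np.exp(m - m_new)         # rescale earlier partial sums
        l = l * correction + p.sum(axis=1)
        out = out * correction[:, None] + p @ Vt
        m = m_new
    return out / l[:, None]

def attention(Q, K, V):
    """Reference: ordinary attention that materializes the full score matrix."""
    S = (Q @ K.T) / np.sqrt(Q.shape[1])
    P = np.exp(S - S.max(axis=1, keepdims=True))
    P /= P.sum(axis=1, keepdims=True)
    return P @ V
```

Both functions return the same result; the tiled version only ever holds an N×tile slice of scores, which is the memory saving these kernels exploit (the real kernels additionally fuse the tiles into on-chip SRAM).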
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
A series of GPU optimization topics introducing in detail how to optimize CUDA kernels, covering several basic kernel optimizations, including elementwise, reduce, s…
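The "reduce" optimization mentioned above refers to the tree-style parallel reduction used in CUDA kernels. As an illustration (not code from the listed repo), here is a pure-Python sketch in which each loop index `i` plays the role of one thread in a block, halving the active stride each step just as a shared-memory block reduce does:

```python
def block_reduce(values):
    """Sum a list with the log-step pairwise pattern of a CUDA block reduce."""
    vals = list(values)          # stands in for shared memory
    n = len(vals)
    stride = 1
    while stride < n:            # round the stride up to a power of two,
        stride *= 2              # as kernels do for non-power-of-two sizes
    stride //= 2
    while stride > 0:
        for i in range(stride):  # each i stands in for one active thread
            if i + stride < n:   # bounds check, like the in-kernel guard
                vals[i] += vals[i + stride]
        n = stride               # only the first half stays active
        stride //= 2
    return vals[0]
```

On the GPU the inner loop runs in parallel across threads with a `__syncthreads()` barrier between strides; the sequential version above shows only the access pattern, which is what the optimization series analyzes.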
Machine Learning Engineering Open Book
📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
A high-performance distributed file system designed to address the challenges of AI training and inference workloads.
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
Fast inference engine for Transformer models
Connect home devices into a powerful cluster to accelerate LLM inference. More devices mean faster inference.
High-speed Large Language Model Serving for Local Deployment
Awesome LLMs on Device: A Comprehensive Survey
An introductory tutorial on recommender systems; read online at: https://datawhalechina.github.io/fun-rec/