Stars
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal domains, for both inference and training.
Tensors and Dynamic neural networks in Python with strong GPU acceleration
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Development repository for the Triton language and compiler
verl: Volcano Engine Reinforcement Learning for LLMs
This is the homepage of a new book entitled "Mathematical Foundations of Reinforcement Learning."
This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to it.
A minimal GPU design in Verilog to learn how GPUs work from the ground up
DeepEP: an efficient expert-parallel communication library
High-speed Large Language Model Serving for Local Deployment
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
Modeling, training, eval, and inference code for OLMo
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
Training and serving large-scale neural networks with auto parallelization.
A bidirectional pipeline parallelism algorithm for computation-communication overlap in DeepSeek V3/R1 training.
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized attention achieving a 2-5x speedup over FlashAttention without degrading end-to-end metrics across language, image, and video models.
DLRover: An Automatic Distributed Deep Learning System
Distributed Compiler based on Triton for Parallel Systems
DeepRec is a high-performance deep learning framework for recommendation, based on TensorFlow. It is hosted in incubation by the LF AI & Data Foundation.
BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.
Examples of CUDA implementations using CUTLASS CuTe
Allows torch tensor memory to be released and resumed later
Implements Flash Attention using CuTe.
Pre-training code for CrystalCoder 7B LLM
Data preparation code for CrystalCoder 7B LLM