Stars
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
fmchisel: Efficient Compression and Training Algorithms for Foundation Models
Introduction to Machine Learning Systems
FlexAttention-based, minimal vLLM-style inference engine for fast Gemma 2 inference.
Renderer for the harmony response format to be used with gpt-oss
Mirage Persistent Kernel: Compiling LLMs into a MegaKernel
Calculate token/s & GPU memory requirement for any LLM. Supports llama.cpp/ggml/bnb/QLoRA quantization
KernelBench: Can LLMs Write GPU Kernels? - Benchmark with Torch -> CUDA problems
Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serving systems.
A single-file educational implementation for understanding vLLM's core concepts and running LLM inference.
🤗 LeRobot: Making AI for Robotics more accessible with end-to-end learning
Open Source DeepWiki: AI-Powered Wiki Generator for GitHub/GitLab/Bitbucket Repositories. Join the Discord: https://discord.gg/gMwThUMeme
My learning notes and code for ML systems.
Efficient Triton Kernels for LLM Training
Supercharge Your LLM with the Fastest KV Cache Layer
vLLM’s reference system for K8S-native cluster-wide deployment with community-driven performance optimization
Minimalistic 4D-parallelism distributed training framework for educational purposes
Model Context Protocol Servers
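LLM knowledge sharing that everyone can understand: a must-read before spring/autumn recruitment LLM interviews, so you can talk confidently with your interviewer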
This is the official repository for The Hundred-Page Language Models Book by Andriy Burkov
The Python code to reproduce the illustrations from The Hundred-Page Machine Learning Book.
Fully open reproduction of DeepSeek-R1