Stars
Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.
A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.
A verification tool for ensuring parallelization equivalence in distributed model training.
A CPU+GPU profiling library that provides access to timeline traces and hardware performance counters.
Representation and Reference Lowering of ONNX Models in MLIR Compiler Infrastructure
A high-throughput and memory-efficient inference and serving engine for LLMs
gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI
Performance and Detection Benchmarks for TrainCheck (https://github.com/OrderLab/TrainCheck)
DocuSnap: Your AI-powered Personal Document Assistant.
DocuSnap frontend built in Android Studio
A curated reading list for machine learning reliability research and practice
Mirage Persistent Kernel: Compiling LLMs into a MegaKernel
Artifact Evaluation Scripts and Workloads for TrainCheck (OSDI'25)
A Framework for Automated Validation of Deep Learning Training Tasks
ByteCheckpoint: A Unified Checkpointing Library for LFMs
JaxPP is a library for JAX that enables flexible MPMD pipeline parallelism for large-scale LLM training
Super-Efficient RLHF Training of LLMs with Parameter Reallocation
Collective communications library with various primitives for multi-machine training.
Disseminated, Distributed OS for Hardware Resource Disaggregation. USENIX OSDI 2018 Best Paper.
Qwen3 is the large language model series developed by the Qwen team at Alibaba Cloud.
Source code for "You Call This Broken Thing an Operating System?": read the core Linux 0.11 code like a novel
Tile primitives for speedy kernels
DeepSeek-V3/R1 inference performance simulator
VIP cheatsheet for Stanford's CME 295 Transformers and Large Language Models
Create beautiful diagrams just by typing notation in plain text.