tqchen
🎯 Focusing

Highlights

  • Pro

Organizations

@apache @dmlc @uwsampl @octoml

A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.

Python 399 41 Updated Oct 18, 2025

A size profiler for CUDA binaries

Python 51 Updated Oct 7, 2025

An extremely fast Python package and project manager, written in Rust.

Rust 70,231 2,124 Updated Oct 18, 2025

VS Code extension for syntax highlighting C++/CUDA/HIP code in PyTorch load_inline() strings

Python 8 Updated Jul 25, 2025

RFC document, tooling and other content related to the array API standard

Python 258 52 Updated Sep 4, 2025

AGENTS.md — a simple, open format for guiding coding agents

TypeScript 7,365 570 Updated Oct 14, 2025

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and support state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorR…

C++ 11,880 1,808 Updated Oct 18, 2025

🎡 Build Python wheels for all the platforms with minimal configuration.

Python 2,118 291 Updated Oct 15, 2025

A next generation Python CMake adaptor and Python API for plugins

Python 398 72 Updated Oct 13, 2025

Tilus is a tile-level kernel programming language with explicit control over shared memory and registers.

Python 380 8 Updated Oct 9, 2025

Minimum example for deploying Apache TVM's Relax IR using C++ API

C++ 4 Updated Sep 20, 2025
Python 96 9 Updated Sep 13, 2025

JaxPP is a library for JAX that enables flexible MPMD pipeline parallelism for large-scale LLM training

Python 55 1 Updated Oct 13, 2025

Distributed Compiler based on Triton for Parallel Systems

Python 1,175 96 Updated Oct 17, 2025

A Datacenter Scale Distributed Inference Serving Framework

Rust 5,313 645 Updated Oct 18, 2025

A bidirectional pipeline parallelism algorithm for computation-communication overlap in DeepSeek V3/R1 training.

Python 2,868 304 Updated Mar 10, 2025

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 5,798 718 Updated Oct 15, 2025

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ 11,815 884 Updated Sep 30, 2025

verl: Volcano Engine Reinforcement Learning for LLMs

Python 14,440 2,286 Updated Oct 18, 2025

Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation

7,924 285 Updated May 15, 2025

TL2cgen (TreeLite 2 C GENerator) is a model compiler for decision tree models

C++ 37 9 Updated Oct 13, 2025
TypeScript 48 6 Updated Mar 9, 2025

Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels

C++ 3,607 267 Updated Oct 17, 2025

Machine Learning Containers for NVIDIA Jetson and JetPack-L4T

Jupyter Notebook 3,831 721 Updated Oct 17, 2025

A PyTorch native platform for training generative AI models

Python 4,555 565 Updated Oct 18, 2025

Modeling, training, eval, and inference code for OLMo

Python 6,042 663 Updated Oct 17, 2025

Fast, Flexible and Portable Structured Generation

C++ 1,309 91 Updated Oct 11, 2025

Structured Outputs

Python 12,716 642 Updated Oct 15, 2025
Python 746 11 Updated Apr 17, 2024

CUDA Python: Performance meets Productivity

Python 3,007 215 Updated Oct 17, 2025