zhiqiu
👋
hi

MSCCL++: A GPU-driven communication stack for scalable AI applications

C++ · 421 stars · 69 forks · Updated Oct 12, 2025

Simple, safe way to store and distribute tensors

Python · 3,477 stars · 270 forks · Updated Oct 10, 2025

nanobind: tiny and efficient C++/Python bindings

C++ · 3,065 stars · 255 forks · Updated Oct 13, 2025

🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support

Python · 9,202 stars · 1,211 forks · Updated Oct 10, 2025

Minimalistic 4D-parallelism distributed training framework for educational purposes

Python · 1,846 stars · 137 forks · Updated Aug 26, 2025

Minimalistic large language model 3D-parallelism training

Python · 2,250 stars · 249 forks · Updated Sep 3, 2025

A fast communication-overlapping library for tensor/expert parallelism on GPUs.

C++ · 1,142 stars · 81 forks · Updated Aug 28, 2025

Zero Bubble Pipeline Parallelism

Python · 431 stars · 30 forks · Updated May 7, 2025

Efficient Triton Kernels for LLM Training

Python · 5,741 stars · 415 forks · Updated Oct 13, 2025

GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.

C++ · 76,797 stars · 8,290 forks · Updated May 27, 2025

slime is an LLM post-training framework for RL Scaling.

Python · 2,127 stars · 201 forks · Updated Oct 13, 2025

gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI

Python · 18,799 stars · 1,842 forks · Updated Oct 6, 2025

A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations

Python · 15,168 stars · 1,092 forks · Updated Oct 12, 2025

Implement a ChatGPT-like LLM in PyTorch from scratch, step by step

Jupyter Notebook · 75,340 stars · 11,036 forks · Updated Oct 13, 2025

FlashInfer: Kernel Library for LLM Serving

Cuda · 3,892 stars · 525 forks · Updated Oct 13, 2025

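A multi-task, real-time and scheduled monitoring and intelligent analysis tool for Xianyu listings, built on Playwright and AI filtering, with a full-featured admin dashboard. It saves users time filtering Xianyu items and helps them find the items they want promptly.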

Python · 6,361 stars · 861 forks · Updated Oct 11, 2025

[ICML2025, NeurIPS2025 Spotlight] Sparse VideoGen 1 & 2: Accelerating Video Diffusion Transformers with Sparse Attention

Python · 506 stars · 26 forks · Updated Oct 5, 2025

📚A curated list of Awesome Diffusion Inference Papers with Codes: Sampling, Cache, Quantization, Parallelism, etc.🎉

Python · 419 stars · 20 forks · Updated Aug 19, 2025

Source code spell checker

Rust · 3,499 stars · 144 forks · Updated Oct 7, 2025

Best practices for training DeepSeek, Mixtral, Qwen and other MoE models using Megatron Core.

Python · 109 stars · 19 forks · Updated Oct 11, 2025

Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, GLM4.5, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, Llava, GLM4v, Ph…

Python · 10,315 stars · 900 forks · Updated Oct 13, 2025

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

Python · 60,092 stars · 7,285 forks · Updated Oct 13, 2025

Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek-R1, Qwen3, Gemma 3, TTS 2x faster with 70% less VRAM.

Python · 46,874 stars · 3,827 forks · Updated Oct 12, 2025

A lightweight design for computation-communication overlap.

Cuda · 180 stars · 8 forks · Updated Oct 10, 2025

SGLang is a fast serving framework for large language models and vision language models.

Python · 18,795 stars · 3,025 forks · Updated Oct 13, 2025

CUDA Python: Performance meets Productivity

Python · 3,000 stars · 213 forks · Updated Oct 13, 2025

Material for gpu-mode lectures

Jupyter Notebook · 5,163 stars · 515 forks · Updated Sep 23, 2025

Distributed Compiler based on Triton for Parallel Systems

Python · 1,167 stars · 96 forks · Updated Oct 2, 2025

Lightweight coding agent that runs in your terminal

Rust · 47,250 stars · 5,687 forks · Updated Oct 13, 2025

Patch convolution to avoid the large GPU memory usage of Conv2D

Python · 92 stars · 8 forks · Updated Jan 23, 2025