Stars
MSCCL++: A GPU-driven communication stack for scalable AI applications
Simple, safe way to store and distribute tensors
nanobind: tiny and efficient C++/Python bindings
🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, with automatic mixed precision (including fp8) and easy-to-configure FSDP and DeepSpeed support
Minimalistic 4D-parallelism distributed training framework for educational purposes
Minimalistic large language model 3D-parallelism training
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
Zero Bubble Pipeline Parallelism
Efficient Triton Kernels for LLM Training
GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.
slime is an LLM post-training framework for RL scaling.
gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
FlashInfer: Kernel Library for LLM Serving
A multi-task real-time/scheduled monitoring and intelligent analysis tool for Xianyu (闲鱼), built on Playwright with AI-based filtering and a full-featured admin dashboard. Helps users skip manual filtering of Xianyu listings and find desired items quickly.
[ICML2025, NeurIPS2025 Spotlight] Sparse VideoGen 1 & 2: Accelerating Video Diffusion Transformers with Sparse Attention
📚 A curated list of awesome diffusion inference papers with code: sampling, caching, quantization, parallelism, etc. 🎉
Best practices for training DeepSeek, Mixtral, Qwen and other MoE models using Megatron Core.
Use PEFT or full-parameter training for CPT/SFT/DPO/GRPO on 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, GLM4.5, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, Llava, GLM4v, Ph…
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek-R1, Qwen3, Gemma 3, TTS 2x faster with 70% less VRAM.
A lightweight design for computation-communication overlap.
SGLang is a fast serving framework for large language models and vision language models.
CUDA Python: Performance meets Productivity
Distributed Compiler based on Triton for Parallel Systems
Lightweight coding agent that runs in your terminal
Patch-wise convolution to avoid the large GPU memory usage of Conv2D