rhmaaa
  • bytedance
  • shanghai


:trollface: Clever Git tips and tricks

15,816 3,261 Updated Dec 8, 2022

A Quirky Assortment of CuTe Kernels

Python 615 48 Updated Oct 11, 2025

[ICLR 2025] When Attention Sink Emerges in Language Models: An Empirical View (Spotlight)

Python 126 5 Updated Jul 8, 2025

🐳 Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention"

Python 893 46 Updated Mar 19, 2025

📰 Must-read papers on KV Cache Compression (constantly updating 🤗).

555 13 Updated Sep 30, 2025

An efficient implementation of the NSA (Native Sparse Attention) kernel

Python 119 5 Updated Jun 24, 2025

Making Flux go brrr on GPUs.

Python 144 13 Updated Jul 18, 2025

DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models

Python 1,804 292 Updated Jan 16, 2024

Puzzles for learning Triton, play it with minimal environment configuration!

Python 542 65 Updated Sep 22, 2025

CPM.cu is a lightweight, high-performance CUDA implementation for LLMs, optimized for end-device inference and featuring cutting-edge techniques in sparse architecture, speculative sampling and qua…

Cuda 198 19 Updated Oct 10, 2025

Nano vLLM

Python 7,016 893 Updated Aug 31, 2025

DeepEP: an efficient expert-parallel communication library

Cuda 8,593 950 Updated Oct 11, 2025

Modified version of PyTorch able to work with changes to GPGPU-Sim

C++ 56 30 Updated Nov 18, 2022

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

Python 3,644 280 Updated Oct 10, 2025

A course of learning LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.

Python 3,308 218 Updated Oct 10, 2025

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 5,784 711 Updated Oct 11, 2025
Cuda 120 16 Updated Mar 17, 2025

Distributed Compiler based on Triton for Parallel Systems

Python 1,163 103 Updated Oct 2, 2025

Turn your AirPods into a posture coach on macOS

Swift 4 1 Updated Jun 5, 2025

Latest Advances on System-2 Reasoning

Python 1,248 69 Updated Jun 8, 2025

Paper list for Efficient Reasoning.

683 24 Updated Sep 19, 2025

A fast communication-overlapping library for tensor/expert parallelism on GPUs.

C++ 1,142 84 Updated Aug 28, 2025

A lightweight design for computation-communication overlap.

Cuda 180 8 Updated Oct 10, 2025

Aims to implement dual-port and multi-qp solutions in deepEP ibrc transport

Cuda 63 2 Updated May 9, 2025

Perplexity GPU Kernels

C++ 484 62 Updated Sep 19, 2025
C++ 10 3 Updated Feb 11, 2025

Fast Hadamard transform in CUDA, with a PyTorch interface

C 246 41 Updated Oct 6, 2025

An open source GPU based off of the AMD Southern Islands ISA.

Verilog 1,237 250 Updated Aug 18, 2025