
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Python 15,847 3,129 Updated Oct 11, 2025
Python 815 63 Updated Sep 12, 2025

A Virtualized Multi-LoRA Framework for Unified LLM Fine-tuning and Serving

Python 1 Updated Sep 5, 2025

A std::execution style runtime context and High Performance RPC Transport for using OpenUCX. Including CUDA/ROCM/... devices with RDMA.

C++ 11 3 Updated Sep 29, 2025
Python 29 4 Updated Mar 17, 2025

SCORPIO is a system-algorithm co-designed LLM serving engine that prioritizes heterogeneous Service Level Objectives (SLOs) like TTFT and TPOT across all scheduling stages.

Jupyter Notebook 5 1 Updated Sep 23, 2025

Venus Collective Communication Library, supported by SII and Infrawaves.

C++ 97 3 Updated Oct 10, 2025

The 100 line AI agent that solves GitHub issues or helps you in your command line. Radically simple, no huge configs, no giant monorepo—but scores >70% on SWE-bench verified!

Python 1,847 187 Updated Oct 11, 2025

📰 Must-read papers on KV Cache Compression (constantly updating 🤗).

555 13 Updated Sep 30, 2025

A high-performance inference engine for LLMs, optimized for diverse AI accelerators.

C++ 540 68 Updated Oct 11, 2025

Awesome list for LLM quantization

Python 316 19 Updated Oct 11, 2025

PoC for "SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning" [NeurIPS '25]

Python 53 7 Updated Oct 2, 2025

From Minimal GEMM to Everything

Cuda 53 2 Updated Oct 10, 2025

Unleashing the Power of Reinforcement Learning for Math and Code Reasoners

Python 724 44 Updated Jun 6, 2025

Fast and memory-efficient exact kmeans

Python 100 6 Updated Sep 30, 2025

[NeurIPS 2025] R-KV: Redundancy-aware KV Cache Compression for Reasoning Models

Python 1,126 182 Updated Aug 29, 2025

calflops is designed to calculate FLOPs, MACs, and parameters for a wide range of neural networks, such as Linear, CNN, RNN, GCN, and Transformer models (BERT, LLaMA, and other large language models).

Python 880 36 Updated Jun 27, 2024

A Unified Cache Acceleration Framework for 🤗 Diffusers: Qwen-Image-Lightning, Qwen-Image, HunyuanImage, FLUX, Wan, etc.

Python 388 12 Updated Oct 11, 2025

[ICLR2025 Spotlight] MagicPIG: LSH Sampling for Efficient LLM Generation

Python 238 15 Updated Dec 16, 2024

[NeurIPS 2025] ClusterFusion: Expanding Operator Fusion Scope for LLM Inference via Cluster-Level Collective Primitive

Cuda 41 2 Updated Sep 24, 2025
Python 25 1 Updated Mar 24, 2025

The official implementation of flow Q-learning (FQL)

Python 236 22 Updated Jul 21, 2025

Tongyi Deep Research, the Leading Open-source Deep Research Agent

Python 15,752 1,172 Updated Oct 5, 2025

Open-Sora: Democratizing Efficient Video Production for All

Python 27,390 2,700 Updated Apr 30, 2025

[ASPLOS'25] Towards End-to-End Optimization of LLM-based Applications with Ayo

Python 45 4 Updated Aug 5, 2025

Parallel Scaling Law for Language Model — Beyond Parameter and Inference Time Scaling

Python 444 20 Updated May 17, 2025

Ring attention implementation with flash attention

Python 1 Updated Sep 10, 2025