A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
A Virtualized Multi-LoRA Framework for Unified LLM Fine-tuning and Serving
A std::execution-style runtime context and high-performance RPC transport built on OpenUCX, including support for CUDA/ROCm/... devices with RDMA.
SCORPIO is a system-algorithm co-designed LLM serving engine that prioritizes heterogeneous Service Level Objectives (SLOs) like TTFT and TPOT across all scheduling stages.
Venus Collective Communication Library, supported by SII and Infrawaves.
The 100 line AI agent that solves GitHub issues or helps you in your command line. Radically simple, no huge configs, no giant monorepo—but scores >70% on SWE-bench verified!
📰 Must-read papers on KV Cache Compression (constantly updating 🤗).
A high-performance inference engine for LLMs, optimized for diverse AI accelerators.
Awesome list for LLM quantization
PoC for "SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning" [NeurIPS '25]
Unleashing the Power of Reinforcement Learning for Math and Code Reasoners
[NeurIPS 2025] R-KV: Redundancy-aware KV Cache Compression for Reasoning Models
calflops is designed to calculate FLOPs, MACs, and parameters for a wide range of neural networks, such as Linear, CNN, RNN, GCN, and Transformer models (BERT, LLaMA, and other large language models).
A Unified Cache Acceleration Framework for 🤗 Diffusers: Qwen-Image-Lightning, Qwen-Image, HunyuanImage, FLUX, Wan, etc.
[ICLR 2025 Spotlight] MagicPIG: LSH Sampling for Efficient LLM Generation
[NeurIPS 2025] ClusterFusion: Expanding Operator Fusion Scope for LLM Inference via Cluster-Level Collective Primitive
The official implementation of flow Q-learning (FQL)
Tongyi Deep Research, the Leading Open-source Deep Research Agent
Open-Sora: Democratizing Efficient Video Production for All
[ASPLOS'25] Towards End-to-End Optimization of LLM-based Applications with Ayo
Parallel Scaling Law for Language Models — Beyond Parameter and Inference Time Scaling
Ring attention implementation with flash attention