junl666
  • ZJU

Showing results

:octocat: Share interesting, entry-level open source projects on GitHub.

Python · 122,735 stars · 10,572 forks · Updated Jun 27, 2025

Tensor library for machine learning

C++ · 12,876 stars · 1,286 forks · Updated Jul 25, 2025

MemOS (Preview) | Intelligence Begins with Memory

Python · 2,005 stars · 162 forks · Updated Jul 26, 2025

The true story of the hardship and darkness behind the development of Huawei Noah's Ark Lab's Pangu large model.

11,282 stars · 1,384 forks · Updated Jul 9, 2025

Nano vLLM

Python · 5,443 stars · 649 forks · Updated Jun 27, 2025

A high-throughput and memory-efficient inference and serving engine for LLMs

Python · 53,253 stars · 8,943 forks · Updated Jul 26, 2025

Distributed Compiler based on Triton for Parallel Systems

Python · 915 stars · 80 forks · Updated Jul 25, 2025

Efficient Triton Kernels for LLM Training

Python · 5,420 stars · 371 forks · Updated Jul 24, 2025

Run your own AI cluster at home with everyday devices 📱💻 🖥️⌚

Python · 29,086 stars · 1,852 forks · Updated Mar 21, 2025

Open-source implementation for "Helix: Serving Large Language Models over Heterogeneous GPUs and Network via Max-Flow"

Python · 56 stars · 9 forks · Updated Nov 24, 2024

🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading

Python · 9,731 stars · 567 forks · Updated Sep 7, 2024

Development repository for the Triton language and compiler

MLIR · 16,290 stars · 2,130 forks · Updated Jul 26, 2025

A collection of memory efficient attention operators implemented in the Triton language.

Python · 273 stars · 18 forks · Updated Jun 5, 2024

Flash attention tutorial written in Python, Triton, CUDA, and CUTLASS.

Cuda · 391 stars · 42 forks · Updated May 14, 2025

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

Python · 3,408 stars · 270 forks · Updated Jul 25, 2025

This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, s…

Cuda · 1,100 stars · 162 forks · Updated Jul 29, 2023

Machine Learning Engineering Open Book

Python · 14,523 stars · 875 forks · Updated Jul 24, 2025

📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉

Python · 4,289 stars · 296 forks · Updated Jul 23, 2025

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda · 5,710 stars · 604 forks · Updated Jul 21, 2025

Row-major matmul optimization.

C++ · 648 stars · 88 forks · Updated Sep 9, 2023

A high-performance distributed file system designed to address the challenges of AI training and inference workloads.

C++ · 9,163 stars · 923 forks · Updated Jun 17, 2025

😼 Elegantly use a proxy environment based on clash/mihomo.

Shell · 3,022 stars · 434 forks · Updated Jul 23, 2025

A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations

Python · 14,715 stars · 1,049 forks · Updated Jul 25, 2025

Fast inference engine for Transformer models

C++ · 3,925 stars · 370 forks · Updated Apr 8, 2025

Connect home devices into a powerful cluster to accelerate LLM inference. More devices mean faster inference.

C++ · 2,229 stars · 155 forks · Updated Jul 7, 2025

High-speed Large Language Model Serving for Local Deployment

C++ · 8,239 stars · 434 forks · Updated Feb 19, 2025

Simplify large ONNX models (>2 GB).

Python · 60 stars · 4 forks · Updated Nov 30, 2024

Export llama models to ONNX.

Python · 131 stars · 16 forks · Updated Dec 28, 2024

Awesome LLMs on Device: A Comprehensive Survey

1,160 stars · 106 forks · Updated Jan 12, 2025

An introductory tutorial on recommender systems. Read online at: https://datawhalechina.github.io/fun-rec/

Jupyter Notebook · 5,988 stars · 931 forks · Updated Jun 24, 2025