
Organizations

@PaddlePaddle


FlashInfer: Kernel Library for LLM Serving

Cuda 3,930 534 Updated Oct 20, 2025

A throughput-oriented high-performance serving framework for LLMs

Jupyter Notebook 904 43 Updated Oct 20, 2025

Dynamic Memory Management for Serving LLMs without PagedAttention

C 428 33 Updated May 30, 2025

📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉

Python 4,616 316 Updated Aug 19, 2025

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…

C++ 11,891 1,809 Updated Oct 20, 2025

Minimalist ML framework for Rust

Rust 18,325 1,267 Updated Oct 19, 2025

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

Python 3,661 280 Updated Oct 20, 2025

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 60,471 10,651 Updated Oct 20, 2025

Inference code for Llama models

Python 58,848 9,813 Updated Jan 26, 2025

SCQL (Secure Collaborative Query Language) is a system that allows multiple distrusting parties to run joint analysis without revealing their private data.

Go 165 69 Updated Oct 17, 2025

Running large language models on a single GPU for throughput-oriented scenarios.

Python 9,368 584 Updated Oct 28, 2024

High-Resolution Image Synthesis with Latent Diffusion Models

Python 41,875 5,333 Updated Jun 25, 2025

Simple samples for TensorRT programming

Python 1,644 351 Updated May 27, 2025

Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable.

Jupyter Notebook 1,585 97 Updated Feb 16, 2024

🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch.

Python 31,311 6,431 Updated Oct 20, 2025

Synthesizer for optimal collective communication algorithms

Python 118 27 Updated Apr 8, 2024

Repo for external large-scale work

Python 6,547 722 Updated Apr 27, 2024

Transformer related optimization, including BERT, GPT

C++ 6,329 921 Updated Mar 27, 2024
Python 2,895 332 Updated Oct 17, 2025

Development repository for the Triton language and compiler

MLIR 17,270 2,320 Updated Oct 20, 2025

Microsoft Collective Communication Library

C++ 366 32 Updated Sep 20, 2023

Large-scale model inference.

Python 631 86 Updated Sep 12, 2023

OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.

C++ 9,365 1,010 Updated Aug 20, 2025

A baseline repository of Auto-Parallelism in Training Neural Networks

Python 147 20 Updated Jun 25, 2022

XGo is the first AI-native programming language that integrates software engineering into a unified whole. Our vision is to enable everyone to become a builder of the world.

Go 9,332 561 Updated Oct 14, 2025

Kubernetes-native Deep Learning Framework

Python 743 116 Updated Jan 26, 2024

Training and serving large-scale neural networks with auto parallelization.

Python 3,159 353 Updated Dec 9, 2023

Flexible and powerful tensor operations for readable and reliable code (for pytorch, jax, TF and others)

Python 9,225 385 Updated Aug 12, 2025

PyTorch Implementation of OpenAI GPT-2

Python 344 66 Updated Jul 4, 2024