Organizations

@leptonai

Cosmos-RL is a flexible and scalable Reinforcement Learning framework specialized for Physical AI applications.

Python 78 9 Updated Jul 26, 2025

Cosmos-Reason1 models understand physical common sense and generate appropriate embodied decisions in natural language through long chain-of-thought reasoning processes.

Python 569 31 Updated Jul 22, 2025
Python 101 7 Updated Dec 27, 2024

DeepEP: an efficient expert-parallel communication library

Cuda 8,310 868 Updated Jul 22, 2025

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

C++ 5,555 657 Updated Jul 25, 2025

FlashMLA: Efficient MLA decoding kernels

Cuda 11,656 878 Updated Apr 29, 2025

Educational implementation of the Discrete Flow Matching paper

Jupyter Notebook 98 7 Updated Aug 26, 2024
Python 104 6 Updated May 29, 2023

The repo for In-context Autoencoder

Jupyter Notebook 130 17 Updated May 11, 2024

Code for "In-Context Former: Lightning-fast Compressing Context for Large Language Model" (Findings of EMNLP 2024)

Python 16 2 Updated Nov 21, 2024

Reproduce R1 Zero on Logic Puzzle

Python 2,379 159 Updated Mar 20, 2025

An Easy-to-use, Scalable and High-performance RLHF Framework based on Ray (PPO & GRPO & REINFORCE++ & vLLM & Ray & Dynamic Sampling & Async Agentic RL)

Python 7,487 731 Updated Jul 24, 2025

Simple RL training for reasoning

Python 3,693 275 Updated Apr 10, 2025

Minimal reproduction of DeepSeek R1-Zero

Python 12,056 1,498 Updated Apr 24, 2025

Real Time (WebRTC & WebTransport) Proxy for LLM WebSocket APIs

Python 40 3 Updated Jan 17, 2025

This repository is based on Mellanox/gpu_direct_rdma_access. Some errors in the code have been fixed, some methods have been optimized, and some features have been added.

C 4 1 Updated Apr 2, 2025

GIL-powered* locking library for Python

Python 47 3 Updated Jul 25, 2025

BentoDiffusion: A collection of diffusion models served with BentoML

Python 373 28 Updated Apr 29, 2025

A throughput-oriented high-performance serving framework for LLMs

Jupyter Notebook 849 40 Updated Jul 9, 2025

A generative speech model for daily dialogue.

Python 37,265 4,032 Updated Jul 6, 2025

This is a Shopify product scraper. The script retrieves data from the products.json file of a Shopify shop. Then, for each product, it makes an additional query to the product page to retrieve data fr…

Python 21 4 Updated Nov 24, 2024

FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batchsizes of 16-32 tokens.

Python 863 71 Updated Sep 4, 2024
Python 4,163 561 Updated Mar 19, 2024

[CVPR 2024 Highlight] DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models

Python 697 31 Updated Dec 2, 2024

CUDA Templates for Linear Algebra Subroutines

C++ 8,119 1,341 Updated Jul 26, 2025

Fast and memory-efficient exact attention

Python 18,548 1,834 Updated Jul 24, 2025

Building a quick conversation-based search demo with Lepton AI.

TypeScript 8,130 1,027 Updated Jun 15, 2025

Serving multiple LoRA finetuned LLM as one

Python 1,078 52 Updated May 8, 2024

Mamba SSM architecture

Python 15,472 1,371 Updated Jul 19, 2025

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and support state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorR…

C++ 11,128 1,617 Updated Jul 26, 2025