Stars
Cosmos-RL is a flexible and scalable Reinforcement Learning framework specialized for Physical AI applications.
Cosmos-Reason1 models understand physical common sense and generate appropriate embodied decisions in natural language through long chain-of-thought reasoning processes.
DeepEP: an efficient expert-parallel communication library
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
FlashMLA: Efficient MLA decoding kernels
Educational implementation of the Discrete Flow Matching paper
Code for "In-Context Former: Lightning-fast Compressing Context for Large Language Model" (Findings of EMNLP 2024)
An Easy-to-use, Scalable and High-performance RLHF Framework based on Ray (PPO & GRPO & REINFORCE++ & vLLM & Ray & Dynamic Sampling & Async Agentic RL)
Minimal reproduction of DeepSeek R1-Zero
Real Time (WebRTC & WebTransport) Proxy for LLM WebSocket APIs
This repository is based on Mellanox/gpu_direct_rdma_access. Some errors in the code have been fixed, some methods have been optimized, and some features have been added.
BentoDiffusion: A collection of diffusion models served with BentoML
A throughput-oriented high-performance serving framework for LLMs
A generative speech model for daily dialogue.
This is a Shopify products scraper. The script retrieves data from the products.json file of a Shopify shop. Then, for each product, it makes an additional query to the product page to retrieve data fr… (a minimal sketch of this two-step pattern follows the list below)
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
[CVPR 2024 Highlight] DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models
Fast and memory-efficient exact attention
Building a quick conversation-based search demo with Lepton AI.
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorR…
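The Shopify scraper entry above describes a two-step pattern: one bulk request to the shop's public products.json listing, followed by a per-product request for details. A minimal sketch of that pattern is below; the shop domain is a hypothetical placeholder, and the field names (`handle`, `title`) follow the standard products.json schema rather than the linked script itself.

```python
# Minimal sketch of the two-step Shopify scraping pattern described above.
# Assumes the shop exposes the public /products.json endpoint; the domain is hypothetical.
import requests

SHOP_URL = "https://example-shop.myshopify.com"  # hypothetical shop domain


def fetch_products(shop_url: str) -> list[dict]:
    """Step 1: fetch the shop-wide product listing from products.json."""
    resp = requests.get(f"{shop_url}/products.json", params={"limit": 250}, timeout=30)
    resp.raise_for_status()
    return resp.json().get("products", [])


def fetch_product_page(shop_url: str, handle: str) -> str:
    """Step 2: fetch an individual product page for data not present in products.json."""
    resp = requests.get(f"{shop_url}/products/{handle}", timeout=30)
    resp.raise_for_status()
    return resp.text


if __name__ == "__main__":
    for product in fetch_products(SHOP_URL):
        html = fetch_product_page(SHOP_URL, product["handle"])
        print(product["title"], len(html))
```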