Stars
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and support state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorR…
Trae Agent is an LLM-based agent for general-purpose software engineering tasks.
Open-source coding LLM for software engineering tasks.
The power of Claude Code + [Gemini / OpenAI / Grok / OpenRouter / Ollama / Custom Model / All Of The Above] working as one.
Get up and running with Llama 3.3, DeepSeek-R1, Phi-4, Gemma 3, Mistral Small 3.1 and other large language models.
🤖 The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and local-first. Drop-in replacement for OpenAI, running on consumer-grade hardware. No GPU required. Runs gguf, transf…
Run any open-source LLMs, such as DeepSeek and Llama, as OpenAI compatible API endpoint in the cloud.
Production-ready platform for agentic workflow development.
A course of learning LLM inference serving on Apple Silicon for systems engineers.
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
Langflow is a powerful tool for building and deploying AI-powered agents and workflows.
Eliminate all the tedious hassle when making state-of-the-art C++14 to C++23 libraries!
Fully Local Manus AI. No APIs, No $200 monthly bills. Enjoy an autonomous agent that thinks, browses the web, and codes for the sole cost of electricity. 🔔 Official updates only via twitter @Martin9…
Open Source Deep Research Alternative to Reason and Search on Private Data. Written in Python.
A tool for bandwidth measurements on NVIDIA GPUs.
📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉
Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and performing real-time speech generation.
PaddleOCR inference in PyTorch. Converted from [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR)
FlashInfer: Kernel Library for LLM Serving
Generate high-definition short videos with one click using large AI language models.
DeerFlow is a community-driven Deep Research framework, combining language models with tools like web search, crawling, and Python execution, while contributing back to the open-source community.
Define, Prompt and Test MCP enabled Agents and Workflows
MSCCL++: A GPU-driven communication stack for scalable AI applications
Expose your FastAPI endpoints as Model Context Protocol (MCP) tools, with Auth!
FastAPI framework, high performance, easy to learn, fast to code, ready for production