Stars
🚀 Level up your GitHub profile readme with customizable cards including LOC statistics!
HumanLayer enables AI agents to communicate with humans in tool-based and async workflows. Guarantee human oversight of high-stakes function calls with approval workflows across Slack, email and mo…
Pocket Flow: 100-line LLM framework. Let Agents build Agents!
HAMi-core compiles libvgpu.so, which enforces a hard limit on GPU usage inside containers
Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serving systems.
Many container images are hosted abroad (e.g. gcr), so downloads within China are slow and need acceleration. This project is dedicated to providing a stable, reliable, and secure container image service that connects to the whole world.
Quickly search through menu options of the front-most application - Alfred Workflow
A new and modern SSH connector written in Python.
Profiling and tracing information for Python using viztracer and perf; the GIL exposed.
JSON benchmarks to compare different Go JSON implementations
Java Virtual Machine (JVM) Performance Benchmarks with a primary focus on top-tier Just-In-Time (JIT) Compilers, such as C2 JIT, Graal JIT, and the Falcon JIT.
Fortio load testing library, command line tool, advanced echo server and web UI in Go (golang). Lets you specify a set query-per-second load and record latency histograms and other useful stats.
Supercharge Your LLM with the Fastest KV Cache Layer
An open-source solution for full parameter fine-tuning of DeepSeek-V3/R1 671B, including complete code and scripts from training to inference, as well as some practical experiences and conclusions.…
Cost-efficient and pluggable Infrastructure components for GenAI inference
SGLang is a fast serving framework for large language models and vision language models.
LeaderWorkerSet: An API for deploying a group of pods as a unit of replication
A minimal GPU design in Verilog to learn how GPUs work from the ground up
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
Perforator is a cluster-wide continuous profiling tool designed for large data centers
SCUDA is a GPU over IP bridge allowing GPUs on remote machines to be attached to CPU-only machines.
🚴 Call stack profiler for Python. Shows you why your code is slow!