Stars
Open, Multi-modal Catalog for Data & AI
Cloud Shuffle Service (CSS) is a general-purpose remote shuffle solution for compute engines, including Spark, Flink, and MapReduce.
Apache Celeborn is an elastic and high-performance service for shuffle and spilled data.
Most advanced key-value database written in Go, extremely fast, compatible with LSM tree and B+ tree.
Real-time event streaming platform. Streaming CDC, stream processing, low-latency serving, and Iceberg management.
GraalVM compiles applications into native executables that start instantly, scale fast, and use fewer compute resources 🚀
Coroutine-Oriented Main-Memory Database Engine (VLDB 2021)
C++ implementation of Raft core logic as a replication library
A better compressed bitset in Java: used by Apache Spark, Netflix Atlas, Apache Pinot, Tablesaw, and many others
A RocksDB compatible KV storage engine with better performance
A graph-based distributed in-memory store that leverages efficient graph exploration to provide highly concurrent and low-latency queries over big linked data
Empowering Data Intelligence with Distributed SQL for Sharding, Scalability, and Security Across All Databases.
📄 🇨🇳 📃 Papers Notebook: paper reading notes (distributed systems, virtualization, machine learning)
Labs for the Designing Distributed Systems book.
The Prometheus time series database layer.
YSDA course in Natural Language Processing
Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
Fancy stream processing made operationally mundane
The Prometheus monitoring system and time series database.
TonY is a framework to natively run deep learning frameworks on Apache Hadoop.