Stars
%%sql magic for IPython, hopefully evolving into a full SQL client
Distributed query engine providing simple and reliable data processing for any modality and scale
Jupyter magics and kernels for working with remote Spark clusters
Apache Livy is an open source REST interface for interacting with Apache Spark from anywhere.
🍁 Sycamore is an LLM-powered search and analytics platform for unstructured data.
A system for agentic LLM-powered data processing and ETL
Use LOTUS to process all of your datasets with LLMs and embeddings. Enjoy up to 1000x speedups with fast, accurate query processing that's as simple as writing Pandas code.
Apache Fluss is a streaming storage built for real-time analytics.
Apache Doris is an easy-to-use, high performance and unified analytics database.
Apache BookKeeper - a scalable, fault tolerant and low latency storage service optimized for append-only workloads
SeaTunnel is a multimodal, high-performance, distributed, massive data integration tool.
JLine is a Java library for handling console input.
Make stream processing easier! Easy-to-use streaming application development framework and operation platform.
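The official designated GitHub of global "Run Studies" (润学): collects the aims, principles, theory, and assorted real-world examples of "running" (emigrating); addresses the three main questions of why to run, where to run to, and how to run; and aims to become the core religion and core belief of the new Chinese people.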
A programmer's guide to living longer
Apache Spark - A unified analytics engine for large-scale data processing
A data generator source connector for Flink SQL based on data-faker.
🌟 Wiki of OI / ICPC for everyone. (An online strategy guide for a certain large-scale game, featuring dazzling arithmetic magic)
Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
Binaries for TPC-DS data generators