Stars
Examples and guides for using the Gemini API
Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
Generates the BigQuery schema from newline-delimited JSON or CSV data records.
🐙 Guides, papers, lectures, notebooks and resources for prompt engineering
A simple, performant and scalable Jax LLM!
Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
Apache Druid: a high performance real-time analytics database.
Notes on books I read, talks I watch, articles I study, and papers I love
Lightweight proxy to expose the UI of an Apache Spark cluster that is behind a firewall
A Spark UI and Spark History Server alternative with CPU and Memory metrics! Delight is free, cross-platform, and open-source.
An end-to-end demo of Google's Cloud data and analytics stack.
Ultimate Helm chart for Apicurio Registry
Examples for running Debezium (Configuration, Docker Compose files etc.)
Safely store secrets in Git/Mercurial/Subversion
An exhaustive reference to problems seen in real-world data along with suggestions on how to resolve them.
A Kubernetes Operator for running the Confluent Schema Registry with a Strimzi-based Kafka cluster
Open-Source Web UI for Apache Kafka Management
Spark: The Definitive Guide's Code Repository
Upserts, Deletes And Incremental Processing on Big Data.
The Data Engineering Book - a data engineering book by Thai people, for Thai people
Compare the various managed cloud services offered by the major public cloud providers in the market.
SPARK SQL - Python APIs... READ AND WRITE - Avro, Parquet, ORC, CSV, JSON, Hive tables…