Stars
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
ETL, Analytics, Versioning for Unstructured Data
Data Engineering Zoomcamp is a free nine-week course that covers the fundamentals of data engineering.
Terraform module to create VPC resources on AWS.
A curated list of engineering blogs
A curated list of data engineering tools for software developers
Convert PDF to markdown + JSON quickly with high accuracy
A high-throughput and memory-efficient inference and serving engine for LLMs
🔥 Highlighting the top ML papers every week.
LLM UI with advanced features, easy setup, and multiple backend support.
Supercharge Your LLM Application Evaluations 🚀
ARX is a comprehensive open source data anonymization tool aiming to provide scalability and usability. It supports various anonymization techniques, methods for analyzing data quality and re-ident…
This is a repo with links to everything you'd ever want to learn about data engineering
Examples and guides for using the OpenAI API
PyGWalker: Turn your dataframe into an interactive UI for visual analysis
A curated list of awesome pipeline toolkits inspired by Awesome Sysadmin
📄 CLI that generates beautiful README.md files
Roadmap to becoming a data engineer in 2021
Kaggle-Knowhow (Korean version): a collection of Kaggle resources for Korean speakers
Fundamentals of Spark with Python (using PySpark), code examples
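[Hanbit Media] Repository with the complete source code for the book "이것이 취업을 위한 코딩 테스트다 with 파이썬" (roughly, "This Is the Coding Test for Getting a Job, with Python").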