Stars
Get your documents ready for gen AI
Soulter / markitdown
Forked from microsoft/markitdown. The repo is a fork of microsoft/markitdown; I removed magika, so onnxruntime will not be included.
Python tool for converting files and office documents to Markdown.
Universal memory layer for AI Agents; Announcing OpenMemory MCP - local and secure memory management.
Typed argument parser for Python
Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website …
Python disk-backed cache (Django-compatible). Faster than Redis and Memcached. Pure-Python.
LlamaIndex is the leading framework for building LLM-powered agents over your data.
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
Weaviate is an open-source vector database that stores both objects and vectors, allowing for the combination of vector search with structured filtering with the fault tolerance and scalability of …
Tantivy is a full-text search engine library inspired by Apache Lucene and written in Rust
Fast lexical search implementing BM25 in Python using Numpy, Numba and Scipy
A Collection of BM25 Algorithms in Python
Open-source vector similarity search for Postgres
SQL databases in Python, designed for simplicity, compatibility, and robustness.
Tesseract Open Source OCR Engine (main repository)
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
SGLang is a fast serving framework for large language models and vision language models.
The #1 open-source voice interface for desktop, mobile, and ESP32 chips.
A natural language interface for computers
An extremely fast Python package and project manager, written in Rust.
RayLLM - LLMs on Ray (Archived). Read README for more info.
A proxy server for multiple ollama instances with Key security