docling-project / docling

Get your documents ready for gen AI

html markdown pdf ai convert xlsx pdf-converter docx documents pptx pdf-to-text tables document-parser pdf-to-json document-parsing

Updated Jul 26, 2025
Python

Unstructured-IO / unstructured

Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking, and embedding.

Updated Jul 25, 2025
HTML

PaddlePaddle / PaddleOCR

Awesome multilingual OCR and document parsing toolkits based on PaddlePaddle (a practical, ultra-lightweight OCR system that supports recognition of 80+ languages, provides data annotation and synthesis tools, and supports training and deployment on server, mobile, embedded, and IoT devices)

ocr db crnn ocrlite chineseocr pp-ocr document-parsing pp-structure pdf2markdown chatocr

Updated Jul 25, 2025
Python

edenai / edenai-apis

Eden AI: simplify the use and deployment of AI technologies by providing a unique API that connects to the best possible AI engines

python nlp api natural-language-processing text-to-speech ocr ai computer-vision aggregator machine-translation image-processing speech-recognition speech-to-text optical-character-recognition ai-as-a-service video-recognition pre-trained-model document-parsing

Updated Jul 25, 2025
Python

run-llama / llama_cloud_services

Knowledge Agents and Management in the Cloud

pdf parsing document pptx structured-data pdf-to-text pdf-to-excel tables docx-to-markdown document-parser pdf-document-processor pdf-to-json document-parsing ppt-to-json pdf-to-markdown ppt-to-markdown

Updated Jul 24, 2025
Python

enoch3712 / ExtractThinker

ExtractThinker is a Document Intelligence library for LLMs, offering ORM-style interaction for flexible and powerful document workflows.

python nlp pdf machine-learning ocr ai openai pdf-to-text document-processing document-image-analysis document-intelligence llm document-parsing langchain

Updated Jul 23, 2025
Python

AdemBoukhris457 / Docs_Parsing_Techniques

Jupyter notebooks testing different OCR models for document parsing (Dolphin, MonkeyOCR, Marker, Nanonets, ...)

ocr ai parsing-data document-parsing genai

Updated Jul 22, 2025
Jupyter Notebook

Kathan-max / RAG-Enhanced-Chatbot-with-LoRA-Fine-Tuning

Transform your documents into intelligent conversations. This open-source RAG chatbot combines semantic search with fine-tuned language models (LLaMA, Qwen2.5VL-3B) to deliver accurate, context-aware responses from your own knowledge base. Join our community!

Updated Jul 18, 2025
Python

kevv1m / tikara

The metadata and text content extractor for almost every file type.

metadata text-mining ocr language-detection text-extraction docx pdf-to-text image-to-text apache-tika document-processing github-config llm document-parsing retrieval-augmented-generation

Updated Jul 18, 2025

GiftMungmeeprued / document-parsers-list

A comprehensive list of document parsers, covering PDF-to-text conversion and layout extraction. Each tested for support of tables, equations, handwriting, two-column layouts, and multi-column layouts.

pdf ocr preprocessing pdf-to-text document-image-processing data-pipeline document-parser document-parsing langchain

Updated Jul 14, 2025

AhmedZeyadTareq / Llama-Parse-Content-Extraction

Extract and analyze content from various file formats, including PDFs, text files, and images.

content-extraction file-processing rag pdf-parser-component document-parsing llama-index llamaparse

Updated Jul 6, 2025
Python

ziming / laravel-docparser

Docparser OCR Package for PHP Laravel

php laravel ocr docparser doc-parser document-parsing

Updated Jun 17, 2025
PHP

Anwarsha7 / resumeparser

An intelligent resume parsing engine built with Python and NLP, aimed at automating the tedious task of sifting through resumes. It accurately extracts vital candidate information such as contact details, employment history, educational qualifications, and technical skills, making it an invaluable asset for recruitment and HR professionals.

python natural-language-processing text-mining information-extraction data-extraction recruitment resume-parser npl resume-analysis hr-management hr-tech parsing-data document-parsing candidate-screening

Updated Jun 2, 2025
HTML

MegrezAI / LeapRAG

LeapRAG is an open-source platform that integrates advanced RAG technology with Google’s A2A protocol, enabling users to build context-aware, data-driven agents. These agents are automatically A2A-compliant and can be discovered and used by any compatible client without extra development.

nlp pdf openai pdf-to-text agents document-parser rag a2a llm document-parsing chatgpt retrieval-augmented-generation ollama deepseek a2a-protocol agent-to-agent

Updated May 27, 2025
Python

azzubair01 / zubairhub

ZubairHub is a Streamlit-based application that integrates various functionalities, including social graph visualization, object detection, document parsing, text extraction, generative AI interaction, and personal data transformation.

object-detection optical-character-recognition social-graph streamlit document-parsing generative-ai

Updated May 12, 2025
Python

alexvargashn / doc23

Convert PDFs, DOCX, TXT & more into structured JSON trees using Python. Built for legal, institutional and NLP applications.

python nlp open-source pdf json ocr text text-extraction legaltech honduras document-parsing

Updated Apr 28, 2025
Python

Mouez-Yazidi / Multilingual-Invoice-Parsing-with-LLaMA-4

Combining OCR for text extraction with LLMs for accurate, efficient document structuring.

ocr document-parsing llama4

Updated Apr 18, 2025
Python

acenji / ats

Applicant Tracking System (ATS): A powerful platform leveraging generative AI and soft-match algorithms to analyze resumes against job descriptions. Built with React and Node.js, it streamlines hiring insights. Future plans include expanding to investor pitches and other structured documents.

nodejs reactjs sorting-algorithms ats keyword-extraction nlp-machine-learning job-matching resume-analysis applicant-tracking-system document-parsing generative-ai investor-pitches

Updated Apr 15, 2025
JavaScript

ashkunwar / Text-Extraction-Using-LLM-s

This repository showcases a practical and easy-to-follow implementation of text extraction using Large Language Models (LLMs). Designed for developers, data scientists, and AI enthusiasts, it walks you through everything from setup to evaluation, making it a great resource for real-world NLP applications.

transformers text-extraction llm document-parsing

Updated Apr 9, 2025
Jupyter Notebook

rithulkamesh / docproc

Opinionated and Sophisticated Document Region Analyzer.

python machine-learning ocr text-classification text-extraction data-extraction region-detection content-extraction document-analysis layout-analysis pdf-processing pdf-text-extraction document-parsing equation-detection mathematical-symbols

Updated Apr 13, 2025
Python

document-parsing

Here are 36 public repositories matching this topic...
