Stars
ocr
7 repositories
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
Get your documents ready for gen AI
HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieval Results in RAG Systems (WWW 2025)
OCR, layout analysis, reading order, table recognition in 90+ languages
Python tool for converting files and office documents to Markdown.
Awesome multilingual OCR and document parsing toolkits based on PaddlePaddle (practical ultra-lightweight OCR system, supports recognition of 80+ languages, provides data annotation and synthesis tools,…