Starred repositories
A library with a custom tokeniser with a 131,072-token vocabulary derived from 0.5B (val) and 1B (val+test) tokens in SlimPajama. Uses a novel token generation algorithm and a dynamic programming-bas…
Jupyter notebooks testing different OCR models for document parsing (Dolphin, MonkeyOCR, Marker, Nanonets, ...)
Cohere Toolkit is a collection of prebuilt components enabling users to quickly build and deploy RAG applications.
An Affordable LLM Pre-training Benchmark via Accurate Loss Prediction across Scales
Maid is a cross-platform Flutter app for interfacing with GGUF / llama.cpp models locally, and with Ollama and OpenAI models remotely.
An open-source tool-augmented conversational language model from Fudan University
Safe, Open, High-Performance — PDF for AI
[ICCV2025] TextSSR: Diffusion-based Data Synthesis for Scene Text Recognition
Diff Match Patch is a high-performance library in multiple languages that manipulates plain text.
DAPHNE: An Open and Extensible System Infrastructure for Integrated Data Analysis Pipelines
Dedicated Colab notebooks for experimenting with OCR models (Nanonets OCR, Monkey OCR, OCRFlux 3B, Typhoo OCR 3B, and more) on a free-tier T4 GPU
A powerful tool for creating fine-tuning datasets for LLM
Fully Open Framework for Democratized Multimodal Training
The image pipeline takes a raw image from the sensor and converts it into a meaningful image. Several algorithms like debayering, black level correction, auto-white balance, denoising.. will be first implemen…
An inference library for sequence-based table recognition algorithms, integrating table recognition algorithms such as PP-Structure and modelscope.
The ultimate LLM/AI application development framework in Golang.
Various extensions for the Eino framework: https://github.com/cloudwego/eino
Structured Attention Matters to Multimodal LLMs in Document Understanding
📄🧠 PageIndex: Document Index for Reasoning-based RAG
A Survey and Benchmark on Image and Video Shadow Detection, Removal, and Generation in the Era of Deep Learning (Awesome & Benchmark)
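Unofficial implementation of "BEDSR-Net: A Deep Shadow Removal from a Single Document Image" with PyTorch
An out-of-the-box Java AI platform for image, video, and speech recognition and OCR. The AI collection includes, but is not limited to, license plate recognition, safety helmet recognition, door open/close detection, and common object recognition, for both images and video. It combines AI image recognition built on opencv, yolo, ocr, and the esayAI kernel with AI customer service and AI language models. It relies on no third-party API interfaces, and can be customized, deployed offline, and adapted for industry-specific use. Training and recognition run separately to avoid excess memory and GPU consumption.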