fireae

Starred repositories

Tasmay-Tibrewal / tokeniser-py

A library providing a custom tokeniser with a 131,072-token vocabulary derived from 0.5B (val) and 1B (val+test) tokens in SlimPajama. Uses a novel token generation algorithm and a dynamic programming-bas…

Jupyter Notebook 2 Updated Apr 3, 2025

AdemBoukhris457 / Documents-Parsing-Lab

Jupyter notebooks testing different OCR models for document parsing (Dolphin, MonkeyOCR, Marker, Nanonets, ...)

Jupyter Notebook 70 8 Updated Sep 18, 2025

cohere-ai / cohere-toolkit

Cohere Toolkit is a collection of prebuilt components enabling users to quickly build and deploy RAG applications.

TypeScript 3,135 427 Updated Oct 21, 2025

cofe-ai / nanoLM

An Affordable LLM Pre-training Benchmark via Accurate Loss Prediction across Scales

Python 16 2 Updated Jun 6, 2024

Mobile-Artificial-Intelligence / maid

Maid is a cross-platform Flutter app for interfacing with GGUF / llama.cpp models locally, and with Ollama and OpenAI models remotely.

Dart 2,164 222 Updated Jul 28, 2025

LdotJdot / TDSContent

A Windows full-text file search program

C# 27 8 Updated Oct 17, 2025

OpenMOSS / MOSS

An open-source tool-augmented conversational language model from Fudan University

Python 12,054 1,138 Updated Jul 13, 2024

UbiquitousLearning / mllm

Fast Multimodal LLM on Mobile Devices

C++ 1,121 136 Updated Oct 18, 2025

opendataloader-project / opendataloader-pdf

Safe, Open, High-Performance — PDF for AI

Java 705 30 Updated Oct 21, 2025

alibaba / Logics-Parsing

Python 703 57 Updated Oct 13, 2025

YesianRohn / TextSSR

[ICCV2025] TextSSR: Diffusion-based Data Synthesis for Scene Text Recognition

Python 85 2 Updated Sep 23, 2025

google / diff-match-patch

Diff Match Patch is a high-performance library in multiple languages that manipulates plain text.

Python 7,937 1,171 Updated May 22, 2024

alibaba / higress

🤖 AI Gateway | AI Native API Gateway

Go 6,646 856 Updated Oct 21, 2025

daphne-eu / daphne

DAPHNE: An Open and Extensible System Infrastructure for Integrated Data Analysis Pipelines

C++ 76 76 Updated Oct 8, 2025

PRITHIVSAKTHIUR / OCR-ReportLab-Notebooks

Dedicated Colab notebooks for experimenting with OCR models (Nanonets OCR, Monkey OCR, OCRFlux 3B, Typhoon OCR 3B, and more) on the free-tier T4 GPU

Jupyter Notebook 22 3 Updated Jul 29, 2025

tex-tar / tex-tar

Python 8 1 Updated Aug 1, 2025

ConardLi / easy-dataset

A powerful tool for creating fine-tuning datasets for LLMs

JavaScript 11,238 1,085 Updated Oct 19, 2025

EvolvingLMMs-Lab / LLaVA-OneVision-1.5

Fully Open Framework for Democratized Multimodal Training

Python 561 40 Updated Oct 21, 2025

Om-Doiphode / Image_Pipeline

The image pipeline takes a raw image from the sensor and converts it into a meaningful image. Several algorithms such as debayering, black level correction, auto white balance, and denoising will be first implemen…

C++ 27 6 Updated Aug 25, 2023

RapidAI / RapidTable

An inference library for sequence-based table recognition, integrating table recognition algorithms such as PP-Structure and modelscope.

Python 382 37 Updated Sep 4, 2025

cloudwego / eino

The ultimate LLM/AI application development framework in Golang.

Go 7,794 576 Updated Oct 21, 2025

cloudwego / eino-ext

Various extensions for the Eino framework: https://github.com/cloudwego/eino

Go 477 195 Updated Oct 21, 2025

Tencent / POINTS-Reader

180 7 Updated Sep 16, 2025

vivo / StructureMatters

Structured Attention Matters to Multimodal LLMs in Document Understanding

Python 5 Updated Jun 30, 2025

VectifyAI / PageIndex

📄🧠 PageIndex: Document Index for Reasoning-based RAG

Python 2,851 213 Updated Oct 14, 2025

allenai / OLMoASR

An open-source implementation of Whisper

Python 449 41 Updated Oct 8, 2025

qibin0506 / Cortex

Building an MoE large language model as an individual: a complete walkthrough from pre-training to DPO

Python 1,642 130 Updated Oct 21, 2025

xw-hu / Unveiling-Deep-Shadows

A Survey and Benchmark on Image and Video Shadow Detection, Removal, and Generation in the Era of Deep Learning (Awesome & Benchmark)

Python 97 3 Updated Mar 3, 2025

IsHYuhi / BEDSR-Net_A_Deep_Shadow_Removal_Network_from_a_Single_Document_Image

Unofficial implementation of "BEDSR-Net: A Deep Shadow Removal Network from a Single Document Image" with PyTorch

Jupyter Notebook 61 12 Updated Nov 13, 2021

dromara / wgai

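An out-of-the-box Java AI platform for image, video, and speech recognition and OCR. The AI collection includes, but is not limited to, license plate recognition, safety helmet recognition, door open/close detection, and common object recognition. Image and video recognition integrates OpenCV, YOLO, OCR, and the easyAI recognition core; it also provides AI customer service and AI language models. It needs no third-party API, can be customized and deployed offline for industry-specific use, and keeps training separate from recognition to reduce memory and GPU consumption.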

text-detection

text-recognition

ocr