Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL (updated Jul 23, 2025; Python)
Tag manager and captioner for image datasets
AI-Powered Watermark Remover using Florence-2 and LaMA Models: A Python application leveraging state-of-the-art deep learning models to effectively remove watermarks from images with a user-friendly PyQt6 interface.
Use Segment Anything 2, grounded with Florence-2, to auto-label data for use in training vision models.
VLM-driven tool that processes surveillance videos, extracts frames, and generates insightful annotations using a fine-tuned Florence-2 Vision-Language Model. Includes a Gradio-based interface for querying and analyzing video footage.
Florence-2 is a novel vision foundation model with a unified, prompt-based representation for a variety of computer vision and vision-language tasks.
Watermark remover tool that leverages the capabilities of Microsoft Florence and Lama Cleaner models.
Use Florence 2 to auto-label data for use in training fine-tuned object detection models.
Vision-language model fine-tuning notebooks and use cases (MedGemma, PaliGemma, Florence, ...)
A Python-based CLI tool for captioning images with WD series, Joy-caption-pre-alpha, Meta Llama 3.2 Vision Instruct, and Qwen2 VL Instruct models.
Local LLM Discord Bot
Run SOTA Vision-Language Model Florence-2 on your data!
Simple video summarization using Text-to-Segment Anything (Florence2 + SAM2). This project provides a video processing tool that uses Florence2 and SAM2 to detect and segment specific objects or activities in a video based on textual descriptions.
ecko-cli is a simple CLI tool that streamlines processing the images in a directory, generating captions, and saving them as text files. It can also create a JSONL file from the images in the directory you specify. Images are captioned using the Microsoft Florence-2-large model and ONNX.
Simple Gradio application integrated with Hugging Face multimodal models to support a visual question answering chatbot and more features
ONNX deployment for the Florence-2 visual multimodal model
TextSnap: Demo for Florence 2 model used in OCR tasks to extract and visualize text from images.
This application utilizes the powerful Florence-2 vision-language model from Microsoft to generate comprehensive captions for images. The model is capable of understanding visual content and expressing it in natural language.
The Ultimate Local LLM Discord Bot!!!