Stars
A curated list of vibe coding references, collaborating with AI to write code.
Collection of AWESOME vision-language models for vision tasks
State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More!
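💯 Exam-preparation resource library for the 2025 Information Systems Project Management Professional exam (Ruankao, advanced level).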
Fine-tuning SigLIP 2 for single-label and multi-label image classification: a vision-language encoder model fine-tuned for image classification tasks.
This is the official implementation of TrivialAugment and a mini-library for the application of multiple image augmentation strategies including RandAugment and TrivialAugment.
A repository of all code and resources of my published blog articles.
Official Repository for VELM, featured in CVPRW 2025 paper: "Detect, Classify, Act: Categorizing Industrial Anomalies with Multi-Modal LLMs"
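Regularly sharing high-quality, interesting, and practical open-source tutorials, developer tools, programming websites, and tech news from GitHub. A list of cool, interesting projects on GitHub.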
When in doubt, Vibe mechanics! One-Person Company AI Tools Series, continuously updated to help boost productivity and empower your solo business!
🚀🚀 [Large models] Train a small 26M-parameter GPT entirely from scratch in just 2 hours! 🌏
A Framework of Small-scale Large Multimodal Models
Maya: An Instruction Finetuned Multilingual Multimodal Model using Aya
VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.
Personal Project: MPP-Qwen14B & MPP-Qwen-Next (Multimodal Pipeline Parallel based on Qwen-LM). Supports [video/image/multi-image] {sft/conversations}. Don't let poverty limit your imagination! Tr…
LAVIS - A One-stop Library for Language-Vision Intelligence
Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretrain models and diffusion model toolbox. Equipped with high …
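A multimodal large model implemented from scratch and named Reyes (睿视; R for 睿, "wise", and eyes for 眼, "eye"). Reyes has 8B parameters, uses InternViT-300M-448px-V2_5 as the vision encoder and Qwen2.5-7B-Instruct as the language model, and connects the two through a two-layer MLP projection layer.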
Visual Instruction Tuning for Qwen2 Base Model
🚀 [Large models] Train a 26M-parameter vision multimodal VLM from scratch in just 1 hour! 🌏
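The llava-Qwen2-7B-Instruct-Chinese-CLIP model strengthens Chinese text recognition and the ability to grasp the implied meaning of memes, approaching the recognition level of gpt4o and claude-3.5-sonnet!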
Chinese NLP solutions (large models, data, models, training, inference)