The University of Queensland, Brisbane
Stars
State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More!
Astro template to help you build an interactive project page for your research paper
Video Summarization Datasets, Papers, and Code
TVSum: Title-based Video Summarization dataset (CVPR 2015)
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
verl: Volcano Engine Reinforcement Learning for LLMs
Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual information for complex reasoning, planning, and generation.
Replication package for paper: Representation-Based Fairness Evaluation and Bias Correction Robustness Assessment in Neural Networks
[ICLR 2025] VL-ICL Bench: The Devil in the Details of Multimodal In-Context Learning
🔥 Comprehensive survey on Context Engineering: from prompt engineering to production-grade AI systems. Hundreds of papers, frameworks, and implementation guides for LLMs and AI agents.
Kimi-VL: Mixture-of-Experts Vision-Language Model for Multimodal Reasoning, Long-Context Understanding, and Strong Agent Capabilities
Official inference repo for FLUX.1 models
🚀 Cross attention map tools for huggingface/diffusers
[TMLR 2025] Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models
[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer
[ECCV 2024] API: Attention Prompting on Image for Large Vision-Language Models
Dedoc is a library (service) for automated document parsing and conversion to a uniform format. It automatically extracts content, logical structure, tables, and meta information from textual electro…
[NeurIPS 2025 DB] OneIG-Bench is a meticulously designed comprehensive benchmark framework for fine-grained evaluation of T2I models across multiple dimensions, including subject-element alignment,…
[ICLR'25] Official code for the paper 'MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs'
[CVPR2025 Highlight] Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models
Official Jax Implementation of MaskGIT