RomGai
  • The University of Queensland
  • Brisbane


State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More!

Jupyter Notebook 1,680 108 Updated Sep 16, 2025

Large language model review prompts

JavaScript 236 23 Updated Oct 19, 2025

Astro template to help you build an interactive project page for your research paper

Astro 395 32 Updated Oct 19, 2025

Video Summarization Dataset, Papers, Codes

174 26 Updated Aug 17, 2018

TVSum: Title-based Video Summarization dataset (CVPR 2015)

MATLAB 132 13 Updated Nov 10, 2019

EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL

Python 3,824 289 Updated Oct 18, 2025

verl: Volcano Engine Reinforcement Learning for LLMs

Python 14,489 2,296 Updated Oct 19, 2025

Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual information for complex reasoning, planning, and generation.

1,034 36 Updated Oct 4, 2025

Replication package for paper: Representation-Based Fairness Evaluation and Bias Correction Robustness Assessment in Neural Networks

Python 1 Updated Aug 27, 2025

[ICLR 2025] VL-ICL Bench: The Devil in the Details of Multimodal In-Context Learning

Python 65 4 Updated Sep 20, 2025

Open-source unified multimodal model

Python 5,183 446 Updated Aug 22, 2025

🔥 Comprehensive survey on Context Engineering: from prompt engineering to production-grade AI systems. Hundreds of papers, frameworks, and implementation guides for LLMs and AI agents.

2,462 166 Updated Aug 5, 2025

Kimi-VL: Mixture-of-Experts Vision-Language Model for Multimodal Reasoning, Long-Context Understanding, and Strong Agent Capabilities

1,074 51 Updated Jul 15, 2025
Python 1,493 88 Updated Sep 30, 2025

[CVPR'25] Interleaved-Modal Chain-of-Thought

Python 89 4 Updated Oct 7, 2025

Official inference repo for FLUX.1 models

Python 24,497 1,798 Updated Jul 31, 2025

🚀 Cross attention map tools for huggingface/diffusers

Python 348 27 Updated Jan 18, 2025

[TMLR 2025] Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models

649 32 Updated Sep 16, 2025

[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer

Python 11,304 1,148 Updated Oct 11, 2025

An open-source platform for Visual AI.

C# 1,539 258 Updated Oct 14, 2025

[ECCV 2024] API: Attention Prompting on Image for Large Vision-Language Models

Python 103 6 Updated Oct 10, 2024

Dedoc is a library (service) for automated document parsing and conversion to a uniform format. It automatically extracts content, logical structure, tables, and meta information from textual electro…

Python 612 48 Updated Sep 22, 2025

A simple multimodal RAG project

Jupyter Notebook 227 17 Updated May 13, 2025

[NeurIPS 2025 DB] OneIG-Bench is a meticulously designed comprehensive benchmark framework for fine-grained evaluation of T2I models across multiple dimensions, including subject-element alignment,…

Python 76 3 Updated Oct 2, 2025

[ICLR'25] Official code for the paper 'MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs'

Python 278 16 Updated Apr 20, 2025
Python 89 4 Updated Aug 14, 2025

[CVPR2025 Highlight] Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models

Python 224 5 Updated Jul 4, 2025

Official Jax Implementation of MaskGIT

Jupyter Notebook 536 52 Updated Nov 18, 2022

Dream 7B, a large diffusion language model

Python 1,018 55 Updated Sep 26, 2025