
Automatic Video Generation from Scientific Papers

Python 379 35 Updated Oct 9, 2025
Python 58 4 Updated May 5, 2025

LLaVA-Mini is a unified large multimodal model (LMM) that can support the understanding of images, high-resolution images, and videos in an efficient manner.

Python 531 28 Updated Jun 29, 2025

Long-RL: Scaling RL to Long Sequences (NeurIPS 2025)

Python 633 21 Updated Sep 24, 2025

Learning audio concepts from natural language supervision

Python 600 42 Updated Sep 18, 2024

🔥🔥🔥 [IEEE TCSVT] Latest Papers, Codes and Datasets on Vid-LLMs.

2,813 125 Updated Oct 7, 2025

🔥 🔥 🔥 Awesome MLLMs/Benchmarks for Short/Long/Streaming Video Understanding 📹

46 1 Updated Sep 1, 2025

Reference PyTorch implementation and models for DINOv3

Jupyter Notebook 7,639 484 Updated Oct 3, 2025

Official Release of ICCV 2025 paper -- DiscretizedSDF

Python 93 6 Updated Aug 25, 2025

Official implementation of Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence

Python 362 8 Updated Jun 22, 2025

The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol.

Python 2,244 205 Updated Oct 6, 2025

Collection of Composed Image Retrieval (CIR) papers.

267 17 Updated Aug 18, 2025

Code for paper "Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters" CVPR2024

Python 250 19 Updated Sep 18, 2025

[CVPR 2025] LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant

Python 164 7 Updated Jul 7, 2025

[IJCV 2025] Smaller But Better: Unifying Layout Generation with Smaller Large Language Models

Python 146 1 Updated Aug 3, 2025

🔍 Search-o1: Agentic Search-Enhanced Large Reasoning Models [EMNLP 2025]

Python 1,063 94 Updated Aug 21, 2025

[arXiv 25] Aesthetics is Cheap, Show me the Text: An Empirical Evaluation of State-of-the-Art Generative Models for OCR

232 3 Updated Aug 28, 2025

[NeurIPS 2025 Oral] Representation Entanglement for Generation: Training Diffusion Transformers Is Much Easier Than You Think

Python 149 13 Updated Oct 4, 2025

[ICCV 2025] Official PyTorch Code for "Advancing Textual Prompt Learning with Anchored Attributes"

Python 99 2 Updated Sep 11, 2025

[ICML 2024] The official implementation of A2PR, a simple way to achieve SOTA in offline reinforcement learning with an adaptive advantage-guided policy regularization method, in PyTorch

Python 32 Updated May 31, 2024

Official Code for Paper: Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation

Python 125 7 Updated Jul 2, 2025

MedSeg-R: Medical Image Segmentation with Clinical Reasoning

7 Updated Jun 23, 2025

A paper list of recent works on token compression for ViT and VLM

686 30 Updated Sep 17, 2025

[TMLR] Public code repo for paper "A Single Transformer for Scalable Vision-Language Modeling"

Jupyter Notebook 147 4 Updated Nov 14, 2024

[CVPR 2024 Highlight] Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding

Python 321 19 Updated Oct 7, 2024

This repository provides a valuable reference for researchers in the field of multimodality. Start your exploration of RL-based reasoning MLLMs here!

1,211 57 Updated Oct 1, 2025

When do we not need larger vision models?

Python 409 13 Updated Feb 8, 2025

A curated list of state-of-the-art research in embodied AI, focusing on vision-language-action (VLA) models, vision-language navigation (VLN), and related multimodal learning approaches.

1,728 72 Updated Oct 9, 2025

[CVPR 2025] Official PyTorch Code for "DPC: Dual-Prompt Collaboration for Tuning Vision-Language Models"

Python 32 5 Updated Sep 14, 2025