[CVPR 2025] LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant

Python 131 6 Updated Jul 7, 2025

[IJCV 2025] Smaller But Better: Unifying Layout Generation with Smaller Large Language Models

Python 138 1 Updated Jun 15, 2025

Search-o1: Agentic Search-Enhanced Large Reasoning Models

Python 985 90 Updated May 13, 2025

Evaluating SOTA image generators' generation and editing abilities in OCR tasks.

189 1 Updated Jul 12, 2025
Python 59 6 Updated Jul 10, 2025

[ICCV 2025] Official PyTorch Code for "Advancing Textual Prompt Learning with Anchored Attributes"

Python 78 2 Updated Jul 15, 2025

[ICML 2024] The official implementation of A2PR, a simple way to achieve SOTA in offline reinforcement learning with an adaptive advantage-guided policy regularization method, in PyTorch

Python 30 Updated May 31, 2024

Official Code for Paper: Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation

Python 113 7 Updated Jul 2, 2025

MedSeg-R: Medical Image Segmentation with Clinical Reasoning

7 Updated Jun 23, 2025

A paper list of recent works on token compression for ViT and VLM

557 25 Updated Jul 18, 2025

[TMLR] Public code repo for paper "A Single Transformer for Scalable Vision-Language Modeling"

Jupyter Notebook 143 4 Updated Nov 14, 2024

[CVPR 2024 Highlight] Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding

Python 298 17 Updated Oct 7, 2024

This repository provides a valuable reference for researchers in the field of multimodality; start your exploration of RL-based reasoning MLLMs here!

1,012 44 Updated Jul 19, 2025

When do we not need larger vision models?

Python 402 13 Updated Feb 8, 2025

A curated list of state-of-the-art research in embodied AI, focusing on vision-language-action (VLA) models, vision-language navigation (VLN), and related multimodal learning approaches.

1,151 51 Updated Jul 17, 2025

[CVPR 2025] Official PyTorch Code for "DPC: Dual-Prompt Collaboration for Tuning Vision-Language Models"

Python 19 3 Updated Jun 16, 2025

Intervening Anchor Token: Decoding Strategy in Alleviating Hallucinations for MLLMs

Python 157 42 Updated Mar 12, 2025

[CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation

Python 347 29 Updated Aug 24, 2024

[ECCV 2024 Oral] Code for paper: An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models

Python 452 16 Updated Jan 4, 2025

[CVPR 2025] PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction

Python 116 1 Updated Mar 6, 2025

Solve Visual Understanding with Reinforced VLMs

Python 5,330 324 Updated Jun 26, 2025
Jupyter Notebook 780 74 Updated Aug 7, 2024

Project Page for "LISA: Reasoning Segmentation via Large Language Model"

Python 2,314 163 Updated Feb 16, 2025

Code release for "SegLLM: Multi-round Reasoning Segmentation"

Python 107 10 Updated Feb 20, 2025

Minimal reproduction of DeepSeek R1-Zero

Python 12,031 1,495 Updated Apr 24, 2025

DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding

Python 4,972 1,764 Updated Feb 26, 2025

Reproduce R1 Zero on Logic Puzzle

Python 2,377 157 Updated Mar 20, 2025

Simple RL training for reasoning

Python 3,689 275 Updated Apr 10, 2025

Training Large Language Model to Reason in a Continuous Latent Space

Python 1,196 108 Updated Jan 24, 2025