Nankai University, Hangzhou, China
https://zhengli97.github.io/

Stars
[CVPR 2025] LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant
[IJCV 2025] Smaller But Better: Unifying Layout Generation with Smaller Large Language Models
Search-o1: Agentic Search-Enhanced Large Reasoning Models
Evaluating SOTA image generators' generation and editing abilities in OCR tasks.
[ICCV 2025] Official PyTorch Code for "Advancing Textual Prompt Learning with Anchored Attributes"
[ICML 2024] The official PyTorch implementation of A2PR, a simple way to achieve SOTA in offline reinforcement learning via an adaptive advantage-guided policy regularization method
Official Code for Paper: Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation
MedSeg-R: Medical Image Segmentation with Clinical Reasoning
A paper list of recent works on token compression for ViTs and VLMs
[TMLR] Public code repo for paper "A Single Transformer for Scalable Vision-Language Modeling"
[CVPR 2024 Highlight] Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding
This repository provides a valuable reference for researchers in multimodality; start your exploration of RL-based reasoning MLLMs here!
When do we not need larger vision models?
A curated list of state-of-the-art research in embodied AI, focusing on vision-language-action (VLA) models, vision-language navigation (VLN), and related multimodal learning approaches.
[CVPR 2025] Official PyTorch Code for "DPC: Dual-Prompt Collaboration for Tuning Vision-Language Models"
Intervening Anchor Token: Decoding Strategy in Alleviating Hallucinations for MLLMs
[CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
[ECCV 2024 Oral] Code for paper: An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models
[CVPR 2025] PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction
Solve Visual Understanding with Reinforced VLMs
Project Page for "LISA: Reasoning Segmentation via Large Language Model"
Code release for "SegLLM: Multi-round Reasoning Segmentation"
Minimal reproduction of DeepSeek R1-Zero
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
Training Large Language Model to Reason in a Continuous Latent Space