Nankai University - Hangzhou, China - https://zhengli97.github.io/
Stars
Automatic Video Generation from Scientific Papers
LLaVA-Mini is a unified large multimodal model (LMM) that efficiently supports the understanding of images, high-resolution images, and videos.
Long-RL: Scaling RL to Long Sequences (NeurIPS 2025)
Learning audio concepts from natural language supervision
🔥🔥🔥 [IEEE TCSVT] Latest papers, code, and datasets on Vid-LLMs.
🔥 🔥 🔥 Awesome MLLMs/Benchmarks for Short/Long/Streaming Video Understanding 📹
Reference PyTorch implementation and models for DINOv3
Official release of the ICCV 2025 paper "DiscretizedSDF".
Official implementation of Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence
The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol.
Collection of Composed Image Retrieval (CIR) papers.
Code for the CVPR 2024 paper "Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters".
[CVPR 2025] LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant
[IJCV 2025] Smaller But Better: Unifying Layout Generation with Smaller Large Language Models
🔍 Search-o1: Agentic Search-Enhanced Large Reasoning Models [EMNLP 2025]
[arXiv 25] Aesthetics is Cheap, Show me the Text: An Empirical Evaluation of State-of-the-Art Generative Models for OCR
[NeurIPS 2025 Oral] Representation Entanglement for Generation: Training Diffusion Transformers Is Much Easier Than You Think
[ICCV 2025] Official PyTorch Code for "Advancing Textual Prompt Learning with Anchored Attributes"
[ICML 2024] The official PyTorch implementation of A2PR, a simple way to achieve SOTA in offline reinforcement learning with an adaptive advantage-guided policy regularization method.
Official code for the paper "Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation".
MedSeg-R: Medical Image Segmentation with Clinical Reasoning
A paper list of recent works on token compression for ViTs and VLMs.
[TMLR] Public code repo for the paper "A Single Transformer for Scalable Vision-Language Modeling".
[CVPR 2024 Highlight] Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding
This repository provides a valuable reference for researchers in the field of multimodality; start your exploration of RL-based reasoning MLLMs here!
When do we not need larger vision models?
A curated list of state-of-the-art research in embodied AI, focusing on vision-language-action (VLA) models, vision-language navigation (VLN), and related multimodal learning approaches.
[CVPR 2025] Official PyTorch Code for "DPC: Dual-Prompt Collaboration for Tuning Vision-Language Models"