- The University of Hong Kong
- Hong Kong
- ttengwang.com
Stars
Code release for the paper "Progress-Aware Video Frame Captioning" (CVPR 2025)
Latest Advances on System-2 Reasoning
This is the official implementation of ICCV 2025 "Flash-VStream: Efficient Real-Time Understanding for Long Video Streams"
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning.
[ICML 2025 Oral] This is the official repository of the paper "What Limits Virtual Agent Application? OmniBench: A Scalable Multi-Dimensional Benchmark for Essential Virtual Agent Capabilities"
😎 Awesome list of Retrieval-Augmented Generation (RAG) applications in Generative AI.
StyleAR: Customizing Multimodal Autoregressive Model for Style-Aligned Text-to-Image Generation
A collection of papers on discrete diffusion models
A curated list of state-of-the-art research in embodied AI, focusing on vision-language-action (VLA) models, vision-language navigation (VLN), and related multimodal learning approaches.
Latest Advances on Vision-Language-Action Models.
A curated list for vision-and-language navigation. ACL 2022 paper "Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future Directions"
[Lumina Embodied AI Community] Embodied AI Technical Guide (Embodied-AI-Guide)
[ICLR 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation.
Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and performing real-time speech generation.
[ECCV2024] Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models
[arXiv] Discrete Diffusion in Large Language and Multimodal Models: A Survey
The development and future prospects of multimodal reasoning models.
The official repo for "D-AR: Diffusion via Autoregressive Models"
🔥🔥🔥 Latest papers and code on uncertainty-based RL
Mitigating Shortcuts in Visual Reasoning with Reinforcement Learning
[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer
LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos. (CVPR 2025)