ttengwang

5 Updated Jul 16, 2025

Prompts that impress you at first glance

GCC Machine Description 200 61 Updated Jul 8, 2025

Code release for the paper "Progress-Aware Video Frame Captioning" (CVPR 2025)

Python 12 1 Updated Jul 16, 2025

Latest Advances on System-2 Reasoning

Python 1,182 63 Updated Jun 8, 2025

The true story of the hardship and darkness behind the development of the Noah Pangu large model.

11,173 1,376 Updated Jul 9, 2025

This is the official implementation of ICCV 2025 "Flash-VStream: Efficient Real-Time Understanding for Long Video Streams"

Python 211 15 Updated Jul 14, 2025

GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning.

Python 814 24 Updated Jul 16, 2025

[ICML 2025 Oral] This is the official repository of the paper "What Limits Virtual Agent Application? OmniBench: A Scalable Multi-Dimensional Benchmark for Essential Virtual Agent Capabilities"

Python 14 2 Updated Jun 12, 2025

😎 Awesome list of Retrieval-Augmented Generation (RAG) applications in Generative AI.

526 21 Updated Jul 13, 2025

StyleAR: Customizing Multimodal Autoregressive Model for Style-Aligned Text-to-Image Generation

Python 37 1 Updated Jun 6, 2025

A collection of papers on discrete diffusion models

152 2 Updated Jun 30, 2025

A curated list of state-of-the-art research in embodied AI, focusing on vision-language-action (VLA) models, vision-language navigation (VLN), and related multimodal learning approaches.

1,136 50 Updated Jul 17, 2025

Latest Advances on Vision-Language-Action Models.

84 3 Updated Mar 4, 2025

A curated list for vision-and-language navigation. ACL 2022 paper "Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future Directions"

520 24 Updated May 2, 2024

[Lumina Embodied AI Community] Embodied AI Technical Guide (Embodied-AI-Guide)

6,407 414 Updated Jul 16, 2025

[ICLR 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation.

Python 1,597 68 Updated Jul 16, 2025

Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and performing real-time speech generation.

Jupyter Notebook 3,327 256 Updated Jun 12, 2025
Python 78 4 Updated Sep 19, 2024

[ECCV2024] Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models

Python 17 Updated Jul 17, 2024

[Arxiv] Discrete Diffusion in Large Language and Multimodal Models: A Survey

Python 172 2 Updated Jul 7, 2025

The development and future prospects of multimodal reasoning models.

437 18 Updated Jul 15, 2025

Dream 7B, a large diffusion language model

Python 844 42 Updated Jun 18, 2025

The official repo for "D-AR: Diffusion via Autoregressive Models"

Python 106 2 Updated Jun 21, 2025

🔥🔥🔥Latest Papers, Codes on Uncertainty-based RL

35 3 Updated Jul 17, 2025
Python 2,261 149 Updated Jul 11, 2025

Mitigating Shortcuts in Visual Reasoning with Reinforcement Learning

Python 30 Updated Jul 2, 2025

[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer

Python 9,889 951 Updated Jul 17, 2025

LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos. (CVPR 2025)

Python 40 Updated Jun 9, 2025
Python 13 Updated Jun 16, 2025