Amazon AGI
- Bellevue, WA, US
- whwu95.github.io
- @dr_wenhao
- in/wenhao-w-usyd
Highlights
- Pro
Stars
Wan: Open and Advanced Large-Scale Video Generative Models
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
[CVPR 2025 Oral] VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection
Search-R1: An Efficient, Scalable RL Training Framework for Reasoning & Search Engine Calling interleaved LLM based on veRL
Awesome Reasoning in MLLMs: Papers and Projects about learning to reason with MLLMs, including Chain-of-Thought (CoT), OpenAI o1, and DeepSeek-R1
[NeurIPS'25 Spotlight] Mulberry, an o1-like Reasoning and Reflection MLLM Implemented via Collective MCTS
Efficient Multimodal Large Language Models: A Survey
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
A series of math-specific large language models of our Qwen2 series.
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.
Retrieval-Augmented Generation in 3 Lines of Code!
AudioBench: A Universal Benchmark for Audio Large Language Models
【NeurIPS 2024】The official code of paper "Automated Multi-level Preference for MLLMs"
FreeVA: Offline MLLM as Training-Free Video Assistant
AcadHomepage: A Modern and Responsive Academic Personal Homepage
Awesome-LLM-Tabular: a curated list of Large Language Models applied to Tabular Data
GPT4Vis: What Can GPT-4 Do for Zero-shot Visual Recognition?
【ICCV'2023】What Can Simple Arithmetic Operations Do for Temporal Modeling?
Demonstrate all the questions on LeetCode in the form of animation. (Presenting the approach to solving LeetCode problems as animations.)
[ICCV 2023] Official Implementation of "Generalized Lightness Adaptation with Channel Selective Normalization"
A curated list of papers and open-source resources focused on 3D AIGC.
Badges for your personal developer branding, profile, and projects.
【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection