- National University of Singapore
- https://waxnkw.github.io/
Stars
Official repo for paper "Sparse Representation and Construction for High-Resolution 3D Shapes Modeling".
NVIDIA Isaac GR00T N1.5 is the world's first open foundation model for generalized humanoid robot reasoning and skills.
Long Context Transfer from Language to Vision
NeurIPS 2024 Paper: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing
Open-Sora: Democratizing Efficient Video Production for All
MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching GPT-4o performance.
The code of the paper "NExT-Chat: An LMM for Chat, Detection and Segmentation".
LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models (ECCV 2024)
Code and models for ICML 2024 paper, NExT-GPT: Any-to-Any Multimodal Large Language Model
Progressive Spatio-Temporal Prototype Matching for Text-Video Retrieval (ICCV 2023 Oral)
[ICLR'24 spotlight] Chinese and English Multimodal Large Model Series (Chat and Paint) | A Chinese-English bilingual multimodal large model series based on the CPM foundation models
Official repo for VideoComposer: Compositional Video Synthesis with Motion Controllability
Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, B…
Make-A-Protagonist: Generic Video Editing with An Ensemble of Experts
Code for VPGTrans: Transfer Visual Prompt Generator across LLMs. VL-LLaMA, VL-Vicuna.
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
Running large language models on a single GPU for throughput-oriented scenarios.
CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image
[NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
Code for the ECCV 2022 (Oral) paper "Fine-Grained Scene Graph Generation with Data Transfer".
Code repository for "It's About Time: Analog Clock Reading in the Wild"
NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions (CVPR'21)
Visual Relation Grounding in Videos (ECCV'20, Spotlight)
Video as Conditional Graph Hierarchy for Multi-Granular Question Answering (AAAI'22, Oral)