
Official repo for paper "Sparse Representation and Construction for High-Resolution 3D Shapes Modeling".

1,012 42 Updated Jun 16, 2025

NVIDIA Isaac GR00T N1.5 is the world's first open foundation model for generalized humanoid robot reasoning and skills.

Jupyter Notebook 4,427 641 Updated Jul 16, 2025

Long Context Transfer from Language to Vision

Python 384 20 Updated Mar 18, 2025
Python 4,036 383 Updated Jun 13, 2025

NeurIPS 2024 Paper: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing

Python 556 34 Updated Oct 20, 2024
Python 152 25 Updated Oct 31, 2024

Open-Sora: Democratizing Efficient Video Production for All

Python 26,894 2,619 Updated Apr 30, 2025

MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone

Python 19,838 1,444 Updated Jun 30, 2025

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o-level performance.

Python 8,591 663 Updated Jul 16, 2025

The code of the paper "NExT-Chat: An LMM for Chat, Detection and Segmentation".

Python 246 9 Updated Feb 5, 2024

LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models (ECCV 2024)

Python 822 48 Updated Jul 29, 2024

Code and models for ICML 2024 paper, NExT-GPT: Any-to-Any Multimodal Large Language Model

Python 3,537 357 Updated May 13, 2025

Progressive Spatio-Temporal Prototype Matching for Text-Video Retrieval --ICCV2023 Oral

Python 92 Updated Nov 2, 2023

[ICLR'24 spotlight] Chinese and English Multimodal Large Model Series (Chat and Paint) | A Chinese-English bilingual multimodal large model series based on the CPM foundation models

Python 1,063 91 Updated Jun 13, 2024
Python 785 45 Updated Jul 8, 2024

Official repo for VideoComposer: Compositional Video Synthesis with Motion Controllability

Python 939 85 Updated Nov 11, 2023

Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, B…

Python 529 39 Updated Apr 21, 2024

Make-A-Protagonist: Generic Video Editing with An Ensemble of Experts

Python 324 36 Updated Aug 1, 2023

Codes for VPGTrans: Transfer Visual Prompt Generator across LLMs. VL-LLaMA, VL-Vicuna.

Python 273 25 Updated Oct 13, 2023

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

Python 38,879 4,724 Updated Jun 2, 2025

Running large language models on a single GPU for throughput-oriented scenarios.

Python 9,348 582 Updated Oct 28, 2024

CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image

Jupyter Notebook 29,891 3,679 Updated Jul 23, 2024

[NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models

Python 157 20 Updated Dec 9, 2024

This is the code of ECCV 2022 (Oral) paper "Fine-Grained Scene Graph Generation with Data Transfer".

Jupyter Notebook 102 10 Updated Jan 24, 2023

Official repository for the A-OKVQA dataset

Python 96 12 Updated May 8, 2024

Code repository for "It's About Time: Analog Clock Reading in the Wild"

Python 77 12 Updated Jun 15, 2024

NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions (CVPR'21)

Python 29 1 Updated Jul 18, 2023

Visual Relation Grounding in Videos (ECCV'20, Spotlight)

Python 57 7 Updated Dec 8, 2022

NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions (CVPR'21)

Python 161 16 Updated Jul 25, 2024

Video as Conditional Graph Hierarchy for Multi-Granular Question Answering (AAAI'22, Oral)

Python 34 4 Updated Sep 17, 2022