jwyang
[CVPR 2025] Magma: A Foundation Model for Multimodal AI Agents

Python 1,821 144 Updated Oct 4, 2025

Fully open reproduction of DeepSeek-R1

Python 25,536 2,398 Updated Sep 8, 2025

[ICLR 2025] LAPA: Latent Action Pretraining from Videos

Python 385 18 Updated Jan 22, 2025

TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models

Python 37 1 Updated Nov 10, 2024

[CVPR 2024 Highlight] Official PyTorch implementation of SpatialTracker: Tracking Any 2D Pixels in 3D Space

Python 1,010 39 Updated Aug 8, 2025

Matryoshka Multimodal Models

Python 111 8 Updated Jan 22, 2025

[NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs".

Python 59 3 Updated Jun 17, 2024

[COLM-2024] List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs

Python 144 4 Updated Aug 23, 2024

Reaching LLaMA2 Performance with 0.1M Dollars

Python 985 80 Updated Jul 23, 2024

Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware.

Python 747 52 Updated Sep 27, 2024

[CVPR 2024 Highlight] GLEE: General Object Foundation Model for Images and Videos at Scale

Python 1,153 75 Updated Oct 21, 2024
4 Updated Sep 30, 2024
Python 628 32 Updated Feb 15, 2024
Python 416 16 Updated Jul 29, 2024

Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL

Python 2,636 217 Updated Oct 13, 2025

[CVPR 2024] Official implementation of the paper "Visual In-context Learning"

Python 503 23 Updated Apr 8, 2024

Browse the web with GPT-4V and Vimium

Python 2,669 201 Updated Sep 25, 2024

Must-have resource for anyone who wants to experiment with and build on the OpenAI vision API 🔥

Python 1,682 132 Updated Jan 14, 2025

AI agent using GPT-4V(ision) capable of using a mouse/keyboard to interact with web UI

JavaScript 1,054 101 Updated Dec 9, 2024

[arXiv 2023] Set-of-Mark Prompting for GPT-4V and LMMs

Python 1,466 112 Updated Aug 19, 2024

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 60,008 10,521 Updated Oct 14, 2025

[CVPR 2023] Official Implementation of X-Decoder for generalized decoding for pixel, image and language

Python 1,333 156 Updated Oct 5, 2023

Official repository for "Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition" [ICCV 2023]

Python 100 19 Updated Apr 30, 2024

[ECCV 2024] Official implementation of the paper "Semantic-SAM: Segment and Recognize Anything at Any Granularity"

Python 2,745 143 Updated Jul 10, 2025

Official PyTorch implementation of the paper "In-Context Learning Unlocked for Diffusion Models"

Python 410 11 Updated Mar 25, 2024

[NeurIPS 2023] Official implementation of the paper "Segment Everything Everywhere All at Once"

Python 4,727 446 Updated Aug 19, 2024

arXiv LaTeX Cleaner: Easily clean the LaTeX code of your paper to submit to arXiv

Python 6,458 375 Updated Jun 2, 2025

[ICCV 2023] Official implementation of the paper "A Simple Framework for Open-Vocabulary Segmentation and Detection"

Python 731 46 Updated Jan 22, 2024

[ICLR'23 Spotlight🔥] The first successful BERT/MAE-style pretraining on any convolutional network; PyTorch implementation of "Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling"

Python 1,356 86 Updated Jan 23, 2024

Code base for MinD-Vis

Python 781 105 Updated May 24, 2023