Stars
[CVPR 2025] Magma: A Foundation Model for Multimodal AI Agents
Fully open reproduction of DeepSeek-R1
[ICLR 2025] LAPA: Latent Action Pretraining from Videos
TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models
[CVPR 2024 Highlight] Official PyTorch implementation of SpatialTracker: Tracking Any 2D Pixels in 3D Space
[NeurIPS 2024] Official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs"
[COLM 2024] List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs
Reaching LLaMA2 Performance with 0.1M Dollars
Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware.
[CVPR 2024 Highlight] GLEE: General Object Foundation Model for Images and Videos at Scale
Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL
[CVPR 2024] Official implementation of the paper "Visual In-context Learning"
Must-have resource for anyone who wants to experiment with and build on the OpenAI vision API 🔥
AI agent using GPT-4V(ision) capable of using a mouse and keyboard to interact with a web UI
[arXiv 2023] Set-of-Mark Prompting for GPT-4V and LMMs
A high-throughput and memory-efficient inference and serving engine for LLMs
[CVPR 2023] Official Implementation of X-Decoder for generalized decoding for pixel, image and language
[ICCV 2023] Official repository for "Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition"
[ECCV 2024] Official implementation of the paper "Semantic-SAM: Segment and Recognize Anything at Any Granularity"
Official PyTorch implementation of the paper "In-Context Learning Unlocked for Diffusion Models"
[NeurIPS 2023] Official implementation of the paper "Segment Everything Everywhere All at Once"
arXiv LaTeX Cleaner: Easily clean the LaTeX code of your paper to submit to arXiv
[ICCV 2023] Official implementation of the paper "A Simple Framework for Open-Vocabulary Segmentation and Detection"
[ICLR 2023 Spotlight 🔥] The first successful BERT/MAE-style pretraining on any convolutional network; PyTorch implementation of "Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling"