💭
I may be slow to respond.
  • University of California, Santa Barbara

Highlights

  • Pro

Organizations

@eric-ai-lab

Python 10 Updated Oct 8, 2025

Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.

Jupyter Notebook 2,553 133 Updated Oct 9, 2025

GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models

Python 2,927 292 Updated Sep 30, 2025

GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents

Python 345 40 Updated Aug 6, 2025

[NeurIPS 2025] More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models

Python 60 3 Updated May 31, 2025

[EMNLP 2025] Official code for the paper "SafeKey: Amplifying Aha-Moment Insights for Safety Reasoning"

Python 10 Updated Jun 30, 2025

Official code for NeurIPS 2025 paper "GRIT: Teaching MLLMs to Think with Images"

Python 145 6 Updated Aug 4, 2025

Official implementation of the NeurIPS 2025 paper "Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space"

Python 246 20 Updated Sep 5, 2025

Agent S: an open agentic framework that uses computers like a human

Python 7,065 777 Updated Oct 5, 2025

Universal memory layer for AI Agents; Announcing OpenMemory MCP - local and secure memory management.

Python 41,046 4,364 Updated Oct 9, 2025

[ICLR 2025] EditRoom: LLM-parameterized Graph Diffusion for Composable 3D Room Layout Editing

Python 15 3 Updated Apr 1, 2025

LLM101n: Let's build a Storyteller

34,513 1,876 Updated Aug 1, 2024

[ACL 2025 Findings] "Multimodal Inconsistency Reasoning (MMIR): A New Benchmark for Multimodal Reasoning Models"

Python 11 Updated Feb 25, 2025

Official repo for the paper "Mojito: Motion Trajectory and Intensity Control for Video Generation"

Python 5 1 Updated Jun 11, 2025

Qwen3-VL is the multimodal large language model series developed by the Qwen team at Alibaba Cloud.

Jupyter Notebook 13,827 1,049 Updated Oct 9, 2025

Large Concept Models: Language modeling in a sentence representation space

Python 2,289 202 Updated Jan 29, 2025

New repo collection for NVIDIA Cosmos: https://github.com/nvidia-cosmos

8,058 521 Updated Jun 9, 2025

A simple screen parsing tool towards pure vision based GUI agent

Jupyter Notebook 23,645 2,019 Updated Sep 12, 2025

Educational framework exploring ergonomic, lightweight multi-agent orchestration. Managed by OpenAI Solution team.

Python 20,493 2,202 Updated Mar 11, 2025

[ICLR 2025] Official codebase for the ICLR 2025 paper "Multimodal Situational Safety"

Python 26 1 Updated Jun 23, 2025

[ECCV 2024] Official implementation of NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models

Python 208 14 Updated Sep 20, 2024

This is the implementation of ACL 2024 Findings paper ViCor: Bridging Visual Understanding and Commonsense Reasoning with Large Language Models

4 Updated Jun 11, 2024

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions

Python 2,895 177 Updated May 26, 2025

Code repo for "Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding"

Python 28 2 Updated Jul 31, 2024
25 Updated Jun 20, 2024

Official repo of the ICLR 2025 paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos"

Python 29 1 Updated Jul 15, 2025

[ACL 2025 Findings] "Worse than Random? An Embarrassingly Simple Probing Evaluation of Large Multimodal Models in Medical VQA"

Python 22 1 Updated Feb 21, 2025

Letta is the platform for building stateful agents: open AI with advanced memory that can learn and self-improve over time.

Python 18,713 1,937 Updated Oct 10, 2025

Official repo for the TMLR paper "Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners"

Python 30 4 Updated Apr 27, 2024

The official Meta Llama 3 GitHub site

Python 29,024 3,470 Updated Jan 26, 2025