Stars
Qwen3-VL is the multimodal large language model series developed by the Qwen team at Alibaba Cloud.
VideoSys: An easy and efficient system for video generation
The official repo for the paper "VeCLIP: Improving CLIP Training via Visual-enriched Captions"
On the Hidden Mystery of OCR in Large Multimodal Models (OCRBench)
Qwen-VL-Plus & Qwen-VL-Max in ComfyUI
A Python implementation of John Gruber’s Markdown with Extension support.
Agent framework and applications built upon Qwen>=3.0, featuring Function Calling, MCP, Code Interpreter, RAG, Chrome extension, etc.
Robust Speech Recognition via Large-Scale Weak Supervision
This is a Python API which allows you to get the transcript/subtitles for a given YouTube video. It also works for automatically generated subtitles and does not require an API key nor a headless browser.
A framework to enable multimodal models to operate a computer.
The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.
Fast and memory-efficient exact attention
✨✨Latest Advances on Multimodal Large Language Models
Touchstone: Evaluating Vision-Language Models by Language Models
(CVPR 2024) A benchmark for evaluating Multimodal LLMs using multiple-choice questions.
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.
A general representation model across vision, audio, language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Generate text images for training deep learning OCR models
Data and code for NeurIPS 2022 Paper "Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering".
Implementation of Toolformer, Language Models That Can Use Tools, by MetaAI
FAIR's research platform for object detection research, implementing popular algorithms like Mask R-CNN and RetinaNet.