
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Jupyter Notebook 14,135 1,075 Updated Oct 12, 2025

VideoSys: An easy and efficient system for video generation

Python 2,005 133 Updated Aug 27, 2025

The official repo for the paper "VeCLIP: Improving CLIP Training via Visual-enriched Captions"

Jupyter Notebook 245 14 Updated Jan 22, 2025

On the Hidden Mystery of OCR in Large Multimodal Models (OCRBench)

Python 722 53 Updated Jul 5, 2025

QWen-VL-Plus & QWen-VL-Max in ComfyUI

Python 216 20 Updated May 22, 2024

A Python implementation of John Gruber’s Markdown with Extension support.

Python 4,097 886 Updated Sep 26, 2025

Agent framework and applications built upon Qwen>=3.0, featuring Function Calling, MCP, Code Interpreter, RAG, Chrome extension, etc.

Python 11,912 1,084 Updated Sep 26, 2025

Official GitHub repo of G-LLaVA

Python 147 4 Updated Feb 20, 2025

Robust Speech Recognition via Large-Scale Weak Supervision

Python 89,340 11,156 Updated Sep 8, 2025

This is a Python API which allows you to get the transcript/subtitles for a given YouTube video. It also works for automatically generated subtitles and it does not require an API key nor a headles…

Python 6,280 656 Updated Oct 7, 2025

A framework to enable multimodal models to operate a computer.

Python 9,935 1,391 Updated Sep 19, 2025

The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.

Python 1,808 134 Updated Jul 5, 2024

Fast and memory-efficient exact attention

Python 19,884 2,045 Updated Oct 13, 2025

✨✨Latest Advances on Multimodal Large Language Models

16,445 1,064 Updated Sep 24, 2025

WanJuan 1.0 multimodal corpus

567 28 Updated Oct 20, 2023

Touchstone: Evaluating Vision-Language Models by Language Models

Python 83 Updated Jan 18, 2024

(CVPR 2024) A benchmark for evaluating Multimodal LLMs using multiple-choice questions.

Python 350 13 Updated Jan 14, 2025

The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.

Python 6,295 464 Updated Aug 7, 2024

The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.

Python 19,489 1,624 Updated Sep 30, 2025

A general representation model across vision, audio, language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities

Python 1,049 70 Updated Oct 6, 2024

Generate text images for training deep learning OCR models

Python 1,449 388 Updated Jan 17, 2022

Inference code for Llama models

Python 58,816 9,809 Updated Jan 26, 2025
Python 221 23 Updated Apr 18, 2025
Python 143 50 Updated Jul 1, 2023

Data and code for NeurIPS 2022 Paper "Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering".

Python 693 65 Updated Sep 19, 2024

Implementation of Toolformer, Language Models That Can Use Tools, by MetaAI

Python 2,049 129 Updated Jul 22, 2024

FAIR's research platform for object detection research, implementing popular algorithms like Mask R-CNN and RetinaNet.

Python 26,376 5,442 Updated Nov 20, 2023

project page for VinVL

358 25 Updated Jul 26, 2023