Stars
Vector (and Scalar) Quantization, in PyTorch
Semantic IDs: How to train an LLM-Recommender Hybrid with steerability and reasoning on recommendations.
Tongyi Deep Research, the Leading Open-source Deep Research Agent
The first challenge on short-form video quality assessment
Reference PyTorch implementation and models for DINOv3
Official implementation of the paper "Watermark Anything with Localized Messages"
video-SALMONN 2 is a powerful audio-visual large language model (LLM) that generates high-quality audio-visual video captions, developed by the Department of Electronic Engineering at Tsin…
Awesome papers & datasets specifically focused on long-term videos.
Demonstrates the approach to solving every LeetCode question in the form of animations.
[ICCV 2025] 🔥🔥 UNO: A Universal Customization Method for Both Single and Multi-Subject Conditioning
🔥 [ICCV 2025 Highlight] Official open-source repo for LVFace: Progressive Cluster Optimization for Large Vision Models in Face Recognition
A generalized information-seeking agent system with Large Language Models (LLMs).
A curated list of awesome LLM agents frameworks.
Multilingual Document Layout Parsing in a Single Vision-Language Model
Retrieval and Retrieval-augmented LLMs
gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI
This repository showcases various advanced techniques for Retrieval-Augmented Generation (RAG) systems. RAG systems combine information retrieval with generative models to provide accurate and cont…
[CVPR 2024] Real-Time Open-Vocabulary Object Detection
The official PyTorch Implementation of Charm: The Missing Piece in ViT fine-tuning for Image Aesthetic Assessment
[CVPR2025] KVQ: Boosting Video Quality Assessment via Saliency-guided Local Perception
🎓 Automatically update recommendation papers using GitHub Actions (refreshed every 12 hours)
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
This repository contains the official implementation of the research papers "MobileCLIP" (CVPR 2024) and "MobileCLIP2" (TMLR, August 2025)
[NeurIPS 2025 Spotlight] Q-Insight: Understanding Image Quality via Visual Reinforcement Learning
Qlib is an AI-oriented Quant investment platform that aims to use AI tech to empower Quant Research, from exploring ideas to implementing productions. Qlib supports diverse ML modeling paradigms, i…
🎓 Path to a free self-taught education in Computer Science!
Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.
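The vector-quantization repo at the top of this list implements the technique far more completely; as a rough illustration of the core idea (map each input vector to its nearest codebook entry), here is a minimal sketch in plain NumPy. The function name and shapes are illustrative, not taken from that library:

```python
import numpy as np

def vector_quantize(x, codebook):
    """Nearest-neighbor vector quantization (illustrative sketch).

    x:        (batch, dim) input vectors
    codebook: (num_codes, dim) learned code vectors
    Returns the quantized vectors and their codebook indices.
    """
    # Squared Euclidean distance from every input to every code: (batch, num_codes)
    dists = ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = dists.argmin(axis=1)        # index of the nearest code per input
    return codebook[indices], indices     # quantized vectors, (batch,) indices

# Usage: quantize 4 random 8-dim vectors against a 16-entry codebook.
rng = np.random.default_rng(0)
codebook = rng.standard_normal((16, 8))
x = rng.standard_normal((4, 8))
quantized, idx = vector_quantize(x, codebook)
print(quantized.shape, idx.shape)  # (4, 8) (4,)
```

In a trainable setting (as in the PyTorch library above), this lookup is typically paired with a straight-through gradient estimator and a codebook-commitment loss, which this sketch omits.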