Xi'an Jiaotong University
- Xi'an, China
- @XuecWu
Stars
Display giant ASCII-art logos with colorful gradients in your terminal — like Claude Code or Gemini CLI.
[Preprint 2025] Ditto: Scaling Instruction-Based Video Editing with a High-Quality Synthetic Dataset
Thinking with Videos from Open-Source Priors. We reproduce chain-of-frames visual reasoning by fine-tuning open-source video models. Give it a star 🌟 if you find it useful.
The official repository of "So-Fake: Benchmarking and Explaining Social Media Image Forgery Detection"
[NeurIPS 2025 DB] OneIG-Bench is a meticulously designed comprehensive benchmark framework for fine-grained evaluation of T2I models across multiple dimensions, including subject-element alignment,…
NEO Series: Native Vision-Language Models from First Principles
Official implementation of "UniLiP: Adapting CLIP for Unified Multimodal Understanding, Generation and Editing"
HALO: Hadamard-Assisted Low-Precision Optimization and Training method for finetuning LLMs. 🚀 The official implementation of https://arxiv.org/abs/2501.02625
This is the repository for the paper ‘A Survey of Inductive Reasoning for Large Language Models’
openvla / openvla
Forked from TRI-ML/prismatic-vlms
OpenVLA: An open-source vision-language-action model for robotic manipulation.
shangshang-wang / Tora
Forked from meta-pytorch/torchtune
Tora: Torchtune-LoRA for RL
The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search
Cosmos-Predict2.5, the latest version of the Cosmos World Foundation Models (WFMs) family, specialized for simulating and predicting the future state of the world in the form of video.
VChain: Chain-of-Visual-Thought for Reasoning in Video Generation
VideoNSA: Native Sparse Attention Scales Video Understanding
Automatic Video Generation from Scientific Papers
An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
Code release for Ming-UniVision: Joint Image Understanding and Generation with a Continuous Unified Tokenizer
A curated list of resources on Reinforcement Learning with Verifiable Rewards (RLVR) and the reasoning capability boundary of Large Language Models (LLMs).
[NeurIPS 2025] TTRL: Test-Time Reinforcement Learning
[NeurIPS'25] DOVE: Efficient One-Step Diffusion Model for Real-World Video Super-Resolution
[NeurIPS 2025] Inference-Time Text-to-Video Alignment with Diffusion Latent Beam Search
An official implementation of DanceGRPO: Unleashing GRPO on Visual Generation
📄 Awesome CV is a LaTeX template for your outstanding job application
Kandinsky 5.0: A family of diffusion models for Video & Image generation