Stars
Native Multimodal Models are World Learners
Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models
Efficient Triton Kernels for LLM Training
Muon is an optimizer for hidden layers in neural networks
Fully open reproduction of DeepSeek-R1
Qwen3-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud.
Integrate the DeepSeek API into popular software
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
Janus-Series: Unified Multimodal Understanding and Generation Models
PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838
A curated list of awesome LLM/VLM/VLA for Autonomous Driving (LLM4AD) resources (continually updated)
The simplest, fastest repository for training/finetuning medium-sized GPTs.
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Code release for "Omni3D: A Large Benchmark and Model for 3D Object Detection in the Wild"
[ICLR 2023 Spotlight] Vision Transformer Adapter for Dense Predictions
[ICCV 2019] TSM: Temporal Shift Module for Efficient Video Understanding
Fast and memory-efficient exact attention
We propose a model to analyze the sentiment of online stock forums and use that information to predict stock volatility in the Chinese market. By generating a sentiment dictionary, we analyze the senti…
Recently, realistic image generation using deep neural networks has become a hot topic in machine learning and computer vision. Such an image can be generated at pixel level by learning from a larg…
Enforcing temporal consistency in real-time per-frame semantic video segmentation
Official code for the paper 'Structured Knowledge Distillation for Semantic Segmentation' (CVPR 2019 Oral), with extensions to other tasks.
[ECCV 2024 Oral] DriveLM: Driving with Graph Visual Question Answering
Object Detection Metrics. 14 object detection metrics: mean Average Precision (mAP), Average Recall (AR), Spatio-Temporal Tube Average Precision (STT-AP). This project supports different bounding b…
Code for a series of work in LiDAR perception, including SST (CVPR 22), FSD (NeurIPS 22), FSD++ (TPAMI 23), FSDv2, and CTRL (ICCV 23, oral).
[ICCV 2023] StreamPETR: Exploring Object-Centric Temporal Modeling for Efficient Multi-View 3D Object Detection