[ECCV 2022] This is the official implementation of BEVFormer, a camera-only framework for autonomous driving perception, e.g., 3D object detection and semantic map segmentation.

Python 3,873 618 Updated Aug 15, 2024

FoundationAgents / MetaGPT

🌟 The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming

Python 57,304 6,888 Updated Jun 30, 2025

pytorch / torchtune

PyTorch native post-training library

Python 5,352 654 Updated Jul 18, 2025

IDEA-Research / Grounded-SAM-2

Grounded SAM 2: Ground and Track Anything in Videos with Grounding DINO, Florence-2 and SAM 2

Jupyter Notebook 2,472 266 Updated May 26, 2025

xinyu1205 / recognize-anything

Open-source and strong foundation image recognition models.

Jupyter Notebook 3,340 308 Updated Feb 18, 2025

huggingface / speech-to-speech

Speech To Speech: an effort for an open-sourced and modular GPT-4o

Python 4,108 464 Updated Apr 15, 2025

microsoft / graphrag

A modular graph-based Retrieval-Augmented Generation (RAG) system

Python 26,611 2,765 Updated Jul 18, 2025

antgroup / echomimic

[AAAI 2025] EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning

Python 3,968 444 Updated Dec 10, 2024

TencentARC / T2I-Adapter

T2I-Adapter

Python 3,712 225 Updated Jun 21, 2024

facebookresearch / DiT

Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"

Python 7,581 676 Updated May 31, 2024

karpathy / LLM101n

LLM101n: Let's build a Storyteller

34,017 1,850 Updated Aug 1, 2024

BboyHanat / TextGenerator

OCR dataset Text-Detection dataset Font-Classification dataset generator

Python 148 40 Updated Mar 1, 2022

brightmart / nlp_chinese_corpus

Large Scale Chinese Corpus for NLP

9,746 1,561 Updated May 23, 2024

dikers / ocr_synth_text_chinese

Generates training datasets for text detection

Python 12 Updated Jul 1, 2020

kong36088 / BaiduImageSpider

A super lightweight Baidu image crawler

Python 901 394 Updated May 29, 2023

meta-llama / llama3

The official Meta Llama 3 GitHub site

Python 28,848 3,424 Updated Jan 26, 2025

PKU-YuanGroup / Open-Sora-Plan

This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to this project.

Python 11,997 1,060 Updated Jul 12, 2025

wysoczanska / clip_dinoiser

Official implementation of 'CLIP-DINOiser: Teaching CLIP a few DINO tricks' paper.

Jupyter Notebook 251 14 Updated Oct 26, 2024

YuchenLiu98 / COMM

PyTorch code for the paper "From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models"

Jupyter Notebook 199 6 Updated Jan 8, 2025

Ucas-HaoranWei / Vary

[ECCV 2024] Official code implementation of Vary: Scaling Up the Vision Vocabulary of Large Vision Language Models.

Python 1,838 144 Updated Dec 30, 2024

Ucas-HaoranWei / Vary-toy

Official code implementation of Vary-toy (Small Language Model Meets with Reinforced Vision Vocabulary)

Python 619 44 Updated Dec 30, 2024

IDEA-Research / DINO

[ICLR 2023] Official implementation of the paper "DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection"

Python 2,566 291 Updated Jul 31, 2024

AILab-CVC / YOLO-World

[CVPR 2024] Real-Time Open-Vocabulary Object Detection

Python 5,704 539 Updated Feb 26, 2025

RVC-Boss / GPT-SoVITS

1 min voice data can also be used to train a good TTS model! (few shot voice cloning)

Python 48,878 5,370 Updated Jul 18, 2025

layumi / Person_reID_baseline_pytorch

⛹️ Pytorch ReID: A tiny, friendly, strong PyTorch implementation of a person re-id / vehicle re-id baseline. Tutorial 👉https://github.com/layumi/Person_reID_baseline_pytorch/tree/master/tutorial

Python 4,309 1,021 Updated May 7, 2025

tencent-ailab / IP-Adapter

The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images with an image prompt.

Jupyter Notebook 6,121 387 Updated Jun 28, 2024

Naiyuan Liu NNNNAI
