
🎓 Automatically Update Recommendation Papers Daily using GitHub Actions (Update Every 12 Hours)

Python 66 2 Updated Jul 15, 2025

Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM

Python 1,640 176 Updated Jul 15, 2025

This repository contains the official implementation of the research paper, "MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training" CVPR 2024

Python 992 84 Updated Nov 22, 2024

Q-Insight: Understanding Image Quality via Visual Reinforcement Learning

Python 134 2 Updated Jun 10, 2025

Qlib is an AI-oriented Quant investment platform that aims to use AI tech to empower Quant Research, from exploring ideas to implementing productions. Qlib supports diverse ML modeling paradigms, i…

Python 27,085 4,145 Updated Jul 11, 2025

🎓 Path to a free self-taught education in Computer Science!

HTML 188,234 23,475 Updated Jul 5, 2025

Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.

Jupyter Notebook 1,319 53 Updated Jun 14, 2025

This repository contains the official implementation of "FastVLM: Efficient Vision Encoding for Vision Language Models" - CVPR 2025

Python 4,333 238 Updated May 5, 2025

The official repo for "DREAM: Disentangling Risks to Enhance Safety Alignment in Multimodal Large Language Models".

Python 5 Updated Feb 11, 2025

[ECCV2024] Video Foundation Models & Data for Multimodal Understanding

Python 1,963 117 Updated Jun 16, 2025

Q-Insight is open-sourced at https://github.com/bytedance/Q-Insight. This repository will not receive further updates.

143 4 Updated May 30, 2025

A comprehensive collection of IQA papers

TeX 1,260 75 Updated Apr 8, 2025

Evaluation and Tracking for LLM Experiments and AI Agents

Python 2,636 217 Updated Jul 15, 2025

1 min voice data can also be used to train a good TTS model! (few shot voice cloning)

Python 48,775 5,363 Updated Jul 14, 2025

Official inference framework for 1-bit LLMs

Python 20,523 1,537 Updated Jun 3, 2025

State-of-the-art Machine Learning for the web. Run 🤗 Transformers directly in your browser, with no need for a server!

JavaScript 14,110 959 Updated Jul 11, 2025

Fast, Accurate, Lightweight Python library to make State of the Art Embedding

Python 2,213 145 Updated Jul 11, 2025

Port of OpenAI's Whisper model in C/C++

C++ 41,554 4,454 Updated Jul 14, 2025

An Optimized Speech-to-Text Pipeline for the Whisper Model Supporting Multiple Inference Engines

Jupyter Notebook 436 60 Updated Aug 27, 2024
Python 5,632 422 Updated May 11, 2025

Development repository for the Triton language and compiler

MLIR 16,176 2,116 Updated Jul 15, 2025

Enforce the output format (JSON Schema, Regex etc) of a language model

Python 1,843 78 Updated Feb 26, 2025

Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and performing real-time speech generation.

Jupyter Notebook 3,317 255 Updated Jun 12, 2025

Large-scale LLM inference engine

C++ 1,478 158 Updated Jul 15, 2025

Learn how to design systems at scale and prepare for system design interviews

37,055 4,407 Updated Apr 10, 2024

VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.

Python 3,407 274 Updated Jun 19, 2025

Taming Stable Diffusion for Lip Sync!

Python 4,565 726 Updated Jun 20, 2025

🔥🔥🔥Latest Papers, Codes and Datasets on Vid-LLMs.

2,512 113 Updated Jul 9, 2025

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions

Python 2,869 176 Updated May 26, 2025