The all-in-one Desktop & Docker AI application with built-in RAG, AI agents, No-code agent builder, MCP compatibility, and more.
-
Updated
Oct 10, 2025 - JavaScript
The all-in-one Desktop & Docker AI application with built-in RAG, AI agents, No-code agent builder, MCP compatibility, and more.
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
☁️ Build multimodal AI applications with cloud-native stack
The Open-Source Multimodal AI Agent Stack: Connecting Cutting-Edge AI Models and Agent Infra
Janus-Series: Unified Multimodal Understanding and Generation Models
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
AI app store powered by 24/7 desktop history. open source | 100% local | dev friendly | 24/7 screen, mic recording
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, GLM4.5, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, Llava, GLM4v, Phi4, ...) (AAAI 2025).
Visualize streams of multimodal data. Free, fast, easy to use, and simple to integrate. Built in Rust.
SeaTunnel is a multimodal, high-performance, distributed, massive data integration tool.
The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!
This repository is a curated collection of links to various courses and resources about Artificial Intelligence (AI)
notes for software engineers getting up to speed on new AI developments. Serves as datastore for https://latent.space writing, and product brainstorming, but has cleaned up canonical references under the /Resources folder.
Mobile-Agent: The Powerful GUI Agent Family
Solve Visual Understanding with Reinforced VLMs
A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
The Enterprise-Grade Production-Ready Multi-Agent Orchestration Framework. Website: https://swarms.ai
A Next-Generation Training Engine Built for Ultra-Large MoE Models
Add a description, image, and links to the multimodal topic page so that developers can more easily learn about it.
To associate your repository with the multimodal topic, visit your repo's landing page and select "manage topics."