This GitHub repository contains the complete code for building Business-Ready Generative AI Systems (GenAISys) from scratch. It guides you through architecting and implementing advanced AI controllers, intelligent agents, and dynamic RAG frameworks. The projects demonstrate practical applications across various domains.
A web app that dynamically generates playable 'Spot the Difference' games from a single text prompt using a multimodal pipeline with Google's Gemini and Imagen models.
An enterprise-ready solution that uses multimodal Generative AI (Gen AI) to extend new or existing applications beyond text: RAG, image classification, video analysis, and advanced image embeddings.
A demo multimodal AI chat application built with Streamlit and Google's Gemini model. Features include: secure Google OAuth, persistent data storage with Cloud SQL (PostgreSQL), and intelligent function calling. Includes a persona-based newsletter engine to deliver personalized insights.
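Intelligent function calling means the model emits a structured call that the app routes to real Python code. A minimal, runnable sketch of that dispatch pattern is below; the tool name `get_newsletter_topics` and the simulated payload are illustrative stand-ins, not the app's actual API (in production, Gemini returns the `function_call` structure itself).

```python
def get_newsletter_topics(persona: str) -> list[str]:
    # Hypothetical tool: look up newsletter topics for a subscriber persona.
    topics = {"engineer": ["LLMs", "MLOps"], "analyst": ["BI", "forecasting"]}
    return topics.get(persona, ["general AI news"])

# Registry mapping tool names (as the model would reference them) to functions.
TOOLS = {"get_newsletter_topics": get_newsletter_topics}

def dispatch(function_call: dict):
    """Route a model-issued function call to the matching Python tool."""
    fn = TOOLS[function_call["name"]]
    return fn(**function_call["args"])

# Simulated model function-call payload, so the routing runs offline:
call = {"name": "get_newsletter_topics", "args": {"persona": "engineer"}}
print(dispatch(call))  # ['LLMs', 'MLOps']
```

The same registry-plus-dispatch shape works regardless of which model produces the call, which is why function calling composes cleanly with OAuth-gated, per-user tools.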
ICML 2025 Papers: dive into cutting-edge research from the premier machine learning conference. Stay current with breakthroughs in deep learning, generative AI, optimization, reinforcement learning, and beyond. Code implementations included. ⭐ Star the repo to support the future of machine learning research!
This is a fully autonomous, self-operating computer automation system designed to automate tasks on Windows without any user interaction. It runs scheduled or trigger-based workflows using Python, system tools, and smart agents — ideal for repetitive tasks, bots, or self-executing pipelines.
#3 Winner of Best Use of Zoom API at Stanford TreeHacks 2025! An AI-powered meeting assistant that captures video, audio, and textual context from Zoom calls using multimodal RAG.
VLDBench: A large-scale benchmark for evaluating Vision-Language Models (VLMs) and Large Language Models (LLMs) on multimodal disinformation detection.
Mai is an emotionally intelligent, voice-enabled AI assistant built with FastAPI, Together.ai LLMs, memory persistence via ChromaDB, and real-time sentiment analysis. Designed to feel alive, empathetic, and human-like, Mai blends the charm of a flirty cyberpunk companion with the power of modern multimodal AI.
A hands-on collection of experimental AI mini-projects exploring large language models, multimodal reasoning, retrieval-augmented generation (RAG), reinforcement learning, and real-world applications in finance, eKYC, and voice interfaces.
🤖 Production-ready samples for building multi-modal AI agents that understand images, documents, videos, and text using Amazon Bedrock and Strands Agents. Features Claude integration, MCP tools, streaming responses, and enterprise-grade architecture.
Generative AI (Gen AI) is a branch of artificial intelligence that creates new content such as text, images, audio, or code using models like GPT or Gemini. It powers applications like AI chatbots, image generation tools, and creative assistants across various industries.
OllamaMulti-RAG 🚀 is a multimodal AI chat app combining Whisper AI for audio, LLaVA for images, and Chroma DB for PDFs, enhanced with Ollama and OpenAI API. 📄 Built for AI enthusiasts, it welcomes contributions—features, bug fixes, or optimizations—to advance practical multimodal AI research and development collaboratively.
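A chat app that combines Whisper for audio, LLaVA for images, and Chroma for PDFs needs one core piece of glue: routing each upload to the handler for its modality. Below is a hedged sketch of that routing, with the handler functions as placeholder stand-ins for the real Whisper / LLaVA / Chroma pipelines rather than the project's actual code.

```python
from pathlib import Path

# Placeholder handlers; in the real app these would invoke Whisper,
# LLaVA, and a Chroma embedding pipeline respectively.
def transcribe_audio(path: str) -> str: return f"whisper:{path}"
def describe_image(path: str) -> str: return f"llava:{path}"
def index_pdf(path: str) -> str: return f"chroma:{path}"

# Map file extensions to the handler for that modality.
ROUTES = {
    ".wav": transcribe_audio, ".mp3": transcribe_audio,
    ".png": describe_image, ".jpg": describe_image,
    ".pdf": index_pdf,
}

def route(filename: str) -> str:
    """Dispatch an uploaded file to the handler for its modality."""
    handler = ROUTES.get(Path(filename).suffix.lower())
    if handler is None:
        raise ValueError(f"unsupported file type: {filename}")
    return handler(filename)

print(route("meeting.mp3"))  # whisper:meeting.mp3
```

Keeping the route table declarative makes it easy to add a new modality (say, video) without touching the dispatch logic, which suits a contribution-driven project like this one.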