VistAI is a visual search and recommendation system that uses state-of-the-art AI to find visually similar products and serve personalized recommendations. The system combines computer vision (CLIP) for visual understanding with reinforcement learning (DQN) for intelligent recommendations.
- Visual Search: Find visually similar products using CLIP-based image embeddings
- AI-Powered Recommendations: Get personalized recommendations using Deep Q-Learning
- Responsive Web Interface: User-friendly interface for seamless interaction
- Scalable Architecture: Modular design for easy extension and maintenance
- Framework: Flask
- Computer Vision: CLIP (Contrastive Language-Image Pretraining)
- Similarity Search: FAISS (Facebook AI Similarity Search)
- Reinforcement Learning: Stable Baselines3 (DQN)
- Data Processing: NumPy, Pandas, OpenCV
- Deep Learning: PyTorch
- HTML5, CSS3, JavaScript
- Responsive design for all devices
- Python 3.8+
- pip
- Git
- Clone the repository:

  ```bash
  git clone https://github.com/Zeeshier/VistAI.git
  cd VistAI
  ```
- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Download the CLIP model:

  ```python
  import clip

  clip.load("ViT-B/32", download_root="models/clip_model")
  ```
- Run the application:

  ```bash
  python src/app.py
  ```
- Open your browser and navigate to `http://localhost:5000`.
```
VistAI/
├── data/                      # Dataset and processed images
├── models/                    # Pretrained models and embeddings
│   ├── clip_model/            # CLIP model weights
│   ├── dqn_model/             # Trained DQN models
│   └── embeddings/            # Image embeddings and FAISS index
├── src/                       # Source code
│   ├── app.py                 # Flask application
│   ├── visual_search.py       # CLIP-based visual search
│   ├── rl_recommender.py      # DQN-based recommendation
│   ├── data_preprocessing.py  # Image preprocessing utilities
│   └── utils.py               # Helper functions
├── static/                    # Static files (CSS, JS, uploads)
├── templates/                 # HTML templates
├── tests/                     # Test files
├── requirements.txt           # Project dependencies
└── README.md                  # Project documentation
```
- Utilizes OpenAI's CLIP model to generate image embeddings
- Implements FAISS for efficient similarity search
- Returns visually similar products based on input image
- Implements Deep Q-Network (DQN) for personalized recommendations
- State space: CLIP embeddings of products
- Action space: Product indices
- Reward function: Positive for correct matches, negative for mismatches
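The state/action/reward loop above can be sketched as a minimal Gym-style environment. This is a pure-Python illustration, not the project's actual `rl_recommender.py`: the class name, the identity-matrix "embeddings", the one-step episodes, and the ±1 reward values are all assumptions for the sketch.

```python
import numpy as np

class ProductRecEnv:
    """Sketch of a recommendation environment: the agent observes a
    product's CLIP embedding (state) and picks a product index (action)."""

    def __init__(self, embeddings, targets, seed=0):
        self.embeddings = np.asarray(embeddings, dtype=np.float32)
        self.targets = targets              # "correct" product per state
        self.rng = np.random.default_rng(seed)
        self.current = 0

    def reset(self):
        # Sample a product at random; its embedding is the state.
        self.current = int(self.rng.integers(len(self.embeddings)))
        return self.embeddings[self.current]

    def step(self, action):
        # +1 for recommending the matching product, -1 for a mismatch.
        reward = 1.0 if action == self.targets[self.current] else -1.0
        next_state = self.reset()           # one-step episodes in this sketch
        return next_state, reward, True, {}
```

Instantiating it with toy data, e.g. `ProductRecEnv(embeddings=np.eye(4), targets=[1, 0, 3, 2])`, gives a drop-in stand-in for experimenting with the reward scheme before wiring up real embeddings.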
- Image Preprocessing: Resize and normalize input images
- Feature Extraction: Generate embeddings using CLIP
- Similarity Search: Find nearest neighbors using FAISS
- Result Ranking: Sort results by similarity score
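At its core, steps 3 and 4 reduce to a nearest-neighbor lookup over L2-normalized embeddings. The project uses FAISS for this; the NumPy sketch below shows the equivalent computation on a small catalog (the function name and `top_k` default are illustrative, not from the codebase).

```python
import numpy as np

def rank_by_similarity(query_emb, catalog_embs, top_k=5):
    """Return indices and scores of the top_k most similar catalog items.

    Embeddings are L2-normalized so the dot product equals cosine
    similarity -- the same metric a FAISS inner-product index uses.
    """
    q = query_emb / np.linalg.norm(query_emb)
    c = catalog_embs / np.linalg.norm(catalog_embs, axis=1, keepdims=True)
    scores = c @ q                        # cosine similarity per item
    order = np.argsort(-scores)[:top_k]   # highest score first
    return order, scores[order]
```

Swapping this for a FAISS index changes only the lookup speed, not the ranking: FAISS returns the same neighbors far faster on large catalogs.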
- Environment Setup: Custom Gym environment for product recommendations
- Model Training: Train DQN on product embeddings
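Stable Baselines3 handles the actual DQN training in the project. As a simplified illustration of the Q-update it performs, here is a tabular Q-learning loop on the ±1 reward scheme described above — the toy targets, learning rate, exploration rate, and iteration count are all assumptions chosen for the demo, and a real DQN replaces the table with a neural network plus replay buffer.

```python
import numpy as np

# Toy setup: 3 products; in state s the correct recommendation is targets[s].
targets = [2, 0, 1]
n_states, n_actions = 3, 3
Q = np.zeros((n_states, n_actions))
alpha, epsilon = 0.5, 0.2                 # learning rate / exploration rate
rng = np.random.default_rng(0)

for _ in range(500):
    s = int(rng.integers(n_states))
    # Epsilon-greedy action selection: the exploration-exploitation tradeoff.
    if rng.random() < epsilon:
        a = int(rng.integers(n_actions))
    else:
        a = int(np.argmax(Q[s]))
    r = 1.0 if a == targets[s] else -1.0
    # One-step Q-update; episodes end immediately, so there is no bootstrap term.
    Q[s, a] += alpha * (r - Q[s, a])

greedy_policy = np.argmax(Q, axis=1)      # learned recommendation per state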
- Performance Optimization:
  - Implemented FAISS for efficient similarity search
  - Optimized CLIP model loading and inference
- Data Processing:
  - Handled varying image sizes and formats
  - Implemented efficient batch processing
- Model Training:
  - Addressed the exploration-exploitation tradeoff in RL
  - Implemented reward shaping for better convergence
- Visual Search: High accuracy in finding visually similar products
- Recommendation Quality: Improved user engagement with personalized suggestions
- Performance: Fast response time for real-time recommendations
- AI Engineers: Zeeshan Ahmad (Team Lead), M Abeer (Team Member)
- Web Developer: Muhammad Hamza Sirang (Team Member)
This project is licensed under the MIT License - see the LICENSE file for details.
- OpenAI for the CLIP model
- Facebook Research for FAISS
- Stable Baselines3 team
- The open-source community for valuable libraries and tools