A comprehensive machine learning system for detecting malicious URLs to protect users from phishing attacks, malware distribution, and other cyber threats.
Status: in progress and under continuous improvement.
Prerequisites:
- Python 3.8 or higher
- pip (Python package installer)
Installation:
- Clone the repository:
  git clone <repository-url>
  cd url_classifier
- Create and activate a virtual environment:
  python3 -m venv venv
  source venv/bin/activate  # On macOS/Linux
  venv\Scripts\activate     # On Windows
- Install dependencies:
  pip install -r requirements.txt
- Start the API server:
  uvicorn api:app --reload
Classify a URL by sending a POST request to the /predict endpoint:
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"url": "https://www.google.com"}'
Response:
{
  "label": 0,
  "probability": 0.12
}
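The same request can be sent from Python. This is a minimal sketch using only the standard library, assuming the server is running locally on port 8000 as started above and returns the `label`/`probability` JSON shown; the helper names `build_predict_request` and `predict_url` are hypothetical and not part of this project.

```python
import json
import urllib.request

# Assumption: the API runs on uvicorn's default host and port.
API_BASE = "http://localhost:8000"


def build_predict_request(url: str, base: str = API_BASE) -> urllib.request.Request:
    """Build a POST request for the /predict endpoint (hypothetical helper)."""
    body = json.dumps({"url": url}).encode("utf-8")
    return urllib.request.Request(
        f"{base}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict_url(url: str, base: str = API_BASE, timeout: float = 5.0) -> dict:
    """Send the request and decode the JSON body, e.g. {"label": 0, "probability": 0.12}."""
    request = build_predict_request(url, base)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `predict_url("https://www.google.com")` should return a dictionary like the response above.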
Query the root endpoint:
curl http://localhost:8000/
To retrain the model, run the training script from the ml directory:
cd ml
python train_classifier.py
Project structure:
url_classifier/
├── api.py # FastAPI server
├── real_time_detector.py # Real-time URL monitoring
├── ml/
│ ├── predict.py # Traditional ML prediction
│ ├── deep_predict.py # Deep learning prediction
│ ├── deep_learning_models.py # LSTM and BERT models
│ ├── feature_extraction.py # URL feature extraction
│ ├── train_classifier.py # Model training
│ ├── model.joblib # Trained ML model
│ ├── vectorizer.joblib # TF-IDF vectorizer
│ └── sample_data/
│ └── data.csv # Training dataset
└── requirements.txt
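`feature_extraction.py` derives features from the URL string, but its exact feature set is not documented here. Purely as an illustration, this is a minimal sketch of common lexical URL features using the standard library; the function name `extract_lexical_features` and the chosen features are hypothetical, not the project's implementation.

```python
import ipaddress
from urllib.parse import urlparse


def _is_ip_address(host: str) -> bool:
    """Return True if the hostname is a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def extract_lexical_features(url: str) -> dict:
    """Compute simple string-based features often used for URL classification (illustrative)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots": url.count("."),
        "num_hyphens": url.count("-"),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": "@" in url,
        "uses_https": parsed.scheme == "https",
        "host_is_ip": _is_ip_address(host),
        # Count non-empty path segments, e.g. "/a/b" -> 2
        "path_depth": len([part for part in parsed.path.split("/") if part]),
    }
```

Features like these are numeric or boolean, so they can be combined with the TF-IDF vectors before training a classifier.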
Contributing:
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Open a Pull Request
This project is licensed under the MIT License.
For support and questions, create an issue in the repository.
Disclaimer: This tool is intended for educational and research purposes. Always use it in conjunction with other security measures.
Features currently in development:
- Multi-Model Support: Traditional ML (Logistic Regression) and Deep Learning (LSTM, BERT) models
- Real-Time Detection: FastAPI endpoint for instant URL classification
- Caching System: Intelligent caching for improved performance
- Batch Processing: Support for processing multiple URLs simultaneously
- Production Ready: Includes logging, error handling, and monitoring capabilities
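The planned caching, batch-processing, and error-handling features could fit together roughly as follows. This is a sketch under stated assumptions: `classify_url` is a hypothetical placeholder rule standing in for the trained model, label 1 is assumed to mean malicious with a 0.5 threshold, results are cached per URL with `functools.lru_cache`, and a batch is classified concurrently with a thread pool.

```python
import logging
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache

logger = logging.getLogger("url_classifier")


@lru_cache(maxsize=10_000)
def classify_url(url: str) -> tuple:
    """Placeholder classifier (hypothetical); replace with a call into the trained model.

    Returns (label, probability). Assumes label 1 = malicious, threshold 0.5.
    Results are cached, so repeated URLs skip recomputation.
    """
    suspicious = "@" in url or url.startswith("http://")
    probability = 0.9 if suspicious else 0.1
    return (1 if probability >= 0.5 else 0, probability)


def classify_batch(urls: list, max_workers: int = 4) -> list:
    """Classify several URLs concurrently, logging failures instead of aborting the batch."""

    def safe_classify(url: str) -> dict:
        try:
            label, probability = classify_url(url)
            return {"url": url, "label": label, "probability": probability}
        except Exception:
            logger.exception("Failed to classify %s", url)
            return {"url": url, "label": None, "probability": None}

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the same order as the input URLs
        return list(pool.map(safe_classify, urls))
```

Because the cache is keyed on the exact URL string, normalizing URLs (for example, lowercasing the host) before lookup would raise the hit rate; that step is left out of this sketch.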