TilakKhatri/url_classifier

URL Classifier - Malicious URL Detection System

A comprehensive machine learning system for detecting malicious URLs to protect users from phishing attacks, malware distribution, and other cyber threats.

Quick Start

  • This project is a work in progress and is under continuous improvement.

Prerequisites

  • Python 3.8 or higher
  • pip (Python package installer)

Installation

  1. Clone the repository

    git clone <repository-url>
    cd url_classifier

  2. Create and activate a virtual environment

    python3 -m venv venv
    source venv/bin/activate  # On macOS/Linux
    # or
    venv\Scripts\activate     # On Windows

  3. Install dependencies

    pip install -r requirements.txt

  4. Start the API server

    uvicorn api:app --reload

Usage

API Endpoints

URL Classification

curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"url": "https://www.google.com"}'

Response:

{
  "label": 0,
  "probability": 0.12
}
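
The same request can be made from Python. The sketch below uses only the standard library and mirrors the curl example above. The endpoint and JSON fields are taken from that example. The helper names (`build_predict_request`, `describe_prediction`, `classify`) are hypothetical, and reading `label` 0 as benign and 1 as malicious is an assumption, not something the API documents here.

```python
import json
import urllib.request

# Endpoint taken from the curl example; adjust if the server runs elsewhere.
PREDICT_URL = "http://localhost:8000/predict"


def build_predict_request(url, endpoint=PREDICT_URL):
    """Build the same POST request that the curl example sends."""
    body = json.dumps({"url": url}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def describe_prediction(payload):
    """Format a response such as {"label": 0, "probability": 0.12}.

    Assumption: label 1 means malicious and label 0 means benign.
    """
    verdict = "malicious" if payload["label"] == 1 else "benign"
    return f"{verdict} (probability={payload['probability']:.2f})"


def classify(url, timeout=5.0):
    """Send the request to a running server and return the parsed JSON."""
    request = build_predict_request(url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server from step 4 running, `describe_prediction(classify("https://www.google.com"))` would print a one-line summary of the response.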

Health Check

curl http://localhost:8000/

Python Usage

Model Training

Traditional ML Model

cd ml
python train_classifier.py

Project Structure

url_classifier/
├── api.py                    # FastAPI server
├── real_time_detector.py     # Real-time URL monitoring
├── ml/
│   ├── predict.py           # Traditional ML prediction
│   ├── deep_predict.py      # Deep learning prediction
│   ├── deep_learning_models.py  # LSTM and BERT models
│   ├── feature_extraction.py    # URL feature extraction
│   ├── train_classifier.py      # Model training
│   ├── model.joblib            # Trained ML model
│   ├── vectorizer.joblib       # TF-IDF vectorizer
│   └── sample_data/
│       └── data.csv            # Training dataset
└── requirements.txt
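
The contents of ml/feature_extraction.py are not shown in this README. As a rough illustration of what URL feature extraction can look like, the sketch below computes a few common lexical features with the standard library. The function name `extract_url_features` and the chosen features are assumptions for illustration and may differ from what the repository actually implements.

```python
import re
from urllib.parse import urlparse

# Matches hosts that look like dotted IPv4 addresses, e.g. 192.168.0.1.
# It only checks the shape, not that each part is in the range 0-255.
_IPV4_HOST = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def extract_url_features(url):
    """Return a dict of simple lexical features for one URL (illustrative)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots": host.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_ip_host": bool(_IPV4_HOST.fullmatch(host)),
        "uses_https": parsed.scheme == "https",
        "has_at_symbol": "@" in url,
    }
```

Features like these are cheap to compute because they need no network access. They could be used alongside, or instead of, the TF-IDF vectorizer listed in the project structure.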

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Commit your changes
  4. Push to the branch
  5. Open a Pull Request

License

This project is licensed under the MIT License.

Support

For support and questions, create an issue in the repository.


Disclaimer: This tool is for educational and research purposes. Always use it in conjunction with other security measures.


Future Features

The following features are in progress:

  • Multi-Model Support: Traditional ML (Logistic Regression) and Deep Learning (LSTM, BERT) models
  • Real-Time Detection: Fast API endpoint for instant URL classification
  • Caching System: Intelligent caching for improved performance
  • Batch Processing: Support for processing multiple URLs simultaneously
  • Production Ready: Includes logging, error handling, and monitoring capabilities
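
Since caching and batch processing are still planned, the sketch below only illustrates one possible shape for them, using `functools.lru_cache` from the standard library. Every name here is hypothetical, and `_score_url` is a placeholder standing in for the real model, not the project's scoring logic.

```python
from functools import lru_cache


def _score_url(url):
    """Placeholder scorer (hypothetical); a real version would call the model."""
    return 0.9 if "@" in url else 0.1


@lru_cache(maxsize=10_000)
def classify_cached(url):
    """Classify one URL, reusing the cached result for repeated URLs."""
    probability = _score_url(url)
    label = 1 if probability >= 0.5 else 0  # 0.5 threshold is an assumption
    return (label, probability)


def classify_batch(urls):
    """Classify several URLs in order; duplicates are served from the cache."""
    return [classify_cached(url) for url in urls]
```

An in-process cache like this is the simplest option. A deployment with several worker processes would likely need a shared cache instead.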
