URL Classifier - Malicious URL Detection System

A comprehensive machine learning system for detecting malicious URLs to protect users from phishing attacks, malware distribution, and other cyber threats.

Quick Start

Note: this project is a work in progress and is continuously being improved.

Prerequisites

Python 3.8 or higher
pip (Python package installer)

Installation

Clone the repository

git clone <repository-url>
cd url_classifier

Create and activate a virtual environment

python3 -m venv venv
source venv/bin/activate  # On macOS/Linux
# or
venv\Scripts\activate     # On Windows

Install dependencies
```
pip install -r requirements.txt
```
Start the API server
```
uvicorn api:app --reload
```

Usage

API Endpoints

URL Classification

curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"url": "https://www.google.com"}'

Response:

{
  "label": 0,
  "probability": 0.12
}
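The same request can be sent from Python. The following is a minimal sketch using only the standard library; it assumes the server is running locally on port 8000 as in the curl example, and the helper names `build_predict_request` and `classify` are hypothetical, not part of this repository.

```python
import json
import urllib.request

# Assumption: the API server is running locally, as in the curl example.
BASE_URL = "http://localhost:8000"


def build_predict_request(url, base_url=BASE_URL):
    """Build the POST request for /predict without sending it."""
    payload = json.dumps({"url": url}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify(url, base_url=BASE_URL, timeout=5.0):
    """Send the request and return the decoded JSON body as a dict."""
    request = build_predict_request(url, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `classify("https://www.google.com")` should return a dict containing the `label` and `probability` keys shown in the response above.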

Health Check

curl http://localhost:8000/

Python Usage

Model Training

Traditional ML Model

cd ml
python train_classifier.py

Project Structure

url_classifier/
├── api.py                        # FastAPI server
├── real_time_detector.py         # Real-time URL monitoring
├── ml/
│   ├── predict.py                # Traditional ML prediction
│   ├── deep_predict.py           # Deep learning prediction
│   ├── deep_learning_models.py   # LSTM and BERT models
│   ├── feature_extraction.py     # URL feature extraction
│   ├── train_classifier.py       # Model training
│   ├── model.joblib              # Trained ML model
│   ├── vectorizer.joblib         # TF-IDF vectorizer
│   └── sample_data/
│       └── data.csv              # Training dataset
└── requirements.txt
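
To give a feel for what URL feature extraction can look like, here is a small illustrative sketch. It does not reproduce the contents of `ml/feature_extraction.py`; the function name `extract_features` and the chosen features are assumptions made for the example.

```python
from urllib.parse import urlparse


def extract_features(url):
    """Return a dict of simple lexical features for one URL (illustrative only)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots": host.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "num_hyphens": url.count("-"),
        "has_at_symbol": "@" in url,
        "uses_https": parsed.scheme == "https",
        # Count the non-empty path segments, e.g. /a/b -> 2.
        "path_depth": len([part for part in parsed.path.split("/") if part]),
    }
```

Features of this kind are cheap to compute and could complement the TF-IDF representation stored in `vectorizer.joblib`.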

Contributing

Fork the repository
Create a feature branch
Commit your changes
Push to the branch
Open a Pull Request

License

This project is licensed under the MIT License.

Support

For support and questions, create an issue in the repository.

Disclaimer: This tool is for educational and research purposes. Always use it in conjunction with other security measures.

Future Features

The following features are in progress:

Multi-Model Support: Traditional ML (Logistic Regression) and Deep Learning (LSTM, BERT) models
Real-Time Detection: Fast API endpoint for instant URL classification
Caching System: Intelligent caching for improved performance
Batch Processing: Support for processing multiple URLs simultaneously
Production Ready: Includes logging, error handling, and monitoring capabilities
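
As a rough idea of how the caching and batch processing items could fit together, here is a minimal sketch. It is not the project's implementation: `score_url` is a hypothetical placeholder for the real model call, and the 0.5 threshold and the rule that `label` is 1 at or above it are assumptions.

```python
from functools import lru_cache


def score_url(url):
    """Placeholder scorer (hypothetical): stands in for the real model call."""
    # A deliberately trivial rule so the sketch runs without a trained model.
    return 0.9 if "@" in url else 0.1


@lru_cache(maxsize=4096)
def cached_score(url):
    """Memoize scores so repeated URLs skip the scorer."""
    return score_url(url)


def classify_batch(urls, threshold=0.5):
    """Score each URL and return dicts shaped like the /predict response, plus the URL."""
    results = []
    for url in urls:
        probability = cached_score(url)
        results.append(
            {
                "url": url,
                # Assumption: label 1 means the score reached the threshold.
                "label": int(probability >= threshold),
                "probability": probability,
            }
        )
    return results
```

`functools.lru_cache` keeps the most recently used results in memory, so a URL that appears several times in a batch is scored only once; `cached_score.cache_info()` reports the hit and miss counts.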
