A powerful hybrid anomaly detection system for Hasura DDN (Data Delivery Network) that combines statistical analysis with AI-powered query pattern detection to identify unusual data patterns, potential security concerns, and query anomalies in real-time.
- Hybrid Detection System
  - Statistical anomaly detection using Isolation Forest
  - AI-powered query pattern analysis using Claude
  - Real-time analysis of query results and patterns
- Comprehensive Analysis
  - Query pattern recognition
  - Statistical outlier detection
  - Historical data comparison
  - Security concern identification
  - Business logic validation
- Persistent Storage
  - Database-agnostic storage using SQLAlchemy
  - Model persistence and versioning
  - Configurable retention policies
  - Efficient data cleanup
- Scalable Architecture
  - Modular design
  - Configurable batch processing
  - Efficient resource utilization
  - Concurrent request handling
- Python 3.8+
- A relational database supported by SQLAlchemy:
  - PostgreSQL (reference implementation)
  - MySQL/MariaDB
  - SQLite
  - Oracle
  - Microsoft SQL Server
  - Or any database with SQLAlchemy dialect support
- Anthropic API key (for Claude integration)
- Hasura DDN instance
These steps connect the anomaly detector plugin to your data sources. See "Adding to an Existing Supergraph" below for details on integrating with an existing supergraph.
- Clone the repository:

```bash
git clone https://github.com/hasura/anomaly-detection-ddn-plugin
cd anomaly-detection-ddn-plugin
```

- Create and activate a virtual environment:

```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

- Install core dependencies:

```bash
pip install -r requirements.txt
```

- Install a database-specific driver:

```bash
# PostgreSQL
pip install psycopg2-binary
# MySQL
pip install mysqlclient
# Oracle
pip install cx_Oracle
# Microsoft SQL Server
pip install pyodbc
```

- Set up environment variables:

```bash
cp .env.sample .env
cp example/.env.sample example/.env
```

Remember to update the environment variables to working values.
- Connect to your data: remove the chinook connector and connect to your own data source.

```bash
ddn connector remove chinook
ddn connector init -i
```

To add the plugin to an existing supergraph instead:

- Follow steps 1-5 above.
- In your existing supergraph compose.yaml, add:

```yaml
include:
  - path: ../compose.yaml
```

Set the path to the compose.yaml file in the directory where you installed the anomaly detector. This adds the anomaly detector service container to the supergraph startup.

- Add the env variables into globals/subgraph.yaml, like this:

```yaml
kind: Subgraph
version: v2
definition:
  name: globals
  generator:
    rootPath: .
  includePaths:
    - metadata
  envMapping:
    ANOMALIES_URL:
      fromEnv: ANOMALIES_URL
    M_AUTH_KEY:
      fromEnv: M_AUTH_KEY
```
- In your supergraph, copy the example/globals/metadata/anomalies.hml file to the same location within your supergraph. This adds the plugin definition to the supergraph.
- In your supergraph .env file, add:

```bash
ANOMALIES_URL="http://local.hasura.dev:8787/anomalies"
M_AUTH_KEY=secret
ANTHROPIC_API_KEY=<your-anthropic-api-key>
```

- Start the supergraph:

```bash
ddn run docker-start
```

This will create the anomaly tables.
- Expose the anomaly tables into the same supergraph, or into another of your choice, so that you can run PromptQL against the results:

```bash
# Create the subgraph
ddn subgraph init data_quality
# Add it to the supergraph
ddn subgraph add --subgraph data_quality/subgraph.yaml --target-supergraph supergraph.yaml
# Create the data connector
ddn connector init --subgraph data_quality/subgraph.yaml -i
# Get its metadata
ddn connector introspect anomalies --subgraph data_quality/subgraph.yaml
# Add the tables to the subgraph
ddn model add anomalies "*" --subgraph data_quality/subgraph.yaml
ddn relationship add anomalies "*" --subgraph data_quality/subgraph.yaml
# Build and run the new supergraph
ddn supergraph build local
ddn run docker-start
```
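With the tables exposed, you can query the results with PromptQL in natural language. An illustrative question (exact table and column names depend on your metadata):

```
Which queries produced the most anomalies in the last 7 days,
and what security concerns were attached to them?
```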
Example .env configuration:

```bash
# Server Configuration
PORT=8787 # Server port number
HOST=0.0.0.0 # Server host address
# Data Directory
ANOMALY_DETECTION_DATA=./tmp # Directory for storing temporary files and data
# Anthropic Configuration
ANTHROPIC_API_KEY=your-api-key-here # Your Anthropic API key
CLAUDE_MODEL=claude-3-7-sonnet-20250219 # Specific Claude model to use
# Anomaly Detection Configuration
MAX_RECORDS_PER_BATCH=50 # Maximum records to process in a single batch
HISTORICAL_RETENTION_DAYS=14 # Days to keep historical data
ANOMALY_RETENTION_DAYS=90 # Days to keep anomaly records
MODEL_RETENTION_DAYS=360 # Days to keep trained models
ANOMALY_THRESHOLD=0.1 # Threshold for anomaly detection
MINIMUM_TRAINING_RECORDS=100 # Minimum records required for model training
MAX_HISTORICAL_RECORDS=100000 # Maximum historical records to store
# Processing Configuration
MAX_TOKENS=100000 # Maximum tokens for LLM requests
EXCLUDED_DATASETS=anomalies_.*,dq_.* # Datasets matching these patterns are not processed
# Database Configuration
DB_HOST=your-db-host # Database host address
DB_CONNECT_ARGS=json-dict-of-args # Optional - can be used to specify a schema
DB_PORT=5432 # Database port
DB_NAME=anomalies # Database name
DB_USER=your-username # Database username
DB_PASSWORD=your-password # Database password
# Logging
LOG_LEVEL=DEBUG # Logging level (DEBUG, INFO, WARNING, ERROR)
```
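If your anomaly tables live in a non-default schema, DB_CONNECT_ARGS accepts a JSON dictionary of driver connect arguments. A hypothetical example for PostgreSQL (the schema name anomaly_schema is illustrative):

```bash
# Hypothetical: route this service's tables into a dedicated schema via search_path.
DB_CONNECT_ARGS={"options": "-csearch_path=anomaly_schema"}
```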
- PostgreSQL: Reference implementation, recommended for production use
  - Robust JSON support
  - Advanced indexing capabilities
  - Excellent performance with large datasets
- MySQL/MariaDB: Good alternative
  - Wide adoption
  - Good performance
  - Some limitations with JSON operations
- SQLite: Suitable for development/testing
  - No separate server required
  - Limited concurrent access
  - Not recommended for production
- Oracle/MSSQL: Enterprise options
  - Good for integration with existing enterprise systems
  - Additional licensing considerations
  - May require specific configuration for optimal performance
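For reference, here is a minimal sketch of how the DB_* settings could be assembled into a SQLAlchemy engine. This is an illustration, not the actual code in db_storage.py; swap the dialect+driver prefix for your database:

```python
import json
import os

from sqlalchemy import create_engine

# Illustrative only: build a SQLAlchemy URL from the DB_* env vars.
# Use mysql+mysqldb, oracle+cx_oracle, or mssql+pyodbc for other databases.
url = (
    f"postgresql+psycopg2://{os.environ['DB_USER']}:{os.environ['DB_PASSWORD']}"
    f"@{os.environ['DB_HOST']}:{os.environ.get('DB_PORT', '5432')}"
    f"/{os.environ['DB_NAME']}"
)

# DB_CONNECT_ARGS, when set, is a JSON dict handed through to the driver.
connect_args = json.loads(os.environ.get("DB_CONNECT_ARGS", "{}"))
engine = create_engine(url, connect_args=connect_args)
```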
- POST /anomalies: Main anomaly detection endpoint
- GET /health: Health check endpoint
- GET /history/<query_id>: Get historical data for a query
- DELETE /history/<query_id>: Clear historical data
- GET /model/<query_id>: Get model information
- POST /analyze/<query_id>: Analyze a single record
- GET /stats/<query_id>: Get statistical summaries
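For example, with the server from the example configuration listening on port 8787, a health check and a stats lookup look like this (the query id abc123 is illustrative):

```bash
curl http://localhost:8787/health
curl http://localhost:8787/stats/abc123
```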
Check out the example directory for a complete working example:
```bash
# After updating .env and ./example/.env
python server.py
cd example
ddn run docker-start
```

Example request:

```bash
curl -X POST http://localhost:8787/anomalies \
  -H "Content-Type: application/json" \
  -H "X-Hasura-User: test-user" \
  -H "X-Hasura-Role: user" \
  -d @example/sample_request.json
```

Example response:

```json
{
"results": {
"users": {
"is_anomaly": true,
"score": -0.876,
"model_details": {
"model_type": "isolation_forest",
"features_used": 5
},
"security_concerns": [
{
"type": "unusual_access_pattern",
"severity": "medium"
}
]
}
},
"metadata": {
"timestamp": "2024-11-10T10:30:00Z",
"query_hash": "abc123",
"total_records": 100,
"total_anomalies": 3
}
}
```

The system consists of several key components:
- Hybrid Detector (hybrid_detector.py)
  - Combines statistical and AI-powered analysis (a minimal sketch follows this list)
  - Manages the analysis workflow
  - Coordinates between components
- Statistical Detector (statistical_detector.py)
  - Implements the Isolation Forest algorithm
  - Handles model training and persistence
  - Processes numerical features
- Query Detector (query_detector.py)
  - AI-powered query analysis
  - Pattern recognition
  - Security validation
- Database Storage (db_storage.py)
  - Manages data persistence
  - Handles model storage
  - Implements cleanup policies
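To make the flow concrete, here is a minimal sketch of how the two signals can be fused. The class shape, method names, and thresholds are illustrative assumptions, not the plugin's actual API:

```python
import numpy as np
from sklearn.ensemble import IsolationForest


class HybridDetectorSketch:
    """Illustrative only: fuse an Isolation Forest score with AI-flagged concerns."""

    def __init__(self, contamination=0.1):
        # 0.1 mirrors ANOMALY_THRESHOLD in the example .env (an assumption).
        self.model = IsolationForest(contamination=contamination, random_state=42)

    def train(self, features: np.ndarray) -> None:
        # The real plugin waits for MINIMUM_TRAINING_RECORDS rows before training.
        self.model.fit(features)

    def analyze(self, features: np.ndarray, ai_concerns) -> dict:
        scores = self.model.decision_function(features)  # lower = more anomalous
        flags = self.model.predict(features)             # -1 anomaly, 1 normal
        return {
            "is_anomaly": bool((flags == -1).any()) or bool(ai_concerns),
            "score": float(scores.min()),
            "security_concerns": ai_concerns,
        }
```

Either signal alone can mark a batch anomalous, which matches the response shape shown above.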
Planned improvement: move from a simple sliding window to a more sophisticated seasonality-aware approach for historical comparisons.
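For example, instead of comparing each new value against the last N records, a seasonality-aware baseline could compare it against the same hour-of-week in prior history. A rough pandas sketch (an assumption about the future design, not current plugin code):

```python
import pandas as pd


def seasonal_baseline(df: pd.DataFrame) -> pd.Series:
    """df has a DatetimeIndex and a 'value' column; returns a per-row seasonal median."""
    # Bucket rows by hour-of-week so a Monday 9am value is judged against prior
    # Monday 9am values rather than against the most recent sliding window.
    hour_of_week = df.index.dayofweek * 24 + df.index.hour
    return df["value"].groupby(hour_of_week).transform("median")
```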
- Database Connection Errors

  Error: Could not connect to database

  - Verify the database connection string format
  - Check that the database driver is installed
  - Verify database credentials in .env
  - Ensure the database server is running
  - Check network connectivity
  - Verify database permissions

- Database Driver Issues

  Error: No module named 'psycopg2' (or similar)

  - Install the appropriate database driver:

    ```bash
    # PostgreSQL
    pip install psycopg2-binary
    # MySQL
    pip install mysqlclient
    # Oracle
    pip install cx_Oracle
    # MSSQL
    pip install pyodbc
    ```

- API Key Issues

  Error: Invalid API key

  - Verify ANTHROPIC_API_KEY in .env
  - Check API key permissions

- Model Training Errors

  Error: Insufficient data for training

  - Ensure sufficient historical data
  - Check data format consistency
  - Verify database indexes
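When debugging training issues, the history endpoint is a quick way to see how much data the service has stored for a query (the query id is illustrative):

```bash
curl http://localhost:8787/history/abc123
```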
- Fork the repository from https://github.com/hasura/anomaly-detection-ddn-plugin
- Create a feature branch: `git checkout -b feature-name`
- Commit changes: `git commit -am 'Add feature'`
- Push to the branch: `git push origin feature-name`
- Submit a Pull Request to the main repository
- Follow PEP 8 style guide
- Add unit tests for new features
- Update documentation
- Include example usage
- Add meaningful commit messages
This project is licensed under the MIT License - see the LICENSE file for details.
- Hasura team for the DDN platform
- Anthropic for the Claude API
- scikit-learn team for the Isolation Forest implementation
For support, please:
- Check the documentation
- Review existing issues on GitHub
- Open a new issue with:
- Detailed description
- Steps to reproduce
- System information
- Relevant logs
To report security vulnerabilities, please follow Hasura's security policy or email security@hasura.io.
Made with ❤️ by Hasura