dbscan_dart 2.0.1
dbscan_dart: ^2.0.1 copied to clipboard
A lightweight, pure-Dart implementation of the DBSCAN clustering algorithm with KD-Tree optimization.
🚀 DBSCAN Dart Implementation #
A high-performance, pure-Dart implementation of the DBSCAN clustering algorithm with advanced spatial optimization. This library works with any data type that implements the SpatialPoint
interface, featuring optimized distance calculations and spatial indexing for maximum performance on large datasets.
✨ Features #
- 🚀 Optimized KD-Tree Construction - Uses Floyd-Rivest quickselect for O(n log n) tree building with minimal memory allocation
- ⚡ Fast Range Search - Iterative search with squared distance comparisons avoids expensive sqrt operations
- 🧩 Enhanced Point Interface - Flexible interface with optimized distance methods for maximum performance
- 🔄 Efficient Seed Expansion - Queue-based expansion with O(1) duplicate detection using Set tracking
- 🛡️ Smart Border Point Handling - Optimized border point detection prevents misclassification as noise
- 🔀 Parallel Processing - Built-in isolate support for CPU-intensive clustering tasks
- 📊 Comprehensive Results - Returns both per-cluster point collections and per-point cluster assignments
Credits and Attribution #
This package is heavily based on the work done in mojixcoder/dbscan, with optimizations and adaptations for Dart.
The Floyd-Rivest quickselect algorithm implementation is adapted from Zverik/dart_rbush, originally by Ilya Zverev (2021) and Vladimir Agafonkin (2018), used under ISC license.
For those interested in understanding the DBSCAN algorithm in depth:
Getting started #
Add the package to your pubspec.yaml
:
dart pub add dbscan_dart
Usage #
import 'package:dbscan_dart/dbscan_dart.dart';
void main() {
// Create some sample points
final points = <SpatialPoint>[
LatLngPoint(pointId: PointId(1), lat: 1, lng: 1),
LatLngPoint(pointId: PointId(2), lat: 2, lng: 2),
LatLngPoint(pointId: PointId(3), lat: 3, lng: 3),
LatLngPoint(pointId: PointId(5), lat: 5, lng: 5),
LatLngPoint(pointId: PointId(1000), lat: 1000, lng: 1000),
LatLngPoint(pointId: PointId(1001), lat: 1001, lng: 1001),
LatLngPoint(pointId: PointId(2000), lat: 2000, lng: 2000),
];
// Set DBSCAN parameters
const eps = 5;
const minPoints = 2;
// Create a DBScan instance
final dbscan = DBScan(eps: eps, minPoints: minPoints);
// Run DBSCAN
final result = dbscan.run(points: points);
// Print clusters
print('### Clusters ###');
result.clusters.forEach((clusterId, cluster) {
print('Cluster $clusterId: $cluster');
});
// Print point labels
print('### Point Labels ###');
result.labels.forEach((pointId, label) {
print('ID(${pointId.value}): ClusterID(${label.value})');
});
// Check if a point is noise
final pointToCheck = PointId(2000);
if (result.labels[pointToCheck] == ClusterLabel.noise) {
print('Point with ID ${pointToCheck.value} is noise');
}
}
Example output:
### Clusters ###
Cluster 1: [LatLngPoint(id: 1, lat: 1, lng: 1), LatLngPoint(id: 2, lat: 2, lng: 2), LatLngPoint(id: 3, lat: 3, lng: 3), LatLngPoint(id: 5, lat: 5, lng: 5)]
Cluster 2: [LatLngPoint(id: 1000, lat: 1000, lng: 1000), LatLngPoint(id: 1001, lat: 1001, lng: 1001)]
################
### Point Labels ###
ID(1): ClusterID(1)
ID(2): ClusterID(1)
ID(3): ClusterID(1)
ID(5): ClusterID(1)
ID(1000): ClusterID(2)
ID(1001): ClusterID(2)
ID(2000): ClusterID(-1) // Noise
####################
Point with ID 2000 is noise
API Overview #
SpatialPoint Interface #
Your data type must implement:
abstract class SpatialPoint {
PointId id();
double distanceTo(SpatialPoint other);
int dimension();
double atDimension(int d);
}
LatLngPoint Implementation #
The library includes a LatLngPoint
implementation for geographical points:
final point = LatLngPoint(pointId: 1, lat: 37.7749, lng: -122.4194);
DBScan Algorithm #
The main clustering algorithm:
final dbscan = DBScan(eps: 5, minPoints: 4);
final result = dbscan.run(points: points);
Parameters:
eps
: Neighborhood radius (maximum distance between two points to be considered neighbors)minPoints
: Minimum number of points required to form a dense region
Result Structure #
The result contains:
clusters
: Map of cluster IDs to lists of points in each clusterlabels
: Map of point IDs to their assigned cluster labels
Additional information #
How DBSCAN Works #
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a clustering algorithm that groups together points that are closely packed together, marking as outliers points that lie alone in low-density regions.
The algorithm works by:
- For each point, finding all points within distance
eps
- If a point has at least
minPoints
neighbors, it's a "core point" - All points within
eps
distance of a core point are in the same cluster - Points that are not core points but are within
eps
of a core point are "border points" - Points that are neither core nor border points are "noise points"
Choosing Parameters #
-
eps: The maximum distance between two points for them to be considered neighbors
- Too small: Many small clusters or noise points
- Too large: Clusters merge together
- Tip: Try visualizing your data and estimating typical cluster sizes
-
minPoints: The minimum number of points required to form a dense region
- Rule of thumb: At least dimension + 1 (e.g., 3 for 2D data)
- Higher values create more significant clusters
- Lower values create more clusters with fewer points each
Performance Considerations #
- The implementation uses a KD-Tree for efficient range queries
- For very large datasets, consider sampling or chunking your data
- The algorithm's time complexity is approximately O(n log n) in the average case