Understand the Math Behind DBSCAN

4 min readOct 19, 2024

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a popular clustering algorithm that groups data points based on their density. It works by defining clusters as dense regions separated by areas of lower density, which allows it to discover clusters of arbitrary shapes and handle noise effectively. Here, we’ll explore the math behind DBSCAN with a step-by-step manual example.

Parameters Recap:

eps: The maximum radius of a neighborhood around a point (i.e., how close points should be to each other to be considered neighbors). Here, eps = 2.0.
min_samples: The minimum number of points needed to form a dense region (including the point itself). Here, min_samples = 3.

Dataset:

We’ll work with the following set of points:

mathematicacodePoint A: (1, 2)
Point B: (2, 2)
Point C: (2, 3)
Point D: (8, 7)
Point E: (8, 8)
Point F: (25, 80)
Point G: (1, 3)
Point H: (0, 2)

Step-by-Step Calculation:

Step 1: Calculate Distances Between Points

To determine the neighbors of each point, we first calculate the distance between every pair of points using the Euclidean distance formula:

Step 2: Compute the Distance Matrix

Using the Euclidean formula, we create a distance matrix that shows the distance between each pair of points:

Step 3: Determine Neighbors Within `eps`

A point is considered a neighbor if its distance from another point is less than or equal to eps = 2.0. Based on this rule, we determine the neighbors of each point:

Step 4: Determine Core Points (With `min_samples=3`):

A core point must have at least 3 neighbors (including itself):
Core points: A, B, C, G (each has at least 3 neighbors).
Border points: H (not enough neighbors but close to a core point).
Non-core points: D, E, F (too few neighbors).

Step 5: Expand Clusters from Core Points

Start with Point A:

Assign Cluster 1.
Expand from A: Neighbors are A, B, G, H.
Add B and G to Cluster 1.
H is a border point, associated with Cluster 1.

Move to Point B:

Already in Cluster 1.
Expand from B: Add C to Cluster 1.

Move to Point C:

Already in Cluster 1, no new neighbors to add.

Move to Point D:

D is not a core point (only 2 neighbors), so it remains noise.

Move to Point E:

E is also not a core point, so it remains noise.

Move to Point F:

F is isolated, so it is considered noise.

Step 6: Final Result

After expanding clusters, the final results are:

Cluster 1: A, B, C, G.
Border Point: H (associated with Cluster 1).
Noise Points: D, E, F.

Summary of Results:

Cluster 1: {A, B, C, G}
Border Point: {H}
Noise: {D, E, F}

Key Takeaways:

Core Points: These form the dense regions of a cluster and have enough neighboring points.
Border Points: Close to core points but do not meet the density requirement themselves.
Noise: Isolated points that do not belong to any cluster.

This example demonstrates how DBSCAN clusters points based on density and distance, making it effective for discovering clusters of arbitrary shape and handling outliers. With a basic understanding of eps and min_samples, you can apply DBSCAN to real-world datasets, adjusting these parameters to fit the data's distribution.

Understand the Math Behind DBSCAN

Parameters Recap:

Dataset:

Step-by-Step Calculation:

Step 1: Calculate Distances Between Points

Step 2: Compute the Distance Matrix

Step 3: Determine Neighbors Within `eps`

Step 4: Determine Core Points (With `min_samples=3`):

Step 5: Expand Clusters from Core Points

Step 6: Final Result

Summary of Results:

Key Takeaways:

Written by Raghda Al taei

No responses yet

Understand the Math Behind DBSCAN

Parameters Recap:

Dataset:

Step-by-Step Calculation:

Step 1: Calculate Distances Between Points

Step 2: Compute the Distance Matrix

Step 3: Determine Neighbors Within eps

Step 4: Determine Core Points (With min_samples=3):

Step 5: Expand Clusters from Core Points

Step 6: Final Result

Summary of Results:

Key Takeaways:

Written by Raghda Al taei

No responses yet

Step 3: Determine Neighbors Within `eps`

Step 4: Determine Core Points (With `min_samples=3`):