Hierarchical Clustering
Explore hierarchical clustering for building a tree of clusters. This is a foundational concept in artificial intelligence and machine learning that professional developers rely on daily. The explanations below are written to be beginner-friendly while covering the depth and nuance that comes from real-world AI/ML experience. Take your time with each section and practice the examples
What is Hierarchical Clustering?
Hierarchical clustering builds a tree of clusters by either merging smaller clusters into larger ones (agglomerative) or splitting larger clusters into smaller ones (divisive).. This is an essential concept that every AI/ML developer must understand thoroughly. In professional development environments, getting this right can mean the difference between code that works reliably and code that breaks in production. The following sections break this down into clear, digestible pieces with practical examples you can try immediately
Types of Hierarchical Clustering
- Agglomerative: Bottom-up approach
- Divisive: Top-down approach
Implementation
from sklearn.cluster import AgglomerativeClustering
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import pdist
import matplotlib.pyplot as plt
# Generate data
X, y_true = make_blobs(n_samples=50, centers=3, cluster_std=0.60, random_state=0)
# Agglomerative clustering
clustering = AgglomerativeClustering(n_clusters=3)
y_pred = clustering.fit_predict(X)
# Create linkage matrix for dendrogram
linkage_matrix = linkage(X, method='ward')
# Plot dendrogram
plt.figure(figsize=(10, 6))
dendrogram(linkage_matrix)
plt.title('Hierarchical Clustering Dendrogram')
plt.xlabel('Sample Index')
plt.ylabel('Distance')
plt.show()
# Visualize clusters
plt.figure(figsize=(10, 6))
plt.scatter(X[:, 0], X[:, 1], c=y_pred, s=50, cmap='viridis')
plt.title('Hierarchical Clustering Results')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.show()