K-Means Clustering
Learn K-Means clustering, a popular algorithm for partitioning data into clusters. The sections below are written to be beginner-friendly: they explain what the algorithm does, list its steps, and then walk through a scikit-learn implementation with a plot of the resulting clusters and a silhouette-score evaluation.
What is K-Means Clustering?
K-Means is one of the most popular unsupervised learning algorithms used for clustering. It partitions data into K clusters, where each data point belongs to the cluster with the nearest mean (the cluster's centroid). The number of clusters, K, is chosen before the algorithm runs.
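One common way to make "nearest mean" precise is the within-cluster sum of squares, which K-Means tries to minimize. The notation below is an assumption introduced for illustration and does not come from the text above: $x_i$ is a data point, $C_k$ is the set of points assigned to cluster $k$, and $\mu_k$ is the mean of that cluster.

```latex
J = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2,
\qquad
\mu_k = \frac{1}{|C_k|} \sum_{x_i \in C_k} x_i
```

In scikit-learn, the value of this sum for a fitted model is exposed as the `inertia_` attribute.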
Algorithm Steps
- 1. Choose K initial centroids at random
- 2. Assign each point to its nearest centroid
- 3. Recalculate each centroid as the mean of the points assigned to it
- 4. Repeat steps 2-3 until convergence, that is, until the assignments (and therefore the centroids) stop changing
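The four steps above can be sketched directly in NumPy. This is a minimal illustration, not scikit-learn's implementation: the function name `kmeans_sketch`, its parameters (`max_iter`, `tol`, `seed`), and the demo data are all hypothetical, and it assumes Euclidean distance and initial centroids drawn from the data points themselves.

```python
import numpy as np


def kmeans_sketch(X, k, max_iter=100, tol=1e-6, seed=0):
    """Plain NumPy K-Means sketch; all names here are illustrative."""
    rng = np.random.default_rng(seed)
    # Step 1: pick k distinct data points as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 2: assign each point to its nearest centroid (Euclidean distance).
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Step 3: move each centroid to the mean of its assigned points.
        # An empty cluster keeps its previous centroid (one simple convention).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 4: stop once the centroids have effectively stopped moving.
        shift = np.linalg.norm(new_centroids - centroids)
        centroids = new_centroids
        if shift < tol:
            break
    return labels, centroids


# Demo data (an assumption): two well-separated groups of 50 points each.
rng = np.random.default_rng(1)
X_demo = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(50, 2)),
    rng.normal(loc=5.0, scale=0.5, size=(50, 2)),
])
demo_labels, demo_centroids = kmeans_sketch(X_demo, k=2)
print(demo_centroids)
```

The result depends on the random starting centroids, which is why library implementations usually run several initializations and keep the best one.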
Implementation
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
# Generate sample data
X, y_true = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=0)
# Create and fit the model
# n_init is set explicitly because its default differs across scikit-learn versions
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)
kmeans.fit(X)
# Get cluster labels and centers
labels = kmeans.labels_
centers = kmeans.cluster_centers_
# Visualize the clusters
plt.figure(figsize=(10, 6))
plt.scatter(X[:, 0], X[:, 1], c=labels, s=50, cmap='viridis')
plt.scatter(centers[:, 0], centers[:, 1], c='red', s=200, alpha=0.8, marker='x')
plt.title('K-Means Clustering Results')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.show()
# Evaluate clustering
from sklearn.metrics import silhouette_score
silhouette_avg = silhouette_score(X, labels)
print(f"Silhouette Score: {silhouette_avg:.4f}")
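The silhouette score can also help compare different values of K when the right number of clusters is not known in advance. The sketch below regenerates the same synthetic data and fits one model per candidate K. The candidate range 2 to 6 and the variable names are assumptions chosen for illustration, and the highest-scoring K is a heuristic rather than a guaranteed answer.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Same synthetic data as in the implementation above.
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=0)

# Candidate values of K; this range is an arbitrary choice for illustration.
scores = {}
for k in range(2, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=42)
    candidate_labels = model.fit_predict(X)
    scores[k] = silhouette_score(X, candidate_labels)
    print(f"K={k}: silhouette={scores[k]:.4f}, inertia={model.inertia_:.1f}")

# Report the K with the highest silhouette score.
best_k = max(scores, key=scores.get)
print(f"Highest silhouette score at K={best_k}")
```

Silhouette values range from -1 to 1, with higher values indicating tighter, better-separated clusters.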