K-Means Clustering
Learn K-Means clustering, a popular algorithm for partitioning data into clusters.
45 min • By Priygop Team • Last updated: Feb 2026
What is K-Means Clustering?
K-Means is one of the most widely used unsupervised learning algorithms for clustering. It partitions a dataset into K clusters, where K is a number you choose in advance, and assigns each data point to the cluster whose mean (the centroid) is nearest.
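The "nearest mean" rule can also be written as an optimization problem. As a sketch, with C_1, ..., C_K denoting the clusters and \mu_k the mean of cluster C_k (notation introduced here for illustration, not taken from the text above), K-Means searches for the partition that minimizes the within-cluster sum of squared distances:

```latex
\[
J = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2,
\qquad
\mu_k = \frac{1}{|C_k|} \sum_{x_i \in C_k} x_i
\]
```

scikit-learn exposes this quantity on a fitted model as the `inertia_` attribute.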
Algorithm Steps
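1. Choose K initial centroids at random (for example, K randomly selected data points)
2. Assign each point to its nearest centroid
3. Recalculate each centroid as the mean of the points assigned to it
4. Repeat steps 2-3 until convergence, that is, until the assignments and centroids stop changing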
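The four steps above map almost line for line onto NumPy. The following is a minimal sketch, not the scikit-learn implementation: the function name `kmeans_sketch`, its parameters (`max_iter`, `tol`, `seed`), and the six-point dataset are illustrative assumptions, and it uses plain random initialization with Euclidean distance.

```python
import numpy as np


def kmeans_sketch(X, k, max_iter=100, tol=1e-6, seed=0):
    """Plain NumPy K-Means; all names here are illustrative."""
    rng = np.random.default_rng(seed)
    # Step 1: pick k distinct data points as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)

    for _ in range(max_iter):
        # Step 2: assign each point to its nearest centroid (Euclidean distance)
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)

        # Step 3: recompute each centroid as the mean of its assigned points;
        # an empty cluster keeps its previous centroid
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                new_centroids[j] = members.mean(axis=0)

        # Step 4: stop once the centroids have effectively stopped moving
        shift = np.linalg.norm(new_centroids - centroids)
        centroids = new_centroids
        if shift < tol:
            break

    return labels, centroids


# Two well-separated groups of 2-D points (made-up data for illustration)
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
labels, centroids = kmeans_sketch(X, k=2)
print(labels)
print(centroids)
```

Because the result depends on the random starting centroids, a single run can settle on a poor partition for harder data. Practical implementations usually try several initializations and keep the run with the lowest within-cluster sum of squares.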
Implementation
Example
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Generate sample data: 300 points drawn around 4 centers
X, y_true = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=0)

# Create and fit the model
# n_init is set explicitly because its default differs between scikit-learn versions
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)
kmeans.fit(X)

# Get cluster labels and centers
labels = kmeans.labels_
centers = kmeans.cluster_centers_

# Visualize the clusters, with centroids marked as red crosses
plt.figure(figsize=(10, 6))
plt.scatter(X[:, 0], X[:, 1], c=labels, s=50, cmap='viridis')
plt.scatter(centers[:, 0], centers[:, 1], c='red', s=200, alpha=0.8, marker='x')
plt.title('K-Means Clustering Results')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.show()

# Evaluate clustering: the silhouette score ranges from -1 to 1,
# and higher values indicate better-separated clusters
silhouette_avg = silhouette_score(X, labels)
print(f"Silhouette Score: {silhouette_avg:.4f}")
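The example above fixes K at 4 because the data was generated with four centers. When K is not known in advance, one common heuristic is to fit the model for several values of K and compare their silhouette scores. The sketch below assumes the same synthetic data; the range of K values tried (2 to 7) and the variable names `scores` and `best_k` are arbitrary choices made for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Same synthetic data as the example above
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=0)

scores = {}
for k in range(2, 8):
    model = KMeans(n_clusters=k, n_init=10, random_state=42)
    labels_k = model.fit_predict(X)
    # Silhouette needs at least 2 clusters, which is why the loop starts at 2
    scores[k] = silhouette_score(X, labels_k)
    print(f"K={k}: silhouette={scores[k]:.4f}, inertia={model.inertia_:.1f}")

# Pick the K with the highest average silhouette score
best_k = max(scores, key=scores.get)
print(f"Highest silhouette at K={best_k}")
```

The silhouette score is only a guide. It tends to favor compact, well-separated clusters, so it is worth checking the chosen K against a plot of the data or domain knowledge.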