K-Means Clustering — Grouping Without Labels
K-Means partitions data into K clusters by iteratively assigning each point to its nearest centroid and recomputing centroids until convergence. The algorithm minimizes within-cluster sum of squares (inertia). K must be chosen in advance — use the Elbow method or Silhouette score. K-Means assumes spherical clusters of similar size; it fails on elongated or non-convex shapes.
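The assign-and-update loop described above can be sketched in a few lines of NumPy. This is a minimal illustration, not scikit-learn's optimized implementation; the `kmeans` function name, the fixed iteration count, and the random initialization from data points are all choices made here for brevity.

```python
import numpy as np

def kmeans(X, k, n_iter=20, seed=0):
    # Minimal K-Means sketch: pick k random data points as initial
    # centroids, then alternate assignment and centroid updates.
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid as the mean of its assigned points.
        for j in range(k):
            if np.any(labels == j):  # guard against an empty cluster
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids

# Two well-separated blobs: the loop should recover one centroid per blob.
rng_data = np.random.default_rng(1)
X = np.vstack([rng_data.normal(size=(50, 2)),
               rng_data.normal(size=(50, 2)) + 8])
labels, centroids = kmeans(X, k=2)
```

On well-separated, roughly spherical blobs like these, a handful of iterations suffices; the failure modes mentioned above (elongated or non-convex shapes) come from the Euclidean nearest-centroid assignment, which this sketch makes explicit.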
K-Means — Elbow Method, Silhouette, and Limitations
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import silhouette_score, silhouette_samples
from sklearn.datasets import make_blobs, make_moons
np.random.seed(42)
# GENERATE BLOBS
X_blobs, y_true = make_blobs(n_samples=500, centers=4, cluster_std=1.0, random_state=42)
# SCALE FEATURES BEFORE CLUSTERING
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_blobs)
# FIND OPTIMAL K: ELBOW METHOD
inertias = []
silhouette_scores = []
k_range = range(2, 11)
for k in k_range:
    km = KMeans(n_clusters=k, n_init=10, random_state=42)
    labels = km.fit_predict(X_scaled)
    inertias.append(km.inertia_)
    silhouette_scores.append(silhouette_score(X_scaled, labels))
fig, axes = plt.subplots(1, 2, figsize=(13, 5))
axes[0].plot(k_range, inertias, "bo-", linewidth=2, markersize=6)
axes[0].set_xlabel("Number of Clusters (K)")
axes[0].set_ylabel("Inertia (within-cluster SS)")
axes[0].set_title("Elbow Method -- look for the 'elbow'")
axes[0].axvline(4, color="red", linestyle="--", alpha=0.7, label="True K=4")
axes[0].legend()
axes[1].plot(k_range, silhouette_scores, "ro-", linewidth=2, markersize=6)
axes[1].set_xlabel("Number of Clusters (K)")
axes[1].set_ylabel("Silhouette Score (higher = better)")
axes[1].set_title("Silhouette Score -- peak = optimal K")
axes[1].axvline(k_range[np.argmax(silhouette_scores)], color="red", linestyle="--",
                label=f"Best K={k_range[np.argmax(silhouette_scores)]}")
axes[1].legend()
plt.tight_layout()
plt.savefig("kmeans_elbow.png", dpi=100, bbox_inches="tight")
plt.show()
# FIT OPTIMAL K-MEANS
km_final = KMeans(n_clusters=4, n_init=10, random_state=42)
labels = km_final.fit_predict(X_scaled)
print(f"K-Means (K=4): Silhouette = {silhouette_score(X_scaled, labels):.4f}")
print(f"Cluster sizes: {pd.Series(labels).value_counts().sort_index().to_dict()}")
# CLUSTER STATISTICS
cluster_df = pd.DataFrame(X_blobs, columns=["feature_1", "feature_2"])
cluster_df["cluster"] = labels
print("\nCluster centers (original scale):")
print(cluster_df.groupby("cluster")[["feature_1", "feature_2"]].mean().round(2))
# WHERE K-MEANS FAILS -- non-spherical shapes
X_moons, _ = make_moons(n_samples=300, noise=0.1, random_state=42)
X_moons_sc = StandardScaler().fit_transform(X_moons)
labels_moons = KMeans(n_clusters=2, n_init=10, random_state=42).fit_predict(X_moons_sc)
fig, ax = plt.subplots(figsize=(7, 5))
ax.scatter(X_moons[:, 0], X_moons[:, 1], c=labels_moons, cmap="RdBu", s=30, alpha=0.8)
ax.set_title(f"K-Means on Moons -- fails!\nSilhouette={silhouette_score(X_moons_sc, labels_moons):.3f}")
plt.tight_layout()
plt.savefig("kmeans_fails.png", dpi=100, bbox_inches="tight")
plt.show()
Tip
Practice K-Means clustering in small, isolated examples before integrating it into larger projects. Breaking concepts into focused experiments builds genuine understanding faster than reading alone.
Practice Task
(1) Write a working K-Means clustering example from scratch without looking at notes. (2) Modify it to handle an edge case (empty input, NaN values, or an error state). (3) Share your solution in the Priygop community for feedback.
Common Mistake
A common mistake with K-Means clustering is skipping edge-case testing: empty inputs, NaN values, and unexpected data types. Always validate boundary conditions to write robust, production-ready ML code.
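One way to sketch that kind of boundary-condition check is a small validation wrapper around the fit. The `safe_kmeans` name and its error messages are illustrative choices, not part of scikit-learn's API.

```python
import numpy as np
from sklearn.cluster import KMeans

def safe_kmeans(X, n_clusters, random_state=42):
    # Illustrative guard rails before clustering: validate shape,
    # emptiness, NaN values, and that K does not exceed the sample count.
    X = np.asarray(X, dtype=float)
    if X.ndim != 2 or X.shape[0] == 0:
        raise ValueError("X must be a non-empty 2-D array")
    if np.isnan(X).any():
        raise ValueError("X contains NaN values; impute or drop them first")
    if n_clusters > X.shape[0]:
        raise ValueError("n_clusters cannot exceed the number of samples")
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    return km.fit_predict(X)

# Valid input clusters normally; invalid inputs fail fast with a clear message.
labels = safe_kmeans(np.vstack([np.zeros((5, 2)), np.ones((5, 2)) + 5]),
                     n_clusters=2)
```

Failing fast with a descriptive `ValueError` is usually preferable to letting a NaN propagate silently into the distance computations, where it surfaces later as a cryptic error or a meaningless clustering.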