Mini Project: Your First ML Model — Iris Classifier
Build your first complete ML model from scratch: load data, explore it, preprocess, train three models, compare them, and make predictions on new data. This mini-project establishes the workflow you'll follow for every ML project in this course.
Complete First ML Project
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
# STEP 1: LOAD & EXPLORE DATA
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["species"] = pd.Categorical.from_codes(iris.target, iris.target_names)
print("Dataset shape:", df.shape)
print("\nFirst 5 rows:")
print(df.head())
print("\nClass distribution:")
print(df["species"].value_counts())
print("\nBasic statistics:")
print(df.describe().round(2))
# Check for missing values
print(f"\nMissing values: {df.isnull().sum().sum()}")
# STEP 2: PREPARE DATA
X = iris.data
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
scaler = StandardScaler()
X_train_sc = scaler.fit_transform(X_train)
X_test_sc = scaler.transform(X_test)
# STEP 3: TRAIN AND COMPARE MODELS
models = {
    "Logistic Regression": LogisticRegression(max_iter=200, random_state=42),
    "Decision Tree": DecisionTreeClassifier(max_depth=3, random_state=42),
    "K-Nearest Neighbors": KNeighborsClassifier(n_neighbors=5),
}
results = []
for name, model in models.items():
    model.fit(X_train_sc, y_train)
    y_pred = model.predict(X_test_sc)
    cv_scores = cross_val_score(model, X_train_sc, y_train, cv=5)
    results.append({
        "Model": name,
        "Test Accuracy": accuracy_score(y_test, y_pred),
        "CV Mean": cv_scores.mean(),
        "CV Std": cv_scores.std(),
    })
results_df = pd.DataFrame(results).sort_values("Test Accuracy", ascending=False)
print("\nModel Comparison:")
print(results_df.to_string(index=False, float_format="{:.3f}".format))
# STEP 4: DEEP DIVE ON BEST MODEL
best_model = LogisticRegression(max_iter=200, random_state=42)
best_model.fit(X_train_sc, y_train)
print("\nDetailed Classification Report:")
print(classification_report(y_test, best_model.predict(X_test_sc), target_names=iris.target_names))
# STEP 5: PREDICT NEW SAMPLES
new_flowers = np.array([
    [5.1, 3.5, 1.4, 0.2],  # likely setosa
    [6.7, 3.0, 5.2, 2.3],  # likely virginica
    [5.9, 3.0, 4.2, 1.5],  # likely versicolor
])
new_scaled = scaler.transform(new_flowers)
predictions = best_model.predict(new_scaled)
probabilities = best_model.predict_proba(new_scaled)
print("\nPredictions on new flowers:")
for i, (pred, probs) in enumerate(zip(predictions, probabilities)):
    print(f"  Flower {i+1}: {iris.target_names[pred]:12s} "
          f"(confidence: {max(probs):.1%})")
Tip
Practice this mini-project's workflow in small, isolated examples before integrating it into larger projects. Breaking concepts into small experiments builds genuine understanding faster than reading alone.
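As one such isolated experiment, note that the project above fits the scaler on the full training set before cross-validating, which leaks fold statistics. A minimal sketch of a leakage-free alternative wraps the scaler and model in a scikit-learn Pipeline, so the scaler is refit inside each fold (the `cv=5` and `max_iter=200` settings match the project; exact scores will vary slightly):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# The pipeline refits StandardScaler on each fold's training split,
# so no scaling statistics leak from the validation fold.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=200))
scores = cross_val_score(pipe, X, y, cv=5)
print(f"CV accuracy: {scores.mean():.3f} (+/- {scores.std():.3f})")
```

On a tiny, well-behaved dataset like iris the difference is negligible, but the Pipeline habit pays off on real data.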
Practice Task
(1) Rebuild this iris-classifier project from scratch without looking at your notes. (2) Modify it to handle an edge case (empty input, missing value, or error state). (3) Share your solution in the Priygop community for feedback.
Common Mistake
A common mistake in a project like this is skipping edge-case testing: empty inputs, missing values, and unexpected data types. Always validate boundary conditions to write robust, production-ready ML code.
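One way to guard the prediction step is to validate samples before passing them to the scaler and model. This is a sketch with a hypothetical `validate_input` helper (not part of the project above); the feature count of 4 matches the iris dataset:

```python
import numpy as np

N_FEATURES = 4  # sepal length/width, petal length/width

def validate_input(samples):
    """Hypothetical guard: check shape and values before predicting."""
    arr = np.asarray(samples, dtype=float)
    if arr.size == 0:
        raise ValueError("empty input: no samples to predict")
    if arr.ndim != 2 or arr.shape[1] != N_FEATURES:
        raise ValueError(f"expected shape (n, {N_FEATURES}), got {arr.shape}")
    if np.isnan(arr).any():
        raise ValueError("input contains missing (NaN) values")
    return arr

# Valid input passes through unchanged
ok = validate_input([[5.1, 3.5, 1.4, 0.2]])
print(ok.shape)  # (1, 4)

# Empty input fails fast with a clear message instead of a cryptic
# error deep inside scaler.transform or model.predict
try:
    validate_input([])
except ValueError as e:
    print("rejected:", e)
```

Calling `validate_input(new_flowers)` before `scaler.transform` would make the prediction step in Step 5 fail loudly on malformed data.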