Module 3: Supervised Learning

Linear Regression

Learn the fundamentals of Linear Regression for predicting continuous values.

AI/ML Engineer

What is Linear Regression?

Linear Regression is a fundamental supervised learning algorithm used for predicting continuous values. It assumes a linear relationship between the input features and the target variable.

Key Concepts

•Simple Linear Regression: One input feature
•Multiple Linear Regression: Multiple input features
•Cost Function: Mean Squared Error (MSE)
•Optimization: A direct least-squares solver (what scikit-learn's LinearRegression uses) or Gradient Descent
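Here is a minimal NumPy sketch of batch gradient descent minimizing MSE for simple linear regression. It makes the cost function and the optimizer concrete. The function name `fit_gradient_descent`, the learning rate, the iteration count and the synthetic data are illustrative assumptions. scikit-learn's `LinearRegression` does not run this loop, so the result is compared with the closed-form fit from `np.polyfit`.

```python
import numpy as np


def fit_gradient_descent(x, y, lr=0.01, n_iters=5000):
    """Fit y ≈ b0 + b1 * x by batch gradient descent on the MSE cost.

    The name, lr and n_iters are illustrative (hypothetical) choices.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(n_iters):
        error = (b0 + b1 * x) - y  # residuals of the current fit
        # Partial derivatives of MSE = mean((b0 + b1*x - y)**2)
        grad_b0 = 2.0 * error.mean()
        grad_b1 = 2.0 * (error * x).mean()
        # Step against the gradient
        b0 -= lr * grad_b0
        b1 -= lr * grad_b1
    return b0, b1


# Synthetic data (assumption): true intercept 1, true slope 2
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 2 * x + 1 + rng.normal(0, 0.5, size=200)

b0, b1 = fit_gradient_descent(x, y)
slope_ls, intercept_ls = np.polyfit(x, y, 1)  # closed-form least squares for comparison
print(f"gradient descent: intercept={b0:.3f}, slope={b1:.3f}")
print(f"least squares:    intercept={intercept_ls:.3f}, slope={slope_ls:.3f}")
```

Both approaches minimize the same MSE, so with a small enough learning rate and enough iterations they should agree. Too large a learning rate makes the updates diverge instead.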

Mathematical Foundation

•Linear Regression Equation: y = β₀ + β₁x₁ + β₂x₂ + ... + βₙxₙ + ε
•Where:
•• y = target (observed) value; the prediction ŷ is the same expression without ε
•• β₀ = intercept (bias)
•• βᵢ = coefficients for features
•• xᵢ = input features
•• ε = error term
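The equation above can be fitted directly by ordinary least squares. The sketch below assumes synthetic data with three features and made-up true coefficients. It prepends a column of ones so that β₀ is estimated alongside the slopes, solves with `np.linalg.lstsq`, and then computes the predictions ŷ, which leave out ε.

```python
import numpy as np

# Synthetic data (assumption): three features and chosen true coefficients
rng = np.random.default_rng(1)
n_samples = 500
X = rng.normal(size=(n_samples, 3))              # features x1, x2, x3
true_beta = np.array([4.0, 1.5, -2.0, 0.5])      # [β0, β1, β2, β3]
eps = rng.normal(0, 0.1, size=n_samples)         # error term ε
y = true_beta[0] + X @ true_beta[1:] + eps

# Column of ones so that β0 (the intercept) is estimated with the slopes
X_design = np.column_stack([np.ones(n_samples), X])

# Ordinary least squares: minimizes the sum of squared residuals
beta_hat, *_ = np.linalg.lstsq(X_design, y, rcond=None)

# Predictions ŷ = β0 + β1·x1 + β2·x2 + β3·x3 (no ε term)
y_hat = X_design @ beta_hat
print("estimated coefficients:", np.round(beta_hat, 2))
```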

Implementation with Scikit-learn

Code Example

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
import matplotlib.pyplot as plt

# Generate sample data: y = 2x + 1 plus Gaussian noise (std 0.5)
np.random.seed(42)
X = np.random.rand(100, 1) * 10  # 100 samples, 1 feature, values in [0, 10)
y = 2 * X + 1 + np.random.randn(100, 1) * 0.5

# Split the data: 80% for training, 20% for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Create and train the model (ordinary least squares)
model = LinearRegression()
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"Mean Squared Error: {mse:.4f}")
print(f"R² Score: {r2:.4f}")
# y is 2-D here, so intercept_ has shape (1,) and coef_ has shape (1, 1)
print(f"Intercept: {model.intercept_[0]:.4f}")
print(f"Coefficient: {model.coef_[0][0]:.4f}")

# Visualize results; sort by X so the fitted line is drawn left to right
order = X_test[:, 0].argsort()
plt.scatter(X_test, y_test, color='blue', label='Actual')
plt.plot(X_test[order], y_pred[order], color='red', label='Predicted')
plt.xlabel('X')
plt.ylabel('y')
plt.title('Linear Regression Results')
plt.legend()
plt.show()

Model Evaluation

Code Example

import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Common evaluation metrics for regression
# (reuses X_test, y_test and y_pred from the example above)
mae = mean_absolute_error(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2 = r2_score(y_test, y_pred)

# Adjusted R-squared: penalizes R² for the number of features p
n = len(y_test)
p = X_test.shape[1]  # number of features
adjusted_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(f"MAE: {mae:.4f}")
print(f"RMSE: {rmse:.4f}")
print(f"R²: {r2:.4f}")
print(f"Adjusted R²: {adjusted_r2:.4f}")
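Each of these metrics is a short formula over the residuals y − ŷ. This sketch recomputes MAE, RMSE and R² by hand and checks them against scikit-learn. It uses four hand-picked values, which are an assumption and not data from the example above.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical true and predicted values
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_hat = np.array([2.5, 0.0, 2.0, 8.0])

residuals = y_true - y_hat
mae = np.mean(np.abs(residuals))                # average absolute error
rmse = np.sqrt(np.mean(residuals ** 2))         # square root of MSE, same units as y
ss_res = np.sum(residuals ** 2)                 # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
r2 = 1 - ss_res / ss_tot                        # fraction of variance explained

print(f"MAE:  {mae:.4f} (sklearn {mean_absolute_error(y_true, y_hat):.4f})")
print(f"RMSE: {rmse:.4f} (sklearn {np.sqrt(mean_squared_error(y_true, y_hat)):.4f})")
print(f"R²:   {r2:.4f} (sklearn {r2_score(y_true, y_hat):.4f})")
```

For these values, SS_res = 1.5 and SS_tot = 29.1875, so R² = 1 − 1.5/29.1875 ≈ 0.949.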

🎯 Practice Exercise

Test your understanding of this topic:

•I understand the basic concepts covered in this topic
•I can apply the concepts in practical scenarios
•I'm ready to move to the next topic

Logistic Regression

Master Logistic Regression for binary classification tasks.

Content by: Nirav Khanpara

AI/ML Engineer

What is Logistic Regression?

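Logistic Regression is a supervised learning algorithm used mainly for binary classification problems. Despite its name, it is a classification algorithm, not a regression algorithm. It models the probability that an input belongs to the positive class, and the "regression" in the name refers to fitting a linear model of the log-odds.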

Key Concepts

•Binary Classification: Predicts two classes (0 or 1)
•Sigmoid Function: Maps any real number to (0,1)
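•Decision Boundary: A threshold on the predicted probability (0.5 by default) that separates the two classes; for logistic regression this boundary is linear in the features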
•Cost Function: Log Loss (Cross-entropy)
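Log loss averages −[y·log(p) + (1 − y)·log(1 − p)] over the samples, where p is the predicted probability of class 1. The sketch below uses five made-up labels and probabilities. It computes the loss by hand, checks it against `sklearn.metrics.log_loss`, and applies a 0.5 threshold to turn probabilities into class labels.

```python
import numpy as np
from sklearn.metrics import log_loss

# Hypothetical labels and predicted probabilities of class 1
y_true = np.array([1, 0, 1, 1, 0])
p = np.array([0.9, 0.2, 0.6, 0.35, 0.1])

# Cross-entropy: confident wrong predictions are penalized heavily
manual_loss = -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))
sklearn_loss = log_loss(y_true, p)  # 1-D input = probability of the positive class

# Decision boundary: predict class 1 when p >= 0.5
labels = (p >= 0.5).astype(int)

print(f"manual log loss:  {manual_loss:.4f}")
print(f"sklearn log loss: {sklearn_loss:.4f}")
print("labels:", labels)
```

The fourth sample (true class 1, p = 0.35) contributes the largest term and is also the one misclassified at the 0.5 threshold.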

Sigmoid Function

•Mathematical Definition: σ(z) = 1 / (1 + e^(-z))
•Where z = β₀ + β₁x₁ + β₂x₂ + ... + βₙxₙ
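This small NumPy sketch of σ(z) uses synthetic two-feature data, which is an assumption. It suggests that a fitted `LogisticRegression` computes its class-1 probability by applying σ to z = β₀ + β₁x₁ + β₂x₂, built from the learned `intercept_` and `coef_`.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def sigmoid(z):
    """Logistic function: maps any real z to the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


# Synthetic data (assumption): class 1 when x1 + x2 > 0
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

model = LogisticRegression().fit(X, y)

# z = β0 + β1·x1 + β2·x2 using the fitted intercept_ and coef_
z = model.intercept_[0] + X @ model.coef_[0]
p_manual = sigmoid(z)
p_sklearn = model.predict_proba(X)[:, 1]  # probability of class 1

print("sigmoid(0) =", sigmoid(0.0))
print("manual == predict_proba:", np.allclose(p_manual, p_sklearn))
```

Because σ(0) = 0.5, the 0.5 probability threshold corresponds to z = 0, which is why the decision boundary is a straight line in feature space.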

Implementation Example

Code Example

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report

# Generate sample data: class 1 when x1 + x2 > 0 (a linear decision boundary)
np.random.seed(42)
X = np.random.randn(100, 2)
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Split the data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Create and train the model
model = LogisticRegression()
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.4f}")
print("\nClassification Report:")
print(classification_report(y_test, y_pred))

Classification Metrics

Code Example

from sklearn.metrics import precision_score, recall_score, f1_score, roc_auc_score

# Reuses model, X_test, y_test and y_pred from the example above
# Precision, Recall, F1-Score (for the positive class, label 1)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

# ROC-AUC Score: needs the predicted probability of class 1, not hard labels
y_pred_proba = model.predict_proba(X_test)[:, 1]
roc_auc = roc_auc_score(y_test, y_pred_proba)

print(f"Precision: {precision:.4f}")
print(f"Recall: {recall:.4f}")
print(f"F1-Score: {f1:.4f}")
print(f"ROC-AUC: {roc_auc:.4f}")
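Precision, recall and F1 come straight from the confusion-matrix counts:

- precision = TP/(TP + FP)
- recall = TP/(TP + FN)
- F1 is their harmonic mean.

The eight labels below are made up for illustration.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score

# Hypothetical ground truth and predictions
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_hat = np.array([1, 0, 0, 1, 1, 1, 1, 0])

# For labels [0, 1], confusion_matrix returns [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()

precision = tp / (tp + fp)  # of the predicted positives, how many are correct
recall = tp / (tp + fn)     # of the actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

With TP = 3, FP = 2 and FN = 1, precision is 0.6, recall is 0.75 and F1 ≈ 0.667.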

🎯 Practice Exercise

Test your understanding of this topic:

•I understand the basic concepts covered in this topic
•I can apply the concepts in practical scenarios
•I'm ready to move to the next topic

Decision Trees

Understand Decision Trees for classification and regression tasks.

What are Decision Trees?

Decision Trees are a type of supervised learning algorithm that can be used for both classification and regression tasks. They work by recursively splitting the data based on feature values, choosing each split so that the resulting subsets are as homogeneous (pure) as possible.

Key Concepts

•Root Node: Starting point of the tree
•Internal Nodes: Decision points based on features
•Leaf Nodes: Final predictions
•Splitting Criteria: Gini impurity or entropy (information gain) for classification; squared error (variance reduction) for regression
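Both classification criteria can be computed by hand. For a node with class proportions pₖ:

- Gini impurity is 1 − Σpₖ².
- Entropy is −Σpₖ·log₂(pₖ).
- Information gain is the parent's entropy minus the size-weighted entropy of the children.

The helper names and the toy split below are hypothetical.

```python
import numpy as np


def gini(labels):
    """Gini impurity: 1 - sum(p_k^2) over the class proportions p_k."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)


def entropy(labels):
    """Entropy in bits: -sum(p_k * log2(p_k))."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))


def weighted_impurity(left, right, impurity):
    """Impurity of a split: child impurities weighted by child size."""
    n = len(left) + len(right)
    return len(left) / n * impurity(left) + len(right) / n * impurity(right)


# Hypothetical node with 4 samples of each class, split into two children
parent = np.array([0, 0, 0, 0, 1, 1, 1, 1])
left = np.array([0, 0, 0, 1])
right = np.array([0, 1, 1, 1])

info_gain = entropy(parent) - weighted_impurity(left, right, entropy)
gini_decrease = gini(parent) - weighted_impurity(left, right, gini)

print(f"entropy(parent)={entropy(parent):.3f}, information gain={info_gain:.3f}")
print(f"gini(parent)={gini(parent):.3f}, gini decrease={gini_decrease:.3f}")
```

For this split, information gain is about 0.189 bits and Gini impurity drops from 0.5 to 0.375. A tree evaluates many candidate splits this way and keeps the one with the largest decrease.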

Implementation

Code Example

import numpy as np
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt

# Generate sample data: class 1 when x1 + x2 > 0
np.random.seed(42)
X = np.random.randn(100, 2)
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Split the data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Create and train the model (max_depth=3 caps the tree depth to limit overfitting)
model = DecisionTreeClassifier(max_depth=3, random_state=42)
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.4f}")

# Visualize the tree
plt.figure(figsize=(10, 8))
plot_tree(model, filled=True, rounded=True)
plt.show()
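The same tree API handles regression through `DecisionTreeRegressor`, where each leaf predicts the mean target value of the training samples that reach it. The sketch below assumes a noisy sine curve as toy data. `export_text` prints the learned split rules.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

# Toy data (assumption): a noisy sine curve
rng = np.random.default_rng(42)
X = rng.uniform(0, 5, size=(80, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, size=80)

reg = DecisionTreeRegressor(max_depth=3, random_state=42)
reg.fit(X, y)
preds = reg.predict(X)

# A depth-3 tree has at most 2**3 = 8 leaves, hence at most 8 distinct outputs
print("distinct predictions:", len(np.unique(preds)))
print(export_text(reg, feature_names=["x"]))
```

The result is a step function: within each leaf's interval of x the prediction is constant.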

🎯 Practice Exercise

Test your understanding of this topic:

•I understand the basic concepts covered in this topic
•I can apply the concepts in practical scenarios
•I'm ready to move to the next topic

Random Forests

Explore Random Forests, an ensemble method for improved predictions.

What are Random Forests?

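Random Forests are an ensemble learning method that builds many decision trees on random variations of the training data and combines their outputs. For classification, the forest predicts the class chosen by the most trees (the mode of their predictions). For regression, it averages the trees' predictions.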

Key Concepts

•Ensemble Method: Combines multiple models
•Bootstrap Sampling: Random sampling with replacement
•Feature Randomness: Each split considers only a random subset of the features
•Voting/Averaging: Final prediction aggregation
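This hand-rolled sketch shows how bootstrap sampling, feature randomness and voting fit together. It is built from individual `DecisionTreeClassifier`s and assumes synthetic data and 25 trees. It uses a plain majority vote, whereas scikit-learn's `RandomForestClassifier` averages the trees' predicted class probabilities. Treat it as an illustration, not a replica of the library's implementation.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Synthetic data (assumption): only the first three features matter
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 4))
y = (X[:, 0] + X[:, 1] + X[:, 2] > 0).astype(int)
X_train, y_train = X[:200], y[:200]
X_test, y_test = X[200:], y[200:]

n_trees = 25  # odd, so a two-class vote can never tie
trees = []
for i in range(n_trees):
    # Bootstrap sampling: draw len(X_train) row indices with replacement
    idx = rng.integers(0, len(X_train), size=len(X_train))
    # Feature randomness: each split considers a random subset of sqrt(4) = 2 features
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=i)
    tree.fit(X_train[idx], y_train[idx])
    trees.append(tree)

# Voting: every tree predicts a label; the majority label wins
votes = np.array([tree.predict(X_test) for tree in trees])  # shape (n_trees, n_test)
forest_pred = (votes.mean(axis=0) > 0.5).astype(int)

accuracy = (forest_pred == y_test).mean()
print(f"hand-rolled forest accuracy: {accuracy:.3f}")
```

Each tree sees a different bootstrap sample and different candidate features at each split, so their errors are partly uncorrelated. This is what lets the vote outperform a typical single tree.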

Implementation

Code Example

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report

# Generate sample data: 4 features, but only the first 3 determine the class
np.random.seed(42)
X = np.random.randn(100, 4)
y = (X[:, 0] + X[:, 1] + X[:, 2] > 0).astype(int)

# Split the data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Create and train the model
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.4f}")
print("\nClassification Report:")
print(classification_report(y_test, y_pred))

# Feature importance (impurity-based; the scores sum to 1)
feature_importance = model.feature_importances_
print("\nFeature Importance:")
for i, importance in enumerate(feature_importance):
    print(f"Feature {i}: {importance:.4f}")

🎯 Practice Exercise

Test your understanding of this topic:

•I understand the basic concepts covered in this topic
•I can apply the concepts in practical scenarios
•I'm ready to move to the next topic
