Decision Trees
Understand Decision Trees for classification and regression tasks. This is a foundational concept in artificial intelligence and machine learning that professional developers rely on daily. The explanations below are written to be beginner-friendly while covering the depth and nuance that comes from real-world AI/ML experience. Take your time with each section and practice the examples
What are Decision Trees?
Decision Trees are a type of supervised learning algorithm that can be used for both classification and regression tasks. They work by recursively splitting the data based on feature values.. This is an essential concept that every AI/ML developer must understand thoroughly. In professional development environments, getting this right can mean the difference between code that works reliably and code that breaks in production. The following sections break this down into clear, digestible pieces with practical examples you can try immediately
Key Concepts
- Root Node: Starting point of the tree
- Internal Nodes: Decision points based on features
- Leaf Nodes: Final predictions
- Splitting Criteria: Information gain, Gini impurity
Implementation
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt
# Generate sample data
np.random.seed(42)
X = np.random.randn(100, 2)
y = (X[:, 0] + X[:, 1] > 0).astype(int)
# Split the data
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
# Create and train the model
model = DecisionTreeClassifier(max_depth=3, random_state=42)
model.fit(X_train, y_train)
# Make predictions
y_pred = model.predict(X_test)
# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.4f}")
# Visualize the tree
plt.figure(figsize=(10, 8))
plot_tree(model, filled=True, rounded=True)
plt.show()