NumPy Arrays — The Foundation of Numerical ML

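NumPy arrays underpin most numerical ML code in Python. Unlike Python lists, a NumPy array stores elements of a single dtype in one block of memory (contiguous by default), supports vectorized operations that run in compiled code instead of Python-level loops, and enables broadcasting: applying operations across arrays of different but compatible shapes. Scikit-learn operates directly on NumPy arrays, and libraries such as PyTorch use their own tensor types that convert to and from NumPy arrays.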
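To make the memory claim concrete, here is a minimal sketch that inspects how an array is laid out. The variable names and the choice of `int64` are assumptions made for illustration; the reported list size depends on the Python build.

```python
import sys
import numpy as np

# A Python list stores references to separate int objects;
# a NumPy array stores the raw values back to back in one buffer.
values = list(range(1000))
arr = np.arange(1000, dtype=np.int64)

print(f"dtype:        {arr.dtype}")
print(f"itemsize:     {arr.itemsize} bytes per element")
print(f"nbytes:       {arr.nbytes} bytes for the data buffer")
print(f"C-contiguous: {arr.flags['C_CONTIGUOUS']}")
print(f"strides:      {arr.strides}")  # bytes to step to the next element

# Size of the list object itself (its pointer table only,
# not counting the int objects those pointers refer to)
print(f"list size:    {sys.getsizeof(values)} bytes")

# A strided slice is a view on the same buffer, but it is no longer contiguous
every_other = arr[::2]
print(f"view shares buffer: {every_other.base is arr}")
print(f"view contiguous:    {every_other.flags['C_CONTIGUOUS']}")
```

Slicing with a step is one common way to end up with a non-contiguous array; `np.ascontiguousarray` makes a contiguous copy when one is needed.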

20 min • By Priygop Team • Updated 2026

NumPy Arrays and Vectorized Operations

import numpy as np

# CREATING ARRAYS
arr1d = np.array([1, 2, 3, 4, 5])
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])  # 3x3 matrix
print(f"1D shape: {arr1d.shape} | dtype: {arr1d.dtype}")
print(f"2D shape: {arr2d.shape}\n")

# USEFUL CONSTRUCTORS
zeros  = np.zeros((3, 4))          # 3 rows, 4 cols of 0.0
ones   = np.ones((2, 3))           # 2 rows, 3 cols of 1.0
eye    = np.eye(3)                 # 3x3 identity matrix
rng    = np.random.default_rng(42) # reproducible random generator
rand   = rng.normal(0, 1, (5, 3)) # 5x3 standard normal samples
linsp  = np.linspace(0, 1, 10)    # 10 evenly spaced values from 0 to 1

# VECTORIZED OPS -- no loops, much faster
prices = np.array([100, 200, 150, 300, 250])

# Apply discount: 15% off all prices
discounted = prices * 0.85          # element-wise multiplication
above_200  = prices[prices > 200]   # boolean indexing

print(f"Original prices:   {prices}")
print(f"Discounted 15%:    {discounted}")
print(f"Prices above 200:  {above_200}")

# SPEED COMPARISON: loop vs vectorized
large = rng.random(1_000_000)

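import time
# Loop approach (slow): a list comprehension still runs one Python-level step per element
t0 = time.perf_counter()   # perf_counter is the high-resolution clock meant for timing
result_loop = [x ** 2 for x in large]
t_loop = time.perf_counter() - t0

# NumPy approach (fast): one call, the loop runs in compiled code
t0 = time.perf_counter()
result_np = large ** 2
t_np = time.perf_counter() - t0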

print(f"\nLoop: {t_loop*1000:.1f}ms | NumPy: {t_np*1000:.1f}ms | Speedup: {t_loop/t_np:.0f}x")

# BROADCASTING -- apply ops across different shapes
# From each row of X, subtract the per-column means
X = rng.random((100, 5))   # 100 samples, 5 features
col_means = X.mean(axis=0)  # shape (5,)
X_centered = X - col_means  # broadcasting: (100,5) - (5,) -> (100,5)
print(f"\nAfter centering, column means: {X_centered.mean(axis=0).round(10)}")

# KEY ARRAY OPERATIONS FOR ML
matrix = rng.random((4, 3))
print(f"\nMatrix shape: {matrix.shape}")
print(f"Row sums:     {matrix.sum(axis=1)}")    # sum each row
print(f"Col means:    {matrix.mean(axis=0).round(3)}")  # mean of each column
print(f"Overall mean: {matrix.mean():.4f}")
print(f"Transposed:   {matrix.T.shape}")        # transpose for matrix operations
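The centering example above subtracts per-column means; doing the same per row needs one extra step, and the transpose is what makes the matrix product shapes line up. The following is a small sketch of both, with an arbitrarily chosen seed and illustrative variable names.

```python
import numpy as np

rng = np.random.default_rng(0)   # seed chosen arbitrarily for this sketch
X = rng.random((100, 5))         # 100 samples, 5 features

# Column-wise: (100, 5) - (5,) lines up on the trailing axis
X_col_centered = X - X.mean(axis=0)

# Row-wise: X.mean(axis=1) has shape (100,), which does NOT broadcast
# against (100, 5). keepdims=True keeps it as (100, 1), which does.
row_means = X.mean(axis=1, keepdims=True)
X_row_centered = X - row_means

print(X_col_centered.shape, row_means.shape, X_row_centered.shape)

# Transpose in a matrix product: (5, 100) @ (100, 5) -> (5, 5)
gram = X_col_centered.T @ X_col_centered
cov = gram / (X.shape[0] - 1)    # sample covariance of the features
print(np.allclose(cov, np.cov(X, rowvar=False)))
```

Broadcasting compares shapes from the last axis backwards, and each pair of dimensions must either match or be 1. That is why `(100, 5) - (5,)` works while `(100, 5) - (100,)` raises a `ValueError`.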

Tip

Practice NumPy arrays in small, isolated examples before integrating them into larger projects. Small experiments build understanding faster than reading alone.

Diagram: Machine learning follows a structured pipeline from data to deployment.

Practice Task

(1) Write a working NumPy array example from scratch without looking at notes. (2) Modify it to handle an edge case (empty input, null value, or error state). (3) Share your solution in the Priygop community for feedback.
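As one possible shape for step (2), here is a sketch of column centering that tolerates an empty input and missing values. `center_columns` is a hypothetical helper written for this exercise, not a NumPy function, and returning a copy for empty input is just one reasonable policy.

```python
import numpy as np

def center_columns(X):
    """Hypothetical helper: subtract each column's mean, tolerating edge cases."""
    X = np.asarray(X, dtype=float)
    if X.ndim != 2:
        raise ValueError(f"expected a 2D array, got {X.ndim}D")
    if X.shape[0] == 0:
        # No rows: nothing to center; mean() would warn and return NaN
        return X.copy()
    # nanmean ignores NaN entries, so one missing value does not poison a column
    return X - np.nanmean(X, axis=0)

print(center_columns([[1.0, 2.0], [3.0, 6.0]]))      # regular input
print(center_columns(np.empty((0, 3))).shape)        # empty input keeps its shape
print(center_columns([[1.0, np.nan], [3.0, 4.0]]))   # NaN stays NaN, rest is centered
```

A column that is entirely NaN still makes `np.nanmean` emit a warning, so a stricter version might reject such input outright.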

Common Mistake

A common mistake with NumPy arrays is skipping edge-case testing: empty inputs, null values (None or NaN), and unexpected data types. Always validate boundary conditions to write robust, production-ready ML code.
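The data-type part of this warning is easy to overlook, so here is a short sketch of two dtype surprises followed by a strict check. `validate_features` is a hypothetical helper, and rejecting bad input rather than repairing it is an assumption of this sketch.

```python
import numpy as np

# Surprise 1: integer arrays stay integer, so in-place float math is rejected
prices = np.array([100, 200, 150])
try:
    prices *= 0.85
except TypeError as exc:   # NumPy's casting error is a TypeError subclass
    print(f"in-place multiply failed: {type(exc).__name__}")

# Surprise 2: a None among the values silently yields an object array
with_none = np.array([1.0, None, 3.0])
print(f"dtype with None: {with_none.dtype}")

def validate_features(X):
    """Hypothetical check to run before handing data to a model."""
    X = np.asarray(X)
    if not np.issubdtype(X.dtype, np.number):
        raise TypeError(f"expected numeric data, got dtype {X.dtype}")
    if X.size == 0:
        raise ValueError("input is empty")
    if not np.isfinite(X).all():
        raise ValueError("input contains NaN or infinity")
    return X.astype(float)

print(validate_features([[1, 2], [3, 4]]).dtype)
```

Writing `prices = prices * 0.85` instead of the in-place form works, because it creates a new float array rather than writing floats into integer storage.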