Learn essential Python libraries for data science and machine learning. Master NumPy, Pandas, Matplotlib, and data preprocessing techniques.
Master the fundamental package for scientific computing in Python
Content by: Nirav Khanpara
AI/ML Engineer
NumPy (Numerical Python) is the fundamental package for scientific computing in Python. It provides support for large, multi-dimensional arrays and matrices, along with a collection of high-level mathematical functions to operate on these arrays.
import numpy as np
# Create arrays
arr1 = np.array([1, 2, 3, 4, 5])
arr2 = np.zeros((3, 4)) # 3x4 array of zeros
arr3 = np.ones((2, 3)) # 2x3 array of ones
arr4 = np.arange(0, 10, 2) # Values from 0 up to (but not including) 10, step 2 -> [0 2 4 6 8]
arr5 = np.linspace(0, 1, 5) # 5 evenly spaced values from 0 to 1
# Random arrays
random_arr = np.random.rand(3, 3) # 3x3 array of uniform values in [0, 1)
normal_arr = np.random.normal(0, 1, 100) # 100 normally distributed values (mean 0, std 1)
# Basic operations
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(a + b) # [5 7 9]
print(a * b) # [4 10 18]
print(a ** 2) # [1 4 9]
# Statistical operations
data = np.array([1, 2, 3, 4, 5])
print(np.mean(data)) # 3.0
print(np.std(data)) # 1.414...
print(np.median(data)) # 3.0
print(np.max(data)) # 5
print(np.min(data)) # 1
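The examples above stay one-dimensional, but the lesson description highlights multi-dimensional arrays and matrices. A minimal sketch of reshaping, indexing, and broadcasting (the shapes and values are illustrative):
# Reshaping and multi-dimensional arrays
m = np.arange(12).reshape(3, 4) # 3x4 matrix of values 0..11
print(m.shape) # (3, 4)
print(m[1, 2]) # row 1, column 2 -> 6
print(m[:, 0]) # first column -> [0 4 8]
# Broadcasting: the 1D row is added to every row of m
row = np.array([10, 20, 30, 40])
print(m + row)
# Axis-aware statistics
print(m.sum(axis=0)) # column sums -> [12 15 18 21]
print(m.sum(axis=1)) # row sums -> [ 6 22 38]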
Learn powerful data manipulation and analysis with Pandas
Pandas is a powerful data manipulation and analysis library for Python. It provides data structures for efficiently storing and manipulating large datasets, with tools for reading and writing data in various formats.
import pandas as pd
import numpy as np
# Create DataFrame from dictionary
data = {
'Name': ['Alice', 'Bob', 'Charlie', 'Diana'],
'Age': [25, 30, 35, 28],
'City': ['NYC', 'LA', 'Chicago', 'Boston'],
'Salary': [50000, 60000, 70000, 55000]
}
df = pd.DataFrame(data)
# Create DataFrame from list of lists
data_list = [
['Alice', 25, 'NYC', 50000],
['Bob', 30, 'LA', 60000],
['Charlie', 35, 'Chicago', 70000]
]
df2 = pd.DataFrame(data_list, columns=['Name', 'Age', 'City', 'Salary'])
print(df.head())
# Select columns
print(df['Name'])
print(df[['Name', 'Age']])
# Filter data
young_people = df[df['Age'] < 30]
high_salary = df[df['Salary'] > 60000]
# Multiple conditions
filtered = df[(df['Age'] > 25) & (df['Salary'] > 55000)]
# Sort data
sorted_df = df.sort_values('Age', ascending=False)
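The lesson description also mentions reading and writing data in various formats, and grouping is another everyday Pandas operation; neither appears above. A minimal sketch reusing the df defined earlier (the file name employees.csv is illustrative):
# Group and aggregate
print(df.groupby('City')['Salary'].mean()) # average salary per city
print(df['Age'].describe()) # summary statistics
# Read and write CSV (file name is illustrative)
df.to_csv('employees.csv', index=False)
df_loaded = pd.read_csv('employees.csv')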
Create beautiful and informative data visualizations
Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python. It provides a MATLAB-like plotting interface.
import matplotlib.pyplot as plt
import numpy as np
# Create data
x = np.linspace(0, 10, 100)
y = np.sin(x)
# Create plot
plt.figure(figsize=(10, 6))
plt.plot(x, y, 'b-', linewidth=2, label='sin(x)')
plt.plot(x, np.cos(x), 'r--', linewidth=2, label='cos(x)')
plt.xlabel('x')
plt.ylabel('y')
plt.title('Trigonometric Functions')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
# Scatter plot
plt.figure(figsize=(12, 4))
plt.subplot(1, 3, 1)
plt.scatter(x, y, alpha=0.6)
plt.title('Scatter Plot')
# Bar plot
plt.subplot(1, 3, 2)
categories = ['A', 'B', 'C', 'D']
values = [4, 3, 2, 1]
plt.bar(categories, values)
plt.title('Bar Plot')
# Histogram
plt.subplot(1, 3, 3)
data = np.random.normal(0, 1, 1000)
plt.hist(data, bins=30, alpha=0.7)
plt.title('Histogram')
plt.tight_layout()
plt.show()
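The plots above use the pyplot state-machine interface. Matplotlib also offers an object-oriented API, which many find clearer for multi-panel figures; a minimal sketch of the same kind of line plot (the output file name is illustrative):
# Object-oriented interface: explicit Figure and Axes objects
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(x, np.sin(x), 'b-', label='sin(x)')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_title('Object-Oriented API')
ax.legend()
fig.savefig('sine.png', dpi=150) # save to file; name is illustrative
plt.show()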
Prepare and clean data for machine learning models
Data preprocessing is a crucial step in machine learning that involves cleaning, transforming, and preparing raw data for analysis and modeling.
import pandas as pd
import numpy as np
from sklearn.impute import SimpleImputer
# Create sample data with missing values
data = {
'A': [1, 2, np.nan, 4, 5],
'B': [1.1, np.nan, 3.3, 4.4, 5.5],
'C': ['a', 'b', 'c', np.nan, 'e']
}
df = pd.DataFrame(data)
# Check for missing values
print(df.isnull().sum())
# Fill missing values
df_filled = df.fillna(df.mean(numeric_only=True)) # numerical columns -> column mean
df_filled = df_filled.fillna(df.mode().iloc[0]) # categorical columns -> most frequent value
# Using SimpleImputer (the 'mean' strategy applies only to numerical columns)
imputer = SimpleImputer(strategy='mean')
df_imputed = df.copy()
df_imputed[['A', 'B']] = imputer.fit_transform(df[['A', 'B']])
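When only a few rows are affected, dropping missing values can be simpler than imputing them. A short sketch on the same df:
# Drop rows or columns containing missing values
print(df.dropna()) # keeps only complete rows (here, rows 0 and 4)
print(df.dropna(axis=1)) # drops any column with a NaN (here, all three)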
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.model_selection import train_test_split
# Sample data
X = np.random.randn(100, 3)
y = np.random.randint(0, 2, 100)
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Standardization (Z-score normalization)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train) # fit on training data only
X_test_scaled = scaler.transform(X_test) # reuse training statistics; never fit on the test set
# Min-Max scaling
minmax_scaler = MinMaxScaler()
X_train_minmax = minmax_scaler.fit_transform(X_train)
X_test_minmax = minmax_scaler.transform(X_test)
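Preprocessing also usually involves encoding categorical features as numbers, which the examples above don't cover. A minimal sketch with made-up data:
from sklearn.preprocessing import LabelEncoder
# Sample categorical data (illustrative)
colors = pd.DataFrame({'color': ['red', 'green', 'blue', 'green']})
# One-hot encoding: one binary column per category
print(pd.get_dummies(colors, columns=['color']))
# Label encoding: one integer per category
le = LabelEncoder()
print(le.fit_transform(colors['color'])) # [2 1 0 1] (classes sorted alphabetically)
print(le.classes_) # ['blue' 'green' 'red']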