Web Scraping with BeautifulSoup
Web scraping extracts data from websites programmatically. BeautifulSoup parses HTML into a tree that you can navigate, search, and modify.
20 min • By Priygop Team • Updated 2026
Web Scraping
# pip install beautifulsoup4 requests
# Simulated HTML for demo
html = """
<html>
<head><title>Products Page</title></head>
<body>
<h1>Best Products 2024</h1>
<div class="product" id="p1">
<h2>Laptop</h2>
<span class="price">$999</span>
<p class="desc">High-performance laptop</p>
</div>
<div class="product" id="p2">
<h2>Phone</h2>
<span class="price">$699</span>
<p class="desc">Flagship smartphone</p>
</div>
<div class="product" id="p3">
<h2>Tablet</h2>
<span class="price">$449</span>
<p class="desc">Pro tablet for creators</p>
</div>
</body>
</html>
"""
# Stand-in for a real HTML parser: extract product data with a regular expression.
# This only works because the demo markup is small and regular; real pages need a parser.
import re
products = []
pattern = r'<div class="product" id="(\w+)">[\s\S]*?<h2>(.*?)</h2>[\s\S]*?<span class="price">(.*?)</span>[\s\S]*?<p class="desc">(.*?)</p>'
# Each match yields one product: id, name, price, description
for match in re.finditer(pattern, html):
    products.append({
        "id": match.group(1),
        "name": match.group(2),
        "price": match.group(3),
        "description": match.group(4),
    })
print("=== Scraped Products ===")
for p in products:
    print(f" {p['name']}: {p['price']} - {p['description']}")
# With BeautifulSoup (actual code):
# from bs4 import BeautifulSoup
# soup = BeautifulSoup(html, "html.parser")
# title = soup.title.string
# products = soup.find_all("div", class_="product")
# for product in products:
#     name = product.h2.text
#     price = product.find("span", class_="price").text
#     print(f"{name}: {price}")
# Ethical scraping rules:
print("\n=== Scraping Ethics ===")
print("1. Check robots.txt before scraping")
print("2. Don't overload servers (add delays)")
print("3. Respect rate limits")
print("4. Check terms of service")
print("5. Use APIs when available")
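The commented-out BeautifulSoup lines in the demo can be turned into a small runnable sketch. The version below assumes `beautifulsoup4` is installed and uses a trimmed copy of the demo markup; the helper names `parse_products`, `page_title`, and `prices_via_css` are made up for illustration and are not part of the library.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the demo markup so this sketch runs on its own.
html = """
<html>
<head><title>Products Page</title></head>
<body>
<div class="product" id="p1">
<h2>Laptop</h2>
<span class="price">$999</span>
</div>
<div class="product" id="p2">
<h2>Phone</h2>
<span class="price">$699</span>
</div>
</body>
</html>
"""


def parse_products(markup):
    """Hypothetical helper: one dict per <div class="product">."""
    soup = BeautifulSoup(markup, "html.parser")
    rows = []
    for div in soup.find_all("div", class_="product"):
        rows.append({
            "id": div.get("id"),
            "name": div.h2.get_text(strip=True),
            "price": div.find("span", class_="price").get_text(strip=True),
        })
    return rows


def page_title(markup):
    """Hypothetical helper: text of the <title> tag."""
    return str(BeautifulSoup(markup, "html.parser").title.string)


def prices_via_css(markup):
    """Hypothetical helper: same prices, found with a CSS selector."""
    soup = BeautifulSoup(markup, "html.parser")
    return [tag.get_text(strip=True)
            for tag in soup.select("div.product > span.price")]


if __name__ == "__main__":
    print(page_title(html))
    for row in parse_products(html):
        print(f"{row['name']}: {row['price']}")
```

`find_all()` and `find()` search by tag name and attributes, while `select()` takes a CSS selector; either style should return the same elements for markup this simple.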
Tip
Use BeautifulSoup to parse HTML that the server returns directly, and a browser-automation tool such as Selenium for pages whose content is rendered by JavaScript. Always check robots.txt before scraping.
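One way to check robots.txt from code is the standard library's `urllib.robotparser`. The sketch below parses a made-up robots.txt string instead of downloading one, so the rules, the `demo-bot` user agent, and the `example.com` URLs are all assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real file is served at <site>/robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def build_parser(robots_text):
    """Build a parser from robots.txt text that is already in memory."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# can_fetch(useragent, url) reports whether the rules allow that URL.
allowed = parser.can_fetch("demo-bot", "https://example.com/products")
blocked = parser.can_fetch("demo-bot", "https://example.com/private/data")

# crawl_delay(useragent) returns the Crawl-delay value, or None if unset.
delay = parser.crawl_delay("demo-bot")

print(allowed, blocked, delay)
```

Against a live site, you would typically call `set_url()` with the robots.txt address and then `read()` instead of `parse()`.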
Common Mistake
Scraping without permission or too aggressively can get you banned. Add delays between requests and respect rate limits.
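Adding a delay between requests can be sketched as a small loop. In the example below, `fetch_politely` is a hypothetical helper, and the `fetch` and `sleep` callables are passed in so the sketch runs without network access; the one-second default is an arbitrary choice, not a recommended value.

```python
import time


def fetch_politely(urls, fetch, delay_seconds=1.0, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing between consecutive requests.

    fetch: any callable that takes a URL and returns its content.
    sleep: injectable so the pause can be replaced in tests.
    """
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            # Pause before every request except the first one.
            sleep(delay_seconds)
        results.append(fetch(url))
    return results


if __name__ == "__main__":
    # Fake fetch function standing in for a real HTTP call.
    pages = fetch_politely(
        ["https://example.com/a", "https://example.com/b"],
        fetch=lambda url: f"<html>{url}</html>",
        delay_seconds=0.1,
    )
    print(len(pages))
```

With the `requests` library installed, `fetch` could be something like `lambda url: requests.get(url, timeout=10).text`; if the site's robots.txt declares a crawl delay, that value is a reasonable starting point for `delay_seconds`.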
Practice Task
(1) Parse HTML with find() and select(). (2) Extract all links from a page. (3) Build a structured dataset from scraped data.