Web Scraping with BeautifulSoup

Web scraping extracts data from websites. BeautifulSoup parses HTML into a tree of tags that you can navigate, search, and modify.

20 min•By Priygop Team•Updated 2026

Web Scraping

# pip install beautifulsoup4 requests

# Simulated HTML for demo
html = """
<html>
<head><title>Products Page</title></head>
<body>
  <h1>Best Products 2024</h1>
  <div class="product" id="p1">
    <h2>Laptop</h2>
    <span class="price">$999</span>
    <p class="desc">High-performance laptop</p>
  </div>
  <div class="product" id="p2">
    <h2>Phone</h2>
    <span class="price">$699</span>
    <p class="desc">Flagship smartphone</p>
  </div>
  <div class="product" id="p3">
    <h2>Tablet</h2>
    <span class="price">$449</span>
    <p class="desc">Pro tablet for creators</p>
  </div>
</body>
</html>
"""

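# This demo uses only the standard library: a regex stands in for a real
# HTML parser so the example runs without installing anything.
import re

# Extract product data using regex (simplified demo; regex is brittle on real-world HTML)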
products = []
pattern = r'<div class="product" id="(\w+)">[\s\S]*?<h2>(.*?)</h2>[\s\S]*?<span class="price">(.*?)</span>[\s\S]*?<p class="desc">(.*?)</p>'

for match in re.finditer(pattern, html):
    products.append({
        "id": match.group(1),
        "name": match.group(2),
        "price": match.group(3),
        "description": match.group(4),
    })

print("=== Scraped Products ===")
for p in products:
    print(f"  {p['name']}: {p['price']} — {p['description']}")

# The same extraction with BeautifulSoup (needs the beautifulsoup4 package installed):
# from bs4 import BeautifulSoup
# soup = BeautifulSoup(html, "html.parser")
# title = soup.title.string
# products = soup.find_all("div", class_="product")
# for product in products:
#     name = product.h2.text
#     price = product.find("span", class_="price").text
#     print(f"{name}: {price}")

# Ethical scraping rules:
print("\n=== Scraping Ethics ===")
print("1. Check robots.txt before scraping")
print("2. Don't overload servers (add delays)")
print("3. Respect rate limits")
print("4. Check terms of service")
print("5. Use APIs when available")
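The demo above stops at a list of dictionaries with prices stored as strings. As a minimal sketch of turning that list into a structured dataset, the block below converts each price to a number and writes the rows as CSV text. It uses only the standard library; `parse_price` and `to_csv` are hypothetical helper names, and the sample records simply repeat the demo data.

```python
import csv
import io

# Sample records in the same shape as the `products` list built by the demo.
products = [
    {"id": "p1", "name": "Laptop", "price": "$999", "description": "High-performance laptop"},
    {"id": "p2", "name": "Phone", "price": "$699", "description": "Flagship smartphone"},
    {"id": "p3", "name": "Tablet", "price": "$449", "description": "Pro tablet for creators"},
]


def parse_price(text):
    """Convert a price string such as "$1,299" to a float (hypothetical helper)."""
    return float(text.replace("$", "").replace(",", "").strip())


def to_csv(rows):
    """Return the rows as CSV text with a numeric price column."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["id", "name", "price", "description"])
    writer.writeheader()
    for row in rows:
        # Copy the row and replace the price string with its numeric value.
        writer.writerow({**row, "price": parse_price(row["price"])})
    return buffer.getvalue()


csv_text = to_csv(products)
print(csv_text)
```

Writing to an in-memory buffer keeps the sketch free of side effects; passing an open file object to `csv.DictWriter` instead would save the dataset to disk.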

Tip

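BeautifulSoup only parses the HTML it is given; it does not run JavaScript. For pages that render their content with JavaScript, use a browser automation tool such as Selenium to get the rendered HTML first. Always check robots.txt before scraping.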
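One way to check robots.txt from code is the standard library's `urllib.robotparser`. The sketch below parses a made-up robots.txt string so it runs offline; the rules, the `my-scraper` user agent, and the example.com URLs are all assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real site serves its own at /robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Crawl-delay: 2
"""

USER_AGENT = "my-scraper"  # hypothetical user agent name


def build_parser(robots_text):
    """Create a RobotFileParser from robots.txt text instead of a network fetch."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Ask whether this user agent may fetch each URL under the parsed rules.
print(parser.can_fetch(USER_AGENT, "https://example.com/products"))
print(parser.can_fetch(USER_AGENT, "https://example.com/admin/users"))

# Crawl-delay, if the file declares one for this user agent.
print(parser.crawl_delay(USER_AGENT))
```

Against a live site you would typically call `set_url()` with the site's robots.txt address and then `read()` instead of `parse()`.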

Diagram


Every website works on this model

Common Mistake

Warning

Scraping without permission, or sending requests too quickly, can get you banned. Add delays between requests and respect the site's rate limits.
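A simple way to add delays is to pause between consecutive requests. The sketch below assumes a hypothetical `fetch_all` helper that takes the download function as a parameter, so it can be tried without touching the network; `fake_fetch` stands in for a real call such as `requests.get(url).text`.

```python
import time


def fetch_all(urls, fetch, delay_seconds=1.0, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing between consecutive requests."""
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)  # pause before every request except the first
        results.append(fetch(url))
    return results


def fake_fetch(url):
    """Stand-in for a real HTTP request; returns placeholder HTML."""
    return f"<html>{url}</html>"


pages = fetch_all(
    ["https://example.com/page/1", "https://example.com/page/2"],
    fetch=fake_fetch,
    delay_seconds=0.1,
)
print(len(pages))
```

A fixed delay is the simplest policy. If robots.txt declares a crawl delay, or the site documents a rate limit, that value is a more appropriate choice for `delay_seconds`.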


Practice Task

Note

(1) Parse HTML with find() and select(). (2) Extract all links from a page. (3) Build a structured dataset from scraped data.
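As a starting point for the three tasks, here is a minimal sketch using BeautifulSoup (it needs the `beautifulsoup4` package installed). The sample HTML, including its links, is made up for illustration and is not the lesson's page.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page for the practice task.
HTML = """
<html><body>
  <div class="product" id="p1">
    <h2>Laptop</h2>
    <span class="price">$999</span>
    <a href="/products/laptop">Details</a>
  </div>
  <div class="product" id="p2">
    <h2>Phone</h2>
    <span class="price">$699</span>
    <a href="/products/phone">Details</a>
  </div>
  <a>Anchor without href</a>
</body></html>
"""

soup = BeautifulSoup(HTML, "html.parser")

# (1) find() returns the first matching tag; select() takes a CSS selector
#     and returns a list of all matches.
first_product = soup.find("div", class_="product")
price_tags = soup.select("div.product > span.price")

# (2) href=True skips <a> tags that have no href attribute.
links = [a["href"] for a in soup.find_all("a", href=True)]

# (3) Build one dictionary per product.
dataset = [
    {
        "id": div["id"],
        "name": div.h2.get_text(strip=True),
        "price": div.select_one("span.price").get_text(strip=True),
        "url": div.a["href"],
    }
    for div in soup.select("div.product")
]

print(first_product["id"])
print([tag.get_text() for tag in price_tags])
print(links)
print(dataset)
```

On real pages, `find()` and `select_one()` return `None` when nothing matches, so checking the result before reading `.text` or an attribute avoids an `AttributeError` or `TypeError`.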
