Hello there, web explorers! Are you ready to dive into the realm of Python web scraping? It’s time to learn about BeautifulSoup, a Python package for extracting data from HTML and XML files. It converts a web page into a tree of Python objects such as tags, navigable strings, and comments. An essential tool for data scientists and information seekers! Here’s a basic demonstration of what BeautifulSoup can do:
from bs4 import BeautifulSoup
import requests
# Make a request to the website
r = requests.get("http://www.example.com")
r_content = r.text
# Create a BeautifulSoup object and specify the parser
soup = BeautifulSoup(r_content, 'html.parser')
# Extract the title of the page
page_title = soup.title.text
print('Page Title:', page_title)
In this example, **BeautifulSoup** parses the HTML from the webpage, and we can then use its methods to extract data simply and intuitively.
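To see those methods in action without making a network request, here is a minimal sketch that parses a small inline HTML snippet (the snippet and its tag names are made up for illustration). It shows the two workhorses: `find()`, which returns the first matching tag, and `find_all()`, which returns every match:

```python
from bs4 import BeautifulSoup

# A small hypothetical HTML snippet so the example runs without a network request
html_doc = """
<html><head><title>Sample Page</title></head>
<body>
  <p class="intro">Welcome!</p>
  <a href="/about">About</a>
  <a href="/contact">Contact</a>
</body></html>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# find() returns the first tag that matches; class_ avoids the Python keyword
intro = soup.find("p", class_="intro").text

# find_all() returns a list of every matching tag
links = [a["href"] for a in soup.find_all("a")]

print(intro)   # Welcome!
print(links)   # ['/about', '/contact']
```

The same calls work identically on HTML fetched with `requests`; only the source of the string changes.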
Exercise
It’s your turn to scrape:
- Choose a website to scrape (make sure it allows web scraping by checking its robots.txt file).
- Use BeautifulSoup to extract some specific information from the website (like all the hyperlinks, all images, or the text in a certain div or table, etc.).
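As a starting point for the exercise, here is a hedged sketch of the three extractions mentioned above, run against an inline stand-in for the HTML you would fetch (the snippet, its `id`, and its URLs are invented for illustration):

```python
from bs4 import BeautifulSoup

# A hypothetical stand-in for the HTML you would get from requests.get(...).text
page = """
<div id="content">
  <a href="https://example.com/docs">Docs</a>
  <img src="/logo.png" alt="Logo">
  <a href="https://example.com/blog">Blog</a>
</div>
"""

soup = BeautifulSoup(page, "html.parser")

# All hyperlink targets; .get() returns None instead of raising if href is absent
hrefs = [a.get("href") for a in soup.find_all("a")]

# All image sources
srcs = [img.get("src") for img in soup.find_all("img")]

# Text inside a specific div, located by its id
content_text = soup.find("div", id="content").get_text(strip=True)

print(hrefs)  # ['https://example.com/docs', 'https://example.com/blog']
print(srcs)   # ['/logo.png']
```

Note that `href` and `src` values are often relative; `urllib.parse.urljoin` from the standard library can turn them into absolute URLs against the page’s address.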
Conclusion
Excellent work! With BeautifulSoup, you’ve just discovered the untapped potential of web scraping. The internet is a vast ocean of material just begging to be explored, and you now have the means to do it. Have fun swimming and scraping!