Learn Web Scraping with Python

Web Scraping First Example

Let's work through a practical example of scraping: we will inspect a webpage and then extract data from it.

First, open your favorite page on Wikipedia and inspect it with your browser's developer tools. Before extracting any data, decide exactly which parts of the page you need.

Example:-

# Import the BeautifulSoup library (bs4) and the requests library
import bs4
import requests

# Send a GET request to the page
res = requests.get("https://en.wikipedia.org/wiki/Apache_Hadoop")
print("The object type:", type(res))

# Parse the HTML text of the response into a BeautifulSoup object
# (the 'html5lib' parser must be installed separately)
soup = bs4.BeautifulSoup(res.text, 'html5lib')
print("The object type:", type(soup))

 

Output:-

The object type: <class 'requests.models.Response'>
The object type: <class 'bs4.BeautifulSoup'>
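
The request above assumes that the download succeeds. As a minimal sketch, the same two steps can be wrapped in small helper functions that fail loudly on an HTTP error. The names parse_page and fetch_page, the 10-second timeout, and the use of Python's built-in 'html.parser' in place of 'html5lib' are illustrative assumptions, not part of the original example.

```python
import bs4
import requests


def parse_page(html_text):
    """Parse HTML text with Python's built-in parser (no html5lib needed)."""
    return bs4.BeautifulSoup(html_text, "html.parser")


def fetch_page(url, timeout=10):
    """Download a page and return it as a BeautifulSoup object.

    Raises requests.HTTPError if the server answers with a 4xx/5xx status.
    """
    res = requests.get(url, timeout=timeout)
    res.raise_for_status()
    return parse_page(res.text)


# Offline demonstration of the parsing step on a tiny hand-written page
sample = parse_page(
    "<html><head><title>Demo</title></head><body><p>Hi</p></body></html>"
)
print("The object type:", type(sample))
print("Page title:", sample.title.text)
```

With a network connection, fetch_page("https://en.wikipedia.org/wiki/Apache_Hadoop") should return the same kind of soup object as the example above.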

 

The following lines of code extract all the headings of the webpage by class name. Front-end knowledge is essential here: inspecting the page's HTML shows which class the headings use.

# Select every element with the class "mw-headline" and print its text
for v in soup.select('.mw-headline'):
    print(v.text, end=',')

 

Output:-

History,Architecture,File systems,Hadoop distributed file system,Other file systems,JobTracker and TaskTracker: the MapReduce engine,Scheduling,Fair scheduler,Capacity scheduler,Difference between Hadoop 1 and Hadoop 2 (YARN),Difference between Hadoop 2 and Hadoop 3,Other applications,Prominent use cases,Hadoop hosting in the cloud,Commercial support,Branding,Papers,See also,References,Bibliography,External links,
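
To see how the class selector behaves without a network connection, here is a small sketch that runs the same select call on a hand-written HTML fragment. The fragment is an assumption that loosely imitates the heading markup, and 'html.parser' is used so that no extra parser is required. Joining the texts with ",".join also avoids the trailing comma seen in the output above.

```python
import bs4

# Hand-written fragment that loosely imitates the heading markup (an assumption)
html = """
<h2><span class="mw-headline" id="History">History</span></h2>
<h2><span class="mw-headline" id="Architecture">Architecture</span></h2>
<h3><span class="mw-headline" id="File_systems">File systems</span></h3>
<p class="other">Not a heading</p>
"""

soup = bs4.BeautifulSoup(html, "html.parser")

# select() takes a CSS selector; ".mw-headline" matches elements by class name
headlines = [tag.get_text(strip=True) for tag in soup.select(".mw-headline")]

# Join the heading texts with commas (no trailing comma)
joined = ",".join(headlines)
print(joined)
```

Class names on a live page can change over time, so if select returns an empty list, recheck the selector in the browser inspector.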

 

In the code above, we imported the bs4 and requests libraries. We then created the res object by sending a request to the webpage and parsed its HTML into the soup object. As the output shows, we extracted all the headings from the webpage.