A Simplistic Approach to Web Scraping


Have you ever wondered about scraping a website?

Yes, I have. And here is my attempt at scraping data and bringing it down to a tabular format.

Let me share my experience, to help a wider audience achieve that smoothly.


I thought of scraping an apparel website, but did not want to go for Amazon or Flipkart.

So I thought, let me scrape “Koovs.com” (just a wild thought).


Basic steps:

1.      Pick the URL/website you want to scrape.

2.      Make the driver for your chosen browser available on your system. As I am using Chrome, the link to ChromeDriver is:

https://chromedriver.chromium.org/downloads

Place the driver in the folder where your Python code is kept.

3.      Install the “BeautifulSoup” package in your Python environment. You can create a BeautifulSoup object and use its functions like find, find_all, etc. to get the content of a particular HTML tag.

4.      Keep a basic understanding of HTML; it will help you extract the data, which sits in tags embedded in the website's pages.

5.      Use exception handling while extracting attributes, so that if an attribute is not available for a particular item, the resulting “AttributeError” is handled by the code itself.

6.      Build three functions:

a.      get_url => This gets you the URL for the item you are planning to extract info about.

For example, if your search term is “Shoes”, your URL would be: https://www.koovs.com/Shoes

The URL pattern varies from website to website. One needs to search manually on the website to find the exact URL for an item.

b.      extract_record => This helps you extract info like item name, brand, price, etc. using the related HTML tags. One needs to find a unique way of segregating the search records and then embed that in the code to extract the relevant tags.


c.      main => This is the driver function that initiates get_url and extract_record. It carries the loop that iterates over every page to extract each item and its detail information.
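Steps 3–5 can be sketched in just a few lines. The HTML below is a hypothetical listing snippet, and the class names (imageView, infoView, brandName, sizeInfo) are made up for illustration; real sites will differ:

```python
from bs4 import BeautifulSoup

# Hypothetical product-listing snippet; real class names vary per site.
html = """
<li class="imageView">
  <a href="/product/123">Running Shoes</a>
  <div class="infoView">
    <span class="brandName">Nike</span>
    <span class="discountPrice">Rs. 1,999</span>
  </div>
</li>
"""

soup = BeautifulSoup(html, 'html.parser')
item = soup.find('li', 'imageView')           # first tag with this class
print(item.a.text)                            # -> Running Shoes

info = item.find('div', 'infoView')
print(info.find('span', 'brandName').text)    # -> Nike

# Step 5: a missing attribute raises AttributeError (find returns None)
try:
    size = info.find('span', 'sizeInfo').text
except AttributeError:
    size = ''                                 # handled gracefully
print(repr(size))                             # -> ''
```

The same find/find_all pattern, wrapped in try/except, is what the full script below relies on.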


Python Code:

import csv
from bs4 import BeautifulSoup
from selenium import webdriver


def get_url(search_text):
    """Generate the search URL; the trailing {} is later filled with the page number."""
    # Note: the pagination parameter is an assumption; adjust it to the site's actual URL pattern.
    template = 'https://www.koovs.com/{}?page={{}}'
    return template.format(search_text)


def extract_record(item):
    """Extract and return data from a single record"""
    atag = item.a
    description = atag.text.strip()
    url = 'https://www.koovs.com' + atag.get('href')

    # Defaults, in case an attribute is missing for this item
    brandName = productName = discountPrice = ''
    try:
        Product_Info_parent = item.find('div', 'infoView')
        brandName = Product_Info_parent.find('span', 'product_title clip-text brandName').text
        productName = Product_Info_parent.find('span', 'product_title clip-text productName').text
        # product price
        discountPrice = Product_Info_parent.find('span', 'discountPrice').text
    except AttributeError:
        pass  # keep the defaults for whatever was missing

    result = (description, brandName, productName, discountPrice, url)
    print(result)
    return result


def main(search_term):
    """Run main program"""
    # start the webdriver (chromedriver sits next to this script, see step 2)
    driver = webdriver.Chrome(executable_path='chromedriver.exe')

    records = []
    url = get_url(search_term)
    for page in range(1, 3):  # iterate over the first two result pages
        driver.get(url.format(page))
        soup = BeautifulSoup(driver.page_source, 'html.parser')
        results = soup.find_all('li', 'imageView')
        for item in results:
            record = extract_record(item)
            if record:
                records.append(record)
    driver.close()  # close the browser only after all pages are done

    # save data to csv file
    with open('Koovs_Scraped_results.csv', 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(['description', 'brandName', 'productName', 'discountPrice', 'url'])
        writer.writerows(records)


var = input("enter search term ")
main(var)
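The CSV-saving step at the end can also be exercised in isolation. The records below are made-up stand-ins in the same (description, brandName, productName, discountPrice, url) shape that extract_record returns:

```python
import csv

# Made-up records in the shape extract_record returns:
# (description, brandName, productName, discountPrice, url)
records = [
    ('Running Shoes', 'BrandA', 'Road Runner', 'Rs. 1,999', 'https://www.koovs.com/p/1'),
    ('Canvas Shoes', 'BrandB', 'City Walker', 'Rs. 999', 'https://www.koovs.com/p/2'),
]

with open('demo_results.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(['description', 'brandName', 'productName', 'discountPrice', 'url'])
    writer.writerows(records)

# Read the file back to verify the rows survived the round trip
with open('demo_results.csv', newline='', encoding='utf-8') as f:
    rows = list(csv.reader(f))

print(rows[0])        # header row
print(len(rows) - 1)  # -> 2 data rows
```

The newline='' and encoding='utf-8' arguments matter: the first prevents blank lines between rows on Windows, the second keeps non-ASCII characters (e.g. the ₹ symbol) intact.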


Output after scraping:

(Screenshot: the scraped results in tabular form)

Thanks to “Izzy Analytics” (YouTube).

Happy Learning!
