Simple web scraping with Python
Web scraping with Python by Angelo Faella

Black Friday is almost here and, as always, you will keep refreshing the page of the product you want while waiting for a super discount.

Don't worry, there's another way to keep an eye on the price of your favorite product. You can do it with Python! In this brief article, we'll see how to create a Python script that sends us an email when the price of our favorite product on Amazon falls below a certain threshold.


The target

Since I needed a mouse, I chose the Logitech MX 2S wireless as an example.

Logitech MX 2S wireless

From the Amazon page of this product we need two things:

  1. The URL of the page;
  2. The id of the HTML element containing the price.

The first is very simple: just copy the URL of the page and paste it into a note. For the second, you need to inspect the HTML code of the page.

Inspecting HTML page

As you can see in the image above, the price is contained in a <span> element with id="priceblock_ourprice". Perfect! Save this information and jump into your favorite IDE.


The script

Let's start by importing the modules we need: requests to fetch the web page, BeautifulSoup to parse the HTML, and smtplib to send emails.

imports

We'll break our code into two functions: check_price and send_email. Our "main" will look like this:

main

As you can see, we will send an email when the current price of the product falls below the threshold (MY_PRICE). In this case, I set the threshold higher than the current price just to test that everything works.
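A minimal sketch of that main logic, not the author's exact code. MY_PRICE comes from the article, but its value here is made up, and passing check_price and send_email in as arguments is an assumption chosen so the snippet runs on its own.

```python
MY_PRICE = 80  # hypothetical threshold, in the same currency as the product page


def main(check_price, send_email, threshold=MY_PRICE):
    """Fetch the current price and send an alert if it is below the threshold.

    check_price: callable returning the current price as a number.
    send_email:  callable taking that price and sending the notification.
    Returns True if an alert was sent, False otherwise.
    """
    price = check_price()
    if price < threshold:
        send_email(price)
        return True
    return False
```

If your own functions take extra arguments (a URL, credentials), you can wrap them with functools.partial or a lambda before passing them in.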

Now let's write the check_price function:

check_price()

The first thing to do is request the web page using the get(...) method of requests. We pass in the URL of the page and the headers for the HTTP request. Then we can parse the HTML with BeautifulSoup and find(...) the <span> element with id="priceblock_ourprice". Once this is done, we can extract the price and return it as an int.
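The steps above could be sketched roughly as follows. This is an illustration under assumptions, not the original code: the User-Agent value, the timeout, the parse_price helper, and the error handling are all hypothetical additions. The helper assumes that cents, when present, are written with exactly two digits, and it truncates them to return a whole number.

```python
import re

PRICE_ELEMENT_ID = "priceblock_ourprice"  # id found while inspecting the page
# Hypothetical browser-like header; some sites reject requests without one.
HEADERS = {"User-Agent": "Mozilla/5.0"}


def parse_price(text):
    """Convert a price string such as '$69.99' or '69,99 €' to whole units."""
    # Keep only digits and separators, then drop stray leading/trailing ones.
    cleaned = re.sub(r"[^\d.,]", "", text).strip(".,")
    # Assumption: a separator followed by exactly two digits marks the cents.
    if len(cleaned) >= 3 and cleaned[-3] in ".,":
        cleaned = cleaned[:-3]
    # Any separators left are treated as thousands separators.
    digits = re.sub(r"[.,]", "", cleaned)
    if not digits:
        raise ValueError(f"no price found in {text!r}")
    return int(digits)


def check_price(url, headers=HEADERS):
    """Download the product page and return the current price as an int."""
    # Third-party packages, imported here so that parse_price can be tried
    # without having them installed.
    import requests
    from bs4 import BeautifulSoup

    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()  # fail loudly on HTTP errors
    soup = BeautifulSoup(response.content, "html.parser")
    element = soup.find(id=PRICE_ELEMENT_ID)
    if element is None:
        raise LookupError(f"no element with id {PRICE_ELEMENT_ID!r} on the page")
    return parse_price(element.get_text())
```

Keeping the string handling in its own function makes it easy to check against the price format your product page actually uses before running the script against the live site.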

Once we have the current price, if it is below our threshold we send an email with the send_email function:

send_email()

With smtplib we first identify ourselves to the server (in this case Gmail) with ehlo(), then we establish a TLS connection using starttls(). All subsequent SMTP commands will be encrypted, so we should call ehlo() again (see the smtplib documentation). To log in, pass your email address to the login() method together with either the password of your Gmail account or an App Password (RECOMMENDED). Note that to use App Passwords you must enable 2-Step Verification (Google's two-factor authentication) in your Google account.

For the message I simply wrote a string containing the current price and a link to the product page. Of course, you can customize it and add other information. Finally, we can send the email with the sendmail() method, passing the sender, the recipient, and the message. After the email has been sent, we can close the connection to the server with quit().
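The sequence of smtplib calls described above might look like the sketch below. It is an illustration, not the original code: the function parameters, the build_message helper, and the wording of the message are assumptions, while smtp.gmail.com on port 587 is Gmail's standard STARTTLS endpoint. The message is kept to plain ASCII because sendmail() encodes string messages with the ascii codec.

```python
import smtplib

SMTP_HOST = "smtp.gmail.com"
SMTP_PORT = 587  # STARTTLS port


def build_message(price, url):
    """Return a plain-text email (subject line plus body) as one ASCII string."""
    subject = "Price alert"
    body = f"The price dropped to {price}.\nCheck the product page: {url}"
    # A blank line separates the headers from the body.
    return f"Subject: {subject}\n\n{body}"


def send_email(price, url, sender, recipient, password):
    """Send the price alert through Gmail over a TLS-protected connection."""
    server = smtplib.SMTP(SMTP_HOST, SMTP_PORT)
    server.ehlo()        # identify ourselves to the server
    server.starttls()    # upgrade the connection to TLS
    server.ehlo()        # identify again over the encrypted channel
    server.login(sender, password)  # password: ideally an App Password
    server.sendmail(sender, recipient, build_message(price, url))
    server.quit()        # close the connection
```

Building the message in a separate function lets you preview the text without connecting to the mail server.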


The result

Once we've completed and executed the script we'll receive an email like this:

email

Great! Everything worked fine.

You are probably wondering how to make this script run automatically on a schedule. There is more than one way to do this. My advice is to create an AWS Lambda function and use AWS CloudWatch Events to schedule it. Here's a simple and effective tutorial by AWS: Schedule Lambda functions using CloudWatch events.


Final words

This little script (full code here) can certainly be improved in many ways. Take it as a quick starting point for something more complex.

Thoughts? Questions? Errors? Let me know in the comments below and if you liked this article, please share it!
