HTTP, HTTPS and How the Web Actually Works

Before Web Scraping, Understand This: HTTP, HTTPS and How Servers Actually Work

I started my web scraping journey, and before touching a single line of scraping code, I realized I needed to understand what actually happens when your computer talks to a server.

Most beginners skip this part. I did not want to. Here is everything I learned on Day 1.


What is HTTP?

HTTP stands for HyperText Transfer Protocol. In plain terms, it is the set of rules that defines how a request must be written, how a response must be written, and how a client and server communicate with each other.

Breaking it down:

  • HyperText - web content like HTML pages
  • Transfer - sending data back and forth
  • Protocol - the agreed rules of communication

In one line: HTTP is the set of rules used to transfer web pages from a server to your browser.


The Restaurant Analogy

The simplest way to understand HTTP is through a real-life comparison.

  • You are the client (your browser or Python script)
  • The waiter is HTTP (the communication system)
  • The kitchen is the server
  • The food is the website content (HTML, data)

You place an order. That is the request. The kitchen prepares it and sends it back through the waiter. That is the response. HTTP is not the food and not the kitchen. It is the system that carries communication between the two.


What an HTTP Request Actually Looks Like

When you type https://google.com/search?q=python into your browser, this is what gets sent to the server behind the scenes:

GET /search?q=python HTTP/1.1
Host: www.google.com
User-Agent: Mozilla/5.0
Accept: text/html

Every HTTP request follows a strict structure:

  • Request line - the method (GET), the path (/search), and the HTTP version
  • Headers - key-value pairs giving the server more context
  • Blank line - separates headers from the body
  • Body - optional, used in POST requests when sending data
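The structure above can be sketched in a few lines of Python. This builds the raw request text by hand, purely for illustration; libraries like requests assemble exactly this for you:

```python
# A minimal sketch: assembling a raw HTTP/1.1 request by hand,
# following the structure above (request line, headers, blank line, body).
def build_request(method, path, host, headers=None):
    """Return the raw text of an HTTP/1.1 request with an empty body."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    for key, value in (headers or {}).items():
        lines.append(f"{key}: {value}")
    lines.append("")  # blank line separating headers from the body
    lines.append("")  # empty body
    return "\r\n".join(lines)

raw = build_request("GET", "/search?q=python", "www.google.com",
                    {"User-Agent": "Mozilla/5.0", "Accept": "text/html"})
print(raw)
```

HTTP separates lines with `\r\n`, which is why the join uses it instead of a plain newline.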

The server then responds with its own structure:

HTTP/1.1 200 OK
Content-Type: text/html
Content-Length: 50000

followed by the actual HTML content of the page.
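The response can be taken apart the same way the request was built. A minimal sketch, using a hard-coded response string for illustration:

```python
# Sketch: splitting a raw HTTP response into status line, headers, and body.
raw_response = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "Content-Length: 28\r\n"
    "\r\n"
    "<html><body>Hi</body></html>"
)

# The first blank line marks where headers end and the body begins.
head, _, body = raw_response.partition("\r\n\r\n")
status_line, *header_lines = head.split("\r\n")
version, status_code, reason = status_line.split(" ", 2)
headers = dict(line.split(": ", 1) for line in header_lines)

print(status_code, headers["Content-Type"])  # → 200 text/html
```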


Where the Requests Library Fits In

This is where a lot of beginners get confused.

Wrong understanding: "Web scraping means bypassing the browser and talking directly to the server."

Correct understanding: "Web scraping means replacing the browser and acting as the client yourself."

When you write:

import requests

res = requests.get("https://example.com")
print(res.text)

Your Python script is not bypassing anything. It is becoming the HTTP client. It sends a properly formatted HTTP request, the server responds, and you receive the raw HTML. You are doing exactly what a browser does, just programmatically.

The three HTTP client libraries worth knowing:

  • requests - the de facto standard third-party library (installed with pip), most commonly paired with BeautifulSoup for scraping
  • urllib - Python's built-in module, no installation needed, gives lower-level control
  • axios - the JavaScript equivalent, widely used in Node.js environments
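For comparison, here is the same GET built with the built-in urllib instead of requests. The URL and header are illustrative; note the extra ceremony:

```python
# Sketch: the urllib equivalent of requests.get(). Building the Request
# object is free of network I/O; urlopen() is what actually sends it.
from urllib.request import Request, urlopen

req = Request("https://example.com", headers={"User-Agent": "Mozilla/5.0"})
print(req.get_method(), req.full_url)  # → GET https://example.com

# Uncomment to actually fetch the page over the network:
# with urlopen(req) as res:
#     html = res.read().decode("utf-8")
```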


HTTP vs HTTPS

HTTPS stands for HyperText Transfer Protocol Secure. It is the same HTTP, but with encryption layered on top using SSL/TLS.

The difference in simple terms:

  • HTTP is like sending a postcard. Anyone who handles it in transit can read what is written.
  • HTTPS is like sending a sealed envelope. Only the recipient can open and read it.

With HTTP, your data travels as plain text. On public Wi-Fi, anyone on the same network can intercept and read it. Passwords, form data, personal information — all visible.

With HTTPS, the data is encrypted before it leaves your device. Even if someone intercepts it, they see only scrambled, unreadable data.
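To make the postcard/envelope difference concrete, here is a toy illustration using a one-time pad. This is not how TLS works internally; it only shows that intercepted ciphertext is unreadable without the key:

```python
# Toy illustration (NOT real TLS): the same bytes that travel readable
# over HTTP become unreadable once encrypted before leaving the device.
import secrets

message = b"password=hunter2"
key = secrets.token_bytes(len(message))  # shared secret (toy one-time pad)

ciphertext = bytes(m ^ k for m, k in zip(message, key))
print(ciphertext)  # what an eavesdropper on the network would see

decrypted = bytes(c ^ k for c, k in zip(ciphertext, key))
print(decrypted)   # only the key holder recovers the original
```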


SSL and TLS - What Actually Provides the Security

HTTPS uses two technologies to encrypt data:

  • SSL (Secure Sockets Layer) - the original encryption technology, now considered outdated
  • TLS (Transport Layer Security) - the modern, improved version of SSL

In practice, when people say SSL they usually mean TLS. It is TLS that is actively used today. When you see the lock icon in your browser, TLS is what is running underneath.
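Under the hood, Python's standard ssl module is what speaks TLS whenever requests or urllib fetches an https:// URL. A quick sketch of its client-side defaults:

```python
# Sketch: the ssl module's default client context, which requests and
# urllib rely on when connecting to https:// URLs.
import ssl

ctx = ssl.create_default_context()  # modern TLS defaults for clients
print(ssl.OPENSSL_VERSION)          # the TLS implementation underneath
print(ctx.check_hostname)           # certificate hostname checking is on
```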


Why HTTPS is Non-Negotiable for Login Systems

When a user submits a username and password:

  • Over HTTP - the credentials are sent as plain text. Anyone on the network can capture them.
  • Over HTTPS - the credentials are encrypted. Even if intercepted, they cannot be read.

This is not optional. Browsers now actively warn users when a login form is served over HTTP. Modern frameworks enforce HTTPS by default. APIs require it.

One important addition: HTTPS protects data in transit, but it is not the whole picture. You should also hash passwords on the server side (using something like bcrypt), use secure cookies, and implement proper authentication. HTTPS is the foundation, not the complete solution.
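A sketch of that server-side hashing step. The paragraph above mentions bcrypt, which is a third-party package; this uses the standard library's PBKDF2 (hashlib.pbkdf2_hmac) as a stand-in, and the names are my own:

```python
# Sketch: server-side password hashing with a salted, slow hash.
import hashlib
import hmac
import os

ITERATIONS = 600_000  # deliberately slow to resist brute force

def hash_password(password, salt=None):
    salt = salt or os.urandom(16)  # unique random salt per user
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest

def verify_password(password, salt, digest):
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)  # constant-time comparison

salt, digest = hash_password("hunter2")
print(verify_password("hunter2", salt, digest))         # → True
print(verify_password("wrong-password", salt, digest))  # → False
```

The server stores only the salt and digest, never the plain-text password, so even a database leak does not expose credentials directly.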


How HTTPS is Actually Set Up

This is something that confused me early on.

You do not enable HTTPS inside your Python or JavaScript code. HTTPS is configured at the server and domain level, completely outside your application code.

What a developer actually does:

  • Obtain an SSL/TLS certificate from a trusted authority (Let's Encrypt is free and widely used)
  • Configure the web server (Nginx, Apache) or hosting platform (AWS, Vercel, Netlify) to use that certificate
  • Force a redirect so all HTTP traffic automatically goes to HTTPS

In your code, you simply use the correct URL:

url = "https://example.com"

The security layer is handled entirely outside your code. Think of it this way: HTTPS is the lock on the building. Your code is the conversation happening inside. You do not build the lock inside the conversation. The building already has it.
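A small sketch of that idea: all your application code does is use the https:// scheme, and a simple guard (a hypothetical helper, my own naming) can catch accidental http:// URLs before a request goes out:

```python
# Sketch: application code only chooses the scheme; TLS itself is
# handled by the server. This guard rejects plain-HTTP URLs early.
from urllib.parse import urlparse

def require_https(url):
    """Raise if the URL would send data unencrypted."""
    if urlparse(url).scheme != "https":
        raise ValueError(f"insecure URL: {url}")
    return url

print(require_https("https://example.com"))  # passes through unchanged
# require_https("http://example.com")       # would raise ValueError
```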


Why I Started Here Before Scraping

Understanding HTTP before writing scraping code means you understand why your scraper works, why it fails, and what is actually happening between your script and the server.

When your request gets blocked, you know what headers to check. When you see a 403 or 404 status code, you know what it means. When someone asks why you used requests over urllib, you can explain it.
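As a sketch of that last point, here is a tiny lookup for the status codes a scraper sees most often. The codes are standard HTTP; the advice strings are my own:

```python
# Sketch: mapping common HTTP status codes to what they mean for a scraper.
MEANINGS = {
    200: "OK - parse the HTML",
    403: "Forbidden - server refused; check your User-Agent header",
    404: "Not Found - the path is wrong or the page was removed",
    429: "Too Many Requests - slow down and retry later",
}

def explain(status):
    return MEANINGS.get(status, f"status {status}: see the HTTP spec")

print(explain(403))
```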

The fundamentals are not a detour. They are the shortcut.

#Python #WebScraping #LearningInPublic #100DaysOfCode #Programming #HTTP #WebDevelopment
