HTTP, HTTPS and How the Web Actually Works
Before Web Scraping, Understand This: HTTP, HTTPS and How Servers Actually Work
I just started my web scraping journey, and before touching a single line of scraping code, I realized I needed to understand what actually happens when your computer talks to a server.
Most beginners skip this part. I did not want to. Here is everything I learned on Day 1.
What is HTTP?
HTTP stands for HyperText Transfer Protocol. In plain terms, it is the set of rules that defines how a request must be written, how a response must be written, and how a client and server communicate with each other.
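Breaking it down: HyperText is text that links to other text, Transfer means moving it from one machine to another, and Protocol means an agreed set of rules.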
In one line: HTTP is the set of rules used to transfer web pages from a server to your browser.
The Restaurant Analogy
The simplest way to understand HTTP is through a real-life comparison.
You place an order. That is the request. The kitchen prepares it and sends it back through the waiter. That is the response. HTTP is not the food and not the kitchen. It is the system that carries communication between the two.
What an HTTP Request Actually Looks Like
When you type https://www.google.com/search?q=python into your browser, this is roughly what gets sent to the server behind the scenes (a real browser adds more headers):
    GET /search?q=python HTTP/1.1
    Host: www.google.com
    User-Agent: Mozilla/5.0
    Accept: text/html
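Every HTTP request follows a strict structure: a request line (method, path, and protocol version), then one header per line, then a blank line, then an optional body (a GET request like this one usually has none).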
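To make that structure concrete, here is a minimal sketch that assembles the same request as plain text. The helper name build_request is my own, not part of any library, and a real client such as requests does this for you.

```python
def build_request(method, path, host, headers=None):
    """Assemble the text of an HTTP/1.1 request without a body.

    Hypothetical helper for illustration only.
    """
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # Each line ends with CRLF; an empty line marks the end of the headers.
    return "\r\n".join(lines) + "\r\n\r\n"


raw = build_request(
    "GET",
    "/search?q=python",
    "www.google.com",
    {"User-Agent": "Mozilla/5.0", "Accept": "text/html"},
)
print(raw)
```

Nothing is sent over the network here. The point is only that a request is structured text that any program can produce.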
The server then responds with its own structure:
    HTTP/1.1 200 OK
    Content-Type: text/html
    Content-Length: 50000
followed by the actual HTML content of the page.
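The response can be taken apart the same way it is put together. The sketch below parses a small, made-up response string; parse_response and the sample text are assumptions for illustration, and real libraries handle many edge cases this ignores.

```python
def parse_response(raw):
    """Split raw HTTP response text into status code, reason, headers, and body.

    Hypothetical helper for illustration only.
    """
    # A blank line separates the status line and headers from the body.
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lowercase.
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers, body


# Made-up sample response, shortened so the body fits on one line.
sample = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "Content-Length: 13\r\n"
    "\r\n"
    "<html></html>"
)
status, reason, headers, body = parse_response(sample)
print(status, reason, headers["content-type"], len(body))
```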
Where the Requests Library Fits In
This is where a lot of beginners get confused.
Wrong understanding: "Web scraping means bypassing the browser and talking directly to the server."
Correct understanding: "Web scraping means replacing the browser and acting as the client yourself."
When you write:
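    import requests

    # Send a GET request, just as a browser would for this URL
    res = requests.get("https://example.com")
    # res.text holds the response body (the raw HTML) as a string
    print(res.text)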
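Your Python script is not bypassing anything. It is becoming the HTTP client. It sends a properly formatted HTTP request, the server responds, and you receive the raw HTML. You are making the same kind of request a browser makes, just programmatically. The difference is what happens next: a browser goes on to render the page, run its JavaScript, and fetch images and stylesheets, while requests hands you the response and stops.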
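One way to see that your script really is the client is to build a request and inspect it before anything is sent. This sketch uses the standard library's urllib.request instead of requests so it runs without extra installs. The User-Agent value is a made-up example.

```python
from urllib.request import Request

# Build the request object only; nothing is sent over the network here.
req = Request(
    "https://example.com/",
    headers={"User-Agent": "my-first-scraper/0.1", "Accept": "text/html"},
)

print(req.get_method())    # HTTP method the client will use
print(req.host)            # host the request is addressed to
print(req.header_items())  # headers this client has chosen to send
```

The server only ever sees the method, the path, and the headers. Whether they came from a browser or from a script like this one is something it has to infer from those values.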
The three HTTP client libraries worth knowing:
HTTP vs HTTPS
HTTPS stands for HyperText Transfer Protocol Secure. It is the same HTTP, but sent over a connection encrypted with TLS (formerly SSL).
The difference in simple terms:
With HTTP, your data travels as plain text. On public Wi-Fi, anyone on the same network can intercept and read it. Passwords, form data, personal information: all of it is visible.
With HTTPS, the data is encrypted before it leaves your device. Even if someone intercepts it, they see only scrambled, unreadable data.
SSL and TLS - What Actually Provides the Security
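Two names come up whenever HTTPS encryption is discussed: SSL (Secure Sockets Layer), the original protocol, which is now deprecated, and TLS (Transport Layer Security), its successor.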
In practice, when people say SSL they usually mean TLS. It is TLS that is actively used today. When you see the lock icon in your browser, TLS is what is running underneath.
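Python shows the same naming quirk: the standard library module is called ssl, yet what it sets up is TLS. A small sketch, assuming a reasonably recent Python, that inspects the default client settings without opening a connection:

```python
import ssl

# Default client-side settings from the standard library.
ctx = ssl.create_default_context()

print(ctx.verify_mode == ssl.CERT_REQUIRED)  # the server must present a valid certificate
print(ctx.check_hostname)                    # the certificate must match the hostname
print(ctx.minimum_version.name)              # oldest protocol version this context accepts
```

The exact minimum version printed depends on your Python and OpenSSL builds, so treat it as something to check on your own machine.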
Why HTTPS is Non-Negotiable for Login Systems
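When a user submits a username and password over plain HTTP, both travel across the network as readable text. Over HTTPS, they are encrypted in transit.
This is not optional. Browsers now actively warn users when a login form is served over HTTP. Many frameworks and hosting platforms can enforce HTTPS for you, and most public APIs accept nothing else.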
One important addition: HTTPS protects data in transit, but it is not the whole picture. You should also hash passwords on the server side (using something like bcrypt), use secure cookies, and implement proper authentication. HTTPS is the foundation, not the complete solution.
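To illustrate the server-side hashing idea, here is a rough sketch. It swaps in PBKDF2 from Python's standard library as a stand-in for bcrypt, only so the example runs without installing anything. The function names and the iteration count are my own assumptions, not a recommendation.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor for this sketch; tune it for real use


def hash_password(password, salt=None):
    """Return (salt, digest) for storage; the plain password is never stored."""
    salt = salt or os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password, salt, expected):
    """Re-hash the attempt with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)


salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))
print(verify_password("wrong guess", salt, stored))
```

HTTPS protects the password on its way to the server, and hashing protects it once it is stored there. The two cover different parts of the problem.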
How HTTPS is Actually Set Up
This is something that confused me early on.
You do not usually enable HTTPS inside your Python or JavaScript code. In a typical deployment, HTTPS is configured at the server and domain level, outside your application code.
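What a developer actually does: obtain a TLS certificate for the domain from a certificate authority, install it on the web server or hosting platform, and let that layer handle the encryption.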
In your code, you simply use the correct URL:
url = "https://example.com"
The security layer is handled entirely outside your code. Think of it this way: HTTPS is the lock on the building. Your code is the conversation happening inside. You do not build the lock inside the conversation. The building already has it.
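On the client side, "use the correct URL" can be enforced with a small guard. This is a sketch with a helper name of my own, ensure_https. It only rewrites the scheme, so it assumes the server actually serves the site over HTTPS.

```python
from urllib.parse import urlsplit, urlunsplit


def ensure_https(url):
    """Return the URL with an https scheme; reject anything that is not a web URL.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"not an HTTP(S) URL: {url!r}")
    return urlunsplit(parts._replace(scheme="https"))


url = ensure_https("http://example.com/login")
print(url)
```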
Why I Started Here Before Scraping
Understanding HTTP before writing scraping code means you understand why your scraper works, why it fails, and what is actually happening between your script and the server.
When your request gets blocked, you know what headers to check. When you see a 403 or 404 status code, you know what it means. When someone asks why you used requests over urllib, you can explain it.
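For the status codes mentioned above, Python's standard library already knows the official names. A minimal sketch, where explain_status is a helper name I made up:

```python
from http import HTTPStatus


def explain_status(code):
    """Return a short, human-readable label for an HTTP status code."""
    try:
        status = HTTPStatus(code)
    except ValueError:
        return f"{code} (not a standard status code)"
    return f"{status.value} {status.phrase}"


for code in (200, 403, 404):
    print(explain_status(code))
```

A 403 means the server understood the request and refused it, which is often where checking your headers starts. A 404 means the path does not exist on that server.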
The fundamentals are not a detour. They are the shortcut.
#Python #WebScraping #LearningInPublic #100DaysOfCode #Programming #HTTP #WebDevelopment