What happens when you type https://www.holbertonschool.com or any URL in your browser and press Enter

Nowadays, we spend much of our time in front of our computers using the internet: to search for information, to communicate with each other, and for many other purposes. But you have probably never thought about what happens behind the scenes to make all of this possible.

In this article, we will explore the main steps that happen behind the scenes, from the moment we press Enter until the website we asked for is loaded and displayed on our screen.

OVERVIEW

***

What is a server?

What is a web server?

The client-server model

What is an IP address?

DNS request

What is a network protocol?

What is HTTP?

What is TCP/IP protocol?

How TCP works

Difference between TCP and UDP

Application server

Database and database server

The load balancer

HTTPS/SSL

Firewall

Conclusion

***

Before diving into the web infrastructure, let's ask ourselves these questions:

Where does the displayed data come from, and how does the web browser get it?

To answer these questions, let's start by defining some commonly used terms.

What is a server?

You can think of a server as a special computer, usually without a keyboard, mouse, or screen, that provides functionalities or services to other computers, called "clients". It is accessed over a network and is typically located in a data center.

All of a website's files, such as HTML documents, images, CSS stylesheets, and JavaScript files, are stored on a server and delivered by a web server.

What is a web server?

A web server is usually a piece of software running on a physical server that controls how web users (clients) access the website's files. The web server responds to clients' requests by sending back static content such as HTML files. Nginx and Apache are two widely used web servers.

The client-server model

Whenever we enter a URL in a browser, we are actually asking for specific files hosted on a server. The browser sends a request to the server asking for those files. When the request reaches the correct server, the web server accepts it and sends the files back to the browser. Finally, the browser interprets (renders) the files to display them in a readable form.

For example, suppose the user types "www.foobar.com". The browser first looks up the IP address of the server hosting the website's files. Once it has the IP address (8.8.8.8 in this example), it sends an HTTP request to that server. The web server installed on it (Nginx) listens for HTTP requests, accepts the request, and responds with the files.
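To make this request/response cycle concrete, here is a minimal sketch in Python that plays both roles on one machine. It assumes Python's built-in http.server module standing in for Nginx and 127.0.0.1 standing in for 8.8.8.8; the page content and the fetch_page name are hypothetical.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical "website file", kept in memory for the demo.
INDEX_HTML = b"<html><body><h1>Hello from foobar</h1></body></html>"


class StaticHandler(BaseHTTPRequestHandler):
    """Plays the web server: answers every GET request with the same HTML file."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(INDEX_HTML)))
        self.end_headers()
        self.wfile.write(INDEX_HTML)

    def log_message(self, format, *args):
        pass  # silence the default request logging


def fetch_page():
    """Start the server on a free local port, request '/', and return (status, body)."""
    server = HTTPServer(("127.0.0.1", 0), StaticHandler)  # port 0 = any free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        host, port = server.server_address
        # The "browser" side: send an HTTP GET request and read the response.
        conn = http.client.HTTPConnection(host, port, timeout=5)
        conn.request("GET", "/")
        response = conn.getresponse()
        body = response.read()
        conn.close()
        return response.status, body
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    status, body = fetch_page()
    print(status, body.decode())
```

A real browser does the same thing, except that the server lives on another machine and the response is rendered instead of printed.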

What is an IP address?

An IP address is a numerical identifier assigned to each machine on a network. Just as we need a person's home address to send them a letter, computers use IP addresses to communicate and send data to each other over a network.

There are two versions of IP addresses in use: IP version 4 (IPv4) and IP version 6 (IPv6).

IPv4 addresses are sequences of four numbers, each from 0 to 255, separated by dots (for example, 8.8.8.8).

An IPv4 address is 32 bits long, so there are only 2^32 possible combinations, just under 4.3 billion unique addresses. Because the number of computers and devices on the Internet keeps growing, we are running out of unique IPv4 addresses.

IPv6 came to the rescue by offering a vastly larger number of unique addresses. An IPv6 address is 128 bits long and is written as eight groups of four hexadecimal digits (0-9 and a-f) separated by colons, for example 2001:0db8:0000:0000:0000:ff00:0042:8329.
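Python's standard ipaddress module can be used to check these numbers. This is a small sketch; the IPv6 address below comes from 2001:db8::/32, a range reserved for documentation, so it is only an example.

```python
import ipaddress

# The example IPv4 address used in this article.
v4 = ipaddress.ip_address("8.8.8.8")
print(v4.version)  # 4

# IPv4 addresses are 32 bits, so the whole IPv4 space holds 2**32 addresses.
all_ipv4 = ipaddress.ip_network("0.0.0.0/0")
print(all_ipv4.num_addresses)  # 4294967296

# IPv6 addresses are 128 bits; "::" is shorthand for a run of all-zero groups.
v6 = ipaddress.ip_address("2001:db8::ff00:42:8329")
print(v6.version)   # 6
print(v6.exploded)  # 2001:0db8:0000:0000:0000:ff00:0042:8329
print(ipaddress.ip_network("::/0").num_addresses == 2 ** 128)  # True
```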

How did the web browser find the IP address of www.foobar.com?

When you type a URL (Uniform Resource Locator) in the browser, the first thing the browser does is break down that URL to extract the domain name of the website.
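Breaking down a URL can be sketched with Python's urllib.parse.urlsplit. The helper name break_down_url and the default-port table are illustrative assumptions; 80 and 443 are the standard default ports for HTTP and HTTPS.

```python
from urllib.parse import urlsplit

# Standard default ports, used when the URL does not give one explicitly.
DEFAULT_PORTS = {"http": 80, "https": 443}


def break_down_url(url):
    """Split a URL into (scheme, host name, port, path)."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    path = parts.path or "/"
    return parts.scheme, parts.hostname, port, path


print(break_down_url("https://www.holbertonschool.com"))
# ('https', 'www.holbertonschool.com', 443, '/')
```

The host name is the part that has to be turned into an IP address; the scheme tells the browser which protocol (and default port) to use.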

Domain names exist because humans remember words much better than IP addresses. To get the IP address of a server, the web browser first checks whether its cache already contains the IP address for the typed domain name. If the browser doesn't find the IP there, it asks the operating system next.

Note that if the user visited the website recently, the browser will likely find the IP in its cache.
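The "check the cache first, then ask the operating system" order can be sketched in Python. Here functools.lru_cache stands in for the browser's cache (an assumption: unlike a real DNS cache, it never expires entries), and socket.getaddrinfo asks the operating system, which, depending on its configuration, may consult its own cache, the hosts file, or a DNS server. The lookup name is hypothetical.

```python
import socket
from functools import lru_cache


@lru_cache(maxsize=256)  # stand-in for the browser's cache (no expiry!)
def lookup(hostname, port=443):
    """Return the IP addresses for hostname, asking the OS only on a cache miss."""
    infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return tuple(sorted({info[4][0] for info in infos}))


if __name__ == "__main__":
    print(lookup("localhost"))   # cache miss: the OS resolves the name
    print(lookup("localhost"))   # cache hit: answered from memory
    print(lookup.cache_info())
```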

Let's assume that the user is visiting the website for the first time, so the IP address of the website can't be found locally. In this case, a query is sent to a remote DNS server.

DNS request

DNS, or Domain Name System, is, in simple words, the technology that translates human-friendly, text-based domain names into machine-friendly, numerical IP addresses.

The DNS request first goes to a resolver, which is usually run by our Internet Service Provider (ISP); most ISPs have servers dedicated to resolving domain names. If the resolver already knows the IP, the resolution process ends there and the resolver sends the IP back to the browser.

If the resolver doesn't know it, it asks a root server. The root server doesn't know the IP address of any website; instead, it knows where the TLD (Top-Level Domain) servers are. In our examples, "foobar.com" and "holbertonschool.com", the top-level domain is ".com".

The resolver then asks the TLD server, which doesn't know the IP either: it points the resolver to the authoritative name servers of the domain name. These are the servers that know the IP address of the domain name (if the website actually exists). They send it back to the resolver, which passes it on to the web browser.

If the domain name doesn't exist, the browser displays an error.

Once the IP address is obtained, it is cached locally so that the long DNS resolution process can be skipped next time.
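The resolution chain described above can be simulated in a few lines. This is a toy model, not a real DNS client: the server names and lookup tables are hypothetical, 8.8.8.8 is just the article's example address, and real resolvers also expire cached answers after a time-to-live (TTL), which is left out here.

```python
# Toy DNS data: every server name below is hypothetical, and 8.8.8.8 is only
# the example address used in this article.
ROOT_SERVER = {"com.": "tld-server-com"}
TLD_SERVERS = {"tld-server-com": {"foobar.com.": "ns1.foobar.com"}}
AUTHORITATIVE_SERVERS = {"ns1.foobar.com": {"www.foobar.com.": "8.8.8.8"}}


class ToyResolver:
    """Mimics an ISP resolver: answer from cache, else ask root, then TLD, then authoritative."""

    def __init__(self):
        self.cache = {}   # no TTL here; real resolvers expire entries
        self.trace = []   # which servers were asked, for illustration

    def resolve(self, name):
        fqdn = name.rstrip(".") + "."
        if fqdn in self.cache:
            self.trace.append("cache")
            return self.cache[fqdn]

        labels = fqdn.rstrip(".").split(".")
        tld = labels[-1] + "."                 # e.g. "com."
        domain = ".".join(labels[-2:]) + "."   # e.g. "foobar.com."

        # 1. The root server only knows which server handles the TLD.
        self.trace.append("root")
        tld_server = ROOT_SERVER.get(tld)
        if tld_server is None:
            raise LookupError(f"unknown top-level domain: {tld}")

        # 2. The TLD server only knows the domain's authoritative name server.
        self.trace.append(tld_server)
        auth_server = TLD_SERVERS[tld_server].get(domain)
        if auth_server is None:
            raise LookupError(f"domain does not exist: {domain}")

        # 3. The authoritative server knows the actual IP address.
        self.trace.append(auth_server)
        ip = AUTHORITATIVE_SERVERS[auth_server].get(fqdn)
        if ip is None:
            raise LookupError(f"host does not exist: {fqdn}")

        self.cache[fqdn] = ip  # cache the answer for next time
        return ip


if __name__ == "__main__":
    resolver = ToyResolver()
    print(resolver.resolve("www.foobar.com"), resolver.trace)
    print(resolver.resolve("www.foobar.com"), resolver.trace)
```

The second call is answered from the cache without contacting any server, which is exactly why repeat visits are faster.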

Now the browser knows the IP address and is ready to send an HTTP request to the server.

What is a network protocol?

A network protocol is a set of rules and conventions for communication between network devices, including how devices identify and connect to each other. Protocols also include formatting rules that specify how data is packaged into the messages that are sent and received.

Without protocols, devices wouldn't be able to understand the electronic signals that they send to each other.

What is HTTP?

HTTP stands for HyperText Transfer Protocol. It is a protocol that defines how messages are formatted and transmitted, and what actions web servers and browsers should take in response to various commands, called HTTP verbs or methods.

For example, when you type a URL and press Enter, the browser sends an HTTP request with the GET method, which means it is asking to retrieve data from a specified resource on the server.

There are other HTTP methods, such as POST, PUT, HEAD, and DELETE. POST, for example, is used to send data to a server to create or update a resource.
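HTTP/1.1 messages are plain text, so the request a browser sends can be built by hand. This sketch assumes HTTP/1.1 (modern browsers often use HTTP/2 or HTTP/3, which carry the same information in binary frames); the build_request helper and the /signup path are hypothetical.

```python
def build_request(method, host, path="/", body=b""):
    """Build a raw HTTP/1.1 request message as bytes."""
    lines = [
        f"{method} {path} HTTP/1.1",   # request line: method, resource, version
        f"Host: {host}",               # which website we want on that server
        "Connection: close",
    ]
    if body:
        # A POST carries data, so the server needs its type and length.
        lines.append("Content-Type: application/x-www-form-urlencoded")
        lines.append(f"Content-Length: {len(body)}")
    # Headers end with an empty line (CRLF CRLF), followed by the optional body.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


if __name__ == "__main__":
    print(build_request("GET", "www.holbertonschool.com").decode())
    print(build_request("POST", "www.foobar.com", "/signup", b"name=Ada").decode())
```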

What is TCP/IP protocol?

TCP/IP stands for Transmission Control Protocol/Internet Protocol. TCP/IP is a set of standardized rules that allow computers to communicate on a network such as the internet.

Although TCP and IP are two separate network protocols, they are used together so often that "TCP/IP" has become the recognized name for this model.

IP takes care of addressing: it makes sure each packet is sent to the right destination address. TCP is responsible for how the data is delivered: it makes sure the data is received in order and checked for errors.

How TCP works

When a client sends a request to a server, the data is broken into packets. A packet is a small unit of data transmitted over the network. Likewise, the web server responds by sending packets back.

With TCP, every packet sent is tracked: the receiver acknowledges what it gets, and missing or corrupted packets are sent again. This is what makes TCP reliable.
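The idea of numbered, tracked packets can be illustrated with a toy model. This is not real TCP, which numbers bytes rather than packets and handles acknowledgements and retransmission inside the operating system; the function names are hypothetical.

```python
def split_into_packets(data, size):
    """Cut data into (sequence_number, chunk) pairs, like numbered packets."""
    return [(seq, data[i:i + size]) for seq, i in enumerate(range(0, len(data), size))]


def reassemble(packets, expected_count):
    """Put packets back in order; report which sequence numbers never arrived."""
    received = dict(packets)  # arrival order does not matter, only the numbers
    missing = [seq for seq in range(expected_count) if seq not in received]
    if missing:
        # Real TCP would notice the gap and get these packets sent again.
        return None, missing
    return b"".join(received[seq] for seq in range(expected_count)), []


if __name__ == "__main__":
    packets = split_into_packets(b"Hello, Holberton!", 4)
    print(reassemble(list(reversed(packets)), len(packets)))  # out of order, still fine
    print(reassemble(packets[:2] + packets[3:], len(packets)))  # packet 2 was lost
```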

Difference between TCP and UDP

Like TCP, UDP (User Datagram Protocol) is a widely used protocol for sending packets over the Internet. However, UDP is not reliable: packets are not tracked or resent, so they may be lost or arrive out of order. On the other hand, UDP is faster and lighter than TCP.
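Python's socket module exposes both protocols, which makes the difference easy to see on one machine. In this sketch, SOCK_STREAM gives a TCP connection that must be set up before any data flows, while SOCK_DGRAM sends an independent UDP datagram with no connection at all. On the local loopback interface the datagram will normally arrive, but nothing in UDP itself guarantees it. The function names are hypothetical.

```python
import socket


def tcp_round_trip(message):
    """Send message over a local TCP connection and return what the server side received."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))  # port 0 = any free port
        server.listen(1)
        port = server.getsockname()[1]
        # create_connection() completes TCP's handshake before data is sent.
        with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
            conn, _ = server.accept()
            with conn:
                conn.settimeout(5)
                client.sendall(message)
                client.shutdown(socket.SHUT_WR)  # tell the server we are done
                chunks = []
                while True:
                    chunk = conn.recv(1024)
                    if not chunk:  # empty read = sender closed its side
                        break
                    chunks.append(chunk)
                return b"".join(chunks)


def udp_round_trip(message):
    """Send one UDP datagram locally: no connection, no acknowledgement."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver:
        receiver.bind(("127.0.0.1", 0))
        receiver.settimeout(5)  # a lost datagram would simply never show up
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
            sender.sendto(message, receiver.getsockname())
        data, _ = receiver.recvfrom(65535)
        return data


if __name__ == "__main__":
    print(tcp_round_trip(b"hello over TCP"))
    print(udp_round_trip(b"hello over UDP"))
```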

Application server

With only a web server, the delivered pages can only be static: they are the same for every visitor, and there is no real interaction between users and the website. To have a dynamic website, we need an application server.

An application server runs the website's application code: it takes the parameters of a request, generates the content dynamically (often using data from a database), and returns the result to the web server, which then sends it to the client.
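In Python, the interface between a web server and the application code is usually WSGI (PEP 3333). Here is a minimal sketch of a dynamic page; the application and call_app names are hypothetical, and call_app invokes the application directly, roughly the way a WSGI server such as Gunicorn would, instead of going through a real web server.

```python
from html import escape
from urllib.parse import parse_qs
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """Generate an HTML page from the request's query parameters."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    name = params.get("name", ["visitor"])[0]
    # The content depends on the request, which a static file cannot do.
    body = f"<h1>Hello, {escape(name)}!</h1>".encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8"),
                              ("Content-Length", str(len(body)))])
    return [body]


def call_app(app, query):
    """Invoke a WSGI app directly and return (status, body)."""
    environ = {"QUERY_STRING": query}
    setup_testing_defaults(environ)  # fill in the other required WSGI keys
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], body


if __name__ == "__main__":
    print(call_app(application, "name=Ada"))
    print(call_app(application, ""))
```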

Database and database server

To store, retrieve, and manipulate information, we need a database and a database server.

A database is an organized collection of data. There are several types of databases; the most commonly used type is the relational database. A relational database stores data in tables, which may or may not be linked to each other through primary and foreign keys.
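Primary and foreign keys can be tried out with Python's built-in sqlite3 module. The students/projects schema and its rows are a hypothetical example; note that SQLite only enforces foreign keys when PRAGMA foreign_keys is turned on for the connection.

```python
import sqlite3


def build_demo_db():
    """Create two linked tables in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
    conn.executescript("""
        CREATE TABLE students (
            id   INTEGER PRIMARY KEY,          -- primary key: unique row id
            name TEXT NOT NULL
        );
        CREATE TABLE projects (
            id         INTEGER PRIMARY KEY,
            title      TEXT NOT NULL,
            student_id INTEGER NOT NULL REFERENCES students(id)  -- foreign key
        );
    """)
    conn.executemany("INSERT INTO students (id, name) VALUES (?, ?)",
                     [(1, "Ada"), (2, "Linus")])
    conn.executemany("INSERT INTO projects (title, student_id) VALUES (?, ?)",
                     [("Web stack", 1), ("Load balancer", 1), ("Firewall", 2)])
    return conn


def projects_by_student(conn):
    """Follow the foreign key with a JOIN to pair each project with its student."""
    return conn.execute("""
        SELECT students.name, projects.title
        FROM projects JOIN students ON projects.student_id = students.id
        ORDER BY students.name, projects.title
    """).fetchall()


if __name__ == "__main__":
    print(projects_by_student(build_demo_db()))
```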

Now, what about database servers?

A database server is the software that lets us interact with the database. Using a database server, we can perform many tasks such as data analysis, storage, data manipulation, archiving, and other non-user-specific tasks.

The load balancer

When a website has a large number of visitors, a single server may not be able to handle all the requests. To keep the website running without downtime, it makes sense to use several servers instead of one.

But how will client requests be distributed among these servers? Which server should receive each request?

A load balancer is, as its name indicates, a piece of software (or a hardware device) that balances, or distributes, the traffic load across the servers according to a load-balancing algorithm. HAProxy is a very commonly used load balancer.

There are several load-balancing algorithms, such as round robin, weighted, and least connections.

Round-robin algorithm: round robin passes each new connection request to the next server in line, eventually distributing connections evenly across the group of servers being load balanced.
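Round robin is simple enough to sketch with itertools.cycle. The RoundRobin class and the server names are hypothetical; real load balancers such as HAProxy implement this logic internally.

```python
from itertools import cycle


class RoundRobin:
    """Hands each new request to the next server in line, wrapping around at the end."""

    def __init__(self, servers):
        self._servers = cycle(servers)  # endless iterator: a, b, c, a, b, c, ...

    def pick(self):
        return next(self._servers)


if __name__ == "__main__":
    lb = RoundRobin(["web-01", "web-02", "web-03"])
    print([lb.pick() for _ in range(5)])
```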

Weighted algorithm: the number of requests each server receives is proportional to a weight that we define for each machine. This weight can be based on each machine's capabilities.

For example, we can say "Machine 3 can handle twice the load of machine 1 or machine 2", and the load balancer will send two requests to machine 3 for every request it sends to each of the other two.
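Here is a minimal sketch of the weighted idea, using the weights from the machine 1/2/3 example (1, 1, and 2). It is the simplest variant, which repeats each server in the schedule according to its weight; production load balancers often interleave the picks more smoothly, but the proportions come out the same. The function name is hypothetical.

```python
from itertools import cycle


def weighted_round_robin(weights):
    """Yield server names so that each appears in proportion to its integer weight."""
    # Repeat each server `weight` times, then loop over that schedule forever.
    schedule = [server for server, weight in weights.items() for _ in range(weight)]
    return cycle(schedule)


if __name__ == "__main__":
    picks = weighted_round_robin({"machine1": 1, "machine2": 1, "machine3": 2})
    print([next(picks) for _ in range(8)])
```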

Least connections algorithm: each new request is sent to the server that is currently handling the fewest active connections.
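Least connections needs to know how many connections each server is currently handling. In this sketch, acquire() is called when a request is forwarded and release() when it finishes; the class and server names are hypothetical.

```python
class LeastConnections:
    """Send each request to the server with the fewest active connections."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def acquire(self):
        # min() returns the first server with the lowest count, so ties go to list order.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Called when a request finishes and its connection closes.
        self.active[server] -= 1


if __name__ == "__main__":
    lb = LeastConnections(["web-01", "web-02"])
    print(lb.acquire(), lb.acquire(), lb.acquire())
    lb.release("web-02")
    print(lb.acquire(), lb.active)
```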

HTTPS/SSL

In the URL "https://www.holbertonschool.com", we can see that the protocol is https, not http. So, what is HTTPS?

HyperText Transfer Protocol Secure (HTTPS) is the secure version of HTTP. The 'S' at the end of HTTPS stands for 'Secure': all communications between the web browser and the website are encrypted. HTTPS is often used to protect highly confidential online transactions such as online banking and online shopping order forms.

With regular HTTP connections, all communications are in plain text and can be read by anyone who manages to intercept the traffic between the browser and the website. This is a clear danger if the communication includes sensitive information such as credit card details or a social security number. With an HTTPS connection, all communications are encrypted, so even if somebody intercepts the connection, they cannot decrypt the data exchanged between you and the website.

When a website uses HTTPS, most browsers display a padlock (or a similar) icon in the address bar.

How Does HTTPS Work?

HTTPS websites encrypt communications using TLS (Transport Layer Security) or its older, now-deprecated predecessor SSL (Secure Sockets Layer); the name "SSL" is still widely used for both. These protocols rely on asymmetric (public-key) cryptography, backed by a Public Key Infrastructure (PKI).

An asymmetric system uses two keys: a public key and a private key. Anything encrypted with the public key can only be decrypted with the matching private key.
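The public/private key idea can be shown with textbook RSA and deliberately tiny numbers. This is for illustration only: real keys are 2048 bits or more, real systems add padding, and TLS uses public-key operations only during the handshake. pow(e, -1, phi) requires Python 3.8 or later.

```python
# Textbook RSA with tiny primes: insecure, for illustration only.
p, q = 61, 53
n = p * q                # 3233, shared by both keys
phi = (p - 1) * (q - 1)  # 3120
e = 17                   # public exponent: public key is (e, n)
d = pow(e, -1, phi)      # private exponent (2753): private key is (d, n)


def encrypt(message, public_key=(e, n)):
    """Anyone holding the public key can encrypt a number smaller than n."""
    exponent, modulus = public_key
    return pow(message, exponent, modulus)


def decrypt(ciphertext, private_key=(d, n)):
    """Only the holder of the private key can undo the encryption."""
    exponent, modulus = private_key
    return pow(ciphertext, exponent, modulus)


if __name__ == "__main__":
    secret = 65
    ciphertext = encrypt(secret)
    print(ciphertext, decrypt(ciphertext))
```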

So, when the browser opens an HTTPS connection, an "SSL handshake" (more precisely, a TLS handshake) takes place. During the handshake, the website sends its SSL certificate to the browser. This certificate contains the server's public key, and the browser checks that it is valid for the website and signed by a trusted certificate authority. The server then proves that it holds the matching private key, and the two machines agree on a shared secret key that an eavesdropper cannot work out. From then on, all traffic is encrypted with that shared key (using faster symmetric encryption), and a secure connection between the two machines is established.
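From a client program's point of view, the handshake is handled by a TLS library. Here is a sketch with Python's ssl module, assuming the system's trusted certificate authorities are available; fetch_tls_info is a hypothetical helper that needs network access and is not called below.

```python
import socket
import ssl

# Browser-like TLS settings: trust the system's certificate authorities,
# require a valid certificate, and check that it matches the host name.
context = ssl.create_default_context()
print(context.check_hostname)                    # True
print(context.verify_mode == ssl.CERT_REQUIRED)  # True


def fetch_tls_info(host, port=443):
    """Connect over TCP, run the TLS handshake, and return (TLS version, certificate subject)."""
    with socket.create_connection((host, port), timeout=10) as sock:
        # wrap_socket performs the handshake; it raises ssl.SSLCertVerificationError
        # if the certificate is not trusted or does not match `host`.
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version(), tls.getpeercert()["subject"]


# Example (requires network access):
# print(fetch_tls_info("www.holbertonschool.com"))
```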

Firewall

A firewall is a network security system designed to prevent unauthorized access to or from a private network. Firewalls can be hardware or software.

There are two categories of firewalls: network firewalls and host-based firewalls.

Network firewalls are frequently used to prevent unauthorized Internet users from accessing private networks connected to the Internet, especially intranets. All messages entering or leaving the intranet pass through the firewall, which examines each message and blocks those that do not meet the specified security criteria. Host-based firewalls run on individual host computers and control network traffic in and out of those machines.

To protect them from hackers and attacks, servers and load balancers are often equipped with firewalls.

For example, we can configure a firewall to accept connections only on port 22 (SSH), port 80 (HTTP), and port 443 (HTTPS). That way, if an attacker tries to connect on any other port, the firewall rejects the request. We can also configure the firewall to accept connections only from a certain list or range of IP addresses.
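The decision logic of such a rule set can be modeled in a few lines of Python. This is only a toy model of what a firewall decides (real firewalls such as iptables or ufw are configured rather than programmed like this); the allow function and the trusted range 192.0.2.0/24, which is reserved for documentation, are assumptions, and this example restricts SSH to that range while leaving HTTP and HTTPS open to everyone.

```python
import ipaddress

ALLOWED_PORTS = {22, 80, 443}  # SSH, HTTP, HTTPS
# Hypothetical trusted range for SSH (192.0.2.0/24 is a documentation range).
SSH_ALLOWED_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def allow(source_ip, dest_port):
    """Return True if an incoming connection should be accepted (default: deny)."""
    if dest_port not in ALLOWED_PORTS:
        return False  # any other port is blocked
    if dest_port == 22:
        # Only administrators from the trusted range may open SSH sessions.
        ip = ipaddress.ip_address(source_ip)
        return any(ip in network for network in SSH_ALLOWED_NETWORKS)
    return True  # web traffic is open to everyone


if __name__ == "__main__":
    for src, port in [("203.0.113.5", 443), ("203.0.113.5", 3306), ("192.0.2.10", 22)]:
        print(src, port, "ACCEPT" if allow(src, port) else "DROP")
```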

Conclusion

Now, let's recap!

What happens when you type https://www.holbertonschool.com or any URL in your browser and press Enter?

First, your browser finds the IP address of the website from its domain name, "www.holbertonschool.com", using its cache, the operating system, or DNS. Once the IP is found, the browser sends an HTTPS request to the servers hosting the website. The request is first checked by the firewall; if it passes, a TCP connection is opened and secured with the SSL/TLS handshake, so that a secure HTTPS connection is established between the two machines.

The request is received by the load balancer, which forwards it to one of the servers according to the configured load-balancing algorithm. The chosen web server receives the request, gets the requested files (with the help of the application server and the database for dynamic content), and sends them back to the browser in an HTTPS response.

Finally, the browser receives the packets of data and renders the page for you.

More articles by Mariem Matri
