What Happens When You Type https://www.google.com on Your Browser and Then Enter?
Here is a walk-through of what happens when we type a URL (Uniform Resource Locator) such as https://www.google.com into a web browser (the client) and press Enter.
URL parsing:
The browser first parses what was typed to decide whether it is a website address or a search term. In our case it is a website address, so the next step is to look up the IP address of the domain name.
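To make the parsing step concrete, here is a minimal sketch using Python's standard urllib.parse module. The classify_input function and its rules are a made-up heuristic for illustration only; real browsers apply far more elaborate logic.

```python
from urllib.parse import urlsplit


def classify_input(text: str) -> str:
    """Return 'url' or 'search' for text typed into an address bar (toy heuristic)."""
    text = text.strip()
    parts = urlsplit(text)
    # An explicit http(s) scheme followed by a host is treated as a URL.
    if parts.scheme in ("http", "https") and parts.netloc:
        return "url"
    # No spaces and at least one dot: looks like a bare domain such as google.com.
    if text and " " not in text and "." in text:
        return "url"
    return "search"


parts = urlsplit("https://www.google.com")
print(parts.scheme, parts.hostname)              # scheme and host extracted from the URL
print(classify_input("https://www.google.com"))  # treated as a website
print(classify_input("what is dns"))             # treated as a search term
```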
What is an IP address? It is a numeric identifier used to identify and communicate with a machine on a network. There are currently two kinds: IPv4 addresses (32 bits long) and IPv6 addresses (128 bits long).
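The two address families can be inspected with Python's standard ipaddress module. This is only a small illustration; the IPv6 address below comes from the range reserved for documentation and is not a Google address.

```python
import ipaddress

v4 = ipaddress.ip_address("216.58.223.196")  # the IPv4 address used in this article
v6 = ipaddress.ip_address("2001:db8::1")     # documentation-only IPv6 address (assumption)

# max_prefixlen is the size of the address in bits.
print(v4.version, v4.max_prefixlen)
print(v6.version, v6.max_prefixlen)

# An IPv4 address is just a 32-bit number written in dotted form.
print(int(v4))
```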
DNS LOOKUP: The DNS (Domain Name System) works like the phonebook on our phones. It stores records that map a domain name to the IP address of the server (here, a Google server) you want to communicate with. Since humans find it difficult to memorize the 32-bit numbers used as IPv4 addresses, names such as yahoo.com or google.com are much easier to remember. DNS resolves the domain name to its IP address (for example, www.google.com to 216.58.223.196). The browser first checks its own cache to see whether www.google.com has been resolved recently. If it doesn't find the IP address there, it asks the OS (Operating System), which checks its own cache and the system hosts file. If the address is still not found, the query is passed to a DNS resolver and further steps are taken.
To dive deeper into how DNS works, we need to introduce some terminology:
Resolver: The recursive DNS resolver, usually run by the ISP. If it has not resolved google.com recently, it has no cached answer, so it goes to a root server for help.
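The lookup order described above can be sketched as follows. This is a simplified model, not how any particular browser or OS is implemented: the lookup function, the two dictionaries and the fake resolver are all hypothetical, and the real fallback (socket.gethostbyname) needs network access.

```python
import socket


def lookup(hostname, browser_cache, hosts_file, resolve=None):
    """Check the browser cache, then hosts-file entries, then fall back to a resolver."""
    if hostname in browser_cache:
        return browser_cache[hostname], "browser cache"
    if hostname in hosts_file:
        return hosts_file[hostname], "hosts file"
    # Last resort: ask the resolver (by default the OS resolver, which queries DNS).
    resolve = resolve or socket.gethostbyname
    address = resolve(hostname)
    browser_cache[hostname] = address  # remember the answer for next time
    return address, "resolver"


# A stand-in resolver so the example runs without a network connection.
def fake_resolver(hostname):
    return "216.58.223.196"


cache, hosts = {}, {"localhost": "127.0.0.1"}
print(lookup("www.google.com", cache, hosts, fake_resolver))  # answered by the resolver
print(lookup("www.google.com", cache, hosts, fake_resolver))  # answered from the cache
```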
Root server: The top of the DNS hierarchy. There are 13 sets of root servers, operated by twelve organizations, with instances placed strategically around the world. Each set has its own IP address.
The root server does not know the IP address of google.com, but it knows where to send the Resolver next: the TLD (Top Level Domain) server.
TLD: This server manages a top-level domain such as .com, .net, .org or .gov, and google.com falls under .com. The TLD server does not hold the IP address of www.google.com itself; it keeps track of which name servers are authoritative for each domain under it. So it refers the Resolver to the last level, the Authoritative Name Server.
Authoritative Name Server: This server holds the actual DNS records, including the name-to-IP-address mappings, for the domains it is responsible for. The IP address of www.google.com is found here. The Resolver caches it and sends it back to the browser. Behind the scenes, the browser now connects to 216.58.223.196 on port 443 (the default HTTPS port), while still treating the site as https://www.google.com:443/.
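The chain of referrals from root server to TLD server to authoritative name server can be modelled with a few dictionaries. This is a toy simulation under stated assumptions: every server name below is made up, no real DNS query is sent, and only the final IP address is the one quoted in this article.

```python
# Toy data: all server names are hypothetical placeholders.
ROOT = {"com": "tld-com.example"}
TLD = {"tld-com.example": {"google.com": "ns.google.example"}}
AUTHORITATIVE = {"ns.google.example": {"www.google.com": "216.58.223.196"}}


def resolve(hostname):
    """Follow referrals root -> TLD -> authoritative and return (address, steps)."""
    labels = hostname.split(".")
    tld = labels[-1]                 # "com"
    domain = ".".join(labels[-2:])   # "google.com"
    steps = []

    tld_server = ROOT[tld]                           # root refers us to the TLD server
    steps.append(("root", tld_server))
    auth_server = TLD[tld_server][domain]            # TLD refers us to the authoritative server
    steps.append(("tld", auth_server))
    address = AUTHORITATIVE[auth_server][hostname]   # authoritative server has the record
    steps.append(("authoritative", address))
    return address, steps


address, steps = resolve("www.google.com")
for level, answer in steps:
    print(level, "->", answer)
```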
COMMUNICATION BETWEEN THE CLIENT AND GOOGLE SERVER:
The OSI and TCP/IP models are the two most often referred to when describing how communication protocols work over the internet.
TCP/IP is a protocol model that defines how data is transmitted and received over a network. It consists of 4 layers, compared to the OSI model's 7. The four layers are:
1. Network Access Layer.
2. Internet Layer.
3. Transport Layer.
4. Application Layer.
Depending on the network operation (sending or receiving data packets), the layers are traversed from top to bottom or vice versa. In our case the browser is sending a request, so the data moves down the stack from the Application Layer to the Network Access Layer (4 -> 1).
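When data is sent, it starts at the Application Layer and each layer below wraps it with its own header; the receiver unwraps the headers in the reverse order. The sketch below only illustrates that idea: the bracketed labels are placeholders, not real TCP, IP or Ethernet headers, and the function names are hypothetical.

```python
# Layers in sending order: the application produces the data, lower layers wrap it.
LAYERS = ["Application", "Transport", "Internet", "Network Access"]


def encapsulate(payload: str) -> str:
    """Wrap the application payload in one placeholder header per lower layer."""
    data = payload
    for layer in LAYERS[1:]:
        data = f"[{layer}]{data}"
    return data


def decapsulate(frame: str) -> str:
    """Strip the placeholder headers in the reverse order, as a receiver would."""
    for layer in reversed(LAYERS[1:]):
        header = f"[{layer}]"
        if not frame.startswith(header):
            raise ValueError(f"missing {layer} header")
        frame = frame[len(header):]
    return frame


frame = encapsulate("GET / HTTP/1.1")
print(frame)               # outermost header belongs to the Network Access layer
print(decapsulate(frame))  # the original application data
```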
Since we are using HTTPS, a secure connection between the client and server will be created so that data is not sent as plain text but encrypted and decrypted when necessary.
HTTPS: It stands for Hyper Text Transfer Protocol Secure. It uses TLS (Transport Layer Security), the successor to the older SSL (Secure Sockets Layer), to set up a secure connection, and public key cryptography is used to secure the data. First, our computer opens a TCP connection to Google's server with a 3-way handshake. Then, in the TLS handshake that follows, our computer asks google.com to identify itself and google.com responds by sending its certificate (often still called an SSL certificate); if this certificate is trusted, the two sides agree on encryption keys. The TCP 3-way handshake looks like this:
Client ------SYN-----> Server
Client <---SYN/ACK---- Server
Client ------ACK-----> Server
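We never write the SYN and ACK packets ourselves; the operating system sends them when a program opens a TCP connection. The sketch below shows this on the local machine only, with a throwaway server on the loopback address standing in for Google's server.

```python
import socket

# A throwaway server listening on a free port chosen by the OS (loopback only).
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]

# create_connection() returns once the kernel has completed SYN, SYN/ACK, ACK.
client = socket.create_connection(("127.0.0.1", port), timeout=5)
conn, peer = server.accept()

# With the connection established, data can flow.
client.sendall(b"hello")
received = conn.recv(5)
print(received)

for s in (client, conn, server):
    s.close()
```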
From then on, the secure session acts as a tunnel, and all data sent to Google's server is encrypted. Note that TLS (Transport Layer Security) is the successor to SSL and, like SSL, uses a digital certificate to create a secure connection to a web server.
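In Python, the standard ssl module can set up such a secure connection on top of a TCP socket. This is a minimal sketch, not what a browser does internally: the two function names are hypothetical, the minimum version of TLS 1.2 is my own choice, and negotiated_tls_version needs network access, so it is defined here but not called.

```python
import socket
import ssl
from typing import Optional


def make_client_context() -> ssl.SSLContext:
    """Build a client context that verifies the server's certificate and hostname."""
    context = ssl.create_default_context()            # loads the system's trusted CAs
    context.minimum_version = ssl.TLSVersion.TLSv1_2  # assumption: refuse older versions
    return context


def negotiated_tls_version(host: str, port: int = 443) -> Optional[str]:
    """Open a TCP connection, run the TLS handshake, and report the TLS version."""
    context = make_client_context()
    with socket.create_connection((host, port), timeout=5) as raw:
        # The handshake happens here; an untrusted certificate raises an ssl.SSLError.
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return tls.version()


context = make_client_context()
print(context.verify_mode == ssl.CERT_REQUIRED, context.check_hostname)
```

With a network connection, calling negotiated_tls_version("www.google.com") would perform the handshake against the real server.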
Now that we have securely established a connection, we can send data and queries to Google's server and receive data/information using HTTP requests and responses. In other words, Google's web server returns a web page that our browser displays on the screen.
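To give a feel for what an HTTP request and response look like, the sketch below builds a request by hand and reads the status line of a response. It uses the plain-text HTTP/1.1 format for illustration (browsers may use newer HTTP versions), the helper names are hypothetical, and the sample response is made up rather than captured from Google.

```python
def build_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request as raw bytes."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
        "",  # blank line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_status_line(response: bytes):
    """Return (version, status code, reason) from the first line of a response."""
    status_line = response.split(b"\r\n", 1)[0].decode("ascii")
    version, code, reason = status_line.split(" ", 2)
    return version, int(code), reason


request = build_request("www.google.com")
print(request.decode("ascii"))

# A made-up response, only to show the shape of what the server sends back.
sample = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html>...</html>"
print(parse_status_line(sample))
```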
The big question is this: what happens when about 1 million people try connecting to one Google server simultaneously with different session times?
A single server cannot serve that many requests at once, so Google runs thousands of servers that share the work. To spread the requests across them, a load balancer is needed.
Load balancing is an important part of highly available infrastructure. It improves the availability, performance and reliability of websites, applications, databases and other services by distributing the workload/traffic across the servers available in the architecture. Load balancers perform health checks on the servers and use load-balancing algorithms to decide where each request goes. Because a load balancer can itself fail, two or more load balancers can be used to prevent a Single Point of Failure in the architecture.
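As an illustration, here is a minimal sketch of one common load-balancing algorithm, round robin, combined with a very simple notion of health checks. The class, its methods and the server names are hypothetical; this is not how Google's load balancers are implemented.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand out servers in turn, skipping any that are marked unhealthy."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._ring = cycle(self.servers)

    def mark_down(self, server):
        """Record a failed health check."""
        self.healthy.discard(server)

    def mark_up(self, server):
        """Record a passed health check."""
        if server in self.servers:
            self.healthy.add(server)

    def next_server(self):
        if not self.healthy:
            raise RuntimeError("no healthy servers available")
        # At most one full lap around the ring is needed to find a healthy server.
        for _ in range(len(self.servers)):
            server = next(self._ring)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")


balancer = RoundRobinBalancer(["server-1", "server-2", "server-3"])
print([balancer.next_server() for _ in range(4)])
balancer.mark_down("server-2")  # pretend server-2 failed its health check
print([balancer.next_server() for _ in range(3)])
```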
Thanks for your time.
If there is anything you want me to write about, you can drop it in the comment section.