What happens when you type https://www.holbertonschool.com in your browser and press 'Enter'?
Photo by Deepanker Verma from Pexels


There is no denying that the internet, with its fast-paced advances, has become an essential resource around the world. It gives access to a wide variety of information and services, and it is easy and convenient to use. Anyone with access to it has most likely become so familiar with it that browsing for anything on the internet is second nature. The browser may seem like the tool that makes all the magic happen, but much more happens behind it to bring us the desired content. Given the complexity of this process, I'll answer the question in the title by covering the different components of the networking infrastructure involved.

The client-server model

The resources we browse on the internet exist in an information system called the World Wide Web (WWW), also known as the Web. These resources are hosted by servers that respond to incoming requests from clients over the internet. This communication framework between clients and service providers is known as the client-server model, and it relies on web technologies and protocols such as the Hypertext Transfer Protocol (HTTP) and the Domain Name System (DNS), among others. For example, if I want to get the resources of the Holberton School's web page, I use the browser (client) and type in the Uniform Resource Locator (URL) to connect to the website over the internet and request its content; the computer system in charge (server) then serves that content back. A URL (e.g. http://www.holbertonschool.com) is basically the address of a unique resource available through the Web. It is composed of different functional parts: first comes the scheme (http), which indicates the protocol that the browser must use to request the resource, and then the domain name (www.holbertonschool.com), which indicates which web server is being requested. After the domain name, a URL can carry a port number, separated by a ':'. When no port is given, the browser uses the default port for the scheme, which is 80 for http and 443 for https. Even though we don't see it in the address bar, the browser treats the example as if it were written http://www.holbertonschool.com:80
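To make the parts of a URL more concrete, here is a minimal Python sketch that splits one with the standard urllib.parse module. The describe_url helper and the DEFAULT_PORTS table are names I made up for illustration; they mimic how a browser falls back to the default port, and are not how any particular browser is implemented.

```python
from urllib.parse import urlsplit

# Default ports for the two web schemes discussed in this article.
DEFAULT_PORTS = {"http": 80, "https": 443}


def describe_url(url):
    """Split a URL into its parts and fill in the port a browser would use."""
    parts = urlsplit(url)
    # parts.port is None when the URL has no explicit ":port" section.
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": port,
        "path": parts.path or "/",
    }
```

Calling describe_url("http://www.holbertonschool.com") yields the scheme http, the host www.holbertonschool.com, and port 80, while the same address with https yields port 443.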

Protocols

As a quick reminder, the internet is the global system of interconnected computer networks. In simple terms, it is a channel for transporting data from one computer to another. To achieve this communication it uses the TCP/IP suite of protocols, named after its two original protocols: the Transmission Control Protocol (TCP) and the Internet Protocol (IP). TCP provides a way to deliver and receive an ordered and error-checked stream of information over the network. For example, when a web page is requested in a browser, the computer sends packets of data to the web server's address asking for the content, and the server replies with its own data packets. These packets are numbered, tracked, and checked for errors, and any that are lost or corrupted are sent again, which makes the exchange reliable. TCP builds on IP, which is the foundation of what we know as the internet. IP is in charge of delivering packets from the source host to the destination host based on IP addresses, which are numeric labels assigned to each device connected to a network that uses IP for communication. To that end, IP defines the format of the packets and provides an addressing system. Another prominent protocol is the previously mentioned HTTP, which is used for transmitting hypermedia documents like HTML. It was designed so that clients and servers can communicate, and web servers listen for it on port 80 by default.
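To see how HTTP rides on top of a TCP connection, the following Python sketch starts a tiny web server on the local machine and talks to it through a raw socket. Everything here is an assumption made for the demonstration: HelloHandler, fetch_over_tcp, and demo are hypothetical names, and the server runs on 127.0.0.1 with a port chosen by the operating system instead of port 80, so no real website is contacted.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a real web server: answers every GET with a tiny page."""

    def do_GET(self):
        body = b"<h1>Hello</h1>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch_over_tcp(host, port, path="/"):
    """Open a TCP connection, send a plain-text HTTP request, return the raw reply."""
    request = f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(request.encode("ascii"))
        chunks = []
        while True:
            data = conn.recv(4096)  # TCP hands the bytes over in order
            if not data:            # empty read: the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)


def demo():
    """Run the local server in a background thread and fetch one page from it."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: let the OS choose
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        return fetch_over_tcp("127.0.0.1", server.server_address[1])
    finally:
        server.shutdown()
        server.server_close()
```

The reply returned by demo() begins with a status line, followed by the headers and, after an empty line, the HTML body. A browser does the same thing at a much larger scale.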

The DNS request

Let's say you want to visit a place: you certainly know its name but you are missing its address, so you turn to some sort of directory for that information. Similarly, the Domain Name System (DNS) works as a helping hand so that clients can reach the desired servers. As mentioned before, IP addresses are fundamental to establishing network connections, but these sequences of numbers are difficult to remember. This is where DNS joins in: it translates the domain name (the name of the web page as we humans know it) into the IP address (the piece of information that the browser actually needs to visit that page). It might seem like a simple task, but it involves many different steps and organizations. As an example, say we want to visit www.holbertonschool.com for the first time with our browser. For clarification purposes, I'm going to review the anatomy of the domain name before continuing. A domain name has different sections and, as you may have noticed, they are divided by periods. The hierarchical levels are best read from right to left: first, the top-level domain (e.g. org, net, com); second, the root domain (e.g. holbertonschool.com); and third, the subdomain of the root domain (e.g. www.holbertonschool.com). There can be many subdomains, as long as they all belong to their root domain. So, initially the browser checks its own cache and the operating system's cache to see if the IP address is stored from a previous visit, but this fails because it's the very first time. In this case, the operating system is configured to send a DNS request for that domain name to a resolving name server (e.g. one run by the Internet Service Provider). Then, if the resolver doesn't have the IP address in its own cache, it starts a hunting process, asking other name servers.
First, it asks the Root Name Servers, which don't know the IP address themselves but direct the resolver to the Top Level Domain (TLD) Name Servers in charge of com. These don't know the address either, so they direct the resolver to the Authoritative Name Servers for holbertonschool.com. This last set of name servers ultimately provides the IP address of www.holbertonschool.com to the resolver, which takes it back to the browser. All of these steps are required for just one domain name lookup; fortunately, they take only milliseconds to complete.
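The lookup chain described above can be sketched in Python as a simulation. No real DNS traffic is involved: the three dictionaries are toy stand-ins for the root, TLD, and authoritative name servers, the server names in them are invented, and the IP address is a placeholder from the 192.0.2.0/24 documentation range, not the site's real address.

```python
# Toy data standing in for the real name-server hierarchy (all values are made up).
ROOT_SERVERS = {"com": "tld-com"}
TLD_SERVERS = {"tld-com": {"holbertonschool.com": "auth-holberton"}}
AUTHORITATIVE_SERVERS = {"auth-holberton": {"www.holbertonschool.com": "192.0.2.10"}}


class Resolver:
    """Resolving name server with a cache; walks the hierarchy on a cache miss."""

    def __init__(self):
        self.cache = {}
        self.trace = []  # human-readable log of the steps taken

    def resolve(self, name):
        if name in self.cache:
            self.trace.append(f"cache: {name} is already known")
            return self.cache[name]

        labels = name.split(".")
        tld = labels[-1]                # e.g. "com"
        domain = ".".join(labels[-2:])  # e.g. "holbertonschool.com"

        # Step 1: the root servers point to the TLD servers.
        tld_server = ROOT_SERVERS[tld]
        self.trace.append(f"root: ask {tld_server} about .{tld}")

        # Step 2: the TLD servers point to the authoritative servers.
        auth_server = TLD_SERVERS[tld_server][domain]
        self.trace.append(f"TLD: ask {auth_server} about {domain}")

        # Step 3: the authoritative servers hold the actual address.
        address = AUTHORITATIVE_SERVERS[auth_server][name]
        self.trace.append(f"authoritative: {name} is at {address}")

        self.cache[name] = address  # remember the answer for the next visit
        return address
```

The first call to Resolver().resolve("www.holbertonschool.com") records three steps in trace, and a second call on the same resolver records only a cache hit, which is why repeat visits skip the whole hunt.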

Security

While the internet brings many benefits by connecting computers, it can also pose many security and privacy risks. Therefore, it's important to know some of the security options in a web infrastructure design. Let's start with the firewall, which is basically a barrier between a private network and an outer network that manages the traffic passing between the two. It can allow, block, and limit network traffic based on preconfigured rules in hardware or software. Then we have HTTPS, which defaults to port 443 and is the secure version of HTTP, hence the added S. HTTPS pages use the Secure Sockets Layer (SSL) protocol, or today its successor Transport Layer Security (TLS), to encrypt communications between your browser and the website. SSL uses an asymmetric system composed of two keys: the private key and the public key. Anything encrypted with the public key can only be decrypted with the private key, and vice versa. For example, when a web page is requested over an HTTPS connection, the website first sends its SSL certificate, which contains the public key, to begin the secure session; this allows the client and the server to generate shared 'secrets' that encrypt the rest of the conversation. This way, even if a hacker breaks into the connection, they wouldn't be able to decipher the passing data, whereas with regular HTTP, sensitive information would be exposed as plain text. When an HTTPS connection is in effect, a padlock icon is usually displayed in the browser's address bar, and it gives access to the details of the SSL certificate.
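The public and private key relationship can be illustrated with a toy RSA key pair in Python. This is only a sketch of the asymmetric idea: the numbers are the classic textbook-sized example (primes 61 and 53), the apply_key helper is a name I made up, and real HTTPS connections use far larger keys and only rely on the key pair to authenticate the server and agree on shared secrets.

```python
# Textbook-sized key pair built from the primes 61 and 53.
# An assumption for illustration only; keys this small offer no security.
MODULUS = 61 * 53        # 3233, part of both keys
PUBLIC_EXPONENT = 17     # the half that can be shared with everyone
PRIVATE_EXPONENT = 2753  # the secret half; 17 * 2753 leaves remainder 1 modulo 3120 (60 * 52)


def apply_key(number, exponent):
    """Transform a number smaller than MODULUS with one half of the key pair."""
    return pow(number, exponent, MODULUS)


def public_then_private(message):
    """Encrypt with the public key, then decrypt with the private key."""
    ciphertext = apply_key(message, PUBLIC_EXPONENT)
    return apply_key(ciphertext, PRIVATE_EXPONENT)


def private_then_public(message):
    """The 'vice versa' direction: transform with the private key, undo with the public one."""
    signed = apply_key(message, PRIVATE_EXPONENT)
    return apply_key(signed, PUBLIC_EXPONENT)
```

Both functions return the number they were given, while the intermediate value looks unrelated to it. Knowing only the public exponent is not enough to undo the first step, which is the property the certificate exchange depends on.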

The server side

The server side of this two-sided model can have different components and be designed in different ways, depending mostly on its content and functionality. First, the web server itself serves the website's static content by storing, processing, and delivering web pages to the clients over HTTP. Many web server hosts also support the Simple Mail Transfer Protocol (SMTP), used for email, and the File Transfer Protocol (FTP), used for file transfer and storage. The web server hardware connects to the internet so that it can exchange data with other connected devices, while the web server software controls how the clients access the hosted files.

Nevertheless, most modern websites also generate dynamic content, which adds interactivity between the client and the web server; this is the duty of the application servers. These types of servers are basically software frameworks that run web applications and process data sent from another server to provide the specialized functionality offered by a business, service, or application. For example, suppose a cloud service needs to process data with a Windows application. A Linux-based server could provide the web interface for the cloud service, but it can't run Windows applications. Instead, the web server can send the data to a Windows-based application server, which processes it and returns the result, so that the web server can deliver it to the browser.
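The division of labor between the two kinds of servers can be sketched in a few lines of Python. All names here (STATIC_FILES, application_server, web_server, and the /app/ path prefix) are hypothetical and chosen for illustration; a real deployment would use separate programs, often on separate machines.

```python
# Hypothetical static content that the web server can hand out unchanged.
STATIC_FILES = {"/index.html": b"<h1>Welcome</h1>"}


def application_server(query):
    """Stand-in for an application server: builds a reply from the request data."""
    name = query.get("name", "visitor")
    return f"<p>Hello, {name}!</p>".encode("utf-8")


def web_server(path, query=None):
    """Serve static files directly and forward dynamic requests to the application server."""
    if path in STATIC_FILES:
        return 200, STATIC_FILES[path]
    if path.startswith("/app/"):
        return 200, application_server(query or {})
    return 404, b"Not Found"
```

A request for /index.html always returns the same bytes, whereas a request for /app/greet returns content that depends on the data sent by the client.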

What if the website we want to visit is heavily visited, like Facebook or Twitter? Would a single web server suffice to handle large amounts of traffic? Keep in mind that people from around the world could be sending requests simultaneously. For that reason, the server side can be composed of multiple servers that share the load; moreover, computers can be grouped and set to work together as one system, which is known as a computer cluster. Duplicating critical components in this way is commonly known in engineering as redundancy, and it gives the server side better performance and higher reliability. Such a setup also needs a referee to manage the incoming traffic and distribute the workload across the system, and this responsibility falls to the load balancer. A load balancer increases the reliability, efficiency, scalability, and availability of the application or website. Another benefit is that it helps avoid a Single Point of Failure (SPOF); for example, if a server crashes, the website stays up and is served by the other servers in the cluster. Load balancers can use a variety of methods, each implementing an algorithm suited to particular circumstances. Among these methods, the most used are: the Least Connection Method, which directs traffic to the server with the fewest active connections; the Least Response Time Method, which directs traffic to the server with the fewest active connections and the lowest average response time; the Round Robin Method, which rotates through the servers by directing traffic to the first available server and then moving that server to the bottom of the queue; and the IP Hash Method, where the IP address of the client determines which server receives the request.
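Three of these methods are simple enough to sketch in Python. The class names, the pick and release methods, and the server labels used below are assumptions for illustration, and the code ignores health checks and response times that a real load balancer would track.

```python
import hashlib
from itertools import cycle


class RoundRobin:
    """Hand out servers in a fixed rotation."""

    def __init__(self, servers):
        self._rotation = cycle(servers)

    def pick(self, client_ip=None):
        return next(self._rotation)


class LeastConnections:
    """Pick the server that currently has the fewest active connections."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def pick(self, client_ip=None):
        # On a tie, min() returns the server that was listed first.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        """Call when a connection to the server ends."""
        self.active[server] -= 1


class IPHash:
    """Map each client IP to one server, so the same client keeps the same server."""

    def __init__(self, servers):
        self.servers = list(servers)

    def pick(self, client_ip):
        digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.servers)
        return self.servers[index]
```

With three servers named a, b, and c, RoundRobin returns a, b, c, a, and so on, while IPHash returns the same server every time it sees the same client address.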

Lastly, the server side will most likely need to handle large amounts of data too. Data might not be tangible, but it does occupy space and needs to be organized, hence databases exist. A database is basically an organized collection of information that can be accessed, managed, and updated with ease through a Database Management System (DBMS), the program in charge of these duties. There are two main types of databases: relational and non-relational. A relational database can be visualized as a collection of tables, each with a schema that defines the data types and fixed attributes of the items stored in it. Its DBMS provides the functionality to handle the stored data, usually through the Structured Query Language (SQL), which has well-defined and commonly accepted standards. Non-relational databases, on the other hand, don't necessarily follow a rigid schema that dictates how the data is handled, which makes it possible to work with unstructured data. These databases, also known as NoSQL databases, allow more freedom in administration and more flexibility in storage.
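The contrast between the two types can be sketched with Python's standard library. The students table, its columns, and the sample records are invented for the example; sqlite3 plays the role of a relational DBMS, and a plain list of dictionaries stands in for a document-style NoSQL store.

```python
import sqlite3


def relational_example():
    """Fixed schema: every row has the same typed columns, queried with SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE students ("
        "id INTEGER PRIMARY KEY, name TEXT NOT NULL, cohort INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO students (name, cohort) VALUES (?, ?)",
        [("Ada", 12), ("Grace", 13)],
    )
    rows = conn.execute(
        "SELECT name FROM students WHERE cohort = ?", (12,)
    ).fetchall()
    conn.close()
    return [name for (name,) in rows]


def document_example():
    """No fixed schema: each record can carry its own set of fields."""
    documents = [
        {"name": "Ada", "cohort": 12, "languages": ["C", "Python"]},
        {"name": "Grace", "cohort": 13, "mentor": {"name": "Alan"}},
    ]
    # Records without a "languages" field are simply skipped.
    return [doc["name"] for doc in documents if "Python" in doc.get("languages", [])]
```

In the relational version, adding a new attribute means changing the table's schema first; in the document version, a record can gain a field such as mentor without touching the others.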

In conclusion

When you press 'Enter'... First, the browser looks up the IP address in memory, and if it's not at hand, a DNS request is made. Second, the browser uses the given URL to reach the Holberton School's web server with the https scheme, on top of the other standard internet protocols. Third, before any content is exchanged, the server presents its SSL certificate and the two sides establish a secure HTTPS connection, so that the information travels encrypted between them. Fourth, the web server processes the client's request and, depending on it, may need the help of an application server and/or a DBMS to do further processing. Lastly, the web server replies to the client with the requested content.

The following is a diagram showing the flow of the client's request. It links to a post where you can view the image in full-screen or even download it.

A diagram illustrating the flow of the client's request.


