How the web works
What happens when you type google.com in your browser and press Enter.
The Internet is a global network of connected computers (hosts). It gives its users access to the World Wide Web (WWW), which is simply a collection of information (data) from different sources on the Internet.
Almost any piece of information you access through the internet is on the web, but not everyone cares to know how that information got there, whether students, teachers, professors, or researchers. Over half of the world's population uses the internet for one thing or another without caring how it works. But if you're a curious internet user like me, you should follow this article to the end.
Now let's begin:
I'll do this by explaining a few important concepts one at a time, and at the end of each explanation I'll give a friendlier illustration.
DNS REQUEST
The first thing the browser does when you type a URL into the address bar is make a DNS request.
DNS, which stands for Domain Name System, is a system for naming servers in human-friendly language (English, for example). Every device connected to the internet is identified by an IP address, and the IP address of the server hosting a website is what the browser needs in order to send it a request. But we don't see that IP address in our address bar; all we see is the domain of the website and other information the server needs to respond to the request. So the browser takes a few steps to get the IP address behind that domain; this process is called a DNS lookup (or IP lookup).
First, the browser checks its own cache for an IP address matching the domain; if it can't find one there, it asks the OS for help. The OS also checks its cache; if the address isn't stored there either, it asks a resolver, which is usually run by your ISP (Internet service provider). If the resolver doesn't have the answer cached, it asks a root name server, which points it to the name servers for the TLD (top-level domain, e.g. .com). Those in turn point it to the domain's own (authoritative) name servers, which store the IP address and return it to the resolver; the resolver passes it to the OS, which hands it back to the browser. If the IP address is found at any of these checkpoints, the search stops there and the browser stores the IP for future reference. If the IP address can't be found, the browser might just show you an error page that says it can't connect. This IP address is what the browser uses to contact the server hosting the website, since computers on the internet locate each other by IP address.
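To make the lookup order concrete, here is a minimal sketch in Python. It is a simulation, not how a real browser is implemented: the two caches are plain dictionaries, ask_resolver is a hypothetical stand-in for the resolver and the name servers it queries, and 192.0.2.10 is a made-up address from a range reserved for documentation.

```python
from typing import Optional

# Hypothetical record for illustration only; 192.0.2.10 is a documentation address.
NAME_SERVER_RECORDS = {"www.codammy.tech": "192.0.2.10"}

browser_cache: dict[str, str] = {}
os_cache: dict[str, str] = {}


def ask_resolver(domain: str) -> Optional[str]:
    """Stand-in for the resolver and the name servers it queries."""
    return NAME_SERVER_RECORDS.get(domain)


def lookup(domain: str) -> tuple[Optional[str], str]:
    """Return (ip, where it was found), checking the nearest cache first."""
    if domain in browser_cache:
        return browser_cache[domain], "browser cache"
    if domain in os_cache:
        ip = os_cache[domain]
        browser_cache[domain] = ip  # remember it for next time
        return ip, "OS cache"
    ip = ask_resolver(domain)
    if ip is None:
        return None, "not found"
    # Store the answer at both levels for future reference.
    os_cache[domain] = ip
    browser_cache[domain] = ip
    return ip, "resolver"
```

Calling lookup("www.codammy.tech") twice shows the effect of caching: the first call goes all the way to the resolver, while the second is answered from the browser cache. In a real Python program you would normally let the OS do all of this by calling socket.getaddrinfo.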
WEB SERVERS
People new to tech sometimes find it difficult to tell the difference between a server, which is the hardware, and a web server, which is the software that serves the web page.
A server is a computer that is always connected to the internet and can run for days, months, or even years without going off. System administrators might decide to turn off a server for maintenance purposes, and this might only happen once a year. What makes a server so different from our home PC or organization's computer is as follows:
A web server, on the other hand, is a software program installed on a server that is responsible for serving web pages over HTTP/HTTPS on top of TCP. By default it listens on port 80 for HTTP and port 443 for HTTPS, unless other ports are specified in the server configuration.
All websites are hosted on a server (computer), and it is the web server (software) that makes them available to clients. Popular web servers include Nginx, Apache HTTP Server, and Apache Tomcat. Using the curl command in the terminal, you can often tell which web server is serving a page: the -I option sends a HEAD request, so only the response headers come back, and the Server header usually names the software. Run
curl -I ip_address/domain
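The same check can be sketched in Python with only the standard library. Everything here is an assumption for illustration: DemoHandler and the name DemoWebServer/0.1 are made up, and the tiny server runs on your own machine on a free port (not port 80) so the example works without touching a real website.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    # Text this demo server announces in its "Server" header (hypothetical name).
    server_version = "DemoWebServer/0.1"

    def do_HEAD(self):
        # A HEAD response carries the status line and headers, but no body.
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the example quiet


def server_header(host: str, port: int = 80) -> str:
    """Send a HEAD request and return the Server header ('' if absent)."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("HEAD", "/")
        return conn.getresponse().getheader("Server", "")
    finally:
        conn.close()


def demo() -> str:
    # Port 0 asks the OS for any free port on this machine.
    httpd = HTTPServer(("127.0.0.1", 0), DemoHandler)
    threading.Thread(target=httpd.serve_forever, daemon=True).start()
    try:
        return server_header("127.0.0.1", httpd.server_address[1])
    finally:
        httpd.shutdown()
        httpd.server_close()
```

Calling demo() returns a string that starts with DemoWebServer/0.1, followed by the Python version that the standard library appends. Pointing server_header at a real host would return whatever that site's web server chooses to announce; some sites hide or change this header.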
Now let's continue our illustration. After the browser has gotten the IP address from the domain name server, it sends a request to that IP address asking for a document (an HTML page). A basic web server would just return the requested page, but a typical web application has other things to take into consideration, like load balancing, secured connections (HTTPS/SSL), firewalls, and maybe an application server that the web server relies on when processing requests (e.g. a database). All of these are explained in the later sections.
TCP/IP
TCP stands for Transmission Control Protocol.
IP stands for Internet Protocol.
These two work hand in hand to ensure the reliable delivery of data packets to their final destination.
TCP is in charge of making sure packets are delivered successfully: it detects lost packets, resends them, and keeps the data in order. It is also the protocol the client (browser) and the server use to establish a connection before any packets are exchanged.
IP addresses identify the computers (client and server) at each end of the connection. IP addresses come in two versions: IPv4 and IPv6, which is the newer one. Calling IPv6 the newer one doesn't mean IPv4 is no longer in use; in fact, it is still widely used by devices on the internet.
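Here is a small sketch of a TCP connection in Python, assuming both ends run on the same machine. The function names are my own; the "server" simply echoes back what it receives, which is enough to show a connection being established and data travelling both ways.

```python
import socket
import threading


def echo_once(server_sock: socket.socket) -> None:
    """Accept one TCP connection and send back whatever the client sent."""
    conn, _addr = server_sock.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data)


def tcp_round_trip(message: bytes) -> bytes:
    # Port 0 asks the OS for any free port; 127.0.0.1 keeps traffic on this machine.
    with socket.create_server(("127.0.0.1", 0)) as server_sock:
        port = server_sock.getsockname()[1]
        worker = threading.Thread(target=echo_once, args=(server_sock,))
        worker.start()
        # create_connection establishes the TCP connection with the listening socket.
        with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
            client.sendall(message)
            reply = client.recv(1024)
        worker.join()
    return reply
```

A browser does the same thing on a larger scale: it opens a TCP connection to the server's IP address and port, then sends its HTTP request over that connection.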
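If you want to see the two versions side by side, Python's standard ipaddress module can tell them apart. The addresses below are examples from ranges reserved for documentation, not real servers, and describe is a name I made up.

```python
import ipaddress


def describe(address: str) -> str:
    """Say which IP version a textual address belongs to."""
    ip = ipaddress.ip_address(address)  # raises ValueError for invalid input
    return f"{ip} is an IPv{ip.version} address"


print(describe("192.0.2.1"))    # dotted decimal: four numbers from 0 to 255
print(describe("2001:db8::1"))  # hexadecimal groups separated by colons
```

An IPv4 address is 32 bits long, while an IPv6 address is 128 bits long, which is why IPv6 has room for far more devices.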
FIREWALL
Firewalls. When I was little, I used to think of a firewall as a wall that prevents fire outbreaks, and I wasn't far off, lol.
Firewalls in computing are software programs and hardware systems put in place to monitor incoming and outgoing connections to a computer. Based on the rules that have been set, they can restrict certain IP addresses from accessing the server and also restrict the server from connecting to certain addresses.
Hardware components used for firewalls might include routers and switches, which are mostly found in networking environments. Software firewalls are computer programs installed on the host operating system; many of them come preinstalled with the OS. Examples include Microsoft Windows Firewall for Windows PCs and UFW, which I use on my Ubuntu WSL (Windows Subsystem for Linux).
Maybe this has happened to you: you're connected to a network, say at a school or an organization, and you can't access certain websites. That's not because of a URL mismatch but because the network you're connected to has a firewall configured that prevents every device on it from accessing that site (the network is, after all, the one providing the internet you're using). The same can be true of a city, state, or country that has made it so that its citizens can't access certain websites, and the same thing applies to incoming traffic too.
So in the case of our website, e.g. www.codammy.tech, which has a firewall configured: if the incoming traffic meets the firewall's rules, the firewall test is passed and the connection can go ahead. But that's not all.
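The core idea of an IP-based firewall rule can be sketched in a few lines of Python. This is only an illustration of the decision a firewall makes, not how UFW or Windows Firewall are built; the blocked ranges are hypothetical and come from address blocks reserved for documentation.

```python
import ipaddress

# Hypothetical rule set: one blocked network and one blocked single address.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.7/32"),
]


def is_allowed(source_ip: str) -> bool:
    """Return True when the address matches none of the blocked networks."""
    ip = ipaddress.ip_address(source_ip)
    return not any(ip in network for network in BLOCKED_NETWORKS)
```

With these rules, is_allowed("203.0.113.45") is False because the address falls inside the blocked /24 network, while is_allowed("192.0.2.1") is True. Real firewalls usually also look at ports, protocols, and the direction of the traffic.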
LOAD BALANCING
Load and balance. What do you think?
Load balancing in computing is a measure put in place to manage the traffic going to a web application by distributing requests, with the help of certain algorithms, across several servers serving the same purpose. This reduces the load on each server, improves performance, and helps the servers last longer.
Load balancing is mostly implemented by websites that receive a lot of traffic. The concept involves having multiple servers respond to requests made to the same domain (e.g. facebook.com). Even small websites can implement it for redundancy purposes, which is encouraged in software engineering.
Load balancing has a lot of advantages worth considering when building the infrastructure of a web application that might have many users in the future. One of them is that when one server is down, another one is there to serve the response. Load balancers are proxy servers (intermediary servers) running software that distributes traffic to the web servers behind them using an algorithm. Software used for load balancing includes Nginx and HAProxy.
Different websites might use different load-balancing algorithms, which might be:
1. Round robin: This algorithm sends each new request to the next server in the list, in turn, and starts again from the first server after reaching the last.
2. Least connection: This algorithm sends web traffic to the server that currently has the fewest active connections.
3. Weighted balancing: This algorithm sends traffic to servers in proportion to how much load each server can take, taking into consideration the memory, CPU, and other hardware components of the system. The weights are worked out by the system administrator.
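As a rough sketch of the least connection and weighted ideas, here are two small Python functions. The server names, connection counts, and weights are made up, and real load balancers such as Nginx and HAProxy implement these strategies in their own, more refined ways.

```python
def pick_least_connections(active: dict[str, int]) -> str:
    """Return the server that currently has the fewest active connections."""
    return min(active, key=active.get)


def weighted_schedule(weights: dict[str, int]) -> list[str]:
    """Build one round of turns in which each server appears as often as its weight."""
    schedule: list[str] = []
    for server, weight in weights.items():
        schedule.extend([server] * weight)
    return schedule
```

For example, pick_least_connections({"A": 12, "B": 3, "C": 7}) picks server B, and weighted_schedule({"A": 2, "B": 1}) gives server A two turns for every one turn of server B.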
Assume www.codammy.tech has three servers (A, B, and C) and uses the round-robin algorithm. The first request would be sent to server A, the second to server B, and the third to server C. The next request would go to A again, then B, then C; this repeats itself as long as the proxy server remains active and requests keep coming in.
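The A, B, C rotation above can be sketched in Python with itertools.cycle. The function name make_round_robin is my own, and this only shows the order in which servers are picked, not the forwarding of real requests.

```python
from itertools import cycle
from typing import Callable


def make_round_robin(servers: list[str]) -> Callable[[], str]:
    """Return a function that hands out servers in a repeating order."""
    pool = cycle(servers)  # endlessly repeats A, B, C, A, B, C, ...
    return lambda: next(pool)


next_server = make_round_robin(["A", "B", "C"])
first_five = [next_server() for _ in range(5)]
print(first_five)  # ['A', 'B', 'C', 'A', 'B']
```

Each incoming request simply takes the next name from the cycle, so no server is picked twice before every other server has had its turn.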
HTTPS/SSL
HTTPS and SSL are technologies put in place to secure communication between a client and a server.
HTTPS, which stands for Hypertext Transfer Protocol Secure, is the secured version of HTTP, the protocol the client (browser) uses to make requests and the server uses to respond to them. The "S" at the end stands for secure; it means the connection to the server is secured using SSL (Secure Sockets Layer), whose modern successor is called TLS (Transport Layer Security). The server presents a certificate that the browser uses to determine whether the connection is secured; if it is, you'll find a padlock icon at the edge of the URL field.
A user can then input personal information like a password or credit card details with far less fear of it being stolen in transit by a third party (a hacker). The server holds a private key and sends the matching public key to the browser inside its certificate. The browser uses the public key to check that it is really talking to the holder of the private key, and the two sides agree on a shared session key that encrypts all data sent over the connection.
If our browser shows a padlock icon, our connection to www.codammy.tech is secured, and the data we send can't be read by others along the way.
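Python's standard ssl module does the same certificate checking a browser does, so it makes a handy sketch of the client side. The function names are my own, and negotiated_protocol needs a reachable HTTPS server to actually run.

```python
import socket
import ssl
from typing import Optional


def make_client_context() -> ssl.SSLContext:
    # The default context verifies the server's certificate against the
    # system's trusted authorities and checks that it matches the hostname.
    return ssl.create_default_context()


def negotiated_protocol(host: str, port: int = 443) -> Optional[str]:
    """Connect over TLS and report the protocol version agreed with the server."""
    context = make_client_context()
    with socket.create_connection((host, port), timeout=5) as raw_sock:
        # wrap_socket performs the handshake; it raises an ssl.SSLError
        # (for example a certificate verification error) if the check fails.
        with context.wrap_socket(raw_sock, server_hostname=host) as tls_sock:
            return tls_sock.version()
```

If the certificate can't be verified, the handshake fails and no data is sent, which is roughly the situation in which a browser shows a warning instead of a padlock.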
APPLICATION SERVER
Application servers are software programs that come installed alongside applications that need to communicate with other software on the same host (locally) or with software on a remote server. An example is a database server (MySQL Server, Microsoft SQL Server, etc.), which listens for connections on a port (3306 by default for MySQL) unless a different port is configured.
If our website makes use of an application server, we refer to it as a web application, because it now relies on other software programs for its complete functionality.
DATABASES
Finally, databases. A database is a software application used for the storage and retrieval of data; it holds the information a user wants to access. A use case in a web application is when a user registers for a program, fills out a survey, or creates an account on social media: the information the user inputs is stored in the database as soon as the user clicks the submit or register button. A sign-in, on the other hand, is a data retrieval, because the details of the user trying to log in have to be fetched from the database.
The moment the client's request reaches the web server, the web server sends a message to the application server (the database this time) asking for some information. The application server then responds with the data needed, if available, which the web server uses to respond to the client.
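The register (store) and sign-in (retrieve) steps can be sketched with SQLite, which ships with Python. This is a stand-in I chose so the example runs anywhere; a site using MySQL would connect to its database server instead. The table layout and function names are assumptions for illustration.

```python
import sqlite3
from typing import Optional


def open_database() -> sqlite3.Connection:
    # ":memory:" creates a throwaway database that lives only in RAM.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (email TEXT PRIMARY KEY, name TEXT NOT NULL)")
    return conn


def register(conn: sqlite3.Connection, email: str, name: str) -> None:
    """Storage: save what the user typed into the sign-up form."""
    with conn:  # commits on success, rolls back on error
        # The ? placeholders keep user input separate from the SQL text.
        conn.execute("INSERT INTO users (email, name) VALUES (?, ?)", (email, name))


def find_user(conn: sqlite3.Connection, email: str) -> Optional[str]:
    """Retrieval: fetch the stored name for this email, or None if unknown."""
    row = conn.execute("SELECT name FROM users WHERE email = ?", (email,)).fetchone()
    return row[0] if row else None
```

After register(conn, "ada@example.com", "Ada"), calling find_user(conn, "ada@example.com") returns "Ada", while an email that was never registered returns None. A real sign-in would also check a securely hashed password, which is left out here.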
I appreciate your interest in reading this article through to the end. I hope you find it helpful. Remember, one of the best gifts is the gift of knowledge. Do well to share it with your friends.
This article is the outcome of the software engineering knowledge acquired at alx_africa.