How the Internet Works: the Internet Layer
IP Addresses
If MAC addresses were used to send data between networks, each network would have to keep track of every MAC address in the world, since MAC addresses are not structured hierarchically. This is where the Internet Protocol (IP) comes in: each node on the internet also has an IP address. IP addresses are structured hierarchically, so routers (devices that send traffic between networks) can use them to drill down from high-level matches to the specific address needed.
IP addresses come in two versions. The older, universally adopted version is IPv4, first deployed in 1983. An IPv4 address has 32 bits, allowing for 4,294,967,296 (2 to the power of 32) possible IP addresses.
IPv4
An IPv4 address’s 32 bits are organized into four bytes. The notation of an IPv4 address is four numbers separated by periods, and each with a value of 0 to 255.
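To make the four-byte layout concrete, here is a minimal Python sketch that converts between dotted notation and the underlying 32-bit number. The function names are made up for this illustration; the standard library's ipaddress module is used only to cross-check the result.

```python
import ipaddress


def dotted_to_int(address: str) -> int:
    """Pack four decimal octets into one 32-bit integer."""
    octets = [int(part) for part in address.split(".")]
    if len(octets) != 4 or any(not 0 <= octet <= 255 for octet in octets):
        raise ValueError(f"not a valid IPv4 address: {address!r}")
    value = 0
    for octet in octets:
        # Shift the bytes seen so far left by 8 bits and append the next byte.
        value = (value << 8) | octet
    return value


def int_to_dotted(value: int) -> str:
    """Split a 32-bit integer back into four decimal octets."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


example = "24.158.0.1"
as_int = dotted_to_int(example)
print(example, "->", as_int, "->", int_to_dotted(as_int))

# Cross-check against the standard library's own conversion.
print(as_int == int(ipaddress.IPv4Address(example)))
```

The dotted form is only a human-friendly way of writing the number; routers work with the 32-bit value.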
An IPv4 packet has this structure:
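As a rough illustration of the header layout, the following Python sketch unpacks the fixed 20-byte part of an IPv4 header (without options). The sample bytes and the function name parse_ipv4_header are assumptions chosen for this example, not something taken from a real capture in this article.

```python
import ipaddress
import struct

# Illustrative 20-byte IPv4 header with no options (hypothetical sample data).
SAMPLE_HEADER = bytes.fromhex("45000073 00004000 4011b861 c0a80001 c0a800c7")


def parse_ipv4_header(header: bytes) -> dict:
    """Unpack the fixed part of an IPv4 header into named fields."""
    (version_ihl, _dscp_ecn, total_length, identification,
     flags_fragment, ttl, protocol, checksum, src, dst) = struct.unpack(
        "!BBHHHBBH4s4s", header[:20])
    return {
        "version": version_ihl >> 4,          # high 4 bits of the first byte
        "ihl_words": version_ihl & 0x0F,      # header length in 32-bit words
        "total_length": total_length,         # header plus payload, in bytes
        "identification": identification,
        "flags": flags_fragment >> 13,        # top 3 bits
        "fragment_offset": flags_fragment & 0x1FFF,
        "ttl": ttl,                           # decremented at every hop
        "protocol": protocol,                 # e.g. 6 = TCP, 17 = UDP
        "checksum": checksum,                 # covers the header only
        "source": str(ipaddress.IPv4Address(src)),
        "destination": str(ipaddress.IPv4Address(dst)),
    }


for name, value in parse_ipv4_header(SAMPLE_HEADER).items():
    print(f"{name:>16}: {value}")
```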
IPv6
With the rapid growth of the internet in the 1990s, it became clear by around 2000 that being limited to about four billion IP addresses would cause us to run out of them. That indeed happened: the central pool of unallocated IPv4 addresses was exhausted in 2011, and the European regional registry (RIPE NCC) ran out of its remaining supply in 2019. In response, there has been an ongoing initiative to adopt a newer version of IP, called IPv6. (For example, most mobile devices support IPv6 addressing.) IPv6 addresses increase the number of bits to 128, which allows for enough addresses for everyone in the world to have about 48 octillion of them. In other words, more than enough to cover the needs of the foreseeable future.
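The size of the two address spaces is simple arithmetic, sketched here in Python. The world population figure is an assumption (a round seven billion) used only to reproduce the per-person estimate.

```python
IPV4_BITS = 32
IPV6_BITS = 128
WORLD_POPULATION = 7_000_000_000  # assumption: a round figure for illustration
OCTILLION = 10 ** 27

ipv4_total = 2 ** IPV4_BITS
ipv6_total = 2 ** IPV6_BITS
ipv6_per_person = ipv6_total // WORLD_POPULATION

print(f"IPv4 addresses: {ipv4_total:,}")
print(f"IPv6 addresses: {ipv6_total:,}")
print(f"IPv6 addresses per person: about {ipv6_per_person // OCTILLION} octillion")
```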
An IPv6 packet has this structure:
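The fixed IPv6 header is 40 bytes long and can be sketched the same way. The following Python example builds a header from made-up values (the 2001:db8:: addresses come from the range reserved for documentation) and parses it back; the function names are hypothetical.

```python
import ipaddress
import struct

IPV6_HEADER_FORMAT = "!IHBB16s16s"  # 4 + 2 + 1 + 1 + 16 + 16 = 40 bytes


def build_ipv6_header(src: str, dst: str, payload_length: int,
                      next_header: int, hop_limit: int,
                      traffic_class: int = 0, flow_label: int = 0) -> bytes:
    """Pack the fixed 40-byte IPv6 header."""
    # The first 32 bits hold version (4 bits), traffic class (8), flow label (20).
    first_word = (6 << 28) | (traffic_class << 20) | flow_label
    return struct.pack(IPV6_HEADER_FORMAT, first_word, payload_length,
                       next_header, hop_limit,
                       ipaddress.IPv6Address(src).packed,
                       ipaddress.IPv6Address(dst).packed)


def parse_ipv6_header(header: bytes) -> dict:
    """Unpack the fixed IPv6 header into named fields."""
    first_word, payload_length, next_header, hop_limit, src, dst = struct.unpack(
        IPV6_HEADER_FORMAT, header[:40])
    return {
        "version": first_word >> 28,
        "traffic_class": (first_word >> 20) & 0xFF,
        "flow_label": first_word & 0xFFFFF,
        "payload_length": payload_length,
        "next_header": next_header,   # e.g. 6 = TCP, 17 = UDP
        "hop_limit": hop_limit,       # IPv6 counterpart of IPv4's TTL
        "source": str(ipaddress.IPv6Address(src)),
        "destination": str(ipaddress.IPv6Address(dst)),
    }


sample = build_ipv6_header("2001:db8::1", "2001:db8::2",
                           payload_length=20, next_header=6, hop_limit=64)
parsed = parse_ipv6_header(sample)
for name, value in parsed.items():
    print(f"{name:>15}: {value}")
```

Note that there is no checksum among the parsed fields.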
Notable in IPv6 is the absence of a Checksum field. In IPv4, the header checksum has to be updated on every hop, since decrementing the Time to Live (TTL) field alters the header. The original idea was that a corrupt packet needs to be detected as soon as possible and dropped, to avoid the overhead of sending it all the way to the destination before it can be detected. However, experience has shown that the overhead of recalculating the checksum for every good packet on every hop considerably outweighs the overhead of possibly carrying a corrupt packet a few extra hops, since only a small fraction of packets are corrupt to begin with. So IPv6 did away with the Checksum, instead expecting the Transport Layer protocol to check the integrity of its PDU upon arrival at the endpoint.
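To see why every hop costs work in IPv4, here is a minimal Python sketch of the header checksum (a one's-complement sum of the header's 16-bit words) and of what happens when the TTL byte is decremented. The sample header bytes and the function names are assumptions for this illustration, and the sketch recomputes the checksum from scratch for clarity; real routers may use a cheaper incremental update.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """One's-complement of the one's-complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        # Fold any carry bits back into the low 16 bits.
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def with_checksum(header: bytes) -> bytes:
    """Return the header with its checksum field (bytes 10-11) filled in."""
    zeroed = header[:10] + b"\x00\x00" + header[12:]
    return zeroed[:10] + struct.pack("!H", internet_checksum(zeroed)) + zeroed[12:]


def forward_one_hop(header: bytes) -> bytes:
    """Decrement the TTL (byte 8) and recompute the checksum, as each hop must."""
    ttl = header[8]
    if ttl <= 1:
        raise ValueError("TTL expired; the packet would be dropped")
    decremented = header[:8] + bytes([ttl - 1]) + header[9:]
    return with_checksum(decremented)


# Hypothetical 20-byte header with the checksum field left as zero.
original = with_checksum(bytes.fromhex("45000073 00004000 40110000 c0a80001 c0a800c7"))
forwarded = forward_one_hop(original)

print("TTL before/after:     ", original[8], forwarded[8])
print("checksum before/after:", original[10:12].hex(), forwarded[10:12].hex())
```

A receiver verifies a header by running the same sum over all of it, checksum included; a valid header yields zero.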
Routers
Routers are used to send traffic from one network to another. Every local network that is part of the internet has at least one router. There are also routers that are not part of any local network, typically those which handle longer internet trips. The router is responsible for opening (or “de-encapsulating”) an incoming frame, looking at the destination IP address of the packet inside, determining where to send the packet based on that address, re-encapsulating the packet using the frame protocol of the link it will travel over next (usually some form of Ethernet, but not always), and forwarding the frame.
Each router has a routing table, which is a list of the destinations (networks and individual addresses) that it knows how to reach. When a router receives a packet, it searches its routing table to determine the closest match to the packet’s destination IP address, and forwards it there. To determine the closest match, the router goes from the specific to the general, using this (somewhat simplified) logic:
This is somewhat simplified because (among other reasons) the routers that are responsible for long-distance traffic (called core routers) don’t have local nodes or default gateways. Their routing tables only contain the addresses of other routers, so they work exclusively with step 2.
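The specific-to-general search is commonly implemented as a longest-prefix match: of all the table entries that contain the destination, the one with the longest routing prefix wins. Here is a minimal Python sketch using the standard ipaddress module. The routing table entries and next-hop names are entirely hypothetical.

```python
import ipaddress

# Hypothetical routing table: (destination network, where to forward).
ROUTING_TABLE = [
    (ipaddress.ip_network("192.168.1.0/24"), "directly connected"),
    (ipaddress.ip_network("24.158.0.0/16"), "router B"),
    (ipaddress.ip_network("24.0.0.0/8"), "router C"),
    (ipaddress.ip_network("0.0.0.0/0"), "default gateway"),  # matches everything
]


def next_hop(destination: str) -> str:
    """Pick the matching entry with the longest (most specific) prefix."""
    address = ipaddress.ip_address(destination)
    matches = [(network, hop) for network, hop in ROUTING_TABLE
               if address in network]
    if not matches:
        raise LookupError(f"no route to {destination}")
    _network, hop = max(matches, key=lambda entry: entry[0].prefixlen)
    return hop


for destination in ("192.168.1.20", "24.158.7.9", "24.1.2.3", "8.8.8.8"):
    print(f"{destination:>14} -> {next_hop(destination)}")
```

A core router's table, as described above, would simply have no directly connected entry and no default route, so a packet with no matching prefix could not be forwarded.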
How Routers Evaluate IP Addresses
An IP address has two logical parts: the routing prefix (the first group), which identifies a network, and the host identifier (the second group), which identifies a node on that network. The more bits that are used for the routing prefix, the fewer can be used for individual host identifiers. So the larger the routing prefix, the smaller the network.
For example, one of the Charter Communications networks has all the numbers from 24.158.0.0 to 24.158.255.255. So, its routing prefix is 24.158, and there are 65,534 possible host identifiers. (But, you may ask, two bytes have 65,536 possible values, so why are there two missing identifiers? Because the lowest and highest addresses in the range are reserved: the lowest, 24.158.0.0, is the network address, which identifies the network itself, and the highest, 24.158.255.255, is the broadcast address. The broadcast address is used to send to every node on the network, typically for some form of resource discovery.)
A typical office network uses the first three bytes for the routing prefix, and therefore has 254 numbers that can be used for host identifiers.
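The host counts in these two examples can be checked with Python's standard ipaddress module. This is a small sketch; the helper name usable_hosts is made up, the office network address 192.168.1.0 is an arbitrary stand-in, and the subtraction of two assumes an ordinary network with a prefix of 30 bits or fewer.

```python
import ipaddress


def usable_hosts(cidr: str) -> int:
    """Addresses in the block minus the network and broadcast addresses."""
    network = ipaddress.ip_network(cidr)
    return network.num_addresses - 2


isp_block = ipaddress.ip_network("24.158.0.0/16")   # two-byte routing prefix
office = ipaddress.ip_network("192.168.1.0/24")     # three-byte routing prefix

for network in (isp_block, office):
    print(network,
          "| network address:", network.network_address,
          "| broadcast address:", network.broadcast_address,
          "| usable hosts:", usable_hosts(str(network)))
```

The /16 and /24 suffixes are the number of bits in the routing prefix, which is another way of writing the two-byte and three-byte prefixes discussed above.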
To distinguish the routing prefix from the host identifier, the router uses a subnet mask, also known as a netmask. The netmask uses the same format as an ordinary IP address, with bits set to 1 for the routing prefix, and set to 0 for the host identifier. Therefore, in the office network in the above example, the subnet mask is 255.255.255.0. It follows that the bitwise AND of the subnet mask and any IP address on the network will be the routing prefix. It further follows that the bitwise AND of any IP address on the network and the one’s complement of the subnet mask (the one’s complement of a number is the number with all its bits inverted; here it is 0.0.0.255) will be the host identifier.
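Here is a minimal Python sketch of how a netmask and its one's complement follow from the number of routing-prefix bits. The function names are made up for this illustration.

```python
ALL_ONES = 0xFFFFFFFF  # 32 bits, all set


def netmask_from_prefix(prefix_bits: int) -> int:
    """Set the top prefix_bits bits to 1 and the remaining bits to 0."""
    if not 0 <= prefix_bits <= 32:
        raise ValueError("an IPv4 prefix has between 0 and 32 bits")
    return (ALL_ONES << (32 - prefix_bits)) & ALL_ONES


def ones_complement(mask: int) -> int:
    """Invert every bit of a 32-bit mask."""
    return ~mask & ALL_ONES


def to_dotted(value: int) -> str:
    """Write a 32-bit value in dotted notation."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


for prefix_bits in (16, 24):
    mask = netmask_from_prefix(prefix_bits)
    print(f"{prefix_bits} prefix bits: netmask {to_dotted(mask)}, "
          f"complement {to_dotted(ones_complement(mask))}")
```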
Let’s look at an example of how this works. Suppose one of the nodes on a 254-node network has the IP address 169.254.190.93. The routing prefix would be 169.254.190, and the host identifier would be 93. Now, suppose our router receives a packet with the destination address 169.254.190.93. The router will first apply the subnet mask:
ANDing the destination address with the netmask gives 169.254.190.0 in decimal, which is the network address of the router’s own network. So, the router knows that the destination IP address is in its own network, and ANDs the one’s complement of the subnet mask with it:
The result in decimal is 0.0.0.93, which is the destination’s host identifier. Once the router has calculated this host identifier, it finds the MAC address of the host (there are various ways to do this, depending on the actual network configuration) and sends the packet to it.
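The two AND operations in this example can be reproduced with a few lines of Python. This sketch prints each value in binary and in dotted notation so the bit-by-bit effect of the masks is visible; the helper name to_binary is made up.

```python
import ipaddress

ALL_ONES = 0xFFFFFFFF


def to_binary(value: int) -> str:
    """Write a 32-bit value as four dot-separated groups of 8 bits."""
    return ".".join(f"{(value >> shift) & 0xFF:08b}" for shift in (24, 16, 8, 0))


destination = int(ipaddress.IPv4Address("169.254.190.93"))
netmask = int(ipaddress.IPv4Address("255.255.255.0"))
host_mask = ~netmask & ALL_ONES            # one's complement: 0.0.0.255

network_part = destination & netmask       # keeps only the routing prefix bits
host_part = destination & host_mask        # keeps only the host identifier bits

rows = (
    ("destination", destination),
    ("netmask", netmask),
    ("AND netmask", network_part),
    ("complement of netmask", host_mask),
    ("AND complement", host_part),
)
for label, value in rows:
    print(f"{label:<22} {to_binary(value)}  {ipaddress.IPv4Address(value)}")
```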
The Internet layer determines the destination for data and sends it there. However, it does not ensure that the data is sent intact. Also, a single node can have many different applications that use the internet to send and receive data (a common example is a browser and an email application), and the internet layer doesn’t do anything to distinguish between these. Data reliability and application-level communication are the responsibility of the Transport layer.
The next article will discuss the Transport layer, in How the Internet Works: the Transport Layer.
There’s a musical cadence that goes with this. 😎