Hadoop Distributed File System
Hadoop Distributed File System Architecture with an Example
HDFS is used to store large datasets across multiple systems. It follows a master-slave architecture.
The master node, also called the Name Node, runs on high-quality hardware, so its chances of failure are low. The Name Node stores metadata in RAM for quick lookup of the data held on the Data Nodes.
The metadata table holds the mapping between files, Data Nodes, and blocks, i.e. which part of a file is stored as which block on which node.
The slave nodes, also called Data Nodes, run on lower-quality commodity hardware, so their chances of failure are higher. A Data Node stores the actual data of a file. An HDFS cluster can have any number of Data Nodes.
HDFS splits input data into blocks of 128 MB (by default). The blocks are distributed across the Data Nodes based on the cluster size. A single Data Node can hold thousands of blocks.
The Secondary Name Node periodically merges the FSimage and the edit logs. The FSimage is a snapshot of the file system at a point in time; the edit logs record recent changes to the file system (creation/deletion/modification).
The above diagram depicts an 8-node HDFS cluster.
Suppose we have a 1 GB file as input for the above cluster. Each Data Node stores one 128 MB block: 128 MB x 8 = 1024 MB = 1 GB.
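The block arithmetic above can be sketched in a few lines of Python. This is an illustrative simulation of how a file is divided into fixed-size blocks, not actual HDFS code:

```python
# Sketch: splitting a file into HDFS-style blocks (illustrative, not HDFS code).
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, the default HDFS block size

def split_into_blocks(file_size_bytes, block_size=BLOCK_SIZE):
    """Return (block_index, block_length) pairs for a file of the given size."""
    blocks = []
    offset, index = 0, 0
    while offset < file_size_bytes:
        length = min(block_size, file_size_bytes - offset)  # last block may be smaller
        blocks.append((index, length))
        offset += length
        index += 1
    return blocks

one_gb = 1024 * 1024 * 1024
blocks = split_into_blocks(one_gb)
print(len(blocks))  # 1 GB / 128 MB = 8 blocks
```

Note that only the final block of a file can be smaller than 128 MB; a 200 MB file, for example, becomes one 128 MB block plus one 72 MB block.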
Data Node Failure
The heartbeat mechanism detects Data Node failure. Each Data Node sends a heartbeat every 3 seconds. If the Name Node does not receive any heartbeat from a Data Node for a configurable timeout (roughly 10 minutes by default), it marks that Data Node as dead or slow.
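The detection logic can be sketched as a simple timestamp check. The interval and timeout values below follow the text; the node names and data structure are illustrative assumptions:

```python
# Sketch of heartbeat-based failure detection (node names are illustrative).
HEARTBEAT_INTERVAL = 3        # seconds between heartbeats from each Data Node
HEARTBEAT_TIMEOUT = 10 * 60   # Name Node declares a node dead after ~10 min of silence

def dead_nodes(last_heartbeat, now, timeout=HEARTBEAT_TIMEOUT):
    """last_heartbeat: {node: last heartbeat time}. Return nodes past the timeout."""
    return [node for node, ts in last_heartbeat.items() if now - ts > timeout]

now = 1000.0
last = {"dn1": now - 2, "dn2": now - 700}  # dn2 has been silent for ~11.6 minutes
print(dead_nodes(last, now))  # ['dn2']
```

Once a node lands on this list, the Name Node schedules re-replication of its blocks, as described below.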
The replication factor of HDFS also helps in case of Data Node failure. In the above diagram, each of the blocks B1 to B8 maintains 3 replicas spread across Data Nodes and across racks (Rack 1 and Rack 2 are separate failure domains). When a Data Node fails, its blocks are re-replicated from the surviving copies to other Data Nodes in the cluster, and the metadata table on the Name Node is updated accordingly. The default replication factor is 3; the Hadoop admin can change it.
The rack awareness mechanism avoids data loss when an entire rack fails, for example due to a power or switch failure (large deployments may additionally spread data centers across locations to survive natural calamities). The replicas of a block are therefore kept on different racks. In the above diagram, at least one copy of every block remains accessible even if one complete rack is down. Replicas shouldn't be spread across all racks, however, since that would consume high network bandwidth.
Name Node Failure
The Name Node keeps the FSimage and edit logs in a shared folder so that the Secondary Name Node can access them for the merge. After the FSimage and edit logs are merged, the old FSimage is replaced with the new, updated FSimage from the shared folder, and the edit logs are cleared so that they can record the latest changes to HDFS.
This merging activity happens periodically, and each merge is called a checkpoint. The checkpoint interval is configurable by the Hadoop admin (by default, Hadoop checkpoints every hour, or sooner if enough edit-log transactions accumulate).
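The checkpoint interval is controlled by the `dfs.namenode.checkpoint.period` property in `hdfs-site.xml` (value in seconds; the shipped default is 3600, i.e. one hour). A minimal config fragment:

```xml
<property>
  <name>dfs.namenode.checkpoint.period</name>
  <!-- seconds between checkpoints; 3600 = 1 hour (the default) -->
  <value>3600</value>
</property>
```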
When the Name Node fails, a new Name Node can be brought up from the latest merged FSimage held by the Secondary Name Node (note that the Secondary Name Node is not a hot standby, so this recovery is manual). The Hadoop admin should then immediately bring up a new Secondary Name Node as well.
Block failure
Each Data Node sends a block report to the Name Node at a fixed interval, indicating whether any of its blocks are corrupted. The Name Node then re-replicates each corrupted block from a healthy replica and updates the metadata table accordingly.
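Corruption is detected by comparing a stored checksum against one recomputed from the block's current bytes. HDFS actually keeps per-chunk CRC checksums; the sketch below hashes whole blocks for simplicity, and the block IDs and data are illustrative:

```python
# Sketch: detecting a corrupted block via checksum mismatch (simplified; HDFS
# uses per-chunk CRC checksums rather than one hash per block).
import hashlib

def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def block_report(blocks, stored_checksums):
    """blocks: {block_id: current bytes}. Return IDs whose data no longer matches."""
    return [bid for bid, data in blocks.items()
            if checksum(data) != stored_checksums[bid]]

good = b"block-1-data"
bad = b"block-2-data"                 # bytes have drifted from what was written
stored = {"b1": checksum(good), "b2": checksum(b"block-2-data-original")}
print(block_report({"b1": good, "b2": bad}, stored))  # ['b2'] is corrupted
```

On receiving such a report, the Name Node would schedule a fresh replica of `b2` to be copied from a Data Node that still holds a healthy copy.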
When a client requests data, the request always goes to the Name Node first. The Name Node returns the metadata, and the client then reads the blocks directly from the Data Nodes. Because every request must first consult the Name Node, it adds latency and can become a bottleneck.
A big file is chunked into multiple blocks stored across multiple Data Nodes; reading those blocks in parallel gives high throughput. This parallelism is one of the key benefits of a distributed file system.
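The parallel read can be sketched with a thread pool: one task per block, results reassembled in order. The node names, block IDs, and `read_block` function are illustrative stand-ins, not HDFS APIs:

```python
# Sketch: reading a file's blocks from multiple Data Nodes in parallel
# (names and the read function are illustrative, not HDFS client APIs).
from concurrent.futures import ThreadPoolExecutor

def read_block(location):
    node, block_id = location
    # In a real cluster this would be a network read from the Data Node.
    return f"data-of-{block_id}@{node}"

# 8 blocks of a 1 GB file, one on each Data Node of the example cluster
locations = [(f"dn{i + 1}", f"B{i + 1}") for i in range(8)]

with ThreadPoolExecutor(max_workers=8) as pool:
    parts = list(pool.map(read_block, locations))  # reads happen concurrently

print(len(parts))  # 8 blocks, fetched in parallel and returned in order
```

Because `pool.map` preserves input order, the blocks can simply be concatenated to reconstruct the file, even though the underlying reads complete in any order.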