HDFS Vs Cloud Based Object Storage

What is HDFS?

HDFS stands for Hadoop Distributed File System. It is a fault-tolerant distributed file system designed to be deployed on low-cost commodity hardware.

HDFS is better suited to storing a small number of large files than a huge number of small files, and it stores data reliably even when hardware fails. It is well suited to distributed storage and distributed processing.
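To illustrate why many small files are a poor fit, here is a rough sketch that counts the file and block entries a NameNode would have to track for the same 1 TiB of data stored two ways. The 128 MiB block size is an assumption (it is configurable per cluster), and `namenode_objects` is a hypothetical helper, not a Hadoop API.

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size (128 MiB); configurable per cluster


def namenode_objects(num_files: int, file_size_bytes: int,
                     block_size: int = BLOCK_SIZE) -> int:
    """Count the metadata entries (one per file plus one per block)."""
    blocks_per_file = math.ceil(file_size_bytes / block_size)
    return num_files * (1 + blocks_per_file)


ONE_MIB = 1024 ** 2
ONE_TIB = 1024 ** 4

# The same 1 TiB of data, stored as 8 large files or as 1 MiB files.
few_large = namenode_objects(num_files=8, file_size_bytes=ONE_TIB // 8)
many_small = namenode_objects(num_files=ONE_TIB // ONE_MIB, file_size_bytes=ONE_MIB)

print(few_large)   # 8200
print(many_small)  # 2097152
```

Under these assumptions the small-file layout needs roughly 250 times more metadata entries for the same amount of data.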

In HDFS, compute is coupled with storage, so if you need only storage and not compute, HDFS is not the right choice.

 

NameNode: The NameNode is the master node. It acts like an index to where the actual data is kept: it holds the metadata table that keeps track of what is stored where.

DataNode: A DataNode stores the actual data.

[Figure: HDFS]


The DataNodes store the data and serve read and write requests from data consumers. They also create, delete, and replicate blocks when instructed by the NameNode. A DataNode has no knowledge of HDFS files; it stores each HDFS data block in a separate file on its local file system.

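The NameNode maintains the file system namespace, holds all metadata for the Hadoop cluster, records any change to the namespace or its properties, and enforces the data replication policy. In a basic deployment the NameNode machine is a single point of failure for the HDFS cluster: if it fails, the rest of the cluster cannot be accessed. Several techniques have been devised to address this. Hadoop 2 and later support a high-availability setup with a standby NameNode that can take over. The Secondary NameNode, despite its name, is not a failover node; it periodically checkpoints the metadata, which shortens recovery. NameNode Federation splits the namespace across several independent NameNodes, so a failure affects only part of it.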
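The division of labor described above can be sketched with a toy model. This is only an illustration under stated assumptions: `ToyNameNode`, its tiny block size, and the round-robin placement are all hypothetical, and real HDFS uses a rack-aware placement policy instead.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNameNode:
    """Holds metadata only: which blocks make up a file and where replicas live."""
    datanodes: list
    block_size: int = 4   # tiny block size so the example is easy to follow
    replication: int = 3  # assumed replication factor
    namespace: dict = field(default_factory=dict)        # path -> block ids
    block_locations: dict = field(default_factory=dict)  # block id -> DataNodes

    def add_file(self, path: str, size: int) -> None:
        num_blocks = -(-size // self.block_size)  # ceiling division
        block_ids = [f"{path}#blk{i}" for i in range(num_blocks)]
        self.namespace[path] = block_ids
        copies = min(self.replication, len(self.datanodes))
        for i, block_id in enumerate(block_ids):
            # Simplified round-robin placement (an assumption of this sketch).
            self.block_locations[block_id] = [
                self.datanodes[(i + r) % len(self.datanodes)] for r in range(copies)
            ]

    def locate(self, path: str) -> list:
        """Return (block id, DataNodes holding a replica) for each block of a file."""
        return [(b, self.block_locations[b]) for b in self.namespace[path]]


nn = ToyNameNode(datanodes=["dn1", "dn2", "dn3", "dn4"])
nn.add_file("/logs/a.txt", size=10)  # 10 bytes -> 3 blocks of up to 4 bytes
for block_id, replicas in nn.locate("/logs/a.txt"):
    print(block_id, replicas)
```

Note that the toy NameNode never touches file contents. A client would ask it where the blocks are and then read them from the listed DataNodes.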

NameNode Federation: Newer versions of Hadoop introduced NameNode Federation, in which more than one NameNode can be used to handle growing metadata.

Block storage works well for organizations that handle large amounts of transactional data or run mission-critical applications that need low latency and consistent performance. However, it can be expensive, offers little in the way of metadata, and requires an operating system to access the blocks.

 

What is Object storage?

Object storage is a technology that manages data as objects. Instead of being divided into files and folders, all data is stored in one large repository, which may be distributed across multiple physical storage devices.


Each object is a simple, self-contained unit that includes the data, metadata (descriptive information associated with the object), and a unique identifier (instead of a file name and file path).

Objects are accessed through RESTful APIs, which use HTTP commands such as “PUT” or “POST” to upload an object, “GET” to retrieve it, and “DELETE” to remove it.
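As a minimal sketch of this model, the in-memory class below mirrors the PUT, GET, and DELETE operations and shows an object bundling data, metadata, and an identifier. `ToyObjectStore` and `StoredObject` are hypothetical names made up for illustration. No real cloud service or HTTP call is involved.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredObject:
    object_id: str   # unique identifier, used instead of a file name and path
    data: bytes      # the payload itself
    metadata: dict   # descriptive information about the object


class ToyObjectStore:
    """A flat repository of objects, with no folders or hierarchy."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes, metadata=None) -> str:
        """Like HTTP PUT/POST: store a new object and return its identifier."""
        object_id = uuid.uuid4().hex
        self._objects[object_id] = StoredObject(object_id, data, dict(metadata or {}))
        return object_id

    def get(self, object_id: str) -> StoredObject:
        """Like HTTP GET: raises KeyError if the object does not exist."""
        return self._objects[object_id]

    def delete(self, object_id: str) -> None:
        """Like HTTP DELETE: raises KeyError if the object does not exist."""
        del self._objects[object_id]


store = ToyObjectStore()
oid = store.put(b"hello", {"content-type": "text/plain"})
print(store.get(oid).data)      # b'hello'
print(store.get(oid).metadata)  # {'content-type': 'text/plain'}
store.delete(oid)
```

The interface has no operation for modifying an object in place, which reflects how small the operation set is.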

Because object storage supports only a small set of operations, it comes with significant limitations and benefits:

• Objects are intended to be written once, so a node does not need to lock an object before reading its contents. There is no risk that another node will write to the object while it is being read.

• An object can be referenced only by its unique identifier. For instance, the physical location of an object (its disk or storage node) can be determined by applying a simple hash function to the identifier, so a compute node does not need to contact a metadata server to find out which server hosts the object's content.

• Lock-free access and the deterministic mapping of objects to physical locations allow I/O performance to scale efficiently. No single server becomes an inherent bottleneck when many compute nodes access data on the object storage servers.
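The hash-based lookup mentioned in the list above can be sketched as follows. The node names and the `locate` helper are hypothetical, and plain modulo hashing is a simplification chosen for brevity, not a description of any particular object store.

```python
import hashlib

STORAGE_NODES = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical node names


def locate(object_id: str, nodes=STORAGE_NODES) -> str:
    """Map an object identifier to a storage node without asking a metadata server."""
    digest = hashlib.sha256(object_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


# Every client computes the same answer independently.
print(locate("3f2a9c"))
print(locate("3f2a9c") == locate("3f2a9c"))  # True
```

One caveat of this sketch: with plain modulo hashing, adding or removing a node remaps most objects. Schemes such as consistent hashing are commonly used to limit that movement.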


