HDFS vs. Cloud-Based Object Storage
What is HDFS?
HDFS stands for Hadoop Distributed File System. It is a fault-tolerant distributed file system designed to be deployed on low-cost, commodity hardware.
HDFS is better suited to storing a small number of large files than a huge number of small files, and it stores data dependably even when hardware fails. It is well suited to distributed storage and distributed processing.
In HDFS, compute is coupled with storage, so if you need only storage and not compute, HDFS is not the right choice.
NameNode: The NameNode is the master node. It acts like an index to where the actual data is kept: it holds the metadata table that keeps track of what is stored where.
DataNode: A DataNode stores the actual data.
The DataNodes serve read and write requests from data consumers and create, delete, and replicate blocks as instructed by the NameNode. Any change to the file system namespace or its properties is recorded by the NameNode, not by the DataNodes. A DataNode has no knowledge of HDFS files. It stores each HDFS data block in a separate file on its local file system.
The NameNode maintains the file system namespace, holds all metadata for the Hadoop cluster, and enforces the data replication policy. A lone NameNode is a single point of failure for the HDFS cluster: if it fails, the rest of the cluster cannot be accessed. Several techniques have been devised to reduce this risk, such as running a standby NameNode (HDFS High Availability) and NameNode Federation. Note that the Secondary NameNode, despite its name, only checkpoints the metadata and does not take over when the NameNode fails.
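The split between the metadata held by the NameNode and the blocks held by the DataNodes can be illustrated with a toy model. This is a minimal sketch and not Hadoop code: the class `ToyNameNode`, the node names, and the round-robin placement are hypothetical, and the 128 MB block size and replication factor of 3 are assumed values (they match common Hadoop defaults, but both are configurable).

```python
import math
from itertools import cycle

BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size (128 MB)
REPLICATION = 3                 # assumed replication factor


class ToyNameNode:
    """Hypothetical in-memory stand-in for the NameNode metadata table."""

    def __init__(self, data_nodes):
        self.data_nodes = list(data_nodes)
        self.block_map = {}  # file path -> list of (block id, [DataNode names])
        self._next_node = cycle(range(len(self.data_nodes)))

    def add_file(self, path, size_bytes):
        """Record which DataNodes hold each block of a file."""
        n_blocks = max(1, math.ceil(size_bytes / BLOCK_SIZE))
        blocks = []
        for i in range(n_blocks):
            start = next(self._next_node)
            # Place replicas on consecutive nodes (round-robin, not rack-aware).
            replicas = [
                self.data_nodes[(start + r) % len(self.data_nodes)]
                for r in range(min(REPLICATION, len(self.data_nodes)))
            ]
            blocks.append((f"{path}#blk_{i}", replicas))
        self.block_map[path] = blocks
        return blocks

    def locate(self, path):
        """Answer a client's question: where are the blocks of this file?"""
        return self.block_map[path]


name_node = ToyNameNode(["dn1", "dn2", "dn3", "dn4"])
blocks = name_node.add_file("/logs/2024.log", 300 * 1024 * 1024)
for block_id, replicas in blocks:
    print(block_id, "->", replicas)
```

Under these assumptions a 300 MB file produces three block entries, each replicated on three DataNodes. The model also hints at why many small files are costly: every file, however small, needs at least one entry in the NameNode's table.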
NameNode Federation: Newer versions of Hadoop introduce NameNode Federation, in which more than one NameNode, each managing part of the namespace, can be used to handle growing metadata.
Block storage works well for organizations that work with large amounts of transactional data or mission-critical applications that need minimal delay and consistent performance. However, it can be expensive, offers no metadata capabilities, and requires an operating system to access blocks.
What is Object storage?
Object storage is a technology that manages data as objects. All data is stored in one large repository which may be distributed across multiple physical storage devices, instead of being divided into files or folders.
Each object is a simple, self-contained unit that includes the data, metadata (descriptive information associated with the object), and a unique identifier (instead of a file name and file path).
Objects are accessed through RESTful APIs, which use HTTP methods such as “PUT” or “POST” to upload an object, “GET” to retrieve an object, and “DELETE” to remove it.
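To make the object model concrete, here is a minimal in-memory sketch of a store that keeps data, metadata, and a unique ID together and exposes the same three operations. The class `ToyObjectStore` and its method names are hypothetical and only mirror the HTTP verbs. No real service API is being called, and real object stores often let the client choose the key rather than generating one.

```python
import uuid


class ToyObjectStore:
    """Hypothetical in-memory store mirroring the PUT / GET / DELETE verbs."""

    def __init__(self):
        # Flat namespace: object ID -> (data bytes, metadata dict).
        self._objects = {}

    def put(self, data, metadata=None):
        """Store a new object and return its generated unique ID."""
        object_id = str(uuid.uuid4())
        self._objects[object_id] = (bytes(data), dict(metadata or {}))
        return object_id

    def get(self, object_id):
        """Return the (data, metadata) pair for an object ID."""
        return self._objects[object_id]

    def delete(self, object_id):
        """Remove an object by its ID."""
        del self._objects[object_id]


store = ToyObjectStore()
oid = store.put(b"hello", {"content-type": "text/plain"})
data, meta = store.get(oid)
store.delete(oid)
```

There are no directories in this sketch: the ID is the only handle on an object, which is the property the rest of this section relies on.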
Because object storage supports only a small set of operations, it comes with significant limitations as well as benefits:
• Objects are intended to be written once, so a node does not need to lock an object before reading its contents. There is no risk that another node writes to the object while it is being read.
• The unique identifier is the only way to refer to an object. A simple hash function over the identifier can therefore determine the physical location of the object, such as its disk or storage node. A compute node does not need to contact a metadata server to identify which server hosts an object's content.
• I/O performance scales efficiently because data is accessed without locks and objects map deterministically to the physical location of their data. Many compute nodes can read from the object storage servers at once without contending with one another.
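The deterministic mapping from an object's identifier to its location can be sketched with a plain hash function. This is an illustration under stated assumptions, not how any particular product places data: the node names and the `locate` helper are hypothetical, and SHA-256 with a modulo is just one simple choice.

```python
import hashlib

STORAGE_NODES = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical nodes


def locate(object_id, nodes=STORAGE_NODES):
    """Map an object ID to a storage node without asking a metadata server."""
    digest = hashlib.sha256(object_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes of the digest as an integer and pick a node.
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


for object_id in ["invoice-001", "invoice-002", "photo-17"]:
    print(object_id, "->", locate(object_id))
```

Every compute node that runs the same function over the same ID reaches the same storage node, so no lookup service sits on the read path. One caveat of plain modulo hashing is that changing the number of nodes remaps most objects, which is why production systems tend to use schemes such as consistent hashing instead.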