Trying to answer: Block, File & Object
Canonical ubuntu.com

Before recommending which kind of storage is ideal for a given workload, I always start by explaining the different storage types and their use cases. There are two important types of data in the enterprise storage environment: structured and unstructured data.

Structured data patterns are everywhere in applications. Whenever a new application is written with a structured data pattern in mind, the data structures and methods are written to handle data in a fixed way: they always use the same data attributes and leave missing attributes empty. Data is entered through an input interface with error handling, where breaking the rules results in an error and the incompatible fields have to be corrected to comply with the expected data format. Many applications still use this design method, which reduces the flexibility to add new features because of backwards compatibility, reorganisation issues or other limitations.
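To make the structured pattern concrete, here is a minimal sketch of an input handler that enforces a fixed set of attributes. The record type, the field names and the `validate` helper are all hypothetical and only serve as an illustration; real applications usually enforce this through a database schema or a form framework.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CustomerRecord:
    # Hypothetical fixed schema: every record carries the same attributes.
    customer_id: int
    name: str
    email: Optional[str] = None  # a missing attribute simply stays empty


def validate(raw: dict) -> CustomerRecord:
    """Reject input that does not comply with the expected data format."""
    allowed = {"customer_id", "name", "email"}
    unknown = set(raw) - allowed
    if unknown:
        # A new attribute is an error until the schema itself is changed.
        raise ValueError(f"unexpected fields: {sorted(unknown)}")
    if not isinstance(raw.get("customer_id"), int):
        raise ValueError("customer_id must be an integer")
    if not isinstance(raw.get("name"), str) or not raw["name"]:
        raise ValueError("name must be a non-empty string")
    return CustomerRecord(raw["customer_id"], raw["name"], raw.get("email"))
```

Adding a new attribute means changing the schema and every piece of code that depends on it, which is exactly the loss of flexibility described above.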

When thinking about block-level storage in an enterprise environment, the main advantage is the ability to choose the best filesystem for each use case, which makes it possible to match the requirements of the application and get the most out of performance, reliability and so on. The downside of a block-level environment is the added management of the different layers: each requires specific knowledge, and there is no one-filesystem-fits-all solution. With the data explosion in mind, the choice of filesystem can also limit the ability to grow, and its specific tuning requirements reduce flexibility in automation and orchestration tools. As a storage guy who likes tweaking, I am a big enthusiast of block-level storage, but I must admit that environments which will only grow faster in the coming years bring many challenges to tackle.

A second type of storage, designed to handle unstructured and semi-structured data, is the Network Attached Storage (NAS) device, where the filesystem is configured not on the host side but on the storage subsystem. These devices provide a huge benefit in management and standardisation, since actions can be performed on files and folders from the storage side, which allows more granularity for features like snapshotting and mirroring. Every advantage has a disadvantage, though: the filesystem is exported through different protocols, and the implementation of these protocol daemons can create bottlenecks (performance, management, interoperability, ...). The key to performance on a filesystem is metadata. How metadata is handled and stored is crucial to the design of a filesystem architecture, so future scalability and performance should always be kept in mind when defining the initial setup.
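As a small illustration of the metadata a filesystem keeps on behalf of every file, the sketch below reads a few attributes with Python's standard `os.stat`. The helper name `filesystem_metadata` and the temporary sample file are only there for the example.

```python
import os
import tempfile


def filesystem_metadata(path: str) -> dict:
    """Return a few of the attributes the filesystem stores for a file."""
    info = os.stat(path)
    return {
        "size_bytes": info.st_size,
        "modified": info.st_mtime,  # seconds since the epoch
        "mode": oct(info.st_mode & 0o777),  # permission bits only
    }


# Create a throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(b"hello")
    sample_path = handle.name

sample = filesystem_metadata(sample_path)
os.remove(sample_path)
```

Every lookup like this is a metadata operation, so the number of files and the way the filesystem stores these attributes directly affect how well the design scales.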

Many new applications no longer get a standalone interface, because of the flexibility that current web and mobile development platforms offer. The new look and feel, combined with new unstructured database technologies, allows application development teams to be more agile: they can easily add new data formats and features thanks to the open design. All this flexibility requires a backend that can grow and adapt together with the application without limitations, one that reduces the overhead of the underlying file structure and allows applications to take full control of the storage layer. An answer to this is object storage.

By working with objects in a container instead of files on a filesystem, many scalability and performance issues are resolved: the intelligence to handle the data no longer comes from the filesystem, because the application talks directly to the objects through REST APIs using the object identifier. The metadata that the filesystem stored and handled in a structured design is now added to the object itself by the application, and it is even possible to add extra, specific metadata to each object without changing anything at the storage layers. The standardisation of the REST APIs for data manipulation allows easy access, migration and integration with new technology through the keep-it-simple-and-stupid methodology (POST, GET, PUT, PATCH, DELETE).
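The object model can be sketched with a toy in-memory container. This is not the S3 or Swift API; the `Container` and `StoredObject` names and their methods are hypothetical and only mirror the idea of addressing data by identifier and keeping the metadata with the object.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class StoredObject:
    data: bytes
    # Metadata travels with the object instead of living in a filesystem.
    metadata: Dict[str, str] = field(default_factory=dict)


class Container:
    """Flat namespace: objects are addressed by identifier, not by path."""

    def __init__(self) -> None:
        self._objects: Dict[str, StoredObject] = {}

    def put(self, data: bytes, metadata: Optional[Dict[str, str]] = None) -> str:
        object_id = uuid.uuid4().hex
        self._objects[object_id] = StoredObject(data, dict(metadata or {}))
        return object_id

    def get(self, object_id: str) -> StoredObject:
        return self._objects[object_id]

    def patch_metadata(self, object_id: str, extra: Dict[str, str]) -> None:
        # Extra attributes need no change at the storage layer.
        self._objects[object_id].metadata.update(extra)

    def delete(self, object_id: str) -> None:
        del self._objects[object_id]


container = Container()
photo_id = container.put(b"raw image bytes", {"content-type": "image/jpeg"})
container.patch_metadata(photo_id, {"camera": "hypothetical-model"})
```

In a real object store the same operations would be HTTP requests against the object's URL, but the shape is the same: the application owns the identifier and the metadata.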

Object storage provides an ideal solution for unstructured data, since a new data pattern can be implemented and stored directly on an object store. Block and file-level storage remain crucial for environments that require high transactional performance: the main downside of an object store is that it is very good at throughput but lacks transactional performance (this can be improved by caching, but it is not provided by design). New applications, and legacy applications that create and store a huge amount of content, are also being created or adapted to use object stores for their data efficiency, endless scalability and cloud enablement. Another common use case is backup.

A short summary of these three storage types:

Block-level storage has the most fine-tuning capabilities, which provides maximum performance over Fibre Channel, iSCSI or FCoE. A LUN has to be formatted on a host, where a filesystem has to be deployed. The LUN is the granularity at which features are applied.

File-level storage provides standardisation by using a single filesystem and export protocols to allow data communication between a host and a storage subsystem. The three commonly used connection protocols are NFS, SMB and FTP. The file is the granularity at which features are applied.

Object-level storage is managed by the applications and is designed for unstructured data. Data is manipulated through REST APIs, of which S3 and Swift are the commonly used implementations. The object is the granularity at which features are applied.

Because of the flexibility of file-level storage, NAS gateways are commonly used to let block and object-level storage benefit from file-level capabilities. All three types have their use cases; it all depends on the workload and on how much data the application stack has to handle and store.

To make it even more complex, it is possible to use multiple types of storage in a tiering design, which gives the best fit for each data type. Thanks to improvements in regional and cross-region bandwidth, cloud storage can now be implemented as an extra tier in a hybrid model, which could increase data availability and redundancy while lowering cost.
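A tiering design boils down to a placement policy. The sketch below is one possible rule set, assuming made-up tier names and age thresholds (30 and 180 days); actual policies depend on the workload, the products in use and the cost model.

```python
from dataclasses import dataclass


@dataclass
class DataSet:
    name: str
    days_since_last_access: int
    transactional: bool  # needs high transactional performance


def choose_tier(item: DataSet, warm_after_days: int = 30,
                cold_after_days: int = 180) -> str:
    """Pick a storage tier; thresholds and tier names are illustrative."""
    if item.transactional:
        return "block"  # tuned filesystem on a LUN
    if item.days_since_last_access >= cold_after_days:
        return "cloud-object"  # extra tier in a hybrid model
    if item.days_since_last_access >= warm_after_days:
        return "object"  # on-premises object store
    return "file"  # shared NAS export


placements = {
    item.name: choose_tier(item)
    for item in [
        DataSet("orders-db", 0, True),
        DataSet("team-share", 3, False),
        DataSet("media-archive", 90, False),
        DataSet("old-backups", 400, False),
    ]
}
```

Keeping the policy in one small function makes it easy to adjust the thresholds as the amount of data and the bandwidth to the cloud tier change.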


More articles by Joeri Van Speybroek
