Data Today: File Systems and Storage

Dru Macasieb

Published Dec 4, 2020

As I advance my understanding of computer science, cybersecurity, and cloud technology, I figured I'd share what I have come to understand. If I have misunderstood something in this article, please correct me. I also encourage you to share your knowledge, experience, and resources on the subject matter by commenting below. You'll inspire and add value to me and our network. -Dru Macasieb

In today's connected world, where artificial intelligence, the Internet of Things, and computer automation are ever more prevalent, data has become the new gold. But unlike gold, data is not scarce: there is an overabundance of it, and its value increases the more we have of it. We have always generated data, but we never collected, managed, and analyzed it at the scale we do today, a scale that is growing exponentially. This has given rise to new ways [new to me at least] of managing and storing data.

Two frameworks for managing data are Hadoop and Ceph. The Hadoop Distributed File System (HDFS) is an open-source distributed file system that breaks large files into pieces spread across a cluster of machines. Hadoop then maps over those broken-down pieces before processing them into outputs (Langit, 2019). In short, Hadoop is a distributed batch-processing system that outputs information. Organizations have built other processes on top of this output to address their own unique needs, which gave rise to Hadoop distributions.
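To make the "break it down, map it, then produce an output" idea concrete, here is a minimal sketch in plain Python. It is not Hadoop code and uses no Hadoop API; the function names and the tiny block size are my own assumptions, chosen only to illustrate the pattern on a single machine.

```python
from collections import Counter
from functools import reduce


def split_into_blocks(text, block_size):
    """Break the input into fixed-size groups of words.

    This loosely imitates how a distributed file system stores a large
    file as smaller pieces. The block size here is an arbitrary assumption.
    """
    words = text.split()
    return [words[i:i + block_size] for i in range(0, len(words), block_size)]


def map_block(block):
    """Map step: count the words in one block, independently of the others."""
    return Counter(word.lower() for word in block)


def reduce_counts(left, right):
    """Reduce step: merge two partial counts into one."""
    return left + right


def word_count(text, block_size=4):
    """Run the whole pipeline: split, map each block, reduce to one output."""
    blocks = split_into_blocks(text, block_size)
    partial_counts = [map_block(block) for block in blocks]
    return reduce(reduce_counts, partial_counts, Counter())


if __name__ == "__main__":
    sample = "data is the new gold and data keeps growing"
    print(word_count(sample))
```

In a real cluster each block would live on a different machine and the map step would run where the data is stored. Here everything runs in one process, which is enough to show why the pieces can be processed independently before being combined.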

Vector art created by Garry Killian - www.freepik.com

While HDFS is an open-source distributed file system, Ceph is an open-source software storage solution. Ceph is a single platform for the object, block, and file storage needs of new data (Canonical, 2020). Data centers adopting open source as the new norm for high-growth block storage, object stores, and data lakes can use Ceph as a scalable solution that can be much more cost effective than HDFS.

Hadoop and Ceph are just two frameworks used to manage and process our exponentially growing data. Data warehouse and data lake are two terms that describe two different ways of storing data. A data warehouse stores data in a much more organized manner, because the data has already been processed, analyzed, and prepared for reuse in the cloud (Brandon, 2019). Just imagine Costco and how it organizes its goods by type in rows and columns. A data lake, on the other hand, is less organized but can handle an assortment of data without compromising availability, since organizations can analyze and process the data for application use while it sits in the lake (Brandon, 2019). Think of a data lake like a recreational lake where you can ride a boat, fish, and swim. The water in the lake is the data, and you can use it at your pleasure.
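The contrast above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the column names, the function names, and the sample records are hypothetical and do not come from any warehouse or lake product. The warehouse side shapes data before storing it, while the lake side stores whatever arrives and interprets it only when it is read.

```python
import json

# Hypothetical warehouse schema, assumed for illustration only.
WAREHOUSE_COLUMNS = ("item", "category", "price")


def load_into_warehouse(records):
    """Warehouse style: validate and organize each record before storing it."""
    table = []
    for record in records:
        missing = [col for col in WAREHOUSE_COLUMNS if col not in record]
        if missing:
            raise ValueError(f"record missing columns: {missing}")
        row = (str(record["item"]), str(record["category"]), float(record["price"]))
        table.append(row)
    return table


def load_into_lake(raw_items):
    """Lake style: keep every item exactly as it arrived."""
    return list(raw_items)


def read_prices_from_lake(lake):
    """Interpret lake contents at read time, skipping items that do not fit."""
    prices = []
    for raw in lake:
        try:
            doc = json.loads(raw)
        except (TypeError, ValueError):
            continue  # not JSON, so this query ignores it
        if isinstance(doc, dict) and "price" in doc:
            prices.append(float(doc["price"]))
    return prices


if __name__ == "__main__":
    warehouse = load_into_warehouse(
        [{"item": "rice", "category": "grocery", "price": "9.5"}]
    )
    lake = load_into_lake(
        ['{"item": "rice", "price": 9.5}', "sensor ping 17:02", '{"clicks": 3}']
    )
    print(warehouse)
    print(read_prices_from_lake(lake))
```

The warehouse loader rejects anything that does not match its columns, like a Costco aisle that only accepts goods of its type. The lake accepts all three items and leaves it to each reader to decide what is useful.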

Image designed by Biancoblue - Freepik.com

Our capacity for collecting, storing, and analyzing data is being outpaced by the amount we generate. The rise of object storage and the various file management and data storage techniques is a natural innovation. Although they ease the burden of the rapid flow of massive data, trying to grasp all of these techniques at once will only overwhelm you. Instead, view it as an opportunity to understand the data and file types used within an industry, and specialize in it by mastering its frameworks.

If data is the new gold, become the jeweler.

References:

Brandon, J. (2019, December 4). What is a data lake? Everything you need to know. Retrieved from https://www.techradar.com/news/what-is-a-data-lake

Canonical. (2020). What is Ceph. Retrieved from https://docs.google.com/document/d/1lND84EW6KCXKtlGUdIdMSGTfpXvyWIq_M8UQi3eT47o/edit#

Langit, L. (2019). Introducing Hadoop. From the course Learning Hadoop on LinkedIn Learning. Retrieved from https://www.garudax.id/learning/learning-hadoop-2/introducing-hadoop?u=0
