Data Today: File Systems and Storage
As I advance my understanding of computer science, cybersecurity, and cloud technology, I figured I'd share what I have come to understand. If I have misunderstood something in this article, please correct me. I also encourage you to share your knowledge, experience, and resources on the subject by commenting below. You'll inspire me and add value to our network. -Dru Macasieb
In today's connected world, where artificial intelligence, the Internet of Things, and computer automation are ever so prevalent, data has become the new gold. But unlike gold, data is not scarce. There's an overabundance of it, and its value increases the more of it we have. We’ve always generated data, but we never collected, managed, and analyzed it at the scale we do today, and that scale is growing exponentially. This has given rise to new ways [new to me, at least] of managing and storing data.
Two frameworks for managing data are Hadoop and Ceph. Hadoop's storage layer, the Hadoop Distributed File System (HDFS), is an open-source distributed file system that breaks large files into blocks and spreads them across a cluster of machines. Hadoop's processing layer then maps out those broken-down pieces before reducing them into outputs (Langit, 2019). In short, Hadoop is a distributed batch processing system that turns raw data into information. Organizations have built other tools on top of this core to address their own unique needs, which gave rise to Hadoop distributions.
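To make the "map out the pieces, then produce outputs" idea concrete, here is a minimal, single-machine Python sketch of the classic MapReduce word count. It is not Hadoop's actual API. The function names and the tiny line-based block size are hypothetical and chosen only for illustration. Real HDFS splits files by bytes (128 MB blocks by default in recent Hadoop versions), and Hadoop tries to run each map task on a node that holds the block.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical tiny block size measured in lines; real HDFS blocks are byte-based.
BLOCK_SIZE_LINES = 2


def split_into_blocks(lines: list[str], size: int) -> list[list[str]]:
    """Break the input into fixed-size pieces, loosely mimicking HDFS blocks."""
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def map_block(block: list[str]) -> Iterator[tuple[str, int]]:
    """Map phase: emit (word, 1) for every word in one block."""
    for line in block:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle phase: group every emitted value by its key."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(grouped: dict[str, list[int]]) -> dict[str, int]:
    """Reduce phase: collapse each key's values into a single output."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines: list[str], block_size: int = BLOCK_SIZE_LINES) -> dict[str, int]:
    blocks = split_into_blocks(lines, block_size)
    # On a Hadoop cluster each block would be mapped on a different node in parallel;
    # here the blocks are simply processed one after another.
    mapped = (pair for block in blocks for pair in map_block(block))
    return reduce_counts(shuffle(mapped))


if __name__ == "__main__":
    text = [
        "data is the new gold",
        "gold is scarce",
        "data is not scarce",
    ]
    print(word_count(text))
```

Running it on the three sample lines counts "is" three times and "data", "gold", and "scarce" twice each. On a real cluster the map step runs in parallel across many machines, which is where Hadoop gets its speed on large batches.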
While HDFS is an open-source distributed file system, Ceph is an open-source storage platform. Ceph serves object, block, and file storage needs from a single system (Canonical, 2020). Data centers adopting open source as the new norm for high-growth block storage, object stores, and data lakes can use Ceph as a scalable solution that can be much more cost-effective than HDFS.
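Part of what makes Ceph scale is that clients compute where an object lives instead of asking a central lookup service. The sketch below assumes a simplified model of that idea: an object name is hashed into a placement group (PG), and each PG is assigned to a few storage daemons (OSDs). Ceph's real placement algorithm is CRUSH, which also accounts for failure domains and device weights. I've swapped in plain rendezvous hashing as a stand-in, and the PG count and OSD names are hypothetical.

```python
import hashlib


def stable_hash(text: str) -> int:
    """Deterministic 64-bit hash (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.sha256(text.encode()).digest()[:8], "big")


def object_to_pg(object_name: str, pg_count: int) -> int:
    """Step 1: hash the object name into one of a fixed number of placement groups."""
    return stable_hash(object_name) % pg_count


def pg_to_osds(pg_id: int, osds: list[str], replicas: int) -> list[str]:
    """Step 2: choose `replicas` distinct OSDs for a placement group.

    Simplified stand-in for Ceph's CRUSH algorithm, using rendezvous
    (highest-random-weight) hashing: every OSD gets a score for this PG,
    and the top-scoring OSDs win. Any client can repeat the calculation,
    so no central lookup table is needed.
    """
    ranked = sorted(osds, key=lambda osd: stable_hash(f"{pg_id}:{osd}"), reverse=True)
    return ranked[:replicas]


def locate(object_name: str, osds: list[str], pg_count: int = 128, replicas: int = 3) -> list[str]:
    """Return the OSDs that would hold copies of the object (hypothetical defaults)."""
    return pg_to_osds(object_to_pg(object_name, pg_count), osds, replicas)


if __name__ == "__main__":
    cluster = [f"osd.{i}" for i in range(6)]
    print(locate("photos/cat.jpg", cluster))
```

Because the calculation is deterministic, every client gets the same answer. Adding a new OSD only moves the placement groups where the new daemon wins a top spot, rather than reshuffling all of the data.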
Hadoop and Ceph are just two of the frameworks used to manage and process our exponentially growing data. Data warehouses and data lakes are two terms that describe two different ways of storing data. A data warehouse stores data in a much more organized manner, because the data has already been processed, analyzed, and prepared for reuse in the cloud (Brandon, 2019). Just imagine Costco and how it organizes its goods by type into rows and columns. A data lake, on the other hand, is less organized but can hold an assortment of data without compromising availability, since organizations can analyze and process the data for applications while it stays in the lake (Brandon, 2019). Think of a data lake like a recreational lake where you can ride a boat, fish, and swim. The water in the lake is the data, and you can use it at your pleasure.
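One common way to describe the warehouse/lake difference is "schema-on-write" versus "schema-on-read". The Python sketch below is a toy model under that assumption. The `Warehouse` class forces every record into a fixed `Sale` shape before storing it (the Costco shelves), while the `Lake` class keeps raw JSON of any shape and only applies structure when someone reads it. The class names and the sale and sensor records are hypothetical.

```python
import json
from dataclasses import dataclass

REQUIRED_FIELDS = ("store", "item", "quantity")


@dataclass(frozen=True)
class Sale:
    """Hypothetical fixed schema that a warehouse table might enforce."""
    store: str
    item: str
    quantity: int


class Warehouse:
    """Schema-on-write: records are validated and shaped before they are stored."""

    def __init__(self) -> None:
        self.rows: list[Sale] = []

    def load(self, raw: dict) -> None:
        # Raises KeyError for anything missing one of the table's columns.
        row = Sale(store=str(raw["store"]), item=str(raw["item"]), quantity=int(raw["quantity"]))
        self.rows.append(row)


class Lake:
    """Schema-on-read: raw records of any shape are kept as-is."""

    def __init__(self) -> None:
        self.objects: list[str] = []

    def put(self, raw: dict) -> None:
        self.objects.append(json.dumps(raw))

    def read_sales(self) -> list[Sale]:
        # Structure is applied only at query time; records that don't look
        # like sales are skipped for this query but stay in the lake.
        sales = []
        for blob in self.objects:
            record = json.loads(blob)
            if all(field in record for field in REQUIRED_FIELDS):
                sales.append(Sale(str(record["store"]), str(record["item"]), int(record["quantity"])))
        return sales


if __name__ == "__main__":
    records = [
        {"store": "Seattle", "item": "coffee", "quantity": 3},
        {"sensor": "door-7", "opened": True},  # an IoT event with a different shape
    ]
    warehouse, lake = Warehouse(), Lake()
    for record in records:
        lake.put(record)
        try:
            warehouse.load(record)
        except KeyError:
            print("warehouse rejected:", record)
    print(len(warehouse.rows), len(lake.objects), lake.read_sales())
```

Here the warehouse rejects the sensor event because it doesn't fit the table, while the lake keeps both records and still lets you pull out the sales later.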
Our capacity for collecting, storing, and analyzing data is being outpaced by the amount we generate. The rise of object storage and the various file management and data storage techniques is a natural response. Although they ease the burden of the rapid flow of massive data, trying to grasp all of these techniques at once will only overwhelm you. Instead, view it as an opportunity to understand the data and file types used within an industry, and specialize in that industry by mastering its frameworks.
If data is the new gold, become the jeweler.
References:
Brandon, J. (2019, December 4). What is a data lake? Everything you need to know. Retrieved from https://www.techradar.com/news/what-is-a-data-lake
Canonical. (2020). What is Ceph? Retrieved from https://docs.google.com/document/d/1lND84EW6KCXKtlGUdIdMSGTfpXvyWIq_M8UQi3eT47o/edit#
Langit, L. (2019). Introducing Hadoop. From the course Learning Hadoop on LinkedIn Learning. Retrieved from https://www.garudax.id/learning/learning-hadoop-2/introducing-hadoop?u=0