Enterprise Data Lakes: yes, they are safe to swim in!
One of the biggest controversies these days concerns data lakes and ‘enterprise data lake platforms’ (EDLPs). Just as we had heated debates some decades ago about whether data warehouses were preferable to data marts, discussions are now running high about the correct term for pooling all the information an organization has at its disposal. But let’s not haggle over what to call this new principle; let’s just look at why EDLPs are so important and how they are a safe and sure way for companies to bring all their assets together. I see at least four good reasons why this new concept fits any organization.
Structured vs. unstructured
Enterprises are gathering bits by the petabyte these days, and with the Internet of Things (IoT) coming along, the stream of information that needs to be stored, managed and protected will keep growing exponentially. More importantly, the nature of these valuable assets is changing: data is no longer only structured but also comes in unstructured or semi-structured formats, and the old way of storing everything in a relational database no longer holds up. Unstructured and semi-structured data will come from end-user computing (driven mainly by the proliferation of personal devices under BYOD, or bring-your-own-device, policies) and from devices, sensors and other ‘connected things’. As IDC put it in its excellent study ‘Enterprise Data Lake Platforms: Deep Storage for Big Data and Analytics’, “data lakes (DLs) can be thought of as a corpus of unstructured and semi-structured data collected and collated from different sources into a single unified data pool.” Unstructured is the new normal when it comes to information strategy.
Bye bye (expensive) silo
The biggest advantage of a data lake is, of course, that every asset you want to analyze and base your decisions on can be found in this one place. Gone are the days when data was stored in different places on various platforms and duplicated several times over. This caused enterprises to overinvest in hardware, storing the same data over and over again. Since an EDLP takes care of pooling all assets, companies no longer need to overprovision hardware for each and every silo.
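To make the overprovisioning argument concrete, here is a back-of-the-envelope sketch. The dataset names, sizes, copy counts and headroom percentages are purely hypothetical assumptions chosen for illustration, not measurements. The shape of the calculation is what matters: in the siloed model every copy lives in its own silo with its own spare capacity, while the pool stores each dataset once and shares a single (here assumed smaller) margin. Redundancy for durability, which both approaches would need, is left out.

```python
# Back-of-the-envelope comparison of siloed vs. pooled storage capacity.
# Every figure below is a hypothetical assumption for illustration only.

# Hypothetical datasets: size in TB, and how many silos keep their own copy.
DATASETS = {
    "crm": {"size_tb": 40, "copies": 3},
    "web_logs": {"size_tb": 120, "copies": 2},
    "sensor_feeds": {"size_tb": 200, "copies": 2},
}

# Hypothetical spare capacity: each silo is sized for its own peak, while a
# shared pool is assumed to need a smaller relative margin.
SILO_HEADROOM = 0.30
POOL_HEADROOM = 0.15


def siloed_capacity_tb(datasets, headroom):
    """Each copy sits in its own silo, and each silo carries its own headroom."""
    return sum(d["size_tb"] * d["copies"] * (1 + headroom) for d in datasets.values())


def pooled_capacity_tb(datasets, headroom):
    """Each dataset is stored once in the pool, plus one shared headroom margin."""
    return sum(d["size_tb"] for d in datasets.values()) * (1 + headroom)


siloed = siloed_capacity_tb(DATASETS, SILO_HEADROOM)
pooled = pooled_capacity_tb(DATASETS, POOL_HEADROOM)
print(f"siloed: {siloed:.0f} TB, pooled: {pooled:.0f} TB, saving: {1 - pooled / siloed:.0%}")
```

With different assumptions the saving will differ; the sketch only shows where it comes from (fewer duplicate copies and one shared margin instead of many).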
Better management and security
Pooling information comes with the added bonus of better manageability and security. Silos are hard to control and orchestrate; by concentrating everything in one DL, you at least have everything in one place, where it is easy to oversee and far easier to secure. Most EDLPs will contain highly sensitive data as well as less valuable information, so the system as a whole will need what is called the ‘triple A of security’: authorization, audit and authentication, for both users and applications. Encryption will most likely become standard for DLs before long.
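The ‘triple A’ can be illustrated with a toy, in-memory model. This is a minimal sketch under stated assumptions, not the API of any EDLP product: the class `ToyLakeGuard`, the sensitivity labels and the principal names are hypothetical. It treats users and applications identically, authenticates each request, authorizes it against the dataset’s sensitivity label, and audits every attempt, whether allowed or denied. Real platforms would delegate these steps to dedicated identity and policy services and use salted, deliberately slow password hashes rather than plain SHA-256.

```python
import hashlib
import hmac
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Principal:
    """A user or an application; the toy guard treats both the same way."""
    name: str
    kind: str                # "user" or "application"
    secret_hash: bytes       # SHA-256 of the secret (toy scheme: unsalted)
    clearance: set = field(default_factory=set)  # sensitivity labels it may read


def _hash(secret: str) -> bytes:
    return hashlib.sha256(secret.encode()).digest()


class ToyLakeGuard:
    """Hypothetical authentication + authorization + audit wrapper for a lake."""

    def __init__(self):
        self.principals = {}
        self.dataset_labels = {}  # dataset name -> sensitivity label
        self.audit_log = []       # append-only: (timestamp, who, dataset, outcome)

    def register(self, name, kind, secret, clearance):
        self.principals[name] = Principal(name, kind, _hash(secret), set(clearance))

    def label(self, dataset, sensitivity):
        self.dataset_labels[dataset] = sensitivity

    def _authenticate(self, name, secret):
        principal = self.principals.get(name)
        if principal is None:
            return None
        # compare_digest compares in constant time to avoid timing leaks.
        return principal if hmac.compare_digest(principal.secret_hash, _hash(secret)) else None

    def _authorize(self, principal, dataset):
        label = self.dataset_labels.get(dataset)
        return label is not None and label in principal.clearance

    def read(self, name, secret, dataset):
        principal = self._authenticate(name, secret)
        if principal is None:
            outcome = "denied: authentication failed"
        elif not self._authorize(principal, dataset):
            outcome = "denied: not authorized"
        else:
            outcome = "allowed"
        # Audit: record every attempt, successful or not.
        self.audit_log.append((datetime.now(timezone.utc).isoformat(), name, dataset, outcome))
        return outcome == "allowed"


guard = ToyLakeGuard()
guard.label("customer_pii", "sensitive")
guard.label("web_clickstream", "internal")
guard.register("alice", "user", "s3cret", {"internal", "sensitive"})
guard.register("report-app", "application", "app-token", {"internal"})

results = [
    guard.read("alice", "s3cret", "customer_pii"),          # user with clearance
    guard.read("report-app", "app-token", "customer_pii"),  # app lacks clearance
    guard.read("report-app", "wrong", "web_clickstream"),   # bad secret
]
for entry in guard.audit_log:
    print(entry)
```

Keeping the audit step outside the allow/deny branches is the design point: a denied request is at least as interesting to an auditor as a granted one.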
Agnostic on many fronts
The beauty of an EDLP lies in the fact that it supports many standards, especially open standards, for both input and access. After all, the information in the DL will be used by very different sets of applications: classical enterprise applications such as SAP or Oracle, for instance, or new types of applications that cater to developments like cloud or mobile computing, or social media. EDLPs support a variety of file and object interfaces, and even block interfaces. The same openness applies to access mechanisms, which can be open, standards-based or even application-specific. The power of this model is that data of different types can be accessed through different mechanisms without first being transformed into a different format. There is no need for ETL (Extract, Transform, Load); all the information can be analyzed in place.
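Analyzing data in place is often described as ‘schema-on-read’: raw objects are kept exactly as they arrived, and a parser is chosen only at query time. The sketch below illustrates that principle with hypothetical object paths and field names; in a real EDLP a query engine would do this work rather than hand-written readers. The only assumption about the two formats is that both records carry a `device` field.

```python
import csv
import io
import json
from collections import Counter

# Hypothetical raw objects, stored exactly as they arrived (no ETL step).
RAW_OBJECTS = {
    "sensors/2024-01-01.jsonl": (
        '{"device": "pump-1", "temp_c": 71.5}\n'
        '{"device": "pump-2", "temp_c": 64.0, "note": "restarted"}\n'
        '{"device": "pump-1", "temp_c": 73.2}\n'
    ),
    "maintenance/log.csv": (
        "device,action\n"
        "pump-1,inspection\n"
        "pump-3,replacement\n"
    ),
}


def read_records(path, payload):
    """Schema-on-read: pick a parser from the object's extension at query time."""
    if path.endswith(".jsonl"):
        for line in payload.splitlines():
            if line.strip():
                yield json.loads(line)
    elif path.endswith(".csv"):
        yield from csv.DictReader(io.StringIO(payload))
    else:
        raise ValueError(f"no reader for {path}")


def events_per_device(objects):
    """Query both formats in place; nothing is converted or copied beforehand."""
    counts = Counter()
    for path, payload in objects.items():
        for record in read_records(path, payload):
            counts[record["device"]] += 1
    return counts


print(events_per_device(RAW_OBJECTS))
```

Note that the semi-structured JSON records do not all share the same fields (one has an extra `note`), which is exactly the kind of variation a relational load step would have to resolve up front.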
Is a DL something for me, you may ask? Well, let me assure you: every industry has a potential DL use case. It can help you put an end to a siloed approach, acquire a 360-degree view of your customers, gain insight into social media trends, and much more. In fact, there’s no end to the opportunities that an EDLP can offer you. Just like moving to the cloud, transitioning to a DL approach is a journey that you will undertake in several steps, the first being the adoption of a software-defined strategy. EDLPs are just another example of how software, not hardware, is defining the datacenter of the future.
Dear Jacques Boschung, once again many thanks for your thoughts and contribution! I agree entirely. Additionally, I feel that open-source-based solutions will play an important role in a software-driven EDLP environment. I'd like to call it a tiered software and data strategy, whereby the hardware medium as well as the datacenter environment become a basic, virtual asset, managed very differently from today's segmented and dedicated setups. A hybrid environment is not a technology but a leveraged mindset, and chaos management theories will become a standardized IT approach in order to cope with an exponentially growing EDLP data environment. Best regards, Bob