Is Data Pipeline a commodity in the making?
A couple of days ago I was chatting with a good friend of mine, and he made a statement that immediately resonated with my own thoughts: "Data piping and data processing at scale will soon become a commodity." Interesting thought, isn't it?
Let me explain a bit. Today, access to a highly scalable cloud processing platform is a commodity. You can set up a massive Spark cluster on AWS in less than 10 minutes, or create a cluster of MapReduce workers on Google Cloud in the same amount of time.
Amazon, Google, and Microsoft already provide all the underlying components you need to build a solid data pipeline architecture (AWS EC2, S3, EMR, Redshift, etc.).
The dataflow logic is the missing layer: the glue that links all these components together to build the information layer, which in turn creates value for the business.
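To make the "glue" idea concrete, here is a minimal, purely illustrative sketch in Python: each stage (extract, transform, load) is an ordinary function, and the pipeline is simply their composition. All stage names and data below are hypothetical stand-ins; a real pipeline would wire together services like S3, EMR, and Redshift instead of in-memory lists.

```python
# Minimal sketch of dataflow "glue": a pipeline is an ordered list of
# stages, each a function whose output feeds the next stage's input.
# All names and data here are invented, for illustration only.

def extract():
    # Stand-in for reading raw events from, e.g., S3.
    return [{"user": "a", "amount": 10}, {"user": "b", "amount": -5}]

def transform(records):
    # Stand-in for a Spark/MapReduce job: keep only valid records.
    return [r for r in records if r["amount"] > 0]

def load(records):
    # Stand-in for writing results to a warehouse such as Redshift.
    return {"rows_loaded": len(records)}

def run_pipeline(stages):
    # The "glue": call the first stage, then feed each stage's
    # output into the next one.
    data = None
    for stage in stages:
        data = stage() if data is None else stage(data)
    return data

result = run_pipeline([extract, transform, load])
print(result)  # {'rows_loaded': 1}
```

The point of the sketch is that none of the individual stages is hard; the value (and the effort) sits in `run_pipeline`, the orchestration layer, which is exactly the part the cloud vendors have not yet commoditized.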
Cloud solutions required a truly hands-on approach five years ago and are a commodity today, so it is legitimate to think that this next layer will become a commodity as well.
Today a headline made me think this might come sooner than I thought: "Google wants to donate its Dataflow technology to Apache."
Overall that is great news, and it underlines that the key success factor of any data project lies not in the technology stack but in understanding the business needs and how data will provide the right insights to create value.