New Paradigm in Big Data  - Data Flow Computing

Recently I had the opportunity to learn from and listen to some of the great minds at Harvard about a new paradigm in Big Data computing. Here is my humble attempt to share it with you. Note that this technology is already deployed and actively used by a few Fortune 500 companies. The concept of dataflow computing dates back to the 1970s, but it is relevant now because of innovative new hardware. One example vendor is Maxeler Technologies (link below).

The first important step is to understand the difference between control-flow programming and dataflow programming, and how each impacts design.

Computing with a control-flow core: In a software application, the program source is transformed into a list of instructions for a particular processor, which is then loaded into the memory attached to the processor. Data and instructions are read from memory into the processor core, where operations are performed and the results are written back to memory. Modern processors contain many levels of caching, forwarding and prediction logic to improve the efficiency of this paradigm; however, the model is inherently sequential, with performance limited by the latency of data movement in this loop.
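To make the control-flow model concrete, here is a minimal sketch (the function name and numbers are illustrative, not from any vendor's API): each loop iteration loads an operand from memory, computes in the core, and stores the result back, exactly the fetch-compute-store loop described above.

```python
# Control-flow model: the processor steps through instructions one
# at a time, loading each operand from memory, computing in the
# core, and writing the result back to memory.
def scale_and_offset(data, a, b):
    result = []
    for x in data:          # load x from memory into the core
        y = a * x + b       # perform the operation in the core
        result.append(y)    # store y back to memory
    return result

print(scale_and_offset([1, 2, 3], 2, 1))  # → [3, 5, 7]
```

Every value makes a round trip through memory on each step, which is the latency loop the paragraph above refers to.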

 

Computing with dataflow cores: In a dataflow application, the program source is transformed into a dataflow engine configuration file, which describes the operations, layout and connections of a dataflow engine. Data is streamed from memory into the chip, where operations are performed and data is forwarded directly from one computational unit (a "dataflow core") to another as the results are needed, without being written to off-chip memory until the chain of processing is complete.
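A rough software analogy for this, assuming only standard Python (real dataflow engines are configured in hardware, e.g. via Maxeler's tooling, not like this), is a generator pipeline: each stage acts as a "dataflow core", and values stream from stage to stage without an intermediate result buffer.

```python
# Dataflow analogy: each generator is one "core"; values are
# forwarded stage-to-stage as they are needed, never materialized
# in an intermediate buffer until the chain completes.
def source(data):
    for x in data:
        yield x             # stream values into the pipeline

def multiply(stream, a):
    for x in stream:
        yield a * x         # first "core": multiply

def add(stream, b):
    for x in stream:
        yield x + b         # second "core": add

# Wire the stages together once ("programming in space");
# data then flows from input to output through the fixed graph.
pipeline = add(multiply(source([1, 2, 3]), 2), 1)
print(list(pipeline))  # → [3, 5, 7]
```

The program describes the connections between operations rather than a sequence of instructions, which is the essence of the paradigm shift described here.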

This is a complete paradigm shift in processing data. Instead of spending time pushing and pulling data from memory, computing addresses for reads and writes, and synchronizing threads, where as much as 95% of CPU cycles can be drained, we write a program that does not control the flow of data but instead configures the computing environment ("programming in space"), so that data flows from input port to output port in a fraction of the current processing time. Speed is limited only by the characteristics of the application, with substantial savings in space and power. This is a 180-degree shift from the single Big Data Lake concept that we architect today.

The major hurdle is the complexity of the programming involved, since each application platform must be personalized. I am sure millennials are up for the challenge of processing exabyte-scale datasets in a few seconds!

Ref: the Data Flow Computing book, the Maxeler website, and my lecture notes.
