New Paradigm in Big Data  - Data Flow Computing

Recently I had the opportunity to learn from and listen to some of the great minds at Harvard about a new paradigm in Big Data computing. Here is my humble attempt to share it with you. Note that this technology is already deployed and actively used by a few Fortune 500 companies. The concept of dataflow computing dates back to the 1970s, but it is relevant now because of innovative new hardware. One example vendor is Maxeler Technologies (link below).

The first important step is to understand the difference between control-flow programming and dataflow programming, and how each impacts design.

Computing with a control-flow core: In a software application, the program source is transformed into a list of instructions for a particular processor, which is then loaded into the memory attached to the processor. Data and instructions are read from memory into the processor core, where operations are performed and the results are written back to memory. Modern processors contain many levels of caching, forwarding and prediction logic to improve the efficiency of this paradigm; however, the model is inherently sequential, with performance limited by the latency of data movement in this loop.
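To make the control-flow model concrete, here is a minimal sketch (the function name and numbers are illustrative, not from any vendor's API): each loop iteration loads an operand from memory, computes in the core, and stores the result back, exactly the fetch-compute-store loop described above.

```python
# Control-flow model: the processor steps through instructions one
# at a time, loading each operand from memory, computing in the
# core, and writing the result back to memory.
def scale_and_offset(data, a, b):
    result = []
    for x in data:          # load x from memory into the core
        y = a * x + b       # perform the operation in the core
        result.append(y)    # store y back to memory
    return result

print(scale_and_offset([1, 2, 3], 2, 1))  # → [3, 5, 7]
```

Every value makes a round trip through memory on each step, which is the latency loop the paragraph above refers to.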

 

Computing with dataflow cores: In a dataflow application, the program source is transformed into a dataflow engine configuration file, which describes the operations, layout and connections of a dataflow engine. Data is streamed from memory into the chip, where operations are performed and data is forwarded directly from one computational unit (a "dataflow core") to another as the results are needed, without being written to off-chip memory until the chain of processing is complete.
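A rough software analogy for this, assuming only standard Python (real dataflow engines are configured in hardware, e.g. via Maxeler's tooling, not like this), is a generator pipeline: each stage acts as a "dataflow core", and values stream from stage to stage without an intermediate result buffer.

```python
# Dataflow analogy: each generator is one "core"; values are
# forwarded stage-to-stage as they are needed, never materialized
# in an intermediate buffer until the chain completes.
def source(data):
    for x in data:
        yield x             # stream values into the pipeline

def multiply(stream, a):
    for x in stream:
        yield a * x         # first "core": multiply

def add(stream, b):
    for x in stream:
        yield x + b         # second "core": add

# Wire the stages together once ("programming in space");
# data then flows from input to output through the fixed graph.
pipeline = add(multiply(source([1, 2, 3]), 2), 1)
print(list(pipeline))  # → [3, 5, 7]
```

The program describes the connections between operations rather than a sequence of instructions, which is the essence of the paradigm shift described here.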

This is a complete paradigm shift in processing data. Instead of spending time pushing and pulling data from memory, computing addresses for reads and writes, and synchronizing threads, where as much as 95% of CPU cycles can be drained, we write a program that does not control the flow of data but instead configures the computing environment ("programming in space"), so that data flows from input port to output port in a fraction of the current processing time. Speed is limited only by the characteristics of the application, with substantial savings in space and power. This is a 180-degree shift from the single Big Data Lake concept that we architect today.

The major hurdle is the complexity of the programming involved, since each application platform must be personalized. I am sure millennials are up for the challenge of processing exabyte-scale datasets in a few seconds!

Ref: the Data Flow Computing book, the Maxeler website, and my lecture notes.
