Lambda Architecture for ML Models
The Lambda architecture is a layered data processing architecture designed for processing large volumes of data continuously. It has three layers: the Batch Layer, the Fast Layer (often called the speed layer) and the Serving Layer. The batch layer processes the data at rest, and together the layers keep a data view updated in near real time. Usually, a data view means a data table that is used for reporting.
In ML, models are created instead. By definition, ML models are representations of real data in the form of mathematical equations. These equations have parameters, which are derived in the learning process. Hence, an ML model can be considered another form of data view: after all, its parameters are derived from real-world data.
Real-world data almost always arrives continuously, in real time. This frequently makes an ML model outdated, because its parameters do not reflect the newest data.
Often the latest information is the most relevant. On ML platforms, models are therefore frequently discarded and created afresh. If a model is built over big data, this costs a lot more.
A newer, rather unexplored concept is to upgrade the model. To upgrade means not to discard the model but to update its parameter values by processing only the latest data, i.e. data that arrived after the previous update.
The fundamental question is how to update the model without reprocessing the old data. The answer lies in the mathematics followed in the ML model building process. In ML models, (big) data is processed in a series of steps, each of which represents a task. Many of these steps have the properties of additivity and commutativity, the two fundamental properties required to process data in parallel. If these steps are identified and the matrices in those tasks are updated with the new data, a model can be upgraded.
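To make the additivity idea concrete, here is a minimal sketch using simple linear regression (y = a + b*x) as a stand-in model. The choice of model, the class name `RegressionStats` and the sample rows are assumptions made for illustration, not something taken from a particular ML platform. The point is only that the sums behind the fit can be added chunk by chunk, in any order, so old data never has to be read again.

```python
from dataclasses import dataclass


@dataclass
class RegressionStats:
    """Additive sufficient statistics for fitting y = a + b*x by least squares."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0

    @classmethod
    def from_rows(cls, rows):
        # Accumulate the five sums over one chunk of (x, y) rows.
        stats = cls()
        for x, y in rows:
            stats.n += 1
            stats.sx += x
            stats.sy += y
            stats.sxx += x * x
            stats.sxy += x * y
        return stats

    def merge(self, other):
        # Plain addition: order and grouping of the chunks do not matter.
        return RegressionStats(
            self.n + other.n,
            self.sx + other.sx,
            self.sy + other.sy,
            self.sxx + other.sxx,
            self.sxy + other.sxy,
        )

    def fit(self):
        # Closed-form least-squares solution from the accumulated sums.
        denom = self.n * self.sxx - self.sx ** 2
        slope = (self.n * self.sxy - self.sx * self.sy) / denom
        intercept = (self.sy - slope * self.sx) / self.n
        return intercept, slope


old_rows = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]   # data already processed
new_rows = [(4.0, 9.0), (5.0, 11.0)]              # data that arrived later

old_stats = RegressionStats.from_rows(old_rows)
upgraded = old_stats.merge(RegressionStats.from_rows(new_rows))   # touches only new rows
rebuilt = RegressionStats.from_rows(old_rows + new_rows)          # reprocesses everything
```

In this sketch `upgraded` and `rebuilt` hold the same sums, so `fit()` returns the same parameters either way; the upgrade path simply avoids rereading `old_rows`.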
Now, how could the Lambda architecture be useful? Following Lambda architecture principles, first create an ML model with batch data, i.e. data at rest. Then update it continuously (at a shorter interval) using the fast layer with hot data. The serving layer will then always have the best possible ML model for scoring.
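The three layers can be sketched as three methods on one object. This is a toy illustration under stated assumptions: the "model" is just the mean and variance of a numeric stream, and the names `LambdaModel`, `batch_build`, `fast_update` and `serve` are hypothetical, not an API of any real framework.

```python
class LambdaModel:
    """Toy model view: mean and variance of a numeric stream."""

    def __init__(self):
        self.count = 0
        self.total = 0.0
        self.total_sq = 0.0

    def _absorb(self, values):
        # Additive update of the running sums.
        for v in values:
            self.count += 1
            self.total += v
            self.total_sq += v * v

    def batch_build(self, data_at_rest):
        # Batch layer: rebuild the model from scratch over all stored data.
        self.count, self.total, self.total_sq = 0, 0.0, 0.0
        self._absorb(data_at_rest)

    def fast_update(self, hot_data):
        # Fast layer: fold in only the records that arrived since the last update.
        self._absorb(hot_data)

    def serve(self):
        # Serving layer: expose the current parameters for scoring.
        # Assumes at least one record has been absorbed.
        mean = self.total / self.count
        variance = self.total_sq / self.count - mean ** 2
        return {"mean": mean, "variance": variance}


model = LambdaModel()
model.batch_build([2.0, 4.0, 6.0])   # data at rest
model.fast_update([8.0])             # first micro-batch of hot data
model.fast_update([10.0])            # second micro-batch
print(model.serve())
```

A real deployment would run the batch and fast layers as separate jobs and publish the parameters to a store that the serving layer reads; keeping them in one class here is only to show the flow of the parameters.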
Now, how can this be implemented? Two fundamental things about data and models need to be considered. First, an ML model's input data is converted to matrix form, and subsequent steps produce revised matrices representing the same data. Second, ML models are optimization problems, iterative in nature: each iteration takes a solution as input and produces a revised (improved) solution as output.
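The second point, that each iteration takes a solution in and hands a revised one out, is what makes a warm start possible. The sketch below uses plain gradient descent on a one-feature linear model; the functions `gradient_step`, `refine` and `mse`, the learning rate and the sample rows are all assumptions chosen for illustration. It compares upgrading from the previous parameters with rebuilding from zero, given the same small number of iterations on the hot data.

```python
def gradient_step(params, rows, learning_rate=0.05):
    """One gradient-descent step for y = a + b*x under mean squared error."""
    a, b = params
    n = len(rows)
    grad_a = sum(2 * (a + b * x - y) for x, y in rows) / n
    grad_b = sum(2 * (a + b * x - y) * x for x, y in rows) / n
    return a - learning_rate * grad_a, b - learning_rate * grad_b


def refine(params, rows, steps):
    # Each iteration takes a solution as input and returns a revised one.
    for _ in range(steps):
        params = gradient_step(params, rows)
    return params


def mse(params, rows):
    # Mean squared error of the line (a, b) on the given rows.
    a, b = params
    return sum((a + b * x - y) ** 2 for x, y in rows) / len(rows)


batch_rows = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]   # data at rest
hot_rows = [(1.5, 4.1), (2.5, 5.9)]                             # newly arrived data

batch_model = refine((0.0, 0.0), batch_rows, steps=2000)   # initial build
warm = refine(batch_model, hot_rows, steps=5)              # upgrade: start from old parameters
cold = refine((0.0, 0.0), hot_rows, steps=5)               # rebuild: start from scratch

print("warm-start error:", mse(warm, hot_rows))
print("cold-start error:", mse(cold, hot_rows))
```

With these sample rows the warm start ends with the lower error on the hot data, because it begins near a good solution. Note that refining on hot data alone lets the parameters drift toward the newest records; how much weight old data should keep is a design choice this sketch does not address.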
In subsequent posts, more details about the model upgrade process will be discussed.