Analytical architecture evolution - part 4
Part 4: Feature engineering
Now that we’ve removed the friction from the process of creating and deploying features into production, we can turn our attention to the practice of feature engineering. With the technical bottlenecks gone, we are really only limited by our imagination and computing power!
Historically, we would have been limited to a pre-existing set of features when building models, as this would have been imposed by the system where the model was implemented (a traditional CRM application, a credit decisioning platform, etc.).
We would have code to prepare a dataset containing all of the features and use this for the modelling step (regression, decision tree, etc.). We may have created a few ‘new’ features as part of the modelling process to assess whether it would be economically viable to add them to the scoring system, but in reality this was pretty rare.
As a result of these constraints, we probably didn’t get much practice at feature engineering, and what little practice we had was a small step in an overall modelling process.
One of the ideas we have is to treat the creation of new features (feature engineering) as a practice independent of the model build. It could be inspired by domain knowledge (“I’m pretty sure this feature would add value to one of our models”) or be more systematic in nature.
If you think about the Feature Store design described in Part 3, we are co-locating base features, derived features and scores in the same store. This creates a closed-loop system in which we can create a new feature and have each model ‘vote’ on the likelihood that it would add discrimination. This feature engineering feedback loop is important because, without the usual technical limitations, we need a way of limiting the creation of thousands of spurious features!
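The voting idea could be sketched along these lines. This is purely illustrative (the function names, the correlation-based heuristic and the 0.3 threshold are all assumptions for the sake of the example, not our actual implementation): each model’s recent scores are pulled from the store, and a candidate feature earns a vote from a model if it shows enough association with that model’s scores.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / ((vx * vy) ** 0.5)

def vote_on_feature(candidate, model_scores, threshold=0.3):
    """Each model 'votes' for the candidate feature if the feature
    correlates with that model's scores above a threshold.
    (Illustrative heuristic; threshold is an assumption.)"""
    return {
        model_name: abs(pearson(candidate, scores)) >= threshold
        for model_name, scores in model_scores.items()
    }

# Toy data: one candidate feature, score histories for two hypothetical models
candidate = [1.0, 2.0, 3.0, 4.0, 5.0]
model_scores = {
    "churn_model": [1.1, 1.9, 3.2, 3.8, 5.1],   # strongly related
    "fraud_model": [2.0, 2.0, 2.1, 1.9, 2.0],   # essentially flat
}
votes = vote_on_feature(candidate, model_scores)
# Only churn_model votes the candidate feature in
```

In practice the “vote” would be something richer than a raw correlation (information value, AUC uplift on a holdout, etc.), but the shape of the loop is the same: one candidate, many models, an aggregated decision.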
Moving forward, we are considering if we can use this closed-loop system to build a self-healing process for our models. I love the idea of coming in to work in the morning and being presented with an automated suggestion to re-align or rebuild a model based on either a new set of features that were created or just a general degradation in the model.
In time I believe we will move our feature engineering towards more systematic methods, for example dimension reduction techniques like Principal Component Analysis (PCA). This is a nice technique for transforming a set of observations into a set of linearly uncorrelated variables which can then be passed into a modelling step. It may require a quick brush-up on linear algebra techniques, so you may want to set some time aside to hit the books!
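To make the idea concrete, here is a minimal PCA via the singular value decomposition, using NumPy (the data here is synthetic and the function is a sketch, not library code): centre the data, take the top right-singular vectors as components, and project.

```python
import numpy as np

def pca(X, n_components):
    """PCA via SVD: centre the data, then project onto the top
    right-singular vectors (the principal components)."""
    X_centred = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X_centred, full_matrices=False)
    components = Vt[:n_components]
    transformed = X_centred @ components.T
    # Variance explained by each component
    explained_variance = (S ** 2) / (len(X) - 1)
    return transformed, explained_variance[:n_components]

# Toy example: two highly correlated features, so almost all of the
# variance collapses onto the first component
rng = np.random.default_rng(0)
x = rng.normal(size=200)
X = np.column_stack([x, 2 * x + rng.normal(scale=0.1, size=200)])
Z, var = pca(X, n_components=2)
```

The resulting columns of `Z` are linearly uncorrelated, which is exactly the property that makes them a clean input to a downstream modelling step.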
I think this will come hand-in-hand with a move towards machine learning techniques which may also require some time hitting the books!!