The Analysis of Things: Data Vault as the critical link between IoT systems and Graph for Analytics
So much hype out there. Everyone is very excited about the future and it feels like a revolution.
In my experience, however (and I have been nerding since the dawn of the internet), the hype eventually has to be bolted to current realities before anyone will trust it. People like things that they understand and are generally risk-averse when it comes to critical systems (thank goodness for explorers and innovators though!). The Internet of Things will not really be that different.
I've been peripherally involved in solution architecture forever, but only because it was necessary to understand the evolution of information and data systems. People often get excited about technology for technology's sake, but what is it REALLY about? It's about improvements, more convenience, better management of complexity - I'm sure lots will just say saving and making money for business, but if the softer benefits are not realised then simply throwing money at IT is a rather short-term affair, as many bygone technologies will attest!
The real value in information systems is data. Data, and the analysis of data to understand, estimate, predict, measure, govern, control, etc. IoT will be no different. Somewhere along the line of evolution, someone will have to find a way to collect all of this wonderful new data and harness it for decision making.
How will they do it? Will we all suddenly adopt a shiny new technology and drop everything we've been doing? No, certainly not. People will finally come up with business use cases to adopt IoT - for example Smart Cities - and some more clever people will work out that there are already plenty of data sources available that can feed into some sort of master data warehouse for storing and analysing all of this rich data.
How will they do it? Will they retrofit all of these old sources with some shiny new tech so that they may beautifully and synchronously meld with this utopian analytical vision, or will they seek more pragmatic paths to connecting systems together?
My bet is that we will at least start off by resorting to tried and true methods, perhaps building APIs to pull the data, but it will need to be stored for analysis, unless someone invents amazing quantum connectivity where data can replicate itself and behave across a distance. No, they will likely pull it all together into a historical compendium... a Vault, if you will. Perhaps a Data Vault.
Why would they use a Data Vault as a collection medium? Because the Internet of Things is a Graph. Data Vault emulates Graphs, and can generate them easily. What better bridge between old and new than a hybrid methodology that uses the existing tools in your data warehouse, but in a very specific way that emulates a Graph? It can grow constantly while generating or updating a number of analytical environments, including Graph, and it links the skills of legacy systems to those of future ones.
Data Vault has been around for years, and is at its heart an evolution of 'Snowflake' and 'Star Schema' architectures, one which leverages a sort of 'factless fact' network of tables to emulate graphs as 'Hubs' and 'Links'. Hubs hold the business entities and play the role of a graph's nodes; Links hold the relationships between them and play the role of its edges. This is the heart of the Graph emulation, and it leaves Data Vault poised to play a critical role in the evolution of IoT.
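To make the Hub and Link idea a little more concrete, here is a minimal sketch in Python of how rows from hub and link tables could be read as the nodes and edges of a graph. Everything in it is an assumption made for illustration: the "sensor" and "location" hubs, their keys, and the build_graph helper are hypothetical, and a real Data Vault would hold these rows in warehouse tables rather than in Python lists.

```python
def build_graph(hubs, links):
    """Read hub keys as nodes and link rows as undirected edges.

    hubs:  dict mapping a hub name to a list of business keys (hypothetical layout)
    links: list of pairs, each pair holding two (hub name, business key) tuples
    """
    graph = {}
    # Every business key in every hub becomes a node, even if nothing links to it yet.
    for hub_name, keys in hubs.items():
        for key in keys:
            graph[(hub_name, key)] = set()
    # Every link row becomes an edge between the two nodes it references.
    for left, right in links:
        graph.setdefault(left, set()).add(right)
        graph.setdefault(right, set()).add(left)
    return graph


# Hypothetical example data: two sensors installed at one location.
hubs = {"sensor": ["S1", "S2"], "location": ["L1"]}
links = [
    (("sensor", "S1"), ("location", "L1")),
    (("sensor", "S2"), ("location", "L1")),
]

graph = build_graph(hubs, links)
for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

The point of the sketch is only that nothing new has to be invented to get a graph out: the link table already is the edge list, so adding a new hub or link later extends the graph without touching what is already there.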
Oh, yes... and Data Vault is very agile to set up. Just like a Graph, you can start with something small and branch out. Achieve results quickly, and then know that you won't have to change it, because it is linked directly to the entities that YOU use to run your business. A Data Vault Model should - at its simplest - be easily understood by someone who knows the business well enough to describe entities and their relationships - what could be more Graph-like?
However, I do not like hype (excitement is fine), so I challenge you to look around over the next few months and years, and watch how the community that is the internet begins to shift towards the new technologies. The tide of interest will inevitably follow whatever the big players move on, but this very pragmatic approach answers quite a few of the issues that will need addressing, and because it is a very mature technology, it is the 'low-hanging fruit' for solving business problems with data warehouse management and analysis.