The Answer to Dumb Big Data: Universal Data Representation
Today’s big data is dumb. Without a parent application to make sense of it, the data in its raw format is useless to anyone. Meaningless. Dumb. In large organisations, teams of data specialists work full time to synchronise and consolidate dumb data from different applications for reporting. It’s a complex, ongoing process with more points of failure than flies at an Aussie barbecue.
Vendors recognise the problem and have been addressing it with in-memory databases such as SAP’s HANA, which is fast enough to record transactions and generate reports from the same database. SAP is busy rewriting its applications to run on HANA and to use data elements consistently. It’s much more efficient and makes selecting the data to include in reports simpler.
In spite of such progress, though, this enterprise data is still dumb. A detailed understanding of the source application is required to reliably interpret the data.
I’d like to offer an alternative. Imagine combining the speed of in-memory processing with a universal data format that adapted dynamically to represent any information applications wanted to store or query. There would be no separate metadata, because information would be stored at a semantic level as a network of linked concepts, more like a brain than tables with predefined rows and columns.
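To make that a little more concrete, here is a minimal sketch of what a schema-less network of linked concepts could look like in code. The names and structure (Concept, Association, associate) are purely illustrative assumptions, not a specification of the model.

```python
# Illustrative sketch only: one possible shape for a schema-less "network of
# linked concepts". Class and attribute names are hypothetical.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Concept:
    """A node in the network: a single piece of meaning, not a row in a table."""
    label: str
    associations: List["Association"] = field(default_factory=list)

    def associate(self, relation: str, other: "Concept") -> None:
        """Link this concept to another via a named relation."""
        self.associations.append(Association(relation, self, other))


@dataclass
class Association:
    """A directed, labelled link between two concepts."""
    relation: str
    source: Concept
    target: Concept


# No schema is declared up front; structure emerges from the links themselves.
invoice = Concept("invoice 4711")
customer = Concept("Acme Pty Ltd")
invoice.associate("billed to", customer)
```

The point is not the particular classes but the absence of predefined rows and columns: every new kind of information is simply more concepts and more links.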
A universal format would allow data generated by devices in the organisation’s “internet of everything” to be combined in one repository, along with business transactions, unstructured documents, spreadsheets, records from HR, and anything else someone can conceive.
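As a rough illustration of that single repository, the fragment below reduces records from very different sources to the same concept-relation-value form. The triple layout and the sample records are assumptions made for the example, nothing more.

```python
# Illustrative sketch only: heterogeneous records stored as identical triples
# in one repository. The sample data is invented.
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (concept, relation, value)

repository: List[Triple] = []


def store(concept: str, relation: str, value: str) -> None:
    """Whatever its source, every fact lands as the same kind of triple."""
    repository.append((concept, relation, value))


# An IoT sensor reading, a business transaction and an HR record, side by side:
store("thermostat 12", "reported temperature", "21.5 C")
store("purchase order 998", "has total", "AUD 14,200")
store("employee 44", "holds role", "Data Architect")

print(repository)
```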
The data would be available to any application through a universal query language, which could return a result set that reflected the language and preferences of the user.
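A toy version of such a query, again only a sketch under simplifying assumptions, might look like the following; the query function, the repository contents and the per-language relation labels are all hypothetical.

```python
# Illustrative sketch only: a toy "universal query" over concept triples,
# rendered according to a user's language preference.
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

repository: List[Triple] = [
    ("thermostat 12", "reported temperature", "21.5 C"),
    ("thermostat 12", "located in", "meeting room 3"),
    ("purchase order 998", "has total", "AUD 14,200"),
]

# The same relation could carry labels in several languages.
relation_labels: Dict[str, Dict[str, str]] = {
    "reported temperature": {"en": "reported temperature", "de": "gemeldete Temperatur"},
}


def query(concept: str, language: str = "en") -> List[str]:
    """Return every fact about a concept, phrased in the user's language."""
    results = []
    for subject, relation, value in repository:
        if subject == concept:
            label = relation_labels.get(relation, {}).get(language, relation)
            results.append(f"{subject} {label} {value}")
    return results


print(query("thermostat 12", language="de"))
```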
Of course, information security would no longer be the responsibility of the parent application; application-level security ceases to be effective once information leaves the application’s control anyway. Instead, data would be encrypted, secured, managed and tracked at a much more granular level: individual concepts, associations and representations. This could include data provenance, change history, authorisations, validity timeframes, importance, and all kinds of other granular detail that has never been generally available for individual data elements.
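Sketched in code, per-element governance might look something like this; every field and method name below is an assumption made for illustration, not an existing standard or API.

```python
# Illustrative sketch only: provenance, authorisation and validity recorded on
# each individual fact rather than enforced by a parent application.
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class GovernedAssociation:
    subject: str
    relation: str
    value: str
    provenance: str                                   # where the fact came from
    recorded_at: datetime = field(default_factory=datetime.now)
    authorised_roles: List[str] = field(default_factory=list)
    valid_until: Optional[datetime] = None
    change_history: List[str] = field(default_factory=list)

    def readable_by(self, role: str) -> bool:
        """Access is decided per fact, not per application."""
        return role in self.authorised_roles


fact = GovernedAssociation(
    subject="employee 44",
    relation="has salary",
    value="120,000",
    provenance="HR system extract, 2015-03-01",
    authorised_roles=["payroll", "hr-manager"],
)
print(fact.readable_by("payroll"))    # True
print(fact.readable_by("marketing"))  # False
```

Whether that level of per-fact bookkeeping can be kept fast enough at scale is precisely where in-memory processing would have to earn its keep.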
Internet of Things consortia are promoting greater standardisation as the way to make “things” communicate effectively. I fear their approach will merely constrain the types of applications supported and create an “internet limited to compatible things”.
Wouldn’t a better investment of time, money and intellect be the creation of a universal data format that allowed “things” to produce any information their inventors could conceive, without being constrained by contemporary standards?
Representing information in a universal format that is homogeneous and self-defining would also solve a fundamental big data analytics challenge. At present, data can only be included in analytics if someone has reverse-engineered the source application to ensure apples are compared with apples, and every upgrade puts that assumption at risk. But when data is inherently meaningful and secure, the dependency on the source application goes away, revolutionising the way analytics can be performed.
In summary, big data is dumb because it is so dependent on a source application for context and meaning. A universal data format that makes data inherently meaningful would break that dependency, solving significant challenges in both “inter-thing” communication and big data analytics.
--------------
About: Pete Chapman is an award-winning innovator with a passion for machine cognition. He has developed and prototyped a homogeneous conceptual data model and is currently exploring commercialisation strategies.