Thinking in data

Data Vocabulary

  • Meta-data: Data describing data. Think of it in multiple levels: for a picture there is meta-data on the picture (when and where it was taken, and by whom), and there can be meta-data on that meta-data (when it was uploaded to a server, and by whom)
  • Persisted: Data is persisted when it is stored so that it can be used again later
  • Normalization: The process of reducing the redundancy in your model
  • De-normalization: The reverse of normalization, in which you deliberately include redundancy

Data Structure

When describing data structure, the data can be placed in three categories based on the semantic level of the structure

[Figure: Structured data inside Semi-structured data inside Unstructured data]

  • Structured: The data is structured when there is a discernible structure and a predefined data model. Examples are database tables, documents with schemas such as XML documents with XSD schemas or JSON documents with JSON schemas, and classes in typed languages
  • Semi-structured: The data is semi-structured when there is a discernible structure but no predefined data model. Examples are HTML code, graphs and tables, e-mails, XML documents and JSON
  • Unstructured: The data has no immediately discernible structure and no predefined data model. Examples are text, pictures and video

When discussing structure in data there are normally these three types, and XML and JSON documents are described as Semi-structured.

But is that correct?

Semi-structured data only needs an easily discernible structure. When you add a schema you add a data model, which makes the data Structured. An XML or JSON document is therefore Semi-structured on its own and Structured once it comes with a schema.
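The step from Semi-structured to Structured can be illustrated with a small Python sketch. The `Picture` class, its fields and the `to_structured` helper are hypothetical names made up for this example; the point is only that the same JSON text becomes Structured once a data model is enforced on it.

```python
import json
from dataclasses import dataclass

# Hypothetical example document, not taken from any real system.
RAW = '{"title": "Sunset", "taken_by": "Alice", "width": 1920}'

# Semi-structured: keys and values are easy to discern, but nothing
# states which keys must exist or which types they must hold.
semi_structured = json.loads(RAW)


@dataclass(frozen=True)
class Picture:
    """The predefined data model (plays the role of a schema)."""
    title: str
    taken_by: str
    width: int


def to_structured(doc: dict) -> Picture:
    """Check a parsed document against the model and return a Picture."""
    expected = {"title": str, "taken_by": str, "width": int}
    missing = expected.keys() - doc.keys()
    extra = doc.keys() - expected.keys()
    if missing or extra:
        raise ValueError(f"missing={sorted(missing)} extra={sorted(extra)}")
    for name, expected_type in expected.items():
        if not isinstance(doc[name], expected_type):
            raise TypeError(f"{name} must be {expected_type.__name__}")
    return Picture(**doc)


# Structured: the document now conforms to a known data model.
picture = to_structured(semi_structured)
```

A document that lacks a field, such as `{"title": "Sunset"}`, is still perfectly valid Semi-structured data, but `to_structured` rejects it because it does not fit the model.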

Data Consistency

When we are working with data we do not like to leave anything in an inconsistent state. An inconsistent state can mean a lot of things

  • A file is corrupted because some meta-data on the file is missing.
  • Structured information drops to a lower level of structure.

It might be easy to correct or it might be difficult, but until it is corrected you are in an error state trying to recover.

In computer science we try to move data from one consistent state to another consistent state. We use a lot of different ways to do so. In a file transfer, for example, we compare the consistency at the start of the transfer with the consistency at the end. We also try to take smaller steps if possible: continuing the example, we evaluate consistency on file chunks instead of the whole file. If the file is transferred over TCP/IP there are also consistency checks on the packets.
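The chunk idea from the file transfer example can be sketched in Python as follows. This is a minimal illustration and not a real transfer protocol: the chunk size, the choice of SHA-256 and the function names are assumptions made for the example.

```python
import hashlib

CHUNK_SIZE = 4  # deliberately tiny so the example is easy to follow


def chunk_digests(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Return one SHA-256 digest per chunk of the data."""
    return [
        hashlib.sha256(data[start:start + chunk_size]).hexdigest()
        for start in range(0, len(data), chunk_size)
    ]


def corrupted_chunks(
    expected: list[str], received: bytes, chunk_size: int = CHUNK_SIZE
) -> list[int]:
    """Return the indexes of chunks that differ from the expected digests."""
    actual = chunk_digests(received, chunk_size)
    bad = [
        index
        for index, (want, got) in enumerate(zip(expected, actual))
        if want != got
    ]
    # Chunks that are missing or unexpected are inconsistent as well.
    shorter, longer = sorted((len(expected), len(actual)))
    bad.extend(range(shorter, longer))
    return bad
```

The sender computes `chunk_digests` before the transfer and the receiver calls `corrupted_chunks` afterwards. An empty list means the data arrived in a consistent state, and a non-empty list tells you which chunks to transfer again instead of repeating the whole file.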

Is Data Modelling only for databases?

The short answer is: No

The long answer is: You work with modelling much more than you think. You are actually modelling whenever you modify the structure or the model of Semi-structured or Structured data.

That is the case when you are:

  • creating classes in object oriented languages
  • making a json document
  • modifying an OpenApi response to reduce the chattiness by de-normalizing the response message
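The last item, de-normalizing a response to reduce chattiness, could look like the Python sketch below. The in-memory `ORDERS` and `CUSTOMERS` tables and both response functions are hypothetical stand-ins for a real API.

```python
# Hypothetical normalized data: each fact is stored in one place only.
CUSTOMERS = {10: {"id": 10, "name": "Alice"}}
ORDERS = {1: {"id": 1, "customer_id": 10, "total": 250}}


def normalized_response(order_id: int) -> dict:
    """Return the order as stored; the client needs a second call
    to look up the customer behind customer_id."""
    return dict(ORDERS[order_id])


def denormalized_response(order_id: int) -> dict:
    """Embed a copy of the customer so that one call is enough.
    The embedded customer is redundant data, added on purpose."""
    order = ORDERS[order_id]
    customer = CUSTOMERS[order["customer_id"]]
    return {
        "id": order["id"],
        "total": order["total"],
        "customer": {"id": customer["id"], "name": customer["name"]},
    }
```

Choosing between the two shapes is a modelling decision even though no database schema is touched: the de-normalized version saves a round trip at the price of carrying a copy of the customer data.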

Do you need a structured process?

Probably not, especially if you know the normalization process and are working on simple things such as simple data. Even in NoSQL scenarios it is a benefit to think about the consistency of data because of data ownership. Yes, if it gets a bit more complicated or you create table properties in a relational database. It is normally not done, which makes the data model deteriorate and makes it more difficult to handle later.

The full-blown version of data structure normalization is the one used in database normalization (https://en.wikipedia.org/wiki/Database_normalization). Normally I only use BCNF when working with data structure normalization because it is easier to go directly for that, and from there I go for de-normalization. As I usually say, I go for a higher normalization than I need in order to have control over the data owner.

Data Owner

I usually define the data owner as the one location where developers are allowed to change data when the system copies the data automatically to other places, or as the one service function that has the responsibility for the data.

I work with a data owner in

  • document databases,
  • object graphs in memory,
  • object databases and
  • performance optimized databases
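As a rough illustration of the data owner idea for an object graph in memory, here is a Python sketch. The class names and the callback mechanism are assumptions chosen for the example; the only point is that changes go through one location and every copy is updated automatically from there.

```python
from typing import Callable

Subscriber = Callable[[int, str], None]


class CustomerOwner:
    """The data owner: the one place where a customer name is changed."""

    def __init__(self) -> None:
        self._names: dict[int, str] = {}
        self._subscribers: list[Subscriber] = []

    def subscribe(self, callback: Subscriber) -> None:
        """Register a copy that must be kept up to date."""
        self._subscribers.append(callback)

    def rename(self, customer_id: int, name: str) -> None:
        """Change the owned data, then push the change to every copy."""
        self._names[customer_id] = name
        for notify in self._subscribers:
            notify(customer_id, name)

    def name_of(self, customer_id: int) -> str:
        return self._names[customer_id]


class OrderView:
    """A de-normalized copy. Developers never write to it directly;
    it only changes when the owner reports a change."""

    def __init__(self) -> None:
        self.customer_names: dict[int, str] = {}

    def on_customer_renamed(self, customer_id: int, name: str) -> None:
        self.customer_names[customer_id] = name


owner = CustomerOwner()
view = OrderView()
owner.subscribe(view.on_customer_renamed)
owner.rename(10, "Alice")
```

After the call to `rename`, both the owner and the copy in `view` hold the new name, and there is still only one place in the code where the name is allowed to change.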

CAP

ACID
