Moving a Customer Database from SQL to noSQL

Moving a Customer Database from SQL to noSQL

I’m not from Oklahoma but I’m a big fan of “Show Me”.  I’ve read numerous articles about noSQL databases and their uses, advantages, disadvantages, economics and scalability.  All of these discussions are necessary when evaluating a shift in data, software and system architecting.   However, there comes a point in time when a use case is required to make that final push towards a decision.  

My use case is specifically addresses moving a customer database from a set of MySQL tables to a set MongoDB collections.  The MySQL tables are assumed to be using the InnoDB engine meaning they have referential constraints and integrity.  The information architecture will include customer name, address, order, product and payment information.  

The Relational System


I chose to do a customer based system because it is a common model that touches upon several aspects of data design.  A customer database can be a very focused model or one that is expansive and incorporates accounting, supply chain management and customer relationship management.  In this example I will restrict the model to customers, products, orders and payment methods.  

This should look familiar and certain business assumptions have been made.  Those assumptions are:

  • Orders are shipped as a single unit to a single location
  • Any one Product can appear in multiple categories
  • Any one Product can be sourced from multiple suppliers
  • Customers can have multiple addresses
  • Customers can have multiple Payment options

The noSQL Approach

There are two common mistakes made at this point: one, just trying to replicate the relational tables into collections and two, completely scrapping any notion of the relationships between the data points.  Just because you are choosing to use a non-relational database does not mean your data doesn’t have relations.  The relations still exist but can now be handled in more scalable and flexible constructs.

I plan to focus on two aspects of the implementation: Product Category and Payment Methods.  

Product Category

The important point to remember here is the only thing that stays the same is change.  A basic technique used an RDBM is the use of what I call “set” tables.  The sole purpose of these tables is to allow flexibility in label naming conventions while insulating the internal working of the system from these labels that may change over time.  This means that that every label has a corresponding ID that is used by the system to identify it.  The label itself is used for UI purposes and can be modified without affecting functionality.  There is absolutely no reason to abandon this technique when moving to a schema-less database.

Payment Info

Many years ago I remember coming across the C programming union and its’ seemly limitless possibilities.  It allowed one variable to morph into any number of things and an array of unions could encapsulated a variety of information into a collection.  It’s not surprising then that MongoDB refers to tables as collections and the individual elements of that collection can be anything you want.  

When you think about a payment method these days it could be anything from a PayPal/Venmo Account to a Credit Card to the old school checking account.  Each has its’ own set of data points.  MongoDB can easily accommodate this due to the union like property of their collections being able to contain documents of different structures.  Simply said, each document could be something completely different which will require sharing only a document type field which implies the contents of the document.  I a very theoretical sense a single MongoDB Collection could encapsulate an entire database with this approach, equating table names with document types.  While this is perfectly fine in theory it becomes a mess in practice.

MongoDB offers another option in the form of a DBref.  This is about as close as MongoDB will come to relational data.  As an example, if a user has a credit card as one of their payment methods the order or customer document will contain a DBref to a payment information document.  The DBref, actually a small object, includes not only the MongoDB ID of the related document but also an identifier specifying the collection to find it.  In this way you can have one collection for credit cards, another for PayPal and a third for checking accounts, etc.  In a relational system you usually have two choices; one is to use this same multiple table approach which requires a mapping table that contains both the user’s id and the id of the payment method with an additional field that identifies its type which can be used to trigger the code to use the correct table.  The second option is to have a table that can encapsulate all possible combinations of data for different payment types.  My hesitation to take this approach is that you end up with a sparsely populated table and over time it becomes difficult to integrate new payment types.  

To facilitate the use of this type of relation, we’ve created a function within our system to automatically de-reference DBRef fields within a document.  With proper safeguards in place to control the depth of de-referencing, it defaults to one level, and the option of identifying the fields that you want de-referenced, it becomes very easy to pull a single data structure for use within the code that is sourced from multiple collections within the database.  

Conclusion

It is important to stress that moving a system from a RDBM system to a document based approach is not a simple recreating the tables or a total abandonment of the relations that exist within your data.  Instead it is approaching the solution from a different fresh perspective which will allow you to capitalize on the strengths of document based data storage engines while still preserving the natural relations that occur within your dataset.

To view or add a comment, sign in

More articles by Chris Wallace

  • A Company’s Culture

    Fade in: We find ourselves in a re-purposed factory floor, with large, round structural columns evenly spaced every…

    1 Comment
  • Blue Green Deployment Security Framework

    The cloud presents security professions with a host of challenges as well as opportunities. The ability to easily…

  • The SIEM is Dead!...Long Live the SIEM

    As anyone will tell you, even those who make their living selling and supporting Security Information and Event…

    1 Comment
  • The Business of “3 and D”

    Is a three point shot too risky? Not when it makes good business sense. It was on October 12, 1979 that Chris Ford of…

  • Transparent Technology - Mobile Networks

    The simple act of calling your Mom or a friend that lives on the other side of the country puts into play an amazing…

  • Geek Doodle: The Evolution of Cloud Computing

    After an extended winter hiatus I begin publishing again with a novel little Doodle entitled: Evolution of the Cloud.

  • Stuxnet: Five Years On...

    It's been more than five years since the Stuxnet virus destroyed approximately one thousand centrifuges at the Iranian…

  • Schemaless Conundrum 2: NULL, Empty, 0, ‘0’ or Missing

    What Is At Issue? Back during what my 8-year-old daughter refers to as the “olden days,” the schema of a database would…

  • Schemaless Conundrum : ALTER This!

    Introduction I’ve decided to write a series of posts examining the new noSQL environment and its effects on Software…

  • Avoiding Common noSQL Development Pitfalls

    “I believe that this nation should commit itself to achieving the goal, before this decade is out, of landing a man on…

Explore content categories