Choosing the right Database
Operational Data reference guide
What is an Operational Data Source?
An Operational Data Source refers to the data sets in either Systems of Record or Systems of Engagement; the nature of the Data Source is always OLTP. It is important to note that OLAP processes are also becoming critical in augmenting Operational System processes.
Operational Data in Management Information Systems
MIS Systems are typically used to run the operations of a business by providing Business Intelligence reports etc. This class of Data Source is always OLAP.
Is OLAP really Operational Data ?
Digital business processes increasingly have to react to the outcome of some analytics. A Next Best Action micro-service, for example, will be developed on an Analytical Data Source and still be used operationally.
This Guide consists of the following technical considerations:
Key Considerations
- Availability and Recovery - Data Sources are typically the only persisted state of an Application System. This should be the first and primary non-functional consideration.
- Integrity - The application ultimately depends on the Database to persist the data and manage the various states of the values.
- Performance - Optimising the Data Source should be considered upfront; it can be the most complex tier to optimise.
- Data Lifecycle - Data Source Retention and Archiving is key to achieving the above-mentioned considerations.
- Data Quality - The Data Source must have mechanisms to remediate data, e.g. duplicates. Schema versioning must also be considered.
- Data Services - Performance must factor in OLAP sources needing the data in near realtime. Metadata must be descriptive of the sources.
- Data Structures - More optimal Data Structures are now available for various types of Data; NoSQL Sources are now available in the forms of Document DBs, Graph DBs, Key-Value Stores, In-Memory Column Stores etc.
Considerations Unpacked
Availability & Recovery
Most, if not all RDBMS Sources would have been engineered to include ACID principles;
What is ACID ?
- Atomicity - In a transaction involving two or more discrete pieces of information, either all of the pieces are committed or none are.
- Consistency - A transaction either creates a new and valid state of data, or, if any failure occurs, returns all data to its state before the transaction was started.
- Isolation - A transaction in process and not yet committed must remain isolated from any other transaction.
- Durability - Committed data is saved by the system such that, even in the event of a failure and system restart, the data is available in its correct state.
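Atomicity in particular is easy to demonstrate. The following is a minimal sketch using Python's built-in sqlite3 module; the "accounts" table, names and amounts are illustrative assumptions, not part of this guide:

```python
# Minimal sketch of atomicity: a transfer involves two discrete updates,
# and either both commit or neither does.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, "
             "balance INTEGER NOT NULL CHECK (balance >= 0))")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()

try:
    with conn:  # the context manager commits on success, rolls back on error
        conn.execute("UPDATE accounts SET balance = balance - 200 WHERE name = 'alice'")
        conn.execute("UPDATE accounts SET balance = balance + 200 WHERE name = 'bob'")
except sqlite3.IntegrityError:
    pass  # the CHECK constraint fired; the whole transaction was rolled back

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)  # both rows unchanged: {'alice': 100, 'bob': 50}
```

Because the first update would violate the balance constraint, the transaction rolls back as a unit and neither account changes.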
Whilst all RDBMS vendors would correctly claim to be ACID compliant, remaining ACID compliant in a High Availability Configuration or Disaster Recovery Cluster can become almost impossible. Database technologies such as Apache Cassandra solve the problem using a different model.
Always consider and document the tolerances for HA and DR across the database cluster.
Integrity
ACID to a large degree takes care of the integrity of an RDBMS Source. Always consider to what degree the Application relies on the Database for facilities such as two-phase commit.
Performance
Performance can be the most complex of considerations because of its multi-dimensional nature. The following are the most usual considerations:
- Table Normalisation - Under-normalisation usually hinders performance the most, while over-normalisation often leads to cryptic schemas, multitudes of Views etc.
- Data Size - Sharding has become a popular strategy where horizontal scaling is required, but Sharding can itself become very expensive and difficult to manage; take extra care when considering this strategy.
- Concurrency - Vertical Scaling is often a popular strategy for dealing with higher concurrency. For extremely high-speed read requirements, consider in-memory storage.
- Application Optimisation - Stored Procedures were popular for Application Performance Optimisation; processing at the Data tier became less popular over time as compute-tier processing increased (effectively in-memory). Processing at the Data tier is once again becoming popular with In-Memory Databases. Note that Stored Procedures often led to Batch Systems.
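The in-memory point above can be sketched simply: for extreme read volumes, a hot lookup table can be served from an in-memory copy rather than hitting the primary store on every request. The "countries" table and its rows are illustrative assumptions:

```python
# Hedged sketch: serve read-heavy reference data from an in-memory copy.
import sqlite3

primary = sqlite3.connect(":memory:")  # stands in for the system of record
primary.execute("CREATE TABLE countries (code TEXT PRIMARY KEY, name TEXT)")
primary.executemany("INSERT INTO countries VALUES (?, ?)",
                    [("ZA", "South Africa"), ("GB", "United Kingdom")])

# Load once; subsequent reads never touch the database.
cache = dict(primary.execute("SELECT code, name FROM countries"))

def country_name(code):
    return cache.get(code)

print(country_name("ZA"))  # South Africa
```

The trade-off is staleness: the cache must be refreshed or invalidated when the primary changes.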
Data Lifecycle
As the value of Data increases, applications inevitably store more parameters. When the breadth of Data increases, it becomes more important economically to have a clear Data Retention Strategy. Vertical Scaling also has physical limitations, and ultimately performance can drop off drastically once the tipping point is reached in an RDBMS. Retention processes must be put in place for all OLTP Systems upfront.
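A retention process can be as simple as periodically moving rows older than a configured window into an archive table and purging them from the hot table. The sketch below assumes an "orders" table and a 90-day window, both illustrative:

```python
# Hedged sketch of an upfront retention process: archive then purge.
import sqlite3

RETENTION_DAYS = 90
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, created_at TEXT);
    CREATE TABLE orders_archive (id INTEGER PRIMARY KEY, created_at TEXT);
    INSERT INTO orders VALUES (1, '2020-01-01'), (2, date('now'));
""")

cutoff = f"-{RETENTION_DAYS} days"
with conn:  # archive and purge atomically
    conn.execute("INSERT INTO orders_archive "
                 "SELECT * FROM orders WHERE created_at < date('now', ?)", (cutoff,))
    conn.execute("DELETE FROM orders WHERE created_at < date('now', ?)", (cutoff,))

hot = conn.execute("SELECT count(*) FROM orders").fetchone()[0]
archived = conn.execute("SELECT count(*) FROM orders_archive").fetchone()[0]
print(hot, archived)  # 1 1
```

Running such a job on a schedule keeps the OLTP tables within their performance envelope.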
Data Quality
The requirement for Data Quality should always be pushed as close to the point of capture as possible. Data Sources can assist by following strict structures, e.g. data types, lengths etc. must be strictly enforced by the Database Schema. Views to identify duplicates can be set up without having to move the data over to an analytical source.
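Both mechanisms can be sketched together: schema constraints reject bad data at the point of capture, and a view surfaces duplicates in place. The "customers" table, its constraint, and the sample rows are illustrative assumptions:

```python
# Hedged sketch: constraints at the point of capture plus a duplicate view.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        id    INTEGER PRIMARY KEY,
        email TEXT NOT NULL CHECK (length(email) <= 254 AND email LIKE '%_@_%')
    );
    -- Duplicates identified without moving data to an analytical source.
    CREATE VIEW duplicate_emails AS
        SELECT email, count(*) AS occurrences
        FROM customers GROUP BY email HAVING count(*) > 1;
    INSERT INTO customers (email) VALUES ('a@x.com'), ('a@x.com'), ('b@x.com');
""")

rejected = False
try:
    conn.execute("INSERT INTO customers (email) VALUES ('not-an-email')")
except sqlite3.IntegrityError:
    rejected = True  # the schema refused malformed input outright

dupes = conn.execute("SELECT email, occurrences FROM duplicate_emails").fetchall()
print(rejected, dupes)  # True [('a@x.com', 2)]
```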
Data Services
All Data Sources can eventually be used for MI / BI and Analytics, and some Data Sources may also be required operationally by other Systems; Performance must be factored in for this. Limit batches (especially long-running ones) as far as possible to enable anytime access.
Data Structures
Database technologies have advanced rapidly in the past 5-10 years. New (No) SQL Sources are fast gaining popularity, largely because of their agility and light-weight, fit-for-purpose nature.
Databases like MongoDB, for example, seem like good candidates for Web Applications, but attention must be given to the above-mentioned considerations, as transactional integrity may be compromised for performance etc.
Make appropriate use of Key-Value, Graph and other NoSQL data sources.
Advanced Data Source Considerations
Data Retention & Schema Versioning
Schemas should be Version Controlled, Managed and Archived. Data Archives should be tied back to the appropriate Schema version; otherwise archived data may become impossible to interpret or restore.
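One common pattern is to record the schema version in the database itself, so every backup or archive carries the version it was written under. The migration list and table names below are illustrative assumptions:

```python
# Hedged sketch: track applied schema versions alongside the data.
import sqlite3

MIGRATIONS = [
    (1, "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)"),
    (2, "ALTER TABLE customers ADD COLUMN email TEXT"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE schema_version ("
             "version INTEGER PRIMARY KEY, "
             "applied_at TEXT DEFAULT (datetime('now')))")

def migrate(conn):
    current = conn.execute(
        "SELECT coalesce(max(version), 0) FROM schema_version").fetchone()[0]
    for version, ddl in MIGRATIONS:
        if version > current:
            with conn:  # apply the DDL and record it atomically
                conn.execute(ddl)
                conn.execute("INSERT INTO schema_version (version) VALUES (?)",
                             (version,))

migrate(conn)
current = conn.execute("SELECT max(version) FROM schema_version").fetchone()[0]
print(current)  # 2
```

An archive restored later can be matched against this table to select the correct schema.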
Security
Not all RDBMS Systems are equal; a fundamental yet often least-understood aspect is the various layers of security offered by each DBMS.
Depending on the sensitivity of the System's Data, the correct security considerations need to apply.
Considerations include but are not limited to :
- Does the Database integrate into an Enterprise Directory, or does it rely on an internal Security Directory
- Does the Database support role based access
- Does the Database support encryption
- What level of Audit Logging can be configured
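Role-based access, for example, reduces to mapping roles to grants and users to roles. The sketch below models this in application code purely for illustration; in production this would be the DBMS's own GRANT/REVOKE machinery, and the roles, user and table names are assumptions:

```python
# Hedged sketch of role-based access: roles hold grants, users hold roles.
ROLE_GRANTS = {
    "reporting_ro": {("orders", "SELECT")},
    "order_entry":  {("orders", "SELECT"), ("orders", "INSERT")},
}
USER_ROLES = {"amit": ["reporting_ro"]}

def is_allowed(user, table, action):
    # A user may perform an action if any of their roles grants it.
    return any((table, action) in ROLE_GRANTS[role]
               for role in USER_ROLES.get(user, []))

print(is_allowed("amit", "orders", "SELECT"))  # True
print(is_allowed("amit", "orders", "INSERT"))  # False
```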
Scalability
As a System's usage scales, so typically does the Database; scaling ability can often determine a System's Performance.
Horizontal vs Vertical Scaling
"Horizontal scaling means that you scale by adding more machines into your pool of resources whereas Vertical scaling means that you scale by adding more power (CPU, RAM) to an existing machine."
Vertical Scaling
Vertical scaling rarely impacts the calling application and is usually the simpler route; it is recommended as the first alternative. Vertical scaling does, however, have a physical tipping point; when it is reached, it is logical to adopt a horizontal scaling approach.
Horizontal Scaling
Databases designed for Big Data typically follow a Horizontal Scaling pattern from the get-go. These Databases usually hide the complexity of Data Sharding by providing single points of access, typically through the use of Access Nodes.
Where Database Systems don't hide the complexity, the application may need to deal with many Data Sources to perform a single function, which adds a lot of complexity to the ecosystem. If it is envisaged that an application may require large amounts of Data, select a Database Management System that supports Horizontal Scaling, and ensure that it hides the complexities that come with the strategy.
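What an Access Node does behind the scenes can be sketched as routing each key to a shard so the application sees a single point of access. The shard count and hash choice below are illustrative assumptions:

```python
# Hedged sketch of hash-based shard routing behind a single access point.
import hashlib

SHARDS = ["shard-0", "shard-1", "shard-2"]

def shard_for(key: str) -> str:
    # A stable hash keeps the same key on the same shard across processes
    # (unlike Python's built-in hash(), which is salted per process).
    digest = int(hashlib.sha256(key.encode()).hexdigest(), 16)
    return SHARDS[digest % len(SHARDS)]

print(shard_for("customer-42") == shard_for("customer-42"))  # True: stable routing
```

Note that naive modulo routing reshuffles most keys when the shard count changes; real systems use consistent hashing or range maps for this reason, which is part of the management cost the section warns about.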
Always ensure a good retention strategy is in place before considering scaling.
Database Automation
Automation of Database platform provisioning and updates must be planned for. Database setup usually requires robust configuration, all of which should be automated as far as possible; Technology selection must include Automation as a parameter.
Scaling should also be automated where possible.
Data extracts
Data extracts should be catered for in the initial Design; the Database platform must support both application and extract load. Some Systems will require realtime access for both OLAP and operational purposes; consider support for realtime extract.
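One way to support near-realtime extract without long-running batches is change capture: a trigger writes each change to a log table that a downstream consumer can poll. The "orders" table and log columns below are illustrative assumptions:

```python
# Hedged sketch: a trigger feeds a change log for near-realtime extract.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER);
    CREATE TABLE change_log (
        seq        INTEGER PRIMARY KEY AUTOINCREMENT,
        order_id   INTEGER,
        amount     INTEGER,
        changed_at TEXT DEFAULT (datetime('now'))
    );
    CREATE TRIGGER orders_extract AFTER INSERT ON orders
    BEGIN
        INSERT INTO change_log (order_id, amount) VALUES (NEW.id, NEW.amount);
    END;
""")

conn.execute("INSERT INTO orders (amount) VALUES (120)")
rows = conn.execute("SELECT order_id, amount FROM change_log").fetchall()
print(rows)  # [(1, 120)]
```

The extract load still hits the same platform, which is why both application and extract load must be sized together.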
Support & Resilience
Integrated Monitoring, Predictive Performance Analysis etc. should be planned for each Database System. Machine metrics and DB logs should be appropriately configured in production environments, and a proper Retention Strategy for Database System logs should be in place; these logs are most informative for Performance, Scaling Requirements etc.
Predictable failures like "out of space" should be identified through proper monitoring.
Database Selection Matrix
The following is a quick matrix for Database Selection.
Development Considerations
- What Data Structures need to be supported ?
- What is the typical size of Data Types ?
- Does the Application Require High Degree of Data Integrity ?
- What Concurrency would the application require ?
Operational Considerations
- Who will Administrate the Database Platform ?
- How will the platform be monitored ?
- How will the Platform Scale ?
- What degree of Automation is achievable ?
Commercial Considerations
- Is the Technology Commercially viable for the Systems Use Cases ?
- What Training is Available ?
- What is the TCO ?
- What is the SLA required by the System ?
Security Considerations
- What types of Data Access Control is Required ?
- What types of Authorisation Controls are available ?
- Is Encryption Required and Does the Platform Support this ?
References :
- http://searchsqlserver.techtarget.com/definition/ACID
- http://cassandra.apache.org/
- https://datahq.co.uk/knowledge-centre/blog/top-10-considerations-whenchoosing-a-database-management-system