How do you reconcile Microsoft SQL Server with the demands of a globally distributed application?
When you already have an application built on traditional platforms such as Microsoft SQL Server and the Microsoft IIS web server, how do you extend it to users who are in different parts of the globe and need high-performance access?
Context
Microsoft SQL Server databases are in widespread use across many applications and systems, and historically they have been a common choice. However, SQL Server has its roots firmly in 'traditional', typically 'monolithic', applications, and although it's well understood and dependable it has its limitations. These limitations become more evident in the context of the rapid changes driven by public-cloud services. With the rise of cloud services from AWS, Azure and Google in particular, there has been an explosion of 'cloud native' applications built on platform-as-a-service components. We're in a period of significant change, with new opportunities and new ways of doing things. Cloud services are providing not only speed and agility but also the globalisation of IT. This includes a whole raft of new database platforms that challenge the traditional relational database market, long dominated by Microsoft SQL Server.
Challenges
It's been the 'holy grail' for many years to give users universal access to applications regardless of where they are and the device they're using. Thin-client computing, particularly Citrix in conjunction with Microsoft, has come close to closing that gap. However, thin-client technology still has some limits:
- Latency: round-trip times to far-distant places, such as the other side of the globe, introduce a noticeable lag in response times even over fibre.
- Printing: despite improvements, it can be awkward and suffer from issues over low-bandwidth connections.
- Scanning and uploading data: large-scale transfer of data back to the central 'hub' is often constrained by the available bandwidth.
Databases like Microsoft SQL Server are at the heart of many traditional 'three-tier' applications. However, these are designed to scale up rather than scale out, meaning there is a single source of truth in the form of a single primary database. Application servers such as IIS are kept close by to ensure good performance, as, often, are the end-users (unless their reach is extended by the likes of Citrix). The challenge comes when this doesn't work for users who might be over 10,000 miles away. How do you give users in, say, Sydney, London and New York equal access, with uniform performance, to a single central database?
Opportunities
Native SQL Server
There are established capabilities in Microsoft SQL Server that allow for replication, but these are designed to provide high availability, resilience and disaster recovery. Always On Availability Groups in SQL Server 2016 allow a primary database to replicate to up to eight secondary replicas. The primary replica accepts read-write connections, but secondary replicas are read-only at best (and only if configured for read access). Having a read-only copy of a database may work for some use-cases, but for many it's not enough: you must still write all data to a single primary database.
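For illustration, a minimal availability group with a readable secondary might look like the following T-SQL sketch. All server names, endpoints and the database name are hypothetical, and prerequisites (the WSFC cluster, mirroring endpoints and joined replicas) are assumed to be in place:

```sql
-- Hypothetical two-replica availability group: a writable primary plus a
-- secondary that allows read-only connections. Names are illustrative only.
CREATE AVAILABILITY GROUP [AppAG]
FOR DATABASE [AppDB]
REPLICA ON
    N'SQLNODE1' WITH (
        ENDPOINT_URL      = N'TCP://sqlnode1.contoso.local:5022',
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        FAILOVER_MODE     = AUTOMATIC,
        SECONDARY_ROLE (ALLOW_CONNECTIONS = READ_ONLY)
    ),
    N'SQLNODE2' WITH (
        ENDPOINT_URL      = N'TCP://sqlnode2.contoso.local:5022',
        AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
        FAILOVER_MODE     = MANUAL,
        SECONDARY_ROLE (ALLOW_CONNECTIONS = READ_ONLY)
    );
```

Read-only clients can then target a readable secondary by adding `ApplicationIntent=ReadOnly` to their connection string, but every write still has to travel to the single primary.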
Azure SQL Database
Microsoft now provides SQL Database as a fully managed service on Azure. This takes the core of traditional SQL Server and offers it as a PaaS service in the cloud, with instant access, guaranteed performance, high availability and easy maintenance, plus built-in intelligence to maximise performance, reliability and data protection. Azure SQL Database offers 'active geo-replication', which is comparable to Always On Availability Groups. It enables you to configure up to four readable secondary databases in the same or different Azure regions. Geo-replication lets you query the secondaries or manually fail over to them (Azure SQL Database auto-failover groups are in preview right now). However, active geo-replication doesn't add any further value to solving the challenge in question: there is still only one writable primary.
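As an example, active geo-replication can be configured with plain T-SQL run against the master database of the primary server (the database and partner server names here are hypothetical):

```sql
-- Run on the master database of the primary Azure SQL server.
-- Creates a readable secondary copy of AppDB on a partner server,
-- typically in another Azure region.
ALTER DATABASE [AppDB]
ADD SECONDARY ON SERVER [appserver-westeurope]
WITH (ALLOW_CONNECTIONS = ALL);
```

A manual failover is later initiated from the secondary side with `ALTER DATABASE [AppDB] FAILOVER;`, swapping the primary and secondary roles.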
Azure SQL Data Sync (preview)
Although this was available in an earlier incarnation several years ago, a significant refresh was announced in June 2017. With Azure SQL Data Sync (preview) you can synchronise data bi-directionally between multiple databases, including combinations of on-premises SQL Server and Azure SQL Database. Data Sync is now integrated with the new Azure Portal (ARM), and new features include PowerShell and REST APIs to make tasks faster and easier; better security, privacy and resilience; plus enhanced monitoring and troubleshooting.
Data Sync is based around the concept of a Sync Group: the group of databases you want to synchronise. A Sync Group defines the Sync Schema (the data to be synchronised), the Sync Direction (bi-directional or one-directional), the Sync Interval and the Conflict Resolution Policy. Data Sync has a hub-and-spoke topology in which one database is the Hub Database and the others are member databases; sync occurs between the Hub and each member. The Hub Database must be an Azure SQL Database, but member databases can be Azure SQL Databases and/or SQL Server instances running in VMs on Azure. Finally, there is a Sync Database, another Azure SQL Database located in the same region as the Hub Database, which contains the metadata and log for Data Sync.
Data Sync is ideal where data needs to be kept in sync across several SQL databases, and it can be used to support globally distributed applications. Where a business spans several regions and countries, and you need to minimise network latency by keeping data close to users, Data Sync will keep databases in regions around the world synchronised.
It should be noted that Data Sync has some general requirements that need to be considered (https://docs.microsoft.com/en-us/azure/sql-database/sql-database-sync-data). Also, Data Sync is trigger-based and follows an eventual-consistency model. Microsoft guarantees that all changes will eventually be applied and that Data Sync does not cause data loss; however, changes can take several minutes to synchronise. Hence it is important to understand that Azure SQL Data Sync does not provide strong consistency: it does not provide distributed synchronous transactions between multiple writable masters. Instead, replication is asynchronous, has some latency, does not necessarily follow the original transaction order, and conflicts may happen; the application must be able to tolerate this or be designed to work with it. Data Sync is also not suitable for replicating a multi-terabyte database with millions of transactions per second.
As Data Sync is still in preview, features and capabilities are expected to change. However, the current limitations are:
- Replication time: there is no SLA on replication latency.
- Limitations on service and database dimensions: a maximum of 30 endpoints in a single Sync Group, 500 tables per Sync Group, 100 columns per table and a 24MB maximum row size; the minimum sync interval is 5 minutes (i.e. sync can run at most every 5 minutes).
- Performance Impact: Data Sync uses insert, update, and delete triggers to track changes. It creates side tables in the user database. These activities have an impact on your database workload.
- Eventual consistency: since Data Sync is asynchronous, transactional consistency is not guaranteed. Microsoft guarantees that all changes are made eventually and that Data Sync does not cause data loss.
- Unsupported data types: FileStream, SQL/CLR UDT, XMLSchemaCollection (plain XML is supported), Cursor, Timestamp, HierarchyID.
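To make the performance-impact point above concrete, the following is a simplified, illustrative T-SQL sketch of trigger-based change tracking. It is not Data Sync's actual schema (the service creates and manages its own side tables and triggers), but it shows why every insert, update and delete carries extra write cost. The `Orders` table and all object names are hypothetical:

```sql
-- Illustrative only: a hand-rolled change log plus an AFTER trigger, roughly
-- analogous in cost to the triggers and side tables Data Sync creates.
CREATE TABLE dbo.Orders_Changes (
    OrderId    INT       NOT NULL,
    ChangeType CHAR(1)   NOT NULL,  -- 'I', 'U' or 'D'
    ChangedAt  DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME()
);
GO
CREATE TRIGGER dbo.trg_Orders_TrackChanges
ON dbo.Orders
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;
    -- Rows present only in 'inserted' are INSERTs, only in 'deleted' are
    -- DELETEs, and rows present in both are UPDATEs.
    INSERT INTO dbo.Orders_Changes (OrderId, ChangeType)
    SELECT COALESCE(i.OrderId, d.OrderId),
           CASE WHEN d.OrderId IS NULL THEN 'I'
                WHEN i.OrderId IS NULL THEN 'D'
                ELSE 'U'
           END
    FROM inserted AS i
    FULL OUTER JOIN deleted AS d ON i.OrderId = d.OrderId;
END;
```

Every write to `dbo.Orders` now performs an additional insert into the change table; that is the kind of overhead to account for when sizing a database that participates in a Sync Group.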
Cloud-native Databases
There will always be limits to working with relational databases such as SQL Server. However, there are many alternative databases that are better suited to applications designed for the cloud. It makes sense that this new challenge, providing apps and data to a global user base, is solved by the same cloud technologies that have driven the globalisation of IT.
Taking Microsoft Azure as an example, it is currently available in 36 regions around the world and growing. It provides a host of platforms and services that give developers the ability to build, deploy and manage globally distributed applications. In terms of data, Azure offers many different database platforms: Microsoft, third-party and open-source. Which database fits depends on your requirements, and perhaps on your ability to re-engineer an existing app or build a new one from the ground up.
Of particular note is Azure Cosmos DB. This is Microsoft's globally distributed, multi-model database service for mission critical applications. It's got some impressive credentials:
- Elastic scaling of throughput and storage worldwide
- Guaranteed high availability (backed by SLAs)
- Automatically indexes data
- Is multi-model and supports document, key-value, graph and columnar data models
- Engineered for global distribution
Unlike SQL Server, where global distribution is an afterthought (a bolt-on over a single-site database system), Azure Cosmos DB is natively a globally distributed database. There is no replication layer to build over the database; it's there by design.
Azure Cosmos DB is available in all Azure regions globally, and you can associate any number of Azure regions with a Cosmos DB database account; there is no upper limit.
Final Thoughts
Although sticking with familiar Microsoft SQL Server technology is often the easy choice, making a break to explore new cloud-native databases should be on your radar. There's still mileage left in SQL Server, but modern cloud apps are pushing its boundaries. Global apps require global services, and cloud platforms such as Microsoft Azure can provide what's needed. The hard part is understanding what is available and how to use it to your advantage.