Adding an Enterprise "Wing" to SQL Server Basic Availability Group (2016 Standard Edition)

Adding an Enterprise "Wing" to SQL Server Basic Availability Group (2016 Standard Edition)

SQL Server 2016 Standard Edition has introduced a new high availability solution called Basic Availability Group. And it is the replacement of deprecated Database Mirroring feature from earlier version. Microsoft claims a Basic Availability Group (BAG) can support a failover environment (a.k.a Windows Failover Cluster) for a SINGLE database. And it will be created and managed very much similar to the ordinary (advanced) SQL Server AlwaysOn Availability Group from Enterprise edition.

It all sounds quite positive and promised, but we soon realise that the reality is actually not quite the same (that is what the life is, right?). As the BASIC version of SQL Server AlwaysOn Availability Group, it comes with several drawbacks/limitations and “One Basic Availability Group can ONLY support a SINGLE database“ limitation become one the most “unreasonable” one that turns many users off.

In a bespoke application database server deployment project that I was working on recently, due to limited budget from client, we had to implement a failover solution based on SQL Server 2016 Standard Edition instead of Enterprise one. However, I soon realised that the application is a customised version of SharePoint, and it has 12 databases that all need to be included in the failover solution via a single accessing point.

If the failover solution is using SQL Server Enterprise edition, it would be very simple to put all databases in a single availability group and then failover them as "one". But for Standard Edition, each Basic Availability Group can only host a single database. Therefore very likely, the databases in many Basic Availability Groups might be failed over separately and then causing connection errors to client.

So, would it be possible to put a design (workaround) that allow application connect all databases via a SINGLE accessing point and also simulate a process that can force all databases from multiple basic availability groups acting failover like a “WHOLE UNIT” and always stay on the same replica?

Giving the fact that Basic Availability Group is sharing the same technology foundation and principles with ordinary AlwaysOn Availability Group, the answer is quite possibly YES.

Firstly, we should know that an availability group listener is just a virtual network name (VNN) to which clients can connect in order to access a database in a primary or secondary replica of an Always On availability group. And the truth is, an availability group listener actually enables a client (application) to connect to current “primary” replica which is hosting the availability group. So, with correct permissions, the client could certainly access all the rest of database(s) who are hosted on the same SQL instance.

Based on that concept, the ideal for accessing multiple basic availability groups is defining a “logical” primary basic availability group and then create an availability group listener as a single access point. As far as we can keep all other basic availability groups stay the same replica with “primary” basic availability group all the time, the client (application) would be able to access all other databases via the same availability group listener.

Here is a test example to demonstrate how to achieve this workaround.

For test purpose, I have created a two nodes Windows Failover Cluster and installed SQL Server 2016 Standard Edition instance on each note, SQL01 and SQL02. I also created several dummy databases and added each of them into its own Basic Availability Group.

Once they are all in the right place, I defined basic availability group [HAG_WSS_Config] which contains test database [WSS_Config] as my logical primary basic BAG. And finally I created the listener as TEST_SP_SQL.

 So, as you can see from following screenshot, the client (application) is able to access all databases across 12 basic availability groups either from current primary node (SQL01) or the availability group listener (TEST_SP_SQL) as far as other 11 basic availability group stay on the same “replica” with “primary” BAG [HAG_WSS_Config].

Afterwards, I created a SQL Server Agent Job on both replicas and made a repetitive schedule (every 10 seconds, more or less) to check replica status and make sure all 12 basic availability groups always stay on the same replica.

The SQL Server Agent Job, [SYS.Failover All Basic Availability Groups] is executing a customised T-SQL scripts for following purpose:

  1. Use System DMV [SYS.DM_HADR_AVAILABILITY_REPLICA_STATES] to check if current node is the PRIMAY replica for “primary” basic availability group [HAG_WSS_Config];
  2. If the current node is PRIMARY replica then go through all other valid basic availability groups (using loop cursor on [SYS.DM_HADR_NAME_ID_MAP]);
  3. For any basic availability group is not using current node as primary replica, do manual failover and move the basic availability group to current node as same primary replica as [HAG_WSS_Config].
USE master
GO
 
DECLARE @primary_ag UNIQUEIDENTIFIER;
DECLARE @sqlcmd AS VARCHAR(500);
 
SELECT  @primary_ag = ag_id
FROM    sys.dm_hadr_name_id_map
WHERE   ag_name = 'HAG_WSS_Config';
 
IF ( SELECT role_desc FROM   sys.dm_hadr_availability_replica_states
     WHERE  group_id = @primary_ag AND is_local = 1
   ) = 'PRIMARY'
    BEGIN
 
             DECLARE @other_ag UNIQUEIDENTIFIER
             DECLARE @other_ag_name VARCHAR(255)
             DECLARE agCursor CURSOR LOCAL FAST_FORWARD FOR
                    SELECT ag_id, ag_name FROM sys.dm_hadr_name_id_map 
                     WHERE   ag_name <> 'HAG_WSS_Config';
 
             OPEN agCursor
             FETCH NEXT FROM agCursor INTO @other_ag, @other_ag_name
             WHILE @@FETCH_STATUS = 0 BEGIN
                    IF ( SELECT role_desc
                            FROM   sys.dm_hadr_availability_replica_states
                            WHERE  group_id = @other_ag AND is_local = 1
                       ) <> 'PRIMARY'
                           BEGIN
                                 SET @sqlcmd = 'ALTER AVAILABILITY GROUP [' + @other_ag_name +'] FAILOVER;'
                                 EXEC (@sqlcmd)
                           END
                    FETCH NEXT FROM agCursor INTO @other_ag, @other_ag_name
             END
             CLOSE agCursor
             DEALLOCATE agCursor
       END
 
  

Now let’s do a quick test and see how good this SQL Server Agent Job works. On today 02:40:44 PM, I have manually failed over [WSS_Config] database from SQL01 to SQL02. So, the “primary” BAG [HAG_WSS_Config] is now hosted on new PRIMARY replica SQL02. Then as you can see from following screenshots, the SQL Serve Agent Job (on SQL02) soon had detected the “primary” BAG is on a new replica (SQL02 is the now the PRIMARY), and then started manual failover for all the rest of 11 BAGs.

From Failover Event Logs, we can see, on 02:41:55 PM, the last BAG [HAG_WSS_UsageAndHealth] has successfully moved to new primary replica SQL02. And then by that time, all BAGs are now using SQL02 as new PRIMARY replica. The client has then resumed its connection to TEST_SP_SQL availability group listener and now connecting all databases on SQL02. The total failover process time (connection timeout gap) is about 1 minute 10 seconds (and this is highly depends on networking, resource and server specifications).

So, this workaround would wrap multiple basic availability groups as a SINGLE logic unit and do failover for ALL or NONE. And with defined availability group from “primary” BAG as single access point, it would provide a very similar AlwaysOn solution as same as the proper advanced High Availability Group from Enterprise Edition.

Hopefully, this would inspire some practical ideas to you if you are facing the similar limitation challenge from SQL Server Standard Edition.

P.S. If any cluster node or SQL instance is experiencing problem or offline, the Windows Failover Cluster would automatic failover all basic availability groups. Therefore, this workaround is only dealing with “unexpected” failover on database level.

Great article - thanks for posting. This has been very helpful in determining a course of action for my team.

Like
Reply

To view or add a comment, sign in

More articles by ZHENG XU

Others also viewed

Explore content categories