DBInsight's Blog

SQL Server Failover between AWS and Azure – Part 1

We have all heard the story about the online business whose AWS account got hacked and had all of its Cloud inventory destroyed, then there’s the occasional scenario where a Cloud provider is totally disabled by a “core” infrastructure fault.

What if you could seamlessly failover between Cloud providers and eliminate the “all eggs in one basket” syndrome.

Well indeed it is possible, using a Site to Site VPN between AWS and Azure and the magic of SIOS DataKeeper Clustered Edition I was able to create a multi-subnet SQL Server 2014 three node SQL Server Cluster.

The SIOS DataKeeper Clustered Edition block replicates one of more disks between the cluster nodes. Basically from the Windows Cluster and SQL Server point of view the replicated disk looks like a typical shared disk device. The added benefit is that it also eliminates a single of point of failure for disk failures since there is a mirror copy of the clustered data disks on the passive node/s.

Another benefit of this configuration is that you don’t have to worry about syncing SQL Server logins, SQL Agent jobs etc since the master and msdb databases are on the mirrored disk and the environment acts like a typical SQL Server cluster during fail over.

Also the fail back to AWS is straight forward, from the Azure VM you can easily resync the mirrored copy of Disk F: onto the AWS SQL Servers once the AWS servers have come back online.

In my test scenario I used SQL Server 2014 Enterprise Edition so I could create a three node, multi-subnet SQL Server cluster but don’t despair there is a configuration to support SQL Server Standard Edition (See part 2 of this series)

The configuration implemented in my test scenario included a SQL Server in two separate AWS Availability Zones  using synchronous replication of the shared disk, the third node was in Azure with the shared disk maintained by asynchronous disk replication.

The high level setup tasks are shown below, review the attached links from David Bermingham (clusteringformeremortails series) for the detailed steps on configuring SIOS DataKeeper and the Windows/SQL Server Cluster :-

  1. Configure a Site to Site VPN between AWS and Azure. I used this link http://msdn.microsoft.com/en-us/library/dn636917.aspx to guide me through the process, however, in my case I had two servers in AWS for SQL Server, 1 RRAS server in AWS, 1 AD server in AWS and 1 server (3rd cluster node) in Azure.  Here’s another link from David that can also assist in creating the Site to Site VPN http://clusteringformeremortals.com/2014/01/03/how-to-create-a-site-to-site-vpn-tunnel-to-the-windows-azure-cloud-using-a-window-server-2012-r2-routing-and-remote-access-rras-server/
  2. For the production environments you should create an AD in each AZ and Azure but in my test case I only created 1 AD, you would also configure separate subnets for the AD servers (in AWS) and security groups (in AWS) to separate out domain member servers and AD servers.
  3. On all of servers to be clustered, create a Disk F: to be used in the mirroring set. Note the Disk F: on all servers should be the same size.
  4. Install the SIOS DataKeeper (DK) software on each server to be clustered and register the license file. I used a domain account that was assigned local admin rights on the servers to run the DK service.
  5. Below IP 172.31.16.180 is the SQL Server cluster IP and 172.31.16.10 is the Windows cluster IP.
    The following link walks through the definition of a Windows Cluster in Azure however the DataKeeper setup for AWS is the same. http://clusteringformeremortals.com/2014/01/10/creating-a-sql-server-2014-alwayson-failover-cluster-fci-instance-in-windows-azure-iaas-azure-cloud/
  6. Create a Windows Cluster for the AWS servers.  You will need to bind a secondary private IP address for the Windows Cluster network name and another for the SQL Server Cluster network name on each AWS server using private IPs in the same subnet range of the server. This will facilitate the correct IP address to be assigned for the Windows cluster and SQL Server Cluster network names at failover.
  7. Invoke the DataKeeper GUI on server 1 and create a Job to mirror Disk F from the AWS server 1 to AWS server 2. You will get an option to add Disk F: to the Windows Cluster as part of the Disk F mirroring setup.DataKeeper disk mirroring job for the AWS servers.
    Windows Cluster shared disk.
  8. Install SQL Server clustering as per the normal sequence using Disk F: as the shared drive on the Cluster Disk Selection dialog.
  9. Add the Azure server to the AWS Windows cluster, then create a DataKeeper disk mirror from the AWS server 1 Disk F: to the Azure VM Disk F:. The following link from David provides the steps http://clusteringformeremortals.com/2014/01/ .Note before creating the mirrored disk I created the Endpoints (for TCP 137, 138, 139, 445, 9999 and 10005) on the Azure VM to allow DataKeeper to work between AWS and Azure. TCP 10005 relates to Disk F, if you have other drive letters then the TCP port number will change (see the DK technical doc for the disk to port mappings). I also configured an ACL to allow access to these ports from the AWS VPC private IP range.On the AWS side I configured ALL TCP access from the Azure SQL Server subnet to the AWS SQL Server security group inbound rules.  In production you would lock it down to the same port numbers as defined for the Azure side.Make sure you pick asynchronous mirroring and configure compression for mirroring to the remote site (Azure VM).The DataKeeper job should look similar to below where DBISQL03 is the Azure VM.
  10. Install SQL Server clustering on the Azure node.
    Again review David’s link http://clusteringformeremortals.com/2014/01/ for the detailed steps.
  11. In AWS configure create an entry in the Route table to direct all Azure destination traffic though the second network device on the RRAS server. The second network device is the external facing interface linked to the Azure VPN.
  12. To facilitate Azure client connections to an active Azure SQL Cluster (in a failover to Azure scenario) , you will have to create a client access point, a load balanced endpoint and then adjust the SQL Cluster IP. The Azure clients will connect to the active Azure SQL Cluster using the Load Balancer name or Load Balancer IP address.See the section Configuring the Client Access Point in this article from David http://blogs.msdn.com/b/microsoft_press/archive/2014/08/18/from-the-mvps-sql-server-high-availability-in-windows-azure-iaas.aspx

Under normal operations the primary AWS Windows Cluster IP address is online (in my case 172.31.16.10) and other (subnet) Windows Cluster IP addresses are offline.

Same for the SQL Server Cluster IP Address.

On failover to the Azure node you see that the Azure SQL Server Cluster IP address (192.168.2.11) is online the AWS IP addresses are offline.

Other items you should consider :-

  1. Configure your cluster quorum votes to prevent automatic failover of resources to Azure (remote site). In this case you would have a witness file share perhaps in a third AWS availability zone and grant a quorum vote to both the AWS cluster nodes and the AWS file share. If connectivity to AWS is lost and you have to fail over to Azure then you will need to perform a force quorum on the remaining cluster node in Azure.
  2. Note that the mirroring to the Azure VM is asynchronous so you would want to have an opportunity to ensure that all data is in sync before doing a controlled fail over, or if there is unexpected failure of AWS resources you can then make a call to fail over with the potential for data loss.
  3. Create AD servers in each AWS availability zone and Azure for redundancy.
  4. Configure a shorter TTL for DNS name resolution of the SQL Server cluster name in the AWS AD.
  5. Tune the CrossSubnetDelay and CrossSubnetThreshold cluster heartbeat parameters.
    See article from Elden Christensen http://blogs.msdn.com/b/clustering/archive/2012/11/21/10370765.aspx
  6. Consider redundancy strategies for the RRAS server.
  7. Test the throughput limitations between AWS and Azure.
  8. Determine how you want to handle MSDTC, i.e. configure a clustered MSDTC resource using a mirrored disk for specific for MSDTC or bind to the local MSDTC on each server.

Posted in: AWS, Azure, Cloud, Cluster, High Availability

Leave a Comment (4) ↓

3 Comments

  1. Clinton Thomson November 21, 2014

    Nice tech guys, worth a consideration when we step into Azure as well. We’re all about Amazon AWS right now.

    reply
  2. JOEY August 18, 2017

    Hi,

    I am not sure how this will work. Can you please explain something, is this just replicating the disk to the other servers? Or actually storing data across them in real time.?

    Because consider this.

    if im writing a transaction with a lot of data say 100meg (yes not probably but let’s just go with it for example sakes).

    so it gets committed to the primary node, and the software then tries to copy that data to the other 2 nodes? what happens if the server goes down before it could read the file and synchronise it? this transaction will never hit the mirrored servers.

    So basically, what needs to happen, is that the transaction needs to be FIRST replicated across all the sites before its safe to commit and the end users gets the response. Now im pretty sure this can just be done with SQL server its self? or worst make sure its wrapped in a global transaction, but that then means what if one server is down, does it commit that transaction, in the hope it will get rebuilt once it comes online? since its missing that data, and what about if that server with the new transaction goes down, and the other one comes online without the ability to now replicate it? see its not that simple im afraid.

    reply
    • Rob Risetto August 18, 2017

      Hi Joe,
      SIOS is block level replication from the primary source disk to the replica disk. Synchronous transfer will confirm the commit has been hardened on the replica before confirming the commit on the primary disk, this ofcourse can delay a SQL commit especially if network latency is high. If the replication suspends then data loss is possible. This is similar to SQL Db mirroring and AG mirroring.

      reply

Leave a Comment