AlwaysOn Availability Groups Migration - Case Study Part 1
My current client has a very complex environment where uptime is extremely important. They are a payment processor handling credit card payments 24x7x365.
They currently use a combination of SQL Server database mirroring and Failover Cluster Instances for high availability and disaster recovery across a WAN between data centers.
A mirroring failover is known to recover much faster than a cluster failover. This is mostly because a cluster failover has to go through disk arbitration, which protects the data on the disk but takes time.
For this reason, the client was unhappy with Failover Cluster Instances. They wanted every failover to be a mirroring failover so that downtime would be as small as possible.
The Windows Failover Cluster was a multi-subnet cluster spanning both data centers. There were two environments, one on R-710s and one on R-720s. The R-710s were nearing end of life. Each had a Fusion-IO solid-state drive for the data, and the free space on the drives was dwindling.
The mirroring witness resided in Bermuda, where irregular network issues frequently caused loss of quorum.
The client decided to rebuild the servers, re-installing Windows Server 2012 R2 so they would have a "clean" environment.
The client had the following requirements:
- Re-install Windows on the servers.
- Increase redundancy.
- Use existing equipment.
- Keep license costs to a minimum.
- Failover must be automatic with zero loss of data.
- Complete the project with near zero downtime.
With this information I recommended the following:
- Consolidate the two environments, preventing the need to purchase additional licenses [1] for the new environment (new licenses hadn't been purchased for SQL Server Enterprise on the R-720s).
- Evict the passive nodes from the cluster, re-install, team the NICs for redundancy, create a new cluster (named properly).
- Add the Fusion-IO drives from the R-710s to the R-720s.
- Log ship databases to rebuilt machines.
- Create Availability Groups for each business region.
- Flip from Log Shipping to Availability Group.
- Tear down the old cluster.
- Rebuild the last set of servers.
- Add to new Windows Failover Cluster.
- Add servers as new availability replicas using PowerShell to minimize downtime.
Each product will have four replicas, two of them synchronous (one local and one across the WAN [2]). This goes against the best practice of using only asynchronous availability mode across a WAN, but the application could not handle losing transactions during a failover: a transaction could be committed or rolled back locally but not on the replica. They can't afford for a $100,000 transaction or batch of transactions to be committed at the client but not committed in the database, i.e. the credit card transactions were accepted but never recorded (or paid for).
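To make the trade-off concrete, here is a toy Python model of why a synchronous replica protects acknowledged commits. It is not SQL Server code and not anything from the client's system. The function name and the `async_lag` parameter are hypothetical; the model simply assumes that an asynchronous secondary trails the primary by a fixed number of log records at the moment the primary fails.

```python
def acknowledged_but_lost(txns, synchronous, async_lag=1):
    """Return the transactions the client saw commit that the secondary
    never hardened, assuming the primary fails right after the last commit."""
    if synchronous:
        # Synchronous commit: the client is acknowledged only after the
        # secondary has hardened the log record, so it holds every commit.
        on_secondary = list(txns)
    else:
        # Asynchronous commit: the client is acknowledged immediately and the
        # secondary trails by async_lag records (an assumption of this model).
        on_secondary = list(txns[:max(0, len(txns) - async_lag)])
    return [t for t in txns if t not in on_secondary]


payments = ["txn-1", "txn-2", "txn-3"]
print(acknowledged_but_lost(payments, synchronous=True))   # []
print(acknowledged_but_lost(payments, synchronous=False))  # ['txn-3']
```

The price of the synchronous option is that every commit waits on a round trip across the WAN, which is why it is normally avoided between data centers.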
[1] The machines were licensed for SQL Server Enterprise on two 8-core CPUs. With two active and two passive nodes, that is 32 cores (sixteen 2-core pack licenses). They had an extra 2-pack license, so they had to purchase an additional license.
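The core count in the footnote can be checked with a few lines of Python. This sketch assumes, as the 32-core figure implies, that only the two active nodes count toward licensing and that core licenses come in 2-core packs; the function name is just for illustration.

```python
import math


def core_packs_needed(licensed_nodes, cpus_per_node, cores_per_cpu, cores_per_pack=2):
    """Return (total cores to license, number of core packs to cover them)."""
    total_cores = licensed_nodes * cpus_per_node * cores_per_cpu
    packs = math.ceil(total_cores / cores_per_pack)
    return total_cores, packs


# Two active nodes, each with two 8-core CPUs.
total_cores, packs = core_packs_needed(licensed_nodes=2, cpus_per_node=2, cores_per_cpu=8)
print(total_cores, packs)  # 32 16
```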
The client approved the design and the migration plan.
Over the next few weeks I will go into the details of how we migrated the 25 databases with less than 5 minutes of downtime. Keep in mind I'm not a PowerShell guru; this is the most PowerShell I've ever written.