I’ve spent a lot of time trying to make the process of building a SQL AlwaysOn failover cluster in Azure. Due to the limitations of having to use DHCP this caused all sorts of problems when trying to replicate On-Premise installations in Azure. I have now got a nice build process, there is a mixture of PowerShell and GUI. It could all be done with PowerShell but I like to see what is going on sometimes hence my approach.
First off, I am assuming that you’ve built the two SQL Server nodes that are going to form part of the AlwaysOn failover cluster. For information if you are doing this for the first time you need to install 2 standalone SQL installations. There are some complex decisions to be made regarding the SQL design but for this post I am going to stick to the simple installation process. So for my setup I have two Windows 2012R2 servers running SQL 2014 Enterprise. Both servers have a default installation of SQL with a single default instance.
Pre-requisites:
– Azure PowerShell installed and connected to your subscription
– Servers must be in the same cloud service
Ok, lets go!
As mentioned above you should now have a couple of Windows servers with SQL installed on them Check that the SQL Agent is on Auto start as this is normally set to manual, and start the service.
Assign reserved IP addresses to the servers using the following Azure PowerShell
Get-AzureVM -ServiceName CLOUDNAME -Name VMNAME | Set-AzureStaticVNetIP -IPAddress 10.151.129.102 | Update-AzureVM
Install Failover Cluster services on all nodes that will be part of the cluster
Install-WindowsFeature -Name Failover-Clustering –IncludeManagementTools
Install-WindowsFeature –Name RSAT-Clustering-PowerShell
Install-WindowsFeature –Name RSAT-Clustering-CmdInterface
Next we need to configure the firewall on the servers, this is done on the Domain profile Only
– SQL Servers – TCP1433, TCP59999 and TCP5022 (In/Out)
– SharePoint – TCP1433 (in/out)
To make this easier run the following NetSh Command to set firewall on all nodes:
netsh advfirewall firewall add rule name=”SQL” dir=in action=allow protocol=TCP localport=1433 profile=domain
netsh advfirewall firewall add rule name=”AlwaysOn Endpoint” dir=in action=allow protocol=TCP localport=5022 profile=domain
netsh advfirewall firewall add rule name=”AlwaysOn Probe” dir=in action=allow protocol=TCP localport=59999 profile=domainnetsh advfirewall firewall add rule name=”SQL” dir=out action=allow protocol=TCP localport=1433 profile=domain
netsh advfirewall firewall add rule name=”AlwaysOn Endpoint” dir=out action=allow protocol=TCP localport=5022 profile=domain
netsh advfirewall firewall add rule name=”AlwaysOn Probe” dir=out action=allow protocol=TCP localport=59999 profile=domain
Now we need to build Internal Load Balancer (ILB) and join the nodes to it, create a PS1 file from the following (change variables where applicable)
# Define variables
$ServiceName = “devrobsql” # the name of the cloud service that contains the availability group nodes
$AGNodes = “embo”,”elan” # all availability group nodes containing replicas in the same cloud service, separated by commas
$EndpointName = “SQLEP1” # name of the endpoint
$EndpointPort = “1433” # public port to use for the endpoint
$ILBName = “SQLILB” # chosen name for the new ILB
$SubnetName = “RC-Subnet” # subnet name that the replicas use in the VNet
$ILBStaticIP = “10.151.129.103” # static IP address for the ILB in the subnetAdd-AzureInternalLoadBalancer -InternalLoadBalancerName $ILBName -SubnetName $SubnetName -ServiceName $ServiceName -StaticVNetIPAddress $ILBStaticIP
# Configure a load balanced endpoint for each node in $AGNodes using ILB
ForEach ($node in $AGNodes)
{
Get-AzureVM -ServiceName $ServiceName -Name $node | Add-AzureEndpoint -Name $EndpointName -LBSetName “$EndpointName-LB” -Protocol tcp -LocalPort $EndpointPort -PublicPort $EndpointPort -ProbePort 59999 -ProbeProtocol tcp -ProbeIntervalInSeconds 10 -InternalLoadBalancerName $ILBName -DirectServerReturn $true | Update-AzureVM
}
It’s now time to build the failover cluster. This is done via PowerShell because of the issues with setting up clusters with DHCP.
Connect to one of the SQL nodes that is to be part of the cluster and Run the following PS1 file to create the cluster nodes.
<#
* Copyright Microsoft Corporation
*
* Licensed under the Apache License, Version 2.0 (the “License”);
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an “AS IS” BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
#>#########################################################################################################
#
# Powershell script to create a cluster in Azure environment
# It must be run on a machine which is going to become a node in the cluster
#
# Syntax:
#
# .\CreateAzureFailoverCluster.ps1 -ClusterName “ClusterName” -ClusterNode “Node1″,”Node2”,… [-Force]
#
# Example 1
# Running this command on Machine1 or Machine2 will create cluster “MyCluster”
#
# .\CreateAzureFailoverCluster.ps1 -ClusterName “MyCluster” -ClusterNodes “Machine1”, “Machine2”
#
# Example 2
# Running this command on Machine1 or Machine2 will create cluster “MyCluster”, erasing any previous cluster state that was on the nodes
#
# .\CreateAzureFailoverCluster.ps1 -ClusterName “MyCluster” -ClusterNodes “Machine1”, “Machine2” -Force
#
# Summary of the script:
# 1. Create a 1 node cluster on local node
# 2. Set cluster IP address to static IP 169.254.1.1
# 3. Online the “Cluster Name” so that it will create an entry in active directory
# 4. Add the other nodes to the cluster
#
#
#########################################################################################################
param(
[Parameter(Mandatory=$true)]
$ClusterName,
[Parameter(Mandatory=$true)]
$ClusterNodes,
[Parameter(Mandatory=$false)]
[Switch]$Force)
Import-Module ServerManager
$OsBuildNumber = [System.Environment]::OSVersion.Version.Build
if ($OsBuildNumber -lt 7600)
{
Write-Error “Not support on Windows Visa or lower”
exit 1
}
elseif ($OsBuildNumber -lt 9200)
{
Write-Output “Windows Server 2008 R2 detected”
$ClusterFeature = Get-WindowsFeature “Failover-Clustering”
$ClusterTools = Get-WindowsFeature “RSAT-Clustering”
if ($ClusterFeature.Installed -eq $false -or $ClusterFeature.Installed -eq $false)
{
Write-Output “Needed cluster features were not found on the machine. Please run the following command to install them:”
Write-Output “Add-WindowsFeature ‘Failover-Clustering’, ‘RSAT-Clustering'”
exit 1
}
}
else
{
Write-Output “Windows Server 2012 or above detected”
$ClusterFeature = Get-WindowsFeature “Failover-Clustering”
$ClusterPowerShellTools = Get-WindowsFeature “RSAT-Clustering-PowerShell”
$ClusterCmdTools = Get-WindowsFeature “RSAT-Clustering-CmdInterface”
if ($ClusterFeature.Installed -eq $false -or $ClusterPowerShellTools.Installed -eq $false -or $ClusterCmdTools.Installed -eq $false)
{
Write-Output “Needed cluster features were not found on the machine. Please run the following command to install them:”
Write-Output “Add-WindowsFeature ‘Failover-Clustering’, ‘RSAT-Clustering-PowerShell’, ‘RSAT-Clustering-CmdInterface'”
exit 1
}
}
Import-Module FailoverClusters
$LocalMachineName = $env:computername
$LocalNodePresent = $false
# The below line will make sure that the script is running on one of the specified cluster nodes
# The Spplit(“.”) is needed, because user might specify machines using their fully qualified domain name, but we only care about the machine name in the below verification
@($ClusterNodes) | Foreach-Object {
if ([string]::Compare(($_).Split(“.”)[0], $LocalMachineName, $true) -eq 0) {
$LocalNodePresent = $true } }
if ($LocalNodePresent -eq $false)
{
Write-Error “Local machine where this script is running, must be one of the cluster nodes”
exit 1
}if ($Force)
{
Write-Output “Forcing cleanup of the specified nodes”
@($ClusterNodes) | Foreach-Object { Clear-ClusterNode “$_” -Force }
}
else
{
Write-Output “Making sure that there is no cluster currently running on the current node”
$CurrentCluster = $null
# In case there is no cluster presetn, we don’t want to show an ugly error message, so we eat it out by redirecting
# the error output to null
$CurrentCluster = Get-Cluster 2> $null
if ($CurrentCluster -ne $null)
{
Write-Error “There is an existing cluster on this machine. Please remove any existing cluster settings from the current machine before running this script”
exit 1
}
}
Write-Output “Trying to create a one node cluster on the current machine”
Sleep 5
New-Cluster -Name $ClusterName -NoStorage -Node $LocalMachineName
Write-Output “Verify that cluster is present after creation”
$CurrentCluster = $null
$CurrentCluster = Get-Cluster
if ($CurrentCluster -eq $null)
{
Write-Error “Cluster does not exist”
exit 1
}
Write-Output “Bring offline the cluster name resource”
Sleep 5
Stop-ClusterResource “Cluster Name”
Write-Output “Get all IP addresses associated with cluster group”
$AllClusterGroupIPs = Get-Cluster | Get-ClusterGroup | Get-ClusterResource | Where-Object {$_.ResourceType.Name -eq “IP Address” -or $_.ResourceType.Name -eq “IPv6 Tunnel Address” -or $_.ResourceType.Name -eq “IPv6 Address”}
$NumberOfIPs = @($AllClusterGroupIPs).Count
Write-Output “Found $NumberOfIPs IP addresses”
Write-Output “Bringing all IPs offline”
Sleep 5
$AllClusterGroupIPs | Stop-ClusterResource
Write-Output “Get the first IPv4 resource”
$AllIPv4Resources = Get-Cluster | Get-ClusterGroup | Get-ClusterResource | Where-Object {$_.ResourceType.Name -eq “IP Address”}
$FirstIPv4Resource = @($AllIPv4Resources)[0];
Write-Output “Removing all IPs except one IPv4 resource”
Sleep 5
$AllClusterGroupIPs | Where-Object {$_.Name -ne $FirstIPv4Resource.Name} | Remove-ClusterResource -Force
$NameOfIPv4Resource = $FirstIPv4Resource.Name
Write-Output “Setting the cluster IP address to a link local address”
Sleep 5
cluster res $NameOfIPv4Resource /priv enabledhcp=0 overrideaddressmatch=1 address=169.254.1.1 subnetmask=255.255.0.0
$ClusterNameResource = Get-ClusterResource “Cluster Name”
$ClusterNameResource | Start-ClusterResource -Wait 60
if ((Get-ClusterResource “Cluster Name”).State -ne “Online”)
{
Write-Error “There was an error onlining the cluster name resource”
exit 1
}
Write-Output “Adding other nodes to the cluster”
@($ClusterNodes) | Foreach-Object {
if ([string]::Compare(($_).Split(“.”)[0],$LocalMachineName, $true) -ne 0) {
Add-ClusterNode “$_” } }
Write-Output “Cluster creation finished !”
exit 0
This excellent script is taken from the Microsoft Script gallery here, I recommend checking for a later version. It’s also a great resource for PowerShell scripts.
SQL Configuration
VERY IMPORTANT! If the consuming applications are using service accounts then all nodes must have the accounts added to each instance of SQL with the same “Server” role permissions. The database permissions will get set when the DBs are sync’d.
Ensure that TCP/IP Protocol is enabled on all the SQL nodes
Enable “Always-On” through SQL Configuration Manager” (perform on all nodes of the cluster)
The next steps require there to be a database present on the SQL server that is going to be the primary
Connect to the node that will be the primary in the cluster
Set user databases recovery model to “Full” (these are the databases you want to be part of the AlwaysOn Group)
This little SQL query will help identify the databases that need changing
select [name], DATABASEPROPERTYEX([name],’recovery’)
from sysdatabases
where name not in (‘master’,’model’,’tempdb’,’msdb’)
Take a Full Backup of the user databases you’ve selected to be part of the AlwaysOn group (if you have a lot of databases it might be quicker to make a maintenance task to do this)
Create a share on this SQL server and make sure you can access the share from the other nodes.
Create the availability group on the node that is hosting the databases (in this example my consuming application is SharePoint)
– Specify a name:
– Select databases:
– Specify a replica, hit the “Add replica” button and select the other node(s)
– Click on “EndPoint” tab and check the port matches the “AlwaysOn Endpoint” firewall we added earlier
– Click the “Backup Preferences” and select an option that suits your environment (here I’ve gone for the default)
– On the “Listener” tab leave the option set to “Do not create an availability group listener now”
– Select the share you created earlier.
– Run the verification and all being well it’ll look like this (ignore the listener warning)
– Click “Finish” on the summary page
The process will start to build the always on group now. If all has gone to plan you’ll get a nice happy message like this
Click Close to finish
Create the listener
Before creating the listener make sure that the Cluster computer account has the permission to create computer objects within the “Computers” OU on the domain.
To do this open up “Active Directory Users and Computers” MMC and find the computer object created when the cluster was created. In this case the object is called “SQLCLUSTER”
Right click on the “Computers” OU within AD and select “Delegate Control” -> Click “Add” and select the computer account. (remember to add Computer objects type before trying to resolve the cluster name)
Select “Create a custom task to delegate”
Select “This folder, existing objects in this folder, and creation of new objects in this folder”
Scroll down the permissions list and select “Create all Child Objects”
Click “Finish” to exit
Create listener on the failover cluster
Open up the Failover Cluster manager on the primary node.
Expand Roles
Here you’ll see the AlwaysOn group that you created earlier (in my case “SharePointAO)
Right click the object and select “Add Resource” -> “Create Access Point”
Create a name for the listener, I am going to call it “SQLAlwaysOn”
Click through the wizard, if you get an error here check the AD permissions you set for the cluster computer object above.
Right Click the role (SharePointOA) again and at the bottom of the screen click the “Resources” tab
Expand the newly created resource (SQLALWAYSON)
Right-click the network address and select Properties
Change the IP address to Static IP Address and set the address to the one that was configured for the ILB above. Also give it a name that is a bit more descriptive
Open a PowerShell command window with elevated privileges.
Run the command and make a note of the cluster network name.
Get-ClusterNetwork
Run the following PowerShell on each of the nodes (change variables as required)
– ClusterNetworkName is what you noted above
– IPResourceName is the name you gave the IP resource above (SQLALWAYSON-IP)
– ILBIP is the IP address of the Internal Load Balancer
$ClusterNetworkName = “Cluster Network 1”
$IPResourceName = “SQLAlwaysOn-IP”
$ILBIP = “10.151.129.103”Get-ClusterResource $IPResourceName | Set-ClusterParameter -Multiple @{“Address”=”$ILBIP”;”ProbePort”=”59999″;”SubnetMask”=”255.255.255.255″;”Network”=”$ClusterNetworkName”;”OverrideAddressMatch”=1;”EnableDhcp”=0}
Go back to the Failover Cluster Manager
Click the Role (SharePointAO) and then click the Resources tab and finally right click the “Resource” and click “Bring Online”
If everything is done correctly you’ll see this
Right click the “SharePointAO” object under “Other Resources” and select “Properties”
Navigate to the “Dependencies” tab
Select the “SQLALWAYSON” resource (not the IP) and click “OK”
Now fire up SQL Management Studio
Select AlwaysOn High Availability -> Availability Groups -> <AvailabilityGroupName> -> Availability Group Listeners
Right click Properties and change the port to 1433 (or whatever the SQL is using)
To test the listener is working correctly try connecting to is from SQL Management Studio by entering the listener name into the server name box
You can now re-point your application to the listener name rather than the SQL node name. If you are using aliasing this is easy to do.
Once the application is using the listener you can test failover
Note: Quorum – to ensure proper failover you need to have an additional quorum vote (this could be a shared disk, file share). Without this failover testing in the case of a machine failure will not be successful.
Pingback: SharePoint 2013 with SQL 2012 Always On in Azure | MyStar Blog