23 March 2016

Failure to configure AD FS | Log on as a service

Recently, whilst setting up a new AD FS deployment at a client's office, I ran into trouble completing the initial AD FS configuration (the wizard you run straight after adding the AD FS role).  After the configuration failed, I checked the Event Logs and found that the account "NT Service\MSSQL$MICROSOFT##WID" did not have the required user right "Log on as a service".  Normally the service account is granted the "Log on as a service" right automatically during setup, but in this case something was blocking it.

When checking the local security policy, I noticed that I couldn't edit the list of accounts granted that right (it was greyed out), which means the setting is controlled by a Group Policy.  After checking GPMC, I found the setting defined in the Default Domain Policy.  To resolve the issue, I added "NT SERVICE\ALL SERVICES" to the accounts granted that right.

A quick gpupdate /force on the AD FS server applied the change and allowed me to continue with the AD FS configuration.
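To confirm that the policy change has actually reached the server, the effective user rights can be exported and searched. The following is a minimal sketch, not part of the original fix: it assumes an elevated PowerShell prompt on the AD FS server, and the temporary file name is only an illustration.

```powershell
# Refresh Group Policy, then export the effective user rights assignments.
gpupdate /force

# Hypothetical temp file name; any writable path will do.
$cfg = Join-Path $env:TEMP 'user-rights.inf'
secedit /export /cfg $cfg /areas USER_RIGHTS | Out-Null

# "Log on as a service" is stored as SeServiceLogonRight.
# Accounts are listed as SIDs prefixed with '*';
# NT SERVICE\ALL SERVICES is the well-known SID S-1-5-80-0.
Select-String -Path $cfg -Pattern '^SeServiceLogonRight'
```

If *S-1-5-80-0 appears on that line, the right has been applied and the AD FS configuration can be retried.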

21 March 2016

SSO with SAML/ADFS not working - NLB Cluster with Proxy NLB Cluster

A client of mine recently had an issue with Single Sign On failing when users attempted to log in through an external website that had been set up with SAML integration to their on-prem AD FS servers.  At sign-in, the browser would redirect to fs.domain.wa.gov.au and the page would fail to load.  Their setup consisted of two AD FS servers in an NLB (Network Load Balancing) cluster, and two Proxy servers in their own NLB cluster.  The NLB clusters had been set up in IGMP Multicast mode and the appropriate settings had been configured on the Cisco switches.

Upon investigating, I realised that the Cisco core switch had failed to learn the IP/MAC address of the AD FS NLB cluster, so no server or workstation could connect to AD FS and SSO failed.  The interesting part was that the Proxy servers' NLB cluster was working fine, and the Cisco core switch was able to learn its MAC and IP without a problem.  After finding nothing online, and after deleting and re-creating the cluster without success, I logged a case with Microsoft.  They advised that there are two Microsoft updates which could potentially resolve the NLB connectivity issue:

Update 1
Update 2

These updates were applied to both AD FS servers and, after a reboot, resolved the NLB connectivity problem.  SSO, however, was still failing when tested.  Checking the Proxy servers, I noticed that they could now reach the AD FS servers over the network, but the Web Application Proxy itself was not connecting to AD FS and was logging an error in the event log.

After checking online, I found that this can occur when the certificate thumbprint on the AD FS servers does not match the one on the Proxy servers.  From PowerShell, I ran the following command to see which certificates were installed on the server, and to confirm the thumbprint of the specific one I wanted to use:

dir Cert:/LocalMachine/My
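When the store holds many certificates, it can help to narrow the listing and compare it with what AD FS is bound to. This is a rough sketch under a few assumptions: the certificate subject contains the federation service name (a wildcard certificate would need a different filter), and the AD FS servers run a version that provides the Get-AdfsSslCertificate cmdlet (Windows Server 2012 R2 or later).

```powershell
# On either server: list only certificates whose subject mentions the
# federation service name, with the fields needed to pick the right one.
Get-ChildItem Cert:\LocalMachine\My |
    Where-Object { $_.Subject -like '*fs.domain.wa.gov.au*' } |
    Select-Object Thumbprint, Subject, NotAfter

# On an AD FS server: show the thumbprint AD FS is using for SSL,
# so it can be compared with the certificate on the Proxy servers.
Get-AdfsSslCertificate
```

The thumbprint reported on the AD FS side is the one the Proxy servers need to present as well.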

After obtaining the correct thumbprint, I then ran the following command on the two Proxy servers:

Install-WebApplicationProxy -CertificateThumbprint 'XXXXXXXXXXXXXXXXXX' -FederationServiceName 'fs.domain.wa.gov.au'

Once this had been completed on both Proxy servers, Single Sign On was now back up and running, and the Proxy Servers could now communicate with the ADFS servers.
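As a follow-up check on each Proxy server, something along these lines can confirm that the services are running and that the federation endpoint answers. It is a sketch only: the service names appproxysvc and adfssrv are what I would expect on a Windows Server 2012 R2 Web Application Proxy, and should be treated as assumptions for other versions.

```powershell
# Confirm the Web Application Proxy and AD FS proxy services are running
# (service names assumed for Windows Server 2012 R2).
Get-Service -Name appproxysvc, adfssrv | Select-Object Name, Status

# Show the SSL certificate bindings the proxy is currently using.
Get-WebApplicationProxySslCertificate

# Check that the federation service name resolves and answers on HTTPS.
Test-NetConnection -ComputerName 'fs.domain.wa.gov.au' -Port 443
```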