Microsoft Lync Server 2013 includes several architectural changes that improve the efficiency of user and data replication. The Lync team implemented these changes so that the product runs more smoothly than previous versions. One area the new version addresses is how users are homed to Lync Front End servers and how that process increases high availability. The combination of Windows Fabric, which handles replication, and User Groups, which are essentially Lync user containers, makes things "magically" work in the background. Let me explain a little about Windows Fabric, User Groups, and how it all fits together.

Windows Fabric

Windows Fabric is a Microsoft technology for building highly reliable, distributed, and scalable applications. The following are some of Windows Fabric's responsibilities in Lync 2013:

  • enables the brick model architecture
  • replicates data between Front End servers
  • allows Lync Server to support as many as 12 Front End servers per pool (10 was the maximum with Lync 2010)
  • is used by the Lync Server Storage Service for replication
  • maintains three copies of each user's data across the Front End servers
  • supports a new concept, the upgrade domain, which lets you patch servers without negatively affecting users

Windows Fabric is a service that is installed with Lync Server 2013 as part of server setup. It can be installed on Windows Server 2008 R2 or Windows Server 2012; once installed, it manages Lync 2013 replication.

Lync User Groups

Each Lync 2013 user belongs to a User Group, which lets the user sign in to a particular Front End server and have all of his or her user data present. Using a concept known as the Brick Model architecture, the Lync product group was able to reduce the dependency on SQL Server: by leveraging the capabilities of Windows Fabric, the back-end SQL Server database is now only loosely coupled to the Front End servers. To learn more about the Brick Model architecture, see "Lync Server 2013: Brick Model."

From an operational perspective, the most visible aspect of these changes is that Front End servers are now responsible for managing user state and, as a result, contain the database that used to reside on the back-end SQL Server instance. This change was necessary to maintain presence and contacts during a back-end SQL Server outage and to manage presence states in a consistent manner. In Lync 2010, user presence was stored on the back-end SQL Server, which placed stringent hardware and network requirements on SQL Server to ensure fast, reliable, uninterrupted connectivity to the Lync 2010 Front End servers. If the back-end server experienced any sort of hiccup or was unavailable for any reason, users would see presence fail to update in a timely fashion or would lose contacts.

User Groups Inner Workings

Lync Server 2010 used an algorithm, based on distributing users across as many as 10 Front End servers, to create an ordered list of servers that let clients determine which server in a given pool to connect to. This list was created on the server side whenever a Lync Front End server was added and published in the topology. On the user side, whenever a user was enabled for Lync, an algorithm determined which Front End server the user would be homed to.

For Lync 2013, the Lync development team changed the algorithm to assign users automatically to User Groups. Each User Group is assigned primary, secondary, and tertiary Front End servers, provided the pool contains at least three servers. If a Lync 2013 pool has fewer than three servers, a User Group has only a primary server, or a primary and a secondary. Windows Fabric handles the replication between Front End servers that maintains the copies of each user's data, so having fewer than three servers in a pool obviously reduces the number of data copies.
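The assignment idea can be sketched in a few lines of Python. This is a toy model only: the group count, the hashing scheme, and the server names are my own assumptions for illustration, not Microsoft's actual internal algorithm.

```python
import hashlib

NUM_GROUPS = 8  # illustrative; the real group count is an internal detail

def user_group(sip_uri: str) -> int:
    """Deterministically map a user's SIP URI to a User Group."""
    digest = hashlib.sha256(sip_uri.lower().encode()).hexdigest()
    return int(digest, 16) % NUM_GROUPS

def replica_servers(group: int, servers: list[str]) -> list[str]:
    """Ordered primary/secondary/tertiary servers for a User Group.

    With fewer than three servers in the pool, the group simply gets
    fewer replicas -- mirroring how Lync keeps fewer data copies.
    """
    n = len(servers)
    return [servers[(group + i) % n] for i in range(min(3, n))]

pool = ["FE1", "FE2", "FE3", "FE4"]
g = user_group("sip:alice@contoso.com")
print(g, replica_servers(g, pool))
```

The key property this models is determinism: every server in the pool can compute the same group and replica list for a user without consulting the back-end database.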

Now comes the biggest question: What happens if the server that holds the primary User Group fails? The User Group fails over to the secondary server; if the secondary server is also down or unavailable when the primary fails, the User Group fails over to the tertiary server. In the extreme case where the tertiary server also fails, Windows Fabric elects another Front End server (assuming one is available), retrieves the persistent state information from the back-end SQL Server database, and forms a new User Group for the user to log in to.

Figure 1 shows a sample Lync Front End server topology with User Groups.

Figure 1: Lync Server 2013 User Groups

In the figure, we happen to have six Lync 2013 Front End servers. Each Front End server hosts at least one User Group, and some host two. Each time a Front End server is added to the topology, the Front End servers redistribute the User Groups to load-balance them across the various Lync Front End servers.
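That rebalancing behavior can be illustrated with a simple round-robin placement (again a sketch with made-up server names; Lync's actual redistribution logic is internal to Windows Fabric):

```python
def distribute_groups(groups: list[int], servers: list[str]) -> dict[str, list[int]]:
    """Spread User Groups round-robin across Front End servers (illustrative)."""
    placement: dict[str, list[int]] = {s: [] for s in servers}
    for i, group in enumerate(groups):
        placement[servers[i % len(servers)]].append(group)
    return placement

groups = list(range(8))  # eight hypothetical User Groups
before = distribute_groups(groups, ["FE1", "FE2", "FE3"])
after = distribute_groups(groups, ["FE1", "FE2", "FE3", "FE4"])
print(before)  # some servers host 3 groups, others 2
print(after)   # adding FE4 rebalances to 2 groups per server
```

The point of the sketch is the invariant, not the mechanism: after any topology change, no Front End server hosts more than one User Group above its fair share.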

The Lync Magic

Automatic load balancing and failover is the magic I referred to earlier. It's a good thing User Groups and Windows Fabric work so well, because things just got a little more complicated with regard to authentication against particular Front End servers. Compared with the two previous releases (Lync Server 2010 and OCS 2007 R2), the way an administrator traces which Front End server a user is connected or authenticated to has changed. Troubleshooting is a little more challenging because we're throwing a Windows service into the mix. What makes things even more interesting is that now we start talking about quorums, another aspect that goes hand in hand with Windows Fabric and User Groups. Perhaps that's the perfect transition into a topic for another day.