Beware token bloat!
When we talk about enterprise computing, and enterprise identity, we usually talk in terms of large-scale system architectures. But architectures can scale from enormous to minuscule—down to the level of packets on the wire. In my past few columns, I’ve spoken about some of the big identity issues we’re dealing with today, such as securely connecting your enterprise to that great big service station in the sky (aka the cloud). This month, I’d like to talk about something on the other end of the identity scale, something down near the atomic level in IT terms: the Active Directory (AD) security access token. This little mote controls authorization in an AD domain, and if you don’t pay attention to it, you might be setting yourself up for some big problems.
A Token in a Ticket
The Kerberos security protocol is the bedrock upon which AD builds practically everything else. Strictly speaking, the Kerberos protocol handles only authentication (securely verifying a user’s identity on a computer network). With the introduction of Windows 2000, Microsoft extended the Kerberos protocol to also handle authorization (determining whether a user has rights to access a resource). At the time, Microsoft was criticized for extending existing standards for its own purposes and causing interoperability problems with other Kerberos systems. In this case, however, the Kerberos standard does provide for extensions by making a user-defined field (a placeholder) available in the ticket-granting ticket (TGT); Microsoft calls its use of this field the Privilege Attribute Certificate (PAC). Microsoft stores the security access token in the PAC field of the Kerberos ticket to handle authorization.
How is the access token created? When a user successfully authenticates to an AD domain, the Kerberos Key Distribution Center’s Authentication Service queries its local directory service and the closest Global Catalog to determine what groups the user is a member of. It then generates an access token that contains those groups and their SIDs, and the user’s name and SID, and adds it to the TGT.
The size of the PAC field and the access token it holds is finite; the field doesn’t stretch to fit a large access token pushing up against the PAC’s limits. Therefore, the limit to the number of groups a user can be a member of is about 1,015. This is because the PAC can hold only 1,024 SIDs, minus a varying number of well-known groups that the Local Security Authority (LSA) adds to the access token. (See the Microsoft article “Users who are members of more than 1,015 groups may fail logon authentication” at support.microsoft.com/kb/328889.) That might sound very large, but users can run up against this access token limit with as few as 270 groups, and begin to feel its effects long before they reach the limit. This situation is known as token bloat, and it won’t affect just one of your users when it hits: It will affect a lot of them.
Why? Because other mechanisms, such as RPC and HTTP, rely on the MaxTokenSize registry value (under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Lsa\Kerberos\Parameters) when they allocate buffers for authentication. By default, MaxTokenSize is 12,000 bytes; a user who is a member of more than 120 groups might begin to experience slow logons and other erratic behavior, and users with even more groups in their access tokens will encounter authentication errors and Access Denied authorization errors.
The MaxTokenSize value can be adjusted upward to accommodate more groups (see the Microsoft article “How to use Group Policy to add the MaxTokenSize registry entry to multiple computers” at support.microsoft.com/kb/938118), and OSs since Windows Vista and Windows Server 2008 will automatically adjust MaxTokenSize upward to compensate for greater group membership. Either way, this is just a Band-Aid on the problem; users will still experience the slowdown effects of a large access token, and the 1,015-group limit cannot be exceeded regardless of how high you manually set MaxTokenSize.
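Before changing anything, it helps to know what a given machine is actually configured with. The following is a minimal sketch, not an official tool: it reads MaxTokenSize from the registry path named above (the "CCS" shorthand stands for CurrentControlSet) using Python's standard winreg module. The function name is illustrative, and the sketch assumes that a missing value simply means the OS default applies.

```python
KEY_PATH = r"SYSTEM\CurrentControlSet\Control\Lsa\Kerberos\Parameters"
VALUE_NAME = "MaxTokenSize"


def read_max_token_size():
    """Return the configured MaxTokenSize, or None if it cannot be read."""
    try:
        import winreg  # standard library, available on Windows only
    except ImportError:
        return None  # not running on Windows
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH) as key:
            value, _value_type = winreg.QueryValueEx(key, VALUE_NAME)
            return int(value)
    except OSError:
        return None  # key or value missing, or not readable


configured = read_max_token_size()
if configured is None:
    print("MaxTokenSize is not set here (or this is not a Windows machine)")
else:
    print("MaxTokenSize:", configured)
```

The sketch only reads the value; setting it across many machines is better left to Group Policy, as the Microsoft article describes.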
It’s important to keep in mind that when a user’s group membership is enumerated to create the access token, it includes all transitive group memberships as well. This means that using a deeply nested group structure, though it might be convenient from an organizational viewpoint, will increase the average size of your users’ access tokens. For example, if you’re a member of the Muleshoe Users security group, which is a member of the Bailey County Users group, which is a member of the Texas Region group, which is a member of the US Region group, you already have four group SIDs in your access token, even though you were added directly to only one group.
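The effect of nesting can be illustrated with a small sketch. It is not how the KDC is implemented; it simply walks a hypothetical in-memory map of "group is a member of" relationships (MEMBER_OF and the function name are assumptions for illustration) and collects every group reached from a user's direct memberships.

```python
from collections import deque

# Hypothetical nesting map: group -> groups it is a direct member of.
MEMBER_OF = {
    "Muleshoe Users": ["Bailey County Users"],
    "Bailey County Users": ["Texas Region"],
    "Texas Region": ["US Region"],
    "US Region": [],
}


def transitive_groups(direct_groups, member_of):
    """Return every group reached from direct_groups, directly or via nesting."""
    seen = set()
    queue = deque(direct_groups)
    while queue:
        group = queue.popleft()
        if group in seen:
            continue  # already counted; this also guards against circular nesting
        seen.add(group)
        queue.extend(member_of.get(group, []))
    return seen


groups = transitive_groups(["Muleshoe Users"], MEMBER_OF)
print(len(groups), sorted(groups))
```

One direct membership expands to four groups here, which matches the Muleshoe example.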
There’s a further consideration in this debate about token size. Different group types take up varying amounts of space in the PAC. Domain local groups take 40 bytes to store in the PAC, but global groups and universal groups take only 8 bytes per group. So, if you’ve been following group-nesting guidelines to focus on domain local groups, you’ll see token-bloat problems sooner than in a domain that uses global and universal groups.
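These per-group costs can be turned into a rough estimate. The sketch below uses the 40-byte and 8-byte figures from this column and assumes a fixed overhead of 1,200 bytes, a number taken from Microsoft's published token-size estimate rather than from this column. The function and parameter names are illustrative, and real tokens also carry other data (such as SID history), so treat the result as an approximation.

```python
DEFAULT_MAX_TOKEN_SIZE = 12000  # default MaxTokenSize, in bytes


def estimate_token_size(domain_local, global_and_universal, overhead=1200):
    """Rough token size in bytes: overhead + 40 per domain local + 8 per other group."""
    return overhead + 40 * domain_local + 8 * global_and_universal


# (domain local groups, global + universal groups) for a few hypothetical users
for domain_local, other in [(270, 0), (100, 400), (0, 1015)]:
    size = estimate_token_size(domain_local, other)
    status = "at or over" if size >= DEFAULT_MAX_TOKEN_SIZE else "under"
    print(f"{domain_local} domain local + {other} global/universal: "
          f"{size} bytes ({status} the default)")
```

Under these assumptions, 270 domain local groups alone reach the 12,000-byte default, while more than a thousand global or universal groups stay below it.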
Another place token bloat will bite you is related to Microsoft SharePoint. Starting with SharePoint 2007, security groups are required to configure permissions to SharePoint resources; distribution lists (DLs), which don’t have a SID, can’t be used for this. The easiest solution, and one I’m sure many companies have implemented, is to simply turn all DLs into security groups. This is potentially a nightmare: first, because you probably haven’t managed or organized your DLs in the same way you’ve organized your security groups for access control, and second, because it will dramatically increase the size of your users’ access tokens when all these DLs show up in them. How many mail distribution lists are you a member of? Do you even know?
A Crash Token Diet
You can make some temporary fixes to hold things together, but to really fix token bloat you need to adopt a number of best practices in group management and object lifecycle management. Let’s look at the quick fixes first.
First, consider bumping MaxTokenSize up to its maximum setting of 64K (i.e., 65535). This doesn’t solve your token-bloat problem, but it will stave off authentication failures in RPC and HTTP due to MaxTokenSize not being large enough. The Microsoft article mentioned earlier will show you how to use Group Policy to set this value on multiple computers.
Second, the easiest way to quickly look at your own group membership is to use the Whoami command-line utility, which is included in Windows. (Whoami will also show information about the current user on the local system, so it’s a very handy utility if you need to confirm who you’re logged on as.) For group membership, you’ll want to use the /groups parameter, which will enumerate all the direct and transitive groups—and their SIDs—that the user has in their access token. You can run this command only locally, however.
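If you want to count or sort the groups rather than read them off the screen, Whoami can emit CSV with the /fo csv and /nh options. The following sketch parses that output and tallies groups by type. The function names are illustrative, the SAMPLE rows are made-up data in the same column layout (name, type, SID, attributes), and current_user_groups() will only work on a Windows machine.

```python
import csv
import io
import subprocess
from collections import Counter


def parse_whoami_groups(csv_text):
    """Parse `whoami /groups /fo csv /nh` output into (name, type, sid) tuples."""
    rows = csv.reader(io.StringIO(csv_text))
    return [(row[0], row[1], row[2]) for row in rows if len(row) >= 3]


def current_user_groups():
    """Run whoami locally (Windows only) and return the parsed group list."""
    result = subprocess.run(
        ["whoami", "/groups", "/fo", "csv", "/nh"],
        capture_output=True, text=True, check=True,
    )
    return parse_whoami_groups(result.stdout)


# Made-up sample output, used so the parser can be tried on any platform.
SAMPLE = '''"Everyone","Well-known group","S-1-1-0","Mandatory group, Enabled by default, Enabled group"
"BUILTIN\\Users","Alias","S-1-5-32-545","Mandatory group, Enabled by default, Enabled group"
"EXAMPLE\\Muleshoe Users","Group","S-1-5-21-1111111111-2222222222-3333333333-1105","Mandatory group, Enabled by default, Enabled group"
'''

groups = parse_whoami_groups(SAMPLE)
counts = Counter(kind for _name, kind, _sid in groups)
print(len(groups), "groups:", dict(counts))
```

On a domain-joined workstation, replacing parse_whoami_groups(SAMPLE) with current_user_groups() gives the same tally for the logged-on user.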
Third, Tokensz (in the Microsoft Download Center at www.microsoft.com/download/en/details.aspx?id=1448) will check a user’s MaxTokenSize setting, and the /calc_groups option will list a user’s groups. With some creative scripting, you could run Tokensz against all (or a sampling of) your users to discover potential trouble spots. Ntdsutil has a function, Group Membership Evaluation, that you can run against an individual user to get detailed information about his or her access token. In addition to listing the user’s groups, it will also show SID history (which increases access token size) and group type, so it’s a good way to dive a little deeper into the most affected users.
Fourth, another quick fix is to look at your most severely affected users and see if you can simply remove them from groups they no longer need to be a part of. Accumulated memberships like these are a classic symptom of poor account and object lifecycle management, because it’s much easier to add someone to a group when access is needed than to remove that person when it’s not. With information from Group Membership Evaluation, you could also minimize a user’s domain local groups to free up access token space.
Microsoft has a detailed document about the token-bloat problem—“Addressing Problems Due To Access Token Limitation” (www.microsoft.com/download/en/details.aspx?displaylang=en&id=13749)—to step you through remediation. To really solve the problem, however, you have to take a more strategic view of how you manage your security groups. These are the areas you should focus on:
- Minimize nested groups—There’s nothing wrong with nested groups; just don’t get carried away with them. It’s not uncommon to find circular nesting several layers deep, and that will drive you crazy in a hurry.
- Use domain local groups as the final group into the resource, and nowhere else—Promote domain local groups to global groups and universal groups where appropriate. You’ll need to study the pros and cons of the different group types, especially if you have a multi-domain forest, because each has advantages and disadvantages.
- Get on top of your group lifecycle management—This is a widespread problem in AD installations. There’s usually an urgent need to create groups, add users, and populate resources for new projects. There’s rarely that same urgency to remove users from groups, groups from server ACLs, and actually delete groups unless it’s driven by information security and a clear lifecycle plan. Products such as Imanami’s GroupID specifically focus on group lifecycle management; the challenge in this approach is getting IT management to pay for a need that’s important but not urgent.
- Limit your account administrators—The fewer individuals that can create groups, the less chance you’ll have too much group creation.
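As a starting point for the nesting cleanup in the first item above, the following sketch looks for circular nesting in an exported map of group memberships. The NESTING data and the function name are hypothetical, and how you export the map from your directory is left open; the search itself is an ordinary depth-first walk that reports the first loop it finds.

```python
def find_nesting_cycle(member_of):
    """Return one circular nesting chain as a list of group names, or None."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    state = {}

    def visit(group, path):
        state[group] = GRAY
        path.append(group)
        for parent in member_of.get(group, []):
            parent_state = state.get(parent, WHITE)
            if parent_state == GRAY:
                # parent is already on the current path: we have looped back
                return path[path.index(parent):] + [parent]
            if parent_state == WHITE:
                cycle = visit(parent, path)
                if cycle:
                    return cycle
        path.pop()
        state[group] = BLACK
        return None

    for group in list(member_of):
        if state.get(group, WHITE) == WHITE:
            cycle = visit(group, [])
            if cycle:
                return cycle
    return None


# Hypothetical export: group -> groups it is a direct member of.
NESTING = {
    "Project A": ["Engineering"],
    "Engineering": ["All Staff"],
    "All Staff": ["Project A"],  # closes the loop
    "Finance": ["All Staff"],
}

cycle = find_nesting_cycle(NESTING)
print(" -> ".join(cycle) if cycle else "no circular nesting found")
```

The sketch uses recursion for brevity; for very deep nesting, an iterative walk would avoid Python's recursion limit.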
Token bloat is a common problem in larger AD installations, especially ones that have been around for a while and that don't have good group lifecycle management. If you fit into this category, use this article’s tools to look at your user community and head off token bloat before it begins.