Organizations store a huge amount of unstructured data: Microsoft Word documents, spreadsheets, images, data files from applications—the list goes on. Although applications like SharePoint help organize such data, the reality is that only about 10 percent of data resides in apps such as SharePoint. The rest sits on file servers with little control or management. Most organizations have no idea what data is on their file servers; it's really just a mess.

Add to this the different requirements organizations have for data handling: restricting who can access data, ensuring it's encrypted, ensuring it can't be printed, copied, or forwarded as part of Data Loss Prevention (DLP), and ensuring it's backed up and kept for a certain duration or, conversely, deleted after a certain duration.

The need for data classification and controls has never been greater. Regulatory compliance is becoming more stringent: organizations that ignore compliance requirements face massive fines, and their senior leaders face possible jail time. And then there's the huge loss of confidence in organizations that "lose" customer data.

Many companies try to classify data in a number of ways:

  • Place documents in different locations
  • Create backup rules for, or custom scripts that back up or delete, data of certain types or ages
  • Apply Active Directory Rights Management Services (AD RMS) policies or encryption to sensitive data
  • Use Windows BitLocker Drive Encryption to protect specific volumes containing important data

The problem is that many of these approaches rely on the information worker to make the correct decision in placing or classifying the data—a risk that organizations shouldn't take. A better option is to have the file server that houses the data scan it for social security numbers, credit card numbers, special project names, and the like and then automatically classify it. Once the data is classified, you can schedule tasks (e.g., back up, encrypt, rights protect, move, delete) based on the data's classification.

Meet File Classification Infrastructure

I remember creating a session for TechEd 2006 on a new Windows Server 2003 R2 feature called File Server Resource Manager (FSRM). The FSRM component brought capabilities to Windows file servers beyond the basic volume quota capability that was part of the base operating system.

FSRM lets you assign quotas to groups of users at the folder level. You also control not just how much space is used, but how the space is used, by enabling real-time file screens that block specific file types. For example, you could create a file screen that blocks MP3 files. If a user tries to write an MP3 file, he or she gets an access denied message. The actions taken when a screen is triggered are configurable (e.g., an action could be an email to the user that explains why the write was blocked and includes a link to corporate policies). FSRM also has great reporting capabilities that identify how file server space is being used and by whom.

In Windows Server 2008 R2, FSRM got a new capability: File Classification Infrastructure (FCI). This feature uses rules to automatically assign specific properties to files and then performs tasks on those files based on the classification. For example, a classification rule might search for strings in the format of a Social Security Number—nnn-nn-nnnn ([0-9]{3}-[0-9]{2}-[0-9]{4})—and if one is found, assign the data a Personally Identifiable Information (PII) property of Moderate, as shown in Figure 1.

Figure 1: Using File Classification Infrastructure to Classify Files

Once the data is classified, a file management task searches for data whose PII classification is set to Moderate or High and then applies an AD RMS policy that restricts how the data is used. Other actions, such as encryption or moving the data, also could be taken. Essentially, the FCI feature involves a two-step process:

  1. Classify data using automated rules.
  2. Perform tasks on data based on its classification.
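
The two steps can be sketched in plain PowerShell. This is only an illustration of the idea, not the FSRM engine: the Get-PiiLevel function, the sample file objects, and the "protection" message are all made up for the example, and the only thing taken from the article is the Social Security Number pattern.

```powershell
# Illustration only: plain PowerShell string matching, not FCI itself.
# The pattern is the SSN format quoted in the article.
$ssnPattern = '[0-9]{3}-[0-9]{2}-[0-9]{4}'

# Hypothetical helper that stands in for a classification rule.
function Get-PiiLevel {
    param([string]$Content)
    # Content containing an SSN-shaped string is rated Moderate.
    if ($Content -match $ssnPattern) { 'Moderate' } else { 'Not PII' }
}

# Made-up, in-memory stand-ins for files on a share.
$files = @(
    [pscustomobject]@{ Name = 'payroll.txt'; Content = 'Employee SSN: 123-45-6789' }
    [pscustomobject]@{ Name = 'menu.txt';    Content = 'Lunch special: 12-3 daily' }
)

# Step 1: classify every file and record the result as a property.
$classified = foreach ($f in $files) {
    [pscustomobject]@{ Name = $f.Name; PII = Get-PiiLevel -Content $f.Content }
}

# Step 2: at any later time, act only on files whose classification calls for it.
$toProtect = $classified | Where-Object { $_.PII -in 'Moderate', 'High' }
$toProtect | ForEach-Object { "Would apply a protection policy to $($_.Name)" }
```

Because the pattern isn't anchored, it also matches SSN-shaped digits embedded in longer strings, so a real rule would likely need tuning to limit false positives.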

A huge benefit of classification over the normal processes of searching data and then performing some immediate action is that actions don't have to be immediate. Data is classified periodically or as it's created, and then many sets of actions can be executed later based on the data's classification. This is a very powerful capability for organizations.

Although this technology was great, it wasn't widely adopted, even though organizations cried out for this type of feature. The reason for the lack of adoption was fairly simple: companies didn't know how to get started. Out of the box, FCI included no standard classification properties, no standard classification rules, and no standard tasks to perform based on the non-existent, out-of-the-box classifications. This meant organizations first had to work out what classifications they needed—halting nearly every company in its tracks. Organizations spent months working out classifications, ended up with hundreds of possibilities, and then the project fizzled out and never happened.

To combat the lack of adoption, Microsoft released a free Solution Accelerator for Windows Server 2008 R2 called the Data Classification Toolkit. The toolkit includes a set of classification properties, classification rules related to common compliance requirements, and tasks that act on the classifications and focus on AD RMS policies, giving customers a project base on which they can build. The toolkit has only 14 classification properties, but these facilitate the handling of nearly all classification and compliance requirements. Table 1 outlines the toolkit's base properties.

Table 1: Classification Properties Included in the Data Classification Toolkit

Classification Area    Classification Property                         Possible Values
-------------------    --------------------------------------------    -------------------------------------------
Information Privacy    1. Personally Identifiable Information (PII)    High; Moderate; Low; Public; Not PII
                       2. Protected Health Information                 High; Moderate; Low
Information Security   3. Confidentiality                              High; Moderate; Low
                       4. Required Clearance                           Restricted; Internal Use; Public
Legal                  5. Compliancy                                   SOX; PCI; HIPAA and many more
                       6. Discoverability                              Privileged; Hold
                       7. Immutable                                    Yes/No
                       8. Intellectual Property                        Copyright; Trade Secret and more
Records Management     9. Retention                                    Long-term; Mid-term; Short-term; Indefinite
                       10. Retention Start Date
Organizational         11. Impact                                      High; Moderate; Low
                       12. Department
                       13. Project
                       14. Personal Use                                Yes/No

Note that you can use the File Server Resource Manager UI or Windows PowerShell to look at possible values. For example, I can use the Server 2012 PowerShell commands in Listing 1 to look at the Data Classification Toolkit's values for Compliancy.

Listing 1: PowerShell Commands to View the Data Classification Toolkit's Values for Compliancy
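# Retrieve the definition of the Compliancy property (internal name Compliancy_MS)
$propertyDefinition = Get-FsrmClassificationPropertyDefinition -Name Compliancy_MS
# Output each value the property can take
foreach ($possibleValue in $propertyDefinition.PossibleValue)
{
    $possibleValue
}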

Description    :
DisplayName    : PCI DSS
Id             : 2DD2F3EE-3BAB-45fc-B33F-65119B3B3C66
Name           : PCI DSS
PSComputerName :

Description    :
DisplayName    : HIPAA/HITECH
Id             : A9E2C599-7DC4-4bf1-90DA-E949EF25D045
Name           : HIPAA/HITECH
PSComputerName :

Description    :
DisplayName    : SOX
Id             : 0424473A-B85A-4071-8A8F-AB3F230864A0
Name           : SOX
PSComputerName :
....

Fast Forward to Windows Server 2012 FCI

So what changed in Windows Server 2012 FCI? A lot—and not just the FCI feature, but also how classification is used.

FCI is still part of the FSRM role, which itself is part of the File and Storage Services role. That means to enable FCI, you must first install the FSRM role through Server Manager (\File and Storage Services\File and iSCSI Services\File Server Resource Manager) or PowerShell:

Install-WindowsFeature FS-Resource-Manager

A major change in Server 2012 FCI is how you manage classification properties. In Server 2008 R2, the classification properties are local to each file server, which means you must be careful to ensure the same classification properties are available on all file servers, or classifications could get lost when files are moved between file servers. This is typically achieved by maintaining a master/staging file server where all classifications, rules, and tasks are defined and then exporting the configuration to other file servers (in fact, a master/staging file server is still good practice in Server 2012).

In Server 2012, the classification properties have moved into a new container in an Active Directory (AD) forest's Configuration partition (\Services\Claims Configuration\Resource Properties). You can still add local classification properties to a server, but to use AD classification, you must run the Server 2012 forest preparation step, which updates the forest schema and creates the new containers. The plus side is that classification properties are centralized and standard across all file servers.

Manage classification properties. You manage classification properties in the Active Directory Administrative Center (ADAC). All classification properties are disabled by default, so you must enable those that you want to use. Additionally, some classification properties (such as Company, Department, and Project) require values before they can be used. To manage resource properties, launch ADAC and navigate to Dynamic Access Control, Resource Properties, where you can modify and enable properties and create additional classifications. Figure 2 shows the built-in Resource Properties in Server 2012. Note that in addition to the 14 classification properties from the Data Classification Toolkit for Server 2008 R2, Server 2012 adds Company, Country, and Folder Usage.

Figure 2: Managing Classification (Resource) Properties with Active Directory Administrative Center
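
If you prefer a script to the ADAC console, the ActiveDirectory PowerShell module in Server 2012 has resource property cmdlets that cover the same ground. The following is a minimal sketch, assuming the module is installed, the forest has been prepared for Server 2012, and the built-in Impact property is stored under the name Impact_MS (an assumption based on the Compliancy_MS naming seen in Listing 1; check the list output for the real name in your forest).

```powershell
# Assumes the ActiveDirectory module (RSAT) and sufficient rights in the forest.
Import-Module ActiveDirectory

# List the resource properties and show which ones are enabled.
Get-ADResourceProperty -Filter * |
    Sort-Object DisplayName |
    Format-Table DisplayName, Name, Enabled -AutoSize

# Enable a single property. 'Impact_MS' is an assumed name; replace it with
# a Name value taken from the list above.
Set-ADResourceProperty -Identity 'Impact_MS' -Enabled $true
```

Properties such as Department still need their values defined before they are useful, which is often easier to do interactively in ADAC.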

Management of classification properties is accomplished within ADAC's Dynamic Access Control, a major new feature in Server 2012 that lets you use data classification to control resource access. Dynamic Access Control is beyond the scope of this article, but at a high level it lets you control access to resources based on classification data and attributes of the user and machine trying to access the data. For example, you could use Dynamic Access Control to grant a level of access if the department of the user matches the department classification of the data, avoiding the need to maintain hundreds—if not thousands—of groups just for access control. This means data classification isn't intended just to help secure and organize data for compliance, but also to manage resource access in a far more auditable and logical way than using ACLs on every file.

Get classification rules and management tasks. Shifting the focus back to FCI, the centralization and inclusion of default classification properties will certainly help organizations get started, but what about classification rules and file management tasks—are they standard in the box? The answer is no; however, Microsoft has updated the Data Classification Toolkit to work with Server 2012, which means you should download and run it. The toolkit will create a number of classification rules and file management tasks on your server.

Before you run the Data Classification Toolkit import wizard (which populates the targeted FSRM server), it's important that you enable most of the classification properties in AD and then refresh the FSRM Classification Properties. If you don't take these steps first, the import process will fail because the classification properties won't be available to the templates. Additionally, you should have AD RMS deployed in your organization so that you can apply a default AD RMS policy (that you can change later) and specify XML configuration files when you import the toolkit.
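
The refresh step can also be run from PowerShell on the file server. This is a small sketch, assuming the FSRM role and its PowerShell module are installed on a Server 2012 machine; it pulls the enabled properties from AD and then lists what the server now knows about.

```powershell
# Assumes the File Server Resource Manager role is installed on this server.
# Pull the current global (AD) classification property definitions.
Update-FsrmClassificationPropertyDefinition

# Confirm which properties are now available before running the import wizard.
Get-FsrmClassificationPropertyDefinition |
    Sort-Object Name |
    Format-Table Name, DisplayName, Type -AutoSize
```

If a property you enabled in AD is missing from the list, it might simply not have replicated yet to the domain controller the file server is using.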

Configure classifications. The Data Classification Toolkit wizard (Figure 3) steps you through the process of configuring classifications. First, import the baseline, in-box classifications to a staging server. The toolkit encourages you to manage FCI on a single server, configure rules and tasks, and then export that configuration to all file servers in your environment. Once you've tweaked the configuration, export it to an XML file and deploy it to your production file servers.

Figure 3: Primary Actions of the Data Classification Toolkit

The Data Classification Toolkit contains three baseline classification templates to help you get started. These are just XML files that you can apply to your file server; however, some customization will likely be required. The classification templates are:

  • Data Classification Toolkit Package.xml—A standard set of rules and tasks primarily focused on finding SSN and credit card information in data
  • NIST SP 800-53 Classification Package Example.xml—NIST SP 800-53
  • PCI-DSS Classification Package Example.xml—PCI-DSS

Note that you don't have to use the Data Classification Toolkit to import and export a classification configuration; you can use PowerShell instead. Even so, the toolkit's templates provide a great starting point for your own rules and tasks.

To manage classification beyond the global classification properties stored in AD, use the FSRM tool via the Classification Management and File Management Tasks navigation nodes. Figure 4 shows the classification properties available. Note that the scope is visible for each classification property. The properties with a global scope were retrieved from AD. Note also each property has a usage type that shows whether it can be used as part of Dynamic Access Control authorization, FCI, and folder management.

Figure 4: File Server Resource Manager Displays Classification Properties Available

The Classification Rules navigation node enables you to manage the rules that populate the classification properties. If you leveraged the Data Classification Toolkit and imported the standard Data Classification Toolkit Package.xml, you have a number of classification rules available; however, they are disabled by default (imported file management tasks also are disabled by default).

Take some time to look at the classification rules and how they work, and create your own if necessary. Make sure you enable the rules you want to use in your environment, as shown in Figure 5. Look at the detailed properties of a classification rule. Note that you can also use PowerShell to ascertain classification properties, which gives you a great deal of flexibility. For each rule, you might want to customize the folders to which the rule applies (on the Scope tab of the rule's properties) and how the values are set for classification properties.

Figure 5: Enable a Classification Rule with the Enable Rules Action
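
The Enable Rules action has a PowerShell counterpart in the FSRM module. Here is a hedged sketch, assuming the FSRM role is installed and rules have already been imported; the rule name below is a placeholder, not a name from the toolkit.

```powershell
# Assumes the File Server Resource Manager role is installed on this server.
# Review the imported rules and see which ones are still disabled.
Get-FsrmClassificationRule |
    Format-Table Name, Property, PropertyValue, Disabled -AutoSize

# Enable one rule by name. The value is a placeholder; substitute a Name
# taken from the list above.
$ruleName = 'REPLACE-WITH-RULE-NAME'
Set-FsrmClassificationRule -Name $ruleName -Disabled:$false
```

Enabling rules one at a time, rather than all at once, makes it easier to review each rule's scope before it starts classifying data.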

Schedule classification activities. Your configuration now has classification properties and rules to set them. Next you need to tell FCI when it should perform the classification. Click the Configure Classification Schedule action, which opens the FSRM properties on the Automatic Classification tab. This lets you schedule scans to classify unclassified data. Another option, Allow continuous classification for new files, is a great new feature in Server 2012 that classifies data as soon as it's created.
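
The same schedule settings can be scripted. The sketch below assumes the FSRM role is installed; the day and time are arbitrary example values, so pick a window that suits your servers.

```powershell
# Assumes the File Server Resource Manager role is installed on this server.
# Build a weekly schedule (example values: Sundays at 2:00 AM).
$schedule = New-FsrmScheduledTask -Time (Get-Date '02:00') -Weekly Sunday

# Apply the schedule and turn on continuous classification for new files.
Set-FsrmClassification -Schedule $schedule -Continuous

# Optionally start a classification run now instead of waiting for the schedule.
Start-FsrmClassification
```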

Store classification data. At this point your data is classified, and a common question is: Where is the classification data stored? Classification data is stored in an NTFS alternate data stream in the file or folder that has the classification:

PS E:\unsc> Get-Item .\master_chief_eyes.jpg -Stream *
   FileName: E:\unsc\master_chief_eyes.jpg
Stream                   Length
------                   ------
:$DATA                    39060
FSRM{ef88c031-595...        144

This means the classification is kept, even when you move the data between NTFS volumes. If you use Server 2012's new Resilient File System (ReFS), classification won't work because ReFS doesn't support alternate data streams in Server 2012. Additionally, if the data type supports it, the classification is stored within the document via the classification storage module. Microsoft Office is the primary application suite that allows classification to be stored in the application data, which also means the classification travels with the documents if they are stored in SharePoint. Other vendors could add support for classification storage in document data if they were so inclined. In Server 2012, the classification also is stored in the security descriptor to enable Dynamic Access Control authorization based on classification.

Use Windows Explorer. In Windows Server 2012 and Windows 8, Windows Explorer exposes classification through a new Classification tab in the properties of a file or folder, which allows direct manipulation of the classification. Manually setting classification in Windows Explorer isn't practical for all of an organization's data, but one nice capability is that you can set classification at the folder level, and all folders and files in that folder then inherit the setting.

Perform management tasks. Although classifying the data is a huge step, you also want FCI to perform tasks based on those classifications. Use FSRM's File Management Tasks to create tasks that perform actions based on classification. If you used the Data Classification Toolkit to import a baseline configuration, some tasks are already configured, but they are also disabled. Look at the tasks and customize and enable the ones you want based on the classifications for which you have rules. For example, if you enabled rules to set a value for the Personally Identifiable Information (PII) classification property, then you should enable tasks to perform actions based on that classification property. The included RMSProtect_ModerateAndHigh_PII (Figure 6), for example, performs a task if PII is greater than Public and then sets AD RMS policy on the data. Note that a custom type of action is available, which lets you perform almost any action provided there's a command-line method of doing so. Each task has its own schedule, and (like classification rules) the option to run continuously on new files can trigger tasks as classifications are applied.

Figure 6: A File Management Task Protects Data Classified as PII Moderate or High
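
File management tasks can be reviewed and enabled from PowerShell as well. This sketch assumes the FSRM role is installed and that the toolkit's baseline import created a task with exactly the name mentioned above; confirm the name in the list output first.

```powershell
# Assumes the File Server Resource Manager role is installed on this server.
# List the file management tasks and their current state.
Get-FsrmFileManagementJob | Format-Table Name, Disabled -AutoSize

# Enable the RMS protection task named in the article (assumes the import
# created it under this exact name).
Set-FsrmFileManagementJob -Name 'RMSProtect_ModerateAndHigh_PII' -Disabled:$false
```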

All that's left now is to create some data files that contain the elements your rules search for, and your data is automatically classified. In the accompanying video, I walk you through the major steps of using FCI.

Save with File Classification Infrastructure

FCI is a great technology, but before it can become useful to your organization—even before you implement it—you should understand what compliance levels your organization requires. Is your company required to adhere to certain regulatory standards? Does your company have its own standards for data retention, protection, and organization? Take time to understand these requirements, and then start implementing FCI, which comprises the following high-level steps based on a Server 2012 implementation:

  1. Enable the AD classification properties you need to use and define values if necessary.
  2. Install the FSRM role on file servers and the Data Classification Toolkit on one staging/master file server.
  3. Import a template as a starting point. Customize the classification rules and tasks to your organization's exact needs.
  4. Export the classification configuration from the staging/master file server to all other file servers.
  5. Look at using existing classifications for other purposes such as Dynamic Access Control.

The in-box FCI implementation could save your organization a lot of money over third-party solutions. Being in-box also has a limitation, however: data needs to be "local" to Windows file servers (i.e., on direct-attached storage or volumes mounted from a SAN). FCI doesn't work for remote data such as that accessed over the Server Message Block (SMB) protocol.

I really only touched on the high-level capabilities of FCI. It's one of those technologies that could literally change the way an organization works, but it's also not widely understood or used. I strongly encourage every organization to look at FCI and discover how it might help organize and control unstructured data.