Exchange's interesting document fingerprinting feature

I never much liked the Data Loss Prevention (DLP) feature when Microsoft shipped it in Exchange 2013. But it's amazing how technology grows on you over time. The changes made in Exchange 2013 SP1 have helped a lot. Expanding the set of clients that can display the DLP policy prompts is just good sense, while the addition of document fingerprinting is a nice way to allow companies to include their own sensitive data types in DLP policies.

After its introduction in Exchange 2013, Data Loss Prevention (DLP) never really impressed. At least, it didn't impress me. Although the technology is an interesting extension of transport rules and Microsoft’s initial foray into the world of DLP (see this link for the EHLO blog post on the topic), it was limited by the need to deploy Outlook 2013. Without this client, users couldn’t see the DLP policy prompts invoked when policy violations are detected. Introducing a feature that can stop user email from reaching its intended destination because a DLP rule blocks it, without the user ever seeing a prompt, seems like an opportunity for user uproar.

Thankfully, the release of Exchange 2013 SP1 tipped the balance to the right side by expanding the set of supported clients to include Outlook Web App (OWA) and OWA for Devices. It’s true that DLP is still limited because older Outlook clients cannot display the policy prompts, but OWA support makes deployment easier. I guess Microsoft’s hope is that customers who deploy Exchange 2013 will include a client-side upgrade in the bargain and so provide the necessary critical mass for DLP to gain traction. We’ll see.

Exchange 2013 SP1 also includes an update for the set of sensitive data types and templates provided to help customers start their DLP projects. The initial set in Exchange 2013 RTM was quite extensive, as it contained sensitive data types such as credit card information, U.S. social security numbers, and so on. Templates such as “U.S. Federal Trade Commission Consumer Rules” provide prototype DLP rules to protect against the inclusion of relevant data types in email. After creating a new DLP policy from a template, you can edit it to shape the rules to provide the protection required by the organization. Indeed, if you are so inclined, you can create a DLP policy from scratch and ignore the set of Microsoft templates, or create your very own DLP template, something that Microsoft anticipates will be done by third parties who specialize in this area.

Aside from the appearance of some templates for Saudi Arabia (how many people will want to protect themselves against the strictures of the Saudi Anti-Cyber Crime Law?), the templates installed on my server still seem to focus on the major markets (U.S., Europe, France, U.K., Germany, and Australia). No doubt we can expect further expansion in the future.

However, the most interesting DLP update included in Exchange 2013 SP1 is the ability for companies to easily create their own sensitive data types through “document fingerprints.” Essentially, you provide EAC with a copy of a document that you deem likely to contain sensitive information. Apart from the Word template format (.dotx), all of the file formats supported by the Search Foundation can be used, including Word and Adobe PDF.

When EAC processes a document to create a digital fingerprint, it produces a small Unicode XML file containing a hash value derived by processing the contents of the file and detecting word patterns such as fields and text blocks. Exchange stores it as part of a "fingerprint classification collection" in its configuration data in Active Directory (look under Transport Settings\Rules - see below for a screen shot). The file that you provide is not uploaded or stored by Exchange - only the hash value is kept. Once the document is processed, you can use the new sensitive data type that it represents in DLP rules in exactly the same way as any of the out-of-the-box sensitive data types (like credit card or social security numbers) defined by Microsoft. The hash value that is generated can then be used to compare against documents that flow through the transport system and any rule that uses the sensitive data type associated with the hash value will "fire." 

To test the theory, I downloaded a PDF version of the Irish Personal Tax Return (Form 12S) from the Revenue Commissioners web site. I then went to the Compliance Management section of the Exchange Administration Center (EAC), then Data Loss Prevention, and then clicked “Manage document fingerprints.” I completed some information to identify the new fingerprint and then clicked the plus sign to add the files that I wanted to include in the fingerprint. I only uploaded a single PDF, but you can add more files to represent multi-page forms. Once the file was uploaded, I could then add the “Ireland Form 12S” sensitive data type to a DLP rule.

And better again, once the rule was enforced, any attempt to include a scanned copy of a Form 12S as a message attachment fell foul of the rule and the user was politely informed that they shouldn’t be sending personal information of that nature through email.

If you don't want to use EAC, you can use the splendidly-named New-Fingerprint cmdlet to process an input file. For example, here's how I processed a tax form (always popular) from the U.S.:

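# Read the PDF as a byte array (Windows PowerShell syntax, as used by the Exchange Management Shell)
$InputFile = Get-Content "C:\Temp\US Form W8BEN.pdf" -Encoding byte
# Generate the fingerprint (hash) for the document
$W8BEN = New-Fingerprint -FileData $InputFile -Description "US Form W-8BEN"
# Create a new sensitive data type based on the fingerprint
New-DataClassification -Name "U.S. IRS Form W-8BEN" -Fingerprints $W8BEN -Description "Message contains a U.S. IRS Form W-8BEN"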

This code takes a PDF for the IRS Form W-8BEN and creates a new fingerprint (hash), which is then fed into the New-DataClassification cmdlet to create a new sensitive data type for the W-8BEN form. Once the data classification is created, the sensitive data type is available for use in DLP rules. You can see the hash in the screen shot below.
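To round out the example, here is a minimal sketch of how the new sensitive data type might be put to work from the Exchange Management Shell rather than EAC. It is an illustration under assumptions, not a recommended policy: the rule name, the choice to scope the rule to external recipients, and the notify-only action are all hypothetical, and the commands need an account with the appropriate transport rule permissions.

```powershell
# Check that the sensitive data type created above exists
Get-DataClassification -Identity "U.S. IRS Form W-8BEN" | Format-List

# Hypothetical rule: show a policy tip to senders when a message going outside
# the organization contains content that matches the W-8BEN fingerprint.
# The rule name, scope, and NotifySender value are assumptions for illustration.
New-TransportRule -Name "Notify: W-8BEN sent outside the organization" `
    -SentToScope NotInOrganization `
    -MessageContainsDataClassifications @{Name = "U.S. IRS Form W-8BEN"} `
    -NotifySender NotifyOnly `
    -Mode Enforce
```

NotifyOnly is the gentlest option because it informs the sender without blocking the message, which makes it a reasonable starting point while you find out how often the fingerprint matches real traffic. The stricter NotifySender values (for example, RejectMessage) can follow once users understand the policy.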

As explained in this EHLO blog post, these new features are just some of the ways now available in Exchange 2013 (and Exchange Online in Office 365) to protect against the inadvertent disclosure of sensitive data by users, all of which need some planning before introduction to ensure that everyone understands the need for the kind of policy-driven oversight enabled by DLP.

Service packs often add features to a technology that is first introduced in the RTM release. Adding the ability for companies to create their own sensitive data types through document fingerprints in Exchange 2013 SP1 is yet another example of how to make a newly added feature (DLP) far more useful than before. Is it any wonder that so many people are so firmly attached to the notion that only fools deploy RTM code and the wise wait for a service pack?

Follow Tony @12Knocksinna


Tony Redmond

Tony Redmond is a senior contributing editor for Windows IT Pro and the author of Microsoft Exchange Server 2010 Inside Out (Microsoft Press) and Microsoft Exchange Server 2013 Inside Out: Mailbox...