To some people it's an indication that SkyNet is well and truly on its way to reality. To others it will be a great step forward in the effective and efficient processing of new email through machine learning. Either way, Microsoft is going to unleash the "Clutter" feature on Office 365 tenants soon. I'm a fan. Maybe you will be too!
I’m a big fan of machine learning. That is, if it helps users work smarter. A long time ago (1988), I worked on an artificial intelligence project to develop the first set of rules to automatically process messages as they arrived into user inboxes. That work was on VAX/VMS systems running ALL-IN-1 with a LISP program running in the background to interpret new messages as they arrived. The project eventually resulted in U.S. and European patents. However, Digital never took the work anywhere after we proved the concept and so it all petered out.
Roll forward 26 years and we are in a different world. Where the average ALL-IN-1 user might have received ten messages daily, the average traffic into the inboxes of today’s email warriors is much higher. Some people are better than others at coping. For instance, some select the messages that they want to read and ignore the rest on the basis that if the content is really important, the sender will ping them again. My variation is to read the first few lines and either delete the message or read on.
The “Clutter” feature discussed at the Microsoft Exchange Conference (MEC) in April is an attempt to help users focus on the most important messages by removing clutter from the inbound email stream. In this respect, clutter can probably be defined as informational messages that you don’t really need to see but might want to if you have the time. Like a message announcing that free beer is being served on the lawn after work.
The system uses a Bayesian Probit Regression model based on the “Infer.NET” work done by Microsoft Research to learn about the messages that are important to each user on an Exchange server. It’s important to say that any machine learning model can only be effective if it has a reasonable amount of data to process; if you receive two or three messages a day, you probably don’t need machine learning to help figure out what to do with your email and anyway, the total number of messages delivered to you over a month might not result in good rules. On the other hand, if your mailbox receives hundreds of messages daily, then a good pool of information will accumulate quickly and the resultant rules are far more likely to accurately represent your view of how important an individual message might be.
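To show why the volume of mail matters, here is a minimal sketch of an online Bayesian probit model in Python. It is a textbook-style approximation (Gaussian beliefs per feature, updated by moment matching), not Microsoft's implementation and not the Infer.NET API. The class name, the feature strings, and the prior and noise variances are all assumptions made for illustration.

```python
import math


def normal_pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def normal_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


class ProbitSketch:
    """Hypothetical per-user model: one Gaussian belief (mean, variance) per feature."""

    def __init__(self, prior_variance=1.0, noise_variance=1.0):
        self.prior_variance = prior_variance  # assumed starting uncertainty
        self.noise_variance = noise_variance  # assumed noise in the probit link
        self.beliefs = {}                     # feature -> (mean, variance)

    def _belief(self, feature):
        return self.beliefs.get(feature, (0.0, self.prior_variance))

    def _totals(self, features):
        # Features are assumed to be unique binary indicators, e.g. "sender:boss".
        mean = sum(self._belief(f)[0] for f in features)
        variance = self.noise_variance + sum(self._belief(f)[1] for f in features)
        return mean, variance

    def probability_important(self, features):
        mean, variance = self._totals(features)
        return normal_cdf(mean / math.sqrt(variance))

    def observe(self, features, important):
        """Update the beliefs after seeing how the user treated one message."""
        y = 1.0 if important else -1.0
        mean, variance = self._totals(features)
        std = math.sqrt(variance)
        t = y * mean / std
        v = normal_pdf(t) / max(normal_cdf(t), 1e-12)  # guard against underflow
        w = v * (v + t)
        for f in features:
            m, s2 = self._belief(f)
            self.beliefs[f] = (m + y * (s2 / std) * v,
                               s2 * (1.0 - (s2 / variance) * w))


model = ProbitSketch()
boss = ["sender:boss"]
untrained = model.probability_important(boss)   # no evidence yet
model.observe(boss, important=True)
after_one = model.probability_important(boss)   # one message seen
for _ in range(19):
    model.observe(boss, important=True)
after_twenty = model.probability_important(boss)
```

With no evidence the model sits on the fence, one message nudges it only slightly, and it takes a run of consistent behavior before the prediction becomes confident. That is the same reason a mailbox that receives two or three messages a day gives the model little to work with.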
Processing is done by a time-based assistant running on Exchange mailbox servers and the results are stored in the root of user mailboxes. This implementation ensures that learning is done on a per-user basis and the generated rules are available to all clients, even if a client doesn’t have the necessary user interface to help refine the model by marking specific senders as important or specific messages as “clutter.” These explicit indications of how you want to process mail are added to the implicit information gathered by the assistant from how you actually read, delete, and respond to messages. For instance, if you always respond to messages that come in from your boss, it’s reasonable to assume that your boss is an important person to you and that their email should never be marked as clutter.
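One way to picture how explicit and implicit signals might be combined is the sketch below. The action names, the weighting of explicit marks, and the function name are all hypothetical; the source doesn't describe how the assistant actually weighs these inputs, so treat this purely as an illustration of the idea.

```python
from collections import Counter, defaultdict

# Hypothetical implicit signals, inferred from what the user does with mail.
IMPLICIT_ACTIONS = {
    "replied": "important",
    "read": "important",
    "deleted_unread": "clutter",
}

# Hypothetical explicit signals, given through the client user interface.
EXPLICIT_ACTIONS = {
    "marked_important": "important",
    "marked_clutter": "clutter",
}

# Assumption: an explicit mark counts for more than one implicit observation.
EXPLICIT_WEIGHT = 5


def sender_evidence(events):
    """Tally evidence per sender from (sender, action) pairs."""
    evidence = defaultdict(Counter)
    for sender, action in events:
        if action in EXPLICIT_ACTIONS:
            label, weight = EXPLICIT_ACTIONS[action], EXPLICIT_WEIGHT
        elif action in IMPLICIT_ACTIONS:
            label, weight = IMPLICIT_ACTIONS[action], 1
        else:
            continue  # unknown actions carry no signal in this sketch
        evidence[sender][label] += weight
    return dict(evidence)


events = [
    ("boss", "replied"),
    ("boss", "replied"),
    ("boss", "replied"),
    ("newsletter", "deleted_unread"),
    ("newsletter", "deleted_unread"),
    ("newsletter", "marked_clutter"),
]
evidence = sender_evidence(events)
```

Here the boss accumulates evidence of importance purely from replies, while the newsletter is pushed toward clutter both by being deleted unread and by a single explicit mark.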
Outlook Web App is the first client to support UI for clutter but you can expect UI to show up in future releases of other clients like Outlook.
Once the assistant has learned enough about your email habits, the transport service can begin to process new mail as it arrives on a server. Older messages that already exist in your mailbox are not processed, largely because you’ve already made a decision about them (or maybe not). In addition, any rules that you have created will override clutter processing because you’ve made an explicit decision as to how specific messages should be processed by those rules. It’s therefore accurate to say that people who have already created a comprehensive set of rules to process new mail coming into their mailbox will gain less from “clutter” than other users.
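The order of precedence described above can be sketched as a small routing function. The types, folder names, threshold, and the `clutter_probability` callback are assumptions made for illustration only; nothing here reflects how the transport service is actually coded.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Message:
    sender: str
    subject: str
    newly_arrived: bool = True  # False for mail already sitting in the mailbox


@dataclass
class Rule:
    matches: Callable[[Message], bool]
    target_folder: str


def route(message: Message,
          rules: Sequence[Rule],
          clutter_probability: Callable[[Message], float],
          model_ready: bool = True,
          threshold: float = 0.5) -> str:
    """Return the (hypothetical) destination for a message."""
    # 1. Older messages are left alone.
    if not message.newly_arrived:
        return "unchanged"
    # 2. User-created rules are explicit decisions and always win.
    for rule in rules:
        if rule.matches(message):
            return rule.target_folder
    # 3. Clutter processing starts only once enough has been learned.
    if not model_ready:
        return "Inbox"
    # 4. Otherwise the learned model decides.
    return "Clutter" if clutter_probability(message) >= threshold else "Inbox"


rules = [Rule(matches=lambda m: m.sender == "boss@example.com",
              target_folder="Boss")]


def always_clutter(message):
    return 0.9  # stand-in for a trained model's prediction


from_boss = route(Message("boss@example.com", "Budget"), rules, always_clutter)
from_list = route(Message("list@example.com", "Free beer"), rules, always_clutter)
```

Even though the stand-in model considers everything clutter, the message from the boss still follows the user's rule, which is why people with a comprehensive set of rules have less left over for “Clutter” to act on.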
“Clutter” is not mandatory and users will be able to opt out if they find the prospect of cyborgs filtering their new mail a tad off-putting. Of course, this is no worse than Google running email through routines to figure out what ads you’d like to see (or not), but there’s no doubt that some people do have strong feelings about any notion of message inspection.
“Clutter” is a feature that is due to appear in Office 365 before the end of 2014. Microsoft hasn’t said yet whether this will be a feature that is restricted to certain plans. My bet is that it’s going to be limited to Plan E3 and above because this is a feature that is far more useful in an enterprise setting than it is for tenant domains used by small businesses and single professionals.
Regrettably, it appears that on-premises customers won’t be able to use “Clutter” in the near future. I think that this is a pity, but I understand that the amount of processing that’s involved in machine learning might impose an unconscionable performance penalty on mailbox servers. Perhaps the limitation will be lifted in the future when more powerful servers are available.
Like any other machine learning project, “Clutter” will need some tweaking over time to improve and refine its model and the way that it processes email. Microsoft will no doubt gather masses of data about how “Clutter” operates from Office 365 and use that data (but not details of the messages that are processed) as the basis for refinements. The more data that you have to do the tweaking, the better, and it's probable that Microsoft wants to go through this phase of development using Office 365. Once the algorithms are fully debugged, tweaked, and refined, they can be unleashed onto the on-premises community. That's something to look forward to, I think!
Follow Tony @12Knocksinna