Last week, Microsoft showed off some multi-touch UI features it intends to introduce in Windows 7, the follow-up to Windows Vista that should appear by early 2010. I don't think this usage model is going to be particularly compelling for most people, and not just because we've seen it all already on the iPhone. The problem is that multi-touch is an evolution of the interfaces that came before it, and one that augments, rather than replaces, its predecessors. (And don't even get me started on the smudge-tastic nature of touch screens.)
Looking back on the evolution of computer UIs, I don't think many people would dispute that the move from punch cards and front-panel switches to keyboards and character-mode displays by the late 1970s was a necessary boon to both usability and efficiency. Likewise, few would argue that the move to mass-market GUI-based systems--first on the Mac in the 1980s but then popularized by Windows in the early 1990s--was anything other than a similar improvement.
Since then, however, various technologists have posited a future in which our interactions with PCs will become more natural. Some obvious early examples include Alan Kay's DynaBook and Apple's Knowledge Navigator, both of which in many ways foreshadowed today's commonly accepted portable computer form factors and functionality as well as Internet connectivity. More recently, we've seen initiatives around pen/stylus computing (PDAs, smart phones, Ultra-Mobile PCs--UMPCs), tablet computing (Tablet PC, UMPC), table- and wall-based computing (Microsoft Surface) and touch computing (iPhone).
What's interesting about these more natural computing environments is that they don't actually change the way we interact with computers to the degree that GUI-based systems did 25 years ago. In fact, GUI-based systems are also referred to as WIMP (Windows, Icons, Menus, and Pointing device) systems, and whether you're interacting with the display via a mouse pointer, a stylus, or your finger, you're still really performing the same basic actions. Yes, a multi-touch interface like that on the iPhone or that which Microsoft showed off for Windows 7 last week is arguably more sophisticated than point and click. But it's an evolution, not a revolution, and it's not a one-to-one replacement, because many tasks are simply easier to perform with more traditional interfaces.
Looking ahead, I see two main areas of innovation that need to occur to move the needle on natural computing, and neither is particularly radical. First, the many forms of WIMP-based computing models need to be collectively implemented across all unique computing form factors so that we can arrive at what I think of as "situational computing." That is, when you're sitting in front of a traditional PC display at work, a mouse and keyboard will almost always make the most sense. Move into a meeting room, however, and a touch-enabled Surface-based interactive wall might offer the best way to get your point across. And while standing in line at a Starbucks or grocery store, you might want to quickly triage your email using a chiclet keyboard-based phone or a touch-screen-enabled iPhone. None of these interfaces replaces the others. They complement one another and form the pieces of what will be a very pervasive relationship between you and the various computing resources you do and will regularly access.
Second, the real future of computer UI is, of course, voice. Here, too, is an interface that augments, rather than replaces, other UIs, given the inapplicability of speaking out loud in certain environments. (Not that such a social faux pas has stopped many cell phone users.) But in the same way that years of typing have rendered my handwriting both illegible and physically painful to do over long periods of time, the advent of accurate and usable voice interfaces could likewise trigger a decline in typing ability. I, for one, would welcome at least the opportunity to dictate articles instead of typing them. It would introduce the possibility of "writing" in non-traditional environments (while driving, for example), something that is mostly impossible today.
Not surprisingly, Microsoft is deeply involved in all of the computing interfaces mentioned above. And Windows 7 will, to some extent, include all of these interfaces as well, as does Vista today, actually. (Well, aside from multi-touch.) The problem, of course, is that some are more mature than others and voice interfaces, in particular, are among the least mature computing interfaces today. We'll get there. But it's not going to happen in Windows 7.
Never mind the heavy accents and regional dialects; just figuring out what somebody means (and whom they are directing it towards) is a masterwork of neural engineering and no small amount of guesswork. Half the time my girlfriend has no idea what I am talking about; I can't imagine my computer being any better at it, no matter what the technological improvements. Until a computer can pick up facial expressions, tone, and sarcasm, don't hold your breath.
But more than that, I cannot see dictation replacing writing for the simple reason that writing is an exercise in forced clarity of thought. Spoken language tends to be much "sloppier" and less structured; putting words down in writing forces us to consider the structure of not just our language but of our ideas as well.
nlaslett@ndi.org June 03, 2008