Windows IT Pro is the leading independent community for IT professionals deploying Microsoft Windows server and client applications and technologies.
  
  


November 1995

Exploring Cairo



No doubt, the massed choirs of Windows 95 users are singing from the rooftops, because Office 95 under Windows 95 allows long filenames. They think this is going to be of so much more use to them; 8+3 was a dreadful limitation, even 15 years ago.

But what narrow thinking! Going from an 8+3-character filename to a 255-character version of the same thing is progress only inasmuch as it's a move from disastrous to meagre on the usefulness scale.

Let's look at it another way. If you were to take all the documents on your desktop computer (or, better still, your corporate LAN) and put them into one directory, would 255-character filenames really be enough? No, of course not! You'd still want the filtering, sorting, and collating facilities offered by subdirectories. But as soon as you put documents into subdirectories, you're scattering your data into the unknown far reaches of your hard disks, never to be seen again in a hurry, because the retrieval tools are so weak and rigid.

Indeed, to look at it from still another perspective, think about a database design. You would definitely want a unique key, or a unique combination of keys, on a database entry. Allowing multiple entries to have the same master key is a recipe for disaster. Suggesting that a database manager scatter data randomly across multiple tables so you can sort and query them more easily would undoubtedly end in a major argument. The database manager would argue that all you need is to add another field or two to each record to uniquely define it.

Now spin that mindset into a disk-filing system, and you can see how even a 255-character filename strategy is pathetically weak. You need unique document identifiers and multiple "database-alike identity fields" to categorize your data. Hence, even 255-character filenames are a largely useless solution for the late '90s, especially since gigabytes of disk space storing hundreds of thousands of files cost mere hundreds of dollars today.
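To make the database analogy concrete, here is a minimal sketch of a document catalog in which each document has a unique identifier plus several queryable identity fields. It uses Python's built-in sqlite3 module purely for illustration; the table layout, the field names (title, author, project, kind), and the sample rows are all assumptions invented for this example, not anything Microsoft has described for Cairo.

```python
import sqlite3


def build_catalog():
    """Create an in-memory catalog: one unique ID plus several identity fields."""
    db = sqlite3.connect(":memory:")
    # doc_id is the unique key; it does not depend on any filename.
    db.execute(
        """CREATE TABLE documents (
               doc_id  INTEGER PRIMARY KEY,
               title   TEXT NOT NULL,
               author  TEXT NOT NULL,
               project TEXT NOT NULL,
               kind    TEXT NOT NULL
           )"""
    )
    rows = [
        (1, "Q3 budget", "Alice", "Finance", "worksheet"),
        (2, "Q3 budget", "Bob", "Marketing", "worksheet"),  # same title, distinct ID
        (3, "Launch plan", "Bob", "Marketing", "document"),
    ]
    db.executemany("INSERT INTO documents VALUES (?, ?, ?, ?, ?)", rows)
    return db


def find(db, **criteria):
    """Return the doc_ids that match every field=value pair supplied."""
    allowed = {"title", "author", "project", "kind"}
    if not criteria or not set(criteria) <= allowed:
        raise ValueError("unknown or missing field")
    where = " AND ".join(f"{name} = ?" for name in criteria)
    cursor = db.execute(
        f"SELECT doc_id FROM documents WHERE {where} ORDER BY doc_id",
        tuple(criteria.values()),
    )
    return [row[0] for row in cursor]


if __name__ == "__main__":
    catalog = build_catalog()
    print(find(catalog, title="Q3 budget"))
    print(find(catalog, author="Bob", kind="document"))
```

Two documents can share a title here without colliding, because identity lives in the key rather than in the name, and any combination of fields can serve as a query. No directory layout has to be decided in advance.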

Not surprisingly, Microsoft has a strategy in place to sort this mess out. It's called Cairo. For what seems like an eternity, Microsoft has been proclaiming, "We are all on the road to Cairo." Indeed, this future, still unseen release of Windows NT has been coming for so long that it has gained almost mythical status. Private discussions with Microsoft have clearly indicated that Cairo hasn't vanished, that work has been steadily progressing over the last couple of years, and that 1996 is going to be the year of the grand roll-out of Cairo, at least initially in beta form.

So, I thought it would be useful to explore the current state of play to see where this mythical Cairo technology is coming from. Hopefully, by the end of this journey from Office to Cairo, you'll agree that large components of Cairo have been sitting on your desktop in full view for nearly two years.

Of course, there are many facets to a large project like Cairo. Although I can't speculate now about how all of them will work, one clear thread is worth pursuing: how the object-storage system works today, how it might well be developed for Cairo, and how it will help solve our data storage and retrieval problems.

Office and Storage
The questions must have crossed your mind: When you embed an Excel worksheet in a Word document and then save the document, how does Word "know" how to store the Excel file? Where is the Excel file stored? And if it's stored in the Word file, is it converted to a Word table?

You can reload the Word file and then double-click on the Excel worksheet to bring up an in-place editing instance of Excel, with all of Excel's editing and formatting capabilities supported, so the Excel worksheet must be stored natively within the Word document. These functions simply wouldn't be possible if Word converted the Excel worksheet to a Word table.

If you embed a different kind of document, such as a PowerPoint slide, the same logic applies: The native PowerPoint presentation is stored within the Word file. Indeed, Word has no native concept of a presentation, so it's hard to conceive of how such a Word-centric conversion could work.

Normally, this is the stuff of private data formats. It would immediately suggest that Microsoft had written some private Office data format that could store any data file from any Office application. That would make complete sense, except for one thing: You can embed a data file from any application that supports Object Linking and Embedding (OLE) 2 in a Word file. The implication is clear. This "embed and store" mechanism can't be relying on private formats, because you can embed objects from applications that have no connection with Microsoft. Thus, adding a Visio drawing is no more difficult than bringing in an Excel worksheet.

There must be a standardized, open-storage, container-and-content model at work here, something that any vendor can tap into. There is! It's called Structured Storage, and it's been around since the release of OLE 2.0.

Structured Storage is fascinating for many reasons, not the least of which is that so few people have actually looked carefully at how it works. Fortunately, Microsoft has tools that let you look inside a Structured Storage file to see what's going on. The best tool to use is DFVIEW.EXE, which is a Structured Storage DocFile viewer supplied with Microsoft Developer Network (MSDN) Level 2. In addition, there's a good book on OLE 2 by Kraig Brockschmidt entitled Inside OLE, Second Edition from Microsoft Press. The chapter "Structured Storage and Compound Files" is detailed and informative, although as a whole the book is a deeply technical reference that warrants several readings.
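If you don't have DFView to hand, a quick first check is to see whether a file is a Structured Storage compound file at all. The sketch below, in Python, compares the first eight bytes of a file with the signature that compound files begin with. The signature value comes from the published compound file format rather than from anything shown in DFView, and the function name is made up for this example. A matching signature suggests a compound file but does not prove that the rest of the file is valid.

```python
import os
import tempfile

# The 8-byte signature found at the start of a compound (DocFile) file.
COMPOUND_FILE_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def looks_like_compound_file(path):
    """Return True if the file starts with the compound file signature."""
    with open(path, "rb") as handle:
        return handle.read(8) == COMPOUND_FILE_SIGNATURE


if __name__ == "__main__":
    # Build two throwaway files so the check can be tried without a real document.
    with tempfile.TemporaryDirectory() as folder:
        fake_doc = os.path.join(folder, "fake.doc")
        plain = os.path.join(folder, "plain.txt")
        with open(fake_doc, "wb") as handle:
            handle.write(COMPOUND_FILE_SIGNATURE + b"\x00" * 504)
        with open(plain, "wb") as handle:
            handle.write(b"just some text")
        print(looks_like_compound_file(fake_doc))
        print(looks_like_compound_file(plain))
```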

To illustrate how Structured Storage works, let's look at a basic Word 7 file. I created a new Word document, put two words into it, and saved it to C:\temp\word1.doc. I then loaded this file into DFView (see Screen 1).

You can see that it shows a hierarchical layout with the top level being the storage file itself. This is the file on the disk. At the next level, there are four streams of data: The first is called CompObj, for Compound Object; the second is called WordDocument and contains the actual Word document data. The third and fourth are a SummaryInformation stream and a DocumentSummaryInformation stream. These last two contain summary information tags like Author, Date Created, Size, Last Printed, and so on. Each of these four streams is a separate data stream in the Structured Storage file.
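The layout DFView shows can be modelled in a few lines. The following Python sketch is an in-memory illustration only: the Storage and Stream classes and the render function are invented for this example and are not the OLE 2 interfaces, and no real file is read. The stream names are the four reported above, with empty placeholder contents.

```python
from dataclasses import dataclass, field


@dataclass
class Stream:
    """A named run of bytes, like a file in a directory."""
    name: str
    data: bytes = b""


@dataclass
class Storage:
    """A named container of streams and other storages, like a directory."""
    name: str
    children: list = field(default_factory=list)

    def add(self, child):
        self.children.append(child)
        return child


def render(node, depth=0):
    """Return indented text lines describing the tree below node."""
    tag = "[storage] " if isinstance(node, Storage) else "[stream]  "
    lines = ["  " * depth + tag + node.name]
    if isinstance(node, Storage):
        for child in node.children:
            lines.extend(render(child, depth + 1))
    return lines


def basic_word_file():
    """Model the two-word Word 7 document: one root storage, four streams."""
    root = Storage("word1.doc")
    for name in ("CompObj", "WordDocument",
                 "SummaryInformation", "DocumentSummaryInformation"):
        root.add(Stream(name))
    return root


if __name__ == "__main__":
    print("\n".join(render(basic_word_file())))
```

The point of the two classes is that a storage can hold further storages as well as streams, which is what makes embedding possible.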

Having closed the DFView display of the file and reopened it in Word 7, I inserted a simple two-cell, two-row Excel 7 worksheet into the Word file, using the Excel button provided on the Word button bars. Then, I resaved word1.doc and reopened it in DFView (see Screen 2).

Things are definitely becoming interesting. The same four Word streams are still in place, but there is also a new entry, a storage called ObjectPool. If you open ObjectPool, you'll see another storage object with an obscure numbered label. If you then open this object, you'll see what looks like another complete document structure, just like the one at the root of the tree. There are the same CompObj, SummaryInformation, and DocumentSummaryInformation streams. But the WordDocument stream has been replaced by a Book stream. This contains the Excel Book worksheet information, that is, the Excel worksheet object that was inserted into the Word document. In addition, there are a couple of extra streams here to render the object when its container is running. In other words, when you are editing the Word document, you want to see a visible representation of the contents of the Excel object, even though Excel isn't running at that point.

Let's take this one step further and put a WordArt object inside the Excel object, which is inside the Word object. Phew, my head is spinning. How about yours?

Screen 3 shows what's happening: I have in-place activated the Excel object, and then inserted the WordArt 2 object into the Excel object. You can't currently in-place edit more than one level deep, because OLE 2 doesn't support multiple nesting of all the negotiation that goes on between the various applications. So, although Word will negotiate with Excel to bring up the Excel "look and feel" in the Word framework, it can't negotiate with WordArt to go two levels deep.

Let's look at this triple-layer sandwich using DFView to see the result of this multiple embedding (see Screen 4). Not surprisingly, there is a WordArt object stored inside the Excel object, which is stored inside the Word object.

What Does It Mean?
OLE 2 containers can contain objects of completely foreign formats, and as far as the container is concerned, the contents of the embedded object are opaque. Word doesn't know, or need to know, how Excel stores its internal data. Nor does Excel need to know how WordArt stores its data. They simply need to know how to contain them, and this is defined in the Structured Storage standard.

Although I only inserted one object into another, there's no limit on the number of objects you can insert in parallel. For example, I could have put two Excel worksheets, each containing different objects, into the Word document. The resulting tree structure is easy to imagine. Just think of Structured Storage as an object filing system. The similarities between what DFView has revealed and what's in a standard directory/subdirectory/file structure are plain.
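The filing-system comparison can be shown by flattening the nested structure into directory-style paths. In this Python sketch a storage is a dictionary and a stream is a bytes value. The labels "_excel_object_1" and "_wordart_object_1", and everything placed inside the WordArt storage, are hypothetical stand-ins; DFView shows obscure numbered labels that are not reproduced here, and the extra rendering streams are left out.

```python
# A storage is a dict; a stream is a bytes value (left empty here).
WORD_FILE = {
    "CompObj": b"",
    "WordDocument": b"",
    "SummaryInformation": b"",
    "DocumentSummaryInformation": b"",
    "ObjectPool": {
        "_excel_object_1": {              # hypothetical label
            "CompObj": b"",
            "Book": b"",
            "SummaryInformation": b"",
            "DocumentSummaryInformation": b"",
            "_wordart_object_1": {        # hypothetical label and contents
                "CompObj": b"",
                "Contents": b"",
            },
        },
    },
}


def walk(storage, prefix=""):
    """Yield (path, kind) for every storage and stream, parents first."""
    for name, child in storage.items():
        path = f"{prefix}/{name}"
        if isinstance(child, dict):
            yield path, "storage"
            yield from walk(child, path)
        else:
            yield path, "stream"


def depth(storage):
    """Return how many levels of storages are nested below this one."""
    nested = [depth(child) for child in storage.values() if isinstance(child, dict)]
    return 1 + max(nested) if nested else 0


if __name__ == "__main__":
    for path, kind in walk(WORD_FILE):
        print(f"{kind:8} {path}")
```

Adding a second worksheet in parallel is just another key under ObjectPool, exactly as a second subdirectory would sit beside the first.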

 Windows IT Pro is a Division of Penton Media Inc.
 © 2009 Penton Media, Inc. Terms of Use | Privacy Statement