Your entry to a new level of document management

After a long gestation, Microsoft SharePoint Portal Server 2001, formerly code-named Tahoe, is now available in English, German, and Japanese versions. Microsoft presents this server as a Web-based document-management and portal product that can fit easily into existing Windows 2000 or Windows NT infrastructures and that integrates tightly with Web browsers, Windows Explorer, and the Microsoft Office suite. The new product also boasts powerful indexing and search capabilities.

The product's name is a mouthful. SharePoint implies that the server can replace multiple network file shares as a preferred repository for documents; Portal refers to the familiar portal paradigm that users employ to access the data that the server gathers. (Microsoft also uses the SharePoint name in SharePoint Team Services, a technology that's now in place in Office XP, formerly code-named Office 10, and that the company eventually intends to make a part of other Microsoft products. Ideally, customers will use SharePoint Portal Server to create a central portal site for the products that use SharePoint Team Services.) So how does the new server work, and what is its true purpose?

Alike but Different
No formal link exists between SharePoint Portal Server and Microsoft Exchange 2000 Server, but the two products share technology such as the Web Storage System (WSS). (Note that you can install Exchange 2000's standard edition on the same machine as SharePoint Portal Server. However, you can't run Exchange 2000 Enterprise Server, which supports multiple databases and storage groups (SGs), on the same machine as SharePoint Portal Server.) SharePoint Portal Server uses a modified version (Microsoft sometimes refers to it as a departmental version) of the Exchange 2000 WSS, and the SharePoint Portal Server WSS variant inherits all the capabilities that the Exchange 2000 WSS delivers, such as support for streaming file formats. The SharePoint Portal Server WSS holds any type of unstructured or semistructured information, including properties for every item that it manages.

When you add a document to the Exchange 2000 WSS, the server examines the document's OLE properties (e.g., subject, author, title, date created) and uses them to populate indexes within the WSS. Users can take advantage of these indexes to build views within Microsoft Outlook and thus look at folders' contents sorted by date, size, title, and so on. The SharePoint Portal Server WSS operates similarly but groups properties into document profiles—literally, a way to describe the information that you'd expect to gather about a document. For example, you can create a profile that describes articles to be published in a magazine; that profile might include properties for the author's name, copy editor's name, technical editor's name, and word count.
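To make the idea of a document profile concrete, here is a minimal sketch in Python. It is not SharePoint Portal Server code: the class name, the property names, and the missing_properties helper are all hypothetical, and the sketch assumes only that a profile is a named list of properties you expect a document to carry.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DocumentProfile:
    """Hypothetical model of a profile: a name plus the properties it expects."""
    name: str
    required_properties: List[str] = field(default_factory=list)

    def missing_properties(self, metadata: Dict[str, object]) -> List[str]:
        """Return the expected properties that the metadata leaves blank."""
        return [p for p in self.required_properties if not metadata.get(p)]


# Profile for the magazine-article example described in the text.
article_profile = DocumentProfile(
    name="Magazine Article",
    required_properties=["author", "copy_editor", "technical_editor", "word_count"],
)

# A draft whose editors haven't been assigned yet (illustrative values).
draft = {"author": "A. Writer", "word_count": 3200}
print(article_profile.missing_properties(draft))
```

Treating the profile as data rather than as code is the point of the sketch: a new kind of document needs only a new list of properties.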

On the surface, the WSS's structure looks much like the Exchange Store. Whereas the Exchange 2000 WSS organizes data into mailboxes, folders, and items and represents these objects as rows and tables within a database, the SharePoint Portal Server WSS holds documents as items within folders and organizes those folders into workspaces. The workspaces also hold management folders, information categories, document profiles, and subscription information. (A subscription is a marker that a user places on a specific document or category. When the document changes or users add new documents to the category, SharePoint Portal Server emails a notification to users who have taken out subscriptions to inform those users that new information is available.)
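The subscription mechanism can be pictured as a lookup from a document or category to the users who asked to be told about it. The Python sketch below is an illustration under that assumption only; the SubscriptionRegistry class, the "doc:" and "category:" prefixes, and the user names are hypothetical and say nothing about how the server stores subscriptions internally.

```python
from collections import defaultdict
from typing import List


class SubscriptionRegistry:
    """Hypothetical registry mapping a document or category to its subscribers."""

    def __init__(self) -> None:
        self._subscribers = defaultdict(set)

    def subscribe(self, user: str, target: str) -> None:
        """Record that the user wants notifications about the target."""
        self._subscribers[target].add(user)

    def recipients_for(self, document: str, categories: List[str]) -> List[str]:
        """Users to notify when the document changes or lands in the categories."""
        users = set(self._subscribers.get(document, set()))
        for category in categories:
            users |= self._subscribers.get(category, set())
        return sorted(users)  # each user appears once, even with two subscriptions


registry = SubscriptionRegistry()
registry.subscribe("alice", "doc:budget.xls")
registry.subscribe("alice", "category:Finance")
registry.subscribe("bob", "category:Finance")

print(registry.recipients_for("doc:budget.xls", ["category:Finance"]))
```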

The server supports three distinct types of workspaces: document-management, search, and index. The primary difference between these types is that a document-management workspace holds content as well as indexes. Search workspaces hold only indexes that SharePoint Portal Server builds by crawling external information sources, such as Lotus Notes (4.6a or later) databases. Index workspaces are even more specialized and operate as part of dedicated SharePoint Portal Server search machines that build indexes from multiple sources and become the definitive source for a search. You can propagate selective indexing results to either search or document-management workspaces. Microsoft recommends that a server running SharePoint Portal Server manage no more than 10 workspaces.

Technology aside, what does the product do? Current reviews are divided about the server's primary purpose. Is the product a document-management server? Is SharePoint Portal Server's top goal to provide information in a design tailored to meet individual needs and preferences? Or are the server's content-aggregation and search features an indication that it's simply the culmination of Microsoft's attempt to build a better search engine? Microsoft seems to intend the server to do all this and more. In building SharePoint Portal Server, Microsoft concentrated on four goals: delivering departmental document management; easing deployment in existing Win2K or NT organizations; integrating with Web browsers, Windows Explorer, and Office applications; and providing search capabilities.

Deliver Departmental Document Management
Unlike server products that target an entire organization (e.g., Exchange Server), SharePoint Portal Server aims for workgroup or departmental deployments. Microsoft's intention is that every department will have a SharePoint Portal Server machine that acts as a departmental portal and repository for all the information that the department owns and manages. (One SharePoint Portal Server machine can still deliver information to a complete enterprise. The server will permit any authorized user to access information, even when the user doesn't belong to the department that owns the SharePoint Portal Server machine.)

The new product meets the Windows infrastructure's need for basic document-management features such as multiple versioning (i.e., the ability to maintain multiple versions of a document or other item that is under development). SharePoint Portal Server supports minor and major versions of documents. The server automatically creates minor versions—in which the version number moves from 1.0 to 1.1 to 1.2 and so on—each time a user updates a document. The server creates major versions—in which the version number moves from 1.0 to 2.0 to 3.0 and so on—when an authorized user elects to publish the document. The product maintains a full audit trail to track updates. Although you can't roll back to an earlier document version, you can save an earlier version as a new document.
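The numbering scheme is easy to model. The following Python sketch follows the version numbers quoted above and nothing more; the VersionedDocument class is hypothetical, and starting every document at 1.0 and resetting the minor number on publication are assumptions made to keep the example short.

```python
class VersionedDocument:
    """Hypothetical model of minor/major version numbering."""

    def __init__(self) -> None:
        self.major, self.minor = 1, 0   # assumption: documents start at 1.0
        self.history = [self.version]   # simple audit trail of version labels

    @property
    def version(self) -> str:
        return f"{self.major}.{self.minor}"

    def update(self) -> str:
        """A routine update bumps the minor number (1.0 -> 1.1)."""
        self.minor += 1
        self.history.append(self.version)
        return self.version

    def publish(self) -> str:
        """Publishing bumps the major number and resets the minor (1.2 -> 2.0)."""
        self.major += 1
        self.minor = 0
        self.history.append(self.version)
        return self.version


doc = VersionedDocument()
doc.update()
doc.update()
doc.publish()
print(doc.history)
```

Because the history list only grows, the sketch mirrors the behavior described above: you can read an earlier version out of the trail, but you never rewind the counter.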

The publishing functionality includes a simple workflow process to approve a document's content. Approval is a folder property; as Figure 1, page 54, shows, you can specify a set of approvers and configure SharePoint Portal Server to send the approvers email when a user attempts to publish a document. The workflow is extremely simple: You can send the document to approvers one after another or all at once. When you choose to send notification to all approvers, you can configure the process to publish the document after approval by all the recipients or after approval by only one. No facility exists to set a quorum (e.g., so that the server will publish the document if two out of three recipients approve) or to handle approver requests for new information. (The serial workflow is slightly more intelligent; the document passes from approver to approver in the specified order. One approver's failure to approve the document returns the document to the author.) Nevertheless, SharePoint Portal Server's publishing feature will meet the needs of many organizations that don't need or want a complex workflow.
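The two routing styles reduce to a few lines of logic. This Python sketch is a simplified illustration, not the server's implementation; the function names and approver names are hypothetical, and a missing vote in the serial route is treated as a rejection purely to keep the example short.

```python
from typing import Dict, List


def parallel_outcome(votes: Dict[str, bool], require_all: bool) -> bool:
    """Parallel routing: publish after every approver agrees, or after any one does."""
    decisions = list(votes.values())
    return all(decisions) if require_all else any(decisions)


def serial_outcome(approvers: List[str], votes: Dict[str, bool]) -> str:
    """Serial routing: walk approvers in order; the first rejection ends the route."""
    for approver in approvers:
        if not votes.get(approver, False):  # assumption: no vote counts as "no"
            return f"returned to author by {approver}"
    return "published"


votes = {"ann": True, "raj": False, "lee": True}

print(parallel_outcome(votes, require_all=True))    # all approvers must agree
print(parallel_outcome(votes, require_all=False))   # one approval is enough
print(serial_outcome(["ann", "raj", "lee"], votes))
```

Note what is absent: the all-or-any switch has no way to express a quorum such as two out of three, which is exactly the limitation described above.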

Internally, the server maps its permissions to the Windows security model, similar to the way public-folder access works in Exchange Server 5.5. Every folder and document that you create within SharePoint Portal Server has a set of server-managed permissions, and certain roles permit users to assume these permissions. (A role is simply a collective set of permissions that enables a user and the server to interact in a specific manner.) The new server defines roles for Reader, Author, and Coordinator. These roles are fixed; you can't create customized roles as you can for Exchange Server public folders.

As the name implies, a Reader can read documents in a folder but can't edit documents or add new ones. You can associate the Reader role with a Win2K group; for example, you can associate the role with the Everyone group on folders that hold information you want to make available to all users. An Author can create and edit documents. A Coordinator can perform any operation, including setting up new document profiles, on the folder. The Coordinator is an important role because it also takes responsibility for creating categories and ensuring that workspaces are managed in a logical and accessible manner.
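Because the three roles are fixed, you can think of them as a small lookup table. The Python sketch below is only an illustration: the action names are hypothetical labels for the abilities described above, and the sketch assumes that an Author can also read and that the Coordinator's "any operation" covers every action listed.

```python
# Hypothetical action names; the three role names come from the product.
ROLE_PERMISSIONS = {
    "Reader": {"read"},
    "Author": {"read", "create", "edit"},
    "Coordinator": {"read", "create", "edit", "manage_profiles", "manage_categories"},
}


def can(role: str, action: str) -> bool:
    """Return True when the fixed role includes the requested action."""
    return action in ROLE_PERMISSIONS.get(role, set())


print(can("Reader", "edit"))
print(can("Author", "edit"))
print(can("Coordinator", "manage_profiles"))
```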

If you already run Exchange Server, you might currently use public folders for document management. However, significant differences exist between the two servers' designs and features (Table 1 compares the two products' primary features). Therefore, don't assume that you can attain SharePoint Portal Server's document-management abilities through Exchange Server (at least not out of the box). See the sidebar "Why Not Just Use Public Folders?" page 59, for details.

Ease Deployment Within Existing Infrastructures
Microsoft's goal is to have SharePoint Portal Server up and running in less than 15 minutes when you deploy it in an existing Win2K or NT organization. SharePoint Portal Server runs only on a Win2K server with Service Pack 1 (SP1) or later, but this requirement doesn't obstruct the goal because Win2K servers fit well in NT infrastructures. And the product doesn't depend on Active Directory (AD) or any other Win2K component—a tacit acknowledgment that companies will be running NT for years to come. For information about the hardware that you need to run SharePoint Portal Server, see the sidebar "Hardware Fundamentals," page 61.

Currently, you can't link SharePoint Portal Server machines to form a cohesive whole, so you don't need to generate a comprehensive design before you install the first server (as you must do in an Exchange Server organization). However, you can use one server to index the content on other SharePoint Portal Server machines, so you can deploy an organizationwide portal server that relies on the information that many other SharePoint Portal Server machines manage in the background.

You perform some management operations (e.g., define workspaces, set permissions, manage indexes) through a Microsoft Management Console (MMC) snap-in, but the snap-in is limited compared with the snap-ins for Exchange 2000 or AD. The new server is primarily Web based, so you perform most management operations through a browser or Windows Explorer (as Figure 2 shows).

From an IT perspective, the prospect of a department installing and operating a SharePoint Portal Server machine should have roughly the same impact as installing and operating a file server. The department might require assistance to commission the new server, but after it's up and running, the product places no special demands on the infrastructure. (The corporate Help desk, however, might need to cope with demands from users who want to access information on the new server but who don't have the necessary access rights.)

Of course, any investment your company makes in deploying and managing SharePoint Portal Server will be wasted if users don't modify the way they think about documents. Merely dumping documents into a network file share or public folder isn't sufficient. Users must give thought to how to organize and categorize documents effectively, what workspaces to use, and how to administer those workspaces. Don't underestimate the organizational challenges implicit in a change from loose to structured document management.

Leverage Web Browsers, Windows Explorer, and Office
SharePoint Portal Server doesn't require a massive software-deployment effort, either. To permit access to the new server, simply install SharePoint Portal Server client extensions (which you can find in the server CD-ROM's client kit) on your users' systems. These extensions add the necessary components to each system's browser and to Windows Explorer and Office applications. Everyone who accesses a SharePoint Portal Server machine requires a Client Access License (CAL). Users can interact with SharePoint Portal Server workspaces through three interfaces:

  • A portal, which comprises Web Parts that form a digital dashboard that users can access through a Web browser. (Figure 3 shows an example of a typical portal.)
  • Client extensions for Windows Explorer and Web folders.
  • Add-ins for Office applications.

Access through a portal. Microsoft envisions that most users will access SharePoint Portal Server functionality through a Web browser, so when you install the product, it automatically creates a Web site to host workspaces. This Web site acts as a portal and includes a set of Web Parts, each of which describes a function such as search, folder views, or subscriptions. The server maintains these basic Web Parts; to build a customized dashboard, you can mix and match basic Web Parts with Web Parts that access other data sources. For example, one Web Part might fetch data from Exchange Server to show a user's calendar, whereas another might list the user's subscriptions to categories of SharePoint Portal Server information.

Microsoft claims that you can use Netscape 4.72 or Microsoft Internet Explorer (IE) 5.0 or later to access SharePoint Portal Server, but IE 5.5 is the best choice. Web Parts are inline floating frame (IFrame) elements that run within a browser and that use numerous HTML <DIV> tags. Even the current Netscape browsers don't support <IFRAME> or <DIV> tags, so these browsers' ability to support Web Parts is limited.

Access through Windows Explorer. Most users will access the portal through a Web browser, but document authors will likely want to use the Windows Explorer and Office add-ins to work with documents that reside in the SharePoint Portal Server repository. The client extensions for Windows Explorer let users work with folders on a SharePoint Portal Server machine in the same way they work with folders on a local or network drive. You can select and right-click an item to reveal available SharePoint Portal Server actions as well as regular Windows Explorer options (e.g., Create Shortcut, Rename).

Conceptually, this method is similar to the way that Server Message Block (SMB) support in Exchange 2000 lets users view their mailbox contents through the M drive (although working with mailbox contents through the M drive can be dangerous because users can easily delete important messages and attachments if they don't know what they're doing). By contrast, SharePoint Portal Server extensions let users perform document-management operations and manage workspaces.

Figure 4 shows Windows Explorer pointing to a folder called Exchange 2000 in a SharePoint Portal Server workspace. The folder holds many documents, presentations, and white papers. The selected presentation is currently checked in; I could check out the item to edit it on an exclusive basis or publish the item to make a new version available to users. Note that users can't see documents, even through searches of the repository, until those documents are published. This restriction ensures that only authors and administrators can access developing documents until content is final and approved.

Access through Office applications. A COM-based Office 2000 add-in provides document-management options (e.g., Check In, as Figure 5 shows) for menus in Microsoft Word, Excel, and PowerPoint. SharePoint Portal Server also supports Office XP applications and includes a new search feature to let users look for documents in SharePoint Portal Server workspaces. (Menu options are available only for Office 2000 and later.) In addition to the menu options, Microsoft has built intelligence into the integration between Office and the new server product. When users double-click to open a document before checking it out, the Office application displays a warning that the document isn't checked out and gives users the option to check out the document. This warning, which Figure 6, page 62, shows, keeps users from making mistakes and simplifies the check-out process.

Support for Outlook is a surprising but understandable omission. The WWW Distributed Authoring and Versioning (WebDAV) protocol is the basis for communication between SharePoint Portal Server and Office applications. However, Outlook (even Outlook 2002) is Messaging API (MAPI)-based. Microsoft is unlikely to develop Outlook-SharePoint Portal Server integration components until Outlook fully embraces WebDAV.

Provide Comprehensive Search Capabilities
Microsoft regards SharePoint Portal Server's search engine as the best of its kind—capable of responsively handling millions of documents. (The validity of this perception will be proved or disproved after SharePoint Portal Server has been in production for a while.) Indeed, Microsoft has patented some of the search engine's algorithms.

Users can manually copy existing documents into the SharePoint Portal Server database, but this method takes too much time to be a good choice when assembling a departmental archive. Also, Microsoft doesn't provide an out-of-the-box method to point SharePoint Portal Server to a file share or other information source, then copy the documents from that source into a workspace. The advantage of moving documents into a workspace is that you can then use the server's document-management features to control the documents. However, if you simply want to search legacy information, you can leave documents on the file share and point SharePoint Portal Server at those documents.

Inevitably, useful information in many formats ends up scattered across Web servers, file shares, Exchange 2000 and Exchange Server 5.5 public folders, Lotus Notes databases, and even other SharePoint Portal Server machines. SharePoint Portal Server ships with numerous filters for different file types (e.g., .doc, .html, .tiff, .txt). These filters let the server create indexes from a variety of sources. (However, you can't search or index mailboxes on Exchange Server or Lotus Notes servers.) Third-party vendors can create filters for vendor-specific file types; for example, Adobe Systems offers a .pdf filter and Corel offers a WordPerfect filter. You can even use the SharePoint Portal Server 2001 Software Development Kit (SDK) to write your own filter. (For information about using the SDK, visit http://www.microsoft.com/sharepoint/downloads/tools/sdk.htm.)

SharePoint Portal Server indexes are secure. In other words, if you don't possess permission to see a document, it won't show up in a search result even when the document is a direct match against the search criteria.
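The effect is that of filtering every result list against the caller's permissions before anyone sees it. The Python sketch below shows that idea in miniature; it is not how the server's index works internally, and the secure_search function, the index layout, and the group names are hypothetical.

```python
from typing import Dict, Iterable, List


def secure_search(index: List[Dict], term: str, user_groups: Iterable[str]) -> List[str]:
    """Return titles that match the term and that the caller may read."""
    term = term.lower()
    groups = set(user_groups)
    return [
        entry["title"]
        for entry in index
        # A hit must match the text AND share at least one group with the caller.
        if term in entry["text"].lower() and groups & entry["readers"]
    ]


index = [
    {"title": "Replication Overview", "text": "Public folder replication basics",
     "readers": {"Everyone"}},
    {"title": "Replication Budget", "text": "Draft replication hardware budget",
     "readers": {"Finance"}},
]

print(secure_search(index, "replication", ["Everyone"]))
print(secure_search(index, "replication", ["Everyone", "Finance"]))
```

Both documents match the search term, but a user outside the Finance group never learns that the second one exists.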

Putting data into a repository is one task; getting the data out is quite another. Getting an aggregate view of the huge amount of information available in a company's network is often a difficult undertaking. SharePoint Portal Server's solution is to create one access point (i.e., the portal) to information assembled from multiple discrete sources. Everyone is familiar with Web crawlers (e.g., AltaVista) and with the problems associated with the process of assembling information through crawling. Asking a search engine to locate documents relating to a topic such as Win2K all but guarantees that you'll find many more items than you can handle; even a refined search might not be able to target the information you truly need. Add outdated or missing links, and searching crawled content can be a frustrating experience.

SharePoint Portal Server's crawling technology is different from Web crawling in two respects. First, the product can combine information from many different sources, including external and internal Web sites. Being able to aggregate information from network file shares, Exchange Server public folders, Lotus Notes databases, and Web sites into one search operation is a powerful feature. SharePoint Portal Server supports different types of crawl behavior—scheduled, fast incremental, notification-based, and adaptive—to provide control over the amount of crawling activity that occurs. (Adaptive crawling uses models of document volatility—literally, how often document contents are expected to change—to reduce the time spent indexing stale documents.)
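Microsoft hasn't published the volatility model behind adaptive crawling, so the Python sketch below substitutes a deliberately simple rule of my own: revisit a document sooner after you find it changed and later after you find it unchanged. The next_interval function, the halving and doubling factors, and the one-hour and one-week limits are all assumptions chosen for illustration.

```python
def next_interval(current_hours: float, changed: bool,
                  min_hours: float = 1.0, max_hours: float = 168.0) -> float:
    """Halve the recrawl interval after a change; double it when nothing changed."""
    proposed = current_hours / 2 if changed else current_hours * 2
    # Clamp so that a document is never crawled constantly or forgotten entirely.
    return max(min_hours, min(max_hours, proposed))


interval = 24.0
for changed in (False, False, True):   # two quiet visits, then one change
    interval = next_interval(interval, changed)
    print(interval)
```

Even a rule this crude shows the payoff: documents that rarely change drift toward the weekly limit, which frees crawl time for documents that change often.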

Second, SharePoint Portal Server's indexing technology does a much better job of categorizing data than Web crawlers do. The server product uses support vector machine categorization, a hefty term for automatic categorization. When you provide SharePoint Portal Server with a well-structured document that uses styles and properties to convey its contents, the server's Category Assistant feature can learn from the document's structure, then automatically categorize other documents with similar structures. For example, white papers on the Microsoft Developer Network (MSDN) Web site follow a particular format. After SharePoint Portal Server learns the structure of an MSDN white paper from one example, the product can recognize other white papers that it encounters. The value of this interesting concept, however, depends on users' disciplined structuring of documents.

Note that crawling doesn't retrieve data and place it in the SharePoint Portal Server database. Just like Web crawlers, SharePoint Portal Server notes a data item's location and content, then stores that information. SharePoint Portal Server stores actual documents only when you explicitly load them into a document-management workspace.

SharePoint Portal Server uses a modified version of the search engine that ships with Exchange 2000 and Microsoft SQL Server 2000 to accommodate features such as Category Assistant and Best Bets. The latter feature helps users find the documents that might be the closest to the users' needs. A Best Bet often results from a direct match between a search criterion and a keyword set on a document. For example, if you use replication as a keyword for a white paper, SharePoint Portal Server will present the white paper as a Best Bet to anyone who searches for "replication."
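The keyword case of a Best Bet amounts to an exact match between the search term and a keyword set on the document. The Python sketch below illustrates only that case; the best_bets function and the sample documents are hypothetical, and the real feature may weigh other signals that this example ignores.

```python
from typing import Dict, List


def best_bets(documents: List[Dict], query: str) -> List[str]:
    """Return titles whose assigned keywords exactly match the query term."""
    term = query.strip().lower()
    return [
        doc["title"]
        for doc in documents
        if term in {keyword.lower() for keyword in doc["keywords"]}
    ]


documents = [
    {"title": "Exchange 2000 Replication White Paper", "keywords": ["replication"]},
    {"title": "Backup Checklist", "keywords": ["backup", "restore"]},
]

print(best_bets(documents, "Replication"))
```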

Maintaining several search engines (i.e., for Exchange Server, SharePoint Portal Server, and SQL Server) is clearly inefficient, so in the future Microsoft will probably combine the engines into one system service. However, the company has given no firm indication of when this combination will happen.

A Resounding Success?
How successful will Microsoft be in creating a clear differentiation between SharePoint Portal Server and third-party document-management systems? The focus on department-level deployments is a good first step. The product's independence from AD will help encourage acceptance of the product in organizations that haven't yet deployed a Win2K infrastructure. Like all products, though, SharePoint Portal Server has some obvious weaknesses.

The inability to connect SharePoint Portal Server machines to form a seamless document-management solution for distributed organizations is one such weakness, and the relative lack of granularity in the role-based permission model is another. Lack of support for offline access is a particular pet peeve of mine, and the new server offers no equivalent of Exchange Server's offline slave replica folders. Therefore, road warriors and other offsite users must connect to the network to work with documents. Another problem comes to light when you connect across a slow connection, such as a 28.8Kbps phone link. A client PC and a SharePoint Portal Server machine exchange an enormous amount of data (relatively speaking) when users check in, check out, or publish documents. I understand that a great deal of work is probably going on behind the scenes, but Microsoft needs to reduce the bandwidth demands to make constant travelers such as me truly happy. However, I acknowledge that SharePoint Portal Server is designed to meet the needs of people who work with documents on a daily basis, so accommodating remote access probably ranks low on the required-feature list.

Microsoft also needs to work out some obvious deployment concerns. How much difficulty will we have with the requirement that users change the way they manage documents if they want to maximize the server's benefits? How can we protect workspaces against virus-infected documents? (Sybari Software and Trend Micro are working on antivirus solutions for the product but didn't have any final releases at the time of this writing.) How do we build a disaster-recovery plan to protect a SharePoint Portal Server machine against hardware failures? Time and experience will help solve these problems, but don't expect nirvana overnight.

A Good Start
For a first release, SharePoint Portal Server incorporates a lot of interesting technology. This easy-to-use document-management server might be a fitting replacement for network file shares and Exchange Server public folders. The server will undoubtedly be a fine fit for organizations that have based their technical infrastructure on Microsoft Windows and Office. Future releases will certainly improve upon this release, but if your organization needs to bring documents under better control sooner rather than later, consider implementing SharePoint Portal Server now.