Scalability, performance, and Java improvements put this server database on top

This is not your father's database. IBM's latest server database release, DB2 Universal Database 5.2, still sports the high-tech equivalents of chrome bumpers and wood trim—but we're talking real wood here, and this machine definitely has more horsepower under the hood than you'd want your kids to drive around with.

The DB2 Heritage
In the mainframe world, IBM and database are almost synonymous terms. Various mainframe versions of DB2 store the data that the world's banks, stock exchanges, and airline reservation systems depend on. But on smaller systems, DB2 was never successful outside dedicated IBM shops.

Aside from providing a strong AIX (IBM's UNIX OS) version, DB2 for minicomputer OSs was a halfhearted effort. You could run it on non-IBM systems, but you really didn't want to because you could run Oracle or Sybase instead.

Over the past 3 years, that situation has changed. DB2 is leading IBM's charge into new markets, and nowhere is this advance more visible than in Windows NT-based organizations. IBM is now the only company other than Microsoft that offers a comprehensive server suite for NT. IBM's NT products have become more polished with each release.

Although IBM is a relative newcomer to the NT market, the company isn't bringing untried technology to the table. In the same way that Microsoft is merging all its OS technology into different versions of Windows 2000 (Win2K), IBM has merged its technology from many database projects into DB2.

IBM's DB2 5.0 release shipped in September 1997. That release was a huge event for the company because DB2 5.0 merged the mainframe-based optimizer and query-processor technology of DB2 Common Server 2.x with the raw performance of DB2 Parallel Edition 1.x, which ran exclusively on AIX. The result was substantially improved performance across the board for customers.

DB2 5.2 is now IBM's core database product for minicomputers. The product runs on Windows NT 3.51 or later, OS/2 Warp Server 3.0 or later, AIX, HP-UX 10.0 or later, SCO UnixWare 7.0, and Sun Solaris. IBM also offers a beta for Linux. The one-user Personal Edition of DB2 runs on Windows 9x. Even the AS/400 is moving from its version of DB2 to DB2 5.2 in third quarter 1999.

DB2's expansion to new OSs means NT users now have access to the best database technology that runs on NT. This statement is especially true insofar as performance and scalability are concerned.

In the ease-of-use category, DB2 5.2 is a solid performer. The product is easier to use than Microsoft SQL Server 6.5 and much easier to use than Oracle. However, Microsoft SQL Server 7.0, which came out in December 1998, sets a whole new standard for ease of administration, hands-off operation, and automatic self-tuning. Once again, no company in the database industry can touch Microsoft in these areas.

In terms of scalability, programmability, and multimedia data handling, Oracle and DB2 lead the pack; SQL Server 7.0 and Sybase Adaptive Server Enterprise trail behind. You'll have to base your choice of a database product on your specific needs. I describe how DB2 5.2 performs, and you can determine how well it matches up with what you're looking for in a server database.

Scalability from One to Many PCs
Any performance analysis of DB2 must evaluate two categories of DB2 5.2 products. DB2 Workgroup Edition and DB2 Enterprise Edition fall into one category. DB2 Enterprise Edition has different licensing from DB2 Workgroup Edition and includes DB2 Connect, IBM's mainframe database gateway. However, both of these DB2 editions provide the same single-server architecture you'd get if you were to buy SQL Server or Oracle.

The other category of DB2 products includes DB2 Enterprise Extended Edition (DB2 EEE), which brings IBM's clustered database architecture to NT. DB2 EEE works with a cluster of servers that looks like one server to your applications.

I installed both DB2 Enterprise Edition and DB2 EEE. The installation procedure is simple and nearly the same for both versions. I was able to log on to both versions the same way, use client applications identically, and even use the same graphical administration tools. (DB2's tools automatically adjust to manage either DB2 Enterprise Edition or DB2 EEE.)

With DB2 EEE, your users' database applications (such as Seagate Crystal Reports or Sybase PowerBuilder) will work just as they would with DB2 Workgroup Edition or DB2 Enterprise Edition. However, DB2 EEE works quite differently under the hood. DB2 EEE distributes work among all the nodes in its cluster and consolidates the results before sending them back to users. Those users won't notice any change from the way their applications work on one server.

I built a DB2 EEE cluster of four servers and found the process to be straightforward. Loading data into a database that runs on four (or eight) machines is different from loading data into a one-server database, and clustering a DB2 server installation makes administrative tasks such as backing up and restoring databases more complex. But most administrators will be able to handle these tasks after reading the DB2 EEE manual.

Clustering's performance improvements are worth the extra effort. To succeed, you must have a fast network link between the machines. (A dedicated 100Mbps network is a must.) Also, make sure to distribute your data evenly among all the nodes in the cluster. DB2 EEE uses a hash partitioning algorithm that has worked well for me. The algorithm might not work well for you, depending on the distribution of your data partitioning column. If IBM's default hashing algorithm doesn't produce an even distribution of data in the cluster, you can choose a new column or adjust the way data is distributed among your servers. DB2 EEE's Db2split utility, as well as a number of other utilities, lets you optimize your cluster configuration.

In my tests, DB2 EEE's performance improved almost linearly with each server I added to the cluster. Performance gains are occasionally even better than linear (i.e., superlinear scaling) when incoming queries touch the same data pages on each server that other queries have touched. However, you can't count on superlinear scaling unless you're a query-tuning wizard.

Parallel Everything
A big trend in the database industry over the past couple of years has centered on moving toward using parallel database engines tuned for multi-CPU servers. Parallelism doesn't refer to the parallel handling of multiple users (something databases have always done); the term refers to processing one query on multiple CPUs simultaneously.

Parallelism can produce extremely large performance increases, especially for reporting and analysis queries. In my tests, DB2 EEE usually performed complex queries three times as fast on a quad server as on a single-processor machine, and sometimes the quad system provided a straight 400 percent increase in speed.

DB2 also uses parallel techniques to quicken database loads, backups, and restores. In version 5.2, DB2 creates indexes in parallel and adds intraquery parallel processing to nodes in a DB2 EEE cluster. (Previously, nodes in a cluster used only one CPU per query.) This process goes by the accurate, if inelegant, moniker intranode intraquery parallelism.

In two- and four-CPU servers, these changes make such a big difference that they are indispensable for firms that want to stay competitive in the database market. Oracle introduced across-the-board parallelism in Oracle8, Sybase did so with Sybase 11.5, and Microsoft came through with the same set of features in SQL Server 7.0.

Database Design
Although SQL Server was catching up to DB2 in terms of parallelism, the opposite was true in terms of database design. With DB2 5.2, you can build real clustered indexes (tables physically organized according to a primary key). Oracle uses a nonstandard definition of a clustered index to mean interleaving primary and foreign key data pages—a much less useful feature than DB2's index clustering is. (Oracle8 supports clustered indexes under the name index-organized tables.)

Previous versions of DB2 make physically ordering data pages in a table easy. However, the previous versions don't keep the order you've established as you insert new rows. You must run a Reorg command manually to return the table to its proper order—an unnecessary step in DB2 5.2.

DB2 5.2 also lets you choose the size of on-disk pages (and matching memory buffer pools). An option on a drop-down menu lets you make the pages either 4KB or 8KB in size. Besides offering performance gains, 8KB pages permit rows as wide as 8101 bytes and increase the number of possible columns in a table from 500 to 1012.

IBM has also added specific enhancements to DB2 for data-warehousing shops, which often need to combine results from many small tables into one large table. DB2 recognizes this kind of data-warehousing query and uses a specially optimized join strategy to respond. However, DB2 (along with SQL Server and Sybase Adaptive Server Enterprise) still doesn't support bitmap indexes. This deficiency is the largest remaining performance hole in DB2's engine. Therefore, Oracle is still the data-warehousing performance champ for queries on low-cardinality data.

Programming Power
In terms of programming, I found DB2 to be alternately head of the class and a slow learner. The database's biggest drawback is that it doesn't supply procedural SQL extensions the way Sybase, SQL Server, and Oracle do. Extensions such as Transact-SQL (T-SQL) and Procedural Language/SQL (PL/SQL) are useful and substantially improve performance. These extensions have become key tools for many DBAs.

DB2 supports custom procedures, but you have to write them in external third-party programming languages such as C, C++, or Java. This limitation makes stored procedures more difficult to write. One problem is that you must use a separate tool to write and compile the code. Another problem is that T-SQL and PL/SQL let you manipulate database data using plain SQL statements or through specialized database objects called cursors. Doing the same kind of manipulation in a general-purpose language is more complex and takes more code. If you're a DBA who writes a lot of business logic into the database server—which you'll need to do if you're not using a midtier application server for that purpose—you'll find this restriction frustrating.

DB2 5.2 includes an ODBC driver that IBM updated to Microsoft's ODBC 3.0 specification. The ODBC driver works well with Microsoft Office and with programming tools such as Visual Basic (VB) and Visual C++(VC++). However, DB2 doesn't include a driver for OLE DB, the database access interface Microsoft developed to replace ODBC.

DB2 offers the best Java support in the industry, Oracle offers the next best, and Sybase and Informix tie for third. I had many options when writing Java code to access DB2 data.

DB2 2.1 was the first major database to support Java Database Connectivity (JDBC) access. DB2 5.2 is the first database to support the next important Java database access standard, SQL Java. SQLJ support lets you write Java with embedded SQL commands just as you might have done with C code, instead of the more complex approach of using JDBC libraries. My Java code became simpler to read and understand when I switched to SQLJ.

Java is a fine choice for writing DB2 stored procedures, and DB2 lets you write Java-based custom database functions and virtual tables. These structures are handy for getting realtime information into the database in relational form. You can also write DB2 stored procedures in C, C++, or any language that lets you create ActiveX objects, such as VB. SQL Server doesn't have any Java support. Microsoft is sticking strictly with T-SQL for internal functions and Windows DLL or ActiveX components for external functions. (You can use Visual J++—VJ++—to write ActiveX objects in Java, but you'll need to buy separate JDBC data access libraries or connect SQL Server through a JDBC-to-ODBC wrapper, because Microsoft doesn't provide JDBC client code.)

DB2 includes a set of extensions, called extenders, that simplify storing, searching through, and managing multimedia data types. Specifically, DB2 has extenders for storing and searching through pictures, video clips, free-form text, and sound data. IBM doesn't offer some types of extenders (e.g., for mapping data), but you can purchase third-party products that provide these extenders.

For HTML-based applications, DB2 has a Web server connector called Net.Data. Net.Data contains a scripting language for generating dynamic Web forms and pages from DB2 data. I found Net.Data's syntax cumbersome and awkward, and I quickly wished I could use something else. I recommend you use a product such as Allaire's ColdFusion or SilverStream Software's SilverStream and save yourself some aggravation.

Graphical Tools and Wizards
IBM invested heavily in its graphical tools between the DB2 2.x and DB2 5.0 release dates. DB2 5.2 contains state-of-the-art graphical capability. You can do almost all of a DBA's typical database management tasks using only DB2's main management console Control Center, which Screen 1 shows. You can create databases, tables, views, triggers, indexes, table spaces, and buffer pools; set up nearly realtime data replication between two DB2 servers; and monitor user activity.

DB2 goes further by providing wizards that IBM calls SmartGuides to help new DBAs tune a system. The SmartGuides help you adjust memory and disk resources for optimal performance to accommodate your anticipated workloads. DB2's HTML-based Help system is excellent—fully searchable and complete—so you can be assured of finding assistance when you need it.

DB2 even beat Microsoft at administration-tool building until SQL Server 7.0 shipped. SQL Server 7.0's autotuning features and wizards redefined the functionality a company could build into a server database. Although I'm comfortable leaving SQL Server 7.0 (or Computer Associates' OpenIngres, also a low-maintenance champ) in a branch office, I'm not comfortable leaving DB2 servers in the hands of anyone who is not a professional DBA. Tuning DB2's variously sized memory caches is a challenge that requires you to know what you're doing. SQL Server 7.0 eliminates the need for such detailed tuning.

The DB2 wizards trade speed for ease of use. For example, when I configured a test database to use autoexpanding disk storage devices, I measured database insert speed at about 273 rows per second during a database load. When I created properly sized database devices ahead of time, performance jumped to 362 rows per second because DB2 wasn't incrementally expanding its devices every few rows.

For professional DBAs, DB2 offers many graphical and command-line tools to coax the last ounce of performance out of each database. You can monitor DB2 databases using a number of realtime performance-monitoring tools (Screen 2 shows several of these tools). You can also set alerts on particular key measurements to warn you when the parameters they monitor exceed the limits you set. DB2 fully integrates with NT's Performance Monitor. DB2 also integrates completely with NT's security layer to provide one logon for NT domain users.

DB2 has an outstanding query-analysis tool. In a four-way test join, the tool showed me in elegant tree format exactly how a DB2 server processed my query, and which type of join (e.g., hash, merge, nested loop) the server was using, as Screen 3 shows. Every database have this kind of graphical query-analysis tool, but SQL Server 7.0 is the only other database that does. DB2 has the most sophisticated optimizer and query rewriter on the market (although Oracle comes close), so you probably won't need to tune queries manually.

Rounding out DB2's tool suite is a Java-based version of Control Center that I ran from a separate PC as a remote management test. The Java version, which Screen 4 shows, looks almost identical to the native Windows version and has almost exactly the same feature set. However, I couldn't run DB2's event-analysis or performance-monitoring tools from the Java version, and the Java version didn't work with Microsoft's Internet Explorer (IE) 4.01 (with Service Pack 1—SP1) Web browser. Several items of text were missing from various buttons and menus. The Java version worked fine with Netscape Navigator 4.5.

The Bottom Line—The Top Pick
IBM is bringing some very sophisticated database technology to the NT platform and is making that technology easy to set up, simple to administer, and inexpensive to buy. For ease of use, SQL Server 7.0 takes top honors. But if you're prepared to deal with more elaborate options, DB2 will accommodate more complex and demanding database needs. In particular, DB2 EEE clustering technology is unique among NT applications and provides a great way to scale your database system without moving to UNIX.

DB2 Universal Database 5.2
Contact: IBM * 800-426-3333
Web: http://www.software.ibm.com/data/db2/udb
Price: $25,000 for 2 to 4 processors and 50 DB2 5.2 Enterprise Edition licenses
System Requirements:
Server: Windows NT Server 3.51 or later, NT Workstation 3.51 or later, Windows 9x, OS/2 Warp Server 3.0 or later, AIX, HP-UX 10.0 or later, SCO UnixWare 7.0, Sun Solaris, or Linux, 32MB of RAM, 150MB of hard disk space, Client: Any OS that the server component runs on, plus Windows 3.1, IRIX, SINIX, SCO OpenServer, DOS, or MacOS.