Check out the new SQL Server Community at http://www.winntmag.com/sql. It features a forum, a refined index to SQL articles, and links to other SQL-related Web sites. You can download the code in this article from the Windows NT Magazine Web site (http://www.winntmag.com).
Q: I must create multiple indexes. Does the order I create them in matter?
You probably know that a clustered index forces SQL Server to store data sorted in the physical order of the index key, but you might not realize the implications of this arrangement when you create multiple indexes on one table. Imagine that you want to use SQL Server's bulk copy program (bcp) to move a large table that has four nonclustered indexes and one clustered index. Because you want to use fast bcp, you drop all five indexes (fast bcp requires that you drop all indexes). When you're ready to re-create the indexes, create the clustered index first. Why? The leaf level (bottom level) of a nonclustered index stores an index key for every row in the underlying table: a 1:1 relationship. If the underlying table has 1000 rows, you'll have 1000 entries in the leaf level of your nonclustered index. Additionally, each entry in the leaf level stores the physical page number of the row the index entry points to.
You can see what's so bad about creating the clustered index last. Creating the clustered index re-sorts the table, which means that every row will probably end up on a different page. Therefore, all the information in the leaf level of the four existing nonclustered indexes will be wrong (because nonclustered indexes contain page numbers for the row they point to). This situation forces SQL Server to re-create each index. In other words, creating the clustered index last wastes time—both yours and SQL Server's.
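A minimal sketch of the right order (the table and index names below are placeholders, not from the original scenario):

```sql
-- After the fast bcp load completes, build the clustered index first;
-- this re-sorts the table and fixes every row's final physical page.
CREATE CLUSTERED INDEX ci_orders ON orders (order_id)

-- Then build the nonclustered indexes, whose leaf-level entries record
-- page pointers against the now-final row locations.
CREATE NONCLUSTERED INDEX nc_orders_cust ON orders (customer_id)
CREATE NONCLUSTERED INDEX nc_orders_date ON orders (order_date)
```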
Q: Does Windows NT Magazine use SQL Server for its Web site?
Windows NT Magazine Webmaster T.J. Harty says, "Yes, indeed. About 80 percent of the pages on the http://www.winntmag.com site originate from SQL Server; that many pages are dynamic. And for the site search, I'm using Index Server."
Q: I'd like some information about what is happening in SQL database devices. Exactly how does SQL interface to the disk when SQL creates database devices? Is SQL bypassing NTFS altogether and using its own device driver? Are these files not accessible by the native file system (i.e., NTFS)? What is the advantage of having database devices? Also, I heard a rumor that database devices were not included in SQL 7.0—true?
SQL Server uses standard Windows NT I/O mechanisms when creating a database device and then reading or writing to that device. SQL Server doesn't bypass the file system drivers. Whether you place devices on NTFS or FAT partitions, NT handles all read/write requests as though the device were a regular file. You can create a device on a raw partition that doesn't use the file system (as you can in UNIX), but you don't have a valid reason to take this approach.
Remember that devices are nothing more than regular files. They're how SQL Server interacts with the operating system (OS) I/O subsystem. Devices came from the old Sybase code, which is where SQL Server has its roots. Typically, devices have a .dat extension, and you can manipulate them like any other file. Of course, the data is unreadable if you open the file directly, as you see in Screen 1. But the data is all there—you can grab it if you know how to look for it.
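As a refresher, you create a device with DISK INIT, and the .dat file it produces is an ordinary file at the path you name (the logical name, path, device number, and size below are placeholder examples):

```sql
-- DISK INIT creates the device file through normal NT file I/O;
-- SQL Server does not bypass the file system drivers.
-- SIZE is specified in 2KB pages, so 51200 pages = 100MB.
DISK INIT
   NAME = 'userdata1',                         -- logical device name
   PHYSNAME = 'C:\MSSQL\DATA\USERDATA1.DAT',   -- ordinary file on disk
   VDEVNO = 10,                                -- unused virtual device number
   SIZE = 51200
```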
As for devices disappearing in SQL 7.0, at press time we're bound by a nondisclosure agreement, so we can't be too specific. But let's just say we wouldn't discourage you from believing that rumor.
Q: I'm one of those SQL programmers who has used the xp command method of using bulk copy program (bcp), and I was particularly interested in what you said in your August column about Distributed Management Object (DMO) BulkCopy. Using SQLOLE piqued my interest, but I have two questions. I need to move data into a table. I've chosen the ImportData method. One property I must set is DataFileType because I have a format file specification. I don't know how to specify the SQLOLE_DATAFILE_TYPE SQLOLEDataFile_UseFormatFile property in my Transact-SQL (T-SQL) statement batch. Any suggestions? Second, I read that ImportData doesn't work and returns an error. Any truth to this? If so, are there any workarounds?
We'll answer the easy part first. ImportData works, with two exceptions. The first exception is related to batch size. You get an error message if the batch size you specify for the BulkCopy object is smaller than the number of rows in your data file. Fortunately, SQL Server loads the data properly, so you can catch the error message in code and ignore it. The second problem deals with importing data that contains explicit identity values. SQL Server assigns the next available identity values to your column even when you set BulkCopy's IncludeIdentityValues property to True. Microsoft Knowledge Base (http://www.microsoft.com/kb) articles Q167867 and Q163450 deal with these problems in detail.
Now for the tricky part of your question: You were right on track when you tried to set the DataFileType of your BulkCopy object to SQLOLEDataFile_UseFormatFile. SQL Server requires this configuration when you import data with a format file. You ran into a problem because the examples in Books Online (BOL) for calling SQL-DMO objects from T-SQL are inadequate. If you look at SQL Server Books Online (in the Microsoft SQL Server group on the Programs Start menu), you'll find that BOL forgets to mention that SQLOLEDataFile_UseFormatFile is an integer constant with a value of 5. The tag SQLOLEDataFile_UseFormatFile is part of the SQLOLE_DATAFILE_TYPE enumerated datatype, and you can use it only from Object Linking and Embedding (OLE) Automation clients that load the Microsoft SQLOLE Object Library and related header files. You can use T-SQL as an OLE Automation client, but T-SQL apparently doesn't have access to the enumerated datatypes listed in BOL.
For example, passing the constant by name, as in

EXEC @hr = sp_OASetProperty @BCPObject , 'DataFileType', 'SQLOLEDataFile_UseFormatFile'

generates the following error message:
Source: ODSOLE Extended Procedure
Description: Type mismatch.
SQL Server is much happier if you pass in 5, the real value that the SQLOLEDataFile_UseFormatFile constant represents.
You can use several methods to find the real values for SQL-DMO enumerated datatypes listed in BOL. We found the value for SQLOLEDataFile_UseFormatFile by opening a debug window in Visual Basic (VB) and typing

? SQLOLEDataFile_UseFormatFile
after we loaded the Microsoft SQLOLE Object Library from the References dialog box. Here's the correct command to set the DataFileType:
EXEC @hr = sp_OASetProperty @BCPObject , 'DataFileType', 5
Listing 1 shows a sample of how you can bcp data into SQL Server using OLE Automation from T-SQL.
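In outline, that approach looks like the following sketch. The server name, login, file paths, and the pubs..authors target are placeholder assumptions, and the error checking you'd want on each @hr return value is omitted for brevity:

```sql
DECLARE @hr int, @srv int, @bcp int, @rows int

-- Create a SQLOLE.SQLServer object and connect to the server
EXEC @hr = sp_OACreate 'SQLOLE.SQLServer', @srv OUT
EXEC @hr = sp_OAMethod @srv, 'Connect', NULL, 'MYSERVER', 'sa', ''

-- Create and configure the BulkCopy object
EXEC @hr = sp_OACreate 'SQLOLE.BulkCopy', @bcp OUT
EXEC @hr = sp_OASetProperty @bcp, 'DataFilePath', 'C:\DATA\AUTHORS.DAT'
EXEC @hr = sp_OASetProperty @bcp, 'FormatFilePath', 'C:\DATA\AUTHORS.FMT'
EXEC @hr = sp_OASetProperty @bcp, 'DataFileType', 5  -- SQLOLEDataFile_UseFormatFile

-- Navigate the object hierarchy to the target table and call ImportData,
-- passing the BulkCopy object; @rows receives the number of rows copied
EXEC @hr = sp_OAMethod @srv,
   'Databases("pubs").Tables("authors").ImportData', @rows OUT, @bcp
PRINT 'Rows imported: ' + CONVERT(varchar, @rows)

-- Clean up the Automation objects
EXEC @hr = sp_OADestroy @bcp
EXEC @hr = sp_OADestroy @srv
```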
Q: We have had SQL Server in production for some time now, but we have just realized that the master database device was created with a 2 GB size! Can we shrink the master database?
You can use the SHRINKDB command in Database Consistency Checker (DBCC) to shrink the master database (see DBCC in the Transact-SQL reference in Books Online—BOL), but unfortunately, you must be in single-user mode and use the MASTEROVERRIDE option. However, we strongly recommend you make a good backup of the master database before doing anything to it, including shrinking. We recommend you create a disk-level backup of master.dat because it is the easiest and quickest way to recover.
Also, remember to stop the SQL Executive and not use SQL Enterprise Manager when you're running SQL Server in single-user mode. If you don't stop these services, you will be using the only free connection, and you won't be able to connect and issue your SHRINKDB command.
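A sketch of the command itself (the target size of 25,600 2KB pages, or 50MB, is an arbitrary example; SHRINKDB can't shrink a database below the space its data actually occupies):

```sql
-- Run with SQL Server started in single-user mode (-m) and with
-- SQL Executive stopped; back up master.dat at the disk level first.
USE master
-- The new size is given in 2KB pages: 25600 pages = 50MB.
DBCC SHRINKDB (master, 25600, 'MASTEROVERRIDE')
```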
Q: I can't dump my database to a network mapped drive. What might be happening?
Permissions are the most common cause of problems in dumping a database to a network drive. The account performing the dump must have the correct permissions to write to the network drive. The account isn't the one that you as a user are logged on to but the one that the SQL Server service runs under. By default, SQL Server will run under the LocalSystem account. Typically, this account can't connect to a remote share and write a file, so it obviously can't dump a database.
Fixing the problem is usually as simple as having SQL Server start as an account that can see and write to the network drive. Use xp_cmdshell "dir \\servername\share" as a permission test. SQL Server can't dump to the drive if it can't list its directory. We used the universal naming convention (UNC) when specifying the remote drive. Drive letters such as G or F exist only when the user is logged on. The SQL Server service account runs as a service so it's never logged on. You must use UNC names to access a remote share point.
The same rules apply if you're using SQL Executive to perform a scheduled database dump. The only difference is that you must check permissions for the account that the SQL Executive service runs under.
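Putting those checks together (the server, share, and database names below are placeholders):

```sql
-- Test whether the SQL Server service account can see the share.
-- If this can't list the directory, the dump will fail too.
EXEC xp_cmdshell 'dir \\FILESERVER\BACKUPS'

-- Dump using a UNC name; mapped drive letters such as G or F
-- don't exist for a service account that is never logged on.
DUMP DATABASE pubs TO DISK = '\\FILESERVER\BACKUPS\PUBS.DMP'
```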
Q: How can I reset an identity column to its original value?
Identity columns are useful when you must automatically generate unique system keys, but sometimes you want to start over; for example, in a development environment where you must run the same test many times, you need the identity values to start from the beginning each time. The TRUNCATE TABLE command removes all rows in a table and resets your identity column to the original seed value. A DELETE statement without a WHERE clause removes all the rows but won't reset the identity value to its original position.
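A quick illustration of the difference (mytable is a placeholder):

```sql
-- Removes all rows AND resets the identity counter to its seed:
TRUNCATE TABLE mytable

-- Removes all rows, but the next insert continues from the old counter:
DELETE FROM mytable
```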
Q: Microsoft published the object linking and embedding database (OLE DB) for online analytical processing (OLAP) draft specification in September. Will it have any effect on the SQL Server community?
Yes, Microsoft's new OLAP server, code-named Plato, will affect SQL Server. However, we don't know how Microsoft will package the server—as part of SQL Server 7.0, or as a separate product. Microsoft will definitely support OLAP, a general term describing decision support activities that are multidimensional (e.g., show me the trend in revenues by district over time) and that support both rollups and drill-downs. OLAP servers typically preaggregate some totals to avoid time-consuming joins of the underlying relational data, and Microsoft has designed the OLE DB for OLAP spec to be generic enough so that all consumers can access and navigate this multidimensional data, regardless of data provider and data store. For more information about OLE DB for OLAP, see http://www.microsoft.com/data/oledb/olap. If you've never used OLAP, Cognos is giving away a copy of its 8MB PowerPlay OLAP client and a 4MB data set of mutual fund data (http://www.cognos.com/funds). You can have fun learning OLAP by exploring mutual fund performance data.
Q: I heard that Microsoft's Professional Developer's Conference (PDC) in San Diego included a couple of presentations on planning for Sphinx. Can you summarize the main points?
David Marshall's two sessions described the main new features in SQL Server 7.0, code-named Sphinx, in terms of the development team's four goals: ease (of installation, deployment, management, and use), performance and scalability, a distributed database, and a good development environment.
We're impressed by the trend toward self-tuning (the server now configures and tunes more than 30 parameters), and better lock management (locking granularity—table, page, or row; the optimizer now controls lock management at the statement level). We also applaud many of the two dozen new wizards—especially the one for setting up server security (which has changed significantly) and new ones for cluster server setup, Open Database Connectivity (ODBC) configuration, replication (which also is vastly improved), and setting up data transformation services (useful for data mart and online analytical processing (OLAP) server applications).
With regard to security, the main new feature is SQL Server roles—groupings that SQL Server (instead of NT groups) owns and administers. Server and database administration have predefined roles, and although you'll still be able to use aliases, you'll probably want to migrate to roles, especially for use in three-tier applications. (The Windows 98 version will require that you use roles.)
Query processing is improved, too. Query processing now includes cost-based intraquery parallelism, support for heterogeneous queries, higher limits (a maximum of 32 tables per join, up from 16), and support for the popular TOP N rows available in Microsoft Access. Microsoft has improved Showplan and optimizer hints, and a new locking engine controls dynamic locking. Microsoft had to rewrite the locking engine anyway to support the new page sizes (increased from the current 2KB to 8KB), and Microsoft has apparently done row-level locking right this time.
SQL Server's biggest weakness has been its lack of upward scalability—you didn't tend to use SQL Server and very large databases (VLDBs) in the same sentence—and we don't have room this month to go into all the new features that support scalability. But take our word for it: You'll find plenty of features—from faster import/export to better secondary indexes and text/image support (using a tree structure instead of linked lists).
As of SQL Server Service Pack 1, the sp_tableoption system stored procedure includes a new Table lock on bulk load option. When you enable this option for a table, bulk loading of data requires only an exclusive table lock on the table. With the option disabled—the default behavior—each newly allocated extent requires an extent lock for the duration of the batch. Enabling the option lets you specify larger batch sizes without needing to increase the sp_configure value for locks. If you're using fast (nonlogged) bcp, this option can significantly improve the speed of bulk copy.
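Enabling the option is a one-line call (the table name is a placeholder):

```sql
-- Take one exclusive table lock for the duration of the bulk load
-- instead of an extent lock for each newly allocated extent.
EXEC sp_tableoption 'mytable', 'table lock on bulk load', 'true'
```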
Cache Hit Ratio Not a Practical Performance Measurement
Caveat emptor! You've probably read that the SQL Server cache hit ratio counter in Windows NT Performance Monitor (Perfmon) is an excellent measure of memory utilization on your server. The theory is that you have enough memory if the cache hit ratio is high; most experts consider 90 percent or higher a good number. Unfortunately, the Perfmon cache hit ratio counter measures the ratio since SQL Server has been running. In other words, Perfmon doesn't measure the cache hit ratio over the configured sampling interval. This method of measuring means that the value is practically useless if your server has been running for a while (count your blessings), because you can't track cache hit ratio trends over time. To make matters worse, cache hit ratio statistics show a page being read from cache, even if the Read Ahead (RA) Manager just performed a physical read to get it off disk. Technically, the read was satisfied from cached data, but that measurement is misleading because the cache hit ratio data implies physical disk I/O wasn't necessary.
Faster than a...
We thought we'd share C.J. Salzano's response to a summertime press release from Microsoft announcing that "continuing its strong momentum, SQL Server, Enterprise Edition 6.5 delivered 12,026.07 tpmC (transactions per minute for the C benchmark) @ 39.38 $/tpmC on a Unisys Aquanta HS/6 server (6xPentiumPro 200 MHz)." C.J. commented: "Yeah, yeah, yeah, we've seen this before—every month or so, a couple thousand more. They're just squirting in some WD-40 to speed things up. I think that every few days Bill's Exchange Server runs this code:
newtpcvalue = (lasttpcvalue + 2000 chunks)
newtpcvalue = (newtpcvalue - acouplechunks)
SendMail (ListOfMyFriends, newtpcvalue, sendnow, myserver)
lasttpcvalue = newtpcvalue
Well, judge for yourself by drifting over to http://www.microsoft.com/sql/tpc. You can find out more about the Transaction Processing Performance Council's performance benchmarks at http://www.tpc.org/faq_TPCC.html.
Crosstabs are never the easiest queries to create in SQL. In July, Ray Fiorillo (Ray_Fiorillo@ios.doi.gov) posted this question on the Microsoft SQL list at http://www.swynk.com, and Ron Talmage (firstname.lastname@example.org) provided a great reply.
Q: I'm attempting to create a type of crosstab query based on a table that (when stripped to the bare essentials relevant to this problem) looks something like this:
I don't need to count Code 2, so I can eliminate it with a WHERE clause. I need the results grouped by category, and one column will use the COUNT() aggregate function to derive total cases, but I must create additional columns based on the value of code. I want the resulting table to look something like Table 1.
Ron suggested the following:
SELECT category, COUNT(*) AS "Total Cases",
(SELECT COUNT(*) FROM mytable t2
WHERE code = '0' AND
t1.category = t2.category) AS "Code = 0",
(SELECT COUNT(*) FROM mytable t2
WHERE (code = 'X' OR code = '8') AND
t1.category = t2.category) AS "Code = X OR 8",
(SELECT COUNT(*) FROM mytable t2
WHERE code = '1' AND
t1.category = t2.category) AS "Code = 1"
FROM mytable t1
WHERE code != '2'
GROUP BY category
Nancy Hidy (email@example.com), a database systems consultant with Shell Services, writes: "We have inexplicable ghost connections that occur periodically, and the problem always seems to be on servers with multiple databases. So we have to coordinate an 'outage' to stop and restart SQL Server just to clean up these connections. These strange connections are just another lovely 'feature' of Microsoft SQL Server. We call it 'recycling SQL Server.'"