Q: If you roll back an insert transaction into a table with an identity column, that identity value gets used up, and the table has a gap in the identity numbers. Usually, a gap isn't a problem, but I'm cheating a bit with an accounts receivable system and using an identity column to generate invoice numbers. How can I eliminate the gaps?
We wish we had happier news for you, but no magic setting will make identity behave as you would like. In this case, you have to design a custom number-generation process if you must avoid gaps.
Using identity this way is expensive, because this method serializes all transactions in the database that need to insert into that table. Ouch! You have to serialize transactions (i.e., only one user can receive a new identity value at a time) because you can't assign an identity value to User #10 until you know User #9 has already committed an identity value. Without serialization, you'd have a gap if User #10 received an identity value and User #9 later decided to roll back the transaction.
Generating Key Values Without the Identity Property
Generating key values became much easier when Microsoft introduced the identity property in SQL Server 6.0, but the identity property is not appropriate for all situations. (The identity property automatically generates a key value that you can retrieve with SELECT @@IDENTITY after an insert command executes.) You sometimes want to be able to retrieve the value before the insert takes place--and SQL Server doesn't offer a way to do so with Identity.
Listing 1, page 210, shows a solution for generating key values. This approach is easy to use and doesn't cause the overhead or concurrency problems associated with most key-generation techniques. We have used this approach, and it's always worked well.
This approach is simple. Store the last value used for each of the key types in a NextKeyValue table (see Listing 1), and use GetNextKeyValue, a stored procedure that accepts a key type such as EmployeeId and returns the next available value for EmployeeId. This technique isn't as transparent as identity, but it gives you access to the next key value before the update occurs.
You might be thinking, "Horrible contention will occur if all the key values are stored in the same table and many people are grabbing different types of keys at the same time." We admit that contention might be a problem, but our solution takes advantage of the way updates and locks work in SQL Server and thus avoids most potential problems.
Look at the six Empty1 through Empty6 fields in the NextKeyValue table. These fields make up the first trick you'll need to make the GetNextKeyValue approach work. SQL Server 6.5 doesn't support row locks on updates. However, if you want to avoid serious hot spots (when several users try to grab different key types at the same time), you need row-level concurrency. The empty fields provide fake row locks by padding the size of the row so only one row can fit on a page. SQL Server will still be doing page-level locking, but we've implemented row locks on the table because each page lock affects only a single row. You don't want to use this technique on large tables because it wastes space, but it works on a table with a few dozen rows.
The second trick required to make this approach work is forcing the update to occur in place. (You can perform an update in four ways; for more information, see Knowledge Base article Q135871, "INF: Update Methods Used in SQL Server 6.0," at http://premium.microsoft.com/support/kb/articles/q135/8/71.asp.) Placing a unique clustered index on the KeyName column and ensuring that the KeyValue isn't a variable length data type (which also means it must be NOT NULL) lets GetNextKeyValue perform the update as in-place. (Although Q135871 describes other requirements for performing an update in place, you don't need to worry about them in the example shown in Listing 2.)
As Listing 2 shows, you pass in a key name (such as EmployeeId), and the procedure returns the next KeyName value for you to use in your application. For flexibility, this procedure returns the next key value as an output parameter and as a result set, but you can choose the one that works best for your needs.
What type of locking behavior can you expect with this model? The test scripts in Listing 3 show you how the process handles locks. You must run each script from a separate connection, so you must run them from separate query windows. You see that the first three connections run fine with no blocking. You can determine the next key value for DepartmentId in Connection 2 even though Connection 1 has page locks on the NextKeyTable, as Table 1 shows. Connection 4 is trying to get a key value for AccountId, but Connection 3 is in the middle of a transaction that's trying to do the same thing. Connection 4 will be blocked until Connection 3 either commits or rolls back. Note that users can obtain NextKeyValues for different key types, such as EmployeeId and DepartmentId, but users will block each other if they simultaneously try to grab the next key value for the same key types.
Q: Is there a way to bcp out either single rows (preferably), or perhaps a subset of data rows? I'm trying to initiate the export of a record from a trigger and get one flat file that has all the transactions that were processed during the day.
Unfortunately, there's no direct way to bcp out a single row from a table. We can think of two techniques that might accomplish what you want to do:
- Bcp out of a view. If you can create a view matching the row subset you want, you can bcp out of the view the same way you would out of a table.
- Insert the result set of a stored procedure directly into a table, by using a call something like this:
INSERT INTO ResultSetStorage EXEC CreateMyResultSet
Create a stored procedure using cursor logic that returns the exact result set you want. (This procedure should be easy to create because cursors let you process a result set one row at a time.) Then insert the filtered result set created by the procedure into a temporary holding using the INSERT INTO ResultSetStorage EXEC CreateMyResultSet method. Finally, bcp out from your newly created table. You don't need to use a stored procedure. The EXEC statement will accept any string for dynamic execution.
Removing Replication Settings
If you've ever used SQL Server's replication, you know that getting rid of replication after you've installed it isn't easy. Microsoft provides wizards for setting up replication, but not for undoing it. Fortunately, we came across the following procedure. Microsoft designed the sp_Msunpublish procedure to remove replication-related information from a database that SQL Server has loaded from a dump of a published database. Sp_Msunpublish removes publication and subscription information stored in the database, marks published objects as unpublished, and removes any transactions marked for replication in the transaction log. You can find an installation script and usage instructions at http://support.microsoft.com/download/support/mslfiles/sqlunpub.exe. You can use this procedure with SQL Server 6.0 and 6.5. Microsoft has vastly improved replication in SQL Server 7.0.
Optimizing Your Nightly DBCC Operations
Sometimes software, hardware, or other anomalies result in lost or corrupt data. Microsoft's Database Consistency Checker (DBCC) helps deal with these situations. Books Online (BOL) says that DBCC "is used to check the logical and physical consistency of a database, check memory usage, decrease the size of a database, check performance statistics, and so on.... DBCC helps ensure the physical and logical consistency of a database; however, DBCC is not corrective. It is recommended that you make periodic checks to ensure the logical and physical consistency of your data."
You need to run DBCC periodically to make sure your data is OK. Preventive maintenance is the name of the game. However, running DBCC has a performance penalty, so you can't run it all day looking for potential problems.
Joe Marler, senior Microsoft SQL Server support engineer and a SQL Server expert, offers some advice about running DBCC NEWALLOC instead of DBCC CHECKDB. He talks about some performance differences between the two commands and suggests a minimum frequency for running both commands. Here's what Joe says.
DBCC NEWALLOC checks for allocation inconsistencies and every page chain of every table and index, the most common and serious database problems. Because NEWALLOC I/O is sequential, it's much faster than CHECKDB. On a large Compaq ProLiant-class machine, NEWALLOC can often check 5GB per hour. Furthermore, NEWALLOC tends to scale upward linearly with the amount of data in use and imposes no locking concurrency overhead. Starting with SQL Server 6.5 Service Pack 2 (SP2), you can run NEWALLOC online--that is, without putting the database in single-user mode--without getting many spurious errors.
By contrast, CHECKDB requires lots of random I/O--about one I/O per row per nonclustered index. For this reason, CHECKDB takes a long time and doesn't scale predictably with database size, because scaling is a function of the number of nonclustered indexes and number of rows in the tables. Also, CHECKDB takes a share table lock, thus affecting concurrency significantly.
Many people don't run DBCC because it takes so long. They often seem unaware that NEWALLOC checks most of the serious things in the database, and it has no locking overhead. Although running both NEWALLOC and CHECKDB is best, you can often get by with running NEWALLOC often--and running CHECKDB less frequently (e.g., weekly).
Reader Tip: DBCC PERFLOG, Logins, and Passwords
In your November 1997 column in the second tip for using the DBCC PERFLOG output, you wrote, "Although we can't insert DBCC PERFLOG output directly into our tracking table, we insert the result set from a stored procedure." SQL Server 6.5 has the data piping ability with more than stored procedures. You can use the INSERT ... EXECUTE(
insert PerfLogOutput (DBName, LogSize, SpaceUsed, Status) execute('dbcc perflog')
You can use any dynamic SQL statement to populate a table. The only restriction is that you can't return multiple result-sets. Here's another example:
insert PerfLogOutput (DBName, LogSize, SpaceUsed, Status) Execute('Select ''Test'' as DBName, 0.0 AS LogSize, 0.0 AS SpaceUsed, convert(bit, 1) AS Status')
I wrote two stored procedures--sp_logininfo.sql and sp_droplogin_all.sql-- that you might find useful (the source code for these stored procedures is available on the Windows NT Magazine Web site). You can use sp_logininfo.sql to obtain information about a SQL Server login. If you execute the stored procedure with no parameters, it will report whether the current user is connected through a standard or an integrated/mixed type login--an important capability for Windows NT security. You can also store the output into variables to make it easy to use from other stored procedures. Another stored procedure, sp_droplogin_all.sql, drops a login from all databases, whether the login is a user or is using an alias as someone in that database.ing procedure. Microsoft designed thber column about sp_TransferPasswords, you can simplify the statement
SELECT name, password INTO #RemoteSysLogins FROM master..syslogins WHERE 0 = 1
SELECT convert(sysname, space(30)) AS name, convert(sysname, space(30)) as password INTO #RemoteSysLogins
Of course, you have to hard-code the lengths, but you can also use this feature to add dummy rows to a result-set. For example:
create table #some_table (seqno int identity not null) insert #some_table default values select convert(int, -1) as seqno union all select seqno from #some_table
This alternative SELECT syntax highlights the fact that a SELECT statement can return static values not retrieved from a table, but we don't agree that this approach is always simpler. The proposed change would have required us to explain both the convert and space functions and mention the undocumented sysname internal datatype. Our original SQL Server code is easy to read and shows a useful trick for populating an empty temp table by using a WHERE clause that will never evaluate to true.
Q: In the August issue, you wrote about whether dumping a database on one server and then restoring it on another was safe. You said that a mismatched devices problem could occur if the data and log device allocation were different from the original. After the restore, can you tell (e.g., from the Database Consistency Checker--DBCC--checks) whether this problem has occurred?
Furthermore, what if I had a 100MB database with a 75MB data portion and a 25MB log spanned over four devices? Would consolidating the database on one 25MB log device and one 75MB device cause a problem if I had done a restore based on the original four-device database?
DBCC might report an error message, and you might be able to safely dump a database on one server and then restore it on another and have the device allocation match. But we're not aware of any foolproof detection method, so don't tempt fate by trying this procedure. You need to have a script of sp_help_revdatabase (see the question on backups in our March column) to re-create the databases in the correct order.
Regarding your second question, the number of devices doesn't technically have anything to do with difficulties in restores. The issue is database fragments--information from the syssegments table. Fragments will reflect the order in which you created the table, as in this example:
CREATE DB Test 1 Data = 10 Log = 10 Alter Table Data = 100
We create a table with three fragments: 10 for data, then 10 for log, then 100 for data. The devices don't matter. The descriptions of sp_coalesce_fragments and sp_help_revdatabases in Books Online (BOL) might be useful for what you're trying to do.
Send your questions to us individually, or send them to us collectively at firstname.lastname@example.org.