NCR primarily designed the Teradata system for data-warehousing applications that are populated by data derived from operational OLTP applications, so the system provides several data-loading tools. Teradata's array of data-loading tools includes FastLoad, MultiLoad, and TPUMP. FastLoad quickly populates empty tables. MultiLoad loads and updates tables that contain existing data, and TPUMP trickle loads data into a table as a continuous background task. Teradata also provides the FastExport tool, which can export data from the Teradata database into a flat text file.
For workload management, the Teradata system supplies an optional Database Query Manager (DBQM) utility that governs the workload that client connections submit. This utility helps you manage server availability by limiting the amount of disk and CPU resources that a given client connection can allocate.
Scalability Testing
To test Teradata's database-growth and multiuser scalability, I conducted several tests on the Teradata database server running on the WorldMark hardware. I conducted these tests on single- and dual-node configurations to determine the scale-out capabilities of Teradata's MPP architecture.
To measure database scalability and query performance, I ran 15 decision-support queries and measured the time Teradata took to complete the entire set. This set of tests indicated how the system scales as the volume of data increases. The queries represented a mixed system workload: seven queries generated a light workload, seven generated a medium workload, and one generated a heavy workload. I ran the query set first on a 10GB database, then on a 100GB database, and measured the time the system needed to complete each run.
In Graph 1, you can see that the single-node configuration ran the query set in 23 minutes and 33 seconds using the 10GB database and in 3 hours, 57 minutes, and 55 seconds using the 100GB database. In a dual-node configuration, Teradata required 17 minutes and 58 seconds to execute the complete set of queries against the 10GB database and 2 hours and 29 minutes to execute the same set against the 100GB database. The goal in this type of test is for the database system to provide linear scalability as the database size increases. Thus, if Teradata requires 17 minutes and 58 seconds (i.e., 1078 seconds) to complete the 10GB test, ideally, the system would complete the 100GB test in 10 times that amount of time (i.e., 10,780 seconds, or 2 hours, 59 minutes, and 40 seconds). My test results show that the Teradata system approached linear scalability in a single-node configuration, where the 100GB run took about 10.1 times as long as the 10GB run, and provided better than linear scalability in a dual-node configuration, where the 100GB run took about 8.3 times as long. These tests also show that adding a second node improves the system's ability to process large databases.
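The linear-scalability comparison is simple arithmetic, and a short Python sketch can reproduce it. The timings are the ones reported for Graph 1; the helper names (to_seconds, scaling_ratio) are illustrative assumptions and aren't part of any Teradata tool. A ratio of 1.0 means perfectly linear scaling, and a value below 1.0 means better than linear.

```python
def to_seconds(hours=0, minutes=0, seconds=0):
    """Convert an elapsed time to whole seconds."""
    return hours * 3600 + minutes * 60 + seconds


def scaling_ratio(small_secs, large_secs, growth_factor):
    """Measured time growth divided by data growth (1.0 = linear)."""
    return (large_secs / small_secs) / growth_factor


# Elapsed times reported in Graph 1: (10GB run, 100GB run).
timings = {
    "single-node": (to_seconds(minutes=23, seconds=33), to_seconds(3, 57, 55)),
    "dual-node": (to_seconds(minutes=17, seconds=58), to_seconds(2, 29, 0)),
}

GROWTH = 10  # the 100GB database is 10 times the size of the 10GB database

ratios = {}
for config, (secs_10gb, secs_100gb) in timings.items():
    ratios[config] = scaling_ratio(secs_10gb, secs_100gb, GROWTH)
    print(f"{config}: ideal {secs_10gb * GROWTH} s, "
          f"measured {secs_100gb} s, ratio {ratios[config]:.2f}")
```

The single-node ratio comes out just above 1.0 and the dual-node ratio comes out near 0.83, which matches the "approached linear" and "better than linear" descriptions.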
To test the system's ability to scale upward as you add users, I ran the same set of 15 queries against the 10GB database using varying numbers of attached client systems, and I measured the time that Teradata required to complete the entire set of queries for all the client systems. To minimize the possibility that the Teradata system would serve results from cached data, each client system ran the query set in a random order. In addition, I randomly selected the values for each query from the set of all possible data values.
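The randomization described above might look something like the following Python sketch. It is not the actual test harness: the query IDs, the parameter domains, and the function name build_client_plan are all hypothetical, because the article doesn't list the real queries or their values. The sketch shows only the idea of giving each client its own shuffled query order and randomly chosen query values.

```python
import random

QUERY_IDS = list(range(1, 16))  # stand-ins for the 15 decision-support queries

# Hypothetical parameter domains; the real queries and values are not listed.
PARAMETER_VALUES = {query_id: ["A", "B", "C", "D"] for query_id in QUERY_IDS}


def build_client_plan(rng):
    """Return a shuffled list of (query_id, parameter_value) pairs for one client."""
    order = QUERY_IDS[:]          # copy so the master list stays in order
    rng.shuffle(order)            # each client runs the set in a random order
    return [(q, rng.choice(PARAMETER_VALUES[q])) for q in order]


rng = random.Random(2000)  # fixed seed only so the sketch is repeatable
plans = {client: build_client_plan(rng) for client in range(1, 5)}

for client, plan in plans.items():
    print(f"client {client}: {[q for q, _ in plan]}")
```

Every client still runs all 15 queries exactly once; only the order and the substituted values differ, so the total work per client stays comparable.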
Graph 2 shows the results of the multiuser throughput tests. Ideally, the database system would provide linear scalability as the number of users increases. In other words, if the query set requires 17 minutes and 58 seconds (i.e., 1078 seconds) to execute at the one-user level, the system would ideally complete the query set for 56 users in 56 times the amount of time required for one user (i.e., 60,368 seconds, or 16 hours, 46 minutes, and 8 seconds). In my tests, at the 56-user level on a dual-node system, the Teradata system completed the query set in 9 hours, 53 minutes, and 27 seconds. In both configurations, the Teradata system provided better than linear scalability because it was able to make use of the overlapping database I/O that the different client systems requested.
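The 56-user comparison can be checked the same way. This Python sketch uses only the figures quoted above; the helper name format_hms is an illustrative assumption. A ratio below 1.0 means the 56 users finished sooner than 56 sequential single-user runs would have.

```python
def format_hms(total_seconds):
    """Render whole seconds as hours, minutes, and seconds."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours} h {minutes} min {seconds} s"


ONE_USER_SECS = 17 * 60 + 58             # dual-node, one user, 10GB database
USERS = 56
MEASURED_SECS = 9 * 3600 + 53 * 60 + 27  # dual-node, 56 users, 10GB database

ideal_secs = USERS * ONE_USER_SECS       # linear-scaling target
ratio = MEASURED_SECS / ideal_secs

print(f"ideal:    {ideal_secs} s ({format_hms(ideal_secs)})")
print(f"measured: {MEASURED_SECS} s ({format_hms(MEASURED_SECS)})")
print(f"measured / ideal = {ratio:.2f}")
```

The measured time is roughly 59 percent of the linear target.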
In addition, the Teradata system's performance improved when I moved from a single-node to a dual-node configuration. However, my test results don't show a linear increase in capacity from adding a second node. The Teradata technical support personnel explained that adding a node to a single-node configuration doesn't usually result in linear scalability, whereas adding two nodes to a dual-node configuration typically results in a linear increase in performance. They also noted that a bigger improvement would have been evident if I had run the tests against the larger 100GB database.
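To put a number on the single-node to dual-node gain, the following Python sketch computes the speedup from the single-user query-set timings reported for Graph 1 (not from the multiuser runs, whose single-node figures aren't quoted in the text). A speedup of 2.0 would be a linear gain from doubling the node count; the variable names are illustrative assumptions.

```python
# Single-user query-set timings from Graph 1, in seconds.
elapsed = {
    "10GB": {"single": 23 * 60 + 33, "dual": 17 * 60 + 58},
    "100GB": {"single": 3 * 3600 + 57 * 60 + 55, "dual": 2 * 3600 + 29 * 60},
}

NODE_FACTOR = 2  # going from one node to two doubles the node count

speedups = {}
for size, times in elapsed.items():
    speedup = times["single"] / times["dual"]
    speedups[size] = speedup
    efficiency = speedup / NODE_FACTOR  # 1.0 would be a perfectly linear gain
    print(f"{size}: speedup {speedup:.2f}x, "
          f"scale-out efficiency {efficiency:.0%}")
```

On these figures the second node yields about a 1.3x speedup at 10GB and about a 1.6x speedup at 100GB, which is consistent with the suggestion that the larger database benefits more from the added node.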
During the multiuser throughput testing, I ran into a couple of problems related to the Teradata ODBC driver. Although the Teradata system provided excellent scalability as the number of users increased, I was unable to run tests using multiple or virtual test applications per client system. Running multiple client applications produced an error in which the Teradata system refused additional client connections. At the time of this writing, NCR was unable to determine the cause of this error. In addition, while I was investigating this condition, I found that the Teradata ODBC driver consumed 100 percent of the client system's CPU while it was executing a query. NCR confirmed that this behavior was typical and resulted from the driver periodically polling the Teradata system. This polling is designed to let the driver process asynchronous queries. The Teradata technical support contacts stated that the driver would let client applications get processor time, but I didn't measure the effect on client performance in my testing.
An Enterprising Database Server
NCR usually sells the WorldMark server and the Teradata database as a packaged solution that includes the system setup and consulting assistance required to get the system up and running. In addition, NCR's Global Support Center (GSC) provides 24 x 7 system and software support. The enterprise-level performance that the Teradata system demonstrated in my testing paralleled the recent TPC-R scores that NCR posted with the Transaction Processing Performance Council (TPC). Running under Windows 2000, the Teradata system posted a top TPC-R score of 21,254 queries per hour. For more information about these scores, go to http://www.tpc.org.
I found the Teradata system to be an excellent platform for enterprise data-warehousing and decision-support applications. The WorldMark 4800 system running Teradata for NT clearly proved its ability to handle very large databases (VLDBs) and provide excellent multiuser scaling. In addition, the system's automated database-tuning and data-placement feature makes it easier to maintain than competing enterprise database solutions that require the administrator to explicitly select the data's physical placement.
Although the Teradata system provides excellent database scaling and query performance, it's not a general-purpose OLTP-type database system. NCR's omission of basic features, such as stored-procedure support, underlines Teradata's strong emphasis on decision-support and data-warehousing implementations.