Enterprise scalability for NT data warehouses
[Editor's Note: Denielle Otey of TECA helped develop the benchmarking program I used to test this product. The Windows 2000 Magazine Lab helped run the tests.]
Many companies are turning to data-warehousing and OLAP decision-support applications to gain a competitive advantage. NCR targets its Teradata 3.0.1 for Windows NT database software for this rapidly growing market. The company designed Teradata to provide enterprise-level scalability through a Massively Parallel Processor (MPP) architecture. MPP lets the product scale upward by dividing the workload among multiple processor nodes that use a shared-nothing architecture. (MPP systems are a collection of two or more loosely coupled SMP processors that use a shared SCSI disk subsystem.)
Teradata runs under NT on standard PC hardware but supports only single-node SMP scalability. On NCR's WorldMark servers, Teradata can support as many as four MPP clustered nodes under NT. The next Teradata version, which NCR plans to release in the summer of 2000, will support as many as 64 processor nodes under NT.
System Architecture
The Teradata system can perform basic online transaction processing (OLTP)-type work, but its primary strength lies in its ability to process queries and support data-warehousing and decision-support applications. The Teradata system's parallel-processing architecture lets it divide a query workload among its processing nodes, which then work independently on the query's different parts.
When the Teradata system first receives an SQL request, the Parsing Engine (PE) checks the request for valid SQL syntax. Each PE is a virtual processor that manages sessions, optimizes queries, and decomposes the incoming SQL statements into parallel steps that multiple nodes execute simultaneously. NCR has developed Teradata's PE since 1983, and it is one of the most highly developed query processors available. You can configure multiple PEs on a multinode system such as the WorldMark 4800 server. The system I tested had two PEs. After the PE optimizes the query, it sends the query instruction steps to the Access Module Processes (AMPs) through the Bynet.
The core of Teradata's database architecture, the AMP, is a virtual processor that retrieves data and performs database-management operations such as joins, locks, and aggregation. Each AMP owns a given set of data that Teradata divides among the AMPs at row level, and Teradata automatically distributes data among the AMPs as you add data to a table. A significant management benefit of the Teradata database is that you don't need to be concerned about the physical placement of data because Teradata offers no way to control data placement. The Teradata system automatically handles all data placement. NCR configured my test system to use 10 AMPs. Figure 1 illustrates Teradata's basic system architecture.
Teradata's software architecture provides enterprise-class reliability. Each AMP is a virtual processor, so you can move AMPs among the processor nodes in the event of a hardware node failure. AMPs also support a high-availability feature called fallback tables, which are basically a copy of a table that Teradata duplicates to different AMPs. Fallback tables provide protection from AMP or disk failure. For an additional level of software fault tolerance, you can configure AMPs in clusters.
Teradata is a fully relational database system that supports the standard ANSI-92 SQL syntax for performing basic SQL Data Definition Language (DDL) and Data Manipulation Language (DML) statements. The Teradata database supports multiple databases as well as views, triggers, and Referential Integrity (RI), which you enable through Referential Constraint Checks that you can define for parent and child tables. The Teradata database also provides support for international languages.
Although Teradata supports the entire range of SQL query options, including outer joins and left outer joins, it lacks support for stored procedures. This omission would be a serious limitation in an OLTP database, but NCR designed the Teradata database primarily as an OLAP decision-support database, which doesn't usually require stored procedures for query-type processing. However, NCR claims that the next release of Teradata will include support for stored procedures.
Teradata also has an unusual requirement in that each table must have a primary index that isn't a separate structure from the table. This requirement is a result of how Teradata distributes rows among different AMPs. The Teradata system uses the index to ensure row uniqueness. Table 1 lists Teradata's capacities under NT.
System Overview
I reviewed Teradata on a WorldMark 4800 system that came in a two-cabinet configuration in which one cabinet housed two 4-way Xeon 550MHz processor nodes, and the second cabinet contained the Bynet interprocessor communication network and the disk subsystem. Each processor node had 1GB of memory, four 9GB 10,000rpm drives, two 10/100 Ethernet LAN connections, and a UPS. The disk subsystem consisted of two NSC disk array controllers with eighty 9GB drives that used a RAID 1 (mirrored) implementation with a total capacity of 360GB. The WorldMark 4800 system provides several fault-tolerant hardware facilities to deliver enterprise-class availability. The system also provides dual-processing nodes, dual Bynets, redundant RAID controllers, multiple LAN connections, and redundant power supplies.
In addition to the two-cabinet server, the WorldMark 4800 includes an Administration Workstation (AWS) that you use to manage and monitor WorldMark MPP systems' processing and storage subsystems. The AWS consisted of a standard NCR PC running NT 4.0. The system had a 600MHz Pentium III processor and 128MB of RAM, one 9GB 10,000rpm drive, and a special Ethernet adapter that connects the workstation to the WorldMark 4800 server. You use a keyboard/video/mouse (KVM) switch to switch access between the AWS and each processor node on the WorldMark 4800. Figure 2 shows the graphical systems-management interface that the AWS provides.
Database Tools and Utilities
The Teradata ODBC driver provides basic client connectivity to the Teradata system. On all client systems that connect to the Teradata system, you must install the ODBC driver and configure a data source that points the client system to the Teradata server. In my testing, I found that the driver was easy to set up but it didn't support unattended installation to client systemsI had to manually configure the driver at each client system. In addition to the ODBC driver, the Teradata system supplies a Java Database Connection (JDBC) driver for Java application development and a WinCLI software development kit (SDK) for non-ODBC-database application development.
Teradata provides several client utilities that let you manage and manipulate the database. The Teradata Manager is a suite of management programs that you use to perform primary administrative tasks. WinDDI is the primary administrative tool in the suite. This tool provides a graphical interface, which Figure 3 shows, that lets you create databases, users, and tables, as well as perform administrative tasks such as setting permissions and creating triggers.
For ad hoc data queries and management tasks, the Teradata system provides the Queryman tool. Similar to SQL Server's Query Analyzer, the Queryman utility lets you enter and run SQL statements from its Query pane. The lower portion of the Queryman interface, which Figure 4 shows, displays the results of executed SQL statements. In this History pane, you can retrieve and rerun an SQL statement by double-clicking the statement.