Letting your PCs do the talking lowers your bottom line

Communication networks carry two types of traffic: data and voice. Many companies still maintain a dedicated voice network and a separate data network but are beginning to see the value (e.g., lower cost, flat-rate pricing) of data networks based on Voice over IP (VoIP). VoIP is a technology that lets you make telephone calls and send faxes over an IP data network as if you were using a traditional Public Switched Telephone Network (PSTN). This capability lets companies reduce telephone and fax costs; converge data, voice, fax, and video services; and build new network infrastructures for advanced e-commerce applications (e.g., Web call centers).

VoIP has gained support from standards organizations, such as the International Telecommunication Union (ITU) and the Internet Engineering Task Force (IETF), and from communication vendors. Today, you can find VoIP products on the market and build VoIP-enabled networks. Before you implement VoIP in your enterprise, become familiar with its applications, underlying technologies, and H.323 and other VoIP standards, and explore several VoIP deployment considerations.

VoIP Applications
Voice communications play a fundamental role in daily life. In the near term, PSTNs will remain an important vehicle of voice delivery. VoIP, however, provides a competitive alternative to PSTNs by reproducing telephone capabilities at a significantly lower cost. VoIP is applicable to almost any type of voice communication, from simple interperson or interoffice calls to complicated teleconferences. The following examples of VoIP applications are possibilities that might work for your enterprise.

Telephone and PC communication. You can enable voice communication between traditional telephones on a PSTN and PCs in an IP network by connecting the IP network to the PSTN with an IP-PSTN gateway, as Figure 1 shows. The gateway interprets protocols and voice information for the two networks. With this configuration, a PC user can use a PC-based telephone to call a gateway close to the final call destination. The phone charge is based on the distance between the gateway and the destination.
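
Choosing "a gateway close to the final call destination" is essentially a prefix lookup on the dialed number. The sketch below assumes a hypothetical table of dialing prefixes and gateway host names (none of these come from a real product) and picks the gateway with the longest matching prefix, so a more specific local gateway wins over a national default.

```python
from __future__ import annotations

# Hypothetical gateway table: dialing prefix -> gateway that hops off to the PSTN there.
GATEWAYS = {
    "1": "gw-us-default.example.net",
    "1212": "gw-newyork.example.net",
    "1415": "gw-sanfrancisco.example.net",
    "44": "gw-london.example.net",
}


def pick_gateway(dialed_number: str, gateways: dict[str, str]) -> str | None:
    """Return the gateway whose prefix is the longest match for the dialed number."""
    digits = "".join(ch for ch in dialed_number if ch.isdigit())
    best_prefix = None
    for prefix in gateways:
        if digits.startswith(prefix) and (best_prefix is None or len(prefix) > len(best_prefix)):
            best_prefix = prefix
    return gateways[best_prefix] if best_prefix is not None else None


print(pick_gateway("+1 212 555 0100", GATEWAYS))  # gw-newyork.example.net
```

Longest-prefix matching keeps the PSTN leg of the call as short as possible, which is what keeps the charge low when the charge depends on the gateway-to-destination distance.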

Internet users can even receive telephone calls directly on their PCs. For example, many Internet users are dial-up users who have only one telephone line. While such a user is connected to the Internet, the phone line is busy, so the user can't receive phone calls. However, VoIP software, such as eRing Solutions' itRings!, lets an Internet-connected PC receive a call from a person who uses a traditional telephone; the PC user talks and listens through a microphone and speakers. To reach the Internet user, the PSTN caller dials an itRings! gateway and then enters the Internet user's phone number. The itRings! gateway, which links the PSTN and the Internet, then establishes communication between the two users.

Interoffice trunking. To handle intraoffice and interoffice voice communications, your company might have its own PBXs distributed throughout different locations and branch offices. Traditionally, you lease lines, called tie trunks in telecommunications terminology, from telecom carriers to interconnect PBXs. To reduce costs and consolidate network facilities, you can use your IP network to link these PBXs instead. In Figure 2, IP-PSTN gateways connect the PBX at each of two locations to the IP data network, so voice and data share the same network. The IP-PSTN gateway can often compress a voice call (e.g., from 64Kbps to 8Kbps), which reduces the bandwidth requirement on the data network. An IP-PSTN gateway can be either a dedicated device or an integral part of a PBX. A PBX that incorporates IP-PSTN gateway functionality is often called an IP PBX or iPBX.
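
Compression saves less on the wire than the codec rate suggests, because every voice packet also carries IP, UDP, and RTP headers. The sketch below estimates per-direction bandwidth for a 64Kbps codec (G.711) and an 8Kbps codec (G.729), assuming 20-ms packets, IPv4 without options, no header compression, and no link-layer overhead; real figures depend on your gateway's settings.

```python
# Header sizes in bytes: IPv4 (no options) + UDP + RTP fixed header.
IP_UDP_RTP_OVERHEAD_BYTES = 20 + 8 + 12


def voice_stream_bps(codec_bps: int, packet_ms: int = 20) -> float:
    """IP-layer bandwidth of one direction of a call, ignoring link-layer framing."""
    packets_per_second = 1000 / packet_ms
    payload_bytes = codec_bps * packet_ms / 1000 / 8
    return (payload_bytes + IP_UDP_RTP_OVERHEAD_BYTES) * 8 * packets_per_second


for name, rate in (("G.711", 64_000), ("G.729", 8_000)):
    print(f"{name}: {voice_stream_bps(rate) / 1000:.0f} kbps per direction")
# G.711: 80 kbps per direction
# G.729: 24 kbps per direction
```

Under these assumptions, headers roughly triple the 8Kbps stream, so larger packets (e.g., `packet_ms=40`) cut overhead at the cost of added delay.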

Remote access for mobile users. Many companies provide mobile users with remote access to the company's intranet through dial-up and VPN services. From their remote computers, mobile users can often access corporate data resources but not corporate voicemail and fax services. To make phone calls and use fax services, these users need additional analog lines. If the company's voice and fax systems are IP-enabled, however, mobile users can access voice and fax services from the same computer that they use for remote data access. Mobile users can dial their ISP's local Point of Presence (POP) and use the Internet to submit and retrieve voice messages.

Multimedia applications. VoIP multimedia applications, such as Microsoft NetMeeting and Microsoft Exchange 2000 Conferencing Server, enable workers in different locations to use their networked PCs to collaborate through realtime conferencing, whiteboarding, and screen sharing. These applications let users talk with one another from their PCs over an IP network, which saves travel time and expense and improves work efficiency.

E-commerce. Customers often browse a vendor's Web site for information. Sometimes they might not find the information they seek or might have questions about the information they find. To serve customers better, an e-commerce Web site can provide an interactive mechanism that lets customers talk to a live customer service agent in a call center. This voice-enabled e-commerce Web site integrates the Internet, PBX, and call center into one system.

When a customer has a question, he or she can click a Speak to an Agent button on the vendor's Web site. The customer then fills out and submits a short form that provides the vendor with contact information and details about what information the customer is seeking. The Web site directs the form to the proper customer service agent in the call center and establishes a voice channel between the customer's PC and the agent's telephone set in the PBX-based call center. During the conversation, the agent can push Web pages to the customer's screen and guide the customer to the correct information. Avaya's Electronic Interactive Voice Response (e-IVR) and Cisco Systems' Cisco Customer Interaction Suite enable voice interaction between customers and agents at e-commerce Web sites.

H.323 History
One cornerstone of VoIP technology is the H.323 standard. H.323 defines realtime multimedia communications—audio, video, and data—over packet-based networks, such as IP-based networks. The ITU specified the H.323 standard and, in 1996, released the first version, which focused heavily on multimedia communications in LAN environments that don't provide a guaranteed Quality of Service (QoS).

At the time the ITU released this standard, researchers were experimenting with voice communications over the Internet. Their proprietary methods for setting up calls, compressing voice traffic, and locating communicating parties resulted in incompatible products. Researchers and vendors quickly realized the need for a VoIP standard and adopted H.323 to achieve interoperability between VoIP products. New VoIP requirements influenced the development of the second version of H.323: communication between a PC-based phone and a traditional phone, voice communication between two PC users over the Internet, voice quality over the Internet, and call authentication and authorization. In 1998, the ITU released H.323 version 2, which accommodates these requirements. This version applies not only to LAN environments but also to WAN and metropolitan area network (MAN) environments. In 2000, the H.323 standard evolved to version 3, which incorporates features such as fax over packet networks, fast call setup, and communication between gatekeepers.

H.323 is part of the ITU's H.32x standard family. Other H.32x standards define multimedia communications over other network types (e.g., H.320 for ISDN; H.321 and H.310 for Broadband ISDN, or B-ISDN; H.322 for LANs that provide guaranteed QoS; and H.324 for switched-circuit networks, such as PSTNs).

H.323 Components
The H.323 standard comprises four components: terminals, gateways, gatekeepers, and Multipoint Control Units (MCUs). Together, these components can provide point-to-point and point-to-multipoint multimedia communications.

An H.323 terminal provides realtime multimedia communications with other terminals; it supports voice and, optionally, video and data. The terminal can be a PC-based phone, a standalone device (e.g., an IP phone), or an application running on a PC. Through gateways, H.323 terminals can also interoperate with other H.32x terminals.

An H.323 gateway interconnects and enables communication between H.323 networks and non-H.323 networks, such as PSTNs. The gateway translates protocols for call setup and release and converts and transfers information between the two networks. (However, for communication between two H.323 terminals in an H.323 network, you don't need a gateway.) A popular H.323 gateway is the IP-PSTN gateway that connects an IP network and PSTN and lets an H.323 terminal talk to a traditional telephone in the PSTN.

An H.323 gatekeeper is the central point of an H.323 network and provides call control within the network. Its functions include address translation, admission control, bandwidth management, and tracking and reporting conversation times. A gatekeeper is optional, but when it's present, H.323 terminals and gateways (aka endpoints) must use it.

An H.323 MCU manages conferences between three or more H.323 terminals. When terminals participate in a conference, they must establish a connection to the MCU. The MCU ensures that all terminals in the conference have a common level of communication, controls conference resources (e.g., the specific terminal that multicasts video), and determines which audio or video coder-decoder (codec) to use between the terminals. In addition, the MCU can optionally provide centralized processing of conference media information streams.

Figure 3 depicts an H.323 network that contains the standard's four components and the interconnection with a switched-circuit network. The gateway, gatekeeper, and MCU are separate systems in Figure 3, but they're logical components, and vendors can implement all three components in one system. A collection of terminals, gateways, and MCUs managed by one gatekeeper in an H.323 network is an H.323 zone. Only one gatekeeper exists per zone, and you can have one or more zones across your IP network.

H.323 Protocols
The H.323 standard includes seven major protocols: audio codec; video codec; H.225.0 registration, admission, and status (RAS); H.225.0 call signaling (based on Q.931); H.245 control signaling; Real-time Transport Protocol (RTP); and RTP Control Protocol (RTCP). The audio codec encodes the voice signal from the H.323 terminal's microphone for transmission on the H.323 network and decodes the audio that the terminal receives from the network. An H.323 terminal supports one or more audio coding and decoding algorithms, including G.711, G.722, G.723.1, G.728, and G.729 (the ITU specified all these algorithms). H.323 requires vendors of H.323 terminal implementations to support G.711. For two terminals to understand each other, they must support at least one common audio codec algorithm.
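
The "at least one common codec" rule amounts to a set intersection over the terminals' capability lists. In real H.323, capabilities travel in H.245 TerminalCapabilitySet messages and the selection rules are richer; the sketch below simply assumes that the first terminal's list is in preference order and that its first commonly supported codec wins. It accepts any number of lists, so it also covers the common-codec decision an MCU makes for a conference.

```python
from __future__ import annotations

MANDATORY_AUDIO_CODEC = "G.711"  # every H.323 terminal must support G.711


def supports_mandatory_codec(capabilities: list[str]) -> bool:
    """True if a terminal's capability list includes the mandatory codec."""
    return MANDATORY_AUDIO_CODEC in capabilities


def choose_common_codec(*capabilities: list[str]) -> str | None:
    """Return the first codec, in the first terminal's preference order,
    that every terminal supports; None if there is no common codec."""
    if not capabilities:
        return None
    common = set(capabilities[0]).intersection(*capabilities[1:])
    for codec in capabilities[0]:
        if codec in common:
            return codec
    return None
```

For example, `choose_common_codec(["G.729", "G.723.1", "G.711"], ["G.711", "G.729"])` picks G.729. Because compliant terminals always list G.711, the function never returns `None` for them.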

Similarly, the video codec encodes the video signal that a camera on an H.323 terminal captures for transmission on the H.323 network and decodes received video for display on the terminal. Video codec support is optional in H.323 implementations. The H.323 video codec algorithms are H.261 and H.263. Two H.323 terminals exchanging video must support at least one common video codec algorithm.

H.225.0 RAS is a client/server protocol for use between an endpoint and the gatekeeper. This protocol defines how H.323 endpoints locate and register with the gatekeeper. H.225.0 RAS also defines how the gatekeeper locates endpoints, admits the endpoints into the zone, and specifies their access permission.

H.225.0 call signaling is a protocol for setting up and releasing calls; during setup, it establishes the connection for H.245 control signaling. H.245 control signaling is a protocol for exchanging control messages, such as a terminal's audio and video capabilities, and negotiating call features between two communicating endpoints.

H.323 borrowed RTP and RTCP from the IETF, which defines both protocols in Request for Comments (RFC) 1889. VoIP applications are typically realtime audio and video applications. RTP provides end-to-end delivery services for data that requires realtime support. The services include payload type identification (which tells the receiver how the payload, such as G.711 audio, is encoded), sequence numbering, timestamping, and delivery monitoring. In VoIP, RTP runs over UDP and uses UDP's multiplexing and checksum functions. RTCP is a companion protocol of RTP that monitors the quality of data delivery through feedback reports that RTP senders and receivers exchange. RTCP also carries a transport-level identifier for each RTP source, the canonical name (CNAME), which receivers use to associate related streams, for example to synchronize audio and video from the same participant.
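
The services listed above map directly onto fields of the 12-byte fixed RTP header, and RTCP's quality feedback includes an interarrival jitter estimate. The sketch below packs and parses that header (assuming no padding, header extension, or CSRC list) and implements the RFC 1889 jitter formula. It uses payload type 0, the static type for G.711 mu-law (PCMU) in the RTP audio/video profile; SSRC and timestamp values are arbitrary.

```python
import struct

RTP_VERSION = 2
PT_PCMU = 0  # static payload type for G.711 mu-law audio (RFC 1890)


def pack_rtp_header(payload_type: int, seq: int, timestamp: int, ssrc: int,
                    marker: bool = False) -> bytes:
    """Build the 12-byte fixed RTP header: no padding, no extension, no CSRC list."""
    byte0 = RTP_VERSION << 6                              # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def unpack_rtp_header(data: bytes) -> dict:
    """Parse the fixed part of an RTP header (ignores any CSRC list that follows)."""
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", data[:12])
    return {"version": byte0 >> 6, "marker": bool(byte1 >> 7),
            "payload_type": byte1 & 0x7F, "seq": seq,
            "timestamp": timestamp, "ssrc": ssrc}


class InterarrivalJitter:
    """Running jitter estimate carried in RTCP reports (RFC 1889):
    J += (|D| - J) / 16, where D is the change in transit time between packets."""

    def __init__(self) -> None:
        self.jitter = 0.0
        self._prev_transit = None

    def update(self, rtp_timestamp: int, arrival: int) -> float:
        # Both arguments must use the same clock units, e.g., 8000 Hz ticks for G.711.
        transit = arrival - rtp_timestamp
        if self._prev_transit is not None:
            d = abs(transit - self._prev_transit)
            self.jitter += (d - self.jitter) / 16
        self._prev_transit = transit
        return self.jitter
```

With G.711 at 8000 samples per second, a 20-ms packet advances the RTP timestamp by 160; a packet that arrives 160 ticks later than that spacing predicts raises the jitter estimate from 0 to 10 ticks.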

Figure 4 shows how the H.323 protocols fit into the VoIP protocol stack. The audio and video codecs run on top of RTP and RTCP; H.225.0 RAS, RTP, and RTCP run on top of UDP; and H.225.0 call signaling and H.245 control signaling run on top of TCP.

H.323 Procedures
Let's look at how the H.323 components work together and how H.323 uses the protocols to process a VoIP call. Two terminals can communicate directly without a gatekeeper, but the gatekeeper provides several useful functions, such as admission control and bandwidth management. A good H.323 network includes a gatekeeper.

Before one terminal can talk to another terminal in a gatekeeper-governed H.323 network, the terminal must have admission from the gatekeeper. The terminal discovers the gatekeeper through a static or dynamic method. In the static method, you configure the terminal with the gatekeeper's IP address. In the dynamic method, the terminal sends a gatekeeper request message to a multicast address, and the gatekeeper listening on that address answers with a gatekeeper confirmation message that contains the gatekeeper's IP address. Suppose that Terminal 1 wants to call Terminal 2. Terminal 1 sends the gatekeeper an admission request that asks permission to talk to Terminal 2. The gatekeeper responds with an admission confirmation message that includes Terminal 2's IP address. The gatekeeper can instead deny the request if its access control list (ACL) shows that Terminal 1 doesn't have permission to talk to Terminal 2 or if the current bandwidth use has exceeded a set threshold. This admission procedure uses the H.225.0 RAS protocol.
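
The gatekeeper's admission decision can be sketched as three checks: address translation, the ACL, and the bandwidth threshold. ARQ, ACF, and ARJ are the H.225.0 RAS abbreviations for the admission request, confirm, and reject messages; the class, aliases, addresses, and per-call bandwidth figures below are hypothetical, and the sketch omits registration, discovery, and the encoding of real RAS messages.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Gatekeeper:
    """Toy gatekeeper: address translation, an ACL, and a zone bandwidth threshold."""
    address_table: dict[str, str]      # registered alias -> IP address
    acl: set[tuple[str, str]]          # (caller, callee) pairs allowed to talk
    bandwidth_limit_kbps: int
    bandwidth_in_use_kbps: int = 0

    def admission_request(self, caller: str, callee: str, kbps: int) -> tuple[str, str | None]:
        """Answer an ARQ with ("ACF", callee_ip) or ("ARJ", None)."""
        if callee not in self.address_table:
            return ("ARJ", None)       # destination not registered in this zone
        if (caller, callee) not in self.acl:
            return ("ARJ", None)       # ACL forbids this pair
        if self.bandwidth_in_use_kbps + kbps > self.bandwidth_limit_kbps:
            return ("ARJ", None)       # call would exceed the bandwidth threshold
        self.bandwidth_in_use_kbps += kbps
        return ("ACF", self.address_table[callee])

    def disengage(self, kbps: int) -> None:
        """Release bandwidth when a call ends."""
        self.bandwidth_in_use_kbps = max(0, self.bandwidth_in_use_kbps - kbps)
```

With a 100-kbps zone limit, a first 80-kbps call is admitted and a second one is rejected until `disengage` releases the bandwidth.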

When Terminal 1 receives the admission confirmation, it opens a TCP connection to Terminal 2 for the H.225.0 call-signaling protocol and uses that protocol to send Terminal 2 a setup message. When Terminal 2 receives the setup message, it goes through the same admission procedure that Terminal 1 did, requesting permission from the gatekeeper to talk to Terminal 1. After Terminal 2 receives its admission confirmation, it uses H.225.0 call signaling to send Terminal 1 a connect message that includes the TCP port number that Terminal 2 wants to use for H.245. Terminal 1 then establishes a second TCP connection, to that port. Through this connection, both terminals use the H.245 control-signaling protocol to exchange terminal-capability information and negotiate the features of the call. Before using H.245 to open audio channels, the terminals also determine which terminal is the master and which is the slave for the call. Each terminal then opens a unidirectional logical channel for audio transmission. Terminal 1 sends Terminal 2 an H.245 open-logical-channel request for an audio channel; the request includes the UDP port on which Terminal 1 will receive RTCP receiver reports. Terminal 2 replies with an H.245 open-logical-channel acknowledgment that indicates the UDP ports to which the RTP audio stream and the RTCP sender reports should be sent. In the same way, Terminal 2 opens a second audio channel from itself to Terminal 1. The terminals have now completed call setup and can start to exchange voice. Figure 5 illustrates this process.
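
The call setup just described can be written out as an ordered message trace. The message names are the real H.225.0 and H.245 ones, but the sequence is simplified to match the text: real calls usually add Call Proceeding and Alerting messages, and master/slave determination involves both terminals exchanging values, which the sketch collapses into one request and acknowledgment.

```python
from typing import List, Tuple

Message = Tuple[str, str, str, str]  # (sender, receiver, protocol, message name)


def h323_call_trace(t1: str = "Terminal 1", t2: str = "Terminal 2",
                    gk: str = "Gatekeeper") -> List[Message]:
    """Simplified H.323 call setup between two terminals in one gatekeeper zone."""
    ras, q931, h245 = "H.225.0 RAS", "H.225.0 call signaling", "H.245"
    trace: List[Message] = [
        (t1, gk, ras, "AdmissionRequest"),
        (gk, t1, ras, "AdmissionConfirm"),      # includes Terminal 2's address
        (t1, t2, q931, "Setup"),                # over the first TCP connection
        (t2, gk, ras, "AdmissionRequest"),
        (gk, t2, ras, "AdmissionConfirm"),
        (t2, t1, q931, "Connect"),              # carries Terminal 2's H.245 TCP port
    ]
    # Over the second (H.245) TCP connection: capability exchange in both directions.
    for a, b in ((t1, t2), (t2, t1)):
        trace += [(a, b, h245, "TerminalCapabilitySet"), (b, a, h245, "TerminalCapabilitySetAck")]
    # Master/slave determination (collapsed to one round trip here).
    trace += [(t1, t2, h245, "MasterSlaveDetermination"), (t2, t1, h245, "MasterSlaveDeterminationAck")]
    # Each side opens its own unidirectional audio channel.
    for a, b in ((t1, t2), (t2, t1)):
        trace += [(a, b, h245, "OpenLogicalChannel"), (b, a, h245, "OpenLogicalChannelAck")]
    return trace


for sender, receiver, protocol, name in h323_call_trace():
    print(f"{sender:>10} -> {receiver:<10} {protocol:<22} {name}")
```

Counting the entries makes the text's point concrete: even this trimmed version needs 16 messages across three protocols before any voice flows.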

This example doesn't describe how a terminal talks to a terminal outside the H.323 network, such as a telephone on a PSTN. However, the process is similar: Terminal 1 communicates with a gateway, and the gateway converts the protocols and messages between Terminal 1 and Terminal 2.

Other VoIP Standards
H.323 has received support from VoIP vendors, but two other VoIP standards are also gaining acceptance in the industry. The IETF developed Session Initiation Protocol (SIP, defined in RFC 2543) and Media Gateway Control Protocol (MGCP, defined in RFC 2705) for the Internet.

As you've learned, call setup in H.323 isn't simple: it requires admission control, connection establishment, call-feature negotiation, and audio-channel opening, and it involves several protocols. As a result, setting up a VoIP call with H.323 can take much longer than setting up an average PSTN call. Although H.323 version 3 speeds call setup, vendors haven't widely implemented this version in their products.

Compared with H.323, SIP is a lightweight protocol in regard to call setup. When a user wants to call another user, the caller initiates the call with an invite message, which contains information such as the caller's identification and the call features and services that the caller wants to use. The caller sends the invite message to an SIP server, which functions either as a proxy or as a redirect server. The caller learns the SIP server address by querying a DNS server. When an SIP proxy server receives the invite message, it uses a location service to find the called party and forwards the invite message to the called party. The called party then sends an OK response to the SIP proxy server, which forwards the OK response to the caller. The caller then sends the called party an acknowledgment through the proxy server, which completes the call setup. Figure 6 depicts the SIP call setup process with a proxy server. In contrast, when an SIP redirect server receives an invite message, it returns the called party's location to the caller, which can then send the invite message directly to that location. The SIP redirect server can also perform call-party authentication and authorization.
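
The two server roles produce different message flows, sketched below as ordered lists. The status line "302 Moved Temporarily" is the SIP redirect response, whose Contact header tells the caller where to reach the called party; provisional responses such as 100 Trying and 180 Ringing are omitted, and the party names are placeholders.

```python
from typing import List, Tuple

Step = Tuple[str, str, str]  # (sender, receiver, message)


def sip_call_flow(mode: str, caller: str = "caller", callee: str = "callee",
                  server: str = "server") -> List[Step]:
    """Simplified SIP call setup via a proxy or a redirect server."""
    if mode == "proxy":
        return [
            (caller, server, "INVITE"), (server, callee, "INVITE"),
            (callee, server, "200 OK"), (server, caller, "200 OK"),
            (caller, server, "ACK"), (server, callee, "ACK"),
        ]
    if mode == "redirect":
        return [
            (caller, server, "INVITE"),
            (server, caller, "302 Moved Temporarily"),  # Contact header gives the callee's location
            (caller, server, "ACK"),                    # acknowledges the 302
            (caller, callee, "INVITE"),                 # caller now contacts the callee directly
            (callee, caller, "200 OK"),
            (caller, callee, "ACK"),
        ]
    raise ValueError(f"unknown mode: {mode!r}")
```

In proxy mode the caller and callee never exchange signaling directly; in redirect mode the server drops out after telling the caller where to go.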

SIP uses an Internet-standard URL to identify an SIP client or user (e.g., sip:john@acme.com, which represents user John's SIP identification). Because this format resembles an email address, you can often guess a person's SIP identification from his or her email address. You can also embed an SIP URL in a Web page; clicking the URL in the page initiates a call to the person whom the URL represents.
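
Because SIP URLs look like email addresses, the guessing and embedding described above take only a few lines. The helpers below are a sketch: guessing a SIP URL from an email address is a naming convention some organizations follow, not a guarantee, and whether clicking the generated link places a call depends on the browser having a SIP client registered for the `sip:` scheme.

```python
from __future__ import annotations

import html


def email_to_sip_uri(email: str) -> str:
    """Guess a SIP URL from an email address (a naming convention, not a guarantee)."""
    user, sep, host = email.strip().partition("@")
    if not (sep and user and host):
        raise ValueError(f"not an email address: {email!r}")
    return f"sip:{user}@{host.lower()}"


def parse_sip_uri(uri: str) -> tuple[str, str]:
    """Split 'sip:user@host;params' into (user, host); user is '' if absent."""
    scheme, sep, rest = uri.partition(":")
    if not sep or scheme.lower() != "sip":    # the scheme is case-insensitive
        raise ValueError(f"not a SIP URL: {uri!r}")
    rest = rest.split(";", 1)[0]              # drop URL parameters such as ;transport=udp
    user, _, host = rest.rpartition("@")
    return user, host


def sip_link(uri: str, label: str) -> str:
    """HTML anchor for embedding a SIP URL in a Web page."""
    return f'<a href="{html.escape(uri)}">{html.escape(label)}</a>'
```

For example, `sip_link(email_to_sip_uri("john@acme.com"), "Call John")` yields an anchor pointing at `sip:john@acme.com`.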

SIP offers a unique feature, fork, that H.323 doesn't provide. An SIP server can fork a received invite message by issuing more than one request to a group of phones or computers so that several extensions receive the same call together. The party who answers the call handles the rest of the communication. This feature works well for customer service operations.
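
Forking can be modeled as ringing every extension at once, accepting the first answer, and cancelling the other branches with the SIP CANCEL method. In the sketch below, the extension names and answer times are hypothetical inputs, and the log records which extension each message concerns rather than full SIP transactions.

```python
from typing import Dict, List, Optional, Tuple


def fork_invite(extensions: List[str],
                answer_after_s: Dict[str, Optional[float]]) -> Tuple[Optional[str], List[Tuple[str, str]]]:
    """Parallel fork: ring every extension; the first to answer gets the call and
    the server cancels the other branches. answer_after_s maps an extension to
    seconds until it answers (None = never answers).
    Returns (winner, log), where log lists (extension, message) events."""
    log = [(ext, "INVITE") for ext in extensions]          # server -> each extension
    answers = [(answer_after_s[ext], ext) for ext in extensions
               if answer_after_s.get(ext) is not None]
    if not answers:
        return None, log
    _, winner = min(answers)                              # earliest answer wins
    log.append((winner, "200 OK"))                        # winner -> server
    log += [(ext, "CANCEL") for ext in extensions if ext != winner]  # server -> others
    return winner, log
```

With three customer service extensions where x102 picks up first, x102 gets the call and x101 and x103 stop ringing.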

SIP is a text-based protocol, similar to HTTP, and has only six fundamental control messages for call setup and release. Therefore, you can easily implement SIP with languages such as Java and Perl. One of the vendors that strongly support and promote SIP is 3Com. The company has developed an SIP server and a line of SIP phone products. Microsoft is also developing an SIP server and SIP client for Windows, which the company displayed at the Voice of the Net (VON) Spring 2001 conference.
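
Because SIP is text-based, an invite message can be assembled with ordinary string handling. The six methods in RFC 2543 are INVITE, ACK, OPTIONS, BYE, CANCEL, and REGISTER. The sketch builds a minimal INVITE with an SDP body offering G.711 audio; the addresses, call ID, and port are hypothetical, and a real stack would add further headers (RFC 3261, which later replaced RFC 2543, requires Max-Forwards and a From tag, for example).

```python
SIP_METHODS = ("INVITE", "ACK", "OPTIONS", "BYE", "CANCEL", "REGISTER")  # RFC 2543


def build_invite(caller: str, callee: str, call_id: str, via_host: str,
                 media_ip: str, rtp_port: int) -> str:
    """Build a minimal text INVITE with an SDP body offering G.711 mu-law audio."""
    sdp = "\r\n".join([
        "v=0",
        f"o=- 0 0 IN IP4 {media_ip}",
        "s=-",
        f"c=IN IP4 {media_ip}",
        "t=0 0",
        f"m=audio {rtp_port} RTP/AVP 0",   # RTP payload type 0 = PCMU (G.711 mu-law)
    ]) + "\r\n"
    headers = [
        f"INVITE {callee} SIP/2.0",
        f"Via: SIP/2.0/UDP {via_host}",
        f"From: <{caller}>",
        f"To: <{callee}>",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        "Content-Type: application/sdp",
        f"Content-Length: {len(sdp.encode('utf-8'))}",
    ]
    # A blank line separates the headers from the body, as in HTTP.
    return "\r\n".join(headers) + "\r\n\r\n" + sdp
```

Printing the result shows a message that reads much like an HTTP request, which is why scripting languages handle SIP easily.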

Another VoIP standard for gateway implementation is the IETF's MGCP. This protocol assumes that call setup and control happen outside the gateway, in a separate call agent. Separating the gateway functions from the call setup and control functions simplifies gateway implementation, upgrading, and maintenance. MGCP can work with H.323. MGCP-defined and -supported gateways include trunking gateways (i.e., IP-PSTN gateways) that interface between VoIP networks and PSTNs; Voice over Asynchronous Transfer Mode (ATM) gateways that interface between Voice over ATM networks and PSTNs; residential gateways that provide interfaces to VoIP networks for cable modem, Digital Subscriber Line (DSL), and broadband wireless devices; and PBX-based gateways that provide traditional digital PBX interfaces to VoIP networks.
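
In MGCP, the call agent tells the gateway what to do with short text commands such as CreateConnection (CRCX), ModifyConnection (MDCX), and DeleteConnection (DLCX), sent over UDP to port 2427 on the gateway. The sketch builds a CRCX command; the endpoint name and call ID are hypothetical, and a real call agent would also track transactions and handle the gateway's responses.

```python
MGCP_GATEWAY_UDP_PORT = 2427     # gateways listen here for call-agent commands
MGCP_CALL_AGENT_UDP_PORT = 2727  # call agents listen here for gateway messages


def build_crcx(transaction_id: int, endpoint: str, call_id: str,
               codec: str = "PCMU", packet_ms: int = 20, mode: str = "recvonly") -> str:
    """Build an MGCP CreateConnection (CRCX) command, as a call agent would send it."""
    lines = [
        f"CRCX {transaction_id} {endpoint} MGCP 1.0",
        f"C: {call_id}",                     # call identifier
        f"L: p:{packet_ms}, a:{codec}",      # local connection options: packetization, codec
        f"M: {mode}",                        # connection mode
    ]
    return "\r\n".join(lines) + "\r\n"
```

All call logic stays in the call agent; the gateway only creates, modifies, and deletes connections on command, which is what keeps gateway implementation simple.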

Deployment Considerations
VoIP is still new and hasn't been widely implemented. However, because IP is ubiquitous, you'll likely deploy VoIP applications in your network eventually. Before you start, build a network infrastructure that supports VoIP, keeping in mind the following considerations.

QoS. A traditional telephone call relies on a connection-oriented circuit that a PSTN provides with high QoS. An IP network, however, is connectionless and, by default, provides neither QoS nor the same level of performance and reliability as a PSTN. Voice communication between two parties through an IP network often involves many network hops and devices, such as H.323 gateways and IP routers. An end-to-end one-way delay of more than 150 milliseconds (ms) is generally too long. To achieve good performance and reliability for VoIP applications, deploy QoS in your network before deploying VoIP. QoS lets you prioritize your network traffic and guarantees timely data delivery for time-sensitive applications (e.g., VoIP applications). For more information about QoS, see "Build a Better Network with QoS," November 1998, and Barrie Sosinsky and C. Thi Nguyen, "Quality of Service," http://www.win2000mag.com, InstantDoc ID 8202.
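
A quick way to see why QoS matters is to add up a one-way delay budget against the 150-ms threshold. Every component value below is hypothetical and chosen only for illustration; the point is that per-hop queuing grows with the number of routers, and that term is the one prioritization reduces.

```python
from __future__ import annotations

ONE_WAY_BUDGET_MS = 150.0  # threshold cited in the text

# Hypothetical fixed delay components for one direction of a call, in ms.
FIXED_DELAYS_MS = {
    "packetization (20-ms frames)": 20.0,
    "codec processing": 15.0,
    "receiver jitter buffer": 40.0,
    "propagation": 30.0,
    "gateway processing": 10.0,
}


def one_way_delay_ms(hops: int, queuing_per_hop_ms: float,
                     fixed: dict[str, float] = FIXED_DELAYS_MS) -> float:
    """Fixed components plus router queuing accumulated over every hop."""
    return sum(fixed.values()) + hops * queuing_per_hop_ms


for label, per_hop in (("prioritized", 5.0), ("congested", 20.0)):
    total = one_way_delay_ms(hops=6, queuing_per_hop_ms=per_hop)
    verdict = "OK" if total <= ONE_WAY_BUDGET_MS else "too long"
    print(f"{label}: {total:.0f} ms ({verdict})")
# prioritized: 145 ms (OK)
# congested: 235 ms (too long)
```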

Firewalls. When your VoIP deployment involves firewalls (e.g., a VoIP application between your intranet and the Internet), you need to use firewalls that support VoIP (e.g., support H.323 if the application is an H.323 application). An H.323 application often negotiates TCP and UDP ports dynamically during call setup, unlike most IP applications, which use fixed, well-known ports. A firewall that doesn't understand H.323 therefore can't know which ports to open.
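
An H.323-aware firewall typically opens a few fixed ports and then opens per-call "pinholes" for the ports it learns while inspecting call setup. Ports 1719 (RAS) and 1720 (call signaling) are the well-known H.225.0 ports; the class itself is hypothetical, handles a single call, and skips the real work of decoding the binary ASN.1-encoded H.323 messages.

```python
from __future__ import annotations

from dataclasses import dataclass, field

H225_RAS_UDP_PORT = 1719               # gatekeeper RAS
H225_CALL_SIGNALING_TCP_PORT = 1720    # call setup (Setup, Connect, ...)
STATIC_PORTS = frozenset({("udp", H225_RAS_UDP_PORT), ("tcp", H225_CALL_SIGNALING_TCP_PORT)})


@dataclass
class H323Pinholes:
    """Hypothetical H.323-aware firewall state: fixed ports plus per-call ports
    learned by inspecting call-setup messages."""
    dynamic: set[tuple[str, int]] = field(default_factory=set)

    def learn_h245_port(self, tcp_port: int) -> None:
        self.dynamic.add(("tcp", tcp_port))          # from the H.225.0 Connect message

    def learn_media_ports(self, rtp_port: int, rtcp_port: int) -> None:
        self.dynamic.update({("udp", rtp_port), ("udp", rtcp_port)})  # from H.245 channel messages

    def allows(self, proto: str, port: int) -> bool:
        return (proto, port) in STATIC_PORTS or (proto, port) in self.dynamic

    def end_call(self) -> None:
        self.dynamic.clear()                         # close the per-call pinholes
```

A plain port filter would have to leave large UDP ranges open permanently to achieve the same result, which is why VoIP support in the firewall matters.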

NAT. Many companies use Network Address Translation (NAT) to map between public and private IP addresses and to hide the company's intranet IP addresses from the Internet. An H.323 application can fail when going through NAT because H.323 messages carry IP addresses inside their payloads, not only in the IP packet header. If your NAT isn't H.323-aware, it translates the address in the packet header but leaves the address in the application payload unchanged.
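
The failure is easy to model: a plain NAT rewrites only the header, so an address embedded in the payload (e.g., where the private endpoint wants to receive RTP) still points at a private address the remote party can't reach. The sketch below uses hypothetical addresses, ignores port translation, and treats the payload address as a plain field rather than the binary-encoded H.323 message a real NAT would have to parse.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src_ip: str
    dst_ip: str
    embedded_ip: str  # address carried inside the H.323 message, e.g., where to send RTP


def nat_outbound(pkt: Packet, private_ip: str, public_ip: str, h323_aware: bool) -> Packet:
    """Translate an outbound packet; only an H.323-aware NAT also fixes the payload."""
    out = replace(pkt, src_ip=public_ip) if pkt.src_ip == private_ip else pkt
    if h323_aware and out.embedded_ip == private_ip:
        out = replace(out, embedded_ip=public_ip)
    return out
```

Without H.323 awareness, the remote terminal would send its audio to the private address in the payload, so signaling might succeed while the voice never arrives.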

DHCP. To work on an IP network, an IP phone needs an IP address. An IP phone might use DHCP to request a dynamic address from your DHCP server, but to configure the IP phone, you often must add to your DHCP server a custom DHCP option that includes special parameters to support VoIP. Examine your DHCP server to see whether it lets you define custom options. Windows 2000 and Windows NT DHCP servers support custom options.
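
On the wire, each DHCP option is encoded as a code byte, a length byte, and the value. The sketch shows that encoding for a custom VoIP option. The option code and the parameter string are hypothetical: RFC 2132 reserves codes 128 through 254 for site-specific use, and each IP phone vendor defines which code and which value format its phones expect, so check your phone's documentation.

```python
def encode_dhcp_option(code: int, value: bytes) -> bytes:
    """Encode one DHCP option in its wire format: code byte, length byte, value."""
    if not 1 <= code <= 254:
        raise ValueError("option code must be 1..254 (0 is Pad, 255 is End)")
    if len(value) > 255:
        raise ValueError("option value is limited to 255 bytes")
    return bytes([code, len(value)]) + value


# Hypothetical site-specific option code and parameter string, for illustration only.
VOIP_OPTION_CODE = 176
params = b"GK=192.0.2.20,GKPORT=1719,TFTP=192.0.2.30"
option = encode_dhcp_option(VOIP_OPTION_CODE, params)
```

The 255-byte length limit is one reason vendors keep these parameter strings compact.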

Learning PBX. Companies often have a voice network team that's separate from the data network team. Many IP network engineers and managers aren't familiar with their company's voice systems. To better integrate your voice network into your IP network, you need to learn about your company's PBXs and analog and digital trunk circuits. With comprehensive knowledge of your voice systems, you can choose VoIP products wisely, such as a gateway that supports your PBX trunk-port type. Many companies have started to integrate their voice and data network teams into one group to better support their VoIP deployment.

Lower Your Bottom Line
VoIP helps reduce the cost of voice communication in your enterprise by taking advantage of the lower cost of IP networks and the Internet while integrating with the existing PSTN. This evolving technology also lets you build new interactive e-commerce applications that improve the efficiency and quality of customer service.