Troubleshoot users' handheld devices
Research In Motion's (RIM's) BlackBerry is a PDA that gives you wireless access to your Exchange Server Inbox and Calendar. The BlackBerry has become one of the more prevalent solutions for providing wireless access to Exchange email and personal information manager (PIM) data. BlackBerry devices provide wireless access to Exchange data through BlackBerry Desktop Software (for one user) or BlackBerry Enterprise Server (for multiple users). Many organizations and government agencies deploy BlackBerry Enterprise Server as an enterprise solution to let emergency workers and other field personnel stay connected. If you have users who rely on BlackBerry devices, you should understand how these handhelds communicate and how to monitor them and troubleshoot their problems.
Links in a Chain
Figure 1, page 2, depicts the chain of components and services in a BlackBerry network—mailboxes on the Exchange server, the BlackBerry Enterprise Server system, the Internet, RIM's Server Routing Protocol (SRP) host, RIM's wireless relays, the Mobile Switching Offices (MSOs), radio cell base stations, and the BlackBerry device. BlackBerry Enterprise Server uses Messaging API (MAPI) Collaboration Data Objects (CDO) to interact with Exchange mailboxes. When BlackBerry Enterprise Server starts to monitor a mailbox, it registers itself with the Exchange server by logging on to the mailbox. When a new message arrives in the mailbox, the Exchange server generates a MAPI-based notification to alert BlackBerry Enterprise Server of the message's arrival. BlackBerry Enterprise Server reads the first 2000 bytes of the message (to save bandwidth, the server sends only the beginning of long messages), encrypts those bytes, and passes them over the Internet to the next link in the chain—the SRP host.
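As a rough illustration of the first-chunk behavior described above, the following Python sketch keeps only the leading 2000 bytes of a message body and reports whether more remains. The function and constant names are hypothetical, and the encryption step is deliberately left out; this is not BlackBerry Enterprise Server code.

```python
# Hypothetical sketch: keep only the first chunk of a long message.
# 2000 bytes is the figure given in the article; encryption is omitted.
PREVIEW_BYTES = 2000


def first_chunk(body: bytes, limit: int = PREVIEW_BYTES) -> tuple[bytes, bool]:
    """Return the leading bytes of a message and whether more data remains."""
    return body[:limit], len(body) > limit


chunk, more_available = first_chunk(b"x" * 5000)
print(len(chunk), more_available)
```

A short message passes through unchanged, and the second return value tells the caller that nothing was held back.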
The SRP host is an Internet-accessible RIM server that accepts and authenticates communications sessions from the BlackBerry Enterprise Server system and determines which wireless network to use to communicate with the BlackBerry device. In North America, BlackBerry devices use one of three wireless networks—Mobitex, DataTAC, or General Packet Radio Service (GPRS). In Europe, BlackBerry devices use Global System for Mobile Communication (GSM)/GPRS exclusively. (Most of today's BlackBerry devices don't support roaming on networks outside their home territory. In other words, a US/Canadian BlackBerry doesn't function in Europe and vice versa, even when similar networks are available in both places, because although the networks use the same technology—e.g., GPRS—they use different frequency bands—1900MHz in North America and 900MHz or 1800MHz elsewhere. However, in October, RIM introduced the BlackBerry 6710, which supports international roaming.)
After the SRP host identifies the appropriate network, it routes the message to one of RIM's wireless network relays (i.e., RIM's interfaces to the wireless network providers). Message routers then use MSO location-tracking data to determine which radio cell base station should get the message for the BlackBerry. Figure 2 shows an MSO connected to four base stations, each of which anchors a cell. When a BlackBerry is inside a cell, its embedded transceiver communicates with that cell's antenna. The MSO keeps track of which cell the BlackBerry is in. (The BlackBerry's wireless radio transceiver has a unique serial number issued by the wireless carrier that MSOs use to locate and identify the handheld device when it's active on the wireless network.) As the BlackBerry moves toward the edge of the cell, the transceiver and the base station in the adjacent cell measure signal strength to and from the device. As the BlackBerry moves farther from the center of the current cell, the signal strength diminishes in that cell and increases in the adjacent cell. When the BlackBerry crosses the boundary between the cells, its location is updated within the MSO. This tracking process lets the BlackBerry network route messages to BlackBerry devices, no matter where they move within the network.
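The location-tracking idea can be sketched in a few lines of Python. This is a simplified model, not how any carrier's MSO is implemented: the function name, the dBm readings, and the 3 dB margin (added here so the device doesn't flip back and forth at a cell boundary) are all assumptions made for illustration.

```python
# Hypothetical sketch of cell tracking by signal strength.
# Readings are in dBm (closer to zero is stronger); the margin is an assumption.
def strongest_cell(current_cell: str,
                   signal_dbm: dict[str, float],
                   margin_db: float = 3.0) -> str:
    """Return the cell the device should be registered in."""
    best_cell = max(signal_dbm, key=signal_dbm.get)
    current_level = signal_dbm.get(current_cell, float("-inf"))
    # Switch only when the other cell is clearly stronger than the current one.
    if best_cell != current_cell and signal_dbm[best_cell] >= current_level + margin_db:
        return best_cell
    return current_cell


# A device moving from cell A toward cell B.
cell = "A"
for reading in [{"A": -70, "B": -95}, {"A": -84, "B": -83}, {"A": -92, "B": -78}]:
    cell = strongest_cell(cell, reading)
    print(cell)
```

In this run the recorded cell changes only on the last reading, when cell B is stronger by more than the margin.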
The process is almost exactly reversed when a user sends a message or command from the BlackBerry. The base station accepts the message and places it on the wireless network. The network's routers use the MSO to determine how to get the message back to the RIM network. The RIM network then uses routing information specific to the BlackBerry device that sent the message or command to send the message to the BlackBerry Enterprise Server system, which interacts with the Exchange mailbox.
After you deploy BlackBerry Enterprise Server, it operates in almost a set-it-and-forget-it manner. The problem you'll hear about most frequently from users is that messages aren't moving to or from their handhelds. Let's look at the events that occur or can occur at points along the communication chain that might prevent messages from moving or give the impression that messages aren't moving, as well as some steps you and your users can take to watch for and mitigate these problems.
BlackBerry Enterprise Server can monitor and support hundreds of accounts at the same time. To accomplish this job, BlackBerry Enterprise Server runs as a multithreaded application that uses MAPI CDO to access accounts on one or more Exchange servers. The number of threads BlackBerry Enterprise Server spawns varies depending on the number of servers and accounts it needs to support, but if the server supports more than a few handhelds, one thread typically services many mailboxes at once. (You can see how many threads BlackBerry Enterprise Server is running in two ways. First, you can use Performance Monitor—select Threads, then look through the instances for the BlackBerryServe/x objects. Second, you can look in the BlackBerry Enterprise Server debug logs described later in this article. When BlackBerry Enterprise Server starts or when you add a handheld device to or remove one from the server, the server makes an entry in the log that shows the number of threads in use.) The threaded process rotates through the mailboxes, usually very quickly, performing necessary tasks such as checking for and forwarding messages to BlackBerry devices or carrying out instructions, such as to compose and send a new message, from the devices.
Under certain conditions, a MAPI application can unexpectedly be terminated or become unresponsive. A MAPI application usually is terminated because it has encountered an error it can't recover from, such as a corrupt data item in a mailbox or calendar. A MAPI application can become unresponsive when it's waiting for an action that can't be completed because of, for example, a lost connection to an Exchange server. When these types of events happen to a BlackBerry Enterprise Server thread, it stops cycling through mailboxes. When such events happen to all the threads, BlackBerry Enterprise Server stops moving messages to and from handhelds.
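One general way to notice a worker that has stopped cycling is to have each worker record a timestamp whenever it finishes a mailbox, then compare those timestamps against a timeout. The Python sketch below shows that generic watchdog pattern; the names and the 300-second timeout are assumptions, and nothing here reflects how BlackBerry Enterprise Server tracks its own threads.

```python
# Hypothetical watchdog sketch: flag workers that haven't made progress recently.
def stalled_workers(last_progress: dict[str, float],
                    now: float,
                    timeout_s: float = 300.0) -> list[str]:
    """Return the names of workers whose last progress is older than timeout_s."""
    return sorted(name for name, seen in last_progress.items()
                  if now - seen > timeout_s)


# Timestamps are plain seconds so the example is reproducible.
progress = {"worker-1": 100.0, "worker-2": 950.0}
print(stalled_workers(progress, now=1000.0))
```

If every worker appears in the result, no mailboxes are being serviced, which matches the case in which message flow to and from handhelds stops entirely.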
In addition to the immediate SRP host that your BlackBerry Enterprise Server system connects to, other systems in the RIM and wireless networks handle message traffic. These infrastructures are fault tolerant and are designed to provide multiple routing paths, so they aren't usually the source of delivery problems. After a message has made its way to the RIM and wireless networks, connectivity between the BlackBerry and a cell base station is the major determinant of whether messages are successfully delivered. A BlackBerry is essentially a two-way radio. If you've ever listened to the radio in a car, you've noticed that your distance from the signal source and objects between you and the signal source affect the radio's reception. You might find that if you stop next to a large truck, static interrupts your previously clear signal, but if you pull forward a few feet, the signal quality returns. Similar circumstances affect the BlackBerry. Users' proximity to a cell base station and the materials (especially metals) between users and the base station can significantly affect signal quality. For example, large buildings have a lot of metal (steel support frames, steel rebar within concrete, aluminum framing in office walls and cubicles) that can reduce or block signal penetration. Many buildings also have a Mylar coating on the windows to act as a security mechanism and to reflect sunlight. This coating can also deflect radio signals. Coverage might also be spotty in underground parking garages or buildings. You should communicate these limitations to your users so that they don't have unreasonable expectations. To learn how to check signal strength and cell location from a BlackBerry, see the sidebar "Secret Key Combinations."
Another factor to consider in relation to signal connectivity is that a limited number of channels are available for communication. Unlike cell phones, which require a constant connection-type channel, the BlackBerry uses packets to communicate, so it doesn't require a constant connection. Thus, users can get channel time with a BlackBerry much more easily than with a cell phone, even if many devices are contending for the channels. Nevertheless, if many devices are on the network in your vicinity, the combination of congestion and a weak signal might affect communication.
When multiple messages are bound for a particular BlackBerry, BlackBerry Enterprise Server queues those messages and sends no more than five at one time to the wireless network. While BlackBerry Enterprise Server waits to receive confirmation that the last five messages it sent to the SRP host were delivered, it places and holds subsequent messages in a queue on the server. When the relay sends a message to the wireless network, the MSO routes the message to the cell site in which the MSO thinks the BlackBerry is operating. Wireless networks vary, but typically if the wireless network can't reach the BlackBerry (e.g., because of a poor signal or no signal), the network briefly stops trying to communicate with the handheld so that channels and resources can service reachable wireless devices rather than waste resources on unreachable handhelds. After a short wait (typically between 5 minutes and 15 minutes), the wireless network again tries to contact the handheld. If after an hour the wireless network still can't contact the handheld, the network extends the wait period and attempts a connection only once per hour. After 4 hours, the message expires from the wireless network and the RIM network's relay must resend it.
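The two mechanisms in this paragraph, the five-message send limit and the slowing retry schedule, can be modeled with a short Python sketch. Everything here is illustrative: the class and function names are hypothetical, the sketch assumes that each delivery confirmation frees one slot (the article doesn't say whether slots free individually or in batches), and it assumes a 10-minute short wait and no retry at the 4-hour expiration mark.

```python
from collections import deque


class SendWindow:
    """Hypothetical model: at most `limit` unconfirmed messages per handheld."""

    def __init__(self, limit: int = 5):
        self.limit = limit
        self.queued = deque()    # messages held on the server
        self.in_flight = set()   # messages sent but not yet confirmed

    def submit(self, msg_id):
        self.queued.append(msg_id)
        return self._fill()

    def confirm(self, msg_id):
        self.in_flight.discard(msg_id)
        return self._fill()

    def _fill(self):
        # Release queued messages while there is room in the window.
        released = []
        while self.queued and len(self.in_flight) < self.limit:
            msg = self.queued.popleft()
            self.in_flight.add(msg)
            released.append(msg)
        return released


def retry_schedule(short_wait=10, slow_after=60, slow_wait=60, expire_after=240):
    """Return retry times in minutes: short waits first, then hourly, until expiry."""
    times, t = [], 0
    while True:
        t += short_wait if t < slow_after else slow_wait
        if t >= expire_after:
            return times
        times.append(t)


window = SendWindow()
for n in range(8):
    window.submit(n)
print(sorted(window.in_flight), list(window.queued))
print(retry_schedule())
```

With eight messages submitted, five are in flight and three wait on the server, which is the split an administrator would expect to see reflected in a large pending count.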
If the BlackBerry moves to a new cell during a wait period, the device's location can become out of sync with the location that the MSO stores. Messages continue to queue at the BlackBerry Enterprise Server system, the RIM relay, and the wireless network routers. When the MSO receives updated information about the handheld's location, the BlackBerry receives a flood of messages.
A user whose handheld device has indicated a strong signal for several hours, has received no messages during that time, then suddenly receives a bulk delivery of messages (some of which were sent many hours ago) might understandably misinterpret this influx as a sign of a server problem. You need to let your users know that this situation can occur from time to time and doesn't represent a problem with the BlackBerry Enterprise Server system or Exchange servers.
Looking for Clues
When users report message transport problems, you need to determine where along the communication chain the problem exists. The first thing I usually do is look at recent connectivity and statistics information for the user. In an Exchange Server 5.5 environment, you can use Microsoft Exchange Administrator to access this information. The BlackBerry Enterprise Server installation program adds an extension DLL to your Exchange servers. This extension adds a tab to each mailbox profile; you can use this tab to configure BlackBerry settings for the account. The tab also provides access to statistics and information about a handheld's recent connectivity, as Figure 3 shows. In an Exchange 2000 Server environment, you use the Microsoft Management Console (MMC) BlackBerry Enterprise Server Management snap-in to access Handheld Manager, which displays the same statistics page.
In Figure 3, you can see that the BlackBerry Enterprise Server system hasn't sent anything to or received anything from the handheld for 8 days and almost 9 hours. Such a long period without contact typically signals a problem such as the user forgetting to turn the BlackBerry's transceiver back on, a lapsed airtime contract, or a damaged transceiver. If the handheld's radio is on, you can generate a network registration request to test the device's connection to the wireless network or you can have the user send a PIN message to himself or herself. (RIM assigns to each BlackBerry a unique PIN that RIM uses to identify the device. PIN-mode communications are messages sent from one device to another rather than through an Exchange server.) To generate a network registration request, access the BlackBerry's Options mode, open Network Settings, click the track wheel, and select Register Now. This action sends a registration message to the RIM network; in response, the BlackBerry should receive a registration confirmation message. If the device receives this message, you know that it can communicate over the wireless network to the RIM infrastructure.
As I mentioned earlier, sometimes the BlackBerry's actual location is different from where the MSO thinks the handheld is located. Sending a PIN message or the registration packet or even just turning the radio off and on updates the BlackBerry's location, which usually also lets the RIM and wireless networks successfully route messages to the handheld. If these simple actions don't resolve the delivery problem, you might need to resort to a more drastic action such as wiping the device's memory clean and reloading the handheld's applications and data or removing and re-adding the handheld to BlackBerry Enterprise Server. These actions reset the databases that the handheld and BlackBerry Enterprise Server maintain.
If the Handheld Statistics screen shows a large value in the Pending to handheld field, as many as five messages are most likely queued on the wireless relay and the others are queued on the BlackBerry Enterprise Server system. Your next troubleshooting step is to look at the BlackBerry Enterprise Server Monitor status icon to check the connection between the BlackBerry Enterprise Server system and SRP host. The BlackBerry Enterprise Server Monitor watches the Windows event log for BlackBerry-generated events and periodically sends ping messages to the SRP host. If the Monitor is receiving responses to its pings, it displays an icon with a radio tower and radio waves. If the Monitor isn't receiving responses, it displays an icon with a radio tower and an exclamation point (no waves) and logs an event in the Windows Application log. You can also configure the Monitor to send an email alert to one or more SMTP addresses.
Connections between the BlackBerry Enterprise Server system and SRP host are initiated only by the BlackBerry Enterprise Server system, never by the SRP host. However, after a connection is established, communication is bidirectional. If either host is taken offline or an intermediate system (such as a firewall, DNS server, or router) causes a connectivity interruption, the connection between the BlackBerry Enterprise Server system and SRP host is lost and the BlackBerry Enterprise Server writes event ID 30155 to the BlackBerry debug logs (in the C:\Program Files\Research In Motion\Blackberry for Exchange directory by default). The logs are plaintext and contain data about all the actions that BlackBerry Enterprise Server performs. (For a closer look at the logs, see the Web sidebar "BlackBerry Enterprise Server Debug Logs," http://www.exchangeadmin.com, InstantDoc ID 27222.) The BlackBerry Enterprise Server system immediately tries to reestablish the connection, but depending on the reason the connection was lost, the attempt might not be successful immediately. If the connection is frequently terminated or you can't reestablish it, you'll need to look in the BlackBerry debug logs to determine the reason. For example, in the event
SRP connection dropped,
the Winsock error code 10060 means that the connection timed out. To decode other errors, see "Windows Sockets Error Codes" (http://msdn.microsoft.com/library/en-us/winsock/winsock/windows_sockets_error_codes_2.asp).
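If you read the debug logs often, a small lookup table saves a trip to the documentation. The Python sketch below maps a handful of standard Winsock error codes to their symbolic names and meanings. The codes and names are the standard Windows Sockets values; the function name and the choice of which codes to include are mine, and the table is far from complete.

```python
# Small lookup of common Winsock error codes (standard Windows Sockets values).
# The selection is illustrative, not exhaustive.
WINSOCK_ERRORS = {
    10051: ("WSAENETUNREACH", "network is unreachable"),
    10053: ("WSAECONNABORTED", "software caused connection abort"),
    10054: ("WSAECONNRESET", "connection reset by peer"),
    10060: ("WSAETIMEDOUT", "connection timed out"),
    10061: ("WSAECONNREFUSED", "connection refused"),
    10065: ("WSAEHOSTUNREACH", "no route to host"),
    11001: ("WSAHOST_NOT_FOUND", "host not found"),
}


def describe_winsock_error(code: int) -> str:
    """Return a readable description for a Winsock error code."""
    name, meaning = WINSOCK_ERRORS.get(code, ("UNKNOWN", "not in this table"))
    return f"{code} {name}: {meaning}"


print(describe_winsock_error(10060))
```

A timeout (10060) or an unreachable host (10065) usually points to a firewall, router, or DNS problem between your server and the SRP host, whereas a refused connection (10061) means the remote end answered but wouldn't accept the session.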
One of the simplest things you can do to keep your BlackBerry Enterprise Server system running problem-free is to apply the most recent service pack. The most recent release is BlackBerry Enterprise Server 2.1 Service Pack 4 (SP4). Among other improvements, RIM has made a few changes that help prevent hung threads from negatively affecting overall operations. When BlackBerry Enterprise Server detects a hung thread, the server can restart the thread or prevent it from affecting other BlackBerry Enterprise Server processes. Another easy step is to configure the BlackBerry Enterprise Server Monitor to send alerts to a monitored account or your pager when an SRP connection is lost or when significant events are logged in the Windows Application event log.
For many organizations, BlackBerry email is a crucial communications tool that you need to ensure is running properly. If your BlackBerry Enterprise Server system experiences problems, you need to know right away. Monitor the health status of the system the way you do your other servers: Establish baselines, check on queues, generate usage reports, and watch for signs of trouble in the Windows event log and BlackBerry Enterprise Server debug logs.