Software reduces the operational and energy costs of servers without turning them off
Last fall, 100 server administrators at some of the world's largest companies were surveyed about their physical and virtual servers. Nearly three-quarters of them indicated that 15 percent of their physical servers are running but not being used. "If you were to pull the plug on those servers, nobody in the business would be impacted by those servers being down," said Andy Dominey, 1E product manager. 1E and the Alliance to Save Energy (ASE) commissioned an independent research firm to conduct this survey.
Unused physical servers can lead to not only wasted energy costs but also wasted operational costs. "The operational licensing and hardware costs to keep a server up and running is fairly significant," said Dominey. "The total running cost of 2,000 servers is almost $8 million. The vast majority of that—about $6.5 million—is equated to just the operational costs alone."
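As a rough back-of-the-envelope check on those figures, the sketch below derives per-server costs from the rounded totals in the quote. Two things here are illustrative assumptions rather than numbers from the report: that the non-operational remainder is mostly energy, and that the survey's 15 percent unused share applies to this hypothetical 2,000-server fleet.

```python
# Rounded figures quoted by Dominey; everything derived from them is illustrative.
SERVERS = 2_000
TOTAL_COST = 8_000_000        # "almost $8 million"
OPERATIONAL_COST = 6_500_000  # "about $6.5 million"

# Assumption: the survey's 15 percent unused share applies to this fleet.
UNUSED_PERCENT = 15

cost_per_server = TOTAL_COST / SERVERS
operational_per_server = OPERATIONAL_COST / SERVERS
remainder = TOTAL_COST - OPERATIONAL_COST  # assumed to be mostly energy

unused_servers = SERVERS * UNUSED_PERCENT // 100
unused_cost = unused_servers * cost_per_server

print(f"Total cost per server:       ${cost_per_server:,.0f}")
print(f"Operational cost per server: ${operational_per_server:,.0f}")
print(f"Non-operational remainder:   ${remainder:,.0f}")
print(f"Unused servers:              {unused_servers}")
print(f"Cost of unused servers:      ${unused_cost:,.0f}")
```

Under these assumptions, each server costs roughly $4,000 to run, and the 300 unused servers account for about $1.2 million of the total.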
Running unused servers isn't a problem limited to physical servers. In all, 65 percent of the server administrators admitted that, at some point, they virtualized a server that wasn't even needed in the first place. Even more common is virtual sprawl—that is, the uncontrolled deployment of virtual machines (VMs). In all, 84 percent of the server administrators said they are experiencing or are concerned about virtual server sprawl. Virtual sprawl can result from various actions. For example, too many VMs might have been created or the purpose for which a VM was created no longer exists.
These survey findings and others are published in the 1E/ASE report titled "Server Energy and Efficiency Report 2009." ASE is a nonprofit coalition that promotes the efficient and clean use of energy worldwide. It was founded in 1977 by two U.S. senators who were concerned that Americans were beginning to return to energy-wasteful lifestyles after the energy crisis created by the Arab oil embargo. Founded in 1997, 1E offers products and services aimed at making IT more efficient. It's probably best known for its NightWatchman power-management solution for desktops.
1E recently released the newest version of a product that bears a similar name: NightWatchman Server Edition. Like the NightWatchman desktop solution, NightWatchman Server Edition can help reduce power consumption, but it does so in an entirely different way.
"When we looked at servers originally," explained Dominey, "we looked at it as a server power-management solution. We were originally thinking about turning off servers. We were met very quickly with concerns from our customers." In a nutshell, the customers basically said that switching off production servers is very, very bad.
"The power management we do today with Server Edition doesn't, at any point, bring the server down. The server is always up and always responsive," explained Dominey. "But by using our Drowsy Server capabilities, we're able to reduce the power consumption of a typical server by about 12 to 15 percent."
The concept behind Drowsy Server is simple. When a server is busy doing the task for which it was bought and provisioned—which is referred to as Useful Work in 1E lingo—power management isn't applied. When a server isn't doing Useful Work, power management is applied.
To determine whether a server is performing Useful Work, NightWatchman Server Edition analyzes the application layer to determine what the server is actually doing. "Server Edition is application aware," said Dominey. "When an application is doing something that's pertinent to the business, it favors performance. When an application is doing something not pertinent to the business, it favors power management."
Obviously, knowing about and understanding every single application that exists in the marketplace in order to determine whether or not it performs Useful Work is impossible. So, NightWatchman Server Edition assumes that everything that runs on a server is useful, except for the processes on an exclusion list. Out of the box, the exclusion list contains about 100 processes that 1E has found to be nonproductive.
For example, various backup processes (e.g., Windows Backup Utility, Veritas Backup Exec, Symantec Backup Exec) are in the out-of-the-box exclusion list. "Are you really that concerned if your backups take 2 or 3 minutes or even 10 minutes longer to run? We believe that it's more important at that point in time to reduce the energy consumption of the server as much as possible," said Dominey. "In some cases, the backup time does matter, and we acknowledge that." That's why the exclusion list is customizable.
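The exclusion-list logic described above can be sketched in a few lines. This is only an illustration of the idea, not 1E's implementation: the process names, the `is_doing_useful_work` and `choose_mode` functions, and the mode labels are all hypothetical, and the real out-of-the-box list is not reproduced here.

```python
# Hypothetical process names standing in for the out-of-the-box exclusion list.
DEFAULT_EXCLUSIONS = frozenset({"backup_agent", "disk_defrag", "log_rotator"})


def is_doing_useful_work(running_processes, exclusions=DEFAULT_EXCLUSIONS):
    """Treat everything as useful unless it appears on the exclusion list."""
    return any(name not in exclusions for name in running_processes)


def choose_mode(running_processes, exclusions=DEFAULT_EXCLUSIONS):
    """Favor performance during Useful Work, power management otherwise."""
    if is_doing_useful_work(running_processes, exclusions):
        return "performance"
    return "power_management"


# Only excluded processes are running, so power management is favored.
print(choose_mode(["backup_agent", "log_rotator"]))

# A site where backup time matters removes the backup process from the list,
# which makes a running backup count as Useful Work.
custom_exclusions = DEFAULT_EXCLUSIONS - {"backup_agent"}
print(choose_mode(["backup_agent", "log_rotator"], custom_exclusions))
```

Defaulting to "useful" means an unknown application is never throttled by accident; only processes that someone has explicitly listed can trigger power management.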
By default, the Drowsy Server functionality is disabled. Each company can decide whether to enable it on one, some, or all of its servers. To enable the Drowsy Server functionality, you use the drowsy policy, which specifies what should happen when Useful Work isn't happening. It's one of two policies that can be applied to a server. The other is the operational policy, which specifies what should happen when Useful Work is happening. Both policies are customizable.
"We originally looked at approaching power management with a schedule-based policy in mind, much like our client solution, but very quickly we identified that scheduling on a server for the most part is difficult if not impossible because server workloads can vary significantly. So, we do everything dynamically. The agent itself makes decisions on the fly every second about whether the machine should be power managed."
There are currently agents for servers running Windows Server 2003 and later, Ubuntu, Novell SUSE, and Solaris 10; Solaris 10 support is new in version 2.0. Another new feature in version 2.0 is built-in integration with the Microsoft Hyper-V and VMware ESX 3.5 and 4 platforms. "By doing that, we've enabled our Useful Work calculation in our efficiency matrix to extend beyond the virtual guests and actually into the virtual host layer as well. We can show you from both a Useful Work perspective and a generic efficiency standpoint whether or not you're really getting the most out of your virtual host." Agents are required for VM guests but not for the host.
For both virtualized and physical servers, NightWatchman Server Edition produces an in-depth report that includes candidates for decommissioning, the steps for doing so, and the projected cost savings in terms of memory and hard disk costs, operational costs, and power costs.
Servers become candidates for decommissioning when they haven't been used much or at all during a specified time period. (The minimum recommended period is 30 days.) Servers also become candidates when they aren't being used for any Useful Work; that is, they're only running processes found in the exclusion list. Once those processes are moved to servers that are performing Useful Work, the candidate servers can be decommissioned, repurposed, or reallocated.
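The two criteria above can be expressed as a simple filter. The sketch below is a guess at the shape of such a check, not the product's actual logic: the `ServerRecord` fields, the `active_days` measure of usage, the `max_active_days` threshold, and the process names are all hypothetical. Only the 30-day minimum comes from the article.

```python
from dataclasses import dataclass

MIN_OBSERVATION_DAYS = 30  # minimum recommended period cited in the article


@dataclass
class ServerRecord:
    name: str
    days_observed: int         # length of the observation window so far
    active_days: int           # hypothetical measure of how often the server was used
    processes_seen: frozenset  # every process observed during the window


def is_candidate(server, exclusions, max_active_days=0):
    """Flag servers that were barely used or ran only excluded processes."""
    if server.days_observed < MIN_OBSERVATION_DAYS:
        return False  # not enough history to judge
    barely_used = server.active_days <= max_active_days
    only_excluded = server.processes_seen <= exclusions  # subset test
    return barely_used or only_excluded


exclusions = frozenset({"backup_agent", "log_rotator"})
servers = [
    ServerRecord("web01", 45, 45, frozenset({"web_server", "backup_agent"})),
    ServerRecord("old02", 45, 0, frozenset({"backup_agent"})),
    ServerRecord("new03", 10, 0, frozenset()),
    ServerRecord("bak04", 60, 60, frozenset({"backup_agent"})),
]

candidates = [s.name for s in servers if is_candidate(s, exclusions)]
print(candidates)
```

In this example, `old02` qualifies because it was never used, `bak04` qualifies because it ran only an excluded process, and `new03` is skipped because it has fewer than 30 days of history.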
In the case of virtual servers, the information in the report can be used to help combat virtual sprawl because it identifies the VMs that aren't performing Useful Work. In addition, it identifies any unofficial VM guests (i.e., guests that haven't reported in during the specified time period, which means they don't have an agent).
Although NightWatchman Server Edition is targeted toward environments containing 500 to 50,000 servers, it can be beneficial in smaller environments as well—something that 1E found out firsthand. With fewer than 100 servers, 1E thought its IT organization was relatively lean. "The IT team believed wholeheartedly that we would not find any systems doing non-useful work," said Dominey. "We actually found about 25 percent were performing no useful work at all." Everyone, including the IT team, was quite surprised.
"Just because you think you know what is happening in your environment, regardless of how small it is, doesn't necessarily mean that it's so," said Dominey.