Q: What is Azure Auto-Scale?

A: Many Azure services scale automatically based on utilization, but IaaS is more complicated. With IaaS you're responsible for the contents of the virtual machine, so Azure can't simply duplicate a virtual machine template and assume the copies will work correctly in a multi-instance configuration (although this will change in the future with the resource model). Scaling is still possible with IaaS; it works as follows:

  1. You create multiple virtual machines that provide the same service (e.g., multiple virtual machines all running IIS that are part of a load-balanced set).
  2. The virtual machines are part of an Availability Set.
  3. The Availability Set is configured for auto-scale within the Cloud Service, with resource utilization targets that control whether virtual machines are started or stopped.
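The key constraint is that the pool is fixed: auto-scale can only start or stop machines you have already built. The minimal Python sketch below models that behavior. The `VirtualMachine` and `AvailabilitySet` classes and the `scale_to` method are hypothetical names for illustration, not an Azure API.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualMachine:
    name: str
    running: bool = False


@dataclass
class AvailabilitySet:
    """Hypothetical model of a pre-created pool of identical VMs."""
    vms: list[VirtualMachine] = field(default_factory=list)

    def running_count(self) -> int:
        return sum(vm.running for vm in self.vms)

    def scale_to(self, target: int) -> int:
        """Start or stop existing VMs until `target` are running.

        The target is clamped to the pool size because nothing here
        creates new machines; it only toggles the ones that exist.
        """
        target = max(0, min(target, len(self.vms)))
        for vm in self.vms:
            current = self.running_count()
            if current < target and not vm.running:
                vm.running = True   # start a stopped VM
            elif current > target and vm.running:
                vm.running = False  # stop a running VM
        return self.running_count()


# Four VMs were pre-created; asking for more than four still yields four.
pool = AvailabilitySet([VirtualMachine(f"web{i}") for i in range(1, 5)])
print(pool.scale_to(2))   # 2
print(pool.scale_to(10))  # 4
```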

You need to pre-create all the virtual machines; Azure won't create them for you. After you create the machines and add them to an Availability Set, you can configure auto-scaling:

  1. Select the Cloud Services workspace, and then select the Cloud Service that contains the Availability Set.
  2. Select the Scale tab.
  3. Select the Availability Set that you want to configure auto-scale for; its scale settings are displayed.
  4. For IaaS, you can scale by the CPU metric or by queue depth. With CPU, you configure the range of virtual machines that can be running (minimum and maximum), the target CPU utilization for the virtual machines, and the number of virtual machines to scale up or down by at a time. Scaling by queue depth offers similar options.
  5. You can select a schedule for when scaling should be applied.
  6. Click Save.
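The CPU settings above amount to a simple decision rule. The Python sketch below assumes a target band (low and high CPU percentages) and a fixed step size; the function name and parameters are hypothetical, and it does not model Azure's actual averaging window or any wait time between scaling actions.

```python
def desired_instance_count(
    current: int,
    avg_cpu_percent: float,
    target_low: float,
    target_high: float,
    minimum: int,
    maximum: int,
    scale_up_by: int = 1,
    scale_down_by: int = 1,
) -> int:
    """Return how many VMs should be running after one evaluation.

    Hypothetical rule: above the target band add `scale_up_by` VMs,
    below it remove `scale_down_by` VMs, inside it change nothing,
    and always stay within [minimum, maximum].
    """
    if avg_cpu_percent > target_high:
        proposed = current + scale_up_by
    elif avg_cpu_percent < target_low:
        proposed = current - scale_down_by
    else:
        proposed = current
    return max(minimum, min(proposed, maximum))


# Two VMs running at 90% CPU with a 60-80% target band and a 1-4 range.
print(desired_instance_count(2, 90, 60, 80, minimum=1, maximum=4))  # 3
```

Clamping to the minimum and maximum last means a large step size can never push the count outside the configured range. The same shape would work for queue depth by replacing the CPU percentage with messages per instance.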

The benefit of auto-scale is that it starts only as many virtual machines as are required to provide the needed service capacity. Although scaling actions take some time to complete, you pay compute charges only for the instances that are running, which saves you money. You also need a way to load-balance the service that detects when instances become available (e.g., a load-balanced endpoint).
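The load balancer only needs to send traffic to instances that are actually running. The toy round-robin balancer below illustrates that idea, assuming each instance reports a pass/fail health probe; `HealthAwareBalancer` is a hypothetical class and is not how Azure's load balancer is implemented.

```python
from itertools import count


class HealthAwareBalancer:
    """Round-robin over instances whose (hypothetical) health probe passes."""

    def __init__(self, instances: list[str]) -> None:
        self.instances = instances
        self.healthy: set[str] = set()
        self._counter = count()

    def probe(self, name: str, ok: bool) -> None:
        # Record the latest probe result; a stopped VM fails its probe.
        if ok:
            self.healthy.add(name)
        else:
            self.healthy.discard(name)

    def next_instance(self) -> str:
        # Only instances that passed their last probe receive traffic.
        candidates = [i for i in self.instances if i in self.healthy]
        if not candidates:
            raise RuntimeError("no healthy instances available")
        return candidates[next(self._counter) % len(candidates)]


# web3 has been stopped by auto-scale, so it is skipped.
lb = HealthAwareBalancer(["web1", "web2", "web3"])
lb.probe("web1", True)
lb.probe("web2", True)
lb.probe("web3", False)
print([lb.next_instance() for _ in range(3)])  # ['web1', 'web2', 'web1']
```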