Azure Batch is a cloud-computing service that provides parallel batch-processing management along with clusters of VMs to tackle compute-intensive workloads of any size. Although large-scale parallel batch processing has been around for decades, it has long been restricted to individuals and businesses with the funds and expertise to purchase and run high-performance computing (HPC) resources. Azure Batch makes batch processing far more feasible and affordable with a platform that allows any Azure subscriber to process and analyze data-intensive workloads at scale. To accomplish this, Azure Batch automates and streamlines the setup and management of HPC clusters, standalone VMs, virtual networks and batch job scheduling.
Batch and parallel processing
When coupled together, batch processing and parallel processing involve multiple computers or CPUs working simultaneously on separate tasks within the same overall processing job. Batch processing via parallel computing has a long history that stretches all the way back to the age of punch-card computers. Yet the basic value proposition has remained largely unchanged. By executing separate tasks simultaneously, rather than in sequence, the overall batch job is completed far more quickly. First, however, one must divide the overall batch job into separate tasks that do not depend on one another for their completion. Only then can the separate, component tasks be executed simultaneously by separate computers.
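To make the idea concrete, here is a minimal local sketch in Python of a job split into independent tasks that run at the same time. The `render_frame` function is a hypothetical stand-in for one unit of work, and worker threads on a single machine stand in for the separate computers a real batch system would use.

```python
from concurrent.futures import ThreadPoolExecutor


def render_frame(frame_number: int) -> str:
    """Hypothetical stand-in for one independent task, e.g. rendering one frame."""
    return f"frame-{frame_number:04d}.png"


def run_batch(items, worker, max_workers=4):
    """Run worker(item) for every item in parallel; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() spreads the calls across the worker threads and yields
        # the results in the same order as `items`.
        return list(pool.map(worker, items))


if __name__ == "__main__":
    print(run_batch(range(3), render_frame))
```

This only works because no task needs another task's result. Threads keep the sketch short; CPU-bound work in Python would more likely use separate processes or, as described above, separate machines.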
In the digital age, data-intensive jobs spanning nearly every industry are benefiting from batch processing via parallel computing. Here are a few examples:
- 3D-image rendering
- Payroll processing
- Risk modeling
- Genomic research and analysis
- City traffic modeling
- Web search processing
- Weather and climate forecasting
- System stress analysis
In many cases, these highly data-intensive jobs call for enormous computing power, often involving dozens, hundreds or even thousands of CPUs running in parallel. As a result, individuals and organizations without large IT budgets to invest in the necessary infrastructure often find themselves unable to pursue these important jobs. However, with the rise of on-demand cloud computing from providers like Microsoft Azure, large-scale batch computing is becoming feasible and affordable as a cloud service.
Three reasons to use Batch
Choosing Azure’s rich set of cloud-computing tools for your parallel batch processing jobs comes with a number of clear benefits. Here are three to consider:
1. Windows or Linux: your choice
Azure Batch is a flexible platform that supports the two major operating systems most often used to run data-intensive compute jobs at scale. Whether you prefer to build solutions on Windows, for example with the Microsoft .NET framework, or on Linux (CentOS, Ubuntu, SUSE Linux Enterprise), Azure Batch welcomes your code.
2. Breathing room for your apps
With Azure Batch, it’s no longer necessary to confine your applications within the often painful limitations imposed by relatively inflexible on-premises workstations and clusters. Azure Batch gives your apps breathing room in the Azure cloud. Your apps can run separate tasks in parallel on as many separate VMs (i.e. compute nodes) as needed for optimal processing. You get to decide how Azure Batch processes and distributes your input data. You also get to choose the task parameters and the start command.
3. Scale by orders of magnitude
On-premises resources for large, compute-intensive jobs are notoriously expensive and difficult to scale efficiently. With Azure Batch, by contrast, you don’t need an enormous budget to pay for IT infrastructure and high-level expertise. Batch provides resources as needed, with pay-per-use processing, through a flexible pool of compute nodes that scales automatically as your needs grow.
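To picture what automatic scaling means in practice, here is a small Python sketch of the kind of sizing rule a pool might follow. It is not Azure Batch's own autoscale formula syntax; the function name, the `tasks_per_node` parameter, and the node limits are all assumptions made for illustration.

```python
import math


def target_node_count(pending_tasks: int, tasks_per_node: int = 1,
                      min_nodes: int = 0, max_nodes: int = 100) -> int:
    """Hypothetical sizing rule: enough nodes for the backlog, within fixed bounds."""
    if tasks_per_node < 1:
        raise ValueError("tasks_per_node must be at least 1")
    # Nodes needed if every node works through tasks_per_node tasks.
    wanted = math.ceil(pending_tasks / tasks_per_node)
    # Clamp to the configured floor and ceiling so cost stays bounded.
    return max(min_nodes, min(max_nodes, wanted))
```

The ceiling matters as much as the growth: a cap on node count is what keeps a pay-per-use pool from running up an unexpected bill.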
Example: Accelerated 3D-image rendering
To illustrate the usefulness of Azure Batch, let’s walk through Azure’s example of a highly compute-intensive 3D-image rendering job. Suppose the computing demands of the rendering job quickly outstrip your on-premises resources, and you turn to Azure Batch for a solution. Here’s how it works:
Uploading your input and app
Your first step on Azure Batch is to upload the data you want to process as input, along with the app you want to run on the data. In this example, your input data is the collection of 3D-image files you want to render, and your app is the 3D-rendering engine or toolset you wish to use. Both your files and your code will go into Azure’s cloud storage solution: Azure Storage.
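As a rough sketch of the upload step, the Python below maps local input files to blob names and then uploads them. It assumes `container_client` behaves like the `ContainerClient` class from the `azure-storage-blob` package, whose `upload_blob` method accepts `name`, `data`, and `overwrite`. The helper names and the `input` prefix are hypothetical.

```python
from pathlib import Path


def plan_uploads(input_dir, prefix="input"):
    """Map each file under input_dir to a blob name such as 'input/scene1.obj'."""
    root = Path(input_dir)
    return {
        f"{prefix}/{path.relative_to(root).as_posix()}": path
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def upload_all(container_client, plan):
    """Upload every planned file; returns the blob names that were written."""
    uploaded = []
    for blob_name, path in plan.items():
        with open(path, "rb") as data:
            # Assumed to match ContainerClient.upload_blob; overwrite=True
            # replaces an existing blob of the same name.
            container_client.upload_blob(name=blob_name, data=data, overwrite=True)
        uploaded.append(blob_name)
    return uploaded
```

Keeping the planning separate from the upload makes it easy to check what will be sent before any data leaves your machine.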
Creating a pool of compute nodes
Next, under your direction, Azure Batch will create a pool of compute nodes, each of which will execute a separate task within the overall job of rendering the 3D images provided as input. As expected, processing these separate rendering tasks will take place simultaneously, in parallel. Note that Azure Batch will take care of installing the provided rendering engine or toolset on the nodes.
Creating a batch job with tasks
With the pool of parallel compute nodes in place, it’s time to create your batch job and add tasks to it. Azure Batch’s job-scheduling engine will automatically handle scheduling the tasks and assigning them to compute nodes.
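One way to picture the tasks in such a job is as one task per input file, each with its own ID and command line. The Python sketch below builds that list locally. The `RenderTask` class, the `render` command, and its flags are hypothetical placeholders, not part of Azure Batch or of any particular rendering tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RenderTask:
    """Hypothetical description of one task: an ID plus the command it runs."""
    task_id: str
    command_line: str


def build_tasks(input_files, renderer="render"):
    """Create one independent task per input file."""
    tasks = []
    for index, name in enumerate(input_files):
        stem = name.rsplit(".", 1)[0]  # file name without its extension
        tasks.append(RenderTask(
            task_id=f"task-{index:04d}",
            # 'render', '--input' and '--output' are made up for illustration.
            command_line=f"{renderer} --input {name} --output {stem}.png",
        ))
    return tasks
```

Because each task names only its own input and output, the scheduler is free to hand the tasks to nodes in any order.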
Monitoring the job’s progress
Once you’ve created your batch job and used your start command, Azure Batch’s pool of parallel compute nodes will begin performing their separate, simultaneous tasks, each of which contributes to rendering the 3D-image files you provided. Meanwhile, you can use a local client to query Batch through HTTPS in order to securely monitor individual tasks or the batch job as a whole.
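Monitoring usually comes down to polling. The Python sketch below shows one way such a loop might look. Here `fetch_states` is a hypothetical callable that you would implement to query the service and return each task's state; the state name `completed` follows Batch's task states, while the function name, parameters, and timeout handling are assumptions.

```python
import time


def wait_for_completion(fetch_states, poll_seconds=5.0, timeout_seconds=3600.0):
    """Poll until every task reports 'completed'; return the final states."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        # fetch_states() is expected to return e.g. {"task-0000": "running"}.
        states = fetch_states()
        if states and all(state == "completed" for state in states.values()):
            return states
        if time.monotonic() >= deadline:
            raise TimeoutError("tasks did not complete before the timeout")
        time.sleep(poll_seconds)
```

Note that a completed task is not necessarily a successful one, so a real client would also check each task's exit code.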
Retrieving the output
Depending on your preferences, you can either retrieve the output of each completed task directly from the node that ran it, or have all your nodes upload their results to Azure Storage, where you can retrieve them together. The final output, a complete rendering of your 3D-image files, should be ready in a fraction of the time that sequential processing would require.
Azure Batch pricing
Azure Batch pricing is highly flexible. There is no charge for the Batch service itself; you pay only for the resources consumed during batch jobs. These include VM processing, which is charged at Azure’s standard rate, and data/application storage. The pricing structure is pay-as-you-go, with no up-front or termination fees. Cost management tools are free.
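A pay-as-you-go bill of this kind is simple arithmetic, as the Python sketch below shows. Every rate here is a placeholder rather than an actual Azure price, and the function and parameter names are hypothetical; check the current Azure price list before relying on any figure.

```python
def estimate_job_cost(nodes, hours, vm_rate_per_hour,
                      storage_gb=0.0, storage_rate_per_gb=0.0):
    """Rough estimate: VM time plus storage, with caller-supplied (placeholder) rates."""
    compute = nodes * hours * vm_rate_per_hour    # VM processing charge
    storage = storage_gb * storage_rate_per_gb    # data/application storage charge
    return round(compute + storage, 2)
```

For example, 10 nodes running for 2 hours at a made-up rate of 0.50 per node-hour, plus 100 GB of storage at a made-up 0.02 per GB, comes to 10.00 for compute and 2.00 for storage.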
Tackling data-intensive workloads with Azure Batch
Thanks to the rise of cloud computing, any individual or enterprise that wants to process and analyze high-volume, data-intensive workloads at practically any scale can now do so. As outlined above, choosing Azure Batch comes with at least three clear benefits. First, Azure Batch provides the flexibility to work with either Windows and the .NET framework or any of three flavors of Linux. Second, your apps can run more efficiently on separate, parallel compute nodes in the Azure cloud. Third, you can scale by orders of magnitude as Azure’s flexible pool of compute nodes automatically scales with your needs. As the example of 3D-image rendering illustrates, Azure Batch follows a straightforward setup and management process. Simply upload your input data and application code to Azure Storage; direct Azure Batch to create a pool of compute nodes; create a batch job with component tasks; monitor the job’s progress as the compute nodes run their separate, parallel tasks; and finally retrieve the output when the batch job is complete. All of this processing power and flexibility comes with no up-front costs or termination fees, allowing you to take on large-scale, compute-intensive workloads with confidence.
For more on Azure services that can help your business grow and scale, contact us.