Microsoft is announcing new innovations across its datacenter fleet, including the Microsoft Azure Maia AI Accelerator, optimized for artificial intelligence (AI) tasks and generative AI, and the Microsoft Azure Cobalt CPU, an Arm-based processor tailored to run general-purpose compute workloads on the Microsoft Cloud.

Microsoft Azure Maia: An AI Accelerator chip designed to run cloud-based training and inferencing for AI workloads such as OpenAI models, Bing, GitHub Copilot and ChatGPT.

Microsoft Azure Cobalt: A cloud-native chip based on Arm architecture optimized for performance, power efficiency and cost-effectiveness for general purpose workloads.

The Maia AI Accelerator is built to excel at artificial intelligence (AI) tasks and generative AI, while the Cobalt CPU is an Arm-based processor crafted specifically for general-purpose compute workloads on the Microsoft Cloud. Scheduled to roll out to Microsoft's datacenters early next year, these chips will initially power services such as Microsoft Copilot and Azure OpenAI Service.

Optimizing Every Layer of the Stack


Chips are the workhorses of the cloud, processing vast amounts of data to enable everything from sending emails to generating images. Microsoft compares building its own custom chips to building a house: it lets the company control every design choice for its cloud and AI workloads. The chips will integrate into custom server boards, placed within tailor-made racks designed to fit seamlessly into Microsoft's existing datacenters. A key focus is the synergy between hardware and software, co-designed to unlock new capabilities.

Microsoft's unveiling of these custom chips marks a strategic move towards providing a flexible, efficient, and sustainable infrastructure to meet the escalating demand for compute power in the cloud and AI realms. The integration of Maia and Cobalt into Microsoft's ecosystem signifies a pivotal step in shaping the future of cloud and AI services.

Additionally, Microsoft is announcing the general availability of Azure Boost, a system that makes storage and networking faster by moving those processes off the host servers onto purpose-built hardware and software. Customers can now achieve up to 12.5 GB/s of throughput and 650K input/output operations per second (IOPS) in remote storage performance to run data-intensive workloads, and up to 200 Gbps of networking bandwidth for network-intensive workloads.
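As a rough sanity check, figures like these can be compared against the output of a storage benchmark such as fio. The sketch below is a minimal, hypothetical example, assuming fio is installed on the VM and that /mnt/data (a placeholder path) sits on a remote disk; it is not part of the announcement, and actual ceilings depend on VM size and disk configuration:

```python
import json
import subprocess

# Minimal sketch: run fio (assumed installed) against a remote-disk-backed
# path and report IOPS and throughput, for comparison against Azure Boost's
# advertised VM-level limits (up to 650K IOPS / 12.5 GB/s).
FIO_CMD = [
    "fio",
    "--name=boost-check",
    "--filename=/mnt/data/fio.test",  # placeholder path on a remote disk
    "--rw=randread",
    "--bs=4k",                        # small blocks to stress IOPS
    "--iodepth=64",
    "--numjobs=8",
    "--direct=1",                     # bypass the page cache
    "--ioengine=libaio",
    "--runtime=60",
    "--time_based",
    "--size=4G",
    "--group_reporting",
    "--output-format=json",
]

result = subprocess.run(FIO_CMD, capture_output=True, text=True, check=True)
stats = json.loads(result.stdout)["jobs"][0]["read"]
print(f"IOPS:       {stats['iops']:,.0f}")
print(f"Throughput: {stats['bw_bytes'] / 1e9:.2f} GB/s")
```

A single job file like this is only an approximation; approaching the quoted VM-level limits generally requires striping across multiple disks and tuning queue depths and job counts.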

Microsoft is also expanding its industry partnerships to offer customers more infrastructure options. It introduced the NC H100 v5 Virtual Machine Series built for NVIDIA H100 Tensor Core GPUs, offering greater performance and efficiency for mid-range AI training and generative AI inferencing, and plans to add the NVIDIA H200 Tensor Core GPU to its fleet next year to support larger model inferencing without increased latency. The company will also bring AMD MI300X accelerated VMs to Azure, designed to accelerate high-range AI model training and generative inferencing with AMD's latest GPU, the AMD Instinct MI300X. By combining first-party silicon with a diverse array of chips from industry partners, Microsoft aims to give customers greater choice in terms of price and performance.
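As a small illustration of how a customer might explore that choice, the following sketch uses the Azure SDK for Python (azure-identity and azure-mgmt-compute) to list the GPU-backed VM sizes available in a region. The subscription ID and region below are placeholders, and size availability varies by region:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

# Placeholders: substitute a real subscription ID and a region where the
# NC H100 v5 series is offered.
SUBSCRIPTION_ID = "<subscription-id>"
LOCATION = "eastus"

client = ComputeManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# List VM sizes in the region and keep the H100-backed ones,
# such as the NC H100 v5 series mentioned above.
for size in client.virtual_machine_sizes.list(location=LOCATION):
    if "H100" in size.name:
        print(f"{size.name}: {size.number_of_cores} vCPUs, "
              f"{size.memory_in_mb / 1024:.0f} GiB RAM")
```

A chosen size name can then be supplied as the hardware_profile.vm_size value when creating a VM with the same client.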

Custom Hardware, from Chip to Datacenter


Microsoft began custom building its own servers and racks to drive down costs and give customers a more consistent experience. Over time, silicon became the primary missing piece. The company's testing process determines how every single chip performs under different frequency, temperature, and power conditions for peak performance and, importantly, tests each chip in the same conditions and configurations it will experience in a real-world Microsoft datacenter.

The ability to build its own custom silicon allows Microsoft to target certain qualities and ensure that the chips perform optimally on its most important workloads.

The silicon architecture unveiled today also lets Microsoft not only enhance cooling efficiency but also optimize the use of its current datacenter assets and maximize server capacity within its existing footprint.

Microsoft has shared the design learnings from its custom rack with industry partners and can use the rack no matter what piece of silicon sits inside, said Pat Stemen, partner program manager on Microsoft's Azure Hardware Systems and Infrastructure (AHSI) team. “All the things we build, whether infrastructure or software or firmware, we can leverage whether we deploy our chips or those from our industry partners,” he said. “This is a choice the customer gets to make, and we’re trying to provide the best set of options for them, whether it’s for performance or cost or any other dimension they care about.”

Microsoft plans to expand that set of options in the future; it is already designing second-generation versions of the Azure Maia AI Accelerator series and the Azure Cobalt CPU series. 

Resources: 

https://news.microsoft.com/source/features/ai/in-house-chips-silicon-to-service-to-meet-ai-demand/

https://azure.microsoft.com/en-us/blog/microsoft-azure-delivers-purpose-built-cloud-infrastructure-in-the-era-of-ai/