AWS HyperPod Task Governance prevents GPUs from being idle




Cost remains a primary concern of enterprise AI use, and it’s a challenge that AWS is addressing head-on.

At the AWS re:Invent 2024 conference today, the cloud giant announced HyperPod Task Governance, a solution targeting one of the most expensive inefficiencies in enterprise AI operations: underutilized GPU resources.

According to AWS, HyperPod Task Governance can increase AI accelerator utilization, helping companies optimize AI costs and generate potentially significant savings.

“This innovation helps you maximize the utilization of computing resources by automating the prioritization and management of these generative AI tasks, reducing cost by up to 40%,” said Swami Sivasubramanian, vice president of AI and Data at AWS.

End GPU idle time

As organizations rapidly ramp up their AI initiatives, many are discovering a costly paradox. Despite heavy investments in GPU infrastructure to power various AI workloads, including training, tuning, and inference, these expensive computing resources often sit idle.

Business leaders report surprisingly low utilization rates on their AI projects, even when teams compete for computing resources. It’s a challenge AWS itself has faced.

“Internally, we had this kind of problem when we were scaling over a year ago, and we built a system that takes into account the consumption needs of these accelerators,” Sivasubramanian told VentureBeat. “I talked to a lot of our customers, CIOs and CEOs; they said we want exactly that, we want it as part of SageMaker, and that’s what we’re releasing.”

Sivasubramanian said that once the system was deployed internally, utilization of AWS’s AI accelerators went through the roof, with utilization rates increasing by more than 90%.

How HyperPod Task Governance works

SageMaker HyperPod was first announced at re:Invent 2023.

SageMaker HyperPod is designed to handle the complexity of training large models with billions or tens of billions of parameters, which requires managing large pools of machine learning accelerators.

HyperPod Task Governance adds a new layer of control to SageMaker HyperPod by introducing the intelligent allocation of resources to different AI workloads.

The system recognizes that different AI tasks have different demand patterns throughout the day. For example, inference workloads tend to peak during business hours when applications are most in use, while training and experimentation can be scheduled during off-peak hours.

The system provides companies with real-time information on project utilization, team resource consumption, and computing needs. It enables organizations to effectively load balance their GPU resources across teams and projects, ensuring that expensive AI infrastructure never sits idle.
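The scheduling idea described above can be sketched in a few lines. The following is an illustrative toy allocator only, with hypothetical function names, thresholds, and numbers; it is not the SageMaker HyperPod API, just a minimal model of giving inference priority during business hours and backfilling idle accelerators with training work:

```python
# Toy sketch of time-aware GPU allocation (hypothetical, not an AWS API).
# Inference gets first claim on capacity during business hours; training
# and experimentation backfill whatever would otherwise sit idle.

def allocate_gpus(total_gpus, hour, inference_demand, training_backlog):
    """Return a GPU split for one scheduling interval."""
    business_hours = 9 <= hour < 17
    if business_hours:
        # Daytime: serve inference demand up to full capacity.
        inference_gpus = min(inference_demand, total_gpus)
    else:
        # Overnight: cap inference at a fraction of the fleet (assumed 25%).
        inference_gpus = min(inference_demand, total_gpus // 4)
    idle = total_gpus - inference_gpus
    # Backfill idle accelerators with queued training jobs.
    training_gpus = min(training_backlog, idle)
    return {"inference": inference_gpus,
            "training": training_gpus,
            "idle": idle - training_gpus}

# Midday: inference peaks, training absorbs the remainder.
print(allocate_gpus(1000, 13, inference_demand=800, training_backlog=500))
# Overnight: inference demand is low, training soaks up idle capacity.
print(allocate_gpus(1000, 2, inference_demand=100, training_backlog=500))
```

The real service makes these decisions continuously against live demand rather than a fixed clock, but the principle is the same: idle capacity is treated as a schedulable resource rather than a sunk cost.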

AWS wants to make sure companies don’t leave money on the table

Sivasubramanian emphasized the critical importance of AI cost management during his keynote.

As an example, he said that if an organization has a thousand AI accelerators deployed, not all of them are used consistently over a 24-hour period. During the day they are heavily used for inference, but at night a large portion of these expensive resources sit idle, when demand for inference may be very low.

“We live in a world where computing resources are finite and expensive and it can be difficult to maximize utilization and allocate resources efficiently, which is typically done through spreadsheets and calendars,” he said. “Now, without a strategic approach to resource allocation, you’re not only missing opportunities, you’re leaving money on the table.”
