The use of AI continues to expand, and as more and more companies integrate AI tools into their workflows, many are looking for ways to reduce the cost of running AI models.
In response to customer demand, AWS announced two new Bedrock features that reduce the cost of running AI models and applications, both of which are already available on competing platforms.
During a keynote speech at AWS re:Invent, Swami Sivasubramanian, vice president of AI and Data at AWS, announced Intelligent Prompt Routing on Bedrock and the arrival of Prompt Caching.
Intelligent Prompt Routing helps customers route prompts to the right-sized model, so that a large model doesn't end up answering a simple query.
“Developers need the right models for their applications, which is why we offer a wide range of models,” Sivasubramanian said.
AWS said Intelligent Prompt Routing “can reduce costs by up to 30% without compromising accuracy.” Users choose a model family, and Bedrock’s Intelligent Prompt Routing sends each prompt to the right-sized model within that family.
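As a rough sketch of how this looks in practice, the snippet below calls Bedrock's Converse API through boto3 with a prompt-router ARN in place of a specific model ID. The ARN, account ID, and region are hypothetical placeholders, not values from the article.

```python
import boto3

# Minimal sketch, assuming a prompt router has already been configured for a
# model family in Bedrock. The ARN below is a hypothetical placeholder.
ROUTER_ARN = "arn:aws:bedrock:us-east-1:123456789012:default-prompt-router/example"

client = boto3.client("bedrock-runtime", region_name="us-east-1")

# Passing the router ARN instead of a model ID lets Bedrock pick the
# right-sized model in the chosen family for each individual prompt.
response = client.converse(
    modelId=ROUTER_ARN,
    messages=[{"role": "user", "content": [{"text": "Do you have a reservation?"}]}],
)

print(response["output"]["message"]["content"][0]["text"])
```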
Routing prompts across different models to optimize usage and costs has slowly gained prominence in the AI industry. The startup Not Diamond announced its intelligent routing functionality in July.
Voice agent company Argo Labs, an AWS customer, said it uses Intelligent Prompt Routing to ensure that right-sized models handle different customer requests. Simple yes-or-no questions like “Do you have a reservation?” are handled by a smaller model, while more complicated queries like “What vegan options are available?” are directed to a larger one.
Caching prompts
AWS also announced that Bedrock will now support prompt caching, which lets Bedrock retain commonly used or repeated prompts without pinging the model and generating new tokens each time.
“Token generation costs can often increase, particularly for repeat requests,” Sivasubramanian said. “We wanted to give customers an easy way to dynamically cache prompts without sacrificing accuracy.”
AWS said prompt caching reduces costs “by up to 90% and latency by up to 85% for supported models.”
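For illustration, here is a hedged sketch of how a reusable prompt prefix might be marked for caching through Bedrock's Converse API. The model ID is an example, and the cachePoint block assumes the chosen model supports Bedrock's prompt caching.

```python
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

# Long, reusable instructions that are identical across many requests.
SYSTEM_PROMPT = "You are a voice agent for a restaurant. Answer guest questions..."

# Minimal sketch, assuming the model supports prompt caching on Bedrock.
# The cachePoint block marks the end of the cacheable prefix, so repeat
# requests sharing this system prompt skip regenerating those tokens.
response = client.converse(
    modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",  # example model ID
    system=[
        {"text": SYSTEM_PROMPT},
        {"cachePoint": {"type": "default"}},  # cache everything above this point
    ],
    messages=[{"role": "user", "content": [{"text": "What vegan options are available?"}]}],
)

print(response["output"]["message"]["content"][0]["text"])
```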
However, AWS is a bit late to this trend. Prompt caching is already available on other platforms to help users reduce costs when reusing prompts. Anthropic offers prompt caching for Claude 3.5 Sonnet and Haiku on its API, and OpenAI has also extended prompt caching to its API.
Using AI models can be expensive
Running AI applications remains expensive, not only because of the cost of training models but also because of the cost of using them. Businesses have said that the high cost of using AI is still one of the biggest obstacles to wider adoption.
As companies move toward agent-based use cases, there is still a cost every time a user pings the model for the agent to start performing its tasks. Methods like prompt caching and intelligent routing can help reduce those costs by limiting how often a prompt pings a model API to answer a query.
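As a back-of-the-envelope illustration of how caching trims per-query cost: the only figure taken from the article is AWS's “up to 90%” claim, read here as cached tokens being billed at a 90% discount. The base price and cache-hit rate are made-up assumptions.

```python
# Hypothetical numbers: only the 90% discount for cached tokens reflects
# AWS's "up to 90%" claim; the base price and hit rate are assumptions.
PRICE_PER_1K_INPUT_TOKENS = 0.003  # assumed base price (USD)
CACHED_DISCOUNT = 0.90             # cached tokens billed at up to 90% less
CACHE_HIT_RATE = 0.60              # assumed share of input tokens served from cache

blended_price = PRICE_PER_1K_INPUT_TOKENS * (
    CACHE_HIT_RATE * (1 - CACHED_DISCOUNT) + (1 - CACHE_HIT_RATE)
)
print(f"Blended price per 1K input tokens: ${blended_price:.5f}")
# With these assumptions the blended input cost falls from $0.00300 to
# $0.00138, roughly a 54% reduction.
```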
Model developers, however, have said that prices for some models may decrease as adoption increases. OpenAI has said it anticipates that AI costs could drop soon.
More models
AWS, which hosts many models on Bedrock, including Amazon's own new Nova models and those from major open-source vendors, will add new models to the platform. These include models from Poolside, Stability AI's Stable Diffusion 3.5, and Luma's Ray 2. The models are expected to launch on Bedrock soon.
Luma CEO and co-founder Amit Jain told VentureBeat that AWS is the company’s first cloud provider partner to host its models. Jain said the company used Amazon’s SageMaker HyperPod when building and training the Luma models.
“The AWS team had engineers who felt like part of our team because they helped us solve problems. It took us about a week or two to bring our models to life,” Jain said.