Liquid AI’s new STAR model architecture surpasses Transformers




As rumors and reports circulate about the difficulties leading AI companies face in developing newer, more powerful large language models (LLMs), the spotlight is increasingly shifting toward alternative architectures to the Transformer, the technology underpinning much of the current boom in generative AI, pioneered by Google researchers in the seminal 2017 paper “Attention Is All You Need.”

As described in that paper and in the many works that followed, a Transformer is a deep learning neural network architecture that processes sequential data, such as text or time-series information.

Now Liquid AI, a startup born at MIT, has introduced STAR (Synthesis of Tailored Architectures), an innovative framework designed to automate the generation and optimization of AI model architectures.

The STAR framework leverages evolutionary algorithms and a numerical coding system to address the complex challenge of balancing quality and efficiency in deep learning models.

According to Liquid AI’s research team, which includes Armin W. Thomas, Rom Parnichkun, Alexander Amini, Stefano Massaroli, and Michael Poli, STAR’s approach represents a shift from traditional architectural design methods.

Instead of relying on manual tuning or pre-built templates, STAR uses a hierarchical coding technique, called “STAR genomes,” to explore a vast design space of potential architectures.

These genomes enable iterative optimization processes such as recombination and mutation, allowing STAR to synthesize and refine architectures tailored to specific parameters and hardware requirements.
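
To give a sense of how a genome-driven search of this kind can work, the sketch below shows a minimal evolutionary loop in Python, with genomes encoded as lists of operator choices that are scored, recombined and mutated over generations. The operator names, fitness function and loop structure are illustrative assumptions, not Liquid AI’s published implementation.

```python
"""Minimal sketch of an evolutionary architecture search loop in the spirit of
the approach described above. All names, operators, and the scoring function
are illustrative assumptions, not Liquid AI's actual STAR implementation."""
import random

# A "genome" here is simply a list of operator choices, one per layer slot.
OPERATORS = ["attention", "recurrence", "convolution", "gated_mlp"]

def random_genome(num_layers: int = 8) -> list[str]:
    return [random.choice(OPERATORS) for _ in range(num_layers)]

def mutate(genome: list[str], rate: float = 0.2) -> list[str]:
    # Randomly swap out individual operators.
    return [random.choice(OPERATORS) if random.random() < rate else op
            for op in genome]

def recombine(a: list[str], b: list[str]) -> list[str]:
    # One-point crossover between two parent genomes.
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def score(genome: list[str]) -> float:
    # Toy fitness: in practice this would be measured model quality
    # penalized by inference cache size; here it is a placeholder heuristic.
    quality = sum(1.0 if op != "attention" else 0.8 for op in genome)
    cache_cost = genome.count("attention") * 0.5
    return quality - cache_cost

def evolve(pop_size: int = 20, generations: int = 10) -> list[str]:
    population = [random_genome() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        parents = population[: pop_size // 2]
        children = [mutate(recombine(random.choice(parents),
                                     random.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=score)

if __name__ == "__main__":
    print("Best genome:", evolve())
```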

Cache size reductions of up to 90% compared to traditional Transformers

Liquid AI’s initial focus for STAR was on autoregressive language modeling, an area where traditional Transformer architectures have long been dominant.

In tests conducted during the research, the Liquid AI research team demonstrated STAR’s ability to generate architectures that consistently outperformed Transformer++ and highly optimized hybrid models.

For example, when optimizing cache size and quality, STAR-evolved architectures achieved cache size reductions of up to 37 percent compared to hybrid models and 90 percent compared to Transformers. Despite these improvements in efficiency, STAR-generated models maintained or exceeded the predictive performance of their counterparts.
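
To put those percentages in concrete terms, here is a back-of-envelope sketch; the baseline cache sizes below are purely hypothetical, and only the reduction figures come from the reported results.

```python
# Hypothetical baseline inference cache sizes (assumptions, not reported figures).
transformer_cache_gb = 10.0   # assumed traditional Transformer baseline
hybrid_cache_gb = 2.0         # assumed optimized hybrid-model baseline

# Applying the reported "up to" reductions independently to each baseline.
print(f"vs Transformer (90% smaller): {transformer_cache_gb * (1 - 0.90):.1f} GB")
print(f"vs hybrid (37% smaller):      {hybrid_cache_gb * (1 - 0.37):.2f} GB")
```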

Likewise, when tasked with optimizing model quality and size, STAR reduced parameter counts by up to 13%, while still improving performance on standard benchmarks.

The research also highlighted STAR’s ability to scale its designs. A STAR-evolved model scaled from 125 million to 1 billion parameters delivered comparable or superior results to existing Transformer++ and hybrid models, while significantly reducing inference cache requirements.

Redesigning AI model architecture

Liquid AI said STAR is rooted in a design theory that incorporates principles of dynamical systems, signal processing and numerical linear algebra.

This fundamental approach allowed the team to develop a versatile search space for computational units, including components such as attention mechanisms, recurrences, and convolutions.

One of STAR’s defining features is its modularity, which allows the framework to codify and optimize architectures across multiple hierarchical levels. This capability provides insights into recurring design patterns and allows researchers to identify effective combinations of architectural components.
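
One way to picture that hierarchical, modular encoding is as a nested genome spanning model, block and operator levels. The structure and field names in the sketch below are assumptions made for illustration, not STAR’s actual encoding.

```python
"""Illustrative sketch of a hierarchical genome spanning model, block, and
operator levels. The structure and field names are assumptions, not STAR's
actual encoding."""
from dataclasses import dataclass, field

@dataclass
class OperatorGene:
    kind: str            # e.g. "attention", "recurrence", "convolution"
    width: int = 512     # operator-level hyperparameter

@dataclass
class BlockGene:
    operators: list[OperatorGene] = field(default_factory=list)

@dataclass
class ModelGenome:
    blocks: list[BlockGene] = field(default_factory=list)

    def describe(self) -> str:
        # Flatten the hierarchy into a readable architecture summary.
        lines = []
        for i, block in enumerate(self.blocks):
            ops = ", ".join(f"{op.kind}({op.width})" for op in block.operators)
            lines.append(f"block {i}: {ops}")
        return "\n".join(lines)

# Example: a small hybrid design mixing convolution, recurrence, and attention.
genome = ModelGenome(blocks=[
    BlockGene([OperatorGene("convolution"), OperatorGene("recurrence")]),
    BlockGene([OperatorGene("attention"), OperatorGene("gated_mlp")]),
])
print(genome.describe())
```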

What is the future of STAR?

STAR’s ability to synthesize efficient, high-performance architectures has potential applications far beyond language modeling. Liquid AI envisions this framework being used to address challenges in various industries where the trade-off between quality and computational efficiency is critical.

While Liquid AI has not yet revealed specific plans for commercial deployment or pricing, the research findings signal significant progress in the field of automated architecture design. For researchers and developers looking to optimize AI systems, STAR could be a powerful tool for pushing the limits of model performance and efficiency.

With its open research approach, Liquid AI has published the full details of STAR in a peer-reviewed paper, encouraging collaboration and further innovation. As the AI landscape continues to evolve, frameworks like STAR are poised to play a key role in shaping the next generation of intelligent systems. STAR may also herald the birth of a new post-Transformer architecture boom, a welcome winter holiday gift for the machine learning and artificial intelligence research community.


