Large language models (LLMs) are all the talk of the AI world right now, but training them can be challenging and expensive: getting a multibillion-parameter model up and running reliably and accurately can take experienced engineers months of work.
A new joint offering from Cerebras Systems and Cirrascale Cloud Services aims to democratize AI by giving users the ability to train GPT-class models much more inexpensively than existing providers — and with just a few lines of code.
“We believe that LLMs are under-hyped,” Andrew Feldman, CEO and cofounder of Cerebras Systems said in a pre-briefing. “Within the next year, we will see a sweeping rise in the impact of LLMs in various parts of the economy.”
Similarly, generative AI may be one of the most important technological advances in recent history, enabling users to write documents, create images and code software from ordinary text inputs.
To help accelerate adoption and improve the accuracy of generative AI, Cerebras also today announced a new partnership with AI content platform Jasper AI.
“We really feel like the next chapter of generative AI is personalized models that continually get better and better,” said Jasper CEO Dave Rogenmoser.
Stage one of the technology was “really exciting,” he said, but “it’s about to get much, much more exciting.”
Unlocking research opportunities
Traditional cloud providers can struggle with LLMs because they are unable to guarantee latency between large numbers of GPUs. Feldman explained that variable latency creates complex, time-consuming challenges in distributing a large AI model across GPUs, producing “large swings in time to train.”
The new Cerebras AI Model Studio, which is hosted on the Cirrascale AI Innovation Cloud, allows users to train generative pre-trained transformer (GPT)-class models — including GPT-J, GPT-3 and GPT-NeoX — on Cerebras Wafer-Scale Clusters. This includes the newly announced Andromeda AI supercomputer.
Users can choose from state-of-the-art GPT-class models ranging from 1.3 billion to 175 billion parameters, reaching target accuracy eight times faster than on an A100 and at half the price of traditional cloud providers, said Feldman.
For instance, training GPT-J from scratch takes roughly 64 days on a traditional cloud; the Cerebras AI Model Studio reduces that to eight days. Similarly, on traditional clouds, GPU costs alone run up to $61,000, while on Cerebras a full production run costs $45,000.
The new tool eliminates the need for devops and distributed programming, offering push-button scaling of models from 1 billion to 20 billion parameters. Models can also be trained with longer sequence lengths, opening up new research opportunities.
“We’re unlocking a fundamentally new ability to research at this scale,” said Cerebras head of product Andy Hock.
As Feldman noted, Cerebras’ mission is “to broaden access to deep learning and rapidly accelerate the performance of AI workloads.”
Its new AI Model Studio is “easy and dead simple,” he said. “We’ve organized this so you can jump on, you can point, you can click.”
Accelerating AI’s potential
Meanwhile, the young Jasper (founded in 2021) will use Cerebras’ Andromeda AI supercomputer to train its computationally intensive models in “a fraction of the time,” said Rogenmoser.
As he noted, enterprises want personalized models, “and they want them badly.”
“They want these models to become better, to self-optimize based on past usage data, based on performance,” he said.
In its initial work on small workloads with Andromeda — which was announced this month at SC22, the international conference for high-performance computing, networking, storage and analysis — Jasper found that the supercomputer completed work that thousands of GPUs were incapable of doing.
The company expects to “dramatically advance AI work,” including training GPT networks to fit AI outputs to all levels of end-user complexity and granularity. This will enable Jasper to personalize content across multiple classes of customers quickly and easily, said Rogenmoser.
The partnership “enables us to invent the future of generative AI by doing things that are impractical or simply impossible with traditional infrastructure,” he said.
Jasper’s products are used by 100,000 customers to write copy for marketing, ads, books and other materials. Rogenmoser described the company as eliminating “the tyranny of the blank page” by serving as “an AI co-pilot.”
As he put it, this allows creators to focus on the key elements of their story, “not the mundane.”