Large language models capable of writing poems, summaries, and computer code are driving the demand for “natural language processing (NLP) as a service.” As these models become more capable — and accessible, relatively speaking — appetite in the enterprise for them is growing. According to a 2021 survey from John Snow Labs and Gradient Flow, 60% of tech leaders indicated that their NLP budgets grew by at least 10% compared to 2020, while a third — 33% — said that their spending climbed by more than 30%.
Well-resourced providers like OpenAI, Cohere, and AI21 Labs are reaping the benefits. As of March, OpenAI said that GPT-3 was being used in more than 300 different apps by “tens of thousands” of developers and producing 4.5 billion words per day. Historically, training and deploying these models was beyond the reach of startups without substantial capital — not to mention compute resources. But the emergence of open source NLP models, datasets, and infrastructure is democratizing the technology in surprising ways.
Open source NLP
The hurdles to developing a state-of-the-art language model are significant. Those with the resources to develop and train them, like OpenAI, often choose not to open-source their systems in favor of commercializing them (or exclusively licensing them). But even the models that are open-sourced require immense compute resources to commercialize.
Take, for example, Megatron 530B, which was jointly created and released by Microsoft and Nvidia. The model was originally trained across 560 Nvidia DGX A100 servers, each hosting 8 Nvidia A100 80GB GPUs. Microsoft and Nvidia say that they observed between 113 and 126 teraflops per GPU while training Megatron 530B, which would put the training cost in the millions of dollars. (A teraflop is one trillion floating-point operations per second, a common measure of the performance of hardware, including GPUs.)
Inference, or actually running the trained model, is another challenge. Getting inferencing (e.g., sentence autocompletion) time with Megatron 530B down to half a second requires the equivalent of two $199,000 Nvidia DGX A100 systems. While cloud alternatives might be cheaper, they’re not dramatically so: one estimate pegs the cost of running GPT-3 on a single Amazon Web Services instance at a minimum of $87,000 per year.
Recently, however, open research efforts like EleutherAI have lowered the barriers to entry. A grassroots collection of AI researchers, EleutherAI aims to eventually deliver the code and datasets needed to run a model similar (though not identical) to GPT-3. The group has already released a dataset called The Pile that’s designed to train large language models to complete text, write code, and more. (Incidentally, Megatron 530B was trained on The Pile.) And in June, EleutherAI made available under the Apache 2.0 license GPT-Neo and its successor, GPT-J, a language model that performs nearly on par with an equivalent-sized GPT-3 model.
One of the startups serving EleutherAI’s models as a service is NLP Cloud, which was founded a year ago by Julien Salinas, a former software engineer at Hunter.io and the founder of money-lending service StudyLink.fr. Salinas says the idea came to him when he realized that, as a programmer, it was becoming easier to leverage open source NLP models for business applications but harder to get them to run properly in production.
NLP Cloud — which has five employees — hasn’t raised money from external investors, but claims to be profitable.
“Our customer base is growing rapidly, and we see very diverse customers using NLP Cloud, from freelancers to startups and bigger tech companies,” Salinas told VentureBeat via email. “For example, we are currently helping a customer create a programming expert AI that doesn’t code for you but, even more importantly, gives you advanced information about specific technical fields that you can leverage when developing your application (e.g., as a Go developer, you might want to learn how to use goroutines). We have another customer who fine-tuned his own version of GPT-J on NLP Cloud in order to make medical summaries of conversations between doctors and patients.”
NLP Cloud competes with Neuro, which serves models, including EleutherAI’s GPT-J, via an API on a pay-per-use basis. Pursuing greater efficiency, Neuro says it runs a lighter-weight version of GPT-J that still produces “strong results” for applications like generating marketing copy. In another cost-saving measure, Neuro also has customers share cloud GPUs, the power consumption of which the company caps below a certain level.
“Customer growth has been good. We’ve had many users put us into their production environment without having spoken with them — which is amazing for an enterprise product,” CEO Paul Hetherington told VentureBeat via email. “Some people have spent over $1,000 in their first day of usage with integration times of minutes in many instances. We have customers using GPT-J … in a variety of ways, including market copy, generating stories and articles, and generating dialogue for characters in games or chatbots.”
Neuro, which claims to run all of its compute in-house, has an 11-person team and recently graduated from Y Combinator’s Winter 2021 cohort. Hetherington says that the plan is to continue to build out its cloud network and to grow its relationship with EleutherAI.
Another EleutherAI model adopter is CoreWeave, which also works closely with EleutherAI to train the group’s larger models. CoreWeave, a cloud service provider that initially focused on cryptocurrency mining, says that serving NLP models is its “largest use case to date” and currently works with customers including Novel AI, whose AI-powered platform helps users create stories and embark on text-based adventures.
“We’ve leaned into NLP because of the size of the market and the void we fill as a cloud provider,” CoreWeave cofounder and CTO Brian Venturo told VentureBeat via email. “I think we’ve been really successful here because of the infrastructure we built, and the cost advantages our clients see on CoreWeave compared to competitors.”
No language model is immune to bias and toxicity, as research has repeatedly shown. Larger NLP-as-a-service providers have taken a range of approaches in attempting to mitigate the effects, from consulting external advisory councils to implementing filters that prevent customers from using the models to generate certain content, like that pertaining to self-harm.
At the dataset level, EleutherAI claims to have performed “extensive bias analysis” on The Pile and made “tough editorial decisions” to exclude data that they felt were “unacceptably negatively biased” toward certain groups or views.
NLP Cloud allows customers to upload a blacklist of words to reduce the risk of generating offending content with its hosted models. In order to preserve the integrity of the original models, flaws and all, the company hasn’t deployed filters or attempted to detoxify any of the models it serves. But Salinas says that if NLP Cloud does make modifications in the future, it’ll be transparent about the fact that it has done so.
“The most important risk of toxicity comes from GPT-J as it is a powerful AI model for text generation, so it should be used responsibly,” Salinas said.
Neither NLP Cloud nor Neuro explicitly prohibits customers from using models for potentially problematic use cases, although both reserve the right to revoke access to the models for any reason. CoreWeave, for its part, believes that not policing its customers’ applications is a selling point of its service, but advocates for general “AI safety.”
“[O]ur clients fine-tune models [to, for instance, reduce toxicity] regularly. This empowers them to ‘re-train’ large language models on a relatively small data set to make the model more relevant to their use case,” Venturo continued. “We don’t currently have an out-of-the-box solution for clients to do this, but I’d expect that to change in the coming weeks.”
Hetherington notes that Neuro also offers fine-tuning capabilities “with little-to-no programming expertise required.”
The path forward
While the hands-off approach to model moderation might not sit well with every customer, startups like NLP Cloud, Neuro, and CoreWeave argue that they’re making NLP technology more accessible than their better-funded rivals.
For example, on NLP Cloud, the plan for three requests per minute using GPT-J costs $29 per month on a cloud CPU or $99 per month on a GPU, no matter the number of tokens (roughly, words or fragments of words). By contrast, OpenAI charges on a per-token basis. Towards Data Science compared OpenAI’s and NLP Cloud’s offerings and found that a customer offering an essay-generating app that receives 10 requests every minute would have to pay around $2,850 per month if they used one of OpenAI’s less-capable models (Curie) versus $699 with NLP Cloud.
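The gap between the two pricing models comes down to simple arithmetic. The sketch below is one rough way to see how a per-token bill of that order of magnitude could arise. Only the 10 requests per minute and the $699 flat price come from the comparison above; the function names, the 30-day month, the 1,100 tokens per request, and the $0.006 per 1,000 tokens rate are illustrative assumptions, not figures reported in the article.

```python
# Illustrative comparison of per-token and flat-rate API pricing.
# ASSUMPTIONS (not from the article): a 30-day month of constant traffic,
# 1,100 tokens per request, and a rate of $0.006 per 1,000 tokens.

MINUTES_PER_MONTH = 60 * 24 * 30  # assumed 30-day month


def per_token_monthly_cost(requests_per_minute, tokens_per_request, price_per_1k_tokens):
    """Monthly bill when the provider charges for every token processed."""
    requests_per_month = requests_per_minute * MINUTES_PER_MONTH
    tokens_per_month = requests_per_month * tokens_per_request
    return tokens_per_month / 1000 * price_per_1k_tokens


def cost_ratio(per_token_cost, flat_monthly_price):
    """How many times more expensive the per-token bill is than the flat plan."""
    return per_token_cost / flat_monthly_price


if __name__ == "__main__":
    # 10 requests per minute and the $699 flat price are taken from the text;
    # the other two inputs are hypothetical.
    metered = per_token_monthly_cost(
        requests_per_minute=10,
        tokens_per_request=1100,    # assumed average request size
        price_per_1k_tokens=0.006,  # assumed per-token rate
    )
    flat = 699.0
    print(f"Per-token plan: ${metered:,.0f} per month")
    print(f"Flat-rate plan: ${flat:,.0f} per month")
    print(f"Ratio: {cost_ratio(metered, flat):.1f}x")
```

Under these assumed inputs the metered bill scales linearly with both traffic and request length, while the flat plan does not, which is why the difference widens for apps that generate long outputs such as essays.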
Startups built on open source models like EleutherAI’s could drive the next wave of NLP adoption. Advisory firm Mordor Intelligence forecasts that the NLP market will more than triple its revenue by 2025, as business interest in AI rises.
“Deploying these models efficiently so we can maintain an affordable pricing, while making them reliable without any interruption, is a challenge. [But the goal is to provide] a way for developers and data scientists to make the most of NLP in production without worrying about DevOps,” Salinas said.