Large Language Models (LLMs), or systems that understand and generate text, have recently emerged as a hot topic in the field of AI. The release of LLMs by tech giants such as OpenAI, Google, Amazon, Microsoft and Nvidia, and open-source communities demonstrates the high potential of the LLM field and represents a major step forward in its development. Not all language models, however, are created equal.
In this article, we’ll look at the key differences among approaches to using LLMs after they are built: open-source products, products for internal use, product platforms and products built on top of platforms. We’ll also dig into the complexities of each approach, and discuss how each is likely to advance in the coming years. But first, the bigger picture.
What are large language models anyway?
The common applications of LLMs range from simple tasks such as question answering, text recognition and text classification to more creative ones such as text or code generation, research into current AI capabilities and human-like conversational agents. The creative generation is certainly impressive, but more advanced products based on these models are yet to come.
What’s the big deal about LLM technology?
The use of LLMs has increased dramatically in recent years as newer and larger systems are developed. One reason is that a single model can be used for a variety of tasks, such as text generation, sentence completion, classification and translation. In addition, they appear capable of making reasonable predictions when given only a few labeled examples, so-called “few-shot learning.”
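The few-shot idea can be sketched concretely: a handful of labeled examples are placed directly in the prompt so the model can infer the task pattern without any retraining. The sentiment task, example texts and labels below are illustrative, not from the article.

```python
# Minimal sketch of few-shot prompting: labeled examples go into the
# prompt itself, followed by the new input the model should label.

def build_few_shot_prompt(examples, query):
    """Format labeled examples and a new query into a single prompt string."""
    lines = []
    for text, label in examples:
        lines.append(f"Review: {text}\nSentiment: {label}")
    lines.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(lines)

examples = [
    ("The plot was gripping from start to finish.", "positive"),
    ("I walked out halfway through.", "negative"),
]
prompt = build_few_shot_prompt(examples, "A delightful surprise.")
print(prompt)
```

The resulting string would be sent to the model as-is; the model completes the final "Sentiment:" line, having picked up the pattern from the two examples.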
Let’s take a closer look at three different development paths available to LLMs. We’ll evaluate the potential drawbacks each may face in the future, and brainstorm potential solutions.
Open-source LLMs are created as open-collaboration software, with the original source code and models made freely available for redistribution and modification. This allows AI scientists to work on and use the models’ high-quality capabilities (for free) on their own projects, rather than limiting model development to a selected group of tech companies.
A few examples are BLOOM, YaLM and even models from Salesforce, all of which provide environments that facilitate rapid and scalable AI/ML development. Even though open-source development is by definition open for contributors to use, it still incurs high development costs. Hosting, training and even fine-tuning these models is a further drain, as each requires investment, specialized knowledge and large clusters of interconnected GPUs.
Tech companies’ continuing investment and open-sourcing of these technologies could be motivated by brand-related goals, such as showcasing the company’s leadership in the field, or by more practical ones, such as discovering alternative value-adds that the broader community can come up with.
In other words, investment and human guidance are required for these technologies to be useful for business applications. Often, adaptation of models can be achieved through either fine-tuning on certain amounts of human-labeled data, or continuous interaction with developers and the results they generated from the models.
The clear leader here is OpenAI, which has created the most useful models and enabled some of them through an API. But many smaller startups, such as CopyAI, JasperAI and Contenda, kickstart the development of their own LLM-powered applications on top of the “model-as-a-service” offered by leaders in the field.
As these smaller businesses compete for a share of their respective markets, they leverage the power of supercomputer-scale models, fine-tuning for the task at hand while using a much smaller quantity of data. Their applications are typically trained to solve a single task, and focus on a specific and much narrower market segment.
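This fine-tuning-on-a-platform workflow usually starts with preparing a small, task-specific dataset. Many hosted "model-as-a-service" providers accept training data as JSONL records of prompt/completion pairs; the exact field names vary by provider, so the format and the ad-copy examples below are illustrative assumptions.

```python
import json

# Sketch: preparing a small task-specific dataset for fine-tuning a
# hosted LLM. Each line of the resulting JSONL file is one labeled
# example for the narrow task (here: ad headline generation).

records = [
    {"prompt": "Headline for a running-shoe ad:",
     "completion": " Run farther, feel lighter."},
    {"prompt": "Headline for a coffee-brand ad:",
     "completion": " Wake up to something better."},
]

jsonl = "\n".join(json.dumps(r) for r in records)
print(jsonl)
```

A file like this, even with only hundreds or thousands of rows, is typically enough to specialize a large pretrained model for a single narrow task, which is what lets small startups compete without supercomputer-scale training budgets.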
Other companies develop their own models competitive with OpenAI’s, contributing to the advancement of the science of generative AI. Examples include AI21, Cohere and GPT-J-6B by EleutherAI, whose models generate or classify text.
Another application of language models is code generation. Companies such as OpenAI and GitHub (with the GitHub Copilot plugin based on OpenAI Codex), Tabnine and Kite produce tools for automatic code generation.
Tech giants like Google, DeepMind and Amazon keep their own versions of LLMs — some of which are based on open-source data — in-house. They research and develop their models to further the field of language AI; to use them as classifiers for business functions such as moderation and social media classification; or to assist in the development of long tails for large collections of written requests, such as ad and product description generation.
What are the limitations of LLMs?
We’ve already discussed some of the drawbacks, such as high development and maintenance costs. Let’s dive a bit deeper into the more technical issues and the potential ways of overcoming them.
According to research, larger models generate false answers, conspiracies and untrustworthy information more frequently than smaller ones do. The 6B-parameter GPT-J model, for example, was 17% less truthful than its 125M-parameter counterpart.
Since LLMs are trained on internet data, they may capture undesirable societal biases relating to race, gender, ideology and religion. In this context, alignment with disparate human values still remains a particular challenge.
Providing open access to these models, as in the recent Galactica case, can be risky as well. Without preliminary human verification, the models might inadvertently produce racist comments or inaccurate scientific claims.
Is there a solution to improve LLMs?
Merely scaling up models appears to be less promising for improving truthfulness and avoiding explicit content than fine-tuning with training objectives other than text imitation.
A bias or truth detection system with a supervised classifier that analyzes content to find parts that fit the definition of “biased” for a given case could be one way to fix these types of errors. But that still leaves you with the problem of training the model.
The solution is data, or, more specifically, a large amount of data labeled by humans. After the system is fed enough data samples, along with annotations locating explicit content, the portions of the dataset identified as harmful or false are either removed or masked to prevent their use in the model’s outputs.
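The flag-then-filter step can be sketched with a toy stand-in: here a trivial blocklist check plays the role of the supervised classifier, and any flagged sample is masked out of the training set. A production system would instead use a classifier trained on the human-labeled data described above; the blocklist tokens and sample texts are placeholders.

```python
# Toy stand-in for the content-filtering pipeline: a trivial keyword
# "classifier" flags samples, and flagged samples are masked so they
# cannot surface in the model's outputs.

BLOCKLIST = {"slur1", "slur2"}  # placeholder tokens, not a real lexicon

def is_harmful(text):
    """Flag a sample if any of its tokens match the blocklist."""
    return any(tok in BLOCKLIST for tok in text.lower().split())

def clean_dataset(samples, mask="[REMOVED]"):
    """Replace flagged samples with a mask token, keep the rest."""
    return [mask if is_harmful(s) else s for s in samples]

data = ["a helpful answer", "contains slur1 here", "another clean sample"]
cleaned = clean_dataset(data)
print(cleaned)  # ['a helpful answer', '[REMOVED]', 'another clean sample']
```

Masking rather than silently deleting keeps the dataset's alignment intact, which matters when annotations reference positions in the original corpus.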
In addition to bias detection, human evaluation can be used to evaluate texts based on their fluency and readability, natural language, grammatical errors, cohesion, logic and relevance.
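Such human evaluation is commonly collected as per-criterion scores from several annotators and then aggregated. A minimal sketch, assuming an illustrative 1-to-5 scale and equal weighting of criteria (both assumptions, not from the article):

```python
from statistics import mean

# Each dict holds one annotator's 1-5 scores for a generated text,
# along the criteria mentioned above.
ratings = [
    {"fluency": 5, "grammar": 4, "cohesion": 4, "logic": 5, "relevance": 4},
    {"fluency": 4, "grammar": 4, "cohesion": 3, "logic": 4, "relevance": 5},
]

def aggregate(ratings):
    """Average each criterion across annotators, then average overall."""
    criteria = ratings[0].keys()
    per_criterion = {c: mean(r[c] for r in ratings) for c in criteria}
    overall = mean(per_criterion.values())
    return per_criterion, overall

per_criterion, overall = aggregate(ratings)
print(per_criterion, round(overall, 2))
```

Per-criterion averages make it possible to see, for instance, that a model is fluent but incoherent, rather than collapsing everything into a single opaque score.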
Not quite AGI yet
Without a doubt, recent years have seen some truly impressive advances in AI language models, and scientists have been able to make progress in some of the field’s most difficult areas. Yet despite their progress, LLMs still lack some of the most important aspects of intelligence, such as common sense, causality detection, explicit language detection and intuitive physics.
As a result, some researchers are questioning whether training solely on language is the best way to build truly intelligent systems, regardless of how much data is used. Language functions well as a compression system for communicating the essence of messages. But it is difficult to learn the specifics and contexts of human experience through language alone.
A system trained on both form and meaning — for example, on videos, images, sounds and text simultaneously — might aid in advancing the science of natural language understanding. In any case, it will be interesting to see where developing robust LLM systems will take science. One thing is hard to doubt, though: The potential value of LLMs is still significantly greater than what has been achieved so far.
Fedor Zhdanov is head of ML at Toloka.