Four thoughts on AI deep learning in 2022

This article is part of a VB special issue. Read the full series here: How Data Privacy Is Transforming Marketing.

We’re putting another year of exciting developments in artificial intelligence (AI) deep learning behind us – one filled with remarkable progress, controversies and, of course, disputes. As we wrap up 2022 and prepare to embrace what 2023 has in store, here are some of the most notable overarching trends that marked this year in deep learning.

1. Scale continues to be an important factor

One theme that has remained constant in deep learning over the past few years is the drive to create bigger neural networks. Scaling has been made possible by the availability of compute resources, specialized AI hardware, large datasets, and the development of scale-friendly architectures like the transformer model.

For the moment, companies are obtaining better results by scaling neural networks to larger sizes. In the past year, DeepMind announced Gopher, a 280-billion-parameter large language model (LLM); Google announced Pathways Language Model (PaLM), with 540 billion parameters, and Generalist Language Model (GLaM), with up to 1.2 trillion parameters; and Microsoft and Nvidia released the Megatron-Turing NLG, a 530-billion-parameter LLM.
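To make these parameter counts more concrete, here is a rough back-of-the-envelope sketch of how much memory the weights alone would occupy. It assumes every parameter is stored as a 16-bit number (2 bytes), which is an illustrative assumption and not something the announcements above specify. It also ignores optimizer state, activations and any sparsity in how the parameters are used. The function name `weight_memory_gb` is hypothetical.

```python
# Rough estimate of the memory needed just to hold a model's weights.
# Assumption (not from the article): each parameter is stored in 2 bytes (16-bit).
BYTES_PER_PARAM = 2

# Parameter counts as quoted in the article.
MODELS = {
    "Gopher": 280_000_000_000,
    "Megatron-Turing NLG": 530_000_000_000,
    "PaLM": 540_000_000_000,
    "GLaM (upper bound)": 1_200_000_000_000,
}


def weight_memory_gb(num_params, bytes_per_param=BYTES_PER_PARAM):
    """Return the size of the weights in gigabytes (1 GB = 10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    for name, num_params in MODELS.items():
        print(f"{name}: ~{weight_memory_gb(num_params):,.0f} GB of weights")
```

Under that assumption, even the smallest model in the list needs hundreds of gigabytes for its weights, which is far more than a single typical accelerator holds and is part of why specialized hardware matters for scaling.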

One of the interesting aspects of scale is emergent abilities, where larger models succeed at accomplishing tasks that were impossible with smaller ones. This phenomenon has been especially intriguing in LLMs, where models show promising results on a wider range of tasks and benchmarks as they grow in size.


It is worth noting, however, that some of deep learning’s fundamental problems remain unsolved, even in the largest models (more on this in a bit).

2. Unsupervised learning continues to deliver

Many successful deep learning applications require humans to label training examples, an approach known as supervised learning. But most data available on the internet does not come with the clean labels needed for supervised learning, and data annotation is expensive and slow, creating bottlenecks. This is why researchers have long sought advances in unsupervised learning, where deep learning models are trained without the need for human-annotated data.

There has been tremendous progress in this field in recent years, especially in LLMs, which are mostly trained on large sets of raw data gathered from around the internet. While LLMs continued to make progress in 2022, we also saw other unsupervised learning techniques gain traction.
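A minimal sketch can show why raw text needs no human labels: the training targets come from the text itself. The example below builds (context, next word) pairs from an unlabeled sentence, which is the general idea behind next-token prediction. It is a simplified illustration, not any particular model's pipeline. Splitting on whitespace stands in for a real tokenizer, and the function name `next_token_pairs` and the `context_size` parameter are hypothetical.

```python
def next_token_pairs(text, context_size=3):
    """Turn raw, unlabeled text into (context, target) training pairs.

    Each target is simply the word that follows the context in the
    original text, so no human annotation is required.
    Assumption: whitespace splitting stands in for a real tokenizer.
    """
    tokens = text.split()
    pairs = []
    for i in range(len(tokens) - context_size):
        context = tokens[i:i + context_size]
        target = tokens[i + context_size]
        pairs.append((context, target))
    return pairs


if __name__ == "__main__":
    for context, target in next_token_pairs("the cat sat on the mat"):
        print(context, "->", target)
```

Because every stretch of text yields its own targets, the amount of usable training data is limited by how much text can be collected, not by how much can be annotated.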

For example, there were phenomenal advances in text-to-image models this year. Models like OpenAI’s DALL-E 2, Google’s Imagen, and Stability AI’s Stable Diffusion have displayed the power of unsupervised learning. Unlike older text-to-image models, which required well-annotated pairs of images and descriptions, these models use large datasets of loosely captioned images that already exist on the internet. The sheer size of their training datasets (which is only possible because there’s no need for manual labeling) and the variability of the captioning schemes enable these models to find all kinds of intricate patterns between textual and visual information. As a result, they are much more flexible in generating images for various descriptions.

3. Multimodality takes big strides

Text-to-image generators have another interesting characteristic: they combine multiple data types in a single model. Being able to process multiple modalities enables deep learning models to take on much more complicated tasks. 

Multimodality is very important to the kind of intelligence found in humans and animals. For instance, when you see a tree and hear the rustling of the wind in its branches, your mind can quickly associate the two. Likewise, when you see the word “tree,” you can quickly conjure the image of a tree, remember the smell of pine after a rainfall, or recall other experiences you’ve previously had.

Evidently, multimodality has played an important role in making deep learning systems more flexible. This was perhaps best displayed by DeepMind’s Gato, a deep learning model trained on a variety of data types, including images, text and proprioception data. Gato showed decent performance in multiple tasks, including image captioning, interactive dialogues, controlling a robotic arm and playing games. This is in contrast to classic deep learning models, which are designed to perform a single task.

Some researchers have taken the notion as far as proposing that a system like Gato is all we need to achieve artificial general intelligence (AGI). While many scientists disagree with this opinion, what is certain is that multimodality has brought important achievements for deep learning.

4. Fundamental deep learning problems remain

Despite the impressive achievements of deep learning, some of the field’s problems remain unsolved. Among them are causality, compositionality, common sense, reasoning, planning, intuitive physics, and abstraction and analogy-making. 

These are some of the mysteries of intelligence that are still being studied by scientists in different fields. Pure scale- and data-based deep learning approaches have helped make incremental progress on some of these problems while failing to provide a definitive solution. 

For example, larger LLMs can maintain coherence and consistency over longer stretches of text. But they fail on tasks that require meticulous step-by-step reasoning and planning.

Likewise, text-to-image generators create stunning graphics but make basic mistakes when asked to draw images that require compositionality or have complex descriptions.

These challenges are being discussed and explored by different scientists, including some of the pioneers of deep learning. Prominent among them is Yann LeCun, the Turing Award–winning inventor of convolutional neural networks (CNNs), who recently wrote a long essay on the limits of LLMs that learn from text alone. LeCun is researching a deep learning architecture that learns world models and could tackle some of the challenges the field currently faces.

Deep learning has come a long way. But the more progress we make, the more we become aware of the challenges of creating truly intelligent systems. Next year will surely be just as exciting as this one.

Originally appeared on: TheSpuzz