Many artificial intelligence (AI) algorithms are used to classify, organize or reason about data. Generative algorithms create data using models of the world to synthesize images, sounds and videos that often look increasingly realistic. The algorithms begin with models of what a world must be like and then they create a simulated world that fits the model.
Generative AIs are frequently found in various content creation roles. They’re used by movie makers to either fill narrative gaps or, sometimes, carry much of the storyline. Some news organizations generate short snippets or even entire stories about events, especially highly structured sports or financial reports.
Not all generative algorithms produce content. Some algorithms are deployed in user interfaces to enhance the screen or user interfaces. Others help the blind by generating audio descriptions. In many applications, the techniques assist rather than take center stage.
The algorithms are now common enough that developers make deliberate artistic decisions about their goals. Some aim for the most realistic output, judged by how indistinguishable the generated people or animals are from photographic footage of real creatures. Others think like artists or animators and aim for a more stylized product that is obviously not real, more like a cartoon.
What are the dangers of generative AIs?
Some generative AI algorithms are good enough to deceive. These results, sometimes called “deep fakes,” may be used to masquerade as another person and commit all manner of fraud in their name. Some may try to imitate a person to withdraw money from a bank. Others may put words in another person’s mouth, exposing them to accusations of libel, slander or worse.
One particularly salacious approach involves generating pornography that seems to include another person. These results may be used for blackmail, coercion, extortion or revenge.
Can the results of generative AIs be distinguished from real images?
The results of modern algorithms are often very realistic, but a trained eye can usually spot small differences. Detection is harder with the most sophisticated algorithms, such as those behind the computer graphics in big-budget Hollywood movies.
The differences are often visible because the generated images are too perfect. The skin tone may follow a steady gradient. The hairs may all bend and wave in the same amounts with the same periods. The colors may be too consistent.
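One of these tells, the unnaturally steady gradient, can be checked mechanically. The sketch below is a simplified illustration, not a production detector; the threshold and the toy pixel rows are assumptions. It measures how much neighboring pixel values vary along a row, since real skin carries noise and texture while synthetic skin can be suspiciously uniform:

```python
import random

def local_variation(row):
    """Mean absolute difference between neighboring pixel values."""
    return sum(abs(b - a) for a, b in zip(row, row[1:])) / (len(row) - 1)

def too_smooth(row, threshold=0.5):
    """Flag a pixel row whose variation falls below an assumed threshold."""
    return local_variation(row) < threshold

# A synthetic face often shows a perfectly steady gradient...
synthetic_row = [i * 0.1 for i in range(20)]

# ...while real skin carries sensor noise and texture.
random.seed(0)
real_row = [i * 0.1 + random.uniform(-2, 2) for i in range(20)]
```

In practice a detector would scan many patches of the image and aggregate the results rather than trust a single row.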
One research project at MIT suggested looking at these areas for inconsistencies that could indicate the work of a generative AI:
- Cheeks and foreheads: The wrinkles in these areas are often nonexistent. If there are wrinkles that are added, they don’t move in a realistic way.
- Shadows: In the areas around the eyes, the nose and an open mouth, the shadows are often poorly formed. They may not follow the lighting of the scene as the head changes position.
- Glasses: The position and angle of any lighting glare on the lenses should shift correctly as the head moves relative to the lights.
- Beards and mustaches: Do these move with the face? Are they all similar in shading and coloring, something that is rare in real life?
- Blinking: Do the eyes blink? Do they blink too often? Or not enough?
- Lips: Do they always move in the same way for all phonemes? Is the size and shape consistent with the rest of the face? Deep fake algorithms try to generate new positions of the lips for each word that is spoken and this leaves many opportunities for detection. If the process is too regular and repetitive, the lip movements may be generated by an algorithm.
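The blinking cue in particular lends itself to a crude automated check. The sketch below is a toy heuristic; the 8 to 30 blinks-per-minute range is an assumed rule of thumb, not a clinical standard. It flags clips whose blink rate falls outside a typical human range:

```python
def blink_rate_suspicious(blink_times, duration_s, lo=8.0, hi=30.0):
    """Return True when the blink rate (per minute) falls outside
    an assumed human range of lo..hi blinks per minute."""
    rate_per_min = 60.0 * len(blink_times) / duration_s
    return rate_per_min < lo or rate_per_min > hi

# A one-minute clip with no blinks at all is a classic deep fake tell.
print(blink_rate_suspicious([], 60.0))                # True: zero blinks
print(blink_rate_suspicious(list(range(15)), 60.0))   # False: 15/min is normal
```

A real system would first need an eye tracker to produce the blink timestamps; this sketch only covers the final decision rule.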
The MIT project also offers readers a chance to explore various deep fakes and attempt to detect them.
What are generative architectures?
The area of creating realistic images, sounds and storylines is new and the focus of much active research. The approaches are varied and far from fixed. Scientists are still discovering new architectures and strategies today.
One common approach is called Generative Adversarial Networks (GAN) because it depends on at least two different AI algorithms competing against each other and then converging upon a result.
One algorithm, often a neural network, is responsible for creating a draft of a solution. It’s called the “generative network.” A second algorithm, also usually a neural network, evaluates the quality of the solution by comparing it to other realistic answers. This is often called the “discriminator network.” Sometimes there can be multiple versions of either the generator or discriminator.
The entire process repeats a number of times and each side of the algorithm helps train the other. The generator learns which results are more acceptable. The discriminator learns which parts of the results are most likely to indicate realism.
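The generator-discriminator loop can be sketched in miniature. The toy below is an illustrative sketch only: the "world" is a one-dimensional Gaussian, both networks are single linear units, and the gradient updates are derived by hand instead of using a deep learning framework. It still shows the two sides training each other:

```python
import math, random

random.seed(0)

REAL_MEAN, REAL_STD = 4.0, 0.5  # the "real" data: a simple 1D Gaussian

def sigmoid(x):
    x = max(-60.0, min(60.0, x))  # clamp to avoid overflow in exp()
    return 1.0 / (1.0 + math.exp(-x))

# Generator g(z) = a*z + b maps noise z ~ N(0,1) to a fake sample.
a, b = 1.0, 0.0
# Discriminator D(x) = sigmoid(w*x + c) estimates the odds that x is real.
w, c = 0.0, 0.0

lr = 0.02
for step in range(2000):
    z = random.gauss(0.0, 1.0)
    fake = a * z + b
    real = random.gauss(REAL_MEAN, REAL_STD)

    # Discriminator ascent on log D(real) + log(1 - D(fake)).
    d_real, d_fake = sigmoid(w * real + c), sigmoid(w * fake + c)
    w += lr * ((1.0 - d_real) * real - d_fake * fake)
    c += lr * ((1.0 - d_real) - d_fake)

    # Generator ascent on log D(fake): push fakes toward "looks real".
    d_fake = sigmoid(w * fake + c)
    grad_out = (1.0 - d_fake) * w
    a += lr * grad_out * z
    b += lr * grad_out

# After training, the generator's offset b has drifted toward REAL_MEAN,
# because the discriminator kept telling it which outputs looked fake.
```

Real GANs replace each linear unit with a deep network and each scalar sample with an image, but the alternating update pattern is the same.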
Another architecture, built on transformers, avoids the adversarial setup. A single network is trained to produce the most realistic solutions. OpenAI’s GPT series, short for Generative Pre-trained Transformer, has been trained over the years on large blocks of text gathered from Wikipedia and the general internet. The latest version, GPT-3, is closed source and licensed directly for many tasks, including generative AI, with Microsoft holding an exclusive license to the underlying model. It’s said to have more than 175 billion parameters. Similar models include Google’s LaMDA (Language Model for Dialogue Applications) and China’s Wu Dao 2.0.
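The transformer models themselves are far too large to reproduce here, but the underlying idea, sampling the next token from statistics learned over a corpus, can be shown with a toy bigram model. This is an illustrative stand-in, not a transformer, and the tiny corpus is invented for the example:

```python
import random

random.seed(1)

corpus = "the cat sat on the mat and the cat saw the rat"
words = corpus.split()

# Bigram table: each word maps to the list of words that followed it.
table = {}
for cur, nxt in zip(words, words[1:]):
    table.setdefault(cur, []).append(nxt)

def generate(start, length):
    """Walk the table, sampling each next word from its observed followers."""
    out = [start]
    while len(out) < length:
        followers = table.get(out[-1])
        if not followers:
            break  # dead end: the last word never appeared mid-corpus
        out.append(random.choice(followers))
    return " ".join(out)

print(generate("the", 8))
```

A model like GPT-3 does the same next-token sampling, but conditions on the entire preceding context through billions of learned parameters rather than a lookup table.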
A third variety is sometimes called a “variational autoencoder.” These solutions borrow from compression algorithms: an encoder shrinks data into a compact representation by exploiting the patterns and structures within it. Generation then runs the process in reverse, feeding random values into the decoder to drive the creation of new data.
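The generation side of a variational autoencoder can be sketched as a decoder driven by random latent values. In the toy below, random untrained weights stand in for a trained decoder, which is an assumption made purely for illustration; it turns a four-number latent vector into a sixteen-value "image":

```python
import math, random

random.seed(2)

LATENT, PIXELS = 4, 16

# In a real VAE these weights come from training; here they are random.
W = [[random.gauss(0.0, 0.5) for _ in range(LATENT)] for _ in range(PIXELS)]
bias = [0.0] * PIXELS

def decode(z):
    """One linear layer plus tanh: each pixel is a squashed mix of the latent."""
    return [math.tanh(sum(wj * zj for wj, zj in zip(row, z)) + bi)
            for row, bi in zip(W, bias)]

z = [random.gauss(0.0, 1.0) for _ in range(LATENT)]  # sample the latent prior
image = decode(z)  # every new random z yields a different "image"
```

Each fresh draw of `z` produces a different output, which is exactly how random values "drive the creation" in this family of models.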
What are the political challenges of generative AI?
Storytelling and fiction are old traditions that are well understood and usually harmless. Generating fake images, videos or audio recordings for political advantage is also an old tradition, but it is far from harmless.
The greatest danger is that generative AI will be used to create fake news stories to influence the political decisions of leaders and citizens. Stories of atrocities, crimes and other forms of misbehavior are easy to concoct. When the AI is able to generate fake evidence, it becomes difficult or even impossible for people to make informed decisions. Truth becomes impossible to ascertain.
For this reason, many believe that truly successful Generative AIs pose a very grave danger to the philosophical foundation of our political and personal lives.
Are computer game companies using generative AI?
Many of the leaders in creating simulated visual scenes and audio are computer game companies. The companies that specialize in computer graphics have spent the last few decades creating more elaborate versions of reality that are increasingly realistic. There are dozens of good examples of computer games that allow the game player to imagine being in another realm.
The generative AI scientists often borrow many of the ideas and techniques from computer graphics and games. Still, many draw a distinction between generative AI and the world of computer gaming.
One reason the game companies are usually not mentioned is that they’ve relied heavily on human artists to create much of what we see on the screen. While they’ve been leaders in creating extensive graphics algorithms for rendering the scenes, most of the details were ultimately directed by humans.
Generative AI algorithms seek to take over this role from the artists. The AI is responsible for structuring the scenes, choosing the elements and then arranging them inside it. While the rules inside the model may be crafted, in part, by some human, the goal is to make the algorithm the ultimate director or creator.
How are market leaders using Generative AI?
Amazon Web Services offers Polly, a tool for turning text into speech, in three tiers. The basic version uses tried-and-tested algorithms. The middle tier uses what Amazon calls Neural Text-to-Speech (NTTS), a neural network approach tuned to deliver the neutral voice common in news narration. The third tier lets companies create a personalized voice for their brand, so the speaking sound is associated only with their products.
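Calling Polly from code is a one-request affair. The sketch below only builds the request parameters for the NTTS tier; the actual network call, shown commented out, requires the boto3 library and AWS credentials, and "Joanna" is just one of Polly's stock voices chosen for the example:

```python
# Parameters for an Amazon Polly synthesize_speech call.
request = {
    "Text": "Here is the evening news summary.",
    "OutputFormat": "mp3",
    "VoiceId": "Joanna",   # one of Polly's stock voices
    "Engine": "neural",    # selects the NTTS tier; "standard" is the basic tier
}

# With boto3 installed and AWS credentials configured:
# import boto3
# polly = boto3.client("polly")
# audio = polly.synthesize_speech(**request)["AudioStream"].read()
# open("news.mp3", "wb").write(audio)
```

Switching `Engine` between `"standard"` and `"neural"` is how a caller moves between Polly's basic and NTTS tiers.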
Microsoft’s GitHub offers a service called Copilot that helps programmers by suggesting snippets of software that might fill a gap. It’s been trained on more than a billion lines of code from public, open-source git repositories. It can turn a short phrase or comment like “fetch tweets” into a full function by searching through its knowledge. The system, while much more intelligent than simple code completion, is still intended to be an assistant for a human. The marketing literature calls it a co-pilot but insists “you’re the pilot.”
Amazon also offers DeepComposer, an AI that can turn a short melody into a complete song. The system comes with pre-trained models that are designed to fit many of the common genres of music. The system is also meant to be an assistant for a human who first creates some simple musical segments and then guides the composition by adjusting some of the parameters for the machine learning algorithm.
IBM uses some of its generative models to help with drug design. That is, they’re exploring how to train their AIs to imagine new molecules that may have the right shape to work as drugs. In particular, they’re looking for antimicrobial peptides that can target specific diseases. The marketing literature announces, “In just the field of drug discovery, it’s believed that there are some 10^63 possible drug-like molecules in the universe. Trial and error can’t possibly get us through all those combinations.”
Many of the game companies are, by their very nature, experts at creating artificial worlds and building stories around them. Companies like Nintendo, Rockstar, Valve, Activision, Electronic Arts and Ubisoft are just a few of the major names. They are rarely discussed in the context of generative AI even though they’ve been creating and deploying many similar algorithms. Indeed, their expertise often goes back decades and originated before people used the term AI to describe their work.
What about generative AI startups?
Many of the startups and established companies working with generative AI algorithms are in the gaming industry. Indeed, many video game companies have pursued ever more realistic representations from the beginning. It’s fair to say that many, if not most, video game companies are involved in some form of generative AI.
Some, though, stand out for their focus on AI techniques. Respeecher is building voice-cloning technology for the advertising, entertainment and video game businesses. Its machine learning technology begins with a sample voice and then learns all of its parameters so that new dialog can be rendered in that voice.
Rephrase.ai and Synthesia each offer a full text-to-video solution used in the advertising industry to create customized or even personalized sales pitches. Their tools begin with models that learn how a person’s face moves for each phoneme and then use this to create synthetic video. They also maintain collections of stock models, some generated from celebrities who license their image.
D-ID applies the lessons of deep fake creation in reverse. It takes a real video of a human and removes many recognizable attributes, like the position of the eyes or the shape of the nose. The idea is to offer some anonymization while retaining the essential message of the video.
Rosebud.ai offers a full collection of synthetic algorithms that begin with a simple text description and then build models of humans or worlds that match the request. Their tools are used by people to explore creative ideas and then see them rendered. They ship versions as apps for iOS and Android. They are also bundling some creations as non-fungible tokens (NFTs) that can be resold on various cryptocurrency marketplaces.
Is there anything generative AI can’t do?
The capability of a generative AI is largely in the eyes or ears of the beholder. Do the results feel real enough to serve a purpose? If it’s meant to be realistic, does it appear indistinguishable from a photograph? If it’s meant to be artistic or stylized, does it reach those artistic goals?
The world of deep fakes is already delivering on the goal of distorting and replacing reality for people. Many are worried that some of these will destroy our ability to trust images or sound recordings because skilled purveyors will be able to create any version of the past that they would like.
The implications for politics and the justice system are serious, and many believe it’s essential that counterfeit-detection algorithms also be available to battle this scourge. For now, many algorithms that look for anomalies from the synthesis process are good enough to detect deep fakes from well-known algorithms.
The future, though, of detection could evolve into a cat-and-mouse game. The deep fake creators search for better algorithms that can evade detectors while the detection teams work to look for more telltale patterns that can flag synthetic results.
Already, the detection techniques described above are being turned into automated tools. While deep fakes may fool some people initially, a concerted effort seems likely to detect the fakes, given enough time and analysis.