Creating responsible AI products using human oversight 


As a business, you no longer need to develop everything from scratch or train your own ML models. With machine-learning-as-a-service (MLaaS) becoming increasingly ubiquitous, the marketplace is flooded with numerous turnkey solutions and ML platforms. According to Mordor Intelligence, the market is expected to reach $17 billion by 2027.

The market

Total AI startup funding worldwide was close to $40 billion last year, compared to less than $1 billion a decade ago. Many big and small cloud companies that have entered the MLOps space are now beginning to realize the need for human involvement while operating their models.

The main goal of many AI platforms is to appeal to the general user by making ML largely automated and available in low-code environments. But whether companies build ML solutions exclusively for their own use or for their customers, there’s a common problem: many of them train and monitor their models on low-quality data. Models trained on such data can produce predictions, and hence products, that are inherently biased, misleading and ultimately substandard.

Models and human involvement

Many of these are encoder-decoder models that use recurrent neural networks for sequence-to-sequence prediction. They work by taking an input, converting it into a vector, and then decoding it into a sentence; a similar approach works if the initial input is, say, an image. These have a wide range of applications – from virtual assistants to content moderation. 

The trouble is that human-handled data is often used haphazardly and without proper supervision to support these models, which may lead to multiple problems down the road. However, these models are a part of the larger human-in-the-loop framework – that is, they involve human interaction by design. With that in mind, they should be subject to consistent oversight at every stage of production to enable responsible AI products. But what does it mean exactly for an AI product to be “responsible”? 

What is responsible AI?

The notion of responsible AI comes down to improving the lives of people around the world by always “taking into account ethical and societal implications,” according to most AI researchers.  

Thus, it refers to the social and technical aspects of AI product design – both in terms of how these systems are built (development) and what output they deliver (usability). Among some of the most pressing AI responsibility challenges today are those of: 

  • Data collection biases.
  • Labeling biases.
  • Lack of pipeline/artifact transparency, including AI explainability issues. 
  • Compromised infrastructure security and user privacy.
  • Unfair treatment of those who label the data and operate these models.
  • Degradation of model quality and model accountability requirements over time.

Recent research suggests that only half of all global consumers today trust how AI is being implemented by corporate entities and organizations, while in some places like the UK this figure is close to two-thirds of the surveyed population.

Collection and labeling issues 

Every AI solution must travel a long way from the outset to full deployment, and every misstep can lead to a potentially irresponsible product. For a start, collected data may contain offensive language and images right off the bat, which, when not dealt with in time, can produce thorny outcomes. Public data could also contain accidentally revealed confidential information, which should not be repeated over and over in an automated manner.

In the labeling stage, both biased labeling and confusion of observations are widely recognized issues that can likewise lead to an irresponsible product. Biased labeling refers to how a particular group of labelers can misinterpret information and tag data a certain way based on their cultural background, which has already led to some inherently racist products and unequal hiring opportunities. The good news is that, in theory, this bias can be overcome statistically by using more varied groups of labelers, increasing sample sizes, collecting different datasets, and using other algorithmic solutions.
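One simple statistical safeguard of the kind mentioned above is to collect several labels per item from different labelers and keep only the answers they mostly agree on. The sketch below is a minimal illustration, not a description of any particular platform: the function name `aggregate_labels` and the `min_agreement` threshold of 0.6 are assumptions chosen for the example.

```python
from collections import Counter


def aggregate_labels(votes, min_agreement=0.6):
    """Combine labels from several labelers by majority vote.

    Returns (label, agreement). The label is None when the share of
    labelers backing the most common answer is below min_agreement,
    which signals that the item should go to extra review.
    The 0.6 default is an illustrative assumption, not a standard.
    """
    if not votes:
        raise ValueError("votes must contain at least one label")
    counts = Counter(votes)
    label, top_count = counts.most_common(1)[0]
    agreement = top_count / len(votes)
    if agreement < min_agreement:
        return None, agreement
    return label, agreement


if __name__ == "__main__":
    # Three labelers, two of whom agree.
    print(aggregate_labels(["nurse", "nurse", "costume"]))
    # An even split is flagged for review instead of being guessed.
    print(aggregate_labels(["nurse", "costume"]))
```

Majority voting only helps when the labelers are genuinely varied; if they all share the same blind spot, they will agree on the same wrong answer.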

The problem of observational confusion has more to do with the maker’s opinion – that is, what the client actually wants to see as their end product. For example, should people wearing nurse outfits on Halloween be counted as medical nurses during labeling? Or should Rami Malek dressed as the lead singer of Queen be counted as Freddie Mercury? This issue can only be put to rest when those doing the labeling are provided with precise instructions that contain clear and plentiful examples. When unresolved, these ambiguities may lead to an AI product that acts negligently. Likewise, if the maker’s opinion happens to be different from the user’s, we’re likely to be faced with the same outcome again.

Ethics and responsible AI

There’s also the problem of ethical treatment of people behind AI – how to make their wages fair and offer them humane working conditions. Some tech companies strive to provide a voice for these people and go out of their way to treat them as what they truly are, the drivers of the AI industry. However, it’s still all too common to find human labelers working long hours from cramped offices. 

Training, production, and post-deployment 

Other issues may occur when the models are trained and deployed, stemming from both the fundamentally subjective data prepared for these models (i.e., by the labelers) and the efforts of those individuals designing and fine-tuning the algorithms (i.e., the engineers). Beyond the need for unbiased and well-labeled data in the initial stage, the models need to be consistently monitored for overfitting and degradation afterward. 
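To make the monitoring idea concrete, here is a minimal sketch of two checks a team might run: the gap between training and validation accuracy as a rough overfitting signal, and a comparison of recent accuracy on human-verified samples against a baseline as a degradation signal. The function names and the 0.05 tolerance are assumptions made for this example only.

```python
def overfitting_gap(train_accuracy, validation_accuracy):
    """Difference between training and validation accuracy.

    A large positive gap is a common warning sign of overfitting.
    """
    return train_accuracy - validation_accuracy


def has_degraded(baseline_accuracy, recent_outcomes, tolerance=0.05):
    """Check whether recent accuracy fell below the baseline.

    recent_outcomes is a list of booleans, True when a prediction
    matched a human-verified label. The 0.05 tolerance is an
    illustrative assumption; a real threshold depends on the product.
    """
    if not recent_outcomes:
        raise ValueError("recent_outcomes must not be empty")
    recent_accuracy = sum(recent_outcomes) / len(recent_outcomes)
    return (baseline_accuracy - recent_accuracy) > tolerance


if __name__ == "__main__":
    print(overfitting_gap(0.99, 0.80))
    # 8 of 10 recent predictions confirmed by human reviewers.
    print(has_degraded(0.90, [True] * 8 + [False] * 2))
```

Note that the degradation check depends on people: someone has to keep verifying a sample of live predictions for the comparison to mean anything.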

There are two other issues related to this: irreproducible ML research and unexplainable models. As with any hard science, research should ideally be replicable; however, this isn’t always possible with ML because experiments cannot always be run in sequence. This has to do with the fact that things may change in the real world from your baseline one day to your test the next day, rendering your figures incomparable. Your test sets also change a lot as the model evolves. The way to combat that is to have better experimental protocols and use parallel experimental designs such as A/B testing. 
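As a rough illustration of a parallel design, the sketch below compares two model variants that served traffic over the same period, using a standard two-proportion z-test. The function name and the sample counts are hypothetical, and a real experiment would also need a protocol for sample size and stopping rules.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for a difference between two success rates.

    Variant A and variant B are assumed to have run in parallel on
    comparable traffic. Returns (z, p_value).
    """
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled rate under the null hypothesis of no difference.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_error == 0:
        raise ValueError("standard error is zero; rates are degenerate")
    z = (p_b - p_a) / std_error
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts: baseline model A vs. candidate model B.
    z, p = two_proportion_z_test(100, 1000, 150, 1000)
    print(f"z = {z:.2f}, p = {p:.4f}")
```

Because both variants see the same real-world conditions at the same time, day-to-day changes affect them equally, which is what makes the figures comparable.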

With unexplainable models affecting many AI products, there may be a certain judgment or prediction coming from the model, but how or why it emerged exactly may remain unclear. In some situations, like credit risk management, these results often cannot simply be accepted as ground truth, which is why explainable AI models that provide sufficient details and reasons must always be favored in such cases.
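One of the simplest forms of explainability is a linear score whose result can be broken down feature by feature. The sketch below is a toy example only: the feature names, weights and bias are invented for illustration and do not come from any real credit model.

```python
def explain_linear_score(weights, bias, features):
    """Score an applicant with a linear model and explain the result.

    Returns (score, ranked), where ranked lists each feature's
    contribution (weight * value), largest absolute effect first.
    """
    contributions = {
        name: weights[name] * features[name] for name in weights
    }
    score = bias + sum(contributions.values())
    ranked = sorted(
        contributions.items(), key=lambda item: abs(item[1]), reverse=True
    )
    return score, ranked


if __name__ == "__main__":
    # Hypothetical weights: a higher score means higher estimated risk.
    weights = {"late_payments": 2.0, "years_employed": -0.5}
    applicant = {"late_payments": 3, "years_employed": 4}
    score, reasons = explain_linear_score(weights, 1.0, applicant)
    print(score)
    for name, contribution in reasons:
        print(name, contribution)
```

A human reviewer can read such a breakdown and challenge it, which is much harder when the model offers only a final number.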

Importantly, companies that build responsible AI products should also be able to explain how their products are created, not just operated, which may entail offering their pipelines for inspection whenever necessary. To achieve that, transparency across the company’s business processes and the product’s functions has to remain consistent throughout. 

Is it worth it? 

So, with so many potential problems and hazards, the big question pops up – is the game worth the candle? Do we need these models after all? According to a recent article published in Nature, the answer is still yes, but we need to make AI fair. More companies are finding out that their business can be significantly improved with AI if the product is built responsibly.

It’s also important to remember that we need ML models to help us make the decisions, not have the models do all of the decision-making for us. The trouble is that many jump on the AI bandwagon without knowing what they’re getting into, how to supervise ML operations properly, and ultimately how to build responsible AI products. 

When we start shying away from the “boring” operational tasks of responsible data collection, unbiased labeling, reproducible algorithms, and model monitoring, we’re bound to wind up with mediocre results. Often, fixing these results afterward costs us more than doing everything right in the first place.

Responsible AI: The bottom line

The ML market, of which MLaaS is an integral part, is moving forward at an ever faster pace. This leaves us with a resounding and unequivocal truth – to enjoy responsible AI products, we need to possess responsible models and processes. With that in mind, human oversight at every stage is crucial if we’re to make the human-machine collaboration work in our favor. We need to remember that while automation can be freeing, we can only build, operate, and maintain responsible AI models when key decisions are left in our hands. 

Fedor Zhdanov is head of ML products at Toloka AI.

Originally appeared on: TheSpuzz