Trust in technology is eroding. This is especially true when it comes to emerging technologies such as AI, machine learning, augmented and virtual reality and the Internet of Things. These technologies are powerful and have the potential for great good. But they are not well understood by end-users of tech and, in some cases, not even by creators of tech. Mistrust is especially high when these technologies are used in fields such as healthcare, finance, food safety, and law enforcement, where the consequences of flawed or biased technology are much more serious than getting a bad movie recommendation from Netflix.
What can companies that use emerging technologies to engage and serve customers do to regain lost trust? The simple answer is to safeguard users’ interests. Easier said than done.
An approach I recommend is a concept I call Design for Trust. In simple terms, Design for Trust is a collection of three design principles and associated methodologies. The three principles are Fairness, Explainability, and Accountability.
There is an old saying from accounting borrowed in the early days of computing: garbage in, garbage out. It is shorthand for the idea that poor-quality input will always produce faulty output. In AI and machine learning (ML) systems, faulty output usually means output that is inaccurate or biased. Both are problematic, but the latter is more controversial because biased systems can adversely affect people based on attributes such as race, gender, or ethnicity.
There are numerous examples of bias in AI/ML systems. A particularly egregious one came to light in September of 2021 when it was reported that on Facebook, “Black men saw an automated prompt from the social network that asked if they would like to ‘keep seeing videos about Primates,’ causing the company to investigate and disable the AI-powered feature that pushed the message.”
Facebook called this “an unacceptable error,” and, of course, it was. It occurred because the AI/ML system’s facial recognition feature did a poor job of distinguishing persons of color and minorities. The underlying problem was likely data bias. The datasets used to train the system didn’t include enough images or context from minorities to enable the system to learn properly.
Another type of bias, model bias, has plagued many tech companies, including Google. In the early days of Google, fairness was not an issue. But as the company grew and became the global de facto standard for search, more people began to complain its search results were biased.
Google search results are based on algorithms that decide which search results are presented to searchers. To help them get the results they seek, Google also auto-completes search requests with suggestions and presents “knowledge panels,” which provide snapshots of search results based on what is available on the web, and news results, which typically cannot be changed or removed by moderators. There is nothing inherently biased about these features. But whether they add to or detract from fairness depends on how they are designed, implemented, and governed by Google.
Over the years, Google has initiated a series of actions to improve the fairness of search results and protect users. Today, Google uses blacklists, algorithm tweaks, and an army of humans to shape what people see as part of its search page results. The company created an Algorithm Review Board to keep track of biases and to ensure that search results don’t favor its own offerings or links compared to those of independent third parties. Google also upgraded its privacy options to prevent unknown location tracking of users.
For tech creators seeking to build unbiased systems, the keys are paying attention to datasets, the model, and team diversity. Datasets must be diverse and large enough to provide systems with ample options to learn to recognize and distinguish between races, genders, and ethnicities. Models must be designed to properly weight factors that the system uses to make decisions. Because datasets are chosen and models designed by humans, highly trained and diverse teams are an essential component. Design for Trust is critical and it goes without saying that extensive testing should be performed before systems are deployed.
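One part of the testing described above can be sketched in a few lines of Python: comparing how often a system produces a favorable outcome for each group. This is a minimal illustration, not a complete fairness audit. The field names (`group`, `approved`), the function names, and the sample decisions are all hypothetical, and the gap between group rates is only one of several fairness measures a team might choose.

```python
from collections import defaultdict


def selection_rates(records, group_key="group", outcome_key="approved"):
    """Return the share of favorable outcomes for each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for record in records:
        group = record[group_key]
        totals[group] += 1
        if record[outcome_key]:
            positives[group] += 1
    return {group: positives[group] / totals[group] for group in totals}


def parity_gap(rates):
    """Difference between the highest and lowest group rates."""
    return max(rates.values()) - min(rates.values())


# Hypothetical decisions, invented purely for illustration.
decisions = [
    {"group": "A", "approved": True},
    {"group": "A", "approved": True},
    {"group": "A", "approved": False},
    {"group": "A", "approved": True},
    {"group": "B", "approved": True},
    {"group": "B", "approved": False},
    {"group": "B", "approved": False},
    {"group": "B", "approved": False},
]

rates = selection_rates(decisions)
gap = parity_gap(rates)
print(rates, gap)
```

A large gap does not prove the system is unfair, but it is a signal that the dataset or the model deserves a closer look before deployment.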
Even as tech creators take steps to improve the accuracy and fairness of their AI/ML systems, there remains a lack of transparency about how the systems make decisions and produce results. AI/ML systems are typically known and understood only by the data scientists, programmers and designers who created them. So, while their inputs and outputs are visible to users, their internal workings, such as the logic and objective/reward functions of the algorithms and platforms, cannot be examined by others to determine whether they are performing as expected and learning from their results and feedback as they should. Equally opaque is whether the data and analytical models have been designed and are being supervised by people who understand the processes, functions, steps and desired outcomes. Design for Trust can help.
Lack of transparency isn’t always a problem. But when the decisions being made by AI/ML systems have serious consequences — think medical diagnoses, safety-critical systems such as autonomous automobiles, and loan approvals — being able to explain how a system made them is essential. Thus, the need is for explainability in addition to fairness.
Take the example of the long-standing problem of systemic racism in lending. Before technology, the problem was bias in the people making decisions about who gets loans or credit and who doesn’t. But that same bias can be present in AI/ML systems based on the datasets chosen and the models created because those decisions are made by humans. If an individual feels they were unfairly denied a loan, banks and credit card companies should be able to explain the decision. In fact, in a growing number of geographies, they are required to.
This is true in the insurance industry in many parts of Europe, where insurance companies are required to design their claims processing and approval systems to conform to standards of both fairness and explainability in order to improve trust. When an insurance claim is denied, the firms must provide the criteria used and a thorough explanation of why.
Today, explainability is often achieved when the people who developed a system document its design and keep an audit trail of the processes it goes through to make decisions. A key challenge in explainability is that systems are increasingly analyzing and processing data at speeds beyond humans’ ability to process or comprehend. In these situations, the only way to provide explainability is to have machines monitoring and checking the work of machines. This is the driver behind an emerging field called Explainable AI (XAI). XAI is a set of processes and methods that let humans understand the results and outputs of AI/ML systems.
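To make the idea concrete, here is a minimal Python sketch of a decision that carries its own explanation. It assumes a deliberately simple linear scoring model, where each factor's contribution to the score can be read off directly. The weights, the threshold, and the feature names are hypothetical and not taken from any real lender or insurer. Real XAI methods for complex models are far more involved than this.

```python
# Hypothetical weights and approval threshold, for illustration only.
WEIGHTS = {
    "income_thousands": 0.5,
    "years_employed": 2.0,
    "missed_payments": -10.0,
}
THRESHOLD = 40.0


def score_with_explanation(applicant):
    """Score an applicant and report how much each factor contributed."""
    contributions = {name: WEIGHTS[name] * applicant[name] for name in WEIGHTS}
    score = sum(contributions.values())
    # Order factors from the one that hurt the score most to the one that helped most.
    reasons = sorted(contributions.items(), key=lambda item: item[1])
    return {"approved": score >= THRESHOLD, "score": score, "reasons": reasons}


applicant = {"income_thousands": 60, "years_employed": 3, "missed_payments": 2}
result = score_with_explanation(applicant)
print(result)
```

In this example the applicant is declined, and the first entry in `reasons` shows that missed payments pulled the score down the most. That is the kind of specific answer a denied customer could be given.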
Even with the best attempts to create technology systems that are fair and explainable, things can go awry. When they do, because the inner workings of many systems are typically known only by the data scientists, developers, and programmers who created them, it can be difficult to identify what went wrong and trace it back to the choices made by creators, providers, and users that led to those outcomes. Nevertheless, someone or some entity must be held accountable.
Take the example of Microsoft’s conversational bot, Tay. Released in 2016, Tay was designed to engage people in dialogue while emulating the style and slang of a teenage girl. Within 16 hours of its release, Tay had tweeted more than 95,000 times with a large percentage of them being abusive and offensive to minorities. The problem was Tay was designed to learn more about language from the interactions it had with people—and many of the responses to Tay’s tweets were themselves abusive and offensive to minorities. The underlying problem with Tay was model bias. Poor decisions were made by the people at Microsoft who designed the learning model for Tay. Yet, Tay learned racist language from people on the internet, which caused it to respond the way it did. As it’s impossible to hold “people on the internet” accountable, Microsoft must bear the lion’s share of responsibility… and it did.
Now consider the example of Tesla, its AutoPilot driver-assistance system and its higher-level functionality called Full Self-Driving Capability. Tesla has long been criticized for giving its driver-assistance features names that might lead people to think the cars can operate on their own, and for over-selling the capabilities of both systems. Over the years, the U.S. National Highway Traffic Safety Administration (NHTSA) has opened more than 30 special crash investigations involving Teslas that might have been linked to AutoPilot. In August 2021, in the wake of 11 crashes involving Teslas and first-responder vehicles that resulted in 17 injuries and one death, the NHTSA launched a formal investigation of AutoPilot.
The NHTSA has its work cut out for it because determining who is at fault for an accident involving a Tesla is complicated. Was the cause a flaw in the design of AutoPilot, misuse of AutoPilot by a driver, a malfunction of a Tesla component that had nothing to do with self-driving, or a driver error or violation that could have happened in any vehicle regardless of whether it has an autonomous driving system or not, for example, texting while driving or excessive speed?
Despite the complexity of determining blame in some of these situations, it is always the responsibility of the creators and providers of technology to 1) conform to global and local laws, regulations, and standards, and community standards and norms; and 2) clearly define and communicate the financial, legal, and ethical responsibilities of each party involved in using their systems.
Practices that can help tech providers with these responsibilities include:
- Thorough and continuous testing of data, models, algorithms, usage, learning, and outcomes of a system to ensure the system meets financial, legal, and ethical requirements and standards
- Creating and maintaining a source model and audit trail of how the system is performing in a format that humans can understand and making it available when needed
- Developing contingency plans for pulling back or disabling AI/ML implementations that violate any of these standards
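Two of the practices above, keeping an audit trail and having a way to pull a system back, can be sketched together in Python. This is an illustrative outline under simple assumptions, not a production design. The `GuardedModel` class, its method names, the stand-in scoring rule, and the fallback message are all hypothetical.

```python
import json
from datetime import datetime, timezone


class GuardedModel:
    """Wraps a prediction function with an audit log and a disable switch."""

    def __init__(self, predict_fn, fallback):
        self.predict_fn = predict_fn
        self.fallback = fallback
        self.enabled = True
        self.audit_log = []

    def disable(self, reason):
        """Turn the model off and record why."""
        self.enabled = False
        self._record("disabled", {"reason": reason})

    def predict(self, features):
        """Use the model if enabled, otherwise return the fallback."""
        output = self.predict_fn(features) if self.enabled else self.fallback
        self._record(
            "prediction",
            {"features": features, "output": output, "model_used": self.enabled},
        )
        return output

    def _record(self, event, details):
        """Append a timestamped, human-readable JSON entry to the log."""
        entry = {
            "time": datetime.now(timezone.utc).isoformat(),
            "event": event,
            "details": details,
        }
        self.audit_log.append(json.dumps(entry))


# Hypothetical scoring rule standing in for a real AI/ML model.
model = GuardedModel(
    lambda f: "approve" if f["score"] >= 50 else "deny",
    fallback="refer to human reviewer",
)

first = model.predict({"score": 72})
model.disable("failed fairness review")
second = model.predict({"score": 72})
```

Once the model is disabled, every request is routed to the fallback, and the log shows both the decisions made and the moment the system was pulled back.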
In the end, Design for Trust is not a one-time activity. Instead, it is a perpetual process of managing, monitoring, and adjusting systems to address the qualities that erode trust.
Arun ‘Rak’ Ramchandran is a corporate VP at Hexaware.