3 model monitoring tips for reliable results when deploying AI

Artificial Intelligence (AI) promises to transform almost every business on the planet. That’s why most business leaders are asking themselves what they need to do to successfully deploy AI into production. 

Many get stuck deciphering which applications are realistic for the business; which will hold up over time as the business changes; and which will put the least strain on their teams. But during production, one of the leading indicators of an AI project’s success is the ongoing model monitoring practices put into place around it. 

The best teams employ three key strategies for AI model monitoring:

1. Performance shift monitoring

Measuring shifts in AI model performance requires two layers of metric analysis: health and business metrics. Most Machine Learning (ML) teams focus solely on model health metrics. These include metrics used during training — like precision and recall — as well as operational metrics — like CPU usage, memory, and network I/O. While these metrics are necessary, they’re insufficient on their own. To ensure AI models are impactful in the real world, ML teams should also monitor trends and fluctuations in product and business metrics that are directly impacted by AI. 


For example, YouTube uses AI to recommend a personalized set of videos to every user based on several factors: watch history, number of sessions, user engagement, and more. And when these models don’t perform well, users spend less time on the app watching videos. 

To increase visibility into performance, teams should build a single, unified dashboard that highlights model health metrics alongside key product and business metrics. This visibility also helps ML Ops teams debug issues effectively as they arise. 
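As a rough illustration of what sits behind such a dashboard, the sketch below puts model health, operational, and business metrics in one record and flags any that moved in the wrong direction against a baseline. The metric names, the 5% tolerance, and the sample numbers are assumptions made up for this example, not figures from any real product.

```python
from dataclasses import dataclass


@dataclass
class MetricSnapshot:
    """One row of a hypothetical unified dashboard."""
    precision: float          # model health
    recall: float             # model health
    p95_latency_ms: float     # operational health
    avg_watch_minutes: float  # business metric (illustrative)


# For each metric, True means "higher is better".
HIGHER_IS_BETTER = {
    "precision": True,
    "recall": True,
    "p95_latency_ms": False,
    "avg_watch_minutes": True,
}


def find_regressions(baseline, current, tolerance=0.05):
    """Return metrics whose relative change is worse than `tolerance`."""
    regressions = {}
    for name, higher_is_better in HIGHER_IS_BETTER.items():
        old = getattr(baseline, name)
        new = getattr(current, name)
        change = (new - old) / old  # relative change vs. baseline
        if higher_is_better:
            worsened = change < -tolerance
        else:
            worsened = change > tolerance
        if worsened:
            regressions[name] = round(change, 3)
    return regressions


# Made-up numbers: model health looks stable, but watch time has dropped.
baseline = MetricSnapshot(0.90, 0.85, 120.0, 40.0)
current = MetricSnapshot(0.90, 0.84, 125.0, 33.0)
print(find_regressions(baseline, current))
```

In this made-up case, precision, recall, and latency all stay within tolerance while the business metric regresses, which is exactly the kind of shift a health-only view would miss.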

2. Outlier detection

Models can sometimes produce an outcome that falls significantly outside the normal range of results; we call this an outlier. Outliers can be disruptive to business outcomes and often have major negative consequences if they go unnoticed.

For example, Uber uses AI to dynamically determine the price of every ride, including surge pricing. This is based on a variety of factors, like rider demand or availability of drivers in an area. Consider a scenario where a concert concludes and attendees simultaneously request rides. Due to the spike in demand, the model might surge the price of a ride to 100 times the normal rate. Riders never want to pay 100 times the price to hail a ride, and this can have a significant impact on consumer trust.

Monitoring can help businesses balance the benefits of AI predictions with their need for predictable outcomes. Automated alerts can help ML Ops teams detect outliers in real time, giving them a chance to respond before any harm occurs. Additionally, ML Ops teams should invest in tooling to override the output of the model manually.

In our example above, detecting the outlier in the pricing model can alert the team and help them take corrective action — like disabling the surge before riders notice. Furthermore, it can help the ML team collect valuable data to retrain the model to prevent this from occurring in the future. 
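A minimal sketch of this idea follows, assuming a pricing model that outputs a surge multiplier. It flags a prediction whose modified z-score (based on the median absolute deviation of recent values) is too large, records an alert, caps the value, and honors a manual override. The class name, the 3.5 threshold, and the 5x cap are all hypothetical choices for illustration; this is not how Uber or any other company actually handles pricing.

```python
from statistics import median


def is_outlier(value, history, threshold=3.5):
    """Flag `value` if its modified z-score against `history` is too large."""
    med = median(history)
    # Median absolute deviation: a spread measure that resists outliers.
    mad = median([abs(x - med) for x in history])
    if mad == 0:
        return value != med
    modified_z = 0.6745 * (value - med) / mad
    return abs(modified_z) > threshold


class SurgeGuard:
    """Hypothetical wrapper that sits between the model and the rider."""

    def __init__(self, history, max_multiplier=5.0):
        self.history = list(history)          # recent accepted multipliers
        self.max_multiplier = max_multiplier  # assumed business cap
        self.manual_override = None           # set by an operator
        self.alerts = []                      # stand-in for a paging system

    def set_override(self, multiplier):
        self.manual_override = multiplier

    def apply(self, predicted):
        # A manual override always wins over the model output.
        if self.manual_override is not None:
            return self.manual_override
        if is_outlier(predicted, self.history):
            self.alerts.append(f"surge multiplier {predicted} looks anomalous")
            return min(predicted, self.max_multiplier)
        self.history.append(predicted)
        return predicted


guard = SurgeGuard([1.0, 1.2, 1.1, 1.5, 1.3, 2.0, 1.0, 1.4])
print(guard.apply(1.6))    # typical value, passed through unchanged
print(guard.apply(100.0))  # flagged, alert recorded, capped
guard.set_override(1.0)    # operator disables the surge
print(guard.apply(100.0))
```

Keeping the flagged predictions in the alert log, rather than silently discarding them, also preserves the data the ML team would need for retraining.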

3. Data drift tracking 

Drift refers to a model’s performance degrading over time once it’s in production. AI models are often trained on a small set of data. They initially perform well because real-world production data is very similar to the training data. But over time, production data changes due to a variety of factors, like user behavior, geographies and time of year.

Consider a conversational AI bot that solves customer support issues. As we launch this bot for various customers, we might notice that users can request support in vastly different ways. For example, a user requesting support from a bank might speak more formally, whereas a user on a shopping website might speak more casually. This change in language patterns compared to the training data can result in bot performance getting worse with time. 

To ensure models remain effective, the best ML teams track drift in the distribution of features (that is, embeddings) between training data and production data. A large change in distribution indicates the need to retrain the model to restore performance. Ideally, data drift should be checked at least every six months, and as often as every few weeks for high-volume applications. Failing to do so could cause significant inaccuracies and hinder the model’s overall trustworthiness.
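One common way to put a number on a distribution change is the Population Stability Index (PSI), sketched below for a single numeric feature. This is a swapped-in illustration rather than the method any particular team uses: the bin count, the synthetic Gaussian data, and the function name are assumptions, and embeddings would need a per-dimension or distance-based variant of the same idea.

```python
import math
import random


def population_stability_index(expected, actual, bins=10):
    """PSI between two samples, binned on the range of `expected`."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values match

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Clamp values that fall outside the training range.
            idx = max(0, min(bins - 1, idx))
            counts[idx] += 1
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    p = proportions(expected)
    q = proportions(actual)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


# Synthetic data: one production sample matches training, one has shifted.
rng = random.Random(0)
train = [rng.gauss(0.0, 1.0) for _ in range(5000)]
prod_same = [rng.gauss(0.0, 1.0) for _ in range(5000)]
prod_shifted = [rng.gauss(1.0, 1.0) for _ in range(5000)]

print(population_stability_index(train, prod_same))
print(population_stability_index(train, prod_shifted))
```

A frequently cited rule of thumb treats a PSI below 0.1 as stable and above 0.25 as a significant shift, but those cutoffs are conventions, and the right retraining trigger depends on the application.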

A structured approach to success 

AI is neither a magic bullet for business transformation nor a false promise of improvement. Like any other technology, it has tremendous promise given the right strategy. 

If developed from scratch, AI cannot be deployed and then left to run on its own without proper attention. Truly transformative AI deployments adopt a structured approach that involves careful monitoring, testing, and continuous improvement over time. Businesses that have neither the time nor the resources to take this approach will find themselves caught in a perpetual game of catch-up.

Rahul Kayala is principal product manager at Moveworks.

Originally appeared on: TheSpuzz