Report: 37% of ML leaders say they don’t have the data needed to improve model performance



A new report from Scale AI uncovers what’s working and what isn’t in AI implementation, and the best practices that help ML teams move from testing to real-world deployment. The report examines every stage of the ML lifecycle – from data collection and annotation to model development, deployment, and monitoring – to understand where AI innovation is bottlenecked, where breakdowns occur, and which approaches are helping companies succeed.

The report aims to shed light on what it takes to unlock the full potential of AI for every business, and to help organizations and ML practitioners clear their current hurdles, learn and implement best practices, and ultimately use AI as a strategic advantage.

For ML practitioners, data quality is one of the most important factors in their success, and according to respondents, it’s also the most difficult challenge to overcome. In this study, more than one-third (37%) of all respondents said they do not have the variety of data they need to improve model performance. Nor is variety the only problem: quality is also an issue, with only 9% of respondents indicating their training data is free from noise, bias and gaps.

The majority of respondents have problems with their training data. The top three issues are data noise (67%), data bias (47%) and domain gaps (47%).

Most teams, regardless of industry or level of AI advancement, face similar challenges with data quality and variety. Scale’s data suggests that working closely with annotation partners can help ML teams overcome challenges in data curation and annotation quality, accelerating model deployment. ML teams that do not engage with annotation partners at all are the most likely to take more than three months to get annotated data.

This survey was conducted online within the United States by Scale AI from March 31, 2022, to April 12, 2022. More than 1,300 ML practitioners including those from Meta, Amazon, Spotify and more were surveyed for the report.

Read the full report by Scale AI.

