Data science is a quickly growing technology as organizations of all sizes embrace AI and ML, and along with that growth has come no shortage of concerns.
The 2022 State of Data Science report, released today by data science platform vendor Anaconda, identifies key trends and concerns for data scientists and the organizations that employ them. Among the trends identified by Anaconda is the fact that the open source Python programming language continues to dominate the data science landscape.
One of the key concerns identified in the report has to do with barriers to the adoption of data science overall.
“One area that did surprise me was that two-thirds of respondents felt that the biggest barrier to successful enterprise adoption of data science is insufficient investment in data engineering and tooling to enable production of good models,” Peter Wang, Anaconda CEO and co-founder, told VentureBeat. “We’ve always known that data science and machine learning can suffer from poor models and inputs but it was interesting to see our respondents rank this even higher than the talent/headcount gap.”
AI bias is far from a solved issue
AI bias is a well-known issue in data science. What isn’t as well known is exactly what organizations are doing to combat it.
Last year, Anaconda’s 2021 State of Data Science report found that 40% of organizations were planning or doing something to help with the issue of bias. Anaconda didn’t ask the same question this year, opting instead to take a different approach.
“Instead of asking if organizations were planning to address bias, we wanted to look at the specific steps organizations are now taking to ensure fairness and mitigate bias,” Wang said. “We realized from our findings last year that organizations had plans in the works to address this, so for 2022, we wanted to look into what actions they took, if any, and where their priorities are.”
As part of AI bias prevention efforts, 31% of respondents noted that they evaluate data collection methods according to internally set standards for fairness. In contrast, 24% noted that they do not have standards for fairness and bias mitigation in data sets and models.
AI explainability is a foundational element for helping to identify and prevent bias. When asked what tools are used for AI explainability, 35% of respondents noted that their organizations perform a series of controlled tests to assess model interpretability, while 24% do not have any measures or tools to ensure model explainability.
“While each response measure has less than 50% of these efforts in place, the results here tell us that organizations are taking a varied approach to mitigating bias,” Wang said. “Ultimately, organizations are taking action, they’re just early in their journey of addressing bias.”
How data scientists spend their time
Data scientists have a number of different tasks they need to do as part of their jobs.
While deploying models is the desired end goal, that’s not where data scientists spend most of their time. In fact, the study found that data scientists spend only 9% of their time on deploying models. Similarly, respondents reported that they spend only 9% of their time on model selection.
The biggest time sink is data preparation and cleansing, which accounts for 38% of their time.
The love and fear relationship with open source
The report also asked data scientists about how they use and view open source software.
Eighty-seven percent of respondents said that their organizations allow the use of open source software. Yet despite that use, 54% of respondents noted that they are worried about open source security.
“Today, open source is embedded across nearly every piece of software and technology and it’s not just because it’s cheaper in the long run,” Wang said. “The innovation occurring around AI, machine learning and data science is all happening within the open-source ecosystem at a speed that can’t be matched by a closed system.”
Even so, Wang said it’s understandable for organizations to be aware of the risks involved with open source and to develop a plan for mitigating any potential vulnerabilities.
“One of the benefits of open source is that patches and solutions are built out in the open instead of behind closed doors,” he said.
The Anaconda report was based on a survey of 3,493 respondents from 133 countries.