New York-based Dataiku, which provides a centralized solution for the design, deployment and management of enterprise artificial intelligence (AI) applications, has released version 11 of its unified data and AI platform. The update, set to be generally available in July, focuses on delivering on the promise of “everyday AI” and provides new capabilities to not only help data experts handle more expansive AI projects, but also enable nontechnical business users to easily engage with AI for improved workflows, among other benefits.
“Expert data scientists, data engineers and ML [machine learning] engineers are some of the most valuable and sought-after jobs today. Yet all too often, talented data scientists spend most of their time on low-value logistics like setting up and maintaining environments, preparing data and putting projects into production. With extensive automation built into Dataiku 11, we’re helping companies eliminate the frustrating busywork so companies can make more of their AI investment quickly and ultimately create a culture of AI to transform industries,” Clément Stenac, CTO and cofounder of Dataiku, said.
Below is a rundown of the key capabilities.
Code Studios with experiment tracking
Code Studios in Dataiku 11 provides AI developers with a fully managed, isolated coding environment in their Dataiku project, where they can work using their preferred IDE or web app stack. It lets developers code the way they are comfortable with while still complying with any company policies on analytics centralization and governance. Previously, this would have required a custom setup, with added cost and complexity.
The solution also comes with an experiment-tracking feature, which provides developers with a central interface to store and compare all bespoke model runs made programmatically using the MLflow framework.
Seamless computer vision development
To simplify the resource-intensive task of developing computer vision models, Dataiku 11 brings a built-in data labeling framework and a visual ML interface.
The former, as the company explains, automatically annotates large volumes of data, a task often handled through third-party platforms like Tasq.ai. Meanwhile, the latter provides an end-to-end visual path for common computer vision tasks, enabling both advanced and novice data scientists to tackle complex object detection and image classification use cases, from data preparation through to developing and deploying the models.
Business users, especially the ones with limited technical expertise, often find it difficult to analyze historical data and create robust business forecast models for decision-making. To address this, Dataiku 11 offers built-in tools that provide no-code visual interfaces and help teams analyze temporal data and develop, evaluate and deploy time-series forecasting models.
The latest release also brings a Feature Store with new object-sharing flows to improve organization-wide collaboration and accelerate the entire process of model development. According to the company, the capability will give data teams a dedicated zone to access or share reference datasets containing curated AI features. This should keep developers from re-engineering the same features or relying on redundant data assets across ML projects, which in turn prevents inefficiencies and inconsistencies.
Teams often use a manual trial-and-error ("what-if") method to provide business stakeholders with actionable insights that could help them achieve the best possible outcomes.
With Outcome Optimization, coming as part of Dataiku 11, the entire process will be automated. In essence, it will take user-defined constraints into account and find the set of input values that gives the desired result. For example, it could prescribe what changes a manufacturer could make to factory conditions to achieve the maximum production yield, or what adjustments to a bank customer's financial profile would lead to the lowest probability of loan default.
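To make the idea concrete, here is a minimal Python sketch of a constrained search over input values. It is not Dataiku's implementation or API, which the company has not detailed here. The yield formula, the feature names (temperature_c, pressure_bar), the candidate ranges and the temperature limit are all hypothetical, and a simple exhaustive grid search stands in for whatever optimizer the product actually uses.

```python
from itertools import product


def predicted_yield(temperature_c, pressure_bar):
    """Hypothetical stand-in for a trained model's yield prediction."""
    return 100 - 0.05 * (temperature_c - 180) ** 2 - 2.0 * (pressure_bar - 5) ** 2


def optimize_outcome(model, candidates, constraint):
    """Grid search: return the feasible inputs with the highest predicted outcome."""
    names = list(candidates)
    best_inputs, best_score = None, float("-inf")
    for values in product(*(candidates[name] for name in names)):
        inputs = dict(zip(names, values))
        if not constraint(inputs):
            continue  # skip combinations that violate the user-defined constraint
        score = model(**inputs)
        if score > best_score:
            best_inputs, best_score = inputs, score
    return best_inputs, best_score


# Hypothetical candidate values for each controllable factory condition.
candidates = {
    "temperature_c": range(150, 211, 5),
    "pressure_bar": [3, 4, 5, 6, 7],
}


def within_limits(inputs):
    """Hypothetical constraint: the line may not run hotter than 175 degrees."""
    return inputs["temperature_c"] <= 175


best_inputs, best_score = optimize_outcome(predicted_yield, candidates, within_limits)
print(best_inputs, best_score)
```

In this toy setup the unconstrained optimum sits at 180 degrees, so the search settles on the hottest setting the constraint allows. That is the kind of trade-off a manual what-if analysis would otherwise have to find by hand.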
Among other things, the company has introduced tools to improve oversight and control over model development and deployment. This includes an automated tool to generate flow documents and a central registry that captures snapshots of all data pipelines and project artifacts – for review and sign-off before production. The company will also provide model stress tests, which will examine model behavior in real-world deployment situations prior to the actual deployment.
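As a rough illustration of what a model stress test can look like, the Python sketch below compares a model's accuracy on clean data against its accuracy when one input is corrupted with noise. It assumes a single interpretation of "stress test" and is not Dataiku's tooling. The classifier rule, the feature names (income, debt), the synthetic data and the noise level are all hypothetical.

```python
import random


def model(row):
    """Hypothetical classifier: approve when income is at least three times the debt."""
    return row["income"] >= 3 * row["debt"]


def accuracy(predict, rows, labels):
    """Fraction of rows where the prediction matches the label."""
    correct = sum(predict(row) == label for row, label in zip(rows, labels))
    return correct / len(rows)


def add_noise(rows, feature, scale, rng):
    """Return copies of the rows with Gaussian noise added to one feature."""
    return [{**row, feature: row[feature] + rng.gauss(0, scale)} for row in rows]


rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic applicants; labels come from the clean model, so baseline accuracy is 1.0.
rows = [{"income": rng.uniform(20, 120), "debt": rng.uniform(5, 40)} for _ in range(500)]
labels = [model(row) for row in rows]

baseline = accuracy(model, rows, labels)
stressed = accuracy(model, add_noise(rows, "income", 15.0, rng), labels)
print(f"baseline accuracy: {baseline:.2f}, accuracy with noisy income: {stressed:.2f}")
```

A large gap between the two numbers would flag that the model is sensitive to measurement error in that feature, which is worth knowing before sign-off.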