How MLops deployment can be easier with open-source versioning

Modern software development typically follows an iterative approach known as continuous integration/continuous delivery (CI/CD). The promise of CI/CD is better software that is released more quickly, and it’s a promise that ClearML now intends to bring to the world of machine learning (ML).

ClearML today announced the general availability of its enterprise MLops platform that extends the capabilities of the company’s open-source edition. The ClearML Enterprise platform provides organizations with security controls and additional capabilities for rapidly iterating and deploying ML workflows.

“The key driver is having the ability to very quickly move ML from research into the business units because it’s always an iterative process,” Moses Guttmann, CEO and cofounder of ClearML, told VentureBeat. “You cannot assume that the first time you actually push a model into production it will work and you need to have feedback from different business units.”

ML development and deployment is not a linear process

There is no shortage of solutions in the MLops space, with vendors including Domino Data Lab, Big Panda and Run AI, as well as services on the cloud vendor platforms such as AWS SageMaker and Google’s Vertex AI.


One significant reason for the growing number of vendors in the sector is that building and deploying ML models is often a complicated process with many manual steps. A primary goal of MLops tools is to help automate that process.

While automation is important, it solves only part of the complexity. A key challenge for artificial intelligence (AI) models, identified in a recently released Gartner report, is that only about half of them actually make it into production.

From Guttmann’s perspective, application developers tend to build things in a linear way. This implies, for example, that new code written six months after the initial development is better than the original. That view does not tend to hold for machine learning, where the process involves more research and experimentation to determine what actually works best.

“Development is always money sunk into the problem until you actually see the fruits of the effort and we want to decrease that development time to a minimum,” he said.

How ClearML works

The basic ML workflow involves the use of some form of dataset that has gone through a data labelling process. The data is then used to train a model, which can be deployed to make predictions or perform automated actions.

One feature specific to ClearML’s new release is a capability the company calls ‘Hyper-Datasets.’ It enables organizations to more easily extract metadata from unstructured data, such as video or audio files, so that the data can be more readily used for training.

“With Hyper-Datasets, we’re basically taking metadata and making it queryable,” Guttmann said.
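As a rough illustration of what queryable metadata can mean in practice, the sketch below attaches small metadata records to video frames and filters them with a predicate. It is plain Python and does not use the Hyper-Datasets API; `FrameMeta`, its fields, the file paths and the `query` helper are all hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class FrameMeta:
    """Hypothetical metadata record for one frame of a video file."""
    source: str        # path of the unstructured file (illustrative)
    frame: int         # frame index within the file
    label: str         # annotation label, e.g. "car"
    confidence: float  # annotator or model confidence in [0, 1]


def query(records: Iterable[FrameMeta],
          predicate: Callable[[FrameMeta], bool]) -> List[FrameMeta]:
    """Return the records whose metadata satisfies the predicate."""
    return [r for r in records if predicate(r)]


# Made-up records standing in for metadata extracted from video files.
records = [
    FrameMeta("clips/a.mp4", 10, "car", 0.92),
    FrameMeta("clips/a.mp4", 11, "person", 0.55),
    FrameMeta("clips/b.mp4", 3, "car", 0.40),
]

# Select only confidently labelled cars as a candidate training subset.
training_subset = query(
    records, lambda r: r.label == "car" and r.confidence >= 0.8
)
print([(r.source, r.frame) for r in training_subset])
```

The point of the sketch is only that once metadata sits in structured records, choosing a training subset becomes a filter over those records rather than a manual pass over the raw files.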

ClearML’s platform helps with all stages of the ML workflow and maintains versioning for each step. Guttmann explained that configuring the automation is itself largely automated.

Rather than requiring a data scientist to manually configure every step in the ML workflow, ClearML provides users with two lines of code that track the work and then help create automations. With those two lines in place, ClearML monitors each step of an ML workload and is then able to reproduce it, as well as track versions.
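In the open-source ClearML Python SDK, the two lines are most likely the import of `Task` and a call to `Task.init`. The sketch below shows where they might sit in a training script. The project name, task name, parameter values and the toy `train` function are placeholders invented for this example, and calling `run_tracked_experiment` assumes the `clearml` package is installed and configured to reach a ClearML server.

```python
import random


def train(epochs: int, learning_rate: float, seed: int = 0) -> float:
    """Toy stand-in for a real training loop; returns a made-up loss."""
    rng = random.Random(seed)
    loss = 1.0
    for _ in range(epochs):
        # Shrink the loss by a random factor each epoch (illustrative only).
        loss *= 1.0 - learning_rate * rng.uniform(0.5, 1.0)
    return loss


def run_tracked_experiment() -> float:
    """Run the toy training loop as a tracked ClearML task."""
    # The two tracking lines: import the SDK and open a task.
    from clearml import Task
    task = Task.init(project_name="demo-project", task_name="baseline-run")

    # Placeholder hyperparameters, recorded alongside the task.
    params = {"epochs": 5, "learning_rate": 0.1}
    task.connect(params)

    loss = train(**params)

    # Report the final loss so it can be compared across runs later.
    task.get_logger().report_scalar(
        title="loss", series="train", value=loss, iteration=params["epochs"]
    )
    task.close()
    return loss
```

The import is placed inside the function only so that the rest of the sketch runs without the SDK installed; in a real script it would normally sit at the top of the file.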

The platform also lets users generate reports and query the ClearML system to better understand the MLops process. Guttmann said that ClearML uses multiple database technologies, including Redis, MongoDB and Elasticsearch, to store data that users can query to compare models and track performance.
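To make the idea of querying stored results to compare models concrete, here is a minimal sketch that ranks a few experiment records by a validation metric. It uses an in-memory list rather than any of the databases named above, it is not ClearML's query interface, and the `RunRecord` type, run names and accuracy values are invented for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class RunRecord:
    """Hypothetical summary of one tracked training run."""
    name: str
    model_version: str
    val_accuracy: float


def rank_runs(runs: Iterable[RunRecord]) -> List[RunRecord]:
    """Rank runs from highest to lowest validation accuracy."""
    return sorted(runs, key=lambda r: r.val_accuracy, reverse=True)


# Made-up results standing in for what a tracking system would store.
runs = [
    RunRecord("baseline", "v1", 0.81),
    RunRecord("more-data", "v2", 0.86),
    RunRecord("bigger-model", "v3", 0.84),
]

ranked = rank_runs(runs)
best = ranked[0]
print(f"best run: {best.name} ({best.model_version})")
```

Because every run is stored with its version, the later experiment is not assumed to be the better one; the ranking decides which model is a candidate for production.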

Going a step further, the data and reports generated by ClearML can be integrated into other workflow and collaboration tools an organization is using, such as Slack or Jira.

“We’re constantly expanding the capability to help data scientists create something that others in the organization can actually use,” he said.

Originally appeared on: TheSpuzz