This article was contributed by Aymane Hachcham, data scientist and contributor to neptune.ai
MLOps refers to the operation of machine learning in production. It combines DevOps with lifecycle tracking, reusable infrastructure, and reproducible environments to operationalize machine learning at scale across an entire organization. The term MLOps is commonly credited to a Google paper on machine learning operations, although it has roots in software operations. Google’s goal with this paper was to introduce an approach to developing AI products that is more agile, collaborative, and customer-centric. MLOps extends traditional DevOps to ML/AI and focuses mostly on automation to design, manage, and optimize ML pipelines.
Machine learning on top of DevOps
MLOps is based on DevOps, which is a modern practice for building, delivering, and operating corporate applications effectively. DevOps began a decade ago as a method for rival tribes of software developers (the Devs) and IT operations teams (the Ops) to interact.
MLOps helps data scientists monitor and keep track of their solutions in real-life production environments. The work that happens behind the scenes when pushing to production also raises significant issues of both raw performance and managerial discipline. Datasets are huge and constantly expanding, and they can change in real time. AI models need regular monitoring via rounds of experimentation, adjusting, and retraining.
Lifecycle tracking is a process that enables team members to track and manage the life cycle of a product from inception to deployment. The system keeps track of all the changes made to the product throughout this process and allows each user to revert to a previous version if necessary.
Lifecycle tracking focuses on the iterative model development phase, where you try a variety of modifications to bring your model’s performance to the desired level. A small change in the training data can sometimes have a significant influence on performance. Because experiment tracking involves multiple layers (model training metadata, model versions, training data, etc.), you may decide to choose a platform that automates these processes for you and handles scalability and team collaboration.
Model training metadata
During the course of a project (especially if there are several individuals working on the project), your experiment data may be spread across multiple devices. In such instances, it can be difficult to control the experimental process, and some knowledge is likely to be lost. You may choose to work with a platform that offers solutions to this issue.
The best way to track the hyperparameters of your different model versions is to use a configuration file. These are simple text files with a preset structure and standard libraries to interpret them, such as Python’s JSON encoder and decoder or PyYAML. JSON, YAML, and cfg files are common standards. Below is an example of a YAML file for a credit scoring project:
    project: ORGANIZATION/project-I-credit-scoring
    name: cs-credit-default-risk
    parameters:
      # Data preparation
      n_cv_splits: 5
      validation_size: 0.2
      stratified_cv: True
      shuffle: 1
      # Random forest
      rf__n_estimators: 2000
      rf__criterion: gini
      rf__max_features: 0.2
      rf__max_depth: 40
      rf__min_samples_split: 50
      rf__min_samples_leaf: 20
      rf__max_leaf_nodes: 60
      rf__class_weight: balanced
      # Post Processing
      aggregation_method: rank_mean
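As a minimal sketch of the configuration-file approach, the snippet below stores a few of the hyperparameters above as JSON and reads them back with Python's standard `json` module. The file name `config.json` and the helper `load_params` are assumptions made for illustration; a YAML file could be read in much the same way with PyYAML's `yaml.safe_load`.

```python
import json
import tempfile
from pathlib import Path

# A subset of the credit scoring configuration shown above.
CONFIG = {
    "project": "ORGANIZATION/project-I-credit-scoring",
    "name": "cs-credit-default-risk",
    "parameters": {
        "n_cv_splits": 5,
        "validation_size": 0.2,
        "rf__n_estimators": 2000,
        "rf__max_depth": 40,
    },
}


def load_params(path):
    """Read a JSON config file and return its 'parameters' section."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)["parameters"]


# Write the config to a temporary directory, then load it back,
# as a training script might do at start-up.
with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "config.json"  # hypothetical file name
    config_path.write_text(json.dumps(CONFIG, indent=2), encoding="utf-8")
    params = load_params(config_path)

print(params["rf__n_estimators"])
```

Keeping the parameters in a file rather than in the training code means each experiment run can be reproduced from the file alone.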
One way to manage such configuration files is Hydra, an open-source framework from Facebook AI that streamlines the setup of more sophisticated machine learning experiments.
The key takeaways from Hydra are:
- You may compose your hyperparameter configuration dynamically.
- You can pass additional arguments not found in the configuration to the CLI.
Hydra is more versatile than a plain YAML file and allows you or your MLOps engineer to override complicated configurations (including config groups and hierarchies). The library is well-suited for deep-learning projects and is more reliable than managing a simple YAML file by hand.
A minimalist example should look like the following:
Reuse the YAML config file from above, saved here as hydra-config.yaml:

    project: ORGANIZATION/project-I-credit-scoring
    name: cs-credit-default-risk
    parameters:
      # Data preparation
      n_cv_splits: 5
      validation_size: 0.2
      stratified_cv: True
      shuffle: 1
      # Random forest
      rf__n_estimators: 2000
      rf__criterion: gini
      rf__max_features: 0.2
      rf__max_depth: 40
      rf__min_samples_split: 50
      rf__min_samples_leaf: 20
      rf__max_leaf_nodes: 60
      rf__class_weight: balanced
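Create the Python script that loads this file through Hydra:

    import hydra
    from omegaconf import DictConfig, OmegaConf

    # Assumes Hydra 1.2 or later; hydra-config.yaml sits next to this script.
    @hydra.main(config_path=".", config_name="hydra-config", version_base=None)
    def parameter_config(cfg: DictConfig) -> None:
        # Print the whole config in a reader-friendly way
        print(OmegaConf.to_yaml(cfg))
        # Access individual values from your config file
        print(cfg.parameters.rf__n_estimators)

    if __name__ == "__main__":
        parameter_config()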
When you start training your model, Hydra will log and print the configuration you’ve given:
    name: cs-credit-default-risk
    parameters:
      n_cv_splits: 5
      rf__class_weight: balanced
      rf__criterion: gini
      rf__max_depth: 40
      rf__n_estimators: 2000
      shuffle: 1
      stratified_cv: true
      validation_size: 0.2
    project: ORGANIZATION/project-I-credit-scoring
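Hydra's command-line overrides use a dotted `key=value` form, for example `python train.py parameters.rf__max_depth=20` (the script name `train.py` is assumed here). The sketch below imitates that idea with plain dictionaries so the mechanism is visible. It is not Hydra's implementation, and `apply_overrides` is a hypothetical helper.

```python
import ast
import copy


def apply_overrides(config, overrides):
    """Return a copy of `config` with dotted 'a.b=value' overrides applied."""
    result = copy.deepcopy(config)
    for item in overrides:
        dotted_key, raw_value = item.split("=", 1)
        *parents, leaf = dotted_key.split(".")
        node = result
        for key in parents:
            node = node.setdefault(key, {})
        try:
            # Interpret numbers, booleans, etc.; fall back to the raw string.
            node[leaf] = ast.literal_eval(raw_value)
        except (ValueError, SyntaxError):
            node[leaf] = raw_value
    return result


base = {"parameters": {"rf__max_depth": 40, "rf__criterion": "gini"}}
updated = apply_overrides(
    base, ["parameters.rf__max_depth=20", "parameters.rf__criterion=entropy"]
)
print(updated)
```

Because the base configuration is copied rather than modified, every run can log both the defaults and the overrides that produced it.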
Solid AI infrastructure
The AI infrastructure is the backbone of every AI project. In order for an AI company to be successful, it needs a solid network, servers, and storage solutions. This includes not only hardware but also the software tools that enable them to iterate quickly on machine learning algorithms. It’s extremely important that these solutions are scalable and can adapt as needs change over time.
Objectives and KPIs: key for MLOps engineers
Two principal categories fall under the scope of MLOps: predictive and prescriptive. Predictive MLOps is about predicting the outcome of a decision based on historical data, while prescriptive MLOps is about providing recommendations for decisions before they are made.
And those two categories abide by four general principles:
- Don’t overthink which objective to optimize directly; instead, track various indicators at first
- For your first objective, select a simple, observable, and attributable metric
- Establish governance objectives
- Fairness and privacy must be enforced
In terms of code, one could establish multiple requirements for fully functional production code. However, the real challenge arises when ML models run inference in production and are exposed to conditions they were never tested against. Testing is therefore a tremendously important part of the process and deserves a lot of attention.
A proper testing workflow should always follow these rules:
- Perform automated regression testing.
- Check code quality using static analysis.
- Employ continuous integration.
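An automated regression test for a model can be as simple as comparing a candidate's metrics against a stored baseline. The sketch below assumes hypothetical metric names, baseline values, and a tolerance of 0.01; in practice a check like this would run inside the continuous integration job.

```python
def find_regressions(baseline, candidate, tolerance=0.01):
    """Return the metrics where the candidate is worse than the baseline.

    Assumes higher is better for every metric; a missing metric counts
    as a regression.
    """
    regressions = {}
    for name, base_value in baseline.items():
        new_value = candidate.get(name)
        if new_value is None or new_value < base_value - tolerance:
            regressions[name] = (base_value, new_value)
    return regressions


# Hypothetical numbers for illustration only.
baseline_metrics = {"auc": 0.81, "recall": 0.64}
candidate_metrics = {"auc": 0.82, "recall": 0.60}

failed = find_regressions(baseline_metrics, candidate_metrics)
print(failed)
```

A non-empty result would block the candidate model from being promoted until someone reviews the drop.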
Principal KPIs in MLOps
There is no one-size-fits-all solution when it comes to MLOps KPIs. The metrics you or your MLOps engineer want to monitor will depend on your specific goals and environment. You should start by considering what you need to optimize, how quickly you need to make changes, and what kind of data you can collect. Major KPIs to always keep an eye on when deploying ML software in production include:
Hybrid MLOps infrastructure
The advent of MLOps has seen new-age businesses moving their datacenters into the cloud. This trend has shown that companies that are looking for agility and cost efficiency can easily switch to a fully-managed platform for their infrastructure management needs.
Hybrid MLOps capabilities are defined as those that have some interaction with the cloud while also having some interaction with local computing resources. Local compute resources can include laptops running Jupyter notebooks and Python scripts, HDFS clusters storing terabytes of data, web apps serving millions of people globally, on-premises AWS Outposts, and a plethora of additional applications.
Many companies and MLOps engineers, in response to increased regulatory and data privacy concerns, are turning to hybrid solutions to handle data localization. Furthermore, an increasing number of smart edge devices are fueling creative new services across sectors. Because these devices create large amounts of complicated data that must frequently be processed and evaluated in real-time, IT directors must determine how and where to process that data.
How to implement a hybrid MLOps process for MLOps engineers
A robust AI infrastructure heavily relies on an active learning data pipeline. When used correctly, the data pipeline may dramatically accelerate the development of ML models. It can also lower the cost of developing ML models.
Continuous integration and continuous delivery (CI/CD) describe the automated processes of integrating code changes and delivering software. Machine learning extends the integration step with data and model validation, whereas the delivery step handles the difficulties of machine learning deployments.
Machine learning experts and MLOps engineers devote a significant amount of work to troubleshooting and enhancing model performance. CI/CD tools save time and automate as much manual work as feasible. Some tools used in business are:
- GitHub Actions
- GitLab CI/CD
- CircleCI
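The data validation that machine learning adds to the integration step can be sketched as a small check that a CI job runs before training. The schema, column names, and ranges below are assumptions for illustration and are not tied to any particular CI tool.

```python
# Hypothetical schema: column name -> (expected type, min value, max value).
SCHEMA = {
    "age": (int, 18, 120),
    "income": (float, 0.0, 1e7),
}


def validate_rows(rows, schema=SCHEMA):
    """Return a list of human-readable problems found in `rows`."""
    problems = []
    for index, row in enumerate(rows):
        for column, (expected_type, low, high) in schema.items():
            if column not in row:
                problems.append(f"row {index}: missing '{column}'")
                continue
            value = row[column]
            if not isinstance(value, expected_type):
                problems.append(f"row {index}: '{column}' has wrong type")
            elif not low <= value <= high:
                problems.append(f"row {index}: '{column}' out of range")
    return problems


rows = [
    {"age": 35, "income": 52000.0},
    {"age": 17, "income": 31000.0},  # below the minimum age
    {"age": 44},                     # income is missing
]
issues = validate_rows(rows)
print(issues)
# A CI job would fail the build when `issues` is not empty,
# for example by calling sys.exit(1).
```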
Continuous training (CT), a notion specific to MLOps, is all about automating model retraining. It covers the whole model lifecycle, from data intake through measuring performance in production. CT ensures that your model is updated as soon as there is evidence of deterioration or a change in the environment.
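A continuous training trigger can be sketched as a rule that watches a production metric and requests retraining when it deteriorates. The window size, threshold, and accuracy values below are assumptions chosen for illustration.

```python
from statistics import mean


def should_retrain(history, baseline, window=3, max_drop=0.05):
    """Decide whether the recent average metric fell too far below baseline.

    `history` is a list of metric values ordered from oldest to newest.
    """
    if len(history) < window:
        return False  # not enough evidence yet
    recent_average = mean(history[-window:])
    return (baseline - recent_average) > max_drop


# Hypothetical daily accuracy measured in production.
daily_accuracy = [0.90, 0.89, 0.88, 0.84, 0.82, 0.80]
retrain = should_retrain(daily_accuracy, baseline=0.90)
print(retrain)
```

Averaging over a window rather than reacting to a single value keeps one noisy day from launching an unnecessary retraining run.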
Model training pipeline
A model training pipeline is an important part of the ongoing training process and the overall MLOps workflow. It trains and retrains models on a regular basis, freeing up data scientists to focus on building new models for other business challenges.
Each time the pipeline runs a new training job, it performs the following sequence of operations:
- Data ingestion: Obtaining fresh data from external repositories or feature stores, where data is preserved as reusable “features” tailored to specific business scenarios.
- Data preparation: A crucial step in which data anomalies are detected. When one is found, the pipeline can be paused immediately until data engineers resolve the issue.
- Model training and validation: In the most basic case, the model is trained on newly imported and processed data or features. However, you may conduct numerous training runs in parallel or in sequence to find the ideal parameters for a production model. Inference is then run on specific sets of data to assess the model’s performance.
- Data versioning: Data versioning is the technique of preserving data artifacts in the same way as code versions are saved in software development.
An MLOps engineer can implement all of these steps in a full-featured platform instead of building each one separately.
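The sequence above can be sketched as a chain of plain functions. Everything here is a stand-in: the in-memory data source, the anomaly rule (negative values), and the "model" (a mean) are assumptions, and the data version is simply a hash of the content.

```python
import hashlib
import json
from statistics import mean


class DataAnomalyError(Exception):
    """Raised to pause the pipeline until the data is fixed."""


def ingest():
    # Stand-in for reading from an external repository or feature store.
    return [{"amount": 120.0}, {"amount": 80.0}, {"amount": 100.0}]


def prepare(rows):
    # Hypothetical anomaly rule: amounts must not be negative.
    if any(row["amount"] < 0 for row in rows):
        raise DataAnomalyError("negative amount found; pipeline paused")
    return [row["amount"] for row in rows]


def train_and_validate(values):
    # Toy "model": predict the mean and report the mean absolute error.
    prediction = mean(values)
    error = mean([abs(v - prediction) for v in values])
    return {"prediction": prediction, "mae": error}


def version_data(rows):
    # Identify the exact training data by a hash of its content.
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def run_pipeline():
    rows = ingest()
    values = prepare(rows)
    model = train_and_validate(values)
    return {"model": model, "data_version": version_data(rows)}


result = run_pipeline()
print(result)
```

Raising an exception in the preparation step is one simple way to model the "pause" described above: nothing downstream runs until the data problem is resolved.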
Once the model is trained and ready for a production setup, it is pushed into a model registry, which serves as a centralized repository for the metadata of all published models. Each model gets its own entry, whose fields serve as that model’s metadata, for example:
    project: ORGANIZATION/project-I-credit-scoring
    model_version: model_v_10.0.02
    Identifiers
    - version
    - name
    - version_date
    - remote_path_to_serialized_model
    - model_stage_of_deployment
    - datasets_used_for_training
    - runtime_metrics
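One way to picture such an entry is as a small record type stored in a registry keyed by name and version. The sketch below is an in-memory illustration only; the class names, the example date, path, dataset name, and metric value are hypothetical, and real registries persist this information in a database or a managed service.

```python
from dataclasses import dataclass, field


@dataclass
class ModelEntry:
    name: str
    version: str
    version_date: str
    remote_path_to_serialized_model: str
    model_stage_of_deployment: str = "staging"
    datasets_used_for_training: list = field(default_factory=list)
    runtime_metrics: dict = field(default_factory=dict)


class ModelRegistry:
    """In-memory registry keyed by (name, version)."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        key = (entry.name, entry.version)
        if key in self._entries:
            raise ValueError(f"{key} is already registered")
        self._entries[key] = entry

    def get(self, name, version):
        return self._entries[(name, version)]

    def promote(self, name, version, stage):
        self.get(name, version).model_stage_of_deployment = stage


registry = ModelRegistry()
registry.register(
    ModelEntry(
        name="cs-credit-default-risk",
        version="model_v_10.0.02",
        version_date="2022-01-01",  # hypothetical date
        remote_path_to_serialized_model="models/rf.pkl",  # hypothetical path
        datasets_used_for_training=["train-2021"],  # hypothetical name
        runtime_metrics={"auc": 0.81},  # hypothetical value
    )
)
registry.promote("cs-credit-default-risk", "model_v_10.0.02", "production")
entry = registry.get("cs-credit-default-risk", "model_v_10.0.02")
print(entry.model_stage_of_deployment)
```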
The next stage is model deployment. The most recent option, Model-as-a-Service, is now the most popular because it simplifies deployment by isolating the machine learning component from the software code. This means that you or your MLOps engineer can change a model version without having to redeploy the application.
Generally speaking, there are three main ways to deploy an ML model:
- On an IoT device.
- On an embedded device, consumer application.
- On a dedicated web service available via a REST API.
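The appeal of serving a model behind its own interface is that the caller only sees a stable request and response format. The sketch below keeps that idea but leaves out the HTTP layer; `ModelService` and the two toy scoring functions are hypothetical stand-ins for a real service and real serialized models.

```python
import json


class ModelService:
    """Holds several model versions and serves the active one."""

    def __init__(self):
        self._models = {}
        self._active = None

    def load(self, version, predict_fn):
        self._models[version] = predict_fn

    def activate(self, version):
        if version not in self._models:
            raise KeyError(f"unknown model version: {version}")
        self._active = version

    def handle_request(self, body):
        """Take a JSON request body and return a JSON response body."""
        if self._active is None:
            raise RuntimeError("no model version has been activated")
        features = json.loads(body)["features"]
        score = self._models[self._active](features)
        return json.dumps({"model_version": self._active, "score": score})


# Toy scoring functions standing in for real models.
service = ModelService()
service.load("v1", lambda features: sum(features))
service.load("v2", lambda features: max(features))

service.activate("v1")
first = json.loads(service.handle_request('{"features": [1, 2, 3]}'))

# Switch versions; the calling application is untouched.
service.activate("v2")
second = json.loads(service.handle_request('{"features": [1, 2, 3]}'))
print(first, second)
```

The request body is identical in both calls, which is the point: only the service knows which model version answered.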
The best platforms that provide SDKs and APIs for model serving are:
You or your MLOps engineer can also launch multiple models for the same service to perform testing in production. For example, you can test competing model versions against each other. This strategy involves simultaneously deploying several models with comparable results to determine which one is superior. The process is similar to A/B testing, except that you can compare more than two models at the same time.
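Routing traffic between competing versions is often done by hashing a stable identifier so that each user consistently sees the same model. The sketch below assumes three hypothetical version names and an equal split between them.

```python
import hashlib
from collections import Counter

VARIANTS = ["model_a", "model_b", "model_c"]  # hypothetical version names


def assign_variant(user_id, variants=VARIANTS):
    """Map a user to one variant, deterministically and roughly evenly."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return variants[int(digest, 16) % len(variants)]


# Simulate 3,000 users and count how many land on each variant.
assignments = Counter(assign_variant(f"user-{i}") for i in range(3000))
print(assignments)
```

Because the assignment depends only on the user identifier, outcomes logged per variant can later be compared without storing a separate assignment table.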
Upon release, the model’s performance may be influenced by a variety of circumstances, ranging from an initial mismatch between research and real data to changes in customer behavior. Typically, machine learning models do not show errors right away, but their predictions do have an impact on the eventual results. Inadequate insights might result in poor company decisions and, as a result, financial losses. Software tools that MLOps engineers could consider to handle model monitoring are MLWatcher, DbLue, and Qualdo.
Don’t forget that managing any form of company IT infrastructure is not easy. There are constant concerns about security, performance, availability, pricing, and other factors. Hybrid cloud solutions are no exception; they introduce additional layers of complexity that make IT management even more difficult. To avoid such problems, businesses and MLOps engineers should implement proactive processes such as anomaly detection and early alerting, and be ready to trigger ML retraining pipelines as soon as problems arise.
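An early alert based on anomaly detection can start from a simple statistical rule. The sketch below flags a batch of production values whose mean lies far from the training mean; the threshold of three standard errors and the sample numbers are assumptions, and dedicated monitoring tools use richer tests.

```python
from math import sqrt
from statistics import mean, stdev


def drift_alert(training_values, live_values, threshold=3.0):
    """Return True when the live mean deviates strongly from training.

    Uses a z-score of the live batch mean against the training
    distribution (mean and sample standard deviation).
    """
    mu = mean(training_values)
    sigma = stdev(training_values)
    standard_error = sigma / sqrt(len(live_values))
    z_score = abs(mean(live_values) - mu) / standard_error
    return z_score > threshold


# Hypothetical feature values.
training = [48, 52, 50, 49, 51, 50, 47, 53]
stable_batch = [49, 51, 50, 52]
shifted_batch = [58, 61, 59, 60]

print(drift_alert(training, stable_batch))
print(drift_alert(training, shifted_batch))
```

An alert like this could feed directly into the retraining pipeline, so that a detected shift in the input data starts a new training run.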
Aymane Hachcham is a data scientist and contributor to neptune.ai