As we noted above, building infrastructure from Open-Source tools requires more resources from the IT team. This not only raises development costs but also lengthens Time-to-market.
That is why we developed the Cloud ML Platform, a convenient pre-configured environment that covers the full ML development cycle. JupyterHub and MLflow are available out of the box and are already integrated with each other. Many data scientists are familiar with these tools, and they let you handle the basic MLOps tasks.
We also integrated MLflow with the cloud infrastructure to handle model serving. MLflow Deploy automatically packages ML models into Docker containers and exposes them through a REST API for real-time serving. The service is integrated with the other ML Platform components, JupyterHub and MLflow.
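For illustration, here is how a client might call such a REST endpoint. This is a minimal sketch that assumes the deployed model speaks the standard MLflow 2.x scoring protocol (a `POST /invocations` request with a `dataframe_split` JSON body); the endpoint URL, column names, and values are hypothetical, and the actual MLflow Deploy interface may differ.

```python
import json
import urllib.request

# Hypothetical endpoint of a deployed model; replace with the real URL of your deployment.
ENDPOINT = "http://model.example.com/invocations"


def build_payload(columns, rows):
    """Build a request body in MLflow's `dataframe_split` format (MLflow 2.x)."""
    return {"dataframe_split": {"columns": list(columns), "data": [list(r) for r in rows]}}


def predict(url, payload, timeout=10.0):
    """POST the payload as JSON and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    body = build_payload(["age", "income"], [[35, 52000.0], [51, 87000.0]])
    print(json.dumps(body))
    # predict(ENDPOINT, body) would send the request to a live deployment.
```

Keeping payload construction separate from the HTTP call makes it easy to check the request format without a running model.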
For the Cloud ML Platform, we chose Open-Source tools for several reasons:
- Low entry threshold: we have already integrated and configured the MLOps tools, so you do not have to spend time figuring out all the implementation details;
- No Vendor Lock-in, so you are not at risk if one of the vendors leaves the market;
- The tools are easy to adapt to business tasks.
The platform also provides key Data Engineering tools: Hadoop, Spark, Greenplum, ClickHouse, and Airflow.
New users receive 3,000 bonus rubles to test the Cloud ML Platform. Register with the service and start building MLOps in your company.
Comprehensive Solutions in MLOps
- Pachyderm
Pachyderm automates data transformation with data versioning, data lineage, and end-to-end pipelines on Kubernetes. Its version control follows the same model as Git: the top-level object is the repository, and commits, branches, files, history, and provenance are used to track and version datasets.
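To make the repository, commit, branch, and provenance vocabulary concrete, here is a deliberately simplified in-memory model of Git-style data versioning. It is not Pachyderm's API or storage format; the class and function names are hypothetical. It only illustrates the ideas: each commit is an immutable snapshot identified by a content hash, a branch is a movable pointer to a commit, and an output commit records which input commit it was derived from (its provenance).

```python
import hashlib
import json


class DataRepo:
    """Toy Git-like dataset repository (hypothetical, for illustration only)."""

    def __init__(self, name):
        self.name = name
        self.commits = {}   # commit id -> {"parent", "files", "provenance"}
        self.branches = {}  # branch name -> commit id

    def commit(self, branch, files, provenance=None):
        """Snapshot `files` (path -> content) on `branch` and return the new commit id."""
        parent = self.branches.get(branch)
        record = {"parent": parent, "files": dict(files), "provenance": provenance}
        # Content-address the commit: same parent, files and provenance give the same id.
        commit_id = hashlib.sha256(
            json.dumps(record, sort_keys=True).encode("utf-8")
        ).hexdigest()[:12]
        self.commits[commit_id] = record
        self.branches[branch] = commit_id
        return commit_id

    def files_at(self, commit_id):
        """Return the files stored in a given commit."""
        return self.commits[commit_id]["files"]

    def history(self, branch):
        """Commit ids on `branch`, newest first, following parent links."""
        ids, current = [], self.branches.get(branch)
        while current is not None:
            ids.append(current)
            current = self.commits[current]["parent"]
        return ids


def run_pipeline(source, commit_id, output):
    """Toy pipeline stage: upper-case every file and record where the input came from."""
    transformed = {path: text.upper() for path, text in source.files_at(commit_id).items()}
    return output.commit("master", transformed, provenance=(source.name, commit_id))


if __name__ == "__main__":
    raw = DataRepo("raw")
    good = raw.commit("master", {"a.txt": "hello"})
    bad = raw.commit("master", {"a.txt": "corrupted"})
    clean = DataRepo("clean")
    # After spotting the bad commit, replay the stage on the previous, known-good version.
    out = run_pipeline(raw, good, clean)
    print(raw.history("master"), clean.files_at(out))
```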
Here are a few key features of Pachyderm:
- Data versioning. Pachyderm can create versions of data and store them in organized repositories. This lets you spot bad commits and fix them, as well as rerun pipelines on a previous version of the data.
- Containerized analytics. Each pipeline stage runs in a container, so you can use a wide range of libraries and frameworks at every stage.
- Independently scalable pipeline stages. Pachyderm allocates compute to each pipeline stage according to that stage's own data volume and workload.
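Each Pachyderm pipeline stage is declared in a pipeline specification that names the container image to run, the input repository, and how the stage is parallelized. The sketch below builds such a spec in Python and prints it as JSON, which could be saved to a file and submitted with `pachctl create pipeline -f`. The field names follow Pachyderm's pipeline spec as we understand it (`pipeline`, `transform`, `input.pfs`, `parallelism_spec`), but the pipeline name, image, script path, and repository are hypothetical; check the spec reference for your Pachyderm version before relying on it.

```python
import json


def make_pipeline_spec(name, image, cmd, input_repo, glob="/*", workers=1):
    """Return a Pachyderm-style pipeline spec as a plain dict.

    All argument values are supplied by the caller (hypothetical in this sketch).
    """
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return {
        "pipeline": {"name": name},
        # The stage runs in its own container image, so any language or library can be used.
        "transform": {"image": image, "cmd": list(cmd)},
        # The glob pattern splits the input repo into datums that workers process in parallel.
        "input": {"pfs": {"repo": input_repo, "glob": glob}},
        # Number of workers for this stage, set independently of other stages.
        "parallelism_spec": {"constant": workers},
    }


if __name__ == "__main__":
    spec = make_pipeline_spec(
        name="edges",
        image="example/edges:1.0",    # hypothetical image
        cmd=["python3", "/edges.py"],  # hypothetical script
        input_repo="images",
        workers=4,
    )
    # Save this output as edges.json, then: pachctl create pipeline -f edges.json
    print(json.dumps(spec, indent=2))
```

Giving each stage its own `parallelism_spec` is what lets a heavy stage scale out while lightweight stages stay small.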
The free version suits small teams, while those who need advanced features can choose the Enterprise edition.
- Seldon Core
Seldon Core is another Open-Source platform, used to deploy ML models on Kubernetes, and one of the most popular deployment solutions. It supports the REST and gRPC protocols as well as manual and automatic scaling. Among the tool's features:
- inference graphs made up of predictors, transformers, routers, combiners, and so on;
- metrics with integration into Prometheus and Grafana;
- auditable I/O request logging thanks to integration with Elasticsearch;
- integration with Jaeger for distributed tracing of microservices.
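In Seldon Core's Python language wrapper (v1), a model is a plain Python class whose `predict` method receives the request features; the wrapper serves it over REST and gRPC, and an optional `metrics` method returns custom metrics that are exported to Prometheus. The class below is a minimal sketch under those assumptions: the class name, the threshold logic, and the metric keys are hypothetical, and the exact method signatures should be checked against the Seldon Core version you use.

```python
class ThresholdModel:
    """Hypothetical model class in the shape Seldon Core's Python wrapper expects."""

    def __init__(self, threshold=1.0):
        # A real model would load its artifacts here; this sketch only stores a threshold.
        self.threshold = threshold
        self._last_batch_size = 0

    def predict(self, X, features_names=None):
        """Return 1.0 for rows whose feature sum exceeds the threshold, else 0.0.

        For an "ndarray" request the wrapper passes X as a 2-D NumPy array;
        iterating over its rows works the same way as for nested Python lists.
        """
        rows = [list(row) for row in X]
        self._last_batch_size = len(rows)
        return [[1.0 if sum(row) > self.threshold else 0.0] for row in rows]

    def metrics(self):
        """Custom metrics as a list of dicts (type, key, value) for Prometheus export."""
        return [
            {"type": "COUNTER", "key": "threshold_model_requests", "value": 1},
            {"type": "GAUGE", "key": "threshold_model_batch_size", "value": self._last_batch_size},
        ]


if __name__ == "__main__":
    model = ThresholdModel(threshold=1.0)
    # A REST client would send these rows as {"data": {"ndarray": [[0.2, 0.3], [0.9, 0.8]]}}.
    print(model.predict([[0.2, 0.3], [0.9, 0.8]]))
    print(model.metrics())
```

With the `seldon-core` package installed, such a class is typically started with the `seldon-core-microservice` entry point inside a container image and referenced from a `SeldonDeployment` resource; treat the exact command-line options as version-dependent.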