MLOps Pipeline with Gretel Synthetic Data

This repository, located at gretelai/gretel-mlops, demonstrates how to use Gretel synthetic data within MLOps pipelines on three cloud platforms: Amazon SageMaker, Google Cloud Vertex AI, and Azure AI. Integrating synthetic data into machine learning operations is intended to improve data privacy, data availability, and model robustness in settings where data security and regulatory compliance matter.

Features

  • Notebooks: Detailed guides to running MLOps pipelines in SageMaker Pipelines, Vertex AI Pipelines, and Azure AI Pipelines, tailored for each cloud provider.
  • Code Repository: Essential code for constructing these pipelines, providing a practical resource for implementing synthetic data workflows.
  • Config Folder: Example pipeline configurations for public datasets, ready for training ML models on classification or regression tasks with Gretel Synthetics.

Supporting Blogpost

For more detail on integrating synthetic data with MLOps, read our blog post: "How to Use Amazon SageMaker Pipelines MLOps with Gretel Synthetic Data". It covers the advantages of synthetic data, the integration steps, and how Gretel and SageMaker Pipelines work together for privacy-conscious, efficient ML model training.

Getting Started

Prerequisites

To begin, ensure you have:

  • An account with the relevant cloud provider. Create one at AWS, Google Cloud, or Azure.
  • Access to Gretel services. Sign up and obtain your API key via the Gretel Console.
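
Once you have an API key, one common way to keep it out of notebooks and source control is to read it from an environment variable. The sketch below assumes a variable named GRETEL_API_KEY; that name and the load_api_key helper are illustrative assumptions, not something this repository defines, so check the notebook for your platform to see how it expects the key to be supplied.

```python
import os
from typing import Mapping, Optional


def load_api_key(
    env: Optional[Mapping[str, str]] = None,
    var: str = "GRETEL_API_KEY",  # assumed variable name, not defined by this repo
) -> str:
    """Return the API key stored in `var`, or raise if it is missing or blank."""
    env = os.environ if env is None else env
    key = env.get(var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {var} environment variable to your Gretel API key."
        )
    return key
```

Failing early with a clear message is usually easier to debug than an authentication error raised partway through a pipeline run.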

Installation

  1. Clone this repository to your local machine or cloud environment.
     git clone https://github.com/gretelai/gretel-mlops.git

We require Python 3.9+ to run Gretel services through the SDK.
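
To confirm that an environment meets the Python 3.9+ requirement before running a notebook, a small check like the following can help. It is a minimal sketch using only the standard library; the meets_minimum helper and MINIMUM_PYTHON constant are hypothetical names introduced here for illustration.

```python
import sys
from typing import Sequence, Tuple

# Minimum interpreter version stated in this README.
MINIMUM_PYTHON: Tuple[int, int] = (3, 9)


def meets_minimum(
    version: Sequence[int] = sys.version_info,
    minimum: Tuple[int, int] = MINIMUM_PYTHON,
) -> bool:
    """Return True if (major, minor) of `version` is at least `minimum`."""
    return tuple(version[:2]) >= tuple(minimum)
```

Calling meets_minimum() with no arguments checks the interpreter that is currently running, so it can be placed in the first cell of a notebook.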

Usage

To run MLOps pipelines with Gretel synthetic data on a given cloud platform, follow the notebook for that platform in the sections below. Each notebook walks through pipeline execution for its cloud service and shows how to integrate Gretel synthetic data into your ML workflow. Example pipeline configurations that run on public datasets are available in the config folder.
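
To see which example configurations are available after cloning, you can list the files in the config folder. The sketch below assumes only that the folder is named config and sits at the repository root; it makes no assumption about the file format or contents, and list_config_files is a hypothetical helper introduced here.

```python
from pathlib import Path
from typing import List


def list_config_files(repo_root: str, folder: str = "config") -> List[str]:
    """Return sorted paths of all files under `folder`, relative to that folder."""
    config_dir = Path(repo_root) / folder
    if not config_dir.is_dir():
        raise FileNotFoundError(f"No '{folder}' directory found in {repo_root}")
    return sorted(
        path.relative_to(config_dir).as_posix()
        for path in config_dir.rglob("*")
        if path.is_file()
    )
```

For example, list_config_files("gretel-mlops") would list the configurations in a fresh clone made from the parent directory.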

Azure AI

For implementing the pipeline in Azure AI, refer to the following notebook:

Amazon SageMaker

For deploying and running the pipeline in Amazon SageMaker, access the notebook here:

Google Cloud Vertex AI

To use the pipeline with Google Cloud's Vertex AI, follow the instructions in this notebook:

Contributing

We welcome contributions to improve this project! Whether you're fixing bugs, adding features, or improving documentation, please let us know how you'd like to get involved.

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.