In this workshop, you will go through the steps required to build a machine learning application on AWS using the Amazon SageMaker Studio experience. The workshop consists of the following modules:
- 01 - Build and train models: Perform data preparation and analysis using the SageMaker Studio JupyterLab notebook experience, and run your local code as a SageMaker Training job using the remote function feature. MLflow is used to track and observe the experiments.
- 02 - Deploy models: You will learn to use SageMaker Studio's Code Editor, which is based on Visual Studio Code – Open Source (Code-OSS), to deploy the model into an endpoint using SageMaker ModelBuilder.
- 03 - Build a complete pipeline: You will create a complete pipeline from a workflow consisting of multiple steps using SageMaker Studio's Code Editor.
- 04 - Build HTTP API: You will learn how to build an HTTP endpoint using AWS Lambda and Amazon API Gateway to serve inference requests from a web client.
- 05 - Invoke HTTP API: You will invoke the HTTP API from the browser.
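The Lambda function behind the HTTP API in module 04 could look roughly like the sketch below. It is not the workshop's actual code. It assumes API Gateway's Lambda proxy integration, where the request payload arrives as a JSON string in `event["body"]` and the function returns a dictionary with `statusCode` and `body`. The `fake_predict` function, the `air_temperature_k` field, and the `score` response key are hypothetical placeholders for the call to the deployed model endpoint.

```python
import json


def fake_predict(features):
    """Hypothetical stand-in for a call to the deployed model endpoint."""
    # Placeholder rule for illustration only; this is not the workshop's model.
    return 0.9 if features.get("air_temperature_k", 0) > 300 else 0.1


def handler(event, context, predict=fake_predict):
    """Handle an API Gateway proxy event and return a proxy-style response."""
    try:
        features = json.loads(event.get("body") or "{}")
    except json.JSONDecodeError:
        return {"statusCode": 400, "body": json.dumps({"error": "invalid JSON"})}
    if not isinstance(features, dict):
        return {"statusCode": 400, "body": json.dumps({"error": "expected an object"})}

    score = predict(features)
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"score": score}),
    }


if __name__ == "__main__":
    sample_event = {"body": json.dumps({"air_temperature_k": 305})}
    print(handler(sample_event, None))
```

Passing the predictor as a parameter keeps the handler testable without AWS credentials. In a real deployment it would be replaced by a function that invokes the SageMaker endpoint.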
To learn more about the new features announced at AWS re:Invent 2023, watch the recording of the breakout session Scale complete ML development with Amazon SageMaker Studio (AIM325). This workshop covers several of these features:
- Remote and Step decorators for simple packaging and remote function calling.
- ModelBuilder for easier packaging, local testing and deployment of models.
- New SageMaker Studio Experience for JupyterLab and Code Editor (based on Visual Studio Code - Code OSS).
Update October 15, 2024: The workshop has been extended to cover the following features:
- MLflow for Experiments: Lets you use open source MLflow for observability throughout your iterations.
- SageMaker Local Mode: Lets you run the pipelines you create locally using Docker, which shortens your development cycle.
- Multi-project isolation: A custom project prefix is added to all resources, such as S3 buckets, training job names, inference endpoint names, and pipeline names.
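The project prefix idea can be illustrated with a small helper that derives every resource name from one prefix. This is only a sketch: the function name `resource_name`, the `team-a` prefix, the resource kinds, and the 63-character limit are assumptions chosen for illustration, not the workshop's actual naming scheme.

```python
import re


def resource_name(project_prefix, kind, max_len=63):
    """Build a prefixed name using only lowercase letters, digits, and hyphens."""
    raw = f"{project_prefix}-{kind}".lower()
    # Replace any run of disallowed characters with a single hyphen.
    cleaned = re.sub(r"[^a-z0-9-]+", "-", raw).strip("-")
    # Truncate to the assumed length limit and avoid a trailing hyphen.
    return cleaned[:max_len].rstrip("-")


# Hypothetical resource kinds, one per resource type named in the list above.
KINDS = ("bucket", "training-job", "endpoint", "pipeline")
names = {kind: resource_name("team-a", kind) for kind in KINDS}

if __name__ == "__main__":
    for kind, name in names.items():
        print(f"{kind}: {name}")
```

Deriving all names from a single prefix means two projects in the same account do not collide, as long as their prefixes differ.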
AWS re:Invent 2023 - Scale complete ML development with Amazon SageMaker Studio (AIM325)
The machine learning process is an iterative process consisting of several steps:
- Identifying a business problem and the related machine learning problem
- Data ingestion, integration and preparation
- Data visualization and analysis, feature engineering, model training and model evaluation
- Model deployment, model monitoring and debugging
These steps are usually repeated multiple times to better meet business goals, for example after the source data changes or the model's performance drops.
The following diagram shows how the process works:
After you deploy a model, you can integrate it with your own application to provide insights to the end users.
Amazon SageMaker is a fully managed service that enables developers and data scientists to quickly and easily build, train, and deploy machine learning models at any scale.
Amazon SageMaker removes the complexity that holds back developer success at each of these steps. It includes modules that can be used together or independently to build, train, and deploy your machine learning models.
You will use the AI4I 2020 Predictive Maintenance Dataset from the UCI Machine Learning Repository. This synthetic dataset, which reflects predictive maintenance data encountered in industry, consists of 10,000 records and 14 features. The features include various measurements collected from machinery and an indication of whether the machine is likely to fail. This basic dataset oversimplifies a predictive maintenance task. However, it keeps this workshop easy to follow while still representing the various steps of the machine learning workflow. You can adapt the steps in this workshop to solve other machine learning tasks, including generative AI fine-tuning and deployment.
In this workshop, your goal is to build a simple machine learning model that predicts whether a piece of machinery is going to fail.
The following is an excerpt from the dataset:
UDI | Product ID | Type | Air temperature [K] | Process temperature [K] | ... | Machine failure |
---|---|---|---|---|---|---|
1 | M14860 | M | 298.1 | 308.6 | ... | 0 |
2 | L47181 | L | 298.2 | 308.7 | ... | 0 |
3 | L47182 | L | 298.1 | 308.5 | ... | 0 |
51 | L47230 | L | 298.9 | 309.1 | ... | 1 |
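As a rough sketch of how such records split into model inputs and a target, the excerpt can be loaded with Python's standard library. Only the columns shown in the excerpt are included; the columns elided by `...` are left out rather than guessed. The variable names `EXCERPT`, `TARGET`, `features`, and `labels` are illustrative assumptions.

```python
import csv
import io

# The rows shown in the excerpt, limited to the visible columns.
EXCERPT = """UDI,Product ID,Type,Air temperature [K],Process temperature [K],Machine failure
1,M14860,M,298.1,308.6,0
2,L47181,L,298.2,308.7,0
3,L47182,L,298.1,308.5,0
51,L47230,L,298.9,309.1,1
"""

TARGET = "Machine failure"

rows = list(csv.DictReader(io.StringIO(EXCERPT)))

# Everything except the target column is a candidate model input.
features = [{k: v for k, v in row.items() if k != TARGET} for row in rows]
# The target is a 0/1 label.
labels = [int(row[TARGET]) for row in rows]

if __name__ == "__main__":
    print(labels)
```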
The binary (0 or 1) nature of the target variable, Machine failure, suggests you are solving a binary classification problem. In this workshop, you will build a logistic regression model, which predicts a continuous value in the range [0,1]. Using a regression model to solve a binary classification problem is a common approach. The predicted score indicates the system's certainty that the given observation belongs to the positive class. As a consumer of this score, you decide whether an observation is classified as positive or negative by picking a classification threshold (cut-off) and comparing the score against it. Observations with scores higher than the threshold are predicted as the positive class, and observations with scores lower than the threshold are predicted as the negative class. To learn more about this approach, read https://docs.aws.amazon.com/machine-learning/latest/dg/binary-classification.html.
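The thresholding step described above can be sketched in a few lines of Python. The scores and thresholds below are made-up values for illustration, not model output from the workshop. The `sigmoid` function is included only to show how a logistic model maps a raw value into the range [0,1]. Treating a score exactly equal to the threshold as negative is an assumption of this sketch.

```python
import math


def sigmoid(z):
    """Map a raw model output to a score in the range [0, 1]."""
    return 1.0 / (1.0 + math.exp(-z))


def classify(score, threshold=0.5):
    """Return 1 (positive class) if the score is above the threshold, else 0."""
    # Assumption: a score exactly equal to the threshold is treated as negative.
    return 1 if score > threshold else 0


# Made-up scores for illustration only.
scores = [0.05, 0.42, 0.58, 0.91]

default_labels = [classify(s) for s in scores]
strict_labels = [classify(s, threshold=0.7) for s in scores]

if __name__ == "__main__":
    print(default_labels)  # [0, 0, 1, 1]
    print(strict_labels)   # [0, 0, 0, 1]
```

Raising the threshold turns borderline observations into negative predictions, which trades fewer false alarms for more missed failures.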
This diagram shows what you will be building in this workshop:
If you are attending the Scale complete ML development with Amazon SageMaker Studio workshop run by AWS, the AWS event facilitator has provided you access to a temporary AWS account preconfigured for this workshop. Proceed to Module 0: Open SageMaker Studio.
If you want to use your own AWS account, you'll have to execute some preliminary configuration steps as described in the Setup Guide.
⚠️ Running this workshop in your AWS account will incur costs. You will need to delete the resources you create to avoid incurring further costs after you have completed the workshop. See the clean up steps.
Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.
Giuseppe A. Porcelli - Principal ML Specialist Solutions Architect - Amazon Web Services
Antonio Duma - Senior Startup Solutions Architect - Amazon Web Services
Hasan Poonawala - Senior ML Specialist Solutions Architect - Amazon Web Services
Mehran Nikoo - ML & Generative AI Go-To-Market Specialist - Amazon Web Services
Bruno Pistone - AI/ML Specialist Solutions Architect - Amazon Web Services
Durga Sury - ML Solutions Architect - Amazon Web Services
Arlind Nocaj - Senior Solutions Architect - Amazon Web Services