Merge pull request #6358 from singhniraj08:fix-template-tutorials
PiperOrigin-RevId: 571010025
tfx-copybara committed Oct 10, 2023
2 parents 7ea8d69 + 125a7d1 commit 67b1251
Showing 2 changed files with 50 additions and 50 deletions.
60 changes: 30 additions & 30 deletions docs/tutorials/tfx/template.ipynb
@@ -48,15 +48,15 @@
"Note: We recommend running this tutorial on Google Cloud Vertex AI Workbench. [Launch this notebook on Vertex AI Workbench](https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?q=download_url%3Dhttps%253A%252F%252Fraw.githubusercontent.com%252Ftensorflow%252Ftfx%252Fmaster%252Fdocs%252Ftutorials%252Ftfx%252Ftemplate.ipynb).\n",
"\n",
"\n",
"\u003cdiv class=\"devsite-table-wrapper\"\u003e\u003ctable class=\"tfo-notebook-buttons\" align=\"left\"\u003e\n",
"\u003ctd\u003e\u003ca target=\"_blank\" href=\"https://www.tensorflow.org/tfx/tutorials/tfx/template\"\u003e\n",
"\u003cimg src=\"https://www.tensorflow.org/images/tf_logo_32px.png\"/\u003eView on TensorFlow.org\u003c/a\u003e\u003c/td\u003e\n",
"\u003ctd\u003e\u003ca target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/tfx/blob/master/docs/tutorials/tfx/template.ipynb\"\u003e\n",
"\u003cimg src=\"https://www.tensorflow.org/images/colab_logo_32px.png\"\u003eRun in Google Colab\u003c/a\u003e\u003c/td\u003e\n",
"\u003ctd\u003e\u003ca target=\"_blank\" href=\"https://github.com/tensorflow/tfx/tree/master/docs/tutorials/tfx/template.ipynb\"\u003e\n",
"\u003cimg width=32px src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\"\u003eView source on GitHub\u003c/a\u003e\u003c/td\u003e\n",
"\u003ctd\u003e\u003ca href=\"https://storage.googleapis.com/tensorflow_docs/tfx/docs/tutorials/tfx/template.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/download_logo_32px.png\" /\u003eDownload notebook\u003c/a\u003e\u003c/td\u003e\n",
"\u003c/table\u003e\u003c/div\u003e"
"<div class=\"devsite-table-wrapper\"><table class=\"tfo-notebook-buttons\" align=\"left\">\n",
"<td><a target=\"_blank\" href=\"https://www.tensorflow.org/tfx/tutorials/tfx/template\">\n",
"<img src=\"https://www.tensorflow.org/images/tf_logo_32px.png\"/>View on TensorFlow.org</a></td>\n",
"<td><a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/tfx/blob/master/docs/tutorials/tfx/template.ipynb\">\n",
"<img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\">Run in Google Colab</a></td>\n",
"<td><a target=\"_blank\" href=\"https://github.com/tensorflow/tfx/tree/master/docs/tutorials/tfx/template.ipynb\">\n",
"<img width=32px src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\">View source on GitHub</a></td>\n",
"<td><a href=\"https://storage.googleapis.com/tensorflow_docs/tfx/docs/tutorials/tfx/template.ipynb\"><img src=\"https://www.tensorflow.org/images/download_logo_32px.png\" />Download notebook</a></td>\n",
"</table></div>"
]
},
{
@@ -156,7 +156,7 @@
"outputs": [],
"source": [
"# Read GCP project id from env.\n",
"shell_output=!gcloud config list --format 'value(core.project)' 2\u003e/dev/null\n",
"shell_output=!gcloud config list --format 'value(core.project)' 2>/dev/null\n",
"GOOGLE_CLOUD_PROJECT=shell_output[0]\n",
"%env GOOGLE_CLOUD_PROJECT={GOOGLE_CLOUD_PROJECT}\n",
"print(\"GCP project ID:\" + GOOGLE_CLOUD_PROJECT)"
@@ -168,9 +168,9 @@
"id": "A_6r4uzE0oky"
},
"source": [
"We also need to access your KFP cluster. You can access it in your Google Cloud Console under \"AI Platform \u003e Pipeline\" menu. The \"endpoint\" of the KFP cluster can be found from the URL of the Pipelines dashboard, or you can get it from the URL of the Getting Started page where you launched this notebook. Let's create an `ENDPOINT` environment variable and set it to the KFP cluster endpoint. **ENDPOINT should contain only the hostname part of the URL.** For example, if the URL of the KFP dashboard is `https://1e9deb537390ca22-dot-asia-east1.pipelines.googleusercontent.com/#/start`, ENDPOINT value becomes `1e9deb537390ca22-dot-asia-east1.pipelines.googleusercontent.com`.\n",
"We also need to access your KFP cluster. You can access it in your Google Cloud Console under \"AI Platform > Pipeline\" menu. The \"endpoint\" of the KFP cluster can be found from the URL of the Pipelines dashboard, or you can get it from the URL of the Getting Started page where you launched this notebook. Let's create an `ENDPOINT` environment variable and set it to the KFP cluster endpoint. **ENDPOINT should contain only the hostname part of the URL.** For example, if the URL of the KFP dashboard is `https://1e9deb537390ca22-dot-asia-east1.pipelines.googleusercontent.com/#/start`, ENDPOINT value becomes `1e9deb537390ca22-dot-asia-east1.pipelines.googleusercontent.com`.\n",
"\n",
"\u003e**NOTE: You MUST set your ENDPOINT value below.**"
">**NOTE: You MUST set your ENDPOINT value below.**"
]
},
{
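For reference, a minimal sketch of deriving the hostname-only `ENDPOINT` value described in the cell above; the URL is the example from the text, so substitute your own dashboard URL:

```python
from urllib.parse import urlparse

# Example KFP dashboard URL from the text above; replace with your own.
kfp_dashboard_url = "https://1e9deb537390ca22-dot-asia-east1.pipelines.googleusercontent.com/#/start"

# ENDPOINT must contain only the hostname part of the URL.
ENDPOINT = urlparse(kfp_dashboard_url).netloc
print(ENDPOINT)  # 1e9deb537390ca22-dot-asia-east1.pipelines.googleusercontent.com
```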
@@ -295,7 +295,7 @@
"id": "1tEYUQxH0olO"
},
"source": [
"\u003eNOTE: Don't forget to change directory in `File Browser` on the left by clicking into the project directory once it is created."
">NOTE: Don't forget to change directory in `File Browser` on the left by clicking into the project directory once it is created."
]
},
{
@@ -344,7 +344,7 @@
"outputs": [],
"source": [
"!{sys.executable} -m models.features_test\n",
"!{sys.executable} -m models.keras.model_test\n"
"!{sys.executable} -m models.keras_model.model_test\n"
]
},
{
@@ -355,7 +355,7 @@
"source": [
"## Step 4. Run your first TFX pipeline\n",
"\n",
"Components in the TFX pipeline will generate outputs for each run as [ML Metadata Artifacts](https://www.tensorflow.org/tfx/guide/mlmd), and they need to be stored somewhere. You can use any storage which the KFP cluster can access, and for this example we will use Google Cloud Storage (GCS). A default GCS bucket should have been created automatically. Its name will be `\u003cyour-project-id\u003e-kubeflowpipelines-default`.\n"
"Components in the TFX pipeline will generate outputs for each run as [ML Metadata Artifacts](https://www.tensorflow.org/tfx/guide/mlmd), and they need to be stored somewhere. You can use any storage which the KFP cluster can access, and for this example we will use Google Cloud Storage (GCS). A default GCS bucket should have been created automatically. Its name will be `<your-project-id>-kubeflowpipelines-default`.\n"
]
},
{
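A short sketch of composing that default bucket name, assuming `GOOGLE_CLOUD_PROJECT` was set in the earlier setup cell:

```python
# Default bucket created with the KFP cluster:
# <your-project-id>-kubeflowpipelines-default
GCS_BUCKET_NAME = GOOGLE_CLOUD_PROJECT + "-kubeflowpipelines-default"
print("gs://" + GCS_BUCKET_NAME)
```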
@@ -386,7 +386,7 @@
"source": [
"Let's create a TFX pipeline using the `tfx pipeline create` command.\n",
"\n",
"\u003eNote: When creating a pipeline for KFP, we need a container image which will be used to run our pipeline. And `skaffold` will build the image for us. Because skaffold pulls base images from the docker hub, it will take 5~10 minutes when we build the image for the first time, but it will take much less time from the second build."
">Note: When creating a pipeline for KFP, we need a container image which will be used to run our pipeline. And `skaffold` will build the image for us. Because skaffold pulls base images from the docker hub, it will take 5~10 minutes when we build the image for the first time, but it will take much less time from the second build."
]
},
{
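As a sketch, the create command looks roughly like the cell below; `kubeflow_runner.py` follows the template's file layout, and exact flag spellings may vary by TFX version (check `tfx pipeline create --help`):

```python
# Builds the container image with skaffold and registers the pipeline
# with the KFP cluster at ENDPOINT.
!tfx pipeline create --engine=kubeflow --pipeline-path=kubeflow_runner.py --endpoint={ENDPOINT} --build-image
```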
@@ -443,7 +443,7 @@
"However, we recommend visiting the KFP Dashboard. You can access the KFP Dashboard from the Cloud AI Platform Pipelines menu in Google Cloud Console. Once you visit the dashboard, you will be able to find the pipeline, and access a wealth of information about the pipeline.\n",
"For example, you can find your runs under the *Experiments* menu, and when you open your execution run under Experiments you can find all your artifacts from the pipeline under *Artifacts* menu.\n",
"\n",
"\u003eNote: If your pipeline run fails, you can see detailed logs for each TFX component in the Experiments tab in the KFP Dashboard.\n",
">Note: If your pipeline run fails, you can see detailed logs for each TFX component in the Experiments tab in the KFP Dashboard.\n",
" \n",
"One of the major sources of failure is permission related problems. Please make sure your KFP cluster has permissions to access Google Cloud APIs. This can be configured [when you create a KFP cluster in GCP](https://cloud.google.com/ai-platform/pipelines/docs/setting-up), or see [Troubleshooting document in GCP](https://cloud.google.com/ai-platform/pipelines/docs/troubleshooting)."
]
@@ -458,7 +458,7 @@
"\n",
"In this step, you will add components for data validation including `StatisticsGen`, `SchemaGen`, and `ExampleValidator`. If you are interested in data validation, please see [Get started with Tensorflow Data Validation](https://www.tensorflow.org/tfx/data_validation/get_started).\n",
"\n",
"\u003e**Double-click to change directory to `pipeline` and double-click again to open `pipeline.py`**. Find and uncomment the 3 lines which add `StatisticsGen`, `SchemaGen`, and `ExampleValidator` to the pipeline. (Tip: search for comments containing `TODO(step 5):`). Make sure to save `pipeline.py` after you edit it.\n",
">**Double-click to change directory to `pipeline` and double-click again to open `pipeline.py`**. Find and uncomment the 3 lines which add `StatisticsGen`, `SchemaGen`, and `ExampleValidator` to the pipeline. (Tip: search for comments containing `TODO(step 5):`). Make sure to save `pipeline.py` after you edit it.\n",
"\n",
"You now need to update the existing pipeline with modified pipeline definition. Use the `tfx pipeline update` command to update your pipeline, followed by the `tfx run create` command to create a new execution run of your updated pipeline.\n"
]
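The update-and-run cycle described above, sketched as notebook cells; the pipeline name `my_pipeline` is the template's default, and flag spellings may vary by TFX version:

```python
# Re-register the modified pipeline definition, then start a new run.
!tfx pipeline update --engine=kubeflow --pipeline-path=kubeflow_runner.py --endpoint={ENDPOINT}
!tfx run create --engine=kubeflow --pipeline-name=my_pipeline --endpoint={ENDPOINT}
```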
@@ -500,7 +500,7 @@
"\n",
"In this step, you will add components for training and model validation including `Transform`, `Trainer`, `Resolver`, `Evaluator`, and `Pusher`.\n",
"\n",
"\u003e**Double-click to open `pipeline.py`**. Find and uncomment the 5 lines which add `Transform`, `Trainer`, `Resolver`, `Evaluator` and `Pusher` to the pipeline. (Tip: search for `TODO(step 6):`)\n",
">**Double-click to open `pipeline.py`**. Find and uncomment the 5 lines which add `Transform`, `Trainer`, `Resolver`, `Evaluator` and `Pusher` to the pipeline. (Tip: search for `TODO(step 6):`)\n",
"\n",
"As you did before, you now need to update the existing pipeline with the modified pipeline definition. The instructions are the same as Step 5. Update the pipeline using `tfx pipeline update`, and create an execution run using `tfx run create`.\n"
]
@@ -545,17 +545,17 @@
"\n",
"[BigQuery](https://cloud.google.com/bigquery) is a serverless, highly scalable, and cost-effective cloud data warehouse. BigQuery can be used as a source for training examples in TFX. In this step, we will add `BigQueryExampleGen` to the pipeline.\n",
"\n",
"\u003e**Double-click to open `pipeline.py`**. Comment out `CsvExampleGen` and uncomment the line which creates an instance of `BigQueryExampleGen`. You also need to uncomment the `query` argument of the `create_pipeline` function.\n",
">**Double-click to open `pipeline.py`**. Comment out `CsvExampleGen` and uncomment the line which creates an instance of `BigQueryExampleGen`. You also need to uncomment the `query` argument of the `create_pipeline` function.\n",
"\n",
"We need to specify which GCP project to use for BigQuery, and this is done by setting `--project` in `beam_pipeline_args` when creating a pipeline.\n",
"\n",
"\u003e**Double-click to open `configs.py`**. Uncomment the definition of `GOOGLE_CLOUD_REGION`, `BIG_QUERY_WITH_DIRECT_RUNNER_BEAM_PIPELINE_ARGS` and `BIG_QUERY_QUERY`. You should replace the region value in this file with the correct values for your GCP project.\n",
">**Double-click to open `configs.py`**. Uncomment the definition of `GOOGLE_CLOUD_REGION`, `BIG_QUERY_WITH_DIRECT_RUNNER_BEAM_PIPELINE_ARGS` and `BIG_QUERY_QUERY`. You should replace the region value in this file with the correct values for your GCP project.\n",
"\n",
"\u003e**Note: You MUST set your GCP region in the `configs.py` file before proceeding.**\n",
">**Note: You MUST set your GCP region in the `configs.py` file before proceeding.**\n",
"\n",
"\u003e**Change directory one level up.** Click the name of the directory above the file list. The name of the directory is the name of the pipeline which is `my_pipeline` if you didn't change.\n",
">**Change directory one level up.** Click the name of the directory above the file list. The name of the directory is the name of the pipeline which is `my_pipeline` if you didn't change.\n",
"\n",
"\u003e**Double-click to open `kubeflow_runner.py`**. Uncomment two arguments, `query` and `beam_pipeline_args`, for the `create_pipeline` function.\n",
">**Double-click to open `kubeflow_runner.py`**. Uncomment two arguments, `query` and `beam_pipeline_args`, for the `create_pipeline` function.\n",
"\n",
"Now the pipeline is ready to use BigQuery as an example source. Update the pipeline as before and create a new execution run as we did in step 5 and 6."
]
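A sketch of what the uncommented `configs.py` values might look like; the variable names come from the text above, while the region and the query body are placeholders for your own values:

```python
GOOGLE_CLOUD_REGION = 'us-central1'  # placeholder: use your project's region

BIG_QUERY_WITH_DIRECT_RUNNER_BEAM_PIPELINE_ARGS = [
    '--project=' + GOOGLE_CLOUD_PROJECT,
    '--temp_location=' + 'gs://' + GCS_BUCKET_NAME + '/tmp',
]

BIG_QUERY_QUERY = """
    SELECT ...  -- placeholder: your training-example query
"""
```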
@@ -584,11 +584,11 @@
"\n",
"Several [TFX Components uses Apache Beam](https://www.tensorflow.org/tfx/guide/beam) to implement data-parallel pipelines, and it means that you can distribute data processing workloads using [Google Cloud Dataflow](https://cloud.google.com/dataflow/). In this step, we will set the Kubeflow orchestrator to use dataflow as the data processing back-end for Apache Beam.\n",
"\n",
"\u003e**Double-click `pipeline` to change directory, and double-click to open `configs.py`**. Uncomment the definition of `GOOGLE_CLOUD_REGION`, and `DATAFLOW_BEAM_PIPELINE_ARGS`.\n",
">**Double-click `pipeline` to change directory, and double-click to open `configs.py`**. Uncomment the definition of `GOOGLE_CLOUD_REGION`, and `DATAFLOW_BEAM_PIPELINE_ARGS`.\n",
"\n",
"\u003e**Change directory one level up.** Click the name of the directory above the file list. The name of the directory is the name of the pipeline which is `my_pipeline` if you didn't change.\n",
">**Change directory one level up.** Click the name of the directory above the file list. The name of the directory is the name of the pipeline which is `my_pipeline` if you didn't change.\n",
"\n",
"\u003e**Double-click to open `kubeflow_runner.py`**. Uncomment `beam_pipeline_args`. (Also make sure to comment out current `beam_pipeline_args` that you added in Step 7.)\n",
">**Double-click to open `kubeflow_runner.py`**. Uncomment `beam_pipeline_args`. (Also make sure to comment out current `beam_pipeline_args` that you added in Step 7.)\n",
"\n",
"Now the pipeline is ready to use Dataflow. Update the pipeline and create an execution run as we did in step 5 and 6."
]
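Likewise, a sketch of the Dataflow settings in `configs.py`; the variable name comes from the text above, and the list values are illustrative:

```python
DATAFLOW_BEAM_PIPELINE_ARGS = [
    '--project=' + GOOGLE_CLOUD_PROJECT,
    '--runner=DataflowRunner',       # run Beam stages on Cloud Dataflow
    '--temp_location=' + 'gs://' + GCS_BUCKET_NAME + '/tmp',
    '--region=' + GOOGLE_CLOUD_REGION,
]
```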
@@ -626,11 +626,11 @@
"\n",
"TFX interoperates with several managed GCP services, such as [Cloud AI Platform for Training and Prediction](https://cloud.google.com/ai-platform/). You can set your `Trainer` component to use Cloud AI Platform Training, a managed service for training ML models. Moreover, when your model is built and ready to be served, you can *push* your model to Cloud AI Platform Prediction for serving. In this step, we will set our `Trainer` and `Pusher` component to use Cloud AI Platform services.\n",
"\n",
"\u003eBefore editing files, you might first have to enable *AI Platform Training \u0026 Prediction API*.\n",
">Before editing files, you might first have to enable *AI Platform Training & Prediction API*.\n",
"\n",
"\u003e**Double-click `pipeline` to change directory, and double-click to open `configs.py`**. Uncomment the definition of `GOOGLE_CLOUD_REGION`, `GCP_AI_PLATFORM_TRAINING_ARGS` and `GCP_AI_PLATFORM_SERVING_ARGS`. We will use our custom built container image to train a model in Cloud AI Platform Training, so we should set `masterConfig.imageUri` in `GCP_AI_PLATFORM_TRAINING_ARGS` to the same value as `CUSTOM_TFX_IMAGE` above.\n",
">**Double-click `pipeline` to change directory, and double-click to open `configs.py`**. Uncomment the definition of `GOOGLE_CLOUD_REGION`, `GCP_AI_PLATFORM_TRAINING_ARGS` and `GCP_AI_PLATFORM_SERVING_ARGS`. We will use our custom built container image to train a model in Cloud AI Platform Training, so we should set `masterConfig.imageUri` in `GCP_AI_PLATFORM_TRAINING_ARGS` to the same value as `CUSTOM_TFX_IMAGE` above.\n",
"\n",
"\u003e**Change directory one level up, and double-click to open `kubeflow_runner.py`**. Uncomment `ai_platform_training_args` and `ai_platform_serving_args`.\n",
">**Change directory one level up, and double-click to open `kubeflow_runner.py`**. Uncomment `ai_platform_training_args` and `ai_platform_serving_args`.\n",
"\n",
"Update the pipeline and create an execution run as we did in step 5 and 6."
]
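A sketch of the AI Platform settings in `configs.py`; `masterConfig.imageUri`, `CUSTOM_TFX_IMAGE`, and the variable names come from the text above, while the remaining keys are illustrative and may differ by TFX version:

```python
GCP_AI_PLATFORM_TRAINING_ARGS = {
    'project': GOOGLE_CLOUD_PROJECT,
    'region': GOOGLE_CLOUD_REGION,
    # Train with the custom-built pipeline image (CUSTOM_TFX_IMAGE above).
    'masterConfig': {'imageUri': CUSTOM_TFX_IMAGE},
}

GCP_AI_PLATFORM_SERVING_ARGS = {
    'model_name': 'my_pipeline',  # illustrative; typically the pipeline name
    'project_id': GOOGLE_CLOUD_PROJECT,
    'regions': [GOOGLE_CLOUD_REGION],
}
```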
[Diff for the second changed file (20 additions, 20 deletions) did not render.]
