Restructured Kubeflow Pipelines docs
Signed-off-by: hbelmiro <[email protected]>
hbelmiro committed May 20, 2024
1 parent 57e57eb commit 9e6b943
Showing 93 changed files with 287 additions and 507 deletions.
4 changes: 2 additions & 2 deletions content/en/_index.html
Original file line number Diff line number Diff line change
@@ -41,7 +41,7 @@ <h3 class="section-head text-center">Kubeflow Components</h3>
<div class="container">
<div class="card-deck">
<div class="card border-primary-dark">
<a href="/docs/components/pipelines/v2/introduction/" target="_blank" rel="noopener" >
<a href="/docs/components/pipelines/overview/introduction//" target="_blank" rel="noopener" >
<img
src="/docs/images/logos/kubeflow.png"
class="card-img-top"
@@ -53,7 +53,7 @@ <h3 class="section-head text-center">Kubeflow Components</h3>
<div class="card-body bg-primary-dark">
<h5 class="card-title text-white section-head">Pipelines</h5>
<p class="card-text text-white">
<a target="_blank" rel="noopener" href="/docs/components/pipelines/v2/introduction/">Kubeflow Pipelines</a> (KFP) is a platform for building then deploying portable and scalable machine learning workflows using Kubernetes.
<a target="_blank" rel="noopener" href="/docs/components/pipelines/overview/introduction//">Kubeflow Pipelines</a> (KFP) is a platform for building then deploying portable and scalable machine learning workflows using Kubernetes.
</p>
</div>
</div>
29 changes: 28 additions & 1 deletion content/en/_redirects
@@ -21,6 +21,33 @@
# Removed redundant UI guide. Quickstart is a better destination.
/docs/pipelines/pipelines-ui/ /docs/components/pipelines/pipelines-quickstart/

# Restructured the pipeline docs (https://github.com/kubeflow/website/issues/3716)
docs/components/pipelines/v2/introduction/ docs/components/pipelines/overview/introduction/
docs/components/pipelines/v2/installation/quickstart/ docs/components/pipelines/overview/installation/
docs/components/pipelines/v2/hello-world/ docs/components/pipelines/overview/getting-started/
docs/components/pipelines/v2/run-a-pipeline/ docs/components/pipelines/how-to/run-a-pipeline/
docs/components/pipelines/v2/installation/quickstart/ docs/components/pipelines/overview/introduction/
docs/components/pipelines/v2/compile-a-pipeline/ docs/components/pipelines/how-to/compile-a-pipeline/
docs/components/pipelines/v2/local-execution/ docs/components/pipelines/how-to/execute-kfp-pipelines-locally/
docs/components/pipelines/v2/cli/ docs/components/pipelines/how-to/cli/
docs/components/pipelines/v2/components/lightweight-python-components/ docs/components/pipelines/how-to/create-components/lightweight-python-components/
docs/components/pipelines/v2/pipelines/pipeline-basics/ docs/components/pipelines/how-to/create-components/compose-components-into-pipelines/
docs/components/pipelines/v2/components/containerized-python-components/ docs/components/pipelines/how-to/create-components/containerized-python-components/
docs/components/pipelines/v2/components/container-components/ docs/components/pipelines/how-to/create-components/container-components/
docs/components/pipelines/v2/components/importer-component/ docs/components/pipelines/how-to/create-components/importer-component/
docs/components/pipelines/v2/components/additional-functionality/ docs/components/pipelines/how-to/create-components/additional-functionality/
docs/components/pipelines/v2/load-and-share-components/ docs/components/pipelines/how-to/load-and-share-components/
docs/components/pipelines/v2/pipelines/control-flow/ docs/components/pipelines/how-to/control-flow/
docs/components/pipelines/v2/data-types/parameters/ docs/components/pipelines/how-to/parameters/
docs/components/pipelines/v2/data-types/artifacts/ docs/components/pipelines/how-to/artifacts/
docs/components/pipelines/v2/caching/ docs/components/pipelines/how-to/caching/
docs/components/pipelines/v2/administration/server-config/ docs/components/pipelines/how-to/server-config/
docs/components/pipelines/v2/migration/ docs/components/pipelines/how-to/migration/
docs/components/pipelines/v2/platform-specific-features/ docs/components/pipelines/how-to/platform-specific-features/
docs/components/pipelines/v2/community-and-support/ docs/components/pipelines/reference/community-and-support/
docs/components/pipelines/v2/version-compatibility/ docs/components/pipelines/reference/version-compatibility/
docs/components/pipelines/v2/reference/api/kubeflow-pipeline-api-spec/ docs/components/pipelines/reference/api/kubeflow-pipeline-api-spec/

# Restructured the pipelines docs.
/docs/pipelines/ /docs/components/pipelines
/docs/pipelines/output-viewer/ /docs/components/pipelines/sdk/output-viewer/
@@ -184,7 +211,7 @@ docs/started/requirements/ /docs/started/getting-started/
/docs/components/pipelines/sdk/v2/* /docs/components/pipelines/sdk-v2/:splat

# pipelines v1 -> v2 redirects
/docs/components/pipelines/* /docs/components/pipelines/v1/:splat
/docs/components/pipelines/* /docs/components/pipelines/legacy-v1/:splat

# Restructure About section
/docs/about/kubeflow /docs/started/introduction
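
Reviewer note: two details in the new `_redirects` rules above may be worth double-checking. The source `docs/components/pipelines/v2/installation/quickstart/` is listed twice with different destinations, and the new rules omit the leading `/` that the file's other rules use. The following is a minimal sketch of a lint for both, assuming each rule is a whitespace-separated `source destination` pair; `lint_redirects` is a hypothetical helper written for illustration, not part of the repository, and it makes no claim about how the hosting service resolves such rules.

```python
from typing import Dict, List


def lint_redirects(text: str) -> List[str]:
    """Return human-readable warnings for a `_redirects`-style file.

    Assumes one whitespace-separated `source destination` rule per line;
    blank lines and lines starting with `#` are skipped.
    """
    warnings: List[str] = []
    seen: Dict[str, int] = {}  # source path -> line number of its first rule
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith('#'):
            continue
        parts = stripped.split()
        if len(parts) < 2:
            warnings.append(f'line {lineno}: expected "source destination"')
            continue
        source = parts[0]
        if not source.startswith('/'):
            warnings.append(f'line {lineno}: source lacks a leading "/": {source}')
        if source in seen:
            warnings.append(
                f'line {lineno}: duplicate source '
                f'(first seen on line {seen[source]}): {source}')
        else:
            seen[source] = lineno
    return warnings


# Three rules copied from the hunk above; the first and third share a source.
sample = """\
# Restructured the pipeline docs
docs/components/pipelines/v2/installation/quickstart/ docs/components/pipelines/overview/installation/
docs/components/pipelines/v2/hello-world/ docs/components/pipelines/overview/getting-started/
docs/components/pipelines/v2/installation/quickstart/ docs/components/pipelines/overview/introduction/
"""

for warning in lint_redirects(sample):
    print(warning)
```

On the sample, the sketch reports one missing-slash warning per rule plus one duplicate-source warning for the second `quickstart/` rule.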
5 changes: 5 additions & 0 deletions content/en/docs/components/pipelines/how-to/_index.md
@@ -0,0 +1,5 @@
+++
title = "How-to/User guides"
description = "Documentation for Kubeflow Pipelines."
weight = 2
+++
@@ -1,7 +1,6 @@
+++
title = "Artifacts"
description = "Create, use, pass, and track ML artifacts"
weight = 2
title = "Create, use, pass, and track ML artifacts"
weight = 10
+++

{{% kfp-v2-keywords %}}
@@ -241,14 +240,14 @@ On the [KFP open source][oss-be] UI, `ClassificationMetrics`, `SlicedClassificat
[dsl-slicedclassificationmetrics]: https://kubeflow-pipelines.readthedocs.io/en/latest/source/dsl.html#kfp.dsl.SlicedClassificationMetrics
[dsl-html]: https://kubeflow-pipelines.readthedocs.io/en/latest/source/dsl.html#kfp.dsl.HTML
[dsl-markdown]: https://kubeflow-pipelines.readthedocs.io/en/latest/source/dsl.html#kfp.dsl.Markdown
[type-checking]: /docs/components/pipelines/v2/compile-a-pipeline#type-checking
[type-checking]: /docs/components/pipelines/how-to/compile-a-pipeline#type-checking
[oss-be]: /docs/components/pipelines/v2/installation/
[pipelines]: /docs/components/pipelines/v2/pipelines/pipelines-basics/
[container-components]: /docs/components/pipelines/v2/components/lightweight-python-components/
[python-components]: /docs/components/pipelines/v2/components/container-components
[container-components]: /docs/components/pipelines/how-to/create-components/lightweight-python-components//
[python-components]: /docs/components/pipelines/how-to/create-components/container-components
[dsl-parallelfor]: https://kubeflow-pipelines.readthedocs.io/en/latest/source/dsl.html#kfp.dsl.ParallelFor
[dsl-collected]: https://kubeflow-pipelines.readthedocs.io/en/latest/source/dsl.html#kfp.dsl.Collected
[parallel-looping]: https://www.kubeflow.org/docs/components/pipelines/v2/pipelines/control-flow/#parallel-looping-dslparallelfor
[traditional-artifact-syntax]: /docs/components/pipelines/v2/data-types/artifacts/#traditional-artifact-syntax
[multiple-outputs]: /docs/components/pipelines/v2/data-types/parameters/#multiple-output-parameters
[pythonic-artifact-syntax]: /docs/components/pipelines/v2/data-types/artifacts/#new-pythonic-artifact-syntax
[parallel-looping]: https://www.kubeflow.org/docs/components/pipelines/how-to/control-flow/#parallel-looping-dslparallelfor
[traditional-artifact-syntax]: /docs/components/pipelines/how-to/artifacts/#traditional-artifact-syntax
[multiple-outputs]: /docs/components/pipelines/how-to/parameters/#multiple-output-parameters
[pythonic-artifact-syntax]: /docs/components/pipelines/how-to/artifacts/#new-pythonic-artifact-syntax
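
Reviewer note: in the hunk above, the `[oss-be]` and `[pipelines]` reference definitions appear as unchanged context and still point at `/v2/` paths. Below is a minimal sketch for finding such leftovers, assuming Markdown reference-style link definitions of the form `[label]: target`. The helper `find_stale_links` and its default prefix are hypothetical, chosen only for illustration.

```python
import re
from typing import List, Tuple

# Matches Markdown reference definitions such as `[label]: /some/path`.
REF_DEF = re.compile(r'^\[(?P<label>[^\]]+)\]:\s*(?P<target>\S+)', re.MULTILINE)


def find_stale_links(
    markdown: str,
    old_prefix: str = '/docs/components/pipelines/v2/',
) -> List[Tuple[str, str]]:
    """Return (label, target) pairs whose target still contains old_prefix."""
    return [
        (match.group('label'), match.group('target'))
        for match in REF_DEF.finditer(markdown)
        if old_prefix in match.group('target')
    ]


# Reference definitions copied from the hunk above.
sample = """\
[type-checking]: /docs/components/pipelines/how-to/compile-a-pipeline#type-checking
[oss-be]: /docs/components/pipelines/v2/installation/
[pipelines]: /docs/components/pipelines/v2/pipelines/pipelines-basics/
"""

for label, target in find_stale_links(sample):
    print(f'{label} -> {target}')
```

The substring check also catches absolute URLs that embed the old prefix, at the cost of possible false positives in unrelated paths.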
147 changes: 147 additions & 0 deletions content/en/docs/components/pipelines/how-to/build-advanced-pipeline.md
@@ -0,0 +1,147 @@
+++
title = "Build a More Advanced ML Pipeline"
weight = 11
+++

{{% kfp-v2-keywords %}}

This step demonstrates how to build a more advanced machine learning (ML) pipeline that leverages additional KFP pipeline composition features.

The following ML pipeline creates a dataset, normalizes the features of the dataset as a preprocessing step, and trains a simple ML model on the data using different hyperparameters:

```python
from typing import List

from kfp import client
from kfp import dsl
from kfp.dsl import Dataset
from kfp.dsl import Input
from kfp.dsl import Model
from kfp.dsl import Output


@dsl.component(packages_to_install=['pandas==1.3.5'])
def create_dataset(iris_dataset: Output[Dataset]):
    import pandas as pd

    csv_url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
    col_names = [
        'Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Petal_Width', 'Labels'
    ]
    df = pd.read_csv(csv_url, names=col_names)

    with open(iris_dataset.path, 'w') as f:
        df.to_csv(f)


@dsl.component(packages_to_install=['pandas==1.3.5', 'scikit-learn==1.0.2'])
def normalize_dataset(
    input_iris_dataset: Input[Dataset],
    normalized_iris_dataset: Output[Dataset],
    standard_scaler: bool,
    min_max_scaler: bool,
):
    if standard_scaler is min_max_scaler:
        raise ValueError(
            'Exactly one of standard_scaler or min_max_scaler must be True.')

    import pandas as pd
    from sklearn.preprocessing import MinMaxScaler
    from sklearn.preprocessing import StandardScaler

    with open(input_iris_dataset.path) as f:
        df = pd.read_csv(f)
    labels = df.pop('Labels')

    if standard_scaler:
        scaler = StandardScaler()
    if min_max_scaler:
        scaler = MinMaxScaler()

    df = pd.DataFrame(scaler.fit_transform(df))
    df['Labels'] = labels
    with open(normalized_iris_dataset.path, 'w') as f:
        df.to_csv(f)


@dsl.component(packages_to_install=['pandas==1.3.5', 'scikit-learn==1.0.2'])
def train_model(
    normalized_iris_dataset: Input[Dataset],
    model: Output[Model],
    n_neighbors: int,
):
    import pickle

    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    with open(normalized_iris_dataset.path) as f:
        df = pd.read_csv(f)

    y = df.pop('Labels')
    X = df

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    clf = KNeighborsClassifier(n_neighbors=n_neighbors)
    clf.fit(X_train, y_train)
    with open(model.path, 'wb') as f:
        pickle.dump(clf, f)


@dsl.pipeline(name='iris-training-pipeline')
def my_pipeline(
    standard_scaler: bool,
    min_max_scaler: bool,
    neighbors: List[int],
):
    create_dataset_task = create_dataset()

    normalize_dataset_task = normalize_dataset(
        input_iris_dataset=create_dataset_task.outputs['iris_dataset'],
        standard_scaler=True,
        min_max_scaler=False)

    with dsl.ParallelFor(neighbors) as n_neighbors:
        train_model(
            normalized_iris_dataset=normalize_dataset_task
            .outputs['normalized_iris_dataset'],
            n_neighbors=n_neighbors)


endpoint = '<KFP_UI_URL>'
kfp_client = client.Client(host=endpoint)
run = kfp_client.create_run_from_pipeline_func(
    my_pipeline,
    arguments={
        'min_max_scaler': True,
        'standard_scaler': False,
        'neighbors': [3, 6, 9]
    },
)
url = f'{endpoint}/#/runs/details/{run.run_id}'
print(url)
```

This example introduces the following new features in the pipeline:

* Some Python **packages to install** are added at component runtime, using the `packages_to_install` argument on the `@dsl.component` decorator, as follows:

`@dsl.component(packages_to_install=['pandas==1.3.5'])`

To use a library after installing it, you must include its import statements within the scope of the component function, so that the library is imported at component runtime.

* **Input and output artifacts** of types `Dataset` and `Model` are introduced in the component signature to describe the input and output artifacts of the components. This is done using the type annotation generics `Input[]` and `Output[]` for input and output artifacts respectively.

Within the scope of a component, artifacts can be read (for inputs) and written (for outputs) via the `.path` attribute. The KFP backend ensures that *input* artifact files are copied *to* the executing pod's local file system from the remote storage at runtime, so that the component function can read input artifacts from the local file system. By comparison, *output* artifact files are copied *from* the local file system of the pod to remote storage when the component finishes running. This way, the output artifacts persist outside the pod. In both cases, the component author only needs to interact with the local file system to create persistent artifacts.

The arguments for the parameters annotated with `Output[]` are not passed to components by the pipeline author. The KFP backend passes this artifact during component runtime, so that component authors don't need to be concerned about the path to which the output artifacts are written. After an output artifact is written, the backend executing the component recognizes the KFP artifact types (`Dataset` or `Model`), and organizes them on the Dashboard.

An output artifact can be passed as an input to a downstream component using the `.outputs` attribute of the source task and the output artifact parameter name, as follows:

`create_dataset_task.outputs['iris_dataset']`

* One of the **DSL control flow features**, `dsl.ParallelFor`, is used. It is a context manager that lets pipeline authors create tasks that execute in parallel, one for each item in the loop. Using `dsl.ParallelFor` to iterate over the `neighbors` pipeline argument lets you execute the `train_model` component with different arguments and test multiple hyperparameters in one pipeline run. Other control flow features include `dsl.Condition` and `dsl.ExitHandler`.
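
The copy-in and copy-out behavior described above for input and output artifacts can be modeled with a small toy sketch. It uses only the Python standard library and is not how the KFP backend is implemented: `run_with_staging`, `uppercase`, and the temporary directories standing in for the pod file system and remote storage are all hypothetical names chosen for illustration.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def run_with_staging(
    component: Callable[[Path, Path], None],
    remote_store: Path,
    input_name: str,
    output_name: str,
) -> None:
    """Toy stand-in for a backend: stage the input in, run, stage the output out."""
    with tempfile.TemporaryDirectory() as pod_dir:
        local_input = Path(pod_dir) / 'input'
        local_output = Path(pod_dir) / 'output'
        # Copy the input artifact *to* the (simulated) pod file system.
        shutil.copy(remote_store / input_name, local_input)
        # The component only ever reads and writes local paths.
        component(local_input, local_output)
        # Copy the output artifact *from* the pod to (simulated) remote storage.
        shutil.copy(local_output, remote_store / output_name)


def uppercase(input_path: Path, output_path: Path) -> None:
    """Hypothetical component body: read the input file, write the output file."""
    output_path.write_text(input_path.read_text().upper())


with tempfile.TemporaryDirectory() as store_dir:
    store = Path(store_dir)
    (store / 'raw.txt').write_text('iris')
    run_with_staging(uppercase, store, 'raw.txt', 'clean.txt')
    # The output outlives the simulated pod directory.
    result = (store / 'clean.txt').read_text()
    print(result)
```

The component function never sees the remote location, which mirrors why component authors work with `.path` rather than storage URIs.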

Congratulations! You now have a KFP deployment, an end-to-end ML pipeline, and an introduction to the UI. That's just the beginning of KFP pipeline and Dashboard features.
@@ -1,7 +1,7 @@
+++
title = "Caching"
title = "Use Caching"
description = "How to use caching in Kubeflow Pipelines."
weight = 13
weight = 12
+++

Kubeflow Pipelines support caching to eliminate redundant executions and improve
@@ -1,7 +1,6 @@
+++
title = "Command Line Interface"
description = "Interact with KFP via the CLI"
weight = 10
title = "Interact with KFP via the CLI"
weight = 4
+++

{{% kfp-v2-keywords %}}
@@ -158,5 +157,5 @@ For more information about the arguments and flags supported by the `kfp compone

[cli-reference-docs]: https://kubeflow-pipelines.readthedocs.io/en/stable/source/cli.html
[kfp-sdk-api-ref]: https://kubeflow-pipelines.readthedocs.io/en/stable/index.html
[lightweight-python-component]: /docs/components/pipelines/v2/components/lightweight-python-components
[containerized-python-components]: /docs/components/pipelines/v2/components/containerized-python-components
[lightweight-python-component]: /docs/components/pipelines/how-to/create-components/lightweight-python-components/
[containerized-python-components]: /docs/components/pipelines/how-to/create-components/containerized-python-components
@@ -1,7 +1,7 @@
+++
title = "Compile a Pipeline"
description = "Compile pipelines and components to YAML"
weight = 7
weight = 2
+++

{{% kfp-v2-keywords %}}
@@ -1,7 +1,6 @@
+++
title = "Control Flow"
description = "Create pipelines with control flow"
weight = 2
title = "Create pipelines with control flow"
weight = 8
+++

{{% kfp-v2-keywords %}}
@@ -209,8 +208,8 @@ def my_pipeline(text: str = 'message'):

Note that the component used for the caller task (`print_op` in the example above) requires a default value for all inputs it consumes from an upstream task. The default value is applied if the upstream task fails to produce the outputs that are passed to the caller task. Specifying default values ensures that the caller task always succeeds, regardless of the status of the upstream task.

[data-passing]: /docs/components/pipelines/v2/pipelines/pipeline-basics#data-passing-and-task-dependencies
[pipeline-basics]: /docs/components/pipelines/v2/pipelines/pipeline-basics
[data-passing]: /docs/components/pipelines/how-to/create-components/compose-components-into-pipelines#data-passing-and-task-dependencies
[pipeline-basics]: /docs/components/pipelines/how-to/create-components/compose-components-into-pipelines
[dsl-condition]: https://kubeflow-pipelines.readthedocs.io/en/latest/source/dsl.html#kfp.dsl.Condition
[dsl-exithandler]: https://kubeflow-pipelines.readthedocs.io/en/latest/source/dsl.html#kfp.dsl.ExitHandler
[dsl-parallelfor]: https://kubeflow-pipelines.readthedocs.io/en/latest/source/dsl.html#kfp.dsl.ParallelFor
@@ -1,7 +1,6 @@
+++
title = "Components"
description = "Author KFP components"
weight = 5
title = "Create components"
weight = 6
+++

{{% kfp-v2-keywords %}}
@@ -62,9 +62,9 @@ def dataset_concatenator(

Note that if you provide a `description` argument to the [`@dsl.pipeline`][dsl-pipeline] decorator, KFP will use this description instead of the docstring description.

[ir-yaml]: /docs/components/pipelines/v2/compile-a-pipeline#ir-yaml
[ir-yaml]: /docs/components/pipelines/how-to/compile-a-pipeline#ir-yaml
[google-docstring-style]: https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html
[dsl-pipeline]: https://kubeflow-pipelines.readthedocs.io/en/stable/source/dsl.html#kfp.dsl.pipeline
[output-artifacts]: /docs/components/pipelines/v2/data-types/artifacts#declaring-inputoutput-artifacts
[output-artifacts]: /docs/components/pipelines/how-to/artifacts#declaring-inputoutput-artifacts
[dsl-outputpath]: https://kubeflow-pipelines.readthedocs.io/en/latest/source/dsl.html#kfp.dsl.OutputPath
[output-type-marker]: https://kubeflow-pipelines.readthedocs.io/en/latest/source/dsl.html#kfp.dsl.Output