kflow

The ambition of kflow is to make it easier to build R-based components orchestrated by Google’s Kubeflow. Importantly, this package does not intend to be a full R replacement for the Python SDK (at least not yet!). However, I’ve had some good luck wrapping the Python SDK with reticulate, so if you need to go full R, that would be a good option.

Installation

You can install the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("ndiquattro/kflow")

Example Usage

To illustrate how to use {kflow} we’ll set up a simple component example where we predict the transmission type of a car in mtcars based on an input parameter. We will work with a single function that will eventually be translated to a single kubeflow component.

Note that our argument names need to follow a convention for the conversion to component to succeed. Each argument must end in a slug that identifies the argument type. The conversions for slug to kubeflow type are:

Inputs

_string = String
_int = Integer
_bool = Bool
_float = Float

Outputs

_out = outputPath
_metrics = Metrics
_uimeta = UI_metadata
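
The suffix lookup can be pictured as a small table. The sketch below is illustrative only: `slug_types` and `arg_type()` are hypothetical names, not part of kflow’s API, and the package itself may resolve types differently.

```r
# Hypothetical lookup mirroring the slug table above (not kflow's own code).
slug_types <- c(
  string  = "String",
  int     = "Integer",
  bool    = "Bool",
  float   = "Float",
  out     = "outputPath",
  metrics = "Metrics",
  uimeta  = "UI_metadata"
)

# Return the Kubeflow type implied by the final slug of an argument name.
arg_type <- function(arg_name) {
  slug <- sub(".*_", "", arg_name)  # keep the text after the last underscore
  if (!grepl("_", arg_name, fixed = TRUE) || !slug %in% names(slug_types)) {
    stop("Argument '", arg_name, "' does not end in a recognised slug.")
  }
  unname(slug_types[slug])
}

# Look up the types for a few of the argument names used in this README.
vapply(c("predictor_string", "file_out", "curve_uimeta"), arg_type, character(1))
```

Failing loudly on an unrecognised slug, as this sketch does, is one way to catch a misnamed argument before the component is built.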

With all that defined, let’s create the function:

library(kflow)

tm_predict <- function(predictor_string, file_out, performance_metrics, curve_uimeta) {
  
  # Train Model
  cars_dat <- mtcars
  cars_dat$am <- factor(cars_dat$am)
  
  form <- as.formula(paste0("am ~ ", predictor_string))
  model <- glm(form, binomial, cars_dat)
  
  # Make Predictions
  cars_dat$prob_auto <- predict(model, type = "response")
  
  # Save results
  kf_write_output(cars_dat, file_out)  # This ensures the path exists then writes to a kubeflow provided path
  
  # Score and save metrics
  kf_init_metrics() %>%  # Start an empty JSON
    kf_add_metric(name = "roc", value = yardstick::roc_auc(cars_dat, am, prob_auto)$.estimate, format = "RAW") %>% 
    kf_add_metric(name = "pr-auc", value = yardstick::pr_auc(cars_dat, am, prob_auto)$.estimate, format = "RAW") %>% 
    kf_write_output(performance_metrics)  # Metrics go to the _metrics path
  
  # Save ROC Curve
  roc_file <- tempfile()
  yardstick::roc_curve(cars_dat, am, prob_auto) %>%
    dplyr::mutate(specificity = 1 - specificity) %>%   # convert to FPR
    dplyr::filter(is.finite(.threshold)) %>%   # KF not going to like -Inf or Inf
    readr::write_csv(roc_file, col_names = FALSE)  # Save without headers
  
  # Build the UI metadata and write it to the _uimeta path
  kf_init_ui_meta() %>% 
    kf_add_roc(roc_file) %>% 
    kf_write_output(curve_uimeta)
}

component <-
  kf_make_component(
    "tm_predict",
    "Transmission Predictor",
    "Predicts if a car has an automatic transmission based on a provided variable",
    "rocker/tidyverse:3.6.2"
  )

cat(component, sep = "\n")
#> name: Transmission Predictor
#> description: Predicts if a car has an automatic transmission based on a provided variable
#> inputs:
#> - name: predictor_string
#>   type: String
#> outputs:
#> - name: file_out
#>   type: ~
#> - name: mlpipeline_metrics
#>   type: Metrics
#> - name: mlpipeline_ui_metadata
#>   type: UI_metadata
#> implementation:
#>   container:
#>     image: rocker/tidyverse:3.6.2
#>     args:
#>     - inputValue: predictor_string
#>     - outputPath: file_out
#>     - outputPath: mlpipeline_metrics
#>     - outputPath: mlpipeline_ui_metadata
#>     command:
#>     - Rscript
#>     - -e
#>     - args<-commandArgs(trailingOnly=TRUE)
#>     - -e
#>     - tm_predict(args[1],args[2],args[3],args[4])
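
The last command entry in the YAML calls the function with one positional command-line argument per function argument. A rough sketch of how such a call string could be derived from a function’s signature follows. `build_call()` and `demo_fn()` are hypothetical names used for illustration, and kflow’s internals may differ.

```r
# Hypothetical helper: build "fn(args[1],args[2],...)" from a function's formals.
build_call <- function(fn_name, fn) {
  n_args <- length(formals(fn))
  arg_refs <- sprintf("args[%d]", seq_len(n_args))
  paste0(fn_name, "(", paste(arg_refs, collapse = ","), ")")
}

# Stand-in with the same signature as tm_predict() above.
demo_fn <- function(predictor_string, file_out, performance_metrics, curve_uimeta) NULL

build_call("tm_predict", demo_fn)
#> [1] "tm_predict(args[1],args[2],args[3],args[4])"
```

Because commandArgs() returns character strings, numeric or logical inputs may need converting inside the function unless the package already does this for you.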

Next let’s take a look at how the metrics and UI metadata functions work. Essentially they are just helpers for creating JSON in the structure kubeflow expects. The result can be written by kf_write_output() just like any other information we want to save.

You can also inspect the JSON as you go. First create the base:

base_metrics <- kf_init_metrics()
base_metrics
#> {
#>   "metrics": []
#> }

Then add a metric:

base_metrics %>% 
  kf_add_metric(
    name = "coolness-factor",
    value = 100,
    format = "RAW"
  )
#> {
#>   "metrics": [
#>     {
#>       "name": "coolness-factor",
#>       "numberValue": 100,
#>       "format": "RAW"
#>     }
#>   ]
#> }

You can chain as many metrics together as you’d like:

base_metrics %>% 
  kf_add_metric(
    name = "coolness-factor",
    value = 100,
    format = "RAW"
  ) %>% 
  kf_add_metric(
    name = "badness-factor",
    value = 0,
    format = "RAW"
  )
#> {
#>   "metrics": [
#>     {
#>       "name": "coolness-factor",
#>       "numberValue": 100,
#>       "format": "RAW"
#>     },
#>     {
#>       "name": "badness-factor",
#>       "numberValue": 0,
#>       "format": "RAW"
#>     }
#>   ]
#> }
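
To see what the helpers are producing, the same structure can be built by hand from a nested list. This is a minimal sketch using the jsonlite package, assuming the JSON shape printed above. `add_metric()` is a hypothetical stand-in, not kflow’s implementation of kf_add_metric().

```r
library(jsonlite)

# Assumed shape: a list with a "metrics" element holding one entry per metric.
metrics <- list(metrics = list())

# Hypothetical stand-in for kf_add_metric(): append one metric entry.
add_metric <- function(x, name, value, format = "RAW") {
  entry <- list(name = name, numberValue = value, format = format)
  x$metrics <- c(x$metrics, list(entry))
  x
}

metrics <- add_metric(metrics, "coolness-factor", 100)
metrics <- add_metric(metrics, "badness-factor", 0)

# auto_unbox = TRUE writes length-one vectors as scalars rather than arrays.
metrics_json <- toJSON(metrics, auto_unbox = TRUE, pretty = TRUE)
metrics_json
```

Keeping the metrics as a plain list until the final toJSON() call makes it easy to inspect or modify entries before anything is written to disk.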

When written to a _metrics or _uimeta path they will show up in the kubeflow UI!
