The ambition of kflow is to make it easier to build R-based components orchestrated by Google’s Kubeflow. Importantly, this package does not intend to be a full R replacement for the Python SDK (at least not yet!). However, I’ve had good luck wrapping the Python SDK with reticulate, so if you need to go full R, that is a good option.
You can install the development version from GitHub with:
# install.packages("devtools")
devtools::install_github("ndiquattro/kflow")
To illustrate how to use {kflow}, we’ll set up a simple component example that predicts the transmission type of a car in mtcars from an input parameter. We will work with a single function that will eventually be translated into a single Kubeflow component.
Note that argument names must follow a naming convention for the conversion to a component to succeed: each argument must end in a slug that identifies its type. The slug-to-Kubeflow-type conversions are:
Inputs
- _string = String
- _int = Integer
- _bool = Bool
- _float = Float
Outputs
- _out = outputPath
- _metrics = Metrics
- _uimeta = UI_metadata
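To make the convention concrete, here is a minimal sketch of how argument types could be read off a function's signature. `slug_types` and `infer_kf_types()` are hypothetical names used only for illustration; kflow's own parsing may differ.

```r
# Hypothetical lookup table mirroring the slug conventions listed above
slug_types <- c(
  "_string"  = "String",
  "_int"     = "Integer",
  "_bool"    = "Bool",
  "_float"   = "Float",
  "_out"     = "outputPath",
  "_metrics" = "Metrics",
  "_uimeta"  = "UI_metadata"
)

# Hypothetical helper (not part of kflow): map each argument name to the type
# implied by its slug, or NA when the name ends in no known slug
infer_kf_types <- function(fn) {
  arg_names <- names(formals(fn))
  vapply(arg_names, function(arg) {
    hits <- names(slug_types)[endsWith(arg, names(slug_types))]
    if (length(hits) == 0) NA_character_ else unname(slug_types[hits[1]])
  }, character(1))
}

# A stub with the same signature as the example function
example_fn <- function(predictor_string, file_out, performance_metrics, curve_uimeta) NULL
infer_kf_types(example_fn)
```

An `NA` in the result flags an argument that would break the conversion, which is a cheap check to run before building a component.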
With all that defined, let’s create the function:
library(kflow)
tm_predict <- function(predictor_string, file_out, performance_metrics, curve_uimeta) {
  # Train model (in mtcars, am is 0 = automatic, 1 = manual)
  cars_dat <- mtcars
  cars_dat$am <- factor(cars_dat$am)
  form <- as.formula(paste0("am ~ ", predictor_string))
  model <- glm(form, family = binomial, data = cars_dat)
  # Make predictions: glm() models P(am == "1"), i.e. manual, so take the complement
  cars_dat$prob_auto <- 1 - predict(model, type = "response")
  # Save results
  kf_write_output(cars_dat, file_out) # Ensures the path exists, then writes to the Kubeflow-provided path
  # Score and save metrics
  kf_init_metrics() %>% # Start an empty metrics JSON
    kf_add_metric(name = "roc", value = yardstick::roc_auc(cars_dat, am, prob_auto)$.estimate, format = "RAW") %>%
    kf_add_metric(name = "pr-auc", value = yardstick::pr_auc(cars_dat, am, prob_auto)$.estimate, format = "RAW") %>%
    kf_write_output(performance_metrics)
  # Save ROC curve
  roc_file <- tempfile()
  yardstick::roc_curve(cars_dat, am, prob_auto) %>%
    dplyr::mutate(specificity = 1 - specificity) %>% # Convert to FPR
    dplyr::filter(is.finite(.threshold)) %>% # Kubeflow won't accept the -Inf and Inf thresholds
    readr::write_csv(roc_file, col_names = FALSE) # Save without headers
  kf_init_ui_meta() %>%
    kf_add_roc(roc_file) %>%
    kf_write_output(curve_uimeta)
}
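The scoring step can be sanity-checked without yardstick. The sketch below assumes `"wt"` as the `predictor_string`. Because `glm()` on a factor response models the probability of the second level (`"1"`, manual, in mtcars), the probability of an automatic transmission is the complement. ROC AUC is then computed with the normalized Mann-Whitney rank statistic. `rank_auc()` is a hypothetical helper, not part of kflow or yardstick.

```r
cars_dat <- mtcars
cars_dat$am <- factor(cars_dat$am) # levels "0" (automatic) and "1" (manual)
model <- glm(am ~ wt, family = binomial, data = cars_dat)

# glm() models P(second level) = P(manual); the complement is P(automatic)
prob_auto <- 1 - predict(model, type = "response")

# ROC AUC as the probability that a randomly chosen event case scores higher
# than a randomly chosen non-event case (ties count half via average ranks)
rank_auc <- function(score, is_event) {
  r <- rank(score)
  n_pos <- sum(is_event)
  n_neg <- sum(!is_event)
  (sum(r[is_event]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

auc <- rank_auc(prob_auto, cars_dat$am == "0")
auc
```

With automatic (`"0"`) as the event, which is yardstick's default first level, this value should agree with `yardstick::roc_auc()` on the same scores.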
component <-
  kf_make_component(
    "tm_predict",
    "Transmission Predictor",
    "Predicts if a car has an automatic transmission based on a provided variable",
    "rocker/tidyverse:3.6.2"
  )
cat(component, sep = "\n")
#> name: Transmission Predictor
#> description: Predicts if a car has an automatic transmission based on a provided variable
#> inputs:
#> - name: predictor_string
#>   type: String
#> outputs:
#> - name: file_out
#>   type: ~
#> - name: mlpipeline_metrics
#>   type: Metrics
#> - name: mlpipeline_ui_metadata
#>   type: UI_metadata
#> implementation:
#>   container:
#>     image: rocker/tidyverse:3.6.2
#>     args:
#>     - inputValue: predictor_string
#>     - outputPath: file_out
#>     - outputPath: mlpipeline_metrics
#>     - outputPath: mlpipeline_ui_metadata
#>     command:
#>     - Rscript
#>     - -e
#>     - args<-commandArgs(trailingOnly=TRUE)
#>     - -e
#>     - tm_predict(args[1],args[2],args[3],args[4])
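The `command` block runs two `Rscript -e` expressions: the first captures the trailing arguments that Kubeflow fills in from `args`, and the second calls the function with them positionally. Here is a minimal sketch of how that last expression could be derived from a function's formals. `build_call()` and `tm_predict_stub()` are hypothetical and are not kflow's implementation.

```r
# Hypothetical helper: build the expression the container evaluates, passing
# each trailing command-line argument to the function by position
build_call <- function(fn_name, fn) {
  arg_refs <- sprintf("args[%d]", seq_len(length(formals(fn))))
  sprintf("%s(%s)", fn_name, paste(arg_refs, collapse = ","))
}

# A stub with the same signature as tm_predict(), so this runs on its own
tm_predict_stub <- function(predictor_string, file_out, performance_metrics, curve_uimeta) NULL

command <- c(
  "Rscript",
  "-e", "args<-commandArgs(trailingOnly=TRUE)",
  "-e", build_call("tm_predict", tm_predict_stub)
)
command
```

Because `commandArgs()` returns character strings, an `_int`, `_float`, or `_bool` argument arrives as text such as `"5"` under this calling scheme. Converting it inside the function (for example with `as.integer()`) is a safe default unless kflow is known to do the conversion for you.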
Next, let’s look at how the metrics and UI metadata functions work. They are essentially helpers for building JSON in the structure Kubeflow expects, and they can be written with kf_write_output() just like any other output we want to save.
You can also inspect the JSON as you go. First create the base:
base_metrics <- kf_init_metrics()
base_metrics
#> {
#>   "metrics": []
#> }
Then add a metric:
base_metrics %>%
  kf_add_metric(
    name = "coolness-factor",
    value = 100,
    format = "RAW"
  )
#> {
#>   "metrics": [
#>     {
#>       "name": "coolness-factor",
#>       "numberValue": 100,
#>       "format": "RAW"
#>     }
#>   ]
#> }
You can chain as many metrics together as you’d like:
base_metrics %>%
  kf_add_metric(
    name = "coolness-factor",
    value = 100,
    format = "RAW"
  ) %>%
  kf_add_metric(
    name = "badness-factor",
    value = 0,
    format = "RAW"
  )
#> {
#>   "metrics": [
#>     {
#>       "name": "coolness-factor",
#>       "numberValue": 100,
#>       "format": "RAW"
#>     },
#>     {
#>       "name": "badness-factor",
#>       "numberValue": 0,
#>       "format": "RAW"
#>     }
#>   ]
#> }
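There is nothing magic in this structure. The dependency-free sketch below emits the same shape in compact form. `metrics_json()` is a hypothetical function. In practice the kflow helpers, or a JSON library such as jsonlite, are the safer choice because this sketch does no string escaping.

```r
# Hypothetical sketch: emit the metrics JSON shape printed above, compactly.
# No escaping is done, so names must not contain quotes or backslashes.
metrics_json <- function(name = character(0), value = numeric(0), fmt = "RAW") {
  # One object per metric; sprintf() is vectorised over name, value and fmt
  entries <- sprintf(
    '{"name": "%s", "numberValue": %s, "format": "%s"}',
    name, as.character(value), fmt
  )
  # Zero metrics gives zero entries, i.e. an empty "metrics" array
  sprintf('{"metrics": [%s]}', paste(entries, collapse = ", "))
}

metrics_json()
metrics_json(c("coolness-factor", "badness-factor"), c(100, 0))
```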
When written to a _metrics or _uimeta path, they will show up in the Kubeflow UI!
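The write step itself is simple. The base-R sketch below follows the behaviour described in the comment on kf_write_output(): make sure the parent directory exists, then write. `write_kf_text()` and the demo path are hypothetical. Inside a pipeline, Kubeflow supplies the real path.

```r
# Hypothetical helper: create the parent directory of the target path if
# needed, then write the text there
write_kf_text <- function(text, path) {
  dir.create(dirname(path), recursive = TRUE, showWarnings = FALSE)
  writeLines(text, path)
  invisible(path)
}

# Local stand-in for a path Kubeflow would pass to a _metrics argument
metrics_path <- file.path(tempdir(), "kf-demo", "outputs", "metrics.json")
write_kf_text('{"metrics": []}', metrics_path)
readLines(metrics_path)
```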