metrics: Add the start of the local report generator
Add the first basic report generator code, utilising R, Rmarkdown
and pandoc to do the data processing and pdf report generation.

Fixes: clearlinux#121

Signed-off-by: Graham Whaley <[email protected]>
Graham Whaley committed Jul 23, 2019
1 parent 39a3f46 commit 45598b2
Showing 7 changed files with 533 additions and 0 deletions.
73 changes: 73 additions & 0 deletions metrics/report/README.md
@@ -0,0 +1,73 @@
* [cloud-native-setup metrics report generator](#cloud-native-setup-metrics-report-generator)
* [Data gathering](#data-gathering)
* [Report generation](#report-generation)
* [Debugging and development](#debugging-and-development)

# cloud-native-setup metrics report generator

The files within this directory can be used to generate a 'metrics report'
for Kubernetes.

The primary workflow consists of two stages:

1) Run the provided report metrics data gathering scripts on the system(s) you wish
to analyze.
2) Run the provided report generation script to analyze the data and generate a
report file.

## Data gathering

Data gathering is provided by the `grabdata.sh` script. When run, this script
executes a set of tests from the `cloud-native-setup/metrics` directory. The JSON results files
will be placed into the `cloud-native-setup/metrics/results` directory.

Once the results are generated, create a suitably named subdirectory of
`cloud-native-setup/metrics/results`, and move the JSON files into it.

Repeat this process if you want to compare multiple sets of results. Note that the
report generation scripts process all subfolders of `cloud-native-setup/metrics/results`
when generating the report.
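
For example, assuming you want to store one set of results under the name `run1`
(the directory name here is purely illustrative), you could do something like:

```sh
$ mkdir cloud-native-setup/metrics/results/run1
$ mv cloud-native-setup/metrics/results/*.json cloud-native-setup/metrics/results/run1/
```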

You can restrict the subset of tests run by `grabdata.sh` via its commandline parameters:

| Option | Description |
| ------ | ----------- |
| -a | Run all tests (default) |
| -h | Print this help |
| -s | Run the scaling tests |
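
For example, to gather only the scaling test data, you could run:

```sh
$ ./grabdata.sh -s
```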

## Report generation

Report generation is provided by the `makereport.sh` script. By default this script
processes all subdirectories of the `cloud-native-setup/metrics/results` directory to generate the report.
To run in the default mode, execute the following:

```sh
$ ./makereport.sh
```

The report generation tool uses [Rmarkdown](https://github.com/rstudio/rmarkdown),
[R](https://www.r-project.org/about.html) and [pandoc](https://pandoc.org/) to produce
a PDF report. To avoid every user needing to set up a working environment
with all the necessary tooling, the `makereport.sh` script utilises a `Dockerfile` with
the environment pre-defined in order to produce the report. You therefore need to
have Docker installed on your system to run the report generation.

The resulting `metrics_report.pdf` is generated into the `output` subdirectory of the
`report` directory.
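
For example, after a successful run invoked from the `report` directory, you should
find the report at a path like:

```sh
$ ls output/metrics_report.pdf
output/metrics_report.pdf
```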

## Debugging and development

To aid in script development and debugging, the `makereport.sh` script offers a debug
facility via the `-d` command line option. Using this option places you in a `bash`
shell within the running Docker image used to generate the report. From there you
can examine the image environment and execute the generation scripts by hand. E.g., to
test the `scaling.R` script, you can execute:

```bash
$ ./makereport.sh -d
# R
> source('/inputdir/Env.R')
> source('/scripts/scaling.R')
```

97 changes: 97 additions & 0 deletions metrics/report/grabdata.sh
@@ -0,0 +1,97 @@
#!/bin/bash
# Copyright (c) 2019 Intel Corporation
#
# SPDX-License-Identifier: Apache-2.0

# Run a set of the metrics tests to gather data to be used with the report
# generator. The general idea is to have the tests configured to generate
# useful, meaningful and repeatable (stable, with minimised variance) results.
# If the tests have to be run more times or for longer to achieve that, then generally
# that is fine - this test is not intended to be quick, it is intended to
# be repeatable.

# Note - no 'set -e' in this file - if one of the metrics tests fails
# then we wish to continue to try the rest.
# Finally at the end, in some situations, we explicitly exit with a
# failure code if necessary.

SCRIPT_DIR=$(dirname "$(readlink -f "$0")")
source "${SCRIPT_DIR}/../lib/common.bash"
RESULTS_DIR=${SCRIPT_DIR}/../results

# By default we run all the tests
RUN_ALL=1

help() {
usage=$(cat << EOF
Usage: $0 [-h] [options]
Description:
This script gathers a number of metrics for use in the
report generation script. Which tests are run can be
configured on the commandline. Specifically enabling
individual tests will disable the 'all' option, unless
'all' is also specified last.
Options:
-a, Run all tests (default).
-h, Print this help.
-s, Run the scaling tests.
EOF
)
echo "$usage"
}

# Set up the initial state
init() {
metrics_onetime_init

local OPTIND
while getopts "adhnst" opt;do
case ${opt} in
a)
RUN_ALL=1
;;
h)
help
exit 0;
;;
s)
RUN_SCALING=1
RUN_ALL=
;;
?)
# parse failure
help
die "Failed to parse arguments"
;;
esac
done
shift $((OPTIND-1))
}

run_scaling() {
echo "Running scaling tests"

(cd scaling; ./k8s_scale.sh)
}

# Execute metrics scripts
run() {
pushd "$SCRIPT_DIR/.."

if [ -n "$RUN_ALL" ] || [ -n "$RUN_SCALING" ]; then
run_scaling
fi

popd
}

finish() {
echo "Now please create a suitably descriptively named subdirectory in"
echo "$RESULTS_DIR and copy the .json results files into it before running"
echo "this script again."
}

init "$@"
run
finish

72 changes: 72 additions & 0 deletions metrics/report/makereport.sh
@@ -0,0 +1,72 @@
#!/bin/bash
# Copyright (c) 2019 Intel Corporation
#
# SPDX-License-Identifier: Apache-2.0

# Take the data found in subdirectories of the metrics 'results' directory,
# and turn them into a PDF report. Use a Dockerfile containing all the tooling
# and scripts we need to do that.

set -e

SCRIPT_PATH=$(dirname "$(readlink -f "$0")")
source "${SCRIPT_PATH}/../lib/common.bash"

IMAGE="${IMAGE:-metrics-report}"
DOCKERFILE="${SCRIPT_PATH}/report_dockerfile/Dockerfile"

HOSTINPUTDIR="${SCRIPT_PATH}/../results"
RENVFILE="${HOSTINPUTDIR}/Env.R"
HOSTOUTPUTDIR="${SCRIPT_PATH}/output"

GUESTINPUTDIR="/inputdir/"
GUESTOUTPUTDIR="/outputdir/"

setup() {
echo "Checking subdirectories"
check_subdir="$(ls -dx ${HOSTINPUTDIR}/*/ 2> /dev/null | wc -l)"
if [ $check_subdir -eq 0 ]; then
die "No subdirs in [${HOSTINPUTDIR}] to read results from."
fi

echo "Checking Dockerfile"
check_dockerfiles_images "$IMAGE" "$DOCKERFILE"

mkdir -p "$HOSTOUTPUTDIR"

echo "inputdir=\"${GUESTINPUTDIR}\"" > ${RENVFILE}
echo "outputdir=\"${GUESTOUTPUTDIR}\"" >> ${RENVFILE}

# A bit of a hack to get an R syntax'd list of dirs to process.
# Also, we need these as short relative names, not host-side dir paths.
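# For reference, the generated Env.R ends up looking something like this
# (the result directory names below are examples only):
#
#   inputdir="/inputdir/"
#   outputdir="/outputdir/"
#   resultdirs=c(
#    "run1/", "run2/"
#   )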
resultdirs="$(cd ${HOSTINPUTDIR}; ls -dx */)"
resultdirslist=$(echo ${resultdirs} | sed 's/ \+/", "/g')
echo "resultdirs=c(" >> ${RENVFILE}
echo " \"${resultdirslist}\"" >> ${RENVFILE}
echo ")" >> ${RENVFILE}
}

run() {
docker run -ti --rm -v ${HOSTINPUTDIR}:${GUESTINPUTDIR} -v ${HOSTOUTPUTDIR}:${GUESTOUTPUTDIR} ${IMAGE} ${extra_command}
ls -la ${HOSTOUTPUTDIR}/*
}

main() {

local OPTIND
while getopts "d" opt;do
case ${opt} in
d)
# In debug mode, run a shell instead of the default report generation
extra_command="bash"
;;
esac
done
shift $((OPTIND-1))

setup
run
}

main "$@"

40 changes: 40 additions & 0 deletions metrics/report/report_dockerfile/Dockerfile
@@ -0,0 +1,40 @@
# Copyright (c) 2018-2019 Intel Corporation
#
# SPDX-License-Identifier: Apache-2.0

# Set up an Ubuntu image with the components needed to generate a
# metrics report. That includes:
# - R
# - The R 'tidyverse'
# - pandoc
# - The report generation R files and helper scripts

# Start with the base rocker tidyverse image.
# We would have used the 'verse' base, which already has some of the docs processing
# tools installed, but I could not figure out how to add in the extra bits we needed to
# the lite TeX version it uses.
FROM rocker/tidyverse

# Version of the Dockerfile
LABEL DOCKERFILE_VERSION="1.0"

# Without this, some of the package installs stop to ask interactive questions...
ENV DEBIAN_FRONTEND=noninteractive

# Install the extra doc processing parts we need for our Rmarkdown PDF flow.
RUN apt-get update -qq && \
apt-get install -y \
texlive-latex-base \
texlive-fonts-recommended \
latex-xcolor

# Install the extra R packages we need.
RUN install2.r --error --deps TRUE \
gridExtra \
ggpubr

# Pull in our actual worker scripts
COPY . /scripts

# By default generate the report
CMD ["/scripts/genreport.sh"]
14 changes: 14 additions & 0 deletions metrics/report/report_dockerfile/genreport.sh
@@ -0,0 +1,14 @@
#!/bin/bash
# Copyright (c) 2018-2019 Intel Corporation
#
# SPDX-License-Identifier: Apache-2.0

REPORTNAME="metrics_report.pdf"

cd /scripts
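# Knit the Rmarkdown source into markdown, then convert that markdown into
# the PDF report via pandoc (invoked through knitr).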

Rscript --slave -e "library(knitr);knit('metrics_report.Rmd')"
Rscript --slave -e "library(knitr);pandoc('metrics_report.md', format='latex')"

cp /scripts/${REPORTNAME} /outputdir
echo "The report, named ${REPORTNAME}, can be found in the output directory"
40 changes: 40 additions & 0 deletions metrics/report/report_dockerfile/metrics_report.Rmd
@@ -0,0 +1,40 @@
---
# Copyright (c) 2018-2019 Intel Corporation
#
# SPDX-License-Identifier: Apache-2.0
#
title: "Kubernetes metrics report"
author: "Auto generated"
date: "`r format(Sys.time(), '%d %B, %Y')`"
output:
pdf_document:
urlcolor: blue
---

```{r setup, include=FALSE}
# Set these opts to get pdf images which fit into beamer slides better
opts_chunk$set(dev = 'pdf')
# Pick up any env set by the invoking script, such as the root dir of the
# results data tree
source("/inputdir/Env.R")
```
\pagebreak

# Introduction
This report compares the metrics between multiple sets of data generated from
the [cloud-native-setup report generation scripts](https://github.com/clearlinux/cloud-native-setup/metrics/report/README.md).

This report was generated using the data from the **`r resultdirs`** results directories.

\pagebreak

# Runtime scaling
This [test](https://github.com/clearlinux/cloud-native-setup/metrics/scaling/k8s_scale.sh)
measures the system memory 'free' reduction, CPU idle % and pod boot time as it launches more
and more idle `busybox` pods on a single node Kubernetes cluster.

> Note: CPU % is measured as a system whole - 100% represents *all* CPUs on the node.
```{r, echo=FALSE, fig.cap="K8S scaling"}
source('scaling.R')
```