Markus Müller edited this page Jul 2, 2021 · 11 revisions

Introduction

You can find a longer tutorial here; it explains what Kubernetes is and why installing Spark on Kubernetes is a good idea.

Installing microk8s

On Ubuntu 20.10 we can skip the first three steps of the tutorial mentioned above. Just type

sudo snap install microk8s --classic

wait a bit, and run microk8s status to find out if the cluster is ready.
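Instead of polling manually, microk8s can also block until the cluster is up. A minimal sketch using the standard microk8s CLI:

```shell
# Blocks until all parts of the cluster report as running
microk8s status --wait-ready
```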

Now enable services like intra-cluster name resolution, dashboard, storage and ingress with

microk8s enable dns dashboard storage ingress

Again wait a bit until you can access the dashboard.

dashboard example

Use microk8s kubectl -n kube-system get service kubernetes-dashboard to find the dashboard's cluster IP. Enabling the ingress service connects the Kubernetes-based dashboard to the workstation browser.
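The dashboard login asks for a token. On microk8s it can typically be retrieved from the default service-account secret; a sketch, assuming the secret naming used by recent microk8s releases:

```shell
# Find the default token secret in kube-system (the name may differ between versions)
token=$(microk8s kubectl -n kube-system get secret | grep default-token | cut -d " " -f1)
# Print the secret, including the token to paste into the dashboard login
microk8s kubectl -n kube-system describe secret "$token"
```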

See also Installing microk8s on Ubuntu

Installing Spark

Download Spark from the Spark Download page and extract it:

tar xfz ~/Downloads/spark-3.1.2-bin-hadoop3.2.tgz

Make the Spark shell accessible:

export SPARK_HOME=~/spark-3.1.2-bin-hadoop3.2/
alias spark-shell='/home/markus/spark-3.1.2-bin-hadoop3.2/bin/spark-shell'
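To check that the local installation works, the version banner can be printed (assuming SPARK_HOME was exported as above):

```shell
# Prints the Spark version banner, e.g. "version 3.1.2"
"$SPARK_HOME"/bin/spark-submit --version
```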

Install Spark master and worker on kubernetes with

microk8s kubectl apply -f https://raw.githubusercontent.com/big-data-europe/docker-spark/master/k8s-spark-cluster.yaml

Use the Kubernetes dashboard to follow the deployment of both Spark components. This is how it should look (for the worker):

spark dash
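The same check also works from the command line; the deployment and pod names come from the big-data-europe manifest and may differ:

```shell
# List the Spark deployments and their pods in the default namespace
microk8s kubectl get deployments,pods -o wide
```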

The logs should show that both pods can make use of their cluster IPs; otherwise the Spark worker and master cannot find each other. If there are any issues, try disabling and re-enabling Kubernetes DNS with microk8s disable dns followed by microk8s enable dns.
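To inspect those logs without the dashboard, a sketch, assuming the manifest names the deployments spark-master and spark-worker:

```shell
# Show the master's log; the worker should appear here once it has registered
microk8s kubectl logs deployment/spark-master
# Show the worker's log; look for a successful registration with the master
microk8s kubectl logs deployment/spark-worker
```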

Test

Schedule SparkPi to the kubernetes based Spark cluster with

microk8s kubectl run spark-base --rm -it --labels="app=spark-client" \
  --image bde2020/spark-base:3.1.1-hadoop3.2 \
  -- bash ./spark/bin/spark-submit \
  --master spark://spark-master:7077 \
  --deploy-mode client \
  --conf spark.driver.host=spark-client \
  --conf spark.kubernetes.authenticate.caCertFile=/var/snap/microk8s/current/certs/ca.crt \
  --class org.apache.spark.examples.SparkPi \
  local:///spark/examples/jars/spark-examples_2.12-3.1.1.jar

This command