You can find a longer tutorial here; it explains what Kubernetes is and why installing Spark on Kubernetes is a good idea.
On Ubuntu 20.10 we can skip the first three items of the tutorial mentioned above. Just type
sudo snap install microk8s --classic
wait a bit, and run microk8s status
to find out if the cluster is ready.
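Instead of polling `microk8s status` by hand, the two steps above can be combined: `microk8s status` accepts a `--wait-ready` flag that blocks until the cluster reports ready. A minimal sketch (the timeout value is just an example):

```shell
# Install MicroK8s from the snap store (classic confinement is required)
sudo snap install microk8s --classic

# Block until the cluster reports ready, giving up after 300 seconds
microk8s status --wait-ready --timeout 300
```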
Now enable add-ons for intra-cluster name resolution, the dashboard, storage and ingress with
microk8s enable dns dashboard storage ingress
Again wait a bit until you can access the dashboard.
Use microk8s kubectl -n kube-system get service kubernetes-dashboard
to find the dashboard's cluster IP. Enabling the ingress add-on makes the Kubernetes-hosted dashboard reachable from the workstation's browser.
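If only the IP itself is needed (for example to open https://&lt;cluster-ip&gt; directly), kubectl's jsonpath output format can extract it from the service spec; a sketch:

```shell
# Print only the dashboard service's cluster IP, without the table headers
microk8s kubectl -n kube-system get service kubernetes-dashboard \
  -o jsonpath='{.spec.clusterIP}'
```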
See also Installing microk8s on Ubuntu
Download Spark from the Spark Download page and extract it with tar xfz ~/Downloads/spark-3.1.2-bin-hadoop3.2.tgz
Make the spark shell accessible with export SPARK_HOME=~/spark-3.1.2-bin-hadoop3.2/
and alias spark-shell='$SPARK_HOME/bin/spark-shell' (note that the alias reuses SPARK_HOME, so it works regardless of your user name).
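The export and the alias above only last for the current shell session. To make them permanent they can be appended to ~/.bashrc; a sketch, assuming the archive was extracted into the home directory as above:

```shell
# Persist SPARK_HOME and the spark-shell alias across shell sessions
cat >> ~/.bashrc <<'EOF'
export SPARK_HOME=~/spark-3.1.2-bin-hadoop3.2
alias spark-shell="$SPARK_HOME/bin/spark-shell"
EOF

# Load the new settings into the current session
source ~/.bashrc
```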
Install the Spark master and worker on Kubernetes with
microk8s kubectl apply -f https://raw.githubusercontent.com/big-data-europe/docker-spark/master/k8s-spark-cluster.yaml
Use the Kubernetes dashboard to follow the deployment of both Spark components. This is how it should look (for the worker):
Logs should show that both pods can make use of their cluster IPs; otherwise the Spark worker and master cannot find each other. If there are any issues, try disabling and re-enabling Kubernetes DNS with microk8s disable dns
followed by microk8s enable dns.
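The logs and cluster IPs mentioned above can be inspected from the command line as well. The deployment names spark-master and spark-worker below are assumed from the big-data-europe manifest and may differ if it has changed:

```shell
# List the Spark pods together with the cluster IP assigned to each one
microk8s kubectl get pods -o wide

# Tail the recent log output of master and worker (names assumed from the manifest)
microk8s kubectl logs deployment/spark-master --tail=20
microk8s kubectl logs deployment/spark-worker --tail=20
```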
Schedule SparkPi on the Kubernetes-based Spark cluster with
microk8s kubectl run spark-base --rm -it --labels="app=spark-client" --image bde2020/spark-base:3.1.1-hadoop3.2 -- bash ./spark/bin/spark-submit --master spark://spark-master:7077 --deploy-mode client --conf spark.driver.host=spark-client --conf spark.kubernetes.authenticate.caCertFile=/var/snap/microk8s/current/certs/ca.crt --class org.apache.spark.examples.SparkPi local:///spark/examples/jars/spark-examples_2.12-3.1.1.jar
This command