The Spark Maven template image serves as a base image to build your own Maven application to run on a Spark cluster. See the big-data-europe/docker-spark README for a description of how to set up a Spark cluster.
You can build and launch your Maven application on a Spark cluster by extending this image with your sources. The template uses Maven as its build tool, so make sure you have a pom.xml file for your application that specifies all its dependencies.
The Maven package command must create an assembly JAR (or 'uber' JAR) containing your code and its dependencies. Spark and Hadoop dependencies should be listed as provided. The Maven Shade plugin can be used to build such assembly JARs.
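As a rough way to check the provided requirement described above, the sketch below inspects a JAR listing and reports whether Spark or Hadoop classes ended up inside the assembly JAR. The helper name check_provided_scope and the JAR file name in the usage comment are hypothetical and not part of the template; the listing itself would come from the JDK's jar tf command.

```shell
#!/bin/bash
# Hypothetical helper (not part of the template): reads a JAR listing from
# stdin and reports whether Spark or Hadoop classes were bundled into it.
check_provided_scope() {
  if grep -Eq '^org/apache/(spark|hadoop)/'; then
    echo "Spark/Hadoop classes found: mark these dependencies as provided"
    return 1
  fi
  echo "OK: no Spark/Hadoop classes bundled"
}

# Usage, assuming the build wrote target/my-app-1.0.jar (example name only):
#   jar tf target/my-app-1.0.jar | check_provided_scope
```

The check only looks at package paths, so it is a heuristic: it would also flag your own classes if they happen to live under org/apache/spark or org/apache/hadoop.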
- Create a Dockerfile in the root folder of your project (which also contains a pom.xml)
- Extend the Spark Maven template Docker image
- Configure the following environment variables (unless the default value suffices):
  - SPARK_MASTER_NAME (default: spark-master)
  - SPARK_MASTER_PORT (default: 7077)
  - SPARK_APPLICATION_JAR_NAME (default: application-1.0)
  - SPARK_APPLICATION_MAIN_CLASS (default: my.main.Application)
  - SPARK_APPLICATION_ARGS (default: "")
- Build and run the image
docker build --rm=true -t bde/spark-app .
docker run --name my-spark-app --link spark-master:spark-master -d bde/spark-app
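To make the role of the environment variables more concrete, here is a minimal sketch of how they could map onto a spark-submit command line. This is not the template's actual script: the function name build_submit_command and the target/ location of the JAR are assumptions made for illustration, and only the variable names and defaults come from the list above.

```shell
#!/bin/bash
# Sketch only: NOT the template's actual script. It illustrates how the
# documented variables could be combined into a spark-submit command line.
build_submit_command() {
  # Fall back to the documented defaults when a variable is unset or empty.
  local master_name="${SPARK_MASTER_NAME:-spark-master}"
  local master_port="${SPARK_MASTER_PORT:-7077}"
  local jar_name="${SPARK_APPLICATION_JAR_NAME:-application-1.0}"
  local main_class="${SPARK_APPLICATION_MAIN_CLASS:-my.main.Application}"
  local app_args="${SPARK_APPLICATION_ARGS:-}"

  # Assumption: the packaged JAR sits in Maven's default target/ directory.
  local cmd="spark-submit --class ${main_class} --master spark://${master_name}:${master_port} target/${jar_name}.jar"

  # Application arguments, if any, follow the JAR path.
  if [ -n "${app_args}" ]; then
    cmd="${cmd} ${app_args}"
  fi
  echo "${cmd}"
}

# Print the command that the current environment would produce.
build_submit_command
```

Printing the command instead of running it keeps the sketch usable on a machine without Spark installed. Changing any of the variables before the call changes the corresponding part of the printed command.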
The sources in the project folder will be automatically added to /usr/src/app if you directly extend the Spark Maven template image. Otherwise you will have to add and package the sources yourself in your Dockerfile with the commands:
COPY . /usr/src/app
RUN cd /usr/src/app \
&& mvn clean package
If you override the template's CMD in your Dockerfile, make sure to execute the /template.sh script at the end.
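One way to follow this rule is to point CMD at a small wrapper script that runs your own steps first and hands over to /template.sh last. The sketch below assumes such a wrapper. The file name /run.sh, the function name run_with_setup and the TEMPLATE_SCRIPT variable are hypothetical; only /template.sh comes from the template.

```shell
#!/bin/bash
# Hypothetical wrapper, e.g. copied to /run.sh and referenced from your
# Dockerfile with: CMD ["/bin/bash", "/run.sh"]

# Assumption: TEMPLATE_SCRIPT is overridable only to allow a dry run outside
# the container; inside the image it stays /template.sh.
TEMPLATE_SCRIPT="${TEMPLATE_SCRIPT:-/template.sh}"

run_with_setup() {
  echo "Running custom setup steps"   # placeholder for your own commands
  # Hand over to the template script last, as required when replacing CMD.
  bash "${TEMPLATE_SCRIPT}"
}

# In the real wrapper, end the file with:
#   run_with_setup
```

Keeping the call to the template script as the final step means the exit status of the wrapper is the exit status of the template script.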
FROM bde2020/spark-maven-template:3.3.0-hadoop3.3
MAINTAINER Erika Pauwels <[email protected]>
MAINTAINER Gezim Sejdiu <[email protected]>
ENV SPARK_APPLICATION_JAR_NAME my-app-1.0-SNAPSHOT-with-dependencies
ENV SPARK_APPLICATION_MAIN_CLASS eu.bde.my.Application
ENV SPARK_APPLICATION_ARGS "foo bar baz"