A Giter8 template for Scala Spark projects.
This template kickstarts your Spark project with a modified version of the classic "word count" example that also filters out stop words. You can replace or extend this starting example to match your project's requirements and tailor the Spark components to your needs.
- Rapid Initialization: Quickly bootstrap your Spark application with a ready-to-use example.
- Customizable Base: Swap out the word count example for your own logic and add further Spark components as needed.
- Encourages Best Practices: Start from a project that already reaches 100% code coverage, albeit with a single sample test. Expect coverage to change as your project grows.
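For orientation, a stop-word-aware word count can be sketched with plain Scala collections so that it runs without a Spark installation. This is an illustrative sketch, not the template's actual code: the object name `StopWordCountSketch`, the method `countWords`, splitting on non-word characters, and the sample stop words are all assumptions.

```scala
// Hypothetical, Spark-free sketch of word counting with stop words removed.
// Names and tokenization rules are illustrative assumptions, not the template's API.
object StopWordCountSketch {

  // Splits each line on runs of non-word characters, lowercases the tokens,
  // drops empty tokens and stop words, and counts the remaining words.
  def countWords(lines: Seq[String], stopWords: Set[String]): Map[String, Long] = {
    val normalizedStops = stopWords.map(_.trim.toLowerCase)
    lines
      .flatMap(_.split("\\W+"))
      .map(_.toLowerCase)
      .filter(token => token.nonEmpty && !normalizedStops.contains(token))
      .groupBy(identity)
      .map { case (word, occurrences) => word -> occurrences.size.toLong }
  }

  def main(args: Array[String]): Unit = {
    val lines = Seq("The cat sat on the mat", "the dog sat")
    val counts = countWords(lines, Set("the", "on"))
    // Sort alphabetically for stable output: cat: 1, dog: 1, mat: 1, sat: 2
    counts.toSeq.sortBy(_._1).foreach { case (word, n) => println(s"$word: $n") }
  }
}
```

In a Spark job the same pipeline would typically run over an `RDD[String]`, with `(word, 1L)` pairs combined by `reduceByKey(_ + _)` instead of the local `groupBy`, so the counting is distributed across executors.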
To create a new project from this template with sbt (0.13.13 or later), run:
sbt new eff3ct0/spark-template.g8
Follow these steps to set up and run your Spark application:
You can run the example Spark job directly from sbt. Open a terminal in the project directory and execute:
sbt "run inputFile.txt outputFile.txt"
When prompted, choose CountingLocalApp to see the example in action.
Prepare your project for distribution or deployment by building an assembly JAR with sbt:
- Navigate to the project directory:
cd <project-directory>
- Run the assembly task:
sbt assembly
This command creates a JAR file in the target/scala-<scala-version>/ directory. For more details on configuring sbt-assembly, refer to the sbt-assembly documentation.
To execute your Spark job on a cluster, use Spark's spark-submit script:
/path/to/spark-home/bin/spark-submit \
--class <package-name>.CountingApp \
--name <spark-app> \
--master <master url> \
./target/scala-<scala-version>/<jar-name> \
<input file> <output file>
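Make sure to replace the placeholders in angle brackets (<package-name>, <spark-app>, <master url>, <scala-version>, <jar-name>, <input file> and <output file>) with actual values for your project and environment.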
More information on submitting Spark jobs can be found in the Spark documentation.
This project is available under your choice of the Apache 2.0 or CC0 1.0 license; choose the one that best suits your needs.
This template is provided "as-is" without any warranties. Modify and distribute as needed to fit your project requirements.