This repository has been archived by the owner on Oct 31, 2024. It is now read-only.

Apache Spark Statistics

http://stevenskelton.ca/

Basic setup of an in-memory computation project using StackOverflow's data dump.

Installation (Scala 2.10)

  • Download the Spark source from GitHub [url], Scala 2.10 branch

  • Compile the Spark assembly: sbt assembly

  • Create a /lib directory in this project

  • Copy spark-assembly-0.9.0-incubating-SNAPSHOT-hadoop1.0.4.jar from Spark's assembly\target\scala-2.10 directory to /lib

  • Compile and run this project: sbt package run
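
The steps above work because sbt treats any jar placed in a project's lib directory as an unmanaged dependency, so the copied Spark assembly ends up on the classpath without a library declaration. A minimal build.sbt sketch is shown below. It is not this project's actual build file: the project name and the exact 2.10.x patch version are assumptions, and the syntax assumes sbt 0.13 or later.

```scala
// build.sbt (hypothetical sketch; the project's real build file may differ)

// Assumed project name, used only for illustration.
name := "spark-statistics"

// Any 2.10.x release should match the Spark assembly built for Scala 2.10;
// the patch version here is an assumption.
scalaVersion := "2.10.3"

// sbt already adds every jar under lib/ to the classpath by default.
// This setting only makes that default explicit.
unmanagedBase := baseDirectory.value / "lib"
```

If the Spark jar is kept somewhere other than /lib, pointing unmanagedBase at that directory should have the same effect.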

Unit tests

Change the VM arguments to include -Xmx6096m, which raises the maximum JVM heap size to 6096 MB for the unit tests.
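
If the tests are run through sbt rather than an IDE run configuration, the same heap setting could be expressed in the build definition. This is a sketch under that assumption, not part of the original project. sbt only applies javaOptions to forked JVMs, so forking is enabled for tests as well.

```scala
// build.sbt fragment (assumed; only relevant when tests are run via sbt)

// Run tests in a separate JVM so that javaOptions takes effect.
fork in Test := true

// Give the forked test JVM the heap size named in this README.
javaOptions in Test += "-Xmx6096m"
```

Without the fork setting, tests run inside sbt's own JVM and the -Xmx value would instead have to be passed when launching sbt itself.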