Albert Bifet edited this page Oct 22, 2013 · 5 revisions
  1. Download and build SAMOA:
git clone git@github.com:yahoo/samoa.git
cd samoa
mvn -Pstorm package
  2. Download the Forest CoverType dataset and unzip covtypeNorm.arff into the samoa directory. The archive is available from
http://sourceforge.net/projects/moa-datastream/files/Datasets/Classification/covtypeNorm.arff.zip/download

Forest CoverType contains the forest cover type for 30 × 30 meter cells, obtained from US Forest Service (USFS) Region 2 Resource Information System (RIS) data. It contains 581,012 instances and 54 attributes, and it has been used in several papers on data stream classification.
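If you want to do the unzip step from Java, or to sanity-check the download against the figures above, the following is a minimal sketch using only the JDK. It assumes the archive keeps the name covtypeNorm.arff.zip from the download URL and sits in the current directory; the class ArffCheck and its methods are hypothetical helpers, not part of SAMOA. It extracts the .arff entry and counts @attribute declarations and data rows. ARFF declares the class label as an ordinary @attribute, so the printed attribute count can exceed the number of input attributes by one.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.Iterator;
import java.util.Locale;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical helper (not part of SAMOA): unzips and sanity-checks an ARFF dataset. */
public class ArffCheck {

    /** Extracts the .arff entries of a zip into dir; returns the last one written, or null. */
    static Path unzipArff(Path zip, Path dir) throws IOException {
        Path extracted = null;
        try (ZipInputStream zis = new ZipInputStream(Files.newInputStream(zip))) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                // Keep only the file name, so an entry cannot be written outside dir.
                Path name = Paths.get(entry.getName()).getFileName();
                if (entry.isDirectory() || name == null) continue;
                String file = name.toString();
                // Skip non-ARFF entries and hidden metadata files such as "._covtypeNorm.arff".
                if (file.startsWith(".") || !file.endsWith(".arff")) continue;
                Path out = dir.resolve(file);
                Files.copy(zis, out, StandardCopyOption.REPLACE_EXISTING);
                extracted = out;
            }
        }
        return extracted;
    }

    /** Returns {attributeCount, instanceCount}; the class label counts as an attribute. */
    static int[] count(Iterator<String> lines) {
        int attributes = 0, instances = 0;
        boolean inData = false;
        while (lines.hasNext()) {
            String t = lines.next().trim();
            if (t.isEmpty() || t.startsWith("%")) continue;  // blank lines and comments
            if (inData) {
                instances++;
                continue;
            }
            String lower = t.toLowerCase(Locale.ROOT);
            if (lower.startsWith("@attribute")) attributes++;
            else if (lower.startsWith("@data")) inData = true;
        }
        return new int[] {attributes, instances};
    }

    public static void main(String[] args) throws IOException {
        Path zip = Paths.get(args.length > 0 ? args[0] : "covtypeNorm.arff.zip");
        Path arff = unzipArff(zip, Paths.get("."));
        if (arff == null) {
            System.err.println("No .arff entry found in " + zip);
            return;
        }
        try (BufferedReader r = Files.newBufferedReader(arff, StandardCharsets.UTF_8)) {
            int[] c = count(r.lines().iterator());
            System.out.println(arff + ": " + c[0] + " attributes, " + c[1] + " instances");
        }
    }
}
```

Run it from the samoa directory, for example with javac ArffCheck.java followed by java ArffCheck covtypeNorm.arff.zip. Counting lines through an iterator keeps memory use flat even for a file with several hundred thousand rows.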

  3. Run an example that classifies the CoverType dataset with the bagging algorithm:
  • in a simulation environment
java -cp target/SAMOA-Storm-0.0.1.jar com.yahoo.labs.samoa.DoTask "PrequentialEvaluation -l classifiers.ensemble.Bagging -s (ArffFileStream -f covtypeNorm.arff) -f 100000"
  • in Storm Local Mode
java -cp target/SAMOA-Storm-0.0.1.jar com.yahoo.labs.samoa.StormLocalDoTask "PrequentialEvaluation -l classifiers.ensemble.Bagging -s (ArffFileStream -f covtypeNorm.arff) -f 100000"

The output is a list of evaluation results, reported every 100,000 instances as set by the trailing -f 100000 option.
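The task above evaluates prequentially: each instance is first used to test the current model and then to train it, and results are reported at a fixed instance interval. As an illustration only, here is a self-contained Java sketch of that loop wrapped around online bagging in the style of Oza and Russell, where each ensemble member trains on every arriving instance k times with k drawn from Poisson(1), which approximates bootstrap resampling on a stream. We assume, without having checked the SAMOA source, that classifiers.ensemble.Bagging follows a similar scheme. The nearest-centroid base learner, the toy stream, the ensemble size of 10, the seeds, and all class and method names are hypothetical, and the sketch reports cumulative accuracy, which may differ from the measures SAMOA's evaluator prints.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class PrequentialBaggingSketch {
    /** Hypothetical base learner: nearest class centroid, updated incrementally with weights. */
    static final class CentroidLearner {
        private final double[][] sums;
        private final double[] weights;
        CentroidLearner(int numClasses, int numFeatures) {
            sums = new double[numClasses][numFeatures];
            weights = new double[numClasses];
        }
        void train(double[] x, int y, int weight) {
            weights[y] += weight;
            for (int j = 0; j < x.length; j++) sums[y][j] += weight * x[j];
        }
        /** Class with the nearest centroid, or -1 if no class has been seen yet. */
        int predict(double[] x) {
            int best = -1;
            double bestDist = Double.POSITIVE_INFINITY;
            for (int c = 0; c < weights.length; c++) {
                if (weights[c] == 0) continue;
                double d = 0;
                for (int j = 0; j < x.length; j++) {
                    double diff = x[j] - sums[c][j] / weights[c];
                    d += diff * diff;
                }
                if (d < bestDist) { bestDist = d; best = c; }
            }
            return best;
        }
    }

    /** Online bagging after Oza and Russell (assumed, not confirmed, to resemble SAMOA's Bagging). */
    static final class OnlineBagging {
        private final CentroidLearner[] models;
        private final int numClasses;
        private final Random rng;
        OnlineBagging(int size, int numClasses, int numFeatures, long seed) {
            this.numClasses = numClasses;
            this.rng = new Random(seed);
            this.models = new CentroidLearner[size];
            for (int i = 0; i < size; i++) models[i] = new CentroidLearner(numClasses, numFeatures);
        }
        /** Knuth's multiplication method for a Poisson(1) draw. */
        private int poisson1() {
            double limit = Math.exp(-1.0), p = 1.0;
            int k = 0;
            do { k++; p *= rng.nextDouble(); } while (p > limit);
            return k - 1;
        }
        void train(double[] x, int y) {
            for (CentroidLearner m : models) {
                int k = poisson1();
                if (k > 0) m.train(x, y, k);  // same effect as training on k copies
            }
        }
        /** Majority vote over the base models; ties go to the lower class index. */
        int predict(double[] x) {
            int[] votes = new int[numClasses];
            for (CentroidLearner m : models) {
                int c = m.predict(x);
                if (c >= 0) votes[c]++;
            }
            int best = 0;
            for (int c = 1; c < numClasses; c++) if (votes[c] > votes[best]) best = c;
            return best;
        }
    }

    /** Test-then-train loop that records cumulative accuracy every sampleFrequency instances. */
    static List<Double> prequential(double[][] xs, int[] ys, OnlineBagging learner, int sampleFrequency) {
        List<Double> reports = new ArrayList<>();
        long correct = 0;
        for (int i = 0; i < xs.length; i++) {
            if (learner.predict(xs[i]) == ys[i]) correct++;  // test on the instance first...
            learner.train(xs[i], ys[i]);                      // ...then train on it
            if ((i + 1) % sampleFrequency == 0) {
                reports.add((double) correct / (i + 1));
                System.out.printf("instances=%d accuracy=%.4f%n", i + 1, (double) correct / (i + 1));
            }
        }
        return reports;
    }

    /** Hypothetical toy stream: two features in [0, 1), label 1 iff the first exceeds 0.5. */
    static List<Double> runToy(int n, int sampleFrequency) {
        Random data = new Random(42);
        double[][] xs = new double[n][];
        int[] ys = new int[n];
        for (int i = 0; i < n; i++) {
            xs[i] = new double[] {data.nextDouble(), data.nextDouble()};
            ys[i] = xs[i][0] > 0.5 ? 1 : 0;
        }
        return prequential(xs, ys, new OnlineBagging(10, 2, 2, 7L), sampleFrequency);
    }
    public static void main(String[] args) { runToy(2000, 500); }
}
```

Running java PrequentialBaggingSketch prints one line per 500 toy instances. In this sketch, a stream of 581,012 instances with a frequency of 100,000 would produce floor(581,012 / 100,000) = 5 reports; the exact reporting behavior of SAMOA's own task may differ.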