Getting Started
Albert Bifet edited this page Oct 22, 2013 · 5 revisions
- Download SAMOA
git clone git@github.com:yahoo/samoa.git
cd samoa
mvn -Pstorm package
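- Download the Forest CoverType dataset and unzip it into the samoa directory, so that covtypeNorm.arff sits where the commands below expect it: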
http://sourceforge.net/projects/moa-datastream/files/Datasets/Classification/covtypeNorm.arff.zip/download
Forest CoverType contains the forest cover type for 30 x 30 meter cells, obtained from US Forest Service (USFS) Region 2 Resource Information System (RIS) data. It contains 581,012 instances and 54 attributes, and has been used in several papers on data stream classification.
- Run an example. Classify the CoverType dataset with the bagging algorithm:
- in a simulation environment
java -cp target/SAMOA-Storm-0.0.1.jar com.yahoo.labs.samoa.DoTask "PrequentialEvaluation -l classifiers.ensemble.Bagging -s (ArffFileStream -f covtypeNorm.arff) -f 100000"
- in Storm Local Mode
java -cp target/SAMOA-Storm-0.0.1.jar com.yahoo.labs.samoa.StormLocalDoTask "PrequentialEvaluation -l classifiers.ensemble.Bagging -s (ArffFileStream -f covtypeNorm.arff) -f 100000"
The output is a list of evaluation results, reported every 100,000 instances.
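Before running the evaluation, it can help to check that the dataset was downloaded and unzipped intact. The following is a minimal sketch, not part of SAMOA: the helper name count_arff_instances is hypothetical, and it assumes the usual ARFF layout, in which data rows follow an @data line and comment lines start with %.

```shell
# Hypothetical helper (not part of SAMOA): count the data rows in an ARFF file.
# Assumes standard ARFF layout: rows follow "@data", comments start with "%".
count_arff_instances() {
    awk '
        { sub(/\r$/, "") }                                # tolerate CRLF line endings
        in_data && !/^[ \t]*$/ && !/^[ \t]*%/ { n++ }     # count non-blank, non-comment rows
        tolower($0) ~ /^[ \t]*@data/ { in_data = 1 }      # rows start after the @data marker
        END { print n + 0 }
    ' "$1"
}

# Usage, after unzipping the download:
#   count_arff_instances covtypeNorm.arff
```

The printed count can be compared with the number of instances quoted in the dataset description above. A mismatch usually points to a truncated download.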