diff --git a/README.md b/README.md
index 947dcb4..f2a5261 100644
--- a/README.md
+++ b/README.md
@@ -23,14 +23,7 @@ Or install it yourself as:
 
 ## Usage
 
-* [Instances](#instances)
-* [Filters](#filters)
-* [Attribute selection](#attribute-selection)
-* [Classifiers](#classifiers)
-* [Clusterers](#clusterers)
-* [Serializing objects](#serializing-objects)
-
-Start using Weka's Machine Learning and Data Mining algorithms by requiring the gem:
+Use Weka's Machine Learning and Data Mining algorithms by requiring the gem:
 
 ```ruby
 require 'weka'
@@ -40,667 +33,8 @@ The weka gem tries to carry over the namespaces defined in Weka and enhances som
 
 The idea behind keeping the namespaces is, that you can also use the [Weka documentation](http://weka.sourceforge.net/doc.dev/) for looking up functionality and classes.
 
-Analog to the Weka doc you can find the following namespaces:
-
-| Namespace | Description |
-|----------------------------|------------------------------------------------------------------|
-| `Weka::Core` | defines base classes for loading, saving, creating, and editing a dataset |
-| `Weka::Classifiers` | defines classifier classes in different sub-modules (`Bayes`, `Functions`, `Lazy`, `Meta`, `Rules`, and `Trees` ) |
-| `Weka::Filters` | defines filter classes for processing datasets in the `Supervised` or `Unsupervised`, and `Attribute` or `Instance` sub-modules |
-| `Weka::Clusterers` | defines clusterer classes |
-| `Weka::AttributeSelection` | defines classes for selecting attributes from a dataset |
-
-### Instances
-
-Instances objects hold the dataset that is used to train a classifier or that
-should be classified based on training data.
-
-Instances can be loaded from files and saved to files.
-Supported formats are *ARFF*, *CSV*, and *JSON*.
-
-#### Loading Instances from a file
-
-Instances can be loaded from ARFF, CSV, and JSON files:
-
-```ruby
-instances = Weka::Core::Instances.from_arff('weather.arff')
-instances = Weka::Core::Instances.from_csv('weather.csv')
-instances = Weka::Core::Instances.from_json('weather.json')
-```
-
-#### Creating Instances
-
-Attributes of an Instances object can be defined in a block using the `with_attributes` method. The class attribute can be set by the `class_attribute: true` option on the fly with defining an attribute.
-
-```ruby
-# create instances with relation name 'weather' and attributes
-instances = Weka::Core::Instances.new(relation_name: 'weather').with_attributes do
-  nominal :outlook, values: ['sunny', 'overcast', 'rainy']
-  numeric :temperature
-  numeric :humidity
-  nominal :windy, values: [true, false]
-  date :last_storm, 'yyyy-MM-dd'
-  nominal :play, values: [:yes, :no], class_attribute: true
-end
-```
-
-You can also pass an array of Attributes on instantiating new Instances:
-This is useful, if you want to create a new empty Instances object with the same
-attributes as an already existing one:
-
-```ruby
-# Take attributes from existing instances
-attributes = instances.attributes
-
-# create an empty Instances object with the given attributes
-test_instances = Weka::Core::Instances.new(attributes: attributes)
-```
-
-#### Saving Instances as files
-
-You can save Instances as ARFF, CSV, or JSON file.
-
-```ruby
-instances.to_arff('weather.arff')
-instances.to_csv('weather.csv')
-instances.to_json('weather.json')
-```
-
-#### Adding additional attributes
-
-You can add additional attributes to the Instances after its initialization.
-All records that are already in the dataset will get an unknown value (`?`) for
-the new attribute.
-
-```ruby
-instances.add_numeric_attribute(:pressure)
-instances.add_nominal_attribute(:grandma_says, values: [:hm, :bad, :terrible])
-instances.add_date_attribute(:last_rain, 'yyyy-MM-dd HH:mm')
-```
-
-#### Adding a data instance
-
-You can add a data instance to the Instances by using the `add_instance` method:
-
-```ruby
-data = [:sunny, 70, 80, true, '2015-12-06', :yes, 1.1, :hm, '2015-12-24 20:00']
-instances.add_instance(data)
-
-# with custom weight:
-instances.add_instance(data, weight: 2.0)
-```
-
-Multiple instances can be added with the `add_instances` method:
-
-```ruby
-data = [
-  [:sunny, 70, 80, true, '2015-12-06', :yes, 1.1, :hm, '2015-12-24 20:00'],
-  [:overcast, 80, 85, false, '2015-11-11', :no, 0.9, :bad, '2015-12-25 18:13']
-]
-
-instances.add_instances(data, weight: 2.0)
-```
-
-If the `weight` argument is not given, then a default weight of 1.0 is used.
-The weight in `add_instances` is used for all the added instances.
-
-#### Setting a class attribute
-
-You can set an earlier defined attribute as the class attribute of the dataset.
-This allows classifiers to use the class for building a classification model while training.
-
-```ruby
-instances.add_nominal_attribute(:size, values: ['L', 'XL'])
-instances.class_attribute = :size
-```
-
-The added attribute can also be directly set as the class attribute:
-
-```ruby
-instances.add_nominal_attribute(:size, values: ['L', 'XL'], class_attribute: true)
-```
-
-Keep in mind that you can only assign existing attributes to be the class attribute.
-The class attribute will not appear in the `instances.attributes` anymore and can be accessed with the `class_attribute` method.
-
-
-#### Alias methods
-
-`Weka::Core::Instances` has following alias methods:
-
-| method | alias |
-|-----------------------|-------------------------|
-| `numeric` | `add_numeric_attribute` |
-| `nominal` | `add_nominal_attribute` |
-| `date` | `add_date_attribute` |
-| `string` | `add_string_attribute` |
-| `set_class_attribute` | `class_attribute=` |
-| `with_attributes` | `add_attributes` |
-
-The methods on the left side are meant to be used when defining
-attributes in a block when using `#with_attributes` (or `#add_attributes`).
-
-The alias methods are meant to be used for explicitly adding
-attributes to an Instances object or defining its class attribute later on.
-
-## Filters
-
-Filters are used to preprocess datasets.
-
-There are two categories of filters which are also reflected by the namespaces:
-
-* *supervised* – The filter requires a class atribute to be set
-* *unsupervised* – A class attribute is not required to be present
-
-In each category there are two sub-categories:
-
-* *attribute-based* – Attributes (columns) are processed
-* *instance-based* – Instances (rows) are processed
-
-Thus, Filter classes are organized in the following four namespaces:
-
-```ruby
-Weka::Filters::Supervised::Attribute
-Weka::Filters::Supervised::Instance
-
-Weka::Filters::Unsupervised::Attribute
-Weka::Filters::Unsupervised::Instance
-```
-
-#### Filtering Instances
-
-Filters can be used directly to filter Instances:
-
-```ruby
-# create filter
-filter = Weka::Filters::Unsupervised::Attribute::Normalize.new
-
-# filter instances
-filtered_data = filter.filter(instances)
-```
-
-You can also apply a Filter on an Instances object:
-
-```ruby
-# create filter
-filter = Weka::Filters::Unsupervised::Attribute::Normalize.new
-
-# apply filter on instances
-filtered_data = instances.apply_filter(filter)
-```
-
-With this approach, it is possible to chain multiple filters on a dataset:
-
-```ruby
-# create filters
-include Weka::Filters::Unsupervised::Attribute
-
-normalize = Normalize.new
-discretize = Discretize.new
-
-# apply a filter chain on instances
-filtered_data = instances.apply_filter(normalize).apply_filter(discretize)
-
-# or even shorter
-filtered_data = instances.apply_filters(normalize, discretize)
-```
-
-#### Setting Filter options
-
-Any Filter has several options. You can list a description of all options of a filter:
-
-```ruby
-puts Weka::Filters::Unsupervised::Attribute::Normalize.options
-# -S The scaling factor for the output range.
-# (default: 1.0)
-# -T The translation of the output range.
-# (default: 0.0)
-# -unset-class-temporarily Unsets the class index temporarily before the filter is
-# applied to the data.
-# (default: no)
-```
-
-To get the default option set of a Filter you can run `.default_options`:
-
-```ruby
-Weka::Filters::Unsupervised::Attribute::Normalize.default_options
-# => '-S 1.0 -T 0.0'
-```
-
-Options can be set while building a Filter:
-
-```ruby
-filter = Weka::Filters::Unsupervised::Attribute::Normalize.build do
-  use_options '-S 0.5'
-end
-```
-
-Or they can be set or changed after you created the Filter:
-
-```ruby
-filter = Weka::Filters::Unsupervised::Attribute::Normalize.new
-filter.use_options('-S 0.5')
-```
-
-## Attribute selection
-
-Selecting attributes (features) from a set of instances is important
-for getting the best result out of a classification or clustering.
-Attribute selection reduces the number of attributes and thereby can speed up
-the runtime of the algorithms.
-It also avoids processing too many attributes when only a certain subset is essential
-for building a good model.
-
-For attribute selection you need to apply a search and an evaluation method on a dataset.
-
-Search methods are defined in the `Weka::AttributeSelection::Search` module.
-There are search methods for subset search and individual attribute search.
-
-Evaluators are defined in the `Weka::AttributeSelection::Evaluator` module.
-Corresponding to search method types there are two evalutor types for subset search and individual search.
-
-The search methods and evaluators from each category can be combined to perform an attribute selection.
-
-**Classes for attribute *subset* selection:**
-
-| Search | Evaluators |
-|-------------------------------|------------------------------|
-| `BestFirst`, `GreedyStepwise` | `CfsSubset`, `WrapperSubset` |
-
-**Classes for *individual* attribute selection:**
-
-| Search | Evaluators |
-|----------|------------|
-| `Ranker` | `CorrelationAttribute`, `GainRatioAttribute`, `InfoGainAttribute`, `OneRAttribute`, `ReliefFAttribute`, `SymmetricalUncertAttribute` |
-
-An attribute selection can either be performed with the `Weka::AttributeSelection::AttributeSelection` class:
-
-```ruby
-instances = Weka::Core::Instances.from_arff('weather.arff')
-
-selection = Weka::AttributeSelection::AttributeSelection.new
-selection.search = Weka::AttributeSelection::Search::Ranker.new
-selection.evaluator = Weka::AttributeSelection::Evaluator::PricipalComponents.new
-
-selection.select_attribute(instances)
-puts selection.summary
-```
-
-Or you can use the supervised `AttributeSelection` filter to directly filter instances:
-
-```ruby
-instances = Weka::Core::Instances.from_arff('weather.arff')
-search = Weka::AttributeSelection::Search::Ranker.new
-evaluator = Weka::AttributeSelection::Evaluator::PricipalComponents.new
-
-filter = Weka::Filters::Supervised::Attribute::AttributeSelection.build do
-  use_search search
-  use_evaluator evaluator
-end
-
-filtered_instances = instances.apply_filter(filter)
-```
-
-## Classifiers
-
-Weka‘s classification and regression algorithms can be found in the `Weka::Classifiers`
-namespace.
-
-The classifier classes are organised in the following submodules:
-
-```ruby
-Weka::Classifiers::Bayes
-Weka::Classifiers::Functions
-Weka::Classifiers::Lazy
-Weka::Classifiers::Meta
-Weka::Classifiers::Rules
-Weka::Classifiers::Trees
-```
-
-#### Getting information about a classifier
-
-To get a description about the classifier class and its available options
-you can use the class methods `.description` and `.options` on each classifier:
-
-```ruby
-puts Weka::Classifiers::Trees::RandomForest.description
-# Class for constructing a forest of random trees.
-# For more information see:
-# Leo Breiman (2001). Random Forests. Machine Learning. 45(1):5-32.
-
-puts Weka::Classifiers::Trees::RandomForest.options
-# -I Number of trees to build.
-# (default 100)
-# -K Number of features to consider (<1=int(log_2(#predictors)+1)).
-# (default 0)
-# ...
-
-```
-
-The default options that are used for a classifier can be displayed with:
-
-```ruby
-Weka::Classifiers::Trees::RandomForest.default_options
-# => "-I 100 -K 0 -S 1 -num-slots 1"
-```
-
-#### Creating a new classifier
-
-To build a new classifiers model based on training instances you can use
-the following syntax:
-
-```ruby
-instances = Weka::Core::Instances.from_arff('weather.arff')
-instances.class_attribute = :play
-
-classifier = Weka::Classifiers::Trees::RandomForest.new
-classifier.use_options('-I 200 -K 5')
-classifier.train_with_instances(instances)
-```
-You can also build a classifier by using the block syntax:
-
-```ruby
-classifier = Weka::Classifiers::Trees::RandomForest.build do
-  use_options '-I 200 -K 5'
-  train_with_instances instances
-end
-
-```
-
-#### Evaluating a classifier model
-
-You can evaluate the trained classifier using [cross-validation](https://en.wikipedia.org/wiki/Cross-validation_(statistics)):
-
-```ruby
-# default number of folds is 3
-evaluation = classifier.cross_validate
-
-# with a custom number of folds
-evaluation = classifier.cross_validate(folds: 10)
-```
-
-The cross-validation returns a `Weka::Classifiers::Evaluation` object which can be used to get details about the accuracy of the trained classification model:
-
-```ruby
-puts evaluation.summary
-#
-# Correctly Classified Instances 10 71.4286 %
-# Incorrectly Classified Instances 4 28.5714 %
-# Kappa statistic 0.3778
-# Mean absolute error 0.4098
-# Root mean squared error 0.4657
-# Relative absolute error 87.4588 %
-# Root relative squared error 96.2945 %
-# Coverage of cases (0.95 level) 100 %
-# Mean rel. region size (0.95 level) 96.4286 %
-# Total Number of Instances 14
-```
-
-The evaluation holds detailed information about a number of different meassures of interest,
-like the [precision and recall](https://en.wikipedia.org/wiki/Precision_and_recall), the FP/FN/TP/TN-rates, [F-Measure](https://en.wikipedia.org/wiki/F1_score) and the areas under PRC and [ROC](https://en.wikipedia.org/wiki/Receiver_operating_characteristic) curve.
-
-If your trained classifier should be evaluated against a set of *test instances*,
-you can use `evaluate`:
-
-```ruby
-test_instances = Weka::Core::Instances.from_arff('test_data.arff')
-test_instances.class_attribute = :play
-
-evaluation = classifier.evaluate(test_instances)
-```
-
-#### Classifying new data
-
-Each classifier implements either a `classify` method or a `distibution_for` method, or both.
-
-The `classify` method takes a Weka::Core::DenseInstance or an Array of values as argument and returns the predicted class value:
-
-```ruby
-instances = Weka::Core::Instances.from_arff('unclassified_data.arff')
-
-# with an instance as argument
-instances.map do |instance|
-  classifier.classify(instance)
-end
-# => ['no', 'yes', 'yes', ...]
-
-# with an Array of values as argument
-classifier.classify [:sunny, 80, 80, :FALSE, '?']
-# => 'yes'
-```
-
-The `distribution_for` method takes a Weka::Core::DenseInstance or an Array of values as argument as well and returns a hash with the distributions per class value:
-
-```ruby
-instances = Weka::Core::Instances.from_arff('unclassified_data.arff')
-
-# with an instance as argument
-classifier.distribution_for(instances.first)
-# => { "yes" => 0.26, "no" => 0.74 }
-
-# with an Array of values as argument
-classifier.distribution_for [:sunny, 80, 80, :FALSE, '?']
-# => { "yes" => 0.62, "no" => 0.38 }
-```
-
-### Clusterers
-
-Clustering is an unsupervised machine learning technique which tries to find patterns in data and group sets of data. Clustering algorithms work without class attributes.
-
-Weka‘s clustering algorithms can be found in the `Weka::Clusterers` namespace.
-
-The following clusterer classes are available:
-
-```ruby
-Weka::Clusterers::Canopy
-Weka::Clusterers::Cobweb
-Weka::Clusterers::EM
-Weka::Clusterers::FarthestFirst
-Weka::Clusterers::HierarchicalClusterer
-Weka::Clusterers::SimpleKMeans
-```
-
-#### Getting information about a clusterer
-
-To get a description about the clusterer class and its available options
-you can use the class methods `.description` and `.options` on each clusterer:
-
-```ruby
-puts Weka::Clusterers::SimpleKMeans.description
-# Cluster data using the k means algorithm.
-# ...
-
-puts Weka::Clusterers::SimpleKMeans.options
-# -N Number of clusters.
-# (default 2).
-# -init Initialization method to use.
-# 0 = random, 1 = k-means++, 2 = canopy, 3 = farthest first.
-# (default = 0)
-# ...
-```
-
-The default options that are used for a clusterer can be displayed with:
-
-```ruby
-Weka::Clusterers::SimpleKMeans.default_options
-# => "-init 0 -max-candidates 100 -periodic-pruning 10000 -min-density 2.0 -t1 -1.25
-# -t2 -1.0 -N 2 -A weka.core.EuclideanDistance -R first-last -I 500 -num-slots 1 -S 10"
-```
-
-#### Creating a new Clusterer
-
-To build a new clusterer model based on training instances you can use the following syntax:
-
-```ruby
-instances = Weka::Core::Instances.from_arff('weather.arff')
-
-clusterer = Weka::Clusterers::SimpleKMeans.new
-clusterer.use_options('-N 3 -I 600')
-clusterer.train_with_instances(instances)
-```
-
-You can also build a clusterer by using the block syntax:
-
-```ruby
-classifier = Weka::Clusterers::SimpleKMeans.build do
-  use_options '-N 5 -I 600'
-  train_with_instances instances
-end
-```
-
-#### Evaluating a clusterer model
-
-You can evaluate trained density-based clusterer using [cross-validation](https://en.wikipedia.org/wiki/Cross-validation_(statistics)) (The only density-based clusterer in the Weka lib is `EM` at the moment).
-
-The cross-validation returns the cross-validated log-likelihood:
-
-```ruby
-# default number of folds is 3
-log_likelihood = clusterer.cross_validate
-# => -10.556166997137497
-
-# with a custom number of folds
-log_likelihood = clusterer.cross_validate(folds: 10)
-# => -10.262696653333032
-```
-
-If your trained classifier should be evaluated against a set of *test instances*,
-you can use `evaluate`.
-The evaluation returns a `Weka::Clusterer::ClusterEvaluation` object which can be used to get details about the accuracy of the trained clusterer model:
-
-```ruby
-test_instances = Weka::Core::Instances.from_arff('test_data.arff')
-evaluation = clusterer.evaluate(test_instances)
-
-puts evaluation.summary
-# EM
-# ==
-#
-# Number of clusters: 2
-# Number of iterations performed: 7
-#
-# Cluster
-# Attribute 0 1
-# (0.35) (0.65)
-# ==============================
-# outlook
-# sunny 3.8732 3.1268
-# overcast 1.7746 4.2254
-# rainy 2.1889 4.8111
-# [total] 7.8368 12.1632
-# ...
-```
-
-#### Clustering new data
-
-Similar to classifiers, clusterers come with a either a `cluster` method or a `distribution_for` method which both take a Weka::Core::DenseInstance or an Array of values as argument.
-
-The `classify` method returns the index of the predicted cluster:
-
-```ruby
-instances = Weka::Core::Instances.from_arff('unlabeled_data.arff')
-
-clusterer = Weka::Clusterers::Canopy.build
-  train_with_instances instances
-end
-
-# with an instance as argument
-instances.map do |instance|
-  clusterer.cluster(instance)
-end
-# => [3, 3, 4, 0, 0, 1, 2, 3, 0, 0, 2, 2, 4, 1]
-
-# with an Array of values as argument
-clusterer.cluster [:sunny, 80, 80, :FALSE]
-# => 4
-```
-
-The `distribution_for` method returns an Array with the distributions at the cluster‘s index:
-
-```ruby
-# with an instance as argument
-clusterer.distribution_for(instances.first)
-# => [0.17229465277140552, 0.1675583309853506, 0.15089102301329346, 0.3274056122786787, 0.18185038095127165]
-
-# with an Array of values as argument
-classifier.distribution_for [:sunny, 80, 80, :FALSE]
-# => [0.21517055355632506, 0.16012256401406233, 0.17890840384466453, 0.2202344150907843, 0.2255640634941639]
-```
-
-#### Adding a cluster attribute to a dataset
-
-After building and training a clusterer with training instances you can use the clusterer
-in the unsupervised attribute filter `AddCluster` to assign a cluster to each instance of a dataset:
-
-```ruby
-filter = Weka::Filter::Unsupervised::Attribute::AddCluster.new
-filter.clusterer = clusterer
-
-instances = Weka::Core::Instances.from_arff('unlabeled_data.arff')
-clustered_instances = instances.apply_filter(filter)
-
-puts clustered_instances.to_s
-```
-
-`clustered_instance` now has a nominal `cluster` attribute as the last attribute.
-The values of the cluster attribute are the *N* cluster names, e.g. with *N = 2* clusters, the ARFF representation looks like:
-
-```
-...
-@attribute outlook {sunny,overcast,rainy}
-@attribute temperature numeric
-@attribute humidity numeric
-@attribute windy {TRUE,FALSE}
-@attribute cluster {cluster1,cluster2}
-...
-```
-
-Each instance is now assigned to a cluster, e.g.:
-
-```
-...
-@data
-sunny,85,85,FALSE,cluster1
-sunny,80,90,TRUE,cluster1
-...
-```
-
-### Serializing Objects
-
-You can serialize objects with the `Weka::Core::SerializationHelper` class:
-
-```ruby
-# writing an Object to a file:
-Weka::Core::SerializationHelper.write('path/to/file.model', classifier)
-
-# load an Object from a serialized file:
-object = Weka::Core::SerializationHelper.read('path/to/file.model')
-```
-
-Instead of `.write` and `.read` you can also call the aliases `.serialize` and `.deserialize`.
-
-Serialization can be helpful if the training of e.g. a classifier model takes
-some minutes. Instead of running the whole training on instantiating a classifier you
-can speed up this process tremendously by serializing a classifier once it was trained and later load it from the file again.
-
-Classifiers, Clusterers, Instances and Filters also have a `#serialize` method
-which you can use to directly serialize an Instance of these, e.g. for a Classifier:
-
-```ruby
-instances = Weka::Core::Instances.from_arff('weather.arff')
-instances.class_attribute = :play
-
-classifier = Weka::Core::Trees::RandomForest.build do
-  train_with_instances instances
-end
-
-# store trained model as binary file
-classifier.serialize('randomforest.model')
-
-# load Classifier from binary file
-loaded_classifier = Weka::Core::SerializationHelper.deserialize('randomforest.model')
-# => #
-```
+Please refer to [the gem's Wiki](https://github.com/paulgoetze/weka-jruby/wiki) for
+detailed information about how to use Weka with JRuby and some exemplary code snippets.
 
 ## Development
 
diff --git a/lib/weka/core/attribute.rb b/lib/weka/core/attribute.rb
index eba4035..28b7daa 100644
--- a/lib/weka/core/attribute.rb
+++ b/lib/weka/core/attribute.rb
@@ -11,14 +11,14 @@ def values
       # The order of the if statements is important here, because a date is also
       # a numeric.
       def internal_value_of(value)
-        if date?
-          parse_date(value.to_s)
-        elsif numeric?
-          value.to_f
-        elsif nominal?
-          index_of_value(value.to_s)
-        end
+        return value if value.is_a?(Float) && value.nan?
+        return Float::NAN if [nil, '?'].include?(value)
+        return parse_date(value.to_s) if date?
+        return value.to_f if numeric?
+        return index_of_value(value.to_s) if nominal?
       end
     end
+
+    Weka::Core::Attribute.__persistent__ = true
   end
 end
diff --git a/lib/weka/core/dense_instance.rb b/lib/weka/core/dense_instance.rb
index 39e78d0..d06360c 100644
--- a/lib/weka/core/dense_instance.rb
+++ b/lib/weka/core/dense_instance.rb
@@ -7,7 +7,11 @@ class DenseInstance
       java_import "java.text.SimpleDateFormat"
 
       def initialize(data, weight: 1.0)
-        super(weight, data.to_java(:double))
+        if data.kind_of?(Integer)
+          super(data)
+        else
+          super(weight, to_java_double(data))
+        end
       end
 
       def attributes
@@ -30,15 +34,7 @@ def each_attribute_with_index
 
       def to_a
         to_double_array.each_with_index.map do |value, index|
-          attribute = attribute_at(index)
-
-          if attribute.date?
-            format_date(value, attribute.date_format)
-          elsif attribute.numeric?
-            value
-          elsif attribute.nominal?
-            attribute.value(value)
-          end
+          value_from(value, index)
         end
       end
 
@@ -47,6 +43,29 @@
 
       private
 
+      def to_java_double(values)
+        data = values.map do |value|
+          ['?', nil].include?(value) ? Float::NAN : value
+        end
+
+        data.to_java(:double)
+      end
+
+      def value_from(value, index)
+        return '?' if value.nan?
+        return value if dataset.nil?
+
+        attribute = attribute_at(index)
+
+        if attribute.date?
+          format_date(value, attribute.date_format)
+        elsif attribute.numeric?
+          value
+        elsif attribute.nominal?
+          attribute.value(value)
+        end
+      end
+
       def attribute_at(index)
         return attributes[index] unless dataset.class_attribute_defined?
 
diff --git a/lib/weka/core/instances.rb b/lib/weka/core/instances.rb
index 018c3be..5ab45b0 100644
--- a/lib/weka/core/instances.rb
+++ b/lib/weka/core/instances.rb
@@ -171,6 +171,12 @@ def apply_filters(*filters)
         end
       end
 
+      def merge(*instances)
+        instances.inject(self) do |merged_instances, dataset|
+          self.class.merge_instances(merged_instances, dataset)
+        end
+      end
+
       private
 
       def add_attribute(attribute)
diff --git a/lib/weka/version.rb b/lib/weka/version.rb
index 14b3784..159eb8a 100644
--- a/lib/weka/version.rb
+++ b/lib/weka/version.rb
@@ -1,3 +1,3 @@
 module Weka
-  VERSION = "0.2.0"
+  VERSION = "0.3.0"
 end
diff --git a/spec/core/attribute_spec.rb b/spec/core/attribute_spec.rb
index f528fba..fe60c0f 100644
--- a/spec/core/attribute_spec.rb
+++ b/spec/core/attribute_spec.rb
@@ -25,6 +25,18 @@
       it 'should return the value as a float if given as string' do
         expect(attribute.internal_value_of('3.5')).to eq 3.5
       end
+
+      it 'should return NaN if the given value is Float::NAN' do
+        expect(attribute.internal_value_of(Float::NAN)).to be Float::NAN
+      end
+
+      it 'should return NaN if the given value is nil' do
+        expect(attribute.internal_value_of(nil)).to be Float::NAN
+      end
+
+      it 'should return NaN if the given value is "?"' do
+        expect(attribute.internal_value_of("?")).to be Float::NAN
+      end
     end
 
     context 'a nominal attribute' do
@@ -42,6 +54,18 @@
         expect(attribute.internal_value_of(:true)).to eq 0
         expect(attribute.internal_value_of(:false)).to eq 1
       end
+
+      it 'should return NaN if the given value is Float::NAN' do
+        expect(attribute.internal_value_of(Float::NAN)).to be Float::NAN
+      end
+
+      it 'should return NaN if the given value is nil' do
+        expect(attribute.internal_value_of(nil)).to be Float::NAN
+      end
+
+      it 'should return NaN if the given value is "?"' do
+        expect(attribute.internal_value_of("?")).to be Float::NAN
+      end
     end
 
     context 'a data attribute' do
@@ -59,6 +83,18 @@
       it 'should return the right date timestamp value' do
         expect(attribute.internal_value_of(datetime)).to eq unix_timestamp
       end
+
+      it 'should return NaN if the given value is Float::NAN' do
+        expect(attribute.internal_value_of(Float::NAN)).to be Float::NAN
+      end
+
+      it 'should return NaN if the given value is nil' do
+        expect(attribute.internal_value_of(nil)).to be Float::NAN
+      end
+
+      it 'should return NaN if the given value is "?"' do
+        expect(attribute.internal_value_of("?")).to be Float::NAN
+      end
     end
   end
 end
\ No newline at end of file
diff --git a/spec/core/dense_instance_spec.rb b/spec/core/dense_instance_spec.rb
index b4b24e6..d381abc 100644
--- a/spec/core/dense_instance_spec.rb
+++ b/spec/core/dense_instance_spec.rb
@@ -25,6 +25,27 @@
     end
   end
 
+  describe 'instantiation' do
+    describe 'with an Integer value' do
+      it 'should create an instance with only missing values' do
+        values = Weka::Core::DenseInstance.new(2).values
+        expect(values).to eq ['?', '?']
+      end
+    end
+
+    describe 'with an array' do
+      it 'should create an instance with the given values' do
+        values = Weka::Core::DenseInstance.new([1, 2, 3]).values
+        expect(values).to eq [1, 2, 3]
+      end
+
+      it 'should handle "?" values or nil values' do
+        values = Weka::Core::DenseInstance.new([1, '?', nil, 4]).values
+        expect(values).to eq [1, '?', '?', 4]
+      end
+    end
+  end
+
   describe '#to_a' do
     let(:values) { ['rainy',50.0, 50.0,'TRUE','no','2015-12-24 11:11'] }
 
diff --git a/spec/core/instances_spec.rb b/spec/core/instances_spec.rb
index 5dbe674..bbcdb96 100644
--- a/spec/core/instances_spec.rb
+++ b/spec/core/instances_spec.rb
@@ -26,6 +26,7 @@
     it { is_expected.to respond_to :add_instance }
     it { is_expected.to respond_to :apply_filter }
     it { is_expected.to respond_to :apply_filters }
+    it { is_expected.to respond_to :merge }
     it { is_expected.to respond_to :class_attribute= }
     it { is_expected.to respond_to :class_attribute }
 
@@ -412,6 +413,19 @@
 
       expect(subject.instances.last.to_s).to eq data.to_s
     end
+
+    it 'should add a given instance with only missing values' do
+      data = Weka::Core::DenseInstance.new(subject.size)
+      subject.add_instance(data)
+      expect(subject.instances.last.to_s).to eq data.to_s
+    end
+
+    it 'should add a given instance with partly missing values' do
+      data = [:sunny, 70, nil, '?', Float::NAN]
+      subject.add_instance(data)
+
+      expect(subject.instances.last.to_s).to eq 'sunny,70,?,?,?'
+    end
   end
 
   describe '#add_instances' do
@@ -460,4 +474,45 @@
     end
   end
 
+  describe '#merge' do
+    let(:attribute_a) { subject.attributes[0] }
+    let(:attribute_b) { subject.attributes[1] }
+    let(:attribute_c) { subject.attributes[2] }
+
+    let(:instances_a) { Weka::Core::Instances.new(attributes: [attribute_a]) }
+    let(:instances_b) { Weka::Core::Instances.new(attributes: [attribute_b]) }
+    let(:instances_c) { Weka::Core::Instances.new(attributes: [attribute_c]) }
+
+    context 'when merging one instances object' do
+      it 'should call .merge_instances of Weka::Core::Instances' do
+        expect(Weka::Core::Instances)
+          .to receive(:merge_instances)
+          .with(instances_a, instances_b)
+
+        instances_a.merge(instances_b)
+      end
+
+      it 'should return the result of .merge_instances' do
+        merged = double('instances')
+        allow(Weka::Core::Instances).to receive(:merge_instances).and_return(merged)
+
+        expect(instances_a.merge(instances_b)).to eq merged
+      end
+    end
+
+    context 'when merging multiple instances' do
+      it 'should call .merge_instances multiple times' do
+        expect(Weka::Core::Instances).to receive(:merge_instances).twice
+        instances_a.merge(instances_b, instances_c)
+      end
+
+      it 'should return the merged instances' do
+        merged = instances_a.merge(instances_b, instances_c)
+        merged_attributes = [attribute_a, attribute_b, attribute_c]
+
+        expect(merged.attributes).to match_array merged_attributes
+      end
+    end
+  end
+
 end
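The missing-value handling in this change (mapping `nil` and `'?'` to `Float::NAN` before values reach Java) can be illustrated without JRuby or the Weka jars. The following plain-Ruby sketch is an approximation for illustration only. `missing_value?` and `normalize_missing` are hypothetical helpers, not part of the gem. The sketch assumes missing values are encoded as NaN doubles, which is what the specs in this diff expect. It also shows why a NaN guard needs `Float#nan?`: NaN never compares equal to anything, itself included, so `value === Float::NAN` is always false.

```ruby
# Plain-Ruby sketch (no JRuby or Weka needed). The helper names are
# hypothetical; they approximate the normalization done by
# DenseInstance#to_java_double and Attribute#internal_value_of.
MISSING_MARKERS = [nil, '?'].freeze

# NaN never equals itself, so `value === Float::NAN` cannot detect it;
# Float#nan? is the reliable check.
def missing_value?(value)
  return true if MISSING_MARKERS.include?(value)

  value.is_a?(Float) && value.nan?
end

# Replace each missing marker with Float::NAN (the missing-value encoding
# the specs expect) and keep every other value unchanged.
def normalize_missing(values)
  values.map { |value| missing_value?(value) ? Float::NAN : value }
end

row = [:sunny, 70, nil, '?', Float::NAN]
normalized = normalize_missing(row)

puts Float::NAN === Float::NAN # prints "false"

# Render NaN back as '?', the way DenseInstance#to_a reports missing values.
puts normalized.map { |v| (v.is_a?(Float) && v.nan?) ? '?' : v }.join(',')
# prints "sunny,70,?,?,?"
```

The last line mirrors the `return '?' if value.nan?` branch of `value_from`, so the round trip from Ruby input to rendered output can be checked in isolation.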