Sparse concatenate #2286

astaric · 2017-05-05T12:55:25Z

Issue

Add support for sparse tables to Table.concatenate. Fixes #2154.

Description of changes

Modify Table.extend to use vstack when concatenating compatible tables, and scipy.sparse.vstack if any of the arrays are sparse.
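The dispatch described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the helper name `vstack_any` is hypothetical, and converting every block to CSR before stacking is an assumption made here to keep the example simple.

```python
import numpy as np
import scipy.sparse as sp


def vstack_any(arrays):
    """Stack 2-D arrays row-wise (hypothetical helper, not part of Orange).

    Returns a dense ndarray when all inputs are dense, and a CSR matrix
    as soon as at least one input is sparse.
    """
    if any(sp.issparse(a) for a in arrays):
        # Convert every block to CSR first so dense and sparse inputs mix safely.
        return sp.vstack([sp.csr_matrix(a) for a in arrays], format="csr")
    return np.vstack(arrays)


dense = np.array([[1.0, 0.0], [0.0, 2.0]])
sparse = sp.csr_matrix(np.array([[0.0, 3.0]]))

stacked_dense = vstack_any([dense, dense])   # stays a dense ndarray
stacked_mixed = vstack_any([dense, sparse])  # becomes a sparse matrix
```

Keeping the result sparse whenever any input is sparse avoids densifying large tables such as bag-of-words matrices.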

Includes

Code changes
Tests
Documentation

codecov-io · 2017-05-05T16:37:28Z

Codecov Report

Merging #2286 into master will decrease coverage by 0.02%.
The diff coverage is 93.18%.

@@            Coverage Diff             @@
##           master    #2286      +/-   ##
==========================================
- Coverage    73.2%   73.18%   -0.03%     
==========================================
  Files         316      316              
  Lines       55306    55287      -19     
==========================================
- Hits        40488    40462      -26     
- Misses      14818    14825       +7

astaric · 2017-05-08T10:50:32Z

@lanzagar the (randomly) failing test has nothing to do with this PR. I have restarted the build, hopefully, it will pass this time :)

I am tagging you since you have merged the latest changes to the mosaic (#2133). Can you take a look?

nikicc · 2017-05-09T14:52:22Z

I tested it on text and it doesn't crash any more. However, the same strange behaviour described in #2304 occurs with BoW and prevents proper testing.

astaric · 2017-05-09T18:30:17Z

Union and intersection are computed according to the "sameness" of the variable, not equality. Since BoW features have compute_value set, variables from different BoWs are considered different, since there is no way to tell if both compute values are the same.

If we modify the widget to compare only type and name (and values for discrete variables), what do we set for compute_value on the output?
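A toy model of the "sameness" rule may make the problem easier to see. Everything in this sketch is hypothetical: `ToyVariable`, `same_variable` and `intersection` are stand-ins written for illustration and are not Orange's classes or its actual comparison logic. The assumption is only that two variables carrying a compute_value are never treated as the same unless they are the same object.

```python
class ToyVariable:
    """Hypothetical stand-in for a column descriptor (not Orange's Variable)."""

    def __init__(self, name, compute_value=None):
        self.name = name
        self.compute_value = compute_value


def same_variable(a, b):
    """Toy sameness rule: same object, or same name with no compute_value."""
    if a is b:
        return True
    if a.compute_value is not None or b.compute_value is not None:
        # Two transformations cannot be proven equivalent, so assume they differ.
        return False
    return a.name == b.name


def intersection(vars_a, vars_b):
    """Keep the variables from vars_a that have a 'same' partner in vars_b."""
    return [a for a in vars_a if any(same_variable(a, b) for b in vars_b)]


# Two bag-of-words outputs with an identically named feature.
bow_1 = [ToyVariable("word", compute_value=lambda doc: doc.count("word"))]
bow_2 = [ToyVariable("word", compute_value=lambda doc: doc.count("word"))]
plain_1 = [ToyVariable("age")]
plain_2 = [ToyVariable("age")]

shared_bow = intersection(bow_1, bow_2)        # nothing is shared
shared_plain = intersection(plain_1, plain_2)  # "age" is shared
```

Under this toy rule the two BoW outputs share no features even though the names match, which mirrors the behaviour reported in #2304.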

nikicc

Looks good to me.

nikicc · 2017-05-17T11:36:14Z

Orange/data/table.py

                    (len(attr_cols) or len(class_cols)):
                raise TypeError(
                    "Ordinary attributes can only have primitive values")
            if len(attr_cols):
-                if len(attr_cols) == 1:
+                if sp.issparse(self.X) and len(attr_cols) == 1:


Why do we need this? Scipy seems to be able to assign 2d arrays to columns regardless of whether attr_cols = 2 or attr_cols = [2]. Is this here just for efficiency?

>>> import numpy as np
>>> import scipy.sparse as sp
>>> x = sp.csr_matrix(np.eye(2))
>>> x.toarray()
array([[ 1., 0.],
       [ 0., 1.]])
>>> x[:, [1]] = np.array([[2], [2]])
/Users/Niko/anaconda/envs/orange3/lib/python3.5/site-packages/scipy/sparse/compressed.py:774: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
  SparseEfficiencyWarning)
>>> x.toarray()
array([[ 1., 2.],
       [ 0., 2.]])
>>> x[:, 1] = np.array([[3], [3]])
>>> x.toarray()
array([[ 1., 3.],
       [ 0., 3.]])

astaric

The original check has been here for a long time :)

I have removed the if completely.

nikicc · 2017-05-17T11:39:11Z

Orange/widgets/utils/annotated_data.py

@@ -6,6 +6,30 @@
 ANNOTATED_DATA_FEATURE_NAME = "Selected"


+def add_columns(domain, attributes=(), class_vars=(), metas=()):


Should this maybe be put in domain.py?

astaric force-pushed the sparse-concatenate branch from aeee778 to dd9ee3a Compare May 5, 2017 16:37

astaric force-pushed the sparse-concatenate branch from dd9ee3a to 0759397 Compare May 8, 2017 10:20

jerneju mentioned this pull request May 12, 2017

[FIX] Merge: work with sparse #2305

Merged

3 tasks

mstrazar mentioned this pull request May 12, 2017

Venn Diagram should work on sparse #2164

Closed

astaric force-pushed the sparse-concatenate branch from 0759397 to eb3d9d7 Compare May 16, 2017 07:53

nikicc approved these changes May 17, 2017

astaric force-pushed the sparse-concatenate branch from eb3d9d7 to 344dcfc Compare May 18, 2017 06:22

astaric added 6 commits May 18, 2017 08:24

OWConcatenate: Use Table.concatenate

ed6f18a

Table: use vstack for row-concatenate

ddcb2cd

Use vstack that works with both sparse and dense tables.

table_tests: Remove test for undocumented behavior

e04cd8c

Table: Relax checks in __setitem__

6736b30

- Allow setting of whole columns/rows in X and Y (when setting np.ndarray)
- Convert indices for sparse tables only. Otherwise columns in X and Y have to be set with 1d array and columns in metas with 2d array.

utils: add add_column function

c244ccc

Concatenate: Refactor appending of source id

f337bf6

Avoid using Table.from_numpy to make it compatible with classes extending Table.

astaric force-pushed the sparse-concatenate branch from 344dcfc to f337bf6 Compare May 18, 2017 06:24

nikicc merged commit 696e786 into biolab:master May 18, 2017

nikicc mentioned this pull request May 19, 2017

Initialization and resizing sparse Tables #2294

Closed

3 tasks

astaric deleted the sparse-concatenate branch September 8, 2017 08:32
