-
-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sparse concatenate #2286
Sparse concatenate #2286
Conversation
aeee778
to
dd9ee3a
Compare
Codecov Report
@@ Coverage Diff @@
## master #2286 +/- ##
==========================================
- Coverage 73.2% 73.18% -0.03%
==========================================
Files 316 316
Lines 55306 55287 -19
==========================================
- Hits 40488 40462 -26
- Misses 14818 14825 +7 |
dd9ee3a
to
0759397
Compare
@lanzagar the (randomly) failing test has nothing to do with this PR. I have restarted the build, hopefully, it will pass this time :) I am tagging you since you have merged the latest changes to the mosaic (#2133). Can you take a look? |
I tested it on text and it doesn't crash any more. Though, the same strange behaviour as described in #2304 happens on BoW and prevents to properly test. |
Union and intersection are computed according to the "sameness" of the variable, not equality. Since BoW features have compute_value set, variables from different BoWs are considered different, since there is no way to tell if both compute values are the same. If we modify the widget to only compare type, name, (and values for discrete variables), what do we set for compute_value on the output? |
0759397
to
eb3d9d7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
Orange/data/table.py
Outdated
(len(attr_cols) or len(class_cols)): | ||
raise TypeError( | ||
"Ordinary attributes can only have primitive values") | ||
if len(attr_cols): | ||
if len(attr_cols) == 1: | ||
if sp.issparse(self.X) and len(attr_cols) == 1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this? Scipy seems to be able to assign 2d arrays to columns regardless if attr_cols = 2
or attr_cols = [2]
. Is this here just for efficiency?
>>> import numpy as np
>>> import scipy.sparse as sp
>>> x = sp.csr_matrix(np.eye(2))
>>> x.toarray()
array([[ 1., 0.],
[ 0., 1.]])
>>> x[:, [1]] = np.array([[2], [2]])
/Users/Niko/anaconda/envs/orange3/lib/python3.5/site-packages/scipy/sparse/compressed.py:774: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
SparseEfficiencyWarning)
>>> x.toarray()
array([[ 1., 2.],
[ 0., 2.]])
>>> x[:, 1] = np.array([[3], [3]])
>>> x.toarray()
array([[ 1., 3.],
[ 0., 3.]])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original check has been here for a long time :)
I have removed the if completely.
@@ -6,6 +6,30 @@ | |||
ANNOTATED_DATA_FEATURE_NAME = "Selected" | |||
|
|||
|
|||
def add_columns(domain, attributes=(), class_vars=(), metas=()): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this maybe be put in domain.py
?
eb3d9d7
to
344dcfc
Compare
Use vstack that works with both sparse and dense tables.
- Allow setting of whole columns/rows in X and Y (when setting np.ndarray) - Convert indices for sparse tables only. Otherwise columns in X and Y have to be set with 1d array and columns in metas with 2d array.
Use avoid using Table.from_numpy to make it compatible with classes extending Table.
344dcfc
to
f337bf6
Compare
Issue
Add support for sparse tables to Table.concatenate. Fixes #2154.
Description of changes
Modify Table.extend to use vstack to concatenate compatible table, use scipy.sparse.vstack if some of the arrays are sparse.
Includes