[WIP][ENH] Scatterplot, HeatMap, TreeGraph, ConfusionMatrix and Unsupervised widgets: Output Flagged Data #1655

VesnaT · 2016-10-12T10:40:02Z

No description provided.

janezd · 2016-10-12T10:47:02Z

I haven't yet checked the content of the PR (obviously), but before we merge it we should perhaps talk about the signal name again. I don't think that the verb "to flag" appears in the dictionary with that meaning. Even the use of the word "flag" for some kind of marker is IMHO limited to computer science, so the name "flagged data" wouldn't mean anything to an outsider.

What is wrong with "marked"?

Vesna, sorry if this will require some (hopefully trivial) refactoring.

codecov-io · 2016-10-12T10:47:17Z

Current coverage is 89.38% (diff: 100%)

Merging #1655 into master will increase coverage by <.01%

@@             master      #1655   diff @@
==========================================
  Files            79         79          
  Lines          8589       8593     +4   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits           7677       7681     +4   
  Misses          912        912          
  Partials          0          0

Powered by Codecov. Last update 4d1ea03...5610750

VesnaT · 2016-10-12T10:55:55Z

No worries. I've intentionally made this PR 'small' (only four widgets are modified).

janezd · 2016-10-14T14:35:24Z

Orange/misc/flagged_data.py

+    if name not in names:
+        return name
+    counts = [int(re.match(r"(" + name + " )(\d{1,}$)", n).group(2))
+              for n in names if re.match(r"(" + name + " )(\d{1,}$)", n)] + [1]


Something like this:

counts = max((int(mo.group(2)) for mo in (re.match(r"(" + name + " )(\d{1,}$)", n) for n in names) if mo), default=0)

Matter of taste: I'd write r"({} )(\d{{1,}}$)".format(name)
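
A minimal sketch of the idea, assuming the helper should return `name` when it is free and otherwise `name` followed by one more than the highest number already in use. The function name `get_next_name`, its signature, and the final formatting step are assumptions for illustration and may differ from the PR; `re.escape` is added so that names containing regex metacharacters are matched literally.

```python
import re


def get_next_name(names, name):
    # Hypothetical helper: name and signature are illustrative only.
    if name not in names:
        return name
    pattern = re.compile(r"{} (\d+)$".format(re.escape(name)))
    matches = (pattern.match(n) for n in names)
    # default=1 treats the bare `name` as number 1, like `+ [1]` in the diff.
    highest = max((int(mo.group(1)) for mo in matches if mo), default=1)
    return "{} {}".format(name, highest + 1)
```

With `default=0`, as in the one-liner, the first duplicate would presumably be numbered 1 rather than 2, so the choice of default is part of the naming convention.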

janezd · 2016-10-14T14:54:34Z

Orange/tree.py

@@ -228,6 +228,11 @@ def get_instances(self, nodes):
        if subsets:
            return self.instances[np.unique(np.hstack(subsets))]

+    def get_indices(self, nodes):


get_instances could call this function. Don't forget to handle the case when get_indices returns None.
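
A rough sketch of that refactoring, assuming each node maps to an array of row indices. `TreeSketch` and its `subsets_by_node` argument are hypothetical stand-ins, not the actual class in Orange/tree.py; only the relationship between the two methods is the point.

```python
import numpy as np


class TreeSketch:
    # Hypothetical stand-in for the tree model; only the two methods matter.
    def __init__(self, instances, subsets_by_node):
        self.instances = instances
        self._subsets = subsets_by_node

    def get_indices(self, nodes):
        subsets = [self._subsets[node] for node in nodes]
        if subsets:
            return np.unique(np.hstack(subsets))
        return None  # made explicit: no nodes, no indices

    def get_instances(self, nodes):
        indices = self.get_indices(nodes)
        # Guard against None before indexing the data.
        if indices is not None:
            return self.instances[indices]
        return None
```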

janezd · 2016-10-14T16:41:49Z

Orange/tests/test_misc.py

+        self.assertEqual(len(flagged), len(self.zoo))
+        self.assertEqual(0, np.sum([i[FLAGGED_FEATURE_NAME] for i in flagged]))
+
+    def test_cascade_flagged_tables(self):


I could be too smart by half and replace your function that uses regular expressions with one that uses the first unoccupied name. This test wouldn't fail because there are no "holes". Can you add some code to this test to remove the second meta and then "flag" the table again, so my smart idea would fail?
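
To illustrate why the "hole" matters, here is a sketch with two hypothetical implementations: one that takes one past the highest number (assumed to be what the regex version does) and one that takes the first unoccupied name. The feature name "Selected" is a placeholder.

```python
import re


def max_based_name(names, name):
    # Assumed behaviour of the regex version: one past the highest number.
    if name not in names:
        return name
    numbers = [int(mo.group(1)) for mo in
               (re.match(re.escape(name) + r" (\d+)$", n) for n in names) if mo]
    return "{} {}".format(name, max(numbers + [1]) + 1)


def first_unoccupied_name(names, name):
    # The alternative from the comment: take the first free slot.
    if name not in names:
        return name
    k = 2
    while "{} {}".format(name, k) in names:
        k += 1
    return "{} {}".format(name, k)


# Names as they might look after flagging three times and dropping the second meta.
with_hole = ["Selected", "Selected 3"]
assert max_based_name(with_hole, "Selected") == "Selected 4"
assert first_unoccupied_name(with_hole, "Selected") == "Selected 2"

# Without a hole the two implementations are indistinguishable.
no_hole = ["Selected", "Selected 2"]
assert max_based_name(no_hole, "Selected") == first_unoccupied_name(no_hole, "Selected")
```

Only the input with a hole tells the two apart, which is why the test needs it.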

janezd · 2016-10-14T16:56:16Z

Orange/misc/flagged_data.py

+import numpy as np
+from Orange.data import Table, Domain, DiscreteVariable
+
+FLAGGED_SIGNAL_NAME = "Flagged Data"


Thinking about it again, I started liking the constant because it will indeed make us stick to the same name in all widgets.

The name is not perfect, though. Not only will we change "flagged" to something else, but it also suggests that it is a name of a flagged signal. ANNOTATED_DATA_SIGNAL_NAME = "Data" is better (than ANNOTATED_SIGNAL_NAME) but awfully long. Think about it...

If nothing else, the module belongs within Orange.widgets because it includes a name of the signal, hence it is related to widgets. :)

janezd · 2016-10-14T17:21:08Z

Orange/widgets/unsupervised/tests/test_owdistancemap.py

+        self.assertEqual(0, np.sum([i[FLAGGED_FEATURE_NAME] for i in flagged]))
+
+        # select data points
+        points = random.sample(range(0, len(self.iris)), 20)


I prefer deterministic tests. Randomly choose some indices instead of choosing some indices at random. (https://xkcd.com/221/).
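
Two ways this could look, sketched under the assumption that the table has 150 rows (as iris does); the concrete index values are arbitrary placeholders.

```python
import random

N_INSTANCES = 150  # assumed size of the iris table used in the test

# Option 1: indices picked once, "at random", and then hard-coded.
points = [3, 17, 42, 58, 64, 71, 99, 101, 120, 149]

# Option 2: a private, seeded generator gives the same sample on every run.
rng = random.Random(0)
seeded_points = rng.sample(range(N_INSTANCES), 20)
```

Either way a failing run can be reproduced exactly, which is not the case with an unseeded `random.sample`.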

janezd · 2016-10-14T17:23:43Z

Orange/widgets/unsupervised/tests/test_owdistancemap.py

+
+        # check selected data output
+        selected = self.get_output("Data")
+        self.assertEqual(len(selected), len(points))


What about testing that the correct instances were chosen? There may be a better way to do it, but I somewhere used something like np.testing.assert_almost_equal(selected.X, self.iris.X[points]).
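
A small sketch of such a check, with a plain NumPy array standing in for `self.iris.X` and for the widget output. It assumes the output keeps the rows in the order of `points`; if the widget returns them in table order, the indices would need to be sorted first.

```python
import numpy as np

# Stand-in for self.iris.X; in the real test the table comes from the widget.
X = np.arange(40, dtype=float).reshape(10, 4)
points = [1, 4, 7]

# Stand-in for the widget's output: rows in the order the indices were given.
selected_X = X[points]

# Length alone would also pass for the wrong rows; comparing values does not.
assert len(selected_X) == len(points)
np.testing.assert_almost_equal(selected_X, X[points])
```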

janezd · 2016-10-14T17:25:56Z

Orange/widgets/unsupervised/tests/test_owmds.py

+        self.assertEqual(0, np.sum([i[FLAGGED_FEATURE_NAME] for i in flagged]))
+
+        # select data points
+        points = random.sample(range(0, len(self.iris)), 20)


Same as in DistanceMap.

janezd · 2016-10-14T17:43:56Z

GitHub is showing some of my comments as outdated, although you haven't made any further commits. Please check those, too.

Apart from these minor suggestions, I like the PR, in particular your factoring out of the parts of the tests. It would be even better if you could simulate, say, a selection action, but I know this is probably too hard.

Please tell me/us when you make the changes, so we don't wait too long with merging, since rebasing dozens of widgets is not that much fun.

VesnaT · 2016-10-17T07:42:57Z

The comments are outdated because the code was moved to a Mixin in one of the following commits (441b15a), since it was very similar for all widgets. I could have rebased, but wanted to keep the code in case someone didn't like the Mixin idea.
I thought I did simulate the selection...

VesnaT · 2016-10-18T09:31:49Z

Done.

janezd · 2016-10-18T18:20:36Z

Orange/widgets/utils/annotated_data.py

+    domain = Domain(data.domain.attributes, data.domain.class_vars, metas)
+    annotated = np.zeros((len(data), 1))
+    if selected_indices is not None:
+        annotated[selected_indices] = 1


Should this be 0 or 1? If nothing is selected, all instances have to have Selected=No, no?

I'm stupid. Please ignore.
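
For reference, a sketch of what the quoted lines do, isolated into a hypothetical helper (`selection_column` is not a function from the PR): the column starts as all zeros, so an empty or missing selection leaves every row unselected.

```python
import numpy as np


def selection_column(n_rows, selected_indices):
    # Hypothetical helper mirroring the lines from the diff.
    annotated = np.zeros((n_rows, 1))
    if selected_indices is not None:
        annotated[selected_indices] = 1
    return annotated


assert selection_column(4, None).sum() == 0   # nothing selected
assert selection_column(4, []).sum() == 0     # empty selection
assert selection_column(4, [1, 3]).ravel().tolist() == [0, 1, 0, 1]
```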

janezd · 2016-10-18T18:33:08Z

Orange/widgets/evaluate/tests/test_owconfusionmatrix.py

+        selected = [i for i, t in enumerate(zip(
+            self.widget.results.actual, self.widget.results.predicted[0]))
+                    if t in indices]
+        self.selected_indices = self.widget.results.row_indices[selected]


Would it be nicer if _select_data returned selected_indices instead of (ab?)using the instance's attributes for semi-global data storage?
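
A sketch of the alternative, assuming the helper receives the selected (actual, predicted) cells. `WidgetTestSketch` and its attributes are hypothetical stand-ins for the test class and for `self.widget.results`.

```python
class WidgetTestSketch:
    # Hypothetical test helper; `actual`, `predicted` and `row_indices`
    # stand in for the attributes of self.widget.results in the diff.
    def __init__(self, actual, predicted, row_indices):
        self.actual = actual
        self.predicted = predicted
        self.row_indices = row_indices

    def _select_data(self, cells):
        # cells: (actual, predicted) pairs marking the selected matrix cells
        selected = [i for i, pair in enumerate(zip(self.actual, self.predicted))
                    if pair in cells]
        # Return the indices instead of storing them on the instance.
        return [self.row_indices[i] for i in selected]
```

The caller then keeps the result in a local variable, e.g. `selected_indices = self._select_data(cells)`, and no state lingers between tests.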

janezd · 2016-10-18T18:43:25Z

Orange/widgets/tests/base.py

+        self.same_input_output_domain = True
+        self.selected_indices = []
+
+    def test_outputs(self):


I like the way you factor out the tests.

janezd · 2016-10-18T18:46:58Z

Orange/widgets/tests/base.py

+    def _compare_selected_annotated_domains(self, selected, annotated):
+        selected_vars = selected.domain.variables + selected.domain.metas
+        annotated_vars = annotated.domain.variables + annotated.domain.metas
+        self.assertTrue(all((var in annotated_vars for var in selected_vars)))


This tests whether selected.domain.variables + selected.domain.metas is a subset (<=) of annotated.domain.variables + annotated.domain.metas. Doing it explicitly, using sets, would be more obvious and easier to read, I guess.
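
A sketch of the set-based form of the check in the diff, namely that every variable of the selected table also appears in the annotated table. The helper name is hypothetical and strings stand in for variables; it assumes the variable objects are hashable.

```python
def assert_domain_subset(selected_vars, annotated_vars):
    # Hypothetical helper: every variable of the selected table must also
    # be present in the annotated table.
    missing = set(selected_vars) - set(annotated_vars)
    assert set(selected_vars) <= set(annotated_vars), "missing: {}".format(missing)


assert_domain_subset(["sepal length", "iris"],
                     ["sepal length", "iris", "Selected"])
```

Inside a `unittest.TestCase`, `self.assertLessEqual(set(selected_vars), set(annotated_vars))` should express the same thing, since it relies on the `<=` operator.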

janezd · 2016-10-18T19:00:07Z

Orange/widgets/unsupervised/owhierarchicalclustering.py

@@ -1092,10 +1099,12 @@ def commit(self):

        if not selected_indices:
            self.send("Selected Data", None)
-            self.send("Other Data", None)
+            annotated_data = create_annotated_table(items, selected_indices) \


Not really an issue, but since you're going to make another commit anyway, can you replace selected_indices with [], so that it's obvious this call will always select all (or no) instances.

janezd · 2016-10-18T19:03:37Z

Orange/widgets/unsupervised/owhierarchicalclustering.py

-                unselected_data = data[~mask]
+                if self.append_clusters:
+                    def remove_other_value(vars_):
+                        vars_ = [var for var in vars_]


Why not copy?
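
For illustration, assuming `vars_` may be a tuple (which has no `.copy()` method), `list(vars_)` gives the same shallow copy as the comprehension; the values below are placeholders.

```python
vars_ = ("sepal length", "Cluster")  # placeholder values

by_comprehension = [var for var in vars_]
by_constructor = list(vars_)

# Both produce a new list with the same elements.
assert by_comprehension == by_constructor
assert by_constructor is not vars_
```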

janezd

I went through all changes and widgets. Since none of my suggestions are substantial, I'm approving the request, but you can still follow them if you decide so.

VesnaT · 2016-10-19T08:55:28Z

I will address the suggestions in the next PR.

VesnaT changed the title ~~[ENH] Scatterplot and Unsupervised widgets: Output Flagged Data~~ [WIP][ENH] Scatterplot and Unsupervised widgets: Output Flagged Data Oct 12, 2016

VesnaT force-pushed the flagged_data branch from a05f360 to 924dde3 Compare October 13, 2016 08:42

VesnaT changed the title ~~[WIP][ENH] Scatterplot and Unsupervised widgets: Output Flagged Data~~ [ENH] Scatterplot and Unsupervised widgets: Output Flagged Data Oct 13, 2016

VesnaT force-pushed the flagged_data branch 4 times, most recently from cd69ccc to cafa07c Compare October 14, 2016 12:38

VesnaT changed the title ~~[ENH] Scatterplot and Unsupervised widgets: Output Flagged Data~~ [ENH] Scatterplot, TreeGraph, ConfusionMatrix and Unsupervised widgets: Output Flagged Data Oct 14, 2016

VesnaT force-pushed the flagged_data branch from cafa07c to 7dc0295 Compare October 14, 2016 13:10

VesnaT changed the title ~~[ENH] Scatterplot, TreeGraph, ConfusionMatrix and Unsupervised widgets: Output Flagged Data~~ [ENH] Scatterplot, ConfusionMatrix and Unsupervised widgets: Output Flagged Data Oct 14, 2016

VesnaT changed the title ~~[ENH] Scatterplot, ConfusionMatrix and Unsupervised widgets: Output Flagged Data~~ [ENH] Scatterplot, TreeGraph, ConfusionMatrix and Unsupervised widgets: Output Flagged Data Oct 14, 2016

VesnaT force-pushed the flagged_data branch from ceb2a5e to b232484 Compare October 14, 2016 14:21

VesnaT changed the title ~~[ENH] Scatterplot, TreeGraph, ConfusionMatrix and Unsupervised widgets: Output Flagged Data~~ [ENH] Scatterplot, HeatMap, TreeGraph, ConfusionMatrix and Unsupervised widgets: Output Flagged Data Oct 14, 2016

janezd reviewed Oct 14, 2016

VesnaT changed the title ~~[ENH] Scatterplot, HeatMap, TreeGraph, ConfusionMatrix and Unsupervised widgets: Output Flagged Data~~ [WIP][ENH] Scatterplot, HeatMap, TreeGraph, ConfusionMatrix and Unsupervised widgets: Output Flagged Data Oct 17, 2016

VesnaT force-pushed the flagged_data branch from b232484 to 8246eb9 Compare October 17, 2016 15:20

VesnaT added 3 commits October 18, 2016 09:53

misc: Add a module for 'Flagged Data' creation

6e29391

OWScatterPlot: Output Flagged Data

eafb96a

OWScatterPlot: Remove Other Data output

65b7525

VesnaT added 3 commits October 18, 2016 09:53

OWScatterPlot: Refactor send_data()

ac9c422

OWHierarchicalClustering: Output Flagged Data

f62e4e9

OWHierarchicalClustering: Remove Other Data output

dd266ba

VesnaT force-pushed the flagged_data branch from 8246eb9 to 24405e0 Compare October 18, 2016 08:00

VesnaT added 5 commits October 18, 2016 10:17

OWHierarchicalClustering: Set Outputs to None when data is removed

762d1d4

OWHierarchicalClustering: Remove 'Other' value from Cluster variable

b2553ab

Since there are only clusters in Selected Data, 'Other' should be removed from its domain. The value is still present in Flagged Data domain.

OWDistanceMap: Output Flagged Data

8e422a0

OWDistanceMap: Rename Data to Selected Data

793e81f

OWMDS: Output Flagged Data instead of Data

83b7bfd

VesnaT force-pushed the flagged_data branch from 24405e0 to 050aa33 Compare October 18, 2016 08:17

VesnaT added 4 commits October 18, 2016 11:19

Unittests: Refactoring

69ceb53

OWConfusionMatrix: Output Flagged Data

deb8585

OWTreeGraph: Output Flagged Data

4435821

OWHeatMap: Output Flagged Data

5610750

VesnaT force-pushed the flagged_data branch from 050aa33 to 5610750 Compare October 18, 2016 09:19

janezd reviewed Oct 18, 2016

janezd approved these changes Oct 18, 2016

janezd mentioned this pull request Oct 18, 2016

[ENH] Canvas: Always show the link dialog if the user holds Shift #1673

Merged

astaric merged commit caa0ff2 into biolab:master Oct 19, 2016

nikicc mentioned this pull request Nov 14, 2016

OWConfusionMatrix: Fix predicitons order #1751

Closed

3 tasks

astaric modified the milestone: 3.3.9 Nov 28, 2016