Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] Nice binning of time variables (Distributions, SOM) #4123

Merged
merged 2 commits into from
Oct 22, 2019

Conversation

janezd
Copy link
Contributor

@janezd janezd commented Oct 20, 2019

Fixes #3964.

Description of changes
  • provides a general time_binning function, similar to decimal_binning, which can be used in any visualization widget that discretizes numeric features
  • reverses the order of returned binnings in decimal_binning
  • improves the BinDefinition
  • uses the above in Distributions in SOM (the two widgets that already use decimal_binning)
Includes
  • Code changes
  • Tests
  • Documentation

@janezd janezd force-pushed the discretize-datetime-labels branch from cc749d9 to f7d6551 Compare October 20, 2019 17:28
@codecov
Copy link

codecov bot commented Oct 20, 2019

Codecov Report

Merging #4123 into master will increase coverage by 0.03%.
The diff coverage is 93.1%.

@@            Coverage Diff             @@
##           master    #4123      +/-   ##
==========================================
+ Coverage   85.62%   85.66%   +0.03%     
==========================================
  Files         388      389       +1     
  Lines       69435    69638     +203     
==========================================
+ Hits        59454    59654     +200     
- Misses       9981     9984       +3

@janezd janezd force-pushed the discretize-datetime-labels branch 3 times, most recently from 0b3c15d to 947facb Compare October 21, 2019 10:44
@ajdapretnar
Copy link
Contributor

I ❤️ it!

@ajdapretnar
Copy link
Contributor

So far I was not able to find any faults with it. It is responsive, labels are correct, output is as expected. This makes Distributions really useful for visualizing datetime data!

The only thing that doesn't seem to be implemented yet is the legend in SOM - it shows floats for datetime.

self.colors = [QColor(*color) for color in self.attr_color.colors]
else:
col = self.data.get_column_view(self.attr_color)[0].astype(float)
self.thresholds = decimal_binnings(col, min_bins=4)[0][1:-1]
binning = decimal_binnings(col, min_bins=4)[-1]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you forget to check whether self.attr_color is a Time Variable. This doesn't use time_binnings.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops. I just ported it to new signature of decimal_binning. It works now.

I'm not sure I like the legend. It uses short labels.

Screenshot 2019-10-21 at 16 32 15

Perhaps the lower bound for each interval should be in the long form. That is, the last two should be 17 Apr - Jul and >= 17 Jul. The second would be in short form, except of course when the longer is needed, like in 16 Oct - 17 Jan.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like this.

Screenshot 2019-10-21 at 19 27 15


if not return_defs:
bins = [get_bins(bin) for bin in bins]
def time_binnings(data, *, min_bins=2, max_bins=50, min_unique=5, add_unique=0):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The output of time_binnings and decimal_binnings are not the same. The latter uses BinDefinition, which has thresholds and labels properties, while the former doesn't.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can't be. OWDistributions calls either time_binnings or decimal_binnings, stores the result and then doesn't care where the bins came from.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But OWSOM doesn't.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was OWSOM's problem. Fixed.

@janezd janezd force-pushed the discretize-datetime-labels branch 2 times, most recently from 97ff40b to 04d9602 Compare October 21, 2019 14:30
@janezd janezd force-pushed the discretize-datetime-labels branch from 94fc142 to 29f8b43 Compare October 21, 2019 19:49
@ajdapretnar ajdapretnar merged commit 4b3f94c into biolab:master Oct 22, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Distributions: binning datetime
2 participants