Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] Feature Constructor: Vectorize when possible #5949

Closed
wants to merge 2 commits into from

Conversation

janezd
Copy link
Contributor

@janezd janezd commented Apr 20, 2022

Issue

Resolves #5911.

Description of changes
  • Try to compute the function on columns; fall back to per-instance if this fails.
  • Import functions from np instead of from math when available.
  • Show all functions in combo (avoid compiling a separate list of functions)
  • Remove cumsum, cumprod, which created two-dimensional data instead of a column
Includes
  • Code changes
  • Tests

@janezd janezd added the needs discussion Core developers need to discuss the issue label Apr 20, 2022
@janezd janezd force-pushed the featureconstructor-vectorize branch from b9e2847 to 449bb82 Compare April 21, 2022 20:36
@codecov
Copy link

codecov bot commented Apr 21, 2022

Codecov Report

Merging #5949 (7a2b9e8) into master (647f5da) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master    #5949   +/-   ##
=======================================
  Coverage   86.28%   86.28%           
=======================================
  Files         315      315           
  Lines       66906    66920   +14     
=======================================
+ Hits        57728    57745   +17     
+ Misses       9178     9175    -3     

@ajdapretnar
Copy link
Contributor

This works for my limited amount of use cases.

@markotoplak
Copy link
Member

I was looking at math vs numpy docs and did not see any cases where I would predict conflicts. That does not mean they aren't any.

@janezd also raised the question about import everything from math. I would rather name a subset of functions, perhaps those that are already available in the python 3.10, so that we do not break any current workflows. Imports should be "soft" since not all of them are available on oldest supported python.

@janezd janezd force-pushed the featureconstructor-vectorize branch from ebd5967 to 76822b9 Compare April 22, 2022 17:49
@janezd
Copy link
Contributor Author

janezd commented Apr 22, 2022

@markotoplak, I checked the functions in math and all seem useful, so I'd keep it the selection it is.

As for reporting warnings and errors in expressions: the problem here is that the user can write an expression that cannot be evaluated by numpy, in which case we fall back to per-instance evaluation. To be on the safe side, I'd fall back at any sign of trouble, that is, at any warning or error. When this happen, the execution will be slower (that is: as slow as before this PR) and any errors will be properly reported.

@janezd janezd force-pushed the featureconstructor-vectorize branch 3 times, most recently from add7617 to 64d5fc8 Compare April 22, 2022 19:14
@janezd janezd marked this pull request as ready for review April 22, 2022 19:15
@janezd janezd changed the title [RFC] [ENH] Feature Constructor: Vectorize when possible [ENH] Feature Constructor: Vectorize when possible Apr 22, 2022
@janezd janezd force-pushed the featureconstructor-vectorize branch from 64d5fc8 to 7a2b9e8 Compare April 22, 2022 20:27
@janezd janezd removed the needs discussion Core developers need to discuss the issue label Apr 22, 2022
@janezd janezd assigned ales-erjavec and unassigned markotoplak and lanzagar May 13, 2022
@ales-erjavec
Copy link
Contributor

The random functions (gauss, normalvariate, ...) no longer work on per instance basis. Single generated value is used for the whole column (e.g. sepal_length + gauss(0, 1) or just gauss(0, 1))

@janezd
Copy link
Contributor Author

janezd commented May 17, 2022

Closed in favor of #5975.

@janezd janezd closed this May 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Performance Issue with Feature Constructor
5 participants