Add k nearest neighbors and cross validation bandwidth methods #97
Two automatic bandwidth selection methods were added:
- k nearest neighbors
- cross-validated bandwidth

The second is based on a grid_search_cv method, which computes the score over a grid of bandwidths. A score method was added to BaseKDE for that purpose.
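For readers unfamiliar with the first method, here is a minimal sketch of one common reading of a k-nearest-neighbors bandwidth: each data point gets its own bandwidth, equal to the distance to its k-th nearest neighbor. This is an illustration under that assumption, not the code in this PR; the function name `knn_bandwidths` and the restriction to 1D data are hypothetical.

```python
def knn_bandwidths(data, k=2):
    """Return one bandwidth per point: the distance to its k-th nearest neighbor.

    Hypothetical helper for 1D data; not the implementation in this PR.
    """
    if not 1 <= k < len(data):
        raise ValueError("k must satisfy 1 <= k < len(data)")
    bandwidths = []
    for i, x in enumerate(data):
        # Distances from x to every other point (the point itself is excluded).
        distances = sorted(abs(x - y) for j, y in enumerate(data) if j != i)
        bandwidths.append(distances[k - 1])
    return bandwidths


data = [0.0, 1.0, 3.0, 7.0]
print(knn_bandwidths(data, k=1))  # [1.0, 1.0, 2.0, 4.0]
print(knn_bandwidths(data, k=2))  # [3.0, 2.0, 3.0, 6.0]
```

Points in sparse regions get wider kernels and points in dense regions get narrower ones. The brute-force loop costs O(n² log n), which is consistent with the method being expensive for large datasets.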
Hi @inti-abbate. Thanks for the PR. Just letting you know that I've seen it, and I will look it over more closely once I find the time. Regarding the last question, I think adding joblib is the way to go: it introduces a dependency, but it's common in similar libraries and it helps speed up computations.

Tests are failing due to black. Please install black and run it.

Note to self: see this comment also
Force-pushed from 922b46f to 7e75fa4.
Thanks for the answer. Of course, take your time.
Sorry about the previous commit, which I reverted with a force-push. I just applied black, but flake8 still reports error E203. From what I could find on Google, this is a known issue between black and flake8, and according to here, E203 should be ignored.
Feel free to add E203 to the flake8 ignore list.
May I ask what happened to this PR? If it's just about the formatting, I could give it a try. After all, the black issue might be resolved by now, two years later.
Hi @matfax. As of the last commit, the implemented BW selection methods were ready to be used. Regarding the formatting, I could not manage to avoid the flake8 errors, but yes, maybe it was a black issue that has since been resolved.
Two automatic bandwidth selection methods were added:
- k nearest neighbors
- cross-validated bandwidth

The second is based on a `grid_search_cv` function, which computes the score over a grid of bandwidths. A score method was added to `BaseKDE` for that purpose. Since `grid_search_cv` requires a model, the cv bw method could not be implemented as an independent function. Instead, it is a method of `BaseKDE` that calls `grid_search_cv` with `model=self` (through an intermediate function, `cross_val`, which chooses the best bw according to the grid search).

Since both methods require parameters other than `data` and `weights`, a slight change was introduced to the API: the `fit` method now has the signature `fit(self, data, weights=None, **kwargs)`, where `**kwargs` are the keyword arguments to be passed to the bw selection method. If no `**kwargs` are passed, the default parameters will be used, so the change is backwards compatible.

There is one issue on which I'd like to know your opinion. Both methods are quite computationally expensive (for large datasets), so they could benefit from CPU parallelization, which is quite easy to add with the `joblib` library. On the other hand, this would go against the guideline "Import as few external dependencies as possible". Let me know which option you'd prefer.