Add k nearest neighbors and cross validation bandwidth methods #97

inti-abbate · 2021-06-01T14:26:18Z

Two automatic bandwidth selection methods were added:

k nearest neighbors: Calculates the bw for each data point as the distance to the kth neighbor
cross validated bandwidth: based on a (new) grid_search_cv function, wich computes the score over a grid of bandwidths. A score method was added to BaseKDE for that purpose. Since grid_search_cv requires a model, the cv bw method could not be implemented as an independent function. Instead it is a method of BaseKDE that calls grid_search_cv with model=self (thru and intermediate function cross_val, which chooses the best bw according to the grid search).

Since both method require parameters other than data and weights, a slight change was introduced to the API:
fit method now have the signature fit(self, data, weights=None, **kwargs), where **kwargs are the keyword arguments to be passed to the bw selection method. If no **kwargs is passed, the default parameters will be used, so the change is backwards compatible.

There is an issue in which I'd like to know your opinion. Both methods are quite computationally expensive (for large datasets), so they could benefit from CPU parallelization, which is quite easy to add with joblib library. On the other hand, this would go against the guideline "Import as few external dependencies as possible". Let me know which option you'd prefer.

Two automatic bandwidth selection methods were added: - k nearest neighbors - cross validated bandwidth The second is based on a grid_search_cv method, wich computes the score over a grid of bandwidth. A score method was added to BaseKDE, for that purpose.

tommyod · 2021-06-02T15:08:16Z

Hi @inti-abbate . Thanks for the PR. Letting you know now that I've seen it, and I will look more closely over it once I find the time. Regarding the last question, I think adding joblib is the way to go - it introduces a dependency, but it's common in similar libraries and it helps speed up computations.

Tests are failing due to black. Please install black and run python -m black KDEpy -l 120 to auto-format the code.

Note to self: see this comment also

inti-abbate · 2021-06-02T16:29:03Z

Thanks for the answer. Of course, take your time.
I'll (try to) fix the failing checks, and add parallelization with joblib.

inti-abbate · 2021-06-02T16:40:08Z

Sorry about the previous commit, which I force-pushed it back. I just applied black, but flake8 still finds error E203. As I could find in google, this a known issue between black and flake8, and according to here, E203 should be ignored.

tommyod · 2021-06-02T16:49:48Z

Feel free to add E203 to flake8 ignore list.

matfax · 2023-06-14T14:27:37Z

May I ask what happened to this PR? If it's just about the formatting, I could give it a try. After all, the black issue might be resolved at this point, after two years.

inti-abbate · 2023-06-14T16:58:35Z

Hi @matfax. By the time of the last commit, the implemented BW selection methods were ready to be used. Regarding the formatting, I could not manage to avoid the flake8 errors, but yes, maybe it was a black issue that is already resolved.
I think it would be great if you can fix the formatting issue (and rebase all the new commits) so that the PR can be merged!

inti-abbate force-pushed the knn_and_cross_val branch from 922b46f to 7e75fa4 Compare June 2, 2021 16:24

Apply black for auto-formatting

d5c69d8

Add parallelization with joblib in KNN and CV methods

4addc23

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add k nearest neighbors and cross validation bandwidth methods #97

Add k nearest neighbors and cross validation bandwidth methods #97

inti-abbate commented Jun 1, 2021

tommyod commented Jun 2, 2021 •

edited

Loading

inti-abbate commented Jun 2, 2021

inti-abbate commented Jun 2, 2021

tommyod commented Jun 2, 2021

matfax commented Jun 14, 2023

inti-abbate commented Jun 14, 2023

Add k nearest neighbors and cross validation bandwidth methods #97

Are you sure you want to change the base?

Add k nearest neighbors and cross validation bandwidth methods #97

Conversation

inti-abbate commented Jun 1, 2021

tommyod commented Jun 2, 2021 • edited Loading

inti-abbate commented Jun 2, 2021

inti-abbate commented Jun 2, 2021

tommyod commented Jun 2, 2021

matfax commented Jun 14, 2023

inti-abbate commented Jun 14, 2023

tommyod commented Jun 2, 2021 •

edited

Loading