Optimize batch size #137

Open
csala opened this issue Mar 10, 2021 · 0 comments
Labels
feature request Request for a new feature

Comments

csala (Contributor) commented Mar 10, 2021

Problem Description

Currently (and even after #135 is resolved), the last batch from the dataset loader is dropped if it is shorter than the batch size, which can discard a considerable portion of the dataset.

For example, if a dataset has 999 rows and the batch size is 500, 499 rows (almost half of the data) are currently dropped.
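The number of lost rows is the remainder of dividing the row count by the batch size. A minimal sketch, assuming the loader keeps only full batches (as PyTorch's DataLoader does with drop_last=True); both helper names are hypothetical:

```python
def dropped_rows(num_rows: int, batch_size: int) -> int:
    """Hypothetical helper: rows lost when incomplete batches are discarded."""
    # Only num_rows // batch_size full batches are kept; the remainder is lost.
    return num_rows % batch_size


def dropped_fraction(num_rows: int, batch_size: int) -> float:
    """Hypothetical helper: share of the dataset that is discarded."""
    return dropped_rows(num_rows, batch_size) / num_rows


print(dropped_rows(999, 500))               # 499
print(f"{dropped_fraction(999, 500):.1%}")  # 49.9%
```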

Expected behavior

We should find a way to optimize the batch size so that we drop as few rows as possible while staying as close as possible to the specified batch size.

We could also consider adding a boolean optimize_batch_size argument to enable this behavior.
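One possible strategy, sketched under stated assumptions: search the sizes from the requested batch size down to a lower bound and keep the one that leaves the smallest remainder, preferring larger sizes on ties. The tolerance window, the helper names optimal_batch_size and resolve_batch_size, and the way the optimize_batch_size flag is wired in are all hypothetical illustrations, not an existing API:

```python
import math


def optimal_batch_size(num_rows: int, batch_size: int, tolerance: float = 0.1) -> int:
    """Hypothetical helper: pick a size <= batch_size that drops the fewest rows.

    Candidates range from batch_size down to (1 - tolerance) * batch_size.
    """
    if num_rows <= 0 or batch_size <= 0:
        raise ValueError("num_rows and batch_size must be positive")
    if not 0 <= tolerance < 1:
        raise ValueError("tolerance must be in [0, 1)")
    if num_rows <= batch_size:
        # A single batch holding every row drops nothing.
        return num_rows
    smallest = max(1, math.ceil(batch_size * (1 - tolerance)))
    candidates = range(batch_size, smallest - 1, -1)
    # min() returns the first minimum it sees; iterating from the largest
    # candidate downwards makes ties favor sizes closer to batch_size.
    return min(candidates, key=lambda size: num_rows % size)


def resolve_batch_size(num_rows: int, batch_size: int,
                       optimize_batch_size: bool = False) -> int:
    """Hypothetical wiring of the proposed optimize_batch_size flag."""
    if optimize_batch_size:
        return optimal_batch_size(num_rows, batch_size)
    return batch_size


for n in (999, 1000, 1001, 300):
    size = resolve_batch_size(n, 500, optimize_batch_size=True)
    print(n, size, n % size)
# 999 499 1
# 1000 500 0
# 1001 500 1
# 300 300 0
```

With the default window, the 999-row example drops 1 row instead of 499 while the batch size only moves from 500 to 499. Widening the tolerance lets the search find sizes that drop even fewer rows, at the cost of drifting further from the requested size.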

csala added the internal label (The issue doesn't change the API or functionality) on Mar 10, 2021.
csala added the feature request label (Request for a new feature) and removed the internal label on Sep 6, 2021.