[Bug]: Memory leaks with load_dataset_multi_txn #330

Open
Ramlaoui opened this issue Dec 24, 2024 · 0 comments
Labels
bug (Something isn't working), community, pgai

Comments

Ramlaoui commented Dec 24, 2024

What happened?

Hi, I've been using the new feature to load Hugging Face datasets into a table. However, for large datasets the call to load_dataset_multi_txn crashes after some time with an out-of-memory (OOM) error.
I've tried varying the batch size and commit_every_n_batches, but I still get the same issue.

Is there any way to mitigate this issue, or at least a parameter that sets where in the dataset the upload should start (e.g. after 1000 batches)?
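
To illustrate the "start after N batches" behaviour asked about above, here is a minimal client-side sketch in Python. It is not how pgai implements load_dataset_multi_txn. The helper name iter_batches and its skip_batches parameter are hypothetical and only show the idea: consume the rows lazily, discard the first N batches without keeping them in memory, and hold only one batch at a time.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_batches(
    rows: Iterable[T], batch_size: int, skip_batches: int = 0
) -> Iterator[List[T]]:
    """Yield lists of at most batch_size rows, skipping the first skip_batches batches.

    Hypothetical helper: only one batch is materialized at a time, so memory
    use is bounded by batch_size rather than by the size of the dataset.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    it = iter(rows)
    # Discard the skipped rows one by one without storing them.
    for _ in islice(it, skip_batches * batch_size):
        pass
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


# Small demonstration on an in-memory range instead of a real dataset.
demo = list(iter_batches(range(10), batch_size=3, skip_batches=1))
print(demo)  # [[3, 4, 5], [6, 7, 8], [9]]
```

Any lazy iterable could stand in for rows, for example a Hugging Face dataset opened in streaming mode. That is an assumption about a possible workaround, not something confirmed for this issue. Each yielded batch would then be inserted and committed before the next one is read.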

image

pgai extension affected

0.6.0

pgai library affected

No response

PostgreSQL version used

17.1

What operating system did you use?

Ubuntu 24.04 32GB RAM

What installation method did you use?

Docker

What platform did you run on?

On prem/Self-hosted

Relevant log output and stack trace

No response

How can we reproduce the bug?

call ai.load_dataset_multi_txn('LeMaterial/LeMat-Bulk', 'compatible_pbe', table_name => 'lemat', if_table_exists => 'append', commit_every_n_batches => 100);

Are you going to work on the bugfix?

🆘 No, could someone else please work on the bugfix?

Ramlaoui added the bug, community, and pgai labels on Dec 24, 2024