Memory Issues with Java DB API compared to Python #94

Open
buildingredcrane opened this issue Oct 7, 2024 · 0 comments

When loading a large data set into DuckDB from Arrow Flight in Java/Scala, an excessive amount of memory is used compared to Python.

The following Python code works using a few GB of memory, where cur is a sqlflite cursor:

        cur.execute("SELECT * FROM data_table ")
        data_table = cur.fetch_record_batch()
        ddb.sql("INSERT INTO ddb_data_table SELECT * FROM data_table", connection=get_ddb_connection())

whereas the equivalent in Scala blows out the memory on the same box. I killed it when it reached 12 GB:

        statement.setSqlQuery("SELECT * FROM data_table ")
        val arrowReader = statement.executeQuery().getReader()
        val arrowArrayStream = ArrowArrayStream.allocateNew(jdbcAllocator)
        Data.exportArrayStream(jdbcAllocator, arrowReader, arrowArrayStream)
        ddb_conn.registerArrowStream("data_table", arrowArrayStream)
        val ddbStatement = ddb_conn.createStatement()
        ddbStatement.execute("INSERT INTO ddb_data_table SELECT * FROM data_table")

In both cases the DuckDB database is file-backed.

Initially I thought it was an issue with the Arrow Flight library, but I can iterate through the data from the Arrow stream without issue, so I think the problem lies in the duckdb-java library. I've tried versions 1.1.0 and 1.1.1. It feels like the Java library is holding on to references it doesn't need and in effect trying to load the entire Arrow stream into memory, maybe?
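
A minimal sketch of the kind of stream-only check described above, assuming the reader returned by `getReader()` is an `org.apache.arrow.vector.ipc.ArrowReader`. The `StreamCheck` object and `drainReader` helper are hypothetical names for illustration, not part of Arrow or duckdb-java. The loop keeps only counters, so each batch can be released when the next one is loaded.

```scala
import org.apache.arrow.vector.ipc.ArrowReader

object StreamCheck {
  // Hypothetical helper: pull every batch from the reader and count
  // batches and rows without keeping any batch data around.
  def drainReader(reader: ArrowReader): (Int, Long) = {
    var batches = 0
    var rows = 0L
    // loadNextBatch() returns false once the stream is exhausted; the
    // reader's VectorSchemaRoot is repopulated on each successful call.
    while (reader.loadNextBatch()) {
      batches += 1
      rows += reader.getVectorSchemaRoot.getRowCount
    }
    (batches, rows)
  }
}
```

Calling `StreamCheck.drainReader(arrowReader)` in place of the export and register steps exercises only the Arrow side. If that stays within a small memory footprint while the DuckDB insert does not, that would be consistent with the growth happening after the stream is handed to DuckDB, though it does not prove it.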
