Loading a large data set into DuckDB from Arrow Flight in Java/Scala uses an excessive amount of memory compared to Python
The following Python code works using only a few GB of memory, where cur is a sqlflite cursor:
cur.execute("SELECT * FROM data_table ")
data_table=cur.fetch_record_batch()
ddb.sql("INSERT INTO ddb_data_table SELECT * FROM data_table", connection=get_ddb_connection())
whereas the equivalent Scala code below blows out the memory on the same box; I killed it when it reached 12 GB:
statement.setSqlQuery("SELECT * FROM data_table ")
val arrowReader = statement.executeQuery().getReader()
val arrowArrayStream = ArrowArrayStream.allocateNew(jdbcAllocator)
Data.exportArrayStream(jdbcAllocator, arrowReader, arrowArrayStream)
ddb_conn.registerArrowStream("data_table", arrowArrayStream)
val ddbStatement = ddb_conn.createStatement()
ddbStatement.execute("INSERT INTO ddb_data_table SELECT * FROM data_table")
In both cases the DuckDB database is file-backed.
Initially I thought it was an issue with the Arrow Flight library, but I can iterate through the data from the Arrow stream without issue, so I think the problem lies in the duckdb-java library. I've tried versions 1.1.0 and 1.1.1. It feels like the Java library is holding on to references it doesn't need and is in effect trying to load the entire Arrow stream into memory, maybe?
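As a rough illustration of that hypothesis, here is a toy Python sketch using only the standard library. It is not duckdb-java or Arrow code: `batches`, `consume_streaming`, `consume_retaining` and `peak_bytes` are hypothetical names, and plain `bytearray` objects stand in for record batches. It only shows how peak memory differs between a consumer that drops each batch before fetching the next and one that keeps a reference to every batch.

```python
import tracemalloc


def batches(n_batches, batch_bytes):
    """Hypothetical stand-in for a stream of record batches."""
    for _ in range(n_batches):
        yield bytearray(batch_bytes)


def consume_streaming(stream):
    """Process one batch at a time; earlier batches become garbage."""
    total = 0
    for batch in stream:
        total += len(batch)
    return total


def consume_retaining(stream):
    """Keep a reference to every batch, as a leaky consumer might."""
    held = []
    for batch in stream:
        held.append(batch)
    return sum(len(b) for b in held)


def peak_bytes(consumer, n_batches=50, batch_bytes=1_000_000):
    """Return the peak traced allocation while the consumer drains the stream."""
    tracemalloc.start()
    consumer(batches(n_batches, batch_bytes))
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak


streaming_peak = peak_bytes(consume_streaming)
retaining_peak = peak_bytes(consume_retaining)
print(f"streaming peak: {streaming_peak / 1e6:.1f} MB")
print(f"retaining peak: {retaining_peak / 1e6:.1f} MB")
```

If the Java path behaves like the retaining consumer, memory would grow with the size of the whole result set rather than with the size of a batch, which would match what I am seeing. Whether that is actually what happens inside the library is an assumption on my part.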