Reading and writing in batches #381

Answered by adamreeve
pavlexander asked this question in Q&A

Hi @pavlexander, the way this is generally dealt with in Parquet is by splitting data up into row groups. When using the row-oriented API as in your example code, ParquetSharp will internally buffer the data until you create a new row group, and we don't automatically create new row groups between WriteRows calls. So what you probably want instead is something more like:

using var writer = ParquetFile.CreateRowWriter<MyDataType>(fileFullPath, properties);

writer.WriteRows(batch1);

writer.StartNewRowGroup();
writer.WriteRows(batch2);

writer.StartNewRowGroup();
writer.WriteRows(batch3);

writer.Close();
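
If the number of batches isn't known up front, the same pattern can be written as a loop. This is only a sketch using the calls shown above: `MyDataType` here is a hypothetical two-field struct, `BatchWriter.WriteBatches` is an illustrative helper name, and the `properties` argument from the original snippet is left out for brevity.

```csharp
using System.Collections.Generic;
using ParquetSharp.RowOriented;

// Hypothetical row type for illustration; each public field becomes a column.
public struct MyDataType
{
    public int Id;
    public double Value;
}

public static class BatchWriter
{
    // Writes every batch into its own row group, so the writer only has to
    // buffer one batch at a time.
    public static void WriteBatches(string fileFullPath, IEnumerable<MyDataType[]> batches)
    {
        using var writer = ParquetFile.CreateRowWriter<MyDataType>(fileFullPath);

        var isFirstBatch = true;
        foreach (var batch in batches)
        {
            // The first batch goes into the initial row group; every later
            // batch gets a new row group started before it is written.
            if (!isFirstBatch)
            {
                writer.StartNewRowGroup();
            }

            writer.WriteRows(batch);
            isFirstBatch = false;
        }

        writer.Close();
    }
}
```

Each batch then ends up as one row group, so the batch size you pick when writing is also the unit you get back when reading one row group at a time.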

Then when reading, you can read one row group at a time, e.g.:

using var reader = Parq…

Answer selected by pavlexander