[GLUTEN-7860][CORE] In shuffle writer, replace MemoryMappedFile with ReadableFile to avoid OOM #7861
base: main
cc @kecookier
/Benchmark Velox
@@ -73,7 +73,7 @@ void Spill::insertPayload(

 void Spill::openSpillFile() {
   if (!is_) {
-    GLUTEN_ASSIGN_OR_THROW(is_, arrow::io::MemoryMappedFile::Open(spillFile_, arrow::io::FileMode::READ));
+    GLUTEN_ASSIGN_OR_THROW(is_, arrow::io::ReadableFile::Open(spillFile_));
Is the API implemented with buffered read?
Not sure whether https://github.com/apache/arrow/blob/main/cpp/src/arrow/io/buffered.h may help here.
Spill merge needn't buffer
@zhztheplayer There's no shuffle spill on Jenkins, so the change won't be tested.
I thought we always relied on Spark-controlled spill in shuffle. Does Jenkins CI always have enough memory for all shuffle data?
Is it because the spill will be triggered on other operators in the pipeline, like a sort + shuffle? Will the sort spill be triggered, or the shuffle spill?
@zhztheplayer @marin-ma can we create a query and config to test it?
What changes were proposed in this pull request?
This PR fixes #7860 by replacing mmap with read when reading the spill file.
How was this patch tested?