[GLUTEN-7860][CORE] In shuffle writer, replace MemoryMappedFile with ReadableFile to avoid OOM #7861

Open
wants to merge 1 commit into base: main

ccat3z
Contributor

@ccat3z ccat3z commented Nov 8, 2024

What changes were proposed in this pull request?

This PR fixes #7860 by reading spill files with ordinary file reads (arrow::io::ReadableFile) instead of mmap (arrow::io::MemoryMappedFile).
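For readers unfamiliar with the trade-off, here is a minimal standard-library sketch of the read-based approach. It is not Gluten or Arrow code: `readInChunks` and its parameters are hypothetical names chosen for illustration. It also assumes the concern with mmap is that touched pages of a large mapping count toward the process's resident memory, whereas explicit reads keep only one fixed-size buffer of file data in user space at a time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not part of Gluten or Arrow): streams a file through a
// fixed-size buffer and hands each chunk to `consume`. Returns the number of
// bytes read. At any moment the process holds at most `chunkSize` bytes of
// file data in its own buffer.
std::uint64_t readInChunks(
    const std::string& path,
    std::size_t chunkSize,
    const std::function<void(const char*, std::size_t)>& consume) {
  assert(chunkSize > 0);
  std::ifstream in(path, std::ios::binary);
  if (!in) {
    throw std::runtime_error("cannot open " + path);
  }
  std::vector<char> buffer(chunkSize);
  std::uint64_t total = 0;
  while (in) {
    in.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
    const std::streamsize got = in.gcount();
    if (got <= 0) {
      break;  // nothing left to read
    }
    // The final chunk may be shorter than chunkSize.
    consume(buffer.data(), static_cast<std::size_t>(got));
    total += static_cast<std::uint64_t>(got);
  }
  return total;
}
```

A caller merging spill data would pass a callback that appends each chunk to the output file. Memory use then depends on the chunk size, not on the size of the spill file.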

How was this patch tested?

@github-actions github-actions bot added the VELOX label Nov 8, 2024

github-actions bot commented Nov 8, 2024

#7860

@ccat3z
Contributor Author

ccat3z commented Nov 8, 2024

cc @kecookier

@zhztheplayer zhztheplayer changed the title [GLUTEN-7860][CORE] Replace MemoryMappedFile with ReadableFile to avoid OOM [GLUTEN-7860][CORE] In shuffle writer, replace MemoryMappedFile with ReadableFile to avoid OOM Nov 8, 2024
@kecookier
Contributor

/Benchmark Velox

@ccat3z
Contributor Author

ccat3z commented Nov 9, 2024

/Benchmark Velox

@ccat3z ccat3z marked this pull request as ready for review November 9, 2024 03:15
@ccat3z
Contributor Author

ccat3z commented Nov 9, 2024

/Benchmark Velox

Member

@zhztheplayer zhztheplayer left a comment

@ccat3z Do you see #7860 fixed with this approach?

I am triggering a benchmark manually.

cc @marin-ma @FelixYBW

@@ -73,7 +73,7 @@ void Spill::insertPayload(

 void Spill::openSpillFile() {
   if (!is_) {
-    GLUTEN_ASSIGN_OR_THROW(is_, arrow::io::MemoryMappedFile::Open(spillFile_, arrow::io::FileMode::READ));
+    GLUTEN_ASSIGN_OR_THROW(is_, arrow::io::ReadableFile::Open(spillFile_));
Member

Is this API implemented with buffered reads?

Not sure whether https://github.com/apache/arrow/blob/main/cpp/src/arrow/io/buffered.h may help here.

Contributor

Spill merge doesn't need buffering.

@marin-ma
Contributor

> I am triggering a benchmark manually.

@zhztheplayer There's no shuffle spill on jenkins. The change won't be tested.

@zhztheplayer
Member

> > I am triggering a benchmark manually.
>
> @zhztheplayer There's no shuffle spill on jenkins. The change won't be tested.

I thought we always relied on Spark-controlled spill in shuffle. Does Jenkins CI always have enough memory to hold all the shuffle data?

@FelixYBW
Contributor

> @zhztheplayer There's no shuffle spill on jenkins. The change won't be tested.

Is that because the spill is triggered on other operators in the pipeline? For example, with sort + shuffle, will the spill be triggered in the sort or in the shuffle?

@FelixYBW
Contributor

@zhztheplayer @marin-ma Can we create a query and config to test it?

Successfully merging this pull request may close these issues.

[CORE] LocalParitionWriter causes OOM during mergeSpills
6 participants