[SPARK-50853][CORE] Close temp shuffle file writable channel #49531
Conversation
Thank you for making a PR. This sounds like a bug fix. Do you think we could make a unit test for your claims, @ChenMichael?
We should be closing file handles when they are no longer needed instead of relying on finalizers/cleanables to do it.
cc @mridulm
It seems that this code path does not have sufficient test coverage. I added a log statement in the …
So, could you add a targeted test case in this pull request? @ChenMichael Thanks ~
Yea, I found it difficult to write a unit test for this, so the way I tested was by launching a Spark shell, running some queries that force shuffles, and looking at the number of open file handles for the executors through lsof. I'll spend some time trying to come up with test cases for it.
Not sure if there's a better way to demonstrate the problem, but I added a test case that at least shows how the channel isn't closed through the fetchBlocks flow. I had to make a bunch of things public, though. Let me know if there's a better way to test this.
```diff
@@ -130,6 +132,62 @@ class NettyBlockTransferServiceSuite
     assert(hitExecutorDeadException)
   }

+  test("SPARK-50853 - example of simple download file writable channel not being closed") {
```
Personally, I think this test case is a bit heavy. According to the PR description, perhaps the new case just needs to ensure that after calling the closeAndRead() method on SimpleDownloadWritableChannel, the isOpen() status returns false (before this PR, this assertion would fail). For example, maybe we could add a very simple case in the network-shuffle module:
```java
// Assumed imports for this sketch:
import java.io.File;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.junit.jupiter.api.Assertions;
import org.junit.jupiter.api.Test;

import org.apache.spark.network.util.MapConfigProvider;
import org.apache.spark.network.util.TransportConf;

@Test
public void testIsOpenAfterCallCloseAndRead() throws IOException {
  File tempFile = File.createTempFile("test", ".tmp");
  tempFile.deleteOnExit();
  Map<String, String> confMap = new HashMap<>();
  // Perhaps some additional configuration options need to be put into the confMap.
  TransportConf shuffle = new TransportConf("shuffle", new MapConfigProvider(confMap));
  DownloadFile downloadFile = null;
  try {
    downloadFile = new SimpleDownloadFile(tempFile, shuffle);
    DownloadFileWritableChannel channel = downloadFile.openForWriting();
    // ... Perhaps some other operations are needed.
    channel.closeAndRead();
    Assertions.assertFalse(channel.isOpen(), "Channel should be closed after closeAndRead.");
    // ... Perhaps some additional assertions need to be added.
  } finally {
    if (downloadFile != null) {
      downloadFile.delete();
    }
  }
}
```
Agreed. This is much simpler. Done.
Force-pushed from a93d0f3 to 2e4667e.
Thank you for updating. Could you make CI happy, @ChenMichael?
I looked at the failing tests and they aren't related to the changes here. They also pass in the rerun.
+1, LGTM
+1, LGTM. Thank you, @ChenMichael and @LuciferYang .
Closes #49531 from ChenMichael/SPARK-50853-close-temp-shuffle-file-channel.
Authored-by: Michael Chen <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 4e3b831)
Signed-off-by: Dongjoon Hyun <[email protected]>
Merged to master/4.0. Please make a backporting PR to branch-3.5, @ChenMichael.
What changes were proposed in this pull request?
Currently, there are two implementations of DownloadFileWritableChannel (which is used for writing fetched data to disk): SimpleDownloadWritableChannel and EncryptedDownloadWritableChannel. The latter closes the writable channel in its implementation of the closeAndRead method, while the former does not. As a result, the SimpleDownloadWritableChannel is never closed and relies on either the finalizer in FileOutputStream or the phantom cleanable in FileDescriptor to close the file descriptor. The change in this PR is to close the channel in SimpleDownloadWritableChannel's closeAndRead method.
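The pattern described above can be sketched in isolation. This is a hedged, self-contained illustration, not the actual Spark source: the DownloadChannel class below is a hypothetical stand-in for SimpleDownloadWritableChannel, showing why closeAndRead() should close the underlying channel explicitly rather than leaving the file descriptor for a finalizer or cleanable to reclaim.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;

public class CloseAndReadSketch {
    // Hypothetical stand-in for SimpleDownloadWritableChannel: wraps a
    // writable file channel and hands the file off for reading once
    // writing is finished.
    static final class DownloadChannel {
        private final File file;
        private final WritableByteChannel channel;

        DownloadChannel(File file) throws IOException {
            this.file = file;
            this.channel = Channels.newChannel(new FileOutputStream(file));
        }

        void write(ByteBuffer buf) throws IOException {
            channel.write(buf);
        }

        // The fix illustrated: close the channel here explicitly instead of
        // relying on FileOutputStream's finalizer or FileDescriptor's
        // cleanable to release the file descriptor eventually.
        File closeAndRead() throws IOException {
            channel.close();
            return file;
        }

        boolean isOpen() {
            return channel.isOpen();
        }
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("sketch", ".tmp");
        tmp.deleteOnExit();
        DownloadChannel dc = new DownloadChannel(tmp);
        dc.write(ByteBuffer.wrap("shuffle data".getBytes()));
        dc.closeAndRead();
        // With the explicit close, the channel reports closed immediately.
        System.out.println("isOpen after closeAndRead: " + dc.isOpen());
    }
}
```

Without the explicit close(), isOpen() would keep returning true and the descriptor would stay open until garbage collection happened to run, which is the leak lsof revealed in the executors.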
Why are the changes needed?
We should close file handles when they are no longer needed instead of relying on finalizers/cleanables to do it.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Existing Spark tests.
Was this patch authored or co-authored using generative AI tooling?
No