[SPARK-50716][CORE] Fix the cleanup logic for symbolic links in `JavaUtils.deleteRecursivelyUsingJavaIO` method

### What changes were proposed in this pull request?

To fix the cleanup logic for symbolic links in the `JavaUtils.deleteRecursivelyUsingJavaIO` method, this PR makes the following changes:

1. Read the file's `BasicFileAttributes` with `Files.readAttributes(file.toPath(), BasicFileAttributes.class, LinkOption.NOFOLLOW_LINKS)`. Specifying `LinkOption.NOFOLLOW_LINKS` reads the attributes of the symbolic link itself rather than those of the file it points to, so `fileAttributes.isSymbolicLink()` can be used to check whether a file is a symbolic link.

2. After this change, `fileAttributes.isDirectory()` and `fileAttributes.isSymbolicLink()` can no longer be true simultaneously, so when `fileAttributes.isDirectory()` is true there is no need to also check `!fileAttributes.isSymbolicLink()`.

3. When `fileAttributes.isSymbolicLink()` is true, the symbolic link itself is now deleted.

4. The early return for `!file.exists()` now also checks `!fileAttributes.isSymbolicLink()`. For a broken symbolic link, `file.exists()` returns false (it follows the link to a missing target), but in that case the dangling link entry should still be cleaned up.

5. The previously handwritten `isSymlink` method in `JavaUtils` has been removed, as it is no longer needed after the above changes.
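The distinction in points 1 and 4 can be reproduced with a small standalone program on a POSIX system (the class name `SymlinkAttributesDemo` is ours for illustration, not part of Spark):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

public class SymlinkAttributesDemo {
  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("symlink-demo");
    Path target = Files.createFile(dir.resolve("target.txt"));
    Path link = Files.createSymbolicLink(dir.resolve("link.txt"), target);

    // Without NOFOLLOW_LINKS the link is resolved, so the attributes
    // describe the target file and isSymbolicLink() is false.
    BasicFileAttributes followed =
        Files.readAttributes(link, BasicFileAttributes.class);
    System.out.println("followed.isSymbolicLink = " + followed.isSymbolicLink());   // false

    // With NOFOLLOW_LINKS the attributes describe the link entry itself.
    BasicFileAttributes noFollow =
        Files.readAttributes(link, BasicFileAttributes.class, LinkOption.NOFOLLOW_LINKS);
    System.out.println("noFollow.isSymbolicLink = " + noFollow.isSymbolicLink());   // true

    // Break the link: File.exists() follows links and reports false,
    // yet the dangling link entry still exists and must be cleaned up.
    Files.delete(target);
    System.out.println("broken link exists()   = " + link.toFile().exists());       // false
    System.out.println("still a symbolic link  = " + Files.isSymbolicLink(link));   // true

    Files.delete(link);
    Files.delete(dir);
  }
}
```

Note that `Files.readAttributes` with `NOFOLLOW_LINKS` still succeeds on a broken link, which is what makes check 4 possible.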

### Why are the changes needed?
Fix the cleanup logic for symbolic links in the `JavaUtils.deleteRecursivelyUsingJavaIO` method.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- Pass GitHub Actions
- New test cases have been added
- Verified against the existing `PipedRDDSuite` test suite:

Run `build/sbt "core/testOnly org.apache.spark.rdd.PipedRDDSuite"`

Before

```
git status
On branch upmaster
Your branch is up to date with 'upstream/master'.

Untracked files:
  (use "git add <file>..." to include in what will be committed)
    core/tasks/

ls -l core/tasks
total 0
drwxr-xr-x  5 yangjie01  staff  160  1  3 18:15 099f2492-acef-4556-8a34-1318dccf7ad2
drwxr-xr-x  5 yangjie01  staff  160  1  3 18:15 47d46196-2f7b-4c7b-acf3-7e1d26584c12
drwxr-xr-x  5 yangjie01  staff  160  1  3 18:15 5e23fe20-1e3f-49b8-8404-5cd3b1033e37
drwxr-xr-x  5 yangjie01  staff  160  1  3 18:15 a2cbf5a9-3ebf-4332-be87-c9501830750e
drwxr-xr-x  5 yangjie01  staff  160  1  3 18:15 ddf45bf5-d0fa-4970-9094-930f382b675c
drwxr-xr-x  5 yangjie01  staff  160  1  3 18:15 e25fe5ad-a0be-48d0-81f6-605542f447b5

ls -l core/tasks/099f2492-acef-4556-8a34-1318dccf7ad2
total 0
lrwxr-xr-x  1 yangjie01  staff  59  1  3 18:15 benchmarks -> /Users/yangjie01/SourceCode/git/spark-sbt/core/./benchmarks
lrwxr-xr-x  1 yangjie01  staff  52  1  3 18:15 src -> /Users/yangjie01/SourceCode/git/spark-sbt/core/./src
lrwxr-xr-x  1 yangjie01  staff  55  1  3 18:15 target -> /Users/yangjie01/SourceCode/git/spark-sbt/core/./target
```

We noticed that symbolic links are left behind after the tests, even though manual cleanup has been invoked in the test code:

https://github.com/apache/spark/blob/b210f422b0078d535eddc696ebba8d92f67b81fb/core/src/test/scala/org/apache/spark/rdd/PipedRDDSuite.scala#L214-L232

After

```
git status
On branch deleteRecursivelyUsingJavaIO-SymbolicLink
Your branch is up to date with 'origin/deleteRecursivelyUsingJavaIO-SymbolicLink'.

nothing to commit, working tree clean
```

We observe that there are no residual symbolic links left after the tests.

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #49347 from LuciferYang/deleteRecursivelyUsingJavaIO-SymbolicLink.

Lead-authored-by: yangjie01 <[email protected]>
Co-authored-by: YangJie <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
LuciferYang authored and dongjoon-hyun committed Jan 4, 2025
1 parent dccb129 commit 98dc763
Showing 2 changed files with 45 additions and 16 deletions.
`JavaUtils.java`:

```diff
@@ -22,6 +22,7 @@
 import java.nio.channels.ReadableByteChannel;
 import java.nio.charset.StandardCharsets;
 import java.nio.file.Files;
+import java.nio.file.LinkOption;
 import java.nio.file.attribute.BasicFileAttributes;
 import java.util.*;
 import java.util.concurrent.TimeUnit;
@@ -125,10 +126,11 @@ public static void deleteRecursively(File file, FilenameFilter filter) throws IO
   private static void deleteRecursivelyUsingJavaIO(
       File file,
       FilenameFilter filter) throws IOException {
-    if (!file.exists()) return;
     BasicFileAttributes fileAttributes =
-      Files.readAttributes(file.toPath(), BasicFileAttributes.class);
-    if (fileAttributes.isDirectory() && !isSymlink(file)) {
+      Files.readAttributes(file.toPath(), BasicFileAttributes.class, LinkOption.NOFOLLOW_LINKS);
+    // SPARK-50716: If the file does not exist and not a broken symbolic link, return directly.
+    if (!file.exists() && !fileAttributes.isSymbolicLink()) return;
+    if (fileAttributes.isDirectory()) {
       IOException savedIOException = null;
       for (File child : listFilesSafely(file, filter)) {
         try {
@@ -143,8 +145,8 @@ private static void deleteRecursivelyUsingJavaIO(
       }
     }
 
-    // Delete file only when it's a normal file or an empty directory.
-    if (fileAttributes.isRegularFile() ||
+    // Delete file only when it's a normal file, a symbolic link, or an empty directory.
+    if (fileAttributes.isRegularFile() || fileAttributes.isSymbolicLink() ||
         (fileAttributes.isDirectory() && listFilesSafely(file, null).length == 0)) {
       boolean deleted = file.delete();
       // Delete can also fail if the file simply did not exist.
@@ -192,17 +194,6 @@ private static File[] listFilesSafely(File file, FilenameFilter filter) throws I
     }
   }
 
-  private static boolean isSymlink(File file) throws IOException {
-    Objects.requireNonNull(file);
-    File fileInCanonicalDir = null;
-    if (file.getParent() == null) {
-      fileInCanonicalDir = file;
-    } else {
-      fileInCanonicalDir = new File(file.getParentFile().getCanonicalFile(), file.getName());
-    }
-    return !fileInCanonicalDir.getCanonicalFile().equals(fileInCanonicalDir.getAbsoluteFile());
-  }
-
   private static final Map<String, TimeUnit> timeSuffixes;
 
   private static final Map<String, ByteUnit> byteSuffixes;
```
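The patched logic can be summarized as a minimal, self-contained sketch (the class and method names here are ours, not Spark's actual `JavaUtils`, and the error handling is simplified):

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.attribute.BasicFileAttributes;

public class RecursiveDeleteSketch {
  public static void deleteRecursively(File file) throws IOException {
    // Read attributes of the entry itself, never of a link's target.
    BasicFileAttributes attrs = Files.readAttributes(
        file.toPath(), BasicFileAttributes.class, LinkOption.NOFOLLOW_LINKS);
    // A broken symlink fails exists() but still has an entry to remove.
    if (!file.exists() && !attrs.isSymbolicLink()) return;
    // With NOFOLLOW_LINKS a symlink is never reported as a directory,
    // so we never recurse through a link into its target.
    if (attrs.isDirectory()) {
      File[] children = file.listFiles();
      if (children != null) {
        for (File child : children) {
          deleteRecursively(child);
        }
      }
    }
    // Delete regular files, symlinks, and (now empty) directories.
    if (!file.delete() && file.exists()) {
      throw new IOException("Failed to delete: " + file.getAbsolutePath());
    }
  }

  public static void main(String[] args) throws IOException {
    File dir = Files.createTempDirectory("delete-sketch").toFile();
    File target = new File(dir, "target.txt");
    Files.createFile(target.toPath());
    Files.createSymbolicLink(new File(dir, "link.txt").toPath(), target.toPath());
    deleteRecursively(dir);
    System.out.println("deleted = " + !dir.exists());
  }
}
```

If `target.txt` happens to be deleted before `link.txt` during traversal, `link.txt` becomes a broken link mid-run; the `!attrs.isSymbolicLink()` guard is what keeps it from being skipped.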
`core/src/test/scala/org/apache/spark/util/UtilsSuite.scala` (38 additions, 0 deletions):

```diff
@@ -22,6 +22,7 @@ import java.lang.reflect.Field
 import java.net.{BindException, ServerSocket, URI}
 import java.nio.{ByteBuffer, ByteOrder}
 import java.nio.charset.StandardCharsets.UTF_8
+import java.nio.file.{Files => JFiles}
 import java.text.DecimalFormatSymbols
 import java.util.Locale
 import java.util.concurrent.TimeUnit
@@ -731,6 +732,43 @@ class UtilsSuite extends SparkFunSuite with ResetSystemProperties {
     assert(!sourceFile2.exists())
   }
 
+  test("SPARK-50716: deleteRecursively - SymbolicLink To File") {
+    val tempDir = Utils.createTempDir()
+    val sourceFile = new File(tempDir, "foo.txt")
+    JFiles.write(sourceFile.toPath, "Some content".getBytes)
+    assert(sourceFile.exists())
+
+    val symlinkFile = new File(tempDir, "bar.txt")
+    JFiles.createSymbolicLink(symlinkFile.toPath, sourceFile.toPath)
+
+    // Check that the symlink was created successfully
+    assert(JFiles.isSymbolicLink(symlinkFile.toPath))
+    Utils.deleteRecursively(tempDir)
+
+    // Verify that everything is deleted
+    assert(!tempDir.exists)
+  }
+
+  test("SPARK-50716: deleteRecursively - SymbolicLink To Dir") {
+    val tempDir = Utils.createTempDir()
+    val sourceDir = new File(tempDir, "sourceDir")
+    assert(sourceDir.mkdir())
+    val sourceFile = new File(sourceDir, "file.txt")
+    JFiles.write(sourceFile.toPath, "Some content".getBytes)
+
+    val symlinkDir = new File(tempDir, "targetDir")
+    JFiles.createSymbolicLink(symlinkDir.toPath, sourceDir.toPath)
+
+    // Check that the symlink was created successfully
+    assert(JFiles.isSymbolicLink(symlinkDir.toPath))
+
+    // Now delete recursively
+    Utils.deleteRecursively(tempDir)
+
+    // Verify that everything is deleted
+    assert(!tempDir.exists)
+  }
+
   test("loading properties from file") {
```
