Commit
[SPARK-50716][CORE] Fix the cleanup logic for symbolic links in `JavaUtils.deleteRecursivelyUsingJavaIO` method

### What changes were proposed in this pull request?
To address the cleanup logic for symbolic links in the `JavaUtils.deleteRecursivelyUsingJavaIO` method, the following changes have been made in this PR:

1. Change to use `Files.readAttributes(file.toPath(), BasicFileAttributes.class, LinkOption.NOFOLLOW_LINKS)` to read the `BasicFileAttributes` of the file. By specifying `LinkOption.NOFOLLOW_LINKS`, the attributes of the symbolic link itself are read, rather than those of the file it points to. This allows us to use `fileAttributes.isSymbolicLink()` to check if a file is a symbolic link.
2. After the above change, it is no longer possible for `fileAttributes.isDirectory()` and `fileAttributes.isSymbolicLink()` to be true simultaneously. Therefore, when `fileAttributes.isDirectory()` is true, there is no need to check `!fileAttributes.isSymbolicLink()`.
3. When `fileAttributes.isSymbolicLink()` is true, deletion behavior for the symbolic link has been added.
4. When `!file.exists()` is true, an additional check for `!fileAttributes.isSymbolicLink()` has been added. This is because for a broken symbolic link, `file.exists()` will also return false, but in such cases, we should proceed with the cleanup.
5. The previously handwritten `isSymlink` method in JavaUtils has been removed, as it is no longer needed after the above changes.

### Why are the changes needed?
Fix the cleanup logic for symbolic links in the `JavaUtils.deleteRecursivelyUsingJavaIO` method.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- Pass GitHub Actions
- New test cases have been added
- Check with the existing test suite named `PipedRDDSuite`:

Run `build/sbt "core/testOnly org.apache.spark.rdd.PipedRDDSuite"`

Before

```
git status
On branch upmaster
Your branch is up to date with 'upstream/master'.

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	core/tasks/

ls -l core/tasks
total 0
drwxr-xr-x 5 yangjie01 staff 160 1 3 18:15 099f2492-acef-4556-8a34-1318dccf7ad2
drwxr-xr-x 5 yangjie01 staff 160 1 3 18:15 47d46196-2f7b-4c7b-acf3-7e1d26584c12
drwxr-xr-x 5 yangjie01 staff 160 1 3 18:15 5e23fe20-1e3f-49b8-8404-5cd3b1033e37
drwxr-xr-x 5 yangjie01 staff 160 1 3 18:15 a2cbf5a9-3ebf-4332-be87-c9501830750e
drwxr-xr-x 5 yangjie01 staff 160 1 3 18:15 ddf45bf5-d0fa-4970-9094-930f382b675c
drwxr-xr-x 5 yangjie01 staff 160 1 3 18:15 e25fe5ad-a0be-48d0-81f6-605542f447b5

ls -l core/tasks/099f2492-acef-4556-8a34-1318dccf7ad2
total 0
lrwxr-xr-x 1 yangjie01 staff 59 1 3 18:15 benchmarks -> /Users/yangjie01/SourceCode/git/spark-sbt/core/./benchmarks
lrwxr-xr-x 1 yangjie01 staff 52 1 3 18:15 src -> /Users/yangjie01/SourceCode/git/spark-sbt/core/./src
lrwxr-xr-x 1 yangjie01 staff 55 1 3 18:15 target -> /Users/yangjie01/SourceCode/git/spark-sbt/core/./target
```

We noticed that symbolic links are left behind after the tests, even though manual cleanup has been invoked in the test code:

https://github.com/apache/spark/blob/b210f422b0078d535eddc696ebba8d92f67b81fb/core/src/test/scala/org/apache/spark/rdd/PipedRDDSuite.scala#L214-L232

After

```
git status
On branch deleteRecursivelyUsingJavaIO-SymbolicLink
Your branch is up to date with 'origin/deleteRecursivelyUsingJavaIO-SymbolicLink'.

nothing to commit, working tree clean
```

We observe that there are no residual symbolic links left after the tests.

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #49347 from LuciferYang/deleteRecursivelyUsingJavaIO-SymbolicLink.

Lead-authored-by: yangjie01 <[email protected]>
Co-authored-by: YangJie <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
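
For illustration only, the cleanup logic described in items 1 to 4 above could be sketched roughly as follows. This is not the actual Spark code: the class `SymlinkAwareDelete` and the method `deleteRecursively` are hypothetical names, and the sketch uses `Files.delete` and a `NoSuchFileException` check in place of the `file.exists()` test the PR describes. It assumes a platform where symbolic links are supported.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.NoSuchFileException;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical illustration; not the JavaUtils implementation.
public class SymlinkAwareDelete {

  public static void deleteRecursively(File file) throws IOException {
    if (file == null) {
      return;
    }
    BasicFileAttributes attrs;
    try {
      // NOFOLLOW_LINKS: read the attributes of the link itself,
      // not those of the file it points to.
      attrs = Files.readAttributes(
          file.toPath(), BasicFileAttributes.class, LinkOption.NOFOLLOW_LINKS);
    } catch (NoSuchFileException e) {
      // Nothing exists at this path, not even a broken symbolic link.
      return;
    }

    // With NOFOLLOW_LINKS, a symbolic link to a directory reports
    // isSymbolicLink() == true and isDirectory() == false, so only
    // real directories are descended into.
    if (attrs.isDirectory()) {
      File[] children = file.listFiles();
      if (children == null) {
        throw new IOException("Failed to list files for dir: " + file);
      }
      for (File child : children) {
        deleteRecursively(child);
      }
    }

    // Files.delete removes a symbolic link itself, never its target,
    // and also works for broken links, regular files, and empty directories.
    Files.delete(file.toPath());
  }
}
```

In this sketch a broken symbolic link is still cleaned up because its attributes can be read even though its target is missing, which mirrors the reasoning behind item 4.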