forked from apache/spark
[SPARK-45844][SQL] Implement case-insensitivity for XML
### What changes were proposed in this pull request?

This PR adds the missing support for case-insensitive schema handling in the XML file format. Both schema inference and file reads now follow the `SQLConf` case-sensitivity setting.

Duplicate keys are handled as follows:

1. When duplicates (whether case-sensitive or not) occur within a row, they are converted into an array, and the first spelling encountered becomes the array's name.
2. When duplicates occur across rows, the first spelling encountered is likewise the one that is kept.

Keys of map-type data are strings and are not treated as field names, so they do not require case-sensitivity checks.

### Why are the changes needed?

To stay consistent with other file formats and reduce maintenance effort.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Unit tests

### Was this patch authored or co-authored using generative AI tooling?

No

Closes apache#43722 from shujingyang-db/case-sensitive.

Lead-authored-by: Shujing Yang <[email protected]>
Co-authored-by: Shujing Yang <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
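The two duplicate-key rules described in the commit message can be illustrated with a small standalone model. This is only a sketch in plain Python, not the code from this commit: the function names `merge_row_fields` and `unify_field_names` are hypothetical, and `str.lower()` stands in for whatever case folding the XML reader actually applies.

```python
from typing import Any, Dict, Iterable, List, Tuple


def merge_row_fields(
    pairs: Iterable[Tuple[str, Any]], case_sensitive: bool = False
) -> Dict[str, Any]:
    """Rule 1 (hypothetical model): duplicates within one row become a list,
    named after the first spelling encountered."""
    first_seen: Dict[str, str] = {}    # comparison key -> first spelling seen
    values: Dict[str, List[Any]] = {}  # first spelling -> collected values
    for name, value in pairs:
        key = name if case_sensitive else name.lower()
        canonical = first_seen.setdefault(key, name)
        values.setdefault(canonical, []).append(value)
    # A name seen once keeps its scalar value; repeated names keep the list.
    return {n: (v if len(v) > 1 else v[0]) for n, v in values.items()}


def unify_field_names(
    rows: Iterable[Iterable[str]], case_sensitive: bool = False
) -> List[str]:
    """Rule 2 (hypothetical model): across rows, the first spelling wins."""
    first_seen: Dict[str, str] = {}
    for row in rows:
        for name in row:
            key = name if case_sensitive else name.lower()
            first_seen.setdefault(key, name)
    return list(first_seen.values())
```

With case sensitivity off, `merge_row_fields([("array", "2"), ("Array", "3")])` groups both values under `array`; with `case_sensitive=True` the two spellings remain separate fields.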
1 parent aa10ac7, commit 2cac768
Showing 6 changed files with 235 additions and 34 deletions.
12 changes: 12 additions & 0 deletions
sql/core/src/test/resources/test-data/xml-resources/attributes-case-sensitive.xml
@@ -0,0 +1,12 @@
<?xml version="1.0"?>
<ROWSET>
    <ROW>
        <struct attr1="1">1</struct>
        <array attr2="2">2</array>
        <array Attr2="3">3</array>
        <array aTTr2="4">4</array>
    </ROW>
    <ROW>
        <struct Attr1="5">5</struct>
    </ROW>
</ROWSET>
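
To see which attribute names in this fixture differ only by case, the file can be parsed with Python's standard `xml.etree.ElementTree`. This is a sketch for inspecting the test data only: it does not use Spark and does not claim to reproduce the reader's inferred schema. The helper name `attribute_spellings` is hypothetical, and `str.lower()` is an assumed stand-in for the case folding.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Contents of attributes-case-sensitive.xml as shown in the diff above.
SAMPLE = """<?xml version="1.0"?>
<ROWSET>
    <ROW>
        <struct attr1="1">1</struct>
        <array attr2="2">2</array>
        <array Attr2="3">3</array>
        <array aTTr2="4">4</array>
    </ROW>
    <ROW>
        <struct Attr1="5">5</struct>
    </ROW>
</ROWSET>"""


def attribute_spellings(xml_text: str) -> Dict[str, List[str]]:
    """Group attribute names by lower-cased form, preserving first-seen order."""
    root = ET.fromstring(xml_text)
    groups: Dict[str, List[str]] = {}
    for row in root.iter("ROW"):
        for child in row:
            for attr in child.attrib:
                spellings = groups.setdefault(attr.lower(), [])
                if attr not in spellings:
                    spellings.append(attr)
    return groups


collisions = attribute_spellings(SAMPLE)
```

Here `attr2`, `Attr2`, and `aTTr2` collide within the first row, while `attr1` and `Attr1` collide across rows, so the fixture covers both duplicate cases named in the commit message.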