
[VL] Enable filter push-down on nested field #29430

Triggered via pull request November 14, 2024 03:45
@rui-mo
synchronize #7946
Status Success
Total duration 22s

dev_cron.yml

on: pull_request_target

Annotations

50 errors
GlutenParquetV1SchemaPruningSuite.Spark vectorized reader - without partition data column - select one complex field and having is null predicate on another complex field: org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV1SchemaPruningSuite#L1
Results do not match for query: Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]] Timezone Env: == Parsed Logical Plan == 'Project [unresolvedalias(count('id), Some(org.apache.spark.sql.Column$$Lambda$13390/1121076727@2e6a445))] +- Filter (last#244622 = Jones) +- Project [id#244584, name#244585.first AS first#244620, name#244585.middle AS middle#244621, name#244585.last AS last#244622] +- Filter isnotnull(name#244585.middle) +- Project [id#244584, name#244585, address#244586, pets#244587, friends#244588, relatives#244589, employer#244590, relations#244591, p#244592] +- SubqueryAlias contacts +- View (`contacts`, [id#244584,name#244585,address#244586,pets#244587,friends#244588,relatives#244589,employer#244590,relations#244591,p#244592]) +- Relation [id#244584,name#244585,address#244586,pets#244587,friends#244588,relatives#244589,employer#244590,relations#244591,p#244592] parquet == Analyzed Logical Plan == count(id): bigint Aggregate [count(id#244584) AS count(id)#244629L] +- Filter (last#244622 = Jones) +- Project [id#244584, name#244585.first AS first#244620, name#244585.middle AS middle#244621, name#244585.last AS last#244622] +- Filter isnotnull(name#244585.middle) +- Project [id#244584, name#244585, address#244586, pets#244587, friends#244588, relatives#244589, employer#244590, relations#244591, p#244592] +- SubqueryAlias contacts +- View (`contacts`, [id#244584,name#244585,address#244586,pets#244587,friends#244588,relatives#244589,employer#244590,relations#244591,p#244592]) +- Relation [id#244584,name#244585,address#244586,pets#244587,friends#244588,relatives#244589,employer#244590,relations#244591,p#244592] parquet == Optimized Logical Plan == Aggregate [count(id#244584) AS count(id)#244629L] +- Project [id#244584] +- Filter ((isnotnull(name#244585) AND isnotnull(name#244585.middle)) AND (name#244585.last = Jones)) +- Relation [id#244584,name#244585,address#244586,pets#244587,friends#244588,relatives#244589,employer#244590,relations#244591,p#244592] parquet == Physical Plan == AdaptiveSparkPlan isFinalPlan=true +- == Final Plan == VeloxColumnarToRow +- ^(18507) HashAggregateTransformer(keys=[], functions=[count(id#244584)], isStreamingAgg=false, output=[count(id)#244629L]) +- ^(18507) InputIteratorTransformer[count#244641L] +- ShuffleQueryStage 0 +- ColumnarExchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=1228086], [shuffle_writer_type=hash], [OUTPUT] List(count:LongType), [OUTPUT] List(count:LongType) +- VeloxResizeBatches 1024, 2147483647 +- ^(18506) FlushableHashAggregateTransformer(keys=[], functions=[partial_count(id#244584)], isStreamingAgg=false, output=[count#244641L]) +- ^(18506) ProjectExecTransformer [id#244584] +- ^(18506) FilterExecTransformer ((isnotnull(name#244585) AND isnotnull(name#244585.middle)) AND (name#244585.last = Jones)) +- ^(18506) FileScanTransformer parquet [id#244584,name#244585,p#244592] Batched: true, DataFilters: [isnotnull(name#244585), isnotnull(name#244585.middle), (name#244585.last = Jones)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-b771ec69-dc58-4580-9d66-c0bc75eb6709/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name), 
IsNotNull(name.middle), EqualTo(name.last,Jones)], ReadSchema: struct<id:int,name:struct<middle:string,last:string>> +- == Initial Plan == HashAggregate(keys=[], functions=[count(id#244584)], output=[count(id)#244629L]) +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=1228047] +- HashAggregate(keys=[], functions=[partial_count(id#244584)], output=[count#244641L]) +- Project [id#244584] +- Filter ((isnotnull(name#244585) AND isnotnull(name#244585.middle)) AND (name#244585.last = Jones)) +- FileScan parquet [id#244584,name#244585,p#244592] Batched: false, DataFilters: [isnotnull(name#244585), isnotnull(name#244585.middle), (name#244585.last = Jones)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-b771ec69-dc58-4580-9d66-c0bc75eb6709/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name), IsNotNull(name.middle), EqualTo(name.last,Jones)], ReadSchema: struct<id:int,name:struct<middle:string,last:string>> == Results == == Results == !== Correct Answer - 1 == == Spark Answer - 1 == !struct<> struct<count(id):bigint> ![0] [2]
GlutenParquetV1SchemaPruningSuite.Spark vectorized reader - with partition data column - select one complex field and having is null predicate on another complex field: org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV1SchemaPruningSuite#L1
Results do not match for query: Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]] Timezone Env: == Parsed Logical Plan == 'Project [unresolvedalias(count('id), Some(org.apache.spark.sql.Column$$Lambda$13390/1121076727@2e6a445))] +- Filter (last#244747 = Jones) +- Project [id#244709, name#244710.first AS first#244745, name#244710.middle AS middle#244746, name#244710.last AS last#244747] +- Filter isnotnull(name#244710.middle) +- Project [id#244709, name#244710, address#244711, pets#244712, friends#244713, relatives#244714, employer#244715, relations#244716, p#244717] +- SubqueryAlias contacts +- View (`contacts`, [id#244709,name#244710,address#244711,pets#244712,friends#244713,relatives#244714,employer#244715,relations#244716,p#244717]) +- Relation [id#244709,name#244710,address#244711,pets#244712,friends#244713,relatives#244714,employer#244715,relations#244716,p#244717] parquet == Analyzed Logical Plan == count(id): bigint Aggregate [count(id#244709) AS count(id)#244754L] +- Filter (last#244747 = Jones) +- Project [id#244709, name#244710.first AS first#244745, name#244710.middle AS middle#244746, name#244710.last AS last#244747] +- Filter isnotnull(name#244710.middle) +- Project [id#244709, name#244710, address#244711, pets#244712, friends#244713, relatives#244714, employer#244715, relations#244716, p#244717] +- SubqueryAlias contacts +- View (`contacts`, [id#244709,name#244710,address#244711,pets#244712,friends#244713,relatives#244714,employer#244715,relations#244716,p#244717]) +- Relation [id#244709,name#244710,address#244711,pets#244712,friends#244713,relatives#244714,employer#244715,relations#244716,p#244717] parquet == Optimized Logical Plan == Aggregate [count(id#244709) AS count(id)#244754L] +- Project [id#244709] +- Filter ((isnotnull(name#244710) AND isnotnull(name#244710.middle)) AND (name#244710.last = Jones)) +- Relation [id#244709,name#244710,address#244711,pets#244712,friends#244713,relatives#244714,employer#244715,relations#244716,p#244717] parquet == Physical Plan == AdaptiveSparkPlan isFinalPlan=true +- == Final Plan == VeloxColumnarToRow +- ^(18511) HashAggregateTransformer(keys=[], functions=[count(id#244709)], isStreamingAgg=false, output=[count(id)#244754L]) +- ^(18511) InputIteratorTransformer[count#244766L] +- ShuffleQueryStage 0 +- ColumnarExchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=1228309], [shuffle_writer_type=hash], [OUTPUT] List(count:LongType), [OUTPUT] List(count:LongType) +- VeloxResizeBatches 1024, 2147483647 +- ^(18510) FlushableHashAggregateTransformer(keys=[], functions=[partial_count(id#244709)], isStreamingAgg=false, output=[count#244766L]) +- ^(18510) ProjectExecTransformer [id#244709] +- ^(18510) FilterExecTransformer ((isnotnull(name#244710) AND isnotnull(name#244710.middle)) AND (name#244710.last = Jones)) +- ^(18510) FileScanTransformer parquet [id#244709,name#244710,p#244717] Batched: true, DataFilters: [isnotnull(name#244710), isnotnull(name#244710.middle), (name#244710.last = Jones)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-402dd32e-43f0-437b-ad6b-b6af6918c44b/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name), 
IsNotNull(name.middle), EqualTo(name.last,Jones)], ReadSchema: struct<id:int,name:struct<middle:string,last:string>> +- == Initial Plan == HashAggregate(keys=[], functions=[count(id#244709)], output=[count(id)#244754L]) +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=1228270] +- HashAggregate(keys=[], functions=[partial_count(id#244709)], output=[count#244766L]) +- Project [id#244709] +- Filter ((isnotnull(name#244710) AND isnotnull(name#244710.middle)) AND (name#244710.last = Jones)) +- FileScan parquet [id#244709,name#244710,p#244717] Batched: false, DataFilters: [isnotnull(name#244710), isnotnull(name#244710.middle), (name#244710.last = Jones)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-402dd32e-43f0-437b-ad6b-b6af6918c44b/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name), IsNotNull(name.middle), EqualTo(name.last,Jones)], ReadSchema: struct<id:int,name:struct<middle:string,last:string>> == Results == == Results == !== Correct Answer - 1 == == Spark Answer - 1 == !struct<> struct<count(id):bigint> ![0] [2]
GlutenParquetV1SchemaPruningSuite.Non-vectorized reader - without partition data column - select one complex field and having is null predicate on another complex field: org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV1SchemaPruningSuite#L1
Results do not match for query: Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]] Timezone Env: == Parsed Logical Plan == 'Project [unresolvedalias(count('id), Some(org.apache.spark.sql.Column$$Lambda$13390/1121076727@2e6a445))] +- Filter (last#244866 = Jones) +- Project [id#244828, name#244829.first AS first#244864, name#244829.middle AS middle#244865, name#244829.last AS last#244866] +- Filter isnotnull(name#244829.middle) +- Project [id#244828, name#244829, address#244830, pets#244831, friends#244832, relatives#244833, employer#244834, relations#244835, p#244836] +- SubqueryAlias contacts +- View (`contacts`, [id#244828,name#244829,address#244830,pets#244831,friends#244832,relatives#244833,employer#244834,relations#244835,p#244836]) +- Relation [id#244828,name#244829,address#244830,pets#244831,friends#244832,relatives#244833,employer#244834,relations#244835,p#244836] parquet == Analyzed Logical Plan == count(id): bigint Aggregate [count(id#244828) AS count(id)#244873L] +- Filter (last#244866 = Jones) +- Project [id#244828, name#244829.first AS first#244864, name#244829.middle AS middle#244865, name#244829.last AS last#244866] +- Filter isnotnull(name#244829.middle) +- Project [id#244828, name#244829, address#244830, pets#244831, friends#244832, relatives#244833, employer#244834, relations#244835, p#244836] +- SubqueryAlias contacts +- View (`contacts`, [id#244828,name#244829,address#244830,pets#244831,friends#244832,relatives#244833,employer#244834,relations#244835,p#244836]) +- Relation [id#244828,name#244829,address#244830,pets#244831,friends#244832,relatives#244833,employer#244834,relations#244835,p#244836] parquet == Optimized Logical Plan == Aggregate [count(id#244828) AS count(id)#244873L] +- Project [id#244828] +- Filter ((isnotnull(name#244829) AND isnotnull(name#244829.middle)) AND (name#244829.last = Jones)) +- Relation [id#244828,name#244829,address#244830,pets#244831,friends#244832,relatives#244833,employer#244834,relations#244835,p#244836] parquet == Physical Plan == AdaptiveSparkPlan isFinalPlan=true +- == Final Plan == VeloxColumnarToRow +- ^(18515) HashAggregateTransformer(keys=[], functions=[count(id#244828)], isStreamingAgg=false, output=[count(id)#244873L]) +- ^(18515) InputIteratorTransformer[count#244885L] +- ShuffleQueryStage 0 +- ColumnarExchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=1228532], [shuffle_writer_type=hash], [OUTPUT] List(count:LongType), [OUTPUT] List(count:LongType) +- VeloxResizeBatches 1024, 2147483647 +- ^(18514) FlushableHashAggregateTransformer(keys=[], functions=[partial_count(id#244828)], isStreamingAgg=false, output=[count#244885L]) +- ^(18514) ProjectExecTransformer [id#244828] +- ^(18514) FilterExecTransformer ((isnotnull(name#244829) AND isnotnull(name#244829.middle)) AND (name#244829.last = Jones)) +- ^(18514) FileScanTransformer parquet [id#244828,name#244829,p#244836] Batched: true, DataFilters: [isnotnull(name#244829), isnotnull(name#244829.middle), (name#244829.last = Jones)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-f76345e8-d4e8-4a5b-bbdb-2af14d4e34b0/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name), 
IsNotNull(name.middle), EqualTo(name.last,Jones)], ReadSchema: struct<id:int,name:struct<middle:string,last:string>> +- == Initial Plan == HashAggregate(keys=[], functions=[count(id#244828)], output=[count(id)#244873L]) +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=1228493] +- HashAggregate(keys=[], functions=[partial_count(id#244828)], output=[count#244885L]) +- Project [id#244828] +- Filter ((isnotnull(name#244829) AND isnotnull(name#244829.middle)) AND (name#244829.last = Jones)) +- FileScan parquet [id#244828,name#244829,p#244836] Batched: false, DataFilters: [isnotnull(name#244829), isnotnull(name#244829.middle), (name#244829.last = Jones)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-f76345e8-d4e8-4a5b-bbdb-2af14d4e34b0/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name), IsNotNull(name.middle), EqualTo(name.last,Jones)], ReadSchema: struct<id:int,name:struct<middle:string,last:string>> == Results == == Results == !== Correct Answer - 1 == == Spark Answer - 1 == !struct<> struct<count(id):bigint> ![0] [2]
GlutenParquetV1SchemaPruningSuite.Non-vectorized reader - with partition data column - select one complex field and having is null predicate on another complex field: org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV1SchemaPruningSuite#L1
Results do not match for query: Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]] Timezone Env: == Parsed Logical Plan == 'Project [unresolvedalias(count('id), Some(org.apache.spark.sql.Column$$Lambda$13390/1121076727@2e6a445))] +- Filter (last#244991 = Jones) +- Project [id#244953, name#244954.first AS first#244989, name#244954.middle AS middle#244990, name#244954.last AS last#244991] +- Filter isnotnull(name#244954.middle) +- Project [id#244953, name#244954, address#244955, pets#244956, friends#244957, relatives#244958, employer#244959, relations#244960, p#244961] +- SubqueryAlias contacts +- View (`contacts`, [id#244953,name#244954,address#244955,pets#244956,friends#244957,relatives#244958,employer#244959,relations#244960,p#244961]) +- Relation [id#244953,name#244954,address#244955,pets#244956,friends#244957,relatives#244958,employer#244959,relations#244960,p#244961] parquet == Analyzed Logical Plan == count(id): bigint Aggregate [count(id#244953) AS count(id)#244998L] +- Filter (last#244991 = Jones) +- Project [id#244953, name#244954.first AS first#244989, name#244954.middle AS middle#244990, name#244954.last AS last#244991] +- Filter isnotnull(name#244954.middle) +- Project [id#244953, name#244954, address#244955, pets#244956, friends#244957, relatives#244958, employer#244959, relations#244960, p#244961] +- SubqueryAlias contacts +- View (`contacts`, [id#244953,name#244954,address#244955,pets#244956,friends#244957,relatives#244958,employer#244959,relations#244960,p#244961]) +- Relation [id#244953,name#244954,address#244955,pets#244956,friends#244957,relatives#244958,employer#244959,relations#244960,p#244961] parquet == Optimized Logical Plan == Aggregate [count(id#244953) AS count(id)#244998L] +- Project [id#244953] +- Filter ((isnotnull(name#244954) AND isnotnull(name#244954.middle)) AND (name#244954.last = Jones)) +- Relation [id#244953,name#244954,address#244955,pets#244956,friends#244957,relatives#244958,employer#244959,relations#244960,p#244961] parquet == Physical Plan == AdaptiveSparkPlan isFinalPlan=true +- == Final Plan == VeloxColumnarToRow +- ^(18519) HashAggregateTransformer(keys=[], functions=[count(id#244953)], isStreamingAgg=false, output=[count(id)#244998L]) +- ^(18519) InputIteratorTransformer[count#245010L] +- ShuffleQueryStage 0 +- ColumnarExchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=1228755], [shuffle_writer_type=hash], [OUTPUT] List(count:LongType), [OUTPUT] List(count:LongType) +- VeloxResizeBatches 1024, 2147483647 +- ^(18518) FlushableHashAggregateTransformer(keys=[], functions=[partial_count(id#244953)], isStreamingAgg=false, output=[count#245010L]) +- ^(18518) ProjectExecTransformer [id#244953] +- ^(18518) FilterExecTransformer ((isnotnull(name#244954) AND isnotnull(name#244954.middle)) AND (name#244954.last = Jones)) +- ^(18518) FileScanTransformer parquet [id#244953,name#244954,p#244961] Batched: true, DataFilters: [isnotnull(name#244954), isnotnull(name#244954.middle), (name#244954.last = Jones)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-d7e7f713-4548-4b40-a85d-b6698d80cdc8/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name), 
IsNotNull(name.middle), EqualTo(name.last,Jones)], ReadSchema: struct<id:int,name:struct<middle:string,last:string>> +- == Initial Plan == HashAggregate(keys=[], functions=[count(id#244953)], output=[count(id)#244998L]) +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=1228716] +- HashAggregate(keys=[], functions=[partial_count(id#244953)], output=[count#245010L]) +- Project [id#244953] +- Filter ((isnotnull(name#244954) AND isnotnull(name#244954.middle)) AND (name#244954.last = Jones)) +- FileScan parquet [id#244953,name#244954,p#244961] Batched: false, DataFilters: [isnotnull(name#244954), isnotnull(name#244954.middle), (name#244954.last = Jones)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-d7e7f713-4548-4b40-a85d-b6698d80cdc8/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name), IsNotNull(name.middle), EqualTo(name.last,Jones)], ReadSchema: struct<id:int,name:struct<middle:string,last:string>> == Results == == Results == !== Correct Answer - 1 == == Spark Answer - 1 == !struct<> struct<count(id):bigint> ![0] [2]
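The four failures above are one test ("select one complex field and having is null predicate on another complex field") run under the vectorized and non-vectorized readers, with and without a partition data column; in every case the correct answer is a count of 0 and the Gluten run returns 2. Below is a rough, hypothetical reconstruction of the scenario from the parsed logical plan in these annotations: the `contacts` name, the nested schema, and the expected/actual answers come from the log, while the sample rows and all other details are assumptions for illustration only.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.count

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// Illustrative rows only, chosen to be consistent with the names that
// appear in the result diffs ([Jane,X.], [John,Y.], [Janet,null],
// [Jim,null]); the suite's real dataset is not shown in this log.
case class Name(first: String, middle: String, last: String)
case class Contact(id: Int, name: Name)
Seq(
  Contact(0, Name("Jane", "X.", "Doe")),
  Contact(1, Name("John", "Y.", "Doe")),
  Contact(2, Name("Janet", null, "Jones")),
  Contact(3, Name("Jim", null, "Jones"))
).toDF().write.mode("overwrite").parquet("/tmp/contacts")
spark.read.parquet("/tmp/contacts").createOrReplaceTempView("contacts")

// Query shape lifted from the parsed plan: filter on name.middle,
// project the nested fields, filter on the aliased last name, count.
val query = spark.table("contacts")
  .where("name.middle is not null")
  .select("id", "name.first", "name.middle", "name.last")
  .where("last = 'Jones'")
  .select(count("id"))

// Per the diff the correct answer is 0; the failing run returned 2,
// i.e. rows with a null middle name survived the pushed
// IsNotNull(name.middle) predicate.
query.show()
```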
GlutenParquetV1SchemaPruningSuite.Spark vectorized reader - without partition data column - SPARK-34963: extract case-insensitive struct field from struct: org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV1SchemaPruningSuite#L1
Results do not match for query: Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]] Timezone Env: == Parsed Logical Plan == 'Project ['Name.First, 'NAME.MiDDle] +- Filter isnotnull(Name#257861.MIDDLE) +- SubqueryAlias contacts +- View (`contacts`, [id#257860,name#257861,address#257862,pets#257863,friends#257864,relatives#257865,employer#257866,relations#257867,p#257868]) +- Relation [id#257860,name#257861,address#257862,pets#257863,friends#257864,relatives#257865,employer#257866,relations#257867,p#257868] parquet == Analyzed Logical Plan == First: string, MiDDle: string Project [Name#257861.First AS First#257931, NAME#257861.MiDDle AS MiDDle#257932] +- Filter isnotnull(Name#257861.MIDDLE) +- SubqueryAlias contacts +- View (`contacts`, [id#257860,name#257861,address#257862,pets#257863,friends#257864,relatives#257865,employer#257866,relations#257867,p#257868]) +- Relation [id#257860,name#257861,address#257862,pets#257863,friends#257864,relatives#257865,employer#257866,relations#257867,p#257868] parquet == Optimized Logical Plan == Project [name#257861.first AS First#257931, name#257861.middle AS MiDDle#257932] +- Filter (isnotnull(name#257861) AND isnotnull(name#257861.middle)) +- Relation [id#257860,name#257861,address#257862,pets#257863,friends#257864,relatives#257865,employer#257866,relations#257867,p#257868] parquet == Physical Plan == VeloxColumnarToRow +- ^(19118) ProjectExecTransformer [name#257861.first AS First#257931, name#257861.middle AS MiDDle#257932] +- ^(19118) FilterExecTransformer (isnotnull(name#257861) AND isnotnull(name#257861.middle)) +- ^(19118) FileScanTransformer parquet [name#257861,p#257868] Batched: true, DataFilters: [isnotnull(name#257861), isnotnull(name#257861.middle)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-fdff98a1-de98-4725-a10f-79ca54ea3241/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name), IsNotNull(name.middle)], ReadSchema: struct<name:struct<first:string,middle:string>> == Results == == Results == !== Correct Answer - 2 == == Spark Answer - 4 == !struct<> struct<First:string,MiDDle:string> [Jane,X.] [Jane,X.] ![John,Y.] [Janet,null] ! [Jim,null] ! [John,Y.]
GlutenParquetV1SchemaPruningSuite.Spark vectorized reader - with partition data column - SPARK-34963: extract case-insensitive struct field from struct: org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV1SchemaPruningSuite#L1
Results do not match for query: Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]] Timezone Env: == Parsed Logical Plan == 'Project ['Name.First, 'NAME.MiDDle] +- Filter isnotnull(Name#258009.MIDDLE) +- SubqueryAlias contacts +- View (`contacts`, [id#258008,name#258009,address#258010,pets#258011,friends#258012,relatives#258013,employer#258014,relations#258015,p#258016]) +- Relation [id#258008,name#258009,address#258010,pets#258011,friends#258012,relatives#258013,employer#258014,relations#258015,p#258016] parquet == Analyzed Logical Plan == First: string, MiDDle: string Project [Name#258009.First AS First#258079, NAME#258009.MiDDle AS MiDDle#258080] +- Filter isnotnull(Name#258009.MIDDLE) +- SubqueryAlias contacts +- View (`contacts`, [id#258008,name#258009,address#258010,pets#258011,friends#258012,relatives#258013,employer#258014,relations#258015,p#258016]) +- Relation [id#258008,name#258009,address#258010,pets#258011,friends#258012,relatives#258013,employer#258014,relations#258015,p#258016] parquet == Optimized Logical Plan == Project [name#258009.first AS First#258079, name#258009.middle AS MiDDle#258080] +- Filter (isnotnull(name#258009) AND isnotnull(name#258009.middle)) +- Relation [id#258008,name#258009,address#258010,pets#258011,friends#258012,relatives#258013,employer#258014,relations#258015,p#258016] parquet == Physical Plan == VeloxColumnarToRow +- ^(19122) ProjectExecTransformer [name#258009.first AS First#258079, name#258009.middle AS MiDDle#258080] +- ^(19122) FilterExecTransformer (isnotnull(name#258009) AND isnotnull(name#258009.middle)) +- ^(19122) FileScanTransformer parquet [name#258009,p#258016] Batched: true, DataFilters: [isnotnull(name#258009), isnotnull(name#258009.middle)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-cc51932e-5373-4187-9dbc-ac23e24574b9/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name), IsNotNull(name.middle)], ReadSchema: struct<name:struct<first:string,middle:string>> == Results == == Results == !== Correct Answer - 2 == == Spark Answer - 4 == !struct<> struct<First:string,MiDDle:string> [Jane,X.] [Jane,X.] ![John,Y.] [Janet,null] ! [Jim,null] ! [John,Y.]
GlutenParquetV1SchemaPruningSuite.Non-vectorized reader - without partition data column - SPARK-34963: extract case-insensitive struct field from struct: org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV1SchemaPruningSuite#L1
Results do not match for query: Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]] Timezone Env: == Parsed Logical Plan == 'Project ['Name.First, 'NAME.MiDDle] +- Filter isnotnull(Name#258151.MIDDLE) +- SubqueryAlias contacts +- View (`contacts`, [id#258150,name#258151,address#258152,pets#258153,friends#258154,relatives#258155,employer#258156,relations#258157,p#258158]) +- Relation [id#258150,name#258151,address#258152,pets#258153,friends#258154,relatives#258155,employer#258156,relations#258157,p#258158] parquet == Analyzed Logical Plan == First: string, MiDDle: string Project [Name#258151.First AS First#258221, NAME#258151.MiDDle AS MiDDle#258222] +- Filter isnotnull(Name#258151.MIDDLE) +- SubqueryAlias contacts +- View (`contacts`, [id#258150,name#258151,address#258152,pets#258153,friends#258154,relatives#258155,employer#258156,relations#258157,p#258158]) +- Relation [id#258150,name#258151,address#258152,pets#258153,friends#258154,relatives#258155,employer#258156,relations#258157,p#258158] parquet == Optimized Logical Plan == Project [name#258151.first AS First#258221, name#258151.middle AS MiDDle#258222] +- Filter (isnotnull(name#258151) AND isnotnull(name#258151.middle)) +- Relation [id#258150,name#258151,address#258152,pets#258153,friends#258154,relatives#258155,employer#258156,relations#258157,p#258158] parquet == Physical Plan == VeloxColumnarToRow +- ^(19126) ProjectExecTransformer [name#258151.first AS First#258221, name#258151.middle AS MiDDle#258222] +- ^(19126) FilterExecTransformer (isnotnull(name#258151) AND isnotnull(name#258151.middle)) +- ^(19126) FileScanTransformer parquet [name#258151,p#258158] Batched: true, DataFilters: [isnotnull(name#258151), isnotnull(name#258151.middle)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-a76268f2-5e51-42c5-a72c-d6f504dae3f3/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name), IsNotNull(name.middle)], ReadSchema: struct<name:struct<first:string,middle:string>> == Results == == Results == !== Correct Answer - 2 == == Spark Answer - 4 == !struct<> struct<First:string,MiDDle:string> [Jane,X.] [Jane,X.] ![John,Y.] [Janet,null] ! [Jim,null] ! [John,Y.]
GlutenParquetV1SchemaPruningSuite.Non-vectorized reader - with partition data column - SPARK-34963: extract case-insensitive struct field from struct: org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV1SchemaPruningSuite#L1
Results do not match for query: Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]] Timezone Env: == Parsed Logical Plan == 'Project ['Name.First, 'NAME.MiDDle] +- Filter isnotnull(Name#258299.MIDDLE) +- SubqueryAlias contacts +- View (`contacts`, [id#258298,name#258299,address#258300,pets#258301,friends#258302,relatives#258303,employer#258304,relations#258305,p#258306]) +- Relation [id#258298,name#258299,address#258300,pets#258301,friends#258302,relatives#258303,employer#258304,relations#258305,p#258306] parquet == Analyzed Logical Plan == First: string, MiDDle: string Project [Name#258299.First AS First#258369, NAME#258299.MiDDle AS MiDDle#258370] +- Filter isnotnull(Name#258299.MIDDLE) +- SubqueryAlias contacts +- View (`contacts`, [id#258298,name#258299,address#258300,pets#258301,friends#258302,relatives#258303,employer#258304,relations#258305,p#258306]) +- Relation [id#258298,name#258299,address#258300,pets#258301,friends#258302,relatives#258303,employer#258304,relations#258305,p#258306] parquet == Optimized Logical Plan == Project [name#258299.first AS First#258369, name#258299.middle AS MiDDle#258370] +- Filter (isnotnull(name#258299) AND isnotnull(name#258299.middle)) +- Relation [id#258298,name#258299,address#258300,pets#258301,friends#258302,relatives#258303,employer#258304,relations#258305,p#258306] parquet == Physical Plan == VeloxColumnarToRow +- ^(19130) ProjectExecTransformer [name#258299.first AS First#258369, name#258299.middle AS MiDDle#258370] +- ^(19130) FilterExecTransformer (isnotnull(name#258299) AND isnotnull(name#258299.middle)) +- ^(19130) FileScanTransformer parquet [name#258299,p#258306] Batched: true, DataFilters: [isnotnull(name#258299), isnotnull(name#258299.middle)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-aacaf644-49cf-4b4e-a7dc-75ce59ed7d15/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name), IsNotNull(name.middle)], ReadSchema: struct<name:struct<first:string,middle:string>> == Results == == Results == !== Correct Answer - 2 == == Spark Answer - 4 == !struct<> struct<First:string,MiDDle:string> [Jane,X.] [Jane,X.] ![John,Y.] [Janet,null] ! [Jim,null] ! [John,Y.]
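The four SPARK-34963 failures above show the same predicate being dropped in a case-insensitive setting: `Name.First`, `NAME.MiDDle`, and `Name.MIDDLE` all resolve to the one lowercase struct field, the scan pushes `IsNotNull(name.middle)`, and the run returns 4 rows where 2 are expected. A minimal sketch, reusing the illustrative `contacts` view from the previous block; the mixed-case references are copied from the parsed plan, and `spark.sql.caseSensitive` is the standard Spark conf (false by default):

```scala
// Case-insensitive analysis: all three spellings refer to name.middle.
spark.conf.set("spark.sql.caseSensitive", "false")

val ci = spark.table("contacts")
  .where("Name.MIDDLE is not null")
  .select("Name.First", "NAME.MiDDle")

// Expected 2 rows ([Jane,X.] and [John,Y.]); the failing run also
// returned [Janet,null] and [Jim,null], so the pushed IsNotNull
// filter was again not applied to the data.
ci.show()
```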
GlutenParquetV2SchemaPruningSuite.Spark vectorized reader - without partition data column - select one complex field and having is null predicate on another complex field: org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV2SchemaPruningSuite#L1
Results do not match for query: Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]] Timezone Env: == Parsed Logical Plan == 'Project [unresolvedalias(count('id), Some(org.apache.spark.sql.Column$$Lambda$13390/1121076727@2e6a445))] +- Filter (last#167351 = Jones) +- Project [id#167313, name#167314.first AS first#167349, name#167314.middle AS middle#167350, name#167314.last AS last#167351] +- Filter isnotnull(name#167314.middle) +- Project [id#167313, name#167314, address#167315, pets#167316, friends#167317, relatives#167318, employer#167319, relations#167320, p#167321] +- SubqueryAlias contacts +- View (`contacts`, [id#167313,name#167314,address#167315,pets#167316,friends#167317,relatives#167318,employer#167319,relations#167320,p#167321]) +- RelationV2[id#167313, name#167314, address#167315, pets#167316, friends#167317, relatives#167318, employer#167319, relations#167320, p#167321] parquet file:/tmp/spark-b110863f-3f85-4cb2-b067-c5cbcb7f5713/contacts == Analyzed Logical Plan == count(id): bigint Aggregate [count(id#167313) AS count(id)#167358L] +- Filter (last#167351 = Jones) +- Project [id#167313, name#167314.first AS first#167349, name#167314.middle AS middle#167350, name#167314.last AS last#167351] +- Filter isnotnull(name#167314.middle) +- Project [id#167313, name#167314, address#167315, pets#167316, friends#167317, relatives#167318, employer#167319, relations#167320, p#167321] +- SubqueryAlias contacts +- View (`contacts`, [id#167313,name#167314,address#167315,pets#167316,friends#167317,relatives#167318,employer#167319,relations#167320,p#167321]) +- RelationV2[id#167313, name#167314, address#167315, pets#167316, friends#167317, relatives#167318, employer#167319, relations#167320, p#167321] parquet file:/tmp/spark-b110863f-3f85-4cb2-b067-c5cbcb7f5713/contacts == Optimized Logical Plan == Aggregate [count(id#167313) AS count(id)#167358L] +- Project [id#167313] +- Filter ((isnotnull(name#167314) AND isnotnull(name#167314.middle)) AND (name#167314.last = Jones)) +- RelationV2[id#167313, name#167314] parquet file:/tmp/spark-b110863f-3f85-4cb2-b067-c5cbcb7f5713/contacts == Physical Plan == AdaptiveSparkPlan isFinalPlan=true +- == Final Plan == VeloxColumnarToRow +- ^(11326) HashAggregateTransformer(keys=[], functions=[count(id#167313)], isStreamingAgg=false, output=[count(id)#167358L]) +- ^(11326) InputIteratorTransformer[count#167367L] +- ShuffleQueryStage 0 +- ColumnarExchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=946810], [shuffle_writer_type=hash], [OUTPUT] List(count:LongType), [OUTPUT] List(count:LongType) +- VeloxResizeBatches 1024, 2147483647 +- ^(11325) FlushableHashAggregateTransformer(keys=[], functions=[partial_count(id#167313)], isStreamingAgg=false, output=[count#167367L]) +- ^(11325) ProjectExecTransformer [id#167313] +- ^(11325) FilterExecTransformer ((isnotnull(name#167314) AND isnotnull(name#167314.middle)) AND (name#167314.last = Jones)) +- ^(11325) BatchScanExecTransformer[id#167313, name#167314] ParquetScan DataFilters: [isnotnull(name#167314), isnotnull(name#167314.middle), (name#167314.last = Jones)], Format: parquet, Location: InMemoryFileIndex(1 
paths)[file:/tmp/spark-b110863f-3f85-4cb2-b067-c5cbcb7f5713/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.middle), EqualTo(name.last,Jones)], ReadSchema: struct<id:int,name:struct<middle:string,last:string>>, PushedFilters: [IsNotNull(name.middle), EqualTo(name.last,Jones)] RuntimeFilters: [] +- == Initial Plan == HashAggregate(keys=[], functions=[count(id#167313)], output=[count(id)#167358L]) +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=946773] +- HashAggregate(keys=[], functions=[partial_count(id#167313)], output=[count#167367L]) +- Project [id#167313] +- Filter ((isnotnull(name#167314) AND isnotnull(name#167314.middle)) AND (name#167314.last = Jones)) +- BatchScan[id#167313, name#167314] ParquetScan DataFilters: [isnotnull(name#167314), isnotnull(name#167314.middle), (name#167314.last = Jones)], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-b110863f-3f85-4cb2-b067-c5cbcb7f5713/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.middle), EqualTo(name.last,Jones)], ReadSchema: struct<id:int,name:struct<middle:string,last:string>>, PushedFilters: [IsNotNull(name.middle), EqualTo(name.last,Jones)] RuntimeFilters: [] == Results == == Results == !== Correct Answer - 1 == == Spark Answer - 1 == !struct<> struct<count(id):bigint> ![0] [2]
GlutenParquetV2SchemaPruningSuite.Spark vectorized reader - with partition data column - select one complex field and having is null predicate on another complex field: org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV2SchemaPruningSuite#L1
Results do not match for query: Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]] Timezone Env: == Parsed Logical Plan == 'Project [unresolvedalias(count('id), Some(org.apache.spark.sql.Column$$Lambda$13390/1121076727@2e6a445))] +- Filter (last#167476 = Jones) +- Project [id#167438, name#167439.first AS first#167474, name#167439.middle AS middle#167475, name#167439.last AS last#167476] +- Filter isnotnull(name#167439.middle) +- Project [id#167438, name#167439, address#167440, pets#167441, friends#167442, relatives#167443, employer#167444, relations#167445, p#167446] +- SubqueryAlias contacts +- View (`contacts`, [id#167438,name#167439,address#167440,pets#167441,friends#167442,relatives#167443,employer#167444,relations#167445,p#167446]) +- RelationV2[id#167438, name#167439, address#167440, pets#167441, friends#167442, relatives#167443, employer#167444, relations#167445, p#167446] parquet file:/tmp/spark-d90c9f45-df42-4689-b81a-90e59ed12635/contacts == Analyzed Logical Plan == count(id): bigint Aggregate [count(id#167438) AS count(id)#167483L] +- Filter (last#167476 = Jones) +- Project [id#167438, name#167439.first AS first#167474, name#167439.middle AS middle#167475, name#167439.last AS last#167476] +- Filter isnotnull(name#167439.middle) +- Project [id#167438, name#167439, address#167440, pets#167441, friends#167442, relatives#167443, employer#167444, relations#167445, p#167446] +- SubqueryAlias contacts +- View (`contacts`, [id#167438,name#167439,address#167440,pets#167441,friends#167442,relatives#167443,employer#167444,relations#167445,p#167446]) +- RelationV2[id#167438, name#167439, address#167440, pets#167441, friends#167442, relatives#167443, employer#167444, relations#167445, p#167446] parquet file:/tmp/spark-d90c9f45-df42-4689-b81a-90e59ed12635/contacts == Optimized Logical Plan == Aggregate [count(id#167438) AS count(id)#167483L] +- Project [id#167438] +- Filter ((isnotnull(name#167439) AND isnotnull(name#167439.middle)) AND (name#167439.last = Jones)) +- RelationV2[id#167438, name#167439] parquet file:/tmp/spark-d90c9f45-df42-4689-b81a-90e59ed12635/contacts == Physical Plan == AdaptiveSparkPlan isFinalPlan=true +- == Final Plan == VeloxColumnarToRow +- ^(11330) HashAggregateTransformer(keys=[], functions=[count(id#167438)], isStreamingAgg=false, output=[count(id)#167483L]) +- ^(11330) InputIteratorTransformer[count#167492L] +- ShuffleQueryStage 0 +- ColumnarExchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=947027], [shuffle_writer_type=hash], [OUTPUT] List(count:LongType), [OUTPUT] List(count:LongType) +- VeloxResizeBatches 1024, 2147483647 +- ^(11329) FlushableHashAggregateTransformer(keys=[], functions=[partial_count(id#167438)], isStreamingAgg=false, output=[count#167492L]) +- ^(11329) ProjectExecTransformer [id#167438] +- ^(11329) FilterExecTransformer ((isnotnull(name#167439) AND isnotnull(name#167439.middle)) AND (name#167439.last = Jones)) +- ^(11329) BatchScanExecTransformer[id#167438, name#167439] ParquetScan DataFilters: [isnotnull(name#167439), isnotnull(name#167439.middle), (name#167439.last = Jones)], Format: parquet, Location: InMemoryFileIndex(1 
paths)[file:/tmp/spark-d90c9f45-df42-4689-b81a-90e59ed12635/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.middle), EqualTo(name.last,Jones)], ReadSchema: struct<id:int,name:struct<middle:string,last:string>>, PushedFilters: [IsNotNull(name.middle), EqualTo(name.last,Jones)] RuntimeFilters: [] +- == Initial Plan == HashAggregate(keys=[], functions=[count(id#167438)], output=[count(id)#167483L]) +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=946990] +- HashAggregate(keys=[], functions=[partial_count(id#167438)], output=[count#167492L]) +- Project [id#167438] +- Filter ((isnotnull(name#167439) AND isnotnull(name#167439.middle)) AND (name#167439.last = Jones)) +- BatchScan[id#167438, name#167439] ParquetScan DataFilters: [isnotnull(name#167439), isnotnull(name#167439.middle), (name#167439.last = Jones)], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-d90c9f45-df42-4689-b81a-90e59ed12635/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.middle), EqualTo(name.last,Jones)], ReadSchema: struct<id:int,name:struct<middle:string,last:string>>, PushedFilters: [IsNotNull(name.middle), EqualTo(name.last,Jones)] RuntimeFilters: [] == Results == == Results == !== Correct Answer - 1 == == Spark Answer - 1 == !struct<> struct<count(id):bigint> ![0] [2]
GlutenParquetV2SchemaPruningSuite.Non-vectorized reader - without partition data column - select one complex field and having is null predicate on another complex field: org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV2SchemaPruningSuite#L1
Results do not match for query: Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]] Timezone Env: == Parsed Logical Plan == 'Project [unresolvedalias(count('id), Some(org.apache.spark.sql.Column$$Lambda$13390/1121076727@2e6a445))] +- Filter (last#167595 = Jones) +- Project [id#167557, name#167558.first AS first#167593, name#167558.middle AS middle#167594, name#167558.last AS last#167595] +- Filter isnotnull(name#167558.middle) +- Project [id#167557, name#167558, address#167559, pets#167560, friends#167561, relatives#167562, employer#167563, relations#167564, p#167565] +- SubqueryAlias contacts +- View (`contacts`, [id#167557,name#167558,address#167559,pets#167560,friends#167561,relatives#167562,employer#167563,relations#167564,p#167565]) +- RelationV2[id#167557, name#167558, address#167559, pets#167560, friends#167561, relatives#167562, employer#167563, relations#167564, p#167565] parquet file:/tmp/spark-8cd40a91-8a66-47ca-aea7-c50bf9bc8a55/contacts == Analyzed Logical Plan == count(id): bigint Aggregate [count(id#167557) AS count(id)#167602L] +- Filter (last#167595 = Jones) +- Project [id#167557, name#167558.first AS first#167593, name#167558.middle AS middle#167594, name#167558.last AS last#167595] +- Filter isnotnull(name#167558.middle) +- Project [id#167557, name#167558, address#167559, pets#167560, friends#167561, relatives#167562, employer#167563, relations#167564, p#167565] +- SubqueryAlias contacts +- View (`contacts`, [id#167557,name#167558,address#167559,pets#167560,friends#167561,relatives#167562,employer#167563,relations#167564,p#167565]) +- RelationV2[id#167557, name#167558, address#167559, pets#167560, friends#167561, relatives#167562, employer#167563, relations#167564, p#167565] parquet file:/tmp/spark-8cd40a91-8a66-47ca-aea7-c50bf9bc8a55/contacts == Optimized Logical Plan == Aggregate [count(id#167557) AS count(id)#167602L] +- Project [id#167557] +- Filter ((isnotnull(name#167558) AND isnotnull(name#167558.middle)) AND (name#167558.last = Jones)) +- RelationV2[id#167557, name#167558] parquet file:/tmp/spark-8cd40a91-8a66-47ca-aea7-c50bf9bc8a55/contacts == Physical Plan == AdaptiveSparkPlan isFinalPlan=true +- == Final Plan == VeloxColumnarToRow +- ^(11334) HashAggregateTransformer(keys=[], functions=[count(id#167557)], isStreamingAgg=false, output=[count(id)#167602L]) +- ^(11334) InputIteratorTransformer[count#167611L] +- ShuffleQueryStage 0 +- ColumnarExchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=947244], [shuffle_writer_type=hash], [OUTPUT] List(count:LongType), [OUTPUT] List(count:LongType) +- VeloxResizeBatches 1024, 2147483647 +- ^(11333) FlushableHashAggregateTransformer(keys=[], functions=[partial_count(id#167557)], isStreamingAgg=false, output=[count#167611L]) +- ^(11333) ProjectExecTransformer [id#167557] +- ^(11333) FilterExecTransformer ((isnotnull(name#167558) AND isnotnull(name#167558.middle)) AND (name#167558.last = Jones)) +- ^(11333) BatchScanExecTransformer[id#167557, name#167558] ParquetScan DataFilters: [isnotnull(name#167558), isnotnull(name#167558.middle), (name#167558.last = Jones)], Format: parquet, Location: InMemoryFileIndex(1 
paths)[file:/tmp/spark-8cd40a91-8a66-47ca-aea7-c50bf9bc8a55/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.middle), EqualTo(name.last,Jones)], ReadSchema: struct<id:int,name:struct<middle:string,last:string>>, PushedFilters: [IsNotNull(name.middle), EqualTo(name.last,Jones)] RuntimeFilters: [] +- == Initial Plan == HashAggregate(keys=[], functions=[count(id#167557)], output=[count(id)#167602L]) +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=947207] +- HashAggregate(keys=[], functions=[partial_count(id#167557)], output=[count#167611L]) +- Project [id#167557] +- Filter ((isnotnull(name#167558) AND isnotnull(name#167558.middle)) AND (name#167558.last = Jones)) +- BatchScan[id#167557, name#167558] ParquetScan DataFilters: [isnotnull(name#167558), isnotnull(name#167558.middle), (name#167558.last = Jones)], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-8cd40a91-8a66-47ca-aea7-c50bf9bc8a55/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.middle), EqualTo(name.last,Jones)], ReadSchema: struct<id:int,name:struct<middle:string,last:string>>, PushedFilters: [IsNotNull(name.middle), EqualTo(name.last,Jones)] RuntimeFilters: [] == Results == == Results == !== Correct Answer - 1 == == Spark Answer - 1 == !struct<> struct<count(id):bigint> ![0] [2]
GlutenParquetV2SchemaPruningSuite.Non-vectorized reader - with partition data column - select one complex field and having is null predicate on another complex field: org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV2SchemaPruningSuite#L1
Results do not match for query: Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]] Timezone Env: == Parsed Logical Plan == 'Project [unresolvedalias(count('id), Some(org.apache.spark.sql.Column$$Lambda$13390/1121076727@2e6a445))] +- Filter (last#167720 = Jones) +- Project [id#167682, name#167683.first AS first#167718, name#167683.middle AS middle#167719, name#167683.last AS last#167720] +- Filter isnotnull(name#167683.middle) +- Project [id#167682, name#167683, address#167684, pets#167685, friends#167686, relatives#167687, employer#167688, relations#167689, p#167690] +- SubqueryAlias contacts +- View (`contacts`, [id#167682,name#167683,address#167684,pets#167685,friends#167686,relatives#167687,employer#167688,relations#167689,p#167690]) +- RelationV2[id#167682, name#167683, address#167684, pets#167685, friends#167686, relatives#167687, employer#167688, relations#167689, p#167690] parquet file:/tmp/spark-35ba832c-e968-4532-b9c5-9756e9b682bc/contacts == Analyzed Logical Plan == count(id): bigint Aggregate [count(id#167682) AS count(id)#167727L] +- Filter (last#167720 = Jones) +- Project [id#167682, name#167683.first AS first#167718, name#167683.middle AS middle#167719, name#167683.last AS last#167720] +- Filter isnotnull(name#167683.middle) +- Project [id#167682, name#167683, address#167684, pets#167685, friends#167686, relatives#167687, employer#167688, relations#167689, p#167690] +- SubqueryAlias contacts +- View (`contacts`, [id#167682,name#167683,address#167684,pets#167685,friends#167686,relatives#167687,employer#167688,relations#167689,p#167690]) +- RelationV2[id#167682, name#167683, address#167684, pets#167685, friends#167686, relatives#167687, employer#167688, relations#167689, p#167690] parquet file:/tmp/spark-35ba832c-e968-4532-b9c5-9756e9b682bc/contacts == Optimized Logical Plan == Aggregate [count(id#167682) AS count(id)#167727L] +- Project [id#167682] +- Filter ((isnotnull(name#167683) AND isnotnull(name#167683.middle)) AND (name#167683.last = Jones)) +- RelationV2[id#167682, name#167683] parquet file:/tmp/spark-35ba832c-e968-4532-b9c5-9756e9b682bc/contacts == Physical Plan == AdaptiveSparkPlan isFinalPlan=true +- == Final Plan == VeloxColumnarToRow +- ^(11338) HashAggregateTransformer(keys=[], functions=[count(id#167682)], isStreamingAgg=false, output=[count(id)#167727L]) +- ^(11338) InputIteratorTransformer[count#167736L] +- ShuffleQueryStage 0 +- ColumnarExchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=947461], [shuffle_writer_type=hash], [OUTPUT] List(count:LongType), [OUTPUT] List(count:LongType) +- VeloxResizeBatches 1024, 2147483647 +- ^(11337) FlushableHashAggregateTransformer(keys=[], functions=[partial_count(id#167682)], isStreamingAgg=false, output=[count#167736L]) +- ^(11337) ProjectExecTransformer [id#167682] +- ^(11337) FilterExecTransformer ((isnotnull(name#167683) AND isnotnull(name#167683.middle)) AND (name#167683.last = Jones)) +- ^(11337) BatchScanExecTransformer[id#167682, name#167683] ParquetScan DataFilters: [isnotnull(name#167683), isnotnull(name#167683.middle), (name#167683.last = Jones)], Format: parquet, Location: InMemoryFileIndex(1 
paths)[file:/tmp/spark-35ba832c-e968-4532-b9c5-9756e9b682bc/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.middle), EqualTo(name.last,Jones)], ReadSchema: struct<id:int,name:struct<middle:string,last:string>>, PushedFilters: [IsNotNull(name.middle), EqualTo(name.last,Jones)] RuntimeFilters: [] +- == Initial Plan == HashAggregate(keys=[], functions=[count(id#167682)], output=[count(id)#167727L]) +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=947424] +- HashAggregate(keys=[], functions=[partial_count(id#167682)], output=[count#167736L]) +- Project [id#167682] +- Filter ((isnotnull(name#167683) AND isnotnull(name#167683.middle)) AND (name#167683.last = Jones)) +- BatchScan[id#167682, name#167683] ParquetScan DataFilters: [isnotnull(name#167683), isnotnull(name#167683.middle), (name#167683.last = Jones)], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-35ba832c-e968-4532-b9c5-9756e9b682bc/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.middle), EqualTo(name.last,Jones)], ReadSchema: struct<id:int,name:struct<middle:string,last:string>>, PushedFilters: [IsNotNull(name.middle), EqualTo(name.last,Jones)] RuntimeFilters: [] == Results == == Results == !== Correct Answer - 1 == == Spark Answer - 1 == !struct<> struct<count(id):bigint> ![0] [2]
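These V2 failures mirror the V1 ones, with one notable difference in the scan nodes: the V1 `FileScan` reports `PushedFilters: [IsNotNull(name), IsNotNull(name.middle), EqualTo(name.last,Jones)]`, while the V2 `BatchScan`/`ParquetScan` pushes only the nested filters, `[IsNotNull(name.middle), EqualTo(name.last,Jones)]`, and prints the list twice. Below is a hypothetical harness for comparing the two paths locally; `spark.sql.sources.useV1SourceList` is a real Spark session conf, and the rest of the snippet is an assumption built on the illustrative data above:

```scala
// Force the DSv2 parquet path by removing parquet from the V1 allowlist,
// then inspect what lands in PushedFilters for the same predicate.
spark.conf.set("spark.sql.sources.useV1SourceList", "")

val v2 = spark.read.parquet("/tmp/contacts")
  .where("name.middle is not null and name.last = 'Jones'")
v2.explain("formatted") // PushedFilters under the BatchScan node

// Back to the DSv1 FileScan path for comparison.
spark.conf.set("spark.sql.sources.useV1SourceList", "parquet")

val v1 = spark.read.parquet("/tmp/contacts")
  .where("name.middle is not null and name.last = 'Jones'")
v1.explain("formatted") // FileScan additionally lists IsNotNull(name)
```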
GlutenParquetV2SchemaPruningSuite.Spark vectorized reader - without partition data column - SPARK-34963: extract case-insensitive struct field from struct: org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV2SchemaPruningSuite#L1
Results do not match for query: Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]] Timezone Env: == Parsed Logical Plan == 'Project ['Name.First, 'NAME.MiDDle] +- Filter isnotnull(Name#180206.MIDDLE) +- SubqueryAlias contacts +- View (`contacts`, [id#180205,name#180206,address#180207,pets#180208,friends#180209,relatives#180210,employer#180211,relations#180212,p#180213]) +- RelationV2[id#180205, name#180206, address#180207, pets#180208, friends#180209, relatives#180210, employer#180211, relations#180212, p#180213] parquet file:/tmp/spark-03185a12-9972-4426-a7ee-e05b879be6b5/contacts == Analyzed Logical Plan == First: string, MiDDle: string Project [Name#180206.First AS First#180266, NAME#180206.MiDDle AS MiDDle#180267] +- Filter isnotnull(Name#180206.MIDDLE) +- SubqueryAlias contacts +- View (`contacts`, [id#180205,name#180206,address#180207,pets#180208,friends#180209,relatives#180210,employer#180211,relations#180212,p#180213]) +- RelationV2[id#180205, name#180206, address#180207, pets#180208, friends#180209, relatives#180210, employer#180211, relations#180212, p#180213] parquet file:/tmp/spark-03185a12-9972-4426-a7ee-e05b879be6b5/contacts == Optimized Logical Plan == Project [name#180206.first AS First#180266, name#180206.middle AS MiDDle#180267] +- Filter (isnotnull(name#180206) AND isnotnull(name#180206.middle)) +- RelationV2[name#180206] parquet file:/tmp/spark-03185a12-9972-4426-a7ee-e05b879be6b5/contacts == Physical Plan == VeloxColumnarToRow +- ^(11937) ProjectExecTransformer [name#180206.first AS First#180266, name#180206.middle AS MiDDle#180267] +- ^(11937) FilterExecTransformer (isnotnull(name#180206) AND isnotnull(name#180206.middle)) +- ^(11937) BatchScanExecTransformer[name#180206] ParquetScan DataFilters: [isnotnull(name#180206), isnotnull(name#180206.middle)], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-03185a12-9972-4426-a7ee-e05b879be6b5/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.middle)], ReadSchema: struct<name:struct<first:string,middle:string>>, PushedFilters: [IsNotNull(name.middle)] RuntimeFilters: [] == Results == == Results == !== Correct Answer - 2 == == Spark Answer - 4 == !struct<> struct<First:string,MiDDle:string> [Jane,X.] [Jane,X.] ![John,Y.] [Janet,null] ! [Jim,null] ! [John,Y.]
GlutenParquetV2SchemaPruningSuite.Spark vectorized reader - with partition data column - SPARK-34963: extract case-insensitive struct field from struct: org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV2SchemaPruningSuite#L1
Results do not match for query: Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]] Timezone Env: == Parsed Logical Plan == 'Project ['Name.First, 'NAME.MiDDle] +- Filter isnotnull(Name#180338.MIDDLE) +- SubqueryAlias contacts +- View (`contacts`, [id#180337,name#180338,address#180339,pets#180340,friends#180341,relatives#180342,employer#180343,relations#180344,p#180345]) +- RelationV2[id#180337, name#180338, address#180339, pets#180340, friends#180341, relatives#180342, employer#180343, relations#180344, p#180345] parquet file:/tmp/spark-e78cdbc1-9327-4c82-abbe-be1781e5c94b/contacts == Analyzed Logical Plan == First: string, MiDDle: string Project [Name#180338.First AS First#180398, NAME#180338.MiDDle AS MiDDle#180399] +- Filter isnotnull(Name#180338.MIDDLE) +- SubqueryAlias contacts +- View (`contacts`, [id#180337,name#180338,address#180339,pets#180340,friends#180341,relatives#180342,employer#180343,relations#180344,p#180345]) +- RelationV2[id#180337, name#180338, address#180339, pets#180340, friends#180341, relatives#180342, employer#180343, relations#180344, p#180345] parquet file:/tmp/spark-e78cdbc1-9327-4c82-abbe-be1781e5c94b/contacts == Optimized Logical Plan == Project [name#180338.first AS First#180398, name#180338.middle AS MiDDle#180399] +- Filter (isnotnull(name#180338) AND isnotnull(name#180338.middle)) +- RelationV2[name#180338] parquet file:/tmp/spark-e78cdbc1-9327-4c82-abbe-be1781e5c94b/contacts == Physical Plan == VeloxColumnarToRow +- ^(11941) ProjectExecTransformer [name#180338.first AS First#180398, name#180338.middle AS MiDDle#180399] +- ^(11941) FilterExecTransformer (isnotnull(name#180338) AND isnotnull(name#180338.middle)) +- ^(11941) BatchScanExecTransformer[name#180338] ParquetScan DataFilters: [isnotnull(name#180338), isnotnull(name#180338.middle)], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-e78cdbc1-9327-4c82-abbe-be1781e5c94b/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.middle)], ReadSchema: struct<name:struct<first:string,middle:string>>, PushedFilters: [IsNotNull(name.middle)] RuntimeFilters: [] == Results == == Results == !== Correct Answer - 2 == == Spark Answer - 4 == !struct<> struct<First:string,MiDDle:string> [Jane,X.] [Jane,X.] ![John,Y.] [Janet,null] ! [Jim,null] ! [John,Y.]
GlutenParquetV2SchemaPruningSuite.Non-vectorized reader - without partition data column - SPARK-34963: extract case-insensitive struct field from struct: org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV2SchemaPruningSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:

== Parsed Logical Plan ==
'Project ['Name.First, 'NAME.MiDDle]
+- Filter isnotnull(Name#180464.MIDDLE)
   +- SubqueryAlias contacts
      +- View (`contacts`, [id#180463,name#180464,address#180465,pets#180466,friends#180467,relatives#180468,employer#180469,relations#180470,p#180471])
         +- RelationV2[id#180463, name#180464, address#180465, pets#180466, friends#180467, relatives#180468, employer#180469, relations#180470, p#180471] parquet file:/tmp/spark-bbe6f7a6-e4f7-4cba-bb80-5e16922d111c/contacts

== Analyzed Logical Plan ==
First: string, MiDDle: string
Project [Name#180464.First AS First#180524, NAME#180464.MiDDle AS MiDDle#180525]
+- Filter isnotnull(Name#180464.MIDDLE)
   +- SubqueryAlias contacts
      +- View (`contacts`, [id#180463,name#180464,address#180465,pets#180466,friends#180467,relatives#180468,employer#180469,relations#180470,p#180471])
         +- RelationV2[id#180463, name#180464, address#180465, pets#180466, friends#180467, relatives#180468, employer#180469, relations#180470, p#180471] parquet file:/tmp/spark-bbe6f7a6-e4f7-4cba-bb80-5e16922d111c/contacts

== Optimized Logical Plan ==
Project [name#180464.first AS First#180524, name#180464.middle AS MiDDle#180525]
+- Filter (isnotnull(name#180464) AND isnotnull(name#180464.middle))
   +- RelationV2[name#180464] parquet file:/tmp/spark-bbe6f7a6-e4f7-4cba-bb80-5e16922d111c/contacts

== Physical Plan ==
VeloxColumnarToRow
+- ^(11945) ProjectExecTransformer [name#180464.first AS First#180524, name#180464.middle AS MiDDle#180525]
   +- ^(11945) FilterExecTransformer (isnotnull(name#180464) AND isnotnull(name#180464.middle))
      +- ^(11945) BatchScanExecTransformer[name#180464] ParquetScan DataFilters: [isnotnull(name#180464), isnotnull(name#180464.middle)], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-bbe6f7a6-e4f7-4cba-bb80-5e16922d111c/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.middle)], ReadSchema: struct<name:struct<first:string,middle:string>>, PushedFilters: [IsNotNull(name.middle)] RuntimeFilters: []

== Results ==

== Results ==
!== Correct Answer - 2 ==   == Spark Answer - 4 ==
!struct<>                   struct<First:string,MiDDle:string>
 [Jane,X.]                  [Jane,X.]
![John,Y.]                  [Janet,null]
!                           [Jim,null]
!                           [John,Y.]
GlutenParquetV2SchemaPruningSuite.Non-vectorized reader - with partition data column - SPARK-34963: extract case-insensitive struct field from struct: org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV2SchemaPruningSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:

== Parsed Logical Plan ==
'Project ['Name.First, 'NAME.MiDDle]
+- Filter isnotnull(Name#180596.MIDDLE)
   +- SubqueryAlias contacts
      +- View (`contacts`, [id#180595,name#180596,address#180597,pets#180598,friends#180599,relatives#180600,employer#180601,relations#180602,p#180603])
         +- RelationV2[id#180595, name#180596, address#180597, pets#180598, friends#180599, relatives#180600, employer#180601, relations#180602, p#180603] parquet file:/tmp/spark-262243fc-70a9-4771-8878-219faae974b9/contacts

== Analyzed Logical Plan ==
First: string, MiDDle: string
Project [Name#180596.First AS First#180656, NAME#180596.MiDDle AS MiDDle#180657]
+- Filter isnotnull(Name#180596.MIDDLE)
   +- SubqueryAlias contacts
      +- View (`contacts`, [id#180595,name#180596,address#180597,pets#180598,friends#180599,relatives#180600,employer#180601,relations#180602,p#180603])
         +- RelationV2[id#180595, name#180596, address#180597, pets#180598, friends#180599, relatives#180600, employer#180601, relations#180602, p#180603] parquet file:/tmp/spark-262243fc-70a9-4771-8878-219faae974b9/contacts

== Optimized Logical Plan ==
Project [name#180596.first AS First#180656, name#180596.middle AS MiDDle#180657]
+- Filter (isnotnull(name#180596) AND isnotnull(name#180596.middle))
   +- RelationV2[name#180596] parquet file:/tmp/spark-262243fc-70a9-4771-8878-219faae974b9/contacts

== Physical Plan ==
VeloxColumnarToRow
+- ^(11949) ProjectExecTransformer [name#180596.first AS First#180656, name#180596.middle AS MiDDle#180657]
   +- ^(11949) FilterExecTransformer (isnotnull(name#180596) AND isnotnull(name#180596.middle))
      +- ^(11949) BatchScanExecTransformer[name#180596] ParquetScan DataFilters: [isnotnull(name#180596), isnotnull(name#180596.middle)], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-262243fc-70a9-4771-8878-219faae974b9/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.middle)], ReadSchema: struct<name:struct<first:string,middle:string>>, PushedFilters: [IsNotNull(name.middle)] RuntimeFilters: []

== Results ==

== Results ==
!== Correct Answer - 2 ==   == Spark Answer - 4 ==
!struct<>                   struct<First:string,MiDDle:string>
 [Jane,X.]                  [Jane,X.]
![John,Y.]                  [Janet,null]
!                           [Jim,null]
!                           [John,Y.]
GlutenParquetV1SchemaPruningSuite.Spark vectorized reader - without partition data column - select one complex field and having is null predicate on another complex field: org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV1SchemaPruningSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:

== Parsed Logical Plan ==
'Project [unresolvedalias(count('id), Some(org.apache.spark.sql.Column$$Lambda$14479/1380631591@2a624e19))]
+- Filter (last#294783 = Jones)
   +- Project [id#294745, name#294746.first AS first#294781, name#294746.middle AS middle#294782, name#294746.last AS last#294783]
      +- Filter isnotnull(name#294746.middle)
         +- Project [id#294745, name#294746, address#294747, pets#294748, friends#294749, relatives#294750, employer#294751, relations#294752, p#294753]
            +- SubqueryAlias contacts
               +- View (`contacts`, [id#294745,name#294746,address#294747,pets#294748,friends#294749,relatives#294750,employer#294751,relations#294752,p#294753])
                  +- Relation [id#294745,name#294746,address#294747,pets#294748,friends#294749,relatives#294750,employer#294751,relations#294752,p#294753] parquet

== Analyzed Logical Plan ==
count(id): bigint
Aggregate [count(id#294745) AS count(id)#294790L]
+- Filter (last#294783 = Jones)
   +- Project [id#294745, name#294746.first AS first#294781, name#294746.middle AS middle#294782, name#294746.last AS last#294783]
      +- Filter isnotnull(name#294746.middle)
         +- Project [id#294745, name#294746, address#294747, pets#294748, friends#294749, relatives#294750, employer#294751, relations#294752, p#294753]
            +- SubqueryAlias contacts
               +- View (`contacts`, [id#294745,name#294746,address#294747,pets#294748,friends#294749,relatives#294750,employer#294751,relations#294752,p#294753])
                  +- Relation [id#294745,name#294746,address#294747,pets#294748,friends#294749,relatives#294750,employer#294751,relations#294752,p#294753] parquet

== Optimized Logical Plan ==
Aggregate [count(id#294745) AS count(id)#294790L]
+- Project [id#294745]
   +- Filter ((isnotnull(name#294746.last) AND isnotnull(name#294746.middle)) AND (name#294746.last = Jones))
      +- Relation [id#294745,name#294746,address#294747,pets#294748,friends#294749,relatives#294750,employer#294751,relations#294752,p#294753] parquet

== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=true
+- == Final Plan ==
   VeloxColumnarToRow
   +- ^(22087) HashAggregateTransformer(keys=[], functions=[count(id#294745)], isStreamingAgg=false, output=[count(id)#294790L])
      +- ^(22087) InputIteratorTransformer[count#294802L]
         +- ShuffleQueryStage 0
            +- ColumnarExchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=1406819], [shuffle_writer_type=hash], [OUTPUT] List(count:LongType), [OUTPUT] List(count:LongType)
               +- VeloxResizeBatches 1024, 2147483647
                  +- ^(22086) FlushableHashAggregateTransformer(keys=[], functions=[partial_count(id#294745)], isStreamingAgg=false, output=[count#294802L])
                     +- ^(22086) ProjectExecTransformer [id#294745]
                        +- ^(22086) FilterExecTransformer ((isnotnull(name#294746.last) AND isnotnull(name#294746.middle)) AND (name#294746.last = Jones))
                           +- ^(22086) FileScanTransformer parquet [id#294745,name#294746,p#294753] Batched: true, DataFilters: [isnotnull(name#294746.last), isnotnull(name#294746.middle), (name#294746.last = Jones)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-995b461e-177d-46c7-8b13-543fabdd7765/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], ReadSchema: struct<id:int,name:struct<middle:string,last:string>>
+- == Initial Plan ==
   HashAggregate(keys=[], functions=[count(id#294745)], output=[count(id)#294790L])
   +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=1406780]
      +- HashAggregate(keys=[], functions=[partial_count(id#294745)], output=[count#294802L])
         +- Project [id#294745]
            +- Filter ((isnotnull(name#294746.last) AND isnotnull(name#294746.middle)) AND (name#294746.last = Jones))
               +- FileScan parquet [id#294745,name#294746,p#294753] Batched: false, DataFilters: [isnotnull(name#294746.last), isnotnull(name#294746.middle), (name#294746.last = Jones)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-995b461e-177d-46c7-8b13-543fabdd7765/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], ReadSchema: struct<id:int,name:struct<middle:string,last:string>>

== Results ==

== Results ==
!== Correct Answer - 1 ==   == Spark Answer - 1 ==
!struct<>                   struct<count(id):bigint>
![0]                        [2]
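For reference, a sketch of the DataFrame chain that matches the parsed plan above (same hypothetical contacts fixture as in the earlier sketch; the suite's exact code may differ):

    import org.apache.spark.sql.functions.{col, count}

    val q = spark.table("contacts")
      .where("name.middle IS NOT NULL")
      .select("id", "name.first", "name.middle", "name.last")
      .where("last = 'Jones'")
      .select(count(col("id")))
    q.show()
    // Expected: 0 — Janet and Jim are the only Jones rows and both have a
    // null middle name, so the first filter should remove them. The failing
    // runs return 2, consistent with the pushed nested-field predicates
    // (IsNotNull(name.middle) etc.) being dropped at the scan.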
GlutenParquetV1SchemaPruningSuite.Spark vectorized reader - with partition data column - select one complex field and having is null predicate on another complex field: org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV1SchemaPruningSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:

== Parsed Logical Plan ==
'Project [unresolvedalias(count('id), Some(org.apache.spark.sql.Column$$Lambda$14479/1380631591@2a624e19))]
+- Filter (last#294908 = Jones)
   +- Project [id#294870, name#294871.first AS first#294906, name#294871.middle AS middle#294907, name#294871.last AS last#294908]
      +- Filter isnotnull(name#294871.middle)
         +- Project [id#294870, name#294871, address#294872, pets#294873, friends#294874, relatives#294875, employer#294876, relations#294877, p#294878]
            +- SubqueryAlias contacts
               +- View (`contacts`, [id#294870,name#294871,address#294872,pets#294873,friends#294874,relatives#294875,employer#294876,relations#294877,p#294878])
                  +- Relation [id#294870,name#294871,address#294872,pets#294873,friends#294874,relatives#294875,employer#294876,relations#294877,p#294878] parquet

== Analyzed Logical Plan ==
count(id): bigint
Aggregate [count(id#294870) AS count(id)#294915L]
+- Filter (last#294908 = Jones)
   +- Project [id#294870, name#294871.first AS first#294906, name#294871.middle AS middle#294907, name#294871.last AS last#294908]
      +- Filter isnotnull(name#294871.middle)
         +- Project [id#294870, name#294871, address#294872, pets#294873, friends#294874, relatives#294875, employer#294876, relations#294877, p#294878]
            +- SubqueryAlias contacts
               +- View (`contacts`, [id#294870,name#294871,address#294872,pets#294873,friends#294874,relatives#294875,employer#294876,relations#294877,p#294878])
                  +- Relation [id#294870,name#294871,address#294872,pets#294873,friends#294874,relatives#294875,employer#294876,relations#294877,p#294878] parquet

== Optimized Logical Plan ==
Aggregate [count(id#294870) AS count(id)#294915L]
+- Project [id#294870]
   +- Filter ((isnotnull(name#294871.last) AND isnotnull(name#294871.middle)) AND (name#294871.last = Jones))
      +- Relation [id#294870,name#294871,address#294872,pets#294873,friends#294874,relatives#294875,employer#294876,relations#294877,p#294878] parquet

== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=true
+- == Final Plan ==
   VeloxColumnarToRow
   +- ^(22091) HashAggregateTransformer(keys=[], functions=[count(id#294870)], isStreamingAgg=false, output=[count(id)#294915L])
      +- ^(22091) InputIteratorTransformer[count#294927L]
         +- ShuffleQueryStage 0
            +- ColumnarExchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=1407042], [shuffle_writer_type=hash], [OUTPUT] List(count:LongType), [OUTPUT] List(count:LongType)
               +- VeloxResizeBatches 1024, 2147483647
                  +- ^(22090) FlushableHashAggregateTransformer(keys=[], functions=[partial_count(id#294870)], isStreamingAgg=false, output=[count#294927L])
                     +- ^(22090) ProjectExecTransformer [id#294870]
                        +- ^(22090) FilterExecTransformer ((isnotnull(name#294871.last) AND isnotnull(name#294871.middle)) AND (name#294871.last = Jones))
                           +- ^(22090) FileScanTransformer parquet [id#294870,name#294871,p#294878] Batched: true, DataFilters: [isnotnull(name#294871.last), isnotnull(name#294871.middle), (name#294871.last = Jones)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-75e220a9-640a-497c-a66a-39780a68c92d/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], ReadSchema: struct<id:int,name:struct<middle:string,last:string>>
+- == Initial Plan ==
   HashAggregate(keys=[], functions=[count(id#294870)], output=[count(id)#294915L])
   +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=1407003]
      +- HashAggregate(keys=[], functions=[partial_count(id#294870)], output=[count#294927L])
         +- Project [id#294870]
            +- Filter ((isnotnull(name#294871.last) AND isnotnull(name#294871.middle)) AND (name#294871.last = Jones))
               +- FileScan parquet [id#294870,name#294871,p#294878] Batched: false, DataFilters: [isnotnull(name#294871.last), isnotnull(name#294871.middle), (name#294871.last = Jones)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-75e220a9-640a-497c-a66a-39780a68c92d/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], ReadSchema: struct<id:int,name:struct<middle:string,last:string>>

== Results ==

== Results ==
!== Correct Answer - 1 ==   == Spark Answer - 1 ==
!struct<>                   struct<count(id):bigint>
![0]                        [2]
GlutenParquetV1SchemaPruningSuite.Non-vectorized reader - without partition data column - select one complex field and having is null predicate on another complex field: org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV1SchemaPruningSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:

== Parsed Logical Plan ==
'Project [unresolvedalias(count('id), Some(org.apache.spark.sql.Column$$Lambda$14479/1380631591@2a624e19))]
+- Filter (last#295027 = Jones)
   +- Project [id#294989, name#294990.first AS first#295025, name#294990.middle AS middle#295026, name#294990.last AS last#295027]
      +- Filter isnotnull(name#294990.middle)
         +- Project [id#294989, name#294990, address#294991, pets#294992, friends#294993, relatives#294994, employer#294995, relations#294996, p#294997]
            +- SubqueryAlias contacts
               +- View (`contacts`, [id#294989,name#294990,address#294991,pets#294992,friends#294993,relatives#294994,employer#294995,relations#294996,p#294997])
                  +- Relation [id#294989,name#294990,address#294991,pets#294992,friends#294993,relatives#294994,employer#294995,relations#294996,p#294997] parquet

== Analyzed Logical Plan ==
count(id): bigint
Aggregate [count(id#294989) AS count(id)#295034L]
+- Filter (last#295027 = Jones)
   +- Project [id#294989, name#294990.first AS first#295025, name#294990.middle AS middle#295026, name#294990.last AS last#295027]
      +- Filter isnotnull(name#294990.middle)
         +- Project [id#294989, name#294990, address#294991, pets#294992, friends#294993, relatives#294994, employer#294995, relations#294996, p#294997]
            +- SubqueryAlias contacts
               +- View (`contacts`, [id#294989,name#294990,address#294991,pets#294992,friends#294993,relatives#294994,employer#294995,relations#294996,p#294997])
                  +- Relation [id#294989,name#294990,address#294991,pets#294992,friends#294993,relatives#294994,employer#294995,relations#294996,p#294997] parquet

== Optimized Logical Plan ==
Aggregate [count(id#294989) AS count(id)#295034L]
+- Project [id#294989]
   +- Filter ((isnotnull(name#294990.last) AND isnotnull(name#294990.middle)) AND (name#294990.last = Jones))
      +- Relation [id#294989,name#294990,address#294991,pets#294992,friends#294993,relatives#294994,employer#294995,relations#294996,p#294997] parquet

== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=true
+- == Final Plan ==
   VeloxColumnarToRow
   +- ^(22095) HashAggregateTransformer(keys=[], functions=[count(id#294989)], isStreamingAgg=false, output=[count(id)#295034L])
      +- ^(22095) InputIteratorTransformer[count#295046L]
         +- ShuffleQueryStage 0
            +- ColumnarExchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=1407265], [shuffle_writer_type=hash], [OUTPUT] List(count:LongType), [OUTPUT] List(count:LongType)
               +- VeloxResizeBatches 1024, 2147483647
                  +- ^(22094) FlushableHashAggregateTransformer(keys=[], functions=[partial_count(id#294989)], isStreamingAgg=false, output=[count#295046L])
                     +- ^(22094) ProjectExecTransformer [id#294989]
                        +- ^(22094) FilterExecTransformer ((isnotnull(name#294990.last) AND isnotnull(name#294990.middle)) AND (name#294990.last = Jones))
                           +- ^(22094) FileScanTransformer parquet [id#294989,name#294990,p#294997] Batched: true, DataFilters: [isnotnull(name#294990.last), isnotnull(name#294990.middle), (name#294990.last = Jones)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-2cb1ddc6-9744-4444-ad36-a48219ee06c8/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], ReadSchema: struct<id:int,name:struct<middle:string,last:string>>
+- == Initial Plan ==
   HashAggregate(keys=[], functions=[count(id#294989)], output=[count(id)#295034L])
   +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=1407226]
      +- HashAggregate(keys=[], functions=[partial_count(id#294989)], output=[count#295046L])
         +- Project [id#294989]
            +- Filter ((isnotnull(name#294990.last) AND isnotnull(name#294990.middle)) AND (name#294990.last = Jones))
               +- FileScan parquet [id#294989,name#294990,p#294997] Batched: false, DataFilters: [isnotnull(name#294990.last), isnotnull(name#294990.middle), (name#294990.last = Jones)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-2cb1ddc6-9744-4444-ad36-a48219ee06c8/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], ReadSchema: struct<id:int,name:struct<middle:string,last:string>>

== Results ==

== Results ==
!== Correct Answer - 1 ==   == Spark Answer - 1 ==
!struct<>                   struct<count(id):bigint>
![0]                        [2]
GlutenParquetV1SchemaPruningSuite.Non-vectorized reader - with partition data column - select one complex field and having is null predicate on another complex field: org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV1SchemaPruningSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:

== Parsed Logical Plan ==
'Project [unresolvedalias(count('id), Some(org.apache.spark.sql.Column$$Lambda$14479/1380631591@2a624e19))]
+- Filter (last#295152 = Jones)
   +- Project [id#295114, name#295115.first AS first#295150, name#295115.middle AS middle#295151, name#295115.last AS last#295152]
      +- Filter isnotnull(name#295115.middle)
         +- Project [id#295114, name#295115, address#295116, pets#295117, friends#295118, relatives#295119, employer#295120, relations#295121, p#295122]
            +- SubqueryAlias contacts
               +- View (`contacts`, [id#295114,name#295115,address#295116,pets#295117,friends#295118,relatives#295119,employer#295120,relations#295121,p#295122])
                  +- Relation [id#295114,name#295115,address#295116,pets#295117,friends#295118,relatives#295119,employer#295120,relations#295121,p#295122] parquet

== Analyzed Logical Plan ==
count(id): bigint
Aggregate [count(id#295114) AS count(id)#295159L]
+- Filter (last#295152 = Jones)
   +- Project [id#295114, name#295115.first AS first#295150, name#295115.middle AS middle#295151, name#295115.last AS last#295152]
      +- Filter isnotnull(name#295115.middle)
         +- Project [id#295114, name#295115, address#295116, pets#295117, friends#295118, relatives#295119, employer#295120, relations#295121, p#295122]
            +- SubqueryAlias contacts
               +- View (`contacts`, [id#295114,name#295115,address#295116,pets#295117,friends#295118,relatives#295119,employer#295120,relations#295121,p#295122])
                  +- Relation [id#295114,name#295115,address#295116,pets#295117,friends#295118,relatives#295119,employer#295120,relations#295121,p#295122] parquet

== Optimized Logical Plan ==
Aggregate [count(id#295114) AS count(id)#295159L]
+- Project [id#295114]
   +- Filter ((isnotnull(name#295115.last) AND isnotnull(name#295115.middle)) AND (name#295115.last = Jones))
      +- Relation [id#295114,name#295115,address#295116,pets#295117,friends#295118,relatives#295119,employer#295120,relations#295121,p#295122] parquet

== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=true
+- == Final Plan ==
   VeloxColumnarToRow
   +- ^(22099) HashAggregateTransformer(keys=[], functions=[count(id#295114)], isStreamingAgg=false, output=[count(id)#295159L])
      +- ^(22099) InputIteratorTransformer[count#295171L]
         +- ShuffleQueryStage 0
            +- ColumnarExchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=1407488], [shuffle_writer_type=hash], [OUTPUT] List(count:LongType), [OUTPUT] List(count:LongType)
               +- VeloxResizeBatches 1024, 2147483647
                  +- ^(22098) FlushableHashAggregateTransformer(keys=[], functions=[partial_count(id#295114)], isStreamingAgg=false, output=[count#295171L])
                     +- ^(22098) ProjectExecTransformer [id#295114]
                        +- ^(22098) FilterExecTransformer ((isnotnull(name#295115.last) AND isnotnull(name#295115.middle)) AND (name#295115.last = Jones))
                           +- ^(22098) FileScanTransformer parquet [id#295114,name#295115,p#295122] Batched: true, DataFilters: [isnotnull(name#295115.last), isnotnull(name#295115.middle), (name#295115.last = Jones)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-302d6025-9493-4565-b471-71e149508b29/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], ReadSchema: struct<id:int,name:struct<middle:string,last:string>>
+- == Initial Plan ==
   HashAggregate(keys=[], functions=[count(id#295114)], output=[count(id)#295159L])
   +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=1407449]
      +- HashAggregate(keys=[], functions=[partial_count(id#295114)], output=[count#295171L])
         +- Project [id#295114]
            +- Filter ((isnotnull(name#295115.last) AND isnotnull(name#295115.middle)) AND (name#295115.last = Jones))
               +- FileScan parquet [id#295114,name#295115,p#295122] Batched: false, DataFilters: [isnotnull(name#295115.last), isnotnull(name#295115.middle), (name#295115.last = Jones)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-302d6025-9493-4565-b471-71e149508b29/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], ReadSchema: struct<id:int,name:struct<middle:string,last:string>>

== Results ==

== Results ==
!== Correct Answer - 1 ==   == Spark Answer - 1 ==
!struct<>                   struct<count(id):bigint>
![0]                        [2]
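If the regression is limited to predicates pushed on nested fields, as the PushedFilters lists above suggest, one mitigation sketch is to take parquet out of Spark's nested-predicate push-down allow-list. The conf is standard Spark 3.x; treating it as a workaround for these failures is an assumption:

    // Default is "parquet,orc"; dropping "parquet" reverts parquet scans to
    // pushing only top-level predicates such as IsNotNull(name).
    spark.conf.set(
      "spark.sql.optimizer.nestedPredicatePushdown.supportedFileSources",
      "orc")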
GlutenParquetV1SchemaPruningSuite.Spark vectorized reader - without partition data column - SPARK-34963: extract case-insensitive struct field from struct: org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV1SchemaPruningSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:

== Parsed Logical Plan ==
'Project ['Name.First, 'NAME.MiDDle]
+- Filter isnotnull(Name#308179.MIDDLE)
   +- SubqueryAlias contacts
      +- View (`contacts`, [id#308178,name#308179,address#308180,pets#308181,friends#308182,relatives#308183,employer#308184,relations#308185,p#308186])
         +- Relation [id#308178,name#308179,address#308180,pets#308181,friends#308182,relatives#308183,employer#308184,relations#308185,p#308186] parquet

== Analyzed Logical Plan ==
First: string, MiDDle: string
Project [Name#308179.First AS First#308249, NAME#308179.MiDDle AS MiDDle#308250]
+- Filter isnotnull(Name#308179.MIDDLE)
   +- SubqueryAlias contacts
      +- View (`contacts`, [id#308178,name#308179,address#308180,pets#308181,friends#308182,relatives#308183,employer#308184,relations#308185,p#308186])
         +- Relation [id#308178,name#308179,address#308180,pets#308181,friends#308182,relatives#308183,employer#308184,relations#308185,p#308186] parquet

== Optimized Logical Plan ==
Project [name#308179.first AS First#308249, name#308179.middle AS MiDDle#308250]
+- Filter isnotnull(name#308179.middle)
   +- Relation [id#308178,name#308179,address#308180,pets#308181,friends#308182,relatives#308183,employer#308184,relations#308185,p#308186] parquet

== Physical Plan ==
VeloxColumnarToRow
+- ^(22719) ProjectExecTransformer [name#308179.first AS First#308249, name#308179.middle AS MiDDle#308250]
   +- ^(22719) FilterExecTransformer isnotnull(name#308179.middle)
      +- ^(22719) FileScanTransformer parquet [name#308179,p#308186] Batched: true, DataFilters: [isnotnull(name#308179.middle)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-cf832e8f-814e-4e40-923b-347a6798b05c/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.middle)], ReadSchema: struct<name:struct<first:string,middle:string>>

== Results ==

== Results ==
!== Correct Answer - 2 ==   == Spark Answer - 4 ==
!struct<>                   struct<First:string,MiDDle:string>
 [Jane,X.]                  [Jane,X.]
![John,Y.]                  [Janet,null]
!                           [Jim,null]
!                           [John,Y.]
GlutenParquetV1SchemaPruningSuite.Spark vectorized reader - with partition data column - SPARK-34963: extract case-insensitive struct field from struct: org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV1SchemaPruningSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:

== Parsed Logical Plan ==
'Project ['Name.First, 'NAME.MiDDle]
+- Filter isnotnull(Name#308327.MIDDLE)
   +- SubqueryAlias contacts
      +- View (`contacts`, [id#308326,name#308327,address#308328,pets#308329,friends#308330,relatives#308331,employer#308332,relations#308333,p#308334])
         +- Relation [id#308326,name#308327,address#308328,pets#308329,friends#308330,relatives#308331,employer#308332,relations#308333,p#308334] parquet

== Analyzed Logical Plan ==
First: string, MiDDle: string
Project [Name#308327.First AS First#308397, NAME#308327.MiDDle AS MiDDle#308398]
+- Filter isnotnull(Name#308327.MIDDLE)
   +- SubqueryAlias contacts
      +- View (`contacts`, [id#308326,name#308327,address#308328,pets#308329,friends#308330,relatives#308331,employer#308332,relations#308333,p#308334])
         +- Relation [id#308326,name#308327,address#308328,pets#308329,friends#308330,relatives#308331,employer#308332,relations#308333,p#308334] parquet

== Optimized Logical Plan ==
Project [name#308327.first AS First#308397, name#308327.middle AS MiDDle#308398]
+- Filter isnotnull(name#308327.middle)
   +- Relation [id#308326,name#308327,address#308328,pets#308329,friends#308330,relatives#308331,employer#308332,relations#308333,p#308334] parquet

== Physical Plan ==
VeloxColumnarToRow
+- ^(22723) ProjectExecTransformer [name#308327.first AS First#308397, name#308327.middle AS MiDDle#308398]
   +- ^(22723) FilterExecTransformer isnotnull(name#308327.middle)
      +- ^(22723) FileScanTransformer parquet [name#308327,p#308334] Batched: true, DataFilters: [isnotnull(name#308327.middle)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-a56ad0a2-4d21-412b-a55f-98d8a3f2854e/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.middle)], ReadSchema: struct<name:struct<first:string,middle:string>>

== Results ==

== Results ==
!== Correct Answer - 2 ==   == Spark Answer - 4 ==
!struct<>                   struct<First:string,MiDDle:string>
 [Jane,X.]                  [Jane,X.]
![John,Y.]                  [Janet,null]
!                           [Jim,null]
!                           [John,Y.]
GlutenParquetV1SchemaPruningSuite.Non-vectorized reader - without partition data column - SPARK-34963: extract case-insensitive struct field from struct: org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV1SchemaPruningSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:

== Parsed Logical Plan ==
'Project ['Name.First, 'NAME.MiDDle]
+- Filter isnotnull(Name#308469.MIDDLE)
   +- SubqueryAlias contacts
      +- View (`contacts`, [id#308468,name#308469,address#308470,pets#308471,friends#308472,relatives#308473,employer#308474,relations#308475,p#308476])
         +- Relation [id#308468,name#308469,address#308470,pets#308471,friends#308472,relatives#308473,employer#308474,relations#308475,p#308476] parquet

== Analyzed Logical Plan ==
First: string, MiDDle: string
Project [Name#308469.First AS First#308539, NAME#308469.MiDDle AS MiDDle#308540]
+- Filter isnotnull(Name#308469.MIDDLE)
   +- SubqueryAlias contacts
      +- View (`contacts`, [id#308468,name#308469,address#308470,pets#308471,friends#308472,relatives#308473,employer#308474,relations#308475,p#308476])
         +- Relation [id#308468,name#308469,address#308470,pets#308471,friends#308472,relatives#308473,employer#308474,relations#308475,p#308476] parquet

== Optimized Logical Plan ==
Project [name#308469.first AS First#308539, name#308469.middle AS MiDDle#308540]
+- Filter isnotnull(name#308469.middle)
   +- Relation [id#308468,name#308469,address#308470,pets#308471,friends#308472,relatives#308473,employer#308474,relations#308475,p#308476] parquet

== Physical Plan ==
VeloxColumnarToRow
+- ^(22727) ProjectExecTransformer [name#308469.first AS First#308539, name#308469.middle AS MiDDle#308540]
   +- ^(22727) FilterExecTransformer isnotnull(name#308469.middle)
      +- ^(22727) FileScanTransformer parquet [name#308469,p#308476] Batched: true, DataFilters: [isnotnull(name#308469.middle)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-28bd014b-db8a-4bfa-8c08-5c00f1b0dfc1/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.middle)], ReadSchema: struct<name:struct<first:string,middle:string>>

== Results ==

== Results ==
!== Correct Answer - 2 ==   == Spark Answer - 4 ==
!struct<>                   struct<First:string,MiDDle:string>
 [Jane,X.]                  [Jane,X.]
![John,Y.]                  [Janet,null]
!                           [Jim,null]
!                           [John,Y.]
GlutenParquetV1SchemaPruningSuite.Non-vectorized reader - with partition data column - SPARK-34963: extract case-insensitive struct field from struct: org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV1SchemaPruningSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:

== Parsed Logical Plan ==
'Project ['Name.First, 'NAME.MiDDle]
+- Filter isnotnull(Name#308617.MIDDLE)
   +- SubqueryAlias contacts
      +- View (`contacts`, [id#308616,name#308617,address#308618,pets#308619,friends#308620,relatives#308621,employer#308622,relations#308623,p#308624])
         +- Relation [id#308616,name#308617,address#308618,pets#308619,friends#308620,relatives#308621,employer#308622,relations#308623,p#308624] parquet

== Analyzed Logical Plan ==
First: string, MiDDle: string
Project [Name#308617.First AS First#308687, NAME#308617.MiDDle AS MiDDle#308688]
+- Filter isnotnull(Name#308617.MIDDLE)
   +- SubqueryAlias contacts
      +- View (`contacts`, [id#308616,name#308617,address#308618,pets#308619,friends#308620,relatives#308621,employer#308622,relations#308623,p#308624])
         +- Relation [id#308616,name#308617,address#308618,pets#308619,friends#308620,relatives#308621,employer#308622,relations#308623,p#308624] parquet

== Optimized Logical Plan ==
Project [name#308617.first AS First#308687, name#308617.middle AS MiDDle#308688]
+- Filter isnotnull(name#308617.middle)
   +- Relation [id#308616,name#308617,address#308618,pets#308619,friends#308620,relatives#308621,employer#308622,relations#308623,p#308624] parquet

== Physical Plan ==
VeloxColumnarToRow
+- ^(22731) ProjectExecTransformer [name#308617.first AS First#308687, name#308617.middle AS MiDDle#308688]
   +- ^(22731) FilterExecTransformer isnotnull(name#308617.middle)
      +- ^(22731) FileScanTransformer parquet [name#308617,p#308624] Batched: true, DataFilters: [isnotnull(name#308617.middle)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-57377ccd-b794-442b-83a1-55caa0e15016/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.middle)], ReadSchema: struct<name:struct<first:string,middle:string>>

== Results ==

== Results ==
!== Correct Answer - 2 ==   == Spark Answer - 4 ==
!struct<>                   struct<First:string,MiDDle:string>
 [Jane,X.]                  [Jane,X.]
![John,Y.]                  [Janet,null]
!                           [Jim,null]
!                           [John,Y.]
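A quick way to see what actually got pushed for a given run, e.g. before and after toggling the conf above, is plain Spark API (nothing Gluten-specific):

    val df = spark.sql(
      "SELECT Name.First, NAME.MiDDle FROM contacts WHERE Name.MIDDLE IS NOT NULL")
    df.explain()  // inspect "PushedFilters: [...]" on the scan node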
GlutenParquetV2SchemaPruningSuite.Spark vectorized reader - without partition data column - select one complex field and having is null predicate on another complex field: org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV2SchemaPruningSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:

== Parsed Logical Plan ==
'Project [unresolvedalias(count('id), Some(org.apache.spark.sql.Column$$Lambda$14479/1380631591@2a624e19))]
+- Filter (last#183293 = Jones)
   +- Project [id#183255, name#183256.first AS first#183291, name#183256.middle AS middle#183292, name#183256.last AS last#183293]
      +- Filter isnotnull(name#183256.middle)
         +- Project [id#183255, name#183256, address#183257, pets#183258, friends#183259, relatives#183260, employer#183261, relations#183262, p#183263]
            +- SubqueryAlias contacts
               +- View (`contacts`, [id#183255,name#183256,address#183257,pets#183258,friends#183259,relatives#183260,employer#183261,relations#183262,p#183263])
                  +- RelationV2[id#183255, name#183256, address#183257, pets#183258, friends#183259, relatives#183260, employer#183261, relations#183262, p#183263] parquet file:/tmp/spark-69df9949-8cf2-49ad-ae2c-6c51256f68b1/contacts

== Analyzed Logical Plan ==
count(id): bigint
Aggregate [count(id#183255) AS count(id)#183300L]
+- Filter (last#183293 = Jones)
   +- Project [id#183255, name#183256.first AS first#183291, name#183256.middle AS middle#183292, name#183256.last AS last#183293]
      +- Filter isnotnull(name#183256.middle)
         +- Project [id#183255, name#183256, address#183257, pets#183258, friends#183259, relatives#183260, employer#183261, relations#183262, p#183263]
            +- SubqueryAlias contacts
               +- View (`contacts`, [id#183255,name#183256,address#183257,pets#183258,friends#183259,relatives#183260,employer#183261,relations#183262,p#183263])
                  +- RelationV2[id#183255, name#183256, address#183257, pets#183258, friends#183259, relatives#183260, employer#183261, relations#183262, p#183263] parquet file:/tmp/spark-69df9949-8cf2-49ad-ae2c-6c51256f68b1/contacts

== Optimized Logical Plan ==
Aggregate [count(id#183255) AS count(id)#183300L]
+- Project [id#183255]
   +- Filter ((isnotnull(name#183256.last) AND isnotnull(name#183256.middle)) AND (name#183256.last = Jones))
      +- RelationV2[id#183255, name#183256] parquet file:/tmp/spark-69df9949-8cf2-49ad-ae2c-6c51256f68b1/contacts

== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=true
+- == Final Plan ==
   VeloxColumnarToRow
   +- ^(11864) HashAggregateTransformer(keys=[], functions=[count(id#183255)], isStreamingAgg=false, output=[count(id)#183300L])
      +- ^(11864) InputIteratorTransformer[count#183305L]
         +- ShuffleQueryStage 0
            +- ColumnarExchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=982225], [shuffle_writer_type=hash], [OUTPUT] List(count:LongType), [OUTPUT] List(count:LongType)
               +- VeloxResizeBatches 1024, 2147483647
                  +- ^(11863) FlushableHashAggregateTransformer(keys=[], functions=[partial_count(id#183255)], isStreamingAgg=false, output=[count#183305L])
                     +- ^(11863) ProjectExecTransformer [id#183255]
                        +- ^(11863) FilterExecTransformer ((isnotnull(name#183256.last) AND isnotnull(name#183256.middle)) AND (name#183256.last = Jones))
                           +- ^(11863) BatchScanExecTransformer[id#183255, name#183256] ParquetScan DataFilters: [isnotnull(name#183256.last), isnotnull(name#183256.middle), (name#183256.last = Jones)], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-69df9949-8cf2-49ad-ae2c-6c51256f68b1/contacts], PartitionFilters: [], PushedAggregation: [], PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], PushedGroupBy: [], ReadSchema: struct<id:int,name:struct<middle:string,last:string>>, PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], PushedAggregation: [], PushedGroupBy: [] RuntimeFilters: []
+- == Initial Plan ==
   HashAggregate(keys=[], functions=[count(id#183255)], output=[count(id)#183300L])
   +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=982188]
      +- HashAggregate(keys=[], functions=[partial_count(id#183255)], output=[count#183305L])
         +- Project [id#183255]
            +- Filter ((isnotnull(name#183256.last) AND isnotnull(name#183256.middle)) AND (name#183256.last = Jones))
               +- BatchScan[id#183255, name#183256] ParquetScan DataFilters: [isnotnull(name#183256.last), isnotnull(name#183256.middle), (name#183256.last = Jones)], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-69df9949-8cf2-49ad-ae2c-6c51256f68b1/contacts], PartitionFilters: [], PushedAggregation: [], PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], PushedGroupBy: [], ReadSchema: struct<id:int,name:struct<middle:string,last:string>>, PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], PushedAggregation: [], PushedGroupBy: [] RuntimeFilters: []

== Results ==

== Results ==
!== Correct Answer - 1 ==   == Spark Answer - 1 ==
!struct<>                   struct<count(id):bigint>
![0]                        [2]
GlutenParquetV2SchemaPruningSuite.Spark vectorized reader - with partition data column - select one complex field and having is null predicate on another complex field: org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV2SchemaPruningSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:

== Parsed Logical Plan ==
'Project [unresolvedalias(count('id), Some(org.apache.spark.sql.Column$$Lambda$14479/1380631591@2a624e19))]
+- Filter (last#183414 = Jones)
   +- Project [id#183376, name#183377.first AS first#183412, name#183377.middle AS middle#183413, name#183377.last AS last#183414]
      +- Filter isnotnull(name#183377.middle)
         +- Project [id#183376, name#183377, address#183378, pets#183379, friends#183380, relatives#183381, employer#183382, relations#183383, p#183384]
            +- SubqueryAlias contacts
               +- View (`contacts`, [id#183376,name#183377,address#183378,pets#183379,friends#183380,relatives#183381,employer#183382,relations#183383,p#183384])
                  +- RelationV2[id#183376, name#183377, address#183378, pets#183379, friends#183380, relatives#183381, employer#183382, relations#183383, p#183384] parquet file:/tmp/spark-125db852-2cde-4a72-a4b4-88dd5251ece4/contacts

== Analyzed Logical Plan ==
count(id): bigint
Aggregate [count(id#183376) AS count(id)#183421L]
+- Filter (last#183414 = Jones)
   +- Project [id#183376, name#183377.first AS first#183412, name#183377.middle AS middle#183413, name#183377.last AS last#183414]
      +- Filter isnotnull(name#183377.middle)
         +- Project [id#183376, name#183377, address#183378, pets#183379, friends#183380, relatives#183381, employer#183382, relations#183383, p#183384]
            +- SubqueryAlias contacts
               +- View (`contacts`, [id#183376,name#183377,address#183378,pets#183379,friends#183380,relatives#183381,employer#183382,relations#183383,p#183384])
                  +- RelationV2[id#183376, name#183377, address#183378, pets#183379, friends#183380, relatives#183381, employer#183382, relations#183383, p#183384] parquet file:/tmp/spark-125db852-2cde-4a72-a4b4-88dd5251ece4/contacts

== Optimized Logical Plan ==
Aggregate [count(id#183376) AS count(id)#183421L]
+- Project [id#183376]
   +- Filter ((isnotnull(name#183377.last) AND isnotnull(name#183377.middle)) AND (name#183377.last = Jones))
      +- RelationV2[id#183376, name#183377] parquet file:/tmp/spark-125db852-2cde-4a72-a4b4-88dd5251ece4/contacts

== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=true
+- == Final Plan ==
   VeloxColumnarToRow
   +- ^(11868) HashAggregateTransformer(keys=[], functions=[count(id#183376)], isStreamingAgg=false, output=[count(id)#183421L])
      +- ^(11868) InputIteratorTransformer[count#183426L]
         +- ShuffleQueryStage 0
            +- ColumnarExchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=982442], [shuffle_writer_type=hash], [OUTPUT] List(count:LongType), [OUTPUT] List(count:LongType)
               +- VeloxResizeBatches 1024, 2147483647
                  +- ^(11867) FlushableHashAggregateTransformer(keys=[], functions=[partial_count(id#183376)], isStreamingAgg=false, output=[count#183426L])
                     +- ^(11867) ProjectExecTransformer [id#183376]
                        +- ^(11867) FilterExecTransformer ((isnotnull(name#183377.last) AND isnotnull(name#183377.middle)) AND (name#183377.last = Jones))
                           +- ^(11867) BatchScanExecTransformer[id#183376, name#183377] ParquetScan DataFilters: [isnotnull(name#183377.last), isnotnull(name#183377.middle), (name#183377.last = Jones)], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-125db852-2cde-4a72-a4b4-88dd5251ece4/contacts], PartitionFilters: [], PushedAggregation: [], PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], PushedGroupBy: [], ReadSchema: struct<id:int,name:struct<middle:string,last:string>>, PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], PushedAggregation: [], PushedGroupBy: [] RuntimeFilters: []
+- == Initial Plan ==
   HashAggregate(keys=[], functions=[count(id#183376)], output=[count(id)#183421L])
   +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=982405]
      +- HashAggregate(keys=[], functions=[partial_count(id#183376)], output=[count#183426L])
         +- Project [id#183376]
            +- Filter ((isnotnull(name#183377.last) AND isnotnull(name#183377.middle)) AND (name#183377.last = Jones))
               +- BatchScan[id#183376, name#183377] ParquetScan DataFilters: [isnotnull(name#183377.last), isnotnull(name#183377.middle), (name#183377.last = Jones)], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-125db852-2cde-4a72-a4b4-88dd5251ece4/contacts], PartitionFilters: [], PushedAggregation: [], PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], PushedGroupBy: [], ReadSchema: struct<id:int,name:struct<middle:string,last:string>>, PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], PushedAggregation: [], PushedGroupBy: [] RuntimeFilters: []

== Results ==

== Results ==
!== Correct Answer - 1 ==   == Spark Answer - 1 ==
!struct<>                   struct<count(id):bigint>
![0]                        [2]
GlutenParquetV2SchemaPruningSuite.Non-vectorized reader - without partition data column - select one complex field and having is null predicate on another complex field: org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV2SchemaPruningSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:

== Parsed Logical Plan ==
'Project [unresolvedalias(count('id), Some(org.apache.spark.sql.Column$$Lambda$14479/1380631591@2a624e19))]
+- Filter (last#183529 = Jones)
   +- Project [id#183491, name#183492.first AS first#183527, name#183492.middle AS middle#183528, name#183492.last AS last#183529]
      +- Filter isnotnull(name#183492.middle)
         +- Project [id#183491, name#183492, address#183493, pets#183494, friends#183495, relatives#183496, employer#183497, relations#183498, p#183499]
            +- SubqueryAlias contacts
               +- View (`contacts`, [id#183491,name#183492,address#183493,pets#183494,friends#183495,relatives#183496,employer#183497,relations#183498,p#183499])
                  +- RelationV2[id#183491, name#183492, address#183493, pets#183494, friends#183495, relatives#183496, employer#183497, relations#183498, p#183499] parquet file:/tmp/spark-d305494e-58c4-46e5-88d8-e98341bc437e/contacts

== Analyzed Logical Plan ==
count(id): bigint
Aggregate [count(id#183491) AS count(id)#183536L]
+- Filter (last#183529 = Jones)
   +- Project [id#183491, name#183492.first AS first#183527, name#183492.middle AS middle#183528, name#183492.last AS last#183529]
      +- Filter isnotnull(name#183492.middle)
         +- Project [id#183491, name#183492, address#183493, pets#183494, friends#183495, relatives#183496, employer#183497, relations#183498, p#183499]
            +- SubqueryAlias contacts
               +- View (`contacts`, [id#183491,name#183492,address#183493,pets#183494,friends#183495,relatives#183496,employer#183497,relations#183498,p#183499])
                  +- RelationV2[id#183491, name#183492, address#183493, pets#183494, friends#183495, relatives#183496, employer#183497, relations#183498, p#183499] parquet file:/tmp/spark-d305494e-58c4-46e5-88d8-e98341bc437e/contacts

== Optimized Logical Plan ==
Aggregate [count(id#183491) AS count(id)#183536L]
+- Project [id#183491]
   +- Filter ((isnotnull(name#183492.last) AND isnotnull(name#183492.middle)) AND (name#183492.last = Jones))
      +- RelationV2[id#183491, name#183492] parquet file:/tmp/spark-d305494e-58c4-46e5-88d8-e98341bc437e/contacts

== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=true
+- == Final Plan ==
   VeloxColumnarToRow
   +- ^(11872) HashAggregateTransformer(keys=[], functions=[count(id#183491)], isStreamingAgg=false, output=[count(id)#183536L])
      +- ^(11872) InputIteratorTransformer[count#183541L]
         +- ShuffleQueryStage 0
            +- ColumnarExchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=982659], [shuffle_writer_type=hash], [OUTPUT] List(count:LongType), [OUTPUT] List(count:LongType)
               +- VeloxResizeBatches 1024, 2147483647
                  +- ^(11871) FlushableHashAggregateTransformer(keys=[], functions=[partial_count(id#183491)], isStreamingAgg=false, output=[count#183541L])
                     +- ^(11871) ProjectExecTransformer [id#183491]
                        +- ^(11871) FilterExecTransformer ((isnotnull(name#183492.last) AND isnotnull(name#183492.middle)) AND (name#183492.last = Jones))
                           +- ^(11871) BatchScanExecTransformer[id#183491, name#183492] ParquetScan DataFilters: [isnotnull(name#183492.last), isnotnull(name#183492.middle), (name#183492.last = Jones)], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-d305494e-58c4-46e5-88d8-e98341bc437e/contacts], PartitionFilters: [], PushedAggregation: [], PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], PushedGroupBy: [], ReadSchema: struct<id:int,name:struct<middle:string,last:string>>, PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], PushedAggregation: [], PushedGroupBy: [] RuntimeFilters: []
+- == Initial Plan ==
   HashAggregate(keys=[], functions=[count(id#183491)], output=[count(id)#183536L])
   +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=982622]
      +- HashAggregate(keys=[], functions=[partial_count(id#183491)], output=[count#183541L])
         +- Project [id#183491]
            +- Filter ((isnotnull(name#183492.last) AND isnotnull(name#183492.middle)) AND (name#183492.last = Jones))
               +- BatchScan[id#183491, name#183492] ParquetScan DataFilters: [isnotnull(name#183492.last), isnotnull(name#183492.middle), (name#183492.last = Jones)], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-d305494e-58c4-46e5-88d8-e98341bc437e/contacts], PartitionFilters: [], PushedAggregation: [], PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], PushedGroupBy: [], ReadSchema: struct<id:int,name:struct<middle:string,last:string>>, PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], PushedAggregation: [], PushedGroupBy: [] RuntimeFilters: []

== Results ==

== Results ==
!== Correct Answer - 1 ==   == Spark Answer - 1 ==
!struct<>                   struct<count(id):bigint>
![0]                        [2]
GlutenParquetV2SchemaPruningSuite.Non-vectorized reader - with partition data column - select one complex field and having is null predicate on another complex field: org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV2SchemaPruningSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:

== Parsed Logical Plan ==
'Project [unresolvedalias(count('id), Some(org.apache.spark.sql.Column$$Lambda$14479/1380631591@2a624e19))]
+- Filter (last#183650 = Jones)
   +- Project [id#183612, name#183613.first AS first#183648, name#183613.middle AS middle#183649, name#183613.last AS last#183650]
      +- Filter isnotnull(name#183613.middle)
         +- Project [id#183612, name#183613, address#183614, pets#183615, friends#183616, relatives#183617, employer#183618, relations#183619, p#183620]
            +- SubqueryAlias contacts
               +- View (`contacts`, [id#183612,name#183613,address#183614,pets#183615,friends#183616,relatives#183617,employer#183618,relations#183619,p#183620])
                  +- RelationV2[id#183612, name#183613, address#183614, pets#183615, friends#183616, relatives#183617, employer#183618, relations#183619, p#183620] parquet file:/tmp/spark-f7a30b38-513b-40c1-9e0c-24edc9af30e5/contacts

== Analyzed Logical Plan ==
count(id): bigint
Aggregate [count(id#183612) AS count(id)#183657L]
+- Filter (last#183650 = Jones)
   +- Project [id#183612, name#183613.first AS first#183648, name#183613.middle AS middle#183649, name#183613.last AS last#183650]
      +- Filter isnotnull(name#183613.middle)
         +- Project [id#183612, name#183613, address#183614, pets#183615, friends#183616, relatives#183617, employer#183618, relations#183619, p#183620]
            +- SubqueryAlias contacts
               +- View (`contacts`, [id#183612,name#183613,address#183614,pets#183615,friends#183616,relatives#183617,employer#183618,relations#183619,p#183620])
                  +- RelationV2[id#183612, name#183613, address#183614, pets#183615, friends#183616, relatives#183617, employer#183618, relations#183619, p#183620] parquet file:/tmp/spark-f7a30b38-513b-40c1-9e0c-24edc9af30e5/contacts

== Optimized Logical Plan ==
Aggregate [count(id#183612) AS count(id)#183657L]
+- Project [id#183612]
   +- Filter ((isnotnull(name#183613.last) AND isnotnull(name#183613.middle)) AND (name#183613.last = Jones))
      +- RelationV2[id#183612, name#183613] parquet file:/tmp/spark-f7a30b38-513b-40c1-9e0c-24edc9af30e5/contacts

== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=true
+- == Final Plan ==
   VeloxColumnarToRow
   +- ^(11876) HashAggregateTransformer(keys=[], functions=[count(id#183612)], isStreamingAgg=false, output=[count(id)#183657L])
      +- ^(11876) InputIteratorTransformer[count#183662L]
         +- ShuffleQueryStage 0
            +- ColumnarExchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=982876], [shuffle_writer_type=hash], [OUTPUT] List(count:LongType), [OUTPUT] List(count:LongType)
               +- VeloxResizeBatches 1024, 2147483647
                  +- ^(11875) FlushableHashAggregateTransformer(keys=[], functions=[partial_count(id#183612)], isStreamingAgg=false, output=[count#183662L])
                     +- ^(11875) ProjectExecTransformer [id#183612]
                        +- ^(11875) FilterExecTransformer ((isnotnull(name#183613.last) AND isnotnull(name#183613.middle)) AND (name#183613.last = Jones))
                           +- ^(11875) BatchScanExecTransformer[id#183612, name#183613] ParquetScan DataFilters: [isnotnull(name#183613.last), isnotnull(name#183613.middle), (name#183613.last = Jones)], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-f7a30b38-513b-40c1-9e0c-24edc9af30e5/contacts], PartitionFilters: [], PushedAggregation: [], PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], PushedGroupBy: [], ReadSchema: struct<id:int,name:struct<middle:string,last:string>>, PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], PushedAggregation: [], PushedGroupBy: [] RuntimeFilters: []
+- == Initial Plan ==
   HashAggregate(keys=[], functions=[count(id#183612)], output=[count(id)#183657L])
   +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=982839]
      +- HashAggregate(keys=[], functions=[partial_count(id#183612)], output=[count#183662L])
         +- Project [id#183612]
            +- Filter ((isnotnull(name#183613.last) AND isnotnull(name#183613.middle)) AND (name#183613.last = Jones))
               +- BatchScan[id#183612, name#183613] ParquetScan DataFilters: [isnotnull(name#183613.last), isnotnull(name#183613.middle), (name#183613.last = Jones)], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-f7a30b38-513b-40c1-9e0c-24edc9af30e5/contacts], PartitionFilters: [], PushedAggregation: [], PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], PushedGroupBy: [], ReadSchema: struct<id:int,name:struct<middle:string,last:string>>, PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], PushedAggregation: [], PushedGroupBy: [] RuntimeFilters: []

== Results ==

== Results ==
!== Correct Answer - 1 ==   == Spark Answer - 1 ==
!struct<>                   struct<count(id):bigint>
![0]                        [2]
GlutenParquetV2SchemaPruningSuite.Spark vectorized reader - without partition data column - SPARK-34963: extract case-insensitive struct field from struct: org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV2SchemaPruningSuite#L1
Results do not match for query: Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]] Timezone Env: == Parsed Logical Plan == 'Project ['Name.First, 'NAME.MiDDle] +- Filter isnotnull(Name#196009.MIDDLE) +- SubqueryAlias contacts +- View (`contacts`, [id#196008,name#196009,address#196010,pets#196011,friends#196012,relatives#196013,employer#196014,relations#196015,p#196016]) +- RelationV2[id#196008, name#196009, address#196010, pets#196011, friends#196012, relatives#196013, employer#196014, relations#196015, p#196016] parquet file:/tmp/spark-504bf387-b699-45d3-9865-ba2af61d27f2/contacts == Analyzed Logical Plan == First: string, MiDDle: string Project [Name#196009.First AS First#196069, NAME#196009.MiDDle AS MiDDle#196070] +- Filter isnotnull(Name#196009.MIDDLE) +- SubqueryAlias contacts +- View (`contacts`, [id#196008,name#196009,address#196010,pets#196011,friends#196012,relatives#196013,employer#196014,relations#196015,p#196016]) +- RelationV2[id#196008, name#196009, address#196010, pets#196011, friends#196012, relatives#196013, employer#196014, relations#196015, p#196016] parquet file:/tmp/spark-504bf387-b699-45d3-9865-ba2af61d27f2/contacts == Optimized Logical Plan == Project [name#196009.first AS First#196069, name#196009.middle AS MiDDle#196070] +- Filter isnotnull(name#196009.middle) +- RelationV2[name#196009] parquet file:/tmp/spark-504bf387-b699-45d3-9865-ba2af61d27f2/contacts == Physical Plan == VeloxColumnarToRow +- ^(12496) ProjectExecTransformer [name#196009.first AS First#196069, name#196009.middle AS MiDDle#196070] +- ^(12496) FilterExecTransformer isnotnull(name#196009.middle) +- ^(12496) BatchScanExecTransformer[name#196009] ParquetScan DataFilters: [isnotnull(name#196009.MIDDLE)], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-504bf387-b699-45d3-9865-ba2af61d27f2/contacts], PartitionFilters: [], PushedAggregation: [], PushedFilters: [IsNotNull(name.middle)], PushedGroupBy: [], ReadSchema: struct<name:struct<first:string,middle:string>>, PushedFilters: [IsNotNull(name.middle)], PushedAggregation: [], PushedGroupBy: [] RuntimeFilters: [] == Results == == Results == !== Correct Answer - 2 == == Spark Answer - 4 == !struct<> struct<First:string,MiDDle:string> [Jane,X.] [Jane,X.] ![John,Y.] [Janet,null] ! [Jim,null] ! [John,Y.]
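The SPARK-34963 test exercises case-insensitive resolution of nested struct fields. Reconstructed from the parsed plan ('Project ['Name.First, 'NAME.MiDDle] over Filter isnotnull(Name.MIDDLE)), a sketch assuming the default spark.sql.caseSensitive=false and the same `contacts` view:

    // Mixed-case field paths resolve case-insensitively to name.first /
    // name.middle, as the optimized plan shows.
    val query = spark.table("contacts")
      .where("Name.MIDDLE is not null")
      .select("Name.First", "NAME.MiDDle")

The Results diff shows 4 rows instead of the expected 2: the null-middle rows ([Janet,null], [Jim,null]) come back even though the plan keeps a post-scan FilterExecTransformer isnotnull(name.middle), so the pushed nested-field filter appears to disturb predicate evaluation somewhere in the scan/filter pipeline; the log alone does not pin down where.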
GlutenParquetV2SchemaPruningSuite.Spark vectorized reader - with partition data column - SPARK-34963: extract case-insensitive struct field from struct: org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV2SchemaPruningSuite#L1
Results do not match for query: Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]] Timezone Env: == Parsed Logical Plan == 'Project ['Name.First, 'NAME.MiDDle] +- Filter isnotnull(Name#196137.MIDDLE) +- SubqueryAlias contacts +- View (`contacts`, [id#196136,name#196137,address#196138,pets#196139,friends#196140,relatives#196141,employer#196142,relations#196143,p#196144]) +- RelationV2[id#196136, name#196137, address#196138, pets#196139, friends#196140, relatives#196141, employer#196142, relations#196143, p#196144] parquet file:/tmp/spark-2d19326d-2455-4f5f-8339-13e075576720/contacts == Analyzed Logical Plan == First: string, MiDDle: string Project [Name#196137.First AS First#196197, NAME#196137.MiDDle AS MiDDle#196198] +- Filter isnotnull(Name#196137.MIDDLE) +- SubqueryAlias contacts +- View (`contacts`, [id#196136,name#196137,address#196138,pets#196139,friends#196140,relatives#196141,employer#196142,relations#196143,p#196144]) +- RelationV2[id#196136, name#196137, address#196138, pets#196139, friends#196140, relatives#196141, employer#196142, relations#196143, p#196144] parquet file:/tmp/spark-2d19326d-2455-4f5f-8339-13e075576720/contacts == Optimized Logical Plan == Project [name#196137.first AS First#196197, name#196137.middle AS MiDDle#196198] +- Filter isnotnull(name#196137.middle) +- RelationV2[name#196137] parquet file:/tmp/spark-2d19326d-2455-4f5f-8339-13e075576720/contacts == Physical Plan == VeloxColumnarToRow +- ^(12500) ProjectExecTransformer [name#196137.first AS First#196197, name#196137.middle AS MiDDle#196198] +- ^(12500) FilterExecTransformer isnotnull(name#196137.middle) +- ^(12500) BatchScanExecTransformer[name#196137] ParquetScan DataFilters: [isnotnull(name#196137.MIDDLE)], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-2d19326d-2455-4f5f-8339-13e075576720/contacts], PartitionFilters: [], PushedAggregation: [], PushedFilters: [IsNotNull(name.middle)], PushedGroupBy: [], ReadSchema: struct<name:struct<first:string,middle:string>>, PushedFilters: [IsNotNull(name.middle)], PushedAggregation: [], PushedGroupBy: [] RuntimeFilters: [] == Results == == Results == !== Correct Answer - 2 == == Spark Answer - 4 == !struct<> struct<First:string,MiDDle:string> [Jane,X.] [Jane,X.] ![John,Y.] [Janet,null] ! [Jim,null] ! [John,Y.]
GlutenParquetV2SchemaPruningSuite.Non-vectorized reader - without partition data column - SPARK-34963: extract case-insensitive struct field from struct: org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV2SchemaPruningSuite#L1
Results do not match for query: Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]] Timezone Env: == Parsed Logical Plan == 'Project ['Name.First, 'NAME.MiDDle] +- Filter isnotnull(Name#196259.MIDDLE) +- SubqueryAlias contacts +- View (`contacts`, [id#196258,name#196259,address#196260,pets#196261,friends#196262,relatives#196263,employer#196264,relations#196265,p#196266]) +- RelationV2[id#196258, name#196259, address#196260, pets#196261, friends#196262, relatives#196263, employer#196264, relations#196265, p#196266] parquet file:/tmp/spark-54ee8f73-6d77-4af9-b536-c9f49478417b/contacts == Analyzed Logical Plan == First: string, MiDDle: string Project [Name#196259.First AS First#196319, NAME#196259.MiDDle AS MiDDle#196320] +- Filter isnotnull(Name#196259.MIDDLE) +- SubqueryAlias contacts +- View (`contacts`, [id#196258,name#196259,address#196260,pets#196261,friends#196262,relatives#196263,employer#196264,relations#196265,p#196266]) +- RelationV2[id#196258, name#196259, address#196260, pets#196261, friends#196262, relatives#196263, employer#196264, relations#196265, p#196266] parquet file:/tmp/spark-54ee8f73-6d77-4af9-b536-c9f49478417b/contacts == Optimized Logical Plan == Project [name#196259.first AS First#196319, name#196259.middle AS MiDDle#196320] +- Filter isnotnull(name#196259.middle) +- RelationV2[name#196259] parquet file:/tmp/spark-54ee8f73-6d77-4af9-b536-c9f49478417b/contacts == Physical Plan == VeloxColumnarToRow +- ^(12504) ProjectExecTransformer [name#196259.first AS First#196319, name#196259.middle AS MiDDle#196320] +- ^(12504) FilterExecTransformer isnotnull(name#196259.middle) +- ^(12504) BatchScanExecTransformer[name#196259] ParquetScan DataFilters: [isnotnull(name#196259.MIDDLE)], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-54ee8f73-6d77-4af9-b536-c9f49478417b/contacts], PartitionFilters: [], PushedAggregation: [], PushedFilters: [IsNotNull(name.middle)], PushedGroupBy: [], ReadSchema: struct<name:struct<first:string,middle:string>>, PushedFilters: [IsNotNull(name.middle)], PushedAggregation: [], PushedGroupBy: [] RuntimeFilters: [] == Results == == Results == !== Correct Answer - 2 == == Spark Answer - 4 == !struct<> struct<First:string,MiDDle:string> [Jane,X.] [Jane,X.] ![John,Y.] [Janet,null] ! [Jim,null] ! [John,Y.]
GlutenParquetV2SchemaPruningSuite.Non-vectorized reader - with partition data column - SPARK-34963: extract case-insensitive struct field from struct: org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV2SchemaPruningSuite#L1
Results do not match for query: Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]] Timezone Env: == Parsed Logical Plan == 'Project ['Name.First, 'NAME.MiDDle] +- Filter isnotnull(Name#196387.MIDDLE) +- SubqueryAlias contacts +- View (`contacts`, [id#196386,name#196387,address#196388,pets#196389,friends#196390,relatives#196391,employer#196392,relations#196393,p#196394]) +- RelationV2[id#196386, name#196387, address#196388, pets#196389, friends#196390, relatives#196391, employer#196392, relations#196393, p#196394] parquet file:/tmp/spark-805d0ea7-f827-4b26-8c62-252d696507dd/contacts == Analyzed Logical Plan == First: string, MiDDle: string Project [Name#196387.First AS First#196447, NAME#196387.MiDDle AS MiDDle#196448] +- Filter isnotnull(Name#196387.MIDDLE) +- SubqueryAlias contacts +- View (`contacts`, [id#196386,name#196387,address#196388,pets#196389,friends#196390,relatives#196391,employer#196392,relations#196393,p#196394]) +- RelationV2[id#196386, name#196387, address#196388, pets#196389, friends#196390, relatives#196391, employer#196392, relations#196393, p#196394] parquet file:/tmp/spark-805d0ea7-f827-4b26-8c62-252d696507dd/contacts == Optimized Logical Plan == Project [name#196387.first AS First#196447, name#196387.middle AS MiDDle#196448] +- Filter isnotnull(name#196387.middle) +- RelationV2[name#196387] parquet file:/tmp/spark-805d0ea7-f827-4b26-8c62-252d696507dd/contacts == Physical Plan == VeloxColumnarToRow +- ^(12508) ProjectExecTransformer [name#196387.first AS First#196447, name#196387.middle AS MiDDle#196448] +- ^(12508) FilterExecTransformer isnotnull(name#196387.middle) +- ^(12508) BatchScanExecTransformer[name#196387] ParquetScan DataFilters: [isnotnull(name#196387.MIDDLE)], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-805d0ea7-f827-4b26-8c62-252d696507dd/contacts], PartitionFilters: [], PushedAggregation: [], PushedFilters: [IsNotNull(name.middle)], PushedGroupBy: [], ReadSchema: struct<name:struct<first:string,middle:string>>, PushedFilters: [IsNotNull(name.middle)], PushedAggregation: [], PushedGroupBy: [] RuntimeFilters: [] == Results == == Results == !== Correct Answer - 2 == == Spark Answer - 4 == !struct<> struct<First:string,MiDDle:string> [Jane,X.] [Jane,X.] ![John,Y.] [Janet,null] ! [Jim,null] ! [John,Y.]
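The four GlutenParquetV2SchemaPruningSuite variants above (vectorized and non-vectorized, with and without the partition data column) fail identically, so the regression is independent of reader mode. When triaging, the pushed metadata can be read straight off the executed plan's string form; a quick diagnostic sketch, with `query` as in the earlier reconstruction:

    // Print only the scan-metadata lines; works for both the V1 FileScan and
    // the V2 BatchScan string forms.
    val planString = query.queryExecution.executedPlan.toString
    planString.split("\n")
      .filter(line => line.contains("PushedFilters") || line.contains("DataFilters"))
      .foreach(println)

For the runs above this prints PushedFilters: [IsNotNull(name.middle)], confirming the nested-field filter was pushed into the parquet scan.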
GlutenParquetV1SchemaPruningSuite.Spark vectorized reader - without partition data column - select one complex field and having is null predicate on another complex field: org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV1SchemaPruningSuite#L1
Results do not match for query: Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]] Timezone Env: == Parsed Logical Plan == 'Project [unresolvedalias(count('id), Some(org.apache.spark.sql.Column$$Lambda$15328/1048723032@233ccb3b))] +- Filter (last#312182 = Jones) +- Project [id#312144, name#312145.first AS first#312180, name#312145.middle AS middle#312181, name#312145.last AS last#312182] +- Filter isnotnull(name#312145.middle) +- Project [id#312144, name#312145, address#312146, pets#312147, friends#312148, relatives#312149, employer#312150, relations#312151, p#312152] +- SubqueryAlias contacts +- View (`contacts`, [id#312144,name#312145,address#312146,pets#312147,friends#312148,relatives#312149,employer#312150,relations#312151,p#312152]) +- Relation [id#312144,name#312145,address#312146,pets#312147,friends#312148,relatives#312149,employer#312150,relations#312151,p#312152] parquet == Analyzed Logical Plan == count(id): bigint Aggregate [count(id#312144) AS count(id)#312189L] +- Filter (last#312182 = Jones) +- Project [id#312144, name#312145.first AS first#312180, name#312145.middle AS middle#312181, name#312145.last AS last#312182] +- Filter isnotnull(name#312145.middle) +- Project [id#312144, name#312145, address#312146, pets#312147, friends#312148, relatives#312149, employer#312150, relations#312151, p#312152] +- SubqueryAlias contacts +- View (`contacts`, [id#312144,name#312145,address#312146,pets#312147,friends#312148,relatives#312149,employer#312150,relations#312151,p#312152]) +- Relation [id#312144,name#312145,address#312146,pets#312147,friends#312148,relatives#312149,employer#312150,relations#312151,p#312152] parquet == Optimized Logical Plan == Aggregate [count(id#312144) AS count(id)#312189L] +- Project [id#312144] +- Filter ((isnotnull(name#312145.last) AND isnotnull(name#312145.middle)) AND (name#312145.last = Jones)) +- Relation [id#312144,name#312145,address#312146,pets#312147,friends#312148,relatives#312149,employer#312150,relations#312151,p#312152] parquet == Physical Plan == AdaptiveSparkPlan isFinalPlan=true +- == Final Plan == VeloxColumnarToRow +- ^(23490) HashAggregateTransformer(keys=[], functions=[count(id#312144)], isStreamingAgg=false, output=[count(id)#312189L]) +- ^(23490) InputIteratorTransformer[count#312201L] +- ShuffleQueryStage 0 +- ColumnarExchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=1508543], [shuffle_writer_type=hash], [OUTPUT] List(count:LongType), [OUTPUT] List(count:LongType) +- VeloxResizeBatches 1024, 2147483647 +- ^(23489) FlushableHashAggregateTransformer(keys=[], functions=[partial_count(id#312144)], isStreamingAgg=false, output=[count#312201L]) +- ^(23489) ProjectExecTransformer [id#312144] +- ^(23489) FilterExecTransformer ((isnotnull(name#312145.last) AND isnotnull(name#312145.middle)) AND (name#312145.last = Jones)) +- ^(23489) FileScanTransformer parquet [id#312144,name#312145,p#312152] Batched: true, DataFilters: [isnotnull(name#312145.last), isnotnull(name#312145.middle), (name#312145.last = Jones)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-f8aa6b6a-c063-4ef6-a595-8d796105d43b/contacts], PartitionFilters: [], PushedFilters: 
[IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], ReadSchema: struct<id:int,name:struct<middle:string,last:string>> +- == Initial Plan == HashAggregate(keys=[], functions=[count(id#312144)], output=[count(id)#312189L]) +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=1508485] +- HashAggregate(keys=[], functions=[partial_count(id#312144)], output=[count#312201L]) +- Project [id#312144] +- Filter ((isnotnull(name#312145.last) AND isnotnull(name#312145.middle)) AND (name#312145.last = Jones)) +- FileScan parquet [id#312144,name#312145,p#312152] Batched: true, DataFilters: [isnotnull(name#312145.last), isnotnull(name#312145.middle), (name#312145.last = Jones)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-f8aa6b6a-c063-4ef6-a595-8d796105d43b/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], ReadSchema: struct<id:int,name:struct<middle:string,last:string>> == Results == == Results == !== Correct Answer - 1 == == Spark Answer - 1 == !struct<> struct<count(id):bigint> ![0] [2]
GlutenParquetV1SchemaPruningSuite.Spark vectorized reader - with partition data column - select one complex field and having is null predicate on another complex field: org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV1SchemaPruningSuite#L1
Results do not match for query: Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]] Timezone Env: == Parsed Logical Plan == 'Project [unresolvedalias(count('id), Some(org.apache.spark.sql.Column$$Lambda$15328/1048723032@233ccb3b))] +- Filter (last#312307 = Jones) +- Project [id#312269, name#312270.first AS first#312305, name#312270.middle AS middle#312306, name#312270.last AS last#312307] +- Filter isnotnull(name#312270.middle) +- Project [id#312269, name#312270, address#312271, pets#312272, friends#312273, relatives#312274, employer#312275, relations#312276, p#312277] +- SubqueryAlias contacts +- View (`contacts`, [id#312269,name#312270,address#312271,pets#312272,friends#312273,relatives#312274,employer#312275,relations#312276,p#312277]) +- Relation [id#312269,name#312270,address#312271,pets#312272,friends#312273,relatives#312274,employer#312275,relations#312276,p#312277] parquet == Analyzed Logical Plan == count(id): bigint Aggregate [count(id#312269) AS count(id)#312314L] +- Filter (last#312307 = Jones) +- Project [id#312269, name#312270.first AS first#312305, name#312270.middle AS middle#312306, name#312270.last AS last#312307] +- Filter isnotnull(name#312270.middle) +- Project [id#312269, name#312270, address#312271, pets#312272, friends#312273, relatives#312274, employer#312275, relations#312276, p#312277] +- SubqueryAlias contacts +- View (`contacts`, [id#312269,name#312270,address#312271,pets#312272,friends#312273,relatives#312274,employer#312275,relations#312276,p#312277]) +- Relation [id#312269,name#312270,address#312271,pets#312272,friends#312273,relatives#312274,employer#312275,relations#312276,p#312277] parquet == Optimized Logical Plan == Aggregate [count(id#312269) AS count(id)#312314L] +- Project [id#312269] +- Filter ((isnotnull(name#312270.last) AND isnotnull(name#312270.middle)) AND (name#312270.last = Jones)) +- Relation [id#312269,name#312270,address#312271,pets#312272,friends#312273,relatives#312274,employer#312275,relations#312276,p#312277] parquet == Physical Plan == AdaptiveSparkPlan isFinalPlan=true +- == Final Plan == VeloxColumnarToRow +- ^(23494) HashAggregateTransformer(keys=[], functions=[count(id#312269)], isStreamingAgg=false, output=[count(id)#312314L]) +- ^(23494) InputIteratorTransformer[count#312326L] +- ShuffleQueryStage 0 +- ColumnarExchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=1508819], [shuffle_writer_type=hash], [OUTPUT] List(count:LongType), [OUTPUT] List(count:LongType) +- VeloxResizeBatches 1024, 2147483647 +- ^(23493) FlushableHashAggregateTransformer(keys=[], functions=[partial_count(id#312269)], isStreamingAgg=false, output=[count#312326L]) +- ^(23493) ProjectExecTransformer [id#312269] +- ^(23493) FilterExecTransformer ((isnotnull(name#312270.last) AND isnotnull(name#312270.middle)) AND (name#312270.last = Jones)) +- ^(23493) FileScanTransformer parquet [id#312269,name#312270,p#312277] Batched: true, DataFilters: [isnotnull(name#312270.last), isnotnull(name#312270.middle), (name#312270.last = Jones)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-509c36f2-928c-4630-9b30-b7d9f4f7a58d/contacts], PartitionFilters: [], PushedFilters: 
[IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], ReadSchema: struct<id:int,name:struct<middle:string,last:string>> +- == Initial Plan == HashAggregate(keys=[], functions=[count(id#312269)], output=[count(id)#312314L]) +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=1508761] +- HashAggregate(keys=[], functions=[partial_count(id#312269)], output=[count#312326L]) +- Project [id#312269] +- Filter ((isnotnull(name#312270.last) AND isnotnull(name#312270.middle)) AND (name#312270.last = Jones)) +- FileScan parquet [id#312269,name#312270,p#312277] Batched: true, DataFilters: [isnotnull(name#312270.last), isnotnull(name#312270.middle), (name#312270.last = Jones)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-509c36f2-928c-4630-9b30-b7d9f4f7a58d/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], ReadSchema: struct<id:int,name:struct<middle:string,last:string>> == Results == == Results == !== Correct Answer - 1 == == Spark Answer - 1 == !struct<> struct<count(id):bigint> ![0] [2]
GlutenParquetV1SchemaPruningSuite.Non-vectorized reader - without partition data column - select one complex field and having is null predicate on another complex field: org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV1SchemaPruningSuite#L1
Results do not match for query: Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]] Timezone Env: == Parsed Logical Plan == 'Project [unresolvedalias(count('id), Some(org.apache.spark.sql.Column$$Lambda$15328/1048723032@233ccb3b))] +- Filter (last#312426 = Jones) +- Project [id#312388, name#312389.first AS first#312424, name#312389.middle AS middle#312425, name#312389.last AS last#312426] +- Filter isnotnull(name#312389.middle) +- Project [id#312388, name#312389, address#312390, pets#312391, friends#312392, relatives#312393, employer#312394, relations#312395, p#312396] +- SubqueryAlias contacts +- View (`contacts`, [id#312388,name#312389,address#312390,pets#312391,friends#312392,relatives#312393,employer#312394,relations#312395,p#312396]) +- Relation [id#312388,name#312389,address#312390,pets#312391,friends#312392,relatives#312393,employer#312394,relations#312395,p#312396] parquet == Analyzed Logical Plan == count(id): bigint Aggregate [count(id#312388) AS count(id)#312433L] +- Filter (last#312426 = Jones) +- Project [id#312388, name#312389.first AS first#312424, name#312389.middle AS middle#312425, name#312389.last AS last#312426] +- Filter isnotnull(name#312389.middle) +- Project [id#312388, name#312389, address#312390, pets#312391, friends#312392, relatives#312393, employer#312394, relations#312395, p#312396] +- SubqueryAlias contacts +- View (`contacts`, [id#312388,name#312389,address#312390,pets#312391,friends#312392,relatives#312393,employer#312394,relations#312395,p#312396]) +- Relation [id#312388,name#312389,address#312390,pets#312391,friends#312392,relatives#312393,employer#312394,relations#312395,p#312396] parquet == Optimized Logical Plan == Aggregate [count(id#312388) AS count(id)#312433L] +- Project [id#312388] +- Filter ((isnotnull(name#312389.last) AND isnotnull(name#312389.middle)) AND (name#312389.last = Jones)) +- Relation [id#312388,name#312389,address#312390,pets#312391,friends#312392,relatives#312393,employer#312394,relations#312395,p#312396] parquet == Physical Plan == AdaptiveSparkPlan isFinalPlan=true +- == Final Plan == VeloxColumnarToRow +- ^(23498) HashAggregateTransformer(keys=[], functions=[count(id#312388)], isStreamingAgg=false, output=[count(id)#312433L]) +- ^(23498) InputIteratorTransformer[count#312445L] +- ShuffleQueryStage 0 +- ColumnarExchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=1509076], [shuffle_writer_type=hash], [OUTPUT] List(count:LongType), [OUTPUT] List(count:LongType) +- VeloxResizeBatches 1024, 2147483647 +- ^(23497) FlushableHashAggregateTransformer(keys=[], functions=[partial_count(id#312388)], isStreamingAgg=false, output=[count#312445L]) +- ^(23497) ProjectExecTransformer [id#312388] +- ^(23497) FilterExecTransformer ((isnotnull(name#312389.last) AND isnotnull(name#312389.middle)) AND (name#312389.last = Jones)) +- ^(23497) FileScanTransformer parquet [id#312388,name#312389,p#312396] Batched: true, DataFilters: [isnotnull(name#312389.last), isnotnull(name#312389.middle), (name#312389.last = Jones)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-0edcad54-a9e0-478c-bcd9-b86020c74b37/contacts], PartitionFilters: [], PushedFilters: 
[IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], ReadSchema: struct<id:int,name:struct<middle:string,last:string>> +- == Initial Plan == HashAggregate(keys=[], functions=[count(id#312388)], output=[count(id)#312433L]) +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=1509037] +- HashAggregate(keys=[], functions=[partial_count(id#312388)], output=[count#312445L]) +- Project [id#312388] +- Filter ((isnotnull(name#312389.last) AND isnotnull(name#312389.middle)) AND (name#312389.last = Jones)) +- FileScan parquet [id#312388,name#312389,p#312396] Batched: false, DataFilters: [isnotnull(name#312389.last), isnotnull(name#312389.middle), (name#312389.last = Jones)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-0edcad54-a9e0-478c-bcd9-b86020c74b37/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], ReadSchema: struct<id:int,name:struct<middle:string,last:string>> == Results == == Results == !== Correct Answer - 1 == == Spark Answer - 1 == !struct<> struct<count(id):bigint> ![0] [2]
GlutenParquetV1SchemaPruningSuite.Non-vectorized reader - with partition data column - select one complex field and having is null predicate on another complex field: org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV1SchemaPruningSuite#L1
Results do not match for query: Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]] Timezone Env: == Parsed Logical Plan == 'Project [unresolvedalias(count('id), Some(org.apache.spark.sql.Column$$Lambda$15328/1048723032@233ccb3b))] +- Filter (last#312551 = Jones) +- Project [id#312513, name#312514.first AS first#312549, name#312514.middle AS middle#312550, name#312514.last AS last#312551] +- Filter isnotnull(name#312514.middle) +- Project [id#312513, name#312514, address#312515, pets#312516, friends#312517, relatives#312518, employer#312519, relations#312520, p#312521] +- SubqueryAlias contacts +- View (`contacts`, [id#312513,name#312514,address#312515,pets#312516,friends#312517,relatives#312518,employer#312519,relations#312520,p#312521]) +- Relation [id#312513,name#312514,address#312515,pets#312516,friends#312517,relatives#312518,employer#312519,relations#312520,p#312521] parquet == Analyzed Logical Plan == count(id): bigint Aggregate [count(id#312513) AS count(id)#312558L] +- Filter (last#312551 = Jones) +- Project [id#312513, name#312514.first AS first#312549, name#312514.middle AS middle#312550, name#312514.last AS last#312551] +- Filter isnotnull(name#312514.middle) +- Project [id#312513, name#312514, address#312515, pets#312516, friends#312517, relatives#312518, employer#312519, relations#312520, p#312521] +- SubqueryAlias contacts +- View (`contacts`, [id#312513,name#312514,address#312515,pets#312516,friends#312517,relatives#312518,employer#312519,relations#312520,p#312521]) +- Relation [id#312513,name#312514,address#312515,pets#312516,friends#312517,relatives#312518,employer#312519,relations#312520,p#312521] parquet == Optimized Logical Plan == Aggregate [count(id#312513) AS count(id)#312558L] +- Project [id#312513] +- Filter ((isnotnull(name#312514.last) AND isnotnull(name#312514.middle)) AND (name#312514.last = Jones)) +- Relation [id#312513,name#312514,address#312515,pets#312516,friends#312517,relatives#312518,employer#312519,relations#312520,p#312521] parquet == Physical Plan == AdaptiveSparkPlan isFinalPlan=true +- == Final Plan == VeloxColumnarToRow +- ^(23502) HashAggregateTransformer(keys=[], functions=[count(id#312513)], isStreamingAgg=false, output=[count(id)#312558L]) +- ^(23502) InputIteratorTransformer[count#312570L] +- ShuffleQueryStage 0 +- ColumnarExchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=1509314], [shuffle_writer_type=hash], [OUTPUT] List(count:LongType), [OUTPUT] List(count:LongType) +- VeloxResizeBatches 1024, 2147483647 +- ^(23501) FlushableHashAggregateTransformer(keys=[], functions=[partial_count(id#312513)], isStreamingAgg=false, output=[count#312570L]) +- ^(23501) ProjectExecTransformer [id#312513] +- ^(23501) FilterExecTransformer ((isnotnull(name#312514.last) AND isnotnull(name#312514.middle)) AND (name#312514.last = Jones)) +- ^(23501) FileScanTransformer parquet [id#312513,name#312514,p#312521] Batched: true, DataFilters: [isnotnull(name#312514.last), isnotnull(name#312514.middle), (name#312514.last = Jones)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-43d59f86-20fc-43e0-9036-55ccf661c611/contacts], PartitionFilters: [], PushedFilters: 
[IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], ReadSchema: struct<id:int,name:struct<middle:string,last:string>> +- == Initial Plan == HashAggregate(keys=[], functions=[count(id#312513)], output=[count(id)#312558L]) +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=1509275] +- HashAggregate(keys=[], functions=[partial_count(id#312513)], output=[count#312570L]) +- Project [id#312513] +- Filter ((isnotnull(name#312514.last) AND isnotnull(name#312514.middle)) AND (name#312514.last = Jones)) +- FileScan parquet [id#312513,name#312514,p#312521] Batched: false, DataFilters: [isnotnull(name#312514.last), isnotnull(name#312514.middle), (name#312514.last = Jones)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-43d59f86-20fc-43e0-9036-55ccf661c611/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], ReadSchema: struct<id:int,name:struct<middle:string,last:string>> == Results == == Results == !== Correct Answer - 1 == == Spark Answer - 1 == !struct<> struct<count(id):bigint> ![0] [2]
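All four V1 reader variants of the is-null-predicate test fail the same way, with PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)]. One way to confirm the regression comes from nested predicate push-down rather than schema pruning is to take parquet out of Spark's nested-push-down allow-list. The conf below is stock Spark SQL (default "parquet,orc"); whether the Velox-backed scan honors it is an assumption to verify:

    // Sketch: disable nested-field predicate push-down for parquet only and
    // re-run; if count(id) returns to 0, the pushed IsNotNull/EqualTo on
    // name.* is the culprit.
    spark.conf.set(
      "spark.sql.optimizer.nestedPredicatePushdown.supportedFileSources",
      "orc")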
GlutenParquetV1SchemaPruningSuite.Spark vectorized reader - without partition data column - SPARK-34963: extract case-insensitive struct field from struct: org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV1SchemaPruningSuite#L1
Results do not match for query: Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]] Timezone Env: == Parsed Logical Plan == 'Project ['Name.First, 'NAME.MiDDle] +- Filter isnotnull(Name#325658.MIDDLE) +- SubqueryAlias contacts +- View (`contacts`, [id#325657,name#325658,address#325659,pets#325660,friends#325661,relatives#325662,employer#325663,relations#325664,p#325665]) +- Relation [id#325657,name#325658,address#325659,pets#325660,friends#325661,relatives#325662,employer#325663,relations#325664,p#325665] parquet == Analyzed Logical Plan == First: string, MiDDle: string Project [Name#325658.First AS First#325728, NAME#325658.MiDDle AS MiDDle#325729] +- Filter isnotnull(Name#325658.MIDDLE) +- SubqueryAlias contacts +- View (`contacts`, [id#325657,name#325658,address#325659,pets#325660,friends#325661,relatives#325662,employer#325663,relations#325664,p#325665]) +- Relation [id#325657,name#325658,address#325659,pets#325660,friends#325661,relatives#325662,employer#325663,relations#325664,p#325665] parquet == Optimized Logical Plan == Project [name#325658.first AS First#325728, name#325658.middle AS MiDDle#325729] +- Filter isnotnull(name#325658.middle) +- Relation [id#325657,name#325658,address#325659,pets#325660,friends#325661,relatives#325662,employer#325663,relations#325664,p#325665] parquet == Physical Plan == VeloxColumnarToRow +- ^(24146) ProjectExecTransformer [name#325658.first AS First#325728, name#325658.middle AS MiDDle#325729] +- ^(24146) FilterExecTransformer isnotnull(name#325658.middle) +- ^(24146) FileScanTransformer parquet [name#325658,p#325665] Batched: true, DataFilters: [isnotnull(name#325658.middle)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-36ceaca8-fe76-4d85-a0a2-a02f39ebe9b2/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.middle)], ReadSchema: struct<name:struct<first:string,middle:string>> == Results == == Results == !== Correct Answer - 2 == == Spark Answer - 4 == !struct<> struct<First:string,MiDDle:string> [Jane,X.] [Jane,X.] ![John,Y.] [Janet,null] ! [Jim,null] ! [John,Y.]
GlutenParquetV1SchemaPruningSuite.Spark vectorized reader - with partition data column - SPARK-34963: extract case-insensitive struct field from struct: org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV1SchemaPruningSuite#L1
Results do not match for query: Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]] Timezone Env: == Parsed Logical Plan == 'Project ['Name.First, 'NAME.MiDDle] +- Filter isnotnull(Name#325806.MIDDLE) +- SubqueryAlias contacts +- View (`contacts`, [id#325805,name#325806,address#325807,pets#325808,friends#325809,relatives#325810,employer#325811,relations#325812,p#325813]) +- Relation [id#325805,name#325806,address#325807,pets#325808,friends#325809,relatives#325810,employer#325811,relations#325812,p#325813] parquet == Analyzed Logical Plan == First: string, MiDDle: string Project [Name#325806.First AS First#325876, NAME#325806.MiDDle AS MiDDle#325877] +- Filter isnotnull(Name#325806.MIDDLE) +- SubqueryAlias contacts +- View (`contacts`, [id#325805,name#325806,address#325807,pets#325808,friends#325809,relatives#325810,employer#325811,relations#325812,p#325813]) +- Relation [id#325805,name#325806,address#325807,pets#325808,friends#325809,relatives#325810,employer#325811,relations#325812,p#325813] parquet == Optimized Logical Plan == Project [name#325806.first AS First#325876, name#325806.middle AS MiDDle#325877] +- Filter isnotnull(name#325806.middle) +- Relation [id#325805,name#325806,address#325807,pets#325808,friends#325809,relatives#325810,employer#325811,relations#325812,p#325813] parquet == Physical Plan == VeloxColumnarToRow +- ^(24150) ProjectExecTransformer [name#325806.first AS First#325876, name#325806.middle AS MiDDle#325877] +- ^(24150) FilterExecTransformer isnotnull(name#325806.middle) +- ^(24150) FileScanTransformer parquet [name#325806,p#325813] Batched: true, DataFilters: [isnotnull(name#325806.middle)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-a14f20b2-a5d0-4b13-bf22-6e8a66dd45c2/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.middle)], ReadSchema: struct<name:struct<first:string,middle:string>> == Results == == Results == !== Correct Answer - 2 == == Spark Answer - 4 == !struct<> struct<First:string,MiDDle:string> [Jane,X.] [Jane,X.] ![John,Y.] [Janet,null] ! [Jim,null] ! [John,Y.]
GlutenParquetV1SchemaPruningSuite.Non-vectorized reader - without partition data column - SPARK-34963: extract case-insensitive struct field from struct: org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV1SchemaPruningSuite#L1
Results do not match for query: Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]] Timezone Env: == Parsed Logical Plan == 'Project ['Name.First, 'NAME.MiDDle] +- Filter isnotnull(Name#325948.MIDDLE) +- SubqueryAlias contacts +- View (`contacts`, [id#325947,name#325948,address#325949,pets#325950,friends#325951,relatives#325952,employer#325953,relations#325954,p#325955]) +- Relation [id#325947,name#325948,address#325949,pets#325950,friends#325951,relatives#325952,employer#325953,relations#325954,p#325955] parquet == Analyzed Logical Plan == First: string, MiDDle: string Project [Name#325948.First AS First#326018, NAME#325948.MiDDle AS MiDDle#326019] +- Filter isnotnull(Name#325948.MIDDLE) +- SubqueryAlias contacts +- View (`contacts`, [id#325947,name#325948,address#325949,pets#325950,friends#325951,relatives#325952,employer#325953,relations#325954,p#325955]) +- Relation [id#325947,name#325948,address#325949,pets#325950,friends#325951,relatives#325952,employer#325953,relations#325954,p#325955] parquet == Optimized Logical Plan == Project [name#325948.first AS First#326018, name#325948.middle AS MiDDle#326019] +- Filter isnotnull(name#325948.middle) +- Relation [id#325947,name#325948,address#325949,pets#325950,friends#325951,relatives#325952,employer#325953,relations#325954,p#325955] parquet == Physical Plan == VeloxColumnarToRow +- ^(24154) ProjectExecTransformer [name#325948.first AS First#326018, name#325948.middle AS MiDDle#326019] +- ^(24154) FilterExecTransformer isnotnull(name#325948.middle) +- ^(24154) FileScanTransformer parquet [name#325948,p#325955] Batched: true, DataFilters: [isnotnull(name#325948.middle)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-b768c431-5775-4455-bff1-cbe2f1767db3/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.middle)], ReadSchema: struct<name:struct<first:string,middle:string>> == Results == == Results == !== Correct Answer - 2 == == Spark Answer - 4 == !struct<> struct<First:string,MiDDle:string> [Jane,X.] [Jane,X.] ![John,Y.] [Janet,null] ! [Jim,null] ! [John,Y.]
GlutenParquetV1SchemaPruningSuite.Non-vectorized reader - with partition data column - SPARK-34963: extract case-insensitive struct field from struct: org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV1SchemaPruningSuite#L1
Results do not match for query: Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]] Timezone Env: == Parsed Logical Plan == 'Project ['Name.First, 'NAME.MiDDle] +- Filter isnotnull(Name#326096.MIDDLE) +- SubqueryAlias contacts +- View (`contacts`, [id#326095,name#326096,address#326097,pets#326098,friends#326099,relatives#326100,employer#326101,relations#326102,p#326103]) +- Relation [id#326095,name#326096,address#326097,pets#326098,friends#326099,relatives#326100,employer#326101,relations#326102,p#326103] parquet == Analyzed Logical Plan == First: string, MiDDle: string Project [Name#326096.First AS First#326166, NAME#326096.MiDDle AS MiDDle#326167] +- Filter isnotnull(Name#326096.MIDDLE) +- SubqueryAlias contacts +- View (`contacts`, [id#326095,name#326096,address#326097,pets#326098,friends#326099,relatives#326100,employer#326101,relations#326102,p#326103]) +- Relation [id#326095,name#326096,address#326097,pets#326098,friends#326099,relatives#326100,employer#326101,relations#326102,p#326103] parquet == Optimized Logical Plan == Project [name#326096.first AS First#326166, name#326096.middle AS MiDDle#326167] +- Filter isnotnull(name#326096.middle) +- Relation [id#326095,name#326096,address#326097,pets#326098,friends#326099,relatives#326100,employer#326101,relations#326102,p#326103] parquet == Physical Plan == VeloxColumnarToRow +- ^(24158) ProjectExecTransformer [name#326096.first AS First#326166, name#326096.middle AS MiDDle#326167] +- ^(24158) FilterExecTransformer isnotnull(name#326096.middle) +- ^(24158) FileScanTransformer parquet [name#326096,p#326103] Batched: true, DataFilters: [isnotnull(name#326096.middle)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-79cb4431-0993-4b01-b726-e45816561933/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.middle)], ReadSchema: struct<name:struct<first:string,middle:string>> == Results == == Results == !== Correct Answer - 2 == == Spark Answer - 4 == !struct<> struct<First:string,MiDDle:string> [Jane,X.] [Jane,X.] ![John,Y.] [Janet,null] ! [Jim,null] ! [John,Y.]
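Across both the V1 and V2 suites the case-insensitive test returns the same four rows, so the expected semantics can be pinned down from the Results diff alone. A plain-Scala sketch of what the query must keep (last names are assumed for illustration; the suite's real fixture may differ):

    // Rows visible in the Results diff; middle is null for Janet and Jim.
    case class FullName(first: String, middle: String, last: String)
    val fixture = Seq(
      FullName("Jane", "X.", "Doe"),
      FullName("John", "Y.", "Doe"),
      FullName("Janet", null, "Jones"),
      FullName("Jim", null, "Jones"))

    // isnotnull(name.middle) must drop the two null-middle rows, leaving the
    // "Correct Answer - 2" column: [Jane,X.] and [John,Y.].
    val expected = fixture.filter(_.middle != null).map(n => (n.first, n.middle))
    assert(expected == Seq(("Jane", "X."), ("John", "Y.")))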
GlutenParquetV2SchemaPruningSuite.Spark vectorized reader - without partition data column - select one complex field and having is null predicate on another complex field: org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV2SchemaPruningSuite#L1
Results do not match for query: Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]] Timezone Env: == Parsed Logical Plan == 'Project [unresolvedalias(count('id), Some(org.apache.spark.sql.Column$$Lambda$15328/1048723032@233ccb3b))] +- Filter (last#191086 = Jones) +- Project [id#191048, name#191049.first AS first#191084, name#191049.middle AS middle#191085, name#191049.last AS last#191086] +- Filter isnotnull(name#191049.middle) +- Project [id#191048, name#191049, address#191050, pets#191051, friends#191052, relatives#191053, employer#191054, relations#191055, p#191056] +- SubqueryAlias contacts +- View (`contacts`, [id#191048,name#191049,address#191050,pets#191051,friends#191052,relatives#191053,employer#191054,relations#191055,p#191056]) +- RelationV2[id#191048, name#191049, address#191050, pets#191051, friends#191052, relatives#191053, employer#191054, relations#191055, p#191056] parquet file:/tmp/spark-f64d6944-952e-4b9d-aa6f-0047609cc728/contacts == Analyzed Logical Plan == count(id): bigint Aggregate [count(id#191048) AS count(id)#191093L] +- Filter (last#191086 = Jones) +- Project [id#191048, name#191049.first AS first#191084, name#191049.middle AS middle#191085, name#191049.last AS last#191086] +- Filter isnotnull(name#191049.middle) +- Project [id#191048, name#191049, address#191050, pets#191051, friends#191052, relatives#191053, employer#191054, relations#191055, p#191056] +- SubqueryAlias contacts +- View (`contacts`, [id#191048,name#191049,address#191050,pets#191051,friends#191052,relatives#191053,employer#191054,relations#191055,p#191056]) +- RelationV2[id#191048, name#191049, address#191050, pets#191051, friends#191052, relatives#191053, employer#191054, relations#191055, p#191056] parquet file:/tmp/spark-f64d6944-952e-4b9d-aa6f-0047609cc728/contacts == Optimized Logical Plan == Aggregate [count(id#191048) AS count(id)#191093L] +- Project [id#191048] +- Filter ((isnotnull(name#191049.last) AND isnotnull(name#191049.middle)) AND (name#191049.last = Jones)) +- RelationV2[id#191048, name#191049] parquet file:/tmp/spark-f64d6944-952e-4b9d-aa6f-0047609cc728/contacts == Physical Plan == AdaptiveSparkPlan isFinalPlan=true +- == Final Plan == VeloxColumnarToRow +- ^(12615) HashAggregateTransformer(keys=[], functions=[count(id#191048)], isStreamingAgg=false, output=[count(id)#191093L]) +- ^(12615) InputIteratorTransformer[count#191098L] +- ShuffleQueryStage 0 +- ColumnarExchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=1041705], [shuffle_writer_type=hash], [OUTPUT] List(count:LongType), [OUTPUT] List(count:LongType) +- VeloxResizeBatches 1024, 2147483647 +- ^(12614) FlushableHashAggregateTransformer(keys=[], functions=[partial_count(id#191048)], isStreamingAgg=false, output=[count#191098L]) +- ^(12614) ProjectExecTransformer [id#191048] +- ^(12614) FilterExecTransformer ((isnotnull(name#191049.last) AND isnotnull(name#191049.middle)) AND (name#191049.last = Jones)) +- ^(12614) BatchScanTransformer parquet file:/tmp/spark-f64d6944-952e-4b9d-aa6f-0047609cc728/contacts[id#191048, name#191049] ParquetScan DataFilters: [isnotnull(name#191049.last), isnotnull(name#191049.middle), (name#191049.last = Jones)], Format: parquet, 
Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-f64d6944-952e-4b9d-aa6f-0047609cc728/contacts], PartitionFilters: [], PushedAggregation: [], PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], PushedGroupBy: [], ReadSchema: struct<id:int,name:struct<middle:string,last:string>> RuntimeFilters: [] +- == Initial Plan == HashAggregate(keys=[], functions=[count(id#191048)], output=[count(id)#191093L]) +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=1041649] +- HashAggregate(keys=[], functions=[partial_count(id#191048)], output=[count#191098L]) +- Project [id#191048] +- Filter ((isnotnull(name#191049.last) AND isnotnull(name#191049.middle)) AND (name#191049.last = Jones)) +- BatchScan parquet file:/tmp/spark-f64d6944-952e-4b9d-aa6f-0047609cc728/contacts[id#191048, name#191049] ParquetScan DataFilters: [isnotnull(name#191049.last), isnotnull(name#191049.middle), (name#191049.last = Jones)], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-f64d6944-952e-4b9d-aa6f-0047609cc728/contacts], PartitionFilters: [], PushedAggregation: [], PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], PushedGroupBy: [], ReadSchema: struct<id:int,name:struct<middle:string,last:string>> RuntimeFilters: [] == Results == == Results == !== Correct Answer - 1 == == Spark Answer - 1 == !struct<> struct<count(id):bigint> ![0] [2]
GlutenParquetV2SchemaPruningSuite.Spark vectorized reader - with partition data column - select one complex field and having is null predicate on another complex field: org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV2SchemaPruningSuite#L1
Results do not match for query: Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]] Timezone Env: == Parsed Logical Plan == 'Project [unresolvedalias(count('id), Some(org.apache.spark.sql.Column$$Lambda$15328/1048723032@233ccb3b))] +- Filter (last#191251 = Jones) +- Project [id#191213, name#191214.first AS first#191249, name#191214.middle AS middle#191250, name#191214.last AS last#191251] +- Filter isnotnull(name#191214.middle) +- Project [id#191213, name#191214, address#191215, pets#191216, friends#191217, relatives#191218, employer#191219, relations#191220, p#191221] +- SubqueryAlias contacts +- View (`contacts`, [id#191213,name#191214,address#191215,pets#191216,friends#191217,relatives#191218,employer#191219,relations#191220,p#191221]) +- RelationV2[id#191213, name#191214, address#191215, pets#191216, friends#191217, relatives#191218, employer#191219, relations#191220, p#191221] parquet file:/tmp/spark-02ba9fbd-5890-47a0-9167-739ee699afd8/contacts == Analyzed Logical Plan == count(id): bigint Aggregate [count(id#191213) AS count(id)#191258L] +- Filter (last#191251 = Jones) +- Project [id#191213, name#191214.first AS first#191249, name#191214.middle AS middle#191250, name#191214.last AS last#191251] +- Filter isnotnull(name#191214.middle) +- Project [id#191213, name#191214, address#191215, pets#191216, friends#191217, relatives#191218, employer#191219, relations#191220, p#191221] +- SubqueryAlias contacts +- View (`contacts`, [id#191213,name#191214,address#191215,pets#191216,friends#191217,relatives#191218,employer#191219,relations#191220,p#191221]) +- RelationV2[id#191213, name#191214, address#191215, pets#191216, friends#191217, relatives#191218, employer#191219, relations#191220, p#191221] parquet file:/tmp/spark-02ba9fbd-5890-47a0-9167-739ee699afd8/contacts == Optimized Logical Plan == Aggregate [count(id#191213) AS count(id)#191258L] +- Project [id#191213] +- Filter ((isnotnull(name#191214.last) AND isnotnull(name#191214.middle)) AND (name#191214.last = Jones)) +- RelationV2[id#191213, name#191214] parquet file:/tmp/spark-02ba9fbd-5890-47a0-9167-739ee699afd8/contacts == Physical Plan == AdaptiveSparkPlan isFinalPlan=true +- == Final Plan == VeloxColumnarToRow +- ^(12619) HashAggregateTransformer(keys=[], functions=[count(id#191213)], isStreamingAgg=false, output=[count(id)#191258L]) +- ^(12619) InputIteratorTransformer[count#191263L] +- ShuffleQueryStage 0 +- ColumnarExchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=1041975], [shuffle_writer_type=hash], [OUTPUT] List(count:LongType), [OUTPUT] List(count:LongType) +- VeloxResizeBatches 1024, 2147483647 +- ^(12618) FlushableHashAggregateTransformer(keys=[], functions=[partial_count(id#191213)], isStreamingAgg=false, output=[count#191263L]) +- ^(12618) ProjectExecTransformer [id#191213] +- ^(12618) FilterExecTransformer ((isnotnull(name#191214.last) AND isnotnull(name#191214.middle)) AND (name#191214.last = Jones)) +- ^(12618) BatchScanTransformer parquet file:/tmp/spark-02ba9fbd-5890-47a0-9167-739ee699afd8/contacts[id#191213, name#191214] ParquetScan DataFilters: [isnotnull(name#191214.last), isnotnull(name#191214.middle), (name#191214.last = Jones)], Format: parquet, 
Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-02ba9fbd-5890-47a0-9167-739ee699afd8/contacts], PartitionFilters: [], PushedAggregation: [], PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], PushedGroupBy: [], ReadSchema: struct<id:int,name:struct<middle:string,last:string>> RuntimeFilters: [] +- == Initial Plan == HashAggregate(keys=[], functions=[count(id#191213)], output=[count(id)#191258L]) +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=1041919] +- HashAggregate(keys=[], functions=[partial_count(id#191213)], output=[count#191263L]) +- Project [id#191213] +- Filter ((isnotnull(name#191214.last) AND isnotnull(name#191214.middle)) AND (name#191214.last = Jones)) +- BatchScan parquet file:/tmp/spark-02ba9fbd-5890-47a0-9167-739ee699afd8/contacts[id#191213, name#191214] ParquetScan DataFilters: [isnotnull(name#191214.last), isnotnull(name#191214.middle), (name#191214.last = Jones)], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-02ba9fbd-5890-47a0-9167-739ee699afd8/contacts], PartitionFilters: [], PushedAggregation: [], PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], PushedGroupBy: [], ReadSchema: struct<id:int,name:struct<middle:string,last:string>> RuntimeFilters: [] == Results == == Results == !== Correct Answer - 1 == == Spark Answer - 1 == !struct<> struct<count(id):bigint> ![0] [2]
GlutenParquetV2SchemaPruningSuite.Non-vectorized reader - without partition data column - select one complex field and having is null predicate on another complex field: org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV2SchemaPruningSuite#L1
Results do not match for query: Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]] Timezone Env: == Parsed Logical Plan == 'Project [unresolvedalias(count('id), Some(org.apache.spark.sql.Column$$Lambda$15328/1048723032@233ccb3b))] +- Filter (last#191410 = Jones) +- Project [id#191372, name#191373.first AS first#191408, name#191373.middle AS middle#191409, name#191373.last AS last#191410] +- Filter isnotnull(name#191373.middle) +- Project [id#191372, name#191373, address#191374, pets#191375, friends#191376, relatives#191377, employer#191378, relations#191379, p#191380] +- SubqueryAlias contacts +- View (`contacts`, [id#191372,name#191373,address#191374,pets#191375,friends#191376,relatives#191377,employer#191378,relations#191379,p#191380]) +- RelationV2[id#191372, name#191373, address#191374, pets#191375, friends#191376, relatives#191377, employer#191378, relations#191379, p#191380] parquet file:/tmp/spark-349b25a0-d4fc-48d9-89bb-59ad23f047bd/contacts == Analyzed Logical Plan == count(id): bigint Aggregate [count(id#191372) AS count(id)#191417L] +- Filter (last#191410 = Jones) +- Project [id#191372, name#191373.first AS first#191408, name#191373.middle AS middle#191409, name#191373.last AS last#191410] +- Filter isnotnull(name#191373.middle) +- Project [id#191372, name#191373, address#191374, pets#191375, friends#191376, relatives#191377, employer#191378, relations#191379, p#191380] +- SubqueryAlias contacts +- View (`contacts`, [id#191372,name#191373,address#191374,pets#191375,friends#191376,relatives#191377,employer#191378,relations#191379,p#191380]) +- RelationV2[id#191372, name#191373, address#191374, pets#191375, friends#191376, relatives#191377, employer#191378, relations#191379, p#191380] parquet file:/tmp/spark-349b25a0-d4fc-48d9-89bb-59ad23f047bd/contacts == Optimized Logical Plan == Aggregate [count(id#191372) AS count(id)#191417L] +- Project [id#191372] +- Filter ((isnotnull(name#191373.last) AND isnotnull(name#191373.middle)) AND (name#191373.last = Jones)) +- RelationV2[id#191372, name#191373] parquet file:/tmp/spark-349b25a0-d4fc-48d9-89bb-59ad23f047bd/contacts == Physical Plan == AdaptiveSparkPlan isFinalPlan=true +- == Final Plan == VeloxColumnarToRow +- ^(12623) HashAggregateTransformer(keys=[], functions=[count(id#191372)], isStreamingAgg=false, output=[count(id)#191417L]) +- ^(12623) InputIteratorTransformer[count#191422L] +- ShuffleQueryStage 0 +- ColumnarExchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=1042226], [shuffle_writer_type=hash], [OUTPUT] List(count:LongType), [OUTPUT] List(count:LongType) +- VeloxResizeBatches 1024, 2147483647 +- ^(12622) FlushableHashAggregateTransformer(keys=[], functions=[partial_count(id#191372)], isStreamingAgg=false, output=[count#191422L]) +- ^(12622) ProjectExecTransformer [id#191372] +- ^(12622) FilterExecTransformer ((isnotnull(name#191373.last) AND isnotnull(name#191373.middle)) AND (name#191373.last = Jones)) +- ^(12622) BatchScanTransformer parquet file:/tmp/spark-349b25a0-d4fc-48d9-89bb-59ad23f047bd/contacts[id#191372, name#191373] ParquetScan DataFilters: [isnotnull(name#191373.last), isnotnull(name#191373.middle), (name#191373.last = Jones)], Format: parquet, 
Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-349b25a0-d4fc-48d9-89bb-59ad23f047bd/contacts], PartitionFilters: [], PushedAggregation: [], PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], PushedGroupBy: [], ReadSchema: struct<id:int,name:struct<middle:string,last:string>> RuntimeFilters: [] +- == Initial Plan == HashAggregate(keys=[], functions=[count(id#191372)], output=[count(id)#191417L]) +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=1042189] +- HashAggregate(keys=[], functions=[partial_count(id#191372)], output=[count#191422L]) +- Project [id#191372] +- Filter ((isnotnull(name#191373.last) AND isnotnull(name#191373.middle)) AND (name#191373.last = Jones)) +- BatchScan parquet file:/tmp/spark-349b25a0-d4fc-48d9-89bb-59ad23f047bd/contacts[id#191372, name#191373] ParquetScan DataFilters: [isnotnull(name#191373.last), isnotnull(name#191373.middle), (name#191373.last = Jones)], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-349b25a0-d4fc-48d9-89bb-59ad23f047bd/contacts], PartitionFilters: [], PushedAggregation: [], PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], PushedGroupBy: [], ReadSchema: struct<id:int,name:struct<middle:string,last:string>> RuntimeFilters: [] == Results == == Results == !== Correct Answer - 1 == == Spark Answer - 1 == !struct<> struct<count(id):bigint> ![0] [2]
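The V2 variants of the count query fail with the identical [0] vs [2] diff. A regression assertion mirroring the Results block, with `query` as in the reconstruction above:

    import org.apache.spark.sql.Row

    // Reference answer per the annotation is a single Row(0); the Gluten run
    // produced Row(2), counting the two rows whose null middle names should
    // have been filtered out.
    val rows = query.collect()
    assert(rows.toSeq == Seq(Row(0L)), s"unexpected result: ${rows.mkString(", ")}")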
GlutenParquetV2SchemaPruningSuite.Non-vectorized reader - with partition data column - select one complex field and having is null predicate on another complex field: org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV2SchemaPruningSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:

== Parsed Logical Plan ==
'Project [unresolvedalias(count('id), Some(org.apache.spark.sql.Column$$Lambda$15328/1048723032@233ccb3b))]
+- Filter (last#191575 = Jones)
   +- Project [id#191537, name#191538.first AS first#191573, name#191538.middle AS middle#191574, name#191538.last AS last#191575]
      +- Filter isnotnull(name#191538.middle)
         +- Project [id#191537, name#191538, address#191539, pets#191540, friends#191541, relatives#191542, employer#191543, relations#191544, p#191545]
            +- SubqueryAlias contacts
               +- View (`contacts`, [id#191537,name#191538,address#191539,pets#191540,friends#191541,relatives#191542,employer#191543,relations#191544,p#191545])
                  +- RelationV2[id#191537, name#191538, address#191539, pets#191540, friends#191541, relatives#191542, employer#191543, relations#191544, p#191545] parquet file:/tmp/spark-becb63af-cf85-4198-9262-598e2681ac32/contacts

== Analyzed Logical Plan ==
count(id): bigint
Aggregate [count(id#191537) AS count(id)#191582L]
+- Filter (last#191575 = Jones)
   +- Project [id#191537, name#191538.first AS first#191573, name#191538.middle AS middle#191574, name#191538.last AS last#191575]
      +- Filter isnotnull(name#191538.middle)
         +- Project [id#191537, name#191538, address#191539, pets#191540, friends#191541, relatives#191542, employer#191543, relations#191544, p#191545]
            +- SubqueryAlias contacts
               +- View (`contacts`, [id#191537,name#191538,address#191539,pets#191540,friends#191541,relatives#191542,employer#191543,relations#191544,p#191545])
                  +- RelationV2[id#191537, name#191538, address#191539, pets#191540, friends#191541, relatives#191542, employer#191543, relations#191544, p#191545] parquet file:/tmp/spark-becb63af-cf85-4198-9262-598e2681ac32/contacts

== Optimized Logical Plan ==
Aggregate [count(id#191537) AS count(id)#191582L]
+- Project [id#191537]
   +- Filter ((isnotnull(name#191538.last) AND isnotnull(name#191538.middle)) AND (name#191538.last = Jones))
      +- RelationV2[id#191537, name#191538] parquet file:/tmp/spark-becb63af-cf85-4198-9262-598e2681ac32/contacts

== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=true
+- == Final Plan ==
   VeloxColumnarToRow
   +- ^(12627) HashAggregateTransformer(keys=[], functions=[count(id#191537)], isStreamingAgg=false, output=[count(id)#191582L])
      +- ^(12627) InputIteratorTransformer[count#191587L]
         +- ShuffleQueryStage 0
            +- ColumnarExchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=1042458], [shuffle_writer_type=hash], [OUTPUT] List(count:LongType), [OUTPUT] List(count:LongType)
               +- VeloxResizeBatches 1024, 2147483647
                  +- ^(12626) FlushableHashAggregateTransformer(keys=[], functions=[partial_count(id#191537)], isStreamingAgg=false, output=[count#191587L])
                     +- ^(12626) ProjectExecTransformer [id#191537]
                        +- ^(12626) FilterExecTransformer ((isnotnull(name#191538.last) AND isnotnull(name#191538.middle)) AND (name#191538.last = Jones))
                           +- ^(12626) BatchScanTransformer parquet file:/tmp/spark-becb63af-cf85-4198-9262-598e2681ac32/contacts[id#191537, name#191538] ParquetScan DataFilters: [isnotnull(name#191538.last), isnotnull(name#191538.middle), (name#191538.last = Jones)], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-becb63af-cf85-4198-9262-598e2681ac32/contacts], PartitionFilters: [], PushedAggregation: [], PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], PushedGroupBy: [], ReadSchema: struct<id:int,name:struct<middle:string,last:string>> RuntimeFilters: []
+- == Initial Plan ==
   HashAggregate(keys=[], functions=[count(id#191537)], output=[count(id)#191582L])
   +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=1042421]
      +- HashAggregate(keys=[], functions=[partial_count(id#191537)], output=[count#191587L])
         +- Project [id#191537]
            +- Filter ((isnotnull(name#191538.last) AND isnotnull(name#191538.middle)) AND (name#191538.last = Jones))
               +- BatchScan parquet file:/tmp/spark-becb63af-cf85-4198-9262-598e2681ac32/contacts[id#191537, name#191538] ParquetScan DataFilters: [isnotnull(name#191538.last), isnotnull(name#191538.middle), (name#191538.last = Jones)], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-becb63af-cf85-4198-9262-598e2681ac32/contacts], PartitionFilters: [], PushedAggregation: [], PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], PushedGroupBy: [], ReadSchema: struct<id:int,name:struct<middle:string,last:string>> RuntimeFilters: []

== Results ==
!== Correct Answer - 1 ==   == Spark Answer - 1 ==
!struct<>                   struct<count(id):bigint>
![0]                        [2]
GlutenParquetV2SchemaPruningSuite.Spark vectorized reader - without partition data column - SPARK-34963: extract case-insensitive struct field from struct: org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV2SchemaPruningSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:

== Parsed Logical Plan ==
'Project ['Name.First, 'NAME.MiDDle]
+- Filter isnotnull(Name#204832.MIDDLE)
   +- SubqueryAlias contacts
      +- View (`contacts`, [id#204831,name#204832,address#204833,pets#204834,friends#204835,relatives#204836,employer#204837,relations#204838,p#204839])
         +- RelationV2[id#204831, name#204832, address#204833, pets#204834, friends#204835, relatives#204836, employer#204837, relations#204838, p#204839] parquet file:/tmp/spark-57ee7c84-631a-4f01-952e-f1cb78b3685f/contacts

== Analyzed Logical Plan ==
First: string, MiDDle: string
Project [Name#204832.First AS First#204892, NAME#204832.MiDDle AS MiDDle#204893]
+- Filter isnotnull(Name#204832.MIDDLE)
   +- SubqueryAlias contacts
      +- View (`contacts`, [id#204831,name#204832,address#204833,pets#204834,friends#204835,relatives#204836,employer#204837,relations#204838,p#204839])
         +- RelationV2[id#204831, name#204832, address#204833, pets#204834, friends#204835, relatives#204836, employer#204837, relations#204838, p#204839] parquet file:/tmp/spark-57ee7c84-631a-4f01-952e-f1cb78b3685f/contacts

== Optimized Logical Plan ==
Project [name#204832.first AS First#204892, name#204832.middle AS MiDDle#204893]
+- Filter isnotnull(name#204832.middle)
   +- RelationV2[name#204832] parquet file:/tmp/spark-57ee7c84-631a-4f01-952e-f1cb78b3685f/contacts

== Physical Plan ==
VeloxColumnarToRow
+- ^(13271) ProjectExecTransformer [name#204832.first AS First#204892, name#204832.middle AS MiDDle#204893]
   +- ^(13271) FilterExecTransformer isnotnull(name#204832.middle)
      +- ^(13271) BatchScanTransformer parquet file:/tmp/spark-57ee7c84-631a-4f01-952e-f1cb78b3685f/contacts[name#204832] ParquetScan DataFilters: [isnotnull(name#204832.MIDDLE)], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-57ee7c84-631a-4f01-952e-f1cb78b3685f/contacts], PartitionFilters: [], PushedAggregation: [], PushedFilters: [IsNotNull(name.middle)], PushedGroupBy: [], ReadSchema: struct<name:struct<first:string,middle:string>> RuntimeFilters: []

== Results ==
!== Correct Answer - 2 ==   == Spark Answer - 4 ==
!struct<>                   struct<First:string,MiDDle:string>
 [Jane,X.]                  [Jane,X.]
![John,Y.]                  [Janet,null]
!                           [Jim,null]
!                           [John,Y.]
GlutenParquetV2SchemaPruningSuite.Spark vectorized reader - with partition data column - SPARK-34963: extract case-insensitive struct field from struct: org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV2SchemaPruningSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:

== Parsed Logical Plan ==
'Project ['Name.First, 'NAME.MiDDle]
+- Filter isnotnull(Name#204960.MIDDLE)
   +- SubqueryAlias contacts
      +- View (`contacts`, [id#204959,name#204960,address#204961,pets#204962,friends#204963,relatives#204964,employer#204965,relations#204966,p#204967])
         +- RelationV2[id#204959, name#204960, address#204961, pets#204962, friends#204963, relatives#204964, employer#204965, relations#204966, p#204967] parquet file:/tmp/spark-8fdf0f4a-9fac-4625-8ca0-76228fe87439/contacts

== Analyzed Logical Plan ==
First: string, MiDDle: string
Project [Name#204960.First AS First#205020, NAME#204960.MiDDle AS MiDDle#205021]
+- Filter isnotnull(Name#204960.MIDDLE)
   +- SubqueryAlias contacts
      +- View (`contacts`, [id#204959,name#204960,address#204961,pets#204962,friends#204963,relatives#204964,employer#204965,relations#204966,p#204967])
         +- RelationV2[id#204959, name#204960, address#204961, pets#204962, friends#204963, relatives#204964, employer#204965, relations#204966, p#204967] parquet file:/tmp/spark-8fdf0f4a-9fac-4625-8ca0-76228fe87439/contacts

== Optimized Logical Plan ==
Project [name#204960.first AS First#205020, name#204960.middle AS MiDDle#205021]
+- Filter isnotnull(name#204960.middle)
   +- RelationV2[name#204960] parquet file:/tmp/spark-8fdf0f4a-9fac-4625-8ca0-76228fe87439/contacts

== Physical Plan ==
VeloxColumnarToRow
+- ^(13275) ProjectExecTransformer [name#204960.first AS First#205020, name#204960.middle AS MiDDle#205021]
   +- ^(13275) FilterExecTransformer isnotnull(name#204960.middle)
      +- ^(13275) BatchScanTransformer parquet file:/tmp/spark-8fdf0f4a-9fac-4625-8ca0-76228fe87439/contacts[name#204960] ParquetScan DataFilters: [isnotnull(name#204960.MIDDLE)], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-8fdf0f4a-9fac-4625-8ca0-76228fe87439/contacts], PartitionFilters: [], PushedAggregation: [], PushedFilters: [IsNotNull(name.middle)], PushedGroupBy: [], ReadSchema: struct<name:struct<first:string,middle:string>> RuntimeFilters: []

== Results ==
!== Correct Answer - 2 ==   == Spark Answer - 4 ==
!struct<>                   struct<First:string,MiDDle:string>
 [Jane,X.]                  [Jane,X.]
![John,Y.]                  [Janet,null]
!                           [Jim,null]
!                           [John,Y.]
GlutenParquetV2SchemaPruningSuite.Non-vectorized reader - without partition data column - SPARK-34963: extract case-insensitive struct field from struct: org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV2SchemaPruningSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:

== Parsed Logical Plan ==
'Project ['Name.First, 'NAME.MiDDle]
+- Filter isnotnull(Name#205082.MIDDLE)
   +- SubqueryAlias contacts
      +- View (`contacts`, [id#205081,name#205082,address#205083,pets#205084,friends#205085,relatives#205086,employer#205087,relations#205088,p#205089])
         +- RelationV2[id#205081, name#205082, address#205083, pets#205084, friends#205085, relatives#205086, employer#205087, relations#205088, p#205089] parquet file:/tmp/spark-0c9858aa-c8a9-4c26-8722-2e6a486f00ee/contacts

== Analyzed Logical Plan ==
First: string, MiDDle: string
Project [Name#205082.First AS First#205142, NAME#205082.MiDDle AS MiDDle#205143]
+- Filter isnotnull(Name#205082.MIDDLE)
   +- SubqueryAlias contacts
      +- View (`contacts`, [id#205081,name#205082,address#205083,pets#205084,friends#205085,relatives#205086,employer#205087,relations#205088,p#205089])
         +- RelationV2[id#205081, name#205082, address#205083, pets#205084, friends#205085, relatives#205086, employer#205087, relations#205088, p#205089] parquet file:/tmp/spark-0c9858aa-c8a9-4c26-8722-2e6a486f00ee/contacts

== Optimized Logical Plan ==
Project [name#205082.first AS First#205142, name#205082.middle AS MiDDle#205143]
+- Filter isnotnull(name#205082.middle)
   +- RelationV2[name#205082] parquet file:/tmp/spark-0c9858aa-c8a9-4c26-8722-2e6a486f00ee/contacts

== Physical Plan ==
VeloxColumnarToRow
+- ^(13279) ProjectExecTransformer [name#205082.first AS First#205142, name#205082.middle AS MiDDle#205143]
   +- ^(13279) FilterExecTransformer isnotnull(name#205082.middle)
      +- ^(13279) BatchScanTransformer parquet file:/tmp/spark-0c9858aa-c8a9-4c26-8722-2e6a486f00ee/contacts[name#205082] ParquetScan DataFilters: [isnotnull(name#205082.MIDDLE)], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-0c9858aa-c8a9-4c26-8722-2e6a486f00ee/contacts], PartitionFilters: [], PushedAggregation: [], PushedFilters: [IsNotNull(name.middle)], PushedGroupBy: [], ReadSchema: struct<name:struct<first:string,middle:string>> RuntimeFilters: []

== Results ==
!== Correct Answer - 2 ==   == Spark Answer - 4 ==
!struct<>                   struct<First:string,MiDDle:string>
 [Jane,X.]                  [Jane,X.]
![John,Y.]                  [Janet,null]
!                           [Jim,null]
!                           [John,Y.]
GlutenParquetV2SchemaPruningSuite.Non-vectorized reader - with partition data column - SPARK-34963: extract case-insensitive struct field from struct: org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV2SchemaPruningSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:

== Parsed Logical Plan ==
'Project ['Name.First, 'NAME.MiDDle]
+- Filter isnotnull(Name#205210.MIDDLE)
   +- SubqueryAlias contacts
      +- View (`contacts`, [id#205209,name#205210,address#205211,pets#205212,friends#205213,relatives#205214,employer#205215,relations#205216,p#205217])
         +- RelationV2[id#205209, name#205210, address#205211, pets#205212, friends#205213, relatives#205214, employer#205215, relations#205216, p#205217] parquet file:/tmp/spark-63b14327-07e6-4b3f-8bc0-50d844ceef35/contacts

== Analyzed Logical Plan ==
First: string, MiDDle: string
Project [Name#205210.First AS First#205270, NAME#205210.MiDDle AS MiDDle#205271]
+- Filter isnotnull(Name#205210.MIDDLE)
   +- SubqueryAlias contacts
      +- View (`contacts`, [id#205209,name#205210,address#205211,pets#205212,friends#205213,relatives#205214,employer#205215,relations#205216,p#205217])
         +- RelationV2[id#205209, name#205210, address#205211, pets#205212, friends#205213, relatives#205214, employer#205215, relations#205216, p#205217] parquet file:/tmp/spark-63b14327-07e6-4b3f-8bc0-50d844ceef35/contacts

== Optimized Logical Plan ==
Project [name#205210.first AS First#205270, name#205210.middle AS MiDDle#205271]
+- Filter isnotnull(name#205210.middle)
   +- RelationV2[name#205210] parquet file:/tmp/spark-63b14327-07e6-4b3f-8bc0-50d844ceef35/contacts

== Physical Plan ==
VeloxColumnarToRow
+- ^(13283) ProjectExecTransformer [name#205210.first AS First#205270, name#205210.middle AS MiDDle#205271]
   +- ^(13283) FilterExecTransformer isnotnull(name#205210.middle)
      +- ^(13283) BatchScanTransformer parquet file:/tmp/spark-63b14327-07e6-4b3f-8bc0-50d844ceef35/contacts[name#205210] ParquetScan DataFilters: [isnotnull(name#205210.MIDDLE)], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-63b14327-07e6-4b3f-8bc0-50d844ceef35/contacts], PartitionFilters: [], PushedAggregation: [], PushedFilters: [IsNotNull(name.middle)], PushedGroupBy: [], ReadSchema: struct<name:struct<first:string,middle:string>> RuntimeFilters: []

== Results ==
!== Correct Answer - 2 ==   == Spark Answer - 4 ==
!struct<>                   struct<First:string,MiDDle:string>
 [Jane,X.]                  [Jane,X.]
![John,Y.]                  [Janet,null]
!                           [Jim,null]
!                           [John,Y.]
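All four SPARK-34963 variants above fail identically: the scan records the mixed-case DataFilters: [isnotnull(name#….MIDDLE)], the lower-cased IsNotNull(name.middle) is pushed, and the rows with a null middle name come back anyway. A hedged Scala sketch of the scenario, reconstructed from the parsed plan (the `contacts` view and column names are the suite's fixture, assumed here, not copied from the suite's code):

    // SPARK-34963: case-insensitive extraction of a nested struct field.
    spark.conf.set("spark.sql.caseSensitive", "false")

    val df = spark.table("contacts")
      .where("Name.MIDDLE is not null") // mixed case on purpose
      .select("Name.First", "NAME.MiDDle")

    df.show() // expected [Jane,X.] and [John,Y.]; the Velox build returns all
              // four contacts, i.e. the pushed IsNotNull(name.middle) is a no-op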
GlutenParquetV1SchemaPruningSuite.Spark vectorized reader - without partition data column - select one complex field and having is null predicate on another complex field: org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV1SchemaPruningSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:

== Parsed Logical Plan ==
'Project [unresolvedalias(count('id), Some(org.apache.spark.sql.Column$$Lambda$15841/1926373532@2850e296))]
+- Filter (last#301405 = Jones)
   +- Project [id#301367, name#301368.first AS first#301403, name#301368.middle AS middle#301404, name#301368.last AS last#301405]
      +- Filter isnotnull(name#301368.middle)
         +- Project [id#301367, name#301368, address#301369, pets#301370, friends#301371, relatives#301372, employer#301373, relations#301374, p#301375]
            +- SubqueryAlias contacts
               +- View (`contacts`, [id#301367,name#301368,address#301369,pets#301370,friends#301371,relatives#301372,employer#301373,relations#301374,p#301375])
                  +- Relation [id#301367,name#301368,address#301369,pets#301370,friends#301371,relatives#301372,employer#301373,relations#301374,p#301375] parquet

== Analyzed Logical Plan ==
count(id): bigint
Aggregate [count(id#301367) AS count(id)#301412L]
+- Filter (last#301405 = Jones)
   +- Project [id#301367, name#301368.first AS first#301403, name#301368.middle AS middle#301404, name#301368.last AS last#301405]
      +- Filter isnotnull(name#301368.middle)
         +- Project [id#301367, name#301368, address#301369, pets#301370, friends#301371, relatives#301372, employer#301373, relations#301374, p#301375]
            +- SubqueryAlias contacts
               +- View (`contacts`, [id#301367,name#301368,address#301369,pets#301370,friends#301371,relatives#301372,employer#301373,relations#301374,p#301375])
                  +- Relation [id#301367,name#301368,address#301369,pets#301370,friends#301371,relatives#301372,employer#301373,relations#301374,p#301375] parquet

== Optimized Logical Plan ==
Aggregate [count(id#301367) AS count(id)#301412L]
+- Project [id#301367]
   +- Filter ((isnotnull(name#301368.last) AND isnotnull(name#301368.middle)) AND (name#301368.last = Jones))
      +- Relation [id#301367,name#301368,address#301369,pets#301370,friends#301371,relatives#301372,employer#301373,relations#301374,p#301375] parquet

== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=true
+- == Final Plan ==
   VeloxColumnarToRow
   +- ^(22976) HashAggregateTransformer(keys=[], functions=[count(id#301367)], isStreamingAgg=false, output=[count(id)#301412L])
      +- ^(22976) InputIteratorTransformer[count#301424L]
         +- ShuffleQueryStage 0
            +- ColumnarExchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=1467561], [shuffle_writer_type=hash], [OUTPUT] List(count:LongType), [OUTPUT] List(count:LongType)
               +- VeloxResizeBatches 1024, 2147483647
                  +- ^(22975) FlushableHashAggregateTransformer(keys=[], functions=[partial_count(id#301367)], isStreamingAgg=false, output=[count#301424L])
                     +- ^(22975) ProjectExecTransformer [id#301367]
                        +- ^(22975) FilterExecTransformer ((isnotnull(name#301368.last) AND isnotnull(name#301368.middle)) AND (name#301368.last = Jones))
                           +- ^(22975) FileScanTransformer parquet [id#301367,name#301368,p#301375] Batched: true, DataFilters: [isnotnull(name#301368.last), isnotnull(name#301368.middle), (name#301368.last = Jones)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-e190383e-cba2-470b-87b9-3db3c0f0c1d7/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], ReadSchema: struct<id:int,name:struct<middle:string,last:string>>
+- == Initial Plan ==
   HashAggregate(keys=[], functions=[count(id#301367)], output=[count(id)#301412L])
   +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=1467503]
      +- HashAggregate(keys=[], functions=[partial_count(id#301367)], output=[count#301424L])
         +- Project [id#301367]
            +- Filter ((isnotnull(name#301368.last) AND isnotnull(name#301368.middle)) AND (name#301368.last = Jones))
               +- FileScan parquet [id#301367,name#301368,p#301375] Batched: true, DataFilters: [isnotnull(name#301368.last), isnotnull(name#301368.middle), (name#301368.last = Jones)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-e190383e-cba2-470b-87b9-3db3c0f0c1d7/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], ReadSchema: struct<id:int,name:struct<middle:string,last:string>>

== Results ==
!== Correct Answer - 1 ==   == Spark Answer - 1 ==
!struct<>                   struct<count(id):bigint>
![0]                        [2]
GlutenParquetV1SchemaPruningSuite.Spark vectorized reader - with partition data column - select one complex field and having is null predicate on another complex field: org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV1SchemaPruningSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:

== Parsed Logical Plan ==
'Project [unresolvedalias(count('id), Some(org.apache.spark.sql.Column$$Lambda$15841/1926373532@2850e296))]
+- Filter (last#301530 = Jones)
   +- Project [id#301492, name#301493.first AS first#301528, name#301493.middle AS middle#301529, name#301493.last AS last#301530]
      +- Filter isnotnull(name#301493.middle)
         +- Project [id#301492, name#301493, address#301494, pets#301495, friends#301496, relatives#301497, employer#301498, relations#301499, p#301500]
            +- SubqueryAlias contacts
               +- View (`contacts`, [id#301492,name#301493,address#301494,pets#301495,friends#301496,relatives#301497,employer#301498,relations#301499,p#301500])
                  +- Relation [id#301492,name#301493,address#301494,pets#301495,friends#301496,relatives#301497,employer#301498,relations#301499,p#301500] parquet

== Analyzed Logical Plan ==
count(id): bigint
Aggregate [count(id#301492) AS count(id)#301537L]
+- Filter (last#301530 = Jones)
   +- Project [id#301492, name#301493.first AS first#301528, name#301493.middle AS middle#301529, name#301493.last AS last#301530]
      +- Filter isnotnull(name#301493.middle)
         +- Project [id#301492, name#301493, address#301494, pets#301495, friends#301496, relatives#301497, employer#301498, relations#301499, p#301500]
            +- SubqueryAlias contacts
               +- View (`contacts`, [id#301492,name#301493,address#301494,pets#301495,friends#301496,relatives#301497,employer#301498,relations#301499,p#301500])
                  +- Relation [id#301492,name#301493,address#301494,pets#301495,friends#301496,relatives#301497,employer#301498,relations#301499,p#301500] parquet

== Optimized Logical Plan ==
Aggregate [count(id#301492) AS count(id)#301537L]
+- Project [id#301492]
   +- Filter ((isnotnull(name#301493.last) AND isnotnull(name#301493.middle)) AND (name#301493.last = Jones))
      +- Relation [id#301492,name#301493,address#301494,pets#301495,friends#301496,relatives#301497,employer#301498,relations#301499,p#301500] parquet

== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=true
+- == Final Plan ==
   VeloxColumnarToRow
   +- ^(22980) HashAggregateTransformer(keys=[], functions=[count(id#301492)], isStreamingAgg=false, output=[count(id)#301537L])
      +- ^(22980) InputIteratorTransformer[count#301549L]
         +- ShuffleQueryStage 0
            +- ColumnarExchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=1467837], [shuffle_writer_type=hash], [OUTPUT] List(count:LongType), [OUTPUT] List(count:LongType)
               +- VeloxResizeBatches 1024, 2147483647
                  +- ^(22979) FlushableHashAggregateTransformer(keys=[], functions=[partial_count(id#301492)], isStreamingAgg=false, output=[count#301549L])
                     +- ^(22979) ProjectExecTransformer [id#301492]
                        +- ^(22979) FilterExecTransformer ((isnotnull(name#301493.last) AND isnotnull(name#301493.middle)) AND (name#301493.last = Jones))
                           +- ^(22979) FileScanTransformer parquet [id#301492,name#301493,p#301500] Batched: true, DataFilters: [isnotnull(name#301493.last), isnotnull(name#301493.middle), (name#301493.last = Jones)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-a5dc5ac3-4825-4281-b9b2-b5a98b302a7b/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], ReadSchema: struct<id:int,name:struct<middle:string,last:string>>
+- == Initial Plan ==
   HashAggregate(keys=[], functions=[count(id#301492)], output=[count(id)#301537L])
   +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=1467779]
      +- HashAggregate(keys=[], functions=[partial_count(id#301492)], output=[count#301549L])
         +- Project [id#301492]
            +- Filter ((isnotnull(name#301493.last) AND isnotnull(name#301493.middle)) AND (name#301493.last = Jones))
               +- FileScan parquet [id#301492,name#301493,p#301500] Batched: true, DataFilters: [isnotnull(name#301493.last), isnotnull(name#301493.middle), (name#301493.last = Jones)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-a5dc5ac3-4825-4281-b9b2-b5a98b302a7b/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], ReadSchema: struct<id:int,name:struct<middle:string,last:string>>

== Results ==
!== Correct Answer - 1 ==   == Spark Answer - 1 ==
!struct<>                   struct<count(id):bigint>
![0]                        [2]
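One way to confirm the scan is the culprit, offered as an assumption-labeled sketch rather than a verified fix: stock Spark 3.x lists the sources eligible for nested-field predicate pushdown in spark.sql.optimizer.nestedPredicatePushdown.supportedFileSources (default "parquet,orc"). Removing parquet from the list keeps the nested predicates out of the scan's PushedFilters, so FilterExecTransformer evaluates them after the scan; if the answers then match, the regression is confined to how the Velox scan applies pushed nested filters.

    // Assumed diagnosis knob from stock Spark 3.x; whether Gluten's
    // FileScanTransformer/BatchScanTransformer honors it for these suites
    // is an assumption to verify before relying on it.
    spark.conf.set(
      "spark.sql.optimizer.nestedPredicatePushdown.supportedFileSources", "orc")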