[VL] Enable filter push-down on nested field #29430
Annotations
50 errors
GlutenParquetV1SchemaPruningSuite.Spark vectorized reader - without partition data column - select one complex field and having is null predicate on another complex field:
org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV1SchemaPruningSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:
== Parsed Logical Plan ==
'Project [unresolvedalias(count('id), Some(org.apache.spark.sql.Column$$Lambda$13390/1121076727@2e6a445))]
+- Filter (last#244622 = Jones)
+- Project [id#244584, name#244585.first AS first#244620, name#244585.middle AS middle#244621, name#244585.last AS last#244622]
+- Filter isnotnull(name#244585.middle)
+- Project [id#244584, name#244585, address#244586, pets#244587, friends#244588, relatives#244589, employer#244590, relations#244591, p#244592]
+- SubqueryAlias contacts
+- View (`contacts`, [id#244584,name#244585,address#244586,pets#244587,friends#244588,relatives#244589,employer#244590,relations#244591,p#244592])
+- Relation [id#244584,name#244585,address#244586,pets#244587,friends#244588,relatives#244589,employer#244590,relations#244591,p#244592] parquet
== Analyzed Logical Plan ==
count(id): bigint
Aggregate [count(id#244584) AS count(id)#244629L]
+- Filter (last#244622 = Jones)
+- Project [id#244584, name#244585.first AS first#244620, name#244585.middle AS middle#244621, name#244585.last AS last#244622]
+- Filter isnotnull(name#244585.middle)
+- Project [id#244584, name#244585, address#244586, pets#244587, friends#244588, relatives#244589, employer#244590, relations#244591, p#244592]
+- SubqueryAlias contacts
+- View (`contacts`, [id#244584,name#244585,address#244586,pets#244587,friends#244588,relatives#244589,employer#244590,relations#244591,p#244592])
+- Relation [id#244584,name#244585,address#244586,pets#244587,friends#244588,relatives#244589,employer#244590,relations#244591,p#244592] parquet
== Optimized Logical Plan ==
Aggregate [count(id#244584) AS count(id)#244629L]
+- Project [id#244584]
+- Filter ((isnotnull(name#244585) AND isnotnull(name#244585.middle)) AND (name#244585.last = Jones))
+- Relation [id#244584,name#244585,address#244586,pets#244587,friends#244588,relatives#244589,employer#244590,relations#244591,p#244592] parquet
== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=true
+- == Final Plan ==
VeloxColumnarToRow
+- ^(18507) HashAggregateTransformer(keys=[], functions=[count(id#244584)], isStreamingAgg=false, output=[count(id)#244629L])
+- ^(18507) InputIteratorTransformer[count#244641L]
+- ShuffleQueryStage 0
+- ColumnarExchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=1228086], [shuffle_writer_type=hash], [OUTPUT] List(count:LongType), [OUTPUT] List(count:LongType)
+- VeloxResizeBatches 1024, 2147483647
+- ^(18506) FlushableHashAggregateTransformer(keys=[], functions=[partial_count(id#244584)], isStreamingAgg=false, output=[count#244641L])
+- ^(18506) ProjectExecTransformer [id#244584]
+- ^(18506) FilterExecTransformer ((isnotnull(name#244585) AND isnotnull(name#244585.middle)) AND (name#244585.last = Jones))
+- ^(18506) FileScanTransformer parquet [id#244584,name#244585,p#244592] Batched: true, DataFilters: [isnotnull(name#244585), isnotnull(name#244585.middle), (name#244585.last = Jones)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-b771ec69-dc58-4580-9d66-c0bc75eb6709/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name), IsNotNull(name.middle), EqualTo(name.last,Jones)], ReadSchema: struct<id:int,name:struct<middle:string,last:string>>
+- == Initial Plan ==
HashAggregate(keys=[], functions=[count(id#244584)], output=[count(id)#244629L])
+- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=1228047]
+- HashAggregate(keys=[], functions=[partial_count(id#244584)], output=[count#244641L])
+- Project [id#244584]
+- Filter ((isnotnull(name#244585) AND isnotnull(name#244585.middle)) AND (name#244585.last = Jones))
+- FileScan parquet [id#244584,name#244585,p#244592] Batched: false, DataFilters: [isnotnull(name#244585), isnotnull(name#244585.middle), (name#244585.last = Jones)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-b771ec69-dc58-4580-9d66-c0bc75eb6709/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name), IsNotNull(name.middle), EqualTo(name.last,Jones)], ReadSchema: struct<id:int,name:struct<middle:string,last:string>>
== Results ==
!== Correct Answer - 1 ==   == Spark Answer - 1 ==
!struct<>                   struct<count(id):bigint>
![0]                        [2]
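For reference, a minimal Scala sketch of the query shape this test exercises, reconstructed from the parsed plan above (a SparkSession named spark and the registered contacts temp view are assumed; the column names and the Jones literal come from the plan, and the actual suite code may differ):

import org.apache.spark.sql.functions.count

// Reconstructed from the parsed plan: filter on the nested name.middle,
// project the nested fields, filter on the projected last name, count(id).
val query = spark.table("contacts")
  .where("name.middle is not null")
  .select("id", "name.first", "name.middle", "name.last")
  .where("last = 'Jones'")
  .agg(count("id"))

query.show()  // expected [0]; the failing runs above and below return [2]

Reading the diff together with the plan: IsNotNull(name.middle) is pushed into the Velox scan, yet the returned count of 2 is consistent with the null-middle rows surviving the scan (compare the [Janet,null] and [Jim,null] rows in the SPARK-34963 diffs further down), which suggests the pushed-down nested is-not-null predicate is not being applied.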
GlutenParquetV1SchemaPruningSuite.Spark vectorized reader - with partition data column - select one complex field and having is null predicate on another complex field:
org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV1SchemaPruningSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:
== Parsed Logical Plan ==
'Project [unresolvedalias(count('id), Some(org.apache.spark.sql.Column$$Lambda$13390/1121076727@2e6a445))]
+- Filter (last#244747 = Jones)
+- Project [id#244709, name#244710.first AS first#244745, name#244710.middle AS middle#244746, name#244710.last AS last#244747]
+- Filter isnotnull(name#244710.middle)
+- Project [id#244709, name#244710, address#244711, pets#244712, friends#244713, relatives#244714, employer#244715, relations#244716, p#244717]
+- SubqueryAlias contacts
+- View (`contacts`, [id#244709,name#244710,address#244711,pets#244712,friends#244713,relatives#244714,employer#244715,relations#244716,p#244717])
+- Relation [id#244709,name#244710,address#244711,pets#244712,friends#244713,relatives#244714,employer#244715,relations#244716,p#244717] parquet
== Analyzed Logical Plan ==
count(id): bigint
Aggregate [count(id#244709) AS count(id)#244754L]
+- Filter (last#244747 = Jones)
+- Project [id#244709, name#244710.first AS first#244745, name#244710.middle AS middle#244746, name#244710.last AS last#244747]
+- Filter isnotnull(name#244710.middle)
+- Project [id#244709, name#244710, address#244711, pets#244712, friends#244713, relatives#244714, employer#244715, relations#244716, p#244717]
+- SubqueryAlias contacts
+- View (`contacts`, [id#244709,name#244710,address#244711,pets#244712,friends#244713,relatives#244714,employer#244715,relations#244716,p#244717])
+- Relation [id#244709,name#244710,address#244711,pets#244712,friends#244713,relatives#244714,employer#244715,relations#244716,p#244717] parquet
== Optimized Logical Plan ==
Aggregate [count(id#244709) AS count(id)#244754L]
+- Project [id#244709]
+- Filter ((isnotnull(name#244710) AND isnotnull(name#244710.middle)) AND (name#244710.last = Jones))
+- Relation [id#244709,name#244710,address#244711,pets#244712,friends#244713,relatives#244714,employer#244715,relations#244716,p#244717] parquet
== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=true
+- == Final Plan ==
VeloxColumnarToRow
+- ^(18511) HashAggregateTransformer(keys=[], functions=[count(id#244709)], isStreamingAgg=false, output=[count(id)#244754L])
+- ^(18511) InputIteratorTransformer[count#244766L]
+- ShuffleQueryStage 0
+- ColumnarExchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=1228309], [shuffle_writer_type=hash], [OUTPUT] List(count:LongType), [OUTPUT] List(count:LongType)
+- VeloxResizeBatches 1024, 2147483647
+- ^(18510) FlushableHashAggregateTransformer(keys=[], functions=[partial_count(id#244709)], isStreamingAgg=false, output=[count#244766L])
+- ^(18510) ProjectExecTransformer [id#244709]
+- ^(18510) FilterExecTransformer ((isnotnull(name#244710) AND isnotnull(name#244710.middle)) AND (name#244710.last = Jones))
+- ^(18510) FileScanTransformer parquet [id#244709,name#244710,p#244717] Batched: true, DataFilters: [isnotnull(name#244710), isnotnull(name#244710.middle), (name#244710.last = Jones)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-402dd32e-43f0-437b-ad6b-b6af6918c44b/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name), IsNotNull(name.middle), EqualTo(name.last,Jones)], ReadSchema: struct<id:int,name:struct<middle:string,last:string>>
+- == Initial Plan ==
HashAggregate(keys=[], functions=[count(id#244709)], output=[count(id)#244754L])
+- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=1228270]
+- HashAggregate(keys=[], functions=[partial_count(id#244709)], output=[count#244766L])
+- Project [id#244709]
+- Filter ((isnotnull(name#244710) AND isnotnull(name#244710.middle)) AND (name#244710.last = Jones))
+- FileScan parquet [id#244709,name#244710,p#244717] Batched: false, DataFilters: [isnotnull(name#244710), isnotnull(name#244710.middle), (name#244710.last = Jones)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-402dd32e-43f0-437b-ad6b-b6af6918c44b/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name), IsNotNull(name.middle), EqualTo(name.last,Jones)], ReadSchema: struct<id:int,name:struct<middle:string,last:string>>
== Results ==
!== Correct Answer - 1 ==   == Spark Answer - 1 ==
!struct<>                   struct<count(id):bigint>
![0]                        [2]
GlutenParquetV1SchemaPruningSuite.Non-vectorized reader - without partition data column - select one complex field and having is null predicate on another complex field:
org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV1SchemaPruningSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:
== Parsed Logical Plan ==
'Project [unresolvedalias(count('id), Some(org.apache.spark.sql.Column$$Lambda$13390/1121076727@2e6a445))]
+- Filter (last#244866 = Jones)
+- Project [id#244828, name#244829.first AS first#244864, name#244829.middle AS middle#244865, name#244829.last AS last#244866]
+- Filter isnotnull(name#244829.middle)
+- Project [id#244828, name#244829, address#244830, pets#244831, friends#244832, relatives#244833, employer#244834, relations#244835, p#244836]
+- SubqueryAlias contacts
+- View (`contacts`, [id#244828,name#244829,address#244830,pets#244831,friends#244832,relatives#244833,employer#244834,relations#244835,p#244836])
+- Relation [id#244828,name#244829,address#244830,pets#244831,friends#244832,relatives#244833,employer#244834,relations#244835,p#244836] parquet
== Analyzed Logical Plan ==
count(id): bigint
Aggregate [count(id#244828) AS count(id)#244873L]
+- Filter (last#244866 = Jones)
+- Project [id#244828, name#244829.first AS first#244864, name#244829.middle AS middle#244865, name#244829.last AS last#244866]
+- Filter isnotnull(name#244829.middle)
+- Project [id#244828, name#244829, address#244830, pets#244831, friends#244832, relatives#244833, employer#244834, relations#244835, p#244836]
+- SubqueryAlias contacts
+- View (`contacts`, [id#244828,name#244829,address#244830,pets#244831,friends#244832,relatives#244833,employer#244834,relations#244835,p#244836])
+- Relation [id#244828,name#244829,address#244830,pets#244831,friends#244832,relatives#244833,employer#244834,relations#244835,p#244836] parquet
== Optimized Logical Plan ==
Aggregate [count(id#244828) AS count(id)#244873L]
+- Project [id#244828]
+- Filter ((isnotnull(name#244829) AND isnotnull(name#244829.middle)) AND (name#244829.last = Jones))
+- Relation [id#244828,name#244829,address#244830,pets#244831,friends#244832,relatives#244833,employer#244834,relations#244835,p#244836] parquet
== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=true
+- == Final Plan ==
VeloxColumnarToRow
+- ^(18515) HashAggregateTransformer(keys=[], functions=[count(id#244828)], isStreamingAgg=false, output=[count(id)#244873L])
+- ^(18515) InputIteratorTransformer[count#244885L]
+- ShuffleQueryStage 0
+- ColumnarExchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=1228532], [shuffle_writer_type=hash], [OUTPUT] List(count:LongType), [OUTPUT] List(count:LongType)
+- VeloxResizeBatches 1024, 2147483647
+- ^(18514) FlushableHashAggregateTransformer(keys=[], functions=[partial_count(id#244828)], isStreamingAgg=false, output=[count#244885L])
+- ^(18514) ProjectExecTransformer [id#244828]
+- ^(18514) FilterExecTransformer ((isnotnull(name#244829) AND isnotnull(name#244829.middle)) AND (name#244829.last = Jones))
+- ^(18514) FileScanTransformer parquet [id#244828,name#244829,p#244836] Batched: true, DataFilters: [isnotnull(name#244829), isnotnull(name#244829.middle), (name#244829.last = Jones)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-f76345e8-d4e8-4a5b-bbdb-2af14d4e34b0/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name), IsNotNull(name.middle), EqualTo(name.last,Jones)], ReadSchema: struct<id:int,name:struct<middle:string,last:string>>
+- == Initial Plan ==
HashAggregate(keys=[], functions=[count(id#244828)], output=[count(id)#244873L])
+- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=1228493]
+- HashAggregate(keys=[], functions=[partial_count(id#244828)], output=[count#244885L])
+- Project [id#244828]
+- Filter ((isnotnull(name#244829) AND isnotnull(name#244829.middle)) AND (name#244829.last = Jones))
+- FileScan parquet [id#244828,name#244829,p#244836] Batched: false, DataFilters: [isnotnull(name#244829), isnotnull(name#244829.middle), (name#244829.last = Jones)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-f76345e8-d4e8-4a5b-bbdb-2af14d4e34b0/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name), IsNotNull(name.middle), EqualTo(name.last,Jones)], ReadSchema: struct<id:int,name:struct<middle:string,last:string>>
== Results ==
!== Correct Answer - 1 ==   == Spark Answer - 1 ==
!struct<>                   struct<count(id):bigint>
![0]                        [2]
GlutenParquetV1SchemaPruningSuite.Non-vectorized reader - with partition data column - select one complex field and having is null predicate on another complex field:
org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV1SchemaPruningSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:
== Parsed Logical Plan ==
'Project [unresolvedalias(count('id), Some(org.apache.spark.sql.Column$$Lambda$13390/1121076727@2e6a445))]
+- Filter (last#244991 = Jones)
+- Project [id#244953, name#244954.first AS first#244989, name#244954.middle AS middle#244990, name#244954.last AS last#244991]
+- Filter isnotnull(name#244954.middle)
+- Project [id#244953, name#244954, address#244955, pets#244956, friends#244957, relatives#244958, employer#244959, relations#244960, p#244961]
+- SubqueryAlias contacts
+- View (`contacts`, [id#244953,name#244954,address#244955,pets#244956,friends#244957,relatives#244958,employer#244959,relations#244960,p#244961])
+- Relation [id#244953,name#244954,address#244955,pets#244956,friends#244957,relatives#244958,employer#244959,relations#244960,p#244961] parquet
== Analyzed Logical Plan ==
count(id): bigint
Aggregate [count(id#244953) AS count(id)#244998L]
+- Filter (last#244991 = Jones)
+- Project [id#244953, name#244954.first AS first#244989, name#244954.middle AS middle#244990, name#244954.last AS last#244991]
+- Filter isnotnull(name#244954.middle)
+- Project [id#244953, name#244954, address#244955, pets#244956, friends#244957, relatives#244958, employer#244959, relations#244960, p#244961]
+- SubqueryAlias contacts
+- View (`contacts`, [id#244953,name#244954,address#244955,pets#244956,friends#244957,relatives#244958,employer#244959,relations#244960,p#244961])
+- Relation [id#244953,name#244954,address#244955,pets#244956,friends#244957,relatives#244958,employer#244959,relations#244960,p#244961] parquet
== Optimized Logical Plan ==
Aggregate [count(id#244953) AS count(id)#244998L]
+- Project [id#244953]
+- Filter ((isnotnull(name#244954) AND isnotnull(name#244954.middle)) AND (name#244954.last = Jones))
+- Relation [id#244953,name#244954,address#244955,pets#244956,friends#244957,relatives#244958,employer#244959,relations#244960,p#244961] parquet
== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=true
+- == Final Plan ==
VeloxColumnarToRow
+- ^(18519) HashAggregateTransformer(keys=[], functions=[count(id#244953)], isStreamingAgg=false, output=[count(id)#244998L])
+- ^(18519) InputIteratorTransformer[count#245010L]
+- ShuffleQueryStage 0
+- ColumnarExchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=1228755], [shuffle_writer_type=hash], [OUTPUT] List(count:LongType), [OUTPUT] List(count:LongType)
+- VeloxResizeBatches 1024, 2147483647
+- ^(18518) FlushableHashAggregateTransformer(keys=[], functions=[partial_count(id#244953)], isStreamingAgg=false, output=[count#245010L])
+- ^(18518) ProjectExecTransformer [id#244953]
+- ^(18518) FilterExecTransformer ((isnotnull(name#244954) AND isnotnull(name#244954.middle)) AND (name#244954.last = Jones))
+- ^(18518) FileScanTransformer parquet [id#244953,name#244954,p#244961] Batched: true, DataFilters: [isnotnull(name#244954), isnotnull(name#244954.middle), (name#244954.last = Jones)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-d7e7f713-4548-4b40-a85d-b6698d80cdc8/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name), IsNotNull(name.middle), EqualTo(name.last,Jones)], ReadSchema: struct<id:int,name:struct<middle:string,last:string>>
+- == Initial Plan ==
HashAggregate(keys=[], functions=[count(id#244953)], output=[count(id)#244998L])
+- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=1228716]
+- HashAggregate(keys=[], functions=[partial_count(id#244953)], output=[count#245010L])
+- Project [id#244953]
+- Filter ((isnotnull(name#244954) AND isnotnull(name#244954.middle)) AND (name#244954.last = Jones))
+- FileScan parquet [id#244953,name#244954,p#244961] Batched: false, DataFilters: [isnotnull(name#244954), isnotnull(name#244954.middle), (name#244954.last = Jones)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-d7e7f713-4548-4b40-a85d-b6698d80cdc8/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name), IsNotNull(name.middle), EqualTo(name.last,Jones)], ReadSchema: struct<id:int,name:struct<middle:string,last:string>>
== Results ==
!== Correct Answer - 1 ==   == Spark Answer - 1 ==
!struct<>                   struct<count(id):bigint>
![0]                        [2]
GlutenParquetV1SchemaPruningSuite.Spark vectorized reader - without partition data column - SPARK-34963: extract case-insensitive struct field from struct:
org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV1SchemaPruningSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:
== Parsed Logical Plan ==
'Project ['Name.First, 'NAME.MiDDle]
+- Filter isnotnull(Name#257861.MIDDLE)
+- SubqueryAlias contacts
+- View (`contacts`, [id#257860,name#257861,address#257862,pets#257863,friends#257864,relatives#257865,employer#257866,relations#257867,p#257868])
+- Relation [id#257860,name#257861,address#257862,pets#257863,friends#257864,relatives#257865,employer#257866,relations#257867,p#257868] parquet
== Analyzed Logical Plan ==
First: string, MiDDle: string
Project [Name#257861.First AS First#257931, NAME#257861.MiDDle AS MiDDle#257932]
+- Filter isnotnull(Name#257861.MIDDLE)
+- SubqueryAlias contacts
+- View (`contacts`, [id#257860,name#257861,address#257862,pets#257863,friends#257864,relatives#257865,employer#257866,relations#257867,p#257868])
+- Relation [id#257860,name#257861,address#257862,pets#257863,friends#257864,relatives#257865,employer#257866,relations#257867,p#257868] parquet
== Optimized Logical Plan ==
Project [name#257861.first AS First#257931, name#257861.middle AS MiDDle#257932]
+- Filter (isnotnull(name#257861) AND isnotnull(name#257861.middle))
+- Relation [id#257860,name#257861,address#257862,pets#257863,friends#257864,relatives#257865,employer#257866,relations#257867,p#257868] parquet
== Physical Plan ==
VeloxColumnarToRow
+- ^(19118) ProjectExecTransformer [name#257861.first AS First#257931, name#257861.middle AS MiDDle#257932]
+- ^(19118) FilterExecTransformer (isnotnull(name#257861) AND isnotnull(name#257861.middle))
+- ^(19118) FileScanTransformer parquet [name#257861,p#257868] Batched: true, DataFilters: [isnotnull(name#257861), isnotnull(name#257861.middle)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-fdff98a1-de98-4725-a10f-79ca54ea3241/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name), IsNotNull(name.middle)], ReadSchema: struct<name:struct<first:string,middle:string>>
== Results ==
!== Correct Answer - 2 ==   == Spark Answer - 4 ==
!struct<>                   struct<First:string,MiDDle:string>
 [Jane,X.]                  [Jane,X.]
![John,Y.]                  [Janet,null]
!                           [Jim,null]
!                           [John,Y.]
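The same reconstruction for the SPARK-34963 failures, again a sketch from the parsed plan above; the mixed-case field references resolve only because spark.sql.caseSensitive is false (the default), and the exact suite code may differ:

// Case-insensitive nested field access (SPARK-34963); names from the plan.
val query = spark.table("contacts")
  .where("Name.MIDDLE is not null")
  .select("Name.First", "NAME.MiDDle")

query.show()
// expected: [Jane,X.] and [John,Y.]
// failing:  additionally [Janet,null] and [Jim,null], exactly the rows the
//           pushed IsNotNull(name.middle) should have removed

Note that the filter is already normalized to lower case when pushed (PushedFilters: [IsNotNull(name.middle)]), so this again looks like the nested is-not-null predicate being ignored at the scan rather than a case-resolution problem.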
GlutenParquetV1SchemaPruningSuite.Spark vectorized reader - with partition data column - SPARK-34963: extract case-insensitive struct field from struct:
org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV1SchemaPruningSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:
== Parsed Logical Plan ==
'Project ['Name.First, 'NAME.MiDDle]
+- Filter isnotnull(Name#258009.MIDDLE)
+- SubqueryAlias contacts
+- View (`contacts`, [id#258008,name#258009,address#258010,pets#258011,friends#258012,relatives#258013,employer#258014,relations#258015,p#258016])
+- Relation [id#258008,name#258009,address#258010,pets#258011,friends#258012,relatives#258013,employer#258014,relations#258015,p#258016] parquet
== Analyzed Logical Plan ==
First: string, MiDDle: string
Project [Name#258009.First AS First#258079, NAME#258009.MiDDle AS MiDDle#258080]
+- Filter isnotnull(Name#258009.MIDDLE)
+- SubqueryAlias contacts
+- View (`contacts`, [id#258008,name#258009,address#258010,pets#258011,friends#258012,relatives#258013,employer#258014,relations#258015,p#258016])
+- Relation [id#258008,name#258009,address#258010,pets#258011,friends#258012,relatives#258013,employer#258014,relations#258015,p#258016] parquet
== Optimized Logical Plan ==
Project [name#258009.first AS First#258079, name#258009.middle AS MiDDle#258080]
+- Filter (isnotnull(name#258009) AND isnotnull(name#258009.middle))
+- Relation [id#258008,name#258009,address#258010,pets#258011,friends#258012,relatives#258013,employer#258014,relations#258015,p#258016] parquet
== Physical Plan ==
VeloxColumnarToRow
+- ^(19122) ProjectExecTransformer [name#258009.first AS First#258079, name#258009.middle AS MiDDle#258080]
+- ^(19122) FilterExecTransformer (isnotnull(name#258009) AND isnotnull(name#258009.middle))
+- ^(19122) FileScanTransformer parquet [name#258009,p#258016] Batched: true, DataFilters: [isnotnull(name#258009), isnotnull(name#258009.middle)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-cc51932e-5373-4187-9dbc-ac23e24574b9/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name), IsNotNull(name.middle)], ReadSchema: struct<name:struct<first:string,middle:string>>
== Results ==
!== Correct Answer - 2 ==   == Spark Answer - 4 ==
!struct<>                   struct<First:string,MiDDle:string>
 [Jane,X.]                  [Jane,X.]
![John,Y.]                  [Janet,null]
!                           [Jim,null]
!                           [John,Y.]
GlutenParquetV1SchemaPruningSuite.Non-vectorized reader - without partition data column - SPARK-34963: extract case-insensitive struct field from struct:
org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV1SchemaPruningSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:
== Parsed Logical Plan ==
'Project ['Name.First, 'NAME.MiDDle]
+- Filter isnotnull(Name#258151.MIDDLE)
+- SubqueryAlias contacts
+- View (`contacts`, [id#258150,name#258151,address#258152,pets#258153,friends#258154,relatives#258155,employer#258156,relations#258157,p#258158])
+- Relation [id#258150,name#258151,address#258152,pets#258153,friends#258154,relatives#258155,employer#258156,relations#258157,p#258158] parquet
== Analyzed Logical Plan ==
First: string, MiDDle: string
Project [Name#258151.First AS First#258221, NAME#258151.MiDDle AS MiDDle#258222]
+- Filter isnotnull(Name#258151.MIDDLE)
+- SubqueryAlias contacts
+- View (`contacts`, [id#258150,name#258151,address#258152,pets#258153,friends#258154,relatives#258155,employer#258156,relations#258157,p#258158])
+- Relation [id#258150,name#258151,address#258152,pets#258153,friends#258154,relatives#258155,employer#258156,relations#258157,p#258158] parquet
== Optimized Logical Plan ==
Project [name#258151.first AS First#258221, name#258151.middle AS MiDDle#258222]
+- Filter (isnotnull(name#258151) AND isnotnull(name#258151.middle))
+- Relation [id#258150,name#258151,address#258152,pets#258153,friends#258154,relatives#258155,employer#258156,relations#258157,p#258158] parquet
== Physical Plan ==
VeloxColumnarToRow
+- ^(19126) ProjectExecTransformer [name#258151.first AS First#258221, name#258151.middle AS MiDDle#258222]
+- ^(19126) FilterExecTransformer (isnotnull(name#258151) AND isnotnull(name#258151.middle))
+- ^(19126) FileScanTransformer parquet [name#258151,p#258158] Batched: true, DataFilters: [isnotnull(name#258151), isnotnull(name#258151.middle)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-a76268f2-5e51-42c5-a72c-d6f504dae3f3/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name), IsNotNull(name.middle)], ReadSchema: struct<name:struct<first:string,middle:string>>
== Results ==
!== Correct Answer - 2 ==   == Spark Answer - 4 ==
!struct<>                   struct<First:string,MiDDle:string>
 [Jane,X.]                  [Jane,X.]
![John,Y.]                  [Janet,null]
!                           [Jim,null]
!                           [John,Y.]
GlutenParquetV1SchemaPruningSuite.Non-vectorized reader - with partition data column - SPARK-34963: extract case-insensitive struct field from struct:
org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV1SchemaPruningSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:
== Parsed Logical Plan ==
'Project ['Name.First, 'NAME.MiDDle]
+- Filter isnotnull(Name#258299.MIDDLE)
+- SubqueryAlias contacts
+- View (`contacts`, [id#258298,name#258299,address#258300,pets#258301,friends#258302,relatives#258303,employer#258304,relations#258305,p#258306])
+- Relation [id#258298,name#258299,address#258300,pets#258301,friends#258302,relatives#258303,employer#258304,relations#258305,p#258306] parquet
== Analyzed Logical Plan ==
First: string, MiDDle: string
Project [Name#258299.First AS First#258369, NAME#258299.MiDDle AS MiDDle#258370]
+- Filter isnotnull(Name#258299.MIDDLE)
+- SubqueryAlias contacts
+- View (`contacts`, [id#258298,name#258299,address#258300,pets#258301,friends#258302,relatives#258303,employer#258304,relations#258305,p#258306])
+- Relation [id#258298,name#258299,address#258300,pets#258301,friends#258302,relatives#258303,employer#258304,relations#258305,p#258306] parquet
== Optimized Logical Plan ==
Project [name#258299.first AS First#258369, name#258299.middle AS MiDDle#258370]
+- Filter (isnotnull(name#258299) AND isnotnull(name#258299.middle))
+- Relation [id#258298,name#258299,address#258300,pets#258301,friends#258302,relatives#258303,employer#258304,relations#258305,p#258306] parquet
== Physical Plan ==
VeloxColumnarToRow
+- ^(19130) ProjectExecTransformer [name#258299.first AS First#258369, name#258299.middle AS MiDDle#258370]
+- ^(19130) FilterExecTransformer (isnotnull(name#258299) AND isnotnull(name#258299.middle))
+- ^(19130) FileScanTransformer parquet [name#258299,p#258306] Batched: true, DataFilters: [isnotnull(name#258299), isnotnull(name#258299.middle)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-aacaf644-49cf-4b4e-a7dc-75ce59ed7d15/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name), IsNotNull(name.middle)], ReadSchema: struct<name:struct<first:string,middle:string>>
== Results ==
!== Correct Answer - 2 ==   == Spark Answer - 4 ==
!struct<>                   struct<First:string,MiDDle:string>
 [Jane,X.]                  [Jane,X.]
![John,Y.]                  [Janet,null]
!                           [Jim,null]
!                           [John,Y.]
GlutenParquetV2SchemaPruningSuite.Spark vectorized reader - without partition data column - select one complex field and having is null predicate on another complex field:
org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV2SchemaPruningSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:
== Parsed Logical Plan ==
'Project [unresolvedalias(count('id), Some(org.apache.spark.sql.Column$$Lambda$13390/1121076727@2e6a445))]
+- Filter (last#167351 = Jones)
+- Project [id#167313, name#167314.first AS first#167349, name#167314.middle AS middle#167350, name#167314.last AS last#167351]
+- Filter isnotnull(name#167314.middle)
+- Project [id#167313, name#167314, address#167315, pets#167316, friends#167317, relatives#167318, employer#167319, relations#167320, p#167321]
+- SubqueryAlias contacts
+- View (`contacts`, [id#167313,name#167314,address#167315,pets#167316,friends#167317,relatives#167318,employer#167319,relations#167320,p#167321])
+- RelationV2[id#167313, name#167314, address#167315, pets#167316, friends#167317, relatives#167318, employer#167319, relations#167320, p#167321] parquet file:/tmp/spark-b110863f-3f85-4cb2-b067-c5cbcb7f5713/contacts
== Analyzed Logical Plan ==
count(id): bigint
Aggregate [count(id#167313) AS count(id)#167358L]
+- Filter (last#167351 = Jones)
+- Project [id#167313, name#167314.first AS first#167349, name#167314.middle AS middle#167350, name#167314.last AS last#167351]
+- Filter isnotnull(name#167314.middle)
+- Project [id#167313, name#167314, address#167315, pets#167316, friends#167317, relatives#167318, employer#167319, relations#167320, p#167321]
+- SubqueryAlias contacts
+- View (`contacts`, [id#167313,name#167314,address#167315,pets#167316,friends#167317,relatives#167318,employer#167319,relations#167320,p#167321])
+- RelationV2[id#167313, name#167314, address#167315, pets#167316, friends#167317, relatives#167318, employer#167319, relations#167320, p#167321] parquet file:/tmp/spark-b110863f-3f85-4cb2-b067-c5cbcb7f5713/contacts
== Optimized Logical Plan ==
Aggregate [count(id#167313) AS count(id)#167358L]
+- Project [id#167313]
+- Filter ((isnotnull(name#167314) AND isnotnull(name#167314.middle)) AND (name#167314.last = Jones))
+- RelationV2[id#167313, name#167314] parquet file:/tmp/spark-b110863f-3f85-4cb2-b067-c5cbcb7f5713/contacts
== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=true
+- == Final Plan ==
VeloxColumnarToRow
+- ^(11326) HashAggregateTransformer(keys=[], functions=[count(id#167313)], isStreamingAgg=false, output=[count(id)#167358L])
+- ^(11326) InputIteratorTransformer[count#167367L]
+- ShuffleQueryStage 0
+- ColumnarExchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=946810], [shuffle_writer_type=hash], [OUTPUT] List(count:LongType), [OUTPUT] List(count:LongType)
+- VeloxResizeBatches 1024, 2147483647
+- ^(11325) FlushableHashAggregateTransformer(keys=[], functions=[partial_count(id#167313)], isStreamingAgg=false, output=[count#167367L])
+- ^(11325) ProjectExecTransformer [id#167313]
+- ^(11325) FilterExecTransformer ((isnotnull(name#167314) AND isnotnull(name#167314.middle)) AND (name#167314.last = Jones))
+- ^(11325) BatchScanExecTransformer[id#167313, name#167314] ParquetScan DataFilters: [isnotnull(name#167314), isnotnull(name#167314.middle), (name#167314.last = Jones)], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-b110863f-3f85-4cb2-b067-c5cbcb7f5713/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.middle), EqualTo(name.last,Jones)], ReadSchema: struct<id:int,name:struct<middle:string,last:string>>, PushedFilters: [IsNotNull(name.middle), EqualTo(name.last,Jones)] RuntimeFilters: []
+- == Initial Plan ==
HashAggregate(keys=[], functions=[count(id#167313)], output=[count(id)#167358L])
+- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=946773]
+- HashAggregate(keys=[], functions=[partial_count(id#167313)], output=[count#167367L])
+- Project [id#167313]
+- Filter ((isnotnull(name#167314) AND isnotnull(name#167314.middle)) AND (name#167314.last = Jones))
+- BatchScan[id#167313, name#167314] ParquetScan DataFilters: [isnotnull(name#167314), isnotnull(name#167314.middle), (name#167314.last = Jones)], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-b110863f-3f85-4cb2-b067-c5cbcb7f5713/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.middle), EqualTo(name.last,Jones)], ReadSchema: struct<id:int,name:struct<middle:string,last:string>>, PushedFilters: [IsNotNull(name.middle), EqualTo(name.last,Jones)] RuntimeFilters: []
== Results ==
!== Correct Answer - 1 ==   == Spark Answer - 1 ==
!struct<>                   struct<count(id):bigint>
![0]                        [2]
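One observable difference between the V1 and V2 failures: the V1 FileScanTransformer reports PushedFilters: [IsNotNull(name), IsNotNull(name.middle), EqualTo(name.last,Jones)], while the V2 BatchScanExecTransformer reports only [IsNotNull(name.middle), EqualTo(name.last,Jones)], yet both paths return the same wrong count. For reproducing locally, a sketch for switching between the two scan paths (spark.sql.sources.useV1SourceList is a standard Spark session conf; the SQL below is an equivalent form of the optimized plan above):

// An empty list routes parquet through the DSv2 ParquetScan/BatchScan path;
// putting "parquet" back in the list restores the V1 FileScan path.
spark.conf.set("spark.sql.sources.useV1SourceList", "")
spark.sql(
  "select count(id) from contacts " +
  "where name.middle is not null and name.last = 'Jones'"
).show()  // expected [0]; the failing runs return [2]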
GlutenParquetV2SchemaPruningSuite.Spark vectorized reader - with partition data column - select one complex field and having is null predicate on another complex field:
org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV2SchemaPruningSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:
== Parsed Logical Plan ==
'Project [unresolvedalias(count('id), Some(org.apache.spark.sql.Column$$Lambda$13390/1121076727@2e6a445))]
+- Filter (last#167476 = Jones)
+- Project [id#167438, name#167439.first AS first#167474, name#167439.middle AS middle#167475, name#167439.last AS last#167476]
+- Filter isnotnull(name#167439.middle)
+- Project [id#167438, name#167439, address#167440, pets#167441, friends#167442, relatives#167443, employer#167444, relations#167445, p#167446]
+- SubqueryAlias contacts
+- View (`contacts`, [id#167438,name#167439,address#167440,pets#167441,friends#167442,relatives#167443,employer#167444,relations#167445,p#167446])
+- RelationV2[id#167438, name#167439, address#167440, pets#167441, friends#167442, relatives#167443, employer#167444, relations#167445, p#167446] parquet file:/tmp/spark-d90c9f45-df42-4689-b81a-90e59ed12635/contacts
== Analyzed Logical Plan ==
count(id): bigint
Aggregate [count(id#167438) AS count(id)#167483L]
+- Filter (last#167476 = Jones)
+- Project [id#167438, name#167439.first AS first#167474, name#167439.middle AS middle#167475, name#167439.last AS last#167476]
+- Filter isnotnull(name#167439.middle)
+- Project [id#167438, name#167439, address#167440, pets#167441, friends#167442, relatives#167443, employer#167444, relations#167445, p#167446]
+- SubqueryAlias contacts
+- View (`contacts`, [id#167438,name#167439,address#167440,pets#167441,friends#167442,relatives#167443,employer#167444,relations#167445,p#167446])
+- RelationV2[id#167438, name#167439, address#167440, pets#167441, friends#167442, relatives#167443, employer#167444, relations#167445, p#167446] parquet file:/tmp/spark-d90c9f45-df42-4689-b81a-90e59ed12635/contacts
== Optimized Logical Plan ==
Aggregate [count(id#167438) AS count(id)#167483L]
+- Project [id#167438]
+- Filter ((isnotnull(name#167439) AND isnotnull(name#167439.middle)) AND (name#167439.last = Jones))
+- RelationV2[id#167438, name#167439] parquet file:/tmp/spark-d90c9f45-df42-4689-b81a-90e59ed12635/contacts
== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=true
+- == Final Plan ==
VeloxColumnarToRow
+- ^(11330) HashAggregateTransformer(keys=[], functions=[count(id#167438)], isStreamingAgg=false, output=[count(id)#167483L])
+- ^(11330) InputIteratorTransformer[count#167492L]
+- ShuffleQueryStage 0
+- ColumnarExchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=947027], [shuffle_writer_type=hash], [OUTPUT] List(count:LongType), [OUTPUT] List(count:LongType)
+- VeloxResizeBatches 1024, 2147483647
+- ^(11329) FlushableHashAggregateTransformer(keys=[], functions=[partial_count(id#167438)], isStreamingAgg=false, output=[count#167492L])
+- ^(11329) ProjectExecTransformer [id#167438]
+- ^(11329) FilterExecTransformer ((isnotnull(name#167439) AND isnotnull(name#167439.middle)) AND (name#167439.last = Jones))
+- ^(11329) BatchScanExecTransformer[id#167438, name#167439] ParquetScan DataFilters: [isnotnull(name#167439), isnotnull(name#167439.middle), (name#167439.last = Jones)], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-d90c9f45-df42-4689-b81a-90e59ed12635/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.middle), EqualTo(name.last,Jones)], ReadSchema: struct<id:int,name:struct<middle:string,last:string>>, PushedFilters: [IsNotNull(name.middle), EqualTo(name.last,Jones)] RuntimeFilters: []
+- == Initial Plan ==
HashAggregate(keys=[], functions=[count(id#167438)], output=[count(id)#167483L])
+- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=946990]
+- HashAggregate(keys=[], functions=[partial_count(id#167438)], output=[count#167492L])
+- Project [id#167438]
+- Filter ((isnotnull(name#167439) AND isnotnull(name#167439.middle)) AND (name#167439.last = Jones))
+- BatchScan[id#167438, name#167439] ParquetScan DataFilters: [isnotnull(name#167439), isnotnull(name#167439.middle), (name#167439.last = Jones)], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-d90c9f45-df42-4689-b81a-90e59ed12635/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.middle), EqualTo(name.last,Jones)], ReadSchema: struct<id:int,name:struct<middle:string,last:string>>, PushedFilters: [IsNotNull(name.middle), EqualTo(name.last,Jones)] RuntimeFilters: []
== Results ==
!== Correct Answer - 1 ==   == Spark Answer - 1 ==
!struct<>                   struct<count(id):bigint>
![0]                        [2]
GlutenParquetV2SchemaPruningSuite.Non-vectorized reader - without partition data column - select one complex field and having is null predicate on another complex field:
org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV2SchemaPruningSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:
== Parsed Logical Plan ==
'Project [unresolvedalias(count('id), Some(org.apache.spark.sql.Column$$Lambda$13390/1121076727@2e6a445))]
+- Filter (last#167595 = Jones)
+- Project [id#167557, name#167558.first AS first#167593, name#167558.middle AS middle#167594, name#167558.last AS last#167595]
+- Filter isnotnull(name#167558.middle)
+- Project [id#167557, name#167558, address#167559, pets#167560, friends#167561, relatives#167562, employer#167563, relations#167564, p#167565]
+- SubqueryAlias contacts
+- View (`contacts`, [id#167557,name#167558,address#167559,pets#167560,friends#167561,relatives#167562,employer#167563,relations#167564,p#167565])
+- RelationV2[id#167557, name#167558, address#167559, pets#167560, friends#167561, relatives#167562, employer#167563, relations#167564, p#167565] parquet file:/tmp/spark-8cd40a91-8a66-47ca-aea7-c50bf9bc8a55/contacts
== Analyzed Logical Plan ==
count(id): bigint
Aggregate [count(id#167557) AS count(id)#167602L]
+- Filter (last#167595 = Jones)
+- Project [id#167557, name#167558.first AS first#167593, name#167558.middle AS middle#167594, name#167558.last AS last#167595]
+- Filter isnotnull(name#167558.middle)
+- Project [id#167557, name#167558, address#167559, pets#167560, friends#167561, relatives#167562, employer#167563, relations#167564, p#167565]
+- SubqueryAlias contacts
+- View (`contacts`, [id#167557,name#167558,address#167559,pets#167560,friends#167561,relatives#167562,employer#167563,relations#167564,p#167565])
+- RelationV2[id#167557, name#167558, address#167559, pets#167560, friends#167561, relatives#167562, employer#167563, relations#167564, p#167565] parquet file:/tmp/spark-8cd40a91-8a66-47ca-aea7-c50bf9bc8a55/contacts
== Optimized Logical Plan ==
Aggregate [count(id#167557) AS count(id)#167602L]
+- Project [id#167557]
+- Filter ((isnotnull(name#167558) AND isnotnull(name#167558.middle)) AND (name#167558.last = Jones))
+- RelationV2[id#167557, name#167558] parquet file:/tmp/spark-8cd40a91-8a66-47ca-aea7-c50bf9bc8a55/contacts
== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=true
+- == Final Plan ==
VeloxColumnarToRow
+- ^(11334) HashAggregateTransformer(keys=[], functions=[count(id#167557)], isStreamingAgg=false, output=[count(id)#167602L])
+- ^(11334) InputIteratorTransformer[count#167611L]
+- ShuffleQueryStage 0
+- ColumnarExchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=947244], [shuffle_writer_type=hash], [OUTPUT] List(count:LongType), [OUTPUT] List(count:LongType)
+- VeloxResizeBatches 1024, 2147483647
+- ^(11333) FlushableHashAggregateTransformer(keys=[], functions=[partial_count(id#167557)], isStreamingAgg=false, output=[count#167611L])
+- ^(11333) ProjectExecTransformer [id#167557]
+- ^(11333) FilterExecTransformer ((isnotnull(name#167558) AND isnotnull(name#167558.middle)) AND (name#167558.last = Jones))
+- ^(11333) BatchScanExecTransformer[id#167557, name#167558] ParquetScan DataFilters: [isnotnull(name#167558), isnotnull(name#167558.middle), (name#167558.last = Jones)], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-8cd40a91-8a66-47ca-aea7-c50bf9bc8a55/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.middle), EqualTo(name.last,Jones)], ReadSchema: struct<id:int,name:struct<middle:string,last:string>>, PushedFilters: [IsNotNull(name.middle), EqualTo(name.last,Jones)] RuntimeFilters: []
+- == Initial Plan ==
HashAggregate(keys=[], functions=[count(id#167557)], output=[count(id)#167602L])
+- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=947207]
+- HashAggregate(keys=[], functions=[partial_count(id#167557)], output=[count#167611L])
+- Project [id#167557]
+- Filter ((isnotnull(name#167558) AND isnotnull(name#167558.middle)) AND (name#167558.last = Jones))
+- BatchScan[id#167557, name#167558] ParquetScan DataFilters: [isnotnull(name#167558), isnotnull(name#167558.middle), (name#167558.last = Jones)], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-8cd40a91-8a66-47ca-aea7-c50bf9bc8a55/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.middle), EqualTo(name.last,Jones)], ReadSchema: struct<id:int,name:struct<middle:string,last:string>>, PushedFilters: [IsNotNull(name.middle), EqualTo(name.last,Jones)] RuntimeFilters: []
== Results ==
!== Correct Answer - 1 ==   == Spark Answer - 1 ==
!struct<>                   struct<count(id):bigint>
![0]                        [2]
GlutenParquetV2SchemaPruningSuite.Non-vectorized reader - with partition data column - select one complex field and having is null predicate on another complex field:
org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV2SchemaPruningSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:
== Parsed Logical Plan ==
'Project [unresolvedalias(count('id), Some(org.apache.spark.sql.Column$$Lambda$13390/1121076727@2e6a445))]
+- Filter (last#167720 = Jones)
+- Project [id#167682, name#167683.first AS first#167718, name#167683.middle AS middle#167719, name#167683.last AS last#167720]
+- Filter isnotnull(name#167683.middle)
+- Project [id#167682, name#167683, address#167684, pets#167685, friends#167686, relatives#167687, employer#167688, relations#167689, p#167690]
+- SubqueryAlias contacts
+- View (`contacts`, [id#167682,name#167683,address#167684,pets#167685,friends#167686,relatives#167687,employer#167688,relations#167689,p#167690])
+- RelationV2[id#167682, name#167683, address#167684, pets#167685, friends#167686, relatives#167687, employer#167688, relations#167689, p#167690] parquet file:/tmp/spark-35ba832c-e968-4532-b9c5-9756e9b682bc/contacts
== Analyzed Logical Plan ==
count(id): bigint
Aggregate [count(id#167682) AS count(id)#167727L]
+- Filter (last#167720 = Jones)
+- Project [id#167682, name#167683.first AS first#167718, name#167683.middle AS middle#167719, name#167683.last AS last#167720]
+- Filter isnotnull(name#167683.middle)
+- Project [id#167682, name#167683, address#167684, pets#167685, friends#167686, relatives#167687, employer#167688, relations#167689, p#167690]
+- SubqueryAlias contacts
+- View (`contacts`, [id#167682,name#167683,address#167684,pets#167685,friends#167686,relatives#167687,employer#167688,relations#167689,p#167690])
+- RelationV2[id#167682, name#167683, address#167684, pets#167685, friends#167686, relatives#167687, employer#167688, relations#167689, p#167690] parquet file:/tmp/spark-35ba832c-e968-4532-b9c5-9756e9b682bc/contacts
== Optimized Logical Plan ==
Aggregate [count(id#167682) AS count(id)#167727L]
+- Project [id#167682]
+- Filter ((isnotnull(name#167683) AND isnotnull(name#167683.middle)) AND (name#167683.last = Jones))
+- RelationV2[id#167682, name#167683] parquet file:/tmp/spark-35ba832c-e968-4532-b9c5-9756e9b682bc/contacts
== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=true
+- == Final Plan ==
VeloxColumnarToRow
+- ^(11338) HashAggregateTransformer(keys=[], functions=[count(id#167682)], isStreamingAgg=false, output=[count(id)#167727L])
+- ^(11338) InputIteratorTransformer[count#167736L]
+- ShuffleQueryStage 0
+- ColumnarExchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=947461], [shuffle_writer_type=hash], [OUTPUT] List(count:LongType), [OUTPUT] List(count:LongType)
+- VeloxResizeBatches 1024, 2147483647
+- ^(11337) FlushableHashAggregateTransformer(keys=[], functions=[partial_count(id#167682)], isStreamingAgg=false, output=[count#167736L])
+- ^(11337) ProjectExecTransformer [id#167682]
+- ^(11337) FilterExecTransformer ((isnotnull(name#167683) AND isnotnull(name#167683.middle)) AND (name#167683.last = Jones))
+- ^(11337) BatchScanExecTransformer[id#167682, name#167683] ParquetScan DataFilters: [isnotnull(name#167683), isnotnull(name#167683.middle), (name#167683.last = Jones)], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-35ba832c-e968-4532-b9c5-9756e9b682bc/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.middle), EqualTo(name.last,Jones)], ReadSchema: struct<id:int,name:struct<middle:string,last:string>>, PushedFilters: [IsNotNull(name.middle), EqualTo(name.last,Jones)] RuntimeFilters: []
+- == Initial Plan ==
HashAggregate(keys=[], functions=[count(id#167682)], output=[count(id)#167727L])
+- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=947424]
+- HashAggregate(keys=[], functions=[partial_count(id#167682)], output=[count#167736L])
+- Project [id#167682]
+- Filter ((isnotnull(name#167683) AND isnotnull(name#167683.middle)) AND (name#167683.last = Jones))
+- BatchScan[id#167682, name#167683] ParquetScan DataFilters: [isnotnull(name#167683), isnotnull(name#167683.middle), (name#167683.last = Jones)], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-35ba832c-e968-4532-b9c5-9756e9b682bc/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.middle), EqualTo(name.last,Jones)], ReadSchema: struct<id:int,name:struct<middle:string,last:string>>, PushedFilters: [IsNotNull(name.middle), EqualTo(name.last,Jones)] RuntimeFilters: []
== Results ==
!== Correct Answer - 1 ==   == Spark Answer - 1 ==
!struct<>                   struct<count(id):bigint>
![0]                        [2]
GlutenParquetV2SchemaPruningSuite.Spark vectorized reader - without partition data column - SPARK-34963: extract case-insensitive struct field from struct:
org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV2SchemaPruningSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:
== Parsed Logical Plan ==
'Project ['Name.First, 'NAME.MiDDle]
+- Filter isnotnull(Name#180206.MIDDLE)
+- SubqueryAlias contacts
+- View (`contacts`, [id#180205,name#180206,address#180207,pets#180208,friends#180209,relatives#180210,employer#180211,relations#180212,p#180213])
+- RelationV2[id#180205, name#180206, address#180207, pets#180208, friends#180209, relatives#180210, employer#180211, relations#180212, p#180213] parquet file:/tmp/spark-03185a12-9972-4426-a7ee-e05b879be6b5/contacts
== Analyzed Logical Plan ==
First: string, MiDDle: string
Project [Name#180206.First AS First#180266, NAME#180206.MiDDle AS MiDDle#180267]
+- Filter isnotnull(Name#180206.MIDDLE)
+- SubqueryAlias contacts
+- View (`contacts`, [id#180205,name#180206,address#180207,pets#180208,friends#180209,relatives#180210,employer#180211,relations#180212,p#180213])
+- RelationV2[id#180205, name#180206, address#180207, pets#180208, friends#180209, relatives#180210, employer#180211, relations#180212, p#180213] parquet file:/tmp/spark-03185a12-9972-4426-a7ee-e05b879be6b5/contacts
== Optimized Logical Plan ==
Project [name#180206.first AS First#180266, name#180206.middle AS MiDDle#180267]
+- Filter (isnotnull(name#180206) AND isnotnull(name#180206.middle))
+- RelationV2[name#180206] parquet file:/tmp/spark-03185a12-9972-4426-a7ee-e05b879be6b5/contacts
== Physical Plan ==
VeloxColumnarToRow
+- ^(11937) ProjectExecTransformer [name#180206.first AS First#180266, name#180206.middle AS MiDDle#180267]
+- ^(11937) FilterExecTransformer (isnotnull(name#180206) AND isnotnull(name#180206.middle))
+- ^(11937) BatchScanExecTransformer[name#180206] ParquetScan DataFilters: [isnotnull(name#180206), isnotnull(name#180206.middle)], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-03185a12-9972-4426-a7ee-e05b879be6b5/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.middle)], ReadSchema: struct<name:struct<first:string,middle:string>>, PushedFilters: [IsNotNull(name.middle)] RuntimeFilters: []
== Results ==
!== Correct Answer - 2 ==   == Spark Answer - 4 ==
!struct<>                   struct<First:string,MiDDle:string>
 [Jane,X.]                  [Jane,X.]
![John,Y.]                  [Janet,null]
!                           [Jim,null]
!                           [John,Y.]
GlutenParquetV2SchemaPruningSuite.Spark vectorized reader - with partition data column - SPARK-34963: extract case-insensitive struct field from struct:
org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV2SchemaPruningSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:
== Parsed Logical Plan ==
'Project ['Name.First, 'NAME.MiDDle]
+- Filter isnotnull(Name#180338.MIDDLE)
+- SubqueryAlias contacts
+- View (`contacts`, [id#180337,name#180338,address#180339,pets#180340,friends#180341,relatives#180342,employer#180343,relations#180344,p#180345])
+- RelationV2[id#180337, name#180338, address#180339, pets#180340, friends#180341, relatives#180342, employer#180343, relations#180344, p#180345] parquet file:/tmp/spark-e78cdbc1-9327-4c82-abbe-be1781e5c94b/contacts
== Analyzed Logical Plan ==
First: string, MiDDle: string
Project [Name#180338.First AS First#180398, NAME#180338.MiDDle AS MiDDle#180399]
+- Filter isnotnull(Name#180338.MIDDLE)
+- SubqueryAlias contacts
+- View (`contacts`, [id#180337,name#180338,address#180339,pets#180340,friends#180341,relatives#180342,employer#180343,relations#180344,p#180345])
+- RelationV2[id#180337, name#180338, address#180339, pets#180340, friends#180341, relatives#180342, employer#180343, relations#180344, p#180345] parquet file:/tmp/spark-e78cdbc1-9327-4c82-abbe-be1781e5c94b/contacts
== Optimized Logical Plan ==
Project [name#180338.first AS First#180398, name#180338.middle AS MiDDle#180399]
+- Filter (isnotnull(name#180338) AND isnotnull(name#180338.middle))
+- RelationV2[name#180338] parquet file:/tmp/spark-e78cdbc1-9327-4c82-abbe-be1781e5c94b/contacts
== Physical Plan ==
VeloxColumnarToRow
+- ^(11941) ProjectExecTransformer [name#180338.first AS First#180398, name#180338.middle AS MiDDle#180399]
+- ^(11941) FilterExecTransformer (isnotnull(name#180338) AND isnotnull(name#180338.middle))
+- ^(11941) BatchScanExecTransformer[name#180338] ParquetScan DataFilters: [isnotnull(name#180338), isnotnull(name#180338.middle)], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-e78cdbc1-9327-4c82-abbe-be1781e5c94b/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.middle)], ReadSchema: struct<name:struct<first:string,middle:string>>, PushedFilters: [IsNotNull(name.middle)] RuntimeFilters: []
== Results ==
!== Correct Answer - 2 ==   == Spark Answer - 4 ==
!struct<>                   struct<First:string,MiDDle:string>
 [Jane,X.]                  [Jane,X.]
![John,Y.]                  [Janet,null]
!                           [Jim,null]
!                           [John,Y.]
GlutenParquetV2SchemaPruningSuite.Non-vectorized reader - without partition data column - SPARK-34963: extract case-insensitive struct field from struct:
org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV2SchemaPruningSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:
== Parsed Logical Plan ==
'Project ['Name.First, 'NAME.MiDDle]
+- Filter isnotnull(Name#180464.MIDDLE)
+- SubqueryAlias contacts
+- View (`contacts`, [id#180463,name#180464,address#180465,pets#180466,friends#180467,relatives#180468,employer#180469,relations#180470,p#180471])
+- RelationV2[id#180463, name#180464, address#180465, pets#180466, friends#180467, relatives#180468, employer#180469, relations#180470, p#180471] parquet file:/tmp/spark-bbe6f7a6-e4f7-4cba-bb80-5e16922d111c/contacts
== Analyzed Logical Plan ==
First: string, MiDDle: string
Project [Name#180464.First AS First#180524, NAME#180464.MiDDle AS MiDDle#180525]
+- Filter isnotnull(Name#180464.MIDDLE)
+- SubqueryAlias contacts
+- View (`contacts`, [id#180463,name#180464,address#180465,pets#180466,friends#180467,relatives#180468,employer#180469,relations#180470,p#180471])
+- RelationV2[id#180463, name#180464, address#180465, pets#180466, friends#180467, relatives#180468, employer#180469, relations#180470, p#180471] parquet file:/tmp/spark-bbe6f7a6-e4f7-4cba-bb80-5e16922d111c/contacts
== Optimized Logical Plan ==
Project [name#180464.first AS First#180524, name#180464.middle AS MiDDle#180525]
+- Filter (isnotnull(name#180464) AND isnotnull(name#180464.middle))
+- RelationV2[name#180464] parquet file:/tmp/spark-bbe6f7a6-e4f7-4cba-bb80-5e16922d111c/contacts
== Physical Plan ==
VeloxColumnarToRow
+- ^(11945) ProjectExecTransformer [name#180464.first AS First#180524, name#180464.middle AS MiDDle#180525]
+- ^(11945) FilterExecTransformer (isnotnull(name#180464) AND isnotnull(name#180464.middle))
+- ^(11945) BatchScanExecTransformer[name#180464] ParquetScan DataFilters: [isnotnull(name#180464), isnotnull(name#180464.middle)], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-bbe6f7a6-e4f7-4cba-bb80-5e16922d111c/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.middle)], ReadSchema: struct<name:struct<first:string,middle:string>>, PushedFilters: [IsNotNull(name.middle)] RuntimeFilters: []
== Results ==
!== Correct Answer - 2 == == Spark Answer - 4 ==
!struct<> struct<First:string,MiDDle:string>
[Jane,X.] [Jane,X.]
![John,Y.] [Janet,null]
! [Jim,null]
! [John,Y.]
|
GlutenParquetV2SchemaPruningSuite.Non-vectorized reader - with partition data column - SPARK-34963: extract case-insensitive struct field from struct:
org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV2SchemaPruningSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:
== Parsed Logical Plan ==
'Project ['Name.First, 'NAME.MiDDle]
+- Filter isnotnull(Name#180596.MIDDLE)
+- SubqueryAlias contacts
+- View (`contacts`, [id#180595,name#180596,address#180597,pets#180598,friends#180599,relatives#180600,employer#180601,relations#180602,p#180603])
+- RelationV2[id#180595, name#180596, address#180597, pets#180598, friends#180599, relatives#180600, employer#180601, relations#180602, p#180603] parquet file:/tmp/spark-262243fc-70a9-4771-8878-219faae974b9/contacts
== Analyzed Logical Plan ==
First: string, MiDDle: string
Project [Name#180596.First AS First#180656, NAME#180596.MiDDle AS MiDDle#180657]
+- Filter isnotnull(Name#180596.MIDDLE)
+- SubqueryAlias contacts
+- View (`contacts`, [id#180595,name#180596,address#180597,pets#180598,friends#180599,relatives#180600,employer#180601,relations#180602,p#180603])
+- RelationV2[id#180595, name#180596, address#180597, pets#180598, friends#180599, relatives#180600, employer#180601, relations#180602, p#180603] parquet file:/tmp/spark-262243fc-70a9-4771-8878-219faae974b9/contacts
== Optimized Logical Plan ==
Project [name#180596.first AS First#180656, name#180596.middle AS MiDDle#180657]
+- Filter (isnotnull(name#180596) AND isnotnull(name#180596.middle))
+- RelationV2[name#180596] parquet file:/tmp/spark-262243fc-70a9-4771-8878-219faae974b9/contacts
== Physical Plan ==
VeloxColumnarToRow
+- ^(11949) ProjectExecTransformer [name#180596.first AS First#180656, name#180596.middle AS MiDDle#180657]
+- ^(11949) FilterExecTransformer (isnotnull(name#180596) AND isnotnull(name#180596.middle))
+- ^(11949) BatchScanExecTransformer[name#180596] ParquetScan DataFilters: [isnotnull(name#180596), isnotnull(name#180596.middle)], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-262243fc-70a9-4771-8878-219faae974b9/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.middle)], ReadSchema: struct<name:struct<first:string,middle:string>>, PushedFilters: [IsNotNull(name.middle)] RuntimeFilters: []
== Results ==
!== Correct Answer - 2 == == Spark Answer - 4 ==
!struct<> struct<First:string,MiDDle:string>
[Jane,X.] [Jane,X.]
![John,Y.] [Janet,null]
! [Jim,null]
! [John,Y.]
|
GlutenParquetV1SchemaPruningSuite.Spark vectorized reader - without partition data column - select one complex field and having is null predicate on another complex field:
org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV1SchemaPruningSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:
== Parsed Logical Plan ==
'Project [unresolvedalias(count('id), Some(org.apache.spark.sql.Column$$Lambda$14479/1380631591@2a624e19))]
+- Filter (last#294783 = Jones)
+- Project [id#294745, name#294746.first AS first#294781, name#294746.middle AS middle#294782, name#294746.last AS last#294783]
+- Filter isnotnull(name#294746.middle)
+- Project [id#294745, name#294746, address#294747, pets#294748, friends#294749, relatives#294750, employer#294751, relations#294752, p#294753]
+- SubqueryAlias contacts
+- View (`contacts`, [id#294745,name#294746,address#294747,pets#294748,friends#294749,relatives#294750,employer#294751,relations#294752,p#294753])
+- Relation [id#294745,name#294746,address#294747,pets#294748,friends#294749,relatives#294750,employer#294751,relations#294752,p#294753] parquet
== Analyzed Logical Plan ==
count(id): bigint
Aggregate [count(id#294745) AS count(id)#294790L]
+- Filter (last#294783 = Jones)
+- Project [id#294745, name#294746.first AS first#294781, name#294746.middle AS middle#294782, name#294746.last AS last#294783]
+- Filter isnotnull(name#294746.middle)
+- Project [id#294745, name#294746, address#294747, pets#294748, friends#294749, relatives#294750, employer#294751, relations#294752, p#294753]
+- SubqueryAlias contacts
+- View (`contacts`, [id#294745,name#294746,address#294747,pets#294748,friends#294749,relatives#294750,employer#294751,relations#294752,p#294753])
+- Relation [id#294745,name#294746,address#294747,pets#294748,friends#294749,relatives#294750,employer#294751,relations#294752,p#294753] parquet
== Optimized Logical Plan ==
Aggregate [count(id#294745) AS count(id)#294790L]
+- Project [id#294745]
+- Filter ((isnotnull(name#294746.last) AND isnotnull(name#294746.middle)) AND (name#294746.last = Jones))
+- Relation [id#294745,name#294746,address#294747,pets#294748,friends#294749,relatives#294750,employer#294751,relations#294752,p#294753] parquet
== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=true
+- == Final Plan ==
VeloxColumnarToRow
+- ^(22087) HashAggregateTransformer(keys=[], functions=[count(id#294745)], isStreamingAgg=false, output=[count(id)#294790L])
+- ^(22087) InputIteratorTransformer[count#294802L]
+- ShuffleQueryStage 0
+- ColumnarExchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=1406819], [shuffle_writer_type=hash], [OUTPUT] List(count:LongType), [OUTPUT] List(count:LongType)
+- VeloxResizeBatches 1024, 2147483647
+- ^(22086) FlushableHashAggregateTransformer(keys=[], functions=[partial_count(id#294745)], isStreamingAgg=false, output=[count#294802L])
+- ^(22086) ProjectExecTransformer [id#294745]
+- ^(22086) FilterExecTransformer ((isnotnull(name#294746.last) AND isnotnull(name#294746.middle)) AND (name#294746.last = Jones))
+- ^(22086) FileScanTransformer parquet [id#294745,name#294746,p#294753] Batched: true, DataFilters: [isnotnull(name#294746.last), isnotnull(name#294746.middle), (name#294746.last = Jones)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-995b461e-177d-46c7-8b13-543fabdd7765/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], ReadSchema: struct<id:int,name:struct<middle:string,last:string>>
+- == Initial Plan ==
HashAggregate(keys=[], functions=[count(id#294745)], output=[count(id)#294790L])
+- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=1406780]
+- HashAggregate(keys=[], functions=[partial_count(id#294745)], output=[count#294802L])
+- Project [id#294745]
+- Filter ((isnotnull(name#294746.last) AND isnotnull(name#294746.middle)) AND (name#294746.last = Jones))
+- FileScan parquet [id#294745,name#294746,p#294753] Batched: false, DataFilters: [isnotnull(name#294746.last), isnotnull(name#294746.middle), (name#294746.last = Jones)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-995b461e-177d-46c7-8b13-543fabdd7765/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], ReadSchema: struct<id:int,name:struct<middle:string,last:string>>
== Results ==
!== Correct Answer - 1 == == Spark Answer - 1 ==
!struct<> struct<count(id):bigint>
![0] [2]
|
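Note: the "is null predicate on another complex field" failures exercise the same push-down path through an aggregate. A sketch reconstructed from the parsed plan above (an approximation of the suite's query, again assuming the contacts view and a SparkSession named spark):

import org.apache.spark.sql.functions.count

// Sketch only. The expected count is 0: in this suite's test data both
// "Jones" contacts have a null middle name, so isnotnull(name.middle)
// should eliminate them before the count.
val q = spark.table("contacts")
  .where("name.middle IS NOT NULL")
  .selectExpr("id", "name.first AS first", "name.middle AS middle",
    "name.last AS last")
  .where("last = 'Jones'")
  .agg(count("id"))
q.show()
// The failing runs report 2, even though isnotnull(name.middle) appears
// both in PushedFilters and in the post-scan FilterExecTransformer above.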
GlutenParquetV1SchemaPruningSuite.Spark vectorized reader - with partition data column - select one complex field and having is null predicate on another complex field:
org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV1SchemaPruningSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:
== Parsed Logical Plan ==
'Project [unresolvedalias(count('id), Some(org.apache.spark.sql.Column$$Lambda$14479/1380631591@2a624e19))]
+- Filter (last#294908 = Jones)
+- Project [id#294870, name#294871.first AS first#294906, name#294871.middle AS middle#294907, name#294871.last AS last#294908]
+- Filter isnotnull(name#294871.middle)
+- Project [id#294870, name#294871, address#294872, pets#294873, friends#294874, relatives#294875, employer#294876, relations#294877, p#294878]
+- SubqueryAlias contacts
+- View (`contacts`, [id#294870,name#294871,address#294872,pets#294873,friends#294874,relatives#294875,employer#294876,relations#294877,p#294878])
+- Relation [id#294870,name#294871,address#294872,pets#294873,friends#294874,relatives#294875,employer#294876,relations#294877,p#294878] parquet
== Analyzed Logical Plan ==
count(id): bigint
Aggregate [count(id#294870) AS count(id)#294915L]
+- Filter (last#294908 = Jones)
+- Project [id#294870, name#294871.first AS first#294906, name#294871.middle AS middle#294907, name#294871.last AS last#294908]
+- Filter isnotnull(name#294871.middle)
+- Project [id#294870, name#294871, address#294872, pets#294873, friends#294874, relatives#294875, employer#294876, relations#294877, p#294878]
+- SubqueryAlias contacts
+- View (`contacts`, [id#294870,name#294871,address#294872,pets#294873,friends#294874,relatives#294875,employer#294876,relations#294877,p#294878])
+- Relation [id#294870,name#294871,address#294872,pets#294873,friends#294874,relatives#294875,employer#294876,relations#294877,p#294878] parquet
== Optimized Logical Plan ==
Aggregate [count(id#294870) AS count(id)#294915L]
+- Project [id#294870]
+- Filter ((isnotnull(name#294871.last) AND isnotnull(name#294871.middle)) AND (name#294871.last = Jones))
+- Relation [id#294870,name#294871,address#294872,pets#294873,friends#294874,relatives#294875,employer#294876,relations#294877,p#294878] parquet
== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=true
+- == Final Plan ==
VeloxColumnarToRow
+- ^(22091) HashAggregateTransformer(keys=[], functions=[count(id#294870)], isStreamingAgg=false, output=[count(id)#294915L])
+- ^(22091) InputIteratorTransformer[count#294927L]
+- ShuffleQueryStage 0
+- ColumnarExchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=1407042], [shuffle_writer_type=hash], [OUTPUT] List(count:LongType), [OUTPUT] List(count:LongType)
+- VeloxResizeBatches 1024, 2147483647
+- ^(22090) FlushableHashAggregateTransformer(keys=[], functions=[partial_count(id#294870)], isStreamingAgg=false, output=[count#294927L])
+- ^(22090) ProjectExecTransformer [id#294870]
+- ^(22090) FilterExecTransformer ((isnotnull(name#294871.last) AND isnotnull(name#294871.middle)) AND (name#294871.last = Jones))
+- ^(22090) FileScanTransformer parquet [id#294870,name#294871,p#294878] Batched: true, DataFilters: [isnotnull(name#294871.last), isnotnull(name#294871.middle), (name#294871.last = Jones)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-75e220a9-640a-497c-a66a-39780a68c92d/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], ReadSchema: struct<id:int,name:struct<middle:string,last:string>>
+- == Initial Plan ==
HashAggregate(keys=[], functions=[count(id#294870)], output=[count(id)#294915L])
+- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=1407003]
+- HashAggregate(keys=[], functions=[partial_count(id#294870)], output=[count#294927L])
+- Project [id#294870]
+- Filter ((isnotnull(name#294871.last) AND isnotnull(name#294871.middle)) AND (name#294871.last = Jones))
+- FileScan parquet [id#294870,name#294871,p#294878] Batched: false, DataFilters: [isnotnull(name#294871.last), isnotnull(name#294871.middle), (name#294871.last = Jones)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-75e220a9-640a-497c-a66a-39780a68c92d/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], ReadSchema: struct<id:int,name:struct<middle:string,last:string>>
== Results ==
!== Correct Answer - 1 == == Spark Answer - 1 ==
!struct<> struct<count(id):bigint>
![0] [2]
|
GlutenParquetV1SchemaPruningSuite.Non-vectorized reader - without partition data column - select one complex field and having is null predicate on another complex field:
org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV1SchemaPruningSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:
== Parsed Logical Plan ==
'Project [unresolvedalias(count('id), Some(org.apache.spark.sql.Column$$Lambda$14479/1380631591@2a624e19))]
+- Filter (last#295027 = Jones)
+- Project [id#294989, name#294990.first AS first#295025, name#294990.middle AS middle#295026, name#294990.last AS last#295027]
+- Filter isnotnull(name#294990.middle)
+- Project [id#294989, name#294990, address#294991, pets#294992, friends#294993, relatives#294994, employer#294995, relations#294996, p#294997]
+- SubqueryAlias contacts
+- View (`contacts`, [id#294989,name#294990,address#294991,pets#294992,friends#294993,relatives#294994,employer#294995,relations#294996,p#294997])
+- Relation [id#294989,name#294990,address#294991,pets#294992,friends#294993,relatives#294994,employer#294995,relations#294996,p#294997] parquet
== Analyzed Logical Plan ==
count(id): bigint
Aggregate [count(id#294989) AS count(id)#295034L]
+- Filter (last#295027 = Jones)
+- Project [id#294989, name#294990.first AS first#295025, name#294990.middle AS middle#295026, name#294990.last AS last#295027]
+- Filter isnotnull(name#294990.middle)
+- Project [id#294989, name#294990, address#294991, pets#294992, friends#294993, relatives#294994, employer#294995, relations#294996, p#294997]
+- SubqueryAlias contacts
+- View (`contacts`, [id#294989,name#294990,address#294991,pets#294992,friends#294993,relatives#294994,employer#294995,relations#294996,p#294997])
+- Relation [id#294989,name#294990,address#294991,pets#294992,friends#294993,relatives#294994,employer#294995,relations#294996,p#294997] parquet
== Optimized Logical Plan ==
Aggregate [count(id#294989) AS count(id)#295034L]
+- Project [id#294989]
+- Filter ((isnotnull(name#294990.last) AND isnotnull(name#294990.middle)) AND (name#294990.last = Jones))
+- Relation [id#294989,name#294990,address#294991,pets#294992,friends#294993,relatives#294994,employer#294995,relations#294996,p#294997] parquet
== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=true
+- == Final Plan ==
VeloxColumnarToRow
+- ^(22095) HashAggregateTransformer(keys=[], functions=[count(id#294989)], isStreamingAgg=false, output=[count(id)#295034L])
+- ^(22095) InputIteratorTransformer[count#295046L]
+- ShuffleQueryStage 0
+- ColumnarExchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=1407265], [shuffle_writer_type=hash], [OUTPUT] List(count:LongType), [OUTPUT] List(count:LongType)
+- VeloxResizeBatches 1024, 2147483647
+- ^(22094) FlushableHashAggregateTransformer(keys=[], functions=[partial_count(id#294989)], isStreamingAgg=false, output=[count#295046L])
+- ^(22094) ProjectExecTransformer [id#294989]
+- ^(22094) FilterExecTransformer ((isnotnull(name#294990.last) AND isnotnull(name#294990.middle)) AND (name#294990.last = Jones))
+- ^(22094) FileScanTransformer parquet [id#294989,name#294990,p#294997] Batched: true, DataFilters: [isnotnull(name#294990.last), isnotnull(name#294990.middle), (name#294990.last = Jones)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-2cb1ddc6-9744-4444-ad36-a48219ee06c8/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], ReadSchema: struct<id:int,name:struct<middle:string,last:string>>
+- == Initial Plan ==
HashAggregate(keys=[], functions=[count(id#294989)], output=[count(id)#295034L])
+- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=1407226]
+- HashAggregate(keys=[], functions=[partial_count(id#294989)], output=[count#295046L])
+- Project [id#294989]
+- Filter ((isnotnull(name#294990.last) AND isnotnull(name#294990.middle)) AND (name#294990.last = Jones))
+- FileScan parquet [id#294989,name#294990,p#294997] Batched: false, DataFilters: [isnotnull(name#294990.last), isnotnull(name#294990.middle), (name#294990.last = Jones)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-2cb1ddc6-9744-4444-ad36-a48219ee06c8/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], ReadSchema: struct<id:int,name:struct<middle:string,last:string>>
== Results ==
!== Correct Answer - 1 == == Spark Answer - 1 ==
!struct<> struct<count(id):bigint>
![0] [2]
|
GlutenParquetV1SchemaPruningSuite.Non-vectorized reader - with partition data column - select one complex field and having is null predicate on another complex field:
org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV1SchemaPruningSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:
== Parsed Logical Plan ==
'Project [unresolvedalias(count('id), Some(org.apache.spark.sql.Column$$Lambda$14479/1380631591@2a624e19))]
+- Filter (last#295152 = Jones)
+- Project [id#295114, name#295115.first AS first#295150, name#295115.middle AS middle#295151, name#295115.last AS last#295152]
+- Filter isnotnull(name#295115.middle)
+- Project [id#295114, name#295115, address#295116, pets#295117, friends#295118, relatives#295119, employer#295120, relations#295121, p#295122]
+- SubqueryAlias contacts
+- View (`contacts`, [id#295114,name#295115,address#295116,pets#295117,friends#295118,relatives#295119,employer#295120,relations#295121,p#295122])
+- Relation [id#295114,name#295115,address#295116,pets#295117,friends#295118,relatives#295119,employer#295120,relations#295121,p#295122] parquet
== Analyzed Logical Plan ==
count(id): bigint
Aggregate [count(id#295114) AS count(id)#295159L]
+- Filter (last#295152 = Jones)
+- Project [id#295114, name#295115.first AS first#295150, name#295115.middle AS middle#295151, name#295115.last AS last#295152]
+- Filter isnotnull(name#295115.middle)
+- Project [id#295114, name#295115, address#295116, pets#295117, friends#295118, relatives#295119, employer#295120, relations#295121, p#295122]
+- SubqueryAlias contacts
+- View (`contacts`, [id#295114,name#295115,address#295116,pets#295117,friends#295118,relatives#295119,employer#295120,relations#295121,p#295122])
+- Relation [id#295114,name#295115,address#295116,pets#295117,friends#295118,relatives#295119,employer#295120,relations#295121,p#295122] parquet
== Optimized Logical Plan ==
Aggregate [count(id#295114) AS count(id)#295159L]
+- Project [id#295114]
+- Filter ((isnotnull(name#295115.last) AND isnotnull(name#295115.middle)) AND (name#295115.last = Jones))
+- Relation [id#295114,name#295115,address#295116,pets#295117,friends#295118,relatives#295119,employer#295120,relations#295121,p#295122] parquet
== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=true
+- == Final Plan ==
VeloxColumnarToRow
+- ^(22099) HashAggregateTransformer(keys=[], functions=[count(id#295114)], isStreamingAgg=false, output=[count(id)#295159L])
+- ^(22099) InputIteratorTransformer[count#295171L]
+- ShuffleQueryStage 0
+- ColumnarExchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=1407488], [shuffle_writer_type=hash], [OUTPUT] List(count:LongType), [OUTPUT] List(count:LongType)
+- VeloxResizeBatches 1024, 2147483647
+- ^(22098) FlushableHashAggregateTransformer(keys=[], functions=[partial_count(id#295114)], isStreamingAgg=false, output=[count#295171L])
+- ^(22098) ProjectExecTransformer [id#295114]
+- ^(22098) FilterExecTransformer ((isnotnull(name#295115.last) AND isnotnull(name#295115.middle)) AND (name#295115.last = Jones))
+- ^(22098) FileScanTransformer parquet [id#295114,name#295115,p#295122] Batched: true, DataFilters: [isnotnull(name#295115.last), isnotnull(name#295115.middle), (name#295115.last = Jones)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-302d6025-9493-4565-b471-71e149508b29/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], ReadSchema: struct<id:int,name:struct<middle:string,last:string>>
+- == Initial Plan ==
HashAggregate(keys=[], functions=[count(id#295114)], output=[count(id)#295159L])
+- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=1407449]
+- HashAggregate(keys=[], functions=[partial_count(id#295114)], output=[count#295171L])
+- Project [id#295114]
+- Filter ((isnotnull(name#295115.last) AND isnotnull(name#295115.middle)) AND (name#295115.last = Jones))
+- FileScan parquet [id#295114,name#295115,p#295122] Batched: false, DataFilters: [isnotnull(name#295115.last), isnotnull(name#295115.middle), (name#295115.last = Jones)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-302d6025-9493-4565-b471-71e149508b29/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], ReadSchema: struct<id:int,name:struct<middle:string,last:string>>
== Results ==
!== Correct Answer - 1 == == Spark Answer - 1 ==
!struct<> struct<count(id):bigint>
![0] [2]
|
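Note: the PushedFilters entries in these plans are instances of Spark's source filter API, in which a nested field is encoded as a dot-separated path. A sketch of what the parquet scan receives for the count query (the filter values are copied from the plans above; the API is standard Spark, the usage is illustrative):

import org.apache.spark.sql.sources.{EqualTo, Filter, IsNotNull}

// Sketch only. Dotted paths such as "name.middle" are produced when the
// file format is listed in
// spark.sql.optimizer.nestedPredicatePushdown.supportedFileSources
// ("parquet,orc" by default in Spark 3.x).
val pushed: Seq[Filter] = Seq(
  IsNotNull("name.last"),
  IsNotNull("name.middle"),
  EqualTo("name.last", "Jones"))

pushed.foreach(f => println(f.references.mkString(", ")))
// A scan that resolves only top-level attribute names can drop such a
// predicate or bind it to the wrong field; either is consistent with the
// wrong answers recorded above.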
GlutenParquetV1SchemaPruningSuite.Spark vectorized reader - without partition data column - SPARK-34963: extract case-insensitive struct field from struct:
org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV1SchemaPruningSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:
== Parsed Logical Plan ==
'Project ['Name.First, 'NAME.MiDDle]
+- Filter isnotnull(Name#308179.MIDDLE)
+- SubqueryAlias contacts
+- View (`contacts`, [id#308178,name#308179,address#308180,pets#308181,friends#308182,relatives#308183,employer#308184,relations#308185,p#308186])
+- Relation [id#308178,name#308179,address#308180,pets#308181,friends#308182,relatives#308183,employer#308184,relations#308185,p#308186] parquet
== Analyzed Logical Plan ==
First: string, MiDDle: string
Project [Name#308179.First AS First#308249, NAME#308179.MiDDle AS MiDDle#308250]
+- Filter isnotnull(Name#308179.MIDDLE)
+- SubqueryAlias contacts
+- View (`contacts`, [id#308178,name#308179,address#308180,pets#308181,friends#308182,relatives#308183,employer#308184,relations#308185,p#308186])
+- Relation [id#308178,name#308179,address#308180,pets#308181,friends#308182,relatives#308183,employer#308184,relations#308185,p#308186] parquet
== Optimized Logical Plan ==
Project [name#308179.first AS First#308249, name#308179.middle AS MiDDle#308250]
+- Filter isnotnull(name#308179.middle)
+- Relation [id#308178,name#308179,address#308180,pets#308181,friends#308182,relatives#308183,employer#308184,relations#308185,p#308186] parquet
== Physical Plan ==
VeloxColumnarToRow
+- ^(22719) ProjectExecTransformer [name#308179.first AS First#308249, name#308179.middle AS MiDDle#308250]
+- ^(22719) FilterExecTransformer isnotnull(name#308179.middle)
+- ^(22719) FileScanTransformer parquet [name#308179,p#308186] Batched: true, DataFilters: [isnotnull(name#308179.middle)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-cf832e8f-814e-4e40-923b-347a6798b05c/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.middle)], ReadSchema: struct<name:struct<first:string,middle:string>>
== Results ==
!== Correct Answer - 2 == == Spark Answer - 4 ==
!struct<> struct<First:string,MiDDle:string>
[Jane,X.] [Jane,X.]
![John,Y.] [Janet,null]
! [Jim,null]
! [John,Y.]
|
GlutenParquetV1SchemaPruningSuite.Spark vectorized reader - with partition data column - SPARK-34963: extract case-insensitive struct field from struct:
org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV1SchemaPruningSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:
== Parsed Logical Plan ==
'Project ['Name.First, 'NAME.MiDDle]
+- Filter isnotnull(Name#308327.MIDDLE)
+- SubqueryAlias contacts
+- View (`contacts`, [id#308326,name#308327,address#308328,pets#308329,friends#308330,relatives#308331,employer#308332,relations#308333,p#308334])
+- Relation [id#308326,name#308327,address#308328,pets#308329,friends#308330,relatives#308331,employer#308332,relations#308333,p#308334] parquet
== Analyzed Logical Plan ==
First: string, MiDDle: string
Project [Name#308327.First AS First#308397, NAME#308327.MiDDle AS MiDDle#308398]
+- Filter isnotnull(Name#308327.MIDDLE)
+- SubqueryAlias contacts
+- View (`contacts`, [id#308326,name#308327,address#308328,pets#308329,friends#308330,relatives#308331,employer#308332,relations#308333,p#308334])
+- Relation [id#308326,name#308327,address#308328,pets#308329,friends#308330,relatives#308331,employer#308332,relations#308333,p#308334] parquet
== Optimized Logical Plan ==
Project [name#308327.first AS First#308397, name#308327.middle AS MiDDle#308398]
+- Filter isnotnull(name#308327.middle)
+- Relation [id#308326,name#308327,address#308328,pets#308329,friends#308330,relatives#308331,employer#308332,relations#308333,p#308334] parquet
== Physical Plan ==
VeloxColumnarToRow
+- ^(22723) ProjectExecTransformer [name#308327.first AS First#308397, name#308327.middle AS MiDDle#308398]
+- ^(22723) FilterExecTransformer isnotnull(name#308327.middle)
+- ^(22723) FileScanTransformer parquet [name#308327,p#308334] Batched: true, DataFilters: [isnotnull(name#308327.middle)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-a56ad0a2-4d21-412b-a55f-98d8a3f2854e/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.middle)], ReadSchema: struct<name:struct<first:string,middle:string>>
== Results ==
!== Correct Answer - 2 == == Spark Answer - 4 ==
!struct<> struct<First:string,MiDDle:string>
[Jane,X.] [Jane,X.]
![John,Y.] [Janet,null]
! [Jim,null]
! [John,Y.]
|
GlutenParquetV1SchemaPruningSuite.Non-vectorized reader - without partition data column - SPARK-34963: extract case-insensitive struct field from struct:
org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV1SchemaPruningSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:
== Parsed Logical Plan ==
'Project ['Name.First, 'NAME.MiDDle]
+- Filter isnotnull(Name#308469.MIDDLE)
+- SubqueryAlias contacts
+- View (`contacts`, [id#308468,name#308469,address#308470,pets#308471,friends#308472,relatives#308473,employer#308474,relations#308475,p#308476])
+- Relation [id#308468,name#308469,address#308470,pets#308471,friends#308472,relatives#308473,employer#308474,relations#308475,p#308476] parquet
== Analyzed Logical Plan ==
First: string, MiDDle: string
Project [Name#308469.First AS First#308539, NAME#308469.MiDDle AS MiDDle#308540]
+- Filter isnotnull(Name#308469.MIDDLE)
+- SubqueryAlias contacts
+- View (`contacts`, [id#308468,name#308469,address#308470,pets#308471,friends#308472,relatives#308473,employer#308474,relations#308475,p#308476])
+- Relation [id#308468,name#308469,address#308470,pets#308471,friends#308472,relatives#308473,employer#308474,relations#308475,p#308476] parquet
== Optimized Logical Plan ==
Project [name#308469.first AS First#308539, name#308469.middle AS MiDDle#308540]
+- Filter isnotnull(name#308469.middle)
+- Relation [id#308468,name#308469,address#308470,pets#308471,friends#308472,relatives#308473,employer#308474,relations#308475,p#308476] parquet
== Physical Plan ==
VeloxColumnarToRow
+- ^(22727) ProjectExecTransformer [name#308469.first AS First#308539, name#308469.middle AS MiDDle#308540]
+- ^(22727) FilterExecTransformer isnotnull(name#308469.middle)
+- ^(22727) FileScanTransformer parquet [name#308469,p#308476] Batched: true, DataFilters: [isnotnull(name#308469.middle)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-28bd014b-db8a-4bfa-8c08-5c00f1b0dfc1/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.middle)], ReadSchema: struct<name:struct<first:string,middle:string>>
== Results ==
!== Correct Answer - 2 == == Spark Answer - 4 ==
!struct<> struct<First:string,MiDDle:string>
[Jane,X.] [Jane,X.]
![John,Y.] [Janet,null]
! [Jim,null]
! [John,Y.]
|
GlutenParquetV1SchemaPruningSuite.Non-vectorized reader - with partition data column - SPARK-34963: extract case-insensitive struct field from struct:
org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV1SchemaPruningSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:
== Parsed Logical Plan ==
'Project ['Name.First, 'NAME.MiDDle]
+- Filter isnotnull(Name#308617.MIDDLE)
+- SubqueryAlias contacts
+- View (`contacts`, [id#308616,name#308617,address#308618,pets#308619,friends#308620,relatives#308621,employer#308622,relations#308623,p#308624])
+- Relation [id#308616,name#308617,address#308618,pets#308619,friends#308620,relatives#308621,employer#308622,relations#308623,p#308624] parquet
== Analyzed Logical Plan ==
First: string, MiDDle: string
Project [Name#308617.First AS First#308687, NAME#308617.MiDDle AS MiDDle#308688]
+- Filter isnotnull(Name#308617.MIDDLE)
+- SubqueryAlias contacts
+- View (`contacts`, [id#308616,name#308617,address#308618,pets#308619,friends#308620,relatives#308621,employer#308622,relations#308623,p#308624])
+- Relation [id#308616,name#308617,address#308618,pets#308619,friends#308620,relatives#308621,employer#308622,relations#308623,p#308624] parquet
== Optimized Logical Plan ==
Project [name#308617.first AS First#308687, name#308617.middle AS MiDDle#308688]
+- Filter isnotnull(name#308617.middle)
+- Relation [id#308616,name#308617,address#308618,pets#308619,friends#308620,relatives#308621,employer#308622,relations#308623,p#308624] parquet
== Physical Plan ==
VeloxColumnarToRow
+- ^(22731) ProjectExecTransformer [name#308617.first AS First#308687, name#308617.middle AS MiDDle#308688]
+- ^(22731) FilterExecTransformer isnotnull(name#308617.middle)
+- ^(22731) FileScanTransformer parquet [name#308617,p#308624] Batched: true, DataFilters: [isnotnull(name#308617.middle)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-57377ccd-b794-442b-83a1-55caa0e15016/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.middle)], ReadSchema: struct<name:struct<first:string,middle:string>>
== Results ==
!== Correct Answer - 2 == == Spark Answer - 4 ==
!struct<> struct<First:string,MiDDle:string>
[Jane,X.] [Jane,X.]
![John,Y.] [Janet,null]
! [Jim,null]
! [John,Y.]
|
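Note: the SPARK-34963 variants all run with case-insensitive analysis, so honoring a pushed dotted path requires a case-insensitive field lookup against the file schema. A sketch of that resolution step (an illustrative helper, not Gluten's actual code):

import org.apache.spark.sql.types.{DataType, StructType}

// Sketch only: resolves a dotted filter path such as "name.middle" against
// a file schema under spark.sql.caseSensitive=false. A lookup comparing
// field names with plain equality would miss MiDDle vs middle and lose the
// filter; the 4-row answers above are consistent with the predicate being
// lost somewhere on this path.
def resolveDottedPath(schema: StructType, path: String): Option[DataType] =
  path.split('.').foldLeft(Option(schema: DataType)) {
    case (Some(s: StructType), part) =>
      s.fields.find(_.name.equalsIgnoreCase(part)).map(_.dataType)
    case _ => None
  }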
GlutenParquetV2SchemaPruningSuite.Spark vectorized reader - without partition data column - select one complex field and having is null predicate on another complex field:
org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV2SchemaPruningSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:
== Parsed Logical Plan ==
'Project [unresolvedalias(count('id), Some(org.apache.spark.sql.Column$$Lambda$14479/1380631591@2a624e19))]
+- Filter (last#183293 = Jones)
+- Project [id#183255, name#183256.first AS first#183291, name#183256.middle AS middle#183292, name#183256.last AS last#183293]
+- Filter isnotnull(name#183256.middle)
+- Project [id#183255, name#183256, address#183257, pets#183258, friends#183259, relatives#183260, employer#183261, relations#183262, p#183263]
+- SubqueryAlias contacts
+- View (`contacts`, [id#183255,name#183256,address#183257,pets#183258,friends#183259,relatives#183260,employer#183261,relations#183262,p#183263])
+- RelationV2[id#183255, name#183256, address#183257, pets#183258, friends#183259, relatives#183260, employer#183261, relations#183262, p#183263] parquet file:/tmp/spark-69df9949-8cf2-49ad-ae2c-6c51256f68b1/contacts
== Analyzed Logical Plan ==
count(id): bigint
Aggregate [count(id#183255) AS count(id)#183300L]
+- Filter (last#183293 = Jones)
+- Project [id#183255, name#183256.first AS first#183291, name#183256.middle AS middle#183292, name#183256.last AS last#183293]
+- Filter isnotnull(name#183256.middle)
+- Project [id#183255, name#183256, address#183257, pets#183258, friends#183259, relatives#183260, employer#183261, relations#183262, p#183263]
+- SubqueryAlias contacts
+- View (`contacts`, [id#183255,name#183256,address#183257,pets#183258,friends#183259,relatives#183260,employer#183261,relations#183262,p#183263])
+- RelationV2[id#183255, name#183256, address#183257, pets#183258, friends#183259, relatives#183260, employer#183261, relations#183262, p#183263] parquet file:/tmp/spark-69df9949-8cf2-49ad-ae2c-6c51256f68b1/contacts
== Optimized Logical Plan ==
Aggregate [count(id#183255) AS count(id)#183300L]
+- Project [id#183255]
+- Filter ((isnotnull(name#183256.last) AND isnotnull(name#183256.middle)) AND (name#183256.last = Jones))
+- RelationV2[id#183255, name#183256] parquet file:/tmp/spark-69df9949-8cf2-49ad-ae2c-6c51256f68b1/contacts
== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=true
+- == Final Plan ==
VeloxColumnarToRow
+- ^(11864) HashAggregateTransformer(keys=[], functions=[count(id#183255)], isStreamingAgg=false, output=[count(id)#183300L])
+- ^(11864) InputIteratorTransformer[count#183305L]
+- ShuffleQueryStage 0
+- ColumnarExchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=982225], [shuffle_writer_type=hash], [OUTPUT] List(count:LongType), [OUTPUT] List(count:LongType)
+- VeloxResizeBatches 1024, 2147483647
+- ^(11863) FlushableHashAggregateTransformer(keys=[], functions=[partial_count(id#183255)], isStreamingAgg=false, output=[count#183305L])
+- ^(11863) ProjectExecTransformer [id#183255]
+- ^(11863) FilterExecTransformer ((isnotnull(name#183256.last) AND isnotnull(name#183256.middle)) AND (name#183256.last = Jones))
+- ^(11863) BatchScanExecTransformer[id#183255, name#183256] ParquetScan DataFilters: [isnotnull(name#183256.last), isnotnull(name#183256.middle), (name#183256.last = Jones)], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-69df9949-8cf2-49ad-ae2c-6c51256f68b1/contacts], PartitionFilters: [], PushedAggregation: [], PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], PushedGroupBy: [], ReadSchema: struct<id:int,name:struct<middle:string,last:string>>, PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], PushedAggregation: [], PushedGroupBy: [] RuntimeFilters: []
+- == Initial Plan ==
HashAggregate(keys=[], functions=[count(id#183255)], output=[count(id)#183300L])
+- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=982188]
+- HashAggregate(keys=[], functions=[partial_count(id#183255)], output=[count#183305L])
+- Project [id#183255]
+- Filter ((isnotnull(name#183256.last) AND isnotnull(name#183256.middle)) AND (name#183256.last = Jones))
+- BatchScan[id#183255, name#183256] ParquetScan DataFilters: [isnotnull(name#183256.last), isnotnull(name#183256.middle), (name#183256.last = Jones)], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-69df9949-8cf2-49ad-ae2c-6c51256f68b1/contacts], PartitionFilters: [], PushedAggregation: [], PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], PushedGroupBy: [], ReadSchema: struct<id:int,name:struct<middle:string,last:string>>, PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], PushedAggregation: [], PushedGroupBy: [] RuntimeFilters: []
== Results ==
!== Correct Answer - 1 == == Spark Answer - 1 ==
!struct<> struct<count(id):bigint>
![0] [2]
|
GlutenParquetV2SchemaPruningSuite.Spark vectorized reader - with partition data column - select one complex field and having is null predicate on another complex field:
org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV2SchemaPruningSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:
== Parsed Logical Plan ==
'Project [unresolvedalias(count('id), Some(org.apache.spark.sql.Column$$Lambda$14479/1380631591@2a624e19))]
+- Filter (last#183414 = Jones)
+- Project [id#183376, name#183377.first AS first#183412, name#183377.middle AS middle#183413, name#183377.last AS last#183414]
+- Filter isnotnull(name#183377.middle)
+- Project [id#183376, name#183377, address#183378, pets#183379, friends#183380, relatives#183381, employer#183382, relations#183383, p#183384]
+- SubqueryAlias contacts
+- View (`contacts`, [id#183376,name#183377,address#183378,pets#183379,friends#183380,relatives#183381,employer#183382,relations#183383,p#183384])
+- RelationV2[id#183376, name#183377, address#183378, pets#183379, friends#183380, relatives#183381, employer#183382, relations#183383, p#183384] parquet file:/tmp/spark-125db852-2cde-4a72-a4b4-88dd5251ece4/contacts
== Analyzed Logical Plan ==
count(id): bigint
Aggregate [count(id#183376) AS count(id)#183421L]
+- Filter (last#183414 = Jones)
+- Project [id#183376, name#183377.first AS first#183412, name#183377.middle AS middle#183413, name#183377.last AS last#183414]
+- Filter isnotnull(name#183377.middle)
+- Project [id#183376, name#183377, address#183378, pets#183379, friends#183380, relatives#183381, employer#183382, relations#183383, p#183384]
+- SubqueryAlias contacts
+- View (`contacts`, [id#183376,name#183377,address#183378,pets#183379,friends#183380,relatives#183381,employer#183382,relations#183383,p#183384])
+- RelationV2[id#183376, name#183377, address#183378, pets#183379, friends#183380, relatives#183381, employer#183382, relations#183383, p#183384] parquet file:/tmp/spark-125db852-2cde-4a72-a4b4-88dd5251ece4/contacts
== Optimized Logical Plan ==
Aggregate [count(id#183376) AS count(id)#183421L]
+- Project [id#183376]
+- Filter ((isnotnull(name#183377.last) AND isnotnull(name#183377.middle)) AND (name#183377.last = Jones))
+- RelationV2[id#183376, name#183377] parquet file:/tmp/spark-125db852-2cde-4a72-a4b4-88dd5251ece4/contacts
== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=true
+- == Final Plan ==
VeloxColumnarToRow
+- ^(11868) HashAggregateTransformer(keys=[], functions=[count(id#183376)], isStreamingAgg=false, output=[count(id)#183421L])
+- ^(11868) InputIteratorTransformer[count#183426L]
+- ShuffleQueryStage 0
+- ColumnarExchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=982442], [shuffle_writer_type=hash], [OUTPUT] List(count:LongType), [OUTPUT] List(count:LongType)
+- VeloxResizeBatches 1024, 2147483647
+- ^(11867) FlushableHashAggregateTransformer(keys=[], functions=[partial_count(id#183376)], isStreamingAgg=false, output=[count#183426L])
+- ^(11867) ProjectExecTransformer [id#183376]
+- ^(11867) FilterExecTransformer ((isnotnull(name#183377.last) AND isnotnull(name#183377.middle)) AND (name#183377.last = Jones))
+- ^(11867) BatchScanExecTransformer[id#183376, name#183377] ParquetScan DataFilters: [isnotnull(name#183377.last), isnotnull(name#183377.middle), (name#183377.last = Jones)], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-125db852-2cde-4a72-a4b4-88dd5251ece4/contacts], PartitionFilters: [], PushedAggregation: [], PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], PushedGroupBy: [], ReadSchema: struct<id:int,name:struct<middle:string,last:string>>, PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], PushedAggregation: [], PushedGroupBy: [] RuntimeFilters: []
+- == Initial Plan ==
HashAggregate(keys=[], functions=[count(id#183376)], output=[count(id)#183421L])
+- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=982405]
+- HashAggregate(keys=[], functions=[partial_count(id#183376)], output=[count#183426L])
+- Project [id#183376]
+- Filter ((isnotnull(name#183377.last) AND isnotnull(name#183377.middle)) AND (name#183377.last = Jones))
+- BatchScan[id#183376, name#183377] ParquetScan DataFilters: [isnotnull(name#183377.last), isnotnull(name#183377.middle), (name#183377.last = Jones)], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-125db852-2cde-4a72-a4b4-88dd5251ece4/contacts], PartitionFilters: [], PushedAggregation: [], PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], PushedGroupBy: [], ReadSchema: struct<id:int,name:struct<middle:string,last:string>>, PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], PushedAggregation: [], PushedGroupBy: [] RuntimeFilters: []
== Results ==
!== Correct Answer - 1 == == Spark Answer - 1 ==
!struct<> struct<count(id):bigint>
![0] [2]
|
GlutenParquetV2SchemaPruningSuite.Non-vectorized reader - without partition data column - select one complex field and having is null predicate on another complex field:
org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV2SchemaPruningSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:
== Parsed Logical Plan ==
'Project [unresolvedalias(count('id), Some(org.apache.spark.sql.Column$$Lambda$14479/1380631591@2a624e19))]
+- Filter (last#183529 = Jones)
+- Project [id#183491, name#183492.first AS first#183527, name#183492.middle AS middle#183528, name#183492.last AS last#183529]
+- Filter isnotnull(name#183492.middle)
+- Project [id#183491, name#183492, address#183493, pets#183494, friends#183495, relatives#183496, employer#183497, relations#183498, p#183499]
+- SubqueryAlias contacts
+- View (`contacts`, [id#183491,name#183492,address#183493,pets#183494,friends#183495,relatives#183496,employer#183497,relations#183498,p#183499])
+- RelationV2[id#183491, name#183492, address#183493, pets#183494, friends#183495, relatives#183496, employer#183497, relations#183498, p#183499] parquet file:/tmp/spark-d305494e-58c4-46e5-88d8-e98341bc437e/contacts
== Analyzed Logical Plan ==
count(id): bigint
Aggregate [count(id#183491) AS count(id)#183536L]
+- Filter (last#183529 = Jones)
+- Project [id#183491, name#183492.first AS first#183527, name#183492.middle AS middle#183528, name#183492.last AS last#183529]
+- Filter isnotnull(name#183492.middle)
+- Project [id#183491, name#183492, address#183493, pets#183494, friends#183495, relatives#183496, employer#183497, relations#183498, p#183499]
+- SubqueryAlias contacts
+- View (`contacts`, [id#183491,name#183492,address#183493,pets#183494,friends#183495,relatives#183496,employer#183497,relations#183498,p#183499])
+- RelationV2[id#183491, name#183492, address#183493, pets#183494, friends#183495, relatives#183496, employer#183497, relations#183498, p#183499] parquet file:/tmp/spark-d305494e-58c4-46e5-88d8-e98341bc437e/contacts
== Optimized Logical Plan ==
Aggregate [count(id#183491) AS count(id)#183536L]
+- Project [id#183491]
+- Filter ((isnotnull(name#183492.last) AND isnotnull(name#183492.middle)) AND (name#183492.last = Jones))
+- RelationV2[id#183491, name#183492] parquet file:/tmp/spark-d305494e-58c4-46e5-88d8-e98341bc437e/contacts
== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=true
+- == Final Plan ==
VeloxColumnarToRow
+- ^(11872) HashAggregateTransformer(keys=[], functions=[count(id#183491)], isStreamingAgg=false, output=[count(id)#183536L])
+- ^(11872) InputIteratorTransformer[count#183541L]
+- ShuffleQueryStage 0
+- ColumnarExchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=982659], [shuffle_writer_type=hash], [OUTPUT] List(count:LongType), [OUTPUT] List(count:LongType)
+- VeloxResizeBatches 1024, 2147483647
+- ^(11871) FlushableHashAggregateTransformer(keys=[], functions=[partial_count(id#183491)], isStreamingAgg=false, output=[count#183541L])
+- ^(11871) ProjectExecTransformer [id#183491]
+- ^(11871) FilterExecTransformer ((isnotnull(name#183492.last) AND isnotnull(name#183492.middle)) AND (name#183492.last = Jones))
+- ^(11871) BatchScanExecTransformer[id#183491, name#183492] ParquetScan DataFilters: [isnotnull(name#183492.last), isnotnull(name#183492.middle), (name#183492.last = Jones)], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-d305494e-58c4-46e5-88d8-e98341bc437e/contacts], PartitionFilters: [], PushedAggregation: [], PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], PushedGroupBy: [], ReadSchema: struct<id:int,name:struct<middle:string,last:string>>, PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], PushedAggregation: [], PushedGroupBy: [] RuntimeFilters: []
+- == Initial Plan ==
HashAggregate(keys=[], functions=[count(id#183491)], output=[count(id)#183536L])
+- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=982622]
+- HashAggregate(keys=[], functions=[partial_count(id#183491)], output=[count#183541L])
+- Project [id#183491]
+- Filter ((isnotnull(name#183492.last) AND isnotnull(name#183492.middle)) AND (name#183492.last = Jones))
+- BatchScan[id#183491, name#183492] ParquetScan DataFilters: [isnotnull(name#183492.last), isnotnull(name#183492.middle), (name#183492.last = Jones)], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-d305494e-58c4-46e5-88d8-e98341bc437e/contacts], PartitionFilters: [], PushedAggregation: [], PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], PushedGroupBy: [], ReadSchema: struct<id:int,name:struct<middle:string,last:string>>, PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], PushedAggregation: [], PushedGroupBy: [] RuntimeFilters: []
== Results ==
!== Correct Answer - 1 == == Spark Answer - 1 ==
!struct<> struct<count(id):bigint>
![0] [2]
|
GlutenParquetV2SchemaPruningSuite.Non-vectorized reader - with partition data column - select one complex field and having is null predicate on another complex field:
org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV2SchemaPruningSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:
== Parsed Logical Plan ==
'Project [unresolvedalias(count('id), Some(org.apache.spark.sql.Column$$Lambda$14479/1380631591@2a624e19))]
+- Filter (last#183650 = Jones)
+- Project [id#183612, name#183613.first AS first#183648, name#183613.middle AS middle#183649, name#183613.last AS last#183650]
+- Filter isnotnull(name#183613.middle)
+- Project [id#183612, name#183613, address#183614, pets#183615, friends#183616, relatives#183617, employer#183618, relations#183619, p#183620]
+- SubqueryAlias contacts
+- View (`contacts`, [id#183612,name#183613,address#183614,pets#183615,friends#183616,relatives#183617,employer#183618,relations#183619,p#183620])
+- RelationV2[id#183612, name#183613, address#183614, pets#183615, friends#183616, relatives#183617, employer#183618, relations#183619, p#183620] parquet file:/tmp/spark-f7a30b38-513b-40c1-9e0c-24edc9af30e5/contacts
== Analyzed Logical Plan ==
count(id): bigint
Aggregate [count(id#183612) AS count(id)#183657L]
+- Filter (last#183650 = Jones)
+- Project [id#183612, name#183613.first AS first#183648, name#183613.middle AS middle#183649, name#183613.last AS last#183650]
+- Filter isnotnull(name#183613.middle)
+- Project [id#183612, name#183613, address#183614, pets#183615, friends#183616, relatives#183617, employer#183618, relations#183619, p#183620]
+- SubqueryAlias contacts
+- View (`contacts`, [id#183612,name#183613,address#183614,pets#183615,friends#183616,relatives#183617,employer#183618,relations#183619,p#183620])
+- RelationV2[id#183612, name#183613, address#183614, pets#183615, friends#183616, relatives#183617, employer#183618, relations#183619, p#183620] parquet file:/tmp/spark-f7a30b38-513b-40c1-9e0c-24edc9af30e5/contacts
== Optimized Logical Plan ==
Aggregate [count(id#183612) AS count(id)#183657L]
+- Project [id#183612]
+- Filter ((isnotnull(name#183613.last) AND isnotnull(name#183613.middle)) AND (name#183613.last = Jones))
+- RelationV2[id#183612, name#183613] parquet file:/tmp/spark-f7a30b38-513b-40c1-9e0c-24edc9af30e5/contacts
== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=true
+- == Final Plan ==
VeloxColumnarToRow
+- ^(11876) HashAggregateTransformer(keys=[], functions=[count(id#183612)], isStreamingAgg=false, output=[count(id)#183657L])
+- ^(11876) InputIteratorTransformer[count#183662L]
+- ShuffleQueryStage 0
+- ColumnarExchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=982876], [shuffle_writer_type=hash], [OUTPUT] List(count:LongType), [OUTPUT] List(count:LongType)
+- VeloxResizeBatches 1024, 2147483647
+- ^(11875) FlushableHashAggregateTransformer(keys=[], functions=[partial_count(id#183612)], isStreamingAgg=false, output=[count#183662L])
+- ^(11875) ProjectExecTransformer [id#183612]
+- ^(11875) FilterExecTransformer ((isnotnull(name#183613.last) AND isnotnull(name#183613.middle)) AND (name#183613.last = Jones))
+- ^(11875) BatchScanExecTransformer[id#183612, name#183613] ParquetScan DataFilters: [isnotnull(name#183613.last), isnotnull(name#183613.middle), (name#183613.last = Jones)], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-f7a30b38-513b-40c1-9e0c-24edc9af30e5/contacts], PartitionFilters: [], PushedAggregation: [], PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], PushedGroupBy: [], ReadSchema: struct<id:int,name:struct<middle:string,last:string>>, PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], PushedAggregation: [], PushedGroupBy: [] RuntimeFilters: []
+- == Initial Plan ==
HashAggregate(keys=[], functions=[count(id#183612)], output=[count(id)#183657L])
+- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=982839]
+- HashAggregate(keys=[], functions=[partial_count(id#183612)], output=[count#183662L])
+- Project [id#183612]
+- Filter ((isnotnull(name#183613.last) AND isnotnull(name#183613.middle)) AND (name#183613.last = Jones))
+- BatchScan[id#183612, name#183613] ParquetScan DataFilters: [isnotnull(name#183613.last), isnotnull(name#183613.middle), (name#183613.last = Jones)], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-f7a30b38-513b-40c1-9e0c-24edc9af30e5/contacts], PartitionFilters: [], PushedAggregation: [], PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], PushedGroupBy: [], ReadSchema: struct<id:int,name:struct<middle:string,last:string>>, PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], PushedAggregation: [], PushedGroupBy: [] RuntimeFilters: []
== Results ==
!== Correct Answer - 1 == == Spark Answer - 1 ==
!struct<> struct<count(id):bigint>
![0] [2]
|
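For reference, a minimal Scala sketch of the query behind the "select one complex field and having is null predicate on another complex field" failures, reconstructed from the parsed plan above. The `contacts` view and column names come from the plan; the session setup and everything else are assumptions for illustration.

import org.apache.spark.sql.functions.count

// Filter on one nested field, project the other nested fields, then
// filter again on the projected alias; mirrors the Parsed Logical Plan.
val query = spark.table("contacts")
  .where("name.middle is not null")
  .select("id", "name.first", "name.middle", "name.last")
  .where("last = 'Jones'")
  .select(count("id"))

// Per the results diff, the correct answer is 0 but the run returns 2.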
GlutenParquetV2SchemaPruningSuite.Spark vectorized reader - without partition data column - SPARK-34963: extract case-insensitive struct field from struct:
org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV2SchemaPruningSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:
== Parsed Logical Plan ==
'Project ['Name.First, 'NAME.MiDDle]
+- Filter isnotnull(Name#196009.MIDDLE)
+- SubqueryAlias contacts
+- View (`contacts`, [id#196008,name#196009,address#196010,pets#196011,friends#196012,relatives#196013,employer#196014,relations#196015,p#196016])
+- RelationV2[id#196008, name#196009, address#196010, pets#196011, friends#196012, relatives#196013, employer#196014, relations#196015, p#196016] parquet file:/tmp/spark-504bf387-b699-45d3-9865-ba2af61d27f2/contacts
== Analyzed Logical Plan ==
First: string, MiDDle: string
Project [Name#196009.First AS First#196069, NAME#196009.MiDDle AS MiDDle#196070]
+- Filter isnotnull(Name#196009.MIDDLE)
+- SubqueryAlias contacts
+- View (`contacts`, [id#196008,name#196009,address#196010,pets#196011,friends#196012,relatives#196013,employer#196014,relations#196015,p#196016])
+- RelationV2[id#196008, name#196009, address#196010, pets#196011, friends#196012, relatives#196013, employer#196014, relations#196015, p#196016] parquet file:/tmp/spark-504bf387-b699-45d3-9865-ba2af61d27f2/contacts
== Optimized Logical Plan ==
Project [name#196009.first AS First#196069, name#196009.middle AS MiDDle#196070]
+- Filter isnotnull(name#196009.middle)
+- RelationV2[name#196009] parquet file:/tmp/spark-504bf387-b699-45d3-9865-ba2af61d27f2/contacts
== Physical Plan ==
VeloxColumnarToRow
+- ^(12496) ProjectExecTransformer [name#196009.first AS First#196069, name#196009.middle AS MiDDle#196070]
+- ^(12496) FilterExecTransformer isnotnull(name#196009.middle)
+- ^(12496) BatchScanExecTransformer[name#196009] ParquetScan DataFilters: [isnotnull(name#196009.MIDDLE)], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-504bf387-b699-45d3-9865-ba2af61d27f2/contacts], PartitionFilters: [], PushedAggregation: [], PushedFilters: [IsNotNull(name.middle)], PushedGroupBy: [], ReadSchema: struct<name:struct<first:string,middle:string>>, PushedFilters: [IsNotNull(name.middle)], PushedAggregation: [], PushedGroupBy: [] RuntimeFilters: []
== Results ==
!== Correct Answer - 2 == == Spark Answer - 4 ==
!struct<> struct<First:string,MiDDle:string>
[Jane,X.] [Jane,X.]
![John,Y.] [Janet,null]
! [Jim,null]
! [John,Y.]
|
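A comparable sketch for the SPARK-34963 failures ("extract case-insensitive struct field from struct"), reconstructed from the parsed plan above. The query only resolves under case-insensitive analysis, so the conf line is part of the reconstruction; the conf key is standard Spark, the rest is assumed.

// Mixed-case nested field references resolve only when
// spark.sql.caseSensitive is false (the Spark default).
spark.conf.set("spark.sql.caseSensitive", "false")

val query = spark.table("contacts")
  .where("Name.MIDDLE is not null")
  .select("Name.First", "NAME.MiDDle")

// The diff above expects two rows, [Jane,X.] and [John,Y.]; the run
// returns four, including [Janet,null] and [Jim,null], rows whose
// middle name is null and should have been filtered out.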
GlutenParquetV2SchemaPruningSuite.Spark vectorized reader - with partition data column - SPARK-34963: extract case-insensitive struct field from struct:
org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV2SchemaPruningSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:
== Parsed Logical Plan ==
'Project ['Name.First, 'NAME.MiDDle]
+- Filter isnotnull(Name#196137.MIDDLE)
+- SubqueryAlias contacts
+- View (`contacts`, [id#196136,name#196137,address#196138,pets#196139,friends#196140,relatives#196141,employer#196142,relations#196143,p#196144])
+- RelationV2[id#196136, name#196137, address#196138, pets#196139, friends#196140, relatives#196141, employer#196142, relations#196143, p#196144] parquet file:/tmp/spark-2d19326d-2455-4f5f-8339-13e075576720/contacts
== Analyzed Logical Plan ==
First: string, MiDDle: string
Project [Name#196137.First AS First#196197, NAME#196137.MiDDle AS MiDDle#196198]
+- Filter isnotnull(Name#196137.MIDDLE)
+- SubqueryAlias contacts
+- View (`contacts`, [id#196136,name#196137,address#196138,pets#196139,friends#196140,relatives#196141,employer#196142,relations#196143,p#196144])
+- RelationV2[id#196136, name#196137, address#196138, pets#196139, friends#196140, relatives#196141, employer#196142, relations#196143, p#196144] parquet file:/tmp/spark-2d19326d-2455-4f5f-8339-13e075576720/contacts
== Optimized Logical Plan ==
Project [name#196137.first AS First#196197, name#196137.middle AS MiDDle#196198]
+- Filter isnotnull(name#196137.middle)
+- RelationV2[name#196137] parquet file:/tmp/spark-2d19326d-2455-4f5f-8339-13e075576720/contacts
== Physical Plan ==
VeloxColumnarToRow
+- ^(12500) ProjectExecTransformer [name#196137.first AS First#196197, name#196137.middle AS MiDDle#196198]
+- ^(12500) FilterExecTransformer isnotnull(name#196137.middle)
+- ^(12500) BatchScanExecTransformer[name#196137] ParquetScan DataFilters: [isnotnull(name#196137.MIDDLE)], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-2d19326d-2455-4f5f-8339-13e075576720/contacts], PartitionFilters: [], PushedAggregation: [], PushedFilters: [IsNotNull(name.middle)], PushedGroupBy: [], ReadSchema: struct<name:struct<first:string,middle:string>>, PushedFilters: [IsNotNull(name.middle)], PushedAggregation: [], PushedGroupBy: [] RuntimeFilters: []
== Results ==
!== Correct Answer - 2 == == Spark Answer - 4 ==
!struct<> struct<First:string,MiDDle:string>
[Jane,X.] [Jane,X.]
![John,Y.] [Janet,null]
! [Jim,null]
! [John,Y.]
|
GlutenParquetV2SchemaPruningSuite.Non-vectorized reader - without partition data column - SPARK-34963: extract case-insensitive struct field from struct:
org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV2SchemaPruningSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:
== Parsed Logical Plan ==
'Project ['Name.First, 'NAME.MiDDle]
+- Filter isnotnull(Name#196259.MIDDLE)
+- SubqueryAlias contacts
+- View (`contacts`, [id#196258,name#196259,address#196260,pets#196261,friends#196262,relatives#196263,employer#196264,relations#196265,p#196266])
+- RelationV2[id#196258, name#196259, address#196260, pets#196261, friends#196262, relatives#196263, employer#196264, relations#196265, p#196266] parquet file:/tmp/spark-54ee8f73-6d77-4af9-b536-c9f49478417b/contacts
== Analyzed Logical Plan ==
First: string, MiDDle: string
Project [Name#196259.First AS First#196319, NAME#196259.MiDDle AS MiDDle#196320]
+- Filter isnotnull(Name#196259.MIDDLE)
+- SubqueryAlias contacts
+- View (`contacts`, [id#196258,name#196259,address#196260,pets#196261,friends#196262,relatives#196263,employer#196264,relations#196265,p#196266])
+- RelationV2[id#196258, name#196259, address#196260, pets#196261, friends#196262, relatives#196263, employer#196264, relations#196265, p#196266] parquet file:/tmp/spark-54ee8f73-6d77-4af9-b536-c9f49478417b/contacts
== Optimized Logical Plan ==
Project [name#196259.first AS First#196319, name#196259.middle AS MiDDle#196320]
+- Filter isnotnull(name#196259.middle)
+- RelationV2[name#196259] parquet file:/tmp/spark-54ee8f73-6d77-4af9-b536-c9f49478417b/contacts
== Physical Plan ==
VeloxColumnarToRow
+- ^(12504) ProjectExecTransformer [name#196259.first AS First#196319, name#196259.middle AS MiDDle#196320]
+- ^(12504) FilterExecTransformer isnotnull(name#196259.middle)
+- ^(12504) BatchScanExecTransformer[name#196259] ParquetScan DataFilters: [isnotnull(name#196259.MIDDLE)], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-54ee8f73-6d77-4af9-b536-c9f49478417b/contacts], PartitionFilters: [], PushedAggregation: [], PushedFilters: [IsNotNull(name.middle)], PushedGroupBy: [], ReadSchema: struct<name:struct<first:string,middle:string>>, PushedFilters: [IsNotNull(name.middle)], PushedAggregation: [], PushedGroupBy: [] RuntimeFilters: []
== Results ==
!== Correct Answer - 2 == == Spark Answer - 4 ==
!struct<> struct<First:string,MiDDle:string>
[Jane,X.] [Jane,X.]
![John,Y.] [Janet,null]
! [Jim,null]
! [John,Y.]
|
GlutenParquetV2SchemaPruningSuite.Non-vectorized reader - with partition data column - SPARK-34963: extract case-insensitive struct field from struct:
org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV2SchemaPruningSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:
== Parsed Logical Plan ==
'Project ['Name.First, 'NAME.MiDDle]
+- Filter isnotnull(Name#196387.MIDDLE)
+- SubqueryAlias contacts
+- View (`contacts`, [id#196386,name#196387,address#196388,pets#196389,friends#196390,relatives#196391,employer#196392,relations#196393,p#196394])
+- RelationV2[id#196386, name#196387, address#196388, pets#196389, friends#196390, relatives#196391, employer#196392, relations#196393, p#196394] parquet file:/tmp/spark-805d0ea7-f827-4b26-8c62-252d696507dd/contacts
== Analyzed Logical Plan ==
First: string, MiDDle: string
Project [Name#196387.First AS First#196447, NAME#196387.MiDDle AS MiDDle#196448]
+- Filter isnotnull(Name#196387.MIDDLE)
+- SubqueryAlias contacts
+- View (`contacts`, [id#196386,name#196387,address#196388,pets#196389,friends#196390,relatives#196391,employer#196392,relations#196393,p#196394])
+- RelationV2[id#196386, name#196387, address#196388, pets#196389, friends#196390, relatives#196391, employer#196392, relations#196393, p#196394] parquet file:/tmp/spark-805d0ea7-f827-4b26-8c62-252d696507dd/contacts
== Optimized Logical Plan ==
Project [name#196387.first AS First#196447, name#196387.middle AS MiDDle#196448]
+- Filter isnotnull(name#196387.middle)
+- RelationV2[name#196387] parquet file:/tmp/spark-805d0ea7-f827-4b26-8c62-252d696507dd/contacts
== Physical Plan ==
VeloxColumnarToRow
+- ^(12508) ProjectExecTransformer [name#196387.first AS First#196447, name#196387.middle AS MiDDle#196448]
+- ^(12508) FilterExecTransformer isnotnull(name#196387.middle)
+- ^(12508) BatchScanExecTransformer[name#196387] ParquetScan DataFilters: [isnotnull(name#196387.MIDDLE)], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-805d0ea7-f827-4b26-8c62-252d696507dd/contacts], PartitionFilters: [], PushedAggregation: [], PushedFilters: [IsNotNull(name.middle)], PushedGroupBy: [], ReadSchema: struct<name:struct<first:string,middle:string>>, PushedFilters: [IsNotNull(name.middle)], PushedAggregation: [], PushedGroupBy: [] RuntimeFilters: []
== Results ==
!== Correct Answer - 2 == == Spark Answer - 4 ==
!struct<> struct<First:string,MiDDle:string>
[Jane,X.] [Jane,X.]
![John,Y.] [Janet,null]
! [Jim,null]
! [John,Y.]
|
GlutenParquetV1SchemaPruningSuite.Spark vectorized reader - without partition data column - select one complex field and having is null predicate on another complex field:
org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV1SchemaPruningSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:
== Parsed Logical Plan ==
'Project [unresolvedalias(count('id), Some(org.apache.spark.sql.Column$$Lambda$15328/1048723032@233ccb3b))]
+- Filter (last#312182 = Jones)
+- Project [id#312144, name#312145.first AS first#312180, name#312145.middle AS middle#312181, name#312145.last AS last#312182]
+- Filter isnotnull(name#312145.middle)
+- Project [id#312144, name#312145, address#312146, pets#312147, friends#312148, relatives#312149, employer#312150, relations#312151, p#312152]
+- SubqueryAlias contacts
+- View (`contacts`, [id#312144,name#312145,address#312146,pets#312147,friends#312148,relatives#312149,employer#312150,relations#312151,p#312152])
+- Relation [id#312144,name#312145,address#312146,pets#312147,friends#312148,relatives#312149,employer#312150,relations#312151,p#312152] parquet
== Analyzed Logical Plan ==
count(id): bigint
Aggregate [count(id#312144) AS count(id)#312189L]
+- Filter (last#312182 = Jones)
+- Project [id#312144, name#312145.first AS first#312180, name#312145.middle AS middle#312181, name#312145.last AS last#312182]
+- Filter isnotnull(name#312145.middle)
+- Project [id#312144, name#312145, address#312146, pets#312147, friends#312148, relatives#312149, employer#312150, relations#312151, p#312152]
+- SubqueryAlias contacts
+- View (`contacts`, [id#312144,name#312145,address#312146,pets#312147,friends#312148,relatives#312149,employer#312150,relations#312151,p#312152])
+- Relation [id#312144,name#312145,address#312146,pets#312147,friends#312148,relatives#312149,employer#312150,relations#312151,p#312152] parquet
== Optimized Logical Plan ==
Aggregate [count(id#312144) AS count(id)#312189L]
+- Project [id#312144]
+- Filter ((isnotnull(name#312145.last) AND isnotnull(name#312145.middle)) AND (name#312145.last = Jones))
+- Relation [id#312144,name#312145,address#312146,pets#312147,friends#312148,relatives#312149,employer#312150,relations#312151,p#312152] parquet
== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=true
+- == Final Plan ==
VeloxColumnarToRow
+- ^(23490) HashAggregateTransformer(keys=[], functions=[count(id#312144)], isStreamingAgg=false, output=[count(id)#312189L])
+- ^(23490) InputIteratorTransformer[count#312201L]
+- ShuffleQueryStage 0
+- ColumnarExchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=1508543], [shuffle_writer_type=hash], [OUTPUT] List(count:LongType), [OUTPUT] List(count:LongType)
+- VeloxResizeBatches 1024, 2147483647
+- ^(23489) FlushableHashAggregateTransformer(keys=[], functions=[partial_count(id#312144)], isStreamingAgg=false, output=[count#312201L])
+- ^(23489) ProjectExecTransformer [id#312144]
+- ^(23489) FilterExecTransformer ((isnotnull(name#312145.last) AND isnotnull(name#312145.middle)) AND (name#312145.last = Jones))
+- ^(23489) FileScanTransformer parquet [id#312144,name#312145,p#312152] Batched: true, DataFilters: [isnotnull(name#312145.last), isnotnull(name#312145.middle), (name#312145.last = Jones)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-f8aa6b6a-c063-4ef6-a595-8d796105d43b/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], ReadSchema: struct<id:int,name:struct<middle:string,last:string>>
+- == Initial Plan ==
HashAggregate(keys=[], functions=[count(id#312144)], output=[count(id)#312189L])
+- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=1508485]
+- HashAggregate(keys=[], functions=[partial_count(id#312144)], output=[count#312201L])
+- Project [id#312144]
+- Filter ((isnotnull(name#312145.last) AND isnotnull(name#312145.middle)) AND (name#312145.last = Jones))
+- FileScan parquet [id#312144,name#312145,p#312152] Batched: true, DataFilters: [isnotnull(name#312145.last), isnotnull(name#312145.middle), (name#312145.last = Jones)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-f8aa6b6a-c063-4ef6-a595-8d796105d43b/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], ReadSchema: struct<id:int,name:struct<middle:string,last:string>>
== Results ==
!== Correct Answer - 1 == == Spark Answer - 1 ==
!struct<> struct<count(id):bigint>
![0] [2]
|
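One knob worth noting when reading the PushedFilters entries above: vanilla Spark gates nested-field predicate push-down per file source. A hedged isolation step, assuming the stock Spark conf still applies under Gluten:

// Default is "parquet,orc"; emptying the list keeps predicates on
// nested fields (e.g. IsNotNull(name.middle)) out of PushedFilters,
// which helps tell pushed-filter bugs apart from post-scan filter bugs.
spark.conf.set("spark.sql.optimizer.nestedPredicatePushdown.supportedFileSources", "")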
GlutenParquetV1SchemaPruningSuite.Spark vectorized reader - with partition data column - select one complex field and having is null predicate on another complex field:
org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV1SchemaPruningSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:
== Parsed Logical Plan ==
'Project [unresolvedalias(count('id), Some(org.apache.spark.sql.Column$$Lambda$15328/1048723032@233ccb3b))]
+- Filter (last#312307 = Jones)
+- Project [id#312269, name#312270.first AS first#312305, name#312270.middle AS middle#312306, name#312270.last AS last#312307]
+- Filter isnotnull(name#312270.middle)
+- Project [id#312269, name#312270, address#312271, pets#312272, friends#312273, relatives#312274, employer#312275, relations#312276, p#312277]
+- SubqueryAlias contacts
+- View (`contacts`, [id#312269,name#312270,address#312271,pets#312272,friends#312273,relatives#312274,employer#312275,relations#312276,p#312277])
+- Relation [id#312269,name#312270,address#312271,pets#312272,friends#312273,relatives#312274,employer#312275,relations#312276,p#312277] parquet
== Analyzed Logical Plan ==
count(id): bigint
Aggregate [count(id#312269) AS count(id)#312314L]
+- Filter (last#312307 = Jones)
+- Project [id#312269, name#312270.first AS first#312305, name#312270.middle AS middle#312306, name#312270.last AS last#312307]
+- Filter isnotnull(name#312270.middle)
+- Project [id#312269, name#312270, address#312271, pets#312272, friends#312273, relatives#312274, employer#312275, relations#312276, p#312277]
+- SubqueryAlias contacts
+- View (`contacts`, [id#312269,name#312270,address#312271,pets#312272,friends#312273,relatives#312274,employer#312275,relations#312276,p#312277])
+- Relation [id#312269,name#312270,address#312271,pets#312272,friends#312273,relatives#312274,employer#312275,relations#312276,p#312277] parquet
== Optimized Logical Plan ==
Aggregate [count(id#312269) AS count(id)#312314L]
+- Project [id#312269]
+- Filter ((isnotnull(name#312270.last) AND isnotnull(name#312270.middle)) AND (name#312270.last = Jones))
+- Relation [id#312269,name#312270,address#312271,pets#312272,friends#312273,relatives#312274,employer#312275,relations#312276,p#312277] parquet
== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=true
+- == Final Plan ==
VeloxColumnarToRow
+- ^(23494) HashAggregateTransformer(keys=[], functions=[count(id#312269)], isStreamingAgg=false, output=[count(id)#312314L])
+- ^(23494) InputIteratorTransformer[count#312326L]
+- ShuffleQueryStage 0
+- ColumnarExchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=1508819], [shuffle_writer_type=hash], [OUTPUT] List(count:LongType), [OUTPUT] List(count:LongType)
+- VeloxResizeBatches 1024, 2147483647
+- ^(23493) FlushableHashAggregateTransformer(keys=[], functions=[partial_count(id#312269)], isStreamingAgg=false, output=[count#312326L])
+- ^(23493) ProjectExecTransformer [id#312269]
+- ^(23493) FilterExecTransformer ((isnotnull(name#312270.last) AND isnotnull(name#312270.middle)) AND (name#312270.last = Jones))
+- ^(23493) FileScanTransformer parquet [id#312269,name#312270,p#312277] Batched: true, DataFilters: [isnotnull(name#312270.last), isnotnull(name#312270.middle), (name#312270.last = Jones)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-509c36f2-928c-4630-9b30-b7d9f4f7a58d/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], ReadSchema: struct<id:int,name:struct<middle:string,last:string>>
+- == Initial Plan ==
HashAggregate(keys=[], functions=[count(id#312269)], output=[count(id)#312314L])
+- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=1508761]
+- HashAggregate(keys=[], functions=[partial_count(id#312269)], output=[count#312326L])
+- Project [id#312269]
+- Filter ((isnotnull(name#312270.last) AND isnotnull(name#312270.middle)) AND (name#312270.last = Jones))
+- FileScan parquet [id#312269,name#312270,p#312277] Batched: true, DataFilters: [isnotnull(name#312270.last), isnotnull(name#312270.middle), (name#312270.last = Jones)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-509c36f2-928c-4630-9b30-b7d9f4f7a58d/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], ReadSchema: struct<id:int,name:struct<middle:string,last:string>>
== Results ==
!== Correct Answer - 1 == == Spark Answer - 1 ==
!struct<> struct<count(id):bigint>
![0] [2]
|
GlutenParquetV1SchemaPruningSuite.Non-vectorized reader - without partition data column - select one complex field and having is null predicate on another complex field:
org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV1SchemaPruningSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:
== Parsed Logical Plan ==
'Project [unresolvedalias(count('id), Some(org.apache.spark.sql.Column$$Lambda$15328/1048723032@233ccb3b))]
+- Filter (last#312426 = Jones)
+- Project [id#312388, name#312389.first AS first#312424, name#312389.middle AS middle#312425, name#312389.last AS last#312426]
+- Filter isnotnull(name#312389.middle)
+- Project [id#312388, name#312389, address#312390, pets#312391, friends#312392, relatives#312393, employer#312394, relations#312395, p#312396]
+- SubqueryAlias contacts
+- View (`contacts`, [id#312388,name#312389,address#312390,pets#312391,friends#312392,relatives#312393,employer#312394,relations#312395,p#312396])
+- Relation [id#312388,name#312389,address#312390,pets#312391,friends#312392,relatives#312393,employer#312394,relations#312395,p#312396] parquet
== Analyzed Logical Plan ==
count(id): bigint
Aggregate [count(id#312388) AS count(id)#312433L]
+- Filter (last#312426 = Jones)
+- Project [id#312388, name#312389.first AS first#312424, name#312389.middle AS middle#312425, name#312389.last AS last#312426]
+- Filter isnotnull(name#312389.middle)
+- Project [id#312388, name#312389, address#312390, pets#312391, friends#312392, relatives#312393, employer#312394, relations#312395, p#312396]
+- SubqueryAlias contacts
+- View (`contacts`, [id#312388,name#312389,address#312390,pets#312391,friends#312392,relatives#312393,employer#312394,relations#312395,p#312396])
+- Relation [id#312388,name#312389,address#312390,pets#312391,friends#312392,relatives#312393,employer#312394,relations#312395,p#312396] parquet
== Optimized Logical Plan ==
Aggregate [count(id#312388) AS count(id)#312433L]
+- Project [id#312388]
+- Filter ((isnotnull(name#312389.last) AND isnotnull(name#312389.middle)) AND (name#312389.last = Jones))
+- Relation [id#312388,name#312389,address#312390,pets#312391,friends#312392,relatives#312393,employer#312394,relations#312395,p#312396] parquet
== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=true
+- == Final Plan ==
VeloxColumnarToRow
+- ^(23498) HashAggregateTransformer(keys=[], functions=[count(id#312388)], isStreamingAgg=false, output=[count(id)#312433L])
+- ^(23498) InputIteratorTransformer[count#312445L]
+- ShuffleQueryStage 0
+- ColumnarExchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=1509076], [shuffle_writer_type=hash], [OUTPUT] List(count:LongType), [OUTPUT] List(count:LongType)
+- VeloxResizeBatches 1024, 2147483647
+- ^(23497) FlushableHashAggregateTransformer(keys=[], functions=[partial_count(id#312388)], isStreamingAgg=false, output=[count#312445L])
+- ^(23497) ProjectExecTransformer [id#312388]
+- ^(23497) FilterExecTransformer ((isnotnull(name#312389.last) AND isnotnull(name#312389.middle)) AND (name#312389.last = Jones))
+- ^(23497) FileScanTransformer parquet [id#312388,name#312389,p#312396] Batched: true, DataFilters: [isnotnull(name#312389.last), isnotnull(name#312389.middle), (name#312389.last = Jones)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-0edcad54-a9e0-478c-bcd9-b86020c74b37/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], ReadSchema: struct<id:int,name:struct<middle:string,last:string>>
+- == Initial Plan ==
HashAggregate(keys=[], functions=[count(id#312388)], output=[count(id)#312433L])
+- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=1509037]
+- HashAggregate(keys=[], functions=[partial_count(id#312388)], output=[count#312445L])
+- Project [id#312388]
+- Filter ((isnotnull(name#312389.last) AND isnotnull(name#312389.middle)) AND (name#312389.last = Jones))
+- FileScan parquet [id#312388,name#312389,p#312396] Batched: false, DataFilters: [isnotnull(name#312389.last), isnotnull(name#312389.middle), (name#312389.last = Jones)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-0edcad54-a9e0-478c-bcd9-b86020c74b37/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], ReadSchema: struct<id:int,name:struct<middle:string,last:string>>
== Results ==
!== Correct Answer - 1 == == Spark Answer - 1 ==
!struct<> struct<count(id):bigint>
![0] [2]
|
GlutenParquetV1SchemaPruningSuite.Non-vectorized reader - with partition data column - select one complex field and having is null predicate on another complex field:
org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV1SchemaPruningSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:
== Parsed Logical Plan ==
'Project [unresolvedalias(count('id), Some(org.apache.spark.sql.Column$$Lambda$15328/1048723032@233ccb3b))]
+- Filter (last#312551 = Jones)
+- Project [id#312513, name#312514.first AS first#312549, name#312514.middle AS middle#312550, name#312514.last AS last#312551]
+- Filter isnotnull(name#312514.middle)
+- Project [id#312513, name#312514, address#312515, pets#312516, friends#312517, relatives#312518, employer#312519, relations#312520, p#312521]
+- SubqueryAlias contacts
+- View (`contacts`, [id#312513,name#312514,address#312515,pets#312516,friends#312517,relatives#312518,employer#312519,relations#312520,p#312521])
+- Relation [id#312513,name#312514,address#312515,pets#312516,friends#312517,relatives#312518,employer#312519,relations#312520,p#312521] parquet
== Analyzed Logical Plan ==
count(id): bigint
Aggregate [count(id#312513) AS count(id)#312558L]
+- Filter (last#312551 = Jones)
+- Project [id#312513, name#312514.first AS first#312549, name#312514.middle AS middle#312550, name#312514.last AS last#312551]
+- Filter isnotnull(name#312514.middle)
+- Project [id#312513, name#312514, address#312515, pets#312516, friends#312517, relatives#312518, employer#312519, relations#312520, p#312521]
+- SubqueryAlias contacts
+- View (`contacts`, [id#312513,name#312514,address#312515,pets#312516,friends#312517,relatives#312518,employer#312519,relations#312520,p#312521])
+- Relation [id#312513,name#312514,address#312515,pets#312516,friends#312517,relatives#312518,employer#312519,relations#312520,p#312521] parquet
== Optimized Logical Plan ==
Aggregate [count(id#312513) AS count(id)#312558L]
+- Project [id#312513]
+- Filter ((isnotnull(name#312514.last) AND isnotnull(name#312514.middle)) AND (name#312514.last = Jones))
+- Relation [id#312513,name#312514,address#312515,pets#312516,friends#312517,relatives#312518,employer#312519,relations#312520,p#312521] parquet
== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=true
+- == Final Plan ==
VeloxColumnarToRow
+- ^(23502) HashAggregateTransformer(keys=[], functions=[count(id#312513)], isStreamingAgg=false, output=[count(id)#312558L])
+- ^(23502) InputIteratorTransformer[count#312570L]
+- ShuffleQueryStage 0
+- ColumnarExchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=1509314], [shuffle_writer_type=hash], [OUTPUT] List(count:LongType), [OUTPUT] List(count:LongType)
+- VeloxResizeBatches 1024, 2147483647
+- ^(23501) FlushableHashAggregateTransformer(keys=[], functions=[partial_count(id#312513)], isStreamingAgg=false, output=[count#312570L])
+- ^(23501) ProjectExecTransformer [id#312513]
+- ^(23501) FilterExecTransformer ((isnotnull(name#312514.last) AND isnotnull(name#312514.middle)) AND (name#312514.last = Jones))
+- ^(23501) FileScanTransformer parquet [id#312513,name#312514,p#312521] Batched: true, DataFilters: [isnotnull(name#312514.last), isnotnull(name#312514.middle), (name#312514.last = Jones)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-43d59f86-20fc-43e0-9036-55ccf661c611/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], ReadSchema: struct<id:int,name:struct<middle:string,last:string>>
+- == Initial Plan ==
HashAggregate(keys=[], functions=[count(id#312513)], output=[count(id)#312558L])
+- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=1509275]
+- HashAggregate(keys=[], functions=[partial_count(id#312513)], output=[count#312570L])
+- Project [id#312513]
+- Filter ((isnotnull(name#312514.last) AND isnotnull(name#312514.middle)) AND (name#312514.last = Jones))
+- FileScan parquet [id#312513,name#312514,p#312521] Batched: false, DataFilters: [isnotnull(name#312514.last), isnotnull(name#312514.middle), (name#312514.last = Jones)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-43d59f86-20fc-43e0-9036-55ccf661c611/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], ReadSchema: struct<id:int,name:struct<middle:string,last:string>>
== Results ==
!== Correct Answer - 1 == == Spark Answer - 1 ==
!struct<> struct<count(id):bigint>
![0] [2]
|
GlutenParquetV1SchemaPruningSuite.Spark vectorized reader - without partition data column - SPARK-34963: extract case-insensitive struct field from struct:
org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV1SchemaPruningSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:
== Parsed Logical Plan ==
'Project ['Name.First, 'NAME.MiDDle]
+- Filter isnotnull(Name#325658.MIDDLE)
+- SubqueryAlias contacts
+- View (`contacts`, [id#325657,name#325658,address#325659,pets#325660,friends#325661,relatives#325662,employer#325663,relations#325664,p#325665])
+- Relation [id#325657,name#325658,address#325659,pets#325660,friends#325661,relatives#325662,employer#325663,relations#325664,p#325665] parquet
== Analyzed Logical Plan ==
First: string, MiDDle: string
Project [Name#325658.First AS First#325728, NAME#325658.MiDDle AS MiDDle#325729]
+- Filter isnotnull(Name#325658.MIDDLE)
+- SubqueryAlias contacts
+- View (`contacts`, [id#325657,name#325658,address#325659,pets#325660,friends#325661,relatives#325662,employer#325663,relations#325664,p#325665])
+- Relation [id#325657,name#325658,address#325659,pets#325660,friends#325661,relatives#325662,employer#325663,relations#325664,p#325665] parquet
== Optimized Logical Plan ==
Project [name#325658.first AS First#325728, name#325658.middle AS MiDDle#325729]
+- Filter isnotnull(name#325658.middle)
+- Relation [id#325657,name#325658,address#325659,pets#325660,friends#325661,relatives#325662,employer#325663,relations#325664,p#325665] parquet
== Physical Plan ==
VeloxColumnarToRow
+- ^(24146) ProjectExecTransformer [name#325658.first AS First#325728, name#325658.middle AS MiDDle#325729]
+- ^(24146) FilterExecTransformer isnotnull(name#325658.middle)
+- ^(24146) FileScanTransformer parquet [name#325658,p#325665] Batched: true, DataFilters: [isnotnull(name#325658.middle)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-36ceaca8-fe76-4d85-a0a2-a02f39ebe9b2/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.middle)], ReadSchema: struct<name:struct<first:string,middle:string>>
== Results ==
!== Correct Answer - 2 == == Spark Answer - 4 ==
!struct<> struct<First:string,MiDDle:string>
[Jane,X.] [Jane,X.]
![John,Y.] [Janet,null]
! [Jim,null]
! [John,Y.]
|
GlutenParquetV1SchemaPruningSuite.Spark vectorized reader - with partition data column - SPARK-34963: extract case-insensitive struct field from struct:
org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV1SchemaPruningSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:
== Parsed Logical Plan ==
'Project ['Name.First, 'NAME.MiDDle]
+- Filter isnotnull(Name#325806.MIDDLE)
+- SubqueryAlias contacts
+- View (`contacts`, [id#325805,name#325806,address#325807,pets#325808,friends#325809,relatives#325810,employer#325811,relations#325812,p#325813])
+- Relation [id#325805,name#325806,address#325807,pets#325808,friends#325809,relatives#325810,employer#325811,relations#325812,p#325813] parquet
== Analyzed Logical Plan ==
First: string, MiDDle: string
Project [Name#325806.First AS First#325876, NAME#325806.MiDDle AS MiDDle#325877]
+- Filter isnotnull(Name#325806.MIDDLE)
+- SubqueryAlias contacts
+- View (`contacts`, [id#325805,name#325806,address#325807,pets#325808,friends#325809,relatives#325810,employer#325811,relations#325812,p#325813])
+- Relation [id#325805,name#325806,address#325807,pets#325808,friends#325809,relatives#325810,employer#325811,relations#325812,p#325813] parquet
== Optimized Logical Plan ==
Project [name#325806.first AS First#325876, name#325806.middle AS MiDDle#325877]
+- Filter isnotnull(name#325806.middle)
+- Relation [id#325805,name#325806,address#325807,pets#325808,friends#325809,relatives#325810,employer#325811,relations#325812,p#325813] parquet
== Physical Plan ==
VeloxColumnarToRow
+- ^(24150) ProjectExecTransformer [name#325806.first AS First#325876, name#325806.middle AS MiDDle#325877]
+- ^(24150) FilterExecTransformer isnotnull(name#325806.middle)
+- ^(24150) FileScanTransformer parquet [name#325806,p#325813] Batched: true, DataFilters: [isnotnull(name#325806.middle)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-a14f20b2-a5d0-4b13-bf22-6e8a66dd45c2/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.middle)], ReadSchema: struct<name:struct<first:string,middle:string>>
== Results ==
!== Correct Answer - 2 == == Spark Answer - 4 ==
!struct<> struct<First:string,MiDDle:string>
[Jane,X.] [Jane,X.]
![John,Y.] [Janet,null]
! [Jim,null]
! [John,Y.]
|
GlutenParquetV1SchemaPruningSuite.Non-vectorized reader - without partition data column - SPARK-34963: extract case-insensitive struct field from struct:
org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV1SchemaPruningSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:
== Parsed Logical Plan ==
'Project ['Name.First, 'NAME.MiDDle]
+- Filter isnotnull(Name#325948.MIDDLE)
+- SubqueryAlias contacts
+- View (`contacts`, [id#325947,name#325948,address#325949,pets#325950,friends#325951,relatives#325952,employer#325953,relations#325954,p#325955])
+- Relation [id#325947,name#325948,address#325949,pets#325950,friends#325951,relatives#325952,employer#325953,relations#325954,p#325955] parquet
== Analyzed Logical Plan ==
First: string, MiDDle: string
Project [Name#325948.First AS First#326018, NAME#325948.MiDDle AS MiDDle#326019]
+- Filter isnotnull(Name#325948.MIDDLE)
+- SubqueryAlias contacts
+- View (`contacts`, [id#325947,name#325948,address#325949,pets#325950,friends#325951,relatives#325952,employer#325953,relations#325954,p#325955])
+- Relation [id#325947,name#325948,address#325949,pets#325950,friends#325951,relatives#325952,employer#325953,relations#325954,p#325955] parquet
== Optimized Logical Plan ==
Project [name#325948.first AS First#326018, name#325948.middle AS MiDDle#326019]
+- Filter isnotnull(name#325948.middle)
+- Relation [id#325947,name#325948,address#325949,pets#325950,friends#325951,relatives#325952,employer#325953,relations#325954,p#325955] parquet
== Physical Plan ==
VeloxColumnarToRow
+- ^(24154) ProjectExecTransformer [name#325948.first AS First#326018, name#325948.middle AS MiDDle#326019]
+- ^(24154) FilterExecTransformer isnotnull(name#325948.middle)
+- ^(24154) FileScanTransformer parquet [name#325948,p#325955] Batched: true, DataFilters: [isnotnull(name#325948.middle)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-b768c431-5775-4455-bff1-cbe2f1767db3/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.middle)], ReadSchema: struct<name:struct<first:string,middle:string>>
== Results ==
!== Correct Answer - 2 == == Spark Answer - 4 ==
!struct<> struct<First:string,MiDDle:string>
[Jane,X.] [Jane,X.]
![John,Y.] [Janet,null]
! [Jim,null]
! [John,Y.]
|
GlutenParquetV1SchemaPruningSuite.Non-vectorized reader - with partition data column - SPARK-34963: extract case-insensitive struct field from struct:
org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV1SchemaPruningSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:
== Parsed Logical Plan ==
'Project ['Name.First, 'NAME.MiDDle]
+- Filter isnotnull(Name#326096.MIDDLE)
+- SubqueryAlias contacts
+- View (`contacts`, [id#326095,name#326096,address#326097,pets#326098,friends#326099,relatives#326100,employer#326101,relations#326102,p#326103])
+- Relation [id#326095,name#326096,address#326097,pets#326098,friends#326099,relatives#326100,employer#326101,relations#326102,p#326103] parquet
== Analyzed Logical Plan ==
First: string, MiDDle: string
Project [Name#326096.First AS First#326166, NAME#326096.MiDDle AS MiDDle#326167]
+- Filter isnotnull(Name#326096.MIDDLE)
+- SubqueryAlias contacts
+- View (`contacts`, [id#326095,name#326096,address#326097,pets#326098,friends#326099,relatives#326100,employer#326101,relations#326102,p#326103])
+- Relation [id#326095,name#326096,address#326097,pets#326098,friends#326099,relatives#326100,employer#326101,relations#326102,p#326103] parquet
== Optimized Logical Plan ==
Project [name#326096.first AS First#326166, name#326096.middle AS MiDDle#326167]
+- Filter isnotnull(name#326096.middle)
+- Relation [id#326095,name#326096,address#326097,pets#326098,friends#326099,relatives#326100,employer#326101,relations#326102,p#326103] parquet
== Physical Plan ==
VeloxColumnarToRow
+- ^(24158) ProjectExecTransformer [name#326096.first AS First#326166, name#326096.middle AS MiDDle#326167]
+- ^(24158) FilterExecTransformer isnotnull(name#326096.middle)
+- ^(24158) FileScanTransformer parquet [name#326096,p#326103] Batched: true, DataFilters: [isnotnull(name#326096.middle)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-79cb4431-0993-4b01-b726-e45816561933/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.middle)], ReadSchema: struct<name:struct<first:string,middle:string>>
== Results ==
!== Correct Answer - 2 == == Spark Answer - 4 ==
!struct<> struct<First:string,MiDDle:string>
[Jane,X.] [Jane,X.]
![John,Y.] [Janet,null]
! [Jim,null]
! [John,Y.]
|
GlutenParquetV2SchemaPruningSuite.Spark vectorized reader - without partition data column - select one complex field and having is null predicate on another complex field:
org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV2SchemaPruningSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:
== Parsed Logical Plan ==
'Project [unresolvedalias(count('id), Some(org.apache.spark.sql.Column$$Lambda$15328/1048723032@233ccb3b))]
+- Filter (last#191086 = Jones)
+- Project [id#191048, name#191049.first AS first#191084, name#191049.middle AS middle#191085, name#191049.last AS last#191086]
+- Filter isnotnull(name#191049.middle)
+- Project [id#191048, name#191049, address#191050, pets#191051, friends#191052, relatives#191053, employer#191054, relations#191055, p#191056]
+- SubqueryAlias contacts
+- View (`contacts`, [id#191048,name#191049,address#191050,pets#191051,friends#191052,relatives#191053,employer#191054,relations#191055,p#191056])
+- RelationV2[id#191048, name#191049, address#191050, pets#191051, friends#191052, relatives#191053, employer#191054, relations#191055, p#191056] parquet file:/tmp/spark-f64d6944-952e-4b9d-aa6f-0047609cc728/contacts
== Analyzed Logical Plan ==
count(id): bigint
Aggregate [count(id#191048) AS count(id)#191093L]
+- Filter (last#191086 = Jones)
+- Project [id#191048, name#191049.first AS first#191084, name#191049.middle AS middle#191085, name#191049.last AS last#191086]
+- Filter isnotnull(name#191049.middle)
+- Project [id#191048, name#191049, address#191050, pets#191051, friends#191052, relatives#191053, employer#191054, relations#191055, p#191056]
+- SubqueryAlias contacts
+- View (`contacts`, [id#191048,name#191049,address#191050,pets#191051,friends#191052,relatives#191053,employer#191054,relations#191055,p#191056])
+- RelationV2[id#191048, name#191049, address#191050, pets#191051, friends#191052, relatives#191053, employer#191054, relations#191055, p#191056] parquet file:/tmp/spark-f64d6944-952e-4b9d-aa6f-0047609cc728/contacts
== Optimized Logical Plan ==
Aggregate [count(id#191048) AS count(id)#191093L]
+- Project [id#191048]
+- Filter ((isnotnull(name#191049.last) AND isnotnull(name#191049.middle)) AND (name#191049.last = Jones))
+- RelationV2[id#191048, name#191049] parquet file:/tmp/spark-f64d6944-952e-4b9d-aa6f-0047609cc728/contacts
== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=true
+- == Final Plan ==
VeloxColumnarToRow
+- ^(12615) HashAggregateTransformer(keys=[], functions=[count(id#191048)], isStreamingAgg=false, output=[count(id)#191093L])
+- ^(12615) InputIteratorTransformer[count#191098L]
+- ShuffleQueryStage 0
+- ColumnarExchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=1041705], [shuffle_writer_type=hash], [OUTPUT] List(count:LongType), [OUTPUT] List(count:LongType)
+- VeloxResizeBatches 1024, 2147483647
+- ^(12614) FlushableHashAggregateTransformer(keys=[], functions=[partial_count(id#191048)], isStreamingAgg=false, output=[count#191098L])
+- ^(12614) ProjectExecTransformer [id#191048]
+- ^(12614) FilterExecTransformer ((isnotnull(name#191049.last) AND isnotnull(name#191049.middle)) AND (name#191049.last = Jones))
+- ^(12614) BatchScanTransformer parquet file:/tmp/spark-f64d6944-952e-4b9d-aa6f-0047609cc728/contacts[id#191048, name#191049] ParquetScan DataFilters: [isnotnull(name#191049.last), isnotnull(name#191049.middle), (name#191049.last = Jones)], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-f64d6944-952e-4b9d-aa6f-0047609cc728/contacts], PartitionFilters: [], PushedAggregation: [], PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], PushedGroupBy: [], ReadSchema: struct<id:int,name:struct<middle:string,last:string>> RuntimeFilters: []
+- == Initial Plan ==
HashAggregate(keys=[], functions=[count(id#191048)], output=[count(id)#191093L])
+- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=1041649]
+- HashAggregate(keys=[], functions=[partial_count(id#191048)], output=[count#191098L])
+- Project [id#191048]
+- Filter ((isnotnull(name#191049.last) AND isnotnull(name#191049.middle)) AND (name#191049.last = Jones))
+- BatchScan parquet file:/tmp/spark-f64d6944-952e-4b9d-aa6f-0047609cc728/contacts[id#191048, name#191049] ParquetScan DataFilters: [isnotnull(name#191049.last), isnotnull(name#191049.middle), (name#191049.last = Jones)], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-f64d6944-952e-4b9d-aa6f-0047609cc728/contacts], PartitionFilters: [], PushedAggregation: [], PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], PushedGroupBy: [], ReadSchema: struct<id:int,name:struct<middle:string,last:string>> RuntimeFilters: []
== Results ==
!== Correct Answer - 1 == == Spark Answer - 1 ==
!struct<> struct<count(id):bigint>
![0] [2]
|
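The V1 suites above scan through FileScanTransformer, while the V2 suites scan through the BatchScan nodes (BatchScanTransformer / BatchScanExecTransformer over ParquetScan). In stock Spark that split is controlled by the V1 fallback list; a sketch, with the conf key taken from Spark and the usage assumed:

// Keeping "parquet" in the list selects the V1 FileScan path; removing
// it selects the DataSource V2 BatchScan path exercised by the
// GlutenParquetV2SchemaPruningSuite failures.
spark.conf.set("spark.sql.sources.useV1SourceList", "parquet")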
GlutenParquetV2SchemaPruningSuite.Spark vectorized reader - with partition data column - select one complex field and having is null predicate on another complex field:
org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV2SchemaPruningSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:
== Parsed Logical Plan ==
'Project [unresolvedalias(count('id), Some(org.apache.spark.sql.Column$$Lambda$15328/1048723032@233ccb3b))]
+- Filter (last#191251 = Jones)
+- Project [id#191213, name#191214.first AS first#191249, name#191214.middle AS middle#191250, name#191214.last AS last#191251]
+- Filter isnotnull(name#191214.middle)
+- Project [id#191213, name#191214, address#191215, pets#191216, friends#191217, relatives#191218, employer#191219, relations#191220, p#191221]
+- SubqueryAlias contacts
+- View (`contacts`, [id#191213,name#191214,address#191215,pets#191216,friends#191217,relatives#191218,employer#191219,relations#191220,p#191221])
+- RelationV2[id#191213, name#191214, address#191215, pets#191216, friends#191217, relatives#191218, employer#191219, relations#191220, p#191221] parquet file:/tmp/spark-02ba9fbd-5890-47a0-9167-739ee699afd8/contacts
== Analyzed Logical Plan ==
count(id): bigint
Aggregate [count(id#191213) AS count(id)#191258L]
+- Filter (last#191251 = Jones)
+- Project [id#191213, name#191214.first AS first#191249, name#191214.middle AS middle#191250, name#191214.last AS last#191251]
+- Filter isnotnull(name#191214.middle)
+- Project [id#191213, name#191214, address#191215, pets#191216, friends#191217, relatives#191218, employer#191219, relations#191220, p#191221]
+- SubqueryAlias contacts
+- View (`contacts`, [id#191213,name#191214,address#191215,pets#191216,friends#191217,relatives#191218,employer#191219,relations#191220,p#191221])
+- RelationV2[id#191213, name#191214, address#191215, pets#191216, friends#191217, relatives#191218, employer#191219, relations#191220, p#191221] parquet file:/tmp/spark-02ba9fbd-5890-47a0-9167-739ee699afd8/contacts
== Optimized Logical Plan ==
Aggregate [count(id#191213) AS count(id)#191258L]
+- Project [id#191213]
+- Filter ((isnotnull(name#191214.last) AND isnotnull(name#191214.middle)) AND (name#191214.last = Jones))
+- RelationV2[id#191213, name#191214] parquet file:/tmp/spark-02ba9fbd-5890-47a0-9167-739ee699afd8/contacts
== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=true
+- == Final Plan ==
VeloxColumnarToRow
+- ^(12619) HashAggregateTransformer(keys=[], functions=[count(id#191213)], isStreamingAgg=false, output=[count(id)#191258L])
+- ^(12619) InputIteratorTransformer[count#191263L]
+- ShuffleQueryStage 0
+- ColumnarExchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=1041975], [shuffle_writer_type=hash], [OUTPUT] List(count:LongType), [OUTPUT] List(count:LongType)
+- VeloxResizeBatches 1024, 2147483647
+- ^(12618) FlushableHashAggregateTransformer(keys=[], functions=[partial_count(id#191213)], isStreamingAgg=false, output=[count#191263L])
+- ^(12618) ProjectExecTransformer [id#191213]
+- ^(12618) FilterExecTransformer ((isnotnull(name#191214.last) AND isnotnull(name#191214.middle)) AND (name#191214.last = Jones))
+- ^(12618) BatchScanTransformer parquet file:/tmp/spark-02ba9fbd-5890-47a0-9167-739ee699afd8/contacts[id#191213, name#191214] ParquetScan DataFilters: [isnotnull(name#191214.last), isnotnull(name#191214.middle), (name#191214.last = Jones)], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-02ba9fbd-5890-47a0-9167-739ee699afd8/contacts], PartitionFilters: [], PushedAggregation: [], PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], PushedGroupBy: [], ReadSchema: struct<id:int,name:struct<middle:string,last:string>> RuntimeFilters: []
+- == Initial Plan ==
HashAggregate(keys=[], functions=[count(id#191213)], output=[count(id)#191258L])
+- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=1041919]
+- HashAggregate(keys=[], functions=[partial_count(id#191213)], output=[count#191263L])
+- Project [id#191213]
+- Filter ((isnotnull(name#191214.last) AND isnotnull(name#191214.middle)) AND (name#191214.last = Jones))
+- BatchScan parquet file:/tmp/spark-02ba9fbd-5890-47a0-9167-739ee699afd8/contacts[id#191213, name#191214] ParquetScan DataFilters: [isnotnull(name#191214.last), isnotnull(name#191214.middle), (name#191214.last = Jones)], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-02ba9fbd-5890-47a0-9167-739ee699afd8/contacts], PartitionFilters: [], PushedAggregation: [], PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], PushedGroupBy: [], ReadSchema: struct<id:int,name:struct<middle:string,last:string>> RuntimeFilters: []
== Results ==
!== Correct Answer - 1 == == Spark Answer - 1 ==
!struct<> struct<count(id):bigint>
![0] [2]
|
GlutenParquetV2SchemaPruningSuite.Non-vectorized reader - without partition data column - select one complex field and having is null predicate on another complex field:
org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV2SchemaPruningSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:
== Parsed Logical Plan ==
'Project [unresolvedalias(count('id), Some(org.apache.spark.sql.Column$$Lambda$15328/1048723032@233ccb3b))]
+- Filter (last#191410 = Jones)
+- Project [id#191372, name#191373.first AS first#191408, name#191373.middle AS middle#191409, name#191373.last AS last#191410]
+- Filter isnotnull(name#191373.middle)
+- Project [id#191372, name#191373, address#191374, pets#191375, friends#191376, relatives#191377, employer#191378, relations#191379, p#191380]
+- SubqueryAlias contacts
+- View (`contacts`, [id#191372,name#191373,address#191374,pets#191375,friends#191376,relatives#191377,employer#191378,relations#191379,p#191380])
+- RelationV2[id#191372, name#191373, address#191374, pets#191375, friends#191376, relatives#191377, employer#191378, relations#191379, p#191380] parquet file:/tmp/spark-349b25a0-d4fc-48d9-89bb-59ad23f047bd/contacts
== Analyzed Logical Plan ==
count(id): bigint
Aggregate [count(id#191372) AS count(id)#191417L]
+- Filter (last#191410 = Jones)
+- Project [id#191372, name#191373.first AS first#191408, name#191373.middle AS middle#191409, name#191373.last AS last#191410]
+- Filter isnotnull(name#191373.middle)
+- Project [id#191372, name#191373, address#191374, pets#191375, friends#191376, relatives#191377, employer#191378, relations#191379, p#191380]
+- SubqueryAlias contacts
+- View (`contacts`, [id#191372,name#191373,address#191374,pets#191375,friends#191376,relatives#191377,employer#191378,relations#191379,p#191380])
+- RelationV2[id#191372, name#191373, address#191374, pets#191375, friends#191376, relatives#191377, employer#191378, relations#191379, p#191380] parquet file:/tmp/spark-349b25a0-d4fc-48d9-89bb-59ad23f047bd/contacts
== Optimized Logical Plan ==
Aggregate [count(id#191372) AS count(id)#191417L]
+- Project [id#191372]
+- Filter ((isnotnull(name#191373.last) AND isnotnull(name#191373.middle)) AND (name#191373.last = Jones))
+- RelationV2[id#191372, name#191373] parquet file:/tmp/spark-349b25a0-d4fc-48d9-89bb-59ad23f047bd/contacts
== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=true
+- == Final Plan ==
VeloxColumnarToRow
+- ^(12623) HashAggregateTransformer(keys=[], functions=[count(id#191372)], isStreamingAgg=false, output=[count(id)#191417L])
+- ^(12623) InputIteratorTransformer[count#191422L]
+- ShuffleQueryStage 0
+- ColumnarExchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=1042226], [shuffle_writer_type=hash], [OUTPUT] List(count:LongType), [OUTPUT] List(count:LongType)
+- VeloxResizeBatches 1024, 2147483647
+- ^(12622) FlushableHashAggregateTransformer(keys=[], functions=[partial_count(id#191372)], isStreamingAgg=false, output=[count#191422L])
+- ^(12622) ProjectExecTransformer [id#191372]
+- ^(12622) FilterExecTransformer ((isnotnull(name#191373.last) AND isnotnull(name#191373.middle)) AND (name#191373.last = Jones))
+- ^(12622) BatchScanTransformer parquet file:/tmp/spark-349b25a0-d4fc-48d9-89bb-59ad23f047bd/contacts[id#191372, name#191373] ParquetScan DataFilters: [isnotnull(name#191373.last), isnotnull(name#191373.middle), (name#191373.last = Jones)], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-349b25a0-d4fc-48d9-89bb-59ad23f047bd/contacts], PartitionFilters: [], PushedAggregation: [], PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], PushedGroupBy: [], ReadSchema: struct<id:int,name:struct<middle:string,last:string>> RuntimeFilters: []
+- == Initial Plan ==
HashAggregate(keys=[], functions=[count(id#191372)], output=[count(id)#191417L])
+- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=1042189]
+- HashAggregate(keys=[], functions=[partial_count(id#191372)], output=[count#191422L])
+- Project [id#191372]
+- Filter ((isnotnull(name#191373.last) AND isnotnull(name#191373.middle)) AND (name#191373.last = Jones))
+- BatchScan parquet file:/tmp/spark-349b25a0-d4fc-48d9-89bb-59ad23f047bd/contacts[id#191372, name#191373] ParquetScan DataFilters: [isnotnull(name#191373.last), isnotnull(name#191373.middle), (name#191373.last = Jones)], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-349b25a0-d4fc-48d9-89bb-59ad23f047bd/contacts], PartitionFilters: [], PushedAggregation: [], PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], PushedGroupBy: [], ReadSchema: struct<id:int,name:struct<middle:string,last:string>> RuntimeFilters: []
== Results ==
!== Correct Answer - 1 ==   == Spark Answer - 1 ==
!struct<>                   struct<count(id):bigint>
![0]                        [2]
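For context, the failing check above boils down to a query of roughly the following shape. This is a minimal sketch, not the suite's code: the contacts view and its name: struct<first, middle, last> column are inferred from the plans, and the parquet path is hypothetical.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("nested-pushdown-repro")
      .master("local[1]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical dataset with a name: struct<first,middle,last> column.
    spark.read.parquet("/tmp/contacts").createOrReplaceTempView("contacts")

    val q = spark.sql(
      """SELECT id, name.first AS first, name.middle AS middle, name.last AS last
        |FROM contacts
        |WHERE name.middle IS NOT NULL""".stripMargin)

    // Per the "Correct Answer" column above this count should be 0; the Gluten
    // path, with PushedFilters [IsNotNull(name.middle), EqualTo(name.last,Jones)],
    // returns 2 -- consistent with the pushed nested-field filters being
    // mis-applied by the native scan.
    println(q.filter($"last" === "Jones").count())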
GlutenParquetV2SchemaPruningSuite.Non-vectorized reader - with partition data column - select one complex field and having is null predicate on another complex field:
org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV2SchemaPruningSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:
== Parsed Logical Plan ==
'Project [unresolvedalias(count('id), Some(org.apache.spark.sql.Column$$Lambda$15328/1048723032@233ccb3b))]
+- Filter (last#191575 = Jones)
+- Project [id#191537, name#191538.first AS first#191573, name#191538.middle AS middle#191574, name#191538.last AS last#191575]
+- Filter isnotnull(name#191538.middle)
+- Project [id#191537, name#191538, address#191539, pets#191540, friends#191541, relatives#191542, employer#191543, relations#191544, p#191545]
+- SubqueryAlias contacts
+- View (`contacts`, [id#191537,name#191538,address#191539,pets#191540,friends#191541,relatives#191542,employer#191543,relations#191544,p#191545])
+- RelationV2[id#191537, name#191538, address#191539, pets#191540, friends#191541, relatives#191542, employer#191543, relations#191544, p#191545] parquet file:/tmp/spark-becb63af-cf85-4198-9262-598e2681ac32/contacts
== Analyzed Logical Plan ==
count(id): bigint
Aggregate [count(id#191537) AS count(id)#191582L]
+- Filter (last#191575 = Jones)
+- Project [id#191537, name#191538.first AS first#191573, name#191538.middle AS middle#191574, name#191538.last AS last#191575]
+- Filter isnotnull(name#191538.middle)
+- Project [id#191537, name#191538, address#191539, pets#191540, friends#191541, relatives#191542, employer#191543, relations#191544, p#191545]
+- SubqueryAlias contacts
+- View (`contacts`, [id#191537,name#191538,address#191539,pets#191540,friends#191541,relatives#191542,employer#191543,relations#191544,p#191545])
+- RelationV2[id#191537, name#191538, address#191539, pets#191540, friends#191541, relatives#191542, employer#191543, relations#191544, p#191545] parquet file:/tmp/spark-becb63af-cf85-4198-9262-598e2681ac32/contacts
== Optimized Logical Plan ==
Aggregate [count(id#191537) AS count(id)#191582L]
+- Project [id#191537]
+- Filter ((isnotnull(name#191538.last) AND isnotnull(name#191538.middle)) AND (name#191538.last = Jones))
+- RelationV2[id#191537, name#191538] parquet file:/tmp/spark-becb63af-cf85-4198-9262-598e2681ac32/contacts
== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=true
+- == Final Plan ==
VeloxColumnarToRow
+- ^(12627) HashAggregateTransformer(keys=[], functions=[count(id#191537)], isStreamingAgg=false, output=[count(id)#191582L])
+- ^(12627) InputIteratorTransformer[count#191587L]
+- ShuffleQueryStage 0
+- ColumnarExchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=1042458], [shuffle_writer_type=hash], [OUTPUT] List(count:LongType), [OUTPUT] List(count:LongType)
+- VeloxResizeBatches 1024, 2147483647
+- ^(12626) FlushableHashAggregateTransformer(keys=[], functions=[partial_count(id#191537)], isStreamingAgg=false, output=[count#191587L])
+- ^(12626) ProjectExecTransformer [id#191537]
+- ^(12626) FilterExecTransformer ((isnotnull(name#191538.last) AND isnotnull(name#191538.middle)) AND (name#191538.last = Jones))
+- ^(12626) BatchScanTransformer parquet file:/tmp/spark-becb63af-cf85-4198-9262-598e2681ac32/contacts[id#191537, name#191538] ParquetScan DataFilters: [isnotnull(name#191538.last), isnotnull(name#191538.middle), (name#191538.last = Jones)], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-becb63af-cf85-4198-9262-598e2681ac32/contacts], PartitionFilters: [], PushedAggregation: [], PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], PushedGroupBy: [], ReadSchema: struct<id:int,name:struct<middle:string,last:string>> RuntimeFilters: []
+- == Initial Plan ==
HashAggregate(keys=[], functions=[count(id#191537)], output=[count(id)#191582L])
+- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=1042421]
+- HashAggregate(keys=[], functions=[partial_count(id#191537)], output=[count#191587L])
+- Project [id#191537]
+- Filter ((isnotnull(name#191538.last) AND isnotnull(name#191538.middle)) AND (name#191538.last = Jones))
+- BatchScan parquet file:/tmp/spark-becb63af-cf85-4198-9262-598e2681ac32/contacts[id#191537, name#191538] ParquetScan DataFilters: [isnotnull(name#191538.last), isnotnull(name#191538.middle), (name#191538.last = Jones)], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-becb63af-cf85-4198-9262-598e2681ac32/contacts], PartitionFilters: [], PushedAggregation: [], PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], PushedGroupBy: [], ReadSchema: struct<id:int,name:struct<middle:string,last:string>> RuntimeFilters: []
== Results ==
!== Correct Answer - 1 ==   == Spark Answer - 1 ==
!struct<>                   struct<count(id):bigint>
![0]                        [2]
GlutenParquetV2SchemaPruningSuite.Spark vectorized reader - without partition data column - SPARK-34963: extract case-insensitive struct field from struct:
org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV2SchemaPruningSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:
== Parsed Logical Plan ==
'Project ['Name.First, 'NAME.MiDDle]
+- Filter isnotnull(Name#204832.MIDDLE)
+- SubqueryAlias contacts
+- View (`contacts`, [id#204831,name#204832,address#204833,pets#204834,friends#204835,relatives#204836,employer#204837,relations#204838,p#204839])
+- RelationV2[id#204831, name#204832, address#204833, pets#204834, friends#204835, relatives#204836, employer#204837, relations#204838, p#204839] parquet file:/tmp/spark-57ee7c84-631a-4f01-952e-f1cb78b3685f/contacts
== Analyzed Logical Plan ==
First: string, MiDDle: string
Project [Name#204832.First AS First#204892, NAME#204832.MiDDle AS MiDDle#204893]
+- Filter isnotnull(Name#204832.MIDDLE)
+- SubqueryAlias contacts
+- View (`contacts`, [id#204831,name#204832,address#204833,pets#204834,friends#204835,relatives#204836,employer#204837,relations#204838,p#204839])
+- RelationV2[id#204831, name#204832, address#204833, pets#204834, friends#204835, relatives#204836, employer#204837, relations#204838, p#204839] parquet file:/tmp/spark-57ee7c84-631a-4f01-952e-f1cb78b3685f/contacts
== Optimized Logical Plan ==
Project [name#204832.first AS First#204892, name#204832.middle AS MiDDle#204893]
+- Filter isnotnull(name#204832.middle)
+- RelationV2[name#204832] parquet file:/tmp/spark-57ee7c84-631a-4f01-952e-f1cb78b3685f/contacts
== Physical Plan ==
VeloxColumnarToRow
+- ^(13271) ProjectExecTransformer [name#204832.first AS First#204892, name#204832.middle AS MiDDle#204893]
+- ^(13271) FilterExecTransformer isnotnull(name#204832.middle)
+- ^(13271) BatchScanTransformer parquet file:/tmp/spark-57ee7c84-631a-4f01-952e-f1cb78b3685f/contacts[name#204832] ParquetScan DataFilters: [isnotnull(name#204832.MIDDLE)], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-57ee7c84-631a-4f01-952e-f1cb78b3685f/contacts], PartitionFilters: [], PushedAggregation: [], PushedFilters: [IsNotNull(name.middle)], PushedGroupBy: [], ReadSchema: struct<name:struct<first:string,middle:string>> RuntimeFilters: []
== Results ==
!== Correct Answer - 2 ==   == Spark Answer - 4 ==
!struct<>                   struct<First:string,MiDDle:string>
 [Jane,X.]                  [Jane,X.]
![John,Y.]                  [Janet,null]
!                           [Jim,null]
!                           [John,Y.]
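The four SPARK-34963 failures in this run share one signature: the analyzer resolves the mixed-case references case-insensitively, the DataFilter keeps the query's casing (isnotnull(name#....MIDDLE)) while the pushed filter is normalized to IsNotNull(name.middle), and the scan then returns rows whose middle field is null. A minimal sketch of the query shape, reusing the hypothetical SparkSession and contacts view from the sketch above:

    // Case-insensitive resolution is Spark's default; set it explicitly here.
    spark.conf.set("spark.sql.caseSensitive", "false")

    val q2 = spark.sql(
      """SELECT Name.First, NAME.MiDDle
        |FROM contacts
        |WHERE Name.MIDDLE IS NOT NULL""".stripMargin)

    q2.show()
    // Correct Answer (2 rows): [Jane,X.], [John,Y.]
    // Spark Answer (4 rows):   additionally [Janet,null] and [Jim,null] --
    // rows with a null middle, so the IS NOT NULL predicate on the nested
    // field was effectively not enforced.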
GlutenParquetV2SchemaPruningSuite.Spark vectorized reader - with partition data column - SPARK-34963: extract case-insensitive struct field from struct:
org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV2SchemaPruningSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:
== Parsed Logical Plan ==
'Project ['Name.First, 'NAME.MiDDle]
+- Filter isnotnull(Name#204960.MIDDLE)
+- SubqueryAlias contacts
+- View (`contacts`, [id#204959,name#204960,address#204961,pets#204962,friends#204963,relatives#204964,employer#204965,relations#204966,p#204967])
+- RelationV2[id#204959, name#204960, address#204961, pets#204962, friends#204963, relatives#204964, employer#204965, relations#204966, p#204967] parquet file:/tmp/spark-8fdf0f4a-9fac-4625-8ca0-76228fe87439/contacts
== Analyzed Logical Plan ==
First: string, MiDDle: string
Project [Name#204960.First AS First#205020, NAME#204960.MiDDle AS MiDDle#205021]
+- Filter isnotnull(Name#204960.MIDDLE)
+- SubqueryAlias contacts
+- View (`contacts`, [id#204959,name#204960,address#204961,pets#204962,friends#204963,relatives#204964,employer#204965,relations#204966,p#204967])
+- RelationV2[id#204959, name#204960, address#204961, pets#204962, friends#204963, relatives#204964, employer#204965, relations#204966, p#204967] parquet file:/tmp/spark-8fdf0f4a-9fac-4625-8ca0-76228fe87439/contacts
== Optimized Logical Plan ==
Project [name#204960.first AS First#205020, name#204960.middle AS MiDDle#205021]
+- Filter isnotnull(name#204960.middle)
+- RelationV2[name#204960] parquet file:/tmp/spark-8fdf0f4a-9fac-4625-8ca0-76228fe87439/contacts
== Physical Plan ==
VeloxColumnarToRow
+- ^(13275) ProjectExecTransformer [name#204960.first AS First#205020, name#204960.middle AS MiDDle#205021]
+- ^(13275) FilterExecTransformer isnotnull(name#204960.middle)
+- ^(13275) BatchScanTransformer parquet file:/tmp/spark-8fdf0f4a-9fac-4625-8ca0-76228fe87439/contacts[name#204960] ParquetScan DataFilters: [isnotnull(name#204960.MIDDLE)], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-8fdf0f4a-9fac-4625-8ca0-76228fe87439/contacts], PartitionFilters: [], PushedAggregation: [], PushedFilters: [IsNotNull(name.middle)], PushedGroupBy: [], ReadSchema: struct<name:struct<first:string,middle:string>> RuntimeFilters: []
== Results ==
!== Correct Answer - 2 ==   == Spark Answer - 4 ==
!struct<>                   struct<First:string,MiDDle:string>
 [Jane,X.]                  [Jane,X.]
![John,Y.]                  [Janet,null]
!                           [Jim,null]
!                           [John,Y.]
GlutenParquetV2SchemaPruningSuite.Non-vectorized reader - without partition data column - SPARK-34963: extract case-insensitive struct field from struct:
org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV2SchemaPruningSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:
== Parsed Logical Plan ==
'Project ['Name.First, 'NAME.MiDDle]
+- Filter isnotnull(Name#205082.MIDDLE)
+- SubqueryAlias contacts
+- View (`contacts`, [id#205081,name#205082,address#205083,pets#205084,friends#205085,relatives#205086,employer#205087,relations#205088,p#205089])
+- RelationV2[id#205081, name#205082, address#205083, pets#205084, friends#205085, relatives#205086, employer#205087, relations#205088, p#205089] parquet file:/tmp/spark-0c9858aa-c8a9-4c26-8722-2e6a486f00ee/contacts
== Analyzed Logical Plan ==
First: string, MiDDle: string
Project [Name#205082.First AS First#205142, NAME#205082.MiDDle AS MiDDle#205143]
+- Filter isnotnull(Name#205082.MIDDLE)
+- SubqueryAlias contacts
+- View (`contacts`, [id#205081,name#205082,address#205083,pets#205084,friends#205085,relatives#205086,employer#205087,relations#205088,p#205089])
+- RelationV2[id#205081, name#205082, address#205083, pets#205084, friends#205085, relatives#205086, employer#205087, relations#205088, p#205089] parquet file:/tmp/spark-0c9858aa-c8a9-4c26-8722-2e6a486f00ee/contacts
== Optimized Logical Plan ==
Project [name#205082.first AS First#205142, name#205082.middle AS MiDDle#205143]
+- Filter isnotnull(name#205082.middle)
+- RelationV2[name#205082] parquet file:/tmp/spark-0c9858aa-c8a9-4c26-8722-2e6a486f00ee/contacts
== Physical Plan ==
VeloxColumnarToRow
+- ^(13279) ProjectExecTransformer [name#205082.first AS First#205142, name#205082.middle AS MiDDle#205143]
+- ^(13279) FilterExecTransformer isnotnull(name#205082.middle)
+- ^(13279) BatchScanTransformer parquet file:/tmp/spark-0c9858aa-c8a9-4c26-8722-2e6a486f00ee/contacts[name#205082] ParquetScan DataFilters: [isnotnull(name#205082.MIDDLE)], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-0c9858aa-c8a9-4c26-8722-2e6a486f00ee/contacts], PartitionFilters: [], PushedAggregation: [], PushedFilters: [IsNotNull(name.middle)], PushedGroupBy: [], ReadSchema: struct<name:struct<first:string,middle:string>> RuntimeFilters: []
== Results ==
!== Correct Answer - 2 ==   == Spark Answer - 4 ==
!struct<>                   struct<First:string,MiDDle:string>
 [Jane,X.]                  [Jane,X.]
![John,Y.]                  [Janet,null]
!                           [Jim,null]
!                           [John,Y.]
GlutenParquetV2SchemaPruningSuite.Non-vectorized reader - with partition data column - SPARK-34963: extract case-insensitive struct field from struct:
org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV2SchemaPruningSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:
== Parsed Logical Plan ==
'Project ['Name.First, 'NAME.MiDDle]
+- Filter isnotnull(Name#205210.MIDDLE)
+- SubqueryAlias contacts
+- View (`contacts`, [id#205209,name#205210,address#205211,pets#205212,friends#205213,relatives#205214,employer#205215,relations#205216,p#205217])
+- RelationV2[id#205209, name#205210, address#205211, pets#205212, friends#205213, relatives#205214, employer#205215, relations#205216, p#205217] parquet file:/tmp/spark-63b14327-07e6-4b3f-8bc0-50d844ceef35/contacts
== Analyzed Logical Plan ==
First: string, MiDDle: string
Project [Name#205210.First AS First#205270, NAME#205210.MiDDle AS MiDDle#205271]
+- Filter isnotnull(Name#205210.MIDDLE)
+- SubqueryAlias contacts
+- View (`contacts`, [id#205209,name#205210,address#205211,pets#205212,friends#205213,relatives#205214,employer#205215,relations#205216,p#205217])
+- RelationV2[id#205209, name#205210, address#205211, pets#205212, friends#205213, relatives#205214, employer#205215, relations#205216, p#205217] parquet file:/tmp/spark-63b14327-07e6-4b3f-8bc0-50d844ceef35/contacts
== Optimized Logical Plan ==
Project [name#205210.first AS First#205270, name#205210.middle AS MiDDle#205271]
+- Filter isnotnull(name#205210.middle)
+- RelationV2[name#205210] parquet file:/tmp/spark-63b14327-07e6-4b3f-8bc0-50d844ceef35/contacts
== Physical Plan ==
VeloxColumnarToRow
+- ^(13283) ProjectExecTransformer [name#205210.first AS First#205270, name#205210.middle AS MiDDle#205271]
+- ^(13283) FilterExecTransformer isnotnull(name#205210.middle)
+- ^(13283) BatchScanTransformer parquet file:/tmp/spark-63b14327-07e6-4b3f-8bc0-50d844ceef35/contacts[name#205210] ParquetScan DataFilters: [isnotnull(name#205210.MIDDLE)], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-63b14327-07e6-4b3f-8bc0-50d844ceef35/contacts], PartitionFilters: [], PushedAggregation: [], PushedFilters: [IsNotNull(name.middle)], PushedGroupBy: [], ReadSchema: struct<name:struct<first:string,middle:string>> RuntimeFilters: []
== Results ==
!== Correct Answer - 2 ==   == Spark Answer - 4 ==
!struct<>                   struct<First:string,MiDDle:string>
 [Jane,X.]                  [Jane,X.]
![John,Y.]                  [Janet,null]
!                           [Jim,null]
!                           [John,Y.]
GlutenParquetV1SchemaPruningSuite.Spark vectorized reader - without partition data column - select one complex field and having is null predicate on another complex field:
org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV1SchemaPruningSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:
== Parsed Logical Plan ==
'Project [unresolvedalias(count('id), Some(org.apache.spark.sql.Column$$Lambda$15841/1926373532@2850e296))]
+- Filter (last#301405 = Jones)
+- Project [id#301367, name#301368.first AS first#301403, name#301368.middle AS middle#301404, name#301368.last AS last#301405]
+- Filter isnotnull(name#301368.middle)
+- Project [id#301367, name#301368, address#301369, pets#301370, friends#301371, relatives#301372, employer#301373, relations#301374, p#301375]
+- SubqueryAlias contacts
+- View (`contacts`, [id#301367,name#301368,address#301369,pets#301370,friends#301371,relatives#301372,employer#301373,relations#301374,p#301375])
+- Relation [id#301367,name#301368,address#301369,pets#301370,friends#301371,relatives#301372,employer#301373,relations#301374,p#301375] parquet
== Analyzed Logical Plan ==
count(id): bigint
Aggregate [count(id#301367) AS count(id)#301412L]
+- Filter (last#301405 = Jones)
+- Project [id#301367, name#301368.first AS first#301403, name#301368.middle AS middle#301404, name#301368.last AS last#301405]
+- Filter isnotnull(name#301368.middle)
+- Project [id#301367, name#301368, address#301369, pets#301370, friends#301371, relatives#301372, employer#301373, relations#301374, p#301375]
+- SubqueryAlias contacts
+- View (`contacts`, [id#301367,name#301368,address#301369,pets#301370,friends#301371,relatives#301372,employer#301373,relations#301374,p#301375])
+- Relation [id#301367,name#301368,address#301369,pets#301370,friends#301371,relatives#301372,employer#301373,relations#301374,p#301375] parquet
== Optimized Logical Plan ==
Aggregate [count(id#301367) AS count(id)#301412L]
+- Project [id#301367]
+- Filter ((isnotnull(name#301368.last) AND isnotnull(name#301368.middle)) AND (name#301368.last = Jones))
+- Relation [id#301367,name#301368,address#301369,pets#301370,friends#301371,relatives#301372,employer#301373,relations#301374,p#301375] parquet
== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=true
+- == Final Plan ==
VeloxColumnarToRow
+- ^(22976) HashAggregateTransformer(keys=[], functions=[count(id#301367)], isStreamingAgg=false, output=[count(id)#301412L])
+- ^(22976) InputIteratorTransformer[count#301424L]
+- ShuffleQueryStage 0
+- ColumnarExchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=1467561], [shuffle_writer_type=hash], [OUTPUT] List(count:LongType), [OUTPUT] List(count:LongType)
+- VeloxResizeBatches 1024, 2147483647
+- ^(22975) FlushableHashAggregateTransformer(keys=[], functions=[partial_count(id#301367)], isStreamingAgg=false, output=[count#301424L])
+- ^(22975) ProjectExecTransformer [id#301367]
+- ^(22975) FilterExecTransformer ((isnotnull(name#301368.last) AND isnotnull(name#301368.middle)) AND (name#301368.last = Jones))
+- ^(22975) FileScanTransformer parquet [id#301367,name#301368,p#301375] Batched: true, DataFilters: [isnotnull(name#301368.last), isnotnull(name#301368.middle), (name#301368.last = Jones)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-e190383e-cba2-470b-87b9-3db3c0f0c1d7/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], ReadSchema: struct<id:int,name:struct<middle:string,last:string>>
+- == Initial Plan ==
HashAggregate(keys=[], functions=[count(id#301367)], output=[count(id)#301412L])
+- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=1467503]
+- HashAggregate(keys=[], functions=[partial_count(id#301367)], output=[count#301424L])
+- Project [id#301367]
+- Filter ((isnotnull(name#301368.last) AND isnotnull(name#301368.middle)) AND (name#301368.last = Jones))
+- FileScan parquet [id#301367,name#301368,p#301375] Batched: true, DataFilters: [isnotnull(name#301368.last), isnotnull(name#301368.middle), (name#301368.last = Jones)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-e190383e-cba2-470b-87b9-3db3c0f0c1d7/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], ReadSchema: struct<id:int,name:struct<middle:string,last:string>>
== Results ==
!== Correct Answer - 1 ==   == Spark Answer - 1 ==
!struct<>                   struct<count(id):bigint>
![0]                        [2]
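The failure above and the one that follows are the same query shape as the first two blocks, exercised through the Parquet V1 code path (FileScan / FileScanTransformer) rather than the V2 path (BatchScan / BatchScanTransformer); the pushed filters and the wrong count of 2 are identical. Assuming the two suite variants differ in the usual way, the paths can be toggled with Spark's V1 source list:

    // Sources named here use the V1 FileScan path; removing "parquet" from the
    // list routes reads through the V2 BatchScan path instead.
    spark.conf.set("spark.sql.sources.useV1SourceList", "parquet") // V1 path
    // spark.conf.set("spark.sql.sources.useV1SourceList", "")     // V2 path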
GlutenParquetV1SchemaPruningSuite.Spark vectorized reader - with partition data column - select one complex field and having is null predicate on another complex field:
org/apache/spark/sql/execution/datasources/parquet/GlutenParquetV1SchemaPruningSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:
== Parsed Logical Plan ==
'Project [unresolvedalias(count('id), Some(org.apache.spark.sql.Column$$Lambda$15841/1926373532@2850e296))]
+- Filter (last#301530 = Jones)
+- Project [id#301492, name#301493.first AS first#301528, name#301493.middle AS middle#301529, name#301493.last AS last#301530]
+- Filter isnotnull(name#301493.middle)
+- Project [id#301492, name#301493, address#301494, pets#301495, friends#301496, relatives#301497, employer#301498, relations#301499, p#301500]
+- SubqueryAlias contacts
+- View (`contacts`, [id#301492,name#301493,address#301494,pets#301495,friends#301496,relatives#301497,employer#301498,relations#301499,p#301500])
+- Relation [id#301492,name#301493,address#301494,pets#301495,friends#301496,relatives#301497,employer#301498,relations#301499,p#301500] parquet
== Analyzed Logical Plan ==
count(id): bigint
Aggregate [count(id#301492) AS count(id)#301537L]
+- Filter (last#301530 = Jones)
+- Project [id#301492, name#301493.first AS first#301528, name#301493.middle AS middle#301529, name#301493.last AS last#301530]
+- Filter isnotnull(name#301493.middle)
+- Project [id#301492, name#301493, address#301494, pets#301495, friends#301496, relatives#301497, employer#301498, relations#301499, p#301500]
+- SubqueryAlias contacts
+- View (`contacts`, [id#301492,name#301493,address#301494,pets#301495,friends#301496,relatives#301497,employer#301498,relations#301499,p#301500])
+- Relation [id#301492,name#301493,address#301494,pets#301495,friends#301496,relatives#301497,employer#301498,relations#301499,p#301500] parquet
== Optimized Logical Plan ==
Aggregate [count(id#301492) AS count(id)#301537L]
+- Project [id#301492]
+- Filter ((isnotnull(name#301493.last) AND isnotnull(name#301493.middle)) AND (name#301493.last = Jones))
+- Relation [id#301492,name#301493,address#301494,pets#301495,friends#301496,relatives#301497,employer#301498,relations#301499,p#301500] parquet
== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=true
+- == Final Plan ==
VeloxColumnarToRow
+- ^(22980) HashAggregateTransformer(keys=[], functions=[count(id#301492)], isStreamingAgg=false, output=[count(id)#301537L])
+- ^(22980) InputIteratorTransformer[count#301549L]
+- ShuffleQueryStage 0
+- ColumnarExchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=1467837], [shuffle_writer_type=hash], [OUTPUT] List(count:LongType), [OUTPUT] List(count:LongType)
+- VeloxResizeBatches 1024, 2147483647
+- ^(22979) FlushableHashAggregateTransformer(keys=[], functions=[partial_count(id#301492)], isStreamingAgg=false, output=[count#301549L])
+- ^(22979) ProjectExecTransformer [id#301492]
+- ^(22979) FilterExecTransformer ((isnotnull(name#301493.last) AND isnotnull(name#301493.middle)) AND (name#301493.last = Jones))
+- ^(22979) FileScanTransformer parquet [id#301492,name#301493,p#301500] Batched: true, DataFilters: [isnotnull(name#301493.last), isnotnull(name#301493.middle), (name#301493.last = Jones)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-a5dc5ac3-4825-4281-b9b2-b5a98b302a7b/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], ReadSchema: struct<id:int,name:struct<middle:string,last:string>>
+- == Initial Plan ==
HashAggregate(keys=[], functions=[count(id#301492)], output=[count(id)#301537L])
+- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=1467779]
+- HashAggregate(keys=[], functions=[partial_count(id#301492)], output=[count#301549L])
+- Project [id#301492]
+- Filter ((isnotnull(name#301493.last) AND isnotnull(name#301493.middle)) AND (name#301493.last = Jones))
+- FileScan parquet [id#301492,name#301493,p#301500] Batched: true, DataFilters: [isnotnull(name#301493.last), isnotnull(name#301493.middle), (name#301493.last = Jones)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-a5dc5ac3-4825-4281-b9b2-b5a98b302a7b/contacts], PartitionFilters: [], PushedFilters: [IsNotNull(name.last), IsNotNull(name.middle), EqualTo(name.last,Jones)], ReadSchema: struct<id:int,name:struct<middle:string,last:string>>
== Results ==
!== Correct Answer - 1 ==   == Spark Answer - 1 ==
!struct<>                   struct<count(id):bigint>
![0]                        [2]