Finally, for binary operations, which partitioner is set on the output depends on the parent RDDs’ partitioners. By default, it is a hash partitioner, with the number of partitions set to the level of parallelism of the operation. However, if one of the parents has a partitioner set, it will be that partitioner; and if both parents have a partitioner set, it will be the partitioner of the first parent.
I came across the passage above in Learning Spark.
It says a child RDD's partitioner should be determined by its parent RDDs' partitioners. But in Chapter 2 of SparkInternals, the parent and child RDDs have different partitioners. How can that be? If one of the two parent RDDs has a hash partitioner, shouldn't the child RDD also have a hash partitioner?
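The rule quoted from Learning Spark can be sketched as a tiny, Spark-free model. The names below (`Partitioner`, `HashPartitioner`, `resolve_output_partitioner`) are illustrative stand-ins, not Spark's actual API; the real logic lives in `org.apache.spark.Partitioner.defaultPartitioner`:

```python
# Simplified model of how a binary operation (e.g. join) picks the
# partitioner of its output RDD, per the Learning Spark quote above.
# This is a sketch for illustration, not Spark's implementation.

class Partitioner:
    def __init__(self, num_partitions):
        self.num_partitions = num_partitions

class HashPartitioner(Partitioner):
    pass

def resolve_output_partitioner(first_parent, second_parent, default_parallelism):
    """Return the partitioner for the output of a two-parent operation.

    first_parent / second_parent: the parents' partitioners, or None
    if a parent has no partitioner set.
    """
    # If the first parent has a partitioner, it wins -- even when
    # both parents are partitioned.
    if first_parent is not None:
        return first_parent
    # Otherwise fall back to the second parent's partitioner, if any.
    if second_parent is not None:
        return second_parent
    # Neither parent is partitioned: default to hash partitioning with
    # the number of partitions set to the operation's parallelism.
    return HashPartitioner(default_parallelism)

# Both parents partitioned: the first parent's partitioner is chosen.
p1, p2 = HashPartitioner(4), HashPartitioner(8)
assert resolve_output_partitioner(p1, p2, 16) is p1

# Only the second parent partitioned: its partitioner is chosen.
assert resolve_output_partitioner(None, p2, 16) is p2

# No parent partitioned: a new HashPartitioner with default parallelism.
out = resolve_output_partitioner(None, None, 16)
assert isinstance(out, HashPartitioner) and out.num_partitions == 16
```

Note that in this model the output is always hash-partitioned in the default case, but when a parent's partitioner is inherited, it is inherited as-is, whatever its type, which may explain why a child RDD's partitioner can differ from what you expect.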