Can we make Scio's sortValues more memory-efficient by, ideally, avoiding materializing the output of GroupByKey, which might be lazy? Perhaps we could make a new API, .groupByKeyAndSortValues, that applies GBK + SortValues directly, without translating back and forth between Java/Scala iterables.
A pipeline on Scio 0.14.3 migrated from applying Beam's SortValues transform directly:
to the scio-extra API:
and immediately ran out of memory on Dataflow (DF):
The SorterOps line in question is where we convert the Scala iterable to a Java one: https://github.com/spotify/scio/blob/v0.14.3/scio-extra/src/main/scala/com/spotify/scio/extra/sorter/syntax/SCollectionSyntax.scala#L68
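To make the concern concrete, here is a minimal, standard-library-only sketch of the difference between materializing a lazy iterable during conversion and wrapping it in a lazy view. It is not Scio's or Beam's code: `LazyIterableSketch`, `CountingIterable`, `materialize`, and `view` are hypothetical names used only for illustration, and the sketch assumes the grouped values arrive as an iterable that produces elements on demand.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

public class LazyIterableSketch {

    /** Hypothetical stand-in for a lazy grouped-values iterable: yields 0..size-1 on demand. */
    static final class CountingIterable implements Iterable<Integer> {
        private final int size;
        int pulled = 0; // how many elements consumers have actually requested

        CountingIterable(int size) {
            this.size = size;
        }

        @Override
        public Iterator<Integer> iterator() {
            return new Iterator<Integer>() {
                private int next = 0;

                @Override
                public boolean hasNext() {
                    return next < size;
                }

                @Override
                public Integer next() {
                    if (!hasNext()) {
                        throw new NoSuchElementException();
                    }
                    pulled++;
                    return next++;
                }
            };
        }
    }

    /** Eager conversion: copies every converted element into an in-memory list. */
    static <A, B> List<B> materialize(Iterable<A> source, Function<A, B> convert) {
        List<B> out = new ArrayList<>();
        for (A a : source) {
            out.add(convert.apply(a));
        }
        return out;
    }

    /** Lazy conversion: converts each element only when the consumer asks for it. */
    static <A, B> Iterable<B> view(Iterable<A> source, Function<A, B> convert) {
        return () -> {
            Iterator<A> it = source.iterator();
            return new Iterator<B>() {
                @Override
                public boolean hasNext() {
                    return it.hasNext();
                }

                @Override
                public B next() {
                    return convert.apply(it.next());
                }
            };
        };
    }

    public static void main(String[] args) {
        CountingIterable eagerSource = new CountingIterable(1000);
        materialize(eagerSource, i -> "v" + i);
        System.out.println("eager, pulled during conversion: " + eagerSource.pulled);

        CountingIterable lazySource = new CountingIterable(1000);
        Iterable<String> lazy = view(lazySource, i -> "v" + i);
        System.out.println("lazy, pulled during conversion: " + lazySource.pulled);
        lazy.iterator().next();
        System.out.println("lazy, pulled after one next(): " + lazySource.pulled);
    }
}
```

With the eager path, memory use grows with the size of the group before any downstream step sees a value. With the view, elements are converted one at a time as the consumer iterates. Whether the conversion in `SorterOps` behaves like the eager path is exactly what this issue asks to investigate. A combined GBK + SortValues API would avoid the conversion altogether.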