Commit

deploy: 085b288
RustedBones committed Aug 21, 2024
1 parent 6958055 commit 071ac07
Showing 1,302 changed files with 2,540 additions and 2,514 deletions.
18 changes: 9 additions & 9 deletions Builtin.html
@@ -236,7 +236,7 @@
<ul style="display: none">
<li class="md-nav__item md-version" id="project.version">
<label class="md-nav__link" for="__version">
<i class="md-icon" title="Version">label_outline</i> 0.14.5-35-e785613-20240724T153414Z*
<i class="md-icon" title="Version">label_outline</i> 0.14.6-18-085b288-20240821T085147Z*
</label>
</li>
</ul>
@@ -276,9 +276,9 @@ <h1><a href="#built-in-functionality" name="built-in-functionality" class="ancho
<p>Scio is a thin wrapper on top of Beam offering idiomatic Scala APIs. Check out the <a href="https://beam.apache.org/documentation/programming-guide/">Beam Programming Guide</a> first for a detailed explanation of the Beam programming model and concepts.</p>
<h2><a href="#basics" name="basics" class="anchor"><span class="anchor-link"></span></a>Basics</h2>
<ul>
<li><a href="https://spotify.github.io/scio/api/com/spotify/scio/ScioContext.html" title="com.spotify.scio.ScioContext"><code>ScioContext</code></a> wraps Beam&rsquo;s <a href="https://beam.apache.org/releases/javadoc/2.57.0/?org/apache/beam/sdk/Pipeline.html" title="org.apache.beam.sdk.Pipeline"><code>Pipeline</code></a></li>
<li><a href="https://spotify.github.io/scio/api/com/spotify/scio/values/SCollection.html" title="com.spotify.scio.values.SCollection"><code>SCollection</code></a> wraps Beam&rsquo;s <a href="https://beam.apache.org/releases/javadoc/2.57.0/?org/apache/beam/sdk/values/PCollection.html" title="org.apache.beam.sdk.values.PCollection"><code>PCollection</code></a></li>
<li><a href="https://spotify.github.io/scio/api/com/spotify/scio/ScioResult.html" title="com.spotify.scio.ScioResult"><code>ScioResult</code></a> wraps Beam&rsquo;s <a href="https://beam.apache.org/releases/javadoc/2.57.0/?org/apache/beam/sdk/PipelineResult.html" title="org.apache.beam.sdk.PipelineResult"><code>PipelineResult</code></a></li>
<li><a href="https://spotify.github.io/scio/api/com/spotify/scio/ScioContext.html" title="com.spotify.scio.ScioContext"><code>ScioContext</code></a> wraps Beam&rsquo;s <a href="https://beam.apache.org/releases/javadoc/2.58.1/?org/apache/beam/sdk/Pipeline.html" title="org.apache.beam.sdk.Pipeline"><code>Pipeline</code></a></li>
<li><a href="https://spotify.github.io/scio/api/com/spotify/scio/values/SCollection.html" title="com.spotify.scio.values.SCollection"><code>SCollection</code></a> wraps Beam&rsquo;s <a href="https://beam.apache.org/releases/javadoc/2.58.1/?org/apache/beam/sdk/values/PCollection.html" title="org.apache.beam.sdk.values.PCollection"><code>PCollection</code></a></li>
<li><a href="https://spotify.github.io/scio/api/com/spotify/scio/ScioResult.html" title="com.spotify.scio.ScioResult"><code>ScioResult</code></a> wraps Beam&rsquo;s <a href="https://beam.apache.org/releases/javadoc/2.58.1/?org/apache/beam/sdk/PipelineResult.html" title="org.apache.beam.sdk.PipelineResult"><code>PipelineResult</code></a></li>
</ul>
<p>See dedicated sections on:</p>
<ul>
@@ -337,7 +337,7 @@ <h2><a href="#contextandargs" name="contextandargs" class="anchor"><span class="
val cmdlineArgs: Array[String] = ???
val (sc, args) = ContextAndArgs(cmdlineArgs)
</code></pre>
<p>If you need custom pipeline options, subclass Beam&rsquo;s <a href="https://beam.apache.org/releases/javadoc/2.57.0/?org/apache/beam/sdk/options/PipelineOptions.html" title="org.apache.beam.sdk.options.PipelineOptions"><code>PipelineOptions</code></a> and use <code>ContextAndArgs.typed</code>:</p>
<p>If you need custom pipeline options, subclass Beam&rsquo;s <a href="https://beam.apache.org/releases/javadoc/2.58.1/?org/apache/beam/sdk/options/PipelineOptions.html" title="org.apache.beam.sdk.options.PipelineOptions"><code>PipelineOptions</code></a> and use <code>ContextAndArgs.typed</code>:</p>
<pre class="prettyprint"><code class="language-scala mdoc:compile-only">import com.spotify.scio._
import org.apache.beam.sdk.options.PipelineOptions

@@ -356,7 +356,7 @@ <h3><a href="#counting" name="counting" class="anchor"><span class="anchor-link"
<ul>
<li><a href="https://spotify.github.io/scio/api/com/spotify/scio/values/SCollection.html#count:com.spotify.scio.values.SCollection[Long]" title="com.spotify.scio.values.SCollection"><code>count</code></a> (or <code>countByKey</code>) counts the number of elements</li>
<li><a href="https://spotify.github.io/scio/api/com/spotify/scio/values/SCollection.html#countByValue:com.spotify.scio.values.SCollection[(T,Long)]" title="com.spotify.scio.values.SCollection"><code>countByValue</code></a> counts the number of elements for each value in a <code>SCollection[T]</code></li>
<li><a href="https://spotify.github.io/scio/api/com/spotify/scio/values/SCollection.html#countApproxDistinct(estimator:com.spotify.scio.estimators.ApproxDistinctCounter[T]):com.spotify.scio.values.SCollection[Long]" title="com.spotify.scio.values.SCollection"><code>countApproxDistinct</code></a> (or <code>countApproxDistinctByKey</code>) estimates a distinct count, with Beam&rsquo;s <a href="https://beam.apache.org/releases/javadoc/2.57.0/?org/apache/beam/sdk/transforms/ApproximateUnique.html" title="org.apache.beam.sdk.transforms.ApproximateUnique"><code>ApproximateUnique</code></a> or Scio&rsquo;s HyperLogLog-based <a href="https://spotify.github.io/scio/api/com/spotify/scio/estimators/ApproxDistinctCounter.html" title="com.spotify.scio.estimators.ApproxDistinctCounter"><code>ApproxDistinctCounter</code></a></li>
<li><a href="https://spotify.github.io/scio/api/com/spotify/scio/values/SCollection.html#countApproxDistinct(estimator:com.spotify.scio.estimators.ApproxDistinctCounter[T]):com.spotify.scio.values.SCollection[Long]" title="com.spotify.scio.values.SCollection"><code>countApproxDistinct</code></a> (or <code>countApproxDistinctByKey</code>) estimates a distinct count, with Beam&rsquo;s <a href="https://beam.apache.org/releases/javadoc/2.58.1/?org/apache/beam/sdk/transforms/ApproximateUnique.html" title="org.apache.beam.sdk.transforms.ApproximateUnique"><code>ApproximateUnique</code></a> or Scio&rsquo;s HyperLogLog-based <a href="https://spotify.github.io/scio/api/com/spotify/scio/estimators/ApproxDistinctCounter.html" title="com.spotify.scio.estimators.ApproxDistinctCounter"><code>ApproxDistinctCounter</code></a></li>
</ul>
<pre class="prettyprint"><code class="language-scala mdoc:compile-only">import com.spotify.scio.values.SCollection
import com.spotify.scio.extra.hll.zetasketch.ZetaSketchHllPlusPlus
@@ -370,7 +370,7 @@ <h3><a href="#statistics" name="statistics" class="anchor"><span class="anchor-l
<li><a href="https://spotify.github.io/scio/api/com/spotify/scio/values/SCollection.html#max(implicitord:Ordering[T]):com.spotify.scio.values.SCollection[T]" title="com.spotify.scio.values.SCollection"><code>max</code></a> (or <code>maxByKey</code>) finds the maximum element given some <a href="http://www.scala-lang.org/api/2.13.14/scala/math/Ordering.html" title="scala.math.Ordering"><code>Ordering</code></a></li>
<li><a href="https://spotify.github.io/scio/api/com/spotify/scio/values/SCollection.html#min(implicitord:Ordering[T]):com.spotify.scio.values.SCollection[T]" title="com.spotify.scio.values.SCollection"><code>min</code></a> (or <code>minByKey</code>) finds the minimum element given some <a href="http://www.scala-lang.org/api/2.13.14/scala/math/Ordering.html" title="scala.math.Ordering"><code>Ordering</code></a></li>
<li><a href="https://spotify.github.io/scio/api/com/spotify/scio/values/SCollection.html#mean(implicitev:Numeric[T]):com.spotify.scio.values.SCollection[Double]" title="com.spotify.scio.values.SCollection"><code>mean</code></a> finds the mean given some <a href="http://www.scala-lang.org/api/2.13.14/scala/math/Numeric.html" title="scala.math.Numeric"><code>Numeric</code></a></li>
<li><a href="https://spotify.github.io/scio/api/com/spotify/scio/values/SCollection.html#quantilesApprox(numQuantiles:Int)(implicitord:Ordering[T]):com.spotify.scio.values.SCollection[Iterable[T]]" title="com.spotify.scio.values.SCollection"><code>quantilesApprox</code></a> (or <code>approxQuantilesByKey</code>) finds the distribution using Beam&rsquo;s <a href="https://beam.apache.org/releases/javadoc/2.57.0/?org/apache/beam/sdk/transforms/ApproximateQuantiles.html" title="org.apache.beam.sdk.transforms.ApproximateQuantiles"><code>ApproximateQuantiles</code></a></li>
<li><a href="https://spotify.github.io/scio/api/com/spotify/scio/values/SCollection.html#quantilesApprox(numQuantiles:Int)(implicitord:Ordering[T]):com.spotify.scio.values.SCollection[Iterable[T]]" title="com.spotify.scio.values.SCollection"><code>quantilesApprox</code></a> (or <code>approxQuantilesByKey</code>) finds the distribution using Beam&rsquo;s <a href="https://beam.apache.org/releases/javadoc/2.58.1/?org/apache/beam/sdk/transforms/ApproximateQuantiles.html" title="org.apache.beam.sdk.transforms.ApproximateQuantiles"><code>ApproximateQuantiles</code></a></li>
</ul>
<p>For <code>SCollection</code>s containing <code>Double</code>, Scio additionally provides a <a href="https://spotify.github.io/scio/api/com/spotify/scio/values/DoubleSCollectionFunctions.html#stats:com.spotify.scio.values.SCollection[com.spotify.scio.util.StatCounter]" title="com.spotify.scio.values.DoubleSCollectionFunctions"><code>stats</code></a> method that computes the count, mean, min, max, variance, standard deviation, sample variance, and sample standard deviation over the <code>SCollection</code>. Convenience methods are available directly on the <a href="https://spotify.github.io/scio/api/com/spotify/scio/values/DoubleSCollectionFunctions.html" title="com.spotify.scio.values.DoubleSCollectionFunctions"><code>SCollection</code></a> if only a single value is required:</p>
<pre class="prettyprint"><code class="language-scala mdoc:compile-only">import com.spotify.scio.values.SCollection
@@ -422,7 +422,7 @@ <h3><a href="#sums-combinations" name="sums-combinations" class="anchor"><span c
</code></pre>
<p>See also <a href="extras/Algebird.html">Algebird</a></p>
<h2><a href="#metrics" name="metrics" class="anchor"><span class="anchor-link"></span></a>Metrics</h2>
<p>Scio supports Beam&rsquo;s <a href="https://beam.apache.org/releases/javadoc/2.57.0/?org/apache/beam/sdk/metrics/Counter.html" title="org.apache.beam.sdk.metrics.Counter"><code>Counter</code></a>, <a href="https://beam.apache.org/releases/javadoc/2.57.0/?org/apache/beam/sdk/metrics/Distribution.html" title="org.apache.beam.sdk.metrics.Distribution"><code>Distribution</code></a>, and <a href="https://beam.apache.org/releases/javadoc/2.57.0/?org/apache/beam/sdk/metrics/Gauge.html" title="org.apache.beam.sdk.metrics.Gauge"><code>Gauge</code></a>.</p>
<p>Scio supports Beam&rsquo;s <a href="https://beam.apache.org/releases/javadoc/2.58.1/?org/apache/beam/sdk/metrics/Counter.html" title="org.apache.beam.sdk.metrics.Counter"><code>Counter</code></a>, <a href="https://beam.apache.org/releases/javadoc/2.58.1/?org/apache/beam/sdk/metrics/Distribution.html" title="org.apache.beam.sdk.metrics.Distribution"><code>Distribution</code></a>, and <a href="https://beam.apache.org/releases/javadoc/2.58.1/?org/apache/beam/sdk/metrics/Gauge.html" title="org.apache.beam.sdk.metrics.Gauge"><code>Gauge</code></a>.</p>
<p>See <a href="https://spotify.github.io/scio/examples/MetricsExample.scala.html">MetricsExample</a>.</p>
<h2><a href="#scioresult" name="scioresult" class="anchor"><span class="anchor-link"></span></a>ScioResult</h2>
<p><a href="https://spotify.github.io/scio/api/com/spotify/scio/ScioResult.html" title="com.spotify.scio.ScioResult"><code>ScioResult</code></a> can be used to access metric values, individually or as a group:</p>
@@ -595,7 +595,7 @@ <h2><a href="#misc" name="misc" class="anchor"><span class="anchor-link"></span>
</div>
<div class="print-only">
<span class="md-source-file md-version">
0.14.5-35-e785613-20240724T153414Z*
0.14.6-18-085b288-20240821T085147Z*
</span>
</div>
</article>
10 changes: 5 additions & 5 deletions FAQ.html
@@ -232,7 +232,7 @@
<ul style="display: none">
<li class="md-nav__item md-version" id="project.version">
<label class="md-nav__link" for="__version">
<i class="md-icon" title="Version">label_outline</i> 0.14.5-35-e785613-20240724T153414Z*
<i class="md-icon" title="Version">label_outline</i> 0.14.6-18-085b288-20240821T085147Z*
</label>
</li>
</ul>
@@ -584,7 +584,7 @@ <h4><a href="#how-do-i-access-various-files-outside-of-a-sciocontext-" name="how
<ul>
<li>For Scio version &gt;= <code>0.4.0</code></li>
</ul>
<p>Starting from Scio <code>0.4.0</code>, you can use Apache Beam&rsquo;s <a href="https://beam.apache.org/releases/javadoc/2.57.0/?org/apache/beam/sdk/io/FileSystems.html" title="org.apache.beam.sdk.io.FileSystems"><code>FileSystems</code></a> abstraction:</p>
<p>Starting from Scio <code>0.4.0</code>, you can use Apache Beam&rsquo;s <a href="https://beam.apache.org/releases/javadoc/2.58.1/?org/apache/beam/sdk/io/FileSystems.html" title="org.apache.beam.sdk.io.FileSystems"><code>FileSystems</code></a> abstraction:</p>
<pre class="prettyprint"><code class="language-scala mdoc:reset:silent">import org.apache.beam.sdk.io.FileSystems
// the path can be any of the supported Filesystems, e.g. local, GCS, HDFS
def readmeResource = FileSystems.matchNewResource(&quot;gs://&lt;bucket&gt;/README.md&quot;, false)
@@ -594,7 +594,7 @@ <h4><a href="#how-do-i-access-various-files-outside-of-a-sciocontext-" name="how
<li>For Scio version &lt; <code>0.4.0</code></li>
</ul><div class="callout note "><div class="callout-title">Note</div>
<p>This part is GCS specific.</p></div>
<p>You can get a <a href="https://beam.apache.org/releases/javadoc/2.57.0/?org/apache/beam/sdk/extensions/gcp/options/GcsOptions.html#getGcsUtil--" title="org.apache.beam.sdk.extensions.gcp.options.GcsOptions"><code>GcsUtil</code></a> instance from <code>ScioContext</code>, which can be used to open GCS files in read or write mode.</p>
<p>You can get a <a href="https://beam.apache.org/releases/javadoc/2.58.1/?org/apache/beam/sdk/extensions/gcp/options/GcsOptions.html#getGcsUtil--" title="org.apache.beam.sdk.extensions.gcp.options.GcsOptions"><code>GcsUtil</code></a> instance from <code>ScioContext</code>, which can be used to open GCS files in read or write mode.</p>
<pre class="prettyprint"><code class="language-scala mdoc:reset:silent">import com.spotify.scio.ContextAndArgs
import org.apache.beam.sdk.extensions.gcp.options.GcsOptions

@@ -780,7 +780,7 @@ <h4><a href="#can-i-use-trait-instead-of-method-" name="can-i-use-trait-instead-
<h4><a href="#how-to-inspect-the-content-of-an-scollection-" name="how-to-inspect-the-content-of-an-scollection-" class="anchor"><span class="anchor-link"></span></a>How to inspect the content of an <code>SCollection</code>?</h4>
<p>There are multiple options here: (1) use the <code>debug()</code> method on an <code>SCollection</code> to print its content as the data flows through the DAG during execution (after <code>run</code> or <code>runAndCollect</code>); (2) use a debugger and set up breakpoints, making sure to break inside your functions so that control stops at execution time rather than at pipeline construction time; (3) in <a href="./Scio-REPL.html">Scio-REPL</a>, use <code>runAndCollect()</code> to execute the pipeline and materialize the contents of an <code>SCollection</code>.</p>
<h4><a href="#how-do-i-improve-side-input-performance-" name="how-do-i-improve-side-input-performance-" class="anchor"><span class="anchor-link"></span></a>How do I improve side input performance?</h4>
<p>By default, Dataflow workers allocate 100 MB (see <a href="https://beam.apache.org/releases/javadoc/2.57.0/?org/apache/beam/runners/dataflow/options/DataflowWorkerHarnessOptions.html#getWorkerCacheMb--" title="org.apache.beam.runners.dataflow.options.DataflowWorkerHarnessOptions"><code>DataflowWorkerHarnessOptions#getWorkerCacheMb</code></a>) of memory for caching side inputs and fall back to disk or network. Jobs with large side inputs may therefore be slow. To override this default, register <code>DataflowWorkerHarnessOptions</code> before parsing command line arguments and then pass <code>--workerCacheMb=N</code> when submitting the job.</p>
<p>By default, Dataflow workers allocate 100 MB (see <a href="https://beam.apache.org/releases/javadoc/2.58.1/?org/apache/beam/runners/dataflow/options/DataflowWorkerHarnessOptions.html#getWorkerCacheMb--" title="org.apache.beam.runners.dataflow.options.DataflowWorkerHarnessOptions"><code>DataflowWorkerHarnessOptions#getWorkerCacheMb</code></a>) of memory for caching side inputs and fall back to disk or network. Jobs with large side inputs may therefore be slow. To override this default, register <code>DataflowWorkerHarnessOptions</code> before parsing command line arguments and then pass <code>--workerCacheMb=N</code> when submitting the job.</p>
<pre class="prettyprint"><code class="language-scala mdoc:reset:silent">import com.spotify.scio._
import org.apache.beam.sdk.options.PipelineOptionsFactory
import org.apache.beam.runners.dataflow.options.DataflowWorkerHarnessOptions
@@ -824,7 +824,7 @@ <h4><a href="#how-to-manually-investigate-a-cloud-dataflow-worker" name="how-to-
</div>
<div class="print-only">
<span class="md-source-file md-version">
0.14.5-35-e785613-20240724T153414Z*
0.14.6-18-085b288-20240821T085147Z*
</span>
</div>
</article>