From 08a19b96a09b11bbff1b183954ed2244ba0b029b Mon Sep 17 00:00:00 2001 From: Anton Gulenko Date: Wed, 27 May 2020 12:52:22 +0200 Subject: [PATCH] Update index.md --- docs/index.md | 109 +++++++++++++------------------------------------- 1 file changed, 28 insertions(+), 81 deletions(-) diff --git a/docs/index.md b/docs/index.md index cfe78e1..e502ff2 100644 --- a/docs/index.md +++ b/docs/index.md @@ -4,107 +4,54 @@ [![Reliability](https://ci.bitflow.team/sonarqube/api/project_badges/measure?project=bitflow4j&metric=reliability_rating)](http://wally144.cit.tu-berlin.de/sonarqube/dashboard?id=bitflow4j) # bitflow4j -bitflow4j is a lightweight framework for performing data analysis on streamed timeseries data. -This library implements sending and receiving of a data over various transport channels. +bitflow4j is a plugin for [`go-bitflow`](https://github.com/bitflow-stream/go-bitflow) that allows writing and executing datastream operators in Java. +Java operators can be used inside the Bitflowscript executed by `go-bitflow`. +The main `go-bitflow` dataflow graph runs as a single Go process, while each `bitflow4j` operator is executed in a separate child process that receives input data over the standard input and produces results on the standard output. -## Installation +# Installation -At first, you can +All commands are executed in the repository root. -``` -$ git clone https://github.com/bitflow-stream/bitflow4j -$ cd bitflow4j -$ mvn install -``` -Within the `/target` directory, the file `bitflow4j-0.0.1-jar-with-dependencies.jar` is created. This file should be used for further usages. - -## Bitflow script usage and examples +### Full build (slow but stable, Alpine only) -It is convenient, to invoke the Bitflow4j functionality through a domain specific language called `Bitflow Script`. -This lightweight scripting language is implemented in the `bitflow-stream` sub-directory of the [antlr-grammars repository](https://github.com/bitflow-stream/antlr-grammars). 
-See the [README of the Bitflow Script](https://github.com/bitflow-stream/antlr-grammars/tree/master/bitflow-script) for details regarding the script syntax. +A full Docker image for the amd64 platform (based on Alpine Linux) can be built with the following command: -The main entrypoint for a Bitflow Script is the class `bitflow4j.Main`. -It can be started with the `--help` parameter to list all command line options. -The script can be passed directly on the command line, or in a file, using the `-f` option: - -``` -java -cp bitflow4j.jar bitflow4j.Main -f bitflow-script.bf ``` +docker build -t IMAGE_NAME -f build/multi-stage/alpine-full.Dockerfile . ``` -java -cp bitflow4j.jar bitflow4j.Main -s "input.csv -> Standardize() -> output.csv" -``` - -When starting, the Java implementation of the Bitflow Script scans a list of packages for pipeline steps that will be made available in the script. The list of packages to scan defaults to `bitflow4j`, but can be modified by passing a comma-separated list of packages to the `-P` parameter. - -To see which pipeline steps and forks are available, start `bitflow4j.Main` with the `--capabilities` parameter. -Specify the `-v` parameter for verbose logging, which will show which classes have been scanned, but not registered for usage in the script, because they do not fullfill the requirements listed below. - -### Registering new Pipeline Steps +### Cached build (ARM or Alpine) -Installation via maven (add this to your pom.xml): +Build the Jar file natively and then build the Docker container. +Choose a target platform, one of `[alpine|arm32v7|arm64v8]`. ``` - - bitflow4j - bitflow4j - 0.0.1 - +TARGET=[arm32v7|arm64v8|alpine] +mvn install +docker build -t IMAGE_NAME -f build/$TARGET.Dockerfile . ``` -To provide functionality inside a Bitflow Script, a class must implement one of 3 interfaces and 1 abstract class. -All implementations of these interfaces are scanned in a list of packages, as describe above. 
+# Dockerhub -The general conditions for successfully registering a class as a pipeline step are: +Docker container images are available in the [`bitflowstream/bitflow-pipeline-java`](https://hub.docker.com/repository/docker/bitflowstream/bitflow-pipeline-java) Dockerhub repository: -* The class must not be abstract -* The class must have at least one constructor with only the following types: `String`, `int`, `long`, `double`, `float`, `boolean`, and the respective boxed types. -* Alternatively, a constructor with a single `java.lang.Map` constructor is supported. In this case the parameters are not parsed or validated, but simply passed to the constructor inside the Map. - -Registered classes are available in the script under their **simple names**. -Currently, if multiple implementations share a name, only one of them will be available in the script. Such collisions will be logged as a warning. - -The parameter names provided in the script must satisfy one of the constructors in the class that implements the pipeline step. -The provided parameter values will be parsed to the expected types (`double`, `boolean`, etc), except if the `Map` constructor is used. -An error will be shown if parsing a parameter fails or if no fitting constructor can be found. - -The following interfaces can be implemented. See the documentation of the Bitflow Script for more information. - - - `bitflow4j.PipelineStep` (abstract class): Registered subclasses of `PipelineStep` can directly be instantiated and added as a single pipeline step. - - `bitflow4j.registry.ProcessingStepBuilder` (interface): Registered instances of this interface can be used to modify the created pipeline in more complex ways than adding a single pipeline step. For example, multiple connected instances of `PipelineStep` can be instantiated and added in a predefined sequence. 
This should only be used, when pipeline steps depend on each other, so that a constructor with only primitive parameter types is not sufficient. - - `bitflow4j.steps.fork.ScriptableDistributor` (interface): Registered implementations of `ScriptableDistributor` can be instantiated as a single fork step inside the pipeline. They will receive all defined sub pipelines and must distribute samples according to their custom logic. - - `bitflow4j.registry.ForkBuilder` (interface): If the creation of a `ScriptableDistributor` cannot be implemented in a single constructor with primitive parameter types, a `ForkBuilder` can be used to move the creation of the `ScriptableDistributor` to a dedicated builder function. However, contrary to a `PipelineBuilder`, the `Pipeline` can not be further modified by a `ForkBuilder`. +``` +docker pull bitflowstream/bitflow-pipeline-java +docker run -ti bitflowstream/bitflow-pipeline-java --help +``` -## Usage in Java code +The Docker manifest will select the appropriate platform (amd64/arm32v7/arm64v8) automatically. -The basic data entity is a `bitflow4j.Sample`, which consists of a timestamp, a vector of double values (double array), and a String-map of tags. -Supported marshalling formats are CSV and a dense binary format. -Supported transport channels are files, standard I/O, and TCP. -Received or generated Samples can be modified or analysed through a Pipeline object, which sends incoming Samples through a chain of -transformation steps implementing the bitflow4j.PipelineStep interface. +# Usage -Bitflow4j can be used programmatically through the `Pipeline` class and different instances and implementations of `Source` and `PipelineStep`. +For the usage of the [`bitflowstream/bitflow-pipeline-java`](https://hub.docker.com/repository/docker/bitflowstream/bitflow-pipeline-java) container, see the [`go-bitflow`](https://github.com/bitflow-stream/go-bitflow) documentation. 
-``` -new Pipeline() - .inputCsv("/some/path/to/data.csv") - .HistogramAlgorithm("/path/to/visualization/output") - .outputCsv("/path/to/output/results.csv") - .runAndWait(); -``` +`bitflow4j` makes the `java(step=NAME, args={ a1=v1, a2=v2 }, exe-args=JAVA_ARGS)` operator available. +When starting, `bitflow4j` scans for implementations of the [`bitflow4j.ProcessingStep`](src/main/java/ProcessingStep.java) interface. +All non-abstract classes can be used as `NAME` in the `java()` operator, by default referred to through their simple class name. -Also parallel pipelines can be defined via forks. Forks are also steps, which provide the possibility to define several different subpipelines. There exists different distribution techniques like `ForkTag`, `MultiplexDistributor` and `RoundRobinDistributor`. +Example: ``` -ForkTag forkTag = new ForkTag("tagName"); -forkTag.setSubPipelines( - Arrays.asList( - new Pair<>("key1", new Pipeline().step(new BatchFeatureScaler.MinMax())), - new Pair<>("key2", new Pipeline().step(new BatchFeatureScaler.Standardize())))); -new Pipeline() - .inputCsv("/some/path/to/data.csv") - .step(new Fork(forkTag)) - .outputCsv("/path/to/output/results.csv") - .runAndWait(); +docker run -p 8888 -ti bitflowstream/bitflow-pipeline-java ':8888 -> java(step=echo, args={msg=hello}) -> text://-' ```