Update index.md #88

Open
wants to merge 1 commit into
base: master
109 changes: 28 additions & 81 deletions docs/index.md
@@ -4,107 +4,54 @@
[![Reliability](https://ci.bitflow.team/sonarqube/api/project_badges/measure?project=bitflow4j&metric=reliability_rating)](http://wally144.cit.tu-berlin.de/sonarqube/dashboard?id=bitflow4j)

# bitflow4j
<b>bitflow4j</b> is a lightweight framework for performing data analysis on streamed timeseries data.
This library implements sending and receiving of data over various transport channels.
<b>bitflow4j</b> is a plugin for [`go-bitflow`](https://github.com/bitflow-stream/go-bitflow) that allows writing and executing datastream operators in Java.
Java operators can be used inside the Bitflowscript executed by `go-bitflow`.
The main `go-bitflow` dataflow graph runs as a single Go process, while each `bitflow4j` operator is executed in a separate child process that receives input data over the standard input and produces results on the standard output.
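
The child-process model described above can be illustrated with a minimal sketch. It is not taken from the bitflow4j sources: the class name `StdioOperatorSketch`, the `run` helper, and the one-text-line-per-record framing are assumptions made for illustration, and the actual wire format between `go-bitflow` and the Java process may differ.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.function.UnaryOperator;

// Hypothetical sketch, not bitflow4j code: a child process that reads records
// from standard input, transforms each one, and writes it to standard output.
// Assumption: one text line per record; the real wire format may differ.
final class StdioOperatorSketch {

    // Applies `step` to every line read from `in` and writes the result to `out`.
    static void run(Reader in, Writer out, UnaryOperator<String> step) {
        try {
            BufferedReader reader = new BufferedReader(in);
            BufferedWriter writer = new BufferedWriter(out);
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(step.apply(line));
                writer.write('\n');
                // Flush per record so the parent process sees results promptly.
                writer.flush();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The parent process owns both ends of the pipes; here the "operator"
        // simply upper-cases every incoming line.
        run(new InputStreamReader(System.in, StandardCharsets.UTF_8),
            new OutputStreamWriter(System.out, StandardCharsets.UTF_8),
            line -> line.toUpperCase());
    }
}
```

Keeping the transformation behind a `Reader`/`Writer` pair means the same loop can be exercised with in-memory streams, without spawning a process.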

## Installation
# Installation

At first, you can
All commands are executed in the repository root.

```
$ git clone https://github.com/bitflow-stream/bitflow4j
$ cd bitflow4j
$ mvn install
```
The build creates the file `bitflow4j-0.0.1-jar-with-dependencies.jar` in the `/target` directory. Use this file in the steps that follow.

## Bitflow script usage and examples
### Full build (slow but stable, Alpine only)

The Bitflow4j functionality is most conveniently invoked through a domain-specific language called `Bitflow Script`.
This lightweight scripting language is implemented in the `bitflow-stream` sub-directory of the [antlr-grammars repository](https://github.com/bitflow-stream/antlr-grammars).
See the [README of the Bitflow Script](https://github.com/bitflow-stream/antlr-grammars/tree/master/bitflow-script) for details regarding the script syntax.
A full Docker image for the amd64 platform (based on Alpine Linux) can be built with the following command:

The main entrypoint for a Bitflow Script is the class `bitflow4j.Main`.
It can be started with the `--help` parameter to list all command line options.
The script can be passed directly on the command line, or in a file, using the `-f` option:

```
java -cp bitflow4j.jar bitflow4j.Main -f bitflow-script.bf
```
```
docker build -t IMAGE_NAME -f build/multi-stage/alpine-full.Dockerfile .
```
```
java -cp bitflow4j.jar bitflow4j.Main -s "input.csv -> Standardize() -> output.csv"
```

When starting, the Java implementation of the Bitflow Script scans a list of packages for pipeline steps that will be made available in the script. The list of packages to scan defaults to `bitflow4j`, but can be modified by passing a comma-separated list of packages to the `-P` parameter.

To see which pipeline steps and forks are available, start `bitflow4j.Main` with the `--capabilities` parameter.
Specify the `-v` parameter for verbose logging, which also lists the classes that were scanned but not registered for use in the script because they do not fulfill the requirements listed below.


### Registering new Pipeline Steps
### Cached build (ARM or Alpine)

Installation via maven (add this to your pom.xml):
Build the Jar file natively and then build the Docker container.
Choose a target platform, one of `[alpine|arm32v7|arm64v8]`.

```
<dependency>
<groupId>bitflow4j</groupId>
<artifactId>bitflow4j</artifactId>
<version>0.0.1</version>
</dependency>
TARGET=[arm32v7|arm64v8|alpine]
mvn install
docker build -t IMAGE_NAME -f build/$TARGET.Dockerfile .
```

To provide functionality inside a Bitflow Script, a class must implement one of three interfaces or extend one abstract class.
All such implementations are searched for in a list of packages, as described above.
# Dockerhub

The general conditions for successfully registering a class as a pipeline step are:
Docker container images are available in the [`bitflowstream/bitflow-pipeline-java`](https://hub.docker.com/repository/docker/bitflowstream/bitflow-pipeline-java) Dockerhub repository:

* The class must not be abstract
* The class must have at least one constructor whose parameters are only of the following types: `String`, `int`, `long`, `double`, `float`, `boolean`, and the respective boxed types.
* Alternatively, a constructor with a single `java.util.Map<String, String>` parameter is supported. In this case the parameters are not parsed or validated, but simply passed to the constructor inside the Map.
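
The conditions above can be expressed as a reflective check. The following is a sketch only, not the scanner that bitflow4j actually uses: `RegistrationCheckSketch` and the nested example classes are hypothetical, and it assumes that a constructor without parameters also qualifies and that constructor visibility does not matter.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Modifier;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the registration conditions; not the bitflow4j scanner.
final class RegistrationCheckSketch {

    // Parameter types accepted by the "simple" constructor form.
    private static final Set<Class<?>> SIMPLE_TYPES = Set.of(
            String.class, int.class, long.class, double.class, float.class, boolean.class,
            Integer.class, Long.class, Double.class, Float.class, Boolean.class);

    // True if the type is exactly Map<String, String>.
    static boolean isMapOfStrings(Type type) {
        if (!(type instanceof ParameterizedType)) {
            return false;
        }
        ParameterizedType p = (ParameterizedType) type;
        Type[] args = p.getActualTypeArguments();
        return Map.class.equals(p.getRawType())
                && args.length == 2
                && String.class.equals(args[0])
                && String.class.equals(args[1]);
    }

    // True if the constructor takes a single Map<String, String>,
    // or only parameters of the simple types (assumption: zero parameters is fine).
    static boolean isUsableConstructor(Constructor<?> constructor) {
        Type[] generic = constructor.getGenericParameterTypes();
        if (generic.length == 1 && isMapOfStrings(generic[0])) {
            return true;
        }
        return Arrays.stream(constructor.getParameterTypes()).allMatch(SIMPLE_TYPES::contains);
    }

    // True if the class is not abstract and has at least one usable constructor.
    static boolean isRegistrable(Class<?> cls) {
        if (Modifier.isAbstract(cls.getModifiers())) {
            return false;
        }
        return Arrays.stream(cls.getDeclaredConstructors())
                .anyMatch(RegistrationCheckSketch::isUsableConstructor);
    }

    // Example classes, all hypothetical.
    static class Scale { Scale(double factor, String label) { } }
    static abstract class Base { Base(int n) { } }
    static class FromMap { FromMap(Map<String, String> params) { } }
    static class NeedsList { NeedsList(List<String> items) { } }

    public static void main(String[] args) {
        for (Class<?> cls : List.of(Scale.class, Base.class, FromMap.class, NeedsList.class)) {
            System.out.println(cls.getSimpleName() + " registrable: " + isRegistrable(cls));
        }
    }
}
```

In this sketch, `Scale` and `FromMap` pass the check, while `Base` (abstract) and `NeedsList` (unsupported parameter type) do not.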

Registered classes are available in the script under their **simple names**.
Currently, if multiple implementations share a name, only one of them will be available in the script. Such collisions will be logged as a warning.

The parameter names provided in the script must satisfy one of the constructors in the class that implements the pipeline step.
The provided parameter values will be parsed to the expected types (`double`, `boolean`, etc), except if the `Map<String, String>` constructor is used.
An error will be shown if parsing a parameter fails or if no fitting constructor can be found.
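
As an illustration of the parsing step, the sketch below converts a raw script value into the type expected by a constructor parameter. It is an assumption about how such a conversion could look, not the bitflow4j implementation; `ParameterParsingSketch` and `parse` are hypothetical names, and the strict handling of boolean values is a design choice of this sketch.

```java
// Hypothetical sketch: converts a raw script value to an expected parameter type.
final class ParameterParsingSketch {

    // Returns the parsed value, or throws IllegalArgumentException if the
    // value cannot be parsed or the target type is not supported.
    static Object parse(String raw, Class<?> type) {
        if (type == String.class) {
            return raw;
        }
        if (type == int.class || type == Integer.class) {
            return Integer.parseInt(raw); // NumberFormatException is an IllegalArgumentException
        }
        if (type == long.class || type == Long.class) {
            return Long.parseLong(raw);
        }
        if (type == double.class || type == Double.class) {
            return Double.parseDouble(raw);
        }
        if (type == float.class || type == Float.class) {
            return Float.parseFloat(raw);
        }
        if (type == boolean.class || type == Boolean.class) {
            // Boolean.parseBoolean silently maps unknown input to false,
            // so the accepted spellings are checked explicitly here.
            if (raw.equalsIgnoreCase("true")) {
                return Boolean.TRUE;
            }
            if (raw.equalsIgnoreCase("false")) {
                return Boolean.FALSE;
            }
            throw new IllegalArgumentException("Not a boolean: " + raw);
        }
        throw new IllegalArgumentException("Unsupported parameter type: " + type.getName());
    }

    public static void main(String[] args) {
        System.out.println(parse("1.5", double.class));
        System.out.println(parse("true", boolean.class));
        try {
            parse("abc", int.class);
        } catch (IllegalArgumentException e) {
            System.out.println("Parsing failed: " + e.getMessage());
        }
    }
}
```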

The following interfaces can be implemented. See the documentation of the Bitflow Script for more information.

- `bitflow4j.PipelineStep` (abstract class): Registered subclasses of `PipelineStep` can directly be instantiated and added as a single pipeline step.
- `bitflow4j.registry.ProcessingStepBuilder` (interface): Registered instances of this interface can be used to modify the created pipeline in more complex ways than adding a single pipeline step. For example, multiple connected instances of `PipelineStep` can be instantiated and added in a predefined sequence. This should only be used when pipeline steps depend on each other, so that a constructor with only primitive parameter types is not sufficient.
- `bitflow4j.steps.fork.ScriptableDistributor` (interface): Registered implementations of `ScriptableDistributor` can be instantiated as a single fork step inside the pipeline. They will receive all defined sub pipelines and must distribute samples according to their custom logic.
- `bitflow4j.registry.ForkBuilder` (interface): If the creation of a `ScriptableDistributor` cannot be implemented in a single constructor with primitive parameter types, a `ForkBuilder` can be used to move the creation of the `ScriptableDistributor` to a dedicated builder function. However, unlike a `PipelineBuilder`, a `ForkBuilder` cannot further modify the `Pipeline`.
```
docker pull bitflowstream/bitflow-pipeline-java
docker run -ti bitflowstream/bitflow-pipeline-java --help
```

## Usage in Java code
The Docker manifest will select the appropriate platform (amd64/arm32v7/arm64v8) automatically.

The basic data entity is a `bitflow4j.Sample`, which consists of a timestamp, a vector of double values (double array), and a String-map of tags.
Supported marshalling formats are CSV and a dense binary format.
Supported transport channels are files, standard I/O, and TCP.
Received or generated Samples can be modified or analysed through a `Pipeline` object, which sends incoming Samples through a chain of
transformation steps that extend the `bitflow4j.PipelineStep` abstract class.
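
The data model described above can be sketched as follows. This is a simplified stand-in, not the real `bitflow4j.Sample` or `Pipeline` classes: `SampleSketch`, its field names, the millisecond timestamp, and the `runChain` helper are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for a sample: a timestamp, a vector of double values,
// and a String map of tags. Not the real bitflow4j.Sample class.
final class SampleSketch {
    final long timestampMillis; // assumption: timestamp kept as epoch milliseconds
    final double[] values;
    final Map<String, String> tags;

    SampleSketch(long timestampMillis, double[] values, Map<String, String> tags) {
        this.timestampMillis = timestampMillis;
        this.values = values.clone();  // defensive copy of the value vector
        this.tags = Map.copyOf(tags);  // immutable copy of the tags
    }

    // Sends one sample through a chain of transformation steps, in order.
    static SampleSketch runChain(SampleSketch input, List<UnaryOperator<SampleSketch>> steps) {
        SampleSketch current = input;
        for (UnaryOperator<SampleSketch> step : steps) {
            current = step.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        // Step 1: double every value.
        UnaryOperator<SampleSketch> doubleValues = s -> {
            double[] scaled = s.values.clone();
            for (int i = 0; i < scaled.length; i++) {
                scaled[i] *= 2;
            }
            return new SampleSketch(s.timestampMillis, scaled, s.tags);
        };
        // Step 2: add a tag that marks the sample as processed.
        UnaryOperator<SampleSketch> markProcessed = s -> {
            Map<String, String> tags = new HashMap<>(s.tags);
            tags.put("processed", "true");
            return new SampleSketch(s.timestampMillis, s.values, tags);
        };

        SampleSketch input = new SampleSketch(0L, new double[] {1.0, 2.5}, Map.of("host", "a"));
        SampleSketch output = runChain(input, List.of(doubleValues, markProcessed));
        System.out.println(Arrays.toString(output.values) + " " + output.tags);
    }
}
```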
# Usage

Bitflow4j can be used programmatically through the `Pipeline` class and different instances and implementations of `Source` and `PipelineStep`.
For the usage of the [`bitflowstream/bitflow-pipeline-java`](https://hub.docker.com/repository/docker/bitflowstream/bitflow-pipeline-java) container, see the [`go-bitflow`](https://github.com/bitflow-stream/go-bitflow) documentation.

```
new Pipeline()
.inputCsv("/some/path/to/data.csv")
.HistogramAlgorithm("/path/to/visualization/output")
.outputCsv("/path/to/output/results.csv")
.runAndWait();
```
`bitflow4j` makes the `java(step=NAME, args={ a1=v1, a2=v2 }, exe-args=JAVA_ARGS)` operator available.
When starting, `bitflow4j` scans for implementations of the [`bitflow4j.ProcessingStep`](src/main/java/ProcessingStep.java) interface.
All non-abstract classes can be used as `NAME` in the `java()` operator, by default referred to through their simple class name.

Parallel pipelines can also be defined via forks. Forks are steps that allow several different sub-pipelines to be defined. Several distribution techniques exist, such as `ForkTag`, `MultiplexDistributor` and `RoundRobinDistributor`.
Example:

```
ForkTag forkTag = new ForkTag("tagName");
forkTag.setSubPipelines(
Arrays.asList(
new Pair<>("key1", new Pipeline().step(new BatchFeatureScaler.MinMax())),
new Pair<>("key2", new Pipeline().step(new BatchFeatureScaler.Standardize()))));
new Pipeline()
.inputCsv("/some/path/to/data.csv")
.step(new Fork(forkTag))
.outputCsv("/path/to/output/results.csv")
.runAndWait();
docker run -p 8888 -ti bitflowstream/bitflow-pipeline-java ':8888 -> java(step=echo, args={msg=hello}) -> text://-'
```