
Stream Grids 3. How to create a stream workflow


The following steps are involved in building a Stream Workflow:

Source Configuration

This section allows the user to define the sources for the Streaming data. The currently supported sources are Kafka, Kinesis, MQTT, RabbitMQ, S3, Socket, and Twitter.

Under the 'Stream Jobs' tab, click on 'Source Configuration'. This lists all the existing connections defined so far. To create a new connection, click the 'Create Connection' button.

[Screenshot: source_connections]

Choose a source from the dropdown view:

[Screenshot: source_dropdown]

Provide the source-related configuration and click 'Create Connection'.

[Screenshot: kafka_properties]

This adds the connection to the list of connections displayed earlier.
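
Before building the workflow, you may want to smoke-test the new connection end to end. Below is a minimal sketch (not part of StreamGrids), assuming a Kafka broker at localhost:9092, the kafka-python package, and a hypothetical topic named person_details; substitute the broker address and topic from your own connection.

```python
# Minimal smoke test for the Kafka connection defined above (a sketch,
# not StreamGrids code). Assumes kafka-python is installed and a broker
# is reachable at localhost:9092; topic and payload are placeholders.
import json

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
# Publish one sample record to the (hypothetical) topic the source will read.
producer.send("person_details", {"name": "Alice", "age": 32})
producer.flush()
```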

Message Configuration

This section allows the user to define the format and schema (fields and their datatypes) of the data that will be received from the streams. Each message created by the user is tied to a Source Configuration created in the earlier step. Supported formats are XML, JSON, Delimited, Avro, Regex, and Fixed Width. The user can also choose from an existing set of templates (Apache logs, Write Ahead logs, Twitter tweets, etc.). These templates internally contain the format and schema definition, and the user can easily add, remove, or change the data fields and datatypes defined by the template.

Nested fields are also supported in JSON and XML formats. The following images illustrate the steps to define a JSON message and link it to the Kafka Source Configuration created above.

  1. Select a source configuration and corresponding topic on the left side of the screen.

[Screenshot: msg_wizard_1]

  2. On the right side of the screen, provide the necessary inputs, such as Message Name (person_details), Use default Message (No), and Message format (JSON).

[Screenshot: msg_wizard_1a]

  3. Click 'Next' to navigate to the next screen of the wizard, upload a sample.json file, and click the 'Upload file' button.

[Screenshot: msg_wizard_2]

  4. Click 'Next' to navigate to the next screen of the wizard. It lists all the fields along with their automatically inferred datatypes. You can add, modify, or remove fields and their datatypes using the edit, add-new-record, and trash buttons. A sketch of this inference follows the screenshot below.

[Screenshot: msg_wizard_3]
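
The inference in step 4 can be pictured with a short sketch. The snippet below is not StreamGrids code; it is an illustration, with hypothetical field names, of how nested JSON fields might be flattened into dotted paths and mapped to datatypes, much like the wizard does when sample.json is uploaded.

```python
import json

# A minimal sketch of field/datatype inference from an uploaded JSON sample.
# The field names (name, age, address) are hypothetical illustration data.
def infer_schema(value, prefix=""):
    """Walk a parsed JSON value and build a dotted-path -> datatype map."""
    schema = {}
    if isinstance(value, dict):  # nested fields are flattened with dots
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            schema.update(infer_schema(child, path))
    elif isinstance(value, bool):  # check bool before int (bool is a subtype)
        schema[prefix] = "boolean"
    elif isinstance(value, int):
        schema[prefix] = "int"
    elif isinstance(value, float):
        schema[prefix] = "double"
    else:
        schema[prefix] = "string"
    return schema

sample = json.loads('{"name": "Alice", "age": 32, "address": {"city": "Pune", "zip": "411001"}}')
print(infer_schema(sample))
# {'name': 'string', 'age': 'int', 'address.city': 'string', 'address.zip': 'string'}
```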

Sink Configuration

This section allows the user to define the target for the Streaming data. Data goes through all the configured transformations and is finally persisted to a persistent store. Under the 'Stream Jobs' tab, click on 'Sink Configuration'. This lists all the existing connections defined so far. To create a new connection, click the 'Create Connection' button.

[Screenshot: persistent_store]

Choose a persistent store/sink from the dropdown view:

[Screenshot: persistent_store_2]

Provide the sink-related configuration and click 'Create Connection'.

[Screenshot: persistent_store_2a]

This adds the connection to the list of connections displayed earlier.

You can repeat the same process to configure an emitter as well.
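
Once the workflow built in the next section is running, you can check that records actually reach the sink. Here is a minimal sketch, assuming a Hive sink with a reachable HiveServer2 endpoint and the PyHive client; the host, port, database, and table name are placeholders for your own environment.

```python
# A sketch for verifying that records land in the Hive sink (not
# StreamGrids code). Assumes PyHive is installed and HiveServer2 is
# reachable; connection details and table name are placeholders.
from pyhive import hive

conn = hive.Connection(host="localhost", port=10000, database="default")
cursor = conn.cursor()
cursor.execute("SELECT * FROM person_details LIMIT 10")  # hypothetical sink table
for row in cursor.fetchall():
    print(row)
```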

Workflow Creation

We have now created all the prerequisites needed for a simple workflow (1 source, 1 message, 1 persistent store). To create the workflow, click 'Create Workflow' under the 'Stream Jobs' tab.

[Screenshot: workflow_1]

You will be redirected to a grid where you can drag and drop source, transformation, emitter, and persistent store operators and join them.

[Screenshot: workflow_2]

Choose one operator of each type: source (Kafka), transformation (Filter), and persistent store (Hive).

[Screenshot: workflow_3]

Now link the operators.

[Screenshot: workflow_4]

Now double-click the source to configure it. Choose the message created earlier to link this source with the message.

[Screenshot: workflow_5]

Now double-click the Filter node to configure the filter criterion, as shown below. Click 'Save'.

[Screenshot: workflow_6]
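
Conceptually, the Filter node passes only the records that satisfy the criterion and drops the rest. A rough sketch of that semantics in plain Python, using a hypothetical criterion (age > 30) on the person_details fields:

```python
# Illustration only: what a filter criterion like "age > 30" does to the
# stream. Records and threshold are hypothetical sample values.
records = [
    {"name": "Alice", "age": 32},
    {"name": "Bob", "age": 25},
]
passed = [r for r in records if r["age"] > 30]
print(passed)  # only Alice's record continues on to the persistent store
```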

Now double-click the Persistent Store node to configure it: choose the corresponding connection created earlier and provide the related configuration. Click 'Save'.

[Screenshot: workflow_6a]

Now we have set up our workflow. Click 'Actions' -> 'Go to Workflow page'. This redirects to a dashboard containing all stream workflows created in the cluster so far.

[Screenshot: workflow_7]

Similarly, we can create workflows ranging in complexity from the above example to workflows containing multiple sources, transformations, and persistent stores. A few examples of complex workflows:

[Screenshot: lta_1]

[Screenshot: bts_1]
