
Stream Grids 2. What is a Stream Workflow


A Stream Workflow is a structured flow of data that collects, processes, and analyzes high-volume data to generate real-time insights. These workflows use the Apache Spark Streaming APIs to process streaming data in micro-batches, enabling scalable, high-throughput, fault-tolerant processing of live data streams. Support for Apache Storm will be added in future releases.
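The micro-batch model means the engine slices the live stream into small batches on a fixed interval and processes each batch as a unit. The following is a minimal sketch of that idea using the standard Spark Streaming `StreamingContext` API; the socket source, host/port, and 5-second batch interval are illustrative placeholders, not part of Stream Grids itself.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object MicroBatchSketch {
  def main(args: Array[String]): Unit = {
    // Each micro-batch covers a fixed interval; here, 5 seconds of incoming data.
    val conf = new SparkConf().setAppName("micro-batch-sketch").setMaster("local[2]")
    val ssc  = new StreamingContext(conf, Seconds(5))

    // A plain socket source is used purely for illustration; in a Stream Workflow
    // a built-in source operator (e.g. Kafka, RabbitMQ) would feed the stream instead.
    val lines = ssc.socketTextStream("localhost", 9999)

    // Every 5-second micro-batch is processed as a small RDD; here we just count records.
    lines.count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```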

Composition of a Stream Workflow:

A Stream Workflow is made up of Sources, Transformations, Emitters & Persistent Stores.

• Sources: Data access in Stream Grids is provided by Sources, built-in drag-and-drop operators that consume data from various systems such as message queues, transactional databases, log files, and sensors for IoT data. Examples: Kafka, RabbitMQ, Twitter, etc.

• Transformations: Transformations are built-in operators that process the streaming data by applying various transformation operations. Support for analytical operations will be added in future releases. Examples: Window, Sort, Join, Group, Enrich/Lookup, Deduplicate, Aggregation, etc.

• Persistent Stores: Persistent Stores define the destination stage of a workflow, which can be a NoSQL store, a relational database, or a distributed file system. Examples: HDFS, HBase, Cassandra, Elasticsearch, Solr, etc.

• Emitters: Emitters, like Persistent Stores, act as the destination stage of a workflow, except that they support further downstream operations on the streaming data. Examples include message queues such as Kafka and RabbitMQ. The sketch after this list shows how these stages fit together.
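As a rough illustration of how the four stages compose, the sketch below wires a Kafka source into a windowed aggregation and writes each micro-batch to HDFS as the persistent store. It uses the standard Spark Streaming and spark-streaming-kafka-0-10 APIs; the topic name, broker address, window sizes, and output path are assumptions for the example, and a Stream Grids workflow would configure these stages through its drag-and-drop operators rather than hand-written code.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object WorkflowSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("workflow-sketch").setMaster("local[2]")
    val ssc  = new StreamingContext(conf, Seconds(5))

    // Source stage: consume records from a Kafka topic ("events" is a placeholder name).
    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "localhost:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "workflow-sketch"
    )
    val source = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

    // Transformation stage: count records per value over a 60-second window sliding every 10 seconds.
    val counts = source
      .map(record => (record.value(), 1L))
      .reduceByKeyAndWindow((a: Long, b: Long) => a + b, Seconds(60), Seconds(10))

    // Persistent Store stage: write each micro-batch to HDFS. An Emitter would
    // instead publish the results to a downstream queue such as Kafka or RabbitMQ.
    counts.foreachRDD { (rdd, time) =>
      if (!rdd.isEmpty()) rdd.saveAsTextFile(s"hdfs:///tmp/counts-${time.milliseconds}")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```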
