
Stream Grids 2. What is a Stream Workflow


A Stream Workflow is a structured flow of data that collects, processes, and analyzes high-volume data to generate real-time insights. These workflows use the Apache Spark Streaming APIs to process streaming data in micro-batches, enabling scalable, high-throughput, fault-tolerant processing of live data streams. Support for Apache Storm will be added in future releases.
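The micro-batch model means the engine slices the live stream into small batches on a fixed interval and processes each batch as a unit. The following is a minimal sketch of that idea using the standard Spark Streaming `StreamingContext` API; the socket source, host/port, and 5-second batch interval are illustrative placeholders, not part of Stream Grids itself.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object MicroBatchSketch {
  def main(args: Array[String]): Unit = {
    // Each micro-batch covers a fixed interval; here, 5 seconds of incoming data.
    val conf = new SparkConf().setAppName("micro-batch-sketch").setMaster("local[2]")
    val ssc  = new StreamingContext(conf, Seconds(5))

    // A plain socket source is used purely for illustration; in a Stream Workflow
    // a built-in source operator (e.g. Kafka, RabbitMQ) would feed the stream instead.
    val lines = ssc.socketTextStream("localhost", 9999)

    // Every 5-second micro-batch is processed as a small RDD; here we just count records.
    lines.count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```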

Composition of a Stream Workflow:

A Stream Workflow is made up of Sources, Transformations, Emitters & Persistent Stores.

• Sources: Data access in Stream Grids is provided by Sources, built-in drag-and-drop operators that consume data from various systems such as message queues, transactional databases, log files, and sensors for IoT data. Examples: Kafka, RabbitMQ, Twitter, etc.

• Transformations: Transformations are built-in operators that process the streaming data by applying various transformation operations. Support for analytical operations will be added in future releases. Examples: Window, Sort, Join, Group, Enrich/Lookup, Deduplicate, Aggregation, etc.

• Persistent Stores: Persistent Stores define the destination stage of a workflow, which can be a NoSQL store, a relational database, or a distributed file system. Examples: HDFS, HBase, Cassandra, Elasticsearch, Solr, etc.

• Emitters: Emitters, like Persistent Stores, act as the destination stage of a workflow, except that they support further downstream operations on the streaming data. Examples include message queues such as Kafka and RabbitMQ. The sketch after this list shows how these stages fit together.
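As a rough illustration of how the four stages compose, the sketch below wires a Kafka source into a windowed aggregation and writes each micro-batch to HDFS as the persistent store. It uses the standard Spark Streaming and spark-streaming-kafka-0-10 APIs; the topic name, broker address, window sizes, and output path are assumptions for the example, and a Stream Grids workflow would configure these stages through its drag-and-drop operators rather than hand-written code.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object WorkflowSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("workflow-sketch").setMaster("local[2]")
    val ssc  = new StreamingContext(conf, Seconds(5))

    // Source stage: consume records from a Kafka topic ("events" is a placeholder name).
    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "localhost:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "workflow-sketch"
    )
    val source = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

    // Transformation stage: count records per value over a 60-second window sliding every 10 seconds.
    val counts = source
      .map(record => (record.value(), 1L))
      .reduceByKeyAndWindow((a: Long, b: Long) => a + b, Seconds(60), Seconds(10))

    // Persistent Store stage: write each micro-batch to HDFS. An Emitter would
    // instead publish the results to a downstream queue such as Kafka or RabbitMQ.
    counts.foreachRDD { (rdd, time) =>
      if (!rdd.isEmpty()) rdd.saveAsTextFile(s"hdfs:///tmp/counts-${time.milliseconds}")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```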
