Spark structured streaming with Kafka data source and writing to Cassandra



This is an example of structured streaming with the latest Spark, v2.1.0.

A Spark job reads from a Kafka topic, manipulates the data as Datasets/DataFrames, and writes it to Cassandra.
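
A minimal sketch of that pipeline is shown below, assuming Spark 2.1.0 with the spark-sql-kafka-0-10 package on the classpath. The topic, keyspace, and table names are placeholders (the actual values live in Main.scala), and since Spark 2.1.0 ships no built-in streaming Cassandra sink, the write is illustrated through a custom ForeachWriter:

```scala
import org.apache.spark.sql.{ForeachWriter, SparkSession}

object StreamingSketch {

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-to-cassandra")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Subscribe to the Kafka topic; key and value arrive as binary columns.
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "test") // placeholder -- check Main.scala for the real topic
      .load()

    // Manipulate as a typed Dataset: decode the value column to a string.
    val values = raw.selectExpr("CAST(value AS STRING)").as[String]

    // Spark 2.1.0 has no streaming Cassandra sink, so one option is a
    // custom ForeachWriter that performs the per-row inserts.
    val query = values.writeStream
      .foreach(new ForeachWriter[String] {
        override def open(partitionId: Long, version: Long): Boolean = true
        override def process(value: String): Unit = {
          // Insert into my_keyspace.test_table here, e.g. via the
          // spark-cassandra-connector or the DataStax Java driver.
          println(value)
        }
        override def close(errorOrNull: Throwable): Unit = ()
      })
      .start()

    query.awaitTermination()
  }
}
```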

Usage:

  1. Inside the setup directory, run `docker-compose up -d` to launch instances of ZooKeeper, Kafka, and Cassandra.

  2. Wait a few seconds, then run `docker ps` to make sure all three services are running.

  3. Run `pip install -r requirements.txt`.

  4. Run `main.py` to generate some random data and publish it to a topic in Kafka (a sketch of an equivalent producer follows this list).

  5. Run the Spark app using `sbt clean compile run` in a console. The app listens on a topic (check Main.scala) and writes what it receives to Cassandra.

  6. Run `main.py` again to write some test data to the Kafka topic.

  7. Finally, check that the data has been published to Cassandra:

  • Open cqlsh: `docker exec -it cas_01_test cqlsh localhost`
  • Then run `select * from my_keyspace.test_table;`

  8. A separate branch, avro-example, contains Avro deserialization code (see the decoding sketch after this list).
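
For reference, here is a rough Scala equivalent of what main.py does, using the plain kafka-clients producer; the topic name "test" and the record shape are assumptions, not taken from the repo:

```scala
import java.util.Properties
import scala.util.Random

import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

// Sketch of a producer equivalent to main.py: publish a handful of
// random JSON records to a Kafka topic ("test" is an assumed name).
object ProducerSketch extends App {
  val props = new Properties()
  props.put("bootstrap.servers", "localhost:9092")
  props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
  props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

  val producer = new KafkaProducer[String, String](props)
  (1 to 10).foreach { i =>
    val payload = s"""{"id": $i, "value": ${Random.nextDouble()}}"""
    producer.send(new ProducerRecord[String, String]("test", payload))
  }
  producer.flush()
  producer.close()
}
```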
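
The avro-example branch holds the actual deserialization code; purely as an illustration, decoding a binary Avro message with the plain Avro library might look like the following (the schema here is a made-up placeholder):

```scala
import org.apache.avro.Schema
import org.apache.avro.generic.{GenericDatumReader, GenericRecord}
import org.apache.avro.io.DecoderFactory

// Illustrative only -- the real schema and decoding live on the
// avro-example branch; this schema is a made-up placeholder.
object AvroSketch {
  val schemaJson =
    """{"type":"record","name":"Test","fields":[{"name":"id","type":"int"}]}"""
  val schema = new Schema.Parser().parse(schemaJson)
  val reader = new GenericDatumReader[GenericRecord](schema)

  // Decode one Kafka message value (binary Avro) into a GenericRecord.
  def deserialize(bytes: Array[Byte]): GenericRecord = {
    val decoder = DecoderFactory.get().binaryDecoder(bytes, null)
    reader.read(null, decoder)
  }
}
```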

Credits:

  • This repository has borrowed some snippets from the killrweather app.
