A Spark job reads from a Kafka topic, manipulates the data as Datasets/DataFrames, and writes it to Cassandra.
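For orientation, here is a minimal Structured Streaming sketch of such a pipeline. It is only an illustration, not the actual code in `Main.scala`: it assumes Spark 3.x with the `spark-sql-kafka` and `spark-cassandra-connector` dependencies on the classpath, and the topic name, broker address and JSON payload layout are assumptions; the keyspace/table names match the cqlsh check below.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types.StructType

object PipelineSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-to-cassandra-sketch")
      .master("local[*]")
      .config("spark.cassandra.connection.host", "localhost")
      .getOrCreate()
    import spark.implicits._

    // Read the raw Kafka stream; the value column arrives as bytes.
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092") // assumed broker address
      .option("subscribe", "test_topic")                    // assumed topic name
      .load()

    // Parse the JSON payload into typed columns (assumed schema).
    val schema = new StructType().add("id", "int").add("value", "double")
    val parsed = raw
      .select(from_json($"value".cast("string"), schema).as("payload"))
      .select("payload.*")

    // Write each micro-batch to Cassandra via the spark-cassandra-connector.
    parsed.writeStream
      .option("checkpointLocation", "/tmp/kafka-cassandra-sketch-checkpoint")
      .foreachBatch { (batch: DataFrame, _: Long) =>
        batch.write
          .format("org.apache.spark.sql.cassandra")
          .option("keyspace", "my_keyspace")
          .option("table", "test_table")
          .mode("append")
          .save()
      }
      .start()
      .awaitTermination()
  }
}
```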
- Inside the `setup` directory, run `docker-compose up -d` to launch instances of `zookeeper`, `kafka` and `cassandra`.
- Wait for a few seconds and then run `docker ps` to make sure all three services are running.
- Then run `pip install -r requirements.txt`.
- `main.py` generates some random data and publishes it to a topic in Kafka (a rough Scala sketch of the same idea follows this list).
- Run the Spark app using `sbt clean compile run` in a console. This app listens on a topic (check `Main.scala`) and writes the data to Cassandra.
- Run `main.py` again to write some test data to a Kafka topic.
- Finally, check whether the data has been published to Cassandra:
  - Go to cqlsh: `docker exec -it cas_01_test cqlsh localhost`
  - And then run `select * from my_keyspace.test_table;`
- Another branch, `avro-example`, contains Avro deserialization code (see the Avro sketch after this list).
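`main.py` is a Python script; as a language-consistent illustration of the same idea, the sketch below publishes a handful of random JSON records to Kafka with the plain Kafka producer client. The topic name, broker address and payload shape are assumptions, not necessarily what `main.py` actually sends.

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import scala.util.Random

object ProducerSketch {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092") // assumed broker address
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)
    try {
      // Publish a few random JSON records, mirroring what main.py does.
      (1 to 10).foreach { i =>
        val payload = s"""{"id": $i, "value": ${Random.nextDouble()}}"""
        producer.send(new ProducerRecord[String, String]("test_topic", payload)) // assumed topic
      }
      producer.flush()
    } finally {
      producer.close()
    }
  }
}
```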
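The `avro-example` branch's actual code may differ; the sketch below shows one common way to decode Avro-encoded Kafka values in Spark, using the `spark-avro` package's `from_avro` (Spark 3.x). The record schema and topic name are assumed for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.avro.functions.from_avro
import org.apache.spark.sql.functions.col

object AvroSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("avro-deserialization-sketch")
      .master("local[*]")
      .getOrCreate()

    // Assumed writer schema for the Kafka payload.
    val schemaJson =
      """{"type":"record","name":"TestRecord","fields":[
        |  {"name":"id","type":"int"},
        |  {"name":"value","type":"double"}
        |]}""".stripMargin

    val decoded = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092") // assumed broker address
      .option("subscribe", "test_topic")                    // assumed topic name
      .load()
      // from_avro turns the binary value column into a struct column.
      .select(from_avro(col("value"), schemaJson).as("record"))
      .select("record.*")

    // Print decoded rows to the console for a quick check.
    decoded.writeStream
      .format("console")
      .option("checkpointLocation", "/tmp/avro-sketch-checkpoint")
      .start()
      .awaitTermination()
  }
}
```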
Credits:
- This repository has borrowed some snippets from the killrweather app.