A proof of concept for collecting clickstream data using JavaScript, Divolte Collector, Apache Kafka, and a Java consumer application.
Divolte Collector with Apache Kafka
- Divolte Collector and Apache Kafka both require Java 8 or later.
- Create a folder with the files below:
- index.html (for geolocation information, I used the open source tool geolocation-db)
- main.js
- Download Divolte Collector (check the official website for the latest available version).
wget http://divolte-releases.s3-website-eu-west-1.amazonaws.com/divolte-collector/0.9.0/distributions/divolte-collector-0.9.0.tar.gz
- Extract the tar archive and go inside the folder.
tar -xzf divolte-collector-*.tar.gz
cd divolte-collector-*
- Create the files below in the /conf folder with the related content:
- Run the Divolte Collector server.
./bin/divolte-collector
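To confirm that the collector started, one option is to request the tracking script it serves. The following is a minimal sketch, assuming Divolte Collector is listening locally on its default port 8290; the `divolte_status` function and the `DIVOLTE_URL` variable are illustrative names, not part of Divolte.

```shell
# Assumption: Divolte Collector runs locally on its default port (8290).
DIVOLTE_URL="${DIVOLTE_URL:-http://localhost:8290}"

# Print the HTTP status code returned for the tracking script.
# "200" suggests the collector is up; "000" means curl could not connect.
divolte_status() {
  curl -s -o /dev/null -w '%{http_code}' "$1/divolte.js"
}

# Usage, from a second terminal while the collector is running:
#   divolte_status "$DIVOLTE_URL"
```

If the collector was configured with a different host or port, set `DIVOLTE_URL` accordingly before calling the function.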
- Download Apache Kafka (check the official website for the latest available version).
wget https://downloads.apache.org/kafka/2.8.0/kafka_2.13-2.8.0.tgz
- Extract the tar archive and go inside the folder.
tar xzf kafka_*.tgz
cd kafka_*/
- You can update the config files if you wish; here we use the default values.
- Run ZooKeeper.
./bin/zookeeper-server-start.sh ./config/zookeeper.properties
- Run the Kafka broker.
./bin/kafka-server-start.sh ./config/server.properties
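- You can create a Kafka topic named "tracking" yourself, but this is optional: with Kafka's default settings (auto.create.topics.enable=true), the broker creates the topic automatically the first time Divolte Collector writes to it.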
./bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic tracking
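In Kafka 2.8.0, kafka-topics.sh also accepts `--bootstrap-server`, which talks to the broker directly instead of going through ZooKeeper. As a sketch, assuming the broker listens on the default localhost:9092 and the command is run from the Kafka folder, the topic can be inspected as follows (`describe_topic` and `BOOTSTRAP_SERVER` are illustrative names):

```shell
# Assumption: the Kafka broker listens on the default address.
BOOTSTRAP_SERVER="${BOOTSTRAP_SERVER:-localhost:9092}"

# Show partitions, replication factor, and leader for the topic named in $1.
# Must be run from the Kafka folder so that ./bin resolves.
describe_topic() {
  ./bin/kafka-topics.sh --describe --bootstrap-server "$BOOTSTRAP_SERVER" --topic "$1"
}

# Usage:
#   describe_topic tracking
```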
- You can start a Kafka console consumer for a quick check.
./bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic tracking
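Divolte Collector publishes events as Avro-serialized records, so the console consumer will likely print partly unreadable binary output; that still confirms events are arriving. The sketch below reads a limited number of records from the start of the topic and then exits, which is easier to inspect than a continuous stream. It assumes the default broker address and the Kafka folder as working directory; `peek_topic` is an illustrative name.

```shell
# Assumption: the Kafka broker listens on the default address.
BOOTSTRAP_SERVER="${BOOTSTRAP_SERVER:-localhost:9092}"

# Read up to $2 records (default 1) from the start of topic $1, then exit.
peek_topic() {
  ./bin/kafka-console-consumer.sh --bootstrap-server "$BOOTSTRAP_SERVER" \
    --topic "$1" --from-beginning --max-messages "${2:-1}"
}

# Usage: show the first record as a character dump.
#   peek_topic tracking | od -c | head
```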
- Download the Java project as a subfolder from this repository using SVN.
svn checkout https://github.com/soufianeodf/youtube-divolte-collector-with-apache-kafka/trunk/kafka-consumer
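GitHub retired its Subversion bridge in January 2024, so the svn command above may fail. A possible alternative, sketched here under the assumption that Git 2.25 or newer is installed, is a sparse checkout that fetches only the kafka-consumer folder. The `clone_subfolder` helper and the destination folder name in the usage line are made up for this example.

```shell
# Clone a repository but check out only one top-level folder (Git 2.25+).
# $1 = repository URL, $2 = folder to keep, $3 = destination directory
clone_subfolder() {
  git clone --depth 1 --filter=blob:none --sparse "$1" "$3" &&
  git -C "$3" sparse-checkout set "$2"
}

# Usage:
#   clone_subfolder https://github.com/soufianeodf/youtube-divolte-collector-with-apache-kafka.git kafka-consumer divolte-poc
#   cd divolte-poc/kafka-consumer
```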
- Open the project in your preferred editor.
- Reload the Maven project so that all dependencies are downloaded.
- Run the KafkaConsumerExample class.
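If you prefer the command line to an editor, the class can probably be started through Maven's exec plugin. This is only a sketch: it assumes Maven is installed and the command is run from the kafka-consumer folder containing the pom.xml, and the main class name is a guess that must be replaced with the fully qualified name if the class lives in a package. `MAIN_CLASS` and `run_consumer` are illustrative names.

```shell
# Hypothetical value: use the fully qualified name if the class is in a package.
MAIN_CLASS="${MAIN_CLASS:-KafkaConsumerExample}"

# Compile the project (downloading dependencies on first run) and start the consumer.
run_consumer() {
  mvn -q compile exec:java -Dexec.mainClass="$MAIN_CLASS"
}

# Usage, from the kafka-consumer folder:
#   run_consumer
```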