
divolte-collector file inside the bin folder can only be run on a Linux machine? #1

Open
prk2331 opened this issue Jan 16, 2022 · 17 comments

prk2331 commented Jan 16, 2022

I am facing an issue while running this script on Windows:

.\bin\divolte-collector

Instead of executing the file, this just opens it. Can you please help?

soufianeodf (Owner) commented Jan 16, 2022

Unfortunately, Divolte Collector does not support Windows, as described in the documentation (http://divolte-releases.s3-website-eu-west-1.amazonaws.com/divolte-collector/0.9.0/userdoc/html/getting_started.html): "Divolte Collector is currently only supported on Unix-like systems, such as Linux or Mac OS X. While it should also work in Cygwin, we haven’t tested this yet."

But the good news is that you can use Docker to run it on Windows. You can check this repository, https://github.com/soufianeodf/youtube-divolte-kafka-druid-superset, where I built a Divolte Collector Docker image, or check the Divolte team's own repository to see how they built and use the Docker image: https://github.com/divolte/docker-divolte.
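For instance, assuming the pre-built image published from the divolte/docker-divolte repository (image name as given in its README), running it is a one-liner:

# Sketch: run the pre-built Divolte Collector image and expose its
# default port 8290 on the host.
docker run --name divolte -p 8290:8290 divolte/divolte

After that, http://localhost:8290/divolte.js should be served by the container.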

prk2331 commented Jan 16, 2022

Thanks for your reply, @soufianeodf. Really appreciated!
I will check out the Docker image.

prk2331 commented Jan 16, 2022

Hi @soufianeodf,
I am facing this error:

$ docker run baa86b5b4117
standard_init_linux.go:228: exec user process caused: no such file or directory

soufianeodf (Owner) commented:

Can you please show me your Dockerfile content?

soufianeodf self-assigned this Jan 16, 2022
prk2331 commented Jan 17, 2022

[Dockerfile content was shared here, apparently as an attachment not captured in this transcript.]

soufianeodf (Owner) commented Jan 17, 2022

Update the last line of your Dockerfile to CMD ["sh", "/opt/divolte/start.sh"], because you are working on Windows: a common cause of this "no such file or directory" error is a script whose shebang line or executable bit was broken by a Windows checkout (CRLF line endings turn #!/bin/sh into #!/bin/sh\r), and invoking the script through sh sidesteps both.
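For context, the tail of such a Dockerfile would look something like this (everything except the CMD line is a hypothetical reconstruction):

# ... base image and earlier COPY/RUN steps of your Dockerfile ...
COPY start.sh /opt/divolte/start.sh

# Running the script through `sh` avoids depending on its shebang line and
# executable bit, both of which a Windows checkout can break
# (CRLF line endings turn "#!/bin/sh" into "#!/bin/sh\r").
CMD ["sh", "/opt/divolte/start.sh"]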

prk2331 commented Jan 17, 2022

@soufianeodf yes, it's working now.

So here is what we are doing: we use Docker to run the Divolte Collector, and all the conf files are copied from the local machine into the container when we build the image.

Then I changed the divolte-collector conf according to the container paths:

schema_file = "/opt/divolte/divolte-collector/conf/MyEventRecord.avsc"
mapping_script_file = "/opt/divolte/divolte-collector/conf/mapping.groovy"

Docker build and run, exposing the port:

sudo docker build -t divoltecollector .
sudo docker run -p 8290:8290 divoltecollector:latest

I think the Kafka connection is lost here, because our Divolte Collector is running inside the container while Kafka is running outside it; that's why I get a "Broker may not be available" error. Also, localhost:8290 is not accessible outside the container; inside the container, localhost:8290 is reachable.

So do we need to set up Kafka in the Dockerfile as well? Please help.

2022-01-17 15:47:14.405Z [kafka-producer-network-thread | divolte.collector] WARN [NetworkClient]: [Producer clientId=divolte.collector] Connection to node -1 could not be established. Broker may not be available
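One way to reach a broker that runs on the host from inside the container is to point Divolte's Kafka producer at the host rather than at localhost. Below is a sketch of the relevant divolte-collector.conf excerpt, assuming Docker Desktop for Windows (where host.docker.internal resolves to the host machine) and the key names from the Divolte 0.9 configuration reference:

divolte.global.kafka {
  enabled = true
  producer = {
    // "localhost:9092" would point at the container itself; use the
    // special Docker Desktop hostname to reach Kafka on the Windows host.
    bootstrap.servers = "host.docker.internal:9092"
  }
}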

soufianeodf (Owner) commented:

Yes, you can use an image of Apache Kafka. You can check the following docker-compose file and just pick the elements you need: https://github.com/soufianeodf/youtube-divolte-kafka-druid-superset/blob/main/docker-compose.yml

prk2331 commented Jan 18, 2022

Hello @soufianeodf,
Can you please guide me on a couple of questions?

A. In the Avro schema file, does the "name" key hold the Kafka topic name or the name of the event fired by the Divolte signal? I think the Kafka topic name.
1. If so, what is the use of myCustomEvent in the example below?
2. What is type: "record" used for in Avro?
3. Does the first parameter of divolte.signal take a custom event name or a DOM event? I ran your example and it works very well, and when I changed divolte.signal('onloadEvent',{"os_name": "Ubuntu"}) to something else it still worked fine.

_JS file_:
divolte.signal('myCustomEvent',{"os_name": "Ubuntu"})

_groovy_:
map eventParameter('os_name') onto 'os_name'

_Avro Schema:_
{
  "name": "tracking",
  "type": "record",
  "fields": [
    { "name": "os_name", "type": "string" }
  ]
}

B. I am unable to define multiple schemas for this one topic. I tried putting a second schema right after the first one, separated by a space.

soufianeodf (Owner) commented Jan 18, 2022

Hey @prk2331,

Good questions.

A-0: In the Avro schema, "name" is just an arbitrary name you give to your record; it is not the Kafka topic name and doesn't matter in this case. You can learn more about Avro schemas in the documentation (https://avro.apache.org/docs/current/gettingstartedjava.html), and you can play with this online tool, which generates an Avro schema from a JSON payload you pass it: http://avro4s-ui.landoop.com. It will help you understand Avro, but normally you should read the documentation and build your Avro schema yourself.

A-1: myCustomEvent is the eventType. You can name it whatever you want, and then in the Groovy file you can map it onto a field you defined in the Avro schema. You can check the Divolte Collector documentation:
http://divolte-releases.s3-website-eu-west-1.amazonaws.com/divolte-collector/0.9.0/userdoc/html/configuration.html#browser-sources

http://divolte-releases.s3-website-eu-west-1.amazonaws.com/divolte-collector/0.9.0/userdoc/html/mapping_reference.html#eventtype
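To make this concrete, a minimal mapping.groovy sketch for the example above (assuming the when/apply form from the mapping reference; the event name and field are taken from the earlier snippet):

mapping {
    // Only apply these parameter mappings when the page called
    // divolte.signal('myCustomEvent', {...}); events signalled under
    // any other name leave os_name unmapped.
    when eventType().equalTo('myCustomEvent') apply {
        map eventParameter('os_name') onto 'os_name'
    }
}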

A-2: "record" in Avro corresponds to an object in JSON; again, you will need to check the documentation to understand Avro better.

A-3: No, the first parameter of divolte.signal is an eventType, not a DOM event, as explained previously.

B: If you want to send multiple events with different schemas to an Apache Kafka topic, you will need to create separate Avro schema files, each with its related mapping.
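Wiring that up in divolte-collector.conf looks roughly like this (a sketch with hypothetical names and topics, following the 0.9 configuration reference; note that every event from a source still flows through every mapping attached to it, which is the limitation discussed later in this thread):

divolte {
  sources {
    browser = { type = browser }
  }

  sinks {
    // One Kafka sink per topic (names and topics here are hypothetical).
    product_sink = { type = kafka, topic = "product-events" }
    user_sink = { type = kafka, topic = "user-events" }
  }

  mappings {
    product = {
      schema_file = "/opt/divolte/divolte-collector/conf/Product.avsc"
      mapping_script_file = "/opt/divolte/divolte-collector/conf/product-mapping.groovy"
      sources = [browser]
      sinks = [product_sink]
    }
    user = {
      schema_file = "/opt/divolte/divolte-collector/conf/User.avsc"
      mapping_script_file = "/opt/divolte/divolte-collector/conf/user-mapping.groovy"
      sources = [browser]
      sinks = [user_sink]
    }
  }
}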

soufianeodf (Owner) commented:

You can check my YouTube channel, where I shared a couple of videos explaining these details: https://www.youtube.com/channel/UC7uhy5NJ3Cenz0kNNmtsw1g

prk2331 commented Jan 20, 2022

Hi @soufianeodf,
All the links you provided helped me a lot. Now I am struggling to find out whether this is achievable:

I decided to integrate Divolte with webpack. Can you please help me understand how to run it as a static JS file, or in some offline way? My wrapper JS calls Divolte.
I tried this: I downloaded divolte.js and pasted it into my.js, then changed <script src="http://localhost:8290/divolte.js"> to <script src="my.js">, but I get console errors that two cse-events are not found.

prk2331 commented Jan 23, 2022

Hi @soufianeodf,
Can you please help me out? Did you find a solution for this problem?
https://stackoverflow.com/questions/68850178/divolte-collector-and-kafka-send-each-event-to-its-specific-topic

soufianeodf (Owner) commented Jan 23, 2022

Hey @prk2331, glad to hear that the provided links helped you.

About downloading the divolte.js file: I'm afraid it's not going to work. That JavaScript file is served by the Divolte server and is the piece that passes the data to the Divolte Collector, so if you serve a downloaded copy of it, the data will not be sent to the Divolte server and you won't see anything coming into Divolte.

About that issue: it is a limitation of Divolte. The solution I have used is to keep only one big schema, send everything through Divolte to a single Kafka topic, and then use Logstash as a filter. Logstash filters on a field I called "category" and pushes each category type to its own Elasticsearch index.

prk2331 commented Jan 24, 2022

I have completed all the test cases needed to integrate Divolte with my project, and I am hitting the same error you faced.

I created two Kafka topics, each with its own Groovy mapping file and Avro schema file, and wired each to its own Kafka sink. What Divolte does is push every record to all the Kafka sinks, which is totally wrong here: if a field name is shared between the two schemas, both Kafka topics get the same value for that field, and the fields that don't match come through as blank strings, but every record is still pushed to all the Kafka sinks.
If this is really the behaviour, I will have to drop Divolte. Can you please help me find an alternative solution we could apply here?

I have also struggled to figure this out: I am passing a random string as the event name, "myevent" below:

divolte.signal('myevent', {eventType: "view", userId: "282", productId: "181", bar: "1"})

What is the use of myevent here? My guess is that by passing the event name we can bind some mapping values to the Avro schema fields, something like that?

The major issue is that Divolte is pushing the messages to all the Kafka sinks.

soufianeodf (Owner) commented:

Hey @prk2331

Yes, this is an issue with Divolte, and as I told you, the solution I have adopted is to create only one Avro file, containing the merge of your two Avro files. For example, say you send two payloads that respectively match two Avro schema files, avro-1.avsc and avro-2.avsc:

Payload that matches the first Avro file:

{
   "eventType":"product_detail",
   "userId":"282",
   "productId":"181",
   "bar":"1"
}

Payload that matches the second Avro file:

{
   "eventType":"user_information",
   "userName":"john doe",
   "userEmail":"[email protected]"
}

The final payload when eventType is product_detail:

{
   "eventType":"product_detail", // one of ["product_detail", "user_information"]

   "product___userId":"282",
   "product___productId":"181",
   "product___bar":"1",

   "user___userName":"",
   "user___userEmail":""
}

The final payload when eventType is user_information:

{
   "eventType":"user_information", // one of ["product_detail", "user_information"]

   "product___userId":"",
   "product___productId":"",
   "product___bar":"",

   "user___userName":"john doe",
   "user___userEmail":"[email protected]"
}

And then you will send that payload to only one Kafka topic.
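A merged Avro schema along those lines might look like the sketch below; the record name reuses "tracking" from earlier in this thread, and the empty-string defaults are an assumption so unmatched fields can come through blank:

{
  "name": "tracking",
  "type": "record",
  "doc": "Hypothetical merged schema: field names mirror the example payloads above; empty-string defaults let unmatched fields come through blank.",
  "fields": [
    { "name": "eventType", "type": "string" },
    { "name": "product___userId", "type": "string", "default": "" },
    { "name": "product___productId", "type": "string", "default": "" },
    { "name": "product___bar", "type": "string", "default": "" },
    { "name": "user___userName", "type": "string", "default": "" },
    { "name": "user___userEmail", "type": "string", "default": "" }
  ]
}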

Then you will need to use Logstash as a filter and push the result to Elasticsearch; the content of the Logstash config file will be something like this:

input {
    # Consume the Avro-encoded Divolte events from the single Kafka topic.
    kafka {
        bootstrap_servers => "ip-address-of-your-kafka-server:9092"
        topics => ["your-kafka-topic-name"]
        codec => avro {
            schema_uri => "/usr/share/logstash/config/you-avro-schema.avsc"
        }
        value_deserializer_class => "org.apache.kafka.common.serialization.ByteArrayDeserializer"
    }
}

filter {
    # Drop the fields that belong to the other event type, so each
    # document only keeps its own prefixed field group.
    if [eventType] == 'product_detail' {
        prune {
            blacklist_names => ["^user___.*"]
        }
    }

    if [eventType] == 'user_information' {
        prune {
            blacklist_names => ["^product___.*"]
        }
    }
}

output {
    stdout {
        codec => rubydebug
    }

    # Route each eventType to its own Elasticsearch index; anything
    # unrecognized goes to a catch-all index.
    if [eventType] == 'product_detail' {
        elasticsearch {
            hosts => "elasticsearch:9200"
            index => "my_product_detail_index"
        }
    }
    else if [eventType] == 'user_information' {
        elasticsearch {
            hosts => "elasticsearch:9200"
            index => "my_user_information_index"
        }
    }
    else {
        elasticsearch {
            hosts => "elasticsearch:9200"
            index => "my_unrecognized_events_index"
        }
    }
}

prk2331 commented Dec 31, 2022

Hi @soufianeodf,
Wishing a very Happy New Year to you and your family.

Can you please give me your opinion regarding action tracking: capturing like/dislike, touch events on click, etc.? I am assuming that for this use case the Divolte + Logstash arrangement we are using won't work, and that Divolte is only a good fit for event capturing (add to cart, purchase, etc.).

Do we write custom JS to capture this action tracking, or do we handle it in Divolte as well (eventType == LIKE {"product_id": "102x"} ...)? What is the best practice?

One more thing I want to mention: Divolte's party ID (visitor ID) and session ID are also behaving incorrectly. After every refresh the party ID and session ID change (a persistence issue), and after a session timeout no new session is created for the next payload (an incremental web-session issue). I am handling this with custom logic in the front end, saving the IDs in cookies so we know where to start (the code is under development). I have also received a new requirement for this action tracking.

We are building a recommendation engine product.

Thanks
