-
You should have an instance of MySQL Server up and running.
Create a table called "return" in MySQL and load the data into this table from return.csv.
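As a rough illustration of this prerequisite, the Python sketch below creates the "return" table from the CSV header and bulk-inserts the rows. It assumes the mysql-connector-python package, a local MySQL instance, a database named "demo", and that every column can be stored as TEXT; all of these are placeholders for your own setup (you can just as well create the table and load the CSV directly in the MySQL client).

```python
# Sketch only: load return.csv into a MySQL table called "return".
# Host, credentials, database name, and the TEXT column type are
# illustrative; adjust them to your environment.
import csv
import mysql.connector

conn = mysql.connector.connect(
    host="localhost", user="root", password="secret", database="demo")
cur = conn.cursor()

with open("return.csv", newline="") as f:
    reader = csv.reader(f)
    header = next(reader)
    # "return" is a reserved word in MySQL, so the table name is backtick-quoted.
    cols = ", ".join("`{}` TEXT".format(c) for c in header)
    cur.execute("CREATE TABLE IF NOT EXISTS `return` ({})".format(cols))
    placeholders = ", ".join(["%s"] * len(header))
    cur.executemany("INSERT INTO `return` VALUES ({})".format(placeholders),
                    list(reader))

conn.commit()
cur.close()
conn.close()
```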
-
To run this pipeline you should have an S3 account.
Upload customer.csv and product.csv to an S3 bucket.
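A minimal upload sketch using boto3 is shown below; the bucket name is a placeholder, and credentials are expected to come from your usual AWS configuration (environment variables or ~/.aws/credentials). You can also upload the files through the AWS console.

```python
# Sketch only: upload the two CSV files to an S3 bucket with boto3.
import boto3

s3 = boto3.client("s3")
bucket = "my-datahub-demo-bucket"  # replace with your bucket name

for filename in ("customer.csv", "product.csv"):
    s3.upload_file(filename, bucket, filename)
    print("uploaded", filename)
```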
-
An SAP HANA system should be up and running.
Create a table called "soHeader" in HANA and load the data from the soHeader.csv file.
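Loading HANA works the same way as the MySQL sketch above, only with the hdbcli driver and HANA's column-table syntax. Host, port, credentials, and the NVARCHAR column type below are illustrative placeholders.

```python
# Sketch only: load soHeader.csv into a HANA column table called "soHeader".
import csv
from hdbcli import dbapi

conn = dbapi.connect(address="hana.example.com", port=30015,
                     user="SYSTEM", password="secret")
cur = conn.cursor()

with open("soHeader.csv", newline="") as f:
    reader = csv.reader(f)
    header = next(reader)
    cols = ", ".join('"{}" NVARCHAR(256)'.format(c) for c in header)
    cur.execute('CREATE COLUMN TABLE "soHeader" ({})'.format(cols))
    placeholders = ", ".join(["?"] * len(header))
    cur.executemany('INSERT INTO "soHeader" VALUES ({})'.format(placeholders),
                    list(reader))

conn.commit()
cur.close()
conn.close()
```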
-
Open the SAP Data Hub launchpad and open the Modeler.
-
Create a new graph by clicking on the "+" sign at the top.
-
Search for the "MySQL Table Consumer" operator in the operator's section.
-
Drag and drop "MySQL Table Consumer" operator in to the graph.
-
Click on "Open Configuration" and provide the connection details for your MySQL server. Select "return", the table you created in the prerequisites section, as the source table.
-
Next, drag and drop the "Flowagent CSV Producer" operator into the graph.
-
Connect the above two operators.
-
The next step is to connect to the HANA system. For that, Data Hub provides the "HANA Table Consumer" operator, which allows us to consume a HANA table. Drag and drop this operator into the graph.
-
Click on "Open Configuration" and provide the connection details for your HANA system. select the source table as "soHeader" which you would have created in the prerequisite steps.
-
Drag and drop "Flowagent CSV Producer" in the graph and connect it with the "HANA Table Consumer" as shown below.
- Drag and drop two "Read File" operators to read data from your S3 bucket. For this scenario we are consuming the product and customer data from S3.
- Select the first "Read File" operator and assign the below properties:
Service: s3
Connection: Connection details for s3
Bucket: Bucket name
Path: product.csv
- Similarly, select the second "Read File" operator and give it the below properties:
Service: s3
Connection: Connection details for s3
Bucket: Bucket name
Path: customer.csv
- Now we need to install the Python libraries required to run our Python code. To do that, select the Repository tab. Expand "dockerfiles" and create a folder named "Return_Prediction_Docker_File" under it.
- Right click on “Return_Prediction_Docker_File” and select “Create File”.
- A window to create the docker file will pop up. Name it "dockerfile" and click on "Create".
- Copy the code from this file and paste it in the script section.
- Select the configuration for this docker file. Click on the "+" icon on the right side of Tags and add the following tags to the configuration by simply entering each library's name and pressing Enter.
- Save the file and build the docker file by clicking the build button. Once the build completes, the status will show as completed and the orange circle will turn green.
- Go back to the graph and search for "multiplexer" in the operators section.
- Drag and drop the "1:2 Multiplexer" operator into the graph.
- Connect the "Flowagent CSV Producer" that is attached to the "MySQL Table Consumer" to this multiplexer, so that the return data can later be read by the python operator.
- Now search for the "Python2Operator" in the operators section and drag and drop it into the graph.
- Now let's add one input port and two output ports to the python operator. To do that, select the python operator and click on "Add Port".
- Give the following properties for the input port and then click OK.
- Similarly, add the output ports and provide the below properties for them.
- Connect the input port of the python operator to the output port of the multiplexer as shown below.
- The graph will look like below.
- Now select the python operator. It will show you all the options available for this operator; choose the "Open Script" option.
- A new page will open where you can write Python code. Copy the code from here and paste it. This code runs a decision tree algorithm on the return data and builds a tree from it; a rough sketch of what it does is shown below.
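The exact script is the one linked above; the snippet below is only a sketch of its shape. It assumes the upstream "Flowagent CSV Producer" delivers the return table as CSV text, that the ports are named "input", "output", and "output1", and it uses purely illustrative feature/target column names. The `api` object is injected by the Python2Operator at runtime.

```python
# Sketch of the decision-tree script for this Python2Operator.
# "api" is provided by the operator at runtime (no import needed).
import io
import pandas as pd
from sklearn import tree

def on_input(data):
    # Assumes the incoming data is the return table as CSV text.
    df = pd.read_csv(io.StringIO(data))

    # Illustrative feature/target columns; replace with the real return.csv columns.
    features = df[["quantity", "discount"]]
    target = df["returned"]

    clf = tree.DecisionTreeClassifier(max_depth=3)
    clf.fit(features, target)

    text_tree = tree.export_text(clf, feature_names=list(features.columns))
    api.send("output", text_tree)  # plain-text tree for the Wiretap
    api.send("output1",
             "<html><body><pre>{}</pre></body></html>".format(text_tree))  # HTML Viewer

api.set_port_callback("input", on_input)
```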
- Go back to the graph.
- The next thing is to tell the graph where it can find the Python libraries that we installed. For that, right-click on the python operator and select "Group".
- Select the entire group and open its configuration.
- The next step is to add tags. Tags describe the runtime requirements of the operator and force execution in a specific Docker image instance whose docker file was annotated with the same tag and version.
- Click on the "+" button to add tags. Add the below tags.
- Now add the "Wiretap" and "HTML Viewer" operators to the graph and connect them to the "output" and "output1" ports of the python operator as shown below. Here the "HTML Viewer" operator is used to render HTML code in the browser.
- Drag and drop another python operator into the graph.
- Create four input ports named "input1", "input2", "input3", "input4" and one output port named "output" in this python operator.
- Connect the different data sources to this python operator as shown in the diagram below.
- Again add a "Group" for this python operator, following the same steps used for the first python operator above.
- Now open the script section of this python operator and copy and paste this code into it; a rough sketch of what it does is shown below.
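Again, the actual script is the one linked above; the sketch below only shows the general idea. It assumes the four datasets arrive as CSV text on input1 to input4 in the order return, soHeader, product, customer, and that the join keys are "order_id", "product_id", and "customer_id" (the port-to-dataset mapping and the key names are illustrative). It joins the sources into one master table, writes it to /vrep/vflow/data/masterData.csv, and sends a short status message to "output" for the terminal.

```python
# Sketch of the join script for the second Python2Operator.
# "api" is provided by the operator at runtime (no import needed).
import io
import pandas as pd

def on_input(return_csv, so_header_csv, product_csv, customer_csv):
    returns   = pd.read_csv(io.StringIO(return_csv))
    so_header = pd.read_csv(io.StringIO(so_header_csv))
    product   = pd.read_csv(io.StringIO(product_csv))
    customer  = pd.read_csv(io.StringIO(customer_csv))

    # Illustrative join keys; replace with the real key columns.
    master = (returns
              .merge(so_header, on="order_id", how="left")
              .merge(product, on="product_id", how="left")
              .merge(customer, on="customer_id", how="left"))

    master.to_csv("/vrep/vflow/data/masterData.csv", index=False)
    api.send("output", "masterData.csv written with {} rows".format(len(master)))

# The callback fires once data is available on all four input ports.
api.set_port_callback(["input1", "input2", "input3", "input4"], on_input)
```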
-
Add a "Terminal" operator to this operator's output port.
-
The final graph will look like below.
- Save the pipeline and run it.
- Once the pipeline is running, you can select the "HTML Viewer" and select "Open UI". Here you will see the decision tree that has been created for the return dataset.
- This pipeline also joins the data from the different data sources and saves it to the "masterData.csv" file at /vrep/vflow/data/masterData.csv. To see this file, open "System Management" from your Data Hub launchpad.
- Choose Files. Under files -> vflow -> data, you can see the "masterData.csv" file.
-
Log in to your SAC account.
-
Create a new story by clicking the Create -> New Story button on the right-hand side panel.
-
Select "Access & Explore Data".
-
Upload the "masterData.csv" file in SAC.
-
By default, SAC assigns dimensions and measures automatically. If you want to change an assignment, click on the column and then select the property from the left-hand side. For example, if you want to change "return" from a dimension to a measure, simply click on "return" and change the property to measure.
-
Now go to the story tab and add a chart. Here you can create different types of graphs such as pie charts, bar charts, or donut charts.
-
For more information about how to create graphs in SAC, please refer to the SAP Analytics Cloud documentation.
-
Some of the sample graphs are shown below.