Python scripts for simulating Commonfare data, calculating commonshare, calculating recommendations, and visualisation
Requirements:
Python 3.x, NetworkX 2.2, Louvain community detection, dateutil (and the random names generator if running the simulation). Install with the following commands:
pip install networkx==2.2
pip install scipy python-louvain python-dateutil names
python/
- parsegexf.py: Main class for parsing the GEXF file, which then calls makegraphs.py to calculate commonshare and output JSON files.
- config.py: Contains key constants used in the simulation. These values can be adjusted to determine how many users are generated, the number of actions per day, and how many days the simulation runs for. It also contains constants for tuning collusion detection.
- kcore.py: Contains an adjusted version of the core_number method from the 'core.py' file of NetworkX. Additional methods calculate the weighted, directed core number values at particular points in time. Also contains an implementation of a collusion detection algorithm.
- makegraphs.py: Uses the methods in kcore.py to calculate Commonshare values for each node in the graph every two weeks. Outputs JSON files, described below.
- pagerank.py: Contains an implementation of the 'Personalised PageRank' algorithm used in the story recommender (details below).
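For readers unfamiliar with the core number that kcore.py builds on, here is a minimal sketch of the standard k-core peeling procedure for an unweighted, undirected graph. It is not the adjusted weighted, directed version in kcore.py; the function name `core_numbers` and the adjacency-dict input are assumptions made purely for illustration.

```python
def core_numbers(adjacency):
    """Return {node: core number} for an undirected graph.

    `adjacency` (hypothetical input format) maps each node to the set
    of its neighbours.
    """
    degree = {node: len(nbrs) for node, nbrs in adjacency.items()}
    remaining = set(adjacency)
    core = {}
    k = 0
    while remaining:
        # Peel off the node with the smallest remaining degree.
        node = min(remaining, key=lambda n: degree[n])
        # The core number never decreases as peeling proceeds.
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        # Removing the node lowers the degree of its remaining neighbours.
        for nbr in adjacency[node]:
            if nbr in remaining:
                degree[nbr] -= 1
    return core


# A triangle (a, b, c) with one extra node d attached to a.
example = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
cores = core_numbers(example)
```

In this example the triangle nodes end up in the 2-core and the pendant node `d` in the 1-core. The repeated `min` scan keeps the sketch short at the cost of quadratic running time.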
Classes for simulation (in the /simulation directory):
- graphclasses.py: Base classes that represent entities in the simulation
- listinggenerator.py: Generates listing names by picking an adjective and a noun from requisite dictionaries
- phrases.py: Generates story 'names' in the simulation by picking four random words from a dictionary
- simulation.py: Run 'python simulation.py' from the python/simulation directory to generate simulated data (this gets stored in data/input/simulateddata.gexf)
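The name generation described above can be sketched as follows. This is an illustration only, not the code in listinggenerator.py or phrases.py: the word lists are placeholders, the function names are hypothetical, and drawing the four story words with replacement is an assumption.

```python
import random

# Placeholder word lists (assumption); the real dictionaries ship with the simulation.
ADJECTIVES = ["old", "blue", "handmade"]
NOUNS = ["bicycle", "lamp", "guitar"]
WORDS = ADJECTIVES + NOUNS + ["river", "quiet", "market"]


def listing_name(rng=random):
    """Build a listing name from one adjective and one noun."""
    return f"{rng.choice(ADJECTIVES)} {rng.choice(NOUNS)}"


def story_name(rng=random):
    """Build a story 'name' from four randomly chosen words."""
    return " ".join(rng.choice(WORDS) for _ in range(4))


# Passing a seeded generator makes a simulation run repeatable.
rng = random.Random(42)
sample_listing = listing_name(rng)
sample_story = story_name(rng)
```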
data/output/
- graphdata/biweekly/...: Contains graph-based JSON files representing every two weeks of Commonfare interactions, with Commonshare values calculated for each node (1.json ... X.json). Also contains a cumulative graph-based JSON file of every interaction made in Commonfare since its initiation (0.json)
- userdata/...: Contains a file for every user, named <USER_ID>.json, which represents their entire interaction history
- recommenderdata.gexf: Contains a cleaned version of the original GEXF, used for generating story recommendations
A very basic Docker image is available to run the Python scripts parsegexf.py and pagerank.py, the methods of which are exposed through a simple web API, as described below.
Input and output data are exchanged through the files in the ./data directory, which is mounted as a volume.
To build this image, make sure you have Docker installed on your host. If that is the case, just run:
$ docker build -t commonfare/commonshare-python .
If you now list the Docker images available on your host machine, you should see one named commonfare/commonshare-python.
$ docker images
...
commonfare/commonshare-python latest 323a3b42764f 30 minutes ago 297MB
...
This Docker image runs the Flask app, which exposes a simple API for running the following two Python scripts:
- parsegexf.py, which takes as input a file in GEXF format and produces as output a series of files in the ./data/output/ directory.
- pagerank.py, which takes as input a story ID and a user ID and calculates the recommended stories for that user based on the input story.
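To give a feel for how a story ID and a user ID can drive a recommendation, here is a minimal sketch of personalised PageRank by power iteration, in which the random walk restarts at the given story and user. It is not the implementation in pagerank.py: the function name, the adjacency-dict input, the damping factor of 0.85, and the toy graph are all assumptions.

```python
def personalised_pagerank(out_edges, seeds, alpha=0.85, iterations=50):
    """Score nodes by random walks that restart at `seeds`.

    `out_edges` (hypothetical format) maps every node to a list of the
    nodes it links to; every target and every seed must also be a key.
    """
    nodes = list(out_edges)
    seeds = list(dict.fromkeys(seeds))  # drop duplicate seeds, keep order
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iterations):
        # With probability 1 - alpha the walk jumps back to a seed.
        new_rank = {n: (1.0 - alpha) * restart[n] for n in nodes}
        for node in nodes:
            targets = out_edges[node]
            if targets:
                share = alpha * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: hand its score back to the seeds.
                for seed in seeds:
                    new_rank[seed] += alpha * rank[node] / len(seeds)
        rank = new_rank
    return rank


# Toy interaction graph (assumption): users link to stories and vice versa.
graph = {
    "user1": ["story1"],
    "story1": ["user1", "user2"],
    "user2": ["story1", "story2"],
    "story2": ["user2"],
    "story3": [],
}
scores = personalised_pagerank(graph, seeds=["story1", "user1"])
```

Stories other than the input story could then be sorted by score and the top three returned. Here `story2`, which is reachable from the seeds, scores higher than the unconnected `story3`.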
Parameters and environment variables
The following environment variables are used as parameters and can be set when running the Docker image:
- TASK: can be either parse or pagerank, depending on which task you want to be performed. Default: parse
- GEXF_INPUT: the GEXF input file that will be parsed when running the parse task. Default: ./data/input/latest.gexf
- PAGERANK_FILE: the input file used when calculating recommendations through the pagerank task. Default: ./data/output/recommenderdata.gexf
- STORY_ID: input story used for the pagerank task
- USER_ID: input user used for the pagerank task
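As an illustration of how these parameters and their defaults fit together, the following sketch reads them in Python. The function name `read_settings`, the dictionary keys, and the validation of TASK are assumptions; the scripts in the image may read the variables differently.

```python
import os


def read_settings(env=os.environ):
    """Collect the documented parameters, falling back to the documented defaults."""
    settings = {
        "task": env.get("TASK", "parse"),
        "gexf_input": env.get("GEXF_INPUT", "./data/input/latest.gexf"),
        "pagerank_file": env.get("PAGERANK_FILE", "./data/output/recommenderdata.gexf"),
        # No defaults are documented for these two, so they may be None.
        "story_id": env.get("STORY_ID"),
        "user_id": env.get("USER_ID"),
    }
    # Rejecting unknown tasks is an assumption, not documented behaviour.
    if settings["task"] not in ("parse", "pagerank"):
        raise ValueError("TASK must be 'parse' or 'pagerank'")
    return settings


# With no variables set, every documented default applies.
defaults = read_settings({})
```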
A few examples are provided in the sections below to better clarify how to use this docker image.
The following command will start the service, connecting port 5000 of the Docker container (Flask default) to port 5000 of your machine:
$ docker run -it --rm -p 5000:5000 -v "$PWD/data":/usr/src/app/data commonfare/commonshare-python
Specify a different input file via the GEXF_INPUT environment variable:
$ docker run -it --rm -p 5000:5000 -v "$PWD/data":/usr/src/app/data -e GEXF_INPUT=./data/input/input3.gexf commonfare/commonshare-python
If you prefer docker-compose, you can build and run with:
$ docker-compose build
$ docker-compose up
To run parsegexf.py, use the following URL...
#This will return a simple JSON object {success: true} on successful completion (note this takes a few minutes)
http://127.0.0.1:5000/parse
...and to run pagerank.py...
#This will return a JSON array of three IDs corresponding to stories that the user specified by *userid* should be recommended on reading story *storyid*. If the story or user ID cannot be found, [0,0,0] will be returned instead.
http://127.0.0.1:5000/recommend/*storyid*/*userid*
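The two URLs above can also be called from a script. The following is a minimal client sketch using only the Python standard library. The helper names, the 600-second timeout (chosen because parsing takes a few minutes), and the assumption that the returned IDs are JSON numbers are illustrative, not part of the project.

```python
import json
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:5000"  # service started with -p 5000:5000


def parse_url(base=BASE_URL):
    """URL that triggers parsegexf.py."""
    return f"{base}/parse"


def recommend_url(story_id, user_id, base=BASE_URL):
    """URL that triggers pagerank.py for the given story and user."""
    return f"{base}/recommend/{story_id}/{user_id}"


def get_json(url, timeout=600):
    """Fetch a URL and decode its JSON body."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def recommendations(story_id, user_id, base=BASE_URL):
    """Return the recommended story IDs, or an empty list if none were found."""
    ids = get_json(recommend_url(story_id, user_id, base))
    # [0, 0, 0] signals that the story or user ID could not be found.
    return [] if ids == [0, 0, 0] else ids
```

With the container running, `get_json(parse_url())` triggers a parse and `recommendations(12, 34)` requests recommendations for a hypothetical story 12 and user 34.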