Instructions to install and run PDI Docker
This is the Dockerized version of Polar Deep Insights. The project is composed of two parts: the Insight Generator, which extracts insights from a data set, and the Insight Visualizer, which allows for exploration of those insights.
- Install Docker.
- Verify that it is running by typing
docker ps
into a command prompt. If you get a response, it is running.
- If you normally log into a Docker registry on your machine, do so now.
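The verification step above can be scripted; a minimal sketch that fails early if the Docker daemon is not reachable:

```shell
# Exit with an error if the Docker daemon is not reachable.
if docker ps > /dev/null 2>&1; then
  echo "Docker is running"
else
  echo "Docker is not running; start the Docker daemon and retry" >&2
  exit 1
fi
```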
- Install elasticsearch-tools (install npm first if it is not already installed)
$ sudo npm install -g elasticsearch-tools
This installs the elasticsearch-tools command-line utilities used in the export steps below.
- Download and unpack Polar Deep Insights
$ git clone https://github.com/USCDataScience/polar-deep-insights.git
This creates a polar-deep-insights folder and downloads the project files.
cd polar-deep-insights/Docker
- Download the polar.usc.edu index mappings
es-export-mappings --url http://polar.usc.edu/elasticsearch --file data/polar/polar-data-mappings.json
These mappings enable the Insight Generator to understand the data provided.
- Download the Elasticsearch index data
es-export-bulk --url http://polar.usc.edu/elasticsearch --file data/polar/polar-data.json
This will take a while - the polar data set contains 100k documents (go get coffee).
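Once the export finishes you can get a rough document count from the file. This sketch assumes the export is in Elasticsearch bulk format (alternating action and source lines) - an assumption about elasticsearch-tools' output - and builds a tiny sample file to illustrate; point FILE at data/polar/polar-data.json for the real export.

```shell
# Tiny two-document sample in bulk format, for illustration only;
# replace FILE with data/polar/polar-data.json for the real export.
FILE=/tmp/sample-bulk.json
cat > "$FILE" <<'EOF'
{"index":{"_index":"polar","_type":"doc","_id":"1"}}
{"title":"doc one"}
{"index":{"_index":"polar","_type":"doc","_id":"2"}}
{"title":"doc two"}
EOF
# Each document contributes one action line containing "index":
grep -c '"index"' "$FILE"
# prints 2
```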
- Pull the Insight Generator image
docker pull uscdatascience/pdi-generator
You can also build it locally with:
cd insight-generator
docker build -t uscdatascience/pdi-generator -f InsightGenDockerfile .
- Start the Insight Generator
PDI_JSON_PATH=/data/polar docker-compose up -d
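After the container starts, you can sanity-check it. This is a sketch: the `ancestor` filter matches on the image name used above, and the Tika check assumes the container's port 9998 is mapped to the same port on the host.

```shell
# Confirm a container built from the generator image is running.
docker ps --filter ancestor=uscdatascience/pdi-generator

# The Apache Tika server answers a plain GET on its /tika endpoint.
curl -s http://localhost:9998/tika
```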
- Build the Insight Visualizer image
cd ../insight-visualizer
docker build -t uscdatascience/polar-deep-insights -f PolarDeepInsightsDockerfile .
You can also pull it from Docker Hub with:
docker pull uscdatascience/polar-deep-insights
- Start the Insight Visualizer
PDI_JSON_PATH=data/polar docker-compose up -d
- Access the application at http://localhost/pdi/
- Access Elasticsearch at http://localhost/elasticsearch/
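A quick way to confirm both endpoints above are reachable from the command line - a sketch, assuming the containers are up and bound to port 80:

```shell
# -f makes curl exit non-zero on HTTP errors, -s silences progress output.
curl -sf http://localhost/pdi/ > /dev/null && echo "PDI UI reachable"
curl -sf http://localhost/elasticsearch/ > /dev/null && echo "Elasticsearch reachable"
```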
- If you are planning on analyzing your own files, copy them into the
data/files
folder. The system will recognize and extract data from over 1400 different file types.
- If using your own database, replace
http://polar.usc.edu/elasticsearch
in the export commands above with your remote Elasticsearch URL (or your localhost Elasticsearch index's URL) and rerun those commands.
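For example, the two export commands pointed at your own index might look like this - http://localhost:9200 is a hypothetical local Elasticsearch URL, not part of the project:

```shell
# Substitute your own Elasticsearch URL for http://localhost:9200.
es-export-mappings --url http://localhost:9200 --file data/polar/polar-data-mappings.json
es-export-bulk --url http://localhost:9200 --file data/polar/polar-data.json
```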
Add files to the following folders according to these instructions:
- data/files : Add your data files - of any filetype - to generate insights from
- data/polar : Contains mappings and data from the Elasticsearch URL
- data/ingest : Output from the PDI Insight Generator will be saved here under the filename ingest_data.json
- data/sparkler/raw : Add Sparkler-crawled data from the SOLR index into the sparkler_rawdata.json file in this folder
- data/sparkler/parsed : Sparkler data (in data/sparkler/raw/sparkler_rawdata.json) is parsed using parse.py and saved in sparkler_data.json
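The folder layout above can be created in one step (run from the Docker/ directory; `mkdir -p` is harmless if some folders already exist):

```shell
# Create the full data layout expected by the generator and visualizer.
mkdir -p data/files data/polar data/ingest data/sparkler/raw data/sparkler/parsed
ls -R data
```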
The Insight Generator Docker container exposes the following ports:
- 8765 - Geo Topic Parser
- 9998 - Apache Tika Server
- 8060 - Grobid Quantities REST API

The Insight Visualizer Docker container exposes the following ports:
- 80 - Apache2/HTTPD server
- 9000 - Grunt server serving up the PDI application
- 9200 - Elasticsearch 2.4.6 server
- 35729 - Auto refresh port for AngularJS apps
docker logs -f container_id
Follows a container's logs - use your Docker container's id.
docker exec -it container_id bash
Opens a shell inside a container - use your Docker container's id.
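If you do not know the container id, you can look it up by image name and feed it straight into the commands above - a sketch, assuming the image names from the pull commands earlier:

```shell
# Look up the id of the running generator container by image name.
CID=$(docker ps -q --filter ancestor=uscdatascience/pdi-generator)

# Follow its logs, or open a shell inside it.
docker logs -f "$CID"
docker exec -it "$CID" bash
```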
PS: You need to install a CORS extension in your browser and enable it in order to download the concept ontology and additional precomputed information from http://polar.usc.edu/elasticsearch/ and elsewhere.
Information Retrieval and Data Science (IRDS) research group, University of Southern California.