Instructions to install and run PDI Docker

Chris Mattmann edited this page Dec 11, 2018 · 6 revisions

This is the Dockerized version of Polar Deep Insights. The project is composed of two parts: the Insight Generator, which extracts insights from a data set, and the Insight Visualizer, which allows for exploration of those insights.

Quick Start

Prerequisites

  1. Install Docker
    • Verify that it is running by typing docker ps at a command prompt. If you get a response, it is running.
    • If you normally log into a Docker registry on your machine, do so now.
  2. Install elasticsearch-tools
    • Install npm first if it isn't installed already; otherwise skip that step.
    • $ sudo npm install -g elasticsearch-tools
      • Installs the elasticsearch-tools export/import utilities.
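Before moving on, it can help to confirm the prerequisites are actually on your PATH. A minimal sketch (the tool names come from the steps above; nothing here is project-specific):

```shell
# Check each prerequisite named in the steps above and report its status.
for tool in docker npm es-export-mappings es-export-bulk; do
  if command -v "$tool" > /dev/null 2>&1; then
    echo "found:   $tool"
  else
    echo "missing: $tool"
  fi
done
```

If any line reports "missing", revisit the corresponding install step before continuing.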

Download Dependencies

  1. Download Polar Deep Insights

    • $ git clone https://github.com/USCDataScience/polar-deep-insights.git
      • Creates a polar-deep-insights folder and downloads the project files.
    • $ cd polar-deep-insights/Docker
  2. Download polar.usc.edu index mappings

    • $ es-export-mappings --url http://polar.usc.edu/elasticsearch --file data/polar/polar-data-mappings.json
      • These mappings let the Insight Generator understand the structure of the data provided.
  3. Download Elasticsearch index data

    • $ es-export-bulk --url http://polar.usc.edu/elasticsearch --file data/polar/polar-data.json
      • This will take a while - the polar data set contains 100k documents (go get coffee).
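The two export commands above differ only in the tool name, so if you expect to point at a different Elasticsearch instance later (see the NOTES section), it can be convenient to parameterize the URL. A sketch, where ES_URL is an illustrative variable and not part of the project; the commands are echoed here rather than executed:

```shell
# Default to the polar.usc.edu index; override by exporting ES_URL first.
ES_URL="${ES_URL:-http://polar.usc.edu/elasticsearch}"
echo "es-export-mappings --url $ES_URL --file data/polar/polar-data-mappings.json"
echo "es-export-bulk --url $ES_URL --file data/polar/polar-data.json"
```

Drop the echo wrappers to run the exports for real.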

Build Insight Generator Docker Container

  1. docker pull uscdatascience/pdi-generator
    • Alternatively, build the image locally: cd insight-generator, then run the build command in the next step.
  2. docker build -t uscdatascience/pdi-generator -f InsightGenDockerfile .
    • Skip this step if you pulled the image from Docker Hub.
  3. PDI_JSON_PATH=/data/polar docker-compose up -d

Build Insight Visualizer Docker Container

  1. cd ../insight-visualizer
  2. docker build -t uscdatascience/polar-deep-insights -f PolarDeepInsightsDockerfile .
    • You can also pull from Docker Hub with docker pull uscdatascience/polar-deep-insights
  3. PDI_JSON_PATH=data/polar docker-compose up -d
  4. Access application at http://localhost/pdi/
  5. Access elasticsearch at http://localhost/elasticsearch/
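Once the containers are up, a quick reachability check against the two endpoints above can confirm the stack started (this sketch assumes the visualizer container is already running on localhost):

```shell
# Probe each endpoint listed in the steps above and report OK or DOWN.
for url in http://localhost/pdi/ http://localhost/elasticsearch/; do
  if curl -fs -o /dev/null --max-time 5 "$url"; then
    echo "OK   $url"
  else
    echo "DOWN $url"
  fi
done
```

A "DOWN" line usually means the container is still starting or docker-compose failed; check docker logs as described below.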

NOTES

  • If you plan to analyze your own files, copy them into the data/files folder. The system will recognize and extract data from over 1400 different file types.
  • If you are using your own database, replace http://polar.usc.edu/elasticsearch in the commands above with the URL of your remote Elasticsearch instance or local Elasticsearch index, then rerun those commands.
  1. Add files to the following folders according to these instructions:

    1. data/files : Your own data files, of any file type, to generate insights from
    2. data/polar : Mappings and data exported from the Elasticsearch URL
    3. data/ingest : Output from the PDI Insight Generator is saved here under the filename ingest_data.json
    4. data/sparkler/raw : Sparkler-crawled data from the Solr index, added to the sparkler_rawdata.json file in this folder
    5. data/sparkler/parsed : Sparkler data (from data/sparkler/raw/sparkler_rawdata.json) is parsed using parse.py and saved in sparkler_data.json
  2. The Insight Generator Docker container exposes the following ports:
    • 8765 - Geo Topic Parser
    • 9998 - Apache Tika Server
    • 8060 - Grobid Quantities REST API

  3. The Insight Visualizer Docker container exposes the following ports:
    • 80 - Apache2/HTTPD server
    • 9000 - Grunt server serving up the PDI application
    • 9200 - Elasticsearch 2.4.6 server
    • 35729 - Auto-refresh port for AngularJS apps
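The folder layout described in the notes above can be created ahead of time so the containers have somewhere to read from and write to. A minimal sketch, with paths relative to polar-deep-insights/Docker:

```shell
# Create the data directory layout listed in the notes above.
mkdir -p data/files data/polar data/ingest data/sparkler/raw data/sparkler/parsed
# List the directories to confirm they exist.
ls -d data/files data/polar data/ingest data/sparkler/raw data/sparkler/parsed
```

After this, copy your own files into data/files and any Sparkler output into data/sparkler/raw as described above.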

Monitoring the Container

docker logs -f container_id (substitute your container's id)

Logging onto the Container with a Bash Shell

docker exec -it container_id bash (substitute your container's id)
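Rather than copying the container id by hand from docker ps, you can look it up by image name. A sketch, using the visualizer image name from the build steps above (requires a running Docker daemon):

```shell
# Find the id of the first container running the visualizer image, if any.
CID="$(docker ps --filter ancestor=uscdatascience/polar-deep-insights --format '{{.ID}}' 2>/dev/null | head -n1)"
echo "container id: ${CID:-<none running>}"
```

The same filter works for the generator image by swapping in uscdatascience/pdi-generator.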

PS: You need to add a CORS extension to your browser and enable it in order to download the concept ontology and additional precomputed information from http://polar.usc.edu/elasticsearch/ and elsewhere.