Introduction

Hillview: a big data spreadsheet. Hillview is a cloud-based application for browsing large datasets. The Hillview user interface executes in a browser. The software is currently of alpha quality and under active development. For more information and examples see the Hillview user manual. Here is a short video of the system in action.

Developing Hillview

Software Dependencies

Back-end: Ubuntu Linux or macOS
Java 8, Maven build system, various Java libraries (Maven will manage the libraries)
Front-end: TypeScript, webpack, Tomcat app server, node.js; some JavaScript libraries: d3, pako, and rx-js
Cloud service management: Python3
IntelliJ IDEA for development (optional)

Project structure

Hillview is currently split into two separate Maven projects.

platform: pure Java, includes the entire back-end. platform can be developed using the free Community Edition of IntelliJ IDEA.
web: the web server, web client and web services; this project links to the result produced by the platform project. To develop and debug it we have used capabilities available only in the paid Ultimate edition of IntelliJ IDEA, but only Maven is needed to build it.

Single-machine development and testing

These instructions describe how to run Hillview on a single machine using a sample dataset.

First install all the required software, as described below.
Check (and edit if necessary) the file ./bin/config.sh to select the appropriate versions of the software dependencies.
Build the software:

$ cd bin
$ ./rebuild.sh

Download and prepare the sample data. The download script will download and decompress some CSV files with flight data from the FAA. You can edit the program data/ontime/download.py to change the range of data that will be downloaded; the default is to download 2 months of data. The dataset has 110 columns; Hillview can use them all, but for the demo we strip the dataset to 15 columns to better fit on the screen. The following command creates the smaller files from the downloaded data; this has to be done only once, after downloading the data.

$ ./demo-data-cleaner.sh
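The column stripping itself is done by demo-data-cleaner.sh. Purely to illustrate the idea, here is a Python sketch, using only the standard csv module, that copies a chosen subset of columns from one CSV file to another. The function name and the column names in the example are hypothetical and are not necessarily the 15 columns the script keeps.

```python
import csv
import io
from typing import Iterable, TextIO


def project_columns(src: TextIO, dst: TextIO, keep: Iterable[str]) -> int:
    """Copy only the columns named in `keep` from CSV `src` to CSV `dst`.

    Returns the number of data rows written (the header is not counted).
    """
    reader = csv.DictReader(src)
    keep = list(keep)
    # Fail early if a requested column is not in the input header.
    missing = [c for c in keep if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"columns not found in input: {missing}")
    # extrasaction="ignore" silently drops every column not listed in `keep`.
    writer = csv.DictWriter(dst, fieldnames=keep, extrasaction="ignore")
    writer.writeheader()
    rows = 0
    for row in reader:
        writer.writerow(row)
        rows += 1
    return rows


if __name__ == "__main__":
    # Hypothetical column names, for illustration only.
    sample = io.StringIO("Year,Month,Carrier,Origin,Dest,Extra\n2017,1,AA,SFO,JFK,x\n")
    out = io.StringIO()
    print(project_columns(sample, out, ["Origin", "Dest", "Carrier"]))
    print(out.getvalue())
```

Working row by row with DictReader/DictWriter keeps memory use constant, which matters for multi-gigabyte flight data files.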

Next start the back-end service which performs all the data processing:

$ ./backend-start.sh &

Start the web server which receives requests from clients and dispatches them to the back-end servers; note that the folder where this command is run is important, since the path to the data files is relative to this folder.

$ ./frontend-start.sh

Open http://localhost:8080 in a web browser.
When you are done, stop the two services by killing the frontend-start.sh and backend-start.sh jobs.
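The web server may take a few seconds to start, so it can help to check that it responds before opening the browser. The following Python sketch is not part of Hillview; the function name and the polling defaults are our own assumptions. It polls the URL until it answers or a timeout expires. Run it from a second terminal, since frontend-start.sh keeps the first one busy.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def wait_for_server(url: str = "http://localhost:8080",
                    timeout_s: float = 60.0,
                    interval_s: float = 1.0,
                    opener: Callable = urllib.request.urlopen) -> bool:
    """Poll `url` until the server answers or `timeout_s` seconds elapse.

    `opener` defaults to urllib's urlopen; it is a parameter only so the
    function can be tested without a running server.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with opener(url, timeout=5):
                return True
        except urllib.error.HTTPError:
            # The server answered, even if with an error status: it is up.
            return True
        except OSError:
            # Connection refused, DNS failure or timeout: not up yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


if __name__ == "__main__":
    print("up" if wait_for_server() else "not reachable")
```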

Deploying the Hillview service on a cluster

Hillview uses ssh to deploy code on the cluster. Prior to deployment you must set up ssh on the cluster for password-less access to the cluster machines, as described here: https://www.ssh.com/ssh/copy-id
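A quick way to confirm that password-less access works is to run a trivial remote command with ssh's BatchMode option, which makes ssh fail instead of prompting for a password. The Python sketch below does this for a list of hosts. The function names are hypothetical, and the user and host names in the tests are the placeholders from the sample configuration file.

```python
import subprocess
from typing import List, Sequence


def ssh_check_command(user: str, host: str) -> List[str]:
    """Build an ssh command that succeeds only if key-based login works.

    BatchMode=yes makes ssh fail instead of asking for a password, and
    ConnectTimeout bounds how long we wait for an unreachable host.
    The remote command `true` does nothing and exits with status 0.
    """
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=10",
            f"{user}@{host}", "true"]


def hosts_without_passwordless_ssh(user: str, hosts: Sequence[str]) -> List[str]:
    """Return the hosts where password-less ssh login did not succeed."""
    failed = []
    for host in hosts:
        result = subprocess.run(ssh_check_command(user, host),
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
        if result.returncode != 0:
            failed.append(host)
    return failed
```

For example, hosts_without_passwordless_ssh("hillview", ["worker1.name", "worker2.name"]) should return an empty list once every machine accepts key-based logins.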

Please note that Hillview allows arbitrary access to files on the worker nodes from the client application. The worker nodes should be deployed within a restricted secure environment (e.g. containers).

Before you run these commands, make sure you've built both platform and web projects. The deployment scripts are in the bin folder.

$ cd bin

Service configuration

The fixed configuration of the Hillview service is obtained from a configuration file; there is a sample file bin/config.py. This is a Python file with global variables that describe the setup of the Hillview service.

# This file is a Python program that defines the configuration for a
# Hillview deployment.  It is imported as a Python module by other
# Python files that handle the deployment.

# Name of machine hosting the web server
webserver = "web.server.name"

# Names of the machines hosting the workers; the web
# server machine can also act as a worker
backends = [
    "worker1.name",
    "worker2.name" # etc.
]

# This is a Python map which can be used to override the
# default_heap_size value below for specific machines.
backends_heapsize = {
    "worker1.name": "25G"
}

# Network port where the servers listen for requests
backend_port = 3569
# Java heap size for Hillview service
default_heap_size = "25G"
# User account for running the Hillview service
user = "hillview"
# Folder where the hillview service is installed on remote machines
service_folder = "/home/hillview"
# Version of Apache Tomcat to deploy
tomcat_version = "9.0.4"
# Tomcat installation folder name
tomcat = "apache-tomcat-" + tomcat_version
# If true delete old log files
cleanup = False
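The deployment scripts import this file as a Python module. To illustrate how its variables fit together, here is a minimal sketch (our own assumption, not the code of the deployment scripts) that loads a configuration file with importlib and resolves the heap size of each machine: an entry in backends_heapsize takes precedence, otherwise default_heap_size applies. The helper names are hypothetical.

```python
import importlib.util
from types import ModuleType
from typing import List


def load_config(path: str) -> ModuleType:
    """Import a Hillview-style config.py file as a Python module."""
    spec = importlib.util.spec_from_file_location("hillview_config", path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load configuration from {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file, defining its globals
    return module


def heap_size_for(config: ModuleType, host: str) -> str:
    """Heap size for one machine: a backends_heapsize entry wins,
    otherwise default_heap_size is used."""
    overrides = getattr(config, "backends_heapsize", {})
    return overrides.get(host, config.default_heap_size)


def all_hosts(config: ModuleType) -> List[str]:
    """The web server followed by the workers, without duplicates
    (the web server machine can also act as a worker)."""
    hosts = [config.webserver]
    for host in config.backends:
        if host not in hosts:
            hosts.append(host)
    return hosts
```

With the sample file above, heap_size_for(cfg, "worker1.name") comes from backends_heapsize, while every other host falls back to default_heap_size.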

Deployment scripts

The following command installs the software on the machines:

$ deploy.py config.py

The service is started by running the following command:

$ start.py config.py

To connect to the service open http://<webserver>:8080 in your web browser.

To stop the services you can run:

$ stop.py config.py

Contributing code

To contribute code to Hillview under the Apache 2.0 license you will need to sign a CLA (Contributor License Agreement); this is standard practice.

Setup IntelliJ IDEA

Download and install IntelliJ IDEA: https://www.jetbrains.com/idea/. On Linux you can simply untar the binary distribution in a place of your choice and run the shell script ideaXXX/bin/idea.sh. The web project uses capabilities only available in the paid (Ultimate) version of IntelliJ IDEA.

Loading into IntelliJ IDEA

One option is to load only the module you want to contribute to: change to the corresponding folder (cd platform or cd web) and start IntelliJ there.

Alternatively, if you have IntelliJ Ultimate you can create an empty project in the hillview folder, and then import three modules (from File/Project structure/Modules, add three modules: web/pom.xml, platform/pom.xml, and the root folder hillview itself).

Using git to contribute

Fork the repository using the "Fork" button on GitHub, following these instructions: https://help.github.com/articles/fork-a-repo/
Run the IntelliJ code inspection (Analyze/Inspect code) before committing and fix all the issues it reports.
Push your changes to your own fork and send us a pull request.

In more detail, here is a step-by-step guide to committing your changes. The steps that fetch from upstream assume that you have added the main Hillview repository as a remote named upstream (git remote add upstream <URL of the main repository>).

- Create a new branch for each fix; give it a descriptive name:
  - git branch yourBranchName
  - git checkout yourBranchName
  - The main benefit of branches is that you can have several of them active at the same time, one for each independent fix.
- git add <files that changed>
- git commit -m "Description of commit"
- git fetch upstream
- git rebase upstream/master
- Resolve conflicts, if any; the rebase cannot complete until you do. As you resolve each conflict, git add the files you have merged, and then run git rebase --continue (or git rebase --skip to skip a commit that is no longer needed).
- Test and analyze the merged version.
- git push -f origin yourBranchName. The -f is only needed if you are updating a previous push to this branch.
- Create a pull request (using the GitHub web UI) to merge your new branch into master.
- After the merge has been done, delete your branch: git branch -D yourBranchName
To run the program, use the master branch, kept up to date with upstream:

git checkout master
git fetch upstream
git rebase upstream/master
git push origin master

Guidance in writing code

The pseudorandom number generator is implemented in the class Randomness.java and uses the Mersenne Twister algorithm. Use this class instead of the Java Random class.
By default all pointers are assumed to be non-null; use the @Nullable annotation (from javax.annotation) for all pointers which can be null. Use Converters.checkNull to cast a @Nullable to a @NonNull pointer.
(optional) Use "mvn site" to generate the FindBugs report in target/site/findbugs.html. Make sure any new code checked in does not introduce any violations. A subset of these checks is also done by the IDEA code inspection tool.

Software needed for deployment

Installing Java

We use Java 8.

First, download a JDK for Linux x64 from http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html Note that having only a Java runtime installed is not enough; you need a JDK.

Make sure to download the tarball version of the JDK.

Unpack the JDK and set your JAVA_HOME environment variable to point to the unpacked folder (e.g., /jdk/jdk1.8.0_101). To set JAVA_HOME, add the following line to your ~/.bashrc or ~/.zshrc:

$ export JAVA_HOME="<path-to-jdk-folder>"
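To double-check the setup, the following Python sketch (a hypothetical helper, not part of Hillview) applies a simple heuristic: a JDK ships the compiler bin/javac, while a runtime-only installation does not.

```python
import os
from typing import Optional


def find_jdk_problem(java_home: Optional[str]) -> Optional[str]:
    """Describe what is wrong with `java_home`, or return None if it looks like a JDK."""
    if not java_home:
        return "JAVA_HOME is not set"
    if not os.path.isdir(java_home):
        return f"JAVA_HOME={java_home} is not a directory"
    # Heuristic: only a JDK contains the Java compiler.
    if not os.path.isfile(os.path.join(java_home, "bin", "javac")):
        return f"no bin/javac under {java_home}: this looks like a runtime, not a JDK"
    return None


if __name__ == "__main__":
    problem = find_jdk_problem(os.environ.get("JAVA_HOME"))
    print(problem or "JAVA_HOME points to a JDK")
```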

Installing other software needed

The following shell script will install the other dependencies required for building and testing.

$ cd bin
$ ./install-dependences.sh

Impala Java libraries

If you want to access the Impala database you will need to download the Impala JDBC connector libraries from Cloudera. (These are not free software, so they are not available in public Maven repositories.) You should install them in your local Maven repository, e.g. in the ~/.m2/repository/com/cloudera/impala folder. You may also need to adjust the version of the libraries in the file platform/pom.xml.
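Maven's default local repository is ~/.m2/repository, and a jar with coordinates groupId:artifactId:version is stored at <groupId with dots replaced by slashes>/<artifactId>/<version>/<artifactId>-<version>.jar. Instead of copying files by hand, you can let Maven place the jar with the install:install-file goal of the Maven Install Plugin. The Python sketch below computes the expected location and builds that command. The helper names and the coordinates used in the tests are hypothetical; the real ones must match the dependency declared in platform/pom.xml.

```python
import os
from typing import List


def local_repo_jar_path(group_id: str, artifact_id: str, version: str,
                        repo: str = os.path.expanduser("~/.m2/repository")) -> str:
    """Path where Maven's default repository layout expects the jar:
    <repo>/<groupId as path>/<artifactId>/<version>/<artifactId>-<version>.jar
    """
    return os.path.join(repo, *group_id.split("."), artifact_id, version,
                        f"{artifact_id}-{version}.jar")


def install_file_command(jar: str, group_id: str, artifact_id: str,
                         version: str) -> List[str]:
    """mvn command that installs `jar` into the local repository
    under the given coordinates."""
    return ["mvn", "install:install-file", f"-Dfile={jar}",
            f"-DgroupId={group_id}", f"-DartifactId={artifact_id}",
            f"-Dversion={version}", "-Dpackaging=jar"]
```

Pass the command list to subprocess.run to execute it, then check that the jar appears at the path returned by local_repo_jar_path.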
