Reuse the ml4cvd dependencies file to build the custom Terra image. (#…
deflaux authored Jun 4, 2020
1 parent d89fd55 commit 70e331e
Showing 4 changed files with 21 additions and 89 deletions.
92 changes: 13 additions & 79 deletions docker/terra_image/Dockerfile
@@ -1,90 +1,24 @@
FROM us.gcr.io/broad-dsp-gcr-public/terra-jupyter-r:0.0.7
# https://github.com/DataBiosphere/terra-docker/blob/master/terra-jupyter-r/CHANGELOG.md
FROM us.gcr.io/broad-dsp-gcr-public/terra-jupyter-gatk:1.0.0
# https://github.com/DataBiosphere/terra-docker/blob/master/terra-jupyter-gatk/CHANGELOG.md

USER root

# Fix annoying pip warnings in the source image.
RUN chmod -R a+rwx $HOME/.cache

# Add to the PATH so that the tensorflow upgrade does not emit warnings.
ENV PATH $PATH:$HOME/.local/bin

# Temporarily make pip install to the system directory.
ENV PIP_USER=false

# Add minimal ml4cvd dependencies.
RUN apt-get install libxt-dev -y \
# TODO(deflaux) update and use https://github.com/broadinstitute/ml/blob/master/docker/vm_boot_images/config/tensorflow-requirements.txt
# instead, after the Tensorflow 2.0 pull request is merged.
&& pip3 install \
apache-beam[gcp]==2.12.0 \
google-cloud-storage==1.25.0 \
h5py==2.9.0 \
imageio==2.6.1 \
keras==2.2.5 \
nibabel==2.5.0 \
seaborn==0.9.0 \
vtk==8.1.2

# Add visualization libraries not installed by default on Terra and upgrade
# to newer versions for other libraries that are installed by default.
RUN pip3 install --upgrade \

# Install signal processing libraries.
biosppy \

# Upgrade to latest version of this so that the tensorflow upgrade does not
# emit warnings.
setuptools \

# Upgrade to latest version of this so that %%bigquery does not print the
# first few rows of the downloaded dataframe of query results.
# Pin version due to https://github.com/googleapis/google-cloud-python/issues/9965
google-cloud-bigquery[pandas]==1.22.0 \

# Upgrade to latest version of tensorflow.
tensorflow \

# Upgrade to latest version of statsmodels to resolve an issue with plotnine.
statsmodels \

# Install data visualization libraries.
facets-overview \
# Pin version due to Terra's more strict 'Content Security Policy'.
altair==3.3.0 \
plotnine \
pydicom \
vega \
vega_datasets \

# Configure notebook extensions.
&& jupyter nbextension install --py vega \
&& jupyter nbextension enable --py vega

# TODO(deflaux): this also needs libgl1-mesa-glx and/or python3-tk to fix error
# "ModuleNotFoundError: No module named 'vtkOpenGLKitPython'"
# but neither of those have installation candidate for this base image.

# Terra notebook Content Security Policy prohibits importing this HTML from a
# remote location, so we download a local copy instead. The instructions in
# https://github.com/PAIR-code/facets#setup do not apply to Terra due to the way
# URLs are constructed. In the python library code that refers to this HTML
# file, we create a symlink to it so that the notebook-relative URL
# ./facets-jupyter.html will load.
RUN cd $JUPYTER_HOME/custom \
&& wget https://raw.githubusercontent.com/PAIR-code/facets/1.0.0/facets-dist/facets-jupyter.html \
&& wget https://cdnjs.cloudflare.com/ajax/libs/webcomponentsjs/1.3.3/webcomponents-lite.js

# Make pip install to a user directory, instead of a system directory which
# requires root. This is useful so `pip install` commands can be run in the
# context of a notebook.
ENV PIP_USER=true
# Address errors due to enum34 package https://stackoverflow.com/a/45716067/4138705
RUN pip3 uninstall -y enum34

# Install custom private python package from Puneet Batra's team. Enable easy
# local edits by placing it in the home directory with write access for
# jupyter-user.
USER $USER
RUN mkdir -p $HOME/ml4cvd_pkg
COPY --chown=jupyter-user:users ml4cvd $HOME/ml4cvd_pkg/ml4cvd
COPY --chown=jupyter-user:users config $HOME/ml4cvd_pkg/config
ENV PYTHONPATH $PYTHONPATH:$HOME/ml4cvd_pkg

RUN pip3 install --user -r $HOME/ml4cvd_pkg/config/tensorflow-requirements.txt \
# Upgrade to a newer version so that %%bigquery does not print the
# first few rows of the downloaded dataframe of query results.
# Pin version due to https://github.com/googleapis/google-cloud-python/issues/9965
&& pip3 install --upgrade --user google-cloud-bigquery[pandas]==1.22.0 \
# Configure notebook extensions.
&& jupyter nbextension install --user --py vega \
&& jupyter nbextension enable --user --py vega
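
The new Dockerfile copies `ml4cvd` and `config` from the build context and then installs from `config/tensorflow-requirements.txt`, so the build cannot succeed if either is missing from the context directory. A minimal sketch of a pre-build check follows, assuming it is run from `docker/terra_image`; the function name `check_build_context` is hypothetical and not part of this commit.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this commit): verify that the Docker build
# context contains what the Dockerfile's COPY and pip3 steps expect.
check_build_context() {
  local context_dir="${1:-.}"
  local missing=0

  # "COPY ... ml4cvd ..." needs the package directory in the context.
  if [ ! -d "${context_dir}/ml4cvd" ]; then
    echo "missing directory: ${context_dir}/ml4cvd" >&2
    missing=1
  fi

  # "pip3 install --user -r .../config/tensorflow-requirements.txt" needs this file.
  if [ ! -f "${context_dir}/config/tensorflow-requirements.txt" ]; then
    echo "missing file: ${context_dir}/config/tensorflow-requirements.txt" >&2
    missing=1
  fi

  # Non-zero status when anything is missing, so it can gate a build command.
  return "${missing}"
}
```

Because the function returns a non-zero status when something is missing, it could be chained with `&&` in front of the build submission so that an incomplete context is caught before any upload starts.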
6 changes: 4 additions & 2 deletions docker/terra_image/README.md
@@ -2,8 +2,10 @@

To build and push:
```
cp -r ../../ml4cvd . \
&& perl -i -pe 's/^import vtk/#import vtk/g' ml4cvd/tensor_from_file.py \
mv ml4cvd ml4cvdBAK_$(date +"%Y%m%d_%H%M%S") \
&& mv config configBAK_$(date +"%Y%m%d_%H%M%S") \
&& cp -r ../../ml4cvd . \
&& cp -r ../vm_boot_images/config . \
&& gcloud --project uk-biobank-sek-data builds submit \
--timeout 20m \
--tag gcr.io/uk-biobank-sek-data/ml4cvd_terra:`date +"%Y%m%d_%H%M%S"` .
8 changes: 2 additions & 6 deletions notebooks/review_results/identify_a_sample_to_review.ipynb
@@ -16,13 +16,9 @@
"<div class=\"alert alert-block alert-warning\">\n",
" This notebook assumes:\n",
" <ul>\n",
" <li><b>Terra</b> is running custom Docker image <kbd>gcr.io/uk-biobank-sek-data/ml4cvd_terra:20200226_122553</kbd>.</li>\n",
" <li><b>Terra</b> is running custom Docker image <kbd>gcr.io/uk-biobank-sek-data/ml4cvd_terra:20200601_163801</kbd>.</li>\n",
" <li><b>ml4cvd</b> is running custom Docker image <kbd>gcr.io/broad-ml4cvd/deeplearning:tf2-latest-gpu</kbd>.</li>\n",
" </ul>\n",
"</div>\n",
"\n",
"<div class=\"alert alert-block alert-danger\">\n",
"<b>Terra</b> notebooks with Facets for interactive data exploration are broken in the latest version of chrome. The work-around described in <a href=\"https://github.com/PAIR-code/facets/issues/207\">https://github.com/PAIR-code/facets/issues/207</a> is in place but a bit more work is needed per <a href=\"https://broadworkbench.atlassian.net/browse/IA-1684\">https://broadworkbench.atlassian.net/browse/IA-1684</a>.\n",
"</div>"
]
},
@@ -230,7 +226,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.9"
"version": "3.7.7"
},
"toc": {
"base_numbering": 1,
4 changes: 2 additions & 2 deletions notebooks/review_results/review_one_sample.ipynb
@@ -16,7 +16,7 @@
"<div class=\"alert alert-block alert-warning\">\n",
" This notebook assumes\n",
" <ul>\n",
" <li><b>Terra</b> is running custom Docker image <kbd>gcr.io/uk-biobank-sek-data/ml4cvd_terra:20200226_122553</kbd>.</li>\n",
" <li><b>Terra</b> is running custom Docker image <kbd>gcr.io/uk-biobank-sek-data/ml4cvd_terra:20200601_163801</kbd>.</li>\n",
" <li><b>ml4cvd</b> is running custom Docker image <kbd>gcr.io/broad-ml4cvd/deeplearning:tf2-latest-gpu</kbd>.</li>\n",
" </ul>\n",
"</div>"
@@ -570,7 +570,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.9"
"version": "3.7.7"
},
"toc": {
"base_numbering": 1,