Feature/azure container #75

Closed
wants to merge 36 commits into from
Changes from 16 commits
Commits
36 commits
b843511
Add Dockerfile for Azure Container Instances
AlexAxthelm Sep 25, 2023
c99da36
Pin package and FROM versions
AlexAxthelm Sep 25, 2023
797d711
Separate Chrome Installation
AlexAxthelm Sep 25, 2023
7683c6b
Set working directory before copy, limit copy
AlexAxthelm Sep 25, 2023
815110b
Use docker build secrets to pass github auth
AlexAxthelm Sep 25, 2023
658ac32
Update documentation, and do not leak secrets
AlexAxthelm Sep 25, 2023
e65debc
Update buildkit information
AlexAxthelm Sep 25, 2023
7dea5e5
Add deploy ARM Template
AlexAxthelm Sep 26, 2023
239ef62
Add infrastructure to copy files from rawdata to inputs
AlexAxthelm Oct 3, 2023
6f90760
Rearrange files
AlexAxthelm Oct 3, 2023
1e2b176
WIP: deploy works until factset pull
AlexAxthelm Nov 14, 2023
9257d1a
Use DESCRIPTION for dependency management
AlexAxthelm Jan 17, 2024
f8f6dec
Resolve dependency installation
AlexAxthelm Jan 17, 2024
1f862ec
Wrap up file copy step
AlexAxthelm Jan 18, 2024
4d2dcdb
copy FS files
AlexAxthelm Jan 18, 2024
a23fc83
Merge branch 'main' into feature/azure-container
AlexAxthelm Jan 18, 2024
b977903
convert from `{rlog}` to `{logger}`
AlexAxthelm Jan 18, 2024
845ca1a
Clean logging strings
AlexAxthelm Jan 18, 2024
6d81834
add {glue} to dependencies
AlexAxthelm Jan 18, 2024
264d9dc
Disable readr progress bar
AlexAxthelm Jan 18, 2024
879faac
Don't update factset data on CICD runs
AlexAxthelm Jan 18, 2024
7d0c0d3
Ensure output path exists
AlexAxthelm Jan 18, 2024
65b792c
fix bad mount path
AlexAxthelm Jan 18, 2024
5b10fa7
disable object_name_linter
AlexAxthelm Jan 19, 2024
d830af0
improve creating output directory
AlexAxthelm Jan 19, 2024
30c061d
Merge branch 'main' into feature/azure-container
AlexAxthelm Jan 19, 2024
9387905
prefer `seq` over `x:y`
AlexAxthelm Jan 19, 2024
ea8dfb8
Add DEBUG and TRACE logging
AlexAxthelm Jan 19, 2024
4222390
remove unused file
AlexAxthelm Jan 19, 2024
4466032
don't check for missing envvars, just read .env
AlexAxthelm Jan 19, 2024
4d2dee1
Add pak options to not update sysreqs db
AlexAxthelm Jan 19, 2024
676a778
Add tar to pack up files
AlexAxthelm Jan 20, 2024
1f0cd00
Allow option to not create tar
AlexAxthelm Jan 20, 2024
75d70b1
Set CRAN Repo in Rprofile.site
AlexAxthelm Jan 20, 2024
468f1ff
Increase memory available
AlexAxthelm Jan 21, 2024
acc9c0f
Change to supporte GPU, update docs
AlexAxthelm Jan 21, 2024
2 changes: 2 additions & 0 deletions .gitignore
@@ -4,3 +4,5 @@
.Ruserdata
.env
.DS_Store
*parameters.json
github_pat.txt
3 changes: 3 additions & 0 deletions ACI/.gitignore
@@ -0,0 +1,3 @@
inputs/
outputs/
dataprep_inputs/
59 changes: 59 additions & 0 deletions ACI/Dockerfile.ACI
@@ -0,0 +1,59 @@
FROM rocker/tidyverse:4.3.1

# install system dependencies for R packages
RUN apt-get update && apt-get install --no-install-recommends -y \
curl=7.81.* \
git=1:2.34.* \
gnupg=2.2.* \
libcurl4-openssl-dev=7.81.* \
libfontconfig1-dev=2.13.* \
libfreetype6-dev=2.11.* \
libfribidi-dev=1.0.* \
libgit2-dev=1.1.* \
libharfbuzz-dev=2.7.* \
libicu-dev=70.1-* \
libjpeg-dev=8c-* \
libpng-dev=1.6.* \
libssl-dev=3.0.* \
libtiff-dev=4.3.* \
libxml2-dev=2.9.* \
make=4.3-* \
pandoc=2.9.2.* \
zlib1g-dev=1:1.2.* \
&& rm -rf /var/lib/apt/lists/*

RUN curl -fsSL -o /tmp/google-chrome.deb https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb \
&& apt-get update \
&& DEBIAN_FRONTEND='noninteractive' apt-get install --no-install-recommends -y /tmp/google-chrome.deb \
&& rm /tmp/google-chrome.deb \
&& rm -rf /var/lib/apt/lists/*

WORKDIR /workflow.data.preparation

# set frozen CRAN repo
ARG CRAN_REPO="https://packagemanager.posit.co/cran/__linux__/jammy/2023-10-30"
ARG R_HOME="/usr/local/lib/R"
RUN echo "options(repos = c(CRAN = '$CRAN_REPO'), pkg.sysreqs = FALSE)" >> "${R_HOME}/etc/Rprofile.site"

# Install R dependencies
COPY DESCRIPTION DESCRIPTION

# install pak, find dependencies from DESCRIPTION, and install them.
RUN --mount=type=secret,id=github_pat \
Rscript -e "\
Sys.setenv(GITHUB_PAT = readLines('/run/secrets/github_pat')); \
install.packages('pak'); \
deps <- pak::local_deps(root = '.'); \
pkg_deps <- deps[['ref']][!deps[['direct']]]; \
cat(pkg_deps); \
pak::pak(pkg_deps); \
Sys.unsetenv('GITHUB_PAT'); \
"

COPY ./run_pacta_data_preparation.R run_pacta_data_preparation.R
COPY ./config.yml config.yml
COPY ./ACI/copy_raw_data.R copy_raw_data.R

COPY ./ACI/copy_files_and_run_data_prep.sh /usr/local/bin/copy_files_and_run_data_prep

CMD ["copy_files_and_run_data_prep"]
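
The `RUN --mount=type=secret,id=github_pat` step in this Dockerfile expects the build to supply a secret with that id. Below is a minimal sketch of a matching build command. It assumes the build runs from the repository root (the `COPY` paths suggest this) and that the token is stored in `github_pat.txt`, the file name ignored in `.gitignore`; whether the author uses that file this way is an assumption. The function name `build_aci_image` and the `DRY_RUN` switch are hypothetical, and the image tag is borrowed from `ACI/azure-deploy.json`.

```shell
#!/bin/sh
# Sketch: build ACI/Dockerfile.ACI, passing the GitHub PAT as a BuildKit secret
# so that it is mounted at /run/secrets/github_pat only during the RUN step.
# Assumptions: run from the repository root; the token is in github_pat.txt.
build_aci_image() {
  pat_file="${1:-github_pat.txt}"
  image_tag="${2:-workflow.data.preparation_aci:latest}"

  # Assemble the docker command as the function's positional parameters.
  set -- docker build \
    --secret "id=github_pat,src=${pat_file}" \
    --file ACI/Dockerfile.ACI \
    --tag "${image_tag}" \
    .

  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Default: only print the command, so nothing is built by accident.
    echo "DOCKER_BUILDKIT=1 $*"
  else
    DOCKER_BUILDKIT=1 "$@"
  fi
}
```

Calling `build_aci_image` prints the command; `DRY_RUN=0 build_aci_image` would run it. Because the token is mounted as a secret rather than passed as a build argument, it should not end up in the image layers or build history.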
194 changes: 194 additions & 0 deletions ACI/azure-deploy.json
@@ -0,0 +1,194 @@
{
"$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
"contentVersion": "0.0.4",
"parameters": {
"location": {
"type": "string",
"defaultValue": "[resourceGroup().location]",
"metadata": {
"description": "Location for all resources."
}
},
"identity": {
"type": "string",
"metadata": {
"description": "The ID of the user assigned identity to use for the container group."
}
},
"serviceprincipal": {
"type": "string",
"metadata": {
"description": "The ID of the service principal to use for the container group."
}
},
"containerGroupName": {
"type": "string",
"metadata": {
"description": "The name of the container group."
}
},
"restartPolicy": {
"type": "string",
"defaultValue": "OnFailure",
"allowedValues": [
"Always",
"Never",
"OnFailure"
],
"metadata": {
"description": "The behavior of Azure runtime if container has stopped."
}
},
"rawdata-storageaccountkey": {
"type": "securestring",
"metadata": {
"description": "The storage account key for the rawdata storage account."
}
},
"dataprepinputs-storageaccountkey": {
"type": "securestring",
"metadata": {
"description": "The storage account key for the storage account that holds the data-prep-inputs file share."
}
},
"dataprepoutputs-storageaccountkey": {
"type": "securestring",
"metadata": {
"description": "The storage account key for the storage account that holds the data-prep-outputs file share."
}
},
"starttime": {
"type": "string",
"defaultValue": "[utcNow()]",
"metadata": {
"description": "The time to start the container group."
}
}
},
"variables": {
"azurecontainerregistry": "transitionmonitordockerregistry.azurecr.io"
},
"functions": [],
"resources": [
{
"type": "Microsoft.ContainerInstance/containerGroups",
"apiVersion": "2021-09-01",
"name": "[parameters('containerGroupName')]",
"location": "[parameters('location')]",
"identity": {
"type": "UserAssigned",
"userAssignedIdentities": {
"[parameters('identity')]": {}
}
},
"metadata": {
"data-prep environmentVariables description": {
"DEPLOY_START_TIME": "The time the container was deployed.",
"R_CONFIG_ACTIVE": "The active config for the container.",
"R_CONFIG_FILE": "The config file for the container.",
"LOG_LEVEL": "The log level for the container. See {rlog} docs."
}
},
"properties": {
"containers": [
{
"name": "data-prep",
"properties": {
"image": "[concat(variables('azurecontainerregistry'),'/workflow.data.preparation_aci:latest')]",
"ports": [],
"resources": {
"requests": {
"cpu": 1,
"memoryInGB": 1
}
},
"environmentVariables": [
{
"name": "DEPLOY_START_TIME",
"value": "[parameters('starttime')]"
},
{
"name": "R_CONFIG_ACTIVE",
"value": "2022Q4_CICD"
},
{
"name": "R_CONFIG_FILE",
"value": "/workflow.data.preparation/config.yml"
},
{
"name": "LOG_LEVEL",
"value": "DEBUG"
}
],
"volumeMounts": [
{
"name": "factset-extracted",
"mountPath": "/mnt/factset-extracted/"
},
{
"name": "rawdatavolume",
"mountPath": "/mnt/rawdata/"
},
{
"name": "inputsvolume",
"mountPath": "/mnt/inputs/"
},
{
"name": "outputsvolume",
"mountPath": "/mnt/outputs/"
}
]
}
}
],
"imageRegistryCredentials": [
{
"server": "[variables('azurecontainerregistry')]",
"identity": "[parameters('identity')]"
}
],
"restartPolicy": "[parameters('restartPolicy')]",
"osType": "Linux",
"volumes": [
{
"name": "factset-extracted",
"azureFile": {
"shareName": "factset-extracted",
"readOnly": true,
"storageAccountName": "pactarawdata",
"storageAccountKey": "[parameters('rawdata-storageaccountkey')]"
}
},
{
"name": "rawdatavolume",
"azureFile": {
"shareName": "rawdata",
"readOnly": true,
"storageAccountName": "pactarawdata",
"storageAccountKey": "[parameters('rawdata-storageaccountkey')]"
}
},
{
"name": "inputsvolume",
"azureFile": {
"shareName": "data-prep-inputs",
"readOnly": false,
"storageAccountName": "pactadata",
"storageAccountKey": "[parameters('dataprepinputs-storageaccountkey')]"
}
},
{
"name": "outputsvolume",
"azureFile": {
"shareName": "data-prep-outputs",
"readOnly": false,
"storageAccountName": "pactadata",
"storageAccountKey": "[parameters('dataprepoutputs-storageaccountkey')]"
}
}
]
}
}
],
"outputs": {}
}
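
A template like this could be deployed with the Azure CLI along the following lines. This is a sketch under assumptions: the resource group name and the parameters file path `ACI/azure-deploy.parameters.json` are hypothetical (the path merely matches the `*parameters.json` pattern in `.gitignore`), as are the names `deploy_aci` and `DRY_RUN`. The parameters file would need to supply `identity`, `serviceprincipal`, `containerGroupName` and the three storage account keys, since the template gives them no defaults.

```shell
#!/bin/sh
# Sketch: deploy ACI/azure-deploy.json into an existing resource group.
# Assumptions: run from the repository root; the Azure CLI is logged in;
# the parameters file path is a placeholder, not taken from the PR.
deploy_aci() {
  if [ -z "${1:-}" ]; then
    echo "usage: deploy_aci <resource-group> [parameters-file]" >&2
    return 2
  fi
  resource_group="$1"
  params_file="${2:-ACI/azure-deploy.parameters.json}"

  # Assemble the az command as the function's positional parameters.
  set -- az deployment group create \
    --resource-group "${resource_group}" \
    --template-file ACI/azure-deploy.json \
    --parameters "@${params_file}"

  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Default: only print the command instead of creating resources.
    echo "$*"
  else
    "$@"
  fi
}
```

`starttime` does not need to be passed, because the template defaults it to `utcNow()`. Keeping the storage account keys in a local, git-ignored parameters file avoids typing them on the command line.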
13 changes: 13 additions & 0 deletions ACI/copy_files_and_run_data_prep.sh
@@ -0,0 +1,13 @@
#! /bin/sh
set -e

inputs_dir="/mnt/inputs"

# copy raw data, then run normal data prep script
Rscript /workflow.data.preparation/copy_raw_data.R 2>&1 | \
tee "$inputs_dir/$DEPLOY_START_TIME-copy.log"

Rscript /workflow.data.preparation/run_pacta_data_preparation.R 2>&1 | \
tee "$inputs_dir/$DEPLOY_START_TIME-prep.log"

exit 0
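
One thing to watch in this script: in POSIX `sh`, a pipeline's exit status is that of its last command, so `set -e` sees the status of `tee` rather than of `Rscript`, and the script then ends with `exit 0`. If the intent is for a failed R step to stop the run and surface a non-zero exit code (for example so that `restartPolicy` can react), a helper like the one below is one option. It is a sketch, not the author's method; `run_logged` and its variable names are hypothetical.

```shell
#!/bin/sh
# Sketch: run a command, tee its combined stdout/stderr to a log file,
# and return the command's own exit status instead of tee's.
run_logged() {
  _rl_log="$1"
  shift
  _rl_status_file="$(mktemp)"
  {
    # Capture the status without tripping `set -e` inside the pipeline.
    _rl_rc=0
    "$@" 2>&1 || _rl_rc=$?
    echo "${_rl_rc}" > "${_rl_status_file}"
  } | tee "${_rl_log}"
  # The left side of the pipeline ran in a subshell, so read the status back.
  _rl_rc="$(cat "${_rl_status_file}")"
  rm -f "${_rl_status_file}"
  return "${_rl_rc}"
}

# Possible use in the script above (paths as in the PR):
#   run_logged "$inputs_dir/$DEPLOY_START_TIME-copy.log" \
#     Rscript /workflow.data.preparation/copy_raw_data.R
```

With `set -e` active in the caller, a non-zero return from `run_logged` would stop the script before the next step. Under `bash`, `set -o pipefail` would be a shorter alternative, at the cost of changing the shebang.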