feat: dbt-core image (#62)
* Feat/mw tasks (#61)

* fix

* init

* Update README.md

* Create tasks-phase1.md

description of tasks for phase1

* Update tasks-phase1.md

* Update README.md

* Update main.tf

* Update tasks-phase1.md

* Update tasks-phase1.md

* Update tasks-phase1.md

* Update tasks-phase1.md

* Update tasks-phase1.md

* Update tasks-phase1.md

* Add files via upload

* Update tasks-phase1.md

* Update tasks-phase1.md

* Add files via upload

* Update tasks-phase1.md

* Update tasks-phase1.md

* Update tasks-phase1.md

* Update tasks-phase1.md

* Update tasks-phase1.md

* Update tasks-phase1.md

* Update tasks-phase1.md

* Small corrections

---------

Co-authored-by: Tomasz Gambin <[email protected]>

* Update envs

---------

Co-authored-by: Tomasz Gambin <[email protected]>
mwiewior and tgambin authored Dec 1, 2023
1 parent 8d1a98b commit 726fb7e
Showing 59 changed files with 3,001 additions and 8 deletions.
2 changes: 2 additions & 0 deletions .checkov.yaml
@@ -10,6 +10,8 @@ skip-check:
- CKV_GCP_41 # "Ensure that IAM users are not assigned the Service Account User or Service Account Token Creator roles at project level"
- CKV_GCP_29 # "Ensure that Cloud Storage buckets have uniform bucket-level access enabled"
- CKV_GCP_62 # "Bucket should log access"
+- CKV_DOCKER_3 # "Ensure that a user for the container has been created"
+- CKV_DOCKER_2 # "Ensure that HEALTHCHECK instructions have been added to container images"

skip-path:
- gha-creds-*json
3 changes: 2 additions & 1 deletion README.md
@@ -152,8 +152,9 @@ No providers.
| <a name="module_composer"></a> [composer](#module\_composer) | ./modules/composer | n/a |
| <a name="module_data-pipelines"></a> [data-pipelines](#module\_data-pipelines) | ./modules/data-pipeline | n/a |
| <a name="module_dataproc"></a> [dataproc](#module\_dataproc) | ./modules/dataproc | n/a |
+| <a name="module_dbt_docker_image"></a> [dbt\_docker\_image](#module\_dbt\_docker\_image) | ./modules/dbt_docker_image | n/a |
| <a name="module_gcr"></a> [gcr](#module\_gcr) | ./modules/gcr | n/a |
-| <a name="module_jupyter_docker_image"></a> [jupyter\_docker\_image](#module\_jupyter\_docker\_image) | ./modules/docker_image | n/a |
+| <a name="module_jupyter_docker_image"></a> [jupyter\_docker\_image](#module\_jupyter\_docker\_image) | ./modules/jupyter_docker_image | n/a |
| <a name="module_vertex_ai_workbench"></a> [vertex\_ai\_workbench](#module\_vertex\_ai\_workbench) | ./modules/vertex-ai-workbench | n/a |
| <a name="module_vpc"></a> [vpc](#module\_vpc) | ./modules/vpc | n/a |

2 changes: 1 addition & 1 deletion cicd_bootstrap/conf/github_actions.tfvars
@@ -1,2 +1,2 @@
-github_org = "tgambin"
+github_org = "bdg-tbd"
github_repo = "tbd-workshop-1"
2 changes: 1 addition & 1 deletion env/backend.tfvars
@@ -1 +1 @@
-bucket = "tbd-2023z-9903-state"
+bucket = "tbd-2023z-9910-state"
6 changes: 3 additions & 3 deletions env/project.tfvars
@@ -1,3 +1,3 @@
-project_name = "tbd-2023z-9903"
-iac_service_account = "tbd-2023z-9903-lab@tbd-2023z-9903.iam.gserviceaccount.com"
-ai_notebook_instance_owner = "tgambin@gmail.com"
+project_name = "tbd-2023z-9910"
+iac_service_account = "tbd-2023z-9910-lab@tbd-2023z-9910.iam.gserviceaccount.com"
+ai_notebook_instance_owner = "marek.wiewiorka@gmail.com"
13 changes: 12 additions & 1 deletion main.tf
@@ -6,6 +6,7 @@ locals {
composer_subnet_address = "10.11.0.0/16"
code_bucket_name = "${var.project_name}-code"
data_bucket_name = "${var.project_name}-data"
+spark_version = "3.3.2"
}

module "vpc" {
@@ -25,10 +26,11 @@ module "gcr" {

module "jupyter_docker_image" {
depends_on = [module.gcr]
-source = "./modules/docker_image"
+source = "./modules/jupyter_docker_image"
registry_hostname = module.gcr.registry_hostname
registry_repo_name = coalesce(var.project_name)
project_name = var.project_name
+spark_version = local.spark_version
}

module "vertex_ai_workbench" {
@@ -89,3 +91,12 @@ module "data-pipelines" {
data_bucket_name = local.data_bucket_name
}

+module "dbt_docker_image" {
+  depends_on = [module.composer]
+  source = "./modules/dbt_docker_image"
+  registry_hostname = module.gcr.registry_hostname
+  registry_repo_name = coalesce(var.project_name)
+  project_name = var.project_name
+  spark_version = local.spark_version
+}

File renamed without changes.
42 changes: 42 additions & 0 deletions modules/dbt_docker_image/README.md
@@ -0,0 +1,42 @@
# dbt_docker_image

<!-- BEGINNING OF PRE-COMMIT-TERRAFORM DOCS HOOK -->
## Requirements

| Name | Version |
|------|---------|
| <a name="requirement_terraform"></a> [terraform](#requirement\_terraform) | ~> 1.5.0 |
| <a name="requirement_docker"></a> [docker](#requirement\_docker) | 3.0.2 |

## Providers

| Name | Version |
|------|---------|
| <a name="provider_docker"></a> [docker](#provider\_docker) | 3.0.2 |

## Modules

No modules.

## Resources

| Name | Type |
|------|------|
| [docker_image.dbt](https://registry.terraform.io/providers/kreuzwerker/docker/3.0.2/docs/resources/image) | resource |
| [docker_registry_image.dbt](https://registry.terraform.io/providers/kreuzwerker/docker/3.0.2/docs/resources/registry_image) | resource |

## Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|:--------:|
| <a name="input_dbt_spark_version"></a> [dbt\_spark\_version](#input\_dbt\_spark\_version) | dbt-spark version | `string` | `"1.7.1"` | no |
| <a name="input_dbt_version"></a> [dbt\_version](#input\_dbt\_version) | dbt core version | `string` | `"1.7.3"` | no |
| <a name="input_project_name"></a> [project\_name](#input\_project\_name) | Project name | `string` | n/a | yes |
| <a name="input_registry_hostname"></a> [registry\_hostname](#input\_registry\_hostname) | Image registry hostname | `string` | n/a | yes |
| <a name="input_registry_repo_name"></a> [registry\_repo\_name](#input\_registry\_repo\_name) | Image registry repository name | `string` | n/a | yes |
| <a name="input_spark_version"></a> [spark\_version](#input\_spark\_version) | Apache Spark version | `string` | `"3.3.2"` | no |

## Outputs

No outputs.
<!-- END OF PRE-COMMIT-TERRAFORM DOCS HOOK -->
25 changes: 25 additions & 0 deletions modules/dbt_docker_image/main.tf
@@ -0,0 +1,25 @@
resource "docker_image" "dbt" {
name = "${var.registry_hostname}/${var.registry_repo_name}/dbt:${var.dbt_version}"
build {
context = "${path.module}/resources"
build_args = {
DBT_VERSION : var.dbt_version
DBT_SPARK_VERSION : var.dbt_spark_version
SPARK_VERSION : var.spark_version
PROJECT_NAME : var.project_name
}
tag = ["${var.registry_hostname}/${var.registry_repo_name}/dbt:latest"]
}
triggers = {
dir_sha1 = sha1(join("", [for f in fileset(path.cwd, "${path.module}/resources/*") : filesha1(f)]))
}
}


resource "docker_registry_image" "dbt" {
name = docker_image.dbt.name
keep_remotely = true
triggers = {
dir_sha1 = sha1(join("", [for f in fileset(path.cwd, "${path.module}/resources/*") : filesha1(f)]))
}
}
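The `dir_sha1` trigger above forces a rebuild whenever anything under `resources/` changes: Terraform hashes each file, concatenates the hex digests, and hashes the concatenation. A minimal Python sketch of the same idea (the helper name and the `sorted` traversal are illustrative; Terraform's `fileset` has its own ordering):

```python
import hashlib
from pathlib import Path

def dir_sha1(directory: str) -> str:
    """SHA-1 of the concatenated per-file SHA-1 hex digests,
    mirroring sha1(join("", [for f in fileset(...) : filesha1(f)]))."""
    digests = []
    for path in sorted(Path(directory).rglob("*")):
        if path.is_file():
            digests.append(hashlib.sha1(path.read_bytes()).hexdigest())
    # Hash the joined digests so any single-file change flips the result
    return hashlib.sha1("".join(digests).encode("ascii")).hexdigest()
```

Any edit, addition, or removal of a file under the watched directory changes the composite digest, which is what makes it usable as a rebuild trigger.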
41 changes: 41 additions & 0 deletions modules/dbt_docker_image/resources/Dockerfile
@@ -0,0 +1,41 @@
FROM python:3.10-slim-bullseye as base
ARG DBT_VERSION
ARG DBT_SPARK_VERSION
ARG SPARK_VERSION
ARG HADOOP_CONF_DIR=/etc/hadoop/conf

# System setup
RUN apt-get update \
&& apt-get install -y --no-install-recommends \
git \
ssh-client \
software-properties-common \
make \
build-essential \
ca-certificates \
libpq-dev \
python-dev \
libsasl2-dev \
gcc \
unixodbc-dev \
&& apt-get clean \
&& rm -rf \
/var/lib/apt/lists/* \
/tmp/* \
/var/tmp/*

# Env vars
ENV PYTHONIOENCODING=utf-8
ENV LANG=C.UTF-8

# Update python and install packages
RUN pip install --upgrade pip setuptools wheel --no-cache-dir \
dbt-core==$DBT_VERSION \
dbt-spark==$DBT_SPARK_VERSION \
pyspark==$SPARK_VERSION

RUN mkdir -p $HADOOP_CONF_DIR
COPY conf/* $HADOOP_CONF_DIR
ENV HADOOP_CONF_DIR=$HADOOP_CONF_DIR

WORKDIR /dbt
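For local debugging, the image can be built directly with `docker build`, passing the same build args the Dockerfile declares. The sketch below composes and prints the command rather than running it; the version values are the module defaults, while `PROJECT_NAME` and the tag are hypothetical:

```shell
# Sketch: compose a local build command for the dbt image.
# Versions are the module defaults; PROJECT_NAME and the tag are assumptions.
DBT_VERSION="1.7.3"
DBT_SPARK_VERSION="1.7.1"
SPARK_VERSION="3.3.2"
PROJECT_NAME="my-project"

BUILD_CMD="docker build \
  --build-arg DBT_VERSION=${DBT_VERSION} \
  --build-arg DBT_SPARK_VERSION=${DBT_SPARK_VERSION} \
  --build-arg SPARK_VERSION=${SPARK_VERSION} \
  --build-arg PROJECT_NAME=${PROJECT_NAME} \
  -t dbt:${DBT_VERSION} \
  modules/dbt_docker_image/resources"

# Print the command for review before running it manually.
echo "$BUILD_CMD"
```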
32 changes: 32 additions & 0 deletions modules/dbt_docker_image/variables.tf
@@ -0,0 +1,32 @@
variable "registry_hostname" {
type = string
description = "Image registry hostname"
}

variable "registry_repo_name" {
type = string
description = "Image registry repository name"
}

variable "dbt_version" {
type = string
description = "dbt core version"
default = "1.7.3"
}

variable "dbt_spark_version" {
type = string
description = "dbt-spark version"
default = "1.7.1"
}

variable "spark_version" {
type = string
description = "Apache Spark version"
default = "3.3.2"
}

variable "project_name" {
type = string
description = "Project name"
}
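The versioned defaults above can be overridden at the call site. A hypothetical invocation that pins every version explicitly instead of relying on the defaults (the `depends_on` from the root `main.tf` is omitted here):

```hcl
module "dbt_docker_image" {
  source             = "./modules/dbt_docker_image"
  registry_hostname  = module.gcr.registry_hostname
  registry_repo_name = var.project_name
  project_name       = var.project_name
  dbt_version        = "1.7.3" # pinned explicitly
  dbt_spark_version  = "1.7.1"
  spark_version      = "3.3.2"
}
```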
File renamed without changes.
24 changes: 24 additions & 0 deletions modules/jupyter_docker_image/.terraform.lock.hcl

@@ -33,6 +33,7 @@ No modules.
| <a name="input_project_name"></a> [project\_name](#input\_project\_name) | Project name | `string` | n/a | yes |
| <a name="input_registry_hostname"></a> [registry\_hostname](#input\_registry\_hostname) | Image registry hostname | `string` | n/a | yes |
| <a name="input_registry_repo_name"></a> [registry\_repo\_name](#input\_registry\_repo\_name) | Image registry repository name | `string` | n/a | yes |
+| <a name="input_spark_version"></a> [spark\_version](#input\_spark\_version) | Apache Spark version | `string` | `"3.3.2"` | no |

## Outputs

@@ -4,6 +4,7 @@ resource "docker_image" "jupyter" {
context = "${path.module}/resources"
build_args = {
JUPYTERLAB_VERSION : var.jupyterlab_version
+SPARK_VERSION : var.spark_version
PROJECT_NAME : var.project_name
}
tag = ["${var.registry_hostname}/${var.registry_repo_name}/jupyter:latest"]
File renamed without changes.
@@ -2,6 +2,7 @@ FROM gcr.io/deeplearning-platform-release/base-cpu.py310:m108
ARG JUPYTERLAB_VERSION
ARG PROJECT_NAME
ARG HADOOP_CONF_DIR=/etc/hadoop/conf
+ARG SPARK_VERSION
#checkov:skip=CKV_DOCKER_2: "Ensure that HEALTHCHECK instructions have been added to container images"
# to remove to make this image run as non-root user
#checkov:skip=CKV_DOCKER_3: "Ensure that a user for the container has been created"
@@ -22,7 +23,7 @@ ENV PATH=/usr/bin:/usr/local/bin:$PATH
RUN wget --quiet https://bootstrap.pypa.io/get-pip.py && python3.8 get-pip.py
RUN pip3 install --no-cache-dir \
jupyterlab==$JUPYTERLAB_VERSION \
-pyspark==3.3.2 \
+pyspark==$SPARK_VERSION \
mlflow==2.3.0
RUN mkdir -p $HADOOP_CONF_DIR
COPY conf/* $HADOOP_CONF_DIR