fix: add JSON classifier and ETL IAM role #17

Merged: patheard merged 4 commits into main from fix/json-classifier on Nov 12, 2024


@patheard (Member) commented Nov 9, 2024

Summary

  • Add a custom classifier for crawling files that contain a JSON array of objects.
  • Add Glue ETL IAM role.
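
The classifier change described in the summary can be sketched in Terraform as follows. This is a minimal sketch reconstructed from the plan output in this PR, not the actual source. The resource addresses, the classifier name, and the `json_path` come from the plan. The crawler's `database_name`, `role`, and `s3_target` are hidden in the plan, so hypothetical variables stand in for them here.

```hcl
# Hypothetical inputs: the plan hides these crawler attributes, so they are
# placeholders for illustration only.
variable "crawler_database_name" {
  type = string
}

variable "crawler_role_arn" {
  type = string
}

variable "account_tags_s3_path" {
  type = string
}

# Custom JSON classifier. The JSONPath "$[*]" selects every element of a
# top-level array, so each object in the array is treated as its own record.
resource "aws_glue_classifier" "json_object_array" {
  name = "json_object_array"

  json_classifier {
    json_path = "$[*]"
  }
}

# The crawler swaps its previous "JSON array" classifier for the new one.
resource "aws_glue_crawler" "operations_aws_production_account_tags" {
  name          = "Organization Account Tags"
  database_name = var.crawler_database_name
  role          = var.crawler_role_arn

  # Assumption: referenced by name; the plan only shows the resolved string.
  classifiers = [aws_glue_classifier.json_object_array.name]

  # Assumption: an S3 target; the plan hides the crawler's target blocks.
  s3_target {
    path = var.account_tags_s3_path
  }
}
```

With this classifier, a file whose top level is an array of objects is expected to yield a schema based on the fields of those objects rather than a single array column.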

Related

Add a custom classifier for crawling files that contain a JSON
array of objects.
@patheard self-assigned this on Nov 9, 2024
@patheard changed the title from "fix: add JSON classifier to account tags crawler" to "fix: add JSON classifier and ETL IAM role" on Nov 10, 2024
@patheard requested review from wmoussa-gc and a team on November 12, 2024 13:09

Production: glue 🧴

✅   Terraform Init: success
✅   Terraform Validate: success
✅   Terraform Format: success
✅   Terraform Plan: success
✅   Conftest: success

Plan: 5 to add, 3 to change, 0 to destroy
Show summary
CHANGE  NAME
update  aws_glue_crawler.operations_aws_production_account_tags
        aws_iam_policy.glue_crawler
        aws_kms_key.aws_glue
add     aws_glue_classifier.json_object_array
        aws_iam_policy.glue_etl
        aws_iam_role.glue_etl
        aws_iam_role_policy_attachment.glue_etl
        aws_iam_role_policy_attachment.glue_etl_service_role
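
Four of the five added resources make up the new Glue ETL role. A minimal Terraform sketch that would produce a comparable plan is shown below. It is reconstructed from the plan output and is an assumption about the source, not a copy of it. The data source names `glue_assume_role`, `s3_read`, and `s3_write` are hypothetical, and the third source policy document, which the plan reports as known only after apply, is left out.

```hcl
# Trust policy: lets the Glue service assume the role (hypothetical name).
data "aws_iam_policy_document" "glue_assume_role" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["glue.amazonaws.com"]
    }
  }
}

# Read access to all three data lake buckets (hypothetical name).
data "aws_iam_policy_document" "s3_read" {
  statement {
    sid     = "ReadDataLakeS3Buckets"
    effect  = "Allow"
    actions = ["s3:GetObject"]
    resources = [
      "arn:aws:s3:::cds-data-lake-transformed-production/*",
      "arn:aws:s3:::cds-data-lake-raw-production/*",
      "arn:aws:s3:::cds-data-lake-curated-production/*",
    ]
  }
}

# Write access to the transformed and curated buckets only (hypothetical name).
data "aws_iam_policy_document" "s3_write" {
  statement {
    sid     = "WriteDataLakeS3TransformedBuckets"
    effect  = "Allow"
    actions = ["s3:PutObject"]
    resources = [
      "arn:aws:s3:::cds-data-lake-transformed-production/*",
      "arn:aws:s3:::cds-data-lake-curated-production/*",
    ]
  }
}

# Merge the statements into one document. The real configuration has a third
# source document that is unknown at plan time; it is omitted here.
data "aws_iam_policy_document" "glue_etl_combined" {
  source_policy_documents = [
    data.aws_iam_policy_document.s3_read.json,
    data.aws_iam_policy_document.s3_write.json,
  ]
}

resource "aws_iam_role" "glue_etl" {
  name               = "AWSGlueETL-DataLake"
  path               = "/service-role/"
  assume_role_policy = data.aws_iam_policy_document.glue_assume_role.json
}

resource "aws_iam_policy" "glue_etl" {
  name   = "AWSGlueETL-DataLake"
  path   = "/service-role/"
  policy = data.aws_iam_policy_document.glue_etl_combined.json
}

# Attach the custom policy and the AWS managed Glue service policy.
resource "aws_iam_role_policy_attachment" "glue_etl" {
  role       = aws_iam_role.glue_etl.name
  policy_arn = aws_iam_policy.glue_etl.arn
}

resource "aws_iam_role_policy_attachment" "glue_etl_service_role" {
  role       = aws_iam_role.glue_etl.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole"
}
```

Keeping the read and write statements in separate documents mirrors the plan, where the crawler policy reuses only the read document while the ETL policy combines both.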
Show plan
Resource actions are indicated with the following symbols:
  + create
  ~ update in-place
 <= read (data resources)

Terraform will perform the following actions:

  # data.aws_iam_policy_document.aws_glue will be read during apply
  # (config refers to values not yet known)
 <= data "aws_iam_policy_document" "aws_glue" {
      + id            = (known after apply)
      + json          = (known after apply)
      + minified_json = (known after apply)

      + statement {
          + actions   = [
              + "kms:*",
            ]
          + effect    = "Allow"
          + resources = [
              + "*",
            ]

          + principals {
              + identifiers = [
                  + "739275439843",
                ]
              + type        = "AWS"
            }
        }
      + statement {
          + actions   = [
              + "kms:Decrypt*",
              + "kms:Describe*",
              + "kms:Encrypt*",
              + "kms:GenerateDataKey*",
              + "kms:ReEncrypt*",
            ]
          + effect    = "Allow"
          + resources = [
              + "*",
            ]

          + principals {
              + identifiers = [
                  + "logs.ca-central-1.amazonaws.com",
                ]
              + type        = "Service"
            }
        }
      + statement {
          + actions   = [
              + "kms:CreateGrant",
              + "kms:Decrypt",
              + "kms:DescribeKey",
              + "kms:Encrypt",
              + "kms:GenerateDataKey",
              + "kms:GenerateDataKeyWithoutPlaintext",
              + "kms:ReEncryptFrom",
              + "kms:ReEncryptTo",
              + "kms:RetireGrant",
            ]
          + effect    = "Allow"
          + resources = [
              + "*",
            ]

          + principals {
              + identifiers = [
                  + "arn:aws:iam::739275439843:role/service-role/AWSGlueCrawler-DataLake",
                  + (known after apply),
                ]
              + type        = "AWS"
            }
        }
    }

  # data.aws_iam_policy_document.glue_crawler_combined will be read during apply
  # (config refers to values not yet known)
 <= data "aws_iam_policy_document" "glue_crawler_combined" {
      + id                      = (known after apply)
      + json                    = (known after apply)
      + minified_json           = (known after apply)
      + source_policy_documents = [
          + jsonencode(
                {
                  + Statement = [
                      + {
                          + Action   = "s3:GetObject"
                          + Effect   = "Allow"
                          + Resource = [
                              + "arn:aws:s3:::cds-data-lake-transformed-production/*",
                              + "arn:aws:s3:::cds-data-lake-raw-production/*",
                              + "arn:aws:s3:::cds-data-lake-curated-production/*",
                            ]
                          + Sid      = "ReadDataLakeS3Buckets"
                        },
                    ]
                  + Version   = "2012-10-17"
                }
            ),
          + (known after apply),
        ]
    }

  # data.aws_iam_policy_document.glue_etl_combined will be read during apply
  # (config refers to values not yet known)
 <= data "aws_iam_policy_document" "glue_etl_combined" {
      + id                      = (known after apply)
      + json                    = (known after apply)
      + minified_json           = (known after apply)
      + source_policy_documents = [
          + jsonencode(
                {
                  + Statement = [
                      + {
                          + Action   = "s3:GetObject"
                          + Effect   = "Allow"
                          + Resource = [
                              + "arn:aws:s3:::cds-data-lake-transformed-production/*",
                              + "arn:aws:s3:::cds-data-lake-raw-production/*",
                              + "arn:aws:s3:::cds-data-lake-curated-production/*",
                            ]
                          + Sid      = "ReadDataLakeS3Buckets"
                        },
                    ]
                  + Version   = "2012-10-17"
                }
            ),
          + jsonencode(
                {
                  + Statement = [
                      + {
                          + Action   = "s3:PutObject"
                          + Effect   = "Allow"
                          + Resource = [
                              + "arn:aws:s3:::cds-data-lake-transformed-production/*",
                              + "arn:aws:s3:::cds-data-lake-curated-production/*",
                            ]
                          + Sid      = "WriteDataLakeS3TransformedBuckets"
                        },
                    ]
                  + Version   = "2012-10-17"
                }
            ),
          + (known after apply),
        ]
    }

  # data.aws_iam_policy_document.glue_kms will be read during apply
  # (depends on a resource or a module with changes pending)
 <= data "aws_iam_policy_document" "glue_kms" {
      + id            = (known after apply)
      + json          = (known after apply)
      + minified_json = (known after apply)

      + statement {
          + actions   = [
              + "kms:CreateGrant",
              + "kms:Decrypt",
              + "kms:DescribeKey",
              + "kms:Encrypt",
              + "kms:GenerateDataKey",
              + "kms:GenerateDataKeyWithoutPlaintext",
              + "kms:ReEncryptFrom",
              + "kms:ReEncryptTo",
              + "kms:RetireGrant",
            ]
          + effect    = "Allow"
          + resources = [
              + "arn:aws:kms:ca-central-1:739275439843:key/991f91d8-209e-40f2-b925-915adf74a043",
            ]
          + sid       = "UseGlueKey"
        }
      + statement {
          + actions   = [
              + "logs:AssociateKmsKey",
            ]
          + effect    = "Allow"
          + resources = [
              + "arn:aws:logs:ca-central-1:739275439843:log-group:/aws-glue/crawlers-role/service-role/AWSGlueCrawler-DataLake-encryption-at-rest:*",
            ]
          + sid       = "AssociateKmsKey"
        }
    }

  # aws_glue_classifier.json_object_array will be created
  + resource "aws_glue_classifier" "json_object_array" {
      + id   = (known after apply)
      + name = "json_object_array"

      + json_classifier {
          + json_path = "$[*]"
        }
    }

  # aws_glue_crawler.operations_aws_production_account_tags will be updated in-place
  ~ resource "aws_glue_crawler" "operations_aws_production_account_tags" {
      ~ classifiers            = [
          ~ "JSON array" -> "json_object_array",
        ]
        id                     = "Organization Account Tags"
        name                   = "Organization Account Tags"
        tags                   = {}
        # (9 unchanged attributes hidden)

        # (5 unchanged blocks hidden)
    }

  # aws_iam_policy.glue_crawler will be updated in-place
  ~ resource "aws_iam_policy" "glue_crawler" {
        id               = "arn:aws:iam::739275439843:policy/service-role/AWSGlueCrawler-DataLake"
        name             = "AWSGlueCrawler-DataLake"
      ~ policy           = jsonencode(
            {
              - Statement = [
                  - {
                      - Action   = "s3:GetObject"
                      - Effect   = "Allow"
                      - Resource = [
                          - "arn:aws:s3:::cds-data-lake-transformed-production/*",
                          - "arn:aws:s3:::cds-data-lake-raw-production/*",
                          - "arn:aws:s3:::cds-data-lake-curated-production/*",
                        ]
                      - Sid      = "ReadDataLakeS3Buckets"
                    },
                  - {
                      - Action   = [
                          - "kms:RetireGrant",
                          - "kms:ReEncryptTo",
                          - "kms:ReEncryptFrom",
                          - "kms:GenerateDataKeyWithoutPlaintext",
                          - "kms:GenerateDataKey",
                          - "kms:Encrypt",
                          - "kms:DescribeKey",
                          - "kms:Decrypt",
                          - "kms:CreateGrant",
                        ]
                      - Effect   = "Allow"
                      - Resource = "arn:aws:kms:ca-central-1:739275439843:key/991f91d8-209e-40f2-b925-915adf74a043"
                      - Sid      = "UseGlueKey"
                    },
                  - {
                      - Action   = "logs:AssociateKmsKey"
                      - Effect   = "Allow"
                      - Resource = "arn:aws:logs:ca-central-1:739275439843:log-group:/aws-glue/crawlers-role/service-role/AWSGlueCrawler-DataLake-encryption-at-rest:*"
                      - Sid      = "AssociateKmsKey"
                    },
                ]
              - Version   = "2012-10-17"
            }
        ) -> (known after apply)
        tags             = {}
        # (7 unchanged attributes hidden)
    }

  # aws_iam_policy.glue_etl will be created
  + resource "aws_iam_policy" "glue_etl" {
      + arn              = (known after apply)
      + attachment_count = (known after apply)
      + id               = (known after apply)
      + name             = "AWSGlueETL-DataLake"
      + name_prefix      = (known after apply)
      + path             = "/service-role/"
      + policy           = (known after apply)
      + policy_id        = (known after apply)
      + tags_all         = {
          + "CostCentre" = "PlatformDataLake"
          + "Terraform"  = "true"
        }
    }

  # aws_iam_role.glue_etl will be created
  + resource "aws_iam_role" "glue_etl" {
      + arn                   = (known after apply)
      + assume_role_policy    = jsonencode(
            {
              + Statement = [
                  + {
                      + Action    = "sts:AssumeRole"
                      + Effect    = "Allow"
                      + Principal = {
                          + Service = "glue.amazonaws.com"
                        }
                    },
                ]
              + Version   = "2012-10-17"
            }
        )
      + create_date           = (known after apply)
      + force_detach_policies = false
      + id                    = (known after apply)
      + managed_policy_arns   = (known after apply)
      + max_session_duration  = 3600
      + name                  = "AWSGlueETL-DataLake"
      + name_prefix           = (known after apply)
      + path                  = "/service-role/"
      + tags_all              = {
          + "CostCentre" = "PlatformDataLake"
          + "Terraform"  = "true"
        }
      + unique_id             = (known after apply)

      + inline_policy (known after apply)
    }

  # aws_iam_role_policy_attachment.glue_etl will be created
  + resource "aws_iam_role_policy_attachment" "glue_etl" {
      + id         = (known after apply)
      + policy_arn = (known after apply)
      + role       = "AWSGlueETL-DataLake"
    }

  # aws_iam_role_policy_attachment.glue_etl_service_role will be created
  + resource "aws_iam_role_policy_attachment" "glue_etl_service_role" {
      + id         = (known after apply)
      + policy_arn = "arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole"
      + role       = "AWSGlueETL-DataLake"
    }

  # aws_kms_key.aws_glue will be updated in-place
  ~ resource "aws_kms_key" "aws_glue" {
        id                                 = "991f91d8-209e-40f2-b925-915adf74a043"
      ~ policy                             = jsonencode(
            {
              - Statement = [
                  - {
                      - Action    = "kms:*"
                      - Effect    = "Allow"
                      - Principal = {
                          - AWS = "739275439843"
                        }
                      - Resource  = "*"
                    },
                  - {
                      - Action    = [
                          - "kms:ReEncrypt*",
                          - "kms:GenerateDataKey*",
                          - "kms:Encrypt*",
                          - "kms:Describe*",
                          - "kms:Decrypt*",
                        ]
                      - Effect    = "Allow"
                      - Principal = {
                          - Service = "logs.ca-central-1.amazonaws.com"
                        }
                      - Resource  = "*"
                    },
                  - {
                      - Action    = [
                          - "kms:RetireGrant",
                          - "kms:ReEncryptTo",
                          - "kms:ReEncryptFrom",
                          - "kms:GenerateDataKeyWithoutPlaintext",
                          - "kms:GenerateDataKey",
                          - "kms:Encrypt",
                          - "kms:DescribeKey",
                          - "kms:Decrypt",
                          - "kms:CreateGrant",
                        ]
                      - Effect    = "Allow"
                      - Principal = {
                          - AWS = "arn:aws:iam::739275439843:role/service-role/AWSGlueCrawler-DataLake"
                        }
                      - Resource  = "*"
                    },
                ]
              - Version   = "2012-10-17"
            }
        ) -> (known after apply)
        tags                               = {}
        # (13 unchanged attributes hidden)
    }

Plan: 5 to add, 3 to change, 0 to destroy.

─────────────────────────────────────────────────────────────────────────────

Saved the plan to: plan.tfplan

To perform exactly these actions, run the following command to apply:
    terraform apply "plan.tfplan"
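
The third statement of `data.aws_iam_policy_document.aws_glue` in the plan lists two AWS principals: the existing crawler role ARN and a value known only after apply. One plausible reading, which the plan does not confirm, is that the second principal is the new ETL role. Under that assumption the statement could be sketched as below. The two variables are hypothetical stand-ins for the role ARNs, and the other two statements of the key policy are omitted.

```hcl
# Hypothetical inputs standing in for the two Glue role ARNs.
variable "glue_crawler_role_arn" {
  type = string
}

variable "glue_etl_role_arn" {
  type = string
}

# Key policy excerpt: only the statement that grants the Glue roles use of
# the key is shown. The account-root and CloudWatch Logs statements from the
# plan are left out of this sketch.
data "aws_iam_policy_document" "aws_glue" {
  statement {
    effect = "Allow"
    actions = [
      "kms:CreateGrant",
      "kms:Decrypt",
      "kms:DescribeKey",
      "kms:Encrypt",
      "kms:GenerateDataKey",
      "kms:GenerateDataKeyWithoutPlaintext",
      "kms:ReEncryptFrom",
      "kms:ReEncryptTo",
      "kms:RetireGrant",
    ]
    resources = ["*"]

    principals {
      type = "AWS"
      identifiers = [
        var.glue_crawler_role_arn,
        # Assumption: the "(known after apply)" principal is the ETL role.
        var.glue_etl_role_arn,
      ]
    }
  }
}
```

Because the ARN of a role that does not exist yet cannot be resolved at plan time, a reference like this would explain why the key policy and the crawler policy both show `(known after apply)` in the plan.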
Show Conftest results
WARN - plan.json - main - Missing Common Tags: ["aws_glue_catalog_database.operations_aws_production"]
WARN - plan.json - main - Missing Common Tags: ["aws_glue_crawler.operations_aws_production_account_tags"]
WARN - plan.json - main - Missing Common Tags: ["aws_glue_crawler.operations_aws_production_cost_usage_report"]
WARN - plan.json - main - Missing Common Tags: ["aws_iam_policy.glue_crawler"]
WARN - plan.json - main - Missing Common Tags: ["aws_iam_policy.glue_etl"]
WARN - plan.json - main - Missing Common Tags: ["aws_iam_role.glue_crawler"]
WARN - plan.json - main - Missing Common Tags: ["aws_iam_role.glue_etl"]
WARN - plan.json - main - Missing Common Tags: ["aws_kms_key.aws_glue"]

27 tests, 19 passed, 8 warnings, 0 failures, 0 exceptions

@patheard merged commit a34a2e6 into main on Nov 12, 2024
4 checks passed
@patheard deleted the fix/json-classifier branch on November 12, 2024 15:27