
Enable Datadog integration and update module references in manifests #275

Merged: 2 commits from datadog into main, Nov 15, 2024

Conversation

brettcurtis
Contributor

@brettcurtis brettcurtis commented Nov 15, 2024

Summary by CodeRabbit

  • New Features

    • Introduced a new module for the Kubernetes Datadog Operator, enhancing monitoring capabilities.
    • Added new data sources for Google Cloud Platform resources to improve infrastructure management.
    • Enabled Datadog monitoring by updating the configuration.
  • Bug Fixes

    • Updated the configuration for the Kubernetes Datadog Operator module to ensure proper functionality.
  • Documentation

    • Enhanced Terraform documentation with detailed information on providers, modules, resources, and required inputs.

@brettcurtis brettcurtis self-assigned this Nov 15, 2024
Contributor

coderabbitai bot commented Nov 15, 2024

Walkthrough

This pull request introduces several changes across multiple files related to the Kubernetes Datadog Operator module. Key modifications include updates to the .pre-commit-config.yaml to skip additional checks, enhancements to the README.md files for documentation clarity, and the addition of new data sources in the data.tf file. The configuration for the Datadog Operator module is updated to a new version, and the datadog_enable variable is set to enable monitoring.

Changes

  • .pre-commit-config.yaml: Updated the --skip-check argument for the checkov hook from "CKV_TF_1" to "CKV_TF_1,CKV_TF_2".
  • regional/datadog/README.md: Added a module entry for kubernetes_datadog_operator with its source and version v0.1.4.
  • regional/datadog/main.tf: Updated the module source from v0.1.3 to v0.1.4, activated previously commented-out code, and added cluster_prefix.
  • regional/datadog/manifests/README.md: Updated documentation to include the provider, new module, resources, and inputs for the Datadog configuration.
  • regional/datadog/manifests/data.tf: Introduced new data sources for GCP resources: google_client_config, google_container_cluster, google_projects, and google_project.
  • regional/datadog/manifests/main.tf: Activated the module declaration for kubernetes_datadog_operator_manifests and updated the source version.
  • tfvars/sandbox.tfvars: Updated the datadog_enable variable from false to true.
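Taken together, the key changes can be sketched in HCL. The repository and v0.1.4 tag are confirmed later in this review; the attribute names inside the module block are assumptions based on the diff summaries above, and the conditional gate is inferred from the `module.datadog[0]` resource addresses in the Infracost report rather than taken from the diff itself.

```hcl
# tfvars/sandbox.tfvars
datadog_enable = true

# regional/datadog/main.tf (sketch; attribute names are assumptions)
module "kubernetes_datadog_operator" {
  source = "github.com/osinfra-io/terraform-kubernetes-datadog-operator//regional?ref=v0.1.4"

  cluster_prefix = "plt"
}
```

With `datadog_enable = false`, a `count = var.datadog_enable ? 1 : 0` style gate (implied by the indexed `module.datadog[0]` addresses) keeps every Datadog resource out of the plan.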

Possibly related PRs

  • Refactor project ID and number references #142: The changes in this PR also involve modifications to the .pre-commit-config.yaml file, specifically related to the checkov hook, which is directly relevant to the changes made in the main PR regarding the --skip-check argument.

Suggested reviewers

  • osinfra-sa

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve to resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai or @coderabbitai title anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json
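For example, a minimal .coderabbit.yaml with the schema line at the top might look like the following. The reviews.profile key is an assumption based on the "Review profile: CHILL" line later in this PR; consult the configuration documentation for the exact schema.

```yaml
# yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json
reviews:
  profile: chill   # assumed key; selects the less-nitpicky review profile
```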

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.


infracost bot commented Nov 15, 2024

💰 Infracost report

Monthly estimate increased by $25 📈

Changed project   Baseline cost   Usage cost*   Total change    New monthly cost
main-sandbox      +$0             +$25          +$25 (+201%)    $37

*Usage costs were estimated using Infracost Cloud settings, see docs for other options.

Estimate details (includes details of unsupported resources)
Key: * usage cost, ~ changed, + added, - removed

──────────────────────────────────
Project: main-sandbox

+ module.datadog[0].google_pubsub_subscription.integration
  +$15

    + Message delivery data
      +$5, +0.125 TiB*

    + Retained acknowledged message storage
      +$5, +18 GiB*

    + Snapshot message backlog storage
      +$5, +18 GiB*

+ module.datadog[0].google_logging_project_sink.integration
  +$5

    + Logging data
      +$5, +10 GB*

+ module.datadog[0].google_pubsub_topic.integration
  +$5

    + Message ingestion data
      +$5, +0.125 TiB*

Monthly cost change for main-sandbox
Amount:  +$25 ($12 → $37)
Percent: +201%

──────────────────────────────────
24 projects have no cost estimate changes.
Run the following command to see their breakdown: infracost breakdown --path=/path/to/code

──────────────────────────────────

148 cloud resources were detected:
∙ 21 were estimated
∙ 114 were free
∙ 13 are not supported yet, see https://infracost.io/requested-resources:
  ∙ 4 x google_gke_hub_feature
  ∙ 3 x google_gke_hub_membership
  ∙ 2 x google_billing_budget
  ∙ 2 x google_cloud_identity_group_membership
  ∙ 2 x google_compute_security_policy


@brettcurtis brettcurtis temporarily deployed to Sandbox Datadog: Regional - us-east4-a November 15, 2024 12:39 — with GitHub Actions Inactive
@brettcurtis brettcurtis temporarily deployed to Sandbox Datadog: Regional - us-east1-b November 15, 2024 12:39 — with GitHub Actions Inactive
@brettcurtis brettcurtis temporarily deployed to Sandbox Datadog Manifests: Regional - us-east1-b November 15, 2024 12:41 — with GitHub Actions Inactive
@brettcurtis brettcurtis temporarily deployed to Sandbox Datadog: Regional - us-east4-a November 15, 2024 15:22 — with GitHub Actions Inactive
@brettcurtis brettcurtis temporarily deployed to Sandbox Datadog Manifests: Regional - us-east1-b November 15, 2024 15:22 — with GitHub Actions Inactive
@brettcurtis brettcurtis marked this pull request as ready for review November 15, 2024 15:37
@brettcurtis brettcurtis requested a review from a user November 15, 2024 15:37
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (6)
regional/datadog/main.tf (1)

9-9: Consider making cluster_prefix configurable

The cluster prefix is hardcoded to "plt". Consider making this configurable through a variable to support different environments or naming conventions.

-  cluster_prefix = "plt"
+  cluster_prefix = var.cluster_prefix
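The suggested var.cluster_prefix would need a matching declaration, e.g. in variables.tf. A sketch, with the default preserving the current hardcoded value:

```hcl
variable "cluster_prefix" {
  description = "Prefix used for GKE cluster names"
  type        = string
  default     = "plt"
}
```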
regional/datadog/manifests/main.tf (1)

9-11: Consider parameterizing environment-specific values

The following hardcoded values might need to be parameterized for better flexibility across different environments:

  • cluster_prefix = "plt"
  • registry = "us-docker.pkg.dev/plt-lz-services-tf79-prod/plt-docker-virtual"

Consider using variables for these values:

-  cluster_prefix = "plt"
-  registry       = "us-docker.pkg.dev/plt-lz-services-tf79-prod/plt-docker-virtual"
+  cluster_prefix = var.cluster_prefix
+  registry       = var.container_registry
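Likewise, var.container_registry would need a declaration; a sketch, defaulting to the registry path currently hardcoded in the module:

```hcl
variable "container_registry" {
  description = "Registry host/path for Datadog container images"
  type        = string
  default     = "us-docker.pkg.dev/plt-lz-services-tf79-prod/plt-docker-virtual"
}
```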
regional/datadog/manifests/data.tf (2)

Line range hint 18-20: Consider parameterizing the project prefix

While the filter works, the hardcoded prefix plt-k8s-* could make the code less maintainable. Consider:

  1. Moving the prefix to a variable
  2. Adding validation for the environment label existence
+variable "project_prefix" {
+  description = "Prefix used for GCP project names"
+  type        = string
+  default     = "plt-k8s-"
+}

 data "google_projects" "this" {
-  filter = "name:plt-k8s-* labels.env:${module.helpers.environment}"
+  filter = "name:${var.project_prefix}* labels.env:${module.helpers.environment}"
 }

Line range hint 25-27: Add error handling for project lookup

The current implementation assumes that at least one project will always be found. This could lead to runtime errors if no projects match the filter criteria.

Consider adding validation:

 data "google_project" "this" {
+  count      = length(data.google_projects.this.projects) > 0 ? 1 : 0
   project_id = data.google_projects.this.projects.0.project_id
 }

+# Terraform has no fail() function; a check block (Terraform >= 1.5)
+# surfaces a clear error when the lookup comes back empty.
+check "project_lookup" {
+  assert {
+    condition     = length(data.google_projects.this.projects) > 0
+    error_message = "No matching GCP projects found with prefix '${var.project_prefix}' and environment '${module.helpers.environment}'."
+  }
+}
regional/datadog/README.md (1)

Line range hint 1-50: Consider security and operational aspects of Datadog integration

A few architectural considerations for this integration:

  1. Ensure Datadog API/APP keys are managed securely (e.g., using secret management solutions)
  2. Consider implementing proper RBAC policies for the Datadog Operator in GKE
  3. Plan for monitoring coverage across all regions where this module will be deployed
  4. Consider setting up alerts and dashboards as code alongside this integration
regional/datadog/manifests/README.md (1)

Line range hint 31-35: Ensure secure handling of Datadog credentials.

The Datadog API and APP keys are sensitive values. Please ensure that:

  1. These keys are stored securely (e.g., using secret management solutions)
  2. The values are marked as sensitive in the variable definitions

Consider adding the following to your variable definitions in the corresponding .tf files:

variable "datadog_api_key" {
  description = "Datadog API key"
  type        = string
  sensitive   = true
}

variable "datadog_app_key" {
  description = "Datadog APP key"
  type        = string
  sensitive   = true
}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between b5f12e9 and 50e98b4.

📒 Files selected for processing (7)
  • .pre-commit-config.yaml (1 hunks)
  • regional/datadog/README.md (1 hunks)
  • regional/datadog/main.tf (1 hunks)
  • regional/datadog/manifests/README.md (1 hunks)
  • regional/datadog/manifests/data.tf (1 hunks)
  • regional/datadog/manifests/main.tf (1 hunks)
  • tfvars/sandbox.tfvars (1 hunks)
🔇 Additional comments (13)
tfvars/sandbox.tfvars (2)

1-1: Verify Datadog security configurations

Enabling Datadog requires proper security configurations to ensure secure data collection and transmission.

#!/bin/bash
# Description: Verify Datadog security configurations

# Check for Datadog security-related configurations
rg -A 5 "security|rbac|role|clusterrole|secret" regional/datadog/

# Check for Datadog API key handling
ast-grep --pattern 'apiKey|api_key|appKey|app_key'

# Look for network policies
rg -A 5 "NetworkPolicy" regional/datadog/manifests/

Ensure the following security measures are in place:

  • RBAC configurations for Datadog service accounts
  • Network policies to control Datadog agent communications
  • Secure handling of Datadog API and app keys

1-1: Verify Datadog costs against monthly budget constraints

The sandbox environment has a monthly budget of $25 (line 11). Please ensure that enabling Datadog monitoring won't exceed this budget constraint.

Consider implementing resource quotas and limits for Datadog agents to control costs.
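One way to apply such limits is through the Datadog Operator's DatadogAgent resource. The field paths below follow the datadoghq.com/v2alpha1 API and are an assumption about how this module surfaces overrides, not something taken from the PR; the request/limit values are illustrative only.

```yaml
apiVersion: datadoghq.com/v2alpha1
kind: DatadogAgent
metadata:
  name: datadog
spec:
  override:
    nodeAgent:
      containers:
        agent:
          resources:            # caps per-node agent usage to help control cost
            requests:
              cpu: 100m
              memory: 256Mi
            limits:
              cpu: 200m
              memory: 512Mi
```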

✅ Verification successful

Enabling Datadog in sandbox environment appears safe within budget constraints

Based on the verification results:

  • The Datadog operator module sets default resource limits and requests for agents
  • The node agent has configurable CPU and memory limits to control resource usage
  • The sandbox environment has a monthly budget of $25, which should accommodate basic Datadog monitoring given the resource constraints in place
  • The implementation uses the official Datadog operator with standard configurations
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Search for Datadog cost configurations and resource limits

# Look for Datadog resource configurations
rg -A 5 "resources|limits" regional/datadog/

# Look for any Datadog cost-related configurations
rg -A 5 "cost|budget|billing" regional/datadog/

Length of output: 172


Script:

#!/bin/bash
# Let's try a broader search to find Datadog-related configurations

# Search for any Datadog-related terraform files
fd -e tf -e tfvars . | xargs rg -l "datadog"

# Search for resource configurations in terraform files
rg -A 5 "resource\s+\"datadog" 

# Look for variables related to Datadog
rg "variable\s+\"datadog" -A 5

# Search for any cost or budget related configurations
rg "budget|cost|pricing" -A 5

Length of output: 4701


Script:

#!/bin/bash
# Let's check for Datadog agent configurations and resource limits

# Look for Datadog agent configurations
rg -A 10 "agent:|cluster_agent:|resources:" regional/datadog/manifests/

# Check main Datadog configuration
cat regional/datadog/manifests/main.tf

# Look for any helm values or configurations
fd . -e yaml -e yml | xargs rg -l "datadog"

Length of output: 696


Script:

#!/bin/bash
# Let's check the Datadog operator module referenced and the entity configuration

# Check the entity configuration
cat entity.datadog.yaml

# Try to find any default values or configurations in the module
curl -s https://raw.githubusercontent.com/osinfra-io/terraform-kubernetes-datadog-operator/v0.1.4/regional/manifests/main.tf

# Look for any resource quotas or limits in kubernetes configurations
fd . -e yaml -e yml | xargs rg "resources:|limits:|requests:"

Length of output: 11844

regional/datadog/main.tf (2)

7-8: Ensure secure handling of Datadog credentials

The Datadog API and App keys are sensitive credentials. Verify that:

  1. The variables are marked as sensitive in the variables definition
  2. The keys are stored securely (e.g., in a secret manager)
#!/bin/bash
# Description: Check variable definitions and potential credential exposure

# Check if variables are marked as sensitive
rg -A 5 "variable \"datadog_(api|app)_key\"" 

# Look for potential hardcoded credentials
rg -g '!.git' -i "datadog.*key.*=.*[a-z0-9]{32}"

4-5: Verify the module version upgrade changes

The module version has been updated to v0.1.4. Let's verify the changelog for any breaking changes.

regional/datadog/manifests/main.tf (3)

7-8: LGTM! Secure handling of sensitive parameters

Good practice using variables for sensitive Datadog API and APP keys instead of hardcoding them.


4-12: LGTM! Complete Datadog Operator configuration

The module configuration includes all necessary parameters for enabling the Datadog integration, which aligns with the PR objectives.


4-5: Verify the module version and check for breaking changes

The module version has been updated from v0.1.3 to v0.1.4. Let's verify this version.

regional/datadog/manifests/data.tf (2)

Line range hint 3-5: LGTM: Correctly configured google_client_config data source

The empty configuration block is appropriate for this data source as it retrieves the default client configuration.


Line range hint 9-13: Verify cluster naming convention and existence

The cluster name format plt-${module.helpers.region}-${module.helpers.zone} makes assumptions about the naming convention. Please verify:

  1. That this naming pattern matches your GKE cluster naming standards
  2. That the cluster exists in the specified region
✅ Verification successful

Cluster naming convention is consistent and verified

The naming convention plt-${region}-${zone} is consistently used across the infrastructure:

  • It matches the cluster prefix "plt" defined in multiple modules (cert-manager, istio, datadog)
  • The same pattern is used for subnet naming and cluster references
  • The naming is part of a broader platform (plt) infrastructure pattern, as evidenced by artifact registry and project naming
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the GKE cluster naming convention across the infrastructure

# Search for GKE cluster definitions in Terraform files
rg -t tf "google_container_cluster" -A 5

# Search for any documentation about cluster naming conventions
fd -t f "README.md" -X grep -l "cluster.*naming"

Length of output: 1296


Script:

#!/bin/bash
# Let's search for more context about the cluster naming pattern and configuration

# Search for any plt- prefix usage in terraform files
rg -t tf "plt-" -B 2 -A 2

# Search for any variables or locals that might define cluster naming
rg -t tf "cluster.*name.*=|name.*cluster.*=" -B 2 -A 2

# Search for any helper module references that might contain naming conventions
fd -e tf . -X grep -l "module.helpers"

Length of output: 2845

.pre-commit-config.yaml (1)

42-42: Verify the necessity of skipping the version pinning check

The addition of CKV_TF_2 to the skip list disables Checkov's version pinning verification. While this might be needed for the Datadog integration, skipping version checks could lead to infrastructure instability if module versions unexpectedly change.

Consider the following recommendations:

  1. Document the rationale for skipping CKV_TF_2 in a comment
  2. If possible, implement version pinning and remove this exemption to ensure infrastructure stability
  3. If the exemption is necessary, consider adding it selectively only for specific modules rather than globally
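Recommendation 1 can be as simple as inline comments next to the skip list. A sketch, assuming the hook follows the standard checkov pre-commit layout (the rev value is a placeholder, not taken from this repository):

```yaml
- repo: https://github.com/bridgecrewio/checkov
  rev: 3.2.0   # placeholder revision
  hooks:
    - id: checkov
      # CKV_TF_1: commit-hash pinning not used; modules are pinned by version tag
      # CKV_TF_2: document the rationale for this exemption here
      args: ["--skip-check", "CKV_TF_1,CKV_TF_2"]
```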

Run this script to analyze version pinning patterns in Terraform files:

regional/datadog/README.md (1)

19-19: LGTM! Module addition with proper version pinning.

The addition of the Kubernetes Datadog Operator module with a pinned version is a good practice for infrastructure stability.

Let's verify the module source and version:

✅ Verification successful

Module source and version verified successfully

The repository osinfra-io/terraform-kubernetes-datadog-operator exists and the tag v0.1.4 is available, confirming the module source is valid and accessible.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the module source repository and version tag existence

# Check if the repository exists and is accessible
gh repo view osinfra-io/terraform-kubernetes-datadog-operator

# Verify the version tag exists
gh api \
  -H "Accept: application/vnd.github+json" \
  repos/osinfra-io/terraform-kubernetes-datadog-operator/git/refs/tags/v0.1.4

Length of output: 5530

regional/datadog/manifests/README.md (2)

19-19: Verify Datadog operator module version.

The Kubernetes Datadog operator module is pinned to version v0.1.4. Let's verify if this is the latest stable version.

#!/bin/bash
# Description: Check latest version of the Datadog operator module

# Get latest release tag
gh api repos/osinfra-io/terraform-kubernetes-datadog-operator/releases/latest --jq .tag_name

# Check for any breaking changes in recent releases
gh api repos/osinfra-io/terraform-kubernetes-datadog-operator/releases --jq '.[0:3] | .[] | {tag_name, body}'

23-28: Verify data source dependencies and permissions.

The added data sources look good and are well-documented. However, let's verify that all necessary IAM permissions are in place for these data sources to function correctly.

#!/bin/bash
# Description: Check for IAM-related configurations in the codebase

# Search for IAM role definitions or bindings related to these data sources
rg -A 5 "google_(client_config|container_cluster|project|projects)"

# Look for any existing IAM configurations
fd -t f "iam" -x cat {}

Review threads on regional/datadog/manifests/data.tf and regional/datadog/manifests/README.md were marked resolved.
@brettcurtis brettcurtis merged commit e290f8e into main Nov 15, 2024
37 of 39 checks passed
@brettcurtis brettcurtis deleted the datadog branch November 15, 2024 16:04
@coderabbitai coderabbitai bot mentioned this pull request Dec 21, 2024