Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle sensitive information being inside a list in resource_dict #2178

Open
wants to merge 14 commits into
base: main
Choose a base branch
from

Conversation

dbasunag
Copy link
Contributor

@dbasunag dbasunag commented Oct 23, 2024

Short description:

Current code hash_resource_dict() does not provide a flexible way to hide fields like userData, for virtual machines (it could be present in spec.template.spec.volumes)

More details:
What this PR does / why we need it:
Which issue(s) this PR fixes:
Special notes for reviewer:
Bug:

Summary by CodeRabbit

Summary by CodeRabbit

  • New Features

    • Introduced a new function to enhance security by masking sensitive values in resource dictionaries.
    • Added a property in the VirtualMachine class to access specific keys related to user data.
  • Improvements

    • Simplified the keys_to_hash property in the SealedSecret class for better data structure alignment.
    • Updated the hashing process for resource data, improving logging security.
    • Enhanced the keys_to_hash property in the Resource class to reflect a new keypath format.
  • Tests

    • Added new test cases to validate the functionality of the new hashing feature, including edge cases and invalid inputs.

@redhat-qe-bot2
Copy link

Report bugs in Issues

The following are automatically added:

  • Add reviewers from OWNER file (in the root of the repository) under reviewers section.
  • Set PR size label.
  • New issue is created for the PR. (Closed when PR is merged/closed)
  • Run pre-commit if .pre-commit-config.yaml exists in the repo.

Available user actions:

  • To mark PR as WIP comment /wip to the PR, To remove it from the PR comment /wip cancel to the PR.
  • To block merging of PR comment /hold, To un-block merging of PR comment /hold cancel.
  • To mark PR as verified comment /verified to the PR, to un-verify comment /verified cancel to the PR.
    verified label removed on each new commit push.
  • To cherry pick a merged PR comment /cherry-pick <target branch to cherry-pick to> in the PR.
    • Multiple target branches can be cherry-picked, separated by spaces. (/cherry-pick branch1 branch2)
    • Cherry-pick will be started when PR is merged
  • To build and push container image command /build-and-push-container in the PR (tag will be the PR number).
    • You can add extra args to the Podman build command
      • Example: /build-and-push-container --build-arg OPENSHIFT_PYTHON_WRAPPER_COMMIT=<commit_hash>
  • To add a label by comment use /<label name>, to remove, use /<label name> cancel
  • To assign reviewers based on OWNERS file use /assign-reviewers
  • To check if PR can be merged use /check-can-merge
Supported /retest check runs
  • /retest tox: Retest tox
  • /retest python-module-install: Retest python-module-install
  • /retest all: Retest all
Supported labels
  • hold
  • verified
  • wip
  • lgtm

Copy link

coderabbitai bot commented Oct 23, 2024

Walkthrough

This pull request introduces a new function, replace_key_with_hashed_value, in the ocp_resources/resource.py file, which recursively searches a nested dictionary for a specified key and replaces its value with "*******" if found. The hash_resource_dict method in the Resource class is updated to utilize this new function, simplifying the process of hashing resource data. Additionally, the keys_to_hash property in the SealedSecret class is modified to reflect a new keypath separator, and a new keys_to_hash property is added to the VirtualMachine class.

Changes

File Path Change Summary
ocp_resources/resource.py - Added replace_key_with_hashed_value(resource_dict, key_name) function.
- Updated hash_resource_dict(self, resource_dict) method to use the new function and removed benedict library usage.
- Updated keys_to_hash(self) property to reflect new keypath separator ">".
ocp_resources/sealed_secret.py - Updated keys_to_hash property to return ["spec>data", "spec>encryptedData"] instead of ["spec..data", "spec..encryptedData"].
ocp_resources/virtual_machine.py - Added new keys_to_hash property returning ["spec>template>spec>volumes[]>cloudInitNoCloud>userData"].
tests/test_validate_resources.py - Added new fixture vm_spec() and four test methods to validate the replace_key_with_hashed_value function.

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (2)
ocp_resources/virtual_machine.py (1)

179-181: LGTM! Consider adding documentation.

The implementation correctly identifies the sensitive field that needs to be masked. Consider adding a docstring to explain the purpose of this property and its security implications.

 @property
 def keys_to_hash(self):
+    """
+    Returns a list of keys containing sensitive information that should be masked.
+    
+    Returns:
+        list[str]: List of keys to be masked in the resource dictionary.
+        Currently masks 'userData' which contains sensitive boot-time configuration.
+    """
     return ["userData"]
ocp_resources/resource.py (1)

136-156: LGTM with suggestions for improvements.

The implementation correctly handles nested dictionaries and lists. However, consider these improvements:

  1. Add protection against circular references to prevent stack overflow
  2. Consider making the mask value configurable for flexibility

Here's a suggested improvement:

-def change_dict_value_to_hashed(resource_dict: Dict[Any, Any], key_name: str) -> Dict[Any, Any]:
+def change_dict_value_to_hashed(
+    resource_dict: Dict[Any, Any],
+    key_name: str,
+    mask: str = "******",
+    _seen: set | None = None
+) -> Dict[Any, Any]:
     """
     Recursively search a nested dictionary for a given key and changes its value to "******" if found.
 
     Args:
         resource_dict: The nested dictionary to search.
         key_name: The key to find.
+        mask: The value to use for masking sensitive data.
+        _seen: Internal parameter to track circular references.
 
     Returns:
         The modified dictionary.
     """
+    if _seen is None:
+        _seen = set()
+
+    # Handle circular references
+    if id(resource_dict) in _seen:
+        return resource_dict
+
     if isinstance(resource_dict, dict):
+        _seen.add(id(resource_dict))
         for key, value in resource_dict.items():
             if key == key_name:
-                resource_dict[key] = "******"
+                resource_dict[key] = mask
             elif isinstance(value, dict):
-                resource_dict[key] = change_dict_value_to_hashed(value, key_name)
+                resource_dict[key] = change_dict_value_to_hashed(value, key_name, mask, _seen)
             elif isinstance(value, list):
                 for key_list, value_list in enumerate(value):
-                    value[key_list] = change_dict_value_to_hashed(value_list, key_name)
+                    value[key_list] = change_dict_value_to_hashed(value_list, key_name, mask, _seen)
     return resource_dict
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 0c5830d and dafe9d0.

📒 Files selected for processing (3)
  • ocp_resources/resource.py (2 hunks)
  • ocp_resources/sealed_secret.py (1 hunks)
  • ocp_resources/virtual_machine.py (1 hunks)
🧰 Additional context used
🔇 Additional comments (3)
ocp_resources/sealed_secret.py (1)

53-53: Verify the impact of simplified key paths.

The simplified key paths in keys_to_hash might affect how sensitive data is masked in the resource dictionary. Let's verify this change works correctly with the updated hash_resource_dict implementation.

Consider adding a docstring to the keys_to_hash property to document:

  1. The expected format of the keys
  2. How these keys are used in the resource dictionary masking process
  3. Why the "spec.." prefix was removed

This will help future maintainers understand the design decision and prevent accidental reintroduction of the prefix.

✅ Verification successful

Let me gather more information about how the key paths are actually processed in the codebase.

This will help us understand:

  1. How the key paths are processed in the hashing function
  2. If there are tests verifying this functionality
  3. Where the SealedSecret class is used in the codebase

Let me check one more thing to verify the test coverage and actual usage patterns.

This will help us understand:

  1. The test coverage for this functionality
  2. The complete context of SealedSecret class
  3. How these specific keys are actually used in hashing

Based on the gathered information, I can now provide the final response:

Simplified key paths in keys_to_hash are correct and safe

The change from ["spec..data", "spec..encryptedData"] to ["data", "encryptedData"] is correct because:

  1. The change_dict_value_to_hashed function performs a recursive dictionary search for exact key matches, making the "spec.." prefix unnecessary
  2. The SealedSecret class directly accepts data and encrypted_data as top-level parameters in its constructor
  3. The implementation aligns with the actual structure of the resource where sensitive data is stored

The change maintains security while simplifying the implementation, as the hashing mechanism in hash_resource_dict will still mask all sensitive information correctly.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the usage of keys_to_hash in the codebase
# and ensure it's compatible with the new implementation

# Check if there are any direct references to the old format "spec..data"
echo "Checking for old key format usage..."
rg -l "spec\.\.data|spec\.\.encryptedData" 

# Look for other implementations of keys_to_hash to ensure consistency
echo "Checking other keys_to_hash implementations..."
ast-grep --pattern 'def keys_to_hash'

# Check the Resource class implementation that uses these keys
echo "Checking Resource class implementation..."
rg -A 10 "def hash_resource_dict"

Length of output: 1254


Script:

#!/bin/bash
# Check the implementation of change_dict_value_to_hashed function
echo "Checking change_dict_value_to_hashed implementation..."
rg -A 15 "def change_dict_value_to_hashed"

# Check if there are any tests that verify this functionality
echo "Checking related tests..."
fd -e py test_sealed_secret test_resource --exec rg -l "keys_to_hash|hash_resource_dict" {}

# Check the actual usage of SealedSecret class
echo "Checking SealedSecret usage..."
rg -l "class SealedSecret"

Length of output: 1542


Script:

#!/bin/bash
# Check for test files with correct path
echo "Checking test files..."
fd -e py -g "*test*" --exec rg -l "keys_to_hash|hash_resource_dict" {}

# Check the complete SealedSecret class implementation
echo "Checking SealedSecret class implementation..."
rg -A 20 "class SealedSecret" ocp_resources/sealed_secret.py

# Check actual usage of hash_resource_dict with these keys
echo "Checking actual usage patterns..."
rg -B 5 -A 5 "hash_resource_dict.*data|hash_resource_dict.*encryptedData"

Length of output: 1177

ocp_resources/resource.py (2)

1198-1198: LGTM! Clean integration of the new function.

The change correctly integrates the new change_dict_value_to_hashed function while maintaining the existing behavior and safety measures (deep copy, condition checks).


136-156: Verify the impact on existing consumers.

Let's verify the impact of these changes on existing code.

Also applies to: 1198-1198

✅ Verification successful

Let me gather more information about the keys_to_hash property since the previous search didn't show its complete usage pattern.

This will help us understand:

  1. Where and how keys_to_hash is defined and used
  2. The usage pattern of hash_log_data which controls the hashing behavior
  3. Any potential impact on existing implementations

Based on the gathered information, I can now provide a final response as we have a clear picture of the usage and impact:

The changes to change_dict_value_to_hashed are safe and properly integrated

The function is used within a well-defined pattern where:

  • It's only called from the base Resource class's hash_resource_dict method
  • The hashing behavior is controlled by two factors:
    • The hash_log_data boolean parameter (defaults to True)
    • Resource-specific keys_to_hash property implementations
  • Only specific resource types implement keys_to_hash:
    • ConfigMap: ["data", "binaryData"]
    • VirtualMachine: ["userData"]
    • SealedSecret: ["data", "encryptedData"]
    • Secret: ["data", "stringData"]

The changes maintain the existing behavior while improving the implementation, and all current consumers are properly integrated with this pattern.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for usages of hash_resource_dict and keys_to_hash property

echo "Checking for direct calls to hash_resource_dict..."
rg -A 2 "hash_resource_dict\("

echo -e "\nChecking for classes that override keys_to_hash property..."
ast-grep --pattern 'class $_ {
  $$$
  @property
  def keys_to_hash($_) {
    $$$
  }
  $$$
}'

Length of output: 1839


Script:

#!/bin/bash
# Let's search for any references to keys_to_hash in the codebase
echo "Searching for keys_to_hash references..."
rg -A 2 "keys_to_hash"

# Also check for hash_log_data since it's used in conjunction
echo -e "\nSearching for hash_log_data references..."
rg -A 2 "hash_log_data"

Length of output: 2647

Comment on lines 179 to 181
@property
def keys_to_hash(self):
return ["userData"]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Consider using a class constant for the sensitive key.

To improve maintainability and reusability, consider defining the sensitive key as a class constant.

 class VirtualMachine(NamespacedResource):
+    # Sensitive fields that should be masked when serializing
+    SENSITIVE_USER_DATA_KEY = "userData"
+
     """
     Virtual Machine object, inherited from Resource.
     Implements actions start / stop / status / wait for VM status / is running
     """
     
     @property
     def keys_to_hash(self):
-        return ["userData"]
+        return [self.SENSITIVE_USER_DATA_KEY]
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
@property
def keys_to_hash(self):
return ["userData"]
@property
def keys_to_hash(self):
return [self.SENSITIVE_USER_DATA_KEY]

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)
ocp_resources/resource.py (1)

136-156: Improve function documentation and edge case handling.

The function implementation looks good, but there are a few suggestions for improvement:

  1. Documentation could be enhanced:

    • Add a note about in-place modification
    • Document the behavior for edge cases
    • Add example usage
  2. Consider handling edge cases:

    • None values
    • Non-dict/non-list objects in nested structures
  3. Performance optimization:

    • Consider using a generator for list iteration
    • Add early return for empty/None input

Here's how you could improve the implementation:

 def change_dict_value_to_hashed(resource_dict: Dict[Any, Any], key_name: str) -> Dict[Any, Any]:
     """
     Recursively search a nested dictionary for a given key and changes its value to "******" if found.
+    The function modifies the input dictionary in-place.
 
     Args:
         resource_dict: The nested dictionary to search.
         key_name: The key to find.
 
     Returns:
         The modified dictionary.
+
+    Example:
+        >>> d = {"a": {"sensitive": "secret"}, "b": [{"sensitive": "hidden"}]}
+        >>> change_dict_value_to_hashed(d, "sensitive")
+        {'a': {'sensitive': '******'}, 'b': [{'sensitive': '******'}]}
     """
+    if not resource_dict:
+        return resource_dict
+
     if isinstance(resource_dict, dict):
         for key, value in resource_dict.items():
             if key == key_name:
                 resource_dict[key] = "******"
-            elif isinstance(value, dict):
+            elif value is not None and isinstance(value, dict):
                 resource_dict[key] = change_dict_value_to_hashed(value, key_name)
-            elif isinstance(value, list):
+            elif value is not None and isinstance(value, list):
                 for key_list, value_list in enumerate(value):
-                    value[key_list] = change_dict_value_to_hashed(value_list, key_name)
+                    if value_list is not None:
+                        value[key_list] = change_dict_value_to_hashed(value_list, key_name)
     return resource_dict
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between dafe9d0 and 462ee06.

📒 Files selected for processing (1)
  • ocp_resources/resource.py (2 hunks)
🧰 Additional context used
🔇 Additional comments (2)
ocp_resources/resource.py (2)

1199-1199: LGTM! Clean implementation of sensitive data handling.

The modification correctly utilizes the new change_dict_value_to_hashed function to handle sensitive information in resource dictionaries.


1199-1199: Verify handling of sensitive fields across the codebase.

Let's ensure that all sensitive fields, particularly userData in VM specs, are properly handled by the new implementation.

Run the following script to verify the usage:

ocp_resources/resource.py Outdated Show resolved Hide resolved
@@ -134,9 +133,27 @@ def sub_resource_level(current_class: Any, owner_class: Any, parent_class: Any)
return None


# Exceptions classes
def change_dict_value_to_hashed(resource_dict: Dict[Any, Any], key_name: str) -> Dict[Any, Any]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add tests.

ocp_resources/resource.py Show resolved Hide resolved
ocp_resources/resource.py Outdated Show resolved Hide resolved
ocp_resources/sealed_secret.py Outdated Show resolved Hide resolved
@@ -15,7 +15,6 @@
import kubernetes
from kubernetes.dynamic import DynamicClient, ResourceInstance
import yaml
from benedict import benedict
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

benedict is no longer used and should be removed from the project.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. Will do.

def replace_key_with_hashed_value(resource_dict: Dict[Any, Any], key_name: str) -> Dict[Any, Any]:
"""
Recursively search a nested dictionary for a given key and changes its value to "******" if found.
The function supports two key formats:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing newline

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the newline should come after the oneliner function description

1. Regular path: "path>to>key"
2. List path: "path>to[]>key"
Args:
resource_dict: The nested dictionary to search. Must be a dictionary.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
resource_dict: The nested dictionary to search. Must be a dictionary.
resource_dict: The nested dictionary to search.

no need for this; the function has typing

"""
Recursively search a nested dictionary for a given key and changes its value to "******" if found.
The function supports two key formats:
1. Regular path: "path>to>key"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i do not understand the examples; can you provide real ones? i.e example dict and example path

Raises:
ValueError: If resource_dict is not a dictionary or key_name is not a string.
"""
# Create a deep copy to avoid modifying the input
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove the inline comments; the code is clear

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, those were from coderabbit :). Will remove.

Comment on lines +1217 to +1218
if not isinstance(resource_dict, dict):
raise ValueError("Expected a dictionary as the first argument")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this needed? we do not check types

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@myakove suggested adding this one.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better to have a nice error when failing, typing is not enforced, only suggests the type (by IDE)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but we do not have such checks in the wrapper

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me know how do you want me to proceed.

assert replace_key_with_hashed_value(resource_dict={}, key_name="a>b") == {}

# Non-existent key
assert replace_key_with_hashed_value(resource_dict={"x": {"y": "z"}}, key_name="a>b") == {"x": {"y": "z"}}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

each should be a separate test

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (4)
tests/test_validate_resources.py (2)

10-22: Consider enhancing the vm_spec fixture with additional test cases.

The fixture provides good coverage but could be enhanced with more real-world scenarios:

  1. Empty lists/objects
  2. Null values
  3. Multiple nested lists
  4. Different data types (numbers, booleans)
 @pytest.fixture()
 def vm_spec():
     return {
         "spec": {
             "template": {
                 "features": {"someNewFeature": {"someNewData": "sensitive information"}},
                 "spec": {
                     "volumes": [
                         {"name": "volume1", "userData": "sensitive-data"},
+                        {"name": "volume2", "userData": None},
+                        {"name": "volume3", "userData": 12345},
+                        {"name": "volume4"}
                     ],
+                    "nestedList": [
+                        {"items": [{"userData": "nested-sensitive"}]}
+                    ],
+                    "emptyList": []
                 },
             }
         }
     }

41-62: Enhance test assertions for better coverage.

While the test verifies the hashed value, it should also verify that:

  1. Other fields remain unchanged
  2. The structure of the dictionary is preserved
  3. The length of lists remains the same
 def test_replace_key_with_hashed_value(vm_spec, key_name, expected_key):
+    original_dict = vm_spec.copy()
+    result = replace_key_with_hashed_value(resource_dict=vm_spec, key_name=key_name)
     assert (
-        benedict(replace_key_with_hashed_value(resource_dict=vm_spec, key_name=key_name), keypath_separator=">")[
+        benedict(result, keypath_separator=">")[
             expected_key
         ]
         == "*******"
     )
+    # Verify other fields remain unchanged
+    assert result["spec"]["template"]["spec"]["volumes"][0]["name"] == original_dict["spec"]["template"]["spec"]["volumes"][0]["name"]
+    # Verify structure is preserved
+    assert len(result["spec"]["template"]["spec"]["volumes"]) == len(original_dict["spec"]["template"]["spec"]["volumes"])
ocp_resources/resource.py (2)

140-140: Add newline after line 140.

Add a newline after the line "The function supports two key formats:" to maintain consistent docstring formatting.


1214-1214: Fix docstring example indentation.

The example in the docstring should be properly indented for better readability.

-             To hash spec['data'] key pass: ["spec>data"]
+            To hash spec['data'] key pass: ["spec>data"]
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 576abae and 691d3eb.

📒 Files selected for processing (2)
  • ocp_resources/resource.py (2 hunks)
  • tests/test_validate_resources.py (2 hunks)
🔇 Additional comments (2)
ocp_resources/resource.py (2)

137-179: Implementation looks good!

The function effectively handles both regular dictionary paths and list paths, with good separation of concerns and proper deep copying to avoid modifying the input dictionary. The use of benedict for key path handling is a robust choice.


1219-1225: Implementation looks good!

The changes to hash_resource_dict properly validate input and integrate well with the new replace_key_with_hashed_value function. The type checking adds robustness to the implementation.

Comment on lines +65 to +79
@pytest.mark.parametrize(
"resource, key_name, expected_result",
[
# Empty dictionary
pytest.param({}, "a>b", {}, id="test_replace_key_with_hashed_value_empty_dict"),
# Non-existent key
pytest.param(
{"x": {"y": "z"}}, "a>b", {"x": {"y": "z"}}, id="test_replace_key_with_hashed_value_key_doesnt_exist"
),
],
)
def test_replace_key_with_hashed_value_edge_cases(resource, key_name, expected_result):
"""Test edge cases for replace_key_with_hashed_value function."""
assert replace_key_with_hashed_value(resource_dict=resource, key_name=key_name) == expected_result

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add more edge cases as suggested in previous reviews.

The current edge cases are good but incomplete. Add tests for:

  1. Invalid key paths
  2. Empty key path
  3. Non-string key path
  4. Deeply nested structures
  5. Multiple occurrences of same key at different levels
     [
         # Empty dictionary
         pytest.param({}, "a>b", {}, id="test_replace_key_with_hashed_value_empty_dict"),
         # Non-existent key
         pytest.param(
             {"x": {"y": "z"}}, "a>b", {"x": {"y": "z"}}, id="test_replace_key_with_hashed_value_key_doesnt_exist"
         ),
+        # Invalid key path
+        pytest.param(
+            {"a": {"b": "value"}}, "a>>b", {"a": {"b": "value"}}, 
+            id="test_replace_key_with_hashed_value_invalid_path",
+            marks=pytest.mark.xfail(raises=ValueError)
+        ),
+        # Empty key path
+        pytest.param(
+            {"a": "value"}, "", {"a": "value"},
+            id="test_replace_key_with_hashed_value_empty_path",
+            marks=pytest.mark.xfail(raises=ValueError)
+        ),
+        # Deep nesting
+        pytest.param(
+            {"l1": {"l2": {"l3": {"l4": {"key": "value"}}}}},
+            "l1>l2>l3>l4>key",
+            {"l1": {"l2": {"l3": {"l4": {"key": "*******"}}}}},
+            id="test_replace_key_with_hashed_value_deep_nesting"
+        ),
     ],

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines +88 to +91
def test_replace_key_with_hashed_value_invalid_inputs():
"""Test that the function handles invalid inputs appropriately."""
with pytest.raises(TypeError):
replace_key_with_hashed_value(resource_dict=None, key_name="a>b")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Expand invalid input tests as suggested in previous reviews.

Add more invalid input scenarios:

  1. Non-string key path
  2. Invalid key path format
  3. List/dict as key path
 def test_replace_key_with_hashed_value_invalid_inputs():
     """Test that the function handles invalid inputs appropriately."""
     with pytest.raises(TypeError):
         replace_key_with_hashed_value(resource_dict=None, key_name="a>b")
+    # Non-string key path
+    with pytest.raises(TypeError):
+        replace_key_with_hashed_value(resource_dict={}, key_name=123)
+    
+    # List as key path
+    with pytest.raises(TypeError):
+        replace_key_with_hashed_value(resource_dict={}, key_name=["a", "b"])
+    
+    # Dict as key path
+    with pytest.raises(TypeError):
+        replace_key_with_hashed_value(resource_dict={}, key_name={"a": "b"})
+    
+    # Invalid key path format
+    with pytest.raises(ValueError):
+        replace_key_with_hashed_value(resource_dict={}, key_name="a>>b")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def test_replace_key_with_hashed_value_invalid_inputs():
"""Test that the function handles invalid inputs appropriately."""
with pytest.raises(TypeError):
replace_key_with_hashed_value(resource_dict=None, key_name="a>b")
def test_replace_key_with_hashed_value_invalid_inputs():
"""Test that the function handles invalid inputs appropriately."""
with pytest.raises(TypeError):
replace_key_with_hashed_value(resource_dict=None, key_name="a>b")
# Non-string key path
with pytest.raises(TypeError):
replace_key_with_hashed_value(resource_dict={}, key_name=123)
# List as key path
with pytest.raises(TypeError):
replace_key_with_hashed_value(resource_dict={}, key_name=["a", "b"])
# Dict as key path
with pytest.raises(TypeError):
replace_key_with_hashed_value(resource_dict={}, key_name={"a": "b"})
# Invalid key path format
with pytest.raises(ValueError):
replace_key_with_hashed_value(resource_dict={}, key_name="a>>b")

Comment on lines +81 to +86
def test_replace_key_with_hashed_value_multiple_occurances(vm_spec):
vm_spec["spec"]["template"]["spec"]["volumes"].append({"name": "volume2", "userData": "more sensitive-data"})
result = replace_key_with_hashed_value(resource_dict=vm_spec, key_name="spec>template>spec>volumes[]>userData")
for volume in result["spec"]["template"]["spec"]["volumes"]:
assert volume["userData"] == "*******"

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Avoid modifying fixture data and enhance multiple occurrences test.

  1. Instead of modifying the fixture, create a new dictionary with the test data
  2. Add assertions to verify the original data remains unchanged
  3. Test multiple occurrences at different nesting levels
 def test_replace_key_with_hashed_value_multiple_occurances(vm_spec):
-    vm_spec["spec"]["template"]["spec"]["volumes"].append({"name": "volume2", "userData": "more sensitive-data"})
+    test_data = {
+        "userData": "root-level",
+        "spec": {
+            "template": {
+                "spec": {
+                    "volumes": [
+                        {"name": "volume1", "userData": "level-1"},
+                        {"name": "volume2", "userData": "level-1"}
+                    ],
+                    "nested": {
+                        "userData": "level-2"
+                    }
+                }
+            }
+        }
+    }
+    original_data = test_data.copy()
+    result = replace_key_with_hashed_value(resource_dict=test_data, key_name="userData")
-    result = replace_key_with_hashed_value(resource_dict=vm_spec, key_name="spec>template>spec>volumes[]>userData")
-    for volume in result["spec"]["template"]["spec"]["volumes"]:
-        assert volume["userData"] == "*******"
+    # Verify all occurrences are hashed
+    assert result["userData"] == "*******"
+    assert result["spec"]["template"]["spec"]["nested"]["userData"] == "*******"
+    for volume in result["spec"]["template"]["spec"]["volumes"]:
+        assert volume["userData"] == "*******"
+    # Verify original data is unchanged
+    assert test_data == original_data
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def test_replace_key_with_hashed_value_multiple_occurances(vm_spec):
vm_spec["spec"]["template"]["spec"]["volumes"].append({"name": "volume2", "userData": "more sensitive-data"})
result = replace_key_with_hashed_value(resource_dict=vm_spec, key_name="spec>template>spec>volumes[]>userData")
for volume in result["spec"]["template"]["spec"]["volumes"]:
assert volume["userData"] == "*******"
def test_replace_key_with_hashed_value_multiple_occurances(vm_spec):
test_data = {
"userData": "root-level",
"spec": {
"template": {
"spec": {
"volumes": [
{"name": "volume1", "userData": "level-1"},
{"name": "volume2", "userData": "level-1"}
],
"nested": {
"userData": "level-2"
}
}
}
}
}
original_data = test_data.copy()
result = replace_key_with_hashed_value(resource_dict=test_data, key_name="userData")
# Verify all occurrences are hashed
assert result["userData"] == "*******"
assert result["spec"]["template"]["spec"]["nested"]["userData"] == "*******"
for volume in result["spec"]["template"]["spec"]["volumes"]:
assert volume["userData"] == "*******"
# Verify original data is unchanged
assert test_data == original_data

def replace_key_with_hashed_value(resource_dict: Dict[Any, Any], key_name: str) -> Dict[Any, Any]:
"""
Recursively search a nested dictionary for a given key and changes its value to "******" if found.
The function supports two key formats:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the newline should come after the oneliner function description

Comment on lines +1217 to +1218
if not isinstance(resource_dict, dict):
raise ValueError("Expected a dictionary as the first argument")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but we do not have such checks in the wrapper

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (3)
ocp_resources/resource.py (3)

137-141: Add empty line after docstring's first line.

According to PEP 257, there should be an empty line after the docstring's first line.

    """
    Recursively search a nested dictionary for a given key and changes its value to "******" if found.

    The function supports two key formats:

142-148: Enhance docstring with concrete examples.

The docstring would be more helpful with concrete dictionary examples.

    1. Regular dictionary path:
        A key to be hashed can be found directly in a dictionary, e.g. "a>b>c", would hash the value associated with
-        key "c", where dictionary format is {a:{b:{c:sensitive data}}}
+        key "c", where dictionary format is:
+        input = {
+            "a": {
+                "b": {
+                    "c": "sensitive data"
+                }
+            }
+        }
+        output = {
+            "a": {
+                "b": {
+                    "c": "*******"
+                }
+            }
+        }
    2. List path:
        A key to be hashed can be found in a dictionary that is in list somewhere in a dictionary, e.g. "a>b[]>c",
-        would hash the value associated with key "c", where dictionary format is {a:{b:[{d: not sensitive data},
-        {c:sensitive data}]}}
+        would hash the value associated with key "c", where dictionary format is:
+        input = {
+            "a": {
+                "b": [
+                    {"d": "not sensitive data"},
+                    {"c": "sensitive data"}
+                ]
+            }
+        }
+        output = {
+            "a": {
+                "b": [
+                    {"d": "not sensitive data"},
+                    {"c": "*******"}
+                ]
+            }
+        }

158-160: Consider optimizing deep copy usage.

Creating a deep copy for every recursive call could impact performance with large nested dictionaries. Consider passing the original dictionary in the recursive calls and only creating a deep copy at the top level.

-    result = copy.deepcopy(resource_dict)
-
-    benedict_resource_dict = benedict(result, keypath_separator=">")
+    # Only create deep copy at top level
+    if not hasattr(replace_key_with_hashed_value, '_is_recursive_call'):
+        resource_dict = copy.deepcopy(resource_dict)
+    benedict_resource_dict = benedict(resource_dict, keypath_separator=">")
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 691d3eb and f01c5c0.

📒 Files selected for processing (1)
  • ocp_resources/resource.py (2 hunks)
🔇 Additional comments (1)
ocp_resources/resource.py (1)

1219-1225: LGTM! Input validation and data handling look good.

The changes to hash_resource_dict are well-implemented:

  • Proper input validation
  • Safe handling of the input dictionary through deep copy
  • Clear logic for applying hashing to specified keys

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants