Handle sensitive information being inside a list in resource_dict #2178

dbasunag · 2024-10-23T19:40:32Z

Short description:

Current code hash_resource_dict() does not provide a flexible way to hide fields like userData, for virtual machines (it could be present in spec.template.spec.volumes)

More details:

What this PR does / why we need it:

Which issue(s) this PR fixes:

Special notes for reviewer:

Bug:

Summary by CodeRabbit

New Features
- Introduced a new function to enhance security by masking sensitive values in resource dictionaries.
- Added a property in the VirtualMachine class to access specific keys related to user data.
Improvements
- Simplified the keys_to_hash property in the SealedSecret class for better data structure alignment.
- Updated the hashing process for resource data, improving logging security.
- Enhanced the keys_to_hash property in the Resource class to reflect a new keypath format.
Tests
- Added new test cases to validate the functionality of the new hashing feature, including edge cases and invalid inputs.

redhat-qe-bot2 · 2024-10-23T19:40:40Z

Report bugs in Issues

The following are automatically added:

Add reviewers from OWNER file (in the root of the repository) under reviewers section.
Set PR size label.
New issue is created for the PR. (Closed when PR is merged/closed)
Run pre-commit if .pre-commit-config.yaml exists in the repo.

Available user actions:

To mark PR as WIP comment /wip to the PR, To remove it from the PR comment /wip cancel to the PR.
To block merging of PR comment /hold, To un-block merging of PR comment /hold cancel.
To mark PR as verified comment /verified to the PR, to un-verify comment /verified cancel to the PR.
verified label removed on each new commit push.
To cherry pick a merged PR comment /cherry-pick <target branch to cherry-pick to> in the PR.
- Multiple target branches can be cherry-picked, separated by spaces. (/cherry-pick branch1 branch2)
- Cherry-pick will be started when PR is merged
To build and push container image command /build-and-push-container in the PR (tag will be the PR number).
- You can add extra args to the Podman build command
  - Example: /build-and-push-container --build-arg OPENSHIFT_PYTHON_WRAPPER_COMMIT=<commit_hash>
To add a label by comment use /<label name>, to remove, use /<label name> cancel
To assign reviewers based on OWNERS file use /assign-reviewers
To check if PR can be merged use /check-can-merge

Supported /retest check runs

/retest tox: Retest tox
/retest python-module-install: Retest python-module-install
/retest all: Retest all

Supported labels

hold
verified
wip
lgtm

coderabbitai · 2024-10-23T19:40:41Z

Walkthrough

This pull request introduces a new function, replace_key_with_hashed_value, in the ocp_resources/resource.py file, which recursively searches a nested dictionary for a specified key and replaces its value with "*******" if found. The hash_resource_dict method in the Resource class is updated to utilize this new function, simplifying the process of hashing resource data. Additionally, the keys_to_hash property in the SealedSecret class is modified to reflect a new keypath separator, and a new keys_to_hash property is added to the VirtualMachine class.

Changes

File Path	Change Summary
ocp_resources/resource.py	- Added `replace_key_with_hashed_value(resource_dict, key_name)` function. - Updated `hash_resource_dict(self, resource_dict)` method to use the new function and removed `benedict` library usage. - Updated `keys_to_hash(self)` property to reflect new keypath separator ">".
ocp_resources/sealed_secret.py	- Updated `keys_to_hash` property to return `["spec>data", "spec>encryptedData"]` instead of `["spec..data", "spec..encryptedData"]`.
ocp_resources/virtual_machine.py	- Added new `keys_to_hash` property returning `["spec>template>spec>volumes[]>cloudInitNoCloud>userData"]`.
tests/test_validate_resources.py	- Added new fixture `vm_spec()` and four test methods to validate the `replace_key_with_hashed_value` function.

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Generate unit testing code for this file.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit testing code for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and generate unit testing code.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (2)

ocp_resources/virtual_machine.py (1)

179-181: LGTM! Consider adding documentation.

The implementation correctly identifies the sensitive field that needs to be masked. Consider adding a docstring to explain the purpose of this property and its security implications.
 @property
 def keys_to_hash(self):
+    """
+    Returns a list of keys containing sensitive information that should be masked.
+    
+    Returns:
+        list[str]: List of keys to be masked in the resource dictionary.
+        Currently masks 'userData' which contains sensitive boot-time configuration.
+    """
     return ["userData"]

ocp_resources/resource.py (1)

136-156: LGTM with suggestions for improvements.

The implementation correctly handles nested dictionaries and lists. However, consider these improvements:

Add protection against circular references to prevent stack overflow
Consider making the mask value configurable for flexibility

Here's a suggested improvement:

-def change_dict_value_to_hashed(resource_dict: Dict[Any, Any], key_name: str) -> Dict[Any, Any]:
+def change_dict_value_to_hashed(
+    resource_dict: Dict[Any, Any],
+    key_name: str,
+    mask: str = "******",
+    _seen: set | None = None
+) -> Dict[Any, Any]:
     """
     Recursively search a nested dictionary for a given key and changes its value to "******" if found.
 
     Args:
         resource_dict: The nested dictionary to search.
         key_name: The key to find.
+        mask: The value to use for masking sensitive data.
+        _seen: Internal parameter to track circular references.
 
     Returns:
         The modified dictionary.
     """
+    if _seen is None:
+        _seen = set()
+
+    # Handle circular references
+    if id(resource_dict) in _seen:
+        return resource_dict
+
     if isinstance(resource_dict, dict):
+        _seen.add(id(resource_dict))
         for key, value in resource_dict.items():
             if key == key_name:
-                resource_dict[key] = "******"
+                resource_dict[key] = mask
             elif isinstance(value, dict):
-                resource_dict[key] = change_dict_value_to_hashed(value, key_name)
+                resource_dict[key] = change_dict_value_to_hashed(value, key_name, mask, _seen)
             elif isinstance(value, list):
                 for key_list, value_list in enumerate(value):
-                    value[key_list] = change_dict_value_to_hashed(value_list, key_name)
+                    value[key_list] = change_dict_value_to_hashed(value_list, key_name, mask, _seen)
     return resource_dict

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 0c5830d and dafe9d0.

📒 Files selected for processing (3)

ocp_resources/resource.py (2 hunks)
ocp_resources/sealed_secret.py (1 hunks)
ocp_resources/virtual_machine.py (1 hunks)

🧰 Additional context used

🔇 Additional comments (3)

ocp_resources/sealed_secret.py (1)
53-53: Verify the impact of simplified key paths.

The simplified key paths in keys_to_hash might affect how sensitive data is masked in the resource dictionary. Let's verify this change works correctly with the updated hash_resource_dict implementation.

Consider adding a docstring to the keys_to_hash property to document:

The expected format of the keys

How these keys are used in the resource dictionary masking process

Why the "spec.." prefix was removed

This will help future maintainers understand the design decision and prevent accidental reintroduction of the prefix.

✅ Verification successful

Let me gather more information about how the key paths are actually processed in the codebase.

This will help us understand:

How the key paths are processed in the hashing function

If there are tests verifying this functionality

Where the SealedSecret class is used in the codebase

Let me check one more thing to verify the test coverage and actual usage patterns.

This will help us understand:

The test coverage for this functionality

The complete context of SealedSecret class

How these specific keys are actually used in hashing

Based on the gathered information, I can now provide the final response:

Simplified key paths in keys_to_hash are correct and safe

The change from ["spec..data", "spec..encryptedData"] to ["data", "encryptedData"] is correct because:

The change_dict_value_to_hashed function performs a recursive dictionary search for exact key matches, making the "spec.." prefix unnecessary

The SealedSecret class directly accepts data and encrypted_data as top-level parameters in its constructor

The implementation aligns with the actual structure of the resource where sensitive data is stored

The change maintains security while simplifying the implementation, as the hashing mechanism in hash_resource_dict will still mask all sensitive information correctly.
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Description: Verify the usage of keys_to_hash in the codebase
# and ensure it's compatible with the new implementation

# Check if there are any direct references to the old format "spec..data"
echo "Checking for old key format usage..."
rg -l "spec\.\.data|spec\.\.encryptedData" 

# Look for other implementations of keys_to_hash to ensure consistency
echo "Checking other keys_to_hash implementations..."
ast-grep --pattern 'def keys_to_hash'

# Check the Resource class implementation that uses these keys
echo "Checking Resource class implementation..."
rg -A 10 "def hash_resource_dict"
Length of output: 1254

Script:
#!/bin/bash
# Check the implementation of change_dict_value_to_hashed function
echo "Checking change_dict_value_to_hashed implementation..."
rg -A 15 "def change_dict_value_to_hashed"

# Check if there are any tests that verify this functionality
echo "Checking related tests..."
fd -e py test_sealed_secret test_resource --exec rg -l "keys_to_hash|hash_resource_dict" {}

# Check the actual usage of SealedSecret class
echo "Checking SealedSecret usage..."
rg -l "class SealedSecret"
Length of output: 1542

Script:
#!/bin/bash
# Check for test files with correct path
echo "Checking test files..."
fd -e py -g "*test*" --exec rg -l "keys_to_hash|hash_resource_dict" {}

# Check the complete SealedSecret class implementation
echo "Checking SealedSecret class implementation..."
rg -A 20 "class SealedSecret" ocp_resources/sealed_secret.py

# Check actual usage of hash_resource_dict with these keys
echo "Checking actual usage patterns..."
rg -B 5 -A 5 "hash_resource_dict.*data|hash_resource_dict.*encryptedData"
Length of output: 1177
ocp_resources/resource.py (2)
1198-1198: LGTM! Clean integration of the new function.

The change correctly integrates the new change_dict_value_to_hashed function while maintaining the existing behavior and safety measures (deep copy, condition checks).

136-156: Verify the impact on existing consumers.

Let's verify the impact of these changes on existing code.

Also applies to: 1198-1198

✅ Verification successful

Let me gather more information about the keys_to_hash property since the previous search didn't show its complete usage pattern.

This will help us understand:

Where and how keys_to_hash is defined and used

The usage pattern of hash_log_data which controls the hashing behavior

Any potential impact on existing implementations

Based on the gathered information, I can now provide a final response as we have a clear picture of the usage and impact:

The changes to change_dict_value_to_hashed are safe and properly integrated

The function is used within a well-defined pattern where:

It's only called from the base Resource class's hash_resource_dict method

The hashing behavior is controlled by two factors:

The hash_log_data boolean parameter (defaults to True)

Resource-specific keys_to_hash property implementations

Only specific resource types implement keys_to_hash:

ConfigMap: ["data", "binaryData"]

VirtualMachine: ["userData"]

SealedSecret: ["data", "encryptedData"]

Secret: ["data", "stringData"]

The changes maintain the existing behavior while improving the implementation, and all current consumers are properly integrated with this pattern.
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Description: Check for usages of hash_resource_dict and keys_to_hash property

echo "Checking for direct calls to hash_resource_dict..."
rg -A 2 "hash_resource_dict\("

echo -e "\nChecking for classes that override keys_to_hash property..."
ast-grep --pattern 'class $_ {
  $$$
  @property
  def keys_to_hash($_) {
    $$$
  }
  $$$
}'
Length of output: 1839

Script:
#!/bin/bash
# Let's search for any references to keys_to_hash in the codebase
echo "Searching for keys_to_hash references..."
rg -A 2 "keys_to_hash"

# Also check for hash_log_data since it's used in conjunction
echo -e "\nSearching for hash_log_data references..."
rg -A 2 "hash_log_data"
Length of output: 2647

coderabbitai · 2024-10-23T19:45:04Z

ocp_resources/virtual_machine.py

+    @property
+    def keys_to_hash(self):
+        return ["userData"]


🛠️ Refactor suggestion

Consider using a class constant for the sensitive key.

To improve maintainability and reusability, consider defining the sensitive key as a class constant.

class VirtualMachine(NamespacedResource): + # Sensitive fields that should be masked when serializing + SENSITIVE_USER_DATA_KEY = "userData" + """ Virtual Machine object, inherited from Resource. Implements actions start / stop / status / wait for VM status / is running """ @property def keys_to_hash(self): - return ["userData"] + return [self.SENSITIVE_USER_DATA_KEY]

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

@property

def keys_to_hash(self):

return ["userData"]

@property

def keys_to_hash(self):

return [self.SENSITIVE_USER_DATA_KEY]

coderabbitai

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)

ocp_resources/resource.py (1)

136-156: Improve function documentation and edge case handling.

The function implementation looks good, but there are a few suggestions for improvement:

Documentation could be enhanced:
- Add a note about in-place modification
- Document the behavior for edge cases
- Add example usage
Consider handling edge cases:
- None values
- Non-dict/non-list objects in nested structures
Performance optimization:
- Consider using a generator for list iteration
- Add early return for empty/None input

Here's how you could improve the implementation:

 def change_dict_value_to_hashed(resource_dict: Dict[Any, Any], key_name: str) -> Dict[Any, Any]:
     """
     Recursively search a nested dictionary for a given key and changes its value to "******" if found.
+    The function modifies the input dictionary in-place.
 
     Args:
         resource_dict: The nested dictionary to search.
         key_name: The key to find.
 
     Returns:
         The modified dictionary.
+
+    Example:
+        >>> d = {"a": {"sensitive": "secret"}, "b": [{"sensitive": "hidden"}]}
+        >>> change_dict_value_to_hashed(d, "sensitive")
+        {'a': {'sensitive': '******'}, 'b': [{'sensitive': '******'}]}
     """
+    if not resource_dict:
+        return resource_dict
+
     if isinstance(resource_dict, dict):
         for key, value in resource_dict.items():
             if key == key_name:
                 resource_dict[key] = "******"
-            elif isinstance(value, dict):
+            elif value is not None and isinstance(value, dict):
                 resource_dict[key] = change_dict_value_to_hashed(value, key_name)
-            elif isinstance(value, list):
+            elif value is not None and isinstance(value, list):
                 for key_list, value_list in enumerate(value):
-                    value[key_list] = change_dict_value_to_hashed(value_list, key_name)
+                    if value_list is not None:
+                        value[key_list] = change_dict_value_to_hashed(value_list, key_name)
     return resource_dict

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between dafe9d0 and 462ee06.

📒 Files selected for processing (1)

ocp_resources/resource.py (2 hunks)

🧰 Additional context used

🔇 Additional comments (2)

ocp_resources/resource.py (2)

1199-1199: LGTM! Clean implementation of sensitive data handling.

The modification correctly utilizes the new change_dict_value_to_hashed function to handle sensitive information in resource dictionaries.

1199-1199: Verify handling of sensitive fields across the codebase.

Let's ensure that all sensitive fields, particularly userData in VM specs, are properly handled by the new implementation.

Run the following script to verify the usage:

ocp_resources/resource.py

myakove · 2024-10-25T08:17:42Z

ocp_resources/resource.py

@@ -134,9 +133,27 @@ def sub_resource_level(current_class: Any, owner_class: Any, parent_class: Any)
    return None


-# Exceptions classes
+def change_dict_value_to_hashed(resource_dict: Dict[Any, Any], key_name: str) -> Dict[Any, Any]:


Please add tests.

ocp_resources/resource.py

ocp_resources/sealed_secret.py

myakove · 2024-10-25T08:25:09Z

ocp_resources/resource.py

@@ -15,7 +15,6 @@
 import kubernetes
 from kubernetes.dynamic import DynamicClient, ResourceInstance
 import yaml
-from benedict import benedict


benedict is no longer used and should be removed from the project.

ok. Will do.

Co-authored-by: Meni Yakove <[email protected]>

rnetser · 2024-11-07T10:25:17Z

ocp_resources/resource.py

+def replace_key_with_hashed_value(resource_dict: Dict[Any, Any], key_name: str) -> Dict[Any, Any]:
+    """
+    Recursively search a nested dictionary for a given key and changes its value to "******" if found.
+    The function supports two key formats:


missing newline

the newline should come after the oneliner function description

rnetser · 2024-11-07T10:27:00Z

ocp_resources/resource.py

+    1. Regular path: "path>to>key"
+    2. List path: "path>to[]>key"
+    Args:
+        resource_dict: The nested dictionary to search. Must be a dictionary.


Suggested change

resource_dict: The nested dictionary to search. Must be a dictionary.

resource_dict: The nested dictionary to search.

no need for this; the function has typing

rnetser · 2024-11-07T10:28:25Z

ocp_resources/resource.py

+    """
+    Recursively search a nested dictionary for a given key and changes its value to "******" if found.
+    The function supports two key formats:
+    1. Regular path: "path>to>key"


i do not understand the examples; can you provide real ones? i.e example dict and example path

rnetser · 2024-11-07T10:28:59Z

ocp_resources/resource.py

+    Raises:
+        ValueError: If resource_dict is not a dictionary or key_name is not a string.
+    """
+    # Create a deep copy to avoid modifying the input


please remove the inline comments; the code is clear

ok, those were from coderabbit :). Will remove.

rnetser · 2024-11-07T10:32:47Z

ocp_resources/resource.py

+        if not isinstance(resource_dict, dict):
+            raise ValueError("Expected a dictionary as the first argument")


why is this needed? we do not check types

@myakove suggested adding this one.

It's better to have a nice error when failing, typing is not enforced, only suggests the type (by IDE)

but we do not have such checks in the wrapper

Let me know how do you want me to proceed.

rnetser · 2024-11-07T10:34:26Z

tests/test_validate_resources.py

+    assert replace_key_with_hashed_value(resource_dict={}, key_name="a>b") == {}
+
+    # Non-existent key
+    assert replace_key_with_hashed_value(resource_dict={"x": {"y": "z"}}, key_name="a>b") == {"x": {"y": "z"}}


each should be a separate test

coderabbitai

Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (4)

tests/test_validate_resources.py (2)

10-22: Consider enhancing the vm_spec fixture with additional test cases.

The fixture provides good coverage but could be enhanced with more real-world scenarios:

Empty lists/objects
Null values
Multiple nested lists
Different data types (numbers, booleans)

 @pytest.fixture()
 def vm_spec():
     return {
         "spec": {
             "template": {
                 "features": {"someNewFeature": {"someNewData": "sensitive information"}},
                 "spec": {
                     "volumes": [
                         {"name": "volume1", "userData": "sensitive-data"},
+                        {"name": "volume2", "userData": None},
+                        {"name": "volume3", "userData": 12345},
+                        {"name": "volume4"}
                     ],
+                    "nestedList": [
+                        {"items": [{"userData": "nested-sensitive"}]}
+                    ],
+                    "emptyList": []
                 },
             }
         }
     }

41-62: Enhance test assertions for better coverage.

While the test verifies the hashed value, it should also verify that:

Other fields remain unchanged
The structure of the dictionary is preserved
The length of lists remains the same

 def test_replace_key_with_hashed_value(vm_spec, key_name, expected_key):
+    original_dict = vm_spec.copy()
+    result = replace_key_with_hashed_value(resource_dict=vm_spec, key_name=key_name)
     assert (
-        benedict(replace_key_with_hashed_value(resource_dict=vm_spec, key_name=key_name), keypath_separator=">")[
+        benedict(result, keypath_separator=">")[
             expected_key
         ]
         == "*******"
     )
+    # Verify other fields remain unchanged
+    assert result["spec"]["template"]["spec"]["volumes"][0]["name"] == original_dict["spec"]["template"]["spec"]["volumes"][0]["name"]
+    # Verify structure is preserved
+    assert len(result["spec"]["template"]["spec"]["volumes"]) == len(original_dict["spec"]["template"]["spec"]["volumes"])

ocp_resources/resource.py (2)

140-140: Add newline after line 140.

Add a newline after the line "The function supports two key formats:" to maintain consistent docstring formatting.

1214-1214: Fix docstring example indentation.

The example in the docstring should be properly indented for better readability.
-             To hash spec['data'] key pass: ["spec>data"]
+            To hash spec['data'] key pass: ["spec>data"]

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 576abae and 691d3eb.

📒 Files selected for processing (2)

ocp_resources/resource.py (2 hunks)
tests/test_validate_resources.py (2 hunks)

🔇 Additional comments (2)

ocp_resources/resource.py (2)

137-179: Implementation looks good!

The function effectively handles both regular dictionary paths and list paths, with good separation of concerns and proper deep copying to avoid modifying the input dictionary. The use of benedict for key path handling is a robust choice.

1219-1225: Implementation looks good!

The changes to hash_resource_dict properly validate input and integrate well with the new replace_key_with_hashed_value function. The type checking adds robustness to the implementation.

coderabbitai · 2024-11-07T22:30:28Z

tests/test_validate_resources.py

+@pytest.mark.parametrize(
+    "resource, key_name, expected_result",
+    [
+        # Empty dictionary
+        pytest.param({}, "a>b", {}, id="test_replace_key_with_hashed_value_empty_dict"),
+        # Non-existent key
+        pytest.param(
+            {"x": {"y": "z"}}, "a>b", {"x": {"y": "z"}}, id="test_replace_key_with_hashed_value_key_doesnt_exist"
+        ),
+    ],
+)
+def test_replace_key_with_hashed_value_edge_cases(resource, key_name, expected_result):
+    """Test edge cases for replace_key_with_hashed_value function."""
+    assert replace_key_with_hashed_value(resource_dict=resource, key_name=key_name) == expected_result
+


🛠️ Refactor suggestion

Add more edge cases as suggested in previous reviews.

The current edge cases are good but incomplete. Add tests for:

Invalid key paths

Empty key path

Non-string key path

Deeply nested structures

Multiple occurrences of same key at different levels

[ # Empty dictionary pytest.param({}, "a>b", {}, id="test_replace_key_with_hashed_value_empty_dict"), # Non-existent key pytest.param( {"x": {"y": "z"}}, "a>b", {"x": {"y": "z"}}, id="test_replace_key_with_hashed_value_key_doesnt_exist" ), + # Invalid key path + pytest.param( + {"a": {"b": "value"}}, "a>>b", {"a": {"b": "value"}}, + id="test_replace_key_with_hashed_value_invalid_path", + marks=pytest.mark.xfail(raises=ValueError) + ), + # Empty key path + pytest.param( + {"a": "value"}, "", {"a": "value"}, + id="test_replace_key_with_hashed_value_empty_path", + marks=pytest.mark.xfail(raises=ValueError) + ), + # Deep nesting + pytest.param( + {"l1": {"l2": {"l3": {"l4": {"key": "value"}}}}}, + "l1>l2>l3>l4>key", + {"l1": {"l2": {"l3": {"l4": {"key": "*******"}}}}}, + id="test_replace_key_with_hashed_value_deep_nesting" + ), ],

Committable suggestion skipped: line range outside the PR's diff.

coderabbitai · 2024-11-07T22:30:28Z

tests/test_validate_resources.py

+def test_replace_key_with_hashed_value_invalid_inputs():
+    """Test that the function handles invalid inputs appropriately."""
+    with pytest.raises(TypeError):
+        replace_key_with_hashed_value(resource_dict=None, key_name="a>b")


🛠️ Refactor suggestion

Expand invalid input tests as suggested in previous reviews.

Add more invalid input scenarios:

Non-string key path

Invalid key path format

List/dict as key path

def test_replace_key_with_hashed_value_invalid_inputs(): """Test that the function handles invalid inputs appropriately.""" with pytest.raises(TypeError): replace_key_with_hashed_value(resource_dict=None, key_name="a>b") + # Non-string key path + with pytest.raises(TypeError): + replace_key_with_hashed_value(resource_dict={}, key_name=123) + + # List as key path + with pytest.raises(TypeError): + replace_key_with_hashed_value(resource_dict={}, key_name=["a", "b"]) + + # Dict as key path + with pytest.raises(TypeError): + replace_key_with_hashed_value(resource_dict={}, key_name={"a": "b"}) + + # Invalid key path format + with pytest.raises(ValueError): + replace_key_with_hashed_value(resource_dict={}, key_name="a>>b")

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

def test_replace_key_with_hashed_value_invalid_inputs():

"""Test that the function handles invalid inputs appropriately."""

with pytest.raises(TypeError):

replace_key_with_hashed_value(resource_dict=None, key_name="a>b")

def test_replace_key_with_hashed_value_invalid_inputs():

"""Test that the function handles invalid inputs appropriately."""

with pytest.raises(TypeError):

replace_key_with_hashed_value(resource_dict=None, key_name="a>b")

# Non-string key path

with pytest.raises(TypeError):

replace_key_with_hashed_value(resource_dict={}, key_name=123)

# List as key path

with pytest.raises(TypeError):

replace_key_with_hashed_value(resource_dict={}, key_name=["a", "b"])

# Dict as key path

with pytest.raises(TypeError):

replace_key_with_hashed_value(resource_dict={}, key_name={"a": "b"})

# Invalid key path format

with pytest.raises(ValueError):

replace_key_with_hashed_value(resource_dict={}, key_name="a>>b")

coderabbitai · 2024-11-07T22:30:28Z

tests/test_validate_resources.py

+def test_replace_key_with_hashed_value_multiple_occurances(vm_spec):
+    vm_spec["spec"]["template"]["spec"]["volumes"].append({"name": "volume2", "userData": "more sensitive-data"})
+    result = replace_key_with_hashed_value(resource_dict=vm_spec, key_name="spec>template>spec>volumes[]>userData")
+    for volume in result["spec"]["template"]["spec"]["volumes"]:
+        assert volume["userData"] == "*******"
+


🛠️ Refactor suggestion

Avoid modifying fixture data and enhance multiple occurrences test.

Instead of modifying the fixture, create a new dictionary with the test data

Add assertions to verify the original data remains unchanged

Test multiple occurrences at different nesting levels

def test_replace_key_with_hashed_value_multiple_occurances(vm_spec): - vm_spec["spec"]["template"]["spec"]["volumes"].append({"name": "volume2", "userData": "more sensitive-data"}) + test_data = { + "userData": "root-level", + "spec": { + "template": { + "spec": { + "volumes": [ + {"name": "volume1", "userData": "level-1"}, + {"name": "volume2", "userData": "level-1"} + ], + "nested": { + "userData": "level-2" + } + } + } + } + } + original_data = test_data.copy() + result = replace_key_with_hashed_value(resource_dict=test_data, key_name="userData") - result = replace_key_with_hashed_value(resource_dict=vm_spec, key_name="spec>template>spec>volumes[]>userData") - for volume in result["spec"]["template"]["spec"]["volumes"]: - assert volume["userData"] == "*******" + # Verify all occurrences are hashed + assert result["userData"] == "*******" + assert result["spec"]["template"]["spec"]["nested"]["userData"] == "*******" + for volume in result["spec"]["template"]["spec"]["volumes"]: + assert volume["userData"] == "*******" + # Verify original data is unchanged + assert test_data == original_data

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

def test_replace_key_with_hashed_value_multiple_occurances(vm_spec):

vm_spec["spec"]["template"]["spec"]["volumes"].append({"name": "volume2", "userData": "more sensitive-data"})

result = replace_key_with_hashed_value(resource_dict=vm_spec, key_name="spec>template>spec>volumes[]>userData")

for volume in result["spec"]["template"]["spec"]["volumes"]:

assert volume["userData"] == "*******"

def test_replace_key_with_hashed_value_multiple_occurances(vm_spec):

test_data = {

"userData": "root-level",

"spec": {

"template": {

"spec": {

"volumes": [

{"name": "volume1", "userData": "level-1"},

{"name": "volume2", "userData": "level-1"}

],

"nested": {

"userData": "level-2"

}

}

}

}

}

original_data = test_data.copy()

result = replace_key_with_hashed_value(resource_dict=test_data, key_name="userData")

# Verify all occurrences are hashed

assert result["userData"] == "*******"

assert result["spec"]["template"]["spec"]["nested"]["userData"] == "*******"

for volume in result["spec"]["template"]["spec"]["volumes"]:

assert volume["userData"] == "*******"

# Verify original data is unchanged

assert test_data == original_data

rnetser · 2024-11-10T17:21:41Z

ocp_resources/resource.py

+def replace_key_with_hashed_value(resource_dict: Dict[Any, Any], key_name: str) -> Dict[Any, Any]:
+    """
+    Recursively search a nested dictionary for a given key and changes its value to "******" if found.
+    The function supports two key formats:


the newline should come after the oneliner function description

rnetser · 2024-11-10T17:28:00Z

ocp_resources/resource.py

+        if not isinstance(resource_dict, dict):
+            raise ValueError("Expected a dictionary as the first argument")


but we do not have such checks in the wrapper

coderabbitai

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (3)

ocp_resources/resource.py (3)

137-141: Add empty line after docstring's first line.

According to PEP 257, there should be an empty line after the docstring's first line.

    """
    Recursively search a nested dictionary for a given key and changes its value to "******" if found.

    The function supports two key formats:

142-148: Enhance docstring with concrete examples.

The docstring would be more helpful with concrete dictionary examples.

    1. Regular dictionary path:
        A key to be hashed can be found directly in a dictionary, e.g. "a>b>c", would hash the value associated with
-        key "c", where dictionary format is {a:{b:{c:sensitive data}}}
+        key "c", where dictionary format is:
+        input = {
+            "a": {
+                "b": {
+                    "c": "sensitive data"
+                }
+            }
+        }
+        output = {
+            "a": {
+                "b": {
+                    "c": "*******"
+                }
+            }
+        }
    2. List path:
        A key to be hashed can be found in a dictionary that is in list somewhere in a dictionary, e.g. "a>b[]>c",
-        would hash the value associated with key "c", where dictionary format is {a:{b:[{d: not sensitive data},
-        {c:sensitive data}]}}
+        would hash the value associated with key "c", where dictionary format is:
+        input = {
+            "a": {
+                "b": [
+                    {"d": "not sensitive data"},
+                    {"c": "sensitive data"}
+                ]
+            }
+        }
+        output = {
+            "a": {
+                "b": [
+                    {"d": "not sensitive data"},
+                    {"c": "*******"}
+                ]
+            }
+        }

158-160: Consider optimizing deep copy usage.

Creating a deep copy for every recursive call could impact performance with large nested dictionaries. Consider passing the original dictionary in the recursive calls and only creating a deep copy at the top level.

-    result = copy.deepcopy(resource_dict)
-
-    benedict_resource_dict = benedict(result, keypath_separator=">")
+    # Only create deep copy at top level
+    if not hasattr(replace_key_with_hashed_value, '_is_recursive_call'):
+        resource_dict = copy.deepcopy(resource_dict)
+    benedict_resource_dict = benedict(resource_dict, keypath_separator=">")

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 691d3eb and f01c5c0.

📒 Files selected for processing (1)

ocp_resources/resource.py (2 hunks)

🔇 Additional comments (1)

ocp_resources/resource.py (1)

1219-1225: LGTM! Input validation and data handling look good.

The changes to hash_resource_dict are well-implemented:

Proper input validation
Safe handling of the input dictionary through deep copy
Clear logic for applying hashing to specified keys

Handle sensitive information being inside a list in resource_dict

dafe9d0

redhat-qe-bot2 added needs-rebase size/S labels Oct 23, 2024

redhat-qe-bot2 requested a review from omrirh October 23, 2024 19:40

redhat-qe-bot2 mentioned this pull request Oct 23, 2024

Handle sensitive information being inside a list in resource_dict - 2178 #2179

Open

redhat-qe-bot2 added the branch-main label Oct 23, 2024

redhat-qe-bot2 requested review from myakove and rnetser October 23, 2024 19:40

coderabbitai bot reviewed Oct 23, 2024

View reviewed changes

rh-bot-1 added the commented-coderabbitai[bot] label Oct 23, 2024

Merge branch 'main' into hash_cloudinit

462ee06

redhat-qe-bot2 removed commented-coderabbitai[bot] needs-rebase labels Oct 23, 2024

coderabbitai bot reviewed Oct 23, 2024

View reviewed changes

redhat-qe-bot2 added the commented-coderabbitai[bot] label Oct 23, 2024

myakove requested changes Oct 25, 2024

View reviewed changes

rh-bot-1 added changes-requested-myakove commented-rnetser commented-dbasunag commented-myakove labels Oct 25, 2024

Update ocp_resources/resource.py

6527412

Co-authored-by: Meni Yakove <[email protected]>

redhat-qe-bot1 added needs-rebase and removed commented-myakove commented-coderabbitai[bot] commented-rnetser commented-dbasunag changes-requested-myakove labels Oct 28, 2024

redhat-qe-bot1 requested a review from myakove October 28, 2024 13:35

redhat-qe-bot2 added the commented-dbasunag label Nov 4, 2024

redhat-qe-bot1 added the needs-rebase label Nov 5, 2024

rnetser requested changes Nov 7, 2024

View reviewed changes

rh-bot-1 added the changes-requested-rnetser label Nov 7, 2024

address review comments

691d3eb

redhat-qe-bot2 removed changes-requested-rnetser changes-requested-myakove commented-dbasunag labels Nov 7, 2024

redhat-qe-bot2 requested review from rnetser and myakove November 7, 2024 22:23

coderabbitai bot reviewed Nov 7, 2024

View reviewed changes

redhat-qe-bot2 added the commented-coderabbitai[bot] label Nov 7, 2024

rh-bot-1 removed commented-coderabbitai[bot] needs-rebase labels Nov 7, 2024

redhat-qe-bot2 added the commented-myakove label Nov 9, 2024

rnetser requested changes Nov 10, 2024

View reviewed changes

redhat-qe-bot added the changes-requested-rnetser label Nov 10, 2024

fix newline

f01c5c0

dbasunag force-pushed the hash_cloudinit branch from 7785a8e to f01c5c0 Compare November 11, 2024 16:38

redhat-qe-bot added needs-rebase and removed changes-requested-rnetser commented-myakove labels Nov 11, 2024

redhat-qe-bot requested a review from rnetser November 11, 2024 16:38

coderabbitai bot reviewed Nov 11, 2024

View reviewed changes

rh-bot-1 added the commented-coderabbitai[bot] label Nov 11, 2024

redhat-qe-bot added the commented-dbasunag label Nov 11, 2024

Merge branch 'main' into hash_cloudinit

d00217f

redhat-qe-bot removed commented-dbasunag needs-rebase commented-coderabbitai[bot] labels Nov 13, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Handle sensitive information being inside a list in resource_dict #2178

Handle sensitive information being inside a list in resource_dict #2178

dbasunag commented Oct 23, 2024 •

edited by coderabbitai bot

Loading

redhat-qe-bot2 commented Oct 23, 2024

coderabbitai bot commented Oct 23, 2024 •

edited

Loading

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

Documentation and Community

coderabbitai bot left a comment

coderabbitai bot Oct 23, 2024

coderabbitai bot left a comment

myakove Oct 25, 2024

myakove Oct 25, 2024

dbasunag Oct 25, 2024

rnetser Nov 7, 2024

rnetser Nov 10, 2024

rnetser Nov 7, 2024

rnetser Nov 7, 2024

rnetser Nov 7, 2024

dbasunag Nov 7, 2024

rnetser Nov 7, 2024

dbasunag Nov 7, 2024

myakove Nov 9, 2024

rnetser Nov 10, 2024

dbasunag Nov 11, 2024

rnetser Nov 7, 2024

coderabbitai bot left a comment

coderabbitai bot Nov 7, 2024

coderabbitai bot Nov 7, 2024

coderabbitai bot Nov 7, 2024

rnetser Nov 10, 2024

rnetser Nov 10, 2024

coderabbitai bot left a comment

	resource_dict: The nested dictionary to search. Must be a dictionary.
	resource_dict: The nested dictionary to search.

		if not isinstance(resource_dict, dict):
		raise ValueError("Expected a dictionary as the first argument")

-def test_replace_key_with_hashed_value_multiple_occurances(vm_spec):
-    vm_spec["spec"]["template"]["spec"]["volumes"].append({"name": "volume2", "userData": "more sensitive-data"})
-    result = replace_key_with_hashed_value(resource_dict=vm_spec, key_name="spec>template>spec>volumes[]>userData")
-    for volume in result["spec"]["template"]["spec"]["volumes"]:
-        assert volume["userData"] == "*******"
+def test_replace_key_with_hashed_value_multiple_occurances(vm_spec):
+    test_data = {
+        "userData": "root-level",
+        "spec": {
+            "template": {
+                "spec": {
+                    "volumes": [
+                        {"name": "volume1", "userData": "level-1"},
+                        {"name": "volume2", "userData": "level-1"}
+                    ],
+                    "nested": {
+                        "userData": "level-2"
+                    }
+                }
+            }
+        }
+    }
+    original_data = test_data.copy()
+    result = replace_key_with_hashed_value(resource_dict=test_data, key_name="userData")
+    # Verify all occurrences are hashed
+    assert result["userData"] == "*******"
+    assert result["spec"]["template"]["spec"]["nested"]["userData"] == "*******"
+    for volume in result["spec"]["template"]["spec"]["volumes"]:
+        assert volume["userData"] == "*******"
+    # Verify original data is unchanged
+    assert test_data == original_data

Handle sensitive information being inside a list in resource_dict #2178

Are you sure you want to change the base?

Handle sensitive information being inside a list in resource_dict #2178

Conversation

dbasunag commented Oct 23, 2024 • edited by coderabbitai bot Loading

Short description:

More details:

What this PR does / why we need it:

Which issue(s) this PR fixes:

Special notes for reviewer:

Bug:

Summary by CodeRabbit

Summary by CodeRabbit

redhat-qe-bot2 commented Oct 23, 2024

coderabbitai bot commented Oct 23, 2024 • edited Loading

Walkthrough

Changes

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

Documentation and Community

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot Oct 23, 2024

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot Nov 7, 2024

Choose a reason for hiding this comment

coderabbitai bot Nov 7, 2024

Choose a reason for hiding this comment

coderabbitai bot Nov 7, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

dbasunag commented Oct 23, 2024 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Oct 23, 2024 •

edited

Loading