-
Notifications
You must be signed in to change notification settings - Fork 52
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle sensitive information being inside a list in resource_dict #2178
base: main
Are you sure you want to change the base?
Conversation
Report bugs in Issues The following are automatically added:
Available user actions:
Supported /retest check runs
Supported labels
|
WalkthroughThis pull request introduces a new function, Changes
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (2)
ocp_resources/virtual_machine.py (1)
179-181
: LGTM! Consider adding documentation.The implementation correctly identifies the sensitive field that needs to be masked. Consider adding a docstring to explain the purpose of this property and its security implications.
@property def keys_to_hash(self): + """ + Returns a list of keys containing sensitive information that should be masked. + + Returns: + list[str]: List of keys to be masked in the resource dictionary. + Currently masks 'userData' which contains sensitive boot-time configuration. + """ return ["userData"]ocp_resources/resource.py (1)
136-156
: LGTM with suggestions for improvements.The implementation correctly handles nested dictionaries and lists. However, consider these improvements:
- Add protection against circular references to prevent stack overflow
- Consider making the mask value configurable for flexibility
Here's a suggested improvement:
-def change_dict_value_to_hashed(resource_dict: Dict[Any, Any], key_name: str) -> Dict[Any, Any]: +def change_dict_value_to_hashed( + resource_dict: Dict[Any, Any], + key_name: str, + mask: str = "******", + _seen: set | None = None +) -> Dict[Any, Any]: """ Recursively search a nested dictionary for a given key and changes its value to "******" if found. Args: resource_dict: The nested dictionary to search. key_name: The key to find. + mask: The value to use for masking sensitive data. + _seen: Internal parameter to track circular references. Returns: The modified dictionary. """ + if _seen is None: + _seen = set() + + # Handle circular references + if id(resource_dict) in _seen: + return resource_dict + if isinstance(resource_dict, dict): + _seen.add(id(resource_dict)) for key, value in resource_dict.items(): if key == key_name: - resource_dict[key] = "******" + resource_dict[key] = mask elif isinstance(value, dict): - resource_dict[key] = change_dict_value_to_hashed(value, key_name) + resource_dict[key] = change_dict_value_to_hashed(value, key_name, mask, _seen) elif isinstance(value, list): for key_list, value_list in enumerate(value): - value[key_list] = change_dict_value_to_hashed(value_list, key_name) + value[key_list] = change_dict_value_to_hashed(value_list, key_name, mask, _seen) return resource_dict
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
- ocp_resources/resource.py (2 hunks)
- ocp_resources/sealed_secret.py (1 hunks)
- ocp_resources/virtual_machine.py (1 hunks)
🧰 Additional context used
🔇 Additional comments (3)
ocp_resources/sealed_secret.py (1)
53-53
: Verify the impact of simplified key paths.The simplified key paths in
keys_to_hash
might affect how sensitive data is masked in the resource dictionary. Let's verify this change works correctly with the updatedhash_resource_dict
implementation.Consider adding a docstring to the
keys_to_hash
property to document:
- The expected format of the keys
- How these keys are used in the resource dictionary masking process
- Why the "spec.." prefix was removed
This will help future maintainers understand the design decision and prevent accidental reintroduction of the prefix.
✅ Verification successful
Let me gather more information about how the key paths are actually processed in the codebase.
This will help us understand:
- How the key paths are processed in the hashing function
- If there are tests verifying this functionality
- Where the SealedSecret class is used in the codebase
Let me check one more thing to verify the test coverage and actual usage patterns.
This will help us understand:
- The test coverage for this functionality
- The complete context of SealedSecret class
- How these specific keys are actually used in hashing
Based on the gathered information, I can now provide the final response:
Simplified key paths in
keys_to_hash
are correct and safeThe change from
["spec..data", "spec..encryptedData"]
to["data", "encryptedData"]
is correct because:
- The
change_dict_value_to_hashed
function performs a recursive dictionary search for exact key matches, making the "spec.." prefix unnecessary- The SealedSecret class directly accepts
data
andencrypted_data
as top-level parameters in its constructor- The implementation aligns with the actual structure of the resource where sensitive data is stored
The change maintains security while simplifying the implementation, as the hashing mechanism in
hash_resource_dict
will still mask all sensitive information correctly.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the usage of keys_to_hash in the codebase # and ensure it's compatible with the new implementation # Check if there are any direct references to the old format "spec..data" echo "Checking for old key format usage..." rg -l "spec\.\.data|spec\.\.encryptedData" # Look for other implementations of keys_to_hash to ensure consistency echo "Checking other keys_to_hash implementations..." ast-grep --pattern 'def keys_to_hash' # Check the Resource class implementation that uses these keys echo "Checking Resource class implementation..." rg -A 10 "def hash_resource_dict"Length of output: 1254
Script:
#!/bin/bash # Check the implementation of change_dict_value_to_hashed function echo "Checking change_dict_value_to_hashed implementation..." rg -A 15 "def change_dict_value_to_hashed" # Check if there are any tests that verify this functionality echo "Checking related tests..." fd -e py test_sealed_secret test_resource --exec rg -l "keys_to_hash|hash_resource_dict" {} # Check the actual usage of SealedSecret class echo "Checking SealedSecret usage..." rg -l "class SealedSecret"Length of output: 1542
Script:
#!/bin/bash # Check for test files with correct path echo "Checking test files..." fd -e py -g "*test*" --exec rg -l "keys_to_hash|hash_resource_dict" {} # Check the complete SealedSecret class implementation echo "Checking SealedSecret class implementation..." rg -A 20 "class SealedSecret" ocp_resources/sealed_secret.py # Check actual usage of hash_resource_dict with these keys echo "Checking actual usage patterns..." rg -B 5 -A 5 "hash_resource_dict.*data|hash_resource_dict.*encryptedData"Length of output: 1177
ocp_resources/resource.py (2)
1198-1198
: LGTM! Clean integration of the new function.The change correctly integrates the new
change_dict_value_to_hashed
function while maintaining the existing behavior and safety measures (deep copy, condition checks).
136-156
: Verify the impact on existing consumers.Let's verify the impact of these changes on existing code.
Also applies to: 1198-1198
✅ Verification successful
Let me gather more information about the
keys_to_hash
property since the previous search didn't show its complete usage pattern.This will help us understand:
- Where and how
keys_to_hash
is defined and used- The usage pattern of
hash_log_data
which controls the hashing behavior- Any potential impact on existing implementations
Based on the gathered information, I can now provide a final response as we have a clear picture of the usage and impact:
The changes to
change_dict_value_to_hashed
are safe and properly integratedThe function is used within a well-defined pattern where:
- It's only called from the base
Resource
class'shash_resource_dict
method- The hashing behavior is controlled by two factors:
- The
hash_log_data
boolean parameter (defaults to True)- Resource-specific
keys_to_hash
property implementations- Only specific resource types implement
keys_to_hash
:
- ConfigMap: ["data", "binaryData"]
- VirtualMachine: ["userData"]
- SealedSecret: ["data", "encryptedData"]
- Secret: ["data", "stringData"]
The changes maintain the existing behavior while improving the implementation, and all current consumers are properly integrated with this pattern.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Check for usages of hash_resource_dict and keys_to_hash property echo "Checking for direct calls to hash_resource_dict..." rg -A 2 "hash_resource_dict\(" echo -e "\nChecking for classes that override keys_to_hash property..." ast-grep --pattern 'class $_ { $$$ @property def keys_to_hash($_) { $$$ } $$$ }'Length of output: 1839
Script:
#!/bin/bash # Let's search for any references to keys_to_hash in the codebase echo "Searching for keys_to_hash references..." rg -A 2 "keys_to_hash" # Also check for hash_log_data since it's used in conjunction echo -e "\nSearching for hash_log_data references..." rg -A 2 "hash_log_data"Length of output: 2647
ocp_resources/virtual_machine.py
Outdated
@property | ||
def keys_to_hash(self): | ||
return ["userData"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Consider using a class constant for the sensitive key.
To improve maintainability and reusability, consider defining the sensitive key as a class constant.
class VirtualMachine(NamespacedResource):
+ # Sensitive fields that should be masked when serializing
+ SENSITIVE_USER_DATA_KEY = "userData"
+
"""
Virtual Machine object, inherited from Resource.
Implements actions start / stop / status / wait for VM status / is running
"""
@property
def keys_to_hash(self):
- return ["userData"]
+ return [self.SENSITIVE_USER_DATA_KEY]
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
@property | |
def keys_to_hash(self): | |
return ["userData"] | |
@property | |
def keys_to_hash(self): | |
return [self.SENSITIVE_USER_DATA_KEY] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (1)
ocp_resources/resource.py (1)
136-156
: Improve function documentation and edge case handling.The function implementation looks good, but there are a few suggestions for improvement:
Documentation could be enhanced:
- Add a note about in-place modification
- Document the behavior for edge cases
- Add example usage
Consider handling edge cases:
- None values
- Non-dict/non-list objects in nested structures
Performance optimization:
- Consider using a generator for list iteration
- Add early return for empty/None input
Here's how you could improve the implementation:
def change_dict_value_to_hashed(resource_dict: Dict[Any, Any], key_name: str) -> Dict[Any, Any]: """ Recursively search a nested dictionary for a given key and changes its value to "******" if found. + The function modifies the input dictionary in-place. Args: resource_dict: The nested dictionary to search. key_name: The key to find. Returns: The modified dictionary. + + Example: + >>> d = {"a": {"sensitive": "secret"}, "b": [{"sensitive": "hidden"}]} + >>> change_dict_value_to_hashed(d, "sensitive") + {'a': {'sensitive': '******'}, 'b': [{'sensitive': '******'}]} """ + if not resource_dict: + return resource_dict + if isinstance(resource_dict, dict): for key, value in resource_dict.items(): if key == key_name: resource_dict[key] = "******" - elif isinstance(value, dict): + elif value is not None and isinstance(value, dict): resource_dict[key] = change_dict_value_to_hashed(value, key_name) - elif isinstance(value, list): + elif value is not None and isinstance(value, list): for key_list, value_list in enumerate(value): - value[key_list] = change_dict_value_to_hashed(value_list, key_name) + if value_list is not None: + value[key_list] = change_dict_value_to_hashed(value_list, key_name) return resource_dict
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
- ocp_resources/resource.py (2 hunks)
🧰 Additional context used
🔇 Additional comments (2)
ocp_resources/resource.py (2)
1199-1199
: LGTM! Clean implementation of sensitive data handling.The modification correctly utilizes the new
change_dict_value_to_hashed
function to handle sensitive information in resource dictionaries.
1199-1199
: Verify handling of sensitive fields across the codebase.Let's ensure that all sensitive fields, particularly
userData
in VM specs, are properly handled by the new implementation.Run the following script to verify the usage:
ocp_resources/resource.py
Outdated
@@ -134,9 +133,27 @@ def sub_resource_level(current_class: Any, owner_class: Any, parent_class: Any) | |||
return None | |||
|
|||
|
|||
# Exceptions classes | |||
def change_dict_value_to_hashed(resource_dict: Dict[Any, Any], key_name: str) -> Dict[Any, Any]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add tests.
ocp_resources/resource.py
Outdated
@@ -15,7 +15,6 @@ | |||
import kubernetes | |||
from kubernetes.dynamic import DynamicClient, ResourceInstance | |||
import yaml | |||
from benedict import benedict |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
benedict
is no longer used and should be removed from the project.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok. Will do.
Co-authored-by: Meni Yakove <[email protected]>
def replace_key_with_hashed_value(resource_dict: Dict[Any, Any], key_name: str) -> Dict[Any, Any]: | ||
""" | ||
Recursively search a nested dictionary for a given key and changes its value to "******" if found. | ||
The function supports two key formats: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing newline
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the newline should come after the oneliner function description
ocp_resources/resource.py
Outdated
1. Regular path: "path>to>key" | ||
2. List path: "path>to[]>key" | ||
Args: | ||
resource_dict: The nested dictionary to search. Must be a dictionary. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
resource_dict: The nested dictionary to search. Must be a dictionary. | |
resource_dict: The nested dictionary to search. |
no need for this; the function has typing
ocp_resources/resource.py
Outdated
""" | ||
Recursively search a nested dictionary for a given key and changes its value to "******" if found. | ||
The function supports two key formats: | ||
1. Regular path: "path>to>key" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i do not understand the examples; can you provide real ones? i.e example dict and example path
ocp_resources/resource.py
Outdated
Raises: | ||
ValueError: If resource_dict is not a dictionary or key_name is not a string. | ||
""" | ||
# Create a deep copy to avoid modifying the input |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please remove the inline comments; the code is clear
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, those were from coderabbit :). Will remove.
if not isinstance(resource_dict, dict): | ||
raise ValueError("Expected a dictionary as the first argument") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this needed? we do not check types
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@myakove suggested adding this one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's better to have a nice error when failing, typing is not enforced, only suggests the type (by IDE)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but we do not have such checks in the wrapper
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me know how do you want me to proceed.
tests/test_validate_resources.py
Outdated
assert replace_key_with_hashed_value(resource_dict={}, key_name="a>b") == {} | ||
|
||
# Non-existent key | ||
assert replace_key_with_hashed_value(resource_dict={"x": {"y": "z"}}, key_name="a>b") == {"x": {"y": "z"}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
each should be a separate test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Outside diff range and nitpick comments (4)
tests/test_validate_resources.py (2)
10-22
: Consider enhancing the vm_spec fixture with additional test cases.The fixture provides good coverage but could be enhanced with more real-world scenarios:
- Empty lists/objects
- Null values
- Multiple nested lists
- Different data types (numbers, booleans)
@pytest.fixture() def vm_spec(): return { "spec": { "template": { "features": {"someNewFeature": {"someNewData": "sensitive information"}}, "spec": { "volumes": [ {"name": "volume1", "userData": "sensitive-data"}, + {"name": "volume2", "userData": None}, + {"name": "volume3", "userData": 12345}, + {"name": "volume4"} ], + "nestedList": [ + {"items": [{"userData": "nested-sensitive"}]} + ], + "emptyList": [] }, } } }
41-62
: Enhance test assertions for better coverage.While the test verifies the hashed value, it should also verify that:
- Other fields remain unchanged
- The structure of the dictionary is preserved
- The length of lists remains the same
def test_replace_key_with_hashed_value(vm_spec, key_name, expected_key): + original_dict = vm_spec.copy() + result = replace_key_with_hashed_value(resource_dict=vm_spec, key_name=key_name) assert ( - benedict(replace_key_with_hashed_value(resource_dict=vm_spec, key_name=key_name), keypath_separator=">")[ + benedict(result, keypath_separator=">")[ expected_key ] == "*******" ) + # Verify other fields remain unchanged + assert result["spec"]["template"]["spec"]["volumes"][0]["name"] == original_dict["spec"]["template"]["spec"]["volumes"][0]["name"] + # Verify structure is preserved + assert len(result["spec"]["template"]["spec"]["volumes"]) == len(original_dict["spec"]["template"]["spec"]["volumes"])ocp_resources/resource.py (2)
140-140
: Add newline after line 140.Add a newline after the line "The function supports two key formats:" to maintain consistent docstring formatting.
1214-1214
: Fix docstring example indentation.The example in the docstring should be properly indented for better readability.
- To hash spec['data'] key pass: ["spec>data"] + To hash spec['data'] key pass: ["spec>data"]
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
📒 Files selected for processing (2)
ocp_resources/resource.py
(2 hunks)tests/test_validate_resources.py
(2 hunks)
🔇 Additional comments (2)
ocp_resources/resource.py (2)
137-179
: Implementation looks good!
The function effectively handles both regular dictionary paths and list paths, with good separation of concerns and proper deep copying to avoid modifying the input dictionary. The use of benedict for key path handling is a robust choice.
1219-1225
: Implementation looks good!
The changes to hash_resource_dict
properly validate input and integrate well with the new replace_key_with_hashed_value
function. The type checking adds robustness to the implementation.
@pytest.mark.parametrize( | ||
"resource, key_name, expected_result", | ||
[ | ||
# Empty dictionary | ||
pytest.param({}, "a>b", {}, id="test_replace_key_with_hashed_value_empty_dict"), | ||
# Non-existent key | ||
pytest.param( | ||
{"x": {"y": "z"}}, "a>b", {"x": {"y": "z"}}, id="test_replace_key_with_hashed_value_key_doesnt_exist" | ||
), | ||
], | ||
) | ||
def test_replace_key_with_hashed_value_edge_cases(resource, key_name, expected_result): | ||
"""Test edge cases for replace_key_with_hashed_value function.""" | ||
assert replace_key_with_hashed_value(resource_dict=resource, key_name=key_name) == expected_result | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Add more edge cases as suggested in previous reviews.
The current edge cases are good but incomplete. Add tests for:
- Invalid key paths
- Empty key path
- Non-string key path
- Deeply nested structures
- Multiple occurrences of same key at different levels
[
# Empty dictionary
pytest.param({}, "a>b", {}, id="test_replace_key_with_hashed_value_empty_dict"),
# Non-existent key
pytest.param(
{"x": {"y": "z"}}, "a>b", {"x": {"y": "z"}}, id="test_replace_key_with_hashed_value_key_doesnt_exist"
),
+ # Invalid key path
+ pytest.param(
+ {"a": {"b": "value"}}, "a>>b", {"a": {"b": "value"}},
+ id="test_replace_key_with_hashed_value_invalid_path",
+ marks=pytest.mark.xfail(raises=ValueError)
+ ),
+ # Empty key path
+ pytest.param(
+ {"a": "value"}, "", {"a": "value"},
+ id="test_replace_key_with_hashed_value_empty_path",
+ marks=pytest.mark.xfail(raises=ValueError)
+ ),
+ # Deep nesting
+ pytest.param(
+ {"l1": {"l2": {"l3": {"l4": {"key": "value"}}}}},
+ "l1>l2>l3>l4>key",
+ {"l1": {"l2": {"l3": {"l4": {"key": "*******"}}}}},
+ id="test_replace_key_with_hashed_value_deep_nesting"
+ ),
],
Committable suggestion skipped: line range outside the PR's diff.
def test_replace_key_with_hashed_value_invalid_inputs(): | ||
"""Test that the function handles invalid inputs appropriately.""" | ||
with pytest.raises(TypeError): | ||
replace_key_with_hashed_value(resource_dict=None, key_name="a>b") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Expand invalid input tests as suggested in previous reviews.
Add more invalid input scenarios:
- Non-string key path
- Invalid key path format
- List/dict as key path
def test_replace_key_with_hashed_value_invalid_inputs():
"""Test that the function handles invalid inputs appropriately."""
with pytest.raises(TypeError):
replace_key_with_hashed_value(resource_dict=None, key_name="a>b")
+ # Non-string key path
+ with pytest.raises(TypeError):
+ replace_key_with_hashed_value(resource_dict={}, key_name=123)
+
+ # List as key path
+ with pytest.raises(TypeError):
+ replace_key_with_hashed_value(resource_dict={}, key_name=["a", "b"])
+
+ # Dict as key path
+ with pytest.raises(TypeError):
+ replace_key_with_hashed_value(resource_dict={}, key_name={"a": "b"})
+
+ # Invalid key path format
+ with pytest.raises(ValueError):
+ replace_key_with_hashed_value(resource_dict={}, key_name="a>>b")
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def test_replace_key_with_hashed_value_invalid_inputs(): | |
"""Test that the function handles invalid inputs appropriately.""" | |
with pytest.raises(TypeError): | |
replace_key_with_hashed_value(resource_dict=None, key_name="a>b") | |
def test_replace_key_with_hashed_value_invalid_inputs(): | |
"""Test that the function handles invalid inputs appropriately.""" | |
with pytest.raises(TypeError): | |
replace_key_with_hashed_value(resource_dict=None, key_name="a>b") | |
# Non-string key path | |
with pytest.raises(TypeError): | |
replace_key_with_hashed_value(resource_dict={}, key_name=123) | |
# List as key path | |
with pytest.raises(TypeError): | |
replace_key_with_hashed_value(resource_dict={}, key_name=["a", "b"]) | |
# Dict as key path | |
with pytest.raises(TypeError): | |
replace_key_with_hashed_value(resource_dict={}, key_name={"a": "b"}) | |
# Invalid key path format | |
with pytest.raises(ValueError): | |
replace_key_with_hashed_value(resource_dict={}, key_name="a>>b") |
def test_replace_key_with_hashed_value_multiple_occurances(vm_spec): | ||
vm_spec["spec"]["template"]["spec"]["volumes"].append({"name": "volume2", "userData": "more sensitive-data"}) | ||
result = replace_key_with_hashed_value(resource_dict=vm_spec, key_name="spec>template>spec>volumes[]>userData") | ||
for volume in result["spec"]["template"]["spec"]["volumes"]: | ||
assert volume["userData"] == "*******" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Avoid modifying fixture data and enhance multiple occurrences test.
- Instead of modifying the fixture, create a new dictionary with the test data
- Add assertions to verify the original data remains unchanged
- Test multiple occurrences at different nesting levels
def test_replace_key_with_hashed_value_multiple_occurances(vm_spec):
- vm_spec["spec"]["template"]["spec"]["volumes"].append({"name": "volume2", "userData": "more sensitive-data"})
+ test_data = {
+ "userData": "root-level",
+ "spec": {
+ "template": {
+ "spec": {
+ "volumes": [
+ {"name": "volume1", "userData": "level-1"},
+ {"name": "volume2", "userData": "level-1"}
+ ],
+ "nested": {
+ "userData": "level-2"
+ }
+ }
+ }
+ }
+ }
+ original_data = test_data.copy()
+ result = replace_key_with_hashed_value(resource_dict=test_data, key_name="userData")
- result = replace_key_with_hashed_value(resource_dict=vm_spec, key_name="spec>template>spec>volumes[]>userData")
- for volume in result["spec"]["template"]["spec"]["volumes"]:
- assert volume["userData"] == "*******"
+ # Verify all occurrences are hashed
+ assert result["userData"] == "*******"
+ assert result["spec"]["template"]["spec"]["nested"]["userData"] == "*******"
+ for volume in result["spec"]["template"]["spec"]["volumes"]:
+ assert volume["userData"] == "*******"
+ # Verify original data is unchanged
+ assert test_data == original_data
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def test_replace_key_with_hashed_value_multiple_occurances(vm_spec): | |
vm_spec["spec"]["template"]["spec"]["volumes"].append({"name": "volume2", "userData": "more sensitive-data"}) | |
result = replace_key_with_hashed_value(resource_dict=vm_spec, key_name="spec>template>spec>volumes[]>userData") | |
for volume in result["spec"]["template"]["spec"]["volumes"]: | |
assert volume["userData"] == "*******" | |
def test_replace_key_with_hashed_value_multiple_occurances(vm_spec): | |
test_data = { | |
"userData": "root-level", | |
"spec": { | |
"template": { | |
"spec": { | |
"volumes": [ | |
{"name": "volume1", "userData": "level-1"}, | |
{"name": "volume2", "userData": "level-1"} | |
], | |
"nested": { | |
"userData": "level-2" | |
} | |
} | |
} | |
} | |
} | |
original_data = test_data.copy() | |
result = replace_key_with_hashed_value(resource_dict=test_data, key_name="userData") | |
# Verify all occurrences are hashed | |
assert result["userData"] == "*******" | |
assert result["spec"]["template"]["spec"]["nested"]["userData"] == "*******" | |
for volume in result["spec"]["template"]["spec"]["volumes"]: | |
assert volume["userData"] == "*******" | |
# Verify original data is unchanged | |
assert test_data == original_data |
def replace_key_with_hashed_value(resource_dict: Dict[Any, Any], key_name: str) -> Dict[Any, Any]: | ||
""" | ||
Recursively search a nested dictionary for a given key and changes its value to "******" if found. | ||
The function supports two key formats: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the newline should come after the oneliner function description
if not isinstance(resource_dict, dict): | ||
raise ValueError("Expected a dictionary as the first argument") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but we do not have such checks in the wrapper
7785a8e
to
f01c5c0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (3)
ocp_resources/resource.py (3)
137-141
: Add empty line after docstring's first line.According to PEP 257, there should be an empty line after the docstring's first line.
""" Recursively search a nested dictionary for a given key and changes its value to "******" if found. The function supports two key formats:
142-148
: Enhance docstring with concrete examples.The docstring would be more helpful with concrete dictionary examples.
1. Regular dictionary path: A key to be hashed can be found directly in a dictionary, e.g. "a>b>c", would hash the value associated with - key "c", where dictionary format is {a:{b:{c:sensitive data}}} + key "c", where dictionary format is: + input = { + "a": { + "b": { + "c": "sensitive data" + } + } + } + output = { + "a": { + "b": { + "c": "*******" + } + } + } 2. List path: A key to be hashed can be found in a dictionary that is in list somewhere in a dictionary, e.g. "a>b[]>c", - would hash the value associated with key "c", where dictionary format is {a:{b:[{d: not sensitive data}, - {c:sensitive data}]}} + would hash the value associated with key "c", where dictionary format is: + input = { + "a": { + "b": [ + {"d": "not sensitive data"}, + {"c": "sensitive data"} + ] + } + } + output = { + "a": { + "b": [ + {"d": "not sensitive data"}, + {"c": "*******"} + ] + } + }
158-160
: Consider optimizing deep copy usage.Creating a deep copy for every recursive call could impact performance with large nested dictionaries. Consider passing the original dictionary in the recursive calls and only creating a deep copy at the top level.
- result = copy.deepcopy(resource_dict) - - benedict_resource_dict = benedict(result, keypath_separator=">") + # Only create deep copy at top level + if not hasattr(replace_key_with_hashed_value, '_is_recursive_call'): + resource_dict = copy.deepcopy(resource_dict) + benedict_resource_dict = benedict(resource_dict, keypath_separator=">")
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
📒 Files selected for processing (1)
ocp_resources/resource.py
(2 hunks)
🔇 Additional comments (1)
ocp_resources/resource.py (1)
1219-1225
: LGTM! Input validation and data handling look good.
The changes to hash_resource_dict
are well-implemented:
- Proper input validation
- Safe handling of the input dictionary through deep copy
- Clear logic for applying hashing to specified keys
Short description:
Current code hash_resource_dict() does not provide a flexible way to hide fields like userData, for virtual machines (it could be present in spec.template.spec.volumes)
More details:
What this PR does / why we need it:
Which issue(s) this PR fixes:
Special notes for reviewer:
Bug:
Summary by CodeRabbit
Summary by CodeRabbit
New Features
VirtualMachine
class to access specific keys related to user data.Improvements
keys_to_hash
property in theSealedSecret
class for better data structure alignment.keys_to_hash
property in theResource
class to reflect a new keypath format.Tests