Update Protocol Tests #3227

jonathan343 · 2024-07-26T15:42:51Z

Context

This PR replaces the existing protocol tests in botocore/tests/unit/protocols/... with new tests generated from Smithy protocol test models. We use a version of these Smithy tests that is converted to the format currently supported by our existing test runner (test_protocols.py).

Issues

Many of the new protocol tests are failing due to test runner, serialization, and parsing issues. I've highlighted the notable issues below to provide additional context for reviewers:

Test Runner (`test_protocols.py`)

Response Body Normalization - The new protocol tests define expected body values as JSON and XML that include extra whitespace and newlines. This prevents us from directly asserting that the expected and actual values are equal. Instead, we normalize the expected body value by attempting to parse the content as a JSON or XML object based on the protocol.
Handle Special Float Types - In the protocol test suite, certain special float values are represented as strings: "Infinity", "-Infinity", and "NaN". However, we parse these values as actual floats, so we need to convert them back to their string representation before comparing with the expected values.
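For illustration, the body normalization described above could look roughly like the sketch below. The names `normalize_body` and `_element_to_tuple` are hypothetical and are not the helpers used in this PR; the sketch also assumes the body is a `str` and that any protocol name containing "json" is JSON-based.

```python
import json
import xml.etree.ElementTree as ET


def normalize_body(body, protocol):
    """Return a whitespace-insensitive form of body, or body itself if unparseable."""
    try:
        if 'json' in protocol:
            # Parsed JSON compares equal regardless of spacing and newlines.
            return json.loads(body)
        root = ET.fromstring(body)
    except (json.JSONDecodeError, ET.ParseError):
        # Not valid JSON/XML: fall back to comparing the raw string.
        return body
    return _element_to_tuple(root)


def _element_to_tuple(element):
    """Convert an element tree into nested tuples, ignoring indentation text."""
    text = (element.text or '').strip()
    attributes = tuple(sorted(element.attrib.items()))
    children = tuple(_element_to_tuple(child) for child in element)
    return (element.tag, attributes, text, children)
```

Comparing `normalize_body(expected, protocol)` with `normalize_body(actual, protocol)` then ignores formatting differences while still catching structural ones.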

Input Serialization (`serialize.py`)

...

Response Parsing (`parsers.py`)

Infer Root XML Node - We should be trying to infer the following root nodes when parsing responses.
- ec2 - Serializes XML responses within an XML root node with the name of the operation's output suffixed with "Response" such as <OperationNameResponse>. (see more)
- query - Serializes a nested element with the name of the operation's output suffixed with "Result" such as <OperationNameResult>. (see more)
JSON Error Parsing - Parse errors as described in Operation error serialization for ALL json-based protocols. This behavior currently only exists for the rest-json protocol and doesn't handle : characters.
JSON Parse Header Values - This PR includes updates to the rest-json parser to handle boolean, float, and double values when represented as strings in a header.
...

nateprewitt

Looks like a good start! I left some initial feedback to make sure we're not regressing things with these changes. Ideally we should be able to tie code blocks to individual test(s) changes.

botocore/parsers.py

nateprewitt · 2024-07-31T15:54:47Z

botocore/parsers.py

-        # This method is needed because we have to special case flattened list
-        # with a serialization name.  If this is the case we use the
-        # locationName from the list's member shape as the key name for the
-        # surrounding structure.
-        if shape.type_name == 'list' and shape.serialization.get('flattened'):
-            list_member_serialized_name = shape.member.serialization.get(
-                'name'
-            )
-            if list_member_serialized_name is not None:
-                return list_member_serialized_name
-        serialized_name = shape.serialization.get('name')
-        if serialized_name is not None:
-            return serialized_name
-        return member_name


Why did we hoist all of this?

From my understanding, the conditional logic we have defined here goes against the guidance provided in the Flattened list serialization guide.

The xmlName trait applied to the member of a list has no effect when serializing a flattened list into a structure/union. For example, given the following:

union Choice {
    @xmlFlattened
    flat: MyList
}

list MyList {
    @xmlName("Hi")
    member: String
}

The XML serialization of Choice is:

<Choice>
    <flat>example1</flat>
    <flat>example2</flat>
    <flat>example3</flat>
</Choice>

Hmm, I took another look at this and did find some outliers.
I wrote a script to parse all service model files to look for list shapes with "flattened":true and a locationName. The results are posted below:

Detecting flattened list shapes with a locationName:

s3control:
* StorageLensConfigurationList - StorageLensConfiguration
* StorageLensGroupList - StorageLensGroup

sdb:
* AttributeList - Attribute
* AttributeNameList - AttributeName
* DeletableItemList - Item
* DomainNameList - DomainName
* ItemList - Item
* ReplaceableAttributeList - Attribute
* ReplaceableItemList - Item

These services don't follow the expected behavior indicated by the Smithy guidance. Looking more into this.
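A minimal sketch of how such a scan might work is below. It is not the script used for the results above: the function name is hypothetical, `example_model` is made-up data, and the sketch assumes the service model JSON stores `flattened` on the list shape and `locationName` on its `member`.

```python
def find_flattened_lists_with_member_name(service_model):
    """Yield (list shape name, member locationName) for flattened lists."""
    for name, shape in sorted(service_model.get('shapes', {}).items()):
        if shape.get('type') != 'list' or not shape.get('flattened'):
            continue
        location_name = shape.get('member', {}).get('locationName')
        if location_name is not None:
            yield name, location_name


# Hypothetical model fragment, used only to exercise the function.
example_model = {
    'shapes': {
        'AttributeList': {
            'type': 'list',
            'member': {'shape': 'Attribute', 'locationName': 'Attribute'},
            'flattened': True,
        },
        'PlainFlatList': {
            'type': 'list',
            'member': {'shape': 'String'},
            'flattened': True,
        },
        'WrappedList': {
            'type': 'list',
            'member': {'shape': 'Item', 'locationName': 'Item'},
        },
    }
}

for list_name, member_name in find_flattened_lists_with_member_name(example_model):
    print(f'* {list_name} - {member_name}')
```

Running the same function over each loaded service model file would produce a per-service listing like the one above.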

nateprewitt · 2024-07-31T15:56:20Z

botocore/parsers.py

+        if not response.get('body'):
+            return {}


This is an error today because it means we got back a malformed response from the service. I think this only arises with S3 in specific edge cases with their deferred 200 responses. What test cases are enforcing this requirement now?

This was a misunderstanding on my part. There is a new query protocol test case (id:QueryEmptyInputAndEmptyOutput) that ensures SDKs properly parse responses for output shapes with no members to {}.

These responses normally look like below:

<ExampleResponse xmlns="...">
    <ExampleResult/>
    <ResponseMetadata>
        <RequestId>778994ee-73cf-4128-a909-55c10282758c</RequestId>
    </ResponseMetadata>
</ExampleResponse>

The smithy test should be updated to use a response similar to above.

I went back to this and will remove it from the next revision.
I misunderstood a test case; a fix needs to be made upstream in Smithy.

tests/unit/test_protocols.py

nateprewitt · 2024-07-31T17:17:07Z

tests/unit/test_protocols.py

+    if protocol_type in ['query', 'ec2']:
+        if expected.get('headers', {}).get('Content-Type'):
+            expected['headers']['Content-Type'] += '; charset=utf-8'


Why are we only getting charset information with query/ec2?

For the query and ec2 protocols, we specifically set the Content-Type header to application/x-www-form-urlencoded; charset=utf-8. The smithy protocol tests only expect application/x-www-form-urlencoded. This line adds the missing ; charset=utf-8 to match what we expect for our SDK.

tests/unit/test_protocols.py

nateprewitt · 2024-07-31T17:20:08Z

tests/unit/test_protocols.py

+    except (json.JSONDecodeError, ET.ParseError):
+        assert_equal(actual_body, expected_body, 'Body value')


Do we have test cases that are giving us back bad bodies that we expect to work? This seems like an anti-pattern because we're potentially letting breakages through the test suite with this.

There are many cases in rest-json and rest-xml that expect bodies not in JSON or XML format. If it fails to parse, we still assert the exact string value.

nateprewitt · 2024-07-31T17:21:25Z

tests/unit/test_protocols.py

+        if value in [float('Infinity'), float('-Infinity')] or math.isnan(
+            value
+        ):


Is there something unique about NaN that we don't use float('NaN') for the check like we do inf?

Yea, NaN is unique in that it isn't equal to any value, including itself. Using the in or == operators won't work with float('NaN') since the comparison will always evaluate to False. Because NaN is the only float that doesn't equal itself, we can check for NaN with value != value. I can update to use this option if preferred.
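A small sketch of the comparison behavior and of converting special floats back to their string form is below. `stringify_special_floats` is a hypothetical helper name, not the function in test_protocols.py.

```python
import math


def stringify_special_floats(value):
    """Map NaN and +/-Infinity to the strings used by the protocol tests."""
    if isinstance(value, float):
        if math.isnan(value):
            return 'NaN'
        if math.isinf(value):
            return 'Infinity' if value > 0 else '-Infinity'
    return value


nan = float('NaN')
print(nan == nan)  # NaN never compares equal, even to itself
print(nan != nan)  # so this self-inequality check identifies NaN
print(math.isnan(nan))  # the explicit, more readable check
```

Either `value != value` or `math.isnan(value)` works for the NaN case; the latter states the intent directly.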

jonathan343

Replied/resolved some of the comments and suggestions. Still looking deeper into a few of them but keeping changes local for now because I think GitHub will hide the unresolved comments and make it harder to keep track of them if I push new changes.

botocore/parsers.py

jonathan343 · 2024-08-01T22:59:34Z

tests/unit/test_protocols.py

+    if protocol_type in ['query', 'ec2']:
+        if expected.get('headers', {}).get('Content-Type'):
+            expected['headers']['Content-Type'] += '; charset=utf-8'


For the query and ec2 protocols, we specifically set the Content-Type header to application/x-www-form-urlencoded; charset=utf-8. The smithy protocol tests only expect application/x-www-form-urlencoded. This line adds the missing ; charset=utf-8 to match what we expect for our SDK.

jonathan343 · 2024-08-02T12:45:39Z

tests/unit/test_protocols.py

+        if value in [float('Infinity'), float('-Infinity')] or math.isnan(
+            value
+        ):


Yea, NaN is unique in that it isn't equal to any value, including itself. Using the in or == operators won't work with float('NaN') since the comparison will always evaluate to False. Because NaN is the only float that doesn't equal itself, we can check for NaN with value != value. I can update to use this option if preferred.

jonathan343 · 2024-08-02T17:46:47Z

botocore/parsers.py

-        # This method is needed because we have to special case flattened list
-        # with a serialization name.  If this is the case we use the
-        # locationName from the list's member shape as the key name for the
-        # surrounding structure.
-        if shape.type_name == 'list' and shape.serialization.get('flattened'):
-            list_member_serialized_name = shape.member.serialization.get(
-                'name'
-            )
-            if list_member_serialized_name is not None:
-                return list_member_serialized_name
-        serialized_name = shape.serialization.get('name')
-        if serialized_name is not None:
-            return serialized_name
-        return member_name


From my understanding, the conditional logic we have defined here goes against the guidance provided in the Flattened list serialization guide.

The xmlName trait applied to the member of a list has no effect when serializing a flattened list into a structure/union. For example, given the following:

union Choice {
    @xmlFlattened
    flat: MyList
}

list MyList {
    @xmlName("Hi")
    member: String
}

The XML serialization of Choice is:

<Choice>
    <flat>example1</flat>
    <flat>example2</flat>
    <flat>example3</flat>
</Choice>

jonathan343 · 2024-08-02T17:54:14Z

botocore/parsers.py

@@ -586,14 +585,24 @@ def _parse_body_as_xml(self, response, shape, inject_metadata=True):
                start = self._find_result_wrapped_shape(
                    shape.serialization['resultWrapper'], root
                )
+            else:


This logic was added in an effort to align with the following serialization guidance:

Query:

The awsQuery protocol serializes XML responses within an XML root node with the name of the operation's output suffixed with "Response". A nested element, with the name of the operation's output suffixed with "Result", contains the contents of the successful response.
ref: https://smithy.io/2.0/aws/protocols/aws-query-protocol.html#response-serialization

EC2:

The ec2Query protocol serializes XML responses within an XML root node with the name of the operation's output suffixed with "Response", which contains the contents of the successful response.
ref: https://smithy.io/2.0/aws/protocols/aws-ec2-query-protocol.html#response-serialization

Note: The next revision doesn't hardcode "Result" and instead uses a ROOT_NODE_SUFFIX constant that is defined as ROOT_NODE_SUFFIX = 'Result' for QueryParser and ROOT_NODE_SUFFIX = 'Response' for EC2QueryParser.

jonathan343 · 2024-08-02T17:59:26Z

botocore/parsers.py

        code = body.get('__type', response_code and str(response_code))
        if code is not None:
+            # The "Code" value can come from either a response
+            # header or a value in the JSON body.
+            if 'x-amzn-errortype' in response['headers']:
+                code = response['headers']['x-amzn-errortype']
+            elif 'code' in body or 'Code' in body:
+                code = body.get('code', body.get('Code', ''))


The protocol tests require us to support the following requirement for operation error parsing:

The component MUST be one of the following: an additional header with the name X-Amzn-Errortype, a body field with the name code, or a body field named __type.

JSON-1.0 - https://smithy.io/2.0/aws/protocols/aws-json-1_0-protocol.html#operation-error-serialization

JSON-1.1 - https://smithy.io/2.0/aws/protocols/aws-json-1_1-protocol.html#operation-error-serialization

REST-JSON - https://smithy.io/2.0/aws/protocols/aws-restjson1-protocol.html#operation-error-serialization

One update I will have to make is to prefer __type for JSON-1.1/1.0 and x-amzn-errortype for REST-JSON.
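A hedged sketch of that lookup with a per-protocol preference is below. The function names and the `prefer_header` flag are hypothetical, header keys are assumed to be lowercased already, and the sanitizing step (dropping everything from the first ":" and everything up to the "#") reflects my reading of the linked Smithy guidance rather than the code in this PR.

```python
def sanitize_error_code(raw):
    """Reduce a raw error type value to the bare error shape name."""
    code = raw.split(':', 1)[0]  # drop a trailing ':http://...' URI
    return code.split('#', 1)[-1]  # drop a leading 'namespace#' prefix


def resolve_error_code(headers, body, prefer_header):
    """Pick the error code from the header or body, in the preferred order."""
    header_value = headers.get('x-amzn-errortype')
    body_value = body.get('__type', body.get('code', body.get('Code')))
    if prefer_header:
        candidates = [header_value, body_value]
    else:
        candidates = [body_value, header_value]
    for candidate in candidates:
        if candidate:
            return sanitize_error_code(candidate)
    return None
```

Under this sketch a REST-JSON parser would call it with `prefer_header=True`, and the JSON-1.0/1.1 parsers with `prefer_header=False`.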

jonathan343 · 2024-08-02T18:13:08Z

botocore/parsers.py

+            if location is None or location not in self.KNOWN_LOCATIONS:
                continue


Yea, you're right. I'll revert this change. It would short circuit earlier, but not necessary.

jonathan343 · 2024-08-02T19:10:53Z

botocore/parsers.py

@@ -994,14 +1020,28 @@ def _handle_string(self, shape, value):
        parsed = value
        if is_json_value_header(shape):
            decoded = base64.b64decode(value).decode(self.DEFAULT_ENCODING)
-            parsed = json.loads(decoded)
+            parsed = json.dumps(json.loads(decoded))


Maybe I misunderstood this, but I assumed if this method is handling a string value, we should be returning the string representation of the parsed json object.
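To show what the one-line change affects, the sketch below decodes a made-up base64 JSON header value both ways. The header content is hypothetical; only `base64` and `json` from the standard library are used.

```python
import base64
import json

# Hypothetical header value: base64-encoded JSON with irregular spacing.
header_value = base64.b64encode(b'{"foo":   "bar"}').decode('utf-8')

decoded = base64.b64decode(header_value).decode('utf-8')

# Before the change: the caller receives a parsed Python object.
as_object = json.loads(decoded)

# After the change: the caller receives a re-serialized JSON string.
as_string = json.dumps(json.loads(decoded))

print(type(as_object).__name__, as_object)
print(type(as_string).__name__, as_string)
```

The two differ in return type (`dict` versus `str`), so existing callers that index into the parsed value would be affected by the change.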

jonathan343 · 2024-08-03T00:01:17Z

botocore/parsers.py

+        if isinstance(value, str):
+            if value == 'true':
+                return True
+            else:
+                return False


Updating this to use ensure_boolean. Was the concern with only accepting true and not True? Or with the change as a whole?
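For reference, a case-insensitive version of the check could be sketched as follows. `parse_boolean` is a hypothetical standalone function; whether it matches ensure_boolean exactly is an assumption on my part.

```python
def parse_boolean(value):
    """Interpret a header value as a boolean, accepting 'true' in any case."""
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        return value.lower() == 'true'
    # Anything else (None, numbers, ...) is treated as False in this sketch.
    return False
```

Unlike the `value == 'true'` comparison in the diff, this accepts both `true` and `True`.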

codecov-commenter · 2024-08-21T23:08:34Z

Codecov Report

Attention: Patch coverage is 99.24242% with 1 line in your changes missing coverage. Please review.

Project coverage is 93.10%. Comparing base (c68aa1a) to head (2ec3a18).
Report is 113 commits behind head on develop.

Files	Patch %	Lines
botocore/serialize.py	98.93%	1 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #3227      +/-   ##
===========================================
- Coverage    93.12%   93.10%   -0.03%     
===========================================
  Files           66       66              
  Lines        14252    14354     +102     
===========================================
+ Hits         13272    13364      +92     
- Misses         980      990      +10


jonathan343 force-pushed the protocol-tests branch from 56eb039 to cdea655 Compare July 29, 2024 17:44

nateprewitt requested changes Jul 31, 2024

jonathan343 commented Aug 3, 2024

jonathan343 force-pushed the protocol-tests branch from 5fa0e66 to 428b418 Compare August 16, 2024 14:25

jonathan343 added 8 commits August 16, 2024 07:31

Fix minor typos in test_protocols.py

45ba138

Add smithy generated response parsing protocol tests

4714a35

Add smithy generated request serialization protocol tests

9a94f30

Resolve a majority of the response parsing test cases.

0559d2e

Update test_input_compliance to support new protocol tests

772fd81

Resolve a majority of the request serialization test cases.

6d55023

Fix more input serialization issues.

7db1cf3

Partially address PR feedback.

752465c

jonathan343 force-pushed the protocol-tests branch from 428b418 to 752465c Compare August 16, 2024 14:32

jonathan343 added 2 commits August 19, 2024 15:26

Implement granular protocol tests ignore list.

35aeaf6

Clean up and more CR feedback.

6afd162

Use UTF-8 by default on windows to match the expected output from Smithy.

2ec3a18
