Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add trailing_commas to dump/dumps #372

Merged
merged 6 commits into from
Oct 21, 2024

Conversation

mavjop
Copy link
Contributor

@mavjop mavjop commented Oct 10, 2024

Issue #, if available:
Issue # 370

Description of changes:

Add an optional boolean argument trailing_commas to the dump and dumps methods in simpleion. Default: False.

When trailing_commas=True, and when indent is not None, this causes dump[s] to include a comma on the last item in a container.

Note: Since the C module already does not support pretty printing, this flag does nothing when using the C module.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Add an optional boolean argument `trailing_commas` to the `dump`
and `dumps` methods in `simpleion`. Default: False.

When `trailing_commas=True`, and when `indent` is not `None`, this
causes `dump[s]` to include a comma on the last item in a container.

**Note:** Since the C module already **does not support pretty
printing**, this flag does nothing when using the C module.
Stephen Jacob added 3 commits October 10, 2024 10:12
Make the `trailing_commas` flag do what it says: Only apply when pretty
printing (`indent is not None`).

Fix all remaining `test_simpleion.py` failures.
@@ -121,11 +121,15 @@ def dump(obj, fp, imports=None, binary=True, sequence_as_stream=False, indent=No
indent (Str): If binary is False and indent is a string, then members of containers will be pretty-printed with
a newline followed by that string repeated for each level of nesting. None (the default) selects the most
compact representation without any newlines. Example: to indent with four spaces per level of nesting,
use ``' '``.
use ``' '``. Supported only in the pure Python implementation (because pretty printing is not yet
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

np!

@@ -369,7 +369,7 @@ def _raw_writer_coroutine(depth=0, container_event=None, whence=None, indent=Non
ion_event, self = (yield transition)
delegate = self

if has_written_values and not ion_event.event_type.ends_container:
if has_written_values and ((indent and trailing_commas) or not ion_event.event_type.ends_container):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be wrong but I think this will also add a space after the last item in an s-exp: (foo bar bar ) I'm not sure that's bad but I don't think it's intended.

Copy link
Contributor Author

@mavjop mavjop Oct 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I did think you were right, but now I actually think not, because there's a separate set of delimiters defined for use when pretty printing:

_CONTAINER_DELIMITER_MAP_PRETTY = {
    IonType.STRUCT: b',',
    IonType.LIST: b',',
    IonType.SEXP: b'', # we use newlines when pretty printing
}

which uses an empty string as the delimiter for s-expressions, since they add newlines.

I could test this to make sure, but I'm pretty sure.

If we do want to apply this to non-pretty-printing, then we could do that, but then your concern expressed here would need an additional work-around, or there would indeed be an extra space.

I think the solution, if needed, would be to fork:

_CONTAINER_DELIMITER_MAP_NORMAL = {
    IonType.STRUCT: b',',
    IonType.LIST: b',',
    IonType.SEXP: b' ',
}

into:

_CONTAINER_DELIMITER_MAP_NORMAL = {
    IonType.STRUCT: b',',
    IonType.LIST: b',',
    IonType.SEXP: b' ',
}
_CONTAINER_END_DELIMITER_MAP_NORMAL = {
    IonType.STRUCT: b',',
    IonType.LIST: b',',
    IonType.SEXP: b'',
}

and use _CONTAINER_END_DELIMITER_MAP_NORMAL for the final trailing delimiter.

tuple_as_sexp (Optional[True|False]): When True, all tuple values will be written as Ion s-expressions.
When False, all tuple values will be written as Ion lists. Default: False.
omit_version_marker (Optional[True|False]): If binary is False and omit_version_marker is True, omits the
Ion Version Marker ($ion_1_0) from the output. Default: False.
trailing_commas (Optional[True|False]): If binary is False and pretty printing (indent is not None), includes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: it may actually be clearer to a user to make it independent of indent I know you don't intend to use it that way, and TBH I don't know the value of doing so, but it would be avoid the need to explain and users understand the dependent relationship, or test for it in the code. 🤷

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a thought.

I actually had it accidentally including trailing commas in non-pretty-printed output for the first revision, so it would be easy to put back to that (except for the problem explained in the next paragraph). I guess if you say you don't want indent but you do want trailing commas, there's no strong reason not to.

However ... there is one reason not to. This is a reason why I'd rather that be a feature we could consider but not try to include here. That reason is that the Python C module implementation already doesn't support pretty-printing, so I was able to leave it out of scope. If we make trailing_commas applicable to non-pretty-printed output, then I have to go and do all the C module implementation.

I'd rather not hold this PR up to implement non-pretty-printed trailing-comma output in the C module; especially since I'm not sure we're likely to see any strong demand for trailing commas in non-pretty-printed output. I guess we could make it not depend on indent, but still disclaim it as "not supported in the C implementation". So that's an option.

@mavjop
Copy link
Contributor Author

mavjop commented Oct 11, 2024

Ouch, that's a lot of failed checks. :/ I guess I'll have to have a look into those at some point.

So I don't go on any wild goose chases, do we normally expect all of those checks to pass?

[UPDATE: Looking more, most of them are segfaults and stuff that don't look in any way potentially related to this PR, so I suspect this is (sadly) the status quo, and not something caused by this code change.]

"""C-extension implementation. Users should prefer to call ``dump``."""

res = ionc.ionc_write(obj, binary, sequence_as_stream, tuple_as_sexp)
res = ionc.ionc_write(obj, binary, sequence_as_stream, tuple_as_sexp, trailing_commas)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This call is what is causing the segfault in the unit tests. ionc_write isn't expecting to get the trailing_commas argument, and is not handling it well.

If you want to put in a # TODO:, and drop the trailing_commas from the ionc.ionc_write call, I can dig into why it isn't handling that a bit more gracefully, and that should clear up the test errors (the ones that aren't already explained).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops! I meant to omit the argument from the call to ionc...

I'll fix that and post a new rev.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, I must be confused. I'm not passing the arg to the C code, as far as I can see?
(Note lack of trailing_commas=trailing_commas in the first branch.)

    if c_ext and __IS_C_EXTENSION_SUPPORTED and (imports is None and indent is None):
        return dump_extension(obj, fp, binary=binary, sequence_as_stream=sequence_as_stream,
                              tuple_as_sexp=tuple_as_sexp, omit_version_marker=omit_version_marker)
    else:
        return dump_python(obj, fp, imports=imports, binary=binary, sequence_as_stream=sequence_as_stream,
                           indent=indent,
                           tuple_as_sexp=tuple_as_sexp, omit_version_marker=omit_version_marker,
                           trailing_commas=trailing_commas)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, oops, in dump_extension. D'oh.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

(Removed trailing_commas from dump_extension() and added a TODO).

Copy link
Contributor

@nirosys nirosys left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@nirosys nirosys merged commit 479ed74 into amazon-ion:master Oct 21, 2024
35 of 45 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants