-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add builtin support for truncation #25
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thx for the detailed PR message.
Reviewed 5 of 6 files at r1.
Reviewable status: 5 of 6 files reviewed, 4 unresolved discussions (waiting on @abarganier, @knz, @kzh, and @rimadeodhar)
markers_test.go, line 237 at r1 (raw file):
) }}, 20)) }, "Inside saf... ‹uns›..."},
btw yeah I see how the nested truncate looks odd with the multiple ...
in there. I think it's something we can get feedback from users on once people have a chance to use the truncation feature. I imagine these complex cases will be less common since we'll truncate at the top level usually.
interfaces/interfaces.go, line 151 at r1 (raw file):
} type TruncatedValue struct {
nit: struct inside a file named interfaces.go
. Maybe there's a better place for it.
internal/buffer/buffer.go, line 37 at r1 (raw file):
// TODO(kzh): describe these fields runes int limit int
nit: wonder if runeCount
and runeLimit
might be better...?
internal/buffer/buffer.go, line 149 at r1 (raw file):
func (b *Buffer) WriteString(s string) (n int, err error) { if b.limit != 0 { r := []rune(s)
move r
definition below if statement.
internal/buffer/buffer.go, line 222 at r1 (raw file):
} type TruncateState struct {
nit: add docstring
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the experiment.
There are two discussions needed here.
- I am not too convinced yet by the idea of a helper
Truncate
function. There are several unanswered questions with it:
- What does
Truncate(Truncate(a, 20), 3)
even mean? What aboutT(T(a, 3), 20
? i.e. does it commute? do we want it to? Is the current behavior of the implementation deliberate or merely an accident? - What does
Safe(Truncate(x, n))
mean? (This needs tests) - What does
Truncate(x, 0)
mean? (This needs tests) - What does
Truncate(x, n)
do ifx
is formatted using a mix of safe and unsafe sub-strings? (This needs tests)
If I am not mistaken, the current behavior of Truncate(x, n)
is (approximately)
- format x as usual
- truncate x to n characters, or n-1 if it starts with a redaction marker
- re-introduce a closing redaction marker if it was truncated away
Then, why not implement Truncate
like this - without changing the formatting code?
- when we first discussed this, you had proposed a formatting flag, for example
%#20v
. Why did you not do this? It looks to me as if the implementation of this would be simpler.
Reviewed 5 of 6 files at r1.
Reviewable status: all files reviewed, 6 unresolved discussions (waiting on @abarganier, @kzh, and @rimadeodhar)
internal/buffer/buffer.go, line 149 at r1 (raw file):
func (b *Buffer) WriteString(s string) (n int, err error) { if b.limit != 0 { r := []rune(s)
please don't implement this like this. The redact library really means to go the longest way possible to avoid heap allocations. Here you're creating a new slice on the heap for each string argument and using sub-slices for something that could be implemented with a for loop.
internal/markers/constants.go, line 28 at r1 (raw file):
EscapeMarkS = string(EscapeMark) RedactedS = StartS + "×" + EndS Truncate = `...`
I'd recommend …
instead.
f3d2d0e
to
f79d56f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback @dhartunian and @knz! Sorry for the long wait. I brushed this to the side for a bit to focus on other work (namely the Datadog Agent in CC).
What does Truncate(Truncate(a, 20), 3) even mean? What about T(T(a, 3), 20? i.e. does it commute? do we want it to? Is the current behavior of the implementation deliberate or merely an accident?
The current behavior is deliberate. In the cases of nested truncation operations, the limit of each inner truncation is min
'ed against the outer limit. This logic happens in the Truncate
method of the Buffer
struct. So T(T(a, 20), 3)
would effectively be the same as T(T(a, 3), 3)
since 20 > 3.
The usage of nested truncation will likely be rare. Support for it was added just in case someone tries:
type Foo struct {
a string
b string
}
func (f Foo) SafeFormat(w redact.SafePrinter, _ rune) {
w.Print(redact.Truncate(f.a, 10), redact.Truncate(f.b, 15))
}
log.Println(redact.Truncate(Foo{}, 20))
What does Safe(Truncate(x, n)) mean? (This needs tests)
The Truncate operation does not affect the "safe/unsafe-ness" of a value. So Safe(Truncate(x, n)) should mean the same as Safe(x) but truncated to n runes.
What does Truncate(x, 0) mean? (This needs tests)
Truncate only applies when there is a positive limit. The entire value of x
would be outputted here.
What does Truncate(x, n) do if x is formatted using a mix of safe and unsafe sub-strings? (This needs tests)
It would first limit the output x to n runes. If the truncation ends with an open redaction marker, the ending redaction marker would be promptly appended in addition to a truncation marker afterwards. So: Truncate(Unsafe("hello", 3)) -> ‹hel›…
.
If I am not mistaken, the current behavior of Truncate(x, n) is (approximately)
The truncation logic is mostly embedded into the Buffer
struct write methods. Truncate sets a limit in the struct's field runeLimit
. All future write operations then keeps count of runes written in the buffer and will restrict writing when this limit is hit. This is done this way to prevent unnecessary memory allocation that happens when the buffer grows. This is a key detail since truncation doesn't actually truncate once after the entire output is generated. It performs this operation as the buffer is being written to. Something else to note is that all markers are excluded from this rune count.
Then, why not implement Truncate like this - without changing the formatting code?
when we first discussed this, you had proposed a formatting flag, for example %#20v. Why did you not do this? It looks to me as if the implementation of this would be simpler.
In my mind, the implementation would have been somewhat similar with the bulk of the logic happening within the Buffer
struct write operations. It was just a matter of how this truncation feature is exposed to developers that use this library. I opted for this Truncate
function since it makes this feature more visible/discoverable from a developer's perspective than a formatting flag which is a bit much more subtle. It also makes any code that uses this functionality more understandable from a glance rather than cryptic. You can somewhat tell right off the bat what log.Info(redact.Truncate("hello", 5)
intends to do versus log.Infof("%#5v", "hello")
. I don't have a strong opinion on this and totally super cool to switch over, but I feel the underlying implementation would still be somewhat similar.
Reviewable status: 1 of 6 files reviewed, 4 unresolved discussions (waiting on @abarganier, @dhartunian, @knz, and @rimadeodhar)
markers_test.go, line 237 at r1 (raw file):
Previously, dhartunian (David Hartunian) wrote…
btw yeah I see how the nested truncate looks odd with the multiple
...
in there. I think it's something we can get feedback from users on once people have a chance to use the truncation feature. I imagine these complex cases will be less common since we'll truncate at the top level usually.
👍
interfaces/interfaces.go, line 151 at r1 (raw file):
Previously, dhartunian (David Hartunian) wrote…
nit: struct inside a file named
interfaces.go
. Maybe there's a better place for it.
I think the file name might mean more broadly interfaces in general, not strictly limited specifically to interfaces in Go. This is judging off of the fact that there are already some non interface type definitions in there. However, I could very well be misinterpreting this 🤔
internal/buffer/buffer.go, line 37 at r1 (raw file):
Previously, dhartunian (David Hartunian) wrote…
nit: wonder if
runeCount
andruneLimit
might be better...?
Done.
internal/buffer/buffer.go, line 149 at r1 (raw file):
Previously, dhartunian (David Hartunian) wrote…
move
r
definition below if statement.
Done.
internal/buffer/buffer.go, line 149 at r1 (raw file):
Previously, knz (kena) wrote…
please don't implement this like this. The redact library really means to go the longest way possible to avoid heap allocations. Here you're creating a new slice on the heap for each string argument and using sub-slices for something that could be implemented with a for loop.
Done. Is this what you were thinking of?
internal/buffer/buffer.go, line 222 at r1 (raw file):
Previously, dhartunian (David Hartunian) wrote…
nit: add docstring
Done.
internal/markers/constants.go, line 28 at r1 (raw file):
Previously, knz (kena) wrote…
I'd recommend
…
instead.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 3 files at r4.
Reviewable status: 2 of 6 files reviewed, 9 unresolved discussions (waiting on @abarganier, @dhartunian, @knz, and @kzh)
internal/buffer/buffer.go, line 140 at r4 (raw file):
if b.runeLimit != 0 { if b.runeLimit == b.runeCount { if len(p) != 0 {
Nit: You can use length here.
internal/buffer/buffer.go, line 143 at r4 (raw file):
Quoted 5 lines of code…
if b.runeLimit == b.runeCount { if len(p) != 0 { b.excess = true } return
Seems like this can be made simpler
if b.runeLimit == b.runeCount {
b.excess = length != 0
return
}
internal/buffer/buffer.go, line 201 at r4 (raw file):
} excess = len(p) != 0 return
Don't need this return statement right? It's at the end of the function?
internal/buffer/buffer.go, line 214 at r4 (raw file):
} excess = len(str) != 0 return
Ditto.
internal/buffer/buffer.go, line 222 at r4 (raw file):
if b.runeLimit == b.runeCount { b.excess = true return nil
Do we not consider truncation to be an error? i.e. we don't actually write the byte in this case but simply set the excess field right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder what happens if you specify Truncate with an object (say, a struct or an error) which has a SafeFormat method that mixes multiple redactable and non-redactable substrings. If my reading of the code is right, we'll get multiple ellipses in that case. (This needs tests!)
I think that if you combine the %xxv
approach and also do not use a separate TruncatedValue
type (i.e. do not implement this logic in internal.Buffer, and instead do it in the printf code), this will remove the artifacts when an object that contains multiple sub-fields gets formatted.
Meanwhile, from a purely engineering perspective:
The truncation logic is mostly embedded into the Buffer struct write methods
This still feels strange to me. This design means we're paying overhead for a rarely-used feature in the common case.
Reviewable status: 2 of 6 files reviewed, 9 unresolved discussions (waiting on @abarganier, @dhartunian, @knz, and @kzh)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As found out in PR #26, you'll also need to double check the behavior of %q
and %x
with truncation
As well as the behavior of %v
with a nil
pointer, i.e. what does Truncate(nil)
do
(In fact I recommend you pilfer the unit tests in that other PR and find out what they reveal about your implementation)
Reviewable status: 2 of 6 files reviewed, 9 unresolved discussions (waiting on @abarganier, @dhartunian, @knz, and @kzh)
This PR adds in truncation support to the redact library. This is a proof of concept of what this feature could look like. It is exposed through the API as a wrapper function similar to
redact.Safe
.An example usage is:
Truncation limits the rune length of a printed value and appends a truncation marker
...
in the case that the limit is exceeded. The motivation behind this feature is from cockroachdb/cockroach#67065 where printed keys can take too much screen real estate hence the need for truncation. It seems appropriate to insert this logic into the redaction library since the notation of unsafe/safe segments of a printed value is construed within. It would feel somewhat redundant to interpret these semantics outside.Regarding overhead, the truncation logic is embedded in such a way that there is minimal additional performance/memory demands when writing to the buffer and only when necessary.
Notes:
I'm unsure if this is the expected behavior of truncation. I'm a tad worried this could be somewhat misleading in suggesting that the unsafe segment is complete in contrast to "‹hel...›".
@knz I'd love to hear your thoughts on this!
This change is