Avoid per-byte loop in cstring{,Utf8} builders #569
96880aa to 266d6da
The emulated CI build failures are spurious/systemic, not related to the PR. If I add a couple of new benchmarks that use somewhat longer string literals in builders:
--- a/bench/BenchAll.hs
+++ b/bench/BenchAll.hs
@@ -259,6 +259,8 @@ main = do
, benchB' "UTF-8 String" () $ \() -> P.cstringUtf8 "hello world\0"#
, benchB' "String (naive)" "hello world!" fromString
, benchB' "String" () $ \() -> P.cstring "hello world!"#
+ , benchB' "AsciiLit64" () $ \() -> P.cstring "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"#
+ , benchB' "Utf8Lit64" () $ \() -> P.cstringUtf8 "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx\xc0\x80xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"#
]
  , bgroup "Encoding wrappers"
The relevant benchmark results (GHC 9.4.5) are:
The baseline master branch run was:
Thanks for this. I was also looking into this but hadn't pushed anywhere public because I didn't want to give myself another excuse to delay 0.11.4.0. I agree the CI failures look spurious. The i386 CI job is currently broken, but I've retried hoping the others will pass. Your I will take a closer look later. |
The branching logic can potentially be simplified some. Currently we ask:
- Are we done?
- Is there a null to decode?
- Is the output buffer full?
- Are there any non-nulls to copy?
But we can also ask only:
- Is there a null to decode? (If we are done, the answer will be no.)
- Does the decoded string up to and including that null to decode fit in the output buffer? (If not, copy as much as possible and report a full buffer.)
That would mean we perform extra zero-length memcpys in some cases, particularly when there are consecutive (encoded) nulls, so it's not a clear win a priori. But it may be worth investigating.
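The two-question variant suggested above can be sketched as a pure model over byte lists. This is only an illustration, not the code in the PR: `findNul` and `stepSimplified` are hypothetical names, `splitAt` stands in for `memcpy`, and the list search stands in for `strstr`.

```haskell
import Data.List (isPrefixOf)
import Data.Word (Word8)

-- | Offset of the first encoded NUL (0xC0 0x80), if any.
-- Hypothetical helper playing the role of strstr(3).
findNul :: [Word8] -> Maybe Int
findNul = go 0
  where
    go _ [] = Nothing
    go i bs@(_ : rest)
      | [0xC0, 0x80] `isPrefixOf` bs = Just i
      | otherwise = go (i + 1) rest

-- | Fill one output buffer with room for 'avail' bytes.
-- Returns (bytes written, input left for the next buffer).
stepSimplified :: Int -> [Word8] -> ([Word8], [Word8])
stepSimplified avail input =
  case findNul input of
    -- Question 1: is there a NUL to decode, after n plain bytes?
    -- Question 2: do those n bytes plus the decoded NUL fit?
    Just n
      | n + 1 <= avail ->
          let (run, rest)  = splitAt n input  -- may be a zero-length copy
              (out, rest') = stepSimplified (avail - n - 1) (drop 2 rest)
          in (run ++ 0x00 : out, rest')
    -- No NUL ahead, or it does not fit: copy as much as possible.
    -- A non-empty remainder means the buffer is full.
    _ -> splitAt avail input
```

With consecutive encoded NULs, `n` is 0 on each iteration, which is exactly the extra zero-length copy mentioned above.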
9086b60 to e6cc4a2
nitpick: could |
Sure. Done. I do hope we won't forget to squash before merging...
If there's anything further I need to do, please let me know...
I've been a bit sidetracked the last few weeks, sorry.
How is performance affected for strings consisting mostly of null characters? If this patch hurts it some, that's probably OK, but I'd like to know roughly by how much.
!op' = op0 `plusPtr` (nullFree + 1)
nullAt' <- c_strstr ip' modifiedUtf8NUL
modUtf8_step ip' len' nullAt' k (BufferRange op' ope)
| avail > 0 = do
Same question, but also avail == 0 should be a very rare case.
@vdukhovni please rebase to trigger updated CI jobs.
44fdcbc to 0645428
Done.
LGTM modulo naming nitpicking!
@vdukhovni could you possibly address @clyring's questions?
-- | GHC represents @NUL@ in string literals via an overlong 2-byte encoding,
-- which is part of "modified UTF-8" (GHC does not also implement CESU-8).
modifiedUtf8NUL :: CString
modifiedUtf8NUL = Ptr "\xc0\x80"#
- modifiedUtf8NUL = Ptr "\xc0\x80"#
+ modUtf8NUL = Ptr "\xc0\x80"#
Let's keep the prefix consistent.
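For reference, the decoding that the overlong NUL encoding requires can be written as a per-byte model. This is a minimal sketch over plain byte lists, with `decodeModUtf8` as a hypothetical name; it describes the expected output only and is the slow per-byte style that this PR aims to avoid in the real builders.

```haskell
import Data.Word (Word8)

-- | Replace each overlong two-byte NUL (0xC0 0x80) with a single 0x00.
-- Every other byte is passed through unchanged.
decodeModUtf8 :: [Word8] -> [Word8]
decodeModUtf8 (0xC0 : 0x80 : rest) = 0x00 : decodeModUtf8 rest
decodeModUtf8 (b : rest)           = b : decodeModUtf8 rest
decodeModUtf8 []                   = []
```

A model like this can serve as an oracle when checking a chunked implementation against arbitrary inputs.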
ping @vdukhovni Do you plan to come back to this patch? Would you like to pass this off to a maintainer?
It's basically ready, right. There were just some cosmetic issues that perhaps a maintainer could tweak to suit their preference and I can review the result? Does that work?
Copy chunks of the input to the output buffer with 'memcpy', up to the shorter of the available buffer space and the "null-free" portion of the remaining string. For the UTF8 version, encoded NUL bytes are located via strstr(3).
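The chunking rule in this commit message can be modelled roughly as follows. This is a pure sketch over byte lists, not the PR's pointer-based code: `nullFreeLen` and `copyStep` are hypothetical names, `splitAt` stands in for `memcpy`, and the prefix scan stands in for strstr(3).

```haskell
import Data.List (isPrefixOf)
import Data.Word (Word8)

-- | Length of the leading run that contains no encoded NUL (0xC0 0x80).
nullFreeLen :: [Word8] -> Int
nullFreeLen bs
  | [0xC0, 0x80] `isPrefixOf` bs = 0
  | otherwise = case bs of
      []         -> 0
      (_ : rest) -> 1 + nullFreeLen rest

-- | Fill one output buffer with room for 'avail' bytes.
-- Returns (bytes written, input left for the next builder step).
copyStep :: Int -> [Word8] -> ([Word8], [Word8])
copyStep avail input
  | null input   = ([], [])     -- done
  | avail <= 0   = ([], input)  -- buffer full
  | nullFree > 0 =              -- copy the shorter of space and run
      let n             = min avail nullFree
          (chunk, rest) = splitAt n input
          (out, rest')  = copyStep (avail - n) rest
      in (chunk ++ out, rest')
  | otherwise =                 -- encoded NUL at the head: emit 0x00
      let (out, rest') = copyStep (avail - 1) (drop 2 input)
      in (0x00 : out, rest')
  where
    nullFree = nullFreeLen input
```

Calling `copyStep` again on the returned remainder with a fresh buffer size mirrors how the rest of a long literal goes into the next builder step.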
0645428 to 01b5f36
Perhaps I can get this over the line. What remains to be done?
-- available buffer space. If the string is long enough, we may have asked
-- for less than its full length, filling the buffer with the rest will go
-- into the next builder step.
| avail > nullFree = do
Could you please check with hpc that tests provide sufficient coverage of all cases here? (Sorry, I'm AFK and cannot check myself)
This PR is languishing. Where do we go from here?
The main questions I had were the ones I raised in this round of review. I've just started to look into them myself since I'd really like this patch to land eventually. Another idea that has since occurred to me is that since |
Perhaps, though one might expect that an optimised C-library |
Heads-up: I'll probably push some updates and finishing touches to this branch later today or tomorrow.
* Do not measure the overhead of allocating destination chunks
* Add several more benchmarks for P.cstring and P.cstringUtf8
(This won't work with -fpure-haskell yet.)
The small-builder benchmarks were set up in a terrible way that made using them to investigate performance very difficult. My recent push should hopefully fix that.
The magic noinline id just isn't available with ghc-8.0...
@@ -84,6 +84,8 @@ module Data.ByteString.Builder.Internal (
-- , sizedChunksInsert

, byteStringCopy
, asciiLiteralCopy
, modUtf8LitCopy
- , modUtf8LitCopy
+ , modUtf8LiteralCopy
For consistency with asciiLiteralCopy (or we might as well choose to use Lit for both)
Here's what the benchmarks currently look like on my machine with Baseline (3ce0346):
Topic (2603009):
Lots of big changes. Some are expected:
But there's also a nasty surprise:
I also wanted to see how the
I have confirmed that reducing the syntactic arity of |
Removing milestone for now.