-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Faster convert for uv_write, no huge boiler-plate assembly, and not throw InexactError potentially while printing #56349
base: master
Are you sure you want to change the base?
Conversation
Typically with performance PRs one also provide a benchmark to show that the PR has any effect in practice. Could you do that? |
I have no benchmakrs, microbenchmarking likely ineffective, less L1 cache use would help indirectly not crowding out for other use with this 1 or zero assembly instructions on 64-bit now, and pretty sure faster on 32-bit too, and better there (arguably, not just for performance). Feel free to backport (or not, if unsure, think risky), but add backport lablel so can be considered later and at least merge for master? I just thought this would be a simple change, one line, low-hanging fruit. Getting rid of malloc is my real goal, found it here: Line 1114 in 643dd4a
alloca might work, but would be a separate PR, I think. Do you think eliminating malloc is not worthy (or easily possible) since in immediate free after?
If this is never larger could fit on the stac, with alloca. It wasn't easy to find Line 25 in fcf7ec0
UV_REQ_TYPE_MAP seems related and I only found that here: https://docs.libuv.org/en/v1.x/request.html [The CI error was actually mine.. now fixed.] New CI failure (not mine this time?!):
|
@PallHaraldsson it is very difficult to follow your explanations as to what motivates this PR. In the description you talk about something being "faster" but I see no evidence or at least plausible explanation what should be faster here and why. On the other than it seems this PR is about writing more than 2GB at a time on 32bit systems. The change you make seems to be to have it throw an exception at the start and not at the end, with a maybe more helpful error message? What is it now? |
It was a one-line change in the beginning. I saw un-needed bloated assembly code (on 64-bit), with I thought about effect on 32-bit, but then made changes to conform to the original return type (for 32-bit platforms), why the code is a bit non-obvious... after my further change. I'm not very concerned making 32-bit faster, but I think it's at least not slower, and with such systems often having less (L1) cache likely more helpful there. |
Exactly (but wasn't the original point or then in the code). You can construct string larger than 2GB on 32-bit, AND then print them (I didn't test), and THEN you would get InexactError thrown, so it's not hugely important to be able to print first. It's of course a very obscure error to throw (and after the fact) and if you try to catch the exception and continue, then it seems like a bug, at least a memory leak if it would actually work. I kind of liked just returning UInt64 also on 32-bit, would have been simpler code. :) And no throwing or need to explain workaround. No real CI failure (e.g. not for 32-bit, what I was last fixing; there are some false alarm CI failures). |
8af10ed
to
22f6ad8
Compare
Using % Int would work up to 2 GB strings only on 32-bit. Maybe keep status quo there, or unproblematic to return In64 there too not Int32 as before?
Co-authored-by: Max Horn <[email protected]>
22f6ad8
to
fcc972b
Compare
Using
% Int
, unproblematic/faster on 64-bit, works up to 2 GB strings only on 32-bit, but before this change printed larger strings but then then throws inexact, so kind of didn't work anyway.