Faster convert for uv_write, no huge boiler-plate assembly, and not throw InexactError potentially while printing #56349

Open
wants to merge 9 commits into
base: master

PallHaraldsson
Contributor

@PallHaraldsson PallHaraldsson commented Oct 26, 2024

This uses n % Int instead of Int(n). On 64-bit that is unproblematic and faster. On 32-bit it works only for strings up to 2 GB, but before this change larger strings were printed and then an InexactError was thrown, so that case kind of didn't work anyway.

@KristofferC
Member

Typically with performance PRs one also provides a benchmark to show that the PR has any effect in practice. Could you do that?

@inkydragon inkydragon added the performance Must go faster label Oct 27, 2024
@PallHaraldsson PallHaraldsson marked this pull request as draft October 27, 2024 15:52
@PallHaraldsson PallHaraldsson marked this pull request as ready for review October 27, 2024 16:10
@PallHaraldsson PallHaraldsson changed the title Faster convert, no huge boiler-plate assembly Faster convert for uv_write, no huge boiler-plate assembly Oct 27, 2024
@PallHaraldsson
Contributor Author

PallHaraldsson commented Oct 27, 2024

I have no benchmarks; microbenchmarking is likely ineffective here. Using less L1 cache would help indirectly, by not crowding out other uses, now that this is one or zero assembly instructions on 64-bit. I'm pretty sure it is faster on 32-bit too, and better there (arguably, not just for performance).

Feel free to backport (or not, if you are unsure or think it risky), but please add the backport label so it can be considered later, and at least merge for master?

I just thought this would be a simple change, one line, low-hanging fruit.

Getting rid of malloc is my real goal, found it here:

uvw = Libc.malloc(_sizeof_uv_write)

alloca might work, but that would be a separate PR, I think. Do you think eliminating malloc is not worthwhile (or not easily possible), since the memory is freed immediately after?

julia> Int(Base._sizeof_uv_write)
200

If this is never larger, it could fit on the stack, with alloca.
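A minimal sketch of the two allocation styles being discussed, not the code in base/stream.jl. The names REQ_SIZE, with_malloc and with_ref are hypothetical, and REQ_SIZE just reuses the 200 bytes reported above. Julia has no direct alloca, so the second variant uses a fixed-size, GC-managed Ref buffer as a stand-in; the compiler may or may not place it on the stack. It is only valid if nothing uses the pointer after the function returns, which this thread does not establish for uv_write.

```julia
# Assumption: 200 bytes, the value reported above for Base._sizeof_uv_write.
const REQ_SIZE = 200

# Heap version: malloc a scratch buffer, use it, free it right away.
function with_malloc(f)
    p = Libc.malloc(REQ_SIZE)
    p == C_NULL && throw(OutOfMemoryError())
    try
        return f(convert(Ptr{UInt8}, p))
    finally
        Libc.free(p)
    end
end

# malloc-free version: a fixed-size isbits buffer held in a Ref.
# GC.@preserve keeps the buffer alive while the raw pointer is in use.
function with_ref(f)
    buf = Ref{NTuple{REQ_SIZE,UInt8}}()
    GC.@preserve buf begin
        p = Base.unsafe_convert(Ptr{NTuple{REQ_SIZE,UInt8}}, buf)
        return f(convert(Ptr{UInt8}, p))
    end
end

# Example use: write one byte into the scratch buffer and read it back.
roundtrip(p) = (unsafe_store!(p, 0x2a); unsafe_load(p))
from_malloc = with_malloc(roundtrip)
from_ref = with_ref(roundtrip)
```

Both calls hand the callback a pointer to REQ_SIZE bytes of scratch memory; the difference is only who owns that memory and when it is released.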

It wasn't easy to find _sizeof_uv_write, but it seems to be defined here:

for r in uv_req_types

UV_REQ_TYPE_MAP seems related and I only found that here: https://docs.libuv.org/en/v1.x/request.html

[The CI error was actually mine.. now fixed.]

New CI failure (not mine this time?!):

┌ Warning: Process I/O error
│   exception = (UndefVarError(:i, Base), Union{Ptr{Nothing}, Base.InterpreterIP}[Ptr{Nothing}(0xf75f5eda), Ptr{Nothing}(0xe561148a), Ptr{Nothing}(0xe5610685), Ptr{Nothing}(0xe5612262), Ptr{Nothing}(0xe5612dce), Ptr{Nothing}(0xf756538b), Ptr{Nothing}(0xf758acc5)])
└ @ Base process.jl:315
ERROR: LoadError: TaskFailedException
Stacktrace:
  [1] try_yieldto(undo::typeof(Base.ensure_rescheduled))
    @ Base ./task.jl:1104
  [2] wait()
    @ Base ./task.jl:1168
  [3] wait(c::Base.GenericCondition{ReentrantLock}; first::Bool)
    @ Base ./condition.jl:141
  [4] wait
    @ ./condition.jl:136 [inlined]
  [5] take_buffered(c::Channel{String})
    @ Base ./channels.jl:493
  [6] take!
    @ ./channels.jl:487 [inlined]
  [7] iterate(c::Channel{String}, state::Nothing)
    @ Base ./channels.jl:670
  [8] generate_precompile_statements()
    @ Main.anonymous /cache/build/builder-amdci4-2/julialang/julia-master/contrib/generate_precompile.jl:396
  [9] top-level scope
    @ /cache/build/builder-amdci4-2/julialang/julia-master/contrib/generate_precompile.jl:416
 [10] eval(m::Module, e::Any)
    @ Core ./boot.jl:458
 [11] top-level scope
    @ /cache/build/builder-amdci4-2/julialang/julia-master/contrib/generate_precompile.jl:4
 [12] include(mod::Module, _path::String)
    @ Base ./Base.jl:590
 [13] exec_options(opts::Base.JLOptions)
    @ Base ./client.jl:329
 [14] _start()
    @ Base ./client.jl:558
    nested task error: TaskFailedException
    Stacktrace:
      [1] #wait#577
        @ ./task.jl:363 [inlined]
      [2] wait
        @ ./task.jl:360 [inlined]
      [3] wait(x::Base.Process, syncd::Bool)
        @ Base ./process.jl:705
      [4] wait
        @ ./process.jl:687 [inlined]
      [5] success(x::Base.Process)
        @ Base ./process.jl:556
      [6] run(::Base.CmdRedirect; wait::Bool)
        @ Base ./process.jl:513
      [7] run
        @ ./process.jl:510 [inlined]
      [8] (::Main.anonymous.var"#generate_precompile_statements##4#generate_precompile_statements##5"{Channel{String}, String})(prec_path::String)
        @ Main.anonymous /cache/build/builder-amdci4-2/julialang/julia-master/contrib/generate_precompile.jl:338
      [9] mktempdir(fn::Main.anonymous.var"#generate_precompile_statements##4#generate_precompile_statements##5"{Channel{String}, String}, parent::String; prefix::String)
        @ Base.Filesystem ./file.jl:899
     [10] mktempdir (repeats 2 times)
        @ ./file.jl:895 [inlined]
     [11] (::Main.anonymous.var"#generate_precompile_statements##2#generate_precompile_statements##3"{Channel{String}, String})()
        @ Main.anonymous /cache/build/builder-amdci4-2/julialang/julia-master/contrib/generate_precompile.jl:311
        nested task error: UndefVarError: `i` not defined in `Base`
        Stacktrace:
         [1] uv_write(s::Base.PipeEndpoint, p::Ptr{UInt8}, n::UInt32)
           @ Base ./stream.jl:1070
         [2] unsafe_write(s::Base.PipeEndpoint, p::Ptr{UInt8}, n::UInt32)
           @ Base ./stream.jl:1158
         [3] write
           @ ./iobuffer.jl:543 [inlined]
         [4] (::Base.var"#setup_stdio##0#setup_stdio##1"{Base.PipeEndpoint, Base.PipeEndpoint, IOBuffer})()
           @ Base ./process.jl:313
in expression starting at /cache/build/builder-amdci4-2/julialang/julia-master/contrib/generate_precompile.jl:4
Collecting and executing precompile statements
└ Collect (Basic: ◐ ) => Execute ◐ 90
*** This error is usually fixed by running `make clean`. If the error persists, try `make cleanall`. ***
make[1]: *** [sysimage.mk:96: /cache/build/builder-amdci4-2/julialang/julia-master/usr/lib/julia/sys-o.a] Error 1
make[1]: Leaving directory '/cache/build/builder-amdci4-2/julialang/julia-master'
make: *** [Makefile:113: julia-sysimg-release] Error 2

@fingolfin
Contributor

@PallHaraldsson it is very difficult to follow your explanations as to what motivates this PR. In the description you talk about something being "faster", but I see no evidence, or at least a plausible explanation, of what should be faster here and why.

On the other hand, it seems this PR is about writing more than 2GB at a time on 32-bit systems. The change you make seems to be to have it throw an exception at the start and not at the end, with a maybe more helpful error message?

What is it now?

@PallHaraldsson
Contributor Author

PallHaraldsson commented Oct 27, 2024

It was a one-line change in the beginning. I saw unneeded, bloated assembly code (on 64-bit) with @code_native. Int(n) always performs a check (when converting from UInt), while n % Int doesn't.
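To make the difference concrete, here is a small sketch of the two conversions on plain values. The variable names are made up for illustration and stand in for the unsigned byte count; this is not the code from base/stream.jl.

```julia
# Assumption: these values stand in for the unsigned byte count `n`.
n_small = UInt(200)
n_huge  = typemax(UInt)       # does not fit in Int

# Checked conversion: Int(n) verifies that the value fits and throws otherwise.
checked_small = Int(n_small)
checked_huge = try
    Int(n_huge)
catch err
    err                       # the InexactError that was thrown
end

# Wrapping conversion: n % Int keeps the low bits and never throws.
wrapped_small = n_small % Int
wrapped_huge  = n_huge % Int  # wraps around to a negative value
```

For values that fit, both forms give the same result; they differ only for values above typemax(Int), where Int(n) throws and n % Int silently wraps. With InteractiveUtils loaded, @code_native on each form can be used to compare the generated code.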

I thought about the effect on 32-bit, but then made changes to conform to the original return type (for 32-bit platforms), which is why the code is a bit non-obvious after my further change.

I'm not very concerned with making 32-bit faster, but I think it's at least not slower, and since such systems often have less (L1) cache, it is likely more helpful there.

@PallHaraldsson
Contributor Author

PallHaraldsson commented Oct 27, 2024

The change you make seems to be to have it throw an exception at the start and not at the end, with a maybe more helpful error message?

Exactly (but that wasn't the original point, nor in the code at that time).

You can construct strings larger than 2GB on 32-bit, AND then print them (I didn't test), and THEN you would get an InexactError thrown, so it's not hugely important to be able to print first.

It's of course a very obscure error to throw (and after the fact), and if you try to catch the exception and continue, it looks like a bug, or at least a memory leak if continuing actually worked.
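A rough sketch of the "throw at the start" idea, written against Int32 explicitly so it behaves the same on 64-bit machines. The helper name checked_len32 and the error message are hypothetical, not what the PR adds.

```julia
# Hypothetical helper: reject oversized lengths before any bytes are written,
# instead of hitting an InexactError after the write has already happened.
function checked_len32(n::Unsigned)
    if n > typemax(Int32)
        throw(ArgumentError("cannot write more than $(typemax(Int32)) bytes at once"))
    end
    return n % Int32   # safe here: the range check above already passed
end

ok_len = checked_len32(UInt32(10))

too_big = try
    checked_len32(0x80000000)   # one past typemax(Int32)
catch err
    err
end
```

The caller sees a descriptive error up front, and the conversion itself no longer needs a check.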

I kind of liked just returning UInt64 on 32-bit as well; it would have been simpler code. :) And no throwing, or need to explain a workaround.

No real CI failure (e.g. not for 32-bit, which is what I was last fixing; there are some false-alarm CI failures).

@PallHaraldsson PallHaraldsson marked this pull request as draft October 27, 2024 22:10
@PallHaraldsson PallHaraldsson marked this pull request as ready for review October 28, 2024 08:18
@PallHaraldsson PallHaraldsson changed the title Faster convert for uv_write, no huge boiler-plate assembly Faster convert for uv_write, no huge boiler-plate assembly, and not throw InexactError potentially while printing Oct 28, 2024
Labels
performance Must go faster

6 participants