optimize constant length `memorynew` intrinsic (take 2) #56847

oscardssmith · 2024-12-16T23:37:02Z

Replaces #55913 (rebasing was more annoying than starting from scratch).
This lets the compiler better understand `memorynew` calls with a compile-time constant length, which enables LLVM-level escape analysis in some cases. There is more room to grow this: currently it only optimizes fairly small `Memory`, since bigger ones would require writing more LLVM code, and we probably want a size limit on putting `Memory` on the stack to avoid stack overflow. For larger ones, we could potentially inline the free so that the `Memory` doesn't have to be swept by the GC, etc.

julia> function g()
           m = Memory{Int}(undef, 2)
           for i in 1:2
               m[i] = i
           end
           m[1]+m[2]
       end

julia> @btime g()
  9.735 ns (1 allocation: 48 bytes) # before
  1.719 ns (0 allocations: 0 bytes) # after
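
For readers without BenchmarkTools, the same function can be checked with the built-in `@allocated` macro. This is only a sketch: it assumes Julia 1.11 or later (where `Memory` exists), and whether the reported byte count is zero depends on whether the Julia build in use includes this change. The variable name `bytes` is chosen here for illustration.

```julia
# Requires Julia 1.11 or later, where `Memory` is available.
function g()
    m = Memory{Int}(undef, 2)   # length is a compile-time constant
    for i in 1:2
        m[i] = i
    end
    return m[1] + m[2]
end

g()                      # warm up so that compilation is not measured
bytes = @allocated g()   # bytes allocated by a single call
println("g() = ", g(), ", allocated ", bytes, " bytes")
```

The timings quoted above came from `@btime`, so `@allocated` only confirms the allocation count, not the speedup.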

pchintalapudi · 2024-12-17T02:43:18Z

IMO, having done this a couple of times, it might be more fruitful to start with simple partial escape analysis for any object: if an allocation only ever escapes in error blocks and its address doesn't escape elsewhere, you can clone the alloc and memcpy from stack to heap at the beginning of each error block (we already track basically all of this info in our current escape analysis). This has the added advantage of making downstream array optimizations far easier to trigger (since you're no longer dependent on IRCE/type inference to eliminate all bounds checks before doing anything interesting), which makes bugs more obvious before releasing to the wild. I'm curious to know of any real-world test cases this is able to elide arrays on.
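
To illustrate the kind of function this comment has in mind, here is a minimal sketch in which the `Memory` is only needed outside the function when an index is out of range. The function name `h` is hypothetical, and the claim that the error path is the only escape point is an assumption about how the bounds check is lowered, not something verified against the compiler's output.

```julia
# Requires Julia 1.11 or later, where `Memory` is available.
function h(i::Int)
    m = Memory{Int}(undef, 4)
    for k in 1:4
        m[k] = 10k
    end
    # `i` is only known at run time, so this bounds check cannot be removed.
    # On failure a BoundsError referencing `m` is thrown, which (by assumption)
    # is the only place where `m` escapes.
    return m[i]
end

h(2)   # in-range access: the error path is never taken
```

With partial escape analysis as described, `m` could live on the stack on the normal path and be copied to the heap only when the error block is entered.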

oscardssmith · 2024-12-17T04:17:44Z

@nanosoldier runtests(configuration = (julia_args=["--check-bounds=auto"],), vs_configuration = (julia_args=["--check-bounds=auto"],))

oscardssmith · 2024-12-17T04:26:18Z

I definitely want to do escape analysis that makes an exception for `BoundsError`, but I think that belongs in a separate PR. As it stands, this PR only enables the optimization in relatively few places, and there is definitely room to expand the cases where it can occur.

gbaraldi · 2024-12-17T12:56:04Z

@pchintalapudi it actually does work in some cases, surprisingly enough. For example, this breaks the performance tips doctest, because that test stops allocating :). But I agree that for this to be truly worth it, we need partial escape analysis and potentially the ability to do this interprocedurally (IPO).

oscardssmith · 2024-12-18T19:13:43Z

How do people feel about this? If there aren't objections, I'll merge as long as PkgEval comes back clean.

vtjnash

SGTM

vtjnash · 2024-12-18T19:40:51Z

@nanosoldier runbenchmarks(!"scalar", vs=":master")

nanosoldier · 2024-12-18T22:38:32Z

The package evaluation job you requested has completed - possible new issues were detected.
The full report is available.

topolarity · 2024-12-18T22:47:50Z

The NearestNeighbors, LMDB, and GeoAcceleratedArrays failures look relevant

oscardssmith · 2024-12-18T23:44:58Z

LMDB is their own fault (see wildart/LMDB.jl#41; they are using `unsafe_wrap` incorrectly).
See #55913 (comment) regarding NearestNeighbors (TL;DR: there does seem to be a failure that neither Gabriel nor I can reproduce locally, except on macOS, where master fails with a different error).
GeoAcceleratedArrays might be real.

nanosoldier · 2024-12-19T17:19:22Z

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

@gbaraldi

While investigating some missed optimizations in #56847, @gbaraldi and I realized that `copy(::Array)` was using `jl_genericmemory_copy_slice` rather than the `memmove`/`jl_genericmemory_copyto` that `copyto!` lowers to. This version lets us use the faster LLVM-based Memory initialization, and the `memmove` can theoretically be further optimized by LLVM (e.g. not copying elements that get overwritten without ever being read).

```
julia> @btime copy($[1,2,3])
  15.521 ns (2 allocations: 80 bytes) # before
  12.116 ns (2 allocations: 80 bytes) # after

julia> m = Memory{Int}(undef, 3);

julia> m .= [1,2,3];

julia> @btime copy($m)
  11.013 ns (1 allocation: 48 bytes) # before
  9.042 ns (1 allocation: 48 bytes) # after
```

We also optimize the `memorynew` type inference so that getting the length of a memory with known length propagates that length information (which is important for cases like `similar`/`copy`, etc.).
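
As a rough sketch of why length propagation matters for `similar`/`copy`, consider a function where every length derives from one constant. The function name `f` is hypothetical, and whether the compiler actually folds the lengths or elides either allocation depends on the Julia build; the code only shows the pattern.

```julia
# Requires Julia 1.11 or later, where `Memory` is available.
function f()
    m = Memory{Int}(undef, 3)   # constant length 3
    fill!(m, 7)
    c = similar(m)              # sized from length(m), which inference may know is 3
    copyto!(c, m)
    return sum(c)
end

f()
```

If the length of `m` is known to inference, the length of `c` can be known too, so the constant-length path can in principle apply to both allocations.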

oscardssmith · 2024-12-23T22:17:03Z

@nanosoldier runtests(["Kroki", "GeoAcceleratedArrays", "VectorizedStatistics", "FinanceCore", "Isoplot", "MarsagliaDiscreteSamplers", "VectorizedReduction", "Chron", "TensorOperationsTBLIS", "BiBufferedStreams"], configuration = (julia_args = ["--check-bounds=auto"],), vs_configuration = (julia_args = ["--check-bounds=auto"],))

nanosoldier · 2024-12-23T23:12:59Z

The package evaluation job you requested has completed - possible new issues were detected.
The full report is available.

oscardssmith mentioned this pull request Dec 16, 2024

optimize memorynew intrinsic for constant length Memory #55913

Closed

oscardssmith added the performance, compiler:codegen, needs pkgeval, and arrays labels Dec 16, 2024

oscardssmith requested review from pchintalapudi, topolarity and gbaraldi December 16, 2024 23:45

oscardssmith mentioned this pull request Dec 17, 2024

simplify and slightly improve memorynew inference #56857

Merged

vtjnash reviewed Dec 18, 2024

oscardssmith added 2 commits December 23, 2024 17:11

optimize constant length Memory

4af85d4

add todo

d7c27ce

oscardssmith force-pushed the os/optimize-const-len-memorynew branch from 3236e12 to d7c27ce Compare December 23, 2024 22:11
