Halide v16.0.0
What's Changed
General Notes
- Support for the Vulkan API (w/SPIR-V codegen)
- Support for WebGPU (experimental)
- Improved Halide IR HTML Visualization
- Fixed a regression in the Adams2019 auto-scheduler that disabled sub-tiling
- Added GPU auto-scheduler (Anderson2021)
  Efficient Automatic Scheduling of Imaging and Vision Pipelines for the GPU
  Luke Anderson, Andrew Adams, Karima Ma, Tzu-Mao Li, Tian Jin, Jonathan Ragan-Kelley
  Proceedings of the ACM on Programming Languages (OOPSLA 2021)
Deprecations / Removals
- OpenGLCompute has been deprecated
- ParamMap has been deprecated
- Deprecated HVX_shared_object feature has been removed
- References to deprecated fixed-point operators have been removed
- Deprecated halide_target_feature_disable_llvm_loop_opt has been removed
- Deprecated MIPS device support has been removed
Notable Fixes & Changes
- Generate dot() in the Metal backend by @vksnk in #7085
- Add evaluate() and evaluate_may_gpu() to Python bindings by @steven-johnson in #7108
- Add support for generating LLVM vector predication intrinsics. by @zvookin in #7111
- RISC V vector predication support intrinsics support by @zvookin in #7119
- Add range-checking to Buffer objects in Python by @steven-johnson in #7128
- Fix Python buffer handling by @steven-johnson in #7125
- [WASM] Use rounding_mul_shift_right for q15mulr_sat_s pattern by @rootjalex in #7134
- [x86] Generate AVX512 fixed-point instructions by @rootjalex in #7129
- Fix readnone attribute for llvm 16 by @abadams in #7152
- Call cache.clear between internal functions in CG_C by @steven-johnson in #7155
- Add bfloat support to halide_type_to_string() by @steven-johnson in #7154
- Factor simd_op_check into separate files by architecture. by @zvookin in #7163
- Slightly improve error message for non-integer RDom min/extent by @abadams in #7151
- Migrate from MCJIT to ORC JIT by @dkurt in #7166
- Use n32:64 in RISC-V data layout by @dkurt in #7175
- Don't attempt to use makecontext()/swapcontext() on Android by @steven-johnson in #7196
- Add bridging for clang _Float16 type. by @zvookin in #7201
- Fix issue with vector predicated comparison and select instructions. by @zvookin in #7205
- Add RISC V zvl flag for LLVM version 16 or greater. by @zvookin in #7209
- Extend LLVM IR type mangling to handle scalars. by @zvookin in #7212
- Fix bitrot in PowerPC testing by @steven-johnson in #7211
- Use aligned_alloc() as default allocator for HalideBuffer.h on most platforms by @steven-johnson in #7190
- Tighten alignment promises for halide_malloc() by @steven-johnson in #7222
- Fix some sources of signed integer overflow in the compiler by @abadams in #7231
- Explicitly stage strided loads by @abadams in #7230
- Remove deprecated halide_target_feature_disable_llvm_loop_opt by @steven-johnson in #7247
- Conditional allocations shouldn't fail for size=0 in C++ backend (#7255) by @steven-johnson in #7256
- Inline into extern function args during bounds inference by @abadams in #7261
- Use ::aligned_alloc() instead of std::aligned_alloc() in HalideBuffer.h by @steven-johnson in #7268
- Optimize Module::compile() for some edge cases by @steven-johnson in #7269
- Drop support for MIPS (#7287) by @steven-johnson in #7289
- Emit prototypes for destructor functions in C Backend by @steven-johnson in #7296
- [HVX] Fix EliminateInterleaves by @rootjalex in #7279
- Remove dependency on platform threads library by @alexreinking in #7297
- Fix error of add_halide_generator in cross-compilation by @stevesuzuki-arm in #7283
- Fix issue in add_halide_runtime in cross-compilation by @stevesuzuki-arm in #7284
- Add workaround for the const-or-not user_context issue (#635) by @steven-johnson in #7291
- [x86 & wasm] Split up double saturating-narrows from i32 by @rootjalex in #7280
- Hoist vector slices using rewrite rules by @abadams in #7243
- Improved halide_popcount by @Aelphy in #7225
- halide_popcount<uint64_t> is broken by @steven-johnson in #7313
- Fix segfault by nonconstant bound in Adams2019 by @stevesuzuki-arm in #7321
- Make auto scheduler libs available in HalideHelpers package by @stevesuzuki-arm in #7285
- Improve support for Arm baremetal compilation and runtime by @stevesuzuki-arm in #7286
- Remove deprecated HVX_shared_object feature by @steven-johnson in #7331
- Fix a subtle uninitialized-memory-read in Buffer::for_each_value() by @steven-johnson in #7330
- Add a hook to Codegen_C::compile() by @steven-johnson in #7335
- Tiny improvements in codegen in C backend by @steven-johnson in #7337
- Devirtualize the protected compile() methods in Codegen_C by @steven-johnson in #7341
- Fix tuple output bounds checks by @abadams in #7345
- Change early-bound default args in Python bindings to late-bound by @steven-johnson in #7347
- Fix Python error handling by @steven-johnson in #7352
- Permit vectorization of non-recursive atomic operations by @abadams in #7346
- Update WABT to 1.0.32; Increase stack size for WASM AOT apps by @steven-johnson in #7373
- Bounds visitors for min/max were missing single_point mutated case by @abadams in #7377
- Fix overflow in x86 absd lowering by @abadams in #7407
- Add initial support for WebGPU by @jrprice in #6492
- Use pmaddubsw for non-RDom horizontal widening adds by @abadams in #7440
- Compute comparison masks in narrower types if possible by @abadams in #7392
- Fix bugs in PyTorch codegen. by @Yongqi-Zhuo in #7443
- Remove references to deprecated variants of fixed-point operators by @steven-johnson in #7457
- Add GPU autoscheduler by @aekul in #6856
- d3d12 runtime: replacing spinlocks by mutex objects by @slomp in #7489
- Feature Enhancement: Halide IR HTML Visualization by @maaz139 in #7421
- Deprecate ParamMap (#7121) by @steven-johnson in #7357
- Forbid assigning to Buffer(Expr) by introducing an intermediate type. by @abadams in #7517
- [vulkan phase2] Vulkan Runtime by @derek-gerstmann in #6924
- Add libfuzzer compatible fuzz harness by @silvergasp in #7512
- fuzz: Port correctness/cse fuzzer over to libfuzzer by @silvergasp in #7543
- metal : replacing spinlock by mutex by @slomp in #7532
- Fix save_tiff() PlanarConfig assignment for monochrome inputs by @philboske in #7568
- Fix various compilation errors with AppleClang 14.0.3 by @steven-johnson in #7578
- fuzz: Add libfuzzer compatible bounds fuzzer by @silvergasp in #7549
- Significant change to RISC V and scalable vector code generation. by @zvookin in #7616
- Fix inverted may_subtile checks by @abadams in #7626
- Deprecate OpenGLCompute for Halide 16 by @shoaibkamil in #7627
New Contributors
- @sashashura made their first contribution in #7136
- @twesterhout made their first contribution in #7315
- @terryheo made their first contribution in #7323
- @adrian-lebioda made their first contribution in #7379
- @Ttayu made their first contribution in #7402
- @Yongqi-Zhuo made their first contribution in #7443
- @aekul made their first contribution in #6856
- @zhen8838 made their first contribution in #7494
- @maaz139 made their first contribution in #7421
- @silvergasp made their first contribution in #7512
- @dbabokin made their first contribution in #7545
- @philboske made their first contribution in #7568
Full Changelog: v15.0.1...v16.0.0