v2.7.1
This is a patch release containing the following changes to v2.7:
- Fixed performance regression for batch normalization primitive in TBB and threadpool configurations (cd953e4)
- Improved grouped convolution performance on Xe Architecture GPUs (d7a781e, cb1f3fe, 4e84474, 7ba3c40)
- Fixed runtime error in int8 reorder on Intel GPUs (53532a9)
- Reverted MEMFD allocator in Xbyak to avoid segfaults in high load scenarios (3e29ae2)
- Fixed a defect with incorrect caching of BRGEMM-based matmul primitive implementations with trivial dimensions (87cd979)
- Improved depthwise convolution performance with per-tensor binary post-ops for Intel CPUs (f430a5a)
- Extended threadpool API to manage maximum concurrency (8a1e959, 64e5594)
- Fixed potential integer overflow in BRGEMM-based convolution implementation (25ccee3)
- Fixed performance regression in concat primitive with any format on Intel CPUs (2a60ade, feb614d)
- Fixed compile-time warnings in `matmul_perf` example (b5faa77)
- Fixed 'insufficient registers in requested bundle' runtime error in convolution primitive on Xe Architecture GPUs (4c9d46a)
- Addressed performance regression for certain convolution cases on Xe Architecture GPUs (f28b58a, 18764fb)
- Added support for Intel DPC++/C++ Compiler 2023 (c3781c6, a1a8952, 9bc87e6, e3b1987)
- Fixed int8 matmul and inner product performance regression on Xe Architecture GPUs (3693fbf, c8adc17)
- Fixed accuracy issue for convolution, inner product and matmul primitives with `tanh` post-op on Xe Architecture GPUs (88b4e57, 83ce6d2, 6224dc6, 10f0d0a)
- Suppressed spurious build warnings with GCC 11 (44255a8)