Hello,
The 24.11 release of Compute Library is out and comes with a collection
of improvements and new features.
Source code and prebuilt binaries are available at:
[1]https://github.com/ARM-software/ComputeLibrary/releases/tag/v24.11
Highlights of the release:
* Add SVE SoftmaxLayer kernel for BF16
* Provide stateless API for CpuGemmLowpMatrixMultiplyCore,
CpuQuantize, and DequantizationLayer
* Extend static quantization interface for both matmul and
convolution operations
IMPORTANT NOTICE: The contents of this email and any attachments are
confidential and may also be privileged. If you are not the intended
recipient, please notify the sender immediately and do not disclose the
contents to any other person, use it for any purpose, or store or copy
the information in any medium. Thank you.
References
1. https://github.com/ARM-software/ComputeLibrary/releases/tag/v24.11
Hello,
The 24.09 release of Compute Library is out and comes with a collection
of improvements and new features.
Source code and prebuilt binaries are available at:
[1]https://github.com/ARM-software/ComputeLibrary/releases/tag/v24.09
Highlights of the release:
* Provide a wrapper class to expose cpu::CpuSoftmaxGeneric
* Detect number of cores in Windows®
* Add optimized SME kernel for QASYMM8_SIGNED elementwise addition
operation
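
For readers unfamiliar with QASYMM8_SIGNED, the scalar sketch below shows what an elementwise addition on asymmetric 8-bit signed data has to compute. It assumes the usual asymmetric mapping real = scale * (q - offset). The `QuantInfo` struct and all function names here are hypothetical and written only for illustration; this is not the SME kernel and not the library's API.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical helper type (not the library's): per-tensor asymmetric
// quantization parameters, assuming real = scale * (q - offset).
struct QuantInfo {
    float   scale;
    int32_t offset;
};

// Map a signed 8-bit quantized value back to a real number.
inline float dequantize(int8_t q, QuantInfo qi) {
    return qi.scale * static_cast<float>(static_cast<int32_t>(q) - qi.offset);
}

// Map a real number to signed 8-bit, rounding to nearest and saturating
// to the representable range [-128, 127].
inline int8_t quantize(float v, QuantInfo qi) {
    const long q = std::lround(v / qi.scale) + qi.offset;
    return static_cast<int8_t>(std::min<long>(127, std::max<long>(-128, q)));
}

// Reference addition: dequantize both inputs, add in float, requantize
// with the output tensor's parameters.
inline int8_t add_qasymm8_signed(int8_t a, QuantInfo qa,
                                 int8_t b, QuantInfo qb,
                                 QuantInfo qout) {
    return quantize(dequantize(a, qa) + dequantize(b, qb), qout);
}
```

An optimized kernel would typically avoid the per-element float round trip and work on whole vectors, but it should agree with a reference of this kind up to rounding.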
References
1. https://github.com/ARM-software/ComputeLibrary/releases/tag/v24.09
Hello,
The 24.08.1 release of Compute Library is out and comes with a
collection of improvements and new features.
Source code and prebuilt binaries are available at:
[1]https://github.com/ARM-software/ComputeLibrary/releases/tag/v24.08.1
Highlights of the release:
* Change inheritance qualifiers of experimental Cpu operator
interface classes to public for cpu-wrappers.
* Fix mismatches in static quantization updated after configure tests
* Fix CpuSoftmax configure ignoring is_log on validation
* Fix linker errors in armv8.2a Windows® builds
References
1. https://github.com/ARM-software/ComputeLibrary/releases/tag/v24.08.1
Hello,
The v24.05 release of Compute Library is out and comes with a collection of improvements and new features.
Source code and prebuilt binaries are available at: https://github.com/ARM-software/ComputeLibrary/releases/tag/v24.05
Highlights of the release:
- Add CLScatter operator for FP32/16, S32/16/8, U32/16/8 data types.
- Various fixes to enable FP16 kernels in armv8a multi_isa builds.
- Updated logic in the OpenMP scheduler to exclude LITTLE cores.
Hello,
The v24.02.1 release of Compute Library is out and comes with a collection of improvements.
Source code and prebuilt binaries are available at: https://github.com/ARM-software/ComputeLibrary/releases/tag/v24.02.1
Highlights of the release:
- Fix performance regression in fixed-format kernels
- Fix compile and runtime errors in arm_compute_validation for Windows on Arm (WoA)
Hello,
The v23.11 release of Compute Library is out and comes with a collection of improvements and new features.
Source code and prebuilt binaries are available at:
https://github.com/ARM-software/ComputeLibrary/releases/tag/v23.11
Public major release. Documentation (API, changelogs, build guide, contribution guide, errata, etc.) is available here: https://arm-software.github.io/ComputeLibrary/v23.11
Highlights of the release:
- New features
  - Add support for input data type U64/S64 in CLCast and NECast.
  - Add support for output data type S64 in NEArgMinMaxLayer and CLArgMinMaxLayer.
  - Port the following kernels in the experimental Dynamic Fusion interface to use the new Compute Kernel Writer interface:
    - experimental::dynamic_fusion::GpuCkwResize
    - experimental::dynamic_fusion::GpuCkwPool2d
    - experimental::dynamic_fusion::GpuCkwDepthwiseConv2d
    - experimental::dynamic_fusion::GpuCkwMatMul
  - Add support for OpenCL™ command buffer with mutable dispatch extension.
  - Add support for Arm® Cortex®-A520 and Arm® Cortex®-R82.
  - Add support for negative axis values and inverted axis values in arm_compute::NEReverse and arm_compute::CLReverse.
  - Add new OpenCL™ kernels:
    - opencl::kernels::ClMatMulLowpNativeMMULKernel support for QASYMM8 and QASYMM8_SIGNED, with batch support
- Performance optimizations:
  - Optimize cpu::CpuReshape
  - Optimize opencl::ClTranspose
  - Optimize NEStackLayer
  - Optimize CLReductionOperation.
  - Optimize CLSoftmaxLayer.
  - Optimize start-up time of NEConvolutionLayer for some input configurations where GEMM is selected as the convolution algorithm
  - Reduce CPU overhead by optimal flushing of CL kernels.
- Deprecate support for Bfloat16 in cpu::CpuCast.
- Support for U32 axis in arm_compute::NEReverse and arm_compute::CLReverse will be deprecated in 24.02.
- Remove legacy PostOps interface. PostOps was the experimental interface for kernel fusion and is replaced by the new Dynamic Fusion interface.
- Update OpenCL™ API headers to v2023.04.17.
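
As a rough picture of the negative and inverted axis values mentioned for NEReverse and CLReverse, the sketch below resolves a user-supplied axis to a concrete dimension index. It assumes the common convention that a negative axis counts from the last dimension, and that an inverted axis maps to (rank - 1) - axis. The function name and the order in which the two steps are applied are assumptions made for illustration, not taken from the library.

```cpp
#include <cassert>

// Hypothetical helper (not a library function): turn a possibly negative,
// possibly inverted axis into a dimension index in [0, rank).
inline int resolve_axis(int axis, int rank, bool use_inverted_axis) {
    assert(rank > 0);
    assert(axis >= -rank && axis < rank);

    // Negative values wrap around: -1 refers to the last dimension.
    if (axis < 0) {
        axis += rank;
    }

    // Assumed meaning of an inverted axis: count dimensions from the
    // opposite end, so axis 0 becomes rank - 1 and vice versa.
    if (use_inverted_axis) {
        axis = (rank - 1) - axis;
    }
    return axis;
}
```

For a rank-4 tensor, `resolve_axis(-1, 4, false)` gives 3 and `resolve_axis(0, 4, true)` also gives 3, which is the kind of translation needed when a framework numbers dimensions in the opposite order.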
Thanks
ACL
Hi, I wonder whether ARM Compute Library can be built and run on ARM v7l
processors with a subset of the functionality, since SVE is not supported
on v7l? Thanks for any info.
Hello,
The 23.08 release of Compute Library is out and comes with a collection of improvements and new features.
Source code and prebuilt binaries are available at: https://github.com/ARM-software/ComputeLibrary/releases/tag/v23.08
Public major release. Documentation (API, changelogs, build guide, contribution guide, errata, etc.) is available here: https://arm-software.github.io/ComputeLibrary/v23.08/
Highlights of the release:
* Rewrite CLArgMinMaxLayer for axis 0 and enable S64 output.
* Add multi-sketch support for dynamic fusion.
* Break up arm_compute/core/Types.h and utils/Utils.h a bit to reduce unused code in each inclusion of these headers.
* Add Fused Activation to CLMatMul.
* Implement FP32/FP16 opencl::kernels::ClMatMulNativeMMULKernel using the MMUL extension.
* Use MatMul in fully connected layer with dynamic weights when supported.
* Optimize CPU depthwise convolution with channel multiplier.
* Add support in CpuCastKernel for conversion of S64/U64 to F32.
* Add new OpenCL™ kernels:
  * opencl::kernels::ClMatMulNativeMMULKernel support for FP32 and FP16, with batch support
* Enable transposed convolution with non-square kernels on CPU and GPU.
* Add support for input data type U64/S64 in CLCast.
* Add new Compute Kernel Writer (CKW) subproject that offers a C++ interface to generate tile-based OpenCL code in just-in-time fashion.
* Port the following kernels in the experimental Dynamic Fusion interface to use the new Compute Kernel Writer interface with support for FP16/FP32 only:
  * experimental::dynamic_fusion::GpuCkwActivation
  * experimental::dynamic_fusion::GpuCkwCast
  * experimental::dynamic_fusion::GpuCkwDirectConv2d
  * experimental::dynamic_fusion::GpuCkwElementwiseBinary
  * experimental::dynamic_fusion::GpuCkwStore