Hardware Acceleration
Erasure correcting codes (ECC/FEC) provide elegant solutions to many networking problems. Whether you want to build reliable, ultra-low-latency applications or efficiently send data to thousands of devices simultaneously, ECC/FEC can provide compelling advantages. However, a common concern when integrating an ECC/FEC algorithm is: how big is the computational overhead?
The unsatisfying answer to this question is, as is often the case in engineering: it depends. More specifically, it depends on the configuration of the algorithm, e.g. how much data is processed by the ECC/FEC encoder/decoder and which specific type of algorithm is used.
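As a rough illustration of how the configuration drives the cost, consider a simplified model (an assumption for illustration, not a measurement of any particular library). In a dense linear block code, each repair symbol combines the source symbols, and each combination step is one multiply-add pass over a symbol. The hypothetical helper `multiply_add_bytes` below counts the bytes processed this way; its `density` parameter stands in for sparser code types that combine fewer source symbols per repair symbol.

```python
def multiply_add_bytes(source_symbols: int, symbol_size: int,
                       repair_symbols: int, density: float = 1.0) -> float:
    """Bytes processed by multiply-add passes to produce `repair_symbols`.

    Simplified model (assumption): each repair symbol combines on average
    density * source_symbols source symbols, and each combination is one
    multiply-add over `symbol_size` bytes. density=1.0 models a dense code.
    """
    if not 0.0 < density <= 1.0:
        raise ValueError("density must be in (0, 1]")
    return repair_symbols * density * source_symbols * symbol_size


if __name__ == "__main__":
    # Doubling the number of source symbols, or the symbol size,
    # doubles the work needed per repair symbol in this model.
    for k, size in [(16, 1400), (32, 1400), (16, 2800)]:
        work = multiply_add_bytes(k, size, repair_symbols=4)
        print(f"k={k:3d} symbol_size={size:5d} -> {work / 1e3:.1f} kB processed")
```

In this model the work grows linearly with both the number of source symbols and the symbol size, which is why the same algorithm can be cheap in one configuration and expensive in another.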
Although the configuration of the algorithm has a significant impact on performance, the quality of the implementation must also be taken into account. The key question is essentially: how efficient is the software implementation of the specific algorithm?
Avoiding unnecessary memory allocations and data copies, for instance, can make a big difference. Additionally, it is often possible to boost performance further by taking advantage of the SIMD (Single Instruction, Multiple Data) instructions of modern CPUs.
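A minimal sketch of the first point, assuming a simple XOR parity code for brevity (XOR is the addition operation in binary extension fields such as GF(2^8)). The hypothetical `parity_into` reads the source symbols through zero-copy `memoryview` slices of one contiguous block and writes the result into a buffer that the caller allocates once and reuses, so encoding neither copies the source symbols nor allocates a new output per call.

```python
def parity_into(block: bytes, symbol_size: int, out: bytearray) -> bytearray:
    """XOR all source symbols stored back to back in `block` into `out`.

    Source symbols are read through memoryview slices (views, not copies),
    and the result goes into the caller-owned `out` buffer, which can be
    reused across calls instead of allocating a new one each time.
    """
    if symbol_size <= 0 or len(block) % symbol_size != 0:
        raise ValueError("block must hold a whole number of symbols")
    if len(out) != symbol_size:
        raise ValueError("out must be exactly one symbol long")
    for i in range(symbol_size):  # clear the reused buffer in place
        out[i] = 0
    view = memoryview(block)
    for offset in range(0, len(block), symbol_size):
        symbol = view[offset:offset + symbol_size]  # zero-copy view
        for i in range(symbol_size):
            out[i] ^= symbol[i]
    return out


if __name__ == "__main__":
    out = bytearray(4)  # allocated once, reused for every block
    for block in (bytes([1, 2, 3, 4, 5, 6, 7, 8]), bytes(range(12))):
        print(parity_into(block, 4, out).hex())
```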
Speeding up computations using SIMD
SIMD can significantly speed up the computations of an ECC/FEC algorithm by processing data in parallel: a single instruction operates on many bytes of a data buffer at once instead of one byte at a time.
These buffer computations are fundamental to all ECC/FEC algorithms, so SIMD acceleration is not tied to a specific algorithm; it only depends on whether the implementation supports it.
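To make the idea concrete, here is a minimal Python model of a typical ECC/FEC buffer operation: the multiply-add `dst += c * src` over the finite field GF(2^8), where addition is XOR. It assumes the primitive polynomial 0x11D, a common choice that is not necessarily the one any given library uses. `mul_add_scalar` is the byte-at-a-time baseline. `mul_add_nibble` mirrors the widely used split-table technique behind SSSE3 `PSHUFB` and NEON `TBL` kernels: each byte is split into two 4-bit halves that index two 16-entry tables, and one 16-byte register of lanes is handled per step. Python still executes the lanes one after another, so this shows the data flow, not the speed.

```python
PRIM_POLY = 0x11D  # assumption: x^8 + x^4 + x^3 + x^2 + 1
LANES = 16         # bytes in a 128-bit SIMD register (SSSE3 / NEON)


def _build_tables():
    """Exponent/logarithm tables for GF(2^8) with generator 2."""
    exp = [0] * 512
    log = [0] * 256
    x = 1
    for i in range(255):
        exp[i] = x
        log[x] = i
        x <<= 1           # multiply by the generator 2
        if x & 0x100:     # reduce modulo the primitive polynomial
            x ^= PRIM_POLY
    exp[255:510] = exp[0:255]  # so LOG[a] + LOG[b] never needs "mod 255"
    return exp, log


EXP, LOG = _build_tables()


def gf_mul(a: int, b: int) -> int:
    """Multiply two field elements using the log/exp tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def mul_add_scalar(dst: bytearray, src: bytes, c: int) -> None:
    """Baseline: one byte at a time, dst[i] ^= c * src[i]."""
    for i in range(len(src)):
        dst[i] ^= gf_mul(c, src[i])


def mul_add_nibble(dst: bytearray, src: bytes, c: int) -> None:
    """Model of the SIMD kernel: two 16-entry tables, 16 lanes per step.

    Valid because c * b = c * lo ^ c * (hi << 4), i.e. multiplication
    distributes over field addition (XOR).
    """
    low = [gf_mul(c, n) for n in range(16)]        # c * 0x00 .. c * 0x0F
    high = [gf_mul(c, n << 4) for n in range(16)]  # c * 0x00 .. c * 0xF0
    for start in range(0, len(src), LANES):
        chunk = src[start:start + LANES]  # one "register" of input bytes
        for lane, b in enumerate(chunk):  # a SIMD unit does these at once
            dst[start + lane] ^= low[b & 0x0F] ^ high[b >> 4]


if __name__ == "__main__":
    src = bytes(range(256))
    a, b = bytearray(256), bytearray(256)
    mul_add_scalar(a, src, 0x53)
    mul_add_nibble(b, src, 0x53)
    print(a == b)  # True
```

Because each 16-entry table fits in a single 128-bit register, one shuffle instruction can perform 16 table lookups at once, which is the main reason such kernels can be much faster than the byte-at-a-time loop.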
What is the impact of SIMD?
To give a sense of the impact of SIMD, the following graphs show the raw throughput of a typical ECC/FEC operation on both ARM (Android) and x86 (desktop) CPUs. We tested the following configurations:
aarch64: 64-bit ARM CPU without SIMD acceleration
aarch64-neon: 64-bit ARM CPU using NEON SIMD acceleration
x86: 64-bit x86 CPU without SIMD acceleration
x86-ssse3: 64-bit x86 CPU using SSSE3 SIMD acceleration
x86-avx2: 64-bit x86 CPU using AVX2 SIMD acceleration
The graph shows that going from the non-accelerated to the SIMD-accelerated version can increase throughput by several hundred MB/s.
Looking at the relative gain from SIMD acceleration, we see roughly a 5x speed-up on ARM and up to a 16x speed-up on x86!
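Throughput and relative-gain figures like these come from a simple recipe: time an operation over a buffer, divide the bytes processed by the elapsed time, and take the ratio of two throughputs. The sketch below illustrates that recipe with hypothetical names. Its two stand-in operations apply an arbitrary 256-entry lookup table per byte, once in a Python loop and once over the whole buffer with `bytes.translate`. Neither is SIMD code, and its output says nothing about the measurements shown in the graphs.

```python
import time
from typing import Callable

# Arbitrary 256-entry lookup table used as a stand-in workload.
TABLE = bytes((b * 7 + 3) & 0xFF for b in range(256))


def per_byte(data: bytes) -> bytes:
    """Baseline stand-in: one table lookup per byte in a Python loop."""
    return bytes(TABLE[b] for b in data)


def whole_buffer(data: bytes) -> bytes:
    """Faster stand-in: the same lookups applied to the whole buffer at once."""
    return data.translate(TABLE)


def throughput_mb_s(operation: Callable[[bytes], bytes], data: bytes,
                    iterations: int = 20) -> float:
    """Run `operation(data)` repeatedly and return processed MB per second."""
    operation(data)  # warm-up run, not timed
    start = time.perf_counter()
    for _ in range(iterations):
        operation(data)
    elapsed = time.perf_counter() - start
    return len(data) * iterations / max(elapsed, 1e-9) / 1e6


def speedup(accelerated_mb_s: float, baseline_mb_s: float) -> float:
    """Relative gain: 5.0 means the accelerated version is 5x faster."""
    return accelerated_mb_s / baseline_mb_s


if __name__ == "__main__":
    data = bytes(range(256)) * 256  # 64 KiB test buffer
    base = throughput_mb_s(per_byte, data)
    fast = throughput_mb_s(whole_buffer, data)
    print(f"baseline: {base:.1f} MB/s, whole-buffer: {fast:.1f} MB/s, "
          f"speed-up: {speedup(fast, base):.1f}x")
```

Checking that both versions produce identical output before timing them, as the stand-ins here do, guards against reporting a "speed-up" from an implementation that computes something different.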
Clearly, we lose a significant amount of performance without SIMD acceleration. This can have a big impact when running the algorithms on, e.g., battery-driven or resource-constrained devices, or in a shared environment such as a cloud service, where processes compete for valuable CPU time and every minute counts.
All of Steinwurf’s ECC/FEC algorithms use SIMD acceleration with run-time detection of the CPU’s capabilities. This means the same binary can run on many different CPUs and automatically use the fastest SIMD acceleration available.
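The general run-time detection pattern can be sketched as follows. Everything here is hypothetical and only illustrates the idea; it is not Steinwurf’s implementation. The sketch keeps a list of kernels ordered from fastest to slowest, each tagged with the CPU feature it needs, and makes a one-time selection based on the features the CPU reports. As a simple Linux-only stand-in for detection, it reads `/proc/cpuinfo`, whose `flags` (x86) or `Features` (ARM) line lists names such as `ssse3`, `avx2` or `asimd` (NEON on aarch64). Compiled libraries typically query `CPUID` on x86 or `getauxval(AT_HWCAP)` on Linux/ARM instead. All kernels below share one generic Python body.

```python
from typing import Callable, List, Set, Tuple

Kernel = Callable[[bytearray, bytes], None]


def _xor_generic(dst: bytearray, src: bytes) -> None:
    """Placeholder kernel; a real library would ship separately compiled
    generic, SSSE3, AVX2 and NEON versions of the same operation."""
    for i in range(len(src)):
        dst[i] ^= src[i]


# Hypothetical kernel table, fastest first: (required feature, name, function).
KERNELS: List[Tuple[str, str, Kernel]] = [
    ("avx2", "x86-avx2", _xor_generic),
    ("ssse3", "x86-ssse3", _xor_generic),
    ("asimd", "aarch64-neon", _xor_generic),
    ("", "generic", _xor_generic),  # no requirement: always available
]


def detect_cpu_features(cpuinfo_path: str = "/proc/cpuinfo") -> Set[str]:
    """Collect feature flags from /proc/cpuinfo (Linux); empty set elsewhere."""
    features: Set[str] = set()
    try:
        with open(cpuinfo_path) as f:
            for line in f:
                key, _, value = line.partition(":")
                if key.strip() in ("flags", "Features"):
                    features.update(value.split())
    except OSError:
        pass  # no cpuinfo file: fall back to the generic kernel
    return features


def select_kernel(features: Set[str]) -> Tuple[str, Kernel]:
    """Pick the first (fastest) kernel whose required feature is present."""
    for required, name, kernel in KERNELS:
        if not required or required in features:
            return name, kernel
    raise RuntimeError("no usable kernel")


# Detect once at start-up; every later call goes straight to the chosen kernel.
KERNEL_NAME, xor_into = select_kernel(detect_cpu_features())

if __name__ == "__main__":
    buf = bytearray(b"\x0f" * 4)
    xor_into(buf, b"\xff" * 4)
    print(KERNEL_NAME, buf.hex())
```

Selecting the kernel once and storing the chosen function keeps the per-call cost of dispatch to a single indirect call, rather than re-checking CPU features for every buffer.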
To learn more about Steinwurf’s ECC/FEC algorithms, or to discuss how we can help improve your ECC/FEC performance, feel free to reach out at contact@steinwurf.com.