performance of gather/scatter with different types of indexes
Kay F. Jahnke
Mon May 15 17:26:44 CEST 2017
On 15.05.2017 at 09:56, Matthias Kretz wrote:
>
> my guess:
> ST::IndexType is an alias for SimdArray<int, ?>. So, it's typically passed as
> one or two SIMD registers to the gather/scatter functions. However, I have not
> implemented gathers with the existing AVX2 intrinsics yet, so you get the
> scalar fallback implementation in all cases. Meaning, it has to read the
> scalar elements of the SimdArray. Thus, it has to do the same as for the
> TinyVector, except that the TinyVector is possibly easier to optimize for the
> compiler since the scalars are already passed around in memory or even general
> purpose registers.
This must be what happens. I wasn't aware that the AVX2 gathers/scatters
are not implemented yet.
But I reckon the AVX gathers/scatters are there? I compiled with -mavx,
tried both versions and could not detect a significant difference.
Admittedly, my gathers/scatters access quite widely spread-out memory
locations. Maybe the code is so memory-bound that the speed difference
between the intrinsics and the scalar fallback implementation is not
really visible, because the gather itself is only a small portion of
the execution time.
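To make the comparison concrete, here is a minimal sketch of the two
code paths in question. It is not Vc's actual implementation; the names
gather_scalar, gather_simd and kLanes are made up for illustration. As
far as I know, the hardware gather instructions arrived with AVX2 rather
than AVX, so the sketch only uses the intrinsic when the compiler
targets AVX2 (e.g. with -mavx2) and falls back to the scalar loop
otherwise.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>
#ifdef __AVX2__
#include <immintrin.h>
#endif

// Hypothetical lane count: eight floats fill one 256-bit register.
constexpr std::size_t kLanes = 8;

// Scalar fallback: one ordinary load per lane.
std::array<float, kLanes> gather_scalar(const std::vector<float>& data,
                                        const std::array<int, kLanes>& idx) {
    std::array<float, kLanes> out{};
    for (std::size_t i = 0; i < kLanes; ++i) {
        assert(idx[i] >= 0 && static_cast<std::size_t>(idx[i]) < data.size());
        out[i] = data[static_cast<std::size_t>(idx[i])];
    }
    return out;
}

// Hardware gather when built for AVX2, scalar loop otherwise.
std::array<float, kLanes> gather_simd(const std::vector<float>& data,
                                      const std::array<int, kLanes>& idx) {
#ifdef __AVX2__
    // Load the eight 32-bit indexes into one register (unaligned load).
    const __m256i vidx =
        _mm256_loadu_si256(reinterpret_cast<const __m256i*>(idx.data()));
    // Scale 4 = sizeof(float): indexes count elements, not bytes.
    const __m256 v = _mm256_i32gather_ps(data.data(), vidx, 4);
    std::array<float, kLanes> out;
    _mm256_storeu_ps(out.data(), v);
    return out;
#else
    return gather_scalar(data, idx);
#endif
}
```

Timing both functions on the same widely scattered index set would show
whether the gather instruction matters at all when most of the time is
spent waiting for memory.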
> If you are interested in optimizing gathers I'd be happy to help you with
> resolving https://github.com/VcDevel/Vc/issues/32.
I should, really, because my code uses gathers and scatters all the time.
Do I need a GitHub account to actually see the issue? I can't see
anything but the issue page with the title and some sort of history
which I can't access.
Can you point me to the relevant bit of code so that I can get a feel
for what would be required? If it's implemented for AVX, maybe I can
just take that as a template and adapt it.
Kay