masked loads?

Tue Aug 9 08:04:28 CEST 2016

On Monday, 8 August 2016 17:48:32 CEST Giordano Khouri wrote:
> _mm_maskload_ps is an AVX intrinsic. _mm_maskmoveu_si128 is SSE2, but will
> cause address exceptions even if those bytes are masked.

That is a good point, thanks. However, I should explain how the SSE vs. AVX 
namespaces/policy tags work (since Vc 1.0). Vector<T, VectorAbi::Sse> 
(SSE::Vector<T> is just an alias for it) only asks for xmm registers to be 
used for function arguments of that type. Vector<T, VectorAbi::Avx> gets you 
ymm registers. The set of instructions used to implement the functions and 
operators is partially orthogonal to the ABI tag. E.g. compile with -mavx2 
and explicitly use SSE Vectors in your code: you'll get vector objects of 
16 bytes, but they are implemented with AVX instructions and, most 
importantly, VEX encoding (and thus non-destructive three-operand 
instructions).

So, it is possible to implement an SSE::Vector function using AVX intrinsics. 
But that is not enough for builds with -mno-avx. The implementation therefore 
needs an #ifdef on __AVX__ and has to fall back to a manual gather for the 
pure SSE case.
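
A minimal sketch of such an #ifdef, written with raw intrinsics rather than 
Vc internals. The name masked_load_ps is hypothetical and not part of Vc, and 
the per-lane loop is only one plausible form of the "manual gather". A lane is 
selected when the sign bit of its 32-bit mask element is set; unselected lanes 
come back as zero in both branches.

```cpp
#include <immintrin.h>

// Hypothetical helper (not Vc API): load the selected lanes of four floats
// starting at mem. Lanes whose mask element has the sign bit clear are never
// read from memory and are returned as 0.
inline __m128 masked_load_ps(const float *mem, __m128i mask)
{
#ifdef __AVX__
    // AVX build: VMASKMOVPS does not access (and cannot fault on)
    // masked-out lanes.
    return _mm_maskload_ps(mem, mask);
#else
    // Pure SSE2 build: extract the four sign bits and read only the lanes
    // that were asked for.
    const int bits = _mm_movemask_ps(_mm_castsi128_ps(mask));
    alignas(16) float tmp[4] = {0.f, 0.f, 0.f, 0.f};
    for (int i = 0; i < 4; ++i) {
        if (bits & (1 << i)) {
            tmp[i] = mem[i];
        }
    }
    return _mm_load_ps(tmp);
#endif
}
```

Both branches take and return an xmm-sized value, so the caller sees the same 
16-byte vector either way. Only the instructions behind it differ.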

Cheers,
  Matthias

-- 
──────────────────────────────────────────────────────────────────────────
 Dr. Matthias Kretz                                https://kretzfamily.de
 GSI Helmholtzzentrum für Schwerionenforschung             https://gsi.de
 SIMD easy and portable                     https://github.com/VcDevel/Vc
──────────────────────────────────────────────────────────────────────────