std::vector vs Vc::Memory
Sandro Wenzel
Fri Mar 7 14:11:18 CET 2014
Dear Tijskens,
I am writing back to confirm that I now see the same behaviour as you. I
am attaching a slightly modified version of the code that moves the
individual tests into separate functions (to make it easier to inspect the
assembly and to run binary instrumentation analysis).
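As a rough illustration of what isolating a test in its own function can look like (this is a sketch, not the attached file): the names subtract_one_vector, subtract_one_array and the NOINLINE macro below are hypothetical, and the noinline attribute is a GCC/Clang/icc extension assumed here so that each loop stays visible as a separate symbol in the assembly.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: GCC-compatible compilers understand this attribute.
// On other compilers the macro expands to nothing.
#if defined(__GNUC__) || defined(__clang__)
#define NOINLINE __attribute__((noinline))
#else
#define NOINLINE
#endif

// Hypothetical test function: scalar loop over a std::vector.
NOINLINE void subtract_one_vector(std::vector<float>& x) {
    for (std::size_t i = 0; i < x.size(); ++i) {
        x[i] -= 1.0f;
    }
}

// Hypothetical test function: the same loop over a plain C-style array.
NOINLINE void subtract_one_array(float* x, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        x[i] -= 1.0f;
    }
}
```

With the loops in separate functions, each one can be found by name in the disassembly or in a profiler report.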
I also added the same tests using plain C-style arrays. Those reach
performance on par with Vc::Memory right away:
@@@ test0 - begin @@@
std::vector<float> vector : 2087 cycles/repetition, 6.14e-07 seconds/repetition, 1x speedup, 3.4 GHz, 1000000 repetitions.
float * vector : 287 cycles/repetition, 8.45e-08 seconds/repetition, 7.27x speedup, 3.4 GHz, 1000000 repetitions.
Vc::vector<float> -one from std::vector : 635 cycles/repetition, 1.869e-07 seconds/repetition, 3.29x speedup, 3.4 GHz, 1000000 repetitions.
Vc::vector<float> -one from plain array : 319 cycles/repetition, 9.397e-08 seconds/repetition, 6.53x speedup, 3.4 GHz, 1000000 repetitions.
Vc::vector with -1 : 410 cycles/repetition, 1.207e-07 seconds/repetition, 5.09x speedup, 3.4 GHz, 1000000 repetitions.
Vc::vector with -1 and plain array : 317 cycles/repetition, 9.345e-08 seconds/repetition, 6.57x speedup, 3.4 GHz, 1000000 repetitions.
Vc::memory : 310 cycles/repetition, 9.144e-08 seconds/repetition, 6.72x speedup, 3.4 GHz, 1000000 repetitions.
@@@ test0 - done @@@
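The columns in this output are related by simple arithmetic: cycles/repetition is seconds/repetition times the clock frequency, and the speedup is the baseline time divided by the test's time. The sketch below shows that relation together with a minimal repetition timer. It is not the timer.h used in this thread; the helper names are hypothetical, and std::chrono::steady_clock is an assumed stand-in for whatever clock the real timer reads.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>

// Hypothetical helper: convert a time per repetition into clock cycles,
// given a nominal clock frequency in GHz.
double cycles_per_repetition(double seconds_per_repetition, double clock_ghz) {
    return seconds_per_repetition * clock_ghz * 1e9;
}

// Hypothetical helper: speedup of a candidate relative to a baseline.
double speedup(double baseline_seconds, double candidate_seconds) {
    return baseline_seconds / candidate_seconds;
}

// Hypothetical helper: run `body` `repetitions` times and return the
// average wall-clock time of one repetition in seconds.
template <typename F>
double seconds_per_repetition(F&& body, std::size_t repetitions) {
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t r = 0; r < repetitions; ++r) {
        body();
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count() / repetitions;
}
```

For the first line above, 6.14e-07 s at 3.4 GHz gives roughly 2088 cycles, which agrees with the reported 2087 up to rounding of the printed time.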
My conclusion is that std::vector is best avoided here (and in any case I
still had alignment issues with it). Note also that the compiler's
autovectorization of the plain float * loop (287 cycles/repetition) beats
every explicit Vc variant in this test, probably because the compiler also
unrolls the loop.
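On the alignment point: the default std::allocator does not promise the 32-byte alignment that aligned AVX loads and stores need. One common workaround, if std::vector has to stay, is a custom allocator. The sketch below is only an illustration under that assumption: AlignedAllocator and is_aligned are hypothetical names, not part of Vc or of the attached test, and nothing in this thread shows that alignment alone explains the std::vector slowdown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical allocator: returns storage aligned to `Alignment` bytes.
template <typename T, std::size_t Alignment = 32>
struct AlignedAllocator {
    static_assert((Alignment & (Alignment - 1)) == 0,
                  "Alignment must be a power of two");
    typedef T value_type;

    AlignedAllocator() noexcept {}
    template <typename U>
    AlignedAllocator(const AlignedAllocator<U, Alignment>&) noexcept {}

    // Needed explicitly because of the non-type template parameter.
    template <typename U>
    struct rebind { typedef AlignedAllocator<U, Alignment> other; };

    T* allocate(std::size_t n) {
        // Over-allocate: room for the padding plus one slot for the raw pointer.
        void* raw = std::malloc(n * sizeof(T) + Alignment + sizeof(void*));
        if (!raw) throw std::bad_alloc();
        const std::uintptr_t start =
            reinterpret_cast<std::uintptr_t>(raw) + sizeof(void*);
        const std::uintptr_t aligned =
            (start + Alignment - 1) & ~static_cast<std::uintptr_t>(Alignment - 1);
        // Store the raw pointer just below the aligned block for deallocate().
        reinterpret_cast<void**>(aligned)[-1] = raw;
        return reinterpret_cast<T*>(aligned);
    }

    void deallocate(T* p, std::size_t) noexcept {
        if (p) std::free(reinterpret_cast<void**>(p)[-1]);
    }
};

template <typename T, typename U, std::size_t A>
bool operator==(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) {
    return true;
}
template <typename T, typename U, std::size_t A>
bool operator!=(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) {
    return false;
}

// Hypothetical helper: check whether a pointer is aligned to `alignment` bytes.
bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

A std::vector<float, AlignedAllocator<float, 32> > then hands out a data() pointer that is 32-byte aligned, which would at least remove the alignment question from the comparison.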
I compiled like this:
icc -mavx -I ./ -O2 -I ${VCROOT}/include testmodif.cpp -o foo.x -std=c++11 \
    -L ${VCROOT}/lib -lVc -fabi-version=6
Best
Sandro
2014-03-06 9:42 GMT+01:00 Tijskens Engelbert:
> Dear Sandro,
> The attachments contain the main file test.cpp and the included timer.h.
> I included some unrolling tests for the scalar case, as Matthias suggested.
> That does indeed help. I haven't checked the SIMD case so far.
> Kindest regards,
> Bert
>
> Sandro Wenzel wrote:
>
> Dear Tijskens,
>
> I was intrigued by your observations and tried to reproduce them, but I
> failed. Like Matthias, I feel that measuring such a short, minimal
> code section is really tough.
>
> Would you be able to share your benchmark code and the way you compile
> it, so that I can have a more thorough look?
>
> Best
>
> Sandro
>
>
>
> 2014-03-04 18:57 GMT+01:00 Tijskens Engelbert:
>
> Dear all,
>
> I am trying to figure out how to use std::vector<float> efficiently in
> combination with Vc (to get both dynamic arrays and performance).
>
> std::vector<float> x(1024);
> for( int i=0; i<ne; ++i ) { // initialize
>     x[i] = 1.0;
> }
> // scalar loop using std::vector
> for( int i=0; i<ne; ++i ) {
>     x[i] -= 1.0;
> }
> // vector loop using std::vector
> for( int i=0; i<ne; i+=Vc::float_v::Size )
> {
>     Vc::float_v vx( &x[i] );  // load one SIMD vector from x
>     vx -= 1.0;
>     vx.store( &x[i] );        // write the result back
> }
> // vector loop using Vc::Memory instead of std::vector
> Vc::Memory<Vc::float_v,ne> Vx;
> for( int i=0; i<ne; ++i ) { // initialize
>     Vx[i] = 1.0;
> }
> Vc::float_v one(1.);
> // nv: number of float_v vectors in Vx (presumably ne / Vc::float_v::Size)
> ET_TIME_THIS
> ( "Vc::Memory<Vc::float_v,ne> vector",
>     for( int i=0; i<nv; ++i ) {
>         Vx.vector(i) -= one;
>     }
> );
> When I time these loops I get the following results:
> scalar loop using std::vector : 2162 cycles/repetition, 9.4e-07 seconds/repetition, 1 x speedup, 2.3 GHz, 100 repetitions.
> vector loop using std::vector : 357 cycles/repetition, 1.6e-07 seconds/repetition, 6.04x speedup, 2.24 GHz, 100 repetitions.
> vector loop using Vc::Memory : 288 cycles/repetition, 1.2e-07 seconds/repetition, 7.49x speedup, 2.4 GHz, 100 repetitions.
>
>
> Is there a way to improve the vector loop using std::vector? By the way,
> if I write the second loop as
> // vector loop using std::vector
> for( int i=0; i<ne; i+=Vc::float_v::Size )
> {
>     Vc::float_v vx( &x[i] );
>     vx -= one;
>     vx.store( &x[i] );
> }
> things get even worse: the speedup is only about 4.2x.
--
Dr. Sandro Wenzel
PH / SFT
CERN
-------------- next part --------------
A non-text attachment was scrubbed...
Name: testmodif2.cpp
Type: text/x-c++src
Size: 3019 bytes
Desc: not available
URL: <http://compeng.uni-frankfurt.de/pipermail/vc/attachments/20140307/b81fe582/attachment.cpp>