<div dir="ltr">Dear Tijskens,<div><br></div><div style>I am writing back to confirm that I now get the same observations as you. I am attaching slightly modified code that puts the individual tests into separate functions (to make it possible to look at the assembly and to enable binary instrumentation analysis).</div>
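<div style><br></div><div style>To illustrate what putting each test into its own function can look like, here is a minimal sketch. It is not the attached code: the function names and the TEST_NOINLINE macro are hypothetical, and it assumes a compiler that understands the GCC-style noinline attribute, so that each loop shows up as a separate symbol in the assembly.</div>

```cpp
#include <cstddef>
#include <vector>

// Assumption: GCC-style attribute syntax (GCC, Clang, ICC on Linux).
// On other compilers the macro expands to nothing and inlining may occur.
#if defined(__GNUC__) || defined(__clang__) || defined(__INTEL_COMPILER)
#define TEST_NOINLINE __attribute__((noinline))
#else
#define TEST_NOINLINE
#endif

// Scalar loop over a std::vector (hypothetical name).
TEST_NOINLINE void subtract_one_std_vector(std::vector<float>& x) {
    for (std::size_t i = 0; i < x.size(); ++i) {
        x[i] -= 1.0f;
    }
}

// The same loop over a plain C-style array (hypothetical name).
TEST_NOINLINE void subtract_one_plain_array(float* x, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        x[i] -= 1.0f;
    }
}
```

<div style>Each function can then be timed on its own and located by name in the disassembly.</div>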
<div style><br></div><div style>I also added the same tests using plain C-like arrays. Those seem to give good "Vc" performance immediately:</div><div style><div><br></div><div>@@@ test0 - begin @@@</div><div>std::vector<float> vector : 2087 cycles/repetition, 6.14e-07 seconds/repetition, 1x speedup, 3.4 GHz, 1000000 repetitions.</div>
<div>float * vector : 287 cycles/repetition, 8.45e-08 seconds/repetition, 7.27x speedup, 3.4 GHz, 1000000 repetitions.</div><div>Vc::vector<float> -one from std::vector : 635 cycles/repetition, 1.869e-07 seconds/repetition, 3.29x speedup, 3.4 GHz, 1000000 repetitions.</div>
<div>Vc::vector<float> -one from plain array : 319 cycles/repetition, 9.397e-08 seconds/repetition, 6.53x speedup, 3.4 GHz, 1000000 repetitions.</div><div>Vc::vector with -1 : 410 cycles/repetition, 1.207e-07 seconds/repetition, 5.09x speedup, 3.4 GHz, 1000000 repetitions.</div>
<div>Vc::vector with -1 and plain array : 317 cycles/repetition, 9.345e-08 seconds/repetition, 6.57x speedup, 3.4 GHz, 1000000 repetitions.</div><div>Vc::memory : 310 cycles/repetition, 9.144e-08 seconds/repetition, 6.72x speedup, 3.4 GHz, 1000000 repetitions.</div>
<div>@@@ test0 - done @@@</div><div><br></div></div><div style> </div><div style><br></div><div style>My conclusion is that std::vector is best avoided here (and in any case I still had issues with alignment). Note also that the compiler's autovectorization does better than any of the other solutions here, probably because it also unrolls the loop.</div>
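<div style><br></div><div style>On the alignment issue: one possible way to keep std::vector while getting aligned storage is a custom allocator. The following is only a sketch in plain C++11, assuming 32-byte alignment for AVX. AlignedAllocator is a hypothetical name, not part of Vc or of the attached code, and it addresses only the alignment of the buffer, not the other overheads measured above.</div>

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical allocator: returns storage aligned to Alignment bytes.
template <typename T, std::size_t Alignment = 32>
struct AlignedAllocator {
    static_assert((Alignment & (Alignment - 1)) == 0, "Alignment must be a power of two");
    static_assert(Alignment >= sizeof(void*), "Alignment must hold a pointer");

    using value_type = T;
    // Explicit rebind is needed because of the non-type template parameter.
    template <typename U>
    struct rebind { using other = AlignedAllocator<U, Alignment>; };

    AlignedAllocator() noexcept = default;
    template <typename U>
    AlignedAllocator(const AlignedAllocator<U, Alignment>&) noexcept {}

    T* allocate(std::size_t n) {
        // Over-allocate: room to round up plus one slot for the raw pointer.
        void* raw = std::malloc(n * sizeof(T) + Alignment + sizeof(void*));
        if (!raw) throw std::bad_alloc();
        std::uintptr_t start = reinterpret_cast<std::uintptr_t>(raw) + sizeof(void*);
        std::uintptr_t aligned =
            (start + Alignment - 1) & ~static_cast<std::uintptr_t>(Alignment - 1);
        // Remember the raw pointer just in front of the aligned block.
        reinterpret_cast<void**>(aligned)[-1] = raw;
        return reinterpret_cast<T*>(aligned);
    }

    void deallocate(T* p, std::size_t) noexcept {
        std::free(reinterpret_cast<void**>(p)[-1]);
    }
};

template <typename T, typename U, std::size_t A>
bool operator==(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) noexcept {
    return true;
}
template <typename T, typename U, std::size_t A>
bool operator!=(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) noexcept {
    return false;
}
```

<div style>A vector declared as std::vector<float, AlignedAllocator<float, 32>> then has a 32-byte aligned data() pointer, also after a reallocation. Whether this closes the gap to the plain array would have to be measured.</div>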
<div style><br></div><div style><br></div><div style>I compiled like this:</div><div style><br></div><div style><div>icc -mavx -I ./ -O2 -I ${VCROOT}/include testmodif.cpp -o foo.x -std=c++11 -L ${VCROOT}/lib -lVc -fabi-version=6</div>
<div><br></div></div><div style><br></div><div style>Best</div><div style><br></div><div style>Sandro</div><div style><br></div><div style><br></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">2014-03-06 9:42 GMT+01:00 Tijskens Engelbert <span dir="ltr"><<a href="mailto:Engelbert.Tijskens@uantwerpen.be" target="_blank">Engelbert.Tijskens@uantwerpen.be</a>></span>:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div>
<div style="word-wrap:break-word">Dear Sandro,
<div>The attachments contain the main file test.cpp and the included timer.h.</div>
<div>I included some unrolling tests for the scalar case, as Matthias suggested. That does indeed help. I have not checked the SIMD case so far.</div>
<div>Kindest regards,</div>
<div>Bert</div>
</div><div><div class="h5">
<div style="word-wrap:break-word">
<div></div>
<div><br>
<div>
<div>On 05 Mar 2014, at 09:23, Sandro Wenzel <<a href="mailto:sandro.wenzel@cern.ch" target="_blank">sandro.wenzel@cern.ch</a>> wrote:</div>
<br>
<blockquote type="cite">
<div dir="ltr">Dear Tijskens,
<div><br>
</div>
<div>I was intrigued by your observations and tried to reproduce them, but failed. Like Matthias, I feel that measuring such a short, minimalistic code section is really tough.</div>
<div><br>
</div>
<div>Would you be able to share your benchmark code and the way you compile it, so that I can take a more thorough look?</div>
<div><br>
</div>
<div>Best</div>
<div><br>
</div>
<div>Sandro</div>
<div><br>
</div>
<div><br>
<br>
<div>2014-03-04 18:57 GMT+01:00 Tijskens Engelbert <span dir="ltr">
<<a href="mailto:Engelbert.Tijskens@uantwerpen.be" target="_blank">Engelbert.Tijskens@uantwerpen.be</a>></span>:<br>
<blockquote style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div style="word-wrap:break-word">Dear all,
<div><br>
</div>
<div>I am trying to figure out how to use std::vector<float> efficiently in combination with Vc (to have both dynamic arrays and performance).</div>
<div><br>
</div>
<div>
<div style="margin:0px;font-size:11px;font-family:Monaco"> std::<span style="color:rgb(0,97,65)">vector</span><<span style="color:rgb(147,26,104)">float</span>> x(1024);</div>
<div style="margin:0px;font-size:11px;font-family:Monaco"> <span style="color:#931a68">
for</span>( <span style="color:#931a68">int</span> i=0; i<ne; ++i ) {//initialize</div>
<div style="margin:0px;font-size:11px;font-family:Monaco"> x[i]=1.0;</div>
<div style="margin:0px;font-size:11px;font-family:Monaco"> }</div>
<div style="margin:0px;font-size:11px;font-family:Monaco">// scalar loop using std::vector</div>
<div style="margin:0px;font-size:11px;font-family:Monaco"> <span style="color:rgb(147,26,104)">for</span>(
<span style="color:rgb(147,26,104)">int</span> i=0; i<ne; ++i ) {</div>
<div style="margin:0px;font-size:11px;font-family:Monaco"> x[i] -= 1.0;</div>
<div style="margin:0px;font-size:11px;font-family:Monaco"> }</div>
<div style="margin:0px;font-size:11px;font-family:Monaco">// vector loop using std::vector</div>
<div style="margin:0px;font-size:11px;font-family:Monaco"> <span style="color:rgb(147,26,104)">
for</span>( <span style="color:rgb(147,26,104)">int</span> i=0; i<ne; i+=Vc::<span style="color:rgb(0,97,65)">float_v</span>::<span style="color:rgb(3,38,204)">Size</span> )</div>
<div style="margin:0px;font-size:11px;font-family:Monaco"> {</div>
<div style="margin:0px;font-size:11px;font-family:Monaco"> Vc::<span style="color:#006141">float_v</span> vx( &x[i] );</div>
<div style="margin:0px;font-size:11px;font-family:Monaco"> vx -= 1.0;</div>
<div style="margin:0px;font-size:11px;font-family:Monaco"> vx.store( &x[i] );</div>
<div style="margin:0px;font-size:11px;font-family:Monaco"> }</div>
<div style="margin:0px;font-size:11px;font-family:Monaco"><span style="color:rgb(78,144,114)">// vector loop using Vc::Memory instead of std::vector</span></div>
<div style="margin:0px;font-size:11px;font-family:Monaco"> Vc::<span style="text-decoration:underline;color:#006141">Memory</span><Vc::<span style="color:#006141">float_v</span>,ne> Vx;</div>
<div style="margin:0px;font-size:11px;font-family:Monaco"> <span style="color:#931a68">
for</span>( <span style="color:#931a68">int</span> i=0; i<ne; ++i ) {//initialize</div>
<div style="margin:0px;font-size:11px;font-family:Monaco"> Vx[i] = 1.0;</div>
<div style="margin:0px;font-size:11px;font-family:Monaco"> }</div>
<div style="margin:0px;font-size:11px;font-family:Monaco"> Vc::<span style="color:#006141">float_v</span> one(1.);</div>
<div style="margin:0px;font-size:11px;font-family:Monaco"> ET_TIME_THIS</div>
<div style="margin:0px;font-size:11px;font-family:Monaco;color:rgb(57,51,255)">
<span> ( </span>"<span style="text-decoration:underline">Vc</span>::Memory<Vc::float_v,<span style="text-decoration:underline">ne</span>> vector"<span>,</span></div>
<div style="margin:0px;font-size:11px;font-family:Monaco"> <span style="color:#931a68">
for</span>( <span style="color:#931a68">int</span> i=0; i<nv; ++i ) {</div>
<div style="margin:0px;font-size:11px;font-family:Monaco"> Vx.<span style="text-decoration:underline">vector</span>(i) -= one;</div>
<div style="margin:0px;font-size:11px;font-family:Monaco"> }</div>
<div style="margin:0px;font-size:11px;font-family:Monaco">When I time these loops I get the following results:</div>
</div>
<div style="margin:0px;font-size:11px;font-family:Monaco">
<div style="margin:0px;font-family:Menlo">scalar loop using std::vector : 2162 cycles/repetition, 9.4e-07 seconds/repetition, 1 x speedup, 2.3 GHz, 100 repetitions.</div>
<div style="margin:0px;font-family:Menlo">vector loop using std::vector : 357 cycles/repetition, 1.6e-07 seconds/repetition, 6.04x speedup, 2.24 GHz, 100 repetitions.</div>
<div style="margin:0px;font-family:Menlo">vector loop using Vc::Memory : 288 cycles/repetition, 1.2e-07 seconds/repetition, 7.49x speedup, 2.4 GHz, 100 repetitions.</div>
<div><br>
</div>
<div><br>
</div>
<div>Is there a way to improve the vector loop using std::vector? By the way, if I write the second loop as</div>
<div>
<div style="margin:0px">// vector loop using std::vector</div>
<div style="margin:0px"> <span style="color:rgb(147,26,104)">for</span>( <span style="color:rgb(147,26,104)">int</span> i=0; i<ne; i+=Vc::<span style="color:rgb(0,97,65)">float_v</span>::<span style="color:rgb(3,38,204)">Size</span> )</div>
<div style="margin:0px"> {</div>
<div style="margin:0px"> Vc::<span style="color:rgb(0,97,65)">float_v</span> vx( &x[i] );</div>
<div style="margin:0px"> Vx.<span style="text-decoration:underline">vector</span>(i) -= one;</div>
<div style="margin:0px"> vx.store( &x[i] );</div>
<div style="margin:0px"> }</div>
</div>
<div style="margin:0px">things get even worse: the speedup is only about 4.2x.</div>
<div><br>
</div>
</div>
</div>
<br>
_______________________________________________<br>
Vc mailing list<br>
<a href="mailto:Vc@compeng.uni-frankfurt.de" target="_blank">Vc@compeng.uni-frankfurt.de</a><br>
<a href="https://compeng.uni-frankfurt.de/mailman/listinfo/vc" target="_blank">https://compeng.uni-frankfurt.de/mailman/listinfo/vc</a><br>
<br>
</blockquote>
</div>
<br>
</div>
<br clear="all">
<div><br>
</div>
-- <br>
<div dir="ltr">Dr. Sandro Wenzel<br>
<div>PH / SFT</div>
<div>CERN <br>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</div>
</div></div></div>
</blockquote></div><br><br clear="all"><div><br></div>-- <br><div dir="ltr">Dr. Sandro Wenzel<br><div>PH / SFT</div><div>CERN <br><br></div></div>
</div>