Vc - gcc
@@@ test0 - begin @@@
@@@ scalar loops use Vc::malloc
@@@ array size = 2000
@@@ vectorization enabled
@@@ NO streaming stores
/10000 done.
scalar + : 770 cycles/repetition, 3.358e-07 seconds/repetition, 1x speedup, 2.29 GHz, 10000 repetitions.
/10000 done.
vector + : 481 cycles/repetition, 2.098e-07 seconds/repetition, 1.6x speedup, 2.29 GHz, 10000 repetitions.
/10000 done.
scalar - : 768 cycles/repetition, 3.347e-07 seconds/repetition, 1x speedup, 2.29 GHz, 10000 repetitions.
/10000 done.
vector - : 418 cycles/repetition, 1.823e-07 seconds/repetition, 1.84x speedup, 2.29 GHz, 10000 repetitions.
/10000 done.
scalar * : 768 cycles/repetition, 3.351e-07 seconds/repetition, 1x speedup, 2.29 GHz, 10000 repetitions.
/10000 done.
vector * : 417 cycles/repetition, 1.819e-07 seconds/repetition, 1.85x speedup, 2.29 GHz, 10000 repetitions.
/10000 done.
scalar / : 2607 cycles/repetition, 1.136e-06 seconds/repetition, 0.296x speedup, 2.29 GHz, 10000 repetitions.
/10000 done.
vector / : 2510 cycles/repetition, 1.094e-06 seconds/repetition, 0.307x speedup, 2.29 GHz, 10000 repetitions.
/10000 done.
scalar sqrt : 18665 cycles/repetition, 8.134e-06 seconds/repetition, 0.0413x speedup, 2.29 GHz, 10000 repetitions.
/10000 done.
vector sqrt : 2479 cycles/repetition, 1.081e-06 seconds/repetition, 0.311x speedup, 2.29 GHz, 10000 repetitions.
/10000 done.
scalar log : 53051 cycles/repetition, 2.312e-05 seconds/repetition, 0.0145x speedup, 2.29 GHz, 10000 repetitions.
/10000 done.
vector log : 11354 cycles/repetition, 4.948e-06 seconds/repetition, 0.0679x speedup, 2.29 GHz, 10000 repetitions.
#different results: 44/2000 maxreldiff=1.18e-07
/10000 done.
scalar pow : 316990 cycles/repetition, 0.0001381 seconds/repetition, 0.00243x speedup, 2.29 GHz, 10000 repetitions.
/10000 done.
vector pow : 23566 cycles/repetition, 1.027e-05 seconds/repetition, 0.0327x speedup, 2.29 GHz, 10000 repetitions.
#different results: 391/2000 maxreldiff=2.39e-07
@@@ test0 - done @@@
real 0m1.897s
user 0m1.896s
sys 0m0.000s

boost::simd - gcc
@@@ test0 - begin @@@
@@@ array size = 2000
@@@ vectorization enabled
/10000 done.
scalar + : 4857 cycles/repetition, 2.117e-06 seconds/repetition, 1x speedup, 2.29 GHz, 10000 repetitions.
/10000 done.
vector + : 984 cycles/repetition, 4.29e-07 seconds/repetition, 4.93x speedup, 2.29 GHz, 10000 repetitions.
/10000 done.
scalar - : 4778 cycles/repetition, 2.082e-06 seconds/repetition, 1.02x speedup, 2.29 GHz, 10000 repetitions.
/10000 done.
vector - : 959 cycles/repetition, 4.181e-07 seconds/repetition, 5.06x speedup, 2.29 GHz, 10000 repetitions.
/10000 done.
scalar * : 4810 cycles/repetition, 2.096e-06 seconds/repetition, 1.01x speedup, 2.29 GHz, 10000 repetitions.
/10000 done.
vector * : 1000 cycles/repetition, 4.36e-07 seconds/repetition, 4.85x speedup, 2.29 GHz, 10000 repetitions.
/10000 done.
scalar / : 10177 cycles/repetition, 4.435e-06 seconds/repetition, 0.477x speedup, 2.29 GHz, 10000 repetitions.
/10000 done.
vector / : 2660 cycles/repetition, 1.16e-06 seconds/repetition, 1.83x speedup, 2.29 GHz, 10000 repetitions.
/10000 done.
scalar sqrt : 18501 cycles/repetition, 8.062e-06 seconds/repetition, 0.263x speedup, 2.29 GHz, 10000 repetitions.
/10000 done.
vector sqrt : 2572 cycles/repetition, 1.121e-06 seconds/repetition, 1.89x speedup, 2.29 GHz, 10000 repetitions.
/10000 done.
scalar log : 57332 cycles/repetition, 2.498e-05 seconds/repetition, 0.0847x speedup, 2.29 GHz, 10000 repetitions.
/10000 done.
vector nt2::log : 12456 cycles/repetition, 5.428e-06 seconds/repetition, 0.39x speedup, 2.29 GHz, 10000 repetitions.
#different results: 70/2000 maxreldiff=1.17e-07
/10000 done.
scalar pow : 323627 cycles/repetition, 0.000141 seconds/repetition, 0.015x speedup, 2.29 GHz, 10000 repetitions.
/10000 done.
vector nt2::pow : 39257 cycles/repetition, 1.711e-05 seconds/repetition, 0.124x speedup, 2.29 GHz, 10000 repetitions.
#different results: 448/2000 maxreldiff=2.39e-07
@@@ test0 - done @@@
real 0m2.111s
user 0m2.110s
sys 0m0.000s
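
Note on reading the numbers: the "speedup" column in each run appears to be relative to that run's `scalar +` row (for example 770/2607 ≈ 0.296 for Vc `scalar /`), not to the scalar version of the same operation. The sketch below is one possible way to recompute a per-operation scalar/vector ratio from the cycle counts. The function name `per_op_speedup` and the regular expression are illustrative assumptions, not part of either benchmark.

```python
import re

# Matches one measurement such as "vector nt2::log : 12456 cycles/repetition".
# The optional "nt2::" prefix is dropped so scalar and vector rows share a key.
MEASUREMENT = re.compile(
    r"(scalar|vector)\s+(?:nt2::)?(\S+)\s*:\s*(\d+) cycles/repetition"
)


def per_op_speedup(log_text):
    """Return {operation: scalar_cycles / vector_cycles} for ONE benchmark run."""
    cycles = {"scalar": {}, "vector": {}}
    for match in MEASUREMENT.finditer(log_text):
        kind, op, count = match.groups()
        cycles[kind][op] = int(count)
    # Only report operations that have both a scalar and a vector row.
    return {
        op: cycles["scalar"][op] / cycles["vector"][op]
        for op in cycles["scalar"]
        if op in cycles["vector"]
    }


# Two operations copied from the Vc run above.
sample = (
    "scalar + : 770 cycles/repetition, 3.358e-07 seconds/repetition, 1x speedup\n"
    "vector + : 481 cycles/repetition, 2.098e-07 seconds/repetition, 1.6x speedup\n"
    "scalar sqrt : 18665 cycles/repetition, 8.134e-06 seconds/repetition\n"
    "vector sqrt : 2479 cycles/repetition, 1.081e-06 seconds/repetition\n"
)

for op, ratio in per_op_speedup(sample).items():
    print(f"{op}: {ratio:.2f}x")
```

The function keeps only the last value seen per operation, so the Vc and boost::simd runs should be passed in separately (for instance by splitting the log at each `@@@ test0 - begin @@@` line).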