Vc - gcc
@@@ test0 - begin @@@
@@@ scalar loops use Vc::malloc
@@@ array size = 2000
@@@ vectorization enabled
@@@ NO streaming stores
scalar + : 752 cycles/repetition, 3.278e-07 seconds/repetition, 1x speedup, 2.29 GHz, 100000 repetitions.
vector + : 397 cycles/repetition, 1.733e-07 seconds/repetition, 1.89x speedup, 2.29 GHz, 100000 repetitions.
scalar - : 745 cycles/repetition, 3.251e-07 seconds/repetition, 1.01x speedup, 2.29 GHz, 100000 repetitions.
vector - : 410 cycles/repetition, 1.787e-07 seconds/repetition, 1.83x speedup, 2.29 GHz, 100000 repetitions.
scalar * : 746 cycles/repetition, 3.255e-07 seconds/repetition, 1.01x speedup, 2.29 GHz, 100000 repetitions.
vector * : 397 cycles/repetition, 1.73e-07 seconds/repetition, 1.89x speedup, 2.29 GHz, 100000 repetitions.
scalar / : 2499 cycles/repetition, 1.089e-06 seconds/repetition, 0.301x speedup, 2.29 GHz, 100000 repetitions.
vector / : 2512 cycles/repetition, 1.095e-06 seconds/repetition, 0.299x speedup, 2.29 GHz, 100000 repetitions.
scalar sqrt : 18239 cycles/repetition, 7.948e-06 seconds/repetition, 0.0412x speedup, 2.29 GHz, 100000 repetitions.
vector sqrt : 2500 cycles/repetition, 1.09e-06 seconds/repetition, 0.301x speedup, 2.29 GHz, 100000 repetitions.
scalar log : 53060 cycles/repetition, 2.312e-05 seconds/repetition, 0.0142x speedup, 2.29 GHz, 100000 repetitions.
vector log : 11075 cycles/repetition, 4.826e-06 seconds/repetition, 0.0679x speedup, 2.29 GHz, 100000 repetitions.
#different results: 44/2000 maxreldiff=1.18e-07
scalar pow : 318772 cycles/repetition, 0.0001389 seconds/repetition, 0.00236x speedup, 2.29 GHz, 100000 repetitions.
vector pow : 24079 cycles/repetition, 1.049e-05 seconds/repetition, 0.0312x speedup, 2.29 GHz, 100000 repetitions.
#different results: 391/2000 maxreldiff=2.39e-07
@@@ test0 - done @@@

real 0m19.010s
user 0m18.998s
sys 0m0.000s

boost::simd - gcc
@@@ test0 - begin @@@
@@@ array size = 2000
@@@ vectorization enabled
scalar + : 5147 cycles/repetition, 2.243e-06 seconds/repetition, 1x speedup, 2.29 GHz, 100000 repetitions.
vector + : 948 cycles/repetition, 4.134e-07 seconds/repetition, 5.43x speedup, 2.29 GHz, 100000 repetitions.
scalar - : 5343 cycles/repetition, 2.329e-06 seconds/repetition, 0.963x speedup, 2.29 GHz, 100000 repetitions.
vector - : 977 cycles/repetition, 4.258e-07 seconds/repetition, 5.27x speedup, 2.29 GHz, 100000 repetitions.
scalar * : 5421 cycles/repetition, 2.362e-06 seconds/repetition, 0.949x speedup, 2.29 GHz, 100000 repetitions.
vector * : 994 cycles/repetition, 4.335e-07 seconds/repetition, 5.17x speedup, 2.29 GHz, 100000 repetitions.
scalar / : 10557 cycles/repetition, 4.601e-06 seconds/repetition, 0.488x speedup, 2.29 GHz, 100000 repetitions.
vector / : 2654 cycles/repetition, 1.157e-06 seconds/repetition, 1.94x speedup, 2.29 GHz, 100000 repetitions.
scalar sqrt : 19258 cycles/repetition, 8.392e-06 seconds/repetition, 0.267x speedup, 2.29 GHz, 100000 repetitions.
vector sqrt : 2581 cycles/repetition, 1.125e-06 seconds/repetition, 1.99x speedup, 2.29 GHz, 100000 repetitions.
scalar log : 59894 cycles/repetition, 2.61e-05 seconds/repetition, 0.0859x speedup, 2.29 GHz, 100000 repetitions.
vector nt2::log : 14230 cycles/repetition, 6.201e-06 seconds/repetition, 0.362x speedup, 2.29 GHz, 100000 repetitions.
#different results: 70/2000 maxreldiff=1.17e-07
scalar pow : 326297 cycles/repetition, 0.0001422 seconds/repetition, 0.0158x speedup, 2.29 GHz, 100000 repetitions.
vector nt2::pow : 39461 cycles/repetition, 1.72e-05 seconds/repetition, 0.13x speedup, 2.29 GHz, 100000 repetitions.
#different results: 448/2000 maxreldiff=2.39e-07
@@@ test0 - done @@@

real 0m21.519s
user 0m21.504s
sys 0m0.000s
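
A note on reading the "speedup" column: in both runs it appears to be computed relative to that run's own "scalar +" line (752 cycles for Vc, 5147 cycles for boost::simd), not relative to the scalar version of the same operation. This is an inference from the numbers, not something the log states. The sketch below copies the cycle counts from the log and computes both ratios; the names `CYCLES`, `logged_speedup` and `per_op_speedup` are made up for this illustration and are not part of either benchmark. The recomputed baseline ratio agrees with the logged column to within the rounding of the printed cycle counts.

```python
# Cycles per repetition copied from the log above, as (scalar, vector) pairs.
CYCLES = {
    "Vc": {
        "+": (752, 397),
        "-": (745, 410),
        "*": (746, 397),
        "/": (2499, 2512),
        "sqrt": (18239, 2500),
        "log": (53060, 11075),
        "pow": (318772, 24079),
    },
    "boost::simd": {
        "+": (5147, 948),
        "-": (5343, 977),
        "*": (5421, 994),
        "/": (10557, 2654),
        "sqrt": (19258, 2581),
        "log": (59894, 14230),
        "pow": (326297, 39461),
    },
}


def logged_speedup(baseline_cycles, cycles):
    """Ratio against the run's 'scalar +' baseline (assumed meaning of the log column)."""
    return baseline_cycles / cycles


def per_op_speedup(scalar_cycles, vector_cycles):
    """Vector-over-scalar ratio for one operation within one run."""
    return scalar_cycles / vector_cycles


if __name__ == "__main__":
    for lib, ops in CYCLES.items():
        baseline = ops["+"][0]  # cycles of the 'scalar +' line in this run
        print(lib)
        for op, (scalar, vector) in ops.items():
            print(
                f"  {op:>4}: vs 'scalar +' = {logged_speedup(baseline, vector):.3g}x, "
                f"vs own scalar = {per_op_speedup(scalar, vector):.3g}x"
            )
```

The two runs use different scalar baselines (752 versus 5147 cycles for "scalar +"), so the speedup columns are not directly comparable across libraries; the raw vector cycle counts are the safer basis for comparing them.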