Vc - gcc
@@@ test0 - begin @@@
@@@ scalar loops use Vc::malloc
@@@ array size = 2000
@@@ vectorization enabled
@@@ NO streaming stores
/10 done.
scalar +    : 4054 cycles/repetition, 1.8e-06 seconds/repetition, 1x speedup, 2.25 GHz, 10 repetitions.
/10 done.
vector +    : 1359 cycles/repetition, 6.108e-07 seconds/repetition, 2.98x speedup, 2.23 GHz, 10 repetitions.
/10 done.
scalar -    : 2120 cycles/repetition, 9.404e-07 seconds/repetition, 1.91x speedup, 2.25 GHz, 10 repetitions.
/10 done.
vector -    : 1363 cycles/repetition, 6.094e-07 seconds/repetition, 2.97x speedup, 2.24 GHz, 10 repetitions.
/10 done.
scalar *    : 2091 cycles/repetition, 9.271e-07 seconds/repetition, 1.94x speedup, 2.26 GHz, 10 repetitions.
/10 done.
vector *    : 1330 cycles/repetition, 5.948e-07 seconds/repetition, 3.05x speedup, 2.24 GHz, 10 repetitions.
/10 done.
scalar /    : 6797 cycles/repetition, 2.977e-06 seconds/repetition, 0.597x speedup, 2.28 GHz, 10 repetitions.
/10 done.
vector /    : 6781 cycles/repetition, 2.97e-06 seconds/repetition, 0.598x speedup, 2.28 GHz, 10 repetitions.
/10 done.
scalar sqrt : 49851 cycles/repetition, 2.174e-05 seconds/repetition, 0.0813x speedup, 2.29 GHz, 10 repetitions.
/10 done.
vector sqrt : 6773 cycles/repetition, 2.966e-06 seconds/repetition, 0.599x speedup, 2.28 GHz, 10 repetitions.
/10 done.
scalar log  : 148018 cycles/repetition, 6.452e-05 seconds/repetition, 0.0274x speedup, 2.29 GHz, 10 repetitions.
/10 done.
vector log  : 31952 cycles/repetition, 1.412e-05 seconds/repetition, 0.127x speedup, 2.26 GHz, 10 repetitions.
#different results: 44/2000 maxreldiff=1.18e-07
/10 done.
scalar pow  : 894171 cycles/repetition, 0.0003898 seconds/repetition, 0.00453x speedup, 2.29 GHz, 10 repetitions.
/10 done.
vector pow  : 63665 cycles/repetition, 2.776e-05 seconds/repetition, 0.0637x speedup, 2.29 GHz, 10 repetitions.
#different results: 391/2000 maxreldiff=2.39e-07
@@@ test0 - done @@@

real 0m0.009s
user 0m0.004s
sys  0m0.005s

boost::simd - gcc
@@@ test0 - begin @@@
@@@ array size = 2000
@@@ vectorization enabled
/10 done.
scalar +         : 12779 cycles/repetition, 5.609e-06 seconds/repetition, 1x speedup, 2.28 GHz, 10 repetitions.
/10 done.
vector +         : 3945 cycles/repetition, 1.74e-06 seconds/repetition, 3.24x speedup, 2.27 GHz, 10 repetitions.
/10 done.
scalar -         : 12730 cycles/repetition, 5.566e-06 seconds/repetition, 1x speedup, 2.29 GHz, 10 repetitions.
/10 done.
vector -         : 3298 cycles/repetition, 1.453e-06 seconds/repetition, 3.87x speedup, 2.27 GHz, 10 repetitions.
/10 done.
scalar *         : 12742 cycles/repetition, 5.568e-06 seconds/repetition, 1x speedup, 2.29 GHz, 10 repetitions.
/10 done.
vector *         : 3220 cycles/repetition, 1.418e-06 seconds/repetition, 3.97x speedup, 2.27 GHz, 10 repetitions.
/10 done.
scalar /         : 26887 cycles/repetition, 1.173e-05 seconds/repetition, 0.475x speedup, 2.29 GHz, 10 repetitions.
/10 done.
vector /         : 7161 cycles/repetition, 3.136e-06 seconds/repetition, 1.78x speedup, 2.28 GHz, 10 repetitions.
/10 done.
scalar sqrt      : 56989 cycles/repetition, 2.486e-05 seconds/repetition, 0.224x speedup, 2.29 GHz, 10 repetitions.
/10 done.
vector sqrt      : 7136 cycles/repetition, 3.127e-06 seconds/repetition, 1.79x speedup, 2.28 GHz, 10 repetitions.
/10 done.
scalar log       : 159360 cycles/repetition, 6.947e-05 seconds/repetition, 0.0802x speedup, 2.29 GHz, 10 repetitions.
/10 done.
vector nt2::log  : 34108 cycles/repetition, 1.489e-05 seconds/repetition, 0.375x speedup, 2.29 GHz, 10 repetitions.
#different results: 70/2000 maxreldiff=1.17e-07
/10 done.
scalar pow       : 900736 cycles/repetition, 0.0003925 seconds/repetition, 0.0142x speedup, 2.29 GHz, 10 repetitions.
/10 done.
vector nt2::pow  : 104892 cycles/repetition, 4.573e-05 seconds/repetition, 0.122x speedup, 2.29 GHz, 10 repetitions.
#different results: 448/2000 maxreldiff=2.39e-07
@@@ test0 - done @@@

real 0m0.010s
user 0m0.009s
sys  0m0.000s
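
Note on reading the speedup column: the figures above look like they are computed against the first measurement of each run (scalar +), not against the scalar loop of the same operation. That is an inference from the numbers, not something the log states; for example 4054 / 1359 is about 2.98 for Vc "vector +". The sketch below recomputes both ratios from the cycles/repetition values copied out of the log. The names CYCLES, speedup_vs_baseline and speedup_vs_scalar are made up for this illustration and are not part of the benchmark program.

```python
# Cycles per repetition, copied from the two runs above.
# Layout: {library: {operation: (scalar_cycles, vector_cycles)}}
CYCLES = {
    "Vc": {
        "+": (4054, 1359), "-": (2120, 1363), "*": (2091, 1330),
        "/": (6797, 6781), "sqrt": (49851, 6773),
        "log": (148018, 31952), "pow": (894171, 63665),
    },
    "boost::simd": {
        "+": (12779, 3945), "-": (12730, 3298), "*": (12742, 3220),
        "/": (26887, 7161), "sqrt": (56989, 7136),
        "log": (159360, 34108), "pow": (900736, 104892),
    },
}


def speedup_vs_baseline(library, op, kind):
    """Speedup relative to the library's 'scalar +' run.

    Assumption: this is how the log's 'x speedup' column is defined.
    """
    baseline = CYCLES[library]["+"][0]
    scalar, vector = CYCLES[library][op]
    return baseline / (scalar if kind == "scalar" else vector)


def speedup_vs_scalar(library, op):
    """Vector speedup relative to the scalar loop of the same operation."""
    scalar, vector = CYCLES[library][op]
    return scalar / vector


if __name__ == "__main__":
    for library, ops in CYCLES.items():
        print(library)
        for op in ops:
            base = speedup_vs_baseline(library, op, "vector")
            same = speedup_vs_scalar(library, op)
            print(f"  {op:>4}: vs scalar + = {base:.3g}x, vs scalar {op} = {same:.3g}x")
```

Read this way, the 0.599x reported for Vc "vector sqrt" is a comparison with scalar addition; against the scalar sqrt loop of the same run the vector version takes roughly a seventh of the cycles. Because the log rounds cycle counts, recomputed ratios can differ from the printed ones in the last digit.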