Vc - gcc
@@@ test0 - begin @@@
@@@ scalar loops use Vc::malloc
@@@ array size = 2000
@@@ vectorization enabled
@@@ NO streaming stores
scalar + : 1941 cycles/repetition, 8.464e-07 seconds/repetition, 1x speedup, 2.29 GHz, 1000 repetitions.
vector + : 1000 cycles/repetition, 4.362e-07 seconds/repetition, 1.94x speedup, 2.29 GHz, 1000 repetitions.
scalar - : 1984 cycles/repetition, 8.654e-07 seconds/repetition, 0.978x speedup, 2.29 GHz, 1000 repetitions.
vector - : 415 cycles/repetition, 1.813e-07 seconds/repetition, 4.67x speedup, 2.29 GHz, 1000 repetitions.
scalar * : 764 cycles/repetition, 3.33e-07 seconds/repetition, 2.54x speedup, 2.29 GHz, 1000 repetitions.
vector * : 410 cycles/repetition, 1.79e-07 seconds/repetition, 4.73x speedup, 2.29 GHz, 1000 repetitions.
scalar / : 2626 cycles/repetition, 1.144e-06 seconds/repetition, 0.739x speedup, 2.29 GHz, 1000 repetitions.
vector / : 2616 cycles/repetition, 1.14e-06 seconds/repetition, 0.742x speedup, 2.29 GHz, 1000 repetitions.
scalar sqrt : 29771 cycles/repetition, 1.297e-05 seconds/repetition, 0.0652x speedup, 2.29 GHz, 1000 repetitions.
vector sqrt : 2520 cycles/repetition, 1.098e-06 seconds/repetition, 0.77x speedup, 2.29 GHz, 1000 repetitions.
scalar log : 56425 cycles/repetition, 2.459e-05 seconds/repetition, 0.0344x speedup, 2.29 GHz, 1000 repetitions.
vector log : 11427 cycles/repetition, 4.981e-06 seconds/repetition, 0.17x speedup, 2.29 GHz, 1000 repetitions.
#different results: 44/2000 maxreldiff=1.18e-07
scalar pow : 320071 cycles/repetition, 0.0001395 seconds/repetition, 0.00607x speedup, 2.29 GHz, 1000 repetitions.
vector pow : 23402 cycles/repetition, 1.02e-05 seconds/repetition, 0.083x speedup, 2.29 GHz, 1000 repetitions.
#different results: 391/2000 maxreldiff=2.39e-07
@@@ test0 - done @@@

real 0m0.202s
user 0m0.197s
sys 0m0.004s

boost::simd - gcc
@@@ test0 - begin @@@
@@@ array size = 2000
@@@ vectorization enabled
scalar + : 4945 cycles/repetition, 2.155e-06 seconds/repetition, 1x speedup, 2.29 GHz, 1000 repetitions.
vector + : 1022 cycles/repetition, 4.456e-07 seconds/repetition, 4.84x speedup, 2.29 GHz, 1000 repetitions.
scalar - : 4907 cycles/repetition, 2.139e-06 seconds/repetition, 1.01x speedup, 2.29 GHz, 1000 repetitions.
vector - : 1011 cycles/repetition, 4.409e-07 seconds/repetition, 4.89x speedup, 2.29 GHz, 1000 repetitions.
scalar * : 4936 cycles/repetition, 2.152e-06 seconds/repetition, 1x speedup, 2.29 GHz, 1000 repetitions.
vector * : 1051 cycles/repetition, 4.581e-07 seconds/repetition, 4.71x speedup, 2.29 GHz, 1000 repetitions.
scalar / : 10194 cycles/repetition, 4.443e-06 seconds/repetition, 0.485x speedup, 2.29 GHz, 1000 repetitions.
vector / : 2691 cycles/repetition, 1.173e-06 seconds/repetition, 1.84x speedup, 2.29 GHz, 1000 repetitions.
scalar sqrt : 19619 cycles/repetition, 8.55e-06 seconds/repetition, 0.252x speedup, 2.29 GHz, 1000 repetitions.
vector sqrt : 2661 cycles/repetition, 1.16e-06 seconds/repetition, 1.86x speedup, 2.29 GHz, 1000 repetitions.
scalar log : 59617 cycles/repetition, 2.598e-05 seconds/repetition, 0.083x speedup, 2.29 GHz, 1000 repetitions.
vector nt2::log : 12467 cycles/repetition, 5.433e-06 seconds/repetition, 0.397x speedup, 2.29 GHz, 1000 repetitions.
#different results: 70/2000 maxreldiff=1.17e-07
scalar pow : 326687 cycles/repetition, 0.0001424 seconds/repetition, 0.0151x speedup, 2.29 GHz, 1000 repetitions.
vector nt2::pow : 39758 cycles/repetition, 1.733e-05 seconds/repetition, 0.124x speedup, 2.29 GHz, 1000 repetitions.
#different results: 448/2000 maxreldiff=2.39e-07
@@@ test0 - done @@@

real 0m0.216s
user 0m0.216s
sys 0m0.000s