Intel SVML support

Thu Feb 7 18:48:17 CET 2013

[please enable javascript to see the address]> wrote:
> Hello Roberto,
>  
> great initiative! Supporting the use of SVML is a very nice feature. Obviously this needs to stay optional as you already implemented it.
> It would be fine to adjust the unit tests to simply expect different precision when compiled against SVML.
>  
> I would open up a new branch in git to integrate these changes. If you send me the patch as an attachement I would be able to easily apply it. Or you could register at code.compeng.uni-frankfurt.de and get commit access yourself (then the commit history would carry your name properly :) ).

Sounds good, if you give me commit rights I will merge the patch myself in the next couple of weeks.

> A few comments on the patch:
>  
> - The cmake logic should use a switch (like the BUILD_EXAMPLES switch) to request use of SVML. In that case cmake would search for the library (find_library is what you need) and if not found it would then error out.

Ok

> - Your functions now look like this:
>  
> static inline Vector<float> exp(VC_ALIGNED_PARAMETER(Vector<float>) x) {
>     Vector<float> tmp;
>     tmp.data() = __svml_expf4(x.data());
>     return tmp;
> }
>  
> Have you tried to do:
> static inline Vector<float> exp(VC_ALIGNED_PARAMETER(Vector<float>) x) {
>     return __svml_expf4(x.data());
> }
> ? If __svml_expf4 returns __m128/__m256 this should do exactly what you want, as the compiler sees it needs to call the Vector<float>(__m128) constructor.
>  

> - The sincos hack is dangerous. You should use VC_GNU_ASM to determine whether __asm__ is allowed. Then the movaps vs. vmovaps is determined by VC_USE_VEX_CODING. Note that you can compile Vc with SSE and VEX coding. Also, the Windows branch of the SSE implementation looks like a copy-paste error. :)
> I find it strange, though, that SVML returns the cosine in xmm1/ymm1. Are you sure that it does not expect a pointer as one of it arguments? Is this documented somewhere?

I followed the approach used in Agner Fog's vector library (http://www.agner.org/optimize/vectorclass.pdf); I assume there is a good reason why he is doing it this way but I will look into it in more detail.

Thanks,
Roberto

> Regards,
> 	Matthias
>  
> On Thursday 31 January 2013 13:28:48 Roberto Agostino Vitillo wrote:
> > Hi,
> > 
> > The following patch adds support for the Intel SVML library. Intel SVML has
> > an accuracy of 4 ulp (typically 2) and in general it seems to outperform Vc
> > by a factor of up to 2 (Ivy Bridge). The Intel library provides also a
> > higher accuracy for double precision which is vital for the science
> > experiment I am working for.
> > 
> > Support is enabled by passing to cmake the path of the Intel SVML library
> > through the INTEL_SVML_PATH flag i.e. cmake
> > -DINTEL_SVML_PATH=/opt/intel/composerxe/compiler/lib/intel64/.
> > 
> > The following tests are failing on Linux when enabling SVML:
> > 
> > c++11_math_sse (Failed)
> > c++11_math_avx (Failed)
> > math_VC_LOG_ILP_sse (Failed)
> > math_VC_LOG_ILP_avx (Failed)
> >  c++11_math_VC_LOG_ILP_sse (Failed)
> > c++11_math_VC_LOG_ILP_avx (Failed)
> > math_VC_LOG_ILP2_sse (Failed)
> > math_VC_LOG_ILP2_avx (Failed)
> > c++11_math_VC_LOG_ILP2_sse (Failed)
> > c++11_math_VC_LOG_ILP2_avx (Failed)
> > 
> > They all fail on exp() and log(). Vc allows a distance of 1 and 2 for single
> > and double precision respectively while SVML has a distance of 3 in some
> > cases.
> > 
> > I am sure the code can be organized better architecturally but it should
> > provide everything you need to hopefully add support for the Intel library.
> > 
> > Roberto
> > 
> > 
> > diff --git a/CMakeLists.txt b/CMakeLists.txt
> > index 9895338..83c4a46 100644
> > --- a/CMakeLists.txt
> > +++ b/CMakeLists.txt
> > @@ -98,6 +98,11 @@ if(Vc_COMPILER_IS_INTEL)
> >     set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -w1 -fp-model precise")
> >  endif()
> > 
> > +if(INTEL_SVML_PATH)
> > +    add_definitions(-DUSE_INTEL_SVML)
> > +    set(CMAKE_EXE_LINKER_FLAGS "-L ${INTEL_SVML}/ -lsvml")
> > +endif(INTEL_SVML_PATH)
> > +
> >  if(CMAKE_BUILD_TYPE STREQUAL "" AND NOT CMAKE_CXX_FLAGS MATCHES "-O[123]")
> >     message(STATUS "WARNING! It seems you are compiling without
> > optimization. Please set CMAKE_BUILD_TYPE.") endif(CMAKE_BUILD_TYPE
> > STREQUAL "" AND NOT CMAKE_CXX_FLAGS MATCHES "-O[123]") diff --git
> > a/common/exponential.h b/common/exponential.h
> > index 9063172..1f14e20 100644
> > --- a/common/exponential.h
> > +++ b/common/exponential.h
> > @@ -49,6 +49,40 @@ namespace Common
> >      template<typename T> struct TypenameForLdexp { typedef Vector<int>
> > Type; }; template<> struct TypenameForLdexp<Vc::sfloat> { typedef
> > Vector<short> Type; };
> > 
> > +#if defined(USE_INTEL_SVML)
> > +#if defined(VC_IMPL_SSE)
> > +static inline Vector<float> exp(VC_ALIGNED_PARAMETER(Vector<float>) x) {
> > +    Vector<float> tmp;
> > +    tmp.data() = __svml_expf4(x.data());
> > +    return tmp;
> > +}
> > +
> > +static inline Vector<sfloat> exp(VC_ALIGNED_PARAMETER(Vector<sfloat>) x) {
> > +    Vector<sfloat> tmp;
> > +    tmp.data()[0] = __svml_expf4(x.data()[0]);
> > +    tmp.data()[1] = __svml_expf4(x.data()[1]);
> > +    return tmp;
> > +}
> > +
> > +static inline Vector<double> exp(VC_ALIGNED_PARAMETER(Vector<double>) x) {
> > +    Vector<double> tmp;
> > +    tmp.data() = __svml_exp2(x.data());
> > +    return tmp;
> > +}
> > +#else
> > +template<typename T> static inline Vector<T>
> > exp(VC_ALIGNED_PARAMETER(Vector<T>) x) { +    Vector<T> tmp;
> > +    tmp.data() = __svml_expf8(x.data());
> > +    return tmp;
> > +}
> > +
> > +template<> inline Vector<double> exp(VC_ALIGNED_PARAMETER(Vector<double>)
> > x) { +    Vector<double> tmp;
> > +    tmp.data() = __svml_exp4(x.data());
> > +    return tmp;
> > +}
> > +#endif
> > +#else
> >      template<typename T> static inline Vector<T>
> > exp(VC_ALIGNED_PARAMETER(Vector<T>) _x) { typedef Vector<T> V;
> >          typedef typename V::Mask M;
> > @@ -131,6 +165,7 @@ namespace Common
> > 
> >          return x;
> >      }
> > +#endif
> >  } // namespace Common
> >  namespace VC__USE_NAMESPACE
> >  {
> > diff --git a/common/logarithm.h b/common/logarithm.h
> > index f5b8455..5247ce6 100644
> > --- a/common/logarithm.h
> > +++ b/common/logarithm.h
> > @@ -49,6 +49,8 @@
> >  #define VC_COMMON_LOGARITHM_H
> > 
> >  #include "macros.h"
> > +#include "svml.h"
> > +
> >  namespace Vc
> >  {
> >  namespace Common
> > @@ -56,6 +58,9 @@ namespace Common
> >  #ifdef VC__USE_NAMESPACE
> >  using Vc::VC__USE_NAMESPACE::Const;
> >  using Vc::VC__USE_NAMESPACE::Vector;
> > +using Vc::VC__USE_NAMESPACE::float_v;
> > +using Vc::VC__USE_NAMESPACE::sfloat_v;
> > +using Vc::VC__USE_NAMESPACE::double_v;
> >  #endif
> >  enum LogarithmBase {
> >      BaseE, Base10, Base2
> > @@ -166,8 +171,8 @@ struct LogImpl
> >          }
> >      }
> > 
> > -    static inline Vc_ALWAYS_INLINE void log_series(Vector<double>
> > &VC_RESTRICT x, Vector<double>::AsArg exponent) { -        typedef
> > Vector<double> V;
> > +    static inline Vc_ALWAYS_INLINE void log_series(double_v &VC_RESTRICT x,
> > double_v::AsArg exponent) { +        typedef double_v V;
> >          typedef Const<double> C;
> >          const V x2 = x * x;
> >          V y = C::P(0);
> > @@ -246,6 +251,107 @@ struct LogImpl
> >      }
> >  };
> > 
> > +#if defined(USE_INTEL_SVML)
> > +#if defined(VC_IMPL_SSE)
> > +// log
> > +static inline float_v log(VC_ALIGNED_PARAMETER(float_v) x) {
> > +    float_v tmp;
> > +    tmp.data() = __svml_logf4(x.data());
> > +    return tmp;
> > +}
> > +
> > +static inline sfloat_v log(VC_ALIGNED_PARAMETER(sfloat_v) x) {
> > +    sfloat_v tmp;
> > +    tmp.data()[0] = __svml_logf4(x.data()[0]);
> > +    tmp.data()[1] = __svml_logf4(x.data()[1]);
> > +    return tmp;
> > +}
> > +
> > +static inline double_v log(VC_ALIGNED_PARAMETER(double_v) x) {
> > +    double_v tmp;
> > +    tmp.data() = __svml_log2(x.data());
> > +    return tmp;
> > +}
> > +
> > +// log10
> > +static inline float_v log10(VC_ALIGNED_PARAMETER(float_v) x) {
> > +    float_v tmp;
> > +    tmp.data() = __svml_log10f4(x.data());
> > +    return tmp;
> > +}
> > +
> > +static inline sfloat_v log10(VC_ALIGNED_PARAMETER(sfloat_v) x) {
> > +    sfloat_v tmp;
> > +    tmp.data()[0] = __svml_log10f4(x.data()[0]);
> > +    tmp.data()[1] = __svml_log10f4(x.data()[1]);
> > +    return tmp;
> > +}
> > +
> > +static inline double_v log10(VC_ALIGNED_PARAMETER(double_v) x) {
> > +    double_v tmp;
> > +    tmp.data() = __svml_log102(x.data());
> > +    return tmp;
> > +}
> > +// log2
> > +static inline float_v log2(VC_ALIGNED_PARAMETER(float_v) x) {
> > +    float_v tmp;
> > +    tmp.data() = __svml_log2f4(x.data());
> > +    return tmp;
> > +}
> > +
> > +static inline sfloat_v log2(VC_ALIGNED_PARAMETER(sfloat_v) x) {
> > +    sfloat_v tmp;
> > +    tmp.data()[0] = __svml_log2f4(x.data()[0]);
> > +    tmp.data()[1] = __svml_log2f4(x.data()[1]);
> > +    return tmp;
> > +}
> > +
> > +static inline double_v log2(VC_ALIGNED_PARAMETER(double_v) x) {
> > +    double_v tmp;
> > +    tmp.data() = __svml_log22(x.data());
> > +    return tmp;
> > +}
> > +#else
> > +// log
> > +template<typename T> static inline Vector<T>
> > log(VC_ALIGNED_PARAMETER(Vector<T>) x) { +    Vector<T> tmp;
> > +    tmp.data() = __svml_logf8(x.data());
> > +    return tmp;
> > +}
> > +
> > +template<> inline double_v log(VC_ALIGNED_PARAMETER(double_v) x) {
> > +    double_v tmp;
> > +    tmp.data() = __svml_log4(x.data());
> > +    return tmp;
> > +}
> > +
> > +// log10
> > +template<typename T> static inline Vector<T>
> > log10(VC_ALIGNED_PARAMETER(Vector<T>) x) { +    Vector<T> tmp;
> > +    tmp.data() = __svml_log10f8(x.data());
> > +    return tmp;
> > +}
> > +
> > +template<> inline double_v log10(VC_ALIGNED_PARAMETER(double_v) x) {
> > +    double_v tmp;
> > +    tmp.data() = __svml_log104(x.data());
> > +    return tmp;
> > +}
> > +
> > +// log2
> > +template<typename T> static inline Vector<T>
> > log2(VC_ALIGNED_PARAMETER(Vector<T>) x) { +    Vector<T> tmp;
> > +    tmp.data() = __svml_log2f8(x.data());
> > +    return tmp;
> > +}
> > +
> > +template<> inline double_v log2(VC_ALIGNED_PARAMETER(double_v) x) {
> > +    double_v tmp;
> > +    tmp.data() = __svml_log24(x.data());
> > +    return tmp;
> > +}
> > +#endif
> > +#else
> >  template<typename T> static inline Vector<T>
> > log(VC_ALIGNED_PARAMETER(Vector<T>) x) { typedef typename Vector<T>::Mask
> > M;
> >      typedef Const<T> C;
> > @@ -261,6 +367,8 @@ template<typename T> static inline Vector<T>
> > log2(VC_ALIGNED_PARAMETER(Vector<T> typedef Const<T> C;
> >      return LogImpl<Base2>::calc(x);
> >  }
> > +#endif
> > +
> >  } // namespace Common
> >  #ifdef VC__USE_NAMESPACE
> >  namespace VC__USE_NAMESPACE
> > diff --git a/common/svml.h b/common/svml.h
> > new file mode 100644
> > index 0000000..8ecd782
> > --- /dev/null
> > +++ b/common/svml.h
> > @@ -0,0 +1,68 @@
> > +/*  This file is part of the Vc library.
> > +
>[please enable javascript to see the address]>
> > +
> > +    Vc is free software: you can redistribute it and/or modify
> > +    it under the terms of the GNU Lesser General Public License as
> > +    published by the Free Software Foundation, either version 3 of
> > +    the License, or (at your option) any later version.
> > +
> > +    Vc is distributed in the hope that it will be useful, but
> > +    WITHOUT ANY WARRANTY; without even the implied warranty of
> > +    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > +    GNU Lesser General Public License for more details.
> > +
> > +    You should have received a copy of the GNU Lesser General Public
> > +    License along with Vc.  If not, see <http://www.gnu.org/licenses/>.
> > +*/
> > +
> > +#ifndef VC_COMMON_SVML_H
> > +#define VC_COMMON_SVML_H
> > +
> > +#if defined(USE_INTEL_SVML)
> > +extern "C"{
> > +__m128 __svml_sinf4(__m128 v1);
> > +__m128d __svml_sin2(__m128d v1);
> > +__m128 __svml_cosf4(__m128 v1);
> > +__m128d __svml_cos2(__m128d v1);
> > +__m128 __svml_sincosf4(__m128 v1);
> > +__m128d __svml_sincos2(__m128d v1);
> > +__m128 __svml_asinf4(__m128 v1);
> > +__m128d __svml_asin2(__m128d v1);
> > +__m128 __svml_atanf4(__m128 v1);
> > +__m128d __svml_atan2(__m128d v1);
> > +__m128 __svml_atan2f4(__m128 v1, __m128 v2);
> > +__m128d __svml_atan22(__m128d v1, __m128d v2);
> > +__m128 __svml_logf4(__m128 v1);
> > +__m128d __svml_log2(__m128d v1);
> > +__m128 __svml_log2f4(__m128 v1);
> > +__m128d __svml_log22(__m128d v1);
> > +__m128 __svml_log10f4(__m128 v1);
> > +__m128d __svml_log102(__m128d v1);
> > +__m128 __svml_expf4(__m128 v1);
> > +__m128d __svml_exp2(__m128d v1);
> > +
> > +__m256 __svml_sinf8(__m256 v1);
> > +__m256d __svml_sin4(__m256d v1);
> > +__m256 __svml_cosf8(__m256 v1);
> > +__m256d __svml_cos4(__m256d v1);
> > +__m256 __svml_sincosf8(__m256 v1);
> > +__m256d __svml_sincos4(__m256d v1);
> > +__m256 __svml_asinf8(__m256 v1);
> > +__m256d __svml_asin4(__m256d v1);
> > +__m256 __svml_atanf8(__m256 v1);
> > +__m256d __svml_atan4(__m256d v1);
> > +__m256 __svml_atan2f8(__m256 v1, __m256 v2);
> > +__m256d __svml_atan24(__m256d v1, __m256d v2);
> > +__m256 __svml_logf8(__m256 v1);
> > +__m256d __svml_log4(__m256d v1);
> > +__m256 __svml_log2f8(__m256 v1);
> > +__m256d __svml_log24(__m256d v1);
> > +__m256 __svml_log10f8(__m256 v1);
> > +__m256d __svml_log104(__m256d v1);
> > +__m256 __svml_expf8(__m256 v1);
> > +__m256d __svml_exp4(__m256d v1);
> > +}
> > +#endif
> > +
> > +#endif
> > diff --git a/src/trigonometric.cpp b/src/trigonometric.cpp
> > index e24bc93..2e41059 100644
> > --- a/src/trigonometric.cpp
> > +++ b/src/trigonometric.cpp
> > @@ -20,6 +20,7 @@
> >  #include <Vc/Vc>
> >  #if defined(VC_IMPL_SSE) || defined(VC_IMPL_AVX)
> >  #include <common/macros.h>
> > +#include <common/svml.h>
> > 
> >  namespace Vc
> >  {
> > @@ -74,6 +75,229 @@ namespace
> >      }
> >  } // anonymous namespace
> > 
> > +#if defined(USE_INTEL_SVML)
> > +#if defined(VC_IMPL_SSE)
> > +// sin
> > +template<> template<> float_v Trigonometric<VC_IMPL>::sin(const float_v
> > &_x){ +    float_v tmp;
> > +    tmp.data() = __svml_sinf4(_x.data());
> > +    return tmp;
> > +}
> > +
> > +template<> template<> sfloat_v Trigonometric<VC_IMPL>::sin(const sfloat_v
> > &_x){ +    sfloat_v tmp;
> > +    tmp.data()[0] = __svml_sinf4(_x.data()[0]);
> > +    tmp.data()[1] = __svml_sinf4(_x.data()[1]);
> > +    return tmp;
> > +}
> > +
> > +template<> template<> double_v Trigonometric<VC_IMPL>::sin(const double_v
> > &_x){ +    double_v tmp;
> > +    tmp.data() = __svml_sin2(_x.data());
> > +    return tmp;
> > +}
> > +
> > +// cos
> > +template<> template<> float_v Trigonometric<VC_IMPL>::cos(const float_v
> > &_x){ +    float_v tmp;
> > +    tmp.data() = __svml_cosf4(_x.data());
> > +    return tmp;
> > +}
> > +
> > +template<> template<> sfloat_v Trigonometric<VC_IMPL>::cos(const sfloat_v
> > &_x){ +    sfloat_v tmp;
> > +    tmp.data()[0] = __svml_cosf4(_x.data()[0]);
> > +    tmp.data()[1] = __svml_cosf4(_x.data()[1]);
> > +    return tmp;
> > +}
> > +
> > +template<> template<> double_v Trigonometric<VC_IMPL>::cos(const double_v
> > &_x){ +    double_v tmp;
> > +    tmp.data() = __svml_cos2(_x.data());
> > +    return tmp;
> > +}
> > +
> > +// sincos
> > +template<> template<> void Trigonometric<VC_IMPL>::sincos(const float_v
> > &_x, float_v *_sin, float_v *_cos) { +    _sin->data() =
> > __svml_sincosf4(_x.data());
> > +#if defined(__unix__) || defined(__GNUC__)
> > +     __asm__ __volatile__ ( "movaps %%xmm1, %0":"=m"(_cos->data()));
> > +#else // Windows
> > +    _asm vmovapd _cos->data(), ymm1;
> > +#endif
> > +}
> > +
> > +template<> template<> void Trigonometric<VC_IMPL>::sincos(const sfloat_v
> > &_x, sfloat_v *_sin, sfloat_v *_cos) { +    _sin->data()[0] =
> > __svml_sincosf4(_x.data()[0]);
> > +#if defined(__unix__) || defined(__GNUC__)
> > +     __asm__ __volatile__ ( "movaps %%xmm1, %0":"=m"(_cos->data()[0]));
> > +#else // Windows
> > +    _asm vmovapd _cos->data()[0], ymm1;
> > +#endif
> > +
> > +    _sin->data()[1] = __svml_sincosf4(_x.data()[1]);
> > +#if defined(__unix__) || defined(__GNUC__)
> > +     __asm__ __volatile__ ( "movaps %%xmm1, %0":"=m"(_cos->data()[1]));
> > +#else // Windows
> > +    _asm vmovapd _cos->data()[1], ymm1;
> > +#endif
> > +}
> > +
> > +template<> template<> void Trigonometric<VC_IMPL>::sincos(const double_v
> > &_x, double_v *_sin, double_v *_cos) { +  _sin->data() =
> > __svml_sincos2(_x.data());
> > +#if defined(__unix__) || defined(__GNUC__)
> > +     __asm__ __volatile__ ( "movaps %%xmm1, %0":"=m"(_cos->data()));
> > +#else // Windows
> > +    _asm vmovapd _cos->data(), ymm1;
> > +#endif
> > +}
> > +
> > +// asin
> > +template<> template<> float_v Trigonometric<VC_IMPL>::asin(const float_v
> > &_x){ +    float_v tmp;
> > +    tmp.data() = __svml_asinf4(_x.data());
> > +    return tmp;
> > +}
> > +
> > +template<> template<> sfloat_v Trigonometric<VC_IMPL>::asin(const sfloat_v
> > &_x){ +    sfloat_v tmp;
> > +    tmp.data()[0] = __svml_asinf4(_x.data()[0]);
> > +    tmp.data()[1] = __svml_asinf4(_x.data()[1]);
> > +    return tmp;
> > +}
> > +
> > +template<> template<> double_v Trigonometric<VC_IMPL>::asin(const double_v
> > &_x){ +    double_v tmp;
> > +    tmp.data() = __svml_asin2(_x.data());
> > +    return tmp;
> > +}
> > +
> > +// atan
> > +template<> template<> float_v Trigonometric<VC_IMPL>::atan(const float_v
> > &_x){ +    float_v tmp;
> > +    tmp.data() = __svml_atanf4(_x.data());
> > +    return tmp;
> > +}
> > +
> > +template<> template<> sfloat_v Trigonometric<VC_IMPL>::atan(const sfloat_v
> > &_x){ +    sfloat_v tmp;
> > +    tmp.data()[0] = __svml_atanf4(_x.data()[0]);
> > +    tmp.data()[1] = __svml_atanf4(_x.data()[1]);
> > +    return tmp;
> > +}
> > +
> > +template<> template<> double_v Trigonometric<VC_IMPL>::atan(const double_v
> > &_x){ +    double_v tmp;
> > +    tmp.data() = __svml_atan2(_x.data());
> > +    return tmp;
> > +}
> > +
> > +// atan2
> > +template<> template<> float_v Trigonometric<VC_IMPL>::atan2(const float_v
> > &_x, const float_v &_y){ +    float_v tmp;
> > +    tmp.data() = __svml_atan2f4(_x.data(), _y.data());
> > +    return tmp;
> > +}
> > +
> > +template<> template<> sfloat_v Trigonometric<VC_IMPL>::atan2(const sfloat_v
> > &_x, const sfloat_v &_y){ +    sfloat_v tmp;
> > +    tmp.data()[0] = __svml_atan2f4(_x.data()[0], _y.data()[0]);
> > +    tmp.data()[1] = __svml_atan2f4(_x.data()[1], _y.data()[1]);
> > +    return tmp;
> > +}
> > +
> > +template<> template<> double_v Trigonometric<VC_IMPL>::atan2(const double_v
> > &_x, const double_v &_y){ +    double_v tmp;
> > +    tmp.data() = __svml_atan22(_x.data(), _y.data());
> > +    return tmp;
> > +}
> > +#else
> > +// sin
> > +template<> template<typename _T> Vector<_T>
> > Trigonometric<VC_IMPL>::sin(const Vector<_T> &_x){ +    Vector<_T> tmp;
> > +    tmp.data() = __svml_sinf8(_x.data());
> > +    return tmp;
> > +}
> > +
> > +template<> template<> double_v Trigonometric<VC_IMPL>::sin(const double_v
> > &_x){ +    double_v tmp;
> > +    tmp.data() = __svml_sin4(_x.data());
> > +    return tmp;
> > +}
> > +
> > +// cos
> > +template<> template<typename _T> Vector<_T>
> > Trigonometric<VC_IMPL>::cos(const Vector<_T> &_x){ +    Vector<_T> tmp;
> > +    tmp.data() = __svml_cosf8(_x.data());
> > +    return tmp;
> > +}
> > +
> > +template<> template<> double_v Trigonometric<VC_IMPL>::cos(const double_v
> > &_x){ +    double_v tmp;
> > +    tmp.data() = __svml_cos4(_x.data());
> > +    return tmp;
> > +}
> > +
> > +// sincos
> > +template<> template<typename _T> void Trigonometric<VC_IMPL>::sincos(const
> > Vector<_T> &_x, Vector<_T> *_sin, Vector<_T> *_cos) { +    _sin->data() =
> > __svml_sincosf8(_x.data());
> > +#if defined(__unix__) || defined(__GNUC__)
> > +    __asm__ __volatile__ ( "vmovaps %%ymm1, %0":"=m"(_cos->data()));
> > +#else // Windows
> > +    _asm vmovaps _cos->data(), ymm1;
> > +#endif
> > +}
> > +
> > +template<> template<> void Trigonometric<VC_IMPL>::sincos(const double_v
> > &_x, double_v *_sin, double_v *_cos) { +    _sin->data() =
> > __svml_sincos4(_x.data());
> > +#if defined(__unix__) || defined(__GNUC__)
> > +    __asm__ __volatile__ ( "vmovaps %%ymm1, %0":"=m"(_cos->data()));
> > +#else // Windows
> > +    _asm vmovaps _cos->data(), ymm1;
> > +#endif
> > +}
> > +
> > +// asin
> > +template<> template<typename _T> Vector<_T>
> > Trigonometric<VC_IMPL>::asin(const Vector<_T> &_x){ +    Vector<_T> tmp;
> > +    tmp.data() = __svml_asinf8(_x.data());
> > +    return tmp;
> > +}
> > +
> > +template<> template<> double_v Trigonometric<VC_IMPL>::asin(const double_v
> > &_x){ +    double_v tmp;
> > +    tmp.data() = __svml_asin4(_x.data());
> > +    return tmp;
> > +}
> > +
> > +// atan
> > +template<> template<typename _T> Vector<_T>
> > Trigonometric<VC_IMPL>::atan(const Vector<_T> &_x){ +    Vector<_T> tmp;
> > +    tmp.data() = __svml_atanf8(_x.data());
> > +    return tmp;
> > +}
> > +
> > +template<> template<> double_v Trigonometric<VC_IMPL>::atan(const double_v
> > &_x){ +    double_v tmp;
> > +    tmp.data() = __svml_atan4(_x.data());
> > +    return tmp;
> > +}
> > +
> > +// atan2
> > +template<> template<typename _T> Vector<_T>
> > Trigonometric<VC_IMPL>::atan2(const Vector<_T> &_x, const Vector<_T> &_y){
> > +    Vector<_T> tmp;
> > +    tmp.data() = __svml_atan2f8(_x.data(), _y.data());
> > +    return tmp;
> > +}
> > +
> > +template<> template<> double_v Trigonometric<VC_IMPL>::atan2(const double_v
> > &_x, const double_v &_y){ +    double_v tmp;
> > +    tmp.data() = __svml_atan24(_x.data(), _y.data());
> > +    return tmp;
> > +}
> > +#endif
> > +#else
> > +
> >  /*
> >   * algorithm for sine and cosine:
> >   *
> > @@ -472,6 +696,8 @@ template<> template<> double_v
> > Trigonometric<VC_IMPL>::atan2 (const double_v &y,
> > 
> >      return a;
> >  }
> > +#endif
> > +
> >  } // namespace Vc
> > 
> >  #include <common/undomacros.h>
> -- 
> Dipl.-Phys. Matthias Kretz
>  
> Phone: +49 69 798 44110
> Web:   http://compeng.uni-frankfurt.de/?mkretz
>  
> SIMD easy and portable: http://compeng.uni-frankfurt.de/?vc

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://compeng.uni-frankfurt.de/pipermail/vc-devel/attachments/20130207/287cf25c/attachment-0001.html>