Tag: intrinsics

108 Header files for x86 SIMD intrinsics 2012-06-27T14:44:54.400

78 Incrementing 'masked' bitsets 2017-06-26T19:14:47.680

41 What are intrinsics? 2010-02-15T20:02:16.073

30 How to use VC++ intrinsic functions w/o run-time library 2010-05-30T14:16:26.167

25 Equivalent of InterlockedIncrement in Linux/gcc 2010-01-24T04:22:36.620

23 SSE, intrinsics, and alignment 2012-09-19T20:06:06.967

23 Get member of __m128 by index? 2012-09-27T15:06:06.887

22 Is there a good reference for ARM Neon intrinsics? 2010-05-17T17:52:04.080

21 c++ SSE SIMD framework 2011-02-10T03:42:58.740

19 When will JVM use intrinsics 2013-11-10T16:49:56.077

18 print a __m128i variable 2012-11-06T18:34:33.680

18 Why and when to use __noop? 2013-01-22T12:57:35.627

17 Do I get a performance penalty when mixing SSE integer/float SIMD instructions 2011-02-14T19:28:05.790

16 How to rotate an SSE/AVX vector 2012-08-10T17:52:28.643

15 Divide by floating-point number using NEON intrinsics 2011-07-20T09:41:36.643

15 SIMD and difference between packed and scalar double precision 2013-04-25T15:20:21.993

15 Intrinsics for CPUID-like information? 2013-07-20T03:40:39.920

15 How to implement "_mm_storeu_epi64" without aliasing problems? 2014-07-16T17:39:21.830

15 When the compiler reorders AVX instructions on Sandy, does it affect performance? 2015-01-04T20:10:03.537

14 What's the difference between logical SSE intrinsics? 2010-05-10T17:32:07.627

14 Scatter intrinsics in AVX 2012-12-24T11:16:35.630

13 How to use the multiply and accumulate intrinsics in ARM Cortex-a8? 2010-07-13T18:56:59.603

13 Reference manual/tutorial for SIMD intrinsics? 2011-07-28T11:03:08.893

13 Is it possible to cast floats directly to __m128 if they are 16 byte aligned? 2012-08-01T12:57:14.093

13 What's the proper way to use different versions of SSE intrinsics in GCC? 2013-03-23T08:54:04.560

13 Undocumented intrinsic routines 2015-05-23T20:23:03.183

13 What are _mm_prefetch() locality hints? 2017-10-02T08:06:31.060

12 How to use MSVC intrinsics to get the equivalent of this GCC code? 2008-12-10T13:00:57.177

12 How does _mm_mwait work? 2010-04-02T02:23:58.683

12 When should I use _mm_sfence _mm_lfence and _mm_mfence 2010-12-27T09:35:19.900

12 Arm Neon Intrinsics vs hand assembly 2012-03-22T18:48:09.380

11 Most efficient way to store 4 dot products into a contiguous array in C using SSE intrinsics 2010-11-13T06:08:51.333

11 gcc, simd intrinsics and fast-math concepts 2011-02-11T07:13:58.320

11 How to load a pixel struct into an SSE register? 2012-08-25T11:44:35.403

11 Funnel shift - what is it? 2012-10-07T08:00:51.080

11 Produce loops without cmp instruction in GCC 2014-09-18T20:17:59.760

11 Intel Intrinsics guide - Latency and Throughput 2016-10-23T13:05:37.613

11 Why are there 128bit load functions for SSE? 2017-05-27T13:10:11.400

11 Fallback implementation for conflict detection in AVX2 2017-06-30T09:47:51.883

10 How do I reorder vector data using ARM Neon intrinsics? 2010-04-11T07:02:14.137

10 SSE instruction set not enabled 2012-02-04T21:06:29.373

10 Check XMM register for all zeroes 2012-04-16T14:08:15.900

10 How to sum __m256 horizontally? 2012-11-04T13:55:49.090

10 Vectorizing Modular Arithmetic 2013-12-16T06:35:36.723

10 Emulating shifts on 32 bytes with AVX 2014-08-11T17:14:04.820

9 Dot product - SSE2 vs BLAS 2009-07-07T03:34:44.753

9 Testing for builtins/intrinsics 2010-12-01T08:12:26.780

9 128-bit division intrinsic in Visual C++ 2011-12-09T23:50:47.253

9 Initializing an __m128 type from a 64-bit unsigned int 2014-05-05T19:25:02.470

9 Most efficient way to check if all __m128i components are 0 [using SSE intrinsics] 2015-01-12T15:44:34.350

9 Efficiently gather individual bytes, separated by a byte-stride of 4 2015-08-12T23:49:03.750

9 How to instruct compiler to generate unaligned loads for __m128 2015-11-24T09:04:58.890

9 Simple C++ expression templates wrapping intrinsics produces different instructions 2016-12-01T10:21:50.917

9 Costs of new AVX512 instruction - Scatter store 2017-09-04T18:23:41.713

8 x86 max/min asm instructions? 2009-12-28T14:46:45.213

8 How compilers treat SSE (or any) intrinsic functions? 2011-04-15T13:03:02.583

8 SSE2 code optimization 2011-11-03T13:33:35.450

8 SSE intrinsics - comparison if/else optimization 2012-01-24T12:07:28.903

8 How can I get an intrinsic for the exp() function in x64 code? 2012-04-10T19:51:41.930

8 What's the difference between __popcnt() and _mm_popcnt_u32()? 2012-06-20T06:32:26.317

8 AVX 256-bit equivalent for _mm_load1_ps 2013-06-13T23:59:06.360

8 NEON intrinsic types work in C but throw invalid arguments error in C++ 2013-08-27T18:55:37.730

8 AVX log intrinsics (_mm256_log_ps) missing in g++-4.8? 2013-09-11T17:36:35.390

8 Fast calculate hamming distance in C 2014-08-02T20:13:19.093

8 Best way to shuffle 64-bit portions of two __m128i's 2014-08-13T18:37:48.173

8 How can I implement a portable pointer compare and swap? 2015-07-02T10:19:36.690

8 SSE rounds down when it should round up 2015-10-14T01:30:37.417

8 Can PTEST be used to test if two registers are both zero or some other condition? 2017-04-30T23:03:25.030

8 Truth-table reduction to ternary logic operations, vpternlog 2017-11-28T17:28:52.410

8 Computing 8 horizontal sums of eight AVX single-precision floating-point vectors 2018-07-10T21:41:40.343

8 How to implement an efficient _mm256_madd_epi8? 2018-07-17T13:11:30.900

8 Constexpr and SSE intrinsics 2018-08-16T14:59:10.790

7 Make compiler copy characters using movsd 2009-07-16T12:48:58.777

7 Bilinear filter with SSE4.1 intrinsics 2011-05-11T09:57:20.680

7 How do I perform 8 x 8 matrix operation using SSE? 2011-11-27T13:20:25.963

7 _mm_alignr_epi8 (PALIGNR) equivalent in AVX2 2011-12-15T09:39:40.193

7 Use both SSE2 intrinsics and gcc inline assembler 2012-01-27T21:33:08.913

7 SSE intrinsics cause normal float operation to return -1.#INV 2012-01-29T10:42:29.390

7 Visual C++ x64 add with carry 2012-02-04T23:42:37.877

7 128-bit SSE counter? 2012-02-19T12:03:20.223

7 Accessing arbitrary 16-bit elements packed in a 128-bit register 2012-04-01T11:18:55.273

7 Summing 3 lanes in a NEON float32x4_t 2012-12-14T00:50:39.713

7 SSE3 intrinsics: How to find the maximum of a large array of floats 2013-03-06T04:26:00.430

7 SSE _mm_load_pd works while _mm_store_pd segfaults 2013-03-18T11:42:01.510

7 Best assembly or compilation for minimum of three values 2013-08-19T04:57:26.947

7 Clang/GCC Compiler Intrinsics without corresponding compiler flag 2014-01-26T12:41:51.803

7 Arm NEON and poly8_t and poly16_t 2014-03-06T12:17:59.510

7 Why java division for integer is faster than hacker's delight implementation 2014-03-10T21:47:06.113

7 Why do java intrinsic functions still have code? 2014-04-13T09:28:55.490

7 How to check with Intel intrinsics if AVX extensions are supported by the CPU? 2014-06-17T09:42:12.253

7 _addcarry_u64 and _addcarryx_u64 with MSVC and ICC 2015-03-24T09:46:51.467

7 Potential bug in Visual Studio C compiler or in Intel Intrinsics' AVX2 "_mm256_set_epi64x" function 2016-05-29T11:08:07.847

7 Does Clang have something like #pragma GCC target? 2017-09-11T23:37:44.413

7 SIMD Intrinsics and Persistent Variables/State 2018-01-23T17:24:23.060

7 does gcc's __builtin_cpu_supports check for OS support? 2018-02-08T04:31:33.407

6 intrinsic memcmp 2009-05-13T03:18:12.847

6 Intel AVX intrinsics: any compatibility library out? 2010-04-25T14:16:10.093

6 How to optimize a cycle? 2010-10-21T11:40:25.423

6 How do I fake a user log in for unit testing purposes using fakeiteasy within asp.net mvc 2 2010-12-02T13:42:45.087