net: stop using mmx intrinsics

Message ID 1710969121-18503-2-git-send-email-roretzla@linux.microsoft.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Headers
Series net: stop using mmx intrinsics |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/loongarch-compilation success Compilation OK
ci/loongarch-unit-testing success Unit Testing PASS
ci/Intel-compilation success Compilation OK
ci/intel-Testing success Testing PASS
ci/github-robot: build success github build: passed
ci/intel-Functional success Functional PASS
ci/iol-abi-testing success Testing PASS
ci/iol-mellanox-Performance success Performance Testing PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/iol-intel-Functional success Functional Testing PASS
ci/iol-compile-amd64-testing success Testing PASS
ci/iol-compile-arm64-testing success Testing PASS
ci/iol-unit-arm64-testing success Testing PASS
ci/iol-unit-amd64-testing success Testing PASS
ci/iol-sample-apps-testing success Testing PASS
ci/iol-broadcom-Performance success Performance Testing PASS
ci/iol-broadcom-Functional success Functional Testing PASS

Commit Message

Tyler Retzlaff March 20, 2024, 9:12 p.m. UTC
  Update code to use only avx/sse intrinsics as mmx is not supported on
MSVC.

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/net/net_crc_avx512.c | 28 ++++++++++------------------
 lib/net/net_crc_sse.c    | 28 ++++++++++------------------
 2 files changed, 20 insertions(+), 36 deletions(-)
  

Comments

Thomas Monjalon March 21, 2024, 5:09 p.m. UTC | #1
20/03/2024 22:12, Tyler Retzlaff:
> +#ifdef RTE_TOOLCHAIN_MSVC
> +#include <intrin.h>
> +#else
>  #include <x86intrin.h>
> +#endif

It is not the same include in MSVC?
Is it something we want to wrap in a DPDK header file?
  
Tyler Retzlaff March 21, 2024, 5:27 p.m. UTC | #2
On Thu, Mar 21, 2024 at 06:09:01PM +0100, Thomas Monjalon wrote:
> 20/03/2024 22:12, Tyler Retzlaff:
> > +#ifdef RTE_TOOLCHAIN_MSVC
> > +#include <intrin.h>
> > +#else
> >  #include <x86intrin.h>
> > +#endif
> 
> It is not the same include in MSVC?

unfortunately intrin.h is vestigial in the monolithic approach. to use
any intrinsic you're supposed to include only the one and only true
header instead of vendor/arch feature specific headers.

> Is it something we want to wrap in a DPDK header file?

do you mean create a monolithic rte_intrinsic.h header that is
essentially

#ifdef MSVC
#include <intrin.h>
#else
#include <x86intrin.h>
#include <immintrin.h>
#include <nmmintrin.h>
...
#endif

i assumed that doing something like this might be unpopular due to the
unnecessary namespace pollution.

another alternative could be to find a way to limit that pollution only
to msvc by stashing intrin.h in e.g. rte_common.h (or rte_os.h) under
conditional compile but the problem i think we had with that approach is
that some llvm headers don't define prototypes that match those from
msvc see lib/eal/windows/include/rte_windows.h another issue arises
where if the application includes intrin.h before dpdk headers we again
have to deal with llvm vs msvc differences.

fwiw the instance highlighted llvm should have volatile qualified in
their prototype but didn't.

i will commit to looking into this more after applications are working.
  
Thomas Monjalon March 21, 2024, 6:01 p.m. UTC | #3
21/03/2024 18:27, Tyler Retzlaff:
> On Thu, Mar 21, 2024 at 06:09:01PM +0100, Thomas Monjalon wrote:
> > 20/03/2024 22:12, Tyler Retzlaff:
> > > +#ifdef RTE_TOOLCHAIN_MSVC
> > > +#include <intrin.h>
> > > +#else
> > >  #include <x86intrin.h>
> > > +#endif
> > 
> > It is not the same include in MSVC?
> 
> unfortunately intrin.h is vestigial in the monolithic approach. to use
> any intrinsic you're supposed to include only the one and only true
> header instead of vendor/arch feature specific headers.
> 
> > Is it something we want to wrap in a DPDK header file?
> 
> do you mean create a monolithic rte_intrinsic.h header that is
> essentially
> 
> #ifdef MSVC
> #include <intrin.h>
> #else
> #include <x86intrin.h>
> #include <immintrin.h>
> #include <nmmintrin.h>
> ...
> #endif
> 
> i assumed that doing something like this might be unpopular due to the
> unnecessary namespace pollution.

We already have such a file.
It is rte_vect.h.
I suppose we should just make sure it is included consistently
instead of x86intrin.h or immintrin.h

This command will show where changes are required:
	git grep intrin.h
  
Tyler Retzlaff March 21, 2024, 6:18 p.m. UTC | #4
On Thu, Mar 21, 2024 at 07:01:17PM +0100, Thomas Monjalon wrote:
> 21/03/2024 18:27, Tyler Retzlaff:
> > On Thu, Mar 21, 2024 at 06:09:01PM +0100, Thomas Monjalon wrote:
> > > 20/03/2024 22:12, Tyler Retzlaff:
> > > > +#ifdef RTE_TOOLCHAIN_MSVC
> > > > +#include <intrin.h>
> > > > +#else
> > > >  #include <x86intrin.h>
> > > > +#endif
> > > 
> > > It is not the same include in MSVC?
> > 
> > unfortunately intrin.h is vestigial in the monolithic approach. to use
> > any intrinsic you're supposed to include only the one and only true
> > header instead of vendor/arch feature specific headers.
> > 
> > > Is it something we want to wrap in a DPDK header file?
> > 
> > do you mean create a monolithic rte_intrinsic.h header that is
> > essentially
> > 
> > #ifdef MSVC
> > #include <intrin.h>
> > #else
> > #include <x86intrin.h>
> > #include <immintrin.h>
> > #include <nmmintrin.h>
> > ...
> > #endif
> > 
> > i assumed that doing something like this might be unpopular due to the
> > unnecessary namespace pollution.
> 
> We already have such a file.
> It is rte_vect.h.
> I suppose we should just make sure it is included consistently
> instead of x86intrin.h or immintrin.h
> 
> This command will show where changes are required:
> 	git grep intrin.h

there were some corner cases i can't recall, but since you identified
rte_vect.h is the preferred header let me do some experiments to see
what i can learn.  i'll either submit a series addressing it
specifically or come back with details.

thanks!

> 
>
  
Tyler Retzlaff March 28, 2024, 4:16 p.m. UTC | #5
On Thu, Mar 21, 2024 at 07:01:17PM +0100, Thomas Monjalon wrote:
> 21/03/2024 18:27, Tyler Retzlaff:
> > On Thu, Mar 21, 2024 at 06:09:01PM +0100, Thomas Monjalon wrote:
> > > 20/03/2024 22:12, Tyler Retzlaff:
> > > > +#ifdef RTE_TOOLCHAIN_MSVC
> > > > +#include <intrin.h>
> > > > +#else
> > > >  #include <x86intrin.h>
> > > > +#endif
> > > 
> > > It is not the same include in MSVC?
> > 
> > unfortunately intrin.h is vestigial in the monolithic approach. to use
> > any intrinsic you're supposed to include only the one and only true
> > header instead of vendor/arch feature specific headers.
> > 
> > > Is it something we want to wrap in a DPDK header file?
> > 
> > do you mean create a monolithic rte_intrinsic.h header that is
> > essentially
> > 
> > #ifdef MSVC
> > #include <intrin.h>
> > #else
> > #include <x86intrin.h>
> > #include <immintrin.h>
> > #include <nmmintrin.h>
> > ...
> > #endif
> > 
> > i assumed that doing something like this might be unpopular due to the
> > unnecessary namespace pollution.
> 
> We already have such a file.
> It is rte_vect.h.
> I suppose we should just make sure it is included consistently
> instead of x86intrin.h or immintrin.h
> 
> This command will show where changes are required:
> 	git grep intrin.h

thanks! i saw none of the problems i had before so this worked great.

there is only one other include of intrin.h in eal now and it is not for
vector intrinsics so it should be cleaner to just include rte_vect.h
whenever SIMD / vector intrinsics are required for windows and !windows.

> 
>
  

Patch

diff --git a/lib/net/net_crc_avx512.c b/lib/net/net_crc_avx512.c
index 0f0dee4..6d0c644 100644
--- a/lib/net/net_crc_avx512.c
+++ b/lib/net/net_crc_avx512.c
@@ -8,7 +8,11 @@ 
 
 #include "net_crc.h"
 
+#ifdef RTE_TOOLCHAIN_MSVC
+#include <intrin.h>
+#else
 #include <x86intrin.h>
+#endif
 
 /* VPCLMULQDQ CRC computation context structure */
 struct crc_vpclmulqdq_ctx {
@@ -331,13 +335,10 @@  static const alignas(16) uint32_t mask2[4] = {
 			c9, c10, c11);
 	crc32_eth.fold_3x128b = _mm512_setr_epi64(c12, c13, c14, c15,
 			c16, c17, 0, 0);
-	crc32_eth.fold_1x128b = _mm_setr_epi64(_mm_cvtsi64_m64(c16),
-			_mm_cvtsi64_m64(c17));
+	crc32_eth.fold_1x128b = _mm_set_epi64x(c17, c16);
 
-	crc32_eth.rk5_rk6 = _mm_setr_epi64(_mm_cvtsi64_m64(c18),
-			_mm_cvtsi64_m64(c19));
-	crc32_eth.rk7_rk8 = _mm_setr_epi64(_mm_cvtsi64_m64(c20),
-			_mm_cvtsi64_m64(c21));
+	crc32_eth.rk5_rk6 = _mm_set_epi64x(c19, c18);
+	crc32_eth.rk7_rk8 = _mm_set_epi64x(c21, c20);
 }
 
 static void
@@ -378,13 +379,10 @@  static const alignas(16) uint32_t mask2[4] = {
 			c9, c10, c11);
 	crc16_ccitt.fold_3x128b = _mm512_setr_epi64(c12, c13, c14, c15,
 			c16, c17, 0, 0);
-	crc16_ccitt.fold_1x128b = _mm_setr_epi64(_mm_cvtsi64_m64(c16),
-			_mm_cvtsi64_m64(c17));
+	crc16_ccitt.fold_1x128b = _mm_set_epi64x(c17, c16);
 
-	crc16_ccitt.rk5_rk6 = _mm_setr_epi64(_mm_cvtsi64_m64(c18),
-			_mm_cvtsi64_m64(c19));
-	crc16_ccitt.rk7_rk8 = _mm_setr_epi64(_mm_cvtsi64_m64(c20),
-			_mm_cvtsi64_m64(c21));
+	crc16_ccitt.rk5_rk6 = _mm_set_epi64x(c19, c18);
+	crc16_ccitt.rk7_rk8 = _mm_set_epi64x(c21, c20);
 }
 
 void
@@ -392,12 +390,6 @@  static const alignas(16) uint32_t mask2[4] = {
 {
 	crc32_load_init_constants();
 	crc16_load_init_constants();
-
-	/*
-	 * Reset the register as following calculation may
-	 * use other data types such as float, double, etc.
-	 */
-	_mm_empty();
 }
 
 uint32_t
diff --git a/lib/net/net_crc_sse.c b/lib/net/net_crc_sse.c
index d673ae3..9ab80a0 100644
--- a/lib/net/net_crc_sse.c
+++ b/lib/net/net_crc_sse.c
@@ -10,7 +10,11 @@ 
 
 #include "net_crc.h"
 
+#ifdef RTE_TOOLCHAIN_MSVC
+#include <intrin.h>
+#else
 #include <x86intrin.h>
+#endif
 
 /** PCLMULQDQ CRC computation context structure */
 struct crc_pclmulqdq_ctx {
@@ -272,12 +276,9 @@  static const alignas(16) uint8_t crc_xmm_shift_tab[48] = {
 	p =  0x10811LLU;
 
 	/** Save the params in context structure */
-	crc16_ccitt_pclmulqdq.rk1_rk2 =
-		_mm_setr_epi64(_mm_cvtsi64_m64(k1), _mm_cvtsi64_m64(k2));
-	crc16_ccitt_pclmulqdq.rk5_rk6 =
-		_mm_setr_epi64(_mm_cvtsi64_m64(k5), _mm_cvtsi64_m64(k6));
-	crc16_ccitt_pclmulqdq.rk7_rk8 =
-		_mm_setr_epi64(_mm_cvtsi64_m64(q), _mm_cvtsi64_m64(p));
+	crc16_ccitt_pclmulqdq.rk1_rk2 = _mm_set_epi64x(k2, k1);
+	crc16_ccitt_pclmulqdq.rk5_rk6 = _mm_set_epi64x(k6, k5);
+	crc16_ccitt_pclmulqdq.rk7_rk8 = _mm_set_epi64x(p, q);
 
 	/** Initialize CRC32 data */
 	k1 = 0xccaa009eLLU;
@@ -288,18 +289,9 @@  static const alignas(16) uint8_t crc_xmm_shift_tab[48] = {
 	p =  0x1db710641LLU;
 
 	/** Save the params in context structure */
-	crc32_eth_pclmulqdq.rk1_rk2 =
-		_mm_setr_epi64(_mm_cvtsi64_m64(k1), _mm_cvtsi64_m64(k2));
-	crc32_eth_pclmulqdq.rk5_rk6 =
-		_mm_setr_epi64(_mm_cvtsi64_m64(k5), _mm_cvtsi64_m64(k6));
-	crc32_eth_pclmulqdq.rk7_rk8 =
-		_mm_setr_epi64(_mm_cvtsi64_m64(q), _mm_cvtsi64_m64(p));
-
-	/**
-	 * Reset the register as following calculation may
-	 * use other data types such as float, double, etc.
-	 */
-	_mm_empty();
+	crc32_eth_pclmulqdq.rk1_rk2 = _mm_set_epi64x(k2, k1);
+	crc32_eth_pclmulqdq.rk5_rk6 = _mm_set_epi64x(k6, k5);
+	crc32_eth_pclmulqdq.rk7_rk8 = _mm_set_epi64x(p, q);
 }
 
 uint32_t