[v2,07/12] acl: add infrastructure to support AVX512 classify

Message ID 20200915165025.543-8-konstantin.ananyev@intel.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Headers
Series acl: introduce AVX512 classify method |

Checks

Context Check Description
ci/checkpatch success coding style OK

Commit Message

Ananyev, Konstantin Sept. 15, 2020, 4:50 p.m. UTC
  Add necessary changes to support new AVX512 specific ACL classify
algorithm:
 - changes in meson.build to check that build tools
   (compiler, assembler, etc.) do properly support AVX512.
 - run-time checks to make sure target platform does support AVX512.
 - dummy rte_acl_classify_avx512() for targets where AVX512
   implementation couldn't be properly supported.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 config/x86/meson.build          |  3 ++-
 lib/librte_acl/acl.h            |  4 ++++
 lib/librte_acl/acl_run_avx512.c | 17 ++++++++++++++
 lib/librte_acl/meson.build      | 39 +++++++++++++++++++++++++++++++++
 lib/librte_acl/rte_acl.c        | 29 ++++++++++++++++++++++++
 lib/librte_acl/rte_acl.h        |  1 +
 6 files changed, 92 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_acl/acl_run_avx512.c
  

Comments

Bruce Richardson Sept. 16, 2020, 9:11 a.m. UTC | #1
On Tue, Sep 15, 2020 at 05:50:20PM +0100, Konstantin Ananyev wrote:
> Add necessary changes to support new AVX512 specific ACL classify
> algorithm:
>  - changes in meson.build to check that build tools
>    (compiler, assembler, etc.) do properly support AVX512.
>  - run-time checks to make sure target platform does support AVX512.
>  - dummy rte_acl_classify_avx512() for targets where AVX512
>    implementation couldn't be properly supported.
> 
> Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> ---

This all looks correct, though I wonder do you really need to check all
those AVX512 flags in each case? Since "F" is always present in any AVX512
implementation perhaps it can be checked, though if the other three always
need to be checked I can understand if you want to keep it there for
completeness. [Are all the other 3 used in your code?]

Acked-by: Bruce Richardson <bruce.richardson@intel.com>
  
Vladimir Medvedkin Sept. 16, 2020, 9:36 a.m. UTC | #2
Hi Bruce,

On 16/09/2020 10:11, Bruce Richardson wrote:
> On Tue, Sep 15, 2020 at 05:50:20PM +0100, Konstantin Ananyev wrote:
>> Add necessary changes to support new AVX512 specific ACL classify
>> algorithm:
>>   - changes in meson.build to check that build tools
>>     (compiler, assembler, etc.) do properly support AVX512.
>>   - run-time checks to make sure target platform does support AVX512.
>>   - dummy rte_acl_classify_avx512() for targets where AVX512
>>     implementation couldn't be properly supported.
>>
>> Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
>> ---
> 
> This all looks correct, though I wonder do you really need to check all
> those AVX512 flags in each case? Since "F" is always present in any AVX512
> implementation perhaps it can be checked, though if the other three always
> need to be checked I can understand if you want to keep it there for
> completeness. [Are all the other 3 used in your code?]
> 

As for me it is good to check all the flags supported by compiler. Some 
old (but still supported by dpdk) gcc can't compile the code in some 
circumstances. For example:

gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.12)   <-- pretty 
old but still supported, right?

gcc -march=native -dM -E - < /dev/null | grep "AVX512"
#define __AVX512F__ 1
#define __AVX512BW__ 1
#define __AVX512CD__ 1
#define __AVX512DQ__ 1

Does not support __AVX512VL__

from acl_run_avx512x8.h in first_trans8 there is 
_mm256_mmask_i32gather_epi32 which requires this flag, so compilation 
will fail.

> Acked-by: Bruce Richardson <bruce.richardson@intel.com>
>
  
Bruce Richardson Sept. 16, 2020, 9:49 a.m. UTC | #3
On Wed, Sep 16, 2020 at 10:36:32AM +0100, Medvedkin, Vladimir wrote:
> Hi Bruce,
> 
> On 16/09/2020 10:11, Bruce Richardson wrote:
> > On Tue, Sep 15, 2020 at 05:50:20PM +0100, Konstantin Ananyev wrote:
> > > Add necessary changes to support new AVX512 specific ACL classify
> > > algorithm:
> > >   - changes in meson.build to check that build tools
> > >     (compiler, assembler, etc.) do properly support AVX512.
> > >   - run-time checks to make sure target platform does support AVX512.
> > >   - dummy rte_acl_classify_avx512() for targets where AVX512
> > >     implementation couldn't be properly supported.
> > > 
> > > Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> > > ---
> > 
> > This all looks correct, though I wonder do you really need to check all
> > those AVX512 flags in each case? Since "F" is always present in any AVX512
> > implementation perhaps it can be checked, though if the other three always
> > need to be checked I can understand if you want to keep it there for
> > completeness. [Are all the other 3 used in your code?]
> > 
> 
> As for me it is good to check all the flags supported by compiler. Some old
> (but still supported by dpdk) gcc can't compile the code in some
> circumstances. For example:
> 
> gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.12)   <-- pretty old
> but still supported, right?
> 
> gcc -march=native -dM -E - < /dev/null | grep "AVX512"
> #define __AVX512F__ 1
> #define __AVX512BW__ 1
> #define __AVX512CD__ 1
> #define __AVX512DQ__ 1
> 
> Does not support __AVX512VL__
> 
Interesting, seems like checking them all to be sure is the right approach
so.
My ack stands so, and ignore the comment.
  
Ananyev, Konstantin Sept. 16, 2020, 10:06 a.m. UTC | #4
> On Wed, Sep 16, 2020 at 10:36:32AM +0100, Medvedkin, Vladimir wrote:
> > Hi Bruce,
> >
> > On 16/09/2020 10:11, Bruce Richardson wrote:
> > > On Tue, Sep 15, 2020 at 05:50:20PM +0100, Konstantin Ananyev wrote:
> > > > Add necessary changes to support new AVX512 specific ACL classify
> > > > algorithm:
> > > >   - changes in meson.build to check that build tools
> > > >     (compiler, assembler, etc.) do properly support AVX512.
> > > >   - run-time checks to make sure target platform does support AVX512.
> > > >   - dummy rte_acl_classify_avx512() for targets where AVX512
> > > >     implementation couldn't be properly supported.
> > > >
> > > > Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> > > > ---
> > >
> > > This all looks correct, though I wonder do you really need to check all
> > > those AVX512 flags in each case? Since "F" is always present in any AVX512
> > > implementation perhaps it can be checked, though if the other three always
> > > need to be checked I can understand if you want to keep it there for
> > > completeness. [Are all the other 3 used in your code?]

Yep, ACL uses all of them.
Thanks
Konstantin

> > >
> >
> > As for me it is good to check all the flags supported by compiler. Some old
> > (but still supported by dpdk) gcc can't compile the code in some
> > circumstances. For example:
> >
> > gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.12)   <-- pretty old
> > but still supported, right?
> >
> > gcc -march=native -dM -E - < /dev/null | grep "AVX512"
> > #define __AVX512F__ 1
> > #define __AVX512BW__ 1
> > #define __AVX512CD__ 1
> > #define __AVX512DQ__ 1
> >
> > Does not support __AVX512VL__
> >
> Interesting, seems like checking them all to be sure is the right approach
> so.
> My ack stands so, and ignore the comment.
  

Patch

diff --git a/config/x86/meson.build b/config/x86/meson.build
index 6ec020ef6..c5626e914 100644
--- a/config/x86/meson.build
+++ b/config/x86/meson.build
@@ -23,7 +23,8 @@  foreach f:base_flags
 endforeach
 
 optional_flags = ['AES', 'PCLMUL',
-		'AVX', 'AVX2', 'AVX512F',
+		'AVX', 'AVX2',
+		'AVX512F', 'AVX512VL', 'AVX512CD', 'AVX512BW',
 		'RDRND', 'RDSEED']
 foreach f:optional_flags
 	if cc.get_define('__@0@__'.format(f), args: machine_args) == '1'
diff --git a/lib/librte_acl/acl.h b/lib/librte_acl/acl.h
index 39d45a0c2..2022cf253 100644
--- a/lib/librte_acl/acl.h
+++ b/lib/librte_acl/acl.h
@@ -201,6 +201,10 @@  int
 rte_acl_classify_avx2(const struct rte_acl_ctx *ctx, const uint8_t **data,
 	uint32_t *results, uint32_t num, uint32_t categories);
 
+int
+rte_acl_classify_avx512(const struct rte_acl_ctx *ctx, const uint8_t **data,
+	uint32_t *results, uint32_t num, uint32_t categories);
+
 int
 rte_acl_classify_neon(const struct rte_acl_ctx *ctx, const uint8_t **data,
 	uint32_t *results, uint32_t num, uint32_t categories);
diff --git a/lib/librte_acl/acl_run_avx512.c b/lib/librte_acl/acl_run_avx512.c
new file mode 100644
index 000000000..67274989d
--- /dev/null
+++ b/lib/librte_acl/acl_run_avx512.c
@@ -0,0 +1,17 @@ 
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020 Intel Corporation
+ */
+
+#include "acl_run_sse.h"
+
+int
+rte_acl_classify_avx512(const struct rte_acl_ctx *ctx, const uint8_t **data,
+	uint32_t *results, uint32_t num, uint32_t categories)
+{
+	if (num >= MAX_SEARCHES_SSE8)
+		return search_sse_8(ctx, data, results, num, categories);
+	if (num >= MAX_SEARCHES_SSE4)
+		return search_sse_4(ctx, data, results, num, categories);
+
+	return rte_acl_classify_scalar(ctx, data, results, num, categories);
+}
diff --git a/lib/librte_acl/meson.build b/lib/librte_acl/meson.build
index d1e2c184c..b2fd61cad 100644
--- a/lib/librte_acl/meson.build
+++ b/lib/librte_acl/meson.build
@@ -27,6 +27,45 @@  if dpdk_conf.has('RTE_ARCH_X86')
 		cflags += '-DCC_AVX2_SUPPORT'
 	endif
 
+	# compile AVX512 version if:
+	# we are building 64-bit binary AND binutils can generate proper code
+
+	if dpdk_conf.has('RTE_ARCH_X86_64') and binutils_ok.returncode() == 0
+
+		# compile AVX512 version if either:
+		# a. we have AVX512 supported in minimum instruction set
+		#    baseline
+		# b. it's not minimum instruction set, but supported by
+		#    compiler
+		#
+		# in former case, just add avx512 C file to files list
+		# in latter case, compile c file to static lib, using correct
+		# compiler flags, and then have the .o file from static lib
+		# linked into main lib.
+
+		if dpdk_conf.has('RTE_MACHINE_CPUFLAG_AVX512F') and \
+			dpdk_conf.has('RTE_MACHINE_CPUFLAG_AVX512VL') and \
+			dpdk_conf.has('RTE_MACHINE_CPUFLAG_AVX512CD') and \
+			dpdk_conf.has('RTE_MACHINE_CPUFLAG_AVX512BW')
+
+			sources += files('acl_run_avx512.c')
+			cflags += '-DCC_AVX512_SUPPORT'
+
+		elif cc.has_multi_arguments('-mavx512f', '-mavx512vl',
+					'-mavx512cd', '-mavx512bw')
+
+			avx512_tmplib = static_library('avx512_tmp',
+				'acl_run_avx512.c',
+				dependencies: static_rte_eal,
+				c_args: cflags +
+					['-mavx512f', '-mavx512vl',
+					 '-mavx512cd', '-mavx512bw'])
+			objs += avx512_tmplib.extract_objects(
+					'acl_run_avx512.c')
+			cflags += '-DCC_AVX512_SUPPORT'
+		endif
+	endif
+
 elif dpdk_conf.has('RTE_ARCH_ARM') or dpdk_conf.has('RTE_ARCH_ARM64')
 	cflags += '-flax-vector-conversions'
 	sources += files('acl_run_neon.c')
diff --git a/lib/librte_acl/rte_acl.c b/lib/librte_acl/rte_acl.c
index fbcf45fdc..fdcb7a798 100644
--- a/lib/librte_acl/rte_acl.c
+++ b/lib/librte_acl/rte_acl.c
@@ -16,6 +16,22 @@  static struct rte_tailq_elem rte_acl_tailq = {
 };
 EAL_REGISTER_TAILQ(rte_acl_tailq)
 
+#ifndef CC_AVX512_SUPPORT
+/*
+ * If the compiler doesn't support AVX512 instructions,
+ * then the dummy one would be used instead for AVX512 classify method.
+ */
+int
+rte_acl_classify_avx512(__rte_unused const struct rte_acl_ctx *ctx,
+	__rte_unused const uint8_t **data,
+	__rte_unused uint32_t *results,
+	__rte_unused uint32_t num,
+	__rte_unused uint32_t categories)
+{
+	return -ENOTSUP;
+}
+#endif
+
 #ifndef CC_AVX2_SUPPORT
 /*
  * If the compiler doesn't support AVX2 instructions,
@@ -77,6 +93,7 @@  static const rte_acl_classify_t classify_fns[] = {
 	[RTE_ACL_CLASSIFY_AVX2] = rte_acl_classify_avx2,
 	[RTE_ACL_CLASSIFY_NEON] = rte_acl_classify_neon,
 	[RTE_ACL_CLASSIFY_ALTIVEC] = rte_acl_classify_altivec,
+	[RTE_ACL_CLASSIFY_AVX512] = rte_acl_classify_avx512,
 };
 
 /*
@@ -126,6 +143,17 @@  acl_check_alg_ppc(enum rte_acl_classify_alg alg)
 static int
 acl_check_alg_x86(enum rte_acl_classify_alg alg)
 {
+	if (alg == RTE_ACL_CLASSIFY_AVX512) {
+#ifdef CC_AVX512_SUPPORT
+		if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512F) &&
+			rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512VL) &&
+			rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512CD) &&
+			rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512BW))
+			return 0;
+#endif
+		return -ENOTSUP;
+	}
+
 	if (alg == RTE_ACL_CLASSIFY_AVX2) {
 #ifdef CC_AVX2_SUPPORT
 		if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2))
@@ -159,6 +187,7 @@  acl_check_alg(enum rte_acl_classify_alg alg)
 		return acl_check_alg_arm(alg);
 	case RTE_ACL_CLASSIFY_ALTIVEC:
 		return acl_check_alg_ppc(alg);
+	case RTE_ACL_CLASSIFY_AVX512:
 	case RTE_ACL_CLASSIFY_AVX2:
 	case RTE_ACL_CLASSIFY_SSE:
 		return acl_check_alg_x86(alg);
diff --git a/lib/librte_acl/rte_acl.h b/lib/librte_acl/rte_acl.h
index 3999f15de..d243a1c84 100644
--- a/lib/librte_acl/rte_acl.h
+++ b/lib/librte_acl/rte_acl.h
@@ -241,6 +241,7 @@  enum rte_acl_classify_alg {
 	RTE_ACL_CLASSIFY_AVX2 = 3,    /**< requires AVX2 support. */
 	RTE_ACL_CLASSIFY_NEON = 4,    /**< requires NEON support. */
 	RTE_ACL_CLASSIFY_ALTIVEC = 5,    /**< requires ALTIVEC support. */
+	RTE_ACL_CLASSIFY_AVX512 = 6,    /**< requires AVX512 support. */
 };
 
 /**