[v4,1/8] eal: introduce zmm type for AVX 512-bit

Message ID 98b10e12eb46cff65494a94eaf0f04b2dcefd245.1594238610.git.vladimir.medvedkin@intel.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Headers
Series fib: implement AVX512 vector lookup |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/iol-broadcom-Performance success Performance Testing PASS
ci/iol-testing success Testing PASS
ci/Intel-compilation success Compilation OK
ci/iol-intel-Performance success Performance Testing PASS

Commit Message

Vladimir Medvedkin July 8, 2020, 8:16 p.m. UTC
  New data type to manipulate 512 bit AVX values.

Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 lib/librte_eal/x86/include/rte_vect.h | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)
  

Comments

David Marchand July 9, 2020, 1:48 p.m. UTC | #1
On Wed, Jul 8, 2020 at 10:17 PM Vladimir Medvedkin
<vladimir.medvedkin@intel.com> wrote:
>
> New data type to manipulate 512 bit AVX values.

The title mentions a "zmm" type that is not added by this patch.

Maybe instead, "eal/x86: introduce AVX 512-bit type"


>
> Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> ---
>  lib/librte_eal/x86/include/rte_vect.h | 21 +++++++++++++++++++++
>  1 file changed, 21 insertions(+)
>
> diff --git a/lib/librte_eal/x86/include/rte_vect.h b/lib/librte_eal/x86/include/rte_vect.h
> index df5a60762..ae59126bc 100644
> --- a/lib/librte_eal/x86/include/rte_vect.h
> +++ b/lib/librte_eal/x86/include/rte_vect.h
> @@ -13,6 +13,7 @@
>
>  #include <stdint.h>
>  #include <rte_config.h>
> +#include <rte_common.h>
>  #include "generic/rte_vect.h"
>
>  #if (defined(__ICC) || \
> @@ -90,6 +91,26 @@ __extension__ ({                 \
>  })
>  #endif /* (defined(__ICC) && __ICC < 1210) */
>
> +#ifdef __AVX512F__
> +
> +typedef __m512i __x86_zmm_t;

We don't need this interim type, using the native __m512 is enough afaics.

Looking at the whole applied series:
$ git grep -lw __x86_zmm_t
lib/librte_eal/x86/include/rte_vect.h


> +
> +#define        ZMM_SIZE        (sizeof(__x86_zmm_t))
> +#define        ZMM_MASK        (ZMM_SIZE - 1)

Macros in a public header need a RTE_ prefix + this is x86 specific,
then RTE_X86_.

Looking at the whole applied series:
$ git grep -lw ZMM_SIZE
lib/librte_eal/x86/include/rte_vect.h
$ git grep -lw ZMM_MASK
lib/librte_eal/x86/include/rte_vect.h

So I wonder if we need to export it or we can instead just #undef
after the struct definition.


> +
> +typedef union __rte_x86_zmm  {
> +       __x86_zmm_t      z;
> +       ymm_t    y[ZMM_SIZE / sizeof(ymm_t)];
> +       xmm_t    x[ZMM_SIZE / sizeof(xmm_t)];
> +       uint8_t  u8[ZMM_SIZE / sizeof(uint8_t)];
> +       uint16_t u16[ZMM_SIZE / sizeof(uint16_t)];
> +       uint32_t u32[ZMM_SIZE / sizeof(uint32_t)];
> +       uint64_t u64[ZMM_SIZE / sizeof(uint64_t)];
> +       double   pd[ZMM_SIZE / sizeof(double)];
> +} __rte_aligned(ZMM_SIZE) __rte_x86_zmm_t;

I don't understand this forced alignment statement.
Would not natural alignment be enough, since all fields in this union
have the same size?
  
Vladimir Medvedkin July 9, 2020, 2:52 p.m. UTC | #2
Hi David,

Thanks for review

On 09/07/2020 14:48, David Marchand wrote:
> On Wed, Jul 8, 2020 at 10:17 PM Vladimir Medvedkin
> <vladimir.medvedkin@intel.com> wrote:
>>
>> New data type to manipulate 512 bit AVX values.
> 
> The title mentions a "zmm" type that is not added by this patch.
> 
> Maybe instead, "eal/x86: introduce AVX 512-bit type"
> 

Agree

> 
>>
>> Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
>> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
>> ---
>>   lib/librte_eal/x86/include/rte_vect.h | 21 +++++++++++++++++++++
>>   1 file changed, 21 insertions(+)
>>
>> diff --git a/lib/librte_eal/x86/include/rte_vect.h b/lib/librte_eal/x86/include/rte_vect.h
>> index df5a60762..ae59126bc 100644
>> --- a/lib/librte_eal/x86/include/rte_vect.h
>> +++ b/lib/librte_eal/x86/include/rte_vect.h
>> @@ -13,6 +13,7 @@
>>
>>   #include <stdint.h>
>>   #include <rte_config.h>
>> +#include <rte_common.h>
>>   #include "generic/rte_vect.h"
>>
>>   #if (defined(__ICC) || \
>> @@ -90,6 +91,26 @@ __extension__ ({                 \
>>   })
>>   #endif /* (defined(__ICC) && __ICC < 1210) */
>>
>> +#ifdef __AVX512F__
>> +
>> +typedef __m512i __x86_zmm_t;
> 
> We don't need this interim type, using the native __m512 is enough afaics.
> 

Agree

> Looking at the whole applied series:
> $ git grep -lw __x86_zmm_t
> lib/librte_eal/x86/include/rte_vect.h
> 
> 
>> +
>> +#define        ZMM_SIZE        (sizeof(__x86_zmm_t))
>> +#define        ZMM_MASK        (ZMM_SIZE - 1)
> 
> Macros in a public header need a RTE_ prefix + this is x86 specific,
> then RTE_X86_.
> 
> Looking at the whole applied series:
> $ git grep -lw ZMM_SIZE
> lib/librte_eal/x86/include/rte_vect.h
> $ git grep -lw ZMM_MASK
> lib/librte_eal/x86/include/rte_vect.h
> 
> So I wonder if we need to export it or we can instead just #undef
> after the struct definition.

I think it's better to undef it

> 
> 
>> +
>> +typedef union __rte_x86_zmm  {
>> +       __x86_zmm_t      z;
>> +       ymm_t    y[ZMM_SIZE / sizeof(ymm_t)];
>> +       xmm_t    x[ZMM_SIZE / sizeof(xmm_t)];
>> +       uint8_t  u8[ZMM_SIZE / sizeof(uint8_t)];
>> +       uint16_t u16[ZMM_SIZE / sizeof(uint16_t)];
>> +       uint32_t u32[ZMM_SIZE / sizeof(uint32_t)];
>> +       uint64_t u64[ZMM_SIZE / sizeof(uint64_t)];
>> +       double   pd[ZMM_SIZE / sizeof(double)];
>> +} __rte_aligned(ZMM_SIZE) __rte_x86_zmm_t;
> 
> I don't understand this forced alignment statement.
> Would not natural alignment be enough, since all fields in this union
> have the same size?
> 

Some compilers won't align this union
https://mails.dpdk.org/archives/dev/2020-March/159591.html

>
  
David Marchand July 9, 2020, 3:20 p.m. UTC | #3
On Thu, Jul 9, 2020 at 4:52 PM Medvedkin, Vladimir
<vladimir.medvedkin@intel.com> wrote:
> >> +
> >> +#define        ZMM_SIZE        (sizeof(__x86_zmm_t))
> >> +#define        ZMM_MASK        (ZMM_SIZE - 1)
> >
> > Macros in a public header need a RTE_ prefix + this is x86 specific,
> > then RTE_X86_.
> >
> > Looking at the whole applied series:
> > $ git grep -lw ZMM_SIZE
> > lib/librte_eal/x86/include/rte_vect.h
> > $ git grep -lw ZMM_MASK
> > lib/librte_eal/x86/include/rte_vect.h
> >
> > So I wonder if we need to export it or we can instead just #undef
> > after the struct definition.
>
> I think it's better to undef it

Even if you undef the macro, please still prefix it.
This is to avoid conflicts with macros defined before including this
rte_vect.h header.


>
> >
> >
> >> +
> >> +typedef union __rte_x86_zmm  {
> >> +       __x86_zmm_t      z;
> >> +       ymm_t    y[ZMM_SIZE / sizeof(ymm_t)];
> >> +       xmm_t    x[ZMM_SIZE / sizeof(xmm_t)];
> >> +       uint8_t  u8[ZMM_SIZE / sizeof(uint8_t)];
> >> +       uint16_t u16[ZMM_SIZE / sizeof(uint16_t)];
> >> +       uint32_t u32[ZMM_SIZE / sizeof(uint32_t)];
> >> +       uint64_t u64[ZMM_SIZE / sizeof(uint64_t)];
> >> +       double   pd[ZMM_SIZE / sizeof(double)];
> >> +} __rte_aligned(ZMM_SIZE) __rte_x86_zmm_t;
> >
> > I don't understand this forced alignment statement.
> > Would not natural alignment be enough, since all fields in this union
> > have the same size?
> >
>
> Some compilers won't align this union
> https://mails.dpdk.org/archives/dev/2020-March/159591.html

Ok, interesting, I will try to keep in mind.
  

Patch

diff --git a/lib/librte_eal/x86/include/rte_vect.h b/lib/librte_eal/x86/include/rte_vect.h
index df5a60762..ae59126bc 100644
--- a/lib/librte_eal/x86/include/rte_vect.h
+++ b/lib/librte_eal/x86/include/rte_vect.h
@@ -13,6 +13,7 @@ 
 
 #include <stdint.h>
 #include <rte_config.h>
+#include <rte_common.h>
 #include "generic/rte_vect.h"
 
 #if (defined(__ICC) || \
@@ -90,6 +91,26 @@  __extension__ ({                 \
 })
 #endif /* (defined(__ICC) && __ICC < 1210) */
 
+#ifdef __AVX512F__
+
+typedef __m512i __x86_zmm_t;
+
+#define	ZMM_SIZE	(sizeof(__x86_zmm_t))
+#define	ZMM_MASK	(ZMM_SIZE - 1)
+
+typedef union __rte_x86_zmm  {
+	__x86_zmm_t	 z;
+	ymm_t    y[ZMM_SIZE / sizeof(ymm_t)];
+	xmm_t    x[ZMM_SIZE / sizeof(xmm_t)];
+	uint8_t  u8[ZMM_SIZE / sizeof(uint8_t)];
+	uint16_t u16[ZMM_SIZE / sizeof(uint16_t)];
+	uint32_t u32[ZMM_SIZE / sizeof(uint32_t)];
+	uint64_t u64[ZMM_SIZE / sizeof(uint64_t)];
+	double   pd[ZMM_SIZE / sizeof(double)];
+} __rte_aligned(ZMM_SIZE) __rte_x86_zmm_t;
+
+#endif /* __AVX512F__ */
+
 #ifdef __cplusplus
 }
 #endif