[1/6] eal: introduce zmm type for AVX 512-bit

Message ID 1583757826-375246-2-git-send-email-vladimir.medvedkin@intel.com (mailing list archive)
State Superseded, archived
Delegated to: David Marchand
Headers
Series fib: implement AVX512 vector lookup |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/iol-testing success Testing PASS
ci/iol-mellanox-Performance success Performance Testing PASS
ci/Intel-compilation success Compilation OK

Commit Message

Vladimir Medvedkin March 9, 2020, 12:43 p.m. UTC
  New data type to manipulate 512 bit AVX values.

Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
---
 lib/librte_eal/common/include/arch/x86/rte_vect.h | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)
  

Comments

Jerin Jacob March 9, 2020, 4:39 p.m. UTC | #1
On Mon, Mar 9, 2020 at 6:14 PM Vladimir Medvedkin
<vladimir.medvedkin@intel.com> wrote:
>
> New data type to manipulate 512 bit AVX values.
>
> Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
> ---
>  lib/librte_eal/common/include/arch/x86/rte_vect.h | 20 ++++++++++++++++++++
>  1 file changed, 20 insertions(+)
>
> diff --git a/lib/librte_eal/common/include/arch/x86/rte_vect.h b/lib/librte_eal/common/include/arch/x86/rte_vect.h
> index df5a607..09f30e6 100644
> --- a/lib/librte_eal/common/include/arch/x86/rte_vect.h
> +++ b/lib/librte_eal/common/include/arch/x86/rte_vect.h
> @@ -90,6 +90,26 @@ __extension__ ({                 \
>  })
>  #endif /* (defined(__ICC) && __ICC < 1210) */
>
> +#ifdef __AVX512F__
> +
> +typedef __m512i zmm_t;
> +
> +#define        ZMM_SIZE        (sizeof(zmm_t))
> +#define        ZMM_MASK        (ZMM_SIZE - 1)
> +
> +typedef union rte_zmm {
> +       zmm_t    z;
> +       ymm_t    y[ZMM_SIZE / sizeof(ymm_t)];
> +       xmm_t    x[ZMM_SIZE / sizeof(xmm_t)];
> +       uint8_t  u8[ZMM_SIZE / sizeof(uint8_t)];
> +       uint16_t u16[ZMM_SIZE / sizeof(uint16_t)];
> +       uint32_t u32[ZMM_SIZE / sizeof(uint32_t)];
> +       uint64_t u64[ZMM_SIZE / sizeof(uint64_t)];
> +       double   pd[ZMM_SIZE / sizeof(double)];

Are we missing __attribute__((aligned(64))) here?

> +} rte_zmm_t;

IMO, Due to legacy reason, we have selected  rte_xmm_t, rte_ymm_t for
128 and 256 operations in public APIs[1]

# Not sure where xmm_t and ymm_t and new zmm_t come from? Is this name
x86 arch-specific? If so,
why not give the more generic name rte_512i_t or something?
# Currently, In every arch file, we are repeating the definition for
rte_xmm_t, Why not make, this generic definition
in common file. ie.  rte_zmm_t or rte_512i_t definition in common
file(./lib/librte_eal/common/include/generic/rte_vect.h)
# Currently ./lib/librte_eal/common/include/generic/rte_vect.h has
defintion for rte_vXsY_t for vector representation, would that
be enough for public API? Do we need to new type?


[1]
rte_lpm_lookupx4(const struct rte_lpm *lpm, xmm_t ip, uint32_t hop[4],
uint32_t defv)


> +
> +#endif /* __AVX512F__ */
> +
>  #ifdef __cplusplus
>  }
>  #endif
> --
> 2.7.4
>
  
Vladimir Medvedkin March 10, 2020, 2:44 p.m. UTC | #2
Hi Jerin,

On 09/03/2020 16:39, Jerin Jacob wrote:
> On Mon, Mar 9, 2020 at 6:14 PM Vladimir Medvedkin
> <vladimir.medvedkin@intel.com> wrote:
>> New data type to manipulate 512 bit AVX values.
>>
>> Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
>> ---
>>   lib/librte_eal/common/include/arch/x86/rte_vect.h | 20 ++++++++++++++++++++
>>   1 file changed, 20 insertions(+)
>>
>> diff --git a/lib/librte_eal/common/include/arch/x86/rte_vect.h b/lib/librte_eal/common/include/arch/x86/rte_vect.h
>> index df5a607..09f30e6 100644
>> --- a/lib/librte_eal/common/include/arch/x86/rte_vect.h
>> +++ b/lib/librte_eal/common/include/arch/x86/rte_vect.h
>> @@ -90,6 +90,26 @@ __extension__ ({                 \
>>   })
>>   #endif /* (defined(__ICC) && __ICC < 1210) */
>>
>> +#ifdef __AVX512F__
>> +
>> +typedef __m512i zmm_t;
>> +
>> +#define        ZMM_SIZE        (sizeof(zmm_t))
>> +#define        ZMM_MASK        (ZMM_SIZE - 1)
>> +
>> +typedef union rte_zmm {
>> +       zmm_t    z;
>> +       ymm_t    y[ZMM_SIZE / sizeof(ymm_t)];
>> +       xmm_t    x[ZMM_SIZE / sizeof(xmm_t)];
>> +       uint8_t  u8[ZMM_SIZE / sizeof(uint8_t)];
>> +       uint16_t u16[ZMM_SIZE / sizeof(uint16_t)];
>> +       uint32_t u32[ZMM_SIZE / sizeof(uint32_t)];
>> +       uint64_t u64[ZMM_SIZE / sizeof(uint64_t)];
>> +       double   pd[ZMM_SIZE / sizeof(double)];
> Are we missing __attribute__((aligned(64))) here?
Agree. While modern compilers align __m512i by default, some old could 
failure to align. Please correct me if I'm wrong.
>
>> +} rte_zmm_t;
> IMO, Due to legacy reason, we have selected  rte_xmm_t, rte_ymm_t for
> 128 and 256 operations in public APIs[1]
As for me, since these functions are inlined, prototype should be 
changed to uint32_t ip[4] instead of passing vector type as an argument.
> # Not sure where xmm_t and ymm_t and new zmm_t come from? Is this name
> x86 arch-specific?
Yes, that's why they are in arch/x86/rte_vect.h
> If so,
> why not give the more generic name rte_512i_t or something?
> # Currently, In every arch file, we are repeating the definition for
> rte_xmm_t, Why not make, this generic definition
> in common file. ie.  rte_zmm_t or rte_512i_t definition in common
> file(./lib/librte_eal/common/include/generic/rte_vect.h)
I think there could be some arch specific thing that prevents it from 
being generic.
> # Currently ./lib/librte_eal/common/include/generic/rte_vect.h has
> defintion for rte_vXsY_t for vector representation, would that
> be enough for public API? Do we need to new type?

Definitions for rte_vXsY_tare almost the same as compiler's 
__m[128,256,512]i apart from alignment.
Union types such as rte_zmm_t are very useful because of the ability to 
access parts of a wide vector register with an arbitrary granularity. 
For example, some old compiler don't support 
_mm512_set_epi8()/_mm512_set_epi16() intrinsics, so accessing ".u8[]" of 
".u16[]" solves the problem.

>
>
> [1]
> rte_lpm_lookupx4(const struct rte_lpm *lpm, xmm_t ip, uint32_t hop[4],
> uint32_t defv)
>
>
>> +
>> +#endif /* __AVX512F__ */
>> +
>>   #ifdef __cplusplus
>>   }
>>   #endif
>> --
>> 2.7.4
>>
  
Jerin Jacob March 20, 2020, 8:23 a.m. UTC | #3
On Tue, Mar 10, 2020 at 8:14 PM Medvedkin, Vladimir
<vladimir.medvedkin@intel.com> wrote:
>
> Hi Jerin,

Hi Vladimir,


>
> Are we missing __attribute__((aligned(64))) here?
>
> Agree. While modern compilers align __m512i by default, some old could failure to align. Please correct me if I'm wrong.

Yes.

>
> +} rte_zmm_t;
>
> IMO, Due to legacy reason, we have selected  rte_xmm_t, rte_ymm_t for
> 128 and 256 operations in public APIs[1]
>
> As for me, since these functions are inlined, prototype should be changed to uint32_t ip[4] instead of passing vector type as an argument.

OK. Makes sense.

> # Not sure where xmm_t and ymm_t and new zmm_t come from? Is this name
> x86 arch-specific?
>
> Yes, that's why they are in arch/x86/rte_vect.h

See the last comment.

>
> If so,
> why not give the more generic name rte_512i_t or something?
> # Currently, In every arch file, we are repeating the definition for
> rte_xmm_t, Why not make, this generic definition
> in common file. ie.  rte_zmm_t or rte_512i_t definition in common
> file(./lib/librte_eal/common/include/generic/rte_vect.h)
>
> I think there could be some arch specific thing that prevents it from being generic.
>
> # Currently ./lib/librte_eal/common/include/generic/rte_vect.h has
> defintion for rte_vXsY_t for vector representation, would that
> be enough for public API? Do we need to new type?
>
> Definitions for rte_vXsY_tare almost the same as compiler's __m[128,256,512]i apart from alignment.
> Union types such as rte_zmm_t are very useful because of the ability to access parts of a wide vector register with an arbitrary granularity. For example, some old compiler don't support _mm512_set_epi8()/_mm512_set_epi16() intrinsics, so accessing ".u8[]" of ".u16[]" solves the problem.

Yes. We are on the same page.

I think, the only difference in thought is, the x86 specific
definition(rte_zmm_t) name should be something
it needs to be reflected as internal or arch-specific. Earlier APIs
such rte_lpm_lookupx4 has leaked
the xmm_t  definition to public API.
To avoid that danger, please make rte_zmm_t as internal/arch-specific.
Something __rte_x86_zmm_t or
so that denotes it is not a public symbol.
  

Patch

diff --git a/lib/librte_eal/common/include/arch/x86/rte_vect.h b/lib/librte_eal/common/include/arch/x86/rte_vect.h
index df5a607..09f30e6 100644
--- a/lib/librte_eal/common/include/arch/x86/rte_vect.h
+++ b/lib/librte_eal/common/include/arch/x86/rte_vect.h
@@ -90,6 +90,26 @@  __extension__ ({                 \
 })
 #endif /* (defined(__ICC) && __ICC < 1210) */
 
+#ifdef __AVX512F__
+
+typedef __m512i zmm_t;
+
+#define	ZMM_SIZE	(sizeof(zmm_t))
+#define	ZMM_MASK	(ZMM_SIZE - 1)
+
+typedef union rte_zmm {
+	zmm_t	 z;
+	ymm_t    y[ZMM_SIZE / sizeof(ymm_t)];
+	xmm_t    x[ZMM_SIZE / sizeof(xmm_t)];
+	uint8_t  u8[ZMM_SIZE / sizeof(uint8_t)];
+	uint16_t u16[ZMM_SIZE / sizeof(uint16_t)];
+	uint32_t u32[ZMM_SIZE / sizeof(uint32_t)];
+	uint64_t u64[ZMM_SIZE / sizeof(uint64_t)];
+	double   pd[ZMM_SIZE / sizeof(double)];
+} rte_zmm_t;
+
+#endif /* __AVX512F__ */
+
 #ifdef __cplusplus
 }
 #endif