x86: rte_mov256 was missing for AVX2
Commit Message
The rte_mov256 function was missing for AVX2.
Does nobody build test for AVX2 and check the compiler output?
Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
---
lib/eal/x86/include/rte_memcpy.h | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
Comments
20/08/2022 12:30, Morten Brørup:
> The rte_mov256 function was missing for AVX2.
> Does nobody build test for AVX2 and check the compiler output?
Please could you specify the context/setup to reproduce the issue?
An error message would be nice to paste here as well.
Thanks
> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> Sent: Monday, 29 August 2022 12.56
>
> 20/08/2022 12:30, Morten Brørup:
> > The rte_mov256 function was missing for AVX2.
> > Does nobody build test for AVX2 and check the compiler output?
>
> Please could you specify the context/setup to reproduce the issue?
I stumbled upon it while working on the new non-temporal memcpy function.
Reproduction described below.
>
> An error message would be nice to paste here as well.
> Thanks
The rte_memcpy declarations are in the lib/eal/generic/rte_memcpy.h header file, so add this declaration header file to the implementation file. (I wonder why it is not already there?)
lib/eal/x86/include/rte_memcpy.h:
#include <rte_common.h>
#include <rte_config.h>
#include <rte_debug.h>
+ #include "generic/rte_memcpy.h"
#ifdef __cplusplus
extern "C" {
#endif
The error messages from ninja look like this:
[46/2597] Compiling C object lib/acl/libavx2_tmp.a.p/acl_run_avx2.c.o
In file included from ../lib/eal/x86/include/rte_memcpy.h:24,
from ../lib/acl/rte_acl_osdep.h:40,
from ../lib/acl/rte_acl.h:14,
from ../lib/acl/acl_run.h:8,
from ../lib/acl/acl_run_sse.h:5,
from ../lib/acl/acl_run_avx2.h:5,
from ../lib/acl/acl_run_avx2.c:6:
../lib/eal/include/generic/rte_memcpy.h:89:1: warning: 'rte_mov256' declared 'static' but never defined [-Wunused-function]
89 | rte_mov256(uint8_t *dst, const uint8_t *src);
| ^~~~~~~~~~
[52/2597] Compiling C object lib/acl/libavx512_tmp.a.p/acl_run_avx512.c.o
In file included from ../lib/eal/x86/include/rte_memcpy.h:24,
from ../lib/acl/rte_acl_osdep.h:40,
from ../lib/acl/rte_acl.h:14,
from ../lib/acl/acl_run.h:8,
from ../lib/acl/acl_run_sse.h:5,
from ../lib/acl/acl_run_avx512.c:5:
../lib/eal/include/generic/rte_memcpy.h:89:1: warning: 'rte_mov256' declared 'static' but never defined [-Wunused-function]
89 | rte_mov256(uint8_t *dst, const uint8_t *src);
| ^~~~~~~~~~
At SmartShare Systems we follow a coding convention of including the declaration header file at the absolute top of the file implementing it. This reveals at build time if anything is missing in the declaration header file. The DPDK Project could do the same, and find bugs like this.
Here's an example:
foo.h:
------
// Declaration
static inline uint32_t bar(uint32_t x);
foo.c:
------
#include <foo.h> // <-- Note: At the absolute top!
#include <stdint.h>
// Implementation
static inline uint32_t bar(uint32_t x)
{
return x * 2;
}
Following our coding convention will reveal that <stdint.h> is required for using <foo.h>, and thus should be included in foo.h (not in foo.c) - because someone else might include <foo.h>, and then <stdint.h> could be missing there.
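A minimal sketch of how the pair might look after the fix the convention points to, with <stdint.h> moved into the header. Both files are shown in one listing so it compiles as a single unit; the include guard name FOO_H is an assumption added for illustration and is not part of the example above.

```c
/* foo.h (sketch): the header now includes everything it needs. */
#ifndef FOO_H /* guard name is hypothetical */
#define FOO_H

#include <stdint.h>

/* Declaration */
static inline uint32_t bar(uint32_t x);

#endif /* FOO_H */

/*
 * foo.c (sketch): in a real tree this file would start with
 *     #include <foo.h>
 * at the absolute top, and would no longer include <stdint.h> itself.
 */

/* Implementation */
static inline uint32_t bar(uint32_t x)
{
	return x * 2;
}
```

With this layout, any other file that includes <foo.h> gets uint32_t without having to know about the dependency.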
-Morten
29/08/2022 14:18, Morten Brørup:
> At SmartShare Systems we follow a coding convention of including the declaration header file at the absolute top of the file implementing it. This reveals at build time if anything is missing in the declaration header file. The DPDK Project could do the same, and find bugs like this.
>
> Here's an example:
>
> foo.h:
> ------
> // Declaration
> static inline uint32_t bar(uint32_t x);
>
> foo.c:
> ------
> #include <foo.h> // <-- Note: At the absolute top!
> #include <stdint.h>
>
> // Implementation
> static inline uint32_t bar(uint32_t x)
> {
> return x * 2;
> }
>
> Following our coding convention will reveal that <stdint.h> is required for using <foo.h>, and thus should be included in foo.h (not in foo.c) - because someone else might include <foo.h>, and then <stdint.h> could be missing there.
Yes we could follow this convention.
Bruce, David, Thomas,
PING. Please ack or review this simple patch, so it can be merged.
Details were already discussed on the list with Thomas.
NB: The test errors in Patchwork are bogus: "ERROR: Could not detect Ninja v1.5 or newer" is clearly not related to the patch.
-Morten
> From: Morten Brørup [mailto:mb@smartsharesystems.com]
> Sent: Saturday, 20 August 2022 12.31
>
> The rte_mov256 function was missing for AVX2.
> Does nobody build test for AVX2 and check the compiler output?
>
> Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> ---
> lib/eal/x86/include/rte_memcpy.h | 17 +++++++++++++++++
> 1 file changed, 17 insertions(+)
>
> diff --git a/lib/eal/x86/include/rte_memcpy.h b/lib/eal/x86/include/rte_memcpy.h
> index b678b5c942..d4d7a5cfc8 100644
> --- a/lib/eal/x86/include/rte_memcpy.h
> +++ b/lib/eal/x86/include/rte_memcpy.h
> @@ -371,6 +371,23 @@ rte_mov128(uint8_t *dst, const uint8_t *src)
> rte_mov32((uint8_t *)dst + 3 * 32, (const uint8_t *)src + 3 * 32);
> }
>
> +/**
> + * Copy 256 bytes from one location to another,
> + * locations should not overlap.
> + */
> +static __rte_always_inline void
> +rte_mov256(uint8_t *dst, const uint8_t *src)
> +{
> + rte_mov32((uint8_t *)dst + 0 * 32, (const uint8_t *)src + 0 * 32);
> + rte_mov32((uint8_t *)dst + 1 * 32, (const uint8_t *)src + 1 * 32);
> + rte_mov32((uint8_t *)dst + 2 * 32, (const uint8_t *)src + 2 * 32);
> + rte_mov32((uint8_t *)dst + 3 * 32, (const uint8_t *)src + 3 * 32);
> + rte_mov32((uint8_t *)dst + 4 * 32, (const uint8_t *)src + 4 * 32);
> + rte_mov32((uint8_t *)dst + 5 * 32, (const uint8_t *)src + 5 * 32);
> + rte_mov32((uint8_t *)dst + 6 * 32, (const uint8_t *)src + 6 * 32);
> + rte_mov32((uint8_t *)dst + 7 * 32, (const uint8_t *)src + 7 * 32);
> +}
> +
> /**
> * Copy 128-byte blocks from one location to another,
> * locations should not overlap.
> --
> 2.17.1
On Wed, Sep 28, 2022 at 09:44:35PM +0200, Morten Brørup wrote:
> Bruce, David, Thomas,
>
> PING. Please ack or review this simple patch, so it can be merged.
>
> Details were already discussed on the list with Thomas.
>
> NB: The test errors in Patchwork are bogus: "ERROR: Could not detect Ninja v1.5 or newer" is clearly not related to the patch.
>
> -Morten
>
> > From: Morten Brørup [mailto:mb@smartsharesystems.com]
> > Sent: Saturday, 20 August 2022 12.31
> >
> > The rte_mov256 function was missing for AVX2.
> > Does nobody build test for AVX2 and check the compiler output?
> >
> > Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
On Sat, Aug 20, 2022 at 12:30 PM Morten Brørup <mb@smartsharesystems.com> wrote:
>
> The rte_mov256 function was missing for AVX2.
Afaiu:
Fixes: 9144d6bcdefd ("eal/x86: optimize memcpy for SSE and AVX")
This has been missing for a long time, so I guess nobody actually uses it.
>
> Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Applied, thanks.
If you think it is worth always including the generic/ headers in all
arch specific headers, can you work on it?
Thanks.
@@ -371,6 +371,23 @@ rte_mov128(uint8_t *dst, const uint8_t *src)
rte_mov32((uint8_t *)dst + 3 * 32, (const uint8_t *)src + 3 * 32);
}
+/**
+ * Copy 256 bytes from one location to another,
+ * locations should not overlap.
+ */
+static __rte_always_inline void
+rte_mov256(uint8_t *dst, const uint8_t *src)
+{
+ rte_mov32((uint8_t *)dst + 0 * 32, (const uint8_t *)src + 0 * 32);
+ rte_mov32((uint8_t *)dst + 1 * 32, (const uint8_t *)src + 1 * 32);
+ rte_mov32((uint8_t *)dst + 2 * 32, (const uint8_t *)src + 2 * 32);
+ rte_mov32((uint8_t *)dst + 3 * 32, (const uint8_t *)src + 3 * 32);
+ rte_mov32((uint8_t *)dst + 4 * 32, (const uint8_t *)src + 4 * 32);
+ rte_mov32((uint8_t *)dst + 5 * 32, (const uint8_t *)src + 5 * 32);
+ rte_mov32((uint8_t *)dst + 6 * 32, (const uint8_t *)src + 6 * 32);
+ rte_mov32((uint8_t *)dst + 7 * 32, (const uint8_t *)src + 7 * 32);
+}
+
/**
* Copy 128-byte blocks from one location to another,
* locations should not overlap.
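
For readers who want to check the offset arithmetic outside of DPDK, here is a minimal standalone sketch of the same structure: eight 32-byte copies covering offsets 0 through 224. The names mov32_sketch, mov256_sketch and mov256_sketch_check are hypothetical and not part of DPDK. memcpy stands in for the AVX2 rte_mov32 (which, as far as I know, is a 256-bit unaligned load and store), and a loop stands in for the eight unrolled calls in the patch.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for rte_mov32: copies exactly 32 bytes.
 * Assumption: memcpy replaces the AVX2 load/store to stay portable. */
static inline void
mov32_sketch(uint8_t *dst, const uint8_t *src)
{
	memcpy(dst, src, 32);
}

/* Same shape as the rte_mov256 in the patch: eight 32-byte copies
 * at offsets 0, 32, ..., 224. Regions must not overlap. */
static inline void
mov256_sketch(uint8_t *dst, const uint8_t *src)
{
	for (int i = 0; i < 8; i++)
		mov32_sketch(dst + i * 32, src + i * 32);
}

/* Returns 1 if all 256 bytes were copied and the guard byte
 * directly after the destination was left untouched. */
static inline int
mov256_sketch_check(void)
{
	uint8_t src[256];
	uint8_t dst[257];

	for (int i = 0; i < 256; i++)
		src[i] = (uint8_t)i;
	memset(dst, 0xAA, sizeof(dst));

	mov256_sketch(dst, src);

	return memcmp(dst, src, 256) == 0 && dst[256] == 0xAA;
}
```

The guard byte at dst[256] is there to catch an off-by-one in the block count, which is the kind of mistake a hand-unrolled sequence like this is prone to.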