eal/x86: remove redundant round to improve performance

Message ID 20230329091658.1599349-1-leyi.rong@intel.com (mailing list archive)
State Accepted, archived
Delegated to: David Marchand
Series eal/x86: remove redundant round to improve performance

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/loongarch-compilation success Compilation OK
ci/loongarch-unit-testing success Unit Testing PASS
ci/Intel-compilation success Compilation OK
ci/iol-broadcom-Functional success Functional Testing PASS
ci/iol-mellanox-Performance success Performance Testing PASS
ci/iol-broadcom-Performance success Performance Testing PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/iol-intel-Functional success Functional Testing PASS
ci/iol-aarch64-unit-testing success Testing PASS
ci/iol-x86_64-compile-testing success Testing PASS
ci/intel-Testing success Testing PASS
ci/iol-unit-testing success Testing PASS
ci/github-robot: build success github build: passed
ci/iol-aarch64-compile-testing success Testing PASS
ci/iol-testing warning Testing issues
ci/iol-x86_64-unit-testing fail Testing issues
ci/intel-Functional success Functional PASS
ci/iol-abi-testing success Testing PASS

Commit Message

Leyi Rong March 29, 2023, 9:16 a.m. UTC
  In rte_memcpy_aligned(), the 64-byte block copy loop takes one
redundant iteration when the size is a multiple of 64. Let the
catch-up copy handle the last 64 bytes in that case instead.

Suggested-by: Morten Brørup <mb@smartsharesystems.com>
Signed-off-by: Leyi Rong <leyi.rong@intel.com>
---
 lib/eal/x86/include/rte_memcpy.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
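
For illustration, below is a minimal, self-contained sketch of the copy-loop
plus catch-up pattern that this patch touches. It uses plain memcpy() in place
of rte_mov64(), assumes n >= 64 on entry (smaller sizes are handled by earlier
branches in the real rte_memcpy_aligned()), and is illustrative only, not the
actual DPDK code.

#include <stdint.h>
#include <string.h>

/* Illustrative sketch only; not the DPDK implementation. Assumes n >= 64. */
static void *
copy_aligned_sketch(void *dst, const void *src, size_t n)
{
	void *ret = dst;

	/* Copy 64-byte blocks; with "n > 64" the final 64 bytes (or the
	 * trailing partial block) are left for the catch-up copy below. */
	for (; n > 64; n -= 64) {
		memcpy(dst, src, 64);	/* stands in for rte_mov64() */
		dst = (uint8_t *)dst + 64;
		src = (const uint8_t *)src + 64;
	}

	/* Catch-up: copy the last 64 bytes, ending exactly at the end of
	 * the buffer. With the previous "n >= 64" condition and n a
	 * multiple of 64, the loop had already copied these bytes, so this
	 * copy was redundant. */
	memcpy((uint8_t *)dst - 64 + n, (const uint8_t *)src - 64 + n, 64);

	return ret;
}

For sizes that are not a multiple of 64, the catch-up copy simply overlaps the
last loop iteration, which is why the redundant work only occurred for exact
multiples of 64.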
  

Comments

Morten Brørup March 29, 2023, 9:30 a.m. UTC | #1
> From: Leyi Rong [mailto:leyi.rong@intel.com]
> Sent: Wednesday, 29 March 2023 11.17
> 
> In rte_memcpy_aligned(), one redundant round is taken in the 64 bytes
> block copy loops if the size is a multiple of 64. So, let the catch-up
> copy the last 64 bytes in this case.
> 
> Suggested-by: Morten Brørup <mb@smartsharesystems.com>
> Signed-off-by: Leyi Rong <leyi.rong@intel.com>
> ---
>  lib/eal/x86/include/rte_memcpy.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/lib/eal/x86/include/rte_memcpy.h b/lib/eal/x86/include/rte_memcpy.h
> index d4d7a5cfc8..fd151be708 100644
> --- a/lib/eal/x86/include/rte_memcpy.h
> +++ b/lib/eal/x86/include/rte_memcpy.h
> @@ -846,7 +846,7 @@ rte_memcpy_aligned(void *dst, const void *src, size_t n)
>  	}
> 
>  	/* Copy 64 bytes blocks */
> -	for (; n >= 64; n -= 64) {
> +	for (; n > 64; n -= 64) {
>  		rte_mov64((uint8_t *)dst, (const uint8_t *)src);
>  		dst = (uint8_t *)dst + 64;
>  		src = (const uint8_t *)src + 64;
> --
> 2.34.1
> 

Reviewed-by: Morten Brørup <mb@smartsharesystems.com>
  
Bruce Richardson March 29, 2023, 10:20 a.m. UTC | #2
On Wed, Mar 29, 2023 at 05:16:58PM +0800, Leyi Rong wrote:
> In rte_memcpy_aligned(), one redundant round is taken in the 64 bytes
> block copy loops if the size is a multiple of 64. So, let the catch-up
> copy the last 64 bytes in this case.
> 
> Suggested-by: Morten Brørup <mb@smartsharesystems.com>
> Signed-off-by: Leyi Rong <leyi.rong@intel.com>
> ---
Acked-by: Bruce Richardson <bruce.richardson@intel.com>

Thanks for doing the fix for this.
  
David Marchand April 4, 2023, 1:15 p.m. UTC | #3
On Wed, Mar 29, 2023 at 11:17 AM Leyi Rong <leyi.rong@intel.com> wrote:
>
> In rte_memcpy_aligned(), one redundant round is taken in the 64 bytes
> block copy loops if the size is a multiple of 64. So, let the catch-up
> copy the last 64 bytes in this case.

Fixes: f5472703c0bd ("eal: optimize aligned memcpy on x86")

>
> Suggested-by: Morten Brørup <mb@smartsharesystems.com>
> Signed-off-by: Leyi Rong <leyi.rong@intel.com>

Reviewed-by: David Marchand <david.marchand@redhat.com>
  
David Marchand June 7, 2023, 4:44 p.m. UTC | #4
On Tue, Apr 4, 2023 at 3:15 PM David Marchand <david.marchand@redhat.com> wrote:
> On Wed, Mar 29, 2023 at 11:17 AM Leyi Rong <leyi.rong@intel.com> wrote:
> >
> > In rte_memcpy_aligned(), one redundant round is taken in the 64 bytes
> > block copy loops if the size is a multiple of 64. So, let the catch-up
> > copy the last 64 bytes in this case.
>
> Fixes: f5472703c0bd ("eal: optimize aligned memcpy on x86")
> >
> > Suggested-by: Morten Brørup <mb@smartsharesystems.com>
> > Signed-off-by: Leyi Rong <leyi.rong@intel.com>
> Reviewed-by: David Marchand <david.marchand@redhat.com>

Applied, thanks.
  

Patch

diff --git a/lib/eal/x86/include/rte_memcpy.h b/lib/eal/x86/include/rte_memcpy.h
index d4d7a5cfc8..fd151be708 100644
--- a/lib/eal/x86/include/rte_memcpy.h
+++ b/lib/eal/x86/include/rte_memcpy.h
@@ -846,7 +846,7 @@ rte_memcpy_aligned(void *dst, const void *src, size_t n)
 	}
 
 	/* Copy 64 bytes blocks */
-	for (; n >= 64; n -= 64) {
+	for (; n > 64; n -= 64) {
 		rte_mov64((uint8_t *)dst, (const uint8_t *)src);
 		dst = (uint8_t *)dst + 64;
 		src = (const uint8_t *)src + 64;