[dpdk-dev] eal: force gcc to inline rte_movX function

Message ID 20180412051636.240746-1-junjie.j.chen@intel.com (mailing list archive)
State Accepted, archived
Delegated to: Thomas Monjalon
Headers

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation fail Compilation issues

Commit Message

junjie.j.chen@intel.com April 12, 2018, 5:16 a.m. UTC
  From: "Chen, Junjie" <junjie.j.chen@intel.com>

Sometimes gcc does not inline a function despite the *inline* keyword;
we observed that rte_movX was not inlined when doing performance
profiling, so use the *always_inline* attribute to force gcc to inline
the function.

Signed-off-by: Chen, Junjie <junjie.j.chen@intel.com>
---
 .../common/include/arch/x86/rte_memcpy.h           | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)
  

Comments

Thomas Monjalon April 17, 2018, 1:22 p.m. UTC | #1
12/04/2018 07:16, Junjie Chen:
> From: "Chen, Junjie" <junjie.j.chen@intel.com>
> 
> Sometimes gcc does not inline a function despite the *inline* keyword;
> we observed that rte_movX was not inlined when doing performance
> profiling, so use the *always_inline* attribute to force gcc to inline
> the function.
> 
> Signed-off-by: Chen, Junjie <junjie.j.chen@intel.com>
> ---
>  .../common/include/arch/x86/rte_memcpy.h           | 22 +++++++++++-----------
>  1 file changed, 11 insertions(+), 11 deletions(-)

The title should start with "eal/x86:"
Something like that:
	eal/x86: force inlining of memcpy sub-functions

Bruce, Konstantin, any review of the content/optimization?
  
Bruce Richardson April 17, 2018, 2:57 p.m. UTC | #2
On Tue, Apr 17, 2018 at 03:22:06PM +0200, Thomas Monjalon wrote:
> 12/04/2018 07:16, Junjie Chen:
> > From: "Chen, Junjie" <junjie.j.chen@intel.com>
> > 
> > Sometimes gcc does not inline a function despite the *inline* keyword;
> > we observed that rte_movX was not inlined when doing performance
> > profiling, so use the *always_inline* attribute to force gcc to inline
> > the function.
> > 
> > Signed-off-by: Chen, Junjie <junjie.j.chen@intel.com>
> > ---
> >  .../common/include/arch/x86/rte_memcpy.h           | 22 +++++++++++-----------
> >  1 file changed, 11 insertions(+), 11 deletions(-)
> 
> The title should start with "eal/x86:"
> Something like that:
> 	eal/x86: force inlining of memcpy sub-functions
> 
> Bruce, Konstantin, any review of the content/optimization?
> 
No objection here.

Acked-by: Bruce Richardson <bruce.richardson@intel.com>
  
junjie.j.chen@intel.com April 18, 2018, 2:43 a.m. UTC | #3
Thanks for pointing this out. I agree with the title change.

Do you want me to send a v2 patch, or can you handle it when committing?

> > > Sometimes gcc does not inline a function despite the *inline* keyword;
> > > we observed that rte_movX was not inlined when doing performance
> > > profiling, so use the *always_inline* attribute to force gcc to inline
> > > the function.
> > >
> > > Signed-off-by: Chen, Junjie <junjie.j.chen@intel.com>
> > > ---
> > >  .../common/include/arch/x86/rte_memcpy.h           | 22
> +++++++++++-----------
> > >  1 file changed, 11 insertions(+), 11 deletions(-)
> >
> > The title should start with "eal/x86:"
> > Something like that:
> > 	eal/x86: force inlining of memcpy sub-functions
> >
> > Bruce, Konstantin, any review of the content/optimization?
> >
> No objection here.
> 
> Acked-by: Bruce Richardson <bruce.richardson@intel.com>
  
Thomas Monjalon April 18, 2018, 7:25 a.m. UTC | #4
18/04/2018 04:43, Chen, Junjie J:
> Thanks for pointing this out. I agree with the title change.
> 
> Do you want me to send a v2 patch, or can you handle it when committing?
> 
> > > > Sometimes gcc does not inline a function despite the *inline* keyword;
> > > > we observed that rte_movX was not inlined when doing performance
> > > > profiling, so use the *always_inline* attribute to force gcc to inline
> > > > the function.
> > > >
> > > > Signed-off-by: Chen, Junjie <junjie.j.chen@intel.com>
> > > > ---
> > > >  .../common/include/arch/x86/rte_memcpy.h           | 22
> > +++++++++++-----------
> > > >  1 file changed, 11 insertions(+), 11 deletions(-)
> > >
> > > The title should start with "eal/x86:"
> > > Something like that:
> > > 	eal/x86: force inlining of memcpy sub-functions
> > >
> > > Bruce, Konstantin, any review of the content/optimization?
> > >
> > No objection here.
> > 
> > Acked-by: Bruce Richardson <bruce.richardson@intel.com>

Applied, thanks
  

Patch

diff --git a/lib/librte_eal/common/include/arch/x86/rte_memcpy.h b/lib/librte_eal/common/include/arch/x86/rte_memcpy.h
index cc140ecca..5ead68ab2 100644
--- a/lib/librte_eal/common/include/arch/x86/rte_memcpy.h
+++ b/lib/librte_eal/common/include/arch/x86/rte_memcpy.h
@@ -52,7 +52,7 @@  rte_memcpy(void *dst, const void *src, size_t n);
  * Copy 16 bytes from one location to another,
  * locations should not overlap.
  */
-static inline void
+static __rte_always_inline void
 rte_mov16(uint8_t *dst, const uint8_t *src)
 {
 	__m128i xmm0;
@@ -65,7 +65,7 @@  rte_mov16(uint8_t *dst, const uint8_t *src)
  * Copy 32 bytes from one location to another,
  * locations should not overlap.
  */
-static inline void
+static __rte_always_inline void
 rte_mov32(uint8_t *dst, const uint8_t *src)
 {
 	__m256i ymm0;
@@ -78,7 +78,7 @@  rte_mov32(uint8_t *dst, const uint8_t *src)
  * Copy 64 bytes from one location to another,
  * locations should not overlap.
  */
-static inline void
+static __rte_always_inline void
 rte_mov64(uint8_t *dst, const uint8_t *src)
 {
 	__m512i zmm0;
@@ -91,7 +91,7 @@  rte_mov64(uint8_t *dst, const uint8_t *src)
  * Copy 128 bytes from one location to another,
  * locations should not overlap.
  */
-static inline void
+static __rte_always_inline void
 rte_mov128(uint8_t *dst, const uint8_t *src)
 {
 	rte_mov64(dst + 0 * 64, src + 0 * 64);
@@ -102,7 +102,7 @@  rte_mov128(uint8_t *dst, const uint8_t *src)
  * Copy 256 bytes from one location to another,
  * locations should not overlap.
  */
-static inline void
+static __rte_always_inline void
 rte_mov256(uint8_t *dst, const uint8_t *src)
 {
 	rte_mov64(dst + 0 * 64, src + 0 * 64);
@@ -293,7 +293,7 @@  rte_memcpy_generic(void *dst, const void *src, size_t n)
  * Copy 16 bytes from one location to another,
  * locations should not overlap.
  */
-static inline void
+static __rte_always_inline void
 rte_mov16(uint8_t *dst, const uint8_t *src)
 {
 	__m128i xmm0;
@@ -306,7 +306,7 @@  rte_mov16(uint8_t *dst, const uint8_t *src)
  * Copy 32 bytes from one location to another,
  * locations should not overlap.
  */
-static inline void
+static __rte_always_inline void
 rte_mov32(uint8_t *dst, const uint8_t *src)
 {
 	__m256i ymm0;
@@ -319,7 +319,7 @@  rte_mov32(uint8_t *dst, const uint8_t *src)
  * Copy 64 bytes from one location to another,
  * locations should not overlap.
  */
-static inline void
+static __rte_always_inline void
 rte_mov64(uint8_t *dst, const uint8_t *src)
 {
 	rte_mov32((uint8_t *)dst + 0 * 32, (const uint8_t *)src + 0 * 32);
@@ -486,7 +486,7 @@  rte_memcpy_generic(void *dst, const void *src, size_t n)
  * Copy 16 bytes from one location to another,
  * locations should not overlap.
  */
-static inline void
+static __rte_always_inline void
 rte_mov16(uint8_t *dst, const uint8_t *src)
 {
 	__m128i xmm0;
@@ -499,7 +499,7 @@  rte_mov16(uint8_t *dst, const uint8_t *src)
  * Copy 32 bytes from one location to another,
  * locations should not overlap.
  */
-static inline void
+static __rte_always_inline void
 rte_mov32(uint8_t *dst, const uint8_t *src)
 {
 	rte_mov16((uint8_t *)dst + 0 * 16, (const uint8_t *)src + 0 * 16);
@@ -510,7 +510,7 @@  rte_mov32(uint8_t *dst, const uint8_t *src)
  * Copy 64 bytes from one location to another,
  * locations should not overlap.
  */
-static inline void
+static __rte_always_inline void
 rte_mov64(uint8_t *dst, const uint8_t *src)
 {
 	rte_mov16((uint8_t *)dst + 0 * 16, (const uint8_t *)src + 0 * 16);