[dpdk-dev] eal: force gcc to inline rte_movX function
Checks
Commit Message
From: "Chen, Junjie" <junjie.j.chen@intel.com>
Sometimes gcc does not inline the function despite keyword *inline*,
we obeserve rte_movX is not inline when doing performance profiling,
so use *always_inline* keyword to force gcc to inline the function.
Signed-off-by: Chen, Junjie <junjie.j.chen@intel.com>
---
.../common/include/arch/x86/rte_memcpy.h | 22 +++++++++++-----------
1 file changed, 11 insertions(+), 11 deletions(-)
Comments
12/04/2018 07:16, Junjie Chen:
> From: "Chen, Junjie" <junjie.j.chen@intel.com>
>
> Sometimes gcc does not inline the function despite keyword *inline*,
> we obeserve rte_movX is not inline when doing performance profiling,
> so use *always_inline* keyword to force gcc to inline the function.
>
> Signed-off-by: Chen, Junjie <junjie.j.chen@intel.com>
> ---
> .../common/include/arch/x86/rte_memcpy.h | 22 +++++++++++-----------
> 1 file changed, 11 insertions(+), 11 deletions(-)
The title should start with "eal/x86:"
Something like that:
eal/x86: force inlining of memcpy sub-functions
Bruce, Konstantin, any review of the content/optimization?
On Tue, Apr 17, 2018 at 03:22:06PM +0200, Thomas Monjalon wrote:
> 12/04/2018 07:16, Junjie Chen:
> > From: "Chen, Junjie" <junjie.j.chen@intel.com>
> >
> > Sometimes gcc does not inline the function despite keyword *inline*,
> > we obeserve rte_movX is not inline when doing performance profiling,
> > so use *always_inline* keyword to force gcc to inline the function.
> >
> > Signed-off-by: Chen, Junjie <junjie.j.chen@intel.com>
> > ---
> > .../common/include/arch/x86/rte_memcpy.h | 22 +++++++++++-----------
> > 1 file changed, 11 insertions(+), 11 deletions(-)
>
> The title should start with "eal/x86:"
> Something like that:
> eal/x86: force inlining of memcpy sub-functions
>
> Bruce, Konstantin, any review of the content/optimization?
>
No objection here.
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Thanks to point this out. I agree for the title change.
Do you want me to send v2 patch? Or you can handle it when committing?
> > > Sometimes gcc does not inline the function despite keyword *inline*,
> > > we obeserve rte_movX is not inline when doing performance profiling,
> > > so use *always_inline* keyword to force gcc to inline the function.
> > >
> > > Signed-off-by: Chen, Junjie <junjie.j.chen@intel.com>
> > > ---
> > > .../common/include/arch/x86/rte_memcpy.h | 22
> +++++++++++-----------
> > > 1 file changed, 11 insertions(+), 11 deletions(-)
> >
> > The title should start with "eal/x86:"
> > Something like that:
> > eal/x86: force inlining of memcpy sub-functions
> >
> > Bruce, Konstantin, any review of the content/optimization?
> >
> No objection here.
>
> Acked-by: Bruce Richardson <bruce.richardson@intel.com>
18/04/2018 04:43, Chen, Junjie J:
> Thanks to point this out. I agree for the title change.
>
> Do you want me to send v2 patch? Or you can handle it when committing?
>
> > > > Sometimes gcc does not inline the function despite keyword *inline*,
> > > > we obeserve rte_movX is not inline when doing performance profiling,
> > > > so use *always_inline* keyword to force gcc to inline the function.
> > > >
> > > > Signed-off-by: Chen, Junjie <junjie.j.chen@intel.com>
> > > > ---
> > > > .../common/include/arch/x86/rte_memcpy.h | 22
> > +++++++++++-----------
> > > > 1 file changed, 11 insertions(+), 11 deletions(-)
> > >
> > > The title should start with "eal/x86:"
> > > Something like that:
> > > eal/x86: force inlining of memcpy sub-functions
> > >
> > > Bruce, Konstantin, any review of the content/optimization?
> > >
> > No objection here.
> >
> > Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Applied, thanks
@@ -52,7 +52,7 @@ rte_memcpy(void *dst, const void *src, size_t n);
* Copy 16 bytes from one location to another,
* locations should not overlap.
*/
-static inline void
+static __rte_always_inline void
rte_mov16(uint8_t *dst, const uint8_t *src)
{
__m128i xmm0;
@@ -65,7 +65,7 @@ rte_mov16(uint8_t *dst, const uint8_t *src)
* Copy 32 bytes from one location to another,
* locations should not overlap.
*/
-static inline void
+static __rte_always_inline void
rte_mov32(uint8_t *dst, const uint8_t *src)
{
__m256i ymm0;
@@ -78,7 +78,7 @@ rte_mov32(uint8_t *dst, const uint8_t *src)
* Copy 64 bytes from one location to another,
* locations should not overlap.
*/
-static inline void
+static __rte_always_inline void
rte_mov64(uint8_t *dst, const uint8_t *src)
{
__m512i zmm0;
@@ -91,7 +91,7 @@ rte_mov64(uint8_t *dst, const uint8_t *src)
* Copy 128 bytes from one location to another,
* locations should not overlap.
*/
-static inline void
+static __rte_always_inline void
rte_mov128(uint8_t *dst, const uint8_t *src)
{
rte_mov64(dst + 0 * 64, src + 0 * 64);
@@ -102,7 +102,7 @@ rte_mov128(uint8_t *dst, const uint8_t *src)
* Copy 256 bytes from one location to another,
* locations should not overlap.
*/
-static inline void
+static __rte_always_inline void
rte_mov256(uint8_t *dst, const uint8_t *src)
{
rte_mov64(dst + 0 * 64, src + 0 * 64);
@@ -293,7 +293,7 @@ rte_memcpy_generic(void *dst, const void *src, size_t n)
* Copy 16 bytes from one location to another,
* locations should not overlap.
*/
-static inline void
+static __rte_always_inline void
rte_mov16(uint8_t *dst, const uint8_t *src)
{
__m128i xmm0;
@@ -306,7 +306,7 @@ rte_mov16(uint8_t *dst, const uint8_t *src)
* Copy 32 bytes from one location to another,
* locations should not overlap.
*/
-static inline void
+static __rte_always_inline void
rte_mov32(uint8_t *dst, const uint8_t *src)
{
__m256i ymm0;
@@ -319,7 +319,7 @@ rte_mov32(uint8_t *dst, const uint8_t *src)
* Copy 64 bytes from one location to another,
* locations should not overlap.
*/
-static inline void
+static __rte_always_inline void
rte_mov64(uint8_t *dst, const uint8_t *src)
{
rte_mov32((uint8_t *)dst + 0 * 32, (const uint8_t *)src + 0 * 32);
@@ -486,7 +486,7 @@ rte_memcpy_generic(void *dst, const void *src, size_t n)
* Copy 16 bytes from one location to another,
* locations should not overlap.
*/
-static inline void
+static __rte_always_inline void
rte_mov16(uint8_t *dst, const uint8_t *src)
{
__m128i xmm0;
@@ -499,7 +499,7 @@ rte_mov16(uint8_t *dst, const uint8_t *src)
* Copy 32 bytes from one location to another,
* locations should not overlap.
*/
-static inline void
+static __rte_always_inline void
rte_mov32(uint8_t *dst, const uint8_t *src)
{
rte_mov16((uint8_t *)dst + 0 * 16, (const uint8_t *)src + 0 * 16);
@@ -510,7 +510,7 @@ rte_mov32(uint8_t *dst, const uint8_t *src)
* Copy 64 bytes from one location to another,
* locations should not overlap.
*/
-static inline void
+static __rte_always_inline void
rte_mov64(uint8_t *dst, const uint8_t *src)
{
rte_mov16((uint8_t *)dst + 0 * 16, (const uint8_t *)src + 0 * 16);