[dpdk-dev,v3] mempool: replace c memcpy code semantics with optimized rte_memcpy

Message ID 1467288996-6109-1-git-send-email-jerin.jacob@caviumnetworks.com (mailing list archive)
State Accepted, archived
Delegated to: Yuanhan Liu

Commit Message

Jerin Jacob June 30, 2016, 12:16 p.m. UTC
  Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
---
v1..v2
Corrected the git commit message (s/mbuf/mempool/g)
v2..v3
re-base to master
---
---
 lib/librte_mempool/rte_mempool.h | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)
  

Comments

Thomas Monjalon June 30, 2016, 5:28 p.m. UTC | #1
2016-06-30 17:46, Jerin Jacob:
> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> Acked-by: Olivier Matz <olivier.matz@6wind.com>

Applied, thanks
  
Ferruh Yigit July 5, 2016, 8:43 a.m. UTC | #2
On 6/30/2016 6:28 PM, Thomas Monjalon wrote:
> 2016-06-30 17:46, Jerin Jacob:
>> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
>> Acked-by: Olivier Matz <olivier.matz@6wind.com>
> 
> Applied, thanks
> 

Hi Jerin,

This commit causes a compilation error on target i686-native-linuxapp-gcc
with gcc6.

Compilation error is:
== Build drivers/net/virtio
  CC virtio_rxtx_simple.o
In file included from /root/development/dpdk/build/include/rte_mempool.h:77:0,
                 from /root/development/dpdk/drivers/net/virtio/virtio_rxtx_simple.c:46:
/root/development/dpdk/drivers/net/virtio/virtio_rxtx_simple.c: In function ‘virtio_xmit_pkts_simple’:
/root/development/dpdk/build/include/rte_memcpy.h:551:2: error: array subscript is above array bounds [-Werror=array-bounds]
  rte_mov16((uint8_t *)dst + 1 * 16, (const uint8_t *)src + 1 * 16);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/root/development/dpdk/build/include/rte_memcpy.h:552:2: error: array subscript is above array bounds [-Werror=array-bounds]
  rte_mov16((uint8_t *)dst + 2 * 16, (const uint8_t *)src + 2 * 16);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/root/development/dpdk/build/include/rte_memcpy.h:553:2: error: array subscript is above array bounds [-Werror=array-bounds]
  rte_mov16((uint8_t *)dst + 3 * 16, (const uint8_t *)src + 3 * 16);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/root/development/dpdk/build/include/rte_memcpy.h:554:2: error: array subscript is above array bounds [-Werror=array-bounds]
  rte_mov16((uint8_t *)dst + 4 * 16, (const uint8_t *)src + 4 * 16);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
...
...

Thanks,
ferruh
  
Yuanhan Liu July 5, 2016, 11:32 a.m. UTC | #3
On Tue, Jul 05, 2016 at 09:43:03AM +0100, Ferruh Yigit wrote:
> Hi Jerin,
> 
> This commit causes a compilation error on target i686-native-linuxapp-gcc
> with gcc6.

Besides that, I'm more curious to know whether you have actually seen
any performance boost.

	--yliu
  
Jerin Jacob July 5, 2016, 1:13 p.m. UTC | #4
On Tue, Jul 05, 2016 at 07:32:46PM +0800, Yuanhan Liu wrote:
> Besides that, I'm more curious to know whether you have actually seen
> any performance boost.

Let me first address your curiosity:
http://dpdk.org/dev/patchwork/patch/12993/ (check the second comment)
http://dpdk.org/ml/archives/dev/2016-June/042701.html

Ferruh,

I have tested on an x86 machine with gcc 6.1. I couldn't see any issues
with the i686-native-linuxapp-gcc target.

Steps followed to create the gcc 6.1 toolchain:
https://sahas.ra.naman.ms/2016/05/31/building-gcc6-1-on-fedora-23/
(removed --disable-multilib to have support for -m32)

➜ [dpdk-master] $ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/opt/gcc-6.1.0/libexec/gcc/x86_64-pc-linux-gnu/6.1.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../gcc-6.1.0/configure --prefix=/opt/gcc-6.1.0
--enable-languages=c,c++ --enable-libmudflap --with-system-zlib
Thread model: posix
gcc version 6.1.0 (GCC)

Moreover, this looks like an issue in the x86 rte_memcpy implementation.

Jerin
  
Yuanhan Liu July 5, 2016, 1:42 p.m. UTC | #5
On Tue, Jul 05, 2016 at 06:43:57PM +0530, Jerin Jacob wrote:
> Let me first address your curiosity:
> http://dpdk.org/dev/patchwork/patch/12993/ (check the second comment)
> http://dpdk.org/ml/archives/dev/2016-June/042701.html

Thanks. It would be good to include that info in the commit log next
time.

	--yliu
  
Ferruh Yigit July 5, 2016, 2:09 p.m. UTC | #6
On 7/5/2016 2:13 PM, Jerin Jacob wrote:
> Ferruh,

Hi Jerin,

> 
> I have tested on an x86 machine with gcc 6.1. I couldn't see any issues
> with the i686-native-linuxapp-gcc target.
Thanks for investigating the issue.

> 
> Steps followed to create the gcc 6.1 toolchain:
> https://sahas.ra.naman.ms/2016/05/31/building-gcc6-1-on-fedora-23/
> (removed --disable-multilib to have support for -m32)
> 
> ➜ [dpdk-master] $ gcc -v
> Using built-in specs.
> COLLECT_GCC=gcc
> COLLECT_LTO_WRAPPER=/opt/gcc-6.1.0/libexec/gcc/x86_64-pc-linux-gnu/6.1.0/lto-wrapper
> Target: x86_64-pc-linux-gnu
> Configured with: ../gcc-6.1.0/configure --prefix=/opt/gcc-6.1.0
> --enable-languages=c,c++ --enable-libmudflap --with-system-zlib
> Thread model: posix
> gcc version 6.1.0 (GCC)
I am using Fedora 24, which has gcc6 (6.1.1) as the default.

> 
> Moreover, this looks like an issue in the x86 rte_memcpy implementation.
You are right, but i686 compilation starts failing with this commit,
and reverting it on the current HEAD solves the compilation problem.
I am not really clear about the reason for the compilation error.

Thanks,
ferruh
  
Ferruh Yigit July 6, 2016, 4:21 p.m. UTC | #7
On 7/5/2016 3:09 PM, Ferruh Yigit wrote:
>> Moreover, this looks like an issue in the x86 rte_memcpy implementation.
> You are right, but i686 compilation starts failing with this commit,
> and reverting it on the current HEAD solves the compilation problem.
> I am not really clear about the reason for the compilation error.

The compile error occurs because the compiler is now very smart and, at the
same time, not smart enough.

The call stack is as follows:

virtio_xmit_pkts_simple
  virtio_xmit_cleanup
    rte_mempool_put_bulk
      rte_mempool_generic_put
        __mempool_generic_put
          rte_memcpy

The array used as the source buffer in virtio_xmit_cleanup (free) is a
pointer array with 32 elements; on 32-bit this makes 128 bytes.

In the rte_memcpy() implementation, there is a piece of code like the following:
if (size > 256) {
    rte_mov128(...);
    rte_mov128(...); <--- [1]
    ....
}

The compiler traces the array all the way through the call stack, knows that
the size of the array is 128 bytes, and generates a warning at [1] above, which
tries to access beyond byte 128.
But unfortunately it ignores the "(size > 256)" check.

On 64-bit, this warning is not generated because the array size becomes 256
bytes.
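
The two sizes quoted above follow directly from the pointer width. A minimal sketch of the arithmetic, where FREE_TABLE_ENTRIES, table_bytes and native_table_bytes are hypothetical names used only for illustration (they are not DPDK identifiers):

```c
#include <stddef.h>

/* Element count of the pointer table described above (assumed name). */
#define FREE_TABLE_ENTRIES 32

/* Bytes occupied by `entries` object pointers that are `ptr_bytes` wide. */
static inline size_t table_bytes(size_t entries, size_t ptr_bytes)
{
	return entries * ptr_bytes;
}

/* Size of such a table on the machine this file is compiled for. */
static inline size_t native_table_bytes(void)
{
	void *table[FREE_TABLE_ENTRIES];

	return sizeof(table);
}
```

With 4-byte pointers table_bytes(32, 4) gives 128, and with 8-byte pointers table_bytes(32, 8) gives 256, which matches the 32-bit and 64-bit cases described above.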

So this warning is a false positive. I am working on it, but if anybody
has a suggestion for preventing the warning, it is quite welcome. At worst I
will disable this compiler warning for the file.
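
As a sketch of that fallback, the warning could also be silenced around a single function with GCC diagnostic pragmas rather than for a whole file. copy_objs is a hypothetical stand-in, not a DPDK function, and memcpy stands in for rte_memcpy. Whether a scoped pragma is honored for a warning raised after inlining can depend on the compiler version, so this is an illustration and not a tested fix for the build failure:

```c
#include <stddef.h>
#include <string.h>

#if defined(__GNUC__)
#pragma GCC diagnostic push
/* Assumption: the false positive is reported against code in this region. */
#pragma GCC diagnostic ignored "-Warray-bounds"
#endif

/* Hypothetical helper: copy n object pointers from src to dst. */
static inline void copy_objs(void **dst, void * const *src, size_t n)
{
	memcpy(dst, src, n * sizeof(void *));
}

#if defined(__GNUC__)
#pragma GCC diagnostic pop
#endif
```

The push/pop pair restores the previous diagnostic state, so code after the region is still checked with -Warray-bounds.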

Thanks,
ferruh
  
Ferruh Yigit July 7, 2016, 1:51 p.m. UTC | #8
On 7/6/2016 5:21 PM, Ferruh Yigit wrote:
> So this warning is a false positive. I am working on it, but if anybody
> has a suggestion for preventing the warning, it is quite welcome. At worst I
> will disable this compiler warning for the file.

I have sent a patch:
http://dpdk.org/ml/archives/dev/2016-July/043492.html

Giving the compiler a hint that the variable "size" is related to the size of
the source buffer fixes the compiler warning.
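
One way such a hint could look is to clamp the element count to the capacity of the fixed-size table before copying. This is a sketch under stated assumptions and is not taken from the linked patch: FREE_TABLE_ENTRIES and copy_free_table are hypothetical names, and memcpy stands in for rte_memcpy.

```c
#include <stddef.h>
#include <string.h>

/* Capacity of the fixed-size pointer table (assumed name). */
#define FREE_TABLE_ENTRIES 32

/*
 * Copy up to nb_free pointers out of a table that holds at most
 * FREE_TABLE_ENTRIES entries. Returns the number of pointers copied.
 */
static inline size_t copy_free_table(void **dst, void * const *src,
				     size_t nb_free)
{
	/* Tie the count to the table capacity so the copy size is bounded. */
	if (nb_free > FREE_TABLE_ENTRIES)
		nb_free = FREE_TABLE_ENTRIES;

	memcpy(dst, src, nb_free * sizeof(void *));
	return nb_free;
}
```

The clamp adds one compare to the path, which is why the performance effect on a fast path is worth checking.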

- This update is in the virtio fast path; can you please review it for its
performance effect?

- Isn't this surprisingly smart of the compiler, or am I missing something? :)

Thanks,
ferruh
  

Patch

diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index b2a5197..c8a81e2 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -74,6 +74,7 @@ 
 #include <rte_memory.h>
 #include <rte_branch_prediction.h>
 #include <rte_ring.h>
+#include <rte_memcpy.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -1028,7 +1029,6 @@  static inline void __attribute__((always_inline))
 __mempool_generic_put(struct rte_mempool *mp, void * const *obj_table,
 		      unsigned n, struct rte_mempool_cache *cache, int flags)
 {
-	uint32_t index;
 	void **cache_objs;
 
 	/* increment stat now, adding in mempool always success */
@@ -1052,8 +1052,7 @@  __mempool_generic_put(struct rte_mempool *mp, void * const *obj_table,
 	 */
 
 	/* Add elements back into the cache */
-	for (index = 0; index < n; ++index, obj_table++)
-		cache_objs[index] = *obj_table;
+	rte_memcpy(&cache_objs[0], obj_table, sizeof(void *) * n);
 
 	cache->len += n;
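
For illustration only, the two forms touched by this patch can be compared in a standalone sketch. put_objs_loop and put_objs_bulk are hypothetical wrappers, and the standard memcpy stands in for rte_memcpy so that the example builds without DPDK headers:

```c
#include <stddef.h>
#include <string.h>

/* Element-by-element copy, as in the lines removed by the patch. */
static inline void put_objs_loop(void **cache_objs, void * const *obj_table,
				 unsigned n)
{
	unsigned index;

	for (index = 0; index < n; ++index, obj_table++)
		cache_objs[index] = *obj_table;
}

/* Single bulk copy, as in the added line (memcpy replaces rte_memcpy). */
static inline void put_objs_bulk(void **cache_objs, void * const *obj_table,
				 unsigned n)
{
	memcpy(&cache_objs[0], obj_table, sizeof(void *) * n);
}
```

Both helpers leave the same n pointers in cache_objs; the bulk form only changes how the copy is carried out.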