mbox series

[v1,0/2] reimplement rwlock and add relevant perf test case

Message ID 1544672265-219262-1-git-send-email-joyce.kong@arm.com (mailing list archive)
Headers
Series reimplement rwlock and add relevant perf test case |

Message

Joyce Kong Dec. 13, 2018, 3:37 a.m. UTC
  v1: reimplement rwlock with __atomic builtins, and add a rwlock perf test
    on all available cores to benchmark the improvement.

We tested the patches on three arm64 platforms, ThundeX2 gained 20% performance,
Qualcomm gained 36% and the 4-Cortex-A72 Marvell MACCHIATObin gained 19.6%.
Below is the detailed test result on ThunderX2:

*** rwlock_autotest without __atomic builtins ***
Rwlock Perf Test on 128 cores...
Core [0] count = 281
Core [1] count = 252
Core [2] count = 290
Core [3] count = 259
Core [4] count = 287
...
Core [209] count = 3
Core [210] count = 31
Core [211] count = 120
Total count = 18537

*** rwlock_autotest with __atomic builtins ***
Rwlock Perf Test on 128 cores...
Core [0] count = 346
Core [1] count = 355
Core [2] count = 259
Core [3] count = 285
Core [4] count = 320
...
Core [209] count = 2
Core [210] count = 23
Core [211] count = 63
Total count = 22194

Gavin Hu (1):
  rwlock: reimplement with __atomic builtins

Joyce Kong (1):
  test/rwlock: add perf test case

 lib/librte_eal/common/include/generic/rte_rwlock.h | 16 ++---
 test/test/test_rwlock.c                            | 71 ++++++++++++++++++++++
 2 files changed, 79 insertions(+), 8 deletions(-)
  

Comments

Stephen Hemminger Dec. 13, 2018, 5:27 a.m. UTC | #1
On Thu, 13 Dec 2018 11:37:43 +0800
Joyce Kong <joyce.kong@arm.com> wrote:

> v1: reimplement rwlock with __atomic builtins, and add a rwlock perf test
>     on all available cores to benchmark the improvement.
> 
> We tested the patches on three arm64 platforms, ThundeX2 gained 20% performance,
> Qualcomm gained 36% and the 4-Cortex-A72 Marvell MACCHIATObin gained 19.6%.
> Below is the detailed test result on ThunderX2:
> 
> *** rwlock_autotest without __atomic builtins ***
> Rwlock Perf Test on 128 cores...
> Core [0] count = 281
> Core [1] count = 252
> Core [2] count = 290
> Core [3] count = 259
> Core [4] count = 287
> ...
> Core [209] count = 3
> Core [210] count = 31
> Core [211] count = 120
> Total count = 18537
> 
> *** rwlock_autotest with __atomic builtins ***
> Rwlock Perf Test on 128 cores...
> Core [0] count = 346
> Core [1] count = 355
> Core [2] count = 259
> Core [3] count = 285
> Core [4] count = 320
> ...
> Core [209] count = 2
> Core [210] count = 23
> Core [211] count = 63
> Total count = 22194
> 
> Gavin Hu (1):
>   rwlock: reimplement with __atomic builtins
> 
> Joyce Kong (1):
>   test/rwlock: add perf test case
> 
>  lib/librte_eal/common/include/generic/rte_rwlock.h | 16 ++---
>  test/test/test_rwlock.c                            | 71 ++++++++++++++++++++++
>  2 files changed, 79 insertions(+), 8 deletions(-)
> 

Did you consider using a better algorithm not just better primitives.
See https://locklessinc.com/articles/locks/ for a more complete discussion
of alternatives like ticket locks.
  
Gavin Hu Dec. 14, 2018, 1:30 a.m. UTC | #2
Hi Stephen,

Thanks for your comment and sharing the link!
We are looking into it and it may take more time for performance profiling.

Best Regards,
Gavin

> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Thursday, December 13, 2018 1:27 PM
> To: Joyce Kong (Arm Technology China) <Joyce.Kong@arm.com>
> Cc: dev@dpdk.org; nd <nd@arm.com>; thomas@monjalon.net;
> jerin.jacob@caviumnetworks.com; hemant.agrawal@nxp.com; Honnappa
> Nagarahalli <Honnappa.Nagarahalli@arm.com>; Gavin Hu (Arm Technology
> China) <Gavin.Hu@arm.com>
> Subject: Re: [dpdk-dev] [PATCH v1 0/2] reimplement rwlock and add
> relevant perf test case
> 
> On Thu, 13 Dec 2018 11:37:43 +0800
> Joyce Kong <joyce.kong@arm.com> wrote:
> 
> > v1: reimplement rwlock with __atomic builtins, and add a rwlock perf test
> >     on all available cores to benchmark the improvement.
> >
> > We tested the patches on three arm64 platforms, ThundeX2 gained 20%
> > performance, Qualcomm gained 36% and the 4-Cortex-A72 Marvell
> MACCHIATObin gained 19.6%.
> > Below is the detailed test result on ThunderX2:
> >
> > *** rwlock_autotest without __atomic builtins *** Rwlock Perf Test on
> > 128 cores...
> > Core [0] count = 281
> > Core [1] count = 252
> > Core [2] count = 290
> > Core [3] count = 259
> > Core [4] count = 287
> > ...
> > Core [209] count = 3
> > Core [210] count = 31
> > Core [211] count = 120
> > Total count = 18537
> >
> > *** rwlock_autotest with __atomic builtins *** Rwlock Perf Test on 128
> > cores...
> > Core [0] count = 346
> > Core [1] count = 355
> > Core [2] count = 259
> > Core [3] count = 285
> > Core [4] count = 320
> > ...
> > Core [209] count = 2
> > Core [210] count = 23
> > Core [211] count = 63
> > Total count = 22194
> >
> > Gavin Hu (1):
> >   rwlock: reimplement with __atomic builtins
> >
> > Joyce Kong (1):
> >   test/rwlock: add perf test case
> >
> >  lib/librte_eal/common/include/generic/rte_rwlock.h | 16 ++---
> >  test/test/test_rwlock.c                            | 71 ++++++++++++++++++++++
> >  2 files changed, 79 insertions(+), 8 deletions(-)
> >
> 
> Did you consider using a better algorithm not just better primitives.
> See https://locklessinc.com/articles/locks/ for a more complete discussion of
> alternatives like ticket locks.
  
Honnappa Nagarahalli Dec. 17, 2018, 5:16 a.m. UTC | #3
Adding other platform maintainers as it affects all platforms.

> -----Original Message-----
> From: Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>
> Sent: Thursday, December 13, 2018 7:30 PM
> To: Stephen Hemminger <stephen@networkplumber.org>; Joyce Kong (Arm
> Technology China) <Joyce.Kong@arm.com>
> Cc: dev@dpdk.org; nd <nd@arm.com>; thomas@monjalon.net;
> jerin.jacob@caviumnetworks.com; hemant.agrawal@nxp.com; Honnappa
> Nagarahalli <Honnappa.Nagarahalli@arm.com>
> Subject: RE: [dpdk-dev] [PATCH v1 0/2] reimplement rwlock and add relevant
> perf test case
> 
> Hi Stephen,
> 
> Thanks for your comment and sharing the link!
> We are looking into it and it may take more time for performance profiling.
> 
> Best Regards,
> Gavin
> 
> > -----Original Message-----
> > From: Stephen Hemminger <stephen@networkplumber.org>
> > Sent: Thursday, December 13, 2018 1:27 PM
> > To: Joyce Kong (Arm Technology China) <Joyce.Kong@arm.com>
> > Cc: dev@dpdk.org; nd <nd@arm.com>; thomas@monjalon.net;
> > jerin.jacob@caviumnetworks.com; hemant.agrawal@nxp.com; Honnappa
> > Nagarahalli <Honnappa.Nagarahalli@arm.com>; Gavin Hu (Arm Technology
> > China) <Gavin.Hu@arm.com>
> > Subject: Re: [dpdk-dev] [PATCH v1 0/2] reimplement rwlock and add
> > relevant perf test case
> >
> > On Thu, 13 Dec 2018 11:37:43 +0800
> > Joyce Kong <joyce.kong@arm.com> wrote:
> >
> > > v1: reimplement rwlock with __atomic builtins, and add a rwlock perf test
> > >     on all available cores to benchmark the improvement.
> > >
> > > We tested the patches on three arm64 platforms, ThundeX2 gained 20%
> > > performance, Qualcomm gained 36% and the 4-Cortex-A72 Marvell
> > MACCHIATObin gained 19.6%.
> > > Below is the detailed test result on ThunderX2:
> > >
> > > *** rwlock_autotest without __atomic builtins *** Rwlock Perf Test
> > > on
> > > 128 cores...
> > > Core [0] count = 281
> > > Core [1] count = 252
> > > Core [2] count = 290
> > > Core [3] count = 259
> > > Core [4] count = 287
> > > ...
> > > Core [209] count = 3
> > > Core [210] count = 31
> > > Core [211] count = 120
> > > Total count = 18537
> > >
> > > *** rwlock_autotest with __atomic builtins *** Rwlock Perf Test on
> > > 128 cores...
> > > Core [0] count = 346
> > > Core [1] count = 355
> > > Core [2] count = 259
> > > Core [3] count = 285
> > > Core [4] count = 320
> > > ...
> > > Core [209] count = 2
> > > Core [210] count = 23
> > > Core [211] count = 63
> > > Total count = 22194
> > >
> > > Gavin Hu (1):
> > >   rwlock: reimplement with __atomic builtins
> > >
> > > Joyce Kong (1):
> > >   test/rwlock: add perf test case
> > >
> > >  lib/librte_eal/common/include/generic/rte_rwlock.h | 16 ++---
> > >  test/test/test_rwlock.c                            | 71 ++++++++++++++++++++++
> > >  2 files changed, 79 insertions(+), 8 deletions(-)
> > >
> >
> > Did you consider using a better algorithm not just better primitives.
> > See https://locklessinc.com/articles/locks/ for a more complete
> > discussion of alternatives like ticket locks.
  
Thomas Monjalon Dec. 19, 2018, 8:37 p.m. UTC | #4
17/12/2018 06:16, Honnappa Nagarahalli:
> Adding other platform maintainers as it affects all platforms.

There is no other comment. I am not sure what to do with this patch.


> From: Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>
> > 
> > Hi Stephen,
> > 
> > Thanks for your comment and sharing the link!
> > We are looking into it and it may take more time for performance profiling.
> > 
> > Best Regards,
> > Gavin
> > 
> > From: Stephen Hemminger <stephen@networkplumber.org>
> > > On Thu, 13 Dec 2018 11:37:43 +0800
> > > Joyce Kong <joyce.kong@arm.com> wrote:
> > >
> > > > v1: reimplement rwlock with __atomic builtins, and add a rwlock perf test
> > > >     on all available cores to benchmark the improvement.
> > > >
> > > > We tested the patches on three arm64 platforms, ThundeX2 gained 20%
> > > > performance, Qualcomm gained 36% and the 4-Cortex-A72 Marvell
> > > MACCHIATObin gained 19.6%.
> > > > Below is the detailed test result on ThunderX2:
> > > >
> > > > *** rwlock_autotest without __atomic builtins *** Rwlock Perf Test
> > > > on
> > > > 128 cores...
> > > > Core [0] count = 281
> > > > Core [1] count = 252
> > > > Core [2] count = 290
> > > > Core [3] count = 259
> > > > Core [4] count = 287
> > > > ...
> > > > Core [209] count = 3
> > > > Core [210] count = 31
> > > > Core [211] count = 120
> > > > Total count = 18537
> > > >
> > > > *** rwlock_autotest with __atomic builtins *** Rwlock Perf Test on
> > > > 128 cores...
> > > > Core [0] count = 346
> > > > Core [1] count = 355
> > > > Core [2] count = 259
> > > > Core [3] count = 285
> > > > Core [4] count = 320
> > > > ...
> > > > Core [209] count = 2
> > > > Core [210] count = 23
> > > > Core [211] count = 63
> > > > Total count = 22194
> > > >
> > > > Gavin Hu (1):
> > > >   rwlock: reimplement with __atomic builtins
> > > >
> > > > Joyce Kong (1):
> > > >   test/rwlock: add perf test case
> > > >
> > > >  lib/librte_eal/common/include/generic/rte_rwlock.h | 16 ++---
> > > >  test/test/test_rwlock.c                            | 71 ++++++++++++++++++++++
> > > >  2 files changed, 79 insertions(+), 8 deletions(-)
> > > >
> > >
> > > Did you consider using a better algorithm not just better primitives.
> > > See https://locklessinc.com/articles/locks/ for a more complete
> > > discussion of alternatives like ticket locks.