Message ID | 1544672265-219262-1-git-send-email-joyce.kong@arm.com (mailing list archive) |
---|---|
Headers |
Return-Path: <dev-bounces@dpdk.org> X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 2437B1B43E; Thu, 13 Dec 2018 04:37:56 +0100 (CET) Received: from foss.arm.com (usa-sjc-mx-foss1.foss.arm.com [217.140.101.70]) by dpdk.org (Postfix) with ESMTP id 6C1351B432 for <dev@dpdk.org>; Thu, 13 Dec 2018 04:37:55 +0100 (CET) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.72.51.249]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id EF8DEA78; Wed, 12 Dec 2018 19:37:53 -0800 (PST) Received: from net-arm-thunderx2.shanghai.arm.com (net-arm-thunderx2.shanghai.arm.com [10.169.40.71]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id A46AB3F6A8; Wed, 12 Dec 2018 19:37:52 -0800 (PST) From: Joyce Kong <joyce.kong@arm.com> To: dev@dpdk.org Cc: nd@arm.com, thomas@monjalon.net, jerin.jacob@caviumnetworks.com, hemant.agrawal@nxp.com, honnappa.nagarahalli@arm.com, gavin.hu@arm.com, joyce.kong@arm.com Date: Thu, 13 Dec 2018 11:37:43 +0800 Message-Id: <1544672265-219262-1-git-send-email-joyce.kong@arm.com> X-Mailer: git-send-email 2.7.4 Subject: [dpdk-dev] [PATCH v1 0/2] reimplement rwlock and add relevant perf test case X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions <dev.dpdk.org> List-Unsubscribe: <https://mails.dpdk.org/options/dev>, <mailto:dev-request@dpdk.org?subject=unsubscribe> List-Archive: <http://mails.dpdk.org/archives/dev/> List-Post: <mailto:dev@dpdk.org> List-Help: <mailto:dev-request@dpdk.org?subject=help> List-Subscribe: <https://mails.dpdk.org/listinfo/dev>, <mailto:dev-request@dpdk.org?subject=subscribe> Errors-To: dev-bounces@dpdk.org Sender: "dev" <dev-bounces@dpdk.org> |
Series |
reimplement rwlock and add relevant perf test case
|
|
Message
Joyce Kong
Dec. 13, 2018, 3:37 a.m. UTC
v1: reimplement rwlock with __atomic builtins, and add a rwlock perf test on all available cores to benchmark the improvement. We tested the patches on three arm64 platforms, ThundeX2 gained 20% performance, Qualcomm gained 36% and the 4-Cortex-A72 Marvell MACCHIATObin gained 19.6%. Below is the detailed test result on ThunderX2: *** rwlock_autotest without __atomic builtins *** Rwlock Perf Test on 128 cores... Core [0] count = 281 Core [1] count = 252 Core [2] count = 290 Core [3] count = 259 Core [4] count = 287 ... Core [209] count = 3 Core [210] count = 31 Core [211] count = 120 Total count = 18537 *** rwlock_autotest with __atomic builtins *** Rwlock Perf Test on 128 cores... Core [0] count = 346 Core [1] count = 355 Core [2] count = 259 Core [3] count = 285 Core [4] count = 320 ... Core [209] count = 2 Core [210] count = 23 Core [211] count = 63 Total count = 22194 Gavin Hu (1): rwlock: reimplement with __atomic builtins Joyce Kong (1): test/rwlock: add perf test case lib/librte_eal/common/include/generic/rte_rwlock.h | 16 ++--- test/test/test_rwlock.c | 71 ++++++++++++++++++++++ 2 files changed, 79 insertions(+), 8 deletions(-)
Comments
On Thu, 13 Dec 2018 11:37:43 +0800 Joyce Kong <joyce.kong@arm.com> wrote: > v1: reimplement rwlock with __atomic builtins, and add a rwlock perf test > on all available cores to benchmark the improvement. > > We tested the patches on three arm64 platforms, ThundeX2 gained 20% performance, > Qualcomm gained 36% and the 4-Cortex-A72 Marvell MACCHIATObin gained 19.6%. > Below is the detailed test result on ThunderX2: > > *** rwlock_autotest without __atomic builtins *** > Rwlock Perf Test on 128 cores... > Core [0] count = 281 > Core [1] count = 252 > Core [2] count = 290 > Core [3] count = 259 > Core [4] count = 287 > ... > Core [209] count = 3 > Core [210] count = 31 > Core [211] count = 120 > Total count = 18537 > > *** rwlock_autotest with __atomic builtins *** > Rwlock Perf Test on 128 cores... > Core [0] count = 346 > Core [1] count = 355 > Core [2] count = 259 > Core [3] count = 285 > Core [4] count = 320 > ... > Core [209] count = 2 > Core [210] count = 23 > Core [211] count = 63 > Total count = 22194 > > Gavin Hu (1): > rwlock: reimplement with __atomic builtins > > Joyce Kong (1): > test/rwlock: add perf test case > > lib/librte_eal/common/include/generic/rte_rwlock.h | 16 ++--- > test/test/test_rwlock.c | 71 ++++++++++++++++++++++ > 2 files changed, 79 insertions(+), 8 deletions(-) > Did you consider using a better algorithm not just better primitives. See https://locklessinc.com/articles/locks/ for a more complete discussion of alternatives like ticket locks.
Hi Stephen, Thanks for your comment and sharing the link! We are looking into it and it may take more time for performance profiling. Best Regards, Gavin > -----Original Message----- > From: Stephen Hemminger <stephen@networkplumber.org> > Sent: Thursday, December 13, 2018 1:27 PM > To: Joyce Kong (Arm Technology China) <Joyce.Kong@arm.com> > Cc: dev@dpdk.org; nd <nd@arm.com>; thomas@monjalon.net; > jerin.jacob@caviumnetworks.com; hemant.agrawal@nxp.com; Honnappa > Nagarahalli <Honnappa.Nagarahalli@arm.com>; Gavin Hu (Arm Technology > China) <Gavin.Hu@arm.com> > Subject: Re: [dpdk-dev] [PATCH v1 0/2] reimplement rwlock and add > relevant perf test case > > On Thu, 13 Dec 2018 11:37:43 +0800 > Joyce Kong <joyce.kong@arm.com> wrote: > > > v1: reimplement rwlock with __atomic builtins, and add a rwlock perf test > > on all available cores to benchmark the improvement. > > > > We tested the patches on three arm64 platforms, ThundeX2 gained 20% > > performance, Qualcomm gained 36% and the 4-Cortex-A72 Marvell > MACCHIATObin gained 19.6%. > > Below is the detailed test result on ThunderX2: > > > > *** rwlock_autotest without __atomic builtins *** Rwlock Perf Test on > > 128 cores... > > Core [0] count = 281 > > Core [1] count = 252 > > Core [2] count = 290 > > Core [3] count = 259 > > Core [4] count = 287 > > ... > > Core [209] count = 3 > > Core [210] count = 31 > > Core [211] count = 120 > > Total count = 18537 > > > > *** rwlock_autotest with __atomic builtins *** Rwlock Perf Test on 128 > > cores... > > Core [0] count = 346 > > Core [1] count = 355 > > Core [2] count = 259 > > Core [3] count = 285 > > Core [4] count = 320 > > ... > > Core [209] count = 2 > > Core [210] count = 23 > > Core [211] count = 63 > > Total count = 22194 > > > > Gavin Hu (1): > > rwlock: reimplement with __atomic builtins > > > > Joyce Kong (1): > > test/rwlock: add perf test case > > > > lib/librte_eal/common/include/generic/rte_rwlock.h | 16 ++--- > > test/test/test_rwlock.c | 71 ++++++++++++++++++++++ > > 2 files changed, 79 insertions(+), 8 deletions(-) > > > > Did you consider using a better algorithm not just better primitives. > See https://locklessinc.com/articles/locks/ for a more complete discussion of > alternatives like ticket locks.
Adding other platform maintainers as it affects all platforms. > -----Original Message----- > From: Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com> > Sent: Thursday, December 13, 2018 7:30 PM > To: Stephen Hemminger <stephen@networkplumber.org>; Joyce Kong (Arm > Technology China) <Joyce.Kong@arm.com> > Cc: dev@dpdk.org; nd <nd@arm.com>; thomas@monjalon.net; > jerin.jacob@caviumnetworks.com; hemant.agrawal@nxp.com; Honnappa > Nagarahalli <Honnappa.Nagarahalli@arm.com> > Subject: RE: [dpdk-dev] [PATCH v1 0/2] reimplement rwlock and add relevant > perf test case > > Hi Stephen, > > Thanks for your comment and sharing the link! > We are looking into it and it may take more time for performance profiling. > > Best Regards, > Gavin > > > -----Original Message----- > > From: Stephen Hemminger <stephen@networkplumber.org> > > Sent: Thursday, December 13, 2018 1:27 PM > > To: Joyce Kong (Arm Technology China) <Joyce.Kong@arm.com> > > Cc: dev@dpdk.org; nd <nd@arm.com>; thomas@monjalon.net; > > jerin.jacob@caviumnetworks.com; hemant.agrawal@nxp.com; Honnappa > > Nagarahalli <Honnappa.Nagarahalli@arm.com>; Gavin Hu (Arm Technology > > China) <Gavin.Hu@arm.com> > > Subject: Re: [dpdk-dev] [PATCH v1 0/2] reimplement rwlock and add > > relevant perf test case > > > > On Thu, 13 Dec 2018 11:37:43 +0800 > > Joyce Kong <joyce.kong@arm.com> wrote: > > > > > v1: reimplement rwlock with __atomic builtins, and add a rwlock perf test > > > on all available cores to benchmark the improvement. > > > > > > We tested the patches on three arm64 platforms, ThundeX2 gained 20% > > > performance, Qualcomm gained 36% and the 4-Cortex-A72 Marvell > > MACCHIATObin gained 19.6%. > > > Below is the detailed test result on ThunderX2: > > > > > > *** rwlock_autotest without __atomic builtins *** Rwlock Perf Test > > > on > > > 128 cores... > > > Core [0] count = 281 > > > Core [1] count = 252 > > > Core [2] count = 290 > > > Core [3] count = 259 > > > Core [4] count = 287 > > > ... > > > Core [209] count = 3 > > > Core [210] count = 31 > > > Core [211] count = 120 > > > Total count = 18537 > > > > > > *** rwlock_autotest with __atomic builtins *** Rwlock Perf Test on > > > 128 cores... > > > Core [0] count = 346 > > > Core [1] count = 355 > > > Core [2] count = 259 > > > Core [3] count = 285 > > > Core [4] count = 320 > > > ... > > > Core [209] count = 2 > > > Core [210] count = 23 > > > Core [211] count = 63 > > > Total count = 22194 > > > > > > Gavin Hu (1): > > > rwlock: reimplement with __atomic builtins > > > > > > Joyce Kong (1): > > > test/rwlock: add perf test case > > > > > > lib/librte_eal/common/include/generic/rte_rwlock.h | 16 ++--- > > > test/test/test_rwlock.c | 71 ++++++++++++++++++++++ > > > 2 files changed, 79 insertions(+), 8 deletions(-) > > > > > > > Did you consider using a better algorithm not just better primitives. > > See https://locklessinc.com/articles/locks/ for a more complete > > discussion of alternatives like ticket locks.
17/12/2018 06:16, Honnappa Nagarahalli: > Adding other platform maintainers as it affects all platforms. There is no other comment. I am not sure what to do with this patch. > From: Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com> > > > > Hi Stephen, > > > > Thanks for your comment and sharing the link! > > We are looking into it and it may take more time for performance profiling. > > > > Best Regards, > > Gavin > > > > From: Stephen Hemminger <stephen@networkplumber.org> > > > On Thu, 13 Dec 2018 11:37:43 +0800 > > > Joyce Kong <joyce.kong@arm.com> wrote: > > > > > > > v1: reimplement rwlock with __atomic builtins, and add a rwlock perf test > > > > on all available cores to benchmark the improvement. > > > > > > > > We tested the patches on three arm64 platforms, ThundeX2 gained 20% > > > > performance, Qualcomm gained 36% and the 4-Cortex-A72 Marvell > > > MACCHIATObin gained 19.6%. > > > > Below is the detailed test result on ThunderX2: > > > > > > > > *** rwlock_autotest without __atomic builtins *** Rwlock Perf Test > > > > on > > > > 128 cores... > > > > Core [0] count = 281 > > > > Core [1] count = 252 > > > > Core [2] count = 290 > > > > Core [3] count = 259 > > > > Core [4] count = 287 > > > > ... > > > > Core [209] count = 3 > > > > Core [210] count = 31 > > > > Core [211] count = 120 > > > > Total count = 18537 > > > > > > > > *** rwlock_autotest with __atomic builtins *** Rwlock Perf Test on > > > > 128 cores... > > > > Core [0] count = 346 > > > > Core [1] count = 355 > > > > Core [2] count = 259 > > > > Core [3] count = 285 > > > > Core [4] count = 320 > > > > ... > > > > Core [209] count = 2 > > > > Core [210] count = 23 > > > > Core [211] count = 63 > > > > Total count = 22194 > > > > > > > > Gavin Hu (1): > > > > rwlock: reimplement with __atomic builtins > > > > > > > > Joyce Kong (1): > > > > test/rwlock: add perf test case > > > > > > > > lib/librte_eal/common/include/generic/rte_rwlock.h | 16 ++--- > > > > test/test/test_rwlock.c | 71 ++++++++++++++++++++++ > > > > 2 files changed, 79 insertions(+), 8 deletions(-) > > > > > > > > > > Did you consider using a better algorithm not just better primitives. > > > See https://locklessinc.com/articles/locks/ for a more complete > > > discussion of alternatives like ticket locks.