From patchwork Mon Jul 22 08:44:23 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Phil Yang X-Patchwork-Id: 56823 Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id DEECB3DC; Mon, 22 Jul 2019 10:45:03 +0200 (CEST) Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by dpdk.org (Postfix) with ESMTP id 995411D7 for ; Mon, 22 Jul 2019 10:45:01 +0200 (CEST) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id DBDAF344; Mon, 22 Jul 2019 01:45:00 -0700 (PDT) Received: from phil-VirtualBox.shanghai.arm.com (unknown [10.169.109.155]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 1F1483F71F; Mon, 22 Jul 2019 01:44:58 -0700 (PDT) From: Phil Yang To: dev@dpdk.org Cc: thomas@monjalon.net, jerinj@marvell.com, gage.eads@intel.com, hemant.agrawal@nxp.com, Honnappa.Nagarahalli@arm.com, gavin.hu@arm.com, nd@arm.com Date: Mon, 22 Jul 2019 16:44:23 +0800 Message-Id: <1563785065-12969-1-git-send-email-phil.yang@arm.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1561257671-10316-1-git-send-email-phil.yang@arm.com> References: <1561257671-10316-1-git-send-email-phil.yang@arm.com> Subject: [dpdk-dev] [PATCH v4 1/3] eal/arm64: add 128-bit atomic compare exchange X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Add 128-bit atomic compare exchange on aarch64. Suggested-by: Jerin Jacob Signed-off-by: Phil Yang Tested-by: Honnappa Nagarahalli Reviewed-by: Honnappa Nagarahalli --- v4: 1. Add RTE_ARM_FEATURE_ATOMICS flag to support LSE CASP instructions. (Jerin Jacob) 2. 
Fix possible arm64 ABI break by making casp_op_name noinline. (Jerin Jacob) 3. Add rte_stack_lf_stubs.h to reduce the ifdef clutter. (Gage Eads/Jerin Jacob) v3: 1. Avoid duplicated code by using a macro. (Jerin Jacob) 2. Map invalid memory orders to the strongest barrier. (Jerin Jacob) 3. Update doc/guides/prog_guide/env_abstraction_layer.rst. (Gage Eads) 4. Fix 32-bit x86 build issue. (Gage Eads) 5. Correct documentation issues in UT. (Gage Eads) v2: Initial version. config/arm/meson.build | 1 + config/common_base | 5 + config/defconfig_arm64-thunderx2-linuxapp-gcc | 1 + .../common/include/arch/arm/rte_atomic_64.h | 162 +++++++++++++++++++++ .../common/include/arch/x86/rte_atomic_64.h | 12 -- lib/librte_eal/common/include/generic/rte_atomic.h | 17 ++- 6 files changed, 185 insertions(+), 13 deletions(-) diff --git a/config/arm/meson.build b/config/arm/meson.build index 979018e..a88f21e 100644 --- a/config/arm/meson.build +++ b/config/arm/meson.build @@ -68,6 +68,7 @@ flags_thunderx_extra = [ ['RTE_USE_C11_MEM_MODEL', false]] flags_thunderx2_extra = [ ['RTE_MACHINE', '"thunderx2"'], + ['RTE_ARM_FEATURE_ATOMICS', true], ['RTE_CACHE_LINE_SIZE', 64], ['RTE_MAX_NUMA_NODES', 2], ['RTE_MAX_LCORE', 256], diff --git a/config/common_base b/config/common_base index 8ef75c2..8862495 100644 --- a/config/common_base +++ b/config/common_base @@ -1067,3 +1067,8 @@ CONFIG_RTE_APP_CRYPTO_PERF=y # Compile the eventdev application # CONFIG_RTE_APP_EVENTDEV=y + +# +# Compile ARM LSE ATOMIC instructions statically +# +CONFIG_RTE_ARM_FEATURE_ATOMICS=n diff --git a/config/defconfig_arm64-thunderx2-linuxapp-gcc b/config/defconfig_arm64-thunderx2-linuxapp-gcc index cc5c64b..17b6dec 100644 --- a/config/defconfig_arm64-thunderx2-linuxapp-gcc +++ b/config/defconfig_arm64-thunderx2-linuxapp-gcc @@ -6,6 +6,7 @@ CONFIG_RTE_MACHINE="thunderx2" +CONFIG_RTE_ARM_FEATURE_ATOMICS=y CONFIG_RTE_CACHE_LINE_SIZE=64 CONFIG_RTE_MAX_NUMA_NODES=2 CONFIG_RTE_MAX_LCORE=256 diff --git 
a/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h b/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h index 97060e4..88b7ff4 100644 --- a/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h +++ b/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h @@ -1,5 +1,6 @@ /* SPDX-License-Identifier: BSD-3-Clause * Copyright(c) 2015 Cavium, Inc + * Copyright(c) 2019 Arm Limited */ #ifndef _RTE_ATOMIC_ARM64_H_ @@ -14,6 +15,9 @@ extern "C" { #endif #include "generic/rte_atomic.h" +#include +#include +#include #define dsb(opt) asm volatile("dsb " #opt : : : "memory") #define dmb(opt) asm volatile("dmb " #opt : : : "memory") @@ -40,6 +44,164 @@ extern "C" { #define rte_cio_rmb() dmb(oshld) +/*------------------------ 128 bit atomic operations -------------------------*/ + +#define __HAS_ACQ(mo) ((mo) != __ATOMIC_RELAXED && (mo) != __ATOMIC_RELEASE) +#define __HAS_RLS(mo) ((mo) == __ATOMIC_RELEASE || (mo) == __ATOMIC_ACQ_REL || \ + (mo) == __ATOMIC_SEQ_CST) + +#define __MO_LOAD(mo) (__HAS_ACQ((mo)) ? __ATOMIC_ACQUIRE : __ATOMIC_RELAXED) +#define __MO_STORE(mo) (__HAS_RLS((mo)) ? __ATOMIC_RELEASE : __ATOMIC_RELAXED) + +#if defined(__ARM_FEATURE_ATOMICS) || defined(RTE_ARM_FEATURE_ATOMICS) +#define __ATOMIC128_CAS_OP(cas_op_name, op_string) \ +static __rte_noinline rte_int128_t \ +cas_op_name(rte_int128_t *dst, rte_int128_t old, \ + rte_int128_t updated) \ +{ \ + /* caspX instructions register pair must start from even-numbered + * register at operand 1. + * So, specify registers for local variables here. 
+ */ \ + register uint64_t x0 __asm("x0") = (uint64_t)old.val[0]; \ + register uint64_t x1 __asm("x1") = (uint64_t)old.val[1]; \ + register uint64_t x2 __asm("x2") = (uint64_t)updated.val[0]; \ + register uint64_t x3 __asm("x3") = (uint64_t)updated.val[1]; \ + asm volatile( \ + op_string " %[old0], %[old1], %[upd0], %[upd1], [%[dst]]" \ + : [old0] "+r" (x0), \ + [old1] "+r" (x1) \ + : [upd0] "r" (x2), \ + [upd1] "r" (x3), \ + [dst] "r" (dst) \ + : "memory"); \ + old.val[0] = x0; \ + old.val[1] = x1; \ + return old; \ +} + +__ATOMIC128_CAS_OP(__rte_cas_relaxed, "casp") +__ATOMIC128_CAS_OP(__rte_cas_acquire, "caspa") +__ATOMIC128_CAS_OP(__rte_cas_release, "caspl") +__ATOMIC128_CAS_OP(__rte_cas_acq_rel, "caspal") +#else +#define __ATOMIC128_LDX_OP(ldx_op_name, op_string) \ +static inline rte_int128_t \ +ldx_op_name(const rte_int128_t *src) \ +{ \ + rte_int128_t ret; \ + asm volatile( \ + op_string " %0, %1, %2" \ + : "=&r" (ret.val[0]), \ + "=&r" (ret.val[1]) \ + : "Q" (src->val[0]) \ + : "memory"); \ + return ret; \ +} + +__ATOMIC128_LDX_OP(__rte_ldx_relaxed, "ldxp") +__ATOMIC128_LDX_OP(__rte_ldx_acquire, "ldaxp") + +#define __ATOMIC128_STX_OP(stx_op_name, op_string) \ +static inline uint32_t \ +stx_op_name(rte_int128_t *dst, const rte_int128_t src) \ +{ \ + uint32_t ret; \ + asm volatile( \ + op_string " %w0, %1, %2, %3" \ + : "=&r" (ret) \ + : "r" (src.val[0]), \ + "r" (src.val[1]), \ + "Q" (dst->val[0]) \ + : "memory"); \ + /* Return 0 on success, 1 on failure */ \ + return ret; \ +} + +__ATOMIC128_STX_OP(__rte_stx_relaxed, "stxp") +__ATOMIC128_STX_OP(__rte_stx_release, "stlxp") +#endif + +static inline int __rte_experimental +rte_atomic128_cmp_exchange(rte_int128_t *dst, + rte_int128_t *exp, + const rte_int128_t *src, + unsigned int weak, + int success, + int failure) +{ + /* Always do strong CAS */ + RTE_SET_USED(weak); + /* Ignore memory ordering for failure, memory order for + * success must be stronger or equal + */ + RTE_SET_USED(failure); + /* Find invalid 
memory order */ + RTE_ASSERT(success == __ATOMIC_RELAXED + || success == __ATOMIC_ACQUIRE + || success == __ATOMIC_RELEASE + || success == __ATOMIC_ACQ_REL + || success == __ATOMIC_SEQ_CST); + +#if defined(__ARM_FEATURE_ATOMICS) || defined(RTE_ARM_FEATURE_ATOMICS) + rte_int128_t expected = *exp; + rte_int128_t desired = *src; + rte_int128_t old; + + if (success == __ATOMIC_RELAXED) + old = __rte_cas_relaxed(dst, expected, desired); + else if (success == __ATOMIC_ACQUIRE) + old = __rte_cas_acquire(dst, expected, desired); + else if (success == __ATOMIC_RELEASE) + old = __rte_cas_release(dst, expected, desired); + else + old = __rte_cas_acq_rel(dst, expected, desired); +#else + int ldx_mo = __MO_LOAD(success); + int stx_mo = __MO_STORE(success); + uint32_t ret = 1; + register rte_int128_t expected = *exp; + register rte_int128_t desired = *src; + register rte_int128_t old; + + /* ldx128 alone cannot guarantee atomicity; + * src or old must be written back to verify the atomicity of ldx128. + */ + do { + if (ldx_mo == __ATOMIC_RELAXED) + old = __rte_ldx_relaxed(dst); + else + old = __rte_ldx_acquire(dst); + + if (likely(old.int128 == expected.int128)) { + if (stx_mo == __ATOMIC_RELAXED) + ret = __rte_stx_relaxed(dst, desired); + else + ret = __rte_stx_release(dst, desired); + } else { + /* In the failure case (since 'weak' is ignored and only + * weak == 0 is implemented), expected should contain the + * atomically read value of dst. This means, 'old' needs + * to be stored back to ensure it was read atomically. + */ + if (stx_mo == __ATOMIC_RELAXED) + ret = __rte_stx_relaxed(dst, old); + else + ret = __rte_stx_release(dst, old); + } + } while (unlikely(ret)); +#endif + + /* Unconditionally updating expected removes + * an 'if' statement. + * expected should already be in register if + * not in the cache. 
+ */ + *exp = old; + + return (old.int128 == expected.int128); +} + #ifdef __cplusplus } #endif diff --git a/lib/librte_eal/common/include/arch/x86/rte_atomic_64.h b/lib/librte_eal/common/include/arch/x86/rte_atomic_64.h index e087c6c..1217129 100644 --- a/lib/librte_eal/common/include/arch/x86/rte_atomic_64.h +++ b/lib/librte_eal/common/include/arch/x86/rte_atomic_64.h @@ -212,18 +212,6 @@ static inline void rte_atomic64_clear(rte_atomic64_t *v) /*------------------------ 128 bit atomic operations -------------------------*/ -/** - * 128-bit integer structure. - */ -RTE_STD_C11 -typedef struct { - RTE_STD_C11 - union { - uint64_t val[2]; - __extension__ __int128 int128; - }; -} __rte_aligned(16) rte_int128_t; - __rte_experimental static inline int rte_atomic128_cmp_exchange(rte_int128_t *dst, diff --git a/lib/librte_eal/common/include/generic/rte_atomic.h b/lib/librte_eal/common/include/generic/rte_atomic.h index 24ff7dc..e6ab15a 100644 --- a/lib/librte_eal/common/include/generic/rte_atomic.h +++ b/lib/librte_eal/common/include/generic/rte_atomic.h @@ -1081,6 +1081,20 @@ static inline void rte_atomic64_clear(rte_atomic64_t *v) /*------------------------ 128 bit atomic operations -------------------------*/ +/** + * 128-bit integer structure. + */ +RTE_STD_C11 +typedef struct { + RTE_STD_C11 + union { + uint64_t val[2]; +#ifdef RTE_ARCH_64 + __extension__ __int128 int128; +#endif + }; +} __rte_aligned(16) rte_int128_t; + #ifdef __DOXYGEN__ /** @@ -1093,7 +1107,8 @@ static inline void rte_atomic64_clear(rte_atomic64_t *v) * *exp = *dst * @endcode * - * @note This function is currently only available for the x86-64 platform. + * @note This function is currently available for the x86-64 and aarch64 + * platforms. * * @note The success and failure arguments must be one of the __ATOMIC_* values * defined in the C++11 standard. 
For details on their behavior, refer to the From patchwork Mon Jul 22 08:44:24 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Phil Yang X-Patchwork-Id: 56824 Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 8C14B1BDE0; Mon, 22 Jul 2019 10:45:10 +0200 (CEST) Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by dpdk.org (Postfix) with ESMTP id C83371BCB8 for ; Mon, 22 Jul 2019 10:45:08 +0200 (CEST) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 2C6EF344; Mon, 22 Jul 2019 01:45:08 -0700 (PDT) Received: from phil-VirtualBox.shanghai.arm.com (unknown [10.169.109.155]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 695123F71F; Mon, 22 Jul 2019 01:45:06 -0700 (PDT) From: Phil Yang To: dev@dpdk.org Cc: thomas@monjalon.net, jerinj@marvell.com, gage.eads@intel.com, hemant.agrawal@nxp.com, Honnappa.Nagarahalli@arm.com, gavin.hu@arm.com, nd@arm.com Date: Mon, 22 Jul 2019 16:44:24 +0800 Message-Id: <1563785065-12969-2-git-send-email-phil.yang@arm.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1563785065-12969-1-git-send-email-phil.yang@arm.com> References: <1561257671-10316-1-git-send-email-phil.yang@arm.com> <1563785065-12969-1-git-send-email-phil.yang@arm.com> Subject: [dpdk-dev] [PATCH v4 2/3] test/atomic: add 128b compare and swap test X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Add 128b atomic compare and swap test for aarch64 and x86_64. 
Signed-off-by: Phil Yang Reviewed-by: Honnappa Nagarahalli Acked-by: Gage Eads Acked-by: Jerin Jacob Tested-by: Jerin Jacob --- app/test/test_atomic.c | 120 ++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 118 insertions(+), 2 deletions(-) diff --git a/app/test/test_atomic.c b/app/test/test_atomic.c index 43be30e..ff6ff88 100644 --- a/app/test/test_atomic.c +++ b/app/test/test_atomic.c @@ -1,5 +1,6 @@ /* SPDX-License-Identifier: BSD-3-Clause * Copyright(c) 2010-2014 Intel Corporation + * Copyright(c) 2019 Arm Limited */ #include @@ -20,7 +21,7 @@ * Atomic Variables * ================ * - * - The main test function performs three subtests. The first test + * - The main test function performs four subtests. The first test * checks that the usual inc/dec/add/sub functions are working * correctly: * @@ -61,11 +62,27 @@ * atomic_sub(&count, tmp+1); * * - At the end of the test, the *count* value must be 0. + * + * - Test "128b compare and swap" (aarch64 and x86_64 only) + * + * - Initialize 128-bit atomic variables to zero. + * + * - Invoke ``test_atomic128_cmp_exchange()`` on each lcore. Before doing + * anything else, the cores wait on a synchro. Each lcore does + * these compare and swap (CAS) operations several times:: + * + * Acquired CAS update counter.val[0] + 2; counter.val[1] + 1; + * Released CAS update counter.val[0] + 2; counter.val[1] + 1; + * Acquired_Released CAS update counter.val[0] + 2; counter.val[1] + 1; + * Relaxed CAS update counter.val[0] + 2; counter.val[1] + 1; + * + * - At the end of the test, the *count128* first 64-bit value and + * second 64-bit value differ by the total iterations. 
*/ #define NUM_ATOMIC_TYPES 3 -#define N 10000 +#define N 1000000 static rte_atomic16_t a16; static rte_atomic32_t a32; @@ -216,6 +233,74 @@ test_atomic_dec_and_test(__attribute__((unused)) void *arg) return 0; } +#if defined(RTE_ARCH_X86_64) || defined(RTE_ARCH_ARM64) +static rte_int128_t count128; + +/* + * rte_atomic128_cmp_exchange() should update a 128-bit counter's first 64 + * bits by 2 and the second 64 bits by 1 in this test. It should return true + * if the compare exchange operation is successful. + * This test repeats 128-bit compare and swap operations N rounds. In each + * iteration it runs compare and swap operation with different memory models. + */ +static int +test_atomic128_cmp_exchange(__attribute__((unused)) void *arg) +{ + rte_int128_t expected; + int success; + unsigned int i; + + while (rte_atomic32_read(&synchro) == 0) + ; + + expected = count128; + + for (i = 0; i < N; i++) { + do { + rte_int128_t desired; + + desired.val[0] = expected.val[0] + 2; + desired.val[1] = expected.val[1] + 1; + + success = rte_atomic128_cmp_exchange(&count128, &expected, + &desired, 1, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED); + } while (success == 0); + + do { + rte_int128_t desired; + + desired.val[0] = expected.val[0] + 2; + desired.val[1] = expected.val[1] + 1; + + success = rte_atomic128_cmp_exchange(&count128, &expected, + &desired, 1, __ATOMIC_RELEASE, __ATOMIC_RELAXED); + } while (success == 0); + + do { + rte_int128_t desired; + + desired.val[0] = expected.val[0] + 2; + desired.val[1] = expected.val[1] + 1; + + success = rte_atomic128_cmp_exchange(&count128, &expected, + &desired, 1, __ATOMIC_ACQ_REL, __ATOMIC_RELAXED); + } while (success == 0); + + do { + rte_int128_t desired; + + desired.val[0] = expected.val[0] + 2; + desired.val[1] = expected.val[1] + 1; + + success = rte_atomic128_cmp_exchange(&count128, &expected, + &desired, 1, __ATOMIC_RELAXED, __ATOMIC_RELAXED); + } while (success == 0); + } + + return 0; +} +#endif + static int 
test_atomic(void) { @@ -340,6 +425,37 @@ test_atomic(void) return -1; } +#if defined(RTE_ARCH_X86_64) || defined(RTE_ARCH_ARM64) + /* + * This case tests the functionality of rte_atomic128_cmp_exchange + * API. It calls rte_atomic128_cmp_exchange with four kinds of memory + * models successively on each slave core. Once each 128-bit atomic + * compare and swap operation is successful, it updates the global + * 128-bit counter by 2 for the first 64-bit and 1 for the second + * 64-bit. Each slave core iterates this test N times. + * At the end of test, verify whether the first 64 bits of the 128-bit + * counter and the second 64 bits differ by the total iterations. If + * they do, the test passes. + */ + printf("128b compare and swap test\n"); + uint64_t iterations = 0; + + rte_atomic32_clear(&synchro); + count128.val[0] = 0; + count128.val[1] = 0; + + rte_eal_mp_remote_launch(test_atomic128_cmp_exchange, NULL, SKIP_MASTER); + rte_atomic32_set(&synchro, 1); + rte_eal_mp_wait_lcore(); + rte_atomic32_clear(&synchro); + + iterations = count128.val[0] - count128.val[1]; + if (iterations != 4*N*(rte_lcore_count()-1)) { + printf("128b compare and swap failed\n"); + return -1; + } +#endif + return 0; } From patchwork Mon Jul 22 08:44:25 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Phil Yang X-Patchwork-Id: 56825 Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 932E81BDE8; Mon, 22 Jul 2019 10:45:14 +0200 (CEST) Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by dpdk.org (Postfix) with ESMTP id 77A6D1BDE4 for ; Mon, 22 Jul 2019 10:45:12 +0200 (CEST) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 0A3CB344; Mon, 22 Jul 2019 01:45:12 -0700 (PDT) Received: from 
phil-VirtualBox.shanghai.arm.com (unknown [10.169.109.155]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 4627A3F71F; Mon, 22 Jul 2019 01:45:10 -0700 (PDT) From: Phil Yang To: dev@dpdk.org Cc: thomas@monjalon.net, jerinj@marvell.com, gage.eads@intel.com, hemant.agrawal@nxp.com, Honnappa.Nagarahalli@arm.com, gavin.hu@arm.com, nd@arm.com Date: Mon, 22 Jul 2019 16:44:25 +0800 Message-Id: <1563785065-12969-3-git-send-email-phil.yang@arm.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1563785065-12969-1-git-send-email-phil.yang@arm.com> References: <1561257671-10316-1-git-send-email-phil.yang@arm.com> <1563785065-12969-1-git-send-email-phil.yang@arm.com> Subject: [dpdk-dev] [PATCH v4 3/3] eal/stack: enable lock-free stack for aarch64 X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Enable both c11 atomic and non c11 atomic lock-free stack for aarch64. Suggested-by: Gage Eads Suggested-by: Jerin Jacob Signed-off-by: Phil Yang Reviewed-by: Honnappa Nagarahalli Tested-by: Honnappa Nagarahalli --- doc/guides/prog_guide/env_abstraction_layer.rst | 4 +- doc/guides/rel_notes/release_19_08.rst | 3 ++ lib/librte_stack/rte_stack_lf.h | 4 ++ lib/librte_stack/rte_stack_lf_c11.h | 16 ------- lib/librte_stack/rte_stack_lf_generic.h | 16 ------- lib/librte_stack/rte_stack_lf_stubs.h | 59 +++++++++++++++++++++++++ 6 files changed, 68 insertions(+), 34 deletions(-) create mode 100644 lib/librte_stack/rte_stack_lf_stubs.h diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst index f15bcd9..d569f95 100644 --- a/doc/guides/prog_guide/env_abstraction_layer.rst +++ b/doc/guides/prog_guide/env_abstraction_layer.rst @@ -592,8 +592,8 @@ Known Issues Alternatively, applications can use the lock-free stack mempool handler. 
When considering this handler, note that: - - It is currently limited to the x86_64 platform, because it uses an - instruction (16-byte compare-and-swap) that is not yet available on other + - It is currently limited to the aarch64 and x86_64 platforms, because it uses + an instruction (16-byte compare-and-swap) that is not yet available on other platforms. - It has worse average-case performance than the non-preemptive rte_ring, but software caching (e.g. the mempool cache) can mitigate this by reducing the diff --git a/doc/guides/rel_notes/release_19_08.rst b/doc/guides/rel_notes/release_19_08.rst index 0a3f840..25d45c1 100644 --- a/doc/guides/rel_notes/release_19_08.rst +++ b/doc/guides/rel_notes/release_19_08.rst @@ -212,6 +212,9 @@ New Features Added multiple cores feature to compression perf tool application. +* **Added Lock-free Stack for aarch64.** + + The lock-free stack implementation is enabled for aarch64 platforms. Removed Items ------------- diff --git a/lib/librte_stack/rte_stack_lf.h b/lib/librte_stack/rte_stack_lf.h index f5581f0..e67630c 100644 --- a/lib/librte_stack/rte_stack_lf.h +++ b/lib/librte_stack/rte_stack_lf.h @@ -5,11 +5,15 @@ #ifndef _RTE_STACK_LF_H_ #define _RTE_STACK_LF_H_ +#if !(defined(RTE_ARCH_X86_64) || defined(RTE_ARCH_ARM64)) +#include "rte_stack_lf_stubs.h" +#else #ifdef RTE_USE_C11_MEM_MODEL #include "rte_stack_lf_c11.h" #else #include "rte_stack_lf_generic.h" #endif +#endif /** * @internal Push several objects on the lock-free stack (MT-safe). 
diff --git a/lib/librte_stack/rte_stack_lf_c11.h b/lib/librte_stack/rte_stack_lf_c11.h index 3d677ae..999359f 100644 --- a/lib/librte_stack/rte_stack_lf_c11.h +++ b/lib/librte_stack/rte_stack_lf_c11.h @@ -36,12 +36,6 @@ __rte_stack_lf_push_elems(struct rte_stack_lf_list *list, struct rte_stack_lf_elem *last, unsigned int num) { -#ifndef RTE_ARCH_X86_64 - RTE_SET_USED(first); - RTE_SET_USED(last); - RTE_SET_USED(list); - RTE_SET_USED(num); -#else struct rte_stack_lf_head old_head; int success; @@ -79,7 +73,6 @@ __rte_stack_lf_push_elems(struct rte_stack_lf_list *list, * to the LIFO len update. */ __atomic_add_fetch(&list->len, num, __ATOMIC_RELEASE); -#endif } static __rte_always_inline struct rte_stack_lf_elem * @@ -88,14 +81,6 @@ __rte_stack_lf_pop_elems(struct rte_stack_lf_list *list, void **obj_table, struct rte_stack_lf_elem **last) { -#ifndef RTE_ARCH_X86_64 - RTE_SET_USED(obj_table); - RTE_SET_USED(last); - RTE_SET_USED(list); - RTE_SET_USED(num); - - return NULL; -#else struct rte_stack_lf_head old_head; uint64_t len; int success; @@ -169,7 +154,6 @@ __rte_stack_lf_pop_elems(struct rte_stack_lf_list *list, } while (success == 0); return old_head.top; -#endif } #endif /* _RTE_STACK_LF_C11_H_ */ diff --git a/lib/librte_stack/rte_stack_lf_generic.h b/lib/librte_stack/rte_stack_lf_generic.h index 3182151..3abbb53 100644 --- a/lib/librte_stack/rte_stack_lf_generic.h +++ b/lib/librte_stack/rte_stack_lf_generic.h @@ -36,12 +36,6 @@ __rte_stack_lf_push_elems(struct rte_stack_lf_list *list, struct rte_stack_lf_elem *last, unsigned int num) { -#ifndef RTE_ARCH_X86_64 - RTE_SET_USED(first); - RTE_SET_USED(last); - RTE_SET_USED(list); - RTE_SET_USED(num); -#else struct rte_stack_lf_head old_head; int success; @@ -75,7 +69,6 @@ __rte_stack_lf_push_elems(struct rte_stack_lf_list *list, } while (success == 0); rte_atomic64_add((rte_atomic64_t *)&list->len, num); -#endif } static __rte_always_inline struct rte_stack_lf_elem * @@ -84,14 +77,6 @@ 
__rte_stack_lf_pop_elems(struct rte_stack_lf_list *list, void **obj_table, struct rte_stack_lf_elem **last) { -#ifndef RTE_ARCH_X86_64 - RTE_SET_USED(obj_table); - RTE_SET_USED(last); - RTE_SET_USED(list); - RTE_SET_USED(num); - - return NULL; -#else struct rte_stack_lf_head old_head; int success; @@ -159,7 +144,6 @@ __rte_stack_lf_pop_elems(struct rte_stack_lf_list *list, } while (success == 0); return old_head.top; -#endif } #endif /* _RTE_STACK_LF_GENERIC_H_ */ diff --git a/lib/librte_stack/rte_stack_lf_stubs.h b/lib/librte_stack/rte_stack_lf_stubs.h new file mode 100644 index 0000000..d924bc6 --- /dev/null +++ b/lib/librte_stack/rte_stack_lf_stubs.h @@ -0,0 +1,59 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2019 Arm Limited + */ + +#ifndef _RTE_STACK_LF_STUBS_H_ +#define _RTE_STACK_LF_STUBS_H_ + +#include +#include + +static __rte_always_inline unsigned int +__rte_stack_lf_count(struct rte_stack *s) +{ + /* stack_lf_push() and stack_lf_pop() do not update the list's contents + * and stack_lf->len atomically, which can cause the list to appear + * shorter than it actually is if this function is called while other + * threads are modifying the list. + * + * However, given the inherently approximate nature of the get_count + * callback -- even if the list and its size were updated atomically, + * the size could change between when get_count executes and when the + * value is returned to the caller -- this is acceptable. + * + * The stack_lf->len updates are placed such that the list may appear to + * have fewer elements than it does, but will never appear to have more + * elements. If the mempool is near-empty to the point that this is a + * concern, the user should consider increasing the mempool size. 
+ */ + return (unsigned int)rte_atomic64_read((rte_atomic64_t *) + &s->stack_lf.used.len); +} + +static __rte_always_inline void +__rte_stack_lf_push_elems(struct rte_stack_lf_list *list, + struct rte_stack_lf_elem *first, + struct rte_stack_lf_elem *last, + unsigned int num) +{ + RTE_SET_USED(first); + RTE_SET_USED(last); + RTE_SET_USED(list); + RTE_SET_USED(num); +} + +static __rte_always_inline struct rte_stack_lf_elem * +__rte_stack_lf_pop_elems(struct rte_stack_lf_list *list, + unsigned int num, + void **obj_table, + struct rte_stack_lf_elem **last) +{ + RTE_SET_USED(obj_table); + RTE_SET_USED(last); + RTE_SET_USED(list); + RTE_SET_USED(num); + + return NULL; +} + +#endif /* _RTE_STACK_LF_STUBS_H_ */
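Note on patch 1/3: in the non-LSE path, the single 'success' memory order is split into one order for the exclusive load (ldxp/ldaxp) and one for the exclusive store (stxp/stlxp) by the __HAS_ACQ/__HAS_RLS and __MO_LOAD/__MO_STORE macros. The sketch below restates that mapping as plain C functions so it can be checked in isolation. The function names are hypothetical and not part of the series; it assumes a GCC- or Clang-compatible compiler that predefines the __ATOMIC_* constants.

```c
/*
 * Illustrative sketch only: the helper names below are hypothetical and
 * do not appear in the patch. They mirror the logic of the __HAS_ACQ,
 * __HAS_RLS, __MO_LOAD and __MO_STORE macros from patch 1/3.
 * Assumes GCC/Clang, which predefine the __ATOMIC_* memory order constants.
 */

/* Non-zero when the order implies acquire semantics on the load side. */
static inline int mo_has_acquire(int mo)
{
	return mo != __ATOMIC_RELAXED && mo != __ATOMIC_RELEASE;
}

/* Non-zero when the order implies release semantics on the store side. */
static inline int mo_has_release(int mo)
{
	return mo == __ATOMIC_RELEASE || mo == __ATOMIC_ACQ_REL ||
	       mo == __ATOMIC_SEQ_CST;
}

/* Order used for the exclusive load: ldaxp if acquire, else ldxp. */
static inline int mo_for_load(int mo)
{
	return mo_has_acquire(mo) ? __ATOMIC_ACQUIRE : __ATOMIC_RELAXED;
}

/* Order used for the exclusive store: stlxp if release, else stxp. */
static inline int mo_for_store(int mo)
{
	return mo_has_release(mo) ? __ATOMIC_RELEASE : __ATOMIC_RELAXED;
}
```

With this mapping, __ATOMIC_ACQ_REL and __ATOMIC_SEQ_CST both select the acquire load together with the release store, which matches the LSE path falling through to the "caspal" variant for anything stronger than a plain release.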