From patchwork Tue Jun 13 09:25:46 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pavan Nikhilesh Bhagavatula X-Patchwork-Id: 128559 X-Patchwork-Delegate: jerinj@marvell.com Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id DFD8D42CA8; Tue, 13 Jun 2023 11:25:57 +0200 (CEST) Received: from mails.dpdk.org (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id D1B8C427F5; Tue, 13 Jun 2023 11:25:57 +0200 (CEST) Received: from mx0b-0016f401.pphosted.com (mx0b-0016f401.pphosted.com [67.231.156.173]) by mails.dpdk.org (Postfix) with ESMTP id 98F1C427F2 for ; Tue, 13 Jun 2023 11:25:56 +0200 (CEST) Received: from pps.filterd (m0045851.ppops.net [127.0.0.1]) by mx0b-0016f401.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 35D563If013028 for ; Tue, 13 Jun 2023 02:25:56 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=marvell.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=pfpt0220; bh=EIt9ux+afZ4YLWWQlNcCPhNPsUs8j1Y2N20+zVwy0+w=; b=ccOhL5Dj8/mDjVwjq8UdREEnvGeuu1pYdy3HkJN6Jni/Imd9N+1EE+RFHzrywnEuVp/T u39GBB6CmgktPvcgfEAWT8dHdCBSOiNK7hyBgaFuKudQ6Jw3j3gNr7xlTZfTifcZzXMz ShFHMkHOlfh5HqNzmdB5hvVQ+K3gRVIWNB4WEynCzz49ikEHnu9xI2k2W6WQ2EGyGS13 iUsVaYF0AIGojYMrEQexF0ESWExL1rUXrdr6+1UnM2DNPaNtD7qtvocJASeQCVbitrf1 6i/nOTcP19poZrPq+VhoNytpdX5hkYULjR8WeTvP3qHWi0obtWNyIS4EZs0qyn2U9M1k qg== Received: from dc5-exch02.marvell.com ([199.233.59.182]) by mx0b-0016f401.pphosted.com (PPS) with ESMTPS id 3r4rpkfj0c-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-SHA384 bits=256 verify=NOT) for ; Tue, 13 Jun 2023 02:25:55 -0700 Received: from DC5-EXCH01.marvell.com (10.69.176.38) by DC5-EXCH02.marvell.com (10.69.176.39) with Microsoft SMTP Server (TLS) id 15.0.1497.48; Tue, 
13 Jun 2023 02:25:53 -0700 Received: from maili.marvell.com (10.69.176.80) by DC5-EXCH01.marvell.com (10.69.176.38) with Microsoft SMTP Server id 15.0.1497.48 via Frontend Transport; Tue, 13 Jun 2023 02:25:53 -0700 Received: from MININT-80QBFE8.corp.innovium.com (unknown [10.28.164.122]) by maili.marvell.com (Postfix) with ESMTP id E91E93F708C; Tue, 13 Jun 2023 02:25:50 -0700 (PDT) From: To: , Pavan Nikhilesh , "Shijith Thotton" , Nithin Dabilpuram , Kiran Kumar K , Sunil Kumar Kori , Satha Rao CC: Subject: [PATCH v2 1/3] event/cnxk: align TX queue buffer adjustment Date: Tue, 13 Jun 2023 14:55:46 +0530 Message-ID: <20230613092548.1315-1-pbhagavatula@marvell.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230516143752.4941-1-pbhagavatula@marvell.com> References: <20230516143752.4941-1-pbhagavatula@marvell.com> MIME-Version: 1.0 X-Proofpoint-ORIG-GUID: SBzO5QnCDTxys4lyd_r_GXEwlcxRaSVI X-Proofpoint-GUID: SBzO5QnCDTxys4lyd_r_GXEwlcxRaSVI X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.957,Hydra:6.0.573,FMLib:17.11.176.26 definitions=2023-06-13_04,2023-06-12_02,2023-05-22_02 X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org From: Pavan Nikhilesh Remove recalculating SQB thresholds in Tx queue buffer adjustment. The adjustment is already done during Tx queue setup. Signed-off-by: Pavan Nikhilesh --- v2 Changes: - Rebase on ToT. 
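Note on the depth arithmetic in this patch: the new expression `(avail << sqes_per_sqb_log2) - avail` is the same quantity as `avail * (sqes_per_sqb - 1)`, so each free SQB is counted as one slot fewer than its nominal capacity. The sketch below is a standalone model, not driver code. `struct txq_model` and `sq_depth_model()` are hypothetical names, and the field types are assumptions chosen for illustration. It multiplies instead of shifting so that a negative `avail` stays well defined in plain C.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the Tx queue fields the depth helpers read. */
struct txq_model {
	int32_t nb_sqb_bufs_adj;    /* usable SQB count, fixed at Tx queue setup */
	uint16_t sqes_per_sqb_log2; /* log2 of the SQE slots in one SQB */
	uint64_t fc_mem;            /* SQBs currently in use (a plain value here) */
};

/* Model of cn10k_sso_sq_depth()/cn9k_sso_sq_depth() after this patch. */
static int32_t
sq_depth_model(const struct txq_model *txq)
{
	int32_t avail, sqes_per_sqb;

	assert(txq->sqes_per_sqb_log2 < 31);
	avail = txq->nb_sqb_bufs_adj - (int32_t)txq->fc_mem;
	sqes_per_sqb = (int32_t)1 << txq->sqes_per_sqb_log2;

	/* Same value as (avail << log2) - avail, written as a product. */
	return avail * (sqes_per_sqb - 1);
}
```

With 10 adjusted SQBs, 4 of them in use and 32 SQEs per SQB, the model reports 6 * 31 = 186, where the previous shift-only expression would give 6 << 5 = 192 for the same `avail`.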
drivers/event/cnxk/cn10k_eventdev.c | 9 +-------- drivers/event/cnxk/cn10k_tx_worker.h | 6 +++--- drivers/event/cnxk/cn9k_eventdev.c | 9 +-------- drivers/event/cnxk/cn9k_worker.h | 12 +++++++++--- drivers/net/cnxk/cn10k_tx.h | 12 ++++++------ drivers/net/cnxk/cn9k_tx.h | 5 +++-- 6 files changed, 23 insertions(+), 30 deletions(-) -- 2.25.1 diff --git a/drivers/event/cnxk/cn10k_eventdev.c b/drivers/event/cnxk/cn10k_eventdev.c index 670fc9e926..8ee9ab3c5c 100644 --- a/drivers/event/cnxk/cn10k_eventdev.c +++ b/drivers/event/cnxk/cn10k_eventdev.c @@ -843,16 +843,9 @@ cn10k_sso_txq_fc_update(const struct rte_eth_dev *eth_dev, int32_t tx_queue_id) sq = &cnxk_eth_dev->sqs[tx_queue_id]; txq = eth_dev->data->tx_queues[tx_queue_id]; sqes_per_sqb = 1U << txq->sqes_per_sqb_log2; - sq->nb_sqb_bufs_adj = - sq->nb_sqb_bufs - - RTE_ALIGN_MUL_CEIL(sq->nb_sqb_bufs, sqes_per_sqb) / - sqes_per_sqb; if (cnxk_eth_dev->tx_offloads & RTE_ETH_TX_OFFLOAD_SECURITY) - sq->nb_sqb_bufs_adj -= (cnxk_eth_dev->outb.nb_desc / - (sqes_per_sqb - 1)); + sq->nb_sqb_bufs_adj -= (cnxk_eth_dev->outb.nb_desc / sqes_per_sqb); txq->nb_sqb_bufs_adj = sq->nb_sqb_bufs_adj; - txq->nb_sqb_bufs_adj = - ((100 - ROC_NIX_SQB_THRESH) * txq->nb_sqb_bufs_adj) / 100; } } diff --git a/drivers/event/cnxk/cn10k_tx_worker.h b/drivers/event/cnxk/cn10k_tx_worker.h index 31cbccf7d6..b6c9bb1d26 100644 --- a/drivers/event/cnxk/cn10k_tx_worker.h +++ b/drivers/event/cnxk/cn10k_tx_worker.h @@ -32,9 +32,9 @@ cn10k_sso_txq_fc_wait(const struct cn10k_eth_txq *txq) static __rte_always_inline int32_t cn10k_sso_sq_depth(const struct cn10k_eth_txq *txq) { - return (txq->nb_sqb_bufs_adj - - __atomic_load_n((int16_t *)txq->fc_mem, __ATOMIC_RELAXED)) - << txq->sqes_per_sqb_log2; + int32_t avail = (int32_t)txq->nb_sqb_bufs_adj - + (int32_t)__atomic_load_n(txq->fc_mem, __ATOMIC_RELAXED); + return (avail << txq->sqes_per_sqb_log2) - avail; } static __rte_always_inline uint16_t diff --git a/drivers/event/cnxk/cn9k_eventdev.c 
b/drivers/event/cnxk/cn9k_eventdev.c index 7ed9aa1331..dde58b60e4 100644 --- a/drivers/event/cnxk/cn9k_eventdev.c +++ b/drivers/event/cnxk/cn9k_eventdev.c @@ -877,16 +877,9 @@ cn9k_sso_txq_fc_update(const struct rte_eth_dev *eth_dev, int32_t tx_queue_id) sq = &cnxk_eth_dev->sqs[tx_queue_id]; txq = eth_dev->data->tx_queues[tx_queue_id]; sqes_per_sqb = 1U << txq->sqes_per_sqb_log2; - sq->nb_sqb_bufs_adj = - sq->nb_sqb_bufs - - RTE_ALIGN_MUL_CEIL(sq->nb_sqb_bufs, sqes_per_sqb) / - sqes_per_sqb; if (cnxk_eth_dev->tx_offloads & RTE_ETH_TX_OFFLOAD_SECURITY) - sq->nb_sqb_bufs_adj -= (cnxk_eth_dev->outb.nb_desc / - (sqes_per_sqb - 1)); + sq->nb_sqb_bufs_adj -= (cnxk_eth_dev->outb.nb_desc / sqes_per_sqb); txq->nb_sqb_bufs_adj = sq->nb_sqb_bufs_adj; - txq->nb_sqb_bufs_adj = - ((100 - ROC_NIX_SQB_THRESH) * txq->nb_sqb_bufs_adj) / 100; } } diff --git a/drivers/event/cnxk/cn9k_worker.h b/drivers/event/cnxk/cn9k_worker.h index ec2c1c68dd..ed3b97d7e1 100644 --- a/drivers/event/cnxk/cn9k_worker.h +++ b/drivers/event/cnxk/cn9k_worker.h @@ -713,6 +713,14 @@ cn9k_sso_hws_xmit_sec_one(const struct cn9k_eth_txq *txq, uint64_t base, } #endif +static __rte_always_inline int32_t +cn9k_sso_sq_depth(const struct cn9k_eth_txq *txq) +{ + int32_t avail = (int32_t)txq->nb_sqb_bufs_adj - + (int32_t)__atomic_load_n(txq->fc_mem, __ATOMIC_RELAXED); + return (avail << txq->sqes_per_sqb_log2) - avail; +} + static __rte_always_inline uint16_t cn9k_sso_hws_event_tx(uint64_t base, struct rte_event *ev, uint64_t *cmd, uint64_t *txq_data, const uint32_t flags) @@ -736,9 +744,7 @@ cn9k_sso_hws_event_tx(uint64_t base, struct rte_event *ev, uint64_t *cmd, if (flags & NIX_TX_OFFLOAD_MBUF_NOFF_F && txq->tx_compl.ena) handle_tx_completion_pkts(txq, 1); - if (((txq->nb_sqb_bufs_adj - - __atomic_load_n((int16_t *)txq->fc_mem, __ATOMIC_RELAXED)) - << txq->sqes_per_sqb_log2) <= 0) + if (cn9k_sso_sq_depth(txq) <= 0) return 0; cn9k_nix_tx_skeleton(txq, cmd, flags, 0); cn9k_nix_xmit_prepare(txq, m, cmd, flags, 
txq->lso_tun_fmt, txq->mark_flag, diff --git a/drivers/net/cnxk/cn10k_tx.h b/drivers/net/cnxk/cn10k_tx.h index 4f23a8dfc3..a365cbe0ee 100644 --- a/drivers/net/cnxk/cn10k_tx.h +++ b/drivers/net/cnxk/cn10k_tx.h @@ -35,12 +35,13 @@ #define NIX_XMIT_FC_OR_RETURN(txq, pkts) \ do { \ + int64_t avail; \ /* Cached value is low, Update the fc_cache_pkts */ \ if (unlikely((txq)->fc_cache_pkts < (pkts))) { \ + avail = (txq)->nb_sqb_bufs_adj - *(txq)->fc_mem; \ /* Multiply with sqe_per_sqb to express in pkts */ \ (txq)->fc_cache_pkts = \ - ((txq)->nb_sqb_bufs_adj - *(txq)->fc_mem) \ - << (txq)->sqes_per_sqb_log2; \ + (avail << (txq)->sqes_per_sqb_log2) - avail; \ /* Check it again for the room */ \ if (unlikely((txq)->fc_cache_pkts < (pkts))) \ return 0; \ @@ -113,10 +114,9 @@ cn10k_nix_vwqe_wait_fc(struct cn10k_eth_txq *txq, int64_t req) if (cached < 0) { /* Check if we have space else retry. */ do { - refill = - (txq->nb_sqb_bufs_adj - - __atomic_load_n(txq->fc_mem, __ATOMIC_RELAXED)) - << txq->sqes_per_sqb_log2; + refill = txq->nb_sqb_bufs_adj - + __atomic_load_n(txq->fc_mem, __ATOMIC_RELAXED); + refill = (refill << txq->sqes_per_sqb_log2) - refill; } while (refill <= 0); __atomic_compare_exchange(&txq->fc_cache_pkts, &cached, &refill, 0, __ATOMIC_RELEASE, diff --git a/drivers/net/cnxk/cn9k_tx.h b/drivers/net/cnxk/cn9k_tx.h index 8f1e05a461..fba4bb4215 100644 --- a/drivers/net/cnxk/cn9k_tx.h +++ b/drivers/net/cnxk/cn9k_tx.h @@ -32,12 +32,13 @@ #define NIX_XMIT_FC_OR_RETURN(txq, pkts) \ do { \ + int64_t avail; \ /* Cached value is low, Update the fc_cache_pkts */ \ if (unlikely((txq)->fc_cache_pkts < (pkts))) { \ + avail = (txq)->nb_sqb_bufs_adj - *(txq)->fc_mem; \ /* Multiply with sqe_per_sqb to express in pkts */ \ (txq)->fc_cache_pkts = \ - ((txq)->nb_sqb_bufs_adj - *(txq)->fc_mem) \ - << (txq)->sqes_per_sqb_log2; \ + (avail << (txq)->sqes_per_sqb_log2) - avail; \ /* Check it again for the room */ \ if (unlikely((txq)->fc_cache_pkts < (pkts))) \ return 0; \ From patchwork Tue Jun 
13 09:25:47 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pavan Nikhilesh Bhagavatula X-Patchwork-Id: 128560 X-Patchwork-Delegate: jerinj@marvell.com Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 4770842CA8; Tue, 13 Jun 2023 11:26:03 +0200 (CEST) Received: from mails.dpdk.org (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id DCAD842D0B; Tue, 13 Jun 2023 11:25:59 +0200 (CEST) Received: from mx0b-0016f401.pphosted.com (mx0a-0016f401.pphosted.com [67.231.148.174]) by mails.dpdk.org (Postfix) with ESMTP id 7C97C42C54 for ; Tue, 13 Jun 2023 11:25:58 +0200 (CEST) Received: from pps.filterd (m0045849.ppops.net [127.0.0.1]) by mx0a-0016f401.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 35D56F7I005629 for ; Tue, 13 Jun 2023 02:25:57 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=marvell.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=pfpt0220; bh=fUBPnMXGlGplJDLtQfB9S4Kjn7S+J8aO+vhrNFtzYOQ=; b=Sz7wbjCTtU9U2DweMCVS0eSqcccMEZD+so5AmyRhNKDASYxARxwxlGN4lIcRJvuHdsJv f+QkFElXQusn3S7+4Bmod9krZyu+apH2OAuK4lyq1w0MqqzPaPeTJiUrD/GlEe+fYYeA t5H1zb0bX8SpF1gYFkUpEuCVsgqfO9SCZqmbm2JHQgUQbgiZPsGD1w/6TogA3OsPzZ8H SgbJkgP5es7IO4eNRnJxHjfkvzu6QuBaSF5FcS5Vd6Yq7M4YkQS0KjLo6/1swh9wBcRJ fJoO0ZI3yKpMWKkursIiTLjFiNsDlSQqyaaAjIK43hBjJvTDiLwUsYadbpgSFcKZS5xc UA== Received: from dc5-exch01.marvell.com ([199.233.59.181]) by mx0a-0016f401.pphosted.com (PPS) with ESMTPS id 3r65023jg2-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-SHA384 bits=256 verify=NOT) for ; Tue, 13 Jun 2023 02:25:57 -0700 Received: from DC5-EXCH02.marvell.com (10.69.176.39) by DC5-EXCH01.marvell.com (10.69.176.38) with Microsoft SMTP Server (TLS) id 15.0.1497.48; Tue, 13 Jun 2023 02:25:55 
-0700 Received: from maili.marvell.com (10.69.176.80) by DC5-EXCH02.marvell.com (10.69.176.39) with Microsoft SMTP Server id 15.0.1497.48 via Frontend Transport; Tue, 13 Jun 2023 02:25:55 -0700 Received: from MININT-80QBFE8.corp.innovium.com (unknown [10.28.164.122]) by maili.marvell.com (Postfix) with ESMTP id 2E8963F708D; Tue, 13 Jun 2023 02:25:53 -0700 (PDT) From: To: , Pavan Nikhilesh , "Shijith Thotton" CC: Subject: [PATCH v2 2/3] event/cnxk: use local labels in asm intrinsic Date: Tue, 13 Jun 2023 14:55:47 +0530 Message-ID: <20230613092548.1315-2-pbhagavatula@marvell.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230613092548.1315-1-pbhagavatula@marvell.com> References: <20230516143752.4941-1-pbhagavatula@marvell.com> <20230613092548.1315-1-pbhagavatula@marvell.com> MIME-Version: 1.0 X-Proofpoint-ORIG-GUID: VFil_vA810NcGJN8lW9R7U0WZ8NoZWtR X-Proofpoint-GUID: VFil_vA810NcGJN8lW9R7U0WZ8NoZWtR X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.957,Hydra:6.0.573,FMLib:17.11.176.26 definitions=2023-06-13_04,2023-06-12_02,2023-05-22_02 X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org From: Pavan Nikhilesh Plain labels in inline asm are emitted as regular symbols, which look like functions and obscure the call stack in tools like gdb or perf. Use local labels (.L prefix) instead for better visibility. 
Signed-off-by: Pavan Nikhilesh --- drivers/event/cnxk/cn10k_worker.h | 8 ++--- drivers/event/cnxk/cn9k_worker.h | 25 ++++++++-------- drivers/event/cnxk/cnxk_tim_worker.h | 44 ++++++++++++++-------------- drivers/event/cnxk/cnxk_worker.h | 8 ++--- 4 files changed, 43 insertions(+), 42 deletions(-) diff --git a/drivers/event/cnxk/cn10k_worker.h b/drivers/event/cnxk/cn10k_worker.h index a01894ae10..2af0bb3f9f 100644 --- a/drivers/event/cnxk/cn10k_worker.h +++ b/drivers/event/cnxk/cn10k_worker.h @@ -269,12 +269,12 @@ cn10k_sso_hws_get_work_empty(struct cn10k_sso_hws *ws, struct rte_event *ev, #ifdef RTE_ARCH_ARM64 asm volatile(PLT_CPU_FEATURE_PREAMBLE " ldp %[tag], %[wqp], [%[tag_loc]] \n" - " tbz %[tag], 63, done%= \n" + " tbz %[tag], 63, .Ldone%= \n" " sevl \n" - "rty%=: wfe \n" + ".Lrty%=: wfe \n" " ldp %[tag], %[wqp], [%[tag_loc]] \n" - " tbnz %[tag], 63, rty%= \n" - "done%=: dmb ld \n" + " tbnz %[tag], 63, .Lrty%= \n" + ".Ldone%=: dmb ld \n" : [tag] "=&r"(gw.u64[0]), [wqp] "=&r"(gw.u64[1]) : [tag_loc] "r"(ws->base + SSOW_LF_GWS_WQE0) : "memory"); diff --git a/drivers/event/cnxk/cn9k_worker.h b/drivers/event/cnxk/cn9k_worker.h index ed3b97d7e1..9ddab095ac 100644 --- a/drivers/event/cnxk/cn9k_worker.h +++ b/drivers/event/cnxk/cn9k_worker.h @@ -232,18 +232,19 @@ cn9k_sso_hws_dual_get_work(uint64_t base, uint64_t pair_base, rte_prefetch_non_temporal(dws->lookup_mem); #ifdef RTE_ARCH_ARM64 asm volatile(PLT_CPU_FEATURE_PREAMBLE - "rty%=: \n" + ".Lrty%=: \n" " ldr %[tag], [%[tag_loc]] \n" " ldr %[wqp], [%[wqp_loc]] \n" - " tbnz %[tag], 63, rty%= \n" - "done%=: str %[gw], [%[pong]] \n" + " tbnz %[tag], 63, .Lrty%= \n" + ".Ldone%=: str %[gw], [%[pong]] \n" " dmb ld \n" " sub %[mbuf], %[wqp], #0x80 \n" " prfm pldl1keep, [%[mbuf]] \n" : [tag] "=&r"(gw.u64[0]), [wqp] "=&r"(gw.u64[1]), [mbuf] "=&r"(mbuf) : [tag_loc] "r"(base + SSOW_LF_GWS_TAG), - [wqp_loc] "r"(base + SSOW_LF_GWS_WQP), [gw] "r"(dws->gw_wdata), + [wqp_loc] "r"(base + SSOW_LF_GWS_WQP), + [gw] "r"(dws->gw_wdata), 
[pong] "r"(pair_base + SSOW_LF_GWS_OP_GET_WORK0)); #else gw.u64[0] = plt_read64(base + SSOW_LF_GWS_TAG); @@ -282,13 +283,13 @@ cn9k_sso_hws_get_work(struct cn9k_sso_hws *ws, struct rte_event *ev, asm volatile(PLT_CPU_FEATURE_PREAMBLE " ldr %[tag], [%[tag_loc]] \n" " ldr %[wqp], [%[wqp_loc]] \n" - " tbz %[tag], 63, done%= \n" + " tbz %[tag], 63, .Ldone%= \n" " sevl \n" - "rty%=: wfe \n" + ".Lrty%=: wfe \n" " ldr %[tag], [%[tag_loc]] \n" " ldr %[wqp], [%[wqp_loc]] \n" - " tbnz %[tag], 63, rty%= \n" - "done%=: dmb ld \n" + " tbnz %[tag], 63, .Lrty%= \n" + ".Ldone%=: dmb ld \n" " sub %[mbuf], %[wqp], #0x80 \n" " prfm pldl1keep, [%[mbuf]] \n" : [tag] "=&r"(gw.u64[0]), [wqp] "=&r"(gw.u64[1]), @@ -330,13 +331,13 @@ cn9k_sso_hws_get_work_empty(uint64_t base, struct rte_event *ev, asm volatile(PLT_CPU_FEATURE_PREAMBLE " ldr %[tag], [%[tag_loc]] \n" " ldr %[wqp], [%[wqp_loc]] \n" - " tbz %[tag], 63, done%= \n" + " tbz %[tag], 63, .Ldone%= \n" " sevl \n" - "rty%=: wfe \n" + ".Lrty%=: wfe \n" " ldr %[tag], [%[tag_loc]] \n" " ldr %[wqp], [%[wqp_loc]] \n" - " tbnz %[tag], 63, rty%= \n" - "done%=: dmb ld \n" + " tbnz %[tag], 63, .Lrty%= \n" + ".Ldone%=: dmb ld \n" " sub %[mbuf], %[wqp], #0x80 \n" : [tag] "=&r"(gw.u64[0]), [wqp] "=&r"(gw.u64[1]), [mbuf] "=&r"(mbuf) diff --git a/drivers/event/cnxk/cnxk_tim_worker.h b/drivers/event/cnxk/cnxk_tim_worker.h index 8fafb8f09c..f0857f26ba 100644 --- a/drivers/event/cnxk/cnxk_tim_worker.h +++ b/drivers/event/cnxk/cnxk_tim_worker.h @@ -262,12 +262,12 @@ cnxk_tim_add_entry_sp(struct cnxk_tim_ring *const tim_ring, #ifdef RTE_ARCH_ARM64 asm volatile(PLT_CPU_FEATURE_PREAMBLE " ldxr %[hbt], [%[w1]] \n" - " tbz %[hbt], 33, dne%= \n" + " tbz %[hbt], 33, .Ldne%= \n" " sevl \n" - "rty%=: wfe \n" + ".Lrty%=: wfe \n" " ldxr %[hbt], [%[w1]] \n" - " tbnz %[hbt], 33, rty%= \n" - "dne%=: \n" + " tbnz %[hbt], 33, .Lrty%=\n" + ".Ldne%=: \n" : [hbt] "=&r"(hbt_state) : [w1] "r"((&bkt->w1)) : "memory"); @@ -345,12 +345,12 @@ cnxk_tim_add_entry_mp(struct 
cnxk_tim_ring *const tim_ring, #ifdef RTE_ARCH_ARM64 asm volatile(PLT_CPU_FEATURE_PREAMBLE " ldxr %[hbt], [%[w1]] \n" - " tbz %[hbt], 33, dne%= \n" + " tbz %[hbt], 33, .Ldne%= \n" " sevl \n" - "rty%=: wfe \n" + ".Lrty%=: wfe \n" " ldxr %[hbt], [%[w1]] \n" - " tbnz %[hbt], 33, rty%= \n" - "dne%=: \n" + " tbnz %[hbt], 33, .Lrty%=\n" + ".Ldne%=: \n" : [hbt] "=&r"(hbt_state) : [w1] "r"((&bkt->w1)) : "memory"); @@ -374,13 +374,13 @@ cnxk_tim_add_entry_mp(struct cnxk_tim_ring *const tim_ring, cnxk_tim_bkt_dec_lock(bkt); #ifdef RTE_ARCH_ARM64 asm volatile(PLT_CPU_FEATURE_PREAMBLE - " ldxr %[rem], [%[crem]] \n" - " tbz %[rem], 63, dne%= \n" + " ldxr %[rem], [%[crem]] \n" + " tbz %[rem], 63, .Ldne%= \n" " sevl \n" - "rty%=: wfe \n" - " ldxr %[rem], [%[crem]] \n" - " tbnz %[rem], 63, rty%= \n" - "dne%=: \n" + ".Lrty%=: wfe \n" + " ldxr %[rem], [%[crem]] \n" + " tbnz %[rem], 63, .Lrty%= \n" + ".Ldne%=: \n" : [rem] "=&r"(rem) : [crem] "r"(&bkt->w1) : "memory"); @@ -478,12 +478,12 @@ cnxk_tim_add_entry_brst(struct cnxk_tim_ring *const tim_ring, #ifdef RTE_ARCH_ARM64 asm volatile(PLT_CPU_FEATURE_PREAMBLE " ldxr %[hbt], [%[w1]] \n" - " tbz %[hbt], 33, dne%= \n" + " tbz %[hbt], 33, .Ldne%= \n" " sevl \n" - "rty%=: wfe \n" + ".Lrty%=: wfe \n" " ldxr %[hbt], [%[w1]] \n" - " tbnz %[hbt], 33, rty%= \n" - "dne%=: \n" + " tbnz %[hbt], 33, .Lrty%=\n" + ".Ldne%=: \n" : [hbt] "=&r"(hbt_state) : [w1] "r"((&bkt->w1)) : "memory"); @@ -510,13 +510,13 @@ cnxk_tim_add_entry_brst(struct cnxk_tim_ring *const tim_ring, asm volatile(PLT_CPU_FEATURE_PREAMBLE " ldxrb %w[lock_cnt], [%[lock]] \n" " tst %w[lock_cnt], 255 \n" - " beq dne%= \n" + " beq .Ldne%= \n" " sevl \n" - "rty%=: wfe \n" + ".Lrty%=: wfe \n" " ldxrb %w[lock_cnt], [%[lock]] \n" " tst %w[lock_cnt], 255 \n" - " bne rty%= \n" - "dne%=: \n" + " bne .Lrty%= \n" + ".Ldne%=: \n" : [lock_cnt] "=&r"(lock_cnt) : [lock] "r"(&bkt->lock) : "memory"); diff --git a/drivers/event/cnxk/cnxk_worker.h b/drivers/event/cnxk/cnxk_worker.h index 
22d90afba2..2bd41f8a5e 100644 --- a/drivers/event/cnxk/cnxk_worker.h +++ b/drivers/event/cnxk/cnxk_worker.h @@ -71,12 +71,12 @@ cnxk_sso_hws_swtag_wait(uintptr_t tag_op) asm volatile(PLT_CPU_FEATURE_PREAMBLE " ldr %[swtb], [%[swtp_loc]] \n" - " tbz %[swtb], 62, done%= \n" + " tbz %[swtb], 62, .Ldone%= \n" " sevl \n" - "rty%=: wfe \n" + ".Lrty%=: wfe \n" " ldr %[swtb], [%[swtp_loc]] \n" - " tbnz %[swtb], 62, rty%= \n" - "done%=: \n" + " tbnz %[swtb], 62, .Lrty%= \n" + ".Ldone%=: \n" : [swtb] "=&r"(swtp) : [swtp_loc] "r"(tag_op)); #else From patchwork Tue Jun 13 09:25:48 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pavan Nikhilesh Bhagavatula X-Patchwork-Id: 128561 X-Patchwork-Delegate: jerinj@marvell.com Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id BB94142CA8; Tue, 13 Jun 2023 11:26:09 +0200 (CEST) Received: from mails.dpdk.org (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id DF0F242D12; Tue, 13 Jun 2023 11:26:02 +0200 (CEST) Received: from mx0b-0016f401.pphosted.com (mx0b-0016f401.pphosted.com [67.231.156.173]) by mails.dpdk.org (Postfix) with ESMTP id CDF0442D16 for ; Tue, 13 Jun 2023 11:26:01 +0200 (CEST) Received: from pps.filterd (m0045851.ppops.net [127.0.0.1]) by mx0b-0016f401.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 35D563Ih013028 for ; Tue, 13 Jun 2023 02:26:01 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=marvell.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=pfpt0220; bh=Ip2EjRx2+1IAUgtpUrZXU9i4sD0/vH6Fq0jKVccpDeM=; b=i42adTjgrKhsv81iYGzPihVNfnnV5M9i2VYe7QoFncKd0oqgf/zy+/xGMG0mg8a9BGFs yEon6R9FQzGqB5iYtUt12+rErwwLtbs10JJkId8v5lKnld05vxMJSJ3Vvp7u2ly+4Jy9 
zG3gHHxy4pr07kuaGMqohITZpHjlZc0lpopF+FM3+fE0dEfZmE+F/D+hw6kiJqoUkJvn cUI9XdVaskfE20HksqTY23GBLuVOBEXKjhclxl8wNOBaG2xJjud9Dd/+HFbJk7kfZDtb E7Qo6YrcUdK6uyNP2srsCDji5gJMHkuv5yYp7rc8pSmwgWg2f4lklzEms6yVklnrV7m2 Kg== Received: from dc5-exch02.marvell.com ([199.233.59.182]) by mx0b-0016f401.pphosted.com (PPS) with ESMTPS id 3r4rpkfj1g-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-SHA384 bits=256 verify=NOT) for ; Tue, 13 Jun 2023 02:26:01 -0700 Received: from DC5-EXCH01.marvell.com (10.69.176.38) by DC5-EXCH02.marvell.com (10.69.176.39) with Microsoft SMTP Server (TLS) id 15.0.1497.48; Tue, 13 Jun 2023 02:25:59 -0700 Received: from maili.marvell.com (10.69.176.80) by DC5-EXCH01.marvell.com (10.69.176.38) with Microsoft SMTP Server id 15.0.1497.48 via Frontend Transport; Tue, 13 Jun 2023 02:25:59 -0700 Received: from MININT-80QBFE8.corp.innovium.com (unknown [10.28.164.122]) by maili.marvell.com (Postfix) with ESMTP id 59BEC3F708C; Tue, 13 Jun 2023 02:25:56 -0700 (PDT) From: To: , Pavan Nikhilesh , "Shijith Thotton" , Nithin Dabilpuram , Kiran Kumar K , Sunil Kumar Kori , Satha Rao CC: Subject: [PATCH v2 3/3] event/cnxk: use WFE in Tx fc wait Date: Tue, 13 Jun 2023 14:55:48 +0530 Message-ID: <20230613092548.1315-3-pbhagavatula@marvell.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230613092548.1315-1-pbhagavatula@marvell.com> References: <20230516143752.4941-1-pbhagavatula@marvell.com> <20230613092548.1315-1-pbhagavatula@marvell.com> MIME-Version: 1.0 X-Proofpoint-ORIG-GUID: 9NleZBztNpsG5CrYPyyp7XIwowU6fJdO X-Proofpoint-GUID: 9NleZBztNpsG5CrYPyyp7XIwowU6fJdO X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.957,Hydra:6.0.573,FMLib:17.11.176.26 definitions=2023-06-13_04,2023-06-12_02,2023-05-22_02 X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org From: Pavan Nikhilesh Use WFE in the Tx 
path when waiting for space in the Tx queue. Depending upon the Tx queue contention and size, WFE will reduce the cache pressure and power consumption. In multi-core scenarios we have observed up to 8W power reduction. Signed-off-by: Pavan Nikhilesh --- drivers/event/cnxk/cn10k_tx_worker.h | 18 ++++ drivers/net/cnxk/cn10k_tx.h | 152 +++++++++++++++++++++++---- 2 files changed, 147 insertions(+), 23 deletions(-) diff --git a/drivers/event/cnxk/cn10k_tx_worker.h b/drivers/event/cnxk/cn10k_tx_worker.h index b6c9bb1d26..dea6cdcde2 100644 --- a/drivers/event/cnxk/cn10k_tx_worker.h +++ b/drivers/event/cnxk/cn10k_tx_worker.h @@ -24,9 +24,27 @@ cn10k_sso_hws_xtract_meta(struct rte_mbuf *m, const uint64_t *txq_data) static __rte_always_inline void cn10k_sso_txq_fc_wait(const struct cn10k_eth_txq *txq) { +#ifdef RTE_ARCH_ARM64 + uint64_t space; + + asm volatile(PLT_CPU_FEATURE_PREAMBLE + " ldxr %[space], [%[addr]] \n" + " cmp %[adj], %[space] \n" + " b.hi .Ldne%= \n" + " sevl \n" + ".Lrty%=: wfe \n" + " ldxr %[space], [%[addr]] \n" + " cmp %[adj], %[space] \n" + " b.ls .Lrty%= \n" + ".Ldne%=: \n" + : [space] "=&r"(space) + : [adj] "r"(txq->nb_sqb_bufs_adj), [addr] "r"(txq->fc_mem) + : "memory"); +#else while ((uint64_t)txq->nb_sqb_bufs_adj <= __atomic_load_n(txq->fc_mem, __ATOMIC_RELAXED)) ; +#endif } static __rte_always_inline int32_t diff --git a/drivers/net/cnxk/cn10k_tx.h b/drivers/net/cnxk/cn10k_tx.h index a365cbe0ee..d0e8350ce2 100644 --- a/drivers/net/cnxk/cn10k_tx.h +++ b/drivers/net/cnxk/cn10k_tx.h @@ -102,27 +102,72 @@ cn10k_nix_tx_mbuf_validate(struct rte_mbuf *m, const uint32_t flags) } static __plt_always_inline void -cn10k_nix_vwqe_wait_fc(struct cn10k_eth_txq *txq, int64_t req) +cn10k_nix_vwqe_wait_fc(struct cn10k_eth_txq *txq, uint16_t req) { int64_t cached, refill; + int64_t pkts; retry: +#ifdef RTE_ARCH_ARM64 + + asm volatile(PLT_CPU_FEATURE_PREAMBLE + " ldxr %[pkts], [%[addr]] \n" + " tbz %[pkts], 63, .Ldne%= \n" + " sevl \n" + ".Lrty%=: wfe \n" + " ldxr 
%[pkts], [%[addr]] \n" + " tbnz %[pkts], 63, .Lrty%= \n" + ".Ldne%=: \n" + : [pkts] "=&r"(pkts) + : [addr] "r"(&txq->fc_cache_pkts) + : "memory"); +#else + RTE_SET_USED(pkts); while (__atomic_load_n(&txq->fc_cache_pkts, __ATOMIC_RELAXED) < 0) ; +#endif cached = __atomic_fetch_sub(&txq->fc_cache_pkts, req, __ATOMIC_ACQUIRE) - req; /* Check if there is enough space, else update and retry. */ - if (cached < 0) { - /* Check if we have space else retry. */ - do { - refill = txq->nb_sqb_bufs_adj - - __atomic_load_n(txq->fc_mem, __ATOMIC_RELAXED); - refill = (refill << txq->sqes_per_sqb_log2) - refill; - } while (refill <= 0); - __atomic_compare_exchange(&txq->fc_cache_pkts, &cached, &refill, - 0, __ATOMIC_RELEASE, - __ATOMIC_RELAXED); + if (cached >= 0) + return; + + /* Check if we have space else retry. */ +#ifdef RTE_ARCH_ARM64 + int64_t val; + + asm volatile(PLT_CPU_FEATURE_PREAMBLE + " ldxr %[val], [%[addr]] \n" + " sub %[val], %[adj], %[val] \n" + " lsl %[refill], %[val], %[shft] \n" + " sub %[refill], %[refill], %[val] \n" + " sub %[refill], %[refill], %[sub] \n" + " cmp %[refill], #0x0 \n" + " b.ge .Ldne%= \n" + " sevl \n" + ".Lrty%=: wfe \n" + " ldxr %[val], [%[addr]] \n" + " sub %[val], %[adj], %[val] \n" + " lsl %[refill], %[val], %[shft] \n" + " sub %[refill], %[refill], %[val] \n" + " sub %[refill], %[refill], %[sub] \n" + " cmp %[refill], #0x0 \n" + " b.lt .Lrty%= \n" + ".Ldne%=: \n" + : [refill] "=&r"(refill), [val] "=&r" (val) + : [addr] "r"(txq->fc_mem), [adj] "r"(txq->nb_sqb_bufs_adj), + [shft] "r"(txq->sqes_per_sqb_log2), [sub] "r"(req) + : "memory"); +#else + do { + refill = (txq->nb_sqb_bufs_adj - __atomic_load_n(txq->fc_mem, __ATOMIC_RELAXED)); + refill = (refill << txq->sqes_per_sqb_log2) - refill; + refill -= req; + } while (refill < 0); +#endif + if (!__atomic_compare_exchange(&txq->fc_cache_pkts, &cached, &refill, + 0, __ATOMIC_RELEASE, + __ATOMIC_RELAXED)) goto retry; - } } /* Function to determine no of tx subdesc required in case ext @@ 
-283,10 +328,27 @@ static __rte_always_inline void cn10k_nix_sec_fc_wait_one(struct cn10k_eth_txq *txq) { uint64_t nb_desc = txq->cpt_desc; - uint64_t *fc = txq->cpt_fc; - - while (nb_desc <= __atomic_load_n(fc, __ATOMIC_RELAXED)) + uint64_t fc; + +#ifdef RTE_ARCH_ARM64 + asm volatile(PLT_CPU_FEATURE_PREAMBLE + " ldxr %[space], [%[addr]] \n" + " cmp %[nb_desc], %[space] \n" + " b.hi .Ldne%= \n" + " sevl \n" + ".Lrty%=: wfe \n" + " ldxr %[space], [%[addr]] \n" + " cmp %[nb_desc], %[space] \n" + " b.ls .Lrty%= \n" + ".Ldne%=: \n" + : [space] "=&r"(fc) + : [nb_desc] "r"(nb_desc), [addr] "r"(txq->cpt_fc) + : "memory"); +#else + RTE_SET_USED(fc); + while (nb_desc <= __atomic_load_n(txq->cpt_fc, __ATOMIC_RELAXED)) ; +#endif } static __rte_always_inline void @@ -294,7 +356,7 @@ cn10k_nix_sec_fc_wait(struct cn10k_eth_txq *txq, uint16_t nb_pkts) { int32_t nb_desc, val, newval; int32_t *fc_sw; - volatile uint64_t *fc; + uint64_t *fc; /* Check if there is any CPT instruction to submit */ if (!nb_pkts) @@ -302,21 +364,59 @@ cn10k_nix_sec_fc_wait(struct cn10k_eth_txq *txq, uint16_t nb_pkts) again: fc_sw = txq->cpt_fc_sw; - val = __atomic_fetch_sub(fc_sw, nb_pkts, __ATOMIC_RELAXED) - nb_pkts; +#ifdef RTE_ARCH_ARM64 + asm volatile(PLT_CPU_FEATURE_PREAMBLE + " ldxr %w[pkts], [%[addr]] \n" + " tbz %w[pkts], 31, .Ldne%= \n" + " sevl \n" + ".Lrty%=: wfe \n" + " ldxr %w[pkts], [%[addr]] \n" + " tbnz %w[pkts], 31, .Lrty%= \n" + ".Ldne%=: \n" + : [pkts] "=&r"(val) + : [addr] "r"(fc_sw) + : "memory"); +#else + /* Wait for primary core to refill FC. 
*/ + while (__atomic_load_n(fc_sw, __ATOMIC_RELAXED) < 0) + ; +#endif + + val = __atomic_fetch_sub(fc_sw, nb_pkts, __ATOMIC_ACQUIRE) - nb_pkts; if (likely(val >= 0)) return; nb_desc = txq->cpt_desc; fc = txq->cpt_fc; +#ifdef RTE_ARCH_ARM64 + asm volatile(PLT_CPU_FEATURE_PREAMBLE + " ldxr %[refill], [%[addr]] \n" + " sub %[refill], %[desc], %[refill] \n" + " sub %[refill], %[refill], %[pkts] \n" + " cmp %[refill], #0x0 \n" + " b.ge .Ldne%= \n" + " sevl \n" + ".Lrty%=: wfe \n" + " ldxr %[refill], [%[addr]] \n" + " sub %[refill], %[desc], %[refill] \n" + " sub %[refill], %[refill], %[pkts] \n" + " cmp %[refill], #0x0 \n" + " b.lt .Lrty%= \n" + ".Ldne%=: \n" + : [refill] "=&r"(newval) + : [addr] "r"(fc), [desc] "r"(nb_desc), [pkts] "r"(nb_pkts) + : "memory"); +#else while (true) { newval = nb_desc - __atomic_load_n(fc, __ATOMIC_RELAXED); newval -= nb_pkts; if (newval >= 0) break; } +#endif - if (!__atomic_compare_exchange_n(fc_sw, &val, newval, false, - __ATOMIC_RELAXED, __ATOMIC_RELAXED)) + if (!__atomic_compare_exchange_n(fc_sw, &val, newval, false, __ATOMIC_RELEASE, + __ATOMIC_RELAXED)) goto again; } @@ -3033,10 +3133,16 @@ cn10k_nix_xmit_pkts_vector(void *tx_queue, uint64_t *ws, wd.data[1] |= ((uint64_t)(lnum - 17)) << 12; wd.data[1] |= (uint64_t)(lmt_id + 16); - if (flags & NIX_TX_VWQE_F) - cn10k_nix_vwqe_wait_fc(txq, - burst - (cn10k_nix_pkts_per_vec_brst(flags) >> - 1)); + if (flags & NIX_TX_VWQE_F) { + if (flags & NIX_TX_MULTI_SEG_F) { + if (burst - (cn10k_nix_pkts_per_vec_brst(flags) >> 1) > 0) + cn10k_nix_vwqe_wait_fc(txq, + burst - (cn10k_nix_pkts_per_vec_brst(flags) >> 1)); + } else { + cn10k_nix_vwqe_wait_fc(txq, + burst - (cn10k_nix_pkts_per_vec_brst(flags) >> 1)); + } + } /* STEOR1 */ roc_lmt_submit_steorl(wd.data[1], pa); } else if (lnum) {
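Note on the flow-control wait in patch 3/3: the generic (non-Arm64) branch of `cn10k_nix_vwqe_wait_fc()` can be modelled outside the driver as shown below. This is a minimal sketch that uses C11 `<stdatomic.h>` in place of the GCC `__atomic` builtins. `struct fc_model` and `wait_fc_model()` are hypothetical names, the field types are assumptions, and the busy-wait loops stand in for the WFE sequences the patch adds on Arm64.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the Tx queue fields used by the wait logic. */
struct fc_model {
	_Atomic int64_t fc_cache_pkts; /* cached packet credits, shared by cores */
	_Atomic uint64_t fc_mem;       /* SQBs in use (hardware-updated in the driver) */
	int64_t nb_sqb_bufs_adj;       /* usable SQB count, fixed at queue setup */
	uint16_t sqes_per_sqb_log2;    /* log2 of the SQE slots in one SQB */
};

static void
wait_fc_model(struct fc_model *q, uint16_t req)
{
	int64_t cached, refill, avail;

	assert(q->sqes_per_sqb_log2 < 63);
	for (;;) {
		/* Wait while another core holds the cache negative (refill pending). */
		while (atomic_load_explicit(&q->fc_cache_pkts,
					    memory_order_relaxed) < 0)
			;

		/* Claim req credits; done if the cache still covers them. */
		cached = atomic_fetch_sub_explicit(&q->fc_cache_pkts, req,
						   memory_order_acquire) - req;
		if (cached >= 0)
			return;

		/* Recompute the depth until it covers this request. */
		do {
			avail = q->nb_sqb_bufs_adj -
				(int64_t)atomic_load_explicit(&q->fc_mem,
							      memory_order_relaxed);
			refill = avail * (((int64_t)1 << q->sqes_per_sqb_log2) - 1);
			refill -= req;
		} while (refill < 0);

		/* Publish the refill; start over if another core changed the cache. */
		if (atomic_compare_exchange_strong_explicit(&q->fc_cache_pkts,
							    &cached, refill,
							    memory_order_release,
							    memory_order_relaxed))
			return;
	}
}
```

Starting from a cache of 2 packets with 10 adjusted SQBs, 4 of them in use and 32 SQEs per SQB, a request for 8 drives the cache to -6, the refill is computed as 6 * 31 - 8 = 178, and the compare-exchange installs that value.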