From patchwork Fri Feb 14 06:45:24 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Pavan Nikhilesh Bhagavatula
X-Patchwork-Id: 65809
X-Patchwork-Delegate: jerinj@marvell.com
From: Pavan Nikhilesh
Date: Fri, 14 Feb 2020 12:15:24 +0530
Message-ID: <20200214064525.1895-1-pbhagavatula@marvell.com>
X-Mailer: git-send-email 2.17.1
Subject: [dpdk-dev] [PATCH] event/octeontx2: remove WFE from dualslot dequeue
List-Id: DPDK patches and discussions

Each workslot is always bound to a specific lcore, so there is no
multi-core contention to cause cache thrashing. As a result, it is safe
to remove the WFE.

Also, in dual workslot dequeue, work will most likely be available on
the paired workslot, making WFE impractical.

Signed-off-by: Pavan Nikhilesh
Reviewed-by: Gavin Hu
---
 Also, this in turn reduces the branch misses.

 Before:
     0  arm_spe_0/ts_enable=1,pct_enable=1,pa_enable=1,branch_filter=1,jitter=1,min_latency=0/
     0  dummy:u
     0  llc-miss
     0  tlb-miss
   853  branch-miss
     0  remote-access
     0  l1d-miss

 After:
     0  arm_spe_0/ts_enable=1,pct_enable=1,pa_enable=1,branch_filter=1,jitter=1,min_latency=0/
     0  dummy:u
     0  llc-miss
     0  tlb-miss
   250  branch-miss
     0  remote-access
     0  l1d-miss

 WFE Data:
 0x4C40 - WFI_WFE_WAIT_CYCLES - Number of cycles waiting at a WFI or
 WFE instruction.

 - WFE Cycles before the patch for Dual workslot
 # perf stat -C 20 -e r4C40 sleep 1

 Performance counter stats for 'CPU(s) 20':

               264      r4C40

       1.002494168 seconds time elapsed

 - WFE Cycles for single workslot
 # perf stat -C 20 -e r4C40 sleep 1

 Performance counter stats for 'CPU(s) 20':

       908,778,351      r4C40

       1.002598253 seconds time elapsed

 drivers/event/octeontx2/otx2_worker_dual.h | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

-- 
2.17.1

diff --git a/drivers/event/octeontx2/otx2_worker_dual.h b/drivers/event/octeontx2/otx2_worker_dual.h
index 5134e3d52..c88420eb4 100644
--- a/drivers/event/octeontx2/otx2_worker_dual.h
+++ b/drivers/event/octeontx2/otx2_worker_dual.h
@@ -29,11 +29,7 @@ otx2_ssogws_dual_get_work(struct otx2_ssogws_state *ws,
 	rte_prefetch_non_temporal(lookup_mem);
 #ifdef RTE_ARCH_ARM64
 	asm volatile(
-			"        ldr %[tag], [%[tag_loc]]    \n"
-			"        ldr %[wqp], [%[wqp_loc]]    \n"
-			"        tbz %[tag], 63, done%=      \n"
-			"        sevl                        \n"
-			"rty%=:  wfe                         \n"
+			"rty%=:                              \n"
 			"        ldr %[tag], [%[tag_loc]]    \n"
 			"        ldr %[wqp], [%[wqp_loc]]    \n"
 			"        tbnz %[tag], 63, rty%=      \n"
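
For readers less familiar with the inline assembly, the control flow of
the two loop shapes in the hunk above can be approximated in portable C.
This is only a sketch under stated assumptions, not driver code:
struct fake_slot, fake_read_tag(), wait_for_event_hint() and both
get_work_* functions are hypothetical names introduced for illustration,
the WFE is modelled as a no-op, and bit 63 of the tag is assumed to mean
"work not ready yet" because the assembly retries while that bit is set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit 63 of the tag word is the bit tested by tbz/tbnz in the patch. */
#define TAG_PENDING_BIT (UINT64_C(1) << 63)

/* Hypothetical stand-in for a workslot's tag and WQP locations. */
struct fake_slot {
	uint64_t tag;           /* tag value once work is ready */
	uint64_t wqp;           /* work-queue pointer value */
	unsigned pending_reads; /* reads that still see bit 63 set */
	unsigned polls;         /* number of tag reads performed */
};

/* Models "ldr %[tag], [%[tag_loc]]" against the fake slot. */
static inline uint64_t fake_read_tag(struct fake_slot *s)
{
	s->polls++;
	if (s->pending_reads > 0) {
		s->pending_reads--;
		return s->tag | TAG_PENDING_BIT;
	}
	return s->tag & ~TAG_PENDING_BIT;
}

/* Placeholder for WFE; a no-op here so the sketch runs anywhere. */
static inline void wait_for_event_hint(void)
{
}

/* Loop shape before the patch: early exit, then WFE plus re-read. */
static inline uint64_t
get_work_with_wait_hint(struct fake_slot *s, uint64_t *wqp)
{
	uint64_t tag;

	assert(s != NULL && wqp != NULL);
	tag = fake_read_tag(s);
	*wqp = s->wqp;
	if (!(tag & TAG_PENDING_BIT))     /* tbz %[tag], 63, done */
		return tag;
	do {
		wait_for_event_hint();    /* wfe */
		tag = fake_read_tag(s);
		*wqp = s->wqp;
	} while (tag & TAG_PENDING_BIT);  /* tbnz %[tag], 63, rty */
	return tag;
}

/* Loop shape after the patch: plain re-read until bit 63 clears. */
static inline uint64_t
get_work_busy_poll(struct fake_slot *s, uint64_t *wqp)
{
	uint64_t tag;

	assert(s != NULL && wqp != NULL);
	do {
		tag = fake_read_tag(s);
		*wqp = s->wqp;
	} while (tag & TAG_PENDING_BIT);  /* tbnz %[tag], 63, rty */
	return tag;
}
```

Given the same fake slot, both functions return the same tag and WQP
after the same number of tag reads; the post-patch shape drops the
early-exit branch and the wait hint, which is the change the commit
message describes.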