From patchwork Tue Jul 11 10:24:46 2023
X-Patchwork-Submitter: Dongdong Liu
X-Patchwork-Id: 129443
X-Patchwork-Delegate: ferruh.yigit@amd.com
From: Dongdong Liu
Subject: [PATCH 3/5] net/hns3: optimize free mbuf code for SVE Tx
Date: Tue, 11 Jul 2023 18:24:46 +0800
Message-ID: <20230711102448.11627-4-liudongdong3@huawei.com>
In-Reply-To: <20230711102448.11627-1-liudongdong3@huawei.com>
References: <20230711102448.11627-1-liudongdong3@huawei.com>
List-Id: DPDK patches and discussions

From: Huisong Li

Currently, hns3 SVE Tx checks the valid bits of all descriptors in a
batch before deciding whether to release the corresponding mbufs.
However, as soon as it finds a descriptor whose valid bit has not been
cleared, the driver can stop scanning the remaining descriptors.
Optimizing the SVE algorithm along these lines improves single-queue
performance for 64B packets by ~2% in txonly forwarding mode, while
using C code to scan the descriptors improves it by ~8%. So this patch
switches this path to the C implementation to improve SVE Tx
performance.

Signed-off-by: Huisong Li
Signed-off-by: Dongdong Liu
---
 drivers/net/hns3/hns3_rxtx_vec_sve.c | 42 +---------------------------
 1 file changed, 1 insertion(+), 41 deletions(-)

diff --git a/drivers/net/hns3/hns3_rxtx_vec_sve.c b/drivers/net/hns3/hns3_rxtx_vec_sve.c
index 8bfc3de049..5011544e07 100644
--- a/drivers/net/hns3/hns3_rxtx_vec_sve.c
+++ b/drivers/net/hns3/hns3_rxtx_vec_sve.c
@@ -337,46 +337,6 @@ hns3_recv_pkts_vec_sve(void *__restrict rx_queue,
 	return nb_rx;
 }
 
-static inline void
-hns3_tx_free_buffers_sve(struct hns3_tx_queue *txq)
-{
-#define HNS3_SVE_CHECK_DESCS_PER_LOOP	8
-#define TX_VLD_U8_ZIP_INDEX		svindex_u8(0, 4)
-	svbool_t pg32 = svwhilelt_b32(0, HNS3_SVE_CHECK_DESCS_PER_LOOP);
-	svuint32_t vld, vld2;
-	svuint8_t vld_u8;
-	uint64_t vld_all;
-	struct hns3_desc *tx_desc;
-	int i;
-
-	/*
-	 * All mbufs can be released only when the VLD bits of all
-	 * descriptors in a batch are cleared.
-	 */
-	/* do logical OR operation for all desc's valid field */
-	vld = svdup_n_u32(0);
-	tx_desc = &txq->tx_ring[txq->next_to_clean];
-	for (i = 0; i < txq->tx_rs_thresh; i += HNS3_SVE_CHECK_DESCS_PER_LOOP,
-			tx_desc += HNS3_SVE_CHECK_DESCS_PER_LOOP) {
-		vld2 = svld1_gather_u32offset_u32(pg32, (uint32_t *)tx_desc,
-				svindex_u32(BD_FIELD_VALID_OFFSET, BD_SIZE));
-		vld = svorr_u32_z(pg32, vld, vld2);
-	}
-	/* shift left and then right to get all valid bit */
-	vld = svlsl_n_u32_z(pg32, vld,
-			    HNS3_UINT32_BIT - 1 - HNS3_TXD_VLD_B);
-	vld = svreinterpret_u32_s32(svasr_n_s32_z(pg32,
-		svreinterpret_s32_u32(vld), HNS3_UINT32_BIT - 1));
-	/* use tbl to compress 32bit-lane to 8bit-lane */
-	vld_u8 = svtbl_u8(svreinterpret_u8_u32(vld), TX_VLD_U8_ZIP_INDEX);
-	/* dump compressed 64bit to variable */
-	svst1_u64(PG64_64BIT, &vld_all, svreinterpret_u64_u8(vld_u8));
-	if (vld_all > 0)
-		return;
-
-	hns3_tx_bulk_free_buffers(txq);
-}
-
 static inline void
 hns3_tx_fill_hw_ring_sve(struct hns3_tx_queue *txq,
 			 struct rte_mbuf **pkts,
@@ -462,7 +422,7 @@ hns3_xmit_fixed_burst_vec_sve(void *__restrict tx_queue,
 	uint16_t nb_tx = 0;
 
 	if (txq->tx_bd_ready < txq->tx_free_thresh)
-		hns3_tx_free_buffers_sve(txq);
+		hns3_tx_free_buffers(txq);
 
 	nb_pkts = RTE_MIN(txq->tx_bd_ready, nb_pkts);
 	if (unlikely(nb_pkts == 0)) {
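
For context on why the scalar path wins: it can return as soon as it sees a
descriptor whose VLD bit is still set, which is the early exit the commit
message credits for the ~8% gain, whereas the removed SVE routine always
gathers and OR-reduces the whole batch before deciding. Below is a minimal
sketch of such an early-exit scan, not the actual hns3_tx_free_buffers() in
hns3_rxtx.c. It reuses BD_FIELD_VALID_OFFSET, HNS3_TXD_VLD_B and
hns3_tx_bulk_free_buffers() from the removed function, assumes the hns3
driver headers and <rte_byteorder.h> are in scope, and uses a hypothetical
function name.

/*
 * Illustrative early-exit scan (sketch only): the batch of mbufs is
 * released only when no descriptor in it still has its VLD bit set.
 */
static inline void
hns3_tx_free_buffers_scalar_sketch(struct hns3_tx_queue *txq)
{
	struct hns3_desc *tx_desc = &txq->tx_ring[txq->next_to_clean];
	uint32_t valid;
	int i;

	for (i = 0; i < txq->tx_rs_thresh; i++, tx_desc++) {
		/* Read the 32-bit word of the BD that holds the VLD bit. */
		valid = rte_le_to_cpu_32(*(const volatile uint32_t *)
				((const char *)tx_desc + BD_FIELD_VALID_OFFSET));
		/* Hardware still owns this descriptor: stop scanning. */
		if (valid & (1u << HNS3_TXD_VLD_B))
			return;
	}

	/* Every descriptor in the batch is done; bulk-free the mbufs. */
	hns3_tx_bulk_free_buffers(txq);
}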