From patchwork Tue Jul 11 10:24:44 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dongdong Liu X-Patchwork-Id: 129442 X-Patchwork-Delegate: ferruh.yigit@amd.com Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 659EE42E44; Tue, 11 Jul 2023 12:28:01 +0200 (CEST) Received: from mails.dpdk.org (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id E484642BAC; Tue, 11 Jul 2023 12:27:57 +0200 (CEST) Received: from szxga08-in.huawei.com (szxga08-in.huawei.com [45.249.212.255]) by mails.dpdk.org (Postfix) with ESMTP id 947C740A7D; Tue, 11 Jul 2023 12:27:55 +0200 (CEST) Received: from kwepemi500017.china.huawei.com (unknown [172.30.72.57]) by szxga08-in.huawei.com (SkyGuard) with ESMTP id 4R0cWD2F9Mz1FDml; Tue, 11 Jul 2023 18:27:20 +0800 (CST) Received: from localhost.localdomain (10.28.79.22) by kwepemi500017.china.huawei.com (7.221.188.110) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.27; Tue, 11 Jul 2023 18:27:53 +0800 From: Dongdong Liu To: , , , CC: Subject: [PATCH 1/5] net/hns3: fix incorrect index to look up table in NEON Rx Date: Tue, 11 Jul 2023 18:24:44 +0800 Message-ID: <20230711102448.11627-2-liudongdong3@huawei.com> X-Mailer: git-send-email 2.22.0 In-Reply-To: <20230711102448.11627-1-liudongdong3@huawei.com> References: <20230711102448.11627-1-liudongdong3@huawei.com> MIME-Version: 1.0 X-Originating-IP: [10.28.79.22] X-ClientProxiedBy: dggems701-chm.china.huawei.com (10.3.19.178) To kwepemi500017.china.huawei.com (7.221.188.110) X-CFilter-Loop: Reflected X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org From: Huisong Li In hns3_recv_burst_vec(), the indexes used to extract the packet length and the data size from the descriptor are reversed. Fortunately, this does not affect functionality, because NEON Rx supports only a single BD per packet, in which case the packet length equals the data size. This patch swaps the indexes back to their correct positions.
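The fix is easier to follow with a scalar model of the table-lookup shuffle the NEON code relies on: every entry of the mask selects one source byte from the descriptor (0xff yields zero), so swapping the 20/21 and 22/23 entries swaps which descriptor field ends up in the mbuf's pkt_len and data_len. The snippet below is only an illustration with a made-up 32-byte descriptor layout; it is not the hns3 structures or the driver's actual code.

#include <stdint.h>
#include <stdio.h>

/* Scalar model of a vqtbl2q_u8-style shuffle: each output byte is either a
 * copy of the source byte named by the mask or zero when the mask is 0xff. */
static void shuffle_bytes(const uint8_t *src, const uint8_t *mask,
			  uint8_t *dst, unsigned len)
{
	for (unsigned i = 0; i < len; i++)
		dst[i] = (mask[i] == 0xff) ? 0 : src[mask[i]];
}

int main(void)
{
	/* Hypothetical descriptor: bytes 20-21 hold pkt_len and bytes 22-23
	 * hold the data size, little-endian, purely for illustration. */
	uint8_t desc[32] = {0};
	desc[20] = 0x40;	/* pkt_len   = 64 */
	desc[22] = 0x3c;	/* data size = 60 */

	/* Entries 4-5 feed pkt_len and entries 8-9 feed data_len, mirroring
	 * the layout of the mbuf's rx_descriptor_fields1 area in spirit. */
	const uint8_t mask[16] = {
		0xff, 0xff, 0xff, 0xff,	/* packet type: init to zero */
		20, 21, 0xff, 0xff,	/* descriptor pkt_len -> mbuf pkt_len */
		22, 23,			/* descriptor size -> mbuf data_len */
		0xff, 0xff,		/* vlan_tci: init to zero */
		0xff, 0xff, 0xff, 0xff,	/* rss hash omitted in this toy */
	};

	uint8_t out[16];
	shuffle_bytes(desc, mask, out, sizeof(out));
	printf("pkt_len=%u data_len=%u\n",
	       (unsigned)(out[4] | (out[5] << 8)),
	       (unsigned)(out[8] | (out[9] << 8)));
	return 0;
}

With the pre-fix ordering (22/23 before 20/21) the two printed values would simply trade places, which is exactly the bug being corrected; it went unnoticed because single-BD packets have identical packet length and data size.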
Fixes: a3d4f4d291d7 ("net/hns3: support NEON Rx") Cc: stable@dpdk.org Signed-off-by: Huisong Li Signed-off-by: Dongdong Liu --- drivers/net/hns3/hns3_rxtx_vec_neon.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/hns3/hns3_rxtx_vec_neon.h b/drivers/net/hns3/hns3_rxtx_vec_neon.h index 6c49c70fc7..564d831a48 100644 --- a/drivers/net/hns3/hns3_rxtx_vec_neon.h +++ b/drivers/net/hns3/hns3_rxtx_vec_neon.h @@ -142,8 +142,8 @@ hns3_recv_burst_vec(struct hns3_rx_queue *__restrict rxq, /* mask to shuffle from desc to mbuf's rx_descriptor_fields1 */ uint8x16_t shuf_desc_fields_msk = { 0xff, 0xff, 0xff, 0xff, /* packet type init zero */ - 22, 23, 0xff, 0xff, /* rx.pkt_len to rte_mbuf.pkt_len */ - 20, 21, /* size to rte_mbuf.data_len */ + 20, 21, 0xff, 0xff, /* rx.pkt_len to rte_mbuf.pkt_len */ + 22, 23, /* size to rte_mbuf.data_len */ 0xff, 0xff, /* rte_mbuf.vlan_tci init zero */ 8, 9, 10, 11, /* rx.rss_hash to rte_mbuf.hash.rss */ }; From patchwork Tue Jul 11 10:24:45 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dongdong Liu X-Patchwork-Id: 129445 X-Patchwork-Delegate: ferruh.yigit@amd.com Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 72E2D42E44; Tue, 11 Jul 2023 12:28:23 +0200 (CEST) Received: from mails.dpdk.org (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id B3B5642D33; Tue, 11 Jul 2023 12:28:02 +0200 (CEST) Received: from szxga02-in.huawei.com (szxga02-in.huawei.com [45.249.212.188]) by mails.dpdk.org (Postfix) with ESMTP id CFDDE42D20; Tue, 11 Jul 2023 12:27:59 +0200 (CEST) Received: from kwepemi500017.china.huawei.com (unknown [172.30.72.54]) by szxga02-in.huawei.com (SkyGuard) with ESMTP id 4R0cVV2D1CzVjTw; Tue, 11 Jul 2023 18:26:42 +0800 (CST) Received: from localhost.localdomain (10.28.79.22) by kwepemi500017.china.huawei.com (7.221.188.110) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.27; Tue, 11 Jul 2023 18:27:53 +0800 From: Dongdong Liu To: , , , CC: Subject: [PATCH 2/5] net/hns3: fix the order of NEON Rx code Date: Tue, 11 Jul 2023 18:24:45 +0800 Message-ID: <20230711102448.11627-3-liudongdong3@huawei.com> X-Mailer: git-send-email 2.22.0 In-Reply-To: <20230711102448.11627-1-liudongdong3@huawei.com> References: <20230711102448.11627-1-liudongdong3@huawei.com> MIME-Version: 1.0 X-Originating-IP: [10.28.79.22] X-ClientProxiedBy: dggems701-chm.china.huawei.com (10.3.19.178) To kwepemi500017.china.huawei.com (7.221.188.110) X-CFilter-Loop: Reflected X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org From: Huisong Li This patch reorders the NEON Rx code to make it easier to maintain and understand.
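For orientation before reading the diff below, the reordered per-iteration flow can be summarised as: compute the number of valid BDs first and leave early if there are none, then load and store the four mbuf pointers as one batch, then read and convert the four descriptors as one batch. The sketch below restates that shape in plain C with hypothetical minimal types; it is not the driver's code and omits the shuffle, CRC and rearm-data details.

#include <stdio.h>
#include <stdint.h>

#define DESCS_PER_LOOP 4

/* Hypothetical stand-ins for the descriptor ring, software ring and mbuf. */
struct toy_desc  { uint16_t pkt_len; uint16_t data_len; uint8_t valid; };
struct toy_mbuf  { uint16_t pkt_len; uint16_t data_len; };
struct toy_entry { struct toy_mbuf *mbuf; };

/* Number of descriptors, starting at d[0], that are already valid. */
static unsigned count_valid(const struct toy_desc *d)
{
	unsigned n = 0;

	while (n < DESCS_PER_LOOP && d[n].valid)
		n++;
	return n;
}

/* Same ordering as the reworked loop: validity check, early break,
 * batched pointer handling, then batched descriptor parsing. */
static unsigned recv_burst(const struct toy_desc *ring,
			   struct toy_entry *sw_ring,
			   struct toy_mbuf **rx_pkts, unsigned nb)
{
	unsigned nb_rx = 0;

	for (unsigned pos = 0; pos + DESCS_PER_LOOP <= nb;
	     pos += DESCS_PER_LOOP) {
		unsigned valid = count_valid(&ring[pos]);

		if (valid == 0)
			break;

		for (unsigned i = 0; i < DESCS_PER_LOOP; i++)
			rx_pkts[pos + i] = sw_ring[pos + i].mbuf;

		for (unsigned i = 0; i < valid; i++) {
			rx_pkts[pos + i]->pkt_len  = ring[pos + i].pkt_len;
			rx_pkts[pos + i]->data_len = ring[pos + i].data_len;
		}

		nb_rx += valid;
		if (valid < DESCS_PER_LOOP)
			break;
	}
	return nb_rx;
}

int main(void)
{
	struct toy_desc ring[8] = {
		{60, 60, 1}, {60, 60, 1}, {60, 60, 1}, {60, 60, 1},
		{60, 60, 1}, {60, 60, 1}, {0, 0, 0}, {0, 0, 0},
	};
	struct toy_mbuf bufs[8] = {{0, 0}};
	struct toy_entry sw_ring[8];
	struct toy_mbuf *rx_pkts[8];

	for (unsigned i = 0; i < 8; i++)
		sw_ring[i].mbuf = &bufs[i];

	printf("received %u packets\n", recv_burst(ring, sw_ring, rx_pkts, 8));
	return 0;
}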
Fixes: a3d4f4d291d7 ("net/hns3: support NEON Rx") Cc: stable@dpdk.org Signed-off-by: Huisong Li Signed-off-by: Dongdong Liu --- drivers/net/hns3/hns3_rxtx_vec_neon.h | 78 +++++++++++---------------- 1 file changed, 31 insertions(+), 47 deletions(-) diff --git a/drivers/net/hns3/hns3_rxtx_vec_neon.h b/drivers/net/hns3/hns3_rxtx_vec_neon.h index 564d831a48..0dc6b9f0a2 100644 --- a/drivers/net/hns3/hns3_rxtx_vec_neon.h +++ b/drivers/net/hns3/hns3_rxtx_vec_neon.h @@ -180,19 +180,12 @@ hns3_recv_burst_vec(struct hns3_rx_queue *__restrict rxq, bd_vld = vset_lane_u16(rxdp[2].rx.bdtype_vld_udp0, bd_vld, 2); bd_vld = vset_lane_u16(rxdp[3].rx.bdtype_vld_udp0, bd_vld, 3); - /* load 2 mbuf pointer */ - mbp1 = vld1q_u64((uint64_t *)&sw_ring[pos]); - bd_vld = vshl_n_u16(bd_vld, HNS3_UINT16_BIT - 1 - HNS3_RXD_VLD_B); bd_vld = vreinterpret_u16_s16( vshr_n_s16(vreinterpret_s16_u16(bd_vld), HNS3_UINT16_BIT - 1)); stat = ~vget_lane_u64(vreinterpret_u64_u16(bd_vld), 0); - - /* load 2 mbuf pointer again */ - mbp2 = vld1q_u64((uint64_t *)&sw_ring[pos + 2]); - if (likely(stat == 0)) bd_valid_num = HNS3_DEFAULT_DESCS_PER_LOOP; else @@ -200,20 +193,20 @@ hns3_recv_burst_vec(struct hns3_rx_queue *__restrict rxq, if (bd_valid_num == 0) break; - /* use offset to control below data load oper ordering */ - offset = rxq->offset_table[bd_valid_num]; + /* load 4 mbuf pointer */ + mbp1 = vld1q_u64((uint64_t *)&sw_ring[pos]); + mbp2 = vld1q_u64((uint64_t *)&sw_ring[pos + 2]); - /* store 2 mbuf pointer into rx_pkts */ + /* store 4 mbuf pointer into rx_pkts */ vst1q_u64((uint64_t *)&rx_pkts[pos], mbp1); + vst1q_u64((uint64_t *)&rx_pkts[pos + 2], mbp2); - /* read first two descs */ + /* use offset to control below data load oper ordering */ + offset = rxq->offset_table[bd_valid_num]; + + /* read 4 descs */ descs[0] = vld2q_u64((uint64_t *)(rxdp + offset)); descs[1] = vld2q_u64((uint64_t *)(rxdp + offset + 1)); - - /* store 2 mbuf pointer into rx_pkts again */ - vst1q_u64((uint64_t *)&rx_pkts[pos + 2], mbp2); - - /* read remains two descs */ descs[2] = vld2q_u64((uint64_t *)(rxdp + offset + 2)); descs[3] = vld2q_u64((uint64_t *)(rxdp + offset + 3)); @@ -221,56 +214,47 @@ hns3_recv_burst_vec(struct hns3_rx_queue *__restrict rxq, pkt_mbuf1.val[1] = vreinterpretq_u8_u64(descs[0].val[1]); pkt_mbuf2.val[0] = vreinterpretq_u8_u64(descs[1].val[0]); pkt_mbuf2.val[1] = vreinterpretq_u8_u64(descs[1].val[1]); + pkt_mbuf3.val[0] = vreinterpretq_u8_u64(descs[2].val[0]); + pkt_mbuf3.val[1] = vreinterpretq_u8_u64(descs[2].val[1]); + pkt_mbuf4.val[0] = vreinterpretq_u8_u64(descs[3].val[0]); + pkt_mbuf4.val[1] = vreinterpretq_u8_u64(descs[3].val[1]); - /* pkt 1,2 convert format from desc to pktmbuf */ + /* 4 packets convert format from desc to pktmbuf */ pkt_mb1 = vqtbl2q_u8(pkt_mbuf1, shuf_desc_fields_msk); pkt_mb2 = vqtbl2q_u8(pkt_mbuf2, shuf_desc_fields_msk); + pkt_mb3 = vqtbl2q_u8(pkt_mbuf3, shuf_desc_fields_msk); + pkt_mb4 = vqtbl2q_u8(pkt_mbuf4, shuf_desc_fields_msk); - /* store the first 8 bytes of pkt 1,2 mbuf's rearm_data */ - *(uint64_t *)&sw_ring[pos + 0].mbuf->rearm_data = - rxq->mbuf_initializer; - *(uint64_t *)&sw_ring[pos + 1].mbuf->rearm_data = - rxq->mbuf_initializer; - - /* pkt 1,2 remove crc */ + /* 4 packets remove crc */ tmp = vsubq_u16(vreinterpretq_u16_u8(pkt_mb1), crc_adjust); pkt_mb1 = vreinterpretq_u8_u16(tmp); tmp = vsubq_u16(vreinterpretq_u16_u8(pkt_mb2), crc_adjust); pkt_mb2 = vreinterpretq_u8_u16(tmp); + tmp = vsubq_u16(vreinterpretq_u16_u8(pkt_mb3), crc_adjust); + pkt_mb3 = vreinterpretq_u8_u16(tmp); + tmp = 
vsubq_u16(vreinterpretq_u16_u8(pkt_mb4), crc_adjust); + pkt_mb4 = vreinterpretq_u8_u16(tmp); - pkt_mbuf3.val[0] = vreinterpretq_u8_u64(descs[2].val[0]); - pkt_mbuf3.val[1] = vreinterpretq_u8_u64(descs[2].val[1]); - pkt_mbuf4.val[0] = vreinterpretq_u8_u64(descs[3].val[0]); - pkt_mbuf4.val[1] = vreinterpretq_u8_u64(descs[3].val[1]); - - /* pkt 3,4 convert format from desc to pktmbuf */ - pkt_mb3 = vqtbl2q_u8(pkt_mbuf3, shuf_desc_fields_msk); - pkt_mb4 = vqtbl2q_u8(pkt_mbuf4, shuf_desc_fields_msk); - - /* pkt 1,2 save to rx_pkts mbuf */ + /* save packet info to rx_pkts mbuf */ vst1q_u8((void *)&sw_ring[pos + 0].mbuf->rx_descriptor_fields1, pkt_mb1); vst1q_u8((void *)&sw_ring[pos + 1].mbuf->rx_descriptor_fields1, pkt_mb2); + vst1q_u8((void *)&sw_ring[pos + 2].mbuf->rx_descriptor_fields1, + pkt_mb3); + vst1q_u8((void *)&sw_ring[pos + 3].mbuf->rx_descriptor_fields1, + pkt_mb4); - /* pkt 3,4 remove crc */ - tmp = vsubq_u16(vreinterpretq_u16_u8(pkt_mb3), crc_adjust); - pkt_mb3 = vreinterpretq_u8_u16(tmp); - tmp = vsubq_u16(vreinterpretq_u16_u8(pkt_mb4), crc_adjust); - pkt_mb4 = vreinterpretq_u8_u16(tmp); - - /* store the first 8 bytes of pkt 3,4 mbuf's rearm_data */ + /* store the first 8 bytes of packets mbuf's rearm_data */ + *(uint64_t *)&sw_ring[pos + 0].mbuf->rearm_data = + rxq->mbuf_initializer; + *(uint64_t *)&sw_ring[pos + 1].mbuf->rearm_data = + rxq->mbuf_initializer; *(uint64_t *)&sw_ring[pos + 2].mbuf->rearm_data = rxq->mbuf_initializer; *(uint64_t *)&sw_ring[pos + 3].mbuf->rearm_data = rxq->mbuf_initializer; - /* pkt 3,4 save to rx_pkts mbuf */ - vst1q_u8((void *)&sw_ring[pos + 2].mbuf->rx_descriptor_fields1, - pkt_mb3); - vst1q_u8((void *)&sw_ring[pos + 3].mbuf->rx_descriptor_fields1, - pkt_mb4); - rte_prefetch_non_temporal(rxdp + HNS3_DEFAULT_DESCS_PER_LOOP); parse_retcode = hns3_desc_parse_field(rxq, &sw_ring[pos], From patchwork Tue Jul 11 10:24:46 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dongdong Liu X-Patchwork-Id: 129443 X-Patchwork-Delegate: ferruh.yigit@amd.com Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 4EF7F42E44; Tue, 11 Jul 2023 12:28:09 +0200 (CEST) Received: from mails.dpdk.org (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id CE79942D1D; Tue, 11 Jul 2023 12:27:59 +0200 (CEST) Received: from szxga02-in.huawei.com (szxga02-in.huawei.com [45.249.212.188]) by mails.dpdk.org (Postfix) with ESMTP id BD386410DC; Tue, 11 Jul 2023 12:27:55 +0200 (CEST) Received: from kwepemi500017.china.huawei.com (unknown [172.30.72.56]) by szxga02-in.huawei.com (SkyGuard) with ESMTP id 4R0cS54NX2zMqVJ; Tue, 11 Jul 2023 18:24:37 +0800 (CST) Received: from localhost.localdomain (10.28.79.22) by kwepemi500017.china.huawei.com (7.221.188.110) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.27; Tue, 11 Jul 2023 18:27:53 +0800 From: Dongdong Liu To: , , , CC: Subject: [PATCH 3/5] net/hns3: optimize free mbuf code for SVE Tx Date: Tue, 11 Jul 2023 18:24:46 +0800 Message-ID: <20230711102448.11627-4-liudongdong3@huawei.com> X-Mailer: git-send-email 2.22.0 In-Reply-To: <20230711102448.11627-1-liudongdong3@huawei.com> References: <20230711102448.11627-1-liudongdong3@huawei.com> MIME-Version: 1.0 X-Originating-IP: [10.28.79.22] X-ClientProxiedBy: dggems701-chm.china.huawei.com (10.3.19.178) To 
kwepemi500017.china.huawei.com (7.221.188.110) X-CFilter-Loop: Reflected X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org From: Huisong Li Currently, the hns3 SVE Tx path checks the valid bits of all descriptors in a batch and then decides whether to release the corresponding mbufs. In fact, once one descriptor in the batch still has its valid bit set, the driver does not need to scan the remaining descriptors. Optimizing the SVE algorithm along these lines improves single-queue performance for 64B packets by ~2% in txonly forwarding mode, while using C code to scan all descriptors improves it by ~8%. So this patch switches to the C code to improve SVE Tx performance. Signed-off-by: Huisong Li Signed-off-by: Dongdong Liu --- drivers/net/hns3/hns3_rxtx_vec_sve.c | 42 +--------------------------- 1 file changed, 1 insertion(+), 41 deletions(-) diff --git a/drivers/net/hns3/hns3_rxtx_vec_sve.c b/drivers/net/hns3/hns3_rxtx_vec_sve.c index 8bfc3de049..5011544e07 100644 --- a/drivers/net/hns3/hns3_rxtx_vec_sve.c +++ b/drivers/net/hns3/hns3_rxtx_vec_sve.c @@ -337,46 +337,6 @@ hns3_recv_pkts_vec_sve(void *__restrict rx_queue, return nb_rx; } -static inline void -hns3_tx_free_buffers_sve(struct hns3_tx_queue *txq) -{ -#define HNS3_SVE_CHECK_DESCS_PER_LOOP 8 -#define TX_VLD_U8_ZIP_INDEX svindex_u8(0, 4) - svbool_t pg32 = svwhilelt_b32(0, HNS3_SVE_CHECK_DESCS_PER_LOOP); - svuint32_t vld, vld2; - svuint8_t vld_u8; - uint64_t vld_all; - struct hns3_desc *tx_desc; - int i; - - /* - * All mbufs can be released only when the VLD bits of all - * descriptors in a batch are cleared.
- */ - /* do logical OR operation for all desc's valid field */ - vld = svdup_n_u32(0); - tx_desc = &txq->tx_ring[txq->next_to_clean]; - for (i = 0; i < txq->tx_rs_thresh; i += HNS3_SVE_CHECK_DESCS_PER_LOOP, - tx_desc += HNS3_SVE_CHECK_DESCS_PER_LOOP) { - vld2 = svld1_gather_u32offset_u32(pg32, (uint32_t *)tx_desc, - svindex_u32(BD_FIELD_VALID_OFFSET, BD_SIZE)); - vld = svorr_u32_z(pg32, vld, vld2); - } - /* shift left and then right to get all valid bit */ - vld = svlsl_n_u32_z(pg32, vld, - HNS3_UINT32_BIT - 1 - HNS3_TXD_VLD_B); - vld = svreinterpret_u32_s32(svasr_n_s32_z(pg32, - svreinterpret_s32_u32(vld), HNS3_UINT32_BIT - 1)); - /* use tbl to compress 32bit-lane to 8bit-lane */ - vld_u8 = svtbl_u8(svreinterpret_u8_u32(vld), TX_VLD_U8_ZIP_INDEX); - /* dump compressed 64bit to variable */ - svst1_u64(PG64_64BIT, &vld_all, svreinterpret_u64_u8(vld_u8)); - if (vld_all > 0) - return; - - hns3_tx_bulk_free_buffers(txq); -} - static inline void hns3_tx_fill_hw_ring_sve(struct hns3_tx_queue *txq, struct rte_mbuf **pkts, @@ -462,7 +422,7 @@ hns3_xmit_fixed_burst_vec_sve(void *__restrict tx_queue, uint16_t nb_tx = 0; if (txq->tx_bd_ready < txq->tx_free_thresh) - hns3_tx_free_buffers_sve(txq); + hns3_tx_free_buffers(txq); nb_pkts = RTE_MIN(txq->tx_bd_ready, nb_pkts); if (unlikely(nb_pkts == 0)) { From patchwork Tue Jul 11 10:24:47 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dongdong Liu X-Patchwork-Id: 129444 X-Patchwork-Delegate: ferruh.yigit@amd.com Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 31C6742E44; Tue, 11 Jul 2023 12:28:17 +0200 (CEST) Received: from mails.dpdk.org (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 8188042D38; Tue, 11 Jul 2023 12:28:01 +0200 (CEST) Received: from szxga01-in.huawei.com (szxga01-in.huawei.com [45.249.212.187]) by mails.dpdk.org (Postfix) with ESMTP id 0FAC34003C; Tue, 11 Jul 2023 12:27:56 +0200 (CEST) Received: from kwepemi500017.china.huawei.com (unknown [172.30.72.54]) by szxga01-in.huawei.com (SkyGuard) with ESMTP id 4R0cSS18bCztRF1; Tue, 11 Jul 2023 18:24:56 +0800 (CST) Received: from localhost.localdomain (10.28.79.22) by kwepemi500017.china.huawei.com (7.221.188.110) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.27; Tue, 11 Jul 2023 18:27:54 +0800 From: Dongdong Liu To: , , , CC: Subject: [PATCH 4/5] net/hns3: optimize the rearm mbuf function for SVE Rx Date: Tue, 11 Jul 2023 18:24:47 +0800 Message-ID: <20230711102448.11627-5-liudongdong3@huawei.com> X-Mailer: git-send-email 2.22.0 In-Reply-To: <20230711102448.11627-1-liudongdong3@huawei.com> References: <20230711102448.11627-1-liudongdong3@huawei.com> MIME-Version: 1.0 X-Originating-IP: [10.28.79.22] X-ClientProxiedBy: dggems701-chm.china.huawei.com (10.3.19.178) To kwepemi500017.china.huawei.com (7.221.188.110) X-CFilter-Loop: Reflected X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org From: Huisong Li Replace hns3_rxq_rearm_mbuf_sve() with hns3_rxq_rearm_mbuf() to optimize SVE Rx performance. In rxonly forwarding mode, single-queue performance for 64B packets is improved by ~15%.
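The hns3_rxq_rearm_mbuf() helper that now also serves the SVE path follows a common vector-PMD rearm pattern: take a whole burst of mbufs from the mempool in one call, refill the descriptor ring in unrolled steps of four, then advance the rearm index and write the head register (those last steps are visible in the diff below). The sketch that follows only restates the shape of that loop with hypothetical types and a stand-in allocator; it is not the driver's implementation.

#include <stdint.h>
#include <stdlib.h>

#define REARM_THRESH	32	/* hypothetical rearm burst size */
#define REARM_STEP	4

struct toy_mbuf  { uint64_t iova; };	/* DMA address of the buffer */
struct toy_desc  { uint64_t addr; uint32_t bd_base_info; };
struct toy_entry { struct toy_mbuf *mbuf; };

/* Stand-in for rte_mempool_get_bulk(): one allocation covers the whole
 * burst (never freed here, this is only a toy). */
static int toy_alloc_bulk(struct toy_mbuf **mbufs, unsigned n)
{
	struct toy_mbuf *pool = calloc(n, sizeof(*pool));

	if (pool == NULL)
		return -1;
	for (unsigned i = 0; i < n; i++)
		mbufs[i] = &pool[i];
	return 0;
}

/* Shape of the rearm loop: bulk allocate, then refill descriptors in
 * unrolled steps of four; the real function also bumps an allocation
 * failure counter, wraps rx_rearm_start and writes the head register. */
static void toy_rearm(struct toy_entry *rxep, struct toy_desc *rxdp)
{
	struct toy_mbuf *mbufs[REARM_THRESH];

	if (toy_alloc_bulk(mbufs, REARM_THRESH) < 0)
		return;

	for (unsigned i = 0; i < REARM_THRESH; i++)
		rxep[i].mbuf = mbufs[i];

	for (unsigned i = 0; i < REARM_THRESH; i += REARM_STEP)
		for (unsigned j = 0; j < REARM_STEP; j++) {
			rxdp[i + j].addr = rxep[i + j].mbuf->iova;
			rxdp[i + j].bd_base_info = 0;	/* clear the valid bit */
		}
}

int main(void)
{
	static struct toy_entry sw_ring[REARM_THRESH];
	static struct toy_desc rx_ring[REARM_THRESH];

	toy_rearm(sw_ring, rx_ring);
	return 0;
}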
Signed-off-by: Huisong Li Signed-off-by: Dongdong Liu --- drivers/net/hns3/hns3_rxtx_vec.c | 51 --------------------------- drivers/net/hns3/hns3_rxtx_vec.h | 51 +++++++++++++++++++++++++++ drivers/net/hns3/hns3_rxtx_vec_sve.c | 52 ++-------------------------- 3 files changed, 53 insertions(+), 101 deletions(-) diff --git a/drivers/net/hns3/hns3_rxtx_vec.c b/drivers/net/hns3/hns3_rxtx_vec.c index cd9264d91b..9708ec614e 100644 --- a/drivers/net/hns3/hns3_rxtx_vec.c +++ b/drivers/net/hns3/hns3_rxtx_vec.c @@ -55,57 +55,6 @@ hns3_xmit_pkts_vec(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts) return nb_tx; } -static inline void -hns3_rxq_rearm_mbuf(struct hns3_rx_queue *rxq) -{ -#define REARM_LOOP_STEP_NUM 4 - struct hns3_entry *rxep = &rxq->sw_ring[rxq->rx_rearm_start]; - struct hns3_desc *rxdp = rxq->rx_ring + rxq->rx_rearm_start; - uint64_t dma_addr; - int i; - - if (unlikely(rte_mempool_get_bulk(rxq->mb_pool, (void *)rxep, - HNS3_DEFAULT_RXQ_REARM_THRESH) < 0)) { - rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed++; - return; - } - - for (i = 0; i < HNS3_DEFAULT_RXQ_REARM_THRESH; i += REARM_LOOP_STEP_NUM, - rxep += REARM_LOOP_STEP_NUM, rxdp += REARM_LOOP_STEP_NUM) { - if (likely(i < - HNS3_DEFAULT_RXQ_REARM_THRESH - REARM_LOOP_STEP_NUM)) { - rte_prefetch_non_temporal(rxep[4].mbuf); - rte_prefetch_non_temporal(rxep[5].mbuf); - rte_prefetch_non_temporal(rxep[6].mbuf); - rte_prefetch_non_temporal(rxep[7].mbuf); - } - - dma_addr = rte_mbuf_data_iova_default(rxep[0].mbuf); - rxdp[0].addr = rte_cpu_to_le_64(dma_addr); - rxdp[0].rx.bd_base_info = 0; - - dma_addr = rte_mbuf_data_iova_default(rxep[1].mbuf); - rxdp[1].addr = rte_cpu_to_le_64(dma_addr); - rxdp[1].rx.bd_base_info = 0; - - dma_addr = rte_mbuf_data_iova_default(rxep[2].mbuf); - rxdp[2].addr = rte_cpu_to_le_64(dma_addr); - rxdp[2].rx.bd_base_info = 0; - - dma_addr = rte_mbuf_data_iova_default(rxep[3].mbuf); - rxdp[3].addr = rte_cpu_to_le_64(dma_addr); - rxdp[3].rx.bd_base_info = 0; - } - - rxq->rx_rearm_start += HNS3_DEFAULT_RXQ_REARM_THRESH; - if (rxq->rx_rearm_start >= rxq->nb_rx_desc) - rxq->rx_rearm_start = 0; - - rxq->rx_rearm_nb -= HNS3_DEFAULT_RXQ_REARM_THRESH; - - hns3_write_reg_opt(rxq->io_head_reg, HNS3_DEFAULT_RXQ_REARM_THRESH); -} - uint16_t hns3_recv_pkts_vec(void *__restrict rx_queue, struct rte_mbuf **__restrict rx_pkts, diff --git a/drivers/net/hns3/hns3_rxtx_vec.h b/drivers/net/hns3/hns3_rxtx_vec.h index 2c8a91921e..a9a6774294 100644 --- a/drivers/net/hns3/hns3_rxtx_vec.h +++ b/drivers/net/hns3/hns3_rxtx_vec.h @@ -94,4 +94,55 @@ hns3_rx_reassemble_pkts(struct rte_mbuf **rx_pkts, return count; } + +static inline void +hns3_rxq_rearm_mbuf(struct hns3_rx_queue *rxq) +{ +#define REARM_LOOP_STEP_NUM 4 + struct hns3_entry *rxep = &rxq->sw_ring[rxq->rx_rearm_start]; + struct hns3_desc *rxdp = rxq->rx_ring + rxq->rx_rearm_start; + uint64_t dma_addr; + int i; + + if (unlikely(rte_mempool_get_bulk(rxq->mb_pool, (void *)rxep, + HNS3_DEFAULT_RXQ_REARM_THRESH) < 0)) { + rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed++; + return; + } + + for (i = 0; i < HNS3_DEFAULT_RXQ_REARM_THRESH; i += REARM_LOOP_STEP_NUM, + rxep += REARM_LOOP_STEP_NUM, rxdp += REARM_LOOP_STEP_NUM) { + if (likely(i < + HNS3_DEFAULT_RXQ_REARM_THRESH - REARM_LOOP_STEP_NUM)) { + rte_prefetch_non_temporal(rxep[4].mbuf); + rte_prefetch_non_temporal(rxep[5].mbuf); + rte_prefetch_non_temporal(rxep[6].mbuf); + rte_prefetch_non_temporal(rxep[7].mbuf); + } + + dma_addr = rte_mbuf_data_iova_default(rxep[0].mbuf); + rxdp[0].addr = 
rte_cpu_to_le_64(dma_addr); + rxdp[0].rx.bd_base_info = 0; + + dma_addr = rte_mbuf_data_iova_default(rxep[1].mbuf); + rxdp[1].addr = rte_cpu_to_le_64(dma_addr); + rxdp[1].rx.bd_base_info = 0; + + dma_addr = rte_mbuf_data_iova_default(rxep[2].mbuf); + rxdp[2].addr = rte_cpu_to_le_64(dma_addr); + rxdp[2].rx.bd_base_info = 0; + + dma_addr = rte_mbuf_data_iova_default(rxep[3].mbuf); + rxdp[3].addr = rte_cpu_to_le_64(dma_addr); + rxdp[3].rx.bd_base_info = 0; + } + + rxq->rx_rearm_start += HNS3_DEFAULT_RXQ_REARM_THRESH; + if (rxq->rx_rearm_start >= rxq->nb_rx_desc) + rxq->rx_rearm_start = 0; + + rxq->rx_rearm_nb -= HNS3_DEFAULT_RXQ_REARM_THRESH; + + hns3_write_reg_opt(rxq->io_head_reg, HNS3_DEFAULT_RXQ_REARM_THRESH); +} #endif /* HNS3_RXTX_VEC_H */ diff --git a/drivers/net/hns3/hns3_rxtx_vec_sve.c b/drivers/net/hns3/hns3_rxtx_vec_sve.c index 5011544e07..54aef7db8d 100644 --- a/drivers/net/hns3/hns3_rxtx_vec_sve.c +++ b/drivers/net/hns3/hns3_rxtx_vec_sve.c @@ -237,54 +237,6 @@ hns3_recv_burst_vec_sve(struct hns3_rx_queue *__restrict rxq, return nb_rx; } -static inline void -hns3_rxq_rearm_mbuf_sve(struct hns3_rx_queue *rxq) -{ -#define REARM_LOOP_STEP_NUM 4 - struct hns3_entry *rxep = &rxq->sw_ring[rxq->rx_rearm_start]; - struct hns3_desc *rxdp = rxq->rx_ring + rxq->rx_rearm_start; - struct hns3_entry *rxep_tmp = rxep; - int i; - - if (unlikely(rte_mempool_get_bulk(rxq->mb_pool, (void *)rxep, - HNS3_DEFAULT_RXQ_REARM_THRESH) < 0)) { - rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed++; - return; - } - - for (i = 0; i < HNS3_DEFAULT_RXQ_REARM_THRESH; i += REARM_LOOP_STEP_NUM, - rxep_tmp += REARM_LOOP_STEP_NUM) { - svuint64_t prf = svld1_u64(PG64_256BIT, (uint64_t *)rxep_tmp); - svprfd_gather_u64base(PG64_256BIT, prf, SV_PLDL1STRM); - } - - for (i = 0; i < HNS3_DEFAULT_RXQ_REARM_THRESH; i += REARM_LOOP_STEP_NUM, - rxep += REARM_LOOP_STEP_NUM, rxdp += REARM_LOOP_STEP_NUM) { - uint64_t iova[REARM_LOOP_STEP_NUM]; - iova[0] = rte_mbuf_iova_get(rxep[0].mbuf); - iova[1] = rte_mbuf_iova_get(rxep[1].mbuf); - iova[2] = rte_mbuf_iova_get(rxep[2].mbuf); - iova[3] = rte_mbuf_iova_get(rxep[3].mbuf); - svuint64_t siova = svld1_u64(PG64_256BIT, iova); - siova = svadd_n_u64_z(PG64_256BIT, siova, RTE_PKTMBUF_HEADROOM); - svuint64_t ol_base = svdup_n_u64(0); - svst1_scatter_u64offset_u64(PG64_256BIT, - (uint64_t *)&rxdp[0].addr, - svindex_u64(BD_FIELD_ADDR_OFFSET, BD_SIZE), siova); - svst1_scatter_u64offset_u64(PG64_256BIT, - (uint64_t *)&rxdp[0].addr, - svindex_u64(BD_FIELD_OL_OFFSET, BD_SIZE), ol_base); - } - - rxq->rx_rearm_start += HNS3_DEFAULT_RXQ_REARM_THRESH; - if (rxq->rx_rearm_start >= rxq->nb_rx_desc) - rxq->rx_rearm_start = 0; - - rxq->rx_rearm_nb -= HNS3_DEFAULT_RXQ_REARM_THRESH; - - hns3_write_reg_opt(rxq->io_head_reg, HNS3_DEFAULT_RXQ_REARM_THRESH); -} - uint16_t hns3_recv_pkts_vec_sve(void *__restrict rx_queue, struct rte_mbuf **__restrict rx_pkts, @@ -300,7 +252,7 @@ hns3_recv_pkts_vec_sve(void *__restrict rx_queue, nb_pkts = RTE_ALIGN_FLOOR(nb_pkts, HNS3_SVE_DEFAULT_DESCS_PER_LOOP); if (rxq->rx_rearm_nb > HNS3_DEFAULT_RXQ_REARM_THRESH) - hns3_rxq_rearm_mbuf_sve(rxq); + hns3_rxq_rearm_mbuf(rxq); if (unlikely(!(rxdp->rx.bd_base_info & rte_cpu_to_le_32(1u << HNS3_RXD_VLD_B)))) @@ -331,7 +283,7 @@ hns3_recv_pkts_vec_sve(void *__restrict rx_queue, break; if (rxq->rx_rearm_nb > HNS3_DEFAULT_RXQ_REARM_THRESH) - hns3_rxq_rearm_mbuf_sve(rxq); + hns3_rxq_rearm_mbuf(rxq); } return nb_rx; From patchwork Tue Jul 11 10:24:48 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 
Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dongdong Liu X-Patchwork-Id: 129446 X-Patchwork-Delegate: ferruh.yigit@amd.com Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 268B442E44; Tue, 11 Jul 2023 12:28:31 +0200 (CEST) Received: from mails.dpdk.org (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 41D5942D43; Tue, 11 Jul 2023 12:28:09 +0200 (CEST) Received: from szxga02-in.huawei.com (szxga02-in.huawei.com [45.249.212.188]) by mails.dpdk.org (Postfix) with ESMTP id 3981242D40; Tue, 11 Jul 2023 12:28:07 +0200 (CEST) Received: from kwepemi500017.china.huawei.com (unknown [172.30.72.56]) by szxga02-in.huawei.com (SkyGuard) with ESMTP id 4R0cVb5MDVzVjHM; Tue, 11 Jul 2023 18:26:47 +0800 (CST) Received: from localhost.localdomain (10.28.79.22) by kwepemi500017.china.huawei.com (7.221.188.110) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.27; Tue, 11 Jul 2023 18:27:58 +0800 From: Dongdong Liu To: , , , CC: Subject: [PATCH 5/5] net/hns3: optimize SVE Rx performance Date: Tue, 11 Jul 2023 18:24:48 +0800 Message-ID: <20230711102448.11627-6-liudongdong3@huawei.com> X-Mailer: git-send-email 2.22.0 In-Reply-To: <20230711102448.11627-1-liudongdong3@huawei.com> References: <20230711102448.11627-1-liudongdong3@huawei.com> MIME-Version: 1.0 X-Originating-IP: [10.28.79.22] X-ClientProxiedBy: dggems706-chm.china.huawei.com (10.3.19.183) To kwepemi500017.china.huawei.com (7.221.188.110) X-CFilter-Loop: Reflected X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org From: Huisong Li This patch optimizes SVE Rx performance in the following ways: 1) optimize the calculation of the number of valid BDs; 2) remove a temporary variable (key_fields); 3) use C code to parse some descriptor fields instead of SVE instructions; 4) prefetch descriptors in smaller steps. In rxonly forwarding mode, single-queue performance for 64B packets is improved by ~40%.
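Item 1) is the most visible change in the diff below: the old shift/table/clz sequence is replaced by a predicate-based count, where the code isolates the VLD bit of eight descriptors, builds a predicate of lanes whose bit is not set, breaks that predicate before the first such lane (svbrkb) and counts what remains (svcntp), i.e. the number of leading valid BDs. The scalar C sketch below computes the same quantity; the bit position and field name are hypothetical, not the hns3 definitions.

#include <stdint.h>
#include <stdio.h>

#define DESCS_PER_LOOP	8
#define VLD_BIT		(1u << 0)	/* hypothetical VLD bit position */

struct toy_desc { uint32_t bd_base_info; };

/* Scalar equivalent of the svbrkb/svcntp sequence: how many descriptors
 * are valid before the first not-yet-valid one is reached. */
static unsigned count_leading_valid(const struct toy_desc *d)
{
	unsigned n = 0;

	while (n < DESCS_PER_LOOP && (d[n].bd_base_info & VLD_BIT))
		n++;
	return n;
}

int main(void)
{
	struct toy_desc ring[DESCS_PER_LOOP] = {
		{VLD_BIT}, {VLD_BIT}, {VLD_BIT}, {0},
		{VLD_BIT}, {0}, {0}, {0},
	};

	/* Prints 3: the fourth descriptor is not valid yet, so the fifth
	 * one must not be counted even though its VLD bit is set. */
	printf("valid BDs = %u\n", count_leading_valid(ring));
	return 0;
}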
Signed-off-by: Huisong Li Signed-off-by: Dongdong Liu --- drivers/net/hns3/hns3_rxtx_vec_sve.c | 138 ++++++--------------------- 1 file changed, 28 insertions(+), 110 deletions(-) diff --git a/drivers/net/hns3/hns3_rxtx_vec_sve.c b/drivers/net/hns3/hns3_rxtx_vec_sve.c index 54aef7db8d..0e9abfebec 100644 --- a/drivers/net/hns3/hns3_rxtx_vec_sve.c +++ b/drivers/net/hns3/hns3_rxtx_vec_sve.c @@ -20,40 +20,36 @@ #define BD_SIZE 32 #define BD_FIELD_ADDR_OFFSET 0 -#define BD_FIELD_L234_OFFSET 8 -#define BD_FIELD_XLEN_OFFSET 12 -#define BD_FIELD_RSS_OFFSET 16 -#define BD_FIELD_OL_OFFSET 24 #define BD_FIELD_VALID_OFFSET 28 -typedef struct { - uint32_t l234_info[HNS3_SVE_DEFAULT_DESCS_PER_LOOP]; - uint32_t ol_info[HNS3_SVE_DEFAULT_DESCS_PER_LOOP]; - uint32_t bd_base_info[HNS3_SVE_DEFAULT_DESCS_PER_LOOP]; -} HNS3_SVE_KEY_FIELD_S; - static inline uint32_t hns3_desc_parse_field_sve(struct hns3_rx_queue *rxq, struct rte_mbuf **rx_pkts, - HNS3_SVE_KEY_FIELD_S *key, + struct hns3_desc *rxdp, uint32_t bd_vld_num) { + uint32_t l234_info, ol_info, bd_base_info; uint32_t retcode = 0; int ret, i; for (i = 0; i < (int)bd_vld_num; i++) { /* init rte_mbuf.rearm_data last 64-bit */ rx_pkts[i]->ol_flags = RTE_MBUF_F_RX_RSS_HASH; - - ret = hns3_handle_bdinfo(rxq, rx_pkts[i], key->bd_base_info[i], - key->l234_info[i]); + rx_pkts[i]->hash.rss = rxdp[i].rx.rss_hash; + rx_pkts[i]->pkt_len = rte_le_to_cpu_16(rxdp[i].rx.pkt_len) - + rxq->crc_len; + rx_pkts[i]->data_len = rx_pkts[i]->pkt_len; + + l234_info = rxdp[i].rx.l234_info; + ol_info = rxdp[i].rx.ol_info; + bd_base_info = rxdp[i].rx.bd_base_info; + ret = hns3_handle_bdinfo(rxq, rx_pkts[i], bd_base_info, l234_info); if (unlikely(ret)) { retcode |= 1u << i; continue; } - rx_pkts[i]->packet_type = hns3_rx_calc_ptype(rxq, - key->l234_info[i], key->ol_info[i]); + rx_pkts[i]->packet_type = hns3_rx_calc_ptype(rxq, l234_info, ol_info); /* Increment bytes counter */ rxq->basic_stats.bytes += rx_pkts[i]->pkt_len; @@ -77,46 +73,16 @@ hns3_recv_burst_vec_sve(struct hns3_rx_queue *__restrict rxq, uint16_t nb_pkts, uint64_t *bd_err_mask) { -#define XLEN_ADJUST_LEN 32 -#define RSS_ADJUST_LEN 16 -#define GEN_VLD_U8_ZIP_INDEX svindex_s8(28, -4) uint16_t rx_id = rxq->next_to_use; struct hns3_entry *sw_ring = &rxq->sw_ring[rx_id]; struct hns3_desc *rxdp = &rxq->rx_ring[rx_id]; - struct hns3_desc *rxdp2; - HNS3_SVE_KEY_FIELD_S key_field; + struct hns3_desc *rxdp2, *next_rxdp; uint64_t bd_valid_num; uint32_t parse_retcode; uint16_t nb_rx = 0; int pos, offset; - uint16_t xlen_adjust[XLEN_ADJUST_LEN] = { - 0, 0xffff, 1, 0xffff, /* 1st mbuf: pkt_len and dat_len */ - 2, 0xffff, 3, 0xffff, /* 2st mbuf: pkt_len and dat_len */ - 4, 0xffff, 5, 0xffff, /* 3st mbuf: pkt_len and dat_len */ - 6, 0xffff, 7, 0xffff, /* 4st mbuf: pkt_len and dat_len */ - 8, 0xffff, 9, 0xffff, /* 5st mbuf: pkt_len and dat_len */ - 10, 0xffff, 11, 0xffff, /* 6st mbuf: pkt_len and dat_len */ - 12, 0xffff, 13, 0xffff, /* 7st mbuf: pkt_len and dat_len */ - 14, 0xffff, 15, 0xffff, /* 8st mbuf: pkt_len and dat_len */ - }; - - uint32_t rss_adjust[RSS_ADJUST_LEN] = { - 0, 0xffff, /* 1st mbuf: rss */ - 1, 0xffff, /* 2st mbuf: rss */ - 2, 0xffff, /* 3st mbuf: rss */ - 3, 0xffff, /* 4st mbuf: rss */ - 4, 0xffff, /* 5st mbuf: rss */ - 5, 0xffff, /* 6st mbuf: rss */ - 6, 0xffff, /* 7st mbuf: rss */ - 7, 0xffff, /* 8st mbuf: rss */ - }; - svbool_t pg32 = svwhilelt_b32(0, HNS3_SVE_DEFAULT_DESCS_PER_LOOP); - svuint16_t xlen_tbl1 = svld1_u16(PG16_256BIT, xlen_adjust); - svuint16_t xlen_tbl2 = svld1_u16(PG16_256BIT, &xlen_adjust[16]); - 
svuint32_t rss_tbl1 = svld1_u32(PG32_256BIT, rss_adjust); - svuint32_t rss_tbl2 = svld1_u32(PG32_256BIT, &rss_adjust[8]); /* compile-time verifies the xlen_adjust mask */ RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, data_len) != @@ -126,30 +92,21 @@ hns3_recv_burst_vec_sve(struct hns3_rx_queue *__restrict rxq, for (pos = 0; pos < nb_pkts; pos += HNS3_SVE_DEFAULT_DESCS_PER_LOOP, rxdp += HNS3_SVE_DEFAULT_DESCS_PER_LOOP) { - svuint64_t vld_clz, mbp1st, mbp2st, mbuf_init; - svuint64_t xlen1st, xlen2st, rss1st, rss2st; - svuint32_t l234, ol, vld, vld2, xlen, rss; - svuint8_t vld_u8; + svuint64_t mbp1st, mbp2st, mbuf_init; + svuint32_t vld; + svbool_t vld_op; /* calc how many bd valid: part 1 */ vld = svld1_gather_u32offset_u32(pg32, (uint32_t *)rxdp, svindex_u32(BD_FIELD_VALID_OFFSET, BD_SIZE)); - vld2 = svlsl_n_u32_z(pg32, vld, - HNS3_UINT32_BIT - 1 - HNS3_RXD_VLD_B); - vld2 = svreinterpret_u32_s32(svasr_n_s32_z(pg32, - svreinterpret_s32_u32(vld2), HNS3_UINT32_BIT - 1)); + vld = svand_n_u32_z(pg32, vld, BIT(HNS3_RXD_VLD_B)); + vld_op = svcmpne_n_u32(pg32, vld, BIT(HNS3_RXD_VLD_B)); + bd_valid_num = svcntp_b32(pg32, svbrkb_b_z(pg32, vld_op)); + if (bd_valid_num == 0) + break; /* load 4 mbuf pointer */ mbp1st = svld1_u64(PG64_256BIT, (uint64_t *)&sw_ring[pos]); - - /* calc how many bd valid: part 2 */ - vld_u8 = svtbl_u8(svreinterpret_u8_u32(vld2), - svreinterpret_u8_s8(GEN_VLD_U8_ZIP_INDEX)); - vld_clz = svnot_u64_z(PG64_64BIT, svreinterpret_u64_u8(vld_u8)); - vld_clz = svclz_u64_z(PG64_64BIT, vld_clz); - svst1_u64(PG64_64BIT, &bd_valid_num, vld_clz); - bd_valid_num /= HNS3_UINT8_BIT; - /* load 4 more mbuf pointer */ mbp2st = svld1_u64(PG64_256BIT, (uint64_t *)&sw_ring[pos + 4]); @@ -159,65 +116,25 @@ hns3_recv_burst_vec_sve(struct hns3_rx_queue *__restrict rxq, /* store 4 mbuf pointer into rx_pkts */ svst1_u64(PG64_256BIT, (uint64_t *)&rx_pkts[pos], mbp1st); - - /* load key field to vector reg */ - l234 = svld1_gather_u32offset_u32(pg32, (uint32_t *)rxdp2, - svindex_u32(BD_FIELD_L234_OFFSET, BD_SIZE)); - ol = svld1_gather_u32offset_u32(pg32, (uint32_t *)rxdp2, - svindex_u32(BD_FIELD_OL_OFFSET, BD_SIZE)); - /* store 4 mbuf pointer into rx_pkts again */ svst1_u64(PG64_256BIT, (uint64_t *)&rx_pkts[pos + 4], mbp2st); - /* load datalen, pktlen and rss_hash */ - xlen = svld1_gather_u32offset_u32(pg32, (uint32_t *)rxdp2, - svindex_u32(BD_FIELD_XLEN_OFFSET, BD_SIZE)); - rss = svld1_gather_u32offset_u32(pg32, (uint32_t *)rxdp2, - svindex_u32(BD_FIELD_RSS_OFFSET, BD_SIZE)); - - /* store key field to stash buffer */ - svst1_u32(pg32, (uint32_t *)key_field.l234_info, l234); - svst1_u32(pg32, (uint32_t *)key_field.bd_base_info, vld); - svst1_u32(pg32, (uint32_t *)key_field.ol_info, ol); - - /* sub crc_len for pkt_len and data_len */ - xlen = svreinterpret_u32_u16(svsub_n_u16_z(PG16_256BIT, - svreinterpret_u16_u32(xlen), rxq->crc_len)); - /* init mbuf_initializer */ mbuf_init = svdup_n_u64(rxq->mbuf_initializer); - - /* extract datalen, pktlen and rss from xlen and rss */ - xlen1st = svreinterpret_u64_u16( - svtbl_u16(svreinterpret_u16_u32(xlen), xlen_tbl1)); - xlen2st = svreinterpret_u64_u16( - svtbl_u16(svreinterpret_u16_u32(xlen), xlen_tbl2)); - rss1st = svreinterpret_u64_u32( - svtbl_u32(svreinterpret_u32_u32(rss), rss_tbl1)); - rss2st = svreinterpret_u64_u32( - svtbl_u32(svreinterpret_u32_u32(rss), rss_tbl2)); - /* save mbuf_initializer */ svst1_scatter_u64base_offset_u64(PG64_256BIT, mbp1st, offsetof(struct rte_mbuf, rearm_data), mbuf_init); svst1_scatter_u64base_offset_u64(PG64_256BIT, mbp2st, 
offsetof(struct rte_mbuf, rearm_data), mbuf_init); - /* save datalen and pktlen and rss */ - svst1_scatter_u64base_offset_u64(PG64_256BIT, mbp1st, - offsetof(struct rte_mbuf, pkt_len), xlen1st); - svst1_scatter_u64base_offset_u64(PG64_256BIT, mbp1st, - offsetof(struct rte_mbuf, hash.rss), rss1st); - svst1_scatter_u64base_offset_u64(PG64_256BIT, mbp2st, - offsetof(struct rte_mbuf, pkt_len), xlen2st); - svst1_scatter_u64base_offset_u64(PG64_256BIT, mbp2st, - offsetof(struct rte_mbuf, hash.rss), rss2st); - - rte_prefetch_non_temporal(rxdp + - HNS3_SVE_DEFAULT_DESCS_PER_LOOP); + next_rxdp = rxdp + HNS3_SVE_DEFAULT_DESCS_PER_LOOP; + rte_prefetch_non_temporal(next_rxdp); + rte_prefetch_non_temporal(next_rxdp + 2); + rte_prefetch_non_temporal(next_rxdp + 4); + rte_prefetch_non_temporal(next_rxdp + 6); parse_retcode = hns3_desc_parse_field_sve(rxq, &rx_pkts[pos], - &key_field, bd_valid_num); + &rxdp2[offset], bd_valid_num); if (unlikely(parse_retcode)) (*bd_err_mask) |= ((uint64_t)parse_retcode) << pos; @@ -237,6 +154,7 @@ hns3_recv_burst_vec_sve(struct hns3_rx_queue *__restrict rxq, return nb_rx; } + uint16_t hns3_recv_pkts_vec_sve(void *__restrict rx_queue, struct rte_mbuf **__restrict rx_pkts,