Patch Detail
GET: Show a patch.
PATCH: Update a patch.
PUT: Update a patch.
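The GET endpoint below returns a JSON object describing the patch. As an illustration, the key fields of such a response can be pulled out like this (a minimal sketch using a trimmed, hard-coded copy of values from the response below, not a live request; the real object carries many more keys):

```python
import json

# Trimmed sample of a Patchwork patch-detail response; the values are
# copied from the response shown below, but most keys are omitted.
sample = '''{
  "id": 129446,
  "name": "[5/5] net/hns3: optimize SVE Rx performance",
  "state": "accepted",
  "archived": true,
  "submitter": {"name": "Dongdong Liu", "email": "liudongdong3@huawei.com"}
}'''

patch = json.loads(sample)
# Build a one-line summary: patch id, review state, subject, submitter.
summary = f'#{patch["id"]} [{patch["state"]}] {patch["name"]} <{patch["submitter"]["email"]}>'
print(summary)
# → #129446 [accepted] [5/5] net/hns3: optimize SVE Rx performance <liudongdong3@huawei.com>
```

Against a live server the same dictionary would come from fetching the URL below with any HTTP client.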
GET /api/patches/129446/?format=api
{ "id": 129446, "url": "
http://patchwork.dpdk.org/api/patches/129446/?format=api", "web_url": "http://patchwork.dpdk.org/project/dpdk/patch/20230711102448.11627-6-liudongdong3@huawei.com/", "project": { "id": 1, "url": "http://patchwork.dpdk.org/api/projects/1/?format=api", "name": "DPDK", "link_name": "dpdk", "list_id": "dev.dpdk.org", "list_email": "dev@dpdk.org", "web_url": "http://core.dpdk.org", "scm_url": "git://dpdk.org/dpdk", "webscm_url": "http://git.dpdk.org/dpdk", "list_archive_url": "https://inbox.dpdk.org/dev", "list_archive_url_format": "https://inbox.dpdk.org/dev/{}", "commit_url_format": "" }, "msgid": "<20230711102448.11627-6-liudongdong3@huawei.com>", "list_archive_url": "https://inbox.dpdk.org/dev/20230711102448.11627-6-liudongdong3@huawei.com", "date": "2023-07-11T10:24:48", "name": "[5/5] net/hns3: optimize SVE Rx performance", "commit_ref": null, "pull_url": null, "state": "accepted", "archived": true, "hash": "933c0f60225f66bcfe9200fbcaf0319a5f6a7a52", "submitter": { "id": 2718, "url": "http://patchwork.dpdk.org/api/people/2718/?format=api", "name": "Dongdong Liu", "email": "liudongdong3@huawei.com" }, "delegate": { "id": 319, "url": "http://patchwork.dpdk.org/api/users/319/?format=api", "username": "fyigit", "first_name": "Ferruh", "last_name": "Yigit", "email": "ferruh.yigit@amd.com" }, "mbox": "http://patchwork.dpdk.org/project/dpdk/patch/20230711102448.11627-6-liudongdong3@huawei.com/mbox/", "series": [ { "id": 28901, "url": "http://patchwork.dpdk.org/api/series/28901/?format=api", "web_url": "http://patchwork.dpdk.org/project/dpdk/list/?series=28901", "date": "2023-07-11T10:24:43", "name": "net/hns3: some performance optimizations", "version": 1, "mbox": "http://patchwork.dpdk.org/series/28901/mbox/" } ], "comments": "http://patchwork.dpdk.org/api/patches/129446/comments/", "check": "success", "checks": "http://patchwork.dpdk.org/api/patches/129446/checks/", "tags": {}, "related": [], "headers": { "Return-Path": "<dev-bounces@dpdk.org>", "X-Original-To": 
"patchwork@inbox.dpdk.org", "Delivered-To": "patchwork@inbox.dpdk.org", "Received": [ "from mails.dpdk.org (mails.dpdk.org [217.70.189.124])\n\tby inbox.dpdk.org (Postfix) with ESMTP id 268B442E44;\n\tTue, 11 Jul 2023 12:28:31 +0200 (CEST)", "from mails.dpdk.org (localhost [127.0.0.1])\n\tby mails.dpdk.org (Postfix) with ESMTP id 41D5942D43;\n\tTue, 11 Jul 2023 12:28:09 +0200 (CEST)", "from szxga02-in.huawei.com (szxga02-in.huawei.com [45.249.212.188])\n by mails.dpdk.org (Postfix) with ESMTP id 3981242D40;\n Tue, 11 Jul 2023 12:28:07 +0200 (CEST)", "from kwepemi500017.china.huawei.com (unknown [172.30.72.56])\n by szxga02-in.huawei.com (SkyGuard) with ESMTP id 4R0cVb5MDVzVjHM;\n Tue, 11 Jul 2023 18:26:47 +0800 (CST)", "from localhost.localdomain (10.28.79.22) by\n kwepemi500017.china.huawei.com (7.221.188.110) with Microsoft SMTP Server\n (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id\n 15.1.2507.27; Tue, 11 Jul 2023 18:27:58 +0800" ], "From": "Dongdong Liu <liudongdong3@huawei.com>", "To": "<dev@dpdk.org>, <ferruh.yigit@amd.com>, <thomas@monjalon.net>,\n <andrew.rybchenko@oktetlabs.ru>", "CC": "<stable@dpdk.org>", "Subject": "[PATCH 5/5] net/hns3: optimize SVE Rx performance", "Date": "Tue, 11 Jul 2023 18:24:48 +0800", "Message-ID": "<20230711102448.11627-6-liudongdong3@huawei.com>", "X-Mailer": "git-send-email 2.22.0", "In-Reply-To": "<20230711102448.11627-1-liudongdong3@huawei.com>", "References": "<20230711102448.11627-1-liudongdong3@huawei.com>", "MIME-Version": "1.0", "Content-Transfer-Encoding": "8bit", "Content-Type": "text/plain", "X-Originating-IP": "[10.28.79.22]", "X-ClientProxiedBy": "dggems706-chm.china.huawei.com (10.3.19.183) To\n kwepemi500017.china.huawei.com (7.221.188.110)", "X-CFilter-Loop": "Reflected", "X-BeenThere": "dev@dpdk.org", "X-Mailman-Version": "2.1.29", "Precedence": "list", "List-Id": "DPDK patches and discussions <dev.dpdk.org>", "List-Unsubscribe": "<https://mails.dpdk.org/options/dev>,\n 
<mailto:dev-request@dpdk.org?subject=unsubscribe>", "List-Archive": "<http://mails.dpdk.org/archives/dev/>", "List-Post": "<mailto:dev@dpdk.org>", "List-Help": "<mailto:dev-request@dpdk.org?subject=help>", "List-Subscribe": "<https://mails.dpdk.org/listinfo/dev>,\n <mailto:dev-request@dpdk.org?subject=subscribe>", "Errors-To": "dev-bounces@dpdk.org" }, "content": "From: Huisong Li <lihuisong@huawei.com>\n\nThis patch optimizes SVE Rx performance by the following ways:\n1> optimize the calculation of valid BD number.\n2> remove a temporary variable (key_fields)\n3> use C language to parse some descriptor fields, instead of\n SVE instruction.\n4> small step prefetch descriptor.\n\nOn the rxonly forwarding mode, the performance of a single queue\nor 64B packet is improved by ~40%.\n\nSigned-off-by: Huisong Li <lihuisong@huawei.com>\nSigned-off-by: Dongdong Liu <liudongdong3@huawei.com>\n---\n drivers/net/hns3/hns3_rxtx_vec_sve.c | 138 ++++++---------------------\n 1 file changed, 28 insertions(+), 110 deletions(-)", "diff": "diff --git a/drivers/net/hns3/hns3_rxtx_vec_sve.c b/drivers/net/hns3/hns3_rxtx_vec_sve.c\nindex 54aef7db8d..0e9abfebec 100644\n--- a/drivers/net/hns3/hns3_rxtx_vec_sve.c\n+++ b/drivers/net/hns3/hns3_rxtx_vec_sve.c\n@@ -20,40 +20,36 @@\n \n #define BD_SIZE\t\t\t32\n #define BD_FIELD_ADDR_OFFSET\t0\n-#define BD_FIELD_L234_OFFSET\t8\n-#define BD_FIELD_XLEN_OFFSET\t12\n-#define BD_FIELD_RSS_OFFSET\t16\n-#define BD_FIELD_OL_OFFSET\t24\n #define BD_FIELD_VALID_OFFSET\t28\n \n-typedef struct {\n-\tuint32_t l234_info[HNS3_SVE_DEFAULT_DESCS_PER_LOOP];\n-\tuint32_t ol_info[HNS3_SVE_DEFAULT_DESCS_PER_LOOP];\n-\tuint32_t bd_base_info[HNS3_SVE_DEFAULT_DESCS_PER_LOOP];\n-} HNS3_SVE_KEY_FIELD_S;\n-\n static inline uint32_t\n hns3_desc_parse_field_sve(struct hns3_rx_queue *rxq,\n \t\t\t struct rte_mbuf **rx_pkts,\n-\t\t\t HNS3_SVE_KEY_FIELD_S *key,\n+\t\t\t struct hns3_desc *rxdp,\n \t\t\t uint32_t bd_vld_num)\n {\n+\tuint32_t l234_info, ol_info, bd_base_info;\n 
\tuint32_t retcode = 0;\n \tint ret, i;\n \n \tfor (i = 0; i < (int)bd_vld_num; i++) {\n \t\t/* init rte_mbuf.rearm_data last 64-bit */\n \t\trx_pkts[i]->ol_flags = RTE_MBUF_F_RX_RSS_HASH;\n-\n-\t\tret = hns3_handle_bdinfo(rxq, rx_pkts[i], key->bd_base_info[i],\n-\t\t\t\t\t key->l234_info[i]);\n+\t\trx_pkts[i]->hash.rss = rxdp[i].rx.rss_hash;\n+\t\trx_pkts[i]->pkt_len = rte_le_to_cpu_16(rxdp[i].rx.pkt_len) -\n+\t\t\t\t\trxq->crc_len;\n+\t\trx_pkts[i]->data_len = rx_pkts[i]->pkt_len;\n+\n+\t\tl234_info = rxdp[i].rx.l234_info;\n+\t\tol_info = rxdp[i].rx.ol_info;\n+\t\tbd_base_info = rxdp[i].rx.bd_base_info;\n+\t\tret = hns3_handle_bdinfo(rxq, rx_pkts[i], bd_base_info, l234_info);\n \t\tif (unlikely(ret)) {\n \t\t\tretcode |= 1u << i;\n \t\t\tcontinue;\n \t\t}\n \n-\t\trx_pkts[i]->packet_type = hns3_rx_calc_ptype(rxq,\n-\t\t\t\t\tkey->l234_info[i], key->ol_info[i]);\n+\t\trx_pkts[i]->packet_type = hns3_rx_calc_ptype(rxq, l234_info, ol_info);\n \n \t\t/* Increment bytes counter */\n \t\trxq->basic_stats.bytes += rx_pkts[i]->pkt_len;\n@@ -77,46 +73,16 @@ hns3_recv_burst_vec_sve(struct hns3_rx_queue *__restrict rxq,\n \t\t\tuint16_t nb_pkts,\n \t\t\tuint64_t *bd_err_mask)\n {\n-#define XLEN_ADJUST_LEN\t\t32\n-#define RSS_ADJUST_LEN\t\t16\n-#define GEN_VLD_U8_ZIP_INDEX\tsvindex_s8(28, -4)\n \tuint16_t rx_id = rxq->next_to_use;\n \tstruct hns3_entry *sw_ring = &rxq->sw_ring[rx_id];\n \tstruct hns3_desc *rxdp = &rxq->rx_ring[rx_id];\n-\tstruct hns3_desc *rxdp2;\n-\tHNS3_SVE_KEY_FIELD_S key_field;\n+\tstruct hns3_desc *rxdp2, *next_rxdp;\n \tuint64_t bd_valid_num;\n \tuint32_t parse_retcode;\n \tuint16_t nb_rx = 0;\n \tint pos, offset;\n \n-\tuint16_t xlen_adjust[XLEN_ADJUST_LEN] = {\n-\t\t0, 0xffff, 1, 0xffff, /* 1st mbuf: pkt_len and dat_len */\n-\t\t2, 0xffff, 3, 0xffff, /* 2st mbuf: pkt_len and dat_len */\n-\t\t4, 0xffff, 5, 0xffff, /* 3st mbuf: pkt_len and dat_len */\n-\t\t6, 0xffff, 7, 0xffff, /* 4st mbuf: pkt_len and dat_len */\n-\t\t8, 0xffff, 9, 0xffff, /* 5st mbuf: 
pkt_len and dat_len */\n-\t\t10, 0xffff, 11, 0xffff, /* 6st mbuf: pkt_len and dat_len */\n-\t\t12, 0xffff, 13, 0xffff, /* 7st mbuf: pkt_len and dat_len */\n-\t\t14, 0xffff, 15, 0xffff, /* 8st mbuf: pkt_len and dat_len */\n-\t};\n-\n-\tuint32_t rss_adjust[RSS_ADJUST_LEN] = {\n-\t\t0, 0xffff, /* 1st mbuf: rss */\n-\t\t1, 0xffff, /* 2st mbuf: rss */\n-\t\t2, 0xffff, /* 3st mbuf: rss */\n-\t\t3, 0xffff, /* 4st mbuf: rss */\n-\t\t4, 0xffff, /* 5st mbuf: rss */\n-\t\t5, 0xffff, /* 6st mbuf: rss */\n-\t\t6, 0xffff, /* 7st mbuf: rss */\n-\t\t7, 0xffff, /* 8st mbuf: rss */\n-\t};\n-\n \tsvbool_t pg32 = svwhilelt_b32(0, HNS3_SVE_DEFAULT_DESCS_PER_LOOP);\n-\tsvuint16_t xlen_tbl1 = svld1_u16(PG16_256BIT, xlen_adjust);\n-\tsvuint16_t xlen_tbl2 = svld1_u16(PG16_256BIT, &xlen_adjust[16]);\n-\tsvuint32_t rss_tbl1 = svld1_u32(PG32_256BIT, rss_adjust);\n-\tsvuint32_t rss_tbl2 = svld1_u32(PG32_256BIT, &rss_adjust[8]);\n \n \t/* compile-time verifies the xlen_adjust mask */\n \tRTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, data_len) !=\n@@ -126,30 +92,21 @@ hns3_recv_burst_vec_sve(struct hns3_rx_queue *__restrict rxq,\n \n \tfor (pos = 0; pos < nb_pkts; pos += HNS3_SVE_DEFAULT_DESCS_PER_LOOP,\n \t\t\t\t rxdp += HNS3_SVE_DEFAULT_DESCS_PER_LOOP) {\n-\t\tsvuint64_t vld_clz, mbp1st, mbp2st, mbuf_init;\n-\t\tsvuint64_t xlen1st, xlen2st, rss1st, rss2st;\n-\t\tsvuint32_t l234, ol, vld, vld2, xlen, rss;\n-\t\tsvuint8_t vld_u8;\n+\t\tsvuint64_t mbp1st, mbp2st, mbuf_init;\n+\t\tsvuint32_t vld;\n+\t\tsvbool_t vld_op;\n \n \t\t/* calc how many bd valid: part 1 */\n \t\tvld = svld1_gather_u32offset_u32(pg32, (uint32_t *)rxdp,\n \t\t\tsvindex_u32(BD_FIELD_VALID_OFFSET, BD_SIZE));\n-\t\tvld2 = svlsl_n_u32_z(pg32, vld,\n-\t\t\t\t HNS3_UINT32_BIT - 1 - HNS3_RXD_VLD_B);\n-\t\tvld2 = svreinterpret_u32_s32(svasr_n_s32_z(pg32,\n-\t\t\tsvreinterpret_s32_u32(vld2), HNS3_UINT32_BIT - 1));\n+\t\tvld = svand_n_u32_z(pg32, vld, BIT(HNS3_RXD_VLD_B));\n+\t\tvld_op = svcmpne_n_u32(pg32, vld, 
BIT(HNS3_RXD_VLD_B));\n+\t\tbd_valid_num = svcntp_b32(pg32, svbrkb_b_z(pg32, vld_op));\n+\t\tif (bd_valid_num == 0)\n+\t\t\tbreak;\n \n \t\t/* load 4 mbuf pointer */\n \t\tmbp1st = svld1_u64(PG64_256BIT, (uint64_t *)&sw_ring[pos]);\n-\n-\t\t/* calc how many bd valid: part 2 */\n-\t\tvld_u8 = svtbl_u8(svreinterpret_u8_u32(vld2),\n-\t\t\t\t svreinterpret_u8_s8(GEN_VLD_U8_ZIP_INDEX));\n-\t\tvld_clz = svnot_u64_z(PG64_64BIT, svreinterpret_u64_u8(vld_u8));\n-\t\tvld_clz = svclz_u64_z(PG64_64BIT, vld_clz);\n-\t\tsvst1_u64(PG64_64BIT, &bd_valid_num, vld_clz);\n-\t\tbd_valid_num /= HNS3_UINT8_BIT;\n-\n \t\t/* load 4 more mbuf pointer */\n \t\tmbp2st = svld1_u64(PG64_256BIT, (uint64_t *)&sw_ring[pos + 4]);\n \n@@ -159,65 +116,25 @@ hns3_recv_burst_vec_sve(struct hns3_rx_queue *__restrict rxq,\n \n \t\t/* store 4 mbuf pointer into rx_pkts */\n \t\tsvst1_u64(PG64_256BIT, (uint64_t *)&rx_pkts[pos], mbp1st);\n-\n-\t\t/* load key field to vector reg */\n-\t\tl234 = svld1_gather_u32offset_u32(pg32, (uint32_t *)rxdp2,\n-\t\t\t\tsvindex_u32(BD_FIELD_L234_OFFSET, BD_SIZE));\n-\t\tol = svld1_gather_u32offset_u32(pg32, (uint32_t *)rxdp2,\n-\t\t\t\tsvindex_u32(BD_FIELD_OL_OFFSET, BD_SIZE));\n-\n \t\t/* store 4 mbuf pointer into rx_pkts again */\n \t\tsvst1_u64(PG64_256BIT, (uint64_t *)&rx_pkts[pos + 4], mbp2st);\n \n-\t\t/* load datalen, pktlen and rss_hash */\n-\t\txlen = svld1_gather_u32offset_u32(pg32, (uint32_t *)rxdp2,\n-\t\t\t\tsvindex_u32(BD_FIELD_XLEN_OFFSET, BD_SIZE));\n-\t\trss = svld1_gather_u32offset_u32(pg32, (uint32_t *)rxdp2,\n-\t\t\t\tsvindex_u32(BD_FIELD_RSS_OFFSET, BD_SIZE));\n-\n-\t\t/* store key field to stash buffer */\n-\t\tsvst1_u32(pg32, (uint32_t *)key_field.l234_info, l234);\n-\t\tsvst1_u32(pg32, (uint32_t *)key_field.bd_base_info, vld);\n-\t\tsvst1_u32(pg32, (uint32_t *)key_field.ol_info, ol);\n-\n-\t\t/* sub crc_len for pkt_len and data_len */\n-\t\txlen = svreinterpret_u32_u16(svsub_n_u16_z(PG16_256BIT,\n-\t\t\tsvreinterpret_u16_u32(xlen), 
rxq->crc_len));\n-\n \t\t/* init mbuf_initializer */\n \t\tmbuf_init = svdup_n_u64(rxq->mbuf_initializer);\n-\n-\t\t/* extract datalen, pktlen and rss from xlen and rss */\n-\t\txlen1st = svreinterpret_u64_u16(\n-\t\t\tsvtbl_u16(svreinterpret_u16_u32(xlen), xlen_tbl1));\n-\t\txlen2st = svreinterpret_u64_u16(\n-\t\t\tsvtbl_u16(svreinterpret_u16_u32(xlen), xlen_tbl2));\n-\t\trss1st = svreinterpret_u64_u32(\n-\t\t\tsvtbl_u32(svreinterpret_u32_u32(rss), rss_tbl1));\n-\t\trss2st = svreinterpret_u64_u32(\n-\t\t\tsvtbl_u32(svreinterpret_u32_u32(rss), rss_tbl2));\n-\n \t\t/* save mbuf_initializer */\n \t\tsvst1_scatter_u64base_offset_u64(PG64_256BIT, mbp1st,\n \t\t\toffsetof(struct rte_mbuf, rearm_data), mbuf_init);\n \t\tsvst1_scatter_u64base_offset_u64(PG64_256BIT, mbp2st,\n \t\t\toffsetof(struct rte_mbuf, rearm_data), mbuf_init);\n \n-\t\t/* save datalen and pktlen and rss */\n-\t\tsvst1_scatter_u64base_offset_u64(PG64_256BIT, mbp1st,\n-\t\t\toffsetof(struct rte_mbuf, pkt_len), xlen1st);\n-\t\tsvst1_scatter_u64base_offset_u64(PG64_256BIT, mbp1st,\n-\t\t\toffsetof(struct rte_mbuf, hash.rss), rss1st);\n-\t\tsvst1_scatter_u64base_offset_u64(PG64_256BIT, mbp2st,\n-\t\t\toffsetof(struct rte_mbuf, pkt_len), xlen2st);\n-\t\tsvst1_scatter_u64base_offset_u64(PG64_256BIT, mbp2st,\n-\t\t\toffsetof(struct rte_mbuf, hash.rss), rss2st);\n-\n-\t\trte_prefetch_non_temporal(rxdp +\n-\t\t\t\t\t HNS3_SVE_DEFAULT_DESCS_PER_LOOP);\n+\t\tnext_rxdp = rxdp + HNS3_SVE_DEFAULT_DESCS_PER_LOOP;\n+\t\trte_prefetch_non_temporal(next_rxdp);\n+\t\trte_prefetch_non_temporal(next_rxdp + 2);\n+\t\trte_prefetch_non_temporal(next_rxdp + 4);\n+\t\trte_prefetch_non_temporal(next_rxdp + 6);\n \n \t\tparse_retcode = hns3_desc_parse_field_sve(rxq, &rx_pkts[pos],\n-\t\t\t\t\t&key_field, bd_valid_num);\n+\t\t\t\t\t&rxdp2[offset], bd_valid_num);\n \t\tif (unlikely(parse_retcode))\n \t\t\t(*bd_err_mask) |= ((uint64_t)parse_retcode) << pos;\n \n@@ -237,6 +154,7 @@ hns3_recv_burst_vec_sve(struct hns3_rx_queue 
*__restrict rxq,\n \treturn nb_rx;\n }\n \n+\n uint16_t\n hns3_recv_pkts_vec_sve(void *__restrict rx_queue,\n \t\t struct rte_mbuf **__restrict rx_pkts,\n", "prefixes": [ "5/5" ] }
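For reference, the reworked valid-BD counting in the diff above (`svand_n_u32_z` + `svcmpne_n_u32` + `svbrkb_b_z` + `svcntp_b32`) computes the length of the leading run of descriptors whose VLD bit is set, stopping at the first descriptor the hardware has not yet written back. A scalar sketch of that logic (the bit position and batch size are assumptions for illustration, not taken from the driver headers):

```python
# Scalar model of the SVE predicate sequence from the patch: mask each
# lane's VALID word to the VLD bit, compare, break-before the first
# non-valid lane, and count the surviving lanes.
HNS3_RXD_VLD_B = 0  # assumed bit position of the VLD flag (illustrative)

def count_leading_valid(valid_words, lanes=8):
    """Return the number of leading descriptors with the VLD bit set.

    `lanes` models HNS3_SVE_DEFAULT_DESCS_PER_LOOP (assumed 8 here,
    matching the two groups of 4 mbuf pointers loaded per iteration).
    """
    count = 0
    for word in valid_words[:lanes]:
        if word & (1 << HNS3_RXD_VLD_B):  # svand/svcmpne: is this lane valid?
            count += 1                    # svbrkb/svcntp: extend the leading run
        else:
            break                         # first invalid BD ends the run
    return count

print(count_leading_valid([1] * 8))       # → 8 (whole batch ready)
print(count_leading_valid([1, 1, 0, 1]))  # → 2 (run stops at the third BD)
```

The early `break` when the count is zero mirrors the `if (bd_valid_num == 0) break;` the patch adds, which lets the burst loop stop before doing any mbuf work for an empty batch.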