get:
Show a patch.

patch:
Partially update a patch (only the fields sent are changed).

put:
Update a patch (full update).
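
The three methods above can be sketched as request builders. This is a minimal illustration, not the Patchwork client: the helper names `build_get` and `build_patch` are hypothetical, the `Token` authorization scheme is an assumption about how this instance authenticates writes, and nothing is sent over the network until the request is passed to `urllib.request.urlopen`.

```python
import json
import urllib.request

# Base URL taken from the "url" fields in the response shown in this document.
BASE = "http://patchwork.dpdk.org/api"


def build_get(patch_id):
    """Build (but do not send) a GET request that shows one patch."""
    return urllib.request.Request(
        f"{BASE}/patches/{patch_id}/",
        headers={"Accept": "application/json"},
        method="GET",
    )


def build_patch(patch_id, fields, token):
    """Build (but do not send) a partial update carrying only `fields`.

    `token` and the "Token ..." header format are assumptions; check how
    the target server expects write requests to be authenticated.
    """
    return urllib.request.Request(
        f"{BASE}/patches/{patch_id}/",
        data=json.dumps(fields).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Token {token}",
        },
        method="PATCH",
    )


if __name__ == "__main__":
    req = build_get(129446)
    print(req.get_method(), req.full_url)
```

A PUT request would be built the same way with `method="PUT"`. Whether a given field is writable depends on the server and on the permissions of the authenticated user.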

GET /api/patches/129446/?format=api
HTTP 200 OK
Allow: GET, PUT, PATCH, HEAD, OPTIONS
Content-Type: application/json
Vary: Accept

{
    "id": 129446,
    "url": "http://patchwork.dpdk.org/api/patches/129446/?format=api",
    "web_url": "http://patchwork.dpdk.org/project/dpdk/patch/20230711102448.11627-6-liudongdong3@huawei.com/",
    "project": {
        "id": 1,
        "url": "http://patchwork.dpdk.org/api/projects/1/?format=api",
        "name": "DPDK",
        "link_name": "dpdk",
        "list_id": "dev.dpdk.org",
        "list_email": "dev@dpdk.org",
        "web_url": "http://core.dpdk.org",
        "scm_url": "git://dpdk.org/dpdk",
        "webscm_url": "http://git.dpdk.org/dpdk",
        "list_archive_url": "https://inbox.dpdk.org/dev",
        "list_archive_url_format": "https://inbox.dpdk.org/dev/{}",
        "commit_url_format": ""
    },
    "msgid": "<20230711102448.11627-6-liudongdong3@huawei.com>",
    "list_archive_url": "https://inbox.dpdk.org/dev/20230711102448.11627-6-liudongdong3@huawei.com",
    "date": "2023-07-11T10:24:48",
    "name": "[5/5] net/hns3: optimize SVE Rx performance",
    "commit_ref": null,
    "pull_url": null,
    "state": "accepted",
    "archived": true,
    "hash": "933c0f60225f66bcfe9200fbcaf0319a5f6a7a52",
    "submitter": {
        "id": 2718,
        "url": "http://patchwork.dpdk.org/api/people/2718/?format=api",
        "name": "Dongdong Liu",
        "email": "liudongdong3@huawei.com"
    },
    "delegate": {
        "id": 319,
        "url": "http://patchwork.dpdk.org/api/users/319/?format=api",
        "username": "fyigit",
        "first_name": "Ferruh",
        "last_name": "Yigit",
        "email": "ferruh.yigit@amd.com"
    },
    "mbox": "http://patchwork.dpdk.org/project/dpdk/patch/20230711102448.11627-6-liudongdong3@huawei.com/mbox/",
    "series": [
        {
            "id": 28901,
            "url": "http://patchwork.dpdk.org/api/series/28901/?format=api",
            "web_url": "http://patchwork.dpdk.org/project/dpdk/list/?series=28901",
            "date": "2023-07-11T10:24:43",
            "name": "net/hns3: some performance optimizations",
            "version": 1,
            "mbox": "http://patchwork.dpdk.org/series/28901/mbox/"
        }
    ],
    "comments": "http://patchwork.dpdk.org/api/patches/129446/comments/",
    "check": "success",
    "checks": "http://patchwork.dpdk.org/api/patches/129446/checks/",
    "tags": {},
    "related": [],
    "headers": {
        "Return-Path": "<dev-bounces@dpdk.org>",
        "X-Original-To": "patchwork@inbox.dpdk.org",
        "Delivered-To": "patchwork@inbox.dpdk.org",
        "Received": [
            "from mails.dpdk.org (mails.dpdk.org [217.70.189.124])\n\tby inbox.dpdk.org (Postfix) with ESMTP id 268B442E44;\n\tTue, 11 Jul 2023 12:28:31 +0200 (CEST)",
            "from mails.dpdk.org (localhost [127.0.0.1])\n\tby mails.dpdk.org (Postfix) with ESMTP id 41D5942D43;\n\tTue, 11 Jul 2023 12:28:09 +0200 (CEST)",
            "from szxga02-in.huawei.com (szxga02-in.huawei.com [45.249.212.188])\n by mails.dpdk.org (Postfix) with ESMTP id 3981242D40;\n Tue, 11 Jul 2023 12:28:07 +0200 (CEST)",
            "from kwepemi500017.china.huawei.com (unknown [172.30.72.56])\n by szxga02-in.huawei.com (SkyGuard) with ESMTP id 4R0cVb5MDVzVjHM;\n Tue, 11 Jul 2023 18:26:47 +0800 (CST)",
            "from localhost.localdomain (10.28.79.22) by\n kwepemi500017.china.huawei.com (7.221.188.110) with Microsoft SMTP Server\n (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id\n 15.1.2507.27; Tue, 11 Jul 2023 18:27:58 +0800"
        ],
        "From": "Dongdong Liu <liudongdong3@huawei.com>",
        "To": "<dev@dpdk.org>, <ferruh.yigit@amd.com>, <thomas@monjalon.net>,\n <andrew.rybchenko@oktetlabs.ru>",
        "CC": "<stable@dpdk.org>",
        "Subject": "[PATCH 5/5] net/hns3: optimize SVE Rx performance",
        "Date": "Tue, 11 Jul 2023 18:24:48 +0800",
        "Message-ID": "<20230711102448.11627-6-liudongdong3@huawei.com>",
        "X-Mailer": "git-send-email 2.22.0",
        "In-Reply-To": "<20230711102448.11627-1-liudongdong3@huawei.com>",
        "References": "<20230711102448.11627-1-liudongdong3@huawei.com>",
        "MIME-Version": "1.0",
        "Content-Transfer-Encoding": "8bit",
        "Content-Type": "text/plain",
        "X-Originating-IP": "[10.28.79.22]",
        "X-ClientProxiedBy": "dggems706-chm.china.huawei.com (10.3.19.183) To\n kwepemi500017.china.huawei.com (7.221.188.110)",
        "X-CFilter-Loop": "Reflected",
        "X-BeenThere": "dev@dpdk.org",
        "X-Mailman-Version": "2.1.29",
        "Precedence": "list",
        "List-Id": "DPDK patches and discussions <dev.dpdk.org>",
        "List-Unsubscribe": "<https://mails.dpdk.org/options/dev>,\n <mailto:dev-request@dpdk.org?subject=unsubscribe>",
        "List-Archive": "<http://mails.dpdk.org/archives/dev/>",
        "List-Post": "<mailto:dev@dpdk.org>",
        "List-Help": "<mailto:dev-request@dpdk.org?subject=help>",
        "List-Subscribe": "<https://mails.dpdk.org/listinfo/dev>,\n <mailto:dev-request@dpdk.org?subject=subscribe>",
        "Errors-To": "dev-bounces@dpdk.org"
    },
    "content": "From: Huisong Li <lihuisong@huawei.com>\n\nThis patch optimizes SVE Rx performance by the following ways:\n1> optimize the calculation of valid BD number.\n2> remove a temporary variable (key_fields)\n3> use C language to parse some descriptor fields, instead of\n   SVE instruction.\n4> small step prefetch descriptor.\n\nOn the rxonly forwarding mode, the performance of a single queue\nor 64B packet is improved by ~40%.\n\nSigned-off-by: Huisong Li <lihuisong@huawei.com>\nSigned-off-by: Dongdong Liu <liudongdong3@huawei.com>\n---\n drivers/net/hns3/hns3_rxtx_vec_sve.c | 138 ++++++---------------------\n 1 file changed, 28 insertions(+), 110 deletions(-)",
    "diff": "diff --git a/drivers/net/hns3/hns3_rxtx_vec_sve.c b/drivers/net/hns3/hns3_rxtx_vec_sve.c\nindex 54aef7db8d..0e9abfebec 100644\n--- a/drivers/net/hns3/hns3_rxtx_vec_sve.c\n+++ b/drivers/net/hns3/hns3_rxtx_vec_sve.c\n@@ -20,40 +20,36 @@\n \n #define BD_SIZE\t\t\t32\n #define BD_FIELD_ADDR_OFFSET\t0\n-#define BD_FIELD_L234_OFFSET\t8\n-#define BD_FIELD_XLEN_OFFSET\t12\n-#define BD_FIELD_RSS_OFFSET\t16\n-#define BD_FIELD_OL_OFFSET\t24\n #define BD_FIELD_VALID_OFFSET\t28\n \n-typedef struct {\n-\tuint32_t l234_info[HNS3_SVE_DEFAULT_DESCS_PER_LOOP];\n-\tuint32_t ol_info[HNS3_SVE_DEFAULT_DESCS_PER_LOOP];\n-\tuint32_t bd_base_info[HNS3_SVE_DEFAULT_DESCS_PER_LOOP];\n-} HNS3_SVE_KEY_FIELD_S;\n-\n static inline uint32_t\n hns3_desc_parse_field_sve(struct hns3_rx_queue *rxq,\n \t\t\t  struct rte_mbuf **rx_pkts,\n-\t\t\t  HNS3_SVE_KEY_FIELD_S *key,\n+\t\t\t  struct hns3_desc *rxdp,\n \t\t\t  uint32_t   bd_vld_num)\n {\n+\tuint32_t l234_info, ol_info, bd_base_info;\n \tuint32_t retcode = 0;\n \tint ret, i;\n \n \tfor (i = 0; i < (int)bd_vld_num; i++) {\n \t\t/* init rte_mbuf.rearm_data last 64-bit */\n \t\trx_pkts[i]->ol_flags = RTE_MBUF_F_RX_RSS_HASH;\n-\n-\t\tret = hns3_handle_bdinfo(rxq, rx_pkts[i], key->bd_base_info[i],\n-\t\t\t\t\t key->l234_info[i]);\n+\t\trx_pkts[i]->hash.rss = rxdp[i].rx.rss_hash;\n+\t\trx_pkts[i]->pkt_len = rte_le_to_cpu_16(rxdp[i].rx.pkt_len) -\n+\t\t\t\t\trxq->crc_len;\n+\t\trx_pkts[i]->data_len = rx_pkts[i]->pkt_len;\n+\n+\t\tl234_info = rxdp[i].rx.l234_info;\n+\t\tol_info = rxdp[i].rx.ol_info;\n+\t\tbd_base_info = rxdp[i].rx.bd_base_info;\n+\t\tret = hns3_handle_bdinfo(rxq, rx_pkts[i], bd_base_info, l234_info);\n \t\tif (unlikely(ret)) {\n \t\t\tretcode |= 1u << i;\n \t\t\tcontinue;\n \t\t}\n \n-\t\trx_pkts[i]->packet_type = hns3_rx_calc_ptype(rxq,\n-\t\t\t\t\tkey->l234_info[i], key->ol_info[i]);\n+\t\trx_pkts[i]->packet_type = hns3_rx_calc_ptype(rxq, l234_info, ol_info);\n \n \t\t/* Increment bytes counter */\n 
\t\trxq->basic_stats.bytes += rx_pkts[i]->pkt_len;\n@@ -77,46 +73,16 @@ hns3_recv_burst_vec_sve(struct hns3_rx_queue *__restrict rxq,\n \t\t\tuint16_t nb_pkts,\n \t\t\tuint64_t *bd_err_mask)\n {\n-#define XLEN_ADJUST_LEN\t\t32\n-#define RSS_ADJUST_LEN\t\t16\n-#define GEN_VLD_U8_ZIP_INDEX\tsvindex_s8(28, -4)\n \tuint16_t rx_id = rxq->next_to_use;\n \tstruct hns3_entry *sw_ring = &rxq->sw_ring[rx_id];\n \tstruct hns3_desc *rxdp = &rxq->rx_ring[rx_id];\n-\tstruct hns3_desc *rxdp2;\n-\tHNS3_SVE_KEY_FIELD_S key_field;\n+\tstruct hns3_desc *rxdp2, *next_rxdp;\n \tuint64_t bd_valid_num;\n \tuint32_t parse_retcode;\n \tuint16_t nb_rx = 0;\n \tint pos, offset;\n \n-\tuint16_t xlen_adjust[XLEN_ADJUST_LEN] = {\n-\t\t0,  0xffff, 1,  0xffff,    /* 1st mbuf: pkt_len and dat_len */\n-\t\t2,  0xffff, 3,  0xffff,    /* 2st mbuf: pkt_len and dat_len */\n-\t\t4,  0xffff, 5,  0xffff,    /* 3st mbuf: pkt_len and dat_len */\n-\t\t6,  0xffff, 7,  0xffff,    /* 4st mbuf: pkt_len and dat_len */\n-\t\t8,  0xffff, 9,  0xffff,    /* 5st mbuf: pkt_len and dat_len */\n-\t\t10, 0xffff, 11, 0xffff,    /* 6st mbuf: pkt_len and dat_len */\n-\t\t12, 0xffff, 13, 0xffff,    /* 7st mbuf: pkt_len and dat_len */\n-\t\t14, 0xffff, 15, 0xffff,    /* 8st mbuf: pkt_len and dat_len */\n-\t};\n-\n-\tuint32_t rss_adjust[RSS_ADJUST_LEN] = {\n-\t\t0, 0xffff,        /* 1st mbuf: rss */\n-\t\t1, 0xffff,        /* 2st mbuf: rss */\n-\t\t2, 0xffff,        /* 3st mbuf: rss */\n-\t\t3, 0xffff,        /* 4st mbuf: rss */\n-\t\t4, 0xffff,        /* 5st mbuf: rss */\n-\t\t5, 0xffff,        /* 6st mbuf: rss */\n-\t\t6, 0xffff,        /* 7st mbuf: rss */\n-\t\t7, 0xffff,        /* 8st mbuf: rss */\n-\t};\n-\n \tsvbool_t pg32 = svwhilelt_b32(0, HNS3_SVE_DEFAULT_DESCS_PER_LOOP);\n-\tsvuint16_t xlen_tbl1 = svld1_u16(PG16_256BIT, xlen_adjust);\n-\tsvuint16_t xlen_tbl2 = svld1_u16(PG16_256BIT, &xlen_adjust[16]);\n-\tsvuint32_t rss_tbl1 = svld1_u32(PG32_256BIT, rss_adjust);\n-\tsvuint32_t rss_tbl2 = svld1_u32(PG32_256BIT, 
&rss_adjust[8]);\n \n \t/* compile-time verifies the xlen_adjust mask */\n \tRTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, data_len) !=\n@@ -126,30 +92,21 @@ hns3_recv_burst_vec_sve(struct hns3_rx_queue *__restrict rxq,\n \n \tfor (pos = 0; pos < nb_pkts; pos += HNS3_SVE_DEFAULT_DESCS_PER_LOOP,\n \t\t\t\t     rxdp += HNS3_SVE_DEFAULT_DESCS_PER_LOOP) {\n-\t\tsvuint64_t vld_clz, mbp1st, mbp2st, mbuf_init;\n-\t\tsvuint64_t xlen1st, xlen2st, rss1st, rss2st;\n-\t\tsvuint32_t l234, ol, vld, vld2, xlen, rss;\n-\t\tsvuint8_t  vld_u8;\n+\t\tsvuint64_t mbp1st, mbp2st, mbuf_init;\n+\t\tsvuint32_t vld;\n+\t\tsvbool_t vld_op;\n \n \t\t/* calc how many bd valid: part 1 */\n \t\tvld = svld1_gather_u32offset_u32(pg32, (uint32_t *)rxdp,\n \t\t\tsvindex_u32(BD_FIELD_VALID_OFFSET, BD_SIZE));\n-\t\tvld2 = svlsl_n_u32_z(pg32, vld,\n-\t\t\t\t    HNS3_UINT32_BIT - 1 - HNS3_RXD_VLD_B);\n-\t\tvld2 = svreinterpret_u32_s32(svasr_n_s32_z(pg32,\n-\t\t\tsvreinterpret_s32_u32(vld2), HNS3_UINT32_BIT - 1));\n+\t\tvld = svand_n_u32_z(pg32, vld, BIT(HNS3_RXD_VLD_B));\n+\t\tvld_op = svcmpne_n_u32(pg32, vld, BIT(HNS3_RXD_VLD_B));\n+\t\tbd_valid_num = svcntp_b32(pg32, svbrkb_b_z(pg32, vld_op));\n+\t\tif (bd_valid_num == 0)\n+\t\t\tbreak;\n \n \t\t/* load 4 mbuf pointer */\n \t\tmbp1st = svld1_u64(PG64_256BIT, (uint64_t *)&sw_ring[pos]);\n-\n-\t\t/* calc how many bd valid: part 2 */\n-\t\tvld_u8 = svtbl_u8(svreinterpret_u8_u32(vld2),\n-\t\t\t\t  svreinterpret_u8_s8(GEN_VLD_U8_ZIP_INDEX));\n-\t\tvld_clz = svnot_u64_z(PG64_64BIT, svreinterpret_u64_u8(vld_u8));\n-\t\tvld_clz = svclz_u64_z(PG64_64BIT, vld_clz);\n-\t\tsvst1_u64(PG64_64BIT, &bd_valid_num, vld_clz);\n-\t\tbd_valid_num /= HNS3_UINT8_BIT;\n-\n \t\t/* load 4 more mbuf pointer */\n \t\tmbp2st = svld1_u64(PG64_256BIT, (uint64_t *)&sw_ring[pos + 4]);\n \n@@ -159,65 +116,25 @@ hns3_recv_burst_vec_sve(struct hns3_rx_queue *__restrict rxq,\n \n \t\t/* store 4 mbuf pointer into rx_pkts */\n \t\tsvst1_u64(PG64_256BIT, (uint64_t *)&rx_pkts[pos], 
mbp1st);\n-\n-\t\t/* load key field to vector reg */\n-\t\tl234 = svld1_gather_u32offset_u32(pg32, (uint32_t *)rxdp2,\n-\t\t\t\tsvindex_u32(BD_FIELD_L234_OFFSET, BD_SIZE));\n-\t\tol = svld1_gather_u32offset_u32(pg32, (uint32_t *)rxdp2,\n-\t\t\t\tsvindex_u32(BD_FIELD_OL_OFFSET, BD_SIZE));\n-\n \t\t/* store 4 mbuf pointer into rx_pkts again */\n \t\tsvst1_u64(PG64_256BIT, (uint64_t *)&rx_pkts[pos + 4], mbp2st);\n \n-\t\t/* load datalen, pktlen and rss_hash */\n-\t\txlen = svld1_gather_u32offset_u32(pg32, (uint32_t *)rxdp2,\n-\t\t\t\tsvindex_u32(BD_FIELD_XLEN_OFFSET, BD_SIZE));\n-\t\trss = svld1_gather_u32offset_u32(pg32, (uint32_t *)rxdp2,\n-\t\t\t\tsvindex_u32(BD_FIELD_RSS_OFFSET, BD_SIZE));\n-\n-\t\t/* store key field to stash buffer */\n-\t\tsvst1_u32(pg32, (uint32_t *)key_field.l234_info, l234);\n-\t\tsvst1_u32(pg32, (uint32_t *)key_field.bd_base_info, vld);\n-\t\tsvst1_u32(pg32, (uint32_t *)key_field.ol_info, ol);\n-\n-\t\t/* sub crc_len for pkt_len and data_len */\n-\t\txlen = svreinterpret_u32_u16(svsub_n_u16_z(PG16_256BIT,\n-\t\t\tsvreinterpret_u16_u32(xlen), rxq->crc_len));\n-\n \t\t/* init mbuf_initializer */\n \t\tmbuf_init = svdup_n_u64(rxq->mbuf_initializer);\n-\n-\t\t/* extract datalen, pktlen and rss from xlen and rss */\n-\t\txlen1st = svreinterpret_u64_u16(\n-\t\t\tsvtbl_u16(svreinterpret_u16_u32(xlen), xlen_tbl1));\n-\t\txlen2st = svreinterpret_u64_u16(\n-\t\t\tsvtbl_u16(svreinterpret_u16_u32(xlen), xlen_tbl2));\n-\t\trss1st = svreinterpret_u64_u32(\n-\t\t\tsvtbl_u32(svreinterpret_u32_u32(rss), rss_tbl1));\n-\t\trss2st = svreinterpret_u64_u32(\n-\t\t\tsvtbl_u32(svreinterpret_u32_u32(rss), rss_tbl2));\n-\n \t\t/* save mbuf_initializer */\n \t\tsvst1_scatter_u64base_offset_u64(PG64_256BIT, mbp1st,\n \t\t\toffsetof(struct rte_mbuf, rearm_data), mbuf_init);\n \t\tsvst1_scatter_u64base_offset_u64(PG64_256BIT, mbp2st,\n \t\t\toffsetof(struct rte_mbuf, rearm_data), mbuf_init);\n \n-\t\t/* save datalen and pktlen and rss 
*/\n-\t\tsvst1_scatter_u64base_offset_u64(PG64_256BIT, mbp1st,\n-\t\t\toffsetof(struct rte_mbuf, pkt_len), xlen1st);\n-\t\tsvst1_scatter_u64base_offset_u64(PG64_256BIT, mbp1st,\n-\t\t\toffsetof(struct rte_mbuf, hash.rss), rss1st);\n-\t\tsvst1_scatter_u64base_offset_u64(PG64_256BIT, mbp2st,\n-\t\t\toffsetof(struct rte_mbuf, pkt_len), xlen2st);\n-\t\tsvst1_scatter_u64base_offset_u64(PG64_256BIT, mbp2st,\n-\t\t\toffsetof(struct rte_mbuf, hash.rss), rss2st);\n-\n-\t\trte_prefetch_non_temporal(rxdp +\n-\t\t\t\t\t  HNS3_SVE_DEFAULT_DESCS_PER_LOOP);\n+\t\tnext_rxdp = rxdp + HNS3_SVE_DEFAULT_DESCS_PER_LOOP;\n+\t\trte_prefetch_non_temporal(next_rxdp);\n+\t\trte_prefetch_non_temporal(next_rxdp + 2);\n+\t\trte_prefetch_non_temporal(next_rxdp + 4);\n+\t\trte_prefetch_non_temporal(next_rxdp + 6);\n \n \t\tparse_retcode = hns3_desc_parse_field_sve(rxq, &rx_pkts[pos],\n-\t\t\t\t\t&key_field, bd_valid_num);\n+\t\t\t\t\t&rxdp2[offset], bd_valid_num);\n \t\tif (unlikely(parse_retcode))\n \t\t\t(*bd_err_mask) |= ((uint64_t)parse_retcode) << pos;\n \n@@ -237,6 +154,7 @@ hns3_recv_burst_vec_sve(struct hns3_rx_queue *__restrict rxq,\n \treturn nb_rx;\n }\n \n+\n uint16_t\n hns3_recv_pkts_vec_sve(void *__restrict rx_queue,\n \t\t       struct rte_mbuf **__restrict rx_pkts,\n",
    "prefixes": [
        "5/5"
    ]
}
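
A response like the one above can be reduced to a short summary once it has been decoded from JSON. The sketch below is illustrative only: `diffstat` and `summarize_patch` are hypothetical helper names, the field names are the ones visible in this response, and the sample dictionary is a hand-made stand-in rather than real Patchwork data.

```python
def diffstat(diff_text):
    """Count added and removed lines in a unified diff.

    Lines are counted only inside hunks (after an "@@" line), so the
    "---"/"+++" file headers are not mistaken for changes.
    """
    added = removed = 0
    in_hunk = False
    for line in diff_text.splitlines():
        if line.startswith("diff --git"):
            in_hunk = False  # a new file section starts with headers
        elif line.startswith("@@"):
            in_hunk = True
        elif in_hunk and line.startswith("+"):
            added += 1
        elif in_hunk and line.startswith("-"):
            removed += 1
    return added, removed


def summarize_patch(patch):
    """Pick a few fields out of a decoded patch object."""
    added, removed = diffstat(patch.get("diff") or "")
    delegate = patch.get("delegate") or {}  # "delegate" may be null
    return {
        "id": patch["id"],
        "name": patch["name"],
        "state": patch["state"],
        "check": patch["check"],
        "submitter": patch["submitter"]["email"],
        "delegate": delegate.get("username"),
        "series": [s["id"] for s in patch.get("series", [])],
        "added": added,
        "removed": removed,
    }


if __name__ == "__main__":
    # Hand-made sample with the same shape as the response; not real data.
    sample = {
        "id": 1,
        "name": "[1/1] example: sample patch",
        "state": "new",
        "check": "pending",
        "submitter": {"email": "someone@example.com"},
        "delegate": None,
        "series": [{"id": 10}],
        "diff": "diff --git a/f.c b/f.c\n--- a/f.c\n+++ b/f.c\n"
                "@@ -1,3 +1,4 @@\n ctx\n-old\n+new\n+more\n",
    }
    print(summarize_patch(sample))
```

Applied to the full response, the counts from `diffstat` can be compared against the insertion and deletion totals that the submitter's own diffstat reports in the `content` field.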