From patchwork Tue May 17 10:51:09 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Joyce Kong
X-Patchwork-Id: 111240
X-Patchwork-Delegate: ferruh.yigit@amd.com
From: Joyce Kong
To: Jakub Grajciar
Cc: ruifeng.wang@arm.com, dev@dpdk.org, nd@arm.com, Joyce Kong
Subject: [PATCH v1 2/2] net/memif: add a Tx fast path
Date: Tue, 17 May 2022 10:51:09 +0000
Message-Id: <20220517105109.1086090-3-joyce.kong@arm.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20220517105109.1086090-1-joyce.kong@arm.com>
References: <20220412093243.3670187-1-joyce.kong@arm.com>
 <20220517105109.1086090-1-joyce.kong@arm.com>
List-Id: DPDK patches and discussions

For memif non-zero-copy mode, there is a branch to compare the mbuf
and memif buffer size during memory copying.
If all mbufs come from the same mempool and the memif buffer size is
>= the mbuf size, take a fast Tx memory copy path that drops the
comparison branch and frees the mbufs in bulk; otherwise run the
original Tx path. Removing the branch and freeing in bulk give a
performance uplift of up to 17.15% in the tests below when
memif >= mbuf, with a regression of under 1% when memif < mbuf.

Test with 1p1q on Ampere Altra AArch64 server,
--------------------------------------------
  buf size  | memif >= mbuf | memif < mbuf |
--------------------------------------------
non-zc gain |     13.35%    |    -0.77%    |
--------------------------------------------
   zc gain  |     17.15%    |    -0.47%    |
--------------------------------------------

Test with 1p1q on Cascade Lake Xeon x86 server,
--------------------------------------------
  buf size  | memif >= mbuf | memif < mbuf |
--------------------------------------------
non-zc gain |     10.10%    |    -0.29%    |
--------------------------------------------
   zc gain  |      8.87%    |    -0.99%    |
--------------------------------------------

Signed-off-by: Joyce Kong
---
 drivers/net/memif/rte_eth_memif.c | 134 ++++++++++++++++++++----------
 1 file changed, 92 insertions(+), 42 deletions(-)

diff --git a/drivers/net/memif/rte_eth_memif.c b/drivers/net/memif/rte_eth_memif.c
index f55776ca46..f6ef7c6e93 100644
--- a/drivers/net/memif/rte_eth_memif.c
+++ b/drivers/net/memif/rte_eth_memif.c
@@ -660,62 +660,112 @@ eth_memif_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 		n_free = __atomic_load_n(&ring->head, __ATOMIC_ACQUIRE) - slot;
 	}
 
-	while (n_tx_pkts < nb_pkts && n_free) {
-		mbuf_head = *bufs++;
-		nb_segs = mbuf_head->nb_segs;
-		mbuf = mbuf_head;
+	uint8_t i;
+	struct rte_mbuf **buf_tmp = bufs;
+	mbuf_head = *buf_tmp++;
+	struct rte_mempool *mp = mbuf_head->pool;
+
+	for (i = 1; i < nb_pkts; i++) {
+		mbuf_head = *buf_tmp++;
+		if (mbuf_head->pool != mp)
+			break;
+	}
+
+	uint16_t mbuf_size = rte_pktmbuf_data_room_size(mp) - RTE_PKTMBUF_HEADROOM;
+	if (i == nb_pkts && pmd->cfg.pkt_buffer_size >= mbuf_size) {
+		buf_tmp = bufs;
+		while (n_tx_pkts < nb_pkts && n_free) {
+			mbuf_head = *bufs++;
+			nb_segs = mbuf_head->nb_segs;
+			mbuf = mbuf_head;
 
-		saved_slot = slot;
-		d0 = &ring->desc[slot & mask];
-		dst_off = 0;
-		dst_len = (type == MEMIF_RING_C2S) ?
-			pmd->run.pkt_buffer_size : d0->length;
+			saved_slot = slot;
 
-next_in_chain:
-		src_off = 0;
-		src_len = rte_pktmbuf_data_len(mbuf);
+next_in_chain1:
+			d0 = &ring->desc[slot & mask];
+			cp_len = rte_pktmbuf_data_len(mbuf);
 
-		while (src_len) {
-			if (dst_len == 0) {
+			rte_memcpy((uint8_t *)memif_get_buffer(proc_private, d0),
+				rte_pktmbuf_mtod(mbuf, void *), cp_len);
+
+			d0->length = cp_len;
+			mq->n_bytes += cp_len;
+			slot++;
+			n_free--;
+
+			if (--nb_segs > 0) {
 				if (n_free) {
-					slot++;
-					n_free--;
 					d0->flags |= MEMIF_DESC_FLAG_NEXT;
-					d0 = &ring->desc[slot & mask];
-					dst_off = 0;
-					dst_len = (type == MEMIF_RING_C2S) ?
-						pmd->run.pkt_buffer_size : d0->length;
-					d0->flags = 0;
+					mbuf = mbuf->next;
+					goto next_in_chain1;
 				} else {
 					slot = saved_slot;
-					goto no_free_slots;
+					goto free_mbufs;
 				}
 			}
-			cp_len = RTE_MIN(dst_len, src_len);
 
-			rte_memcpy((uint8_t *)memif_get_buffer(proc_private,
-				d0) + dst_off,
-				rte_pktmbuf_mtod_offset(mbuf, void *, src_off),
-				cp_len);
+			n_tx_pkts++;
+		}
+free_mbufs:
+		rte_pktmbuf_free_bulk(buf_tmp, n_tx_pkts);
+	} else {
+		while (n_tx_pkts < nb_pkts && n_free) {
+			mbuf_head = *bufs++;
+			nb_segs = mbuf_head->nb_segs;
+			mbuf = mbuf_head;
 
-			mq->n_bytes += cp_len;
-			src_off += cp_len;
-			dst_off += cp_len;
-			src_len -= cp_len;
-			dst_len -= cp_len;
+			saved_slot = slot;
+			d0 = &ring->desc[slot & mask];
+			dst_off = 0;
+			dst_len = (type == MEMIF_RING_C2S) ?
+				pmd->run.pkt_buffer_size : d0->length;
 
-			d0->length = dst_off;
-		}
+next_in_chain2:
+			src_off = 0;
+			src_len = rte_pktmbuf_data_len(mbuf);
 
-		if (--nb_segs > 0) {
-			mbuf = mbuf->next;
-			goto next_in_chain;
-		}
+			while (src_len) {
+				if (dst_len == 0) {
+					if (n_free) {
+						slot++;
+						n_free--;
+						d0->flags |= MEMIF_DESC_FLAG_NEXT;
+						d0 = &ring->desc[slot & mask];
+						dst_off = 0;
+						dst_len = (type == MEMIF_RING_C2S) ?
+							pmd->run.pkt_buffer_size : d0->length;
+						d0->flags = 0;
+					} else {
+						slot = saved_slot;
+						goto no_free_slots;
+					}
+				}
+				cp_len = RTE_MIN(dst_len, src_len);
 
-		n_tx_pkts++;
-		slot++;
-		n_free--;
-		rte_pktmbuf_free(mbuf_head);
+				rte_memcpy((uint8_t *)memif_get_buffer(proc_private,
+					d0) + dst_off,
+					rte_pktmbuf_mtod_offset(mbuf, void *, src_off),
+					cp_len);
+
+				mq->n_bytes += cp_len;
+				src_off += cp_len;
+				dst_off += cp_len;
+				src_len -= cp_len;
+				dst_len -= cp_len;
+
+				d0->length = dst_off;
+			}
+
+			if (--nb_segs > 0) {
+				mbuf = mbuf->next;
+				goto next_in_chain2;
+			}
+
+			n_tx_pkts++;
+			slot++;
+			n_free--;
+			rte_pktmbuf_free(mbuf_head);
+		}
 	}
 
 no_free_slots: