From patchwork Wed Jul 3 10:44:08 2019
X-Patchwork-Submitter: "Jakub Grajciar -X (jgrajcia - PANTHEON TECH SRO at Cisco)"
X-Patchwork-Id: 56007
X-Patchwork-Delegate: ferruh.yigit@amd.com
From: Jakub Grajciar
CC: Jakub Grajciar
Date: Wed, 3 Jul 2019 12:44:08 +0200
Message-ID: <20190703104408.6953-1-jgrajcia@cisco.com>
In-Reply-To: <20190702102156.28114-1-jgrajcia@cisco.com>
References: <20190702102156.28114-1-jgrajcia@cisco.com>
Subject: [dpdk-dev] [PATCH v3] net/memif: zero-copy slave

Zero-copy slave support for memif PMD. Slave interface exposes DPDK memory
to master interface. Only single file segments are supported (EAL option
--single-file-segments).

Signed-off-by: Jakub Grajciar
---
 doc/guides/nics/memif.rst         |  29 ++
 drivers/net/memif/Makefile        |   1 +
 drivers/net/memif/memif_rxtx.c    | 567 ++++++++++++++++++++++++++++
 drivers/net/memif/memif_rxtx.h    |  36 ++
 drivers/net/memif/memif_socket.c  |  63 ++--
 drivers/net/memif/meson.build     |   3 +-
 drivers/net/memif/rte_eth_memif.c | 594 +++++++++++-------------------
 drivers/net/memif/rte_eth_memif.h |  12 +-
 8 files changed, 887 insertions(+), 418 deletions(-)
 create mode 100644 drivers/net/memif/memif_rxtx.c
 create mode 100644 drivers/net/memif/memif_rxtx.h

V2:
- fix coding style

V3:
- fix compilation issues
--
2.17.1

diff --git a/doc/guides/nics/memif.rst b/doc/guides/nics/memif.rst
index de2d481eb..46cadb13f 100644
--- a/doc/guides/nics/memif.rst
+++ b/doc/guides/nics/memif.rst
@@ -171,6 +171,35 @@ Files
 - net/memif/memif.h *- descriptor and ring definitions*
 - net/memif/rte_eth_memif.c *- eth_memif_rx() eth_memif_tx()*
 
+Zero-copy slave
+~~~~~~~~~~~~~~~
+
+**Shared memory format**
+
+Region 0 is created by the memif driver and contains rings. The slave interface exposes DPDK memory (memseg).
+Instead of using memfd_create() to create a new shared file, existing memsegs are used.
+The master interface functions the same as with zero-copy disabled.
+
+region 0:
+
++-----------------------+
+|         Rings         |
++-----------+-----------+
+| S2M rings | M2S rings |
++-----------+-----------+
+
+region n:
+
++-----------------+
+|     Buffers     |
++-----------------+
+|memseg           |
++-----------------+
+
+Buffers are dequeued and enqueued as needed. The descriptor offset field is calculated at Tx.
+Only single-file-segments mode (EAL option --single-file-segments) is supported, as calculating
+the offset across multiple segments is too expensive.
+
 Example: testpmd
 ----------------------------
 In this example we run two instances of testpmd application and transmit packets over memif.
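Reader's note, not part of the patch: the offset calculation described in
the documentation hunk above can be illustrated with a small standalone
sketch. The types zc_region and zc_desc and the helpers zc_desc_fill() and
zc_desc_to_ptr() are hypothetical stand-ins for the driver's region and
descriptor handling, and the bounds checks are an assumption added for
illustration. The idea is that the slave stores a buffer's byte offset
from the base of the shared region, and the peer adds that offset to the
base of its own mapping of the same region.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a shared memory region (not the driver's type). */
struct zc_region {
	void *addr;         /* base address of this process's mapping */
	size_t region_size; /* size of the region in bytes */
};

/* Hypothetical stand-in for a ring descriptor (not the driver's type). */
struct zc_desc {
	uint16_t region; /* index of the region that holds the buffer */
	uint32_t length; /* number of valid bytes in the buffer */
	uint32_t offset; /* byte offset of the buffer from the region base */
};

/*
 * Sender side: describe a buffer that already lives inside the region.
 * Returns 0 on success, -1 if the buffer does not fit inside the region.
 */
static int
zc_desc_fill(struct zc_desc *d, const struct zc_region *r,
	     uint16_t region_idx, const void *buf, uint32_t len)
{
	uintptr_t base = (uintptr_t)r->addr;
	uintptr_t p = (uintptr_t)buf;
	size_t off;

	if (p < base)
		return -1;
	off = (size_t)(p - base);
	if (off > r->region_size || len > r->region_size - off)
		return -1;
	if (off > UINT32_MAX)
		return -1; /* offset would not fit the 32-bit field */

	d->region = region_idx;
	d->offset = (uint32_t)off;
	d->length = len;
	return 0;
}

/* Receiver side: translate (region, offset) into a local pointer. */
static void *
zc_desc_to_ptr(const struct zc_region *regions, const struct zc_desc *d)
{
	const struct zc_region *r = &regions[d->region];

	assert(d->offset <= r->region_size &&
	       d->length <= r->region_size - d->offset);
	return (uint8_t *)r->addr + d->offset;
}
```

With a single contiguous region the offset is one subtraction per buffer,
which is presumably why the patch restricts zero-copy to
--single-file-segments; with several segments the sender would first have
to find the segment that contains the buffer.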
diff --git a/drivers/net/memif/Makefile b/drivers/net/memif/Makefile index fdbdf3378..ceb24df73 100644 --- a/drivers/net/memif/Makefile +++ b/drivers/net/memif/Makefile @@ -30,5 +30,6 @@ LDLIBS += -lrte_bus_vdev # SRCS-$(CONFIG_RTE_LIBRTE_PMD_MEMIF) += rte_eth_memif.c SRCS-$(CONFIG_RTE_LIBRTE_PMD_MEMIF) += memif_socket.c +SRCS-$(CONFIG_RTE_LIBRTE_PMD_MEMIF) += memif_rxtx.c include $(RTE_SDK)/mk/rte.lib.mk diff --git a/drivers/net/memif/memif_rxtx.c b/drivers/net/memif/memif_rxtx.c new file mode 100644 index 000000000..f34cafacd --- /dev/null +++ b/drivers/net/memif/memif_rxtx.c @@ -0,0 +1,567 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright 2019 Cisco Systems, Inc. All rights reserved. + */ + +#include +#include + +#include +#include +#include +#include +#include + +#include + +#include "rte_eth_memif.h" +#include "memif_rxtx.h" + +static void * +memif_get_buffer(struct pmd_process_private *proc_private, memif_desc_t *d) +{ + return ((uint8_t *)proc_private->regions[d->region]->addr + d->offset); +} + +/* Free mbufs received by master */ +static void +memif_free_stored_mbufs(struct pmd_process_private *proc_private, struct memif_queue *mq) +{ + uint16_t mask = (1 << mq->log2_ring_size) - 1; + memif_ring_t *ring = memif_get_ring_from_queue(proc_private, mq); + + /* FIXME: improve performance */ + while (mq->last_tail != ring->tail) { + RTE_MBUF_PREFETCH_TO_FREE(mq->buffers[(mq->last_tail + 1) & mask]); + /* Decrement refcnt and free mbuf. 
(current segment) */ + rte_mbuf_refcnt_update(mq->buffers[mq->last_tail & mask], -1); + rte_pktmbuf_free_seg(mq->buffers[mq->last_tail & mask]); + mq->last_tail++; + } +} + +static int +memif_pktmbuf_chain(struct rte_mbuf *head, struct rte_mbuf *cur_tail, + struct rte_mbuf *tail) +{ + /* Check for number-of-segments-overflow */ + if (unlikely(head->nb_segs + tail->nb_segs > RTE_MBUF_MAX_NB_SEGS)) + return -EOVERFLOW; + + /* Chain 'tail' onto the old tail */ + cur_tail->next = tail; + + /* accumulate number of segments and total length. */ + head->nb_segs = (uint16_t)(head->nb_segs + tail->nb_segs); + + tail->pkt_len = tail->data_len; + head->pkt_len += tail->pkt_len; + + return 0; +} + +uint16_t +eth_memif_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts) +{ + struct memif_queue *mq = queue; + struct pmd_internals *pmd = rte_eth_devices[mq->in_port].data->dev_private; + struct pmd_process_private *proc_private = + rte_eth_devices[mq->in_port].process_private; + memif_ring_t *ring = memif_get_ring_from_queue(proc_private, mq); + uint16_t cur_slot, last_slot, n_slots, ring_size, mask, s0; + uint16_t n_rx_pkts = 0; + uint16_t mbuf_size = rte_pktmbuf_data_room_size(mq->mempool) - + RTE_PKTMBUF_HEADROOM; + uint16_t src_len, src_off, dst_len, dst_off, cp_len; + memif_ring_type_t type = mq->type; + memif_desc_t *d0; + struct rte_mbuf *mbuf, *mbuf_head, *mbuf_tail; + uint64_t b; + ssize_t size __rte_unused; + uint16_t head; + int ret; + struct rte_eth_link link; + + if (unlikely((pmd->flags & ETH_MEMIF_FLAG_CONNECTED) == 0)) + return 0; + if (unlikely(ring == NULL)) { + /* Secondary process will attempt to request regions. */ + rte_eth_link_get(mq->in_port, &link); + return 0; + } + + /* consume interrupt */ + if ((ring->flags & MEMIF_RING_FLAG_MASK_INT) == 0) + size = read(mq->intr_handle.fd, &b, sizeof(b)); + + ring_size = 1 << mq->log2_ring_size; + mask = ring_size - 1; + + cur_slot = (type == MEMIF_RING_S2M) ? 
mq->last_head : mq->last_tail; + last_slot = (type == MEMIF_RING_S2M) ? ring->head : ring->tail; + if (cur_slot == last_slot) + goto refill; + n_slots = last_slot - cur_slot; + + while (n_slots && n_rx_pkts < nb_pkts) { + mbuf_head = rte_pktmbuf_alloc(mq->mempool); + if (unlikely(mbuf_head == NULL)) + goto no_free_bufs; + mbuf = mbuf_head; + mbuf->port = mq->in_port; + +next_slot: + s0 = cur_slot & mask; + d0 = &ring->desc[s0]; + + src_len = d0->length; + dst_off = 0; + src_off = 0; + + do { + dst_len = mbuf_size - dst_off; + if (dst_len == 0) { + dst_off = 0; + dst_len = mbuf_size; + + /* store pointer to tail */ + mbuf_tail = mbuf; + mbuf = rte_pktmbuf_alloc(mq->mempool); + if (unlikely(mbuf == NULL)) + goto no_free_bufs; + mbuf->port = mq->in_port; + ret = memif_pktmbuf_chain(mbuf_head, mbuf_tail, mbuf); + if (unlikely(ret < 0)) { + MIF_LOG(ERR, "number-of-segments-overflow"); + rte_pktmbuf_free(mbuf); + goto no_free_bufs; + } + } + cp_len = RTE_MIN(dst_len, src_len); + + rte_pktmbuf_data_len(mbuf) += cp_len; + rte_pktmbuf_pkt_len(mbuf) = rte_pktmbuf_data_len(mbuf); + if (mbuf != mbuf_head) + rte_pktmbuf_pkt_len(mbuf_head) += cp_len; + + memcpy(rte_pktmbuf_mtod_offset(mbuf, void *, dst_off), + (uint8_t *)memif_get_buffer(proc_private, d0) + src_off, + cp_len); + + src_off += cp_len; + dst_off += cp_len; + src_len -= cp_len; + } while (src_len); + + cur_slot++; + n_slots--; + + if (d0->flags & MEMIF_DESC_FLAG_NEXT) + goto next_slot; + + mq->n_bytes += rte_pktmbuf_pkt_len(mbuf_head); + *bufs++ = mbuf_head; + n_rx_pkts++; + } + +no_free_bufs: + if (type == MEMIF_RING_S2M) { + rte_mb(); + ring->tail = cur_slot; + mq->last_head = cur_slot; + } else { + mq->last_tail = cur_slot; + } + +refill: + if (type == MEMIF_RING_M2S) { + head = ring->head; + n_slots = ring_size - head + mq->last_tail; + + while (n_slots--) { + s0 = head++ & mask; + d0 = &ring->desc[s0]; + d0->length = pmd->run.pkt_buffer_size; + } + rte_mb(); + ring->head = head; + } + + mq->n_pkts += n_rx_pkts; 
+ return n_rx_pkts; +} + +uint16_t +eth_memif_rx_zc(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts) +{ + struct memif_queue *mq = queue; + struct pmd_internals *pmd = rte_eth_devices[mq->in_port].data->dev_private; + struct pmd_process_private *proc_private = + rte_eth_devices[mq->in_port].process_private; + memif_ring_t *ring = memif_get_ring_from_queue(proc_private, mq); + uint16_t cur_slot, last_slot, n_slots, ring_size, mask, s0, head; + uint16_t n_rx_pkts = 0; + memif_desc_t *d0; + struct rte_mbuf *mbuf, *mbuf_tail; + struct rte_mbuf *mbuf_head = NULL; + int ret; + struct rte_eth_link link; + + if (unlikely((pmd->flags & ETH_MEMIF_FLAG_CONNECTED) == 0)) + return 0; + if (unlikely(ring == NULL)) { + /* Secondary process will attempt to request regions. */ + rte_eth_link_get(mq->in_port, &link); + return 0; + } + + /* consume interrupt */ + if ((ring->flags & MEMIF_RING_FLAG_MASK_INT) == 0) { + uint64_t b; + ssize_t size __rte_unused; + size = read(mq->intr_handle.fd, &b, sizeof(b)); + } + + ring_size = 1 << mq->log2_ring_size; + mask = ring_size - 1; + + cur_slot = mq->last_tail; + last_slot = ring->tail; + if (cur_slot == last_slot) + goto refill; + n_slots = last_slot - cur_slot; + + while (n_slots && n_rx_pkts < nb_pkts) { + s0 = cur_slot & mask; + + d0 = &ring->desc[s0]; + mbuf_head = mq->buffers[s0]; + mbuf = mbuf_head; + +next_slot: + /* prefetch next descriptor */ + if (n_rx_pkts + 1 < nb_pkts) + rte_prefetch0(&ring->desc[(cur_slot + 1) & mask]); + + mbuf->port = mq->in_port; + rte_pktmbuf_data_len(mbuf) = d0->length; + rte_pktmbuf_pkt_len(mbuf) = rte_pktmbuf_data_len(mbuf); + + mq->n_bytes += rte_pktmbuf_data_len(mbuf); + + cur_slot++; + n_slots--; + if (d0->flags & MEMIF_DESC_FLAG_NEXT) { + s0 = cur_slot & mask; + d0 = &ring->desc[s0]; + mbuf_tail = mbuf; + mbuf = mq->buffers[s0]; + ret = memif_pktmbuf_chain(mbuf_head, mbuf_tail, mbuf); + if (unlikely(ret < 0)) { + MIF_LOG(ERR, "number-of-segments-overflow"); + goto refill; + } + goto next_slot; 
+ } + + *bufs++ = mbuf_head; + n_rx_pkts++; + } + + mq->last_tail = cur_slot; + +/* Supply master with new buffers */ +refill: + head = ring->head; + n_slots = ring_size - head + mq->last_tail; + + if (n_slots < 32) + goto no_free_mbufs; + + ret = rte_pktmbuf_alloc_bulk(mq->mempool, &mq->buffers[head & mask], n_slots); + if (unlikely(ret < 0)) + goto no_free_mbufs; + + while (n_slots--) { + s0 = head++ & mask; + if (n_slots > 0) + rte_prefetch0(mq->buffers[head & mask]); + d0 = &ring->desc[s0]; + /* store buffer header */ + mbuf = mq->buffers[s0]; + /* populate descriptor */ + d0->length = rte_pktmbuf_data_room_size(mq->mempool) - + RTE_PKTMBUF_HEADROOM; + d0->region = 1; + d0->offset = rte_pktmbuf_mtod(mbuf, uint8_t *) - + (uint8_t *)proc_private->regions[d0->region]->addr; + } +no_free_mbufs: + rte_mb(); + ring->head = head; + + mq->n_pkts += n_rx_pkts; + + return n_rx_pkts; +} + +uint16_t +eth_memif_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts) +{ + struct memif_queue *mq = queue; + struct pmd_internals *pmd = rte_eth_devices[mq->in_port].data->dev_private; + struct pmd_process_private *proc_private = + rte_eth_devices[mq->in_port].process_private; + memif_ring_t *ring = memif_get_ring_from_queue(proc_private, mq); + uint16_t slot, saved_slot, n_free, ring_size, mask, n_tx_pkts = 0; + uint16_t src_len, src_off, dst_len, dst_off, cp_len; + memif_ring_type_t type = mq->type; + memif_desc_t *d0; + struct rte_mbuf *mbuf; + struct rte_mbuf *mbuf_head; + uint64_t a; + ssize_t size; + struct rte_eth_link link; + + if (unlikely((pmd->flags & ETH_MEMIF_FLAG_CONNECTED) == 0)) + return 0; + if (unlikely(ring == NULL)) { + /* Secondary process will attempt to request regions. */ + rte_eth_link_get(mq->in_port, &link); + return 0; + } + + ring_size = 1 << mq->log2_ring_size; + mask = ring_size - 1; + + n_free = ring->tail - mq->last_tail; + mq->last_tail += n_free; + slot = (type == MEMIF_RING_S2M) ? 
ring->head : ring->tail; + + if (type == MEMIF_RING_S2M) + n_free = ring_size - ring->head + mq->last_tail; + else + n_free = ring->head - ring->tail; + + while (n_tx_pkts < nb_pkts && n_free) { + mbuf_head = *bufs++; + mbuf = mbuf_head; + + saved_slot = slot; + d0 = &ring->desc[slot & mask]; + dst_off = 0; + dst_len = (type == MEMIF_RING_S2M) ? + pmd->run.pkt_buffer_size : d0->length; + +next_in_chain: + src_off = 0; + src_len = rte_pktmbuf_data_len(mbuf); + + while (src_len) { + if (dst_len == 0) { + if (n_free) { + slot++; + n_free--; + d0->flags |= MEMIF_DESC_FLAG_NEXT; + d0 = &ring->desc[slot & mask]; + dst_off = 0; + dst_len = (type == MEMIF_RING_S2M) ? + pmd->run.pkt_buffer_size : d0->length; + d0->flags = 0; + } else { + slot = saved_slot; + goto no_free_slots; + } + } + cp_len = RTE_MIN(dst_len, src_len); + + memcpy((uint8_t *)memif_get_buffer(proc_private, d0) + dst_off, + rte_pktmbuf_mtod_offset(mbuf, void *, src_off), + cp_len); + + mq->n_bytes += cp_len; + src_off += cp_len; + dst_off += cp_len; + src_len -= cp_len; + dst_len -= cp_len; + + d0->length = dst_off; + } + + if (rte_pktmbuf_is_contiguous(mbuf) == 0) { + mbuf = mbuf->next; + goto next_in_chain; + } + + n_tx_pkts++; + slot++; + n_free--; + rte_pktmbuf_free(mbuf_head); + } + +no_free_slots: + rte_mb(); + if (type == MEMIF_RING_S2M) + ring->head = slot; + else + ring->tail = slot; + + if ((ring->flags & MEMIF_RING_FLAG_MASK_INT) == 0) { + a = 1; + size = write(mq->intr_handle.fd, &a, sizeof(a)); + if (unlikely(size < 0)) { + MIF_LOG(WARNING, + "Failed to send interrupt. 
%s", strerror(errno)); + } + } + + mq->n_err += nb_pkts - n_tx_pkts; + mq->n_pkts += n_tx_pkts; + return n_tx_pkts; +} + +static inline int +memif_tx_one_zc(struct pmd_process_private *proc_private, struct memif_queue *mq, + memif_ring_t *ring, struct rte_mbuf *mbuf, const uint16_t mask, + uint16_t slot, uint16_t n_free) +{ + memif_desc_t *d0; + int used_slots = 1; + +next_in_chain: + /* store pointer to mbuf to free it later */ + mq->buffers[slot & mask] = mbuf; + /* Increment refcnt to make sure the buffer is not freed before master + * receives it. (current segment) + */ + rte_mbuf_refcnt_update(mbuf, 1); + /* populate descriptor */ + d0 = &ring->desc[slot & mask]; + d0->length = rte_pktmbuf_data_len(mbuf); + /* FIXME: get region index */ + d0->region = 1; + d0->offset = rte_pktmbuf_mtod(mbuf, uint8_t *) - + (uint8_t *)proc_private->regions[d0->region]->addr; + d0->flags = 0; + + /* check if buffer is chained */ + if (rte_pktmbuf_is_contiguous(mbuf) == 0) { + if (n_free < 2) + return 0; + /* mark buffer as chained */ + d0->flags |= MEMIF_DESC_FLAG_NEXT; + /* advance mbuf */ + mbuf = mbuf->next; + /* update counters */ + used_slots++; + slot++; + n_free--; + goto next_in_chain; + } + return used_slots; +} + +uint16_t +eth_memif_tx_zc(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts) +{ + struct memif_queue *mq = queue; + struct pmd_internals *pmd = rte_eth_devices[mq->in_port].data->dev_private; + struct pmd_process_private *proc_private = + rte_eth_devices[mq->in_port].process_private; + memif_ring_t *ring = memif_get_ring_from_queue(proc_private, mq); + uint16_t slot, n_free, ring_size, mask, n_tx_pkts = 0; + memif_ring_type_t type = mq->type; + struct rte_eth_link link; + + if (unlikely((pmd->flags & ETH_MEMIF_FLAG_CONNECTED) == 0)) + return 0; + if (unlikely(ring == NULL)) { + /* Secondary process will attempt to request regions. 
*/ + rte_eth_link_get(mq->in_port, &link); + return 0; + } + + ring_size = 1 << mq->log2_ring_size; + mask = ring_size - 1; + + /* free mbufs received by master */ + memif_free_stored_mbufs(proc_private, mq); + + /* ring type always MEMIF_RING_S2M */ + slot = ring->head; + n_free = ring_size - ring->head + mq->last_tail; + + int used_slots; + + while (n_free && (n_tx_pkts < nb_pkts)) { + while ((n_free > 4) && ((nb_pkts - n_tx_pkts) > 4)) { + if ((nb_pkts - n_tx_pkts) > 8) { + rte_prefetch0(*bufs + 4); + rte_prefetch0(*bufs + 5); + rte_prefetch0(*bufs + 6); + rte_prefetch0(*bufs + 7); + } + used_slots = memif_tx_one_zc(proc_private, mq, ring, *bufs++, + mask, slot, n_free); + if (unlikely(used_slots < 1)) + goto no_free_slots; + n_tx_pkts++; + slot += used_slots; + n_free -= used_slots; + + used_slots = memif_tx_one_zc(proc_private, mq, ring, *bufs++, + mask, slot, n_free); + if (unlikely(used_slots < 1)) + goto no_free_slots; + n_tx_pkts++; + slot += used_slots; + n_free -= used_slots; + + used_slots = memif_tx_one_zc(proc_private, mq, ring, *bufs++, + mask, slot, n_free); + if (unlikely(used_slots < 1)) + goto no_free_slots; + n_tx_pkts++; + slot += used_slots; + n_free -= used_slots; + + used_slots = memif_tx_one_zc(proc_private, mq, ring, *bufs++, + mask, slot, n_free); + if (unlikely(used_slots < 1)) + goto no_free_slots; + n_tx_pkts++; + slot += used_slots; + n_free -= used_slots; + } + used_slots = memif_tx_one_zc(proc_private, mq, ring, *bufs++, + mask, slot, n_free); + if (unlikely(used_slots < 1)) + goto no_free_slots; + n_tx_pkts++; + slot += used_slots; + n_free -= used_slots; + } + +no_free_slots: + rte_mb(); + /* update ring pointers */ + if (type == MEMIF_RING_S2M) + ring->head = slot; + else + ring->tail = slot; + + /* Send interrupt, if enabled. 
*/ + if ((ring->flags & MEMIF_RING_FLAG_MASK_INT) == 0) { + uint64_t a = 1; + ssize_t size = write(mq->intr_handle.fd, &a, sizeof(a)); + if (unlikely(size < 0)) { + MIF_LOG(WARNING, + "Failed to send interrupt. %s", strerror(errno)); + } + } + + /* increment queue counters */ + mq->n_err += nb_pkts - n_tx_pkts; + mq->n_pkts += n_tx_pkts; + + return n_tx_pkts; +} diff --git a/drivers/net/memif/memif_rxtx.h b/drivers/net/memif/memif_rxtx.h new file mode 100644 index 000000000..1d00865b8 --- /dev/null +++ b/drivers/net/memif/memif_rxtx.h @@ -0,0 +1,36 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright 2019 Cisco Systems, Inc. All rights reserved. + */ + +#ifndef _MEMIF_RX_TX_H_ +#define _MEMIF_RX_TX_H_ + +#include "memif.h" + +/** + * Get memif ring from shared memory. + * + * @param pmd + * device internals + * @param type + * memif ring direction + * @param ring_idx + * ring index + * + * @return + * - memif ring + */ + +uint16_t +eth_memif_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts); + +uint16_t +eth_memif_rx_zc(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts); + +uint16_t +eth_memif_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts); + +uint16_t +eth_memif_tx_zc(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts); + +#endif /* _MEMIF_RX_TX_H_ */ diff --git a/drivers/net/memif/memif_socket.c b/drivers/net/memif/memif_socket.c index 01a935f87..2bd7f2927 100644 --- a/drivers/net/memif/memif_socket.c +++ b/drivers/net/memif/memif_socket.c @@ -176,8 +176,7 @@ memif_msg_receive_hello(struct rte_eth_dev *dev, memif_msg_t *msg) strlcpy(pmd->remote_name, (char *)h->name, sizeof(pmd->remote_name)); - MIF_LOG(DEBUG, "%s: Connecting to %s.", - rte_vdev_device_name(pmd->vdev), pmd->remote_name); + MIF_LOG(DEBUG, "Connecting to %s.", pmd->remote_name); return 0; } @@ -339,8 +338,7 @@ memif_msg_receive_connect(struct rte_eth_dev *dev, memif_msg_t *msg) strlcpy(pmd->remote_if_name, (char *)c->if_name, sizeof(pmd->remote_if_name)); - 
MIF_LOG(INFO, "%s: Remote interface %s connected.", - rte_vdev_device_name(pmd->vdev), pmd->remote_if_name); + MIF_LOG(INFO, "Remote interface %s connected.", pmd->remote_if_name); return 0; } @@ -358,8 +356,7 @@ memif_msg_receive_connected(struct rte_eth_dev *dev, memif_msg_t *msg) strlcpy(pmd->remote_if_name, (char *)c->if_name, sizeof(pmd->remote_if_name)); - MIF_LOG(INFO, "%s: Remote interface %s connected.", - rte_vdev_device_name(pmd->vdev), pmd->remote_if_name); + MIF_LOG(INFO, "Remote interface %s connected.", pmd->remote_if_name); return 0; } @@ -370,14 +367,13 @@ memif_msg_receive_disconnect(struct rte_eth_dev *dev, memif_msg_t *msg) struct pmd_internals *pmd = dev->data->dev_private; memif_msg_disconnect_t *d = &msg->disconnect; - memset(pmd->remote_disc_string, 0, ETH_MEMIF_DISC_STRING_SIZE); + memset(pmd->remote_disc_string, 0, sizeof(pmd->remote_disc_string)); strlcpy(pmd->remote_disc_string, (char *)d->string, sizeof(pmd->remote_disc_string)); - MIF_LOG(INFO, "%s: Disconnect received: %s", - rte_vdev_device_name(pmd->vdev), pmd->remote_disc_string); + MIF_LOG(INFO, "Disconnect received: %s", pmd->remote_disc_string); - memset(pmd->local_disc_string, 0, ETH_MEMIF_DISC_STRING_SIZE); + memset(pmd->local_disc_string, 0, 96); memif_disconnect(dev); return 0; } @@ -473,7 +469,6 @@ memif_msg_enq_connect(struct rte_eth_dev *dev) { struct pmd_internals *pmd = dev->data->dev_private; struct memif_msg_queue_elt *e = memif_msg_enq(pmd->cc); - const char *name = rte_vdev_device_name(pmd->vdev); memif_msg_connect_t *c; if (e == NULL) @@ -481,7 +476,7 @@ memif_msg_enq_connect(struct rte_eth_dev *dev) c = &e->msg.connect; e->msg.type = MEMIF_MSG_TYPE_CONNECT; - strlcpy((char *)c->if_name, name, sizeof(c->if_name)); + strlcpy((char *)c->if_name, dev->data->name, sizeof(c->if_name)); return 0; } @@ -491,7 +486,6 @@ memif_msg_enq_connected(struct rte_eth_dev *dev) { struct pmd_internals *pmd = dev->data->dev_private; struct memif_msg_queue_elt *e = 
memif_msg_enq(pmd->cc); - const char *name = rte_vdev_device_name(pmd->vdev); memif_msg_connected_t *c; if (e == NULL) @@ -499,7 +493,7 @@ memif_msg_enq_connected(struct rte_eth_dev *dev) c = &e->msg.connected; e->msg.type = MEMIF_MSG_TYPE_CONNECTED; - strlcpy((char *)c->if_name, name, sizeof(c->if_name)); + strlcpy((char *)c->if_name, dev->data->name, sizeof(c->if_name)); return 0; } @@ -525,7 +519,6 @@ void memif_disconnect(struct rte_eth_dev *dev) { struct pmd_internals *pmd = dev->data->dev_private; - struct pmd_process_private *proc_private = dev->process_private; struct memif_msg_queue_elt *elt, *next; struct memif_queue *mq; struct rte_intr_handle *ih; @@ -615,7 +608,7 @@ memif_disconnect(struct rte_eth_dev *dev) } } - memif_free_regions(proc_private); + memif_free_regions(dev); /* reset connection configuration */ memset(&pmd->run, 0, sizeof(pmd->run)); @@ -662,7 +655,7 @@ memif_msg_receive(struct memif_control_channel *cc) if (cmsg->cmsg_type == SCM_CREDENTIALS) cr = (struct ucred *)CMSG_DATA(cmsg); else if (cmsg->cmsg_type == SCM_RIGHTS) - memcpy(&afd, CMSG_DATA(cmsg), sizeof(int)); + rte_memcpy(&afd, CMSG_DATA(cmsg), sizeof(int)); } cmsg = CMSG_NXTHDR(&mh, cmsg); } @@ -861,7 +854,7 @@ memif_listener_handler(void *arg) } static struct memif_socket * -memif_socket_create(struct pmd_internals *pmd, char *key, uint8_t listener) +memif_socket_create(char *key, uint8_t listener) { struct memif_socket *sock; struct sockaddr_un un; @@ -899,17 +892,15 @@ memif_socket_create(struct pmd_internals *pmd, char *key, uint8_t listener) if (ret < 0) goto error; - MIF_LOG(DEBUG, "%s: Memif listener socket %s created.", - rte_vdev_device_name(pmd->vdev), sock->filename); + MIF_LOG(DEBUG, "Memif listener socket %s created.", sock->filename); sock->intr_handle.fd = sockfd; sock->intr_handle.type = RTE_INTR_HANDLE_EXT; ret = rte_intr_callback_register(&sock->intr_handle, memif_listener_handler, sock); if (ret < 0) { - MIF_LOG(ERR, "%s: Failed to register interrupt " - 
"callback for listener socket", - rte_vdev_device_name(pmd->vdev)); + MIF_LOG(ERR, "Failed to register interrupt " + "callback for listener socket"); return NULL; } } @@ -917,8 +908,7 @@ memif_socket_create(struct pmd_internals *pmd, char *key, uint8_t listener) return sock; error: - MIF_LOG(ERR, "%s: Failed to setup socket %s: %s", - rte_vdev_device_name(pmd->vdev), key, strerror(errno)); + MIF_LOG(ERR, "Failed to setup socket %s: %s", key, strerror(errno)); if (sock != NULL) rte_free(sock); return NULL; @@ -960,9 +950,8 @@ memif_socket_init(struct rte_eth_dev *dev, const char *socket_filename) rte_memcpy(key, socket_filename, strlen(socket_filename)); ret = rte_hash_lookup_data(hash, key, (void **)&socket); if (ret < 0) { - socket = memif_socket_create(pmd, key, - (pmd->role == - MEMIF_ROLE_SLAVE) ? 0 : 1); + socket = memif_socket_create(key, + (pmd->role == MEMIF_ROLE_SLAVE) ? 0 : 1); if (socket == NULL) return -1; ret = rte_hash_add_key_data(hash, key, socket); @@ -993,8 +982,7 @@ memif_socket_init(struct rte_eth_dev *dev, const char *socket_filename) elt = rte_malloc("pmd-queue", sizeof(struct memif_socket_dev_list_elt), 0); if (elt == NULL) { - MIF_LOG(ERR, "%s: Failed to add device to socket device list.", - rte_vdev_device_name(pmd->vdev)); + MIF_LOG(ERR, "Failed to add device to socket device list."); return -1; } elt->dev = dev; @@ -1068,8 +1056,7 @@ memif_connect_slave(struct rte_eth_dev *dev) sockfd = socket(AF_UNIX, SOCK_SEQPACKET, 0); if (sockfd < 0) { - MIF_LOG(ERR, "%s: Failed to open socket.", - rte_vdev_device_name(pmd->vdev)); + MIF_LOG(ERR, "Failed to open socket."); return -1; } @@ -1080,19 +1067,16 @@ memif_connect_slave(struct rte_eth_dev *dev) ret = connect(sockfd, (struct sockaddr *)&sun, sizeof(struct sockaddr_un)); if (ret < 0) { - MIF_LOG(ERR, "%s: Failed to connect socket: %s.", - rte_vdev_device_name(pmd->vdev), pmd->socket_filename); + MIF_LOG(ERR, "Failed to connect socket: %s.", pmd->socket_filename); goto error; } - MIF_LOG(DEBUG, 
"%s: Memif socket: %s connected.", - rte_vdev_device_name(pmd->vdev), pmd->socket_filename); + MIF_LOG(DEBUG, "Memif socket: %s connected.", pmd->socket_filename); pmd->cc = rte_zmalloc("memif-cc", sizeof(struct memif_control_channel), 0); if (pmd->cc == NULL) { - MIF_LOG(ERR, "%s: Failed to allocate control channel.", - rte_vdev_device_name(pmd->vdev)); + MIF_LOG(ERR, "Failed to allocate control channel."); goto error; } @@ -1105,8 +1089,7 @@ memif_connect_slave(struct rte_eth_dev *dev) ret = rte_intr_callback_register(&pmd->cc->intr_handle, memif_intr_handler, pmd->cc); if (ret < 0) { - MIF_LOG(ERR, "%s: Failed to register interrupt callback " - "for control fd", rte_vdev_device_name(pmd->vdev)); + MIF_LOG(ERR, "Failed to register interrupt callback for control fd"); goto error; } diff --git a/drivers/net/memif/meson.build b/drivers/net/memif/meson.build index bedc97311..ce7acd1a0 100644 --- a/drivers/net/memif/meson.build +++ b/drivers/net/memif/meson.build @@ -6,7 +6,8 @@ if host_machine.system() != 'linux' endif sources = files('rte_eth_memif.c', - 'memif_socket.c') + 'memif_socket.c', + 'memif_rxtx.c') allow_experimental_apis = true # Experimantal APIs: diff --git a/drivers/net/memif/rte_eth_memif.c b/drivers/net/memif/rte_eth_memif.c index e9ddf6413..324700a2f 100644 --- a/drivers/net/memif/rte_eth_memif.c +++ b/drivers/net/memif/rte_eth_memif.c @@ -23,9 +23,14 @@ #include #include #include +#include +#include +#include +#include #include "rte_eth_memif.h" #include "memif_socket.h" +#include "memif_rxtx.h" #define ETH_MEMIF_ID_ARG "id" #define ETH_MEMIF_ROLE_ARG "role" @@ -56,6 +61,122 @@ memif_version(void) return ("memif-" RTE_STR(MEMIF_VERSION_MAJOR) "." 
RTE_STR(MEMIF_VERSION_MINOR)); } +static int +memif_region_init_zc(const struct rte_memseg_list *msl, const struct rte_memseg *ms, + void *arg) +{ + struct pmd_process_private *proc_private = (struct pmd_process_private *)arg; + struct memif_region *r; + + if (proc_private->regions_num < 1) { + MIF_LOG(ERR, "Missing descriptor region"); + return -1; + } + + r = proc_private->regions[proc_private->regions_num - 1]; + + if (r->addr != msl->base_va) + r = proc_private->regions[++proc_private->regions_num - 1]; + + if (r == NULL) { + r = rte_zmalloc("region", sizeof(struct memif_region), 0); + if (r == NULL) { + MIF_LOG(ERR, "Failed to alloc memif region."); + return -ENOMEM; + } + + r->addr = msl->base_va; + r->region_size = ms->len; + r->fd = rte_memseg_get_fd(ms); + if (r->fd < 0) + return -1; + r->pkt_buffer_offset = 0; + + proc_private->regions[proc_private->regions_num - 1] = r; + } else { + r->region_size += ms->len; + } + + return 0; +} + +static int +memif_region_init_shm(struct rte_eth_dev *dev, uint8_t has_buffers) +{ + struct pmd_internals *pmd = dev->data->dev_private; + struct pmd_process_private *proc_private = dev->process_private; + char shm_name[ETH_MEMIF_SHM_NAME_SIZE]; + int ret = 0; + struct memif_region *r; + + if (proc_private->regions_num >= ETH_MEMIF_MAX_REGION_NUM) { + MIF_LOG(ERR, "Too many regions."); + return -1; + } + + r = rte_zmalloc("region", sizeof(struct memif_region), 0); + if (r == NULL) { + MIF_LOG(ERR, "Failed to alloc memif region."); + return -ENOMEM; + } + + /* calculate buffer offset */ + r->pkt_buffer_offset = (pmd->run.num_s2m_rings + pmd->run.num_m2s_rings) * + (sizeof(memif_ring_t) + sizeof(memif_desc_t) * + (1 << pmd->run.log2_ring_size)); + + r->region_size = r->pkt_buffer_offset; + /* if region has buffers, add buffers size to region_size */ + if (has_buffers == 1) + r->region_size += (uint32_t)(pmd->run.pkt_buffer_size * + (1 << pmd->run.log2_ring_size) * + (pmd->run.num_s2m_rings + + pmd->run.num_m2s_rings)); + + 
memset(shm_name, 0, sizeof(char) * ETH_MEMIF_SHM_NAME_SIZE); + snprintf(shm_name, ETH_MEMIF_SHM_NAME_SIZE, "memif_region_%d", + proc_private->regions_num); + + r->fd = memfd_create(shm_name, MFD_ALLOW_SEALING); + if (r->fd < 0) { + MIF_LOG(ERR, "Failed to create shm file: %s.", strerror(errno)); + ret = -1; + goto error; + } + + ret = fcntl(r->fd, F_ADD_SEALS, F_SEAL_SHRINK); + if (ret < 0) { + MIF_LOG(ERR, "Failed to add seals to shm file: %s.", strerror(errno)); + goto error; + } + + ret = ftruncate(r->fd, r->region_size); + if (ret < 0) { + MIF_LOG(ERR, "Failed to truncate shm file: %s.", strerror(errno)); + goto error; + } + + r->addr = mmap(NULL, r->region_size, PROT_READ | + PROT_WRITE, MAP_SHARED, r->fd, 0); + if (r->addr == MAP_FAILED) { + MIF_LOG(ERR, "Failed to mmap shm region: %s.", strerror(errno)); + ret = -1; + goto error; + } + + proc_private->regions[proc_private->regions_num] = r; + proc_private->regions_num++; + + return ret; + +error: + if (r->fd >= 0) + close(r->fd); + r->fd = -1; + + return ret; +} + /* Message header to synchronize regions */ struct mp_region_msg { char port_name[RTE_DEV_NAME_MAX_LEN]; @@ -116,10 +237,14 @@ memif_mp_request_regions(struct rte_eth_dev *dev) struct mp_region_msg *reply_param; struct memif_region *r; struct pmd_process_private *proc_private = dev->process_private; + struct pmd_internals *pmd = dev->data->dev_private; + /* in case of zero-copy slave, only request region 0 */ + uint16_t max_region_num = (pmd->flags & ETH_MEMIF_FLAG_ZERO_COPY) ? 
+				  1 : ETH_MEMIF_MAX_REGION_NUM;
 
 	MIF_LOG(DEBUG, "Requesting memory regions");
 
-	for (i = 0; i < ETH_MEMIF_MAX_REGION_NUM; i++) {
+	for (i = 0; i < max_region_num; i++) {
 		/* Prepare the message */
 		memset(&msg, 0, sizeof(msg));
 		strlcpy(msg.name, MEMIF_MP_SEND_REGION, sizeof(msg.name));
@@ -161,6 +286,12 @@ memif_mp_request_regions(struct rte_eth_dev *dev)
 		free(reply);
 	}
 
+	if (pmd->flags & ETH_MEMIF_FLAG_ZERO_COPY) {
+		ret = rte_memseg_walk(memif_region_init_zc, (void *)proc_private);
+		if (ret < 0)
+			return ret;
+	}
+
 	return memif_connect(dev);
 }
 
@@ -199,7 +330,7 @@ memif_get_ring_offset(struct rte_eth_dev *dev, struct memif_queue *mq,
 		(uint8_t *)proc_private->regions[mq->region]->addr);
 }
 
-static memif_ring_t *
+memif_ring_t *
 memif_get_ring_from_queue(struct pmd_process_private *proc_private,
 			  struct memif_queue *mq)
 {
@@ -212,291 +343,26 @@ memif_get_ring_from_queue(struct pmd_process_private *proc_private,
 	return (memif_ring_t *)((uint8_t *)r->addr + mq->ring_offset);
 }
 
-static void *
-memif_get_buffer(struct pmd_process_private *proc_private, memif_desc_t *d)
-{
-	return ((uint8_t *)proc_private->regions[d->region]->addr + d->offset);
-}
-
-static int
-memif_pktmbuf_chain(struct rte_mbuf *head, struct rte_mbuf *cur_tail,
-		    struct rte_mbuf *tail)
-{
-	/* Check for number-of-segments-overflow */
-	if (unlikely(head->nb_segs + tail->nb_segs > RTE_MBUF_MAX_NB_SEGS))
-		return -EOVERFLOW;
-
-	/* Chain 'tail' onto the old tail */
-	cur_tail->next = tail;
-
-	/* accumulate number of segments and total length.
*/
-	head->nb_segs = (uint16_t)(head->nb_segs + tail->nb_segs);
-
-	tail->pkt_len = tail->data_len;
-	head->pkt_len += tail->pkt_len;
-
-	return 0;
-}
-
-static uint16_t
-eth_memif_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
-{
-	struct memif_queue *mq = queue;
-	struct pmd_internals *pmd = rte_eth_devices[mq->in_port].data->dev_private;
-	struct pmd_process_private *proc_private =
-	    rte_eth_devices[mq->in_port].process_private;
-	memif_ring_t *ring = memif_get_ring_from_queue(proc_private, mq);
-	uint16_t cur_slot, last_slot, n_slots, ring_size, mask, s0;
-	uint16_t n_rx_pkts = 0;
-	uint16_t mbuf_size = rte_pktmbuf_data_room_size(mq->mempool) -
-	    RTE_PKTMBUF_HEADROOM;
-	uint16_t src_len, src_off, dst_len, dst_off, cp_len;
-	memif_ring_type_t type = mq->type;
-	memif_desc_t *d0;
-	struct rte_mbuf *mbuf, *mbuf_head, *mbuf_tail;
-	uint64_t b;
-	ssize_t size __rte_unused;
-	uint16_t head;
-	int ret;
-	struct rte_eth_link link;
-
-	if (unlikely((pmd->flags & ETH_MEMIF_FLAG_CONNECTED) == 0))
-		return 0;
-	if (unlikely(ring == NULL)) {
-		/* Secondary process will attempt to request regions. */
-		rte_eth_link_get(mq->in_port, &link);
-		return 0;
-	}
-
-	/* consume interrupt */
-	if ((ring->flags & MEMIF_RING_FLAG_MASK_INT) == 0)
-		size = read(mq->intr_handle.fd, &b, sizeof(b));
-
-	ring_size = 1 << mq->log2_ring_size;
-	mask = ring_size - 1;
-
-	cur_slot = (type == MEMIF_RING_S2M) ? mq->last_head : mq->last_tail;
-	last_slot = (type == MEMIF_RING_S2M) ?
ring->head : ring->tail;
-	if (cur_slot == last_slot)
-		goto refill;
-	n_slots = last_slot - cur_slot;
-
-	while (n_slots && n_rx_pkts < nb_pkts) {
-		mbuf_head = rte_pktmbuf_alloc(mq->mempool);
-		if (unlikely(mbuf_head == NULL))
-			goto no_free_bufs;
-		mbuf = mbuf_head;
-		mbuf->port = mq->in_port;
-
-next_slot:
-		s0 = cur_slot & mask;
-		d0 = &ring->desc[s0];
-
-		src_len = d0->length;
-		dst_off = 0;
-		src_off = 0;
-
-		do {
-			dst_len = mbuf_size - dst_off;
-			if (dst_len == 0) {
-				dst_off = 0;
-				dst_len = mbuf_size;
-
-				/* store pointer to tail */
-				mbuf_tail = mbuf;
-				mbuf = rte_pktmbuf_alloc(mq->mempool);
-				if (unlikely(mbuf == NULL))
-					goto no_free_bufs;
-				mbuf->port = mq->in_port;
-				ret = memif_pktmbuf_chain(mbuf_head, mbuf_tail, mbuf);
-				if (unlikely(ret < 0)) {
-					MIF_LOG(ERR, "number-of-segments-overflow");
-					rte_pktmbuf_free(mbuf);
-					goto no_free_bufs;
-				}
-			}
-			cp_len = RTE_MIN(dst_len, src_len);
-
-			rte_pktmbuf_data_len(mbuf) += cp_len;
-			rte_pktmbuf_pkt_len(mbuf) = rte_pktmbuf_data_len(mbuf);
-			if (mbuf != mbuf_head)
-				rte_pktmbuf_pkt_len(mbuf_head) += cp_len;
-
-			memcpy(rte_pktmbuf_mtod_offset(mbuf, void *, dst_off),
-			       (uint8_t *)memif_get_buffer(proc_private, d0) +
-			       src_off, cp_len);
-
-			src_off += cp_len;
-			dst_off += cp_len;
-			src_len -= cp_len;
-		} while (src_len);
-
-		cur_slot++;
-		n_slots--;
-
-		if (d0->flags & MEMIF_DESC_FLAG_NEXT)
-			goto next_slot;
-
-		mq->n_bytes += rte_pktmbuf_pkt_len(mbuf_head);
-		*bufs++ = mbuf_head;
-		n_rx_pkts++;
-	}
-
-no_free_bufs:
-	if (type == MEMIF_RING_S2M) {
-		rte_mb();
-		ring->tail = cur_slot;
-		mq->last_head = cur_slot;
-	} else {
-		mq->last_tail = cur_slot;
-	}
-
-refill:
-	if (type == MEMIF_RING_M2S) {
-		head = ring->head;
-		n_slots = ring_size - head + mq->last_tail;
-
-		while (n_slots--) {
-			s0 = head++ & mask;
-			d0 = &ring->desc[s0];
-			d0->length = pmd->run.pkt_buffer_size;
-		}
-		rte_mb();
-		ring->head = head;
-	}
-
-	mq->n_pkts += n_rx_pkts;
-	return n_rx_pkts;
-}
-
-static uint16_t
-eth_memif_tx(void *queue,
struct rte_mbuf **bufs, uint16_t nb_pkts)
-{
-	struct memif_queue *mq = queue;
-	struct pmd_internals *pmd = rte_eth_devices[mq->in_port].data->dev_private;
-	struct pmd_process_private *proc_private =
-	    rte_eth_devices[mq->in_port].process_private;
-	memif_ring_t *ring = memif_get_ring_from_queue(proc_private, mq);
-	uint16_t slot, saved_slot, n_free, ring_size, mask, n_tx_pkts = 0;
-	uint16_t src_len, src_off, dst_len, dst_off, cp_len;
-	memif_ring_type_t type = mq->type;
-	memif_desc_t *d0;
-	struct rte_mbuf *mbuf;
-	struct rte_mbuf *mbuf_head;
-	uint64_t a;
-	ssize_t size;
-	struct rte_eth_link link;
-
-	if (unlikely((pmd->flags & ETH_MEMIF_FLAG_CONNECTED) == 0))
-		return 0;
-	if (unlikely(ring == NULL)) {
-		/* Secondary process will attempt to request regions. */
-		rte_eth_link_get(mq->in_port, &link);
-		return 0;
-	}
-
-	ring_size = 1 << mq->log2_ring_size;
-	mask = ring_size - 1;
-
-	n_free = ring->tail - mq->last_tail;
-	mq->last_tail += n_free;
-	slot = (type == MEMIF_RING_S2M) ? ring->head : ring->tail;
-
-	if (type == MEMIF_RING_S2M)
-		n_free = ring_size - ring->head + mq->last_tail;
-	else
-		n_free = ring->head - ring->tail;
-
-	while (n_tx_pkts < nb_pkts && n_free) {
-		mbuf_head = *bufs++;
-		mbuf = mbuf_head;
-
-		saved_slot = slot;
-		d0 = &ring->desc[slot & mask];
-		dst_off = 0;
-		dst_len = (type == MEMIF_RING_S2M) ?
-		    pmd->run.pkt_buffer_size : d0->length;
-
-next_in_chain:
-		src_off = 0;
-		src_len = rte_pktmbuf_data_len(mbuf);
-
-		while (src_len) {
-			if (dst_len == 0) {
-				if (n_free) {
-					slot++;
-					n_free--;
-					d0->flags |= MEMIF_DESC_FLAG_NEXT;
-					d0 = &ring->desc[slot & mask];
-					dst_off = 0;
-					dst_len = (type == MEMIF_RING_S2M) ?
-					    pmd->run.pkt_buffer_size : d0->length;
-					d0->flags = 0;
-				} else {
-					slot = saved_slot;
-					goto no_free_slots;
-				}
-			}
-			cp_len = RTE_MIN(dst_len, src_len);
-
-			memcpy((uint8_t *)memif_get_buffer(proc_private, d0) + dst_off,
-			       rte_pktmbuf_mtod_offset(mbuf, void *, src_off),
-			       cp_len);
-
-			mq->n_bytes += cp_len;
-			src_off += cp_len;
-			dst_off += cp_len;
-			src_len -= cp_len;
-			dst_len -= cp_len;
-
-			d0->length = dst_off;
-		}
-
-		if (rte_pktmbuf_is_contiguous(mbuf) == 0) {
-			mbuf = mbuf->next;
-			goto next_in_chain;
-		}
-
-		n_tx_pkts++;
-		slot++;
-		n_free--;
-		rte_pktmbuf_free(mbuf_head);
-	}
-
-no_free_slots:
-	rte_mb();
-	if (type == MEMIF_RING_S2M)
-		ring->head = slot;
-	else
-		ring->tail = slot;
-
-	if ((ring->flags & MEMIF_RING_FLAG_MASK_INT) == 0) {
-		a = 1;
-		size = write(mq->intr_handle.fd, &a, sizeof(a));
-		if (unlikely(size < 0)) {
-			MIF_LOG(WARNING,
-				"Failed to send interrupt. %s", strerror(errno));
-		}
-	}
-
-	mq->n_err += nb_pkts - n_tx_pkts;
-	mq->n_pkts += n_tx_pkts;
-	return n_tx_pkts;
-}
-
 void
-memif_free_regions(struct pmd_process_private *proc_private)
+memif_free_regions(struct rte_eth_dev *dev)
 {
+	struct pmd_process_private *proc_private = dev->process_private;
+	struct pmd_internals *pmd = dev->data->dev_private;
 	int i;
 	struct memif_region *r;
 
-	MIF_LOG(DEBUG, "Free memory regions");
 	/* regions are allocated contiguously, so it's
 	 * enough to loop until 'proc_private->regions_num'
 	 */
 	for (i = 0; i < proc_private->regions_num; i++) {
 		r = proc_private->regions[i];
 		if (r != NULL) {
+			/* This is memzone */
+			if (i > 0 && (pmd->flags & ETH_MEMIF_FLAG_ZERO_COPY)) {
+				r->addr = NULL;
+				if (r->fd > 0)
+					close(r->fd);
+			}
 			if (r->addr != NULL) {
 				munmap(r->addr, r->region_size);
 				if (r->fd > 0) {
@@ -511,92 +377,32 @@ memif_free_regions(struct pmd_process_private *proc_private)
 	proc_private->regions_num = 0;
 }
 
-static int
-memif_region_init_shm(struct rte_eth_dev *dev, uint8_t has_buffers)
-{
-	struct pmd_internals *pmd = dev->data->dev_private;
-	struct
pmd_process_private *proc_private = dev->process_private;
-	char shm_name[ETH_MEMIF_SHM_NAME_SIZE];
-	int ret = 0;
-	struct memif_region *r;
-
-	if (proc_private->regions_num >= ETH_MEMIF_MAX_REGION_NUM) {
-		MIF_LOG(ERR, "Too many regions.");
-		return -1;
-	}
-
-	r = rte_zmalloc("region", sizeof(struct memif_region), 0);
-	if (r == NULL) {
-		MIF_LOG(ERR, "Failed to alloc memif region.");
-		return -ENOMEM;
-	}
-
-	/* calculate buffer offset */
-	r->pkt_buffer_offset = (pmd->run.num_s2m_rings + pmd->run.num_m2s_rings) *
-	    (sizeof(memif_ring_t) + sizeof(memif_desc_t) *
-	    (1 << pmd->run.log2_ring_size));
-
-	r->region_size = r->pkt_buffer_offset;
-	/* if region has buffers, add buffers size to region_size */
-	if (has_buffers == 1)
-		r->region_size += (uint32_t)(pmd->run.pkt_buffer_size *
-			(1 << pmd->run.log2_ring_size) *
-			(pmd->run.num_s2m_rings +
-			 pmd->run.num_m2s_rings));
-
-	memset(shm_name, 0, sizeof(char) * ETH_MEMIF_SHM_NAME_SIZE);
-	snprintf(shm_name, ETH_MEMIF_SHM_NAME_SIZE, "memif_region_%d",
-		 proc_private->regions_num);
-
-	r->fd = memfd_create(shm_name, MFD_ALLOW_SEALING);
-	if (r->fd < 0) {
-		MIF_LOG(ERR, "Failed to create shm file: %s.", strerror(errno));
-		ret = -1;
-		goto error;
-	}
-
-	ret = fcntl(r->fd, F_ADD_SEALS, F_SEAL_SHRINK);
-	if (ret < 0) {
-		MIF_LOG(ERR, "Failed to add seals to shm file: %s.", strerror(errno));
-		goto error;
-	}
-
-	ret = ftruncate(r->fd, r->region_size);
-	if (ret < 0) {
-		MIF_LOG(ERR, "Failed to truncate shm file: %s.", strerror(errno));
-		goto error;
-	}
-
-	r->addr = mmap(NULL, r->region_size, PROT_READ |
-		       PROT_WRITE, MAP_SHARED, r->fd, 0);
-	if (r->addr == MAP_FAILED) {
-		MIF_LOG(ERR, "Failed to mmap shm region: %s.", strerror(ret));
-		ret = -1;
-		goto error;
-	}
-
-	proc_private->regions[proc_private->regions_num] = r;
-	proc_private->regions_num++;
-
-	return ret;
-
-error:
-	if (r->fd > 0)
-		close(r->fd);
-	r->fd = -1;
-
-	return ret;
-}
-
 static int
 memif_regions_init(struct rte_eth_dev *dev)
 {
+	struct
pmd_internals *pmd = dev->data->dev_private;
 	int ret;
 
-	/* create one buffer region */
-	ret = memif_region_init_shm(dev, /* has buffer */ 1);
-	if (ret < 0)
-		return ret;
+	/*
+	 * Zero-copy exposes DPDK memory.
+	 * Each memseg list will be represented by a memif region.
+	 * Zero-copy regions indexing: memseg list idx + 1,
+	 * as we already have region 0 reserved for descriptors.
+	 */
+	if (pmd->flags & ETH_MEMIF_FLAG_ZERO_COPY) {
+		/* create region idx 0 containing descriptors */
+		ret = memif_region_init_shm(dev, 0);
+		if (ret < 0)
+			return ret;
+		ret = rte_memseg_walk(memif_region_init_zc, (void *)dev->process_private);
+		if (ret < 0)
+			return ret;
+	} else {
+		/* create one memory region containing rings and buffers */
+		ret = memif_region_init_shm(dev, /* has buffers */ 1);
+		if (ret < 0)
+			return ret;
+	}
 
 	return 0;
 }
@@ -616,6 +422,10 @@ memif_init_rings(struct rte_eth_dev *dev)
 		ring->tail = 0;
 		ring->cookie = MEMIF_COOKIE;
 		ring->flags = 0;
+
+		if (pmd->flags & ETH_MEMIF_FLAG_ZERO_COPY)
+			continue;
+
 		for (j = 0; j < (1 << pmd->run.log2_ring_size); j++) {
 			slot = i * (1 << pmd->run.log2_ring_size) + j;
 			ring->desc[j].region = 0;
@@ -632,6 +442,10 @@ memif_init_rings(struct rte_eth_dev *dev)
 		ring->tail = 0;
 		ring->cookie = MEMIF_COOKIE;
 		ring->flags = 0;
+
+		if (pmd->flags & ETH_MEMIF_FLAG_ZERO_COPY)
+			continue;
+
 		for (j = 0; j < (1 << pmd->run.log2_ring_size); j++) {
 			slot = (i + pmd->run.num_s2m_rings) *
 			    (1 << pmd->run.log2_ring_size) + j;
@@ -645,7 +459,7 @@ memif_init_rings(struct rte_eth_dev *dev)
 }
 
 /* called only by slave */
-static void
+static int
 memif_init_queues(struct rte_eth_dev *dev)
 {
 	struct pmd_internals *pmd = dev->data->dev_private;
@@ -666,6 +480,13 @@ memif_init_queues(struct rte_eth_dev *dev)
 				"Failed to create eventfd for tx queue %d: %s.", i,
 				strerror(errno));
 		}
+		mq->buffers = NULL;
+		if (pmd->flags & ETH_MEMIF_FLAG_ZERO_COPY) {
+			mq->buffers = rte_zmalloc("bufs", sizeof(struct rte_mbuf *) *
+						  (1 << mq->log2_ring_size), 0);
+			if (mq->buffers ==
NULL)
+				return -ENOMEM;
+		}
 	}
 
 	for (i = 0; i < pmd->run.num_m2s_rings; i++) {
@@ -682,7 +503,15 @@ memif_init_queues(struct rte_eth_dev *dev)
 				"Failed to create eventfd for rx queue %d: %s.", i,
 				strerror(errno));
 		}
+		mq->buffers = NULL;
+		if (pmd->flags & ETH_MEMIF_FLAG_ZERO_COPY) {
+			mq->buffers = rte_zmalloc("bufs", sizeof(struct rte_mbuf *) *
+						  (1 << mq->log2_ring_size), 0);
+			if (mq->buffers == NULL)
+				return -ENOMEM;
+		}
 	}
+	return 0;
 }
 
 int
@@ -696,7 +525,9 @@ memif_init_regions_and_queues(struct rte_eth_dev *dev)
 
 	memif_init_rings(dev);
 
-	memif_init_queues(dev);
+	ret = memif_init_queues(dev);
+	if (ret < 0)
+		return ret;
 
 	return 0;
 }
@@ -720,8 +551,16 @@ memif_connect(struct rte_eth_dev *dev)
 				mr->addr = mmap(NULL, mr->region_size,
 						PROT_READ | PROT_WRITE,
 						MAP_SHARED, mr->fd, 0);
-				if (mr->addr == NULL)
+				if (mr->addr == MAP_FAILED) {
+					MIF_LOG(ERR, "mmap failed: %s\n",
+						strerror(errno));
 					return -1;
+				}
+			}
+			if (i > 0 && (pmd->flags & ETH_MEMIF_FLAG_ZERO_COPY)) {
+				/* close memseg file */
+				close(mr->fd);
+				mr->fd = -1;
 			}
 		}
 	}
@@ -782,8 +621,7 @@ memif_dev_start(struct rte_eth_dev *dev)
 		ret = memif_connect_master(dev);
 		break;
 	default:
-		MIF_LOG(ERR, "%s: Unknown role: %d.",
-			rte_vdev_device_name(pmd->vdev), pmd->role);
+		MIF_LOG(ERR, "Unknown role: %d.", pmd->role);
 		ret = -1;
 		break;
 	}
@@ -848,8 +686,7 @@ memif_tx_queue_setup(struct rte_eth_dev *dev,
 
 	mq = rte_zmalloc("tx-queue", sizeof(struct memif_queue), 0);
 	if (mq == NULL) {
-		MIF_LOG(ERR, "%s: Failed to allocate tx queue id: %u",
-			rte_vdev_device_name(pmd->vdev), qid);
+		MIF_LOG(ERR, "Failed to allocate tx queue id: %u", qid);
 		return -ENOMEM;
 	}
 
@@ -878,8 +715,7 @@ memif_rx_queue_setup(struct rte_eth_dev *dev,
 
 	mq = rte_zmalloc("rx-queue", sizeof(struct memif_queue), 0);
 	if (mq == NULL) {
-		MIF_LOG(ERR, "%s: Failed to allocate rx queue id: %u",
-			rte_vdev_device_name(pmd->vdev), qid);
+		MIF_LOG(ERR, "Failed to allocate rx queue id: %u", qid);
 		return -ENOMEM;
 	}
 
@@ -920,7 +756,7 @@ memif_link_update(struct
rte_eth_dev *dev,
 			memif_mp_request_regions(dev);
 		} else if (dev->data->dev_link.link_status == ETH_LINK_DOWN &&
 				proc_private->regions_num > 0) {
-			memif_free_regions(proc_private);
+			memif_free_regions(dev);
 		}
 	}
 	return 0;
@@ -964,6 +800,7 @@ memif_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
 		mq = dev->data->tx_queues[i];
 		stats->q_opackets[i] = mq->n_pkts;
 		stats->q_obytes[i] = mq->n_bytes;
+		stats->q_errors[i] = mq->n_err;
 		stats->opackets += mq->n_pkts;
 		stats->obytes += mq->n_bytes;
 		stats->oerrors += mq->n_err;
@@ -1043,11 +880,6 @@ memif_create(struct rte_vdev_device *vdev, enum memif_role_t role,
 	const unsigned int numa_node = vdev->device.numa_node;
 	const char *name = rte_vdev_device_name(vdev);
 
-	if (flags & ETH_MEMIF_FLAG_ZERO_COPY) {
-		MIF_LOG(ERR, "Zero-copy slave not supported.");
-		return -1;
-	}
-
 	eth_dev = rte_eth_vdev_allocate(vdev, sizeof(*pmd));
 	if (eth_dev == NULL) {
 		MIF_LOG(ERR, "%s: Unable to allocate device struct.", name);
@@ -1071,6 +903,9 @@ memif_create(struct rte_vdev_device *vdev, enum memif_role_t role,
 	pmd->flags = flags;
 	pmd->flags |= ETH_MEMIF_FLAG_DISABLED;
 	pmd->role = role;
+	/* Zero-copy flag irrelevant to master.
*/
+	if (pmd->role == MEMIF_ROLE_MASTER)
+		pmd->flags &= ~ETH_MEMIF_FLAG_ZERO_COPY;
 
 	ret = memif_socket_init(eth_dev, socket_filename);
 	if (ret < 0)
@@ -1094,8 +929,14 @@ memif_create(struct rte_vdev_device *vdev, enum memif_role_t role,
 
 	eth_dev->dev_ops = &ops;
 	eth_dev->device = &vdev->device;
-	eth_dev->rx_pkt_burst = eth_memif_rx;
-	eth_dev->tx_pkt_burst = eth_memif_tx;
+	if (pmd->flags & ETH_MEMIF_FLAG_ZERO_COPY) {
+		eth_dev->rx_pkt_burst = eth_memif_rx_zc;
+		eth_dev->tx_pkt_burst = eth_memif_tx_zc;
+	} else {
+		eth_dev->rx_pkt_burst = eth_memif_rx;
+		eth_dev->tx_pkt_burst = eth_memif_tx;
+	}
+
 
 	eth_dev->data->dev_flags &= RTE_ETH_DEV_CLOSE_REMOVE;
 
@@ -1124,9 +965,14 @@ memif_set_role(const char *key __rte_unused, const char *value,
 static int
 memif_set_zc(const char *key __rte_unused, const char *value, void *extra_args)
 {
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
 	uint32_t *flags = (uint32_t *)extra_args;
 
 	if (strstr(value, "yes") != NULL) {
+		if (!mcfg->single_file_segments) {
+			MIF_LOG(ERR, "Zero-copy doesn't support multi-file segments.");
+			return -ENOTSUP;
+		}
 		*flags |= ETH_MEMIF_FLAG_ZERO_COPY;
 	} else if (strstr(value, "no") != NULL) {
 		*flags &= ~ETH_MEMIF_FLAG_ZERO_COPY;
diff --git a/drivers/net/memif/rte_eth_memif.h b/drivers/net/memif/rte_eth_memif.h
index 24e8a0914..dea79820c 100644
--- a/drivers/net/memif/rte_eth_memif.h
+++ b/drivers/net/memif/rte_eth_memif.h
@@ -63,13 +63,16 @@ struct memif_queue {
 	uint16_t last_head; /**< last ring head */
 	uint16_t last_tail; /**< last ring tail */
 
+	struct rte_mbuf **buffers;
+	/**< Stored mbufs. Used in zero-copy tx. Slave stores transmitted
+	 * mbufs to free them once master has received them.
+	 */
+
 	/* rx/tx info */
 	uint64_t n_pkts; /**< number of rx/tx packets */
 	uint64_t n_bytes; /**< number of rx/tx bytes */
 	uint64_t n_err; /**< number of tx errors */
 
-	memif_ring_t *ring; /**< pointer to ring */
-
 	struct rte_intr_handle intr_handle; /**< interrupt handle */
 
 	memif_log2_ring_size_t log2_ring_size; /**< log2 of ring size */
@@ -132,7 +135,7 @@ struct pmd_process_private {
  * @param proc_private
  *   device process private data
  */
-void memif_free_regions(struct pmd_process_private *proc_private);
+void memif_free_regions(struct rte_eth_dev *dev);
 
 /**
  * Finalize connection establishment process. Map shared memory file
@@ -158,6 +161,9 @@ int memif_connect(struct rte_eth_dev *dev);
  */
 int memif_init_regions_and_queues(struct rte_eth_dev *dev);
 
+memif_ring_t *memif_get_ring_from_queue(struct pmd_process_private *proc_private,
+					struct memif_queue *mq);
+
 /**
  * Get memif version string.
  *