From patchwork Thu Aug 16 14:43:16 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qi Zhang X-Patchwork-Id: 43741 X-Patchwork-Delegate: ferruh.yigit@amd.com Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 367C04C88; Thu, 16 Aug 2018 16:42:52 +0200 (CEST) Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by dpdk.org (Postfix) with ESMTP id 159A749CF for ; Thu, 16 Aug 2018 16:42:49 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga105.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 16 Aug 2018 07:42:49 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.53,247,1531810800"; d="scan'208";a="65704089" Received: from dpdk51.sh.intel.com ([10.67.110.190]) by orsmga008.jf.intel.com with ESMTP; 16 Aug 2018 07:42:37 -0700 From: Qi Zhang To: dev@dpdk.org Cc: magnus.karlsson@intel.com, bjorn.topel@intel.com, jingjing.wu@intel.com, xiaoyun.li@intel.com, ferruh.yigit@intel.com, Qi Zhang Date: Thu, 16 Aug 2018 22:43:16 +0800 Message-Id: <20180816144321.17719-2-qi.z.zhang@intel.com> X-Mailer: git-send-email 2.13.6 In-Reply-To: <20180816144321.17719-1-qi.z.zhang@intel.com> References: <20180816144321.17719-1-qi.z.zhang@intel.com> Subject: [dpdk-dev] [RFC v3 1/6] net/af_xdp: new PMD driver X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Add a new PMD driver for AF_XDP, which is a proposed faster version of the AF_PACKET interface in Linux. https://fosdem.org/2018/schedule/event/af_xdp/ https://lwn.net/Articles/745934/ This patch enables the vanilla version.
Packet data is copied between the XDP socket's memory buffer and the Rx queue's mbuf mempool, and allocation of the XDP socket's memory buffer is managed simply by a FIFO ring. Further improvements will be covered in the following patches. Signed-off-by: Qi Zhang --- config/common_base | 5 + config/common_linuxapp | 1 + drivers/net/Makefile | 1 + drivers/net/af_xdp/Makefile | 30 + drivers/net/af_xdp/meson.build | 7 + drivers/net/af_xdp/rte_eth_af_xdp.c | 1247 +++++++++++++++++++++++++ drivers/net/af_xdp/rte_pmd_af_xdp_version.map | 4 + mk/rte.app.mk | 1 + 8 files changed, 1296 insertions(+) create mode 100644 drivers/net/af_xdp/Makefile create mode 100644 drivers/net/af_xdp/meson.build create mode 100644 drivers/net/af_xdp/rte_eth_af_xdp.c create mode 100644 drivers/net/af_xdp/rte_pmd_af_xdp_version.map diff --git a/config/common_base b/config/common_base index 4bcbaf923..81aa81754 100644 --- a/config/common_base +++ b/config/common_base @@ -383,6 +383,11 @@ CONFIG_RTE_LIBRTE_VMXNET3_DEBUG_TX_FREE=n CONFIG_RTE_LIBRTE_PMD_AF_PACKET=n # +# Compile software PMD backed by AF_XDP sockets (Linux only) +# +CONFIG_RTE_LIBRTE_PMD_AF_XDP=n + +# # Compile link bonding PMD library # CONFIG_RTE_LIBRTE_PMD_BOND=y diff --git a/config/common_linuxapp b/config/common_linuxapp index 9c5ea9d89..5fa1cfb87 100644 --- a/config/common_linuxapp +++ b/config/common_linuxapp @@ -18,6 +18,7 @@ CONFIG_RTE_LIBRTE_PMD_VHOST=y CONFIG_RTE_LIBRTE_IFC_PMD=y CONFIG_RTE_LIBRTE_PMD_AF_PACKET=y CONFIG_RTE_LIBRTE_PMD_SOFTNIC=y +CONFIG_RTE_LIBRTE_PMD_AF_XDP=y CONFIG_RTE_LIBRTE_PMD_TAP=y CONFIG_RTE_LIBRTE_AVP_PMD=y CONFIG_RTE_LIBRTE_VDEV_NETVSC_PMD=y diff --git a/drivers/net/Makefile b/drivers/net/Makefile index 664398de9..7cff65c45 100644 --- a/drivers/net/Makefile +++ b/drivers/net/Makefile @@ -9,6 +9,7 @@ ifeq ($(CONFIG_RTE_LIBRTE_THUNDERX_NICVF_PMD),d) endif DIRS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET) += af_packet +DIRS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP) += af_xdp DIRS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += ark
DIRS-$(CONFIG_RTE_LIBRTE_AVF_PMD) += avf DIRS-$(CONFIG_RTE_LIBRTE_AVP_PMD) += avp diff --git a/drivers/net/af_xdp/Makefile b/drivers/net/af_xdp/Makefile new file mode 100644 index 000000000..8dee0144a --- /dev/null +++ b/drivers/net/af_xdp/Makefile @@ -0,0 +1,30 @@ +# SPDX-License-Identifier: BSD-3-Clause +# Copyright(c) 2018 Intel Corporation + +include $(RTE_SDK)/mk/rte.vars.mk + +# +# library name +# +LIB = librte_pmd_af_xdp.a + +EXPORT_MAP := rte_pmd_af_xdp_version.map + +LIBABIVER := 1 + + +CFLAGS += -O3 +# below line should be removed +CFLAGS += -I/home/qzhan15/bpf/usr/include + +CFLAGS += $(WERROR_FLAGS) +LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool -lrte_ring +LDLIBS += -lrte_ethdev -lrte_net -lrte_kvargs +LDLIBS += -lrte_bus_vdev + +# +# all source are stored in SRCS-y +# +SRCS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP) += rte_eth_af_xdp.c + +include $(RTE_SDK)/mk/rte.lib.mk diff --git a/drivers/net/af_xdp/meson.build b/drivers/net/af_xdp/meson.build new file mode 100644 index 000000000..4b6652685 --- /dev/null +++ b/drivers/net/af_xdp/meson.build @@ -0,0 +1,7 @@ +# SPDX-License-Identifier: BSD-3-Clause +# Copyright(c) 2018 Intel Corporation + +if host_machine.system() != 'linux' + build = false +endif +sources = files('rte_eth_af_xdp.c') diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c new file mode 100644 index 000000000..12252014d --- /dev/null +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c @@ -0,0 +1,1247 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2018 Intel Corporation. 
+ */ + +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#ifndef SOL_XDP +#define SOL_XDP 283 +#endif + +#ifndef AF_XDP +#define AF_XDP 44 +#endif + +#ifndef PF_XDP +#define PF_XDP AF_XDP +#endif + +#define ETH_AF_XDP_IFACE_ARG "iface" +#define ETH_AF_XDP_QUEUE_IDX_ARG "queue" +#define ETH_AF_XDP_XSK_MAP_ID_ARG "xsk_map_id" +#define ETH_AF_XDP_XSK_MAP_KEY_START_ARG "xsk_map_key_start" +#define ETH_AF_XDP_XSK_MAP_KEY_COUNT_ARG "xsk_map_key_count" + +#define ETH_AF_XDP_FRAME_SIZE 2048 +#define ETH_AF_XDP_NUM_BUFFERS 4096 +#define ETH_AF_XDP_DATA_HEADROOM 0 +#define ETH_AF_XDP_DFLT_NUM_DESCS 1024 +#define ETH_AF_XDP_FQ_NUM_DESCS 1024 +#define ETH_AF_XDP_CQ_NUM_DESCS 1024 +#define ETH_AF_XDP_DFLT_QUEUE_IDX 0 + +#define ETH_AF_XDP_RX_BATCH_SIZE 16 +#define ETH_AF_XDP_TX_BATCH_SIZE 16 + +#define ETH_AF_XDP_MAX_QUEUE_PAIRS 16 + +struct xdp_umem_uqueue { + uint32_t cached_prod; + uint32_t cached_cons; + uint32_t mask; + uint32_t size; + uint32_t *producer; + uint32_t *consumer; + uint64_t *ring; + void *map; +}; + +struct xdp_umem { + char *frames; + struct xdp_umem_uqueue fq; + struct xdp_umem_uqueue cq; + struct rte_ring *buf_ring; /* be used to manage the buffer */ + int fd; +}; + +struct xdp_uqueue { + uint32_t cached_prod; + uint32_t cached_cons; + uint32_t mask; + uint32_t size; + uint32_t *producer; + uint32_t *consumer; + struct xdp_desc *ring; + void *map; +}; + +static inline uint32_t xq_nb_avail(struct xdp_uqueue *q, uint32_t ndescs) +{ + uint32_t entries = q->cached_prod - q->cached_cons; + + if (entries == 0) { + q->cached_prod = *q->producer; + entries = q->cached_prod - q->cached_cons; + } + + return (entries > ndescs) ? 
ndescs : entries; +} + +static inline uint32_t xq_nb_free(struct xdp_uqueue *q, uint32_t ndescs) +{ + uint32_t free_entries = q->cached_cons - q->cached_prod; + + if (free_entries >= ndescs) + return free_entries; + + /* Refresh the local tail pointer */ + q->cached_cons = *q->consumer + q->size; + return q->cached_cons - q->cached_prod; +} + +static inline uint32_t umem_nb_avail(struct xdp_umem_uqueue *q, uint32_t nb) +{ + uint32_t entries = q->cached_prod - q->cached_cons; + + if (entries == 0) { + q->cached_prod = *q->producer; + entries = q->cached_prod - q->cached_cons; + } + return (entries > nb) ? nb : entries; +} + +static inline uint32_t umem_nb_free(struct xdp_umem_uqueue *q, uint32_t nb) +{ + uint32_t free_entries = q->cached_cons - q->cached_prod; + + if (free_entries >= nb) + return free_entries; + + /* Refresh the local tail pointer */ + q->cached_cons = *q->consumer + q->size; + + return q->cached_cons - q->cached_prod; +} + +static inline int umem_fill_to_kernel_ex(struct xdp_umem_uqueue *fq, + struct xdp_desc *d, + size_t nb) +{ + uint32_t i; + + if (umem_nb_free(fq, nb) < nb) + return -ENOSPC; + + for (i = 0; i < nb; i++) { + uint32_t idx = fq->cached_prod++ & fq->mask; + + fq->ring[idx] = d[i].addr; + } + + rte_smp_wmb(); + + *fq->producer = fq->cached_prod; + + return 0; +} + +static inline int umem_fill_to_kernel(struct xdp_umem_uqueue *fq, + uint64_t *d, + size_t nb) +{ + uint32_t i; + + if (umem_nb_free(fq, nb) < nb) + return -ENOSPC; + + for (i = 0; i < nb; i++) { + uint32_t idx = fq->cached_prod++ & fq->mask; + + fq->ring[idx] = d[i]; + } + + rte_smp_wmb(); + *fq->producer = fq->cached_prod; + + return 0; +} + +static inline size_t umem_complete_from_kernel(struct xdp_umem_uqueue *cq, + uint64_t *d, size_t nb) +{ + uint32_t idx, i, entries = umem_nb_avail(cq, nb); + + rte_smp_rmb(); + + for (i = 0; i < entries; i++) { + idx = cq->cached_cons++ & cq->mask; + d[i] = cq->ring[idx]; + } + + if (entries > 0) { + rte_smp_wmb(); + *cq->consumer = 
cq->cached_cons; + } + + return entries; +} + +static inline int xq_enq(struct xdp_uqueue *uq, + const struct xdp_desc *descs, + unsigned int ndescs) +{ + struct xdp_desc *r = uq->ring; + unsigned int i; + + if (xq_nb_free(uq, ndescs) < ndescs) + return -ENOSPC; + + for (i = 0; i < ndescs; i++) { + uint32_t idx = uq->cached_prod++ & uq->mask; + + r[idx].addr = descs[i].addr; + r[idx].len = descs[i].len; + } + + rte_smp_wmb(); + + *uq->producer = uq->cached_prod; + return 0; +} + +static inline int xq_deq(struct xdp_uqueue *uq, + struct xdp_desc *descs, + int ndescs) +{ + struct xdp_desc *r = uq->ring; + unsigned int idx; + int i, entries; + + entries = xq_nb_avail(uq, ndescs); + rte_smp_rmb(); + + for (i = 0; i < entries; i++) { + idx = uq->cached_cons++ & uq->mask; + descs[i] = r[idx]; + } + + if (entries > 0) { + rte_smp_wmb(); + + *uq->consumer = uq->cached_cons; + } + + return entries; +} + +struct pkt_rx_queue { + int xsk_fd; + uint16_t queue_idx; + struct xdp_uqueue rx; + struct xdp_umem *umem; + struct rte_mempool *mb_pool; + + unsigned long rx_pkts; + unsigned long rx_bytes; + unsigned long rx_dropped; + + struct pkt_tx_queue *pair; +}; + +struct pkt_tx_queue { + uint16_t queue_idx; + struct xdp_uqueue tx; + + unsigned long tx_pkts; + unsigned long err_pkts; + unsigned long tx_bytes; + + struct pkt_rx_queue *pair; +}; + +struct pmd_internals { + int if_index; + char if_name[IFNAMSIZ]; + uint16_t queue_idx; + struct ether_addr eth_addr; + struct xdp_umem *umem_share; + int umem_share_count; + struct rte_mempool *mb_pool_share; + int xsk_map_id; + int xsk_map_key_start; + int xsk_map_key_count; + + struct pkt_rx_queue rx_queues[ETH_AF_XDP_MAX_QUEUE_PAIRS]; + struct pkt_tx_queue tx_queues[ETH_AF_XDP_MAX_QUEUE_PAIRS]; +}; + +static const char * const valid_arguments[] = { + ETH_AF_XDP_IFACE_ARG, + ETH_AF_XDP_QUEUE_IDX_ARG, + ETH_AF_XDP_XSK_MAP_ID_ARG, + ETH_AF_XDP_XSK_MAP_KEY_START_ARG, + ETH_AF_XDP_XSK_MAP_KEY_COUNT_ARG, + NULL +}; + +static struct 
rte_eth_link pmd_link = { + .link_speed = ETH_SPEED_NUM_10G, + .link_duplex = ETH_LINK_FULL_DUPLEX, + .link_status = ETH_LINK_DOWN, + .link_autoneg = ETH_LINK_AUTONEG +}; + +static char *get_pkt_data(struct xdp_umem *umem, uint64_t addr) +{ + return &umem->frames[addr]; +} + +static uint16_t +eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts) +{ + struct xdp_desc descs[ETH_AF_XDP_RX_BATCH_SIZE]; + void *addrs[ETH_AF_XDP_RX_BATCH_SIZE]; + struct pkt_rx_queue *rxq = queue; + struct xdp_uqueue *uq = &rxq->rx; + struct xdp_umem_uqueue *fq = &rxq->umem->fq; + uint32_t free_thresh = fq->size >> 1; + struct rte_mbuf *mbuf; + unsigned long dropped = 0; + unsigned long rx_bytes = 0; + uint16_t count = 0; + int rcvd, i; + + nb_pkts = nb_pkts < ETH_AF_XDP_RX_BATCH_SIZE ? + nb_pkts : ETH_AF_XDP_RX_BATCH_SIZE; + + if (umem_nb_free(fq, free_thresh) >= free_thresh) { + int n = rte_ring_dequeue_bulk(rxq->umem->buf_ring, + addrs, + ETH_AF_XDP_RX_BATCH_SIZE, + NULL); + if (n == 0) + return 0; + + if (umem_fill_to_kernel(fq, (uint64_t *)&addrs[0], + ETH_AF_XDP_RX_BATCH_SIZE)) { + rte_ring_enqueue_bulk(rxq->umem->buf_ring, + addrs, + ETH_AF_XDP_RX_BATCH_SIZE, + NULL); + } + } + + /* read data */ + rcvd = xq_deq(uq, descs, nb_pkts); + if (rcvd == 0) + return 0; + + for (i = 0; i < rcvd; i++) { + char *pkt; + uint64_t addr = descs[i].addr; + + mbuf = rte_pktmbuf_alloc(rxq->mb_pool); + if (mbuf) { + rte_pktmbuf_pkt_len(mbuf) = + rte_pktmbuf_data_len(mbuf) = + descs[i].len; + pkt = get_pkt_data(rxq->umem, addr); + memcpy(rte_pktmbuf_mtod(mbuf, void *), + pkt, descs[i].len); + rx_bytes += descs[i].len; + bufs[count++] = mbuf; + } else { + dropped++; + } + addrs[i] = (void *)addr; + } + + rte_ring_enqueue_bulk(rxq->umem->buf_ring, addrs, rcvd, NULL); + + rxq->rx_pkts += (rcvd - dropped); + rxq->rx_bytes += rx_bytes; + rxq->rx_dropped += dropped; + + return count; +} + +static void kick_tx(struct pkt_tx_queue *txq) +{ + void *addrs[ETH_AF_XDP_TX_BATCH_SIZE]; + struct
rte_ring *buf_ring = txq->pair->umem->buf_ring; + struct xdp_umem_uqueue *cq = &txq->pair->umem->cq; + int fd = txq->pair->xsk_fd; + int ret, n; + + while (1) { + + ret = sendto(fd, NULL, 0, MSG_DONTWAIT, NULL, 0); + + /* everything is ok */ + if (ret >= 0) + break; + + /* something unexpected happened */ + if (errno != EBUSY && errno != EAGAIN) + break; + + /* pull from the completion queue to free up space */ + if (errno == EAGAIN) { + n = umem_complete_from_kernel(cq, + (uint64_t *)&addrs[0], + ETH_AF_XDP_TX_BATCH_SIZE); + if (n > 0) + rte_ring_enqueue_bulk(buf_ring, + addrs, n, NULL); + } + } +} + +static uint16_t +eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts) +{ + struct pkt_tx_queue *txq = queue; + struct xdp_uqueue *uq = &txq->tx; + struct xdp_umem_uqueue *cq = &txq->pair->umem->cq; + struct rte_mbuf *mbuf; + struct xdp_desc descs[ETH_AF_XDP_TX_BATCH_SIZE]; + void *addrs[ETH_AF_XDP_TX_BATCH_SIZE]; + uint16_t i, valid; + unsigned long tx_bytes = 0; + + nb_pkts = nb_pkts < ETH_AF_XDP_TX_BATCH_SIZE ? 
+ nb_pkts : ETH_AF_XDP_TX_BATCH_SIZE; + + int n = umem_complete_from_kernel(cq, (uint64_t *)&addrs[0], + ETH_AF_XDP_TX_BATCH_SIZE); + if (n > 0) + rte_ring_enqueue_bulk(txq->pair->umem->buf_ring, + addrs, n, NULL); + + nb_pkts = rte_ring_dequeue_bulk(txq->pair->umem->buf_ring, addrs, + nb_pkts, NULL); + if (!nb_pkts) + return 0; + + valid = 0; + for (i = 0; i < nb_pkts; i++) { + char *pkt; + unsigned int buf_len = + ETH_AF_XDP_FRAME_SIZE - ETH_AF_XDP_DATA_HEADROOM; + mbuf = bufs[i]; + if (mbuf->pkt_len <= buf_len) { + descs[valid].addr = (uint64_t)addrs[valid]; + descs[valid].len = mbuf->pkt_len; + descs[valid].options = 0; + pkt = get_pkt_data(txq->pair->umem, descs[valid].addr); + memcpy(pkt, rte_pktmbuf_mtod(mbuf, void *), + descs[valid].len); + valid++; + tx_bytes += mbuf->pkt_len; + } + rte_pktmbuf_free(mbuf); + } + + if (xq_enq(uq, descs, valid)) { + valid = 0; + tx_bytes = 0; + } else { + kick_tx(txq); + } + + if (valid < nb_pkts) + rte_ring_enqueue_bulk(txq->pair->umem->buf_ring, &addrs[valid], + nb_pkts - valid, NULL); + + txq->err_pkts += (nb_pkts - valid); + txq->tx_pkts += valid; + txq->tx_bytes += tx_bytes; + + return nb_pkts; +} + +static void +fill_rx_desc(struct xdp_umem *umem) +{ + struct xdp_umem_uqueue *fq = &umem->fq; + void *p = NULL; + uint32_t i; + + for (i = 0; i < fq->size / 2; i++) { + rte_ring_dequeue(umem->buf_ring, &p); + if (umem_fill_to_kernel(fq, (uint64_t *)&p, 1)) { + rte_ring_enqueue(umem->buf_ring, p); + break; + } + } +} + +static int +eth_dev_start(struct rte_eth_dev *dev) +{ + dev->data->dev_link.link_status = ETH_LINK_UP; + + return 0; +} + +/* This function gets called when the current port gets stopped. 
*/ +static void +eth_dev_stop(struct rte_eth_dev *dev) +{ + dev->data->dev_link.link_status = ETH_LINK_DOWN; +} + +static int +eth_dev_configure(struct rte_eth_dev *dev __rte_unused) +{ + /* rx/tx must be paired */ + if (dev->data->nb_rx_queues != dev->data->nb_tx_queues) + return -EINVAL; + + return 0; +} + +static void +eth_dev_info(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info) +{ + struct pmd_internals *internals = dev->data->dev_private; + + dev_info->if_index = internals->if_index; + dev_info->max_mac_addrs = 1; + dev_info->max_rx_pktlen = (uint32_t)ETH_FRAME_LEN; + dev_info->max_rx_queues = internals->xsk_map_key_count; + dev_info->max_tx_queues = internals->xsk_map_key_count; + dev_info->min_rx_bufsize = 0; + + dev_info->default_rxportconf.nb_queues = 1; + dev_info->default_txportconf.nb_queues = 1; + dev_info->default_rxportconf.ring_size = ETH_AF_XDP_DFLT_NUM_DESCS; + dev_info->default_txportconf.ring_size = ETH_AF_XDP_DFLT_NUM_DESCS; +} + +static int +eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats) +{ + struct pmd_internals *internals = dev->data->dev_private; + struct xdp_statistics xdp_stats; + struct pkt_rx_queue *rxq; + socklen_t optlen; + int i; + + optlen = sizeof(struct xdp_statistics); + for (i = 0; i < dev->data->nb_rx_queues; i++) { + rxq = &internals->rx_queues[i]; + stats->q_ipackets[i] = internals->rx_queues[i].rx_pkts; + stats->q_ibytes[i] = internals->rx_queues[i].rx_bytes; + + stats->q_opackets[i] = internals->tx_queues[i].tx_pkts; + stats->q_errors[i] = internals->tx_queues[i].err_pkts; + stats->q_obytes[i] = internals->tx_queues[i].tx_bytes; + + stats->ipackets += stats->q_ipackets[i]; + stats->ibytes += stats->q_ibytes[i]; + stats->imissed += internals->rx_queues[i].rx_dropped; + getsockopt(rxq->xsk_fd, SOL_XDP, XDP_STATISTICS, + &xdp_stats, &optlen); + stats->imissed += xdp_stats.rx_dropped; + + stats->opackets += stats->q_opackets[i]; + stats->oerrors += stats->q_errors[i]; + stats->obytes += 
stats->q_obytes[i]; + } + + return 0; +} + +static void +eth_stats_reset(struct rte_eth_dev *dev) +{ + struct pmd_internals *internals = dev->data->dev_private; + int i; + + for (i = 0; i < ETH_AF_XDP_MAX_QUEUE_PAIRS; i++) { + internals->rx_queues[i].rx_pkts = 0; + internals->rx_queues[i].rx_bytes = 0; + internals->rx_queues[i].rx_dropped = 0; + + internals->tx_queues[i].tx_pkts = 0; + internals->tx_queues[i].err_pkts = 0; + internals->tx_queues[i].tx_bytes = 0; + } +} + +static void +eth_dev_close(struct rte_eth_dev *dev __rte_unused) +{ +} + +static void +eth_queue_release(void *q __rte_unused) +{ +} + +static int +eth_link_update(struct rte_eth_dev *dev __rte_unused, + int wait_to_complete __rte_unused) +{ + return 0; +} + +static void xdp_umem_destroy(struct xdp_umem *umem) +{ + if (umem->frames) + free(umem->frames); + if (umem->buf_ring) + rte_ring_free(umem->buf_ring); + + free(umem); +} + +static struct xdp_umem *xdp_umem_configure(int sfd) +{ + int fq_size = ETH_AF_XDP_FQ_NUM_DESCS; + int cq_size = ETH_AF_XDP_CQ_NUM_DESCS; + struct xdp_mmap_offsets off; + struct xdp_umem_reg mr; + struct xdp_umem *umem; + char ring_name[0x100]; + socklen_t optlen; + void *bufs = NULL; + uint64_t i; + + umem = calloc(1, sizeof(*umem)); + if (!umem) + return NULL; + + snprintf(ring_name, 0x100, "%s_%d", "af_xdp_ring", sfd); + umem->buf_ring = rte_ring_create(ring_name, + ETH_AF_XDP_NUM_BUFFERS, + SOCKET_ID_ANY, + 0x0); + if (!umem->buf_ring) { + RTE_LOG(ERR, PMD, + "Failed to create rte_ring\n"); + goto err; + } + + for (i = 0; i < ETH_AF_XDP_NUM_BUFFERS; i++) + rte_ring_enqueue(umem->buf_ring, + (void *)(i * ETH_AF_XDP_FRAME_SIZE + + ETH_AF_XDP_DATA_HEADROOM)); + + if (posix_memalign(&bufs, getpagesize(), /* PAGE_SIZE aligned */ + ETH_AF_XDP_NUM_BUFFERS * ETH_AF_XDP_FRAME_SIZE)) { + RTE_LOG(ERR, PMD, + "Failed to allocate memory pool.\n"); + goto err; + } + + mr.addr = (uint64_t)bufs; + mr.len = ETH_AF_XDP_NUM_BUFFERS * ETH_AF_XDP_FRAME_SIZE; + mr.chunk_size = 
ETH_AF_XDP_FRAME_SIZE; + mr.headroom = ETH_AF_XDP_DATA_HEADROOM; + + if (setsockopt(sfd, SOL_XDP, XDP_UMEM_REG, &mr, sizeof(mr))) { + RTE_LOG(ERR, PMD, + "Failed to register memory pool.\n"); + goto err; + } + + if (setsockopt(sfd, SOL_XDP, XDP_UMEM_FILL_RING, &fq_size, + sizeof(int))) { + RTE_LOG(ERR, PMD, + "Failed to setup fill ring.\n"); + goto err; + } + + if (setsockopt(sfd, SOL_XDP, XDP_UMEM_COMPLETION_RING, &cq_size, + sizeof(int))) { + RTE_LOG(ERR, PMD, + "Failed to setup completion ring.\n"); + goto err; + } + + optlen = sizeof(off); + if (getsockopt(sfd, SOL_XDP, XDP_MMAP_OFFSETS, &off, &optlen)) { + RTE_LOG(ERR, PMD, + "Failed to get map fr/cr offset.\n"); + goto err; + } + + umem->fq.map = mmap(0, off.fr.desc + + fq_size * sizeof(uint64_t), + PROT_READ | PROT_WRITE, + MAP_SHARED | MAP_POPULATE, sfd, + XDP_UMEM_PGOFF_FILL_RING); + + if (umem->fq.map == MAP_FAILED) { + RTE_LOG(ERR, PMD, + "Failed to allocate memory for fq.\n"); + goto err; + } + + umem->fq.mask = fq_size - 1; + umem->fq.size = fq_size; + umem->fq.producer = + (uint32_t *)((uint64_t)umem->fq.map + off.fr.producer); + umem->fq.consumer = + (uint32_t *)((uint64_t)umem->fq.map + off.fr.consumer); + umem->fq.ring = (uint64_t *)((uint64_t)umem->fq.map + off.fr.desc); + umem->fq.cached_cons = fq_size; + + umem->cq.map = mmap(0, off.cr.desc + + cq_size * sizeof(uint64_t), + PROT_READ | PROT_WRITE, + MAP_SHARED | MAP_POPULATE, sfd, + XDP_UMEM_PGOFF_COMPLETION_RING); + + if (umem->cq.map == MAP_FAILED) { + RTE_LOG(ERR, PMD, + "Failed to allocate memory for cq.\n"); + goto err; + } + + umem->cq.mask = cq_size - 1; + umem->cq.size = cq_size; + umem->cq.producer = + (uint32_t *)((uint64_t)umem->cq.map + off.cr.producer); + umem->cq.consumer = + (uint32_t *)((uint64_t)umem->cq.map + off.cr.consumer); + umem->cq.ring = (uint64_t *)((uint64_t)umem->cq.map + off.cr.desc); + + umem->frames = bufs; + umem->fd = sfd; + + return umem; + +err: + xdp_umem_destroy(umem); + return NULL; + +} + +static int 
+xsk_configure(struct pkt_rx_queue *rxq, int ring_size, struct xdp_umem *umem) +{ + struct pkt_tx_queue *txq = rxq->pair; + struct xdp_mmap_offsets off; + int new_umem = 0; + socklen_t optlen; + + rxq->xsk_fd = socket(PF_XDP, SOCK_RAW, 0); + if (rxq->xsk_fd < 0) + return -1; + + if (!umem) { + rxq->umem = xdp_umem_configure(rxq->xsk_fd); + if (!rxq->umem) + goto err; + new_umem = 1; + } else { + rxq->umem = umem; + } + + if (setsockopt(rxq->xsk_fd, SOL_XDP, XDP_RX_RING, + &ring_size, sizeof(int))) { + RTE_LOG(ERR, PMD, "Failed to setup Rx ring.\n"); + goto err; + } + + if (setsockopt(rxq->xsk_fd, SOL_XDP, XDP_TX_RING, + &ring_size, sizeof(int))) { + RTE_LOG(ERR, PMD, "Failed to setup Tx ring.\n"); + goto err; + } + + optlen = sizeof(off); + if (getsockopt(rxq->xsk_fd, SOL_XDP, XDP_MMAP_OFFSETS, + &off, &optlen)) { + RTE_LOG(ERR, PMD, "Failed to get map rx/tx offsets.\n"); + goto err; + } + + /* Rx */ + rxq->rx.map = mmap(NULL, + off.rx.desc + + ring_size * sizeof(struct xdp_desc), + PROT_READ | PROT_WRITE, + MAP_SHARED | MAP_POPULATE, rxq->xsk_fd, + XDP_PGOFF_RX_RING); + + if (rxq->rx.map == MAP_FAILED) { + RTE_LOG(ERR, PMD, "Failed to map Rx ring memory.\n"); + goto err; + } + + fill_rx_desc(rxq->umem); + /* Tx */ + txq->tx.map = mmap(NULL, + off.tx.desc + + ring_size * sizeof(struct xdp_desc), + PROT_READ | PROT_WRITE, + MAP_SHARED | MAP_POPULATE, rxq->xsk_fd, + XDP_PGOFF_TX_RING); + + if (txq->tx.map == MAP_FAILED) { + RTE_LOG(ERR, PMD, "Failed to map Tx ring memory\n"); + goto err; + } + + rxq->rx.mask = ring_size - 1; + rxq->rx.size = ring_size; + rxq->rx.producer = + (uint32_t *)((uint64_t)rxq->rx.map + off.rx.producer); + rxq->rx.consumer = + (uint32_t *)((uint64_t)rxq->rx.map + off.rx.consumer); + rxq->rx.ring = (struct xdp_desc *)((uint64_t)rxq->rx.map + off.rx.desc); + + txq->tx.mask = ring_size - 1; + txq->tx.size = ring_size; + txq->tx.producer = + (uint32_t *)((uint64_t)txq->tx.map + off.tx.producer); + txq->tx.consumer = + (uint32_t 
*)((uint64_t)txq->tx.map + off.tx.consumer); + txq->tx.ring = (struct xdp_desc *)((uint64_t)txq->tx.map + off.tx.desc); + txq->tx.cached_cons = ring_size; + + return 0; + +err: + if (new_umem) + xdp_umem_destroy(rxq->umem); + close(rxq->xsk_fd); + rxq->xsk_fd = 0; + + return -1; +} + +static void +queue_reset(struct pmd_internals *internals, uint16_t queue_idx) +{ + struct pkt_rx_queue *rxq = &internals->rx_queues[queue_idx]; + struct pkt_tx_queue *txq = rxq->pair; + + if (rxq->xsk_fd) { + close(rxq->xsk_fd); + if (internals->umem_share_count > 0) { + internals->umem_share_count--; + if (internals->umem_share_count == 0 && + internals->umem_share) { + xdp_umem_destroy(internals->umem_share); + internals->umem_share = NULL; + } + } + } + memset(rxq, 0, sizeof(*rxq)); + memset(txq, 0, sizeof(*txq)); + rxq->pair = txq; + txq->pair = rxq; + rxq->queue_idx = queue_idx; + txq->queue_idx = queue_idx; +} + +static int +eth_rx_queue_setup(struct rte_eth_dev *dev, + uint16_t rx_queue_id, + uint16_t nb_rx_desc, + unsigned int socket_id __rte_unused, + const struct rte_eth_rxconf *rx_conf __rte_unused, + struct rte_mempool *mb_pool) +{ + struct pmd_internals *internals = dev->data->dev_private; + unsigned int buf_size, data_size; + struct pkt_rx_queue *rxq; + struct sockaddr_xdp sxdp = {0}; + int xsk_key; + int map_fd; + + if (dev->data->nb_rx_queues <= rx_queue_id) { + RTE_LOG(ERR, PMD, + "Invalid rx queue id: %d\n", rx_queue_id); + return -EINVAL; + } + + rxq = &internals->rx_queues[rx_queue_id]; + queue_reset(internals, rx_queue_id); + + /* Now get the space available for data in the mbuf */ + buf_size = rte_pktmbuf_data_room_size(mb_pool) - + RTE_PKTMBUF_HEADROOM; + data_size = ETH_AF_XDP_FRAME_SIZE - ETH_AF_XDP_DATA_HEADROOM; + + if (data_size > buf_size) { + RTE_LOG(ERR, PMD, + "%s: %d bytes will not fit in mbuf (%d bytes)\n", + dev->device->name, data_size, buf_size); + return -ENOMEM; + } + + rxq->mb_pool = mb_pool; + + if (xsk_configure(rxq, nb_rx_desc, 
internals->umem_share)) { + RTE_LOG(ERR, PMD, + "Failed to configure xdp socket\n"); + return -EINVAL; + } + + sxdp.sxdp_family = PF_XDP; + sxdp.sxdp_ifindex = internals->if_index; + sxdp.sxdp_queue_id = internals->queue_idx; + sxdp.sxdp_flags = 0; + if (internals->umem_share) { + RTE_LOG(INFO, PMD, + "use share umem at queue id %d\n", rx_queue_id); + sxdp.sxdp_flags = XDP_SHARED_UMEM; + sxdp.sxdp_shared_umem_fd = internals->umem_share->fd; + } + + if (bind(rxq->xsk_fd, (struct sockaddr *)&sxdp, sizeof(sxdp))) { + RTE_LOG(ERR, PMD, "Failed to bind xdp socket\n"); + if (!internals->umem_share) + xdp_umem_destroy(rxq->umem); + goto err; + } + + if (!internals->umem_share) + internals->umem_share = rxq->umem; + + internals->umem_share_count++; + map_fd = bpf_map_get_fd_by_id(internals->xsk_map_id); + + xsk_key = internals->xsk_map_key_start + rx_queue_id; + if (bpf_map_update_elem(map_fd, &xsk_key, &rxq->xsk_fd, 0)) { + RTE_LOG(ERR, PMD, + "Failed to update xsk map\n"); + goto err; + } + + dev->data->rx_queues[rx_queue_id] = rxq; + return 0; + +err: + queue_reset(internals, rx_queue_id); + return -EINVAL; +} + +static int +eth_tx_queue_setup(struct rte_eth_dev *dev, + uint16_t tx_queue_id, + uint16_t nb_tx_desc, + unsigned int socket_id __rte_unused, + const struct rte_eth_txconf *tx_conf __rte_unused) +{ + struct pmd_internals *internals = dev->data->dev_private; + struct pkt_tx_queue *txq; + + if (dev->data->nb_tx_queues <= tx_queue_id) { + RTE_LOG(ERR, PMD, "Invalid tx queue id: %d\n", tx_queue_id); + return -EINVAL; + } + + RTE_LOG(WARNING, PMD, "Warning tx queue setup size=%d will be skipped\n", + nb_tx_desc); + txq = &internals->tx_queues[tx_queue_id]; + + dev->data->tx_queues[tx_queue_id] = txq; + return 0; +} + +static int +eth_dev_mtu_set(struct rte_eth_dev *dev, uint16_t mtu) +{ + struct pmd_internals *internals = dev->data->dev_private; + struct ifreq ifr = { .ifr_mtu = mtu }; + int ret; + int s; + + s = socket(PF_INET, SOCK_DGRAM, 0); + if (s < 0) + return 
-EINVAL; + + snprintf(ifr.ifr_name, IFNAMSIZ, "%s", internals->if_name); + ret = ioctl(s, SIOCSIFMTU, &ifr); + close(s); + + if (ret < 0) + return -EINVAL; + + return 0; +} + +static void +eth_dev_change_flags(char *if_name, uint32_t flags, uint32_t mask) +{ + struct ifreq ifr; + int s; + + s = socket(PF_INET, SOCK_DGRAM, 0); + if (s < 0) + return; + + snprintf(ifr.ifr_name, IFNAMSIZ, "%s", if_name); + if (ioctl(s, SIOCGIFFLAGS, &ifr) < 0) + goto out; + ifr.ifr_flags &= mask; + ifr.ifr_flags |= flags; + if (ioctl(s, SIOCSIFFLAGS, &ifr) < 0) + goto out; +out: + close(s); +} + +static void +eth_dev_promiscuous_enable(struct rte_eth_dev *dev) +{ + struct pmd_internals *internals = dev->data->dev_private; + + eth_dev_change_flags(internals->if_name, IFF_PROMISC, ~0); +} + +static void +eth_dev_promiscuous_disable(struct rte_eth_dev *dev) +{ + struct pmd_internals *internals = dev->data->dev_private; + + eth_dev_change_flags(internals->if_name, 0, ~IFF_PROMISC); +} + +static const struct eth_dev_ops ops = { + .dev_start = eth_dev_start, + .dev_stop = eth_dev_stop, + .dev_close = eth_dev_close, + .dev_configure = eth_dev_configure, + .dev_infos_get = eth_dev_info, + .mtu_set = eth_dev_mtu_set, + .promiscuous_enable = eth_dev_promiscuous_enable, + .promiscuous_disable = eth_dev_promiscuous_disable, + .rx_queue_setup = eth_rx_queue_setup, + .tx_queue_setup = eth_tx_queue_setup, + .rx_queue_release = eth_queue_release, + .tx_queue_release = eth_queue_release, + .link_update = eth_link_update, + .stats_get = eth_stats_get, + .stats_reset = eth_stats_reset, +}; + +static struct rte_vdev_driver pmd_af_xdp_drv; + +static void +parse_parameters(struct rte_kvargs *kvlist, + char **if_name, + int *queue_idx, + int *xsk_map_id, + int *xsk_map_key_start, + int *xsk_map_key_count) +{ + struct rte_kvargs_pair *pair = NULL; + unsigned int k_idx; + + for (k_idx = 0; k_idx < kvlist->count; k_idx++) { + pair = &kvlist->pairs[k_idx]; + if (strstr(pair->key, ETH_AF_XDP_IFACE_ARG)) + 
*if_name = pair->value; + else if (strstr(pair->key, ETH_AF_XDP_QUEUE_IDX_ARG)) + *queue_idx = atoi(pair->value); + else if (strstr(pair->key, ETH_AF_XDP_XSK_MAP_ID_ARG)) + *xsk_map_id = atoi(pair->value); + else if (strstr(pair->key, ETH_AF_XDP_XSK_MAP_KEY_START_ARG)) + *xsk_map_key_start = atoi(pair->value); + else if (strstr(pair->key, ETH_AF_XDP_XSK_MAP_KEY_COUNT_ARG)) + *xsk_map_key_count = atoi(pair->value); + } +} + +static int +get_iface_info(const char *if_name, + struct ether_addr *eth_addr, + int *if_index) +{ + struct ifreq ifr; + int sock = socket(AF_INET, SOCK_DGRAM, IPPROTO_IP); + + if (sock < 0) + return -1; + + strcpy(ifr.ifr_name, if_name); + if (ioctl(sock, SIOCGIFINDEX, &ifr)) + goto error; + + if (ioctl(sock, SIOCGIFHWADDR, &ifr)) + goto error; + + memcpy(eth_addr, ifr.ifr_hwaddr.sa_data, 6); + + close(sock); + *if_index = if_nametoindex(if_name); + return 0; + +error: + close(sock); + return -1; +} + +static int +init_internals(struct rte_vdev_device *dev, + const char *if_name, + int queue_idx, + int xsk_map_id, + int xsk_map_key_start, + int xsk_map_key_count) +{ + const char *name = rte_vdev_device_name(dev); + struct rte_eth_dev *eth_dev = NULL; + const unsigned int numa_node = dev->device.numa_node; + struct pmd_internals *internals = NULL; + int ret; + int i; + + internals = rte_zmalloc_socket(name, sizeof(*internals), 0, numa_node); + if (!internals) + return -ENOMEM; + + internals->queue_idx = queue_idx; + internals->xsk_map_id = xsk_map_id; + internals->xsk_map_key_start = xsk_map_key_start; + internals->xsk_map_key_count = xsk_map_key_count; + strcpy(internals->if_name, if_name); + + for (i = 0; i < ETH_AF_XDP_MAX_QUEUE_PAIRS; i++) { + internals->tx_queues[i].pair = &internals->rx_queues[i]; + internals->rx_queues[i].pair = &internals->tx_queues[i]; + } + + ret = get_iface_info(if_name, &internals->eth_addr, + &internals->if_index); + if (ret) + goto err; + + eth_dev = rte_eth_vdev_allocate(dev, 0); + if (!eth_dev) + goto err; + + 
eth_dev->data->dev_private = internals; + eth_dev->data->dev_link = pmd_link; + eth_dev->data->mac_addrs = &internals->eth_addr; + eth_dev->dev_ops = &ops; + eth_dev->rx_pkt_burst = eth_af_xdp_rx; + eth_dev->tx_pkt_burst = eth_af_xdp_tx; + + rte_eth_dev_probing_finish(eth_dev); + return 0; + +err: + rte_free(internals); + return -1; +} + +static int +rte_pmd_af_xdp_probe(struct rte_vdev_device *dev) +{ + struct rte_kvargs *kvlist; + char *if_name = NULL; + int queue_idx = ETH_AF_XDP_DFLT_QUEUE_IDX; + struct rte_eth_dev *eth_dev; + int xsk_map_id = -1; + int xsk_map_key_start = 0; + int xsk_map_key_count = 1; + const char *name; + int ret; + + RTE_LOG(INFO, PMD, "Initializing pmd_af_xdp for %s\n", + rte_vdev_device_name(dev)); + + name = rte_vdev_device_name(dev); + if (rte_eal_process_type() == RTE_PROC_SECONDARY && + strlen(rte_vdev_device_args(dev)) == 0) { + eth_dev = rte_eth_dev_attach_secondary(name); + if (!eth_dev) { + RTE_LOG(ERR, PMD, "Failed to probe %s\n", name); + return -EINVAL; + } + eth_dev->dev_ops = &ops; + rte_eth_dev_probing_finish(eth_dev); + } + + kvlist = rte_kvargs_parse(rte_vdev_device_args(dev), valid_arguments); + if (!kvlist) { + RTE_LOG(ERR, PMD, + "Invalid kvargs\n"); + return -EINVAL; + } + + if (dev->device.numa_node == SOCKET_ID_ANY) + dev->device.numa_node = rte_socket_id(); + + parse_parameters(kvlist, &if_name, + &queue_idx, + &xsk_map_id, + &xsk_map_key_start, + &xsk_map_key_count); + + if (xsk_map_id < 0) { + RTE_LOG(ERR, PMD, + "Invalid map id\n"); + return -EINVAL; + } + ret = init_internals(dev, if_name, queue_idx, xsk_map_id, + xsk_map_key_start, xsk_map_key_count); + + rte_kvargs_free(kvlist); + + return ret; +} + +static int +rte_pmd_af_xdp_remove(struct rte_vdev_device *dev) +{ + struct rte_eth_dev *eth_dev = NULL; + struct pmd_internals *internals; + int i; + + RTE_LOG(INFO, PMD, "Closing AF_XDP ethdev on numa socket %u\n", + rte_socket_id()); + + if (!dev) + return -1; + + /* find the ethdev entry */ + eth_dev = 
rte_eth_dev_allocated(rte_vdev_device_name(dev)); + if (!eth_dev) + return -1; + + internals = eth_dev->data->dev_private; + + for (i = 0; i < internals->xsk_map_key_count; i++) + queue_reset(internals, i); + + rte_ring_free(internals->umem_share->buf_ring); + rte_free(internals->umem_share->frames); + rte_free(internals->umem_share); + rte_free(internals); + + rte_eth_dev_release_port(eth_dev); + + return 0; +} + +static struct rte_vdev_driver pmd_af_xdp_drv = { + .probe = rte_pmd_af_xdp_probe, + .remove = rte_pmd_af_xdp_remove, +}; + +RTE_PMD_REGISTER_VDEV(net_af_xdp, pmd_af_xdp_drv); +RTE_PMD_REGISTER_ALIAS(net_af_xdp, eth_af_xdp); +RTE_PMD_REGISTER_PARAM_STRING(net_af_xdp, + "iface= " + "queue= " + "xsk_map_id= " + "xsk_map_key_start= " + "xsk_map_key_count= "); diff --git a/drivers/net/af_xdp/rte_pmd_af_xdp_version.map b/drivers/net/af_xdp/rte_pmd_af_xdp_version.map new file mode 100644 index 000000000..ef3539840 --- /dev/null +++ b/drivers/net/af_xdp/rte_pmd_af_xdp_version.map @@ -0,0 +1,4 @@ +DPDK_2.0 { + + local: *; +}; diff --git a/mk/rte.app.mk b/mk/rte.app.mk index de33883be..428ad8ab0 100644 --- a/mk/rte.app.mk +++ b/mk/rte.app.mk @@ -118,6 +118,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_DPAA2_MEMPOOL) += -lrte_mempool_dpaa2 endif _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET) += -lrte_pmd_af_packet +_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP) += -lrte_pmd_af_xdp -lelf -lbpf _LDLIBS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += -lrte_pmd_ark _LDLIBS-$(CONFIG_RTE_LIBRTE_AVF_PMD) += -lrte_pmd_avf _LDLIBS-$(CONFIG_RTE_LIBRTE_AVP_PMD) += -lrte_pmd_avp From patchwork Thu Aug 16 14:43:17 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qi Zhang X-Patchwork-Id: 43742 X-Patchwork-Delegate: ferruh.yigit@amd.com Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 83F204CB3; Thu, 16 Aug 2018 16:42:54 +0200 
(CEST) Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by dpdk.org (Postfix) with ESMTP id 2BAB349CF for ; Thu, 16 Aug 2018 16:42:51 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga105.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 16 Aug 2018 07:42:49 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.53,247,1531810800"; d="scan'208";a="65704094" Received: from dpdk51.sh.intel.com ([10.67.110.190]) by orsmga008.jf.intel.com with ESMTP; 16 Aug 2018 07:42:39 -0700 From: Qi Zhang To: dev@dpdk.org Cc: magnus.karlsson@intel.com, bjorn.topel@intel.com, jingjing.wu@intel.com, xiaoyun.li@intel.com, ferruh.yigit@intel.com, Qi Zhang Date: Thu, 16 Aug 2018 22:43:17 +0800 Message-Id: <20180816144321.17719-3-qi.z.zhang@intel.com> X-Mailer: git-send-email 2.13.6 In-Reply-To: <20180816144321.17719-1-qi.z.zhang@intel.com> References: <20180816144321.17719-1-qi.z.zhang@intel.com> Subject: [dpdk-dev] [RFC v3 2/6] lib/mbuf: enable parse flags when create mempool X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" This gives the application the option to configure each memory chunk's size precisely (by MEMPOOL_F_NO_SPREAD).
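For illustration only, not part of the patch: a rough model of why precise sizing matters here. Without MEMPOOL_F_NO_SPREAD, the mempool may pad each object so that consecutive objects start on different memory channels. The sketch below is modelled loosely on that padding. The 64-byte alignment unit and the "4 channels, 1 rank" input are assumptions, and `spread_object_size` is a hypothetical name, not a DPDK function.

```c
#define ALIGN_UNIT 64u /* assumed cache-line / mempool alignment unit */

/* Greatest common divisor (Euclid). */
static unsigned int gcd_u(unsigned int a, unsigned int b)
{
	while (b != 0) {
		unsigned int t = a % b;

		a = b;
		b = t;
	}
	return a;
}

/*
 * Hypothetical model of "spread" padding: grow the object size, counted
 * in ALIGN_UNIT units, until it is co-prime with channels * ranks.
 */
static unsigned int spread_object_size(unsigned int obj_size,
				       unsigned int nchan,
				       unsigned int nrank)
{
	unsigned int units = (obj_size + ALIGN_UNIT - 1) / ALIGN_UNIT;

	while (gcd_u(units, nchan * nrank) != 1)
		units++;

	return units * ALIGN_UNIT;
}
```

Under these assumptions a 2048-byte object (64-byte header + 128-byte mbuf + 1856 bytes of data room) would be padded to 2112 bytes, so objects would no longer line up with a 2048-byte af_xdp frame. With the flag set, the object size stays exactly as configured.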
Signed-off-by: Qi Zhang --- lib/librte_mbuf/rte_mbuf.c | 15 ++++++++++++--- lib/librte_mbuf/rte_mbuf.h | 8 +++++++- 2 files changed, 19 insertions(+), 4 deletions(-) diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c index e714c5a59..dd119f5ac 100644 --- a/lib/librte_mbuf/rte_mbuf.c +++ b/lib/librte_mbuf/rte_mbuf.c @@ -110,7 +110,7 @@ rte_pktmbuf_init(struct rte_mempool *mp, struct rte_mempool * rte_pktmbuf_pool_create_by_ops(const char *name, unsigned int n, unsigned int cache_size, uint16_t priv_size, uint16_t data_room_size, - int socket_id, const char *ops_name) + unsigned int flags, int socket_id, const char *ops_name) { struct rte_mempool *mp; struct rte_pktmbuf_pool_private mbp_priv; @@ -130,7 +130,7 @@ rte_pktmbuf_pool_create_by_ops(const char *name, unsigned int n, mbp_priv.mbuf_priv_size = priv_size; mp = rte_mempool_create_empty(name, n, elt_size, cache_size, - sizeof(struct rte_pktmbuf_pool_private), socket_id, 0); + sizeof(struct rte_pktmbuf_pool_private), socket_id, flags); if (mp == NULL) return NULL; @@ -164,9 +164,18 @@ rte_pktmbuf_pool_create(const char *name, unsigned int n, int socket_id) { return rte_pktmbuf_pool_create_by_ops(name, n, cache_size, priv_size, - data_room_size, socket_id, NULL); + data_room_size, 0, socket_id, NULL); } +/* helper to create a mbuf pool with NO_SPREAD */ +struct rte_mempool * +rte_pktmbuf_pool_create_with_flags(const char *name, unsigned int n, + unsigned int cache_size, uint16_t priv_size, uint16_t data_room_size, + unsigned int flags, int socket_id) +{ + return rte_pktmbuf_pool_create_by_ops(name, n, cache_size, priv_size, + data_room_size, flags, socket_id, NULL); +} /* do some sanity checks on a mbuf: panic if it fails */ void rte_mbuf_sanity_check(const struct rte_mbuf *m, int is_header) diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h index 9ce5d76d7..d83d17b79 100644 --- a/lib/librte_mbuf/rte_mbuf.h +++ b/lib/librte_mbuf/rte_mbuf.h @@ -1127,6 +1127,12 @@ 
rte_pktmbuf_pool_create(const char *name, unsigned n, unsigned cache_size, uint16_t priv_size, uint16_t data_room_size, int socket_id); +struct rte_mempool * +rte_pktmbuf_pool_create_with_flags(const char *name, unsigned int n, + unsigned cache_size, uint16_t priv_size, uint16_t data_room_size, + unsigned flags, int socket_id); + + /** * Create a mbuf pool with a given mempool ops name * @@ -1167,7 +1173,7 @@ rte_pktmbuf_pool_create(const char *name, unsigned n, struct rte_mempool * rte_pktmbuf_pool_create_by_ops(const char *name, unsigned int n, unsigned int cache_size, uint16_t priv_size, uint16_t data_room_size, - int socket_id, const char *ops_name); + unsigned int flags, int socket_id, const char *ops_name); /** * Get the data room size of mbufs stored in a pktmbuf_pool From patchwork Thu Aug 16 14:43:18 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qi Zhang X-Patchwork-Id: 43743 X-Patchwork-Delegate: ferruh.yigit@amd.com Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 78DD44CC3; Thu, 16 Aug 2018 16:42:56 +0200 (CEST) Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by dpdk.org (Postfix) with ESMTP id 9D4484C88 for ; Thu, 16 Aug 2018 16:42:51 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga105.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 16 Aug 2018 07:42:49 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.53,247,1531810800"; d="scan'208";a="65704098" Received: from dpdk51.sh.intel.com ([10.67.110.190]) by orsmga008.jf.intel.com with ESMTP; 16 Aug 2018 07:42:40 -0700 From: Qi Zhang To: dev@dpdk.org Cc: magnus.karlsson@intel.com, bjorn.topel@intel.com, jingjing.wu@intel.com, xiaoyun.li@intel.com, ferruh.yigit@intel.com, Qi Zhang Date: Thu, 
16 Aug 2018 22:43:18 +0800 Message-Id: <20180816144321.17719-4-qi.z.zhang@intel.com> X-Mailer: git-send-email 2.13.6 In-Reply-To: <20180816144321.17719-1-qi.z.zhang@intel.com> References: <20180816144321.17719-1-qi.z.zhang@intel.com> Subject: [dpdk-dev] [RFC v3 3/6] lib/mempool: allow page size aligned mempool X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Allow creating a mempool whose memory chunk base address is page-size aligned. Signed-off-by: Qi Zhang --- lib/librte_mempool/rte_mempool.c | 3 +++ lib/librte_mempool/rte_mempool.h | 1 + 2 files changed, 4 insertions(+) diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c index 03e6b5f73..61f7764c5 100644 --- a/lib/librte_mempool/rte_mempool.c +++ b/lib/librte_mempool/rte_mempool.c @@ -508,6 +508,9 @@ rte_mempool_populate_default(struct rte_mempool *mp) if (try_contig) flags |= RTE_MEMZONE_IOVA_CONTIG; + if (mp->flags & MEMPOOL_F_PAGE_ALIGN) + align = getpagesize(); + mz = rte_memzone_reserve_aligned(mz_name, mem_size, mp->socket_id, flags, align); diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h index 7c9cd9a2f..75553b36f 100644 --- a/lib/librte_mempool/rte_mempool.h +++ b/lib/librte_mempool/rte_mempool.h @@ -264,6 +264,7 @@ struct rte_mempool { #define MEMPOOL_F_POOL_CREATED 0x0010 /**< Internal: pool is created. */ #define MEMPOOL_F_NO_IOVA_CONTIG 0x0020 /**< Don't need IOVA contiguous objs. */ #define MEMPOOL_F_NO_PHYS_CONTIG MEMPOOL_F_NO_IOVA_CONTIG /* deprecated */ +#define MEMPOOL_F_PAGE_ALIGN 0x0040 /**< Chunk's base address is page aligned */ /** * @internal When debug is enabled, store some statistics.
From patchwork Thu Aug 16 14:43:19 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qi Zhang X-Patchwork-Id: 43744 X-Patchwork-Delegate: ferruh.yigit@amd.com Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id C666A4F90; Thu, 16 Aug 2018 16:42:57 +0200 (CEST) Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by dpdk.org (Postfix) with ESMTP id C62AE4C8D for ; Thu, 16 Aug 2018 16:42:51 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga105.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 16 Aug 2018 07:42:49 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.53,247,1531810800"; d="scan'208";a="65704101" Received: from dpdk51.sh.intel.com ([10.67.110.190]) by orsmga008.jf.intel.com with ESMTP; 16 Aug 2018 07:42:42 -0700 From: Qi Zhang To: dev@dpdk.org Cc: magnus.karlsson@intel.com, bjorn.topel@intel.com, jingjing.wu@intel.com, xiaoyun.li@intel.com, ferruh.yigit@intel.com, Qi Zhang Date: Thu, 16 Aug 2018 22:43:19 +0800 Message-Id: <20180816144321.17719-5-qi.z.zhang@intel.com> X-Mailer: git-send-email 2.13.6 In-Reply-To: <20180816144321.17719-1-qi.z.zhang@intel.com> References: <20180816144321.17719-1-qi.z.zhang@intel.com> Subject: [dpdk-dev] [RFC v3 4/6] net/af_xdp: use mbuf mempool for buffer management X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" The memory buffer registered with af_xdp is now managed by rte_mempool. An mbuf allocated from the rte_mempool can be converted to an xdp_desc address and vice versa.
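A minimal sketch of the conversion described above, using plain byte offsets from the umem base instead of real rte_mbuf pointers. It assumes the layout this patch relies on: a 64-byte mempool object header, a 128-byte struct rte_mbuf with no private area, and a 128-byte RTE_PKTMBUF_HEADROOM, all inside one 2048-byte frame. Every name with the `sketch_` or `SKETCH_` prefix is hypothetical.

```c
#include <stdint.h>

enum {
	SKETCH_FRAME_SIZE   = 2048, /* one af_xdp frame == one mempool object */
	SKETCH_OBJ_HDR_SIZE = 64,   /* assumed mempool object header size */
	SKETCH_MBUF_SIZE    = 128,  /* assumed sizeof(struct rte_mbuf) */
	SKETCH_HEADROOM     = 128,  /* assumed RTE_PKTMBUF_HEADROOM */
};

/* Offset of the mbuf structure that lives in frame number idx. */
static uint64_t sketch_frame_to_mbuf(uint64_t idx)
{
	return idx * SKETCH_FRAME_SIZE + SKETCH_OBJ_HDR_SIZE;
}

/* mbuf offset -> descriptor address (start of packet data). */
static uint64_t sketch_mbuf_to_addr(uint64_t mbuf_off)
{
	return mbuf_off + SKETCH_MBUF_SIZE + SKETCH_HEADROOM;
}

/* Descriptor address -> mbuf offset; valid only for default data_off. */
static uint64_t sketch_addr_to_mbuf(uint64_t addr)
{
	return addr - (SKETCH_MBUF_SIZE + SKETCH_HEADROOM);
}
```

In this model the data of the first frame starts at offset 320, which matches ETH_AF_XDP_DATA_HEADROOM (192 + 128), and the distance from the data start back to the mbuf is 256 bytes, the 0x100 constant used by addr_to_mbuf(). The reverse mapping holds only while the mbuf's data offset is still the default headroom.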
Signed-off-by: Qi Zhang --- drivers/net/af_xdp/rte_eth_af_xdp.c | 184 +++++++++++++++++++++--------------- 1 file changed, 108 insertions(+), 76 deletions(-) diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c index 12252014d..69bc38536 100644 --- a/drivers/net/af_xdp/rte_eth_af_xdp.c +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c @@ -42,7 +42,11 @@ #define ETH_AF_XDP_FRAME_SIZE 2048 #define ETH_AF_XDP_NUM_BUFFERS 4096 -#define ETH_AF_XDP_DATA_HEADROOM 0 +/* mempool hdrobj size (64 bytes) + sizeof(struct rte_mbuf) (128 bytes) */ +#define ETH_AF_XDP_MBUF_OVERHEAD 192 +/* data start from offset 320 (192 + 128) bytes */ +#define ETH_AF_XDP_DATA_HEADROOM \ + (ETH_AF_XDP_MBUF_OVERHEAD + RTE_PKTMBUF_HEADROOM) #define ETH_AF_XDP_DFLT_NUM_DESCS 1024 #define ETH_AF_XDP_FQ_NUM_DESCS 1024 #define ETH_AF_XDP_CQ_NUM_DESCS 1024 @@ -68,7 +72,7 @@ struct xdp_umem { char *frames; struct xdp_umem_uqueue fq; struct xdp_umem_uqueue cq; - struct rte_ring *buf_ring; /* be used to manage the buffer */ + struct rte_mempool *mb_pool; /* be used to manage the buffer */ int fd; }; @@ -304,11 +308,25 @@ static char *get_pkt_data(struct xdp_umem *umem, uint64_t addr) return &umem->frames[addr]; } +static inline struct rte_mbuf * +addr_to_mbuf(struct xdp_umem *umem, uint64_t addr) +{ + return (struct rte_mbuf *)((uint64_t)umem->frames + addr - 0x100); +} + +static inline uint64_t +mbuf_to_addr(struct xdp_umem *umem, struct rte_mbuf *mbuf) +{ + return (uint64_t)mbuf->buf_addr + mbuf->data_off - + (uint64_t)umem->frames; +} + static uint16_t eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts) { struct xdp_desc descs[ETH_AF_XDP_RX_BATCH_SIZE]; - void *addrs[ETH_AF_XDP_RX_BATCH_SIZE]; + struct rte_mbuf *bufs_to_fill[ETH_AF_XDP_RX_BATCH_SIZE]; + uint64_t addrs[ETH_AF_XDP_RX_BATCH_SIZE]; struct pkt_rx_queue *rxq = queue; struct xdp_uqueue *uq = &rxq->rx; struct xdp_umem_uqueue *fq = &rxq->umem->fq; @@ -317,25 +335,25 @@ eth_af_xdp_rx(void *queue, struct 
rte_mbuf **bufs, uint16_t nb_pkts) unsigned long dropped = 0; unsigned long rx_bytes = 0; uint16_t count = 0; - int rcvd, i; + int rcvd, i, ret; nb_pkts = nb_pkts < ETH_AF_XDP_RX_BATCH_SIZE ? nb_pkts : ETH_AF_XDP_RX_BATCH_SIZE; if (umem_nb_free(fq, free_thresh) >= free_thresh) { - int n = rte_ring_dequeue_bulk(rxq->umem->buf_ring, - addrs, - ETH_AF_XDP_RX_BATCH_SIZE, - NULL); - if (n == 0) + ret = rte_pktmbuf_alloc_bulk(rxq->umem->mb_pool, + bufs_to_fill, + ETH_AF_XDP_RX_BATCH_SIZE); + if (ret) return -ENOMEM; - if (umem_fill_to_kernel(fq, (uint64_t *)&addrs[0], - ETH_AF_XDP_RX_BATCH_SIZE)) { - rte_ring_enqueue_bulk(rxq->umem->buf_ring, - addrs, - ETH_AF_XDP_RX_BATCH_SIZE, - NULL); + for (i = 0; i < ETH_AF_XDP_RX_BATCH_SIZE; i++) + addrs[i] = mbuf_to_addr(rxq->umem, bufs_to_fill[i]); + + if (umem_fill_to_kernel(fq, addrs, + ETH_AF_XDP_RX_BATCH_SIZE)) { + for (i = 0; i < ETH_AF_XDP_RX_BATCH_SIZE; i++) + rte_pktmbuf_free(bufs_to_fill[i]); } } @@ -361,11 +379,9 @@ eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts) } else { dropped++; } - addrs[i] = (void *)addr; + rte_pktmbuf_free(addr_to_mbuf(rxq->umem, addr)); } - rte_ring_enqueue_bulk(rxq->umem->buf_ring, addrs, rcvd, NULL); - rxq->rx_pkts += (rcvd - dropped); rxq->rx_bytes += rx_bytes; rxq->rx_dropped += dropped; @@ -375,11 +391,10 @@ eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts) static void kick_tx(struct pkt_tx_queue *txq) { - void *addrs[ETH_AF_XDP_TX_BATCH_SIZE]; - struct rte_ring *buf_ring = txq->pair->umem->buf_ring; struct xdp_umem_uqueue *cq = &txq->pair->umem->cq; + uint64_t addrs[ETH_AF_XDP_TX_BATCH_SIZE]; int fd = txq->pair->xsk_fd; - int ret, n; + int ret, n, i; while (1) { @@ -398,9 +413,10 @@ static void kick_tx(struct pkt_tx_queue *txq) n = umem_complete_from_kernel(cq, (uint64_t *)&addrs[0], ETH_AF_XDP_TX_BATCH_SIZE); - if (n > 0) - rte_ring_enqueue_bulk(buf_ring, - addrs, n, NULL); + for (i = 0; i < n; i++) + rte_pktmbuf_free( + 
addr_to_mbuf(txq->pair->umem, + addrs[i])); } } } @@ -413,23 +429,21 @@ eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts) struct xdp_umem_uqueue *cq = &txq->pair->umem->cq; struct rte_mbuf *mbuf; struct xdp_desc descs[ETH_AF_XDP_TX_BATCH_SIZE]; - void *addrs[ETH_AF_XDP_TX_BATCH_SIZE]; - uint16_t i, valid; + uint64_t addrs[ETH_AF_XDP_TX_BATCH_SIZE]; + struct rte_mbuf *bufs_to_fill[ETH_AF_XDP_TX_BATCH_SIZE]; unsigned long tx_bytes = 0; + int i, valid, n; nb_pkts = nb_pkts < ETH_AF_XDP_TX_BATCH_SIZE ? nb_pkts : ETH_AF_XDP_TX_BATCH_SIZE; - int n = umem_complete_from_kernel(cq, (uint64_t *)&addrs[0], - ETH_AF_XDP_TX_BATCH_SIZE); - if (n > 0) - rte_ring_enqueue_bulk(txq->pair->umem->buf_ring, - addrs, n, NULL); - - nb_pkts = rte_ring_dequeue_bulk(txq->pair->umem->buf_ring, addrs, - nb_pkts, NULL); - if (!nb_pkts) - return 0; + n = umem_complete_from_kernel(cq, addrs, + ETH_AF_XDP_TX_BATCH_SIZE); + if (n > 0) { + for (i = 0; i < n; i++) + rte_pktmbuf_free(addr_to_mbuf(txq->pair->umem, + addrs[i])); + } valid = 0; for (i = 0; i < nb_pkts; i++) { @@ -438,7 +452,13 @@ eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts) ETH_AF_XDP_FRAME_SIZE - ETH_AF_XDP_DATA_HEADROOM; mbuf = bufs[i]; if (mbuf->pkt_len <= buf_len) { - descs[valid].addr = (uint64_t)addrs[valid]; + bufs_to_fill[valid] = + rte_pktmbuf_alloc(txq->pair->umem->mb_pool); + if (!bufs_to_fill[valid]) + break; + descs[valid].addr = + mbuf_to_addr(txq->pair->umem, + bufs_to_fill[valid]); descs[valid].len = mbuf->pkt_len; descs[valid].options = 0; pkt = get_pkt_data(txq->pair->umem, descs[valid].addr); @@ -447,20 +467,20 @@ eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts) valid++; tx_bytes += mbuf->pkt_len; } - rte_pktmbuf_free(mbuf); } if (xq_enq(uq, descs, valid)) { + for (i = 0; i < valid; i++) + rte_pktmbuf_free(bufs_to_fill[i]); + nb_pkts = 0; valid = 0; tx_bytes = 0; } else { kick_tx(txq); + for (i = 0; i < nb_pkts; i++) + rte_pktmbuf_free(bufs[i]); } - if 
(valid < nb_pkts) - rte_ring_enqueue_bulk(txq->pair->umem->buf_ring, &addrs[valid], - nb_pkts - valid, NULL); - txq->err_pkts += (nb_pkts - valid); txq->tx_pkts += valid; txq->tx_bytes += tx_bytes; @@ -472,13 +492,15 @@ static void fill_rx_desc(struct xdp_umem *umem) { struct xdp_umem_uqueue *fq = &umem->fq; - void *p = NULL; + struct rte_mbuf *mbuf; + uint64_t addr; uint32_t i; for (i = 0; i < fq->size / 2; i++) { - rte_ring_dequeue(umem->buf_ring, &p); - if (umem_fill_to_kernel(fq, (uint64_t *)&p, 1)) { - rte_ring_enqueue(umem->buf_ring, p); + mbuf = rte_pktmbuf_alloc(umem->mb_pool); + addr = mbuf_to_addr(umem, mbuf); + if (umem_fill_to_kernel(fq, &addr, 1)) { + rte_pktmbuf_free(mbuf); break; } } @@ -597,14 +619,28 @@ eth_link_update(struct rte_eth_dev *dev __rte_unused, static void xdp_umem_destroy(struct xdp_umem *umem) { - if (umem->frames) - free(umem->frames); - if (umem->buf_ring) - rte_ring_free(umem->buf_ring); + if (umem->mb_pool) + rte_mempool_free(umem->mb_pool); free(umem); } +static inline uint64_t get_base_addr(struct rte_mempool *mp) +{ + struct rte_mempool_memhdr *memhdr; + + memhdr = STAILQ_FIRST(&mp->mem_list); + return (uint64_t)(memhdr->addr); +} + +static inline uint64_t get_len(struct rte_mempool *mp) +{ + struct rte_mempool_memhdr *memhdr; + + memhdr = STAILQ_FIRST(&mp->mem_list); + return (uint64_t)(memhdr->len); +} + static struct xdp_umem *xdp_umem_configure(int sfd) { int fq_size = ETH_AF_XDP_FQ_NUM_DESCS; @@ -612,40 +648,29 @@ static struct xdp_umem *xdp_umem_configure(int sfd) struct xdp_mmap_offsets off; struct xdp_umem_reg mr; struct xdp_umem *umem; - char ring_name[0x100]; + char pool_name[0x100]; socklen_t optlen; - void *bufs = NULL; - uint64_t i; umem = calloc(1, sizeof(*umem)); if (!umem) return NULL; - snprintf(ring_name, 0x100, "%s_%d", "af_xdp_ring", sfd); - umem->buf_ring = rte_ring_create(ring_name, - ETH_AF_XDP_NUM_BUFFERS, - SOCKET_ID_ANY, - 0x0); - if (!umem->buf_ring) { - RTE_LOG(ERR, PMD, - "Failed to create 
rte_ring\n"); - goto err; - } + snprintf(pool_name, 0x100, "%s_%d", "af_xdp_ring", sfd); + umem->mb_pool = rte_pktmbuf_pool_create_with_flags( + pool_name, ETH_AF_XDP_NUM_BUFFERS, + 250, 0, + ETH_AF_XDP_FRAME_SIZE - ETH_AF_XDP_MBUF_OVERHEAD, + MEMPOOL_F_NO_SPREAD | MEMPOOL_F_PAGE_ALIGN, + SOCKET_ID_ANY); - for (i = 0; i < ETH_AF_XDP_NUM_BUFFERS; i++) - rte_ring_enqueue(umem->buf_ring, - (void *)(i * ETH_AF_XDP_FRAME_SIZE + - ETH_AF_XDP_DATA_HEADROOM)); - - if (posix_memalign(&bufs, getpagesize(), /* PAGE_SIZE aligned */ - ETH_AF_XDP_NUM_BUFFERS * ETH_AF_XDP_FRAME_SIZE)) { + if (!umem->mb_pool || umem->mb_pool->nb_mem_chunks != 1) { RTE_LOG(ERR, PMD, - "Failed to allocate memory pool.\n"); + "Failed to create rte_mempool\n"); goto err; } - mr.addr = (uint64_t)bufs; - mr.len = ETH_AF_XDP_NUM_BUFFERS * ETH_AF_XDP_FRAME_SIZE; + mr.addr = get_base_addr(umem->mb_pool); + mr.len = get_len(umem->mb_pool); mr.chunk_size = ETH_AF_XDP_FRAME_SIZE; mr.headroom = ETH_AF_XDP_DATA_HEADROOM; @@ -717,7 +742,7 @@ static struct xdp_umem *xdp_umem_configure(int sfd) (uint32_t *)((uint64_t)umem->cq.map + off.cr.consumer); umem->cq.ring = (uint64_t *)((uint64_t)umem->cq.map + off.cr.desc); - umem->frames = bufs; + umem->frames = (void *)get_base_addr(umem->mb_pool); umem->fd = sfd; return umem; @@ -729,7 +754,8 @@ static struct xdp_umem *xdp_umem_configure(int sfd) } static int -xsk_configure(struct pkt_rx_queue *rxq, int ring_size, struct xdp_umem *umem) +xsk_configure(struct pkt_rx_queue *rxq, int ring_size, + struct xdp_umem *umem) { struct pkt_tx_queue *txq = rxq->pair; struct xdp_mmap_offsets off; @@ -863,6 +889,12 @@ eth_rx_queue_setup(struct rte_eth_dev *dev, int xsk_key; int map_fd; + if (mb_pool == NULL) { + RTE_LOG(ERR, PMD, + "Invalid mb_pool\n"); + return -EINVAL; + } + if (dev->data->nb_rx_queues <= rx_queue_id) { RTE_LOG(ERR, PMD, "Invalid rx queue id: %d\n", rx_queue_id); @@ -1222,7 +1254,7 @@ rte_pmd_af_xdp_remove(struct rte_vdev_device *dev) for (i = 0; i < 
internals->xsk_map_key_count; i++) queue_reset(internals, i); - rte_ring_free(internals->umem_share->buf_ring); + rte_mempool_free(internals->umem_share->mb_pool); rte_free(internals->umem_share->frames); rte_free(internals->umem_share); rte_free(internals); From patchwork Thu Aug 16 14:43:20 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qi Zhang X-Patchwork-Id: 43745 X-Patchwork-Delegate: ferruh.yigit@amd.com Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 011DB5323; Thu, 16 Aug 2018 16:42:58 +0200 (CEST) Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by dpdk.org (Postfix) with ESMTP id 3F6C24C8E for ; Thu, 16 Aug 2018 16:42:52 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga105.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 16 Aug 2018 07:42:49 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.53,247,1531810800"; d="scan'208";a="65704106" Received: from dpdk51.sh.intel.com ([10.67.110.190]) by orsmga008.jf.intel.com with ESMTP; 16 Aug 2018 07:42:43 -0700 From: Qi Zhang To: dev@dpdk.org Cc: magnus.karlsson@intel.com, bjorn.topel@intel.com, jingjing.wu@intel.com, xiaoyun.li@intel.com, ferruh.yigit@intel.com, Qi Zhang Date: Thu, 16 Aug 2018 22:43:20 +0800 Message-Id: <20180816144321.17719-6-qi.z.zhang@intel.com> X-Mailer: git-send-email 2.13.6 In-Reply-To: <20180816144321.17719-1-qi.z.zhang@intel.com> References: <20180816144321.17719-1-qi.z.zhang@intel.com> Subject: [dpdk-dev] [RFC v3 5/6] net/af_xdp: enable zero copy X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Try 
to check if external mempool (from rx_queue_setup) is fit for af_xdp, if it is, it will be registered to af_xdp socket directly and there will be no packet data copy on Rx and Tx. Signed-off-by: Qi Zhang --- drivers/net/af_xdp/rte_eth_af_xdp.c | 158 +++++++++++++++++++++++++----------- 1 file changed, 112 insertions(+), 46 deletions(-) diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c index 69bc38536..c78c66a8c 100644 --- a/drivers/net/af_xdp/rte_eth_af_xdp.c +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c @@ -73,6 +73,7 @@ struct xdp_umem { struct xdp_umem_uqueue fq; struct xdp_umem_uqueue cq; struct rte_mempool *mb_pool; /* be used to manage the buffer */ + uint8_t zc; int fd; }; @@ -258,6 +259,7 @@ struct pkt_rx_queue { unsigned long rx_dropped; struct pkt_tx_queue *pair; + uint8_t zc; }; struct pkt_tx_queue { @@ -366,20 +368,24 @@ eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts) char *pkt; uint64_t addr = descs[i].addr; - mbuf = rte_pktmbuf_alloc(rxq->mb_pool); - rte_pktmbuf_pkt_len(mbuf) = - rte_pktmbuf_data_len(mbuf) = - descs[i].len; - if (mbuf) { - pkt = get_pkt_data(rxq->umem, addr); - memcpy(rte_pktmbuf_mtod(mbuf, void *), - pkt, descs[i].len); - rx_bytes += descs[i].len; - bufs[count++] = mbuf; + if (!rxq->zc) { + mbuf = rte_pktmbuf_alloc(rxq->mb_pool); + rte_pktmbuf_pkt_len(mbuf) = + rte_pktmbuf_data_len(mbuf) = + descs[i].len; + if (mbuf) { + pkt = get_pkt_data(rxq->umem, addr); + memcpy(rte_pktmbuf_mtod(mbuf, void *), + pkt, descs[i].len); + rx_bytes += descs[i].len; + bufs[count++] = mbuf; + } else { + dropped++; + } + rte_pktmbuf_free(addr_to_mbuf(rxq->umem, addr)); } else { - dropped++; + bufs[count++] = addr_to_mbuf(rxq->umem, addr); } - rte_pktmbuf_free(addr_to_mbuf(rxq->umem, addr)); } rxq->rx_pkts += (rcvd - dropped); @@ -425,14 +431,17 @@ static uint16_t eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts) { struct pkt_tx_queue *txq = queue; + struct xdp_umem *umem = 
txq->pair->umem; struct xdp_uqueue *uq = &txq->tx; struct xdp_umem_uqueue *cq = &txq->pair->umem->cq; + struct rte_mempool *mp = umem->mb_pool; struct rte_mbuf *mbuf; struct xdp_desc descs[ETH_AF_XDP_TX_BATCH_SIZE]; uint64_t addrs[ETH_AF_XDP_TX_BATCH_SIZE]; struct rte_mbuf *bufs_to_fill[ETH_AF_XDP_TX_BATCH_SIZE]; + struct rte_mbuf *bufs_to_free[ETH_AF_XDP_TX_BATCH_SIZE]; unsigned long tx_bytes = 0; - int i, valid, n; + int i, valid, n, free, fill; nb_pkts = nb_pkts < ETH_AF_XDP_TX_BATCH_SIZE ? nb_pkts : ETH_AF_XDP_TX_BATCH_SIZE; @@ -446,39 +455,57 @@ eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts) } valid = 0; + free = 0; + fill = 0; for (i = 0; i < nb_pkts; i++) { - char *pkt; - unsigned int buf_len = - ETH_AF_XDP_FRAME_SIZE - ETH_AF_XDP_DATA_HEADROOM; mbuf = bufs[i]; - if (mbuf->pkt_len <= buf_len) { - bufs_to_fill[valid] = - rte_pktmbuf_alloc(txq->pair->umem->mb_pool); - if (!bufs_to_fill[valid]) - break; - descs[valid].addr = - mbuf_to_addr(txq->pair->umem, - bufs_to_fill[valid]); + /* mbuf is in shared mempool, zero copy */ + if (txq->pair->zc && bufs[i]->pool == mp) { + descs[valid].addr = mbuf_to_addr(umem, mbuf); descs[valid].len = mbuf->pkt_len; descs[valid].options = 0; - pkt = get_pkt_data(txq->pair->umem, descs[valid].addr); - memcpy(pkt, rte_pktmbuf_mtod(mbuf, void *), - descs[i].len); valid++; tx_bytes += mbuf->pkt_len; + } else { + char *pkt; + unsigned int buf_len = + ETH_AF_XDP_FRAME_SIZE - + ETH_AF_XDP_DATA_HEADROOM; + if (mbuf->pkt_len <= buf_len) { + + bufs_to_fill[fill] = rte_pktmbuf_alloc(mp); + if (bufs_to_fill[fill] == NULL) { + bufs_to_free[free++] = mbuf; + continue; + } + + descs[valid].addr = + mbuf_to_addr(umem, bufs_to_fill[fill]); + fill++; + descs[valid].len = mbuf->pkt_len; + descs[valid].options = 0; + pkt = get_pkt_data(umem, descs[valid].addr); + memcpy(pkt, rte_pktmbuf_mtod(mbuf, void *), + descs[i].len); + valid++; + tx_bytes += mbuf->pkt_len; + } + bufs_to_free[free++] = mbuf; } } if (xq_enq(uq, descs, 
valid)) { - for (i = 0; i < valid; i++) + /* if failed, all tmp mbufs need to be free */ + for (i = 0; i < fill; i++) rte_pktmbuf_free(bufs_to_fill[i]); nb_pkts = 0; valid = 0; tx_bytes = 0; } else { + /* if passed, original mbuf need to be free */ + for (i = 0; i < free; i++) + rte_pktmbuf_free(bufs_to_free[i]); kick_tx(txq); - for (i = 0; i < nb_pkts; i++) - rte_pktmbuf_free(bufs[i]); } txq->err_pkts += (nb_pkts - valid); @@ -641,7 +668,7 @@ static inline uint64_t get_len(struct rte_mempool *mp) return (uint64_t)(memhdr->len); } -static struct xdp_umem *xdp_umem_configure(int sfd) +static struct xdp_umem *xdp_umem_configure(int sfd, struct rte_mempool *mb_pool) { int fq_size = ETH_AF_XDP_FQ_NUM_DESCS; int cq_size = ETH_AF_XDP_CQ_NUM_DESCS; @@ -655,18 +682,24 @@ static struct xdp_umem *xdp_umem_configure(int sfd) if (!umem) return NULL; - snprintf(pool_name, 0x100, "%s_%d", "af_xdp_ring", sfd); - umem->mb_pool = rte_pktmbuf_pool_create_with_flags( - pool_name, ETH_AF_XDP_NUM_BUFFERS, - 250, 0, - ETH_AF_XDP_FRAME_SIZE - ETH_AF_XDP_MBUF_OVERHEAD, - MEMPOOL_F_NO_SPREAD | MEMPOOL_F_PAGE_ALIGN, - SOCKET_ID_ANY); - - if (!umem->mb_pool || umem->mb_pool->nb_mem_chunks != 1) { - RTE_LOG(ERR, PMD, - "Failed to create rte_mempool\n"); - goto err; + if (!mb_pool) { + snprintf(pool_name, 0x100, "%s_%d", "af_xdp_ring", sfd); + umem->mb_pool = rte_pktmbuf_pool_create_with_flags( + pool_name, ETH_AF_XDP_NUM_BUFFERS, + 250, 0, + ETH_AF_XDP_FRAME_SIZE - + ETH_AF_XDP_MBUF_OVERHEAD, + MEMPOOL_F_NO_SPREAD | MEMPOOL_F_PAGE_ALIGN, + SOCKET_ID_ANY); + + if (!umem->mb_pool || umem->mb_pool->nb_mem_chunks != 1) { + RTE_LOG(ERR, PMD, + "Failed to create rte_mempool\n"); + goto err; + } + } else { + umem->mb_pool = mb_pool; + umem->zc = 1; } mr.addr = get_base_addr(umem->mb_pool); @@ -753,9 +786,34 @@ static struct xdp_umem *xdp_umem_configure(int sfd) } +static uint8_t +check_mempool_zc(struct rte_mempool *mp) +{ + RTE_ASSERT(mp); + + /* must continues */ + if (mp->nb_mem_chunks > 1) + 
return 0; + + /* check header size */ + if (mp->header_size != RTE_CACHE_LINE_SIZE) + return 0; + + /* check base address */ + if ((uint64_t)get_base_addr(mp) % getpagesize() != 0) + return 0; + + /* check chunk size */ + if ((mp->elt_size + mp->header_size + mp->trailer_size) % + ETH_AF_XDP_FRAME_SIZE != 0) + return 0; + + return 1; +} + static int xsk_configure(struct pkt_rx_queue *rxq, int ring_size, - struct xdp_umem *umem) + struct xdp_umem *umem, struct rte_mempool *mb_pool) { struct pkt_tx_queue *txq = rxq->pair; struct xdp_mmap_offsets off; @@ -767,7 +825,8 @@ xsk_configure(struct pkt_rx_queue *rxq, int ring_size, return -1; if (!umem) { - rxq->umem = xdp_umem_configure(rxq->xsk_fd); + mb_pool = check_mempool_zc(mb_pool) ? mb_pool : NULL; + rxq->umem = xdp_umem_configure(rxq->xsk_fd, mb_pool); if (!rxq->umem) goto err; new_umem = 1; @@ -918,7 +977,7 @@ eth_rx_queue_setup(struct rte_eth_dev *dev, rxq->mb_pool = mb_pool; - if (xsk_configure(rxq, nb_rx_desc, internals->umem_share)) { + if (xsk_configure(rxq, nb_rx_desc, internals->umem_share, mb_pool)) { RTE_LOG(ERR, PMD, "Failed to configure xdp socket\n"); return -EINVAL; @@ -945,6 +1004,13 @@ eth_rx_queue_setup(struct rte_eth_dev *dev, if (!internals->umem_share) internals->umem_share = rxq->umem; + if (mb_pool == internals->umem_share->mb_pool) + rxq->zc = internals->umem_share->zc; + + if (rxq->zc) + RTE_LOG(INFO, PMD, + "zero copy enabled on rx queue %d\n", rx_queue_id); + internals->umem_share_count++; map_fd = bpf_map_get_fd_by_id(internals->xsk_map_id); From patchwork Thu Aug 16 14:43:21 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qi Zhang X-Patchwork-Id: 43746 X-Patchwork-Delegate: ferruh.yigit@amd.com Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 48FF25681; Thu, 16 Aug 2018 16:43:00 +0200 (CEST) 
Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by dpdk.org (Postfix) with ESMTP id 673174C94 for ; Thu, 16 Aug 2018 16:42:52 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga105.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 16 Aug 2018 07:42:49 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.53,247,1531810800"; d="scan'208";a="65704111" Received: from dpdk51.sh.intel.com ([10.67.110.190]) by orsmga008.jf.intel.com with ESMTP; 16 Aug 2018 07:42:45 -0700 From: Qi Zhang To: dev@dpdk.org Cc: magnus.karlsson@intel.com, bjorn.topel@intel.com, jingjing.wu@intel.com, xiaoyun.li@intel.com, ferruh.yigit@intel.com, Qi Zhang Date: Thu, 16 Aug 2018 22:43:21 +0800 Message-Id: <20180816144321.17719-7-qi.z.zhang@intel.com> X-Mailer: git-send-email 2.13.6 In-Reply-To: <20180816144321.17719-1-qi.z.zhang@intel.com> References: <20180816144321.17719-1-qi.z.zhang@intel.com> Subject: [dpdk-dev] [RFC v3 6/6] app/testpmd: add mempool flags parameter X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" The flags used when creating the rte_mempool can now be parsed from the command line. This makes it possible for testpmd to create an af_xdp-friendly mempool (which enables zero copy).
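A possible way to parse such an option, shown only as a sketch and not as the code in this patch: it uses strtoul() with base 0 instead of atoi(), so the value can also be given in hex. `parse_mp_flags` is a hypothetical helper. The two flag values are assumptions for illustration: 0x0040 is the MEMPOOL_F_PAGE_ALIGN value added in patch 3/6, and 0x0001 is assumed to be MEMPOOL_F_NO_SPREAD.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_F_NO_SPREAD  0x0001u /* assumed value of MEMPOOL_F_NO_SPREAD */
#define SKETCH_F_PAGE_ALIGN 0x0040u /* value added by patch 3/6 */

/*
 * Hypothetical helper: parse a decimal, octal or hex flags string.
 * Returns 0 and stores the result on success; returns -1 when the
 * string is empty, malformed, zero, or larger than 0xFFFF.
 */
static int parse_mp_flags(const char *arg, uint16_t *flags)
{
	char *end = NULL;
	unsigned long v;

	if (arg == NULL || *arg == '\0')
		return -1;

	errno = 0;
	v = strtoul(arg, &end, 0);
	if (errno != 0 || *end != '\0' || v == 0 || v > 0xFFFF)
		return -1;

	*flags = (uint16_t)v;
	return 0;
}
```

With these assumed values, a flags argument of 0x41 (or 65) would request both NO_SPREAD and PAGE_ALIGN. The atoi()-based parsing in the patch as posted accepts only the decimal form, since atoi("0x41") yields 0 and is rejected.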

Signed-off-by: Qi Zhang
---
 app/test-pmd/parameters.c | 12 ++++++++++++
 app/test-pmd/testpmd.c    | 15 +++++++++------
 app/test-pmd/testpmd.h    |  1 +
 3 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index 962fad789..a5778e1a2 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -61,6 +61,7 @@ usage(char* progname)
 	       "--tx-first | --stats-period=PERIOD | "
 	       "--coremask=COREMASK --portmask=PORTMASK --numa "
 	       "--mbuf-size= | --total-num-mbufs= | "
+	       "--mp-flags= | "
 	       "--nb-cores= | --nb-ports= | "
 #ifdef RTE_LIBRTE_CMDLINE
 	       "--eth-peers-configfile= | "
@@ -105,6 +106,7 @@ usage(char* progname)
 	printf(" --socket-num=N: set socket from which all memory is allocated "
 	       "in NUMA mode.\n");
 	printf(" --mbuf-size=N: set the data size of mbuf to N bytes.\n");
+	printf(" --mp-flags=N: set the flags when create mbuf memory pool.\n");
 	printf(" --total-num-mbufs=N: set the number of mbufs to be allocated "
 	       "in mbuf pools.\n");
 	printf(" --max-pkt-len=N: set the maximum size of packet to N bytes.\n");
@@ -568,6 +570,7 @@ launch_args_parse(int argc, char** argv)
 		{ "ring-numa-config", 1, 0, 0 },
 		{ "socket-num", 1, 0, 0 },
 		{ "mbuf-size", 1, 0, 0 },
+		{ "mp-flags", 1, 0, 0 },
 		{ "total-num-mbufs", 1, 0, 0 },
 		{ "max-pkt-len", 1, 0, 0 },
 		{ "pkt-filter-mode", 1, 0, 0 },
@@ -772,6 +775,15 @@ launch_args_parse(int argc, char** argv)
 					rte_exit(EXIT_FAILURE,
 						 "mbuf-size should be > 0 and < 65536\n");
 			}
+			if (!strcmp(lgopts[opt_idx].name, "mp-flags")) {
+				n = atoi(optarg);
+				if (n > 0 && n <= 0xFFFF)
+					mp_flags = (uint16_t)n;
+				else
+					rte_exit(EXIT_FAILURE,
+						 "mp-flags should be > 0 and < 65536\n");
+			}
+
 			if (!strcmp(lgopts[opt_idx].name, "total-num-mbufs")) {
 				n = atoi(optarg);
 				if (n > 1024)
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index ee48db2a3..0567cc5dd 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -173,6 +173,7 @@ uint32_t burst_tx_delay_time = BURST_TX_WAIT_US;
 uint32_t burst_tx_retry_num = BURST_TX_RETRIES;
 
 uint16_t mbuf_data_size = DEFAULT_MBUF_DATA_SIZE; /**< Mbuf data space size. */
+uint16_t mp_flags = 0; /**< flags parsed when create mempool */
 uint32_t param_total_num_mbufs = 0; /**< number of mbufs in all pools - if
                                      * specified on command-line. */
 uint16_t stats_period; /**< Period to show statistics (disabled by default) */
@@ -533,6 +534,7 @@ set_def_fwd_config(void)
  */
 static void
 mbuf_pool_create(uint16_t mbuf_seg_size, unsigned nb_mbuf,
+		 unsigned int flags,
 		 unsigned int socket_id)
 {
 	char pool_name[RTE_MEMPOOL_NAMESIZE];
@@ -550,7 +552,7 @@ mbuf_pool_create(uint16_t mbuf_seg_size, unsigned nb_mbuf,
 		rte_mp = rte_mempool_create_empty(pool_name, nb_mbuf,
 			mb_size, (unsigned) mb_mempool_cache,
 			sizeof(struct rte_pktmbuf_pool_private),
-			socket_id, 0);
+			socket_id, flags);
 		if (rte_mp == NULL)
 			goto err;
 
@@ -565,8 +567,8 @@ mbuf_pool_create(uint16_t mbuf_seg_size, unsigned nb_mbuf,
 		/* wrapper to rte_mempool_create() */
 		TESTPMD_LOG(INFO, "preferred mempool ops selected: %s\n",
 				rte_mbuf_best_mempool_ops());
-		rte_mp = rte_pktmbuf_pool_create(pool_name, nb_mbuf,
-			mb_mempool_cache, 0, mbuf_seg_size, socket_id);
+		rte_mp = rte_pktmbuf_pool_create_with_flags(pool_name, nb_mbuf,
+			mb_mempool_cache, 0, mbuf_seg_size, flags, socket_id);
 	}
 
 err:
@@ -797,13 +799,14 @@ init_config(void)
 
 		for (i = 0; i < num_sockets; i++)
 			mbuf_pool_create(mbuf_data_size, nb_mbuf_per_pool,
-					 socket_ids[i]);
+					 mp_flags, socket_ids[i]);
 	} else {
 		if (socket_num == UMA_NO_CONFIG)
-			mbuf_pool_create(mbuf_data_size, nb_mbuf_per_pool, 0);
+			mbuf_pool_create(mbuf_data_size, nb_mbuf_per_pool,
+					 mp_flags, 0);
 		else
 			mbuf_pool_create(mbuf_data_size, nb_mbuf_per_pool,
-					 socket_num);
+					 mp_flags, socket_num);
 	}
 
 	init_port_config();
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index a1f661472..f5f8692ea 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -379,6 +379,7 @@ extern uint8_t dcb_config;
 extern uint8_t dcb_test;
 
 extern uint16_t mbuf_data_size; /**< Mbuf data space size. */
+extern uint16_t mp_flags; /**< flags for mempool creation. */
 extern uint32_t param_total_num_mbufs;
 
 extern uint16_t stats_period;
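
Note on the --mp-flags parsing in parameters.c: the patch converts the
argument with atoi(), which cannot report malformed input (for example
"12abc" is read as 12) and accepts only decimal, which is awkward for a
bit mask. A minimal sketch of a stricter conversion follows, assuming the
flags fit in 16 bits as in the patch. parse_mp_flags() is a hypothetical
helper used only for illustration; it is not part of this series or of
testpmd. Unlike the patch, it accepts 0 (no flags) and hex input.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not in the patch): convert a command-line string
 * to 16-bit mempool flags. Returns 0 on success, -1 on malformed or
 * out-of-range input; *flags is only written on success.
 */
static int
parse_mp_flags(const char *arg, uint16_t *flags)
{
	char *end = NULL;
	unsigned long v;

	if (arg == NULL || *arg == '\0')
		return -1;

	errno = 0;
	/* base 0 lets the user write decimal, octal (0...) or hex (0x...) */
	v = strtoul(arg, &end, 0);

	/* reject conversion errors, trailing junk and values above 16 bits */
	if (errno != 0 || end == arg || *end != '\0' || v > UINT16_MAX)
		return -1;

	*flags = (uint16_t)v;
	return 0;
}
```

In launch_args_parse() such a helper could stand in for the atoi() call,
with rte_exit() kept on the failure path; whether 0 should be accepted
is a design choice left open here.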