From patchwork Mon Sep 9 03:27:27 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Li, Xiaoyun" X-Patchwork-Id: 58965 Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 0D4291EAE1; Mon, 9 Sep 2019 05:28:07 +0200 (CEST) Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by dpdk.org (Postfix) with ESMTP id 405F41EAD9 for ; Mon, 9 Sep 2019 05:28:03 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga006.jf.intel.com ([10.7.209.51]) by orsmga103.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 08 Sep 2019 20:28:03 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.64,483,1559545200"; d="scan'208";a="188896243" Received: from dpdk-xiaoyun3.sh.intel.com ([10.67.119.190]) by orsmga006.jf.intel.com with ESMTP; 08 Sep 2019 20:28:01 -0700 From: Xiaoyun Li To: jingjing.wu@intel.com, keith.wiles@intel.com, omkar.maslekar@intel.com, cunming.liang@intel.com Cc: dev@dpdk.org, Xiaoyun Li Date: Mon, 9 Sep 2019 11:27:27 +0800 Message-Id: <20190909032730.29718-2-xiaoyun.li@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20190909032730.29718-1-xiaoyun.li@intel.com> References: <20190906075402.114177-1-xiaoyun.li@intel.com> <20190909032730.29718-1-xiaoyun.li@intel.com> Subject: [dpdk-dev] [PATCH v4 1/4] raw/ntb: setup ntb queue X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Setup and init ntb txq and rxq. And negotiate queue information with the peer. If queue size and number of queues are not consistent on both sides, return error. Signed-off-by: Xiaoyun Li --- doc/guides/rawdevs/ntb.rst | 39 +- doc/guides/rel_notes/release_19_11.rst | 4 + drivers/raw/ntb/Makefile | 3 + drivers/raw/ntb/meson.build | 1 + drivers/raw/ntb/ntb.c | 705 ++++++++++++++++++------- drivers/raw/ntb/ntb.h | 151 ++++-- drivers/raw/ntb/ntb_hw_intel.c | 26 +- drivers/raw/ntb/rte_pmd_ntb.h | 43 ++ 8 files changed, 718 insertions(+), 254 deletions(-) create mode 100644 drivers/raw/ntb/rte_pmd_ntb.h diff --git a/doc/guides/rawdevs/ntb.rst b/doc/guides/rawdevs/ntb.rst index 0a61ec03d..99e7db441 100644 --- a/doc/guides/rawdevs/ntb.rst +++ b/doc/guides/rawdevs/ntb.rst @@ -45,8 +45,45 @@ to use, i.e. igb_uio, vfio. The ``dpdk-devbind.py`` script can be used to show devices status and to bind them to a suitable kernel driver. They will appear under the category of "Misc (rawdev) devices". +Ring Layout +----------- + +Since read/write remote system's memory are through PCI bus, remote read +is much more expensive than remote write. Thus, the enqueue and dequeue +based on ntb ring should avoid remote read. 
The ring layout for ntb is +like the following: +- Ring Format: + desc_ring: + 0 16 64 + +---------------------------------------------------------------+ + | buffer address | + +---------------+-----------------------------------------------+ + | buffer length | resv | + +---------------+-----------------------------------------------+ + used_ring: + 0 16 32 + +---------------+---------------+ + | packet length | flags | + +---------------+---------------+ +- Ring Layout + +------------------------+ +------------------------+ + | used_ring | | desc_ring | + | +---+ | | +---+ | + | | | | | | | | + | +---+ +--------+ | | +---+ | + | | | ---> | buffer | <+---+-| | | + | +---+ +--------+ | | +---+ | + | | | | | | | | + | +---+ | | +---+ | + | ... | | ... | + | | | | + | +---------+ | | +---------+ | + | | tx_tail | | | | rx_tail | | + | System A +---------+ | | System B +---------+ | + +------------------------+ +------------------------+ + <---------traffic--------- + Limitation ---------- -- The FIFO hasn't been introduced and will come in 19.11 release. - This PMD only supports Intel Skylake platform. diff --git a/doc/guides/rel_notes/release_19_11.rst b/doc/guides/rel_notes/release_19_11.rst index 8490d897c..7ac3d5ca6 100644 --- a/doc/guides/rel_notes/release_19_11.rst +++ b/doc/guides/rel_notes/release_19_11.rst @@ -56,6 +56,10 @@ New Features Also, make sure to start the actual text at the margin. ========================================================= + * **Introduced FIFO for NTB PMD.** + + Introduced FIFO for NTB (Non-transparent Bridge) PMD to support + packet based processing. Removed Items ------------- diff --git a/drivers/raw/ntb/Makefile b/drivers/raw/ntb/Makefile index 6fe2aaf40..814cd05ca 100644 --- a/drivers/raw/ntb/Makefile +++ b/drivers/raw/ntb/Makefile @@ -25,4 +25,7 @@ LIBABIVER := 1 SRCS-$(CONFIG_RTE_LIBRTE_PMD_NTB_RAWDEV) += ntb.c SRCS-$(CONFIG_RTE_LIBRTE_PMD_NTB_RAWDEV) += ntb_hw_intel.c +# install this header file +SYMLINK-$(CONFIG_RTE_LIBRTE_PMD_NTB_RAWDEV)-include := rte_pmd_ntb.h + include $(RTE_SDK)/mk/rte.lib.mk diff --git a/drivers/raw/ntb/meson.build b/drivers/raw/ntb/meson.build index 7f39437f8..7a7d26126 100644 --- a/drivers/raw/ntb/meson.build +++ b/drivers/raw/ntb/meson.build @@ -5,4 +5,5 @@ deps += ['rawdev', 'mbuf', 'mempool', 'pci', 'bus_pci'] sources = files('ntb.c', 'ntb_hw_intel.c') +install_headers('rte_pmd_ntb.h') allow_experimental_apis = true diff --git a/drivers/raw/ntb/ntb.c b/drivers/raw/ntb/ntb.c index bfecce1e4..728deccdf 100644 --- a/drivers/raw/ntb/ntb.c +++ b/drivers/raw/ntb/ntb.c @@ -12,6 +12,7 @@ #include #include #include +#include #include #include #include @@ -19,6 +20,7 @@ #include #include "ntb_hw_intel.h" +#include "rte_pmd_ntb.h" #include "ntb.h" int ntb_logtype; @@ -28,48 +30,7 @@ static const struct rte_pci_id pci_id_ntb_map[] = { { .vendor_id = 0, /* sentinel */ }, }; -static int -ntb_set_mw(struct rte_rawdev *dev, int mw_idx, uint64_t mw_size) -{ - struct ntb_hw *hw = dev->dev_private; - char mw_name[RTE_MEMZONE_NAMESIZE]; - const struct rte_memzone *mz; - int ret = 0; - - if (hw->ntb_ops->mw_set_trans == NULL) { - NTB_LOG(ERR, "Not supported to set mw."); - return -ENOTSUP; - } - - snprintf(mw_name, sizeof(mw_name), "ntb_%d_mw_%d", - dev->dev_id, mw_idx); - - mz = rte_memzone_lookup(mw_name); - if (mz) - return 0; - - /** - * Hardware requires that mapped memory base address should be - * aligned with EMBARSZ and needs continuous memzone. 
- */ - mz = rte_memzone_reserve_aligned(mw_name, mw_size, dev->socket_id, - RTE_MEMZONE_IOVA_CONTIG, hw->mw_size[mw_idx]); - if (!mz) { - NTB_LOG(ERR, "Cannot allocate aligned memzone."); - return -EIO; - } - hw->mz[mw_idx] = mz; - - ret = (*hw->ntb_ops->mw_set_trans)(dev, mw_idx, mz->iova, mw_size); - if (ret) { - NTB_LOG(ERR, "Cannot set mw translation."); - return ret; - } - - return ret; -} - -static void +static inline void ntb_link_cleanup(struct rte_rawdev *dev) { struct ntb_hw *hw = dev->dev_private; @@ -89,20 +50,94 @@ ntb_link_cleanup(struct rte_rawdev *dev) } /* Clear mw so that peer cannot access local memory.*/ - for (i = 0; i < hw->mw_cnt; i++) { + for (i = 0; i < hw->used_mw_num; i++) { status = (*hw->ntb_ops->mw_set_trans)(dev, i, 0, 0); if (status) NTB_LOG(ERR, "Failed to clean mw."); } } +static inline int +ntb_handshake_work(const struct rte_rawdev *dev) +{ + struct ntb_hw *hw = dev->dev_private; + uint32_t val; + int ret, i; + + if (hw->ntb_ops->spad_write == NULL || + hw->ntb_ops->mw_set_trans == NULL) { + NTB_LOG(ERR, "Scratchpad/MW setting is not supported."); + return -ENOTSUP; + } + + /* Tell peer the mw info of local side. */ + ret = (*hw->ntb_ops->spad_write)(dev, SPAD_NUM_MWS, 1, hw->mw_cnt); + if (ret < 0) + return ret; + for (i = 0; i < hw->mw_cnt; i++) { + NTB_LOG(INFO, "Local %u mw size: 0x%"PRIx64"", i, + hw->mw_size[i]); + val = hw->mw_size[i] >> 32; + ret = (*hw->ntb_ops->spad_write)(dev, SPAD_MW0_SZ_H + 2 * i, + 1, val); + if (ret < 0) + return ret; + val = hw->mw_size[i]; + ret = (*hw->ntb_ops->spad_write)(dev, SPAD_MW0_SZ_L + 2 * i, + 1, val); + if (ret < 0) + return ret; + } + + /* Tell peer about the queue info and map memory to the peer. */ + ret = (*hw->ntb_ops->spad_write)(dev, SPAD_Q_SZ, 1, hw->queue_size); + if (ret < 0) + return ret; + ret = (*hw->ntb_ops->spad_write)(dev, SPAD_NUM_QPS, 1, + hw->queue_pairs); + if (ret < 0) + return ret; + ret = (*hw->ntb_ops->spad_write)(dev, SPAD_USED_MWS, 1, + hw->used_mw_num); + if (ret < 0) + return ret; + for (i = 0; i < hw->used_mw_num; i++) { + val = (uint64_t)(size_t)(hw->mz[i]->addr) >> 32; + ret = (*hw->ntb_ops->spad_write)(dev, SPAD_MW0_BA_H + 2 * i, + 1, val); + if (ret < 0) + return ret; + val = (uint64_t)(size_t)(hw->mz[i]->addr); + ret = (*hw->ntb_ops->spad_write)(dev, SPAD_MW0_BA_L + 2 * i, + 1, val); + if (ret < 0) + return ret; + } + + for (i = 0; i < hw->used_mw_num; i++) { + ret = (*hw->ntb_ops->mw_set_trans)(dev, i, hw->mz[i]->iova, + hw->mz[i]->len); + if (ret < 0) + return ret; + } + + /* Ring doorbell 0 to tell peer the device is ready. */ + ret = (*hw->ntb_ops->peer_db_set)(dev, 0); + if (ret < 0) + return ret; + + return 0; +} + static void ntb_dev_intr_handler(void *param) { struct rte_rawdev *dev = (struct rte_rawdev *)param; struct ntb_hw *hw = dev->dev_private; - uint32_t mw_size_h, mw_size_l; + uint32_t val_h, val_l; + uint64_t peer_mw_size; uint64_t db_bits = 0; + uint8_t peer_mw_cnt; int i = 0; if (hw->ntb_ops->db_read == NULL || @@ -118,7 +153,7 @@ ntb_dev_intr_handler(void *param) /* Doorbell 0 is for peer device ready. */ if (db_bits & 1) { - NTB_LOG(DEBUG, "DB0: Peer device is up."); + NTB_LOG(INFO, "DB0: Peer device is up."); /* Clear received doorbell. 
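 * The argument is a bit mask rather than an index: (1 << 0)
 * clears only DB0, which the peer rang from ntb_handshake_work()
 * after publishing its mw sizes, queue config and base addresses
 * in the scratchpad registers.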
*/ (*hw->ntb_ops->db_clear)(dev, 1); @@ -129,47 +164,44 @@ ntb_dev_intr_handler(void *param) if (hw->peer_dev_up) return; - if (hw->ntb_ops->spad_read == NULL || - hw->ntb_ops->spad_write == NULL) { - NTB_LOG(ERR, "Scratchpad is not supported."); + if (hw->ntb_ops->spad_read == NULL) { + NTB_LOG(ERR, "Scratchpad read is not supported."); + return; + } + + /* Check if mw setting on the peer is the same as local. */ + peer_mw_cnt = (*hw->ntb_ops->spad_read)(dev, SPAD_NUM_MWS, 0); + if (peer_mw_cnt != hw->mw_cnt) { + NTB_LOG(ERR, "Both mw cnt must be the same."); return; } - hw->peer_mw_cnt = (*hw->ntb_ops->spad_read) - (dev, SPAD_NUM_MWS, 0); - hw->peer_mw_size = rte_zmalloc("uint64_t", - hw->peer_mw_cnt * sizeof(uint64_t), 0); for (i = 0; i < hw->mw_cnt; i++) { - mw_size_h = (*hw->ntb_ops->spad_read) - (dev, SPAD_MW0_SZ_H + 2 * i, 0); - mw_size_l = (*hw->ntb_ops->spad_read) - (dev, SPAD_MW0_SZ_L + 2 * i, 0); - hw->peer_mw_size[i] = ((uint64_t)mw_size_h << 32) | - mw_size_l; + val_h = (*hw->ntb_ops->spad_read) + (dev, SPAD_MW0_SZ_H + 2 * i, 0); + val_l = (*hw->ntb_ops->spad_read) + (dev, SPAD_MW0_SZ_L + 2 * i, 0); + peer_mw_size = ((uint64_t)val_h << 32) | val_l; NTB_LOG(DEBUG, "Peer %u mw size: 0x%"PRIx64"", i, - hw->peer_mw_size[i]); + peer_mw_size); + if (peer_mw_size != hw->mw_size[i]) { + NTB_LOG(ERR, "Mw config must be the same."); + return; + } } hw->peer_dev_up = 1; /** - * Handshake with peer. Spad_write only works when both - * devices are up. So write spad again when db is received. - * And set db again for the later device who may miss + * Handshake with peer. Spad_write & mw_set_trans only works + * when both devices are up. So write spad again when db is + * received. And set db again for the later device who may miss * the 1st db. */ - for (i = 0; i < hw->mw_cnt; i++) { - (*hw->ntb_ops->spad_write)(dev, SPAD_NUM_MWS, - 1, hw->mw_cnt); - mw_size_h = hw->mw_size[i] >> 32; - (*hw->ntb_ops->spad_write)(dev, SPAD_MW0_SZ_H + 2 * i, - 1, mw_size_h); - - mw_size_l = hw->mw_size[i]; - (*hw->ntb_ops->spad_write)(dev, SPAD_MW0_SZ_L + 2 * i, - 1, mw_size_l); + if (ntb_handshake_work(dev) < 0) { + NTB_LOG(ERR, "Handshake work failed."); + return; } - (*hw->ntb_ops->peer_db_set)(dev, 0); /* To get the link info. */ if (hw->ntb_ops->get_link_status == NULL) { @@ -183,7 +215,7 @@ ntb_dev_intr_handler(void *param) } if (db_bits & (1 << 1)) { - NTB_LOG(DEBUG, "DB1: Peer device is down."); + NTB_LOG(INFO, "DB1: Peer device is down."); /* Clear received doorbell. */ (*hw->ntb_ops->db_clear)(dev, 2); @@ -197,7 +229,7 @@ ntb_dev_intr_handler(void *param) } if (db_bits & (1 << 2)) { - NTB_LOG(DEBUG, "DB2: Peer device agrees dev to be down."); + NTB_LOG(INFO, "DB2: Peer device agrees dev to be down."); /* Clear received doorbell. 
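 * DB2 carries the peer's acknowledgement that this side may go
 * down, so peer_dev_up is cleared once the bit is handled.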
*/ (*hw->ntb_ops->db_clear)(dev, (1 << 2)); hw->peer_dev_up = 0; @@ -206,24 +238,228 @@ ntb_dev_intr_handler(void *param) } static void -ntb_queue_conf_get(struct rte_rawdev *dev __rte_unused, - uint16_t queue_id __rte_unused, - rte_rawdev_obj_t queue_conf __rte_unused) +ntb_queue_conf_get(struct rte_rawdev *dev, + uint16_t queue_id, + rte_rawdev_obj_t queue_conf) +{ + struct ntb_queue_conf *q_conf = queue_conf; + struct ntb_hw *hw = dev->dev_private; + + q_conf->tx_free_thresh = hw->tx_queues[queue_id]->tx_free_thresh; + q_conf->nb_desc = hw->rx_queues[queue_id]->nb_rx_desc; + q_conf->rx_mp = hw->rx_queues[queue_id]->mpool; +} + +static void +ntb_rxq_release_mbufs(struct ntb_rx_queue *q) +{ + int i; + + if (!q || !q->sw_ring) { + NTB_LOG(ERR, "Pointer to rxq or sw_ring is NULL"); + return; + } + + for (i = 0; i < q->nb_rx_desc; i++) { + if (q->sw_ring[i].mbuf) { + rte_pktmbuf_free_seg(q->sw_ring[i].mbuf); + q->sw_ring[i].mbuf = NULL; + } + } +} + +static void +ntb_rxq_release(struct ntb_rx_queue *rxq) +{ + if (!rxq) { + NTB_LOG(ERR, "Pointer to rxq is NULL"); + return; + } + + ntb_rxq_release_mbufs(rxq); + + rte_free(rxq->sw_ring); + rte_free(rxq); +} + +static int +ntb_rxq_setup(struct rte_rawdev *dev, + uint16_t qp_id, + rte_rawdev_obj_t queue_conf) +{ + struct ntb_queue_conf *rxq_conf = queue_conf; + struct ntb_hw *hw = dev->dev_private; + struct ntb_rx_queue *rxq; + + /* Allocate the rx queue data structure */ + rxq = rte_zmalloc_socket("ntb rx queue", + sizeof(struct ntb_rx_queue), + RTE_CACHE_LINE_SIZE, + dev->socket_id); + if (!rxq) { + NTB_LOG(ERR, "Failed to allocate memory for " + "rx queue data structure."); + return -ENOMEM; + } + + if (rxq_conf->rx_mp == NULL) { + NTB_LOG(ERR, "Invalid null mempool pointer."); + return -EINVAL; + } + rxq->nb_rx_desc = rxq_conf->nb_desc; + rxq->mpool = rxq_conf->rx_mp; + rxq->port_id = dev->dev_id; + rxq->queue_id = qp_id; + rxq->hw = hw; + + /* Allocate the software ring. */ + rxq->sw_ring = + rte_zmalloc_socket("ntb rx sw ring", + sizeof(struct ntb_rx_entry) * + rxq->nb_rx_desc, + RTE_CACHE_LINE_SIZE, + dev->socket_id); + if (!rxq->sw_ring) { + ntb_rxq_release(rxq); + NTB_LOG(ERR, "Failed to allocate memory for SW ring"); + return -ENOMEM; + } + + hw->rx_queues[qp_id] = rxq; + + return 0; +} + +static void +ntb_txq_release_mbufs(struct ntb_tx_queue *q) +{ + int i; + + if (!q || !q->sw_ring) { + NTB_LOG(ERR, "Pointer to txq or sw_ring is NULL"); + return; + } + + for (i = 0; i < q->nb_tx_desc; i++) { + if (q->sw_ring[i].mbuf) { + rte_pktmbuf_free_seg(q->sw_ring[i].mbuf); + q->sw_ring[i].mbuf = NULL; + } + } +} + +static void +ntb_txq_release(struct ntb_tx_queue *txq) { + if (!txq) { + NTB_LOG(ERR, "Pointer to txq is NULL"); + return; + } + + ntb_txq_release_mbufs(txq); + + rte_free(txq->sw_ring); + rte_free(txq); } static int -ntb_queue_setup(struct rte_rawdev *dev __rte_unused, - uint16_t queue_id __rte_unused, - rte_rawdev_obj_t queue_conf __rte_unused) +ntb_txq_setup(struct rte_rawdev *dev, + uint16_t qp_id, + rte_rawdev_obj_t queue_conf) { + struct ntb_queue_conf *txq_conf = queue_conf; + struct ntb_hw *hw = dev->dev_private; + struct ntb_tx_queue *txq; + uint16_t i, prev; + + /* Allocate the TX queue data structure. 
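 * The same ntb_queue_conf is handed to both halves of a queue
 * pair by ntb_queue_setup(); only nb_desc and tx_free_thresh
 * are consumed on the tx side.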
*/ + txq = rte_zmalloc_socket("ntb tx queue", + sizeof(struct ntb_tx_queue), + RTE_CACHE_LINE_SIZE, + dev->socket_id); + if (!txq) { + NTB_LOG(ERR, "Failed to allocate memory for " + "tx queue structure"); + return -ENOMEM; + } + + txq->nb_tx_desc = txq_conf->nb_desc; + txq->port_id = dev->dev_id; + txq->queue_id = qp_id; + txq->hw = hw; + + /* Allocate software ring */ + txq->sw_ring = + rte_zmalloc_socket("ntb tx sw ring", + sizeof(struct ntb_tx_entry) * + txq->nb_tx_desc, + RTE_CACHE_LINE_SIZE, + dev->socket_id); + if (!txq->sw_ring) { + ntb_txq_release(txq); + NTB_LOG(ERR, "Failed to allocate memory for SW TX ring"); + return -ENOMEM; + } + + prev = txq->nb_tx_desc - 1; + for (i = 0; i < txq->nb_tx_desc; i++) { + txq->sw_ring[i].mbuf = NULL; + txq->sw_ring[i].last_id = i; + txq->sw_ring[prev].next_id = i; + prev = i; + } + + txq->tx_free_thresh = txq_conf->tx_free_thresh ? + txq_conf->tx_free_thresh : + NTB_DFLT_TX_FREE_THRESH; + if (txq->tx_free_thresh >= txq->nb_tx_desc - 3) { + NTB_LOG(ERR, "tx_free_thresh must be less than nb_desc - 3. " + "(tx_free_thresh=%u qp_id=%u)", txq->tx_free_thresh, + qp_id); + return -EINVAL; + } + + hw->tx_queues[qp_id] = txq; + return 0; } + +static int +ntb_queue_setup(struct rte_rawdev *dev, + uint16_t queue_id, + rte_rawdev_obj_t queue_conf) +{ + struct ntb_hw *hw = dev->dev_private; + int ret; + + if (queue_id > hw->queue_pairs) + return -EINVAL; + + ret = ntb_txq_setup(dev, queue_id, queue_conf); + if (ret < 0) + return ret; + + ret = ntb_rxq_setup(dev, queue_id, queue_conf); + + return ret; +} + static int -ntb_queue_release(struct rte_rawdev *dev __rte_unused, - uint16_t queue_id __rte_unused) +ntb_queue_release(struct rte_rawdev *dev, uint16_t queue_id) { + struct ntb_hw *hw = dev->dev_private; + struct ntb_tx_queue *txq; + struct ntb_rx_queue *rxq; + + if (queue_id > hw->queue_pairs) + return -EINVAL; + + txq = hw->tx_queues[queue_id]; + rxq = hw->rx_queues[queue_id]; + ntb_txq_release(txq); + ntb_rxq_release(rxq); + return 0; } @@ -234,6 +470,77 @@ ntb_queue_count(struct rte_rawdev *dev) return hw->queue_pairs; } +static int +ntb_queue_init(struct rte_rawdev *dev, uint16_t qp_id) +{ + struct ntb_hw *hw = dev->dev_private; + struct ntb_rx_queue *rxq = hw->rx_queues[qp_id]; + struct ntb_tx_queue *txq = hw->tx_queues[qp_id]; + volatile struct ntb_header *local_hdr; + struct ntb_header *remote_hdr; + uint16_t q_size = hw->queue_size; + uint32_t hdr_offset; + void *bar_addr; + uint16_t i; + + if (hw->ntb_ops->get_peer_mw_addr == NULL) { + NTB_LOG(ERR, "Failed to get mapped peer addr."); + return -EINVAL; + } + + /* Put queue info into the start of shared memory. */ + hdr_offset = hw->hdr_size_per_queue * qp_id; + local_hdr = (volatile struct ntb_header *) + ((size_t)hw->mz[0]->addr + hdr_offset); + bar_addr = (*hw->ntb_ops->get_peer_mw_addr)(dev, 0); + if (bar_addr == NULL) + return -EINVAL; + remote_hdr = (struct ntb_header *) + ((size_t)bar_addr + hdr_offset); + + /* rxq init. 
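 * Note the cross mapping below: the descriptors this side posts
 * go into the peer's header (remote write), while the used ring
 * and counter it polls stay local, so neither side ever reads
 * across the PCI bus.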
*/ + rxq->rx_desc_ring = (struct ntb_desc *) + (&remote_hdr->desc_ring); + rxq->rx_used_ring = (volatile struct ntb_used *) + (&local_hdr->desc_ring[q_size]); + rxq->avail_cnt = &remote_hdr->avail_cnt; + rxq->used_cnt = &local_hdr->used_cnt; + + for (i = 0; i < rxq->nb_rx_desc - 1; i++) { + struct rte_mbuf *mbuf = rte_mbuf_raw_alloc(rxq->mpool); + if (unlikely(!mbuf)) { + NTB_LOG(ERR, "Failed to allocate mbuf for RX"); + return -ENOMEM; + } + mbuf->port = dev->dev_id; + + rxq->sw_ring[i].mbuf = mbuf; + + rxq->rx_desc_ring[i].addr = rte_pktmbuf_mtod(mbuf, size_t); + rxq->rx_desc_ring[i].len = mbuf->buf_len - RTE_PKTMBUF_HEADROOM; + } + rte_wmb(); + *rxq->avail_cnt = rxq->nb_rx_desc - 1; + rxq->last_avail = rxq->nb_rx_desc - 1; + rxq->last_used = 0; + + /* txq init */ + txq->tx_desc_ring = (volatile struct ntb_desc *) + (&local_hdr->desc_ring); + txq->tx_used_ring = (struct ntb_used *) + (&remote_hdr->desc_ring[q_size]); + txq->avail_cnt = &local_hdr->avail_cnt; + txq->used_cnt = &remote_hdr->used_cnt; + + rte_wmb(); + *txq->used_cnt = 0; + txq->last_used = 0; + txq->last_avail = 0; + txq->nb_tx_free = txq->nb_tx_desc - 1; + + return 0; +} + static int ntb_enqueue_bufs(struct rte_rawdev *dev, struct rte_rawdev_buf **buffers, @@ -278,58 +585,51 @@ static void ntb_dev_info_get(struct rte_rawdev *dev, rte_rawdev_obj_t dev_info) { struct ntb_hw *hw = dev->dev_private; - struct ntb_attr *ntb_attrs = dev_info; + struct ntb_dev_info *info = dev_info; - strncpy(ntb_attrs[NTB_TOPO_ID].name, NTB_TOPO_NAME, NTB_ATTR_NAME_LEN); - switch (hw->topo) { - case NTB_TOPO_B2B_DSD: - strncpy(ntb_attrs[NTB_TOPO_ID].value, "B2B DSD", - NTB_ATTR_VAL_LEN); - break; - case NTB_TOPO_B2B_USD: - strncpy(ntb_attrs[NTB_TOPO_ID].value, "B2B USD", - NTB_ATTR_VAL_LEN); - break; - default: - strncpy(ntb_attrs[NTB_TOPO_ID].value, "Unsupported", - NTB_ATTR_VAL_LEN); - } + info->mw_cnt = hw->mw_cnt; + info->mw_size = hw->mw_size; - strncpy(ntb_attrs[NTB_LINK_STATUS_ID].name, NTB_LINK_STATUS_NAME, - NTB_ATTR_NAME_LEN); - snprintf(ntb_attrs[NTB_LINK_STATUS_ID].value, NTB_ATTR_VAL_LEN, - "%d", hw->link_status); - - strncpy(ntb_attrs[NTB_SPEED_ID].name, NTB_SPEED_NAME, - NTB_ATTR_NAME_LEN); - snprintf(ntb_attrs[NTB_SPEED_ID].value, NTB_ATTR_VAL_LEN, - "%d", hw->link_speed); - - strncpy(ntb_attrs[NTB_WIDTH_ID].name, NTB_WIDTH_NAME, - NTB_ATTR_NAME_LEN); - snprintf(ntb_attrs[NTB_WIDTH_ID].value, NTB_ATTR_VAL_LEN, - "%d", hw->link_width); - - strncpy(ntb_attrs[NTB_MW_CNT_ID].name, NTB_MW_CNT_NAME, - NTB_ATTR_NAME_LEN); - snprintf(ntb_attrs[NTB_MW_CNT_ID].value, NTB_ATTR_VAL_LEN, - "%d", hw->mw_cnt); + /** + * Intel hardware requires that mapped memory base address should be + * aligned with EMBARSZ and needs continuous memzone. 
+ */ + info->mw_size_align = (uint8_t)(hw->pci_dev->id.vendor_id == + NTB_INTEL_VENDOR_ID); - strncpy(ntb_attrs[NTB_DB_CNT_ID].name, NTB_DB_CNT_NAME, - NTB_ATTR_NAME_LEN); - snprintf(ntb_attrs[NTB_DB_CNT_ID].value, NTB_ATTR_VAL_LEN, - "%d", hw->db_cnt); + if (!hw->queue_size || !hw->queue_pairs) { + NTB_LOG(ERR, "No queue size and queue num assigned."); + return; + } - strncpy(ntb_attrs[NTB_SPAD_CNT_ID].name, NTB_SPAD_CNT_NAME, - NTB_ATTR_NAME_LEN); - snprintf(ntb_attrs[NTB_SPAD_CNT_ID].value, NTB_ATTR_VAL_LEN, - "%d", hw->spad_cnt); + hw->hdr_size_per_queue = RTE_ALIGN(sizeof(struct ntb_header) + + hw->queue_size * sizeof(struct ntb_desc) + + hw->queue_size * sizeof(struct ntb_used), + RTE_CACHE_LINE_SIZE); + info->ntb_hdr_size = hw->hdr_size_per_queue * hw->queue_pairs; } static int -ntb_dev_configure(const struct rte_rawdev *dev __rte_unused, - rte_rawdev_obj_t config __rte_unused) +ntb_dev_configure(const struct rte_rawdev *dev, rte_rawdev_obj_t config) { + struct ntb_dev_config *conf = config; + struct ntb_hw *hw = dev->dev_private; + int ret; + + hw->queue_pairs = conf->num_queues; + hw->queue_size = conf->queue_size; + hw->used_mw_num = conf->mz_num; + hw->mz = conf->mz_list; + hw->rx_queues = rte_zmalloc("ntb_rx_queues", + sizeof(struct ntb_rx_queue *) * hw->queue_pairs, 0); + hw->tx_queues = rte_zmalloc("ntb_tx_queues", + sizeof(struct ntb_tx_queue *) * hw->queue_pairs, 0); + + /* Start handshake with the peer. */ + ret = ntb_handshake_work(dev); + if (ret < 0) + return ret; + return 0; } @@ -337,21 +637,52 @@ static int ntb_dev_start(struct rte_rawdev *dev) { struct ntb_hw *hw = dev->dev_private; - int ret, i; + uint32_t peer_base_l, peer_val; + uint64_t peer_base_h; + uint32_t i; + int ret; - /* TODO: init queues and start queues. */ + if (!hw->link_status || !hw->peer_dev_up) + return -EINVAL; - /* Map memory of bar_size to remote. */ - hw->mz = rte_zmalloc("struct rte_memzone *", - hw->mw_cnt * sizeof(struct rte_memzone *), 0); - for (i = 0; i < hw->mw_cnt; i++) { - ret = ntb_set_mw(dev, i, hw->mw_size[i]); + for (i = 0; i < hw->queue_pairs; i++) { + ret = ntb_queue_init(dev, i); if (ret) { - NTB_LOG(ERR, "Fail to set mw."); + NTB_LOG(ERR, "Failed to init queue."); return ret; } } + hw->peer_mw_base = rte_zmalloc("ntb_peer_mw_base", hw->mw_cnt * + sizeof(uint64_t), 0); + + if (hw->ntb_ops->spad_read == NULL) + return -ENOTSUP; + + peer_val = (*hw->ntb_ops->spad_read)(dev, SPAD_Q_SZ, 0); + if (peer_val != hw->queue_size) { + NTB_LOG(ERR, "Inconsistent queue size! (local: %u peer: %u)", + hw->queue_size, peer_val); + return -EINVAL; + } + + peer_val = (*hw->ntb_ops->spad_read)(dev, SPAD_NUM_QPS, 0); + if (peer_val != hw->queue_pairs) { + NTB_LOG(ERR, "Inconsistent number of queues! (local: %u peer:" + " %u)", hw->queue_pairs, peer_val); + return -EINVAL; + } + + hw->peer_used_mws = (*hw->ntb_ops->spad_read)(dev, SPAD_USED_MWS, 0); + + for (i = 0; i < hw->peer_used_mws; i++) { + peer_base_h = (*hw->ntb_ops->spad_read)(dev, + SPAD_MW0_BA_H + 2 * i, 0); + peer_base_l = (*hw->ntb_ops->spad_read)(dev, + SPAD_MW0_BA_L + 2 * i, 0); + hw->peer_mw_base[i] = (peer_base_h << 32) + peer_base_l; + } + dev->started = 1; return 0; @@ -361,10 +692,10 @@ static void ntb_dev_stop(struct rte_rawdev *dev) { struct ntb_hw *hw = dev->dev_private; + struct ntb_rx_queue *rxq; + struct ntb_tx_queue *txq; uint32_t time_out; - int status; - - /* TODO: stop rx/tx queues. 
*/ + int status, i; if (!hw->peer_dev_up) goto clean; @@ -405,6 +736,13 @@ ntb_dev_stop(struct rte_rawdev *dev) if (status) NTB_LOG(ERR, "Failed to clear doorbells."); + for (i = 0; i < hw->queue_pairs; i++) { + rxq = hw->rx_queues[i]; + txq = hw->tx_queues[i]; + ntb_rxq_release_mbufs(rxq); + ntb_txq_release_mbufs(txq); + } + dev->started = 0; } @@ -413,12 +751,15 @@ ntb_dev_close(struct rte_rawdev *dev) { struct ntb_hw *hw = dev->dev_private; struct rte_intr_handle *intr_handle; - int ret = 0; + int i; if (dev->started) ntb_dev_stop(dev); - /* TODO: free queues. */ + /* free queues */ + for (i = 0; i < hw->queue_pairs; i++) + ntb_queue_release(dev, i); + hw->queue_pairs = 0; intr_handle = &hw->pci_dev->intr_handle; /* Clean datapath event and vec mapping */ @@ -434,7 +775,7 @@ ntb_dev_close(struct rte_rawdev *dev) rte_intr_callback_unregister(intr_handle, ntb_dev_intr_handler, dev); - return ret; + return 0; } static int @@ -445,7 +786,7 @@ ntb_dev_reset(struct rte_rawdev *rawdev __rte_unused) static int ntb_attr_set(struct rte_rawdev *dev, const char *attr_name, - uint64_t attr_value) + uint64_t attr_value) { struct ntb_hw *hw; int index; @@ -463,7 +804,21 @@ ntb_attr_set(struct rte_rawdev *dev, const char *attr_name, index = atoi(&attr_name[NTB_SPAD_USER_LEN]); (*hw->ntb_ops->spad_write)(dev, hw->spad_user_list[index], 1, attr_value); - NTB_LOG(INFO, "Set attribute (%s) Value (%" PRIu64 ")", + NTB_LOG(DEBUG, "Set attribute (%s) Value (%" PRIu64 ")", + attr_name, attr_value); + return 0; + } + + if (!strncmp(attr_name, NTB_QUEUE_SZ_NAME, NTB_ATTR_NAME_LEN)) { + hw->queue_size = attr_value; + NTB_LOG(DEBUG, "Set attribute (%s) Value (%" PRIu64 ")", + attr_name, attr_value); + return 0; + } + + if (!strncmp(attr_name, NTB_QUEUE_NUM_NAME, NTB_ATTR_NAME_LEN)) { + hw->queue_pairs = attr_value; + NTB_LOG(DEBUG, "Set attribute (%s) Value (%" PRIu64 ")", attr_name, attr_value); return 0; } @@ -475,7 +830,7 @@ ntb_attr_set(struct rte_rawdev *dev, const char *attr_name, static int ntb_attr_get(struct rte_rawdev *dev, const char *attr_name, - uint64_t *attr_value) + uint64_t *attr_value) { struct ntb_hw *hw; int index; @@ -489,49 +844,50 @@ ntb_attr_get(struct rte_rawdev *dev, const char *attr_name, if (!strncmp(attr_name, NTB_TOPO_NAME, NTB_ATTR_NAME_LEN)) { *attr_value = hw->topo; - NTB_LOG(INFO, "Attribute (%s) Value (%" PRIu64 ")", + NTB_LOG(DEBUG, "Attribute (%s) Value (%" PRIu64 ")", attr_name, *attr_value); return 0; } if (!strncmp(attr_name, NTB_LINK_STATUS_NAME, NTB_ATTR_NAME_LEN)) { - *attr_value = hw->link_status; - NTB_LOG(INFO, "Attribute (%s) Value (%" PRIu64 ")", + /* hw->link_status only indicates hw link status. 
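 * The attribute reported to the app is the logical link: both
 * the hardware link and the software handshake (peer_dev_up)
 * must be up before traffic can flow.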
*/ + *attr_value = hw->link_status && hw->peer_dev_up; + NTB_LOG(DEBUG, "Attribute (%s) Value (%" PRIu64 ")", attr_name, *attr_value); return 0; } if (!strncmp(attr_name, NTB_SPEED_NAME, NTB_ATTR_NAME_LEN)) { *attr_value = hw->link_speed; - NTB_LOG(INFO, "Attribute (%s) Value (%" PRIu64 ")", + NTB_LOG(DEBUG, "Attribute (%s) Value (%" PRIu64 ")", attr_name, *attr_value); return 0; } if (!strncmp(attr_name, NTB_WIDTH_NAME, NTB_ATTR_NAME_LEN)) { *attr_value = hw->link_width; - NTB_LOG(INFO, "Attribute (%s) Value (%" PRIu64 ")", + NTB_LOG(DEBUG, "Attribute (%s) Value (%" PRIu64 ")", attr_name, *attr_value); return 0; } if (!strncmp(attr_name, NTB_MW_CNT_NAME, NTB_ATTR_NAME_LEN)) { *attr_value = hw->mw_cnt; - NTB_LOG(INFO, "Attribute (%s) Value (%" PRIu64 ")", + NTB_LOG(DEBUG, "Attribute (%s) Value (%" PRIu64 ")", attr_name, *attr_value); return 0; } if (!strncmp(attr_name, NTB_DB_CNT_NAME, NTB_ATTR_NAME_LEN)) { *attr_value = hw->db_cnt; - NTB_LOG(INFO, "Attribute (%s) Value (%" PRIu64 ")", + NTB_LOG(DEBUG, "Attribute (%s) Value (%" PRIu64 ")", attr_name, *attr_value); return 0; } if (!strncmp(attr_name, NTB_SPAD_CNT_NAME, NTB_ATTR_NAME_LEN)) { *attr_value = hw->spad_cnt; - NTB_LOG(INFO, "Attribute (%s) Value (%" PRIu64 ")", + NTB_LOG(DEBUG, "Attribute (%s) Value (%" PRIu64 ")", attr_name, *attr_value); return 0; } @@ -542,7 +898,7 @@ ntb_attr_get(struct rte_rawdev *dev, const char *attr_name, index = atoi(&attr_name[NTB_SPAD_USER_LEN]); *attr_value = (*hw->ntb_ops->spad_read)(dev, hw->spad_user_list[index], 0); - NTB_LOG(INFO, "Attribute (%s) Value (%" PRIu64 ")", + NTB_LOG(DEBUG, "Attribute (%s) Value (%" PRIu64 ")", attr_name, *attr_value); return 0; } @@ -585,6 +941,7 @@ ntb_xstats_reset(struct rte_rawdev *dev __rte_unused, return 0; } + static const struct rte_rawdev_ops ntb_ops = { .dev_info_get = ntb_dev_info_get, .dev_configure = ntb_dev_configure, @@ -615,7 +972,6 @@ ntb_init_hw(struct rte_rawdev *dev, struct rte_pci_device *pci_dev) { struct ntb_hw *hw = dev->dev_private; struct rte_intr_handle *intr_handle; - uint32_t val; int ret, i; hw->pci_dev = pci_dev; @@ -688,45 +1044,6 @@ ntb_init_hw(struct rte_rawdev *dev, struct rte_pci_device *pci_dev) /* enable uio intr after callback register */ rte_intr_enable(intr_handle); - if (hw->ntb_ops->spad_write == NULL) { - NTB_LOG(ERR, "Scratchpad is not supported."); - return -ENOTSUP; - } - /* Tell peer the mw_cnt of local side. */ - ret = (*hw->ntb_ops->spad_write)(dev, SPAD_NUM_MWS, 1, hw->mw_cnt); - if (ret) { - NTB_LOG(ERR, "Failed to tell peer mw count."); - return ret; - } - - /* Tell peer each mw size on local side. */ - for (i = 0; i < hw->mw_cnt; i++) { - NTB_LOG(DEBUG, "Local %u mw size: 0x%"PRIx64"", i, - hw->mw_size[i]); - val = hw->mw_size[i] >> 32; - ret = (*hw->ntb_ops->spad_write) - (dev, SPAD_MW0_SZ_H + 2 * i, 1, val); - if (ret) { - NTB_LOG(ERR, "Failed to tell peer mw size."); - return ret; - } - - val = hw->mw_size[i]; - ret = (*hw->ntb_ops->spad_write) - (dev, SPAD_MW0_SZ_L + 2 * i, 1, val); - if (ret) { - NTB_LOG(ERR, "Failed to tell peer mw size."); - return ret; - } - } - - /* Ring doorbell 0 to tell peer the device is ready. 
*/ - ret = (*hw->ntb_ops->peer_db_set)(dev, 0); - if (ret) { - NTB_LOG(ERR, "Failed to tell peer device is probed."); - return ret; - } - return ret; } @@ -839,5 +1156,5 @@ RTE_INIT(ntb_init_log) { ntb_logtype = rte_log_register("pmd.raw.ntb"); if (ntb_logtype >= 0) - rte_log_set_level(ntb_logtype, RTE_LOG_DEBUG); + rte_log_set_level(ntb_logtype, RTE_LOG_INFO); } diff --git a/drivers/raw/ntb/ntb.h b/drivers/raw/ntb/ntb.h index d355231b0..0ad20aed3 100644 --- a/drivers/raw/ntb/ntb.h +++ b/drivers/raw/ntb/ntb.h @@ -2,8 +2,8 @@ * Copyright(c) 2019 Intel Corporation. */ -#ifndef _NTB_RAWDEV_H_ -#define _NTB_RAWDEV_H_ +#ifndef _NTB_H_ +#define _NTB_H_ #include @@ -19,38 +19,13 @@ extern int ntb_logtype; /* Device IDs */ #define NTB_INTEL_DEV_ID_B2B_SKX 0x201C -#define NTB_TOPO_NAME "topo" -#define NTB_LINK_STATUS_NAME "link_status" -#define NTB_SPEED_NAME "speed" -#define NTB_WIDTH_NAME "width" -#define NTB_MW_CNT_NAME "mw_count" -#define NTB_DB_CNT_NAME "db_count" -#define NTB_SPAD_CNT_NAME "spad_count" /* Reserved to app to use. */ #define NTB_SPAD_USER "spad_user_" #define NTB_SPAD_USER_LEN (sizeof(NTB_SPAD_USER) - 1) -#define NTB_SPAD_USER_MAX_NUM 10 +#define NTB_SPAD_USER_MAX_NUM 4 #define NTB_ATTR_NAME_LEN 30 -#define NTB_ATTR_VAL_LEN 30 -#define NTB_ATTR_MAX 20 - -/* NTB Attributes */ -struct ntb_attr { - /**< Name of the attribute */ - char name[NTB_ATTR_NAME_LEN]; - /**< Value or reference of value of attribute */ - char value[NTB_ATTR_NAME_LEN]; -}; -enum ntb_attr_idx { - NTB_TOPO_ID = 0, - NTB_LINK_STATUS_ID, - NTB_SPEED_ID, - NTB_WIDTH_ID, - NTB_MW_CNT_ID, - NTB_DB_CNT_ID, - NTB_SPAD_CNT_ID, -}; +#define NTB_DFLT_TX_FREE_THRESH 256 enum ntb_topo { NTB_TOPO_NONE = 0, @@ -87,10 +62,15 @@ enum ntb_spad_idx { SPAD_NUM_MWS = 1, SPAD_NUM_QPS, SPAD_Q_SZ, + SPAD_USED_MWS, SPAD_MW0_SZ_H, SPAD_MW0_SZ_L, SPAD_MW1_SZ_H, SPAD_MW1_SZ_L, + SPAD_MW0_BA_H, + SPAD_MW0_BA_L, + SPAD_MW1_BA_H, + SPAD_MW1_BA_L, }; /** @@ -110,26 +90,97 @@ enum ntb_spad_idx { * @vector_bind: Bind vector source [intr] to msix vector [msix]. 
*/ struct ntb_dev_ops { - int (*ntb_dev_init)(struct rte_rawdev *dev); - void *(*get_peer_mw_addr)(struct rte_rawdev *dev, int mw_idx); - int (*mw_set_trans)(struct rte_rawdev *dev, int mw_idx, + int (*ntb_dev_init)(const struct rte_rawdev *dev); + void *(*get_peer_mw_addr)(const struct rte_rawdev *dev, int mw_idx); + int (*mw_set_trans)(const struct rte_rawdev *dev, int mw_idx, uint64_t addr, uint64_t size); - int (*get_link_status)(struct rte_rawdev *dev); - int (*set_link)(struct rte_rawdev *dev, bool up); - uint32_t (*spad_read)(struct rte_rawdev *dev, int spad, bool peer); - int (*spad_write)(struct rte_rawdev *dev, int spad, + int (*get_link_status)(const struct rte_rawdev *dev); + int (*set_link)(const struct rte_rawdev *dev, bool up); + uint32_t (*spad_read)(const struct rte_rawdev *dev, int spad, + bool peer); + int (*spad_write)(const struct rte_rawdev *dev, int spad, bool peer, uint32_t spad_v); - uint64_t (*db_read)(struct rte_rawdev *dev); - int (*db_clear)(struct rte_rawdev *dev, uint64_t db_bits); - int (*db_set_mask)(struct rte_rawdev *dev, uint64_t db_mask); - int (*peer_db_set)(struct rte_rawdev *dev, uint8_t db_bit); - int (*vector_bind)(struct rte_rawdev *dev, uint8_t intr, uint8_t msix); + uint64_t (*db_read)(const struct rte_rawdev *dev); + int (*db_clear)(const struct rte_rawdev *dev, uint64_t db_bits); + int (*db_set_mask)(const struct rte_rawdev *dev, uint64_t db_mask); + int (*peer_db_set)(const struct rte_rawdev *dev, uint8_t db_bit); + int (*vector_bind)(const struct rte_rawdev *dev, uint8_t intr, + uint8_t msix); +}; + +struct ntb_desc { + uint64_t addr; /* buffer addr */ + uint16_t len; /* buffer length */ + uint16_t rsv1; + uint32_t rsv2; +}; + +struct ntb_used { + uint16_t len; /* buffer length */ +#define NTB_FLAG_EOP 1 /* end of packet */ + uint16_t flags; /* flags */ +}; + +struct ntb_rx_entry { + struct rte_mbuf *mbuf; +}; + +struct ntb_rx_queue { + struct ntb_desc *rx_desc_ring; + volatile struct ntb_used *rx_used_ring; + uint16_t *avail_cnt; + volatile uint16_t *used_cnt; + uint16_t last_avail; + uint16_t last_used; + uint16_t nb_rx_desc; + + uint16_t rx_free_thresh; + + struct rte_mempool *mpool; /**< mempool for mbuf allocation */ + struct ntb_rx_entry *sw_ring; + + uint16_t queue_id; /**< DPDK queue index. */ + uint16_t port_id; /**< Device port identifier. */ + + struct ntb_hw *hw; +}; + +struct ntb_tx_entry { + struct rte_mbuf *mbuf; + uint16_t next_id; + uint16_t last_id; +}; + +struct ntb_tx_queue { + volatile struct ntb_desc *tx_desc_ring; + struct ntb_used *tx_used_ring; + volatile uint16_t *avail_cnt; + uint16_t *used_cnt; + uint16_t last_avail; /**< Next need to be free. */ + uint16_t last_used; /**< Next need to be sent. */ + uint16_t nb_tx_desc; + + /**< Total number of TX descriptors ready to be allocated. */ + uint16_t nb_tx_free; + uint16_t tx_free_thresh; + + struct ntb_tx_entry *sw_ring; + + uint16_t queue_id; /**< DPDK queue index. */ + uint16_t port_id; /**< Device port identifier. */ + + struct ntb_hw *hw; +}; + +struct ntb_header { + uint16_t avail_cnt __rte_cache_aligned; + uint16_t used_cnt __rte_cache_aligned; + struct ntb_desc desc_ring[] __rte_cache_aligned; }; /* ntb private data. 
*/ struct ntb_hw { uint8_t mw_cnt; - uint8_t peer_mw_cnt; uint8_t db_cnt; uint8_t spad_cnt; @@ -147,18 +198,26 @@ struct ntb_hw { struct rte_pci_device *pci_dev; char *hw_addr; - uint64_t *mw_size; - uint64_t *peer_mw_size; uint8_t peer_dev_up; + uint64_t *mw_size; + /* remote mem base addr */ + uint64_t *peer_mw_base; uint16_t queue_pairs; uint16_t queue_size; + uint32_t hdr_size_per_queue; + + struct ntb_rx_queue **rx_queues; + struct ntb_tx_queue **tx_queues; - /**< mem zone to populate RX ring. */ + /* memzone to populate RX ring. */ const struct rte_memzone **mz; + uint8_t used_mw_num; + + uint8_t peer_used_mws; /* Reserve several spad for app to use. */ int spad_user_list[NTB_SPAD_USER_MAX_NUM]; }; -#endif /* _NTB_RAWDEV_H_ */ +#endif /* _NTB_H_ */ diff --git a/drivers/raw/ntb/ntb_hw_intel.c b/drivers/raw/ntb/ntb_hw_intel.c index 21eaa8511..0e73f1609 100644 --- a/drivers/raw/ntb/ntb_hw_intel.c +++ b/drivers/raw/ntb/ntb_hw_intel.c @@ -26,7 +26,7 @@ static enum xeon_ntb_bar intel_ntb_bar[] = { }; static int -intel_ntb_dev_init(struct rte_rawdev *dev) +intel_ntb_dev_init(const struct rte_rawdev *dev) { struct ntb_hw *hw = dev->dev_private; uint8_t reg_val, bar; @@ -77,7 +77,7 @@ intel_ntb_dev_init(struct rte_rawdev *dev) hw->db_cnt = XEON_DB_COUNT; hw->spad_cnt = XEON_SPAD_COUNT; - hw->mw_size = rte_zmalloc("uint64_t", + hw->mw_size = rte_zmalloc("ntb_mw_size", hw->mw_cnt * sizeof(uint64_t), 0); for (i = 0; i < hw->mw_cnt; i++) { bar = intel_ntb_bar[i]; @@ -94,7 +94,7 @@ intel_ntb_dev_init(struct rte_rawdev *dev) } static void * -intel_ntb_get_peer_mw_addr(struct rte_rawdev *dev, int mw_idx) +intel_ntb_get_peer_mw_addr(const struct rte_rawdev *dev, int mw_idx) { struct ntb_hw *hw = dev->dev_private; uint8_t bar; @@ -116,7 +116,7 @@ intel_ntb_get_peer_mw_addr(struct rte_rawdev *dev, int mw_idx) } static int -intel_ntb_mw_set_trans(struct rte_rawdev *dev, int mw_idx, +intel_ntb_mw_set_trans(const struct rte_rawdev *dev, int mw_idx, uint64_t addr, uint64_t size) { struct ntb_hw *hw = dev->dev_private; @@ -163,7 +163,7 @@ intel_ntb_mw_set_trans(struct rte_rawdev *dev, int mw_idx, } static int -intel_ntb_get_link_status(struct rte_rawdev *dev) +intel_ntb_get_link_status(const struct rte_rawdev *dev) { struct ntb_hw *hw = dev->dev_private; uint16_t reg_val; @@ -195,7 +195,7 @@ intel_ntb_get_link_status(struct rte_rawdev *dev) } static int -intel_ntb_set_link(struct rte_rawdev *dev, bool up) +intel_ntb_set_link(const struct rte_rawdev *dev, bool up) { struct ntb_hw *hw = dev->dev_private; uint32_t ntb_ctrl, reg_off; @@ -221,7 +221,7 @@ intel_ntb_set_link(struct rte_rawdev *dev, bool up) } static uint32_t -intel_ntb_spad_read(struct rte_rawdev *dev, int spad, bool peer) +intel_ntb_spad_read(const struct rte_rawdev *dev, int spad, bool peer) { struct ntb_hw *hw = dev->dev_private; uint32_t spad_v, reg_off; @@ -241,7 +241,7 @@ intel_ntb_spad_read(struct rte_rawdev *dev, int spad, bool peer) } static int -intel_ntb_spad_write(struct rte_rawdev *dev, int spad, +intel_ntb_spad_write(const struct rte_rawdev *dev, int spad, bool peer, uint32_t spad_v) { struct ntb_hw *hw = dev->dev_private; @@ -263,7 +263,7 @@ intel_ntb_spad_write(struct rte_rawdev *dev, int spad, } static uint64_t -intel_ntb_db_read(struct rte_rawdev *dev) +intel_ntb_db_read(const struct rte_rawdev *dev) { struct ntb_hw *hw = dev->dev_private; uint64_t db_off, db_bits; @@ -278,7 +278,7 @@ intel_ntb_db_read(struct rte_rawdev *dev) } static int -intel_ntb_db_clear(struct rte_rawdev *dev, uint64_t db_bits) +intel_ntb_db_clear(const 
struct rte_rawdev *dev, uint64_t db_bits) { struct ntb_hw *hw = dev->dev_private; uint64_t db_off; @@ -293,7 +293,7 @@ intel_ntb_db_clear(struct rte_rawdev *dev, uint64_t db_bits) } static int -intel_ntb_db_set_mask(struct rte_rawdev *dev, uint64_t db_mask) +intel_ntb_db_set_mask(const struct rte_rawdev *dev, uint64_t db_mask) { struct ntb_hw *hw = dev->dev_private; uint64_t db_m_off; @@ -312,7 +312,7 @@ intel_ntb_db_set_mask(struct rte_rawdev *dev, uint64_t db_mask) } static int -intel_ntb_peer_db_set(struct rte_rawdev *dev, uint8_t db_idx) +intel_ntb_peer_db_set(const struct rte_rawdev *dev, uint8_t db_idx) { struct ntb_hw *hw = dev->dev_private; uint32_t db_off; @@ -332,7 +332,7 @@ intel_ntb_peer_db_set(struct rte_rawdev *dev, uint8_t db_idx) } static int -intel_ntb_vector_bind(struct rte_rawdev *dev, uint8_t intr, uint8_t msix) +intel_ntb_vector_bind(const struct rte_rawdev *dev, uint8_t intr, uint8_t msix) { struct ntb_hw *hw = dev->dev_private; uint8_t reg_off; diff --git a/drivers/raw/ntb/rte_pmd_ntb.h b/drivers/raw/ntb/rte_pmd_ntb.h new file mode 100644 index 000000000..6591ce793 --- /dev/null +++ b/drivers/raw/ntb/rte_pmd_ntb.h @@ -0,0 +1,43 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2019 Intel Corporation. + */ + +#ifndef _RTE_PMD_NTB_H_ +#define _RTE_PMD_NTB_H_ + +/* App needs to set/get these attrs */ +#define NTB_QUEUE_SZ_NAME "queue_size" +#define NTB_QUEUE_NUM_NAME "queue_num" +#define NTB_TOPO_NAME "topo" +#define NTB_LINK_STATUS_NAME "link_status" +#define NTB_SPEED_NAME "speed" +#define NTB_WIDTH_NAME "width" +#define NTB_MW_CNT_NAME "mw_count" +#define NTB_DB_CNT_NAME "db_count" +#define NTB_SPAD_CNT_NAME "spad_count" + +#define NTB_MAX_DESC_SIZE 1024 +#define NTB_MIN_DESC_SIZE 64 + +struct ntb_dev_info { + uint32_t ntb_hdr_size; + /**< memzone needs to be mw size align or not. 
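 * When set (Intel hardware), the memzones the app reserves
 * for the device must be IOVA-contiguous and aligned to the
 * memory window size.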
*/ + uint8_t mw_size_align; + uint8_t mw_cnt; + uint64_t *mw_size; +}; + +struct ntb_dev_config { + uint16_t num_queues; + uint16_t queue_size; + uint8_t mz_num; + const struct rte_memzone **mz_list; +}; + +struct ntb_queue_conf { + uint16_t nb_desc; + uint16_t tx_free_thresh; + struct rte_mempool *rx_mp; +}; + +#endif /* _RTE_PMD_NTB_H_ */ From patchwork Mon Sep 9 03:27:28 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Li, Xiaoyun" X-Patchwork-Id: 58966 Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 89F491EAF9; Mon, 9 Sep 2019 05:28:10 +0200 (CEST) Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by dpdk.org (Postfix) with ESMTP id 50BCE1EAE1 for ; Mon, 9 Sep 2019 05:28:06 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga006.jf.intel.com ([10.7.209.51]) by orsmga103.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 08 Sep 2019 20:28:06 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.64,483,1559545200"; d="scan'208";a="188896250" Received: from dpdk-xiaoyun3.sh.intel.com ([10.67.119.190]) by orsmga006.jf.intel.com with ESMTP; 08 Sep 2019 20:28:04 -0700 From: Xiaoyun Li To: jingjing.wu@intel.com, keith.wiles@intel.com, omkar.maslekar@intel.com, cunming.liang@intel.com Cc: dev@dpdk.org, Xiaoyun Li Date: Mon, 9 Sep 2019 11:27:28 +0800 Message-Id: <20190909032730.29718-3-xiaoyun.li@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20190909032730.29718-1-xiaoyun.li@intel.com> References: <20190906075402.114177-1-xiaoyun.li@intel.com> <20190909032730.29718-1-xiaoyun.li@intel.com> Subject: [dpdk-dev] [PATCH v4 2/4] raw/ntb: add xstats support X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Add xstats support for ntb rawdev. Support tx-packets, tx-bytes, tx-errors and rx-packets, rx-bytes, rx-missed. Signed-off-by: Xiaoyun Li --- drivers/raw/ntb/ntb.c | 133 ++++++++++++++++++++++++++++++++++++------ drivers/raw/ntb/ntb.h | 11 ++++ 2 files changed, 126 insertions(+), 18 deletions(-) diff --git a/drivers/raw/ntb/ntb.c b/drivers/raw/ntb/ntb.c index 728deccdf..3ddfa2afb 100644 --- a/drivers/raw/ntb/ntb.c +++ b/drivers/raw/ntb/ntb.c @@ -30,6 +30,17 @@ static const struct rte_pci_id pci_id_ntb_map[] = { { .vendor_id = 0, /* sentinel */ }, }; +/* Align with enum ntb_xstats_idx */ +static struct rte_rawdev_xstats_name ntb_xstats_names[] = { + {"Tx-packets"}, + {"Tx-bytes"}, + {"Tx-errors"}, + {"Rx-packets"}, + {"Rx-bytes"}, + {"Rx-missed"}, +}; +#define NTB_XSTATS_NUM RTE_DIM(ntb_xstats_names) + static inline void ntb_link_cleanup(struct rte_rawdev *dev) { @@ -538,6 +549,10 @@ ntb_queue_init(struct rte_rawdev *dev, uint16_t qp_id) txq->last_avail = 0; txq->nb_tx_free = txq->nb_tx_desc - 1; + /* Set per queue stats. 
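 * The xstats array keeps the device-wide totals in its first
 * NTB_XSTATS_NUM slots, followed by one group per queue pair,
 * so queue qp_id's counters start at NTB_XSTATS_NUM * (qp_id + 1).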
*/ + for (i = 0; i < NTB_XSTATS_NUM; i++) + hw->ntb_xstats[i + NTB_XSTATS_NUM * (qp_id + 1)] = 0; + return 0; } @@ -614,6 +629,7 @@ ntb_dev_configure(const struct rte_rawdev *dev, rte_rawdev_obj_t config) { struct ntb_dev_config *conf = config; struct ntb_hw *hw = dev->dev_private; + uint32_t xstats_num; int ret; hw->queue_pairs = conf->num_queues; @@ -624,6 +640,10 @@ ntb_dev_configure(const struct rte_rawdev *dev, rte_rawdev_obj_t config) sizeof(struct ntb_rx_queue *) * hw->queue_pairs, 0); hw->tx_queues = rte_zmalloc("ntb_tx_queues", sizeof(struct ntb_tx_queue *) * hw->queue_pairs, 0); + /* First total stats, then per queue stats. */ + xstats_num = (hw->queue_pairs + 1) * NTB_XSTATS_NUM; + hw->ntb_xstats = rte_zmalloc("ntb_xstats", xstats_num * + sizeof(uint64_t), 0); /* Start handshake with the peer. */ ret = ntb_handshake_work(dev); @@ -645,6 +665,10 @@ ntb_dev_start(struct rte_rawdev *dev) if (!hw->link_status || !hw->peer_dev_up) return -EINVAL; + /* Set total stats. */ + for (i = 0; i < NTB_XSTATS_NUM; i++) + hw->ntb_xstats[i] = 0; + for (i = 0; i < hw->queue_pairs; i++) { ret = ntb_queue_init(dev, i); if (ret) { @@ -909,38 +933,111 @@ ntb_attr_get(struct rte_rawdev *dev, const char *attr_name, } static int -ntb_xstats_get(const struct rte_rawdev *dev __rte_unused, - const unsigned int ids[] __rte_unused, - uint64_t values[] __rte_unused, - unsigned int n __rte_unused) +ntb_xstats_get(const struct rte_rawdev *dev, + const unsigned int ids[], + uint64_t values[], + unsigned int n) { - return 0; + struct ntb_hw *hw = dev->dev_private; + uint32_t i, j, off, xstats_num; + + /* Calculate total stats of all queues. */ + for (i = 0; i < NTB_XSTATS_NUM; i++) { + hw->ntb_xstats[i] = 0; + for (j = 0; j < hw->queue_pairs; j++) { + off = NTB_XSTATS_NUM * (j + 1) + i; + hw->ntb_xstats[i] += hw->ntb_xstats[off]; + } + } + + xstats_num = NTB_XSTATS_NUM * (hw->queue_pairs + 1); + for (i = 0; i < n && ids[i] < xstats_num; i++) + values[i] = hw->ntb_xstats[ids[i]]; + + return i; } static int -ntb_xstats_get_names(const struct rte_rawdev *dev __rte_unused, - struct rte_rawdev_xstats_name *xstats_names __rte_unused, - unsigned int size __rte_unused) +ntb_xstats_get_names(const struct rte_rawdev *dev, + struct rte_rawdev_xstats_name *xstats_names, + unsigned int size) { - return 0; + struct ntb_hw *hw = dev->dev_private; + uint32_t xstats_num, i, j, off; + + xstats_num = NTB_XSTATS_NUM * (hw->queue_pairs + 1); + if (xstats_names == NULL || size < xstats_num) + return xstats_num; + + /* Total stats names */ + memcpy(xstats_names, ntb_xstats_names, sizeof(ntb_xstats_names)); + + /* Queue stats names */ + for (i = 0; i < hw->queue_pairs; i++) { + for (j = 0; j < NTB_XSTATS_NUM; j++) { + off = j + (i + 1) * NTB_XSTATS_NUM; + snprintf(xstats_names[off].name, + sizeof(xstats_names[0].name), + "%s_q%u", ntb_xstats_names[j].name, i); + } + } + + return xstats_num; } static uint64_t -ntb_xstats_get_by_name(const struct rte_rawdev *dev __rte_unused, - const char *name __rte_unused, - unsigned int *id __rte_unused) +ntb_xstats_get_by_name(const struct rte_rawdev *dev, + const char *name, unsigned int *id) { - return 0; + struct rte_rawdev_xstats_name *xstats_names; + struct ntb_hw *hw = dev->dev_private; + uint32_t xstats_num, i, j, off; + + if (name == NULL) + return -EINVAL; + + xstats_num = NTB_XSTATS_NUM * (hw->queue_pairs + 1); + xstats_names = rte_zmalloc("ntb_stats_name", + sizeof(struct rte_rawdev_xstats_name) * + xstats_num, 0); + ntb_xstats_get_names(dev, xstats_names, xstats_num); + + /* Calculate 
total stats of all queues. */ + for (i = 0; i < NTB_XSTATS_NUM; i++) { + for (j = 0; j < hw->queue_pairs; j++) { + off = NTB_XSTATS_NUM * (j + 1) + i; + hw->ntb_xstats[i] += hw->ntb_xstats[off]; + } + } + + for (i = 0; i < xstats_num; i++) { + if (!strncmp(name, xstats_names[i].name, + RTE_RAW_DEV_XSTATS_NAME_SIZE)) { + *id = i; + rte_free(xstats_names); + return hw->ntb_xstats[i]; + } + } + + NTB_LOG(ERR, "Cannot find the xstats name."); + + return -EINVAL; } static int -ntb_xstats_reset(struct rte_rawdev *dev __rte_unused, - const uint32_t ids[] __rte_unused, - uint32_t nb_ids __rte_unused) +ntb_xstats_reset(struct rte_rawdev *dev, + const uint32_t ids[], + uint32_t nb_ids) { - return 0; -} + struct ntb_hw *hw = dev->dev_private; + uint32_t i, xstats_num; + xstats_num = NTB_XSTATS_NUM * (hw->queue_pairs + 1); + for (i = 0; i < nb_ids && ids[i] < xstats_num; i++) + hw->ntb_xstats[ids[i]] = 0; + + return i; +} static const struct rte_rawdev_ops ntb_ops = { .dev_info_get = ntb_dev_info_get, diff --git a/drivers/raw/ntb/ntb.h b/drivers/raw/ntb/ntb.h index 0ad20aed3..09e28050f 100644 --- a/drivers/raw/ntb/ntb.h +++ b/drivers/raw/ntb/ntb.h @@ -27,6 +27,15 @@ extern int ntb_logtype; #define NTB_DFLT_TX_FREE_THRESH 256 +enum ntb_xstats_idx { + NTB_TX_PKTS_ID = 0, + NTB_TX_BYTES_ID, + NTB_TX_ERRS_ID, + NTB_RX_PKTS_ID, + NTB_RX_BYTES_ID, + NTB_RX_MISS_ID, +}; + enum ntb_topo { NTB_TOPO_NONE = 0, NTB_TOPO_B2B_USD, @@ -216,6 +225,8 @@ struct ntb_hw { uint8_t peer_used_mws; + uint64_t *ntb_xstats; + /* Reserve several spad for app to use. */ int spad_user_list[NTB_SPAD_USER_MAX_NUM]; }; From patchwork Mon Sep 9 03:27:29 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Li, Xiaoyun" X-Patchwork-Id: 58967 Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 7E9281EB0A; Mon, 9 Sep 2019 05:28:12 +0200 (CEST) Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by dpdk.org (Postfix) with ESMTP id DAADE1EAE9 for ; Mon, 9 Sep 2019 05:28:08 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga006.jf.intel.com ([10.7.209.51]) by orsmga103.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 08 Sep 2019 20:28:08 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.64,483,1559545200"; d="scan'208";a="188896267" Received: from dpdk-xiaoyun3.sh.intel.com ([10.67.119.190]) by orsmga006.jf.intel.com with ESMTP; 08 Sep 2019 20:28:06 -0700 From: Xiaoyun Li To: jingjing.wu@intel.com, keith.wiles@intel.com, omkar.maslekar@intel.com, cunming.liang@intel.com Cc: dev@dpdk.org, Xiaoyun Li Date: Mon, 9 Sep 2019 11:27:29 +0800 Message-Id: <20190909032730.29718-4-xiaoyun.li@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20190909032730.29718-1-xiaoyun.li@intel.com> References: <20190906075402.114177-1-xiaoyun.li@intel.com> <20190909032730.29718-1-xiaoyun.li@intel.com> Subject: [dpdk-dev] [PATCH v4 3/4] raw/ntb: add enqueue and dequeue functions X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Introduce enqueue and dequeue functions to support packet based processing. And enable write-combining for ntb driver since it can improve the performance a lot. 
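For illustration only (not part of this patch), an application drives
these functions through the generic rawdev burst API: the queue index
travels in the opaque context argument and each packet is an mbuf
wrapped in a struct rte_rawdev_buf. A minimal receive-side sketch,
assuming the device is already configured and started; the helper name
and burst size are hypothetical:

    #include <rte_common.h>
    #include <rte_mbuf.h>
    #include <rte_rawdev.h>

    #define BURST 32

    /* Drain up to nb_max packets from NTB queue 0 of rawdev dev_id. */
    static int
    ntb_recv_burst(uint16_t dev_id, struct rte_mbuf **pkts, int nb_max)
    {
            struct rte_rawdev_buf bufs[BURST], *pbufs[BURST];
            int nb_rx, i;

            nb_max = RTE_MIN(nb_max, BURST);
            for (i = 0; i < nb_max; i++)
                    pbufs[i] = &bufs[i];

            /* The PMD fills one mbuf pointer per packet into buf_addr. */
            nb_rx = rte_rawdev_dequeue_buffers(dev_id, pbufs, nb_max,
                            (rte_rawdev_obj_t)(size_t)0);
            for (i = 0; i < nb_rx; i++)
                    pkts[i] = bufs[i].buf_addr;

            return nb_rx;
    }
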
Signed-off-by: Xiaoyun Li Acked-by: Jingjing Wu --- doc/guides/rawdevs/ntb.rst | 28 ++++ drivers/raw/ntb/ntb.c | 242 ++++++++++++++++++++++++++++++--- drivers/raw/ntb/ntb.h | 2 + drivers/raw/ntb/ntb_hw_intel.c | 22 +++ 4 files changed, 275 insertions(+), 19 deletions(-) diff --git a/doc/guides/rawdevs/ntb.rst b/doc/guides/rawdevs/ntb.rst index 99e7db441..afd5769fc 100644 --- a/doc/guides/rawdevs/ntb.rst +++ b/doc/guides/rawdevs/ntb.rst @@ -45,6 +45,24 @@ to use, i.e. igb_uio, vfio. The ``dpdk-devbind.py`` script can be used to show devices status and to bind them to a suitable kernel driver. They will appear under the category of "Misc (rawdev) devices". +Prerequisites +------------- +NTB PMD needs kernel PCI driver to support write combining (WC) to get +better performance. The difference will be more than 10 times. +To enable WC, there are 2 ways. +- Insert igb_uio with ``wc_active=1`` flag if use igb_uio driver. + insmod igb_uio.ko wc_active=1 +- Enable WC for NTB device's Bar 2 and Bar 4 (Mapped memory) manually. + Get bar base address using ``lspci -vvv -s ae:00.0 | grep Region``. + Region 0: Memory at 39bfe0000000 (64-bit, prefetchable) [size=64K] + Region 2: Memory at 39bfa0000000 (64-bit, prefetchable) [size=512M] + Region 4: Memory at 39bfc0000000 (64-bit, prefetchable) [size=512M] + Using the following command to enable WC. + echo "base=0x39bfa0000000 size=0x400000 type=write-combining" >> /proc/mtrr + echo "base=0x39bfc0000000 size=0x400000 type=write-combining" >> /proc/mtrr + To disable WC for these regions, using the following. + echo "disable=1" >> /proc/mtrr + Ring Layout ----------- @@ -83,6 +101,16 @@ like the following: +------------------------+ +------------------------+ <---------traffic--------- +- Enqueue and Dequeue + Based on this ring layout, enqueue reads rx_tail to get how many free + buffers and writes used_ring and tx_tail to tell the peer which buffers + are filled with data. + And dequeue reads tx_tail to get how many packets are arrived, and + writes desc_ring and rx_tail to tell the peer about the new allocated + buffers. + So in this way, only remote write happens and remote read can be avoid + to get better performance. + Limitation ---------- diff --git a/drivers/raw/ntb/ntb.c b/drivers/raw/ntb/ntb.c index 3ddfa2afb..a34f3f9ee 100644 --- a/drivers/raw/ntb/ntb.c +++ b/drivers/raw/ntb/ntb.c @@ -556,26 +556,140 @@ ntb_queue_init(struct rte_rawdev *dev, uint16_t qp_id) return 0; } +static inline void +ntb_enqueue_cleanup(struct ntb_tx_queue *txq) +{ + struct ntb_tx_entry *sw_ring = txq->sw_ring; + uint16_t tx_free = txq->last_avail; + uint16_t nb_to_clean, i; + + /* avail_cnt + 1 represents where to rx next in the peer. */ + nb_to_clean = (*txq->avail_cnt - txq->last_avail + 1 + + txq->nb_tx_desc) & (txq->nb_tx_desc - 1); + nb_to_clean = RTE_MIN(nb_to_clean, txq->tx_free_thresh); + for (i = 0; i < nb_to_clean; i++) { + if (sw_ring[tx_free].mbuf) + rte_pktmbuf_free_seg(sw_ring[tx_free].mbuf); + tx_free = (tx_free + 1) & (txq->nb_tx_desc - 1); + } + + txq->nb_tx_free += nb_to_clean; + txq->last_avail = tx_free; +} + static int ntb_enqueue_bufs(struct rte_rawdev *dev, struct rte_rawdev_buf **buffers, unsigned int count, rte_rawdev_obj_t context) { - /* Not FIFO right now. Just for testing memory write. 
*/ struct ntb_hw *hw = dev->dev_private; - unsigned int i; - void *bar_addr; - size_t size; + struct ntb_tx_queue *txq = hw->tx_queues[(size_t)context]; + struct ntb_tx_entry *sw_ring = txq->sw_ring; + struct rte_mbuf *txm; + struct ntb_used tx_used[NTB_MAX_DESC_SIZE]; + volatile struct ntb_desc *tx_item; + uint16_t tx_last, nb_segs, off, last_used, avail_cnt; + uint16_t nb_mbufs = 0; + uint16_t nb_tx = 0; + uint64_t bytes = 0; + void *buf_addr; + int i; - if (hw->ntb_ops->get_peer_mw_addr == NULL) - return -ENOTSUP; - bar_addr = (*hw->ntb_ops->get_peer_mw_addr)(dev, 0); - size = (size_t)context; + if (unlikely(hw->ntb_ops->ioremap == NULL)) { + NTB_LOG(ERR, "Ioremap not supported."); + return nb_tx; + } - for (i = 0; i < count; i++) - rte_memcpy(bar_addr, buffers[i]->buf_addr, size); - return 0; + if (unlikely(dev->started == 0 || hw->peer_dev_up == 0)) { + NTB_LOG(DEBUG, "Link is not up."); + return nb_tx; + } + + if (txq->nb_tx_free < txq->tx_free_thresh) + ntb_enqueue_cleanup(txq); + + off = NTB_XSTATS_NUM * ((size_t)context + 1); + last_used = txq->last_used; + avail_cnt = *txq->avail_cnt;/* Where to alloc next. */ + for (nb_tx = 0; nb_tx < count; nb_tx++) { + txm = (struct rte_mbuf *)(buffers[nb_tx]->buf_addr); + if (txm == NULL || txq->nb_tx_free < txm->nb_segs) + break; + + tx_last = (txq->last_used + txm->nb_segs - 1) & + (txq->nb_tx_desc - 1); + nb_segs = txm->nb_segs; + for (i = 0; i < nb_segs; i++) { + /* Not enough ring space for tx. */ + if (txq->last_used == avail_cnt) + goto end_of_tx; + sw_ring[txq->last_used].mbuf = txm; + tx_item = txq->tx_desc_ring + txq->last_used; + + if (!tx_item->len) { + (hw->ntb_xstats[NTB_TX_ERRS_ID + off])++; + goto end_of_tx; + } + if (txm->data_len > tx_item->len) { + NTB_LOG(ERR, "Data length exceeds buf length." + " Only %u data would be transmitted.", + tx_item->len); + txm->data_len = tx_item->len; + } + + /* translate remote virtual addr to bar virtual addr */ + buf_addr = (*hw->ntb_ops->ioremap)(dev, tx_item->addr); + if (buf_addr == NULL) { + (hw->ntb_xstats[NTB_TX_ERRS_ID + off])++; + NTB_LOG(ERR, "Null remap addr."); + goto end_of_tx; + } + rte_memcpy(buf_addr, rte_pktmbuf_mtod(txm, void *), + txm->data_len); + + tx_used[nb_mbufs].len = txm->data_len; + tx_used[nb_mbufs++].flags = (txq->last_used == + tx_last) ? + NTB_FLAG_EOP : 0; + + /* update stats */ + bytes += txm->data_len; + + txm = txm->next; + + sw_ring[txq->last_used].next_id = (txq->last_used + 1) & + (txq->nb_tx_desc - 1); + sw_ring[txq->last_used].last_id = tx_last; + txq->last_used = (txq->last_used + 1) & + (txq->nb_tx_desc - 1); + } + txq->nb_tx_free -= nb_segs; + } + +end_of_tx: + if (nb_tx) { + uint16_t nb1, nb2; + if (nb_mbufs > txq->nb_tx_desc - last_used) { + nb1 = txq->nb_tx_desc - last_used; + nb2 = nb_mbufs - txq->nb_tx_desc + last_used; + } else { + nb1 = nb_mbufs; + nb2 = 0; + } + rte_memcpy(txq->tx_used_ring + last_used, tx_used, + sizeof(struct ntb_used) * nb1); + rte_memcpy(txq->tx_used_ring, tx_used + nb1, + sizeof(struct ntb_used) * nb2); + *txq->used_cnt = txq->last_used; + rte_wmb(); + + /* update queue stats */ + hw->ntb_xstats[NTB_TX_BYTES_ID + off] += bytes; + hw->ntb_xstats[NTB_TX_PKTS_ID + off] += nb_tx; + } + + return nb_tx; } static int @@ -584,16 +698,106 @@ ntb_dequeue_bufs(struct rte_rawdev *dev, unsigned int count, rte_rawdev_obj_t context) { - /* Not FIFO. Just for testing memory read. 
*/ struct ntb_hw *hw = dev->dev_private; - unsigned int i; - size_t size; + struct ntb_rx_queue *rxq = hw->rx_queues[(size_t)context]; + struct ntb_rx_entry *sw_ring = rxq->sw_ring; + struct ntb_desc rx_desc[NTB_MAX_DESC_SIZE]; + struct rte_mbuf *first, *rxm_t; + struct rte_mbuf *prev = NULL; + volatile struct ntb_used *rx_item; + uint16_t nb_mbufs = 0; + uint16_t nb_rx = 0; + uint64_t bytes = 0; + uint16_t off, last_avail, used_cnt, used_nb; + int i; - size = (size_t)context; + if (unlikely(dev->started == 0 || hw->peer_dev_up == 0)) { + NTB_LOG(DEBUG, "Link is not up"); + return nb_rx; + } + + used_cnt = *rxq->used_cnt; + + if (rxq->last_used == used_cnt) + return nb_rx; + + last_avail = rxq->last_avail; + used_nb = (used_cnt - rxq->last_used) & (rxq->nb_rx_desc - 1); + count = RTE_MIN(count, used_nb); + for (nb_rx = 0; nb_rx < count; nb_rx++) { + i = 0; + while (true) { + rx_item = rxq->rx_used_ring + rxq->last_used; + rxm_t = sw_ring[rxq->last_used].mbuf; + rxm_t->data_len = rx_item->len; + rxm_t->data_off = RTE_PKTMBUF_HEADROOM; + rxm_t->port = rxq->port_id; + + if (!i) { + rxm_t->nb_segs = 1; + first = rxm_t; + first->pkt_len = 0; + buffers[nb_rx]->buf_addr = rxm_t; + } else { + prev->next = rxm_t; + first->nb_segs++; + } - for (i = 0; i < count; i++) - rte_memcpy(buffers[i]->buf_addr, hw->mz[i]->addr, size); - return 0; + prev = rxm_t; + first->pkt_len += prev->data_len; + rxq->last_used = (rxq->last_used + 1) & + (rxq->nb_rx_desc - 1); + + /* alloc new mbuf */ + rxm_t = rte_mbuf_raw_alloc(rxq->mpool); + if (unlikely(rxm_t == NULL)) { + NTB_LOG(ERR, "recv alloc mbuf failed."); + goto end_of_rx; + } + rxm_t->port = rxq->port_id; + sw_ring[rxq->last_avail].mbuf = rxm_t; + i++; + + /* fill new desc */ + rx_desc[nb_mbufs].addr = + rte_pktmbuf_mtod(rxm_t, size_t); + rx_desc[nb_mbufs++].len = rxm_t->buf_len - + RTE_PKTMBUF_HEADROOM; + rxq->last_avail = (rxq->last_avail + 1) & + (rxq->nb_rx_desc - 1); + + if (rx_item->flags & NTB_FLAG_EOP) + break; + } + /* update stats */ + bytes += first->pkt_len; + } + +end_of_rx: + if (nb_rx) { + uint16_t nb1, nb2; + if (nb_mbufs > rxq->nb_rx_desc - last_avail) { + nb1 = rxq->nb_rx_desc - last_avail; + nb2 = nb_mbufs - rxq->nb_rx_desc + last_avail; + } else { + nb1 = nb_mbufs; + nb2 = 0; + } + rte_memcpy(rxq->rx_desc_ring + last_avail, rx_desc, + sizeof(struct ntb_desc) * nb1); + rte_memcpy(rxq->rx_desc_ring, rx_desc + nb1, + sizeof(struct ntb_desc) * nb2); + *rxq->avail_cnt = rxq->last_avail; + rte_wmb(); + + /* update queue stats */ + off = NTB_XSTATS_NUM * ((size_t)context + 1); + hw->ntb_xstats[NTB_RX_BYTES_ID + off] += bytes; + hw->ntb_xstats[NTB_RX_PKTS_ID + off] += nb_rx; + hw->ntb_xstats[NTB_RX_MISS_ID + off] += (count - nb_rx); + } + + return nb_rx; } static void @@ -1240,7 +1444,7 @@ ntb_remove(struct rte_pci_device *pci_dev) static struct rte_pci_driver rte_ntb_pmd = { .id_table = pci_id_ntb_map, - .drv_flags = RTE_PCI_DRV_NEED_MAPPING, + .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_WC_ACTIVATE, .probe = ntb_probe, .remove = ntb_remove, }; diff --git a/drivers/raw/ntb/ntb.h b/drivers/raw/ntb/ntb.h index 09e28050f..eff1f6f07 100644 --- a/drivers/raw/ntb/ntb.h +++ b/drivers/raw/ntb/ntb.h @@ -87,6 +87,7 @@ enum ntb_spad_idx { * @ntb_dev_init: Init ntb dev. * @get_peer_mw_addr: To get the addr of peer mw[mw_idx]. * @mw_set_trans: Set translation of internal memory that remote can access. + * @ioremap: Translate the remote host address to bar address. * @get_link_status: get link status, link speed and link width. 
* @set_link: Set local side up/down. * @spad_read: Read local/peer spad register val. @@ -103,6 +104,7 @@ struct ntb_dev_ops { void *(*get_peer_mw_addr)(const struct rte_rawdev *dev, int mw_idx); int (*mw_set_trans)(const struct rte_rawdev *dev, int mw_idx, uint64_t addr, uint64_t size); + void *(*ioremap)(const struct rte_rawdev *dev, uint64_t addr); int (*get_link_status)(const struct rte_rawdev *dev); int (*set_link)(const struct rte_rawdev *dev, bool up); uint32_t (*spad_read)(const struct rte_rawdev *dev, int spad, diff --git a/drivers/raw/ntb/ntb_hw_intel.c b/drivers/raw/ntb/ntb_hw_intel.c index 0e73f1609..e7f8667cd 100644 --- a/drivers/raw/ntb/ntb_hw_intel.c +++ b/drivers/raw/ntb/ntb_hw_intel.c @@ -162,6 +162,27 @@ intel_ntb_mw_set_trans(const struct rte_rawdev *dev, int mw_idx, return 0; } +static void * +intel_ntb_ioremap(const struct rte_rawdev *dev, uint64_t addr) +{ + struct ntb_hw *hw = dev->dev_private; + void *mapped = NULL; + void *base; + int i; + + for (i = 0; i < hw->peer_used_mws; i++) { + if (addr >= hw->peer_mw_base[i] && + addr <= hw->peer_mw_base[i] + hw->mw_size[i]) { + base = intel_ntb_get_peer_mw_addr(dev, i); + mapped = (void *)(size_t)(addr - hw->peer_mw_base[i] + + (size_t)base); + break; + } + } + + return mapped; +} + static int intel_ntb_get_link_status(const struct rte_rawdev *dev) { @@ -357,6 +378,7 @@ const struct ntb_dev_ops intel_ntb_ops = { .ntb_dev_init = intel_ntb_dev_init, .get_peer_mw_addr = intel_ntb_get_peer_mw_addr, .mw_set_trans = intel_ntb_mw_set_trans, + .ioremap = intel_ntb_ioremap, .get_link_status = intel_ntb_get_link_status, .set_link = intel_ntb_set_link, .spad_read = intel_ntb_spad_read, From patchwork Mon Sep 9 03:27:30 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Li, Xiaoyun" X-Patchwork-Id: 58968 Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 0494F1EB15; Mon, 9 Sep 2019 05:28:15 +0200 (CEST) Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by dpdk.org (Postfix) with ESMTP id 1CD8B1EAB6 for ; Mon, 9 Sep 2019 05:28:11 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga006.jf.intel.com ([10.7.209.51]) by orsmga103.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 08 Sep 2019 20:28:11 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.64,483,1559545200"; d="scan'208";a="188896280" Received: from dpdk-xiaoyun3.sh.intel.com ([10.67.119.190]) by orsmga006.jf.intel.com with ESMTP; 08 Sep 2019 20:28:09 -0700 From: Xiaoyun Li To: jingjing.wu@intel.com, keith.wiles@intel.com, omkar.maslekar@intel.com, cunming.liang@intel.com Cc: dev@dpdk.org, Xiaoyun Li Date: Mon, 9 Sep 2019 11:27:30 +0800 Message-Id: <20190909032730.29718-5-xiaoyun.li@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20190909032730.29718-1-xiaoyun.li@intel.com> References: <20190906075402.114177-1-xiaoyun.li@intel.com> <20190909032730.29718-1-xiaoyun.li@intel.com> Subject: [dpdk-dev] [PATCH v4 4/4] examples/ntb: support more functions for NTB X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Support to transmit files between two systems. Support iofwd between one ethdev and NTB device. 
Support rxonly and txonly for the NTB device. Support setting the
forwarding mode to file-trans, txonly, rxonly or iofwd. Support showing
and clearing port stats and throughput.

Signed-off-by: Xiaoyun Li
---
 doc/guides/sample_app_ug/ntb.rst |   59 +-
 examples/ntb/meson.build         |    3 +
 examples/ntb/ntb_fwd.c           | 1298 +++++++++++++++++++++++++++---
 3 files changed, 1232 insertions(+), 128 deletions(-)

diff --git a/doc/guides/sample_app_ug/ntb.rst b/doc/guides/sample_app_ug/ntb.rst
index 079242175..f8291d7d1 100644
--- a/doc/guides/sample_app_ug/ntb.rst
+++ b/doc/guides/sample_app_ug/ntb.rst
@@ -5,8 +5,17 @@ NTB Sample Application
 ======================
 
 The ntb sample application shows how to use the ntb rawdev driver.
-This sample provides interactive mode to transmit file between
-two hosts.
+This sample provides an interactive mode to do packet-based processing
+between two systems.
+
+This sample supports four packet forwarding modes.
+
+* ``file-trans``: transmit files between two systems. The sample keeps
+  polling to receive files from the peer and saves each file as
+  ``ntb_recv_file[N]``, where [N] is the index of the received file.
+* ``rxonly``: NTB receives packets but doesn't transmit them.
+* ``txonly``: NTB generates and transmits packets without receiving any.
+* ``iofwd``: forward packets between the NTB device and an ethdev.
 
 Compiling the Application
 -------------------------
@@ -29,6 +38,40 @@ Refer to the *DPDK Getting Started Guide* for general information on
 running applications and the Environment Abstraction Layer (EAL)
 options.
 
+Command-line Options
+--------------------
+
+The application supports the following command-line options.
+
+* ``--buf-size=N``
+
+  Set the data room size of the mbufs used to N bytes, where N < 65536.
+  The default value is 2048.
+
+* ``--fwd-mode=mode``
+
+  Set the packet forwarding mode to ``file-trans``, ``txonly``,
+  ``rxonly`` or ``iofwd``.
+
+* ``--nb-desc=N``
+
+  Set the number of descriptors per queue (the queue size) to N,
+  where 64 <= N <= 1024. The default value is 1024.
+
+* ``--txfreet=N``
+
+  Set the transmit free threshold of TX rings to N, where 0 <= N <=
+  the value of ``--nb-desc``. The default value is 256.
+
+* ``--burst=N``
+
+  Set the number of packets per burst to N, where 1 <= N <= 32.
+  The default value is 32.
+
+* ``--qp=N``
+
+  Set the number of queues to N, where N > 0. The default value is 1.
+
 Using the application
 ---------------------
 
@@ -41,7 +84,11 @@ The application is console-driven using the cmdline DPDK interface:
 
 From this interface the available commands and descriptions of what they
 do are as follows:
 
-* ``send [filepath]``: Send file to the peer host.
-* ``receive [filepath]``: Receive file to [filepath]. Need the peer
-  to send file successfully first.
-* ``quit``: Exit program
+* ``send [filepath]``: Send a file to the peer host. Only takes effect
+  in file-trans forwarding mode.
+* ``start``: Start transmission.
+* ``stop``: Stop transmission.
+* ``show/clear port stats``: Show/Clear port stats and throughput.
+* ``set fwd file-trans/rxonly/txonly/iofwd``: Set the packet forwarding
+  mode.
+* ``quit``: Exit program.
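
As an illustration, a typical iofwd session on one host could look like
this (the binary path, core mask and prompt string below are only
examples, not taken from the patch):

    ./build/ntb_fwd -l 0-3 -n 4 -- -i --fwd-mode=iofwd --burst=32
    ntb> start
    ntb> show port stats
    ntb> stop
    ntb> quit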
diff --git a/examples/ntb/meson.build b/examples/ntb/meson.build index 9a6288f4f..f5435fe12 100644 --- a/examples/ntb/meson.build +++ b/examples/ntb/meson.build @@ -14,3 +14,6 @@ cflags += ['-D_FILE_OFFSET_BITS=64'] sources = files( 'ntb_fwd.c' ) +if dpdk_conf.has('RTE_LIBRTE_PMD_NTB_RAWDEV') + deps += 'rawdev_ntb' +endif diff --git a/examples/ntb/ntb_fwd.c b/examples/ntb/ntb_fwd.c index f8c970cdb..b1ea71c8f 100644 --- a/examples/ntb/ntb_fwd.c +++ b/examples/ntb/ntb_fwd.c @@ -14,21 +14,103 @@ #include #include #include +#include +#include #include +#include +#include -#define NTB_DRV_NAME_LEN 7 -static uint64_t max_file_size = 0x400000; +/* Per-port statistics struct */ +struct ntb_port_statistics { + uint64_t tx; + uint64_t rx; +} __rte_cache_aligned; +/* Port 0: NTB dev, Port 1: ethdev when iofwd. */ +struct ntb_port_statistics ntb_port_stats[2]; + +struct ntb_fwd_stream { + uint16_t tx_port; + uint16_t rx_port; + uint16_t qp_id; + uint8_t tx_ntb; /* If ntb device is tx port. */ +}; + +struct ntb_fwd_lcore_conf { + uint16_t stream_id; + uint16_t nb_stream; + uint8_t stopped; +}; + +enum ntb_fwd_mode { + FILE_TRANS = 0, + RXONLY, + TXONLY, + IOFWD, + MAX_FWD_MODE, +}; +static const char *const fwd_mode_s[] = { + "file-trans", + "rxonly", + "txonly", + "iofwd", + NULL, +}; +static enum ntb_fwd_mode fwd_mode = MAX_FWD_MODE; + +static struct ntb_fwd_lcore_conf fwd_lcore_conf[RTE_MAX_LCORE]; +static struct ntb_fwd_stream *fwd_streams; + +static struct rte_mempool *mbuf_pool; + +#define NTB_DRV_NAME_LEN 7 +#define MEMPOOL_CACHE_SIZE 256 + +static uint8_t in_test; static uint8_t interactive = 1; +static uint16_t eth_port_id = RTE_MAX_ETHPORTS; static uint16_t dev_id; +/* Number of queues, default set as 1 */ +static uint16_t num_queues = 1; +static uint16_t ntb_buf_size = RTE_MBUF_DEFAULT_BUF_SIZE; + +/* Configurable number of descriptors */ +#define NTB_DEFAULT_NUM_DESCS 1024 +static uint16_t nb_desc = NTB_DEFAULT_NUM_DESCS; + +static uint16_t tx_free_thresh; + +#define NTB_MAX_PKT_BURST 32 +#define NTB_DFLT_PKT_BURST 32 +static uint16_t pkt_burst = NTB_DFLT_PKT_BURST; + +#define BURST_TX_RETRIES 64 + +static struct rte_eth_conf eth_port_conf = { + .rxmode = { + .mq_mode = ETH_MQ_RX_RSS, + .split_hdr_size = 0, + }, + .rx_adv_conf = { + .rss_conf = { + .rss_key = NULL, + .rss_hf = ETH_RSS_IP, + }, + }, + .txmode = { + .mq_mode = ETH_MQ_TX_NONE, + }, +}; + /* *** Help command with introduction. *** */ struct cmd_help_result { cmdline_fixed_string_t help; }; -static void cmd_help_parsed(__attribute__((unused)) void *parsed_result, - struct cmdline *cl, - __attribute__((unused)) void *data) +static void +cmd_help_parsed(__attribute__((unused)) void *parsed_result, + struct cmdline *cl, + __attribute__((unused)) void *data) { cmdline_printf( cl, @@ -37,13 +119,17 @@ static void cmd_help_parsed(__attribute__((unused)) void *parsed_result, "Control:\n" " quit :" " Quit the application.\n" - "\nFile transmit:\n" + "\nTransmission:\n" " send [path] :" - " Send [path] file. (No more than %"PRIu64")\n" - " recv [path] :" - " Receive file to [path]. Make sure sending is done" - " on the other side.\n", - max_file_size + " Send [path] file. 
Only take effect in file-trans mode\n" + " start :" + " Start transmissions.\n" + " stop :" + " Stop transmissions.\n" + " clear/show port stats :" + " Clear/show port stats.\n" + " set fwd file-trans/rxonly/txonly/iofwd :" + " Set packet forwarding mode.\n" ); } @@ -66,13 +152,37 @@ struct cmd_quit_result { cmdline_fixed_string_t quit; }; -static void cmd_quit_parsed(__attribute__((unused)) void *parsed_result, - struct cmdline *cl, - __attribute__((unused)) void *data) +static void +cmd_quit_parsed(__attribute__((unused)) void *parsed_result, + struct cmdline *cl, + __attribute__((unused)) void *data) { + struct ntb_fwd_lcore_conf *conf; + uint8_t lcore_id; + + /* Stop transmission first. */ + RTE_LCORE_FOREACH_SLAVE(lcore_id) { + conf = &fwd_lcore_conf[lcore_id]; + + if (!conf->nb_stream) + continue; + + if (conf->stopped) + continue; + + conf->stopped = 1; + } + printf("\nWaiting for lcores to finish...\n"); + rte_eal_mp_wait_lcore(); + in_test = 0; + /* Stop traffic and Close port. */ rte_rawdev_stop(dev_id); rte_rawdev_close(dev_id); + if (eth_port_id < RTE_MAX_ETHPORTS && fwd_mode == IOFWD) { + rte_eth_dev_stop(eth_port_id); + rte_eth_dev_close(eth_port_id); + } cmdline_quit(cl); } @@ -102,21 +212,19 @@ cmd_sendfile_parsed(void *parsed_result, __attribute__((unused)) void *data) { struct cmd_sendfile_result *res = parsed_result; - struct rte_rawdev_buf *pkts_send[1]; - uint64_t rsize, size, link; - uint8_t *buff; + struct rte_rawdev_buf *pkts_send[NTB_MAX_PKT_BURST]; + struct rte_mbuf *mbuf_send[NTB_MAX_PKT_BURST]; + uint64_t size, count, i, nb_burst; + uint16_t nb_tx, buf_size; + unsigned int nb_pkt; + size_t queue_id = 0; + uint16_t retry = 0; uint32_t val; FILE *file; - if (!rte_rawdevs[dev_id].started) { - printf("Device needs to be up first. Try later.\n"); - return; - } - - rte_rawdev_get_attr(dev_id, "link_status", &link); - if (!link) { - printf("Link is not up, cannot send file.\n"); - return; + if (num_queues != 1) { + printf("File transmission only supports 1 queue.\n"); + num_queues = 1; } file = fopen(res->filepath, "r"); @@ -127,30 +235,13 @@ cmd_sendfile_parsed(void *parsed_result, if (fseek(file, 0, SEEK_END) < 0) { printf("Fail to get file size.\n"); + fclose(file); return; } size = ftell(file); if (fseek(file, 0, SEEK_SET) < 0) { printf("Fail to get file size.\n"); - return; - } - - /** - * No FIFO now. Only test memory. Limit sending file - * size <= max_file_size. - */ - if (size > max_file_size) { - printf("Warning: The file is too large. 
Only send first" - " %"PRIu64" bits.\n", max_file_size); - size = max_file_size; - } - - buff = (uint8_t *)malloc(size); - rsize = fread(buff, size, 1, file); - if (rsize != 1) { - printf("Fail to read file.\n"); fclose(file); - free(buff); return; } @@ -159,22 +250,63 @@ cmd_sendfile_parsed(void *parsed_result, rte_rawdev_set_attr(dev_id, "spad_user_0", val); val = size; rte_rawdev_set_attr(dev_id, "spad_user_1", val); + printf("Sending file, size is %"PRIu64"\n", size); + + for (i = 0; i < NTB_MAX_PKT_BURST; i++) + pkts_send[i] = (struct rte_rawdev_buf *) + malloc(sizeof(struct rte_rawdev_buf)); + + buf_size = ntb_buf_size - RTE_PKTMBUF_HEADROOM; + count = (size + buf_size - 1) / buf_size; + nb_burst = (count + pkt_burst - 1) / pkt_burst; - pkts_send[0] = (struct rte_rawdev_buf *)malloc - (sizeof(struct rte_rawdev_buf)); - pkts_send[0]->buf_addr = buff; + for (i = 0; i < nb_burst; i++) { + val = RTE_MIN(count, pkt_burst); + if (rte_mempool_get_bulk(mbuf_pool, (void **)mbuf_send, + val) == 0) { + for (nb_pkt = 0; nb_pkt < val; nb_pkt++) { + mbuf_send[nb_pkt]->port = dev_id; + mbuf_send[nb_pkt]->data_len = + fread(rte_pktmbuf_mtod(mbuf_send[nb_pkt], + void *), 1, buf_size, file); + mbuf_send[nb_pkt]->pkt_len = + mbuf_send[nb_pkt]->data_len; + pkts_send[nb_pkt]->buf_addr = mbuf_send[nb_pkt]; + } + } else { + for (nb_pkt = 0; nb_pkt < val; nb_pkt++) { + mbuf_send[nb_pkt] = + rte_mbuf_raw_alloc(mbuf_pool); + if (mbuf_send[nb_pkt] == NULL) + break; + mbuf_send[nb_pkt]->port = dev_id; + mbuf_send[nb_pkt]->data_len = + fread(rte_pktmbuf_mtod(mbuf_send[nb_pkt], + void *), 1, buf_size, file); + mbuf_send[nb_pkt]->pkt_len = + mbuf_send[nb_pkt]->data_len; + pkts_send[nb_pkt]->buf_addr = mbuf_send[nb_pkt]; + } + } - if (rte_rawdev_enqueue_buffers(dev_id, pkts_send, 1, - (void *)(size_t)size)) { - printf("Fail to enqueue.\n"); - goto clean; + nb_tx = rte_rawdev_enqueue_buffers(dev_id, pkts_send, nb_pkt, + (void *)queue_id); + while (nb_tx != nb_pkt && retry < BURST_TX_RETRIES) { + rte_delay_us(1); + nb_tx += rte_rawdev_enqueue_buffers(dev_id, + &pkts_send[nb_tx], nb_pkt - nb_tx, + (void *)queue_id); + } + count -= nb_pkt; } + /* Clear register after file sending done. 
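
The two ceiling divisions above size the transfer: each mbuf carries
buf_size payload bytes, count is the number of mbufs the file needs, and
nb_burst is the number of enqueue bursts. As a worked example, assuming
the default 2048-byte data room, the default 128-byte
RTE_PKTMBUF_HEADROOM and a 10 MiB file:

    buf_size = 2048 - 128                   = 1920 bytes per mbuf
    count    = (10485760 + 1920 - 1) / 1920 = 5462 mbufs
    nb_burst = (5462 + 32 - 1) / 32         = 171 bursts of up to 32 mbufs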
*/ + rte_rawdev_set_attr(dev_id, "spad_user_0", 0); + rte_rawdev_set_attr(dev_id, "spad_user_1", 0); printf("Done sending file.\n"); -clean: + for (i = 0; i < NTB_MAX_PKT_BURST; i++) + free(pkts_send[i]); fclose(file); - free(buff); - free(pkts_send[0]); } cmdline_parse_token_string_t cmd_send_file_send = @@ -195,79 +327,680 @@ cmdline_parse_inst_t cmd_send_file = { }, }; -/* *** RECEIVE FILE PARAMETERS *** */ -struct cmd_recvfile_result { - cmdline_fixed_string_t recv_string; - char filepath[]; -}; +#define RECV_FILE_LEN 30 +static int +start_polling_recv_file(void *param) +{ + struct rte_rawdev_buf *pkts_recv[NTB_MAX_PKT_BURST]; + struct ntb_fwd_lcore_conf *conf = param; + struct rte_mbuf *mbuf; + char filepath[RECV_FILE_LEN]; + uint64_t val, size, file_len; + uint16_t nb_rx, i, file_no; + size_t queue_id = 0; + FILE *file; + + for (i = 0; i < NTB_MAX_PKT_BURST; i++) + pkts_recv[i] = (struct rte_rawdev_buf *) + malloc(sizeof(struct rte_rawdev_buf)); + + file_no = 0; + while (!conf->stopped) { + snprintf(filepath, RECV_FILE_LEN, "ntb_recv_file%d", file_no); + file = fopen(filepath, "w"); + if (file == NULL) { + printf("Fail to open the file.\n"); + return -EINVAL; + } + + rte_rawdev_get_attr(dev_id, "spad_user_0", &val); + size = val << 32; + rte_rawdev_get_attr(dev_id, "spad_user_1", &val); + size |= val; + + if (!size) { + fclose(file); + continue; + } + + file_len = 0; + nb_rx = NTB_MAX_PKT_BURST; + while (file_len < size && !conf->stopped) { + nb_rx = rte_rawdev_dequeue_buffers(dev_id, pkts_recv, + pkt_burst, (void *)queue_id); + ntb_port_stats[0].rx += nb_rx; + for (i = 0; i < nb_rx; i++) { + mbuf = pkts_recv[i]->buf_addr; + fwrite(rte_pktmbuf_mtod(mbuf, void *), 1, + mbuf->data_len, file); + file_len += mbuf->data_len; + rte_pktmbuf_free(mbuf); + pkts_recv[i]->buf_addr = NULL; + } + } + + printf("Received file (size: %" PRIu64 ") from peer to %s.\n", + size, filepath); + fclose(file); + file_no++; + } + + for (i = 0; i < NTB_MAX_PKT_BURST; i++) + free(pkts_recv[i]); + return 0; +} + +static int +start_iofwd_per_lcore(void *param) +{ + struct rte_rawdev_buf *ntb_buf[NTB_MAX_PKT_BURST]; + struct rte_mbuf *pkts_burst[NTB_MAX_PKT_BURST]; + struct ntb_fwd_lcore_conf *conf = param; + struct ntb_fwd_stream fs; + uint16_t nb_rx, nb_tx; + int i, j; + + for (i = 0; i < NTB_MAX_PKT_BURST; i++) + ntb_buf[i] = (struct rte_rawdev_buf *) + malloc(sizeof(struct rte_rawdev_buf)); + + while (!conf->stopped) { + for (i = 0; i < conf->nb_stream; i++) { + fs = fwd_streams[conf->stream_id + i]; + if (fs.tx_ntb) { + nb_rx = rte_eth_rx_burst(fs.rx_port, + fs.qp_id, pkts_burst, + pkt_burst); + if (unlikely(nb_rx == 0)) + continue; + for (j = 0; j < nb_rx; j++) + ntb_buf[j]->buf_addr = pkts_burst[j]; + nb_tx = + rte_rawdev_enqueue_buffers(fs.tx_port, + ntb_buf, nb_rx, + (void *)(size_t)fs.qp_id); + ntb_port_stats[0].tx += nb_tx; + ntb_port_stats[1].rx += nb_rx; + } else { + nb_rx = + rte_rawdev_dequeue_buffers(fs.rx_port, + ntb_buf, pkt_burst, + (void *)(size_t)fs.qp_id); + if (unlikely(nb_rx == 0)) + continue; + for (j = 0; j < nb_rx; j++) + pkts_burst[j] = ntb_buf[j]->buf_addr; + nb_tx = rte_eth_tx_burst(fs.tx_port, + fs.qp_id, pkts_burst, nb_rx); + ntb_port_stats[1].tx += nb_tx; + ntb_port_stats[0].rx += nb_rx; + } + if (unlikely(nb_tx < nb_rx)) { + do { + rte_pktmbuf_free(pkts_burst[nb_tx]); + } while (++nb_tx < nb_rx); + } + } + } + + for (i = 0; i < NTB_MAX_PKT_BURST; i++) + free(ntb_buf[i]); + + return 0; +} + +static int +start_rxonly_per_lcore(void *param) +{ + struct rte_rawdev_buf 
*ntb_buf[NTB_MAX_PKT_BURST]; + struct ntb_fwd_lcore_conf *conf = param; + struct ntb_fwd_stream fs; + uint16_t nb_rx; + int i, j; + + for (i = 0; i < NTB_MAX_PKT_BURST; i++) + ntb_buf[i] = (struct rte_rawdev_buf *) + malloc(sizeof(struct rte_rawdev_buf)); + + while (!conf->stopped) { + for (i = 0; i < conf->nb_stream; i++) { + fs = fwd_streams[conf->stream_id + i]; + nb_rx = rte_rawdev_dequeue_buffers(fs.rx_port, + ntb_buf, pkt_burst, (void *)(size_t)fs.qp_id); + if (unlikely(nb_rx == 0)) + continue; + ntb_port_stats[0].rx += nb_rx; + + for (j = 0; j < nb_rx; j++) + rte_pktmbuf_free(ntb_buf[j]->buf_addr); + } + } + + for (i = 0; i < NTB_MAX_PKT_BURST; i++) + free(ntb_buf[i]); + + return 0; +} + + +static int +start_txonly_per_lcore(void *param) +{ + struct rte_rawdev_buf *ntb_buf[NTB_MAX_PKT_BURST]; + struct rte_mbuf *pkts_burst[NTB_MAX_PKT_BURST]; + struct ntb_fwd_lcore_conf *conf = param; + struct ntb_fwd_stream fs; + uint16_t nb_pkt, nb_tx; + int i; + + for (i = 0; i < NTB_MAX_PKT_BURST; i++) + ntb_buf[i] = (struct rte_rawdev_buf *) + malloc(sizeof(struct rte_rawdev_buf)); + + while (!conf->stopped) { + for (i = 0; i < conf->nb_stream; i++) { + fs = fwd_streams[conf->stream_id + i]; + if (rte_mempool_get_bulk(mbuf_pool, (void **)pkts_burst, + pkt_burst) == 0) { + for (nb_pkt = 0; nb_pkt < pkt_burst; nb_pkt++) { + pkts_burst[nb_pkt]->port = dev_id; + pkts_burst[nb_pkt]->data_len = + pkts_burst[nb_pkt]->buf_len - + RTE_PKTMBUF_HEADROOM; + pkts_burst[nb_pkt]->pkt_len = + pkts_burst[nb_pkt]->data_len; + ntb_buf[nb_pkt]->buf_addr = + pkts_burst[nb_pkt]; + } + } else { + for (nb_pkt = 0; nb_pkt < pkt_burst; nb_pkt++) { + pkts_burst[nb_pkt] = + rte_pktmbuf_alloc(mbuf_pool); + if (pkts_burst[nb_pkt] == NULL) + break; + pkts_burst[nb_pkt]->port = dev_id; + pkts_burst[nb_pkt]->data_len = + pkts_burst[nb_pkt]->buf_len - + RTE_PKTMBUF_HEADROOM; + pkts_burst[nb_pkt]->pkt_len = + pkts_burst[nb_pkt]->data_len; + ntb_buf[nb_pkt]->buf_addr = + pkts_burst[nb_pkt]; + } + } + nb_tx = rte_rawdev_enqueue_buffers(fs.tx_port, + ntb_buf, nb_pkt, (void *)(size_t)fs.qp_id); + ntb_port_stats[0].tx += nb_tx; + if (unlikely(nb_tx < nb_pkt)) { + do { + rte_pktmbuf_free( + ntb_buf[nb_tx]->buf_addr); + } while (++nb_tx < nb_pkt); + } + } + } + + for (i = 0; i < NTB_MAX_PKT_BURST; i++) + free(ntb_buf[i]); + + return 0; +} + +static int +ntb_fwd_config_setup(void) +{ + uint16_t i; + + /* Make sure iofwd has valid ethdev. */ + if (fwd_mode == IOFWD && eth_port_id >= RTE_MAX_ETHPORTS) { + printf("No ethdev, cannot be in iofwd mode."); + return -EINVAL; + } + + if (fwd_mode == IOFWD) { + fwd_streams = rte_zmalloc("ntb_fwd: fwd_streams", + sizeof(struct ntb_fwd_stream) * num_queues * 2, + RTE_CACHE_LINE_SIZE); + for (i = 0; i < num_queues; i++) { + fwd_streams[i * 2].qp_id = i; + fwd_streams[i * 2].tx_port = dev_id; + fwd_streams[i * 2].rx_port = eth_port_id; + fwd_streams[i * 2].tx_ntb = 1; + + fwd_streams[i * 2 + 1].qp_id = i; + fwd_streams[i * 2 + 1].tx_port = eth_port_id; + fwd_streams[i * 2 + 1].rx_port = dev_id; + fwd_streams[i * 2 + 1].tx_ntb = 0; + } + return 0; + } + + if (fwd_mode == RXONLY || fwd_mode == FILE_TRANS) { + /* Only support 1 queue in file-trans for in order. 
*/ + if (fwd_mode == FILE_TRANS) + num_queues = 1; + + fwd_streams = rte_zmalloc("ntb_fwd: fwd_streams", + sizeof(struct ntb_fwd_stream) * num_queues, + RTE_CACHE_LINE_SIZE); + for (i = 0; i < num_queues; i++) { + fwd_streams[i].qp_id = i; + fwd_streams[i].tx_port = RTE_MAX_ETHPORTS; + fwd_streams[i].rx_port = dev_id; + fwd_streams[i].tx_ntb = 0; + } + return 0; + } + + if (fwd_mode == TXONLY) { + fwd_streams = rte_zmalloc("ntb_fwd: fwd_streams", + sizeof(struct ntb_fwd_stream) * num_queues, + RTE_CACHE_LINE_SIZE); + for (i = 0; i < num_queues; i++) { + fwd_streams[i].qp_id = i; + fwd_streams[i].tx_port = dev_id; + fwd_streams[i].rx_port = RTE_MAX_ETHPORTS; + fwd_streams[i].tx_ntb = 1; + } + } + return 0; +} static void -cmd_recvfile_parsed(void *parsed_result, - __attribute__((unused)) struct cmdline *cl, - __attribute__((unused)) void *data) +assign_stream_to_lcores(void) { - struct cmd_sendfile_result *res = parsed_result; - struct rte_rawdev_buf *pkts_recv[1]; - uint8_t *buff; - uint64_t val; - size_t size; - FILE *file; + struct ntb_fwd_lcore_conf *conf; + struct ntb_fwd_stream *fs; + uint16_t nb_streams, sm_per_lcore, sm_id, i; + uint8_t lcore_id, lcore_num, nb_extra; - if (!rte_rawdevs[dev_id].started) { - printf("Device needs to be up first. Try later.\n"); - return; + lcore_num = rte_lcore_count(); + /* Exclude master core */ + lcore_num--; + + nb_streams = (fwd_mode == IOFWD) ? num_queues * 2 : num_queues; + + sm_per_lcore = nb_streams / lcore_num; + nb_extra = nb_streams % lcore_num; + sm_id = 0; + i = 0; + + RTE_LCORE_FOREACH_SLAVE(lcore_id) { + conf = &fwd_lcore_conf[lcore_id]; + + if (i < nb_extra) { + conf->nb_stream = sm_per_lcore + 1; + conf->stream_id = sm_id; + sm_id = sm_id + sm_per_lcore + 1; + } else { + conf->nb_stream = sm_per_lcore; + conf->stream_id = sm_id; + sm_id = sm_id + sm_per_lcore; + } + + i++; + if (sm_id >= nb_streams) + break; + } + + /* Print packet forwading config. */ + RTE_LCORE_FOREACH_SLAVE(lcore_id) { + conf = &fwd_lcore_conf[lcore_id]; + + if (!conf->nb_stream) + continue; + + printf("Streams on Lcore %u :\n", lcore_id); + for (i = 0; i < conf->nb_stream; i++) { + fs = &fwd_streams[conf->stream_id + i]; + if (fwd_mode == IOFWD) + printf(" + Stream %u : %s%u RX -> %s%u TX," + " Q=%u\n", conf->stream_id + i, + fs->tx_ntb ? "Eth" : "NTB", fs->rx_port, + fs->tx_ntb ? "NTB" : "Eth", fs->tx_port, + fs->qp_id); + if (fwd_mode == FILE_TRANS || fwd_mode == RXONLY) + printf(" + Stream %u : %s%u RX only\n", + conf->stream_id, "NTB", fs->rx_port); + if (fwd_mode == TXONLY) + printf(" + Stream %u : %s%u TX only\n", + conf->stream_id, "NTB", fs->tx_port); + } } +} - rte_rawdev_get_attr(dev_id, "link_status", &val); - if (!val) { - printf("Link is not up, cannot receive file.\n"); +static void +start_pkt_fwd(void) +{ + struct ntb_fwd_lcore_conf *conf; + struct rte_eth_link eth_link; + uint8_t lcore_id; + int ret, i; + + ret = ntb_fwd_config_setup(); + if (ret < 0) { + printf("Cannot start traffic. Please reset fwd mode.\n"); return; } - file = fopen(res->filepath, "w"); - if (file == NULL) { - printf("Fail to open the file.\n"); + /* If using iofwd, checking ethdev link status first. */ + if (fwd_mode == IOFWD) { + printf("Checking eth link status...\n"); + /* Wait for eth link up at most 100 times. */ + for (i = 0; i < 100; i++) { + rte_eth_link_get(eth_port_id, ð_link); + if (eth_link.link_status) { + printf("Eth%u Link Up. Speed %u Mbps - %s\n", + eth_port_id, eth_link.link_speed, + (eth_link.link_duplex == + ETH_LINK_FULL_DUPLEX) ? 
+ ("full-duplex") : ("half-duplex")); + break; + } + } + if (!eth_link.link_status) { + printf("Eth%u link down. Cannot start traffic.\n", + eth_port_id); + return; + } + } + + assign_stream_to_lcores(); + in_test = 1; + + RTE_LCORE_FOREACH_SLAVE(lcore_id) { + conf = &fwd_lcore_conf[lcore_id]; + + if (!conf->nb_stream) + continue; + + conf->stopped = 0; + if (fwd_mode == FILE_TRANS) + rte_eal_remote_launch(start_polling_recv_file, + conf, lcore_id); + else if (fwd_mode == IOFWD) + rte_eal_remote_launch(start_iofwd_per_lcore, + conf, lcore_id); + else if (fwd_mode == RXONLY) + rte_eal_remote_launch(start_rxonly_per_lcore, + conf, lcore_id); + else if (fwd_mode == TXONLY) + rte_eal_remote_launch(start_txonly_per_lcore, + conf, lcore_id); + } +} + +/* *** START FWD PARAMETERS *** */ +struct cmd_start_result { + cmdline_fixed_string_t start; +}; + +static void +cmd_start_parsed(__attribute__((unused)) void *parsed_result, + __attribute__((unused)) struct cmdline *cl, + __attribute__((unused)) void *data) +{ + start_pkt_fwd(); +} + +cmdline_parse_token_string_t cmd_start_start = + TOKEN_STRING_INITIALIZER(struct cmd_start_result, start, "start"); + +cmdline_parse_inst_t cmd_start = { + .f = cmd_start_parsed, + .data = NULL, + .help_str = "start pkt fwd between ntb and ethdev", + .tokens = { + (void *)&cmd_start_start, + NULL, + }, +}; + +/* *** STOP *** */ +struct cmd_stop_result { + cmdline_fixed_string_t stop; +}; + +static void +cmd_stop_parsed(__attribute__((unused)) void *parsed_result, + __attribute__((unused)) struct cmdline *cl, + __attribute__((unused)) void *data) +{ + struct ntb_fwd_lcore_conf *conf; + uint8_t lcore_id; + + RTE_LCORE_FOREACH_SLAVE(lcore_id) { + conf = &fwd_lcore_conf[lcore_id]; + + if (!conf->nb_stream) + continue; + + if (conf->stopped) + continue; + + conf->stopped = 1; + } + printf("\nWaiting for lcores to finish...\n"); + rte_eal_mp_wait_lcore(); + in_test = 0; + printf("\nDone.\n"); +} + +cmdline_parse_token_string_t cmd_stop_stop = + TOKEN_STRING_INITIALIZER(struct cmd_stop_result, stop, "stop"); + +cmdline_parse_inst_t cmd_stop = { + .f = cmd_stop_parsed, + .data = NULL, + .help_str = "stop: Stop packet forwarding", + .tokens = { + (void *)&cmd_stop_stop, + NULL, + }, +}; + +static void +ntb_stats_clear(void) +{ + int nb_ids, i; + uint32_t *ids; + + /* Clear NTB dev stats */ + nb_ids = rte_rawdev_xstats_names_get(dev_id, NULL, 0); + if (nb_ids < 0) { + printf("Error: Cannot get count of xstats\n"); return; } + ids = malloc(sizeof(uint32_t) * nb_ids); + for (i = 0; i < nb_ids; i++) + ids[i] = i; + rte_rawdev_xstats_reset(dev_id, ids, nb_ids); + printf("\n statistics for NTB port %d cleared\n", dev_id); + + /* Clear Ethdev stats if have any */ + if (fwd_mode == IOFWD && eth_port_id != RTE_MAX_ETHPORTS) { + rte_eth_stats_reset(eth_port_id); + printf("\n statistics for ETH port %d cleared\n", eth_port_id); + } +} + +static inline void +ntb_calculate_throughput(uint16_t port) { + uint64_t diff_pkts_rx, diff_pkts_tx, diff_cycles; + uint64_t mpps_rx, mpps_tx; + static uint64_t prev_pkts_rx[2]; + static uint64_t prev_pkts_tx[2]; + static uint64_t prev_cycles[2]; + + diff_cycles = prev_cycles[port]; + prev_cycles[port] = rte_rdtsc(); + if (diff_cycles > 0) + diff_cycles = prev_cycles[port] - diff_cycles; + diff_pkts_rx = (ntb_port_stats[port].rx > prev_pkts_rx[port]) ? + (ntb_port_stats[port].rx - prev_pkts_rx[port]) : 0; + diff_pkts_tx = (ntb_port_stats[port].tx > prev_pkts_tx[port]) ? 
+ (ntb_port_stats[port].tx - prev_pkts_tx[port]) : 0; + prev_pkts_rx[port] = ntb_port_stats[port].rx; + prev_pkts_tx[port] = ntb_port_stats[port].tx; + mpps_rx = diff_cycles > 0 ? + diff_pkts_rx * rte_get_tsc_hz() / diff_cycles : 0; + mpps_tx = diff_cycles > 0 ? + diff_pkts_tx * rte_get_tsc_hz() / diff_cycles : 0; + printf(" Throughput (since last show)\n"); + printf(" Rx-pps: %12"PRIu64"\n Tx-pps: %12"PRIu64"\n", + mpps_rx, mpps_tx); + +} + +static void +ntb_stats_display(void) +{ + struct rte_rawdev_xstats_name *xstats_names; + struct rte_eth_stats stats; + uint64_t *values; + uint32_t *ids; + int nb_ids, i; - rte_rawdev_get_attr(dev_id, "spad_user_0", &val); - size = val << 32; - rte_rawdev_get_attr(dev_id, "spad_user_1", &val); - size |= val; + printf("###### statistics for NTB port %d #######\n", dev_id); - buff = (uint8_t *)malloc(size); - pkts_recv[0] = (struct rte_rawdev_buf *)malloc - (sizeof(struct rte_rawdev_buf)); - pkts_recv[0]->buf_addr = buff; + /* Get NTB dev stats and stats names */ + nb_ids = rte_rawdev_xstats_names_get(dev_id, NULL, 0); + if (nb_ids < 0) { + printf("Error: Cannot get count of xstats\n"); + return; + } + xstats_names = malloc(sizeof(struct rte_rawdev_xstats_name) * nb_ids); + if (xstats_names == NULL) { + printf("Cannot allocate memory for xstats lookup\n"); + return; + } + if (nb_ids != rte_rawdev_xstats_names_get( + dev_id, xstats_names, nb_ids)) { + printf("Error: Cannot get xstats lookup\n"); + free(xstats_names); + return; + } + ids = malloc(sizeof(uint32_t) * nb_ids); + for (i = 0; i < nb_ids; i++) + ids[i] = i; + values = malloc(sizeof(uint64_t) * nb_ids); + if (nb_ids != rte_rawdev_xstats_get(dev_id, ids, values, nb_ids)) { + printf("Error: Unable to get xstats\n"); + free(xstats_names); + free(values); + free(ids); + return; + } + + /* Display NTB dev stats */ + for (i = 0; i < nb_ids; i++) + printf(" %s: %"PRIu64"\n", xstats_names[i].name, values[i]); + ntb_calculate_throughput(0); - if (rte_rawdev_dequeue_buffers(dev_id, pkts_recv, 1, (void *)size)) { - printf("Fail to dequeue.\n"); - goto clean; + /* Get Ethdev stats if have any */ + if (fwd_mode == IOFWD && eth_port_id != RTE_MAX_ETHPORTS) { + printf("###### statistics for ETH port %d ######\n", + eth_port_id); + rte_eth_stats_get(eth_port_id, &stats); + printf(" RX-packets: %"PRIu64"\n", stats.ipackets); + printf(" RX-bytes: %"PRIu64"\n", stats.ibytes); + printf(" RX-errors: %"PRIu64"\n", stats.ierrors); + printf(" RX-missed: %"PRIu64"\n", stats.imissed); + printf(" TX-packets: %"PRIu64"\n", stats.opackets); + printf(" TX-bytes: %"PRIu64"\n", stats.obytes); + printf(" TX-errors: %"PRIu64"\n", stats.oerrors); + ntb_calculate_throughput(1); } - fwrite(buff, size, 1, file); - printf("Done receiving to file.\n"); + free(xstats_names); + free(values); + free(ids); +} -clean: - fclose(file); - free(buff); - free(pkts_recv[0]); +/* *** SHOW/CLEAR PORT STATS *** */ +struct cmd_stats_result { + cmdline_fixed_string_t show; + cmdline_fixed_string_t port; + cmdline_fixed_string_t stats; +}; + +static void +cmd_stats_parsed(void *parsed_result, + __attribute__((unused)) struct cmdline *cl, + __attribute__((unused)) void *data) +{ + struct cmd_stats_result *res = parsed_result; + if (!strcmp(res->show, "clear")) + ntb_stats_clear(); + else + ntb_stats_display(); } -cmdline_parse_token_string_t cmd_recv_file_recv = - TOKEN_STRING_INITIALIZER(struct cmd_recvfile_result, recv_string, - "recv"); -cmdline_parse_token_string_t cmd_recv_file_filepath = - TOKEN_STRING_INITIALIZER(struct cmd_recvfile_result, 
filepath, NULL); +cmdline_parse_token_string_t cmd_stats_show = + TOKEN_STRING_INITIALIZER(struct cmd_stats_result, show, "show#clear"); +cmdline_parse_token_string_t cmd_stats_port = + TOKEN_STRING_INITIALIZER(struct cmd_stats_result, port, "port"); +cmdline_parse_token_string_t cmd_stats_stats = + TOKEN_STRING_INITIALIZER(struct cmd_stats_result, stats, "stats"); -cmdline_parse_inst_t cmd_recv_file = { - .f = cmd_recvfile_parsed, +cmdline_parse_inst_t cmd_stats = { + .f = cmd_stats_parsed, .data = NULL, - .help_str = "recv ", + .help_str = "show|clear port stats", .tokens = { - (void *)&cmd_recv_file_recv, - (void *)&cmd_recv_file_filepath, + (void *)&cmd_stats_show, + (void *)&cmd_stats_port, + (void *)&cmd_stats_stats, + NULL, + }, +}; + +/* *** SET FORWARDING MODE *** */ +struct cmd_set_fwd_mode_result { + cmdline_fixed_string_t set; + cmdline_fixed_string_t fwd; + cmdline_fixed_string_t mode; +}; + +static void +cmd_set_fwd_mode_parsed(__attribute__((unused)) void *parsed_result, + __attribute__((unused)) struct cmdline *cl, + __attribute__((unused)) void *data) +{ + struct cmd_set_fwd_mode_result *res = parsed_result; + int i; + + if (in_test) { + printf("Please stop traffic first.\n"); + return; + } + + for (i = 0; i < MAX_FWD_MODE; i++) { + if (!strcmp(res->mode, fwd_mode_s[i])) { + fwd_mode = i; + return; + } + } + printf("Invalid %s packet forwarding mode.\n", res->mode); +} + +cmdline_parse_token_string_t cmd_setfwd_set = + TOKEN_STRING_INITIALIZER(struct cmd_set_fwd_mode_result, set, "set"); +cmdline_parse_token_string_t cmd_setfwd_fwd = + TOKEN_STRING_INITIALIZER(struct cmd_set_fwd_mode_result, fwd, "fwd"); +cmdline_parse_token_string_t cmd_setfwd_mode = + TOKEN_STRING_INITIALIZER(struct cmd_set_fwd_mode_result, mode, + "file-trans#iofwd#txonly#rxonly"); + +cmdline_parse_inst_t cmd_set_fwd_mode = { + .f = cmd_set_fwd_mode_parsed, + .data = NULL, + .help_str = "set forwarding mode as file-trans|rxonly|txonly|iofwd", + .tokens = { + (void *)&cmd_setfwd_set, + (void *)&cmd_setfwd_fwd, + (void *)&cmd_setfwd_mode, NULL, }, }; @@ -276,7 +1009,10 @@ cmdline_parse_inst_t cmd_recv_file = { cmdline_parse_ctx_t main_ctx[] = { (cmdline_parse_inst_t *)&cmd_help, (cmdline_parse_inst_t *)&cmd_send_file, - (cmdline_parse_inst_t *)&cmd_recv_file, + (cmdline_parse_inst_t *)&cmd_start, + (cmdline_parse_inst_t *)&cmd_stop, + (cmdline_parse_inst_t *)&cmd_stats, + (cmdline_parse_inst_t *)&cmd_set_fwd_mode, (cmdline_parse_inst_t *)&cmd_quit, NULL, }; @@ -305,45 +1041,257 @@ signal_handler(int signum) } } +#define OPT_BUF_SIZE "buf-size" +#define OPT_FWD_MODE "fwd-mode" +#define OPT_NB_DESC "nb-desc" +#define OPT_TXFREET "txfreet" +#define OPT_BURST "burst" +#define OPT_QP "qp" + +enum { + /* long options mapped to a short option */ + OPT_NO_ZERO_COPY_NUM = 1, + OPT_BUF_SIZE_NUM, + OPT_FWD_MODE_NUM, + OPT_NB_DESC_NUM, + OPT_TXFREET_NUM, + OPT_BURST_NUM, + OPT_QP_NUM, +}; + +static const char short_options[] = + "i" /* interactive mode */ + ; + +static const struct option lgopts[] = { + {OPT_BUF_SIZE, 1, NULL, OPT_BUF_SIZE_NUM }, + {OPT_FWD_MODE, 1, NULL, OPT_FWD_MODE_NUM }, + {OPT_NB_DESC, 1, NULL, OPT_NB_DESC_NUM }, + {OPT_TXFREET, 1, NULL, OPT_TXFREET_NUM }, + {OPT_BURST, 1, NULL, OPT_BURST_NUM }, + {OPT_QP, 1, NULL, OPT_QP_NUM }, + {0, 0, NULL, 0 } +}; + static void ntb_usage(const char *prgname) { printf("%s [EAL options] -- [options]\n" - "-i : run in interactive mode (default value is 1)\n", - prgname); + "-i: run in interactive mode.\n" + "-qp=N: set number of queues as N (N > 0, default: 
1).\n" + "--fwd-mode=N: set fwd mode (N: file-trans | rxonly | " + "txonly | iofwd, default: file-trans)\n" + "--buf-size=N: set mbuf dataroom size as N (0 < N < 65535," + " default: 2048).\n" + "--nb-desc=N: set number of descriptors as N (%u <= N <= %u," + " default: 1024).\n" + "--txfreet=N: set tx free thresh for NTB driver as N. (N >= 0)\n" + "--burst=N: set pkt burst as N (0 < N <= %u default: 32).\n", + prgname, NTB_MIN_DESC_SIZE, NTB_MAX_DESC_SIZE, + NTB_MAX_PKT_BURST); } -static int -parse_args(int argc, char **argv) +static void +ntb_parse_args(int argc, char **argv) { char *prgname = argv[0], **argvopt = argv; - int opt, ret; + int opt, opt_idx, n, i; - /* Only support interactive mode to send/recv file first. */ - while ((opt = getopt(argc, argvopt, "i")) != EOF) { + while ((opt = getopt_long(argc, argvopt, short_options, + lgopts, &opt_idx)) != EOF) { switch (opt) { case 'i': - printf("Interactive-mode selected\n"); + printf("Interactive-mode selected.\n"); interactive = 1; break; + case OPT_QP_NUM: + n = atoi(optarg); + if (n > 0) + num_queues = n; + else + rte_exit(EXIT_FAILURE, "q must be > 0.\n"); + break; + case OPT_BUF_SIZE_NUM: + n = atoi(optarg); + if (n > RTE_PKTMBUF_HEADROOM && n <= 0xFFFF) + ntb_buf_size = n; + else + rte_exit(EXIT_FAILURE, "buf-size must be > " + "%u and < 65536.\n", + RTE_PKTMBUF_HEADROOM); + break; + case OPT_FWD_MODE_NUM: + for (i = 0; i < MAX_FWD_MODE; i++) { + if (!strcmp(optarg, fwd_mode_s[i])) { + fwd_mode = i; + break; + } + } + if (i == MAX_FWD_MODE) + rte_exit(EXIT_FAILURE, "Unsupported mode. " + "(Should be: file-trans | rxonly | txonly " + "| iofwd)\n"); + break; + case OPT_NB_DESC_NUM: + n = atoi(optarg); + if (n >= NTB_MIN_DESC_SIZE && n <= NTB_MAX_DESC_SIZE) + nb_desc = n; + else + rte_exit(EXIT_FAILURE, "nb-desc must be within" + " [%u, %u].\n", NTB_MIN_DESC_SIZE, + NTB_MAX_DESC_SIZE); + break; + case OPT_TXFREET_NUM: + n = atoi(optarg); + if (n >= 0) + tx_free_thresh = n; + else + rte_exit(EXIT_FAILURE, "txfreet must be" + " >= 0\n"); + break; + case OPT_BURST_NUM: + n = atoi(optarg); + if (n > 0 && n <= NTB_MAX_PKT_BURST) + pkt_burst = n; + else + rte_exit(EXIT_FAILURE, "burst must be within " + "(0, %u].\n", NTB_MAX_PKT_BURST); + break; default: ntb_usage(prgname); - return -1; + rte_exit(EXIT_FAILURE, + "Command line is incomplete or incorrect.\n"); + break; } } +} - if (optind >= 0) - argv[optind-1] = prgname; +static void +ntb_mempool_mz_free(__rte_unused struct rte_mempool_memhdr *memhdr, + void *opaque) +{ + const struct rte_memzone *mz = opaque; + rte_memzone_free(mz); +} - ret = optind-1; - optind = 1; /* reset getopt lib */ - return ret; +static struct rte_mempool * +ntb_mbuf_pool_create(uint16_t mbuf_seg_size, uint32_t nb_mbuf, + struct ntb_dev_info ntb_info, + struct ntb_dev_config *ntb_conf, + unsigned int socket_id) +{ + size_t mz_len, total_elt_sz, max_mz_len, left_sz; + struct rte_pktmbuf_pool_private mbp_priv; + char pool_name[RTE_MEMPOOL_NAMESIZE]; + char mz_name[RTE_MEMZONE_NAMESIZE]; + const struct rte_memzone *mz; + struct rte_mempool *mp; + uint64_t align; + uint32_t mz_id; + int ret; + + snprintf(pool_name, sizeof(pool_name), "ntb_mbuf_pool_%u", socket_id); + mp = rte_mempool_create_empty(pool_name, nb_mbuf, + (mbuf_seg_size + sizeof(struct rte_mbuf)), + MEMPOOL_CACHE_SIZE, + sizeof(struct rte_pktmbuf_pool_private), + socket_id, 0); + if (mp == NULL) + return NULL; + + mbp_priv.mbuf_data_room_size = mbuf_seg_size; + mbp_priv.mbuf_priv_size = 0; + rte_pktmbuf_pool_init(mp, &mbp_priv); + + ntb_conf->mz_list = 
rte_zmalloc("ntb_memzone_list", + sizeof(struct rte_memzone *) * + ntb_info.mw_cnt, 0); + if (ntb_conf->mz_list == NULL) + goto fail; + + /* Put ntb header on mw0. */ + if (ntb_info.mw_size[0] < ntb_info.ntb_hdr_size) { + printf("mw0 (size: %" PRIu64 ") is not enough for ntb hdr" + " (size: %u)\n", ntb_info.mw_size[0], + ntb_info.ntb_hdr_size); + goto fail; + } + + total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size; + left_sz = total_elt_sz * nb_mbuf; + for (mz_id = 0; mz_id < ntb_info.mw_cnt; mz_id++) { + /* If populated mbuf is enough, no need to reserve extra mz. */ + if (!left_sz) + break; + snprintf(mz_name, sizeof(mz_name), "ntb_mw_%d", mz_id); + align = ntb_info.mw_size_align ? ntb_info.mw_size[mz_id] : + RTE_CACHE_LINE_SIZE; + /* Reserve ntb header space on memzone 0. */ + max_mz_len = mz_id ? ntb_info.mw_size[mz_id] : + ntb_info.mw_size[mz_id] - ntb_info.ntb_hdr_size; + mz_len = left_sz <= max_mz_len ? left_sz : + (max_mz_len / total_elt_sz * total_elt_sz); + if (!mz_len) + continue; + mz = rte_memzone_reserve_aligned(mz_name, mz_len, socket_id, + RTE_MEMZONE_IOVA_CONTIG, align); + if (mz == NULL) { + printf("Cannot allocate %" PRIu64 " aligned memzone" + " %u\n", align, mz_id); + goto fail; + } + left_sz -= mz_len; + + /* Reserve ntb header space on memzone 0. */ + if (mz_id) + ret = rte_mempool_populate_iova(mp, mz->addr, mz->iova, + mz->len, ntb_mempool_mz_free, + (void *)(uintptr_t)mz); + else + ret = rte_mempool_populate_iova(mp, + (void *)((size_t)mz->addr + + ntb_info.ntb_hdr_size), + mz->iova + ntb_info.ntb_hdr_size, + mz->len - ntb_info.ntb_hdr_size, + ntb_mempool_mz_free, + (void *)(uintptr_t)mz); + if (ret < 0) { + rte_memzone_free(mz); + rte_mempool_free(mp); + return NULL; + } + + ntb_conf->mz_list[mz_id] = mz; + } + if (left_sz) { + printf("mw space is not enough for mempool.\n"); + goto fail; + } + + ntb_conf->mz_num = mz_id; + rte_mempool_obj_iter(mp, rte_pktmbuf_init, NULL); + + return mp; +fail: + rte_mempool_free(mp); + return NULL; } int main(int argc, char **argv) { + struct rte_eth_conf eth_pconf = eth_port_conf; + struct rte_rawdev_info ntb_rawdev_conf; + struct rte_rawdev_info ntb_rawdev_info; + struct rte_eth_dev_info ethdev_info; + struct rte_eth_rxconf eth_rx_conf; + struct rte_eth_txconf eth_tx_conf; + struct ntb_queue_conf ntb_q_conf; + struct ntb_dev_config ntb_conf; + struct ntb_dev_info ntb_info; + uint64_t ntb_link_status; + uint32_t nb_mbuf; int ret, i; signal(SIGINT, signal_handler); @@ -353,6 +1301,9 @@ main(int argc, char **argv) if (ret < 0) rte_exit(EXIT_FAILURE, "Error with EAL initialization.\n"); + if (rte_lcore_count() < 2) + rte_exit(EXIT_FAILURE, "Need at least 2 cores\n"); + /* Find 1st ntb rawdev. 
*/ for (i = 0; i < RTE_RAWDEV_MAX_DEVS; i++) if (rte_rawdevs[i].driver_name && @@ -368,15 +1319,118 @@ main(int argc, char **argv) argc -= ret; argv += ret; - ret = parse_args(argc, argv); + ntb_parse_args(argc, argv); + + rte_rawdev_set_attr(dev_id, NTB_QUEUE_SZ_NAME, nb_desc); + printf("Set queue size as %u.\n", nb_desc); + rte_rawdev_set_attr(dev_id, NTB_QUEUE_NUM_NAME, num_queues); + printf("Set queue number as %u.\n", num_queues); + ntb_rawdev_info.dev_private = (rte_rawdev_obj_t)(&ntb_info); + rte_rawdev_info_get(dev_id, &ntb_rawdev_info); + + nb_mbuf = nb_desc * num_queues * 2 * 2 + rte_lcore_count() * + MEMPOOL_CACHE_SIZE; + mbuf_pool = ntb_mbuf_pool_create(ntb_buf_size, nb_mbuf, ntb_info, + &ntb_conf, rte_socket_id()); + if (mbuf_pool == NULL) + rte_exit(EXIT_FAILURE, "Cannot create mbuf pool.\n"); + + ntb_conf.num_queues = num_queues; + ntb_conf.queue_size = nb_desc; + ntb_rawdev_conf.dev_private = (rte_rawdev_obj_t)(&ntb_conf); + ret = rte_rawdev_configure(dev_id, &ntb_rawdev_conf); + if (ret) + rte_exit(EXIT_FAILURE, "Can't config ntb dev: err=%d, " + "port=%u\n", ret, dev_id); + + ntb_q_conf.tx_free_thresh = tx_free_thresh; + ntb_q_conf.nb_desc = nb_desc; + ntb_q_conf.rx_mp = mbuf_pool; + for (i = 0; i < num_queues; i++) { + /* Setup rawdev queue */ + ret = rte_rawdev_queue_setup(dev_id, i, &ntb_q_conf); + if (ret < 0) + rte_exit(EXIT_FAILURE, + "Failed to setup ntb queue %u.\n", i); + } + + /* Waiting for peer dev up at most 100s.*/ + printf("Checking ntb link status...\n"); + for (i = 0; i < 1000; i++) { + rte_rawdev_get_attr(dev_id, NTB_LINK_STATUS_NAME, + &ntb_link_status); + if (ntb_link_status) { + printf("Peer dev ready, ntb link up.\n"); + break; + } + rte_delay_ms(100); + } + rte_rawdev_get_attr(dev_id, NTB_LINK_STATUS_NAME, &ntb_link_status); + if (ntb_link_status == 0) + printf("Expire 100s. Link is not up. Please restart app.\n"); + + ret = rte_rawdev_start(dev_id); if (ret < 0) - rte_exit(EXIT_FAILURE, "Invalid arguments\n"); + rte_exit(EXIT_FAILURE, "rte_rawdev_start: err=%d, port=%u\n", + ret, dev_id); + + /* Find 1st ethdev */ + eth_port_id = rte_eth_find_next(0); - rte_rawdev_start(dev_id); + if (eth_port_id < RTE_MAX_ETHPORTS) { + rte_eth_dev_info_get(eth_port_id, ðdev_info); + eth_pconf.rx_adv_conf.rss_conf.rss_hf &= + ethdev_info.flow_type_rss_offloads; + ret = rte_eth_dev_configure(eth_port_id, num_queues, + num_queues, ð_pconf); + if (ret) + rte_exit(EXIT_FAILURE, "Can't config ethdev: err=%d, " + "port=%u\n", ret, eth_port_id); + eth_rx_conf = ethdev_info.default_rxconf; + eth_rx_conf.offloads = eth_pconf.rxmode.offloads; + eth_tx_conf = ethdev_info.default_txconf; + eth_tx_conf.offloads = eth_pconf.txmode.offloads; + + /* Setup ethdev queue if ethdev exists */ + for (i = 0; i < num_queues; i++) { + ret = rte_eth_rx_queue_setup(eth_port_id, i, nb_desc, + rte_eth_dev_socket_id(eth_port_id), + ð_rx_conf, mbuf_pool); + if (ret < 0) + rte_exit(EXIT_FAILURE, + "Failed to setup eth rxq %u.\n", i); + ret = rte_eth_tx_queue_setup(eth_port_id, i, nb_desc, + rte_eth_dev_socket_id(eth_port_id), + ð_tx_conf); + if (ret < 0) + rte_exit(EXIT_FAILURE, + "Failed to setup eth txq %u.\n", i); + } + + ret = rte_eth_dev_start(eth_port_id); + if (ret < 0) + rte_exit(EXIT_FAILURE, "rte_eth_dev_start: err=%d, " + "port=%u\n", ret, eth_port_id); + } + + /* initialize port stats */ + memset(&ntb_port_stats, 0, sizeof(ntb_port_stats)); + + /* Set default fwd mode if user doesn't set it. 
*/ + if (fwd_mode == MAX_FWD_MODE && eth_port_id < RTE_MAX_ETHPORTS) { + printf("Set default fwd mode as iofwd.\n"); + fwd_mode = IOFWD; + } + if (fwd_mode == MAX_FWD_MODE) { + printf("Set default fwd mode as file-trans.\n"); + fwd_mode = FILE_TRANS; + } if (interactive) { sleep(1); prompt(); + } else { + start_pkt_fwd(); } return 0;
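
For reference, the NTB bring-up performed in main() above reduces to the
following rawdev call sequence (a condensed sketch; error handling and
the optional ethdev branch are omitted, names as in the code):

    rte_rawdev_set_attr(dev_id, NTB_QUEUE_SZ_NAME, nb_desc);
    rte_rawdev_set_attr(dev_id, NTB_QUEUE_NUM_NAME, num_queues);

    ntb_rawdev_info.dev_private = (rte_rawdev_obj_t)(&ntb_info);
    rte_rawdev_info_get(dev_id, &ntb_rawdev_info);

    mbuf_pool = ntb_mbuf_pool_create(ntb_buf_size, nb_mbuf, ntb_info,
                                     &ntb_conf, rte_socket_id());

    ntb_conf.num_queues = num_queues;
    ntb_conf.queue_size = nb_desc;
    ntb_rawdev_conf.dev_private = (rte_rawdev_obj_t)(&ntb_conf);
    rte_rawdev_configure(dev_id, &ntb_rawdev_conf);

    for (i = 0; i < num_queues; i++)
            rte_rawdev_queue_setup(dev_id, i, &ntb_q_conf);

    /* Poll NTB_LINK_STATUS_NAME until the peer comes up, then start. */
    rte_rawdev_start(dev_id);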