From patchwork Sun Aug 14 15:06:36 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Hu, Jiayu"
X-Patchwork-Id: 114952
X-Patchwork-Delegate: maxime.coquelin@redhat.com
From: Jiayu Hu
To: dev@dpdk.org
Cc: maxime.coquelin@redhat.com,
 chenbo.xia@intel.com, xingguang.he@intel.com, Jiayu Hu, Yuan Wang, Wenwu Ma
Subject: [PATCH] net/vhost: support asynchronous data path
Date: Sun, 14 Aug 2022 11:06:36 -0400
Message-Id: <20220814150636.2260317-1-jiayu.hu@intel.com>
X-Mailer: git-send-email 2.25.1
List-Id: DPDK patches and discussions

The vhost asynchronous data path offloads packet copies from the CPU to
a DMA engine. As a result, large packet copies can be accelerated by the
DMA engine, and vhost can free CPU cycles for higher-level functions.

This patch enables the asynchronous data path for the vhost PMD. The
asynchronous data path is enabled per tx/rx queue, and users need to
specify the DMA device used by each tx/rx queue. Each tx/rx queue can
use only one DMA device, but one DMA device can be shared among multiple
tx/rx queues of different vhost PMD ports.

Two PMD parameters are added:
- dmas: specify the DMA device used by a tx/rx queue.
  (Default: no queue enables the asynchronous data path)
- dma-ring-size: DMA ring size.
  (Default: 2048)

Here is an example:
--vdev 'eth_vhost0,iface=./s0,dmas=[txq0@0000:00:01.0;rxq0@0000:00:01.1],dma-ring-size=4096'

Signed-off-by: Jiayu Hu
Signed-off-by: Yuan Wang
Signed-off-by: Wenwu Ma
---
 drivers/net/vhost/meson.build     |   1 +
 drivers/net/vhost/rte_eth_vhost.c | 488 ++++++++++++++++++++++++++++--
 drivers/net/vhost/rte_eth_vhost.h |  14 +
 3 files changed, 470 insertions(+), 33 deletions(-)

diff --git a/drivers/net/vhost/meson.build b/drivers/net/vhost/meson.build
index f481a3a4b8..22a0ab3a58 100644
--- a/drivers/net/vhost/meson.build
+++ b/drivers/net/vhost/meson.build
@@ -9,4 +9,5 @@ endif
 
 deps += 'vhost'
 sources = files('rte_eth_vhost.c')
+testpmd_sources = files('vhost_testpmd.c')
 headers = files('rte_eth_vhost.h')
diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
index 7e512d94bf..361e5d66c6 100644
--- a/drivers/net/vhost/rte_eth_vhost.c
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -17,6 +17,8 @@
 #include
 #include
 #include
+#include
+#include
 
 #include "rte_eth_vhost.h"
 
@@ -36,8 +38,13 @@ enum {VIRTIO_RXQ, VIRTIO_TXQ, VIRTIO_QNUM};
 #define ETH_VHOST_LINEAR_BUF	"linear-buffer"
 #define ETH_VHOST_EXT_BUF	"ext-buffer"
 #define ETH_VHOST_LEGACY_OL_FLAGS	"legacy-ol-flags"
+#define ETH_VHOST_DMA_ARG	"dmas"
+#define ETH_VHOST_DMA_RING_SIZE	"dma-ring-size"
 #define VHOST_MAX_PKT_BURST 32
 
+#define INVALID_DMA_ID	-1
+#define DEFAULT_DMA_RING_SIZE	2048
+
 static const char *valid_arguments[] = {
 	ETH_VHOST_IFACE_ARG,
 	ETH_VHOST_QUEUES_ARG,
@@ -48,6 +55,8 @@ static const char *valid_arguments[] = {
 	ETH_VHOST_LINEAR_BUF,
 	ETH_VHOST_EXT_BUF,
 	ETH_VHOST_LEGACY_OL_FLAGS,
+	ETH_VHOST_DMA_ARG,
+	ETH_VHOST_DMA_RING_SIZE,
 	NULL
 };
 
@@ -79,8 +88,39 @@ struct vhost_queue {
 	struct vhost_stats stats;
 	int intr_enable;
 	rte_spinlock_t intr_lock;
+
+	/* Flag indicating whether the async data path is enabled */
+	bool async_register;
+	/* DMA device ID */
+	int16_t dma_id;
+	/**
+	 * For a Rx queue, "txq" points to its peer Tx queue.
+	 * For a Tx queue, "txq" is never used.
+	 */
+	struct vhost_queue *txq;
+	/* Array to keep DMA completed packets */
+	struct rte_mbuf *cmpl_pkts[VHOST_MAX_PKT_BURST];
 };
 
+struct dma_input_info {
+	int16_t dmas[RTE_MAX_QUEUES_PER_PORT * 2];
+	uint16_t dma_ring_size;
+};
+
+static int16_t configured_dmas[RTE_DMADEV_DEFAULT_MAX];
+static int dma_count;
+
+/**
+ * By default, the Rx path calls rte_vhost_poll_enqueue_completed() for enqueue operations.
+ * However, the Rx function is never called in testpmd "txonly" mode, so virtio cannot
+ * receive DMA-completed packets. To make txonly mode work correctly, we provide a
+ * command in testpmd to call rte_vhost_poll_enqueue_completed() in the Tx path.
+ *
+ * When async_tx_poll_completed is set to true, the Tx path calls
+ * rte_vhost_poll_enqueue_completed(); otherwise, the Rx path calls it.
+ */
+bool async_tx_poll_completed;
+
 struct pmd_internal {
 	rte_atomic32_t dev_attached;
 	char *iface_name;
@@ -93,6 +133,10 @@ struct pmd_internal {
 	bool vlan_strip;
 	bool rx_sw_csum;
 	bool tx_sw_csum;
+	struct {
+		int16_t dma_id;
+		bool async_register;
+	} queue_dmas[RTE_MAX_QUEUES_PER_PORT * 2];
 };
 
 struct internal_list {
@@ -123,6 +167,17 @@ struct rte_vhost_vring_state {
 
 static struct rte_vhost_vring_state *vring_states[RTE_MAX_ETHPORTS];
 
+static bool
+dma_is_configured(int16_t dma_id)
+{
+	int i;
+
+	for (i = 0; i < dma_count; i++)
+		if (configured_dmas[i] == dma_id)
+			return true;
+	return false;
+}
+
 static int
 vhost_dev_xstats_reset(struct rte_eth_dev *dev)
 {
@@ -395,6 +450,17 @@ vhost_dev_rx_sw_csum(struct rte_mbuf *mbuf)
 	mbuf->ol_flags |= RTE_MBUF_F_RX_L4_CKSUM_GOOD;
 }
 
+static inline void
+vhost_tx_free_completed(uint16_t vid, uint16_t virtqueue_id, int16_t dma_id,
+		struct rte_mbuf **pkts, uint16_t count)
+{
+	uint16_t i, ret;
+
+	ret = rte_vhost_poll_enqueue_completed(vid, virtqueue_id, pkts, count, dma_id, 0);
+	for (i = 0; likely(i < ret); i++)
+		rte_pktmbuf_free(pkts[i]);
+}
+
 static uint16_t
 eth_vhost_rx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
 {
@@ -403,7 +469,7 @@ eth_vhost_rx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
 	uint16_t nb_receive = nb_bufs;
 
 	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
-		return 0;
+		goto tx_poll;
 
 	rte_atomic32_set(&r->while_queuing, 1);
 
@@ -411,19 +477,36 @@ eth_vhost_rx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
 		goto out;
 
 	/* Dequeue packets from guest TX queue */
-	while (nb_receive) {
-		uint16_t nb_pkts;
-		uint16_t num = (uint16_t)RTE_MIN(nb_receive,
-						 VHOST_MAX_PKT_BURST);
-
-		nb_pkts = rte_vhost_dequeue_burst(r->vid, r->virtqueue_id,
-						  r->mb_pool, &bufs[nb_rx],
-						  num);
-
-		nb_rx += nb_pkts;
-		nb_receive -= nb_pkts;
-		if (nb_pkts < num)
-			break;
+	if (!r->async_register) {
+		while (nb_receive) {
+			uint16_t nb_pkts;
+			uint16_t num = (uint16_t)RTE_MIN(nb_receive,
+							 VHOST_MAX_PKT_BURST);
+
+			nb_pkts = rte_vhost_dequeue_burst(r->vid, r->virtqueue_id,
+						r->mb_pool, &bufs[nb_rx],
+						num);
+
+			nb_rx += nb_pkts;
+			nb_receive -= nb_pkts;
+			if (nb_pkts < num)
+				break;
+		}
+	} else {
+		while (nb_receive) {
+			uint16_t nb_pkts;
+			uint16_t num = (uint16_t)RTE_MIN(nb_receive, VHOST_MAX_PKT_BURST);
+			int nr_inflight;
+
+			nb_pkts = rte_vhost_async_try_dequeue_burst(r->vid, r->virtqueue_id,
+						r->mb_pool, &bufs[nb_rx], num, &nr_inflight,
+						r->dma_id, 0);
+
+			nb_rx += nb_pkts;
+			nb_receive -= nb_pkts;
+			if (nb_pkts < num)
+				break;
+		}
 	}
 
 	r->stats.pkts += nb_rx;
@@ -444,6 +527,17 @@ eth_vhost_rx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
 
 out:
 	rte_atomic32_set(&r->while_queuing, 0);
 
+tx_poll:
+	/**
+	 * Poll and free completed packets for the virtqueue of the Tx queue.
+	 * Note that we access the Tx queue's virtqueue, which is protected
+	 * by the vring lock.
+	 */
+	if (!async_tx_poll_completed && r->txq != NULL && r->txq->async_register) {
+		vhost_tx_free_completed(r->vid, r->txq->virtqueue_id, r->txq->dma_id,
+				r->cmpl_pkts, VHOST_MAX_PKT_BURST);
+	}
+
 	return nb_rx;
 }
@@ -485,31 +579,53 @@ eth_vhost_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
 	}
 
 	/* Enqueue packets to guest RX queue */
-	while (nb_send) {
-		uint16_t nb_pkts;
-		uint16_t num = (uint16_t)RTE_MIN(nb_send,
-						 VHOST_MAX_PKT_BURST);
+	if (!r->async_register) {
+		while (nb_send) {
+			uint16_t nb_pkts;
+			uint16_t num = (uint16_t)RTE_MIN(nb_send, VHOST_MAX_PKT_BURST);
+
+			nb_pkts = rte_vhost_enqueue_burst(r->vid, r->virtqueue_id,
+						&bufs[nb_tx], num);
+
+			nb_tx += nb_pkts;
+			nb_send -= nb_pkts;
+			if (nb_pkts < num)
+				break;
+		}
 
-		nb_pkts = rte_vhost_enqueue_burst(r->vid, r->virtqueue_id,
-						  &bufs[nb_tx], num);
+		for (i = 0; likely(i < nb_tx); i++) {
+			nb_bytes += bufs[i]->pkt_len;
+			rte_pktmbuf_free(bufs[i]);
+		}
 
-		nb_tx += nb_pkts;
-		nb_send -= nb_pkts;
-		if (nb_pkts < num)
-			break;
-	}
+	} else {
+		while (nb_send) {
+			uint16_t nb_pkts;
+			uint16_t num = (uint16_t)RTE_MIN(nb_send, VHOST_MAX_PKT_BURST);
 
-	for (i = 0; likely(i < nb_tx); i++)
-		nb_bytes += bufs[i]->pkt_len;
+			nb_pkts = rte_vhost_submit_enqueue_burst(r->vid, r->virtqueue_id,
+							&bufs[nb_tx], num, r->dma_id, 0);
 
-	nb_missed = nb_bufs - nb_tx;
+			nb_tx += nb_pkts;
+			nb_send -= nb_pkts;
+			if (nb_pkts < num)
+				break;
+		}
+		for (i = 0; likely(i < nb_tx); i++)
+			nb_bytes += bufs[i]->pkt_len;
+
+		if (unlikely(async_tx_poll_completed)) {
+			vhost_tx_free_completed(r->vid, r->virtqueue_id, r->dma_id, r->cmpl_pkts,
+					VHOST_MAX_PKT_BURST);
+		}
+	}
+
+	nb_missed = nb_bufs - nb_tx;
 
 	r->stats.pkts += nb_tx;
 	r->stats.bytes += nb_bytes;
 	r->stats.missed_pkts += nb_missed;
 
-	for (i = 0; likely(i < nb_tx); i++)
-		rte_pktmbuf_free(bufs[i]);
 out:
 	rte_atomic32_set(&r->while_queuing, 0);
 
@@ -797,6 +913,8 @@ queue_setup(struct rte_eth_dev *eth_dev, struct pmd_internal *internal)
 		vq->vid = internal->vid;
 		vq->internal = internal;
 		vq->port = eth_dev->data->port_id;
+		if (i < eth_dev->data->nb_tx_queues)
+			vq->txq = eth_dev->data->tx_queues[i];
 	}
 	for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
 		vq = eth_dev->data->tx_queues[i];
@@ -982,6 +1100,9 @@ vring_state_changed(int vid, uint16_t vring, int enable)
 	struct rte_vhost_vring_state *state;
 	struct rte_eth_dev *eth_dev;
 	struct internal_list *list;
+	struct vhost_queue *queue;
+	struct pmd_internal *internal;
+	int qid;
 	char ifname[PATH_MAX];
 
 	rte_vhost_get_ifname(vid, ifname, sizeof(ifname));
@@ -1010,6 +1131,65 @@ vring_state_changed(int vid, uint16_t vring, int enable)
 
 	update_queuing_status(eth_dev, false);
 
+	qid = vring / VIRTIO_QNUM;
+	if (vring % VIRTIO_QNUM == VIRTIO_RXQ)
+		queue = eth_dev->data->tx_queues[qid];
+	else
+		queue = eth_dev->data->rx_queues[qid];
+
+	if (!queue)
+		goto skip;
+
+	internal = eth_dev->data->dev_private;
+
+	/* Register the async data path only for queues assigned a valid DMA device */
+	if (internal->queue_dmas[queue->virtqueue_id].dma_id == INVALID_DMA_ID)
+		goto skip;
+
+	if (enable && !queue->async_register) {
+		if (rte_vhost_async_channel_register_thread_unsafe(vid, vring)) {
+			VHOST_LOG(ERR, "Failed to register async for vid-%u vring-%u!\n", vid,
+					vring);
+			return -1;
+		}
+
+		queue->async_register = true;
+		internal->queue_dmas[vring].async_register = true;
+
+		VHOST_LOG(INFO, "Succeed to register async for vid-%u vring-%u\n", vid, vring);
+	}
+
+	if (!enable && queue->async_register) {
+		struct rte_mbuf *pkts[VHOST_MAX_PKT_BURST];
+		uint16_t ret, i, nr_done = 0;
+		int16_t dma_id = queue->dma_id;
+
+		while (rte_vhost_async_get_inflight_thread_unsafe(vid, vring) > 0) {
+			ret = rte_vhost_clear_queue_thread_unsafe(vid, vring, pkts,
+					VHOST_MAX_PKT_BURST, dma_id, 0);
+
+			for (i = 0; i < ret; i++)
+				rte_pktmbuf_free(pkts[i]);
+
+			nr_done += ret;
+		}
+
+		VHOST_LOG(INFO, "Completed %u in-flight pkts for vid-%u vring-%u\n", nr_done, vid,
+				vring);
+
+		if (rte_vhost_async_channel_unregister_thread_unsafe(vid, vring)) {
+			VHOST_LOG(ERR, "Failed to unregister async for vid-%u vring-%u\n", vid,
+					vring);
+			return -1;
+		}
+
+		queue->async_register = false;
+		internal->queue_dmas[vring].async_register = false;
+
+		VHOST_LOG(INFO, "Succeed to unregister async for vid-%u vring-%u\n", vid, vring);
+	}
+
+skip:
 	VHOST_LOG(INFO, "vring%u is %s\n",
 			vring, enable ? "enabled" : "disabled");
 
@@ -1214,11 +1394,37 @@ eth_dev_stop(struct rte_eth_dev *dev)
 	return 0;
 }
 
+static inline int
+async_clear_virtqueue(uint16_t vid, uint16_t virtqueue_id, int16_t dma_id)
+{
+	struct rte_mbuf *pkts[VHOST_MAX_PKT_BURST];
+	uint16_t i, ret, nr_done = 0;
+
+	while (rte_vhost_async_get_inflight(vid, virtqueue_id) > 0) {
+		ret = rte_vhost_clear_queue(vid, virtqueue_id, pkts, VHOST_MAX_PKT_BURST, dma_id,
+					0);
+		for (i = 0; i < ret; i++)
+			rte_pktmbuf_free(pkts[i]);
+
+		nr_done += ret;
+	}
+	VHOST_LOG(INFO, "Completed %u pkts for vid-%u vring-%u\n", nr_done, vid, virtqueue_id);
+
+	if (rte_vhost_async_channel_unregister(vid, virtqueue_id)) {
+		VHOST_LOG(ERR, "Failed to unregister async for vid-%u vring-%u\n", vid,
+				virtqueue_id);
+		return -1;
+	}
+
+	return nr_done;
+}
+
 static int
 eth_dev_close(struct rte_eth_dev *dev)
 {
 	struct pmd_internal *internal;
 	struct internal_list *list;
+	struct vhost_queue *queue;
 	unsigned int i, ret;
 
 	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
@@ -1232,6 +1438,27 @@ eth_dev_close(struct rte_eth_dev *dev)
 
 	list = find_internal_resource(internal->iface_name);
 	if (list) {
+		/* Make sure all in-flight packets are completed before destroying virtio */
+		if (dev->data->rx_queues) {
+			for (i = 0; i < dev->data->nb_rx_queues; i++) {
+				queue = dev->data->rx_queues[i];
+				if (queue->async_register) {
+					async_clear_virtqueue(queue->vid, queue->virtqueue_id,
+							queue->dma_id);
+				}
+			}
+		}
+
+		if (dev->data->tx_queues) {
+			for (i = 0; i < dev->data->nb_tx_queues; i++) {
+				queue = dev->data->tx_queues[i];
+				if (queue->async_register) {
+					async_clear_virtqueue(queue->vid, queue->virtqueue_id,
+							queue->dma_id);
+				}
+			}
+		}
+
 		rte_vhost_driver_unregister(internal->iface_name);
 		pthread_mutex_lock(&internal_list_lock);
 		TAILQ_REMOVE(&internal_list, list, next);
@@ -1266,6 +1493,7 @@ eth_rx_queue_setup(struct rte_eth_dev *dev, uint16_t rx_queue_id,
 		   struct rte_mempool *mb_pool)
 {
 	struct vhost_queue *vq;
+	struct pmd_internal *internal = dev->data->dev_private;
 
 	vq = rte_zmalloc_socket(NULL, sizeof(struct vhost_queue),
 			RTE_CACHE_LINE_SIZE, socket_id);
@@ -1276,6 +1504,8 @@ eth_rx_queue_setup(struct rte_eth_dev *dev, uint16_t rx_queue_id,
 
 	vq->mb_pool = mb_pool;
 	vq->virtqueue_id = rx_queue_id * VIRTIO_QNUM + VIRTIO_TXQ;
+	vq->async_register = internal->queue_dmas[vq->virtqueue_id].async_register;
+	vq->dma_id = internal->queue_dmas[vq->virtqueue_id].dma_id;
 	rte_spinlock_init(&vq->intr_lock);
 	dev->data->rx_queues[rx_queue_id] = vq;
 
@@ -1289,6 +1519,7 @@ eth_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id,
 		   const struct rte_eth_txconf *tx_conf __rte_unused)
 {
 	struct vhost_queue *vq;
+	struct pmd_internal *internal = dev->data->dev_private;
 
 	vq = rte_zmalloc_socket(NULL, sizeof(struct vhost_queue),
 			RTE_CACHE_LINE_SIZE, socket_id);
@@ -1298,6 +1529,8 @@ eth_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id,
 	}
 
 	vq->virtqueue_id = tx_queue_id * VIRTIO_QNUM + VIRTIO_RXQ;
+	vq->async_register = internal->queue_dmas[vq->virtqueue_id].async_register;
+	vq->dma_id = internal->queue_dmas[vq->virtqueue_id].dma_id;
 	rte_spinlock_init(&vq->intr_lock);
 	dev->data->tx_queues[tx_queue_id] = vq;
 
@@ -1508,13 +1741,14 @@ static const struct eth_dev_ops ops = {
 static int
 eth_dev_vhost_create(struct rte_vdev_device *dev, char *iface_name,
 	int16_t queues, const unsigned int numa_node, uint64_t flags,
-	uint64_t disable_flags)
+	uint64_t disable_flags, struct dma_input_info *dma_input)
 {
 	const char *name = rte_vdev_device_name(dev);
 	struct rte_eth_dev_data *data;
 	struct pmd_internal *internal = NULL;
 	struct rte_eth_dev *eth_dev = NULL;
 	struct rte_ether_addr *eth_addr = NULL;
+	int i = 0;
 
 	VHOST_LOG(INFO, "Creating VHOST-USER backend on numa socket %u\n",
 		numa_node);
@@ -1563,6 +1797,12 @@ eth_dev_vhost_create(struct rte_vdev_device *dev, char *iface_name,
 	eth_dev->rx_pkt_burst = eth_vhost_rx;
 	eth_dev->tx_pkt_burst = eth_vhost_tx;
 
+	for (i = 0; i < RTE_MAX_QUEUES_PER_PORT * 2; i++) {
+		/* An invalid DMA ID means the queue does not enable the async data path */
+		internal->queue_dmas[i].dma_id = dma_input->dmas[i];
+		internal->queue_dmas[i].async_register = false;
+	}
+
 	rte_eth_dev_probing_finish(eth_dev);
 	return 0;
 
@@ -1602,6 +1842,153 @@ open_int(const char *key __rte_unused, const char *value, void *extra_args)
 	return 0;
 }
 
+static int
+init_dma(int16_t dma_id, uint16_t ring_size)
+{
+	struct rte_dma_info info;
+	struct rte_dma_conf dev_config = { .nb_vchans = 1 };
+	struct rte_dma_vchan_conf qconf = {
+		.direction = RTE_DMA_DIR_MEM_TO_MEM,
+	};
+	int ret = 0;
+
+	if (dma_is_configured(dma_id))
+		goto out;
+
+	if (rte_dma_info_get(dma_id, &info) != 0) {
+		VHOST_LOG(ERR, "dma %u get info failed\n", dma_id);
+		ret = -1;
+		goto out;
+	}
+
+	if (info.max_vchans < 1) {
+		VHOST_LOG(ERR, "No channels available on dma %d\n", dma_id);
+		ret = -1;
+		goto out;
+	}
+
+	if (rte_dma_configure(dma_id, &dev_config) != 0) {
+		VHOST_LOG(ERR, "dma %u configure failed\n", dma_id);
+		ret = -1;
+		goto out;
+	}
+
+	rte_dma_info_get(dma_id, &info);
+	if (info.nb_vchans != 1) {
+		VHOST_LOG(ERR, "dma %u has no queues\n", dma_id);
+		ret = -1;
+		goto out;
+	}
+
+	qconf.nb_desc = RTE_MIN(ring_size, info.max_desc);
+	if (rte_dma_vchan_setup(dma_id, 0, &qconf) != 0) {
+		VHOST_LOG(ERR, "dma %u queue setup failed\n", dma_id);
+		ret = -1;
+		goto out;
+	}
+
+	if (rte_dma_start(dma_id) != 0) {
+		VHOST_LOG(ERR, "dma %u start failed\n", dma_id);
+		ret = -1;
+		goto out;
+	}
+
+	configured_dmas[dma_count++] = dma_id;
+
+out:
+	return ret;
+}
+
+static int
+open_dma(const char *key __rte_unused, const char *value, void *extra_args)
+{
+	struct dma_input_info *dma_input = extra_args;
+	char *input = strndup(value, strlen(value) + 1);
+	char *addrs = input;
+	char *ptrs[2];
+	char *start, *end, *substr;
+	uint16_t qid, virtqueue_id;
+	int16_t dma_id;
+	int ret = 0;
+
+	while (isblank(*addrs))
+		addrs++;
+	if (*addrs == '\0') {
+		VHOST_LOG(ERR, "No input DMA addresses\n");
+		ret = -1;
+		goto out;
+	}
+
+	/* Process the DMA devices listed within the brackets. */
+	addrs++;
+	substr = strtok(addrs, ";]");
+	if (!substr) {
+		VHOST_LOG(ERR, "No input DMA addresses\n");
+		ret = -1;
+		goto out;
+	}
+
+	do {
+		rte_strsplit(substr, strlen(substr), ptrs, 2, '@');
+
+		char *txq, *rxq;
+		bool is_txq;
+
+		txq = strstr(ptrs[0], "txq");
+		rxq = strstr(ptrs[0], "rxq");
+		if (txq == NULL && rxq == NULL) {
+			VHOST_LOG(ERR, "Illegal queue\n");
+			ret = -1;
+			goto out;
+		} else if (txq) {
+			is_txq = true;
+			start = txq;
+		} else {
+			is_txq = false;
+			start = rxq;
+		}
+
+		start += 3;
+		qid = strtol(start, &end, 0);
+		if (end == start) {
+			VHOST_LOG(ERR, "No input queue ID\n");
+			ret = -1;
+			goto out;
+		}
+
+		virtqueue_id = is_txq ? qid * 2 + VIRTIO_RXQ : qid * 2 + VIRTIO_TXQ;
+
+		dma_id = rte_dma_get_dev_id_by_name(ptrs[1]);
+		if (dma_id < 0) {
+			VHOST_LOG(ERR, "Fail to find DMA device %s.\n", ptrs[1]);
+			ret = -1;
+			goto out;
+		}
+
+		ret = init_dma(dma_id, dma_input->dma_ring_size);
+		if (ret != 0) {
+			VHOST_LOG(ERR, "Fail to initialize DMA %u\n", dma_id);
+			ret = -1;
+			break;
+		}
+
+		dma_input->dmas[virtqueue_id] = dma_id;
+
+		substr = strtok(NULL, ";]");
+	} while (substr);
+
+	for (int i = 0; i < dma_count; i++) {
+		if (rte_vhost_async_dma_configure(configured_dmas[i], 0) < 0) {
+			VHOST_LOG(ERR, "Fail to configure DMA %u to vhost\n", configured_dmas[i]);
+			ret = -1;
+		}
+	}
+
+out:
+	free(input);
+	return ret;
+}
+
 static int
 rte_pmd_vhost_probe(struct rte_vdev_device *dev)
 {
@@ -1620,6 +2007,10 @@ rte_pmd_vhost_probe(struct rte_vdev_device *dev)
 	int legacy_ol_flags = 0;
 	struct rte_eth_dev *eth_dev;
 	const char *name = rte_vdev_device_name(dev);
+	struct dma_input_info dma_input;
+
+	memset(dma_input.dmas, INVALID_DMA_ID, sizeof(dma_input.dmas));
+	dma_input.dma_ring_size = DEFAULT_DMA_RING_SIZE;
 
 	VHOST_LOG(INFO, "Initializing pmd_vhost for %s\n", name);
 
@@ -1735,6 +2126,35 @@ rte_pmd_vhost_probe(struct rte_vdev_device *dev)
 			goto out_free;
 	}
 
+	if (rte_kvargs_count(kvlist, ETH_VHOST_DMA_RING_SIZE) == 1) {
+		ret = rte_kvargs_process(kvlist, ETH_VHOST_DMA_RING_SIZE,
+				&open_int, &dma_input.dma_ring_size);
+		if (ret < 0)
+			goto out_free;
+
+		if (!rte_is_power_of_2(dma_input.dma_ring_size)) {
+			dma_input.dma_ring_size = rte_align32pow2(dma_input.dma_ring_size);
+			VHOST_LOG(INFO, "Convert dma_ring_size to the power of two %u\n",
+				dma_input.dma_ring_size);
+		}
+	}
+
+	if (rte_kvargs_count(kvlist, ETH_VHOST_DMA_ARG) == 1) {
+		ret = rte_kvargs_process(kvlist, ETH_VHOST_DMA_ARG,
+					 &open_dma, &dma_input);
+		if (ret < 0) {
+			VHOST_LOG(ERR, "Failed to parse %s\n", ETH_VHOST_DMA_ARG);
+			goto out_free;
+		}
+
+		flags |= RTE_VHOST_USER_ASYNC_COPY;
+		/**
+		 * Live migration is not supported when DMA
+		 * acceleration is enabled.
+		 */
+		disable_flags |= (1ULL << VHOST_F_LOG_ALL);
+	}
+
 	if (legacy_ol_flags == 0)
 		flags |= RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS;
 
@@ -1742,7 +2162,7 @@ rte_pmd_vhost_probe(struct rte_vdev_device *dev)
 		dev->device.numa_node = rte_socket_id();
 
 	ret = eth_dev_vhost_create(dev, iface_name, queues,
-			dev->device.numa_node, flags, disable_flags);
+			dev->device.numa_node, flags, disable_flags, &dma_input);
 	if (ret == -1)
 		VHOST_LOG(ERR, "Failed to create %s\n", name);
 
@@ -1786,4 +2206,6 @@ RTE_PMD_REGISTER_PARAM_STRING(net_vhost,
 	"postcopy-support=<0|1> "
 	"tso=<0|1> "
 	"linear-buffer=<0|1> "
-	"ext-buffer=<0|1>");
+	"ext-buffer=<0|1> "
+	"dma-ring-size="
+	"dmas=[txq0@dma_addr;rxq0@dma_addr] ");
diff --git a/drivers/net/vhost/rte_eth_vhost.h b/drivers/net/vhost/rte_eth_vhost.h
index 0e68b9f668..96e0e8e7be 100644
--- a/drivers/net/vhost/rte_eth_vhost.h
+++ b/drivers/net/vhost/rte_eth_vhost.h
@@ -14,6 +14,8 @@ extern "C" {
 
 #include
 
+extern bool async_tx_poll_completed;
+
 /*
  * Event description.
  */
@@ -52,6 +54,18 @@ int rte_eth_vhost_get_queue_event(uint16_t port_id,
  */
 int rte_eth_vhost_get_vid_from_port_id(uint16_t port_id);
 
+/**
+ * Ask the Tx side to poll completed packets.
+ *
+ * @param enable
+ *  True to poll completed packets in the Tx path, false to poll them in the Rx path.
+ */
+static __rte_always_inline void
+rte_eth_vhost_async_tx_poll_completed(bool enable)
+{
+	async_tx_poll_completed = enable;
+}
+
 #ifdef __cplusplus
 }
 #endif
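
Note on the dmas syntax: open_dma() above maps each "txqN@<dma>" or "rxqN@<dma>" entry to a virtqueue index, where an ethdev Tx queue drives the guest's Rx ring and an ethdev Rx queue drives the guest's Tx ring. The standalone sketch below illustrates only that mapping, with no DPDK dependencies. It is not the code from this patch: parse_dmas() and struct dma_binding are hypothetical names introduced for illustration, and the sketch keeps the DMA name as a string instead of resolving it with rte_dma_get_dev_id_by_name(). It is also stricter than open_dma() in that it rejects entries without an '@'.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Same ordering as the enum in rte_eth_vhost.c. */
enum { VIRTIO_RXQ, VIRTIO_TXQ, VIRTIO_QNUM };

/* Hypothetical record for this sketch; the patch stores DMA IDs instead. */
struct dma_binding {
	unsigned int virtqueue_id;
	char dma_name[96];
};

/*
 * Parse "[txqN@NAME;rxqM@NAME]" into out[].
 * Returns the number of bindings, or -1 on malformed input.
 */
int
parse_dmas(const char *value, struct dma_binding *out, int max_out)
{
	const char *p;
	int n = 0;

	assert(value != NULL && out != NULL);
	if (value[0] != '[')
		return -1;

	p = value + 1;
	while (*p != '\0' && *p != ']') {
		char tok[96];
		size_t len = strcspn(p, ";]");
		char *at, *end;
		long qid;
		int is_txq;

		if (len == 0) {		/* empty entry such as ";;" */
			p++;
			continue;
		}
		if (len >= sizeof(tok) || n >= max_out)
			return -1;
		memcpy(tok, p, len);
		tok[len] = '\0';
		p += len;
		if (*p == ';')
			p++;

		/* Split "txq0@NAME" into the queue part and the DMA name. */
		at = strchr(tok, '@');
		if (at == NULL || at[1] == '\0')
			return -1;
		*at = '\0';

		if (strncmp(tok, "txq", 3) == 0)
			is_txq = 1;
		else if (strncmp(tok, "rxq", 3) == 0)
			is_txq = 0;
		else
			return -1;

		qid = strtol(tok + 3, &end, 10);
		if (end == tok + 3 || *end != '\0' || qid < 0)
			return -1;

		/* An ethdev Tx queue feeds the guest's Rx ring, and vice versa. */
		out[n].virtqueue_id = (unsigned int)qid * VIRTIO_QNUM +
			(is_txq ? VIRTIO_RXQ : VIRTIO_TXQ);
		snprintf(out[n].dma_name, sizeof(out[n].dma_name), "%s", at + 1);
		n++;
	}
	return n;
}
```

Under these assumptions, parsing "[txq0@0000:00:01.0;rxq0@0000:00:01.1]" gives two bindings: txq0 lands on virtqueue 0 and rxq0 on virtqueue 1, which matches the indices used for queue_dmas[] in the patch.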