From patchwork Mon Jan 24 16:40:11 2022
X-Patchwork-Submitter: "Hu, Jiayu"
X-Patchwork-Id: 106340
X-Patchwork-Delegate: maxime.coquelin@redhat.com
From: Jiayu Hu
To: dev@dpdk.org
Cc: maxime.coquelin@redhat.com, i.maximets@ovn.org, chenbo.xia@intel.com,
 bruce.richardson@intel.com, harry.van.haaren@intel.com, sunil.pai.g@intel.com,
 john.mcnamara@intel.com, xuan.ding@intel.com, cheng1.jiang@intel.com,
 liangma@liangbit.com, Jiayu Hu
Subject: [PATCH v2 1/1] vhost: integrate dmadev in asynchronous datapath
Date: Mon, 24 Jan 2022 11:40:11 -0500
Message-Id: <20220124164011.1402593-2-jiayu.hu@intel.com>
In-Reply-To: <20220124164011.1402593-1-jiayu.hu@intel.com>
References: <20211230215505.329674-1-jiayu.hu@intel.com>
 <20220124164011.1402593-1-jiayu.hu@intel.com>

Since dmadev was introduced in 21.11, this patch integrates dmadev into the
asynchronous data path, to avoid the overhead of the vhost DMA abstraction
layer and to simplify application logic.
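As an illustration only (not part of the patch), the sketch below shows how an
application is expected to drive the new data path, mirroring what
examples/vhost/main.c does after this change. The helper names
(setup_async_path, async_enqueue), the device ID, the vring index and the
burst size are placeholders, and the vhost async signatures follow this patch
(e.g. the three-argument rte_vhost_async_dma_configure()):

#include <rte_dmadev.h>
#include <rte_mbuf.h>
#include <rte_vhost.h>
#include <rte_vhost_async.h>

/* Hypothetical helper: configure one DMA device through dmadev and
 * register it with the vhost async data path for the RX vring of 'vid'.
 */
static int
setup_async_path(int16_t dma_id, int vid)
{
	struct rte_dma_conf dev_conf = { .nb_vchans = 1 };
	struct rte_dma_vchan_conf qconf = {
		.direction = RTE_DMA_DIR_MEM_TO_MEM,
		.nb_desc = 4096,
	};

	/* The application owns DMA device configuration and start. */
	if (rte_dma_configure(dma_id, &dev_conf) != 0 ||
	    rte_dma_vchan_setup(dma_id, 0, &qconf) != 0 ||
	    rte_dma_start(dma_id) != 0)
		return -1;

	/* Tell vhost which DMA devices may be used (poll_factor = 1),
	 * before any vring registers for async acceleration.
	 */
	if (rte_vhost_async_dma_configure(&dma_id, 1, 1) < 0)
		return -1;

	/* Enable async acceleration on the vring once it is enabled. */
	return rte_vhost_async_channel_register(vid, VIRTIO_RXQ);
}

/* Hypothetical data-path step: submit copies to vchannel 0 of 'dma_id'
 * and reclaim packets whose DMA copies have completed.
 */
static void
async_enqueue(int vid, int16_t dma_id, struct rte_mbuf **pkts, uint16_t count)
{
	struct rte_mbuf *cpl[32];
	uint16_t n_cpl;

	/* Submitted mbufs must not be freed until reported completed;
	 * handling of packets that could not be enqueued is omitted.
	 */
	(void)rte_vhost_submit_enqueue_burst(vid, VIRTIO_RXQ, pkts, count,
					     dma_id, 0);

	n_cpl = rte_vhost_poll_enqueue_completed(vid, VIRTIO_RXQ, cpl,
						 RTE_DIM(cpl), dma_id, 0);
	rte_pktmbuf_free_bulk(cpl, n_cpl);
}

Unregistration via rte_vhost_async_channel_unregister() and draining of
in-flight packets on device teardown are omitted for brevity.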
Signed-off-by: Jiayu Hu Signed-off-by: Sunil Pai G --- doc/guides/prog_guide/vhost_lib.rst | 95 ++++----- examples/vhost/Makefile | 2 +- examples/vhost/ioat.c | 218 -------------------- examples/vhost/ioat.h | 63 ------ examples/vhost/main.c | 255 ++++++++++++++++++----- examples/vhost/main.h | 11 + examples/vhost/meson.build | 6 +- lib/vhost/meson.build | 2 +- lib/vhost/rte_vhost.h | 2 + lib/vhost/rte_vhost_async.h | 132 +++++------- lib/vhost/version.map | 3 + lib/vhost/vhost.c | 148 ++++++++++---- lib/vhost/vhost.h | 64 +++++- lib/vhost/vhost_user.c | 2 + lib/vhost/virtio_net.c | 305 +++++++++++++++++++++++----- 15 files changed, 744 insertions(+), 564 deletions(-) delete mode 100644 examples/vhost/ioat.c delete mode 100644 examples/vhost/ioat.h diff --git a/doc/guides/prog_guide/vhost_lib.rst b/doc/guides/prog_guide/vhost_lib.rst index 76f5d303c9..acc10ea851 100644 --- a/doc/guides/prog_guide/vhost_lib.rst +++ b/doc/guides/prog_guide/vhost_lib.rst @@ -106,12 +106,11 @@ The following is an overview of some key Vhost API functions: - ``RTE_VHOST_USER_ASYNC_COPY`` Asynchronous data path will be enabled when this flag is set. Async data - path allows applications to register async copy devices (typically - hardware DMA channels) to the vhost queues. Vhost leverages the copy - device registered to free CPU from memory copy operations. A set of - async data path APIs are defined for DPDK applications to make use of - the async capability. Only packets enqueued/dequeued by async APIs are - processed through the async data path. + path allows applications to register DMA channels to the vhost queues. + Vhost leverages the registered DMA devices to free CPU from memory copy + operations. A set of async data path APIs are defined for DPDK applications + to make use of the async capability. Only packets enqueued/dequeued by + async APIs are processed through the async data path. Currently this feature is only implemented on split ring enqueue data path. @@ -218,52 +217,30 @@ The following is an overview of some key Vhost API functions: Enable or disable zero copy feature of the vhost crypto backend. -* ``rte_vhost_async_channel_register(vid, queue_id, config, ops)`` +* ``rte_vhost_async_dma_configure(dmas_id, count, poll_factor)`` - Register an async copy device channel for a vhost queue after vring - is enabled. Following device ``config`` must be specified together - with the registration: + Tell vhost what DMA devices are going to use. This function needs to + be called before register async data-path for vring. - * ``features`` +* ``rte_vhost_async_channel_register(vid, queue_id)`` - This field is used to specify async copy device features. + Register async DMA acceleration for a vhost queue after vring is enabled. - ``RTE_VHOST_ASYNC_INORDER`` represents the async copy device can - guarantee the order of copy completion is the same as the order - of copy submission. +* ``rte_vhost_async_channel_register_thread_unsafe(vid, queue_id)`` - Currently, only ``RTE_VHOST_ASYNC_INORDER`` capable device is - supported by vhost. - - Applications must provide following ``ops`` callbacks for vhost lib to - work with the async copy devices: - - * ``transfer_data(vid, queue_id, descs, opaque_data, count)`` - - vhost invokes this function to submit copy data to the async devices. - For non-async_inorder capable devices, ``opaque_data`` could be used - for identifying the completed packets. 
- - * ``check_completed_copies(vid, queue_id, opaque_data, max_packets)`` - - vhost invokes this function to get the copy data completed by async - devices. - -* ``rte_vhost_async_channel_register_thread_unsafe(vid, queue_id, config, ops)`` - - Register an async copy device channel for a vhost queue without - performing any locking. + Register async DMA acceleration for a vhost queue without performing + any locking. This function is only safe to call in vhost callback functions (i.e., struct rte_vhost_device_ops). * ``rte_vhost_async_channel_unregister(vid, queue_id)`` - Unregister the async copy device channel from a vhost queue. + Unregister the async DMA acceleration from a vhost queue. Unregistration will fail, if the vhost queue has in-flight packets that are not completed. - Unregister async copy devices in vring_state_changed() may + Unregister async DMA acceleration in vring_state_changed() may fail, as this API tries to acquire the spinlock of vhost queue. The recommended way is to unregister async copy devices for all vhost queues in destroy_device(), when a @@ -271,24 +248,19 @@ The following is an overview of some key Vhost API functions: * ``rte_vhost_async_channel_unregister_thread_unsafe(vid, queue_id)`` - Unregister the async copy device channel for a vhost queue without - performing any locking. + Unregister async DMA acceleration for a vhost queue without performing + any locking. This function is only safe to call in vhost callback functions (i.e., struct rte_vhost_device_ops). -* ``rte_vhost_submit_enqueue_burst(vid, queue_id, pkts, count, comp_pkts, comp_count)`` +* ``rte_vhost_submit_enqueue_burst(vid, queue_id, pkts, count, dma_id, vchan_id)`` Submit an enqueue request to transmit ``count`` packets from host to guest - by async data path. Successfully enqueued packets can be transfer completed - or being occupied by DMA engines; transfer completed packets are returned in - ``comp_pkts``, but others are not guaranteed to finish, when this API - call returns. + by async data path. Applications must not free the packets submitted for + enqueue until the packets are completed. - Applications must not free the packets submitted for enqueue until the - packets are completed. - -* ``rte_vhost_poll_enqueue_completed(vid, queue_id, pkts, count)`` +* ``rte_vhost_poll_enqueue_completed(vid, queue_id, pkts, count, dma_id, vchan_id)`` Poll enqueue completion status from async data path. Completed packets are returned to applications through ``pkts``. @@ -298,7 +270,7 @@ The following is an overview of some key Vhost API functions: This function returns the amount of in-flight packets for the vhost queue using async acceleration. -* ``rte_vhost_clear_queue_thread_unsafe(vid, queue_id, **pkts, count)`` +* ``rte_vhost_clear_queue_thread_unsafe(vid, queue_id, **pkts, count, dma_id, vchan_id)`` Clear inflight packets which are submitted to DMA engine in vhost async data path. Completed packets are returned to applications through ``pkts``. @@ -442,3 +414,26 @@ Finally, a set of device ops is defined for device specific operations: * ``get_notify_area`` Called to get the notify area info of the queue. + +Vhost asynchronous data path +---------------------------- + +Vhost asynchronous data path leverages DMA devices to offload memory +copies from the CPU and it is implemented in an asynchronous way. It +enables applications, like OVS, to save CPU cycles and hide memory copy +overhead, thus achieving higher throughput. 
+ +Vhost doesn't manage DMA devices and applications, like OVS, need to +manage and configure DMA devices. Applications need to tell vhost what +DMA devices to use in every data path function call. This design enables +the flexibility for applications to dynamically use DMA channels in +different function modules, not limited in vhost. + +In addition, vhost supports M:N mapping between vrings and DMA virtual +channels. Specifically, one vring can use multiple different DMA channels +and one DMA channel can be shared by multiple vrings at the same time. +The reason of enabling one vring to use multiple DMA channels is that +it's possible that more than one dataplane threads enqueue packets to +the same vring with their own DMA virtual channels. Besides, the number +of DMA devices is limited. For the purpose of scaling, it's necessary to +support sharing DMA channels among vrings. diff --git a/examples/vhost/Makefile b/examples/vhost/Makefile index 587ea2ab47..975a5dfe40 100644 --- a/examples/vhost/Makefile +++ b/examples/vhost/Makefile @@ -5,7 +5,7 @@ APP = vhost-switch # all source are stored in SRCS-y -SRCS-y := main.c virtio_net.c ioat.c +SRCS-y := main.c virtio_net.c PKGCONF ?= pkg-config diff --git a/examples/vhost/ioat.c b/examples/vhost/ioat.c deleted file mode 100644 index 9aeeb12fd9..0000000000 --- a/examples/vhost/ioat.c +++ /dev/null @@ -1,218 +0,0 @@ -/* SPDX-License-Identifier: BSD-3-Clause - * Copyright(c) 2010-2020 Intel Corporation - */ - -#include -#ifdef RTE_RAW_IOAT -#include -#include - -#include "ioat.h" -#include "main.h" - -struct dma_for_vhost dma_bind[MAX_VHOST_DEVICE]; - -struct packet_tracker { - unsigned short size_track[MAX_ENQUEUED_SIZE]; - unsigned short next_read; - unsigned short next_write; - unsigned short last_remain; - unsigned short ioat_space; -}; - -struct packet_tracker cb_tracker[MAX_VHOST_DEVICE]; - -int -open_ioat(const char *value) -{ - struct dma_for_vhost *dma_info = dma_bind; - char *input = strndup(value, strlen(value) + 1); - char *addrs = input; - char *ptrs[2]; - char *start, *end, *substr; - int64_t vid, vring_id; - struct rte_ioat_rawdev_config config; - struct rte_rawdev_info info = { .dev_private = &config }; - char name[32]; - int dev_id; - int ret = 0; - uint16_t i = 0; - char *dma_arg[MAX_VHOST_DEVICE]; - int args_nr; - - while (isblank(*addrs)) - addrs++; - if (*addrs == '\0') { - ret = -1; - goto out; - } - - /* process DMA devices within bracket. 
*/ - addrs++; - substr = strtok(addrs, ";]"); - if (!substr) { - ret = -1; - goto out; - } - args_nr = rte_strsplit(substr, strlen(substr), - dma_arg, MAX_VHOST_DEVICE, ','); - if (args_nr <= 0) { - ret = -1; - goto out; - } - while (i < args_nr) { - char *arg_temp = dma_arg[i]; - uint8_t sub_nr; - sub_nr = rte_strsplit(arg_temp, strlen(arg_temp), ptrs, 2, '@'); - if (sub_nr != 2) { - ret = -1; - goto out; - } - - start = strstr(ptrs[0], "txd"); - if (start == NULL) { - ret = -1; - goto out; - } - - start += 3; - vid = strtol(start, &end, 0); - if (end == start) { - ret = -1; - goto out; - } - - vring_id = 0 + VIRTIO_RXQ; - if (rte_pci_addr_parse(ptrs[1], - &(dma_info + vid)->dmas[vring_id].addr) < 0) { - ret = -1; - goto out; - } - - rte_pci_device_name(&(dma_info + vid)->dmas[vring_id].addr, - name, sizeof(name)); - dev_id = rte_rawdev_get_dev_id(name); - if (dev_id == (uint16_t)(-ENODEV) || - dev_id == (uint16_t)(-EINVAL)) { - ret = -1; - goto out; - } - - if (rte_rawdev_info_get(dev_id, &info, sizeof(config)) < 0 || - strstr(info.driver_name, "ioat") == NULL) { - ret = -1; - goto out; - } - - (dma_info + vid)->dmas[vring_id].dev_id = dev_id; - (dma_info + vid)->dmas[vring_id].is_valid = true; - config.ring_size = IOAT_RING_SIZE; - config.hdls_disable = true; - if (rte_rawdev_configure(dev_id, &info, sizeof(config)) < 0) { - ret = -1; - goto out; - } - rte_rawdev_start(dev_id); - cb_tracker[dev_id].ioat_space = IOAT_RING_SIZE - 1; - dma_info->nr++; - i++; - } -out: - free(input); - return ret; -} - -int32_t -ioat_transfer_data_cb(int vid, uint16_t queue_id, - struct rte_vhost_iov_iter *iov_iter, - struct rte_vhost_async_status *opaque_data, uint16_t count) -{ - uint32_t i_iter; - uint16_t dev_id = dma_bind[vid].dmas[queue_id * 2 + VIRTIO_RXQ].dev_id; - struct rte_vhost_iov_iter *iter = NULL; - unsigned long i_seg; - unsigned short mask = MAX_ENQUEUED_SIZE - 1; - unsigned short write = cb_tracker[dev_id].next_write; - - if (!opaque_data) { - for (i_iter = 0; i_iter < count; i_iter++) { - iter = iov_iter + i_iter; - i_seg = 0; - if (cb_tracker[dev_id].ioat_space < iter->nr_segs) - break; - while (i_seg < iter->nr_segs) { - rte_ioat_enqueue_copy(dev_id, - (uintptr_t)(iter->iov[i_seg].src_addr), - (uintptr_t)(iter->iov[i_seg].dst_addr), - iter->iov[i_seg].len, - 0, - 0); - i_seg++; - } - write &= mask; - cb_tracker[dev_id].size_track[write] = iter->nr_segs; - cb_tracker[dev_id].ioat_space -= iter->nr_segs; - write++; - } - } else { - /* Opaque data is not supported */ - return -1; - } - /* ring the doorbell */ - rte_ioat_perform_ops(dev_id); - cb_tracker[dev_id].next_write = write; - return i_iter; -} - -int32_t -ioat_check_completed_copies_cb(int vid, uint16_t queue_id, - struct rte_vhost_async_status *opaque_data, - uint16_t max_packets) -{ - if (!opaque_data) { - uintptr_t dump[255]; - int n_seg; - unsigned short read, write; - unsigned short nb_packet = 0; - unsigned short mask = MAX_ENQUEUED_SIZE - 1; - unsigned short i; - - uint16_t dev_id = dma_bind[vid].dmas[queue_id * 2 - + VIRTIO_RXQ].dev_id; - n_seg = rte_ioat_completed_ops(dev_id, 255, NULL, NULL, dump, dump); - if (n_seg < 0) { - RTE_LOG(ERR, - VHOST_DATA, - "fail to poll completed buf on IOAT device %u", - dev_id); - return 0; - } - if (n_seg == 0) - return 0; - - cb_tracker[dev_id].ioat_space += n_seg; - n_seg += cb_tracker[dev_id].last_remain; - - read = cb_tracker[dev_id].next_read; - write = cb_tracker[dev_id].next_write; - for (i = 0; i < max_packets; i++) { - read &= mask; - if (read == write) - break; - if (n_seg >= 
cb_tracker[dev_id].size_track[read]) { - n_seg -= cb_tracker[dev_id].size_track[read]; - read++; - nb_packet++; - } else { - break; - } - } - cb_tracker[dev_id].next_read = read; - cb_tracker[dev_id].last_remain = n_seg; - return nb_packet; - } - /* Opaque data is not supported */ - return -1; -} - -#endif /* RTE_RAW_IOAT */ diff --git a/examples/vhost/ioat.h b/examples/vhost/ioat.h deleted file mode 100644 index d9bf717e8d..0000000000 --- a/examples/vhost/ioat.h +++ /dev/null @@ -1,63 +0,0 @@ -/* SPDX-License-Identifier: BSD-3-Clause - * Copyright(c) 2010-2020 Intel Corporation - */ - -#ifndef _IOAT_H_ -#define _IOAT_H_ - -#include -#include -#include - -#define MAX_VHOST_DEVICE 1024 -#define IOAT_RING_SIZE 4096 -#define MAX_ENQUEUED_SIZE 4096 - -struct dma_info { - struct rte_pci_addr addr; - uint16_t dev_id; - bool is_valid; -}; - -struct dma_for_vhost { - struct dma_info dmas[RTE_MAX_QUEUES_PER_PORT * 2]; - uint16_t nr; -}; - -#ifdef RTE_RAW_IOAT -int open_ioat(const char *value); - -int32_t -ioat_transfer_data_cb(int vid, uint16_t queue_id, - struct rte_vhost_iov_iter *iov_iter, - struct rte_vhost_async_status *opaque_data, uint16_t count); - -int32_t -ioat_check_completed_copies_cb(int vid, uint16_t queue_id, - struct rte_vhost_async_status *opaque_data, - uint16_t max_packets); -#else -static int open_ioat(const char *value __rte_unused) -{ - return -1; -} - -static int32_t -ioat_transfer_data_cb(int vid __rte_unused, uint16_t queue_id __rte_unused, - struct rte_vhost_iov_iter *iov_iter __rte_unused, - struct rte_vhost_async_status *opaque_data __rte_unused, - uint16_t count __rte_unused) -{ - return -1; -} - -static int32_t -ioat_check_completed_copies_cb(int vid __rte_unused, - uint16_t queue_id __rte_unused, - struct rte_vhost_async_status *opaque_data __rte_unused, - uint16_t max_packets __rte_unused) -{ - return -1; -} -#endif -#endif /* _IOAT_H_ */ diff --git a/examples/vhost/main.c b/examples/vhost/main.c index 590a77c723..b2c272059e 100644 --- a/examples/vhost/main.c +++ b/examples/vhost/main.c @@ -24,8 +24,9 @@ #include #include #include +#include +#include -#include "ioat.h" #include "main.h" #ifndef MAX_QUEUES @@ -56,6 +57,13 @@ #define RTE_TEST_TX_DESC_DEFAULT 512 #define INVALID_PORT_ID 0xFF +#define INVALID_DMA_ID -1 + +#define DMA_RING_SIZE 4096 + +struct dma_for_vhost dma_bind[RTE_MAX_VHOST_DEVICE]; +int16_t dmas_id[RTE_DMADEV_DEFAULT_MAX]; +static int dma_count; /* mask of enabled ports */ static uint32_t enabled_port_mask = 0; @@ -94,10 +102,6 @@ static int client_mode; static int builtin_net_driver; -static int async_vhost_driver; - -static char *dma_type; - /* Specify timeout (in useconds) between retries on RX. */ static uint32_t burst_rx_delay_time = BURST_RX_WAIT_US; /* Specify the number of retries on RX. */ @@ -191,18 +195,150 @@ struct mbuf_table lcore_tx_queue[RTE_MAX_LCORE]; * Every data core maintains a TX buffer for every vhost device, * which is used for batch pkts enqueue for higher performance. 
*/ -struct vhost_bufftable *vhost_txbuff[RTE_MAX_LCORE * MAX_VHOST_DEVICE]; +struct vhost_bufftable *vhost_txbuff[RTE_MAX_LCORE * RTE_MAX_VHOST_DEVICE]; #define MBUF_TABLE_DRAIN_TSC ((rte_get_tsc_hz() + US_PER_S - 1) \ / US_PER_S * BURST_TX_DRAIN_US) +static inline bool +is_dma_configured(int16_t dev_id) +{ + int i; + + for (i = 0; i < dma_count; i++) + if (dmas_id[i] == dev_id) + return true; + return false; +} + static inline int open_dma(const char *value) { - if (dma_type != NULL && strncmp(dma_type, "ioat", 4) == 0) - return open_ioat(value); + struct dma_for_vhost *dma_info = dma_bind; + char *input = strndup(value, strlen(value) + 1); + char *addrs = input; + char *ptrs[2]; + char *start, *end, *substr; + int64_t vid; + + struct rte_dma_info info; + struct rte_dma_conf dev_config = { .nb_vchans = 1 }; + struct rte_dma_vchan_conf qconf = { + .direction = RTE_DMA_DIR_MEM_TO_MEM, + .nb_desc = DMA_RING_SIZE + }; + + int dev_id; + int ret = 0; + uint16_t i = 0; + char *dma_arg[RTE_MAX_VHOST_DEVICE]; + int args_nr; + + while (isblank(*addrs)) + addrs++; + if (*addrs == '\0') { + ret = -1; + goto out; + } + + /* process DMA devices within bracket. */ + addrs++; + substr = strtok(addrs, ";]"); + if (!substr) { + ret = -1; + goto out; + } + + args_nr = rte_strsplit(substr, strlen(substr), dma_arg, RTE_MAX_VHOST_DEVICE, ','); + if (args_nr <= 0) { + ret = -1; + goto out; + } + + while (i < args_nr) { + char *arg_temp = dma_arg[i]; + uint8_t sub_nr; + + sub_nr = rte_strsplit(arg_temp, strlen(arg_temp), ptrs, 2, '@'); + if (sub_nr != 2) { + ret = -1; + goto out; + } + + start = strstr(ptrs[0], "txd"); + if (start == NULL) { + ret = -1; + goto out; + } + + start += 3; + vid = strtol(start, &end, 0); + if (end == start) { + ret = -1; + goto out; + } + + dev_id = rte_dma_get_dev_id_by_name(ptrs[1]); + if (dev_id < 0) { + RTE_LOG(ERR, VHOST_CONFIG, "Fail to find DMA %s.\n", ptrs[1]); + ret = -1; + goto out; + } + + /* DMA device is already configured, so skip */ + if (is_dma_configured(dev_id)) + goto done; + + if (rte_dma_info_get(dev_id, &info) != 0) { + RTE_LOG(ERR, VHOST_CONFIG, "Error with rte_dma_info_get()\n"); + ret = -1; + goto out; + } + + if (info.max_vchans < 1) { + RTE_LOG(ERR, VHOST_CONFIG, "No channels available on device %d\n", dev_id); + ret = -1; + goto out; + } - return -1; + if (rte_dma_configure(dev_id, &dev_config) != 0) { + RTE_LOG(ERR, VHOST_CONFIG, "Fail to configure DMA %d.\n", dev_id); + ret = -1; + goto out; + } + + /* Check the max desc supported by DMA device */ + rte_dma_info_get(dev_id, &info); + if (info.nb_vchans != 1) { + RTE_LOG(ERR, VHOST_CONFIG, "No configured queues reported by DMA %d.\n", + dev_id); + ret = -1; + goto out; + } + + qconf.nb_desc = RTE_MIN(DMA_RING_SIZE, info.max_desc); + + if (rte_dma_vchan_setup(dev_id, 0, &qconf) != 0) { + RTE_LOG(ERR, VHOST_CONFIG, "Fail to set up DMA %d.\n", dev_id); + ret = -1; + goto out; + } + + if (rte_dma_start(dev_id) != 0) { + RTE_LOG(ERR, VHOST_CONFIG, "Fail to start DMA %u.\n", dev_id); + ret = -1; + goto out; + } + + dmas_id[dma_count++] = dev_id; + +done: + (dma_info + vid)->dmas[VIRTIO_RXQ].dev_id = dev_id; + i++; + } +out: + free(input); + return ret; } /* @@ -500,8 +636,6 @@ enum { OPT_CLIENT_NUM, #define OPT_BUILTIN_NET_DRIVER "builtin-net-driver" OPT_BUILTIN_NET_DRIVER_NUM, -#define OPT_DMA_TYPE "dma-type" - OPT_DMA_TYPE_NUM, #define OPT_DMAS "dmas" OPT_DMAS_NUM, }; @@ -539,8 +673,6 @@ us_vhost_parse_args(int argc, char **argv) NULL, OPT_CLIENT_NUM}, {OPT_BUILTIN_NET_DRIVER, no_argument, NULL, 
OPT_BUILTIN_NET_DRIVER_NUM}, - {OPT_DMA_TYPE, required_argument, - NULL, OPT_DMA_TYPE_NUM}, {OPT_DMAS, required_argument, NULL, OPT_DMAS_NUM}, {NULL, 0, 0, 0}, @@ -661,10 +793,6 @@ us_vhost_parse_args(int argc, char **argv) } break; - case OPT_DMA_TYPE_NUM: - dma_type = optarg; - break; - case OPT_DMAS_NUM: if (open_dma(optarg) == -1) { RTE_LOG(INFO, VHOST_CONFIG, @@ -672,7 +800,6 @@ us_vhost_parse_args(int argc, char **argv) us_vhost_usage(prgname); return -1; } - async_vhost_driver = 1; break; case OPT_CLIENT_NUM: @@ -841,9 +968,10 @@ complete_async_pkts(struct vhost_dev *vdev) { struct rte_mbuf *p_cpl[MAX_PKT_BURST]; uint16_t complete_count; + int16_t dma_id = dma_bind[vdev->vid].dmas[VIRTIO_RXQ].dev_id; complete_count = rte_vhost_poll_enqueue_completed(vdev->vid, - VIRTIO_RXQ, p_cpl, MAX_PKT_BURST); + VIRTIO_RXQ, p_cpl, MAX_PKT_BURST, dma_id, 0); if (complete_count) { free_pkts(p_cpl, complete_count); __atomic_sub_fetch(&vdev->pkts_inflight, complete_count, __ATOMIC_SEQ_CST); @@ -877,17 +1005,18 @@ static __rte_always_inline void drain_vhost(struct vhost_dev *vdev) { uint16_t ret; - uint32_t buff_idx = rte_lcore_id() * MAX_VHOST_DEVICE + vdev->vid; + uint32_t buff_idx = rte_lcore_id() * RTE_MAX_VHOST_DEVICE + vdev->vid; uint16_t nr_xmit = vhost_txbuff[buff_idx]->len; struct rte_mbuf **m = vhost_txbuff[buff_idx]->m_table; if (builtin_net_driver) { ret = vs_enqueue_pkts(vdev, VIRTIO_RXQ, m, nr_xmit); - } else if (async_vhost_driver) { + } else if (dma_bind[vdev->vid].dmas[VIRTIO_RXQ].async_enabled) { uint16_t enqueue_fail = 0; + int16_t dma_id = dma_bind[vdev->vid].dmas[VIRTIO_RXQ].dev_id; complete_async_pkts(vdev); - ret = rte_vhost_submit_enqueue_burst(vdev->vid, VIRTIO_RXQ, m, nr_xmit); + ret = rte_vhost_submit_enqueue_burst(vdev->vid, VIRTIO_RXQ, m, nr_xmit, dma_id, 0); __atomic_add_fetch(&vdev->pkts_inflight, ret, __ATOMIC_SEQ_CST); enqueue_fail = nr_xmit - ret; @@ -905,7 +1034,7 @@ drain_vhost(struct vhost_dev *vdev) __ATOMIC_SEQ_CST); } - if (!async_vhost_driver) + if (!dma_bind[vdev->vid].dmas[VIRTIO_RXQ].async_enabled) free_pkts(m, nr_xmit); } @@ -921,7 +1050,7 @@ drain_vhost_table(void) if (unlikely(vdev->remove == 1)) continue; - vhost_txq = vhost_txbuff[lcore_id * MAX_VHOST_DEVICE + vhost_txq = vhost_txbuff[lcore_id * RTE_MAX_VHOST_DEVICE + vdev->vid]; cur_tsc = rte_rdtsc(); @@ -970,7 +1099,7 @@ virtio_tx_local(struct vhost_dev *vdev, struct rte_mbuf *m) return 0; } - vhost_txq = vhost_txbuff[lcore_id * MAX_VHOST_DEVICE + dst_vdev->vid]; + vhost_txq = vhost_txbuff[lcore_id * RTE_MAX_VHOST_DEVICE + dst_vdev->vid]; vhost_txq->m_table[vhost_txq->len++] = m; if (enable_stats) { @@ -1211,12 +1340,13 @@ drain_eth_rx(struct vhost_dev *vdev) if (builtin_net_driver) { enqueue_count = vs_enqueue_pkts(vdev, VIRTIO_RXQ, pkts, rx_count); - } else if (async_vhost_driver) { + } else if (dma_bind[vdev->vid].dmas[VIRTIO_RXQ].async_enabled) { uint16_t enqueue_fail = 0; + int16_t dma_id = dma_bind[vdev->vid].dmas[VIRTIO_RXQ].dev_id; complete_async_pkts(vdev); enqueue_count = rte_vhost_submit_enqueue_burst(vdev->vid, - VIRTIO_RXQ, pkts, rx_count); + VIRTIO_RXQ, pkts, rx_count, dma_id, 0); __atomic_add_fetch(&vdev->pkts_inflight, enqueue_count, __ATOMIC_SEQ_CST); enqueue_fail = rx_count - enqueue_count; @@ -1235,7 +1365,7 @@ drain_eth_rx(struct vhost_dev *vdev) __ATOMIC_SEQ_CST); } - if (!async_vhost_driver) + if (!dma_bind[vdev->vid].dmas[VIRTIO_RXQ].async_enabled) free_pkts(pkts, rx_count); } @@ -1357,7 +1487,7 @@ destroy_device(int vid) } for (i = 0; i < RTE_MAX_LCORE; i++) - 
rte_free(vhost_txbuff[i * MAX_VHOST_DEVICE + vid]); + rte_free(vhost_txbuff[i * RTE_MAX_VHOST_DEVICE + vid]); if (builtin_net_driver) vs_vhost_net_remove(vdev); @@ -1387,18 +1517,20 @@ destroy_device(int vid) "(%d) device has been removed from data core\n", vdev->vid); - if (async_vhost_driver) { + if (dma_bind[vid].dmas[VIRTIO_RXQ].async_enabled) { uint16_t n_pkt = 0; + int16_t dma_id = dma_bind[vid].dmas[VIRTIO_RXQ].dev_id; struct rte_mbuf *m_cpl[vdev->pkts_inflight]; while (vdev->pkts_inflight) { n_pkt = rte_vhost_clear_queue_thread_unsafe(vid, VIRTIO_RXQ, - m_cpl, vdev->pkts_inflight); + m_cpl, vdev->pkts_inflight, dma_id, 0); free_pkts(m_cpl, n_pkt); __atomic_sub_fetch(&vdev->pkts_inflight, n_pkt, __ATOMIC_SEQ_CST); } rte_vhost_async_channel_unregister(vid, VIRTIO_RXQ); + dma_bind[vid].dmas[VIRTIO_RXQ].async_enabled = false; } rte_free(vdev); @@ -1425,12 +1557,12 @@ new_device(int vid) vdev->vid = vid; for (i = 0; i < RTE_MAX_LCORE; i++) { - vhost_txbuff[i * MAX_VHOST_DEVICE + vid] + vhost_txbuff[i * RTE_MAX_VHOST_DEVICE + vid] = rte_zmalloc("vhost bufftable", sizeof(struct vhost_bufftable), RTE_CACHE_LINE_SIZE); - if (vhost_txbuff[i * MAX_VHOST_DEVICE + vid] == NULL) { + if (vhost_txbuff[i * RTE_MAX_VHOST_DEVICE + vid] == NULL) { RTE_LOG(INFO, VHOST_DATA, "(%d) couldn't allocate memory for vhost TX\n", vid); return -1; @@ -1468,20 +1600,13 @@ new_device(int vid) "(%d) device has been added to data core %d\n", vid, vdev->coreid); - if (async_vhost_driver) { - struct rte_vhost_async_config config = {0}; - struct rte_vhost_async_channel_ops channel_ops; - - if (dma_type != NULL && strncmp(dma_type, "ioat", 4) == 0) { - channel_ops.transfer_data = ioat_transfer_data_cb; - channel_ops.check_completed_copies = - ioat_check_completed_copies_cb; - - config.features = RTE_VHOST_ASYNC_INORDER; + if (dma_bind[vid].dmas[VIRTIO_RXQ].dev_id != INVALID_DMA_ID) { + int ret; - return rte_vhost_async_channel_register(vid, VIRTIO_RXQ, - config, &channel_ops); - } + ret = rte_vhost_async_channel_register(vid, VIRTIO_RXQ); + if (ret == 0) + dma_bind[vid].dmas[VIRTIO_RXQ].async_enabled = true; + return ret; } return 0; @@ -1502,14 +1627,15 @@ vring_state_changed(int vid, uint16_t queue_id, int enable) if (queue_id != VIRTIO_RXQ) return 0; - if (async_vhost_driver) { + if (dma_bind[vid].dmas[queue_id].async_enabled) { if (!enable) { uint16_t n_pkt = 0; + int16_t dma_id = dma_bind[vid].dmas[VIRTIO_RXQ].dev_id; struct rte_mbuf *m_cpl[vdev->pkts_inflight]; while (vdev->pkts_inflight) { n_pkt = rte_vhost_clear_queue_thread_unsafe(vid, queue_id, - m_cpl, vdev->pkts_inflight); + m_cpl, vdev->pkts_inflight, dma_id, 0); free_pkts(m_cpl, n_pkt); __atomic_sub_fetch(&vdev->pkts_inflight, n_pkt, __ATOMIC_SEQ_CST); } @@ -1657,6 +1783,24 @@ create_mbuf_pool(uint16_t nr_port, uint32_t nr_switch_core, uint32_t mbuf_size, rte_exit(EXIT_FAILURE, "Cannot create mbuf pool\n"); } +static void +reset_dma(void) +{ + int i; + + for (i = 0; i < RTE_MAX_VHOST_DEVICE; i++) { + int j; + + for (j = 0; j < RTE_MAX_QUEUES_PER_PORT * 2; j++) { + dma_bind[i].dmas[j].dev_id = INVALID_DMA_ID; + dma_bind[i].dmas[j].async_enabled = false; + } + } + + for (i = 0; i < RTE_DMADEV_DEFAULT_MAX; i++) + dmas_id[i] = INVALID_DMA_ID; +} + /* * Main function, does initialisation and calls the per-lcore functions. 
*/ @@ -1679,6 +1823,9 @@ main(int argc, char *argv[]) argc -= ret; argv += ret; + /* initialize dma structures */ + reset_dma(); + /* parse app arguments */ ret = us_vhost_parse_args(argc, argv); if (ret < 0) @@ -1754,11 +1901,21 @@ main(int argc, char *argv[]) if (client_mode) flags |= RTE_VHOST_USER_CLIENT; + if (dma_count) { + if (rte_vhost_async_dma_configure(dmas_id, dma_count, 1) < 0) { + RTE_LOG(ERR, VHOST_PORT, "Failed to configure DMA in vhost.\n"); + for (i = 0; i < dma_count; i++) + if (dmas_id[i] >= 0) + rte_dma_stop(dmas_id[i]); + rte_exit(EXIT_FAILURE, "Cannot use given DMA devices\n"); + } + } + /* Register vhost user driver to handle vhost messages. */ for (i = 0; i < nb_sockets; i++) { char *file = socket_files + i * PATH_MAX; - if (async_vhost_driver) + if (dma_count) flags = flags | RTE_VHOST_USER_ASYNC_COPY; ret = rte_vhost_driver_register(file, flags); diff --git a/examples/vhost/main.h b/examples/vhost/main.h index e7b1ac60a6..b4a453e77e 100644 --- a/examples/vhost/main.h +++ b/examples/vhost/main.h @@ -8,6 +8,7 @@ #include #include +#include /* Macros for printing using RTE_LOG */ #define RTE_LOGTYPE_VHOST_CONFIG RTE_LOGTYPE_USER1 @@ -79,6 +80,16 @@ struct lcore_info { struct vhost_dev_tailq_list vdev_list; }; +struct dma_info { + struct rte_pci_addr addr; + int16_t dev_id; + bool async_enabled; +}; + +struct dma_for_vhost { + struct dma_info dmas[RTE_MAX_QUEUES_PER_PORT * 2]; +}; + /* we implement non-extra virtio net features */ #define VIRTIO_NET_FEATURES 0 diff --git a/examples/vhost/meson.build b/examples/vhost/meson.build index 3efd5e6540..87a637f83f 100644 --- a/examples/vhost/meson.build +++ b/examples/vhost/meson.build @@ -12,13 +12,9 @@ if not is_linux endif deps += 'vhost' +deps += 'dmadev' allow_experimental_apis = true sources = files( 'main.c', 'virtio_net.c', ) - -if dpdk_conf.has('RTE_RAW_IOAT') - deps += 'raw_ioat' - sources += files('ioat.c') -endif diff --git a/lib/vhost/meson.build b/lib/vhost/meson.build index cdb37a4814..bc7272053b 100644 --- a/lib/vhost/meson.build +++ b/lib/vhost/meson.build @@ -36,4 +36,4 @@ headers = files( driver_sdk_headers = files( 'vdpa_driver.h', ) -deps += ['ethdev', 'cryptodev', 'hash', 'pci'] +deps += ['ethdev', 'cryptodev', 'hash', 'pci', 'dmadev'] diff --git a/lib/vhost/rte_vhost.h b/lib/vhost/rte_vhost.h index b454c05868..15c37dd26e 100644 --- a/lib/vhost/rte_vhost.h +++ b/lib/vhost/rte_vhost.h @@ -113,6 +113,8 @@ extern "C" { #define VHOST_USER_F_PROTOCOL_FEATURES 30 #endif +#define RTE_MAX_VHOST_DEVICE 1024 + struct rte_vdpa_device; /** diff --git a/lib/vhost/rte_vhost_async.h b/lib/vhost/rte_vhost_async.h index a87ea6ba37..758a80f403 100644 --- a/lib/vhost/rte_vhost_async.h +++ b/lib/vhost/rte_vhost_async.h @@ -26,73 +26,6 @@ struct rte_vhost_iov_iter { unsigned long nr_segs; }; -/** - * dma transfer status - */ -struct rte_vhost_async_status { - /** An array of application specific data for source memory */ - uintptr_t *src_opaque_data; - /** An array of application specific data for destination memory */ - uintptr_t *dst_opaque_data; -}; - -/** - * dma operation callbacks to be implemented by applications - */ -struct rte_vhost_async_channel_ops { - /** - * instruct async engines to perform copies for a batch of packets - * - * @param vid - * id of vhost device to perform data copies - * @param queue_id - * queue id to perform data copies - * @param iov_iter - * an array of IOV iterators - * @param opaque_data - * opaque data pair sending to DMA engine - * @param count - * number of elements in the "descs" 
array - * @return - * number of IOV iterators processed, negative value means error - */ - int32_t (*transfer_data)(int vid, uint16_t queue_id, - struct rte_vhost_iov_iter *iov_iter, - struct rte_vhost_async_status *opaque_data, - uint16_t count); - /** - * check copy-completed packets from the async engine - * @param vid - * id of vhost device to check copy completion - * @param queue_id - * queue id to check copy completion - * @param opaque_data - * buffer to receive the opaque data pair from DMA engine - * @param max_packets - * max number of packets could be completed - * @return - * number of async descs completed, negative value means error - */ - int32_t (*check_completed_copies)(int vid, uint16_t queue_id, - struct rte_vhost_async_status *opaque_data, - uint16_t max_packets); -}; - -/** - * async channel features - */ -enum { - RTE_VHOST_ASYNC_INORDER = 1U << 0, -}; - -/** - * async channel configuration - */ -struct rte_vhost_async_config { - uint32_t features; - uint32_t rsvd[2]; -}; - /** * Register an async channel for a vhost queue * @@ -100,17 +33,11 @@ struct rte_vhost_async_config { * vhost device id async channel to be attached to * @param queue_id * vhost queue id async channel to be attached to - * @param config - * Async channel configuration structure - * @param ops - * Async channel operation callbacks * @return * 0 on success, -1 on failures */ __rte_experimental -int rte_vhost_async_channel_register(int vid, uint16_t queue_id, - struct rte_vhost_async_config config, - struct rte_vhost_async_channel_ops *ops); +int rte_vhost_async_channel_register(int vid, uint16_t queue_id); /** * Unregister an async channel for a vhost queue @@ -136,17 +63,11 @@ int rte_vhost_async_channel_unregister(int vid, uint16_t queue_id); * vhost device id async channel to be attached to * @param queue_id * vhost queue id async channel to be attached to - * @param config - * Async channel configuration - * @param ops - * Async channel operation callbacks * @return * 0 on success, -1 on failures */ __rte_experimental -int rte_vhost_async_channel_register_thread_unsafe(int vid, uint16_t queue_id, - struct rte_vhost_async_config config, - struct rte_vhost_async_channel_ops *ops); +int rte_vhost_async_channel_register_thread_unsafe(int vid, uint16_t queue_id); /** * Unregister an async channel for a vhost queue without performing any @@ -179,12 +100,17 @@ int rte_vhost_async_channel_unregister_thread_unsafe(int vid, * array of packets to be enqueued * @param count * packets num to be enqueued + * @param dma_id + * the identifier of the DMA device + * @param vchan_id + * the identifier of virtual DMA channel * @return * num of packets enqueued */ __rte_experimental uint16_t rte_vhost_submit_enqueue_burst(int vid, uint16_t queue_id, - struct rte_mbuf **pkts, uint16_t count); + struct rte_mbuf **pkts, uint16_t count, int16_t dma_id, + uint16_t vchan_id); /** * This function checks async completion status for a specific vhost @@ -199,12 +125,17 @@ uint16_t rte_vhost_submit_enqueue_burst(int vid, uint16_t queue_id, * blank array to get return packet pointer * @param count * size of the packet array + * @param dma_id + * the identifier of the DMA device + * @param vchan_id + * the identifier of virtual DMA channel * @return * num of packets returned */ __rte_experimental uint16_t rte_vhost_poll_enqueue_completed(int vid, uint16_t queue_id, - struct rte_mbuf **pkts, uint16_t count); + struct rte_mbuf **pkts, uint16_t count, int16_t dma_id, + uint16_t vchan_id); /** * This function returns the amount 
of in-flight packets for the vhost @@ -235,11 +166,44 @@ int rte_vhost_async_get_inflight(int vid, uint16_t queue_id); * Blank array to get return packet pointer * @param count * Size of the packet array + * @param dma_id + * the identifier of the DMA device + * @param vchan_id + * the identifier of virtual DMA channel * @return * Number of packets returned */ __rte_experimental uint16_t rte_vhost_clear_queue_thread_unsafe(int vid, uint16_t queue_id, - struct rte_mbuf **pkts, uint16_t count); + struct rte_mbuf **pkts, uint16_t count, int16_t dma_id, + uint16_t vchan_id); +/** + * The DMA vChannels used in asynchronous data path must be configured + * first. So this function needs to be called before enabling DMA + * acceleration for vring. If this function fails, asynchronous data path + * cannot be enabled for any vring further. + * + * DMA devices used in data-path must belong to DMA devices given in this + * function. But users are free to use DMA devices given in the function + * in non-vhost scenarios, only if guarantee no copies in vhost are + * offloaded to them at the same time. + * + * @param dmas_id + * DMA ID array + * @param count + * Element number of 'dmas_id' + * @param poll_factor + * For large or scatter-gather packets, one packet would consist of + * small buffers. In this case, vhost will issue several DMA copy + * operations for the packet. Therefore, the number of copies to + * check by rte_dma_completed() is calculated by "nb_pkts_to_poll * + * poll_factor" andused in rte_vhost_poll_enqueue_completed(). The + * default value of "poll_factor" is 1. + * @return + * 0 on success, and -1 on failure + */ +__rte_experimental +int rte_vhost_async_dma_configure(int16_t *dmas_id, uint16_t count, + uint16_t poll_factor); #endif /* _RTE_VHOST_ASYNC_H_ */ diff --git a/lib/vhost/version.map b/lib/vhost/version.map index a7ef7f1976..1202ba9c1a 100644 --- a/lib/vhost/version.map +++ b/lib/vhost/version.map @@ -84,6 +84,9 @@ EXPERIMENTAL { # added in 21.11 rte_vhost_get_monitor_addr; + + # added in 22.03 + rte_vhost_async_dma_configure; }; INTERNAL { diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c index 13a9bb9dd1..c408cee63e 100644 --- a/lib/vhost/vhost.c +++ b/lib/vhost/vhost.c @@ -25,7 +25,7 @@ #include "vhost.h" #include "vhost_user.h" -struct virtio_net *vhost_devices[MAX_VHOST_DEVICE]; +struct virtio_net *vhost_devices[RTE_MAX_VHOST_DEVICE]; pthread_mutex_t vhost_dev_lock = PTHREAD_MUTEX_INITIALIZER; /* Called with iotlb_lock read-locked */ @@ -344,6 +344,7 @@ vhost_free_async_mem(struct vhost_virtqueue *vq) return; rte_free(vq->async->pkts_info); + rte_free(vq->async->pkts_cmpl_flag); rte_free(vq->async->buffers_packed); vq->async->buffers_packed = NULL; @@ -667,12 +668,12 @@ vhost_new_device(void) int i; pthread_mutex_lock(&vhost_dev_lock); - for (i = 0; i < MAX_VHOST_DEVICE; i++) { + for (i = 0; i < RTE_MAX_VHOST_DEVICE; i++) { if (vhost_devices[i] == NULL) break; } - if (i == MAX_VHOST_DEVICE) { + if (i == RTE_MAX_VHOST_DEVICE) { VHOST_LOG_CONFIG(ERR, "Failed to find a free slot for new device.\n"); pthread_mutex_unlock(&vhost_dev_lock); @@ -1626,8 +1627,7 @@ rte_vhost_extern_callback_register(int vid, } static __rte_always_inline int -async_channel_register(int vid, uint16_t queue_id, - struct rte_vhost_async_channel_ops *ops) +async_channel_register(int vid, uint16_t queue_id) { struct virtio_net *dev = get_device(vid); struct vhost_virtqueue *vq = dev->virtqueue[queue_id]; @@ -1656,6 +1656,14 @@ async_channel_register(int vid, uint16_t queue_id, goto out_free_async; } 
+ async->pkts_cmpl_flag = rte_zmalloc_socket(NULL, vq->size * sizeof(bool), + RTE_CACHE_LINE_SIZE, node); + if (!async->pkts_cmpl_flag) { + VHOST_LOG_CONFIG(ERR, "failed to allocate async pkts_cmpl_flag (vid %d, qid: %d)\n", + vid, queue_id); + goto out_free_async; + } + if (vq_is_packed(dev)) { async->buffers_packed = rte_malloc_socket(NULL, vq->size * sizeof(struct vring_used_elem_packed), @@ -1676,9 +1684,6 @@ async_channel_register(int vid, uint16_t queue_id, } } - async->ops.check_completed_copies = ops->check_completed_copies; - async->ops.transfer_data = ops->transfer_data; - vq->async = async; return 0; @@ -1691,15 +1696,13 @@ async_channel_register(int vid, uint16_t queue_id, } int -rte_vhost_async_channel_register(int vid, uint16_t queue_id, - struct rte_vhost_async_config config, - struct rte_vhost_async_channel_ops *ops) +rte_vhost_async_channel_register(int vid, uint16_t queue_id) { struct vhost_virtqueue *vq; struct virtio_net *dev = get_device(vid); int ret; - if (dev == NULL || ops == NULL) + if (dev == NULL) return -1; if (queue_id >= VHOST_MAX_VRING) @@ -1710,33 +1713,20 @@ rte_vhost_async_channel_register(int vid, uint16_t queue_id, if (unlikely(vq == NULL || !dev->async_copy)) return -1; - if (unlikely(!(config.features & RTE_VHOST_ASYNC_INORDER))) { - VHOST_LOG_CONFIG(ERR, - "async copy is not supported on non-inorder mode " - "(vid %d, qid: %d)\n", vid, queue_id); - return -1; - } - - if (unlikely(ops->check_completed_copies == NULL || - ops->transfer_data == NULL)) - return -1; - rte_spinlock_lock(&vq->access_lock); - ret = async_channel_register(vid, queue_id, ops); + ret = async_channel_register(vid, queue_id); rte_spinlock_unlock(&vq->access_lock); return ret; } int -rte_vhost_async_channel_register_thread_unsafe(int vid, uint16_t queue_id, - struct rte_vhost_async_config config, - struct rte_vhost_async_channel_ops *ops) +rte_vhost_async_channel_register_thread_unsafe(int vid, uint16_t queue_id) { struct vhost_virtqueue *vq; struct virtio_net *dev = get_device(vid); - if (dev == NULL || ops == NULL) + if (dev == NULL) return -1; if (queue_id >= VHOST_MAX_VRING) @@ -1747,18 +1737,7 @@ rte_vhost_async_channel_register_thread_unsafe(int vid, uint16_t queue_id, if (unlikely(vq == NULL || !dev->async_copy)) return -1; - if (unlikely(!(config.features & RTE_VHOST_ASYNC_INORDER))) { - VHOST_LOG_CONFIG(ERR, - "async copy is not supported on non-inorder mode " - "(vid %d, qid: %d)\n", vid, queue_id); - return -1; - } - - if (unlikely(ops->check_completed_copies == NULL || - ops->transfer_data == NULL)) - return -1; - - return async_channel_register(vid, queue_id, ops); + return async_channel_register(vid, queue_id); } int @@ -1835,6 +1814,95 @@ rte_vhost_async_channel_unregister_thread_unsafe(int vid, uint16_t queue_id) return 0; } +static __rte_always_inline void +vhost_free_async_dma_mem(void) +{ + uint16_t i; + + for (i = 0; i < RTE_DMADEV_DEFAULT_MAX; i++) { + struct async_dma_info *dma = &dma_copy_track[i]; + int16_t j; + + if (dma->max_vchans == 0) + continue; + + for (j = 0; j < dma->max_vchans; j++) + rte_free(dma->vchans[j].pkts_completed_flag); + + rte_free(dma->vchans); + dma->vchans = NULL; + dma->max_vchans = 0; + } +} + +int +rte_vhost_async_dma_configure(int16_t *dmas_id, uint16_t count, uint16_t poll_factor) +{ + uint16_t i; + + if (!dmas_id) { + VHOST_LOG_CONFIG(ERR, "Invalid DMA configuration parameter.\n"); + return -1; + } + + if (poll_factor == 0) { + VHOST_LOG_CONFIG(ERR, "Invalid DMA poll factor %u\n", poll_factor); + return -1; + } + 
dma_poll_factor = poll_factor; + + for (i = 0; i < count; i++) { + struct async_dma_vchan_info *vchans; + struct rte_dma_info info; + uint16_t max_vchans; + uint16_t max_desc; + uint16_t j; + + if (!rte_dma_is_valid(dmas_id[i])) { + VHOST_LOG_CONFIG(ERR, "DMA %d is not found. Cannot enable async" + " data-path\n.", dmas_id[i]); + vhost_free_async_dma_mem(); + return -1; + } + + rte_dma_info_get(dmas_id[i], &info); + + max_vchans = info.max_vchans; + max_desc = info.max_desc; + + if (!rte_is_power_of_2(max_desc)) + max_desc = rte_align32pow2(max_desc); + + vchans = rte_zmalloc(NULL, sizeof(struct async_dma_vchan_info) * max_vchans, + RTE_CACHE_LINE_SIZE); + if (vchans == NULL) { + VHOST_LOG_CONFIG(ERR, "Failed to allocate vchans for dma-%d." + " Cannot enable async data-path.\n", dmas_id[i]); + vhost_free_async_dma_mem(); + return -1; + } + + for (j = 0; j < max_vchans; j++) { + vchans[j].pkts_completed_flag = rte_zmalloc(NULL, sizeof(bool *) * max_desc, + RTE_CACHE_LINE_SIZE); + if (!vchans[j].pkts_completed_flag) { + VHOST_LOG_CONFIG(ERR, "Failed to allocate pkts_completed_flag for " + "dma-%d vchan-%u\n", dmas_id[i], j); + vhost_free_async_dma_mem(); + return -1; + } + + vchans[j].ring_size = max_desc; + vchans[j].ring_mask = max_desc - 1; + } + + dma_copy_track[dmas_id[i]].vchans = vchans; + dma_copy_track[dmas_id[i]].max_vchans = max_vchans; + } + + return 0; +} + int rte_vhost_async_get_inflight(int vid, uint16_t queue_id) { diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h index 7085e0885c..475843fec0 100644 --- a/lib/vhost/vhost.h +++ b/lib/vhost/vhost.h @@ -19,6 +19,7 @@ #include #include #include +#include #include "rte_vhost.h" #include "rte_vdpa.h" @@ -50,6 +51,7 @@ #define VHOST_MAX_ASYNC_IT (MAX_PKT_BURST) #define VHOST_MAX_ASYNC_VEC 2048 +#define VHOST_ASYNC_DMA_BATCHING_SIZE 32 #define PACKED_DESC_ENQUEUE_USED_FLAG(w) \ ((w) ? (VRING_DESC_F_AVAIL | VRING_DESC_F_USED | VRING_DESC_F_WRITE) : \ @@ -119,6 +121,42 @@ struct vring_used_elem_packed { uint32_t count; }; +struct async_dma_vchan_info { + /* circular array to track if packet copy completes */ + bool **pkts_completed_flag; + + /* max elements in 'metadata' */ + uint16_t ring_size; + /* ring index mask for 'metadata' */ + uint16_t ring_mask; + + /* batching copies before a DMA doorbell */ + uint16_t nr_batching; + + /** + * DMA virtual channel lock. Although it is able to bind DMA + * virtual channels to data plane threads, vhost control plane + * thread could call data plane functions too, thus causing + * DMA device contention. + * + * For example, in VM exit case, vhost control plane thread needs + * to clear in-flight packets before disable vring, but there could + * be anotther data plane thread is enqueuing packets to the same + * vring with the same DMA virtual channel. But dmadev PMD functions + * are lock-free, so the control plane and data plane threads + * could operate the same DMA virtual channel at the same time. 
+ */ + rte_spinlock_t dma_lock; +}; + +struct async_dma_info { + uint16_t max_vchans; + struct async_dma_vchan_info *vchans; +}; + +extern struct async_dma_info dma_copy_track[RTE_DMADEV_DEFAULT_MAX]; +extern uint16_t dma_poll_factor; + /** * inflight async packet information */ @@ -129,9 +167,6 @@ struct async_inflight_info { }; struct vhost_async { - /* operation callbacks for DMA */ - struct rte_vhost_async_channel_ops ops; - struct rte_vhost_iov_iter iov_iter[VHOST_MAX_ASYNC_IT]; struct rte_vhost_iovec iovec[VHOST_MAX_ASYNC_VEC]; uint16_t iter_idx; @@ -139,6 +174,25 @@ struct vhost_async { /* data transfer status */ struct async_inflight_info *pkts_info; + /** + * Packet reorder array. "true" indicates that DMA device + * completes all copies for the packet. + * + * Note that this array could be written by multiple threads + * simultaneously. For example, in the case of thread0 and + * thread1 RX packets from NIC and then enqueue packets to + * vring0 and vring1 with own DMA device DMA0 and DMA1, it's + * possible for thread0 to get completed copies belonging to + * vring1 from DMA0, while thread0 is calling rte_vhost_poll + * _enqueue_completed() for vring0 and thread1 is calling + * rte_vhost_submit_enqueue_burst() for vring1. In this case, + * vq->access_lock cannot protect pkts_cmpl_flag of vring1. + * + * However, since offloading is per-packet basis, each packet + * flag will only be written by one thread. And single byte + * write is atomic, so no lock for pkts_cmpl_flag is needed. + */ + bool *pkts_cmpl_flag; uint16_t pkts_idx; uint16_t pkts_inflight_n; union { @@ -198,6 +252,7 @@ struct vhost_virtqueue { /* Record packed ring first dequeue desc index */ uint16_t shadow_last_used_idx; + uint16_t batch_copy_max_elems; uint16_t batch_copy_nb_elems; struct batch_copy_elem *batch_copy_elems; int numa_node; @@ -568,8 +623,7 @@ extern int vhost_data_log_level; #define PRINT_PACKET(device, addr, size, header) do {} while (0) #endif -#define MAX_VHOST_DEVICE 1024 -extern struct virtio_net *vhost_devices[MAX_VHOST_DEVICE]; +extern struct virtio_net *vhost_devices[RTE_MAX_VHOST_DEVICE]; #define VHOST_BINARY_SEARCH_THRESH 256 diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c index 5eb1dd6812..3147e72f04 100644 --- a/lib/vhost/vhost_user.c +++ b/lib/vhost/vhost_user.c @@ -527,6 +527,8 @@ vhost_user_set_vring_num(struct virtio_net **pdev, return RTE_VHOST_MSG_RESULT_ERR; } + vq->batch_copy_max_elems = vq->size; + return RTE_VHOST_MSG_RESULT_OK; } diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c index b3d954aab4..305f6cd562 100644 --- a/lib/vhost/virtio_net.c +++ b/lib/vhost/virtio_net.c @@ -11,6 +11,7 @@ #include #include #include +#include #include #include #include @@ -25,6 +26,10 @@ #define MAX_BATCH_LEN 256 +/* DMA device copy operation tracking array. 
*/ +struct async_dma_info dma_copy_track[RTE_DMADEV_DEFAULT_MAX]; +uint16_t dma_poll_factor = 1; + static __rte_always_inline bool rxvq_is_mergeable(struct virtio_net *dev) { @@ -43,6 +48,140 @@ is_valid_virt_queue_idx(uint32_t idx, int is_tx, uint32_t nr_vring) return (is_tx ^ (idx & 1)) == 0 && idx < nr_vring; } +static __rte_always_inline uint16_t +vhost_async_dma_transfer(struct vhost_virtqueue *vq, int16_t dma_id, + uint16_t vchan_id, uint16_t head_idx, + struct rte_vhost_iov_iter *pkts, uint16_t nr_pkts) +{ + struct async_dma_vchan_info *dma_info = &dma_copy_track[dma_id].vchans[vchan_id]; + uint16_t ring_mask = dma_info->ring_mask; + uint16_t pkt_idx, bce_idx = 0; + + rte_spinlock_lock(&dma_info->dma_lock); + + for (pkt_idx = 0; pkt_idx < nr_pkts; pkt_idx++) { + struct rte_vhost_iovec *iov = pkts[pkt_idx].iov; + int copy_idx, last_copy_idx = 0; + uint16_t nr_segs = pkts[pkt_idx].nr_segs; + uint16_t nr_sw_copy = 0; + uint16_t i; + + if (rte_dma_burst_capacity(dma_id, vchan_id) < nr_segs) + goto out; + + for (i = 0; i < nr_segs; i++) { + /* Fallback to SW copy if error happens */ + copy_idx = rte_dma_copy(dma_id, vchan_id, (rte_iova_t)iov[i].src_addr, + (rte_iova_t)iov[i].dst_addr, iov[i].len, + RTE_DMA_OP_FLAG_LLC); + if (unlikely(copy_idx < 0)) { + /* Find corresponding VA pair and do SW copy */ + rte_memcpy(vq->batch_copy_elems[bce_idx].dst, + vq->batch_copy_elems[bce_idx].src, + vq->batch_copy_elems[bce_idx].len); + nr_sw_copy++; + + /** + * All copies of the packet are performed + * by the CPU, set the packet completion flag + * to true, as all copies are done. + */ + if (nr_sw_copy == nr_segs) { + vq->async->pkts_cmpl_flag[head_idx % vq->size] = true; + break; + } else if (i == (nr_segs - 1)) { + /** + * A part of copies of current packet + * are enqueued to the DMA successfully + * but the last copy fails, store the + * packet completion flag address + * in the last DMA copy slot. + */ + dma_info->pkts_completed_flag[last_copy_idx & ring_mask] = + &vq->async->pkts_cmpl_flag[head_idx % vq->size]; + break; + } + } else + last_copy_idx = copy_idx; + + bce_idx++; + + /** + * Only store packet completion flag address in the last copy's + * slot, and other slots are set to NULL. + */ + if (i == (nr_segs - 1)) { + dma_info->pkts_completed_flag[copy_idx & ring_mask] = + &vq->async->pkts_cmpl_flag[head_idx % vq->size]; + } + } + + dma_info->nr_batching += nr_segs; + if (unlikely(dma_info->nr_batching >= VHOST_ASYNC_DMA_BATCHING_SIZE)) { + rte_dma_submit(dma_id, vchan_id); + dma_info->nr_batching = 0; + } + + head_idx++; + } + +out: + if (dma_info->nr_batching > 0) { + rte_dma_submit(dma_id, vchan_id); + dma_info->nr_batching = 0; + } + rte_spinlock_unlock(&dma_info->dma_lock); + vq->batch_copy_nb_elems = 0; + + return pkt_idx; +} + +static __rte_always_inline uint16_t +vhost_async_dma_check_completed(int16_t dma_id, uint16_t vchan_id, uint16_t max_pkts) +{ + struct async_dma_vchan_info *dma_info = &dma_copy_track[dma_id].vchans[vchan_id]; + uint16_t ring_mask = dma_info->ring_mask; + uint16_t last_idx = 0; + uint16_t nr_copies; + uint16_t copy_idx; + uint16_t i; + bool has_error = false; + + rte_spinlock_lock(&dma_info->dma_lock); + + /** + * Print error log for debugging, if DMA reports error during + * DMA transfer. We do not handle error in vhost level. 
+ */ + nr_copies = rte_dma_completed(dma_id, vchan_id, max_pkts, &last_idx, &has_error); + if (unlikely(has_error)) { + VHOST_LOG_DATA(ERR, "dma %d vchannel %u reports error in rte_dma_completed()\n", + dma_id, vchan_id); + } else if (nr_copies == 0) + goto out; + + copy_idx = last_idx - nr_copies + 1; + for (i = 0; i < nr_copies; i++) { + bool *flag; + + flag = dma_info->pkts_completed_flag[copy_idx & ring_mask]; + if (flag) { + /** + * Mark the packet flag as received. The flag + * could belong to another virtqueue but write + * is atomic. + */ + *flag = true; + dma_info->pkts_completed_flag[copy_idx & ring_mask] = NULL; + } + copy_idx++; + } + +out: + rte_spinlock_unlock(&dma_info->dma_lock); + return nr_copies; +} + static inline void do_data_copy_enqueue(struct virtio_net *dev, struct vhost_virtqueue *vq) { @@ -865,12 +1004,13 @@ async_iter_reset(struct vhost_async *async) static __rte_always_inline int async_mbuf_to_desc_seg(struct virtio_net *dev, struct vhost_virtqueue *vq, struct rte_mbuf *m, uint32_t mbuf_offset, - uint64_t buf_iova, uint32_t cpy_len) + uint64_t buf_addr, uint64_t buf_iova, uint32_t cpy_len) { struct vhost_async *async = vq->async; uint64_t mapped_len; uint32_t buf_offset = 0; void *hpa; + struct batch_copy_elem *bce = vq->batch_copy_elems; while (cpy_len) { hpa = (void *)(uintptr_t)gpa_to_first_hpa(dev, @@ -886,6 +1026,31 @@ async_mbuf_to_desc_seg(struct virtio_net *dev, struct vhost_virtqueue *vq, hpa, (size_t)mapped_len))) return -1; + /** + * Keep VA for all IOVA segments for falling back to SW + * copy in case of rte_dma_copy() error. + */ + if (unlikely(vq->batch_copy_nb_elems >= vq->batch_copy_max_elems)) { + struct batch_copy_elem *tmp; + uint16_t nb_elems = 2 * vq->batch_copy_max_elems; + + VHOST_LOG_DATA(DEBUG, "(%d) %s: run out of batch_copy_elems, " + "and realloc double elements.\n", dev->vid, __func__); + tmp = rte_realloc_socket(vq->batch_copy_elems, nb_elems * sizeof(*tmp), + RTE_CACHE_LINE_SIZE, vq->numa_node); + if (!tmp) { + VHOST_LOG_DATA(ERR, "Failed to re-alloc batch_copy_elems\n"); + return -1; + } + + vq->batch_copy_max_elems = nb_elems; + vq->batch_copy_elems = tmp; + bce = tmp; + } + bce[vq->batch_copy_nb_elems].dst = (void *)((uintptr_t)(buf_addr + buf_offset)); + bce[vq->batch_copy_nb_elems].src = rte_pktmbuf_mtod_offset(m, void *, mbuf_offset); + bce[vq->batch_copy_nb_elems++].len = mapped_len; + cpy_len -= (uint32_t)mapped_len; mbuf_offset += (uint32_t)mapped_len; buf_offset += (uint32_t)mapped_len; @@ -901,7 +1066,8 @@ sync_mbuf_to_desc_seg(struct virtio_net *dev, struct vhost_virtqueue *vq, { struct batch_copy_elem *batch_copy = vq->batch_copy_elems; - if (likely(cpy_len > MAX_BATCH_LEN || vq->batch_copy_nb_elems >= vq->size)) { + if (likely(cpy_len > MAX_BATCH_LEN || + vq->batch_copy_nb_elems >= vq->batch_copy_max_elems)) { rte_memcpy((void *)((uintptr_t)(buf_addr)), rte_pktmbuf_mtod_offset(m, void *, mbuf_offset), cpy_len); @@ -1020,8 +1186,10 @@ mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq, if (is_async) { if (async_mbuf_to_desc_seg(dev, vq, m, mbuf_offset, + buf_addr + buf_offset, buf_iova + buf_offset, cpy_len) < 0) goto error; + } else { sync_mbuf_to_desc_seg(dev, vq, m, mbuf_offset, buf_addr + buf_offset, @@ -1449,9 +1617,9 @@ store_dma_desc_info_packed(struct vring_used_elem_packed *s_ring, } static __rte_noinline uint32_t -virtio_dev_rx_async_submit_split(struct virtio_net *dev, - struct vhost_virtqueue *vq, uint16_t queue_id, - struct rte_mbuf **pkts, uint32_t count) 
+virtio_dev_rx_async_submit_split(struct virtio_net *dev, struct vhost_virtqueue *vq, + uint16_t queue_id, struct rte_mbuf **pkts, uint32_t count, + int16_t dma_id, uint16_t vchan_id) { struct buf_vector buf_vec[BUF_VECTOR_MAX]; uint32_t pkt_idx = 0; @@ -1503,17 +1671,16 @@ virtio_dev_rx_async_submit_split(struct virtio_net *dev, if (unlikely(pkt_idx == 0)) return 0; - n_xfer = async->ops.transfer_data(dev->vid, queue_id, async->iov_iter, 0, pkt_idx); - if (unlikely(n_xfer < 0)) { - VHOST_LOG_DATA(ERR, "(%d) %s: failed to transfer data for queue id %d.\n", - dev->vid, __func__, queue_id); - n_xfer = 0; - } + n_xfer = vhost_async_dma_transfer(vq, dma_id, vchan_id, async->pkts_idx, async->iov_iter, + pkt_idx); pkt_err = pkt_idx - n_xfer; if (unlikely(pkt_err)) { uint16_t num_descs = 0; + VHOST_LOG_DATA(DEBUG, "(%d) %s: failed to transfer %u packets for queue %u.\n", + dev->vid, __func__, pkt_err, queue_id); + /* update number of completed packets */ pkt_idx = n_xfer; @@ -1656,13 +1823,13 @@ dma_error_handler_packed(struct vhost_virtqueue *vq, uint16_t slot_idx, } static __rte_noinline uint32_t -virtio_dev_rx_async_submit_packed(struct virtio_net *dev, - struct vhost_virtqueue *vq, uint16_t queue_id, - struct rte_mbuf **pkts, uint32_t count) +virtio_dev_rx_async_submit_packed(struct virtio_net *dev, struct vhost_virtqueue *vq, + uint16_t queue_id, struct rte_mbuf **pkts, uint32_t count, + int16_t dma_id, uint16_t vchan_id) { uint32_t pkt_idx = 0; uint32_t remained = count; - int32_t n_xfer; + uint16_t n_xfer; uint16_t num_buffers; uint16_t num_descs; @@ -1670,6 +1837,7 @@ virtio_dev_rx_async_submit_packed(struct virtio_net *dev, struct async_inflight_info *pkts_info = async->pkts_info; uint32_t pkt_err = 0; uint16_t slot_idx = 0; + uint16_t head_idx = async->pkts_idx % vq->size; do { rte_prefetch0(&vq->desc_packed[vq->last_avail_idx]); @@ -1694,19 +1862,17 @@ virtio_dev_rx_async_submit_packed(struct virtio_net *dev, if (unlikely(pkt_idx == 0)) return 0; - n_xfer = async->ops.transfer_data(dev->vid, queue_id, async->iov_iter, 0, pkt_idx); - if (unlikely(n_xfer < 0)) { - VHOST_LOG_DATA(ERR, "(%d) %s: failed to transfer data for queue id %d.\n", - dev->vid, __func__, queue_id); - n_xfer = 0; - } - - pkt_err = pkt_idx - n_xfer; + n_xfer = vhost_async_dma_transfer(vq, dma_id, vchan_id, head_idx, + async->iov_iter, pkt_idx); async_iter_reset(async); - if (unlikely(pkt_err)) + pkt_err = pkt_idx - n_xfer; + if (unlikely(pkt_err)) { + VHOST_LOG_DATA(DEBUG, "(%d) %s: failed to transfer %u packets for queue %u.\n", + dev->vid, __func__, pkt_err, queue_id); dma_error_handler_packed(vq, slot_idx, pkt_err, &pkt_idx); + } if (likely(vq->shadow_used_idx)) { /* keep used descriptors. 
*/ @@ -1826,28 +1992,43 @@ write_back_completed_descs_packed(struct vhost_virtqueue *vq, static __rte_always_inline uint16_t vhost_poll_enqueue_completed(struct virtio_net *dev, uint16_t queue_id, - struct rte_mbuf **pkts, uint16_t count) + struct rte_mbuf **pkts, uint16_t count, int16_t dma_id, + uint16_t vchan_id) { struct vhost_virtqueue *vq = dev->virtqueue[queue_id]; struct vhost_async *async = vq->async; struct async_inflight_info *pkts_info = async->pkts_info; - int32_t n_cpl; + uint32_t max_count; + uint16_t nr_cpl_pkts = 0; uint16_t n_descs = 0, n_buffers = 0; uint16_t start_idx, from, i; - n_cpl = async->ops.check_completed_copies(dev->vid, queue_id, 0, count); - if (unlikely(n_cpl < 0)) { - VHOST_LOG_DATA(ERR, "(%d) %s: failed to check completed copies for queue id %d.\n", - dev->vid, __func__, queue_id); - return 0; + /* Check completed copies for the given DMA vChannel */ + max_count = count * dma_poll_factor; + vhost_async_dma_check_completed(dma_id, vchan_id, max_count <= UINT16_MAX ? max_count : + UINT16_MAX); + + start_idx = async_get_first_inflight_pkt_idx(vq); + + /** + * Calculate the number of packets whose copies have completed. + * Note that there may be completed packets even if the given + * DMA vChannel reports no newly completed copies, as DMA + * vChannels can be shared by other threads. + */ + from = start_idx; + while (vq->async->pkts_cmpl_flag[from] && count--) { + vq->async->pkts_cmpl_flag[from] = false; + from++; + if (from >= vq->size) + from -= vq->size; + nr_cpl_pkts++; } - if (n_cpl == 0) + if (nr_cpl_pkts == 0) return 0; - start_idx = async_get_first_inflight_pkt_idx(vq); - - for (i = 0; i < n_cpl; i++) { + for (i = 0; i < nr_cpl_pkts; i++) { from = (start_idx + i) % vq->size; /* Only used with packed ring */ n_buffers += pkts_info[from].nr_buffers; @@ -1856,7 +2037,7 @@ vhost_poll_enqueue_completed(struct virtio_net *dev, uint16_t queue_id, pkts[i] = pkts_info[from].mbuf; } - async->pkts_inflight_n -= n_cpl; + async->pkts_inflight_n -= nr_cpl_pkts; if (likely(vq->enabled && vq->access_ok)) { if (vq_is_packed(dev)) { @@ -1877,12 +2058,13 @@ vhost_poll_enqueue_completed(struct virtio_net *dev, uint16_t queue_id, } } - return n_cpl; + return nr_cpl_pkts; } uint16_t rte_vhost_poll_enqueue_completed(int vid, uint16_t queue_id, - struct rte_mbuf **pkts, uint16_t count) + struct rte_mbuf **pkts, uint16_t count, int16_t dma_id, + uint16_t vchan_id) { struct virtio_net *dev = get_device(vid); struct vhost_virtqueue *vq; @@ -1906,9 +2088,20 @@ rte_vhost_poll_enqueue_completed(int vid, uint16_t queue_id, return 0; } - rte_spinlock_lock(&vq->access_lock); + if (unlikely(!dma_copy_track[dma_id].vchans || + vchan_id > dma_copy_track[dma_id].max_vchans)) { + VHOST_LOG_DATA(ERR, "(%d) %s: invalid DMA %d vchan %u.\n", + dev->vid, __func__, dma_id, vchan_id); + return 0; + } - n_pkts_cpl = vhost_poll_enqueue_completed(dev, queue_id, pkts, count); + if (!rte_spinlock_trylock(&vq->access_lock)) { + VHOST_LOG_CONFIG(DEBUG, "Failed to poll completed packets from queue id %u. 
" + "virt queue busy.\n", queue_id); + return 0; + } + + n_pkts_cpl = vhost_poll_enqueue_completed(dev, queue_id, pkts, count, dma_id, vchan_id); rte_spinlock_unlock(&vq->access_lock); @@ -1917,7 +2110,8 @@ rte_vhost_poll_enqueue_completed(int vid, uint16_t queue_id, uint16_t rte_vhost_clear_queue_thread_unsafe(int vid, uint16_t queue_id, - struct rte_mbuf **pkts, uint16_t count) + struct rte_mbuf **pkts, uint16_t count, int16_t dma_id, + uint16_t vchan_id) { struct virtio_net *dev = get_device(vid); struct vhost_virtqueue *vq; @@ -1941,14 +2135,21 @@ rte_vhost_clear_queue_thread_unsafe(int vid, uint16_t queue_id, return 0; } - n_pkts_cpl = vhost_poll_enqueue_completed(dev, queue_id, pkts, count); + if (unlikely(!dma_copy_track[dma_id].vchans || + vchan_id > dma_copy_track[dma_id].max_vchans)) { + VHOST_LOG_DATA(ERR, "(%d) %s: invalid DMA %d vchan %u.\n", + dev->vid, __func__, dma_id, vchan_id); + return 0; + } + + n_pkts_cpl = vhost_poll_enqueue_completed(dev, queue_id, pkts, count, dma_id, vchan_id); return n_pkts_cpl; } static __rte_always_inline uint32_t virtio_dev_rx_async_submit(struct virtio_net *dev, uint16_t queue_id, - struct rte_mbuf **pkts, uint32_t count) + struct rte_mbuf **pkts, uint32_t count, int16_t dma_id, uint16_t vchan_id) { struct vhost_virtqueue *vq; uint32_t nb_tx = 0; @@ -1960,6 +2161,13 @@ virtio_dev_rx_async_submit(struct virtio_net *dev, uint16_t queue_id, return 0; } + if (unlikely(!dma_copy_track[dma_id].vchans || + vchan_id > dma_copy_track[dma_id].max_vchans)) { + VHOST_LOG_DATA(ERR, "(%d) %s: invalid DMA %d vchan %u.\n", dev->vid, __func__, + dma_id, vchan_id); + return 0; + } + vq = dev->virtqueue[queue_id]; rte_spinlock_lock(&vq->access_lock); @@ -1980,10 +2188,10 @@ virtio_dev_rx_async_submit(struct virtio_net *dev, uint16_t queue_id, if (vq_is_packed(dev)) nb_tx = virtio_dev_rx_async_submit_packed(dev, vq, queue_id, - pkts, count); + pkts, count, dma_id, vchan_id); else nb_tx = virtio_dev_rx_async_submit_split(dev, vq, queue_id, - pkts, count); + pkts, count, dma_id, vchan_id); out: if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM)) @@ -1997,7 +2205,8 @@ virtio_dev_rx_async_submit(struct virtio_net *dev, uint16_t queue_id, uint16_t rte_vhost_submit_enqueue_burst(int vid, uint16_t queue_id, - struct rte_mbuf **pkts, uint16_t count) + struct rte_mbuf **pkts, uint16_t count, int16_t dma_id, + uint16_t vchan_id) { struct virtio_net *dev = get_device(vid); @@ -2011,7 +2220,7 @@ rte_vhost_submit_enqueue_burst(int vid, uint16_t queue_id, return 0; } - return virtio_dev_rx_async_submit(dev, queue_id, pkts, count); + return virtio_dev_rx_async_submit(dev, queue_id, pkts, count, dma_id, vchan_id); } static inline bool @@ -2369,7 +2578,7 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq, cpy_len = RTE_MIN(buf_avail, mbuf_avail); if (likely(cpy_len > MAX_BATCH_LEN || - vq->batch_copy_nb_elems >= vq->size || + vq->batch_copy_nb_elems >= vq->batch_copy_max_elems || (hdr && cur == m))) { rte_memcpy(rte_pktmbuf_mtod_offset(cur, void *, mbuf_offset),