From patchwork Fri Oct 11 17:09:47 2019
X-Patchwork-Submitter: Flavio Leitner
X-Patchwork-Id: 60985
X-Patchwork-Delegate: maxime.coquelin@redhat.com
From: Flavio Leitner
To: dev@dpdk.org
Cc: Ilya Maximets, Maxime Coquelin, Shahaf Shuler, David Marchand,
 Tiwei Bie, Obrembski MichalX, Stokes Ian
Date: Fri, 11 Oct 2019 14:09:47 -0300
Message-Id: <20191011170947.30656-1-fbl@sysclose.org>
In-Reply-To: <20191004201008.3981-1-fbl@sysclose.org>
References: <20191004201008.3981-1-fbl@sysclose.org>
Subject: [dpdk-dev] [PATCH v3] vhost: add support for large buffers

rte_vhost_dequeue_burst() supports two ways of dequeuing data. If the
data fits into a single mbuf, all data is copied into it and a single
linear buffer is returned. Otherwise, it allocates additional mbufs and
chains them together to return a multi-segment mbuf.

While that covers most use cases, it forces applications that need to
work with larger data sizes to support multi-segment mbufs. The
non-linear layout adds complexity to the application and has
performance implications.

To resolve the issue, add support for attaching an external buffer to
a pktmbuf, and let the host declare at registration time whether
attaching an external buffer to a pktmbuf is supported and whether
only linear buffers are supported.
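The dequeue policy described above can be summarized as a small decision function. The sketch below is an illustrative model, not code from this patch: it assumes the decision order used by virtio_dev_pktmbuf_alloc() in the diff (mbuf tailroom first, then an external buffer, then chaining unless only linear buffers are allowed). The names enum alloc_result, struct dev_flags and choose_alloc() are hypothetical, and the extbuf_alloc_ok parameter stands in for the external buffer allocation succeeding.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical outcomes of allocating one dequeued packet. */
enum alloc_result {
	ALLOC_LINEAR,  /* data fits in the mbuf's own tailroom */
	ALLOC_EXTBUF,  /* an external buffer is attached to the mbuf */
	ALLOC_CHAINED, /* the copy step will chain extra mbufs */
	ALLOC_DROP     /* linear-only device and the data cannot fit */
};

/* Hypothetical mirror of the per-device extbuf/linearbuf settings. */
struct dev_flags {
	bool extbuf;    /* RTE_VHOST_USER_EXTBUF_SUPPORT was requested */
	bool linearbuf; /* RTE_VHOST_USER_LINEARBUF_SUPPORT was requested */
};

/*
 * Models the decision order: use the mbuf tailroom if it is large enough,
 * then try an external buffer, then allow chaining unless the device is
 * restricted to linear buffers, in which case the packet is dropped.
 */
static enum alloc_result
choose_alloc(struct dev_flags f, uint32_t tailroom, uint32_t data_len,
	     bool extbuf_alloc_ok)
{
	if (tailroom >= data_len)
		return ALLOC_LINEAR;
	if (f.extbuf && extbuf_alloc_ok)
		return ALLOC_EXTBUF;
	if (!f.linearbuf)
		return ALLOC_CHAINED;
	return ALLOC_DROP;
}
```

For instance, with 2048 bytes of tailroom and a 9000-byte frame, a device registered without either flag falls back to chaining, a device registered with only RTE_VHOST_USER_LINEARBUF_SUPPORT drops the packet, and a device registered with RTE_VHOST_USER_EXTBUF_SUPPORT receives an external buffer as long as that allocation succeeds.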
Signed-off-by: Flavio Leitner
---
 doc/guides/prog_guide/vhost_lib.rst |  35 +++++++++
 lib/librte_vhost/rte_vhost.h        |   4 ++
 lib/librte_vhost/socket.c           |  22 ++++++
 lib/librte_vhost/vhost.c            |  22 ++++++
 lib/librte_vhost/vhost.h            |   4 ++
 lib/librte_vhost/virtio_net.c       | 108 ++++++++++++++++++++++++----
 6 files changed, 181 insertions(+), 14 deletions(-)

- Changelog:
  V3:
   - prevented the new features from being used with zero copy
   - fixed sizeof() usage
   - fixed log msg indentation
   - removed/replaced asserts
   - used the correct virt2iova function
   - fixed the patch's title
   - OvS PoC code:
     https://github.com/fleitner/ovs/tree/rte_malloc-v3

  V2:
   - Used rte_malloc() instead of another mempool as suggested by Shahaf.
   - Added the documentation section.
   - Used driver registration to negotiate the features.
   - OvS PoC code:
     https://github.com/fleitner/ovs/commit/8fc197c40b1d4fda331686a7b919e9e2b670dda7

diff --git a/doc/guides/prog_guide/vhost_lib.rst b/doc/guides/prog_guide/vhost_lib.rst
index fc3ee4353..07e40e3c5 100644
--- a/doc/guides/prog_guide/vhost_lib.rst
+++ b/doc/guides/prog_guide/vhost_lib.rst
@@ -117,6 +117,41 @@ The following is an overview of some key Vhost API functions:
     Enabling this flag should only be done when the calling application does
     not pre-fault the guest shared memory, otherwise migration would fail.
 
+  - ``RTE_VHOST_USER_LINEARBUF_SUPPORT``
+
+    Enabling this flag forces the vhost dequeue function to provide only
+    linear pktmbufs (no multi-segment pktmbufs).
+
+    By default, the vhost library provides a single pktmbuf for a given
+    packet, but if for some reason the data doesn't fit into a single
+    pktmbuf (e.g., when TSO is enabled), the library allocates additional
+    pktmbufs from the same mempool and chains them together to create a
+    multi-segment pktmbuf.
+
+    However, the vhost application needs to support the multi-segment format.
+    If the vhost application does not support that format and requires large
+    buffers to be dequeued, this flag should be enabled to force only linear
+    buffers (see RTE_VHOST_USER_EXTBUF_SUPPORT) or drop the packet.
+
+    It is disabled by default.
+
+  - ``RTE_VHOST_USER_EXTBUF_SUPPORT``
+
+    Enabling this flag allows the vhost dequeue function to allocate and
+    attach an external buffer to a pktmbuf if the pktmbuf doesn't provide
+    enough space to store all data.
+
+    This is useful when the vhost application wants to support large packets
+    but wants neither to increase the default mempool object size nor to
+    support multi-segment mbufs (non-linear). In this case, a fresh buffer
+    is allocated using rte_malloc() and attached to the pktmbuf using
+    rte_pktmbuf_attach_extbuf().
+
+    See RTE_VHOST_USER_LINEARBUF_SUPPORT as well to disable multi-segment
+    mbufs for applications that don't support chained mbufs.
+
+    It is disabled by default.
+
 * ``rte_vhost_driver_set_features(path, features)``
 
   This function sets the feature bits the vhost-user driver supports. The
diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h
index 19474bca0..b821b5df4 100644
--- a/lib/librte_vhost/rte_vhost.h
+++ b/lib/librte_vhost/rte_vhost.h
@@ -30,6 +30,10 @@ extern "C" {
 #define RTE_VHOST_USER_DEQUEUE_ZERO_COPY	(1ULL << 2)
 #define RTE_VHOST_USER_IOMMU_SUPPORT	(1ULL << 3)
 #define RTE_VHOST_USER_POSTCOPY_SUPPORT		(1ULL << 4)
+/* support mbuf with external buffer attached */
+#define RTE_VHOST_USER_EXTBUF_SUPPORT	(1ULL << 5)
+/* support only linear buffers (no chained mbufs) */
+#define RTE_VHOST_USER_LINEARBUF_SUPPORT	(1ULL << 6)
 
 /** Protocol features. */
 #ifndef VHOST_USER_PROTOCOL_F_MQ
diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c
index 274988c4d..e546be2a8 100644
--- a/lib/librte_vhost/socket.c
+++ b/lib/librte_vhost/socket.c
@@ -40,6 +40,8 @@ struct vhost_user_socket {
 	bool dequeue_zero_copy;
 	bool iommu_support;
 	bool use_builtin_virtio_net;
+	bool extbuf;
+	bool linearbuf;
 
 	/*
 	 * The "supported_features" indicates the feature bits the
@@ -232,6 +234,12 @@ vhost_user_add_connection(int fd, struct vhost_user_socket *vsocket)
 	if (vsocket->dequeue_zero_copy)
 		vhost_enable_dequeue_zero_copy(vid);
 
+	if (vsocket->extbuf)
+		vhost_enable_extbuf(vid);
+
+	if (vsocket->linearbuf)
+		vhost_enable_linearbuf(vid);
+
 	RTE_LOG(INFO, VHOST_CONFIG, "new device, handle is %d\n", vid);
 
 	if (vsocket->notify_ops->new_connection) {
@@ -870,6 +878,8 @@ rte_vhost_driver_register(const char *path, uint64_t flags)
 		goto out_free;
 	}
 	vsocket->dequeue_zero_copy = flags & RTE_VHOST_USER_DEQUEUE_ZERO_COPY;
+	vsocket->extbuf = flags & RTE_VHOST_USER_EXTBUF_SUPPORT;
+	vsocket->linearbuf = flags & RTE_VHOST_USER_LINEARBUF_SUPPORT;
 
 	/*
 	 * Set the supported features correctly for the builtin vhost-user
@@ -894,6 +904,18 @@ rte_vhost_driver_register(const char *path, uint64_t flags)
 	 * not compatible with postcopy.
 	 */
 	if (vsocket->dequeue_zero_copy) {
+		if (vsocket->extbuf) {
+			RTE_LOG(ERR, VHOST_CONFIG,
+			"error: zero copy is incompatible with external buffers\n");
+			ret = -1;
+			goto out_free;
+		}
+		if (vsocket->linearbuf) {
+			RTE_LOG(ERR, VHOST_CONFIG,
+			"error: zero copy is incompatible with linear buffers\n");
+			ret = -1;
+			goto out_free;
+		}
 		vsocket->supported_features &= ~(1ULL << VIRTIO_F_IN_ORDER);
 		vsocket->features &= ~(1ULL << VIRTIO_F_IN_ORDER);
 
diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index cea44df8c..77457f538 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -605,6 +605,28 @@ vhost_set_builtin_virtio_net(int vid, bool enable)
 		dev->flags &= ~VIRTIO_DEV_BUILTIN_VIRTIO_NET;
 }
 
+void
+vhost_enable_extbuf(int vid)
+{
+	struct virtio_net *dev = get_device(vid);
+
+	if (dev == NULL)
+		return;
+
+	dev->extbuf = 1;
+}
+
+void
+vhost_enable_linearbuf(int vid)
+{
+	struct virtio_net *dev = get_device(vid);
+
+	if (dev == NULL)
+		return;
+
+	dev->linearbuf = 1;
+}
+
 int
 rte_vhost_get_mtu(int vid, uint16_t *mtu)
 {
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 5131a97a3..0346bd118 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -302,6 +302,8 @@ struct virtio_net {
 	rte_atomic16_t		broadcast_rarp;
 	uint32_t		nr_vring;
 	int			dequeue_zero_copy;
+	int			extbuf;
+	int			linearbuf;
 	struct vhost_virtqueue	*virtqueue[VHOST_MAX_QUEUE_PAIRS * 2];
 #define IF_NAME_SZ (PATH_MAX > IFNAMSIZ ? PATH_MAX : IFNAMSIZ)
 	char			ifname[IF_NAME_SZ];
@@ -476,6 +478,8 @@ void vhost_attach_vdpa_device(int vid, int did);
 void vhost_set_ifname(int, const char *if_name, unsigned int if_len);
 void vhost_enable_dequeue_zero_copy(int vid);
 void vhost_set_builtin_virtio_net(int vid, bool enable);
+void vhost_enable_extbuf(int vid);
+void vhost_enable_linearbuf(int vid);
 
 struct vhost_device_ops const *vhost_driver_callback_get(const char *path);
 
diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index 5b85b832d..b84a72748 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -1289,6 +1289,92 @@ get_zmbuf(struct vhost_virtqueue *vq)
 	return NULL;
 }
 
+static void
+virtio_dev_extbuf_free(void *addr __rte_unused, void *opaque)
+{
+	rte_free(opaque);
+}
+
+static int
+virtio_dev_extbuf_alloc(struct rte_mbuf *pkt, uint32_t size)
+{
+	struct rte_mbuf_ext_shared_info *shinfo = NULL;
+	uint16_t buf_len = 0;
+	rte_iova_t iova;
+	void *buf;
+
+	/* Try to use pkt buffer to store shinfo to reduce the amount of memory
+	 * required, otherwise store shinfo in the new buffer.
+	 */
+	if (rte_pktmbuf_tailroom(pkt) > sizeof(*shinfo))
+		shinfo = rte_pktmbuf_mtod(pkt,
+					  struct rte_mbuf_ext_shared_info *);
+	else
+		buf_len += sizeof(*shinfo);
+
+	if (unlikely(buf_len + size + RTE_PKTMBUF_HEADROOM > UINT16_MAX)) {
+		RTE_LOG(ERR, VHOST_DATA,
+			"buffer size %u exceeded maximum.\n", size);
+		return -ENOSPC;
+	}
+
+	buf_len += size + RTE_PKTMBUF_HEADROOM;
+	buf = rte_malloc(NULL, buf_len, RTE_CACHE_LINE_SIZE);
+	if (unlikely(buf == NULL))
+		return -ENOMEM;
+
+	/* initialize shinfo */
+	if (shinfo) {
+		shinfo->free_cb = virtio_dev_extbuf_free;
+		shinfo->fcb_opaque = buf;
+		rte_mbuf_ext_refcnt_set(shinfo, 1);
+	} else {
+		shinfo = rte_pktmbuf_ext_shinfo_init_helper(buf, &buf_len,
+					      virtio_dev_extbuf_free, buf);
+		if (unlikely(shinfo == NULL)) {
+			RTE_LOG(ERR, VHOST_DATA, "Failed to init shinfo\n");
+			return -1;
+		}
+	}
+
+	iova = rte_malloc_virt2iova(buf);
+	rte_pktmbuf_attach_extbuf(pkt, buf, iova, buf_len, shinfo);
+	rte_pktmbuf_reset_headroom(pkt);
+
+	return 0;
+}
+
+/*
+ * Allocate a host supported pktmbuf.
+ */
+static __rte_always_inline struct rte_mbuf *
+virtio_dev_pktmbuf_alloc(struct virtio_net *dev, struct rte_mempool *mp,
+			 uint32_t data_len)
+{
+	struct rte_mbuf *pkt = rte_pktmbuf_alloc(mp);
+
+	if (unlikely(pkt == NULL))
+		return NULL;
+
+	if (rte_pktmbuf_tailroom(pkt) >= data_len)
+		return pkt;
+
+	/* attach an external buffer if supported */
+	if (dev->extbuf && !virtio_dev_extbuf_alloc(pkt, data_len))
+		return pkt;
+
+	/* check if chained buffers are allowed */
+	if (!dev->linearbuf)
+		return pkt;
+
+	/* Data doesn't fit into the buffer and the host supports
+	 * only linear buffers
+	 */
+	rte_pktmbuf_free(pkt);
+
+	return NULL;
+}
+
 static __rte_noinline uint16_t
 virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count)
@@ -1343,26 +1429,23 @@ virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	for (i = 0; i < count; i++) {
 		struct buf_vector buf_vec[BUF_VECTOR_MAX];
 		uint16_t head_idx;
-		uint32_t dummy_len;
+		uint32_t buf_len;
 		uint16_t nr_vec = 0;
 		int err;
 
 		if (unlikely(fill_vec_buf_split(dev, vq,
 						vq->last_avail_idx + i,
 						&nr_vec, buf_vec,
-						&head_idx, &dummy_len,
+						&head_idx, &buf_len,
 						VHOST_ACCESS_RO) < 0))
 			break;
 
 		if (likely(dev->dequeue_zero_copy == 0))
 			update_shadow_used_ring_split(vq, head_idx, 0);
 
-		pkts[i] = rte_pktmbuf_alloc(mbuf_pool);
-		if (unlikely(pkts[i] == NULL)) {
-			RTE_LOG(ERR, VHOST_DATA,
-				"Failed to allocate memory for mbuf.\n");
+		pkts[i] = virtio_dev_pktmbuf_alloc(dev, mbuf_pool, buf_len);
+		if (unlikely(pkts[i] == NULL))
 			break;
-		}
 
 		err = copy_desc_to_mbuf(dev, vq, buf_vec, nr_vec, pkts[i],
 				mbuf_pool);
@@ -1451,14 +1534,14 @@ virtio_dev_tx_packed(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	for (i = 0; i < count; i++) {
 		struct buf_vector buf_vec[BUF_VECTOR_MAX];
 		uint16_t buf_id;
-		uint32_t dummy_len;
+		uint32_t buf_len;
 		uint16_t desc_count, nr_vec = 0;
 		int err;
 
 		if (unlikely(fill_vec_buf_packed(dev, vq,
 						vq->last_avail_idx, &desc_count,
 						buf_vec, &nr_vec,
-						&buf_id, &dummy_len,
+						&buf_id, &buf_len,
 						VHOST_ACCESS_RO) < 0))
 			break;
 
@@ -1466,12 +1549,9 @@ virtio_dev_tx_packed(struct virtio_net *dev, struct vhost_virtqueue *vq,
 			update_shadow_used_ring_packed(vq, buf_id, 0,
 					desc_count);
 
-		pkts[i] = rte_pktmbuf_alloc(mbuf_pool);
-		if (unlikely(pkts[i] == NULL)) {
-			RTE_LOG(ERR, VHOST_DATA,
-				"Failed to allocate memory for mbuf.\n");
+		pkts[i] = virtio_dev_pktmbuf_alloc(dev, mbuf_pool, buf_len);
+		if (unlikely(pkts[i] == NULL))
 			break;
-		}
 
 		err = copy_desc_to_mbuf(dev, vq, buf_vec, nr_vec, pkts[i],
 				mbuf_pool);
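The length that virtio_dev_extbuf_alloc() requests from rte_malloc() can be checked in isolation. The following is a minimal sketch, not part of the patch, that mirrors that arithmetic: the shared info is counted in the external buffer only when the mbuf's own tailroom is too small to hold it, headroom is always reserved, and totals above UINT16_MAX are rejected because an mbuf's buf_len field is 16 bits wide. SKETCH_HEADROOM and SKETCH_SHINFO_LEN are assumed stand-ins for RTE_PKTMBUF_HEADROOM and sizeof(struct rte_mbuf_ext_shared_info), and extbuf_len() is a hypothetical helper.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed stand-ins; the real values come from the DPDK build. */
#define SKETCH_HEADROOM   128u /* RTE_PKTMBUF_HEADROOM */
#define SKETCH_SHINFO_LEN 24u  /* sizeof(struct rte_mbuf_ext_shared_info) */

/*
 * Hypothetical helper mirroring the length computation in
 * virtio_dev_extbuf_alloc(). Stores the rte_malloc() size in *out_len and
 * returns true, or returns false where the patch returns -ENOSPC.
 */
static bool
extbuf_len(uint32_t data_len, uint32_t mbuf_tailroom, uint16_t *out_len)
{
	uint32_t len = 0;

	/* Shared info goes into the new buffer only if the mbuf can't hold it. */
	if (mbuf_tailroom <= SKETCH_SHINFO_LEN)
		len += SKETCH_SHINFO_LEN;

	/* Same limit as the patch, written so the sum cannot wrap around. */
	if (data_len > UINT16_MAX - SKETCH_HEADROOM - len)
		return false;

	*out_len = (uint16_t)(len + data_len + SKETCH_HEADROOM);
	return true;
}
```

With these assumed values, a 9000-byte payload needs 9128 bytes when the shared info fits in the mbuf and 9152 bytes when it does not.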