Message ID: 20230606081852.71003-1-maxime.coquelin@redhat.com (mailing list archive)
Headers:
    From: Maxime Coquelin <maxime.coquelin@redhat.com>
    To: dev@dpdk.org, chenbo.xia@intel.com, david.marchand@redhat.com, mkp@redhat.com, fbl@redhat.com, jasowang@redhat.com, cunming.liang@intel.com, xieyongji@bytedance.com, echaudro@redhat.com, eperezma@redhat.com, amorenoz@redhat.com, lulu@redhat.com
    Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
    Subject: [PATCH v5 00/26] Add VDUSE support to Vhost library
    Date: Tue, 6 Jun 2023 10:18:26 +0200
    Message-Id: <20230606081852.71003-1-maxime.coquelin@redhat.com>
    List-Id: DPDK patches and discussions <dev.dpdk.org>
Series: Add VDUSE support to Vhost library
Message
Maxime Coquelin
June 6, 2023, 8:18 a.m. UTC
This series introduces a new type of backend, VDUSE,
to the Vhost library.

VDUSE stands for vDPA device in Userspace. It enables
implementing a Virtio device in userspace and having it
attached to the kernel vDPA bus.

Once attached to the vDPA bus, it can be used by kernel
Virtio drivers, like virtio-net in our case, via the
virtio-vdpa driver. Doing that, the device is visible to
the kernel networking stack and is exposed to userspace
as a regular netdev.

It can also be exposed to userspace thanks to the
vhost-vdpa driver, via a vhost-vdpa chardev that can be
passed to QEMU or the Virtio-user PMD.

While VDUSE support is already available in the upstream
kernel, a couple of patches are required to support the
network device type:

https://gitlab.com/mcoquelin/linux/-/tree/vduse_networking_rfc

In order to attach the created VDUSE device to the vDPA
bus, a recent iproute2 version containing the vdpa tool is
required.

Benchmark results:
==================

As of v2, the PVP reference benchmark has been run and
compared with Vhost-user.

When doing macswap forwarding in the workload, no difference is seen.
When doing io forwarding in the workload, we see a 4% performance
degradation with VDUSE, compared to Vhost-user/Virtio-user. It is
explained by the use of the IOTLB layer in the Vhost library when using
VDUSE, whereas Vhost-user/Virtio-user do not make use of it.

Usage:
======

1. Probe the required kernel modules
 # modprobe vdpa
 # modprobe vduse
 # modprobe virtio-vdpa

2. Build (requires the VDUSE kernel headers to be available)
 # meson build
 # ninja -C build

3. Create a VDUSE device (vduse0) using the Vhost PMD with
testpmd (with 4 queue pairs in this example)
 # ./build/app/dpdk-testpmd --no-pci --vdev=net_vhost0,iface=/dev/vduse/vduse0,queues=4 --log-level=*:9 -- -i --txq=4 --rxq=4

4. Attach the VDUSE device to the vDPA bus
 # vdpa dev add name vduse0 mgmtdev vduse
 => The virtio-net netdev shows up (eth0 here)
 # ip l show eth0
 21: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
     link/ether c2:73:ea:a7:68:6d brd ff:ff:ff:ff:ff:ff

5. Start/stop traffic in testpmd
 testpmd> start
 testpmd> show port stats 0
  ######################## NIC statistics for port 0  ########################
  RX-packets: 11         RX-missed: 0          RX-bytes:  1482
  RX-errors: 0
  RX-nombuf:  0
  TX-packets: 1          TX-errors: 0          TX-bytes:  62

  Throughput (since last show)
  Rx-pps:            0          Rx-bps:            0
  Tx-pps:            0          Tx-bps:            0
  ############################################################################
 testpmd> stop

6. Detach the VDUSE device from the vDPA bus
 # vdpa dev del vduse0

7. Quit testpmd
 testpmd> quit

Known issues & remaining work:
==============================
- Fix issue in FD manager (still polling while FD has been removed)
- Add Netlink support in Vhost library
- Support device reconnection
 -> a temporary patch to support reconnection via a tmpfs file is
    available; an upstream solution would be in-kernel and is being
    developed.
 -> https://gitlab.com/mcoquelin/dpdk-next-virtio/-/commit/5ad06ce14159a9ce36ee168dd13ef389cec91137
- Support packed ring
- Provide more performance benchmark results

Changes in v5:
==============
- Delay starting/stopping the device to after having replied to the
  VDUSE event, in order to avoid a deadlock encountered when testing
  with OVS.
- Mention the lack of reconnection support in the release notes.

Changes in v4:
==============
- Applied patch 1 and patch 2 from v3
- Rebased on top of Eelco's series
- Fix coredump clear in IOTLB cache removal (David)
- Remove unneeded ret variable in vhost_vring_inject_irq (David)
- Fixed release note (David, Chenbo)

Changes in v2/v3:
=================
- Fixed mem_set_dump() parameter (patch 4)
- Fixed accidental comment change (patch 7, Chenbo)
- Change from __builtin_ctz to __builtin_ctzll (patch 9, Chenbo)
- Move change from patch 12 to 13 (Chenbo)
- Enable locks annotation for control queue (patch 17)
- Send control queue notification when used descriptors enqueued (patch 17)
- Lock control queue IOTLB lock (patch 17)
- Fix error path in virtio_net_ctrl_pop() (patch 17, Chenbo)
- Set VDUSE dev FD as NONBLOCK (patch 18)
- Enable more Virtio features (patch 18)
- Remove calls to pthread_setcancelstate() (patch 22)
- Add calls to fdset_pipe_notify() when adding and deleting FDs from a
  set (patch 22)
- Use RTE_DIM() to get requests string array size (patch 22)
- Set reply result for IOTLB update message (patch 25, Chenbo)
- Fix queues enablement with multiqueue (patch 26)
- Move kickfd creation for better logging (patch 26)
- Improve logging (patch 26)
- Uninstall cvq kickfd in case of handler installation failure (patch 27)
- Enable CVQ notifications once handler is installed (patch 27)
- Don't advertise multiqueue and control queue if the app only requests
  a single queue pair (patch 27)
- Add release notes

Maxime Coquelin (26):
  vhost: fix IOTLB entries overlap check with previous entry
  vhost: add helper of IOTLB entries coredump
  vhost: add helper for IOTLB entries shared page check
  vhost: don't dump unneeded pages with IOTLB
  vhost: change to single IOTLB cache per device
  vhost: add offset field to IOTLB entries
  vhost: add page size info to IOTLB entry
  vhost: retry translating IOVA after IOTLB miss
  vhost: introduce backend ops
  vhost: add IOTLB cache entry removal callback
  vhost: add helper for IOTLB misses
  vhost: add helper for interrupt injection
  vhost: add API to set max queue pairs
  net/vhost: use API to set max queue pairs
  vhost: add control virtqueue support
  vhost: add VDUSE device creation and destruction
  vhost: add VDUSE callback for IOTLB miss
  vhost: add VDUSE callback for IOTLB entry removal
  vhost: add VDUSE callback for IRQ injection
  vhost: add VDUSE events handler
  vhost: add support for virtqueue state get event
  vhost: add support for VDUSE status set event
  vhost: add support for VDUSE IOTLB update event
  vhost: add VDUSE device startup
  vhost: add multiqueue support to VDUSE
  vhost: add VDUSE device stop

 doc/guides/prog_guide/vhost_lib.rst    |   4 +
 doc/guides/rel_notes/release_23_07.rst |  12 +
 drivers/net/vhost/rte_eth_vhost.c      |   3 +
 lib/vhost/iotlb.c                      | 333 +++++++------
 lib/vhost/iotlb.h                      |  45 +-
 lib/vhost/meson.build                  |   5 +
 lib/vhost/rte_vhost.h                  |  17 +
 lib/vhost/socket.c                     |  72 ++-
 lib/vhost/vduse.c                      | 646 +++++++++++++++++++++++++
 lib/vhost/vduse.h                      |  33 ++
 lib/vhost/version.map                  |   1 +
 lib/vhost/vhost.c                      |  70 ++-
 lib/vhost/vhost.h                      |  57 ++-
 lib/vhost/vhost_user.c                 |  51 +-
 lib/vhost/vhost_user.h                 |   2 +-
 lib/vhost/virtio_net_ctrl.c            | 286 +++++++++++
 lib/vhost/virtio_net_ctrl.h            |  10 +
 17 files changed, 1409 insertions(+), 238 deletions(-)
 create mode 100644 lib/vhost/vduse.c
 create mode 100644 lib/vhost/vduse.h
 create mode 100644 lib/vhost/virtio_net_ctrl.c
 create mode 100644 lib/vhost/virtio_net_ctrl.h
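As a companion to step 3 above: the testpmd command maps onto a handful of
public Vhost library calls. Below is a minimal, hypothetical sketch of the
application-side setup under this series. It assumes the library selects the
VDUSE backend from the /dev/vduse/ path prefix, and the
rte_vhost_driver_set_max_queue_num() name is inferred from patch 13 ("add API
to set max queue pairs"), so verify both against the merged headers before
relying on them.

    #include <rte_vhost.h>

    /* Application callbacks: new_device() runs once the driver sets
     * DRIVER_OK, destroy_device() when the device is stopped. */
    static int new_device(int vid)
    {
        /* start the datapath for 'vid' */
        return 0;
    }

    static void destroy_device(int vid)
    {
        /* quiesce the datapath for 'vid' */
    }

    static const struct rte_vhost_device_ops ops = {
        .new_device = new_device,
        .destroy_device = destroy_device,
    };

    static int create_vduse_backend(void)
    {
        /* A path under /dev/vduse/ selects the VDUSE backend; a regular
         * filesystem path would create a Vhost-user socket instead. */
        const char *path = "/dev/vduse/vduse0";

        if (rte_vhost_driver_register(path, 0) < 0)
            return -1;
        /* New API from this series: cap the queue pairs advertised to
         * the driver (name assumed from patch 13). */
        if (rte_vhost_driver_set_max_queue_num(path, 4) < 0)
            goto err;
        if (rte_vhost_driver_callback_register(path, &ops) < 0)
            goto err;
        /* Creates the VDUSE device and starts event handling. */
        if (rte_vhost_driver_start(path) < 0)
            goto err;
        return 0;
    err:
        rte_vhost_driver_unregister(path);
        return -1;
    }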
Comments
Hi Maxime,

> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Tuesday, June 6, 2023 4:18 PM
> Subject: [PATCH v5 00/26] Add VDUSE support to Vhost library
>
> [snip]
>
> Changes in v5:
> ==============
> - Delay starting/stopping the device to after having replied to the VDUSE
>   event in order to avoid a deadlock encountered when testing with OVS.

Could you explain more to help me understand the deadlock issue?

Thanks,
Chenbo

> - Mention the lack of reconnection support in the release notes.
On Tue, Jun 6, 2023 at 10:19 AM Maxime Coquelin
<maxime.coquelin@redhat.com> wrote:
>
> [snip]
>
> 3. Create a VDUSE device (vduse0) using the Vhost PMD with
> testpmd (with 4 queue pairs in this example)
>  # ./build/app/dpdk-testpmd --no-pci --vdev=net_vhost0,iface=/dev/vduse/vduse0,queues=4 --log-level=*:9 -- -i --txq=4 --rxq=4

9 is a nice but undefined value. 8 is enough.
In general, I prefer "human readable" strings, like *:debug ;-).

> [snip]
>
> - Support packed ring
> - Provide more performance benchmark results

We are missing a reference to the kernel patches required to have
vduse accept net devices.

I had played with the patches at v1 and it was working ok.
I did not review in depth the latest revisions, but I followed your
series from the PoC/start.
Overall, the series lgtm.

For the series,
Acked-by: David Marchand <david.marchand@redhat.com>
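For the record, the invocation David suggests differs from step 3 only in the
log-level spelling (EAL accepts level names as well as numbers, and *:debug
is equivalent to *:8):

 # ./build/app/dpdk-testpmd --no-pci --vdev=net_vhost0,iface=/dev/vduse/vduse0,queues=4 --log-level=*:debug -- -i --txq=4 --rxq=4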
On 6/7/23 08:48, Xia, Chenbo wrote:
> Hi Maxime,
>
> [snip]
>
>> Changes in v5:
>> ==============
>> - Delay starting/stopping the device to after having replied to the VDUSE
>>   event in order to avoid a deadlock encountered when testing with OVS.
>
> Could you explain more to help me understand the deadlock issue?

Sure.

The v5 fixes an ABBA deadlock involving the OVS mutex and the kernel
rtnl_lock(), two OVS threads and the vdpa tool process.

We have an OVS bridge with a mlx5 port already added.
We add the VDUSE port to the same bridge.
Then we use the iproute2 vdpa tool to attach the VDUSE device to the
kernel vDPA bus. When doing this, the rtnl lock is taken when the
virtio-net device is probed, and VDUSE_SET_STATUS gets sent and waits
for its reply.

This VDUSE_SET_STATUS request is handled by the DPDK VDUSE event
handler, and if the DRIVER_OK bit is set, the Vhost .new_device()
callback is called, which triggers a bridge reconfiguration.

On bridge reconfiguration, the mlx5 port takes the OVS mutex and
performs an ioctl() which tries to take the rtnl lock, but it is
already owned by the vdpa tool.

The vduse_events thread is stuck waiting for the OVS mutex, so the
reply to the VDUSE_SET_STATUS event is never sent, and the vdpa tool
process is stuck for 30 seconds, until a timeout happens.

When the timeout happens, everything is unblocked, but the VDUSE device
has been marked as broken, and so is not usable anymore.

I could reproduce and provide you the backtraces of the different
threads if you wish.

Anyway, I think it makes sense to perform the device startup after
having replied to the VDUSE_SET_STATUS request, as it just means the
device has taken the new status of the driver into account.

Hope it clarifies; let me know if you need more details.

Thanks,
Maxime

> Thanks,
> Chenbo
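To make the ordering change Maxime describes concrete, here is a simplified
sketch of a VDUSE event handler that replies before starting the device. It
is not the actual lib/vhost/vduse.c code: the request/response types come
from the kernel <linux/vduse.h> UAPI, while device_start()/device_stop() are
hypothetical placeholders for the library's startup/stop paths.

    #include <unistd.h>
    #include <linux/virtio_config.h>  /* VIRTIO_CONFIG_S_DRIVER_OK */
    #include <linux/vduse.h>          /* vduse_dev_request/response */

    static void device_start(void);   /* hypothetical placeholders for */
    static void device_stop(void);    /* the library's start/stop paths */

    static void vduse_events_handler(int dev_fd)
    {
        struct vduse_dev_request req;
        struct vduse_dev_response resp = { 0 };
        int start = 0, stop = 0;

        if (read(dev_fd, &req, sizeof(req)) != sizeof(req))
            return;

        resp.request_id = req.request_id;
        resp.result = VDUSE_REQ_RESULT_OK;

        if (req.type == VDUSE_SET_STATUS) {
            /* Record what to do, but don't do it yet */
            if (req.s.status & VIRTIO_CONFIG_S_DRIVER_OK)
                start = 1;
            else
                stop = 1;
        }

        /* v5 change: reply first, so the vdpa tool (which holds the
         * rtnl lock while waiting for this answer) gets unblocked */
        if (write(dev_fd, &resp, sizeof(resp)) != sizeof(resp))
            return;

        /* Only now call into the application (.new_device(), etc.),
         * which may take locks such as the OVS mutex without forming
         * the rtnl/OVS-mutex ABBA cycle described above */
        if (start)
            device_start();
        else if (stop)
            device_stop();
    }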
> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Wednesday, June 7, 2023 10:59 PM
> Subject: Re: [PATCH v5 00/26] Add VDUSE support to Vhost library
>
> [snip]
>
> Anyway, I think it makes sense to perform the device startup after
> having replied to the VDUSE_SET_STATUS request, as it just means the
> device has taken the new status of the driver into account.
>
> Hope it clarifies; let me know if you need more details.

It's very clear! Thanks Maxime for the explanation!

/Chenbo
On 6/7/23 10:05, David Marchand wrote:
> On Tue, Jun 6, 2023 at 10:19 AM Maxime Coquelin
> <maxime.coquelin@redhat.com> wrote:
>>
>> [snip]
>>
>> - Support packed ring
>> - Provide more performance benchmark results
>
> We are missing a reference to the kernel patches required to have
> vduse accept net devices.

Right, I mention it in the cover letter, but it should be in the release
notes also. I propose to append this to the release note:

"While VDUSE support is already available in the upstream Kernel, a
couple of patches are required to support the network device type,
which are being upstreamed:
https://lore.kernel.org/all/20230419134329.346825-1-maxime.coquelin@redhat.com/"

Does that sound good to you?

Thanks,
Maxime

> I had played with the patches at v1 and it was working ok.
> I did not review in depth the latest revisions, but I followed your
> series from the PoC/start.
> Overall, the series lgtm.
>
> For the series,
> Acked-by: David Marchand <david.marchand@redhat.com>
On Thu, Jun 8, 2023 at 11:17 AM Maxime Coquelin
<maxime.coquelin@redhat.com> wrote:
>
> [snip]
>
> Right, I mention it in the cover letter, but it should be in the release
> notes also. I propose to append this to the release note:
>
> "While VDUSE support is already available in the upstream Kernel, a
> couple of patches are required to support the network device type,
> which are being upstreamed:
> https://lore.kernel.org/all/20230419134329.346825-1-maxime.coquelin@redhat.com/"
>
> Does that sound good to you?

Ok for me.
Thanks.
On 6/6/23 10:18, Maxime Coquelin wrote:
> This series introduces a new type of backend, VDUSE,
> to the Vhost library.
>
> [snip]

Applied to dpdk-next-virtio/main.

Thanks,
Maxime