From patchwork Wed Apr 15 16:47:26 2020
X-Patchwork-Submitter: Marvin Liu
X-Patchwork-Id: 68541
X-Patchwork-Delegate: maxime.coquelin@redhat.com
From: Marvin Liu
To: maxime.coquelin@redhat.com, xiaolong.ye@intel.com, zhihong.wang@intel.com
Cc: harry.van.haaren@intel.com, dev@dpdk.org, Marvin Liu
Date: Thu, 16 Apr 2020 00:47:26 +0800
Message-Id: <20200415164733.75416-2-yong.liu@intel.com>
In-Reply-To: <20200415164733.75416-1-yong.liu@intel.com>
References: <20200313174230.74661-1-yong.liu@intel.com> <20200415164733.75416-1-yong.liu@intel.com>
Subject: [dpdk-dev] [PATCH v4 1/8] net/virtio: enable vectorized datapath

Previously, the virtio split ring vectorized datapath was enabled by
default. This is not suitable for everyone, because that datapath does not
follow the virtio spec. Add a dedicated config option for selecting the
virtio vectorized datapath. This option will also be used for the virtio
packed ring.
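For illustration only, the sketch below (not part of the patch; the example_*
names are hypothetical) shows how a build-time option such as
CONFIG_RTE_LIBRTE_VIRTIO_INC_VECTOR is typically consumed: the config entry
becomes the RTE_LIBRTE_VIRTIO_INC_VECTOR macro, and code built without it
falls back to the scalar path at compile time, as later patches in this
series do in virtio_user_pmd_probe() and eth_virtio_dev_init().

#include <stdint.h>

struct example_hw {
	uint8_t use_vec_rx; /* mirrors hw->use_vec_rx introduced later in this series */
};

static void
example_select_rx_path(struct example_hw *hw)
{
#ifdef RTE_LIBRTE_VIRTIO_INC_VECTOR
	/* option enabled at build time: the vectorized Rx path may be
	 * considered; runtime checks (CPU flags, negotiated features)
	 * still apply
	 */
	hw->use_vec_rx = 1;
#else
	/* option disabled at build time: always use the standard path */
	hw->use_vec_rx = 0;
#endif
}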
Signed-off-by: Marvin Liu diff --git a/config/common_base b/config/common_base index 7ca2f28b1..afeda85b0 100644 --- a/config/common_base +++ b/config/common_base @@ -450,6 +450,7 @@ CONFIG_RTE_LIBRTE_VIRTIO_PMD=y CONFIG_RTE_LIBRTE_VIRTIO_DEBUG_RX=n CONFIG_RTE_LIBRTE_VIRTIO_DEBUG_TX=n CONFIG_RTE_LIBRTE_VIRTIO_DEBUG_DUMP=n +CONFIG_RTE_LIBRTE_VIRTIO_INC_VECTOR=y # # Compile virtio device emulation inside virtio PMD driver diff --git a/drivers/net/virtio/Makefile b/drivers/net/virtio/Makefile index efdcb0d93..9ef445bc9 100644 --- a/drivers/net/virtio/Makefile +++ b/drivers/net/virtio/Makefile @@ -29,6 +29,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_rxtx.c SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_ethdev.c SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_rxtx_simple.c +ifeq ($(CONFIG_RTE_LIBRTE_VIRTIO_INC_VECTOR),y) ifeq ($(CONFIG_RTE_ARCH_X86),y) SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_rxtx_simple_sse.c else ifeq ($(CONFIG_RTE_ARCH_PPC_64),y) @@ -36,6 +37,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_rxtx_simple_altivec.c else ifneq ($(filter y,$(CONFIG_RTE_ARCH_ARM) $(CONFIG_RTE_ARCH_ARM64)),) SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_rxtx_simple_neon.c endif +endif ifeq ($(CONFIG_RTE_VIRTIO_USER),y) SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_user/vhost_user.c From patchwork Wed Apr 15 16:47:27 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Marvin Liu X-Patchwork-Id: 68542 X-Patchwork-Delegate: maxime.coquelin@redhat.com Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id 64D45A0563; Wed, 15 Apr 2020 11:13:38 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id E45461D654; Wed, 15 Apr 2020 11:13:25 +0200 (CEST) Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by dpdk.org (Postfix) with ESMTP id E2FE61D62D for ; Wed, 15 Apr 2020 11:13:21 +0200 (CEST) IronPort-SDR: vGiMV8YMai9MuQ4if3ga+s/DGJleeJUuEmXHidWS9AcyY07zWJKvqKqhMxzamOi5DA7SSjBW70 s3yE5D2MDDTA== X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Apr 2020 02:13:21 -0700 IronPort-SDR: HawkskQqurqDnQRNZSbac+mwj3mwLY9WFwPhXt0CPTDgB73cY2e7jq5jt8YY/IV/NIVLkHBZYj +8RtTp1+3jew== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.72,386,1580803200"; d="scan'208";a="400250838" Received: from npg-dpdk-virtual-marvin-dev.sh.intel.com ([10.67.119.58]) by orsmga004.jf.intel.com with ESMTP; 15 Apr 2020 02:13:19 -0700 From: Marvin Liu To: maxime.coquelin@redhat.com, xiaolong.ye@intel.com, zhihong.wang@intel.com Cc: harry.van.haaren@intel.com, dev@dpdk.org, Marvin Liu Date: Thu, 16 Apr 2020 00:47:27 +0800 Message-Id: <20200415164733.75416-3-yong.liu@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20200415164733.75416-1-yong.liu@intel.com> References: <20200313174230.74661-1-yong.liu@intel.com> <20200415164733.75416-1-yong.liu@intel.com> Subject: [dpdk-dev] [PATCH v4 2/8] net/virtio-user: add vectorized datapath parameter X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Add new parameter "vectorized" which can enable vectorized 
datapath explicitly. This parameter will work for both split ring and packed ring. When "vectorized" option is on, driver will check both compiling environment and running enviornment. Signed-off-by: Marvin Liu diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c index f9d0ea70d..19a36ad82 100644 --- a/drivers/net/virtio/virtio_ethdev.c +++ b/drivers/net/virtio/virtio_ethdev.c @@ -1547,7 +1547,7 @@ set_rxtx_funcs(struct rte_eth_dev *eth_dev) eth_dev->rx_pkt_burst = &virtio_recv_pkts_packed; } } else { - if (hw->use_simple_rx) { + if (hw->use_vec_rx) { PMD_INIT_LOG(INFO, "virtio: using simple Rx path on port %u", eth_dev->data->port_id); eth_dev->rx_pkt_burst = virtio_recv_pkts_vec; @@ -2157,33 +2157,31 @@ virtio_dev_configure(struct rte_eth_dev *dev) return -EBUSY; } - hw->use_simple_rx = 1; - if (vtpci_with_feature(hw, VIRTIO_F_IN_ORDER)) { hw->use_inorder_tx = 1; hw->use_inorder_rx = 1; - hw->use_simple_rx = 0; + hw->use_vec_rx = 0; } if (vtpci_packed_queue(hw)) { - hw->use_simple_rx = 0; + hw->use_vec_rx = 0; hw->use_inorder_rx = 0; } #if defined RTE_ARCH_ARM64 || defined RTE_ARCH_ARM if (!rte_cpu_get_flag_enabled(RTE_CPUFLAG_NEON)) { - hw->use_simple_rx = 0; + hw->use_vec_rx = 0; } #endif if (vtpci_with_feature(hw, VIRTIO_NET_F_MRG_RXBUF)) { - hw->use_simple_rx = 0; + hw->use_vec_rx = 0; } if (rx_offloads & (DEV_RX_OFFLOAD_UDP_CKSUM | DEV_RX_OFFLOAD_TCP_CKSUM | DEV_RX_OFFLOAD_TCP_LRO | DEV_RX_OFFLOAD_VLAN_STRIP)) - hw->use_simple_rx = 0; + hw->use_vec_rx = 0; return 0; } diff --git a/drivers/net/virtio/virtio_pci.h b/drivers/net/virtio/virtio_pci.h index 7433d2f08..36afed313 100644 --- a/drivers/net/virtio/virtio_pci.h +++ b/drivers/net/virtio/virtio_pci.h @@ -250,7 +250,8 @@ struct virtio_hw { uint8_t vlan_strip; uint8_t use_msix; uint8_t modern; - uint8_t use_simple_rx; + uint8_t use_vec_rx; + uint8_t use_vec_tx; uint8_t use_inorder_rx; uint8_t use_inorder_tx; uint8_t weak_barriers; diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c index 3a2dbc2e0..285af1d47 100644 --- a/drivers/net/virtio/virtio_rxtx.c +++ b/drivers/net/virtio/virtio_rxtx.c @@ -995,7 +995,7 @@ virtio_dev_rx_queue_setup_finish(struct rte_eth_dev *dev, uint16_t queue_idx) /* Allocate blank mbufs for the each rx descriptor */ nbufs = 0; - if (hw->use_simple_rx) { + if (hw->use_vec_rx) { for (desc_idx = 0; desc_idx < vq->vq_nentries; desc_idx++) { vq->vq_split.ring.avail->ring[desc_idx] = desc_idx; @@ -1013,7 +1013,7 @@ virtio_dev_rx_queue_setup_finish(struct rte_eth_dev *dev, uint16_t queue_idx) &rxvq->fake_mbuf; } - if (hw->use_simple_rx) { + if (hw->use_vec_rx) { while (vq->vq_free_cnt >= RTE_VIRTIO_VPMD_RX_REARM_THRESH) { virtio_rxq_rearm_vec(rxvq); nbufs += RTE_VIRTIO_VPMD_RX_REARM_THRESH; diff --git a/drivers/net/virtio/virtio_user_ethdev.c b/drivers/net/virtio/virtio_user_ethdev.c index e61af4068..ca7797cfa 100644 --- a/drivers/net/virtio/virtio_user_ethdev.c +++ b/drivers/net/virtio/virtio_user_ethdev.c @@ -450,6 +450,8 @@ static const char *valid_args[] = { VIRTIO_USER_ARG_IN_ORDER, #define VIRTIO_USER_ARG_PACKED_VQ "packed_vq" VIRTIO_USER_ARG_PACKED_VQ, +#define VIRTIO_USER_ARG_VECTORIZED "vectorized" + VIRTIO_USER_ARG_VECTORIZED, NULL }; @@ -518,7 +520,8 @@ virtio_user_eth_dev_alloc(struct rte_vdev_device *vdev) */ hw->use_msix = 1; hw->modern = 0; - hw->use_simple_rx = 0; + hw->use_vec_rx = 0; + hw->use_vec_tx = 0; hw->use_inorder_rx = 0; hw->use_inorder_tx = 0; hw->virtio_user_dev = dev; @@ -552,6 +555,8 @@ virtio_user_pmd_probe(struct 
rte_vdev_device *dev) uint64_t mrg_rxbuf = 1; uint64_t in_order = 1; uint64_t packed_vq = 0; + uint64_t vectorized = 0; + char *path = NULL; char *ifname = NULL; char *mac_addr = NULL; @@ -668,6 +673,17 @@ virtio_user_pmd_probe(struct rte_vdev_device *dev) } } +#ifdef RTE_LIBRTE_VIRTIO_INC_VECTOR + if (rte_kvargs_count(kvlist, VIRTIO_USER_ARG_VECTORIZED) == 1) { + if (rte_kvargs_process(kvlist, VIRTIO_USER_ARG_VECTORIZED, + &get_integer_arg, &vectorized) < 0) { + PMD_INIT_LOG(ERR, "error to parse %s", + VIRTIO_USER_ARG_VECTORIZED); + goto end; + } + } +#endif + if (queues > 1 && cq == 0) { PMD_INIT_LOG(ERR, "multi-q requires ctrl-q"); goto end; @@ -705,6 +721,7 @@ virtio_user_pmd_probe(struct rte_vdev_device *dev) } hw = eth_dev->data->dev_private; + if (virtio_user_dev_init(hw->virtio_user_dev, path, queues, cq, queue_size, mac_addr, &ifname, server_mode, mrg_rxbuf, in_order, packed_vq) < 0) { @@ -720,6 +737,20 @@ virtio_user_pmd_probe(struct rte_vdev_device *dev) goto end; } + if (vectorized) { + if (packed_vq) { +#if defined(CC_AVX512_SUPPORT) + hw->use_vec_rx = 1; + hw->use_vec_tx = 1; +#else + PMD_INIT_LOG(INFO, + "building environment do not match packed ring vectorized requirement"); +#endif + } else { + hw->use_vec_rx = 1; + } + } + rte_eth_dev_probing_finish(eth_dev); ret = 0; @@ -777,4 +808,5 @@ RTE_PMD_REGISTER_PARAM_STRING(net_virtio_user, "server=<0|1> " "mrg_rxbuf=<0|1> " "in_order=<0|1> " - "packed_vq=<0|1>"); + "packed_vq=<0|1>" + "vectorized=<0|1>"); diff --git a/drivers/net/virtio/virtqueue.c b/drivers/net/virtio/virtqueue.c index 0b4e3bf3e..349ff0c9d 100644 --- a/drivers/net/virtio/virtqueue.c +++ b/drivers/net/virtio/virtqueue.c @@ -32,7 +32,7 @@ virtqueue_detach_unused(struct virtqueue *vq) end = (vq->vq_avail_idx + vq->vq_free_cnt) & (vq->vq_nentries - 1); for (idx = 0; idx < vq->vq_nentries; idx++) { - if (hw->use_simple_rx && type == VTNET_RQ) { + if (hw->use_vec_rx && type == VTNET_RQ) { if (start <= end && idx >= start && idx < end) continue; if (start > end && (idx >= start || idx < end)) @@ -97,7 +97,7 @@ virtqueue_rxvq_flush_split(struct virtqueue *vq) for (i = 0; i < nb_used; i++) { used_idx = vq->vq_used_cons_idx & (vq->vq_nentries - 1); uep = &vq->vq_split.ring.used->ring[used_idx]; - if (hw->use_simple_rx) { + if (hw->use_vec_rx) { desc_idx = used_idx; rte_pktmbuf_free(vq->sw_ring[desc_idx]); vq->vq_free_cnt++; @@ -121,7 +121,7 @@ virtqueue_rxvq_flush_split(struct virtqueue *vq) vq->vq_used_cons_idx++; } - if (hw->use_simple_rx) { + if (hw->use_vec_rx) { while (vq->vq_free_cnt >= RTE_VIRTIO_VPMD_RX_REARM_THRESH) { virtio_rxq_rearm_vec(rxq); if (virtqueue_kick_prepare(vq)) From patchwork Wed Apr 15 16:47:28 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Marvin Liu X-Patchwork-Id: 68543 X-Patchwork-Delegate: maxime.coquelin@redhat.com Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id 117B9A0563; Wed, 15 Apr 2020 11:13:59 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 8D5EE1D664; Wed, 15 Apr 2020 11:13:28 +0200 (CEST) Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by dpdk.org (Postfix) with ESMTP id 0DFBA1D650 for ; Wed, 15 Apr 2020 11:13:23 +0200 (CEST) IronPort-SDR: yhnYltuDuTrltYclZI6PcmCbk/OX0kojfNTO4jD2FZkojnPz0nMhE1FtGO9IPb0Q5J32aFPcA3 2GsZ01wRJk+w== X-Amp-Result: SKIPPED(no 
attachment in message) X-Amp-File-Uploaded: False Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Apr 2020 02:13:23 -0700 IronPort-SDR: cI+R7Z6/eIzkJ6IfBV26BbRLA/ZEAzHdyiLO2JS0wUC8vR4SeegNFQ9Sua6UuTH0g8ei4+7swE 8M6v7pmRr5DQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.72,386,1580803200"; d="scan'208";a="400250860" Received: from npg-dpdk-virtual-marvin-dev.sh.intel.com ([10.67.119.58]) by orsmga004.jf.intel.com with ESMTP; 15 Apr 2020 02:13:21 -0700 From: Marvin Liu To: maxime.coquelin@redhat.com, xiaolong.ye@intel.com, zhihong.wang@intel.com Cc: harry.van.haaren@intel.com, dev@dpdk.org, Marvin Liu Date: Thu, 16 Apr 2020 00:47:28 +0800 Message-Id: <20200415164733.75416-4-yong.liu@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20200415164733.75416-1-yong.liu@intel.com> References: <20200313174230.74661-1-yong.liu@intel.com> <20200415164733.75416-1-yong.liu@intel.com> Subject: [dpdk-dev] [PATCH v4 3/8] net/virtio: add vectorized packed ring Rx function X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Optimize packed ring Rx datapath when AVX512 enabled and mergeable buffer/Rx LRO offloading are not required. Solution of optimization is pretty like vhost, is that split datapath into batch and single functions. Batch function is further optimized by vector instructions. Also pad desc extra structure to 16 bytes aligned, thus four elements will be saved in one batch. Signed-off-by: Marvin Liu diff --git a/drivers/net/virtio/Makefile b/drivers/net/virtio/Makefile index 9ef445bc9..4d20cb61a 100644 --- a/drivers/net/virtio/Makefile +++ b/drivers/net/virtio/Makefile @@ -37,6 +37,40 @@ SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_rxtx_simple_altivec.c else ifneq ($(filter y,$(CONFIG_RTE_ARCH_ARM) $(CONFIG_RTE_ARCH_ARM64)),) SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_rxtx_simple_neon.c endif + +ifneq ($(FORCE_DISABLE_AVX512), y) + CC_AVX512_SUPPORT=\ + $(shell $(CC) -march=native -dM -E - &1 | \ + sed '/./{H;$$!d} ; x ; /AVX512F/!d; /AVX512BW/!d; /AVX512VL/!d' | \ + grep -q AVX512 && echo 1) +endif + +ifeq ($(CC_AVX512_SUPPORT), 1) +CFLAGS += -DCC_AVX512_SUPPORT +SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_rxtx_packed_avx.c + +ifeq ($(RTE_TOOLCHAIN), gcc) +ifeq ($(shell test $(GCC_VERSION) -ge 83 && echo 1), 1) +CFLAGS += -DVIRTIO_GCC_UNROLL_PRAGMA +endif +endif + +ifeq ($(RTE_TOOLCHAIN), clang) +ifeq ($(shell test $(CLANG_MAJOR_VERSION)$(CLANG_MINOR_VERSION) -ge 37 && echo 1), 1) +CFLAGS += -DVIRTIO_CLANG_UNROLL_PRAGMA +endif +endif + +ifeq ($(RTE_TOOLCHAIN), icc) +ifeq ($(shell test $(ICC_MAJOR_VERSION) -ge 16 && echo 1), 1) +CFLAGS += -DVIRTIO_ICC_UNROLL_PRAGMA +endif +endif + +ifeq ($(shell test $(GCC_VERSION) -ge 100 && echo 1), 1) +CFLAGS_virtio_rxtx_packed_avx.o += -Wno-zero-length-bounds +endif +endif endif ifeq ($(CONFIG_RTE_VIRTIO_USER),y) diff --git a/drivers/net/virtio/meson.build b/drivers/net/virtio/meson.build index 04c7fdf25..00f84282c 100644 --- a/drivers/net/virtio/meson.build +++ b/drivers/net/virtio/meson.build @@ -11,6 +11,19 @@ deps += ['kvargs', 'bus_pci'] if arch_subdir == 'x86' sources += files('virtio_rxtx_simple_sse.c') + if dpdk_conf.has('RTE_MACHINE_CPUFLAG_AVX512F') + if '-mno-avx512f' not in machine_args and cc.has_argument('-mavx512vl') and cc.has_argument('-mavx512bw') + cflags += 
['-DCC_AVX512_SUPPORT'] + if (toolchain == 'gcc' and cc.version().version_compare('>=8.3.0')) + cflags += '-DVHOST_GCC_UNROLL_PRAGMA' + elif (toolchain == 'clang' and cc.version().version_compare('>=3.7.0')) + cflags += '-DVHOST_CLANG_UNROLL_PRAGMA' + elif (toolchain == 'icc' and cc.version().version_compare('>=16.0.0')) + cflags += '-DVHOST_ICC_UNROLL_PRAGMA' + endif + sources += files('virtio_rxtx_packed_avx.c') + endif + endif elif arch_subdir == 'ppc_64' sources += files('virtio_rxtx_simple_altivec.c') elif arch_subdir == 'arm' and host_machine.cpu_family().startswith('aarch64') diff --git a/drivers/net/virtio/virtio_ethdev.h b/drivers/net/virtio/virtio_ethdev.h index cd8947656..10e39670e 100644 --- a/drivers/net/virtio/virtio_ethdev.h +++ b/drivers/net/virtio/virtio_ethdev.h @@ -104,6 +104,9 @@ uint16_t virtio_xmit_pkts_inorder(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t virtio_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts); +uint16_t virtio_recv_pkts_packed_vec(void *rx_queue, struct rte_mbuf **rx_pkts, + uint16_t nb_pkts); + int eth_virtio_dev_init(struct rte_eth_dev *eth_dev); void virtio_interrupt_handler(void *param); diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c index 285af1d47..965ce3dab 100644 --- a/drivers/net/virtio/virtio_rxtx.c +++ b/drivers/net/virtio/virtio_rxtx.c @@ -1245,7 +1245,6 @@ virtio_rx_offload(struct rte_mbuf *m, struct virtio_net_hdr *hdr) return 0; } -#define VIRTIO_MBUF_BURST_SZ 64 #define DESC_PER_CACHELINE (RTE_CACHE_LINE_SIZE / sizeof(struct vring_desc)) uint16_t virtio_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts) @@ -2328,3 +2327,11 @@ virtio_xmit_pkts_inorder(void *tx_queue, return nb_tx; } + +__rte_weak uint16_t +virtio_recv_pkts_packed_vec(void __rte_unused *rx_queue, + struct rte_mbuf __rte_unused **rx_pkts, + uint16_t __rte_unused nb_pkts) +{ + return 0; +} diff --git a/drivers/net/virtio/virtio_rxtx_packed_avx.c b/drivers/net/virtio/virtio_rxtx_packed_avx.c new file mode 100644 index 000000000..f2976b98f --- /dev/null +++ b/drivers/net/virtio/virtio_rxtx_packed_avx.c @@ -0,0 +1,358 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2010-2020 Intel Corporation + */ + +#include +#include +#include +#include +#include + +#include + +#include "virtio_logs.h" +#include "virtio_ethdev.h" +#include "virtio_pci.h" +#include "virtqueue.h" + +#define PACKED_FLAGS_MASK (1ULL << 55 | 1ULL << 63) + +#define PACKED_BATCH_SIZE (RTE_CACHE_LINE_SIZE / \ + sizeof(struct vring_packed_desc)) +#define PACKED_BATCH_MASK (PACKED_BATCH_SIZE - 1) + +#ifdef VIRTIO_GCC_UNROLL_PRAGMA +#define virtio_for_each_try_unroll(iter, val, size) _Pragma("GCC unroll 4") \ + for (iter = val; iter < size; iter++) +#endif + +#ifdef VIRTIO_CLANG_UNROLL_PRAGMA +#define virtio_for_each_try_unroll(iter, val, size) _Pragma("unroll 4") \ + for (iter = val; iter < size; iter++) +#endif + +#ifdef VIRTIO_ICC_UNROLL_PRAGMA +#define virtio_for_each_try_unroll(iter, val, size) _Pragma("unroll (4)") \ + for (iter = val; iter < size; iter++) +#endif + +#ifndef virtio_for_each_try_unroll +#define virtio_for_each_try_unroll(iter, val, num) \ + for (iter = val; iter < num; iter++) +#endif + + +static inline void +virtio_update_batch_stats(struct virtnet_stats *stats, + uint16_t pkt_len1, + uint16_t pkt_len2, + uint16_t pkt_len3, + uint16_t pkt_len4) +{ + stats->bytes += pkt_len1; + stats->bytes += pkt_len2; + stats->bytes += pkt_len3; + stats->bytes += pkt_len4; +} +/* Optionally fill offload 
information in structure */ +static inline int +virtio_vec_rx_offload(struct rte_mbuf *m, struct virtio_net_hdr *hdr) +{ + struct rte_net_hdr_lens hdr_lens; + uint32_t hdrlen, ptype; + int l4_supported = 0; + + /* nothing to do */ + if (hdr->flags == 0) + return 0; + + /* GSO not support in vec path, skip check */ + m->ol_flags |= PKT_RX_IP_CKSUM_UNKNOWN; + + ptype = rte_net_get_ptype(m, &hdr_lens, RTE_PTYPE_ALL_MASK); + m->packet_type = ptype; + if ((ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_TCP || + (ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_UDP || + (ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_SCTP) + l4_supported = 1; + + if (hdr->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) { + hdrlen = hdr_lens.l2_len + hdr_lens.l3_len + hdr_lens.l4_len; + if (hdr->csum_start <= hdrlen && l4_supported) { + m->ol_flags |= PKT_RX_L4_CKSUM_NONE; + } else { + /* Unknown proto or tunnel, do sw cksum. We can assume + * the cksum field is in the first segment since the + * buffers we provided to the host are large enough. + * In case of SCTP, this will be wrong since it's a CRC + * but there's nothing we can do. + */ + uint16_t csum = 0, off; + + rte_raw_cksum_mbuf(m, hdr->csum_start, + rte_pktmbuf_pkt_len(m) - hdr->csum_start, + &csum); + if (likely(csum != 0xffff)) + csum = ~csum; + off = hdr->csum_offset + hdr->csum_start; + if (rte_pktmbuf_data_len(m) >= off + 1) + *rte_pktmbuf_mtod_offset(m, uint16_t *, + off) = csum; + } + } else if (hdr->flags & VIRTIO_NET_HDR_F_DATA_VALID && l4_supported) { + m->ol_flags |= PKT_RX_L4_CKSUM_GOOD; + } + + return 0; +} + +static uint16_t +virtqueue_dequeue_batch_packed_vec(struct virtnet_rx *rxvq, + struct rte_mbuf **rx_pkts) +{ + struct virtqueue *vq = rxvq->vq; + struct virtio_hw *hw = vq->hw; + uint16_t hdr_size = hw->vtnet_hdr_size; + uint64_t addrs[PACKED_BATCH_SIZE << 1]; + uint16_t id = vq->vq_used_cons_idx; + uint8_t desc_stats; + uint16_t i; + void *desc_addr; + + if (id & PACKED_BATCH_MASK) + return -1; + + /* only care avail/used bits */ + __m512i desc_flags = _mm512_maskz_set1_epi64(0xaa, PACKED_FLAGS_MASK); + desc_addr = &vq->vq_packed.ring.desc[id]; + + rte_smp_rmb(); + __m512i packed_desc = _mm512_loadu_si512(desc_addr); + __m512i flags_mask = _mm512_maskz_and_epi64(0xff, packed_desc, + desc_flags); + + __m512i used_flags; + if (vq->vq_packed.used_wrap_counter) + used_flags = _mm512_maskz_set1_epi64(0xaa, PACKED_FLAGS_MASK); + else + used_flags = _mm512_setzero_si512(); + + /* Check all descs are used */ + desc_stats = _mm512_cmp_epu64_mask(flags_mask, used_flags, + _MM_CMPINT_EQ); + if (desc_stats != 0xff) + return -1; + + virtio_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE) { + rx_pkts[i] = (struct rte_mbuf *)vq->vq_descx[id + i].cookie; + rte_packet_prefetch(rte_pktmbuf_mtod(rx_pkts[i], void *)); + + addrs[i << 1] = (uint64_t)rx_pkts[i]->rx_descriptor_fields1; + addrs[(i << 1) + 1] = + (uint64_t)rx_pkts[i]->rx_descriptor_fields1 + 8; + } + + /* addresses of pkt_len and data_len */ + __m512i vindex = _mm512_loadu_si512((void *)addrs); + + /* + * select 10b*4 load 32bit from packed_desc[95:64] + * mmask 0110b*4 save 32bit into pkt_len and data_len + */ + __m512i value = _mm512_maskz_shuffle_epi32(0x6666, packed_desc, 0xAA); + + /* mmask 0110b*4 reduce hdr_len from pkt_len and data_len */ + __m512i mbuf_len_offset = _mm512_maskz_set1_epi32(0x6666, + (uint32_t)-hdr_size); + + value = _mm512_add_epi32(value, mbuf_len_offset); + /* batch store into mbufs */ + _mm512_i64scatter_epi64(0, vindex, value, 1); + + if (hw->has_rx_offload) { + virtio_for_each_try_unroll(i, 
0, PACKED_BATCH_SIZE) { + char *addr = (char *)rx_pkts[i]->buf_addr + + RTE_PKTMBUF_HEADROOM - hdr_size; + virtio_vec_rx_offload(rx_pkts[i], + (struct virtio_net_hdr *)addr); + } + } + + virtio_update_batch_stats(&rxvq->stats, rx_pkts[0]->pkt_len, + rx_pkts[1]->pkt_len, rx_pkts[2]->pkt_len, + rx_pkts[3]->pkt_len); + + vq->vq_free_cnt += PACKED_BATCH_SIZE; + + vq->vq_used_cons_idx += PACKED_BATCH_SIZE; + if (vq->vq_used_cons_idx >= vq->vq_nentries) { + vq->vq_used_cons_idx -= vq->vq_nentries; + vq->vq_packed.used_wrap_counter ^= 1; + } + + return 0; +} + +static uint16_t +virtqueue_dequeue_single_packed_vec(struct virtnet_rx *rxvq, + struct rte_mbuf **rx_pkts) +{ + uint16_t used_idx, id; + uint32_t len; + struct virtqueue *vq = rxvq->vq; + struct virtio_hw *hw = vq->hw; + uint32_t hdr_size = hw->vtnet_hdr_size; + struct virtio_net_hdr *hdr; + struct vring_packed_desc *desc; + struct rte_mbuf *cookie; + + desc = vq->vq_packed.ring.desc; + used_idx = vq->vq_used_cons_idx; + if (!desc_is_used(&desc[used_idx], vq)) + return -1; + + len = desc[used_idx].len; + id = desc[used_idx].id; + cookie = (struct rte_mbuf *)vq->vq_descx[id].cookie; + if (unlikely(cookie == NULL)) { + PMD_DRV_LOG(ERR, "vring descriptor with no mbuf cookie at %u", + vq->vq_used_cons_idx); + return -1; + } + rte_prefetch0(cookie); + rte_packet_prefetch(rte_pktmbuf_mtod(cookie, void *)); + + cookie->data_off = RTE_PKTMBUF_HEADROOM; + cookie->ol_flags = 0; + cookie->pkt_len = (uint32_t)(len - hdr_size); + cookie->data_len = (uint32_t)(len - hdr_size); + + hdr = (struct virtio_net_hdr *)((char *)cookie->buf_addr + + RTE_PKTMBUF_HEADROOM - hdr_size); + if (hw->has_rx_offload) + virtio_vec_rx_offload(cookie, hdr); + + *rx_pkts = cookie; + + rxvq->stats.bytes += cookie->pkt_len; + + vq->vq_free_cnt++; + vq->vq_used_cons_idx++; + if (vq->vq_used_cons_idx >= vq->vq_nentries) { + vq->vq_used_cons_idx -= vq->vq_nentries; + vq->vq_packed.used_wrap_counter ^= 1; + } + + return 0; +} + +static inline void +virtio_recv_refill_packed_vec(struct virtnet_rx *rxvq, + struct rte_mbuf **cookie, + uint16_t num) +{ + struct virtqueue *vq = rxvq->vq; + struct vring_packed_desc *start_dp = vq->vq_packed.ring.desc; + uint16_t flags = vq->vq_packed.cached_flags; + struct virtio_hw *hw = vq->hw; + struct vq_desc_extra *dxp; + uint16_t idx, i; + uint16_t total_num = 0; + uint16_t head_idx = vq->vq_avail_idx; + uint16_t head_flag = vq->vq_packed.cached_flags; + uint64_t addr; + + do { + idx = vq->vq_avail_idx; + virtio_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE) { + dxp = &vq->vq_descx[idx + i]; + dxp->cookie = (void *)cookie[total_num + i]; + + addr = VIRTIO_MBUF_ADDR(cookie[total_num + i], vq) + + RTE_PKTMBUF_HEADROOM - hw->vtnet_hdr_size; + start_dp[idx + i].addr = addr; + start_dp[idx + i].len = cookie[total_num + i]->buf_len + - RTE_PKTMBUF_HEADROOM + hw->vtnet_hdr_size; + if (total_num || i) { + virtqueue_store_flags_packed(&start_dp[idx + i], + flags, hw->weak_barriers); + } + } + + vq->vq_avail_idx += PACKED_BATCH_SIZE; + if (vq->vq_avail_idx >= vq->vq_nentries) { + vq->vq_avail_idx -= vq->vq_nentries; + vq->vq_packed.cached_flags ^= + VRING_PACKED_DESC_F_AVAIL_USED; + flags = vq->vq_packed.cached_flags; + } + total_num += PACKED_BATCH_SIZE; + } while (total_num < num); + + virtqueue_store_flags_packed(&start_dp[head_idx], head_flag, + hw->weak_barriers); + vq->vq_free_cnt = (uint16_t)(vq->vq_free_cnt - num); +} + +uint16_t +virtio_recv_pkts_packed_vec(void *rx_queue, + struct rte_mbuf **rx_pkts, + uint16_t nb_pkts) +{ + struct virtnet_rx *rxvq 
= rx_queue; + struct virtqueue *vq = rxvq->vq; + struct virtio_hw *hw = vq->hw; + uint16_t num, nb_rx = 0; + uint32_t nb_enqueued = 0; + uint16_t free_cnt = vq->vq_free_thresh; + + if (unlikely(hw->started == 0)) + return nb_rx; + + num = RTE_MIN(VIRTIO_MBUF_BURST_SZ, nb_pkts); + if (likely(num > PACKED_BATCH_SIZE)) + num = num - ((vq->vq_used_cons_idx + num) % PACKED_BATCH_SIZE); + + while (num) { + if (!virtqueue_dequeue_batch_packed_vec(rxvq, + &rx_pkts[nb_rx])) { + nb_rx += PACKED_BATCH_SIZE; + num -= PACKED_BATCH_SIZE; + continue; + } + if (!virtqueue_dequeue_single_packed_vec(rxvq, + &rx_pkts[nb_rx])) { + nb_rx++; + num--; + continue; + } + break; + }; + + PMD_RX_LOG(DEBUG, "dequeue:%d", num); + + rxvq->stats.packets += nb_rx; + + if (likely(vq->vq_free_cnt >= free_cnt)) { + struct rte_mbuf *new_pkts[free_cnt]; + if (likely(rte_pktmbuf_alloc_bulk(rxvq->mpool, new_pkts, + free_cnt) == 0)) { + virtio_recv_refill_packed_vec(rxvq, new_pkts, + free_cnt); + nb_enqueued += free_cnt; + } else { + struct rte_eth_dev *dev = + &rte_eth_devices[rxvq->port_id]; + dev->data->rx_mbuf_alloc_failed += free_cnt; + } + } + + if (likely(nb_enqueued)) { + if (unlikely(virtqueue_kick_prepare_packed(vq))) { + virtqueue_notify(vq); + PMD_RX_LOG(DEBUG, "Notified"); + } + } + + return nb_rx; +} diff --git a/drivers/net/virtio/virtqueue.h b/drivers/net/virtio/virtqueue.h index 6301c56b2..43e305ecc 100644 --- a/drivers/net/virtio/virtqueue.h +++ b/drivers/net/virtio/virtqueue.h @@ -20,6 +20,7 @@ struct rte_mbuf; #define DEFAULT_RX_FREE_THRESH 32 +#define VIRTIO_MBUF_BURST_SZ 64 /* * Per virtio_ring.h in Linux. * For virtio_pci on SMP, we don't need to order with respect to MMIO @@ -236,7 +237,8 @@ struct vq_desc_extra { void *cookie; uint16_t ndescs; uint16_t next; -}; + uint8_t padding[4]; +} __rte_packed __rte_aligned(16); struct virtqueue { struct virtio_hw *hw; /**< virtio_hw structure pointer. 
*/

From patchwork Wed Apr 15 16:47:29 2020
X-Patchwork-Submitter: Marvin Liu
X-Patchwork-Id: 68544
X-Patchwork-Delegate: maxime.coquelin@redhat.com
From: Marvin Liu
To: maxime.coquelin@redhat.com, xiaolong.ye@intel.com, zhihong.wang@intel.com
Cc: harry.van.haaren@intel.com, dev@dpdk.org, Marvin Liu
Date: Thu, 16 Apr 2020 00:47:29 +0800
Message-Id: <20200415164733.75416-5-yong.liu@intel.com>
In-Reply-To: <20200415164733.75416-1-yong.liu@intel.com>
References: <20200313174230.74661-1-yong.liu@intel.com> <20200415164733.75416-1-yong.liu@intel.com>
Subject: [dpdk-dev] [PATCH v4 4/8] net/virtio: reuse packed ring xmit functions

Move the xmit offload helper and the packed ring xmit enqueue function into
the virtqueue header file. These functions will be reused by the packed ring
vectorized Tx function.
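As a rough illustration of the reuse pattern (simplified types; the toy_*
names are made up and are not the driver's), keeping such helpers as static
inline functions in a shared header lets both the scalar rxtx file and the
AVX512 Tx file call them without exporting extra symbols:

#include <stdint.h>
#include <stdbool.h>

/* simplified stand-in for struct virtio_net_hdr */
struct toy_net_hdr {
	uint8_t  flags;
	uint16_t csum_start;
};

/* lives in the shared header, so any .c file that includes it gets an
 * inlined copy instead of needing an exported symbol
 */
static inline void
toy_xmit_offload(struct toy_net_hdr *hdr, uint16_t l2_len, uint16_t l3_len,
		bool offload)
{
	if (!offload) {
		hdr->flags = 0;
		hdr->csum_start = 0;
		return;
	}
	hdr->csum_start = l2_len + l3_len; /* csum starts after L2 + L3 */
	hdr->flags = 1;                    /* stands in for NEEDS_CSUM */
}

/* the scalar path (virtio_rxtx.c) and the vectorized path
 * (virtio_rxtx_packed_avx.c) would both call the same inline helper
 */
static void
toy_any_xmit(struct toy_net_hdr *hdr)
{
	toy_xmit_offload(hdr, 14, 20, true);
}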
Signed-off-by: Marvin Liu diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c index 965ce3dab..1d8135f4f 100644 --- a/drivers/net/virtio/virtio_rxtx.c +++ b/drivers/net/virtio/virtio_rxtx.c @@ -264,10 +264,6 @@ virtqueue_dequeue_rx_inorder(struct virtqueue *vq, return i; } -#ifndef DEFAULT_TX_FREE_THRESH -#define DEFAULT_TX_FREE_THRESH 32 -#endif - static void virtio_xmit_cleanup_inorder_packed(struct virtqueue *vq, int num) { @@ -562,68 +558,7 @@ virtio_tso_fix_cksum(struct rte_mbuf *m) } -/* avoid write operation when necessary, to lessen cache issues */ -#define ASSIGN_UNLESS_EQUAL(var, val) do { \ - if ((var) != (val)) \ - (var) = (val); \ -} while (0) - -#define virtqueue_clear_net_hdr(_hdr) do { \ - ASSIGN_UNLESS_EQUAL((_hdr)->csum_start, 0); \ - ASSIGN_UNLESS_EQUAL((_hdr)->csum_offset, 0); \ - ASSIGN_UNLESS_EQUAL((_hdr)->flags, 0); \ - ASSIGN_UNLESS_EQUAL((_hdr)->gso_type, 0); \ - ASSIGN_UNLESS_EQUAL((_hdr)->gso_size, 0); \ - ASSIGN_UNLESS_EQUAL((_hdr)->hdr_len, 0); \ -} while (0) - -static inline void -virtqueue_xmit_offload(struct virtio_net_hdr *hdr, - struct rte_mbuf *cookie, - bool offload) -{ - if (offload) { - if (cookie->ol_flags & PKT_TX_TCP_SEG) - cookie->ol_flags |= PKT_TX_TCP_CKSUM; - - switch (cookie->ol_flags & PKT_TX_L4_MASK) { - case PKT_TX_UDP_CKSUM: - hdr->csum_start = cookie->l2_len + cookie->l3_len; - hdr->csum_offset = offsetof(struct rte_udp_hdr, - dgram_cksum); - hdr->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM; - break; - - case PKT_TX_TCP_CKSUM: - hdr->csum_start = cookie->l2_len + cookie->l3_len; - hdr->csum_offset = offsetof(struct rte_tcp_hdr, cksum); - hdr->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM; - break; - - default: - ASSIGN_UNLESS_EQUAL(hdr->csum_start, 0); - ASSIGN_UNLESS_EQUAL(hdr->csum_offset, 0); - ASSIGN_UNLESS_EQUAL(hdr->flags, 0); - break; - } - /* TCP Segmentation Offload */ - if (cookie->ol_flags & PKT_TX_TCP_SEG) { - hdr->gso_type = (cookie->ol_flags & PKT_TX_IPV6) ? - VIRTIO_NET_HDR_GSO_TCPV6 : - VIRTIO_NET_HDR_GSO_TCPV4; - hdr->gso_size = cookie->tso_segsz; - hdr->hdr_len = - cookie->l2_len + - cookie->l3_len + - cookie->l4_len; - } else { - ASSIGN_UNLESS_EQUAL(hdr->gso_type, 0); - ASSIGN_UNLESS_EQUAL(hdr->gso_size, 0); - ASSIGN_UNLESS_EQUAL(hdr->hdr_len, 0); - } - } -} static inline void virtqueue_enqueue_xmit_inorder(struct virtnet_tx *txvq, @@ -725,102 +660,6 @@ virtqueue_enqueue_xmit_packed_fast(struct virtnet_tx *txvq, virtqueue_store_flags_packed(dp, flags, vq->hw->weak_barriers); } -static inline void -virtqueue_enqueue_xmit_packed(struct virtnet_tx *txvq, struct rte_mbuf *cookie, - uint16_t needed, int can_push, int in_order) -{ - struct virtio_tx_region *txr = txvq->virtio_net_hdr_mz->addr; - struct vq_desc_extra *dxp; - struct virtqueue *vq = txvq->vq; - struct vring_packed_desc *start_dp, *head_dp; - uint16_t idx, id, head_idx, head_flags; - int16_t head_size = vq->hw->vtnet_hdr_size; - struct virtio_net_hdr *hdr; - uint16_t prev; - bool prepend_header = false; - - id = in_order ? vq->vq_avail_idx : vq->vq_desc_head_idx; - - dxp = &vq->vq_descx[id]; - dxp->ndescs = needed; - dxp->cookie = cookie; - - head_idx = vq->vq_avail_idx; - idx = head_idx; - prev = head_idx; - start_dp = vq->vq_packed.ring.desc; - - head_dp = &vq->vq_packed.ring.desc[idx]; - head_flags = cookie->next ? 
VRING_DESC_F_NEXT : 0; - head_flags |= vq->vq_packed.cached_flags; - - if (can_push) { - /* prepend cannot fail, checked by caller */ - hdr = rte_pktmbuf_mtod_offset(cookie, struct virtio_net_hdr *, - -head_size); - prepend_header = true; - - /* if offload disabled, it is not zeroed below, do it now */ - if (!vq->hw->has_tx_offload) - virtqueue_clear_net_hdr(hdr); - } else { - /* setup first tx ring slot to point to header - * stored in reserved region. - */ - start_dp[idx].addr = txvq->virtio_net_hdr_mem + - RTE_PTR_DIFF(&txr[idx].tx_hdr, txr); - start_dp[idx].len = vq->hw->vtnet_hdr_size; - hdr = (struct virtio_net_hdr *)&txr[idx].tx_hdr; - idx++; - if (idx >= vq->vq_nentries) { - idx -= vq->vq_nentries; - vq->vq_packed.cached_flags ^= - VRING_PACKED_DESC_F_AVAIL_USED; - } - } - - virtqueue_xmit_offload(hdr, cookie, vq->hw->has_tx_offload); - - do { - uint16_t flags; - - start_dp[idx].addr = VIRTIO_MBUF_DATA_DMA_ADDR(cookie, vq); - start_dp[idx].len = cookie->data_len; - if (prepend_header) { - start_dp[idx].addr -= head_size; - start_dp[idx].len += head_size; - prepend_header = false; - } - - if (likely(idx != head_idx)) { - flags = cookie->next ? VRING_DESC_F_NEXT : 0; - flags |= vq->vq_packed.cached_flags; - start_dp[idx].flags = flags; - } - prev = idx; - idx++; - if (idx >= vq->vq_nentries) { - idx -= vq->vq_nentries; - vq->vq_packed.cached_flags ^= - VRING_PACKED_DESC_F_AVAIL_USED; - } - } while ((cookie = cookie->next) != NULL); - - start_dp[prev].id = id; - - vq->vq_free_cnt = (uint16_t)(vq->vq_free_cnt - needed); - vq->vq_avail_idx = idx; - - if (!in_order) { - vq->vq_desc_head_idx = dxp->next; - if (vq->vq_desc_head_idx == VQ_RING_DESC_CHAIN_END) - vq->vq_desc_tail_idx = VQ_RING_DESC_CHAIN_END; - } - - virtqueue_store_flags_packed(head_dp, head_flags, - vq->hw->weak_barriers); -} - static inline void virtqueue_enqueue_xmit(struct virtnet_tx *txvq, struct rte_mbuf *cookie, uint16_t needed, int use_indirect, int can_push, diff --git a/drivers/net/virtio/virtqueue.h b/drivers/net/virtio/virtqueue.h index 43e305ecc..31c48710c 100644 --- a/drivers/net/virtio/virtqueue.h +++ b/drivers/net/virtio/virtqueue.h @@ -18,6 +18,7 @@ struct rte_mbuf; +#define DEFAULT_TX_FREE_THRESH 32 #define DEFAULT_RX_FREE_THRESH 32 #define VIRTIO_MBUF_BURST_SZ 64 @@ -562,4 +563,162 @@ virtqueue_notify(struct virtqueue *vq) #define VIRTQUEUE_DUMP(vq) do { } while (0) #endif +/* avoid write operation when necessary, to lessen cache issues */ +#define ASSIGN_UNLESS_EQUAL(var, val) do { \ + if ((var) != (val)) \ + (var) = (val); \ +} while (0) + +#define virtqueue_clear_net_hdr(_hdr) do { \ + ASSIGN_UNLESS_EQUAL((_hdr)->csum_start, 0); \ + ASSIGN_UNLESS_EQUAL((_hdr)->csum_offset, 0); \ + ASSIGN_UNLESS_EQUAL((_hdr)->flags, 0); \ + ASSIGN_UNLESS_EQUAL((_hdr)->gso_type, 0); \ + ASSIGN_UNLESS_EQUAL((_hdr)->gso_size, 0); \ + ASSIGN_UNLESS_EQUAL((_hdr)->hdr_len, 0); \ +} while (0) + +static inline void +virtqueue_xmit_offload(struct virtio_net_hdr *hdr, + struct rte_mbuf *cookie, + bool offload) +{ + if (offload) { + if (cookie->ol_flags & PKT_TX_TCP_SEG) + cookie->ol_flags |= PKT_TX_TCP_CKSUM; + + switch (cookie->ol_flags & PKT_TX_L4_MASK) { + case PKT_TX_UDP_CKSUM: + hdr->csum_start = cookie->l2_len + cookie->l3_len; + hdr->csum_offset = offsetof(struct rte_udp_hdr, + dgram_cksum); + hdr->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM; + break; + + case PKT_TX_TCP_CKSUM: + hdr->csum_start = cookie->l2_len + cookie->l3_len; + hdr->csum_offset = offsetof(struct rte_tcp_hdr, cksum); + hdr->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM; + 
break; + + default: + ASSIGN_UNLESS_EQUAL(hdr->csum_start, 0); + ASSIGN_UNLESS_EQUAL(hdr->csum_offset, 0); + ASSIGN_UNLESS_EQUAL(hdr->flags, 0); + break; + } + + /* TCP Segmentation Offload */ + if (cookie->ol_flags & PKT_TX_TCP_SEG) { + hdr->gso_type = (cookie->ol_flags & PKT_TX_IPV6) ? + VIRTIO_NET_HDR_GSO_TCPV6 : + VIRTIO_NET_HDR_GSO_TCPV4; + hdr->gso_size = cookie->tso_segsz; + hdr->hdr_len = + cookie->l2_len + + cookie->l3_len + + cookie->l4_len; + } else { + ASSIGN_UNLESS_EQUAL(hdr->gso_type, 0); + ASSIGN_UNLESS_EQUAL(hdr->gso_size, 0); + ASSIGN_UNLESS_EQUAL(hdr->hdr_len, 0); + } + } +} + +static inline void +virtqueue_enqueue_xmit_packed(struct virtnet_tx *txvq, struct rte_mbuf *cookie, + uint16_t needed, int can_push, int in_order) +{ + struct virtio_tx_region *txr = txvq->virtio_net_hdr_mz->addr; + struct vq_desc_extra *dxp; + struct virtqueue *vq = txvq->vq; + struct vring_packed_desc *start_dp, *head_dp; + uint16_t idx, id, head_idx, head_flags; + int16_t head_size = vq->hw->vtnet_hdr_size; + struct virtio_net_hdr *hdr; + uint16_t prev; + bool prepend_header = false; + + id = in_order ? vq->vq_avail_idx : vq->vq_desc_head_idx; + + dxp = &vq->vq_descx[id]; + dxp->ndescs = needed; + dxp->cookie = cookie; + + head_idx = vq->vq_avail_idx; + idx = head_idx; + prev = head_idx; + start_dp = vq->vq_packed.ring.desc; + + head_dp = &vq->vq_packed.ring.desc[idx]; + head_flags = cookie->next ? VRING_DESC_F_NEXT : 0; + head_flags |= vq->vq_packed.cached_flags; + + if (can_push) { + /* prepend cannot fail, checked by caller */ + hdr = rte_pktmbuf_mtod_offset(cookie, struct virtio_net_hdr *, + -head_size); + prepend_header = true; + + /* if offload disabled, it is not zeroed below, do it now */ + if (!vq->hw->has_tx_offload) + virtqueue_clear_net_hdr(hdr); + } else { + /* setup first tx ring slot to point to header + * stored in reserved region. + */ + start_dp[idx].addr = txvq->virtio_net_hdr_mem + + RTE_PTR_DIFF(&txr[idx].tx_hdr, txr); + start_dp[idx].len = vq->hw->vtnet_hdr_size; + hdr = (struct virtio_net_hdr *)&txr[idx].tx_hdr; + idx++; + if (idx >= vq->vq_nentries) { + idx -= vq->vq_nentries; + vq->vq_packed.cached_flags ^= + VRING_PACKED_DESC_F_AVAIL_USED; + } + } + + virtqueue_xmit_offload(hdr, cookie, vq->hw->has_tx_offload); + + do { + uint16_t flags; + + start_dp[idx].addr = VIRTIO_MBUF_DATA_DMA_ADDR(cookie, vq); + start_dp[idx].len = cookie->data_len; + if (prepend_header) { + start_dp[idx].addr -= head_size; + start_dp[idx].len += head_size; + prepend_header = false; + } + + if (likely(idx != head_idx)) { + flags = cookie->next ? 
VRING_DESC_F_NEXT : 0; + flags |= vq->vq_packed.cached_flags; + start_dp[idx].flags = flags; + } + prev = idx; + idx++; + if (idx >= vq->vq_nentries) { + idx -= vq->vq_nentries; + vq->vq_packed.cached_flags ^= + VRING_PACKED_DESC_F_AVAIL_USED; + } + } while ((cookie = cookie->next) != NULL); + + start_dp[prev].id = id; + + vq->vq_free_cnt = (uint16_t)(vq->vq_free_cnt - needed); + vq->vq_avail_idx = idx; + + if (!in_order) { + vq->vq_desc_head_idx = dxp->next; + if (vq->vq_desc_head_idx == VQ_RING_DESC_CHAIN_END) + vq->vq_desc_tail_idx = VQ_RING_DESC_CHAIN_END; + } + + virtqueue_store_flags_packed(head_dp, head_flags, + vq->hw->weak_barriers); +} #endif /* _VIRTQUEUE_H_ */

From patchwork Wed Apr 15 16:47:30 2020
X-Patchwork-Submitter: Marvin Liu
X-Patchwork-Id: 68545
X-Patchwork-Delegate: maxime.coquelin@redhat.com
From: Marvin Liu
To: maxime.coquelin@redhat.com, xiaolong.ye@intel.com, zhihong.wang@intel.com
Cc: harry.van.haaren@intel.com, dev@dpdk.org, Marvin Liu
Date: Thu, 16 Apr 2020 00:47:30 +0800
Message-Id: <20200415164733.75416-6-yong.liu@intel.com>
In-Reply-To: <20200415164733.75416-1-yong.liu@intel.com>
References: <20200313174230.74661-1-yong.liu@intel.com> <20200415164733.75416-1-yong.liu@intel.com>
Subject: [dpdk-dev] [PATCH v4 5/8] net/virtio: add vectorized packed ring Tx datapath

Optimize the packed ring Tx datapath in the same way as the Rx datapath:
split the Tx datapath into batch and single Tx functions. The batch function
is further optimized with vector instructions.
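As a simplified illustration of the batch/single split described above (the
toy_* names and the stub enqueue helpers are made up, not the driver's), the
transmit burst first tries a four-packet vectorized batch and falls back to
enqueuing packets one at a time, mirroring the loop in
virtio_xmit_pkts_packed_vec() later in this patch:

#include <stdint.h>

#define TOY_BATCH_SIZE 4

/* stand-ins for the vectorized batch enqueue and the scalar single
 * enqueue; both return 0 on success and -1 when the packets do not fit
 */
static int
toy_enqueue_batch(void *txq, void **pkts)
{
	(void)txq; (void)pkts;
	return 0; /* assume the batch fits for this sketch */
}

static int
toy_enqueue_single(void *txq, void *pkt)
{
	(void)txq; (void)pkt;
	return 0;
}

static uint16_t
toy_xmit_burst(void *txq, void **pkts, uint16_t nb_pkts)
{
	uint16_t nb_tx = 0;
	uint16_t remained = nb_pkts;

	while (remained) {
		/* fast path: a full batch handled by vector instructions */
		if (remained >= TOY_BATCH_SIZE &&
				toy_enqueue_batch(txq, &pkts[nb_tx]) == 0) {
			nb_tx += TOY_BATCH_SIZE;
			remained -= TOY_BATCH_SIZE;
			continue;
		}
		/* slow path: one descriptor chain at a time */
		if (toy_enqueue_single(txq, pkts[nb_tx]) == 0) {
			nb_tx++;
			remained--;
			continue;
		}
		break; /* ring full: report what was actually sent */
	}
	return nb_tx;
}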
Signed-off-by: Marvin Liu diff --git a/drivers/net/virtio/virtio_ethdev.h b/drivers/net/virtio/virtio_ethdev.h index 10e39670e..c9aaef0af 100644 --- a/drivers/net/virtio/virtio_ethdev.h +++ b/drivers/net/virtio/virtio_ethdev.h @@ -107,6 +107,9 @@ uint16_t virtio_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t virtio_recv_pkts_packed_vec(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts); +uint16_t virtio_xmit_pkts_packed_vec(void *tx_queue, struct rte_mbuf **tx_pkts, + uint16_t nb_pkts); + int eth_virtio_dev_init(struct rte_eth_dev *eth_dev); void virtio_interrupt_handler(void *param); diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c index 1d8135f4f..58c7778f4 100644 --- a/drivers/net/virtio/virtio_rxtx.c +++ b/drivers/net/virtio/virtio_rxtx.c @@ -2174,3 +2174,11 @@ virtio_recv_pkts_packed_vec(void __rte_unused *rx_queue, { return 0; } + +__rte_weak uint16_t +virtio_xmit_pkts_packed_vec(void __rte_unused *tx_queue, + struct rte_mbuf __rte_unused **tx_pkts, + uint16_t __rte_unused nb_pkts) +{ + return 0; +} diff --git a/drivers/net/virtio/virtio_rxtx_packed_avx.c b/drivers/net/virtio/virtio_rxtx_packed_avx.c index f2976b98f..732256c86 100644 --- a/drivers/net/virtio/virtio_rxtx_packed_avx.c +++ b/drivers/net/virtio/virtio_rxtx_packed_avx.c @@ -15,6 +15,21 @@ #include "virtio_pci.h" #include "virtqueue.h" +/* reference count offset in mbuf rearm data */ +#define REF_CNT_OFFSET 16 +/* segment number offset in mbuf rearm data */ +#define SEG_NUM_OFFSET 32 + +#define DEFAULT_REARM_DATA (1ULL << SEG_NUM_OFFSET | \ + 1ULL << REF_CNT_OFFSET) +/* id offset in packed ring desc higher 64bits */ +#define ID_OFFSET 32 +/* flag offset in packed ring desc higher 64bits */ +#define FLAG_OFFSET 48 + +/* net hdr short size mask */ +#define NET_HDR_MASK 0x3F + #define PACKED_FLAGS_MASK (1ULL << 55 | 1ULL << 63) #define PACKED_BATCH_SIZE (RTE_CACHE_LINE_SIZE / \ @@ -41,6 +56,47 @@ for (iter = val; iter < num; iter++) #endif +static void +virtio_xmit_cleanup_packed_vec(struct virtqueue *vq) +{ + struct vring_packed_desc *desc = vq->vq_packed.ring.desc; + struct vq_desc_extra *dxp; + uint16_t used_idx, id, curr_id, free_cnt = 0; + uint16_t size = vq->vq_nentries; + struct rte_mbuf *mbufs[size]; + uint16_t nb_mbuf = 0, i; + + used_idx = vq->vq_used_cons_idx; + + if (!desc_is_used(&desc[used_idx], vq)) + return; + + id = desc[used_idx].id; + + do { + curr_id = used_idx; + dxp = &vq->vq_descx[used_idx]; + used_idx += dxp->ndescs; + free_cnt += dxp->ndescs; + + if (dxp->cookie != NULL) { + mbufs[nb_mbuf] = dxp->cookie; + dxp->cookie = NULL; + nb_mbuf++; + } + + if (used_idx >= size) { + used_idx -= size; + vq->vq_packed.used_wrap_counter ^= 1; + } + } while (curr_id != id); + + for (i = 0; i < nb_mbuf; i++) + rte_pktmbuf_free(mbufs[i]); + + vq->vq_used_cons_idx = used_idx; + vq->vq_free_cnt += free_cnt; +} static inline void virtio_update_batch_stats(struct virtnet_stats *stats, @@ -54,6 +110,229 @@ virtio_update_batch_stats(struct virtnet_stats *stats, stats->bytes += pkt_len3; stats->bytes += pkt_len4; } + +static inline int +virtqueue_enqueue_batch_packed_vec(struct virtnet_tx *txvq, + struct rte_mbuf **tx_pkts) +{ + struct virtqueue *vq = txvq->vq; + uint16_t head_size = vq->hw->vtnet_hdr_size; + uint16_t idx = vq->vq_avail_idx; + struct virtio_net_hdr *hdr; + uint16_t i, cmp; + + if (vq->vq_avail_idx & PACKED_BATCH_MASK) + return -1; + + /* Load four mbufs rearm data */ + __m256i mbufs = _mm256_set_epi64x( + *tx_pkts[3]->rearm_data, + 
*tx_pkts[2]->rearm_data, + *tx_pkts[1]->rearm_data, + *tx_pkts[0]->rearm_data); + + /* refcnt=1 and nb_segs=1 */ + __m256i mbuf_ref = _mm256_set1_epi64x(DEFAULT_REARM_DATA); + __m256i head_rooms = _mm256_set1_epi16(head_size); + + /* Check refcnt and nb_segs */ + cmp = _mm256_cmpneq_epu16_mask(mbufs, mbuf_ref); + if (cmp & 0x6666) + return -1; + + /* Check headroom is enough */ + cmp = _mm256_mask_cmp_epu16_mask(0x1111, mbufs, head_rooms, + _MM_CMPINT_LT); + if (unlikely(cmp)) + return -1; + + __m512i dxps = _mm512_set_epi64( + 0x1, (uint64_t)tx_pkts[3], + 0x1, (uint64_t)tx_pkts[2], + 0x1, (uint64_t)tx_pkts[1], + 0x1, (uint64_t)tx_pkts[0]); + + _mm512_storeu_si512((void *)&vq->vq_descx[idx], dxps); + + virtio_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE) { + tx_pkts[i]->data_off -= head_size; + tx_pkts[i]->data_len += head_size; + } + +#ifdef RTE_VIRTIO_USER + __m512i descs_base = _mm512_set_epi64( + tx_pkts[3]->data_len, + (uint64_t)(*(uintptr_t *)((uintptr_t)tx_pkts[3])), + tx_pkts[2]->data_len, + (uint64_t)(*(uintptr_t *)((uintptr_t)tx_pkts[2])), + tx_pkts[1]->data_len, + (uint64_t)(*(uintptr_t *)((uintptr_t)tx_pkts[1])), + tx_pkts[0]->data_len, + (uint64_t)(*(uintptr_t *)((uintptr_t)tx_pkts[0]))); +#else + __m512i descs_base = _mm512_set_epi64( + tx_pkts[3]->data_len, tx_pkts[3]->buf_iova, + tx_pkts[2]->data_len, tx_pkts[2]->buf_iova, + tx_pkts[1]->data_len, tx_pkts[1]->buf_iova, + tx_pkts[0]->data_len, tx_pkts[0]->buf_iova); +#endif + + /* id offset and data offset */ + __m512i data_offsets = _mm512_set_epi64( + (uint64_t)3 << ID_OFFSET, tx_pkts[3]->data_off, + (uint64_t)2 << ID_OFFSET, tx_pkts[2]->data_off, + (uint64_t)1 << ID_OFFSET, tx_pkts[1]->data_off, + 0, tx_pkts[0]->data_off); + + __m512i new_descs = _mm512_add_epi64(descs_base, data_offsets); + + uint64_t flags_temp = (uint64_t)idx << ID_OFFSET | + (uint64_t)vq->vq_packed.cached_flags << FLAG_OFFSET; + + /* flags offset and guest virtual address offset */ +#ifdef RTE_VIRTIO_USER + __m128i flag_offset = _mm_set_epi64x(flags_temp, (uint64_t)vq->offset); +#else + __m128i flag_offset = _mm_set_epi64x(flags_temp, 0); +#endif + __m512i flag_offsets = _mm512_broadcast_i32x4(flag_offset); + + __m512i descs = _mm512_add_epi64(new_descs, flag_offsets); + + if (!vq->hw->has_tx_offload) { + __m128i mask = _mm_set1_epi16(0xFFFF); + virtio_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE) { + hdr = rte_pktmbuf_mtod_offset(tx_pkts[i], + struct virtio_net_hdr *, -head_size); + __m128i v_hdr = _mm_loadu_si128((void *)hdr); + if (unlikely(_mm_mask_test_epi16_mask(NET_HDR_MASK, + v_hdr, mask))) { + __m128i all_zero = _mm_setzero_si128(); + _mm_mask_storeu_epi16((void *)hdr, + NET_HDR_MASK, all_zero); + } + } + } else { + virtio_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE) { + hdr = rte_pktmbuf_mtod_offset(tx_pkts[i], + struct virtio_net_hdr *, -head_size); + virtqueue_xmit_offload(hdr, tx_pkts[i], true); + } + } + + /* Enqueue Packet buffers */ + rte_smp_wmb(); + _mm512_storeu_si512((void *)&vq->vq_packed.ring.desc[idx], descs); + + virtio_update_batch_stats(&txvq->stats, tx_pkts[0]->pkt_len, + tx_pkts[1]->pkt_len, tx_pkts[2]->pkt_len, + tx_pkts[3]->pkt_len); + + vq->vq_avail_idx += PACKED_BATCH_SIZE; + vq->vq_free_cnt -= PACKED_BATCH_SIZE; + + if (vq->vq_avail_idx >= vq->vq_nentries) { + vq->vq_avail_idx -= vq->vq_nentries; + vq->vq_packed.cached_flags ^= + VRING_PACKED_DESC_F_AVAIL_USED; + } + + return 0; +} + +static inline int +virtqueue_enqueue_single_packed_vec(struct virtnet_tx *txvq, + struct rte_mbuf *txm) +{ + struct virtqueue *vq = 
txvq->vq; + struct virtio_hw *hw = vq->hw; + uint16_t hdr_size = hw->vtnet_hdr_size; + uint16_t slots, can_push; + int16_t need; + + /* How many main ring entries are needed to this Tx? + * any_layout => number of segments + * default => number of segments + 1 + */ + can_push = rte_mbuf_refcnt_read(txm) == 1 && + RTE_MBUF_DIRECT(txm) && + txm->nb_segs == 1 && + rte_pktmbuf_headroom(txm) >= hdr_size; + + slots = txm->nb_segs + !can_push; + need = slots - vq->vq_free_cnt; + + /* Positive value indicates it need free vring descriptors */ + if (unlikely(need > 0)) { + virtio_xmit_cleanup_packed_vec(vq); + need = slots - vq->vq_free_cnt; + if (unlikely(need > 0)) { + PMD_TX_LOG(ERR, + "No free tx descriptors to transmit"); + return -1; + } + } + + /* Enqueue Packet buffers */ + virtqueue_enqueue_xmit_packed(txvq, txm, slots, can_push, 1); + + txvq->stats.bytes += txm->pkt_len; + return 0; +} + +uint16_t +virtio_xmit_pkts_packed_vec(void *tx_queue, struct rte_mbuf **tx_pkts, + uint16_t nb_pkts) +{ + struct virtnet_tx *txvq = tx_queue; + struct virtqueue *vq = txvq->vq; + struct virtio_hw *hw = vq->hw; + uint16_t nb_tx = 0; + uint16_t remained; + + if (unlikely(hw->started == 0 && tx_pkts != hw->inject_pkts)) + return nb_tx; + + if (unlikely(nb_pkts < 1)) + return nb_pkts; + + PMD_TX_LOG(DEBUG, "%d packets to xmit", nb_pkts); + + if (vq->vq_free_cnt <= vq->vq_nentries - vq->vq_free_thresh) + virtio_xmit_cleanup_packed_vec(vq); + + remained = RTE_MIN(nb_pkts, vq->vq_free_cnt); + + while (remained) { + if (remained >= PACKED_BATCH_SIZE) { + if (!virtqueue_enqueue_batch_packed_vec(txvq, + &tx_pkts[nb_tx])) { + nb_tx += PACKED_BATCH_SIZE; + remained -= PACKED_BATCH_SIZE; + continue; + } + } + if (!virtqueue_enqueue_single_packed_vec(txvq, + tx_pkts[nb_tx])) { + nb_tx++; + remained--; + continue; + } + break; + }; + + txvq->stats.packets += nb_tx; + + if (likely(nb_tx)) { + if (unlikely(virtqueue_kick_prepare_packed(vq))) { + virtqueue_notify(vq); + PMD_TX_LOG(DEBUG, "Notified backend after xmit"); + } + } + + return nb_tx; +} + /* Optionally fill offload information in structure */ static inline int virtio_vec_rx_offload(struct rte_mbuf *m, struct virtio_net_hdr *hdr) From patchwork Wed Apr 15 16:47:31 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Marvin Liu X-Patchwork-Id: 68546 X-Patchwork-Delegate: maxime.coquelin@redhat.com Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id 83377A0563; Wed, 15 Apr 2020 11:14:33 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 123C91D67E; Wed, 15 Apr 2020 11:13:35 +0200 (CEST) Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by dpdk.org (Postfix) with ESMTP id EAB8D1D668 for ; Wed, 15 Apr 2020 11:13:29 +0200 (CEST) IronPort-SDR: XFg8J2kY/nD3oCVpVhwQwQOUSuNS7sBnTrFOXrhHvgeDv5NkTX6aBEQDgwO/vfDqQKdLidBzi4 J9IdZROsD7ww== X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Apr 2020 02:13:29 -0700 IronPort-SDR: 1jm7ma+0ser4p2Gp6k9/Gfw8lpOLLUTRe8BrsoEzsSTAh+E6DrK+VEGB21yIaV1NMaNM0bd+UN 4Hs6EGD8lNvg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.72,386,1580803200"; d="scan'208";a="400250874" Received: from npg-dpdk-virtual-marvin-dev.sh.intel.com 
([10.67.119.58]) by orsmga004.jf.intel.com with ESMTP; 15 Apr 2020 02:13:28 -0700 From: Marvin Liu To: maxime.coquelin@redhat.com, xiaolong.ye@intel.com, zhihong.wang@intel.com Cc: harry.van.haaren@intel.com, dev@dpdk.org, Marvin Liu Date: Thu, 16 Apr 2020 00:47:31 +0800 Message-Id: <20200415164733.75416-7-yong.liu@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20200415164733.75416-1-yong.liu@intel.com> References: <20200313174230.74661-1-yong.liu@intel.com> <20200415164733.75416-1-yong.liu@intel.com> Subject: [dpdk-dev] [PATCH v4 6/8] eal/x86: identify AVX512 extensions flag X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Read CPUID to check if AVX512 extensions are supported. Signed-off-by: Marvin Liu diff --git a/lib/librte_eal/common/arch/x86/rte_cpuflags.c b/lib/librte_eal/common/arch/x86/rte_cpuflags.c index 6492df556..54e9f6185 100644 --- a/lib/librte_eal/common/arch/x86/rte_cpuflags.c +++ b/lib/librte_eal/common/arch/x86/rte_cpuflags.c @@ -109,6 +109,9 @@ const struct feature_entry rte_cpu_feature_table[] = { FEAT_DEF(RTM, 0x00000007, 0, RTE_REG_EBX, 11) FEAT_DEF(AVX512F, 0x00000007, 0, RTE_REG_EBX, 16) FEAT_DEF(RDSEED, 0x00000007, 0, RTE_REG_EBX, 18) + FEAT_DEF(AVX512CD, 0x00000007, 0, RTE_REG_EBX, 28) + FEAT_DEF(AVX512BW, 0x00000007, 0, RTE_REG_EBX, 30) + FEAT_DEF(AVX512VL, 0x00000007, 0, RTE_REG_EBX, 31) FEAT_DEF(LAHF_SAHF, 0x80000001, 0, RTE_REG_ECX, 0) FEAT_DEF(LZCNT, 0x80000001, 0, RTE_REG_ECX, 4) diff --git a/lib/librte_eal/common/include/arch/x86/rte_cpuflags.h b/lib/librte_eal/common/include/arch/x86/rte_cpuflags.h index 25ba47b96..5bf99e05f 100644 --- a/lib/librte_eal/common/include/arch/x86/rte_cpuflags.h +++ b/lib/librte_eal/common/include/arch/x86/rte_cpuflags.h @@ -98,6 +98,9 @@ enum rte_cpu_flag_t { RTE_CPUFLAG_RTM, /**< Transactional memory */ RTE_CPUFLAG_AVX512F, /**< AVX512F */ RTE_CPUFLAG_RDSEED, /**< RDSEED instruction */ + RTE_CPUFLAG_AVX512CD, /**< AVX512CD */ + RTE_CPUFLAG_AVX512BW, /**< AVX512BW */ + RTE_CPUFLAG_AVX512VL, /**< AVX512VL */ /* (EAX 80000001h) ECX features */ RTE_CPUFLAG_LAHF_SAHF, /**< LAHF_SAHF */ From patchwork Wed Apr 15 16:47:32 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Marvin Liu X-Patchwork-Id: 68547 X-Patchwork-Delegate: maxime.coquelin@redhat.com Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id C4D7DA0563; Wed, 15 Apr 2020 11:14:40 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 22EF31D686; Wed, 15 Apr 2020 11:13:36 +0200 (CEST) Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by dpdk.org (Postfix) with ESMTP id 0BE6E1D66B for ; Wed, 15 Apr 2020 11:13:31 +0200 (CEST) IronPort-SDR: ahbSKEmzv+9SkaRIPiSF7r6SKJ0obtT9cg6P/3m33Tzx8LrdGGa32fALb1JAt4WU1aYrPbyuTH bwFK46R/oHlw== X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Apr 2020 02:13:31 -0700 IronPort-SDR: MhdvKicbHsot7hDkVdsfdZWp7RAe3/ueV9zNAplFPnqELiV9sFsxr/LLET4/dKpHvEmlXFqeZy Toviak38G3cQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.72,386,1580803200"; 
d="scan'208";a="400250881" Received: from npg-dpdk-virtual-marvin-dev.sh.intel.com ([10.67.119.58]) by orsmga004.jf.intel.com with ESMTP; 15 Apr 2020 02:13:29 -0700 From: Marvin Liu To: maxime.coquelin@redhat.com, xiaolong.ye@intel.com, zhihong.wang@intel.com Cc: harry.van.haaren@intel.com, dev@dpdk.org, Marvin Liu Date: Thu, 16 Apr 2020 00:47:32 +0800 Message-Id: <20200415164733.75416-8-yong.liu@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20200415164733.75416-1-yong.liu@intel.com> References: <20200313174230.74661-1-yong.liu@intel.com> <20200415164733.75416-1-yong.liu@intel.com> Subject: [dpdk-dev] [PATCH v4 7/8] net/virtio: add election for vectorized datapath X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Packed ring vectorized datapath will be selected when criterian matched. 1. vectorized option is enabled 2. AVX512F and required extensions are supported by compiler and host 3. virtio VERSION_1 and IN_ORDER features are negotiated 4. virtio mergeable feature is not negotiated 5. LRO offloading is disabled Split ring vectorized rx will be selected when criterian matched. 1. vectorized option is enabled 2. virtio mergeable and IN_ORDER features are not negotiated 3. LRO, chksum and vlan strip offloading are disabled Signed-off-by: Marvin Liu diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c index 19a36ad82..a6ce3a0b0 100644 --- a/drivers/net/virtio/virtio_ethdev.c +++ b/drivers/net/virtio/virtio_ethdev.c @@ -1518,9 +1518,12 @@ set_rxtx_funcs(struct rte_eth_dev *eth_dev) if (vtpci_packed_queue(hw)) { PMD_INIT_LOG(INFO, "virtio: using packed ring %s Tx path on port %u", - hw->use_inorder_tx ? "inorder" : "standard", + hw->use_vec_tx ? 
"vectorized" : "standard", eth_dev->data->port_id); - eth_dev->tx_pkt_burst = virtio_xmit_pkts_packed; + if (hw->use_vec_tx) + eth_dev->tx_pkt_burst = virtio_xmit_pkts_packed_vec; + else + eth_dev->tx_pkt_burst = virtio_xmit_pkts_packed; } else { if (hw->use_inorder_tx) { PMD_INIT_LOG(INFO, "virtio: using inorder Tx path on port %u", @@ -1534,7 +1537,13 @@ set_rxtx_funcs(struct rte_eth_dev *eth_dev) } if (vtpci_packed_queue(hw)) { - if (vtpci_with_feature(hw, VIRTIO_NET_F_MRG_RXBUF)) { + if (hw->use_vec_rx) { + PMD_INIT_LOG(INFO, + "virtio: using packed ring vectorized Rx path on port %u", + eth_dev->data->port_id); + eth_dev->rx_pkt_burst = + &virtio_recv_pkts_packed_vec; + } else if (vtpci_with_feature(hw, VIRTIO_NET_F_MRG_RXBUF)) { PMD_INIT_LOG(INFO, "virtio: using packed ring mergeable buffer Rx path on port %u", eth_dev->data->port_id); @@ -1548,7 +1557,7 @@ set_rxtx_funcs(struct rte_eth_dev *eth_dev) } } else { if (hw->use_vec_rx) { - PMD_INIT_LOG(INFO, "virtio: using simple Rx path on port %u", + PMD_INIT_LOG(INFO, "virtio: using vectorized Rx path on port %u", eth_dev->data->port_id); eth_dev->rx_pkt_burst = virtio_recv_pkts_vec; } else if (hw->use_inorder_rx) { @@ -1921,6 +1930,10 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev) goto err_virtio_init; hw->opened = true; +#ifdef RTE_LIBRTE_VIRTIO_INC_VECTOR + hw->use_vec_rx = 1; + hw->use_vec_tx = 1; +#endif return 0; @@ -2157,31 +2170,63 @@ virtio_dev_configure(struct rte_eth_dev *dev) return -EBUSY; } - if (vtpci_with_feature(hw, VIRTIO_F_IN_ORDER)) { - hw->use_inorder_tx = 1; - hw->use_inorder_rx = 1; - hw->use_vec_rx = 0; - } - if (vtpci_packed_queue(hw)) { - hw->use_vec_rx = 0; - hw->use_inorder_rx = 0; - } + if ((hw->use_vec_rx || hw->use_vec_tx) && + (!rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512F) || + !rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512BW) || + !rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512VL) || + !vtpci_with_feature(hw, VIRTIO_F_IN_ORDER) || + !vtpci_with_feature(hw, VIRTIO_F_VERSION_1))) { + PMD_DRV_LOG(INFO, + "disabled packed ring vectorization for requirements are not met"); + hw->use_vec_rx = 0; + hw->use_vec_tx = 0; + } + + if (hw->use_vec_rx) { + if (vtpci_with_feature(hw, VIRTIO_NET_F_MRG_RXBUF)) { + PMD_DRV_LOG(INFO, + "disabled packed ring vectorized rx for mrg_rxbuf enabled"); + hw->use_vec_rx = 0; + } + if (rx_offloads & DEV_RX_OFFLOAD_TCP_LRO) { + PMD_DRV_LOG(INFO, + "disabled packed ring vectorized rx for TCP_LRO enabled"); + hw->use_vec_rx = 0; + } + } + } else { + if (vtpci_with_feature(hw, VIRTIO_F_IN_ORDER)) { + hw->use_inorder_tx = 1; + hw->use_inorder_rx = 1; + hw->use_vec_rx = 0; + } + + if (hw->use_vec_rx) { #if defined RTE_ARCH_ARM64 || defined RTE_ARCH_ARM - if (!rte_cpu_get_flag_enabled(RTE_CPUFLAG_NEON)) { - hw->use_vec_rx = 0; - } + if (!rte_cpu_get_flag_enabled(RTE_CPUFLAG_NEON)) { + PMD_DRV_LOG(INFO, + "disabled split ring vectorization for requirements are not met"); + hw->use_vec_rx = 0; + } #endif - if (vtpci_with_feature(hw, VIRTIO_NET_F_MRG_RXBUF)) { - hw->use_vec_rx = 0; - } + if (vtpci_with_feature(hw, VIRTIO_NET_F_MRG_RXBUF)) { + PMD_DRV_LOG(INFO, + "disabled split ring vectorized rx for mrg_rxbuf enabled"); + hw->use_vec_rx = 0; + } - if (rx_offloads & (DEV_RX_OFFLOAD_UDP_CKSUM | - DEV_RX_OFFLOAD_TCP_CKSUM | - DEV_RX_OFFLOAD_TCP_LRO | - DEV_RX_OFFLOAD_VLAN_STRIP)) - hw->use_vec_rx = 0; + if (rx_offloads & (DEV_RX_OFFLOAD_UDP_CKSUM | + DEV_RX_OFFLOAD_TCP_CKSUM | + DEV_RX_OFFLOAD_TCP_LRO | + DEV_RX_OFFLOAD_VLAN_STRIP)) { + PMD_DRV_LOG(INFO, + "disabled split ring vectorized rx 
for offloading enabled"); + hw->use_vec_rx = 0; + } + } + } return 0; } From patchwork Wed Apr 15 16:47:33 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Marvin Liu X-Patchwork-Id: 68548 X-Patchwork-Delegate: maxime.coquelin@redhat.com Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id EF4A4A0563; Wed, 15 Apr 2020 11:14:54 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 968411D696; Wed, 15 Apr 2020 11:13:40 +0200 (CEST) Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by dpdk.org (Postfix) with ESMTP id C57FD1D66F for ; Wed, 15 Apr 2020 11:13:38 +0200 (CEST) IronPort-SDR: IHRpqld4tInvL+keDnwoHMyy82k8PxrD71bfy6g6VFCjYG6IjmDQkO7OctrASoAYwPrOwis7PZ 6ODy/aIZUDDQ== X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga004.jf.intel.com ([10.7.209.38]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Apr 2020 02:13:37 -0700 IronPort-SDR: 66YzX7HuxXdkFABcwr3O4jfvnC6CrWPXTkaBiHpUzUueuSDXGswfBKk8TT9GzOgGJVnVezmYti VrvEn2aG6ygQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.72,386,1580803200"; d="scan'208";a="400250891" Received: from npg-dpdk-virtual-marvin-dev.sh.intel.com ([10.67.119.58]) by orsmga004.jf.intel.com with ESMTP; 15 Apr 2020 02:13:31 -0700 From: Marvin Liu To: maxime.coquelin@redhat.com, xiaolong.ye@intel.com, zhihong.wang@intel.com Cc: harry.van.haaren@intel.com, dev@dpdk.org, Marvin Liu Date: Thu, 16 Apr 2020 00:47:33 +0800 Message-Id: <20200415164733.75416-9-yong.liu@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20200415164733.75416-1-yong.liu@intel.com> References: <20200313174230.74661-1-yong.liu@intel.com> <20200415164733.75416-1-yong.liu@intel.com> Subject: [dpdk-dev] [PATCH v4 8/8] doc: add packed vectorized datapath X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Document packed virtqueue vectorized datapath selection logic in virtio net PMD. Add packed virtqueue vectorized datapath features to new ini file. Signed-off-by: Marvin Liu diff --git a/doc/guides/nics/features/virtio-packed_vec.ini b/doc/guides/nics/features/virtio-packed_vec.ini new file mode 100644 index 000000000..b239bcaad --- /dev/null +++ b/doc/guides/nics/features/virtio-packed_vec.ini @@ -0,0 +1,22 @@ +; +; Supported features of the 'virtio_packed_vec' network poll mode driver. +; +; Refer to default.ini for the full list of available PMD features. 
+; +[Features] +Speed capabilities = P +Link status = Y +Link status event = Y +Rx interrupt = Y +Queue start/stop = Y +Promiscuous mode = Y +Allmulticast mode = Y +Unicast MAC filter = Y +Multicast MAC filter = Y +VLAN filter = Y +Basic stats = Y +Stats per queue = Y +BSD nic_uio = Y +Linux UIO = Y +Linux VFIO = Y +x86-64 = Y diff --git a/doc/guides/nics/features/virtio_vec.ini b/doc/guides/nics/features/virtio-split_vec.ini similarity index 88% rename from doc/guides/nics/features/virtio_vec.ini rename to doc/guides/nics/features/virtio-split_vec.ini index e60fe36ae..4142fc9f0 100644 --- a/doc/guides/nics/features/virtio_vec.ini +++ b/doc/guides/nics/features/virtio-split_vec.ini @@ -1,5 +1,5 @@ ; -; Supported features of the 'virtio_vec' network poll mode driver. +; Supported features of the 'virtio_split_vec' network poll mode driver. ; ; Refer to default.ini for the full list of available PMD features. ; diff --git a/doc/guides/nics/virtio.rst b/doc/guides/nics/virtio.rst index d1f5fb898..7c9ad9466 100644 --- a/doc/guides/nics/virtio.rst +++ b/doc/guides/nics/virtio.rst @@ -403,6 +403,11 @@ Below devargs are supported by the virtio-user vdev: It is used to enable virtio device packed virtqueue feature. (Default: 0 (disabled)) +#. ``vectorized``: + + It is used to enable virtio device vectorized datapath. + (Default: 0 (disabled)) + Virtio paths Selection and Usage -------------------------------- @@ -454,6 +459,13 @@ according to below configuration: both negotiated, this path will be selected. #. Packed virtqueue in-order non-mergeable path: If in-order feature is negotiated and Rx mergeable is not negotiated, this path will be selected. +#. Packed virtqueue vectorized Rx path: If building and running environment support + AVX512 && in-order feature is negotiated && Rx mergeable is not negotiated && + TCP_LRO Rx offloading is disabled && vectorized option enabled, + this path will be selected. +#. Packed virtqueue vectorized Tx path: If building and running environment support + AVX512 && in-order feature is negotiated && vectorized option enabled, + this path will be selected. Rx/Tx callbacks of each Virtio path ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -476,6 +488,8 @@ are shown in below table: Packed virtqueue non-meregable path virtio_recv_pkts_packed virtio_xmit_pkts_packed Packed virtqueue in-order mergeable path virtio_recv_mergeable_pkts_packed virtio_xmit_pkts_packed Packed virtqueue in-order non-mergeable path virtio_recv_pkts_packed virtio_xmit_pkts_packed + Packed virtqueue vectorized Rx path virtio_recv_pkts_packed_vec virtio_xmit_pkts_packed + Packed virtqueue vectorized Tx path virtio_recv_pkts_packed virtio_xmit_pkts_packed_vec ============================================ ================================= ======================== Virtio paths Support Status from Release to Release @@ -493,20 +507,22 @@ All virtio paths support status are shown in below table: .. 
table:: Virtio Paths and Releases - ============================================ ============= ============= ============= - Virtio paths 16.11 ~ 18.05 18.08 ~ 18.11 19.02 ~ 19.11 - ============================================ ============= ============= ============= - Split virtqueue mergeable path Y Y Y - Split virtqueue non-mergeable path Y Y Y - Split virtqueue vectorized Rx path Y Y Y - Split virtqueue simple Tx path Y N N - Split virtqueue in-order mergeable path Y Y - Split virtqueue in-order non-mergeable path Y Y - Packed virtqueue mergeable path Y - Packed virtqueue non-mergeable path Y - Packed virtqueue in-order mergeable path Y - Packed virtqueue in-order non-mergeable path Y - ============================================ ============= ============= ============= + ============================================ ============= ============= ============= ======= + Virtio paths 16.11 ~ 18.05 18.08 ~ 18.11 19.02 ~ 19.11 20.05 ~ + ============================================ ============= ============= ============= ======= + Split virtqueue mergeable path Y Y Y Y + Split virtqueue non-mergeable path Y Y Y Y + Split virtqueue vectorized Rx path Y Y Y Y + Split virtqueue simple Tx path Y N N N + Split virtqueue in-order mergeable path Y Y Y + Split virtqueue in-order non-mergeable path Y Y Y + Packed virtqueue mergeable path Y Y + Packed virtqueue non-mergeable path Y Y + Packed virtqueue in-order mergeable path Y Y + Packed virtqueue in-order non-mergeable path Y Y + Packed virtqueue vectorized Rx path Y + Packed virtqueue vectorized Tx path Y + ============================================ ============= ============= ============= ======= QEMU Support Status ~~~~~~~~~~~~~~~~~~~
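As a usage note for the ``vectorized`` devarg documented above: an application could request the vectorized datapath when attaching a virtio-user port roughly as sketched below. This is a hedged illustration, not part of the patch; the device name, vhost-user socket path and queue count are placeholders, and the driver still falls back to a standard path when the selection criteria listed earlier are not met.

    #include <rte_dev.h>

    /* Attach a virtio-user vdev with packed ring enabled and the
     * vectorized datapath requested via devargs (placeholder values).
     */
    static int
    attach_vectorized_virtio_user(void)
    {
            return rte_eal_hotplug_add("vdev", "net_virtio_user0",
                            "path=/tmp/vhost-user.sock,queues=1,"
                            "packed_vq=1,vectorized=1");
    }
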