From patchwork Fri Dec 22 19:44:49 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Xing, Beilei"
X-Patchwork-Id: 512
From: beilei.xing@intel.com
To: anatoly.burakov@intel.com
Cc: dev@dpdk.org, thomas@monjalon.net, ferruh.yigit@amd.com,
 bruce.richardson@intel.com, chenbox@nvidia.com, yahui.cao@intel.com,
 Beilei Xing
Subject: [PATCH 0/4] add VFIO IOMMUFD/CDEV support
Date: Fri, 22 Dec 2023 19:44:49 +0000
Message-Id: <20231222194453.3049693-1-beilei.xing@intel.com>
X-Mailer: git-send-email 2.34.1
List-Id: DPDK patches and discussions

From: Beilei Xing

This is a draft implementation to support the IOMMUFD[1] user interface
and the VFIO CDEV user interface[2].

Problem statement:
Linux now includes multiple device-passthrough frameworks (e.g. VFIO and
vDPA), and each framework implements its own logic for managing I/O page
tables. This is hard to scale to modern IOMMU features such as PASID,
I/O page faults and IOMMU dirty page tracking.

To fix this, a new standalone IOMMU subsystem called IOMMUFD was
introduced in Linux kernel v6.2. The goal is for Linux subsystems like
VFIO and vDPA to consume a unified IOMMU framework. Along with IOMMUFD,
a new device-centric VFIO uAPI called VFIO CDEV was introduced in Linux
kernel v6.6. vDPA support for IOMMUFD in the Linux kernel is still a
work in progress[3].

Since new IOMMU features provided by different vendors will only be
supported in the new framework and not in the legacy one, it is
important for DPDK to support IOMMUFD in order to use the latest IOMMU
features.

For the VFIO subsystem, mainline Linux supports both the VFIO
Container/GROUP interface and the VFIO IOMMUFD/CDEV interface. IOMMUFD
has no impact on the existing VFIO Container/GROUP interface, but the
latest IOMMU features (e.g. PASID/SSID) may only be available through
the VFIO IOMMUFD/CDEV interface.

Comparing VFIO Container with VFIO IOMMUFD, the vfio device uAPI does
not change, while I/O page table management moves from the VFIO
Container into the IOMMUFD interface.

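To make the last point concrete, here is a rough sketch of the kernel
uAPI sequence for the IOMMUFD/CDEV path, following the flow described in
the Linux VFIO and IOMMUFD documentation. It is not code from this
series: the helper name cdev_attach_and_map() and its parameters are
hypothetical, and the block compiles to a stub that returns -1 when the
installed kernel headers predate v6.6.

```c
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/vfio.h>

#if defined(__has_include)
#if __has_include(<linux/iommufd.h>)
#include <linux/iommufd.h>
#endif
#endif

/* Both macros only exist with Linux >= 6.6 uAPI headers. */
#if defined(IOMMU_IOAS_MAP) && defined(VFIO_DEVICE_ATTACH_IOMMUFD_PT)
#define HAVE_VFIO_CDEV_UAPI 1
#else
#define HAVE_VFIO_CDEV_UAPI 0
#endif

/*
 * Hypothetical helper (not part of the patch set): bind the VFIO cdev at
 * cdev_path to a fresh iommufd, attach it to a new IOAS and map
 * [va, va + len) at iova. Returns the device fd and stores the iommufd
 * in *iommufd_out on success; returns -1 on any failure.
 */
static int
cdev_attach_and_map(const char *cdev_path, void *va, uint64_t len,
		    uint64_t iova, int *iommufd_out)
{
#if HAVE_VFIO_CDEV_UAPI
	int iommufd = open("/dev/iommu", O_RDWR);
	if (iommufd < 0)
		return -1;

	int devfd = open(cdev_path, O_RDWR);
	if (devfd < 0) {
		close(iommufd);
		return -1;
	}

	struct vfio_device_bind_iommufd bind = {
		.argsz = sizeof(bind),
		.iommufd = iommufd,
	};
	struct iommu_ioas_alloc alloc = { .size = sizeof(alloc) };
	struct vfio_device_attach_iommufd_pt attach = {
		.argsz = sizeof(attach),
	};
	struct iommu_ioas_map map = {
		.size = sizeof(map),
		.flags = IOMMU_IOAS_MAP_FIXED_IOVA |
			 IOMMU_IOAS_MAP_READABLE |
			 IOMMU_IOAS_MAP_WRITEABLE,
		.user_va = (uint64_t)(uintptr_t)va,
		.length = len,
		.iova = iova,
	};

	/* Device-centric step: the device fd is bound to the iommufd. */
	if (ioctl(devfd, VFIO_DEVICE_BIND_IOMMUFD, &bind) < 0)
		goto fail;
	/* The I/O address space is allocated on the iommufd, not a container. */
	if (ioctl(iommufd, IOMMU_IOAS_ALLOC, &alloc) < 0)
		goto fail;
	attach.pt_id = alloc.out_ioas_id;
	if (ioctl(devfd, VFIO_DEVICE_ATTACH_IOMMUFD_PT, &attach) < 0)
		goto fail;
	/* DMA mapping is also an iommufd operation. */
	map.ioas_id = alloc.out_ioas_id;
	if (ioctl(iommufd, IOMMU_IOAS_MAP, &map) < 0)
		goto fail;

	*iommufd_out = iommufd;
	return devfd;
fail:
	close(devfd);
	close(iommufd);
	return -1;
#else
	(void)cdev_path; (void)va; (void)len; (void)iova; (void)iommufd_out;
	return -1;
#endif
}
```

In the legacy path the equivalent steps go through /dev/vfio/vfio and a
group fd (VFIO_GROUP_SET_CONTAINER, VFIO_SET_IOMMU, VFIO_IOMMU_MAP_DMA);
once the device fd is obtained, the device-level ioctls are the same in
both paths.
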
Design:
For the DPDK implementation, since VFIO Container/GROUP and VFIO
IOMMUFD/CDEV may now co-exist, a new VFIO IOMMUFD file/interface will be
added in EAL. Since IOMMUFD is a unified framework which can be consumed
by VFIO, vDPA, etc., iommufd will be added as a standalone
file/interface in EAL. Hence, a DPDK bus driver (e.g. PCI) has two
options to probe a vfio device.

The diagram below shows the relationship between VFIO Container/GROUP,
IOMMUFD, VFIO CDEV and the bus driver (e.g. PCI) in DPDK, with some
comments below.

              _____________________
             |   [4]               |
             |                     |
             |                     |
             |PCI BUS              |
             |_____________________|
                 |            |
                 |            |
 ________________v___      ___v______________    ________________________
|   [1]              |    |   [2]            |  |                        |
|vfio container      |    |                  |  |                        |
|vfio group          |    |vfio cdev         |  |    Other Consumer      |
|                    |    |                  |  |    (vDPA IOMMUFD,      |
|VFIO                |    |VFIO IOMMUFD(new) |  |     common memory)     |
|____________________|    |__________________|  |________________________|
                                    |              |
                                    |              |
                                 ___v______________v___
                                | [3]                  |
                                | i/o page table mgmt  |
                                |                      |
                                |                      |
                                |IOMMUFD(new)          |
                                |______________________|

1. VFIO is the existing, mature framework for device passthrough. No
   functional changes here.
2. VFIO IOMMUFD is a new component added to work together with IOMMUFD.
   It exposes functions for PCI BUS to probe a PCI device through the
   VFIO CDEV interface.
3. IOMMUFD is a new component. It exposes a unified interface for VFIO
   IOMMUFD and other consumers to manage I/O page tables.
4. PCI BUS is an existing component. Since Linux now supports both VFIO
   Container/GROUP and VFIO IOMMUFD/CDEV, PCI BUS needs to determine
   which interface to use to probe the PCI device, depending on user
   configuration.

TBD: Multi-process will be supported in the future.

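As an illustration of comment 4, the choice between the two probe paths
could be sketched as below. Everything here is an assumption made for
illustration: the enum, the function names and the silent fallback to
the group path are hypothetical and are not taken from the patches. The
only facts relied on are that IOMMUFD is exposed as /dev/iommu and that
a VFIO cdev appears as a device node whose path the caller supplies.

```c
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical names; the series' real identifiers are not shown here. */
enum vfio_probe_mode {
	VFIO_PROBE_GROUP, /* legacy VFIO Container/GROUP path */
	VFIO_PROBE_CDEV,  /* VFIO IOMMUFD/CDEV path */
};

/*
 * Pure decision logic: use the cdev path only when the user asked for it
 * and both device nodes are present; otherwise use the group path.
 */
static enum vfio_probe_mode
choose_probe_mode(bool user_wants_cdev, bool have_iommufd, bool have_cdev)
{
	if (user_wants_cdev && have_iommufd && have_cdev)
		return VFIO_PROBE_CDEV;
	return VFIO_PROBE_GROUP;
}

/* Wrapper that checks for the device nodes on the running system. */
static enum vfio_probe_mode
probe_mode_for(bool user_wants_cdev, const char *cdev_path)
{
	bool have_iommufd = access("/dev/iommu", F_OK) == 0;
	bool have_cdev = access(cdev_path, F_OK) == 0;

	return choose_probe_mode(user_wants_cdev, have_iommufd, have_cdev);
}
```

Falling back quietly keeps existing setups working on kernels without
IOMMUFD. An implementation could equally well fail the probe when the
user explicitly requested the cdev path and it is unavailable.
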
[1] https://lwn.net/Articles/912515/
[2] https://patchwork.kernel.org/project/kvm/cover/20230718135551.6592-1-yi.l.liu@intel.com/
[3] https://lore.kernel.org/lkml/20231103171641.1703146-1-lulu@redhat.com/

Beilei Xing (3):
  vfio: add VFIO IOMMUFD support
  bus/pci: add VFIO CDEV support
  eal: add new args to choose VFIO mode

Yahui Cao (1):
  iommufd: add IOMMUFD support

 config/meson.build                  |   3 +
 config/rte_config.h                 |   1 +
 drivers/bus/pci/bus_pci_driver.h    |   1 +
 drivers/bus/pci/linux/pci.c         |  21 +-
 drivers/bus/pci/linux/pci_init.h    |   4 +
 drivers/bus/pci/linux/pci_vfio.c    |  52 +++-
 lib/eal/common/eal_common_config.c  |   6 +
 lib/eal/common/eal_common_options.c |  48 +++-
 lib/eal/common/eal_internal_cfg.h   |   1 +
 lib/eal/common/eal_options.h        |   2 +
 lib/eal/include/rte_eal.h           |  18 ++
 lib/eal/include/rte_iommufd.h       |  73 ++++++
 lib/eal/include/rte_vfio.h          |  55 ++++
 lib/eal/linux/eal.c                 |  22 ++
 lib/eal/linux/eal_iommufd.c         | 183 +++++++++++++
 lib/eal/linux/eal_iommufd.h         |  43 ++++
 lib/eal/linux/eal_vfio.h            |   3 +
 lib/eal/linux/eal_vfio_iommufd.c    | 385 ++++++++++++++++++++++++++++
 lib/eal/linux/meson.build           |   2 +
 lib/eal/version.map                 |   6 +
 20 files changed, 918 insertions(+), 11 deletions(-)
 create mode 100644 lib/eal/include/rte_iommufd.h
 create mode 100644 lib/eal/linux/eal_iommufd.c
 create mode 100644 lib/eal/linux/eal_iommufd.h
 create mode 100644 lib/eal/linux/eal_vfio_iommufd.c