[0/4] add VFIO IOMMUFD/CDEV support

Message ID 20231222194453.3049693-1-beilei.xing@intel.com (mailing list archive)

Xing, Beilei Dec. 22, 2023, 7:44 p.m. UTC
  From: Beilei Xing <beilei.xing@intel.com>

This is a draft implementation that adds support for the IOMMUFD user
interface[1] and the VFIO CDEV user interface[2].

Problem statement:
Linux now includes multiple device-passthrough frameworks (e.g. VFIO and vDPA),
and each framework implements its own logic for managing I/O page tables. This
duplication is hard to scale to modern IOMMU features such as PASID, I/O page
faults and IOMMU dirty page tracking.

To address this, a new standalone IOMMU subsystem called IOMMUFD was introduced
in Linux kernel v6.2. The goal is for Linux subsystems such as VFIO and vDPA to
consume one unified IOMMU framework. Along with IOMMUFD, a new device-centric
VFIO uAPI called VFIO CDEV was introduced in Linux kernel v6.6. vDPA support for
IOMMUFD in the Linux kernel is still a work in progress[3].

Since new IOMMU features from the various vendors will be supported only in the
new framework and not in the legacy one, DPDK needs to support IOMMUFD in order
to use the latest IOMMU features.

For the VFIO subsystem, mainline Linux supports both the VFIO Container/GROUP
interface and the VFIO IOMMUFD/CDEV interface. IOMMUFD has no impact on the
existing VFIO Container/GROUP interface, but the latest IOMMU features (e.g.
PASID/SSID) may be available only through the VFIO IOMMUFD/CDEV interface.
Compared with VFIO Container, VFIO IOMMUFD leaves the vfio device uAPI unchanged
and moves I/O page table management from the VFIO Container into the IOMMUFD
interface.
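
To make the split concrete, here is a minimal sketch of the kernel-side flow
for one device: bind the device cdev to an iommufd, allocate an I/O address
space (IOAS) in IOMMUFD, attach the device to it, and map a buffer. The ioctls
and structs come from the kernel's <linux/vfio.h> and <linux/iommufd.h>; the
names cdev_ctx, cdev_attach_and_map and HAVE_IOMMUFD_UAPI are hypothetical and
are not taken from this series. The cdev path (something like
/dev/vfio/devices/vfioX) is supplied by the caller. When the installed kernel
headers predate the CDEV uAPI, the sketch simply returns -ENOTSUP.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/vfio.h>

#if defined(__has_include)
#if __has_include(<linux/iommufd.h>)
#include <linux/iommufd.h>
#define HAVE_IOMMUFD_UAPI 1	/* hypothetical macro, for this sketch only */
#endif
#endif

/* Hypothetical holder for the fds and IOAS id; not a DPDK structure. */
struct cdev_ctx {
	int iommufd;
	int devfd;
	uint32_t ioas_id;
};

/* Returns 0 and fills *ctx on success, or a negative errno value. */
static int
cdev_attach_and_map(const char *cdev_path, void *va, uint64_t len,
		    uint64_t iova, struct cdev_ctx *ctx)
{
	assert(ctx != NULL);
#if defined(HAVE_IOMMUFD_UAPI) && defined(VFIO_DEVICE_BIND_IOMMUFD)
	struct vfio_device_bind_iommufd bind = { .argsz = sizeof(bind) };
	struct iommu_ioas_alloc alloc = { .size = sizeof(alloc) };
	struct vfio_device_attach_iommufd_pt attach = { .argsz = sizeof(attach) };
	struct iommu_ioas_map map = { .size = sizeof(map) };
	int iommufd, devfd, ret;

	iommufd = open("/dev/iommu", O_RDWR);
	if (iommufd < 0)
		return -errno;
	devfd = open(cdev_path, O_RDWR);
	if (devfd < 0) {
		ret = -errno;
		close(iommufd);
		return ret;
	}

	/* Device side: bind the cdev to the iommufd, then allocate an IOAS. */
	bind.iommufd = iommufd;
	if (ioctl(devfd, VFIO_DEVICE_BIND_IOMMUFD, &bind) < 0 ||
	    ioctl(iommufd, IOMMU_IOAS_ALLOC, &alloc) < 0)
		goto fail;

	/* Attach the device to the IOAS that holds its I/O page table. */
	attach.pt_id = alloc.out_ioas_id;
	if (ioctl(devfd, VFIO_DEVICE_ATTACH_IOMMUFD_PT, &attach) < 0)
		goto fail;

	/* Page table side: the DMA mapping is issued on the iommufd. */
	map.flags = IOMMU_IOAS_MAP_FIXED_IOVA | IOMMU_IOAS_MAP_READABLE |
		    IOMMU_IOAS_MAP_WRITEABLE;
	map.ioas_id = alloc.out_ioas_id;
	map.user_va = (uint64_t)(uintptr_t)va;
	map.length = len;
	map.iova = iova;
	if (ioctl(iommufd, IOMMU_IOAS_MAP, &map) < 0)
		goto fail;

	ctx->iommufd = iommufd;
	ctx->devfd = devfd;
	ctx->ioas_id = alloc.out_ioas_id;
	return 0;
fail:
	ret = -errno;
	close(devfd);
	close(iommufd);
	return ret;
#else
	/* Kernel headers without the IOMMUFD/CDEV uAPI. */
	(void)cdev_path; (void)va; (void)len; (void)iova;
	return -ENOTSUP;
#endif
}
```

After a successful return, the usual device ioctls (VFIO_DEVICE_GET_INFO,
region and IRQ queries) are issued on devfd exactly as with the group
interface; only the mapping calls go to the iommufd.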

Design:
In DPDK, since VFIO Container/GROUP and VFIO IOMMUFD/CDEV may now co-exist, a
new VFIO IOMMUFD file/interface will be added to EAL. Since IOMMUFD is a unified
framework that can be consumed by VFIO, vDPA, etc., iommufd will also be added as
a standalone file/interface in EAL. Hence, a DPDK bus driver (e.g. PCI) has two
options for probing a vfio device.

The diagram below shows the relationship between VFIO Container/GROUP, IOMMUFD,
VFIO CDEV and the bus driver (e.g. PCI) in DPDK; the numbered notes after it
describe each component.

                     _____________________
                    |        [4]          |
                    |                     |
                    |                     |
                    |PCI BUS              |
                    |_____________________|
                        |             |
                        |             |
        ________________v___       ___v______________      ________________________
       |       [1]          |     |       [2]        |    |                        |
       |vfio container      |     |                  |    |                        |
       |vfio group          |     |vfio cdev         |    |   Other Consumer       |
       |                    |     |                  |    |   (vDPA IOMMUFD,       |
       |VFIO                |     |VFIO IOMMUFD(new) |    |    common memory)      |
       |____________________|     |__________________|    |________________________|
                                                |              |
                                                |              |
                                             ___v______________v___
                                            |        [3]           |
                                            | i/o page table mgmt  |
                                            |                      |
                                            |                      |
                                            |IOMMUFD(new)          |
                                            |______________________|

1. VFIO is the existing, mature framework for device passthrough. No functional
   changes here.
2. VFIO IOMMUFD is a new component that works together with IOMMUFD. It exposes
   functions for PCI BUS to probe a PCI device through the VFIO CDEV interface.
3. IOMMUFD is a new component. It exposes a unified interface for VFIO IOMMUFD
   and other consumers to manage I/O page tables.
4. PCI BUS is an existing component. Since Linux now supports both VFIO
   Container/GROUP and VFIO IOMMUFD/CDEV, PCI BUS needs to choose which interface
   to use to probe the PCI device, depending on user configuration.
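
As an illustration of note 4, the choice could be sketched roughly as follows:
parse the user's setting, then check that the device nodes the chosen interface
relies on are present. Everything named here (enum vfio_mode, the "group" and
"cdev" strings, and the three helpers) is hypothetical and is not the option
name or API this series adds; defaulting to the legacy interface when nothing
is given is likewise an assumption. The /dev paths are the standard Linux ones.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical mode names; the series' real enum/option is not shown here. */
enum vfio_mode {
	VFIO_MODE_GROUP,	/* legacy VFIO Container/GROUP interface */
	VFIO_MODE_CDEV,		/* VFIO CDEV bound to IOMMUFD */
	VFIO_MODE_INVALID,
};

/* Map the user-supplied string to a mode; NULL means "not specified". */
static enum vfio_mode
vfio_mode_parse(const char *arg)
{
	if (arg == NULL || strcmp(arg, "group") == 0)
		return VFIO_MODE_GROUP;	/* assumption: legacy is the default */
	if (strcmp(arg, "cdev") == 0)
		return VFIO_MODE_CDEV;
	return VFIO_MODE_INVALID;
}

/* Return 1 if the device nodes the mode relies on exist, 0 otherwise. */
static int
vfio_mode_usable(enum vfio_mode mode)
{
	switch (mode) {
	case VFIO_MODE_GROUP:
		return access("/dev/vfio/vfio", F_OK) == 0;
	case VFIO_MODE_CDEV:
		return access("/dev/iommu", F_OK) == 0 &&
		       access("/dev/vfio/devices", F_OK) == 0;
	default:
		return 0;
	}
}

/* Parse and validate in one step; 0 on success, -1 if unusable. */
static int
vfio_mode_select(const char *arg, enum vfio_mode *out)
{
	assert(out != NULL);
	*out = vfio_mode_parse(arg);
	return vfio_mode_usable(*out) ? 0 : -1;
}
```

A bus driver would call vfio_mode_select() once at init time and then dispatch
each device probe to the group path or the cdev path based on the result.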

TBD:
Multi-process support will be added in the future.

[1] https://lwn.net/Articles/912515/
[2] https://patchwork.kernel.org/project/kvm/cover/20230718135551.6592-1-yi.l.liu@intel.com/
[3] https://lore.kernel.org/lkml/20231103171641.1703146-1-lulu@redhat.com/

Beilei Xing (3):
  vfio: add VFIO IOMMUFD support
  bus/pci: add VFIO CDEV support
  eal: add new args to choose VFIO mode

Yahui Cao (1):
  iommufd: add IOMMUFD support

 config/meson.build                  |   3 +
 config/rte_config.h                 |   1 +
 drivers/bus/pci/bus_pci_driver.h    |   1 +
 drivers/bus/pci/linux/pci.c         |  21 +-
 drivers/bus/pci/linux/pci_init.h    |   4 +
 drivers/bus/pci/linux/pci_vfio.c    |  52 +++-
 lib/eal/common/eal_common_config.c  |   6 +
 lib/eal/common/eal_common_options.c |  48 +++-
 lib/eal/common/eal_internal_cfg.h   |   1 +
 lib/eal/common/eal_options.h        |   2 +
 lib/eal/include/rte_eal.h           |  18 ++
 lib/eal/include/rte_iommufd.h       |  73 ++++++
 lib/eal/include/rte_vfio.h          |  55 ++++
 lib/eal/linux/eal.c                 |  22 ++
 lib/eal/linux/eal_iommufd.c         | 183 +++++++++++++
 lib/eal/linux/eal_iommufd.h         |  43 ++++
 lib/eal/linux/eal_vfio.h            |   3 +
 lib/eal/linux/eal_vfio_iommufd.c    | 385 ++++++++++++++++++++++++++++
 lib/eal/linux/meson.build           |   2 +
 lib/eal/version.map                 |   6 +
 20 files changed, 918 insertions(+), 11 deletions(-)
 create mode 100644 lib/eal/include/rte_iommufd.h
 create mode 100644 lib/eal/linux/eal_iommufd.c
 create mode 100644 lib/eal/linux/eal_iommufd.h
 create mode 100644 lib/eal/linux/eal_vfio_iommufd.c