[RFC,v3,0/6] Add mdev (Mediated device) support in DPDK

Message ID 20210601030644.3318-1-chenbo.xia@intel.com (mailing list archive)

Message

Chenbo Xia June 1, 2021, 3:06 a.m. UTC
Hi everyone,

This is a draft implementation of mdev (mediated device [1]) support
in the DPDK PCI bus driver. Mdev is a way to virtualize devices in the
Linux kernel. Depending on the device-api (mdev_type/device_api),
there can be different types of mdev devices (e.g. vfio-pci).
In this patchset, the PCI bus driver is extended to scan and probe
the mdev devices whose device-api is "vfio-pci".

                     +---------+
                     | PCI bus |
                     +----+----+
                          |
         +--------+-------+-------+--------+
         |        |               |        |
  Physical PCI devices ...   Mediated PCI devices ...
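
A minimal sketch of the device-api check described above, not the code
from the patches. It assumes the sysfs layout from the kernel mdev
documentation [1], /sys/bus/mdev/devices/<UUID>/mdev_type/device_api.
The helper names (mdev_device_api_path, mdev_api_is_vfio_pci,
mdev_is_vfio_pci) are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Assumed sysfs root for mdev devices (see the kernel mdev documentation). */
#define MDEV_SYSFS_DEVICES "/sys/bus/mdev/devices"

/*
 * Build "<root>/<uuid>/mdev_type/device_api" into buf.
 * Returns 0 on success, -1 if the path does not fit.
 */
static int
mdev_device_api_path(char *buf, size_t len, const char *uuid)
{
	int n;

	assert(buf != NULL && uuid != NULL);
	n = snprintf(buf, len, "%s/%s/mdev_type/device_api",
		     MDEV_SYSFS_DEVICES, uuid);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* sysfs values end with a newline, so compare only up to the line end. */
static bool
mdev_api_is_vfio_pci(const char *raw)
{
	size_t n = strcspn(raw, "\r\n");

	return n == strlen("vfio-pci") && strncmp(raw, "vfio-pci", n) == 0;
}

/* Read the attribute for one device; false if it is missing or unreadable. */
static bool
mdev_is_vfio_pci(const char *uuid)
{
	char path[256];
	char val[64];
	bool ok = false;
	FILE *f;

	if (mdev_device_api_path(path, sizeof(path), uuid) != 0)
		return false;
	f = fopen(path, "r");
	if (f == NULL)
		return false;
	if (fgets(val, sizeof(val), f) != NULL)
		ok = mdev_api_is_vfio_pci(val);
	fclose(f);
	return ok;
}
```

A bus scan could call mdev_is_vfio_pci() for each entry found under
/sys/bus/mdev/devices and skip the devices for which it returns false.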

The first four patches in this patchset mainly prepare for mdev bus
support. The remaining two patches are the key implementation of the
mdev bus.

There are several options for implementing the mdev bus in DPDK:

1. Embed the mdev bus in the current pci bus

   This patchset takes this option as an example. Mdev has several
   device types: pci/platform/amba/ccw/ap. Of these, DPDK currently
   only cares about pci devices, so we could embed the mdev bus into
   the current pci bus. The pci bus with mdev support would then scan/
   plug/unplug/... not only normal pci devices but also mediated pci
   devices.

2. A new mdev bus that scans mediated pci devices and probes an mdev
   driver to plug pci devices into the pci bus

   If we took this option, a new mdev bus would be implemented to scan
   mediated pci devices, and a new mdev driver for pci devices would be
   implemented in the pci bus to plug mediated pci devices into the pci
   bus.

   Our RFC v1 took this option:
   http://patchwork.dpdk.org/project/dpdk/cover/20190403071844.21126-1-tiwei.bie@intel.com/

   Note that with either option 1 or 2, device drivers do not see the
   implementation difference; they only use the structs/functions exposed
   by the pci bus. Mediated pci devices differ from normal pci devices
   in two ways:
   1. Mediated pci devices use a UUID as their address, while normal ones
      use a BDF.
   2. Mediated pci devices may have capabilities that normal pci devices
      do not have. For example, a mediated pci device could have regions
      with the sparse mmap capability, which allows a region to have
      multiple mmap areas. Another example is that a mediated pci device
      may have regions, or parts of regions, that are not mmapped but
      still need to be accessed.
   These differences change the current ABI (i.e., struct rte_pci_device).
   Please check the 5th and 6th patches for details.
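
   To illustrate the addressing difference only, here is a rough sketch
   that tells a UUID from a BDF by its textual form. It is not taken from
   the patches. It assumes the canonical 8-4-4-4-12 hex UUID layout and a
   full "DDDD:BB:DD.F" BDF, and it does not range-check the function
   number. The names addr_kind, match_hex_pattern and classify_addr are
   hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

enum addr_kind { ADDR_INVALID, ADDR_BDF, ADDR_UUID };

/*
 * Match s against pat, where 'x' in pat stands for one hex digit and
 * every other pattern character must match literally.
 */
static int
match_hex_pattern(const char *s, const char *pat)
{
	for (; *pat != '\0'; s++, pat++) {
		if (*s == '\0')
			return 0;
		if (*pat == 'x') {
			if (!isxdigit((unsigned char)*s))
				return 0;
		} else if (*s != *pat) {
			return 0;
		}
	}
	return *s == '\0';
}

/* Decide whether a device address string looks like a UUID or a BDF. */
static enum addr_kind
classify_addr(const char *s)
{
	assert(s != NULL);
	if (match_hex_pattern(s, "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"))
		return ADDR_UUID;
	if (match_hex_pattern(s, "xxxx:xx:xx.x"))
		return ADDR_BDF;
	return ADDR_INVALID;
}
```

   With this sketch, "0000:3b:00.0" is classified as a BDF and a string
   such as "83b8f4f2-509f-382f-3c1e-e6bfe0fa1001" as a UUID.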

3. A brand new mdev bus that does everything

   This option implements a new, standalone mdev bus. It does not need
   any changes in the current pci bus; it only needs to share some code
   (the linux vfio part) with the pci bus. Drivers of devices that support
   mdev would register themselves as mdev drivers and no longer rely on
   the pci bus. This option, IMHO, would make the code clean. The only
   potential problem may be code duplication, which could be solved by
   making the linux vfio code of the pci bus common and shared.

Your comments on the above three options are welcome and appreciated!

Thanks!
Chenbo

----------------------------------------------------------------------------
RFC v3:
- Add sparse mmap support
- Minor fixes and improvements

RFC v2:
- Let PCI bus scan mediated PCI devices directly
- Address Keith's comments
- Merge the patch below into this series (David)
   http://patches.dpdk.org/patch/55927/
- Add internal representation of PCI device (David)
- Minor fixes and improvements

[1] https://github.com/torvalds/linux/blob/master/Documentation/driver-api/vfio-mediated-device.rst

Chenbo Xia (1):
  bus/pci: add sparse mmap support for mediated PCI devices

Tiwei Bie (5):
  bus/pci: introduce an internal representation of PCI device
  bus/pci: avoid depending on private value in kernel source
  bus/pci: introduce helper for MMIO read and write
  eal: add a helper for reading string from sysfs
  bus/pci: add mdev support

 drivers/bus/pci/bsd/pci.c             |  36 +-
 drivers/bus/pci/linux/pci.c           | 107 ++++-
 drivers/bus/pci/linux/pci_init.h      |  29 +-
 drivers/bus/pci/linux/pci_uio.c       |  22 +
 drivers/bus/pci/linux/pci_vfio.c      | 586 ++++++++++++++++++++++----
 drivers/bus/pci/linux/pci_vfio_mdev.c | 277 ++++++++++++
 drivers/bus/pci/meson.build           |   1 +
 drivers/bus/pci/pci_common.c          |  86 ++--
 drivers/bus/pci/pci_params.c          |  36 +-
 drivers/bus/pci/private.h             |  40 ++
 drivers/bus/pci/rte_bus_pci.h         |  83 +++-
 drivers/bus/pci/version.map           |   4 +
 lib/eal/common/eal_filesystem.h       |  10 +
 lib/eal/freebsd/eal.c                 |  22 +
 lib/eal/linux/eal.c                   |  39 +-
 lib/eal/version.map                   |   3 +
 16 files changed, 1224 insertions(+), 157 deletions(-)
 create mode 100644 drivers/bus/pci/linux/pci_vfio_mdev.c
  

Comments

Thomas Monjalon June 11, 2021, 7:15 a.m. UTC | #1
01/06/2021 05:06, Chenbo Xia:
> Hi everyone,
> 
> This is a draft implementation of the mdev (Mediated device [1])
> support in DPDK PCI bus driver. Mdev is a way to virtualize devices
> in Linux kernel. Based on the device-api (mdev_type/device_api),
> there could be different types of mdev devices (e.g. vfio-pci).

Could you please illustrate with a usage of mdev in DPDK?
What does it enable that is not possible today?

> In this patchset, the PCI bus driver is extended to support scanning
> and probing the mdev devices whose device-api is "vfio-pci".
> 
>                      +---------+
>                      | PCI bus |
>                      +----+----+
>                           |
>          +--------+-------+-------+--------+
>          |        |               |        |
>   Physical PCI devices ...   Mediated PCI devices ...
> 
> The first four patches in this patchset are mainly preparation of mdev
> bus support. The left two patches are the key implementation of mdev bus.
> 
> The implementation of mdev bus in DPDK has several options:
> 
> 1: Embed mdev bus in current pci bus
> 
>    This patchset takes this option for an example. Mdev has several
>    device types: pci/platform/amba/ccw/ap. DPDK currently only cares
>    pci devices in all mdev device types so we could embed the mdev bus
>    into current pci bus. Then pci bus with mdev support will scan/plug/
>    unplug/.. not only normal pci devices but also mediated pci devices.

I think it is a different bus.
It would be cleaner to not touch the PCI bus.
Having a separate bus will allow an easy way to identify a device
with the new generic devargs syntax, example:
	bus=mdev,uuid=XXX
or more complex:
	bus=mdev,uuid=XXX/class=crypto/driver=qat,foo=bar
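
As a rough illustration of the devargs strings above, the sketch below
looks up one key in the first (bus) layer of such a string, where layers
are separated by '/' and key=value pairs by ','. It is a standalone
example, not DPDK's devargs parser, and the function name
devargs_bus_layer_get is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the value of `key` from the first '/'-separated layer of
 * `devargs` into `out`. Returns 0 on success, -1 if the key is not
 * in that layer or the value does not fit in `out`.
 */
static int
devargs_bus_layer_get(const char *devargs, const char *key,
		      char *out, size_t outlen)
{
	const char *p, *end;
	size_t keylen;

	assert(devargs != NULL && key != NULL && out != NULL);
	keylen = strlen(key);
	p = devargs;
	end = devargs + strcspn(devargs, "/"); /* end of the bus layer */

	while (p < end) {
		/* One pair runs up to the next ',' or the end of the layer. */
		size_t pair_len = strcspn(p, ",/");

		if (pair_len > keylen && strncmp(p, key, keylen) == 0 &&
		    p[keylen] == '=') {
			size_t vlen = pair_len - keylen - 1;

			if (vlen >= outlen)
				return -1;
			memcpy(out, p + keylen + 1, vlen);
			out[vlen] = '\0';
			return 0;
		}
		p += pair_len;
		if (p < end)
			p++; /* skip the ',' separator */
	}
	return -1;
}
```

For the second string above, looking up "bus" yields "mdev" and "uuid"
yields "XXX", while "foo" is not found because it belongs to the driver
layer.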

> 2: A new mdev bus that scans mediated pci devices and probes mdev driver to
>    plug-in pci devices to pci bus
> 
>    If we took this option, a new mdev bus will be implemented to scan
>    mediated pci devices and a new mdev driver for pci devices will be
>    implemented in pci bus to plug-in mediated pci devices to pci bus.
> 
>    Our RFC v1 takes this option:
>    http://patchwork.dpdk.org/project/dpdk/cover/20190403071844.21126-1-tiwei.bie@intel.com/
> 
>    Note that: for either option 1 or 2, device drivers do not know the
>    implementation difference but only use structs/functions exposed by
>    pci bus. Mediated pci devices are different from normal pci devices
>    on: 1. Mediated pci devices use UUID as address but normal ones use BDF.
>    2. Mediated pci devices may have some capabilities that normal pci
>    devices do not have. For example, mediated pci devices could have
>    regions that have sparse mmap capability, which allows a region to have
>    multiple mmap areas. Another example is mediated pci devices may have
>    regions/part of regions not mmaped but need to access them. Above
>    difference will change the current ABI (i.e., struct rte_pci_device).
>    Please check 5th and 6th patch for details.
> 
> 3. A brand new mdev bus that does everything
> 
>    This option will implement a new and standalone mdev bus. This option
>    does not need any changes in current pci bus but only needs some shared
>    code (linux vfio part) in pci bus. Drivers of devices that support mdev
>    will register itself as a mdev driver and do not rely on pci bus anymore.
>    This option, IMHO, will make the code clean. The only potential problem
>    may be code duplication, which could be solved by making code of linux
>    vfio part of pci bus common and shared.

Yes I prefer this third option.
We can find an elegant way of sharing some VFIO code between buses.
  
Chenbo Xia June 15, 2021, 2:49 a.m. UTC | #2
Hi Thomas,

> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Friday, June 11, 2021 3:16 PM
> To: Xia, Chenbo <chenbo.xia@intel.com>
> Cc: dev@dpdk.org; Liang, Cunming <cunming.liang@intel.com>; Wu, Jingjing
> <jingjing.wu@intel.com>; Burakov, Anatoly <anatoly.burakov@intel.com>; Yigit,
> Ferruh <ferruh.yigit@intel.com>; mdr@ashroe.eu; nhorman@tuxdriver.com;
> Richardson, Bruce <bruce.richardson@intel.com>; david.marchand@redhat.com;
> stephen@networkplumber.org; Ananyev, Konstantin <konstantin.ananyev@intel.com>
> Subject: Re: [dpdk-dev] [RFC v3 0/6] Add mdev (Mediated device) support in
> DPDK
> 
> 01/06/2021 05:06, Chenbo Xia:
> > Hi everyone,
> >
> > This is a draft implementation of the mdev (Mediated device [1])
> > support in DPDK PCI bus driver. Mdev is a way to virtualize devices
> > in Linux kernel. Based on the device-api (mdev_type/device_api),
> > there could be different types of mdev devices (e.g. vfio-pci).
> 
> Please could you illustrate with an usage of mdev in DPDK?
> What does it enable which is not possible today?

The main purpose is for DPDK to drive mdev-based devices, which is not
possible today.

Take PCI devices as an example. Currently DPDK can only drive devices
on the physical pci bus under /sys/bus/pci, which is how the kernel
exposes pci devices to applications.

But there are PCI devices that use vfio-mdev as a software framework to
expose mdevs to applications under /sys/bus/mdev. A device can virtualize
itself this way to let multiple applications share one physical device.
For example, Intel Scalable IOV technology is known to use vfio-mdev as
the SW framework for Scalable IOV enabled devices (and Intel net/crypto/raw
devices support this tech). For those mdev-based devices, DPDK needs
bus-layer support to scan/plug/probe/... them, which is the main effort of
this patchset. Other devices also use the vfio-mdev framework: AFAIK,
Nvidia's GPU was the first to use mdev, and Intel's GPU virtualization
also uses it.

> 
> > In this patchset, the PCI bus driver is extended to support scanning
> > and probing the mdev devices whose device-api is "vfio-pci".
> >
> >                      +---------+
> >                      | PCI bus |
> >                      +----+----+
> >                           |
> >          +--------+-------+-------+--------+
> >          |        |               |        |
> >   Physical PCI devices ...   Mediated PCI devices ...
> >
> > The first four patches in this patchset are mainly preparation of mdev
> > bus support. The left two patches are the key implementation of mdev bus.
> >
> > The implementation of mdev bus in DPDK has several options:
> >
> > 1: Embed mdev bus in current pci bus
> >
> >    This patchset takes this option for an example. Mdev has several
> >    device types: pci/platform/amba/ccw/ap. DPDK currently only cares
> >    pci devices in all mdev device types so we could embed the mdev bus
> >    into current pci bus. Then pci bus with mdev support will scan/plug/
> >    unplug/.. not only normal pci devices but also mediated pci devices.
> 
> I think it is a different bus.
> It would be cleaner to not touch the PCI bus.
> Having a separate bus will allow an easy way to identify a device
> with the new generic devargs syntax, example:
> 	bus=mdev,uuid=XXX
> or more complex:
> 	bus=mdev,uuid=XXX/class=crypto/driver=qat,foo=bar

OK. I agree it is cleaner not to touch the PCI bus. There may also be a
'type=pci' key, as mdev has several types in its definition
(pci/ap/platform/ccw/...).

> 
> > 2: A new mdev bus that scans mediated pci devices and probes mdev driver to
> >    plug-in pci devices to pci bus
> >
> >    If we took this option, a new mdev bus will be implemented to scan
> >    mediated pci devices and a new mdev driver for pci devices will be
> >    implemented in pci bus to plug-in mediated pci devices to pci bus.
> >
> >    Our RFC v1 takes this option:
> > >    http://patchwork.dpdk.org/project/dpdk/cover/20190403071844.21126-1-tiwei.bie@intel.com/
> >
> >    Note that: for either option 1 or 2, device drivers do not know the
> >    implementation difference but only use structs/functions exposed by
> >    pci bus. Mediated pci devices are different from normal pci devices
> >    on: 1. Mediated pci devices use UUID as address but normal ones use BDF.
> >    2. Mediated pci devices may have some capabilities that normal pci
> >    devices do not have. For example, mediated pci devices could have
> >    regions that have sparse mmap capability, which allows a region to have
> >    multiple mmap areas. Another example is mediated pci devices may have
> >    regions/part of regions not mmaped but need to access them. Above
> >    difference will change the current ABI (i.e., struct rte_pci_device).
> >    Please check 5th and 6th patch for details.
> >
> > 3. A brand new mdev bus that does everything
> >
> >    This option will implement a new and standalone mdev bus. This option
> >    does not need any changes in current pci bus but only needs some shared
> >    code (linux vfio part) in pci bus. Drivers of devices that support mdev
> >    will register itself as a mdev driver and do not rely on pci bus anymore.
> >    This option, IMHO, will make the code clean. The only potential problem
> >    may be code duplication, which could be solved by making code of linux
> >    vfio part of pci bus common and shared.
> 
> Yes I prefer this third option.
> We can find an elegant way of sharing some VFIO code between buses.

Yes, I have not thought about the details of the code sharing but will try to make
it elegant.

Thanks,
Chenbo

>
  
Thomas Monjalon June 15, 2021, 7:48 a.m. UTC | #3
15/06/2021 04:49, Xia, Chenbo:
> From: Thomas Monjalon <thomas@monjalon.net>
> > 01/06/2021 05:06, Chenbo Xia:
> > > Hi everyone,
> > >
> > > This is a draft implementation of the mdev (Mediated device [1])
> > > support in DPDK PCI bus driver. Mdev is a way to virtualize devices
> > > in Linux kernel. Based on the device-api (mdev_type/device_api),
> > > there could be different types of mdev devices (e.g. vfio-pci).
> > 
> > Please could you illustrate with an usage of mdev in DPDK?
> > What does it enable which is not possible today?
> 
> The main purpose is for DPDK to drive mdev-based devices, which is not
> possible today.
> 
> I'd take PCI devices for an example. Currently DPDK can only drive devices
> of physical pci bus under /sys/bus/pci and kernel exposes the pci devices
> to APP in that way.
> 
> But there are PCI devices using vfio-mdev as a software framework to expose
> Mdev to APP under /sys/bus/mdev. Devices could choose this way of virtualizing
> itself to let multiple APPs share one physical device. For example, Intel
> Scalable IOV technology is known to use vfio-mdev as SW framework for Scalable
> IOV enabled devices (and Intel net/crypto/raw devices support this tech). For
> those mdev-based devices, DPDK needs support on the bus layer to scan/plug/probe/..
> them, which is the main effort this patchset does. There are also other devices
> using the vfio-mdev framework, AFAIK, Nvidia's GPU is the first one using mdev
> and Intel's GPU virtualization also uses it.

Yes mdev was designed for virtualization I think.
The use of mdev for Scalable IOV without virtualization
may be seen as an abuse by Linux maintainers,
as they currently seem to prefer the auxiliary bus (which is a real bus).

Mellanox got pushback when trying to use mdev for the same purpose
(Scalable Function, also called Sub-Function) in the kernel.
The Linux community decided to use the auxiliary bus.

Any other feedback on the choice mdev vs aux?
Is there any kernel code supporting this mdev model for Intel devices?

> > > In this patchset, the PCI bus driver is extended to support scanning
> > > and probing the mdev devices whose device-api is "vfio-pci".
> > >
> > >                      +---------+
> > >                      | PCI bus |
> > >                      +----+----+
> > >                           |
> > >          +--------+-------+-------+--------+
> > >          |        |               |        |
> > >   Physical PCI devices ...   Mediated PCI devices ...
> > >
> > > The first four patches in this patchset are mainly preparation of mdev
> > > bus support. The left two patches are the key implementation of mdev bus.
> > >
> > > The implementation of mdev bus in DPDK has several options:
> > >
> > > 1: Embed mdev bus in current pci bus
> > >
> > >    This patchset takes this option for an example. Mdev has several
> > >    device types: pci/platform/amba/ccw/ap. DPDK currently only cares
> > >    pci devices in all mdev device types so we could embed the mdev bus
> > >    into current pci bus. Then pci bus with mdev support will scan/plug/
> > >    unplug/.. not only normal pci devices but also mediated pci devices.
> > 
> > I think it is a different bus.
> > It would be cleaner to not touch the PCI bus.
> > Having a separate bus will allow an easy way to identify a device
> > with the new generic devargs syntax, example:
> > 	bus=mdev,uuid=XXX
> > or more complex:
> > 	bus=mdev,uuid=XXX/class=crypto/driver=qat,foo=bar
> 
> OK. Agree on cleaner to not touch PCI bus. And there may also be a 'type=pci'
> as mdev has several types in its definition (pci/ap/platform/ccw/...).
> 
> > > 2: A new mdev bus that scans mediated pci devices and probes mdev driver to
> > >    plug-in pci devices to pci bus
> > >
> > >    If we took this option, a new mdev bus will be implemented to scan
> > >    mediated pci devices and a new mdev driver for pci devices will be
> > >    implemented in pci bus to plug-in mediated pci devices to pci bus.
> > >
> > >    Our RFC v1 takes this option:
> > > >    http://patchwork.dpdk.org/project/dpdk/cover/20190403071844.21126-1-tiwei.bie@intel.com/
> > >
> > >    Note that: for either option 1 or 2, device drivers do not know the
> > >    implementation difference but only use structs/functions exposed by
> > >    pci bus. Mediated pci devices are different from normal pci devices
> > >    on: 1. Mediated pci devices use UUID as address but normal ones use BDF.
> > >    2. Mediated pci devices may have some capabilities that normal pci
> > >    devices do not have. For example, mediated pci devices could have
> > >    regions that have sparse mmap capability, which allows a region to have
> > >    multiple mmap areas. Another example is mediated pci devices may have
> > >    regions/part of regions not mmaped but need to access them. Above
> > >    difference will change the current ABI (i.e., struct rte_pci_device).
> > >    Please check 5th and 6th patch for details.
> > >
> > > 3. A brand new mdev bus that does everything
> > >
> > >    This option will implement a new and standalone mdev bus. This option
> > >    does not need any changes in current pci bus but only needs some shared
> > >    code (linux vfio part) in pci bus. Drivers of devices that support mdev
> > >    will register itself as a mdev driver and do not rely on pci bus anymore.
> > >    This option, IMHO, will make the code clean. The only potential problem
> > >    may be code duplication, which could be solved by making code of linux
> > >    vfio part of pci bus common and shared.
> > 
> > Yes I prefer this third option.
> > We can find an elegant way of sharing some VFIO code between buses.
> 
> Yes, I have not thought about the details of the code sharing but will try to make
> it elegant.

Great, thanks.
  
Chenbo Xia June 15, 2021, 10:44 a.m. UTC | #4
Hi Thomas,

> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Tuesday, June 15, 2021 3:48 PM
> To: Xia, Chenbo <chenbo.xia@intel.com>
> Cc: dev@dpdk.org; Liang, Cunming <cunming.liang@intel.com>; Wu, Jingjing
> <jingjing.wu@intel.com>; Burakov, Anatoly <anatoly.burakov@intel.com>; Yigit,
> Ferruh <ferruh.yigit@intel.com>; mdr@ashroe.eu; nhorman@tuxdriver.com;
> Richardson, Bruce <bruce.richardson@intel.com>; david.marchand@redhat.com;
> stephen@networkplumber.org; Ananyev, Konstantin <konstantin.ananyev@intel.com>;
> jgg@nvidia.com; parav@nvidia.com; xuemingl@nvidia.com
> Subject: Re: [dpdk-dev] [RFC v3 0/6] Add mdev (Mediated device) support in
> DPDK
> 
> 15/06/2021 04:49, Xia, Chenbo:
> > From: Thomas Monjalon <thomas@monjalon.net>
> > > 01/06/2021 05:06, Chenbo Xia:
> > > > Hi everyone,
> > > >
> > > > This is a draft implementation of the mdev (Mediated device [1])
> > > > support in DPDK PCI bus driver. Mdev is a way to virtualize devices
> > > > in Linux kernel. Based on the device-api (mdev_type/device_api),
> > > > there could be different types of mdev devices (e.g. vfio-pci).
> > >
> > > Please could you illustrate with an usage of mdev in DPDK?
> > > What does it enable which is not possible today?
> >
> > The main purpose is for DPDK to drive mdev-based devices, which is not
> > possible today.
> >
> > I'd take PCI devices for an example. Currently DPDK can only drive devices
> > of physical pci bus under /sys/bus/pci and kernel exposes the pci devices
> > to APP in that way.
> >
> > But there are PCI devices using vfio-mdev as a software framework to expose
> > Mdev to APP under /sys/bus/mdev. Devices could choose this way of virtualizing
> > itself to let multiple APPs share one physical device. For example, Intel
> > Scalable IOV technology is known to use vfio-mdev as SW framework for Scalable
> > IOV enabled devices (and Intel net/crypto/raw devices support this tech). For
> > those mdev-based devices, DPDK needs support on the bus layer to scan/plug/probe/..
> > them, which is the main effort this patchset does. There are also other devices
> > using the vfio-mdev framework, AFAIK, Nvidia's GPU is the first one using mdev
> > and Intel's GPU virtualization also uses it.
> 
> Yes mdev was designed for virtualization I think.
> The use of mdev for Scalable IOV without virtualization
> may be seen as an abuse by Linux maintainers,
> as they currently seem to prefer the auxiliary bus (which is a real bus).
> 
> Mellanox got a push back when trying to use mdev for the same purpose
> (Scalable Function, also called Sub-Function) in the kernel.
> The Linux community decided to use the auxiliary bus.
> 
> Any other feedback on the choice mdev vs aux?

OK. Thanks for the info. Much appreciated.

I could investigate a bit about the choice and later come back to you.

> Is there any kernel code supporting this mdev model for Intel devices?

Currently there is only the Intel GPU. But I think you care more about
devices that DPDK could drive: a DMA device (called ioat in DPDK, under
raw/ioat) is on its way upstream
(https://www.spinics.net/lists/kvm/msg244417.html)

Thanks,
Chenbo

  
Jason Gunthorpe June 15, 2021, 11:57 a.m. UTC | #5
On Tue, Jun 15, 2021 at 09:48:24AM +0200, Thomas Monjalon wrote:
> 15/06/2021 04:49, Xia, Chenbo:
> > From: Thomas Monjalon <thomas@monjalon.net>
> > > 01/06/2021 05:06, Chenbo Xia:
> > > > Hi everyone,
> > > >
> > > > This is a draft implementation of the mdev (Mediated device [1])
> > > > support in DPDK PCI bus driver. Mdev is a way to virtualize devices
> > > > in Linux kernel. Based on the device-api (mdev_type/device_api),
> > > > there could be different types of mdev devices (e.g. vfio-pci).
> > > 
> > > Please could you illustrate with an usage of mdev in DPDK?
> > > What does it enable which is not possible today?
> > 
> > The main purpose is for DPDK to drive mdev-based devices, which is not
> > possible today.
> > 
> > I'd take PCI devices for an example. Currently DPDK can only drive devices
> > of physical pci bus under /sys/bus/pci and kernel exposes the pci devices
> > to APP in that way.
> > 
> > But there are PCI devices using vfio-mdev as a software framework to expose
> > Mdev to APP under /sys/bus/mdev. Devices could choose this way of virtualizing
> > itself to let multiple APPs share one physical device. For example, Intel
> > Scalable IOV technology is known to use vfio-mdev as SW framework for Scalable
> > IOV enabled devices (and Intel net/crypto/raw devices support this tech). For
> > those mdev-based devices, DPDK needs support on the bus layer to scan/plug/probe/..
> > them, which is the main effort this patchset does. There are also other devices
> > using the vfio-mdev framework, AFAIK, Nvidia's GPU is the first one using mdev
> > and Intel's GPU virtualization also uses it.
> 
> Yes mdev was designed for virtualization I think.
> The use of mdev for Scalable IOV without virtualization
> may be seen as an abuse by Linux maintainers,
> as they currently seem to prefer the auxiliary bus (which is a real bus).
> 
> Mellanox got a push back when trying to use mdev for the same purpose
> (Scalable Function, also called Sub-Function) in the kernel.
> The Linux community decided to use the auxiliary bus.
> 
> Any other feedback on the choice mdev vs aux?
> Is there any kernel code supporting this mdev model for Intel devices?

IMHO until a kernel networking driver is accepted that uses mdev this
is all just dead code in dpdk and shouldn't be merged.

I think it is unlikely that future networking drivers will use mdev.

> > > > 2: A new mdev bus that scans mediated pci devices and probes mdev driver to
> > > >    plug-in pci devices to pci bus

And we are likely not doing 'mediated pci devices' at all..

Jason