[2/2] eal: fix IOVA mode selection as VA for pci drivers
Commit Message
The offending commit broke the use of RTE_PCI_DRV_IOVA_AS_VA, which
was intended to mean "driver only supports VA" but had been understood
as "driver supports both PA and VA" by most net drivers, and was used to
let DPDK processes run as non-root (non-root processes do not have access
to physical addresses on recent kernels).
The check on physical addresses actually closed the gap for those
drivers. We don't need to mark them with RTE_PCI_DRV_IOVA_AS_VA, and this
flag can retain its intended meaning.
Document its meaning explicitly.
We can check that a driver's requirement with respect to IOVA mode is
fulfilled before trying to probe a device.
Finally, document the heuristic used to select the IOVA mode and hope
that we won't break it again.
Fixes: 703458e19c16 ("bus/pci: consider only usable devices for IOVA mode")
Signed-off-by: David Marchand <david.marchand@redhat.com>
---
doc/guides/prog_guide/env_abstraction_layer.rst | 31 +++++++++++++++++++++++++
drivers/bus/pci/linux/pci.c | 16 +++++--------
drivers/bus/pci/pci_common.c | 30 +++++++++++++++++++-----
drivers/bus/pci/rte_bus_pci.h | 4 ++--
drivers/net/atlantic/atl_ethdev.c | 3 +--
drivers/net/bnxt/bnxt_ethdev.c | 3 +--
drivers/net/e1000/em_ethdev.c | 3 +--
drivers/net/e1000/igb_ethdev.c | 5 ++--
drivers/net/enic/enic_ethdev.c | 3 +--
drivers/net/fm10k/fm10k_ethdev.c | 3 +--
drivers/net/i40e/i40e_ethdev.c | 3 +--
drivers/net/i40e/i40e_ethdev_vf.c | 2 +-
drivers/net/iavf/iavf_ethdev.c | 3 +--
drivers/net/ice/ice_ethdev.c | 3 +--
drivers/net/ixgbe/ixgbe_ethdev.c | 5 ++--
drivers/net/mlx4/mlx4.c | 3 +--
drivers/net/mlx5/mlx5.c | 2 +-
drivers/net/nfp/nfp_net.c | 6 ++---
drivers/net/octeontx2/otx2_ethdev.c | 5 ----
drivers/net/qede/qede_ethdev.c | 6 ++---
drivers/raw/ioat/ioat_rawdev.c | 3 +--
lib/librte_eal/common/eal_common_bus.c | 30 +++++++++++++++++++++---
22 files changed, 110 insertions(+), 62 deletions(-)
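A minimal sketch of the probe-time check described in the commit message, assuming RTE_PCI_DRV_IOVA_AS_VA keeps its documented meaning of "driver only supports IOVA as VA". The enum, `DRV_FLAG_IOVA_AS_VA` and `driver_iova_ok()` are hypothetical stand-ins for RTE_IOVA_*, RTE_PCI_DRV_IOVA_AS_VA and the bus probe code; this is not DPDK's actual implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of DPDK's IOVA modes (RTE_IOVA_DC/PA/VA). */
enum iova_mode { IOVA_DC = 0, IOVA_PA, IOVA_VA };

/* Hypothetical flag standing in for RTE_PCI_DRV_IOVA_AS_VA:
 * "this driver only supports IOVA as VA". */
#define DRV_FLAG_IOVA_AS_VA (UINT32_C(1) << 6)

/*
 * Return true if a driver with 'drv_flags' may probe a device once EAL
 * has settled on 'current_mode'. Drivers without the flag are assumed
 * to support both PA and VA.
 */
static bool driver_iova_ok(uint32_t drv_flags, enum iova_mode current_mode)
{
    if ((drv_flags & DRV_FLAG_IOVA_AS_VA) && current_mode != IOVA_VA)
        return false; /* VA-only driver but EAL picked PA: skip probing */
    return true;
}
```

A driver without the flag passes in either mode, which matches the default assumption discussed below that devices support both PA and VA.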
Comments
I was expecting some replies / reviews of this patch today.
10/07/2019 23:48, David Marchand:
> The incriminated commit broke the use of RTE_PCI_DRV_IOVA_AS_VA which
> was intended to mean "driver only supports VA" but had been understood
> as "driver supports both PA and VA" by most net drivers and used to let
> dpdk processes to run as non root (which do not have access to physical
> addresses on recent kernels).
>
> The check on physical addresses actually closed the gap for those
> drivers. We don't need to mark them with RTE_PCI_DRV_IOVA_AS_VA and this
> flag can retain its intended meaning.
> Document explicitly its meaning.
>
> We can check that a driver requirement wrt to IOVA mode is fulfilled
> before trying to probe a device.
>
> Finally, document the heuristic used to select the IOVA mode and hope
> that we won't break it again.
>
> Fixes: 703458e19c16 ("bus/pci: consider only usable devices for IOVA mode")
>
> Signed-off-by: David Marchand <david.marchand@redhat.com>
> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Thursday, July 11, 2019 8:11 PM
> To: dev@dpdk.org
> Cc: David Marchand <david.marchand@redhat.com>;
> anatoly.burakov@intel.com; Jerin Jacob Kollanukkaran
> <jerinj@marvell.com>; John McNamara <john.mcnamara@intel.com>;
> Marko Kovacevic <marko.kovacevic@intel.com>; Igor Russkikh
> <igor.russkikh@aquantia.com>; Pavel Belous <pavel.belous@aquantia.com>;
> Ajit Khaparde <ajit.khaparde@broadcom.com>; Somnath Kotur
> <somnath.kotur@broadcom.com>; Wenzhuo Lu <wenzhuo.lu@intel.com>;
> John Daley <johndale@cisco.com>; Hyong Youb Kim <hyonkim@cisco.com>;
> Qi Zhang <qi.z.zhang@intel.com>; Xiao Wang <xiao.w.wang@intel.com>;
> Beilei Xing <beilei.xing@intel.com>; Jingjing Wu <jingjing.wu@intel.com>;
> Qiming Yang <qiming.yang@intel.com>; Konstantin Ananyev
> <konstantin.ananyev@intel.com>; Matan Azrad <matan@mellanox.com>;
> Shahaf Shuler <shahafs@mellanox.com>; Yongseok Koh
> <yskoh@mellanox.com>; Viacheslav Ovsiienko
> <viacheslavo@mellanox.com>; Alejandro Lucero
> <alejandro.lucero@netronome.com>; Nithin Kumar Dabilpuram
> <ndabilpuram@marvell.com>; Kiran Kumar Kokkilagadda
> <kirankumark@marvell.com>; Rasesh Mody <rmody@marvell.com>; Shahed
> Shaikh <shshaikh@marvell.com>; Bruce Richardson
> <bruce.richardson@intel.com>
> Subject: Re: [dpdk-dev] [PATCH 2/2] eal: fix IOVA mode selection as VA for
> pci drivers
>
> I was expecting some replies / reviews of this patch today.
In general, I am OK with the approach of this patch; it will fix the existing problems.
I need to spend more time reviewing the code and documentation.
On 10-Jul-19 10:48 PM, David Marchand wrote:
> The incriminated commit broke the use of RTE_PCI_DRV_IOVA_AS_VA which
> was intended to mean "driver only supports VA" but had been understood
> as "driver supports both PA and VA" by most net drivers and used to let
> dpdk processes to run as non root (which do not have access to physical
> addresses on recent kernels).
>
> The check on physical addresses actually closed the gap for those
> drivers. We don't need to mark them with RTE_PCI_DRV_IOVA_AS_VA and this
> flag can retain its intended meaning.
> Document explicitly its meaning.
>
So, we always assume that all devices support both IOVA as PA and IOVA
as VA by default. Well, as long as it's understood and documented :)
Unless...
<snip>
> +
> +IOVA Mode is selected by considering what the current usable Devices on the
> +system requires and/or supports.
> +
> +Below is the 2-step heuristic for this choice.
> +
> +For the first step, EAL asks each bus its requirement in terms of IOVA mode
> +and decides on a preferred IOVA mode.
> +
> +- if all buses report RTE_IOVA_PA, then the preferred IOVA mode is RTE_IOVA_PA,
> +- if all buses report RTE_IOVA_VA, then the preferred IOVA mode is RTE_IOVA_VA,
> +- if all buses report RTE_IOVA_DC, no bus expressed a preferrence, then the
> + preferred mode is RTE_IOVA_DC,
> +- if the buses disagree (at least one wants RTE_IOVA_PA and at least one wants
> + RTE_IOVA_VA), then the preferred IOVA mode is RTE_IOVA_DC (see below with the
> + check on Physical Addresses availability),
> +
> +The second step is checking if the preferred mode complies with the Physical
> +Addresses availability since those are only available to root user in recent
> +kernels.
> +
> +- if the preferred mode is RTE_IOVA_PA but there is no access to Physical
> + Addresses, then EAL init will fail early, since later probing of the devices
> + would fail anyway,
> +- if the preferred mode is RTE_IOVA_DC then based on the Physical Addresses
> + availability, the preferred mode is adjusted to RTE_IOVA_PA or RTE_IOVA_VA.
> + In the case when the buses had disagreed on the IOVA Mode at the first step,
> + part of the buses won't work because of this decision.
Is there any specific reason why we always prefer PA if physical
addresses are available? Since we're already assuming that all devices
support PA and VA anyway, what's the harm in enabling VA by default?
I seem to recall there were some concerns around SPDK and PA address
availability - doesn't that mean that the assumption regarding PA and VA
mode always being supported doesn't actually hold in practice?
By the way, the reason I'm harping on IOVA as VA being the default
is that having IOVA as PA is not a free (as in beer) choice: we
sacrifice some usability by doing that. Right now, by default, mempool
will ask for IOVA-contiguous memory first, and this is slow in IOVA as
PA mode, meaning that, e.g., testpmd startup time is greatly increased for
smaller page sizes because IOVA as PA mode is the default in DPDK.
I would also like to steer people away from using real physical
addresses, because doing so while requiring lots of IOVA-contiguous
memory also requires legacy mem mode, which I would rather people not
use and grow dependent on; I would like to remove it at some point, as
it adds a lot of complexity for a corner case.
So, picking address mode is not *just* about whether the device supports
them - it has usability implications as well.
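The first step of the quoted heuristic, merging what each bus reports into a preferred mode, could be sketched as follows. The documented list does not say what happens when some buses report RTE_IOVA_DC and others a concrete mode; this sketch assumes a DC bus is neutral, so {PA, DC} yields PA. The enum and `merge_bus_iova()` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of DPDK's IOVA modes (RTE_IOVA_DC/PA/VA). */
enum iova_mode { IOVA_DC = 0, IOVA_PA, IOVA_VA };

/*
 * Step 1: combine per-bus IOVA reports into a preferred mode.
 * Buses reporting IOVA_DC are treated as neutral (an assumption).
 */
static enum iova_mode merge_bus_iova(const enum iova_mode *reports, size_t n)
{
    bool want_pa = false;
    bool want_va = false;

    for (size_t i = 0; i < n; i++) {
        if (reports[i] == IOVA_PA)
            want_pa = true;
        else if (reports[i] == IOVA_VA)
            want_va = true;
    }

    if (want_pa && !want_va)
        return IOVA_PA;
    if (want_va && !want_pa)
        return IOVA_VA;
    /* No bus cared, or buses disagree: leave the choice to step 2. */
    return IOVA_DC;
}
```

Returning DC on disagreement, rather than failing, is what lets step 2 pick a mode at the cost of some buses not working.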
12/07/2019 13:03, Burakov, Anatoly:
> On 10-Jul-19 10:48 PM, David Marchand wrote:
> > The incriminated commit broke the use of RTE_PCI_DRV_IOVA_AS_VA which
> > was intended to mean "driver only supports VA" but had been understood
> > as "driver supports both PA and VA" by most net drivers and used to let
> > dpdk processes to run as non root (which do not have access to physical
> > addresses on recent kernels).
> >
> > The check on physical addresses actually closed the gap for those
> > drivers. We don't need to mark them with RTE_PCI_DRV_IOVA_AS_VA and this
> > flag can retain its intended meaning.
> > Document explicitly its meaning.
> >
>
> So, we always assume that all devices support both IOVA as PA and IOVA
> as VA by default. Well, as long as it's understood and documented :)
Yes
Please make sure it is well documented.
> Unless...
>
>
> <snip>
>
> > +
> > +IOVA Mode is selected by considering what the current usable Devices on the
> > +system requires and/or supports.
> > +
> > +Below is the 2-step heuristic for this choice.
> > +
> > +For the first step, EAL asks each bus its requirement in terms of IOVA mode
> > +and decides on a preferred IOVA mode.
> > +
> > +- if all buses report RTE_IOVA_PA, then the preferred IOVA mode is RTE_IOVA_PA,
> > +- if all buses report RTE_IOVA_VA, then the preferred IOVA mode is RTE_IOVA_VA,
> > +- if all buses report RTE_IOVA_DC, no bus expressed a preferrence, then the
> > + preferred mode is RTE_IOVA_DC,
> > +- if the buses disagree (at least one wants RTE_IOVA_PA and at least one wants
> > + RTE_IOVA_VA), then the preferred IOVA mode is RTE_IOVA_DC (see below with the
> > + check on Physical Addresses availability),
> > +
> > +The second step is checking if the preferred mode complies with the Physical
> > +Addresses availability since those are only available to root user in recent
> > +kernels.
> > +
> > +- if the preferred mode is RTE_IOVA_PA but there is no access to Physical
> > + Addresses, then EAL init will fail early, since later probing of the devices
> > + would fail anyway,
> > +- if the preferred mode is RTE_IOVA_DC then based on the Physical Addresses
> > + availability, the preferred mode is adjusted to RTE_IOVA_PA or RTE_IOVA_VA.
> > + In the case when the buses had disagreed on the IOVA Mode at the first step,
> > + part of the buses won't work because of this decision.
>
> Is there any specific reason why we always prefer PA if physical
> addresses are available? Since we're already assuming that all devices
> support PA and VA anyway, what's the harm in enabling VA by default?
If PA is available, it means we are running as root.
We can assume that using root is a choice, probably related
to a preference for PA.
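One way to test whether "PA is available" on Linux is to look up the physical frame number (PFN) of a page in /proc/self/pagemap: each 64-bit entry has bit 63 set when the page is present and the PFN in bits 0-54, and recent kernels report a zero PFN to processes without CAP_SYS_ADMIN. The sketch below assumes that layout; `phys_addrs_available()` is a hypothetical helper, not the EAL function that performs this check.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Return true if this process can read a non-zero PFN for its own memory. */
static bool phys_addrs_available(void)
{
    long page_size = sysconf(_SC_PAGESIZE);
    if (page_size <= 0)
        return false;

    void *buf = NULL;
    if (posix_memalign(&buf, (size_t)page_size, (size_t)page_size) != 0)
        return false;
    /* Volatile store so the compiler keeps it: faults the page in. */
    volatile char *touch = buf;
    touch[0] = 1;

    bool available = false;
    int fd = open("/proc/self/pagemap", O_RDONLY);
    if (fd >= 0) {
        uint64_t entry = 0;
        /* One 64-bit entry per virtual page, indexed by page number. */
        off_t offset = (off_t)((uintptr_t)buf / (uintptr_t)page_size) *
                       (off_t)sizeof(entry);
        if (pread(fd, &entry, sizeof(entry), offset) == (ssize_t)sizeof(entry)) {
            bool present = (entry >> 63) & 1;
            uint64_t pfn = entry & ((UINT64_C(1) << 55) - 1);
            available = present && pfn != 0;
        }
        close(fd);
    }
    free(buf);
    return available;
}
```

Note that this tests privilege (CAP_SYS_ADMIN) rather than "root" as such, so root inside a restricted container may still get `false`.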
> I seem to recall there were some concerns around SPDK and PA address
> availability - doesn't that mean that the assumption regarding PA and VA
> mode always being supported doesn't actually hold in practice?
>
> By the way, the reason i'm harping away on IOVA as VA being the default
> is because having IOVA as PA is not a free (as in beer) choice - we
> sacrifice some usability by doing that. Right now, by default, mempool
> will ask for IOVA-contiguous memory first, and this is slow in IOVA as
> PA mode - meaning, e.g. testpmd startup time is greatly increased for
> smaller page sizes because of IOVA as PA mode is the default in DPDK.
>
> I would also like to steer people away from using real physical
> addresses because doing so while requiring lots of IOVA contiguous
> memory also requires legacy mem mode, which i would rather people not
> use and grow dependent on, and would like to remove it at some point as
> it adds a lot of complexity for a corner case.
That's why we should encourage users not to run as root.
We need more documentation about how to run as a normal user.
> So, picking address mode is not *just* about whether the device supports
> them - it has usability implications as well.
If we consider running as root an exception, then it makes
sense to pick address mode which fits this exception (PA).
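The second step, reconciling the preferred mode with physical address availability, can be sketched as below. It encodes the policy described here (DC resolves to PA when physical addresses are readable, to VA otherwise); `resolve_iova_mode()` and the enum are hypothetical names, and the -1 return stands in for EAL init failing early.

```c
#include <stdbool.h>

/* Hypothetical mirror of DPDK's IOVA modes (RTE_IOVA_DC/PA/VA). */
enum iova_mode { IOVA_DC = 0, IOVA_PA, IOVA_VA };

/*
 * Step 2: returns 0 and stores the final mode in *out, or -1 when PA is
 * required but physical addresses cannot be read.
 */
static int resolve_iova_mode(enum iova_mode preferred, bool phys_addrs,
                             enum iova_mode *out)
{
    if (preferred == IOVA_PA && !phys_addrs)
        return -1; /* probing PA-only devices would fail anyway */
    if (preferred == IOVA_DC)
        preferred = phys_addrs ? IOVA_PA : IOVA_VA; /* privileged => PA */
    *out = preferred;
    return 0;
}
```

The DC branch is the design choice debated in the rest of the thread; the alternative proposed later amounts to always resolving DC to VA.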
On 12-Jul-19 1:43 PM, Thomas Monjalon wrote:
> 12/07/2019 13:03, Burakov, Anatoly:
>> On 10-Jul-19 10:48 PM, David Marchand wrote:
>>> The incriminated commit broke the use of RTE_PCI_DRV_IOVA_AS_VA which
>>> was intended to mean "driver only supports VA" but had been understood
>>> as "driver supports both PA and VA" by most net drivers and used to let
>>> dpdk processes to run as non root (which do not have access to physical
>>> addresses on recent kernels).
>>>
>>> The check on physical addresses actually closed the gap for those
>>> drivers. We don't need to mark them with RTE_PCI_DRV_IOVA_AS_VA and this
>>> flag can retain its intended meaning.
>>> Document explicitly its meaning.
>>>
>>
>> So, we always assume that all devices support both IOVA as PA and IOVA
>> as VA by default. Well, as long as it's understood and documented :)
>
> Yes
> Please make sure it is well documented.
>
>> Unless...
>>
>>
>> <snip>
>>
>>> +
>>> +IOVA Mode is selected by considering what the current usable Devices on the
>>> +system requires and/or supports.
>>> +
>>> +Below is the 2-step heuristic for this choice.
>>> +
>>> +For the first step, EAL asks each bus its requirement in terms of IOVA mode
>>> +and decides on a preferred IOVA mode.
>>> +
>>> +- if all buses report RTE_IOVA_PA, then the preferred IOVA mode is RTE_IOVA_PA,
>>> +- if all buses report RTE_IOVA_VA, then the preferred IOVA mode is RTE_IOVA_VA,
>>> +- if all buses report RTE_IOVA_DC, no bus expressed a preferrence, then the
>>> + preferred mode is RTE_IOVA_DC,
>>> +- if the buses disagree (at least one wants RTE_IOVA_PA and at least one wants
>>> + RTE_IOVA_VA), then the preferred IOVA mode is RTE_IOVA_DC (see below with the
>>> + check on Physical Addresses availability),
>>> +
>>> +The second step is checking if the preferred mode complies with the Physical
>>> +Addresses availability since those are only available to root user in recent
>>> +kernels.
>>> +
>>> +- if the preferred mode is RTE_IOVA_PA but there is no access to Physical
>>> + Addresses, then EAL init will fail early, since later probing of the devices
>>> + would fail anyway,
>>> +- if the preferred mode is RTE_IOVA_DC then based on the Physical Addresses
>>> + availability, the preferred mode is adjusted to RTE_IOVA_PA or RTE_IOVA_VA.
>>> + In the case when the buses had disagreed on the IOVA Mode at the first step,
>>> + part of the buses won't work because of this decision.
>>
>> Is there any specific reason why we always prefer PA if physical
>> addresses are available? Since we're already assuming that all devices
>> support PA and VA anyway, what's the harm in enabling VA by default?
>
> If PA is available, it means we are running as root.
> We can assume that using root is a choice, probably related
> to a preference for PA.
>
>> I seem to recall there were some concerns around SPDK and PA address
>> availability - doesn't that mean that the assumption regarding PA and VA
>> mode always being supported doesn't actually hold in practice?
>>
>> By the way, the reason i'm harping away on IOVA as VA being the default
>> is because having IOVA as PA is not a free (as in beer) choice - we
>> sacrifice some usability by doing that. Right now, by default, mempool
>> will ask for IOVA-contiguous memory first, and this is slow in IOVA as
>> PA mode - meaning, e.g. testpmd startup time is greatly increased for
>> smaller page sizes because of IOVA as PA mode is the default in DPDK.
>>
>> I would also like to steer people away from using real physical
>> addresses because doing so while requiring lots of IOVA contiguous
>> memory also requires legacy mem mode, which i would rather people not
>> use and grow dependent on, and would like to remove it at some point as
>> it adds a lot of complexity for a corner case.
>
> That's why we should better encourage to not run as root.
> We need more documentation about how to run as normal user.
>
>> So, picking address mode is not *just* about whether the device supports
>> them - it has usability implications as well.
>
> If we consider running as root an exception, then it makes
> sense to pick address mode which fits this exception (PA).
>
When you put it that way, that does indeed make sense. Typically though,
developers tend to run as root. I shall hereby stop doing so :)
On Fri, Jul 12, 2019 at 01:58:46PM +0100, Burakov, Anatoly wrote:
> On 12-Jul-19 1:43 PM, Thomas Monjalon wrote:
> > If we consider running as root an exception, then it makes
> > sense to pick address mode which fits this exception (PA).
> >
>
> When you put it that way, that does indeed make sense. Typically though,
> developers tend to run as root. I shall hereby stop doing so :)
>
Welcome to the sane side! Learn to love "sudo"!
> > > +
> > > +IOVA Mode is selected by considering what the current usable
> > > +Devices on the system requires and/or supports.
> > > +
> > > +Below is the 2-step heuristic for this choice.
> > > +
> > > +For the first step, EAL asks each bus its requirement in terms of
> > > +IOVA mode and decides on a preferred IOVA mode.
> > > +
> > > +- if all buses report RTE_IOVA_PA, then the preferred IOVA mode is RTE_IOVA_PA,
> > > +- if all buses report RTE_IOVA_VA, then the preferred IOVA mode is RTE_IOVA_VA,
> > > +- if all buses report RTE_IOVA_DC, no bus expressed a preferrence, then the
> > > + preferred mode is RTE_IOVA_DC,
> > > +- if the buses disagree (at least one wants RTE_IOVA_PA and at least one wants
> > > + RTE_IOVA_VA), then the preferred IOVA mode is RTE_IOVA_DC (see below with the
> > > + check on Physical Addresses availability),
> > > +
> > > +The second step is checking if the preferred mode complies with the
> > > +Physical Addresses availability since those are only available to
> > > +root user in recent kernels.
> > > +
> > > +- if the preferred mode is RTE_IOVA_PA but there is no access to Physical
> > > + Addresses, then EAL init will fail early, since later probing of the devices
> > > + would fail anyway,
> > > +- if the preferred mode is RTE_IOVA_DC then based on the Physical Addresses
> > > + availability, the preferred mode is adjusted to RTE_IOVA_PA or RTE_IOVA_VA.
> > > + In the case when the buses had disagreed on the IOVA Mode at the first step,
> > > + part of the buses won't work because of this decision.
> >
> > Is there any specific reason why we always prefer PA if physical
> > addresses are available? Since we're already assuming that all devices
> > support PA and VA anyway, what's the harm in enabling VA by default?
>
> If PA is available, it means we are running as root.
> We can assume that using root is a choice, probably related to a preference
> for PA.
# Even if we are running as root, why choose PA in the DC case?
i.e. the following logic is not needed:
    if (iova_mode == RTE_IOVA_DC) {
        iova_mode = phys_addrs ? RTE_IOVA_PA : RTE_IOVA_VA;
        RTE_LOG(DEBUG, EAL,
            "Buses did not request a specific IOVA mode, using '%s' based on physical addresses availability.\n",
            phys_addrs ? "PA" : "VA");
    }
# When DPDK is running in a guest, it cannot access the real PA anyway; it will be an IPA.
So I don't understand the logic behind choosing PA when DC. To me, it makes sense to choose PA when DC.
# To align with the RTE_PCI_DRV_NEED_MAPPING flag and reflect "need" rather
than "support", I think the flag could be renamed to RTE_PCI_DRV_NEED_IOVA_AS_VA.
Other than the above points,
I reviewed this patch and tested it on octeontx2; it looks good to me.
15/07/2019 16:26, Jerin Jacob Kollanukkaran:
> > > > +
> > > > +IOVA Mode is selected by considering what the current usable
> > > > +Devices on the system requires and/or supports.
> > > > +
> > > > +Below is the 2-step heuristic for this choice.
> > > > +
> > > > +For the first step, EAL asks each bus its requirement in terms of
> > > > +IOVA mode and decides on a preferred IOVA mode.
> > > > +
> > > > +- if all buses report RTE_IOVA_PA, then the preferred IOVA mode is RTE_IOVA_PA,
> > > > +- if all buses report RTE_IOVA_VA, then the preferred IOVA mode is RTE_IOVA_VA,
> > > > +- if all buses report RTE_IOVA_DC, no bus expressed a preferrence, then the
> > > > + preferred mode is RTE_IOVA_DC,
> > > > +- if the buses disagree (at least one wants RTE_IOVA_PA and at least one wants
> > > > + RTE_IOVA_VA), then the preferred IOVA mode is RTE_IOVA_DC (see below with the
> > > > + check on Physical Addresses availability),
> > > > +
> > > > +The second step is checking if the preferred mode complies with the
> > > > +Physical Addresses availability since those are only available to
> > > > +root user in recent kernels.
> > > > +
> > > > +- if the preferred mode is RTE_IOVA_PA but there is no access to Physical
> > > > + Addresses, then EAL init will fail early, since later probing of the devices
> > > > + would fail anyway,
> > > > +- if the preferred mode is RTE_IOVA_DC then based on the Physical Addresses
> > > > + availability, the preferred mode is adjusted to RTE_IOVA_PA or RTE_IOVA_VA.
> > > > + In the case when the buses had disagreed on the IOVA Mode at the first step,
> > > > + part of the buses won't work because of this decision.
> > >
> > > Is there any specific reason why we always prefer PA if physical
> > > addresses are available? Since we're already assuming that all devices
> > > support PA and VA anyway, what's the harm in enabling VA by default?
> >
> > If PA is available, it means we are running as root.
> > We can assume that using root is a choice, probably related to a preference
> > for PA.
>
> # Even if we are running as root, Why to choose PA in case of DC?
> ie. Following logic is not need
>     if (iova_mode == RTE_IOVA_DC) {
>         iova_mode = phys_addrs ? RTE_IOVA_PA : RTE_IOVA_VA;
>         RTE_LOG(DEBUG, EAL,
>             "Buses did not request a specific IOVA mode, using '%s' based on physical addresses availability.\n",
>             phys_addrs ? "PA" : "VA");
>     }
Why run as root if using VA anyway?
We can assume the user knows what he is doing, so it is a user choice.
We want to allow the user to choose, right?
> # When DPDK running on guest, Anyway it can not access the real PA, It will be IPA.
What is IPA? Isn't it a beer?
> So I don't understand logic behind choose PA when DC.
> To me, it make sense to choose PA when DC.
You probably mean "choose VA".
> # To align with RTE_PCI_DRV_NEED_MAPPING flag and reflect it "need" rather
> than support, I think, flag can be changed to RTE_PCI_DRV_NEED_IOVA_AS_VA
I think the most important thing is to have good documentation of this flag
(it was not done properly when Cavium initially introduced it).
If you want to rename the flag, you can do it in a separate patch.
If renaming, I really would like to get an answer to an old question:
why is the IO address called IOVA? The name "IOVA_AS_VA" looks strange.
For reference, one description of addressing:
https://lists.linuxfoundation.org/pipermail/iommu/2018-May/027686.html
About the naming, do you remember how I insisted on correct naming
of all related things in DPDK? It was hard to get it accepted,
the discussion was not nice, and I stopped insisting on getting every detail right
because I just got bored. It was a really bad experience.
You may ask why I bring this up now. Because we must take care of all
details, make sure our messages are well understood, and be cooperative.
> Other than above points,
> Reviewed this patch and tested on octeontx2, It looks good to me.
> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Monday, July 15, 2019 8:34 PM
> To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>
> Cc: Burakov, Anatoly <anatoly.burakov@intel.com>; David Marchand
> <david.marchand@redhat.com>; dev@dpdk.org; John McNamara
> <john.mcnamara@intel.com>; Marko Kovacevic
> <marko.kovacevic@intel.com>; Igor Russkikh
> <igor.russkikh@aquantia.com>; Pavel Belous <pavel.belous@aquantia.com>;
> Ajit Khaparde <ajit.khaparde@broadcom.com>; Somnath Kotur
> <somnath.kotur@broadcom.com>; Wenzhuo Lu <wenzhuo.lu@intel.com>;
> John Daley <johndale@cisco.com>; Hyong Youb Kim <hyonkim@cisco.com>;
> Qi Zhang <qi.z.zhang@intel.com>; Xiao Wang <xiao.w.wang@intel.com>;
> Beilei Xing <beilei.xing@intel.com>; Jingjing Wu <jingjing.wu@intel.com>;
> Qiming Yang <qiming.yang@intel.com>; Konstantin Ananyev
> <konstantin.ananyev@intel.com>; Matan Azrad <matan@mellanox.com>;
> Shahaf Shuler <shahafs@mellanox.com>; Yongseok Koh
> <yskoh@mellanox.com>; Viacheslav Ovsiienko
> <viacheslavo@mellanox.com>; Alejandro Lucero
> <alejandro.lucero@netronome.com>; Nithin Kumar Dabilpuram
> <ndabilpuram@marvell.com>; Kiran Kumar Kokkilagadda
> <kirankumark@marvell.com>; Rasesh Mody <rmody@marvell.com>; Shahed
> Shaikh <shshaikh@marvell.com>; Bruce Richardson
> <bruce.richardson@intel.com>; alialnu@mellanox.com;
> aconole@redhat.com
> Subject: Re: [PATCH 2/2] eal: fix IOVA mode selection as VA for pci drivers
>
> 15/07/2019 16:26, Jerin Jacob Kollanukkaran:
> > > > > +
> > > > > +IOVA Mode is selected by considering what the current usable
> > > > > +Devices on the system requires and/or supports.
> > > > > +
> > > > > +Below is the 2-step heuristic for this choice.
> > > > > +
> > > > > +For the first step, EAL asks each bus its requirement in terms
> > > > > +of IOVA mode and decides on a preferred IOVA mode.
> > > > > +
> > > > > +- if all buses report RTE_IOVA_PA, then the preferred IOVA mode is RTE_IOVA_PA,
> > > > > +- if all buses report RTE_IOVA_VA, then the preferred IOVA mode is RTE_IOVA_VA,
> > > > > +- if all buses report RTE_IOVA_DC, no bus expressed a preferrence, then the
> > > > > + preferred mode is RTE_IOVA_DC,
> > > > > +- if the buses disagree (at least one wants RTE_IOVA_PA and at least one wants
> > > > > + RTE_IOVA_VA), then the preferred IOVA mode is RTE_IOVA_DC (see below with the
> > > > > + check on Physical Addresses availability),
> > > > > +
> > > > > +The second step is checking if the preferred mode complies with
> > > > > +the Physical Addresses availability since those are only
> > > > > +available to root user in recent kernels.
> > > > > +
> > > > > +- if the preferred mode is RTE_IOVA_PA but there is no access to Physical
> > > > > + Addresses, then EAL init will fail early, since later probing of the devices
> > > > > + would fail anyway,
> > > > > +- if the preferred mode is RTE_IOVA_DC then based on the Physical Addresses
> > > > > + availability, the preferred mode is adjusted to RTE_IOVA_PA or RTE_IOVA_VA.
> > > > > + In the case when the buses had disagreed on the IOVA Mode at the first step,
> > > > > + part of the buses won't work because of this decision.
> > > >
> > > > Is there any specific reason why we always prefer PA if physical
> > > > addresses are available? Since we're already assuming that all
> > > > devices support PA and VA anyway, what's the harm in enabling VA by default?
> > >
> > > If PA is available, it means we are running as root.
> > > We can assume that using root is a choice, probably related to a
> > > preference for PA.
> >
> > # Even if we are running as root, Why to choose PA in case of DC?
> > ie. Following logic is not need
> >     if (iova_mode == RTE_IOVA_DC) {
> >         iova_mode = phys_addrs ? RTE_IOVA_PA : RTE_IOVA_VA;
> >         RTE_LOG(DEBUG, EAL,
> >             "Buses did not request a specific IOVA mode, using '%s' based on physical addresses availability.\n",
> >             phys_addrs ? "PA" : "VA");
> >     }
>
> Why running as root if using VA anyway?
> We can assume the user knows what he is doing, so it is a user choice.
> We want to allow the user choosing, right?
The user can override the mode with the --iova-mode=pa/va EAL argument if a specific mode is needed.
Running as root for various other reasons (or just out of laziness) is not, or should not be,
connected to setting the mode to PA.
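A sketch of the precedence described above, where an explicit --iova-mode choice wins over the heuristic. The parser `parse_iova_arg()`, `final_iova_mode()` and the enum are hypothetical; this illustrates the precedence only and is not EAL's option handling.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical mirror of DPDK's IOVA modes (RTE_IOVA_DC/PA/VA). */
enum iova_mode { IOVA_DC = 0, IOVA_PA, IOVA_VA };

/* Map an --iova-mode value ("pa" or "va") to a mode. NULL means the
 * option was not given (IOVA_DC). Returns -1 for any other value. */
static int parse_iova_arg(const char *arg, enum iova_mode *out)
{
    if (arg == NULL)
        *out = IOVA_DC;
    else if (strcmp(arg, "pa") == 0)
        *out = IOVA_PA;
    else if (strcmp(arg, "va") == 0)
        *out = IOVA_VA;
    else
        return -1;
    return 0;
}

/* An explicit user choice wins; otherwise keep the heuristic's result. */
static enum iova_mode final_iova_mode(enum iova_mode user,
                                      enum iova_mode heuristic)
{
    return user != IOVA_DC ? user : heuristic;
}
```

As noted in the reply below, this escape hatch only helps when the application actually exposes EAL arguments to the user.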
>
> > # When DPDK running on guest, Anyway it can not access the real PA, It will be IPA.
>
> What is IPA? Isn't it a beer?
There may be a beer with that name. In this context, it is "Intermediate Physical Address".
>
> > So I don't understand logic behind choose PA when DC.
> > To me, it make sense to choose PA when DC.
>
> You probably mean "choose VA".
Yup.
>
> > # To align with RTE_PCI_DRV_NEED_MAPPING flag and reflect it "need"
> > rather than support, I think, flag can be changed to
> > RTE_PCI_DRV_NEED_IOVA_AS_VA
>
> I think the most important is to have a good documentation of this flag (it
> was not done properly when Cavium introduced it initially).
> If you want to rename the flag, you can do it in a separate patch.
> If renaming, I really would like to get an answer to an old question:
> Why IO adress is called IOVA? The name "IOVA_AS_VA" looks strange.
IOVA = IO virtual address.
Since IOVA can be PA or VA, the name IOVA_AS_VA was chosen.
> For reference, one description of addressing:
> https://lists.linuxfoundation.org/pipermail/iommu/2018-May/027686.html
>
> About the naming, do you remember how I insisted to have a correct naming
> of all related stuff in DPDK? It was hard to get it accepted, the discussion was
> not nice and I stopped insisting to get all details fine because I just got bored.
> It was a really bad experience.
I agree.
To me, that bad experience was mostly due to not having enough technical comments
on the proposal, though I am not the author/owner of it.
> You can ask why I remind this now? Because we must take care of all details,
> make sure our messages are well understood, and be cooperative.
No disagreement.
If we look at the history, the meaning got changed/updated in this commit
by adding Intel drivers to it. I wouldn't say it is a big deal; it is just C code,
and it can be changed based on the need. I think what really matters is
maintaining the feature and a commitment to fixing any issue.
commit f37dfab21c988d2d0ecb3c82be4ba9738c7e51c7
Author: Jianfeng Tan <jianfeng.tan@intel.com>
Date: Wed Oct 11 10:33:48 2017 +0000
drivers/net: enable IOVA mode for Intel PMDs
If we want to enable IOVA mode, introduced by
commit 93878cf0255e ("eal: introduce helper API for IOVA mode"),
we need PMDs (for PCI devices) to expose this flag.
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
>
> > Other than above points,
> > Reviewed this patch and tested on octeontx2, It looks good to me.
>
>
15/07/2019 17:35, Jerin Jacob Kollanukkaran:
> From: Thomas Monjalon <thomas@monjalon.net>
> > 15/07/2019 16:26, Jerin Jacob Kollanukkaran:
> > > > > Is there any specific reason why we always prefer PA if physical
> > > > > addresses are available? Since we're already assuming that all
> > > > > devices support PA and VA anyway, what's the harm in enabling VA by default?
> > > >
> > > > If PA is available, it means we are running as root.
> > > > We can assume that using root is a choice, probably related to a
> > > > preference for PA.
> > >
> > > # Even if we are running as root, Why to choose PA in case of DC?
> > > ie. Following logic is not need
> > >     if (iova_mode == RTE_IOVA_DC) {
> > >         iova_mode = phys_addrs ? RTE_IOVA_PA : RTE_IOVA_VA;
> > >         RTE_LOG(DEBUG, EAL,
> > >             "Buses did not request a specific IOVA mode, using '%s' based on physical addresses availability.\n",
> > >             phys_addrs ? "PA" : "VA");
> > >     }
> >
> > Why running as root if using VA anyway?
> > We can assume the user knows what he is doing, so it is a user choice.
> > We want to allow the user choosing, right?
>
> The user can override iova=pa/va as eal argument if user needs to run a specific mode.
> Running as root for various other reason(just be lazy) etc. it is not or it should not
> be connected to set the mode as PA.
Good point.
I prefer to avoid relying on EAL arguments because they may
be unavailable, depending on the application.
> > > # When DPDK runs in a guest, it cannot access the real PA anyway; it will be an IPA.
> >
> > What is IPA? Isn't it a beer?
>
> There may be a beer with that name. In this context, it is "intermediate physical address".
>
> > > So I don't understand the logic behind choosing PA when DC.
> > > To me, it makes sense to choose PA when DC.
> >
> > You probably mean "choose VA".
>
> Yup.
>
> > > # To align with the RTE_PCI_DRV_NEED_MAPPING flag and reflect "need"
> > > rather than "support", I think the flag can be renamed to
> > > RTE_PCI_DRV_NEED_IOVA_AS_VA
> >
> > I think the most important thing is to have good documentation of this flag (it
> > was not done properly when Cavium introduced it initially).
> > If you want to rename the flag, you can do it in a separate patch.
> > If renaming, I really would like to get an answer to an old question:
> > Why is the IO address called IOVA? The name "IOVA_AS_VA" looks strange.
>
> IOVA = IO virtual address
> Since IOVA can be PA or VA, the name IOVA_AS_VA was chosen.
We could also call it "bus address" or "device address".
I think the word "IOVA" was enforced by Linux.
Anyway, my real issue when using "virtual" is that we don't really
know what we are talking about: is it an IOMMU translated address
on the device side or an MMU translated address on the application side?
I think we should explain things better.
One diagram which can help:
https://en.wikipedia.org/wiki/Input%E2%80%93output_memory_management_unit#/media/File:MMU_and_IOMMU.svg
> > For reference, one description of addressing:
> > https://lists.linuxfoundation.org/pipermail/iommu/2018-May/027686.html
> >
> > About the naming, do you remember how I insisted on correct naming
> > of all related stuff in DPDK? It was hard to get it accepted, the discussion was
> > not nice and I stopped insisting on getting all the details right because I just got bored.
> > It was a really bad experience.
>
> I agree.
> To me, that bad experience was mostly due to not having enough technical comments
> on the proposal, though I am not the author/owner of it.
>
> > You may ask why I bring this up now. Because we must take care of all details,
> > make sure our messages are well understood, and be cooperative.
>
> No disagreement.
> If we look at the history, the meaning got changed/updated in this commit
> by adding Intel drivers to it. I wouldn't say it is a big deal; it is just C code,
> and it can be changed based on the need. I think what really matters is
> maintaining the feature and a commitment to fixing any issue.
>
> commit f37dfab21c988d2d0ecb3c82be4ba9738c7e51c7
> Author: Jianfeng Tan <jianfeng.tan@intel.com>
> Date: Wed Oct 11 10:33:48 2017 +0000
>
> drivers/net: enable IOVA mode for Intel PMDs
>
> If we want to enable IOVA mode, introduced by
> commit 93878cf0255e ("eal: introduce helper API for IOVA mode"),
> we need PMDs (for PCI devices) to expose this flag.
>
> Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
> Reviewed-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
The doxygen meaning did not change from day one:
/** Device driver supports IOVA as VA */
But the commit log meaning was:
"Flag used when driver needs to operate in iova=va mode."
And the Intel commit log had a different understanding:
"If we want to enable IOVA mode, [..] we need PMDs [..] to expose this flag."
Anyway, we agree that the new meaning should be the original one the author
had in mind (i.e. "driver needs").
> > > Other than the above points,
> > > I reviewed this patch and tested it on octeontx2; it looks good to me.
> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Monday, July 15, 2019 9:36 PM
> To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>
> Cc: Burakov, Anatoly <anatoly.burakov@intel.com>; David Marchand
> <david.marchand@redhat.com>; dev@dpdk.org; John McNamara
> <john.mcnamara@intel.com>; Marko Kovacevic
> <marko.kovacevic@intel.com>; Igor Russkikh
> <igor.russkikh@aquantia.com>; Pavel Belous <pavel.belous@aquantia.com>;
> Ajit Khaparde <ajit.khaparde@broadcom.com>; Somnath Kotur
> <somnath.kotur@broadcom.com>; Wenzhuo Lu <wenzhuo.lu@intel.com>;
> John Daley <johndale@cisco.com>; Hyong Youb Kim <hyonkim@cisco.com>;
> Qi Zhang <qi.z.zhang@intel.com>; Xiao Wang <xiao.w.wang@intel.com>;
> Beilei Xing <beilei.xing@intel.com>; Jingjing Wu <jingjing.wu@intel.com>;
> Qiming Yang <qiming.yang@intel.com>; Konstantin Ananyev
> <konstantin.ananyev@intel.com>; Matan Azrad <matan@mellanox.com>;
> Shahaf Shuler <shahafs@mellanox.com>; Yongseok Koh
> <yskoh@mellanox.com>; Viacheslav Ovsiienko
> <viacheslavo@mellanox.com>; Alejandro Lucero
> <alejandro.lucero@netronome.com>; Nithin Kumar Dabilpuram
> <ndabilpuram@marvell.com>; Kiran Kumar Kokkilagadda
> <kirankumark@marvell.com>; Rasesh Mody <rmody@marvell.com>; Shahed
> Shaikh <shshaikh@marvell.com>; Bruce Richardson
> <bruce.richardson@intel.com>; alialnu@mellanox.com;
> aconole@redhat.com
> Subject: Re: [PATCH 2/2] eal: fix IOVA mode selection as VA for pci drivers
>
> 15/07/2019 17:35, Jerin Jacob Kollanukkaran:
> > From: Thomas Monjalon <thomas@monjalon.net>
> > > 15/07/2019 16:26, Jerin Jacob Kollanukkaran:
> > > > > > Is there any specific reason why we always prefer PA if
> > > > > > physical addresses are available? Since we're already assuming
> > > > > > that all devices support PA and VA anyway, what's the harm in
> > > > > > enabling VA by default?
> > > > >
> > > > > If PA is available, it means we are running as root.
> > > > > We can assume that using root is a choice, probably related to a
> > > > > preference for PA.
> > > >
> > > > # Even if we are running as root, why choose PA in the DC case?
> > > > i.e. the following logic is not needed:
> > > > if (iova_mode == RTE_IOVA_DC) {
> > > > iova_mode = phys_addrs ? RTE_IOVA_PA : RTE_IOVA_VA;
> > > > RTE_LOG(DEBUG, EAL,
> > > > "Buses did not request a specific IOVA mode, using '%s' based on physical addresses availability.\n",
> > > > phys_addrs ? "PA" : "VA");
> > > > }
> > >
> > > Why run as root if using VA anyway?
> > > We can assume the user knows what he is doing, so it is a user choice.
> > > We want to allow the user choosing, right?
> >
> > The user can override the mode with the iova=pa/va EAL argument if a
> > specific mode is needed.
> > Running as root for various other reasons (even just laziness) is not, and
> > should not be, connected to setting the mode to PA.
>
> Good point.
> I tend to prefer avoiding the use of EAL arguments because they may be
> unavailable, depending on the application.
Yes. The default case satisfies the requirement here, i.e. when DC is chosen by the
bus layer, select VA. I don't see any point in overriding that.
It is a good default. Do you think there is any case where it needs to be "changed"?
If not, let's stick with VA unless there is a HARD requirement for PA,
i.e. stay away from PA WHEN possible.
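To make the two positions in this exchange concrete, here is a minimal C sketch of how a DC result from the buses would be resolved under each policy. It is not EAL code: `enum iova_mode` is a local stand-in for `enum rte_iova_mode` (values assumed), and both function names are hypothetical. Only the `phys_addrs ? PA : VA` expression is taken from the snippet quoted earlier.

```c
#include <stdbool.h>

/* Local stand-in for DPDK's enum rte_iova_mode; the values are assumed to
 * mirror RTE_IOVA_DC, RTE_IOVA_PA and RTE_IOVA_VA. */
enum iova_mode { IOVA_DC = 0, IOVA_PA = 1 << 0, IOVA_VA = 1 << 1 };

/* Policy in the patch under review: with no bus preference (DC), use PA
 * when physical addresses are readable (in practice, running as root),
 * otherwise VA. A bus preference is passed through unchanged; the
 * "PA wanted but not available" failure path is left out of this sketch. */
enum iova_mode resolve_dc_prefer_pa(enum iova_mode bus_mode, bool phys_addrs)
{
	if (bus_mode != IOVA_DC)
		return bus_mode;
	return phys_addrs ? IOVA_PA : IOVA_VA;
}

/* Policy suggested in this reply: with no bus preference (DC), always use
 * VA, so running as root no longer changes the default. */
enum iova_mode resolve_dc_prefer_va(enum iova_mode bus_mode, bool phys_addrs)
{
	(void)phys_addrs; /* intentionally ignored by this policy */
	if (bus_mode != IOVA_DC)
		return bus_mode;
	return IOVA_VA;
}
```

The policies differ only for a process that can read physical addresses (typically root) and whose buses expressed no preference: `resolve_dc_prefer_pa(IOVA_DC, true)` yields PA, while `resolve_dc_prefer_va(IOVA_DC, true)` yields VA.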
>
> > > > # When DPDK runs in a guest, it cannot access the real PA anyway;
> > > > it will be an IPA.
> > >
> > > What is IPA? Isn't it a beer?
> >
> > There may be a beer with that name. In this context, it is
> > "intermediate physical address".
> >
> > > > So I don't understand the logic behind choosing PA when DC.
> > > > To me, it makes sense to choose PA when DC.
> > >
> > > You probably mean "choose VA".
> >
> > Yup.
> >
> > > > # To align with the RTE_PCI_DRV_NEED_MAPPING flag and reflect "need"
> > > > rather than "support", I think the flag can be renamed to
> > > > RTE_PCI_DRV_NEED_IOVA_AS_VA
> > >
> > > I think the most important thing is to have good documentation of this
> > > flag (it was not done properly when Cavium introduced it initially).
> > > If you want to rename the flag, you can do it in a separate patch.
> > > If renaming, I really would like to get an answer to an old question:
> > > Why is the IO address called IOVA? The name "IOVA_AS_VA" looks strange.
> >
> > IOVA = IO virtual address
> > Since IOVA can be PA or VA, the name IOVA_AS_VA was chosen.
>
> We could also call it "bus address" or "device address".
> I think the word "IOVA" was enforced by Linux.
> Anyway, my real issue when using "virtual" is that we don't really know what
> we are talking about: is it an IOMMU translated address on the device side or
> an MMU translated address on the application side?
Actually, the Linux kernel creates the same mapping for the device and the CPU so
both can access the very same address. In this context, IOVA means the virtual
address for the device. The OS can set up the same mapping in the CPU MMU tables as well.
>
> I think we should explain things better.
> One diagram which can help:
> https://en.wikipedia.org/wiki/Input%E2%80%93output_memory_management_unit#/media/File:MMU_and_IOMMU.svg
>
> > > For reference, one description of addressing:
> > > https://lists.linuxfoundation.org/pipermail/iommu/2018-May/027686.html
> > >
> > > About the naming, do you remember how I insisted on correct
> > > naming of all related stuff in DPDK? It was hard to get it accepted,
> > > the discussion was not nice and I stopped insisting on getting all the
> > > details right because I just got bored.
> > > It was a really bad experience.
> >
> > I agree.
> > To me, that bad experience was mostly due to not having enough
> > technical comments on the proposal, though I am not the author/owner of it.
> >
> > > You may ask why I bring this up now. Because we must take care of all
> > > details, make sure our messages are well understood, and be cooperative.
> >
> > No disagreement.
> > If we look at the history, the meaning got changed/updated in this commit
> > by adding Intel drivers to it. I wouldn't say it is a big deal; it is just
> > C code, and it can be changed based on the need. I think what really
> > matters is maintaining the feature and a commitment to fixing any issue.
> >
> > commit f37dfab21c988d2d0ecb3c82be4ba9738c7e51c7
> > Author: Jianfeng Tan <jianfeng.tan@intel.com>
> > Date: Wed Oct 11 10:33:48 2017 +0000
> >
> > drivers/net: enable IOVA mode for Intel PMDs
> >
> > If we want to enable IOVA mode, introduced by
> > commit 93878cf0255e ("eal: introduce helper API for IOVA mode"),
> > we need PMDs (for PCI devices) to expose this flag.
> >
> > Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
> > Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
> > Reviewed-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
>
> The doxygen meaning did not change from day one:
> /** Device driver supports IOVA as VA */
> But the commit log meaning was:
> "Flag used when driver needs to operate in iova=va mode."
> And the Intel commit log had a different understanding:
> "If we want to enable IOVA mode, [..] we need PMDs [..] to expose this flag."
>
> Anyway, we agree that the new meaning should be the original one the author
> had in mind (i.e. "driver needs").
>
> > > > Other than the above points,
> > > > I reviewed this patch and tested it on octeontx2; it looks good to me.
>
>
@@ -419,6 +419,37 @@ Misc Functions
Locks and atomic operations are per-architecture (i686 and x86_64).
+IOVA Mode Detection
+~~~~~~~~~~~~~~~~~~~
+
+The IOVA mode is selected by considering what the currently usable devices on
+the system require and/or support.
+
+Below is the 2-step heuristic for this choice.
+
+In the first step, the EAL asks each bus for its requirement in terms of IOVA
+mode and decides on a preferred IOVA mode.
+
+- if at least one bus reports RTE_IOVA_PA and none reports RTE_IOVA_VA, then
+  the preferred IOVA mode is RTE_IOVA_PA,
+- if at least one bus reports RTE_IOVA_VA and none reports RTE_IOVA_PA, then
+  the preferred IOVA mode is RTE_IOVA_VA,
+- if all buses report RTE_IOVA_DC (no bus expressed a preference), then the
+  preferred mode is RTE_IOVA_DC,
+- if the buses disagree (at least one wants RTE_IOVA_PA and at least one wants
+  RTE_IOVA_VA), then the preferred IOVA mode is RTE_IOVA_DC (see the check on
+  Physical Addresses availability below).
+
+The second step checks whether the preferred mode is compatible with the
+availability of Physical Addresses, since those are only available to the root
+user on recent kernels.
+
+- if the preferred mode is RTE_IOVA_PA but there is no access to Physical
+  Addresses, then EAL init fails early, since probing the devices later would
+  fail anyway,
+- if the preferred mode is RTE_IOVA_DC, then, based on Physical Addresses
+  availability, the preferred mode is adjusted to RTE_IOVA_PA or RTE_IOVA_VA.
+  If the buses disagreed on the IOVA mode in the first step, some of the buses
+  will not work because of this decision.
+
IOVA Mode Configuration
~~~~~~~~~~~~~~~~~~~~~~~
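The two-step heuristic documented in the hunk above can be modelled as a standalone C sketch. This is a simplification rather than the EAL implementation: `enum iova_mode` stands in for `enum rte_iova_mode` (values assumed to match), and `preferred_iova_mode()` and `final_iova_mode()` are hypothetical names. Step 1 follows the boolean logic of `rte_bus_get_iommu_class()` as changed by this patch; step 2 follows the documented rules.

```c
#include <stdbool.h>
#include <stddef.h>

/* Local stand-in for DPDK's enum rte_iova_mode (values assumed). */
enum iova_mode { IOVA_DC = 0, IOVA_PA = 1 << 0, IOVA_VA = 1 << 1 };

/* Step 1: merge per-bus requirements into a preferred mode.
 * PA (or VA) wins only if no bus asks for the other one;
 * no preference at all, or a PA/VA conflict, gives DC. */
enum iova_mode preferred_iova_mode(const enum iova_mode *bus_modes, size_t n)
{
	bool want_pa = false, want_va = false;

	for (size_t i = 0; i < n; i++) {
		if (bus_modes[i] == IOVA_PA)
			want_pa = true;
		else if (bus_modes[i] == IOVA_VA)
			want_va = true;
	}
	if (want_va && !want_pa)
		return IOVA_VA;
	if (want_pa && !want_va)
		return IOVA_PA;
	return IOVA_DC; /* nobody cares, or the buses disagree */
}

/* Step 2: reconcile the preferred mode with physical address availability.
 * Returns -1 when PA is required but physical addresses are not readable
 * (the documented "EAL init fails early" case); otherwise stores the final
 * mode in *mode and returns 0. */
int final_iova_mode(enum iova_mode preferred, bool phys_addrs,
		    enum iova_mode *mode)
{
	if (preferred == IOVA_PA && !phys_addrs)
		return -1;
	if (preferred == IOVA_DC)
		preferred = phys_addrs ? IOVA_PA : IOVA_VA;
	*mode = preferred;
	return 0;
}
```

Tracking "wants PA" and "wants VA" as separate booleans makes a PA/VA conflict explicit. With the previous `mode |= bus->get_iommu_class()` accumulation, and assuming PA and VA are distinct bits as above, a conflict would instead produce `IOVA_PA | IOVA_VA`, a value that matches no enumerator.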
@@ -578,12 +578,10 @@ enum rte_iova_mode
else
is_vfio_noiommu_enabled = 0;
}
- if ((pdrv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) == 0) {
+ if (is_vfio_noiommu_enabled != 0)
iova_mode = RTE_IOVA_PA;
- } else if (is_vfio_noiommu_enabled != 0) {
- RTE_LOG(DEBUG, EAL, "Forcing to 'PA', vfio-noiommu mode configured\n");
- iova_mode = RTE_IOVA_PA;
- }
+ else if ((pdrv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) != 0)
+ iova_mode = RTE_IOVA_VA;
#endif
break;
}
@@ -594,8 +592,8 @@ enum rte_iova_mode
break;
default:
- RTE_LOG(DEBUG, EAL, "Unsupported kernel driver? Defaulting to IOVA as 'PA'\n");
- iova_mode = RTE_IOVA_PA;
+ if ((pdrv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) != 0)
+ iova_mode = RTE_IOVA_VA;
break;
}
@@ -607,10 +605,8 @@ enum rte_iova_mode
if (iommu_no_va == -1)
iommu_no_va = pci_one_device_iommu_support_va(pdev)
? 0 : 1;
- if (iommu_no_va != 0) {
- RTE_LOG(DEBUG, EAL, "Forcing to 'PA', IOMMU does not support IOVA as 'VA'\n");
+ if (iommu_no_va != 0)
iova_mode = RTE_IOVA_PA;
- }
}
return iova_mode;
}
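The net effect of the hunks above on a single device can be summarised as a small decision function. This is a simplified model, not `pci_device_iova_mode()` itself: the enums, `DRV_IOVA_AS_VA` and the boolean parameters are hypothetical stand-ins (the bit value 0x0040 matches `RTE_PCI_DRV_IOVA_AS_VA`), `#ifdef VFIO_PRESENT` is ignored, kernel-driver cases not visible in the hunks are not modelled, and the IOMMU check is assumed to matter only when the result so far is VA.

```c
#include <stdbool.h>

/* Local stand-in for DPDK's enum rte_iova_mode (values assumed). */
enum iova_mode { IOVA_DC = 0, IOVA_PA = 1 << 0, IOVA_VA = 1 << 1 };

/* Hypothetical kernel-driver categories for this sketch only. */
enum kdrv { KDRV_VFIO, KDRV_OTHER };

/* Same bit value as RTE_PCI_DRV_IOVA_AS_VA ("driver only supports VA"). */
#define DRV_IOVA_AS_VA 0x0040u

/*
 * Simplified per-device decision after this patch:
 * - vfio in no-IOMMU mode can only work with physical addresses -> PA;
 * - otherwise, a driver flagged IOVA_AS_VA requires VA -> VA;
 * - anything else expresses no preference -> DC.
 * iommu_supports_va models the x86 check: a VA result is downgraded to PA
 * when the IOMMU cannot handle it (assumed to apply to VA results only).
 */
enum iova_mode device_iova_mode(enum kdrv kdrv, bool vfio_noiommu,
				unsigned int drv_flags, bool iommu_supports_va)
{
	enum iova_mode mode = IOVA_DC;

	if (kdrv == KDRV_VFIO && vfio_noiommu)
		mode = IOVA_PA;
	else if ((drv_flags & DRV_IOVA_AS_VA) != 0)
		mode = IOVA_VA;

	if (mode == IOVA_VA && !iommu_supports_va)
		mode = IOVA_PA;

	return mode;
}
```

Compared with the code being removed, a driver without `RTE_PCI_DRV_IOVA_AS_VA` now yields DC ("no preference") instead of PA, which leaves the final choice to the bus-level and EAL heuristics.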
@@ -169,8 +169,22 @@ static struct rte_devargs *pci_devargs_lookup(struct rte_pci_device *dev)
* This needs to be before rte_pci_map_device(), as it enables to use
* driver flags for adjusting configuration.
*/
- if (!already_probed)
+ if (!already_probed) {
+ enum rte_iova_mode dev_iova_mode;
+ enum rte_iova_mode iova_mode;
+
+ dev_iova_mode = pci_device_iova_mode(dr, dev);
+ iova_mode = rte_eal_iova_mode();
+ if (dev_iova_mode != RTE_IOVA_DC &&
+ dev_iova_mode != iova_mode) {
+ RTE_LOG(ERR, EAL, " Expecting '%s' IOVA mode but current mode is '%s', not initializing\n",
+ dev_iova_mode == RTE_IOVA_PA ? "PA" : "VA",
+ iova_mode == RTE_IOVA_PA ? "PA" : "VA");
+ return -EINVAL;
+ }
+
dev->driver = dr;
+ }
if (!already_probed && (dr->drv_flags & RTE_PCI_DRV_NEED_MAPPING)) {
/* map resources for devices that use igb_uio */
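Reduced to its essence, the probe-time check added in this hunk accepts a device when its driver has no IOVA requirement or when that requirement matches the mode the EAL settled on. A minimal sketch, with a local stand-in enum (values assumed) and a hypothetical helper name:

```c
#include <stdbool.h>

/* Local stand-in for DPDK's enum rte_iova_mode (values assumed). */
enum iova_mode { IOVA_DC = 0, IOVA_PA = 1 << 0, IOVA_VA = 1 << 1 };

/*
 * Hypothetical helper: can a device whose driver reports dev_mode be
 * probed while the EAL runs in eal_mode? DC means "no requirement" and
 * fits any mode; a PA or VA requirement must match exactly.
 */
bool iova_mode_compatible(enum iova_mode dev_mode, enum iova_mode eal_mode)
{
	return dev_mode == IOVA_DC || dev_mode == eal_mode;
}
```

Because the check runs before `dev->driver` is assigned, a device whose requirement was overruled by the EAL decision is rejected with `-EINVAL` instead of being probed in a mode its driver cannot handle.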
@@ -629,12 +643,16 @@ enum rte_iova_mode
devices_want_va = true;
}
}
- if (devices_want_pa) {
- iova_mode = RTE_IOVA_PA;
- if (devices_want_va)
- RTE_LOG(WARNING, EAL, "Some devices want 'VA' but forcing 'PA' because other devices want it\n");
- } else if (devices_want_va) {
+ if (devices_want_va && !devices_want_pa) {
iova_mode = RTE_IOVA_VA;
+ } else if (devices_want_pa && !devices_want_va) {
+ iova_mode = RTE_IOVA_PA;
+ } else {
+ iova_mode = RTE_IOVA_DC;
+ if (devices_want_va) {
+ RTE_LOG(WARNING, EAL, "Some devices want 'VA' but forcing 'DC' because other devices want 'PA'.\n");
+ RTE_LOG(WARNING, EAL, "Depending on the final decision by the EAL, part of your devices won't initialise.\n");
+ }
}
return iova_mode;
}
@@ -187,8 +187,8 @@ struct rte_pci_bus {
#define RTE_PCI_DRV_INTR_RMV 0x0010
/** Device driver needs to keep mapped resources if unsupported dev detected */
#define RTE_PCI_DRV_KEEP_MAPPED_RES 0x0020
-/** Device driver supports IOVA as VA */
-#define RTE_PCI_DRV_IOVA_AS_VA 0X0040
+/** Device driver only supports IOVA as VA and cannot work with IOVA as PA */
+#define RTE_PCI_DRV_IOVA_AS_VA 0x0040
/**
* Map the PCI device resources in user space virtual memory address
@@ -157,8 +157,7 @@ static void atl_dev_info_get(struct rte_eth_dev *dev,
static struct rte_pci_driver rte_atl_pmd = {
.id_table = pci_id_atl_map,
- .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC |
- RTE_PCI_DRV_IOVA_AS_VA,
+ .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
.probe = eth_atl_pci_probe,
.remove = eth_atl_pci_remove,
};
@@ -4028,8 +4028,7 @@ static int bnxt_pci_remove(struct rte_pci_device *pci_dev)
static struct rte_pci_driver bnxt_rte_pmd = {
.id_table = bnxt_pci_id_map,
- .drv_flags = RTE_PCI_DRV_NEED_MAPPING |
- RTE_PCI_DRV_INTR_LSC | RTE_PCI_DRV_IOVA_AS_VA,
+ .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
.probe = bnxt_pci_probe,
.remove = bnxt_pci_remove,
};
@@ -352,8 +352,7 @@ static int eth_em_pci_remove(struct rte_pci_device *pci_dev)
static struct rte_pci_driver rte_em_pmd = {
.id_table = pci_id_em_map,
- .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC |
- RTE_PCI_DRV_IOVA_AS_VA,
+ .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
.probe = eth_em_pci_probe,
.remove = eth_em_pci_remove,
};
@@ -1116,8 +1116,7 @@ static int eth_igb_pci_remove(struct rte_pci_device *pci_dev)
static struct rte_pci_driver rte_igb_pmd = {
.id_table = pci_id_igb_map,
- .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC |
- RTE_PCI_DRV_IOVA_AS_VA,
+ .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
.probe = eth_igb_pci_probe,
.remove = eth_igb_pci_remove,
};
@@ -1140,7 +1139,7 @@ static int eth_igbvf_pci_remove(struct rte_pci_device *pci_dev)
*/
static struct rte_pci_driver rte_igbvf_pmd = {
.id_table = pci_id_igbvf_map,
- .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_IOVA_AS_VA,
+ .drv_flags = RTE_PCI_DRV_NEED_MAPPING,
.probe = eth_igbvf_pci_probe,
.remove = eth_igbvf_pci_remove,
};
@@ -1247,8 +1247,7 @@ static int eth_enic_pci_remove(struct rte_pci_device *pci_dev)
static struct rte_pci_driver rte_enic_pmd = {
.id_table = pci_id_enic_map,
- .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC |
- RTE_PCI_DRV_IOVA_AS_VA,
+ .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
.probe = eth_enic_pci_probe,
.remove = eth_enic_pci_remove,
};
@@ -3268,8 +3268,7 @@ static int eth_fm10k_pci_remove(struct rte_pci_device *pci_dev)
static struct rte_pci_driver rte_pmd_fm10k = {
.id_table = pci_id_fm10k_map,
- .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC |
- RTE_PCI_DRV_IOVA_AS_VA,
+ .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
.probe = eth_fm10k_pci_probe,
.remove = eth_fm10k_pci_remove,
};
@@ -696,8 +696,7 @@ static int eth_i40e_pci_remove(struct rte_pci_device *pci_dev)
static struct rte_pci_driver rte_i40e_pmd = {
.id_table = pci_id_i40e_map,
- .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC |
- RTE_PCI_DRV_IOVA_AS_VA,
+ .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
.probe = eth_i40e_pci_probe,
.remove = eth_i40e_pci_remove,
};
@@ -1557,7 +1557,7 @@ static int eth_i40evf_pci_remove(struct rte_pci_device *pci_dev)
*/
static struct rte_pci_driver rte_i40evf_pmd = {
.id_table = pci_id_i40evf_map,
- .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_IOVA_AS_VA,
+ .drv_flags = RTE_PCI_DRV_NEED_MAPPING,
.probe = eth_i40evf_pci_probe,
.remove = eth_i40evf_pci_remove,
};
@@ -1402,8 +1402,7 @@ static int eth_iavf_pci_remove(struct rte_pci_device *pci_dev)
/* Adaptive virtual function driver struct */
static struct rte_pci_driver rte_iavf_pmd = {
.id_table = pci_id_iavf_map,
- .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC |
- RTE_PCI_DRV_IOVA_AS_VA,
+ .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
.probe = eth_iavf_pci_probe,
.remove = eth_iavf_pci_remove,
};
@@ -3737,8 +3737,7 @@ static int ice_xstats_get_names(__rte_unused struct rte_eth_dev *dev,
static struct rte_pci_driver rte_ice_pmd = {
.id_table = pci_id_ice_map,
- .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC |
- RTE_PCI_DRV_IOVA_AS_VA,
+ .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
.probe = ice_pci_probe,
.remove = ice_pci_remove,
};
@@ -1869,8 +1869,7 @@ static int eth_ixgbe_pci_remove(struct rte_pci_device *pci_dev)
static struct rte_pci_driver rte_ixgbe_pmd = {
.id_table = pci_id_ixgbe_map,
- .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC |
- RTE_PCI_DRV_IOVA_AS_VA,
+ .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
.probe = eth_ixgbe_pci_probe,
.remove = eth_ixgbe_pci_remove,
};
@@ -1892,7 +1891,7 @@ static int eth_ixgbevf_pci_remove(struct rte_pci_device *pci_dev)
*/
static struct rte_pci_driver rte_ixgbevf_pmd = {
.id_table = pci_id_ixgbevf_map,
- .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_IOVA_AS_VA,
+ .drv_flags = RTE_PCI_DRV_NEED_MAPPING,
.probe = eth_ixgbevf_pci_probe,
.remove = eth_ixgbevf_pci_remove,
};
@@ -1142,8 +1142,7 @@ struct mlx4_conf {
},
.id_table = mlx4_pci_id_map,
.probe = mlx4_pci_probe,
- .drv_flags = RTE_PCI_DRV_INTR_LSC | RTE_PCI_DRV_INTR_RMV |
- RTE_PCI_DRV_IOVA_AS_VA,
+ .drv_flags = RTE_PCI_DRV_INTR_LSC | RTE_PCI_DRV_INTR_RMV,
};
#ifdef RTE_IBVERBS_LINK_DLOPEN
@@ -2087,7 +2087,7 @@ struct mlx5_dev_spawn_data {
.dma_map = mlx5_dma_map,
.dma_unmap = mlx5_dma_unmap,
.drv_flags = RTE_PCI_DRV_INTR_LSC | RTE_PCI_DRV_INTR_RMV |
- RTE_PCI_DRV_PROBE_AGAIN | RTE_PCI_DRV_IOVA_AS_VA,
+ RTE_PCI_DRV_PROBE_AGAIN,
};
#ifdef RTE_IBVERBS_LINK_DLOPEN
@@ -3760,16 +3760,14 @@ static int eth_nfp_pci_remove(struct rte_pci_device *pci_dev)
static struct rte_pci_driver rte_nfp_net_pf_pmd = {
.id_table = pci_id_nfp_pf_net_map,
- .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC |
- RTE_PCI_DRV_IOVA_AS_VA,
+ .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
.probe = nfp_pf_pci_probe,
.remove = eth_nfp_pci_remove,
};
static struct rte_pci_driver rte_nfp_net_vf_pmd = {
.id_table = pci_id_nfp_vf_net_map,
- .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC |
- RTE_PCI_DRV_IOVA_AS_VA,
+ .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
.probe = eth_nfp_pci_probe,
.remove = eth_nfp_pci_remove,
};
@@ -1188,11 +1188,6 @@
goto fail;
}
- if (rte_eal_iova_mode() != RTE_IOVA_VA) {
- otx2_err("iova mode should be va");
- goto fail;
- }
-
if (conf->link_speeds & ETH_LINK_SPEED_FIXED) {
otx2_err("Setting link speed/duplex not supported");
goto fail;
@@ -2737,8 +2737,7 @@ static int qedevf_eth_dev_pci_remove(struct rte_pci_device *pci_dev)
static struct rte_pci_driver rte_qedevf_pmd = {
.id_table = pci_id_qedevf_map,
- .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC |
- RTE_PCI_DRV_IOVA_AS_VA,
+ .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
.probe = qedevf_eth_dev_pci_probe,
.remove = qedevf_eth_dev_pci_remove,
};
@@ -2757,8 +2756,7 @@ static int qede_eth_dev_pci_remove(struct rte_pci_device *pci_dev)
static struct rte_pci_driver rte_qede_pmd = {
.id_table = pci_id_qede_map,
- .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC |
- RTE_PCI_DRV_IOVA_AS_VA,
+ .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
.probe = qede_eth_dev_pci_probe,
.remove = qede_eth_dev_pci_remove,
};
@@ -338,8 +338,7 @@
static struct rte_pci_driver ioat_pmd_drv = {
.id_table = pci_id_ioat_map,
- .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC |
- RTE_PCI_DRV_IOVA_AS_VA,
+ .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
.probe = ioat_rawdev_probe,
.remove = ioat_rawdev_remove,
};
@@ -228,13 +228,37 @@ struct rte_bus *
enum rte_iova_mode
rte_bus_get_iommu_class(void)
{
- int mode = RTE_IOVA_DC;
+ enum rte_iova_mode mode = RTE_IOVA_DC;
+ bool buses_want_va = false;
+ bool buses_want_pa = false;
struct rte_bus *bus;
TAILQ_FOREACH(bus, &rte_bus_list, next) {
+ enum rte_iova_mode bus_iova_mode;
- if (bus->get_iommu_class)
- mode |= bus->get_iommu_class();
+ if (bus->get_iommu_class == NULL)
+ continue;
+
+ bus_iova_mode = bus->get_iommu_class();
+ RTE_LOG(DEBUG, EAL, "Bus %s wants IOVA as '%s'\n",
+ bus->name,
+ bus_iova_mode == RTE_IOVA_DC ? "DC" :
+ (bus_iova_mode == RTE_IOVA_PA ? "PA" : "VA"));
+ if (bus_iova_mode == RTE_IOVA_PA)
+ buses_want_pa = true;
+ else if (bus_iova_mode == RTE_IOVA_VA)
+ buses_want_va = true;
+ }
+ if (buses_want_va && !buses_want_pa) {
+ mode = RTE_IOVA_VA;
+ } else if (buses_want_pa && !buses_want_va) {
+ mode = RTE_IOVA_PA;
+ } else {
+ mode = RTE_IOVA_DC;
+ if (buses_want_va) {
+ RTE_LOG(WARNING, EAL, "Some buses want 'VA' but forcing 'DC' because other buses want 'PA'.\n");
+ RTE_LOG(WARNING, EAL, "Depending on the final decision by the EAL, part of your buses won't initialise.\n");
+ }
}
return mode;