bus/pci: fix IOVA as VA mode selection

Message ID: 20190708142450.51597-1-jerinj@marvell.com
State: Superseded
Delegated to: Thomas Monjalon
Series
  • bus/pci: fix IOVA as VA mode selection

Checks

Context                          Check    Description
ci/Intel-compilation             success  Compilation OK
ci/intel-Performance-Testing     success  Performance Testing PASS
ci/mellanox-Performance-Testing  success  Performance Testing PASS
ci/checkpatch                    success  coding style OK

Commit Message

Jerin Jacob Kollanukkaran July 8, 2019, 2:24 p.m.
From: Jerin Jacob <jerinj@marvell.com>

The existing logic fails to select VA as the IOVA mode
when a driver requests IOVA as VA.

IOVA as VA has stricter requirements than the other modes,
so use positive logic for IOVA as VA selection.

This patch also changes the default IOVA mode for PCI devices
to PA, since PCI devices have to deal with DMA engines, unlike
virtual devices, which may only need IOVA as DC (don't care).

Fixes: 703458e19c16 ("bus/pci: consider only usable devices for IOVA mode")

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
---

If the system has only IOVA as VA devices, none of the devices work on
top of tree without this patch. Requesting review so this can be closed for RC1.

---
 drivers/bus/pci/linux/pci.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)
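
A minimal C sketch of the selection rule as the commit message describes it: VA is chosen only when every usable PCI device's driver advertises IOVA as VA and an IOMMU is available, and PA is the fallback. This is a simplified model, not the actual diff (which is not shown here); `struct dev_info`, `select_pci_iova_mode()` and the enum values are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified IOVA modes, named after the DC/PA/VA modes in the thread. */
enum iova_mode { IOVA_DC = 0, IOVA_PA = 1, IOVA_VA = 2 };

/* Hypothetical per-device summary used only by this sketch. */
struct dev_info {
	bool usable;         /* device is bound and will be probed */
	bool drv_iova_as_va; /* its driver advertises IOVA as VA */
};

/* Positive logic: VA must be earned by every usable device; anything
 * else (including "no usable device") falls back to PA in this model. */
static enum iova_mode
select_pci_iova_mode(const struct dev_info *devs, size_t n, bool iommu_available)
{
	size_t usable = 0;
	bool all_va = true;

	assert(n == 0 || devs != NULL);
	for (size_t i = 0; i < n; i++) {
		if (!devs[i].usable)
			continue; /* unusable devices do not vote */
		usable++;
		if (!devs[i].drv_iova_as_va)
			all_va = false;
	}
	if (usable > 0 && all_va && iommu_available)
		return IOVA_VA;
	return IOVA_PA;
}
```

Ignoring unusable devices follows the "consider only usable devices" change named in the Fixes: tag.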

Comments

David Marchand July 8, 2019, 6:39 p.m. | #1
On Mon, Jul 8, 2019 at 4:25 PM <jerinj@marvell.com> wrote:

> From: Jerin Jacob <jerinj@marvell.com>
>
> Existing logic fails to select IOVA mode as VA
> if driver request to enable IOVA as VA.
>
> IOVA as VA has more strict requirement than other modes,
> so enabling positive logic for IOVA as VA selection.
>
> This patch also updates the default IOVA mode as PA
> for PCI devices as it has to deal with DMA engines unlike
> the virtual devices that may need only IOVA as DC.
>

We have three cases:
- driver/hw supports IOVA as PA only
- driver/hw supports IOVA as both VA and PA
- driver/hw supports IOVA as VA only

Looking at the header:
/** Device driver supports IOVA as VA */
#define RTE_PCI_DRV_IOVA_AS_VA 0X0040

It clearly states that the driver supports IOVA as VA, not that it requires it.
With only one flag, we can't distinguish between those three cases, and we
might want to introduce a new flag for your use case.

But, in any case, there is too little time left for -rc1.
I prefer to NAK this patch for now.
You can still set --iova-mode=va and we can try to find a solution for -rc2.
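
To make the ambiguity concrete, the sketch below encodes the three cases with the single existing flag. The 0x0040 value is the one quoted from the header; the enum, macro and function names are hypothetical. Two different cases produce the same flag value, so the bus cannot tell them apart.

```c
#include <assert.h>

/* The three driver/hw cases listed above. */
enum iova_support { PA_ONLY, PA_AND_VA, VA_ONLY };

/* Hypothetical alias carrying the value quoted from the header. */
#define IOVA_AS_VA_FLAG 0x0040u

/* With one flag, any driver that can do VA sets it, whether or not it
 * can also do PA. */
static unsigned int
encode_one_flag(enum iova_support s)
{
	assert(s == PA_ONLY || s == PA_AND_VA || s == VA_ONLY);
	return s == PA_ONLY ? 0u : IOVA_AS_VA_FLAG;
}

/* Counts how many of the three cases remain distinguishable
 * after encoding them with the single flag. */
static int
distinguishable_cases(void)
{
	unsigned int seen[3];
	int n = 0;

	for (int s = PA_ONLY; s <= VA_ONLY; s++) {
		unsigned int f = encode_one_flag((enum iova_support)s);
		int dup = 0;

		for (int j = 0; j < n; j++)
			if (seen[j] == f)
				dup = 1;
		if (!dup)
			seen[n++] = f;
	}
	return n;
}
```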
Jerin Jacob Kollanukkaran July 8, 2019, 7:13 p.m. | #2
See below,

Please send email as plain text (no HTML) to avoid formatting issues.

From: David Marchand <david.marchand@redhat.com> 
Sent: Tuesday, July 9, 2019 12:09 AM
To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>
Cc: dev <dev@dpdk.org>; Thomas Monjalon <thomas@monjalon.net>; Ben Walker <benjamin.walker@intel.com>; Burakov, Anatoly <anatoly.burakov@intel.com>
Subject: [EXT] Re: [dpdk-dev] [PATCH] bus/pci: fix IOVA as VA mode selection
Bruce Richardson July 9, 2019, 8:39 a.m. | #3
On Mon, Jul 08, 2019 at 07:13:28PM +0000, Jerin Jacob Kollanukkaran wrote:
> See below,
> 
> Please send the email as text to avoid formatting issue.(No HTML)
> 
> On Mon, Jul 8, 2019 at 4:25 PM <mailto:jerinj@marvell.com> wrote:
> From: Jerin Jacob <mailto:jerinj@marvell.com>
> 
> Existing logic fails to select IOVA mode as VA
> if driver request to enable IOVA as VA.
> 
> IOVA as VA has more strict requirement than other modes,
> so enabling positive logic for IOVA as VA selection.
> 
> This patch also updates the default IOVA mode as PA
> for PCI devices as it has to deal with DMA engines unlike
> the virtual devices that may need only IOVA as DC.
> 
> We have three cases:
> - driver/hw supports IOVA as PA only
> 
> [Jerin] It is not driver cap, it is more of system cap(IOMMU vs non  IOMMU). We are already addressing that case
> 

Not necessarily. It's possible to have hardware that does not use the IOMMU
on a platform. Therefore, you have more than two options to support.

/Bruce
Jerin Jacob Kollanukkaran July 9, 2019, 9:05 a.m. | #4
> -----Original Message-----
> From: Bruce Richardson <bruce.richardson@intel.com>
> Sent: Tuesday, July 9, 2019 2:10 PM
> To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>
> Cc: David Marchand <david.marchand@redhat.com>; dev <dev@dpdk.org>;
> Thomas Monjalon <thomas@monjalon.net>; Ben Walker
> <benjamin.walker@intel.com>; Burakov, Anatoly
> <anatoly.burakov@intel.com>
> Subject: Re: [dpdk-dev] [EXT] Re: [PATCH] bus/pci: fix IOVA as VA mode
> selection
> 
> On Mon, Jul 08, 2019 at 07:13:28PM +0000, Jerin Jacob Kollanukkaran wrote:
> > See below,
> >
> > Please send the email as text to avoid formatting issue.(No HTML)
> >
> > On Mon, Jul 8, 2019 at 4:25 PM <mailto:jerinj@marvell.com> wrote:
> > From: Jerin Jacob <mailto:jerinj@marvell.com>
> >
> > Existing logic fails to select IOVA mode as VA if driver request to
> > enable IOVA as VA.
> >
> > IOVA as VA has more strict requirement than other modes, so enabling
> > positive logic for IOVA as VA selection.
> >
> > This patch also updates the default IOVA mode as PA for PCI devices as
> > it has to deal with DMA engines unlike the virtual devices that may
> > need only IOVA as DC.
> >
> > We have three cases:
> > - driver/hw supports IOVA as PA only
> >
> > [Jerin] It is not driver cap, it is more of system cap(IOMMU vs non
> > IOMMU). We are already addressing that case
> >
> 
> Not necessarily. It's possible to have hardware that does not use the IOMMU
> on a platform. Therefore, you have more than two options to support.

Is there an example of such a device?

> 
> /Bruce
Bruce Richardson July 9, 2019, 9:32 a.m. | #5
On Tue, Jul 09, 2019 at 09:05:07AM +0000, Jerin Jacob Kollanukkaran wrote:
> > -----Original Message-----
> > From: Bruce Richardson <bruce.richardson@intel.com>
> > Sent: Tuesday, July 9, 2019 2:10 PM
> > To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>
> > Cc: David Marchand <david.marchand@redhat.com>; dev <dev@dpdk.org>;
> > Thomas Monjalon <thomas@monjalon.net>; Ben Walker
> > <benjamin.walker@intel.com>; Burakov, Anatoly
> > <anatoly.burakov@intel.com>
> > Subject: Re: [dpdk-dev] [EXT] Re: [PATCH] bus/pci: fix IOVA as VA mode
> > selection
> > 
> > On Mon, Jul 08, 2019 at 07:13:28PM +0000, Jerin Jacob Kollanukkaran wrote:
> > > See below,
> > >
> > > Please send the email as text to avoid formatting issue.(No HTML)
> > >
> > > On Mon, Jul 8, 2019 at 4:25 PM <mailto:jerinj@marvell.com> wrote:
> > > From: Jerin Jacob <mailto:jerinj@marvell.com>
> > >
> > > Existing logic fails to select IOVA mode as VA if driver request to
> > > enable IOVA as VA.
> > >
> > > IOVA as VA has more strict requirement than other modes, so enabling
> > > positive logic for IOVA as VA selection.
> > >
> > > This patch also updates the default IOVA mode as PA for PCI devices as
> > > it has to deal with DMA engines unlike the virtual devices that may
> > > need only IOVA as DC.
> > >
> > > We have three cases:
> > > - driver/hw supports IOVA as PA only
> > >
> > > [Jerin] It is not driver cap, it is more of system cap(IOMMU vs non
> > > IOMMU). We are already addressing that case
> > >
> > 
> > Not necessarily. It's possible to have hardware that does not use the IOMMU
> > on a platform. Therefore, you have more than two options to support.
> 
> Any example such device?
> 

On further investigation, it appears I was wrong/misinformed. All devices
I'm aware of work fine with an IOMMU if one is present on the platform. Please
ignore my previous assertion, and thanks for getting me to follow up on this!

/Bruce
Burakov, Anatoly July 9, 2019, 9:44 a.m. | #6
On 08-Jul-19 8:13 PM, Jerin Jacob Kollanukkaran wrote:
> See below,
> 
> Please send the email as text to avoid formatting issue.(No HTML)
> 
> On Mon, Jul 8, 2019 at 4:25 PM <mailto:jerinj@marvell.com> wrote:
> From: Jerin Jacob <mailto:jerinj@marvell.com>
> 
> Existing logic fails to select IOVA mode as VA
> if driver request to enable IOVA as VA.
> 
> IOVA as VA has more strict requirement than other modes,
> so enabling positive logic for IOVA as VA selection.
> 
> This patch also updates the default IOVA mode as PA
> for PCI devices as it has to deal with DMA engines unlike
> the virtual devices that may need only IOVA as DC.
> 
> We have three cases:
> - driver/hw supports IOVA as PA only
> 
> [Jerin] It is not driver cap, it is more of system cap(IOMMU vs non  IOMMU). We are already addressing that case

I don't get how this works. How does "system capability" affect what the
device itself supports? Are we to assume that *all* hardware supports
IOVA as VA by default? "System capability" is more of a bus issue than
an individual device issue, is it not?
Jerin Jacob Kollanukkaran July 9, 2019, 11:13 a.m. | #7
> -----Original Message-----
> From: Burakov, Anatoly <anatoly.burakov@intel.com>
> Sent: Tuesday, July 9, 2019 3:15 PM
> To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>; David Marchand
> <david.marchand@redhat.com>
> Cc: dev <dev@dpdk.org>; Thomas Monjalon <thomas@monjalon.net>; Ben
> Walker <benjamin.walker@intel.com>
> Subject: Re: [EXT] Re: [dpdk-dev] [PATCH] bus/pci: fix IOVA as VA mode
> selection
> 
> On 08-Jul-19 8:13 PM, Jerin Jacob Kollanukkaran wrote:
> > See below,
> >
> > Please send the email as text to avoid formatting issue.(No HTML)
> >
> > On Mon, Jul 8, 2019 at 4:25 PM <mailto:jerinj@marvell.com> wrote:
> > From: Jerin Jacob <mailto:jerinj@marvell.com>
> >
> > Existing logic fails to select IOVA mode as VA if driver request to
> > enable IOVA as VA.
> >
> > IOVA as VA has more strict requirement than other modes, so enabling
> > positive logic for IOVA as VA selection.
> >
> > This patch also updates the default IOVA mode as PA for PCI devices as
> > it has to deal with DMA engines unlike the virtual devices that may
> > need only IOVA as DC.
> >
> > We have three cases:
> > - driver/hw supports IOVA as PA only
> >
> > [Jerin] It is not driver cap, it is more of system cap(IOMMU vs non
> > IOMMU). We are already addressing that case
> 
> I don't get how this works. How does "system capability" affect what the
> device itself supports? Are we to assume that *all* hardware support IOVA
> as VA by default? "System capability" is more of a bus issue than an individual
> device issue, is it not?

What I meant is, supporting VA vs PA is a function of the IOMMU (not a device attribute).
I.e. the device makes the bus master request; if an IOMMU is available and enabled in the
system, the request goes through the IOMMU, which translates the IOVA to a physical address.

Another way to put it: is there any _PCIe_ device which needs/requires
RTE_PCI_DRV_NEED_IOVA_AS_PA in rte_pci_driver.drv_flags?

> 
> --
> Thanks,
> Anatoly
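
The point that the translation is done by the IOMMU, not by the device, can be illustrated with a toy single-level IOMMU table. This is an illustrative model with hypothetical names (`struct toy_iommu` and its helpers are not a kernel or DPDK API): the device only ever emits IOVAs, and whether those IOVAs equal process virtual addresses or physical addresses is purely a choice made when the mappings are programmed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_OFFSET_MASK ((UINT64_C(1) << PAGE_SHIFT) - 1)
#define MAX_MAPS 16

/* Toy IOMMU: a flat table of page mappings IOVA -> PA. */
struct toy_iommu {
	uint64_t iova_page[MAX_MAPS];
	uint64_t pa_page[MAX_MAPS];
	size_t n;
};

/* Program one page mapping. The caller chooses the IOVA: passing the
 * process VA gives "IOVA as VA", passing the PA gives "IOVA as PA". */
static int
toy_iommu_map(struct toy_iommu *m, uint64_t iova, uint64_t pa)
{
	assert(m != NULL);
	if (m->n == MAX_MAPS)
		return -1; /* table full */
	m->iova_page[m->n] = iova >> PAGE_SHIFT;
	m->pa_page[m->n] = pa >> PAGE_SHIFT;
	m->n++;
	return 0;
}

/* What the IOMMU does for a device's DMA: translate IOVA to PA, or
 * "fault" (UINT64_MAX in this model) when no mapping exists. */
static uint64_t
toy_iommu_translate(const struct toy_iommu *m, uint64_t iova)
{
	for (size_t i = 0; i < m->n; i++)
		if (m->iova_page[i] == iova >> PAGE_SHIFT)
			return (m->pa_page[i] << PAGE_SHIFT) | (iova & PAGE_OFFSET_MASK);
	return UINT64_MAX;
}
```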
Burakov, Anatoly July 9, 2019, 11:40 a.m. | #8
On 09-Jul-19 12:13 PM, Jerin Jacob Kollanukkaran wrote:
>> -----Original Message-----
>> From: Burakov, Anatoly <anatoly.burakov@intel.com>
>> Sent: Tuesday, July 9, 2019 3:15 PM
>> To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>; David Marchand
>> <david.marchand@redhat.com>
>> Cc: dev <dev@dpdk.org>; Thomas Monjalon <thomas@monjalon.net>; Ben
>> Walker <benjamin.walker@intel.com>
>> Subject: Re: [EXT] Re: [dpdk-dev] [PATCH] bus/pci: fix IOVA as VA mode
>> selection
>>
>> On 08-Jul-19 8:13 PM, Jerin Jacob Kollanukkaran wrote:
>>> See below,
>>>
>>> Please send the email as text to avoid formatting issue.(No HTML)
>>>
>>> On Mon, Jul 8, 2019 at 4:25 PM <mailto:jerinj@marvell.com> wrote:
>>> From: Jerin Jacob <mailto:jerinj@marvell.com>
>>>
>>> Existing logic fails to select IOVA mode as VA if driver request to
>>> enable IOVA as VA.
>>>
>>> IOVA as VA has more strict requirement than other modes, so enabling
>>> positive logic for IOVA as VA selection.
>>>
>>> This patch also updates the default IOVA mode as PA for PCI devices as
>>> it has to deal with DMA engines unlike the virtual devices that may
>>> need only IOVA as DC.
>>>
>>> We have three cases:
>>> - driver/hw supports IOVA as PA only
>>>
>>> [Jerin] It is not driver cap, it is more of system cap(IOMMU vs non
>>> IOMMU). We are already addressing that case
>>
>> I don't get how this works. How does "system capability" affect what the
>> device itself supports? Are we to assume that *all* hardware support IOVA
>> as VA by default? "System capability" is more of a bus issue than an individual
>> device issue, is it not?
> 
> What I meant is, supporting VA vs PA is function of IOMMU(not the device attribute).
> Ie. Device makes the  bus master request, if IOMMU available and enabled in the SYSTEM ,
> It goes over IOMMU  and translate the IOVA to physical address.
> 
> Another way to put is, Is there any _PCIe_ device which need/requires
> RTE_PCI_DRV_NEED_IOVA_AS_PA in rte_pci_driver.drv_flags
> 
> 

Previously, as far as I can tell, the flag was used to indicate support
for IOVA as VA mode, not a *requirement* for IOVA as VA mode. For example,
there are multiple patches [1][2][3][4] (I'm sure I can find more!) that
added IOVA as VA support to various drivers, and they were all worded
in exactly this way - "support for IOVA as VA mode", not "require IOVA as
VA mode". As far as I can tell, none of these drivers *require* IOVA as
VA mode - they merely use this flag to indicate support for it.

Specifically, from my perspective, "support for IOVA as VA mode" has
in practice always indicated support for VFIO (or similar drivers) as
far as the PCI bus is concerned. That is, the device *could* use IOVA as
VA mode, but since it may be bound to igb_uio (which doesn't support
IOVA as VA), IOVA as VA mode may not be supported for a particular
device. So, a particular device *cannot support* IOVA as VA if it's
bound to igb_uio or uio_pci_generic (or VFIO in noiommu mode). This is
not *just* a capability issue, but also a kernel driver issue.

Now it suddenly turns out that someone somewhere "knew" that the "IOVA as
VA" flag in PCI drivers is supposed to indicate a *requirement* and not
support, and it appears that this knowledge was neither communicated nor
documented anywhere, and is now treated as common knowledge.

[1] http://patchwork.dpdk.org/patch/53206/
[2] http://patchwork.dpdk.org/patch/50274/
[3] http://patchwork.dpdk.org/patch/50991/
[4] http://patchwork.dpdk.org/patch/46134/
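
The point that the kernel driver binding, not only the PMD flag, decides whether IOVA as VA is usable can be sketched as a small predicate. The enum below is a hypothetical stand-in (DPDK has its own kernel-driver enum and detects VFIO no-IOMMU mode separately), and it only models the drivers named in this message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel drivers named above. */
enum kdrv {
	KDRV_VFIO,            /* vfio-pci backed by a real IOMMU */
	KDRV_VFIO_NOIOMMU,    /* vfio-pci in no-IOMMU mode */
	KDRV_IGB_UIO,
	KDRV_UIO_PCI_GENERIC,
};

/* The PMD flag only says the driver *could* use IOVA as VA; the kernel
 * driver the device is bound to decides whether it actually can. */
static bool
device_can_use_iova_as_va(bool pmd_supports_va, enum kdrv k)
{
	assert(k <= KDRV_UIO_PCI_GENERIC);
	return pmd_supports_va && k == KDRV_VFIO;
}
```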
Jerin Jacob Kollanukkaran July 9, 2019, 12:11 p.m. | #9
> -----Original Message-----
> From: Burakov, Anatoly <anatoly.burakov@intel.com>
> Sent: Tuesday, July 9, 2019 5:10 PM
> To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>; David Marchand
> <david.marchand@redhat.com>
> Cc: dev <dev@dpdk.org>; Thomas Monjalon <thomas@monjalon.net>; Ben
> Walker <benjamin.walker@intel.com>
> Subject: Re: [EXT] Re: [dpdk-dev] [PATCH] bus/pci: fix IOVA as VA mode
> selection
> >>>
> >>> On Mon, Jul 8, 2019 at 4:25 PM <mailto:jerinj@marvell.com> wrote:
> >>> From: Jerin Jacob <mailto:jerinj@marvell.com>
> >>>
> >>> Existing logic fails to select IOVA mode as VA if driver request to
> >>> enable IOVA as VA.
> >>>
> >>> IOVA as VA has more strict requirement than other modes, so enabling
> >>> positive logic for IOVA as VA selection.
> >>>
> >>> This patch also updates the default IOVA mode as PA for PCI devices
> >>> as it has to deal with DMA engines unlike the virtual devices that
> >>> may need only IOVA as DC.
> >>>
> >>> We have three cases:
> >>> - driver/hw supports IOVA as PA only
> >>>
> >>> [Jerin] It is not driver cap, it is more of system cap(IOMMU vs non
> >>> IOMMU). We are already addressing that case
> >>
> >> I don't get how this works. How does "system capability" affect what
> >> the device itself supports? Are we to assume that *all* hardware
> >> support IOVA as VA by default? "System capability" is more of a bus
> >> issue than an individual device issue, is it not?
> >
> > What I meant is, supporting VA vs PA is function of IOMMU(not the device
> attribute).
> > Ie. Device makes the  bus master request, if IOMMU available and
> > enabled in the SYSTEM , It goes over IOMMU  and translate the IOVA to
> physical address.
> >
> > Another way to put is, Is there any _PCIe_ device which need/requires
> > RTE_PCI_DRV_NEED_IOVA_AS_PA in rte_pci_driver.drv_flags
> >
> >
> 
> Previously, as far as i can tell, the flag was used to indicate support for IOVA
> as VA mode, not *requirement* for IOVA as VA mode. For example, there
> are multiple patches [1][2][3][4] (i'm sure i can find more!) that added IOVA
> as VA support to various drivers, and they all were worded it in this exact way
> - "support for IOVA as VA mode", not "require IOVA as VA mode". As far as i
> can tell, none of these drivers *require* IOVA as VA mode - they merely use
> this flag to indicate support for it.

Some classes of devices NEED IOVA as VA for performance reasons,
especially devices with HW mempool allocators. On those devices, if we don't use IOVA as VA,
every packet received from the device has to be translated with rte_mem_iova2virt()
(see drivers/net/dpaa2), which is a real performance issue.
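
The per-packet cost comes from the difference between a cast and a search. A toy comparison follows; `struct toy_memseg` and both helpers are hypothetical, and the PA-mode path only mimics the kind of memory-segment lookup involved in IOVA-to-VA translation, it is not the implementation of rte_mem_iova2virt().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy memory segment: a virtually contiguous region and its IOVA base. */
struct toy_memseg {
	uintptr_t va;
	uint64_t iova;
	size_t len;
};

/* IOVA as VA: the address the hardware hands back *is* the VA. */
static inline void *
iova_to_virt_va_mode(uint64_t iova)
{
	return (void *)(uintptr_t)iova;
}

/* IOVA as PA: the driver must search the segment list for every buffer
 * address it receives; NULL means the IOVA is not in any segment. */
static void *
iova_to_virt_pa_mode(const struct toy_memseg *segs, size_t n, uint64_t iova)
{
	assert(n == 0 || segs != NULL);
	for (size_t i = 0; i < n; i++)
		if (iova >= segs[i].iova && iova - segs[i].iova < segs[i].len)
			return (void *)(segs[i].va + (uintptr_t)(iova - segs[i].iova));
	return NULL;
}
```

With many segments, the PA-mode search runs once per received buffer, which is the overhead described above.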

> Specifically, from my perspective, the "support for IOVA as VA mode" has in
> practice always indicated support for VFIO (or similar drivers) as far as the PCI
> bus is concerned. As in, the device *could* use IOVA as VA mode, but since it
> may be bound to igb_uio (which doesn't support IOVA as VA), the IOVA as
> VA mode may not be supported for a particular device. So, a particular device
> *cannot support* IOVA as VA if it's bound to igb_uio or uio_pci_generic (or
> VFIO in noiommu mode). This is not *just* a capability thing, but also kernel
> driver issue.

Yes. See below.

> 
> Now suddenly it turns out that someone somewhere "knew" that "IOVA as
> VA" flag in PCI drivers is supposed to indicate *requirement* and not
> support, and it appears that this knowledge was not communicated nor
> documented anywhere, and is now treated as common knowledge.

I think the confusion here is that I was under the impression that:
# If a device supports IOVA as VA and the system runs with an IOMMU, then
DPDK should run in IOVA as VA mode.
If the above statement is true, then we don't really need a new flag.

A couple of points to make forward progress:
# If we think there is a use case where a device supports IOVA as VA
and the system runs with an IOMMU, but for some reason DPDK needs
to run in PA mode, then we need to create two flags:
RTE_PCI_DRV_IOVA_AS_VA - it can run in either mode
RTE_PCI_DRV_NEED_IOVA_AS_VA - it can run only in IOVA as VA mode
# With top of tree, DPDK currently never runs in IOVA as VA mode.
That's a separate problem to fix, and it affects all devices
currently setting RTE_PCI_DRV_IOVA_AS_VA: even though a
device supports RTE_PCI_DRV_IOVA_AS_VA, it does not run
with IOMMU protection, and/or root privilege is required to run DPDK.
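
The two-flag proposal could be evaluated roughly as follows. This is a sketch under assumptions: the flag values are arbitrary stand-ins for RTE_PCI_DRV_IOVA_AS_VA and the proposed RTE_PCI_DRV_NEED_IOVA_AS_VA, and preferring VA whenever every device allows it and an IOMMU is present is this sketch's reading of the first point above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical values for the two flags discussed above. */
#define DRV_IOVA_AS_VA      0x1u /* can run in either mode */
#define DRV_NEED_IOVA_AS_VA 0x2u /* can run only in IOVA as VA mode */

enum iova_mode { IOVA_ERR = -1, IOVA_PA = 1, IOVA_VA = 2 };

static enum iova_mode
select_mode_two_flags(const unsigned int *drv_flags, size_t n, bool iommu)
{
	bool all_va = n > 0, any_need_va = false;

	assert(n == 0 || drv_flags != NULL);
	for (size_t i = 0; i < n; i++) {
		/* a driver with neither flag can only do PA */
		if (!(drv_flags[i] & (DRV_IOVA_AS_VA | DRV_NEED_IOVA_AS_VA)))
			all_va = false;
		if (drv_flags[i] & DRV_NEED_IOVA_AS_VA)
			any_need_va = true;
	}
	if (all_va && iommu)
		return IOVA_VA;
	if (any_need_va)
		return IOVA_ERR; /* a VA-only device cannot be served */
	return IOVA_PA;
}
```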


> 
> [1] http://patchwork.dpdk.org/patch/53206/
> [2] http://patchwork.dpdk.org/patch/50274/
> [3] http://patchwork.dpdk.org/patch/50991/
> [4] http://patchwork.dpdk.org/patch/46134/
> 
> --
> Thanks,
> Anatoly
Burakov, Anatoly July 9, 2019, 1:30 p.m. | #10
On 09-Jul-19 1:11 PM, Jerin Jacob Kollanukkaran wrote:
>>>>>
>>>>> On Mon, Jul 8, 2019 at 4:25 PM <mailto:jerinj@marvell.com> wrote:
>>>>> From: Jerin Jacob <mailto:jerinj@marvell.com>
>>>>>
>>>>> Existing logic fails to select IOVA mode as VA if driver request to
>>>>> enable IOVA as VA.
>>>>>
>>>>> IOVA as VA has more strict requirement than other modes, so enabling
>>>>> positive logic for IOVA as VA selection.
>>>>>
>>>>> This patch also updates the default IOVA mode as PA for PCI devices
>>>>> as it has to deal with DMA engines unlike the virtual devices that
>>>>> may need only IOVA as DC.
>>>>>
>>>>> We have three cases:
>>>>> - driver/hw supports IOVA as PA only
>>>>>
>>>>> [Jerin] It is not driver cap, it is more of system cap(IOMMU vs non
>>>>> IOMMU). We are already addressing that case
>>>>
>>>> I don't get how this works. How does "system capability" affect what
>>>> the device itself supports? Are we to assume that *all* hardware
>>>> support IOVA as VA by default? "System capability" is more of a bus
>>>> issue than an individual device issue, is it not?
>>>
>>> What I meant is, supporting VA vs PA is function of IOMMU(not the device
>> attribute).
>>> Ie. Device makes the  bus master request, if IOMMU available and
>>> enabled in the SYSTEM , It goes over IOMMU  and translate the IOVA to
>> physical address.
>>>
>>> Another way to put is, Is there any _PCIe_ device which need/requires
>>> RTE_PCI_DRV_NEED_IOVA_AS_PA in rte_pci_driver.drv_flags
>>>
>>>
>>
>> Previously, as far as i can tell, the flag was used to indicate support for IOVA
>> as VA mode, not *requirement* for IOVA as VA mode. For example, there
>> are multiple patches [1][2][3][4] (i'm sure i can find more!) that added IOVA
>> as VA support to various drivers, and they all were worded it in this exact way
>> - "support for IOVA as VA mode", not "require IOVA as VA mode". As far as i
>> can tell, none of these drivers *require* IOVA as VA mode - they merely use
>> this flag to indicate support for it.
> 
> Some class of devices NEED IOVA as VA for performance reasons.
> Specially the devices has HW mempool allocators. On those devices If we don’t use IOVA as VA,
> Upon getting packet from device, It needs to go over rte_mem_iova2virt() per
> packet see driver/net/dppa2. Which has real performance issue.

I wouldn't classify this as "needing" IOVA as VA. "Need" implies it cannot
work without it, whereas in this case it's more "highly recommended"
than "needed".

>>
>> Now suddenly it turns out that someone somewhere "knew" that "IOVA as
>> VA" flag in PCI drivers is supposed to indicate *requirement* and not
>> support, and it appears that this knowledge was not communicated nor
>> documented anywhere, and is now treated as common knowledge.
> 
> I think, the confusion here is,  I was under impression that
> # If device supports IOVA as VA and system runs with IOMMU then
> the  dpdk should run in IOVA as VA mode.
> If above statement true then we don’t really need a new flag.

Exactly. And the flag used to indicate that the device *supports* IOVA 
as VA, not that it *requires* it.

> 
> Couple of points to make forward progress:
> # If we think, there is a use case where device is IOVA as VA
> And system runs in IOMMU mode then for some reason DPDK needs
> to run in PA mode. If so, we need to create two flags
> RTE_PCI_DRV_IOVA_AS_VA - it can run either modes

There are use cases - KNI and igb_uio come to mind. Whether the IOMMU uses
VA or PA is a different question from whether an IOMMU is in use - there is
no law stating that, when using an IOMMU, IOVAs have to map 1:1 to
VAs. An IOMMU requirement does not necessarily imply IOVA as VA - it is
perfectly legal to program the IOMMU to use IOVA as PA (which we currently
do when we e.g. use VFIO for some devices and igb_uio for others).

> RTE_PCI_DRV_NEED_IOVA_AS_VA - it can run only on IOVA as VA

If we're adding a flag, we might as well not create confusion and do
it consistently. If IOVA as PA is supported, have a flag to indicate 
that. If IOVA as VA is supported, have a flag to indicate that. Absence 
of either flag implies inability to work in that mode. I don't see how 
this is less clear and self-documenting than having two IOVA as 
VA-related flags that have slightly different meaning and imply things 
not otherwise stated explicitly.
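
With one flag per supported mode, mode selection reduces to a bitmask intersection across all probed drivers. In this sketch the flag names and values are hypothetical, `va_possible` stands for "an IOMMU is present and every device is bound to a driver that can use it", and preferring VA when both modes remain is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flags: one per supported mode. Absence of a flag means the
 * driver cannot work in that mode. */
#define DRV_IOVA_AS_PA 0x1u
#define DRV_IOVA_AS_VA 0x2u

/* Returns the flag of the chosen mode, or 0 if no common mode exists. */
static unsigned int
select_mode_symmetric(const unsigned int *drv_flags, size_t n, bool va_possible)
{
	unsigned int common = DRV_IOVA_AS_PA | DRV_IOVA_AS_VA;

	assert(n == 0 || drv_flags != NULL);
	for (size_t i = 0; i < n; i++)
		common &= drv_flags[i]; /* keep only modes every driver supports */
	if (!va_possible)
		common &= ~DRV_IOVA_AS_VA;
	if (common & DRV_IOVA_AS_VA)
		return DRV_IOVA_AS_VA;
	return common & DRV_IOVA_AS_PA;
}
```

A return value of 0 means the devices have no mode in common, so initialization should fail rather than silently picking one.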

> # With top of tree, Currently it never runs in IOVA as VA mode.
> That’s a separate problem to fix. Which effect all the devices
> Currently supporting RTE_PCI_DRV_IOVA_AS_VA. Ie even though
> Device support RTE_PCI_DRV_IOVA_AS_VA, it is not running
> With IOMMU protection and/or root privilege is required to run DPDK.
> 
> 
>>
>> [1] http://patchwork.dpdk.org/patch/53206/
>> [2] http://patchwork.dpdk.org/patch/50274/
>> [3] http://patchwork.dpdk.org/patch/50991/
>> [4] http://patchwork.dpdk.org/patch/46134/
>>
>> --
>> Thanks,
>> Anatoly
Burakov, Anatoly July 9, 2019, 1:50 p.m. | #11
On 09-Jul-19 2:30 PM, Burakov, Anatoly wrote:
> On 09-Jul-19 1:11 PM, Jerin Jacob Kollanukkaran wrote:
>>>>>>
>>>>>> On Mon, Jul 8, 2019 at 4:25 PM <mailto:jerinj@marvell.com> wrote:
>>>>>> From: Jerin Jacob <mailto:jerinj@marvell.com>
>>>>>>
>>>>>> Existing logic fails to select IOVA mode as VA if driver request to
>>>>>> enable IOVA as VA.
>>>>>>
>>>>>> IOVA as VA has more strict requirement than other modes, so enabling
>>>>>> positive logic for IOVA as VA selection.
>>>>>>
>>>>>> This patch also updates the default IOVA mode as PA for PCI devices
>>>>>> as it has to deal with DMA engines unlike the virtual devices that
>>>>>> may need only IOVA as DC.
>>>>>>
>>>>>> We have three cases:
>>>>>> - driver/hw supports IOVA as PA only
>>>>>>
>>>>>> [Jerin] It is not driver cap, it is more of system cap(IOMMU vs non
>>>>>> IOMMU). We are already addressing that case
>>>>>
>>>>> I don't get how this works. How does "system capability" affect what
>>>>> the device itself supports? Are we to assume that *all* hardware
>>>>> support IOVA as VA by default? "System capability" is more of a bus
>>>>> issue than an individual device issue, is it not?
>>>>
>>>> What I meant is, supporting VA vs PA is function of IOMMU(not the 
>>>> device
>>> attribute).
>>>> Ie. Device makes the  bus master request, if IOMMU available and
>>>> enabled in the SYSTEM , It goes over IOMMU  and translate the IOVA to
>>> physical address.
>>>>
>>>> Another way to put is, Is there any _PCIe_ device which need/requires
>>>> RTE_PCI_DRV_NEED_IOVA_AS_PA in rte_pci_driver.drv_flags
>>>>
>>>>
>>>
>>> Previously, as far as i can tell, the flag was used to indicate 
>>> support for IOVA
>>> as VA mode, not *requirement* for IOVA as VA mode. For example, there
>>> are multiple patches [1][2][3][4] (i'm sure i can find more!) that 
>>> added IOVA
>>> as VA support to various drivers, and they all were worded it in this 
>>> exact way
>>> - "support for IOVA as VA mode", not "require IOVA as VA mode". As 
>>> far as i
>>> can tell, none of these drivers *require* IOVA as VA mode - they 
>>> merely use
>>> this flag to indicate support for it.
>>
>> Some class of devices NEED IOVA as VA for performance reasons.
>> Specially the devices has HW mempool allocators. On those devices If 
>> we don’t use IOVA as VA,
>> Upon getting packet from device, It needs to go over 
>> rte_mem_iova2virt() per
>> packet see driver/net/dppa2. Which has real performance issue.
> 
> I wouldn't classify this as "needing" IOVA. "Need" implies it cannot 
> work without it, whereas in this case it's more of a "highly 
> recommended" rather than "need".
> 
>>>
>>> Now suddenly it turns out that someone somewhere "knew" that "IOVA as
>>> VA" flag in PCI drivers is supposed to indicate *requirement* and not
>>> support, and it appears that this knowledge was not communicated nor
>>> documented anywhere, and is now treated as common knowledge.
>>
>> I think, the confusion here is,  I was under impression that
>> # If device supports IOVA as VA and system runs with IOMMU then
>> the  dpdk should run in IOVA as VA mode.
>> If above statement true then we don’t really need a new flag.
> 
> Exactly. And the flag used to indicate that the device *supports* IOVA 
> as VA, not that it *requires* it.

...unless the driver itself is written in such a way as to simply not 
support VA to PA lookups - in that case, the above suggested way of 
simply not indicating IOVA as PA support would fix the issue in that it 
will require the device to either work in IOVA as VA mode, or fail to 
initialize. Current semantics of only having one flag do not distinguish 
between "can do both PA and VA" and "can only do VA" - hence the 
suggestion of adding an additional flag indicating IOVA as PA support.
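
Under that suggestion, a VA-only driver simply omits the PA flag and refuses to probe when the process runs in PA mode (for example when the user forces --iova-mode=pa). A hypothetical probe-time check, with made-up flag names and values:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical per-mode support flags. */
#define DRV_IOVA_AS_PA 0x1u
#define DRV_IOVA_AS_VA 0x2u

enum iova_mode { IOVA_PA = 1, IOVA_VA = 2 };

/* A driver that does not advertise the active mode refuses to
 * initialize instead of running with the wrong address semantics. */
static int
probe_check_iova(unsigned int drv_flags, enum iova_mode active)
{
	unsigned int needed = active == IOVA_VA ? DRV_IOVA_AS_VA : DRV_IOVA_AS_PA;

	assert(active == IOVA_PA || active == IOVA_VA);
	return (drv_flags & needed) ? 0 : -ENOTSUP;
}
```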
Jerin Jacob Kollanukkaran July 9, 2019, 2 p.m. | #12
> -----Original Message-----
> From: Burakov, Anatoly <anatoly.burakov@intel.com>
> Sent: Tuesday, July 9, 2019 7:00 PM
> To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>; David Marchand
> <david.marchand@redhat.com>
> Cc: dev <dev@dpdk.org>; Thomas Monjalon <thomas@monjalon.net>; Ben
> Walker <benjamin.walker@intel.com>
> Subject: Re: [EXT] Re: [dpdk-dev] [PATCH] bus/pci: fix IOVA as VA mode
> selection
> 
> On 09-Jul-19 1:11 PM, Jerin Jacob Kollanukkaran wrote:
> >>>>>
> >>>>> On Mon, Jul 8, 2019 at 4:25 PM <mailto:jerinj@marvell.com> wrote:
> >>>>> From: Jerin Jacob <mailto:jerinj@marvell.com>
> >>>>>
> >>>>> Existing logic fails to select IOVA mode as VA if driver request
> >>>>> to enable IOVA as VA.
> >>>>>
> >>>>> IOVA as VA has more strict requirement than other modes, so
> >>>>> enabling positive logic for IOVA as VA selection.
> >>>>>
> >>>>> This patch also updates the default IOVA mode as PA for PCI
> >>>>> devices as it has to deal with DMA engines unlike the virtual
> >>>>> devices that may need only IOVA as DC.
> >>>>>
> >>>>> We have three cases:
> >>>>> - driver/hw supports IOVA as PA only
> >>>>>
> >>>>> [Jerin] It is not driver cap, it is more of system cap(IOMMU vs
> >>>>> non IOMMU). We are already addressing that case
> >>>>
> >>>> I don't get how this works. How does "system capability" affect
> >>>> what the device itself supports? Are we to assume that *all*
> >>>> hardware support IOVA as VA by default? "System capability" is more
> >>>> of a bus issue than an individual device issue, is it not?
> >>>
> >>> What I meant is, supporting VA vs PA is function of IOMMU(not the
> >>> device
> >> attribute).
> >>> Ie. Device makes the  bus master request, if IOMMU available and
> >>> enabled in the SYSTEM , It goes over IOMMU  and translate the IOVA
> >>> to
> >> physical address.
> >>>
> >>> Another way to put is, Is there any _PCIe_ device which
> >>> need/requires RTE_PCI_DRV_NEED_IOVA_AS_PA in
> >>> rte_pci_driver.drv_flags
> >>>
> >>>
> >>
> >> Previously, as far as i can tell, the flag was used to indicate
> >> support for IOVA as VA mode, not *requirement* for IOVA as VA mode.
> >> For example, there are multiple patches [1][2][3][4] (i'm sure i can
> >> find more!) that added IOVA as VA support to various drivers, and
> >> they all were worded it in this exact way
> >> - "support for IOVA as VA mode", not "require IOVA as VA mode". As
> >> far as i can tell, none of these drivers *require* IOVA as VA mode -
> >> they merely use this flag to indicate support for it.
> >
> > Some class of devices NEED IOVA as VA for performance reasons.
> > Specially the devices has HW mempool allocators. On those devices If
> > we don’t use IOVA as VA, Upon getting packet from device, It needs to
> > go over rte_mem_iova2virt() per packet see driver/net/dppa2. Which has
> real performance issue.
> 
> I wouldn't classify this as "needing" IOVA. "Need" implies it cannot work
> without it, whereas in this case it's more of a "highly recommended" rather
> than "need".

It is a "need": performance is horrible without it, as it means a per-packet SW translation.
It is a "need" from the DPDK performance perspective.
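The sketch below makes the cost argument concrete. It is minimal, hypothetical C and is not taken from any DPDK driver. It contrasts the two cases:

- In IOVA as VA mode, the address a HW mempool returns can be cast straight back to a pointer.
- In IOVA as PA mode, each received buffer needs a lookup through the memory segments. This is roughly the kind of work a call such as rte_mem_iova2virt() has to do per packet.

The `iova_t` typedef, the `seg_map` type and both helpers are illustrative assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for rte_iova_t. */
typedef uint64_t iova_t;

/* Hypothetical descriptor of one IOVA-contiguous memory segment. */
struct seg_map {
	iova_t iova_base;
	uintptr_t va_base;
	size_t len;
};

/* IOVA as VA: the IOVA handed back by the HW mempool already is the
 * virtual address, so the conversion is just a cast. */
static inline void *iova_to_va_identity(iova_t iova)
{
	return (void *)(uintptr_t)iova;
}

/* IOVA as PA: the buffer address must be searched for in the segment
 * list, and this happens for every received packet. */
static void *iova_to_va_lookup(const struct seg_map *segs, size_t nb_segs,
			       iova_t iova)
{
	for (size_t i = 0; i < nb_segs; i++) {
		if (iova >= segs[i].iova_base &&
		    iova - segs[i].iova_base < segs[i].len)
			return (void *)(segs[i].va_base +
					(uintptr_t)(iova - segs[i].iova_base));
	}
	return NULL; /* IOVA not backed by any known segment */
}
```

The lookup is linear here for simplicity. Even an optimized lookup still adds work to every packet on the Rx fast path, whereas the identity conversion costs nothing.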

> 
> >>
> >> Now suddenly it turns out that someone somewhere "knew" that "IOVA
> as
> >> VA" flag in PCI drivers is supposed to indicate *requirement* and not
> >> support, and it appears that this knowledge was not communicated nor
> >> documented anywhere, and is now treated as common knowledge.
> >
> > I think, the confusion here is,  I was under impression that # If
> > device supports IOVA as VA and system runs with IOMMU then the  dpdk
> > should run in IOVA as VA mode.
> > If above statement true then we don’t really need a new flag.
> 
> Exactly. And the flag used to indicate that the device *supports* IOVA as VA,
> not that it *requires* it.
> 
> >
> > Couple of points to make forward progress:
> > # If we think, there is a use case where device is IOVA as VA And
> > system runs in IOMMU mode then for some reason DPDK needs to run in
> PA
> > mode. If so, we need to create two flags RTE_PCI_DRV_IOVA_AS_VA - it
> > can run either modes
> 
> There are use cases - KNI and igb_uio come to mind. Whether IOMMU uses
> VA or PA is a different from whether IOMMU is in use - there is no law that
> states that, when using IOMMU, IOVA have to have 1:1 mapping with VA.
> IOMMU requirement does not necessarily imply IOVA as VA - it is perfectly
> legal to program IOMMU to use IOVA as PA (which we currently do when we
> e.g. use VFIO for some devices and igb_uio for others).

For KNI, we already submitted a patch to support IOVA as VA.
I don't know about igb_uio; if it runs with IOVA as PA, we might as well disable the IOMMU.
Does igb_uio enable the IOMMU? I don't see any reference:
grep -ri "iommu" kernel/linux/igb_uio/

Again, it is not a device attribute, it is a system attribute. In the current KNI case it
falls back to PA irrespective of device capability, so we don't need any
separate flag from the driver.

Even if we introduce a flag, what is it supposed to do?

> 
> > RTE_PCI_DRV_NEED_IOVA_AS_VA - it can run only on IOVA as VA
> 
> If we're adding a flag, we might as well not create a confusion and do it
> consistently. If IOVA as PA is supported, have a flag to indicate that. If IOVA
> as VA is supported, have a flag to indicate that. Absence of either flag implies

So which category does the i40e driver fall into? By default, the PCI bus should return PA for that class.
If VA is supported, then return VA.
So how will a new flag help?

> inability to work in that mode. I don't see how this is less clear and self-
> documenting than having two IOVA as VA-related flags that have slightly
> different meaning and imply things not otherwise stated explicitly.
> 
> > # With top of tree, Currently it never runs in IOVA as VA mode.
> > That’s a separate problem to fix. Which effect all the devices
> > Currently supporting RTE_PCI_DRV_IOVA_AS_VA. Ie even though Device
> > support RTE_PCI_DRV_IOVA_AS_VA, it is not running With IOMMU
> > protection and/or root privilege is required to run DPDK.

What's your view on this existing problem?


> >
> >
> >>
> >> [1] http://patchwork.dpdk.org/patch/53206/
> >> [2] http://patchwork.dpdk.org/patch/50274/
> >> [3] http://patchwork.dpdk.org/patch/50991/
> >> [4] http://patchwork.dpdk.org/patch/46134/
> >>
> >> --
> >> Thanks,
> >> Anatoly
> 
> 
> --
> Thanks,
> Anatoly
Jerin Jacob Kollanukkaran July 9, 2019, 2:19 p.m. | #13
> -----Original Message-----
> From: Burakov, Anatoly <anatoly.burakov@intel.com>
> Sent: Tuesday, July 9, 2019 7:21 PM
> To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>; David Marchand
> <david.marchand@redhat.com>
> Cc: dev <dev@dpdk.org>; Thomas Monjalon <thomas@monjalon.net>; Ben
> Walker <benjamin.walker@intel.com>
> Subject: Re: [dpdk-dev] [EXT] Re: [PATCH] bus/pci: fix IOVA as VA mode
> selection
> 
> On 09-Jul-19 2:30 PM, Burakov, Anatoly wrote:
> > On 09-Jul-19 1:11 PM, Jerin Jacob Kollanukkaran wrote:
> >>> -----Original Message-----
> >>> From: Burakov, Anatoly <anatoly.burakov@intel.com>
> >>> Sent: Tuesday, July 9, 2019 5:10 PM
> >>> To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>; David Marchand
> >>> <david.marchand@redhat.com>
> >>> Cc: dev <dev@dpdk.org>; Thomas Monjalon <thomas@monjalon.net>;
> Ben
> >>> Walker <benjamin.walker@intel.com>
> >>> Subject: Re: [EXT] Re: [dpdk-dev] [PATCH] bus/pci: fix IOVA as VA
> >>> mode selection
> >>>>>> ________________________________________
> >>>>>>
> >>>>>> On Mon, Jul 8, 2019 at 4:25 PM <mailto:jerinj@marvell.com> wrote:
> >>>>>> From: Jerin Jacob <mailto:jerinj@marvell.com>
> >>>>>>
> >>>>>> Existing logic fails to select IOVA mode as VA if driver request
> >>>>>> to enable IOVA as VA.
> >>>>>>
> >>>>>> IOVA as VA has more strict requirement than other modes, so
> >>>>>> enabling positive logic for IOVA as VA selection.
> >>>>>>
> >>>>>> This patch also updates the default IOVA mode as PA for PCI
> >>>>>> devices as it has to deal with DMA engines unlike the virtual
> >>>>>> devices that may need only IOVA as DC.
> >>>>>>
> >>>>>> We have three cases:
> >>>>>> - driver/hw supports IOVA as PA only
> >>>>>>
> >>>>>> [Jerin] It is not driver cap, it is more of system cap(IOMMU vs
> >>>>>> non IOMMU). We are already addressing that case
> >>>>>
> >>>>> I don't get how this works. How does "system capability" affect
> >>>>> what the device itself supports? Are we to assume that *all*
> >>>>> hardware support IOVA as VA by default? "System capability" is
> >>>>> more of a bus issue than an individual device issue, is it not?
> >>>>
> >>>> What I meant is, supporting VA vs PA is function of IOMMU(not the
> >>>> device
> >>> attribute).
> >>>> Ie. Device makes the  bus master request, if IOMMU available and
> >>>> enabled in the SYSTEM , It goes over IOMMU  and translate the IOVA
> >>>> to
> >>> physical address.
> >>>>
> >>>> Another way to put is, Is there any _PCIe_ device which
> >>>> need/requires RTE_PCI_DRV_NEED_IOVA_AS_PA in
> >>>> rte_pci_driver.drv_flags
> >>>>
> >>>>
> >>>
> >>> Previously, as far as i can tell, the flag was used to indicate
> >>> support for IOVA as VA mode, not *requirement* for IOVA as VA mode.
> >>> For example, there are multiple patches [1][2][3][4] (i'm sure i can
> >>> find more!) that added IOVA as VA support to various drivers, and
> >>> they all were worded it in this exact way
> >>> - "support for IOVA as VA mode", not "require IOVA as VA mode". As
> >>> far as i can tell, none of these drivers *require* IOVA as VA mode -
> >>> they merely use this flag to indicate support for it.
> >>
> >> Some class of devices NEED IOVA as VA for performance reasons.
> >> Specially the devices has HW mempool allocators. On those devices If
> >> we don’t use IOVA as VA, Upon getting packet from device, It needs to
> >> go over
> >> rte_mem_iova2virt() per
> >> packet see driver/net/dppa2. Which has real performance issue.
> >
> > I wouldn't classify this as "needing" IOVA. "Need" implies it cannot
> > work without it, whereas in this case it's more of a "highly
> > recommended" rather than "need".
> >
> >>>
> >>> Now suddenly it turns out that someone somewhere "knew" that "IOVA
> >>> as VA" flag in PCI drivers is supposed to indicate *requirement* and
> >>> not support, and it appears that this knowledge was not communicated
> >>> nor documented anywhere, and is now treated as common knowledge.
> >>
> >> I think, the confusion here is,  I was under impression that # If
> >> device supports IOVA as VA and system runs with IOMMU then the  dpdk
> >> should run in IOVA as VA mode.
> >> If above statement true then we don’t really need a new flag.
> >
> > Exactly. And the flag used to indicate that the device *supports* IOVA
> > as VA, not that it *requires* it.
> 
> ...unless the driver itself is written in such a way as to simply not support VA
> to PA lookups

Yes. 

> - in that case, the above suggested way of simply not indicating
> IOVA as PA support would fix the issue in that it will require the device to
> either work in IOVA as VA mode, or fail to initialize. Current semantics of only
> having one flag do not distinguish between "can do both PA and VA" and
> "can only do VA" - hence the suggestion of adding an additional flag
> indicating IOVA as PA support.

Currently all devices can "do both PA and VA", but system limitations
(vfio-noiommu, igb_uio or KNI) make them fall back to PA.

So the question is what we do with the new flag in pci_device_iova_mode().
In my view:

pci_device_iova_mode() can return RTE_IOVA_PA as the default for a PCI device.
If the PCIe device supports IOVA_AS_VA, pci_device_iova_mode() needs to return RTE_IOVA_VA if the "SYSTEM" supports it.

In this context, what will be the responsibility of the new flag? Or how do you want to change the behavior of pci_device_iova_mode()?
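Below is a minimal sketch of the selection rule proposed above. It takes two boolean inputs:

- whether the driver sets RTE_PCI_DRV_IOVA_AS_VA;
- whether the system can do VA, meaning a real IOMMU behind vfio-pci rather than vfio-noiommu, igb_uio or KNI.

The function name is hypothetical, and so is the local enum (it mirrors the values of DPDK's enum rte_iova_mode). This is not the actual pci_device_iova_mode() code.

```c
#include <stdbool.h>

/* Local mirror of DPDK's enum rte_iova_mode values. */
enum iova_mode { IOVA_DC = 0, IOVA_PA = 1 << 0, IOVA_VA = 1 << 1 };

/*
 * Hypothetical sketch of the proposed rule, not DPDK code:
 *  - drv_iova_as_va: driver sets RTE_PCI_DRV_IOVA_AS_VA
 *  - system_va_ok:   device is bound to vfio-pci with a real IOMMU
 *                    (not vfio-noiommu, igb_uio, or used by KNI)
 */
static enum iova_mode proposed_device_iova_mode(bool drv_iova_as_va,
						bool system_va_ok)
{
	if (drv_iova_as_va && system_va_ok)
		return IOVA_VA;
	return IOVA_PA; /* default for PCI devices */
}
```

Under this rule the single flag can only say "supports VA". No input makes PA an error, and that is what a second flag would add.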







> 
> --
> Thanks,
> Anatoly
Burakov, Anatoly July 9, 2019, 2:37 p.m. | #14
On 09-Jul-19 3:00 PM, Jerin Jacob Kollanukkaran wrote:
>> -----Original Message-----
>> From: Burakov, Anatoly <anatoly.burakov@intel.com>
>> Sent: Tuesday, July 9, 2019 7:00 PM
>> To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>; David Marchand
>> <david.marchand@redhat.com>
>> Cc: dev <dev@dpdk.org>; Thomas Monjalon <thomas@monjalon.net>; Ben
>> Walker <benjamin.walker@intel.com>
>> Subject: Re: [EXT] Re: [dpdk-dev] [PATCH] bus/pci: fix IOVA as VA mode
>> selection
>>
>> On 09-Jul-19 1:11 PM, Jerin Jacob Kollanukkaran wrote:
>>>> -----Original Message-----
>>>> From: Burakov, Anatoly <anatoly.burakov@intel.com>
>>>> Sent: Tuesday, July 9, 2019 5:10 PM
>>>> To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>; David Marchand
>>>> <david.marchand@redhat.com>
>>>> Cc: dev <dev@dpdk.org>; Thomas Monjalon <thomas@monjalon.net>;
>> Ben
>>>> Walker <benjamin.walker@intel.com>
>>>> Subject: Re: [EXT] Re: [dpdk-dev] [PATCH] bus/pci: fix IOVA as VA
>>>> mode selection
>>>>>>> ________________________________________
>>>>>>>
>>>>>>> On Mon, Jul 8, 2019 at 4:25 PM <mailto:jerinj@marvell.com> wrote:
>>>>>>> From: Jerin Jacob <mailto:jerinj@marvell.com>
>>>>>>>
>>>>>>> Existing logic fails to select IOVA mode as VA if driver request
>>>>>>> to enable IOVA as VA.
>>>>>>>
>>>>>>> IOVA as VA has more strict requirement than other modes, so
>>>>>>> enabling positive logic for IOVA as VA selection.
>>>>>>>
>>>>>>> This patch also updates the default IOVA mode as PA for PCI
>>>>>>> devices as it has to deal with DMA engines unlike the virtual
>>>>>>> devices that may need only IOVA as DC.
>>>>>>>
>>>>>>> We have three cases:
>>>>>>> - driver/hw supports IOVA as PA only
>>>>>>>
>>>>>>> [Jerin] It is not driver cap, it is more of system cap(IOMMU vs
>>>>>>> non IOMMU). We are already addressing that case
>>>>>>
>>>>>> I don't get how this works. How does "system capability" affect
>>>>>> what the device itself supports? Are we to assume that *all*
>>>>>> hardware support IOVA as VA by default? "System capability" is more
>>>>>> of a bus issue than an individual device issue, is it not?
>>>>>
>>>>> What I meant is, supporting VA vs PA is function of IOMMU(not the
>>>>> device
>>>> attribute).
>>>>> Ie. Device makes the  bus master request, if IOMMU available and
>>>>> enabled in the SYSTEM , It goes over IOMMU  and translate the IOVA
>>>>> to
>>>> physical address.
>>>>>
>>>>> Another way to put is, Is there any _PCIe_ device which
>>>>> need/requires RTE_PCI_DRV_NEED_IOVA_AS_PA in
>>>>> rte_pci_driver.drv_flags
>>>>>
>>>>>
>>>>
>>>> Previously, as far as i can tell, the flag was used to indicate
>>>> support for IOVA as VA mode, not *requirement* for IOVA as VA mode.
>>>> For example, there are multiple patches [1][2][3][4] (i'm sure i can
>>>> find more!) that added IOVA as VA support to various drivers, and
>>>> they all were worded it in this exact way
>>>> - "support for IOVA as VA mode", not "require IOVA as VA mode". As
>>>> far as i can tell, none of these drivers *require* IOVA as VA mode -
>>>> they merely use this flag to indicate support for it.
>>>
>>> Some class of devices NEED IOVA as VA for performance reasons.
>>> Specially the devices has HW mempool allocators. On those devices If
>>> we don’t use IOVA as VA, Upon getting packet from device, It needs to
>>> go over rte_mem_iova2virt() per packet see driver/net/dppa2. Which has
>> real performance issue.
>>
>> I wouldn't classify this as "needing" IOVA. "Need" implies it cannot work
>> without it, whereas in this case it's more of a "highly recommended" rather
>> than "need".
> 
> It is "need" as performance is horrible without it as is per packet SW translation.
> A "need" for DPDK performance perspective.

Would the driver fail to initialize if it detects that it is running in IOVA as PA mode?

> 
>>
>>>>
>>>> Now suddenly it turns out that someone somewhere "knew" that "IOVA
>> as
>>>> VA" flag in PCI drivers is supposed to indicate *requirement* and not
>>>> support, and it appears that this knowledge was not communicated nor
>>>> documented anywhere, and is now treated as common knowledge.
>>>
>>> I think, the confusion here is,  I was under impression that # If
>>> device supports IOVA as VA and system runs with IOMMU then the  dpdk
>>> should run in IOVA as VA mode.
>>> If above statement true then we don’t really need a new flag.
>>
>> Exactly. And the flag used to indicate that the device *supports* IOVA as VA,
>> not that it *requires* it.
>>
>>>
>>> Couple of points to make forward progress:
>>> # If we think, there is a use case where device is IOVA as VA And
>>> system runs in IOMMU mode then for some reason DPDK needs to run in
>> PA
>>> mode. If so, we need to create two flags RTE_PCI_DRV_IOVA_AS_VA - it
>>> can run either modes
>>
>> There are use cases - KNI and igb_uio come to mind. Whether IOMMU uses
>> VA or PA is a different from whether IOMMU is in use - there is no law that
>> states that, when using IOMMU, IOVA have to have 1:1 mapping with VA.
>> IOMMU requirement does not necessarily imply IOVA as VA - it is perfectly
>> legal to program IOMMU to use IOVA as PA (which we currently do when we
>> e.g. use VFIO for some devices and igb_uio for others).
> 
> For KNI, we already submitted a patch to support IOVA as VA.

Yep, the point being that it *didn't work* before, hence we may want to
account for possible future use cases like this (however hacky they
admittedly are). There are valid use cases for enforcing IOVA as VA only
(such as for performance reasons, even though it would be technically
possible to use IOVA as PA), and there could be valid use cases for
enforcing IOVA as PA only (for example, I seem to remember that the
crypto/qat driver at one point didn't support the VFIO driver, effectively
leaving it without IOVA as VA support).

> I don’t know about igb_uio, if IOVA as PA, we might as well disable IOMMU.
> Is igb_uio enables IOMMU? I don’t see any reference.
> grep -ri "iommu" kernel/linux/igb_uio/

igb_uio can work with the IOMMU in pass-through mode. When Linux is booted
in pass-through mode, the IOMMU is enabled and igb_uio will work; VFIO can
use both IOVA as PA and IOVA as VA, while igb_uio can only use IOVA as
PA. So yes, igb_uio does enable the IOMMU in a very limited way, but only
to set up a 1:1 mapping of IOVA to PA.

Some other use cases will also require IOVA as PA despite full IOMMU
support. An example of this would be systems with limited IOMMU address
width (such as VMs): even though the IOMMU is technically supported, we
may not have the necessary address width to run all devices in IOVA as
VA mode, and would need to fall back to IOVA as PA. Since we cannot
*require* IOVA as VA in the current codebase, any driver that expects
IOVA as VA to always be enabled will presumably not work.
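A small, hypothetical check illustrates the limited-width case. IOVA as VA is only viable if the highest virtual address in use fits within the IOMMU's address width. DPDK has a related check, rte_mem_check_dma_mask(), but the helpers below are a standalone simplification. The example numbers after the code are assumptions, not measurements.

```c
#include <stdbool.h>
#include <stdint.h>

enum iova_mode { IOVA_PA = 1 << 0, IOVA_VA = 1 << 1 };

/* True if every address up to and including max_addr is representable
 * within an IOMMU address width of width_bits. */
static bool fits_iommu_width(uint64_t max_addr, unsigned int width_bits)
{
	if (width_bits >= 64)
		return true;
	return max_addr < (UINT64_C(1) << width_bits);
}

/* Hypothetical fallback rule: use IOVA as VA only when the highest
 * virtual address in use fits; otherwise fall back to IOVA as PA
 * (which in turn needs the highest PA to fit; not checked here). */
static enum iova_mode mode_for_width(uint64_t max_va, unsigned int width_bits)
{
	return fits_iommu_width(max_va, width_bits) ? IOVA_VA : IOVA_PA;
}
```

For example, x86-64 user-space mappings commonly sit near the top of the 47-bit range (around 0x7f0000000000). A virtual IOMMU exposing only 39 address bits could not map such addresses as-is. The physical addresses of a guest with a few GiB of RAM would still fit.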

> 
> Again, it is not device attribute, it is system attribute.

If it's a system attribute, why is it a device driver flag then? The 
system may or may not support IOMMU, the device itself probably doesn't 
care since bus address looks the same in both cases, *but the driver 
might* (such as would be in your case - requiring IOVA as VA and 
disallowing IOVA as PA for performance reasons).

> In current KNI case it
> Fall backs to PA irrespective of device capability so we don’t need any
> separate flag from driver.
> 
> Even if we introduce a flag, what it is supposed to do?

The same thing you are suggesting for one of your HW mempool drivers: a 
"need" to only run in IOVA as VA mode.

The logic in this case (as far as the driver is concerned, disregarding 
the kernel driver issue for now) would be:

has_pa = (drv->flags & SUPPORTS_PA) != 0;
has_va = (drv->flags & SUPPORTS_VA) != 0;
if (has_va && has_pa)
     return RTE_IOVA_DC; // don't care, supports both
if (has_va)
     return RTE_IOVA_VA; // only supports VA - your case
return RTE_IOVA_PA; // only supports PA
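The pseudocode above can be made compilable as follows, with these assumptions:

- The two flag names are hypothetical; no such flags exist in DPDK at this point.
- The local enum mirrors the values of enum rte_iova_mode.
- IOVA_INVALID is an added value for a driver that advertises neither mode. The pseudocode silently maps that case to PA.

```c
#include <stdint.h>

/* Hypothetical capability flags; not existing DPDK flags. */
#define DRV_SUPPORTS_IOVA_AS_PA (UINT32_C(1) << 0)
#define DRV_SUPPORTS_IOVA_AS_VA (UINT32_C(1) << 1)

/* Mirror of enum rte_iova_mode, plus a hypothetical IOVA_INVALID. */
enum iova_mode {
	IOVA_INVALID = -1, /* driver advertises neither mode */
	IOVA_DC = 0,       /* don't care: supports both */
	IOVA_PA = 1 << 0,
	IOVA_VA = 1 << 1,
};

static enum iova_mode driver_iova_mode(uint32_t drv_flags)
{
	int has_pa = (drv_flags & DRV_SUPPORTS_IOVA_AS_PA) != 0;
	int has_va = (drv_flags & DRV_SUPPORTS_IOVA_AS_VA) != 0;

	if (has_va && has_pa)
		return IOVA_DC;      /* supports both: let upper layers decide */
	if (has_va)
		return IOVA_VA;      /* VA only, e.g. HW mempool drivers */
	if (has_pa)
		return IOVA_PA;      /* PA only */
	return IOVA_INVALID;         /* usable in neither mode */
}
```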

Currently (again, disregarding your interpretation of how IOVA as VA 
works and looking at the actual commit history), we always seem to imply 
that IOVA as PA works for all devices, and we use IOVA_AS_VA flag to 
indicate that the device *also* supports IOVA as VA mode.

But we don't have any way to express a *requirement* for IOVA as VA mode 
- only for IOVA as PA mode. That is the purpose of the new flag. You are 
stating that the IOVA_AS_VA drv flag is an expression of that 
requirement, but that is not reflected in the codebase - our commit 
history indicates that we don't treat IOVA as VA as a hard requirement 
whenever this flag is specified (and I would argue that we shouldn't).

> 
>>
>>> RTE_PCI_DRV_NEED_IOVA_AS_VA - it can run only on IOVA as VA
>>
>> If we're adding a flag, we might as well not create a confusion and do it
>> consistently. If IOVA as PA is supported, have a flag to indicate that. If IOVA
>> as VA is supported, have a flag to indicate that. Absence of either flag implies
> 
> So in what category i40e driver comes? By default, pci bus should return PA for class.
> If VA supported then return VA.
> So how new flag will help?

We seem to be in agreement that we need *two* flags to express all three
of the above cases. The question is, which flags. You suggest having
"supports IOVA as VA" and "requires IOVA as VA" as the two options, while
I am suggesting "supports IOVA as PA" and "supports IOVA as VA" as the
flags. This requires modifying all existing drivers and is perhaps
undesirable from that point of view (this isn't my decision), but it's
less confusing than having two IOVA-as-VA flags that differ slightly in
their meaning (supports VA vs. requires VA).

Going back to your i40e example, AFAIK i40e supports both IOVA as VA and 
IOVA as PA - so in this case it should return RTE_IOVA_DC (i.e. use 
whatever's available). If other devices also don't care, then push the 
decision to the upper layers and not decide anything at the bus level [1].

[1] http://patchwork.dpdk.org/patch/54801/

> 
>> inability to work in that mode. I don't see how this is less clear and self-
>> documenting than having two IOVA as VA-related flags that have slightly
>> different meaning and imply things not otherwise stated explicitly.
>>
>>> # With top of tree, Currently it never runs in IOVA as VA mode.
>>> That’s a separate problem to fix. Which effect all the devices
>>> Currently supporting RTE_PCI_DRV_IOVA_AS_VA. Ie even though Device
>>> support RTE_PCI_DRV_IOVA_AS_VA, it is not running With IOMMU
>>> protection and/or root privilege is required to run DPDK.
> 
> What's your view on this existing problem?

My view would be to always run in IOVA as VA by default and only fall
back to IOVA as PA if there is a need to do that. Yet, it seems that
whenever I try to bring this up, the response (not necessarily from you,
so this is not directed at you specifically) seems to be that because of
hotplug, we have to start in the "safest" (from a device support point of
view) mode, that is, IOVA as PA. Seeing how, as you claim, some
devices require IOVA as VA, IOVA as PA is no longer the "safe"
default that all devices will support. Perhaps we can use this
opportunity to finally make IOVA as VA the default :)
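Here is a sketch of how per-device answers might be combined at the bus level under this proposal:

- Hard requirements win.
- Conflicting requirements are an error.
- When every device is "don't care", the caller-supplied default is used: VA under the proposal above, or PA as today.

The function and its return convention are hypothetical. The sketch also ignores hotplug, which is exactly where the choice of default matters.

```c
#include <stddef.h>

enum iova_mode { IOVA_DC = 0, IOVA_PA = 1 << 0, IOVA_VA = 1 << 1 };

/*
 * Hypothetical bus-level aggregation of per-device modes.
 * Returns the selected mode, or -1 if one device needs PA while
 * another needs VA. dflt is used when nobody has a hard requirement.
 */
static int bus_iova_mode(const enum iova_mode *dev_modes, size_t n,
			 enum iova_mode dflt)
{
	int need_pa = 0, need_va = 0;

	for (size_t i = 0; i < n; i++) {
		if (dev_modes[i] == IOVA_PA)
			need_pa = 1;
		else if (dev_modes[i] == IOVA_VA)
			need_va = 1;
	}
	if (need_pa && need_va)
		return -1;            /* conflicting hard requirements */
	if (need_va)
		return IOVA_VA;
	if (need_pa)
		return IOVA_PA;
	return dflt;                  /* all "don't care" (or no devices) */
}
```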

> 
> 
>>>
>>>
>>>>
>>>> [1] http://patchwork.dpdk.org/patch/53206/
>>>> [2] http://patchwork.dpdk.org/patch/50274/
>>>> [3] http://patchwork.dpdk.org/patch/50991/
>>>> [4] http://patchwork.dpdk.org/patch/46134/
>>>>
>>>> --
>>>> Thanks,
>>>> Anatoly
>>
>>
>> --
>> Thanks,
>> Anatoly
Burakov, Anatoly July 9, 2019, 2:54 p.m. | #15
On 09-Jul-19 3:00 PM, Jerin Jacob Kollanukkaran wrote:
>> -----Original Message-----
>> From: Burakov, Anatoly <anatoly.burakov@intel.com>
>> Sent: Tuesday, July 9, 2019 7:00 PM
>> To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>; David Marchand
>> <david.marchand@redhat.com>
>> Cc: dev <dev@dpdk.org>; Thomas Monjalon <thomas@monjalon.net>; Ben
>> Walker <benjamin.walker@intel.com>
>> Subject: Re: [EXT] Re: [dpdk-dev] [PATCH] bus/pci: fix IOVA as VA mode
>> selection
>>

<snip>

>>
>>> # With top of tree, Currently it never runs in IOVA as VA mode.
>>> That’s a separate problem to fix. Which effect all the devices
>>> Currently supporting RTE_PCI_DRV_IOVA_AS_VA. Ie even though Device
>>> support RTE_PCI_DRV_IOVA_AS_VA, it is not running With IOMMU
>>> protection and/or root privilege is required to run DPDK.
> 

By the way, there seems to be some confusion here. IOVA as PA mode does 
*not* imply running without IOMMU protection. If IOVA as PA mode is 
used, it would require root privileges (to get physical addresses), but 
the IOMMU protection is still enabled. IOMMU doesn't care what you set 
up your addresses as, and the fact that they're 1:1 PA addresses doesn't 
mean IOMMU is not engaged.
Jerin Jacob Kollanukkaran July 9, 2019, 2:58 p.m. | #16
> -----Original Message-----
> From: Burakov, Anatoly <anatoly.burakov@intel.com>
> Sent: Tuesday, July 9, 2019 8:24 PM
> To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>; David Marchand
> <david.marchand@redhat.com>
> Cc: dev <dev@dpdk.org>; Thomas Monjalon <thomas@monjalon.net>; Ben
> Walker <benjamin.walker@intel.com>
> Subject: Re: [EXT] Re: [dpdk-dev] [PATCH] bus/pci: fix IOVA as VA mode
> selection
> 
> On 09-Jul-19 3:00 PM, Jerin Jacob Kollanukkaran wrote:
> >> -----Original Message-----
> >> From: Burakov, Anatoly <anatoly.burakov@intel.com>
> >> Sent: Tuesday, July 9, 2019 7:00 PM
> >> To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>; David Marchand
> >> <david.marchand@redhat.com>
> >> Cc: dev <dev@dpdk.org>; Thomas Monjalon <thomas@monjalon.net>;
> Ben
> >> Walker <benjamin.walker@intel.com>
> >> Subject: Re: [EXT] Re: [dpdk-dev] [PATCH] bus/pci: fix IOVA as VA
> >> mode selection
> >>
> 
> <snip>
> 
> >>
> >>> # With top of tree, Currently it never runs in IOVA as VA mode.
> >>> That’s a separate problem to fix. Which effect all the devices
> >>> Currently supporting RTE_PCI_DRV_IOVA_AS_VA. Ie even though
> Device
> >>> support RTE_PCI_DRV_IOVA_AS_VA, it is not running With IOMMU
> >>> protection and/or root privilege is required to run DPDK.
> >
> 
> By the way, there seems to be some confusion here. IOVA as PA mode does
> *not* imply running without IOMMU protection. If IOVA as PA mode is used,
> it would require root privileges (to get physical addresses), but the IOMMU
> protection is still enabled. IOMMU doesn't care what you set up your

Yes. I was thinking more from the VFIO perspective, not igb_uio.


> addresses as, and the fact that they're 1:1 PA addresses doesn't mean
> IOMMU is not engaged.



> 
> --
> Thanks,
> Anatoly
Burakov, Anatoly July 9, 2019, 3:02 p.m. | #17
On 09-Jul-19 3:58 PM, Jerin Jacob Kollanukkaran wrote:
>> -----Original Message-----
>> From: Burakov, Anatoly <anatoly.burakov@intel.com>
>> Sent: Tuesday, July 9, 2019 8:24 PM
>> To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>; David Marchand
>> <david.marchand@redhat.com>
>> Cc: dev <dev@dpdk.org>; Thomas Monjalon <thomas@monjalon.net>; Ben
>> Walker <benjamin.walker@intel.com>
>> Subject: Re: [EXT] Re: [dpdk-dev] [PATCH] bus/pci: fix IOVA as VA mode
>> selection
>>
>> On 09-Jul-19 3:00 PM, Jerin Jacob Kollanukkaran wrote:
>>>> -----Original Message-----
>>>> From: Burakov, Anatoly <anatoly.burakov@intel.com>
>>>> Sent: Tuesday, July 9, 2019 7:00 PM
>>>> To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>; David Marchand
>>>> <david.marchand@redhat.com>
>>>> Cc: dev <dev@dpdk.org>; Thomas Monjalon <thomas@monjalon.net>;
>> Ben
>>>> Walker <benjamin.walker@intel.com>
>>>> Subject: Re: [EXT] Re: [dpdk-dev] [PATCH] bus/pci: fix IOVA as VA
>>>> mode selection
>>>>
>>
>> <snip>
>>
>>>>
>>>>> # With top of tree, Currently it never runs in IOVA as VA mode.
>>>>> That’s a separate problem to fix. Which effect all the devices
>>>>> Currently supporting RTE_PCI_DRV_IOVA_AS_VA. Ie even though
>> Device
>>>>> support RTE_PCI_DRV_IOVA_AS_VA, it is not running With IOMMU
>>>>> protection and/or root privilege is required to run DPDK.
>>>
>>
>> By the way, there seems to be some confusion here. IOVA as PA mode does
>> *not* imply running without IOMMU protection. If IOVA as PA mode is used,
>> it would require root privileges (to get physical addresses), but the IOMMU
>> protection is still enabled. IOMMU doesn't care what you set up your
> 
> Yes. It was thinking more  of VFIO perspective. Not igb_uio.
> 

It is the same for both.

When IOMMU is fully enabled (iommu=on at boot time), igb_uio will simply 
not work. VFIO will work, whichever address mode you use.

When IOMMU is in pass-through mode (iommu=pt at boot time), both igb_uio 
and VFIO will work, although igb_uio will only support IOVA as PA mode. 
Both modes will enable IOMMU, and both can run in IOVA as PA mode 
without losing that protection.

Only when the IOMMU is off will igb_uio not engage the IOMMU, and VFIO
will only work in no-IOMMU mode (thus not engaging the IOMMU either); only
then do you lack IOMMU protection.
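The three cases above can be summarized as a small lookup. It encodes the behavior as described in this message. The enum and function names are hypothetical, and the boot-time settings "iommu=on" and "iommu=pt" are taken as written above.

One inference is added: with the IOMMU off, VFIO's no-IOMMU mode can only use IOVA as PA, since devices then see physical addresses directly.

```c
#include <stdbool.h>

/* Hypothetical enums for the boot-time IOMMU setting and kernel driver. */
enum iommu_boot { IOMMU_OFF, IOMMU_PT, IOMMU_ON };
enum kdrv { KDRV_IGB_UIO, KDRV_VFIO };

/* Does the kernel driver work at all in this IOMMU mode? */
static bool kdrv_works(enum kdrv k, enum iommu_boot b)
{
	if (k == KDRV_IGB_UIO)
		return b != IOMMU_ON;  /* igb_uio fails with the IOMMU fully on */
	return true;                   /* VFIO: always (no-IOMMU mode if off) */
}

/* Can IOVA as VA be used? igb_uio is PA only; VFIO needs a real IOMMU. */
static bool kdrv_can_use_va(enum kdrv k, enum iommu_boot b)
{
	return k == KDRV_VFIO && b != IOMMU_OFF;
}

/* Is the device behind IOMMU protection? Per the description above, this
 * holds whenever the driver works and the IOMMU is not off, regardless of
 * whether IOVA as PA or IOVA as VA is used. */
static bool iommu_protected(enum kdrv k, enum iommu_boot b)
{
	return kdrv_works(k, b) && b != IOMMU_OFF;
}
```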
Thomas Monjalon July 9, 2019, 3:04 p.m. | #18
09/07/2019 16:37, Burakov, Anatoly:
> My view would be to always run in IOVA as VA by default and only falling 
> back to IOVA as PA if there is a need to do that. Yet, it seems that 
> whenever i try to bring this up, the response (not necessarily from you, 
> so this is not directed at you specifically) seems to be that because of 
> hotplug, we have to start in the "safest" (from device support point of 
> view) mode - that is, in IOVA as PA. Seeing how, as you claim, some 
> devices require IOVA as VA, then IOVA as PA is no longer the "safe" 
> default that all devices will support. Perhaps we can use this 
> opportunity to finally make IOVA as VA the default :)

That's a good point Anatoly. We need to decide what is the safest default.

About the capabilities flags, please let's agree that we want
to express 3 cases, so we need 2 flags.
About the preference of a mode for a device: if a mode is really
bad for a device, I suggest not advertising it in the capabilities.

In order to make a better decision, we need a summary of the
decision algorithm per layer, involving kernel driver capabilities
and memory capabilities.
Burakov, Anatoly July 9, 2019, 3:06 p.m. | #19
On 09-Jul-19 4:04 PM, Thomas Monjalon wrote:
> 09/07/2019 16:37, Burakov, Anatoly:
>> My view would be to always run in IOVA as VA by default and only falling
>> back to IOVA as PA if there is a need to do that. Yet, it seems that
>> whenever i try to bring this up, the response (not necessarily from you,
>> so this is not directed at you specifically) seems to be that because of
>> hotplug, we have to start in the "safest" (from device support point of
>> view) mode - that is, in IOVA as PA. Seeing how, as you claim, some
>> devices require IOVA as VA, then IOVA as PA is no longer the "safe"
>> default that all devices will support. Perhaps we can use this
>> opportunity to finally make IOVA as VA the default :)
> 
> That's a good point Anatoly. We need to decide what is the safest default.
> 
> About the capabilities flags, please let's agree that we want
> to express 3 cases, so we need 2 flags.

We do agree on this.

> About the preference of a mode for a device, if a mode is really
> bad for a device, I suggest to not advertise it in capabilities.

Yes, agree with that as well.

> 
> In order to take a better decision, we need a summary of the
> decision algorithm per layer, involving kernel driver capabilities
> and memory capabilities.
> 

This needs to be documented very well, as this seems to be a source of 
great confusion for us all. If all was spelled out in the docs, we 
wouldn't need this long discussion to figure out that we actually agree :D
Thomas Monjalon July 9, 2019, 3:12 p.m. | #20
09/07/2019 17:02, Burakov, Anatoly:
> When IOMMU is fully enabled (iommu=on at boot time), igb_uio will simply 
> not work. VFIO will work, whichever address mode you use.
> 
> When IOMMU is in pass-through mode (iommu=pt at boot time), both igb_uio 
> and VFIO will work, although igb_uio will only support IOVA as PA mode. 
> Both modes will enable IOMMU, and both can run in IOVA as PA mode 
> without losing that protection.
> 
> It's only when IOMMU is off, igb_uio will not engage IOMMU, and VFIO 
> will only work in no-IOMMU mode (thus not engaging IOMMU either), and 
> only then you lack the IOMMU protection.

Could we try to make the IOMMU status clear in the DPDK logs?
Then we could check which kernel drivers are loaded and log
a compatibility status for each of them at debug level.
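The compatibility status Thomas asks for can be sketched as a small lookup. It encodes Anatoly's description above: the boot-time IOMMU state crossed with the kernel driver gives the usable IOVA modes. The enum names and bit values are hypothetical, and only igb_uio and VFIO are modelled. The "VFIO no-IOMMU forces PA" case follows the "Forcing to 'PA', vfio-noiommu mode configured" log in the patch below. This is not DPDK code, only a sketch of what such a debug table could contain.

```c
#include <assert.h>

/* Boot-time IOMMU state: iommu=on, iommu=pt, or IOMMU disabled. */
enum iommu_state { IOMMU_ON, IOMMU_PT, IOMMU_OFF };
/* Only the two kernel drivers discussed in the thread. */
enum kdrv { KDRV_IGB_UIO, KDRV_VFIO };

/* Bitmask of usable IOVA modes; values are local to this sketch. */
#define MODE_PA 0x1u
#define MODE_VA 0x2u

/* Returns the usable IOVA modes, or 0 if the driver does not work at all. */
unsigned int supported_iova_modes(enum iommu_state iommu, enum kdrv kdrv)
{
	switch (iommu) {
	case IOMMU_ON:
		/* igb_uio does not work; VFIO works in either mode. */
		return kdrv == KDRV_VFIO ? (MODE_PA | MODE_VA) : 0;
	case IOMMU_PT:
		/* Both work, but igb_uio only supports IOVA as PA. */
		return kdrv == KDRV_VFIO ? (MODE_PA | MODE_VA) : MODE_PA;
	case IOMMU_OFF:
		/* igb_uio does not engage the IOMMU, and VFIO only runs in
		 * no-IOMMU mode, where PA is forced: no IOMMU protection. */
		return MODE_PA;
	}
	assert(0 && "unknown IOMMU state");
	return 0;
}
```

The hard part, as the next reply points out, is obtaining `iommu` in the first place. The table itself is trivial.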
Burakov, Anatoly July 9, 2019, 3:18 p.m. | #21
On 09-Jul-19 4:12 PM, Thomas Monjalon wrote:
> 09/07/2019 17:02, Burakov, Anatoly:
>> When IOMMU is fully enabled (iommu=on at boot time), igb_uio will simply
>> not work. VFIO will work, whichever address mode you use.
>>
>> When IOMMU is in pass-through mode (iommu=pt at boot time), both igb_uio
>> and VFIO will work, although igb_uio will only support IOVA as PA mode.
>> Both modes will enable IOMMU, and both can run in IOVA as PA mode
>> without losing that protection.
>>
>> It's only when IOMMU is off, igb_uio will not engage IOMMU, and VFIO
>> will only work in no-IOMMU mode (thus not engaging IOMMU either), and
>> only then you lack the IOMMU protection.
> 
> Could we try to make the IOMMU status clear in the DPDK logs?
> Then we could check which kernel drivers are loaded and log
> a compatibility status for each of them at debug level.
> 

I don't think there is a way to know the IOMMU status from DPDK. It's a 
property of the system. We can kinda-sorta check the IOMMU status if we 
have the VFIO driver (there's an API to check for vfio_noiommu, I think), 
and we do log it into the debug output, but there is no such facility for 
igb_uio: we cannot know whether it engages the IOMMU or not (not unless 
we grep dmesg or something).
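The VFIO-side check mentioned here can be done by reading the vfio module parameter `/sys/module/vfio/parameters/enable_unsafe_noiommu_mode`. That file only exists when the vfio module is loaded with no-IOMMU support built in. DPDK exposes its own `rte_vfio_noiommu_is_enabled()`. The function below is an independent sketch of the same idea, and its name is hypothetical. It takes the path as a parameter so it can be exercised without a real VFIO setup. As noted above, nothing equivalent exists for igb_uio.

```c
#include <assert.h>
#include <stdio.h>

/* Linux vfio module parameter; a bool param reads as "Y" or "N". */
#define VFIO_NOIOMMU_PARAM \
	"/sys/module/vfio/parameters/enable_unsafe_noiommu_mode"

/* Returns 1 if the parameter reads 'Y', 0 for any other value, and -1 if
 * the file cannot be read (e.g. vfio not loaded) or is empty. */
int noiommu_enabled_at(const char *path)
{
	FILE *f;
	int c;

	assert(path != NULL);
	f = fopen(path, "r");
	if (f == NULL)
		return -1;
	c = fgetc(f);
	fclose(f);
	if (c == EOF)
		return -1;
	return c == 'Y';
}
```

A caller would use `noiommu_enabled_at(VFIO_NOIOMMU_PARAM)` and log the result at debug level. The -1 result keeps "vfio not loaded" distinct from "no-IOMMU mode off".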
Jerin Jacob Kollanukkaran July 9, 2019, 5:50 p.m. | #22
> -----Original Message-----
> From: Burakov, Anatoly <anatoly.burakov@intel.com>
> Sent: Tuesday, July 9, 2019 8:07 PM
> To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>; David Marchand
> <david.marchand@redhat.com>
> Cc: dev <dev@dpdk.org>; Thomas Monjalon <thomas@monjalon.net>; Ben
> Walker <benjamin.walker@intel.com>
> Subject: Re: [EXT] Re: [dpdk-dev] [PATCH] bus/pci: fix IOVA as VA mode
> selection
> >>
> >> I wouldn't classify this as "needing" IOVA. "Need" implies it cannot
> >> work without it, whereas in this case it's more of a "highly
> >> recommended" rather than "need".
> >
> > It is a "need", as performance is horrible without it due to per-packet
> > SW translation.
> > A "need" from a DPDK performance perspective.
> 
> Would the driver fail to initialize if it detects it is running in IOVA as PA mode?

Yes.
https://git.dpdk.org/dpdk/tree/drivers/net/octeontx2/otx2_ethdev.c#n1191

> Also, some other use cases will also require IOVA as PA while having full
> IOMMU support. An example of this would be systems with limited IOMMU
> width (such as VMs) - even though the IOMMU is technically supported, we
> may not have the necessary address width to run all devices in IOVA as VA
> mode, and would need to fall back to IOVA as PA.
> Since we cannot *require* IOVA as VA in current codebase, any driver that
> expects IOVA as VA to always be enabled will presumably not work.
> 
> >
> > Again, it is not a device attribute, it is a system attribute.
> 
> If it's a system attribute, why is it a device driver flag then? The system may
> or may not support IOMMU, the device itself probably doesn't care since bus
> address looks the same in both cases, *but the driver
> might* (such as would be in your case - requiring IOVA as VA and disallowing
> IOVA as PA for performance reasons).

Agree.
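The limited-IOMMU-width case quoted above can be sketched as follows. Prefer VA, but fall back to PA when the highest virtual address the application uses cannot be translated by an IOMMU of the given input address width. Both function names are hypothetical. How the width and the highest address would be obtained is deliberately left out, and this is not how DPDK implements the check.

```c
#include <assert.h>
#include <stdint.h>

enum sketch_iova_mode { SKETCH_IOVA_PA = 1, SKETCH_IOVA_VA = 2 };

/* Nonzero if every address in [0, highest_va] fits an IOMMU whose input
 * address width is width_bits bits. */
int va_fits_iommu_width(uint64_t highest_va, unsigned int width_bits)
{
	assert(width_bits > 0);
	if (width_bits >= 64)
		return 1;
	return highest_va <= (UINT64_C(1) << width_bits) - 1;
}

/* Prefer IOVA as VA, but fall back to PA when the IOMMU is too narrow. */
enum sketch_iova_mode
pick_mode_for_width(uint64_t highest_va, unsigned int width_bits)
{
	return va_fits_iommu_width(highest_va, width_bits) ?
		SKETCH_IOVA_VA : SKETCH_IOVA_PA;
}
```

For example, with a 39-bit IOMMU, user-space mappings near the top of a 47-bit x86-64 address space do not fit, so this sketch returns PA. A VA-only driver would then be unusable on such a system.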

> 
> Currently (again, disregarding your interpretation of how IOVA as VA works
> and looking at the actual commit history), we always seem to imply that IOVA
> as PA works for all devices, and we use IOVA_AS_VA flag to indicate that the
> device *also* supports IOVA as VA mode.
> 
> But we don't have any way to express a *requirement* for IOVA as VA mode
> - only for IOVA as PA mode. That is the purpose of the new flag. You are
> stating that the IOVA_AS_VA drv flag is an expression of that requirement,
> but that is not reflected in the codebase - our commit history indicates that
> we don't treat IOVA as VA as a hard requirement whenever this flag is
> specified (and I would argue that we shouldn't).

No objection to classifying it further.

How about the following:

1) Rename RTE_PCI_DRV_IOVA_AS_VA to RTE_PCI_DRV_IOVA_AS_DC.
It has the same meaning as the existing RTE_PCI_DRV_IOVA_AS_VA: the driver doesn't care whether IOVA is PA or VA.
2) Introduce RTE_PCI_DRV_NEED_IOVA_AS_VA (the driver needs IOVA as VA).
This would be selected for the octeontx device drivers.
3) Change the existing drivers' drv_flags to RTE_PCI_DRV_IOVA_AS_DC if they can work
with both PA and VA (literally all existing drivers that currently have RTE_PCI_DRV_IOVA_AS_VA, excluding the octeontx drivers).

In pci_device_iova_mode():

if (pdrv->drv_flags & RTE_PCI_DRV_IOVA_AS_DC)
	iova_mode = RTE_IOVA_DC;
else if (pdrv->drv_flags & RTE_PCI_DRV_NEED_IOVA_AS_VA)
	iova_mode = RTE_IOVA_VA;
else
	iova_mode = RTE_IOVA_PA;

I can submit the patch if the above is OK.
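Jerin's proposed selection is restated below as a self-contained C function. RTE_PCI_DRV_IOVA_AS_DC and RTE_PCI_DRV_NEED_IOVA_AS_VA are only proposals at this point, so the names and bit values used here are hypothetical stand-ins, and the enum merely mirrors the shape of `enum rte_iova_mode`. The order of the tests follows the proposal: a driver that sets both bits ends up as DC.

```c
#include <assert.h>

/* Hypothetical stand-ins for the proposed driver flags. */
#define SKETCH_DRV_IOVA_AS_DC      0x1u
#define SKETCH_DRV_NEED_IOVA_AS_VA 0x2u

/* Mirrors the shape of enum rte_iova_mode for this sketch. */
enum sketch_iova_mode {
	SKETCH_IOVA_DC = 0,
	SKETCH_IOVA_PA = 1,
	SKETCH_IOVA_VA = 2
};

enum sketch_iova_mode proposed_iova_mode(unsigned int drv_flags)
{
	if (drv_flags & SKETCH_DRV_IOVA_AS_DC)
		return SKETCH_IOVA_DC;   /* driver works with PA or VA */
	if (drv_flags & SKETCH_DRV_NEED_IOVA_AS_VA)
		return SKETCH_IOVA_VA;   /* e.g. the octeontx drivers */
	return SKETCH_IOVA_PA;       /* no flag: PA only */
}
```

In this scheme the absence of any flag keeps today's meaning of "PA only". That is why only the octeontx drivers would need the new flag.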

> >>
> >>> # With top of tree, it currently never runs in IOVA as VA mode.
> >>> That's a separate problem to fix, which affects all the devices
> >>> currently supporting RTE_PCI_DRV_IOVA_AS_VA, i.e. even though a device
> >>> supports RTE_PCI_DRV_IOVA_AS_VA, it is not running with IOMMU
> >>> protection and/or root privilege is required to run DPDK.
> >
> > What's your view on this existing problem?
> 
> My view would be to always run in IOVA as VA by default and only fall
> back to IOVA as PA if there is a need to do that. Yet, it seems that whenever I
> try to bring this up, the response (not necessarily from you, so this is not
> directed at you specifically) seems to be that because of hotplug, we have to
> start in the "safest" (from device support point of
> view) mode - that is, in IOVA as PA. Seeing how, as you claim, some devices
> require IOVA as VA, then IOVA as PA is no longer the "safe"
> default that all devices will support. Perhaps we can use this opportunity to
> finally make IOVA as VA the default :)

I was thinking of using VA as the default if the system/device/driver supports it.
That's the reason the original patch did the VA selection in the
PCI code itself. But it makes sense to move that up.

Not related to this patch: why does hotplug prefer IOVA as PA? If I understand it correctly, to accommodate
hotplug for SPDK, pci_device_iova_mode() returns DC so that the
common code can pick PA.

I have no strong opinion on this; if it helps SPDK, then there is no issue in keeping
PA as the default in the DC case.
David Marchand July 10, 2019, 8:09 a.m. | #23
Hello guys,

On Tue, Jul 9, 2019 at 7:52 PM Jerin Jacob Kollanukkaran <jerinj@marvell.com>
wrote:

> > -----Original Message-----
> > From: Burakov, Anatoly <anatoly.burakov@intel.com>
> > Sent: Tuesday, July 9, 2019 8:07 PM
> > To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>; David Marchand
> > <david.marchand@redhat.com>
> > Cc: dev <dev@dpdk.org>; Thomas Monjalon <thomas@monjalon.net>; Ben
> > Walker <benjamin.walker@intel.com>
> > Subject: Re: [EXT] Re: [dpdk-dev] [PATCH] bus/pci: fix IOVA as VA mode
> > selection
> > >>
> > >> I wouldn't classify this as "needing" IOVA. "Need" implies it cannot
> > >> work without it, whereas in this case it's more of a "highly
> > >> recommended" rather than "need".
> > >
> > > It is a "need", as performance is horrible without it due to per-packet
> > > SW translation.
> > > A "need" from a DPDK performance perspective.
> >
> > Would the driver fail to initialize if it detects running as IOVA as PA?
>
> Yes.
> https://git.dpdk.org/dpdk/tree/drivers/net/octeontx2/otx2_ethdev.c#n1191
>
> > Also, some other use cases will also require IOVA as PA while having full
> > IOMMU support. An example of this would be systems with limited IOMMU
> > width (such as VM's) - even though the IOMMU is technically supported, we
> > may not have the necessary address width to run all devices in IOVA as VA
> > mode, and would need to fall back to IOVA as PA.
> > Since we cannot *require* IOVA as VA in current codebase, any driver that
> > expects IOVA as VA to always be enabled will presumably not work.
> >
> > >
> > > Again, it is not a device attribute, it is a system attribute.
> >
> > If it's a system attribute, why is it a device driver flag then? The system may
> > or may not support IOMMU, the device itself probably doesn't care since bus
> > address looks the same in both cases, *but the driver
> > might* (such as would be in your case - requiring IOVA as VA and disallowing
> > IOVA as PA for performance reasons).
>
> Agree.
>
> >
> > Currently (again, disregarding your interpretation of how IOVA as VA works
> > and looking at the actual commit history), we always seem to imply that IOVA
> > as PA works for all devices, and we use IOVA_AS_VA flag to indicate that the
> > device *also* supports IOVA as VA mode.
> >
> > But we don't have any way to express a *requirement* for IOVA as VA mode
> > - only for IOVA as PA mode. That is the purpose of the new flag. You are
> > stating that the IOVA_AS_VA drv flag is an expression of that requirement,
> > but that is not reflected in the codebase - our commit history indicates that
> > we don't treat IOVA as VA as a hard requirement whenever this flag is
> > specified (and I would argue that we shouldn't).
>
> No objection to classifying it further.
>

I propose to introduce:
* RTE_PCI_DRV_IOVA_AS_PA which means "the combination of the pmd+kmod+hw
supports usage of Physical Addresses"
* RTE_PCI_DRV_IOVA_AS_VA which means "the combination of the pmd+kmod+hw
supports usage of Virtual Addresses"

- For the pci bus, the algorithm would be:

devices_want_pa = false
devices_want_va = false

Foreach pci device
  Skip blacklisted devices
  Skip unbound devices (i.e. we only consider devices bound to a known
    kernel driver)
  Skip unsupported devices (i.e. we only consider devices that have a pmd
    that supports them)

  If the combination pmd+kmod only supports VA (only the RTE_PCI_DRV_IOVA_AS_VA
    capability in driver flags), then devices_want_va = true
  Else if the combination pmd+kmod only supports PA (only the RTE_PCI_DRV_IOVA_AS_PA
    capability in driver flags), then devices_want_pa = true

If devices_want_va and !devices_want_pa
  return RTE_IOVA_VA
If devices_want_pa and !devices_want_va
  return RTE_IOVA_PA

return RTE_IOVA_DC
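David's per-device algorithm for the PCI bus can be written as compilable C. The device struct, its field names and the flag values are hypothetical stand-ins for the real `rte_pci_device`/`rte_pci_driver` data. The three "skip" conditions are reduced to booleans, since the point is only the aggregation logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical capability bits for the proposed flags. */
#define SKETCH_DRV_IOVA_AS_PA 0x1u
#define SKETCH_DRV_IOVA_AS_VA 0x2u

enum sketch_iova_mode {
	SKETCH_IOVA_DC = 0,
	SKETCH_IOVA_PA = 1,
	SKETCH_IOVA_VA = 2
};

/* Minimal stand-in for a scanned PCI device and its matched driver. */
struct sketch_pci_dev {
	bool blacklisted;
	bool bound;             /* bound to a known kernel driver */
	bool has_pmd;           /* a PMD supports this device */
	unsigned int drv_flags; /* capabilities of the pmd+kmod combination */
};

enum sketch_iova_mode
pci_bus_iova_mode(const struct sketch_pci_dev *devs, size_t n)
{
	bool devices_want_pa = false;
	bool devices_want_va = false;
	size_t i;

	assert(n == 0 || devs != NULL);
	for (i = 0; i < n; i++) {
		const struct sketch_pci_dev *d = &devs[i];
		unsigned int caps;

		if (d->blacklisted || !d->bound || !d->has_pmd)
			continue;
		caps = d->drv_flags &
			(SKETCH_DRV_IOVA_AS_PA | SKETCH_DRV_IOVA_AS_VA);
		if (caps == SKETCH_DRV_IOVA_AS_VA)
			devices_want_va = true;   /* VA only */
		else if (caps == SKETCH_DRV_IOVA_AS_PA)
			devices_want_pa = true;   /* PA only */
		/* Both bits (or neither): no constraint from this device. */
	}

	if (devices_want_va && !devices_want_pa)
		return SKETCH_IOVA_VA;
	if (devices_want_pa && !devices_want_va)
		return SKETCH_IOVA_PA;
	return SKETCH_IOVA_DC;
}
```

Conflicting VA-only and PA-only devices deliberately collapse to DC. This matches the second note below, which leaves the final choice to the EAL.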

Notes:
* the IOMMU limitations are considered a per device/driver thing, since
the kmod is the one that configures the system IOMMU,
* the case "devices_want_pa and devices_want_va" is considered DC: we
let EAL decide based on the availability of physical addresses, because we
can't comply with all the devices/drivers present in the system.
  This means that at bus probe time for a device, we must add a check that
the combination is fulfilled (and avoid this check in the drivers
themselves).
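The probe-time check described in the second note can be sketched as a single predicate. It is evaluated once the EAL has fixed the IOVA mode, which by then is PA or VA, never DC. It refuses a device whose pmd+kmod combination does not declare that mode. The function name and flag values are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical capability bits for the proposed flags. */
#define SKETCH_DRV_IOVA_AS_PA 0x1u
#define SKETCH_DRV_IOVA_AS_VA 0x2u

enum sketch_iova_mode {
	SKETCH_IOVA_DC = 0,
	SKETCH_IOVA_PA = 1,
	SKETCH_IOVA_VA = 2
};

/* True if the driver's declared capabilities cover the EAL's final mode. */
bool pci_probe_iova_ok(enum sketch_iova_mode eal_mode, unsigned int drv_flags)
{
	/* The EAL has already resolved DC by the time devices are probed. */
	assert(eal_mode == SKETCH_IOVA_PA || eal_mode == SKETCH_IOVA_VA);
	if (eal_mode == SKETCH_IOVA_VA)
		return (drv_flags & SKETCH_DRV_IOVA_AS_VA) != 0;
	return (drv_flags & SKETCH_DRV_IOVA_AS_PA) != 0;
}
```

Doing this once in the bus is what lets drivers such as octeontx2 drop their own "fail if running as PA" checks.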


- For the global bus code, which aggregates the different buses' preferences,
we need to do the same (I suspect there is a bug there at the moment).

The algorithm:

buses_want_pa = false
buses_want_va = false

Foreach bus
  If the bus reports RTE_IOVA_VA, then buses_want_va = true
  Else if the bus reports RTE_IOVA_PA, then buses_want_pa = true

If buses_want_va and !buses_want_pa
  return RTE_IOVA_VA
If buses_want_pa and !buses_want_va
  return RTE_IOVA_PA

return RTE_IOVA_DC
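The bus-level aggregation is the same pattern one level up, applied to each bus's own answer. In this C sketch the per-bus results are passed in as an array. The enum is a local stand-in for `enum rte_iova_mode`, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum sketch_iova_mode {
	SKETCH_IOVA_DC = 0,
	SKETCH_IOVA_PA = 1,
	SKETCH_IOVA_VA = 2
};

/* Combine the modes reported by each bus into a single preference. */
enum sketch_iova_mode
buses_iova_mode(const enum sketch_iova_mode *bus_modes, size_t n)
{
	bool buses_want_pa = false;
	bool buses_want_va = false;
	size_t i;

	assert(n == 0 || bus_modes != NULL);
	for (i = 0; i < n; i++) {
		if (bus_modes[i] == SKETCH_IOVA_VA)
			buses_want_va = true;
		else if (bus_modes[i] == SKETCH_IOVA_PA)
			buses_want_pa = true;
		/* DC from a bus adds no constraint. */
	}

	if (buses_want_va && !buses_want_pa)
		return SKETCH_IOVA_VA;
	if (buses_want_pa && !buses_want_va)
		return SKETCH_IOVA_PA;
	return SKETCH_IOVA_DC;
}
```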


- Finally at EAL level, we keep the current code.
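The EAL step is not spelled out here, so the following is only an assumption based on two earlier points in the thread. The notes above say that a DC answer is left to the EAL "based on the physical addresses availability", and Jerin is fine with keeping PA as the default for DC. This C sketch encodes just that resolution. It is not a copy of the EAL code, and it ignores user overrides and error reporting. For example, a bus demanding PA when no physical addresses are available is simply passed through.

```c
#include <assert.h>
#include <stdbool.h>

enum sketch_iova_mode {
	SKETCH_IOVA_DC = 0,
	SKETCH_IOVA_PA = 1,
	SKETCH_IOVA_VA = 2
};

/* Resolve the aggregated bus preference into the final IOVA mode. */
enum sketch_iova_mode
eal_resolve_iova_mode(enum sketch_iova_mode bus_mode, bool phys_addrs_available)
{
	assert(bus_mode == SKETCH_IOVA_DC || bus_mode == SKETCH_IOVA_PA ||
	       bus_mode == SKETCH_IOVA_VA);
	if (bus_mode != SKETCH_IOVA_DC)
		return bus_mode;   /* the buses made an explicit choice */
	/* DC: default to PA when physical addresses are usable, else VA. */
	return phys_addrs_available ? SKETCH_IOVA_PA : SKETCH_IOVA_VA;
}
```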


Hope I did not miss anything.
If we agree on this, I will send the changes and an update in the
documentation.

Patch

diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index 33c8ea7e9..99636831e 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -567,7 +567,7 @@  enum rte_iova_mode
 pci_device_iova_mode(const struct rte_pci_driver *pdrv,
 		     const struct rte_pci_device *pdev)
 {
-	enum rte_iova_mode iova_mode = RTE_IOVA_DC;
+	enum rte_iova_mode iova_mode = RTE_IOVA_PA;
 	static int iommu_no_va = -1;
 
 	switch (pdev->kdrv) {
@@ -581,8 +581,8 @@  pci_device_iova_mode(const struct rte_pci_driver *pdrv,
 			else
 				is_vfio_noiommu_enabled = 0;
 		}
-		if ((pdrv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) == 0) {
-			iova_mode = RTE_IOVA_PA;
+		if (pdrv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) {
+			iova_mode = RTE_IOVA_VA;
 		} else if (is_vfio_noiommu_enabled != 0) {
 			RTE_LOG(DEBUG, EAL, "Forcing to 'PA', vfio-noiommu mode configured\n");
 			iova_mode = RTE_IOVA_PA;
@@ -592,8 +592,8 @@  pci_device_iova_mode(const struct rte_pci_driver *pdrv,
 	}
 
 	case RTE_KDRV_NIC_MLX:
-		if ((pdrv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) == 0)
-			iova_mode = RTE_IOVA_PA;
+		if (pdrv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA)
+			iova_mode = RTE_IOVA_VA;
 		break;
 
 	case RTE_KDRV_IGB_UIO: