[v2] eal/bus: use RTE_IOVA_PA only if phys addresses are available

Message ID 20180907155843.96465-1-dariusz.stojaczyk@intel.com
State Superseded, archived
Delegated to: Thomas Monjalon
Series
  • [v2] eal/bus: use RTE_IOVA_PA only if phys addresses are available

Checks

Context Check Description
ci/Intel-compilation success Compilation OK
ci/checkpatch success coding style OK

Commit Message

Stojaczyk, Dariusz Sept. 7, 2018, 3:58 p.m.
When neither RTE_IOVA_VA nor RTE_IOVA_PA was explicitly
requested, DPDK would currently fall back to the default
RTE_IOVA_PA mode and possibly encounter a failure later
on if running as a non-privileged user. Attempting to
use RTE_IOVA_VA if no phys addresses are available may
help in this case.

Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
---
Changes since v1:
 * added a missing rte_memory.h include

 lib/librte_eal/common/eal_common_bus.c | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

Comments

Burakov, Anatoly Sept. 17, 2018, 10:33 a.m. | #1
On 07-Sep-18 4:58 PM, Darek Stojaczyk wrote:
> When neither RTE_IOVA_VA nor RTE_IOVA_PA was explicitly
> requested, DPDK would currently fallback to the default
> RTE_IOVA_PA mode and possibly encounter a failure later
> on if running as a non-priviledged user. Attempting to
> use RTE_IOVA_VA if no phys addresses are available may
> help in this case.
> 
> Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
> ---
> Changes since v1:
>   * added a missing rte_memory.h include
> 
>   lib/librte_eal/common/eal_common_bus.c | 19 +++++++++++++++----
>   1 file changed, 15 insertions(+), 4 deletions(-)
> 
> diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c
> index 0943851cc..68c581b8a 100644
> --- a/lib/librte_eal/common/eal_common_bus.c
> +++ b/lib/librte_eal/common/eal_common_bus.c
> @@ -37,6 +37,7 @@
>   #include <rte_bus.h>
>   #include <rte_debug.h>
>   #include <rte_string_fns.h>
> +#include <rte_memory.h>
>   
>   #include "eal_private.h"
>   
> @@ -236,9 +237,19 @@ rte_bus_get_iommu_class(void)
>   			mode |= bus->get_iommu_class();
>   	}
>   
> -	if (mode != RTE_IOVA_VA) {
> -		/* Use default IOVA mode */
> -		mode = RTE_IOVA_PA;
> +	if (mode == RTE_IOVA_VA)
> +		return RTE_IOVA_VA;
> +
> +	if (mode & RTE_IOVA_PA) {
> +		/* Not all buses support RTE_IOVA_VA, fallback to RTE_IOVA_PA */
> +		return RTE_IOVA_PA;
> +	}
> +
> +	if (rte_eal_using_phys_addrs()) {
> +		/* Default to RTE_IOVA_PA only if it's supported */
> +		return RTE_IOVA_PA;
>   	}
> -	return mode;
> +
> +	/* Since RTE_IOVA_PA is unsupported, fallback to RTE_IOVA_VA */
> +	return RTE_IOVA_VA;
>   }
> 

This is a good change; however, I think it is too pessimistic. If
I don't have any devices that explicitly require IOVA_PA, I should be
running in IOVA_VA mode.

This of course doesn't take hotplug into account, so a command-line 
switch to force one or the other should also be available.

For example, at startup, I might have devices bound to VFIO, so IOVA_VA
mode is picked. However, even though none of the devices require
physical addresses at startup, I also know that I might later
hotplug a device that requires IOVA_PA (leaving the question of hotplug
brokenness aside for now...). Currently, this scenario will not work,
as I will be forced to use IOVA_VA mode unless I happen to have an
IOVA_PA device available at startup.

Similarly, if I'm running DPDK as root but am only using virtual devices
like pcap, I should be able to force DPDK into using VA addresses [*],
yet currently I will be forced to use IOVA_PA if I don't *also* have a
few devices bound exclusively to VFIO.

[*] Do we have vdev devices that require IOVA_PA? I can't think of any...
Stojaczyk, Dariusz Sept. 17, 2018, 1:06 p.m. | #2
> -----Original Message-----
> From: Burakov, Anatoly
> Sent: Monday, September 17, 2018 12:34 PM
> To: Stojaczyk, Dariusz <dariusz.stojaczyk@intel.com>; dev@dpdk.org;
> Santosh Shukla <santosh.shukla@caviumnetworks.com>; Hemant Agrawal
> <hemant.agrawal@nxp.com>; Jerin Jacob
> <jerin.jacob@caviumnetworks.com>
> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>; Chas Williams
> <chas3@att.com>
> Subject: Re: [PATCH v2] eal/bus: use RTE_IOVA_PA only if phys addresses
> are available
> 
> On 07-Sep-18 4:58 PM, Darek Stojaczyk wrote:
> > When neither RTE_IOVA_VA nor RTE_IOVA_PA was explicitly requested,
> > DPDK would currently fallback to the default RTE_IOVA_PA mode and
> > possibly encounter a failure later on if running as a non-priviledged
> > user. Attempting to use RTE_IOVA_VA if no phys addresses are available
> > may help in this case.
> >
> > Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
> > ---
> > Changes since v1:
> >   * added a missing rte_memory.h include
> >
> >   lib/librte_eal/common/eal_common_bus.c | 19 +++++++++++++++----
> >   1 file changed, 15 insertions(+), 4 deletions(-)
> >
> > diff --git a/lib/librte_eal/common/eal_common_bus.c
> > b/lib/librte_eal/common/eal_common_bus.c
> > index 0943851cc..68c581b8a 100644
> > --- a/lib/librte_eal/common/eal_common_bus.c
> > +++ b/lib/librte_eal/common/eal_common_bus.c
> > @@ -37,6 +37,7 @@
> >   #include <rte_bus.h>
> >   #include <rte_debug.h>
> >   #include <rte_string_fns.h>
> > +#include <rte_memory.h>
> >
> >   #include "eal_private.h"
> >
> > @@ -236,9 +237,19 @@ rte_bus_get_iommu_class(void)
> >   			mode |= bus->get_iommu_class();
> >   	}
> >
> > -	if (mode != RTE_IOVA_VA) {
> > -		/* Use default IOVA mode */
> > -		mode = RTE_IOVA_PA;
> > +	if (mode == RTE_IOVA_VA)
> > +		return RTE_IOVA_VA;
> > +
> > +	if (mode & RTE_IOVA_PA) {
> > +		/* Not all buses support RTE_IOVA_VA, fallback to
> RTE_IOVA_PA */
> > +		return RTE_IOVA_PA;
> > +	}
> > +
> > +	if (rte_eal_using_phys_addrs()) {
> > +		/* Default to RTE_IOVA_PA only if it's supported */
> > +		return RTE_IOVA_PA;
> >   	}
> > -	return mode;
> > +
> > +	/* Since RTE_IOVA_PA is unsupported, fallback to RTE_IOVA_VA */
> > +	return RTE_IOVA_VA;
> >   }
> >
> 
> This is a good change, however I think that this is too pessimistic. If i don't
> have any devices that explictly require IOVA_PA, i should be running in
> IOVA_VA mode.

Another problem may occur when trying to hotplug devices that support only 39-bit DMA. You may not be able to map any memory with VFIO when in RTE_IOVA_VA mode, as virtual addresses likely occupy more than 39 bits.
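
To make the 39-bit concern concrete, here is a minimal sketch (not DPDK code) of the check that matters: whether a whole address range is reachable with a given number of address bits. The helper name range_fits_in_bits and the sample addresses in the note below are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of DPDK: returns true if every byte of
 * [addr, addr + len) can be addressed using only `bits` address bits.
 */
static bool
range_fits_in_bits(uint64_t addr, uint64_t len, unsigned int bits)
{
	/* Keep the shift below well defined. */
	assert(bits > 0 && bits < 64);

	uint64_t limit = UINT64_C(1) << bits;

	/* Equivalent to addr + len <= limit, written to avoid overflow. */
	return len <= limit && addr <= limit - len;
}
```

With these definitions, a 1 GiB range starting at 0x7f0000000000 (an illustrative high user-space address) does not fit in 39 bits, while the same range starting at 0x100000000 does.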

The rte_pci bus enforces RTE_IOVA_PA whenever it finds such devices on init.

I have no doubt the logic can be improved here, but for now RTE_IOVA_PA is the only safe default.

D.

> 
> This of course doesn't take hotplug into account, so a command-line switch
> to force one or the other should also be available.
> 
> For example, at startup, i might have devices bound to VFIO, so IOVA_VA
> mode is picked. However, even though at a time of startup none of the
> devices require physical addresses, i also know that i might later hotplug a
> device that requires IOVA_PA (leaving the question of hotplug brokenness
> aside for now...) - currently, this scenario will not work, as i will be forced to
> use IOVA_VA mode unless i happen to have a IOVA_PA device available at
> startup.
> 
> Similarly, if i'm running DPDK as root but am only using virtual devices like
> pcap, i should be able to force DPDK into using VA addresses [*], yet
> currently i will be forced to use IOVA_PA if i don't *also* have a few devices
> bound exclusively to VFIO.
> 
> [*] Do we have vdev devices that require IOVA_PA? I can't think of any...
> 
> --
> Thanks,
> Anatoly
Thomas Monjalon Oct. 28, 2018, 11:11 p.m. | #3
17/09/2018 12:33, Burakov, Anatoly:
> On 07-Sep-18 4:58 PM, Darek Stojaczyk wrote:
> > When neither RTE_IOVA_VA nor RTE_IOVA_PA was explicitly
> > requested, DPDK would currently fallback to the default
> > RTE_IOVA_PA mode and possibly encounter a failure later
> > on if running as a non-priviledged user. Attempting to
> > use RTE_IOVA_VA if no phys addresses are available may
> > help in this case.
> > 
> > Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
> > ---
> > Changes since v1:
> >   * added a missing rte_memory.h include
> > 
> >   lib/librte_eal/common/eal_common_bus.c | 19 +++++++++++++++----
> >   1 file changed, 15 insertions(+), 4 deletions(-)
> > 
> > diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c
> > index 0943851cc..68c581b8a 100644
> > --- a/lib/librte_eal/common/eal_common_bus.c
> > +++ b/lib/librte_eal/common/eal_common_bus.c
> > @@ -37,6 +37,7 @@
> >   #include <rte_bus.h>
> >   #include <rte_debug.h>
> >   #include <rte_string_fns.h>
> > +#include <rte_memory.h>
> >   
> >   #include "eal_private.h"
> >   
> > @@ -236,9 +237,19 @@ rte_bus_get_iommu_class(void)
> >   			mode |= bus->get_iommu_class();
> >   	}
> >   
> > -	if (mode != RTE_IOVA_VA) {
> > -		/* Use default IOVA mode */
> > -		mode = RTE_IOVA_PA;
> > +	if (mode == RTE_IOVA_VA)
> > +		return RTE_IOVA_VA;
> > +
> > +	if (mode & RTE_IOVA_PA) {
> > +		/* Not all buses support RTE_IOVA_VA, fallback to RTE_IOVA_PA */
> > +		return RTE_IOVA_PA;
> > +	}
> > +
> > +	if (rte_eal_using_phys_addrs()) {
> > +		/* Default to RTE_IOVA_PA only if it's supported */
> > +		return RTE_IOVA_PA;
> >   	}
> > -	return mode;
> > +
> > +	/* Since RTE_IOVA_PA is unsupported, fallback to RTE_IOVA_VA */
> > +	return RTE_IOVA_VA;
> >   }
> > 
> 
> This is a good change, however I think that this is too pessimistic. If 
> i don't have any devices that explictly require IOVA_PA, i should be 
> running in IOVA_VA mode.
> 
> This of course doesn't take hotplug into account, so a command-line 
> switch to force one or the other should also be available.
> 
> For example, at startup, i might have devices bound to VFIO, so IOVA_VA 
> mode is picked. However, even though at a time of startup none of the 
> devices require physical addresses, i also know that i might later 
> hotplug a device that requires IOVA_PA (leaving the question of hotplug 
> brokenness aside for now...) - currently, this scenario will not work, 
> as i will be forced to use IOVA_VA mode unless i happen to have a 
> IOVA_PA device available at startup.
> 
> Similarly, if i'm running DPDK as root but am only using virtual devices 
> like pcap, i should be able to force DPDK into using VA addresses [*], 
> yet currently i will be forced to use IOVA_PA if i don't *also* have a 
> few devices bound exclusively to VFIO.
> 
> [*] Do we have vdev devices that require IOVA_PA? I can't think of any...

If running as root, what is the benefit of using virtual addresses?
Burakov, Anatoly Oct. 30, 2018, 10:25 a.m. | #4
On 28-Oct-18 11:11 PM, Thomas Monjalon wrote:
> 17/09/2018 12:33, Burakov, Anatoly:
>> On 07-Sep-18 4:58 PM, Darek Stojaczyk wrote:
>>> When neither RTE_IOVA_VA nor RTE_IOVA_PA was explicitly
>>> requested, DPDK would currently fallback to the default
>>> RTE_IOVA_PA mode and possibly encounter a failure later
>>> on if running as a non-priviledged user. Attempting to
>>> use RTE_IOVA_VA if no phys addresses are available may
>>> help in this case.
>>>
>>> Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
>>> ---
>>> Changes since v1:
>>>    * added a missing rte_memory.h include
>>>
>>>    lib/librte_eal/common/eal_common_bus.c | 19 +++++++++++++++----
>>>    1 file changed, 15 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c
>>> index 0943851cc..68c581b8a 100644
>>> --- a/lib/librte_eal/common/eal_common_bus.c
>>> +++ b/lib/librte_eal/common/eal_common_bus.c
>>> @@ -37,6 +37,7 @@
>>>    #include <rte_bus.h>
>>>    #include <rte_debug.h>
>>>    #include <rte_string_fns.h>
>>> +#include <rte_memory.h>
>>>    
>>>    #include "eal_private.h"
>>>    
>>> @@ -236,9 +237,19 @@ rte_bus_get_iommu_class(void)
>>>    			mode |= bus->get_iommu_class();
>>>    	}
>>>    
>>> -	if (mode != RTE_IOVA_VA) {
>>> -		/* Use default IOVA mode */
>>> -		mode = RTE_IOVA_PA;
>>> +	if (mode == RTE_IOVA_VA)
>>> +		return RTE_IOVA_VA;
>>> +
>>> +	if (mode & RTE_IOVA_PA) {
>>> +		/* Not all buses support RTE_IOVA_VA, fallback to RTE_IOVA_PA */
>>> +		return RTE_IOVA_PA;
>>> +	}
>>> +
>>> +	if (rte_eal_using_phys_addrs()) {
>>> +		/* Default to RTE_IOVA_PA only if it's supported */
>>> +		return RTE_IOVA_PA;
>>>    	}
>>> -	return mode;
>>> +
>>> +	/* Since RTE_IOVA_PA is unsupported, fallback to RTE_IOVA_VA */
>>> +	return RTE_IOVA_VA;
>>>    }
>>>
>>
>> This is a good change, however I think that this is too pessimistic. If
>> i don't have any devices that explictly require IOVA_PA, i should be
>> running in IOVA_VA mode.
>>
>> This of course doesn't take hotplug into account, so a command-line
>> switch to force one or the other should also be available.
>>
>> For example, at startup, i might have devices bound to VFIO, so IOVA_VA
>> mode is picked. However, even though at a time of startup none of the
>> devices require physical addresses, i also know that i might later
>> hotplug a device that requires IOVA_PA (leaving the question of hotplug
>> brokenness aside for now...) - currently, this scenario will not work,
>> as i will be forced to use IOVA_VA mode unless i happen to have a
>> IOVA_PA device available at startup.
>>
>> Similarly, if i'm running DPDK as root but am only using virtual devices
>> like pcap, i should be able to force DPDK into using VA addresses [*],
>> yet currently i will be forced to use IOVA_PA if i don't *also* have a
>> few devices bound exclusively to VFIO.
>>
>> [*] Do we have vdev devices that require IOVA_PA? I can't think of any...
> 
> If running as root, what is the benefit of using virtual addresses?
> 

Contiguous memory addresses are one benefit that comes to mind.
Alejandro Lucero Oct. 30, 2018, 12:58 p.m. | #5
On Mon, Sep 17, 2018 at 2:06 PM Stojaczyk, Dariusz <
dariusz.stojaczyk@intel.com> wrote:

>
>
> > -----Original Message-----
> > From: Burakov, Anatoly
> > Sent: Monday, September 17, 2018 12:34 PM
> > To: Stojaczyk, Dariusz <dariusz.stojaczyk@intel.com>; dev@dpdk.org;
> > Santosh Shukla <santosh.shukla@caviumnetworks.com>; Hemant Agrawal
> > <hemant.agrawal@nxp.com>; Jerin Jacob
> > <jerin.jacob@caviumnetworks.com>
> > Cc: Maxime Coquelin <maxime.coquelin@redhat.com>; Chas Williams
> > <chas3@att.com>
> > Subject: Re: [PATCH v2] eal/bus: use RTE_IOVA_PA only if phys addresses
> > are available
> >
> > On 07-Sep-18 4:58 PM, Darek Stojaczyk wrote:
> > > When neither RTE_IOVA_VA nor RTE_IOVA_PA was explicitly requested,
> > > DPDK would currently fallback to the default RTE_IOVA_PA mode and
> > > possibly encounter a failure later on if running as a non-priviledged
> > > user. Attempting to use RTE_IOVA_VA if no phys addresses are available
> > > may help in this case.
> > >
> > > Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
> > > ---
> > > Changes since v1:
> > >   * added a missing rte_memory.h include
> > >
> > >   lib/librte_eal/common/eal_common_bus.c | 19 +++++++++++++++----
> > >   1 file changed, 15 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/lib/librte_eal/common/eal_common_bus.c
> > > b/lib/librte_eal/common/eal_common_bus.c
> > > index 0943851cc..68c581b8a 100644
> > > --- a/lib/librte_eal/common/eal_common_bus.c
> > > +++ b/lib/librte_eal/common/eal_common_bus.c
> > > @@ -37,6 +37,7 @@
> > >   #include <rte_bus.h>
> > >   #include <rte_debug.h>
> > >   #include <rte_string_fns.h>
> > > +#include <rte_memory.h>
> > >
> > >   #include "eal_private.h"
> > >
> > > @@ -236,9 +237,19 @@ rte_bus_get_iommu_class(void)
> > >                     mode |= bus->get_iommu_class();
> > >     }
> > >
> > > -   if (mode != RTE_IOVA_VA) {
> > > -           /* Use default IOVA mode */
> > > -           mode = RTE_IOVA_PA;
> > > +   if (mode == RTE_IOVA_VA)
> > > +           return RTE_IOVA_VA;
> > > +
> > > +   if (mode & RTE_IOVA_PA) {
> > > +           /* Not all buses support RTE_IOVA_VA, fallback to
> > RTE_IOVA_PA */
> > > +           return RTE_IOVA_PA;
> > > +   }
> > > +
> > > +   if (rte_eal_using_phys_addrs()) {
> > > +           /* Default to RTE_IOVA_PA only if it's supported */
> > > +           return RTE_IOVA_PA;
> > >     }
> > > -   return mode;
> > > +
> > > +   /* Since RTE_IOVA_PA is unsupported, fallback to RTE_IOVA_VA */
> > > +   return RTE_IOVA_VA;
> > >   }
> > >
> >
> > This is a good change, however I think that this is too pessimistic. If
> i don't
> > have any devices that explictly require IOVA_PA, i should be running in
> > IOVA_VA mode.
>
> Another problem may occur when trying to hotplug devices that support only
> 39bit DMA. You may not be able to map any memory with vfio when in
> RTE_IOVA_VA mode, as virtual addresses likely occupy more than 39 bits.
>
>
There is now a hint for trying to map memory as low as possible instead of
using the default Linux mmap base address. This makes devices with addressing
limitations usable as long as the physical memory to map does not exceed
what those devices allow.



> The rte_pci bus enforces RTE_IOVA_PA whenever it finds such devices on
> init.
>
> I have no doubt the logic can be improved here, but for now RTE_IOVA_PA is
> the only safe default.
>
> D.
>
> >
> > This of course doesn't take hotplug into account, so a command-line
> switch
> > to force one or the other should also be available.
> >
> > For example, at startup, i might have devices bound to VFIO, so IOVA_VA
> > mode is picked. However, even though at a time of startup none of the
> > devices require physical addresses, i also know that i might later
> hotplug a
> > device that requires IOVA_PA (leaving the question of hotplug brokenness
> > aside for now...) - currently, this scenario will not work, as i will be
> forced to
> > use IOVA_VA mode unless i happen to have a IOVA_PA device available at
> > startup.
> >
> > Similarly, if i'm running DPDK as root but am only using virtual devices
> like
> > pcap, i should be able to force DPDK into using VA addresses [*], yet
> > currently i will be forced to use IOVA_PA if i don't *also* have a few
> devices
> > bound exclusively to VFIO.
> >
> > [*] Do we have vdev devices that require IOVA_PA? I can't think of any...
> >
> > --
> > Thanks,
> > Anatoly
>

Patch

diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c
index 0943851cc..68c581b8a 100644
--- a/lib/librte_eal/common/eal_common_bus.c
+++ b/lib/librte_eal/common/eal_common_bus.c
@@ -37,6 +37,7 @@ 
 #include <rte_bus.h>
 #include <rte_debug.h>
 #include <rte_string_fns.h>
+#include <rte_memory.h>
 
 #include "eal_private.h"
 
@@ -236,9 +237,19 @@  rte_bus_get_iommu_class(void)
 			mode |= bus->get_iommu_class();
 	}
 
-	if (mode != RTE_IOVA_VA) {
-		/* Use default IOVA mode */
-		mode = RTE_IOVA_PA;
+	if (mode == RTE_IOVA_VA)
+		return RTE_IOVA_VA;
+
+	if (mode & RTE_IOVA_PA) {
+		/* Not all buses support RTE_IOVA_VA, fallback to RTE_IOVA_PA */
+		return RTE_IOVA_PA;
+	}
+
+	if (rte_eal_using_phys_addrs()) {
+		/* Default to RTE_IOVA_PA only if it's supported */
+		return RTE_IOVA_PA;
 	}
-	return mode;
+
+	/* Since RTE_IOVA_PA is unsupported, fallback to RTE_IOVA_VA */
+	return RTE_IOVA_VA;
 }
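
The selection order introduced by the patch can be sketched in isolation as follows. This is a simplified model rather than the DPDK function itself: the enum is a local stand-in for the RTE_IOVA_* flags (its names and values are assumptions of this sketch), the bus_modes argument stands for the OR of every bus's reported mode, and the phys_addrs_available argument stands in for the result of rte_eal_using_phys_addrs().

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins for the RTE_IOVA_* flags; names and values are assumptions. */
enum iova_mode {
	IOVA_DC = 0,      /* no bus expressed a preference */
	IOVA_PA = 1 << 0, /* at least one bus requires physical addresses */
	IOVA_VA = 1 << 1  /* at least one bus can work with virtual addresses */
};

/*
 * bus_modes: OR of the modes reported by all buses.
 * phys_addrs_available: whether physical addresses can be obtained
 * (typically false for a non-privileged user).
 */
static enum iova_mode
pick_iova_mode(unsigned int bus_modes, bool phys_addrs_available)
{
	/* Only the two known flag bits may be set. */
	assert((bus_modes & ~(unsigned int)(IOVA_PA | IOVA_VA)) == 0);

	/* Every bus that reported a mode agreed on VA. */
	if (bus_modes == IOVA_VA)
		return IOVA_VA;

	/* At least one bus needs PA, so VA cannot be used. */
	if (bus_modes & IOVA_PA)
		return IOVA_PA;

	/* No preference: keep PA as the default only if it can work. */
	if (phys_addrs_available)
		return IOVA_PA;

	/* No preference and no physical addresses: fall back to VA. */
	return IOVA_VA;
}
```

Only the last two branches differ from the old behaviour: with no bus preference, the old code always returned the PA default, whereas this order returns VA when physical addresses are unavailable.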