[v2] eal/linux: force iova-mode va without pa available

Message ID: 20231124100904.388453-1-christian.ehrhardt@canonical.com
State: Rejected, archived
Delegated to: Thomas Monjalon
Series: [v2] eal/linux: force iova-mode va without pa available

Checks

Context                        Check    Description
ci/checkpatch                  warning  coding style issues
ci/loongarch-compilation       success  Compilation OK
ci/loongarch-unit-testing      success  Unit Testing PASS
ci/github-robot: build         success  github build: passed
ci/Intel-compilation           success  Compilation OK
ci/intel-Testing               success  Testing PASS
ci/iol-intel-Performance       success  Performance Testing PASS
ci/iol-compile-amd64-testing   success  Testing PASS
ci/iol-mellanox-Performance    success  Performance Testing PASS
ci/iol-intel-Functional        success  Functional Testing PASS
ci/iol-broadcom-Performance    success  Performance Testing PASS
ci/iol-unit-amd64-testing      success  Testing PASS
ci/iol-broadcom-Functional     success  Functional Testing PASS
ci/iol-unit-arm64-testing      success  Testing PASS
ci/iol-compile-arm64-testing   success  Testing PASS
ci/intel-Functional            success  Functional PASS
ci/iol-sample-apps-testing     success  Testing PASS

Commit Message

Christian Ehrhardt Nov. 24, 2023, 10:09 a.m. UTC
  From: David Wilder <dwilder@us.ibm.com>

When using the --no-huge option, physical addresses are not guaranteed
to be persistent.
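As background for why physical addresses are fragile here, a hedged sketch: on Linux a process can look up the physical frame behind one of its virtual addresses in /proc/self/pagemap (DPDK's rte_mem_virt2phy() reads this file as well). Such a lookup is only a snapshot: ordinary, non-huge pages can in general be migrated or swapped by the kernel afterwards, and on current kernels a process without CAP_SYS_ADMIN sees the frame number as 0. The C below is an independent illustration, not DPDK code; decode_pagemap_entry(), virt_to_phys() and BAD_PHYS_ADDR are hypothetical names, and it assumes the documented 64-bit entry layout (bit 63 = page present, bits 0-54 = page frame number).

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical sentinel for "no physical address available". */
#define BAD_PHYS_ADDR UINT64_MAX

/* Decode one /proc/self/pagemap entry for virtual address 'virt'.
 * Bit 63 = page present, bits 0-54 = page frame number (PFN).
 * A zero PFN (what unprivileged readers get) is treated as unusable. */
uint64_t decode_pagemap_entry(uint64_t entry, uintptr_t virt, long page_size)
{
	const uint64_t present = (uint64_t)1 << 63;
	const uint64_t pfn_mask = ((uint64_t)1 << 55) - 1;
	uint64_t pfn = entry & pfn_mask;

	if (!(entry & present) || pfn == 0)
		return BAD_PHYS_ADDR;
	return pfn * (uint64_t)page_size + (uint64_t)(virt % (uintptr_t)page_size);
}

/* Snapshot of the physical address currently backing 'p'. */
uint64_t virt_to_phys(const void *p)
{
	long page_size = sysconf(_SC_PAGESIZE);
	uintptr_t virt = (uintptr_t)p;
	off_t offset;
	uint64_t entry;
	ssize_t n;
	int fd;

	if (page_size <= 0)
		return BAD_PHYS_ADDR;
	/* One 8-byte entry per virtual page. */
	offset = (off_t)(virt / (uintptr_t)page_size) * (off_t)sizeof(entry);
	fd = open("/proc/self/pagemap", O_RDONLY);
	if (fd < 0)
		return BAD_PHYS_ADDR;
	n = pread(fd, &entry, sizeof(entry), offset);
	close(fd);
	if (n != (ssize_t)sizeof(entry))
		return BAD_PHYS_ADDR;
	return decode_pagemap_entry(entry, virt, page_size);
}
```

Run without CAP_SYS_ADMIN, virt_to_phys() on a heap or stack variable typically returns BAD_PHYS_ADDR, which is the kind of condition under which EAL reports physical addresses as unavailable.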

This change effectively makes "--no-huge" the same as
"--no-huge --iova-mode=va".

When --no-huge is used (or any other condition makes physical
addresses unavailable), setting --iova-mode=pa will have no effect.

Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
---
 doc/guides/prog_guide/env_abstraction_layer.rst |  9 ++++++---
 lib/eal/linux/eal.c                             | 16 ++++++++++------
 2 files changed, 16 insertions(+), 9 deletions(-)
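The behaviour described in the commit message (plain --no-huge becomes equivalent to --no-huge --iova-mode=va, and --iova-mode=pa is overridden) can be modeled in a few lines. This is a hypothetical stand-alone sketch, not EAL code: enum iova_mode mirrors the numeric values of DPDK's RTE_IOVA_DC/PA/VA (0, 1 << 0, 1 << 1), and apply_no_pa_override() stands in for the new `if (!phys_addrs)` block the patch adds to rte_eal_init().

```c
#include <stdbool.h>

/* Hypothetical mirror of DPDK's IOVA mode values (DC = "don't care"). */
enum iova_mode { IOVA_DC = 0, IOVA_PA = 1, IOVA_VA = 2 };

/* Override applied before autodetection, as proposed in this patch.
 * With physical addresses available, 'requested' passes through, so
 * IOVA_DC still reaches the bus/IOMMU heuristic afterwards. */
enum iova_mode apply_no_pa_override(enum iova_mode requested, bool phys_addrs)
{
	if (phys_addrs)
		return requested;
	/* No physical addresses (e.g. --no-huge): VA is forced, even over
	 * an explicit --iova-mode=pa, which the patch only logs. */
	return IOVA_VA;
}
```

With phys_addrs false, IOVA_DC (no --iova-mode), IOVA_VA and IOVA_PA all map to IOVA_VA; that silent override of an explicit PA request is the point the first review comment objects to.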
  

Comments

Dmitry Kozlyuk Nov. 24, 2023, 10:29 a.m. UTC | #1
2023-11-24 11:09 (UTC+0100), christian.ehrhardt@canonical.com:
[...]
> diff --git a/lib/eal/linux/eal.c b/lib/eal/linux/eal.c
> index 57da058cec..2f1fce3c54 100644
> --- a/lib/eal/linux/eal.c
> +++ b/lib/eal/linux/eal.c
> @@ -1067,6 +1067,16 @@ rte_eal_init(int argc, char **argv)
>  
>  	phys_addrs = rte_eal_using_phys_addrs() != 0;
>  
> +	if (!phys_addrs) {
> +		/* if we have no access to physical addresses, pick IOVA as VA mode. */
> +		if (internal_conf->iova_mode == RTE_IOVA_PA)
> +			RTE_LOG(WARNING, EAL, "WARNING: --iova-mode=pa, but Physical addresses are unavailable, selecting IOVA as VA mode.\n");

If an impossible combination of options is requested,
initialization should fail instead.

> +		else
> +			RTE_LOG(DEBUG, EAL, "Physical addresses are unavailable, selecting IOVA as VA mode.\n");
> +		internal_conf->iova_mode = RTE_IOVA_VA;
> +		rte_eal_get_configuration()->iova_mode = internal_conf->iova_mode;
> +	}
> +
>  	/* if no EAL option "--iova-mode=<pa|va>", use bus IOVA scheme */
>  	if (internal_conf->iova_mode == RTE_IOVA_DC) {
>  		/* autodetect the IOVA mapping mode */

What do you think about keeping the existing code structure:

if (--iova-mode not specified) {
	iova_mode = VA if !phys_addrs or !RTE_IOVA_IN_MBUF (with logs)
	if (iova_mode == DC) {
		// autodetect from bus requirements and IOMMU (with logs)
	}
	rte_eal_get_configuration()->iova_mode = iova_mode;
} else {
	rte_eal_get_configuration()->iova_mode =
		internal_conf->iova_mode;
}
// verify rte_eal_get_configuration()->iova_mode

Note: the logic should be consistent across OSes where possible.
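A hypothetical, self-contained C model of the restructuring outlined above (not EAL code) might look like this. The enum, struct iova_inputs and select_iova_mode() are invented for illustration; the real inputs (--iova-mode, rte_eal_using_phys_addrs(), RTE_IOVA_IN_MBUF, the bus scan, is_iommu_enabled()) are passed in as plain fields so the decision can be exercised on its own. Autodetection only runs when no mode was requested, and the final check fails initialization for an impossible combination instead of silently switching to VA. Rejecting PA when the build lacks enable_iova_as_pa in that final check is an assumption added for completeness.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical mirror of DPDK's IOVA mode values (DC = "don't care"). */
enum iova_sel { SEL_DC = 0, SEL_PA = 1, SEL_VA = 2 };

/* Hypothetical inputs standing in for EAL state. */
struct iova_inputs {
	enum iova_sel requested; /* --iova-mode, SEL_DC if not given */
	bool phys_addrs;         /* physical addresses are usable */
	bool iova_in_mbuf;       /* RTE_IOVA_IN_MBUF (enable_iova_as_pa) */
	enum iova_sel bus_pref;  /* aggregated preference of all buses */
	bool iommu_enabled;      /* IOMMU groups present */
};

/* Returns the selected mode, or -1 if initialization should fail. */
int select_iova_mode(const struct iova_inputs *in)
{
	enum iova_sel mode;

	if (in->requested == SEL_DC) {
		/* No --iova-mode: VA if PA is impossible, else ask the buses. */
		mode = (!in->phys_addrs || !in->iova_in_mbuf) ? SEL_VA : in->bus_pref;
		if (mode == SEL_DC)
			mode = in->iommu_enabled ? SEL_VA : SEL_PA;
	} else {
		/* An explicit --iova-mode is taken as-is... */
		mode = in->requested;
	}

	/* ...and verified afterwards: impossible combinations fail. */
	if (mode == SEL_PA && (!in->phys_addrs || !in->iova_in_mbuf)) {
		fprintf(stderr, "cannot use IOVA as PA: %s\n",
			in->phys_addrs ? "disabled at build time"
				       : "physical addresses unavailable");
		return -1;
	}
	return (int)mode;
}
```

Compared with the v2 patch, --no-huge --iova-mode=pa now yields -1 (initialization would abort) rather than being rewritten to VA, while plain --no-huge still ends up in VA mode.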
  
Christian Ehrhardt Nov. 28, 2023, 2:39 p.m. UTC | #2
On Fri, Nov 24, 2023 at 11:29 AM Dmitry Kozlyuk <dmitry.kozliuk@gmail.com> wrote:

> 2023-11-24 11:09 (UTC+0100), christian.ehrhardt@canonical.com:
> [...]
> > diff --git a/lib/eal/linux/eal.c b/lib/eal/linux/eal.c
> > index 57da058cec..2f1fce3c54 100644
> > --- a/lib/eal/linux/eal.c
> > +++ b/lib/eal/linux/eal.c
> > @@ -1067,6 +1067,16 @@ rte_eal_init(int argc, char **argv)
> >
> >       phys_addrs = rte_eal_using_phys_addrs() != 0;
> >
> > +     if (!phys_addrs) {
> > +             /* if we have no access to physical addresses, pick IOVA as VA mode. */
> > +             if (internal_conf->iova_mode == RTE_IOVA_PA)
> > +                     RTE_LOG(WARNING, EAL, "WARNING: --iova-mode=pa, but Physical addresses are unavailable, selecting IOVA as VA mode.\n");
>
> If an impossible combination of options is requested,
> initialization should fail instead.
>

You are absolutely right, Dmitry.
In fact, I was only trying to rebase an old patch that we used to carry.
But the more I look at it and think about it, the less I like the approach.

In a production setup, an admin should make this choice consciously.
What we should actually change is not the behavior of EAL, but only the
test automation, so that it works with --no-huge on ppc64.
That is simpler, has much less impact, and is therefore probably the
right solution.

Consider this patch withdrawn.
I'll submit the fix to the tests in a few seconds.


  
David Christensen Nov. 29, 2023, 9:50 p.m. UTC | #3
On 11/24/23 2:09 AM, christian.ehrhardt@canonical.com wrote:
> [...]

What tests are you running that generate an error without explicitly
passing --iova-mode? When I run the fast-tests with 23.11, I see the
system correctly selecting VA without any additional command-line
parameters:


21:47:05 DPDK_TEST=acl_autotest MALLOC_PERTURB_=201 /home/drc/src/dpdk/build/app/dpdk-test --no-huge -m 2048
----------------------------------- output -----------------------------------
stdout:
RTE>>acl_autotest
acl context <acl_ctx>@0x179cbd300
   socket_id=-1
   alg=5
   first_load_sz=0
   max_rules=196608
   rule_size=128
   num_rules=0
   num_categories=0
   num_tries=0
acl context <acl_ctx>@0x179cbd300
   socket_id=-1
   alg=5
   first_load_sz=0
   max_rules=196608
   rule_size=128
   num_rules=0
   num_categories=0
   num_tries=0
running test_convert_rules(acl_ipv4vlan_tuple)
running test_convert_rules(acl_ipv4vlan_tuple, RTE_ACL_FIELD_TYPE_BITMASK type for IPv4)
running test_convert_rules(acl_ipv4vlan_tuple, RTE_ACL_FIELD_TYPE_RANGE type for IPv4)
running test_convert_rules(acl_ipv4vlan_tuple: swap VLAN and PORTs order)
running test_convert_rules(acl_ipv4vlan_tuple: swap SRC and DST IPv4 order)
test_u32_range#1704 starting range test from 0 to 264192
Test OK
RTE>>
stderr:
EAL: Detected CPU lcores: 128
EAL: Detected NUMA nodes: 2
EAL: Detected static linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'VA'

Dave
  
Christian Ehrhardt Nov. 30, 2023, 1:48 p.m. UTC | #4
On Wed, Nov 29, 2023 at 10:51 PM David Christensen <drc@linux.vnet.ibm.com> wrote:

> On 11/24/23 2:09 AM, christian.ehrhardt@canonical.com wrote:
> [...]
>
> What tests are you running that generate an error without explicitly
> selecting the iova-mode?  When I run the fast-tests I see the system
> correctly selecting VA without any additional command-line parameters
> when running 23.11:
>
>
Hi, we run the tests as they are defined in the Debian/Ubuntu autopkgtest;
here is a full log:


https://autopkgtest.ubuntu.com/results/autopkgtest-noble-paelzer-dpdk-23.11-test-builds/noble/ppc64el/d/dpdk/20231123_134814_ed029@/log.gz

The fast-tests, the same ones you ran, fail for us.
However, this is a virtual test environment that might not be as powerful as yours.

But as I said, consider this patch withdrawn in favor of the one that only
fixes the test configuration.


  

Patch

diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
index 6debf54efb..20c7355e0f 100644
--- a/doc/guides/prog_guide/env_abstraction_layer.rst
+++ b/doc/guides/prog_guide/env_abstraction_layer.rst
@@ -559,9 +559,12 @@  IOVA Mode is selected by considering what the current usable Devices on the
 system require and/or support.
 
 On FreeBSD, RTE_IOVA_PA is always the default. On Linux, the IOVA mode is
-detected based on a 2-step heuristic detailed below.
+detected based on a heuristic detailed below.
 
-For the first step, EAL asks each bus its requirement in terms of IOVA mode
+For the first step, if no Physical Addresses are available RTE_IOVA_VA is
+selected.
+
+Then EAL asks each bus its requirement in terms of IOVA mode
 and decides on a preferred IOVA mode.
 
 - if all buses report RTE_IOVA_PA, then the preferred IOVA mode is RTE_IOVA_PA,
@@ -575,7 +578,7 @@  and decides on a preferred IOVA mode.
 If the buses have expressed no preference on which IOVA mode to pick, then a
 default is selected using the following logic:
 
-- if physical addresses are not available, RTE_IOVA_VA mode is used
+- if enable_iova_as_pa was not set at build RTE_IOVA_VA mode is used
 - if /sys/kernel/iommu_groups is not empty, RTE_IOVA_VA mode is used
 - otherwise, RTE_IOVA_PA mode is used
 
diff --git a/lib/eal/linux/eal.c b/lib/eal/linux/eal.c
index 57da058cec..2f1fce3c54 100644
--- a/lib/eal/linux/eal.c
+++ b/lib/eal/linux/eal.c
@@ -1067,6 +1067,16 @@  rte_eal_init(int argc, char **argv)
 
 	phys_addrs = rte_eal_using_phys_addrs() != 0;
 
+	if (!phys_addrs) {
+		/* if we have no access to physical addresses, pick IOVA as VA mode. */
+		if (internal_conf->iova_mode == RTE_IOVA_PA)
+			RTE_LOG(WARNING, EAL, "WARNING: --iova-mode=pa, but Physical addresses are unavailable, selecting IOVA as VA mode.\n");
+		else
+			RTE_LOG(DEBUG, EAL, "Physical addresses are unavailable, selecting IOVA as VA mode.\n");
+		internal_conf->iova_mode = RTE_IOVA_VA;
+		rte_eal_get_configuration()->iova_mode = internal_conf->iova_mode;
+	}
+
 	/* if no EAL option "--iova-mode=<pa|va>", use bus IOVA scheme */
 	if (internal_conf->iova_mode == RTE_IOVA_DC) {
 		/* autodetect the IOVA mapping mode */
@@ -1078,12 +1088,6 @@  rte_eal_init(int argc, char **argv)
 			if (!RTE_IOVA_IN_MBUF) {
 				iova_mode = RTE_IOVA_VA;
 				RTE_LOG(DEBUG, EAL, "IOVA as VA mode is forced by build option.\n");
-			} else if (!phys_addrs) {
-				/* if we have no access to physical addresses,
-				 * pick IOVA as VA mode.
-				 */
-				iova_mode = RTE_IOVA_VA;
-				RTE_LOG(DEBUG, EAL, "Physical addresses are unavailable, selecting IOVA as VA mode.\n");
 			} else if (is_iommu_enabled()) {
 				/* we have an IOMMU, pick IOVA as VA mode */
 				iova_mode = RTE_IOVA_VA;
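When the buses express no preference, the documentation hunk above falls back partly on whether /sys/kernel/iommu_groups is non-empty, which is what the is_iommu_enabled() call in the eal.c hunk reflects. A minimal POSIX sketch of that check follows; the name iommu_groups_nonempty() and its path parameter (added so it can be pointed at other directories) are assumptions, and DPDK's own implementation may differ in detail.

```c
#define _XOPEN_SOURCE 700
#include <dirent.h>
#include <stdbool.h>
#include <string.h>

/* True if 'path' exists and holds at least one entry other than "."
 * and "..". A missing directory is treated as "no IOMMU". */
bool iommu_groups_nonempty(const char *path)
{
	DIR *dir = opendir(path);
	struct dirent *d;
	bool found = false;

	if (dir == NULL)
		return false;
	while ((d = readdir(dir)) != NULL) {
		if (strcmp(d->d_name, ".") != 0 && strcmp(d->d_name, "..") != 0) {
			found = true;
			break;
		}
	}
	closedir(dir);
	return found;
}
```

Called as iommu_groups_nonempty("/sys/kernel/iommu_groups"), it typically returns true on a host where the kernel has set up IOMMU groups, which steers the fallback toward RTE_IOVA_VA.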