bus/pci: fix selection of default device NUMA node

Message ID 20211026090610.10823-1-houssem.bouhlel@6wind.com (mailing list archive)
State Rejected, archived
Delegated to: David Marchand
Series bus/pci: fix selection of default device NUMA node

Checks

Context                         Check    Description
ci/checkpatch                   success  coding style OK
ci/github-robot: build          success  github build: passed
ci/iol-broadcom-Performance     success  Performance Testing PASS
ci/Intel-compilation            success  Compilation OK
ci/iol-broadcom-Functional      success  Functional Testing PASS
ci/intel-Testing                success  Testing PASS
ci/iol-x86_64-unit-testing      success  Testing PASS
ci/iol-x86_64-compile-testing   success  Testing PASS
ci/iol-mellanox-Performance     fail     Performance Testing issues
ci/iol-intel-Performance        success  Performance Testing PASS
ci/iol-intel-Functional         success  Functional Testing PASS
ci/iol-aarch64-compile-testing  success  Testing PASS
ci/iol-aarch64-unit-testing     success  Testing PASS

Commit Message

Houssem Bouhlel Oct. 26, 2021, 9:06 a.m. UTC
Device binding can fail when no hugepages
are allocated for socket 0.
To avoid this, set the device NUMA node based on
the socket of the first lcore instead of 0.

Fixes: 831dba47bd36 ("bus/vmbus: add Hyper-V virtual bus support")
Cc: stable@dpdk.org

Signed-off-by: Houssem Bouhlel <houssem.bouhlel@6wind.com>
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
 drivers/bus/pci/pci_common.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
  

Comments

Olivier Matz Oct. 26, 2021, 9:17 a.m. UTC | #1
On Tue, Oct 26, 2021 at 11:06:10AM +0200, Houssem Bouhlel wrote:
> There can be dev binding issue when no hugepages
> are allocated for socket 0.
> To avoid this, set device numa node value based on
> the first lcore instead of 0.
> 
> Fixes: 831dba47bd36 ("bus/vmbus: add Hyper-V virtual bus support")

Sorry, the Fixes line is wrong. This is the correct one:
Fixes: 8a04cb612589 ("pci: set default numa node for broken systems")

> Cc: stable@dpdk.org
> 
> Signed-off-by: Houssem Bouhlel <houssem.bouhlel@6wind.com>
> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> ---
>  drivers/bus/pci/pci_common.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
> index f8fff2c98ebf..c70ab2373c79 100644
> --- a/drivers/bus/pci/pci_common.c
> +++ b/drivers/bus/pci/pci_common.c
> @@ -166,6 +166,7 @@ rte_pci_probe_one_driver(struct rte_pci_driver *dr,
>  			 struct rte_pci_device *dev)
>  {
>  	int ret;
> +	unsigned int socket_id;
>  	bool already_probed;
>  	struct rte_pci_addr *loc;
>  
> @@ -194,7 +195,8 @@ rte_pci_probe_one_driver(struct rte_pci_driver *dr,
>  		if (rte_socket_count() > 1)
>  			RTE_LOG(INFO, EAL, "Device %s is not NUMA-aware, defaulting socket to 0\n",
>  					dev->name);

One more comment (sorry, I should have made it before you sent the mail):
We should move this log below, and use the socket_id instead of 0.

> -		dev->device.numa_node = 0;
> +		socket_id = rte_lcore_to_socket_id(rte_get_next_lcore(-1, 0, 0));
> +		dev->device.numa_node = socket_id;
>  	}
>  
>  	already_probed = rte_dev_is_probed(&dev->device);
> -- 
> 2.30.2
>
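
A minimal sketch of the reordering suggested above, not the actual DPDK
code. It assumes the surrounding condition is "the node is still unknown
(negative)". All names are hypothetical stand-ins: `struct fake_device`
for `struct rte_pci_device`, `first_lcore_socket()` for the patch's
`rte_lcore_to_socket_id(rte_get_next_lcore(-1, 0, 0))` lookup, and
`snprintf` for `RTE_LOG`. The only point is the order: compute the socket
first, then log the value that was actually chosen.

```c
#include <stdio.h>

/* Hypothetical stand-in for struct rte_pci_device; only the fields used here. */
struct fake_device {
	const char *name;
	int numa_node;	/* negative means the bus could not determine the node */
};

/* Hypothetical stand-in for the lcore-to-socket lookup in the patch:
 * lcore_socket[] maps lcore index to socket, entry 0 being the first lcore. */
static unsigned int
first_lcore_socket(const unsigned int *lcore_socket)
{
	return lcore_socket[0];
}

/* Pick the default node first, then log the value actually chosen.
 * The log line is written to msg (left empty if nothing was logged). */
static void
set_default_numa_node(struct fake_device *dev, const unsigned int *lcore_socket,
		      unsigned int socket_count, char *msg, size_t len)
{
	unsigned int socket_id;

	msg[0] = '\0';
	if (dev->numa_node >= 0)
		return;	/* node already known, nothing to do */

	socket_id = first_lcore_socket(lcore_socket);
	if (socket_count > 1)
		snprintf(msg, len,
			 "Device %s is not NUMA-aware, defaulting socket to %u\n",
			 dev->name, socket_id);
	dev->numa_node = (int)socket_id;
}
```

With this ordering the message can no longer claim socket 0 when a
different socket was selected.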
  
Olivier Matz Oct. 29, 2021, 8:44 a.m. UTC | #2
+CC David

On Tue, Oct 26, 2021 at 11:17:08AM +0200, Olivier Matz wrote:
> On Tue, Oct 26, 2021 at 11:06:10AM +0200, Houssem Bouhlel wrote:
> > There can be dev binding issue when no hugepages
> > are allocated for socket 0.
> > To avoid this, set device numa node value based on
> > the first lcore instead of 0.
> > 
> > Fixes: 831dba47bd36 ("bus/vmbus: add Hyper-V virtual bus support")
> 
> Sorry, the Fixes line is wrong. This is the correct one:
> Fixes: 8a04cb612589 ("pci: set default numa node for broken systems")
> 
> > Cc: stable@dpdk.org
> > 
> > Signed-off-by: Houssem Bouhlel <houssem.bouhlel@6wind.com>
> > Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> > ---
> >  drivers/bus/pci/pci_common.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
> > index f8fff2c98ebf..c70ab2373c79 100644
> > --- a/drivers/bus/pci/pci_common.c
> > +++ b/drivers/bus/pci/pci_common.c
> > @@ -166,6 +166,7 @@ rte_pci_probe_one_driver(struct rte_pci_driver *dr,
> >  			 struct rte_pci_device *dev)
> >  {
> >  	int ret;
> > +	unsigned int socket_id;
> >  	bool already_probed;
> >  	struct rte_pci_addr *loc;
> >  
> > @@ -194,7 +195,8 @@ rte_pci_probe_one_driver(struct rte_pci_driver *dr,
> >  		if (rte_socket_count() > 1)
> >  			RTE_LOG(INFO, EAL, "Device %s is not NUMA-aware, defaulting socket to 0\n",
> >  					dev->name);
> 
> One more comment (sorry, I should have done it before you send the mail):
> We should move this log below, and use the socket_id instead of 0.
> 
> > -		dev->device.numa_node = 0;
> > +		socket_id = rte_lcore_to_socket_id(rte_get_next_lcore(-1, 0, 0));
> > +		dev->device.numa_node = socket_id;

After some offline discussions with David, some additional comments:

- a similar change may be needed in other bus drivers

- instead of setting the numa node to an existing socket, it can make
  more sense to keep its value as unknown (-1). This would however be a
  behavior change for the pci bus, which has returned 0 since 2015 for
  unknown cases. See:
    81f8d2317df2 ("eal/linux: fix socket value for undetermined numa node")
    8a04cb612589 ("pci: set default numa node for broken systems")

I tend to be in favor of using -1. Any other opinion?
Should we announce a behavior change in this case?

> >  	}
> >  
> >  	already_probed = rte_dev_is_probed(&dev->device);
> > -- 
> > 2.30.2
> >
  
David Marchand Nov. 3, 2021, 8:36 p.m. UTC | #3
On Fri, Oct 29, 2021 at 10:45 AM Olivier Matz <olivier.matz@6wind.com> wrote:
>
> +CC David
>
> On Tue, Oct 26, 2021 at 11:17:08AM +0200, Olivier Matz wrote:
> > On Tue, Oct 26, 2021 at 11:06:10AM +0200, Houssem Bouhlel wrote:
> > > There can be dev binding issue when no hugepages
> > > are allocated for socket 0.
> > > To avoid this, set device numa node value based on
> > > the first lcore instead of 0.
> > >
> > > Fixes: 831dba47bd36 ("bus/vmbus: add Hyper-V virtual bus support")
> >
> > Sorry, the Fixes line is wrong. This is the correct one:
> > Fixes: 8a04cb612589 ("pci: set default numa node for broken systems")
> >
> > > Cc: stable@dpdk.org
> > >
> > > Signed-off-by: Houssem Bouhlel <houssem.bouhlel@6wind.com>
> > > Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> > > ---
> > >  drivers/bus/pci/pci_common.c | 4 +++-
> > >  1 file changed, 3 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
> > > index f8fff2c98ebf..c70ab2373c79 100644
> > > --- a/drivers/bus/pci/pci_common.c
> > > +++ b/drivers/bus/pci/pci_common.c
> > > @@ -166,6 +166,7 @@ rte_pci_probe_one_driver(struct rte_pci_driver *dr,
> > >                      struct rte_pci_device *dev)
> > >  {
> > >     int ret;
> > > +   unsigned int socket_id;
> > >     bool already_probed;
> > >     struct rte_pci_addr *loc;
> > >
> > > @@ -194,7 +195,8 @@ rte_pci_probe_one_driver(struct rte_pci_driver *dr,
> > >             if (rte_socket_count() > 1)
> > >                     RTE_LOG(INFO, EAL, "Device %s is not NUMA-aware, defaulting socket to 0\n",
> > >                                     dev->name);
> >
> > One more comment (sorry, I should have done it before you send the mail):
> > We should move this log below, and use the socket_id instead of 0.
> >
> > > -           dev->device.numa_node = 0;
> > > +           socket_id = rte_lcore_to_socket_id(rte_get_next_lcore(-1, 0, 0));
> > > +           dev->device.numa_node = socket_id;
>
> After some offline discussions with David, some additional comments:
>
> - a similar change may be needed in other bus drivers
>
> - instead of setting the numa node to an existing socket, it can make
>   more sense to keep its value to unknown (-1). This would however be a
>   behavior change for pci bus, which returns 0 since 2015 for unknown
>   cases. See:
>     81f8d2317df2 ("eal/linux: fix socket value for undetermined numa node")
>     8a04cb612589 ("pci: set default numa node for broken systems")
>
> I'll tend to be in favor of using -1. Any other opinion?
> Should we announce a behavior change in this case?

Good summary.
I copied some more people.

I am for -1 too (as a way to indicate "I don't know what this PCI
device affinity is").

It is dangerous to change now, and I think it is late for 21.11.
  
Olivier Matz Nov. 4, 2021, 8:57 a.m. UTC | #4
On Wed, Nov 03, 2021 at 09:36:49PM +0100, David Marchand wrote:
> On Fri, Oct 29, 2021 at 10:45 AM Olivier Matz <olivier.matz@6wind.com> wrote:
> >
> > +CC David
> >
> > On Tue, Oct 26, 2021 at 11:17:08AM +0200, Olivier Matz wrote:
> > > On Tue, Oct 26, 2021 at 11:06:10AM +0200, Houssem Bouhlel wrote:
> > > > There can be dev binding issue when no hugepages
> > > > are allocated for socket 0.
> > > > To avoid this, set device numa node value based on
> > > > the first lcore instead of 0.
> > > >
> > > > Fixes: 831dba47bd36 ("bus/vmbus: add Hyper-V virtual bus support")
> > >
> > > Sorry, the Fixes line is wrong. This is the correct one:
> > > Fixes: 8a04cb612589 ("pci: set default numa node for broken systems")
> > >
> > > > Cc: stable@dpdk.org
> > > >
> > > > Signed-off-by: Houssem Bouhlel <houssem.bouhlel@6wind.com>
> > > > Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> > > > ---
> > > >  drivers/bus/pci/pci_common.c | 4 +++-
> > > >  1 file changed, 3 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
> > > > index f8fff2c98ebf..c70ab2373c79 100644
> > > > --- a/drivers/bus/pci/pci_common.c
> > > > +++ b/drivers/bus/pci/pci_common.c
> > > > @@ -166,6 +166,7 @@ rte_pci_probe_one_driver(struct rte_pci_driver *dr,
> > > >                      struct rte_pci_device *dev)
> > > >  {
> > > >     int ret;
> > > > +   unsigned int socket_id;
> > > >     bool already_probed;
> > > >     struct rte_pci_addr *loc;
> > > >
> > > > @@ -194,7 +195,8 @@ rte_pci_probe_one_driver(struct rte_pci_driver *dr,
> > > >             if (rte_socket_count() > 1)
> > > >                     RTE_LOG(INFO, EAL, "Device %s is not NUMA-aware, defaulting socket to 0\n",
> > > >                                     dev->name);
> > >
> > > One more comment (sorry, I should have done it before you send the mail):
> > > We should move this log below, and use the socket_id instead of 0.
> > >
> > > > -           dev->device.numa_node = 0;
> > > > +           socket_id = rte_lcore_to_socket_id(rte_get_next_lcore(-1, 0, 0));
> > > > +           dev->device.numa_node = socket_id;
> >
> > After some offline discussions with David, some additional comments:
> >
> > - a similar change may be needed in other bus drivers
> >
> > - instead of setting the numa node to an existing socket, it can make
> >   more sense to keep its value to unknown (-1). This would however be a
> >   behavior change for pci bus, which returns 0 since 2015 for unknown
> >   cases. See:
> >     81f8d2317df2 ("eal/linux: fix socket value for undetermined numa node")
> >     8a04cb612589 ("pci: set default numa node for broken systems")
> >
> > I'll tend to be in favor of using -1. Any other opinion?
> > Should we announce a behavior change in this case?
> 
> Good summary.
> I copied some more people.
> 
> I am for -1 too (as a way to indicate "I don't know what this PCI
> device affinity is").
> 
> It is dangerous to change now, and I think it is late for 21.11.

+1, we can make an announcement and change this for the next version.
  
Thomas Monjalon July 14, 2022, 1:46 p.m. UTC | #5
03/11/2021 21:36, David Marchand:
> On Fri, Oct 29, 2021 at 10:45 AM Olivier Matz <olivier.matz@6wind.com> wrote:
> >
> > +CC David
> >
> > On Tue, Oct 26, 2021 at 11:17:08AM +0200, Olivier Matz wrote:
> > > On Tue, Oct 26, 2021 at 11:06:10AM +0200, Houssem Bouhlel wrote:
> > > > There can be dev binding issue when no hugepages
> > > > are allocated for socket 0.
> > > > To avoid this, set device numa node value based on
> > > > the first lcore instead of 0.
> > > >
> > > > Fixes: 831dba47bd36 ("bus/vmbus: add Hyper-V virtual bus support")
> > >
> > > Sorry, the Fixes line is wrong. This is the correct one:
> > > Fixes: 8a04cb612589 ("pci: set default numa node for broken systems")
> > >
> > > > Cc: stable@dpdk.org
> > > >
> > > > Signed-off-by: Houssem Bouhlel <houssem.bouhlel@6wind.com>
> > > > Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> > > > ---
> > > >  drivers/bus/pci/pci_common.c | 4 +++-
> > > >  1 file changed, 3 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
> > > > index f8fff2c98ebf..c70ab2373c79 100644
> > > > --- a/drivers/bus/pci/pci_common.c
> > > > +++ b/drivers/bus/pci/pci_common.c
> > > > @@ -166,6 +166,7 @@ rte_pci_probe_one_driver(struct rte_pci_driver *dr,
> > > >                      struct rte_pci_device *dev)
> > > >  {
> > > >     int ret;
> > > > +   unsigned int socket_id;
> > > >     bool already_probed;
> > > >     struct rte_pci_addr *loc;
> > > >
> > > > @@ -194,7 +195,8 @@ rte_pci_probe_one_driver(struct rte_pci_driver *dr,
> > > >             if (rte_socket_count() > 1)
> > > >                     RTE_LOG(INFO, EAL, "Device %s is not NUMA-aware, defaulting socket to 0\n",
> > > >                                     dev->name);
> > >
> > > One more comment (sorry, I should have done it before you send the mail):
> > > We should move this log below, and use the socket_id instead of 0.
> > >
> > > > -           dev->device.numa_node = 0;
> > > > +           socket_id = rte_lcore_to_socket_id(rte_get_next_lcore(-1, 0, 0));
> > > > +           dev->device.numa_node = socket_id;
> >
> > After some offline discussions with David, some additional comments:
> >
> > - a similar change may be needed in other bus drivers

Yes, we need to be consistent.
You need to check what is done in all OSes as well.
Example of a place to look at:
	3c6e58102510 ("bus/pci: fix unknown NUMA node value on Windows")

> > - instead of setting the numa node to an existing socket, it can make
> >   more sense to keep its value to unknown (-1). This would however be a
> >   behavior change for pci bus, which returns 0 since 2015 for unknown
> >   cases. See:
> >     81f8d2317df2 ("eal/linux: fix socket value for undetermined numa node")
> >     8a04cb612589 ("pci: set default numa node for broken systems")
> >
> > I'll tend to be in favor of using -1. Any other opinion?
> > Should we announce a behavior change in this case?
> 
> Good summary.
> I copied some more people.
> 
> I am for -1 too (as a way to indicate "I don't know what this PCI
> device affinity is").

-1 is SOCKET_ID_ANY
I suppose it is OK to use SOCKET_ID_ANY when we have no other info.
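
To make the trade-off concrete, here is a toy model (not DPDK code) of a
per-socket allocator. A request pinned to one socket fails when that
socket has no memory, while a request with the "any" value falls back to
whichever socket has some. `toy_alloc`, `free_pages`, `TOY_NB_SOCKETS`
and `TOY_SOCKET_ID_ANY` are hypothetical names, and the model assumes
that an allocator given SOCKET_ID_ANY applies no NUMA constraint.

```c
#define TOY_SOCKET_ID_ANY (-1)	/* mirrors the -1 discussed above */
#define TOY_NB_SOCKETS 2

/* Try to take `pages` from free_pages[] on the requested socket.
 * Returns the socket the pages came from, or -1 if the request failed. */
static int
toy_alloc(unsigned int *free_pages, int socket, unsigned int pages)
{
	int s;

	if (socket != TOY_SOCKET_ID_ANY) {
		/* Pinned request: only the named socket is considered. */
		if (socket < 0 || socket >= TOY_NB_SOCKETS ||
		    free_pages[socket] < pages)
			return -1;
		free_pages[socket] -= pages;
		return socket;
	}

	/* Unconstrained request: take the first socket with enough pages. */
	for (s = 0; s < TOY_NB_SOCKETS; s++) {
		if (free_pages[s] >= pages) {
			free_pages[s] -= pages;
			return s;
		}
	}
	return -1;
}
```

With `free_pages = { 0, 4 }`, a request pinned to node 0 fails, which
matches the binding issue described in the commit message, while the
same request with `TOY_SOCKET_ID_ANY` is served from socket 1.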
  

Patch

diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index f8fff2c98ebf..c70ab2373c79 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -166,6 +166,7 @@  rte_pci_probe_one_driver(struct rte_pci_driver *dr,
 			 struct rte_pci_device *dev)
 {
 	int ret;
+	unsigned int socket_id;
 	bool already_probed;
 	struct rte_pci_addr *loc;
 
@@ -194,7 +195,8 @@  rte_pci_probe_one_driver(struct rte_pci_driver *dr,
 		if (rte_socket_count() > 1)
 			RTE_LOG(INFO, EAL, "Device %s is not NUMA-aware, defaulting socket to 0\n",
 					dev->name);
-		dev->device.numa_node = 0;
+		socket_id = rte_lcore_to_socket_id(rte_get_next_lcore(-1, 0, 0));
+		dev->device.numa_node = socket_id;
 	}
 
 	already_probed = rte_dev_is_probed(&dev->device);