bus/pci: fix selection of default device NUMA node
Commit Message
There can be a device binding issue when no hugepages
are allocated for socket 0.
To avoid this, set the device NUMA node based on the
socket of the first lcore instead of 0.
Fixes: 831dba47bd36 ("bus/vmbus: add Hyper-V virtual bus support")
Cc: stable@dpdk.org
Signed-off-by: Houssem Bouhlel <houssem.bouhlel@6wind.com>
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
drivers/bus/pci/pci_common.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
Comments
On Tue, Oct 26, 2021 at 11:06:10AM +0200, Houssem Bouhlel wrote:
> There can be dev binding issue when no hugepages
> are allocated for socket 0.
> To avoid this, set device numa node value based on
> the first lcore instead of 0.
>
> Fixes: 831dba47bd36 ("bus/vmbus: add Hyper-V virtual bus support")
Sorry, the Fixes line is wrong. This is the correct one:
Fixes: 8a04cb612589 ("pci: set default numa node for broken systems")
> Cc: stable@dpdk.org
>
> Signed-off-by: Houssem Bouhlel <houssem.bouhlel@6wind.com>
> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> ---
> drivers/bus/pci/pci_common.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
> index f8fff2c98ebf..c70ab2373c79 100644
> --- a/drivers/bus/pci/pci_common.c
> +++ b/drivers/bus/pci/pci_common.c
> @@ -166,6 +166,7 @@ rte_pci_probe_one_driver(struct rte_pci_driver *dr,
> struct rte_pci_device *dev)
> {
> int ret;
> + unsigned int socket_id;
> bool already_probed;
> struct rte_pci_addr *loc;
>
> @@ -194,7 +195,8 @@ rte_pci_probe_one_driver(struct rte_pci_driver *dr,
> if (rte_socket_count() > 1)
> RTE_LOG(INFO, EAL, "Device %s is not NUMA-aware, defaulting socket to 0\n",
> dev->name);
One more comment (sorry, I should have made it before you sent the mail):
we should move this log below, and use the socket_id instead of 0.
> - dev->device.numa_node = 0;
> + socket_id = rte_lcore_to_socket_id(rte_get_next_lcore(-1, 0, 0));
> + dev->device.numa_node = socket_id;
> }
>
> already_probed = rte_dev_is_probed(&dev->device);
> --
> 2.30.2
>
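The review comment above (log after the socket is chosen, and print the chosen value) can be sketched outside DPDK as follows. This is only a model: `model_lcore`, `model_first_lcore_socket()` and `model_fixup_numa_node()` are hypothetical stand-ins, the lcore table replaces `rte_lcore_to_socket_id(rte_get_next_lcore(-1, 0, 0))`, and the "node is negative when unknown" condition is an assumption, since the enclosing `if` is not visible in the quoted hunk.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical model of an lcore table; not a DPDK structure. */
struct model_lcore {
	int enabled;
	unsigned int socket;
};

/* Stand-in for "socket of the first enabled lcore". */
static unsigned int
model_first_lcore_socket(const struct model_lcore *lcores, unsigned int n)
{
	unsigned int i;

	for (i = 0; i < n; i++)
		if (lcores[i].enabled)
			return lcores[i].socket;
	return 0; /* no enabled lcore: this model falls back to 0 */
}

/*
 * If the node is unknown (assumed: negative), pick the first lcore's
 * socket, then log the value that was actually chosen. The message is
 * written to `log` only when more than one socket exists, mirroring the
 * rte_socket_count() > 1 check in the quoted code.
 */
static int
model_fixup_numa_node(int numa_node, const char *name,
		const struct model_lcore *lcores, unsigned int n_lcores,
		unsigned int n_sockets, char *log, size_t log_len)
{
	unsigned int socket_id;

	assert(log != NULL && log_len > 0);
	log[0] = '\0';
	if (numa_node >= 0)
		return numa_node; /* already known, nothing to do */

	socket_id = model_first_lcore_socket(lcores, n_lcores);
	if (n_sockets > 1)
		snprintf(log, log_len,
			"Device %s is not NUMA-aware, defaulting socket to %u",
			name, socket_id);
	return (int)socket_id;
}
```

With the first lcore disabled and the remaining lcores on socket 1, an unknown device resolves to socket 1 and the message names socket 1 rather than a hard-coded 0.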
+CC David
On Tue, Oct 26, 2021 at 11:17:08AM +0200, Olivier Matz wrote:
> On Tue, Oct 26, 2021 at 11:06:10AM +0200, Houssem Bouhlel wrote:
> > There can be dev binding issue when no hugepages
> > are allocated for socket 0.
> > To avoid this, set device numa node value based on
> > the first lcore instead of 0.
> >
> > Fixes: 831dba47bd36 ("bus/vmbus: add Hyper-V virtual bus support")
>
> Sorry, the Fixes line is wrong. This is the correct one:
> Fixes: 8a04cb612589 ("pci: set default numa node for broken systems")
>
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Houssem Bouhlel <houssem.bouhlel@6wind.com>
> > Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> > ---
> > drivers/bus/pci/pci_common.c | 4 +++-
> > 1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
> > index f8fff2c98ebf..c70ab2373c79 100644
> > --- a/drivers/bus/pci/pci_common.c
> > +++ b/drivers/bus/pci/pci_common.c
> > @@ -166,6 +166,7 @@ rte_pci_probe_one_driver(struct rte_pci_driver *dr,
> > struct rte_pci_device *dev)
> > {
> > int ret;
> > + unsigned int socket_id;
> > bool already_probed;
> > struct rte_pci_addr *loc;
> >
> > @@ -194,7 +195,8 @@ rte_pci_probe_one_driver(struct rte_pci_driver *dr,
> > if (rte_socket_count() > 1)
> > RTE_LOG(INFO, EAL, "Device %s is not NUMA-aware, defaulting socket to 0\n",
> > dev->name);
>
> One more comment (sorry, I should have done it before you send the mail):
> We should move this log below, and use the socket_id instead of 0.
>
> > - dev->device.numa_node = 0;
> > + socket_id = rte_lcore_to_socket_id(rte_get_next_lcore(-1, 0, 0));
> > + dev->device.numa_node = socket_id;
After some offline discussions with David, some additional comments:
- a similar change may be needed in other bus drivers
- instead of setting the numa node to an existing socket, it may make
more sense to leave its value as unknown (-1). This would however be a
behavior change for the pci bus, which has returned 0 since 2015 for
unknown cases. See:
81f8d2317df2 ("eal/linux: fix socket value for undetermined numa node")
8a04cb612589 ("pci: set default numa node for broken systems")
I tend to be in favor of using -1. Any other opinion?
Should we announce a behavior change in this case?
> > }
> >
> > already_probed = rte_dev_is_probed(&dev->device);
> > --
> > 2.30.2
> >
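The three options discussed in this mail (keep defaulting to 0, use the first lcore's socket, or leave the node unknown) can be compared with a small standalone model. Everything here is hypothetical: the `model_*` names are not DPDK symbols, and `model_can_allocate()` only assumes that a request for a specific socket fails when that socket has no hugepages, while an "unknown" request may be served from any socket.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_SOCKET_UNKNOWN (-1) /* the "unknown (-1)" value from the thread */

enum model_policy {
	MODEL_DEFAULT_ZERO,       /* behavior described as in place since 2015 */
	MODEL_FIRST_LCORE_SOCKET, /* what the patch proposes */
	MODEL_KEEP_UNKNOWN        /* alternative raised in review */
};

/* Choose the node reported for a device under a given policy. */
static int
model_default_node(enum model_policy policy, int detected_node,
		int first_lcore_socket)
{
	if (detected_node >= 0)
		return detected_node; /* the system reported a node: keep it */

	switch (policy) {
	case MODEL_DEFAULT_ZERO:
		return 0;
	case MODEL_FIRST_LCORE_SOCKET:
		return first_lcore_socket;
	case MODEL_KEEP_UNKNOWN:
	default:
		return MODEL_SOCKET_UNKNOWN;
	}
}

/*
 * Assumed allocator behavior (a model, not DPDK's allocator): a specific
 * socket must have hugepages; an unknown socket is satisfied by any
 * socket that has some.
 */
static bool
model_can_allocate(int node, const bool *has_hugepages, int n_sockets)
{
	int i;

	assert(has_hugepages != NULL && n_sockets > 0);
	if (node >= 0)
		return node < n_sockets && has_hugepages[node];
	for (i = 0; i < n_sockets; i++)
		if (has_hugepages[i])
			return true;
	return false;
}
```

On a two-socket system with hugepages only on socket 1 and the first lcore on socket 1, the model fails only for the default-to-0 policy, which matches the binding issue described in the commit message.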
On Fri, Oct 29, 2021 at 10:45 AM Olivier Matz <olivier.matz@6wind.com> wrote:
>
> +CC David
>
> On Tue, Oct 26, 2021 at 11:17:08AM +0200, Olivier Matz wrote:
> > On Tue, Oct 26, 2021 at 11:06:10AM +0200, Houssem Bouhlel wrote:
> > > There can be dev binding issue when no hugepages
> > > are allocated for socket 0.
> > > To avoid this, set device numa node value based on
> > > the first lcore instead of 0.
> > >
> > > Fixes: 831dba47bd36 ("bus/vmbus: add Hyper-V virtual bus support")
> >
> > Sorry, the Fixes line is wrong. This is the correct one:
> > Fixes: 8a04cb612589 ("pci: set default numa node for broken systems")
> >
> > > Cc: stable@dpdk.org
> > >
> > > Signed-off-by: Houssem Bouhlel <houssem.bouhlel@6wind.com>
> > > Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> > > ---
> > > drivers/bus/pci/pci_common.c | 4 +++-
> > > 1 file changed, 3 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
> > > index f8fff2c98ebf..c70ab2373c79 100644
> > > --- a/drivers/bus/pci/pci_common.c
> > > +++ b/drivers/bus/pci/pci_common.c
> > > @@ -166,6 +166,7 @@ rte_pci_probe_one_driver(struct rte_pci_driver *dr,
> > > struct rte_pci_device *dev)
> > > {
> > > int ret;
> > > + unsigned int socket_id;
> > > bool already_probed;
> > > struct rte_pci_addr *loc;
> > >
> > > @@ -194,7 +195,8 @@ rte_pci_probe_one_driver(struct rte_pci_driver *dr,
> > > if (rte_socket_count() > 1)
> > > RTE_LOG(INFO, EAL, "Device %s is not NUMA-aware, defaulting socket to 0\n",
> > > dev->name);
> >
> > One more comment (sorry, I should have done it before you send the mail):
> > We should move this log below, and use the socket_id instead of 0.
> >
> > > - dev->device.numa_node = 0;
> > > + socket_id = rte_lcore_to_socket_id(rte_get_next_lcore(-1, 0, 0));
> > > + dev->device.numa_node = socket_id;
>
> After some offline discussions with David, some additional comments:
>
> - a similar change may be needed in other bus drivers
>
> - instead of setting the numa node to an existing socket, it can make
> more sense to keep its value to unknown (-1). This would however be a
> behavior change for pci bus, which returns 0 since 2015 for unknown
> cases. See:
> 81f8d2317df2 ("eal/linux: fix socket value for undetermined numa node")
> 8a04cb612589 ("pci: set default numa node for broken systems")
>
> I'll tend to be in favor of using -1. Any other opinion?
> Should we announce a behavior change in this case?
Good summary.
I copied some more people.
I am for -1 too (as a way to indicate "I don't know what this PCI
device affinity is").
It is dangerous to change this now, and I think it is too late for 21.11.
On Wed, Nov 03, 2021 at 09:36:49PM +0100, David Marchand wrote:
> On Fri, Oct 29, 2021 at 10:45 AM Olivier Matz <olivier.matz@6wind.com> wrote:
> >
> > +CC David
> >
> > On Tue, Oct 26, 2021 at 11:17:08AM +0200, Olivier Matz wrote:
> > > On Tue, Oct 26, 2021 at 11:06:10AM +0200, Houssem Bouhlel wrote:
> > > > There can be dev binding issue when no hugepages
> > > > are allocated for socket 0.
> > > > To avoid this, set device numa node value based on
> > > > the first lcore instead of 0.
> > > >
> > > > Fixes: 831dba47bd36 ("bus/vmbus: add Hyper-V virtual bus support")
> > >
> > > Sorry, the Fixes line is wrong. This is the correct one:
> > > Fixes: 8a04cb612589 ("pci: set default numa node for broken systems")
> > >
> > > > Cc: stable@dpdk.org
> > > >
> > > > Signed-off-by: Houssem Bouhlel <houssem.bouhlel@6wind.com>
> > > > Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> > > > ---
> > > > drivers/bus/pci/pci_common.c | 4 +++-
> > > > 1 file changed, 3 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
> > > > index f8fff2c98ebf..c70ab2373c79 100644
> > > > --- a/drivers/bus/pci/pci_common.c
> > > > +++ b/drivers/bus/pci/pci_common.c
> > > > @@ -166,6 +166,7 @@ rte_pci_probe_one_driver(struct rte_pci_driver *dr,
> > > > struct rte_pci_device *dev)
> > > > {
> > > > int ret;
> > > > + unsigned int socket_id;
> > > > bool already_probed;
> > > > struct rte_pci_addr *loc;
> > > >
> > > > @@ -194,7 +195,8 @@ rte_pci_probe_one_driver(struct rte_pci_driver *dr,
> > > > if (rte_socket_count() > 1)
> > > > RTE_LOG(INFO, EAL, "Device %s is not NUMA-aware, defaulting socket to 0\n",
> > > > dev->name);
> > >
> > > One more comment (sorry, I should have done it before you send the mail):
> > > We should move this log below, and use the socket_id instead of 0.
> > >
> > > > - dev->device.numa_node = 0;
> > > > + socket_id = rte_lcore_to_socket_id(rte_get_next_lcore(-1, 0, 0));
> > > > + dev->device.numa_node = socket_id;
> >
> > After some offline discussions with David, some additional comments:
> >
> > - a similar change may be needed in other bus drivers
> >
> > - instead of setting the numa node to an existing socket, it can make
> > more sense to keep its value to unknown (-1). This would however be a
> > behavior change for pci bus, which returns 0 since 2015 for unknown
> > cases. See:
> > 81f8d2317df2 ("eal/linux: fix socket value for undetermined numa node")
> > 8a04cb612589 ("pci: set default numa node for broken systems")
> >
> > I'll tend to be in favor of using -1. Any other opinion?
> > Should we announce a behavior change in this case?
>
> Good summary.
> I copied some more people.
>
> I am for -1 too (as a way to indicate "I don't know what this PCI
> device affinity is").
>
> It is dangerous to change now, and I think it is late for 21.11.
+1, we can announce the change and apply it in the next version.
03/11/2021 21:36, David Marchand:
> On Fri, Oct 29, 2021 at 10:45 AM Olivier Matz <olivier.matz@6wind.com> wrote:
> >
> > +CC David
> >
> > On Tue, Oct 26, 2021 at 11:17:08AM +0200, Olivier Matz wrote:
> > > On Tue, Oct 26, 2021 at 11:06:10AM +0200, Houssem Bouhlel wrote:
> > > > There can be dev binding issue when no hugepages
> > > > are allocated for socket 0.
> > > > To avoid this, set device numa node value based on
> > > > the first lcore instead of 0.
> > > >
> > > > Fixes: 831dba47bd36 ("bus/vmbus: add Hyper-V virtual bus support")
> > >
> > > Sorry, the Fixes line is wrong. This is the correct one:
> > > Fixes: 8a04cb612589 ("pci: set default numa node for broken systems")
> > >
> > > > Cc: stable@dpdk.org
> > > >
> > > > Signed-off-by: Houssem Bouhlel <houssem.bouhlel@6wind.com>
> > > > Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> > > > ---
> > > > drivers/bus/pci/pci_common.c | 4 +++-
> > > > 1 file changed, 3 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
> > > > index f8fff2c98ebf..c70ab2373c79 100644
> > > > --- a/drivers/bus/pci/pci_common.c
> > > > +++ b/drivers/bus/pci/pci_common.c
> > > > @@ -166,6 +166,7 @@ rte_pci_probe_one_driver(struct rte_pci_driver *dr,
> > > > struct rte_pci_device *dev)
> > > > {
> > > > int ret;
> > > > + unsigned int socket_id;
> > > > bool already_probed;
> > > > struct rte_pci_addr *loc;
> > > >
> > > > @@ -194,7 +195,8 @@ rte_pci_probe_one_driver(struct rte_pci_driver *dr,
> > > > if (rte_socket_count() > 1)
> > > > RTE_LOG(INFO, EAL, "Device %s is not NUMA-aware, defaulting socket to 0\n",
> > > > dev->name);
> > >
> > > One more comment (sorry, I should have done it before you send the mail):
> > > We should move this log below, and use the socket_id instead of 0.
> > >
> > > > - dev->device.numa_node = 0;
> > > > + socket_id = rte_lcore_to_socket_id(rte_get_next_lcore(-1, 0, 0));
> > > > + dev->device.numa_node = socket_id;
> >
> > After some offline discussions with David, some additional comments:
> >
> > - a similar change may be needed in other bus drivers
Yes, we need to be consistent.
You need to check what is done in all OSes as well.
Example of a place to look at:
3c6e58102510 ("bus/pci: fix unknown NUMA node value on Windows")
> > - instead of setting the numa node to an existing socket, it can make
> > more sense to keep its value to unknown (-1). This would however be a
> > behavior change for pci bus, which returns 0 since 2015 for unknown
> > cases. See:
> > 81f8d2317df2 ("eal/linux: fix socket value for undetermined numa node")
> > 8a04cb612589 ("pci: set default numa node for broken systems")
> >
> > I'll tend to be in favor of using -1. Any other opinion?
> > Should we announce a behavior change in this case?
>
> Good summary.
> I copied some more people.
>
> I am for -1 too (as a way to indicate "I don't know what this PCI
> device affinity is").
-1 is SOCKET_ID_ANY
I suppose it is OK to use SOCKET_ID_ANY when we have no other info.
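One reason such a change needs an announcement is that callers written against the "always >= 0" behavior may use the node as an index. The sketch below shows a guard a consumer might add; it is a standalone illustration, with `model_socket_stats`, `model_account_device()` and `MODEL_MAX_SOCKETS` invented for the example. Only the value of `SOCKET_ID_ANY` (-1) comes from the thread.

```c
#include <assert.h>
#include <stddef.h>

#ifndef SOCKET_ID_ANY
#define SOCKET_ID_ANY (-1) /* same value as stated in the thread */
#endif

#define MODEL_MAX_SOCKETS 4 /* hypothetical bound for this example */

struct model_socket_stats {
	unsigned long n_devices;
};

/*
 * Account a device against its socket. A node of SOCKET_ID_ANY (or any
 * out-of-range value) is counted separately instead of being used as an
 * array index. Returns the slot used, or SOCKET_ID_ANY.
 */
static int
model_account_device(struct model_socket_stats *per_socket,
		unsigned long *n_any, int numa_node)
{
	assert(per_socket != NULL && n_any != NULL);
	if (numa_node < 0 || numa_node >= MODEL_MAX_SOCKETS) {
		(*n_any)++;
		return SOCKET_ID_ANY;
	}
	per_socket[numa_node].n_devices++;
	return numa_node;
}
```

Code that skips this kind of check and indexes with the node directly would read or write out of bounds once -1 is reported.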
diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index f8fff2c98ebf..c70ab2373c79 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -166,6 +166,7 @@ rte_pci_probe_one_driver(struct rte_pci_driver *dr,
 			 struct rte_pci_device *dev)
 {
 	int ret;
+	unsigned int socket_id;
 	bool already_probed;
 	struct rte_pci_addr *loc;
 
@@ -194,7 +195,8 @@ rte_pci_probe_one_driver(struct rte_pci_driver *dr,
 		if (rte_socket_count() > 1)
 			RTE_LOG(INFO, EAL, "Device %s is not NUMA-aware, defaulting socket to 0\n",
 					dev->name);
-		dev->device.numa_node = 0;
+		socket_id = rte_lcore_to_socket_id(rte_get_next_lcore(-1, 0, 0));
+		dev->device.numa_node = socket_id;
 	}
 
 	already_probed = rte_dev_is_probed(&dev->device);