mem: fix allocation failure on non-NUMA kernel
Commit Message
Running dpdk-helloworld on Linux with libnuma present,
but no kernel support for NUMA (CONFIG_NUMA=n) causes
rte_service_init() to fail with "EAL: error allocating
rte services array".
alloc_seg() calls get_mempolicy to verify that the allocation
has happened on the correct socket, but receives ENOSYS from
the kernel and fails the allocation.
The allocated socket should only be verified if check_numa() is true.
Fixes: 2a96c88be83e ("mem: ease init in a docker container")
Cc: nicolas.dichtel@6wind.com
Cc: stable@dpdk.org
Signed-off-by: Nick Connolly <nick.connolly@mayadata.io>
---
lib/librte_eal/linux/eal_memalloc.c | 28 +++++++++++++++++-----------
1 file changed, 17 insertions(+), 11 deletions(-)
Comments
On 05/08/2020 14:26, Nick Connolly wrote:
> Running dpdk-helloworld on Linux with lib numa present,
> but no kernel support for NUMA (CONFIG_NUMA=n) causes
> rte_service_init() to fail with EAL: error allocating
> rte services array.
>
> alloc_seg() calls get_mempolicy to verify that the allocation
> has happened on the correct socket, but receives ENOSYS from
> the kernel and fails the allocation.
>
> The allocated socket should only be verified if check_numa() is true.
>
> Fixes: 2a96c88be83e ("mem: ease init in a docker container")
I'm wondering if the bug existed before this commit.
Before this commit, it was:
move_pages(getpid(), 1, &addr, NULL, &cur_socket_id, 0);
if (cur_socket_id != socket_id) {
/* error */
Isn't it possible to hit this error case if CONFIG_NUMA is unset in the kernel?
[snip]
> + if (check_numa()) {
> + ret = get_mempolicy(&cur_socket_id, NULL, 0, addr,
> + MPOL_F_NODE | MPOL_F_ADDR);
> + if (ret < 0) {
> + RTE_LOG(DEBUG, EAL, "%s(): get_mempolicy: %s\n",
> + __func__, strerror(errno));
> + goto mapped;
> + } else if (cur_socket_id != socket_id) {
> + RTE_LOG(DEBUG, EAL,
> + "%s(): allocation happened on wrong socket (wanted %d, got %d)\n",
> + __func__, socket_id, cur_socket_id);
> + goto mapped;
> + }
> + } else {
> + if (rte_socket_count() > 1)
> + RTE_LOG(DEBUG, EAL, "%s(): not checking socket for allocation (wanted %d)\n",
> + __func__, socket_id);
nit: maybe a higher log level like WARNING?
Regards,
Nicolas
Hi Nicolas,
Thanks for the quick response.
On 05/08/2020 14:42, Nicolas Dichtel wrote:
> On 05/08/2020 14:26, Nick Connolly wrote:
>> Running dpdk-helloworld on Linux with lib numa present,
>> but no kernel support for NUMA (CONFIG_NUMA=n) causes
>> rte_service_init() to fail with EAL: error allocating
>> rte services array.
>>
>> alloc_seg() calls get_mempolicy to verify that the allocation
>> has happened on the correct socket, but receives ENOSYS from
>> the kernel and fails the allocation.
>>
>> The allocated socket should only be verified if check_numa() is true.
>>
>> Fixes: 2a96c88be83e ("mem: ease init in a docker container")
> I'm wondering if the bug existed before this commit.
>
> Before this commit, it was:
> move_pages(getpid(), 1, &addr, NULL, &cur_socket_id, 0);
> if (cur_socket_id != socket_id) {
> /* error */
>
> Isn't it possible to hit this error case if CONFIG_NUMA is unset in the kernel?
I've just run the previous code to test this out and you are right that
move_pages does indeed return -1 with errno set to ENOSYS, but nothing
checks this so execution carries on and compares cur_socket_id (which
will be unchanged from the zero initialization) with socket_id (which is
presumably also zero), thus allowing the allocation to succeed!
> [snip]
>> + if (check_numa()) {
>> + ret = get_mempolicy(&cur_socket_id, NULL, 0, addr,
>> + MPOL_F_NODE | MPOL_F_ADDR);
>> + if (ret < 0) {
>> + RTE_LOG(DEBUG, EAL, "%s(): get_mempolicy: %s\n",
>> + __func__, strerror(errno));
>> + goto mapped;
>> + } else if (cur_socket_id != socket_id) {
>> + RTE_LOG(DEBUG, EAL,
>> + "%s(): allocation happened on wrong socket (wanted %d, got %d)\n",
>> + __func__, socket_id, cur_socket_id);
>> + goto mapped;
>> + }
>> + } else {
>> + if (rte_socket_count() > 1)
>> + RTE_LOG(DEBUG, EAL, "%s(): not checking socket for allocation (wanted %d)\n",
>> + __func__, socket_id);
> nit: maybe a higher log level like WARNING?
Open to guidance here - my concern is that this message will be
generated for every call to alloc_seg() and I'm not sure what the
frequency will be - I'm cautious about flooding the log with warnings
under 'normal running'. Are the implications of running on a
multi-socket system with NUMA support disabled in the kernel purely
performance related for DPDK, or is there a functional correctness
issue as well?
>
> Regards,
> Nicolas
Regards,
Nick
On 05/08/2020 16:20, Nick Connolly wrote:
[snip]
>>> Fixes: 2a96c88be83e ("mem: ease init in a docker container")
>> I'm wondering if the bug existed before this commit.
>>
>> Before this commit, it was:
>> move_pages(getpid(), 1, &addr, NULL, &cur_socket_id, 0);
>> if (cur_socket_id != socket_id) {
>> /* error */
>>
>> Isn't it possible to hit this error case if CONFIG_NUMA is unset in the kernel?
> I've just run the previous code to test this out and you are right that
> move_pages does indeed return -1 with errno set to ENOSYS, but nothing checks
> this so execution carries on and compares cur_socket_id (which will be unchanged
> from the zero initialization) with socket_id (which is presumably also zero),
> thus allowing the allocation to succeed!
I came to this conclusion, but I didn't check if socket_id could be different from 0.
>> [snip]
>>> + if (check_numa()) {
>>> + ret = get_mempolicy(&cur_socket_id, NULL, 0, addr,
>>> + MPOL_F_NODE | MPOL_F_ADDR);
>>> + if (ret < 0) {
>>> + RTE_LOG(DEBUG, EAL, "%s(): get_mempolicy: %s\n",
>>> + __func__, strerror(errno));
>>> + goto mapped;
>>> + } else if (cur_socket_id != socket_id) {
>>> + RTE_LOG(DEBUG, EAL,
>>> + "%s(): allocation happened on wrong socket (wanted %d, got %d)\n",
>>> + __func__, socket_id, cur_socket_id);
>>> + goto mapped;
>>> + }
>>> + } else {
>>> + if (rte_socket_count() > 1)
>>> + RTE_LOG(DEBUG, EAL, "%s(): not checking socket for allocation (wanted %d)\n",
>>> + __func__, socket_id);
>> nit: maybe a higher log level like WARNING?
> Open to guidance here - my concern was that this is going to be generated for
> every call to alloc_seg() and I'm not sure what the frequency will be - I'm
> cautious about flooding the log with warnings under 'normal running'. Are the
> implications of running on a multi socket system with NUMA support disabled in
> the kernel purely performance related for the DPDK or is there a functional
> correctness issue as well?
Is it really 'normal running' to have CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES in
dpdk and not CONFIG_NUMA in the kernel?
Regards,
Nicolas
On 05/08/2020 15:36, Nicolas Dichtel wrote:
> On 05/08/2020 16:20, Nick Connolly wrote:
> [snip]
>>>> Fixes: 2a96c88be83e ("mem: ease init in a docker container")
>>> I'm wondering if the bug existed before this commit.
>>>
>>> Before this commit, it was:
>>> move_pages(getpid(), 1, &addr, NULL, &cur_socket_id, 0);
>>> if (cur_socket_id != socket_id) {
>>> /* error */
>>>
>>> Isn't it possible to hit this error case if CONFIG_NUMA is unset in the kernel?
>> I've just run the previous code to test this out and you are right that
>> move_pages does indeed return -1 with errno set to ENOSYS, but nothing checks
>> this so execution carries on and compares cur_socket_id (which will be unchanged
>> from the zero initialization) with socket_id (which is presumably also zero),
>> thus allowing the allocation to succeed!
> I came to this conclusion, but I didn't check if socket_id could be different from 0.
>
>>> [snip]
>>>> + if (check_numa()) {
>>>> + ret = get_mempolicy(&cur_socket_id, NULL, 0, addr,
>>>> + MPOL_F_NODE | MPOL_F_ADDR);
>>>> + if (ret < 0) {
>>>> + RTE_LOG(DEBUG, EAL, "%s(): get_mempolicy: %s\n",
>>>> + __func__, strerror(errno));
>>>> + goto mapped;
>>>> + } else if (cur_socket_id != socket_id) {
>>>> + RTE_LOG(DEBUG, EAL,
>>>> + "%s(): allocation happened on wrong socket (wanted %d, got %d)\n",
>>>> + __func__, socket_id, cur_socket_id);
>>>> + goto mapped;
>>>> + }
>>>> + } else {
>>>> + if (rte_socket_count() > 1)
>>>> + RTE_LOG(DEBUG, EAL, "%s(): not checking socket for allocation (wanted %d)\n",
>>>> + __func__, socket_id);
>>> nit: maybe a higher log level like WARNING?
>> Open to guidance here - my concern was that this is going to be generated for
>> every call to alloc_seg() and I'm not sure what the frequency will be - I'm
>> cautious about flooding the log with warnings under 'normal running'. Are the
>> implications of running on a multi socket system with NUMA support disabled in
>> the kernel purely performance related for the DPDK or is there a functional
>> correctness issue as well?
> Is it really a 'normal running' to have CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES in
> dpdk and not CONFIG_NUMA in the kernel?
I'm not an expert on DPDK, but I think it needs to be treated as 'normal
running', for the following reasons:
1. The existing code in eal_memalloc_alloc_seg_bulk() is designed to
work even if check_numa() indicates that NUMA support is not enabled:
#ifdef RTE_EAL_NUMA_AWARE_HUGEPAGES
if (check_numa()) {
oldmask = numa_allocate_nodemask();
prepare_numa(&oldpolicy, oldmask, socket);
have_numa = true;
}
#endif
2. The DPDK application could be built with
CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES and then the binary run on
different systems with and without NUMA support.
Regards,
Nick
On 05/08/2020 16:53, Nick Connolly wrote:
[snip]
>>>>> + if (check_numa()) {
>>>>> + ret = get_mempolicy(&cur_socket_id, NULL, 0, addr,
>>>>> + MPOL_F_NODE | MPOL_F_ADDR);
>>>>> + if (ret < 0) {
>>>>> + RTE_LOG(DEBUG, EAL, "%s(): get_mempolicy: %s\n",
>>>>> + __func__, strerror(errno));
>>>>> + goto mapped;
>>>>> + } else if (cur_socket_id != socket_id) {
>>>>> + RTE_LOG(DEBUG, EAL,
>>>>> + "%s(): allocation happened on wrong socket (wanted %d, got %d)\n",
>>>>> + __func__, socket_id, cur_socket_id);
>>>>> + goto mapped;
>>>>> + }
>>>>> + } else {
>>>>> + if (rte_socket_count() > 1)
>>>>> + RTE_LOG(DEBUG, EAL, "%s(): not checking socket for allocation (wanted %d)\n",
>>>>> + __func__, socket_id);
>>>> nit: maybe a higher log level like WARNING?
>>> Open to guidance here - my concern was that this is going to be generated for
>>> every call to alloc_seg() and I'm not sure what the frequency will be - I'm
>>> cautious about flooding the log with warnings under 'normal running'. Are the
>>> implications of running on a multi socket system with NUMA support disabled in
>>> the kernel purely performance related for the DPDK or is there a functional
>>> correctness issue as well?
>> Is it really a 'normal running' to have CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES in
>> dpdk and not CONFIG_NUMA in the kernel?
>
> I'm not an expert of DPDK, but I think it needs to be treated as 'normal
> running', for the following reasons:
>
> 1. The existing code in eal_memalloc_alloc_seg_bulk() is designed to
> work even if check_numa() indicates that NUMA support is not enabled:
>
> #ifdef RTE_EAL_NUMA_AWARE_HUGEPAGES
> if (check_numa()) {
> oldmask = numa_allocate_nodemask();
> prepare_numa(&oldpolicy, oldmask, socket);
> have_numa = true;
> }
> #endif
The question was not to return an error, but to display a warning. So the code
will work (after your patch), no problem.
> 2. The DPDK application could be built with
> CONFIG_RTE_EAL_NUMA_AWARE_HUGE_PAGES and then the binary run on
> different systems with and without NUMA support.
In a production environment, it seems odd to have a custom kernel and a generic
dpdk app; that's why I propose the log level WARNING (or NOTICE maybe?).
I'll let others comment on this; I don't have a strong opinion.
Regards,
Nicolas
On 05/08/2020 16:13, Nicolas Dichtel wrote:
> On 05/08/2020 16:53, Nick Connolly wrote:
> [snip]
>>>>>> + if (check_numa()) {
>>>>>> + ret = get_mempolicy(&cur_socket_id, NULL, 0, addr,
>>>>>> + MPOL_F_NODE | MPOL_F_ADDR);
>>>>>> + if (ret < 0) {
>>>>>> + RTE_LOG(DEBUG, EAL, "%s(): get_mempolicy: %s\n",
>>>>>> + __func__, strerror(errno));
>>>>>> + goto mapped;
>>>>>> + } else if (cur_socket_id != socket_id) {
>>>>>> + RTE_LOG(DEBUG, EAL,
>>>>>> + "%s(): allocation happened on wrong socket (wanted %d, got %d)\n",
>>>>>> + __func__, socket_id, cur_socket_id);
>>>>>> + goto mapped;
>>>>>> + }
>>>>>> + } else {
>>>>>> + if (rte_socket_count() > 1)
>>>>>> + RTE_LOG(DEBUG, EAL, "%s(): not checking socket for allocation (wanted %d)\n",
>>>>>> + __func__, socket_id);
>>>>> nit: maybe a higher log level like WARNING?
>>>> Open to guidance here - my concern was that this is going to be generated for
>>>> every call to alloc_seg() and I'm not sure what the frequency will be - I'm
>>>> cautious about flooding the log with warnings under 'normal running'. Are the
>>>> implications of running on a multi socket system with NUMA support disabled in
>>>> the kernel purely performance related for the DPDK or is there a functional
>>>> correctness issue as well?
>>> Is it really a 'normal running' to have CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES in
>>> dpdk and not CONFIG_NUMA in the kernel?
>> I'm not an expert of DPDK, but I think it needs to be treated as 'normal
>> running', for the following reasons:
>>
>> 1. The existing code in eal_memalloc_alloc_seg_bulk() is designed to
>> work even if check_numa() indicates that NUMA support is not enabled:
>>
>> #ifdef RTE_EAL_NUMA_AWARE_HUGEPAGES
>> if (check_numa()) {
>> oldmask = numa_allocate_nodemask();
>> prepare_numa(&oldpolicy, oldmask, socket);
>> have_numa = true;
>> }
>> #endif
> The question was not to return an error, but to display a warning. So the code
> will work (after your patch), no problem.
>
>> 2. The DPDK application could be built with
>> CONFIG_RTE_EAL_NUMA_AWARE_HUGE_PAGES and then the binary run on
>> different systems with and without NUMA support.
> In a production environment, it seems odd to have a custom kernel and a generic
> dpdk app, it's why I propose the log level WARNING (or NOTICE maybe?).
> I let other comment about this, I don't have a strong opinion.
Thanks - appreciate the input - I also have no strong opinions here and
am happy to be guided.
Regards,
Nick
On 05-Aug-20 4:21 PM, Nick Connolly wrote:
>
>
> On 05/08/2020 16:13, Nicolas Dichtel wrote:
>> On 05/08/2020 16:53, Nick Connolly wrote:
>> [snip]
>>>>>>> + if (check_numa()) {
>>>>>>> + ret = get_mempolicy(&cur_socket_id, NULL, 0, addr,
>>>>>>> + MPOL_F_NODE | MPOL_F_ADDR);
>>>>>>> + if (ret < 0) {
>>>>>>> + RTE_LOG(DEBUG, EAL, "%s(): get_mempolicy: %s\n",
>>>>>>> + __func__, strerror(errno));
>>>>>>> + goto mapped;
>>>>>>> + } else if (cur_socket_id != socket_id) {
>>>>>>> + RTE_LOG(DEBUG, EAL,
>>>>>>> + "%s(): allocation happened on wrong socket (wanted %d, got %d)\n",
>>>>>>> + __func__, socket_id, cur_socket_id);
>>>>>>> + goto mapped;
>>>>>>> + }
>>>>>>> + } else {
>>>>>>> + if (rte_socket_count() > 1)
>>>>>>> + RTE_LOG(DEBUG, EAL, "%s(): not checking socket for allocation (wanted %d)\n",
>>>>>>> + __func__, socket_id);
>>>>>> nit: maybe a higher log level like WARNING?
>>>>> Open to guidance here - my concern was that this is going to be
>>>>> generated for
>>>>> every call to alloc_seg() and I'm not sure what the frequency will
>>>>> be - I'm
>>>>> cautious about flooding the log with warnings under 'normal
>>>>> running'. Are the
>>>>> implications of running on a multi socket system with NUMA support
>>>>> disabled in
>>>>> the kernel purely performance related for the DPDK or is there a
>>>>> functional
>>>>> correctness issue as well?
>>>> Is it really a 'normal running' to have
>>>> CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES in
>>>> dpdk and not CONFIG_NUMA in the kernel?
>>> I'm not an expert of DPDK, but I think it needs to be treated as 'normal
>>> running', for the following reasons:
>>>
>>> 1. The existing code in eal_memalloc_alloc_seg_bulk() is designed to
>>> work even if check_numa() indicates that NUMA support is not
>>> enabled:
>>>
>>> #ifdef RTE_EAL_NUMA_AWARE_HUGEPAGES
>>> if (check_numa()) {
>>> oldmask = numa_allocate_nodemask();
>>> prepare_numa(&oldpolicy, oldmask, socket);
>>> have_numa = true;
>>> }
>>> #endif
>> The question was not to return an error, but to display a warning. So
>> the code
>> will work (after your patch), no problem.
>>
>>> 2. The DPDK application could be built with
>>> CONFIG_RTE_EAL_NUMA_AWARE_HUGE_PAGES and then the binary run on
>>> different systems with and without NUMA support.
>> In a production environment, it seems odd to have a custom kernel and
>> a generic
>> dpdk app, it's why I propose the log level WARNING (or NOTICE maybe?).
>> I let other comment about this, I don't have a strong opinion.
> Thanks - appreciate the input - I also have no strong opinions here and
> am happy to be guided.
If there is a socket mismatch, wouldn't allocation fail anyway, which
would result in an error message? Usually, when things fail, the first
obvious thing to do is turn up the debug log. I'm inclined to leave it
as DEBUG.
>
> Regards,
> Nick
On 05-Aug-20 1:26 PM, Nick Connolly wrote:
> Running dpdk-helloworld on Linux with lib numa present,
> but no kernel support for NUMA (CONFIG_NUMA=n) causes
> rte_service_init() to fail with EAL: error allocating
> rte services array.
>
> alloc_seg() calls get_mempolicy to verify that the allocation
> has happened on the correct socket, but receives ENOSYS from
> the kernel and fails the allocation.
>
> The allocated socket should only be verified if check_numa() is true.
>
> Fixes: 2a96c88be83e ("mem: ease init in a docker container")
> Cc: nicolas.dichtel@6wind.com
> Cc: stable@dpdk.org
> Signed-off-by: Nick Connolly <nick.connolly@mayadata.io>
> ---
> lib/librte_eal/linux/eal_memalloc.c | 28 +++++++++++++++++-----------
> 1 file changed, 17 insertions(+), 11 deletions(-)
>
> diff --git a/lib/librte_eal/linux/eal_memalloc.c b/lib/librte_eal/linux/eal_memalloc.c
> index db60e7997..179757809 100644
> --- a/lib/librte_eal/linux/eal_memalloc.c
> +++ b/lib/librte_eal/linux/eal_memalloc.c
> @@ -610,17 +610,23 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
> }
>
> #ifdef RTE_EAL_NUMA_AWARE_HUGEPAGES
> - ret = get_mempolicy(&cur_socket_id, NULL, 0, addr,
> - MPOL_F_NODE | MPOL_F_ADDR);
> - if (ret < 0) {
> - RTE_LOG(DEBUG, EAL, "%s(): get_mempolicy: %s\n",
> - __func__, strerror(errno));
> - goto mapped;
> - } else if (cur_socket_id != socket_id) {
> - RTE_LOG(DEBUG, EAL,
> - "%s(): allocation happened on wrong socket (wanted %d, got %d)\n",
> - __func__, socket_id, cur_socket_id);
> - goto mapped;
> + if (check_numa()) {
> + ret = get_mempolicy(&cur_socket_id, NULL, 0, addr,
> + MPOL_F_NODE | MPOL_F_ADDR);
> + if (ret < 0) {
> + RTE_LOG(DEBUG, EAL, "%s(): get_mempolicy: %s\n",
> + __func__, strerror(errno));
> + goto mapped;
> + } else if (cur_socket_id != socket_id) {
> + RTE_LOG(DEBUG, EAL,
> + "%s(): allocation happened on wrong socket (wanted %d, got %d)\n",
> + __func__, socket_id, cur_socket_id);
> + goto mapped;
> + }
> + } else {
> + if (rte_socket_count() > 1)
> + RTE_LOG(DEBUG, EAL, "%s(): not checking socket for allocation (wanted %d)\n",
> + __func__, socket_id);
> }
If there is no kernel support for NUMA, how would we end up with >1
socket count?
> #else
> if (rte_socket_count() > 1)
>
Hi Anatoly,
Thanks for the response. You are asking a good question - here's what I
know:
The issue arose on a single socket system, running WSL2 (full Linux
kernel running as a lightweight VM under Windows).
The default kernel in this environment is built with CONFIG_NUMA=n, which
means get_mempolicy() returns an error.
As a result, the check that the allocated memory is associated
with the correct socket fails.
The change is to skip the allocation check if check_numa() indicates
that NUMA-aware memory is not supported.
Researching the meaning of CONFIG_NUMA, I found
https://cateee.net/lkddb/web-lkddb/NUMA.html which says:
> Enable NUMA (Non-Uniform Memory Access) support.
> The kernel will try to allocate memory used by a CPU on the local
> memory controller of the CPU and add some more NUMA awareness to the
> kernel.
Clearly CONFIG_NUMA enables memory awareness, but there's no indication
in the description whether information about the NUMA physical
architecture is 'hidden', or whether it is still exposed through
/sys/devices/system/node* (which is used by the rte initialisation code
to determine how many sockets there are). Unfortunately, I don't have
ready access to a multi-socket Linux system that I can test this out on,
so I took the conservative approach that it may be possible to have
CONFIG_NUMA disabled, but the kernel still report more than one node,
and coded the change to generate a debug message if this occurs.
Do you know whether CONFIG_NUMA turns off all knowledge about the
hardware architecture? If it does, then I agree that the test for
rte_socket_count() serves no purpose and should be removed.
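For reference, the node count that the initialisation code derives from sysfs can be checked directly; a rough sketch (the sysfs path is standard, but the exact behaviour on a CONFIG_NUMA=n kernel is the open question here):

```shell
# Count the NUMA nodes the kernel exposes under sysfs - this is the
# information the EAL initialisation code scans to determine the number
# of sockets. On a CONFIG_NUMA=n kernel at most one node is expected.
nodes=$(ls -d /sys/devices/system/node/node* 2>/dev/null | wc -l)
echo "kernel exposes ${nodes} NUMA node(s)"
```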
Many thanks,
Nick
On 17/09/2020 12:31, Burakov, Anatoly wrote:
> On 05-Aug-20 1:26 PM, Nick Connolly wrote:
>> Running dpdk-helloworld on Linux with lib numa present,
>> but no kernel support for NUMA (CONFIG_NUMA=n) causes
>> rte_service_init() to fail with EAL: error allocating
>> rte services array.
>>
>> alloc_seg() calls get_mempolicy to verify that the allocation
>> has happened on the correct socket, but receives ENOSYS from
>> the kernel and fails the allocation.
>>
>> The allocated socket should only be verified if check_numa() is true.
>>
>> Fixes: 2a96c88be83e ("mem: ease init in a docker container")
>> Cc: nicolas.dichtel@6wind.com
>> Cc: stable@dpdk.org
>> Signed-off-by: Nick Connolly <nick.connolly@mayadata.io>
>> ---
>> lib/librte_eal/linux/eal_memalloc.c | 28 +++++++++++++++++-----------
>> 1 file changed, 17 insertions(+), 11 deletions(-)
>>
>> diff --git a/lib/librte_eal/linux/eal_memalloc.c b/lib/librte_eal/linux/eal_memalloc.c
>> index db60e7997..179757809 100644
>> --- a/lib/librte_eal/linux/eal_memalloc.c
>> +++ b/lib/librte_eal/linux/eal_memalloc.c
>> @@ -610,17 +610,23 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
>> }
>> #ifdef RTE_EAL_NUMA_AWARE_HUGEPAGES
>> - ret = get_mempolicy(&cur_socket_id, NULL, 0, addr,
>> - MPOL_F_NODE | MPOL_F_ADDR);
>> - if (ret < 0) {
>> - RTE_LOG(DEBUG, EAL, "%s(): get_mempolicy: %s\n",
>> - __func__, strerror(errno));
>> - goto mapped;
>> - } else if (cur_socket_id != socket_id) {
>> - RTE_LOG(DEBUG, EAL,
>> - "%s(): allocation happened on wrong socket (wanted %d, got %d)\n",
>> - __func__, socket_id, cur_socket_id);
>> - goto mapped;
>> + if (check_numa()) {
>> + ret = get_mempolicy(&cur_socket_id, NULL, 0, addr,
>> + MPOL_F_NODE | MPOL_F_ADDR);
>> + if (ret < 0) {
>> + RTE_LOG(DEBUG, EAL, "%s(): get_mempolicy: %s\n",
>> + __func__, strerror(errno));
>> + goto mapped;
>> + } else if (cur_socket_id != socket_id) {
>> + RTE_LOG(DEBUG, EAL,
>> + "%s(): allocation happened on wrong socket (wanted %d, got %d)\n",
>> + __func__, socket_id, cur_socket_id);
>> + goto mapped;
>> + }
>> + } else {
>> + if (rte_socket_count() > 1)
>> + RTE_LOG(DEBUG, EAL, "%s(): not checking socket for allocation (wanted %d)\n",
>> + __func__, socket_id);
>> }
>
> If there is no kernel support for NUMA, how would we end up with >1
> socket count?
>
>> #else
>> if (rte_socket_count() > 1)
>>
>
>
On 17-Sep-20 1:29 PM, Nick Connolly wrote:
> Hi Anatoly,
>
> Thanks for the response. You are asking a good question - here's what I
> know:
>
> The issue arose on a single socket system, running WSL2 (full Linux
> kernel running as a lightweight VM under Windows).
> The default kernel in this environment is built with CONFIG_NUMA=n which
> means get_mempolicy() returns an error.
> This causes the check to ensure that the allocated memory is associated
> with the correct socket to fail.
>
> The change is to skip the allocation check if check_numa() indicates
> that NUMA-aware memory is not supported.
>
> Researching the meaning of CONFIG_NUMA, I found
> https://cateee.net/lkddb/web-lkddb/NUMA.html which says:
>> Enable NUMA (Non-Uniform Memory Access) support.
>> The kernel will try to allocate memory used by a CPU on the local
>> memory controller of the CPU and add some more NUMA awareness to the
>> kernel.
>
> Clearly CONFIG_NUMA enables memory awareness, but there's no indication
> in the description whether information about the NUMA physical
> architecture is 'hidden', or whether it is still exposed through
> /sys/devices/system/node* (which is used by the rte initialisation code
> to determine how many sockets there are). Unfortunately, I don't have
> ready access to a multi-socket Linux system that I can test this out on,
> so I took the conservative approach that it may be possible to have
> CONFIG_NUMA disabled, but the kernel still report more than one node,
> and coded the change to generate a debug message if this occurs.
>
> Do you know whether CONFIG_NUMA turns off all knowledge about the
> hardware architecture? If it does, then I agree that the test for
> rte_socket_count() serves no purpose and should be removed.
>
I have a system with a custom-compiled kernel; I can recompile it
without this flag and test this. I'll report back with results :)
Hi Anatoly,
Thanks. My recollection is that all of the NUMA configuration flags
were set to 'n'.
Regards,
Nick
On 17/09/2020 13:57, Burakov, Anatoly wrote:
> On 17-Sep-20 1:29 PM, Nick Connolly wrote:
>> Hi Anatoly,
>>
>> Thanks for the response. You are asking a good question - here's
>> what I know:
>>
>> The issue arose on a single socket system, running WSL2 (full Linux
>> kernel running as a lightweight VM under Windows).
>> The default kernel in this environment is built with CONFIG_NUMA=n
>> which means get_mempolicy() returns an error.
>> This causes the check to ensure that the allocated memory is
>> associated with the correct socket to fail.
>>
>> The change is to skip the allocation check if check_numa() indicates
>> that NUMA-aware memory is not supported.
>>
>> Researching the meaning of CONFIG_NUMA, I found
>> https://cateee.net/lkddb/web-lkddb/NUMA.html which says:
>>> Enable NUMA (Non-Uniform Memory Access) support.
>>> The kernel will try to allocate memory used by a CPU on the local
>>> memory controller of the CPU and add some more NUMA awareness to the
>>> kernel.
>>
>> Clearly CONFIG_NUMA enables memory awareness, but there's no
>> indication in the description whether information about the NUMA
>> physical architecture is 'hidden', or whether it is still exposed
>> through /sys/devices/system/node* (which is used by the rte
>> initialisation code to determine how many sockets there are).
>> Unfortunately, I don't have ready access to a multi-socket Linux
>> system that I can test this out on, so I took the conservative
>> approach that it may be possible to have CONFIG_NUMA disabled, but
>> the kernel still report more than one node, and coded the change to
>> generate a debug message if this occurs.
>>
>> Do you know whether CONFIG_NUMA turns off all knowledge about the
>> hardware architecture? If it does, then I agree that the test for
>> rte_socket_count() serves no purpose and should be removed.
>>
>
> I have a system with a custom compiled kernel, i can recompile it
> without this flag and test this. I'll report back with results :)
>
On 17-Sep-20 2:05 PM, Nick Connolly wrote:
> Hi Anatoly,
>
> Thanks. My recollection is that all of the NUMA configuration flags
> were set to 'n'.
>
> Regards,
> Nick
>
> On 17/09/2020 13:57, Burakov, Anatoly wrote:
>> On 17-Sep-20 1:29 PM, Nick Connolly wrote:
>>> Hi Anatoly,
>>>
>>> Thanks for the response. You are asking a good question - here's
>>> what I know:
>>>
>>> The issue arose on a single socket system, running WSL2 (full Linux
>>> kernel running as a lightweight VM under Windows).
>>> The default kernel in this environment is built with CONFIG_NUMA=n
>>> which means get_mempolicy() returns an error.
>>> This causes the check to ensure that the allocated memory is
>>> associated with the correct socket to fail.
>>>
>>> The change is to skip the allocation check if check_numa() indicates
>>> that NUMA-aware memory is not supported.
>>>
>>> Researching the meaning of CONFIG_NUMA, I found
>>> https://cateee.net/lkddb/web-lkddb/NUMA.html which says:
>>>> Enable NUMA (Non-Uniform Memory Access) support.
>>>> The kernel will try to allocate memory used by a CPU on the local
>>>> memory controller of the CPU and add some more NUMA awareness to the
>>>> kernel.
>>>
>>> Clearly CONFIG_NUMA enables memory awareness, but there's no
>>> indication in the description whether information about the NUMA
>>> physical architecture is 'hidden', or whether it is still exposed
>>> through /sys/devices/system/node* (which is used by the rte
>>> initialisation code to determine how many sockets there are).
>>> Unfortunately, I don't have ready access to a multi-socket Linux
>>> system that I can test this out on, so I took the conservative
>>> approach that it may be possible to have CONFIG_NUMA disabled, but
>>> the kernel still report more than one node, and coded the change to
>>> generate a debug message if this occurs.
>>>
>>> Do you know whether CONFIG_NUMA turns off all knowledge about the
>>> hardware architecture? If it does, then I agree that the test for
>>> rte_socket_count() serves no purpose and should be removed.
>>>
>>
>> I have a system with a custom compiled kernel, i can recompile it
>> without this flag and test this. I'll report back with results :)
>>
>
With CONFIG_NUMA set to 'n':
[root@xxx ~]# find /sys -name "node*"
/sys/kernel/software_nodes/node0
[root@xxx ~]#
This is confirmed by running DPDK on that machine - I can see all cores
from all sockets, but they're all appearing on socket 0. So, yes, that
check isn't necessary :)
Excellent - thanks - I'll amend the patch.
On 17/09/2020 15:07, Burakov, Anatoly wrote:
> On 17-Sep-20 2:05 PM, Nick Connolly wrote:
>> Hi Anatoly,
>>
>> Thanks. My recollection is that all of the NUMA configuration flags
>> were set to 'n'.
>>
>> Regards,
>> Nick
>>
>> On 17/09/2020 13:57, Burakov, Anatoly wrote:
>>> On 17-Sep-20 1:29 PM, Nick Connolly wrote:
>>>> Hi Anatoly,
>>>>
>>>> Thanks for the response. You are asking a good question - here's
>>>> what I know:
>>>>
>>>> The issue arose on a single socket system, running WSL2 (full Linux
>>>> kernel running as a lightweight VM under Windows).
>>>> The default kernel in this environment is built with CONFIG_NUMA=n
>>>> which means get_mempolicy() returns an error.
>>>> This causes the check to ensure that the allocated memory is
>>>> associated with the correct socket to fail.
>>>>
>>>> The change is to skip the allocation check if check_numa()
>>>> indicates that NUMA-aware memory is not supported.
>>>>
>>>> Researching the meaning of CONFIG_NUMA, I found
>>>> https://cateee.net/lkddb/web-lkddb/NUMA.html which says:
>>>>> Enable NUMA (Non-Uniform Memory Access) support.
>>>>> The kernel will try to allocate memory used by a CPU on the local
>>>>> memory controller of the CPU and add some more NUMA awareness to
>>>>> the kernel.
>>>>
>>>> Clearly CONFIG_NUMA enables memory awareness, but there's no
>>>> indication in the description whether information about the NUMA
>>>> physical architecture is 'hidden', or whether it is still exposed
>>>> through /sys/devices/system/node* (which is used by the rte
>>>> initialisation code to determine how many sockets there are).
>>>> Unfortunately, I don't have ready access to a multi-socket Linux
>>>> system that I can test this out on, so I took the conservative
>>>> approach that it may be possible to have CONFIG_NUMA disabled, but
>>>> the kernel still report more than one node, and coded the change to
>>>> generate a debug message if this occurs.
>>>>
>>>> Do you know whether CONFIG_NUMA turns off all knowledge about the
>>>> hardware architecture? If it does, then I agree that the test for
>>>> rte_socket_count() serves no purpose and should be removed.
>>>>
>>>
>>> I have a system with a custom compiled kernel, i can recompile it
>>> without this flag and test this. I'll report back with results :)
>>>
>>
>
> With CONFIG_NUMA set to 'n':
>
> [root@xxx ~]# find /sys -name "node*"
> /sys/kernel/software_nodes/node0
> [root@xxx ~]#
>
> This is confirmed by running DPDK on that machine - i can see all
> cores from all sockets, but they're all appearing on socket 0. So,
> yes, that check isn't necessary :)
>
On 17-Sep-20 3:08 PM, Nick Connolly wrote:
> Excellent - thanks - I'll amend the patch.
>
> [snip]
I would also add a comment explaining why we're checking for NUMA
support at runtime when NUMA support is enabled at compile time.
Sure.
On 17/09/2020 15:18, Burakov, Anatoly wrote:
> [snip]
> I would also add a comment explaining why we're checking for NUMA
> support at runtime when NUMA support is enabled at compile time.
>
@@ -610,17 +610,23 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
}
#ifdef RTE_EAL_NUMA_AWARE_HUGEPAGES
- ret = get_mempolicy(&cur_socket_id, NULL, 0, addr,
- MPOL_F_NODE | MPOL_F_ADDR);
- if (ret < 0) {
- RTE_LOG(DEBUG, EAL, "%s(): get_mempolicy: %s\n",
- __func__, strerror(errno));
- goto mapped;
- } else if (cur_socket_id != socket_id) {
- RTE_LOG(DEBUG, EAL,
- "%s(): allocation happened on wrong socket (wanted %d, got %d)\n",
- __func__, socket_id, cur_socket_id);
- goto mapped;
+ if (check_numa()) {
+ ret = get_mempolicy(&cur_socket_id, NULL, 0, addr,
+ MPOL_F_NODE | MPOL_F_ADDR);
+ if (ret < 0) {
+ RTE_LOG(DEBUG, EAL, "%s(): get_mempolicy: %s\n",
+ __func__, strerror(errno));
+ goto mapped;
+ } else if (cur_socket_id != socket_id) {
+ RTE_LOG(DEBUG, EAL,
+ "%s(): allocation happened on wrong socket (wanted %d, got %d)\n",
+ __func__, socket_id, cur_socket_id);
+ goto mapped;
+ }
+ } else {
+ if (rte_socket_count() > 1)
+ RTE_LOG(DEBUG, EAL, "%s(): not checking socket for allocation (wanted %d)\n",
+ __func__, socket_id);
}
#else
if (rte_socket_count() > 1)