malloc: enhance NUMA affinity heuristic

Message ID: 20221221104858.296530-1-david.marchand@redhat.com (mailing list archive)
State: Superseded, archived
Delegated to: David Marchand
Series: malloc: enhance NUMA affinity heuristic

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/iol-broadcom-Performance success Performance Testing PASS
ci/iol-mellanox-Performance success Performance Testing PASS
ci/github-robot: build success github build: passed
ci/Intel-compilation success Compilation OK
ci/intel-Testing success Testing PASS
ci/loongarch-compilation success Compilation OK
ci/iol-intel-Performance success Performance Testing PASS
ci/loongarch-unit-testing fail Unit Testing FAIL
ci/iol-intel-Functional success Functional Testing PASS
ci/iol-aarch64-unit-testing success Testing PASS
ci/iol-testing success Testing PASS
ci/iol-x86_64-unit-testing success Testing PASS
ci/iol-x86_64-compile-testing success Testing PASS
ci/iol-aarch64-compile-testing success Testing PASS

Commit Message

David Marchand Dec. 21, 2022, 10:48 a.m. UTC
Trying to allocate memory on the first detected NUMA node is less likely
to find memory actually available than trying on the NUMA node of the
main lcore (especially when the DPDK application is started on only one
NUMA node).

Signed-off-by: David Marchand <david.marchand@redhat.com>
---
 lib/eal/common/malloc_heap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
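
A minimal sketch of the two fallback heuristics, using only the public
lcore/socket API (the helper names below are illustrative, not part of
the patch; the actual one-line change is in the Patch section at the
bottom of this page):

#include <rte_lcore.h>

/* Old last-resort heuristic: the first detected NUMA node, which may
 * hold no memory usable by the application at all. */
static int
fallback_socket_old(void)
{
	return rte_socket_id_by_idx(0);
}

/* New heuristic: the NUMA node of the main lcore, which is much more
 * likely to have memory available when the application is started on a
 * single node. */
static int
fallback_socket_new(void)
{
	return (int)rte_lcore_to_socket_id(rte_get_main_lcore());
}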
  

Comments

Bruce Richardson Dec. 21, 2022, 11:16 a.m. UTC | #1
On Wed, Dec 21, 2022 at 11:48:57AM +0100, David Marchand wrote:
> Trying to allocate memory on the first detected numa node has less
> chance to find some memory actually available rather than on the main
> lcore numa node (especially when the DPDK application is started only
> on one numa node).
> 
> Signed-off-by: David Marchand <david.marchand@redhat.com>
> ---
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
  
Ferruh Yigit Dec. 21, 2022, 1:50 p.m. UTC | #2
On 12/21/2022 11:16 AM, Bruce Richardson wrote:
> On Wed, Dec 21, 2022 at 11:48:57AM +0100, David Marchand wrote:
>> Trying to allocate memory on the first detected numa node has less
>> chance to find some memory actually available rather than on the main
>> lcore numa node (especially when the DPDK application is started only
>> on one numa node).
>>
>> Signed-off-by: David Marchand <david.marchand@redhat.com>
>> ---
> Acked-by: Bruce Richardson <bruce.richardson@intel.com>


Acked-by: Ferruh Yigit <ferruh.yigit@amd.com>
  
David Marchand Dec. 21, 2022, 2:57 p.m. UTC | #3
Hello Min,

On Wed, Dec 21, 2022 at 11:49 AM David Marchand
<david.marchand@redhat.com> wrote:
>
> Trying to allocate memory on the first detected numa node has less
> chance to find some memory actually available rather than on the main
> lcore numa node (especially when the DPDK application is started only
> on one numa node).
>
> Signed-off-by: David Marchand <david.marchand@redhat.com>

I see a failure in the loongarch CI.

Running binary with
argv[]:'/home/zhoumin/dpdk/build/app/test/dpdk-test'
'--file-prefix=eal_flags_c_opt_autotest' '--proc-type=secondary'
'--lcores' '0-1,2@(5-7),(3-5)@(0,2),(0,6),7'
Error - process did not run ok with valid corelist value
Test Failed

The logs don't give the full picture (though that is not the LoongArch
CI's fault).

I tried to look back through past mail exchanges about the LoongArch
server, but I did not find the info.
I suspect cores 5 to 7 belong to different NUMA nodes; can you confirm?


I'll post a new revision to account for this case.
  
zhoumin Dec. 27, 2022, 9 a.m. UTC | #4
Hi David,


First of all, I sincerely apologize for the late reply.

I have checked this issue carefully and have some useful findings.

On Wed, Dec 21, 2022 at 22:57 PM, David Marchand wrote:
> Hello Min,
>
> On Wed, Dec 21, 2022 at 11:49 AM David Marchand
> <david.marchand@redhat.com> wrote:
>> Trying to allocate memory on the first detected numa node has less
>> chance to find some memory actually available rather than on the main
>> lcore numa node (especially when the DPDK application is started only
>> on one numa node).
>>
>> Signed-off-by: David Marchand <david.marchand@redhat.com>
> I see a failure in the loongarch CI.
>
> Running binary with
> argv[]:'/home/zhoumin/dpdk/build/app/test/dpdk-test'
> '--file-prefix=eal_flags_c_opt_autotest' '--proc-type=secondary'
> '--lcores' '0-1,2@(5-7),(3-5)@(0,2),(0,6),7'
> Error - process did not run ok with valid corelist value
> Test Failed
>
> The logs don't give the full picture (though it is not LoongArch CI fault).
>
> I tried to read back on past mail exchanges about the loongarch
> server, but I did not find the info.
> I suspect cores 5 to 7 belong to different numa nodes, can you confirm?

Cores 5 to 7 belong to the same NUMA node (node 1) on the
Loongson-3C5000LL CPU on which the LoongArch DPDK CI runs.

>
> I'll post a new revision to account for this case.
>

The LoongArch DPDK CI uses cores 0-7 to run all the DPDK unit tests by
adding '-l 0-7' to the meson test args. In the above test case, the arg
'--lcores' '0-1,2@(5-7),(3-5)@(0,2),(0,6),7' makes lcores 0 and 6 run on
core 0 or core 6 (the '(0,6)' group maps both lcores to that cpuset).
The EAL logs make this clearer when I set the EAL log level to debug,
as follows:
EAL: Main lcore 0 is ready (tid=fff3ee18f0;cpuset=[0,6])
EAL: lcore 1 is ready (tid=fff2de4cf0;cpuset=[1])
EAL: lcore 2 is ready (tid=fff25e0cf0;cpuset=[5,6,7])
EAL: lcore 5 is ready (tid=fff0dd4cf0;cpuset=[0,2])
EAL: lcore 4 is ready (tid=fff15d8cf0;cpuset=[0,2])
EAL: lcore 3 is ready (tid=fff1ddccf0;cpuset=[0,2])
EAL: lcore 7 is ready (tid=ffdb7f8cf0;cpuset=[7])
EAL: lcore 6 is ready (tid=ffdbffccf0;cpuset=[0,6])

However, cores 0 and 6 belong to different NUMA nodes on the
Loongson-3C5000LL CPU: core 0 belongs to NUMA node 0 and core 6 belongs
to NUMA node 1, as shown below:
$ lscpu
Architecture:        loongarch64
Byte Order:          Little Endian
CPU(s):              32
On-line CPU(s) list: 0-31
Thread(s) per core:  1
Core(s) per socket:  4
Socket(s):           8
NUMA node(s):        8
...
NUMA node0 CPU(s):   0-3
NUMA node1 CPU(s):   4-7
NUMA node2 CPU(s):   8-11
NUMA node3 CPU(s):   12-15
NUMA node4 CPU(s):   16-19
NUMA node5 CPU(s):   20-23
NUMA node6 CPU(s):   24-27
NUMA node7 CPU(s):   28-31
...

So the socket_id for lcores 0 and 6 is set to -1, which can be seen from
thread_update_affinity(). I also printed out the socket_id for lcores 0
through RTE_MAX_LCORE - 1:
lcore_config[*].socket_id: -1 0 1 0 0 0 -1 1 2 2 2 2 3 3 3 3 4 4 4 4 5 5 
5 5 6 6 6 6 7 7 7 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0
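
For context, a minimal sketch of the kind of check that produces this -1
(an illustration of the behaviour described above only, not the actual
EAL thread_update_affinity() code; numa_node_of_cpu() from libnuma
stands in for whatever lookup the EAL really performs):

#define _GNU_SOURCE
#include <sched.h>
#include <numa.h>	/* link with -lnuma */

/* If the CPUs in an lcore's affinity set span more than one NUMA node,
 * no single socket id can be recorded for it, so -1 (SOCKET_ID_ANY) is
 * stored instead -- exactly what happens for the cpuset [0,6] here. */
static int
cpuset_socket_id_sketch(const cpu_set_t *cpuset)
{
	int socket_id = -1;
	int cpu;

	for (cpu = 0; cpu < CPU_SETSIZE; cpu++) {
		int node;

		if (!CPU_ISSET(cpu, cpuset))
			continue;
		node = numa_node_of_cpu(cpu);
		if (socket_id == -1)
			socket_id = node;
		else if (socket_id != node)
			return -1;
	}
	return socket_id;
}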

In this test case, the modified malloc_get_numa_socket() returns -1,
which causes the memory allocation failure.
Is it acceptable in DPDK for the socket_id of an lcore to be -1?
If it is, maybe we can check the socket_id of the main lcore before
using it, such as:
diff --git a/lib/eal/common/malloc_heap.c b/lib/eal/common/malloc_heap.c
index d7c410b786..3ee19aee15 100644
--- a/lib/eal/common/malloc_heap.c
+++ b/lib/eal/common/malloc_heap.c
@@ -717,6 +717,10 @@ malloc_get_numa_socket(void)
                         return socket_id;
         }

+       socket_id = rte_lcore_to_socket_id(rte_get_main_lcore());
+       if (socket_id != (unsigned int)SOCKET_ID_ANY)
+               return socket_id;
+
         return rte_socket_id_by_idx(0);
  }
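
Combined with the v1 patch, the last-resort lookup would then read
roughly as below (a sketch of the suggested ordering only, not the
actual v2; the function name is made up and the earlier per-thread and
per-heap checks in malloc_get_numa_socket() are omitted):

#include <rte_lcore.h>
#include <rte_memory.h>	/* for SOCKET_ID_ANY */

/* Prefer the main lcore's NUMA node, but keep the old "first detected
 * node" behaviour when the main lcore has no single node (socket_id ==
 * SOCKET_ID_ANY, as with the cpuset [0,6] above). */
static int
numa_fallback_sketch(void)
{
	unsigned int socket_id;

	socket_id = rte_lcore_to_socket_id(rte_get_main_lcore());
	if (socket_id != (unsigned int)SOCKET_ID_ANY)
		return (int)socket_id;

	return rte_socket_id_by_idx(0);
}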
  
David Marchand Jan. 3, 2023, 10:56 a.m. UTC | #5
On Tue, Dec 27, 2022 at 10:00 AM zhoumin <zhoumin@loongson.cn> wrote:
>
> Hi David,
>
>
> First of all, I sincerely apologize for the late reply.
>
> I had checked this issue carefully and had some useful findings.
>
> On Wed, Dec 21, 2022 at 22:57 PM, David Marchand wrote:
> > Hello Min,
> >
> > On Wed, Dec 21, 2022 at 11:49 AM David Marchand
> > <david.marchand@redhat.com> wrote:
> >> Trying to allocate memory on the first detected numa node has less
> >> chance to find some memory actually available rather than on the main
> >> lcore numa node (especially when the DPDK application is started only
> >> on one numa node).
> >>
> >> Signed-off-by: David Marchand <david.marchand@redhat.com>
> > I see a failure in the loongarch CI.
> >
> > Running binary with
> > argv[]:'/home/zhoumin/dpdk/build/app/test/dpdk-test'
> > '--file-prefix=eal_flags_c_opt_autotest' '--proc-type=secondary'
> > '--lcores' '0-1,2@(5-7),(3-5)@(0,2),(0,6),7'
> > Error - process did not run ok with valid corelist value
> > Test Failed
> >
> > The logs don't give the full picture (though it is not LoongArch CI fault).
> >
> > I tried to read back on past mail exchanges about the loongarch
> > server, but I did not find the info.
> > I suspect cores 5 to 7 belong to different numa nodes, can you confirm?
>
> The cores 5 to 7 belong to the same numa node (NUMA node1) on the
> Loongson-3C5000LL CPU on which LoongArch DPDK CI runs.
>
> >
> > I'll post a new revision to account for this case.
> >
>
> The LoongArch DPDK CI uses the core 0-7 to run all the DPDK unit tests
> by adding the arg '-l 0-7' in the meson test args. In the above test
> case, the arg '--lcores' '0-1,2@(5-7),(3-5)@(0,2),(0,6),7' will make the
> lcore 0 and 6 to run on the core 0 or 6. The logs of EAL will make it
> more clear when I set the log level of EAL to debug as follows:
> EAL: Main lcore 0 is ready (tid=fff3ee18f0;cpuset=[0,6])

The syntax for this --lcores option is not obvious...
This log really helps.


> EAL: lcore 1 is ready (tid=fff2de4cf0;cpuset=[1])
> EAL: lcore 2 is ready (tid=fff25e0cf0;cpuset=[5,6,7])
> EAL: lcore 5 is ready (tid=fff0dd4cf0;cpuset=[0,2])
> EAL: lcore 4 is ready (tid=fff15d8cf0;cpuset=[0,2])
> EAL: lcore 3 is ready (tid=fff1ddccf0;cpuset=[0,2])
> EAL: lcore 7 is ready (tid=ffdb7f8cf0;cpuset=[7])
> EAL: lcore 6 is ready (tid=ffdbffccf0;cpuset=[0,6])
>
> However, The cores 0 and 6 belong to different numa nodes on the
> Loongson-3C5000LL CPU. The core 0 belongs to NUMA node 0 and the core 6
> belongs to NUMA node 1 as follows:
> $ lscpu
> Architecture:        loongarch64
> Byte Order:          Little Endian
> CPU(s):              32
> On-line CPU(s) list: 0-31
> Thread(s) per core:  1
> Core(s) per socket:  4
> Socket(s):           8
> NUMA node(s):        8
> ...
> NUMA node0 CPU(s):   0-3
> NUMA node1 CPU(s):   4-7
> NUMA node2 CPU(s):   8-11
> NUMA node3 CPU(s):   12-15
> NUMA node4 CPU(s):   16-19
> NUMA node5 CPU(s):   20-23
> NUMA node6 CPU(s):   24-27
> NUMA node7 CPU(s):   28-31
> ...
>
> So the socket_id for the lcore 0 and 6 will be set to -1 which can be
> seen from the thread_update_affinity(). Meanwhile, I print out the
> socket_id for the lcore 0 to RTE_MAX_LCORE - 1 as follows:
> lcore_config[*].socket_id: -1 0 1 0 0 0 -1 1 2 2 2 2 3 3 3 3 4 4 4 4 5 5
> 5 5 6 6 6 6 7 7 7 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0
>
> In this test case, the modified malloc_get_numa_socket() will return -1
> which caused a memory allocation failure.
> Whether it is acceptable in DPDK that the socket_id for a lcore is -1?
> If it's ok, maybe we can check the socket_id of main lcore before using
> it, such as:
> diff --git a/lib/eal/common/malloc_heap.c b/lib/eal/common/malloc_heap.c
> index d7c410b786..3ee19aee15 100644
> --- a/lib/eal/common/malloc_heap.c
> +++ b/lib/eal/common/malloc_heap.c
> @@ -717,6 +717,10 @@ malloc_get_numa_socket(void)
>                          return socket_id;
>          }
>
> +       socket_id = rte_lcore_to_socket_id(rte_get_main_lcore());
> +       if (socket_id != (unsigned int)SOCKET_ID_ANY)
> +               return socket_id;
> +
>          return rte_socket_id_by_idx(0);
>   }

Yep, this is what I had in mind before going off.
v2 incoming.

Thanks Min!
  

Patch

diff --git a/lib/eal/common/malloc_heap.c b/lib/eal/common/malloc_heap.c
index d7c410b786..08f965a525 100644
--- a/lib/eal/common/malloc_heap.c
+++ b/lib/eal/common/malloc_heap.c
@@ -717,7 +717,7 @@  malloc_get_numa_socket(void)
 			return socket_id;
 	}
 
-	return rte_socket_id_by_idx(0);
+	return rte_lcore_to_socket_id(rte_get_main_lcore());
 }
 
 void *