[v2] eal: fix data race in multi-process support
Commit Message
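If DPDK is built with ThreadSanitizer, it reports a data race on the
multi-process file descriptor, mp_fd: rte_mp_channel_cleanup() writes it
on the main thread while the mp_handle() control thread reads it.
The fix is to use atomic operations when reading and updating mp_fd.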
Build:
$ meson -Db_sanitize=thread build
$ ninja -C build
Simple example:
$ ./build/app/dpdk-testpmd -l 1-3 --no-huge
EAL: Detected CPU lcores: 16
EAL: Detected NUMA nodes: 1
EAL: Static memory layout is selected, amount of reserved memory can be adjusted with -m or --socket-mem
EAL: Detected static linkage of DPDK
EAL: Multi-process socket /run/user/1000/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'VA'
testpmd: No probed ethernet devices
testpmd: create a new mbuf pool <mb_pool_0>: n=163456, size=2176, socket=0
testpmd: preferred mempool ops selected: ring_mp_mc
EAL: Error - exiting with code: 1
Cause: Creation of mbuf pool for socket 0 failed: Cannot allocate memory
==================
WARNING: ThreadSanitizer: data race (pid=87245)
Write of size 4 at 0x558e04d8ff70 by main thread:
#0 rte_mp_channel_cleanup <null> (dpdk-testpmd+0x1e7d30c)
#1 rte_eal_cleanup <null> (dpdk-testpmd+0x1e85929)
#2 rte_exit <null> (dpdk-testpmd+0x1e5bc0a)
#3 mbuf_pool_create.cold <null> (dpdk-testpmd+0x274011)
#4 main <null> (dpdk-testpmd+0x5cc15d)
Previous read of size 4 at 0x558e04d8ff70 by thread T2:
#0 mp_handle <null> (dpdk-testpmd+0x1e7c439)
#1 ctrl_thread_init <null> (dpdk-testpmd+0x1e6ee1e)
As if synchronized via sleep:
#0 nanosleep ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:366 (libtsan.so.0+0x6075e)
#1 get_tsc_freq <null> (dpdk-testpmd+0x1e92ff9)
#2 set_tsc_freq <null> (dpdk-testpmd+0x1e6f2fc)
#3 rte_eal_timer_init <null> (dpdk-testpmd+0x1e931a4)
#4 rte_eal_init.cold <null> (dpdk-testpmd+0x29e578)
#5 main <null> (dpdk-testpmd+0x5cbc45)
Location is global 'mp_fd' of size 4 at 0x558e04d8ff70 (dpdk-testpmd+0x000003122f70)
Thread T2 'rte_mp_handle' (tid=87248, running) created by main thread at:
#0 pthread_create ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:969 (libtsan.so.0+0x5ad75)
#1 rte_ctrl_thread_create <null> (dpdk-testpmd+0x1e6efd0)
#2 rte_mp_channel_init.cold <null> (dpdk-testpmd+0x29cb7c)
#3 rte_eal_init <null> (dpdk-testpmd+0x1e8662e)
#4 main <null> (dpdk-testpmd+0x5cbc45)
SUMMARY: ThreadSanitizer: data race (/home/shemminger/DPDK/main/build/app/dpdk-testpmd+0x1e7d30c) in rte_mp_channel_cleanup
==================
ThreadSanitizer: reported 1 warnings
Fixes: bacaa2754017 ("eal: add channel for multi-process communication")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
lib/eal/common/eal_common_proc.c | 17 ++++++++---------
1 file changed, 8 insertions(+), 9 deletions(-)
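The "atomic operations when updating mp_fd" idea can be sketched outside DPDK as follows. This is a minimal illustration, not the patch itself: chan_fd, chan_publish(), chan_snapshot() and chan_cleanup() are hypothetical names, and the GCC/Clang __atomic builtins with relaxed ordering are used only because the patch uses them.

```c
#include <unistd.h>

/* Hypothetical stand-in for mp_fd: a descriptor shared between threads. */
static int chan_fd = -1;

/* Publish a newly opened descriptor so other threads can see it. */
static void chan_publish(int fd)
{
	__atomic_store_n(&chan_fd, fd, __ATOMIC_RELAXED);
}

/* Reader side: take one snapshot and work only with the returned copy. */
static int chan_snapshot(void)
{
	return __atomic_load_n(&chan_fd, __ATOMIC_RELAXED);
}

/*
 * Cleanup side: store -1 and fetch the previous value in a single atomic
 * step, so only one caller can end up owning and closing the descriptor.
 * Returns the descriptor that was closed, or -1 if there was nothing to do.
 */
static int chan_cleanup(void)
{
	int fd = __atomic_exchange_n(&chan_fd, -1, __ATOMIC_RELAXED);

	if (fd < 0)
		return -1;
	close(fd);
	return fd;
}
```

Calling chan_cleanup() a second time finds -1 and returns without closing anything. Unlike this sketch, the patch cancels and joins the handler thread before it closes the descriptor.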
Comments
Reviewed-by: Chengwen Feng <fengchengwen@huawei.com>
On 2022/9/7 0:45, Stephen Hemminger wrote:
> diff --git a/lib/eal/common/eal_common_proc.c b/lib/eal/common/eal_common_proc.c
> index 313060528fec..1fc1d6c53bd2 100644
> --- a/lib/eal/common/eal_common_proc.c
> +++ b/lib/eal/common/eal_common_proc.c
> @@ -260,7 +260,7 @@ rte_mp_action_unregister(const char *name)
> }
>
...
06/09/2022 18:45, Stephen Hemminger:
> Fixes: bacaa2754017 ("eal: add channel for multi-process communication")
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Chengwen Feng <fengchengwen@huawei.com>
+Cc: stable@dpdk.org
Applied, thanks.
@@ -260,7 +260,7 @@ rte_mp_action_unregister(const char *name)
 }
 static int
-read_msg(struct mp_msg_internal *m, struct sockaddr_un *s)
+read_msg(int fd, struct mp_msg_internal *m, struct sockaddr_un *s)
 {
 	int msglen;
 	struct iovec iov;
@@ -281,7 +281,7 @@ read_msg(struct mp_msg_internal *m, struct sockaddr_un *s)
 	msgh.msg_controllen = sizeof(control);
 retry:
-	msglen = recvmsg(mp_fd, &msgh, 0);
+	msglen = recvmsg(fd, &msgh, 0);
 	/* zero length message means socket was closed */
 	if (msglen == 0)
@@ -390,11 +390,12 @@ mp_handle(void *arg __rte_unused)
 {
 	struct mp_msg_internal msg;
 	struct sockaddr_un sa;
+	int fd;
-	while (mp_fd >= 0) {
+	while ((fd = __atomic_load_n(&mp_fd, __ATOMIC_RELAXED)) >= 0) {
 		int ret;
-		ret = read_msg(&msg, &sa);
+		ret = read_msg(fd, &msg, &sa);
 		if (ret <= 0)
 			break;
@@ -638,9 +639,8 @@ rte_mp_channel_init(void)
 			NULL, mp_handle, NULL) < 0) {
 		RTE_LOG(ERR, EAL, "failed to create mp thread: %s\n",
 			strerror(errno));
-		close(mp_fd);
 		close(dir_fd);
-		mp_fd = -1;
+		close(__atomic_exchange_n(&mp_fd, -1, __ATOMIC_RELAXED));
 		return -1;
 	}
@@ -656,11 +656,10 @@ rte_mp_channel_cleanup(void)
 {
 	int fd;
-	if (mp_fd < 0)
+	fd = __atomic_exchange_n(&mp_fd, -1, __ATOMIC_RELAXED);
+	if (fd < 0)
 		return;
-	fd = mp_fd;
-	mp_fd = -1;
 	pthread_cancel(mp_handle_tid);
 	pthread_join(mp_handle_tid, NULL);
 	close_socket_fd(fd);
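The reader-side change in mp_handle() (load the descriptor once per iteration, then pass the local copy down) can be tried in isolation with a sketch like the one below. It assumes a plain pipe instead of the EAL socket, and demo_fd, demo_bytes, demo_reader() and demo_run() are hypothetical names that do not exist in DPDK. The reader stops on end-of-file here rather than through pthread_cancel().

```c
#include <pthread.h>
#include <unistd.h>

static int demo_fd = -1;	/* hypothetical shared descriptor, like mp_fd */
static int demo_bytes;		/* bytes received; main reads it only after join */

static void *demo_reader(void *arg)
{
	char buf[16];
	int fd;

	(void)arg;
	/* Snapshot the shared value once per iteration, then use the copy. */
	while ((fd = __atomic_load_n(&demo_fd, __ATOMIC_RELAXED)) >= 0) {
		ssize_t n = read(fd, buf, sizeof(buf));

		if (n <= 0)	/* 0: writer closed the pipe, <0: error */
			break;
		demo_bytes += (int)n;
	}
	return NULL;
}

/* Returns the number of bytes the reader received, or -1 on setup failure. */
static int demo_run(void)
{
	pthread_t tid;
	int p[2];
	int fd;

	demo_bytes = 0;
	if (pipe(p) != 0)
		return -1;
	__atomic_store_n(&demo_fd, p[0], __ATOMIC_RELAXED);

	if (pthread_create(&tid, NULL, demo_reader, NULL) != 0) {
		close(p[1]);
		close(__atomic_exchange_n(&demo_fd, -1, __ATOMIC_RELAXED));
		return -1;
	}

	(void)!write(p[1], "ping", 4);	/* a failed write just leaves the count at 0 */
	close(p[1]);			/* reader sees end-of-file and leaves its loop */
	pthread_join(tid, NULL);

	/* Take ownership of the descriptor before closing it. */
	fd = __atomic_exchange_n(&demo_fd, -1, __ATOMIC_RELAXED);
	if (fd >= 0)
		close(fd);
	return demo_bytes;
}
```

Building this with -fsanitize=thread and calling demo_run() is one way to check that the accesses to the shared descriptor are no longer flagged. Relaxed ordering is enough for the descriptor value itself; any other data shared between the threads would need its own synchronization, which here comes from pthread_join().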