test/mbuf: fix the forked process segment fault

Message ID 20230522060137.225154-1-ruifeng.wang@arm.com (mailing list archive)
State Accepted, archived
Delegated to: Thomas Monjalon
Headers
Series test/mbuf: fix the forked process segment fault |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation success Compilation OK
ci/iol-mellanox-Performance success Performance Testing PASS
ci/intel-Testing success Testing PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/iol-broadcom-Functional success Functional Testing PASS
ci/iol-intel-Functional success Functional Testing PASS
ci/github-robot: build success github build: passed
ci/iol-broadcom-Performance success Performance Testing PASS
ci/intel-Functional success Functional PASS
ci/iol-aarch64-unit-testing success Testing PASS
ci/iol-testing success Testing PASS
ci/iol-x86_64-unit-testing success Testing PASS
ci/iol-x86_64-compile-testing success Testing PASS
ci/iol-abi-testing success Testing PASS
ci/iol-unit-testing success Testing PASS
ci/iol-aarch64-compile-testing success Testing PASS
ci/loongarch-compilation success Compilation OK
ci/loongarch-unit-testing success Unit Testing PASS

Commit Message

Ruifeng Wang May 22, 2023, 6:01 a.m. UTC
  Access of any memory in the hugepage shared file-backed area will trigger
an unexpected forked child process segment fault. The root cause is DPDK
doesn't support fork model [1] (calling rte_eal_init() before fork()).
Forked child process can't be treated as a secondary process.

Hence fix it by avoiding fork and doing verification in the main process.

[1] https://mails.dpdk.org/archives/dev/2018-July/108106.html

Fixes: af75078fece3 ("first public release")
Cc: stable@dpdk.org

Signed-off-by: Jia He <justin.he@arm.com>
Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 app/test/test_mbuf.c | 50 +++++++++++++-------------------------------
 1 file changed, 14 insertions(+), 36 deletions(-)
  

Comments

Burakov, Anatoly May 22, 2023, 9:24 a.m. UTC | #1
On 5/22/2023 7:01 AM, Ruifeng Wang wrote:
> Access of any memory in the hugepage shared file-backed area will trigger
> an unexpected forked child process segment fault. The root cause is DPDK
> doesn't support fork model [1] (calling rte_eal_init() before fork()).
> Forked child process can't be treated as a secondary process.
> 
> Hence fix it by avoiding fork and doing verification in the main process.
> 
> [1] https://mails.dpdk.org/archives/dev/2018-July/108106.html
> 
> Fixes: af75078fece3 ("first public release")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Jia He <justin.he@arm.com>
> Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---

Would this be something that a secondary process-based test could test? 
That's how we test rte_panic() and other calls.
  
Ruifeng Wang May 22, 2023, 9:55 a.m. UTC | #2
> -----Original Message-----
> From: Burakov, Anatoly <anatoly.burakov@intel.com>
> Sent: Monday, May 22, 2023 5:24 PM
> To: Ruifeng Wang <Ruifeng.Wang@arm.com>; olivier.matz@6wind.com
> Cc: dev@dpdk.org; stable@dpdk.org; thomas@monjalon.net; stephen@networkplumber.org; Justin
> He <Justin.He@arm.com>; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; nd
> <nd@arm.com>
> Subject: Re: [PATCH] test/mbuf: fix the forked process segment fault
> 
> On 5/22/2023 7:01 AM, Ruifeng Wang wrote:
> > Access of any memory in the hugepage shared file-backed area will
> > trigger an unexpected forked child process segment fault. The root
> > cause is DPDK doesn't support fork model [1] (calling rte_eal_init() before fork()).
> > Forked child process can't be treated as a secondary process.
> >
> > Hence fix it by avoiding fork and doing verification in the main process.
> >
> > [1] https://mails.dpdk.org/archives/dev/2018-July/108106.html
> >
> > Fixes: af75078fece3 ("first public release")
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Jia He <justin.he@arm.com>
> > Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > ---
> 
> Would this be something that a secondary process-based test could test?
> That's how we test rte_panic() and other calls.

This case validates mbuf. IMO there is no need to do validation in a secondary process.
Unit test for rte_panic() also uses fork() and could have the same issue.

> 
> --
> Thanks,
> Anatoly
  
Burakov, Anatoly May 22, 2023, 10:19 a.m. UTC | #3
On 5/22/2023 10:55 AM, Ruifeng Wang wrote:
>> -----Original Message-----
>> From: Burakov, Anatoly <anatoly.burakov@intel.com>
>> Sent: Monday, May 22, 2023 5:24 PM
>> To: Ruifeng Wang <Ruifeng.Wang@arm.com>; olivier.matz@6wind.com
>> Cc: dev@dpdk.org; stable@dpdk.org; thomas@monjalon.net; stephen@networkplumber.org; Justin
>> He <Justin.He@arm.com>; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; nd
>> <nd@arm.com>
>> Subject: Re: [PATCH] test/mbuf: fix the forked process segment fault
>>
>> On 5/22/2023 7:01 AM, Ruifeng Wang wrote:
>>> Access of any memory in the hugepage shared file-backed area will
>>> trigger an unexpected forked child process segment fault. The root
>>> cause is DPDK doesn't support fork model [1] (calling rte_eal_init() before fork()).
>>> Forked child process can't be treated as a secondary process.
>>>
>>> Hence fix it by avoiding fork and doing verification in the main process.
>>>
>>> [1] https://mails.dpdk.org/archives/dev/2018-July/108106.html
>>>
>>> Fixes: af75078fece3 ("first public release")
>>> Cc: stable@dpdk.org
>>>
>>> Signed-off-by: Jia He <justin.he@arm.com>
>>> Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
>>> ---
>>
>> Would this be something that a secondary process-based test could test?
>> That's how we test rte_panic() and other calls.
> 
> This case validates mbuf. IMO there is no need to do validation in a secondary process.
> Unit test for rte_panic() also uses fork() and could have the same issue.
> 

In that case, rte_panic() test should be fixed as well.

My concern is that ideally, we shouldn't intentionally crash the test 
app if something goes wrong, and calling rte_panic() accomplishes just 
that - which is why I suggested running them in secondary processes 
instead, so that any call into rte_panic happens inside a secondary 
process, and the main test process doesn't crash even if the test has 
failed.
  
Stephen Hemminger May 22, 2023, 3:21 p.m. UTC | #4
On Mon, 22 May 2023 11:19:24 +0100
"Burakov, Anatoly" <anatoly.burakov@intel.com> wrote:

> > 
> > This case validates mbuf. IMO there is no need to do validation in a secondary process.
> > Unit test for rte_panic() also uses fork() and could have the same issue.
> >   
> 
> In that case, rte_panic() test should be fixed as well.
> 
> My concern is that ideally, we shouldn't intentionally crash the test 
> app if something goes wrong, and calling rte_panic() accomplishes just 
> that - which is why I suggested running them in secondary processes 
> instead, so that any call into rte_panic happens inside a secondary 
> process, and the main test process doesn't crash even if the test has 
> failed.
> 

All forks outside of EAL are bad. The test should be removed, it was buggy
when first written.
  
Burakov, Anatoly May 22, 2023, 3:37 p.m. UTC | #5
On 5/22/2023 4:21 PM, Stephen Hemminger wrote:
> On Mon, 22 May 2023 11:19:24 +0100
> "Burakov, Anatoly" <anatoly.burakov@intel.com> wrote:
> 
>>>
>>> This case validates mbuf. IMO there is no need to do validation in a secondary process.
>>> Unit test for rte_panic() also uses fork() and could have the same issue.
>>>    
>>
>> In that case, rte_panic() test should be fixed as well.
>>
>> My concern is that ideally, we shouldn't intentionally crash the test
>> app if something goes wrong, and calling rte_panic() accomplishes just
>> that - which is why I suggested running them in secondary processes
>> instead, so that any call into rte_panic happens inside a secondary
>> process, and the main test process doesn't crash even if the test has
>> failed.
>>
> 
> All forks outside of EAL are bad. The test should be removed, it was buggy
> when first written.

I agree that none of the tests (or anything else in DPDK) should fork, 
but some things still crash process (as in, cause rte_panic), and the 
purpose of using fork() was for that. We can rewrite the tests to not 
fork and instead use other methods (such as spinning up secondary 
processes), we don't have to remove them.
  
Ruifeng Wang May 23, 2023, 3:45 a.m. UTC | #6
> -----Original Message-----
> From: Burakov, Anatoly <anatoly.burakov@intel.com>
> Sent: Monday, May 22, 2023 6:19 PM
> To: Ruifeng Wang <Ruifeng.Wang@arm.com>; olivier.matz@6wind.com
> Cc: dev@dpdk.org; stable@dpdk.org; thomas@monjalon.net; stephen@networkplumber.org; Justin
> He <Justin.He@arm.com>; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; nd
> <nd@arm.com>
> Subject: Re: [PATCH] test/mbuf: fix the forked process segment fault
> 
> On 5/22/2023 10:55 AM, Ruifeng Wang wrote:
> >> -----Original Message-----
> >> From: Burakov, Anatoly <anatoly.burakov@intel.com>
> >> Sent: Monday, May 22, 2023 5:24 PM
> >> To: Ruifeng Wang <Ruifeng.Wang@arm.com>; olivier.matz@6wind.com
> >> Cc: dev@dpdk.org; stable@dpdk.org; thomas@monjalon.net;
> >> stephen@networkplumber.org; Justin He <Justin.He@arm.com>; Honnappa
> >> Nagarahalli <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>
> >> Subject: Re: [PATCH] test/mbuf: fix the forked process segment fault
> >>
> >> On 5/22/2023 7:01 AM, Ruifeng Wang wrote:
> >>> Access of any memory in the hugepage shared file-backed area will
> >>> trigger an unexpected forked child process segment fault. The root
> >>> cause is DPDK doesn't support fork model [1] (calling rte_eal_init() before fork()).
> >>> Forked child process can't be treated as a secondary process.
> >>>
> >>> Hence fix it by avoiding fork and doing verification in the main process.
> >>>
> >>> [1] https://mails.dpdk.org/archives/dev/2018-July/108106.html
> >>>
> >>> Fixes: af75078fece3 ("first public release")
> >>> Cc: stable@dpdk.org
> >>>
> >>> Signed-off-by: Jia He <justin.he@arm.com>
> >>> Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> >>> ---
> >>
> >> Would this be something that a secondary process-based test could test?
> >> That's how we test rte_panic() and other calls.
> >
> > This case validates mbuf. IMO there is no need to do validation in a secondary process.
> > Unit test for rte_panic() also uses fork() and could have the same issue.
> >
> 
> In that case, rte_panic() test should be fixed as well.
> 
> My concern is that ideally, we shouldn't intentionally crash the test app if something
> goes wrong, and calling rte_panic() accomplishes just that - which is why I suggested
> running them in secondary processes instead, so that any call into rte_panic happens
> inside a secondary process, and the main test process doesn't crash even if the test has
> failed.

Agree that intentionally crashing the test app is bad.
In this patch, verification of mbuf is changed to use another API without rte_panic().
Then the verification can be done directly in the primary. And the indirectness of
using a secondary process is removed. Because verification will not crash the process.

> 
> --
> Thanks,
> Anatoly
  
Burakov, Anatoly May 23, 2023, 10:12 a.m. UTC | #7
On 5/23/2023 4:45 AM, Ruifeng Wang wrote:
>> -----Original Message-----
>> From: Burakov, Anatoly <anatoly.burakov@intel.com>
>> Sent: Monday, May 22, 2023 6:19 PM
>> To: Ruifeng Wang <Ruifeng.Wang@arm.com>; olivier.matz@6wind.com
>> Cc: dev@dpdk.org; stable@dpdk.org; thomas@monjalon.net; stephen@networkplumber.org; Justin
>> He <Justin.He@arm.com>; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; nd
>> <nd@arm.com>
>> Subject: Re: [PATCH] test/mbuf: fix the forked process segment fault
>>
>> On 5/22/2023 10:55 AM, Ruifeng Wang wrote:
>>>> -----Original Message-----
>>>> From: Burakov, Anatoly <anatoly.burakov@intel.com>
>>>> Sent: Monday, May 22, 2023 5:24 PM
>>>> To: Ruifeng Wang <Ruifeng.Wang@arm.com>; olivier.matz@6wind.com
>>>> Cc: dev@dpdk.org; stable@dpdk.org; thomas@monjalon.net;
>>>> stephen@networkplumber.org; Justin He <Justin.He@arm.com>; Honnappa
>>>> Nagarahalli <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>
>>>> Subject: Re: [PATCH] test/mbuf: fix the forked process segment fault
>>>>
>>>> On 5/22/2023 7:01 AM, Ruifeng Wang wrote:
>>>>> Access of any memory in the hugepage shared file-backed area will
>>>>> trigger an unexpected forked child process segment fault. The root
>>>>> cause is DPDK doesn't support fork model [1] (calling rte_eal_init() before fork()).
>>>>> Forked child process can't be treated as a secondary process.
>>>>>
>>>>> Hence fix it by avoiding fork and doing verification in the main process.
>>>>>
>>>>> [1] https://mails.dpdk.org/archives/dev/2018-July/108106.html
>>>>>
>>>>> Fixes: af75078fece3 ("first public release")
>>>>> Cc: stable@dpdk.org
>>>>>
>>>>> Signed-off-by: Jia He <justin.he@arm.com>
>>>>> Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
>>>>> ---
>>>>
>>>> Would this be something that a secondary process-based test could test?
>>>> That's how we test rte_panic() and other calls.
>>>
>>> This case validates mbuf. IMO there is no need to do validation in a secondary process.
>>> Unit test for rte_panic() also uses fork() and could have the same issue.
>>>
>>
>> In that case, rte_panic() test should be fixed as well.
>>
>> My concern is that ideally, we shouldn't intentionally crash the test app if something
>> goes wrong, and calling rte_panic() accomplishes just that - which is why I suggested
>> running them in secondary processes instead, so that any call into rte_panic happens
>> inside a secondary process, and the main test process doesn't crash even if the test has
>> failed.
> 
> Agree that intentionally crashing the test app is bad.
> In this patch, verification of mbuf is changed to use another API without rte_panic().
> Then the verification can be done directly in the primary. And the indirectness of
> using a secondary process is removed. Because verification will not crash the process.
> 

Oh,

My apologies, I did not notice that. In that case,

Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
  
Thomas Monjalon June 12, 2023, 2:47 p.m. UTC | #8
23/05/2023 12:12, Burakov, Anatoly:
> On 5/23/2023 4:45 AM, Ruifeng Wang wrote:
> >> -----Original Message-----
> >> From: Burakov, Anatoly <anatoly.burakov@intel.com>
> >> Sent: Monday, May 22, 2023 6:19 PM
> >> To: Ruifeng Wang <Ruifeng.Wang@arm.com>; olivier.matz@6wind.com
> >> Cc: dev@dpdk.org; stable@dpdk.org; thomas@monjalon.net; stephen@networkplumber.org; Justin
> >> He <Justin.He@arm.com>; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; nd
> >> <nd@arm.com>
> >> Subject: Re: [PATCH] test/mbuf: fix the forked process segment fault
> >>
> >> On 5/22/2023 10:55 AM, Ruifeng Wang wrote:
> >>>> -----Original Message-----
> >>>> From: Burakov, Anatoly <anatoly.burakov@intel.com>
> >>>> Sent: Monday, May 22, 2023 5:24 PM
> >>>> To: Ruifeng Wang <Ruifeng.Wang@arm.com>; olivier.matz@6wind.com
> >>>> Cc: dev@dpdk.org; stable@dpdk.org; thomas@monjalon.net;
> >>>> stephen@networkplumber.org; Justin He <Justin.He@arm.com>; Honnappa
> >>>> Nagarahalli <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>
> >>>> Subject: Re: [PATCH] test/mbuf: fix the forked process segment fault
> >>>>
> >>>> On 5/22/2023 7:01 AM, Ruifeng Wang wrote:
> >>>>> Access of any memory in the hugepage shared file-backed area will
> >>>>> trigger an unexpected forked child process segment fault. The root
> >>>>> cause is DPDK doesn't support fork model [1] (calling rte_eal_init() before fork()).
> >>>>> Forked child process can't be treated as a secondary process.
> >>>>>
> >>>>> Hence fix it by avoiding fork and doing verification in the main process.
> >>>>>
> >>>>> [1] https://mails.dpdk.org/archives/dev/2018-July/108106.html
> >>>>>
> >>>>> Fixes: af75078fece3 ("first public release")
> >>>>> Cc: stable@dpdk.org
> >>>>>
> >>>>> Signed-off-by: Jia He <justin.he@arm.com>
> >>>>> Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> >>>>> ---
> >>>>
> >>>> Would this be something that a secondary process-based test could test?
> >>>> That's how we test rte_panic() and other calls.
> >>>
> >>> This case validates mbuf. IMO there is no need to do validation in a secondary process.
> >>> Unit test for rte_panic() also uses fork() and could have the same issue.
> >>>
> >>
> >> In that case, rte_panic() test should be fixed as well.
> >>
> >> My concern is that ideally, we shouldn't intentionally crash the test app if something
> >> goes wrong, and calling rte_panic() accomplishes just that - which is why I suggested
> >> running them in secondary processes instead, so that any call into rte_panic happens
> >> inside a secondary process, and the main test process doesn't crash even if the test has
> >> failed.
> > 
> > Agree that intentionally crashing the test app is bad.
> > In this patch, verification of mbuf is changed to use another API without rte_panic().
> > Then the verification can be done directly in the primary. And the indirectness of
> > using a secondary process is removed. Because verification will not crash the process.
> > 
> 
> Oh,
> 
> My apologies, I did not notice that. In that case,
> 
> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>

Applied, thanks.
  

Patch

diff --git a/app/test/test_mbuf.c b/app/test/test_mbuf.c
index 8d8d3b9386..efac01806b 100644
--- a/app/test/test_mbuf.c
+++ b/app/test/test_mbuf.c
@@ -1167,38 +1167,16 @@  test_failing_mbuf_sanity_check(struct rte_mempool *pktmbuf_pool)
 	return TEST_SKIPPED;
 }
 #else
-
-#include <unistd.h>
-#include <sys/resource.h>
-#include <sys/time.h>
-#include <sys/wait.h>
-
-/* use fork() to test mbuf errors panic */
-static int
-verify_mbuf_check_panics(struct rte_mbuf *buf)
+/* Verify if mbuf can pass the check */
+static bool
+mbuf_check_pass(struct rte_mbuf *buf)
 {
-	int pid;
-	int status;
+	const char *reason;
 
-	pid = fork();
+	if (rte_mbuf_check(buf, 1, &reason) == 0)
+		return true;
 
-	if (pid == 0) {
-		struct rlimit rl;
-
-		/* No need to generate a coredump when panicking. */
-		rl.rlim_cur = rl.rlim_max = 0;
-		setrlimit(RLIMIT_CORE, &rl);
-		rte_mbuf_sanity_check(buf, 1); /* should panic */
-		exit(0);  /* return normally if it doesn't panic */
-	} else if (pid < 0) {
-		printf("Fork Failed\n");
-		return -1;
-	}
-	wait(&status);
-	if(status == 0)
-		return -1;
-
-	return 0;
+	return false;
 }
 
 static int
@@ -1215,19 +1193,19 @@  test_failing_mbuf_sanity_check(struct rte_mempool *pktmbuf_pool)
 		return -1;
 
 	printf("Checking good mbuf initially\n");
-	if (verify_mbuf_check_panics(buf) != -1)
+	if (!mbuf_check_pass(buf))
 		return -1;
 
 	printf("Now checking for error conditions\n");
 
-	if (verify_mbuf_check_panics(NULL)) {
+	if (mbuf_check_pass(NULL)) {
 		printf("Error with NULL mbuf test\n");
 		return -1;
 	}
 
 	badbuf = *buf;
 	badbuf.pool = NULL;
-	if (verify_mbuf_check_panics(&badbuf)) {
+	if (mbuf_check_pass(&badbuf)) {
 		printf("Error with bad-pool mbuf test\n");
 		return -1;
 	}
@@ -1235,7 +1213,7 @@  test_failing_mbuf_sanity_check(struct rte_mempool *pktmbuf_pool)
 	if (RTE_IOVA_IN_MBUF) {
 		badbuf = *buf;
 		rte_mbuf_iova_set(&badbuf, 0);
-		if (verify_mbuf_check_panics(&badbuf)) {
+		if (mbuf_check_pass(&badbuf)) {
 			printf("Error with bad-physaddr mbuf test\n");
 			return -1;
 		}
@@ -1243,21 +1221,21 @@  test_failing_mbuf_sanity_check(struct rte_mempool *pktmbuf_pool)
 
 	badbuf = *buf;
 	badbuf.buf_addr = NULL;
-	if (verify_mbuf_check_panics(&badbuf)) {
+	if (mbuf_check_pass(&badbuf)) {
 		printf("Error with bad-addr mbuf test\n");
 		return -1;
 	}
 
 	badbuf = *buf;
 	badbuf.refcnt = 0;
-	if (verify_mbuf_check_panics(&badbuf)) {
+	if (mbuf_check_pass(&badbuf)) {
 		printf("Error with bad-refcnt(0) mbuf test\n");
 		return -1;
 	}
 
 	badbuf = *buf;
 	badbuf.refcnt = UINT16_MAX;
-	if (verify_mbuf_check_panics(&badbuf)) {
+	if (mbuf_check_pass(&badbuf)) {
 		printf("Error with bad-refcnt(MAX) mbuf test\n");
 		return -1;
 	}