[v2] eal: fix thread race in control thread creation

Message ID 1677704982-2643-1-git-send-email-roretzla@linux.microsoft.com (mailing list archive)
State Accepted, archived
Delegated to: David Marchand
Headers
Series [v2] eal: fix thread race in control thread creation |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/loongarch-compilation success Compilation OK
ci/loongarch-unit-testing success Unit Testing PASS
ci/Intel-compilation success Compilation OK
ci/intel-Testing success Testing PASS
ci/github-robot: build success github build: passed
ci/intel-Functional success Functional PASS
ci/iol-mellanox-Performance success Performance Testing PASS
ci/iol-broadcom-Performance success Performance Testing PASS
ci/iol-broadcom-Functional success Functional Testing PASS
ci/iol-aarch64-unit-testing success Testing PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/iol-intel-Functional success Functional Testing PASS
ci/iol-aarch64-compile-testing success Testing PASS
ci/iol-x86_64-compile-testing success Testing PASS
ci/iol-testing success Testing PASS
ci/iol-x86_64-unit-testing success Testing PASS

Commit Message

Tyler Retzlaff March 1, 2023, 9:09 p.m. UTC
  When ctrl_thread_init transitions params->ctrl_thread_status from
CTRL_THREAD_LAUNCHING the creating thread and new thread may run
concurrently leading to unsynchronized access to params.

This permits races for both the failure and success paths after
ctrl_thread_status is stored.
  * params->ret may be loaded in ctrl_thread_init failure path
  * params->arg may be loaded in ctrl_thread_start or
    control_thread_start when calling start_routine.

For ctrl_thread_init remove the params->ret load and just return 1 since
it is only interpreted as a indicator of success / failure of
ctrl_thread_init.

For {ctrl,control}_thread_start store param->arg in stack allocated
storage prior to calling ctrl_thread_init and use the copy when calling
start_routine.

For control_thread_start if ctrl_thread_init fails just return 0 instead
of loading params->ret, since the value returned is unused when
ctrl_thread_status is set to CTRL_THREAD_ERROR when ctrl_thread_init
fails.

Fixes: 878b7468eacb ("eal: add platform agnostic control thread API")

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
---
 lib/eal/common/eal_common_thread.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)
  

Comments

Honnappa Nagarahalli March 2, 2023, 1:30 a.m. UTC | #1
> -----Original Message-----
> From: Tyler Retzlaff <roretzla@linux.microsoft.com>
> Sent: Wednesday, March 1, 2023 3:10 PM
> To: dev@dpdk.org; david.marchand@redhat.com
> Cc: thomas@monjalon.net; Tyler Retzlaff <roretzla@linux.microsoft.com>
> Subject: [PATCH v2] eal: fix thread race in control thread creation
> 
> When ctrl_thread_init transitions params->ctrl_thread_status from
> CTRL_THREAD_LAUNCHING the creating thread and new thread may run
> concurrently leading to unsynchronized access to params.
IMO, the code will be simpler if we did not free 'params' in 'rte_thread_create_control'/'rte_ctrl_thread_create'. We could avoid creating the local copies of start_routine and the arg.
See more comments below.

> 
> This permits races for both the failure and success paths after ctrl_thread_status
> is stored.
>   * params->ret may be loaded in ctrl_thread_init failure path
>   * params->arg may be loaded in ctrl_thread_start or
>     control_thread_start when calling start_routine.
> 
> For ctrl_thread_init remove the params->ret load and just return 1 since it is
> only interpreted as a indicator of success / failure of ctrl_thread_init.
> 
> For {ctrl,control}_thread_start store param->arg in stack allocated storage prior
> to calling ctrl_thread_init and use the copy when calling start_routine.
> 
> For control_thread_start if ctrl_thread_init fails just return 0 instead of loading
> params->ret, since the value returned is unused when ctrl_thread_status is set
> to CTRL_THREAD_ERROR when ctrl_thread_init fails.
> 
> Fixes: 878b7468eacb ("eal: add platform agnostic control thread API")
> 
> Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> Reviewed-by: David Marchand <david.marchand@redhat.com>
> ---
>  lib/eal/common/eal_common_thread.c | 10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/lib/eal/common/eal_common_thread.c
> b/lib/eal/common/eal_common_thread.c
> index edb9d4e..079a385 100644
> --- a/lib/eal/common/eal_common_thread.c
> +++ b/lib/eal/common/eal_common_thread.c
> @@ -256,7 +256,7 @@ static int ctrl_thread_init(void *arg)
>  	if (params->ret != 0) {
>  		__atomic_store_n(&params->ctrl_thread_status,
>  			CTRL_THREAD_ERROR, __ATOMIC_RELEASE);
> -		return params->ret;
> +		return 1;
>  	}
> 
>  	__atomic_store_n(&params->ctrl_thread_status,
> @@ -268,23 +268,25 @@ static int ctrl_thread_init(void *arg)  static void
> *ctrl_thread_start(void *arg)  {
>  	struct rte_thread_ctrl_params *params = arg;
> +	void *start_arg = params->arg;
>  	void *(*start_routine)(void *) = params->u.ctrl_start_routine;
These copies can be avoided, code will be much simpler

> 
>  	if (ctrl_thread_init(arg) != 0)
>  		return NULL;
> 
> -	return start_routine(params->arg);
> +	return start_routine(start_arg);
We can free 'params' here after 'start_routine' returns.

>  }
> 
>  static uint32_t control_thread_start(void *arg)  {
>  	struct rte_thread_ctrl_params *params = arg;
> +	void *start_arg = params->arg;
>  	rte_thread_func start_routine = params->u.control_start_routine;
> 
>  	if (ctrl_thread_init(arg) != 0)
> -		return params->ret;
> +		return 0;
> 
> -	return start_routine(params->arg);
> +	return start_routine(start_arg);
>  }
> 
>  int
> --
> 1.8.3.1
  
Tyler Retzlaff March 2, 2023, 1:45 a.m. UTC | #2
On Thu, Mar 02, 2023 at 01:30:13AM +0000, Honnappa Nagarahalli wrote:
> 
> 
> > -----Original Message-----
> > From: Tyler Retzlaff <roretzla@linux.microsoft.com>
> > Sent: Wednesday, March 1, 2023 3:10 PM
> > To: dev@dpdk.org; david.marchand@redhat.com
> > Cc: thomas@monjalon.net; Tyler Retzlaff <roretzla@linux.microsoft.com>
> > Subject: [PATCH v2] eal: fix thread race in control thread creation
> > 
> > When ctrl_thread_init transitions params->ctrl_thread_status from
> > CTRL_THREAD_LAUNCHING the creating thread and new thread may run
> > concurrently leading to unsynchronized access to params.
> IMO, the code will be simpler if we did not free 'params' in 'rte_thread_create_control'/'rte_ctrl_thread_create'. We could avoid creating the local copies of start_routine and the arg.

You mean in the success case i assume? it still has to be free'd if
rte_thread_create fails.

> See more comments below.
> 
> > 
> > This permits races for both the failure and success paths after ctrl_thread_status
> > is stored.
> >   * params->ret may be loaded in ctrl_thread_init failure path
> >   * params->arg may be loaded in ctrl_thread_start or
> >     control_thread_start when calling start_routine.
> > 
> > For ctrl_thread_init remove the params->ret load and just return 1 since it is
> > only interpreted as a indicator of success / failure of ctrl_thread_init.
> > 
> > For {ctrl,control}_thread_start store param->arg in stack allocated storage prior
> > to calling ctrl_thread_init and use the copy when calling start_routine.
> > 
> > For control_thread_start if ctrl_thread_init fails just return 0 instead of loading
> > params->ret, since the value returned is unused when ctrl_thread_status is set
> > to CTRL_THREAD_ERROR when ctrl_thread_init fails.
> > 
> > Fixes: 878b7468eacb ("eal: add platform agnostic control thread API")
> > 
> > Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> > Reviewed-by: David Marchand <david.marchand@redhat.com>
> > ---
> >  lib/eal/common/eal_common_thread.c | 10 ++++++----
> >  1 file changed, 6 insertions(+), 4 deletions(-)
> > 
> > diff --git a/lib/eal/common/eal_common_thread.c
> > b/lib/eal/common/eal_common_thread.c
> > index edb9d4e..079a385 100644
> > --- a/lib/eal/common/eal_common_thread.c
> > +++ b/lib/eal/common/eal_common_thread.c
> > @@ -256,7 +256,7 @@ static int ctrl_thread_init(void *arg)
> >  	if (params->ret != 0) {
> >  		__atomic_store_n(&params->ctrl_thread_status,
> >  			CTRL_THREAD_ERROR, __ATOMIC_RELEASE);
> > -		return params->ret;
> > +		return 1;
> >  	}
> > 
> >  	__atomic_store_n(&params->ctrl_thread_status,
> > @@ -268,23 +268,25 @@ static int ctrl_thread_init(void *arg)  static void
> > *ctrl_thread_start(void *arg)  {
> >  	struct rte_thread_ctrl_params *params = arg;
> > +	void *start_arg = params->arg;
> >  	void *(*start_routine)(void *) = params->u.ctrl_start_routine;
> These copies can be avoided, code will be much simpler
> 
> > 
> >  	if (ctrl_thread_init(arg) != 0)
> >  		return NULL;
> > 
> > -	return start_routine(params->arg);
> > +	return start_routine(start_arg);
> We can free 'params' here after 'start_routine' returns.

I guess it doesn't matter if the allocation is retained for the duration
of start_routine() which could be ~long.

David/Honnappah let me know what you decide. if you'd prefer to change
to honnappah's suggestion i'll put a new version up.

> 
> >  }
> > 
> >  static uint32_t control_thread_start(void *arg)  {
> >  	struct rte_thread_ctrl_params *params = arg;
> > +	void *start_arg = params->arg;
> >  	rte_thread_func start_routine = params->u.control_start_routine;
> > 
> >  	if (ctrl_thread_init(arg) != 0)
> > -		return params->ret;
> > +		return 0;
> > 
> > -	return start_routine(params->arg);
> > +	return start_routine(start_arg);
> >  }
> > 
> >  int
> > --
> > 1.8.3.1
  
Honnappa Nagarahalli March 2, 2023, 4:08 a.m. UTC | #3
<snip>

> > >
> > > When ctrl_thread_init transitions params->ctrl_thread_status from
> > > CTRL_THREAD_LAUNCHING the creating thread and new thread may run
> > > concurrently leading to unsynchronized access to params.
> > IMO, the code will be simpler if we did not free 'params' in
> 'rte_thread_create_control'/'rte_ctrl_thread_create'. We could avoid creating
> the local copies of start_routine and the arg.
> 
> You mean in the success case i assume? it still has to be free'd if
> rte_thread_create fails.
Yes

> 
> > See more comments below.
> >
> > >
> > > This permits races for both the failure and success paths after
> > > ctrl_thread_status is stored.
> > >   * params->ret may be loaded in ctrl_thread_init failure path
> > >   * params->arg may be loaded in ctrl_thread_start or
> > >     control_thread_start when calling start_routine.
> > >
> > > For ctrl_thread_init remove the params->ret load and just return 1
> > > since it is only interpreted as a indicator of success / failure of
> ctrl_thread_init.
> > >
> > > For {ctrl,control}_thread_start store param->arg in stack allocated
> > > storage prior to calling ctrl_thread_init and use the copy when calling
> start_routine.
> > >
> > > For control_thread_start if ctrl_thread_init fails just return 0
> > > instead of loading
> > > params->ret, since the value returned is unused when
> > > params->ctrl_thread_status is set
> > > to CTRL_THREAD_ERROR when ctrl_thread_init fails.
> > >
> > > Fixes: 878b7468eacb ("eal: add platform agnostic control thread
> > > API")
> > >
> > > Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> > > Reviewed-by: David Marchand <david.marchand@redhat.com>
> > > ---
> > >  lib/eal/common/eal_common_thread.c | 10 ++++++----
> > >  1 file changed, 6 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/lib/eal/common/eal_common_thread.c
> > > b/lib/eal/common/eal_common_thread.c
> > > index edb9d4e..079a385 100644
> > > --- a/lib/eal/common/eal_common_thread.c
> > > +++ b/lib/eal/common/eal_common_thread.c
> > > @@ -256,7 +256,7 @@ static int ctrl_thread_init(void *arg)
> > >  	if (params->ret != 0) {
> > >  		__atomic_store_n(&params->ctrl_thread_status,
> > >  			CTRL_THREAD_ERROR, __ATOMIC_RELEASE);
> > > -		return params->ret;
> > > +		return 1;
> > >  	}
> > >
> > >  	__atomic_store_n(&params->ctrl_thread_status,
> > > @@ -268,23 +268,25 @@ static int ctrl_thread_init(void *arg)  static
> > > void *ctrl_thread_start(void *arg)  {
> > >  	struct rte_thread_ctrl_params *params = arg;
> > > +	void *start_arg = params->arg;
> > >  	void *(*start_routine)(void *) = params->u.ctrl_start_routine;
> > These copies can be avoided, code will be much simpler
> >
> > >
> > >  	if (ctrl_thread_init(arg) != 0)
> > >  		return NULL;
> > >
> > > -	return start_routine(params->arg);
> > > +	return start_routine(start_arg);
> > We can free 'params' here after 'start_routine' returns.
> 
> I guess it doesn't matter if the allocation is retained for the duration of
> start_routine() which could be ~long.
Yes, that's what I thought, it is a small size.

> 
> David/Honnappah let me know what you decide. if you'd prefer to change to
> honnappah's suggestion i'll put a new version up.
> 
> >
> > >  }
> > >
> > >  static uint32_t control_thread_start(void *arg)  {
> > >  	struct rte_thread_ctrl_params *params = arg;
> > > +	void *start_arg = params->arg;
> > >  	rte_thread_func start_routine = params->u.control_start_routine;
> > >
> > >  	if (ctrl_thread_init(arg) != 0)
> > > -		return params->ret;
> > > +		return 0;
> > >
> > > -	return start_routine(params->arg);
> > > +	return start_routine(start_arg);
> > >  }
> > >
> > >  int
> > > --
> > > 1.8.3.1
  
David Marchand March 7, 2023, 1:53 p.m. UTC | #4
On Thu, Mar 2, 2023 at 5:18 AM Honnappa Nagarahalli
<Honnappa.Nagarahalli@arm.com> wrote:
> > > >   if (ctrl_thread_init(arg) != 0)
> > > >           return NULL;
> > > >
> > > > - return start_routine(params->arg);
> > > > + return start_routine(start_arg);
> > > We can free 'params' here after 'start_routine' returns.
> >
> > I guess it doesn't matter if the allocation is retained for the duration of
> > start_routine() which could be ~long.
> Yes, that's what I thought, it is a small size.
>
> >
> > David/Honnappah let me know what you decide. if you'd prefer to change to
> > honnappah's suggestion i'll put a new version up.

Hum, how would it look like?
- the parent thread would free params if the child thread fails to set affinity,
- the child thread would free params otherwise,

Is this what you propose?

This works, but we have to be extra careful if any change is done in
this part later.
  
Tyler Retzlaff March 7, 2023, 5:54 p.m. UTC | #5
On Tue, Mar 07, 2023 at 02:53:34PM +0100, David Marchand wrote:
> On Thu, Mar 2, 2023 at 5:18 AM Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com> wrote:
> > > > >   if (ctrl_thread_init(arg) != 0)
> > > > >           return NULL;
> > > > >
> > > > > - return start_routine(params->arg);
> > > > > + return start_routine(start_arg);
> > > > We can free 'params' here after 'start_routine' returns.
> > >
> > > I guess it doesn't matter if the allocation is retained for the duration of
> > > start_routine() which could be ~long.
> > Yes, that's what I thought, it is a small size.
> >
> > >
> > > David/Honnappah let me know what you decide. if you'd prefer to change to
> > > honnappah's suggestion i'll put a new version up.
> 
> Hum, how would it look like?
> - the parent thread would free params if the child thread fails to set affinity,
> - the child thread would free params otherwise,
> 
> Is this what you propose?
> 
> This works, but we have to be extra careful if any change is done in
> this part later.

i think the fix i've already put up is kind of trivially verifiable as
correct, if we just need a fix then it works.

> 
> 
> -- 
> David Marchand
  
David Marchand March 9, 2023, 9:01 a.m. UTC | #6
On Wed, Mar 1, 2023 at 10:09 PM Tyler Retzlaff
<roretzla@linux.microsoft.com> wrote:
>
> When ctrl_thread_init transitions params->ctrl_thread_status from
> CTRL_THREAD_LAUNCHING the creating thread and new thread may run
> concurrently leading to unsynchronized access to params.
>
> This permits races for both the failure and success paths after
> ctrl_thread_status is stored.
>   * params->ret may be loaded in ctrl_thread_init failure path
>   * params->arg may be loaded in ctrl_thread_start or
>     control_thread_start when calling start_routine.
>
> For ctrl_thread_init remove the params->ret load and just return 1 since
> it is only interpreted as a indicator of success / failure of
> ctrl_thread_init.
>
> For {ctrl,control}_thread_start store param->arg in stack allocated
> storage prior to calling ctrl_thread_init and use the copy when calling
> start_routine.
>
> For control_thread_start if ctrl_thread_init fails just return 0 instead
> of loading params->ret, since the value returned is unused when
> ctrl_thread_status is set to CTRL_THREAD_ERROR when ctrl_thread_init
> fails.
>
> Fixes: 878b7468eacb ("eal: add platform agnostic control thread API")
>
> Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> Reviewed-by: David Marchand <david.marchand@redhat.com>

Honnappa reached out offlist and proposed to send the discussed rework
as a followup patch.
So I'll take this fix as is for now.

Applied, thanks Tyler.
  

Patch

diff --git a/lib/eal/common/eal_common_thread.c b/lib/eal/common/eal_common_thread.c
index edb9d4e..079a385 100644
--- a/lib/eal/common/eal_common_thread.c
+++ b/lib/eal/common/eal_common_thread.c
@@ -256,7 +256,7 @@  static int ctrl_thread_init(void *arg)
 	if (params->ret != 0) {
 		__atomic_store_n(&params->ctrl_thread_status,
 			CTRL_THREAD_ERROR, __ATOMIC_RELEASE);
-		return params->ret;
+		return 1;
 	}
 
 	__atomic_store_n(&params->ctrl_thread_status,
@@ -268,23 +268,25 @@  static int ctrl_thread_init(void *arg)
 static void *ctrl_thread_start(void *arg)
 {
 	struct rte_thread_ctrl_params *params = arg;
+	void *start_arg = params->arg;
 	void *(*start_routine)(void *) = params->u.ctrl_start_routine;
 
 	if (ctrl_thread_init(arg) != 0)
 		return NULL;
 
-	return start_routine(params->arg);
+	return start_routine(start_arg);
 }
 
 static uint32_t control_thread_start(void *arg)
 {
 	struct rte_thread_ctrl_params *params = arg;
+	void *start_arg = params->arg;
 	rte_thread_func start_routine = params->u.control_start_routine;
 
 	if (ctrl_thread_init(arg) != 0)
-		return params->ret;
+		return 0;
 
-	return start_routine(params->arg);
+	return start_routine(start_arg);
 }
 
 int