[2/2] eal: fix hang in ctrl thread creation error logic

Message ID: 20210407201603.149234-2-lucp.at.work@gmail.com (mailing list archive)
State: Accepted, archived
Delegated to: David Marchand
Series: [1/2] eal: fix race in ctrl thread creation

Checks

Context                      Check    Description
ci/checkpatch                warning  coding style issues
ci/iol-intel-Performance     success  Performance Testing PASS
ci/travis-robot              success  travis build: passed
ci/github-robot              success  github build: passed
ci/iol-mellanox-Performance  success  Performance Testing PASS
ci/iol-abi-testing           success  Testing PASS
ci/Intel-compilation         success  Compilation OK
ci/iol-testing               success  Testing PASS
ci/intel-Testing             success  Testing PASS

Commit Message

Luc Pelletier April 7, 2021, 8:16 p.m. UTC
The affinity of a control thread is set after it has been launched. If
setting the affinity fails, pthread_cancel is called followed by a call
to pthread_join, which can hang forever if the thread's start routine
doesn't call a pthread cancellation point.

This patch modifies the logic so that the control thread exits
gracefully if the affinity cannot be set successfully and removes the
call to pthread_cancel.

Fixes: 6383d26 ("eal: set name when creating a control thread")
Cc: olivier.matz@6wind.com
Cc: stable@dpdk.org

Signed-off-by: Luc Pelletier <lucp.at.work@gmail.com>
---

Hi Olivier,
Hi Honnappa,

As discussed, I've split the changes into 2 patches. This second commit
removes the pthread_cancel call, which could result in a hang on join if
the ctrl thread routine never reached a cancellation point.

 lib/librte_eal/common/eal_common_thread.c | 29 +++++++++++++----------
 1 file changed, 17 insertions(+), 12 deletions(-)
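
To illustrate the hang described in the commit message: with the default
(deferred) cancellation type, pthread_cancel only marks the request as
pending, and the target thread acts on it at its next cancellation point.
The following is a minimal sketch outside of DPDK, not code from the patch.
The names spin_routine, cancel_is_deferred, ticks and release are
hypothetical, and the release flag exists only so that the demonstration
terminates. A start routine that never reaches a cancellation point would
leave pthread_join blocked forever.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_ulong ticks;  /* incremented by the spinning thread */
static atomic_int release;  /* set by the caller to end the busy loop */

/* Hypothetical start routine: a busy loop with no cancellation point. */
static void *spin_routine(void *arg)
{
	(void)arg;
	while (!atomic_load(&release))
		atomic_fetch_add(&ticks, 1);
	/* First cancellation point: a pending cancel is acted on only here. */
	pthread_testcancel();
	return NULL;
}

/*
 * Returns 1 if the thread kept running after pthread_cancel() and was
 * cancelled only once it reached pthread_testcancel(), 0 otherwise.
 */
static int cancel_is_deferred(void)
{
	pthread_t t;
	void *res = NULL;
	unsigned long seen;
	int cancel_ret, rc;

	atomic_store(&ticks, 0);
	atomic_store(&release, 0);
	if (pthread_create(&t, NULL, spin_routine, NULL) != 0)
		return 0;

	cancel_ret = pthread_cancel(t);

	/* The thread still makes progress although a cancel is pending. */
	seen = atomic_load(&ticks);
	while (atomic_load(&ticks) == seen)
		sched_yield();

	/* Without this release, the join below would never return. */
	atomic_store(&release, 1);
	rc = pthread_join(t, &res);
	assert(rc == 0);
	(void)rc;

	return cancel_ret == 0 && res == PTHREAD_CANCELED;
}
```

Calling cancel_is_deferred() from a test program is enough to observe the
behaviour: the loop counter keeps advancing after the cancel request, and
the join completes only because the routine eventually reaches
pthread_testcancel.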
  

Comments

Olivier Matz April 8, 2021, 2:20 p.m. UTC | #1
Hi Luc,

On Wed, Apr 07, 2021 at 04:16:06PM -0400, Luc Pelletier wrote:
> The affinity of a control thread is set after it has been launched. If
> setting the affinity fails, pthread_cancel is called followed by a call
> to pthread_join, which can hang forever if the thread's start routine
> doesn't call a pthread cancellation point.
> 
> This patch modifies the logic so that the control thread exits
> gracefully if the affinity cannot be set successfully and removes the
> call to pthread_cancel.
> 
> Fixes: 6383d26 ("eal: set name when creating a control thread")
> Cc: olivier.matz@6wind.com
> Cc: stable@dpdk.org
> 
> Signed-off-by: Luc Pelletier <lucp.at.work@gmail.com>

Thank you for these 2 fixes. Note that the titles of your patches do not
contain the version (should have been v8?). I don't know how critical
that is for committers.

Acked-by: Olivier Matz <olivier.matz@6wind.com>
  
Honnappa Nagarahalli April 8, 2021, 5:07 p.m. UTC | #2
<snip>

> 
> The affinity of a control thread is set after it has been launched. If setting the
> affinity fails, pthread_cancel is called followed by a call to pthread_join, which
> can hang forever if the thread's start routine doesn't call a pthread
> cancellation point.
> 
> This patch modifies the logic so that the control thread exits gracefully if the
> affinity cannot be set successfully and removes the call to pthread_cancel.
> 
> Fixes: 6383d26 ("eal: set name when creating a control thread")
> Cc: olivier.matz@6wind.com
> Cc: stable@dpdk.org
> 
> Signed-off-by: Luc Pelletier <lucp.at.work@gmail.com>
Looks good.
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>

> ---
> 
<snip>
  
Luc Pelletier April 8, 2021, 6:01 p.m. UTC | #3
> Thank you for these 2 fixes. Note that the titles of your patches do not
> contain the version (should have been v8?). I don't know how critical
> that is for committers.

Thanks Olivier. I'll admit that I wasn't sure if I should version the
patches after splitting the original. I opted not to but it seems like
I should have. If it's a problem, please let me know and I'll repost
them with 'v8'.

On Thu, Apr 8, 2021 at 10:20 AM Olivier Matz <olivier.matz@6wind.com> wrote:
>
> Hi Luc,
>
> On Wed, Apr 07, 2021 at 04:16:06PM -0400, Luc Pelletier wrote:
> > The affinity of a control thread is set after it has been launched. If
> > setting the affinity fails, pthread_cancel is called followed by a call
> > to pthread_join, which can hang forever if the thread's start routine
> > doesn't call a pthread cancellation point.
> >
> > This patch modifies the logic so that the control thread exits
> > gracefully if the affinity cannot be set successfully and removes the
> > call to pthread_cancel.
> >
> > Fixes: 6383d26 ("eal: set name when creating a control thread")
> > Cc: olivier.matz@6wind.com
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Luc Pelletier <lucp.at.work@gmail.com>
>
> Thank you for these 2 fixes. Note that the titles of your patches do not
> contain the version (should have been v8?). I don't know how critical
> that is for committers.
>
> Acked-by: Olivier Matz <olivier.matz@6wind.com>
  
David Marchand April 9, 2021, 8:13 a.m. UTC | #4
On Thu, Apr 8, 2021 at 8:02 PM Luc Pelletier <lucp.at.work@gmail.com> wrote:
>
> > Thank you for these 2 fixes. Note that the titles of your patches do not
> > contain the version (should have been v8?). I don't know how critical
> > that is for committers.
>
> Thanks Olivier. I'll admit that I wasn't sure if I should version the
> patches after splitting the original. I opted not to but it seems like
> I should have. If it's a problem, please let me know and I'll repost
> them with 'v8'.

I followed this series closely, so not an issue for me.
No need to resend.
I'll look at merging it today.
  
David Marchand April 9, 2021, 2:34 p.m. UTC | #5
On Wed, Apr 7, 2021 at 10:29 PM Luc Pelletier <lucp.at.work@gmail.com> wrote:
>
> The affinity of a control thread is set after it has been launched. If
> setting the affinity fails, pthread_cancel is called followed by a call
> to pthread_join, which can hang forever if the thread's start routine
> doesn't call a pthread cancellation point.
>
> This patch modifies the logic so that the control thread exits
> gracefully if the affinity cannot be set successfully and removes the
> call to pthread_cancel.
>
> Fixes: 6383d26 ("eal: set name when creating a control thread")

Fixed the sha1 while applying.
We prefer sha1s abbreviated to 12 characters, as described in
https://doc.dpdk.org/guides/contributing/patches.html#commit-messages-body.

> Cc: stable@dpdk.org
>
> Signed-off-by: Luc Pelletier <lucp.at.work@gmail.com>

Acked-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>

Series applied, thanks for the fixes.
  

Patch

diff --git a/lib/librte_eal/common/eal_common_thread.c b/lib/librte_eal/common/eal_common_thread.c
index 3347e91bf..03dbcd9e8 100644
--- a/lib/librte_eal/common/eal_common_thread.c
+++ b/lib/librte_eal/common/eal_common_thread.c
@@ -187,14 +187,18 @@  static void *ctrl_thread_init(void *arg)
 		eal_get_internal_configuration();
 	rte_cpuset_t *cpuset = &internal_conf->ctrl_cpuset;
 	struct rte_thread_ctrl_params *params = arg;
-	void *(*start_routine)(void *) = params->start_routine;
+	void *(*start_routine)(void *);
 	void *routine_arg = params->arg;
 
 	__rte_thread_init(rte_lcore_id(), cpuset);
 
 	pthread_barrier_wait(&params->configured);
+	start_routine = params->start_routine;
 	ctrl_params_free(params);
 
+	if (start_routine == NULL)
+		return NULL;
+
 	return start_routine(routine_arg);
 }
 
@@ -218,14 +222,12 @@  rte_ctrl_thread_create(pthread_t *thread, const char *name,
 	params->refcnt = 2;
 
 	ret = pthread_barrier_init(&params->configured, NULL, 2);
-	if (ret != 0) {
-		free(params);
-		return -ret;
-	}
+	if (ret != 0)
+		goto fail_no_barrier;
 
 	ret = pthread_create(thread, attr, ctrl_thread_init, (void *)params);
 	if (ret != 0)
-		goto fail;
+		goto fail_with_barrier;
 
 	if (name != NULL) {
 		ret = rte_thread_setname(*thread, name);
@@ -236,19 +238,22 @@  rte_ctrl_thread_create(pthread_t *thread, const char *name,
 
 	ret = pthread_setaffinity_np(*thread, sizeof(*cpuset), cpuset);
 	if (ret != 0)
-		goto fail_cancel;
+		params->start_routine = NULL;
 
 	pthread_barrier_wait(&params->configured);
 	ctrl_params_free(params);
 
-	return 0;
+	if (ret != 0)
+		/* start_routine has been set to NULL above; */
+		/* ctrl thread will exit immediately */
+		pthread_join(*thread, NULL);
 
-fail_cancel:
-	pthread_cancel(*thread);
-	pthread_join(*thread, NULL);
+	return -ret;
 
-fail:
+fail_with_barrier:
 	pthread_barrier_destroy(&params->configured);
+
+fail_no_barrier:
 	free(params);
 
 	return -ret;