[v2,3/4] timer: fix function to stop all timers

Message ID 20220810070958.3111119-1-s.v.naga.harish.k@intel.com (mailing list archive)
State Superseded, archived
Delegated to: Jerin Jacob
Series [v2,1/4] eventdev/timer: add periodic event timer support

Checks

Context Check Description
ci/checkpatch warning coding style issues
ci/Intel-compilation success Compilation OK
ci/intel-Testing success Testing PASS
ci/github-robot: build success github build: passed

Commit Message

Naga Harish K, S V Aug. 10, 2022, 7:09 a.m. UTC
There is a possibility of deadlock in this API, because the same
spinlock can be acquired in a nested manner.

In the timer_del() function, if the previous owner lcore and the
current lcore are different, the function tries to acquire the list
lock even though the caller of timer_del() already holds it.

This patch removes the nested lock acquisition.

Fixes: 821c51267bcd63a ("timer: add function to stop all timers in a list")
Cc: stable@dpdk.org

Signed-off-by: Naga Harish K S V <s.v.naga.harish.k@intel.com>
---
 lib/timer/rte_timer.c | 13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)
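
For readers following the thread, the hang described in the commit message can be sketched in isolation. This is a minimal model, not the DPDK code: demo_spinlock_t, demo_timer_del() and the demo_stop_all_*() helpers are hypothetical names, the lock is a plain C11 atomic_flag rather than rte_spinlock_t, and a trylock is used where the real code would spin, so the sketch reports the would-be deadlock instead of hanging.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical non-recursive spinlock; a stand-in, not rte_spinlock_t. */
typedef struct { atomic_flag locked; } demo_spinlock_t;

static demo_spinlock_t demo_list_lock = { ATOMIC_FLAG_INIT };

static void demo_lock(demo_spinlock_t *l)
{
	while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire))
		; /* spin until the flag is released */
}

static bool demo_trylock(demo_spinlock_t *l)
{
	return !atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire);
}

static void demo_unlock(demo_spinlock_t *l)
{
	atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

/*
 * Models the locking decision described in the review: the list lock is
 * taken when the owner lcore differs from the current lcore, even if the
 * caller says it already holds the lock. Returns false where a blocking
 * lock call would have spun forever.
 */
static bool demo_timer_del(unsigned int prev_owner, unsigned int lcore_id,
			   int local_is_locked)
{
	if (prev_owner != lcore_id || !local_is_locked) {
		if (!demo_trylock(&demo_list_lock))
			return false; /* lock already held: nested acquisition */
		/* ... the timer would be unlinked from the list here ... */
		demo_unlock(&demo_list_lock);
	}
	return true;
}

/* Pre-patch shape: the caller holds the list lock around the delete. */
static bool demo_stop_all_nested(unsigned int prev_owner, unsigned int lcore_id)
{
	bool ok;

	demo_lock(&demo_list_lock);
	ok = demo_timer_del(prev_owner, lcore_id, 1);
	demo_unlock(&demo_list_lock);
	return ok;
}

/* Post-patch shape: the caller takes no lock and passes 0. */
static bool demo_stop_all_fixed(unsigned int prev_owner, unsigned int lcore_id)
{
	return demo_timer_del(prev_owner, lcore_id, 0);
}
```

In this model, demo_stop_all_nested() succeeds only when the owner and the calling lcore match, while demo_stop_all_fixed() succeeds in both cases because the lock is only ever taken once.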
  

Comments

Carrillo, Erik G Aug. 10, 2022, 7:29 p.m. UTC | #1
Hi Harish,

> -----Original Message-----
> From: Naga Harish K, S V <s.v.naga.harish.k@intel.com>
> Sent: Wednesday, August 10, 2022 2:10 AM
> To: Carrillo, Erik G <erik.g.carrillo@intel.com>
> Cc: dev@dpdk.org; stable@dpdk.org
> Subject: [PATCH v2 3/4] timer: fix function to stop all timers
> 
> There is a possibility of deadlock in this API, as same spinlock is tried to be
> acquired in nested manner.
> 
> In timer_del function, if the previous owner and current owner lcore are

It might be clearer to say something like:

 "If the lcore that is stopping the timer is different from the lcore that owns the timer, the timer list lock is acquired in timer_del(), even if local_is_locked is true.  Because the same lock was already acquired in rte_timer_stop_all(), the thread will hang."
  
Thanks,
Erik

> different, the lock is tried to be acquired even though the same lock is
> already acquired by the caller of timer_del function.
> 
> This patch removes the acquisition of nested locking.
> 
> Fixes: 821c51267bcd63a ("timer: add function to stop all timers in a list")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Naga Harish K S V <s.v.naga.harish.k@intel.com>
> ---
  
Stephen Hemminger Aug. 10, 2022, 7:38 p.m. UTC | #2
On Wed, 10 Aug 2022 19:29:36 +0000
"Carrillo, Erik G" <erik.g.carrillo@intel.com> wrote:

> Hi Harish,
> 
> > -----Original Message-----
> > From: Naga Harish K, S V <s.v.naga.harish.k@intel.com>
> > Sent: Wednesday, August 10, 2022 2:10 AM
> > To: Carrillo, Erik G <erik.g.carrillo@intel.com>
> > Cc: dev@dpdk.org; stable@dpdk.org
> > Subject: [PATCH v2 3/4] timer: fix function to stop all timers
> > 
> > There is a possibility of deadlock in this API, as same spinlock is tried to be
> > acquired in nested manner.
> > 
> > In timer_del function, if the previous owner and current owner lcore are  
> 
> It might be clearer to say something like:
> 
>  "If the lcore that is stopping the timer is different from the lcore that owns the timer, the timer list lock is acquired in timer_del(), even if local_is_locked is true.  Because the same lock was already acquired in rte_timer_stop_all(), the thread will hang."
>   

Yes, the timer owner flag acts like a lock, and this is an AB-BA deadlock.
  
Naga Harish K, S V Aug. 11, 2022, 3:42 p.m. UTC | #3
Hi Gabe,

> -----Original Message-----
> From: Carrillo, Erik G <erik.g.carrillo@intel.com>
> Sent: Thursday, August 11, 2022 1:00 AM
> To: Naga Harish K, S V <s.v.naga.harish.k@intel.com>
> Cc: dev@dpdk.org; stable@dpdk.org
> Subject: RE: [PATCH v2 3/4] timer: fix function to stop all timers
> 
> Hi Harish,
> 
> > -----Original Message-----
> > From: Naga Harish K, S V <s.v.naga.harish.k@intel.com>
> > Sent: Wednesday, August 10, 2022 2:10 AM
> > To: Carrillo, Erik G <erik.g.carrillo@intel.com>
> > Cc: dev@dpdk.org; stable@dpdk.org
> > Subject: [PATCH v2 3/4] timer: fix function to stop all timers
> >
> > There is a possibility of deadlock in this API, as same spinlock is
> > tried to be acquired in nested manner.
> >
> > In timer_del function, if the previous owner and current owner lcore
> > are
> 
> It might be clearer to say something like:
> 
>  "If the lcore that is stopping the timer is different from the lcore that owns
> the timer, the timer list lock is acquired in timer_del(), even if local_is_locked
> is true.  Because the same lock was already acquired in rte_timer_stop_all(),
> the thread will hang."
> 

I have incorporated the suggested commit message in v3 of the patch.

> Thanks,
> Erik
> 
> > different, the lock is tried to be acquired even though the same lock
> > is already acquired by the caller of timer_del function.
> >
> > This patch removes the acquisition of nested locking.
> >
> > Fixes: 821c51267bcd63a ("timer: add function to stop all timers in a
> > list")
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Naga Harish K S V <s.v.naga.harish.k@intel.com>
> > ---
  

Patch

diff --git a/lib/timer/rte_timer.c b/lib/timer/rte_timer.c
index 9994813d0d..85d67573eb 100644
--- a/lib/timer/rte_timer.c
+++ b/lib/timer/rte_timer.c
@@ -580,7 +580,7 @@  rte_timer_reset_sync(struct rte_timer *tim, uint64_t ticks,
 }
 
 static int
-__rte_timer_stop(struct rte_timer *tim, int local_is_locked,
+__rte_timer_stop(struct rte_timer *tim,
 		 struct rte_timer_data *timer_data)
 {
 	union rte_timer_status prev_status, status;
@@ -602,7 +602,7 @@  __rte_timer_stop(struct rte_timer *tim, int local_is_locked,
 
 	/* remove it from list */
 	if (prev_status.state == RTE_TIMER_PENDING) {
-		timer_del(tim, prev_status, local_is_locked, priv_timer);
+		timer_del(tim, prev_status, 0, priv_timer);
 		__TIMER_STAT_ADD(priv_timer, pending, -1);
 	}
 
@@ -631,7 +631,7 @@  rte_timer_alt_stop(uint32_t timer_data_id, struct rte_timer *tim)
 
 	TIMER_DATA_VALID_GET_OR_ERR_RET(timer_data_id, timer_data, -EINVAL);
 
-	return __rte_timer_stop(tim, 0, timer_data);
+	return __rte_timer_stop(tim, timer_data);
 }
 
 /* loop until rte_timer_stop() succeed */
@@ -987,21 +987,16 @@  rte_timer_stop_all(uint32_t timer_data_id, unsigned int *walk_lcores,
 		walk_lcore = walk_lcores[i];
 		priv_timer = &timer_data->priv_timer[walk_lcore];
 
-		rte_spinlock_lock(&priv_timer->list_lock);
-
 		for (tim = priv_timer->pending_head.sl_next[0];
 		     tim != NULL;
 		     tim = next_tim) {
 			next_tim = tim->sl_next[0];
 
-			/* Call timer_stop with lock held */
-			__rte_timer_stop(tim, 1, timer_data);
+			__rte_timer_stop(tim, timer_data);
 
 			if (f)
 				f(tim, f_arg);
 		}
-
-		rte_spinlock_unlock(&priv_timer->list_lock);
 	}
 
 	return 0;
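
The loop in rte_timer_stop_all() reads the next pointer into next_tim before stopping the current timer. A minimal sketch of that traversal pattern on a plain singly linked list follows. The names (struct demo_timer, demo_unlink(), demo_stop_all(), demo_stop_three()) are hypothetical, and the real library keeps pending timers in a skip list rather than the simple list assumed here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node type; struct rte_timer itself uses skip-list links. */
struct demo_timer {
	struct demo_timer *next;
	int pending;
};

/* Unlink 'tim' from the list headed by '*head' and clear its link. */
static void demo_unlink(struct demo_timer **head, struct demo_timer *tim)
{
	struct demo_timer **pp = head;

	while (*pp != NULL && *pp != tim)
		pp = &(*pp)->next;
	if (*pp == tim) {
		*pp = tim->next;
		tim->next = NULL; /* the link must not be followed after removal */
		tim->pending = 0;
	}
}

/* Stop every timer on the list; returns how many were stopped. */
static int demo_stop_all(struct demo_timer **head)
{
	struct demo_timer *tim, *next_tim;
	int stopped = 0;

	for (tim = *head; tim != NULL; tim = next_tim) {
		next_tim = tim->next; /* save before the node is unlinked */
		demo_unlink(head, tim);
		stopped++;
	}
	return stopped;
}

/* Builds a three-node list, stops everything, and returns the count. */
static int demo_stop_three(void)
{
	struct demo_timer c = { NULL, 1 };
	struct demo_timer b = { &c, 1 };
	struct demo_timer a = { &b, 1 };
	struct demo_timer *head = &a;
	int n = demo_stop_all(&head);

	assert(head == NULL);
	assert(!a.pending && !b.pending && !c.pending);
	return n;
}
```

Reading tim->next after demo_unlink() would end the walk after the first node in this sketch, which is why the pointer is saved first.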