[1/1] timer: add limitation note for sync stop and reset

Message ID 1599662474-44882-2-git-send-email-erik.g.carrillo@intel.com (mailing list archive)
State Accepted, archived
Delegated to: David Marchand
Series timer: add limitation note for sync stop and reset

Checks

Context                       Check    Description
ci/checkpatch                 success  coding style OK
ci/iol-broadcom-Performance   success  Performance Testing PASS
ci/iol-broadcom-Functional    success  Functional Testing PASS
ci/iol-intel-Functional       success  Functional Testing PASS
ci/iol-intel-Performance      success  Performance Testing PASS
ci/iol-testing                success  Testing PASS
ci/travis-robot               success  Travis build: passed
ci/iol-mellanox-Performance   success  Performance Testing PASS
ci/Intel-compilation          success  Compilation OK

Commit Message

Carrillo, Erik G Sept. 9, 2020, 2:41 p.m. UTC
If a timer's callback function calls rte_timer_reset_sync() or
rte_timer_stop_sync() on another timer that is in the RUNNING state and
owned by the current lcore, the *_sync() calls will loop indefinitely.
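
The first hang can be illustrated with a small model. The sketch below is not DPDK code: toy_timer, toy_stop() and toy_stop_sync_bounded() are hypothetical stand-ins. It assumes that a non-blocking stop fails while the target timer is RUNNING, unless the target is owned by the calling lcore and is the timer whose callback is executing. The retry loop is bounded so that the example terminates; the *_sync() calls described above have no such bound.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: these names are hypothetical, not the DPDK API. */
enum toy_state { TOY_STOPPED, TOY_PENDING, TOY_RUNNING };

struct toy_timer {
	enum toy_state state;
	unsigned int owner; /* lcore that owns the timer */
};

/*
 * Non-blocking stop issued from `lcore`. Assumption: a RUNNING timer can
 * only be stopped by its owner lcore, and only when it is the timer whose
 * callback is executing right now. Returns 0 on success, -1 on failure.
 */
static int
toy_stop(struct toy_timer *tim, unsigned int lcore,
		const struct toy_timer *executing)
{
	assert(tim != NULL);
	if (tim->state == TOY_RUNNING &&
			(tim->owner != lcore || tim != executing))
		return -1;
	tim->state = TOY_STOPPED;
	return 0;
}

/*
 * Bounded stand-in for a sync stop: retry until the stop succeeds or
 * until max_spins attempts have failed. Returns the failed attempts.
 */
static unsigned int
toy_stop_sync_bounded(struct toy_timer *tim, unsigned int lcore,
		const struct toy_timer *executing, unsigned int max_spins)
{
	unsigned int spins = 0;

	while (spins < max_spins && toy_stop(tim, lcore, executing) != 0)
		spins++;
	return spins;
}
```

With timers a and b both RUNNING and owned by lcore 0, and a as the executing timer, toy_stop_sync_bounded(&b, 0, &a, 1000) uses up all 1000 attempts. In this model nothing moves b out of RUNNING until the callback for a returns, and that callback is the one spinning.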

Relatedly, if a timer's callback function calls *_sync() on another
timer that is in the RUNNING state and is owned by a different lcore,
while a timer callback function running on that different lcore calls
*_sync() on a timer that is in the RUNNING state and owned by the
current lcore, the two lcores will loop indefinitely.
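
The two-lcore case is a circular wait, which a short simulation can make concrete. This is a toy model with hypothetical names (toy_lcore, toy_simulate), not DPDK code. It assumes that a callback spinning in a *_sync() call can return only after the lcore it waits on has returned from its own callback.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical names, not the DPDK API. */
struct toy_lcore {
	int waiting_on;     /* lcore whose RUNNING timer is polled, or -1 */
	bool callback_done; /* true once the current callback has returned */
};

/*
 * Round-robin simulation over n lcores. Assumption: a spinning callback
 * can return only after the lcore it waits on has finished its callback.
 * Returns how many callbacks completed within max_rounds.
 */
static unsigned int
toy_simulate(struct toy_lcore *lc, unsigned int n, unsigned int max_rounds)
{
	unsigned int round, i, done = 0;

	for (round = 0; round < max_rounds; round++) {
		for (i = 0; i < n; i++) {
			int w = lc[i].waiting_on;

			assert(w < (int)n);
			if (lc[i].callback_done)
				continue;
			if (w < 0 || lc[w].callback_done) {
				lc[i].callback_done = true;
				done++;
			}
		}
	}
	return done;
}
```

When lcore 0 waits on lcore 1 and lcore 1 waits on lcore 0, no callback completes however many rounds are simulated. If only lcore 0 waits, both callbacks complete, which is why the hang appears only "in certain scenarios".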

Add a note in the rte_timer_stop_sync and rte_timer_reset_sync
documentation that indicates that these APIs should not be used inside
timer callback functions in order to avoid the hangs described above,
and suggests an alternative.
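
A callback that follows the suggested alternative makes one non-blocking attempt and records the outcome instead of spinning. The sketch below is a toy model: toy_timer_stop() only mimics the return convention documented for rte_timer_stop() (0 on success, -1 when the timer is RUNNING), and toy_cb_arg and toy_timer_cb() are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical names, not the DPDK API. */
enum toy_state { TOY_STOPPED, TOY_PENDING, TOY_RUNNING };

struct toy_timer {
	enum toy_state state;
};

/* Non-blocking stop: 0 on success, -1 if the timer is RUNNING. */
static int
toy_timer_stop(struct toy_timer *tim)
{
	assert(tim != NULL);
	if (tim->state == TOY_RUNNING)
		return -1;
	tim->state = TOY_STOPPED;
	return 0;
}

struct toy_cb_arg {
	struct toy_timer *other; /* the timer this callback wants to stop */
	bool stop_failed;        /* set when the single attempt failed */
};

/* Callback body: one attempt, no retry loop, so it always returns. */
static void
toy_timer_cb(struct toy_cb_arg *arg)
{
	arg->stop_failed = (toy_timer_stop(arg->other) != 0);
}
```

What to do when stop_failed is set is application policy, for example retrying later from outside the callback; the note added by this patch only says that the return code can be checked for success or failure.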

Bugzilla ID: 491
Cc: stable@dpdk.org

Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
---
 lib/librte_timer/rte_timer.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)
  

Comments

Honnappa Nagarahalli Sept. 10, 2020, 1:22 a.m. UTC | #1
<snip>

> 
> If a timer's callback function calls rte_timer_reset_sync() or
> rte_timer_stop_sync() on another timer that is in the RUNNING state and
> owned by the current lcore, the *_sync() calls will loop indefinitely.
> 
> Relatedly, if a timer's callback function calls *_sync() on another timer that is
> in the RUNNING state and is owned by a different lcore, but a timer callback
> function runs on that different lcore and calls
> *_sync() on a timer that is in the RUNNING state and owned by the current
> lcore, the two lcores will loop indefinitely.
> 
> Add a note in the rte_timer_stop_sync and rte_timer_reset_sync
> documentation that indicates that these APIs should not be used inside
> timer callback functions in order to avoid the hangs described above, and
> suggests an alternative.
> 
> Bugzilla ID: 491
> Cc: stable@dpdk.org
> 
> Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
Looks good.
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>

> ---
>  lib/librte_timer/rte_timer.h | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/lib/librte_timer/rte_timer.h b/lib/librte_timer/rte_timer.h
> index c6b3d45..d7c3e03 100644
> --- a/lib/librte_timer/rte_timer.h
> +++ b/lib/librte_timer/rte_timer.h
> @@ -274,6 +274,12 @@ int rte_timer_reset(struct rte_timer *tim, uint64_t ticks,
>   *   The callback function of the timer.
>   * @param arg
>   *   The user argument of the callback function.
> + *
> + * @note
> + *   This API should not be called inside a timer's callback function to
> + *   reset another timer; doing so could hang in certain scenarios. Instead,
> + *   the rte_timer_reset() API can be called directly and its return code
> + *   can be checked for success or failure.
>   */
>  void
>  rte_timer_reset_sync(struct rte_timer *tim, uint64_t ticks,
> @@ -313,6 +319,12 @@ int rte_timer_stop(struct rte_timer *tim);
>   *
>   * @param tim
>   *   The timer handle.
> + *
> + * @note
> + *   This API should not be called inside a timer's callback function to
> + *   stop another timer; doing so could hang in certain scenarios. Instead, the
> + *   rte_timer_stop() API can be called directly and its return code can
> + *   be checked for success or failure.
>   */
>  void rte_timer_stop_sync(struct rte_timer *tim);
> 
> --
> 2.6.4
  
David Marchand Oct. 8, 2020, 10:28 a.m. UTC | #2
On Thu, Sep 10, 2020 at 3:23 AM Honnappa Nagarahalli
<Honnappa.Nagarahalli@arm.com> wrote:
> > If a timer's callback function calls rte_timer_reset_sync() or
> > rte_timer_stop_sync() on another timer that is in the RUNNING state and
> > owned by the current lcore, the *_sync() calls will loop indefinitely.
> >
> > Relatedly, if a timer's callback function calls *_sync() on another timer that is
> > in the RUNNING state and is owned by a different lcore, but a timer callback
> > function runs on that different lcore and calls
> > *_sync() on a timer that is in the RUNNING state and owned by the current
> > lcore, the two lcores will loop indefinitely.
> >
> > Add a note in the rte_timer_stop_sync and rte_timer_reset_sync
> > documentation that indicates that these APIs should not be used inside
> > timer callback functions in order to avoid the hangs described above, and
> > suggests an alternative.
> >
> > Bugzilla ID: 491
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>

Applied, thanks.

Since we go with documenting a limitation, should we mark the original
patches [1] and [2] as rejected instead of deferred?

1: https://patches.dpdk.org/patch/75156/
2: https://patches.dpdk.org/patch/73683/
  
Carrillo, Erik G Oct. 8, 2020, 1:58 p.m. UTC | #3
> -----Original Message-----
> From: David Marchand <david.marchand@redhat.com>
> Sent: Thursday, October 8, 2020 5:28 AM
> To: Carrillo, Erik G <erik.g.carrillo@intel.com>
> Cc: dev@dpdk.org; stable@dpdk.org; nd <nd@arm.com>; Honnappa
> Nagarahalli <Honnappa.Nagarahalli@arm.com>; Sarosh Arif
> <sarosh.arif@emumba.com>
> Subject: Re: [dpdk-dev] [PATCH 1/1] timer: add limitation note for sync stop
> and reset
> 
> On Thu, Sep 10, 2020 at 3:23 AM Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com> wrote:
> > > If a timer's callback function calls rte_timer_reset_sync() or
> > > rte_timer_stop_sync() on another timer that is in the RUNNING state
> > > and owned by the current lcore, the *_sync() calls will loop indefinitely.
> > >
> > > Relatedly, if a timer's callback function calls *_sync() on another
> > > timer that is in the RUNNING state and is owned by a different
> > > lcore, but a timer callback function runs on that different lcore
> > > and calls
> > > *_sync() on a timer that is in the RUNNING state and owned by the
> > > current lcore, the two lcores will loop indefinitely.
> > >
> > > Add a note in the rte_timer_stop_sync and rte_timer_reset_sync
> > > documentation that indicates that these APIs should not be used
> > > inside timer callback functions in order to avoid the hangs
> > > described above, and suggests an alternative.
> > >
> > > Bugzilla ID: 491
> > > Cc: stable@dpdk.org
> > >
> > > Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
> > Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> 
> Applied, thanks.
> 
> Since we go with documenting a limitation, should we mark the original
> patches [1] and [2] as rejected instead of deferred?
> 
> 1: https://patches.dpdk.org/patch/75156/
> 2: https://patches.dpdk.org/patch/73683/
> 
> 
Thanks, David.  

Yes, those patches should be moved to "rejected" - I tried to do it myself, but got permission errors.  Sarosh, can you make these updates?

Thanks,
Erik

> --
> David Marchand
  
David Marchand Oct. 8, 2020, 2:11 p.m. UTC | #4
On Thu, Oct 8, 2020 at 3:58 PM Carrillo, Erik G
<erik.g.carrillo@intel.com> wrote:
> > Since we go with documenting a limitation, should we mark the original
> > patches [1] and [2] as rejected instead of deferred?
> >
> > 1: https://patches.dpdk.org/patch/75156/
> > 2: https://patches.dpdk.org/patch/73683/
> >
> >
> Thanks, David.
>
> Yes, those patches should be moved to "rejected" - I tried to do it myself, but got permission errors.  Sarosh, can you make these updates?

I updated them.
Thanks Erik.
  

Patch

diff --git a/lib/librte_timer/rte_timer.h b/lib/librte_timer/rte_timer.h
index c6b3d45..d7c3e03 100644
--- a/lib/librte_timer/rte_timer.h
+++ b/lib/librte_timer/rte_timer.h
@@ -274,6 +274,12 @@  int rte_timer_reset(struct rte_timer *tim, uint64_t ticks,
  *   The callback function of the timer.
  * @param arg
  *   The user argument of the callback function.
+ *
+ * @note
+ *   This API should not be called inside a timer's callback function to
+ *   reset another timer; doing so could hang in certain scenarios. Instead,
+ *   the rte_timer_reset() API can be called directly and its return code
+ *   can be checked for success or failure.
  */
 void
 rte_timer_reset_sync(struct rte_timer *tim, uint64_t ticks,
@@ -313,6 +319,12 @@  int rte_timer_stop(struct rte_timer *tim);
  *
  * @param tim
  *   The timer handle.
+ *
+ * @note
+ *   This API should not be called inside a timer's callback function to
+ *   stop another timer; doing so could hang in certain scenarios. Instead, the
+ *   rte_timer_stop() API can be called directly and its return code can
+ *   be checked for success or failure.
  */
 void rte_timer_stop_sync(struct rte_timer *tim);