[dpdk-dev,v2] Change alarm cancel function to thread-safe:

Message ID 20140925172358.GG32725@hmsreliant.think-freely.org (mailing list archive)
State Not Applicable, archived
Headers

Commit Message

Neil Horman Sept. 25, 2014, 5:23 p.m. UTC
On Thu, Sep 25, 2014 at 04:03:48PM +0000, Ananyev, Konstantin wrote:
> 
> 
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Neil Horman
> > Sent: Thursday, September 25, 2014 4:08 PM
> > To: Jastrzebski, MichalX K
> > Cc: dev@dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH v2] Change alarm cancel function to thread-safe:
> > 
> > On Thu, Sep 25, 2014 at 01:56:08PM +0100, Michal Jastrzebski wrote:
> > >     Change alarm cancel function to thread-safe.
> > >     It eliminates a race between threads using rte_alarm_cancel and
> > >     rte_alarm_set.
> > >
> > > Signed-off-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
> > > Reviewed-by: Michal Jastrzebski <michalx.k.jastrzebski@intel.com>
> > >
> > > ---
> > >  lib/librte_eal/common/include/rte_alarm.h |    3 +-
> > >  lib/librte_eal/linuxapp/eal/eal_alarm.c   |   68 ++++++++++++++++++-----------
> > >  2 files changed, 45 insertions(+), 26 deletions(-)
> > >
> > 
> > > diff --git a/lib/librte_eal/common/include/rte_alarm.h b/lib/librte_eal/common/include/rte_alarm.h
> > > index d451522..e7cbaef 100644
> > > --- a/lib/librte_eal/common/include/rte_alarm.h
> > > +++ b/lib/librte_eal/common/include/rte_alarm.h
> > > @@ -76,7 +76,8 @@ typedef void (*rte_eal_alarm_callback)(void *arg);
> > >  int rte_eal_alarm_set(uint64_t us, rte_eal_alarm_callback cb, void *cb_arg);
> > >
> > >  /**
> > > - * Function to cancel an alarm callback which has been registered before.
> > > + * Function to cancel an alarm callback which has been registered before. If
> > > + * used outside alarm callback it wait for all callbacks to finish its execution.
> > >   *
> > >   * @param cb_fn
> > >   *  alarm callback
> > > diff --git a/lib/librte_eal/linuxapp/eal/eal_alarm.c b/lib/librte_eal/linuxapp/eal/eal_alarm.c
> > > index 480f0cb..ea8dfb4 100644
> > > --- a/lib/librte_eal/linuxapp/eal/eal_alarm.c
> > > +++ b/lib/librte_eal/linuxapp/eal/eal_alarm.c
> > > @@ -69,7 +69,8 @@ struct alarm_entry {
> > >  	struct timeval time;
> > >  	rte_eal_alarm_callback cb_fn;
> > >  	void *cb_arg;
> > > -	volatile int executing;
> > > +	volatile uint8_t executing;
> > > +	volatile pthread_t executing_id;
> > >  };
> > >
> > >  static LIST_HEAD(alarm_list, alarm_entry) alarm_list = LIST_HEAD_INITIALIZER();
> > > @@ -108,11 +109,13 @@ eal_alarm_callback(struct rte_intr_handle *hdl __rte_unused,
> > >  			(ap->time.tv_sec < now.tv_sec || (ap->time.tv_sec == now.tv_sec &&
> > >  						ap->time.tv_usec <= now.tv_usec))){
> > >  		ap->executing = 1;
> > > +		ap->executing_id = pthread_self();
> > How exactly does this work?  From my read all alarm callbacks are handled by the
> > thread created in rte_eal_intr_init (which runs forever in
> > eal_intr_thread_main()). 
> 
> In current implementation - yes.
> 
>  So every assignment to the above executing_id value
> > will be from that thread.  As such, anytime rte_eal_alarm_cancel is called from
> > within a callback we are guaranteed that:
> > a) the ap->executing flag is set to 1
> > b) the ap->executing_id value should equal whatever is returned from
> > pthread_self()
> 
> Yes
> 
> > 
> > That will cause the executing counter local to the cancel function to get
> > incremented, meaning we will deadlock withing that do { ... } while (executing
> > != 0) loop, no?
> 
> No, as for the case when cancel is called from callback:
> pthread_equal(ap->executing_id, pthread_self())
> would return non-zero value (which means threads ids are equal), so executing will not be incremented. 
> 
Ah, pthread_equal is one of the backwards functions that returns zero for
inequality.  Maybe then rewrite that as:
if (!pthread_equal(...)

So its clear that we're looking for inequality there to increment?

> > 
> > >  		rte_spinlock_unlock(&alarm_list_lk);
> > >
> > >  		ap->cb_fn(ap->cb_arg);
> > >
> > >  		rte_spinlock_lock(&alarm_list_lk);
> > > +
> > >  		LIST_REMOVE(ap, next);
> > >  		rte_free(ap);
> > >  	}
> > > @@ -145,7 +148,7 @@ rte_eal_alarm_set(uint64_t us, rte_eal_alarm_callback cb_fn, void *cb_arg)
> > >  	if (us < 1 || us > (UINT64_MAX - US_PER_S) || cb_fn == NULL)
> > >  		return -EINVAL;
> > >
> > > -	new_alarm = rte_malloc(NULL, sizeof(*new_alarm), 0);
> > > +	new_alarm = rte_zmalloc(NULL, sizeof(*new_alarm), 0);
> > >  	if (new_alarm == NULL)
> > >  		return -ENOMEM;
> > >
> > > @@ -156,7 +159,6 @@ rte_eal_alarm_set(uint64_t us, rte_eal_alarm_callback cb_fn, void *cb_arg)
> > >  	new_alarm->cb_arg = cb_arg;
> > >  	new_alarm->time.tv_usec = (now.tv_usec + us) % US_PER_S;
> > >  	new_alarm->time.tv_sec = now.tv_sec + ((now.tv_usec + us) / US_PER_S);
> > > -	new_alarm->executing = 0;
> > >
> > This removes the only place where ->executing is cleared again.  If there is
> > only one change to this bits state (which is the case after this patch), it
> > seems that you can just use the executing bit as the test in the alarm_cancel
> > function, and remove all the pthread_self mess.
> 
> I believe we do need executing_id here.
> It allows us to distinguish are we executing cancel from a callback or not.
> 
Given what you said above, I agree, at least in the current implementation.  It
still seems like theres a simpler solution that doesn't require all the
comparative gymnastics.

What if, instead of testing if you're the callback thread, we turn the executing
field of alarm_entry into a bitfield, where bit 0 represents the former
"executing" state, and bit 1 is defined as a "cancelled" bit.  Then
rte_eal_alarm_cancel becomes a search that, when an alarm is found simply or's
in the cancelled bit to the executing bit field.  When the callback thread runs,
it skips executing any alarm that is marked as cancelled, but frees all alarm
entries that are executed or cancelled.  That gives us a single point at which
frees of alarm entires happen?  Something like the patch below (completely
untested)?

It also seems like the alarm api as a whole could use some improvement.  The
way its written right now, theres no way to refer to a specific alarm (i.e.
cancelation relies on the specification of a function and data pointer, which
may refer to multiple timers).  Shouldn't rte_eal_alarm_set return an opaque
handle to a unique timer instance that can be store by a caller and used to
specfically cancel that timer?  Thats how both the bsd and linux timer
subsystems model timers.
  

Comments

Ananyev, Konstantin Sept. 25, 2014, 11:24 p.m. UTC | #1
> From: Neil Horman [mailto:nhorman@tuxdriver.com]
> Sent: Thursday, September 25, 2014 6:24 PM
> To: Ananyev, Konstantin
> Cc: Jastrzebski, MichalX K; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2] Change alarm cancel function to thread-safe:
> 
> On Thu, Sep 25, 2014 at 04:03:48PM +0000, Ananyev, Konstantin wrote:
> >
> >
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Neil Horman
> > > Sent: Thursday, September 25, 2014 4:08 PM
> > > To: Jastrzebski, MichalX K
> > > Cc: dev@dpdk.org
> > > Subject: Re: [dpdk-dev] [PATCH v2] Change alarm cancel function to thread-safe:
> > >
> > > On Thu, Sep 25, 2014 at 01:56:08PM +0100, Michal Jastrzebski wrote:
> > > >     Change alarm cancel function to thread-safe.
> > > >     It eliminates a race between threads using rte_alarm_cancel and
> > > >     rte_alarm_set.
> > > >
> > > > Signed-off-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
> > > > Reviewed-by: Michal Jastrzebski <michalx.k.jastrzebski@intel.com>
> > > >
> > > > ---
> > > >  lib/librte_eal/common/include/rte_alarm.h |    3 +-
> > > >  lib/librte_eal/linuxapp/eal/eal_alarm.c   |   68 ++++++++++++++++++-----------
> > > >  2 files changed, 45 insertions(+), 26 deletions(-)
> > > >
> > >
> > > > diff --git a/lib/librte_eal/common/include/rte_alarm.h b/lib/librte_eal/common/include/rte_alarm.h
> > > > index d451522..e7cbaef 100644
> > > > --- a/lib/librte_eal/common/include/rte_alarm.h
> > > > +++ b/lib/librte_eal/common/include/rte_alarm.h
> > > > @@ -76,7 +76,8 @@ typedef void (*rte_eal_alarm_callback)(void *arg);
> > > >  int rte_eal_alarm_set(uint64_t us, rte_eal_alarm_callback cb, void *cb_arg);
> > > >
> > > >  /**
> > > > - * Function to cancel an alarm callback which has been registered before.
> > > > + * Function to cancel an alarm callback which has been registered before. If
> > > > + * used outside alarm callback it wait for all callbacks to finish its execution.
> > > >   *
> > > >   * @param cb_fn
> > > >   *  alarm callback
> > > > diff --git a/lib/librte_eal/linuxapp/eal/eal_alarm.c b/lib/librte_eal/linuxapp/eal/eal_alarm.c
> > > > index 480f0cb..ea8dfb4 100644
> > > > --- a/lib/librte_eal/linuxapp/eal/eal_alarm.c
> > > > +++ b/lib/librte_eal/linuxapp/eal/eal_alarm.c
> > > > @@ -69,7 +69,8 @@ struct alarm_entry {
> > > >  	struct timeval time;
> > > >  	rte_eal_alarm_callback cb_fn;
> > > >  	void *cb_arg;
> > > > -	volatile int executing;
> > > > +	volatile uint8_t executing;
> > > > +	volatile pthread_t executing_id;
> > > >  };
> > > >
> > > >  static LIST_HEAD(alarm_list, alarm_entry) alarm_list = LIST_HEAD_INITIALIZER();
> > > > @@ -108,11 +109,13 @@ eal_alarm_callback(struct rte_intr_handle *hdl __rte_unused,
> > > >  			(ap->time.tv_sec < now.tv_sec || (ap->time.tv_sec == now.tv_sec &&
> > > >  						ap->time.tv_usec <= now.tv_usec))){
> > > >  		ap->executing = 1;
> > > > +		ap->executing_id = pthread_self();
> > > How exactly does this work?  From my read all alarm callbacks are handled by the
> > > thread created in rte_eal_intr_init (which runs forever in
> > > eal_intr_thread_main()).
> >
> > In current implementation - yes.
> >
> >  So every assignment to the above executing_id value
> > > will be from that thread.  As such, anytime rte_eal_alarm_cancel is called from
> > > within a callback we are guaranteed that:
> > > a) the ap->executing flag is set to 1
> > > b) the ap->executing_id value should equal whatever is returned from
> > > pthread_self()
> >
> > Yes
> >
> > >
> > > That will cause the executing counter local to the cancel function to get
> > > incremented, meaning we will deadlock withing that do { ... } while (executing
> > > != 0) loop, no?
> >
> > No, as for the case when cancel is called from callback:
> > pthread_equal(ap->executing_id, pthread_self())
> > would return non-zero value (which means threads ids are equal), so executing will not be incremented.
> >
> Ah, pthread_equal is one of the backwards functions that returns zero for
> inequality.  Maybe then rewrite that as:
> if (!pthread_equal(...)
> 
> So its clear that we're looking for inequality there to increment?
> 
> > >
> > > >  		rte_spinlock_unlock(&alarm_list_lk);
> > > >
> > > >  		ap->cb_fn(ap->cb_arg);
> > > >
> > > >  		rte_spinlock_lock(&alarm_list_lk);
> > > > +
> > > >  		LIST_REMOVE(ap, next);
> > > >  		rte_free(ap);
> > > >  	}
> > > > @@ -145,7 +148,7 @@ rte_eal_alarm_set(uint64_t us, rte_eal_alarm_callback cb_fn, void *cb_arg)
> > > >  	if (us < 1 || us > (UINT64_MAX - US_PER_S) || cb_fn == NULL)
> > > >  		return -EINVAL;
> > > >
> > > > -	new_alarm = rte_malloc(NULL, sizeof(*new_alarm), 0);
> > > > +	new_alarm = rte_zmalloc(NULL, sizeof(*new_alarm), 0);
> > > >  	if (new_alarm == NULL)
> > > >  		return -ENOMEM;
> > > >
> > > > @@ -156,7 +159,6 @@ rte_eal_alarm_set(uint64_t us, rte_eal_alarm_callback cb_fn, void *cb_arg)
> > > >  	new_alarm->cb_arg = cb_arg;
> > > >  	new_alarm->time.tv_usec = (now.tv_usec + us) % US_PER_S;
> > > >  	new_alarm->time.tv_sec = now.tv_sec + ((now.tv_usec + us) / US_PER_S);
> > > > -	new_alarm->executing = 0;
> > > >
> > > This removes the only place where ->executing is cleared again.  If there is
> > > only one change to this bits state (which is the case after this patch), it
> > > seems that you can just use the executing bit as the test in the alarm_cancel
> > > function, and remove all the pthread_self mess.
> >
> > I believe we do need executing_id here.
> > It allows us to distinguish are we executing cancel from a callback or not.
> >
> Given what you said above, I agree, at least in the current implementation.  It
> still seems like theres a simpler solution that doesn't require all the
> comparative gymnastics.
> 
> What if, instead of testing if you're the callback thread, we turn the executing
> field of alarm_entry into a bitfield, where bit 0 represents the former
> "executing" state, and bit 1 is defined as a "cancelled" bit.  Then
> rte_eal_alarm_cancel becomes a search that, when an alarm is found simply or's
> in the cancelled bit to the executing bit field.  When the callback thread runs,
> it skips executing any alarm that is marked as cancelled, but frees all alarm
> entries that are executed or cancelled.  That gives us a single point at which
> frees of alarm entires happen?  Something like the patch below (completely
> untested)?

So basically cancel() just set ALARM_CANCELLED and leaves actual alarm deletion to the callback()?
I think it is doable - but I don't see any real advantage with that approach.
Yes, code will become a bit simpler, as  we'll have one point when we remove alarm from the list.
But from other side, imagine such simple test-case:

for (i = 0; i < 0x100000; i++) {
   rte_eal_alarm_set(ONE_MIN, cb_func, (void *)i);
   rte_eal_alarm_cancel(cb_func, (void *)i);
} 

We'll endup with 1M of cancelled, but still not removed entries in the alarm_list.
With current implementation that means - few MBs of wasted memory,
plus very slow set() and cancel(), as they'll  have to traverse all entries in the list.  
And all that - for empty from user perspective alarm_list 
So I still prefer Michal's way.
After all, it doesn't look that complicated to me. 
BTW, any particular reason you are so negative about pthread_self()?

> 
> It also seems like the alarm api as a whole could use some improvement.  The
> way its written right now, theres no way to refer to a specific alarm (i.e.
> cancelation relies on the specification of a function and data pointer, which
> may refer to multiple timers).  Shouldn't rte_eal_alarm_set return an opaque
> handle to a unique timer instance that can be store by a caller and used to
> specfically cancel that timer?  Thats how both the bsd and linux timer
> subsystems model timers.

Yeh,  alarm API looks a bit unusual. 
Though, I suppose that's subject for another patch/discussion :)

> 
> 
> 
> diff --git a/lib/librte_eal/linuxapp/eal/eal_alarm.c b/lib/librte_eal/linuxapp/eal/eal_alarm.c
> index 480f0cb..73b6dc5 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_alarm.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_alarm.c
> @@ -64,6 +64,9 @@
>  #define MS_PER_S 1000
>  #define US_PER_S (US_PER_MS * MS_PER_S)
> 
> +#define ALARM_EXECUTING (1 << 0)
> +#define ALARM_CANCELLED (1 << 1)
> +
>  struct alarm_entry {
>  	LIST_ENTRY(alarm_entry) next;
>  	struct timeval time;
> @@ -107,12 +110,14 @@ eal_alarm_callback(struct rte_intr_handle *hdl __rte_unused,
>  			gettimeofday(&now, NULL) == 0 &&
>  			(ap->time.tv_sec < now.tv_sec || (ap->time.tv_sec == now.tv_sec &&
>  						ap->time.tv_usec <= now.tv_usec))){
> -		ap->executing = 1;
> -		rte_spinlock_unlock(&alarm_list_lk);
> +		ap->executing |= ALARM_EXECUTING;
> +		if (likely(!(ap->executing & ALARM_CANCELLED)) {
> +			rte_spinlock_unlock(&alarm_list_lk);
> 
> -		ap->cb_fn(ap->cb_arg);
> +			ap->cb_fn(ap->cb_arg);
> 
> -		rte_spinlock_lock(&alarm_list_lk);
> +			rte_spinlock_lock(&alarm_list_lk);
> +		}
>  		LIST_REMOVE(ap, next);
>  		rte_free(ap);
>  	}
> @@ -209,10 +214,9 @@ rte_eal_alarm_cancel(rte_eal_alarm_callback cb_fn, void *cb_arg)
>  	rte_spinlock_lock(&alarm_list_lk);
>  	/* remove any matches at the start of the list */
>  	while ((ap = LIST_FIRST(&alarm_list)) != NULL &&
> -			cb_fn == ap->cb_fn && ap->executing == 0 &&
> +			cb_fn == ap->cb_fn &&
>  			(cb_arg == (void *)-1 || cb_arg == ap->cb_arg)) {
> -		LIST_REMOVE(ap, next);
> -		rte_free(ap);
> +		ap->executing |= ALARM_CANCELLED;
>  		count++;
>  	}
>  	ap_prev = ap;
> @@ -220,10 +224,9 @@ rte_eal_alarm_cancel(rte_eal_alarm_callback cb_fn, void *cb_arg)
>  	/* now go through list, removing entries not at start */
>  	LIST_FOREACH(ap, &alarm_list, next) {
>  		/* this won't be true first time through */
> -		if (cb_fn == ap->cb_fn &&  ap->executing == 0 &&
> +		if (cb_fn == ap->cb_fn &&
>  				(cb_arg == (void *)-1 || cb_arg == ap->cb_arg)) {
> -			LIST_REMOVE(ap,next);
> -			rte_free(ap);
> +			ap->executing |= ALARM_CANCELLED;
>  			count++;
>  			ap = ap_prev;
>  		}
  
Wodkowski, PawelX Sept. 26, 2014, 6:33 a.m. UTC | #2
> Given what you said above, I agree, at least in the current implementation.  It
> still seems like theres a simpler solution that doesn't require all the
> comparative gymnastics.

Yes, there is simpler solution, but this solution involve recursive locking.
DPDK recursive spinlocks are no an option in here, so only option is posix recursive
mutex, which I think is even worst option than this gymnastics.

> 
> What if, instead of testing if you're the callback thread, we turn the executing
> field of alarm_entry into a bitfield, where bit 0 represents the former
> "executing" state, and bit 1 is defined as a "cancelled" bit.  Then
> rte_eal_alarm_cancel becomes a search that, when an alarm is found simply or's
> in the cancelled bit to the executing bit field.  When the callback thread runs,
> it skips executing any alarm that is marked as cancelled, but frees all alarm
> entries that are executed or cancelled.  That gives us a single point at which
> frees of alarm entires happen?  Something like the patch below (completely
> untested)?
> 
> It also seems like the alarm api as a whole could use some improvement.  The
> way its written right now, theres no way to refer to a specific alarm (i.e.
> cancelation relies on the specification of a function and data pointer, which
> may refer to multiple timers).  Shouldn't rte_eal_alarm_set return an opaque
> handle to a unique timer instance that can be store by a caller and used to
> specfically cancel that timer?  Thats how both the bsd and linux timer
> subsystems model timers.
> 

Goal was to not break user applications that use this library.

> 
> 
> diff --git a/lib/librte_eal/linuxapp/eal/eal_alarm.c
> b/lib/librte_eal/linuxapp/eal/eal_alarm.c
> index 480f0cb..73b6dc5 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_alarm.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_alarm.c
> @@ -64,6 +64,9 @@
>  #define MS_PER_S 1000
>  #define US_PER_S (US_PER_MS * MS_PER_S)
> 
> +#define ALARM_EXECUTING (1 << 0)
> +#define ALARM_CANCELLED (1 << 1)
> +
>  struct alarm_entry {
>  	LIST_ENTRY(alarm_entry) next;
>  	struct timeval time;
> @@ -107,12 +110,14 @@ eal_alarm_callback(struct rte_intr_handle *hdl
> __rte_unused,
>  			gettimeofday(&now, NULL) == 0 &&
>  			(ap->time.tv_sec < now.tv_sec || (ap->time.tv_sec ==
> now.tv_sec &&
>  						ap->time.tv_usec <=
> now.tv_usec))){
> -		ap->executing = 1;
> -		rte_spinlock_unlock(&alarm_list_lk);

Removing unlock here introduce deadlock.

> +		ap->executing |= ALARM_EXECUTING;
> +		if (likely(!(ap->executing & ALARM_CANCELLED)) {
> +			rte_spinlock_unlock(&alarm_list_lk);
> 
> -		ap->cb_fn(ap->cb_arg);
> +			ap->cb_fn(ap->cb_arg);
> 
> -		rte_spinlock_lock(&alarm_list_lk);
> +			rte_spinlock_lock(&alarm_list_lk);
> +		}
>  		LIST_REMOVE(ap, next);
>  		rte_free(ap);
>  	}
> @@ -209,10 +214,9 @@ rte_eal_alarm_cancel(rte_eal_alarm_callback cb_fn,
> void *cb_arg)
>  	rte_spinlock_lock(&alarm_list_lk);
>  	/* remove any matches at the start of the list */
>  	while ((ap = LIST_FIRST(&alarm_list)) != NULL &&
> -			cb_fn == ap->cb_fn && ap->executing == 0 &&
> +			cb_fn == ap->cb_fn &&
>  			(cb_arg == (void *)-1 || cb_arg == ap->cb_arg)) {
> -		LIST_REMOVE(ap, next);
> -		rte_free(ap);
> +		ap->executing |= ALARM_CANCELLED;
>  		count++;
>  	}
>  	ap_prev = ap;
> @@ -220,10 +224,9 @@ rte_eal_alarm_cancel(rte_eal_alarm_callback cb_fn,
> void *cb_arg)
>  	/* now go through list, removing entries not at start */
>  	LIST_FOREACH(ap, &alarm_list, next) {
>  		/* this won't be true first time through */
> -		if (cb_fn == ap->cb_fn &&  ap->executing == 0 &&
> +		if (cb_fn == ap->cb_fn &&
>  				(cb_arg == (void *)-1 || cb_arg == ap->cb_arg))
> {
> -			LIST_REMOVE(ap,next);
> -			rte_free(ap);
> +			ap->executing |= ALARM_CANCELLED;
>  			count++;
>  			ap = ap_prev;
>  		}

Pawel
  
Wodkowski, PawelX Sept. 26, 2014, 9:49 a.m. UTC | #3
> >
> > diff --git a/lib/librte_eal/linuxapp/eal/eal_alarm.c
> > b/lib/librte_eal/linuxapp/eal/eal_alarm.c
> > index 480f0cb..73b6dc5 100644
> > --- a/lib/librte_eal/linuxapp/eal/eal_alarm.c
> > +++ b/lib/librte_eal/linuxapp/eal/eal_alarm.c
> > @@ -64,6 +64,9 @@
> >  #define MS_PER_S 1000
> >  #define US_PER_S (US_PER_MS * MS_PER_S)
> >
> > +#define ALARM_EXECUTING (1 << 0)
> > +#define ALARM_CANCELLED (1 << 1)
> > +
> >  struct alarm_entry {
> >  	LIST_ENTRY(alarm_entry) next;
> >  	struct timeval time;
> > @@ -107,12 +110,14 @@ eal_alarm_callback(struct rte_intr_handle *hdl
> > __rte_unused,
> >  			gettimeofday(&now, NULL) == 0 &&
> >  			(ap->time.tv_sec < now.tv_sec || (ap->time.tv_sec ==
> > now.tv_sec &&
> >  						ap->time.tv_usec <=
> > now.tv_usec))){
> > -		ap->executing = 1;
> > -		rte_spinlock_unlock(&alarm_list_lk);
> 
> Removing unlock here introduce deadlock.

I does no spotted unlocking bellow so above is invalid.

> 
> > +		ap->executing |= ALARM_EXECUTING;
> > +		if (likely(!(ap->executing & ALARM_CANCELLED)) {
> > +			rte_spinlock_unlock(&alarm_list_lk);
> >
> > -		ap->cb_fn(ap->cb_arg);
> > +			ap->cb_fn(ap->cb_arg);
> >
> > -		rte_spinlock_lock(&alarm_list_lk);
> > +			rte_spinlock_lock(&alarm_list_lk);
> > +		}
> >  		LIST_REMOVE(ap, next);
> >  		rte_free(ap);
> >  	}
> > @@ -209,10 +214,9 @@ rte_eal_alarm_cancel(rte_eal_alarm_callback cb_fn,
> > void *cb_arg)
> >  	rte_spinlock_lock(&alarm_list_lk);
> >  	/* remove any matches at the start of the list */
> >  	while ((ap = LIST_FIRST(&alarm_list)) != NULL &&
> > -			cb_fn == ap->cb_fn && ap->executing == 0 &&
> > +			cb_fn == ap->cb_fn &&
> >  			(cb_arg == (void *)-1 || cb_arg == ap->cb_arg)) {
> > -		LIST_REMOVE(ap, next);
> > -		rte_free(ap);
> > +		ap->executing |= ALARM_CANCELLED;
> >  		count++;
> >  	}
> >  	ap_prev = ap;
> > @@ -220,10 +224,9 @@ rte_eal_alarm_cancel(rte_eal_alarm_callback cb_fn,
> > void *cb_arg)
> >  	/* now go through list, removing entries not at start */
> >  	LIST_FOREACH(ap, &alarm_list, next) {
> >  		/* this won't be true first time through */
> > -		if (cb_fn == ap->cb_fn &&  ap->executing == 0 &&
> > +		if (cb_fn == ap->cb_fn &&
> >  				(cb_arg == (void *)-1 || cb_arg == ap->cb_arg))
> > {
> > -			LIST_REMOVE(ap,next);
> > -			rte_free(ap);
> > +			ap->executing |= ALARM_CANCELLED;
> >  			count++;
> >  			ap = ap_prev;
> >  		}
> 
> Pawel
  
Neil Horman Sept. 26, 2014, 1:43 p.m. UTC | #4
On Fri, Sep 26, 2014 at 06:33:12AM +0000, Wodkowski, PawelX wrote:
> > Given what you said above, I agree, at least in the current implementation.  It
> > still seems like theres a simpler solution that doesn't require all the
> > comparative gymnastics.
> 
> Yes, there is simpler solution, but this solution involve recursive locking.
> DPDK recursive spinlocks are no an option in here, so only option is posix recursive
> mutex, which I think is even worst option than this gymnastics.
> 
I agree, lets avoid more locking if we can.

> > 
> > What if, instead of testing if you're the callback thread, we turn the executing
> > field of alarm_entry into a bitfield, where bit 0 represents the former
> > "executing" state, and bit 1 is defined as a "cancelled" bit.  Then
> > rte_eal_alarm_cancel becomes a search that, when an alarm is found simply or's
> > in the cancelled bit to the executing bit field.  When the callback thread runs,
> > it skips executing any alarm that is marked as cancelled, but frees all alarm
> > entries that are executed or cancelled.  That gives us a single point at which
> > frees of alarm entires happen?  Something like the patch below (completely
> > untested)?
> > 
> > It also seems like the alarm api as a whole could use some improvement.  The
> > way its written right now, theres no way to refer to a specific alarm (i.e.
> > cancelation relies on the specification of a function and data pointer, which
> > may refer to multiple timers).  Shouldn't rte_eal_alarm_set return an opaque
> > handle to a unique timer instance that can be store by a caller and used to
> > specfically cancel that timer?  Thats how both the bsd and linux timer
> > subsystems model timers.
> > 
> 
> Goal was to not break user applications that use this library.
> 
You break API all the time, why are you worried about it here?  I'm all for
maintaining API definately, but once my ABI versioning code gets into place we
can manage this alot better.

> > 
> > 
> > diff --git a/lib/librte_eal/linuxapp/eal/eal_alarm.c
> > b/lib/librte_eal/linuxapp/eal/eal_alarm.c
> > index 480f0cb..73b6dc5 100644
> > --- a/lib/librte_eal/linuxapp/eal/eal_alarm.c
> > +++ b/lib/librte_eal/linuxapp/eal/eal_alarm.c
> > @@ -64,6 +64,9 @@
> >  #define MS_PER_S 1000
> >  #define US_PER_S (US_PER_MS * MS_PER_S)
> > 
> > +#define ALARM_EXECUTING (1 << 0)
> > +#define ALARM_CANCELLED (1 << 1)
> > +
> >  struct alarm_entry {
> >  	LIST_ENTRY(alarm_entry) next;
> >  	struct timeval time;
> > @@ -107,12 +110,14 @@ eal_alarm_callback(struct rte_intr_handle *hdl
> > __rte_unused,
> >  			gettimeofday(&now, NULL) == 0 &&
> >  			(ap->time.tv_sec < now.tv_sec || (ap->time.tv_sec ==
> > now.tv_sec &&
> >  						ap->time.tv_usec <=
> > now.tv_usec))){
> > -		ap->executing = 1;
> > -		rte_spinlock_unlock(&alarm_list_lk);
> 
> Removing unlock here introduce deadlock.
> 
Please look more closely, I've not removed anything, only moved where the lock
occurs.

> > +		ap->executing |= ALARM_EXECUTING;
> > +		if (likely(!(ap->executing & ALARM_CANCELLED)) {
> > +			rte_spinlock_unlock(&alarm_list_lk);
The unlock is now here, conditional on needing to call the callback.

> > 
> > -		ap->cb_fn(ap->cb_arg);
> > +			ap->cb_fn(ap->cb_arg);
> > 
> > -		rte_spinlock_lock(&alarm_list_lk);
> > +			rte_spinlock_lock(&alarm_list_lk);
> > +		}
> >  		LIST_REMOVE(ap, next);
> >  		rte_free(ap);
> >  	}
> > @@ -209,10 +214,9 @@ rte_eal_alarm_cancel(rte_eal_alarm_callback cb_fn,
> > void *cb_arg)
> >  	rte_spinlock_lock(&alarm_list_lk);
> >  	/* remove any matches at the start of the list */
> >  	while ((ap = LIST_FIRST(&alarm_list)) != NULL &&
> > -			cb_fn == ap->cb_fn && ap->executing == 0 &&
> > +			cb_fn == ap->cb_fn &&
> >  			(cb_arg == (void *)-1 || cb_arg == ap->cb_arg)) {
> > -		LIST_REMOVE(ap, next);
> > -		rte_free(ap);
> > +		ap->executing |= ALARM_CANCELLED;
> >  		count++;
> >  	}
> >  	ap_prev = ap;
> > @@ -220,10 +224,9 @@ rte_eal_alarm_cancel(rte_eal_alarm_callback cb_fn,
> > void *cb_arg)
> >  	/* now go through list, removing entries not at start */
> >  	LIST_FOREACH(ap, &alarm_list, next) {
> >  		/* this won't be true first time through */
> > -		if (cb_fn == ap->cb_fn &&  ap->executing == 0 &&
> > +		if (cb_fn == ap->cb_fn &&
> >  				(cb_arg == (void *)-1 || cb_arg == ap->cb_arg))
> > {
> > -			LIST_REMOVE(ap,next);
> > -			rte_free(ap);
> > +			ap->executing |= ALARM_CANCELLED;
> >  			count++;
> >  			ap = ap_prev;
> >  		}
> 
> Pawel
>
  

Patch

diff --git a/lib/librte_eal/linuxapp/eal/eal_alarm.c b/lib/librte_eal/linuxapp/eal/eal_alarm.c
index 480f0cb..73b6dc5 100644
--- a/lib/librte_eal/linuxapp/eal/eal_alarm.c
+++ b/lib/librte_eal/linuxapp/eal/eal_alarm.c
@@ -64,6 +64,9 @@ 
 #define MS_PER_S 1000
 #define US_PER_S (US_PER_MS * MS_PER_S)
 
+#define ALARM_EXECUTING (1 << 0) 
+#define ALARM_CANCELLED (1 << 1)
+ 
 struct alarm_entry {
 	LIST_ENTRY(alarm_entry) next;
 	struct timeval time;
@@ -107,12 +110,14 @@  eal_alarm_callback(struct rte_intr_handle *hdl __rte_unused,
 			gettimeofday(&now, NULL) == 0 &&
 			(ap->time.tv_sec < now.tv_sec || (ap->time.tv_sec == now.tv_sec &&
 						ap->time.tv_usec <= now.tv_usec))){
-		ap->executing = 1;
-		rte_spinlock_unlock(&alarm_list_lk);
+		ap->executing |= ALARM_EXECUTING;
+		if (likely(!(ap->executing & ALARM_CANCELLED)) {
+			rte_spinlock_unlock(&alarm_list_lk);
 
-		ap->cb_fn(ap->cb_arg);
+			ap->cb_fn(ap->cb_arg);
 
-		rte_spinlock_lock(&alarm_list_lk);
+			rte_spinlock_lock(&alarm_list_lk);
+		}
 		LIST_REMOVE(ap, next);
 		rte_free(ap);
 	}
@@ -209,10 +214,9 @@  rte_eal_alarm_cancel(rte_eal_alarm_callback cb_fn, void *cb_arg)
 	rte_spinlock_lock(&alarm_list_lk);
 	/* remove any matches at the start of the list */
 	while ((ap = LIST_FIRST(&alarm_list)) != NULL &&
-			cb_fn == ap->cb_fn && ap->executing == 0 &&
+			cb_fn == ap->cb_fn &&
 			(cb_arg == (void *)-1 || cb_arg == ap->cb_arg)) {
-		LIST_REMOVE(ap, next);
-		rte_free(ap);
+		ap->executing |= ALARM_CANCELLED;
 		count++;
 	}
 	ap_prev = ap;
@@ -220,10 +224,9 @@  rte_eal_alarm_cancel(rte_eal_alarm_callback cb_fn, void *cb_arg)
 	/* now go through list, removing entries not at start */
 	LIST_FOREACH(ap, &alarm_list, next) {
 		/* this won't be true first time through */
-		if (cb_fn == ap->cb_fn &&  ap->executing == 0 &&
+		if (cb_fn == ap->cb_fn && 
 				(cb_arg == (void *)-1 || cb_arg == ap->cb_arg)) {
-			LIST_REMOVE(ap,next);
-			rte_free(ap);
+			ap->executing |= ALARM_CANCELLED;
 			count++;
 			ap = ap_prev;
 		}