From patchwork Thu Nov 29 23:35:12 2018
X-Patchwork-Submitter: "Carrillo, Erik G"
X-Patchwork-Id: 48426
X-Patchwork-Delegate: jerinj@marvell.com
From: Erik Gabriel Carrillo
To: pbhagavatula@caviumnetworks.com, jerin.jacob@caviumnetworks.com, rsanford@akamai.com
Cc: dev@dpdk.org
Date: Thu, 29 Nov 2018 17:35:12 -0600
Message-Id: <1543534514-183766-2-git-send-email-erik.g.carrillo@intel.com>
In-Reply-To: <1543534514-183766-1-git-send-email-erik.g.carrillo@intel.com>
References: <1543534514-183766-1-git-send-email-erik.g.carrillo@intel.com>
Subject: [dpdk-dev] [PATCH 1/3] timer: allow timer management in shared memory

Currently, the timer library uses a per-process table of structures to manage skiplists of timers presumably because timers contain arbitrary function pointers whose value may not resolve properly in other processes.
However, if the same callback is used to handle all timers, and that callback is only invoked in one process, then it would be safe to allow the data structures to be allocated in shared memory, and to allow secondary processes to modify the timer lists. This would let timers be used in more multi-process scenarios.

The library's global variables are wrapped with a struct, and an array of these structures is created in shared memory. The original APIs are updated to reference the zeroth entry in the array. This maintains the original behavior for both primary and secondary processes since the set intersection of their coremasks should be empty [1].

New APIs are introduced to enable the allocation/deallocation of other entries in the array. New variants of the APIs used to start and stop timers are introduced; they allow a caller to specify which array entry should be used to locate the timer list to insert into or delete from. Finally, a new variant of rte_timer_manage() is introduced, which allows a caller to specify which array entry should be used to locate the timer lists to process; it can also process multiple timer lists per invocation.
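
For illustration only, the allocation scheme described above can be modelled without DPDK. The sketch below is not the library code: the sketch_* names, MAX_DATA_ELS and struct timer_data_sketch are hypothetical stand-ins, the array is an ordinary static array rather than a memzone in shared memory, and the per-lcore timer lists are left out. It only shows how entry 0 is reserved for the original APIs while other entries are handed out and released by id.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DATA_ELS 64		/* hypothetical; plays the role of RTE_MAX_DATA_ELS */
#define FL_ALLOCATED (1u << 0)

/* Stand-in for struct rte_timer_data; per-lcore lists are omitted. */
struct timer_data_sketch {
	uint8_t internal_flags;
};

/* Ordinary static array here; the patch places the array in shared memory. */
static struct timer_data_sketch data_arr[MAX_DATA_ELS];

/* Clear every entry, then reserve entry 0 for the original APIs. */
static void
sketch_init(void)
{
	size_t i;

	for (i = 0; i < MAX_DATA_ELS; i++)
		data_arr[i].internal_flags = 0;

	data_arr[0].internal_flags |= FL_ALLOCATED;
}

/* Hand out the first free entry; its index is the instance id. */
static int
sketch_data_alloc(uint32_t *id_ptr)
{
	uint32_t i;

	for (i = 0; i < MAX_DATA_ELS; i++) {
		if (!(data_arr[i].internal_flags & FL_ALLOCATED)) {
			data_arr[i].internal_flags |= FL_ALLOCATED;
			if (id_ptr != NULL)
				*id_ptr = i;
			return 0;
		}
	}

	return -ENOSPC;
}

/* Release an entry; out-of-range or unallocated ids are rejected. */
static int
sketch_data_dealloc(uint32_t id)
{
	if (id >= MAX_DATA_ELS ||
	    !(data_arr[id].internal_flags & FL_ALLOCATED))
		return -EINVAL;

	data_arr[id].internal_flags &= (uint8_t)~FL_ALLOCATED;

	return 0;
}

/* Walk through the intended flow and return the id given to the caller. */
static uint32_t
sketch_demo(void)
{
	uint32_t id = 0;
	int ret;

	sketch_init();
	ret = sketch_data_alloc(&id);
	assert(ret == 0);	/* only entry 0 is taken, so a slot is free */
	(void)ret;

	return id;
}
```

In this model the first explicit allocation after initialization cannot return id 0, because that entry is already marked as allocated for the default instance. A caller would then pass the returned id to the "alt" variants of the reset, stop and manage calls.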
[1] https://doc.dpdk.org/guides/prog_guide/multi_proc_support.html#multi-process-limitations Signed-off-by: Erik Gabriel Carrillo --- lib/librte_timer/Makefile | 1 + lib/librte_timer/rte_timer.c | 526 +++++++++++++++++++++++++++------ lib/librte_timer/rte_timer.h | 168 ++++++++++- lib/librte_timer/rte_timer_version.map | 21 +- 4 files changed, 614 insertions(+), 102 deletions(-) diff --git a/lib/librte_timer/Makefile b/lib/librte_timer/Makefile index 4ebd528..8ec63f4 100644 --- a/lib/librte_timer/Makefile +++ b/lib/librte_timer/Makefile @@ -6,6 +6,7 @@ include $(RTE_SDK)/mk/rte.vars.mk # library name LIB = librte_timer.a +CFLAGS += -DALLOW_EXPERIMENTAL_API CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3 LDLIBS += -lrte_eal diff --git a/lib/librte_timer/rte_timer.c b/lib/librte_timer/rte_timer.c index 30c7b0a..a76be8b 100644 --- a/lib/librte_timer/rte_timer.c +++ b/lib/librte_timer/rte_timer.c @@ -5,6 +5,7 @@ #include #include #include +#include #include #include #include @@ -21,23 +22,27 @@ #include #include #include +#include +#include #include "rte_timer.h" -LIST_HEAD(rte_timer_list, rte_timer); - +/** + * Per-lcore info for timers. 
+ */ struct priv_timer { - struct rte_timer pending_head; /**< dummy timer instance to head up list */ + struct rte_timer pending_head; /**< dummy timer to head up list */ rte_spinlock_t list_lock; /**< lock to protect list access */ /** per-core variable that true if a timer was updated on this - * core since last reset of the variable */ + * core since last reset of the variable + */ int updated; /** track the current depth of the skiplist */ - unsigned curr_skiplist_depth; + unsigned int curr_skiplist_depth; - unsigned prev_lcore; /**< used for lcore round robin */ + unsigned int prev_lcore; /**< used for lcore round robin */ /** running timer on this lcore now */ struct rte_timer *running_tim; @@ -48,33 +53,140 @@ struct priv_timer { #endif } __rte_cache_aligned; -/** per-lcore private info for timers */ -static struct priv_timer priv_timer[RTE_MAX_LCORE]; +#define FL_ALLOCATED (1 << 0) +struct rte_timer_data { + struct priv_timer priv_timer[RTE_MAX_LCORE]; + uint8_t internal_flags; +}; + +#define RTE_MAX_DATA_ELS 64 +static struct rte_timer_data *rte_timer_data_arr; +static uint32_t default_data_id; // id set to zero automatically +static uint32_t rte_timer_subsystem_initialized; /* when debug is enabled, store some statistics */ #ifdef RTE_LIBRTE_TIMER_DEBUG -#define __TIMER_STAT_ADD(name, n) do { \ +#define __TIMER_STAT_ADD(data, name, n) do { \ unsigned __lcore_id = rte_lcore_id(); \ if (__lcore_id < RTE_MAX_LCORE) \ - priv_timer[__lcore_id].stats.name += (n); \ + data->priv_timer[__lcore_id].stats.name += (n); \ } while(0) #else -#define __TIMER_STAT_ADD(name, n) do {} while(0) +#define __TIMER_STAT_ADD(data, name, n) do {} while (0) #endif -/* Init the timer library. 
*/ -void +static inline int +timer_data_valid(uint32_t id) +{ + return !!(rte_timer_data_arr[id].internal_flags & FL_ALLOCATED); +} + +/* validate ID and retrieve timer data pointer, or return error value */ +#define TIMER_DATA_VALID_GET_OR_ERR_RET(id, timer_data, retval) do { \ + if (id >= RTE_MAX_DATA_ELS || !timer_data_valid(id)) \ + return retval; \ + timer_data = &rte_timer_data_arr[id]; \ +} while (0) + +int __rte_experimental +rte_timer_data_alloc(uint32_t *id_ptr) +{ + int i; + struct rte_timer_data *data; + + if (!rte_timer_subsystem_initialized) + return -ENOMEM; + + for (i = 0; i < RTE_MAX_DATA_ELS; i++) { + data = &rte_timer_data_arr[i]; + if (!(data->internal_flags & FL_ALLOCATED)) { + data->internal_flags |= FL_ALLOCATED; + + if (id_ptr) + *id_ptr = i; + + return 0; + } + } + + return -ENOSPC; +} + +int __rte_experimental +rte_timer_data_dealloc(uint32_t id) +{ + struct rte_timer_data *timer_data; + TIMER_DATA_VALID_GET_OR_ERR_RET(id, timer_data, -EINVAL); + + timer_data->internal_flags &= ~(FL_ALLOCATED); + + return 0; +} + +/* Init the timer library. Allocate an array of timer data structs in shared + * memory, and allocate the zeroth entry for use with original timer + * APIs. Since the intersection of the sets of lcore ids in primary and + * secondary processes should be empty, the zeroth entry can be shared by + * multiple processes. + */ +int rte_timer_subsystem_init(void) { - unsigned lcore_id; + const struct rte_memzone *mz; + struct rte_timer_data *data; + int i, lcore_id; + static const char *mz_name = "rte_timer_mz"; - /* since priv_timer is static, it's zeroed by default, so only init some - * fields. 
- */ - for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id ++) { - rte_spinlock_init(&priv_timer[lcore_id].list_lock); - priv_timer[lcore_id].prev_lcore = lcore_id; + if (rte_timer_subsystem_initialized) + return -EALREADY; + + if (rte_eal_process_type() != RTE_PROC_PRIMARY) { + mz = rte_memzone_lookup(mz_name); + if (mz == NULL) + return -EEXIST; + + rte_timer_data_arr = mz->addr; + + rte_timer_data_arr[default_data_id].internal_flags |= + FL_ALLOCATED; + + rte_timer_subsystem_initialized = 1; + + return 0; + } + + mz = rte_memzone_reserve_aligned(mz_name, + RTE_MAX_DATA_ELS * sizeof(*rte_timer_data_arr), + SOCKET_ID_ANY, 0, RTE_CACHE_LINE_SIZE); + if (mz == NULL) + return -ENOMEM; + + rte_timer_data_arr = mz->addr; + + for (i = 0; i < RTE_MAX_DATA_ELS; i++) { + data = &rte_timer_data_arr[i]; + + for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) { + rte_spinlock_init( + &data->priv_timer[lcore_id].list_lock); + data->priv_timer[lcore_id].prev_lcore = lcore_id; + } } + + rte_timer_data_arr[default_data_id].internal_flags |= FL_ALLOCATED; + + rte_timer_subsystem_initialized = 1; + + return 0; +} + +void __rte_experimental +rte_timer_subsystem_finalize(void) +{ + if (rte_timer_data_arr) + rte_free(rte_timer_data_arr); + + rte_timer_subsystem_initialized = 0; } /* Initialize the timer handle tim for use */ @@ -95,7 +207,8 @@ rte_timer_init(struct rte_timer *tim) */ static int timer_set_config_state(struct rte_timer *tim, - union rte_timer_status *ret_prev_status) + union rte_timer_status *ret_prev_status, + struct rte_timer_data *data) { union rte_timer_status prev_status, status; int success = 0; @@ -113,7 +226,7 @@ timer_set_config_state(struct rte_timer *tim, */ if (prev_status.state == RTE_TIMER_RUNNING && (prev_status.owner != (uint16_t)lcore_id || - tim != priv_timer[lcore_id].running_tim)) + tim != data->priv_timer[lcore_id].running_tim)) return -1; /* timer is being configured on another core */ @@ -207,13 +320,13 @@ timer_get_skiplist_level(unsigned 
curr_depth) */ static void timer_get_prev_entries(uint64_t time_val, unsigned tim_lcore, - struct rte_timer **prev) + struct rte_timer **prev, struct rte_timer_data *data) { - unsigned lvl = priv_timer[tim_lcore].curr_skiplist_depth; - prev[lvl] = &priv_timer[tim_lcore].pending_head; - while(lvl != 0) { + unsigned int lvl = data->priv_timer[tim_lcore].curr_skiplist_depth; + prev[lvl] = &data->priv_timer[tim_lcore].pending_head; + while (lvl != 0) { lvl--; - prev[lvl] = prev[lvl+1]; + prev[lvl] = prev[lvl + 1]; while (prev[lvl]->sl_next[lvl] && prev[lvl]->sl_next[lvl]->expire <= time_val) prev[lvl] = prev[lvl]->sl_next[lvl]; @@ -226,14 +339,16 @@ timer_get_prev_entries(uint64_t time_val, unsigned tim_lcore, */ static void timer_get_prev_entries_for_node(struct rte_timer *tim, unsigned tim_lcore, - struct rte_timer **prev) + struct rte_timer **prev, + struct rte_timer_data *data) { int i; /* to get a specific entry in the list, look for just lower than the time * values, and then increment on each level individually if necessary */ - timer_get_prev_entries(tim->expire - 1, tim_lcore, prev); - for (i = priv_timer[tim_lcore].curr_skiplist_depth - 1; i >= 0; i--) { + timer_get_prev_entries(tim->expire - 1, tim_lcore, prev, data); + for (i = data->priv_timer[tim_lcore].curr_skiplist_depth - 1; i >= 0; + i--) { while (prev[i]->sl_next[i] != NULL && prev[i]->sl_next[i] != tim && prev[i]->sl_next[i]->expire <= tim->expire) @@ -247,20 +362,21 @@ timer_get_prev_entries_for_node(struct rte_timer *tim, unsigned tim_lcore, * timer must not be in a list */ static void -timer_add(struct rte_timer *tim, unsigned int tim_lcore) +timer_add(struct rte_timer *tim, unsigned int tim_lcore, + struct rte_timer_data *data) { unsigned lvl; struct rte_timer *prev[MAX_SKIPLIST_DEPTH+1]; /* find where exactly this element goes in the list of elements * for each depth. 
*/ - timer_get_prev_entries(tim->expire, tim_lcore, prev); + timer_get_prev_entries(tim->expire, tim_lcore, prev, data); /* now assign it a new level and add at that level */ const unsigned tim_level = timer_get_skiplist_level( - priv_timer[tim_lcore].curr_skiplist_depth); - if (tim_level == priv_timer[tim_lcore].curr_skiplist_depth) - priv_timer[tim_lcore].curr_skiplist_depth++; + data->priv_timer[tim_lcore].curr_skiplist_depth); + if (tim_level == data->priv_timer[tim_lcore].curr_skiplist_depth) + data->priv_timer[tim_lcore].curr_skiplist_depth++; lvl = tim_level; while (lvl > 0) { @@ -272,9 +388,10 @@ timer_add(struct rte_timer *tim, unsigned int tim_lcore) prev[0]->sl_next[0] = tim; /* save the lowest list entry into the expire field of the dummy hdr - * NOTE: this is not atomic on 32-bit*/ - priv_timer[tim_lcore].pending_head.expire = priv_timer[tim_lcore].\ - pending_head.sl_next[0]->expire; + * NOTE: this is not atomic on 32-bit + */ + data->priv_timer[tim_lcore].pending_head.expire = + data->priv_timer[tim_lcore].pending_head.sl_next[0]->expire; } /* @@ -284,7 +401,7 @@ timer_add(struct rte_timer *tim, unsigned int tim_lcore) */ static void timer_del(struct rte_timer *tim, union rte_timer_status prev_status, - int local_is_locked) + int local_is_locked, struct rte_timer_data *data) { unsigned lcore_id = rte_lcore_id(); unsigned prev_owner = prev_status.owner; @@ -295,30 +412,33 @@ timer_del(struct rte_timer *tim, union rte_timer_status prev_status, * list; if it is on local core, we need to lock if we are not * called from rte_timer_manage() */ if (prev_owner != lcore_id || !local_is_locked) - rte_spinlock_lock(&priv_timer[prev_owner].list_lock); + rte_spinlock_lock(&data->priv_timer[prev_owner].list_lock); /* save the lowest list entry into the expire field of the dummy hdr. 
* NOTE: this is not atomic on 32-bit */ - if (tim == priv_timer[prev_owner].pending_head.sl_next[0]) - priv_timer[prev_owner].pending_head.expire = + if (tim == data->priv_timer[prev_owner].pending_head.sl_next[0]) + data->priv_timer[prev_owner].pending_head.expire = ((tim->sl_next[0] == NULL) ? 0 : tim->sl_next[0]->expire); /* adjust pointers from previous entries to point past this */ - timer_get_prev_entries_for_node(tim, prev_owner, prev); - for (i = priv_timer[prev_owner].curr_skiplist_depth - 1; i >= 0; i--) { + timer_get_prev_entries_for_node(tim, prev_owner, prev, data); + i = data->priv_timer[prev_owner].curr_skiplist_depth - 1; + for ( ; i >= 0; i--) { if (prev[i]->sl_next[i] == tim) prev[i]->sl_next[i] = tim->sl_next[i]; } /* in case we deleted last entry at a level, adjust down max level */ - for (i = priv_timer[prev_owner].curr_skiplist_depth - 1; i >= 0; i--) - if (priv_timer[prev_owner].pending_head.sl_next[i] == NULL) - priv_timer[prev_owner].curr_skiplist_depth --; + for (i = data->priv_timer[prev_owner].curr_skiplist_depth - 1; i >= 0; + i--) + if (data->priv_timer[prev_owner].pending_head.sl_next[i] == + NULL) + data->priv_timer[prev_owner].curr_skiplist_depth--; else break; if (prev_owner != lcore_id || !local_is_locked) - rte_spinlock_unlock(&priv_timer[prev_owner].list_lock); + rte_spinlock_unlock(&data->priv_timer[prev_owner].list_lock); } /* Reset and start the timer associated with the timer handle (private func) */ @@ -326,7 +446,8 @@ static int __rte_timer_reset(struct rte_timer *tim, uint64_t expire, uint64_t period, unsigned tim_lcore, rte_timer_cb_t fct, void *arg, - int local_is_locked) + int local_is_locked, + struct rte_timer_data *data) { union rte_timer_status prev_status, status; int ret; @@ -337,9 +458,9 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t expire, if (lcore_id < RTE_MAX_LCORE) { /* EAL thread with valid lcore_id */ tim_lcore = rte_get_next_lcore( - priv_timer[lcore_id].prev_lcore, + 
data->priv_timer[lcore_id].prev_lcore, 0, 1); - priv_timer[lcore_id].prev_lcore = tim_lcore; + data->priv_timer[lcore_id].prev_lcore = tim_lcore; } else /* non-EAL thread do not run rte_timer_manage(), * so schedule the timer on the first enabled lcore. */ @@ -348,20 +469,20 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t expire, /* wait that the timer is in correct status before update, * and mark it as being configured */ - ret = timer_set_config_state(tim, &prev_status); + ret = timer_set_config_state(tim, &prev_status, data); if (ret < 0) return -1; - __TIMER_STAT_ADD(reset, 1); + __TIMER_STAT_ADD(data, reset, 1); if (prev_status.state == RTE_TIMER_RUNNING && lcore_id < RTE_MAX_LCORE) { - priv_timer[lcore_id].updated = 1; + data->priv_timer[lcore_id].updated = 1; } /* remove it from list */ if (prev_status.state == RTE_TIMER_PENDING) { - timer_del(tim, prev_status, local_is_locked); - __TIMER_STAT_ADD(pending, -1); + timer_del(tim, prev_status, local_is_locked, data); + __TIMER_STAT_ADD(data, pending, -1); } tim->period = period; @@ -374,10 +495,10 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t expire, * we are not called from rte_timer_manage() */ if (tim_lcore != lcore_id || !local_is_locked) - rte_spinlock_lock(&priv_timer[tim_lcore].list_lock); + rte_spinlock_lock(&data->priv_timer[tim_lcore].list_lock); - __TIMER_STAT_ADD(pending, 1); - timer_add(tim, tim_lcore); + __TIMER_STAT_ADD(data, pending, 1); + timer_add(tim, tim_lcore, data); /* update state: as we are in CONFIG state, only us can modify * the state so we don't need to use cmpset() here */ @@ -387,7 +508,7 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t expire, tim->status.u32 = status.u32; if (tim_lcore != lcore_id || !local_is_locked) - rte_spinlock_unlock(&priv_timer[tim_lcore].list_lock); + rte_spinlock_unlock(&data->priv_timer[tim_lcore].list_lock); return 0; } @@ -395,11 +516,23 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t expire, /* Reset and start the timer 
associated with the timer handle tim */ int rte_timer_reset(struct rte_timer *tim, uint64_t ticks, - enum rte_timer_type type, unsigned tim_lcore, - rte_timer_cb_t fct, void *arg) + enum rte_timer_type type, unsigned int tim_lcore, + rte_timer_cb_t fct, void *arg) +{ + return rte_timer_alt_reset(default_data_id, tim, ticks, type, + tim_lcore, fct, arg); +} + +int __rte_experimental +rte_timer_alt_reset(uint32_t timer_data_id, struct rte_timer *tim, + uint64_t ticks, enum rte_timer_type type, + unsigned int tim_lcore, rte_timer_cb_t fct, void *arg) { uint64_t cur_time = rte_get_timer_cycles(); uint64_t period; + struct rte_timer_data *timer_data; + + TIMER_DATA_VALID_GET_OR_ERR_RET(timer_data_id, timer_data, -EINVAL); if (unlikely((tim_lcore != (unsigned)LCORE_ID_ANY) && !(rte_lcore_is_enabled(tim_lcore) || @@ -412,7 +545,7 @@ rte_timer_reset(struct rte_timer *tim, uint64_t ticks, period = 0; return __rte_timer_reset(tim, cur_time + ticks, period, tim_lcore, - fct, arg, 0); + fct, arg, 0, timer_data); } /* loop until rte_timer_reset() succeed */ @@ -430,26 +563,35 @@ rte_timer_reset_sync(struct rte_timer *tim, uint64_t ticks, int rte_timer_stop(struct rte_timer *tim) { + return rte_timer_alt_stop(default_data_id, tim); +} + +int __rte_experimental +rte_timer_alt_stop(uint32_t timer_data_id, struct rte_timer *tim) +{ union rte_timer_status prev_status, status; unsigned lcore_id = rte_lcore_id(); int ret; + struct rte_timer_data *timer_data; + + TIMER_DATA_VALID_GET_OR_ERR_RET(timer_data_id, timer_data, -EINVAL); /* wait that the timer is in correct status before update, * and mark it as being configured */ - ret = timer_set_config_state(tim, &prev_status); + ret = timer_set_config_state(tim, &prev_status, timer_data); if (ret < 0) return -1; - __TIMER_STAT_ADD(stop, 1); + __TIMER_STAT_ADD(timer_data, stop, 1); if (prev_status.state == RTE_TIMER_RUNNING && lcore_id < RTE_MAX_LCORE) { - priv_timer[lcore_id].updated = 1; + timer_data->priv_timer[lcore_id].updated = 1; } 
/* remove it from list */ if (prev_status.state == RTE_TIMER_PENDING) { - timer_del(tim, prev_status, 0); - __TIMER_STAT_ADD(pending, -1); + timer_del(tim, prev_status, 0, timer_data); + __TIMER_STAT_ADD(timer_data, pending, -1); } /* mark timer as stopped */ @@ -486,13 +628,14 @@ void rte_timer_manage(void) struct rte_timer *prev[MAX_SKIPLIST_DEPTH + 1]; uint64_t cur_time; int i, ret; + struct rte_timer_data *data = &rte_timer_data_arr[default_data_id]; /* timer manager only runs on EAL thread with valid lcore_id */ assert(lcore_id < RTE_MAX_LCORE); - __TIMER_STAT_ADD(manage, 1); + __TIMER_STAT_ADD(data, manage, 1); /* optimize for the case where per-cpu list is empty */ - if (priv_timer[lcore_id].pending_head.sl_next[0] == NULL) + if (data->priv_timer[lcore_id].pending_head.sl_next[0] == NULL) return; cur_time = rte_get_timer_cycles(); @@ -500,32 +643,34 @@ void rte_timer_manage(void) /* on 64-bit the value cached in the pending_head.expired will be * updated atomically, so we can consult that for a quick check here * outside the lock */ - if (likely(priv_timer[lcore_id].pending_head.expire > cur_time)) + if (likely(data->priv_timer[lcore_id].pending_head.expire > cur_time)) return; #endif /* browse ordered list, add expired timers in 'expired' list */ - rte_spinlock_lock(&priv_timer[lcore_id].list_lock); + rte_spinlock_lock(&data->priv_timer[lcore_id].list_lock); /* if nothing to do just unlock and return */ - if (priv_timer[lcore_id].pending_head.sl_next[0] == NULL || - priv_timer[lcore_id].pending_head.sl_next[0]->expire > cur_time) { - rte_spinlock_unlock(&priv_timer[lcore_id].list_lock); + if (data->priv_timer[lcore_id].pending_head.sl_next[0] == NULL || + data->priv_timer[lcore_id].pending_head.sl_next[0]->expire > + cur_time) { + rte_spinlock_unlock(&data->priv_timer[lcore_id].list_lock); return; } /* save start of list of expired timers */ - tim = priv_timer[lcore_id].pending_head.sl_next[0]; + tim = data->priv_timer[lcore_id].pending_head.sl_next[0]; /* 
break the existing list at current time point */ - timer_get_prev_entries(cur_time, lcore_id, prev); - for (i = priv_timer[lcore_id].curr_skiplist_depth -1; i >= 0; i--) { - if (prev[i] == &priv_timer[lcore_id].pending_head) + timer_get_prev_entries(cur_time, lcore_id, prev, data); + for (i = data->priv_timer[lcore_id].curr_skiplist_depth - 1; i >= 0; + i--) { + if (prev[i] == &data->priv_timer[lcore_id].pending_head) continue; - priv_timer[lcore_id].pending_head.sl_next[i] = + data->priv_timer[lcore_id].pending_head.sl_next[i] = prev[i]->sl_next[i]; if (prev[i]->sl_next[i] == NULL) - priv_timer[lcore_id].curr_skiplist_depth--; + data->priv_timer[lcore_id].curr_skiplist_depth--; prev[i] ->sl_next[i] = NULL; } @@ -548,25 +693,25 @@ void rte_timer_manage(void) } /* update the next to expire timer value */ - priv_timer[lcore_id].pending_head.expire = - (priv_timer[lcore_id].pending_head.sl_next[0] == NULL) ? 0 : - priv_timer[lcore_id].pending_head.sl_next[0]->expire; + data->priv_timer[lcore_id].pending_head.expire = + (data->priv_timer[lcore_id].pending_head.sl_next[0] == NULL) ? 
0 : + data->priv_timer[lcore_id].pending_head.sl_next[0]->expire; - rte_spinlock_unlock(&priv_timer[lcore_id].list_lock); + rte_spinlock_unlock(&data->priv_timer[lcore_id].list_lock); /* now scan expired list and call callbacks */ for (tim = run_first_tim; tim != NULL; tim = next_tim) { next_tim = tim->sl_next[0]; - priv_timer[lcore_id].updated = 0; - priv_timer[lcore_id].running_tim = tim; + data->priv_timer[lcore_id].updated = 0; + data->priv_timer[lcore_id].running_tim = tim; /* execute callback function with list unlocked */ tim->f(tim, tim->arg); - __TIMER_STAT_ADD(pending, -1); + __TIMER_STAT_ADD(data, pending, -1); /* the timer was stopped or reloaded by the callback * function, we have nothing to do here */ - if (priv_timer[lcore_id].updated == 1) + if (data->priv_timer[lcore_id].updated == 1) continue; if (tim->period == 0) { @@ -578,33 +723,217 @@ void rte_timer_manage(void) } else { /* keep it in list and mark timer as pending */ - rte_spinlock_lock(&priv_timer[lcore_id].list_lock); + rte_spinlock_lock( + &data->priv_timer[lcore_id].list_lock); status.state = RTE_TIMER_PENDING; - __TIMER_STAT_ADD(pending, 1); + __TIMER_STAT_ADD(data, pending, 1); status.owner = (int16_t)lcore_id; rte_wmb(); tim->status.u32 = status.u32; __rte_timer_reset(tim, tim->expire + tim->period, - tim->period, lcore_id, tim->f, tim->arg, 1); - rte_spinlock_unlock(&priv_timer[lcore_id].list_lock); + tim->period, lcore_id, tim->f, tim->arg, 1, + data); + rte_spinlock_unlock( + &data->priv_timer[lcore_id].list_lock); + } + } + data->priv_timer[lcore_id].running_tim = NULL; +} + +int __rte_experimental +rte_timer_alt_manage(uint32_t timer_data_id, + unsigned int *poll_lcores, + int nb_poll_lcores, + rte_timer_alt_manage_cb_t f) +{ + union rte_timer_status status; + struct rte_timer *tim, *next_tim, **pprev; + struct rte_timer *run_first_tims[RTE_MAX_LCORE]; + unsigned int this_lcore = rte_lcore_id(); + struct rte_timer *prev[MAX_SKIPLIST_DEPTH + 1]; + uint64_t cur_time; + int i, j, 
ret; + int nb_runlists = 0; + struct priv_timer *priv_timer; + uint32_t poll_lcore; + struct rte_timer_data *data; + + TIMER_DATA_VALID_GET_OR_ERR_RET(timer_data_id, data, -EINVAL); + + /* timer manager only runs on EAL thread with valid lcore_id */ + assert(this_lcore < RTE_MAX_LCORE); + + __TIMER_STAT_ADD(data, manage, 1); + + if (poll_lcores == NULL) { + poll_lcores = &this_lcore; + nb_poll_lcores = 1; + } + + for (i = 0; i < nb_poll_lcores; i++) { + poll_lcore = poll_lcores[i]; + priv_timer = &data->priv_timer[poll_lcore]; + + /* optimize for the case where per-cpu list is empty */ + if (priv_timer->pending_head.sl_next[0] == NULL) + continue; + cur_time = rte_get_timer_cycles(); + +#ifdef RTE_ARCH_64 + /* on 64-bit the value cached in the pending_head.expired will + * be updated atomically, so we can consult that for a quick + * check here outside the lock + */ + if (likely(priv_timer->pending_head.expire > cur_time)) + continue; +#endif + + /* browse ordered list, add expired timers in 'expired' list */ + rte_spinlock_lock(&priv_timer->list_lock); + + /* if nothing to do just unlock and continue */ + if (priv_timer->pending_head.sl_next[0] == NULL || + priv_timer->pending_head.sl_next[0]->expire > cur_time) { + rte_spinlock_unlock(&priv_timer->list_lock); + continue; + } + + /* save start of list of expired timers */ + tim = priv_timer->pending_head.sl_next[0]; + + /* break the existing list at current time point */ + timer_get_prev_entries(cur_time, poll_lcore, prev, data); + for (j = priv_timer->curr_skiplist_depth - 1; j >= 0; j--) { + if (prev[j] == &priv_timer->pending_head) + continue; + + priv_timer->pending_head.sl_next[j] = + prev[j]->sl_next[j]; + + if (prev[j]->sl_next[j] == NULL) + priv_timer->curr_skiplist_depth--; + + prev[j]->sl_next[j] = NULL; + } + + /* transition run-list from PENDING to RUNNING */ + run_first_tims[nb_runlists] = tim; + pprev = &run_first_tims[nb_runlists]; + nb_runlists++; + 
for ( ; tim != NULL; tim = next_tim) { + next_tim = tim->sl_next[0]; + + ret = timer_set_running_state(tim); + if (likely(ret == 0)) { + pprev = &tim->sl_next[0]; + } else { + /* another core is trying to re-config this one, + * remove it from local expired list + */ + *pprev = next_tim; + } + } + + /* update the next to expire timer value */ + priv_timer->pending_head.expire = + (priv_timer->pending_head.sl_next[0] == NULL) ? 0 : + priv_timer->pending_head.sl_next[0]->expire; + + rte_spinlock_unlock(&priv_timer->list_lock); + } + + /* Now process the run lists */ + while (1) { + bool done = true; + uint64_t min_expire = UINT64_MAX; + int min_idx = 0; + + /* Find the next oldest timer to process */ + for (i = 0; i < nb_runlists; i++) { + tim = run_first_tims[i]; + + if (tim != NULL && tim->expire < min_expire) { + min_expire = tim->expire; + min_idx = i; + done = false; + } + } + + if (done) + break; + + tim = run_first_tims[min_idx]; + + /* Move down the runlist from which we picked a timer to + * execute + */ + run_first_tims[min_idx] = run_first_tims[min_idx]->sl_next[0]; + + priv_timer->updated = 0; + priv_timer->running_tim = tim; + + /* Call the provided callback function */ + f(tim); + + __TIMER_STAT_ADD(data, pending, -1); + + /* the timer was stopped or reloaded by the callback + * function, we have nothing to do here + */ + if (priv_timer->updated == 1) + continue; + + if (tim->period == 0) { + /* remove from done list and mark timer as stopped */ + status.state = RTE_TIMER_STOP; + status.owner = RTE_TIMER_NO_OWNER; + rte_wmb(); + tim->status.u32 = status.u32; + } else { + /* keep it in list and mark timer as pending */ + rte_spinlock_lock( + &data->priv_timer[this_lcore].list_lock); + status.state = RTE_TIMER_PENDING; + __TIMER_STAT_ADD(data, pending, 1); + status.owner = (int16_t)this_lcore; + rte_wmb(); + tim->status.u32 = status.u32; + __rte_timer_reset(tim, tim->expire + tim->period, + tim->period, this_lcore, tim->f, tim->arg, 1, + data); + 
rte_spinlock_unlock( + &data->priv_timer[this_lcore].list_lock); } + + priv_timer->running_tim = NULL; } - priv_timer[lcore_id].running_tim = NULL; + + return 0; } /* dump statistics about timers */ void rte_timer_dump_stats(FILE *f) { + rte_timer_alt_dump_stats(default_data_id, f); +} + +int __rte_experimental +rte_timer_alt_dump_stats(uint32_t timer_data_id __rte_unused, FILE *f) +{ #ifdef RTE_LIBRTE_TIMER_DEBUG struct rte_timer_debug_stats sum; unsigned lcore_id; + struct rte_timer_data *data; + + TIMER_DATA_VALID_GET_OR_ERR_RET(timer_data_id, data, -EINVAL); memset(&sum, 0, sizeof(sum)); for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) { - sum.reset += priv_timer[lcore_id].stats.reset; - sum.stop += priv_timer[lcore_id].stats.stop; - sum.manage += priv_timer[lcore_id].stats.manage; - sum.pending += priv_timer[lcore_id].stats.pending; + sum.reset += data->priv_timer[lcore_id].stats.reset; + sum.stop += data->priv_timer[lcore_id].stats.stop; + sum.manage += data->priv_timer[lcore_id].stats.manage; + sum.pending += data->priv_timer[lcore_id].stats.pending; } fprintf(f, "Timer statistics:\n"); fprintf(f, " reset = %"PRIu64"\n", sum.reset); @@ -614,4 +943,5 @@ void rte_timer_dump_stats(FILE *f) #else fprintf(f, "No timer statistics, RTE_LIBRTE_TIMER_DEBUG is disabled\n"); #endif + return 0; } diff --git a/lib/librte_timer/rte_timer.h b/lib/librte_timer/rte_timer.h index 9b95cd2..9daa334 100644 --- a/lib/librte_timer/rte_timer.h +++ b/lib/librte_timer/rte_timer.h @@ -39,6 +39,7 @@ #include #include #include +#include #ifdef __cplusplus extern "C" { @@ -132,12 +133,52 @@ struct rte_timer #endif /** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice + * + * Allocate a timer data instance in shared memory to track a set of pending + * timer lists. + * + * @param id_ptr + * Pointer to variable into which to write the identifier of the allocated + * timer data instance. 
+ * + * @return + * 0: Success + * -ENOSPC: maximum number of timer data instances already allocated + */ +int __rte_experimental rte_timer_data_alloc(uint32_t *id_ptr); + +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice + * + * Deallocate a timer data instance. + * + * @param id + * Identifier of the timer data instance to deallocate. + * + * @return + * 0: Success + * -EINVAL: invalid timer data instance identifier + */ +int __rte_experimental rte_timer_data_dealloc(uint32_t id); + +/** * Initialize the timer library. * * Initializes internal variables (list, locks and so on) for the RTE * timer library. */ -void rte_timer_subsystem_init(void); +int rte_timer_subsystem_init(void); + +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice + * + * Free timer subsystem resources. + */ +void __rte_experimental rte_timer_subsystem_finalize(void); /** * Initialize a timer handle. @@ -254,7 +295,6 @@ rte_timer_reset_sync(struct rte_timer *tim, uint64_t ticks, */ int rte_timer_stop(struct rte_timer *tim); - /** * Loop until rte_timer_stop() succeeds. * @@ -302,6 +342,130 @@ void rte_timer_manage(void); */ void rte_timer_dump_stats(FILE *f); +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice + * + * This function is the same as rte_timer_reset(), except that it allows a + * caller to specify the rte_timer_data instance containing the list to which + * the timer should be added. + * + * @see rte_timer_reset() + * + * @param timer_data_id + * An identifier indicating which instance of timer data should be used for + * this operation. + * @param tim + * The timer handle. + * @param ticks + * The number of cycles (see rte_get_hpet_hz()) before the callback + * function is called. 
+ * @param type + * The type can be either: + * - PERIODICAL: The timer is automatically reloaded after execution + * (returns to the PENDING state) + * - SINGLE: The timer is one-shot, that is, the timer goes to a + * STOPPED state after execution. + * @param tim_lcore + * The ID of the lcore where the timer callback function has to be + * executed. If tim_lcore is LCORE_ID_ANY, the timer library will + * launch it on a different core for each call (round-robin). + * @param fct + * The callback function of the timer. This parameter can be NULL if (and + * only if) rte_timer_alt_manage() will be used to manage this timer. + * @param arg + * The user argument of the callback function. + * @return + * - 0: Success; the timer is scheduled. + * - (-1): Timer is in the RUNNING or CONFIG state. + * - -EINVAL: invalid timer_data_id + */ +int __rte_experimental +rte_timer_alt_reset(uint32_t timer_data_id, struct rte_timer *tim, + uint64_t ticks, enum rte_timer_type type, + unsigned int tim_lcore, rte_timer_cb_t fct, void *arg); + +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice + * + * This function is the same as rte_timer_stop(), except that it allows a + * caller to specify the rte_timer_data instance containing the list from which + * this timer should be removed. + * + * @see rte_timer_stop() + * + * @param timer_data_id + * An identifier indicating which instance of timer data should be used for + * this operation. + * @param tim + * The timer handle. + * @return + * - 0: Success; the timer is stopped. + * - (-1): The timer is in the RUNNING or CONFIG state. + * - -EINVAL: invalid timer_data_id + */ +int __rte_experimental +rte_timer_alt_stop(uint32_t timer_data_id, struct rte_timer *tim); + +/** + * Callback function type for rte_timer_alt_manage(). 
+ */ +typedef void (*rte_timer_alt_manage_cb_t)(void *); + +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice + * + * Manage a set of timer lists and execute the specified callback function for + * all expired timers. This function is similar to rte_timer_manage(), except + * that it allows a caller to specify the timer_data instance that should + * be operated on, as well as a set of lcore IDs identifying which timer lists + * should be processed. Callback functions of individual timers are ignored. + * + * @see rte_timer_manage() + * + * @param timer_data_id + * An identifier indicating which instance of timer data should be used for + * this operation. + * @param poll_lcores + * An array of lcore ids identifying the timer lists that should be processed. + * NULL is allowed - if NULL, the timer list corresponding to the lcore + * calling this routine is processed (same as rte_timer_manage()). + * @param n_poll_lcores + * The size of the poll_lcores array. If 'poll_lcores' is NULL, this parameter + * is ignored. + * @param f + * The callback function which should be called for all expired timers. + * @return + * - 0: success + * - -EINVAL: invalid timer_data_id + */ +int __rte_experimental +rte_timer_alt_manage(uint32_t timer_data_id, unsigned int *poll_lcores, + int n_poll_lcores, rte_timer_alt_manage_cb_t f); + +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice + * + * This function is the same as rte_timer_dump_stats(), except that it allows + * the caller to specify the rte_timer_data instance that should be used. + * + * @see rte_timer_dump_stats() + * + * @param timer_data_id + * An identifier indicating which instance of timer data should be used for + * this operation. 
+ * @param f + * A pointer to a file for output + * @return + * - 0: success + * - -EINVAL: invalid timer_data_id + */ +int __rte_experimental +rte_timer_alt_dump_stats(uint32_t timer_data_id, FILE *f); + #ifdef __cplusplus } #endif diff --git a/lib/librte_timer/rte_timer_version.map b/lib/librte_timer/rte_timer_version.map index 9b2e4b8..1e6b70d 100644 --- a/lib/librte_timer/rte_timer_version.map +++ b/lib/librte_timer/rte_timer_version.map @@ -3,13 +3,30 @@ DPDK_2.0 { rte_timer_dump_stats; rte_timer_init; - rte_timer_manage; rte_timer_pending; rte_timer_reset; rte_timer_reset_sync; rte_timer_stop; rte_timer_stop_sync; - rte_timer_subsystem_init; local: *; }; + +DPDK_19.02 { + global: + + rte_timer_manage; + rte_timer_subsystem_init; +} DPDK_2.0; + +EXPERIMENTAL { + global: + + rte_timer_alt_dump_stats; + rte_timer_alt_manage; + rte_timer_alt_reset; + rte_timer_alt_stop; + rte_timer_data_alloc; + rte_timer_data_dealloc; + rte_timer_subsystem_finalize; +}; From patchwork Thu Nov 29 23:35:13 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Carrillo, Erik G" X-Patchwork-Id: 48427 X-Patchwork-Delegate: jerinj@marvell.com Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 985F21B52B; Fri, 30 Nov 2018 00:35:32 +0100 (CET) Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by dpdk.org (Postfix) with ESMTP id D6EDF1B523 for ; Fri, 30 Nov 2018 00:35:30 +0100 (CET) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga103.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 29 Nov 2018 15:35:30 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,296,1539673200"; d="scan'208";a="119277011" Received: from wcpqa1.an.intel.com ([10.123.72.207]) by fmsmga001.fm.intel.com with ESMTP; 29 Nov 
2018 15:35:30 -0800 From: Erik Gabriel Carrillo To: pbhagavatula@caviumnetworks.com, jerin.jacob@caviumnetworks.com, rsanford@akamai.com Cc: dev@dpdk.org Date: Thu, 29 Nov 2018 17:35:13 -0600 Message-Id: <1543534514-183766-3-git-send-email-erik.g.carrillo@intel.com> X-Mailer: git-send-email 1.7.10 In-Reply-To: <1543534514-183766-1-git-send-email-erik.g.carrillo@intel.com> References: <1543534514-183766-1-git-send-email-erik.g.carrillo@intel.com> Subject: [dpdk-dev] [PATCH 2/3] timer: add function to stop all timers in a list X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Add a function to the timer API that allows a caller to traverse a specified set of timer lists, stopping each timer in each list, and invoking a callback function. Signed-off-by: Erik Gabriel Carrillo --- lib/librte_timer/rte_timer.c | 81 +++++++++++++++++++++++++++------- lib/librte_timer/rte_timer.h | 32 ++++++++++++++ lib/librte_timer/rte_timer_version.map | 1 + 3 files changed, 97 insertions(+), 17 deletions(-) diff --git a/lib/librte_timer/rte_timer.c b/lib/librte_timer/rte_timer.c index a76be8b..1eaf755 100644 --- a/lib/librte_timer/rte_timer.c +++ b/lib/librte_timer/rte_timer.c @@ -559,39 +559,30 @@ rte_timer_reset_sync(struct rte_timer *tim, uint64_t ticks, rte_pause(); } -/* Stop the timer associated with the timer handle tim */ -int -rte_timer_stop(struct rte_timer *tim) -{ - return rte_timer_alt_stop(default_data_id, tim); -} - -int __rte_experimental -rte_timer_alt_stop(uint32_t timer_data_id, struct rte_timer *tim) +static int +__rte_timer_stop(struct rte_timer *tim, int local_is_locked, + struct rte_timer_data *data) { union rte_timer_status prev_status, status; unsigned lcore_id = rte_lcore_id(); int ret; - struct rte_timer_data *timer_data; - - TIMER_DATA_VALID_GET_OR_ERR_RET(timer_data_id, timer_data, 
-EINVAL); /* wait that the timer is in correct status before update, * and mark it as being configured */ - ret = timer_set_config_state(tim, &prev_status, timer_data); + ret = timer_set_config_state(tim, &prev_status, data); if (ret < 0) return -1; - __TIMER_STAT_ADD(timer_data, stop, 1); + __TIMER_STAT_ADD(data, stop, 1); if (prev_status.state == RTE_TIMER_RUNNING && lcore_id < RTE_MAX_LCORE) { - timer_data->priv_timer[lcore_id].updated = 1; + data->priv_timer[lcore_id].updated = 1; } /* remove it from list */ if (prev_status.state == RTE_TIMER_PENDING) { - timer_del(tim, prev_status, 0, timer_data); - __TIMER_STAT_ADD(timer_data, pending, -1); + timer_del(tim, prev_status, local_is_locked, data); + __TIMER_STAT_ADD(data, pending, -1); } /* mark timer as stopped */ @@ -603,6 +594,23 @@ rte_timer_alt_stop(uint32_t timer_data_id, struct rte_timer *tim) return 0; } +/* Stop the timer associated with the timer handle tim */ +int +rte_timer_stop(struct rte_timer *tim) +{ + return rte_timer_alt_stop(default_data_id, tim); +} + +int __rte_experimental +rte_timer_alt_stop(uint32_t timer_data_id, struct rte_timer *tim) +{ + struct rte_timer_data *timer_data; + + TIMER_DATA_VALID_GET_OR_ERR_RET(timer_data_id, timer_data, -EINVAL); + + return __rte_timer_stop(tim, 0, timer_data); +} + /* loop until rte_timer_stop() succeed */ void rte_timer_stop_sync(struct rte_timer *tim) @@ -912,6 +920,45 @@ rte_timer_alt_manage(uint32_t timer_data_id, return 0; } +/* Walk pending lists, stopping timers and calling user-specified function */ +int __rte_experimental +rte_timer_stop_all(uint32_t timer_data_id, unsigned int *walk_lcores, + int nb_walk_lcores, + rte_timer_stop_all_cb_t f, void *f_arg) +{ + int i; + struct priv_timer *priv_timer; + uint32_t walk_lcore; + struct rte_timer *tim, *next_tim; + struct rte_timer_data *timer_data; + + TIMER_DATA_VALID_GET_OR_ERR_RET(timer_data_id, timer_data, -EINVAL); +
+ for (i = 0; i < nb_walk_lcores; i++) { + /* Index inside the loop so walk_lcores is never read out of bounds */ + walk_lcore = walk_lcores[i]; + priv_timer = &timer_data->priv_timer[walk_lcore]; + + rte_spinlock_lock(&priv_timer->list_lock); + + for (tim = priv_timer->pending_head.sl_next[0]; + tim != NULL; + tim = next_tim) { + next_tim = tim->sl_next[0]; + + /* Call timer_stop with lock held */ + __rte_timer_stop(tim, 1, timer_data); + + if (f) + f(tim, f_arg); + } + + rte_spinlock_unlock(&priv_timer->list_lock); + } + + return 0; +} + /* dump statistics about timers */ void rte_timer_dump_stats(FILE *f) { diff --git a/lib/librte_timer/rte_timer.h b/lib/librte_timer/rte_timer.h index 9daa334..27b1ebd 100644 --- a/lib/librte_timer/rte_timer.h +++ b/lib/librte_timer/rte_timer.h @@ -446,6 +446,38 @@ rte_timer_alt_manage(uint32_t timer_data_id, unsigned int *poll_lcores, int n_poll_lcores, rte_timer_alt_manage_cb_t f); /** + * Callback function type for rte_timer_stop_all(). + */ +typedef void (*rte_timer_stop_all_cb_t)(struct rte_timer *tim, void *arg); + +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice + * + * Walk the pending timer lists for the specified lcore IDs, and for each timer + * that is encountered, stop it and call the specified callback function to + * process it further. + * + * @param timer_data_id + * An identifier indicating which instance of timer data should be used for + * this operation. + * @param walk_lcores + * An array of lcore ids identifying the timer lists that should be processed. + * @param nb_walk_lcores + * The size of the walk_lcores array. + * @param f + * The callback function which should be called for each timer. Can be NULL. + * @param f_arg + * An arbitrary argument that will be passed to f, if it is called. 
+ * @return + * - 0: success + * - -EINVAL: invalid timer_data_id + */ +int __rte_experimental +rte_timer_stop_all(uint32_t timer_data_id, unsigned int *walk_lcores, + int nb_walk_lcores, rte_timer_stop_all_cb_t f, void *f_arg); + +/** * @warning * @b EXPERIMENTAL: this API may change without prior notice * diff --git a/lib/librte_timer/rte_timer_version.map b/lib/librte_timer/rte_timer_version.map index 1e6b70d..0fab845 100644 --- a/lib/librte_timer/rte_timer_version.map +++ b/lib/librte_timer/rte_timer_version.map @@ -28,5 +28,6 @@ EXPERIMENTAL { rte_timer_alt_stop; rte_timer_data_alloc; rte_timer_data_dealloc; + rte_timer_stop_all; rte_timer_subsystem_finalize; }; From patchwork Thu Nov 29 23:35:14 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Carrillo, Erik G" X-Patchwork-Id: 48428 X-Patchwork-Delegate: jerinj@marvell.com Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 224921B536; Fri, 30 Nov 2018 00:35:36 +0100 (CET) Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by dpdk.org (Postfix) with ESMTP id 4D1F81B4B9 for ; Fri, 30 Nov 2018 00:35:34 +0100 (CET) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga103.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 29 Nov 2018 15:35:33 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,296,1539673200"; d="scan'208";a="119277610" Received: from wcpqa1.an.intel.com ([10.123.72.207]) by fmsmga001.fm.intel.com with ESMTP; 29 Nov 2018 15:35:33 -0800 From: Erik Gabriel Carrillo To: pbhagavatula@caviumnetworks.com, jerin.jacob@caviumnetworks.com, rsanford@akamai.com Cc: dev@dpdk.org Date: Thu, 29 Nov 2018 17:35:14 -0600 Message-Id: <1543534514-183766-4-git-send-email-erik.g.carrillo@intel.com> X-Mailer: git-send-email 
1.7.10 In-Reply-To: <1543534514-183766-1-git-send-email-erik.g.carrillo@intel.com> References: <1543534514-183766-1-git-send-email-erik.g.carrillo@intel.com> Subject: [dpdk-dev] [PATCH 3/3] eventdev: add new software event timer adapter X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" This commit updates the implementation of the software event timer adapter. The original version used rings to let producer cores (and secondary processes) send timers to a service core, which would then arm or cancel the timers, depending on what the application had requested. The ring can be a bottleneck, so we replace the original implementation with one that uses new APIs introduced in the timer library. The new APIs allow the underlying timer skiplists to be allocated in shared memory, which allows the producer cores in both primary and secondary processes to install timers directly into the lists, obviating the need for a ring. Each producer core also gets a unique timer list to insert timers into, so no contention occurs there. The adapter's service function can utilize a new flavor of rte_timer_manage() that can traverse multiple timer lists, and also accepts a callback function. The callback function is only called from the primary process, since that's where the service runs, and the callback is the same for all timers - it is defined to enqueue a timer expiry event in the event device. 
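For illustration only, the design described above can be sketched as a small self-contained model. This is a toy under stated assumptions, not the adapter code: every toy_* name and the TOY_* limits are hypothetical, plain arrays stand in for the timer library's skiplists, and the per-list locking done by the real library is omitted. It shows producers inserting directly into their own lcore's list and a service routine polling only the listed lcores with one callback for all timers, which receives the timer as an opaque pointer as swtim_callback() does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_MAX_LCORE  4 /* hypothetical; stands in for RTE_MAX_LCORE */
#define TOY_MAX_TIMERS 8 /* hypothetical per-list capacity */

/* Toy timer: an expiry tick plus an opaque user argument. */
struct toy_timer {
    uint64_t expire;
    void *arg;
    int pending;
};

/* One list per lcore, so producers on different lcores never share a list. */
struct toy_list {
    struct toy_timer *slots[TOY_MAX_TIMERS];
    size_t n;
};

/* Stand-in for a timer data instance that would live in shared memory. */
struct toy_timer_data {
    struct toy_list lists[TOY_MAX_LCORE];
};

typedef void (*toy_manage_cb_t)(void *);

/* Arm path: the producer inserts directly into its own lcore's list. */
static int
toy_arm(struct toy_timer_data *d, unsigned int lcore, struct toy_timer *t,
        uint64_t expire, void *arg)
{
    struct toy_list *l;

    if (lcore >= TOY_MAX_LCORE)
        return -1;
    l = &d->lists[lcore];
    if (l->n == TOY_MAX_TIMERS)
        return -1;
    t->expire = expire;
    t->arg = arg;
    t->pending = 1;
    l->slots[l->n++] = t;
    return 0;
}

/* Service path: poll only the listed lcores and run the same callback for
 * every expired timer. Returns the number of timers that expired.
 */
static int
toy_manage(struct toy_timer_data *d, const unsigned int *poll_lcores,
           int n_poll_lcores, uint64_t now, toy_manage_cb_t f)
{
    int i, fired = 0;

    for (i = 0; i < n_poll_lcores; i++) {
        struct toy_list *l = &d->lists[poll_lcores[i]];
        size_t j = 0;

        while (j < l->n) {
            struct toy_timer *t = l->slots[j];

            if (t->expire <= now) {
                /* Remove by moving the last entry into this slot. */
                l->slots[j] = l->slots[--l->n];
                t->pending = 0;
                f(t);
                fired++;
            } else {
                j++;
            }
        }
    }
    return fired;
}

/* Example callback: counts expiries through the timer's user argument. */
static void
toy_count_cb(void *arg)
{
    struct toy_timer *t = arg;

    (*(int *)t->arg)++;
}
```

In this model a timer armed on an lcore that is absent from poll_lcores is never visited, which is why the adapter tracks the set of lcores that have actually armed a timer.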
Signed-off-by: Erik Gabriel Carrillo --- lib/librte_eventdev/rte_event_timer_adapter.c | 687 +++++++++++--------------- 1 file changed, 275 insertions(+), 412 deletions(-) diff --git a/lib/librte_eventdev/rte_event_timer_adapter.c b/lib/librte_eventdev/rte_event_timer_adapter.c index 79070d4..9c528cb 100644 --- a/lib/librte_eventdev/rte_event_timer_adapter.c +++ b/lib/librte_eventdev/rte_event_timer_adapter.c @@ -7,6 +7,7 @@ #include #include #include +#include #include #include @@ -19,6 +20,7 @@ #include #include #include +#include #include "rte_eventdev.h" #include "rte_eventdev_pmd.h" @@ -34,7 +36,7 @@ static int evtim_buffer_logtype; static struct rte_event_timer_adapter adapters[RTE_EVENT_TIMER_ADAPTER_NUM_MAX]; -static const struct rte_event_timer_adapter_ops sw_event_adapter_timer_ops; +static const struct rte_event_timer_adapter_ops swtim_ops; #define EVTIM_LOG(level, logtype, ...) \ rte_log(RTE_LOG_ ## level, logtype, \ @@ -211,7 +213,7 @@ rte_event_timer_adapter_create_ext( * implementation. */ if (adapter->ops == NULL) - adapter->ops = &sw_event_adapter_timer_ops; + adapter->ops = &swtim_ops; /* Allow driver to do some setup */ FUNC_PTR_OR_NULL_RET_WITH_ERRNO(adapter->ops->init, -ENOTSUP); @@ -334,7 +336,7 @@ rte_event_timer_adapter_lookup(uint16_t adapter_id) * implementation. 
*/ if (adapter->ops == NULL) - adapter->ops = &sw_event_adapter_timer_ops; + adapter->ops = &swtim_ops; /* Set fast-path function pointers */ adapter->arm_burst = adapter->ops->arm_burst; @@ -491,6 +493,7 @@ event_buffer_flush(struct event_buffer *bufp, uint8_t dev_id, uint8_t port_id, } *nb_events_inv = 0; + *nb_events_flushed = rte_event_enqueue_burst(dev_id, port_id, &events[tail_idx], n); if (*nb_events_flushed != n && rte_errno == -EINVAL) { @@ -498,137 +501,123 @@ event_buffer_flush(struct event_buffer *bufp, uint8_t dev_id, uint8_t port_id, (*nb_events_inv)++; } + if (*nb_events_flushed > 0) + EVTIM_BUF_LOG_DBG("enqueued %"PRIu16" timer events to event " + "device", *nb_events_flushed); + bufp->tail = bufp->tail + *nb_events_flushed + *nb_events_inv; } /* * Software event timer adapter implementation */ - -struct rte_event_timer_adapter_sw_data { - /* List of messages for outstanding timers */ - TAILQ_HEAD(, msg) msgs_tailq_head; - /* Lock to guard tailq and armed count */ - rte_spinlock_t msgs_tailq_sl; +struct swtim { /* Identifier of service executing timer management logic. */ uint32_t service_id; /* The cycle count at which the adapter should next tick */ uint64_t next_tick_cycles; - /* Incremented as the service moves through phases of an iteration */ - volatile int service_phase; /* The tick resolution used by adapter instance. May have been * adjusted from what user requested */ uint64_t timer_tick_ns; /* Maximum timeout in nanoseconds allowed by adapter instance. */ uint64_t max_tmo_ns; - /* Ring containing messages to arm or cancel event timers */ - struct rte_ring *msg_ring; - /* Mempool containing msg objects */ - struct rte_mempool *msg_pool; /* Buffered timer expiry events to be enqueued to an event device. 
*/ struct event_buffer buffer; /* Statistics */ struct rte_event_timer_adapter_stats stats; - /* The number of threads currently adding to the message ring */ - rte_atomic16_t message_producer_count; + /* Mempool of timer objects */ + struct rte_mempool *tim_pool; + /* Back pointer for convenience */ + struct rte_event_timer_adapter *adapter; + /* Identifier of timer data instance */ + uint32_t timer_data_id; + /* Track which cores have actually armed a timer */ + rte_atomic16_t in_use[RTE_MAX_LCORE]; + /* Track which cores' timer lists should be polled */ + unsigned int poll_lcores[RTE_MAX_LCORE]; + /* The number of lists that should be polled */ + int n_poll_lcores; + /* Lock to atomically access the above two variables */ + rte_spinlock_t poll_lcores_sl; }; -enum msg_type {MSG_TYPE_ARM, MSG_TYPE_CANCEL}; - -struct msg { - enum msg_type type; - struct rte_event_timer *evtim; - struct rte_timer tim; - TAILQ_ENTRY(msg) msgs; -}; +static inline struct swtim * +swtim_pmd_priv(const struct rte_event_timer_adapter *adapter) +{ + return adapter->data->adapter_priv; +} static void -sw_event_timer_cb(struct rte_timer *tim, void *arg) +swtim_callback(void *arg) { - int ret; + struct rte_timer *tim = arg; + struct rte_event_timer *evtim = tim->arg; + struct rte_event_timer_adapter *adapter; + struct swtim *sw; uint16_t nb_evs_flushed = 0; uint16_t nb_evs_invalid = 0; uint64_t opaque; - struct rte_event_timer *evtim; - struct rte_event_timer_adapter *adapter; - struct rte_event_timer_adapter_sw_data *sw_data; + int ret; - evtim = arg; opaque = evtim->impl_opaque[1]; adapter = (struct rte_event_timer_adapter *)(uintptr_t)opaque; - sw_data = adapter->data->adapter_priv; + sw = swtim_pmd_priv(adapter); - ret = event_buffer_add(&sw_data->buffer, &evtim->ev); + ret = event_buffer_add(&sw->buffer, &evtim->ev); if (ret < 0) { /* If event buffer is full, put timer back in list with * immediate expiry value, so that we process it again on the * next iteration. 
*/ - rte_timer_reset_sync(tim, 0, SINGLE, rte_lcore_id(), - sw_event_timer_cb, evtim); + rte_timer_alt_reset(sw->timer_data_id, tim, 0, SINGLE, + rte_lcore_id(), NULL, evtim); + + sw->stats.evtim_retry_count++; - sw_data->stats.evtim_retry_count++; EVTIM_LOG_DBG("event buffer full, resetting rte_timer with " "immediate expiry value"); } else { - struct msg *m = container_of(tim, struct msg, tim); - TAILQ_REMOVE(&sw_data->msgs_tailq_head, m, msgs); EVTIM_BUF_LOG_DBG("buffered an event timer expiry event"); - evtim->state = RTE_EVENT_TIMER_NOT_ARMED; + rte_mempool_put(sw->tim_pool, tim); + sw->stats.evtim_exp_count++; - /* Free the msg object containing the rte_timer now that - * we've buffered its event successfully. - */ - rte_mempool_put(sw_data->msg_pool, m); - - /* Bump the count when we successfully add an expiry event to - * the buffer. - */ - sw_data->stats.evtim_exp_count++; + evtim->state = RTE_EVENT_TIMER_NOT_ARMED; } - if (event_buffer_batch_ready(&sw_data->buffer)) { - event_buffer_flush(&sw_data->buffer, + if (event_buffer_batch_ready(&sw->buffer)) { + event_buffer_flush(&sw->buffer, adapter->data->event_dev_id, adapter->data->event_port_id, &nb_evs_flushed, &nb_evs_invalid); - sw_data->stats.ev_enq_count += nb_evs_flushed; - sw_data->stats.ev_inv_count += nb_evs_invalid; + sw->stats.ev_enq_count += nb_evs_flushed; + sw->stats.ev_inv_count += nb_evs_invalid; } } static __rte_always_inline uint64_t get_timeout_cycles(struct rte_event_timer *evtim, - struct rte_event_timer_adapter *adapter) + const struct rte_event_timer_adapter *adapter) { - uint64_t timeout_ns; - struct rte_event_timer_adapter_sw_data *sw_data; - - sw_data = adapter->data->adapter_priv; - timeout_ns = evtim->timeout_ticks * sw_data->timer_tick_ns; + struct swtim *sw = swtim_pmd_priv(adapter); + uint64_t timeout_ns = evtim->timeout_ticks * sw->timer_tick_ns; return timeout_ns * rte_get_timer_hz() / NSECPERSEC; - } /* This function returns true if one or more (adapter) ticks have occurred 
since * the last time it was called. */ static inline bool -adapter_did_tick(struct rte_event_timer_adapter *adapter) +swtim_did_tick(struct swtim *sw) { uint64_t cycles_per_adapter_tick, start_cycles; uint64_t *next_tick_cyclesp; - struct rte_event_timer_adapter_sw_data *sw_data; - - sw_data = adapter->data->adapter_priv; - next_tick_cyclesp = &sw_data->next_tick_cycles; - cycles_per_adapter_tick = sw_data->timer_tick_ns * + next_tick_cyclesp = &sw->next_tick_cycles; + cycles_per_adapter_tick = sw->timer_tick_ns * (rte_get_timer_hz() / NSECPERSEC); - start_cycles = rte_get_timer_cycles(); /* Note: initially, *next_tick_cyclesp == 0, so the clause below will @@ -640,7 +629,6 @@ adapter_did_tick(struct rte_event_timer_adapter *adapter) * boundary. */ start_cycles -= start_cycles % cycles_per_adapter_tick; - *next_tick_cyclesp = start_cycles + cycles_per_adapter_tick; return true; @@ -655,15 +643,12 @@ check_timeout(struct rte_event_timer *evtim, const struct rte_event_timer_adapter *adapter) { uint64_t tmo_nsec; - struct rte_event_timer_adapter_sw_data *sw_data; - - sw_data = adapter->data->adapter_priv; - tmo_nsec = evtim->timeout_ticks * sw_data->timer_tick_ns; + struct swtim *sw = swtim_pmd_priv(adapter); - if (tmo_nsec > sw_data->max_tmo_ns) + tmo_nsec = evtim->timeout_ticks * sw->timer_tick_ns; + if (tmo_nsec > sw->max_tmo_ns) return -1; - - if (tmo_nsec < sw_data->timer_tick_ns) + if (tmo_nsec < sw->timer_tick_ns) return -2; return 0; @@ -691,110 +676,34 @@ check_destination_event_queue(struct rte_event_timer *evtim, return 0; } -#define NB_OBJS 32 static int -sw_event_timer_adapter_service_func(void *arg) +swtim_service_func(void *arg) { - int i, num_msgs; - uint64_t cycles, opaque; + struct rte_event_timer_adapter *adapter = arg; + struct swtim *sw = swtim_pmd_priv(adapter); uint16_t nb_evs_flushed = 0; uint16_t nb_evs_invalid = 0; - struct rte_event_timer_adapter *adapter; - struct rte_event_timer_adapter_sw_data *sw_data; - struct rte_event_timer *evtim = 
NULL; - struct rte_timer *tim = NULL; - struct msg *msg, *msgs[NB_OBJS]; - - adapter = arg; - sw_data = adapter->data->adapter_priv; - - sw_data->service_phase = 1; - rte_smp_wmb(); - - while (rte_atomic16_read(&sw_data->message_producer_count) > 0 || - !rte_ring_empty(sw_data->msg_ring)) { - - num_msgs = rte_ring_dequeue_burst(sw_data->msg_ring, - (void **)msgs, NB_OBJS, NULL); - - for (i = 0; i < num_msgs; i++) { - int ret = 0; - - RTE_SET_USED(ret); - - msg = msgs[i]; - evtim = msg->evtim; - - switch (msg->type) { - case MSG_TYPE_ARM: - EVTIM_SVC_LOG_DBG("dequeued ARM message from " - "ring"); - tim = &msg->tim; - rte_timer_init(tim); - cycles = get_timeout_cycles(evtim, - adapter); - ret = rte_timer_reset(tim, cycles, SINGLE, - rte_lcore_id(), - sw_event_timer_cb, - evtim); - RTE_ASSERT(ret == 0); - - evtim->impl_opaque[0] = (uintptr_t)tim; - evtim->impl_opaque[1] = (uintptr_t)adapter; - - TAILQ_INSERT_TAIL(&sw_data->msgs_tailq_head, - msg, - msgs); - break; - case MSG_TYPE_CANCEL: - EVTIM_SVC_LOG_DBG("dequeued CANCEL message " - "from ring"); - opaque = evtim->impl_opaque[0]; - tim = (struct rte_timer *)(uintptr_t)opaque; - RTE_ASSERT(tim != NULL); - - ret = rte_timer_stop(tim); - RTE_ASSERT(ret == 0); - - /* Free the msg object for the original arm - * request. 
- */ - struct msg *m; - m = container_of(tim, struct msg, tim); - TAILQ_REMOVE(&sw_data->msgs_tailq_head, m, - msgs); - rte_mempool_put(sw_data->msg_pool, m); - - /* Free the msg object for the current msg */ - rte_mempool_put(sw_data->msg_pool, msg); - - evtim->impl_opaque[0] = 0; - evtim->impl_opaque[1] = 0; - - break; - } - } - } - - sw_data->service_phase = 2; - rte_smp_wmb(); - if (adapter_did_tick(adapter)) { - rte_timer_manage(); + if (swtim_did_tick(sw)) { + /* This lock is seldom acquired on the arm side */ + rte_spinlock_lock(&sw->poll_lcores_sl); + rte_timer_alt_manage(sw->timer_data_id, + sw->poll_lcores, + sw->n_poll_lcores, + swtim_callback); + rte_spinlock_unlock(&sw->poll_lcores_sl); - event_buffer_flush(&sw_data->buffer, + event_buffer_flush(&sw->buffer, adapter->data->event_dev_id, adapter->data->event_port_id, - &nb_evs_flushed, &nb_evs_invalid); + &nb_evs_flushed, + &nb_evs_invalid); - sw_data->stats.ev_enq_count += nb_evs_flushed; - sw_data->stats.ev_inv_count += nb_evs_invalid; - sw_data->stats.adapter_tick_count++; + sw->stats.ev_enq_count += nb_evs_flushed; + sw->stats.ev_inv_count += nb_evs_invalid; + sw->stats.adapter_tick_count++; } - sw_data->service_phase = 0; - rte_smp_wmb(); - return 0; } @@ -828,168 +737,145 @@ compute_msg_mempool_cache_size(uint64_t nb_requested, uint64_t nb_actual) return cache_size; } -#define SW_MIN_INTERVAL 1E5 - static int -sw_event_timer_adapter_init(struct rte_event_timer_adapter *adapter) +swtim_init(struct rte_event_timer_adapter *adapter) { - int ret; - struct rte_event_timer_adapter_sw_data *sw_data; - uint64_t nb_timers; + int i, ret; + struct swtim *sw; unsigned int flags; struct rte_service_spec service; - static bool timer_subsystem_inited; // static initialized to false - /* Allocate storage for SW implementation data */ - char priv_data_name[RTE_RING_NAMESIZE]; - snprintf(priv_data_name, RTE_RING_NAMESIZE, "sw_evtim_adap_priv_%"PRIu8, - adapter->data->id); - adapter->data->adapter_priv = 
rte_zmalloc_socket( - priv_data_name, - sizeof(struct rte_event_timer_adapter_sw_data), - RTE_CACHE_LINE_SIZE, - adapter->data->socket_id); - if (adapter->data->adapter_priv == NULL) { + /* Allocate storage for private data area */ +#define SWTIM_NAMESIZE 32 + char swtim_name[SWTIM_NAMESIZE]; + snprintf(swtim_name, SWTIM_NAMESIZE, "swtim_%"PRIu8, + adapter->data->id); + sw = rte_zmalloc_socket(swtim_name, sizeof(*sw), RTE_CACHE_LINE_SIZE, + adapter->data->socket_id); + if (sw == NULL) { EVTIM_LOG_ERR("failed to allocate space for private data"); rte_errno = ENOMEM; return -1; } - if (adapter->data->conf.timer_tick_ns < SW_MIN_INTERVAL) { - EVTIM_LOG_ERR("failed to create adapter with requested tick " - "interval"); - rte_errno = EINVAL; - return -1; - } - - sw_data = adapter->data->adapter_priv; - - sw_data->timer_tick_ns = adapter->data->conf.timer_tick_ns; - sw_data->max_tmo_ns = adapter->data->conf.max_tmo_ns; + /* Connect storage to adapter instance */ + adapter->data->adapter_priv = sw; + sw->adapter = adapter; - TAILQ_INIT(&sw_data->msgs_tailq_head); - rte_spinlock_init(&sw_data->msgs_tailq_sl); - rte_atomic16_init(&sw_data->message_producer_count); - - /* Rings require power of 2, so round up to next such value */ - nb_timers = rte_align64pow2(adapter->data->conf.nb_timers); - - char msg_ring_name[RTE_RING_NAMESIZE]; - snprintf(msg_ring_name, RTE_RING_NAMESIZE, - "sw_evtim_adap_msg_ring_%"PRIu8, adapter->data->id); - flags = adapter->data->conf.flags & RTE_EVENT_TIMER_ADAPTER_F_SP_PUT ? 
- RING_F_SP_ENQ | RING_F_SC_DEQ : - RING_F_SC_DEQ; - sw_data->msg_ring = rte_ring_create(msg_ring_name, nb_timers, - adapter->data->socket_id, flags); - if (sw_data->msg_ring == NULL) { - EVTIM_LOG_ERR("failed to create message ring"); - rte_errno = ENOMEM; - goto free_priv_data; - } + sw->timer_tick_ns = adapter->data->conf.timer_tick_ns; + sw->max_tmo_ns = adapter->data->conf.max_tmo_ns; - char pool_name[RTE_RING_NAMESIZE]; - snprintf(pool_name, RTE_RING_NAMESIZE, "sw_evtim_adap_msg_pool_%"PRIu8, + /* Create a timer pool */ + char pool_name[SWTIM_NAMESIZE]; + snprintf(pool_name, SWTIM_NAMESIZE, "swtim_pool_%"PRIu8, adapter->data->id); - - /* Both the arming/canceling thread and the service thread will do puts - * to the mempool, but if the SP_PUT flag is enabled, we can specify - * single-consumer get for the mempool. - */ - flags = adapter->data->conf.flags & RTE_EVENT_TIMER_ADAPTER_F_SP_PUT ? - MEMPOOL_F_SC_GET : 0; - - /* The usable size of a ring is count - 1, so subtract one here to - * make the counts agree. 
- */ + /* Optimal mempool size is a power of 2 minus one */ + uint64_t nb_timers = rte_align64pow2(adapter->data->conf.nb_timers); int pool_size = nb_timers - 1; int cache_size = compute_msg_mempool_cache_size( adapter->data->conf.nb_timers, nb_timers); - sw_data->msg_pool = rte_mempool_create(pool_name, pool_size, - sizeof(struct msg), cache_size, - 0, NULL, NULL, NULL, NULL, - adapter->data->socket_id, flags); - if (sw_data->msg_pool == NULL) { - EVTIM_LOG_ERR("failed to create message object mempool"); + flags = 0; /* pool is multi-producer, multi-consumer */ + sw->tim_pool = rte_mempool_create(pool_name, pool_size, + sizeof(struct rte_timer), cache_size, 0, NULL, NULL, + NULL, NULL, adapter->data->socket_id, flags); + if (sw->tim_pool == NULL) { + EVTIM_LOG_ERR("failed to create timer object mempool"); rte_errno = ENOMEM; - goto free_msg_ring; + goto free_alloc; + } + + /* Initialize the variables that track in-use timer lists */ + rte_spinlock_init(&sw->poll_lcores_sl); + for (i = 0; i < RTE_MAX_LCORE; i++) + rte_atomic16_init(&sw->in_use[i]); + + /* Initialize the timer subsystem and allocate timer data instance */ + ret = rte_timer_subsystem_init(); + if (ret < 0) { + if (ret != -EALREADY) { + EVTIM_LOG_ERR("failed to initialize timer subsystem"); + rte_errno = -ret; + goto free_mempool; + } + } + + ret = rte_timer_data_alloc(&sw->timer_data_id); + if (ret < 0) { + EVTIM_LOG_ERR("failed to allocate timer data instance"); + rte_errno = -ret; + goto free_mempool; } - event_buffer_init(&sw_data->buffer); + /* Initialize timer event buffer */ + event_buffer_init(&sw->buffer); + + sw->adapter = adapter; /* Register a service component to run adapter logic */ memset(&service, 0, sizeof(service)); snprintf(service.name, RTE_SERVICE_NAME_MAX, - "sw_evimer_adap_svc_%"PRIu8, adapter->data->id); + "swtim_svc_%"PRIu8, adapter->data->id); service.socket_id = adapter->data->socket_id; - service.callback = sw_event_timer_adapter_service_func; + service.callback = 
swtim_service_func; service.callback_userdata = adapter; service.capabilities &= ~(RTE_SERVICE_CAP_MT_SAFE); - ret = rte_service_component_register(&service, &sw_data->service_id); + ret = rte_service_component_register(&service, &sw->service_id); if (ret < 0) { EVTIM_LOG_ERR("failed to register service %s with id %"PRIu32 - ": err = %d", service.name, sw_data->service_id, + ": err = %d", service.name, sw->service_id, ret); rte_errno = ENOSPC; - goto free_msg_pool; + goto free_mempool; } EVTIM_LOG_DBG("registered service %s with id %"PRIu32, service.name, - sw_data->service_id); + sw->service_id); - adapter->data->service_id = sw_data->service_id; + adapter->data->service_id = sw->service_id; adapter->data->service_inited = 1; - if (!timer_subsystem_inited) { - rte_timer_subsystem_init(); - timer_subsystem_inited = true; - } - return 0; - -free_msg_pool: - rte_mempool_free(sw_data->msg_pool); -free_msg_ring: - rte_ring_free(sw_data->msg_ring); -free_priv_data: - rte_free(sw_data); +free_mempool: + rte_mempool_free(sw->tim_pool); +free_alloc: + rte_free(sw); return -1; } -static int -sw_event_timer_adapter_uninit(struct rte_event_timer_adapter *adapter) +static void +swtim_free_tim(struct rte_timer *tim, void *arg) { - int ret; - struct msg *m1, *m2; - struct rte_event_timer_adapter_sw_data *sw_data = - adapter->data->adapter_priv; + struct swtim *sw = arg; - rte_spinlock_lock(&sw_data->msgs_tailq_sl); - - /* Cancel outstanding rte_timers and free msg objects */ - m1 = TAILQ_FIRST(&sw_data->msgs_tailq_head); - while (m1 != NULL) { - EVTIM_LOG_DBG("freeing outstanding timer"); - m2 = TAILQ_NEXT(m1, msgs); - - rte_timer_stop_sync(&m1->tim); - rte_mempool_put(sw_data->msg_pool, m1); + rte_mempool_put(sw->tim_pool, (void *)tim); +} - m1 = m2; - } +/* Traverse the list of outstanding timers and put them back in the mempool + * before freeing the adapter to avoid leaking the memory. 
+ */
+static int
+swtim_uninit(struct rte_event_timer_adapter *adapter)
+{
+	int ret;
+	struct swtim *sw = swtim_pmd_priv(adapter);
 
-	rte_spinlock_unlock(&sw_data->msgs_tailq_sl);
+	/* Free outstanding timers */
+	rte_timer_stop_all(sw->timer_data_id,
+			   sw->poll_lcores,
+			   sw->n_poll_lcores,
+			   swtim_free_tim,
+			   sw);
 
-	ret = rte_service_component_unregister(sw_data->service_id);
+	ret = rte_service_component_unregister(sw->service_id);
 	if (ret < 0) {
 		EVTIM_LOG_ERR("failed to unregister service component");
 		return ret;
 	}
 
-	rte_ring_free(sw_data->msg_ring);
-	rte_mempool_free(sw_data->msg_pool);
-	rte_free(adapter->data->adapter_priv);
+	rte_mempool_free(sw->tim_pool);
+	rte_free(sw);
+	adapter->data->adapter_priv = NULL;
 
 	return 0;
 }
@@ -1010,88 +896,79 @@ get_mapped_count_for_service(uint32_t service_id)
 }
 
 static int
-sw_event_timer_adapter_start(const struct rte_event_timer_adapter *adapter)
+swtim_start(const struct rte_event_timer_adapter *adapter)
 {
 	int mapped_count;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-
-	sw_data = adapter->data->adapter_priv;
+	struct swtim *sw = swtim_pmd_priv(adapter);
 
 	/* Mapping the service to more than one service core can introduce
 	 * delays while one thread is waiting to acquire a lock, so only allow
 	 * one core to be mapped to the service.
+	 *
+	 * Note: the service could be modified such that it spreads cores to
+	 * poll over multiple service instances.
 	 */
-	mapped_count = get_mapped_count_for_service(sw_data->service_id);
+	mapped_count = get_mapped_count_for_service(sw->service_id);
 
-	if (mapped_count == 1)
-		return rte_service_component_runstate_set(sw_data->service_id,
-							  1);
+	if (mapped_count != 1)
+		return mapped_count < 1 ? -ENOENT : -ENOTSUP;
 
-	return mapped_count < 1 ? -ENOENT : -ENOTSUP;
+	return rte_service_component_runstate_set(sw->service_id, 1);
 }
 
 static int
-sw_event_timer_adapter_stop(const struct rte_event_timer_adapter *adapter)
+swtim_stop(const struct rte_event_timer_adapter *adapter)
 {
 	int ret;
-	struct rte_event_timer_adapter_sw_data *sw_data =
-						adapter->data->adapter_priv;
+	struct swtim *sw = swtim_pmd_priv(adapter);
 
-	ret = rte_service_component_runstate_set(sw_data->service_id, 0);
+	ret = rte_service_component_runstate_set(sw->service_id, 0);
 	if (ret < 0)
 		return ret;
 
-	/* Wait for the service to complete its final iteration before
-	 * stopping.
-	 */
-	while (sw_data->service_phase != 0)
+	/* Wait for the service to complete its final iteration */
+	while (rte_service_may_be_active(sw->service_id))
 		rte_pause();
 
-	rte_smp_rmb();
-
 	return 0;
 }
 
 static void
-sw_event_timer_adapter_get_info(const struct rte_event_timer_adapter *adapter,
+swtim_get_info(const struct rte_event_timer_adapter *adapter,
 		struct rte_event_timer_adapter_info *adapter_info)
 {
-	struct rte_event_timer_adapter_sw_data *sw_data;
-	sw_data = adapter->data->adapter_priv;
-
-	adapter_info->min_resolution_ns = sw_data->timer_tick_ns;
-	adapter_info->max_tmo_ns = sw_data->max_tmo_ns;
+	struct swtim *sw = swtim_pmd_priv(adapter);
+	adapter_info->min_resolution_ns = sw->timer_tick_ns;
+	adapter_info->max_tmo_ns = sw->max_tmo_ns;
 }
 
 static int
-sw_event_timer_adapter_stats_get(const struct rte_event_timer_adapter *adapter,
-				 struct rte_event_timer_adapter_stats *stats)
+swtim_stats_get(const struct rte_event_timer_adapter *adapter,
+		struct rte_event_timer_adapter_stats *stats)
 {
-	struct rte_event_timer_adapter_sw_data *sw_data;
-	sw_data = adapter->data->adapter_priv;
-	*stats = sw_data->stats;
+	struct swtim *sw = swtim_pmd_priv(adapter);
+	*stats = sw->stats; /* structure copy */
 	return 0;
 }
 
 static int
-sw_event_timer_adapter_stats_reset(
-		const struct rte_event_timer_adapter *adapter)
+swtim_stats_reset(const struct rte_event_timer_adapter *adapter)
 {
-	struct rte_event_timer_adapter_sw_data *sw_data;
-	sw_data = adapter->data->adapter_priv;
-	memset(&sw_data->stats, 0, sizeof(sw_data->stats));
+	struct swtim *sw = swtim_pmd_priv(adapter);
+	memset(&sw->stats, 0, sizeof(sw->stats));
 	return 0;
 }
 
-static __rte_always_inline uint16_t
-__sw_event_timer_arm_burst(const struct rte_event_timer_adapter *adapter,
-			  struct rte_event_timer **evtims,
-			  uint16_t nb_evtims)
+static uint16_t
+__swtim_arm_burst(const struct rte_event_timer_adapter *adapter,
+		struct rte_event_timer **evtims,
+		uint16_t nb_evtims)
 {
-	uint16_t i;
-	int ret;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-	struct msg *msgs[nb_evtims];
+	int i, ret;
+	struct swtim *sw = swtim_pmd_priv(adapter);
+	uint32_t lcore_id = rte_lcore_id();
+	struct rte_timer *tim, *tims[nb_evtims];
+	uint64_t cycles;
 
 #ifdef RTE_LIBRTE_EVENTDEV_DEBUG
 	/* Check that the service is running. */
@@ -1101,101 +978,104 @@ __sw_event_timer_arm_burst(const struct rte_event_timer_adapter *adapter,
 	}
 #endif
 
-	sw_data = adapter->data->adapter_priv;
+	/* Adjust lcore_id if non-EAL thread. Arbitrarily pick the timer list of
+	 * the highest lcore to insert such timers into
+	 */
+	if (lcore_id == LCORE_ID_ANY)
+		lcore_id = RTE_MAX_LCORE - 1;
+
+	/* If this is the first time we're arming an event timer on this lcore,
+	 * mark this lcore as "in use"; this will cause the service
+	 * function to process the timer list that corresponds to this lcore.
+	 */
+	if (unlikely(rte_atomic16_test_and_set(&sw->in_use[lcore_id]))) {
+		rte_spinlock_lock(&sw->poll_lcores_sl);
+		EVTIM_LOG_DBG("Adding lcore id = %u to list of lcores to poll",
+			      lcore_id);
+		sw->poll_lcores[sw->n_poll_lcores++] = lcore_id;
+		rte_spinlock_unlock(&sw->poll_lcores_sl);
+	}
 
-	ret = rte_mempool_get_bulk(sw_data->msg_pool, (void **)msgs, nb_evtims);
+	ret = rte_mempool_get_bulk(sw->tim_pool, (void **)tims,
+				   nb_evtims);
 	if (ret < 0) {
 		rte_errno = ENOSPC;
 		return 0;
 	}
 
-	/* Let the service know we're producing messages for it to process */
-	rte_atomic16_inc(&sw_data->message_producer_count);
-
-	/* If the service is managing timers, wait for it to finish */
-	while (sw_data->service_phase == 2)
-		rte_pause();
-
-	rte_smp_rmb();
-
 	for (i = 0; i < nb_evtims; i++) {
 		/* Don't modify the event timer state in these cases */
 		if (evtims[i]->state == RTE_EVENT_TIMER_ARMED) {
 			rte_errno = EALREADY;
 			break;
 		} else if (!(evtims[i]->state == RTE_EVENT_TIMER_NOT_ARMED ||
-		    evtims[i]->state == RTE_EVENT_TIMER_CANCELED)) {
+			     evtims[i]->state == RTE_EVENT_TIMER_CANCELED)) {
 			rte_errno = EINVAL;
 			break;
 		}
 
 		ret = check_timeout(evtims[i], adapter);
-		if (ret == -1) {
+		if (unlikely(ret == -1)) {
 			evtims[i]->state = RTE_EVENT_TIMER_ERROR_TOOLATE;
 			rte_errno = EINVAL;
 			break;
-		}
-		if (ret == -2) {
+		} else if (unlikely(ret == -2)) {
 			evtims[i]->state = RTE_EVENT_TIMER_ERROR_TOOEARLY;
 			rte_errno = EINVAL;
 			break;
 		}
 
-		if (check_destination_event_queue(evtims[i], adapter) < 0) {
+		if (unlikely(check_destination_event_queue(evtims[i],
+							   adapter) < 0)) {
 			evtims[i]->state = RTE_EVENT_TIMER_ERROR;
 			rte_errno = EINVAL;
 			break;
 		}
 
-		/* Checks passed, set up a message to enqueue */
-		msgs[i]->type = MSG_TYPE_ARM;
-		msgs[i]->evtim = evtims[i];
+		tim = tims[i];
+		rte_timer_init(tim);
 
-		/* Set the payload pointer if not set. */
-		if (evtims[i]->ev.event_ptr == NULL)
-			evtims[i]->ev.event_ptr = evtims[i];
+		evtims[i]->impl_opaque[0] = (uintptr_t)tim;
+		evtims[i]->impl_opaque[1] = (uintptr_t)adapter;
 
-		/* msg objects that get enqueued successfully will be freed
-		 * either by a future cancel operation or by the timer
-		 * expiration callback.
-		 */
-		if (rte_ring_enqueue(sw_data->msg_ring, msgs[i]) < 0) {
-			rte_errno = ENOSPC;
+		cycles = get_timeout_cycles(evtims[i], adapter);
+		ret = rte_timer_alt_reset(sw->timer_data_id, tim, cycles,
+					  SINGLE, lcore_id, NULL, evtims[i]);
+		if (ret < 0) {
+			/* tim was in RUNNING or CONFIG state */
+			evtims[i]->state = RTE_EVENT_TIMER_ERROR;
 			break;
 		}
 
-		EVTIM_LOG_DBG("enqueued ARM message to ring");
-
+		rte_smp_wmb();
+		EVTIM_LOG_DBG("armed an event timer");
 		evtims[i]->state = RTE_EVENT_TIMER_ARMED;
 	}
 
-	/* Let the service know we're done producing messages */
-	rte_atomic16_dec(&sw_data->message_producer_count);
-
 	if (i < nb_evtims)
-		rte_mempool_put_bulk(sw_data->msg_pool, (void **)&msgs[i],
-				     nb_evtims - i);
+		rte_mempool_put_bulk(sw->tim_pool,
+				     (void **)&tims[i], nb_evtims - i);
 
 	return i;
 }
 
 static uint16_t
-sw_event_timer_arm_burst(const struct rte_event_timer_adapter *adapter,
-			 struct rte_event_timer **evtims,
-			 uint16_t nb_evtims)
+swtim_arm_burst(const struct rte_event_timer_adapter *adapter,
+		struct rte_event_timer **evtims,
+		uint16_t nb_evtims)
 {
-	return __sw_event_timer_arm_burst(adapter, evtims, nb_evtims);
+	return __swtim_arm_burst(adapter, evtims, nb_evtims);
 }
 
 static uint16_t
-sw_event_timer_cancel_burst(const struct rte_event_timer_adapter *adapter,
-			    struct rte_event_timer **evtims,
-			    uint16_t nb_evtims)
+swtim_cancel_burst(const struct rte_event_timer_adapter *adapter,
+		   struct rte_event_timer **evtims,
+		   uint16_t nb_evtims)
 {
-	uint16_t i;
-	int ret;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-	struct msg *msgs[nb_evtims];
+	int i, ret;
+	struct rte_timer *timp;
+	uint64_t opaque;
+	struct swtim *sw = swtim_pmd_priv(adapter);
 
 #ifdef RTE_LIBRTE_EVENTDEV_DEBUG
 	/* Check that the service is running. */
@@ -1205,23 +1085,6 @@ sw_event_timer_cancel_burst(const struct rte_event_timer_adapter *adapter,
 	}
 #endif
 
-	sw_data = adapter->data->adapter_priv;
-
-	ret = rte_mempool_get_bulk(sw_data->msg_pool, (void **)msgs, nb_evtims);
-	if (ret < 0) {
-		rte_errno = ENOSPC;
-		return 0;
-	}
-
-	/* Let the service know we're producing messages for it to process */
-	rte_atomic16_inc(&sw_data->message_producer_count);
-
-	/* If the service could be modifying event timer states, wait */
-	while (sw_data->service_phase == 2)
-		rte_pause();
-
-	rte_smp_rmb();
-
 	for (i = 0; i < nb_evtims; i++) {
 		/* Don't modify the event timer state in these cases */
 		if (evtims[i]->state == RTE_EVENT_TIMER_CANCELED) {
@@ -1232,54 +1095,54 @@ sw_event_timer_cancel_burst(const struct rte_event_timer_adapter *adapter,
 			break;
 		}
 
-		msgs[i]->type = MSG_TYPE_CANCEL;
-		msgs[i]->evtim = evtims[i];
+		opaque = evtims[i]->impl_opaque[0];
+		timp = (struct rte_timer *)(uintptr_t)opaque;
+		RTE_ASSERT(timp != NULL);
 
-		if (rte_ring_enqueue(sw_data->msg_ring, msgs[i]) < 0) {
-			rte_errno = ENOSPC;
+		ret = rte_timer_alt_stop(sw->timer_data_id, timp);
+		if (ret < 0) {
+			/* Timer is running or being configured */
+			rte_errno = EAGAIN;
 			break;
 		}
 
-		EVTIM_LOG_DBG("enqueued CANCEL message to ring");
+		rte_mempool_put(sw->tim_pool, (void **)timp);
 
 		evtims[i]->state = RTE_EVENT_TIMER_CANCELED;
-	}
+		evtims[i]->impl_opaque[0] = 0;
+		evtims[i]->impl_opaque[1] = 0;
 
-	/* Let the service know we're done producing messages */
-	rte_atomic16_dec(&sw_data->message_producer_count);
-
-	if (i < nb_evtims)
-		rte_mempool_put_bulk(sw_data->msg_pool, (void **)&msgs[i],
-				     nb_evtims - i);
+		rte_smp_wmb();
+	}
 
 	return i;
 }
 
 static uint16_t
-sw_event_timer_arm_tmo_tick_burst(const struct rte_event_timer_adapter *adapter,
-				  struct rte_event_timer **evtims,
-				  uint64_t timeout_ticks,
-				  uint16_t nb_evtims)
+swtim_arm_tmo_tick_burst(const struct rte_event_timer_adapter *adapter,
+			 struct rte_event_timer **evtims,
+			 uint64_t timeout_ticks,
+			 uint16_t nb_evtims)
 {
 	int i;
 
 	for (i = 0; i < nb_evtims; i++)
 		evtims[i]->timeout_ticks = timeout_ticks;
 
-	return __sw_event_timer_arm_burst(adapter, evtims, nb_evtims);
+	return __swtim_arm_burst(adapter, evtims, nb_evtims);
 }
 
-static const struct rte_event_timer_adapter_ops sw_event_adapter_timer_ops = {
-	.init = sw_event_timer_adapter_init,
-	.uninit = sw_event_timer_adapter_uninit,
-	.start = sw_event_timer_adapter_start,
-	.stop = sw_event_timer_adapter_stop,
-	.get_info = sw_event_timer_adapter_get_info,
-	.stats_get = sw_event_timer_adapter_stats_get,
-	.stats_reset = sw_event_timer_adapter_stats_reset,
-	.arm_burst = sw_event_timer_arm_burst,
-	.arm_tmo_tick_burst = sw_event_timer_arm_tmo_tick_burst,
-	.cancel_burst = sw_event_timer_cancel_burst,
+static const struct rte_event_timer_adapter_ops swtim_ops = {
+	.init = swtim_init,
+	.uninit = swtim_uninit,
+	.start = swtim_start,
+	.stop = swtim_stop,
+	.get_info = swtim_get_info,
+	.stats_get = swtim_stats_get,
+	.stats_reset = swtim_stats_reset,
+	.arm_burst = swtim_arm_burst,
+	.arm_tmo_tick_burst = swtim_arm_tmo_tick_burst,
+	.cancel_burst = swtim_cancel_burst,
 };
 
 RTE_INIT(event_timer_adapter_init_log)