From patchwork Thu Nov 29 23:35:12 2018
X-Patchwork-Submitter: "Carrillo, Erik G"
X-Patchwork-Id: 48426
X-Patchwork-Delegate: jerinj@marvell.com
From: Erik Gabriel Carrillo
To: pbhagavatula@caviumnetworks.com, jerin.jacob@caviumnetworks.com, rsanford@akamai.com
Cc: dev@dpdk.org
Date: Thu, 29 Nov 2018 17:35:12 -0600
Message-Id: <1543534514-183766-2-git-send-email-erik.g.carrillo@intel.com>
In-Reply-To: <1543534514-183766-1-git-send-email-erik.g.carrillo@intel.com>
References: <1543534514-183766-1-git-send-email-erik.g.carrillo@intel.com>
Subject: [dpdk-dev] [PATCH 1/3] timer: allow timer management in shared memory
List-Id: DPDK patches and discussions

Currently, the timer library uses a per-process table of structures to manage skiplists of timers, presumably because timers contain arbitrary function pointers whose value may not resolve properly in other processes.
However, if the same callback is used to handle all timers, and that callback is only invoked in one process, then it would be safe to allow the data structures to be allocated in shared memory, and to allow secondary processes to modify the timer lists. This would let timers be used in more multi-process scenarios.

The library's global variables are wrapped with a struct, and an array of these structures is created in shared memory. The original APIs are updated to reference the zeroth entry in the array. This maintains the original behavior for both primary and secondary processes since the set intersection of their coremasks should be empty [1].

New APIs are introduced to enable the allocation/deallocation of other entries in the array. New variants of the APIs used to start and stop timers are introduced; they allow a caller to specify which array entry should be used to locate the timer list to insert into or delete from. Finally, a new variant of rte_timer_manage() is introduced, which allows a caller to specify which array entry should be used to locate the timer lists to process; it can also process multiple timer lists per invocation.
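The allocation scheme described above can be sketched as a small stand-alone model. This is only an illustration under stated assumptions, not the code in this patch: the names (`data_arr`, `subsystem_init`, `data_alloc`, `data_dealloc`) are hypothetical stand-ins, the table is an ordinary static array rather than a shared memzone, the table size is reduced from 64 to 4, and the per-lcore timer lists are left out. It shows only how a flag bit marks an entry as in use, how entry zero is reserved for the original APIs, and how an identifier is validated before use.

```c
/* Stand-alone model of the instance table; NOT the DPDK implementation.
 * All identifiers below are hypothetical and exist only for illustration.
 */
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DATA_ELS 4          /* the patch uses 64; kept small here */
#define FL_ALLOCATED (1u << 0)  /* bit 0 marks an entry as in use */

struct timer_data {
	uint8_t internal_flags; /* per-lcore timer lists omitted */
};

/* In the patch this array lives in memory shared by all processes;
 * here it is a plain static array.
 */
static struct timer_data data_arr[MAX_DATA_ELS];

/* Reserve entry 0, which the original (id-less) APIs refer to. */
static void subsystem_init(void)
{
	data_arr[0].internal_flags |= FL_ALLOCATED;
}

/* Claim the first free entry and report its index through id_ptr. */
static int data_alloc(uint32_t *id_ptr)
{
	uint32_t i;

	for (i = 0; i < MAX_DATA_ELS; i++) {
		if (!(data_arr[i].internal_flags & FL_ALLOCATED)) {
			data_arr[i].internal_flags |= FL_ALLOCATED;
			if (id_ptr != NULL)
				*id_ptr = i;
			return 0;
		}
	}

	return -ENOSPC; /* every entry is already in use */
}

/* Check the range and the in-use flag before releasing an entry. */
static int data_dealloc(uint32_t id)
{
	if (id >= MAX_DATA_ELS ||
	    !(data_arr[id].internal_flags & FL_ALLOCATED))
		return -EINVAL;

	data_arr[id].internal_flags &= (uint8_t)~FL_ALLOCATED;
	return 0;
}
```

In this model a caller would run `subsystem_init()` once, obtain an identifier with `data_alloc(&id)`, pass that identifier to the start, stop, and manage variants, and release it with `data_dealloc(id)` when done. Note that the scan-and-set in `data_alloc()` is not atomic, so concurrent callers would need external synchronization in this sketch.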
[1] https://doc.dpdk.org/guides/prog_guide/multi_proc_support.html#multi-process-limitations Signed-off-by: Erik Gabriel Carrillo --- lib/librte_timer/Makefile | 1 + lib/librte_timer/rte_timer.c | 526 +++++++++++++++++++++++++++------ lib/librte_timer/rte_timer.h | 168 ++++++++++- lib/librte_timer/rte_timer_version.map | 21 +- 4 files changed, 614 insertions(+), 102 deletions(-) diff --git a/lib/librte_timer/Makefile b/lib/librte_timer/Makefile index 4ebd528..8ec63f4 100644 --- a/lib/librte_timer/Makefile +++ b/lib/librte_timer/Makefile @@ -6,6 +6,7 @@ include $(RTE_SDK)/mk/rte.vars.mk # library name LIB = librte_timer.a +CFLAGS += -DALLOW_EXPERIMENTAL_API CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3 LDLIBS += -lrte_eal diff --git a/lib/librte_timer/rte_timer.c b/lib/librte_timer/rte_timer.c index 30c7b0a..a76be8b 100644 --- a/lib/librte_timer/rte_timer.c +++ b/lib/librte_timer/rte_timer.c @@ -5,6 +5,7 @@ #include #include #include +#include #include #include #include @@ -21,23 +22,27 @@ #include #include #include +#include +#include #include "rte_timer.h" -LIST_HEAD(rte_timer_list, rte_timer); - +/** + * Per-lcore info for timers. 
+ */ struct priv_timer { - struct rte_timer pending_head; /**< dummy timer instance to head up list */ + struct rte_timer pending_head; /**< dummy timer to head up list */ rte_spinlock_t list_lock; /**< lock to protect list access */ /** per-core variable that true if a timer was updated on this - * core since last reset of the variable */ + * core since last reset of the variable + */ int updated; /** track the current depth of the skiplist */ - unsigned curr_skiplist_depth; + unsigned int curr_skiplist_depth; - unsigned prev_lcore; /**< used for lcore round robin */ + unsigned int prev_lcore; /**< used for lcore round robin */ /** running timer on this lcore now */ struct rte_timer *running_tim; @@ -48,33 +53,140 @@ struct priv_timer { #endif } __rte_cache_aligned; -/** per-lcore private info for timers */ -static struct priv_timer priv_timer[RTE_MAX_LCORE]; +#define FL_ALLOCATED (1 << 0) +struct rte_timer_data { + struct priv_timer priv_timer[RTE_MAX_LCORE]; + uint8_t internal_flags; +}; + +#define RTE_MAX_DATA_ELS 64 +static struct rte_timer_data *rte_timer_data_arr; +static uint32_t default_data_id; // id set to zero automatically +static uint32_t rte_timer_subsystem_initialized; /* when debug is enabled, store some statistics */ #ifdef RTE_LIBRTE_TIMER_DEBUG -#define __TIMER_STAT_ADD(name, n) do { \ +#define __TIMER_STAT_ADD(data, name, n) do { \ unsigned __lcore_id = rte_lcore_id(); \ if (__lcore_id < RTE_MAX_LCORE) \ - priv_timer[__lcore_id].stats.name += (n); \ + data->priv_timer[__lcore_id].stats.name += (n); \ } while(0) #else -#define __TIMER_STAT_ADD(name, n) do {} while(0) +#define __TIMER_STAT_ADD(data, name, n) do {} while (0) #endif -/* Init the timer library. 
*/ -void +static inline int +timer_data_valid(uint32_t id) +{ + return !!(rte_timer_data_arr[id].internal_flags & FL_ALLOCATED); +} + +/* validate ID and retrieve timer data pointer, or return error value */ +#define TIMER_DATA_VALID_GET_OR_ERR_RET(id, timer_data, retval) do { \ + if (id >= RTE_MAX_DATA_ELS || !timer_data_valid(id)) \ + return retval; \ + timer_data = &rte_timer_data_arr[id]; \ +} while (0) + +int __rte_experimental +rte_timer_data_alloc(uint32_t *id_ptr) +{ + int i; + struct rte_timer_data *data; + + if (!rte_timer_subsystem_initialized) + return -ENOMEM; + + for (i = 0; i < RTE_MAX_DATA_ELS; i++) { + data = &rte_timer_data_arr[i]; + if (!(data->internal_flags & FL_ALLOCATED)) { + data->internal_flags |= FL_ALLOCATED; + + if (id_ptr) + *id_ptr = i; + + return 0; + } + } + + return -ENOSPC; +} + +int __rte_experimental +rte_timer_data_dealloc(uint32_t id) +{ + struct rte_timer_data *timer_data; + TIMER_DATA_VALID_GET_OR_ERR_RET(id, timer_data, -EINVAL); + + timer_data->internal_flags &= ~(FL_ALLOCATED); + + return 0; +} + +/* Init the timer library. Allocate an array of timer data structs in shared + * memory, and allocate the zeroth entry for use with original timer + * APIs. Since the intersection of the sets of lcore ids in primary and + * secondary processes should be empty, the zeroth entry can be shared by + * multiple processes. + */ +int rte_timer_subsystem_init(void) { - unsigned lcore_id; + const struct rte_memzone *mz; + struct rte_timer_data *data; + int i, lcore_id; + static const char *mz_name = "rte_timer_mz"; - /* since priv_timer is static, it's zeroed by default, so only init some - * fields. 
- */ - for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id ++) { - rte_spinlock_init(&priv_timer[lcore_id].list_lock); - priv_timer[lcore_id].prev_lcore = lcore_id; + if (rte_timer_subsystem_initialized) + return -EALREADY; + + if (rte_eal_process_type() != RTE_PROC_PRIMARY) { + mz = rte_memzone_lookup(mz_name); + if (mz == NULL) + return -EEXIST; + + rte_timer_data_arr = mz->addr; + + rte_timer_data_arr[default_data_id].internal_flags |= + FL_ALLOCATED; + + rte_timer_subsystem_initialized = 1; + + return 0; + } + + mz = rte_memzone_reserve_aligned(mz_name, + RTE_MAX_DATA_ELS * sizeof(*rte_timer_data_arr), + SOCKET_ID_ANY, 0, RTE_CACHE_LINE_SIZE); + if (mz == NULL) + return -ENOMEM; + + rte_timer_data_arr = mz->addr; + + for (i = 0; i < RTE_MAX_DATA_ELS; i++) { + data = &rte_timer_data_arr[i]; + + for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) { + rte_spinlock_init( + &data->priv_timer[lcore_id].list_lock); + data->priv_timer[lcore_id].prev_lcore = lcore_id; + } } + + rte_timer_data_arr[default_data_id].internal_flags |= FL_ALLOCATED; + + rte_timer_subsystem_initialized = 1; + + return 0; +} + +void __rte_experimental +rte_timer_subsystem_finalize(void) +{ + if (rte_timer_data_arr) + rte_free(rte_timer_data_arr); + + rte_timer_subsystem_initialized = 0; } /* Initialize the timer handle tim for use */ @@ -95,7 +207,8 @@ rte_timer_init(struct rte_timer *tim) */ static int timer_set_config_state(struct rte_timer *tim, - union rte_timer_status *ret_prev_status) + union rte_timer_status *ret_prev_status, + struct rte_timer_data *data) { union rte_timer_status prev_status, status; int success = 0; @@ -113,7 +226,7 @@ timer_set_config_state(struct rte_timer *tim, */ if (prev_status.state == RTE_TIMER_RUNNING && (prev_status.owner != (uint16_t)lcore_id || - tim != priv_timer[lcore_id].running_tim)) + tim != data->priv_timer[lcore_id].running_tim)) return -1; /* timer is being configured on another core */ @@ -207,13 +320,13 @@ timer_get_skiplist_level(unsigned 
curr_depth) */ static void timer_get_prev_entries(uint64_t time_val, unsigned tim_lcore, - struct rte_timer **prev) + struct rte_timer **prev, struct rte_timer_data *data) { - unsigned lvl = priv_timer[tim_lcore].curr_skiplist_depth; - prev[lvl] = &priv_timer[tim_lcore].pending_head; - while(lvl != 0) { + unsigned int lvl = data->priv_timer[tim_lcore].curr_skiplist_depth; + prev[lvl] = &data->priv_timer[tim_lcore].pending_head; + while (lvl != 0) { lvl--; - prev[lvl] = prev[lvl+1]; + prev[lvl] = prev[lvl + 1]; while (prev[lvl]->sl_next[lvl] && prev[lvl]->sl_next[lvl]->expire <= time_val) prev[lvl] = prev[lvl]->sl_next[lvl]; @@ -226,14 +339,16 @@ timer_get_prev_entries(uint64_t time_val, unsigned tim_lcore, */ static void timer_get_prev_entries_for_node(struct rte_timer *tim, unsigned tim_lcore, - struct rte_timer **prev) + struct rte_timer **prev, + struct rte_timer_data *data) { int i; /* to get a specific entry in the list, look for just lower than the time * values, and then increment on each level individually if necessary */ - timer_get_prev_entries(tim->expire - 1, tim_lcore, prev); - for (i = priv_timer[tim_lcore].curr_skiplist_depth - 1; i >= 0; i--) { + timer_get_prev_entries(tim->expire - 1, tim_lcore, prev, data); + for (i = data->priv_timer[tim_lcore].curr_skiplist_depth - 1; i >= 0; + i--) { while (prev[i]->sl_next[i] != NULL && prev[i]->sl_next[i] != tim && prev[i]->sl_next[i]->expire <= tim->expire) @@ -247,20 +362,21 @@ timer_get_prev_entries_for_node(struct rte_timer *tim, unsigned tim_lcore, * timer must not be in a list */ static void -timer_add(struct rte_timer *tim, unsigned int tim_lcore) +timer_add(struct rte_timer *tim, unsigned int tim_lcore, + struct rte_timer_data *data) { unsigned lvl; struct rte_timer *prev[MAX_SKIPLIST_DEPTH+1]; /* find where exactly this element goes in the list of elements * for each depth. 
*/ - timer_get_prev_entries(tim->expire, tim_lcore, prev); + timer_get_prev_entries(tim->expire, tim_lcore, prev, data); /* now assign it a new level and add at that level */ const unsigned tim_level = timer_get_skiplist_level( - priv_timer[tim_lcore].curr_skiplist_depth); - if (tim_level == priv_timer[tim_lcore].curr_skiplist_depth) - priv_timer[tim_lcore].curr_skiplist_depth++; + data->priv_timer[tim_lcore].curr_skiplist_depth); + if (tim_level == data->priv_timer[tim_lcore].curr_skiplist_depth) + data->priv_timer[tim_lcore].curr_skiplist_depth++; lvl = tim_level; while (lvl > 0) { @@ -272,9 +388,10 @@ timer_add(struct rte_timer *tim, unsigned int tim_lcore) prev[0]->sl_next[0] = tim; /* save the lowest list entry into the expire field of the dummy hdr - * NOTE: this is not atomic on 32-bit*/ - priv_timer[tim_lcore].pending_head.expire = priv_timer[tim_lcore].\ - pending_head.sl_next[0]->expire; + * NOTE: this is not atomic on 32-bit + */ + data->priv_timer[tim_lcore].pending_head.expire = + data->priv_timer[tim_lcore].pending_head.sl_next[0]->expire; } /* @@ -284,7 +401,7 @@ timer_add(struct rte_timer *tim, unsigned int tim_lcore) */ static void timer_del(struct rte_timer *tim, union rte_timer_status prev_status, - int local_is_locked) + int local_is_locked, struct rte_timer_data *data) { unsigned lcore_id = rte_lcore_id(); unsigned prev_owner = prev_status.owner; @@ -295,30 +412,33 @@ timer_del(struct rte_timer *tim, union rte_timer_status prev_status, * list; if it is on local core, we need to lock if we are not * called from rte_timer_manage() */ if (prev_owner != lcore_id || !local_is_locked) - rte_spinlock_lock(&priv_timer[prev_owner].list_lock); + rte_spinlock_lock(&data->priv_timer[prev_owner].list_lock); /* save the lowest list entry into the expire field of the dummy hdr. 
* NOTE: this is not atomic on 32-bit */ - if (tim == priv_timer[prev_owner].pending_head.sl_next[0]) - priv_timer[prev_owner].pending_head.expire = + if (tim == data->priv_timer[prev_owner].pending_head.sl_next[0]) + data->priv_timer[prev_owner].pending_head.expire = ((tim->sl_next[0] == NULL) ? 0 : tim->sl_next[0]->expire); /* adjust pointers from previous entries to point past this */ - timer_get_prev_entries_for_node(tim, prev_owner, prev); - for (i = priv_timer[prev_owner].curr_skiplist_depth - 1; i >= 0; i--) { + timer_get_prev_entries_for_node(tim, prev_owner, prev, data); + i = data->priv_timer[prev_owner].curr_skiplist_depth - 1; + for ( ; i >= 0; i--) { if (prev[i]->sl_next[i] == tim) prev[i]->sl_next[i] = tim->sl_next[i]; } /* in case we deleted last entry at a level, adjust down max level */ - for (i = priv_timer[prev_owner].curr_skiplist_depth - 1; i >= 0; i--) - if (priv_timer[prev_owner].pending_head.sl_next[i] == NULL) - priv_timer[prev_owner].curr_skiplist_depth --; + for (i = data->priv_timer[prev_owner].curr_skiplist_depth - 1; i >= 0; + i--) + if (data->priv_timer[prev_owner].pending_head.sl_next[i] == + NULL) + data->priv_timer[prev_owner].curr_skiplist_depth--; else break; if (prev_owner != lcore_id || !local_is_locked) - rte_spinlock_unlock(&priv_timer[prev_owner].list_lock); + rte_spinlock_unlock(&data->priv_timer[prev_owner].list_lock); } /* Reset and start the timer associated with the timer handle (private func) */ @@ -326,7 +446,8 @@ static int __rte_timer_reset(struct rte_timer *tim, uint64_t expire, uint64_t period, unsigned tim_lcore, rte_timer_cb_t fct, void *arg, - int local_is_locked) + int local_is_locked, + struct rte_timer_data *data) { union rte_timer_status prev_status, status; int ret; @@ -337,9 +458,9 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t expire, if (lcore_id < RTE_MAX_LCORE) { /* EAL thread with valid lcore_id */ tim_lcore = rte_get_next_lcore( - priv_timer[lcore_id].prev_lcore, + 
data->priv_timer[lcore_id].prev_lcore, 0, 1); - priv_timer[lcore_id].prev_lcore = tim_lcore; + data->priv_timer[lcore_id].prev_lcore = tim_lcore; } else /* non-EAL thread do not run rte_timer_manage(), * so schedule the timer on the first enabled lcore. */ @@ -348,20 +469,20 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t expire, /* wait that the timer is in correct status before update, * and mark it as being configured */ - ret = timer_set_config_state(tim, &prev_status); + ret = timer_set_config_state(tim, &prev_status, data); if (ret < 0) return -1; - __TIMER_STAT_ADD(reset, 1); + __TIMER_STAT_ADD(data, reset, 1); if (prev_status.state == RTE_TIMER_RUNNING && lcore_id < RTE_MAX_LCORE) { - priv_timer[lcore_id].updated = 1; + data->priv_timer[lcore_id].updated = 1; } /* remove it from list */ if (prev_status.state == RTE_TIMER_PENDING) { - timer_del(tim, prev_status, local_is_locked); - __TIMER_STAT_ADD(pending, -1); + timer_del(tim, prev_status, local_is_locked, data); + __TIMER_STAT_ADD(data, pending, -1); } tim->period = period; @@ -374,10 +495,10 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t expire, * we are not called from rte_timer_manage() */ if (tim_lcore != lcore_id || !local_is_locked) - rte_spinlock_lock(&priv_timer[tim_lcore].list_lock); + rte_spinlock_lock(&data->priv_timer[tim_lcore].list_lock); - __TIMER_STAT_ADD(pending, 1); - timer_add(tim, tim_lcore); + __TIMER_STAT_ADD(data, pending, 1); + timer_add(tim, tim_lcore, data); /* update state: as we are in CONFIG state, only us can modify * the state so we don't need to use cmpset() here */ @@ -387,7 +508,7 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t expire, tim->status.u32 = status.u32; if (tim_lcore != lcore_id || !local_is_locked) - rte_spinlock_unlock(&priv_timer[tim_lcore].list_lock); + rte_spinlock_unlock(&data->priv_timer[tim_lcore].list_lock); return 0; } @@ -395,11 +516,23 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t expire, /* Reset and start the timer 
associated with the timer handle tim */ int rte_timer_reset(struct rte_timer *tim, uint64_t ticks, - enum rte_timer_type type, unsigned tim_lcore, - rte_timer_cb_t fct, void *arg) + enum rte_timer_type type, unsigned int tim_lcore, + rte_timer_cb_t fct, void *arg) +{ + return rte_timer_alt_reset(default_data_id, tim, ticks, type, + tim_lcore, fct, arg); +} + +int __rte_experimental +rte_timer_alt_reset(uint32_t timer_data_id, struct rte_timer *tim, + uint64_t ticks, enum rte_timer_type type, + unsigned int tim_lcore, rte_timer_cb_t fct, void *arg) { uint64_t cur_time = rte_get_timer_cycles(); uint64_t period; + struct rte_timer_data *timer_data; + + TIMER_DATA_VALID_GET_OR_ERR_RET(timer_data_id, timer_data, -EINVAL); if (unlikely((tim_lcore != (unsigned)LCORE_ID_ANY) && !(rte_lcore_is_enabled(tim_lcore) || @@ -412,7 +545,7 @@ rte_timer_reset(struct rte_timer *tim, uint64_t ticks, period = 0; return __rte_timer_reset(tim, cur_time + ticks, period, tim_lcore, - fct, arg, 0); + fct, arg, 0, timer_data); } /* loop until rte_timer_reset() succeed */ @@ -430,26 +563,35 @@ rte_timer_reset_sync(struct rte_timer *tim, uint64_t ticks, int rte_timer_stop(struct rte_timer *tim) { + return rte_timer_alt_stop(default_data_id, tim); +} + +int __rte_experimental +rte_timer_alt_stop(uint32_t timer_data_id, struct rte_timer *tim) +{ union rte_timer_status prev_status, status; unsigned lcore_id = rte_lcore_id(); int ret; + struct rte_timer_data *timer_data; + + TIMER_DATA_VALID_GET_OR_ERR_RET(timer_data_id, timer_data, -EINVAL); /* wait that the timer is in correct status before update, * and mark it as being configured */ - ret = timer_set_config_state(tim, &prev_status); + ret = timer_set_config_state(tim, &prev_status, timer_data); if (ret < 0) return -1; - __TIMER_STAT_ADD(stop, 1); + __TIMER_STAT_ADD(timer_data, stop, 1); if (prev_status.state == RTE_TIMER_RUNNING && lcore_id < RTE_MAX_LCORE) { - priv_timer[lcore_id].updated = 1; + timer_data->priv_timer[lcore_id].updated = 1; } 
/* remove it from list */ if (prev_status.state == RTE_TIMER_PENDING) { - timer_del(tim, prev_status, 0); - __TIMER_STAT_ADD(pending, -1); + timer_del(tim, prev_status, 0, timer_data); + __TIMER_STAT_ADD(timer_data, pending, -1); } /* mark timer as stopped */ @@ -486,13 +628,14 @@ void rte_timer_manage(void) struct rte_timer *prev[MAX_SKIPLIST_DEPTH + 1]; uint64_t cur_time; int i, ret; + struct rte_timer_data *data = &rte_timer_data_arr[default_data_id]; /* timer manager only runs on EAL thread with valid lcore_id */ assert(lcore_id < RTE_MAX_LCORE); - __TIMER_STAT_ADD(manage, 1); + __TIMER_STAT_ADD(data, manage, 1); /* optimize for the case where per-cpu list is empty */ - if (priv_timer[lcore_id].pending_head.sl_next[0] == NULL) + if (data->priv_timer[lcore_id].pending_head.sl_next[0] == NULL) return; cur_time = rte_get_timer_cycles(); @@ -500,32 +643,34 @@ void rte_timer_manage(void) /* on 64-bit the value cached in the pending_head.expired will be * updated atomically, so we can consult that for a quick check here * outside the lock */ - if (likely(priv_timer[lcore_id].pending_head.expire > cur_time)) + if (likely(data->priv_timer[lcore_id].pending_head.expire > cur_time)) return; #endif /* browse ordered list, add expired timers in 'expired' list */ - rte_spinlock_lock(&priv_timer[lcore_id].list_lock); + rte_spinlock_lock(&data->priv_timer[lcore_id].list_lock); /* if nothing to do just unlock and return */ - if (priv_timer[lcore_id].pending_head.sl_next[0] == NULL || - priv_timer[lcore_id].pending_head.sl_next[0]->expire > cur_time) { - rte_spinlock_unlock(&priv_timer[lcore_id].list_lock); + if (data->priv_timer[lcore_id].pending_head.sl_next[0] == NULL || + data->priv_timer[lcore_id].pending_head.sl_next[0]->expire > + cur_time) { + rte_spinlock_unlock(&data->priv_timer[lcore_id].list_lock); return; } /* save start of list of expired timers */ - tim = priv_timer[lcore_id].pending_head.sl_next[0]; + tim = data->priv_timer[lcore_id].pending_head.sl_next[0]; /* 
break the existing list at current time point */ - timer_get_prev_entries(cur_time, lcore_id, prev); - for (i = priv_timer[lcore_id].curr_skiplist_depth -1; i >= 0; i--) { - if (prev[i] == &priv_timer[lcore_id].pending_head) + timer_get_prev_entries(cur_time, lcore_id, prev, data); + for (i = data->priv_timer[lcore_id].curr_skiplist_depth - 1; i >= 0; + i--) { + if (prev[i] == &data->priv_timer[lcore_id].pending_head) continue; - priv_timer[lcore_id].pending_head.sl_next[i] = + data->priv_timer[lcore_id].pending_head.sl_next[i] = prev[i]->sl_next[i]; if (prev[i]->sl_next[i] == NULL) - priv_timer[lcore_id].curr_skiplist_depth--; + data->priv_timer[lcore_id].curr_skiplist_depth--; prev[i] ->sl_next[i] = NULL; } @@ -548,25 +693,25 @@ void rte_timer_manage(void) } /* update the next to expire timer value */ - priv_timer[lcore_id].pending_head.expire = - (priv_timer[lcore_id].pending_head.sl_next[0] == NULL) ? 0 : - priv_timer[lcore_id].pending_head.sl_next[0]->expire; + data->priv_timer[lcore_id].pending_head.expire = + (data->priv_timer[lcore_id].pending_head.sl_next[0] == NULL) ? 
0 : + data->priv_timer[lcore_id].pending_head.sl_next[0]->expire; - rte_spinlock_unlock(&priv_timer[lcore_id].list_lock); + rte_spinlock_unlock(&data->priv_timer[lcore_id].list_lock); /* now scan expired list and call callbacks */ for (tim = run_first_tim; tim != NULL; tim = next_tim) { next_tim = tim->sl_next[0]; - priv_timer[lcore_id].updated = 0; - priv_timer[lcore_id].running_tim = tim; + data->priv_timer[lcore_id].updated = 0; + data->priv_timer[lcore_id].running_tim = tim; /* execute callback function with list unlocked */ tim->f(tim, tim->arg); - __TIMER_STAT_ADD(pending, -1); + __TIMER_STAT_ADD(data, pending, -1); /* the timer was stopped or reloaded by the callback * function, we have nothing to do here */ - if (priv_timer[lcore_id].updated == 1) + if (data->priv_timer[lcore_id].updated == 1) continue; if (tim->period == 0) { @@ -578,33 +723,217 @@ void rte_timer_manage(void) } else { /* keep it in list and mark timer as pending */ - rte_spinlock_lock(&priv_timer[lcore_id].list_lock); + rte_spinlock_lock( + &data->priv_timer[lcore_id].list_lock); status.state = RTE_TIMER_PENDING; - __TIMER_STAT_ADD(pending, 1); + __TIMER_STAT_ADD(data, pending, 1); status.owner = (int16_t)lcore_id; rte_wmb(); tim->status.u32 = status.u32; __rte_timer_reset(tim, tim->expire + tim->period, - tim->period, lcore_id, tim->f, tim->arg, 1); - rte_spinlock_unlock(&priv_timer[lcore_id].list_lock); + tim->period, lcore_id, tim->f, tim->arg, 1, + data); + rte_spinlock_unlock( + &data->priv_timer[lcore_id].list_lock); + } + } + data->priv_timer[lcore_id].running_tim = NULL; +} + +int __rte_experimental +rte_timer_alt_manage(uint32_t timer_data_id, + unsigned int *poll_lcores, + int nb_poll_lcores, + rte_timer_alt_manage_cb_t f) +{ + union rte_timer_status status; + struct rte_timer *tim, *next_tim, **pprev; + struct rte_timer *run_first_tims[RTE_MAX_LCORE]; + unsigned int this_lcore = rte_lcore_id(); + struct rte_timer *prev[MAX_SKIPLIST_DEPTH + 1]; + uint64_t cur_time; + int i, j, 
ret; + int nb_runlists = 0; + struct priv_timer *priv_timer; + uint32_t poll_lcore; + struct rte_timer_data *data; + + TIMER_DATA_VALID_GET_OR_ERR_RET(timer_data_id, data, -EINVAL); + + /* timer manager only runs on EAL thread with valid lcore_id */ + assert(this_lcore < RTE_MAX_LCORE); + + __TIMER_STAT_ADD(data, manage, 1); + + if (poll_lcores == NULL) { + poll_lcores = (unsigned int []){rte_lcore_id()}; + nb_poll_lcores = 1; + } + + for (i = 0, poll_lcore = poll_lcores[i]; i < nb_poll_lcores; + poll_lcore = poll_lcores[++i]) { + priv_timer = &data->priv_timer[poll_lcore]; + + /* optimize for the case where per-cpu list is empty */ + if (priv_timer->pending_head.sl_next[0] == NULL) + continue; + cur_time = rte_get_timer_cycles(); + +#ifdef RTE_ARCH_64 + /* on 64-bit the value cached in the pending_head.expired will + * be updated atomically, so we can consult that for a quick + * check here outside the lock + */ + if (likely(priv_timer->pending_head.expire > cur_time)) + continue; +#endif + + /* browse ordered list, add expired timers in 'expired' list */ + rte_spinlock_lock(&priv_timer->list_lock); + + /* if nothing to do just unlock and return */ + if (priv_timer->pending_head.sl_next[0] == NULL || + priv_timer->pending_head.sl_next[0]->expire > cur_time) { + rte_spinlock_unlock(&priv_timer->list_lock); + continue; + } + + /* save start of list of expired timers */ + tim = priv_timer->pending_head.sl_next[0]; + + /* break the existing list at current time point */ + timer_get_prev_entries(cur_time, poll_lcore, prev, data); + for (j = priv_timer->curr_skiplist_depth - 1; j >= 0; j--) { + if (prev[j] == &priv_timer->pending_head) + continue; + + priv_timer->pending_head.sl_next[j] = + prev[j]->sl_next[j]; + + if (prev[j]->sl_next[j] == NULL) + priv_timer->curr_skiplist_depth--; + + prev[j]->sl_next[j] = NULL; + } + + /* transition run-list from PENDING to RUNNING */ + run_first_tims[nb_runlists] = tim; + pprev = &run_first_tims[nb_runlists]; + nb_runlists++; + + 
for ( ; tim != NULL; tim = next_tim) { + next_tim = tim->sl_next[0]; + + ret = timer_set_running_state(tim); + if (likely(ret == 0)) { + pprev = &tim->sl_next[0]; + } else { + /* another core is trying to re-config this one, + * remove it from local expired list + */ + *pprev = next_tim; + } + } + + /* update the next to expire timer value */ + priv_timer->pending_head.expire = + (priv_timer->pending_head.sl_next[0] == NULL) ? 0 : + priv_timer->pending_head.sl_next[0]->expire; + + rte_spinlock_unlock(&priv_timer->list_lock); + } + + /* Now process the run lists */ + while (1) { + bool done = true; + uint64_t min_expire = UINT64_MAX; + int min_idx = 0; + + /* Find the next oldest timer to process */ + for (i = 0; i < nb_runlists; i++) { + tim = run_first_tims[i]; + + if (tim != NULL && tim->expire < min_expire) { + min_expire = tim->expire; + min_idx = i; + done = false; + } + } + + if (done) + break; + + tim = run_first_tims[min_idx]; + + /* Move down the runlist from which we picked a timer to + * execute + */ + run_first_tims[min_idx] = run_first_tims[min_idx]->sl_next[0]; + + priv_timer->updated = 0; + priv_timer->running_tim = tim; + + /* Call the provided callback function */ + f(tim); + + __TIMER_STAT_ADD(data, pending, -1); + + /* the timer was stopped or reloaded by the callback + * function, we have nothing to do here + */ + if (priv_timer->updated == 1) + continue; + + if (tim->period == 0) { + /* remove from done list and mark timer as stopped */ + status.state = RTE_TIMER_STOP; + status.owner = RTE_TIMER_NO_OWNER; + rte_wmb(); + tim->status.u32 = status.u32; + } else { + /* keep it in list and mark timer as pending */ + rte_spinlock_lock( + &data->priv_timer[this_lcore].list_lock); + status.state = RTE_TIMER_PENDING; + __TIMER_STAT_ADD(data, pending, 1); + status.owner = (int16_t)this_lcore; + rte_wmb(); + tim->status.u32 = status.u32; + __rte_timer_reset(tim, tim->expire + tim->period, + tim->period, this_lcore, tim->f, tim->arg, 1, + data); + 
 rte_spinlock_unlock( + &data->priv_timer[this_lcore].list_lock); } + + priv_timer->running_tim = NULL; } - priv_timer[lcore_id].running_tim = NULL; + + return 0; } /* dump statistics about timers */ void rte_timer_dump_stats(FILE *f) { + rte_timer_alt_dump_stats(default_data_id, f); +} + +int __rte_experimental +rte_timer_alt_dump_stats(uint32_t timer_data_id __rte_unused, FILE *f) +{ #ifdef RTE_LIBRTE_TIMER_DEBUG struct rte_timer_debug_stats sum; unsigned lcore_id; + struct rte_timer_data *data; + + TIMER_DATA_VALID_GET_OR_ERR_RET(timer_data_id, data, -EINVAL); memset(&sum, 0, sizeof(sum)); for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) { - sum.reset += priv_timer[lcore_id].stats.reset; - sum.stop += priv_timer[lcore_id].stats.stop; - sum.manage += priv_timer[lcore_id].stats.manage; - sum.pending += priv_timer[lcore_id].stats.pending; + sum.reset += data->priv_timer[lcore_id].stats.reset; + sum.stop += data->priv_timer[lcore_id].stats.stop; + sum.manage += data->priv_timer[lcore_id].stats.manage; + sum.pending += data->priv_timer[lcore_id].stats.pending; } fprintf(f, "Timer statistics:\n"); fprintf(f, " reset = %"PRIu64"\n", sum.reset); @@ -614,4 +943,5 @@ void rte_timer_dump_stats(FILE *f) #else fprintf(f, "No timer statistics, RTE_LIBRTE_TIMER_DEBUG is disabled\n"); #endif + return 0; } diff --git a/lib/librte_timer/rte_timer.h b/lib/librte_timer/rte_timer.h index 9b95cd2..9daa334 100644 --- a/lib/librte_timer/rte_timer.h +++ b/lib/librte_timer/rte_timer.h @@ -39,6 +39,7 @@ #include #include #include +#include #ifdef __cplusplus extern "C" { @@ -132,12 +133,52 @@ struct rte_timer #endif /** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice + * + * Allocate a timer data instance in shared memory to track a set of pending + * timer lists. + * + * @param id_ptr + * Pointer to variable into which to write the identifier of the allocated + * timer data instance. 
+ * + * @return + * 0: Success + * -ENOSPC: maximum number of timer data instances already allocated + */ +int __rte_experimental rte_timer_data_alloc(uint32_t *id_ptr); + +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice + * + * Deallocate a timer data instance. + * + * @param id + * Identifier of the timer data instance to deallocate. + * + * @return + * 0: Success + * -EINVAL: invalid timer data instance identifier + */ +int __rte_experimental rte_timer_data_dealloc(uint32_t id); + +/** * Initialize the timer library. * * Initializes internal variables (list, locks and so on) for the RTE * timer library. */ -void rte_timer_subsystem_init(void); +int rte_timer_subsystem_init(void); + +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice + * + * Free timer subsystem resources. + */ +void __rte_experimental rte_timer_subsystem_finalize(void); /** * Initialize a timer handle. @@ -254,7 +295,6 @@ rte_timer_reset_sync(struct rte_timer *tim, uint64_t ticks, */ int rte_timer_stop(struct rte_timer *tim); - /** * Loop until rte_timer_stop() succeeds. * @@ -302,6 +342,130 @@ void rte_timer_manage(void); */ void rte_timer_dump_stats(FILE *f); +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice + * + * This function is the same as rte_timer_reset(), except that it allows a + * caller to specify the rte_timer_data instance containing the list to which + * the timer should be added. + * + * @see rte_timer_reset() + * + * @param timer_data_id + * An identifier indicating which instance of timer data should be used for + * this operation. + * @param tim + * The timer handle. + * @param ticks + * The number of cycles (see rte_get_hpet_hz()) before the callback + * function is called. 
+ * @param type
+ *   The type can be either:
+ *   - PERIODICAL: The timer is automatically reloaded after execution
+ *     (returns to the PENDING state)
+ *   - SINGLE: The timer is one-shot, that is, the timer goes to a
+ *     STOPPED state after execution.
+ * @param tim_lcore
+ *   The ID of the lcore where the timer callback function has to be
+ *   executed. If tim_lcore is LCORE_ID_ANY, the timer library will
+ *   launch it on a different core for each call (round-robin).
+ * @param fct
+ *   The callback function of the timer. This parameter can be NULL if (and
+ *   only if) rte_timer_alt_manage() will be used to manage this timer.
+ * @param arg
+ *   The user argument of the callback function.
+ * @return
+ *   - 0: Success; the timer is scheduled.
+ *   - (-1): Timer is in the RUNNING or CONFIG state.
+ *   - -EINVAL: invalid timer_data_id
+ */
+int __rte_experimental
+rte_timer_alt_reset(uint32_t timer_data_id, struct rte_timer *tim,
+		    uint64_t ticks, enum rte_timer_type type,
+		    unsigned int tim_lcore, rte_timer_cb_t fct, void *arg);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * This function is the same as rte_timer_stop(), except that it allows a
+ * caller to specify the rte_timer_data instance containing the list from which
+ * this timer should be removed.
+ *
+ * @see rte_timer_stop()
+ *
+ * @param timer_data_id
+ *   An identifier indicating which instance of timer data should be used for
+ *   this operation.
+ * @param tim
+ *   The timer handle.
+ * @return
+ *   - 0: Success; the timer is stopped.
+ *   - (-1): The timer is in the RUNNING or CONFIG state.
+ *   - -EINVAL: invalid timer_data_id
+ */
+int __rte_experimental
+rte_timer_alt_stop(uint32_t timer_data_id, struct rte_timer *tim);
+
+/**
+ * Callback function type for rte_timer_alt_manage().
+ */
+typedef void (*rte_timer_alt_manage_cb_t)(void *);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Manage a set of timer lists and execute the specified callback function for
+ * all expired timers. This function is similar to rte_timer_manage(), except
+ * that it allows a caller to specify the timer_data instance that should
+ * be operated on, as well as a set of lcore IDs identifying which timer lists
+ * should be processed. Callback functions of individual timers are ignored.
+ *
+ * @see rte_timer_manage()
+ *
+ * @param timer_data_id
+ *   An identifier indicating which instance of timer data should be used for
+ *   this operation.
+ * @param poll_lcores
+ *   An array of lcore ids identifying the timer lists that should be processed.
+ *   NULL is allowed - if NULL, the timer list corresponding to the lcore
+ *   calling this routine is processed (same as rte_timer_manage()).
+ * @param n_poll_lcores
+ *   The size of the poll_lcores array. If 'poll_lcores' is NULL, this parameter
+ *   is ignored.
+ * @param f
+ *   The callback function which should be called for all expired timers.
+ * @return
+ *   - 0: success
+ *   - -EINVAL: invalid timer_data_id
+ */
+int __rte_experimental
+rte_timer_alt_manage(uint32_t timer_data_id, unsigned int *poll_lcores,
+		     int n_poll_lcores, rte_timer_alt_manage_cb_t f);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * This function is the same as rte_timer_dump_stats(), except that it allows
+ * the caller to specify the rte_timer_data instance that should be used.
+ *
+ * @see rte_timer_dump_stats()
+ *
+ * @param timer_data_id
+ *   An identifier indicating which instance of timer data should be used for
+ *   this operation.
+ * @param f
+ *   A pointer to a file for output
+ * @return
+ *   - 0: success
+ *   - -EINVAL: invalid timer_data_id
+ */
+int __rte_experimental
+rte_timer_alt_dump_stats(uint32_t timer_data_id, FILE *f);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_timer/rte_timer_version.map b/lib/librte_timer/rte_timer_version.map
index 9b2e4b8..1e6b70d 100644
--- a/lib/librte_timer/rte_timer_version.map
+++ b/lib/librte_timer/rte_timer_version.map
@@ -3,13 +3,30 @@ DPDK_2.0 {
 
 	rte_timer_dump_stats;
 	rte_timer_init;
-	rte_timer_manage;
 	rte_timer_pending;
 	rte_timer_reset;
 	rte_timer_reset_sync;
 	rte_timer_stop;
 	rte_timer_stop_sync;
-	rte_timer_subsystem_init;
 
 	local: *;
 };
+
+DPDK_19.02 {
+	global:
+
+	rte_timer_manage;
+	rte_timer_subsystem_init;
+} DPDK_2.0;
+
+EXPERIMENTAL {
+	global:
+
+	rte_timer_alt_dump_stats;
+	rte_timer_alt_manage;
+	rte_timer_alt_reset;
+	rte_timer_alt_stop;
+	rte_timer_data_alloc;
+	rte_timer_data_dealloc;
+	rte_timer_subsystem_finalize;
+};
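
Note for readers of the new rte_timer.h comments: the return-code contract documented for rte_timer_data_alloc() and rte_timer_data_dealloc() (0 on success, -ENOSPC once every instance is taken, -EINVAL for an invalid identifier) can be modeled with the minimal sketch below. This is a toy model and not the code from this patch: the names toy_data_alloc, toy_data_dealloc, toy_in_use and TOY_MAX_DATA_ELS are hypothetical, the capacity of 4 is an arbitrary assumption, and the table lives in ordinary process memory rather than in shared memory.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical capacity; the limit used by the real library may differ. */
#define TOY_MAX_DATA_ELS 4

/* Toy stand-in for the table of timer data instances (assumption:
 * a plain in-process array, not a shared memory zone). */
static bool toy_in_use[TOY_MAX_DATA_ELS];

/* Hand out the first free slot and write its index to *id_ptr.
 * Mirrors the documented contract: 0 on success, -ENOSPC when full. */
static int
toy_data_alloc(uint32_t *id_ptr)
{
	uint32_t i;

	for (i = 0; i < TOY_MAX_DATA_ELS; i++) {
		if (!toy_in_use[i]) {
			toy_in_use[i] = true;
			*id_ptr = i;
			return 0;
		}
	}

	return -ENOSPC;
}

/* Release a slot. Mirrors the documented contract: 0 on success,
 * -EINVAL if the id is out of range or not currently allocated. */
static int
toy_data_dealloc(uint32_t id)
{
	if (id >= TOY_MAX_DATA_ELS || !toy_in_use[id])
		return -EINVAL;

	toy_in_use[id] = false;
	return 0;
}
```

In this model a caller keeps the identifier written through id_ptr and passes it to every later call, which is the role timer_data_id plays in the rte_timer_alt_*() prototypes above.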