From patchwork Tue Jun 1 12:00:30 2021
X-Patchwork-Id: 93720
X-Patchwork-Delegate: thomas@monjalon.net
From: Anatoly Burakov
To: dev@dpdk.org, Bruce Richardson, Konstantin Ananyev
Cc: ciara.loftus@intel.com, david.hunt@intel.com
Date: Tue, 1 Jun 2021 12:00:30 +0000
Message-Id: <8007029ea9e5129ea43f0c11708169406a16727f.1622548381.git.anatoly.burakov@intel.com>
Subject: [dpdk-dev] [PATCH v1 1/7] power_intrinsics: allow monitor checks inversion

Previously, the semantics of power monitor were such that we checked the
current value against the expected value, and if they matched, the sleep
was aborted. This is somewhat inflexible, because it only allowed us to
check for one specific value.

This commit adds an option to invert the check, so that the monitor sleep
is aborted if the expected value *doesn't* match what's in memory. This
allows us to keep all currently implemented driver code working, and also
to support use cases which don't easily map to the previous semantics
(such as waiting on writes to an AF_XDP counter value).

Since the old behavior is the default, existing implementations need no
adjustment.

Signed-off-by: Anatoly Burakov
---
 lib/eal/include/generic/rte_power_intrinsics.h | 4 ++++
 lib/eal/x86/rte_power_intrinsics.c             | 5 ++++-
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/lib/eal/include/generic/rte_power_intrinsics.h b/lib/eal/include/generic/rte_power_intrinsics.h
index dddca3d41c..1006c2edfc 100644
--- a/lib/eal/include/generic/rte_power_intrinsics.h
+++ b/lib/eal/include/generic/rte_power_intrinsics.h
@@ -31,6 +31,10 @@ struct rte_power_monitor_cond {
 	 *   4, or 8. Supplying any other value will result in
 	 *   an error.
 	 */
+	uint8_t invert;  /**< Invert check for expected value (e.g. instead of
+	                  *   checking if `val` matches something, check if
+	                  *   `val` *doesn't* match a particular value)
+	                  */
 };
 
 /**
diff --git a/lib/eal/x86/rte_power_intrinsics.c b/lib/eal/x86/rte_power_intrinsics.c
index 39ea9fdecd..5d944e9aa4 100644
--- a/lib/eal/x86/rte_power_intrinsics.c
+++ b/lib/eal/x86/rte_power_intrinsics.c
@@ -117,7 +117,10 @@ rte_power_monitor(const struct rte_power_monitor_cond *pmc,
 		const uint64_t masked = cur_value & pmc->mask;
 
 		/* if the masked value is already matching, abort */
-		if (masked == pmc->val)
+		if (!pmc->invert && masked == pmc->val)
+			goto end;
+		/* same, but for inverse check */
+		if (pmc->invert && masked != pmc->val)
 			goto end;
 	}
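
For reference, a minimal sketch (not part of the patch) of how a caller could
use the new `invert` flag with the existing `rte_power_monitor()` API; the
`wait_while_idle()` helper and its values are hypothetical:

    #include <stdint.h>
    #include <rte_power_intrinsics.h>

    /* sleep while *addr still holds the "idle" value; a write to *addr
     * wakes the core, and a value already != idle aborts the sleep */
    static void
    wait_while_idle(volatile uint32_t *addr, uint32_t idle)
    {
    	struct rte_power_monitor_cond pmc = {
    		.addr = (void *)(uintptr_t)addr,
    		.val = idle,
    		.mask = UINT32_MAX,      /* compare the whole 32-bit value */
    		.size = sizeof(uint32_t),
    		.invert = 1,             /* abort sleep when value != idle */
    	};

    	/* UINT64_MAX means no TSC-based timeout */
    	rte_power_monitor(&pmc, UINT64_MAX);
    }

Without `invert`, the same condition would mean the opposite: abort the sleep
as soon as the value *does* equal `idle`.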
From patchwork Tue Jun 1 12:00:31 2021
X-Patchwork-Id: 93721
X-Patchwork-Delegate: thomas@monjalon.net
From: Anatoly Burakov
To: dev@dpdk.org, Ciara Loftus, Qi Zhang
Cc: david.hunt@intel.com
Date: Tue, 1 Jun 2021 12:00:31 +0000
Subject: [dpdk-dev] [PATCH v1 2/7] net/af_xdp: add power monitor support

Implement support for .get_monitor_addr in the AF_XDP driver.

Signed-off-by: Anatoly Burakov
Acked-by: Ciara Loftus
---
 drivers/net/af_xdp/rte_eth_af_xdp.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
index eb5660a3dc..dfbf74ea53 100644
--- a/drivers/net/af_xdp/rte_eth_af_xdp.c
+++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
@@ -37,6 +37,7 @@
 #include
 #include
 #include
+#include <rte_power_intrinsics.h>
 
 #include "compat.h"
@@ -788,6 +789,29 @@ eth_dev_configure(struct rte_eth_dev *dev)
 	return 0;
 }
 
+static int
+eth_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
+{
+	struct pkt_rx_queue *rxq = rx_queue;
+	unsigned int *prod = rxq->rx.producer;
+	const uint32_t cur_val = rxq->rx.cached_prod; /* use cached value */
+
+	/* watch for changes in producer ring */
+	pmc->addr = (void *)prod;
+
+	/* store current value */
+	pmc->val = cur_val;
+	pmc->mask = (uint32_t)~0; /* mask entire uint32_t value */
+
+	/* AF_XDP producer ring index is 32-bit */
+	pmc->size = sizeof(uint32_t);
+
+	/* this requires an inverted check */
+	pmc->invert = 1;
+
+	return 0;
+}
+
 static int
 eth_dev_info(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 {
@@ -1448,6 +1472,7 @@ static const struct eth_dev_ops ops = {
 	.link_update = eth_link_update,
 	.stats_get = eth_stats_get,
 	.stats_reset = eth_stats_reset,
+	.get_monitor_addr = eth_get_monitor_addr
 };
 
 /** parse busy_budget argument */
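
As an illustration of how such a driver callback is consumed, here is a
hypothetical polling-loop sketch (port, queue, and burst size are made up,
and error handling is elided):

    #include <rte_ethdev.h>
    #include <rte_power_intrinsics.h>

    /* poll one queue; on an empty poll, sleep until the kernel updates
     * the producer ring (which the inverted condition detects) */
    struct rte_mbuf *pkts[32];
    uint16_t nb_rx = rte_eth_rx_burst(port_id, queue_id, pkts, 32);

    if (nb_rx == 0) {
    	struct rte_power_monitor_cond pmc;

    	if (rte_eth_get_monitor_addr(port_id, queue_id, &pmc) == 0)
    		rte_power_monitor(&pmc, UINT64_MAX);
    }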
From patchwork Tue Jun 1 12:00:32 2021
X-Patchwork-Id: 93722
X-Patchwork-Delegate: thomas@monjalon.net
From: Anatoly Burakov
To: dev@dpdk.org, Jan Viktorin, Ruifeng Wang, Jerin Jacob, David Christensen, Ray Kinsella, Neil Horman, Bruce Richardson, Konstantin Ananyev
Cc: ciara.loftus@intel.com, david.hunt@intel.com
Date: Tue, 1 Jun 2021 12:00:32 +0000
Message-Id: <712a208c3e90dda2c765d3384f5be944df1258b0.1622548381.git.anatoly.burakov@intel.com>
Subject: [dpdk-dev] [PATCH v1 3/7] eal: add power monitor for multiple events

Use RTM and WAITPKG instructions to perform a wait-for-writes similar to
what UMWAIT does, but without the limitation of having to listen for just
one event. This works because the optimized power state used by the
TPAUSE instruction will cause a wake up on an RTM transaction abort, so
if we add the addresses we're interested in to the read-set, any write to
those addresses will wake us up.

Signed-off-by: Konstantin Ananyev
Signed-off-by: Anatoly Burakov
---
 lib/eal/arm/rte_power_intrinsics.c             | 11 +++
 lib/eal/include/generic/rte_cpuflags.h         |  2 +
 .../include/generic/rte_power_intrinsics.h    | 35 ++++++++++
 lib/eal/ppc/rte_power_intrinsics.c             | 11 +++
 lib/eal/version.map                            |  3 +
 lib/eal/x86/rte_cpuflags.c                     |  2 +
 lib/eal/x86/rte_power_intrinsics.c             | 69 +++++++++++++++++++
 7 files changed, 133 insertions(+)
diff --git a/lib/eal/arm/rte_power_intrinsics.c b/lib/eal/arm/rte_power_intrinsics.c
index e83f04072a..78f55b7203 100644
--- a/lib/eal/arm/rte_power_intrinsics.c
+++ b/lib/eal/arm/rte_power_intrinsics.c
@@ -38,3 +38,14 @@ rte_power_monitor_wakeup(const unsigned int lcore_id)
 
 	return -ENOTSUP;
 }
+
+int
+rte_power_monitor_multi(const struct rte_power_monitor_cond pmc[],
+		const uint32_t num, const uint64_t tsc_timestamp)
+{
+	RTE_SET_USED(pmc);
+	RTE_SET_USED(num);
+	RTE_SET_USED(tsc_timestamp);
+
+	return -ENOTSUP;
+}
diff --git a/lib/eal/include/generic/rte_cpuflags.h b/lib/eal/include/generic/rte_cpuflags.h
index 28a5aecde8..d35551e931 100644
--- a/lib/eal/include/generic/rte_cpuflags.h
+++ b/lib/eal/include/generic/rte_cpuflags.h
@@ -24,6 +24,8 @@ struct rte_cpu_intrinsics {
 	/**< indicates support for rte_power_monitor function */
 	uint32_t power_pause : 1;
 	/**< indicates support for rte_power_pause function */
+	uint32_t power_monitor_multi : 1;
+	/**< indicates support for rte_power_monitor_multi function */
 };
 
 /**
diff --git a/lib/eal/include/generic/rte_power_intrinsics.h b/lib/eal/include/generic/rte_power_intrinsics.h
index 1006c2edfc..acb0d759ce 100644
--- a/lib/eal/include/generic/rte_power_intrinsics.h
+++ b/lib/eal/include/generic/rte_power_intrinsics.h
@@ -113,4 +113,39 @@ int rte_power_monitor_wakeup(const unsigned int lcore_id);
 __rte_experimental
 int rte_power_pause(const uint64_t tsc_timestamp);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Monitor a set of addresses for changes. This will cause the CPU to enter an
+ * architecture-defined optimized power state until either one of the specified
+ * memory addresses is written to, a certain TSC timestamp is reached, or other
+ * reasons cause the CPU to wake up.
+ *
+ * Additionally, expected 64-bit values and 64-bit masks are provided. If
+ * a mask is non-zero, the current value pointed to by the `addr` pointer will
+ * be checked against the expected value, and if they do not match, the
+ * entering of optimized power state may be aborted.
+ *
+ * @warning It is the responsibility of the user to check if this function is
+ *   supported at runtime using the `rte_cpu_get_intrinsics_support()` API
+ *   call. Failing to do so may result in an illegal CPU instruction error.
+ *
+ * @param pmc
+ *   An array of monitoring condition structures.
+ * @param num
+ *   Length of the `pmc` array.
+ * @param tsc_timestamp
+ *   Maximum TSC timestamp to wait for. Note that the wait behavior is
+ *   architecture-dependent.
+ *
+ * @return
+ *   0 on success
+ *   -EINVAL on invalid parameters
+ *   -ENOTSUP if unsupported
+ */
+__rte_experimental
+int rte_power_monitor_multi(const struct rte_power_monitor_cond pmc[],
+		const uint32_t num, const uint64_t tsc_timestamp);
+
 #endif /* _RTE_POWER_INTRINSIC_H_ */
diff --git a/lib/eal/ppc/rte_power_intrinsics.c b/lib/eal/ppc/rte_power_intrinsics.c
index 7fc9586da7..f00b58ade5 100644
--- a/lib/eal/ppc/rte_power_intrinsics.c
+++ b/lib/eal/ppc/rte_power_intrinsics.c
@@ -38,3 +38,14 @@ rte_power_monitor_wakeup(const unsigned int lcore_id)
 
 	return -ENOTSUP;
 }
+
+int
+rte_power_monitor_multi(const struct rte_power_monitor_cond pmc[],
+		const uint32_t num, const uint64_t tsc_timestamp)
+{
+	RTE_SET_USED(pmc);
+	RTE_SET_USED(num);
+	RTE_SET_USED(tsc_timestamp);
+
+	return -ENOTSUP;
+}
diff --git a/lib/eal/version.map b/lib/eal/version.map
index fe5c3dac98..4ccd5475d6 100644
--- a/lib/eal/version.map
+++ b/lib/eal/version.map
@@ -423,6 +423,9 @@ EXPERIMENTAL {
 	rte_version_release; # WINDOWS_NO_EXPORT
 	rte_version_suffix; # WINDOWS_NO_EXPORT
 	rte_version_year; # WINDOWS_NO_EXPORT
+
+	# added in 21.08
+	rte_power_monitor_multi; # WINDOWS_NO_EXPORT
 };
 
 INTERNAL {
diff --git a/lib/eal/x86/rte_cpuflags.c b/lib/eal/x86/rte_cpuflags.c
index a96312ff7f..d339734a8c 100644
--- a/lib/eal/x86/rte_cpuflags.c
+++ b/lib/eal/x86/rte_cpuflags.c
@@ -189,5 +189,7 @@ rte_cpu_get_intrinsics_support(struct rte_cpu_intrinsics *intrinsics)
 	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_WAITPKG)) {
 		intrinsics->power_monitor = 1;
 		intrinsics->power_pause = 1;
+		if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_RTM))
+			intrinsics->power_monitor_multi = 1;
 	}
 }
diff --git a/lib/eal/x86/rte_power_intrinsics.c b/lib/eal/x86/rte_power_intrinsics.c
index 5d944e9aa4..4f972673ce 100644
--- a/lib/eal/x86/rte_power_intrinsics.c
+++ b/lib/eal/x86/rte_power_intrinsics.c
@@ -4,6 +4,7 @@
 
 #include
 #include
+#include <rte_rtm.h>
 #include
 
 #include "rte_power_intrinsics.h"
@@ -28,6 +29,7 @@ __umwait_wakeup(volatile void *addr)
 }
 
 static bool wait_supported;
+static bool wait_multi_supported;
 
 static inline uint64_t
 __get_umwait_val(const volatile void *p, const uint8_t sz)
@@ -170,6 +172,8 @@ RTE_INIT(rte_power_intrinsics_init)
 
 	if (i.power_monitor && i.power_pause)
 		wait_supported = 1;
+	if (i.power_monitor_multi)
+		wait_multi_supported = 1;
 }
 
 int
@@ -208,6 +212,9 @@ rte_power_monitor_wakeup(const unsigned int lcore_id)
 	 * In this case, since we've already woken up, the "wakeup" was
 	 * unneeded, and since T1 is still waiting on T2 releasing the lock, the
	 * wakeup address is still valid so it's perfectly safe to write it.
+	 *
+	 * For the multi-monitor case, the act of locking will in itself
+	 * trigger the wakeup, so no additional writes are necessary.
 	 */
 	rte_spinlock_lock(&s->lock);
 	if (s->monitor_addr != NULL)
@@ -216,3 +223,65 @@ rte_power_monitor_wakeup(const unsigned int lcore_id)
 
 	return 0;
 }
+
+int
+rte_power_monitor_multi(const struct rte_power_monitor_cond pmc[],
+		const uint32_t num, const uint64_t tsc_timestamp)
+{
+	const unsigned int lcore_id = rte_lcore_id();
+	struct power_wait_status *s = &wait_status[lcore_id];
+	uint32_t i, rc;
+
+	/* check if supported */
+	if (!wait_multi_supported)
+		return -ENOTSUP;
+
+	if (pmc == NULL || num == 0)
+		return -EINVAL;
+
+	/* we are already inside transaction region, return */
+	if (rte_xtest() != 0)
+		return 0;
+
+	/* start new transaction region */
+	rc = rte_xbegin();
+
+	/* transaction abort, possible write to one of wait addresses */
+	if (rc != RTE_XBEGIN_STARTED)
+		return 0;
+
+	/*
+	 * the mere act of reading the lock status here adds the lock to
+	 * the read set. This means that when we trigger a wakeup from another
+	 * thread, even if we don't have a defined wakeup address and thus don't
+	 * actually cause any writes, the act of locking our lock will itself
+	 * trigger the wakeup and abort the transaction.
+	 */
+	rte_spinlock_is_locked(&s->lock);
+
+	/*
+	 * add all addresses to wait on into transaction read-set and check if
+	 * any of wakeup conditions are already met.
+	 */
+	for (i = 0; i < num; i++) {
+		const struct rte_power_monitor_cond *c = &pmc[i];
+		const uint64_t val = __get_umwait_val(c->addr, c->size);
+		const uint64_t masked = val & c->mask;
+
+		/* if the masked value is already matching, abort */
+		if (!c->invert && masked == c->val)
+			break;
+		/* same, but for inverse check */
+		if (c->invert && masked != c->val)
+			break;
+	}
+
+	/* none of the conditions were met, sleep until timeout */
+	if (i == num)
+		rte_power_pause(tsc_timestamp);
+
+	/* end transaction region */
+	rte_xend();
+
+	return 0;
+}
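
A minimal usage sketch for the new intrinsic (port and queue identifiers are
hypothetical, and real code should also handle errors):

    #include <rte_cpuflags.h>
    #include <rte_ethdev.h>
    #include <rte_power_intrinsics.h>

    struct rte_cpu_intrinsics intr;
    struct rte_power_monitor_cond pmc[2];

    /* multi-monitor requires both WAITPKG and RTM, per the check above */
    rte_cpu_get_intrinsics_support(&intr);
    if (intr.power_monitor_multi &&
    		rte_eth_get_monitor_addr(port_a, queue_a, &pmc[0]) == 0 &&
    		rte_eth_get_monitor_addr(port_b, queue_b, &pmc[1]) == 0)
    	rte_power_monitor_multi(pmc, 2, UINT64_MAX);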
From patchwork Tue Jun 1 12:00:33 2021
X-Patchwork-Id: 93723
X-Patchwork-Delegate: thomas@monjalon.net
From: Anatoly Burakov
To: dev@dpdk.org, David Hunt
Cc: ciara.loftus@intel.com
Date: Tue, 1 Jun 2021 12:00:33 +0000
Subject: [dpdk-dev] [PATCH v1 4/7] power: remove thread safety from PMD power API's

Currently, we expect that only one callback can be active at any given
moment, for a particular queue configuration, which is relatively easy to
implement in a thread-safe way. However, we're about to add support for
multiple queues per lcore, which will greatly increase the possibility of
various race conditions.

We could have used something like RCU for this use case, but absent a
pressing need for thread safety we'll take the easy way out and simply
mandate that these APIs be called when all affected ports are stopped,
and document this limitation. This greatly simplifies the
`rte_power_monitor`-related code.

Signed-off-by: Anatoly Burakov
---
 lib/power/meson.build          |   3 +
 lib/power/rte_power_pmd_mgmt.c | 106 ++++++++-------------------------
 lib/power/rte_power_pmd_mgmt.h |   6 ++
 3 files changed, 35 insertions(+), 80 deletions(-)

diff --git a/lib/power/meson.build b/lib/power/meson.build
index c1097d32f1..4f6a242364 100644
--- a/lib/power/meson.build
+++ b/lib/power/meson.build
@@ -21,4 +21,7 @@ headers = files(
         'rte_power_pmd_mgmt.h',
         'rte_power_guest_channel.h',
 )
+if cc.has_argument('-Wno-cast-qual')
+    cflags += '-Wno-cast-qual'
+endif
 deps += ['timer', 'ethdev']
diff --git a/lib/power/rte_power_pmd_mgmt.c b/lib/power/rte_power_pmd_mgmt.c
index db03cbf420..0707c60a4f 100644
--- a/lib/power/rte_power_pmd_mgmt.c
+++ b/lib/power/rte_power_pmd_mgmt.c
@@ -40,8 +40,6 @@ struct pmd_queue_cfg {
 	/**< Callback mode for this queue */
 	const struct rte_eth_rxtx_callback *cur_cb;
 	/**< Callback instance */
-	volatile bool umwait_in_progress;
-	/**< are we currently sleeping? */
 	uint64_t empty_poll_stats;
 	/**< Number of empty polls */
 } __rte_cache_aligned;
@@ -92,30 +90,11 @@ clb_umwait(uint16_t port_id, uint16_t qidx, struct rte_mbuf **pkts __rte_unused,
 		struct rte_power_monitor_cond pmc;
 		uint16_t ret;
 
-		/*
-		 * we might get a cancellation request while being
-		 * inside the callback, in which case the wakeup
-		 * wouldn't work because it would've arrived too early.
-		 *
-		 * to get around this, we notify the other thread that
-		 * we're sleeping, so that it can spin until we're done.
-		 * unsolicited wakeups are perfectly safe.
-		 */
-		q_conf->umwait_in_progress = true;
-
-		rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
-
-		/* check if we need to cancel sleep */
-		if (q_conf->pwr_mgmt_state == PMD_MGMT_ENABLED) {
-			/* use monitoring condition to sleep */
-			ret = rte_eth_get_monitor_addr(port_id, qidx,
-					&pmc);
-			if (ret == 0)
-				rte_power_monitor(&pmc, UINT64_MAX);
-		}
-		q_conf->umwait_in_progress = false;
-
-		rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
+		/* use monitoring condition to sleep */
+		ret = rte_eth_get_monitor_addr(port_id, qidx,
+				&pmc);
+		if (ret == 0)
+			rte_power_monitor(&pmc, UINT64_MAX);
 		}
 	} else
 		q_conf->empty_poll_stats = 0;
@@ -183,6 +162,7 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 {
 	struct pmd_queue_cfg *queue_cfg;
 	struct rte_eth_dev_info info;
+	rte_rx_callback_fn clb;
 	int ret;
 
 	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
@@ -232,17 +212,7 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 			ret = -ENOTSUP;
 			goto end;
 		}
-		/* initialize data before enabling the callback */
-		queue_cfg->empty_poll_stats = 0;
-		queue_cfg->cb_mode = mode;
-		queue_cfg->umwait_in_progress = false;
-		queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
-
-		/* ensure we update our state before callback starts */
-		rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
-
-		queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, queue_id,
-				clb_umwait, NULL);
+		clb = clb_umwait;
 		break;
 	}
 	case RTE_POWER_MGMT_TYPE_SCALE:
@@ -269,16 +239,7 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 			ret = -ENOTSUP;
 			goto end;
 		}
-		/* initialize data before enabling the callback */
-		queue_cfg->empty_poll_stats = 0;
-		queue_cfg->cb_mode = mode;
-		queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
-
-		/* this is not necessary here, but do it anyway */
-		rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
-
-		queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id,
-				queue_id, clb_scale_freq, NULL);
+		clb = clb_scale_freq;
 		break;
 	}
 	case RTE_POWER_MGMT_TYPE_PAUSE:
@@ -286,18 +247,21 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 		if (global_data.tsc_per_us == 0)
 			calc_tsc();
 
-		/* initialize data before enabling the callback */
-		queue_cfg->empty_poll_stats = 0;
-		queue_cfg->cb_mode = mode;
-		queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
-
-		/* this is not necessary here, but do it anyway */
-		rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
-
-		queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, queue_id,
-				clb_pause, NULL);
+		clb = clb_pause;
 		break;
+	default:
+		RTE_LOG(DEBUG, POWER, "Invalid power management type\n");
+		ret = -EINVAL;
+		goto end;
 	}
+
+	/* initialize data before enabling the callback */
+	queue_cfg->empty_poll_stats = 0;
+	queue_cfg->cb_mode = mode;
+	queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
+	queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, queue_id,
+			clb, NULL);
+
+	ret = 0;
 end:
 	return ret;
@@ -323,27 +287,8 @@ rte_power_ethdev_pmgmt_queue_disable(unsigned int lcore_id,
 
 	/* stop any callbacks from progressing */
 	queue_cfg->pwr_mgmt_state = PMD_MGMT_DISABLED;
 
-	/* ensure we update our state before continuing */
-	rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
-
 	switch (queue_cfg->cb_mode) {
-	case RTE_POWER_MGMT_TYPE_MONITOR:
-	{
-		bool exit = false;
-		do {
-			/*
-			 * we may request cancellation while the other thread
-			 * has just entered the callback but hasn't started
-			 * sleeping yet, so keep waking it up until we know it's
-			 * done sleeping.
-			 */
-			if (queue_cfg->umwait_in_progress)
-				rte_power_monitor_wakeup(lcore_id);
-			else
-				exit = true;
-		} while (!exit);
-	}
-	/* fall-through */
+	case RTE_POWER_MGMT_TYPE_MONITOR: /* fall-through */
 	case RTE_POWER_MGMT_TYPE_PAUSE:
 		rte_eth_remove_rx_callback(port_id, queue_id,
 				queue_cfg->cur_cb);
@@ -356,10 +301,11 @@ rte_power_ethdev_pmgmt_queue_disable(unsigned int lcore_id,
 		break;
 	}
 	/*
-	 * we don't free the RX callback here because it is unsafe to do so
-	 * unless we know for a fact that all data plane threads have stopped.
+	 * the API doc mandates that the user stops all processing on affected
+	 * ports before calling any of these API's, so we can assume that the
+	 * callbacks can be freed. we're intentionally casting away const-ness.
 	 */
-	queue_cfg->cur_cb = NULL;
+	rte_free((void *)queue_cfg->cur_cb);
 
 	return 0;
 }
diff --git a/lib/power/rte_power_pmd_mgmt.h b/lib/power/rte_power_pmd_mgmt.h
index 7a0ac24625..7557f5d7e1 100644
--- a/lib/power/rte_power_pmd_mgmt.h
+++ b/lib/power/rte_power_pmd_mgmt.h
@@ -43,6 +43,9 @@ enum rte_power_pmd_mgmt_type {
  *
  * @note This function is not thread-safe.
  *
+ * @warning This function must be called when all affected Ethernet ports are
+ *   stopped and no Rx/Tx is in progress!
+ *
  * @param lcore_id
  *   The lcore the Rx queue will be polled from.
  * @param port_id
@@ -69,6 +72,9 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id,
  *
  * @note This function is not thread-safe.
  *
+ * @warning This function must be called when all affected Ethernet ports are
+ *   stopped and no Rx/Tx is in progress!
+ *
  * @param lcore_id
  *   The lcore the Rx queue is polled from.
  * @param port_id
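
The new contract can be summarized with a short sketch (identifiers are
hypothetical; `rte_eth_dev_stop()`/`rte_eth_dev_start()` are the standard
ethdev calls):

    /* traffic must be quiesced before touching the power management state */
    rte_eth_dev_stop(port_id);
    rte_power_ethdev_pmgmt_queue_disable(lcore_id, port_id, queue_id);
    /* ... reconfigure as needed, then restart ... */
    rte_eth_dev_start(port_id);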
From patchwork Tue Jun 1 12:00:34 2021
X-Patchwork-Id: 93724
X-Patchwork-Delegate: thomas@monjalon.net
From: Anatoly Burakov
To: dev@dpdk.org, David Hunt, Ray Kinsella, Neil Horman
Cc: ciara.loftus@intel.com
Date: Tue, 1 Jun 2021 12:00:34 +0000
Message-Id: <601f09ad562f97ca1e0077cb36ba46e448df382d.1622548381.git.anatoly.burakov@intel.com>
Subject: [dpdk-dev] [PATCH v1 5/7] power: support callbacks for multiple Rx queues

Currently, there is a hard limitation on the PMD power management
support that only allows it to support a single queue per lcore. This is
not ideal, as most DPDK use cases will poll multiple queues per core.

The PMD power management mechanism relies on ethdev Rx callbacks, so it
is very difficult to implement such support because callbacks are
effectively stateless and have no visibility into what the other ethdev
devices are doing. This places limitations on what we can do within the
framework of Rx callbacks, but the basics of this implementation are as
follows:

- Replace per-queue structures with per-lcore ones, so that any device
  polled from the same lcore can share data

- Any queue that is going to be polled from a specific lcore has to be
  added to that lcore's list of queues to poll, so that the callback is
  aware of other queues being polled by the same lcore

- Both the empty poll counter and the actual power saving mechanism are
  shared between all queues polled on a particular lcore, and are only
  activated when a special, designated "power saving" queue is polled. To
  put it another way, we have no idea which queue the user will poll in
  what order, so we rely on them telling us that queue X is the last one
  in the polling loop, so any power management should happen there.

- A new API is added to mark a specific Rx queue as "power saving".
  Failing to call this API will result in no power management; however,
  when having only one queue per core it is obvious which queue is the
  "power saving" one, so things will still work without this new API for
  use cases that were previously working without it.

- The limitation on UMWAIT-based polling is not removed because UMWAIT
  is incapable of monitoring more than one address.

Signed-off-by: Anatoly Burakov
---
 lib/power/rte_power_pmd_mgmt.c | 335 ++++++++++++++++++++++++++-------
 lib/power/rte_power_pmd_mgmt.h |  34 ++++
 lib/power/version.map          |   3 +
 3 files changed, 306 insertions(+), 66 deletions(-)
diff --git a/lib/power/rte_power_pmd_mgmt.c b/lib/power/rte_power_pmd_mgmt.c
index 0707c60a4f..60dd21a19c 100644
--- a/lib/power/rte_power_pmd_mgmt.c
+++ b/lib/power/rte_power_pmd_mgmt.c
@@ -33,7 +33,19 @@ enum pmd_mgmt_state {
 	PMD_MGMT_ENABLED
 };
 
-struct pmd_queue_cfg {
+struct queue {
+	uint16_t portid;
+	uint16_t qid;
+};
+struct pmd_core_cfg {
+	struct queue queues[RTE_MAX_ETHPORTS];
+	/**< Which port-queue pairs are associated with this lcore? */
+	struct queue power_save_queue;
+	/**< When polling multiple queues, all but this one will be ignored */
+	bool power_save_queue_set;
+	/**< When polling multiple queues, power save queue must be set */
+	size_t n_queues;
+	/**< How many queues are in the list? */
 	volatile enum pmd_mgmt_state pwr_mgmt_state;
 	/**< State of power management for this queue */
 	enum rte_power_pmd_mgmt_type cb_mode;
@@ -43,8 +55,97 @@ struct pmd_core_cfg {
 	uint64_t empty_poll_stats;
 	/**< Number of empty polls */
 } __rte_cache_aligned;
+static struct pmd_core_cfg lcore_cfg[RTE_MAX_LCORE];
 
-static struct pmd_queue_cfg port_cfg[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
+static inline bool
+queue_equal(const struct queue *l, const struct queue *r)
+{
+	return l->portid == r->portid && l->qid == r->qid;
+}
+
+static inline void
+queue_copy(struct queue *dst, const struct queue *src)
+{
+	dst->portid = src->portid;
+	dst->qid = src->qid;
+}
+
+static inline bool
+queue_is_power_save(const struct pmd_core_cfg *cfg, const struct queue *q)
+{
+	const struct queue *pwrsave = &cfg->power_save_queue;
+
+	/* if there's only a single queue, no need to check anything */
+	if (cfg->n_queues == 1)
+		return true;
+	return cfg->power_save_queue_set && queue_equal(q, pwrsave);
+}
+
+static int
+queue_list_find(const struct pmd_core_cfg *cfg, const struct queue *q,
+		size_t *idx)
+{
+	size_t i;
+	for (i = 0; i < cfg->n_queues; i++) {
+		const struct queue *cur = &cfg->queues[i];
+		if (queue_equal(cur, q)) {
+			if (idx != NULL)
+				*idx = i;
+			return 0;
+		}
+	}
+	return -1;
+}
+
+static int
+queue_set_power_save(struct pmd_core_cfg *cfg, const struct queue *q)
+{
+	if (queue_list_find(cfg, q, NULL) < 0)
+		return -ENOENT;
+	queue_copy(&cfg->power_save_queue, q);
+	cfg->power_save_queue_set = true;
+	return 0;
+}
+
+static int
+queue_list_add(struct pmd_core_cfg *cfg, const struct queue *q)
+{
+	size_t idx = cfg->n_queues;
+	if (idx >= RTE_DIM(cfg->queues))
+		return -ENOSPC;
+	/* is it already in the list? */
+	if (queue_list_find(cfg, q, NULL) == 0)
+		return -EEXIST;
+	queue_copy(&cfg->queues[idx], q);
+	cfg->n_queues++;
+
+	return 0;
+}
+
+static int
+queue_list_remove(struct pmd_core_cfg *cfg, const struct queue *q)
+{
+	struct queue *found, *pwrsave;
+	size_t idx, last_idx = cfg->n_queues - 1;
+
+	if (queue_list_find(cfg, q, &idx) != 0)
+		return -ENOENT;
+
+	/* erase the queue pair being deleted */
+	found = &cfg->queues[idx];
+	memset(found, 0, sizeof(*found));
+
+	/* move the rest of the list */
+	for (; idx < last_idx; idx++)
+		queue_copy(&cfg->queues[idx], &cfg->queues[idx + 1]);
+	cfg->n_queues--;
+
+	/* if this was a power save queue, unset it */
+	pwrsave = &cfg->power_save_queue;
+	if (cfg->power_save_queue_set && queue_is_power_save(cfg, q)) {
+		cfg->power_save_queue_set = false;
+		memset(pwrsave, 0, sizeof(*pwrsave));
+	}
+
+	return 0;
+}
 
 static void
 calc_tsc(void)
@@ -79,10 +180,10 @@ clb_umwait(uint16_t port_id, uint16_t qidx, struct rte_mbuf **pkts __rte_unused,
 		uint16_t nb_rx, uint16_t max_pkts __rte_unused,
 		void *addr __rte_unused)
 {
+	const unsigned int lcore = rte_lcore_id();
+	struct pmd_core_cfg *q_conf;
 
-	struct pmd_queue_cfg *q_conf;
-
-	q_conf = &port_cfg[port_id][qidx];
+	q_conf = &lcore_cfg[lcore];
 
 	if (unlikely(nb_rx == 0)) {
 		q_conf->empty_poll_stats++;
@@ -107,11 +208,26 @@ clb_pause(uint16_t port_id, uint16_t qidx, struct rte_mbuf **pkts __rte_unused,
 		uint16_t nb_rx, uint16_t max_pkts __rte_unused,
 		void *addr __rte_unused)
 {
-	struct pmd_queue_cfg *q_conf;
+	const unsigned int lcore = rte_lcore_id();
+	const struct queue q = {port_id, qidx};
+	const bool empty = nb_rx == 0;
+	struct pmd_core_cfg *q_conf;
 
-	q_conf = &port_cfg[port_id][qidx];
+	q_conf = &lcore_cfg[lcore];
 
-	if (unlikely(nb_rx == 0)) {
+	/* early exit */
+	if (likely(!empty)) {
+		q_conf->empty_poll_stats = 0;
+	} else {
+		/* do we care about this particular queue? */
+		if (!queue_is_power_save(q_conf, &q))
+			return nb_rx;
+
+		/*
+		 * we can increment unconditionally here because if there were
+		 * non-empty polls in other queues assigned to this core, we
+		 * dropped the counter to zero anyway.
+		 */
 		q_conf->empty_poll_stats++;
 		/* sleep for 1 microsecond */
 		if (unlikely(q_conf->empty_poll_stats > EMPTYPOLL_MAX)) {
@@ -127,8 +243,7 @@ clb_pause(uint16_t port_id, uint16_t qidx, struct rte_mbuf **pkts __rte_unused,
 				rte_pause();
 			}
 		}
-	} else
-		q_conf->empty_poll_stats = 0;
+	}
 
 	return nb_rx;
 }
@@ -138,29 +253,97 @@ clb_scale_freq(uint16_t port_id, uint16_t qidx,
 		struct rte_mbuf **pkts __rte_unused, uint16_t nb_rx,
 		uint16_t max_pkts __rte_unused, void *_ __rte_unused)
 {
-	struct pmd_queue_cfg *q_conf;
+	const unsigned int lcore = rte_lcore_id();
+	const struct queue q = {port_id, qidx};
+	const bool empty = nb_rx == 0;
+	struct pmd_core_cfg *q_conf;
 
-	q_conf = &port_cfg[port_id][qidx];
+	q_conf = &lcore_cfg[lcore];
 
-	if (unlikely(nb_rx == 0)) {
+	/* early exit */
+	if (likely(!empty)) {
+		q_conf->empty_poll_stats = 0;
+
+		/* scale up freq immediately */
+		rte_power_freq_max(rte_lcore_id());
+	} else {
+		/* do we care about this particular queue? */
+		if (!queue_is_power_save(q_conf, &q))
+			return nb_rx;
+
+		/*
+		 * we can increment unconditionally here because if there were
+		 * non-empty polls in other queues assigned to this core, we
+		 * dropped the counter to zero anyway.
+		 */
 		q_conf->empty_poll_stats++;
 		if (unlikely(q_conf->empty_poll_stats > EMPTYPOLL_MAX))
 			/* scale down freq */
 			rte_power_freq_min(rte_lcore_id());
-	} else {
-		q_conf->empty_poll_stats = 0;
-		/* scale up freq */
-		rte_power_freq_max(rte_lcore_id());
 	}
 
 	return nb_rx;
 }
 
+static int
+check_scale(unsigned int lcore)
+{
+	enum power_management_env env;
+
+	/* only PSTATE and ACPI modes are supported */
+	if (!rte_power_check_env_supported(PM_ENV_ACPI_CPUFREQ) &&
+	    !rte_power_check_env_supported(PM_ENV_PSTATE_CPUFREQ)) {
+		RTE_LOG(DEBUG, POWER, "Neither ACPI nor PSTATE modes are supported\n");
+		return -ENOTSUP;
+	}
+	/* ensure we could initialize the power library */
+	if (rte_power_init(lcore))
+		return -EINVAL;
+
+	/* ensure we initialized the correct env */
+	env = rte_power_get_env();
+	if (env != PM_ENV_ACPI_CPUFREQ && env != PM_ENV_PSTATE_CPUFREQ) {
+		RTE_LOG(DEBUG, POWER, "Neither ACPI nor PSTATE modes were initialized\n");
+		return -ENOTSUP;
+	}
+
+	/* we're done */
+	return 0;
+}
+
+static int
+check_monitor(struct pmd_core_cfg *cfg, const struct queue *qdata)
+{
+	struct rte_power_monitor_cond dummy;
+
+	/* check if rte_power_monitor is supported */
+	if (!global_data.intrinsics_support.power_monitor) {
+		RTE_LOG(DEBUG, POWER, "Monitoring intrinsics are not supported\n");
+		return -ENOTSUP;
+	}
+
+	if (cfg->n_queues > 0) {
+		RTE_LOG(DEBUG, POWER, "Monitoring multiple queues is not supported\n");
+		return -ENOTSUP;
+	}
+
+	/* check if the device supports the necessary PMD API */
+	if (rte_eth_get_monitor_addr(qdata->portid, qdata->qid,
+			&dummy) == -ENOTSUP) {
+		RTE_LOG(DEBUG, POWER, "The device does not support rte_eth_get_monitor_addr\n");
+		return -ENOTSUP;
+	}
+
+	/* we're done */
+	return 0;
+}
+
 int
 rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 		uint16_t queue_id, enum rte_power_pmd_mgmt_type mode)
 {
-	struct pmd_queue_cfg *queue_cfg;
+	const struct queue qdata = {port_id, queue_id};
+	struct pmd_core_cfg *queue_cfg;
 	struct rte_eth_dev_info info;
 	rte_rx_callback_fn clb;
 	int ret;
@@ -183,9 +366,11 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 		goto end;
 	}
 
-	queue_cfg = &port_cfg[port_id][queue_id];
+	queue_cfg = &lcore_cfg[lcore_id];
 
-	if (queue_cfg->pwr_mgmt_state != PMD_MGMT_DISABLED) {
+	/* if callback was already enabled, check current callback type */
+	if (queue_cfg->pwr_mgmt_state != PMD_MGMT_DISABLED &&
+			queue_cfg->cb_mode != mode) {
 		ret = -EINVAL;
 		goto end;
 	}
@@ -195,53 +380,20 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 
 	switch (mode) {
 	case RTE_POWER_MGMT_TYPE_MONITOR:
-	{
-		struct rte_power_monitor_cond dummy;
-
-		/* check if rte_power_monitor is supported */
-		if (!global_data.intrinsics_support.power_monitor) {
-			RTE_LOG(DEBUG, POWER, "Monitoring intrinsics are not supported\n");
-			ret = -ENOTSUP;
+		/* check if we can add a new queue */
+		ret = check_monitor(queue_cfg, &qdata);
+		if (ret < 0)
 			goto end;
-		}
-		/* check if the device supports the necessary PMD API */
-		if (rte_eth_get_monitor_addr(port_id, queue_id,
-				&dummy) == -ENOTSUP) {
-			RTE_LOG(DEBUG, POWER, "The device does not support rte_eth_get_monitor_addr\n");
-			ret = -ENOTSUP;
-			goto end;
-		}
 		clb = clb_umwait;
 		break;
-	}
 	case RTE_POWER_MGMT_TYPE_SCALE:
-	{
-		enum power_management_env env;
-		/* only PSTATE and ACPI modes are supported */
-		if (!rte_power_check_env_supported(PM_ENV_ACPI_CPUFREQ) &&
-				!rte_power_check_env_supported(
-					PM_ENV_PSTATE_CPUFREQ)) {
-			RTE_LOG(DEBUG, POWER, "Neither ACPI nor PSTATE modes are supported\n");
-			ret = -ENOTSUP;
+		/* check if we can add a new queue */
+		ret = check_scale(lcore_id);
+		if (ret < 0)
 			goto end;
-		}
-		/* ensure we could initialize the power library */
-		if (rte_power_init(lcore_id)) {
-			ret = -EINVAL;
-			goto end;
-		}
-		/* ensure we initialized the correct env */
-		env = rte_power_get_env();
-		if (env != PM_ENV_ACPI_CPUFREQ &&
-				env != PM_ENV_PSTATE_CPUFREQ) {
-			RTE_LOG(DEBUG, POWER, "Neither ACPI nor PSTATE modes were initialized\n");
-			ret = -ENOTSUP;
-			goto end;
-		}
 		clb = clb_scale_freq;
 		break;
-	}
 	case RTE_POWER_MGMT_TYPE_PAUSE:
 		/* figure out various time-to-tsc conversions */
 		if (global_data.tsc_per_us == 0)
@@ -254,11 +406,20 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 		ret = -EINVAL;
 		goto end;
 	}
+	/* add this queue to the list */
+	ret = queue_list_add(queue_cfg, &qdata);
+	if (ret < 0) {
+		RTE_LOG(DEBUG, POWER, "Failed to add queue to list: %s\n",
+				strerror(-ret));
+		goto end;
+	}
 
 	/* initialize data before enabling the callback */
-	queue_cfg->empty_poll_stats = 0;
-	queue_cfg->cb_mode = mode;
-	queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
+	if (queue_cfg->n_queues == 1) {
+		queue_cfg->empty_poll_stats = 0;
+		queue_cfg->cb_mode = mode;
+		queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
+	}
 	queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, queue_id,
 			clb, NULL);
 
@@ -271,7 +432,9 @@ int
 rte_power_ethdev_pmgmt_queue_disable(unsigned int lcore_id,
 		uint16_t port_id, uint16_t queue_id)
 {
-	struct pmd_queue_cfg *queue_cfg;
+	const struct queue qdata = {port_id, queue_id};
+	struct pmd_core_cfg *queue_cfg;
+	int ret;
 
 	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
 
@@ -279,13 +442,24 @@ rte_power_ethdev_pmgmt_queue_disable(unsigned int lcore_id,
 		return -EINVAL;
 
 	/* no need to check queue id as wrong queue id would not be enabled */
-	queue_cfg = &port_cfg[port_id][queue_id];
+	queue_cfg = &lcore_cfg[lcore_id];
 
 	if (queue_cfg->pwr_mgmt_state != PMD_MGMT_ENABLED)
 		return -EINVAL;
 
-	/* stop any callbacks from progressing */
-	queue_cfg->pwr_mgmt_state = PMD_MGMT_DISABLED;
+	/*
+	 * There is no good/easy way to do this without race conditions, so we
+	 * are just going to throw our hands in the air and hope that the user
+	 * has read the documentation and has ensured that ports are stopped at
+	 * the time we enter the API functions.
+	 */
+	ret = queue_list_remove(queue_cfg, &qdata);
+	if (ret < 0)
+		return ret;
+
+	/* if we've removed all queues from the lists, set state to disabled */
+	if (queue_cfg->n_queues == 0)
+		queue_cfg->pwr_mgmt_state = PMD_MGMT_DISABLED;
 
 	switch (queue_cfg->cb_mode) {
 	case RTE_POWER_MGMT_TYPE_MONITOR: /* fall-through */
 	case RTE_POWER_MGMT_TYPE_PAUSE:
 		rte_eth_remove_rx_callback(port_id, queue_id,
 				queue_cfg->cur_cb);
@@ -309,3 +483,32 @@ rte_power_ethdev_pmgmt_queue_disable(unsigned int lcore_id,
 
 	return 0;
 }
+
+int
+rte_power_ethdev_pmgmt_queue_set_power_save(unsigned int lcore_id,
+		uint16_t port_id, uint16_t queue_id)
+{
+	const struct queue qdata = {port_id, queue_id};
+	struct pmd_core_cfg *queue_cfg;
+	int ret;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
+
+	if (lcore_id >= RTE_MAX_LCORE || queue_id >= RTE_MAX_QUEUES_PER_PORT)
+		return -EINVAL;
+
+	/* no need to check queue id as wrong queue id would not be enabled */
+	queue_cfg = &lcore_cfg[lcore_id];
+
+	if (queue_cfg->pwr_mgmt_state != PMD_MGMT_ENABLED)
+		return -EINVAL;
+
+	ret = queue_set_power_save(queue_cfg, &qdata);
+	if (ret < 0) {
+		RTE_LOG(DEBUG, POWER, "Failed to set power save queue: %s\n",
+				strerror(-ret));
+		return ret;
+	}
+
+	return 0;
+}
diff --git a/lib/power/rte_power_pmd_mgmt.h b/lib/power/rte_power_pmd_mgmt.h
index 7557f5d7e1..edf8d8714f 100644
--- a/lib/power/rte_power_pmd_mgmt.h
+++ b/lib/power/rte_power_pmd_mgmt.h
@@ -90,6 +90,40 @@ int
 rte_power_ethdev_pmgmt_queue_disable(unsigned int lcore_id,
 		uint16_t port_id, uint16_t queue_id);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice.
+ *
+ * Set a specific Ethernet device Rx queue to be the "power save" queue for a
+ * particular lcore. When multiple queues are assigned to a single lcore using
+ * the `rte_power_ethdev_pmgmt_queue_enable` API, only one of them will trigger
+ * the power management. In a typical scenario, the last queue to be polled on
+ * a particular lcore should be designated as the power save queue.
+ *
+ * @note This function is not thread-safe.
+ *
+ * @note When using multiple queues per lcore, calling this function is
+ *   mandatory. If it is not called, no power management routines will be
+ *   triggered when the traffic starts.
+ *
+ * @warning This function must be called when all affected Ethernet ports are
+ *   stopped and no Rx/Tx is in progress!
+ *
+ * @param lcore_id
+ *   The lcore the Rx queue is polled from.
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param queue_id
+ *   The queue identifier of the Ethernet device.
+ * @return
+ *   0 on success
+ *   <0 on error
+ */
+__rte_experimental
+int
+rte_power_ethdev_pmgmt_queue_set_power_save(unsigned int lcore_id,
+		uint16_t port_id, uint16_t queue_id);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/power/version.map b/lib/power/version.map
index b004e3e4a9..105d1d94c2 100644
--- a/lib/power/version.map
+++ b/lib/power/version.map
@@ -38,4 +38,7 @@ EXPERIMENTAL {
 	# added in 21.02
 	rte_power_ethdev_pmgmt_queue_disable;
 	rte_power_ethdev_pmgmt_queue_enable;
+
+	# added in 21.08
+	rte_power_ethdev_pmgmt_queue_set_power_save;
 };
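
To show the intended call sequence, a sketch of enabling two queues on one
lcore and marking the last-polled one as the power save queue (identifiers
hypothetical, error handling elided):

    /* both queues are polled from lcore_id; queue 1 is polled last */
    rte_power_ethdev_pmgmt_queue_enable(lcore_id, port_id, 0,
    		RTE_POWER_MGMT_TYPE_SCALE);
    rte_power_ethdev_pmgmt_queue_enable(lcore_id, port_id, 1,
    		RTE_POWER_MGMT_TYPE_SCALE);
    rte_power_ethdev_pmgmt_queue_set_power_save(lcore_id, port_id, 1);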
From patchwork Tue Jun 1 12:00:35 2021
X-Patchwork-Id: 93725
X-Patchwork-Delegate: thomas@monjalon.net
From: Anatoly Burakov
To: dev@dpdk.org, David Hunt
Cc: ciara.loftus@intel.com
Date: Tue, 1 Jun 2021 12:00:35 +0000
Subject: [dpdk-dev] [PATCH v1 6/7] power: support monitoring multiple Rx queues

Use the new multi-monitor intrinsic to allow monitoring multiple ethdev
Rx queues while entering the energy-efficient power state. The
multi-monitor version will be used unconditionally if supported, and the
UMWAIT one will only be used when multi-monitor is not supported by the
hardware.

Signed-off-by: Anatoly Burakov
---
 lib/power/rte_power_pmd_mgmt.c | 75 +++++++++++++++++++++++++++++++++-
 1 file changed, 73 insertions(+), 2 deletions(-)

diff --git a/lib/power/rte_power_pmd_mgmt.c b/lib/power/rte_power_pmd_mgmt.c
index 60dd21a19c..9e0b8bdfaf 100644
--- a/lib/power/rte_power_pmd_mgmt.c
+++ b/lib/power/rte_power_pmd_mgmt.c
@@ -147,6 +147,23 @@ queue_list_remove(struct pmd_core_cfg *cfg, const struct queue *q)
 	return 0;
 }
 
+static inline int
+get_monitor_addresses(struct pmd_core_cfg *cfg,
+		struct rte_power_monitor_cond *pmc)
+{
+	size_t i;
+	int ret;
+
+	for (i = 0; i < cfg->n_queues; i++) {
+		struct rte_power_monitor_cond *cur = &pmc[i];
+		struct queue *q = &cfg->queues[i];
+		ret = rte_eth_get_monitor_addr(q->portid, q->qid, cur);
+		if (ret < 0)
+			return ret;
+	}
+	return 0;
+}
+
 static void
 calc_tsc(void)
 {
@@ -175,6 +192,48 @@ calc_tsc(void)
 	}
 }
 
+static uint16_t
+clb_multiwait(uint16_t port_id, uint16_t qidx,
+		struct rte_mbuf **pkts __rte_unused, uint16_t nb_rx,
+		uint16_t max_pkts __rte_unused, void *addr __rte_unused)
+{
+	const unsigned int lcore = rte_lcore_id();
+	const struct queue q = {port_id, qidx};
+	const bool empty = nb_rx == 0;
+	struct pmd_core_cfg *q_conf;
+
+	q_conf = &lcore_cfg[lcore];
+
+	/* early exit */
+	if (likely(!empty)) {
+		q_conf->empty_poll_stats = 0;
+	} else {
+		/* do we care about this particular queue? */
+		if (!queue_is_power_save(q_conf, &q))
+			return nb_rx;
+
+		/*
+		 * we can increment unconditionally here because if there were
+		 * non-empty polls in other queues assigned to this core, we
+		 * dropped the counter to zero anyway.
+		 */
+		q_conf->empty_poll_stats++;
+		if (unlikely(q_conf->empty_poll_stats > EMPTYPOLL_MAX)) {
+			struct rte_power_monitor_cond pmc[RTE_MAX_ETHPORTS];
+			uint16_t ret;
+
+			/* gather all monitoring conditions */
+			ret = get_monitor_addresses(q_conf, pmc);
+
+			if (ret == 0)
+				rte_power_monitor_multi(pmc,
+						q_conf->n_queues, UINT64_MAX);
+		}
+	}
+
+	return nb_rx;
+}
+
 static uint16_t
 clb_umwait(uint16_t port_id, uint16_t qidx, struct rte_mbuf **pkts __rte_unused,
 		uint16_t nb_rx, uint16_t max_pkts __rte_unused,
@@ -315,14 +374,19 @@ static int
 check_monitor(struct pmd_core_cfg *cfg, const struct queue *qdata)
 {
 	struct rte_power_monitor_cond dummy;
+	bool multimonitor_supported;
 
 	/* check if rte_power_monitor is supported */
 	if (!global_data.intrinsics_support.power_monitor) {
 		RTE_LOG(DEBUG, POWER, "Monitoring intrinsics are not supported\n");
 		return -ENOTSUP;
 	}
+	/* check if multi-monitor is supported */
+	multimonitor_supported =
+			global_data.intrinsics_support.power_monitor_multi;
 
-	if (cfg->n_queues > 0) {
+	/* if we're adding a new queue, do we support multiple queues? */
+	if (cfg->n_queues > 0 && !multimonitor_supported) {
 		RTE_LOG(DEBUG, POWER, "Monitoring multiple queues is not supported\n");
 		return -ENOTSUP;
 	}
@@ -338,6 +402,13 @@ check_monitor(struct pmd_core_cfg *cfg, const struct queue *qdata)
 	return 0;
 }
 
+static inline rte_rx_callback_fn
+get_monitor_callback(void)
+{
+	return global_data.intrinsics_support.power_monitor_multi ?
+			clb_multiwait : clb_umwait;
+}
+
 int
 rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 		uint16_t queue_id, enum rte_power_pmd_mgmt_type mode)
@@ -385,7 +456,7 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 		if (ret < 0)
 			goto end;
 
-		clb = clb_umwait;
+		clb = get_monitor_callback();
 		break;
 	case RTE_POWER_MGMT_TYPE_SCALE:
 		/* check if we can add a new queue */
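
An application can tell in advance which callback the library will pick by
querying the same capability bits; a small sketch:

    #include <stdio.h>
    #include <rte_cpuflags.h>

    struct rte_cpu_intrinsics intr;

    rte_cpu_get_intrinsics_support(&intr);
    if (intr.power_monitor_multi)
    	printf("RTM + WAITPKG present: multi-queue monitoring available\n");
    else if (intr.power_monitor)
    	printf("WAITPKG only: monitoring limited to one queue per lcore\n");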
From patchwork Tue Jun 1 12:00:36 2021
X-Patchwork-Id: 93726
X-Patchwork-Delegate: thomas@monjalon.net
From: Anatoly Burakov
To: dev@dpdk.org, David Hunt
Cc: ciara.loftus@intel.com
Date: Tue, 1 Jun 2021 12:00:36 +0000
Message-Id: <545b85cf2de1cf5f3dd5efd137f2f2ecd92ac2da.1622548381.git.anatoly.burakov@intel.com>
Subject: [dpdk-dev] [PATCH v1 7/7] l3fwd-power: support multiqueue in PMD pmgmt modes

Currently, l3fwd-power enforces the limitation of having one queue per
lcore. This is no longer necessary, so remove the limitation, and always
mark the last queue in qconf as the power save queue.
Signed-off-by: Anatoly Burakov
---
 examples/l3fwd-power/main.c | 39 +++++++++++++++++++++++--------------
 1 file changed, 24 insertions(+), 15 deletions(-)

diff --git a/examples/l3fwd-power/main.c b/examples/l3fwd-power/main.c
index f8dfed1634..3057c06936 100644
--- a/examples/l3fwd-power/main.c
+++ b/examples/l3fwd-power/main.c
@@ -2498,6 +2498,27 @@ mode_to_str(enum appmode mode)
 	}
 }
 
+static void
+pmd_pmgmt_set_up(unsigned int lcore, uint16_t portid, uint16_t qid, bool last)
+{
+	int ret;
+
+	ret = rte_power_ethdev_pmgmt_queue_enable(lcore, portid,
+			qid, pmgmt_type);
+	if (ret < 0)
+		rte_exit(EXIT_FAILURE,
+			"rte_power_ethdev_pmgmt_queue_enable: err=%d, port=%d\n",
+			ret, portid);
+
+	if (!last)
+		return;
+	ret = rte_power_ethdev_pmgmt_queue_set_power_save(lcore, portid, qid);
+	if (ret < 0)
+		rte_exit(EXIT_FAILURE,
+			"rte_power_ethdev_pmgmt_queue_set_power_save: err=%d, port=%d\n",
+			ret, portid);
+}
+
 int
 main(int argc, char **argv)
 {
@@ -2723,12 +2744,6 @@ main(int argc, char **argv)
 		printf("\nInitializing rx queues on lcore %u ... ", lcore_id);
 		fflush(stdout);
 
-		/* PMD power management mode can only do 1 queue per core */
-		if (app_mode == APP_MODE_PMD_MGMT && qconf->n_rx_queue > 1) {
-			rte_exit(EXIT_FAILURE,
-				"In PMD power management mode, only one queue per lcore is allowed\n");
-		}
-
 		/* init RX queues */
 		for (queue = 0; queue < qconf->n_rx_queue; ++queue) {
 			struct rte_eth_rxconf rxq_conf;
@@ -2767,15 +2782,9 @@ main(int argc, char **argv)
 					"Fail to add ptype cb\n");
 		}
 
-		if (app_mode == APP_MODE_PMD_MGMT) {
-			ret = rte_power_ethdev_pmgmt_queue_enable(
-					lcore_id, portid, queueid,
-					pmgmt_type);
-			if (ret < 0)
-				rte_exit(EXIT_FAILURE,
-					"rte_power_ethdev_pmgmt_queue_enable: err=%d, port=%d\n",
-					ret, portid);
-		}
+		if (app_mode == APP_MODE_PMD_MGMT)
+			pmd_pmgmt_set_up(lcore_id, portid, queueid,
+				queue == (qconf->n_rx_queue - 1));
 	}
 }
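
With this change, a multi-queue PMD power management run of the sample app
becomes possible; a hypothetical invocation (assuming the app's existing
--pmd-mgmt and --config options, with both Rx queues of port 0 polled from
lcore 2):

    ./dpdk-l3fwd-power -l 1-2 -n 4 -- -p 0x1 -P \
    	--pmd-mgmt=scale --config="(0,0,2),(0,1,2)"

Here queue (0,1) is the last queue in lcore 2's qconf, so the app will mark
it as the power save queue via pmd_pmgmt_set_up().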