From patchwork Tue Aug 11 10:27:42 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Liang, Ma" X-Patchwork-Id: 75419 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id 3ECFDA04C3; Tue, 11 Aug 2020 12:28:00 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 7C9EE1C00D; Tue, 11 Aug 2020 12:27:59 +0200 (CEST) Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by dpdk.org (Postfix) with ESMTP id C79651B203 for ; Tue, 11 Aug 2020 12:27:57 +0200 (CEST) IronPort-SDR: +b1i0sQAtydi6BZbv94U6yZpqpih5hnda5CJzz85WM7Fulx/hTUAUm/kBuqHEWVEdk+gJkglZm DE/pciVYlhuw== X-IronPort-AV: E=McAfee;i="6000,8403,9709"; a="154825039" X-IronPort-AV: E=Sophos;i="5.75,460,1589266800"; d="scan'208";a="154825039" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga006.jf.intel.com ([10.7.209.51]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Aug 2020 03:27:56 -0700 IronPort-SDR: KYO/EM2VZqt5/HOtWTphUxSWQjc596wrr0bcqz4Hun9ff30MHnKFZU933CmCSC3bReMQ0D7V7u eOwPc8JOAGLQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.75,460,1589266800"; d="scan'208";a="294675613" Received: from irvmail001.ir.intel.com ([163.33.26.43]) by orsmga006.jf.intel.com with ESMTP; 11 Aug 2020 03:27:55 -0700 Received: from sivswdev09.ir.intel.com (sivswdev09.ir.intel.com [10.237.217.48]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id 07BARtMg012798; Tue, 11 Aug 2020 11:27:55 +0100 Received: from sivswdev09.ir.intel.com (localhost [127.0.0.1]) by sivswdev09.ir.intel.com with ESMTP id 07BARscJ020755; Tue, 11 Aug 2020 11:27:54 +0100 Received: (from lma25@localhost) by sivswdev09.ir.intel.com with LOCAL id 07BARsvk020751; Tue, 11 Aug 
2020 11:27:54 +0100 From: Liang Ma To: dev@dpdk.org Cc: anatoly.burakov@intel.com, Liang Ma Date: Tue, 11 Aug 2020 11:27:42 +0100 Message-Id: <1597141666-20621-1-git-send-email-liang.j.ma@intel.com> X-Mailer: git-send-email 1.7.7.4 In-Reply-To: References: Subject: [dpdk-dev] [RFC v2 1/5] eal: add power management intrinsics X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Add two new power management intrinsics, and provide an implementation in eal/x86 based on UMONITOR/UMWAIT instructions. The instructions are implemented as raw byte opcodes because there is not yet widespread compiler support for these instructions. The power management instructions provide an architecture-specific function to either wait until a specified TSC timestamp is reached, or optionally wait until either a TSC timestamp is reached or a memory location is written to. The monitor function also provides an optional comparison, to avoid sleeping when the expected write has already happened, and no more writes are expected. 
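For illustration only (this sketch is not part of the patch), the optional comparison and the TSC deadline handling described above can be modelled in portable C. The helper names `monitor_should_skip` and `split_tsc` are hypothetical and exist only for this example. The sketch assumes that a zero mask disables the check and that the 64-bit deadline is passed to the instruction as two 32-bit halves in EDX:EAX.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of this patch: returns 1 when the caller
 * should skip sleeping because the masked value at `p` already equals
 * `expected_value`. A zero mask disables the check.
 */
int
monitor_should_skip(const volatile uint64_t *p, uint64_t expected_value,
		uint64_t value_mask)
{
	if (value_mask == 0)
		return 0;
	return (*p & value_mask) == expected_value;
}

/*
 * Hypothetical helper, not part of this patch: split a 64-bit TSC deadline
 * into the low and high 32-bit halves that are loaded into EAX and EDX.
 */
void
split_tsc(uint64_t tsc_timestamp, uint32_t *lo, uint32_t *hi)
{
	*lo = (uint32_t)tsc_timestamp;
	*hi = (uint32_t)(tsc_timestamp >> 32);
}
```

A caller would typically arm the monitor first, then evaluate the comparison, and only enter the wait when the comparison says the expected write has not happened yet.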
Signed-off-by: Liang Ma Signed-off-by: Anatoly Burakov --- .../include/generic/rte_power_intrinsics.h | 64 ++++++++ lib/librte_eal/include/meson.build | 1 + lib/librte_eal/x86/include/meson.build | 1 + lib/librte_eal/x86/include/rte_cpuflags.h | 1 + .../x86/include/rte_power_intrinsics.h | 138 ++++++++++++++++++ lib/librte_eal/x86/rte_cpuflags.c | 2 + 6 files changed, 207 insertions(+) create mode 100644 lib/librte_eal/include/generic/rte_power_intrinsics.h create mode 100644 lib/librte_eal/x86/include/rte_power_intrinsics.h diff --git a/lib/librte_eal/include/generic/rte_power_intrinsics.h b/lib/librte_eal/include/generic/rte_power_intrinsics.h new file mode 100644 index 000000000..8646c4ac1 --- /dev/null +++ b/lib/librte_eal/include/generic/rte_power_intrinsics.h @@ -0,0 +1,64 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2020 Intel Corporation + */ + +#ifndef _RTE_POWER_INTRINSIC_H_ +#define _RTE_POWER_INTRINSIC_H_ + +#include + +/** + * @file + * Advanced power management operations. + * + * This file define APIs for advanced power management, + * which are architecture-dependent. + */ + +/** + * Monitor specific address for changes. This will cause the CPU to enter an + * architecture-defined optimized power state until either the specified + * memory address is written to, or a certain TSC timestamp is reached. + * + * Additionally, an `expected` 64-bit value and 64-bit mask are provided. If + * mask is non-zero, the current value pointed to by the `p` pointer will be + * checked against the expected value, and if they match, the entering of + * optimized power state may be aborted. + * + * @param p + * Address to monitor for changes. Must be aligned on an 8-byte boundary. + * @param expected_value + * Before attempting the monitoring, the `p` address may be read and compared + * against this value. If `value_mask` is zero, this step will be skipped. + * @param value_mask + * The 64-bit mask to use to extract current value from `p`. 
+ * @param state + * Architecture-dependent optimized power state number + * @param tsc_timestamp + * Maximum TSC timestamp to wait for. Note that the wait behavior is + * architecture-dependent. + * + * @return + * Architecture-dependent return value. + */ +static inline int rte_power_monitor(const volatile void *p, + const uint64_t expected_value, const uint64_t value_mask, + const uint32_t state, const uint64_t tsc_timestamp); + +/** + * Enter an architecture-defined optimized power state until a certain TSC + * timestamp is reached. + * + * @param state + * Architecture-dependent optimized power state number + * @param tsc_timestamp + * Maximum TSC timestamp to wait for. Note that the wait behavior is + * architecture-dependent. + * + * @return + * Architecture-dependent return value. + */ +static inline int rte_power_pause(const uint32_t state, + const uint64_t tsc_timestamp); + +#endif /* _RTE_POWER_INTRINSIC_H_ */ diff --git a/lib/librte_eal/include/meson.build b/lib/librte_eal/include/meson.build index cd0902795..3a12e87e1 100644 --- a/lib/librte_eal/include/meson.build +++ b/lib/librte_eal/include/meson.build @@ -60,6 +60,7 @@ generic_headers = files( 'generic/rte_memcpy.h', 'generic/rte_pause.h', 'generic/rte_prefetch.h', + 'generic/rte_power_intrinsics.h', 'generic/rte_rwlock.h', 'generic/rte_spinlock.h', 'generic/rte_ticketlock.h', diff --git a/lib/librte_eal/x86/include/meson.build b/lib/librte_eal/x86/include/meson.build index f0e998c2f..494a8142a 100644 --- a/lib/librte_eal/x86/include/meson.build +++ b/lib/librte_eal/x86/include/meson.build @@ -13,6 +13,7 @@ arch_headers = files( 'rte_io.h', 'rte_memcpy.h', 'rte_prefetch.h', + 'rte_power_intrinsics.h', 'rte_pause.h', 'rte_rtm.h', 'rte_rwlock.h', diff --git a/lib/librte_eal/x86/include/rte_cpuflags.h b/lib/librte_eal/x86/include/rte_cpuflags.h index c1d20364d..94d6a4376 100644 --- a/lib/librte_eal/x86/include/rte_cpuflags.h +++ b/lib/librte_eal/x86/include/rte_cpuflags.h @@ -110,6 +110,7 @@ enum 
rte_cpu_flag_t { RTE_CPUFLAG_RDTSCP, /**< RDTSCP */ RTE_CPUFLAG_EM64T, /**< EM64T */ + RTE_CPUFLAG_WAITPKG, /**< UMONITOR/UMWAIT/TPAUSE */ /* (EAX 80000007h) EDX features */ RTE_CPUFLAG_INVTSC, /**< INVTSC */ diff --git a/lib/librte_eal/x86/include/rte_power_intrinsics.h b/lib/librte_eal/x86/include/rte_power_intrinsics.h new file mode 100644 index 000000000..af8aa9459 --- /dev/null +++ b/lib/librte_eal/x86/include/rte_power_intrinsics.h @@ -0,0 +1,138 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2020 Intel Corporation + */ + +#ifndef _RTE_POWER_INTRINSIC_X86_64_H_ +#define _RTE_POWER_INTRINSIC_X86_64_H_ + +#ifdef __cplusplus +extern "C" { +#endif + +#include +#include + +#include "generic/rte_power_intrinsics.h" + +/** + * Monitor specific address for changes. This will cause the CPU to enter an + * architecture-defined optimized power state until either the specified + * memory address is written to, or a certain TSC timestamp is reached. + * + * Additionally, an `expected` 64-bit value and 64-bit mask are provided. If + * mask is non-zero, the current value pointed to by the `p` pointer will be + * checked against the expected value, and if they match, the entering of + * optimized power state may be aborted. + * + * This function uses UMONITOR/UMWAIT instructions. For more information about + * their usage, please refer to Intel(R) 64 and IA-32 Architectures Software + * Developer's Manual. + * + * @param p + * Address to monitor for changes. Must be aligned on an 8-byte boundary. + * @param expected_value + * Before attempting the monitoring, the `p` address may be read and compared + * against this value. If `value_mask` is zero, this step will be skipped. + * @param value_mask + * The 64-bit mask to use to extract current value from `p`. + * @param state + * Architecture-dependent optimized power state number. Can be 0 (C0.2) or + * 1 (C0.1). + * @param tsc_timestamp + * Maximum TSC timestamp to wait for.
+ * + * @return + * - 1 if wakeup was due to TSC timeout expiration. + * - 0 if wakeup was due to memory write or other reasons. + */ +static inline int rte_power_monitor(const volatile void *p, + const uint64_t expected_value, const uint64_t value_mask, + const uint32_t state, const uint64_t tsc_timestamp) +{ + const uint32_t tsc_l = (uint32_t)tsc_timestamp; + const uint32_t tsc_h = (uint32_t)(tsc_timestamp >> 32); + +#ifdef RTE_ARCH_I686 + uint32_t rflags; +#else + uint64_t rflags; +#endif + /* + * we're using raw byte codes for now as only the newest compiler + * versions support this instruction natively. + */ + + /* set address for UMONITOR */ + asm volatile(".byte 0xf3, 0x0f, 0xae, 0xf7;" + : + : "D"(p)); + rte_mb(); + if (value_mask) { + const uint64_t cur_value = *(const volatile uint64_t *)p; + const uint64_t masked = cur_value & value_mask; + /* if the masked value is already matching, abort */ + if (masked == expected_value) + return 0; + } + /* execute UMWAIT */ + asm volatile(".byte 0xf2, 0x0f, 0xae, 0xf7;\n" + /* + * UMWAIT sets CF flag in RFLAGS, so PUSHF to push them + * onto the stack, then pop them back into `rflags` so that + * we can read it. + */ + "pushf;\n" + "pop %0;\n" + : "=r"(rflags) + : "D"(state), "a"(tsc_l), "d"(tsc_h)); + + /* we're interested in the first bit (the carry flag) */ + return rflags & 0x1; +} + +/** + * Enter an architecture-defined optimized power state until a certain TSC + * timestamp is reached. + * + * This function uses TPAUSE instruction. For more information about its usage, + * please refer to Intel(R) 64 and IA-32 Architectures Software Developer's + * Manual. + * + * @param state + * Architecture-dependent optimized power state number. Can be 0 (C0.2) or + * 1 (C0.1). + * @param tsc_timestamp + * Maximum TSC timestamp to wait for. + * + * @return + * - 1 if wakeup was due to TSC timeout expiration. + * - 0 if wakeup was due to other reasons. 
+ */ +static inline int rte_power_pause(const uint32_t state, + const uint64_t tsc_timestamp) +{ + const uint32_t tsc_l = (uint32_t)tsc_timestamp; + const uint32_t tsc_h = (uint32_t)(tsc_timestamp >> 32); + uint64_t rflags; + + /* execute TPAUSE */ + asm volatile(".byte 0x66, 0x0f, 0xae, 0xf7;\n" + /* + * TPAUSE sets CF flag in RFLAGS, so PUSHF to push them + * onto the stack, then pop them back into `rflags` so that + * we can read it. + */ + "pushf;\n" + "pop %0;\n" + : "=r"(rflags) + : "D"(state), "a"(tsc_l), "d"(tsc_h)); + + /* we're interested in the first bit (the carry flag) */ + return rflags & 0x1; +} + +#ifdef __cplusplus +} +#endif + +#endif /* _RTE_POWER_INTRINSIC_X86_64_H_ */ diff --git a/lib/librte_eal/x86/rte_cpuflags.c b/lib/librte_eal/x86/rte_cpuflags.c index 30439e795..0325c4b93 100644 --- a/lib/librte_eal/x86/rte_cpuflags.c +++ b/lib/librte_eal/x86/rte_cpuflags.c @@ -110,6 +110,8 @@ const struct feature_entry rte_cpu_feature_table[] = { FEAT_DEF(AVX512F, 0x00000007, 0, RTE_REG_EBX, 16) FEAT_DEF(RDSEED, 0x00000007, 0, RTE_REG_EBX, 18) + FEAT_DEF(WAITPKG, 0x00000007, 0, RTE_REG_ECX, 5) + FEAT_DEF(LAHF_SAHF, 0x80000001, 0, RTE_REG_ECX, 0) FEAT_DEF(LZCNT, 0x80000001, 0, RTE_REG_ECX, 4)
From patchwork Tue Aug 11 10:27:43 2020
X-Patchwork-Submitter: "Liang, Ma"
X-Patchwork-Id: 75420
X-Patchwork-Delegate: thomas@monjalon.net
From: Liang Ma
To: dev@dpdk.org
Cc: anatoly.burakov@intel.com, Liang Ma
Date: Tue, 11 Aug 2020 11:27:43 +0100
Message-Id: <1597141666-20621-2-git-send-email-liang.j.ma@intel.com>
In-Reply-To: <1597141666-20621-1-git-send-email-liang.j.ma@intel.com>
References: <1597141666-20621-1-git-send-email-liang.j.ma@intel.com>
Subject: [dpdk-dev] [RFC v2 2/5] ethdev: add simple power management API and callback

Add a simple on/off switch that will enable saving power when no packets are arriving.
It is based on counting the number of empty polls and, when the number reaches a certain threshold, entering an architecture-defined optimized power state that waits until either a TSC timestamp expires or packets arrive.

This API is limited to the one-core, one-queue use case, as there is no coordination between queues/cores in ethdev.

This design leverages the RX callback mechanism, which allows three different power management methods to coexist:

1. umwait/umonitor: The TSC timestamp is automatically calculated using the current link speed and RX descriptor ring size, such that the sleep time is not longer than it would take for a NIC to fill its entire RX descriptor ring.

2. Pause instruction: Instead of moving the core into a deeper C-state, this lightweight method uses the PAUSE instruction to relieve the processor from busy polling.

3. Frequency scaling: Reuse the existing rte_power library to scale the core frequency up or down depending on traffic volume.

Signed-off-by: Liang Ma
Signed-off-by: Anatoly Burakov
---
config/common_base | 4 +- lib/Makefile | 1 + lib/librte_ethdev/Makefile | 2 +- lib/librte_ethdev/meson.build | 2 +- lib/librte_ethdev/rte_ethdev.c | 198 +++++++++++++++++++++++ lib/librte_ethdev/rte_ethdev.h | 59 +++++++ lib/librte_ethdev/rte_ethdev_core.h | 43 ++++- lib/librte_ethdev/rte_ethdev_version.map | 4 + lib/meson.build | 5 +- mk/rte.app.mk | 2 +- 10 files changed, 311 insertions(+), 9 deletions(-) diff --git a/config/common_base b/config/common_base index f76585f16..e0948f0cb 100644 --- a/config/common_base +++ b/config/common_base @@ -155,7 +155,7 @@ CONFIG_RTE_MAX_ETHPORTS=32 CONFIG_RTE_MAX_QUEUES_PER_PORT=1024 CONFIG_RTE_LIBRTE_IEEE1588=n CONFIG_RTE_ETHDEV_QUEUE_STAT_CNTRS=16 -CONFIG_RTE_ETHDEV_RXTX_CALLBACKS=y +CONFIG_RTE_ETHDEV_RXTX_CALLBACKS=n CONFIG_RTE_ETHDEV_PROFILE_WITH_VTUNE=n # @@ -978,7 +978,7 @@ CONFIG_RTE_LIBRTE_ACL_DEBUG=n # # Compile librte_power # -CONFIG_RTE_LIBRTE_POWER=n +CONFIG_RTE_LIBRTE_POWER=y CONFIG_RTE_LIBRTE_POWER_DEBUG=n
CONFIG_RTE_MAX_LCORE_FREQS=64 diff --git a/lib/Makefile b/lib/Makefile index 8f5b68a2d..87646698a 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -28,6 +28,7 @@ DEPDIRS-librte_ethdev := librte_net librte_eal librte_mempool librte_ring DEPDIRS-librte_ethdev += librte_mbuf DEPDIRS-librte_ethdev += librte_kvargs DEPDIRS-librte_ethdev += librte_meter +DEPDIRS-librte_ethdev += librte_power DIRS-$(CONFIG_RTE_LIBRTE_BBDEV) += librte_bbdev DEPDIRS-librte_bbdev := librte_eal librte_mempool librte_mbuf DIRS-$(CONFIG_RTE_LIBRTE_CRYPTODEV) += librte_cryptodev diff --git a/lib/librte_ethdev/Makefile b/lib/librte_ethdev/Makefile index 47747150b..6a4ce14cf 100644 --- a/lib/librte_ethdev/Makefile +++ b/lib/librte_ethdev/Makefile @@ -11,7 +11,7 @@ LIB = librte_ethdev.a CFLAGS += -O3 CFLAGS += $(WERROR_FLAGS) LDLIBS += -lrte_net -lrte_eal -lrte_mempool -lrte_ring -LDLIBS += -lrte_mbuf -lrte_kvargs -lrte_meter -lrte_telemetry +LDLIBS += -lrte_mbuf -lrte_kvargs -lrte_meter -lrte_telemetry -lrte_power EXPORT_MAP := rte_ethdev_version.map diff --git a/lib/librte_ethdev/meson.build b/lib/librte_ethdev/meson.build index 8fc24e8c8..e09e2395e 100644 --- a/lib/librte_ethdev/meson.build +++ b/lib/librte_ethdev/meson.build @@ -27,4 +27,4 @@ headers = files('rte_ethdev.h', 'rte_tm.h', 'rte_tm_driver.h') -deps += ['net', 'kvargs', 'meter', 'telemetry'] +deps += ['net', 'kvargs', 'meter', 'telemetry', 'power'] diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c index 7858ad5f1..b43de88ce 100644 --- a/lib/librte_ethdev/rte_ethdev.c +++ b/lib/librte_ethdev/rte_ethdev.c @@ -16,6 +16,7 @@ #include #include +#include #include #include #include @@ -39,6 +40,7 @@ #include #include #include +#include #include "rte_ethdev_trace.h" #include "rte_ethdev.h" @@ -185,6 +187,100 @@ enum { STAT_QMAP_RX }; + +static uint16_t +rte_ethdev_pmgmt_umait(uint16_t port_id, uint16_t qidx, + struct rte_mbuf **pkts __rte_unused, uint16_t nb_rx, + uint16_t max_pkts __rte_unused, void *_ 
__rte_unused) +{ + + struct rte_eth_dev *dev = &rte_eth_devices[port_id]; + + if (dev->pwr_mgmt_state == RTE_ETH_DEV_POWER_MGMT_ENABLED) { + if (unlikely(nb_rx == 0)) { + dev->empty_poll_stats[qidx].num++; + if (unlikely(dev->empty_poll_stats[qidx].num > + ETH_EMPTYPOLL_MAX)) { + volatile void *target_addr; + uint64_t expected, mask; + int ret; + + /* + * get address of next descriptor in the RX + * ring for this queue, as well as expected + * value and a mask. + */ + ret = (*dev->dev_ops->next_rx_desc) + (dev->data->rx_queues[qidx], + &target_addr, &expected, &mask); + if (ret == 0) + /* -1ULL is maximum value for TSC */ + rte_power_monitor(target_addr, + expected, mask, + 0, -1ULL); + } + } else + dev->empty_poll_stats[qidx].num = 0; + } + + return nb_rx; +} + +static uint16_t +rte_ethdev_pmgmt_pause(uint16_t port_id, uint16_t qidx, + struct rte_mbuf **pkts __rte_unused, uint16_t nb_rx, + uint16_t max_pkts __rte_unused, void *_ __rte_unused) +{ + struct rte_eth_dev *dev = &rte_eth_devices[port_id]; + + int i; + + if (dev->pwr_mgmt_state == RTE_ETH_DEV_POWER_MGMT_ENABLED) { + if (unlikely(nb_rx == 0)) { + + dev->empty_poll_stats[qidx].num++; + + if (unlikely(dev->empty_poll_stats[qidx].num > + ETH_EMPTYPOLL_MAX)) { + + for (i = 0; i < RTE_ETH_PAUSE_NUM; i++) + rte_pause(); + + } + } else + dev->empty_poll_stats[qidx].num = 0; + } + + return nb_rx; +} + +static uint16_t +rte_ethdev_pmgmt_scalefreq(uint16_t port_id, uint16_t qidx, + struct rte_mbuf **pkts __rte_unused, uint16_t nb_rx, + uint16_t max_pkts __rte_unused, void *_ __rte_unused) +{ + struct rte_eth_dev *dev = &rte_eth_devices[port_id]; + + if (dev->pwr_mgmt_state == RTE_ETH_DEV_POWER_MGMT_ENABLED) { + if (unlikely(nb_rx == 0)) { + dev->empty_poll_stats[qidx].num++; + if (unlikely(dev->empty_poll_stats[qidx].num > + ETH_EMPTYPOLL_MAX)) { + + /* scale down freq */ + rte_power_freq_min(rte_lcore_id()); + + } + } else { + dev->empty_poll_stats[qidx].num = 0; + /* scale up freq */ + rte_power_freq_max(rte_lcore_id()); + } + } + + return nb_rx; +} + int rte_eth_iterator_init(struct rte_dev_iterator *iter, const char *devargs_str) { @@ -5113,6 +5209,108 @@ rte_eth_dev_pool_ops_supported(uint16_t port_id, const char *pool) return (*dev->dev_ops->pool_ops_supported)(dev, pool); } +int +rte_eth_dev_power_mgmt_enable(unsigned int lcore_id, + uint16_t port_id, + enum rte_eth_dev_power_mgmt_cb_mode mode) +{ + struct rte_eth_dev *dev; + + RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL); + dev = &rte_eth_devices[port_id]; + + if (dev->pwr_mgmt_state == RTE_ETH_DEV_POWER_MGMT_ENABLED) + return -EINVAL; + + /* allocate zeroed memory for empty poll stats */ + dev->empty_poll_stats = rte_zmalloc_socket(NULL, + sizeof(struct rte_eth_ep_stat) + * RTE_MAX_QUEUES_PER_PORT, + 0, dev->data->numa_node); + + if (dev->empty_poll_stats == NULL) + return -ENOMEM; + + dev->cb_mode = mode; + + switch (mode) { + + case RTE_ETH_DEV_POWER_MGMT_CB_UMWAIT: + + if (!rte_cpu_get_flag_enabled(RTE_CPUFLAG_WAITPKG)) + return -ENOTSUP; + + dev->cur_pwr_cb = rte_eth_add_rx_callback(port_id, 0, + rte_ethdev_pmgmt_umait, NULL); + break; + + case RTE_ETH_DEV_POWER_MGMT_CB_SCALE: + + /* init scale freq */ + if (rte_power_init(lcore_id)) + return -EINVAL; + + dev->cur_pwr_cb = rte_eth_add_rx_callback(port_id, 0, + rte_ethdev_pmgmt_scalefreq, NULL); + break; + + case RTE_ETH_DEV_POWER_MGMT_CB_PAUSE: + + dev->cur_pwr_cb = rte_eth_add_rx_callback(port_id, 0, + rte_ethdev_pmgmt_pause, NULL); + break; + + } + + dev->pwr_mgmt_state = RTE_ETH_DEV_POWER_MGMT_ENABLED; + return 0; +} + +int +rte_eth_dev_power_mgmt_disable(unsigned int lcore_id, + uint16_t port_id) +{ + struct rte_eth_dev *dev; + + RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL); + dev = &rte_eth_devices[port_id]; + + /* add flag check */ + + if (dev->pwr_mgmt_state == RTE_ETH_DEV_POWER_MGMT_ENABLED) { + /* rte_free ignores NULL so safe to call without checks */ + rte_free(dev->empty_poll_stats); + + switch (dev->cb_mode) { + + case
RTE_ETH_DEV_POWER_MGMT_CB_UMWAIT: + + case RTE_ETH_DEV_POWER_MGMT_CB_PAUSE: + + rte_eth_remove_rx_callback(port_id, 0, + dev->cur_pwr_cb); + + break; + + case RTE_ETH_DEV_POWER_MGMT_CB_SCALE: + + rte_power_freq_max(lcore_id); + + rte_eth_remove_rx_callback(port_id, 0, + dev->cur_pwr_cb); + + if (rte_power_exit(lcore_id)) + return -EINVAL; + + break; + } + + dev->pwr_mgmt_state = RTE_ETH_DEV_POWER_MGMT_DISABLED; + + } + return 0; +} + /** * A set of values to describe the possible states of a switch domain. */ diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h index 57e4a6ca5..6858c0338 100644 --- a/lib/librte_ethdev/rte_ethdev.h +++ b/lib/librte_ethdev/rte_ethdev.h @@ -157,6 +157,7 @@ extern "C" { #include #include #include +#include #include "rte_ethdev_trace_fp.h" #include "rte_dev_info.h" @@ -775,6 +776,7 @@ rte_eth_rss_hf_refine(uint64_t rss_hf) /** Maximum nb. of vlan per mirror rule */ #define ETH_MIRROR_MAX_VLANS 64 +#define ETH_EMPTYPOLL_MAX 512 /**< Empty poll number threshold */ #define ETH_MIRROR_VIRTUAL_POOL_UP 0x01 /**< Virtual Pool uplink Mirroring. */ #define ETH_MIRROR_UPLINK_PORT 0x02 /**< Uplink Port Mirroring. */ #define ETH_MIRROR_DOWNLINK_PORT 0x04 /**< Downlink Port Mirroring. */ @@ -1603,6 +1605,25 @@ enum rte_eth_dev_state { RTE_ETH_DEV_REMOVED, }; +#define RTE_ETH_PAUSE_NUM 64 /* How many times to pause */ +/** + * Possible power management states of an ethdev port. + */ +enum rte_eth_dev_power_mgmt_state { + /** Device power management is disabled. */ + RTE_ETH_DEV_POWER_MGMT_DISABLED = 0, + /** Device power management is enabled. */ + RTE_ETH_DEV_POWER_MGMT_ENABLED, +}; + +enum rte_eth_dev_power_mgmt_cb_mode { + /** Wait for packets using UMONITOR/UMWAIT. */ + RTE_ETH_DEV_POWER_MGMT_CB_UMWAIT = 0, + /** Busy-wait using the PAUSE instruction. */ + RTE_ETH_DEV_POWER_MGMT_CB_PAUSE, + RTE_ETH_DEV_POWER_MGMT_CB_SCALE, +}; + struct rte_eth_dev_sriov { uint8_t active; /**< SRIOV is active with 16, 32 or 64 pools */ uint8_t nb_q_per_pool; /**< rx queue number per pool */ @@ -4415,6 +4436,40 @@ __rte_experimental int rte_eth_dev_hairpin_capability_get(uint16_t port_id, struct rte_eth_hairpin_cap *cap); +/** + * @warning + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice + * + * Enable device power management. + * + * @param port_id + * The port identifier of the Ethernet device. + * + * @return + * 0 on success + * <0 on error + */ +__rte_experimental +int rte_eth_dev_power_mgmt_enable(unsigned int lcore_id, + uint16_t port_id, + enum rte_eth_dev_power_mgmt_cb_mode mode); + +/** + * @warning + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice + * + * Disable device power management. + * + * @param port_id + * The port identifier of the Ethernet device. + * + * @return + * 0 on success + * <0 on error + */ +__rte_experimental +int rte_eth_dev_power_mgmt_disable(unsigned int lcore_id, uint16_t port_id); + #include /** @@ -4535,6 +4590,7 @@ rte_eth_rx_burst(uint16_t port_id, uint16_t queue_id, return nb_rx; } + /** * Get the number of used descriptors of a rx queue * @@ -4993,6 +5049,9 @@ rte_eth_tx_buffer(uint16_t port_id, uint16_t queue_id, return rte_eth_tx_buffer_flush(port_id, queue_id, buffer); } + + + #ifdef __cplusplus } #endif diff --git a/lib/librte_ethdev/rte_ethdev_core.h b/lib/librte_ethdev/rte_ethdev_core.h index 32407dd41..7d6d85ddc 100644 --- a/lib/librte_ethdev/rte_ethdev_core.h +++ b/lib/librte_ethdev/rte_ethdev_core.h @@ -603,6 +603,27 @@ typedef int (*eth_tx_hairpin_queue_setup_t) uint16_t nb_tx_desc, const struct rte_eth_hairpin_conf *hairpin_conf); +/** + * @internal + * Get the next RX ring descriptor address. + * + * @param rxq + * ethdev queue pointer. + * @param tail_desc_addr + * Pointer to the variable that receives the descriptor address.
+ * + * @return + * Negative errno value on error, 0 on success. + * + * @retval 0 + * Success. + * @retval -EINVAL + * Failed to get descriptor address. + */ +typedef int (*eth_next_rx_desc_t) + (void *rxq, volatile void **tail_desc_addr, + uint64_t *expected, uint64_t *mask); + /** * @internal A structure containing the functions exported by an Ethernet driver. */ @@ -752,6 +773,8 @@ struct eth_dev_ops { /**< Set up device RX hairpin queue. */ eth_tx_hairpin_queue_setup_t tx_hairpin_queue_setup; /**< Set up device TX hairpin queue. */ + eth_next_rx_desc_t next_rx_desc; + /**< Get next RX ring descriptor address. */ }; /** @@ -768,6 +791,14 @@ struct rte_eth_rxtx_callback { void *param; }; +/** + * @internal + * Structure used to hold counters for empty poll + */ +struct rte_eth_ep_stat { + uint64_t num; +} __rte_cache_aligned; + /** * @internal * The generic data structure associated with each ethernet device. @@ -807,8 +838,16 @@ struct rte_eth_dev { enum rte_eth_dev_state state; /**< Flag indicating the port state */ void *security_ctx; /**< Context for security ops */ - uint64_t reserved_64s[4]; /**< Reserved for future fields */ - void *reserved_ptrs[4]; /**< Reserved for future fields */ + /**< Empty poll number */ + enum rte_eth_dev_power_mgmt_state pwr_mgmt_state; + enum rte_eth_dev_power_mgmt_cb_mode cb_mode; + uint32_t reserved_32; + uint64_t reserved_64s[3]; /**< Reserved for future fields */ + + /**< Flag indicating the port power state */ + struct rte_eth_ep_stat *empty_poll_stats; + const struct rte_eth_rxtx_callback *cur_pwr_cb; + void *reserved_ptrs[3]; /**< Reserved for future fields */ } __rte_cache_aligned; struct rte_eth_dev_sriov; diff --git a/lib/librte_ethdev/rte_ethdev_version.map b/lib/librte_ethdev/rte_ethdev_version.map index 1212a17d3..4d5b63a5b 100644 --- a/lib/librte_ethdev/rte_ethdev_version.map +++ b/lib/librte_ethdev/rte_ethdev_version.map @@ -241,6 +241,10 @@ EXPERIMENTAL { __rte_ethdev_trace_rx_burst; __rte_ethdev_trace_tx_burst; 
rte_flow_get_aged_flows; + + # added in 20.08 + rte_eth_dev_power_mgmt_disable; + rte_eth_dev_power_mgmt_enable; }; INTERNAL { diff --git a/lib/meson.build b/lib/meson.build index 3852c0156..54cc0db7d 100644 --- a/lib/meson.build +++ b/lib/meson.build @@ -14,17 +14,18 @@ libraries = [ 'eal', # everything depends on eal 'ring', 'rcu', # rcu depends on ring + 'timer', # eventdev depends on this + 'power', # eventdev depends on this 'mempool', 'mbuf', 'net', 'meter', 'ethdev', 'pci', # core 'cmdline', 'metrics', # bitrate/latency stats depends on this 'hash', # efd depends on this - 'timer', # eventdev depends on this 'acl', 'bbdev', 'bitratestats', 'cfgfile', 'compressdev', 'cryptodev', 'distributor', 'efd', 'eventdev', 'gro', 'gso', 'ip_frag', 'jobstats', 'kni', 'latencystats', 'lpm', 'member', - 'power', 'pdump', 'rawdev', 'regexdev', + 'pdump', 'rawdev', 'regexdev', 'rib', 'reorder', 'sched', 'security', 'stack', 'vhost', # ipsec lib depends on net, crypto and security 'ipsec', diff --git a/mk/rte.app.mk b/mk/rte.app.mk index a54425997..b87abb26e 100644 --- a/mk/rte.app.mk +++ b/mk/rte.app.mk @@ -58,7 +58,6 @@ endif _LDLIBS-$(CONFIG_RTE_LIBRTE_METRICS) += --no-whole-archive _LDLIBS-$(CONFIG_RTE_LIBRTE_BITRATE) += -lrte_bitratestats _LDLIBS-$(CONFIG_RTE_LIBRTE_LATENCY_STATS) += -lrte_latencystats -_LDLIBS-$(CONFIG_RTE_LIBRTE_POWER) += -lrte_power _LDLIBS-$(CONFIG_RTE_LIBRTE_EFD) += -lrte_efd _LDLIBS-$(CONFIG_RTE_LIBRTE_BPF) += -lrte_bpf @@ -80,6 +79,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_KVARGS) += -lrte_kvargs _LDLIBS-y += -lrte_telemetry _LDLIBS-$(CONFIG_RTE_LIBRTE_MBUF) += -lrte_mbuf _LDLIBS-$(CONFIG_RTE_LIBRTE_NET) += -lrte_net +_LDLIBS-$(CONFIG_RTE_LIBRTE_POWER) += -lrte_power _LDLIBS-$(CONFIG_RTE_LIBRTE_ETHER) += -lrte_ethdev _LDLIBS-$(CONFIG_RTE_LIBRTE_BBDEV) += -lrte_bbdev _LDLIBS-$(CONFIG_RTE_LIBRTE_CRYPTODEV) += -lrte_cryptodev From patchwork Tue Aug 11 10:27:44 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit 
X-Patchwork-Submitter: "Liang, Ma" X-Patchwork-Id: 75421 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id 14AD3A04C3; Tue, 11 Aug 2020 12:28:22 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 0B4DF1C0B1; Tue, 11 Aug 2020 12:28:03 +0200 (CEST) Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by dpdk.org (Postfix) with ESMTP id 83F581C025 for ; Tue, 11 Aug 2020 12:28:00 +0200 (CEST) IronPort-SDR: 0IPtwbv1sL/D5UyB0sZF8xOWVAO3iWb7jUiLGFk/t0IeXFJpVeySr5EnEEeU2tYTOiiNnZODD6 TVgxNn1K8o3g== X-IronPort-AV: E=McAfee;i="6000,8403,9709"; a="154825045" X-IronPort-AV: E=Sophos;i="5.75,460,1589266800"; d="scan'208";a="154825045" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Aug 2020 03:28:00 -0700 IronPort-SDR: AeCufWQIT1O9M4cWBsL6caiK8ZZ4/GlI8uxWC+AYxKjKFQUFeV91rR3QvuGkuW9SvbZ9f/dFB6 W2oEgEizFIRw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.75,460,1589266800"; d="scan'208";a="324723619" Received: from irvmail001.ir.intel.com ([163.33.26.43]) by orsmga008.jf.intel.com with ESMTP; 11 Aug 2020 03:27:58 -0700 Received: from sivswdev09.ir.intel.com (sivswdev09.ir.intel.com [10.237.217.48]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id 07BARwwx012809; Tue, 11 Aug 2020 11:27:58 +0100 Received: from sivswdev09.ir.intel.com (localhost [127.0.0.1]) by sivswdev09.ir.intel.com with ESMTP id 07BARw99020785; Tue, 11 Aug 2020 11:27:58 +0100 Received: (from lma25@localhost) by sivswdev09.ir.intel.com with LOCAL id 07BARwCx020781; Tue, 11 Aug 2020 11:27:58 +0100 From: Liang Ma To: dev@dpdk.org Cc: anatoly.burakov@intel.com, Liang Ma Date: Tue, 11 Aug 2020 11:27:44 +0100 
Message-Id: <1597141666-20621-3-git-send-email-liang.j.ma@intel.com>
In-Reply-To: <1597141666-20621-1-git-send-email-liang.j.ma@intel.com>
References: <1597141666-20621-1-git-send-email-liang.j.ma@intel.com>
Subject: [dpdk-dev] [RFC v2 3/5] net/ixgbe: implement power management API

Implement support for the power management API by adding a `next_rx_desc`
callback that returns the address of the status word of the next RX
descriptor, together with the expected value and mask of its DD bit.

Signed-off-by: Anatoly Burakov
Signed-off-by: Liang Ma
---
 drivers/net/ixgbe/ixgbe_ethdev.c |  1 +
 drivers/net/ixgbe/ixgbe_rxtx.c   | 22 ++++++++++++++++++++++
 drivers/net/ixgbe/ixgbe_rxtx.h   |  2 ++
 3 files changed, 25 insertions(+)

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index fd0cb9b0e..618fc1573 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -592,6 +592,7 @@ static const struct eth_dev_ops ixgbe_eth_dev_ops = {
 	.udp_tunnel_port_del = ixgbe_dev_udp_tunnel_port_del,
 	.tm_ops_get = ixgbe_tm_ops_get,
 	.tx_done_cleanup = ixgbe_dev_tx_done_cleanup,
+	.next_rx_desc = ixgbe_next_rx_desc,
 };
 
 /*
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index 977ecf513..d1d015dea 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -1366,6 +1366,28 @@ const uint32_t
 		RTE_PTYPE_INNER_L3_IPV4_EXT | RTE_PTYPE_INNER_L4_UDP,
 };
 
+int ixgbe_next_rx_desc(void *rx_queue, volatile void **tail_desc_addr,
+		uint64_t *expected, uint64_t *mask)
+{
+	volatile union ixgbe_adv_rx_desc *rxdp;
+	struct ixgbe_rx_queue *rxq = rx_queue;
+	uint16_t desc;
+
+	desc = rxq->rx_tail;
+	rxdp = &rxq->rx_ring[desc];
+	/* watch for changes in status bit */
+	*tail_desc_addr = &rxdp->wb.upper.status_error;
+
+	/*
+	 * we expect the DD bit to be set to 1 if this descriptor was already
+	 * written to.
+	 */
+	*expected = rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD);
+	*mask = rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD);
+
+	return 0;
+}
+
 /* @note: fix ixgbe_dev_supported_ptypes_get() if any change here. */
 static inline uint32_t
 ixgbe_rxd_pkt_info_to_pkt_type(uint32_t pkt_info, uint16_t ptype_mask)
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.h b/drivers/net/ixgbe/ixgbe_rxtx.h
index 7e09291b2..826f451be 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.h
+++ b/drivers/net/ixgbe/ixgbe_rxtx.h
@@ -299,5 +299,7 @@ uint64_t ixgbe_get_tx_port_offloads(struct rte_eth_dev *dev);
 uint64_t ixgbe_get_rx_queue_offloads(struct rte_eth_dev *dev);
 uint64_t ixgbe_get_rx_port_offloads(struct rte_eth_dev *dev);
 uint64_t ixgbe_get_tx_queue_offloads(struct rte_eth_dev *dev);
+int ixgbe_next_rx_desc(void *rx_queue, volatile void **tail_desc_addr,
+		uint64_t *expected, uint64_t *mask);
 
 #endif /* _IXGBE_RXTX_H_ */

From patchwork Tue Aug 11 10:27:45 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Liang, Ma"
X-Patchwork-Id: 75422
X-Patchwork-Delegate: thomas@monjalon.net
From: Liang Ma
To: dev@dpdk.org
Cc: anatoly.burakov@intel.com, Liang Ma
Date: Tue, 11 Aug 2020 11:27:45 +0100
Message-Id: <1597141666-20621-4-git-send-email-liang.j.ma@intel.com>
In-Reply-To: <1597141666-20621-1-git-send-email-liang.j.ma@intel.com>
References: <1597141666-20621-1-git-send-email-liang.j.ma@intel.com>
Subject: [dpdk-dev] [RFC v2 4/5] net/i40e: implement power management API

Implement support for the power management API by adding a `next_rx_desc`
callback that returns the address of the status word of the next RX
descriptor, together with the expected value and mask of its DD bit.

Signed-off-by: Liang Ma
Signed-off-by: Anatoly Burakov
---
 drivers/net/i40e/i40e_ethdev.c |  1 +
 drivers/net/i40e/i40e_rxtx.c   | 23 +++++++++++++++++++++++
 drivers/net/i40e/i40e_rxtx.h   |  2 ++
 3 files changed, 26 insertions(+)

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index 05d5f2861..f0797c3cb 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -515,6 +515,7 @@ static const struct eth_dev_ops i40e_eth_dev_ops = {
 	.mtu_set = i40e_dev_mtu_set,
 	.tm_ops_get = i40e_tm_ops_get,
 	.tx_done_cleanup = i40e_tx_done_cleanup,
+	.next_rx_desc = i40e_next_rx_desc,
 };
 
 /* store statistics names and its offset in stats structure */
diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index fe7f9200c..9d7eea8ae 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -71,6 +71,29 @@
 #define I40E_TX_OFFLOAD_NOTSUP_MASK \
 		(PKT_TX_OFFLOAD_MASK ^ I40E_TX_OFFLOAD_MASK)
 
+int
+i40e_next_rx_desc(void *rx_queue, volatile void **tail_desc_addr,
+		uint64_t *expected, uint64_t *mask)
+{
+	struct i40e_rx_queue *rxq = rx_queue;
+	volatile union i40e_rx_desc *rxdp;
+	uint16_t desc;
+
+	desc = rxq->rx_tail;
+	rxdp = &rxq->rx_ring[desc];
+	/* watch for changes in status bit */
+	*tail_desc_addr = &rxdp->wb.qword1.status_error_len;
+
+	/*
+	 * we expect the DD bit to be set to 1 if this descriptor was already
+	 * written to.
+	 */
+	*expected = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT);
+	*mask = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT);
+
+	return 0;
+}
+
 static inline void
 i40e_rxd_to_vlan_tci(struct rte_mbuf *mb, volatile union i40e_rx_desc *rxdp)
 {
diff --git a/drivers/net/i40e/i40e_rxtx.h b/drivers/net/i40e/i40e_rxtx.h
index 57d7b4160..bfda5b6ad 100644
--- a/drivers/net/i40e/i40e_rxtx.h
+++ b/drivers/net/i40e/i40e_rxtx.h
@@ -248,6 +248,8 @@ uint16_t i40e_recv_scattered_pkts_vec_avx2(void *rx_queue,
 	struct rte_mbuf **rx_pkts, uint16_t nb_pkts);
 uint16_t i40e_xmit_pkts_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 	uint16_t nb_pkts);
+int i40e_next_rx_desc(void *rx_queue, volatile void **tail_desc_addr,
+		uint64_t *expected, uint64_t *mask);
 
 /* For each value it means, datasheet of hardware can tell more details
  *

From patchwork Tue Aug 11 10:27:46 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Liang, Ma"
X-Patchwork-Id: 75423
X-Patchwork-Delegate: thomas@monjalon.net
From: Liang Ma
To: dev@dpdk.org
Cc: anatoly.burakov@intel.com, Liang Ma
Date: Tue, 11 Aug 2020 11:27:46 +0100
Message-Id: <1597141666-20621-5-git-send-email-liang.j.ma@intel.com>
In-Reply-To: <1597141666-20621-1-git-send-email-liang.j.ma@intel.com>
References: <1597141666-20621-1-git-send-email-liang.j.ma@intel.com>
Subject: [dpdk-dev] [RFC v2 5/5] net/ice: implement power management API

Implement support for the power management API by adding a `next_rx_desc`
callback that returns the address of the status word of the next RX
descriptor, together with the expected value and mask of its DD bit.

Signed-off-by: Liang Ma
Signed-off-by: Anatoly Burakov
---
 drivers/net/ice/ice_ethdev.c |  1 +
 drivers/net/ice/ice_rxtx.c   | 23 +++++++++++++++++++++++
 drivers/net/ice/ice_rxtx.h   |  2 ++
 3 files changed, 26 insertions(+)

diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c
index 7dd3fcd27..7a636cd11 100644
--- a/drivers/net/ice/ice_ethdev.c
+++ b/drivers/net/ice/ice_ethdev.c
@@ -212,6 +212,7 @@ static const struct eth_dev_ops ice_eth_dev_ops = {
 	.udp_tunnel_port_add = ice_dev_udp_tunnel_port_add,
 	.udp_tunnel_port_del = ice_dev_udp_tunnel_port_del,
 	.tx_done_cleanup = ice_tx_done_cleanup,
+	.next_rx_desc = ice_next_rx_desc,
 };
 
 /* store statistics names and its offset in stats structure */
diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c
index cc3139042..ce7e025b6 100644
--- a/drivers/net/ice/ice_rxtx.c
+++ b/drivers/net/ice/ice_rxtx.c
@@ -24,6 +24,29 @@ uint64_t rte_net_ice_dynflag_proto_xtr_ipv6_mask;
 uint64_t rte_net_ice_dynflag_proto_xtr_ipv6_flow_mask;
 uint64_t rte_net_ice_dynflag_proto_xtr_tcp_mask;
 
+int ice_next_rx_desc(void *rx_queue, volatile void **tail_desc_addr,
+		uint64_t *expected, uint64_t *mask)
+{
+	volatile union ice_rx_flex_desc *rxdp;
+	struct ice_rx_queue *rxq = rx_queue;
+	uint16_t desc;
+
+	desc = rxq->rx_tail;
+	rxdp = &rxq->rx_ring[desc];
+	/* watch for changes in status bit */
+	*tail_desc_addr = &rxdp->wb.status_error0;
+
+	/*
+	 * we expect the DD bit to be set to 1 if this descriptor was already
+	 * written to.
+	 */
+	*expected = rte_cpu_to_le_16(1 << ICE_RX_FLEX_DESC_STATUS0_DD_S);
+	*mask = rte_cpu_to_le_16(1 << ICE_RX_FLEX_DESC_STATUS0_DD_S);
+
+	return 0;
+}
+
+
 static inline uint64_t
 ice_rxdid_to_proto_xtr_ol_flag(uint8_t rxdid)
 {
diff --git a/drivers/net/ice/ice_rxtx.h b/drivers/net/ice/ice_rxtx.h
index 2fdcfb7d0..7eb6fa904 100644
--- a/drivers/net/ice/ice_rxtx.h
+++ b/drivers/net/ice/ice_rxtx.h
@@ -202,5 +202,7 @@ uint16_t ice_xmit_pkts_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 	uint16_t nb_pkts);
 int ice_fdir_programming(struct ice_pf *pf, struct ice_fltr_desc *fdir_desc);
 int ice_tx_done_cleanup(void *txq, uint32_t free_cnt);
+int ice_next_rx_desc(void *rx_queue, volatile void **tail_desc_addr,
+		uint64_t *expected, uint64_t *mask);
 
 #endif /* _ICE_RXTX_H_ */
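
For illustration only, here is a minimal sketch of how a caller might consume
the (address, expected, mask) triple that the `next_rx_desc` callbacks in this
series return. It is not part of the patches. The `mock_rx_desc` type, the
`MOCK_STAT_DD` bit and both function names are hypothetical stand-ins for the
real descriptor layouts; the sketch skips the little-endian conversion the
drivers perform, and it uses a plain masked comparison where the real code
would arm UMONITOR/UMWAIT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a descriptor's write-back status word. */
struct mock_rx_desc {
	volatile uint32_t status_error;
};

/* Assumed "descriptor done" bit; the value is for illustration only. */
#define MOCK_STAT_DD 0x1u

/*
 * Same shape as the driver callbacks in this series: report where to
 * watch, and which masked value means the descriptor was written back.
 */
int
mock_next_rx_desc(struct mock_rx_desc *ring, uint16_t tail,
		volatile void **tail_desc_addr,
		uint64_t *expected, uint64_t *mask)
{
	assert(ring != NULL && tail_desc_addr != NULL);
	assert(expected != NULL && mask != NULL);

	*tail_desc_addr = &ring[tail].status_error;
	*expected = MOCK_STAT_DD;
	*mask = MOCK_STAT_DD;

	return 0;
}

/*
 * Decide whether sleeping is worthwhile: if the masked status already
 * equals the expected value, the write has happened and the caller
 * should poll the queue instead of waiting.
 */
bool
mock_should_sleep(volatile void *addr, uint64_t expected, uint64_t mask)
{
	uint64_t cur = *(volatile uint32_t *)addr;

	return (cur & mask) != expected;
}
```

A caller would fetch the triple once per wait attempt and only enter the
low-power wait when `mock_should_sleep()` returns true. This mirrors the
optional comparison described in patch 1/5, which avoids sleeping when the
expected write has already happened.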