[v6,05/10] power: add PMD power management API and callback
Commit Message
From: Liang Ma <liang.j.ma@intel.com>
Add a simple on/off switch that will enable saving power when no
packets are arriving. It is based on counting the number of empty
polls and, when the number reaches a certain threshold, entering an
architecture-defined optimized power state that will either wait
until a TSC timestamp expires, or when packets arrive.
This API mandates a core-to-single-queue mapping (that is, multiple
queues per device are supported, but they have to be polled on different
cores).
This design uses PMD RX callbacks.
1. UMWAIT/UMONITOR:
When a certain threshold of empty polls is reached, the core will go
into a power optimized sleep while waiting on an address of next RX
descriptor to be written to.
2. Pause instruction
Instead of moving the core into a deeper C-state, this method uses the
pause instruction to avoid busy polling.
3. Frequency scaling
Reuse existing DPDK power library to scale up/down core frequency
depending on traffic volume.
Signed-off-by: Liang Ma <liang.j.ma@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
Notes:
v6:
- Added wakeup mechanism for UMWAIT
- Removed memory allocation (everything is now allocated statically)
- Fixed various typos and comments
- Check for invalid queue ID
- Moved release notes to this patch
v5:
- Make error checking more robust
- Prevent initializing scaling if ACPI or PSTATE env wasn't set
- Prevent initializing UMWAIT path if PMD doesn't support get_wake_addr
- Add some debug logging
- Replace x86-specific code path to generic path using the intrinsic check
doc/guides/rel_notes/release_20_11.rst | 11 +
lib/librte_power/meson.build | 5 +-
lib/librte_power/rte_power_pmd_mgmt.c | 300 +++++++++++++++++++++++++
lib/librte_power/rte_power_pmd_mgmt.h | 92 ++++++++
lib/librte_power/rte_power_version.map | 4 +
5 files changed, 410 insertions(+), 2 deletions(-)
create mode 100644 lib/librte_power/rte_power_pmd_mgmt.c
create mode 100644 lib/librte_power/rte_power_pmd_mgmt.h
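All three schemes share the same empty-poll accounting described above: count consecutive empty polls and act once a threshold is crossed, resetting on any traffic. The sketch below models that counter in isolation (the threshold value is taken from the patch's EMPTYPOLL_MAX; the struct and helper names are illustrative, not part of the API):

```c
#include <assert.h>
#include <stdint.h>

#define EMPTYPOLL_MAX 512

/* per-queue empty-poll counter, as kept by the RX callbacks */
struct queue_state {
	uint64_t empty_poll_stats;
};

/* returns 1 when the queue should enter a power-saving state */
static int
should_sleep(struct queue_state *q, uint16_t nb_rx)
{
	if (nb_rx == 0) {
		q->empty_poll_stats++;
		return q->empty_poll_stats > EMPTYPOLL_MAX;
	}
	/* any traffic resets the counter */
	q->empty_poll_stats = 0;
	return 0;
}
```

With this logic, the first 512 empty polls cost nothing extra; only a sustained idle period triggers the sleep/pause/scale-down action, and a single non-empty poll re-arms the counter.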
Comments
On 14/10/2020 2:30 PM, Anatoly Burakov wrote:
> From: Liang Ma <liang.j.ma@intel.com>
>
> Add a simple on/off switch that will enable saving power when no
> packets are arriving. It is based on counting the number of empty
> polls and, when the number reaches a certain threshold, entering an
> architecture-defined optimized power state that will either wait
> until a TSC timestamp expires, or when packets arrive.
>
> This API mandates a core-to-single-queue mapping (that is, multiple
> queued per device are supported, but they have to be polled on different
> cores).
>
> This design is using PMD RX callbacks.
>
> 1. UMWAIT/UMONITOR:
>
> When a certain threshold of empty polls is reached, the core will go
> into a power optimized sleep while waiting on an address of next RX
> descriptor to be written to.
>
> 2. Pause instruction
>
> Instead of move the core into deeper C state, this method uses the
> pause instruction to avoid busy polling.
>
> 3. Frequency scaling
> Reuse existing DPDK power library to scale up/down core frequency
> depending on traffic volume.
>
> Signed-off-by: Liang Ma <liang.j.ma@intel.com>
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---
Hi Liang, Anatoly, great work on the patch set.
Acked-by: David Hunt <david.hunt@intel.com>
> From: Liang Ma <liang.j.ma@intel.com>
>
> Add a simple on/off switch that will enable saving power when no
> packets are arriving. It is based on counting the number of empty
> polls and, when the number reaches a certain threshold, entering an
> architecture-defined optimized power state that will either wait
> until a TSC timestamp expires, or when packets arrive.
>
> This API mandates a core-to-single-queue mapping (that is, multiple
> queued per device are supported, but they have to be polled on different
> cores).
>
> This design is using PMD RX callbacks.
>
> 1. UMWAIT/UMONITOR:
>
> When a certain threshold of empty polls is reached, the core will go
> into a power optimized sleep while waiting on an address of next RX
> descriptor to be written to.
>
> 2. Pause instruction
>
> Instead of move the core into deeper C state, this method uses the
> pause instruction to avoid busy polling.
>
> 3. Frequency scaling
> Reuse existing DPDK power library to scale up/down core frequency
> depending on traffic volume.
>
> Signed-off-by: Liang Ma <liang.j.ma@intel.com>
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---
>
> Notes:
> v6:
> - Added wakeup mechanism for UMWAIT
> - Removed memory allocation (everything is now allocated statically)
> - Fixed various typos and comments
> - Check for invalid queue ID
> - Moved release notes to this patch
>
> v5:
> - Make error checking more robust
> - Prevent initializing scaling if ACPI or PSTATE env wasn't set
> - Prevent initializing UMWAIT path if PMD doesn't support get_wake_addr
> - Add some debug logging
> - Replace x86-specific code path to generic path using the intrinsic check
>
> doc/guides/rel_notes/release_20_11.rst | 11 +
> lib/librte_power/meson.build | 5 +-
> lib/librte_power/rte_power_pmd_mgmt.c | 300 +++++++++++++++++++++++++
> lib/librte_power/rte_power_pmd_mgmt.h | 92 ++++++++
> lib/librte_power/rte_power_version.map | 4 +
> 5 files changed, 410 insertions(+), 2 deletions(-)
> create mode 100644 lib/librte_power/rte_power_pmd_mgmt.c
> create mode 100644 lib/librte_power/rte_power_pmd_mgmt.h
>
> diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst
> index ca4f43f7f9..06b822aa36 100644
> --- a/doc/guides/rel_notes/release_20_11.rst
> +++ b/doc/guides/rel_notes/release_20_11.rst
> @@ -197,6 +197,17 @@ New Features
> * Added new ``RTE_ACL_CLASSIFY_AVX512X32`` vector implementation,
> which can process up to 32 flows in parallel. Requires AVX512 support.
>
> +* **Add PMD power management mechanism**
> +
> + Three new Ethernet PMD power management mechanisms are added through
> + the existing RX callback infrastructure.
> +
> + * Add power saving scheme based on UMWAIT instruction (x86 only)
> + * Add power saving scheme based on ``rte_pause()``
> + * Add power saving scheme based on frequency scaling through the power library
> + * Add new EXPERIMENTAL API ``rte_power_pmd_mgmt_queue_enable()``
> + * Add new EXPERIMENTAL API ``rte_power_pmd_mgmt_queue_disable()``
> +
>
> Removed Items
> -------------
> diff --git a/lib/librte_power/meson.build b/lib/librte_power/meson.build
> index 78c031c943..cc3c7a8646 100644
> --- a/lib/librte_power/meson.build
> +++ b/lib/librte_power/meson.build
> @@ -9,6 +9,7 @@ sources = files('rte_power.c', 'power_acpi_cpufreq.c',
> 'power_kvm_vm.c', 'guest_channel.c',
> 'rte_power_empty_poll.c',
> 'power_pstate_cpufreq.c',
> + 'rte_power_pmd_mgmt.c',
> 'power_common.c')
> -headers = files('rte_power.h','rte_power_empty_poll.h')
> -deps += ['timer']
> +headers = files('rte_power.h','rte_power_empty_poll.h','rte_power_pmd_mgmt.h')
> +deps += ['timer' ,'ethdev']
> diff --git a/lib/librte_power/rte_power_pmd_mgmt.c b/lib/librte_power/rte_power_pmd_mgmt.c
> new file mode 100644
> index 0000000000..2b7d2a1a46
> --- /dev/null
> +++ b/lib/librte_power/rte_power_pmd_mgmt.c
> @@ -0,0 +1,300 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2010-2020 Intel Corporation
> + */
> +
> +#include <rte_lcore.h>
> +#include <rte_cycles.h>
> +#include <rte_cpuflags.h>
> +#include <rte_malloc.h>
> +#include <rte_ethdev.h>
> +#include <rte_power_intrinsics.h>
> +
> +#include "rte_power_pmd_mgmt.h"
> +
> +#define EMPTYPOLL_MAX 512
> +
> +/**
> + * Possible power management states of an ethdev port.
> + */
> +enum pmd_mgmt_state {
> + /** Device power management is disabled. */
> + PMD_MGMT_DISABLED = 0,
> + /** Device power management is enabled. */
> + PMD_MGMT_ENABLED,
> +};
> +
> +struct pmd_queue_cfg {
> + enum pmd_mgmt_state pwr_mgmt_state;
> + /**< State of power management for this queue */
> + enum rte_power_pmd_mgmt_type cb_mode;
> + /**< Callback mode for this queue */
> + const struct rte_eth_rxtx_callback *cur_cb;
> + /**< Callback instance */
> + rte_spinlock_t umwait_lock;
> + /**< Per-queue status lock - used only for UMWAIT mode */
> + volatile void *wait_addr;
> + /**< UMWAIT wakeup address */
> + uint64_t empty_poll_stats;
> + /**< Number of empty polls */
> +} __rte_cache_aligned;
> +
> +static struct pmd_queue_cfg port_cfg[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
> +
> +/* trigger a write to the cache line we're waiting on */
> +static void umwait_wakeup(volatile void *addr)
> +{
> + uint64_t val;
> +
> + val = __atomic_load_n((volatile uint64_t *)addr, __ATOMIC_RELAXED);
> + __atomic_compare_exchange_n((volatile uint64_t *)addr, &val, val, 0,
> + __ATOMIC_RELAXED, __ATOMIC_RELAXED);
> +}
> +
> +static uint16_t
> +clb_umwait(uint16_t port_id, uint16_t qidx,
> + struct rte_mbuf **pkts __rte_unused, uint16_t nb_rx,
> + uint16_t max_pkts __rte_unused, void *addr __rte_unused)
> +{
> +
> + struct pmd_queue_cfg *q_conf;
> +
> + q_conf = &port_cfg[port_id][qidx];
> +
> + if (unlikely(nb_rx == 0)) {
> + q_conf->empty_poll_stats++;
> + if (unlikely(q_conf->empty_poll_stats > EMPTYPOLL_MAX)) {
> + volatile void *target_addr;
> + uint64_t expected, mask;
> + uint16_t ret;
> + uint8_t data_sz;
> +
> + /*
> + * get address of next descriptor in the RX
> + * ring for this queue, as well as expected
> + * value and a mask.
> + */
> + ret = rte_eth_get_wake_addr(port_id, qidx,
> + &target_addr, &expected, &mask,
> + &data_sz);
> + if (ret == 0) {
> + /*
> + * we need to ensure we can wake up by another
> + * thread triggering a write, so we need the
> + * address to always be up to date.
> + */
> + rte_spinlock_lock(&q_conf->umwait_lock);
I think you need to check state here, and _disable() has to set the state with the lock grabbed.
Otherwise this lock won't protect you from race conditions.
As an example:
CP@T0:
rte_spinlock_lock(&queue_cfg->umwait_lock);
if (queue_cfg->wait_addr != NULL) //wait_addr == NULL, fallthrough
rte_spinlock_unlock(&queue_cfg->umwait_lock);
DP@T1:
rte_spinlock_lock(&queue_cfg->umwait_lock);
queue_cfg->wait_addr = target_addr;
monitor_sync(...); // DP was put to sleep
CP@T2:
queue_cfg->cur_cb = NULL;
queue_cfg->pwr_mgmt_state = PMD_MGMT_DISABLED;
ret = 0;
rte_power_pmd_mgmt_queue_disable() finished with success,
but the DP core wasn't woken up.
To be more specific:
clb_umwait(...) {
...
lock(&qcfg->lck);
if (qcfg->state == ENABLED) {
qcfg->wake_addr = addr;
monitor_sync(addr, ...,&qcfg->lck);
}
unlock(&qcfg->lck);
...
}
_disable(...) {
...
lock(&qcfg->lck);
qcfg->state = DISABLED;
if (qcfg->wake_addr != NULL)
monitor_wakeup(qcfg->wake_addr);
unlock(&qcfg->lock);
...
}
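The locking discipline suggested in the pseudocode above can be modeled with POSIX threads, using a condition variable in place of monitor_sync()/monitor_wakeup(); this is a hypothetical sketch of the pattern, not the patch code, and the names are illustrative:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* `enabled` plays the role of pwr_mgmt_state; the condition variable
 * stands in for the UMWAIT sleep/wakeup. Both the flag flip and the
 * sleep happen under the same lock, and the data-plane side re-checks
 * the flag after taking the lock, so a concurrent disable() can never
 * strand a sleeper. */
struct qcfg {
	pthread_mutex_t lck;
	pthread_cond_t cv;
	bool enabled;
};

/* data-plane side: what the RX callback would do on a long idle period */
static void
dp_empty_poll(struct qcfg *q)
{
	pthread_mutex_lock(&q->lck);
	while (q->enabled)			/* state checked under the lock */
		pthread_cond_wait(&q->cv, &q->lck); /* sleeps, lock released */
	pthread_mutex_unlock(&q->lck);
}

/* control-plane side: what _disable() would do */
static void
cp_disable(struct qcfg *q)
{
	pthread_mutex_lock(&q->lck);
	q->enabled = false;			/* flipped under the same lock */
	pthread_cond_broadcast(&q->cv);		/* wake the sleeper, if any */
	pthread_mutex_unlock(&q->lck);
}

static void *
dp_thread(void *arg)
{
	dp_empty_poll((struct qcfg *)arg);
	return NULL;
}
```

Because the state change and the wakeup are both issued while holding the lock the sleeper slept under, the T0/T1/T2 interleaving above becomes impossible: either the DP side sees DISABLED before sleeping, or the CP side's wakeup reaches an already-registered sleeper.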
> + q_conf->wait_addr = target_addr;
> + /* -1ULL is maximum value for TSC */
> + rte_power_monitor_sync(target_addr, expected,
> + mask, -1ULL, data_sz,
> + &q_conf->umwait_lock);
> + /* erase the address */
> + q_conf->wait_addr = NULL;
> + rte_spinlock_unlock(&q_conf->umwait_lock);
> + }
> + }
> + } else
> + q_conf->empty_poll_stats = 0;
> +
> + return nb_rx;
> +}
> +
> +static uint16_t
> +clb_pause(uint16_t port_id, uint16_t qidx,
> + struct rte_mbuf **pkts __rte_unused, uint16_t nb_rx,
> + uint16_t max_pkts __rte_unused, void *addr __rte_unused)
> +{
> + struct pmd_queue_cfg *q_conf;
> +
> + q_conf = &port_cfg[port_id][qidx];
> +
> + if (unlikely(nb_rx == 0)) {
> + q_conf->empty_poll_stats++;
> + /* sleep for 1 microsecond */
> + if (unlikely(q_conf->empty_poll_stats > EMPTYPOLL_MAX))
> + rte_delay_us(1);
> + } else
> + q_conf->empty_poll_stats = 0;
> +
> + return nb_rx;
> +}
> +
> +static uint16_t
> +clb_scale_freq(uint16_t port_id, uint16_t qidx,
> + struct rte_mbuf **pkts __rte_unused, uint16_t nb_rx,
> + uint16_t max_pkts __rte_unused, void *_ __rte_unused)
> +{
> + struct pmd_queue_cfg *q_conf;
> +
> + q_conf = &port_cfg[port_id][qidx];
> +
> + if (unlikely(nb_rx == 0)) {
> + q_conf->empty_poll_stats++;
> + if (unlikely(q_conf->empty_poll_stats > EMPTYPOLL_MAX))
> + /* scale down freq */
> + rte_power_freq_min(rte_lcore_id());
> + } else {
> + q_conf->empty_poll_stats = 0;
> + /* scale up freq */
> + rte_power_freq_max(rte_lcore_id());
> + }
> +
> + return nb_rx;
> +}
> +
> +int
> +rte_power_pmd_mgmt_queue_enable(unsigned int lcore_id,
> + uint16_t port_id, uint16_t queue_id,
> + enum rte_power_pmd_mgmt_type mode)
> +{
> + struct rte_eth_dev *dev;
> + struct pmd_queue_cfg *queue_cfg;
> + int ret;
> +
> + RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
> + dev = &rte_eth_devices[port_id];
> +
> + /* check if queue id is valid */
> + if (queue_id >= dev->data->nb_rx_queues ||
> + queue_id >= RTE_MAX_QUEUES_PER_PORT) {
> + return -EINVAL;
> + }
> +
> + queue_cfg = &port_cfg[port_id][queue_id];
> +
> + if (queue_cfg->pwr_mgmt_state == PMD_MGMT_ENABLED) {
> + ret = -EINVAL;
> + goto end;
> + }
> +
> + switch (mode) {
> + case RTE_POWER_MGMT_TYPE_WAIT:
> + {
> + /* check if rte_power_monitor is supported */
> + uint64_t dummy_expected, dummy_mask;
> + struct rte_cpu_intrinsics i;
> + volatile void *dummy_addr;
> + uint8_t dummy_sz;
> +
> + rte_cpu_get_intrinsics_support(&i);
> +
> + if (!i.power_monitor) {
> + RTE_LOG(DEBUG, POWER, "Monitoring intrinsics are not supported\n");
> + ret = -ENOTSUP;
> + goto end;
> + }
> +
> + /* check if the device supports the necessary PMD API */
> + if (rte_eth_get_wake_addr(port_id, queue_id,
> + &dummy_addr, &dummy_expected,
> + &dummy_mask, &dummy_sz) == -ENOTSUP) {
> + RTE_LOG(DEBUG, POWER, "The device does not support rte_eth_rxq_ring_addr_get\n");
> + ret = -ENOTSUP;
> + goto end;
> + }
> + /* initialize UMWAIT spinlock */
> + rte_spinlock_init(&queue_cfg->umwait_lock);
I think you don't need to do that.
It is supposed to be in a valid state already (otherwise you are probably in trouble anyway).
> +
> + /* initialize data before enabling the callback */
> + queue_cfg->empty_poll_stats = 0;
> + queue_cfg->cb_mode = mode;
> + queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
> +
> + queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, queue_id,
> + clb_umwait, NULL);
Would be a bit cleaner/nicer to move add_rx_callback out of switch() {}
As you have to do it always anyway.
Same thought for disable() and remove_rx_callback().
> + break;
> + }
> + case RTE_POWER_MGMT_TYPE_SCALE:
> + {
> + enum power_management_env env;
> + /* only PSTATE and ACPI modes are supported */
> + if (!rte_power_check_env_supported(PM_ENV_ACPI_CPUFREQ) &&
> + !rte_power_check_env_supported(
> + PM_ENV_PSTATE_CPUFREQ)) {
> + RTE_LOG(DEBUG, POWER, "Neither ACPI nor PSTATE modes are supported\n");
> + ret = -ENOTSUP;
> + goto end;
> + }
> + /* ensure we could initialize the power library */
> + if (rte_power_init(lcore_id)) {
> + ret = -EINVAL;
> + goto end;
> + }
> + /* ensure we initialized the correct env */
> + env = rte_power_get_env();
> + if (env != PM_ENV_ACPI_CPUFREQ &&
> + env != PM_ENV_PSTATE_CPUFREQ) {
> + RTE_LOG(DEBUG, POWER, "Neither ACPI nor PSTATE modes were initialized\n");
> + ret = -ENOTSUP;
> + goto end;
> + }
> + /* initialize data before enabling the callback */
> + queue_cfg->empty_poll_stats = 0;
> + queue_cfg->cb_mode = mode;
> + queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
> +
> + queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id,
> + queue_id, clb_scale_freq, NULL);
> + break;
> + }
> + case RTE_POWER_MGMT_TYPE_PAUSE:
> + /* initialize data before enabling the callback */
> + queue_cfg->empty_poll_stats = 0;
> + queue_cfg->cb_mode = mode;
> + queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
> +
> + queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, queue_id,
> + clb_pause, NULL);
> + break;
> + }
> + ret = 0;
> +
> +end:
> + return ret;
> +}
> +
> +int
> +rte_power_pmd_mgmt_queue_disable(unsigned int lcore_id,
> + uint16_t port_id, uint16_t queue_id)
> +{
> + struct pmd_queue_cfg *queue_cfg;
> + int ret;
> +
> + queue_cfg = &port_cfg[port_id][queue_id];
> +
> + if (queue_cfg->pwr_mgmt_state == PMD_MGMT_DISABLED) {
> + ret = -EINVAL;
> + goto end;
> + }
> +
> + switch (queue_cfg->cb_mode) {
> + case RTE_POWER_MGMT_TYPE_WAIT:
> + rte_spinlock_lock(&queue_cfg->umwait_lock);
> +
> + /* wake up the core from UMWAIT sleep, if any */
> + if (queue_cfg->wait_addr != NULL)
> + umwait_wakeup(queue_cfg->wait_addr);
> +
> + rte_spinlock_unlock(&queue_cfg->umwait_lock);
> + /* fall-through */
> + case RTE_POWER_MGMT_TYPE_PAUSE:
> + rte_eth_remove_rx_callback(port_id, queue_id,
> + queue_cfg->cur_cb);
> + break;
> + case RTE_POWER_MGMT_TYPE_SCALE:
> + rte_power_freq_max(lcore_id);
> + rte_eth_remove_rx_callback(port_id, queue_id,
> + queue_cfg->cur_cb);
> + rte_power_exit(lcore_id);
> + break;
> + }
> + /*
> + * we don't free the RX callback here because it is unsafe to do so
> + * unless we know for a fact that all data plane threads have stopped.
> + */
> + queue_cfg->cur_cb = NULL;
> + queue_cfg->pwr_mgmt_state = PMD_MGMT_DISABLED;
> + ret = 0;
> +end:
> + return ret;
> +}
> diff --git a/lib/librte_power/rte_power_pmd_mgmt.h b/lib/librte_power/rte_power_pmd_mgmt.h
> new file mode 100644
> index 0000000000..a7a3f98268
> --- /dev/null
> +++ b/lib/librte_power/rte_power_pmd_mgmt.h
> @@ -0,0 +1,92 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2010-2020 Intel Corporation
> + */
> +
> +#ifndef _RTE_POWER_PMD_MGMT_H
> +#define _RTE_POWER_PMD_MGMT_H
> +
> +/**
> + * @file
> + * RTE PMD Power Management
> + */
> +#include <stdint.h>
> +#include <stdbool.h>
> +
> +#include <rte_common.h>
> +#include <rte_byteorder.h>
> +#include <rte_log.h>
> +#include <rte_power.h>
> +#include <rte_atomic.h>
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +/**
> + * PMD Power Management Type
> + */
> +enum rte_power_pmd_mgmt_type {
> + /** WAIT callback mode. */
> + RTE_POWER_MGMT_TYPE_WAIT = 1,
> + /** PAUSE callback mode. */
> + RTE_POWER_MGMT_TYPE_PAUSE,
> + /** Freq Scaling callback mode. */
> + RTE_POWER_MGMT_TYPE_SCALE,
> +};
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> + *
> + * Setup per-queue power management callback.
> + *
> + * @note This function is not thread-safe.
> + *
> + * @param lcore_id
> + * lcore_id.
> + * @param port_id
> + * The port identifier of the Ethernet device.
> + * @param queue_id
> + * The queue identifier of the Ethernet device.
> + * @param mode
> + * The power management callback function type.
> +
> + * @return
> + * 0 on success
> + * <0 on error
> + */
> +__rte_experimental
> +int
> +rte_power_pmd_mgmt_queue_enable(unsigned int lcore_id,
> + uint16_t port_id,
> + uint16_t queue_id,
> + enum rte_power_pmd_mgmt_type mode);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> + *
> + * Remove per-queue power management callback.
> + *
> + * @note This function is not thread-safe.
> + *
> + * @param lcore_id
> + * lcore_id.
> + * @param port_id
> + * The port identifier of the Ethernet device.
> + * @param queue_id
> + * The queue identifier of the Ethernet device.
> + * @return
> + * 0 on success
> + * <0 on error
> + */
> +__rte_experimental
> +int
> +rte_power_pmd_mgmt_queue_disable(unsigned int lcore_id,
> + uint16_t port_id,
> + uint16_t queue_id);
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif
> diff --git a/lib/librte_power/rte_power_version.map b/lib/librte_power/rte_power_version.map
> index 69ca9af616..3f2f6cd6f6 100644
> --- a/lib/librte_power/rte_power_version.map
> +++ b/lib/librte_power/rte_power_version.map
> @@ -34,4 +34,8 @@ EXPERIMENTAL {
> rte_power_guest_channel_receive_msg;
> rte_power_poll_stat_fetch;
> rte_power_poll_stat_update;
> + # added in 20.11
> + rte_power_pmd_mgmt_queue_enable;
> + rte_power_pmd_mgmt_queue_disable;
> +
> };
> --
> 2.17.1
On 14-Oct-20 7:41 PM, Ananyev, Konstantin wrote:
>> From: Liang Ma <liang.j.ma@intel.com>
>>
>> Add a simple on/off switch that will enable saving power when no
>> packets are arriving. It is based on counting the number of empty
>> polls and, when the number reaches a certain threshold, entering an
>> architecture-defined optimized power state that will either wait
>> until a TSC timestamp expires, or when packets arrive.
>>
>> This API mandates a core-to-single-queue mapping (that is, multiple
>> queued per device are supported, but they have to be polled on different
>> cores).
>>
>> This design is using PMD RX callbacks.
>>
>> 1. UMWAIT/UMONITOR:
>>
>> When a certain threshold of empty polls is reached, the core will go
>> into a power optimized sleep while waiting on an address of next RX
>> descriptor to be written to.
>>
>> 2. Pause instruction
>>
>> Instead of move the core into deeper C state, this method uses the
>> pause instruction to avoid busy polling.
>>
>> 3. Frequency scaling
>> Reuse existing DPDK power library to scale up/down core frequency
>> depending on traffic volume.
>>
>> Signed-off-by: Liang Ma <liang.j.ma@intel.com>
>> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
>> ---
>>
>> Notes:
>> v6:
>> - Added wakeup mechanism for UMWAIT
>> - Removed memory allocation (everything is now allocated statically)
>> - Fixed various typos and comments
>> - Check for invalid queue ID
>> - Moved release notes to this patch
>>
>> v5:
>> - Make error checking more robust
>> - Prevent initializing scaling if ACPI or PSTATE env wasn't set
>> - Prevent initializing UMWAIT path if PMD doesn't support get_wake_addr
>> - Add some debug logging
>> - Replace x86-specific code path to generic path using the intrinsic check
>>
<snip>
>
>
> I think you need to check state here, and _disable() have to set state with lock grabbed.
> Otherwise this lock wouldn't protect you from race conditions.
> As an example:
>
> CP@T0:
> rte_spinlock_lock(&queue_cfg->umwait_lock);
> if (queue_cfg->wait_addr != NULL) //wait_addr == NULL, fallthrough
> rte_spinlock_unlock(&queue_cfg->umwait_lock);
>
> DP@T1:
> rte_spinlock_lock(&queue_cfg->umwait_lock);
> queue_cfg->wait_addr = target_addr;
> monitor_sync(...); // DP was put to sleep
>
> CP@T2:
> queue_cfg->cur_cb = NULL;
> queue_cfg->pwr_mgmt_state = PMD_MGMT_DISABLED;
> ret = 0;
>
> rte_power_pmd_mgmt_queue_disable() finished with success,
> but DP core wasn't wokenup.
>
> To be more specific:
> clb_umwait(...) {
> ...
> lock(&qcfg->lck);
> if (qcfg->state == ENABLED) {
> qcfg->wake_addr = addr;
> monitor_sync(addr, ...,&qcfg->lck);
> }
> unlock(&qcfg->lck);
> ...
> }
>
> _disable(...) {
> ...
> lock(&qcfg->lck);
> qcfg->state = DISABLED;
> if (qcfg->wake_addr != NULL)
> monitor_wakeup(qcfg->wake_addr);
> unlock(&qcfg->lock);
> ...
> }
True, didn't think of that. Will fix.
>> +
>> +if (!i.power_monitor) {
>> +RTE_LOG(DEBUG, POWER, "Monitoring intrinsics are not supported\n");
>> +ret = -ENOTSUP;
>> +goto end;
>> +}
>> +
>> +/* check if the device supports the necessary PMD API */
>> +if (rte_eth_get_wake_addr(port_id, queue_id,
>> +&dummy_addr, &dummy_expected,
>> +&dummy_mask, &dummy_sz) == -ENOTSUP) {
>> +RTE_LOG(DEBUG, POWER, "The device does not support rte_eth_rxq_ring_addr_get\n");
>> +ret = -ENOTSUP;
>> +goto end;
>> +}
>> +/* initialize UMWAIT spinlock */
>> +rte_spinlock_init(&queue_cfg->umwait_lock);
>
> I think don't need to do that.
> It supposed to be in valid state (otherwise you are probably in trouble anyway).
This is mostly for initialization, for when we first run the callback. I
suppose we could do it in an RTE_INIT() function or just leave it be
since the spinlocks are part of a statically allocated structure and
will default to 0 anyway (although it wouldn't be proper usage of the
API as that would be relying on implementation detail).
>
>> +
>> +/* initialize data before enabling the callback */
>> +queue_cfg->empty_poll_stats = 0;
>> +queue_cfg->cb_mode = mode;
>> +queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
>> +
>> +queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, queue_id,
>> +clb_umwait, NULL);
>
> Would be a bit cleaner/nicer to move add_rx_callback out of switch() {}
> As you have to do it always anyway.
> Same thought for disable() and remove_rx_callback().
The functions are different for each, so we can't move them out of
switch (unless you're suggesting to unify the callback to handle all
three modes?).
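For reference, a unified callback of the sort being discussed would dispatch on a per-queue mode field rather than registering a distinct callback per mode. The sketch below is purely hypothetical (the enum, struct, and action names are illustrative, not taken from the patch):

```c
#include <assert.h>
#include <stdint.h>

#define EMPTYPOLL_MAX 512

enum mgmt_type { TYPE_WAIT = 1, TYPE_PAUSE, TYPE_SCALE };

/* what the unified callback decided to do on this poll */
enum action { ACT_NONE, ACT_MONITOR, ACT_PAUSE, ACT_SCALE_DOWN, ACT_SCALE_UP };

struct qstate {
	enum mgmt_type mode;
	uint64_t empty_polls;
};

/* one callback body covering all three modes */
static enum action
clb_unified(struct qstate *q, uint16_t nb_rx)
{
	if (nb_rx == 0) {
		q->empty_polls++;
		if (q->empty_polls > EMPTYPOLL_MAX) {
			switch (q->mode) {
			case TYPE_WAIT:  return ACT_MONITOR;
			case TYPE_PAUSE: return ACT_PAUSE;
			case TYPE_SCALE: return ACT_SCALE_DOWN;
			}
		}
		return ACT_NONE;
	}
	q->empty_polls = 0;
	/* only the scaling mode reacts to traffic on every poll */
	return q->mode == TYPE_SCALE ? ACT_SCALE_UP : ACT_NONE;
}
```

The trade-off is a per-poll branch on the mode versus the simpler registration path; whether that is acceptable on the hot path is exactly the question raised in the thread.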
> On 14-Oct-20 7:41 PM, Ananyev, Konstantin wrote:
> >> From: Liang Ma <liang.j.ma@intel.com>
> >>
> >> Add a simple on/off switch that will enable saving power when no
> >> packets are arriving. It is based on counting the number of empty
> >> polls and, when the number reaches a certain threshold, entering an
> >> architecture-defined optimized power state that will either wait
> >> until a TSC timestamp expires, or when packets arrive.
> >>
> >> This API mandates a core-to-single-queue mapping (that is, multiple
> >> queued per device are supported, but they have to be polled on different
> >> cores).
> >>
> >> This design is using PMD RX callbacks.
> >>
> >> 1. UMWAIT/UMONITOR:
> >>
> >> When a certain threshold of empty polls is reached, the core will go
> >> into a power optimized sleep while waiting on an address of next RX
> >> descriptor to be written to.
> >>
> >> 2. Pause instruction
> >>
> >> Instead of move the core into deeper C state, this method uses the
> >> pause instruction to avoid busy polling.
> >>
> >> 3. Frequency scaling
> >> Reuse existing DPDK power library to scale up/down core frequency
> >> depending on traffic volume.
> >>
> >> Signed-off-by: Liang Ma <liang.j.ma@intel.com>
> >> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> >> ---
> >>
> >> Notes:
> >> v6:
> >> - Added wakeup mechanism for UMWAIT
> >> - Removed memory allocation (everything is now allocated statically)
> >> - Fixed various typos and comments
> >> - Check for invalid queue ID
> >> - Moved release notes to this patch
> >>
> >> v5:
> >> - Make error checking more robust
> >> - Prevent initializing scaling if ACPI or PSTATE env wasn't set
> >> - Prevent initializing UMWAIT path if PMD doesn't support get_wake_addr
> >> - Add some debug logging
> >> - Replace x86-specific code path to generic path using the intrinsic check
> >>
>
>
> <snip>
>
> >
> >
> > I think you need to check state here, and _disable() have to set state with lock grabbed.
> > Otherwise this lock wouldn't protect you from race conditions.
> > As an example:
> >
> > CP@T0:
> > rte_spinlock_lock(&queue_cfg->umwait_lock);
> > if (queue_cfg->wait_addr != NULL) //wait_addr == NULL, fallthrough
> > rte_spinlock_unlock(&queue_cfg->umwait_lock);
> >
> > DP@T1:
> > rte_spinlock_lock(&queue_cfg->umwait_lock);
> > queue_cfg->wait_addr = target_addr;
> > monitor_sync(...); // DP was put to sleep
> >
> > CP@T2:
> > queue_cfg->cur_cb = NULL;
> > queue_cfg->pwr_mgmt_state = PMD_MGMT_DISABLED;
> > ret = 0;
> >
> > rte_power_pmd_mgmt_queue_disable() finished with success,
> > but DP core wasn't wokenup.
> >
> > To be more specific:
> > clb_umwait(...) {
> > ...
> > lock(&qcfg->lck);
> > if (qcfg->state == ENABLED) {
> > qcfg->wake_addr = addr;
> > monitor_sync(addr, ...,&qcfg->lck);
> > }
> > unlock(&qcfg->lck);
> > ...
> > }
> >
> > _disable(...) {
> > ...
> > lock(&qcfg->lck);
> > qcfg->state = DISABLED;
> > if (qcfg->wake_addr != NULL)
> > monitor_wakeup(qcfg->wake_addr);
> > unlock(&qcfg->lock);
> > ...
> > }
>
> True, didn't think of that. Will fix.
>
> >> +
> >> +if (!i.power_monitor) {
> >> +RTE_LOG(DEBUG, POWER, "Monitoring intrinsics are not supported\n");
> >> +ret = -ENOTSUP;
> >> +goto end;
> >> +}
> >> +
> >> +/* check if the device supports the necessary PMD API */
> >> +if (rte_eth_get_wake_addr(port_id, queue_id,
> >> +&dummy_addr, &dummy_expected,
> >> +&dummy_mask, &dummy_sz) == -ENOTSUP) {
> >> +RTE_LOG(DEBUG, POWER, "The device does not support rte_eth_rxq_ring_addr_get\n");
> >> +ret = -ENOTSUP;
> >> +goto end;
> >> +}
> >> +/* initialize UMWAIT spinlock */
> >> +rte_spinlock_init(&queue_cfg->umwait_lock);
> >
> > I think don't need to do that.
> > It supposed to be in valid state (otherwise you are probably in trouble anyway).
>
> This is mostly for initialization, for when we first run the callback. I
> suppose we could do it in an RTE_INIT() function or just leave it be
> since the spinlocks are part of a statically allocated structure and
> will default to 0 anyway
Yes static initialization should be sufficient here.
(although it wouldn't be proper usage of the
> API as that would be relying on implementation detail).
What I am trying to say - at that moment spinlock value should be zero.
If it is not, then your DP callback is still running and holding the lock.
I understand that it should be *very* rare situation, but it seems
strange to me to silently over-write the locked value from other thread.
It shouldn't cause any major trouble, but still...
>
> >
> >> +
> >> +/* initialize data before enabling the callback */
> >> +queue_cfg->empty_poll_stats = 0;
> >> +queue_cfg->cb_mode = mode;
> >> +queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
> >> +
> >> +queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, queue_id,
> >> +clb_umwait, NULL);
> >
> > Would be a bit cleaner/nicer to move add_rx_callback out of switch() {}
> > As you have to do it always anyway.
> > Same thought for disable() and remove_rx_callback().
>
> The functions are different for each, so we can't move them out of
> switch (unless you're suggesting to unify the callback to handle all
> three modes?).
>
> --
> Thanks,
> Anatoly
@@ -197,6 +197,17 @@ New Features
* Added new ``RTE_ACL_CLASSIFY_AVX512X32`` vector implementation,
which can process up to 32 flows in parallel. Requires AVX512 support.
+* **Add PMD power management mechanism**
+
+ 3 new Ethernet PMD power management mechanism is added through existing
+ RX callback infrastructure.
+
+ * Add power saving scheme based on UMWAIT instruction (x86 only)
+ * Add power saving scheme based on ``rte_pause()``
+ * Add power saving scheme based on frequency scaling through the power library
+ * Add new EXPERIMENTAL API ``rte_power_pmd_mgmt_queue_enable()``
+ * Add new EXPERIMENTAL API ``rte_power_pmd_mgmt_queue_disable()``
+
Removed Items
-------------
@@ -9,6 +9,7 @@ sources = files('rte_power.c', 'power_acpi_cpufreq.c',
'power_kvm_vm.c', 'guest_channel.c',
'rte_power_empty_poll.c',
'power_pstate_cpufreq.c',
+ 'rte_power_pmd_mgmt.c',
'power_common.c')
-headers = files('rte_power.h','rte_power_empty_poll.h')
-deps += ['timer']
+headers = files('rte_power.h','rte_power_empty_poll.h','rte_power_pmd_mgmt.h')
+deps += ['timer' ,'ethdev']
new file mode 100644
@@ -0,0 +1,300 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2020 Intel Corporation
+ */
+
+#include <rte_lcore.h>
+#include <rte_cycles.h>
+#include <rte_cpuflags.h>
+#include <rte_malloc.h>
+#include <rte_ethdev.h>
+#include <rte_power_intrinsics.h>
+
+#include "rte_power_pmd_mgmt.h"
+
+#define EMPTYPOLL_MAX 512
+
+/**
+ * Possible power management states of an ethdev port.
+ */
+enum pmd_mgmt_state {
+ /** Device power management is disabled. */
+ PMD_MGMT_DISABLED = 0,
+ /** Device power management is enabled. */
+ PMD_MGMT_ENABLED,
+};
+
+struct pmd_queue_cfg {
+ enum pmd_mgmt_state pwr_mgmt_state;
+ /**< State of power management for this queue */
+ enum rte_power_pmd_mgmt_type cb_mode;
+ /**< Callback mode for this queue */
+ const struct rte_eth_rxtx_callback *cur_cb;
+ /**< Callback instance */
+ rte_spinlock_t umwait_lock;
+ /**< Per-queue status lock - used only for UMWAIT mode */
+ volatile void *wait_addr;
+ /**< UMWAIT wakeup address */
+ uint64_t empty_poll_stats;
+ /**< Number of empty polls */
+} __rte_cache_aligned;
+
+static struct pmd_queue_cfg port_cfg[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
+
+/* trigger a write to the cache line we're waiting on */
+static void umwait_wakeup(volatile void *addr)
+{
+ uint64_t val;
+
+ val = __atomic_load_n((volatile uint64_t *)addr, __ATOMIC_RELAXED);
+ __atomic_compare_exchange_n((volatile uint64_t *)addr, &val, val, 0,
+ __ATOMIC_RELAXED, __ATOMIC_RELAXED);
+}
+
+static uint16_t
+clb_umwait(uint16_t port_id, uint16_t qidx,
+ struct rte_mbuf **pkts __rte_unused, uint16_t nb_rx,
+ uint16_t max_pkts __rte_unused, void *addr __rte_unused)
+{
+ struct pmd_queue_cfg *q_conf;
+
+ q_conf = &port_cfg[port_id][qidx];
+
+ if (unlikely(nb_rx == 0)) {
+ q_conf->empty_poll_stats++;
+ if (unlikely(q_conf->empty_poll_stats > EMPTYPOLL_MAX)) {
+ volatile void *target_addr;
+ uint64_t expected, mask;
+ uint16_t ret;
+ uint8_t data_sz;
+
+ /*
+ * get address of next descriptor in the RX
+ * ring for this queue, as well as expected
+ * value and a mask.
+ */
+ ret = rte_eth_get_wake_addr(port_id, qidx,
+ &target_addr, &expected, &mask,
+ &data_sz);
+ if (ret == 0) {
+ /*
+ * we need to ensure we can wake up by another
+ * thread triggering a write, so we need the
+ * address to always be up to date.
+ */
+ rte_spinlock_lock(&q_conf->umwait_lock);
+ q_conf->wait_addr = target_addr;
+ /* -1ULL is the maximum TSC value, i.e. no timeout */
+ rte_power_monitor_sync(target_addr, expected,
+ mask, -1ULL, data_sz,
+ &q_conf->umwait_lock);
+ /* erase the address */
+ q_conf->wait_addr = NULL;
+ rte_spinlock_unlock(&q_conf->umwait_lock);
+ }
+ }
+ } else
+ q_conf->empty_poll_stats = 0;
+
+ return nb_rx;
+}
+
+static uint16_t
+clb_pause(uint16_t port_id, uint16_t qidx,
+ struct rte_mbuf **pkts __rte_unused, uint16_t nb_rx,
+ uint16_t max_pkts __rte_unused, void *addr __rte_unused)
+{
+ struct pmd_queue_cfg *q_conf;
+
+ q_conf = &port_cfg[port_id][qidx];
+
+ if (unlikely(nb_rx == 0)) {
+ q_conf->empty_poll_stats++;
+ /* sleep for 1 microsecond */
+ if (unlikely(q_conf->empty_poll_stats > EMPTYPOLL_MAX))
+ rte_delay_us(1);
+ } else
+ q_conf->empty_poll_stats = 0;
+
+ return nb_rx;
+}
+
+static uint16_t
+clb_scale_freq(uint16_t port_id, uint16_t qidx,
+ struct rte_mbuf **pkts __rte_unused, uint16_t nb_rx,
+ uint16_t max_pkts __rte_unused, void *addr __rte_unused)
+{
+ struct pmd_queue_cfg *q_conf;
+
+ q_conf = &port_cfg[port_id][qidx];
+
+ if (unlikely(nb_rx == 0)) {
+ q_conf->empty_poll_stats++;
+ if (unlikely(q_conf->empty_poll_stats > EMPTYPOLL_MAX))
+ /* scale down freq */
+ rte_power_freq_min(rte_lcore_id());
+ } else {
+ q_conf->empty_poll_stats = 0;
+ /* scale up freq */
+ rte_power_freq_max(rte_lcore_id());
+ }
+
+ return nb_rx;
+}
+
+int
+rte_power_pmd_mgmt_queue_enable(unsigned int lcore_id,
+ uint16_t port_id, uint16_t queue_id,
+ enum rte_power_pmd_mgmt_type mode)
+{
+ struct rte_eth_dev *dev;
+ struct pmd_queue_cfg *queue_cfg;
+ int ret;
+
+ RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
+ dev = &rte_eth_devices[port_id];
+
+ /* check if queue id is valid */
+ if (queue_id >= dev->data->nb_rx_queues ||
+ queue_id >= RTE_MAX_QUEUES_PER_PORT) {
+ return -EINVAL;
+ }
+
+ queue_cfg = &port_cfg[port_id][queue_id];
+
+ if (queue_cfg->pwr_mgmt_state == PMD_MGMT_ENABLED) {
+ ret = -EINVAL;
+ goto end;
+ }
+
+ switch (mode) {
+ case RTE_POWER_MGMT_TYPE_WAIT:
+ {
+ /* check if rte_power_monitor is supported */
+ uint64_t dummy_expected, dummy_mask;
+ struct rte_cpu_intrinsics i;
+ volatile void *dummy_addr;
+ uint8_t dummy_sz;
+
+ rte_cpu_get_intrinsics_support(&i);
+
+ if (!i.power_monitor) {
+ RTE_LOG(DEBUG, POWER, "Monitoring intrinsics are not supported\n");
+ ret = -ENOTSUP;
+ goto end;
+ }
+
+ /* check if the device supports the necessary PMD API */
+ if (rte_eth_get_wake_addr(port_id, queue_id,
+ &dummy_addr, &dummy_expected,
+ &dummy_mask, &dummy_sz) == -ENOTSUP) {
+ RTE_LOG(DEBUG, POWER, "The device does not support rte_eth_get_wake_addr\n");
+ ret = -ENOTSUP;
+ goto end;
+ }
+ /* initialize UMWAIT spinlock */
+ rte_spinlock_init(&queue_cfg->umwait_lock);
+
+ /* initialize data before enabling the callback */
+ queue_cfg->empty_poll_stats = 0;
+ queue_cfg->cb_mode = mode;
+ queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
+
+ queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, queue_id,
+ clb_umwait, NULL);
+ break;
+ }
+ case RTE_POWER_MGMT_TYPE_SCALE:
+ {
+ enum power_management_env env;
+ /* only PSTATE and ACPI modes are supported */
+ if (!rte_power_check_env_supported(PM_ENV_ACPI_CPUFREQ) &&
+ !rte_power_check_env_supported(
+ PM_ENV_PSTATE_CPUFREQ)) {
+ RTE_LOG(DEBUG, POWER, "Neither ACPI nor PSTATE modes are supported\n");
+ ret = -ENOTSUP;
+ goto end;
+ }
+ /* ensure we could initialize the power library */
+ if (rte_power_init(lcore_id)) {
+ ret = -EINVAL;
+ goto end;
+ }
+ /* ensure we initialized the correct env */
+ env = rte_power_get_env();
+ if (env != PM_ENV_ACPI_CPUFREQ &&
+ env != PM_ENV_PSTATE_CPUFREQ) {
+ RTE_LOG(DEBUG, POWER, "Neither ACPI nor PSTATE modes were initialized\n");
+ ret = -ENOTSUP;
+ goto end;
+ }
+ /* initialize data before enabling the callback */
+ queue_cfg->empty_poll_stats = 0;
+ queue_cfg->cb_mode = mode;
+ queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
+
+ queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id,
+ queue_id, clb_scale_freq, NULL);
+ break;
+ }
+ case RTE_POWER_MGMT_TYPE_PAUSE:
+ /* initialize data before enabling the callback */
+ queue_cfg->empty_poll_stats = 0;
+ queue_cfg->cb_mode = mode;
+ queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
+
+ queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, queue_id,
+ clb_pause, NULL);
+ break;
+ }
+ ret = 0;
+
+end:
+ return ret;
+}
+
+int
+rte_power_pmd_mgmt_queue_disable(unsigned int lcore_id,
+ uint16_t port_id, uint16_t queue_id)
+{
+ struct pmd_queue_cfg *queue_cfg;
+ int ret;
+
+ RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
+
+ /* check if queue id is valid */
+ if (queue_id >= RTE_MAX_QUEUES_PER_PORT)
+ return -EINVAL;
+
+ queue_cfg = &port_cfg[port_id][queue_id];
+
+ if (queue_cfg->pwr_mgmt_state == PMD_MGMT_DISABLED) {
+ ret = -EINVAL;
+ goto end;
+ }
+
+ switch (queue_cfg->cb_mode) {
+ case RTE_POWER_MGMT_TYPE_WAIT:
+ rte_spinlock_lock(&queue_cfg->umwait_lock);
+
+ /* wake up the core from UMWAIT sleep, if any */
+ if (queue_cfg->wait_addr != NULL)
+ umwait_wakeup(queue_cfg->wait_addr);
+
+ rte_spinlock_unlock(&queue_cfg->umwait_lock);
+ /* fall-through */
+ case RTE_POWER_MGMT_TYPE_PAUSE:
+ rte_eth_remove_rx_callback(port_id, queue_id,
+ queue_cfg->cur_cb);
+ break;
+ case RTE_POWER_MGMT_TYPE_SCALE:
+ rte_power_freq_max(lcore_id);
+ rte_eth_remove_rx_callback(port_id, queue_id,
+ queue_cfg->cur_cb);
+ rte_power_exit(lcore_id);
+ break;
+ }
+ /*
+ * we don't free the RX callback here because it is unsafe to do so
+ * unless we know for a fact that all data plane threads have stopped.
+ */
+ queue_cfg->cur_cb = NULL;
+ queue_cfg->pwr_mgmt_state = PMD_MGMT_DISABLED;
+ ret = 0;
+end:
+ return ret;
+}
new file mode 100644
@@ -0,0 +1,92 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2020 Intel Corporation
+ */
+
+#ifndef _RTE_POWER_PMD_MGMT_H
+#define _RTE_POWER_PMD_MGMT_H
+
+/**
+ * @file
+ * RTE PMD Power Management
+ */
+#include <stdint.h>
+#include <stdbool.h>
+
+#include <rte_common.h>
+#include <rte_byteorder.h>
+#include <rte_log.h>
+#include <rte_power.h>
+#include <rte_atomic.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * PMD Power Management Type
+ */
+enum rte_power_pmd_mgmt_type {
+ /** WAIT callback mode. */
+ RTE_POWER_MGMT_TYPE_WAIT = 1,
+ /** PAUSE callback mode. */
+ RTE_POWER_MGMT_TYPE_PAUSE,
+ /** Freq Scaling callback mode. */
+ RTE_POWER_MGMT_TYPE_SCALE,
+};
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Set up a per-queue power management callback.
+ *
+ * @note This function is not thread-safe.
+ *
+ * @param lcore_id
+ *   The lcore the Rx queue is polled from.
+ * @param port_id
+ * The port identifier of the Ethernet device.
+ * @param queue_id
+ * The queue identifier of the Ethernet device.
+ * @param mode
+ * The power management callback function type.
+ *
+ * @return
+ * 0 on success
+ * <0 on error
+ */
+__rte_experimental
+int
+rte_power_pmd_mgmt_queue_enable(unsigned int lcore_id,
+ uint16_t port_id,
+ uint16_t queue_id,
+ enum rte_power_pmd_mgmt_type mode);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Remove per-queue power management callback.
+ *
+ * @note This function is not thread-safe.
+ *
+ * @param lcore_id
+ *   The lcore the Rx queue is polled from.
+ * @param port_id
+ * The port identifier of the Ethernet device.
+ * @param queue_id
+ * The queue identifier of the Ethernet device.
+ * @return
+ * 0 on success
+ * <0 on error
+ */
+__rte_experimental
+int
+rte_power_pmd_mgmt_queue_disable(unsigned int lcore_id,
+ uint16_t port_id,
+ uint16_t queue_id);
+#ifdef __cplusplus
+}
+#endif
+
+#endif
@@ -34,4 +34,8 @@ EXPERIMENTAL {
rte_power_guest_channel_receive_msg;
rte_power_poll_stat_fetch;
rte_power_poll_stat_update;
+ # added in 20.11
+ rte_power_pmd_mgmt_queue_disable;
+ rte_power_pmd_mgmt_queue_enable;
};