[v1,5/7] power: support callbacks for multiple Rx queues

Message ID 601f09ad562f97ca1e0077cb36ba46e448df382d.1622548381.git.anatoly.burakov@intel.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Series: Enhancements for PMD power management

Checks

ci/checkpatch: warning (coding style issues)

Commit Message

Burakov, Anatoly June 1, 2021, noon UTC
  Currently, the PMD power management support has a hard limitation:
only a single queue per lcore is supported. This is not ideal, as most
DPDK use cases will poll multiple queues per core.

The PMD power management mechanism relies on ethdev Rx callbacks, so it
is very difficult to implement such support because callbacks are
effectively stateless and have no visibility into what the other ethdev
devices are doing. This places limitations on what we can do within the
framework of Rx callbacks, but the basics of this implementation are as
follows:

- Replace per-queue structures with per-lcore ones, so that any device
  polled from the same lcore can share data
- Any queue that is going to be polled from a specific lcore has to be
  added to the list of cores to poll, so that the callback is aware of
  other queues being polled by the same lcore
- Both the empty poll counter and the actual power saving mechanism are
  shared between all queues polled on a particular lcore, and power saving
  is only activated when a specially designated "power saving" queue is
  polled. To
  put it another way, we have no idea which queue the user will poll in
  what order, so we rely on them telling us that queue X is the last one
  in the polling loop, so any power management should happen there.
- A new API is added to mark a specific Rx queue as "power saving".
  Failing to call this API will result in no power management; however,
  when there is only one queue per core, it is obvious which queue is the
  "power saving" one, so things will still work without this new API for
  use cases that previously worked without it.
- The limitation on UMWAIT-based polling is not removed because UMWAIT
  is incapable of monitoring more than one address.
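
To illustrate how these pieces fit together, here is a minimal usage
sketch based on the APIs in this patch (lcore/port/queue numbers are
illustrative, error handling omitted):

	unsigned int lcore = 1;
	uint16_t port = 0;
	uint16_t qid;

	/* enable power management for queues 0..2, all polled from lcore 1 */
	for (qid = 0; qid < 3; qid++)
		rte_power_ethdev_pmgmt_queue_enable(lcore, port, qid,
				RTE_POWER_MGMT_TYPE_SCALE);

	/* mark the last queue in the polling loop as the power save queue;
	 * without this call, no power management will be triggered */
	rte_power_ethdev_pmgmt_queue_set_power_save(lcore, port, 2);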

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/power/rte_power_pmd_mgmt.c | 335 ++++++++++++++++++++++++++-------
 lib/power/rte_power_pmd_mgmt.h |  34 ++++
 lib/power/version.map          |   3 +
 3 files changed, 306 insertions(+), 66 deletions(-)
  

Comments

Ananyev, Konstantin June 22, 2021, 9:41 a.m. UTC | #1
> Currently, the PMD power management support has a hard limitation:
> only a single queue per lcore is supported. This is not ideal, as most
> DPDK use cases will poll multiple queues per core.
> 
> The PMD power management mechanism relies on ethdev Rx callbacks, so it
> is very difficult to implement such support because callbacks are
> effectively stateless and have no visibility into what the other ethdev
> devices are doing.  This places limitations on what we can do within the
> framework of Rx callbacks, but the basics of this implementation are as
> follows:
> 
> - Replace per-queue structures with per-lcore ones, so that any device
>   polled from the same lcore can share data
> - Any queue that is going to be polled from a specific lcore has to be
>   added to the list of cores to poll, so that the callback is aware of
>   other queues being polled by the same lcore
> - Both the empty poll counter and the actual power saving mechanism are
>   shared between all queues polled on a particular lcore, and power saving
>   is only activated when a specially designated "power saving" queue is
>   polled. To
>   put it another way, we have no idea which queue the user will poll in
>   what order, so we rely on them telling us that queue X is the last one
>   in the polling loop, so any power management should happen there.
> - A new API is added to mark a specific Rx queue as "power saving".

Honestly, I don't understand the logic behind that new function.
I understand that depending on HW we can monitor either one or multiple queues.
That's OK, but why do we now need to mark one queue as a 'very special' one?
Why can't rte_power_ethdev_pmgmt_queue_enable() just:
check whether the number of monitored queues exceeds HW/SW capabilities,
and if so, return a failure;
otherwise, add the queue to the list and treat them all equally, i.e.
go to power save mode when the number of sequential empty polls on
all monitored queues exceeds the EMPTYPOLL_MAX threshold?

>   Failing to call this API will result in no power management; however,
>   when there is only one queue per core, it is obvious which queue is the
>   "power saving" one, so things will still work without this new API for
>   use cases that previously worked without it.
> - The limitation on UMWAIT-based polling is not removed because UMWAIT
>   is incapable of monitoring more than one address.
> 
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---
>  lib/power/rte_power_pmd_mgmt.c | 335 ++++++++++++++++++++++++++-------
>  lib/power/rte_power_pmd_mgmt.h |  34 ++++
>  lib/power/version.map          |   3 +
>  3 files changed, 306 insertions(+), 66 deletions(-)
> 
> diff --git a/lib/power/rte_power_pmd_mgmt.c b/lib/power/rte_power_pmd_mgmt.c
> index 0707c60a4f..60dd21a19c 100644
> --- a/lib/power/rte_power_pmd_mgmt.c
> +++ b/lib/power/rte_power_pmd_mgmt.c
> @@ -33,7 +33,19 @@ enum pmd_mgmt_state {
>  	PMD_MGMT_ENABLED
>  };
> 
> -struct pmd_queue_cfg {
> +struct queue {
> +	uint16_t portid;
> +	uint16_t qid;
> +};

Just a thought: if that would help somehow, it can be changed to:
union queue {
	uint32_t raw;
	struct {
		uint16_t portid, qid;
	};
};

That way, in the queue find/cmp functions below, you can operate on single raw 32-bit values.
Probably not that important, as all these functions are on the slow path, but it might look nicer.
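
For illustration, a comparison helper on top of that union could then be
reduced to a single 32-bit compare (a sketch of the suggestion, not code
from the patch):

	static inline bool
	queue_equal(const union queue *l, const union queue *r)
	{
		/* one 32-bit compare instead of two 16-bit field compares */
		return l->raw == r->raw;
	}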

> +struct pmd_core_cfg {
> +	struct queue queues[RTE_MAX_ETHPORTS];

If we'll have the ability to monitor multiple queues per lcore, would it always be enough?
On the other hand, it is updated on the control path only.
Wouldn't a normal list with malloc()/rte_malloc() be more suitable here?

> +	/**< Which port-queue pairs are associated with this lcore? */
> +	struct queue power_save_queue;
> +	/**< When polling multiple queues, all but this one will be ignored */
> +	bool power_save_queue_set;
> +	/**< When polling multiple queues, power save queue must be set */
> +	size_t n_queues;
> +	/**< How many queues are in the list? */
>  	volatile enum pmd_mgmt_state pwr_mgmt_state;
>  	/**< State of power management for this queue */
>  	enum rte_power_pmd_mgmt_type cb_mode;
> @@ -43,8 +55,97 @@ struct pmd_queue_cfg {
>  	uint64_t empty_poll_stats;
>  	/**< Number of empty polls */
>  } __rte_cache_aligned;
> +static struct pmd_core_cfg lcore_cfg[RTE_MAX_LCORE];
> 
> -static struct pmd_queue_cfg port_cfg[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
> +static inline bool
> +queue_equal(const struct queue *l, const struct queue *r)
> +{
> +	return l->portid == r->portid && l->qid == r->qid;
> +}
> +
> +static inline void
> +queue_copy(struct queue *dst, const struct queue *src)
> +{
> +	dst->portid = src->portid;
> +	dst->qid = src->qid;
> +}
> +
> +static inline bool
> +queue_is_power_save(const struct pmd_core_cfg *cfg, const struct queue *q) {

Here and in other places - any reason why standard DPDK coding style is not used?

> +	const struct queue *pwrsave = &cfg->power_save_queue;
> +
> +	/* if there's only single queue, no need to check anything */
> +	if (cfg->n_queues == 1)
> +		return true;
> +	return cfg->power_save_queue_set && queue_equal(q, pwrsave);
> +}
> +
  
Burakov, Anatoly June 23, 2021, 9:36 a.m. UTC | #2
On 22-Jun-21 10:41 AM, Ananyev, Konstantin wrote:
> 
>> Currently, the PMD power management support has a hard limitation:
>> only a single queue per lcore is supported. This is not ideal, as most
>> DPDK use cases will poll multiple queues per core.
>>
>> The PMD power management mechanism relies on ethdev Rx callbacks, so it
>> is very difficult to implement such support because callbacks are
>> effectively stateless and have no visibility into what the other ethdev
>> devices are doing.  This places limitations on what we can do within the
>> framework of Rx callbacks, but the basics of this implementation are as
>> follows:
>>
>> - Replace per-queue structures with per-lcore ones, so that any device
>>    polled from the same lcore can share data
>> - Any queue that is going to be polled from a specific lcore has to be
>>    added to the list of cores to poll, so that the callback is aware of
>>    other queues being polled by the same lcore
>> - Both the empty poll counter and the actual power saving mechanism are
>>    shared between all queues polled on a particular lcore, and power saving
>>    is only activated when a specially designated "power saving" queue is
>>    polled. To
>>    put it another way, we have no idea which queue the user will poll in
>>    what order, so we rely on them telling us that queue X is the last one
>>    in the polling loop, so any power management should happen there.
>> - A new API is added to mark a specific Rx queue as "power saving".
> 
> Honestly, I don't understand the logic behind that new function.
> I understand that depending on HW we can monitor either one or multiple queues.
> That's OK, but why do we now need to mark one queue as a 'very special' one?

Because we don't know which of the queues we are supposed to sleep on.

Imagine a situation where you have 3 queues. What usually happens is you 
poll them in a loop, so q0, q1, q2, q0, q1, q2... etc. We only want to 
enter a power-optimized state on polling q2, because otherwise we risk 
going into a power-optimized state while q1 or q2 still have traffic.

Worst case scenario, we enter sleep after polling q0, then traffic 
arrives at q2, we wake up, and then attempt to go to sleep on q1 instead 
of skipping it. Essentially, we will be attempting to sleep at every 
queue, instead of once in a loop. This *might* be OK for multi-monitor 
because we'll be aborting sleep due to sleep condition check failure, 
but for modes like rte_pause()/rte_power_pause()-based sleep, we will be 
entering sleep unconditionally, and will risk sleeping at q1 while 
there's traffic at q2.

So, we need this mechanism to be activated once every *loop*, not per queue.
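
As a sketch, the polling loop being described would look roughly like the
following (variable names are illustrative; the sleep decision itself
happens inside the Rx callbacks installed by this library):

	while (!stopped) {
		/* q0 and q1 callbacks never trigger sleep */
		n0 = rte_eth_rx_burst(port, 0, pkts, MAX_PKTS);
		n1 = rte_eth_rx_burst(port, 1, pkts, MAX_PKTS);
		/* q2 is the power save queue: sleep may only happen here,
		 * i.e. once per loop iteration */
		n2 = rte_eth_rx_burst(port, 2, pkts, MAX_PKTS);
	}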

> Why can't rte_power_ethdev_pmgmt_queue_enable() just:
> check whether the number of monitored queues exceeds HW/SW capabilities,
> and if so, return a failure;
> otherwise, add the queue to the list and treat them all equally, i.e.
> go to power save mode when the number of sequential empty polls on
> all monitored queues exceeds the EMPTYPOLL_MAX threshold?
> 
>>    Failing to call this API will result in no power management; however,
>>    when there is only one queue per core, it is obvious which queue is the
>>    "power saving" one, so things will still work without this new API for
>>    use cases that previously worked without it.
>> - The limitation on UMWAIT-based polling is not removed because UMWAIT
>>    is incapable of monitoring more than one address.
>>
>> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
>> ---
>>   lib/power/rte_power_pmd_mgmt.c | 335 ++++++++++++++++++++++++++-------
>>   lib/power/rte_power_pmd_mgmt.h |  34 ++++
>>   lib/power/version.map          |   3 +
>>   3 files changed, 306 insertions(+), 66 deletions(-)
>>
>> diff --git a/lib/power/rte_power_pmd_mgmt.c b/lib/power/rte_power_pmd_mgmt.c
>> index 0707c60a4f..60dd21a19c 100644
>> --- a/lib/power/rte_power_pmd_mgmt.c
>> +++ b/lib/power/rte_power_pmd_mgmt.c
>> @@ -33,7 +33,19 @@ enum pmd_mgmt_state {
>>        PMD_MGMT_ENABLED
>>   };
>>
>> -struct pmd_queue_cfg {
>> +struct queue {
>> +     uint16_t portid;
>> +     uint16_t qid;
>> +};
> 
> Just a thought: if that would help somehow, it can be changed to:
> union queue {
>          uint32_t raw;
>          struct {
>                  uint16_t portid, qid;
>          };
> };
> 
> That way, in the queue find/cmp functions below, you can operate on single raw 32-bit values.
> Probably not that important, as all these functions are on the slow path, but it might look nicer.

Sure, that can work. We actually do comparisons with power save queue on 
fast path, so maybe that'll help.

> 
>> +struct pmd_core_cfg {
>> +     struct queue queues[RTE_MAX_ETHPORTS];
> 
> If we'll have the ability to monitor multiple queues per lcore, would it always be enough?
> On the other hand, it is updated on the control path only.
> Wouldn't a normal list with malloc()/rte_malloc() be more suitable here?

You're right, it should be dynamically allocated.

> 
>> +     /**< Which port-queue pairs are associated with this lcore? */
>> +     struct queue power_save_queue;
>> +     /**< When polling multiple queues, all but this one will be ignored */
>> +     bool power_save_queue_set;
>> +     /**< When polling multiple queues, power save queue must be set */
>> +     size_t n_queues;
>> +     /**< How many queues are in the list? */
>>        volatile enum pmd_mgmt_state pwr_mgmt_state;
>>        /**< State of power management for this queue */
>>        enum rte_power_pmd_mgmt_type cb_mode;
>> @@ -43,8 +55,97 @@ struct pmd_queue_cfg {
>>        uint64_t empty_poll_stats;
>>        /**< Number of empty polls */
>>   } __rte_cache_aligned;
>> +static struct pmd_core_cfg lcore_cfg[RTE_MAX_LCORE];
>>
>> -static struct pmd_queue_cfg port_cfg[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
>> +static inline bool
>> +queue_equal(const struct queue *l, const struct queue *r)
>> +{
>> +     return l->portid == r->portid && l->qid == r->qid;
>> +}
>> +
>> +static inline void
>> +queue_copy(struct queue *dst, const struct queue *src)
>> +{
>> +     dst->portid = src->portid;
>> +     dst->qid = src->qid;
>> +}
>> +
>> +static inline bool
>> +queue_is_power_save(const struct pmd_core_cfg *cfg, const struct queue *q) {
> 
> Here and in other places - any reason why standard DPDK coding style is not used?

Just accidental :)

> 
>> +     const struct queue *pwrsave = &cfg->power_save_queue;
>> +
>> +     /* if there's only single queue, no need to check anything */
>> +     if (cfg->n_queues == 1)
>> +             return true;
>> +     return cfg->power_save_queue_set && queue_equal(q, pwrsave);
>> +}
>> +
  
Ananyev, Konstantin June 23, 2021, 9:49 a.m. UTC | #3
> 
> On 22-Jun-21 10:41 AM, Ananyev, Konstantin wrote:
> >
> >> Currently, the PMD power management support has a hard limitation:
> >> only a single queue per lcore is supported. This is not ideal, as most
> >> DPDK use cases will poll multiple queues per core.
> >>
> >> The PMD power management mechanism relies on ethdev Rx callbacks, so it
> >> is very difficult to implement such support because callbacks are
> >> effectively stateless and have no visibility into what the other ethdev
> >> devices are doing.  This places limitations on what we can do within the
> >> framework of Rx callbacks, but the basics of this implementation are as
> >> follows:
> >>
> >> - Replace per-queue structures with per-lcore ones, so that any device
> >>    polled from the same lcore can share data
> >> - Any queue that is going to be polled from a specific lcore has to be
> >>    added to the list of cores to poll, so that the callback is aware of
> >>    other queues being polled by the same lcore
> >> - Both the empty poll counter and the actual power saving mechanism are
> >>    shared between all queues polled on a particular lcore, and power saving
> >>    is only activated when a specially designated "power saving" queue is
> >>    polled. To
> >>    put it another way, we have no idea which queue the user will poll in
> >>    what order, so we rely on them telling us that queue X is the last one
> >>    in the polling loop, so any power management should happen there.
> >> - A new API is added to mark a specific Rx queue as "power saving".
> >
> > Honestly, I don't understand the logic behind that new function.
> > I understand that depending on HW we can monitor either one or multiple queues.
> > That's OK, but why do we now need to mark one queue as a 'very special' one?
> 
> Because we don't know which of the queues we are supposed to sleep on.
> 
> Imagine a situation where you have 3 queues. What usually happens is you
> poll them in a loop, so q0, q1, q2, q0, q1, q2... etc. We only want to
> enter a power-optimized state on polling q2, because otherwise we risk
> going into a power-optimized state while q1 or q2 still have traffic.

That's why before going to sleep we need to make sure that for *all* queues
we have at least EMPTYPOLL_MAX empty polls.
Then the order of queue checking wouldn't matter.
With your example it should be:
if (q0.empty_polls > EMPTYPOLL_MAX && q1.empty_polls > EMPTYPOLL_MAX &&
     q2.empty_polls > EMPTYPOLL_MAX)
        goto_sleep;

Don't get me wrong, I am not suggesting making *precisely* those checks
in the actual code (it could be time-consuming if the number of checks is big),
but the logic needs to remain.
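
A hypothetical per-queue version of that check could look like the sketch
below. Note that the per-queue empty_polls counters and the queue_state
struct do not exist in this patch, which keeps a single per-core counter
instead:

	struct queue_state {
		uint16_t portid, qid;
		uint64_t empty_polls; /* hypothetical per-queue counter */
	};

	static bool
	all_queues_idle(const struct queue_state *qs, size_t n)
	{
		size_t i;

		for (i = 0; i < n; i++)
			if (qs[i].empty_polls <= EMPTYPOLL_MAX)
				return false;
		/* every monitored queue has been empty for long enough */
		return true;
	}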

> 
> Worst case scenario, we enter sleep after polling q0, then traffic
> arrives at q2, we wake up, and then attempt to go to sleep on q1 instead
> of skipping it. Essentially, we will be attempting to sleep at every
> queue, instead of once in a loop. This *might* be OK for multi-monitor
> because we'll be aborting sleep due to sleep condition check failure,
> but for modes like rte_pause()/rte_power_pause()-based sleep, we will be
> entering sleep unconditionally, and will risk sleeping at q1 while
> there's traffic at q2.
> 
> So, we need this mechanism to be activated once every *loop*, not per queue.
> 
> > Why can't rte_power_ethdev_pmgmt_queue_enable() just:
> > check whether the number of monitored queues exceeds HW/SW capabilities,
> > and if so, return a failure;
> > otherwise, add the queue to the list and treat them all equally, i.e.
> > go to power save mode when the number of sequential empty polls on
> > all monitored queues exceeds the EMPTYPOLL_MAX threshold?
> >
> >>    Failing to call this API will result in no power management; however,
> >>    when there is only one queue per core, it is obvious which queue is the
> >>    "power saving" one, so things will still work without this new API for
> >>    use cases that previously worked without it.
> >> - The limitation on UMWAIT-based polling is not removed because UMWAIT
> >>    is incapable of monitoring more than one address.
> >>
> >> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> >> ---
> >>   lib/power/rte_power_pmd_mgmt.c | 335 ++++++++++++++++++++++++++-------
> >>   lib/power/rte_power_pmd_mgmt.h |  34 ++++
> >>   lib/power/version.map          |   3 +
> >>   3 files changed, 306 insertions(+), 66 deletions(-)
> >>
> >> diff --git a/lib/power/rte_power_pmd_mgmt.c b/lib/power/rte_power_pmd_mgmt.c
> >> index 0707c60a4f..60dd21a19c 100644
> >> --- a/lib/power/rte_power_pmd_mgmt.c
> >> +++ b/lib/power/rte_power_pmd_mgmt.c
> >> @@ -33,7 +33,19 @@ enum pmd_mgmt_state {
> >>        PMD_MGMT_ENABLED
> >>   };
> >>
> >> -struct pmd_queue_cfg {
> >> +struct queue {
> >> +     uint16_t portid;
> >> +     uint16_t qid;
> >> +};
> >
> > Just a thought: if that would help somehow, it can be changed to:
> > union queue {
> >          uint32_t raw;
> >          struct {
> >                  uint16_t portid, qid;
> >          };
> > };
> >
> > That way, in the queue find/cmp functions below, you can operate on single raw 32-bit values.
> > Probably not that important, as all these functions are on the slow path, but it might look nicer.
> 
> Sure, that can work. We actually do comparisons with power save queue on
> fast path, so maybe that'll help.
> 
> >
> >> +struct pmd_core_cfg {
> >> +     struct queue queues[RTE_MAX_ETHPORTS];
> >
> > If we'll have the ability to monitor multiple queues per lcore, would it always be enough?
> > On the other hand, it is updated on the control path only.
> > Wouldn't a normal list with malloc()/rte_malloc() be more suitable here?
> 
> You're right, it should be dynamically allocated.
> 
> >
> >> +     /**< Which port-queue pairs are associated with this lcore? */
> >> +     struct queue power_save_queue;
> >> +     /**< When polling multiple queues, all but this one will be ignored */
> >> +     bool power_save_queue_set;
> >> +     /**< When polling multiple queues, power save queue must be set */
> >> +     size_t n_queues;
> >> +     /**< How many queues are in the list? */
> >>        volatile enum pmd_mgmt_state pwr_mgmt_state;
> >>        /**< State of power management for this queue */
> >>        enum rte_power_pmd_mgmt_type cb_mode;
> >> @@ -43,8 +55,97 @@ struct pmd_queue_cfg {
> >>        uint64_t empty_poll_stats;
> >>        /**< Number of empty polls */
> >>   } __rte_cache_aligned;
> >> +static struct pmd_core_cfg lcore_cfg[RTE_MAX_LCORE];
> >>
> >> -static struct pmd_queue_cfg port_cfg[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
> >> +static inline bool
> >> +queue_equal(const struct queue *l, const struct queue *r)
> >> +{
> >> +     return l->portid == r->portid && l->qid == r->qid;
> >> +}
> >> +
> >> +static inline void
> >> +queue_copy(struct queue *dst, const struct queue *src)
> >> +{
> >> +     dst->portid = src->portid;
> >> +     dst->qid = src->qid;
> >> +}
> >> +
> >> +static inline bool
> >> +queue_is_power_save(const struct pmd_core_cfg *cfg, const struct queue *q) {
> >
> > Here and in other places - any reason why standard DPDK coding style is not used?
> 
> Just accidental :)
> 
> >
> >> +     const struct queue *pwrsave = &cfg->power_save_queue;
> >> +
> >> +     /* if there's only single queue, no need to check anything */
> >> +     if (cfg->n_queues == 1)
> >> +             return true;
> >> +     return cfg->power_save_queue_set && queue_equal(q, pwrsave);
> >> +}
> >> +
> 
> 
> --
> Thanks,
> Anatoly
  
Burakov, Anatoly June 23, 2021, 9:56 a.m. UTC | #4
On 23-Jun-21 10:49 AM, Ananyev, Konstantin wrote:
> 
>>
>> On 22-Jun-21 10:41 AM, Ananyev, Konstantin wrote:
>>>
>>>> Currently, the PMD power management support has a hard limitation:
>>>> only a single queue per lcore is supported. This is not ideal, as most
>>>> DPDK use cases will poll multiple queues per core.
>>>>
>>>> The PMD power management mechanism relies on ethdev Rx callbacks, so it
>>>> is very difficult to implement such support because callbacks are
>>>> effectively stateless and have no visibility into what the other ethdev
>>>> devices are doing.  This places limitations on what we can do within the
>>>> framework of Rx callbacks, but the basics of this implementation are as
>>>> follows:
>>>>
>>>> - Replace per-queue structures with per-lcore ones, so that any device
>>>>     polled from the same lcore can share data
>>>> - Any queue that is going to be polled from a specific lcore has to be
>>>>     added to the list of cores to poll, so that the callback is aware of
>>>>     other queues being polled by the same lcore
>>>> - Both the empty poll counter and the actual power saving mechanism are
>>>>     shared between all queues polled on a particular lcore, and power saving
>>>>     is only activated when a specially designated "power saving" queue is
>>>>     polled. To
>>>>     put it another way, we have no idea which queue the user will poll in
>>>>     what order, so we rely on them telling us that queue X is the last one
>>>>     in the polling loop, so any power management should happen there.
>>>> - A new API is added to mark a specific Rx queue as "power saving".
>>>
>>> Honestly, I don't understand the logic behind that new function.
>>> I understand that depending on HW we can monitor either one or multiple queues.
>>> That's OK, but why do we now need to mark one queue as a 'very special' one?
>>
>> Because we don't know which of the queues we are supposed to sleep on.
>>
>> Imagine a situation where you have 3 queues. What usually happens is you
>> poll them in a loop, so q0, q1, q2, q0, q1, q2... etc. We only want to
>> enter a power-optimized state on polling q2, because otherwise we risk
>> going into a power-optimized state while q1 or q2 still have traffic.
> 
> That's why before going to sleep we need to make sure that for *all* queues
> we have at least EMPTYPOLL_MAX empty polls.
> Then the order of queue checking wouldn't matter.
> With your example it should be:
> if (q0.empty_polls > EMPTYPOLL_MAX && q1.empty_polls > EMPTYPOLL_MAX &&
>       q2.empty_polls > EMPTYPOLL_MAX)
>          goto_sleep;
> 
> Don't get me wrong, I am not suggesting making *precisely* those checks
> in the actual code (it could be time-consuming if the number of checks is big),
> but the logic needs to remain.
> 

The empty poll counter is *per core*, not *per queue*. All the shared 
data is per core. We only increment the empty poll counter on the last queue, 
but we drop it to 0 on any queue that has received traffic. That way, we 
can avoid checking/incrementing empty poll counters for multiple queues. 
In other words, this is effectively achieving what you're suggesting, 
but without per-queue checks.

Of course, I could make it per-queue like before, but then we just end 
up doing way more checks on every callback and basically need to have 
the same logic anyway, so why bother?
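
Condensed from the patch below, the shared per-core logic in the Rx
callbacks is roughly as follows (enter_power_save() stands in for the
mode-specific action, i.e. pause-sleep or frequency scale-down):

	if (nb_rx != 0) {
		/* any traffic on any queue resets the shared counter */
		q_conf->empty_poll_stats = 0;
	} else {
		/* empty poll: only the power save queue counts it */
		if (!queue_is_power_save(q_conf, &q))
			return nb_rx;
		q_conf->empty_poll_stats++;
		if (q_conf->empty_poll_stats > EMPTYPOLL_MAX)
			enter_power_save();
	}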

>>
>> Worst case scenario, we enter sleep after polling q0, then traffic
>> arrives at q2, we wake up, and then attempt to go to sleep on q1 instead
>> of skipping it. Essentially, we will be attempting to sleep at every
>> queue, instead of once in a loop. This *might* be OK for multi-monitor
>> because we'll be aborting sleep due to sleep condition check failure,
>> but for modes like rte_pause()/rte_power_pause()-based sleep, we will be
>> entering sleep unconditionally, and will risk sleeping at q1 while
>> there's traffic at q2.
>>
>> So, we need this mechanism to be activated once every *loop*, not per queue.
>>
>>> Why can't rte_power_ethdev_pmgmt_queue_enable() just:
>>> check whether the number of monitored queues exceeds HW/SW capabilities,
>>> and if so, return a failure;
>>> otherwise, add the queue to the list and treat them all equally, i.e.
>>> go to power save mode when the number of sequential empty polls on
>>> all monitored queues exceeds the EMPTYPOLL_MAX threshold?
>>>
>>>>     Failing to call this API will result in no power management; however,
>>>>     when there is only one queue per core, it is obvious which queue is the
>>>>     "power saving" one, so things will still work without this new API for
>>>>     use cases that previously worked without it.
>>>> - The limitation on UMWAIT-based polling is not removed because UMWAIT
>>>>     is incapable of monitoring more than one address.
>>>>
>>>> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
>>>> ---
>>>>    lib/power/rte_power_pmd_mgmt.c | 335 ++++++++++++++++++++++++++-------
>>>>    lib/power/rte_power_pmd_mgmt.h |  34 ++++
>>>>    lib/power/version.map          |   3 +
>>>>    3 files changed, 306 insertions(+), 66 deletions(-)
>>>>
>>>> diff --git a/lib/power/rte_power_pmd_mgmt.c b/lib/power/rte_power_pmd_mgmt.c
>>>> index 0707c60a4f..60dd21a19c 100644
>>>> --- a/lib/power/rte_power_pmd_mgmt.c
>>>> +++ b/lib/power/rte_power_pmd_mgmt.c
>>>> @@ -33,7 +33,19 @@ enum pmd_mgmt_state {
>>>>         PMD_MGMT_ENABLED
>>>>    };
>>>>
>>>> -struct pmd_queue_cfg {
>>>> +struct queue {
>>>> +     uint16_t portid;
>>>> +     uint16_t qid;
>>>> +};
>>>
>>> Just a thought: if that would help somehow, it can be changed to:
>>> union queue {
>>>           uint32_t raw;
>>>           struct {
>>>                   uint16_t portid, qid;
>>>           };
>>> };
>>>
>>> That way, in the queue find/cmp functions below, you can operate on single raw 32-bit values.
>>> Probably not that important, as all these functions are on the slow path, but it might look nicer.
>>
>> Sure, that can work. We actually do comparisons with power save queue on
>> fast path, so maybe that'll help.
>>
>>>
>>>> +struct pmd_core_cfg {
>>>> +     struct queue queues[RTE_MAX_ETHPORTS];
>>>
>>> If we'll have the ability to monitor multiple queues per lcore, would it always be enough?
>>> On the other hand, it is updated on the control path only.
>>> Wouldn't a normal list with malloc()/rte_malloc() be more suitable here?
>>
>> You're right, it should be dynamically allocated.
>>
>>>
>>>> +     /**< Which port-queue pairs are associated with this lcore? */
>>>> +     struct queue power_save_queue;
>>>> +     /**< When polling multiple queues, all but this one will be ignored */
>>>> +     bool power_save_queue_set;
>>>> +     /**< When polling multiple queues, power save queue must be set */
>>>> +     size_t n_queues;
>>>> +     /**< How many queues are in the list? */
>>>>         volatile enum pmd_mgmt_state pwr_mgmt_state;
>>>>         /**< State of power management for this queue */
>>>>         enum rte_power_pmd_mgmt_type cb_mode;
>>>> @@ -43,8 +55,97 @@ struct pmd_queue_cfg {
>>>>         uint64_t empty_poll_stats;
>>>>         /**< Number of empty polls */
>>>>    } __rte_cache_aligned;
>>>> +static struct pmd_core_cfg lcore_cfg[RTE_MAX_LCORE];
>>>>
>>>> -static struct pmd_queue_cfg port_cfg[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
>>>> +static inline bool
>>>> +queue_equal(const struct queue *l, const struct queue *r)
>>>> +{
>>>> +     return l->portid == r->portid && l->qid == r->qid;
>>>> +}
>>>> +
>>>> +static inline void
>>>> +queue_copy(struct queue *dst, const struct queue *src)
>>>> +{
>>>> +     dst->portid = src->portid;
>>>> +     dst->qid = src->qid;
>>>> +}
>>>> +
>>>> +static inline bool
>>>> +queue_is_power_save(const struct pmd_core_cfg *cfg, const struct queue *q) {
>>>
>>> Here and in other places - any reason why standard DPDK coding style is not used?
>>
>> Just accidental :)
>>
>>>
>>>> +     const struct queue *pwrsave = &cfg->power_save_queue;
>>>> +
>>>> +     /* if there's only single queue, no need to check anything */
>>>> +     if (cfg->n_queues == 1)
>>>> +             return true;
>>>> +     return cfg->power_save_queue_set && queue_equal(q, pwrsave);
>>>> +}
>>>> +
>>
>>
>> --
>> Thanks,
>> Anatoly
  

Patch

diff --git a/lib/power/rte_power_pmd_mgmt.c b/lib/power/rte_power_pmd_mgmt.c
index 0707c60a4f..60dd21a19c 100644
--- a/lib/power/rte_power_pmd_mgmt.c
+++ b/lib/power/rte_power_pmd_mgmt.c
@@ -33,7 +33,19 @@  enum pmd_mgmt_state {
 	PMD_MGMT_ENABLED
 };
 
-struct pmd_queue_cfg {
+struct queue {
+	uint16_t portid;
+	uint16_t qid;
+};
+struct pmd_core_cfg {
+	struct queue queues[RTE_MAX_ETHPORTS];
+	/**< Which port-queue pairs are associated with this lcore? */
+	struct queue power_save_queue;
+	/**< When polling multiple queues, all but this one will be ignored */
+	bool power_save_queue_set;
+	/**< When polling multiple queues, power save queue must be set */
+	size_t n_queues;
+	/**< How many queues are in the list? */
 	volatile enum pmd_mgmt_state pwr_mgmt_state;
 	/**< State of power management for this queue */
 	enum rte_power_pmd_mgmt_type cb_mode;
@@ -43,8 +55,97 @@  struct pmd_queue_cfg {
 	uint64_t empty_poll_stats;
 	/**< Number of empty polls */
 } __rte_cache_aligned;
+static struct pmd_core_cfg lcore_cfg[RTE_MAX_LCORE];
 
-static struct pmd_queue_cfg port_cfg[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
+static inline bool
+queue_equal(const struct queue *l, const struct queue *r)
+{
+	return l->portid == r->portid && l->qid == r->qid;
+}
+
+static inline void
+queue_copy(struct queue *dst, const struct queue *src)
+{
+	dst->portid = src->portid;
+	dst->qid = src->qid;
+}
+
+static inline bool
+queue_is_power_save(const struct pmd_core_cfg *cfg, const struct queue *q) {
+	const struct queue *pwrsave = &cfg->power_save_queue;
+
+	/* if there's only single queue, no need to check anything */
+	if (cfg->n_queues == 1)
+		return true;
+	return cfg->power_save_queue_set && queue_equal(q, pwrsave);
+}
+
+static int
+queue_list_find(const struct pmd_core_cfg *cfg, const struct queue *q,
+		size_t *idx) {
+	size_t i;
+	for (i = 0; i < cfg->n_queues; i++) {
+		const struct queue *cur = &cfg->queues[i];
+		if (queue_equal(cur, q)) {
+			if (idx != NULL)
+				*idx = i;
+			return 0;
+		}
+	}
+	return -1;
+}
+
+static int
+queue_set_power_save(struct pmd_core_cfg *cfg, const struct queue *q) {
+	if (queue_list_find(cfg, q, NULL) < 0)
+		return -ENOENT;
+	queue_copy(&cfg->power_save_queue, q);
+	cfg->power_save_queue_set = true;
+	return 0;
+}
+
+static int
+queue_list_add(struct pmd_core_cfg *cfg, const struct queue *q)
+{
+	size_t idx = cfg->n_queues;
+	if (idx >= RTE_DIM(cfg->queues))
+		return -ENOSPC;
+	/* is it already in the list? */
+	if (queue_list_find(cfg, q, NULL) == 0)
+		return -EEXIST;
+	queue_copy(&cfg->queues[idx], q);
+	cfg->n_queues++;
+
+	return 0;
+}
+
+static int
+queue_list_remove(struct pmd_core_cfg *cfg, const struct queue *q)
+{
+	struct queue *found, *pwrsave;
+	size_t idx, last_idx = cfg->n_queues - 1;
+
+	if (queue_list_find(cfg, q, &idx) != 0)
+		return -ENOENT;
+
+	/* erase the queue pair being deleted */
+	found = &cfg->queues[idx];
+	memset(found, 0, sizeof(*found));
+
+	/* move the rest of the list */
+	for (; idx < last_idx; idx++)
+		queue_copy(&cfg->queues[idx], &cfg->queues[idx + 1]);
+	cfg->n_queues--;
+
+	/* if this was a power save queue, unset it */
+	pwrsave = &cfg->power_save_queue;
+	if (cfg->power_save_queue_set && queue_is_power_save(cfg, q)) {
+		cfg->power_save_queue_set = false;
+		memset(pwrsave, 0, sizeof(*pwrsave));
+	}
+
+	return 0;
+}
 
 static void
 calc_tsc(void)
@@ -79,10 +180,10 @@  clb_umwait(uint16_t port_id, uint16_t qidx, struct rte_mbuf **pkts __rte_unused,
 		uint16_t nb_rx, uint16_t max_pkts __rte_unused,
 		void *addr __rte_unused)
 {
+	const unsigned int lcore = rte_lcore_id();
+	struct pmd_core_cfg *q_conf;
 
-	struct pmd_queue_cfg *q_conf;
-
-	q_conf = &port_cfg[port_id][qidx];
+	q_conf = &lcore_cfg[lcore];
 
 	if (unlikely(nb_rx == 0)) {
 		q_conf->empty_poll_stats++;
@@ -107,11 +208,26 @@  clb_pause(uint16_t port_id, uint16_t qidx, struct rte_mbuf **pkts __rte_unused,
 		uint16_t nb_rx, uint16_t max_pkts __rte_unused,
 		void *addr __rte_unused)
 {
-	struct pmd_queue_cfg *q_conf;
+	const unsigned int lcore = rte_lcore_id();
+	const struct queue q = {port_id, qidx};
+	const bool empty = nb_rx == 0;
+	struct pmd_core_cfg *q_conf;
 
-	q_conf = &port_cfg[port_id][qidx];
+	q_conf = &lcore_cfg[lcore];
 
-	if (unlikely(nb_rx == 0)) {
+	/* early exit */
+	if (likely(!empty)) {
+		q_conf->empty_poll_stats = 0;
+	} else {
+		/* do we care about this particular queue? */
+		if (!queue_is_power_save(q_conf, &q))
+			return nb_rx;
+
+		/*
+		 * we can increment unconditionally here because if there were
+		 * non-empty polls in other queues assigned to this core, we
+		 * dropped the counter to zero anyway.
+		 */
 		q_conf->empty_poll_stats++;
 		/* sleep for 1 microsecond */
 		if (unlikely(q_conf->empty_poll_stats > EMPTYPOLL_MAX)) {
@@ -127,8 +243,7 @@  clb_pause(uint16_t port_id, uint16_t qidx, struct rte_mbuf **pkts __rte_unused,
 					rte_pause();
 			}
 		}
-	} else
-		q_conf->empty_poll_stats = 0;
+	}
 
 	return nb_rx;
 }
@@ -138,29 +253,97 @@  clb_scale_freq(uint16_t port_id, uint16_t qidx,
 		struct rte_mbuf **pkts __rte_unused, uint16_t nb_rx,
 		uint16_t max_pkts __rte_unused, void *_  __rte_unused)
 {
-	struct pmd_queue_cfg *q_conf;
+	const unsigned int lcore = rte_lcore_id();
+	const struct queue q = {port_id, qidx};
+	const bool empty = nb_rx == 0;
+	struct pmd_core_cfg *q_conf;
 
-	q_conf = &port_cfg[port_id][qidx];
+	q_conf = &lcore_cfg[lcore];
 
-	if (unlikely(nb_rx == 0)) {
+	/* early exit */
+	if (likely(!empty)) {
+		q_conf->empty_poll_stats = 0;
+
+		/* scale up freq immediately */
+		rte_power_freq_max(rte_lcore_id());
+	} else {
+		/* do we care about this particular queue? */
+		if (!queue_is_power_save(q_conf, &q))
+			return nb_rx;
+
+		/*
+		 * we can increment unconditionally here because if there were
+		 * non-empty polls in other queues assigned to this core, we
+		 * dropped the counter to zero anyway.
+		 */
 		q_conf->empty_poll_stats++;
 		if (unlikely(q_conf->empty_poll_stats > EMPTYPOLL_MAX))
 			/* scale down freq */
 			rte_power_freq_min(rte_lcore_id());
-	} else {
-		q_conf->empty_poll_stats = 0;
-		/* scale up freq */
-		rte_power_freq_max(rte_lcore_id());
 	}
 
 	return nb_rx;
 }
 
+static int
+check_scale(unsigned int lcore)
+{
+	enum power_management_env env;
+
+	/* only PSTATE and ACPI modes are supported */
+	if (!rte_power_check_env_supported(PM_ENV_ACPI_CPUFREQ) &&
+	    !rte_power_check_env_supported(PM_ENV_PSTATE_CPUFREQ)) {
+		RTE_LOG(DEBUG, POWER, "Neither ACPI nor PSTATE modes are supported\n");
+		return -ENOTSUP;
+	}
+	/* ensure we could initialize the power library */
+	if (rte_power_init(lcore))
+		return -EINVAL;
+
+	/* ensure we initialized the correct env */
+	env = rte_power_get_env();
+	if (env != PM_ENV_ACPI_CPUFREQ && env != PM_ENV_PSTATE_CPUFREQ) {
+		RTE_LOG(DEBUG, POWER, "Neither ACPI nor PSTATE modes were initialized\n");
+		return -ENOTSUP;
+	}
+
+	/* we're done */
+	return 0;
+}
+
+static int
+check_monitor(struct pmd_core_cfg *cfg, const struct queue *qdata)
+{
+	struct rte_power_monitor_cond dummy;
+
+	/* check if rte_power_monitor is supported */
+	if (!global_data.intrinsics_support.power_monitor) {
+		RTE_LOG(DEBUG, POWER, "Monitoring intrinsics are not supported\n");
+		return -ENOTSUP;
+	}
+
+	if (cfg->n_queues > 0) {
+		RTE_LOG(DEBUG, POWER, "Monitoring multiple queues is not supported\n");
+		return -ENOTSUP;
+	}
+
+	/* check if the device supports the necessary PMD API */
+	if (rte_eth_get_monitor_addr(qdata->portid, qdata->qid,
+			&dummy) == -ENOTSUP) {
+		RTE_LOG(DEBUG, POWER, "The device does not support rte_eth_get_monitor_addr\n");
+		return -ENOTSUP;
+	}
+
+	/* we're done */
+	return 0;
+}
+
 int
 rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 		uint16_t queue_id, enum rte_power_pmd_mgmt_type mode)
 {
-	struct pmd_queue_cfg *queue_cfg;
+	const struct queue qdata = {port_id, queue_id};
+	struct pmd_core_cfg *queue_cfg;
 	struct rte_eth_dev_info info;
 	rte_rx_callback_fn clb;
 	int ret;
@@ -183,9 +366,11 @@  rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 		goto end;
 	}
 
-	queue_cfg = &port_cfg[port_id][queue_id];
+	queue_cfg = &lcore_cfg[lcore_id];
 
-	if (queue_cfg->pwr_mgmt_state != PMD_MGMT_DISABLED) {
+	/* if callback was already enabled, check current callback type */
+	if (queue_cfg->pwr_mgmt_state != PMD_MGMT_DISABLED &&
+			queue_cfg->cb_mode != mode) {
 		ret = -EINVAL;
 		goto end;
 	}
@@ -195,53 +380,20 @@  rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 
 	switch (mode) {
 	case RTE_POWER_MGMT_TYPE_MONITOR:
-	{
-		struct rte_power_monitor_cond dummy;
-
-		/* check if rte_power_monitor is supported */
-		if (!global_data.intrinsics_support.power_monitor) {
-			RTE_LOG(DEBUG, POWER, "Monitoring intrinsics are not supported\n");
-			ret = -ENOTSUP;
+		/* check if we can add a new queue */
+		ret = check_monitor(queue_cfg, &qdata);
+		if (ret < 0)
 			goto end;
-		}
 
-		/* check if the device supports the necessary PMD API */
-		if (rte_eth_get_monitor_addr(port_id, queue_id,
-				&dummy) == -ENOTSUP) {
-			RTE_LOG(DEBUG, POWER, "The device does not support rte_eth_get_monitor_addr\n");
-			ret = -ENOTSUP;
-			goto end;
-		}
 		clb = clb_umwait;
 		break;
-	}
 	case RTE_POWER_MGMT_TYPE_SCALE:
-	{
-		enum power_management_env env;
-		/* only PSTATE and ACPI modes are supported */
-		if (!rte_power_check_env_supported(PM_ENV_ACPI_CPUFREQ) &&
-				!rte_power_check_env_supported(
-					PM_ENV_PSTATE_CPUFREQ)) {
-			RTE_LOG(DEBUG, POWER, "Neither ACPI nor PSTATE modes are supported\n");
-			ret = -ENOTSUP;
+		/* check if we can add a new queue */
+		ret = check_scale(lcore_id);
+		if (ret < 0)
 			goto end;
-		}
-		/* ensure we could initialize the power library */
-		if (rte_power_init(lcore_id)) {
-			ret = -EINVAL;
-			goto end;
-		}
-		/* ensure we initialized the correct env */
-		env = rte_power_get_env();
-		if (env != PM_ENV_ACPI_CPUFREQ &&
-				env != PM_ENV_PSTATE_CPUFREQ) {
-			RTE_LOG(DEBUG, POWER, "Neither ACPI nor PSTATE modes were initialized\n");
-			ret = -ENOTSUP;
-			goto end;
-		}
 		clb = clb_scale_freq;
 		break;
-	}
 	case RTE_POWER_MGMT_TYPE_PAUSE:
 		/* figure out various time-to-tsc conversions */
 		if (global_data.tsc_per_us == 0)
@@ -254,11 +406,20 @@  rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 		ret = -EINVAL;
 		goto end;
 	}
+	/* add this queue to the list */
+	ret = queue_list_add(queue_cfg, &qdata);
+	if (ret < 0) {
+		RTE_LOG(DEBUG, POWER, "Failed to add queue to list: %s\n",
+				strerror(-ret));
+		goto end;
+	}
 
 	/* initialize data before enabling the callback */
-	queue_cfg->empty_poll_stats = 0;
-	queue_cfg->cb_mode = mode;
-	queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
+	if (queue_cfg->n_queues == 1) {
+		queue_cfg->empty_poll_stats = 0;
+		queue_cfg->cb_mode = mode;
+		queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
+	}
 	queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, queue_id,
 			clb, NULL);
 
@@ -271,7 +432,9 @@  int
 rte_power_ethdev_pmgmt_queue_disable(unsigned int lcore_id,
 		uint16_t port_id, uint16_t queue_id)
 {
-	struct pmd_queue_cfg *queue_cfg;
+	const struct queue qdata = {port_id, queue_id};
+	struct pmd_core_cfg *queue_cfg;
+	int ret;
 
 	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
 
@@ -279,13 +442,24 @@  rte_power_ethdev_pmgmt_queue_disable(unsigned int lcore_id,
 		return -EINVAL;
 
 	/* no need to check queue id as wrong queue id would not be enabled */
-	queue_cfg = &port_cfg[port_id][queue_id];
+	queue_cfg = &lcore_cfg[lcore_id];
 
 	if (queue_cfg->pwr_mgmt_state != PMD_MGMT_ENABLED)
 		return -EINVAL;
 
-	/* stop any callbacks from progressing */
-	queue_cfg->pwr_mgmt_state = PMD_MGMT_DISABLED;
+	/*
+	 * There is no good/easy way to do this without race conditions, so we
+	 * are just going to throw our hands in the air and hope that the user
+	 * has read the documentation and has ensured that ports are stopped at
+	 * the time we enter the API functions.
+	 */
+	ret = queue_list_remove(queue_cfg, &qdata);
+	if (ret < 0)
+		return -ret;
+
+	/* if we've removed all queues from the lists, set state to disabled */
+	if (queue_cfg->n_queues == 0)
+		queue_cfg->pwr_mgmt_state = PMD_MGMT_DISABLED;
 
 	switch (queue_cfg->cb_mode) {
 	case RTE_POWER_MGMT_TYPE_MONITOR: /* fall-through */
@@ -309,3 +483,32 @@  rte_power_ethdev_pmgmt_queue_disable(unsigned int lcore_id,
 
 	return 0;
 }
+
+int
+rte_power_ethdev_pmgmt_queue_set_power_save(unsigned int lcore_id,
+		uint16_t port_id, uint16_t queue_id)
+{
+	const struct queue qdata = {port_id, queue_id};
+	struct pmd_core_cfg *queue_cfg;
+	int ret;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
+
+	if (lcore_id >= RTE_MAX_LCORE || queue_id >= RTE_MAX_QUEUES_PER_PORT)
+		return -EINVAL;
+
+	/* no need to check queue id as wrong queue id would not be enabled */
+	queue_cfg = &lcore_cfg[lcore_id];
+
+	if (queue_cfg->pwr_mgmt_state != PMD_MGMT_ENABLED)
+		return -EINVAL;
+
+	ret = queue_set_power_save(queue_cfg, &qdata);
+	if (ret < 0) {
+		RTE_LOG(DEBUG, POWER, "Failed to set power save queue: %s\n",
+			strerror(-ret));
+		return -ret;
+	}
+
+	return 0;
+}
diff --git a/lib/power/rte_power_pmd_mgmt.h b/lib/power/rte_power_pmd_mgmt.h
index 7557f5d7e1..edf8d8714f 100644
--- a/lib/power/rte_power_pmd_mgmt.h
+++ b/lib/power/rte_power_pmd_mgmt.h
@@ -90,6 +90,40 @@  int
 rte_power_ethdev_pmgmt_queue_disable(unsigned int lcore_id,
 		uint16_t port_id, uint16_t queue_id);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice.
+ *
+ * Set a specific Ethernet device Rx queue to be the "power save" queue for a
+ * particular lcore. When multiple queues are assigned to a single lcore using
+ * the `rte_power_ethdev_pmgmt_queue_enable` API, only one of them will trigger
+ * the power management. In a typical scenario, the last queue to be polled on
+ * a particular lcore should be designated as power save queue.
+ *
+ * @note This function is not thread-safe.
+ *
+ * @note When using multiple queues per lcore, calling this function is
+ *   mandatory. If not called, no power management routines would be triggered
+ *   when the traffic starts.
+ *
+ * @warning This function must be called when all affected Ethernet ports are
+ *   stopped and no Rx/Tx is in progress!
+ *
+ * @param lcore_id
+ *   The lcore the Rx queue is polled from.
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param queue_id
+ *   The queue identifier of the Ethernet device.
+ * @return
+ *   0 on success
+ *   <0 on error
+ */
+__rte_experimental
+int
+rte_power_ethdev_pmgmt_queue_set_power_save(unsigned int lcore_id,
+		uint16_t port_id, uint16_t queue_id);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/power/version.map b/lib/power/version.map
index b004e3e4a9..105d1d94c2 100644
--- a/lib/power/version.map
+++ b/lib/power/version.map
@@ -38,4 +38,7 @@  EXPERIMENTAL {
 	# added in 21.02
 	rte_power_ethdev_pmgmt_queue_disable;
 	rte_power_ethdev_pmgmt_queue_enable;
+
+	# added in 21.08
+	rte_power_ethdev_pmgmt_queue_set_power_save;
 };