[v10,1/2] power: introduce PM QoS API on CPU wide

Message ID 20240912023812.30885-2-lihuisong@huawei.com (mailing list archive)
State Changes Requested
Delegated to: Thomas Monjalon
Headers
Series power: introduce PM QoS interface |

Checks

Context Check Description
ci/checkpatch success coding style OK

Commit Message

lihuisong (C) Sept. 12, 2024, 2:38 a.m. UTC
The deeper the idle state, the lower the power consumption, but the longer
the resume time. Some service are delay sensitive and very except the low
resume time, like interrupt packet receiving mode.

And the "/sys/devices/system/cpu/cpuX/power/pm_qos_resume_latency_us" sysfs
interface is used to set and get the resume latency limit on the cpuX for
userspace. Each cpuidle governor in Linux select which idle state to enter
based on this CPU resume latency in their idle task.

The per-CPU PM QoS API can be used to control this CPU's idle state
selection and limit just enter the shallowest idle state to low the delay
after sleep by setting strict resume latency (zero value).

Signed-off-by: Huisong Li <lihuisong@huawei.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
---
 doc/guides/prog_guide/power_man.rst    |  24 ++++++
 doc/guides/rel_notes/release_24_11.rst |   5 ++
 lib/power/meson.build                  |   2 +
 lib/power/rte_power_qos.c              | 111 +++++++++++++++++++++++++
 lib/power/rte_power_qos.h              |  73 ++++++++++++++++
 lib/power/version.map                  |   4 +
 6 files changed, 219 insertions(+)
 create mode 100644 lib/power/rte_power_qos.c
 create mode 100644 lib/power/rte_power_qos.h
  

Comments

Stephen Hemminger Oct. 13, 2024, 1:10 a.m. UTC | #1
On Thu, 12 Sep 2024 10:38:11 +0800
Huisong Li <lihuisong@huawei.com> wrote:

> +
> +PM QoS
> +------
> +
> +The deeper the idle state, the lower the power consumption, but the longer
> +the resume time. Some service are delay sensitive and very except the low
> +resume time, like interrupt packet receiving mode.
> +
> +And the "/sys/devices/system/cpu/cpuX/power/pm_qos_resume_latency_us" sysfs
> +interface is used to set and get the resume latency limit on the cpuX for
> +userspace. Each cpuidle governor in Linux select which idle state to enter
> +based on this CPU resume latency in their idle task.
> +
> +The per-CPU PM QoS API can be used to set and get the CPU resume latency based
> +on this sysfs.
> +
> +The ``rte_power_qos_set_cpu_resume_latency()`` function can control the CPU's
> +idle state selection in Linux and limit just to enter the shallowest idle state
> +to low the delay of resuming service after sleeping by setting strict resume
> +latency (zero value).
> +
> +The ``rte_power_qos_get_cpu_resume_latency()`` function can get the resume
> +latency on specified CPU.
> +

These paragraphs need some editing help. The wording is awkward,
it uses passive voice, and does not seemed directed at a user audience.
If you need help, a writer or AI might help clarify.

It also ends up in the section associated with Intel UnCore.
It would be better after the Turbo Boost section.



> +	ret = open_core_sysfs_file(&f, "w", PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
> +	if (ret != 0) {
> +		POWER_LOG(ERR, "Failed to open "PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
> +		return ret;
> +	}

The function open_core_sysfs_file() should have been written to return FILE *
and then it could have same attributes as fopen().

The message should include the error reason.

	if (open_core_sysfs_file(&f, "w", PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id)) {
		POWER_LOG(ERR, "Failed to open " PM_QOS_SYSFILE_RESUME_LATENCY_US ": %s",
			lcore_id, strerror(errno));
  
Konstantin Ananyev Oct. 14, 2024, 8:29 a.m. UTC | #2
> The deeper the idle state, the lower the power consumption, but the longer
> the resume time. Some service are delay sensitive and very except the low
> resume time, like interrupt packet receiving mode.
> 
> And the "/sys/devices/system/cpu/cpuX/power/pm_qos_resume_latency_us" sysfs
> interface is used to set and get the resume latency limit on the cpuX for
> userspace. Each cpuidle governor in Linux select which idle state to enter
> based on this CPU resume latency in their idle task.
> 
> The per-CPU PM QoS API can be used to control this CPU's idle state
> selection and limit just enter the shallowest idle state to low the delay
> after sleep by setting strict resume latency (zero value).
> 
> Signed-off-by: Huisong Li <lihuisong@huawei.com>
> Acked-by: Morten Brørup <mb@smartsharesystems.com>
> ---
>  doc/guides/prog_guide/power_man.rst    |  24 ++++++
>  doc/guides/rel_notes/release_24_11.rst |   5 ++
>  lib/power/meson.build                  |   2 +
>  lib/power/rte_power_qos.c              | 111 +++++++++++++++++++++++++
>  lib/power/rte_power_qos.h              |  73 ++++++++++++++++
>  lib/power/version.map                  |   4 +
>  6 files changed, 219 insertions(+)
>  create mode 100644 lib/power/rte_power_qos.c
>  create mode 100644 lib/power/rte_power_qos.h
> 
> diff --git a/doc/guides/prog_guide/power_man.rst b/doc/guides/prog_guide/power_man.rst
> index f6674efe2d..faa32b4320 100644
> --- a/doc/guides/prog_guide/power_man.rst
> +++ b/doc/guides/prog_guide/power_man.rst
> @@ -249,6 +249,30 @@ Get Num Pkgs
>  Get Num Dies
>    Get the number of die's on a given package.
> 
> +
> +PM QoS
> +------
> +
> +The deeper the idle state, the lower the power consumption, but the longer
> +the resume time. Some service are delay sensitive and very except the low
> +resume time, like interrupt packet receiving mode.
> +
> +And the "/sys/devices/system/cpu/cpuX/power/pm_qos_resume_latency_us" sysfs
> +interface is used to set and get the resume latency limit on the cpuX for
> +userspace. Each cpuidle governor in Linux select which idle state to enter
> +based on this CPU resume latency in their idle task.
> +
> +The per-CPU PM QoS API can be used to set and get the CPU resume latency based
> +on this sysfs.
> +
> +The ``rte_power_qos_set_cpu_resume_latency()`` function can control the CPU's
> +idle state selection in Linux and limit just to enter the shallowest idle state
> +to low the delay of resuming service after sleeping by setting strict resume
> +latency (zero value).
> +
> +The ``rte_power_qos_get_cpu_resume_latency()`` function can get the resume
> +latency on specified CPU.
> +
>  References
>  ----------
> 
> diff --git a/doc/guides/rel_notes/release_24_11.rst b/doc/guides/rel_notes/release_24_11.rst
> index 0ff70d9057..bd72d0a595 100644
> --- a/doc/guides/rel_notes/release_24_11.rst
> +++ b/doc/guides/rel_notes/release_24_11.rst
> @@ -55,6 +55,11 @@ New Features
>       Also, make sure to start the actual text at the margin.
>       =======================================================
> 
> +* **Introduce per-CPU PM QoS interface.**
> +
> +  * Add per-CPU PM QoS interface to low the delay after sleep by controlling
> +    CPU idle state selection.
> +
> 
>  Removed Items
>  -------------
> diff --git a/lib/power/meson.build b/lib/power/meson.build
> index b8426589b2..8222e178b0 100644
> --- a/lib/power/meson.build
> +++ b/lib/power/meson.build
> @@ -23,12 +23,14 @@ sources = files(
>          'rte_power.c',
>          'rte_power_uncore.c',
>          'rte_power_pmd_mgmt.c',
> +        'rte_power_qos.c',
>  )
>  headers = files(
>          'rte_power.h',
>          'rte_power_guest_channel.h',
>          'rte_power_pmd_mgmt.h',
>          'rte_power_uncore.h',
> +        'rte_power_qos.h',
>  )
>  if cc.has_argument('-Wno-cast-qual')
>      cflags += '-Wno-cast-qual'
> diff --git a/lib/power/rte_power_qos.c b/lib/power/rte_power_qos.c
> new file mode 100644
> index 0000000000..8eb26cd41a
> --- /dev/null
> +++ b/lib/power/rte_power_qos.c
> @@ -0,0 +1,111 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2024 HiSilicon Limited
> + */
> +
> +#include <errno.h>
> +#include <stdlib.h>
> +#include <string.h>
> +
> +#include <rte_lcore.h>
> +#include <rte_log.h>
> +
> +#include "power_common.h"
> +#include "rte_power_qos.h"
> +
> +#define PM_QOS_SYSFILE_RESUME_LATENCY_US	\
> +	"/sys/devices/system/cpu/cpu%u/power/pm_qos_resume_latency_us"
> +
> +#define PM_QOS_CPU_RESUME_LATENCY_BUF_LEN	32
> +
> +int
> +rte_power_qos_set_cpu_resume_latency(uint16_t lcore_id, int latency)
> +{
> +	char buf[PM_QOS_CPU_RESUME_LATENCY_BUF_LEN];
> +	FILE *f;
> +	int ret;
> +
> +	if (!rte_lcore_is_enabled(lcore_id)) {
> +		POWER_LOG(ERR, "lcore id %u is not enabled", lcore_id);
> +		return -EINVAL;
> +	}
> +
> +	if (latency < 0) {
> +		POWER_LOG(ERR, "latency should be greater than and equal to 0");
> +		return -EINVAL;
> +	}
> +
> +	ret = open_core_sysfs_file(&f, "w", PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);

That was already brought by Morten:
lcore_id is not always equal to cpu_core_id (cpu affinity).
Looking through power library it is not specific to that particular patch,
but sort of common limitation (bug?) in rte_power lib.  
 

> +	if (ret != 0) {
> +		POWER_LOG(ERR, "Failed to open "PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
> +		return ret;
> +	}
> +
> +	/*
> +	 * Based on the sysfs interface pm_qos_resume_latency_us under
> +	 * @PM_QOS_SYSFILE_RESUME_LATENCY_US directory in kernel, their meaning
> +	 * is as follows for different input string.
> +	 * 1> the resume latency is 0 if the input is "n/a".
> +	 * 2> the resume latency is no constraint if the input is "0".
> +	 * 3> the resume latency is the actual value to be set.
> +	 */
> +	if (latency == 0)
> +		snprintf(buf, sizeof(buf), "%s", "n/a");
> +	else if (latency == RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT)
> +		snprintf(buf, sizeof(buf), "%u", 0);
> +	else
> +		snprintf(buf, sizeof(buf), "%u", latency);
> +
> +	ret = write_core_sysfs_s(f, buf);
> +	if (ret != 0)
> +		POWER_LOG(ERR, "Failed to write "PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
> +
> +	fclose(f);
> +
> +	return ret;
> +}
> +
> +int
> +rte_power_qos_get_cpu_resume_latency(uint16_t lcore_id)
> +{
> +	char buf[PM_QOS_CPU_RESUME_LATENCY_BUF_LEN];
> +	int latency = -1;
> +	FILE *f;
> +	int ret;
> +
> +	if (!rte_lcore_is_enabled(lcore_id)) {
> +		POWER_LOG(ERR, "lcore id %u is not enabled", lcore_id);
> +		return -EINVAL;
> +	}
> +
> +	ret = open_core_sysfs_file(&f, "r", PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
> +	if (ret != 0) {
> +		POWER_LOG(ERR, "Failed to open "PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
> +		return ret;
> +	}
> +
> +	ret = read_core_sysfs_s(f, buf, sizeof(buf));
> +	if (ret != 0) {
> +		POWER_LOG(ERR, "Failed to read "PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
> +		goto out;
> +	}
> +
> +	/*
> +	 * Based on the sysfs interface pm_qos_resume_latency_us under
> +	 * @PM_QOS_SYSFILE_RESUME_LATENCY_US directory in kernel, their meaning
> +	 * is as follows for different output string.
> +	 * 1> the resume latency is 0 if the output is "n/a".
> +	 * 2> the resume latency is no constraint if the output is "0".
> +	 * 3> the resume latency is the actual value in used for other string.
> +	 */
> +	if (strcmp(buf, "n/a") == 0)
> +		latency = 0;
> +	else {
> +		latency = strtoul(buf, NULL, 10);
> +		latency = latency == 0 ? RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT : latency;
> +	}
> +
> +out:
> +	fclose(f);
> +
> +	return latency != -1 ? latency : ret;
> +}
> diff --git a/lib/power/rte_power_qos.h b/lib/power/rte_power_qos.h
> new file mode 100644
> index 0000000000..990c488373
> --- /dev/null
> +++ b/lib/power/rte_power_qos.h
> @@ -0,0 +1,73 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2024 HiSilicon Limited
> + */
> +
> +#ifndef RTE_POWER_QOS_H
> +#define RTE_POWER_QOS_H
> +
> +#include <stdint.h>
> +
> +#include <rte_compat.h>
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +/**
> + * @file rte_power_qos.h
> + *
> + * PM QoS API.
> + *
> + * The CPU-wide resume latency limit has a positive impact on this CPU's idle
> + * state selection in each cpuidle governor.
> + * Please see the PM QoS on CPU wide in the following link:
> + * https://www.kernel.org/doc/html/latest/admin-guide/abi-testing.html?highlight=pm_qos_resume_latency_us#abi-sys-devices-
> power-pm-qos-resume-latency-us
> + *
> + * The deeper the idle state, the lower the power consumption, but the
> + * longer the resume time. Some service are delay sensitive and very except the
> + * low resume time, like interrupt packet receiving mode.
> + *
> + * In these case, per-CPU PM QoS API can be used to control this CPU's idle
> + * state selection and limit just enter the shallowest idle state to low the
> + * delay after sleep by setting strict resume latency (zero value).
> + */
> +
> +#define RTE_POWER_QOS_STRICT_LATENCY_VALUE             0
> +#define RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT    ((int)(UINT32_MAX >> 1))
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * @param lcore_id
> + *   target logical core id
> + *
> + * @param latency
> + *   The latency should be greater than and equal to zero in microseconds unit.
> + *
> + * @return
> + *   0 on success. Otherwise negative value is returned.
> + */
> +__rte_experimental
> +int rte_power_qos_set_cpu_resume_latency(uint16_t lcore_id, int latency);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Get the current resume latency of this logical core.
> + * The default value in kernel is @see RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT
> + * if don't set it.
> + *
> + * @return
> + *   Negative value on failure.
> + *   >= 0 means the actual resume latency limit on this core.
> + */
> +__rte_experimental
> +int rte_power_qos_get_cpu_resume_latency(uint16_t lcore_id);
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* RTE_POWER_QOS_H */
> diff --git a/lib/power/version.map b/lib/power/version.map
> index c9a226614e..08f178a39d 100644
> --- a/lib/power/version.map
> +++ b/lib/power/version.map
> @@ -51,4 +51,8 @@ EXPERIMENTAL {
>  	rte_power_set_uncore_env;
>  	rte_power_uncore_freqs;
>  	rte_power_unset_uncore_env;
> +
> +	# added in 24.11
> +	rte_power_qos_get_cpu_resume_latency;
> +	rte_power_qos_set_cpu_resume_latency;
>  };
> --
> 2.22.0
  
lihuisong (C) Oct. 14, 2024, 12:19 p.m. UTC | #3
Hi Stephen,


在 2024/10/13 9:10, Stephen Hemminger 写道:
> On Thu, 12 Sep 2024 10:38:11 +0800
> Huisong Li <lihuisong@huawei.com> wrote:
>
>> +
>> +PM QoS
>> +------
>> +
>> +The deeper the idle state, the lower the power consumption, but the longer
>> +the resume time. Some service are delay sensitive and very except the low
>> +resume time, like interrupt packet receiving mode.
>> +
>> +And the "/sys/devices/system/cpu/cpuX/power/pm_qos_resume_latency_us" sysfs
>> +interface is used to set and get the resume latency limit on the cpuX for
>> +userspace. Each cpuidle governor in Linux select which idle state to enter
>> +based on this CPU resume latency in their idle task.
>> +
>> +The per-CPU PM QoS API can be used to set and get the CPU resume latency based
>> +on this sysfs.
>> +
>> +The ``rte_power_qos_set_cpu_resume_latency()`` function can control the CPU's
>> +idle state selection in Linux and limit just to enter the shallowest idle state
>> +to low the delay of resuming service after sleeping by setting strict resume
>> +latency (zero value).
>> +
>> +The ``rte_power_qos_get_cpu_resume_latency()`` function can get the resume
>> +latency on specified CPU.
>> +
> These paragraphs need some editing help. The wording is awkward,
> it uses passive voice, and does not seemed directed at a user audience.
> If you need help, a writer or AI might help clarify.
How about the following description:
-->
The "/sys/devices/system/cpu/cpuX/power/pm_qos_resume_latency_us" sysfs
interface is used to set and get the resume latency limit on the cpuX for
userspace. Each cpuidle governor in Linux select which idle state to enter
based on this CPU resume latency in their idle task.

The deeper the idle state, the lower the power consumption, but the longer
the resume time. Some service are delay sensitive and very except the low
resume time, like interrupt packet receiving mode.

The ``rte_power_qos_set_cpu_resume_latency()`` and 
``rte_power_qos_get_cpu_resume_latency()``
can set and get the CPU resume latency based on this per-CPU sysfs.

The ``rte_power_qos_set_cpu_resume_latency()`` can control the CPU's
idle state selection and limit to enter the shallowest idle state
to low the delay of resuming service by setting strict resume
latency (zero value) so as to get better performance.
>
> It also ends up in the section associated with Intel UnCore.
> It would be better after the Turbo Boost section.
Ack
>
>
>
>> +	ret = open_core_sysfs_file(&f, "w", PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
>> +	if (ret != 0) {
>> +		POWER_LOG(ERR, "Failed to open "PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
>> +		return ret;
>> +	}
> The function open_core_sysfs_file() should have been written to return FILE *
> and then it could have same attributes as fopen().
Do you mean we should save this handle for directly using it to get or 
set this resume latency?
If it is I understand, we cannot do it like this. Because qos driver in 
Linux use the "noop_llseek()" as the .llseek API.
>
> The message should include the error reason.
>
> 	if (open_core_sysfs_file(&f, "w", PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id)) {
> 		POWER_LOG(ERR, "Failed to open " PM_QOS_SYSFILE_RESUME_LATENCY_US ": %s",
> 			lcore_id, strerror(errno));
Good idea.  Thanks.

>
> .
  

Patch

diff --git a/doc/guides/prog_guide/power_man.rst b/doc/guides/prog_guide/power_man.rst
index f6674efe2d..faa32b4320 100644
--- a/doc/guides/prog_guide/power_man.rst
+++ b/doc/guides/prog_guide/power_man.rst
@@ -249,6 +249,30 @@  Get Num Pkgs
 Get Num Dies
   Get the number of die's on a given package.
 
+
+PM QoS
+------
+
+The deeper the idle state, the lower the power consumption, but the longer
+the resume time. Some service are delay sensitive and very except the low
+resume time, like interrupt packet receiving mode.
+
+And the "/sys/devices/system/cpu/cpuX/power/pm_qos_resume_latency_us" sysfs
+interface is used to set and get the resume latency limit on the cpuX for
+userspace. Each cpuidle governor in Linux select which idle state to enter
+based on this CPU resume latency in their idle task.
+
+The per-CPU PM QoS API can be used to set and get the CPU resume latency based
+on this sysfs.
+
+The ``rte_power_qos_set_cpu_resume_latency()`` function can control the CPU's
+idle state selection in Linux and limit just to enter the shallowest idle state
+to low the delay of resuming service after sleeping by setting strict resume
+latency (zero value).
+
+The ``rte_power_qos_get_cpu_resume_latency()`` function can get the resume
+latency on specified CPU.
+
 References
 ----------
 
diff --git a/doc/guides/rel_notes/release_24_11.rst b/doc/guides/rel_notes/release_24_11.rst
index 0ff70d9057..bd72d0a595 100644
--- a/doc/guides/rel_notes/release_24_11.rst
+++ b/doc/guides/rel_notes/release_24_11.rst
@@ -55,6 +55,11 @@  New Features
      Also, make sure to start the actual text at the margin.
      =======================================================
 
+* **Introduce per-CPU PM QoS interface.**
+
+  * Add per-CPU PM QoS interface to low the delay after sleep by controlling
+    CPU idle state selection.
+
 
 Removed Items
 -------------
diff --git a/lib/power/meson.build b/lib/power/meson.build
index b8426589b2..8222e178b0 100644
--- a/lib/power/meson.build
+++ b/lib/power/meson.build
@@ -23,12 +23,14 @@  sources = files(
         'rte_power.c',
         'rte_power_uncore.c',
         'rte_power_pmd_mgmt.c',
+        'rte_power_qos.c',
 )
 headers = files(
         'rte_power.h',
         'rte_power_guest_channel.h',
         'rte_power_pmd_mgmt.h',
         'rte_power_uncore.h',
+        'rte_power_qos.h',
 )
 if cc.has_argument('-Wno-cast-qual')
     cflags += '-Wno-cast-qual'
diff --git a/lib/power/rte_power_qos.c b/lib/power/rte_power_qos.c
new file mode 100644
index 0000000000..8eb26cd41a
--- /dev/null
+++ b/lib/power/rte_power_qos.c
@@ -0,0 +1,111 @@ 
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 HiSilicon Limited
+ */
+
+#include <errno.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include <rte_lcore.h>
+#include <rte_log.h>
+
+#include "power_common.h"
+#include "rte_power_qos.h"
+
+#define PM_QOS_SYSFILE_RESUME_LATENCY_US	\
+	"/sys/devices/system/cpu/cpu%u/power/pm_qos_resume_latency_us"
+
+#define PM_QOS_CPU_RESUME_LATENCY_BUF_LEN	32
+
+int
+rte_power_qos_set_cpu_resume_latency(uint16_t lcore_id, int latency)
+{
+	char buf[PM_QOS_CPU_RESUME_LATENCY_BUF_LEN];
+	FILE *f;
+	int ret;
+
+	if (!rte_lcore_is_enabled(lcore_id)) {
+		POWER_LOG(ERR, "lcore id %u is not enabled", lcore_id);
+		return -EINVAL;
+	}
+
+	if (latency < 0) {
+		POWER_LOG(ERR, "latency should be greater than and equal to 0");
+		return -EINVAL;
+	}
+
+	ret = open_core_sysfs_file(&f, "w", PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+	if (ret != 0) {
+		POWER_LOG(ERR, "Failed to open "PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+		return ret;
+	}
+
+	/*
+	 * Based on the sysfs interface pm_qos_resume_latency_us under
+	 * @PM_QOS_SYSFILE_RESUME_LATENCY_US directory in kernel, their meaning
+	 * is as follows for different input string.
+	 * 1> the resume latency is 0 if the input is "n/a".
+	 * 2> the resume latency is no constraint if the input is "0".
+	 * 3> the resume latency is the actual value to be set.
+	 */
+	if (latency == 0)
+		snprintf(buf, sizeof(buf), "%s", "n/a");
+	else if (latency == RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT)
+		snprintf(buf, sizeof(buf), "%u", 0);
+	else
+		snprintf(buf, sizeof(buf), "%u", latency);
+
+	ret = write_core_sysfs_s(f, buf);
+	if (ret != 0)
+		POWER_LOG(ERR, "Failed to write "PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+
+	fclose(f);
+
+	return ret;
+}
+
+int
+rte_power_qos_get_cpu_resume_latency(uint16_t lcore_id)
+{
+	char buf[PM_QOS_CPU_RESUME_LATENCY_BUF_LEN];
+	int latency = -1;
+	FILE *f;
+	int ret;
+
+	if (!rte_lcore_is_enabled(lcore_id)) {
+		POWER_LOG(ERR, "lcore id %u is not enabled", lcore_id);
+		return -EINVAL;
+	}
+
+	ret = open_core_sysfs_file(&f, "r", PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+	if (ret != 0) {
+		POWER_LOG(ERR, "Failed to open "PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+		return ret;
+	}
+
+	ret = read_core_sysfs_s(f, buf, sizeof(buf));
+	if (ret != 0) {
+		POWER_LOG(ERR, "Failed to read "PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+		goto out;
+	}
+
+	/*
+	 * Based on the sysfs interface pm_qos_resume_latency_us under
+	 * @PM_QOS_SYSFILE_RESUME_LATENCY_US directory in kernel, their meaning
+	 * is as follows for different output string.
+	 * 1> the resume latency is 0 if the output is "n/a".
+	 * 2> the resume latency is no constraint if the output is "0".
+	 * 3> the resume latency is the actual value in used for other string.
+	 */
+	if (strcmp(buf, "n/a") == 0)
+		latency = 0;
+	else {
+		latency = strtoul(buf, NULL, 10);
+		latency = latency == 0 ? RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT : latency;
+	}
+
+out:
+	fclose(f);
+
+	return latency != -1 ? latency : ret;
+}
diff --git a/lib/power/rte_power_qos.h b/lib/power/rte_power_qos.h
new file mode 100644
index 0000000000..990c488373
--- /dev/null
+++ b/lib/power/rte_power_qos.h
@@ -0,0 +1,73 @@ 
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 HiSilicon Limited
+ */
+
+#ifndef RTE_POWER_QOS_H
+#define RTE_POWER_QOS_H
+
+#include <stdint.h>
+
+#include <rte_compat.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * @file rte_power_qos.h
+ *
+ * PM QoS API.
+ *
+ * The CPU-wide resume latency limit has a positive impact on this CPU's idle
+ * state selection in each cpuidle governor.
+ * Please see the PM QoS on CPU wide in the following link:
+ * https://www.kernel.org/doc/html/latest/admin-guide/abi-testing.html?highlight=pm_qos_resume_latency_us#abi-sys-devices-power-pm-qos-resume-latency-us
+ *
+ * The deeper the idle state, the lower the power consumption, but the
+ * longer the resume time. Some service are delay sensitive and very except the
+ * low resume time, like interrupt packet receiving mode.
+ *
+ * In these case, per-CPU PM QoS API can be used to control this CPU's idle
+ * state selection and limit just enter the shallowest idle state to low the
+ * delay after sleep by setting strict resume latency (zero value).
+ */
+
+#define RTE_POWER_QOS_STRICT_LATENCY_VALUE             0
+#define RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT    ((int)(UINT32_MAX >> 1))
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * @param lcore_id
+ *   target logical core id
+ *
+ * @param latency
+ *   The latency should be greater than and equal to zero in microseconds unit.
+ *
+ * @return
+ *   0 on success. Otherwise negative value is returned.
+ */
+__rte_experimental
+int rte_power_qos_set_cpu_resume_latency(uint16_t lcore_id, int latency);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Get the current resume latency of this logical core.
+ * The default value in kernel is @see RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT
+ * if don't set it.
+ *
+ * @return
+ *   Negative value on failure.
+ *   >= 0 means the actual resume latency limit on this core.
+ */
+__rte_experimental
+int rte_power_qos_get_cpu_resume_latency(uint16_t lcore_id);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* RTE_POWER_QOS_H */
diff --git a/lib/power/version.map b/lib/power/version.map
index c9a226614e..08f178a39d 100644
--- a/lib/power/version.map
+++ b/lib/power/version.map
@@ -51,4 +51,8 @@  EXPERIMENTAL {
 	rte_power_set_uncore_env;
 	rte_power_uncore_freqs;
 	rte_power_unset_uncore_env;
+
+	# added in 24.11
+	rte_power_qos_get_cpu_resume_latency;
+	rte_power_qos_set_cpu_resume_latency;
 };