[v2,1/4] power: refactor core power management library

Message ID 20240826130650.320130-2-sivaprasad.tummala@amd.com (mailing list archive)
State Superseded
Delegated to: Thomas Monjalon
Headers
Series power: refactor power management library |

Checks

Context Check Description
ci/checkpatch warning coding style issues
ci/iol-testing fail build patch failure

Commit Message

Tummala, Sivaprasad Aug. 26, 2024, 1:06 p.m. UTC
This patch introduces a comprehensive refactor to the core power
management library. The primary focus is on improving modularity
and organization by relocating specific driver implementations
from the 'lib/power' directory to dedicated directories within
'drivers/power/core/*'. The adjustment of meson.build files
enables the selective activation of individual drivers.

These changes contribute to a significant enhancement in code
organization, providing a clearer structure for driver implementations.
The refactor aims to improve overall code clarity and boost
maintainability. Additionally, it establishes a foundation for
future development, allowing for more focused work on individual
drivers and seamless integration of forthcoming enhancements.

v2:
 - added NULL check for global_core_ops in rte_power_get_core_ops

Signed-off-by: Sivaprasad Tummala <sivaprasad.tummala@amd.com>
---
 drivers/meson.build                           |   1 +
 .../power/acpi/acpi_cpufreq.c                 |  22 +-
 .../power/acpi/acpi_cpufreq.h                 |   6 +-
 drivers/power/acpi/meson.build                |  10 +
 .../power/amd_pstate/amd_pstate_cpufreq.c     |  24 +-
 .../power/amd_pstate/amd_pstate_cpufreq.h     |   8 +-
 drivers/power/amd_pstate/meson.build          |  10 +
 .../power/cppc/cppc_cpufreq.c                 |  22 +-
 .../power/cppc/cppc_cpufreq.h                 |   8 +-
 drivers/power/cppc/meson.build                |  10 +
 .../power/kvm_vm}/guest_channel.c             |   0
 .../power/kvm_vm}/guest_channel.h             |   0
 .../power/kvm_vm/kvm_vm.c                     |  22 +-
 .../power/kvm_vm/kvm_vm.h                     |   6 +-
 drivers/power/kvm_vm/meson.build              |  16 +
 drivers/power/meson.build                     |  12 +
 drivers/power/pstate/meson.build              |  10 +
 .../power/pstate/pstate_cpufreq.c             |  22 +-
 .../power/pstate/pstate_cpufreq.h             |   6 +-
 lib/power/meson.build                         |   7 +-
 lib/power/power_common.c                      |   2 +-
 lib/power/power_common.h                      |  16 +-
 lib/power/rte_power.c                         | 291 ++++++------------
 lib/power/rte_power.h                         | 139 ++++++---
 lib/power/rte_power_core_ops.h                | 208 +++++++++++++
 lib/power/version.map                         |  14 +
 26 files changed, 621 insertions(+), 271 deletions(-)
 rename lib/power/power_acpi_cpufreq.c => drivers/power/acpi/acpi_cpufreq.c (95%)
 rename lib/power/power_acpi_cpufreq.h => drivers/power/acpi/acpi_cpufreq.h (98%)
 create mode 100644 drivers/power/acpi/meson.build
 rename lib/power/power_amd_pstate_cpufreq.c => drivers/power/amd_pstate/amd_pstate_cpufreq.c (95%)
 rename lib/power/power_amd_pstate_cpufreq.h => drivers/power/amd_pstate/amd_pstate_cpufreq.h (97%)
 create mode 100644 drivers/power/amd_pstate/meson.build
 rename lib/power/power_cppc_cpufreq.c => drivers/power/cppc/cppc_cpufreq.c (95%)
 rename lib/power/power_cppc_cpufreq.h => drivers/power/cppc/cppc_cpufreq.h (97%)
 create mode 100644 drivers/power/cppc/meson.build
 rename {lib/power => drivers/power/kvm_vm}/guest_channel.c (100%)
 rename {lib/power => drivers/power/kvm_vm}/guest_channel.h (100%)
 rename lib/power/power_kvm_vm.c => drivers/power/kvm_vm/kvm_vm.c (82%)
 rename lib/power/power_kvm_vm.h => drivers/power/kvm_vm/kvm_vm.h (98%)
 create mode 100644 drivers/power/kvm_vm/meson.build
 create mode 100644 drivers/power/meson.build
 create mode 100644 drivers/power/pstate/meson.build
 rename lib/power/power_pstate_cpufreq.c => drivers/power/pstate/pstate_cpufreq.c (96%)
 rename lib/power/power_pstate_cpufreq.h => drivers/power/pstate/pstate_cpufreq.h (98%)
 create mode 100644 lib/power/rte_power_core_ops.h
  

Comments

Stephen Hemminger Aug. 26, 2024, 3:26 p.m. UTC | #1
On Mon, 26 Aug 2024 13:06:46 +0000
Sivaprasad Tummala <sivaprasad.tummala@amd.com> wrote:

> +static struct rte_power_core_ops acpi_ops = {
> +	.name = "acpi",
> +	.init = power_acpi_cpufreq_init,
> +	.exit = power_acpi_cpufreq_exit,
> +	.check_env_support = power_acpi_cpufreq_check_supported,
> +	.get_avail_freqs = power_acpi_cpufreq_freqs,
> +	.get_freq = power_acpi_cpufreq_get_freq,
> +	.set_freq = power_acpi_cpufreq_set_freq,
> +	.freq_down = power_acpi_cpufreq_freq_down,
> +	.freq_up = power_acpi_cpufreq_freq_up,
> +	.freq_max = power_acpi_cpufreq_freq_max,
> +	.freq_min = power_acpi_cpufreq_freq_min,
> +	.turbo_status = power_acpi_turbo_status,
> +	.enable_turbo = power_acpi_enable_turbo,
> +	.disable_turbo = power_acpi_disable_turbo,
> +	.get_caps = power_acpi_get_capabilities
> +};
> +

Can this be made const?
It is good for security and overall safety to have structures with
function pointers marked const.
  
lihuisong (C) Aug. 27, 2024, 8:21 a.m. UTC | #2
Hi Sivaprasa,

Some comments inline.

/Huisong

在 2024/8/26 21:06, Sivaprasad Tummala 写道:
> This patch introduces a comprehensive refactor to the core power
> management library. The primary focus is on improving modularity
> and organization by relocating specific driver implementations
> from the 'lib/power' directory to dedicated directories within
> 'drivers/power/core/*'. The adjustment of meson.build files
> enables the selective activation of individual drivers.
> These changes contribute to a significant enhancement in code
> organization, providing a clearer structure for driver implementations.
> The refactor aims to improve overall code clarity and boost
> maintainability. Additionally, it establishes a foundation for
> future development, allowing for more focused work on individual
> drivers and seamless integration of forthcoming enhancements.
>
> v2:
>   - added NULL check for global_core_ops in rte_power_get_core_ops
>
> Signed-off-by: Sivaprasad Tummala <sivaprasad.tummala@amd.com>
> ---
>   drivers/meson.build                           |   1 +
>   .../power/acpi/acpi_cpufreq.c                 |  22 +-
>   .../power/acpi/acpi_cpufreq.h                 |   6 +-
>   drivers/power/acpi/meson.build                |  10 +
>   .../power/amd_pstate/amd_pstate_cpufreq.c     |  24 +-
>   .../power/amd_pstate/amd_pstate_cpufreq.h     |   8 +-
>   drivers/power/amd_pstate/meson.build          |  10 +
>   .../power/cppc/cppc_cpufreq.c                 |  22 +-
>   .../power/cppc/cppc_cpufreq.h                 |   8 +-
>   drivers/power/cppc/meson.build                |  10 +
>   .../power/kvm_vm}/guest_channel.c             |   0
>   .../power/kvm_vm}/guest_channel.h             |   0
>   .../power/kvm_vm/kvm_vm.c                     |  22 +-
>   .../power/kvm_vm/kvm_vm.h                     |   6 +-
>   drivers/power/kvm_vm/meson.build              |  16 +
>   drivers/power/meson.build                     |  12 +
>   drivers/power/pstate/meson.build              |  10 +
>   .../power/pstate/pstate_cpufreq.c             |  22 +-
>   .../power/pstate/pstate_cpufreq.h             |   6 +-
>   lib/power/meson.build                         |   7 +-
>   lib/power/power_common.c                      |   2 +-
>   lib/power/power_common.h                      |  16 +-
>   lib/power/rte_power.c                         | 291 ++++++------------
>   lib/power/rte_power.h                         | 139 ++++++---
>   lib/power/rte_power_core_ops.h                | 208 +++++++++++++
>   lib/power/version.map                         |  14 +
>   26 files changed, 621 insertions(+), 271 deletions(-)
>   rename lib/power/power_acpi_cpufreq.c => drivers/power/acpi/acpi_cpufreq.c (95%)
>   rename lib/power/power_acpi_cpufreq.h => drivers/power/acpi/acpi_cpufreq.h (98%)
>   create mode 100644 drivers/power/acpi/meson.build
>   rename lib/power/power_amd_pstate_cpufreq.c => drivers/power/amd_pstate/amd_pstate_cpufreq.c (95%)
>   rename lib/power/power_amd_pstate_cpufreq.h => drivers/power/amd_pstate/amd_pstate_cpufreq.h (97%)
>   create mode 100644 drivers/power/amd_pstate/meson.build
>   rename lib/power/power_cppc_cpufreq.c => drivers/power/cppc/cppc_cpufreq.c (95%)
>   rename lib/power/power_cppc_cpufreq.h => drivers/power/cppc/cppc_cpufreq.h (97%)
>   create mode 100644 drivers/power/cppc/meson.build
>   rename {lib/power => drivers/power/kvm_vm}/guest_channel.c (100%)
>   rename {lib/power => drivers/power/kvm_vm}/guest_channel.h (100%)
>   rename lib/power/power_kvm_vm.c => drivers/power/kvm_vm/kvm_vm.c (82%)
>   rename lib/power/power_kvm_vm.h => drivers/power/kvm_vm/kvm_vm.h (98%)
>   create mode 100644 drivers/power/kvm_vm/meson.build
>   create mode 100644 drivers/power/meson.build
>   create mode 100644 drivers/power/pstate/meson.build
>   rename lib/power/power_pstate_cpufreq.c => drivers/power/pstate/pstate_cpufreq.c (96%)
>   rename lib/power/power_pstate_cpufreq.h => drivers/power/pstate/pstate_cpufreq.h (98%)
>   create mode 100644 lib/power/rte_power_core_ops.h
How about use the following directory structure?
*For power libs*
lib/power/power_common.*
lib/power/rte_power_pmd_mgmt.*
lib/power/rte_power_cpufreq_api.* (replacing rte_power.c file maybe 
simple for us. but I'm not sure if we can put the init of core, uncore 
and pmd mgmt to rte_power_init.c in rte_power.c.)
lib/power/rte_power_uncore_freq_api.*

*And has directories under drivers/power:*
1> For core dvfs driver:
drivers/power/cpufreq/acpi_cpufreq.c
drivers/power/cpufreq/cppc_cpufreq.c
drivers/power/cpufreq/amd_pstate_cpufreq.c
drivers/power/cpufreq/intel_pstate_cpufreq.c
drivers/power/cpufreq/kvm_cpufreq.c
The code of each cpufreq driver is not too much and doesn't probably 
increase. So don't need to use a directory for it.

2> For uncore dvfs driver:
drivers/power/uncorefreq/intel_uncore.*
> diff --git a/drivers/meson.build b/drivers/meson.build
> index 66931d4241..9d77e0deab 100644
> --- a/drivers/meson.build
> +++ b/drivers/meson.build
> @@ -29,6 +29,7 @@ subdirs = [
>           'event',          # depends on common, bus, mempool and net.
>           'baseband',       # depends on common and bus.
>           'gpu',            # depends on common and bus.
> +        'power',          # depends on common (in future).
>   ]
>   
>   if meson.is_cross_build()
> diff --git a/lib/power/power_acpi_cpufreq.c b/drivers/power/acpi/acpi_cpufreq.c
> similarity index 95%
> rename from lib/power/power_acpi_cpufreq.c
> rename to drivers/power/acpi/acpi_cpufreq.c
do not suggest to create one directory for each cpufreq driver.
Because pstate drivers also comply with ACPI spec, right?
In addition, the code of each cpufreq drivers are not too much.
There is just one file under one directory which is not good.
> index 81996e1c13..8637c69703 100644
> --- a/lib/power/power_acpi_cpufreq.c
> +++ b/drivers/power/acpi/acpi_cpufreq.c
> @@ -10,7 +10,7 @@
>   #include <rte_stdatomic.h>
>   #include <rte_string_fns.h>
>   
> -#include "power_acpi_cpufreq.h"
> +#include "acpi_cpufreq.h"
>   #include "power_common.h"
>   
<...>
> +if not is_linux
> +    build = false
> +    reason = 'only supported on Linux'
> +endif
> +sources = files('pstate_cpufreq.c')
> +
> +deps += ['power']
> diff --git a/lib/power/power_pstate_cpufreq.c b/drivers/power/pstate/pstate_cpufreq.c
> similarity index 96%
> rename from lib/power/power_pstate_cpufreq.c
> rename to drivers/power/pstate/pstate_cpufreq.c
pstate_cpufreq.c is actually intel_pstate cpufreq driver, right?
So how about modify this file name to intel_pstate_cpufreq.c?
> index 2343121621..c32b1adabc 100644
> --- a/lib/power/power_pstate_cpufreq.c
> +++ b/drivers/power/pstate/pstate_cpufreq.c
> @@ -15,7 +15,7 @@
>   #include <rte_stdatomic.h>
>   
>   #include "rte_power_pmd_mgmt.h"
> -#include "power_pstate_cpufreq.h"
> +#include "pstate_cpufreq.h"
>   #include "power_common.h"
>   
>   /* macros used for rounding frequency to nearest 100000 */
> @@ -888,3 +888,23 @@ int power_pstate_get_capabilities(unsigned int lcore_id,
>   
>   	return 0;
>   }
> +
<...>
> diff --git a/lib/power/power_common.c b/lib/power/power_common.c
> index 590986d5ef..6c06411e8b 100644
> --- a/lib/power/power_common.c
> +++ b/lib/power/power_common.c
> @@ -12,7 +12,7 @@
>   
>   #include "power_common.h"
>   
> -RTE_LOG_REGISTER_DEFAULT(power_logtype, INFO);
> +RTE_LOG_REGISTER_DEFAULT(rte_power_logtype, INFO);
>   
>   #define POWER_SYSFILE_SCALING_DRIVER   \
>   		"/sys/devices/system/cpu/cpu%u/cpufreq/scaling_driver"
> diff --git a/lib/power/power_common.h b/lib/power/power_common.h
> index 83f742f42a..767686ee12 100644
> --- a/lib/power/power_common.h
> +++ b/lib/power/power_common.h
> @@ -6,12 +6,13 @@
>   #define _POWER_COMMON_H_
>   
>   #include <rte_common.h>
> +#include <rte_compat.h>
>   #include <rte_log.h>
>   
>   #define RTE_POWER_INVALID_FREQ_INDEX (~0)
>   
> -extern int power_logtype;
> -#define RTE_LOGTYPE_POWER power_logtype
> +extern int rte_power_logtype;
> +#define RTE_LOGTYPE_POWER rte_power_logtype
>   #define POWER_LOG(level, ...) \
>   	RTE_LOG_LINE(level, POWER, "" __VA_ARGS__)
>   
> @@ -23,13 +24,24 @@ extern int power_logtype;
>   #endif
>   
>   /* check if scaling driver matches one we want */
> +__rte_internal
>   int cpufreq_check_scaling_driver(const char *driver);
> +
> +__rte_internal
>   int power_set_governor(unsigned int lcore_id, const char *new_governor,
>   		char *orig_governor, size_t orig_governor_len);
suggest that move cpufreq interfaces like this to the 
rte_power_cpufreq_api.* I proposed above.
The interfaces in power_comm.* can be used by all power modules, like 
core/uncore/pmd mgmt.
> +
> +__rte_internal
>   int open_core_sysfs_file(FILE **f, const char *mode, const char *format, ...)
>   		__rte_format_printf(3, 4);
> +
> +__rte_internal
>   int read_core_sysfs_u32(FILE *f, uint32_t *val);
> +
> +__rte_internal
>   int read_core_sysfs_s(FILE *f, char *buf, unsigned int len);
> +
> +__rte_internal
>   int write_core_sysfs_s(FILE *f, const char *str);
>   
>   #endif /* _POWER_COMMON_H_ */
> diff --git a/lib/power/rte_power.c b/lib/power/rte_power.c
The name of the rte_power.c file is impropriate now. The context in this 
file is just for cpufreq, right?
So I suggest that we need to rename this file as the rte_power_cpufreq_api.c
> index 36c3f3da98..2bf6d40517 100644
> --- a/lib/power/rte_power.c
> +++ b/lib/power/rte_power.c
> @@ -8,153 +8,86 @@
>   #include <rte_spinlock.h>
>   
>   #include "rte_power.h"
> -#include "power_acpi_cpufreq.h"
> -#include "power_cppc_cpufreq.h"
>   #include "power_common.h"
> -#include "power_kvm_vm.h"
> -#include "power_pstate_cpufreq.h"
> -#include "power_amd_pstate_cpufreq.h"
>   
> -enum power_management_env global_default_env = PM_ENV_NOT_SET;
> +static enum power_management_env global_default_env = PM_ENV_NOT_SET;
> +static struct rte_power_core_ops *global_power_core_ops;
>   
>   static rte_spinlock_t global_env_cfg_lock = RTE_SPINLOCK_INITIALIZER;
> +static RTE_TAILQ_HEAD(, rte_power_core_ops) core_ops_list =
> +			TAILQ_HEAD_INITIALIZER(core_ops_list);
>   
> -/* function pointers */
> -rte_power_freqs_t rte_power_freqs  = NULL;
> -rte_power_get_freq_t rte_power_get_freq = NULL;
> -rte_power_set_freq_t rte_power_set_freq = NULL;
> -rte_power_freq_change_t rte_power_freq_up = NULL;
> -rte_power_freq_change_t rte_power_freq_down = NULL;
> -rte_power_freq_change_t rte_power_freq_max = NULL;
> -rte_power_freq_change_t rte_power_freq_min = NULL;
> -rte_power_freq_change_t rte_power_turbo_status;
> -rte_power_freq_change_t rte_power_freq_enable_turbo;
> -rte_power_freq_change_t rte_power_freq_disable_turbo;
> -rte_power_get_capabilities_t rte_power_get_capabilities;
> -
> -static void
> -reset_power_function_ptrs(void)
> +
> +const char *power_env_str[] = {
> +	"not set",
> +	"acpi",
> +	"kvm-vm",
> +	"pstate",
> +	"cppc",
> +	"amd-pstate"
> +};
> +
> +/* register the ops struct in rte_power_core_ops, return 0 on success. */
> +int
> +rte_power_register_ops(struct rte_power_core_ops *driver_ops)
>   {
> -	rte_power_freqs  = NULL;
> -	rte_power_get_freq = NULL;
> -	rte_power_set_freq = NULL;
> -	rte_power_freq_up = NULL;
> -	rte_power_freq_down = NULL;
> -	rte_power_freq_max = NULL;
> -	rte_power_freq_min = NULL;
> -	rte_power_turbo_status = NULL;
> -	rte_power_freq_enable_turbo = NULL;
> -	rte_power_freq_disable_turbo = NULL;
> -	rte_power_get_capabilities = NULL;
> +	if (!driver_ops->init || !driver_ops->exit ||
> +		!driver_ops->check_env_support || !driver_ops->get_avail_freqs ||
> +		!driver_ops->get_freq || !driver_ops->set_freq ||
> +		!driver_ops->freq_up || !driver_ops->freq_down ||
> +		!driver_ops->freq_max || !driver_ops->freq_min ||
> +		!driver_ops->turbo_status || !driver_ops->enable_turbo ||
> +		!driver_ops->disable_turbo || !driver_ops->get_caps) {
> +		POWER_LOG(ERR, "Missing callbacks while registering power ops");
turbo_status(), enable_turbo() and disable turbo() are not necessary, 
right?
These depand on the capabilities from get_caps().
> +		return -EINVAL;
> +	}
> +
> +	TAILQ_INSERT_TAIL(&core_ops_list, driver_ops, next);
> +
> +	return 0;
>   }
>   
>   int
>   rte_power_check_env_supported(enum power_management_env env)
>   {
> -	switch (env) {
> -	case PM_ENV_ACPI_CPUFREQ:
> -		return power_acpi_cpufreq_check_supported();
> -	case PM_ENV_PSTATE_CPUFREQ:
> -		return power_pstate_cpufreq_check_supported();
> -	case PM_ENV_KVM_VM:
> -		return power_kvm_vm_check_supported();
> -	case PM_ENV_CPPC_CPUFREQ:
> -		return power_cppc_cpufreq_check_supported();
> -	case PM_ENV_AMD_PSTATE_CPUFREQ:
> -		return power_amd_pstate_cpufreq_check_supported();
> -	default:
> -		rte_errno = EINVAL;
> -		return -1;
> -	}
> +	struct rte_power_core_ops *ops;
> +
> +	if (env >= RTE_DIM(power_env_str))
> +		return 0;
> +
> +	RTE_TAILQ_FOREACH(ops, &core_ops_list, next)
> +		if (strncmp(ops->name, power_env_str[env],
> +				RTE_POWER_DRIVER_NAMESZ) == 0)
> +			return ops->check_env_support();
> +
> +	return 0;
>   }
>   
>   int
>   rte_power_set_env(enum power_management_env env)
>   {
> +	struct rte_power_core_ops *ops;
> +	int ret = -1;
> +
>   	rte_spinlock_lock(&global_env_cfg_lock);
>   
>   	if (global_default_env != PM_ENV_NOT_SET) {
>   		POWER_LOG(ERR, "Power Management Environment already set.");
> -		rte_spinlock_unlock(&global_env_cfg_lock);
> -		return -1;
> -	}
> -
<...>
> -	if (ret == 0)
> -		global_default_env = env;
> -	else {
> -		global_default_env = PM_ENV_NOT_SET;
> -		reset_power_function_ptrs();
> -	}
> +	RTE_TAILQ_FOREACH(ops, &core_ops_list, next)
> +		if (strncmp(ops->name, power_env_str[env],
> +				RTE_POWER_DRIVER_NAMESZ) == 0) {
> +			global_power_core_ops = ops;
> +			global_default_env = env;
> +			ret = 0;
> +			goto out;
> +		}
> +	POWER_LOG(ERR, "Invalid Power Management Environment(%d) set",
> +			env);
>   
> +out:
>   	rte_spinlock_unlock(&global_env_cfg_lock);
>   	return ret;
>   }
> @@ -164,94 +97,66 @@ rte_power_unset_env(void)
>   {
>   	rte_spinlock_lock(&global_env_cfg_lock);
>   	global_default_env = PM_ENV_NOT_SET;
> -	reset_power_function_ptrs();
> +	global_power_core_ops = NULL;
>   	rte_spinlock_unlock(&global_env_cfg_lock);
>   }
>   
>   enum power_management_env
> -rte_power_get_env(void) {
> +rte_power_get_env(void)
> +{
>   	return global_default_env;
>   }
>   
> -int
> -rte_power_init(unsigned int lcore_id)
> +struct rte_power_core_ops *
> +rte_power_get_core_ops(void)
>   {
> -	int ret = -1;
> +	RTE_ASSERT(global_power_core_ops != NULL);
>   
> -	switch (global_default_env) {
> -	case PM_ENV_ACPI_CPUFREQ:
> -		return power_acpi_cpufreq_init(lcore_id);
> -	case PM_ENV_KVM_VM:
> -		return power_kvm_vm_init(lcore_id);
> -	case PM_ENV_PSTATE_CPUFREQ:
> -		return power_pstate_cpufreq_init(lcore_id);
> -	case PM_ENV_CPPC_CPUFREQ:
> -		return power_cppc_cpufreq_init(lcore_id);
> -	case PM_ENV_AMD_PSTATE_CPUFREQ:
> -		return power_amd_pstate_cpufreq_init(lcore_id);
> -	default:
> -		POWER_LOG(INFO, "Env isn't set yet!");
> -	}
> +	return global_power_core_ops;
> +}
>   
> -	/* Auto detect Environment */
> -	POWER_LOG(INFO, "Attempting to initialise ACPI cpufreq power management...");
> -	ret = power_acpi_cpufreq_init(lcore_id);
> -	if (ret == 0) {
> -		rte_power_set_env(PM_ENV_ACPI_CPUFREQ);
> -		goto out;
> -	}
> +int
> +rte_power_init(unsigned int lcore_id)
> +{
> +	struct rte_power_core_ops *ops;
> +	uint8_t env;
>   
> -	POWER_LOG(INFO, "Attempting to initialise PSTAT power management...");
> -	ret = power_pstate_cpufreq_init(lcore_id);
> -	if (ret == 0) {
> -		rte_power_set_env(PM_ENV_PSTATE_CPUFREQ);
> -		goto out;
> -	}
> +	if (global_default_env != PM_ENV_NOT_SET)
> +		return global_power_core_ops->init(lcore_id);
>   
> -	POWER_LOG(INFO, "Attempting to initialise AMD PSTATE power management...");
> -	ret = power_amd_pstate_cpufreq_init(lcore_id);
> -	if (ret == 0) {
> -		rte_power_set_env(PM_ENV_AMD_PSTATE_CPUFREQ);
> -		goto out;
> -	}
> +	POWER_LOG(INFO, "Env isn't set yet!");
remove this log?
>   
> -	POWER_LOG(INFO, "Attempting to initialise CPPC power management...");
> -	ret = power_cppc_cpufreq_init(lcore_id);
> -	if (ret == 0) {
> -		rte_power_set_env(PM_ENV_CPPC_CPUFREQ);
> -		goto out;
> -	}
> +	/* Auto detect Environment */
> +	RTE_TAILQ_FOREACH(ops, &core_ops_list, next)
> +		if (ops) {
> +			POWER_LOG(INFO,
> +				"Attempting to initialise %s cpufreq power management...",
> +				ops->name);
> +			if (ops->init(lcore_id) == 0) {
> +				for (env = 0; env < RTE_DIM(power_env_str); env++)
> +					if (strncmp(ops->name, power_env_str[env],
> +							RTE_POWER_DRIVER_NAMESZ) == 0) {
> +						rte_power_set_env(env);
> +						return 0;
> +					}
> +			}
> +		}
Can we change the logic of rte_power_set_env()? like:
     RTE_TAILQ_FOREACH(ops, &core_ops_list, next) {
         for (env = 0; env < RTE_DIM(power_env_str); env++) {
                 if (strncmp(ops->name, power_env_str[env], 
RTE_POWER_DRIVER_NAMESZ) == 0 &&
                     ops->init(lcore_id) == 0) {
                     global_power_core_ops = ops;
                     global_default_env = env;
                 }
         }
     }
That is easier to follow code.
> +
> +	POWER_LOG(ERR,
> +		"Unable to set Power Management Environment for lcore %u",
> +		lcore_id);
>   
> -	POWER_LOG(INFO, "Attempting to initialise VM power management...");
> -	ret = power_kvm_vm_init(lcore_id);
> -	if (ret == 0) {
> -		rte_power_set_env(PM_ENV_KVM_VM);
> -		goto out;
> -	}
> -	POWER_LOG(ERR, "Unable to set Power Management Environment for lcore "
> -			"%u", lcore_id);
> -out:
> -	return ret;
> +	return -1;
>   }
>   
>   int
>   rte_power_exit(unsigned int lcore_id)
>   {
> -	switch (global_default_env) {
> -	case PM_ENV_ACPI_CPUFREQ:
> -		return power_acpi_cpufreq_exit(lcore_id);
> -	case PM_ENV_KVM_VM:
> -		return power_kvm_vm_exit(lcore_id);
> -	case PM_ENV_PSTATE_CPUFREQ:
> -		return power_pstate_cpufreq_exit(lcore_id);
> -	case PM_ENV_CPPC_CPUFREQ:
> -		return power_cppc_cpufreq_exit(lcore_id);
> -	case PM_ENV_AMD_PSTATE_CPUFREQ:
> -		return power_amd_pstate_cpufreq_exit(lcore_id);
> -	default:
> -		POWER_LOG(ERR, "Environment has not been set, unable to exit gracefully");
> +	if (global_default_env != PM_ENV_NOT_SET)
> +		return global_power_core_ops->exit(lcore_id);
>   
> -	}
> -	return -1;
> +	POWER_LOG(ERR,
> +		"Environment has not been set, unable to exit gracefully");
>   
> +	return -1;
>   }
> diff --git a/lib/power/rte_power.h b/lib/power/rte_power.h
> index 4fa4afe399..5e4aacf08b 100644
> --- a/lib/power/rte_power.h
> +++ b/lib/power/rte_power.h
> @@ -1,5 +1,6 @@
>   /* SPDX-License-Identifier: BSD-3-Clause
>    * Copyright(c) 2010-2014 Intel Corporation
> + * Copyright(c) 2024 Advanced Micro Devices, Inc.
>    */
>   
>   #ifndef _RTE_POWER_H
> @@ -14,14 +15,21 @@
>   #include <rte_log.h>
>   #include <rte_power_guest_channel.h>
>   
> +#include "rte_power_core_ops.h"
> +
>   #ifdef __cplusplus
>   extern "C" {
>   #endif
>   
>   /* Power Management Environment State */
> -enum power_management_env {PM_ENV_NOT_SET, PM_ENV_ACPI_CPUFREQ, PM_ENV_KVM_VM,
> -		PM_ENV_PSTATE_CPUFREQ, PM_ENV_CPPC_CPUFREQ,
> -		PM_ENV_AMD_PSTATE_CPUFREQ};
> +enum power_management_env {
> +	PM_ENV_NOT_SET = 0,
> +	PM_ENV_ACPI_CPUFREQ,
> +	PM_ENV_KVM_VM,
> +	PM_ENV_PSTATE_CPUFREQ,
> +	PM_ENV_CPPC_CPUFREQ,
> +	PM_ENV_AMD_PSTATE_CPUFREQ
> +};
>   
>   /**
>    * Check if a specific power management environment type is supported on a
> @@ -66,6 +74,15 @@ void rte_power_unset_env(void);
>    */
>   enum power_management_env rte_power_get_env(void);

I'd like to let user not know used which cpufreq driver, which is 
friendly to user.

So we can rethink if this API is necessary.

>   
> +/**
> + * @internal Get the power ops struct from its index.
> + *
> + * @return
> + *   The pointer to the ops struct in the table if registered.
> + */
> +struct rte_power_core_ops *
> +rte_power_get_core_ops(void);
> +
>   /**
>    * Initialize power management for a specific lcore. If rte_power_set_env() has
>    * not been called then an auto-detect of the environment will start and
> @@ -108,10 +125,13 @@ int rte_power_exit(unsigned int lcore_id);
>    * @return
>    *  The number of available frequencies.
>    */
> -typedef uint32_t (*rte_power_freqs_t)(unsigned int lcore_id, uint32_t *freqs,
> -		uint32_t num);
> +static inline uint32_t
> +rte_power_freqs(unsigned int lcore_id, uint32_t *freqs, uint32_t n)
> +{
> +	struct rte_power_core_ops *ops = rte_power_get_core_ops();
>   
> -extern rte_power_freqs_t rte_power_freqs;
> +	return ops->get_avail_freqs(lcore_id, freqs, n);
> +}
>   
>   /**
>    * Return the current index of available frequencies of a specific lcore.
> @@ -124,9 +144,13 @@ extern rte_power_freqs_t rte_power_freqs;
>    * @return
>    *  The current index of available frequencies.
>    */
> -typedef uint32_t (*rte_power_get_freq_t)(unsigned int lcore_id);
> +static inline uint32_t
> +rte_power_get_freq(unsigned int lcore_id)
> +{
> +	struct rte_power_core_ops *ops = rte_power_get_core_ops();
>   
> -extern rte_power_get_freq_t rte_power_get_freq;
> +	return ops->get_freq(lcore_id);
> +}
>   
>   /**
>    * Set the new frequency for a specific lcore by indicating the index of
> @@ -144,82 +168,101 @@ extern rte_power_get_freq_t rte_power_get_freq;
>    *  - 0 on success without frequency changed.
>    *  - Negative on error.
>    */
> -typedef int (*rte_power_set_freq_t)(unsigned int lcore_id, uint32_t index);
> -
> -extern rte_power_set_freq_t rte_power_set_freq;
> +static inline uint32_t
> +rte_power_set_freq(unsigned int lcore_id, uint32_t index)
> +{
> +	struct rte_power_core_ops *ops = rte_power_get_core_ops();
>   
> -/**
> - * Function pointer definition for generic frequency change functions. Review
> - * each environments specific documentation for usage.
> - *
> - * @param lcore_id
> - *  lcore id.
> - *
> - * @return
> - *  - 1 on success with frequency changed.
> - *  - 0 on success without frequency changed.
> - *  - Negative on error.
> - */
> -typedef int (*rte_power_freq_change_t)(unsigned int lcore_id);
> +	return ops->set_freq(lcore_id, index);
> +}
>   
>   /**
>    * Scale up the frequency of a specific lcore according to the available
>    * frequencies.
>    * Review each environments specific documentation for usage.
>    */
> -extern rte_power_freq_change_t rte_power_freq_up;
> +static inline int
> +rte_power_freq_up(unsigned int lcore_id)
> +{
> +	struct rte_power_core_ops *ops = rte_power_get_core_ops();
> +
> +	return ops->freq_up(lcore_id);
> +}
>   
>   /**
>    * Scale down the frequency of a specific lcore according to the available
>    * frequencies.
>    * Review each environments specific documentation for usage.
>    */
> -extern rte_power_freq_change_t rte_power_freq_down;
> +static inline int
> +rte_power_freq_down(unsigned int lcore_id)
> +{
> +	struct rte_power_core_ops *ops = rte_power_get_core_ops();
> +
> +	return ops->freq_down(lcore_id);
> +}
>   
>   /**
>    * Scale up the frequency of a specific lcore to the highest according to the
>    * available frequencies.
>    * Review each environments specific documentation for usage.
>    */
> -extern rte_power_freq_change_t rte_power_freq_max;
> +static inline int
> +rte_power_freq_max(unsigned int lcore_id)
> +{
> +	struct rte_power_core_ops *ops = rte_power_get_core_ops();
> +
> +	return ops->freq_max(lcore_id);
> +}
>   
>   /**
>    * Scale down the frequency of a specific lcore to the lowest according to the
>    * available frequencies.
>    * Review each environments specific documentation for usage..
>    */
> -extern rte_power_freq_change_t rte_power_freq_min;
> +static inline int
> +rte_power_freq_min(unsigned int lcore_id)
> +{
> +	struct rte_power_core_ops *ops = rte_power_get_core_ops();
> +
> +	return ops->freq_min(lcore_id);
> +}
>   
>   /**
>    * Query the Turbo Boost status of a specific lcore.
>    * Review each environments specific documentation for usage..
>    */
> -extern rte_power_freq_change_t rte_power_turbo_status;
> +static inline int
> +rte_power_turbo_status(unsigned int lcore_id)
> +{
> +	struct rte_power_core_ops *ops = rte_power_get_core_ops();
> +
> +	return ops->turbo_status(lcore_id);
> +}
>   
>   /**
>    * Enable Turbo Boost for this lcore.
>    * Review each environments specific documentation for usage..
>    */
> -extern rte_power_freq_change_t rte_power_freq_enable_turbo;
> +static inline int
> +rte_power_freq_enable_turbo(unsigned int lcore_id)
> +{
> +	struct rte_power_core_ops *ops = rte_power_get_core_ops();
> +
> +	return ops->enable_turbo(lcore_id);
> +}
>   
>   /**
>    * Disable Turbo Boost for this lcore.
>    * Review each environments specific documentation for usage..
>    */
> -extern rte_power_freq_change_t rte_power_freq_disable_turbo;
> +static inline int
> +rte_power_freq_disable_turbo(unsigned int lcore_id)
> +{
> +	struct rte_power_core_ops *ops = rte_power_get_core_ops();
>   
> -/**
> - * Power capabilities summary.
> - */
> -struct rte_power_core_capabilities {
> -	union {
> -		uint64_t capabilities;
> -		struct {
> -			uint64_t turbo:1;	/**< Turbo can be enabled. */
> -			uint64_t priority:1;	/**< SST-BF high freq core */
> -		};
> -	};
> -};
> +	return ops->disable_turbo(lcore_id);
> +}
>   
>   /**
>    * Returns power capabilities for a specific lcore.
> @@ -235,10 +278,14 @@ struct rte_power_core_capabilities {
>    *  - 0 on success.
>    *  - Negative on error.
>    */
> -typedef int (*rte_power_get_capabilities_t)(unsigned int lcore_id,
> -		struct rte_power_core_capabilities *caps);
> +static inline int
> +rte_power_get_capabilities(unsigned int lcore_id,
> +		struct rte_power_core_capabilities *caps)
> +{
> +	struct rte_power_core_ops *ops = rte_power_get_core_ops();
>   
> -extern rte_power_get_capabilities_t rte_power_get_capabilities;
> +	return ops->get_caps(lcore_id, caps);
> +}
>   
>   #ifdef __cplusplus
>   }
> diff --git a/lib/power/rte_power_core_ops.h b/lib/power/rte_power_core_ops.h
> new file mode 100644
> index 0000000000..356a64df79
> --- /dev/null
> +++ b/lib/power/rte_power_core_ops.h
> @@ -0,0 +1,208 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2010-2014 Intel Corporation
> + * Copyright(c) 2024 Advanced Micro Devices, Inc.
> + */
> +
> +#ifndef _RTE_POWER_CORE_OPS_H
> +#define _RTE_POWER_CORE_OPS_H
> +
suggest rename the file as rte_power_cpufreq_api.h.
If so, the role of this file is more clearly.
> +__rte_internal
> +int rte_power_register_ops(struct rte_power_core_ops *ops);
> +
> +/**
> + * Macro to statically register the ops of a cpufreq driver.
> + */
> +#define RTE_POWER_REGISTER_OPS(ops)		\
> +	RTE_INIT(power_hdlr_init_##ops)		\
> +	{					\
> +		rte_power_register_ops(&ops);	\
> +	}
> +
> +/**
> + * @internal Get the power ops struct from its index.
> + *
> + * @return
> + *   The pointer to the ops struct in the table if registered.
> + */
> +struct rte_power_core_ops *
> +rte_power_get_core_ops(void);
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif
> diff --git a/lib/power/version.map b/lib/power/version.map
> index c9a226614e..bd64e0828f 100644
> --- a/lib/power/version.map
> +++ b/lib/power/version.map
> @@ -51,4 +51,18 @@ EXPERIMENTAL {
>   	rte_power_set_uncore_env;
>   	rte_power_uncore_freqs;
>   	rte_power_unset_uncore_env;
> +	# added in 24.07
24.07-->24.11?
> +	rte_power_logtype;
> +};
> +
> +INTERNAL {
> +	global:
> +
> +	rte_power_register_ops;
> +	cpufreq_check_scaling_driver;
> +	power_set_governor;
> +	open_core_sysfs_file;
> +	read_core_sysfs_u32;
> +	read_core_sysfs_s;
> +	write_core_sysfs_s;
>   };
  
Tummala, Sivaprasad Sept. 12, 2024, 11:17 a.m. UTC | #3
[AMD Official Use Only - AMD Internal Distribution Only]

Hi Huisong,

Please find my response inline.

> -----Original Message-----
> From: lihuisong (C) <lihuisong@huawei.com>
> Sent: Tuesday, August 27, 2024 1:51 PM
> To: Tummala, Sivaprasad <Sivaprasad.Tummala@amd.com>
> Cc: dev@dpdk.org; david.hunt@intel.com; anatoly.burakov@intel.com;
> radu.nicolau@intel.com; cristian.dumitrescu@intel.com; jerinj@marvell.com;
> konstantin.ananyev@huawei.com; Yigit, Ferruh <Ferruh.Yigit@amd.com>;
> gakhil@marvell.com
> Subject: Re: [PATCH v2 1/4] power: refactor core power management library
>
> Caution: This message originated from an External Source. Use proper caution
> when opening attachments, clicking links, or responding.
>
>
> Hi Sivaprasa,
>
> Some comments inline.
>
> /Huisong
>
> 在 2024/8/26 21:06, Sivaprasad Tummala 写道:
> > This patch introduces a comprehensive refactor to the core power
> > management library. The primary focus is on improving modularity and
> > organization by relocating specific driver implementations from the
> > 'lib/power' directory to dedicated directories within
> > 'drivers/power/core/*'. The adjustment of meson.build files enables
> > the selective activation of individual drivers.
> > These changes contribute to a significant enhancement in code
> > organization, providing a clearer structure for driver implementations.
> > The refactor aims to improve overall code clarity and boost
> > maintainability. Additionally, it establishes a foundation for future
> > development, allowing for more focused work on individual drivers and
> > seamless integration of forthcoming enhancements.
> >
> > v2:
> >   - added NULL check for global_core_ops in rte_power_get_core_ops
> >
> > Signed-off-by: Sivaprasad Tummala <sivaprasad.tummala@amd.com>
> > ---
> >   drivers/meson.build                           |   1 +
> >   .../power/acpi/acpi_cpufreq.c                 |  22 +-
> >   .../power/acpi/acpi_cpufreq.h                 |   6 +-
> >   drivers/power/acpi/meson.build                |  10 +
> >   .../power/amd_pstate/amd_pstate_cpufreq.c     |  24 +-
> >   .../power/amd_pstate/amd_pstate_cpufreq.h     |   8 +-
> >   drivers/power/amd_pstate/meson.build          |  10 +
> >   .../power/cppc/cppc_cpufreq.c                 |  22 +-
> >   .../power/cppc/cppc_cpufreq.h                 |   8 +-
> >   drivers/power/cppc/meson.build                |  10 +
> >   .../power/kvm_vm}/guest_channel.c             |   0
> >   .../power/kvm_vm}/guest_channel.h             |   0
> >   .../power/kvm_vm/kvm_vm.c                     |  22 +-
> >   .../power/kvm_vm/kvm_vm.h                     |   6 +-
> >   drivers/power/kvm_vm/meson.build              |  16 +
> >   drivers/power/meson.build                     |  12 +
> >   drivers/power/pstate/meson.build              |  10 +
> >   .../power/pstate/pstate_cpufreq.c             |  22 +-
> >   .../power/pstate/pstate_cpufreq.h             |   6 +-
> >   lib/power/meson.build                         |   7 +-
> >   lib/power/power_common.c                      |   2 +-
> >   lib/power/power_common.h                      |  16 +-
> >   lib/power/rte_power.c                         | 291 ++++++------------
> >   lib/power/rte_power.h                         | 139 ++++++---
> >   lib/power/rte_power_core_ops.h                | 208 +++++++++++++
> >   lib/power/version.map                         |  14 +
> >   26 files changed, 621 insertions(+), 271 deletions(-)
> >   rename lib/power/power_acpi_cpufreq.c => drivers/power/acpi/acpi_cpufreq.c
> (95%)
> >   rename lib/power/power_acpi_cpufreq.h => drivers/power/acpi/acpi_cpufreq.h
> (98%)
> >   create mode 100644 drivers/power/acpi/meson.build
> >   rename lib/power/power_amd_pstate_cpufreq.c =>
> drivers/power/amd_pstate/amd_pstate_cpufreq.c (95%)
> >   rename lib/power/power_amd_pstate_cpufreq.h =>
> drivers/power/amd_pstate/amd_pstate_cpufreq.h (97%)
> >   create mode 100644 drivers/power/amd_pstate/meson.build
> >   rename lib/power/power_cppc_cpufreq.c => drivers/power/cppc/cppc_cpufreq.c
> (95%)
> >   rename lib/power/power_cppc_cpufreq.h => drivers/power/cppc/cppc_cpufreq.h
> (97%)
> >   create mode 100644 drivers/power/cppc/meson.build
> >   rename {lib/power => drivers/power/kvm_vm}/guest_channel.c (100%)
> >   rename {lib/power => drivers/power/kvm_vm}/guest_channel.h (100%)
> >   rename lib/power/power_kvm_vm.c => drivers/power/kvm_vm/kvm_vm.c (82%)
> >   rename lib/power/power_kvm_vm.h => drivers/power/kvm_vm/kvm_vm.h (98%)
> >   create mode 100644 drivers/power/kvm_vm/meson.build
> >   create mode 100644 drivers/power/meson.build
> >   create mode 100644 drivers/power/pstate/meson.build
> >   rename lib/power/power_pstate_cpufreq.c =>
> drivers/power/pstate/pstate_cpufreq.c (96%)
> >   rename lib/power/power_pstate_cpufreq.h =>
> drivers/power/pstate/pstate_cpufreq.h (98%)
> >   create mode 100644 lib/power/rte_power_core_ops.h
> How about use the following directory structure?
> *For power libs*
> lib/power/power_common.*
> lib/power/rte_power_pmd_mgmt.*
> lib/power/rte_power_cpufreq_api.* (replacing rte_power.c file maybe simple for us.
> but I'm not sure if we can put the init of core, uncore and pmd mgmt to
> rte_power_init.c in rte_power.c.)
> lib/power/rte_power_uncore_freq_api.*
Yes, renaming rte_power.c is definitely a possible incremental change that could be considered later.
However, for the time being, our focus will be on refactoring the cpufreq drivers only.
>
> *And has directories under drivers/power:*
> 1> For core dvfs driver:
> drivers/power/cpufreq/acpi_cpufreq.c
> drivers/power/cpufreq/cppc_cpufreq.c
> drivers/power/cpufreq/amd_pstate_cpufreq.c
> drivers/power/cpufreq/intel_pstate_cpufreq.c
> drivers/power/cpufreq/kvm_cpufreq.c
> The code of each cpufreq driver is not too much and doesn't probably increase. So
> don't need to use a directory for it.
>
> 2> For uncore dvfs driver:
> drivers/power/uncorefreq/intel_uncore.*
> > diff --git a/drivers/meson.build b/drivers/meson.build index
> > 66931d4241..9d77e0deab 100644
> > --- a/drivers/meson.build
> > +++ b/drivers/meson.build
> > @@ -29,6 +29,7 @@ subdirs = [
> >           'event',          # depends on common, bus, mempool and net.
> >           'baseband',       # depends on common and bus.
> >           'gpu',            # depends on common and bus.
> > +        'power',          # depends on common (in future).
> >   ]
> >
> >   if meson.is_cross_build()
> > diff --git a/lib/power/power_acpi_cpufreq.c
> > b/drivers/power/acpi/acpi_cpufreq.c
> > similarity index 95%
> > rename from lib/power/power_acpi_cpufreq.c rename to
> > drivers/power/acpi/acpi_cpufreq.c
> do not suggest to create one directory for each cpufreq driver.
> Because pstate drivers also comply with ACPI spec, right?
> In addition, the code of each cpufreq drivers are not too much.
> There is just one file under one directory which is not good.
One of our objectives for the refactoring is to selectively disable non-essential drivers using Meson build options.
However, by rearranging the driver structure, we risk disrupting this capability.
> > index 81996e1c13..8637c69703 100644
> > --- a/lib/power/power_acpi_cpufreq.c
> > +++ b/drivers/power/acpi/acpi_cpufreq.c
> > @@ -10,7 +10,7 @@
> >   #include <rte_stdatomic.h>
> >   #include <rte_string_fns.h>
> >
> > -#include "power_acpi_cpufreq.h"
> > +#include "acpi_cpufreq.h"
> >   #include "power_common.h"
> >
> <...>
> > +if not is_linux
> > +    build = false
> > +    reason = 'only supported on Linux'
> > +endif
> > +sources = files('pstate_cpufreq.c')
> > +
> > +deps += ['power']
> > diff --git a/lib/power/power_pstate_cpufreq.c
> > b/drivers/power/pstate/pstate_cpufreq.c
> > similarity index 96%
> > rename from lib/power/power_pstate_cpufreq.c rename to
> > drivers/power/pstate/pstate_cpufreq.c
> pstate_cpufreq.c is actually intel_pstate cpufreq driver, right?
> So how about modify this file name to intel_pstate_cpufreq.c?
Yes, will fix this in next version.
> > index 2343121621..c32b1adabc 100644
> > --- a/lib/power/power_pstate_cpufreq.c
> > +++ b/drivers/power/pstate/pstate_cpufreq.c
> > @@ -15,7 +15,7 @@
> >   #include <rte_stdatomic.h>
> >
> >   #include "rte_power_pmd_mgmt.h"
> > -#include "power_pstate_cpufreq.h"
> > +#include "pstate_cpufreq.h"
> >   #include "power_common.h"
> >
> >   /* macros used for rounding frequency to nearest 100000 */ @@ -888,3
> > +888,23 @@ int power_pstate_get_capabilities(unsigned int lcore_id,
> >
> >       return 0;
> >   }
> > +
> <...>
> > diff --git a/lib/power/power_common.c b/lib/power/power_common.c index
> > 590986d5ef..6c06411e8b 100644
> > --- a/lib/power/power_common.c
> > +++ b/lib/power/power_common.c
> > @@ -12,7 +12,7 @@
> >
> >   #include "power_common.h"
> >
> > -RTE_LOG_REGISTER_DEFAULT(power_logtype, INFO);
> > +RTE_LOG_REGISTER_DEFAULT(rte_power_logtype, INFO);
> >
> >   #define POWER_SYSFILE_SCALING_DRIVER   \
> >               "/sys/devices/system/cpu/cpu%u/cpufreq/scaling_driver"
> > diff --git a/lib/power/power_common.h b/lib/power/power_common.h index
> > 83f742f42a..767686ee12 100644
> > --- a/lib/power/power_common.h
> > +++ b/lib/power/power_common.h
> > @@ -6,12 +6,13 @@
> >   #define _POWER_COMMON_H_
> >
> >   #include <rte_common.h>
> > +#include <rte_compat.h>
> >   #include <rte_log.h>
> >
> >   #define RTE_POWER_INVALID_FREQ_INDEX (~0)
> >
> > -extern int power_logtype;
> > -#define RTE_LOGTYPE_POWER power_logtype
> > +extern int rte_power_logtype;
> > +#define RTE_LOGTYPE_POWER rte_power_logtype
> >   #define POWER_LOG(level, ...) \
> >       RTE_LOG_LINE(level, POWER, "" __VA_ARGS__)
> >
> > @@ -23,13 +24,24 @@ extern int power_logtype;
> >   #endif
> >
> >   /* check if scaling driver matches one we want */
> > +__rte_internal
> >   int cpufreq_check_scaling_driver(const char *driver);
> > +
> > +__rte_internal
> >   int power_set_governor(unsigned int lcore_id, const char *new_governor,
> >               char *orig_governor, size_t orig_governor_len);
> suggest that move cpufreq interfaces like this to the
> rte_power_cpufreq_api.* I proposed above.
This is an internal API and isn’t intended for direct use by applications.
By moving it to rte_power_*, we risk exposing it inadvertently.

> The interfaces in power_comm.* can be used by all power modules, like
> core/uncore/pmd mgmt.
> > +
> > +__rte_internal
> >   int open_core_sysfs_file(FILE **f, const char *mode, const char *format, ...)
> >               __rte_format_printf(3, 4);
> > +
> > +__rte_internal
> >   int read_core_sysfs_u32(FILE *f, uint32_t *val);
> > +
> > +__rte_internal
> >   int read_core_sysfs_s(FILE *f, char *buf, unsigned int len);
> > +
> > +__rte_internal
> >   int write_core_sysfs_s(FILE *f, const char *str);
> >
> >   #endif /* _POWER_COMMON_H_ */
> > diff --git a/lib/power/rte_power.c b/lib/power/rte_power.c
> The name of the rte_power.c file is impropriate now. The context in this file is just for
> cpufreq, right?
> So I suggest that we need to rename this file as the rte_power_cpufreq_api.c
Yes, renaming rte_power.c to rte_power_cpufreq.c is definitely a possible incremental change
and will fix this as a separate patch.
.

> > index 36c3f3da98..2bf6d40517 100644
> > --- a/lib/power/rte_power.c
> > +++ b/lib/power/rte_power.c
> > @@ -8,153 +8,86 @@
> >   #include <rte_spinlock.h>
> >
> >   #include "rte_power.h"
> > -#include "power_acpi_cpufreq.h"
> > -#include "power_cppc_cpufreq.h"
> >   #include "power_common.h"
> > -#include "power_kvm_vm.h"
> > -#include "power_pstate_cpufreq.h"
> > -#include "power_amd_pstate_cpufreq.h"
> >
> > -enum power_management_env global_default_env = PM_ENV_NOT_SET;
> > +static enum power_management_env global_default_env =
> PM_ENV_NOT_SET;
> > +static struct rte_power_core_ops *global_power_core_ops;
> >
> >   static rte_spinlock_t global_env_cfg_lock =
> > RTE_SPINLOCK_INITIALIZER;
> > +static RTE_TAILQ_HEAD(, rte_power_core_ops) core_ops_list =
> > +                     TAILQ_HEAD_INITIALIZER(core_ops_list);
> >
> > -/* function pointers */
> > -rte_power_freqs_t rte_power_freqs  = NULL; -rte_power_get_freq_t
> > rte_power_get_freq = NULL; -rte_power_set_freq_t rte_power_set_freq =
> > NULL; -rte_power_freq_change_t rte_power_freq_up = NULL;
> > -rte_power_freq_change_t rte_power_freq_down = NULL;
> > -rte_power_freq_change_t rte_power_freq_max = NULL;
> > -rte_power_freq_change_t rte_power_freq_min = NULL;
> > -rte_power_freq_change_t rte_power_turbo_status;
> > -rte_power_freq_change_t rte_power_freq_enable_turbo;
> > -rte_power_freq_change_t rte_power_freq_disable_turbo;
> > -rte_power_get_capabilities_t rte_power_get_capabilities;
> > -
> > -static void
> > -reset_power_function_ptrs(void)
> > +
> > +const char *power_env_str[] = {
> > +     "not set",
> > +     "acpi",
> > +     "kvm-vm",
> > +     "pstate",
> > +     "cppc",
> > +     "amd-pstate"
> > +};
> > +
> > +/* register the ops struct in rte_power_core_ops, return 0 on
> > +success. */ int rte_power_register_ops(struct rte_power_core_ops
> > +*driver_ops)
> >   {
> > -     rte_power_freqs  = NULL;
> > -     rte_power_get_freq = NULL;
> > -     rte_power_set_freq = NULL;
> > -     rte_power_freq_up = NULL;
> > -     rte_power_freq_down = NULL;
> > -     rte_power_freq_max = NULL;
> > -     rte_power_freq_min = NULL;
> > -     rte_power_turbo_status = NULL;
> > -     rte_power_freq_enable_turbo = NULL;
> > -     rte_power_freq_disable_turbo = NULL;
> > -     rte_power_get_capabilities = NULL;
> > +     if (!driver_ops->init || !driver_ops->exit ||
> > +             !driver_ops->check_env_support || !driver_ops->get_avail_freqs ||
> > +             !driver_ops->get_freq || !driver_ops->set_freq ||
> > +             !driver_ops->freq_up || !driver_ops->freq_down ||
> > +             !driver_ops->freq_max || !driver_ops->freq_min ||
> > +             !driver_ops->turbo_status || !driver_ops->enable_turbo ||
> > +             !driver_ops->disable_turbo || !driver_ops->get_caps) {
> > +             POWER_LOG(ERR, "Missing callbacks while registering
> > + power ops");
> turbo_status(), enable_turbo() and disable turbo() are not necessary, right?
Nope, this is required to get the current status unlike the capability API (get_caps()).
> These depand on the capabilities from get_caps().
> > +             return -EINVAL;
> > +     }
> > +
> > +     TAILQ_INSERT_TAIL(&core_ops_list, driver_ops, next);
> > +
> > +     return 0;
> >   }
> >
> >   int
> >   rte_power_check_env_supported(enum power_management_env env)
> >   {
> > -     switch (env) {
> > -     case PM_ENV_ACPI_CPUFREQ:
> > -             return power_acpi_cpufreq_check_supported();
> > -     case PM_ENV_PSTATE_CPUFREQ:
> > -             return power_pstate_cpufreq_check_supported();
> > -     case PM_ENV_KVM_VM:
> > -             return power_kvm_vm_check_supported();
> > -     case PM_ENV_CPPC_CPUFREQ:
> > -             return power_cppc_cpufreq_check_supported();
> > -     case PM_ENV_AMD_PSTATE_CPUFREQ:
> > -             return power_amd_pstate_cpufreq_check_supported();
> > -     default:
> > -             rte_errno = EINVAL;
> > -             return -1;
> > -     }
> > +     struct rte_power_core_ops *ops;
> > +
> > +     if (env >= RTE_DIM(power_env_str))
> > +             return 0;
> > +
> > +     RTE_TAILQ_FOREACH(ops, &core_ops_list, next)
> > +             if (strncmp(ops->name, power_env_str[env],
> > +                             RTE_POWER_DRIVER_NAMESZ) == 0)
> > +                     return ops->check_env_support();
> > +
> > +     return 0;
> >   }
> >
> >   int
> >   rte_power_set_env(enum power_management_env env)
> >   {
> > +     struct rte_power_core_ops *ops;
> > +     int ret = -1;
> > +
> >       rte_spinlock_lock(&global_env_cfg_lock);
> >
> >       if (global_default_env != PM_ENV_NOT_SET) {
> >               POWER_LOG(ERR, "Power Management Environment already set.");
> > -             rte_spinlock_unlock(&global_env_cfg_lock);
> > -             return -1;
> > -     }
> > -
> <...>
> > -     if (ret == 0)
> > -             global_default_env = env;
> > -     else {
> > -             global_default_env = PM_ENV_NOT_SET;
> > -             reset_power_function_ptrs();
> > -     }
> > +     RTE_TAILQ_FOREACH(ops, &core_ops_list, next)
> > +             if (strncmp(ops->name, power_env_str[env],
> > +                             RTE_POWER_DRIVER_NAMESZ) == 0) {
> > +                     global_power_core_ops = ops;
> > +                     global_default_env = env;
> > +                     ret = 0;
> > +                     goto out;
> > +             }
> > +     POWER_LOG(ERR, "Invalid Power Management Environment(%d) set",
> > +                     env);
> >
> > +out:
> >       rte_spinlock_unlock(&global_env_cfg_lock);
> >       return ret;
> >   }
> > @@ -164,94 +97,66 @@ rte_power_unset_env(void)
> >   {
> >       rte_spinlock_lock(&global_env_cfg_lock);
> >       global_default_env = PM_ENV_NOT_SET;
> > -     reset_power_function_ptrs();
> > +     global_power_core_ops = NULL;
> >       rte_spinlock_unlock(&global_env_cfg_lock);
> >   }
> >
> >   enum power_management_env
> > -rte_power_get_env(void) {
> > +rte_power_get_env(void)
> > +{
> >       return global_default_env;
> >   }
> >
> > -int
> > -rte_power_init(unsigned int lcore_id)
> > +struct rte_power_core_ops *
> > +rte_power_get_core_ops(void)
> >   {
> > -     int ret = -1;
> > +     RTE_ASSERT(global_power_core_ops != NULL);
> >
> > -     switch (global_default_env) {
> > -     case PM_ENV_ACPI_CPUFREQ:
> > -             return power_acpi_cpufreq_init(lcore_id);
> > -     case PM_ENV_KVM_VM:
> > -             return power_kvm_vm_init(lcore_id);
> > -     case PM_ENV_PSTATE_CPUFREQ:
> > -             return power_pstate_cpufreq_init(lcore_id);
> > -     case PM_ENV_CPPC_CPUFREQ:
> > -             return power_cppc_cpufreq_init(lcore_id);
> > -     case PM_ENV_AMD_PSTATE_CPUFREQ:
> > -             return power_amd_pstate_cpufreq_init(lcore_id);
> > -     default:
> > -             POWER_LOG(INFO, "Env isn't set yet!");
> > -     }
> > +     return global_power_core_ops;
> > +}
> >
> > -     /* Auto detect Environment */
> > -     POWER_LOG(INFO, "Attempting to initialise ACPI cpufreq power
> management...");
> > -     ret = power_acpi_cpufreq_init(lcore_id);
> > -     if (ret == 0) {
> > -             rte_power_set_env(PM_ENV_ACPI_CPUFREQ);
> > -             goto out;
> > -     }
> > +int
> > +rte_power_init(unsigned int lcore_id) {
> > +     struct rte_power_core_ops *ops;
> > +     uint8_t env;
> >
> > -     POWER_LOG(INFO, "Attempting to initialise PSTAT power management...");
> > -     ret = power_pstate_cpufreq_init(lcore_id);
> > -     if (ret == 0) {
> > -             rte_power_set_env(PM_ENV_PSTATE_CPUFREQ);
> > -             goto out;
> > -     }
> > +     if (global_default_env != PM_ENV_NOT_SET)
> > +             return global_power_core_ops->init(lcore_id);
> >
> > -     POWER_LOG(INFO, "Attempting to initialise AMD PSTATE power
> management...");
> > -     ret = power_amd_pstate_cpufreq_init(lcore_id);
> > -     if (ret == 0) {
> > -             rte_power_set_env(PM_ENV_AMD_PSTATE_CPUFREQ);
> > -             goto out;
> > -     }
> > +     POWER_LOG(INFO, "Env isn't set yet!");
> remove this log?
> >
> > -     POWER_LOG(INFO, "Attempting to initialise CPPC power management...");
> > -     ret = power_cppc_cpufreq_init(lcore_id);
> > -     if (ret == 0) {
> > -             rte_power_set_env(PM_ENV_CPPC_CPUFREQ);
> > -             goto out;
> > -     }
> > +     /* Auto detect Environment */
> > +     RTE_TAILQ_FOREACH(ops, &core_ops_list, next)
> > +             if (ops) {
> > +                     POWER_LOG(INFO,
> > +                             "Attempting to initialise %s cpufreq power management...",
> > +                             ops->name);
> > +                     if (ops->init(lcore_id) == 0) {
> > +                             for (env = 0; env < RTE_DIM(power_env_str); env++)
> > +                                     if (strncmp(ops->name, power_env_str[env],
> > +                                                     RTE_POWER_DRIVER_NAMESZ) == 0) {
> > +                                             rte_power_set_env(env);
> > +                                             return 0;
> > +                                     }
> > +                     }
> > +             }
> Can we change the logic of rte_power_set_env()? like:
>      RTE_TAILQ_FOREACH(ops, &core_ops_list, next) {
>          for (env = 0; env < RTE_DIM(power_env_str); env++) {
>                  if (strncmp(ops->name, power_env_str[env],
> RTE_POWER_DRIVER_NAMESZ) == 0 &&
>                      ops->init(lcore_id) == 0) {
>                      global_power_core_ops = ops;
>                      global_default_env = env;
>                  }
>          }
>      }
> That is easier to follow code.
Yes, will fix in next version.

> > +
> > +     POWER_LOG(ERR,
> > +             "Unable to set Power Management Environment for lcore %u",
> > +             lcore_id);
> >
> > -     POWER_LOG(INFO, "Attempting to initialise VM power management...");
> > -     ret = power_kvm_vm_init(lcore_id);
> > -     if (ret == 0) {
> > -             rte_power_set_env(PM_ENV_KVM_VM);
> > -             goto out;
> > -     }
> > -     POWER_LOG(ERR, "Unable to set Power Management Environment for lcore
> "
> > -                     "%u", lcore_id);
> > -out:
> > -     return ret;
> > +     return -1;
> >   }
> >
> >   int
> >   rte_power_exit(unsigned int lcore_id)
> >   {
> > -     switch (global_default_env) {
> > -     case PM_ENV_ACPI_CPUFREQ:
> > -             return power_acpi_cpufreq_exit(lcore_id);
> > -     case PM_ENV_KVM_VM:
> > -             return power_kvm_vm_exit(lcore_id);
> > -     case PM_ENV_PSTATE_CPUFREQ:
> > -             return power_pstate_cpufreq_exit(lcore_id);
> > -     case PM_ENV_CPPC_CPUFREQ:
> > -             return power_cppc_cpufreq_exit(lcore_id);
> > -     case PM_ENV_AMD_PSTATE_CPUFREQ:
> > -             return power_amd_pstate_cpufreq_exit(lcore_id);
> > -     default:
> > -             POWER_LOG(ERR, "Environment has not been set, unable to exit
> gracefully");
> > +     if (global_default_env != PM_ENV_NOT_SET)
> > +             return global_power_core_ops->exit(lcore_id);
> >
> > -     }
> > -     return -1;
> > +     POWER_LOG(ERR,
> > +             "Environment has not been set, unable to exit
> > + gracefully");
> >
> > +     return -1;
> >   }
> > diff --git a/lib/power/rte_power.h b/lib/power/rte_power.h index
> > 4fa4afe399..5e4aacf08b 100644
> > --- a/lib/power/rte_power.h
> > +++ b/lib/power/rte_power.h
> > @@ -1,5 +1,6 @@
> >   /* SPDX-License-Identifier: BSD-3-Clause
> >    * Copyright(c) 2010-2014 Intel Corporation
> > + * Copyright(c) 2024 Advanced Micro Devices, Inc.
> >    */
> >
> >   #ifndef _RTE_POWER_H
> > @@ -14,14 +15,21 @@
> >   #include <rte_log.h>
> >   #include <rte_power_guest_channel.h>
> >
> > +#include "rte_power_core_ops.h"
> > +
> >   #ifdef __cplusplus
> >   extern "C" {
> >   #endif
> >
> >   /* Power Management Environment State */ -enum power_management_env
> > {PM_ENV_NOT_SET, PM_ENV_ACPI_CPUFREQ, PM_ENV_KVM_VM,
> > -             PM_ENV_PSTATE_CPUFREQ, PM_ENV_CPPC_CPUFREQ,
> > -             PM_ENV_AMD_PSTATE_CPUFREQ};
> > +enum power_management_env {
> > +     PM_ENV_NOT_SET = 0,
> > +     PM_ENV_ACPI_CPUFREQ,
> > +     PM_ENV_KVM_VM,
> > +     PM_ENV_PSTATE_CPUFREQ,
> > +     PM_ENV_CPPC_CPUFREQ,
> > +     PM_ENV_AMD_PSTATE_CPUFREQ
> > +};
> >
> >   /**
> >    * Check if a specific power management environment type is
> > supported on a @@ -66,6 +74,15 @@ void rte_power_unset_env(void);
> >    */
> >   enum power_management_env rte_power_get_env(void);
>
> I'd like to let user not know used which cpufreq driver, which is friendly to user.
>
> So we can rethink if this API is necessary.
For any API changes, could we handle this as a separate RFC for discussion?
It’s important that these changes are not included within the scope of this patch.
>
> >
> > +/**
> > + * @internal Get the power ops struct from its index.
> > + *
> > + * @return
> > + *   The pointer to the ops struct in the table if registered.
> > + */
> > +struct rte_power_core_ops *
> > +rte_power_get_core_ops(void);
> > +
> >   /**
> >    * Initialize power management for a specific lcore. If rte_power_set_env() has
> >    * not been called then an auto-detect of the environment will start
> > and @@ -108,10 +125,13 @@ int rte_power_exit(unsigned int lcore_id);
> >    * @return
> >    *  The number of available frequencies.
> >    */
> > -typedef uint32_t (*rte_power_freqs_t)(unsigned int lcore_id, uint32_t *freqs,
> > -             uint32_t num);
> > +static inline uint32_t
> > +rte_power_freqs(unsigned int lcore_id, uint32_t *freqs, uint32_t n) {
> > +     struct rte_power_core_ops *ops = rte_power_get_core_ops();
> >
> > -extern rte_power_freqs_t rte_power_freqs;
> > +     return ops->get_avail_freqs(lcore_id, freqs, n); }
> >
> >   /**
> >    * Return the current index of available frequencies of a specific lcore.
> > @@ -124,9 +144,13 @@ extern rte_power_freqs_t rte_power_freqs;
> >    * @return
> >    *  The current index of available frequencies.
> >    */
> > -typedef uint32_t (*rte_power_get_freq_t)(unsigned int lcore_id);
> > +static inline uint32_t
> > +rte_power_get_freq(unsigned int lcore_id) {
> > +     struct rte_power_core_ops *ops = rte_power_get_core_ops();
> >
> > -extern rte_power_get_freq_t rte_power_get_freq;
> > +     return ops->get_freq(lcore_id);
> > +}
> >
> >   /**
> >    * Set the new frequency for a specific lcore by indicating the
> > index of @@ -144,82 +168,101 @@ extern rte_power_get_freq_t
> rte_power_get_freq;
> >    *  - 0 on success without frequency changed.
> >    *  - Negative on error.
> >    */
> > -typedef int (*rte_power_set_freq_t)(unsigned int lcore_id, uint32_t
> > index);
> > -
> > -extern rte_power_set_freq_t rte_power_set_freq;
> > +static inline uint32_t
> > +rte_power_set_freq(unsigned int lcore_id, uint32_t index) {
> > +     struct rte_power_core_ops *ops = rte_power_get_core_ops();
> >
> > -/**
> > - * Function pointer definition for generic frequency change
> > functions. Review
> > - * each environments specific documentation for usage.
> > - *
> > - * @param lcore_id
> > - *  lcore id.
> > - *
> > - * @return
> > - *  - 1 on success with frequency changed.
> > - *  - 0 on success without frequency changed.
> > - *  - Negative on error.
> > - */
> > -typedef int (*rte_power_freq_change_t)(unsigned int lcore_id);
> > +     return ops->set_freq(lcore_id, index); }
> >
> >   /**
> >    * Scale up the frequency of a specific lcore according to the available
> >    * frequencies.
> >    * Review each environments specific documentation for usage.
> >    */
> > -extern rte_power_freq_change_t rte_power_freq_up;
> > +static inline int
> > +rte_power_freq_up(unsigned int lcore_id) {
> > +     struct rte_power_core_ops *ops = rte_power_get_core_ops();
> > +
> > +     return ops->freq_up(lcore_id);
> > +}
> >
> >   /**
> >    * Scale down the frequency of a specific lcore according to the available
> >    * frequencies.
> >    * Review each environments specific documentation for usage.
> >    */
> > -extern rte_power_freq_change_t rte_power_freq_down;
> > +static inline int
> > +rte_power_freq_down(unsigned int lcore_id) {
> > +     struct rte_power_core_ops *ops = rte_power_get_core_ops();
> > +
> > +     return ops->freq_down(lcore_id); }
> >
> >   /**
> >    * Scale up the frequency of a specific lcore to the highest according to the
> >    * available frequencies.
> >    * Review each environments specific documentation for usage.
> >    */
> > -extern rte_power_freq_change_t rte_power_freq_max;
> > +static inline int
> > +rte_power_freq_max(unsigned int lcore_id) {
> > +     struct rte_power_core_ops *ops = rte_power_get_core_ops();
> > +
> > +     return ops->freq_max(lcore_id);
> > +}
> >
> >   /**
> >    * Scale down the frequency of a specific lcore to the lowest according to the
> >    * available frequencies.
> >    * Review each environments specific documentation for usage..
> >    */
> > -extern rte_power_freq_change_t rte_power_freq_min;
> > +static inline int
> > +rte_power_freq_min(unsigned int lcore_id) {
> > +     struct rte_power_core_ops *ops = rte_power_get_core_ops();
> > +
> > +     return ops->freq_min(lcore_id);
> > +}
> >
> >   /**
> >    * Query the Turbo Boost status of a specific lcore.
> >    * Review each environments specific documentation for usage..
> >    */
> > -extern rte_power_freq_change_t rte_power_turbo_status;
> > +static inline int
> > +rte_power_turbo_status(unsigned int lcore_id) {
> > +     struct rte_power_core_ops *ops = rte_power_get_core_ops();
> > +
> > +     return ops->turbo_status(lcore_id); }
> >
> >   /**
> >    * Enable Turbo Boost for this lcore.
> >    * Review each environments specific documentation for usage..
> >    */
> > -extern rte_power_freq_change_t rte_power_freq_enable_turbo;
> > +static inline int
> > +rte_power_freq_enable_turbo(unsigned int lcore_id) {
> > +     struct rte_power_core_ops *ops = rte_power_get_core_ops();
> > +
> > +     return ops->enable_turbo(lcore_id); }
> >
> >   /**
> >    * Disable Turbo Boost for this lcore.
> >    * Review each environments specific documentation for usage..
> >    */
> > -extern rte_power_freq_change_t rte_power_freq_disable_turbo;
> > +static inline int
> > +rte_power_freq_disable_turbo(unsigned int lcore_id) {
> > +     struct rte_power_core_ops *ops = rte_power_get_core_ops();
> >
> > -/**
> > - * Power capabilities summary.
> > - */
> > -struct rte_power_core_capabilities {
> > -     union {
> > -             uint64_t capabilities;
> > -             struct {
> > -                     uint64_t turbo:1;       /**< Turbo can be enabled. */
> > -                     uint64_t priority:1;    /**< SST-BF high freq core */
> > -             };
> > -     };
> > -};
> > +     return ops->disable_turbo(lcore_id); }
> >
> >   /**
> >    * Returns power capabilities for a specific lcore.
> > @@ -235,10 +278,14 @@ struct rte_power_core_capabilities {
> >    *  - 0 on success.
> >    *  - Negative on error.
> >    */
> > -typedef int (*rte_power_get_capabilities_t)(unsigned int lcore_id,
> > -             struct rte_power_core_capabilities *caps);
> > +static inline int
> > +rte_power_get_capabilities(unsigned int lcore_id,
> > +             struct rte_power_core_capabilities *caps) {
> > +     struct rte_power_core_ops *ops = rte_power_get_core_ops();
> >
> > -extern rte_power_get_capabilities_t rte_power_get_capabilities;
> > +     return ops->get_caps(lcore_id, caps); }
> >
> >   #ifdef __cplusplus
> >   }
> > diff --git a/lib/power/rte_power_core_ops.h
> > b/lib/power/rte_power_core_ops.h new file mode 100644 index
> > 0000000000..356a64df79
> > --- /dev/null
> > +++ b/lib/power/rte_power_core_ops.h
> > @@ -0,0 +1,208 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright(c) 2010-2014 Intel Corporation
> > + * Copyright(c) 2024 Advanced Micro Devices, Inc.
> > + */
> > +
> > +#ifndef _RTE_POWER_CORE_OPS_H
> > +#define _RTE_POWER_CORE_OPS_H
> > +
> suggest rename the file as rte_power_cpufreq_api.h.
> If so, the role of this file is more clearly.
> > +__rte_internal
> > +int rte_power_register_ops(struct rte_power_core_ops *ops);
> > +
> > +/**
> > + * Macro to statically register the ops of a cpufreq driver.
> > + */
> > +#define RTE_POWER_REGISTER_OPS(ops)          \
> > +     RTE_INIT(power_hdlr_init_##ops)         \
> > +     {                                       \
> > +             rte_power_register_ops(&ops);   \
> > +     }
> > +
> > +/**
> > + * @internal Get the power ops struct from its index.
> > + *
> > + * @return
> > + *   The pointer to the ops struct in the table if registered.
> > + */
> > +struct rte_power_core_ops *
> > +rte_power_get_core_ops(void);
> > +
> > +#ifdef __cplusplus
> > +}
> > +#endif
> > +
> > +#endif
> > diff --git a/lib/power/version.map b/lib/power/version.map index
> > c9a226614e..bd64e0828f 100644
> > --- a/lib/power/version.map
> > +++ b/lib/power/version.map
> > @@ -51,4 +51,18 @@ EXPERIMENTAL {
> >       rte_power_set_uncore_env;
> >       rte_power_uncore_freqs;
> >       rte_power_unset_uncore_env;
> > +     # added in 24.07
> 24.07-->24.11?
> > +     rte_power_logtype;
> > +};
> > +
> > +INTERNAL {
> > +     global:
> > +
> > +     rte_power_register_ops;
> > +     cpufreq_check_scaling_driver;
> > +     power_set_governor;
> > +     open_core_sysfs_file;
> > +     read_core_sysfs_u32;
> > +     read_core_sysfs_s;
> > +     write_core_sysfs_s;
> >   };
  
lihuisong (C) Sept. 13, 2024, 7:34 a.m. UTC | #4
在 2024/9/12 19:17, Tummala, Sivaprasad 写道:
> [AMD Official Use Only - AMD Internal Distribution Only]
>
> Hi Huisong,
>
> Please find my response inline.
>
>> -----Original Message-----
>> From: lihuisong (C) <lihuisong@huawei.com>
>> Sent: Tuesday, August 27, 2024 1:51 PM
>> To: Tummala, Sivaprasad <Sivaprasad.Tummala@amd.com>
>> Cc: dev@dpdk.org; david.hunt@intel.com; anatoly.burakov@intel.com;
>> radu.nicolau@intel.com; cristian.dumitrescu@intel.com; jerinj@marvell.com;
>> konstantin.ananyev@huawei.com; Yigit, Ferruh <Ferruh.Yigit@amd.com>;
>> gakhil@marvell.com
>> Subject: Re: [PATCH v2 1/4] power: refactor core power management library
>>
>> Caution: This message originated from an External Source. Use proper caution
>> when opening attachments, clicking links, or responding.
>>
>>
>> Hi Sivaprasa,
>>
>> Some comments inline.
>>
>> /Huisong
>>
>> 在 2024/8/26 21:06, Sivaprasad Tummala 写道:
>>> This patch introduces a comprehensive refactor to the core power
>>> management library. The primary focus is on improving modularity and
>>> organization by relocating specific driver implementations from the
>>> 'lib/power' directory to dedicated directories within
>>> 'drivers/power/core/*'. The adjustment of meson.build files enables
>>> the selective activation of individual drivers.
>>> These changes contribute to a significant enhancement in code
>>> organization, providing a clearer structure for driver implementations.
>>> The refactor aims to improve overall code clarity and boost
>>> maintainability. Additionally, it establishes a foundation for future
>>> development, allowing for more focused work on individual drivers and
>>> seamless integration of forthcoming enhancements.
>>>
>>> v2:
>>>    - added NULL check for global_core_ops in rte_power_get_core_ops
>>>
>>> Signed-off-by: Sivaprasad Tummala <sivaprasad.tummala@amd.com>
>>> ---
>>>    drivers/meson.build                           |   1 +
>>>    .../power/acpi/acpi_cpufreq.c                 |  22 +-
>>>    .../power/acpi/acpi_cpufreq.h                 |   6 +-
>>>    drivers/power/acpi/meson.build                |  10 +
>>>    .../power/amd_pstate/amd_pstate_cpufreq.c     |  24 +-
>>>    .../power/amd_pstate/amd_pstate_cpufreq.h     |   8 +-
>>>    drivers/power/amd_pstate/meson.build          |  10 +
>>>    .../power/cppc/cppc_cpufreq.c                 |  22 +-
>>>    .../power/cppc/cppc_cpufreq.h                 |   8 +-
>>>    drivers/power/cppc/meson.build                |  10 +
>>>    .../power/kvm_vm}/guest_channel.c             |   0
>>>    .../power/kvm_vm}/guest_channel.h             |   0
>>>    .../power/kvm_vm/kvm_vm.c                     |  22 +-
>>>    .../power/kvm_vm/kvm_vm.h                     |   6 +-
>>>    drivers/power/kvm_vm/meson.build              |  16 +
>>>    drivers/power/meson.build                     |  12 +
>>>    drivers/power/pstate/meson.build              |  10 +
>>>    .../power/pstate/pstate_cpufreq.c             |  22 +-
>>>    .../power/pstate/pstate_cpufreq.h             |   6 +-
>>>    lib/power/meson.build                         |   7 +-
>>>    lib/power/power_common.c                      |   2 +-
>>>    lib/power/power_common.h                      |  16 +-
>>>    lib/power/rte_power.c                         | 291 ++++++------------
>>>    lib/power/rte_power.h                         | 139 ++++++---
>>>    lib/power/rte_power_core_ops.h                | 208 +++++++++++++
>>>    lib/power/version.map                         |  14 +
>>>    26 files changed, 621 insertions(+), 271 deletions(-)
>>>    rename lib/power/power_acpi_cpufreq.c => drivers/power/acpi/acpi_cpufreq.c
>> (95%)
>>>    rename lib/power/power_acpi_cpufreq.h => drivers/power/acpi/acpi_cpufreq.h
>> (98%)
>>>    create mode 100644 drivers/power/acpi/meson.build
>>>    rename lib/power/power_amd_pstate_cpufreq.c =>
>> drivers/power/amd_pstate/amd_pstate_cpufreq.c (95%)
>>>    rename lib/power/power_amd_pstate_cpufreq.h =>
>> drivers/power/amd_pstate/amd_pstate_cpufreq.h (97%)
>>>    create mode 100644 drivers/power/amd_pstate/meson.build
>>>    rename lib/power/power_cppc_cpufreq.c => drivers/power/cppc/cppc_cpufreq.c
>> (95%)
>>>    rename lib/power/power_cppc_cpufreq.h => drivers/power/cppc/cppc_cpufreq.h
>> (97%)
>>>    create mode 100644 drivers/power/cppc/meson.build
>>>    rename {lib/power => drivers/power/kvm_vm}/guest_channel.c (100%)
>>>    rename {lib/power => drivers/power/kvm_vm}/guest_channel.h (100%)
>>>    rename lib/power/power_kvm_vm.c => drivers/power/kvm_vm/kvm_vm.c (82%)
>>>    rename lib/power/power_kvm_vm.h => drivers/power/kvm_vm/kvm_vm.h (98%)
>>>    create mode 100644 drivers/power/kvm_vm/meson.build
>>>    create mode 100644 drivers/power/meson.build
>>>    create mode 100644 drivers/power/pstate/meson.build
>>>    rename lib/power/power_pstate_cpufreq.c =>
>> drivers/power/pstate/pstate_cpufreq.c (96%)
>>>    rename lib/power/power_pstate_cpufreq.h =>
>> drivers/power/pstate/pstate_cpufreq.h (98%)
>>>    create mode 100644 lib/power/rte_power_core_ops.h
>> How about use the following directory structure?
>> *For power libs*
>> lib/power/power_common.*
>> lib/power/rte_power_pmd_mgmt.*
>> lib/power/rte_power_cpufreq_api.* (replacing rte_power.c file maybe simple for us.
>> but I'm not sure if we can put the init of core, uncore and pmd mgmt to
>> rte_power_init.c in rte_power.c.)
>> lib/power/rte_power_uncore_freq_api.*
> Yes, renaming rte_power.c is definitely a possible incremental change that could be considered later.
> However, for the time being, our focus will be on refactoring the cpufreq drivers only.
The rte_power.c just works for the initialization of cpufreq driver. Now 
that you are reworking core and uncore power library and rearrange the 
directory under power.
I think renaming this file name should be more appropriate in this series.
>> *And has directories under drivers/power:*
>> 1> For core dvfs driver:
>> drivers/power/cpufreq/acpi_cpufreq.c
>> drivers/power/cpufreq/cppc_cpufreq.c
>> drivers/power/cpufreq/amd_pstate_cpufreq.c
>> drivers/power/cpufreq/intel_pstate_cpufreq.c
>> drivers/power/cpufreq/kvm_cpufreq.c
>> The code of each cpufreq driver is not too much and doesn't probably increase. So
>> don't need to use a directory for it.
>>
>> 2> For uncore dvfs driver:
>> drivers/power/uncorefreq/intel_uncore.*
>>> diff --git a/drivers/meson.build b/drivers/meson.build index
>>> 66931d4241..9d77e0deab 100644
>>> --- a/drivers/meson.build
>>> +++ b/drivers/meson.build
>>> @@ -29,6 +29,7 @@ subdirs = [
>>>            'event',          # depends on common, bus, mempool and net.
>>>            'baseband',       # depends on common and bus.
>>>            'gpu',            # depends on common and bus.
>>> +        'power',          # depends on common (in future).
>>>    ]
>>>
>>>    if meson.is_cross_build()
>>> diff --git a/lib/power/power_acpi_cpufreq.c
>>> b/drivers/power/acpi/acpi_cpufreq.c
>>> similarity index 95%
>>> rename from lib/power/power_acpi_cpufreq.c rename to
>>> drivers/power/acpi/acpi_cpufreq.c
>> do not suggest to create one directory for each cpufreq driver.
>> Because pstate drivers also comply with ACPI spec, right?
>> In addition, the code of each cpufreq drivers are not too much.
>> There is just one file under one directory which is not good.
> One of our objectives for the refactoring is to selectively disable non-essential drivers using Meson build options.
> However, by rearranging the driver structure, we risk disrupting this capability.
I get your purpose.
The cpufreq library has the feature and interface to detect which driver 
to use, right?
So it is not necessary for cpufreq library to introduce the Meson build 
options, which probably makes it complicate.
>>> index 81996e1c13..8637c69703 100644
>>> --- a/lib/power/power_acpi_cpufreq.c
>>> +++ b/drivers/power/acpi/acpi_cpufreq.c
>>> @@ -10,7 +10,7 @@
>>>    #include <rte_stdatomic.h>
>>>    #include <rte_string_fns.h>
>>>
>>> -#include "power_acpi_cpufreq.h"
>>> +#include "acpi_cpufreq.h"
>>>    #include "power_common.h"
>>>
>> <...>
>>> +if not is_linux
>>> +    build = false
>>> +    reason = 'only supported on Linux'
>>> +endif
>>> +sources = files('pstate_cpufreq.c')
>>> +
>>> +deps += ['power']
>>> diff --git a/lib/power/power_pstate_cpufreq.c
>>> b/drivers/power/pstate/pstate_cpufreq.c
>>> similarity index 96%
>>> rename from lib/power/power_pstate_cpufreq.c rename to
>>> drivers/power/pstate/pstate_cpufreq.c
>> pstate_cpufreq.c is actually intel_pstate cpufreq driver, right?
>> So how about modify this file name to intel_pstate_cpufreq.c?
> Yes, will fix this in next version.
>>> index 2343121621..c32b1adabc 100644
>>> --- a/lib/power/power_pstate_cpufreq.c
>>> +++ b/drivers/power/pstate/pstate_cpufreq.c
>>> @@ -15,7 +15,7 @@
>>>    #include <rte_stdatomic.h>
>>>
>>>    #include "rte_power_pmd_mgmt.h"
>>> -#include "power_pstate_cpufreq.h"
>>> +#include "pstate_cpufreq.h"
>>>    #include "power_common.h"
>>>
>>>    /* macros used for rounding frequency to nearest 100000 */ @@ -888,3
>>> +888,23 @@ int power_pstate_get_capabilities(unsigned int lcore_id,
>>>
>>>        return 0;
>>>    }
>>> +
>> <...>
>>> diff --git a/lib/power/power_common.c b/lib/power/power_common.c index
>>> 590986d5ef..6c06411e8b 100644
>>> --- a/lib/power/power_common.c
>>> +++ b/lib/power/power_common.c
>>> @@ -12,7 +12,7 @@
>>>
>>>    #include "power_common.h"
>>>
>>> -RTE_LOG_REGISTER_DEFAULT(power_logtype, INFO);
>>> +RTE_LOG_REGISTER_DEFAULT(rte_power_logtype, INFO);
>>>
>>>    #define POWER_SYSFILE_SCALING_DRIVER   \
>>>                "/sys/devices/system/cpu/cpu%u/cpufreq/scaling_driver"
>>> diff --git a/lib/power/power_common.h b/lib/power/power_common.h index
>>> 83f742f42a..767686ee12 100644
>>> --- a/lib/power/power_common.h
>>> +++ b/lib/power/power_common.h
>>> @@ -6,12 +6,13 @@
>>>    #define _POWER_COMMON_H_
>>>
>>>    #include <rte_common.h>
>>> +#include <rte_compat.h>
>>>    #include <rte_log.h>
>>>
>>>    #define RTE_POWER_INVALID_FREQ_INDEX (~0)
>>>
>>> -extern int power_logtype;
>>> -#define RTE_LOGTYPE_POWER power_logtype
>>> +extern int rte_power_logtype;
>>> +#define RTE_LOGTYPE_POWER rte_power_logtype
>>>    #define POWER_LOG(level, ...) \
>>>        RTE_LOG_LINE(level, POWER, "" __VA_ARGS__)
>>>
>>> @@ -23,13 +24,24 @@ extern int power_logtype;
>>>    #endif
>>>
>>>    /* check if scaling driver matches one we want */
>>> +__rte_internal
>>>    int cpufreq_check_scaling_driver(const char *driver);
>>> +
>>> +__rte_internal
>>>    int power_set_governor(unsigned int lcore_id, const char *new_governor,
>>>                char *orig_governor, size_t orig_governor_len);
>> suggest that move cpufreq interfaces like this to the
>> rte_power_cpufreq_api.* I proposed above.
> This is an internal API and isn’t intended for direct use by applications.
> By moving it to rte_power_*, we risk exposing it inadvertently.
we don't expose these to applications. application do not include this 
header file.
power_set_governor() and cpufreq_check_scaling_driver() is just used by 
cpufreq driver. So they just can be seen by cpufreq lib or module, right?
But if these interface are in power_common.h, pmd_mgmt and uncore driver 
also include this header file and can see them. This is not good.
AFAIS, the power_common.h should just contain the kind of interfaces 
that are used by all power libs or sub-modules, like cpufreq, uncore, 
pmd_mgmt and so on.
>
>> The interfaces in power_comm.* can be used by all power modules, like
>> core/uncore/pmd mgmt.
>>> +
>>> +__rte_internal
>>>    int open_core_sysfs_file(FILE **f, const char *mode, const char *format, ...)
>>>                __rte_format_printf(3, 4);
>>> +
>>> +__rte_internal
>>>    int read_core_sysfs_u32(FILE *f, uint32_t *val);
>>> +
>>> +__rte_internal
>>>    int read_core_sysfs_s(FILE *f, char *buf, unsigned int len);
>>> +
>>> +__rte_internal
>>>    int write_core_sysfs_s(FILE *f, const char *str);
>>>
>>>    #endif /* _POWER_COMMON_H_ */
>>> diff --git a/lib/power/rte_power.c b/lib/power/rte_power.c
>> The name of the rte_power.c file is impropriate now. The context in this file is just for
>> cpufreq, right?
>> So I suggest that we need to rename this file as the rte_power_cpufreq_api.c
> Yes, renaming rte_power.c to rte_power_cpufreq.c is definitely a possible incremental change
> and will fix this as a separate patch.
> .
>
>>> index 36c3f3da98..2bf6d40517 100644
>>> --- a/lib/power/rte_power.c
>>> +++ b/lib/power/rte_power.c
>>> @@ -8,153 +8,86 @@
>>>    #include <rte_spinlock.h>
>>>
>>>    #include "rte_power.h"
>>> -#include "power_acpi_cpufreq.h"
>>> -#include "power_cppc_cpufreq.h"
>>>    #include "power_common.h"
>>> -#include "power_kvm_vm.h"
>>> -#include "power_pstate_cpufreq.h"
>>> -#include "power_amd_pstate_cpufreq.h"
>>>
>>> -enum power_management_env global_default_env = PM_ENV_NOT_SET;
>>> +static enum power_management_env global_default_env =
>> PM_ENV_NOT_SET;
>>> +static struct rte_power_core_ops *global_power_core_ops;
>>>
>>>    static rte_spinlock_t global_env_cfg_lock =
>>> RTE_SPINLOCK_INITIALIZER;
>>> +static RTE_TAILQ_HEAD(, rte_power_core_ops) core_ops_list =
>>> +                     TAILQ_HEAD_INITIALIZER(core_ops_list);
>>>
>>> -/* function pointers */
>>> -rte_power_freqs_t rte_power_freqs  = NULL; -rte_power_get_freq_t
>>> rte_power_get_freq = NULL; -rte_power_set_freq_t rte_power_set_freq =
>>> NULL; -rte_power_freq_change_t rte_power_freq_up = NULL;
>>> -rte_power_freq_change_t rte_power_freq_down = NULL;
>>> -rte_power_freq_change_t rte_power_freq_max = NULL;
>>> -rte_power_freq_change_t rte_power_freq_min = NULL;
>>> -rte_power_freq_change_t rte_power_turbo_status;
>>> -rte_power_freq_change_t rte_power_freq_enable_turbo;
>>> -rte_power_freq_change_t rte_power_freq_disable_turbo;
>>> -rte_power_get_capabilities_t rte_power_get_capabilities;
>>> -
>>> -static void
>>> -reset_power_function_ptrs(void)
>>> +
>>> +const char *power_env_str[] = {
>>> +     "not set",
>>> +     "acpi",
>>> +     "kvm-vm",
>>> +     "pstate",
>>> +     "cppc",
>>> +     "amd-pstate"
>>> +};
>>> +
>>> +/* register the ops struct in rte_power_core_ops, return 0 on
>>> +success. */ int rte_power_register_ops(struct rte_power_core_ops
>>> +*driver_ops)
>>>    {
>>> -     rte_power_freqs  = NULL;
>>> -     rte_power_get_freq = NULL;
>>> -     rte_power_set_freq = NULL;
>>> -     rte_power_freq_up = NULL;
>>> -     rte_power_freq_down = NULL;
>>> -     rte_power_freq_max = NULL;
>>> -     rte_power_freq_min = NULL;
>>> -     rte_power_turbo_status = NULL;
>>> -     rte_power_freq_enable_turbo = NULL;
>>> -     rte_power_freq_disable_turbo = NULL;
>>> -     rte_power_get_capabilities = NULL;
>>> +     if (!driver_ops->init || !driver_ops->exit ||
>>> +             !driver_ops->check_env_support || !driver_ops->get_avail_freqs ||
>>> +             !driver_ops->get_freq || !driver_ops->set_freq ||
>>> +             !driver_ops->freq_up || !driver_ops->freq_down ||
>>> +             !driver_ops->freq_max || !driver_ops->freq_min ||
>>> +             !driver_ops->turbo_status || !driver_ops->enable_turbo ||
>>> +             !driver_ops->disable_turbo || !driver_ops->get_caps) {
>>> +             POWER_LOG(ERR, "Missing callbacks while registering
>>> + power ops");
>> turbo_status(), enable_turbo() and disable turbo() are not necessary, right?
> Nope, this is required to get the current status unlike the capability API (get_caps()).
ok
>> These depand on the capabilities from get_caps().
>>> +             return -EINVAL;
>>> +     }
>>> +
>>> +     TAILQ_INSERT_TAIL(&core_ops_list, driver_ops, next);
>>> +
>>> +     return 0;
>>>    }
>>>
>>>    int
>>>    rte_power_check_env_supported(enum power_management_env env)
>>>    {
>>> -     switch (env) {
>>> -     case PM_ENV_ACPI_CPUFREQ:
>>> -             return power_acpi_cpufreq_check_supported();
>>> -     case PM_ENV_PSTATE_CPUFREQ:
>>> -             return power_pstate_cpufreq_check_supported();
>>> -     case PM_ENV_KVM_VM:
>>> -             return power_kvm_vm_check_supported();
>>> -     case PM_ENV_CPPC_CPUFREQ:
>>> -             return power_cppc_cpufreq_check_supported();
>>> -     case PM_ENV_AMD_PSTATE_CPUFREQ:
>>> -             return power_amd_pstate_cpufreq_check_supported();
>>> -     default:
>>> -             rte_errno = EINVAL;
>>> -             return -1;
>>> -     }
>>> +     struct rte_power_core_ops *ops;
>>> +
>>> +     if (env >= RTE_DIM(power_env_str))
>>> +             return 0;
>>> +
>>> +     RTE_TAILQ_FOREACH(ops, &core_ops_list, next)
>>> +             if (strncmp(ops->name, power_env_str[env],
>>> +                             RTE_POWER_DRIVER_NAMESZ) == 0)
>>> +                     return ops->check_env_support();
>>> +
>>> +     return 0;
>>>    }
>>>
>>>    int
>>>    rte_power_set_env(enum power_management_env env)
>>>    {
>>> +     struct rte_power_core_ops *ops;
>>> +     int ret = -1;
>>> +
>>>        rte_spinlock_lock(&global_env_cfg_lock);
>>>
>>>        if (global_default_env != PM_ENV_NOT_SET) {
>>>                POWER_LOG(ERR, "Power Management Environment already set.");
>>> -             rte_spinlock_unlock(&global_env_cfg_lock);
>>> -             return -1;
>>> -     }
>>> -
>> <...>
>>> -     if (ret == 0)
>>> -             global_default_env = env;
>>> -     else {
>>> -             global_default_env = PM_ENV_NOT_SET;
>>> -             reset_power_function_ptrs();
>>> -     }
>>> +     RTE_TAILQ_FOREACH(ops, &core_ops_list, next)
>>> +             if (strncmp(ops->name, power_env_str[env],
>>> +                             RTE_POWER_DRIVER_NAMESZ) == 0) {
>>> +                     global_power_core_ops = ops;
>>> +                     global_default_env = env;
>>> +                     ret = 0;
>>> +                     goto out;
>>> +             }
>>> +     POWER_LOG(ERR, "Invalid Power Management Environment(%d) set",
>>> +                     env);
>>>
>>> +out:
>>>        rte_spinlock_unlock(&global_env_cfg_lock);
>>>        return ret;
>>>    }
>>> @@ -164,94 +97,66 @@ rte_power_unset_env(void)
>>>    {
>>>        rte_spinlock_lock(&global_env_cfg_lock);
>>>        global_default_env = PM_ENV_NOT_SET;
>>> -     reset_power_function_ptrs();
>>> +     global_power_core_ops = NULL;
>>>        rte_spinlock_unlock(&global_env_cfg_lock);
>>>    }
>>>
>>>    enum power_management_env
>>> -rte_power_get_env(void) {
>>> +rte_power_get_env(void)
>>> +{
>>>        return global_default_env;
>>>    }
>>>
>>> -int
>>> -rte_power_init(unsigned int lcore_id)
>>> +struct rte_power_core_ops *
>>> +rte_power_get_core_ops(void)
>>>    {
>>> -     int ret = -1;
>>> +     RTE_ASSERT(global_power_core_ops != NULL);
>>>
>>> -     switch (global_default_env) {
>>> -     case PM_ENV_ACPI_CPUFREQ:
>>> -             return power_acpi_cpufreq_init(lcore_id);
>>> -     case PM_ENV_KVM_VM:
>>> -             return power_kvm_vm_init(lcore_id);
>>> -     case PM_ENV_PSTATE_CPUFREQ:
>>> -             return power_pstate_cpufreq_init(lcore_id);
>>> -     case PM_ENV_CPPC_CPUFREQ:
>>> -             return power_cppc_cpufreq_init(lcore_id);
>>> -     case PM_ENV_AMD_PSTATE_CPUFREQ:
>>> -             return power_amd_pstate_cpufreq_init(lcore_id);
>>> -     default:
>>> -             POWER_LOG(INFO, "Env isn't set yet!");
>>> -     }
>>> +     return global_power_core_ops;
>>> +}
>>>
>>> -     /* Auto detect Environment */
>>> -     POWER_LOG(INFO, "Attempting to initialise ACPI cpufreq power
>> management...");
>>> -     ret = power_acpi_cpufreq_init(lcore_id);
>>> -     if (ret == 0) {
>>> -             rte_power_set_env(PM_ENV_ACPI_CPUFREQ);
>>> -             goto out;
>>> -     }
>>> +int
>>> +rte_power_init(unsigned int lcore_id) {
>>> +     struct rte_power_core_ops *ops;
>>> +     uint8_t env;
>>>
>>> -     POWER_LOG(INFO, "Attempting to initialise PSTAT power management...");
>>> -     ret = power_pstate_cpufreq_init(lcore_id);
>>> -     if (ret == 0) {
>>> -             rte_power_set_env(PM_ENV_PSTATE_CPUFREQ);
>>> -             goto out;
>>> -     }
>>> +     if (global_default_env != PM_ENV_NOT_SET)
>>> +             return global_power_core_ops->init(lcore_id);
>>>
>>> -     POWER_LOG(INFO, "Attempting to initialise AMD PSTATE power
>> management...");
>>> -     ret = power_amd_pstate_cpufreq_init(lcore_id);
>>> -     if (ret == 0) {
>>> -             rte_power_set_env(PM_ENV_AMD_PSTATE_CPUFREQ);
>>> -             goto out;
>>> -     }
>>> +     POWER_LOG(INFO, "Env isn't set yet!");
>> remove this log?
>>> -     POWER_LOG(INFO, "Attempting to initialise CPPC power management...");
>>> -     ret = power_cppc_cpufreq_init(lcore_id);
>>> -     if (ret == 0) {
>>> -             rte_power_set_env(PM_ENV_CPPC_CPUFREQ);
>>> -             goto out;
>>> -     }
>>> +     /* Auto detect Environment */
>>> +     RTE_TAILQ_FOREACH(ops, &core_ops_list, next)
>>> +             if (ops) {
>>> +                     POWER_LOG(INFO,
>>> +                             "Attempting to initialise %s cpufreq power management...",
>>> +                             ops->name);
>>> +                     if (ops->init(lcore_id) == 0) {
>>> +                             for (env = 0; env < RTE_DIM(power_env_str); env++)
>>> +                                     if (strncmp(ops->name, power_env_str[env],
>>> +                                                     RTE_POWER_DRIVER_NAMESZ) == 0) {
>>> +                                             rte_power_set_env(env);
>>> +                                             return 0;
>>> +                                     }
>>> +                     }
>>> +             }
>> Can we change the logic of rte_power_set_env()? like:
>>       RTE_TAILQ_FOREACH(ops, &core_ops_list, next) {
>>           for (env = 0; env < RTE_DIM(power_env_str); env++) {
>>                   if (strncmp(ops->name, power_env_str[env],
>> RTE_POWER_DRIVER_NAMESZ) == 0 &&
>>                       ops->init(lcore_id) == 0) {
>>                       global_power_core_ops = ops;
>>                       global_default_env = env;
>>                   }
>>           }
>>       }
>> That is easier to follow code.
> Yes, will fix in next version.
>
>>> +
>>> +     POWER_LOG(ERR,
>>> +             "Unable to set Power Management Environment for lcore %u",
>>> +             lcore_id);
>>>
>>> -     POWER_LOG(INFO, "Attempting to initialise VM power management...");
>>> -     ret = power_kvm_vm_init(lcore_id);
>>> -     if (ret == 0) {
>>> -             rte_power_set_env(PM_ENV_KVM_VM);
>>> -             goto out;
>>> -     }
>>> -     POWER_LOG(ERR, "Unable to set Power Management Environment for lcore
>> "
>>> -                     "%u", lcore_id);
>>> -out:
>>> -     return ret;
>>> +     return -1;
>>>    }
>>>
>>>    int
>>>    rte_power_exit(unsigned int lcore_id)
>>>    {
>>> -     switch (global_default_env) {
>>> -     case PM_ENV_ACPI_CPUFREQ:
>>> -             return power_acpi_cpufreq_exit(lcore_id);
>>> -     case PM_ENV_KVM_VM:
>>> -             return power_kvm_vm_exit(lcore_id);
>>> -     case PM_ENV_PSTATE_CPUFREQ:
>>> -             return power_pstate_cpufreq_exit(lcore_id);
>>> -     case PM_ENV_CPPC_CPUFREQ:
>>> -             return power_cppc_cpufreq_exit(lcore_id);
>>> -     case PM_ENV_AMD_PSTATE_CPUFREQ:
>>> -             return power_amd_pstate_cpufreq_exit(lcore_id);
>>> -     default:
>>> -             POWER_LOG(ERR, "Environment has not been set, unable to exit
>> gracefully");
>>> +     if (global_default_env != PM_ENV_NOT_SET)
>>> +             return global_power_core_ops->exit(lcore_id);
>>>
>>> -     }
>>> -     return -1;
>>> +     POWER_LOG(ERR,
>>> +             "Environment has not been set, unable to exit
>>> + gracefully");
>>>
>>> +     return -1;
>>>    }
>>> diff --git a/lib/power/rte_power.h b/lib/power/rte_power.h index
>>> 4fa4afe399..5e4aacf08b 100644
>>> --- a/lib/power/rte_power.h
>>> +++ b/lib/power/rte_power.h
>>> @@ -1,5 +1,6 @@
>>>    /* SPDX-License-Identifier: BSD-3-Clause
>>>     * Copyright(c) 2010-2014 Intel Corporation
>>> + * Copyright(c) 2024 Advanced Micro Devices, Inc.
>>>     */
>>>
>>>    #ifndef _RTE_POWER_H
>>> @@ -14,14 +15,21 @@
>>>    #include <rte_log.h>
>>>    #include <rte_power_guest_channel.h>
>>>
>>> +#include "rte_power_core_ops.h"
>>> +
>>>    #ifdef __cplusplus
>>>    extern "C" {
>>>    #endif
>>>
>>>    /* Power Management Environment State */ -enum power_management_env
>>> {PM_ENV_NOT_SET, PM_ENV_ACPI_CPUFREQ, PM_ENV_KVM_VM,
>>> -             PM_ENV_PSTATE_CPUFREQ, PM_ENV_CPPC_CPUFREQ,
>>> -             PM_ENV_AMD_PSTATE_CPUFREQ};
>>> +enum power_management_env {
>>> +     PM_ENV_NOT_SET = 0,
>>> +     PM_ENV_ACPI_CPUFREQ,
>>> +     PM_ENV_KVM_VM,
>>> +     PM_ENV_PSTATE_CPUFREQ,
>>> +     PM_ENV_CPPC_CPUFREQ,
>>> +     PM_ENV_AMD_PSTATE_CPUFREQ
>>> +};
>>>
>>>    /**
>>>     * Check if a specific power management environment type is
>>> supported on a @@ -66,6 +74,15 @@ void rte_power_unset_env(void);
>>>     */
>>>    enum power_management_env rte_power_get_env(void);
>> I'd like to let user not know used which cpufreq driver, which is friendly to user.
>>
>> So we can rethink if this API is necessary.
> For any API changes, could we handle this as a separate RFC for discussion?
> It’s important that these changes are not included within the scope of this patch.
Agreed.
Can you post a separate RFC to disscuss this improvement later?
>>> +/**
>>> + * @internal Get the power ops struct from its index.
>>> + *
>>> + * @return
>>> + *   The pointer to the ops struct in the table if registered.
>>> + */
>>> +struct rte_power_core_ops *
>>> +rte_power_get_core_ops(void);
>>> +
>>>    /**
>>>     * Initialize power management for a specific lcore. If rte_power_set_env() has
>>>     * not been called then an auto-detect of the environment will start
>>> and @@ -108,10 +125,13 @@ int rte_power_exit(unsigned int lcore_id);
>>>     * @return
>>>     *  The number of available frequencies.
>>>     */
>>> -typedef uint32_t (*rte_power_freqs_t)(unsigned int lcore_id, uint32_t *freqs,
>>> -             uint32_t num);
>>> +static inline uint32_t
>>> +rte_power_freqs(unsigned int lcore_id, uint32_t *freqs, uint32_t n) {
>>> +     struct rte_power_core_ops *ops = rte_power_get_core_ops();
>>>
>>> -extern rte_power_freqs_t rte_power_freqs;
>>> +     return ops->get_avail_freqs(lcore_id, freqs, n); }
>>>
>>>    /**
>>>     * Return the current index of available frequencies of a specific lcore.
>>> @@ -124,9 +144,13 @@ extern rte_power_freqs_t rte_power_freqs;
>>>     * @return
>>>     *  The current index of available frequencies.
>>>     */
>>> -typedef uint32_t (*rte_power_get_freq_t)(unsigned int lcore_id);
>>> +static inline uint32_t
>>> +rte_power_get_freq(unsigned int lcore_id) {
>>> +     struct rte_power_core_ops *ops = rte_power_get_core_ops();
>>>
>>> -extern rte_power_get_freq_t rte_power_get_freq;
>>> +     return ops->get_freq(lcore_id);
>>> +}
>>>
>>>    /**
>>>     * Set the new frequency for a specific lcore by indicating the
>>> index of @@ -144,82 +168,101 @@ extern rte_power_get_freq_t
>> rte_power_get_freq;
>>>     *  - 0 on success without frequency changed.
>>>     *  - Negative on error.
>>>     */
>>> -typedef int (*rte_power_set_freq_t)(unsigned int lcore_id, uint32_t
>>> index);
>>> -
>>> -extern rte_power_set_freq_t rte_power_set_freq;
>>> +static inline uint32_t
>>> +rte_power_set_freq(unsigned int lcore_id, uint32_t index) {
>>> +     struct rte_power_core_ops *ops = rte_power_get_core_ops();
>>>
>>> -/**
>>> - * Function pointer definition for generic frequency change
>>> functions. Review
>>> - * each environments specific documentation for usage.
>>> - *
>>> - * @param lcore_id
>>> - *  lcore id.
>>> - *
>>> - * @return
>>> - *  - 1 on success with frequency changed.
>>> - *  - 0 on success without frequency changed.
>>> - *  - Negative on error.
>>> - */
>>> -typedef int (*rte_power_freq_change_t)(unsigned int lcore_id);
>>> +     return ops->set_freq(lcore_id, index); }
>>>
>>>    /**
>>>     * Scale up the frequency of a specific lcore according to the available
>>>     * frequencies.
>>>     * Review each environments specific documentation for usage.
>>>     */
>>> -extern rte_power_freq_change_t rte_power_freq_up;
>>> +static inline int
>>> +rte_power_freq_up(unsigned int lcore_id) {
>>> +     struct rte_power_core_ops *ops = rte_power_get_core_ops();
>>> +
>>> +     return ops->freq_up(lcore_id);
>>> +}
>>>
>>>    /**
>>>     * Scale down the frequency of a specific lcore according to the available
>>>     * frequencies.
>>>     * Review each environments specific documentation for usage.
>>>     */
>>> -extern rte_power_freq_change_t rte_power_freq_down;
>>> +static inline int
>>> +rte_power_freq_down(unsigned int lcore_id) {
>>> +     struct rte_power_core_ops *ops = rte_power_get_core_ops();
>>> +
>>> +     return ops->freq_down(lcore_id); }
>>>
>>>    /**
>>>     * Scale up the frequency of a specific lcore to the highest according to the
>>>     * available frequencies.
>>>     * Review each environments specific documentation for usage.
>>>     */
>>> -extern rte_power_freq_change_t rte_power_freq_max;
>>> +static inline int
>>> +rte_power_freq_max(unsigned int lcore_id) {
>>> +     struct rte_power_core_ops *ops = rte_power_get_core_ops();
>>> +
>>> +     return ops->freq_max(lcore_id);
>>> +}
>>>
>>>    /**
>>>     * Scale down the frequency of a specific lcore to the lowest according to the
>>>     * available frequencies.
>>>     * Review each environments specific documentation for usage..
>>>     */
>>> -extern rte_power_freq_change_t rte_power_freq_min;
>>> +static inline int
>>> +rte_power_freq_min(unsigned int lcore_id) {
>>> +     struct rte_power_core_ops *ops = rte_power_get_core_ops();
>>> +
>>> +     return ops->freq_min(lcore_id);
>>> +}
>>>
>>>    /**
>>>     * Query the Turbo Boost status of a specific lcore.
>>>     * Review each environments specific documentation for usage..
>>>     */
>>> -extern rte_power_freq_change_t rte_power_turbo_status;
>>> +static inline int
>>> +rte_power_turbo_status(unsigned int lcore_id) {
>>> +     struct rte_power_core_ops *ops = rte_power_get_core_ops();
>>> +
>>> +     return ops->turbo_status(lcore_id); }
>>>
>>>    /**
>>>     * Enable Turbo Boost for this lcore.
>>>     * Review each environments specific documentation for usage..
>>>     */
>>> -extern rte_power_freq_change_t rte_power_freq_enable_turbo;
>>> +static inline int
>>> +rte_power_freq_enable_turbo(unsigned int lcore_id) {
>>> +     struct rte_power_core_ops *ops = rte_power_get_core_ops();
>>> +
>>> +     return ops->enable_turbo(lcore_id); }
>>>
>>>    /**
>>>     * Disable Turbo Boost for this lcore.
>>>     * Review each environments specific documentation for usage..
>>>     */
>>> -extern rte_power_freq_change_t rte_power_freq_disable_turbo;
>>> +static inline int
>>> +rte_power_freq_disable_turbo(unsigned int lcore_id) {
>>> +     struct rte_power_core_ops *ops = rte_power_get_core_ops();
>>>
>>> -/**
>>> - * Power capabilities summary.
>>> - */
>>> -struct rte_power_core_capabilities {
>>> -     union {
>>> -             uint64_t capabilities;
>>> -             struct {
>>> -                     uint64_t turbo:1;       /**< Turbo can be enabled. */
>>> -                     uint64_t priority:1;    /**< SST-BF high freq core */
>>> -             };
>>> -     };
>>> -};
>>> +     return ops->disable_turbo(lcore_id); }
>>>
>>>    /**
>>>     * Returns power capabilities for a specific lcore.
>>> @@ -235,10 +278,14 @@ struct rte_power_core_capabilities {
>>>     *  - 0 on success.
>>>     *  - Negative on error.
>>>     */
>>> -typedef int (*rte_power_get_capabilities_t)(unsigned int lcore_id,
>>> -             struct rte_power_core_capabilities *caps);
>>> +static inline int
>>> +rte_power_get_capabilities(unsigned int lcore_id,
>>> +             struct rte_power_core_capabilities *caps) {
>>> +     struct rte_power_core_ops *ops = rte_power_get_core_ops();
>>>
>>> -extern rte_power_get_capabilities_t rte_power_get_capabilities;
>>> +     return ops->get_caps(lcore_id, caps); }
>>>
>>>    #ifdef __cplusplus
>>>    }
>>> diff --git a/lib/power/rte_power_core_ops.h
>>> b/lib/power/rte_power_core_ops.h new file mode 100644 index
>>> 0000000000..356a64df79
>>> --- /dev/null
>>> +++ b/lib/power/rte_power_core_ops.h
>>> @@ -0,0 +1,208 @@
>>> +/* SPDX-License-Identifier: BSD-3-Clause
>>> + * Copyright(c) 2010-2014 Intel Corporation
>>> + * Copyright(c) 2024 Advanced Micro Devices, Inc.
>>> + */
>>> +
>>> +#ifndef _RTE_POWER_CORE_OPS_H
>>> +#define _RTE_POWER_CORE_OPS_H
>>> +
>> suggest rename the file as rte_power_cpufreq_api.h.
>> If so, the role of this file is more clearly.
>>> +__rte_internal
>>> +int rte_power_register_ops(struct rte_power_core_ops *ops);
>>> +
>>> +/**
>>> + * Macro to statically register the ops of a cpufreq driver.
>>> + */
>>> +#define RTE_POWER_REGISTER_OPS(ops)          \
>>> +     RTE_INIT(power_hdlr_init_##ops)         \
>>> +     {                                       \
>>> +             rte_power_register_ops(&ops);   \
>>> +     }
>>> +
>>> +/**
>>> + * @internal Get the power ops struct from its index.
>>> + *
>>> + * @return
>>> + *   The pointer to the ops struct in the table if registered.
>>> + */
>>> +struct rte_power_core_ops *
>>> +rte_power_get_core_ops(void);
>>> +
>>> +#ifdef __cplusplus
>>> +}
>>> +#endif
>>> +
>>> +#endif
>>> diff --git a/lib/power/version.map b/lib/power/version.map index
>>> c9a226614e..bd64e0828f 100644
>>> --- a/lib/power/version.map
>>> +++ b/lib/power/version.map
>>> @@ -51,4 +51,18 @@ EXPERIMENTAL {
>>>        rte_power_set_uncore_env;
>>>        rte_power_uncore_freqs;
>>>        rte_power_unset_uncore_env;
>>> +     # added in 24.07
>> 24.07-->24.11?
>>> +     rte_power_logtype;
>>> +};
>>> +
>>> +INTERNAL {
>>> +     global:
>>> +
>>> +     rte_power_register_ops;
>>> +     cpufreq_check_scaling_driver;
>>> +     power_set_governor;
>>> +     open_core_sysfs_file;
>>> +     read_core_sysfs_u32;
>>> +     read_core_sysfs_s;
>>> +     write_core_sysfs_s;
>>>    };
  
Tummala, Sivaprasad Sept. 18, 2024, 8:37 a.m. UTC | #5
[AMD Official Use Only - AMD Internal Distribution Only]

> -----Original Message-----
> From: lihuisong (C) <lihuisong@huawei.com>
> Sent: Friday, September 13, 2024 1:05 PM
> To: Tummala, Sivaprasad <Sivaprasad.Tummala@amd.com>
> Cc: dev@dpdk.org; david.hunt@intel.com; anatoly.burakov@intel.com;
> radu.nicolau@intel.com; cristian.dumitrescu@intel.com; jerinj@marvell.com;
> konstantin.ananyev@huawei.com; Yigit, Ferruh <Ferruh.Yigit@amd.com>;
> gakhil@marvell.com
> Subject: Re: [PATCH v2 1/4] power: refactor core power management library
>
> Caution: This message originated from an External Source. Use proper caution
> when opening attachments, clicking links, or responding.
>
>
> 在 2024/9/12 19:17, Tummala, Sivaprasad 写道:
> > [AMD Official Use Only - AMD Internal Distribution Only]
> >
> > Hi Huisong,
> >
> > Please find my response inline.
> >
> >> -----Original Message-----
> >> From: lihuisong (C) <lihuisong@huawei.com>
> >> Sent: Tuesday, August 27, 2024 1:51 PM
> >> To: Tummala, Sivaprasad <Sivaprasad.Tummala@amd.com>
> >> Cc: dev@dpdk.org; david.hunt@intel.com; anatoly.burakov@intel.com;
> >> radu.nicolau@intel.com; cristian.dumitrescu@intel.com;
> >> jerinj@marvell.com; konstantin.ananyev@huawei.com; Yigit, Ferruh
> >> <Ferruh.Yigit@amd.com>; gakhil@marvell.com
> >> Subject: Re: [PATCH v2 1/4] power: refactor core power management
> >> library
> >>
> >> Caution: This message originated from an External Source. Use proper
> >> caution when opening attachments, clicking links, or responding.
> >>
> >>
> >> Hi Sivaprasa,
> >>
> >> Some comments inline.
> >>
> >> /Huisong
> >>
> >> 在 2024/8/26 21:06, Sivaprasad Tummala 写道:
> >>> This patch introduces a comprehensive refactor to the core power
> >>> management library. The primary focus is on improving modularity and
> >>> organization by relocating specific driver implementations from the
> >>> 'lib/power' directory to dedicated directories within
> >>> 'drivers/power/core/*'. The adjustment of meson.build files enables
> >>> the selective activation of individual drivers.
> >>> These changes contribute to a significant enhancement in code
> >>> organization, providing a clearer structure for driver implementations.
> >>> The refactor aims to improve overall code clarity and boost
> >>> maintainability. Additionally, it establishes a foundation for
> >>> future development, allowing for more focused work on individual
> >>> drivers and seamless integration of forthcoming enhancements.
> >>>
> >>> v2:
> >>>    - added NULL check for global_core_ops in rte_power_get_core_ops
> >>>
> >>> Signed-off-by: Sivaprasad Tummala <sivaprasad.tummala@amd.com>
> >>> ---
> >>>    drivers/meson.build                           |   1 +
> >>>    .../power/acpi/acpi_cpufreq.c                 |  22 +-
> >>>    .../power/acpi/acpi_cpufreq.h                 |   6 +-
> >>>    drivers/power/acpi/meson.build                |  10 +
> >>>    .../power/amd_pstate/amd_pstate_cpufreq.c     |  24 +-
> >>>    .../power/amd_pstate/amd_pstate_cpufreq.h     |   8 +-
> >>>    drivers/power/amd_pstate/meson.build          |  10 +
> >>>    .../power/cppc/cppc_cpufreq.c                 |  22 +-
> >>>    .../power/cppc/cppc_cpufreq.h                 |   8 +-
> >>>    drivers/power/cppc/meson.build                |  10 +
> >>>    .../power/kvm_vm}/guest_channel.c             |   0
> >>>    .../power/kvm_vm}/guest_channel.h             |   0
> >>>    .../power/kvm_vm/kvm_vm.c                     |  22 +-
> >>>    .../power/kvm_vm/kvm_vm.h                     |   6 +-
> >>>    drivers/power/kvm_vm/meson.build              |  16 +
> >>>    drivers/power/meson.build                     |  12 +
> >>>    drivers/power/pstate/meson.build              |  10 +
> >>>    .../power/pstate/pstate_cpufreq.c             |  22 +-
> >>>    .../power/pstate/pstate_cpufreq.h             |   6 +-
> >>>    lib/power/meson.build                         |   7 +-
> >>>    lib/power/power_common.c                      |   2 +-
> >>>    lib/power/power_common.h                      |  16 +-
> >>>    lib/power/rte_power.c                         | 291 ++++++------------
> >>>    lib/power/rte_power.h                         | 139 ++++++---
> >>>    lib/power/rte_power_core_ops.h                | 208 +++++++++++++
> >>>    lib/power/version.map                         |  14 +
> >>>    26 files changed, 621 insertions(+), 271 deletions(-)
> >>>    rename lib/power/power_acpi_cpufreq.c =>
> >>> drivers/power/acpi/acpi_cpufreq.c
> >> (95%)
> >>>    rename lib/power/power_acpi_cpufreq.h =>
> >>> drivers/power/acpi/acpi_cpufreq.h
> >> (98%)
> >>>    create mode 100644 drivers/power/acpi/meson.build
> >>>    rename lib/power/power_amd_pstate_cpufreq.c =>
> >> drivers/power/amd_pstate/amd_pstate_cpufreq.c (95%)
> >>>    rename lib/power/power_amd_pstate_cpufreq.h =>
> >> drivers/power/amd_pstate/amd_pstate_cpufreq.h (97%)
> >>>    create mode 100644 drivers/power/amd_pstate/meson.build
> >>>    rename lib/power/power_cppc_cpufreq.c =>
> >>> drivers/power/cppc/cppc_cpufreq.c
> >> (95%)
> >>>    rename lib/power/power_cppc_cpufreq.h =>
> >>> drivers/power/cppc/cppc_cpufreq.h
> >> (97%)
> >>>    create mode 100644 drivers/power/cppc/meson.build
> >>>    rename {lib/power => drivers/power/kvm_vm}/guest_channel.c (100%)
> >>>    rename {lib/power => drivers/power/kvm_vm}/guest_channel.h (100%)
> >>>    rename lib/power/power_kvm_vm.c => drivers/power/kvm_vm/kvm_vm.c
> (82%)
> >>>    rename lib/power/power_kvm_vm.h => drivers/power/kvm_vm/kvm_vm.h
> (98%)
> >>>    create mode 100644 drivers/power/kvm_vm/meson.build
> >>>    create mode 100644 drivers/power/meson.build
> >>>    create mode 100644 drivers/power/pstate/meson.build
> >>>    rename lib/power/power_pstate_cpufreq.c =>
> >> drivers/power/pstate/pstate_cpufreq.c (96%)
> >>>    rename lib/power/power_pstate_cpufreq.h =>
> >> drivers/power/pstate/pstate_cpufreq.h (98%)
> >>>    create mode 100644 lib/power/rte_power_core_ops.h
> >> How about use the following directory structure?
> >> *For power libs*
> >> lib/power/power_common.*
> >> lib/power/rte_power_pmd_mgmt.*
> >> lib/power/rte_power_cpufreq_api.* (replacing rte_power.c file maybe simple for
> us.
> >> but I'm not sure if we can put the init of core, uncore and pmd mgmt
> >> to rte_power_init.c in rte_power.c.)
> >> lib/power/rte_power_uncore_freq_api.*
> > Yes, renaming rte_power.c is definitely a possible incremental change that could
> be considered later.
> > However, for the time being, our focus will be on refactoring the cpufreq drivers
> only.
> The rte_power.c just works for the initialization of cpufreq driver. Now that you are
> reworking core and uncore power library and rearrange the directory under power.
> I think renaming this file name should be more appropriate in this series.
> >> *And has directories under drivers/power:*
> >> 1> For core dvfs driver:
> >> drivers/power/cpufreq/acpi_cpufreq.c
> >> drivers/power/cpufreq/cppc_cpufreq.c
> >> drivers/power/cpufreq/amd_pstate_cpufreq.c
> >> drivers/power/cpufreq/intel_pstate_cpufreq.c
> >> drivers/power/cpufreq/kvm_cpufreq.c
> >> The code of each cpufreq driver is not too much and doesn't probably
> >> increase. So don't need to use a directory for it.
> >>
> >> 2> For uncore dvfs driver:
> >> drivers/power/uncorefreq/intel_uncore.*
> >>> diff --git a/drivers/meson.build b/drivers/meson.build index
> >>> 66931d4241..9d77e0deab 100644
> >>> --- a/drivers/meson.build
> >>> +++ b/drivers/meson.build
> >>> @@ -29,6 +29,7 @@ subdirs = [
> >>>            'event',          # depends on common, bus, mempool and net.
> >>>            'baseband',       # depends on common and bus.
> >>>            'gpu',            # depends on common and bus.
> >>> +        'power',          # depends on common (in future).
> >>>    ]
> >>>
> >>>    if meson.is_cross_build()
> >>> diff --git a/lib/power/power_acpi_cpufreq.c
> >>> b/drivers/power/acpi/acpi_cpufreq.c
> >>> similarity index 95%
> >>> rename from lib/power/power_acpi_cpufreq.c rename to
> >>> drivers/power/acpi/acpi_cpufreq.c
> >> do not suggest to create one directory for each cpufreq driver.
> >> Because pstate drivers also comply with ACPI spec, right?
> >> In addition, the code of each cpufreq drivers are not too much.
> >> There is just one file under one directory which is not good.
> > One of our objectives for the refactoring is to selectively disable non-essential
> drivers using Meson build options.
> > However, by rearranging the driver structure, we risk disrupting this capability.
> I get your purpose.
> The cpufreq library has the feature and interface to detect which driver to use, right?
> So it is not necessary for cpufreq library to introduce the Meson build options, which
> probably makes it complicate.
In Meson, you can reduce code size by disabling specific drivers or components through build options,
allowing you to exclude unnecessary features. At runtime, the library will automatically detect the available driver,
and if it's not present in the build, initialization will fail.
We're not introducing any new complexities; rather, we aim to ensure that the drivers in drivers/power/*
are consistent with the other drivers.
> >>> index 81996e1c13..8637c69703 100644
> >>> --- a/lib/power/power_acpi_cpufreq.c
> >>> +++ b/drivers/power/acpi/acpi_cpufreq.c
> >>> @@ -10,7 +10,7 @@
> >>>    #include <rte_stdatomic.h>
> >>>    #include <rte_string_fns.h>
> >>>
> >>> -#include "power_acpi_cpufreq.h"
> >>> +#include "acpi_cpufreq.h"
> >>>    #include "power_common.h"
> >>>
> >> <...>
> >>> +if not is_linux
> >>> +    build = false
> >>> +    reason = 'only supported on Linux'
> >>> +endif
> >>> +sources = files('pstate_cpufreq.c')
> >>> +
> >>> +deps += ['power']
> >>> diff --git a/lib/power/power_pstate_cpufreq.c
> >>> b/drivers/power/pstate/pstate_cpufreq.c
> >>> similarity index 96%
> >>> rename from lib/power/power_pstate_cpufreq.c rename to
> >>> drivers/power/pstate/pstate_cpufreq.c
> >> pstate_cpufreq.c is actually intel_pstate cpufreq driver, right?
> >> So how about modify this file name to intel_pstate_cpufreq.c?
> > Yes, will fix this in next version.
> >>> index 2343121621..c32b1adabc 100644
> >>> --- a/lib/power/power_pstate_cpufreq.c
> >>> +++ b/drivers/power/pstate/pstate_cpufreq.c
> >>> @@ -15,7 +15,7 @@
> >>>    #include <rte_stdatomic.h>
> >>>
> >>>    #include "rte_power_pmd_mgmt.h"
> >>> -#include "power_pstate_cpufreq.h"
> >>> +#include "pstate_cpufreq.h"
> >>>    #include "power_common.h"
> >>>
> >>>    /* macros used for rounding frequency to nearest 100000 */ @@
> >>> -888,3
> >>> +888,23 @@ int power_pstate_get_capabilities(unsigned int lcore_id,
> >>>
> >>>        return 0;
> >>>    }
> >>> +
> >> <...>
> >>> diff --git a/lib/power/power_common.c b/lib/power/power_common.c
> >>> index 590986d5ef..6c06411e8b 100644
> >>> --- a/lib/power/power_common.c
> >>> +++ b/lib/power/power_common.c
> >>> @@ -12,7 +12,7 @@
> >>>
> >>>    #include "power_common.h"
> >>>
> >>> -RTE_LOG_REGISTER_DEFAULT(power_logtype, INFO);
> >>> +RTE_LOG_REGISTER_DEFAULT(rte_power_logtype, INFO);
> >>>
> >>>    #define POWER_SYSFILE_SCALING_DRIVER   \
> >>>                "/sys/devices/system/cpu/cpu%u/cpufreq/scaling_driver"
> >>> diff --git a/lib/power/power_common.h b/lib/power/power_common.h
> >>> index
> >>> 83f742f42a..767686ee12 100644
> >>> --- a/lib/power/power_common.h
> >>> +++ b/lib/power/power_common.h
> >>> @@ -6,12 +6,13 @@
> >>>    #define _POWER_COMMON_H_
> >>>
> >>>    #include <rte_common.h>
> >>> +#include <rte_compat.h>
> >>>    #include <rte_log.h>
> >>>
> >>>    #define RTE_POWER_INVALID_FREQ_INDEX (~0)
> >>>
> >>> -extern int power_logtype;
> >>> -#define RTE_LOGTYPE_POWER power_logtype
> >>> +extern int rte_power_logtype;
> >>> +#define RTE_LOGTYPE_POWER rte_power_logtype
> >>>    #define POWER_LOG(level, ...) \
> >>>        RTE_LOG_LINE(level, POWER, "" __VA_ARGS__)
> >>>
> >>> @@ -23,13 +24,24 @@ extern int power_logtype;
> >>>    #endif
> >>>
> >>>    /* check if scaling driver matches one we want */
> >>> +__rte_internal
> >>>    int cpufreq_check_scaling_driver(const char *driver);
> >>> +
> >>> +__rte_internal
> >>>    int power_set_governor(unsigned int lcore_id, const char *new_governor,
> >>>                char *orig_governor, size_t orig_governor_len);
> >> suggest that move cpufreq interfaces like this to the
> >> rte_power_cpufreq_api.* I proposed above.
> > This is an internal API and isn’t intended for direct use by applications.
> > By moving it to rte_power_*, we risk exposing it inadvertently.
> we don't expose these to applications. application do not include this header file.
> power_set_governor() and cpufreq_check_scaling_driver() is just used by cpufreq
> driver. So they just can be seen by cpufreq lib or module, right?
> But if these interface are in power_common.h, pmd_mgmt and uncore driver also
> include this header file and can see them. This is not good.
> AFAIS, the power_common.h should just contain the kind of interfaces that are used
> by all power libs or sub-modules, like cpufreq, uncore, pmd_mgmt and so on.
OK., Will move this internal APIs from power_common.h to a separate header file.
> >
> >> The interfaces in power_comm.* can be used by all power modules, like
> >> core/uncore/pmd mgmt.
> >>> +
> >>> +__rte_internal
> >>>    int open_core_sysfs_file(FILE **f, const char *mode, const char *format, ...)
> >>>                __rte_format_printf(3, 4);
> >>> +
> >>> +__rte_internal
> >>>    int read_core_sysfs_u32(FILE *f, uint32_t *val);
> >>> +
> >>> +__rte_internal
> >>>    int read_core_sysfs_s(FILE *f, char *buf, unsigned int len);
> >>> +
> >>> +__rte_internal
> >>>    int write_core_sysfs_s(FILE *f, const char *str);
> >>>
> >>>    #endif /* _POWER_COMMON_H_ */
> >>> diff --git a/lib/power/rte_power.c b/lib/power/rte_power.c
> >> The name of the rte_power.c file is impropriate now. The context in
> >> this file is just for cpufreq, right?
> >> So I suggest that we need to rename this file as the
> >> rte_power_cpufreq_api.c
> > Yes, renaming rte_power.c to rte_power_cpufreq.c is definitely a
> > possible incremental change and will fix this as a separate patch.
> > .
> >
> >>> index 36c3f3da98..2bf6d40517 100644
> >>> --- a/lib/power/rte_power.c
> >>> +++ b/lib/power/rte_power.c
> >>> @@ -8,153 +8,86 @@
> >>>    #include <rte_spinlock.h>
> >>>
> >>>    #include "rte_power.h"
> >>> -#include "power_acpi_cpufreq.h"
> >>> -#include "power_cppc_cpufreq.h"
> >>>    #include "power_common.h"
> >>> -#include "power_kvm_vm.h"
> >>> -#include "power_pstate_cpufreq.h"
> >>> -#include "power_amd_pstate_cpufreq.h"
> >>>
> >>> -enum power_management_env global_default_env = PM_ENV_NOT_SET;
> >>> +static enum power_management_env global_default_env =
> >> PM_ENV_NOT_SET;
> >>> +static struct rte_power_core_ops *global_power_core_ops;
> >>>
> >>>    static rte_spinlock_t global_env_cfg_lock =
> >>> RTE_SPINLOCK_INITIALIZER;
> >>> +static RTE_TAILQ_HEAD(, rte_power_core_ops) core_ops_list =
> >>> +                     TAILQ_HEAD_INITIALIZER(core_ops_list);
> >>>
> >>> -/* function pointers */
> >>> -rte_power_freqs_t rte_power_freqs  = NULL; -rte_power_get_freq_t
> >>> rte_power_get_freq = NULL; -rte_power_set_freq_t rte_power_set_freq
> >>> = NULL; -rte_power_freq_change_t rte_power_freq_up = NULL;
> >>> -rte_power_freq_change_t rte_power_freq_down = NULL;
> >>> -rte_power_freq_change_t rte_power_freq_max = NULL;
> >>> -rte_power_freq_change_t rte_power_freq_min = NULL;
> >>> -rte_power_freq_change_t rte_power_turbo_status;
> >>> -rte_power_freq_change_t rte_power_freq_enable_turbo;
> >>> -rte_power_freq_change_t rte_power_freq_disable_turbo;
> >>> -rte_power_get_capabilities_t rte_power_get_capabilities;
> >>> -
> >>> -static void
> >>> -reset_power_function_ptrs(void)
> >>> +
> >>> +const char *power_env_str[] = {
> >>> +     "not set",
> >>> +     "acpi",
> >>> +     "kvm-vm",
> >>> +     "pstate",
> >>> +     "cppc",
> >>> +     "amd-pstate"
> >>> +};
> >>> +
> >>> +/* register the ops struct in rte_power_core_ops, return 0 on
> >>> +success. */ int rte_power_register_ops(struct rte_power_core_ops
> >>> +*driver_ops)
> >>>    {
> >>> -     rte_power_freqs  = NULL;
> >>> -     rte_power_get_freq = NULL;
> >>> -     rte_power_set_freq = NULL;
> >>> -     rte_power_freq_up = NULL;
> >>> -     rte_power_freq_down = NULL;
> >>> -     rte_power_freq_max = NULL;
> >>> -     rte_power_freq_min = NULL;
> >>> -     rte_power_turbo_status = NULL;
> >>> -     rte_power_freq_enable_turbo = NULL;
> >>> -     rte_power_freq_disable_turbo = NULL;
> >>> -     rte_power_get_capabilities = NULL;
> >>> +     if (!driver_ops->init || !driver_ops->exit ||
> >>> +             !driver_ops->check_env_support || !driver_ops->get_avail_freqs ||
> >>> +             !driver_ops->get_freq || !driver_ops->set_freq ||
> >>> +             !driver_ops->freq_up || !driver_ops->freq_down ||
> >>> +             !driver_ops->freq_max || !driver_ops->freq_min ||
> >>> +             !driver_ops->turbo_status || !driver_ops->enable_turbo ||
> >>> +             !driver_ops->disable_turbo || !driver_ops->get_caps) {
> >>> +             POWER_LOG(ERR, "Missing callbacks while registering
> >>> + power ops");
> >> turbo_status(), enable_turbo() and disable turbo() are not necessary, right?
> > Nope, this is required to get the current status unlike the capability API
> (get_caps()).
> ok
> >> These depand on the capabilities from get_caps().
> >>> +             return -EINVAL;
> >>> +     }
> >>> +
> >>> +     TAILQ_INSERT_TAIL(&core_ops_list, driver_ops, next);
> >>> +
> >>> +     return 0;
> >>>    }
> >>>
> >>>    int
> >>>    rte_power_check_env_supported(enum power_management_env env)
> >>>    {
> >>> -     switch (env) {
> >>> -     case PM_ENV_ACPI_CPUFREQ:
> >>> -             return power_acpi_cpufreq_check_supported();
> >>> -     case PM_ENV_PSTATE_CPUFREQ:
> >>> -             return power_pstate_cpufreq_check_supported();
> >>> -     case PM_ENV_KVM_VM:
> >>> -             return power_kvm_vm_check_supported();
> >>> -     case PM_ENV_CPPC_CPUFREQ:
> >>> -             return power_cppc_cpufreq_check_supported();
> >>> -     case PM_ENV_AMD_PSTATE_CPUFREQ:
> >>> -             return power_amd_pstate_cpufreq_check_supported();
> >>> -     default:
> >>> -             rte_errno = EINVAL;
> >>> -             return -1;
> >>> -     }
> >>> +     struct rte_power_core_ops *ops;
> >>> +
> >>> +     if (env >= RTE_DIM(power_env_str))
> >>> +             return 0;
> >>> +
> >>> +     RTE_TAILQ_FOREACH(ops, &core_ops_list, next)
> >>> +             if (strncmp(ops->name, power_env_str[env],
> >>> +                             RTE_POWER_DRIVER_NAMESZ) == 0)
> >>> +                     return ops->check_env_support();
> >>> +
> >>> +     return 0;
> >>>    }
> >>>
> >>>    int
> >>>    rte_power_set_env(enum power_management_env env)
> >>>    {
> >>> +     struct rte_power_core_ops *ops;
> >>> +     int ret = -1;
> >>> +
> >>>        rte_spinlock_lock(&global_env_cfg_lock);
> >>>
> >>>        if (global_default_env != PM_ENV_NOT_SET) {
> >>>                POWER_LOG(ERR, "Power Management Environment already
> set.");
> >>> -             rte_spinlock_unlock(&global_env_cfg_lock);
> >>> -             return -1;
> >>> -     }
> >>> -
> >> <...>
> >>> -     if (ret == 0)
> >>> -             global_default_env = env;
> >>> -     else {
> >>> -             global_default_env = PM_ENV_NOT_SET;
> >>> -             reset_power_function_ptrs();
> >>> -     }
> >>> +     RTE_TAILQ_FOREACH(ops, &core_ops_list, next)
> >>> +             if (strncmp(ops->name, power_env_str[env],
> >>> +                             RTE_POWER_DRIVER_NAMESZ) == 0) {
> >>> +                     global_power_core_ops = ops;
> >>> +                     global_default_env = env;
> >>> +                     ret = 0;
> >>> +                     goto out;
> >>> +             }
> >>> +     POWER_LOG(ERR, "Invalid Power Management Environment(%d) set",
> >>> +                     env);
> >>>
> >>> +out:
> >>>        rte_spinlock_unlock(&global_env_cfg_lock);
> >>>        return ret;
> >>>    }
> >>> @@ -164,94 +97,66 @@ rte_power_unset_env(void)
> >>>    {
> >>>        rte_spinlock_lock(&global_env_cfg_lock);
> >>>        global_default_env = PM_ENV_NOT_SET;
> >>> -     reset_power_function_ptrs();
> >>> +     global_power_core_ops = NULL;
> >>>        rte_spinlock_unlock(&global_env_cfg_lock);
> >>>    }
> >>>
> >>>    enum power_management_env
> >>> -rte_power_get_env(void) {
> >>> +rte_power_get_env(void)
> >>> +{
> >>>        return global_default_env;
> >>>    }
> >>>
> >>> -int
> >>> -rte_power_init(unsigned int lcore_id)
> >>> +struct rte_power_core_ops *
> >>> +rte_power_get_core_ops(void)
> >>>    {
> >>> -     int ret = -1;
> >>> +     RTE_ASSERT(global_power_core_ops != NULL);
> >>>
> >>> -     switch (global_default_env) {
> >>> -     case PM_ENV_ACPI_CPUFREQ:
> >>> -             return power_acpi_cpufreq_init(lcore_id);
> >>> -     case PM_ENV_KVM_VM:
> >>> -             return power_kvm_vm_init(lcore_id);
> >>> -     case PM_ENV_PSTATE_CPUFREQ:
> >>> -             return power_pstate_cpufreq_init(lcore_id);
> >>> -     case PM_ENV_CPPC_CPUFREQ:
> >>> -             return power_cppc_cpufreq_init(lcore_id);
> >>> -     case PM_ENV_AMD_PSTATE_CPUFREQ:
> >>> -             return power_amd_pstate_cpufreq_init(lcore_id);
> >>> -     default:
> >>> -             POWER_LOG(INFO, "Env isn't set yet!");
> >>> -     }
> >>> +     return global_power_core_ops;
> >>> +}
> >>>
> >>> -     /* Auto detect Environment */
> >>> -     POWER_LOG(INFO, "Attempting to initialise ACPI cpufreq power
> >> management...");
> >>> -     ret = power_acpi_cpufreq_init(lcore_id);
> >>> -     if (ret == 0) {
> >>> -             rte_power_set_env(PM_ENV_ACPI_CPUFREQ);
> >>> -             goto out;
> >>> -     }
> >>> +int
> >>> +rte_power_init(unsigned int lcore_id) {
> >>> +     struct rte_power_core_ops *ops;
> >>> +     uint8_t env;
> >>>
> >>> -     POWER_LOG(INFO, "Attempting to initialise PSTAT power
> management...");
> >>> -     ret = power_pstate_cpufreq_init(lcore_id);
> >>> -     if (ret == 0) {
> >>> -             rte_power_set_env(PM_ENV_PSTATE_CPUFREQ);
> >>> -             goto out;
> >>> -     }
> >>> +     if (global_default_env != PM_ENV_NOT_SET)
> >>> +             return global_power_core_ops->init(lcore_id);
> >>>
> >>> -     POWER_LOG(INFO, "Attempting to initialise AMD PSTATE power
> >> management...");
> >>> -     ret = power_amd_pstate_cpufreq_init(lcore_id);
> >>> -     if (ret == 0) {
> >>> -             rte_power_set_env(PM_ENV_AMD_PSTATE_CPUFREQ);
> >>> -             goto out;
> >>> -     }
> >>> +     POWER_LOG(INFO, "Env isn't set yet!");
> >> remove this log?
> >>> -     POWER_LOG(INFO, "Attempting to initialise CPPC power
> management...");
> >>> -     ret = power_cppc_cpufreq_init(lcore_id);
> >>> -     if (ret == 0) {
> >>> -             rte_power_set_env(PM_ENV_CPPC_CPUFREQ);
> >>> -             goto out;
> >>> -     }
> >>> +     /* Auto detect Environment */
> >>> +     RTE_TAILQ_FOREACH(ops, &core_ops_list, next)
> >>> +             if (ops) {
> >>> +                     POWER_LOG(INFO,
> >>> +                             "Attempting to initialise %s cpufreq power management...",
> >>> +                             ops->name);
> >>> +                     if (ops->init(lcore_id) == 0) {
> >>> +                             for (env = 0; env < RTE_DIM(power_env_str); env++)
> >>> +                                     if (strncmp(ops->name, power_env_str[env],
> >>> +                                                     RTE_POWER_DRIVER_NAMESZ) == 0) {
> >>> +                                             rte_power_set_env(env);
> >>> +                                             return 0;
> >>> +                                     }
> >>> +                     }
> >>> +             }
> >> Can we change the logic of rte_power_set_env()? like:
> >>       RTE_TAILQ_FOREACH(ops, &core_ops_list, next) {
> >>           for (env = 0; env < RTE_DIM(power_env_str); env++) {
> >>                   if (strncmp(ops->name, power_env_str[env],
> >> RTE_POWER_DRIVER_NAMESZ) == 0 &&
> >>                       ops->init(lcore_id) == 0) {
> >>                       global_power_core_ops = ops;
> >>                       global_default_env = env;
> >>                   }
> >>           }
> >>       }
> >> That is easier to follow code.
> > Yes, will fix in next version.
> >
> >>> +
> >>> +     POWER_LOG(ERR,
> >>> +             "Unable to set Power Management Environment for lcore %u",
> >>> +             lcore_id);
> >>>
> >>> -     POWER_LOG(INFO, "Attempting to initialise VM power management...");
> >>> -     ret = power_kvm_vm_init(lcore_id);
> >>> -     if (ret == 0) {
> >>> -             rte_power_set_env(PM_ENV_KVM_VM);
> >>> -             goto out;
> >>> -     }
> >>> -     POWER_LOG(ERR, "Unable to set Power Management Environment for
> lcore
> >> "
> >>> -                     "%u", lcore_id);
> >>> -out:
> >>> -     return ret;
> >>> +     return -1;
> >>>    }
> >>>
> >>>    int
> >>>    rte_power_exit(unsigned int lcore_id)
> >>>    {
> >>> -     switch (global_default_env) {
> >>> -     case PM_ENV_ACPI_CPUFREQ:
> >>> -             return power_acpi_cpufreq_exit(lcore_id);
> >>> -     case PM_ENV_KVM_VM:
> >>> -             return power_kvm_vm_exit(lcore_id);
> >>> -     case PM_ENV_PSTATE_CPUFREQ:
> >>> -             return power_pstate_cpufreq_exit(lcore_id);
> >>> -     case PM_ENV_CPPC_CPUFREQ:
> >>> -             return power_cppc_cpufreq_exit(lcore_id);
> >>> -     case PM_ENV_AMD_PSTATE_CPUFREQ:
> >>> -             return power_amd_pstate_cpufreq_exit(lcore_id);
> >>> -     default:
> >>> -             POWER_LOG(ERR, "Environment has not been set, unable to exit
> >> gracefully");
> >>> +     if (global_default_env != PM_ENV_NOT_SET)
> >>> +             return global_power_core_ops->exit(lcore_id);
> >>>
> >>> -     }
> >>> -     return -1;
> >>> +     POWER_LOG(ERR,
> >>> +             "Environment has not been set, unable to exit
> >>> + gracefully");
> >>>
> >>> +     return -1;
> >>>    }
> >>> diff --git a/lib/power/rte_power.h b/lib/power/rte_power.h index
> >>> 4fa4afe399..5e4aacf08b 100644
> >>> --- a/lib/power/rte_power.h
> >>> +++ b/lib/power/rte_power.h
> >>> @@ -1,5 +1,6 @@
> >>>    /* SPDX-License-Identifier: BSD-3-Clause
> >>>     * Copyright(c) 2010-2014 Intel Corporation
> >>> + * Copyright(c) 2024 Advanced Micro Devices, Inc.
> >>>     */
> >>>
> >>>    #ifndef _RTE_POWER_H
> >>> @@ -14,14 +15,21 @@
> >>>    #include <rte_log.h>
> >>>    #include <rte_power_guest_channel.h>
> >>>
> >>> +#include "rte_power_core_ops.h"
> >>> +
> >>>    #ifdef __cplusplus
> >>>    extern "C" {
> >>>    #endif
> >>>
> >>>    /* Power Management Environment State */ -enum
> >>> power_management_env {PM_ENV_NOT_SET, PM_ENV_ACPI_CPUFREQ,
> PM_ENV_KVM_VM,
> >>> -             PM_ENV_PSTATE_CPUFREQ, PM_ENV_CPPC_CPUFREQ,
> >>> -             PM_ENV_AMD_PSTATE_CPUFREQ};
> >>> +enum power_management_env {
> >>> +     PM_ENV_NOT_SET = 0,
> >>> +     PM_ENV_ACPI_CPUFREQ,
> >>> +     PM_ENV_KVM_VM,
> >>> +     PM_ENV_PSTATE_CPUFREQ,
> >>> +     PM_ENV_CPPC_CPUFREQ,
> >>> +     PM_ENV_AMD_PSTATE_CPUFREQ
> >>> +};
> >>>
> >>>    /**
> >>>     * Check if a specific power management environment type is
> >>> supported on a @@ -66,6 +74,15 @@ void rte_power_unset_env(void);
> >>>     */
> >>>    enum power_management_env rte_power_get_env(void);
> >> I'd like to let user not know used which cpufreq driver, which is friendly to user.
> >>
> >> So we can rethink if this API is necessary.
> > For any API changes, could we handle this as a separate RFC for discussion?
> > It’s important that these changes are not included within the scope of this patch.
> Agreed.
> Can you post a separate RFC to disscuss this improvement later?
> >>> +/**
> >>> + * @internal Get the power ops struct from its index.
> >>> + *
> >>> + * @return
> >>> + *   The pointer to the ops struct in the table if registered.
> >>> + */
> >>> +struct rte_power_core_ops *
> >>> +rte_power_get_core_ops(void);
> >>> +
> >>>    /**
> >>>     * Initialize power management for a specific lcore. If rte_power_set_env()
> has
> >>>     * not been called then an auto-detect of the environment will
> >>> start and @@ -108,10 +125,13 @@ int rte_power_exit(unsigned int lcore_id);
> >>>     * @return
> >>>     *  The number of available frequencies.
> >>>     */
> >>> -typedef uint32_t (*rte_power_freqs_t)(unsigned int lcore_id, uint32_t *freqs,
> >>> -             uint32_t num);
> >>> +static inline uint32_t
> >>> +rte_power_freqs(unsigned int lcore_id, uint32_t *freqs, uint32_t n) {
> >>> +     struct rte_power_core_ops *ops = rte_power_get_core_ops();
> >>>
> >>> -extern rte_power_freqs_t rte_power_freqs;
> >>> +     return ops->get_avail_freqs(lcore_id, freqs, n); }
> >>>
> >>>    /**
> >>>     * Return the current index of available frequencies of a specific lcore.
> >>> @@ -124,9 +144,13 @@ extern rte_power_freqs_t rte_power_freqs;
> >>>     * @return
> >>>     *  The current index of available frequencies.
> >>>     */
> >>> -typedef uint32_t (*rte_power_get_freq_t)(unsigned int lcore_id);
> >>> +static inline uint32_t
> >>> +rte_power_get_freq(unsigned int lcore_id) {
> >>> +     struct rte_power_core_ops *ops = rte_power_get_core_ops();
> >>>
> >>> -extern rte_power_get_freq_t rte_power_get_freq;
> >>> +     return ops->get_freq(lcore_id); }
> >>>
> >>>    /**
> >>>     * Set the new frequency for a specific lcore by indicating the
> >>> index of @@ -144,82 +168,101 @@ extern rte_power_get_freq_t
> >> rte_power_get_freq;
> >>>     *  - 0 on success without frequency changed.
> >>>     *  - Negative on error.
> >>>     */
> >>> -typedef int (*rte_power_set_freq_t)(unsigned int lcore_id, uint32_t
> >>> index);
> >>> -
> >>> -extern rte_power_set_freq_t rte_power_set_freq;
> >>> +static inline uint32_t
> >>> +rte_power_set_freq(unsigned int lcore_id, uint32_t index) {
> >>> +     struct rte_power_core_ops *ops = rte_power_get_core_ops();
> >>>
> >>> -/**
> >>> - * Function pointer definition for generic frequency change
> >>> functions. Review
> >>> - * each environments specific documentation for usage.
> >>> - *
> >>> - * @param lcore_id
> >>> - *  lcore id.
> >>> - *
> >>> - * @return
> >>> - *  - 1 on success with frequency changed.
> >>> - *  - 0 on success without frequency changed.
> >>> - *  - Negative on error.
> >>> - */
> >>> -typedef int (*rte_power_freq_change_t)(unsigned int lcore_id);
> >>> +     return ops->set_freq(lcore_id, index); }
> >>>
> >>>    /**
> >>>     * Scale up the frequency of a specific lcore according to the available
> >>>     * frequencies.
> >>>     * Review each environments specific documentation for usage.
> >>>     */
> >>> -extern rte_power_freq_change_t rte_power_freq_up;
> >>> +static inline int
> >>> +rte_power_freq_up(unsigned int lcore_id) {
> >>> +     struct rte_power_core_ops *ops = rte_power_get_core_ops();
> >>> +
> >>> +     return ops->freq_up(lcore_id); }
> >>>
> >>>    /**
> >>>     * Scale down the frequency of a specific lcore according to the available
> >>>     * frequencies.
> >>>     * Review each environments specific documentation for usage.
> >>>     */
> >>> -extern rte_power_freq_change_t rte_power_freq_down;
> >>> +static inline int
> >>> +rte_power_freq_down(unsigned int lcore_id) {
> >>> +     struct rte_power_core_ops *ops = rte_power_get_core_ops();
> >>> +
> >>> +     return ops->freq_down(lcore_id); }
> >>>
> >>>    /**
> >>>     * Scale up the frequency of a specific lcore to the highest according to the
> >>>     * available frequencies.
> >>>     * Review each environments specific documentation for usage.
> >>>     */
> >>> -extern rte_power_freq_change_t rte_power_freq_max;
> >>> +static inline int
> >>> +rte_power_freq_max(unsigned int lcore_id) {
> >>> +     struct rte_power_core_ops *ops = rte_power_get_core_ops();
> >>> +
> >>> +     return ops->freq_max(lcore_id); }
> >>>
> >>>    /**
> >>>     * Scale down the frequency of a specific lcore to the lowest according to the
> >>>     * available frequencies.
> >>>     * Review each environments specific documentation for usage..
> >>>     */
> >>> -extern rte_power_freq_change_t rte_power_freq_min;
> >>> +static inline int
> >>> +rte_power_freq_min(unsigned int lcore_id) {
> >>> +     struct rte_power_core_ops *ops = rte_power_get_core_ops();
> >>> +
> >>> +     return ops->freq_min(lcore_id); }
> >>>
> >>>    /**
> >>>     * Query the Turbo Boost status of a specific lcore.
> >>>     * Review each environments specific documentation for usage..
> >>>     */
> >>> -extern rte_power_freq_change_t rte_power_turbo_status;
> >>> +static inline int
> >>> +rte_power_turbo_status(unsigned int lcore_id) {
> >>> +     struct rte_power_core_ops *ops = rte_power_get_core_ops();
> >>> +
> >>> +     return ops->turbo_status(lcore_id); }
> >>>
> >>>    /**
> >>>     * Enable Turbo Boost for this lcore.
> >>>     * Review each environments specific documentation for usage..
> >>>     */
> >>> -extern rte_power_freq_change_t rte_power_freq_enable_turbo;
> >>> +static inline int
> >>> +rte_power_freq_enable_turbo(unsigned int lcore_id) {
> >>> +     struct rte_power_core_ops *ops = rte_power_get_core_ops();
> >>> +
> >>> +     return ops->enable_turbo(lcore_id); }
> >>>
> >>>    /**
> >>>     * Disable Turbo Boost for this lcore.
> >>>     * Review each environments specific documentation for usage..
> >>>     */
> >>> -extern rte_power_freq_change_t rte_power_freq_disable_turbo;
> >>> +static inline int
> >>> +rte_power_freq_disable_turbo(unsigned int lcore_id) {
> >>> +     struct rte_power_core_ops *ops = rte_power_get_core_ops();
> >>>
> >>> -/**
> >>> - * Power capabilities summary.
> >>> - */
> >>> -struct rte_power_core_capabilities {
> >>> -     union {
> >>> -             uint64_t capabilities;
> >>> -             struct {
> >>> -                     uint64_t turbo:1;       /**< Turbo can be enabled. */
> >>> -                     uint64_t priority:1;    /**< SST-BF high freq core */
> >>> -             };
> >>> -     };
> >>> -};
> >>> +     return ops->disable_turbo(lcore_id); }
> >>>
> >>>    /**
> >>>     * Returns power capabilities for a specific lcore.
> >>> @@ -235,10 +278,14 @@ struct rte_power_core_capabilities {
> >>>     *  - 0 on success.
> >>>     *  - Negative on error.
> >>>     */
> >>> -typedef int (*rte_power_get_capabilities_t)(unsigned int lcore_id,
> >>> -             struct rte_power_core_capabilities *caps);
> >>> +static inline int
> >>> +rte_power_get_capabilities(unsigned int lcore_id,
> >>> +             struct rte_power_core_capabilities *caps) {
> >>> +     struct rte_power_core_ops *ops = rte_power_get_core_ops();
> >>>
> >>> -extern rte_power_get_capabilities_t rte_power_get_capabilities;
> >>> +     return ops->get_caps(lcore_id, caps); }
> >>>
> >>>    #ifdef __cplusplus
> >>>    }
> >>> diff --git a/lib/power/rte_power_core_ops.h
> >>> b/lib/power/rte_power_core_ops.h new file mode 100644 index
> >>> 0000000000..356a64df79
> >>> --- /dev/null
> >>> +++ b/lib/power/rte_power_core_ops.h
> >>> @@ -0,0 +1,208 @@
> >>> +/* SPDX-License-Identifier: BSD-3-Clause
> >>> + * Copyright(c) 2010-2014 Intel Corporation
> >>> + * Copyright(c) 2024 Advanced Micro Devices, Inc.
> >>> + */
> >>> +
> >>> +#ifndef _RTE_POWER_CORE_OPS_H
> >>> +#define _RTE_POWER_CORE_OPS_H
> >>> +
> >> suggest rename the file as rte_power_cpufreq_api.h.
> >> If so, the role of this file is more clearly.
> >>> +__rte_internal
> >>> +int rte_power_register_ops(struct rte_power_core_ops *ops);
> >>> +
> >>> +/**
> >>> + * Macro to statically register the ops of a cpufreq driver.
> >>> + */
> >>> +#define RTE_POWER_REGISTER_OPS(ops)          \
> >>> +     RTE_INIT(power_hdlr_init_##ops)         \
> >>> +     {                                       \
> >>> +             rte_power_register_ops(&ops);   \
> >>> +     }
> >>> +
> >>> +/**
> >>> + * @internal Get the power ops struct from its index.
> >>> + *
> >>> + * @return
> >>> + *   The pointer to the ops struct in the table if registered.
> >>> + */
> >>> +struct rte_power_core_ops *
> >>> +rte_power_get_core_ops(void);
> >>> +
> >>> +#ifdef __cplusplus
> >>> +}
> >>> +#endif
> >>> +
> >>> +#endif
> >>> diff --git a/lib/power/version.map b/lib/power/version.map index
> >>> c9a226614e..bd64e0828f 100644
> >>> --- a/lib/power/version.map
> >>> +++ b/lib/power/version.map
> >>> @@ -51,4 +51,18 @@ EXPERIMENTAL {
> >>>        rte_power_set_uncore_env;
> >>>        rte_power_uncore_freqs;
> >>>        rte_power_unset_uncore_env;
> >>> +     # added in 24.07
> >> 24.07-->24.11?
> >>> +     rte_power_logtype;
> >>> +};
> >>> +
> >>> +INTERNAL {
> >>> +     global:
> >>> +
> >>> +     rte_power_register_ops;
> >>> +     cpufreq_check_scaling_driver;
> >>> +     power_set_governor;
> >>> +     open_core_sysfs_file;
> >>> +     read_core_sysfs_u32;
> >>> +     read_core_sysfs_s;
> >>> +     write_core_sysfs_s;
> >>>    };
  
lihuisong (C) Sept. 19, 2024, 3:37 a.m. UTC | #6
在 2024/9/18 16:37, Tummala, Sivaprasad 写道:
> [AMD Official Use Only - AMD Internal Distribution Only]
>
>> -----Original Message-----
>> From: lihuisong (C) <lihuisong@huawei.com>
>> Sent: Friday, September 13, 2024 1:05 PM
>> To: Tummala, Sivaprasad <Sivaprasad.Tummala@amd.com>
>> Cc: dev@dpdk.org; david.hunt@intel.com; anatoly.burakov@intel.com;
>> radu.nicolau@intel.com; cristian.dumitrescu@intel.com; jerinj@marvell.com;
>> konstantin.ananyev@huawei.com; Yigit, Ferruh <Ferruh.Yigit@amd.com>;
>> gakhil@marvell.com
>> Subject: Re: [PATCH v2 1/4] power: refactor core power management library
>>
>> Caution: This message originated from an External Source. Use proper caution
>> when opening attachments, clicking links, or responding.
>>
>>
>> 在 2024/9/12 19:17, Tummala, Sivaprasad 写道:
>>> [AMD Official Use Only - AMD Internal Distribution Only]
>>>
>>> Hi Huisong,
>>>
>>> Please find my response inline.
>>>
>>>> -----Original Message-----
>>>> From: lihuisong (C) <lihuisong@huawei.com>
>>>> Sent: Tuesday, August 27, 2024 1:51 PM
>>>> To: Tummala, Sivaprasad <Sivaprasad.Tummala@amd.com>
>>>> Cc: dev@dpdk.org; david.hunt@intel.com; anatoly.burakov@intel.com;
>>>> radu.nicolau@intel.com; cristian.dumitrescu@intel.com;
>>>> jerinj@marvell.com; konstantin.ananyev@huawei.com; Yigit, Ferruh
>>>> <Ferruh.Yigit@amd.com>; gakhil@marvell.com
>>>> Subject: Re: [PATCH v2 1/4] power: refactor core power management
>>>> library
>>>>
>>>> Caution: This message originated from an External Source. Use proper
>>>> caution when opening attachments, clicking links, or responding.
>>>>
>>>>
>>>> Hi Sivaprasa,
>>>>
>>>> Some comments inline.
>>>>
>>>> /Huisong
>>>>
>>>> 在 2024/8/26 21:06, Sivaprasad Tummala 写道:
>>>>> This patch introduces a comprehensive refactor to the core power
>>>>> management library. The primary focus is on improving modularity and
>>>>> organization by relocating specific driver implementations from the
>>>>> 'lib/power' directory to dedicated directories within
>>>>> 'drivers/power/core/*'. The adjustment of meson.build files enables
>>>>> the selective activation of individual drivers.
>>>>> These changes contribute to a significant enhancement in code
>>>>> organization, providing a clearer structure for driver implementations.
>>>>> The refactor aims to improve overall code clarity and boost
>>>>> maintainability. Additionally, it establishes a foundation for
>>>>> future development, allowing for more focused work on individual
>>>>> drivers and seamless integration of forthcoming enhancements.
>>>>>
>>>>> v2:
>>>>>     - added NULL check for global_core_ops in rte_power_get_core_ops
>>>>>
>>>>> Signed-off-by: Sivaprasad Tummala <sivaprasad.tummala@amd.com>
>>>>> ---
>>>>>     drivers/meson.build                           |   1 +
>>>>>     .../power/acpi/acpi_cpufreq.c                 |  22 +-
>>>>>     .../power/acpi/acpi_cpufreq.h                 |   6 +-
>>>>>     drivers/power/acpi/meson.build                |  10 +
>>>>>     .../power/amd_pstate/amd_pstate_cpufreq.c     |  24 +-
>>>>>     .../power/amd_pstate/amd_pstate_cpufreq.h     |   8 +-
>>>>>     drivers/power/amd_pstate/meson.build          |  10 +
>>>>>     .../power/cppc/cppc_cpufreq.c                 |  22 +-
>>>>>     .../power/cppc/cppc_cpufreq.h                 |   8 +-
>>>>>     drivers/power/cppc/meson.build                |  10 +
>>>>>     .../power/kvm_vm}/guest_channel.c             |   0
>>>>>     .../power/kvm_vm}/guest_channel.h             |   0
>>>>>     .../power/kvm_vm/kvm_vm.c                     |  22 +-
>>>>>     .../power/kvm_vm/kvm_vm.h                     |   6 +-
>>>>>     drivers/power/kvm_vm/meson.build              |  16 +
>>>>>     drivers/power/meson.build                     |  12 +
>>>>>     drivers/power/pstate/meson.build              |  10 +
>>>>>     .../power/pstate/pstate_cpufreq.c             |  22 +-
>>>>>     .../power/pstate/pstate_cpufreq.h             |   6 +-
>>>>>     lib/power/meson.build                         |   7 +-
>>>>>     lib/power/power_common.c                      |   2 +-
>>>>>     lib/power/power_common.h                      |  16 +-
>>>>>     lib/power/rte_power.c                         | 291 ++++++------------
>>>>>     lib/power/rte_power.h                         | 139 ++++++---
>>>>>     lib/power/rte_power_core_ops.h                | 208 +++++++++++++
>>>>>     lib/power/version.map                         |  14 +
>>>>>     26 files changed, 621 insertions(+), 271 deletions(-)
>>>>>     rename lib/power/power_acpi_cpufreq.c =>
>>>>> drivers/power/acpi/acpi_cpufreq.c
>>>> (95%)
>>>>>     rename lib/power/power_acpi_cpufreq.h =>
>>>>> drivers/power/acpi/acpi_cpufreq.h
>>>> (98%)
>>>>>     create mode 100644 drivers/power/acpi/meson.build
>>>>>     rename lib/power/power_amd_pstate_cpufreq.c =>
>>>> drivers/power/amd_pstate/amd_pstate_cpufreq.c (95%)
>>>>>     rename lib/power/power_amd_pstate_cpufreq.h =>
>>>> drivers/power/amd_pstate/amd_pstate_cpufreq.h (97%)
>>>>>     create mode 100644 drivers/power/amd_pstate/meson.build
>>>>>     rename lib/power/power_cppc_cpufreq.c =>
>>>>> drivers/power/cppc/cppc_cpufreq.c
>>>> (95%)
>>>>>     rename lib/power/power_cppc_cpufreq.h =>
>>>>> drivers/power/cppc/cppc_cpufreq.h
>>>> (97%)
>>>>>     create mode 100644 drivers/power/cppc/meson.build
>>>>>     rename {lib/power => drivers/power/kvm_vm}/guest_channel.c (100%)
>>>>>     rename {lib/power => drivers/power/kvm_vm}/guest_channel.h (100%)
>>>>>     rename lib/power/power_kvm_vm.c => drivers/power/kvm_vm/kvm_vm.c
>> (82%)
>>>>>     rename lib/power/power_kvm_vm.h => drivers/power/kvm_vm/kvm_vm.h
>> (98%)
>>>>>     create mode 100644 drivers/power/kvm_vm/meson.build
>>>>>     create mode 100644 drivers/power/meson.build
>>>>>     create mode 100644 drivers/power/pstate/meson.build
>>>>>     rename lib/power/power_pstate_cpufreq.c =>
>>>> drivers/power/pstate/pstate_cpufreq.c (96%)
>>>>>     rename lib/power/power_pstate_cpufreq.h =>
>>>> drivers/power/pstate/pstate_cpufreq.h (98%)
>>>>>     create mode 100644 lib/power/rte_power_core_ops.h
>>>> How about use the following directory structure?
>>>> *For power libs*
>>>> lib/power/power_common.*
>>>> lib/power/rte_power_pmd_mgmt.*
>>>> lib/power/rte_power_cpufreq_api.* (replacing rte_power.c file maybe simple for
>> us.
>>>> but I'm not sure if we can put the init of core, uncore and pmd mgmt
>>>> to rte_power_init.c in rte_power.c.)
>>>> lib/power/rte_power_uncore_freq_api.*
>>> Yes, renaming rte_power.c is definitely a possible incremental change that could
>> be considered later.
>>> However, for the time being, our focus will be on refactoring the cpufreq drivers
>> only.
>> The rte_power.c just works for the initialization of cpufreq driver. Now that you are
>> reworking core and uncore power library and rearrange the directory under power.
>> I think renaming this file name should be more appropriate in this series.
>>>> *And has directories under drivers/power:*
>>>> 1> For core dvfs driver:
>>>> drivers/power/cpufreq/acpi_cpufreq.c
>>>> drivers/power/cpufreq/cppc_cpufreq.c
>>>> drivers/power/cpufreq/amd_pstate_cpufreq.c
>>>> drivers/power/cpufreq/intel_pstate_cpufreq.c
>>>> drivers/power/cpufreq/kvm_cpufreq.c
>>>> The code of each cpufreq driver is not too much and doesn't probably
>>>> increase. So don't need to use a directory for it.
>>>>
>>>> 2> For uncore dvfs driver:
>>>> drivers/power/uncorefreq/intel_uncore.*
>>>>> diff --git a/drivers/meson.build b/drivers/meson.build index
>>>>> 66931d4241..9d77e0deab 100644
>>>>> --- a/drivers/meson.build
>>>>> +++ b/drivers/meson.build
>>>>> @@ -29,6 +29,7 @@ subdirs = [
>>>>>             'event',          # depends on common, bus, mempool and net.
>>>>>             'baseband',       # depends on common and bus.
>>>>>             'gpu',            # depends on common and bus.
>>>>> +        'power',          # depends on common (in future).
>>>>>     ]
>>>>>
>>>>>     if meson.is_cross_build()
>>>>> diff --git a/lib/power/power_acpi_cpufreq.c
>>>>> b/drivers/power/acpi/acpi_cpufreq.c
>>>>> similarity index 95%
>>>>> rename from lib/power/power_acpi_cpufreq.c rename to
>>>>> drivers/power/acpi/acpi_cpufreq.c
>>>> do not suggest to create one directory for each cpufreq driver.
>>>> Because pstate drivers also comply with ACPI spec, right?
>>>> In addition, the code of each cpufreq drivers are not too much.
>>>> There is just one file under one directory which is not good.
>>> One of our objectives for the refactoring is to selectively disable non-essential
>> drivers using Meson build options.
>>> However, by rearranging the driver structure, we risk disrupting this capability.
>> I get your purpose.
>> The cpufreq library has the feature and interface to detect which driver to use, right?
>> So it is not necessary for cpufreq library to introduce the Meson build options, which
>> probably makes it complicate.
> In Meson, you can reduce code size by disabling specific drivers or components through build options,
> allowing you to exclude unnecessary features. At runtime, the library will automatically detect the available driver,
> and if it's not present in the build, initialization will fail.
I still cannot understand why you want to do this.
The reducing code size is not a good reason. If all libraries or drivers 
want to do it like this, need to add more meson options.
Unless there is a situation where multiple drivers on a platform can be 
used, and the automatic detection mechanism is not enough to determine 
which driver to use.
Of course, you also can see the opinion of other reviewers.
> We're not introducing any new complexities; rather, we aim to ensure that the drivers in drivers/power/*
> are consistent with the other drivers.
What do the other drivers stand for?
Anyway, we need to make the directory hierarchy under drivers/power/ and 
lib/power/ clear.
>>>>> index 81996e1c13..8637c69703 100644
>>>>> --- a/lib/power/power_acpi_cpufreq.c
>>>>> +++ b/drivers/power/acpi/acpi_cpufreq.c
>>>>> @@ -10,7 +10,7 @@
>>>>>     #include <rte_stdatomic.h>
>>>>>     #include <rte_string_fns.h>
>>>>>
>>>>> -#include "power_acpi_cpufreq.h"
>>>>> +#include "acpi_cpufreq.h"
>>>>>     #include "power_common.h"
>>>>>
>>>> <...>
>>>>> +if not is_linux
>>>>> +    build = false
>>>>> +    reason = 'only supported on Linux'
>>>>> +endif
>>>>> +sources = files('pstate_cpufreq.c')
>>>>> +
>>>>> +deps += ['power']
>>>>> diff --git a/lib/power/power_pstate_cpufreq.c
>>>>> b/drivers/power/pstate/pstate_cpufreq.c
>>>>> similarity index 96%
>>>>> rename from lib/power/power_pstate_cpufreq.c rename to
>>>>> drivers/power/pstate/pstate_cpufreq.c
>>>> pstate_cpufreq.c is actually intel_pstate cpufreq driver, right?
>>>> So how about modify this file name to intel_pstate_cpufreq.c?
>>> Yes, will fix this in next version.
>>>>> index 2343121621..c32b1adabc 100644
>>>>> --- a/lib/power/power_pstate_cpufreq.c
>>>>> +++ b/drivers/power/pstate/pstate_cpufreq.c
>>>>> @@ -15,7 +15,7 @@
>>>>>     #include <rte_stdatomic.h>
>>>>>
>>>>>     #include "rte_power_pmd_mgmt.h"
>>>>> -#include "power_pstate_cpufreq.h"
>>>>> +#include "pstate_cpufreq.h"
>>>>>     #include "power_common.h"
>>>>>
>>>>>     /* macros used for rounding frequency to nearest 100000 */ @@
>>>>> -888,3
>>>>> +888,23 @@ int power_pstate_get_capabilities(unsigned int lcore_id,
>>>>>
>>>>>         return 0;
>>>>>     }
>>>>> +
>>>> <...>
>>>>> diff --git a/lib/power/power_common.c b/lib/power/power_common.c
>>>>> index 590986d5ef..6c06411e8b 100644
>>>>> --- a/lib/power/power_common.c
>>>>> +++ b/lib/power/power_common.c
>>>>> @@ -12,7 +12,7 @@
>>>>>
>>>>>     #include "power_common.h"
>>>>>
>>>>> -RTE_LOG_REGISTER_DEFAULT(power_logtype, INFO);
>>>>> +RTE_LOG_REGISTER_DEFAULT(rte_power_logtype, INFO);
>>>>>
>>>>>     #define POWER_SYSFILE_SCALING_DRIVER   \
>>>>>                 "/sys/devices/system/cpu/cpu%u/cpufreq/scaling_driver"
>>>>> diff --git a/lib/power/power_common.h b/lib/power/power_common.h
>>>>> index
>>>>> 83f742f42a..767686ee12 100644
>>>>> --- a/lib/power/power_common.h
>>>>> +++ b/lib/power/power_common.h
>>>>> @@ -6,12 +6,13 @@
>>>>>     #define _POWER_COMMON_H_
>>>>>
>>>>>     #include <rte_common.h>
>>>>> +#include <rte_compat.h>
>>>>>     #include <rte_log.h>
>>>>>
>>>>>     #define RTE_POWER_INVALID_FREQ_INDEX (~0)
>>>>>
>>>>> -extern int power_logtype;
>>>>> -#define RTE_LOGTYPE_POWER power_logtype
>>>>> +extern int rte_power_logtype;
>>>>> +#define RTE_LOGTYPE_POWER rte_power_logtype
>>>>>     #define POWER_LOG(level, ...) \
>>>>>         RTE_LOG_LINE(level, POWER, "" __VA_ARGS__)
>>>>>
>>>>> @@ -23,13 +24,24 @@ extern int power_logtype;
>>>>>     #endif
>>>>>
>>>>>     /* check if scaling driver matches one we want */
>>>>> +__rte_internal
>>>>>     int cpufreq_check_scaling_driver(const char *driver);
>>>>> +
>>>>> +__rte_internal
>>>>>     int power_set_governor(unsigned int lcore_id, const char *new_governor,
>>>>>                 char *orig_governor, size_t orig_governor_len);
>>>> suggest that move cpufreq interfaces like this to the
>>>> rte_power_cpufreq_api.* I proposed above.
>>> This is an internal API and isn’t intended for direct use by applications.
>>> By moving it to rte_power_*, we risk exposing it inadvertently.
>> we don't expose these to applications. application do not include this header file.
>> power_set_governor() and cpufreq_check_scaling_driver() is just used by cpufreq
>> driver. So they just can be seen by cpufreq lib or module, right?
>> But if these interface are in power_common.h, pmd_mgmt and uncore driver also
>> include this header file and can see them. This is not good.
>> AFAIS, the power_common.h should just contain the kind of interfaces that are used
>> by all power libs or sub-modules, like cpufreq, uncore, pmd_mgmt and so on.
> OK., Will move this internal APIs from power_common.h to a separate header file.
>>>> The interfaces in power_comm.* can be used by all power modules, like
>>>> core/uncore/pmd mgmt.
>>>>> +
>>>>> +__rte_internal
>>>>>     int open_core_sysfs_file(FILE **f, const char *mode, const char *format, ...)
>>>>>                 __rte_format_printf(3, 4);
>>>>> +
>>>>> +__rte_internal
>>>>>     int read_core_sysfs_u32(FILE *f, uint32_t *val);
>>>>> +
>>>>> +__rte_internal
>>>>>     int read_core_sysfs_s(FILE *f, char *buf, unsigned int len);
>>>>> +
>>>>> +__rte_internal
>>>>>     int write_core_sysfs_s(FILE *f, const char *str);
>>>>>
>>>>>     #endif /* _POWER_COMMON_H_ */
>>>>> diff --git a/lib/power/rte_power.c b/lib/power/rte_power.c
>>>> The name of the rte_power.c file is impropriate now. The context in
>>>> this file is just for cpufreq, right?
>>>> So I suggest that we need to rename this file as the
>>>> rte_power_cpufreq_api.c
>>> Yes, renaming rte_power.c to rte_power_cpufreq.c is definitely a
>>> possible incremental change and will fix this as a separate patch.
>>> .
>>>
>>>>> index 36c3f3da98..2bf6d40517 100644
>>>>> --- a/lib/power/rte_power.c
>>>>> +++ b/lib/power/rte_power.c
>>>>> @@ -8,153 +8,86 @@
>>>>>     #include <rte_spinlock.h>
>>>>>
>>>>>     #include "rte_power.h"
>>>>> -#include "power_acpi_cpufreq.h"
>>>>> -#include "power_cppc_cpufreq.h"
>>>>>     #include "power_common.h"
>>>>> -#include "power_kvm_vm.h"
>>>>> -#include "power_pstate_cpufreq.h"
>>>>> -#include "power_amd_pstate_cpufreq.h"
>>>>>
>>>>> -enum power_management_env global_default_env = PM_ENV_NOT_SET;
>>>>> +static enum power_management_env global_default_env =
>>>> PM_ENV_NOT_SET;
>>>>> +static struct rte_power_core_ops *global_power_core_ops;
>>>>>
>>>>>     static rte_spinlock_t global_env_cfg_lock =
>>>>> RTE_SPINLOCK_INITIALIZER;
>>>>> +static RTE_TAILQ_HEAD(, rte_power_core_ops) core_ops_list =
>>>>> +                     TAILQ_HEAD_INITIALIZER(core_ops_list);
>>>>>
>>>>> -/* function pointers */
>>>>> -rte_power_freqs_t rte_power_freqs  = NULL; -rte_power_get_freq_t
>>>>> rte_power_get_freq = NULL; -rte_power_set_freq_t rte_power_set_freq
>>>>> = NULL; -rte_power_freq_change_t rte_power_freq_up = NULL;
>>>>> -rte_power_freq_change_t rte_power_freq_down = NULL;
>>>>> -rte_power_freq_change_t rte_power_freq_max = NULL;
>>>>> -rte_power_freq_change_t rte_power_freq_min = NULL;
>>>>> -rte_power_freq_change_t rte_power_turbo_status;
>>>>> -rte_power_freq_change_t rte_power_freq_enable_turbo;
>>>>> -rte_power_freq_change_t rte_power_freq_disable_turbo;
>>>>> -rte_power_get_capabilities_t rte_power_get_capabilities;
>>>>> -
>>>>> -static void
>>>>> -reset_power_function_ptrs(void)
>>>>> +
>>>>> +const char *power_env_str[] = {
>>>>> +     "not set",
>>>>> +     "acpi",
>>>>> +     "kvm-vm",
>>>>> +     "pstate",
>>>>> +     "cppc",
>>>>> +     "amd-pstate"
>>>>> +};
>>>>> +
>>>>> +/* register the ops struct in rte_power_core_ops, return 0 on
>>>>> +success. */ int rte_power_register_ops(struct rte_power_core_ops
>>>>> +*driver_ops)
>>>>>     {
>>>>> -     rte_power_freqs  = NULL;
>>>>> -     rte_power_get_freq = NULL;
>>>>> -     rte_power_set_freq = NULL;
>>>>> -     rte_power_freq_up = NULL;
>>>>> -     rte_power_freq_down = NULL;
>>>>> -     rte_power_freq_max = NULL;
>>>>> -     rte_power_freq_min = NULL;
>>>>> -     rte_power_turbo_status = NULL;
>>>>> -     rte_power_freq_enable_turbo = NULL;
>>>>> -     rte_power_freq_disable_turbo = NULL;
>>>>> -     rte_power_get_capabilities = NULL;
>>>>> +     if (!driver_ops->init || !driver_ops->exit ||
>>>>> +             !driver_ops->check_env_support || !driver_ops->get_avail_freqs ||
>>>>> +             !driver_ops->get_freq || !driver_ops->set_freq ||
>>>>> +             !driver_ops->freq_up || !driver_ops->freq_down ||
>>>>> +             !driver_ops->freq_max || !driver_ops->freq_min ||
>>>>> +             !driver_ops->turbo_status || !driver_ops->enable_turbo ||
>>>>> +             !driver_ops->disable_turbo || !driver_ops->get_caps) {
>>>>> +             POWER_LOG(ERR, "Missing callbacks while registering
>>>>> + power ops");
>>>> turbo_status(), enable_turbo() and disable turbo() are not necessary, right?
>>> Nope, this is required to get the current status unlike the capability API
>> (get_caps()).
>> ok
>>>> These depand on the capabilities from get_caps().
>>>>> +             return -EINVAL;
>>>>> +     }
>>>>> +
>>>>> +     TAILQ_INSERT_TAIL(&core_ops_list, driver_ops, next);
>>>>> +
>>>>> +     return 0;
>>>>>     }
>>>>>
>>>>>     int
>>>>>     rte_power_check_env_supported(enum power_management_env env)
>>>>>     {
>>>>> -     switch (env) {
>>>>> -     case PM_ENV_ACPI_CPUFREQ:
>>>>> -             return power_acpi_cpufreq_check_supported();
>>>>> -     case PM_ENV_PSTATE_CPUFREQ:
>>>>> -             return power_pstate_cpufreq_check_supported();
>>>>> -     case PM_ENV_KVM_VM:
>>>>> -             return power_kvm_vm_check_supported();
>>>>> -     case PM_ENV_CPPC_CPUFREQ:
>>>>> -             return power_cppc_cpufreq_check_supported();
>>>>> -     case PM_ENV_AMD_PSTATE_CPUFREQ:
>>>>> -             return power_amd_pstate_cpufreq_check_supported();
>>>>> -     default:
>>>>> -             rte_errno = EINVAL;
>>>>> -             return -1;
>>>>> -     }
>>>>> +     struct rte_power_core_ops *ops;
>>>>> +
>>>>> +     if (env >= RTE_DIM(power_env_str))
>>>>> +             return 0;
>>>>> +
>>>>> +     RTE_TAILQ_FOREACH(ops, &core_ops_list, next)
>>>>> +             if (strncmp(ops->name, power_env_str[env],
>>>>> +                             RTE_POWER_DRIVER_NAMESZ) == 0)
>>>>> +                     return ops->check_env_support();
>>>>> +
>>>>> +     return 0;
>>>>>     }
>>>>>
>>>>>     int
>>>>>     rte_power_set_env(enum power_management_env env)
>>>>>     {
>>>>> +     struct rte_power_core_ops *ops;
>>>>> +     int ret = -1;
>>>>> +
>>>>>         rte_spinlock_lock(&global_env_cfg_lock);
>>>>>
>>>>>         if (global_default_env != PM_ENV_NOT_SET) {
>>>>>                 POWER_LOG(ERR, "Power Management Environment already
>> set.");
>>>>> -             rte_spinlock_unlock(&global_env_cfg_lock);
>>>>> -             return -1;
>>>>> -     }
>>>>> -
>>>> <...>
>>>>> -     if (ret == 0)
>>>>> -             global_default_env = env;
>>>>> -     else {
>>>>> -             global_default_env = PM_ENV_NOT_SET;
>>>>> -             reset_power_function_ptrs();
>>>>> -     }
>>>>> +     RTE_TAILQ_FOREACH(ops, &core_ops_list, next)
>>>>> +             if (strncmp(ops->name, power_env_str[env],
>>>>> +                             RTE_POWER_DRIVER_NAMESZ) == 0) {
>>>>> +                     global_power_core_ops = ops;
>>>>> +                     global_default_env = env;
>>>>> +                     ret = 0;
>>>>> +                     goto out;
>>>>> +             }
>>>>> +     POWER_LOG(ERR, "Invalid Power Management Environment(%d) set",
>>>>> +                     env);
>>>>>
>>>>> +out:
>>>>>         rte_spinlock_unlock(&global_env_cfg_lock);
>>>>>         return ret;
>>>>>     }
>>>>> @@ -164,94 +97,66 @@ rte_power_unset_env(void)
>>>>>     {
>>>>>         rte_spinlock_lock(&global_env_cfg_lock);
>>>>>         global_default_env = PM_ENV_NOT_SET;
>>>>> -     reset_power_function_ptrs();
>>>>> +     global_power_core_ops = NULL;
>>>>>         rte_spinlock_unlock(&global_env_cfg_lock);
>>>>>     }
>>>>>
>>>>>     enum power_management_env
>>>>> -rte_power_get_env(void) {
>>>>> +rte_power_get_env(void)
>>>>> +{
>>>>>         return global_default_env;
>>>>>     }
>>>>>
>>>>> -int
>>>>> -rte_power_init(unsigned int lcore_id)
>>>>> +struct rte_power_core_ops *
>>>>> +rte_power_get_core_ops(void)
>>>>>     {
>>>>> -     int ret = -1;
>>>>> +     RTE_ASSERT(global_power_core_ops != NULL);
>>>>>
>>>>> -     switch (global_default_env) {
>>>>> -     case PM_ENV_ACPI_CPUFREQ:
>>>>> -             return power_acpi_cpufreq_init(lcore_id);
>>>>> -     case PM_ENV_KVM_VM:
>>>>> -             return power_kvm_vm_init(lcore_id);
>>>>> -     case PM_ENV_PSTATE_CPUFREQ:
>>>>> -             return power_pstate_cpufreq_init(lcore_id);
>>>>> -     case PM_ENV_CPPC_CPUFREQ:
>>>>> -             return power_cppc_cpufreq_init(lcore_id);
>>>>> -     case PM_ENV_AMD_PSTATE_CPUFREQ:
>>>>> -             return power_amd_pstate_cpufreq_init(lcore_id);
>>>>> -     default:
>>>>> -             POWER_LOG(INFO, "Env isn't set yet!");
>>>>> -     }
>>>>> +     return global_power_core_ops;
>>>>> +}
>>>>>
>>>>> -     /* Auto detect Environment */
>>>>> -     POWER_LOG(INFO, "Attempting to initialise ACPI cpufreq power
>>>> management...");
>>>>> -     ret = power_acpi_cpufreq_init(lcore_id);
>>>>> -     if (ret == 0) {
>>>>> -             rte_power_set_env(PM_ENV_ACPI_CPUFREQ);
>>>>> -             goto out;
>>>>> -     }
>>>>> +int
>>>>> +rte_power_init(unsigned int lcore_id) {
>>>>> +     struct rte_power_core_ops *ops;
>>>>> +     uint8_t env;
>>>>>
>>>>> -     POWER_LOG(INFO, "Attempting to initialise PSTAT power
>> management...");
>>>>> -     ret = power_pstate_cpufreq_init(lcore_id);
>>>>> -     if (ret == 0) {
>>>>> -             rte_power_set_env(PM_ENV_PSTATE_CPUFREQ);
>>>>> -             goto out;
>>>>> -     }
>>>>> +     if (global_default_env != PM_ENV_NOT_SET)
>>>>> +             return global_power_core_ops->init(lcore_id);
>>>>>
>>>>> -     POWER_LOG(INFO, "Attempting to initialise AMD PSTATE power
>>>> management...");
>>>>> -     ret = power_amd_pstate_cpufreq_init(lcore_id);
>>>>> -     if (ret == 0) {
>>>>> -             rte_power_set_env(PM_ENV_AMD_PSTATE_CPUFREQ);
>>>>> -             goto out;
>>>>> -     }
>>>>> +     POWER_LOG(INFO, "Env isn't set yet!");
>>>> remove this log?
>>>>> -     POWER_LOG(INFO, "Attempting to initialise CPPC power
>> management...");
>>>>> -     ret = power_cppc_cpufreq_init(lcore_id);
>>>>> -     if (ret == 0) {
>>>>> -             rte_power_set_env(PM_ENV_CPPC_CPUFREQ);
>>>>> -             goto out;
>>>>> -     }
>>>>> +     /* Auto detect Environment */
>>>>> +     RTE_TAILQ_FOREACH(ops, &core_ops_list, next)
>>>>> +             if (ops) {
>>>>> +                     POWER_LOG(INFO,
>>>>> +                             "Attempting to initialise %s cpufreq power management...",
>>>>> +                             ops->name);
>>>>> +                     if (ops->init(lcore_id) == 0) {
>>>>> +                             for (env = 0; env < RTE_DIM(power_env_str); env++)
>>>>> +                                     if (strncmp(ops->name, power_env_str[env],
>>>>> +                                                     RTE_POWER_DRIVER_NAMESZ) == 0) {
>>>>> +                                             rte_power_set_env(env);
>>>>> +                                             return 0;
>>>>> +                                     }
>>>>> +                     }
>>>>> +             }
>>>> Can we change the logic of rte_power_set_env()? like:
>>>>        RTE_TAILQ_FOREACH(ops, &core_ops_list, next) {
>>>>            for (env = 0; env < RTE_DIM(power_env_str); env++) {
>>>>                    if (strncmp(ops->name, power_env_str[env],
>>>> RTE_POWER_DRIVER_NAMESZ) == 0 &&
>>>>                        ops->init(lcore_id) == 0) {
>>>>                        global_power_core_ops = ops;
>>>>                        global_default_env = env;
>>>>                    }
>>>>            }
>>>>        }
>>>> That is easier to follow code.
>>> Yes, will fix in next version.
>>>
>>>>> +
>>>>> +     POWER_LOG(ERR,
>>>>> +             "Unable to set Power Management Environment for lcore %u",
>>>>> +             lcore_id);
>>>>>
>>>>> -     POWER_LOG(INFO, "Attempting to initialise VM power management...");
>>>>> -     ret = power_kvm_vm_init(lcore_id);
>>>>> -     if (ret == 0) {
>>>>> -             rte_power_set_env(PM_ENV_KVM_VM);
>>>>> -             goto out;
>>>>> -     }
>>>>> -     POWER_LOG(ERR, "Unable to set Power Management Environment for
>> lcore
>>>> "
>>>>> -                     "%u", lcore_id);
>>>>> -out:
>>>>> -     return ret;
>>>>> +     return -1;
>>>>>     }
>>>>>
>>>>>     int
>>>>>     rte_power_exit(unsigned int lcore_id)
>>>>>     {
>>>>> -     switch (global_default_env) {
>>>>> -     case PM_ENV_ACPI_CPUFREQ:
>>>>> -             return power_acpi_cpufreq_exit(lcore_id);
>>>>> -     case PM_ENV_KVM_VM:
>>>>> -             return power_kvm_vm_exit(lcore_id);
>>>>> -     case PM_ENV_PSTATE_CPUFREQ:
>>>>> -             return power_pstate_cpufreq_exit(lcore_id);
>>>>> -     case PM_ENV_CPPC_CPUFREQ:
>>>>> -             return power_cppc_cpufreq_exit(lcore_id);
>>>>> -     case PM_ENV_AMD_PSTATE_CPUFREQ:
>>>>> -             return power_amd_pstate_cpufreq_exit(lcore_id);
>>>>> -     default:
>>>>> -             POWER_LOG(ERR, "Environment has not been set, unable to exit
>>>> gracefully");
>>>>> +     if (global_default_env != PM_ENV_NOT_SET)
>>>>> +             return global_power_core_ops->exit(lcore_id);
>>>>>
>>>>> -     }
>>>>> -     return -1;
>>>>> +     POWER_LOG(ERR,
>>>>> +             "Environment has not been set, unable to exit
>>>>> + gracefully");
>>>>>
>>>>> +     return -1;
>>>>>     }
>>>>> diff --git a/lib/power/rte_power.h b/lib/power/rte_power.h index
>>>>> 4fa4afe399..5e4aacf08b 100644
>>>>> --- a/lib/power/rte_power.h
>>>>> +++ b/lib/power/rte_power.h
>>>>> @@ -1,5 +1,6 @@
>>>>>     /* SPDX-License-Identifier: BSD-3-Clause
>>>>>      * Copyright(c) 2010-2014 Intel Corporation
>>>>> + * Copyright(c) 2024 Advanced Micro Devices, Inc.
>>>>>      */
>>>>>
>>>>>     #ifndef _RTE_POWER_H
>>>>> @@ -14,14 +15,21 @@
>>>>>     #include <rte_log.h>
>>>>>     #include <rte_power_guest_channel.h>
>>>>>
>>>>> +#include "rte_power_core_ops.h"
>>>>> +
>>>>>     #ifdef __cplusplus
>>>>>     extern "C" {
>>>>>     #endif
>>>>>
>>>>>     /* Power Management Environment State */ -enum
>>>>> power_management_env {PM_ENV_NOT_SET, PM_ENV_ACPI_CPUFREQ,
>> PM_ENV_KVM_VM,
>>>>> -             PM_ENV_PSTATE_CPUFREQ, PM_ENV_CPPC_CPUFREQ,
>>>>> -             PM_ENV_AMD_PSTATE_CPUFREQ};
>>>>> +enum power_management_env {
>>>>> +     PM_ENV_NOT_SET = 0,
>>>>> +     PM_ENV_ACPI_CPUFREQ,
>>>>> +     PM_ENV_KVM_VM,
>>>>> +     PM_ENV_PSTATE_CPUFREQ,
>>>>> +     PM_ENV_CPPC_CPUFREQ,
>>>>> +     PM_ENV_AMD_PSTATE_CPUFREQ
>>>>> +};
>>>>>
>>>>>     /**
>>>>>      * Check if a specific power management environment type is
>>>>> supported on a @@ -66,6 +74,15 @@ void rte_power_unset_env(void);
>>>>>      */
>>>>>     enum power_management_env rte_power_get_env(void);
>>>> I'd like to let user not know used which cpufreq driver, which is friendly to user.
>>>>
>>>> So we can rethink if this API is necessary.
>>> For any API changes, could we handle this as a separate RFC for discussion?
>>> It’s important that these changes are not included within the scope of this patch.
>> Agreed.
>> Can you post a separate RFC to disscuss this improvement later?
>>>>> +/**
>>>>> + * @internal Get the power ops struct from its index.
>>>>> + *
>>>>> + * @return
>>>>> + *   The pointer to the ops struct in the table if registered.
>>>>> + */
>>>>> +struct rte_power_core_ops *
>>>>> +rte_power_get_core_ops(void);
>>>>> +
>>>>>     /**
>>>>>      * Initialize power management for a specific lcore. If rte_power_set_env()
>> has
>>>>>      * not been called then an auto-detect of the environment will
>>>>> start and @@ -108,10 +125,13 @@ int rte_power_exit(unsigned int lcore_id);
>>>>>      * @return
>>>>>      *  The number of available frequencies.
>>>>>      */
>>>>> -typedef uint32_t (*rte_power_freqs_t)(unsigned int lcore_id, uint32_t *freqs,
>>>>> -             uint32_t num);
>>>>> +static inline uint32_t
>>>>> +rte_power_freqs(unsigned int lcore_id, uint32_t *freqs, uint32_t n) {
>>>>> +     struct rte_power_core_ops *ops = rte_power_get_core_ops();
>>>>>
>>>>> -extern rte_power_freqs_t rte_power_freqs;
>>>>> +     return ops->get_avail_freqs(lcore_id, freqs, n); }
>>>>>
>>>>>     /**
>>>>>      * Return the current index of available frequencies of a specific lcore.
>>>>> @@ -124,9 +144,13 @@ extern rte_power_freqs_t rte_power_freqs;
>>>>>      * @return
>>>>>      *  The current index of available frequencies.
>>>>>      */
>>>>> -typedef uint32_t (*rte_power_get_freq_t)(unsigned int lcore_id);
>>>>> +static inline uint32_t
>>>>> +rte_power_get_freq(unsigned int lcore_id) {
>>>>> +     struct rte_power_core_ops *ops = rte_power_get_core_ops();
>>>>>
>>>>> -extern rte_power_get_freq_t rte_power_get_freq;
>>>>> +     return ops->get_freq(lcore_id); }
>>>>>
>>>>>     /**
>>>>>      * Set the new frequency for a specific lcore by indicating the
>>>>> index of @@ -144,82 +168,101 @@ extern rte_power_get_freq_t
>>>> rte_power_get_freq;
>>>>>      *  - 0 on success without frequency changed.
>>>>>      *  - Negative on error.
>>>>>      */
>>>>> -typedef int (*rte_power_set_freq_t)(unsigned int lcore_id, uint32_t
>>>>> index);
>>>>> -
>>>>> -extern rte_power_set_freq_t rte_power_set_freq;
>>>>> +static inline uint32_t
>>>>> +rte_power_set_freq(unsigned int lcore_id, uint32_t index) {
>>>>> +     struct rte_power_core_ops *ops = rte_power_get_core_ops();
>>>>>
>>>>> -/**
>>>>> - * Function pointer definition for generic frequency change
>>>>> functions. Review
>>>>> - * each environments specific documentation for usage.
>>>>> - *
>>>>> - * @param lcore_id
>>>>> - *  lcore id.
>>>>> - *
>>>>> - * @return
>>>>> - *  - 1 on success with frequency changed.
>>>>> - *  - 0 on success without frequency changed.
>>>>> - *  - Negative on error.
>>>>> - */
>>>>> -typedef int (*rte_power_freq_change_t)(unsigned int lcore_id);
>>>>> +     return ops->set_freq(lcore_id, index); }
>>>>>
>>>>>     /**
>>>>>      * Scale up the frequency of a specific lcore according to the available
>>>>>      * frequencies.
>>>>>      * Review each environments specific documentation for usage.
>>>>>      */
>>>>> -extern rte_power_freq_change_t rte_power_freq_up;
>>>>> +static inline int
>>>>> +rte_power_freq_up(unsigned int lcore_id) {
>>>>> +     struct rte_power_core_ops *ops = rte_power_get_core_ops();
>>>>> +
>>>>> +     return ops->freq_up(lcore_id); }
>>>>>
>>>>>     /**
>>>>>      * Scale down the frequency of a specific lcore according to the available
>>>>>      * frequencies.
>>>>>      * Review each environments specific documentation for usage.
>>>>>      */
>>>>> -extern rte_power_freq_change_t rte_power_freq_down;
>>>>> +static inline int
>>>>> +rte_power_freq_down(unsigned int lcore_id) {
>>>>> +     struct rte_power_core_ops *ops = rte_power_get_core_ops();
>>>>> +
>>>>> +     return ops->freq_down(lcore_id); }
>>>>>
>>>>>     /**
>>>>>      * Scale up the frequency of a specific lcore to the highest according to the
>>>>>      * available frequencies.
>>>>>      * Review each environments specific documentation for usage.
>>>>>      */
>>>>> -extern rte_power_freq_change_t rte_power_freq_max;
>>>>> +static inline int
>>>>> +rte_power_freq_max(unsigned int lcore_id) {
>>>>> +     struct rte_power_core_ops *ops = rte_power_get_core_ops();
>>>>> +
>>>>> +     return ops->freq_max(lcore_id); }
>>>>>
>>>>>     /**
>>>>>      * Scale down the frequency of a specific lcore to the lowest according to the
>>>>>      * available frequencies.
>>>>>      * Review each environments specific documentation for usage..
>>>>>      */
>>>>> -extern rte_power_freq_change_t rte_power_freq_min;
>>>>> +static inline int
>>>>> +rte_power_freq_min(unsigned int lcore_id) {
>>>>> +     struct rte_power_core_ops *ops = rte_power_get_core_ops();
>>>>> +
>>>>> +     return ops->freq_min(lcore_id); }
>>>>>
>>>>>     /**
>>>>>      * Query the Turbo Boost status of a specific lcore.
>>>>>      * Review each environments specific documentation for usage..
>>>>>      */
>>>>> -extern rte_power_freq_change_t rte_power_turbo_status;
>>>>> +static inline int
>>>>> +rte_power_turbo_status(unsigned int lcore_id) {
>>>>> +     struct rte_power_core_ops *ops = rte_power_get_core_ops();
>>>>> +
>>>>> +     return ops->turbo_status(lcore_id); }
>>>>>
>>>>>     /**
>>>>>      * Enable Turbo Boost for this lcore.
>>>>>      * Review each environments specific documentation for usage..
>>>>>      */
>>>>> -extern rte_power_freq_change_t rte_power_freq_enable_turbo;
>>>>> +static inline int
>>>>> +rte_power_freq_enable_turbo(unsigned int lcore_id) {
>>>>> +     struct rte_power_core_ops *ops = rte_power_get_core_ops();
>>>>> +
>>>>> +     return ops->enable_turbo(lcore_id); }
>>>>>
>>>>>     /**
>>>>>      * Disable Turbo Boost for this lcore.
>>>>>      * Review each environments specific documentation for usage..
>>>>>      */
>>>>> -extern rte_power_freq_change_t rte_power_freq_disable_turbo;
>>>>> +static inline int
>>>>> +rte_power_freq_disable_turbo(unsigned int lcore_id) {
>>>>> +     struct rte_power_core_ops *ops = rte_power_get_core_ops();
>>>>>
>>>>> -/**
>>>>> - * Power capabilities summary.
>>>>> - */
>>>>> -struct rte_power_core_capabilities {
>>>>> -     union {
>>>>> -             uint64_t capabilities;
>>>>> -             struct {
>>>>> -                     uint64_t turbo:1;       /**< Turbo can be enabled. */
>>>>> -                     uint64_t priority:1;    /**< SST-BF high freq core */
>>>>> -             };
>>>>> -     };
>>>>> -};
>>>>> +     return ops->disable_turbo(lcore_id); }
>>>>>
>>>>>     /**
>>>>>      * Returns power capabilities for a specific lcore.
>>>>> @@ -235,10 +278,14 @@ struct rte_power_core_capabilities {
>>>>>      *  - 0 on success.
>>>>>      *  - Negative on error.
>>>>>      */
>>>>> -typedef int (*rte_power_get_capabilities_t)(unsigned int lcore_id,
>>>>> -             struct rte_power_core_capabilities *caps);
>>>>> +static inline int
>>>>> +rte_power_get_capabilities(unsigned int lcore_id,
>>>>> +             struct rte_power_core_capabilities *caps) {
>>>>> +     struct rte_power_core_ops *ops = rte_power_get_core_ops();
>>>>>
>>>>> -extern rte_power_get_capabilities_t rte_power_get_capabilities;
>>>>> +     return ops->get_caps(lcore_id, caps); }
>>>>>
>>>>>     #ifdef __cplusplus
>>>>>     }
>>>>> diff --git a/lib/power/rte_power_core_ops.h
>>>>> b/lib/power/rte_power_core_ops.h new file mode 100644 index
>>>>> 0000000000..356a64df79
>>>>> --- /dev/null
>>>>> +++ b/lib/power/rte_power_core_ops.h
>>>>> @@ -0,0 +1,208 @@
>>>>> +/* SPDX-License-Identifier: BSD-3-Clause
>>>>> + * Copyright(c) 2010-2014 Intel Corporation
>>>>> + * Copyright(c) 2024 Advanced Micro Devices, Inc.
>>>>> + */
>>>>> +
>>>>> +#ifndef _RTE_POWER_CORE_OPS_H
>>>>> +#define _RTE_POWER_CORE_OPS_H
>>>>> +
>>>> suggest rename the file as rte_power_cpufreq_api.h.
>>>> If so, the role of this file is more clearly.
>>>>> +__rte_internal
>>>>> +int rte_power_register_ops(struct rte_power_core_ops *ops);
>>>>> +
>>>>> +/**
>>>>> + * Macro to statically register the ops of a cpufreq driver.
>>>>> + */
>>>>> +#define RTE_POWER_REGISTER_OPS(ops)          \
>>>>> +     RTE_INIT(power_hdlr_init_##ops)         \
>>>>> +     {                                       \
>>>>> +             rte_power_register_ops(&ops);   \
>>>>> +     }
>>>>> +
>>>>> +/**
>>>>> + * @internal Get the power ops struct from its index.
>>>>> + *
>>>>> + * @return
>>>>> + *   The pointer to the ops struct in the table if registered.
>>>>> + */
>>>>> +struct rte_power_core_ops *
>>>>> +rte_power_get_core_ops(void);
>>>>> +
>>>>> +#ifdef __cplusplus
>>>>> +}
>>>>> +#endif
>>>>> +
>>>>> +#endif
>>>>> diff --git a/lib/power/version.map b/lib/power/version.map index
>>>>> c9a226614e..bd64e0828f 100644
>>>>> --- a/lib/power/version.map
>>>>> +++ b/lib/power/version.map
>>>>> @@ -51,4 +51,18 @@ EXPERIMENTAL {
>>>>>         rte_power_set_uncore_env;
>>>>>         rte_power_uncore_freqs;
>>>>>         rte_power_unset_uncore_env;
>>>>> +     # added in 24.07
>>>> 24.07-->24.11?
>>>>> +     rte_power_logtype;
>>>>> +};
>>>>> +
>>>>> +INTERNAL {
>>>>> +     global:
>>>>> +
>>>>> +     rte_power_register_ops;
>>>>> +     cpufreq_check_scaling_driver;
>>>>> +     power_set_governor;
>>>>> +     open_core_sysfs_file;
>>>>> +     read_core_sysfs_u32;
>>>>> +     read_core_sysfs_s;
>>>>> +     write_core_sysfs_s;
>>>>>     };
  
Tummala, Sivaprasad Oct. 7, 2024, 7:25 p.m. UTC | #7
[AMD Official Use Only - AMD Internal Distribution Only]

Hi Stephen,

> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Monday, August 26, 2024 8:56 PM
> To: Tummala, Sivaprasad <Sivaprasad.Tummala@amd.com>
> Cc: david.hunt@intel.com; anatoly.burakov@intel.com; jerinj@marvell.com;
> radu.nicolau@intel.com; gakhil@marvell.com; cristian.dumitrescu@intel.com; Yigit,
> Ferruh <Ferruh.Yigit@amd.com>; konstantin.ananyev@huawei.com;
> dev@dpdk.org
> Subject: Re: [PATCH v2 1/4] power: refactor core power management library
>
> Caution: This message originated from an External Source. Use proper caution
> when opening attachments, clicking links, or responding.
>
>
> On Mon, 26 Aug 2024 13:06:46 +0000
> Sivaprasad Tummala <sivaprasad.tummala@amd.com> wrote:
>
> > +static struct rte_power_core_ops acpi_ops = {
> > +     .name = "acpi",
> > +     .init = power_acpi_cpufreq_init,
> > +     .exit = power_acpi_cpufreq_exit,
> > +     .check_env_support = power_acpi_cpufreq_check_supported,
> > +     .get_avail_freqs = power_acpi_cpufreq_freqs,
> > +     .get_freq = power_acpi_cpufreq_get_freq,
> > +     .set_freq = power_acpi_cpufreq_set_freq,
> > +     .freq_down = power_acpi_cpufreq_freq_down,
> > +     .freq_up = power_acpi_cpufreq_freq_up,
> > +     .freq_max = power_acpi_cpufreq_freq_max,
> > +     .freq_min = power_acpi_cpufreq_freq_min,
> > +     .turbo_status = power_acpi_turbo_status,
> > +     .enable_turbo = power_acpi_enable_turbo,
> > +     .disable_turbo = power_acpi_disable_turbo,
> > +     .get_caps = power_acpi_get_capabilities };
> > +
>
> Can this be made const?
> It is good for security and overall safety to have structures with function pointers
> marked const.
The struct relies on dynamic list operations, it makes sense to keep it mutable.
This will ensure we can effectively manage the operations as needed without
running into issues with read-only restrictions.
  

Patch

diff --git a/drivers/meson.build b/drivers/meson.build
index 66931d4241..9d77e0deab 100644
--- a/drivers/meson.build
+++ b/drivers/meson.build
@@ -29,6 +29,7 @@  subdirs = [
         'event',          # depends on common, bus, mempool and net.
         'baseband',       # depends on common and bus.
         'gpu',            # depends on common and bus.
+        'power',          # depends on common (in future).
 ]
 
 if meson.is_cross_build()
diff --git a/lib/power/power_acpi_cpufreq.c b/drivers/power/acpi/acpi_cpufreq.c
similarity index 95%
rename from lib/power/power_acpi_cpufreq.c
rename to drivers/power/acpi/acpi_cpufreq.c
index 81996e1c13..8637c69703 100644
--- a/lib/power/power_acpi_cpufreq.c
+++ b/drivers/power/acpi/acpi_cpufreq.c
@@ -10,7 +10,7 @@ 
 #include <rte_stdatomic.h>
 #include <rte_string_fns.h>
 
-#include "power_acpi_cpufreq.h"
+#include "acpi_cpufreq.h"
 #include "power_common.h"
 
 #define STR_SIZE     1024
@@ -577,3 +577,23 @@  int power_acpi_get_capabilities(unsigned int lcore_id,
 
 	return 0;
 }
+
+static struct rte_power_core_ops acpi_ops = {
+	.name = "acpi",
+	.init = power_acpi_cpufreq_init,
+	.exit = power_acpi_cpufreq_exit,
+	.check_env_support = power_acpi_cpufreq_check_supported,
+	.get_avail_freqs = power_acpi_cpufreq_freqs,
+	.get_freq = power_acpi_cpufreq_get_freq,
+	.set_freq = power_acpi_cpufreq_set_freq,
+	.freq_down = power_acpi_cpufreq_freq_down,
+	.freq_up = power_acpi_cpufreq_freq_up,
+	.freq_max = power_acpi_cpufreq_freq_max,
+	.freq_min = power_acpi_cpufreq_freq_min,
+	.turbo_status = power_acpi_turbo_status,
+	.enable_turbo = power_acpi_enable_turbo,
+	.disable_turbo = power_acpi_disable_turbo,
+	.get_caps = power_acpi_get_capabilities
+};
+
+RTE_POWER_REGISTER_OPS(acpi_ops);
diff --git a/lib/power/power_acpi_cpufreq.h b/drivers/power/acpi/acpi_cpufreq.h
similarity index 98%
rename from lib/power/power_acpi_cpufreq.h
rename to drivers/power/acpi/acpi_cpufreq.h
index 682fd9278c..1194a7e2a5 100644
--- a/lib/power/power_acpi_cpufreq.h
+++ b/drivers/power/acpi/acpi_cpufreq.h
@@ -2,15 +2,15 @@ 
  * Copyright(c) 2010-2014 Intel Corporation
  */
 
-#ifndef _POWER_ACPI_CPUFREQ_H
-#define _POWER_ACPI_CPUFREQ_H
+#ifndef _ACPI_CPUFREQ_H
+#define _ACPI_CPUFREQ_H
 
 /**
  * @file
  * RTE Power Management via userspace ACPI cpufreq
  */
 
-#include "rte_power.h"
+#include "rte_power_core_ops.h"
 
 /**
  * Check if ACPI power management is supported.
diff --git a/drivers/power/acpi/meson.build b/drivers/power/acpi/meson.build
new file mode 100644
index 0000000000..f5afc893ce
--- /dev/null
+++ b/drivers/power/acpi/meson.build
@@ -0,0 +1,10 @@ 
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2024 Advanced Micro Devices, Inc.
+
+if not is_linux
+    build = false
+    reason = 'only supported on Linux'
+endif
+sources = files('acpi_cpufreq.c')
+
+deps += ['power']
diff --git a/lib/power/power_amd_pstate_cpufreq.c b/drivers/power/amd_pstate/amd_pstate_cpufreq.c
similarity index 95%
rename from lib/power/power_amd_pstate_cpufreq.c
rename to drivers/power/amd_pstate/amd_pstate_cpufreq.c
index 090a0d96cb..f571f4184a 100644
--- a/lib/power/power_amd_pstate_cpufreq.c
+++ b/drivers/power/amd_pstate/amd_pstate_cpufreq.c
@@ -1,7 +1,7 @@ 
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2010-2021 Intel Corporation
  * Copyright(c) 2021 Arm Limited
- * Copyright(c) 2023 Amd Limited
+ * Copyright(c) 2024 Advanced Micro Devices, Inc.
  */
 
 #include <stdlib.h>
@@ -9,7 +9,7 @@ 
 #include <rte_memcpy.h>
 #include <rte_stdatomic.h>
 
-#include "power_amd_pstate_cpufreq.h"
+#include "amd_pstate_cpufreq.h"
 #include "power_common.h"
 
 /* macros used for rounding frequency to nearest 1000 */
@@ -700,3 +700,23 @@  power_amd_pstate_get_capabilities(unsigned int lcore_id,
 
 	return 0;
 }
+
+static struct rte_power_core_ops amd_pstate_ops = {
+	.name = "amd-pstate",
+	.init = power_amd_pstate_cpufreq_init,
+	.exit = power_amd_pstate_cpufreq_exit,
+	.check_env_support = power_amd_pstate_cpufreq_check_supported,
+	.get_avail_freqs = power_amd_pstate_cpufreq_freqs,
+	.get_freq = power_amd_pstate_cpufreq_get_freq,
+	.set_freq = power_amd_pstate_cpufreq_set_freq,
+	.freq_down = power_amd_pstate_cpufreq_freq_down,
+	.freq_up = power_amd_pstate_cpufreq_freq_up,
+	.freq_max = power_amd_pstate_cpufreq_freq_max,
+	.freq_min = power_amd_pstate_cpufreq_freq_min,
+	.turbo_status = power_amd_pstate_turbo_status,
+	.enable_turbo = power_amd_pstate_enable_turbo,
+	.disable_turbo = power_amd_pstate_disable_turbo,
+	.get_caps = power_amd_pstate_get_capabilities
+};
+
+RTE_POWER_REGISTER_OPS(amd_pstate_ops);
diff --git a/lib/power/power_amd_pstate_cpufreq.h b/drivers/power/amd_pstate/amd_pstate_cpufreq.h
similarity index 97%
rename from lib/power/power_amd_pstate_cpufreq.h
rename to drivers/power/amd_pstate/amd_pstate_cpufreq.h
index b02f9f98e4..b04b2f28c0 100644
--- a/lib/power/power_amd_pstate_cpufreq.h
+++ b/drivers/power/amd_pstate/amd_pstate_cpufreq.h
@@ -1,18 +1,18 @@ 
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2010-2021 Intel Corporation
  * Copyright(c) 2021 Arm Limited
- * Copyright(c) 2023 Amd Limited
+ * Copyright(c) 2024 Advanced Micro Devices, Inc.
  */
 
-#ifndef _POWER_AMD_PSTATE_CPUFREQ_H
-#define _POWER_AMD_PSTATE_CPUFREQ_H
+#ifndef _AMD_PSTATE_CPUFREQ_H
+#define _AMD_PSTATE_CPUFREQ_H
 
 /**
  * @file
  * RTE Power Management via userspace AMD pstate cpufreq
  */
 
-#include "rte_power.h"
+#include "rte_power_core_ops.h"
 
 /**
  * Check if amd p-state power management is supported.
diff --git a/drivers/power/amd_pstate/meson.build b/drivers/power/amd_pstate/meson.build
new file mode 100644
index 0000000000..acaf20b388
--- /dev/null
+++ b/drivers/power/amd_pstate/meson.build
@@ -0,0 +1,10 @@ 
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2024 Advanced Micro Devices, Inc.
+
+if not is_linux
+    build = false
+    reason = 'only supported on Linux'
+endif
+sources = files('amd_pstate_cpufreq.c')
+
+deps += ['power']
diff --git a/lib/power/power_cppc_cpufreq.c b/drivers/power/cppc/cppc_cpufreq.c
similarity index 95%
rename from lib/power/power_cppc_cpufreq.c
rename to drivers/power/cppc/cppc_cpufreq.c
index 32aaacb948..775b8f4434 100644
--- a/lib/power/power_cppc_cpufreq.c
+++ b/drivers/power/cppc/cppc_cpufreq.c
@@ -8,7 +8,7 @@ 
 #include <rte_memcpy.h>
 #include <rte_stdatomic.h>
 
-#include "power_cppc_cpufreq.h"
+#include "cppc_cpufreq.h"
 #include "power_common.h"
 
 /* macros used for rounding frequency to nearest 100000 */
@@ -685,3 +685,23 @@  power_cppc_get_capabilities(unsigned int lcore_id,
 
 	return 0;
 }
+
+static struct rte_power_core_ops cppc_ops = {
+	.name = "cppc",
+	.init = power_cppc_cpufreq_init,
+	.exit = power_cppc_cpufreq_exit,
+	.check_env_support = power_cppc_cpufreq_check_supported,
+	.get_avail_freqs = power_cppc_cpufreq_freqs,
+	.get_freq = power_cppc_cpufreq_get_freq,
+	.set_freq = power_cppc_cpufreq_set_freq,
+	.freq_down = power_cppc_cpufreq_freq_down,
+	.freq_up = power_cppc_cpufreq_freq_up,
+	.freq_max = power_cppc_cpufreq_freq_max,
+	.freq_min = power_cppc_cpufreq_freq_min,
+	.turbo_status = power_cppc_turbo_status,
+	.enable_turbo = power_cppc_enable_turbo,
+	.disable_turbo = power_cppc_disable_turbo,
+	.get_caps = power_cppc_get_capabilities
+};
+
+RTE_POWER_REGISTER_OPS(cppc_ops);
diff --git a/lib/power/power_cppc_cpufreq.h b/drivers/power/cppc/cppc_cpufreq.h
similarity index 97%
rename from lib/power/power_cppc_cpufreq.h
rename to drivers/power/cppc/cppc_cpufreq.h
index f4121b237e..d6e32fdd47 100644
--- a/lib/power/power_cppc_cpufreq.h
+++ b/drivers/power/cppc/cppc_cpufreq.h
@@ -3,15 +3,15 @@ 
  * Copyright(c) 2021 Arm Limited
  */
 
-#ifndef _POWER_CPPC_CPUFREQ_H
-#define _POWER_CPPC_CPUFREQ_H
+#ifndef _CPPC_CPUFREQ_H
+#define _CPPC_CPUFREQ_H
 
 /**
  * @file
  * RTE Power Management via userspace CPPC cpufreq
  */
 
-#include "rte_power.h"
+#include "rte_power_core_ops.h"
 
 /**
  * Check if CPPC power management is supported.
@@ -215,4 +215,4 @@  int power_cppc_disable_turbo(unsigned int lcore_id);
 int power_cppc_get_capabilities(unsigned int lcore_id,
 		struct rte_power_core_capabilities *caps);
 
-#endif /* _POWER_CPPC_CPUFREQ_H */
+#endif /* _CPPC_CPUFREQ_H */
diff --git a/drivers/power/cppc/meson.build b/drivers/power/cppc/meson.build
new file mode 100644
index 0000000000..f1948cd424
--- /dev/null
+++ b/drivers/power/cppc/meson.build
@@ -0,0 +1,10 @@ 
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2024 Advanced Micro Devices, Inc.
+
+if not is_linux
+    build = false
+    reason = 'only supported on Linux'
+endif
+sources = files('cppc_cpufreq.c')
+
+deps += ['power']
diff --git a/lib/power/guest_channel.c b/drivers/power/kvm_vm/guest_channel.c
similarity index 100%
rename from lib/power/guest_channel.c
rename to drivers/power/kvm_vm/guest_channel.c
diff --git a/lib/power/guest_channel.h b/drivers/power/kvm_vm/guest_channel.h
similarity index 100%
rename from lib/power/guest_channel.h
rename to drivers/power/kvm_vm/guest_channel.h
diff --git a/lib/power/power_kvm_vm.c b/drivers/power/kvm_vm/kvm_vm.c
similarity index 82%
rename from lib/power/power_kvm_vm.c
rename to drivers/power/kvm_vm/kvm_vm.c
index f15be8fac5..a1342dcd8b 100644
--- a/lib/power/power_kvm_vm.c
+++ b/drivers/power/kvm_vm/kvm_vm.c
@@ -9,7 +9,7 @@ 
 #include "rte_power_guest_channel.h"
 #include "guest_channel.h"
 #include "power_common.h"
-#include "power_kvm_vm.h"
+#include "kvm_vm.h"
 
 #define FD_PATH "/dev/virtio-ports/virtio.serial.port.poweragent"
 
@@ -137,3 +137,23 @@  int power_kvm_vm_get_capabilities(__rte_unused unsigned int lcore_id,
 	POWER_LOG(ERR, "rte_power_get_capabilities is not implemented for Virtual Machine Power Management");
 	return -ENOTSUP;
 }
+
+static struct rte_power_core_ops kvm_vm_ops = {
+	.name = "kvm-vm",
+	.init = power_kvm_vm_init,
+	.exit = power_kvm_vm_exit,
+	.check_env_support = power_kvm_vm_check_supported,
+	.get_avail_freqs = power_kvm_vm_freqs,
+	.get_freq = power_kvm_vm_get_freq,
+	.set_freq = power_kvm_vm_set_freq,
+	.freq_down = power_kvm_vm_freq_down,
+	.freq_up = power_kvm_vm_freq_up,
+	.freq_max = power_kvm_vm_freq_max,
+	.freq_min = power_kvm_vm_freq_min,
+	.turbo_status = power_kvm_vm_turbo_status,
+	.enable_turbo = power_kvm_vm_enable_turbo,
+	.disable_turbo = power_kvm_vm_disable_turbo,
+	.get_caps = power_kvm_vm_get_capabilities
+};
+
+RTE_POWER_REGISTER_OPS(kvm_vm_ops);
diff --git a/lib/power/power_kvm_vm.h b/drivers/power/kvm_vm/kvm_vm.h
similarity index 98%
rename from lib/power/power_kvm_vm.h
rename to drivers/power/kvm_vm/kvm_vm.h
index 303fcc041b..64086a67e7 100644
--- a/lib/power/power_kvm_vm.h
+++ b/drivers/power/kvm_vm/kvm_vm.h
@@ -2,15 +2,15 @@ 
  * Copyright(c) 2010-2014 Intel Corporation
  */
 
-#ifndef _POWER_KVM_VM_H
-#define _POWER_KVM_VM_H
+#ifndef _KVM_VM_H
+#define _KVM_VM_H
 
 /**
  * @file
  * RTE Power Management KVM VM
  */
 
-#include "rte_power.h"
+#include "rte_power_core_ops.h"
 
 /**
  * Check if KVM power management is supported.
diff --git a/drivers/power/kvm_vm/meson.build b/drivers/power/kvm_vm/meson.build
new file mode 100644
index 0000000000..405524ce7c
--- /dev/null
+++ b/drivers/power/kvm_vm/meson.build
@@ -0,0 +1,16 @@ 
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(C) 2024 Advanced Micro Devices, Inc.
+#
+
+if not is_linux
+    build = false
+    reason = 'only supported on Linux'
+    subdir_done()
+endif
+
+sources = files(
+        'guest_channel.c',
+        'kvm_vm.c',
+)
+
+deps += ['power']
diff --git a/drivers/power/meson.build b/drivers/power/meson.build
new file mode 100644
index 0000000000..8c7215c639
--- /dev/null
+++ b/drivers/power/meson.build
@@ -0,0 +1,12 @@ 
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2024 Advanced Micro Devices, Inc.
+
+drivers = [
+        'acpi',
+        'amd_pstate',
+        'cppc',
+        'kvm_vm',
+        'pstate'
+]
+
+std_deps = ['power']
diff --git a/drivers/power/pstate/meson.build b/drivers/power/pstate/meson.build
new file mode 100644
index 0000000000..9cd47833fb
--- /dev/null
+++ b/drivers/power/pstate/meson.build
@@ -0,0 +1,10 @@ 
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2024 Advanced Micro Devices, Inc.
+
+if not is_linux
+    build = false
+    reason = 'only supported on Linux'
+endif
+sources = files('pstate_cpufreq.c')
+
+deps += ['power']
diff --git a/lib/power/power_pstate_cpufreq.c b/drivers/power/pstate/pstate_cpufreq.c
similarity index 96%
rename from lib/power/power_pstate_cpufreq.c
rename to drivers/power/pstate/pstate_cpufreq.c
index 2343121621..c32b1adabc 100644
--- a/lib/power/power_pstate_cpufreq.c
+++ b/drivers/power/pstate/pstate_cpufreq.c
@@ -15,7 +15,7 @@ 
 #include <rte_stdatomic.h>
 
 #include "rte_power_pmd_mgmt.h"
-#include "power_pstate_cpufreq.h"
+#include "pstate_cpufreq.h"
 #include "power_common.h"
 
 /* macros used for rounding frequency to nearest 100000 */
@@ -888,3 +888,23 @@  int power_pstate_get_capabilities(unsigned int lcore_id,
 
 	return 0;
 }
+
+static struct rte_power_core_ops pstate_ops = {
+	.name = "pstate",
+	.init = power_pstate_cpufreq_init,
+	.exit = power_pstate_cpufreq_exit,
+	.check_env_support = power_pstate_cpufreq_check_supported,
+	.get_avail_freqs = power_pstate_cpufreq_freqs,
+	.get_freq = power_pstate_cpufreq_get_freq,
+	.set_freq = power_pstate_cpufreq_set_freq,
+	.freq_down = power_pstate_cpufreq_freq_down,
+	.freq_up = power_pstate_cpufreq_freq_up,
+	.freq_max = power_pstate_cpufreq_freq_max,
+	.freq_min = power_pstate_cpufreq_freq_min,
+	.turbo_status = power_pstate_turbo_status,
+	.enable_turbo = power_pstate_enable_turbo,
+	.disable_turbo = power_pstate_disable_turbo,
+	.get_caps = power_pstate_get_capabilities
+};
+
+RTE_POWER_REGISTER_OPS(pstate_ops);
diff --git a/lib/power/power_pstate_cpufreq.h b/drivers/power/pstate/pstate_cpufreq.h
similarity index 98%
rename from lib/power/power_pstate_cpufreq.h
rename to drivers/power/pstate/pstate_cpufreq.h
index 7bf64a518c..8b67b2da21 100644
--- a/lib/power/power_pstate_cpufreq.h
+++ b/drivers/power/pstate/pstate_cpufreq.h
@@ -2,15 +2,15 @@ 
  * Copyright(c) 2018 Intel Corporation
  */
 
-#ifndef _POWER_PSTATE_CPUFREQ_H
-#define _POWER_PSTATE_CPUFREQ_H
+#ifndef _PSTATE_CPUFREQ_H
+#define _PSTATE_CPUFREQ_H
 
 /**
  * @file
  * RTE Power Management via Intel Pstate driver
  */
 
-#include "rte_power.h"
+#include "rte_power_core_ops.h"
 
 /**
  * Check if pstate power management is supported.
diff --git a/lib/power/meson.build b/lib/power/meson.build
index b8426589b2..f3e3451cdc 100644
--- a/lib/power/meson.build
+++ b/lib/power/meson.build
@@ -12,20 +12,15 @@  if not is_linux
     reason = 'only supported on Linux'
 endif
 sources = files(
-        'guest_channel.c',
-        'power_acpi_cpufreq.c',
-        'power_amd_pstate_cpufreq.c',
         'power_common.c',
-        'power_cppc_cpufreq.c',
-        'power_kvm_vm.c',
         'power_intel_uncore.c',
-        'power_pstate_cpufreq.c',
         'rte_power.c',
         'rte_power_uncore.c',
         'rte_power_pmd_mgmt.c',
 )
 headers = files(
         'rte_power.h',
+        'rte_power_core_ops.h',
         'rte_power_guest_channel.h',
         'rte_power_pmd_mgmt.h',
         'rte_power_uncore.h',
diff --git a/lib/power/power_common.c b/lib/power/power_common.c
index 590986d5ef..6c06411e8b 100644
--- a/lib/power/power_common.c
+++ b/lib/power/power_common.c
@@ -12,7 +12,7 @@ 
 
 #include "power_common.h"
 
-RTE_LOG_REGISTER_DEFAULT(power_logtype, INFO);
+RTE_LOG_REGISTER_DEFAULT(rte_power_logtype, INFO);
 
 #define POWER_SYSFILE_SCALING_DRIVER   \
 		"/sys/devices/system/cpu/cpu%u/cpufreq/scaling_driver"
diff --git a/lib/power/power_common.h b/lib/power/power_common.h
index 83f742f42a..767686ee12 100644
--- a/lib/power/power_common.h
+++ b/lib/power/power_common.h
@@ -6,12 +6,13 @@ 
 #define _POWER_COMMON_H_
 
 #include <rte_common.h>
+#include <rte_compat.h>
 #include <rte_log.h>
 
 #define RTE_POWER_INVALID_FREQ_INDEX (~0)
 
-extern int power_logtype;
-#define RTE_LOGTYPE_POWER power_logtype
+extern int rte_power_logtype;
+#define RTE_LOGTYPE_POWER rte_power_logtype
 #define POWER_LOG(level, ...) \
 	RTE_LOG_LINE(level, POWER, "" __VA_ARGS__)
 
@@ -23,13 +24,24 @@  extern int power_logtype;
 #endif
 
 /* check if scaling driver matches one we want */
+__rte_internal
 int cpufreq_check_scaling_driver(const char *driver);
+
+__rte_internal
 int power_set_governor(unsigned int lcore_id, const char *new_governor,
 		char *orig_governor, size_t orig_governor_len);
+
+__rte_internal
 int open_core_sysfs_file(FILE **f, const char *mode, const char *format, ...)
 		__rte_format_printf(3, 4);
+
+__rte_internal
 int read_core_sysfs_u32(FILE *f, uint32_t *val);
+
+__rte_internal
 int read_core_sysfs_s(FILE *f, char *buf, unsigned int len);
+
+__rte_internal
 int write_core_sysfs_s(FILE *f, const char *str);
 
 #endif /* _POWER_COMMON_H_ */
diff --git a/lib/power/rte_power.c b/lib/power/rte_power.c
index 36c3f3da98..2bf6d40517 100644
--- a/lib/power/rte_power.c
+++ b/lib/power/rte_power.c
@@ -8,153 +8,86 @@ 
 #include <rte_spinlock.h>
 
 #include "rte_power.h"
-#include "power_acpi_cpufreq.h"
-#include "power_cppc_cpufreq.h"
 #include "power_common.h"
-#include "power_kvm_vm.h"
-#include "power_pstate_cpufreq.h"
-#include "power_amd_pstate_cpufreq.h"
 
-enum power_management_env global_default_env = PM_ENV_NOT_SET;
+static enum power_management_env global_default_env = PM_ENV_NOT_SET;
+static struct rte_power_core_ops *global_power_core_ops;
 
 static rte_spinlock_t global_env_cfg_lock = RTE_SPINLOCK_INITIALIZER;
+static RTE_TAILQ_HEAD(, rte_power_core_ops) core_ops_list =
+			TAILQ_HEAD_INITIALIZER(core_ops_list);
 
-/* function pointers */
-rte_power_freqs_t rte_power_freqs  = NULL;
-rte_power_get_freq_t rte_power_get_freq = NULL;
-rte_power_set_freq_t rte_power_set_freq = NULL;
-rte_power_freq_change_t rte_power_freq_up = NULL;
-rte_power_freq_change_t rte_power_freq_down = NULL;
-rte_power_freq_change_t rte_power_freq_max = NULL;
-rte_power_freq_change_t rte_power_freq_min = NULL;
-rte_power_freq_change_t rte_power_turbo_status;
-rte_power_freq_change_t rte_power_freq_enable_turbo;
-rte_power_freq_change_t rte_power_freq_disable_turbo;
-rte_power_get_capabilities_t rte_power_get_capabilities;
-
-static void
-reset_power_function_ptrs(void)
+
+const char *power_env_str[] = {
+	"not set",
+	"acpi",
+	"kvm-vm",
+	"pstate",
+	"cppc",
+	"amd-pstate"
+};
+
+/* register the ops struct in rte_power_core_ops, return 0 on success. */
+int
+rte_power_register_ops(struct rte_power_core_ops *driver_ops)
 {
-	rte_power_freqs  = NULL;
-	rte_power_get_freq = NULL;
-	rte_power_set_freq = NULL;
-	rte_power_freq_up = NULL;
-	rte_power_freq_down = NULL;
-	rte_power_freq_max = NULL;
-	rte_power_freq_min = NULL;
-	rte_power_turbo_status = NULL;
-	rte_power_freq_enable_turbo = NULL;
-	rte_power_freq_disable_turbo = NULL;
-	rte_power_get_capabilities = NULL;
+	if (!driver_ops->init || !driver_ops->exit ||
+		!driver_ops->check_env_support || !driver_ops->get_avail_freqs ||
+		!driver_ops->get_freq || !driver_ops->set_freq ||
+		!driver_ops->freq_up || !driver_ops->freq_down ||
+		!driver_ops->freq_max || !driver_ops->freq_min ||
+		!driver_ops->turbo_status || !driver_ops->enable_turbo ||
+		!driver_ops->disable_turbo || !driver_ops->get_caps) {
+		POWER_LOG(ERR, "Missing callbacks while registering power ops");
+		return -EINVAL;
+	}
+
+	TAILQ_INSERT_TAIL(&core_ops_list, driver_ops, next);
+
+	return 0;
 }
 
 int
 rte_power_check_env_supported(enum power_management_env env)
 {
-	switch (env) {
-	case PM_ENV_ACPI_CPUFREQ:
-		return power_acpi_cpufreq_check_supported();
-	case PM_ENV_PSTATE_CPUFREQ:
-		return power_pstate_cpufreq_check_supported();
-	case PM_ENV_KVM_VM:
-		return power_kvm_vm_check_supported();
-	case PM_ENV_CPPC_CPUFREQ:
-		return power_cppc_cpufreq_check_supported();
-	case PM_ENV_AMD_PSTATE_CPUFREQ:
-		return power_amd_pstate_cpufreq_check_supported();
-	default:
-		rte_errno = EINVAL;
-		return -1;
-	}
+	struct rte_power_core_ops *ops;
+
+	if (env >= RTE_DIM(power_env_str))
+		return 0;
+
+	RTE_TAILQ_FOREACH(ops, &core_ops_list, next)
+		if (strncmp(ops->name, power_env_str[env],
+				RTE_POWER_DRIVER_NAMESZ) == 0)
+			return ops->check_env_support();
+
+	return 0;
 }
 
 int
 rte_power_set_env(enum power_management_env env)
 {
+	struct rte_power_core_ops *ops;
+	int ret = -1;
+
 	rte_spinlock_lock(&global_env_cfg_lock);
 
 	if (global_default_env != PM_ENV_NOT_SET) {
 		POWER_LOG(ERR, "Power Management Environment already set.");
-		rte_spinlock_unlock(&global_env_cfg_lock);
-		return -1;
-	}
-
-	int ret = 0;
-
-	if (env == PM_ENV_ACPI_CPUFREQ) {
-		rte_power_freqs = power_acpi_cpufreq_freqs;
-		rte_power_get_freq = power_acpi_cpufreq_get_freq;
-		rte_power_set_freq = power_acpi_cpufreq_set_freq;
-		rte_power_freq_up = power_acpi_cpufreq_freq_up;
-		rte_power_freq_down = power_acpi_cpufreq_freq_down;
-		rte_power_freq_min = power_acpi_cpufreq_freq_min;
-		rte_power_freq_max = power_acpi_cpufreq_freq_max;
-		rte_power_turbo_status = power_acpi_turbo_status;
-		rte_power_freq_enable_turbo = power_acpi_enable_turbo;
-		rte_power_freq_disable_turbo = power_acpi_disable_turbo;
-		rte_power_get_capabilities = power_acpi_get_capabilities;
-	} else if (env == PM_ENV_KVM_VM) {
-		rte_power_freqs = power_kvm_vm_freqs;
-		rte_power_get_freq = power_kvm_vm_get_freq;
-		rte_power_set_freq = power_kvm_vm_set_freq;
-		rte_power_freq_up = power_kvm_vm_freq_up;
-		rte_power_freq_down = power_kvm_vm_freq_down;
-		rte_power_freq_min = power_kvm_vm_freq_min;
-		rte_power_freq_max = power_kvm_vm_freq_max;
-		rte_power_turbo_status = power_kvm_vm_turbo_status;
-		rte_power_freq_enable_turbo = power_kvm_vm_enable_turbo;
-		rte_power_freq_disable_turbo = power_kvm_vm_disable_turbo;
-		rte_power_get_capabilities = power_kvm_vm_get_capabilities;
-	} else if (env == PM_ENV_PSTATE_CPUFREQ) {
-		rte_power_freqs = power_pstate_cpufreq_freqs;
-		rte_power_get_freq = power_pstate_cpufreq_get_freq;
-		rte_power_set_freq = power_pstate_cpufreq_set_freq;
-		rte_power_freq_up = power_pstate_cpufreq_freq_up;
-		rte_power_freq_down = power_pstate_cpufreq_freq_down;
-		rte_power_freq_min = power_pstate_cpufreq_freq_min;
-		rte_power_freq_max = power_pstate_cpufreq_freq_max;
-		rte_power_turbo_status = power_pstate_turbo_status;
-		rte_power_freq_enable_turbo = power_pstate_enable_turbo;
-		rte_power_freq_disable_turbo = power_pstate_disable_turbo;
-		rte_power_get_capabilities = power_pstate_get_capabilities;
-
-	} else if (env == PM_ENV_CPPC_CPUFREQ) {
-		rte_power_freqs = power_cppc_cpufreq_freqs;
-		rte_power_get_freq = power_cppc_cpufreq_get_freq;
-		rte_power_set_freq = power_cppc_cpufreq_set_freq;
-		rte_power_freq_up = power_cppc_cpufreq_freq_up;
-		rte_power_freq_down = power_cppc_cpufreq_freq_down;
-		rte_power_freq_min = power_cppc_cpufreq_freq_min;
-		rte_power_freq_max = power_cppc_cpufreq_freq_max;
-		rte_power_turbo_status = power_cppc_turbo_status;
-		rte_power_freq_enable_turbo = power_cppc_enable_turbo;
-		rte_power_freq_disable_turbo = power_cppc_disable_turbo;
-		rte_power_get_capabilities = power_cppc_get_capabilities;
-	} else if (env == PM_ENV_AMD_PSTATE_CPUFREQ) {
-		rte_power_freqs = power_amd_pstate_cpufreq_freqs;
-		rte_power_get_freq = power_amd_pstate_cpufreq_get_freq;
-		rte_power_set_freq = power_amd_pstate_cpufreq_set_freq;
-		rte_power_freq_up = power_amd_pstate_cpufreq_freq_up;
-		rte_power_freq_down = power_amd_pstate_cpufreq_freq_down;
-		rte_power_freq_min = power_amd_pstate_cpufreq_freq_min;
-		rte_power_freq_max = power_amd_pstate_cpufreq_freq_max;
-		rte_power_turbo_status = power_amd_pstate_turbo_status;
-		rte_power_freq_enable_turbo = power_amd_pstate_enable_turbo;
-		rte_power_freq_disable_turbo = power_amd_pstate_disable_turbo;
-		rte_power_get_capabilities = power_amd_pstate_get_capabilities;
-	} else {
-		POWER_LOG(ERR, "Invalid Power Management Environment(%d) set",
-				env);
-		ret = -1;
+		goto out;
 	}
 
-	if (ret == 0)
-		global_default_env = env;
-	else {
-		global_default_env = PM_ENV_NOT_SET;
-		reset_power_function_ptrs();
-	}
+	RTE_TAILQ_FOREACH(ops, &core_ops_list, next)
+		if (strncmp(ops->name, power_env_str[env],
+				RTE_POWER_DRIVER_NAMESZ) == 0) {
+			global_power_core_ops = ops;
+			global_default_env = env;
+			ret = 0;
+			goto out;
+		}
+	POWER_LOG(ERR, "Invalid Power Management Environment(%d) set",
+			env);
 
+out:
 	rte_spinlock_unlock(&global_env_cfg_lock);
 	return ret;
 }
@@ -164,94 +97,66 @@  rte_power_unset_env(void)
 {
 	rte_spinlock_lock(&global_env_cfg_lock);
 	global_default_env = PM_ENV_NOT_SET;
-	reset_power_function_ptrs();
+	global_power_core_ops = NULL;
 	rte_spinlock_unlock(&global_env_cfg_lock);
 }
 
 enum power_management_env
-rte_power_get_env(void) {
+rte_power_get_env(void)
+{
 	return global_default_env;
 }
 
-int
-rte_power_init(unsigned int lcore_id)
+struct rte_power_core_ops *
+rte_power_get_core_ops(void)
 {
-	int ret = -1;
+	RTE_ASSERT(global_power_core_ops != NULL);
 
-	switch (global_default_env) {
-	case PM_ENV_ACPI_CPUFREQ:
-		return power_acpi_cpufreq_init(lcore_id);
-	case PM_ENV_KVM_VM:
-		return power_kvm_vm_init(lcore_id);
-	case PM_ENV_PSTATE_CPUFREQ:
-		return power_pstate_cpufreq_init(lcore_id);
-	case PM_ENV_CPPC_CPUFREQ:
-		return power_cppc_cpufreq_init(lcore_id);
-	case PM_ENV_AMD_PSTATE_CPUFREQ:
-		return power_amd_pstate_cpufreq_init(lcore_id);
-	default:
-		POWER_LOG(INFO, "Env isn't set yet!");
-	}
+	return global_power_core_ops;
+}
 
-	/* Auto detect Environment */
-	POWER_LOG(INFO, "Attempting to initialise ACPI cpufreq power management...");
-	ret = power_acpi_cpufreq_init(lcore_id);
-	if (ret == 0) {
-		rte_power_set_env(PM_ENV_ACPI_CPUFREQ);
-		goto out;
-	}
+int
+rte_power_init(unsigned int lcore_id)
+{
+	struct rte_power_core_ops *ops;
+	uint8_t env;
 
-	POWER_LOG(INFO, "Attempting to initialise PSTAT power management...");
-	ret = power_pstate_cpufreq_init(lcore_id);
-	if (ret == 0) {
-		rte_power_set_env(PM_ENV_PSTATE_CPUFREQ);
-		goto out;
-	}
+	if (global_default_env != PM_ENV_NOT_SET)
+		return global_power_core_ops->init(lcore_id);
 
-	POWER_LOG(INFO, "Attempting to initialise AMD PSTATE power management...");
-	ret = power_amd_pstate_cpufreq_init(lcore_id);
-	if (ret == 0) {
-		rte_power_set_env(PM_ENV_AMD_PSTATE_CPUFREQ);
-		goto out;
-	}
+	POWER_LOG(INFO, "Env isn't set yet!");
 
-	POWER_LOG(INFO, "Attempting to initialise CPPC power management...");
-	ret = power_cppc_cpufreq_init(lcore_id);
-	if (ret == 0) {
-		rte_power_set_env(PM_ENV_CPPC_CPUFREQ);
-		goto out;
-	}
+	/* Auto detect Environment */
+	RTE_TAILQ_FOREACH(ops, &core_ops_list, next)
+		if (ops) {
+			POWER_LOG(INFO,
+				"Attempting to initialise %s cpufreq power management...",
+				ops->name);
+			if (ops->init(lcore_id) == 0) {
+				for (env = 0; env < RTE_DIM(power_env_str); env++)
+					if (strncmp(ops->name, power_env_str[env],
+							RTE_POWER_DRIVER_NAMESZ) == 0) {
+						rte_power_set_env(env);
+						return 0;
+					}
+			}
+		}
+
+	POWER_LOG(ERR,
+		"Unable to set Power Management Environment for lcore %u",
+		lcore_id);
 
-	POWER_LOG(INFO, "Attempting to initialise VM power management...");
-	ret = power_kvm_vm_init(lcore_id);
-	if (ret == 0) {
-		rte_power_set_env(PM_ENV_KVM_VM);
-		goto out;
-	}
-	POWER_LOG(ERR, "Unable to set Power Management Environment for lcore "
-			"%u", lcore_id);
-out:
-	return ret;
+	return -1;
 }
 
 int
 rte_power_exit(unsigned int lcore_id)
 {
-	switch (global_default_env) {
-	case PM_ENV_ACPI_CPUFREQ:
-		return power_acpi_cpufreq_exit(lcore_id);
-	case PM_ENV_KVM_VM:
-		return power_kvm_vm_exit(lcore_id);
-	case PM_ENV_PSTATE_CPUFREQ:
-		return power_pstate_cpufreq_exit(lcore_id);
-	case PM_ENV_CPPC_CPUFREQ:
-		return power_cppc_cpufreq_exit(lcore_id);
-	case PM_ENV_AMD_PSTATE_CPUFREQ:
-		return power_amd_pstate_cpufreq_exit(lcore_id);
-	default:
-		POWER_LOG(ERR, "Environment has not been set, unable to exit gracefully");
+	if (global_default_env != PM_ENV_NOT_SET)
+		return global_power_core_ops->exit(lcore_id);
 
-	}
-	return -1;
+	POWER_LOG(ERR,
+		"Environment has not been set, unable to exit gracefully");
 
+	return -1;
 }
diff --git a/lib/power/rte_power.h b/lib/power/rte_power.h
index 4fa4afe399..5e4aacf08b 100644
--- a/lib/power/rte_power.h
+++ b/lib/power/rte_power.h
@@ -1,5 +1,6 @@ 
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2010-2014 Intel Corporation
+ * Copyright(c) 2024 Advanced Micro Devices, Inc.
  */
 
 #ifndef _RTE_POWER_H
@@ -14,14 +15,21 @@ 
 #include <rte_log.h>
 #include <rte_power_guest_channel.h>
 
+#include "rte_power_core_ops.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
 /* Power Management Environment State */
-enum power_management_env {PM_ENV_NOT_SET, PM_ENV_ACPI_CPUFREQ, PM_ENV_KVM_VM,
-		PM_ENV_PSTATE_CPUFREQ, PM_ENV_CPPC_CPUFREQ,
-		PM_ENV_AMD_PSTATE_CPUFREQ};
+enum power_management_env {
+	PM_ENV_NOT_SET = 0,
+	PM_ENV_ACPI_CPUFREQ,
+	PM_ENV_KVM_VM,
+	PM_ENV_PSTATE_CPUFREQ,
+	PM_ENV_CPPC_CPUFREQ,
+	PM_ENV_AMD_PSTATE_CPUFREQ
+};
 
 /**
  * Check if a specific power management environment type is supported on a
@@ -66,6 +74,15 @@  void rte_power_unset_env(void);
  */
 enum power_management_env rte_power_get_env(void);
 
+/**
+ * @internal Get the power ops struct from its index.
+ *
+ * @return
+ *   The pointer to the ops struct in the table if registered.
+ */
+struct rte_power_core_ops *
+rte_power_get_core_ops(void);
+
 /**
  * Initialize power management for a specific lcore. If rte_power_set_env() has
  * not been called then an auto-detect of the environment will start and
@@ -108,10 +125,13 @@  int rte_power_exit(unsigned int lcore_id);
  * @return
  *  The number of available frequencies.
  */
-typedef uint32_t (*rte_power_freqs_t)(unsigned int lcore_id, uint32_t *freqs,
-		uint32_t num);
+static inline uint32_t
+rte_power_freqs(unsigned int lcore_id, uint32_t *freqs, uint32_t n)
+{
+	struct rte_power_core_ops *ops = rte_power_get_core_ops();
 
-extern rte_power_freqs_t rte_power_freqs;
+	return ops->get_avail_freqs(lcore_id, freqs, n);
+}
 
 /**
  * Return the current index of available frequencies of a specific lcore.
@@ -124,9 +144,13 @@  extern rte_power_freqs_t rte_power_freqs;
  * @return
  *  The current index of available frequencies.
  */
-typedef uint32_t (*rte_power_get_freq_t)(unsigned int lcore_id);
+static inline uint32_t
+rte_power_get_freq(unsigned int lcore_id)
+{
+	struct rte_power_core_ops *ops = rte_power_get_core_ops();
 
-extern rte_power_get_freq_t rte_power_get_freq;
+	return ops->get_freq(lcore_id);
+}
 
 /**
  * Set the new frequency for a specific lcore by indicating the index of
@@ -144,82 +168,101 @@  extern rte_power_get_freq_t rte_power_get_freq;
  *  - 0 on success without frequency changed.
  *  - Negative on error.
  */
-typedef int (*rte_power_set_freq_t)(unsigned int lcore_id, uint32_t index);
-
-extern rte_power_set_freq_t rte_power_set_freq;
+static inline uint32_t
+rte_power_set_freq(unsigned int lcore_id, uint32_t index)
+{
+	struct rte_power_core_ops *ops = rte_power_get_core_ops();
 
-/**
- * Function pointer definition for generic frequency change functions. Review
- * each environments specific documentation for usage.
- *
- * @param lcore_id
- *  lcore id.
- *
- * @return
- *  - 1 on success with frequency changed.
- *  - 0 on success without frequency changed.
- *  - Negative on error.
- */
-typedef int (*rte_power_freq_change_t)(unsigned int lcore_id);
+	return ops->set_freq(lcore_id, index);
+}
 
 /**
  * Scale up the frequency of a specific lcore according to the available
  * frequencies.
  * Review each environments specific documentation for usage.
  */
-extern rte_power_freq_change_t rte_power_freq_up;
+static inline int
+rte_power_freq_up(unsigned int lcore_id)
+{
+	struct rte_power_core_ops *ops = rte_power_get_core_ops();
+
+	return ops->freq_up(lcore_id);
+}
 
 /**
  * Scale down the frequency of a specific lcore according to the available
  * frequencies.
  * Review each environments specific documentation for usage.
  */
-extern rte_power_freq_change_t rte_power_freq_down;
+static inline int
+rte_power_freq_down(unsigned int lcore_id)
+{
+	struct rte_power_core_ops *ops = rte_power_get_core_ops();
+
+	return ops->freq_down(lcore_id);
+}
 
 /**
  * Scale up the frequency of a specific lcore to the highest according to the
  * available frequencies.
  * Review each environments specific documentation for usage.
  */
-extern rte_power_freq_change_t rte_power_freq_max;
+static inline int
+rte_power_freq_max(unsigned int lcore_id)
+{
+	struct rte_power_core_ops *ops = rte_power_get_core_ops();
+
+	return ops->freq_max(lcore_id);
+}
 
 /**
  * Scale down the frequency of a specific lcore to the lowest according to the
  * available frequencies.
  * Review each environments specific documentation for usage..
  */
-extern rte_power_freq_change_t rte_power_freq_min;
+static inline int
+rte_power_freq_min(unsigned int lcore_id)
+{
+	struct rte_power_core_ops *ops = rte_power_get_core_ops();
+
+	return ops->freq_min(lcore_id);
+}
 
 /**
  * Query the Turbo Boost status of a specific lcore.
  * Review each environments specific documentation for usage..
  */
-extern rte_power_freq_change_t rte_power_turbo_status;
+static inline int
+rte_power_turbo_status(unsigned int lcore_id)
+{
+	struct rte_power_core_ops *ops = rte_power_get_core_ops();
+
+	return ops->turbo_status(lcore_id);
+}
 
 /**
  * Enable Turbo Boost for this lcore.
  * Review each environments specific documentation for usage..
  */
-extern rte_power_freq_change_t rte_power_freq_enable_turbo;
+static inline int
+rte_power_freq_enable_turbo(unsigned int lcore_id)
+{
+	struct rte_power_core_ops *ops = rte_power_get_core_ops();
+
+	return ops->enable_turbo(lcore_id);
+}
 
 /**
  * Disable Turbo Boost for this lcore.
  * Review each environments specific documentation for usage..
  */
-extern rte_power_freq_change_t rte_power_freq_disable_turbo;
+static inline int
+rte_power_freq_disable_turbo(unsigned int lcore_id)
+{
+	struct rte_power_core_ops *ops = rte_power_get_core_ops();
 
-/**
- * Power capabilities summary.
- */
-struct rte_power_core_capabilities {
-	union {
-		uint64_t capabilities;
-		struct {
-			uint64_t turbo:1;	/**< Turbo can be enabled. */
-			uint64_t priority:1;	/**< SST-BF high freq core */
-		};
-	};
-};
+	return ops->disable_turbo(lcore_id);
+}
 
 /**
  * Returns power capabilities for a specific lcore.
@@ -235,10 +278,14 @@  struct rte_power_core_capabilities {
  *  - 0 on success.
  *  - Negative on error.
  */
-typedef int (*rte_power_get_capabilities_t)(unsigned int lcore_id,
-		struct rte_power_core_capabilities *caps);
+static inline int
+rte_power_get_capabilities(unsigned int lcore_id,
+		struct rte_power_core_capabilities *caps)
+{
+	struct rte_power_core_ops *ops = rte_power_get_core_ops();
 
-extern rte_power_get_capabilities_t rte_power_get_capabilities;
+	return ops->get_caps(lcore_id, caps);
+}
 
 #ifdef __cplusplus
 }
diff --git a/lib/power/rte_power_core_ops.h b/lib/power/rte_power_core_ops.h
new file mode 100644
index 0000000000..356a64df79
--- /dev/null
+++ b/lib/power/rte_power_core_ops.h
@@ -0,0 +1,208 @@ 
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2014 Intel Corporation
+ * Copyright(c) 2024 Advanced Micro Devices, Inc.
+ */
+
+#ifndef _RTE_POWER_CORE_OPS_H
+#define _RTE_POWER_CORE_OPS_H
+
+/**
+ * @file
+ * RTE Power Management
+ */
+
+#include <rte_common.h>
+#include <rte_log.h>
+#include <rte_compat.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#define RTE_POWER_DRIVER_NAMESZ	24
+
+/**
+ * Initialize power management for a specific lcore. If rte_power_set_env() has
+ * not been called then an auto-detect of the environment will start and
+ * initialise the corresponding resources.
+ *
+ * @param lcore_id
+ *  lcore id.
+ *
+ * @return
+ *  - 0 on success.
+ *  - Negative on error.
+ */
+typedef int (*rte_power_cpufreq_init_t)(unsigned int lcore_id);
+
+/**
+ * Exit power management on a specific lcore. This will call the environment
+ * dependent exit function.
+ *
+ * @param lcore_id
+ *  lcore id.
+ *
+ * @return
+ *  - 0 on success.
+ *  - Negative on error.
+ */
+typedef int (*rte_power_cpufreq_exit_t)(unsigned int lcore_id);
+
+/**
+ * Check if a specific power management environment type is supported on a
+ * currently running system.
+ *
+ * @return
+ *   - 1 if supported
+ *   - 0 if unsupported
+ *   - -1 if error, with rte_errno indicating reason for error.
+ */
+typedef int (*rte_power_check_env_support_t)(void);
+
+/**
+ * Get the available frequencies of a specific lcore.
+ * Function pointer definition. Review each environments
+ * specific documentation for usage.
+ *
+ * @param lcore_id
+ *  lcore id.
+ * @param freqs
+ *  The buffer array to save the frequencies.
+ * @param num
+ *  The number of frequencies to get.
+ *
+ * @return
+ *  The number of available frequencies.
+ */
+typedef uint32_t (*rte_power_freqs_t)(unsigned int lcore_id, uint32_t *freqs,
+					uint32_t num);
+
+/**
+ * Return the current index of available frequencies of a specific lcore.
+ * Function pointer definition. Review each environments
+ * specific documentation for usage.
+ *
+ * @param lcore_id
+ *  lcore id.
+ *
+ * @return
+ *  The current index of available frequencies.
+ */
+typedef uint32_t (*rte_power_get_freq_t)(unsigned int lcore_id);
+
+/**
+ * Set the new frequency for a specific lcore by indicating the index of
+ * available frequencies.
+ * Function pointer definition. Review each environments
+ * specific documentation for usage.
+ *
+ * @param lcore_id
+ *  lcore id.
+ * @param index
+ *  The index of available frequencies.
+ *
+ * @return
+ *  - 1 on success with frequency changed.
+ *  - 0 on success without frequency changed.
+ *  - Negative on error.
+ */
+typedef int (*rte_power_set_freq_t)(unsigned int lcore_id, uint32_t index);
+
+/**
+ * Function pointer definition for generic frequency change functions. Review
+ * each environments specific documentation for usage.
+ *
+ * @param lcore_id
+ *  lcore id.
+ *
+ * @return
+ *  - 1 on success with frequency changed.
+ *  - 0 on success without frequency changed.
+ *  - Negative on error.
+ */
+typedef int (*rte_power_freq_change_t)(unsigned int lcore_id);
+
+/**
+ * Function pointer definition for generic frequency change functions. Review
+ * each environments specific documentation for usage.
+ *
+ * @param lcore_id
+ *  lcore id.
+ *
+ * @return
+ *  - 1 on success with frequency changed.
+ *  - 0 on success without frequency changed.
+ *  - Negative on error.
+ */
+
+/**
+ * Power capabilities summary.
+ */
+struct rte_power_core_capabilities {
+	union {
+		uint64_t capabilities;
+		struct {
+			uint64_t turbo:1;       /**< Turbo can be enabled. */
+			uint64_t priority:1;    /**< SST-BF high freq core */
+		};
+	};
+};
+
+typedef int (*rte_power_get_capabilities_t)(unsigned int lcore_id,
+				struct rte_power_core_capabilities *caps);
+
+/** Structure defining core power operations structure */
+struct rte_power_core_ops {
+	RTE_TAILQ_ENTRY(rte_power_core_ops) next;	/**< Next in list. */
+	char name[RTE_POWER_DRIVER_NAMESZ];		/**< power mgmt driver. */
+	rte_power_cpufreq_init_t init;			/**< Initialize power management. */
+	rte_power_cpufreq_exit_t exit;			/**< Exit power management. */
+	rte_power_check_env_support_t check_env_support;/**< verify env is supported. */
+	rte_power_freqs_t get_avail_freqs;		/**< Get the available frequencies. */
+	rte_power_get_freq_t get_freq;			/**< Get frequency index. */
+	rte_power_set_freq_t set_freq;			/**< Set frequency index. */
+	rte_power_freq_change_t freq_up;		/**< Scale up frequency. */
+	rte_power_freq_change_t freq_down;		/**< Scale down frequency. */
+	rte_power_freq_change_t freq_max;		/**< Scale up frequency to highest. */
+	rte_power_freq_change_t freq_min;		/**< Scale up frequency to lowest. */
+	rte_power_freq_change_t turbo_status;		/**< Get Turbo status. */
+	rte_power_freq_change_t enable_turbo;		/**< Enable Turbo. */
+	rte_power_freq_change_t disable_turbo;		/**< Disable Turbo. */
+	rte_power_get_capabilities_t get_caps;		/**< power capabilities. */
+};
+
+/**
+ * Register power cpu frequency operations.
+ *
+ * @param ops
+ *   Pointer to an ops structure to register.
+ * @return
+ *   - >=0: Success; return the index of the ops struct in the table.
+ *   - -EINVAL - error while registering ops struct.
+ */
+__rte_internal
+int rte_power_register_ops(struct rte_power_core_ops *ops);
+
+/**
+ * Macro to statically register the ops of a cpufreq driver.
+ */
+#define RTE_POWER_REGISTER_OPS(ops)		\
+	RTE_INIT(power_hdlr_init_##ops)		\
+	{					\
+		rte_power_register_ops(&ops);	\
+	}
+
+/**
+ * @internal Get the power ops struct from its index.
+ *
+ * @return
+ *   The pointer to the ops struct in the table if registered.
+ */
+struct rte_power_core_ops *
+rte_power_get_core_ops(void);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
diff --git a/lib/power/version.map b/lib/power/version.map
index c9a226614e..bd64e0828f 100644
--- a/lib/power/version.map
+++ b/lib/power/version.map
@@ -51,4 +51,18 @@  EXPERIMENTAL {
 	rte_power_set_uncore_env;
 	rte_power_uncore_freqs;
 	rte_power_unset_uncore_env;
+	# added in 24.07
+	rte_power_logtype;
+};
+
+INTERNAL {
+	global:
+
+	rte_power_register_ops;
+	cpufreq_check_scaling_driver;
+	power_set_governor;
+	open_core_sysfs_file;
+	read_core_sysfs_u32;
+	read_core_sysfs_s;
+	write_core_sysfs_s;
 };