[dpdk-dev,v1,1/4] lib/librte_power: add per-core turbo capability

Message ID 1503418310-162535-2-git-send-email-david.hunt@intel.com (mailing list archive)
State Superseded, archived
Checks

Context                 Check     Description
ci/checkpatch           success   coding style OK
ci/Intel-compilation    success   Compilation OK

Commit Message

Hunt, David Aug. 22, 2017, 4:11 p.m. UTC
  Add a new set of APIs to enable and disable Turbo Boost on a
per-core basis.

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 lib/librte_power/channel_commands.h       |   2 +
 lib/librte_power/rte_power.c              |   9 ++
 lib/librte_power/rte_power.h              |  41 +++++++++
 lib/librte_power/rte_power_acpi_cpufreq.c | 143 ++++++++++++++++++++++++++++++
 lib/librte_power/rte_power_acpi_cpufreq.h |  40 +++++++++
 lib/librte_power/rte_power_kvm_vm.c       |  19 ++++
 lib/librte_power/rte_power_kvm_vm.h       |  35 +++++++-
 7 files changed, 288 insertions(+), 1 deletion(-)
  

Comments

Hunt, David Sept. 13, 2017, 10:44 a.m. UTC | #1
Recent generations of the Intel® Xeon® family processors allow Turbo Boost
to be enabled/disabled on a per-core basis.

This patch set introduces additional API calls to the librte_power library
to allow users to enable/disable Turbo Boost on particular cores.

Changes in patchset v2:
   * Removed wrmsr/rdmsr functions as they were very architecture specific.
     Now using the scaling_setspeed in the sys filesystem, as this is a more
     standard cross-platform method of changing frequencies (where available).
   * Removed patch that checks for particular models of CPU, as they are no
     longer needed with the above change.
   * Added APIs to the docs.
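
As a rough illustration of the scaling_setspeed approach mentioned above, the
sketch below builds the sysfs path for a core and writes a target frequency in
kHz to it. The path template is the POWER_SYSFILE_SETSPEED string already
present in rte_power_acpi_cpufreq.c; the demo_* helper names are hypothetical
and are not part of librte_power. Writing to the real file normally requires
root and the userspace cpufreq governor.

```c
#include <stdio.h>

/* Path template copied from POWER_SYSFILE_SETSPEED in rte_power_acpi_cpufreq.c. */
#define DEMO_SYSFILE_SETSPEED \
	"/sys/devices/system/cpu/cpu%u/cpufreq/scaling_setspeed"

/*
 * Hypothetical helper: build the scaling_setspeed path for one core.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int
demo_setspeed_path(char *buf, size_t len, unsigned int core)
{
	int n = snprintf(buf, len, DEMO_SYSFILE_SETSPEED, core);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Hypothetical helper: write a frequency in kHz to the given file.
 * Returns 0 on success, -1 if the file cannot be opened or written.
 */
static int
demo_write_freq(const char *path, unsigned int khz)
{
	FILE *f = fopen(path, "w");

	if (f == NULL)
		return -1;
	if (fprintf(f, "%u", khz) < 0) {
		fclose(f);
		return -1;
	}
	return fclose(f) == 0 ? 0 : -1;
}
```

A caller would first build the path with demo_setspeed_path(buf, sizeof(buf), 24)
and then pass it to demo_write_freq() together with the desired frequency.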

Additionally, the use of the library is demonstrated by additions to the
vm_power_manager example application, where the new commands have been
added to allow the turbo status of cores to be changed dynamically.

Extra message types have been added to the virtio-serial channels between the
guest_vm_power_manager app and the vm_power_manager apps to demonstrate
turbo change requests from a virtual machine. In this case, the guest will
send a request to the physical host, which in turn will change the state of
the turbo status.
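
To make the request flow a little more concrete, here is a minimal sketch of
how a host-side handler might dispatch the two new command codes. The values
5 and 6 are the ones this patch adds to channel_commands.h; everything
prefixed demo_ is a hypothetical stand-in that records state in a table
instead of touching hardware, and is not the vm_power_manager code.

```c
#include <stdio.h>

/* Command codes added to channel_commands.h by this patch. */
#define CPU_POWER_ENABLE_TURBO  5
#define CPU_POWER_DISABLE_TURBO 6

/* Hypothetical per-core turbo state, standing in for the real hardware. */
#define DEMO_MAX_CORES 64
static int demo_turbo_on[DEMO_MAX_CORES];

static int
demo_enable_turbo(unsigned int core)
{
	demo_turbo_on[core] = 1;
	return 0;
}

static int
demo_disable_turbo(unsigned int core)
{
	demo_turbo_on[core] = 0;
	return 0;
}

/*
 * Dispatch one guest request to the matching handler.
 * Returns 0 on success, -1 for an out-of-range core or unknown command.
 */
static int
demo_handle_request(unsigned int core, unsigned int command)
{
	if (core >= DEMO_MAX_CORES)
		return -1;

	switch (command) {
	case CPU_POWER_ENABLE_TURBO:
		return demo_enable_turbo(core);
	case CPU_POWER_DISABLE_TURBO:
		return demo_disable_turbo(core);
	default:
		fprintf(stderr, "unknown command %u\n", command);
		return -1;
	}
}
```

In the real application the core passed to the handler is the physical core
backing the guest's virtual CPU, not the guest's own lcore_id.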


Usage Example:
--------------

A VM has been created using 8 CPU cores, and 8 virtio-serial channels have
been created as per-core communications channels between the host and the VM.

See: http://www.dpdk.org/doc/guides/sample_app_ug/vm_power_management.html
for more information on setting up the vm_power applications.

In the vm_power_manager app on the host, we can query these channels:
vmpower> show_vm ubuntu2

VM: 'ubuntu2', status = ACTIVE
Channels 8
  [0]: /tmp/powermonitor/ubuntu2.0, status = CONNECTED
  [1]: /tmp/powermonitor/ubuntu2.1, status = CONNECTED
  [2]: /tmp/powermonitor/ubuntu2.2, status = CONNECTED
  [3]: /tmp/powermonitor/ubuntu2.3, status = CONNECTED
  [4]: /tmp/powermonitor/ubuntu2.4, status = CONNECTED
  [5]: /tmp/powermonitor/ubuntu2.5, status = CONNECTED
  [6]: /tmp/powermonitor/ubuntu2.6, status = CONNECTED
  [7]: /tmp/powermonitor/ubuntu2.7, status = CONNECTED
Virtual CPU(s): 8
  [0]: Physical CPU Mask 0x100000
  [1]: Physical CPU Mask 0x200000
  [2]: Physical CPU Mask 0x400000
  [3]: Physical CPU Mask 0x800000
  [4]: Physical CPU Mask 0x1000000
  [5]: Physical CPU Mask 0x2000000
  [6]: Physical CPU Mask 0x4000000
  [7]: Physical CPU Mask 0x8000000
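
Each mask above has a single bit set, and the bit position is the physical
core number (0x100000 is bit 20, 0x8000000 is bit 27). A small sketch of that
conversion, using a hypothetical helper name that is not part of the
application:

```c
#include <stdint.h>

/*
 * Hypothetical helper: return the index of the single set bit in mask,
 * or -1 if the mask is empty or has more than one bit set.
 */
static int
demo_mask_to_core(uint64_t mask)
{
	int core = 0;

	/* A single-bit mask is non-zero and loses its only bit under x & (x - 1). */
	if (mask == 0 || (mask & (mask - 1)) != 0)
		return -1;

	while ((mask & 1) == 0) {
		mask >>= 1;
		core++;
	}
	return core;
}
```

Under this reading, virtual CPU 4 with mask 0x1000000 maps to physical core 24.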

Once the VM is up and running, if we exercise all the cores on the guest, we
can use turbostat on the host to see the frequencies of the guest cores. In
this example, it's cores 20-27:

      19       0    0.01    2500    2500
      20    2498  100.00    2500    2498
      21    2498  100.00    2500    2498
      22    2498  100.00    2500    2498
      23    2498  100.00    2500    2498
      24   *2498  100.00    2500    2498
      25    2498  100.00    2500    2498
      26    2498  100.00    2500    2498
      27    2498  100.00    2500    2498
      28       0    0.01    2032    2498

We can then issue a command in the vmpower app on the guest:

vmpower(guest)> set_cpu_freq 4 enable_turbo

This command passes a message through virtio-serial to the host, which
enables turbo on core 24, the physical core backing the guest's lcore 4
(virtual CPU 4 above). We can then see the change by running turbostat on
the host:

      19       0    0.01    2500    2496
      20    2498  100.00    2500    2498
      21    2498  100.00    2500    2498
      22    2498  100.00    2500    2498
      23    2498  100.00    2500    2498
      24   *3297  100.00    3300    2498
      25    2498  100.00    2500    2498
      26    2498  100.00    2500    2498
      27    2498  100.00    2500    2498
      28       0    0.01    1016    2498

Core 24 is now running at 3300MHz, whereas the remainder are still running
at 2500MHz.
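
For reference, the v1 patch on this page makes the change by writing the
IA32_PERF_CTL MSR. The sketch below reproduces only the bit arithmetic from
that code; the demo_* function names are hypothetical. The sample ratios of
33 and 25 are an assumption based on the 3300MHz and 2500MHz figures above
and a 100MHz bus clock, not values read from real registers.

```c
#include <stdint.h>

/* Bit 32 of IA32_PERF_CTL, as defined in the v1 patch. */
#define CORE_TURBO_DISABLE_BIT ((uint64_t)1 << 32)

/*
 * Value written to enable turbo: the low byte of TURBO_RATIO_LIMIT is
 * moved to bits 8-15 and the turbo disable bit is left clear.
 */
static uint64_t
demo_perf_ctl_enable(uint64_t turbo_ratio_limit)
{
	uint64_t val = (turbo_ratio_limit & 0xff) << 8;

	return val & ~CORE_TURBO_DISABLE_BIT;
}

/*
 * Value written to disable turbo: bits 8-15 of PLATFORM_INFO (the max
 * non-turbo ratio) are kept and the turbo disable bit is set.
 */
static uint64_t
demo_perf_ctl_disable(uint64_t platform_info)
{
	return (platform_info & 0xff00) | CORE_TURBO_DISABLE_BIT;
}

/* Turbo counts as enabled when the disable bit is clear. */
static int
demo_turbo_status(uint64_t perf_ctl)
{
	return !(perf_ctl & CORE_TURBO_DISABLE_BIT);
}
```

With the assumed ratios, demo_perf_ctl_enable(0x21) yields 0x2100 and
demo_perf_ctl_disable(0x1900) yields 0x100001900.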

We can issue a similar command in the vm_power_manager running on the host
to disable turbo on that core, but this time we use the physical core id:

vmpower> set_cpu_freq 24 disable_turbo

and we see that turbo is now disabled on that core:

      19       0    0.00    2500    2495
      20    2499  100.00    2500    2499
      21    2499  100.00    2500    2499
      22    2499  100.00    2500    2499
      23    2499  100.00    2500    2499
      24   *2499  100.00    2500    2499
      25    2499  100.00    2500    2499
      26    2499  100.00    2500    2499
      27    2499  100.00    2500    2499
      28       0    0.01    1000    2499

[1/4] lib/librte_power: add turbo boost API
[2/4] examples/vm_power_manager: add per-core turbo
[3/4] examples/vm_power_cli_guest: add per-core turbo
[4/4] doc/power: add information on per-core turbo APIs
  
Thomas Monjalon Sept. 22, 2017, 2:36 p.m. UTC | #2
13/09/2017 12:44, David Hunt:
> Recent generations of the Intel® Xeon® family processors allow Turbo Boost
> to be enabled/disabled on a per-core basis.
> 
> This patch set introduces additional API calls to the librte_power library
> to allow users to enable/disable Turbo Boost on particular cores.

Applied, thanks
  

Patch

diff --git a/lib/librte_power/channel_commands.h b/lib/librte_power/channel_commands.h
index 383897b..484085b 100644
--- a/lib/librte_power/channel_commands.h
+++ b/lib/librte_power/channel_commands.h
@@ -52,6 +52,8 @@  extern "C" {
 #define CPU_POWER_SCALE_DOWN    2
 #define CPU_POWER_SCALE_MAX     3
 #define CPU_POWER_SCALE_MIN     4
+#define CPU_POWER_ENABLE_TURBO  5
+#define CPU_POWER_DISABLE_TURBO 6
 
 struct channel_packet {
 	uint64_t resource_id; /**< core_num, device */
diff --git a/lib/librte_power/rte_power.c b/lib/librte_power/rte_power.c
index 998ed1c..b327a86 100644
--- a/lib/librte_power/rte_power.c
+++ b/lib/librte_power/rte_power.c
@@ -50,6 +50,9 @@  rte_power_freq_change_t rte_power_freq_up = NULL;
 rte_power_freq_change_t rte_power_freq_down = NULL;
 rte_power_freq_change_t rte_power_freq_max = NULL;
 rte_power_freq_change_t rte_power_freq_min = NULL;
+rte_power_freq_change_t rte_power_turbo_status;
+rte_power_freq_change_t rte_power_freq_enable_turbo;
+rte_power_freq_change_t rte_power_freq_disable_turbo;
 
 int
 rte_power_set_env(enum power_management_env env)
@@ -65,6 +68,9 @@  rte_power_set_env(enum power_management_env env)
 		rte_power_freq_down = rte_power_acpi_cpufreq_freq_down;
 		rte_power_freq_min = rte_power_acpi_cpufreq_freq_min;
 		rte_power_freq_max = rte_power_acpi_cpufreq_freq_max;
+		rte_power_turbo_status = rte_power_acpi_turbo_status;
+		rte_power_freq_enable_turbo = rte_power_acpi_enable_turbo;
+		rte_power_freq_disable_turbo = rte_power_acpi_disable_turbo;
 	} else if (env == PM_ENV_KVM_VM) {
 		rte_power_freqs = rte_power_kvm_vm_freqs;
 		rte_power_get_freq = rte_power_kvm_vm_get_freq;
@@ -73,6 +79,9 @@  rte_power_set_env(enum power_management_env env)
 		rte_power_freq_down = rte_power_kvm_vm_freq_down;
 		rte_power_freq_min = rte_power_kvm_vm_freq_min;
 		rte_power_freq_max = rte_power_kvm_vm_freq_max;
+		rte_power_turbo_status = rte_power_kvm_vm_turbo_status;
+		rte_power_freq_enable_turbo = rte_power_kvm_vm_enable_turbo;
+		rte_power_freq_disable_turbo = rte_power_kvm_vm_disable_turbo;
 	} else {
 		RTE_LOG(ERR, POWER, "Invalid Power Management Environment(%d) set\n",
 				env);
diff --git a/lib/librte_power/rte_power.h b/lib/librte_power/rte_power.h
index 67e0ec0..b17b7a5 100644
--- a/lib/librte_power/rte_power.h
+++ b/lib/librte_power/rte_power.h
@@ -236,6 +236,47 @@  extern rte_power_freq_change_t rte_power_freq_max;
  */
 extern rte_power_freq_change_t rte_power_freq_min;
 
+/**
+ * Query the Turbo Boost status of a specific lcore.
+ * Review each environment's specific documentation for usage.
+ *
+ * @param lcore_id
+ *  lcore id.
+ *
+ * @return
+ *  - 1 Turbo Boost is enabled for this lcore.
+ *  - 0 Turbo Boost is disabled for this lcore.
+ *  - Negative on error.
+ */
+extern rte_power_freq_change_t rte_power_turbo_status;
+
+/**
+ * Enable Turbo Boost for this lcore.
+ * Review each environment's specific documentation for usage.
+ *
+ * @param lcore_id
+ *  lcore id.
+ *
+ * @return
+ *  - 0 on success.
+ *  - Negative on error.
+ */
+extern rte_power_freq_change_t rte_power_freq_enable_turbo;
+
+/**
+ * Disable Turbo Boost for this lcore.
+ * Review each environment's specific documentation for usage.
+ *
+ * @param lcore_id
+ *  lcore id.
+ *
+ * @return
+ *  - 0 on success.
+ *  - Negative on error.
+ */
+extern rte_power_freq_change_t rte_power_freq_disable_turbo;
+
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_power/rte_power_acpi_cpufreq.c b/lib/librte_power/rte_power_acpi_cpufreq.c
index a56c9b5..6695f59 100644
--- a/lib/librte_power/rte_power_acpi_cpufreq.c
+++ b/lib/librte_power/rte_power_acpi_cpufreq.c
@@ -87,6 +87,14 @@ 
 #define POWER_SYSFILE_SETSPEED   \
 		"/sys/devices/system/cpu/cpu%u/cpufreq/scaling_setspeed"
 
+/*
+ * MSR related
+ */
+#define PLATFORM_INFO     0x0CE
+#define TURBO_RATIO_LIMIT 0x1AD
+#define IA32_PERF_CTL     0x199
+#define CORE_TURBO_DISABLE_BIT ((uint64_t)1<<32)
+
 enum power_state {
 	POWER_IDLE = 0,
 	POWER_ONGOING,
@@ -543,3 +551,138 @@  rte_power_acpi_cpufreq_freq_min(unsigned lcore_id)
 	/* Frequencies in the array are from high to low. */
 	return set_freq_internal(pi, pi->nb_freqs - 1);
 }
+
+
+static int
+rdmsr(int lcore, int msr, uint64_t *val)
+{
+	char filename[32];
+	int fd;
+	int retval;
+
+	sprintf(filename, "/dev/cpu/%d/msr", lcore);
+	fd = open(filename, O_RDONLY);
+	if (fd < 0)
+		return fd;
+
+	retval = pread(fd, val, sizeof(uint64_t), msr);
+	if (retval < 0) {
+		close(fd);
+		return retval;
+	}
+	close(fd);
+	return 0;
+}
+
+static int
+wrmsr(int lcore, int msr, uint64_t val)
+{
+	char filename[32];
+	int fd;
+	int retval;
+
+	sprintf(filename, "/dev/cpu/%d/msr", lcore);
+	fd = open(filename, O_WRONLY);
+	if (fd < 0)
+		return fd;
+
+	retval = pwrite(fd, (void *)&val, sizeof(uint64_t), msr);
+	if (retval < 0) {
+		close(fd);
+		return retval;
+	}
+	close(fd);
+	return 0;
+}
+
+int
+rte_power_acpi_turbo_status(unsigned int lcore_id)
+{
+	uint64_t val;
+	int retval;
+
+	if (lcore_id >= RTE_MAX_LCORE) {
+		RTE_LOG(ERR, POWER, "Invalid lcore ID\n");
+		return -1;
+	}
+
+#if defined(RTE_ARCH_I686) || defined(RTE_ARCH_X86_64)
+	retval = rdmsr(lcore_id, IA32_PERF_CTL, &val);
+	if (retval)
+		return retval;
+	else
+		return(!(val & CORE_TURBO_DISABLE_BIT));
+#else
+	return 0;
+#endif
+}
+
+
+int
+rte_power_acpi_enable_turbo(unsigned int lcore_id)
+{
+	uint64_t val;
+	int retval;
+
+	if (lcore_id >= RTE_MAX_LCORE) {
+		RTE_LOG(ERR, POWER, "Invalid lcore ID\n");
+		return -1;
+	}
+
+#if defined(RTE_ARCH_I686) || defined(RTE_ARCH_X86_64)
+	/*
+	 * The low byte of 1ADh MSR contains max recommended ratio when a small
+	 * number of cores are active. Use this ratio when turbo is enabled.
+	 */
+	retval = rdmsr(lcore_id, TURBO_RATIO_LIMIT, &val);
+	if (retval)
+		return retval;
+
+	val = (val & 0x00ff) << 8;       /* Move to second lowest byte     */
+	val &= ~CORE_TURBO_DISABLE_BIT;  /* Switch bit off to enable turbo */
+
+	retval = wrmsr(lcore_id, IA32_PERF_CTL, val);
+	if (retval)
+		return retval;
+	else
+		return 0;
+#else
+	return 0;
+#endif
+}
+
+int
+rte_power_acpi_disable_turbo(unsigned int lcore_id)
+{
+	uint64_t val;
+	int retval;
+
+	if (lcore_id >= RTE_MAX_LCORE) {
+		RTE_LOG(ERR, POWER, "Invalid lcore ID\n");
+		return -1;
+	}
+
+#if defined(RTE_ARCH_I686) || defined(RTE_ARCH_X86_64)
+	/*
+	 * 0CEh MSR contains max non-turbo ratio in bits 8-15. Use this
+	 * for the freq when turbo is disabled for that core.
+	 */
+	retval = rdmsr(lcore_id, PLATFORM_INFO, &val);
+	if (retval)
+		return retval;
+
+	val = val & 0xff00;             /* Only need second lowest byte   */
+	val |= CORE_TURBO_DISABLE_BIT;  /* Switch bit on to disable turbo */
+
+	retval = wrmsr(lcore_id, IA32_PERF_CTL, val);
+	if (retval)
+		return retval;
+
+	/* Try to set freq to max by default coming out of turbo */
+	if (rte_power_acpi_cpufreq_freq_max(lcore_id) < 0) {
+		RTE_LOG(ERR, POWER, "Failed to set frequency of lcore %u to max\n",
+				lcore_id);
+	}
+#endif
+	return 0;
+}
diff --git a/lib/librte_power/rte_power_acpi_cpufreq.h b/lib/librte_power/rte_power_acpi_cpufreq.h
index 68578e9..eee0ca0 100644
--- a/lib/librte_power/rte_power_acpi_cpufreq.h
+++ b/lib/librte_power/rte_power_acpi_cpufreq.h
@@ -185,6 +185,46 @@  int rte_power_acpi_cpufreq_freq_max(unsigned lcore_id);
  */
 int rte_power_acpi_cpufreq_freq_min(unsigned lcore_id);
 
+/**
+ * Get the turbo status of a specific lcore.
+ * It should be protected outside of this function for thread safety.
+ *
+ * @param lcore_id
+ *  lcore id.
+ *
+ * @return
+ *  - 1 Turbo Boost is enabled on this lcore.
+ *  - 0 Turbo Boost is disabled on this lcore.
+ *  - Negative on error.
+ */
+int rte_power_acpi_turbo_status(unsigned int lcore_id);
+
+/**
+ * Enable Turbo Boost on a specific lcore.
+ * It should be protected outside of this function for thread safety.
+ *
+ * @param lcore_id
+ *  lcore id.
+ *
+ * @return
+ *  - 0 Turbo Boost is enabled successfully on this lcore.
+ *  - Negative on error.
+ */
+int rte_power_acpi_enable_turbo(unsigned int lcore_id);
+
+/**
+ * Disable Turbo Boost on a specific lcore.
+ * It should be protected outside of this function for thread safety.
+ *
+ * @param lcore_id
+ *  lcore id.
+ *
+ * @return
+ *  - 0 Turbo Boost disabled successfully on this lcore.
+ *  - Negative on error.
+ */
+int rte_power_acpi_disable_turbo(unsigned int lcore_id);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_power/rte_power_kvm_vm.c b/lib/librte_power/rte_power_kvm_vm.c
index a1badf3..9906062 100644
--- a/lib/librte_power/rte_power_kvm_vm.c
+++ b/lib/librte_power/rte_power_kvm_vm.c
@@ -134,3 +134,22 @@  rte_power_kvm_vm_freq_min(unsigned lcore_id)
 {
 	return send_msg(lcore_id, CPU_POWER_SCALE_MIN);
 }
+
+int
+rte_power_kvm_vm_turbo_status(__attribute__((unused)) unsigned int lcore_id)
+{
+	RTE_LOG(ERR, POWER, "rte_power_turbo_status is not implemented for Virtual Machine Power Management\n");
+	return -ENOTSUP;
+}
+
+int
+rte_power_kvm_vm_enable_turbo(unsigned int lcore_id)
+{
+	return send_msg(lcore_id, CPU_POWER_ENABLE_TURBO);
+}
+
+int
+rte_power_kvm_vm_disable_turbo(unsigned int lcore_id)
+{
+	return send_msg(lcore_id, CPU_POWER_DISABLE_TURBO);
+}
diff --git a/lib/librte_power/rte_power_kvm_vm.h b/lib/librte_power/rte_power_kvm_vm.h
index dcbc878..9af41d6 100644
--- a/lib/librte_power/rte_power_kvm_vm.h
+++ b/lib/librte_power/rte_power_kvm_vm.h
@@ -172,8 +172,41 @@  int rte_power_kvm_vm_freq_max(unsigned lcore_id);
  */
 int rte_power_kvm_vm_freq_min(unsigned lcore_id);
 
+/**
+ * It should be protected outside of this function for thread safety.
+ *
+ * @param lcore_id
+ *  lcore id.
+ *
+ * @return
+ *  -ENOTSUP
+ */
+int rte_power_kvm_vm_turbo_status(unsigned int lcore_id);
+
+/**
+ * It should be protected outside of this function for thread safety.
+ *
+ * @param lcore_id
+ *  lcore id.
+ *
+ * @return
+ *  - 1 on success.
+ *  - Negative on error.
+ */
+int rte_power_kvm_vm_enable_turbo(unsigned int lcore_id);
+
+/**
+ * It should be protected outside of this function for thread safety.
+ *
+ * @param lcore_id
+ *  lcore id.
+ *
+ * @return
+ *  - 1 on success.
+ *  - Negative on error.
+ */
+int rte_power_kvm_vm_disable_turbo(unsigned int lcore_id);
 #ifdef __cplusplus
 }
 #endif
-
 #endif