From patchwork Tue Oct 29 13:28:02 2024
From: Huisong Li
Date: Tue, 29 Oct 2024 21:28:02 +0800
Subject: [PATCH v14 1/3] power: introduce PM QoS API on CPU wide
Message-ID: <20241029132804.27613-2-lihuisong@huawei.com>

The deeper the idle state, the lower the power consumption, but the longer
the resume time. Some services are latency sensitive and expect a low
resume time, like interrupt packet receiving mode.

The "/sys/devices/system/cpu/cpuX/power/pm_qos_resume_latency_us" sysfs
interface is used to set and get the resume latency limit on cpuX from
userspace. Each cpuidle governor in Linux selects which idle state to enter
based on this CPU resume latency in its idle task.

The per-CPU PM QoS API can be used to control this CPU's idle state
selection: setting a strict resume latency (zero value) restricts the CPU
to the shallowest idle state, lowering the delay when it wakes up.

Signed-off-by: Huisong Li
Acked-by: Morten Brørup
Acked-by: Chengwen Feng
Acked-by: Konstantin Ananyev
Acked-by: Sivaprasad Tummala
---
 doc/guides/prog_guide/power_man.rst    |  19 ++++
 doc/guides/rel_notes/release_24_11.rst |   5 +
 lib/power/meson.build                  |   2 +
 lib/power/rte_power_qos.c              | 123 +++++++++++++++++++++++++
 lib/power/rte_power_qos.h              |  73 +++++++++++++++
 lib/power/version.map                  |   4 +
 6 files changed, 226 insertions(+)
 create mode 100644 lib/power/rte_power_qos.c
 create mode 100644 lib/power/rte_power_qos.h

diff --git a/doc/guides/prog_guide/power_man.rst b/doc/guides/prog_guide/power_man.rst
index f6674efe2d..91358b04f3 100644
--- a/doc/guides/prog_guide/power_man.rst
+++ b/doc/guides/prog_guide/power_man.rst
@@ -107,6 +107,25 @@ User Cases
 The power management mechanism is used to save power when performing L3 forwarding.
 
 
+PM QoS
+------
+
+The "/sys/devices/system/cpu/cpuX/power/pm_qos_resume_latency_us" sysfs
+interface is used to set and get the resume latency limit on cpuX from
+userspace. Each cpuidle governor in Linux selects which idle state to enter
+based on this CPU resume latency in its idle task.
+
+The deeper the idle state, the lower the power consumption, but the longer
+the resume time. Some services are latency sensitive and expect a low
+resume time, like interrupt packet receiving mode.
+
+Applications can set and get the CPU resume latency with
+``rte_power_qos_set_cpu_resume_latency()`` and ``rte_power_qos_get_cpu_resume_latency()``
+respectively. Applications can set a strict resume latency (zero value) with
+``rte_power_qos_set_cpu_resume_latency()`` to lower the resume latency and
+get better performance (at the cost of possibly higher platform power consumption).
+
+
 Ethernet PMD Power Management API
 ---------------------------------
 
diff --git a/doc/guides/rel_notes/release_24_11.rst b/doc/guides/rel_notes/release_24_11.rst
index fa4822d928..d9e268274b 100644
--- a/doc/guides/rel_notes/release_24_11.rst
+++ b/doc/guides/rel_notes/release_24_11.rst
@@ -237,6 +237,11 @@ New Features
   This field is used to pass an extra configuration settings such as ability
   to lookup IPv4 addresses in network byte order.
 
+* **Introduced per-CPU PM QoS interface.**
+
+  * Added per-CPU PM QoS interface to lower the resume latency when waking
+    up from idle state.
+
 * **Added new API to register telemetry endpoint callbacks with private arguments.**
 
   A new ``rte_telemetry_register_cmd_arg`` function is available to pass an opaque value to
diff --git a/lib/power/meson.build b/lib/power/meson.build
index 2f0f3d26e9..9b5d3e8315 100644
--- a/lib/power/meson.build
+++ b/lib/power/meson.build
@@ -23,12 +23,14 @@ sources = files(
         'rte_power.c',
         'rte_power_uncore.c',
         'rte_power_pmd_mgmt.c',
+        'rte_power_qos.c',
 )
 headers = files(
         'rte_power.h',
         'rte_power_guest_channel.h',
         'rte_power_pmd_mgmt.h',
         'rte_power_uncore.h',
+        'rte_power_qos.h',
 )
 deps += ['timer', 'ethdev']
 
diff --git a/lib/power/rte_power_qos.c b/lib/power/rte_power_qos.c
new file mode 100644
index 0000000000..4dd0532b36
--- /dev/null
+++ b/lib/power/rte_power_qos.c
@@ -0,0 +1,123 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 HiSilicon Limited
+ */
+
+#include
+#include
+#include
+
+#include
+#include
+
+#include "power_common.h"
+#include "rte_power_qos.h"
+
+#define PM_QOS_SYSFILE_RESUME_LATENCY_US	\
+	"/sys/devices/system/cpu/cpu%u/power/pm_qos_resume_latency_us"
+
+#define PM_QOS_CPU_RESUME_LATENCY_BUF_LEN	32
+
+int
+rte_power_qos_set_cpu_resume_latency(uint16_t lcore_id, int latency)
+{
+	char buf[PM_QOS_CPU_RESUME_LATENCY_BUF_LEN];
+	uint32_t cpu_id;
+	FILE *f;
+	int ret;
+
+	if (!rte_lcore_is_enabled(lcore_id)) {
+		POWER_LOG(ERR, "lcore id %u is not enabled", lcore_id);
+		return -EINVAL;
+	}
+	ret = power_get_lcore_mapped_cpu_id(lcore_id, &cpu_id);
+	if (ret != 0)
+		return ret;
+
+	if (latency < 0) {
+		POWER_LOG(ERR, "latency should be greater than or equal to 0");
+		return -EINVAL;
+	}
+
+	ret = open_core_sysfs_file(&f, "w", PM_QOS_SYSFILE_RESUME_LATENCY_US, cpu_id);
+	if (ret != 0) {
+		POWER_LOG(ERR, "Failed to open "PM_QOS_SYSFILE_RESUME_LATENCY_US" : %s",
+			  cpu_id, strerror(errno));
+		return ret;
+	}
+
+	/*
+	 * Based on the kernel sysfs interface pm_qos_resume_latency_us
+	 * (@PM_QOS_SYSFILE_RESUME_LATENCY_US), the input string is interpreted
+	 * as follows:
+	 * 1> "n/a": the resume latency limit is 0 (strict).
+	 * 2> "0": there is no resume latency constraint.
+	 * 3> any other value: the resume latency limit in microseconds.
+	 */
+	if (latency == RTE_POWER_QOS_STRICT_LATENCY_VALUE)
+		snprintf(buf, sizeof(buf), "%s", "n/a");
+	else if (latency == RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT)
+		snprintf(buf, sizeof(buf), "%u", 0);
+	else
+		snprintf(buf, sizeof(buf), "%u", latency);
+
+	ret = write_core_sysfs_s(f, buf);
+	if (ret != 0)
+		POWER_LOG(ERR, "Failed to write "PM_QOS_SYSFILE_RESUME_LATENCY_US" : %s",
+			  cpu_id, strerror(errno));
+
+	fclose(f);
+
+	return ret;
+}
+
+int
+rte_power_qos_get_cpu_resume_latency(uint16_t lcore_id)
+{
+	char buf[PM_QOS_CPU_RESUME_LATENCY_BUF_LEN];
+	int latency = -1;
+	uint32_t cpu_id;
+	FILE *f;
+	int ret;
+
+	if (!rte_lcore_is_enabled(lcore_id)) {
+		POWER_LOG(ERR, "lcore id %u is not enabled", lcore_id);
+		return -EINVAL;
+	}
+	ret = power_get_lcore_mapped_cpu_id(lcore_id, &cpu_id);
+	if (ret != 0)
+		return ret;
+
+	ret = open_core_sysfs_file(&f, "r", PM_QOS_SYSFILE_RESUME_LATENCY_US, cpu_id);
+	if (ret != 0) {
+		POWER_LOG(ERR, "Failed to open "PM_QOS_SYSFILE_RESUME_LATENCY_US" : %s",
+			  cpu_id, strerror(errno));
+		return ret;
+	}
+
+	ret = read_core_sysfs_s(f, buf, sizeof(buf));
+	if (ret != 0) {
+		POWER_LOG(ERR, "Failed to read "PM_QOS_SYSFILE_RESUME_LATENCY_US" : %s",
+			  cpu_id, strerror(errno));
+		goto out;
+	}
+
+	/*
+	 * Based on the kernel sysfs interface pm_qos_resume_latency_us
+	 * (@PM_QOS_SYSFILE_RESUME_LATENCY_US), the output string means:
+	 * 1> "n/a": the resume latency limit is 0 (strict).
+	 * 2> "0": there is no resume latency constraint.
+	 * 3> any other value: the resume latency limit currently in use,
+	 *    in microseconds.
+	 */
+	if (strcmp(buf, "n/a") == 0)
+		latency = RTE_POWER_QOS_STRICT_LATENCY_VALUE;
+	else {
+		latency = strtoul(buf, NULL, 10);
+		latency = latency == 0 ? RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT : latency;
+	}
+
+out:
+	fclose(f);
+
+	return latency != -1 ? latency : ret;
+}
diff --git a/lib/power/rte_power_qos.h b/lib/power/rte_power_qos.h
new file mode 100644
index 0000000000..7a8dab9272
--- /dev/null
+++ b/lib/power/rte_power_qos.h
@@ -0,0 +1,73 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 HiSilicon Limited
+ */
+
+#ifndef RTE_POWER_QOS_H
+#define RTE_POWER_QOS_H
+
+#include
+
+#include
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * @file rte_power_qos.h
+ *
+ * PM QoS API.
+ *
+ * The CPU-wide resume latency limit influences this CPU's idle state
+ * selection in each cpuidle governor.
+ * For the per-CPU PM QoS sysfs interface, see:
+ * https://www.kernel.org/doc/html/latest/admin-guide/abi-testing.html?highlight=pm_qos_resume_latency_us#abi-sys-devices-power-pm-qos-resume-latency-us
+ *
+ * The deeper the idle state, the lower the power consumption, but the
+ * longer the resume time. Some services are latency sensitive and expect a
+ * low resume time, like interrupt packet receiving mode.
+ *
+ * In these cases, the per-CPU PM QoS API can be used to restrict this CPU's
+ * idle state selection to the shallowest idle state, lowering the wake-up
+ * delay, by setting a strict resume latency (zero value).
+ */
+
+#define RTE_POWER_QOS_STRICT_LATENCY_VALUE		0
+#define RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT	INT32_MAX
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * @param lcore_id
+ *   target logical core id
+ *
+ * @param latency
+ *   The resume latency in microseconds; must be greater than or equal to zero.
+ *
+ * @return
+ *   0 on success, a negative value otherwise.
+ */
+__rte_experimental
+int rte_power_qos_set_cpu_resume_latency(uint16_t lcore_id, int latency);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Get the current resume latency of this logical core.
+ * If no resume latency has been set, the kernel default is
+ * @see RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT.
+ *
+ * @return
+ *   Negative value on failure.
+ *   >= 0 means the actual resume latency limit on this core.
+ */
+__rte_experimental
+int rte_power_qos_get_cpu_resume_latency(uint16_t lcore_id);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* RTE_POWER_QOS_H */
diff --git a/lib/power/version.map b/lib/power/version.map
index c9a226614e..08f178a39d 100644
--- a/lib/power/version.map
+++ b/lib/power/version.map
@@ -51,4 +51,8 @@ EXPERIMENTAL {
 	rte_power_set_uncore_env;
 	rte_power_uncore_freqs;
 	rte_power_unset_uncore_env;
+
+	# added in 24.11
+	rte_power_qos_get_cpu_resume_latency;
+	rte_power_qos_set_cpu_resume_latency;
 };
From patchwork Tue Oct 29 13:28:03 2024
From: Huisong Li
Date: Tue, 29 Oct 2024 21:28:03 +0800
Subject: [PATCH v14 2/3] examples/l3fwd-power: fix data overflow when parse command line
Message-ID: <20241029132804.27613-3-lihuisong@huawei.com>

Many variables are 'uint32_t', like 'pause_duration', 'scale_freq_min'
and so on. They are parsed from the command line by parse_int(). But
parse_int() returns an int, so the parsed value can overflow when the
function returns.
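For illustration, a minimal standalone sketch of one failure mode and of the bounded replacement. Assumptions: `parse_int_old()` and `parse_uint_sketch()` are hypothetical stand-ins for the removed parse_int() and the new parse_uint(), with RTE_LOG replaced by fprintf() so the code builds without DPDK, and `demo()` is a hypothetical driver. A malformed option such as "10x" is rejected by both helpers, but the old one signals the rejection with -1, which silently becomes UINT32_MAX once stored in a uint32_t variable:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the removed parse_int(): the unsigned long
 * result is narrowed to int on return, and -1 doubles as the error code. */
static int
parse_int_old(const char *opt)
{
	char *end = NULL;
	unsigned long val = strtoul(opt, &end, 10);

	if ((opt[0] == '\0') || (end == NULL) || (*end != '\0'))
		return -1;
	return val;
}

/* Hypothetical stand-in for the new parse_uint(), with RTE_LOG replaced
 * by fprintf(). On failure it returns -1 and leaves *res untouched. */
static int
parse_uint_sketch(const char *opt, uint32_t max, uint32_t *res)
{
	char *end = NULL;
	unsigned long val = strtoul(opt, &end, 10);

	if ((opt[0] == '\0') || (end == NULL) || (*end != '\0'))
		return -1;
	if (val > max) {
		fprintf(stderr, "%s parameter shouldn't exceed %" PRIu32 ".\n",
			opt, max);
		return -1;
	}
	*res = val;
	return 0;
}

/* Feeds the same malformed option to both helpers. */
static void
demo(void)
{
	uint32_t pause_duration;
	uint32_t scale_freq_min = 1;

	/* Old pattern: the -1 error code silently becomes UINT32_MAX. */
	pause_duration = parse_int_old("10x");
	assert(pause_duration == UINT32_MAX);

	/* New pattern: the error is reported and the variable is untouched. */
	assert(parse_uint_sketch("10x", UINT32_MAX, &scale_freq_min) == -1);
	assert(scale_freq_min == 1);

	/* Values above the caller-supplied limit are rejected as well. */
	assert(parse_uint_sketch("100", 50, &scale_freq_min) == -1);
	assert(scale_freq_min == 1);
}
```

Because the output pointer is only written on success, a failed parse can be turned into a hard error (`return -1`) without clobbering the variable's default, which is the pattern the new parse_args() code follows.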
Fixes: 59f2853c4cae ("examples/l3fwd_power: add configuration options")
Cc: stable@dpdk.org

Signed-off-by: Huisong Li
Acked-by: Konstantin Ananyev
Acked-by: Chengwen Feng
Acked-by: Sivaprasad Tummala
---
 examples/l3fwd-power/main.c | 41 +++++++++++++++++++++--------------------
 1 file changed, 21 insertions(+), 20 deletions(-)

diff --git a/examples/l3fwd-power/main.c b/examples/l3fwd-power/main.c
index 2bb6b092c3..96fac45c61 100644
--- a/examples/l3fwd-power/main.c
+++ b/examples/l3fwd-power/main.c
@@ -1524,8 +1524,12 @@ print_usage(const char *prgname)
 		prgname);
 }
 
+/*
+ * The caller must pass an upper limit that ensures the receiving variable
+ * doesn't overflow.
+ */
 static int
-parse_int(const char *opt)
+parse_uint(const char *opt, uint32_t max, uint32_t *res)
 {
 	char *end = NULL;
 	unsigned long val;
@@ -1535,23 +1539,15 @@ parse_int(const char *opt)
 	if ((opt[0] == '\0') || (end == NULL) || (*end != '\0'))
 		return -1;
 
-	return val;
-}
-
-static int parse_max_pkt_len(const char *pktlen)
-{
-	char *end = NULL;
-	unsigned long len;
-
-	/* parse decimal string */
-	len = strtoul(pktlen, &end, 10);
-	if ((pktlen[0] == '\0') || (end == NULL) || (*end != '\0'))
+	if (val > max) {
+		RTE_LOG(ERR, L3FWD_POWER, "%s parameter shouldn't exceed %u.\n",
+			opt, max);
 		return -1;
+	}
 
-	if (len == 0)
-		return -1;
+	*res = val;
 
-	return len;
+	return 0;
 }
 
 static int
@@ -1898,8 +1894,9 @@ parse_args(int argc, char **argv)
 			if (!strncmp(lgopts[option_index].name,
 					CMD_LINE_OPT_MAX_PKT_LEN,
 					sizeof(CMD_LINE_OPT_MAX_PKT_LEN))) {
+				if (parse_uint(optarg, UINT32_MAX, &max_pkt_len) != 0)
+					return -1;
 				printf("Custom frame size is configured\n");
-				max_pkt_len = parse_max_pkt_len(optarg);
 			}
 
 			if (!strncmp(lgopts[option_index].name,
@@ -1912,29 +1909,33 @@ parse_args(int argc, char **argv)
 			if (!strncmp(lgopts[option_index].name,
 					CMD_LINE_OPT_MAX_EMPTY_POLLS,
 					sizeof(CMD_LINE_OPT_MAX_EMPTY_POLLS))) {
+				if (parse_uint(optarg, UINT32_MAX, &max_empty_polls) != 0)
+					return -1;
 				printf("Maximum empty polls configured\n");
-				max_empty_polls = parse_int(optarg);
 			}
 
 			if (!strncmp(lgopts[option_index].name,
 					CMD_LINE_OPT_PAUSE_DURATION,
 					sizeof(CMD_LINE_OPT_PAUSE_DURATION))) {
+				if (parse_uint(optarg, UINT32_MAX, &pause_duration) != 0)
+					return -1;
 				printf("Pause duration configured\n");
-				pause_duration = parse_int(optarg);
 			}
 
 			if (!strncmp(lgopts[option_index].name,
 					CMD_LINE_OPT_SCALE_FREQ_MIN,
 					sizeof(CMD_LINE_OPT_SCALE_FREQ_MIN))) {
+				if (parse_uint(optarg, UINT32_MAX, &scale_freq_min) != 0)
+					return -1;
 				printf("Scaling frequency minimum configured\n");
-				scale_freq_min = parse_int(optarg);
 			}
 
 			if (!strncmp(lgopts[option_index].name,
 					CMD_LINE_OPT_SCALE_FREQ_MAX,
 					sizeof(CMD_LINE_OPT_SCALE_FREQ_MAX))) {
+				if (parse_uint(optarg, UINT32_MAX, &scale_freq_max) != 0)
+					return -1;
 				printf("Scaling frequency maximum configured\n");
-				scale_freq_max = parse_int(optarg);
 			}
 
 			break;
From patchwork Tue Oct 29 13:28:04 2024
From: Huisong Li
Date: Tue, 29 Oct 2024 21:28:04 +0800
Subject: [PATCH v14 3/3] examples/l3fwd-power: add PM QoS configuration
Message-ID: <20241029132804.27613-4-lihuisong@huawei.com>

The '--cpu-resume-latency' option can be used to control C-state
selection. Setting the CPU resume latency to 0 limits the CPU to the
C0 state to improve performance, which may also increase the power
consumption of the platform.

Signed-off-by: Huisong Li
Acked-by: Morten Brørup
Acked-by: Chengwen Feng
Acked-by: Konstantin Ananyev
---
 .../sample_app_ug/l3_forward_power_man.rst    |  5 +-
 examples/l3fwd-power/main.c                   | 55 +++++++++++++++++++
 2 files changed, 59 insertions(+), 1 deletion(-)

diff --git a/doc/guides/sample_app_ug/l3_forward_power_man.rst b/doc/guides/sample_app_ug/l3_forward_power_man.rst
index 9c9684fea7..70fa83669a 100644
--- a/doc/guides/sample_app_ug/l3_forward_power_man.rst
+++ b/doc/guides/sample_app_ug/l3_forward_power_man.rst
@@ -67,7 +67,8 @@ based on the speculative sleep duration of the core.
 In this application, we introduce a heuristic algorithm that allows packet processing cores to sleep for a short period
 if there is no Rx packet received on recent polls.
 In this way, CPUIdle automatically forces the corresponding cores to enter deeper C-states
-instead of always running to the C0 state waiting for packets.
+instead of always running to the C0 state waiting for packets. However, the user can set the CPU resume latency to control C-state selection.
+Setting the CPU resume latency to 0 restricts the CPU to the C0 state to improve performance, which may increase the platform's power consumption.
 
 .. note::
 
@@ -105,6 +106,8 @@ where,
 
 *   --config (port,queue,lcore)[,(port,queue,lcore)]: determines which queues from which ports are mapped to which cores.
 
+*   --cpu-resume-latency LATENCY: set the CPU resume latency to control C-state selection; 0 only allows the C0 state.
+
 *   --max-pkt-len: optional, maximum packet length in decimal (64-9600)
 
 *   --no-numa: optional, disables numa awareness
diff --git a/examples/l3fwd-power/main.c b/examples/l3fwd-power/main.c
index 96fac45c61..7b04fd06dc 100644
--- a/examples/l3fwd-power/main.c
+++ b/examples/l3fwd-power/main.c
@@ -47,6 +47,7 @@
 #include
 #include
 #include
+#include
 
 #include "perf_core.h"
 #include "main.h"
@@ -265,6 +266,9 @@ static uint32_t pause_duration = 1;
 static uint32_t scale_freq_min;
 static uint32_t scale_freq_max;
 
+static int cpu_resume_latency = -1;
+static int resume_latency_bk[RTE_MAX_LCORE];
+
 static struct rte_mempool * pktmbuf_pool[NB_SOCKETS];
 
 
@@ -1501,6 +1505,8 @@ print_usage(const char *prgname)
 		"  -U: set min/max frequency for uncore to maximum value\n"
 		"  -i (frequency index): set min/max frequency for uncore to specified frequency index\n"
 		"  --config (port,queue,lcore): rx queues configuration\n"
+		"  --cpu-resume-latency LATENCY: set CPU resume latency to control C-state selection,"
+		" 0: only allow the C0 state\n"
 		"  --high-perf-cores CORELIST: list of high performance cores\n"
 		"  --perf-config: similar as config, cores specified as indices"
 		" for bins containing high or regular performance cores\n"
@@ -1739,6 +1745,7 @@ parse_pmd_mgmt_config(const char *name)
 #define CMD_LINE_OPT_PAUSE_DURATION "pause-duration"
 #define CMD_LINE_OPT_SCALE_FREQ_MIN "scale-freq-min"
 #define CMD_LINE_OPT_SCALE_FREQ_MAX "scale-freq-max"
+#define CMD_LINE_OPT_CPU_RESUME_LATENCY "cpu-resume-latency"
 
 /* Parse the argument given in the command line of the application */
 static int
@@ -1753,6 +1760,7 @@ parse_args(int argc, char **argv)
 		{"perf-config", 1, 0, 0},
 		{"high-perf-cores", 1, 0, 0},
 		{"no-numa", 0, 0, 0},
+		{CMD_LINE_OPT_CPU_RESUME_LATENCY, 1, 0, 0},
 		{CMD_LINE_OPT_MAX_PKT_LEN, 1, 0, 0},
 		{CMD_LINE_OPT_PARSE_PTYPE, 0, 0, 0},
 		{CMD_LINE_OPT_LEGACY, 0, 0, 0},
@@ -1938,6 +1946,15 @@ parse_args(int argc, char **argv)
 				printf("Scaling frequency maximum configured\n");
 			}
 
+			if (!strncmp(lgopts[option_index].name,
+					CMD_LINE_OPT_CPU_RESUME_LATENCY,
+					sizeof(CMD_LINE_OPT_CPU_RESUME_LATENCY))) {
+				if (parse_uint(optarg, INT_MAX,
+						(uint32_t *)&cpu_resume_latency) != 0)
+					return -1;
+				printf("PM QoS configured\n");
+			}
+
 			break;
 
 		default:
@@ -2261,6 +2278,35 @@ init_power_library(void)
 				return -1;
 			}
 		}
+
+	if (cpu_resume_latency != -1) {
+		RTE_LCORE_FOREACH(lcore_id) {
+			/* Back up the old CPU resume latency. */
+			ret = rte_power_qos_get_cpu_resume_latency(lcore_id);
+			if (ret < 0) {
+				RTE_LOG(ERR, L3FWD_POWER,
+					"Failed to get cpu resume latency on lcore-%u, ret=%d.\n",
+					lcore_id, ret);
+			}
+			resume_latency_bk[lcore_id] = ret;
+
+			/*
+			 * Set the CPU resume latency of the worker lcore based
+			 * on the user's request. A strict latency (0) allows
+			 * the CPU to enter only the shallowest idle state, to
+			 * improve performance.
+			 */
+			ret = rte_power_qos_set_cpu_resume_latency(lcore_id,
+								cpu_resume_latency);
+			if (ret != 0) {
+				RTE_LOG(ERR, L3FWD_POWER,
+					"Failed to set cpu resume latency on lcore-%u, ret=%d.\n",
+					lcore_id, ret);
+				return ret;
+			}
+		}
+	}
+
 	return ret;
 }
 
@@ -2300,6 +2346,15 @@ deinit_power_library(void)
 				}
 			}
 		}
+
+	if (cpu_resume_latency != -1) {
+		RTE_LCORE_FOREACH(lcore_id) {
+			/* Restore the original value. */
+			rte_power_qos_set_cpu_resume_latency(lcore_id,
+						resume_latency_bk[lcore_id]);
+		}
+	}
+
 	return ret;
 }
 
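The least obvious part of the series is the inverted string convention of pm_qos_resume_latency_us, which rte_power_qos.c (patch 1/3) translates in both directions and which l3fwd-power relies on when it backs up and restores the per-lcore value. The sketch below isolates that translation so it can be checked without DPDK or sysfs access. Assumptions: `qos_latency_to_sysfs()`, `qos_latency_from_sysfs()` and `demo_round_trip()` are hypothetical helpers, not part of the DPDK API; the two constants mirror RTE_POWER_QOS_STRICT_LATENCY_VALUE and RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT; and the read path assumes the trailing newline has already been stripped from the file contents.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Mirror the values defined in rte_power_qos.h. */
#define QOS_STRICT_LATENCY_VALUE		0
#define QOS_RESUME_LATENCY_NO_CONSTRAINT	INT32_MAX

/*
 * Hypothetical helper: format an API latency (microseconds, >= 0) as the
 * string written to pm_qos_resume_latency_us. Returns -1 if latency < 0.
 */
static int
qos_latency_to_sysfs(int latency, char *buf, size_t len)
{
	if (latency < 0)
		return -1;
	if (latency == QOS_STRICT_LATENCY_VALUE)
		snprintf(buf, len, "%s", "n/a");	/* kernel: 0 us limit */
	else if (latency == QOS_RESUME_LATENCY_NO_CONSTRAINT)
		snprintf(buf, len, "%d", 0);		/* kernel: no constraint */
	else
		snprintf(buf, len, "%d", latency);
	return 0;
}

/* Hypothetical helper: the reverse mapping applied when reading the file. */
static int
qos_latency_from_sysfs(const char *buf)
{
	unsigned long val;

	if (strcmp(buf, "n/a") == 0)
		return QOS_STRICT_LATENCY_VALUE;
	val = strtoul(buf, NULL, 10);
	return val == 0 ? QOS_RESUME_LATENCY_NO_CONSTRAINT : (int)val;
}

/* Round-trips the three cases through both helpers. */
static void
demo_round_trip(void)
{
	char buf[32];

	/* Strict latency is written as "n/a". */
	assert(qos_latency_to_sysfs(QOS_STRICT_LATENCY_VALUE, buf, sizeof(buf)) == 0);
	assert(strcmp(buf, "n/a") == 0);
	assert(qos_latency_from_sysfs(buf) == QOS_STRICT_LATENCY_VALUE);

	/* "No constraint" is written as "0". */
	assert(qos_latency_to_sysfs(QOS_RESUME_LATENCY_NO_CONSTRAINT, buf, sizeof(buf)) == 0);
	assert(strcmp(buf, "0") == 0);
	assert(qos_latency_from_sysfs(buf) == QOS_RESUME_LATENCY_NO_CONSTRAINT);

	/* Any other value is passed through unchanged. */
	assert(qos_latency_to_sysfs(20, buf, sizeof(buf)) == 0);
	assert(strcmp(buf, "20") == 0);
	assert(qos_latency_from_sysfs(buf) == 20);
}
```

Because "0" means *no* constraint in sysfs, writing the API's strict value (0) verbatim would have the opposite of the intended effect; the explicit translation is what keeps the API's zero meaning "strict".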