From patchwork Fri Jan 17 09:00:30 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tomasz Duszynski X-Patchwork-Id: 150159 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from mails.dpdk.org (unknown [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id EA34F460A7; Fri, 17 Jan 2025 10:01:30 +0100 (CET) Received: from mails.dpdk.org (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id AD712427A7; Fri, 17 Jan 2025 10:01:01 +0100 (CET) Received: from mx0b-0016f401.pphosted.com (mx0b-0016f401.pphosted.com [67.231.156.173]) by mails.dpdk.org (Postfix) with ESMTP id 826904279C for ; Fri, 17 Jan 2025 10:00:59 +0100 (CET) Received: from pps.filterd (m0045851.ppops.net [127.0.0.1]) by mx0b-0016f401.pphosted.com (8.18.1.2/8.18.1.2) with ESMTP id 50H7TwJQ013266; Fri, 17 Jan 2025 01:00:56 -0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=marvell.com; h= cc:content-transfer-encoding:content-type:date:from:in-reply-to :message-id:mime-version:references:subject:to; s=pfpt0220; bh=Y WIH7bzQZ2jnfvOh+ZkJBDSf0OwFoj9MkwUwDuLVNAQ=; b=V6rs7MgOE758MTm+1 WvX1vUlZYrIT6r7hIgGZY94rI1kcWB3teqVpMYQj6SnfdGEa9L6q5wl/lyTtXLjN YKUpn9jtuThKIMP30/0TY3fz/GDhKwj+b8bBj9DZ5hYlkl7AdVkrtyDYnOQHihzy IlWqcAz8QMeAj/PMqpSxOGL50PH40NOvmMThS9uvB50H9bseRukZWF2wzkfMdTaa UK4urSKGdQU8/NNOq8GrXbAD7TLtIGYrKJfP0rROKM9StPV6HrIzb0+jMlsgfd3t arhLif3Uq/BxXLHR6qhOtqr5Tr6/20wIujHTEAGDy3lLMSQO84aAb3/oeg8UPF71 1CXYQ== Received: from dc5-exch05.marvell.com ([199.233.59.128]) by mx0b-0016f401.pphosted.com (PPS) with ESMTPS id 447jvc04q7-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 17 Jan 2025 01:00:56 -0800 (PST) Received: from DC5-EXCH05.marvell.com (10.69.176.209) by DC5-EXCH05.marvell.com (10.69.176.209) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) 
id 15.2.1544.4; Fri, 17 Jan 2025 01:00:54 -0800 Received: from maili.marvell.com (10.69.176.80) by DC5-EXCH05.marvell.com (10.69.176.209) with Microsoft SMTP Server id 15.2.1544.4 via Frontend Transport; Fri, 17 Jan 2025 01:00:54 -0800 Received: from cavium-optiplex-3070-BM15.. (unknown [10.28.34.39]) by maili.marvell.com (Postfix) with ESMTP id 037FA3F707D; Fri, 17 Jan 2025 01:00:50 -0800 (PST) From: Tomasz Duszynski To: , Thomas Monjalon , Tomasz Duszynski CC: , , , , , , , , , Subject: [PATCH v17 1/4] lib: add generic support for reading PMU events Date: Fri, 17 Jan 2025 10:00:30 +0100 Message-ID: <20250117090033.2807073-2-tduszynski@marvell.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20250117090033.2807073-1-tduszynski@marvell.com> References: <20241118073706.3129423-1-tduszynski@marvell.com> <20250117090033.2807073-1-tduszynski@marvell.com> MIME-Version: 1.0 X-Proofpoint-ORIG-GUID: h48SGCIpOSze1ABzI9HEvlS95M2FV9ZX X-Proofpoint-GUID: h48SGCIpOSze1ABzI9HEvlS95M2FV9ZX X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1057,Hydra:6.0.680,FMLib:17.12.68.34 definitions=2025-01-17_03,2025-01-16_01,2024-11-22_01 X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Add support for programming PMU counters and reading their values at runtime, bypassing the kernel completely. This is especially useful when CPU cores are isolated, i.e. run dedicated tasks. In such cases one cannot use the standard perf utility without sacrificing latency and performance.
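For illustration only: the lockless counter read that rte_pmu_read() relies on can be sketched in isolation. The snippet below is a minimal sketch, not the code from this series. It assumes a hypothetical struct fake_user_page that mimics only the lock, index, offset and pmc_width fields of struct perf_event_mmap_page, and a hypothetical fake_pmc_read() standing in for the architecture-specific counter read. It shows the sequence-counter retry loop and the sign extension of a counter narrower than 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the few perf_event_mmap_page fields used below. */
struct fake_user_page {
	volatile uint32_t lock;      /* sequence counter, changed by the writer on update */
	volatile uint32_t index;     /* hardware counter index + 1, 0 if unusable */
	volatile int64_t offset;     /* count accumulated by the kernel so far */
	volatile uint16_t pmc_width; /* width of the hardware counter in bits */
};

/* Hypothetical hook; a real build would read the hardware counter here. */
typedef uint64_t (*pmc_read_fn)(uint32_t index);

/* Fake hardware: counter 0 holds 48 bits of ones, every other counter holds 5. */
static uint64_t
fake_pmc_read(uint32_t index)
{
	return index == 0 ? 0xFFFFFFFFFFFFULL : 5;
}

/*
 * Sign-extend a raw value that is only `width` bits wide. Relies on
 * arithmetic right shift of signed values, as GCC and clang provide.
 */
static int64_t
sign_extend(uint64_t raw, unsigned int width)
{
	assert(width >= 1 && width <= 64);
	return (int64_t)(raw << (64 - width)) >> (64 - width);
}

/* Retry until the sequence counter is unchanged across the whole read. */
static uint64_t
read_counter(const struct fake_user_page *pc, pmc_read_fn pmc_read)
{
	uint32_t seq, index;
	uint64_t value;

	do {
		seq = pc->lock;
		__asm__ volatile("" ::: "memory"); /* compiler barrier */
		index = pc->index;
		value = (uint64_t)pc->offset;
		/* index 0 means the counter cannot be read from userspace */
		if (index != 0)
			value += (uint64_t)sign_extend(pmc_read(index - 1), pc->pmc_width);
		__asm__ volatile("" ::: "memory"); /* compiler barrier */
	} while (pc->lock != seq);

	return value;
}
```

With a 48-bit counter holding all ones, the raw value sign-extends to -1, so a page with an accumulated offset of 100 reads back as 99; with index set to 0 only the offset is returned.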
Signed-off-by: Tomasz Duszynski --- MAINTAINERS | 5 + app/test/meson.build | 1 + app/test/test_pmu.c | 49 +++ doc/api/doxy-api-index.md | 3 +- doc/api/doxy-api.conf.in | 1 + doc/guides/prog_guide/glossary.rst | 3 + doc/guides/prog_guide/profile_app.rst | 33 ++ doc/guides/rel_notes/release_25_03.rst | 7 + lib/eal/meson.build | 3 + lib/meson.build | 1 + lib/pmu/meson.build | 13 + lib/pmu/pmu_private.h | 32 ++ lib/pmu/rte_pmu.c | 474 +++++++++++++++++++++++++ lib/pmu/rte_pmu.h | 227 ++++++++++++ lib/pmu/version.map | 14 + 15 files changed, 865 insertions(+), 1 deletion(-) create mode 100644 app/test/test_pmu.c create mode 100644 lib/pmu/meson.build create mode 100644 lib/pmu/pmu_private.h create mode 100644 lib/pmu/rte_pmu.c create mode 100644 lib/pmu/rte_pmu.h create mode 100644 lib/pmu/version.map diff --git a/MAINTAINERS b/MAINTAINERS index b86cdd266b..226f41e36b 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -1868,6 +1868,11 @@ M: Nithin Dabilpuram M: Pavan Nikhilesh F: lib/node/ +PMU - EXPERIMENTAL +M: Tomasz Duszynski +F: lib/pmu/ +F: app/test/test_pmu* + Test Applications ----------------- diff --git a/app/test/meson.build b/app/test/meson.build index 48cf77fda9..fb9fa9faf8 100644 --- a/app/test/meson.build +++ b/app/test/meson.build @@ -142,6 +142,7 @@ source_file_deps = { 'test_pmd_perf.c': ['ethdev', 'net'] + packet_burst_generator_deps, 'test_pmd_ring.c': ['net_ring', 'ethdev', 'bus_vdev'], 'test_pmd_ring_perf.c': ['ethdev', 'net_ring', 'bus_vdev'], + 'test_pmu.c': ['pmu'], 'test_power.c': ['power'], 'test_power_cpufreq.c': ['power'], 'test_power_intel_uncore.c': ['power'], diff --git a/app/test/test_pmu.c b/app/test/test_pmu.c new file mode 100644 index 0000000000..210190e583 --- /dev/null +++ b/app/test/test_pmu.c @@ -0,0 +1,49 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(C) 2025 Marvell International Ltd. 
+ */ + +#include + +#include "test.h" + +static int +test_pmu_read(void) +{ + const char *name = NULL; + int tries = 10, event; + uint64_t val = 0; + + if (name == NULL) { + printf("PMU not supported on this arch\n"); + return TEST_SKIPPED; + } + + if (rte_pmu_init() < 0) + return TEST_FAILED; + + event = rte_pmu_add_event(name); + while (tries--) + val += rte_pmu_read(event); + + rte_pmu_fini(); + + return val ? TEST_SUCCESS : TEST_FAILED; +} + +static struct unit_test_suite pmu_tests = { + .suite_name = "pmu autotest", + .setup = NULL, + .teardown = NULL, + .unit_test_cases = { + TEST_CASE(test_pmu_read), + TEST_CASES_END() + } +}; + +static int +test_pmu(void) +{ + return unit_test_suite_runner(&pmu_tests); +} + +REGISTER_FAST_TEST(pmu_autotest, true, true, test_pmu); diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md index b2fc24b3e4..adf5950899 100644 --- a/doc/api/doxy-api-index.md +++ b/doc/api/doxy-api-index.md @@ -242,7 +242,8 @@ The public API headers are grouped by topics: [log](@ref rte_log.h), [errno](@ref rte_errno.h), [trace](@ref rte_trace.h), - [trace_point](@ref rte_trace_point.h) + [trace_point](@ref rte_trace_point.h), + [pmu](@ref rte_pmu.h) - **misc**: [EAL config](@ref rte_eal.h), diff --git a/doc/api/doxy-api.conf.in b/doc/api/doxy-api.conf.in index d23352d300..f26317d346 100644 --- a/doc/api/doxy-api.conf.in +++ b/doc/api/doxy-api.conf.in @@ -69,6 +69,7 @@ INPUT = @TOPDIR@/doc/api/doxy-api-index.md \ @TOPDIR@/lib/pdcp \ @TOPDIR@/lib/pdump \ @TOPDIR@/lib/pipeline \ + @TOPDIR@/lib/pmu \ @TOPDIR@/lib/port \ @TOPDIR@/lib/power \ @TOPDIR@/lib/ptr_compress \ diff --git a/doc/guides/prog_guide/glossary.rst b/doc/guides/prog_guide/glossary.rst index 8d6349701e..fd995373f2 100644 --- a/doc/guides/prog_guide/glossary.rst +++ b/doc/guides/prog_guide/glossary.rst @@ -164,6 +164,9 @@ pktmbuf PMD Poll Mode Driver +PMU + Performance Monitoring Unit + QoS Quality of Service diff --git a/doc/guides/prog_guide/profile_app.rst 
b/doc/guides/prog_guide/profile_app.rst index a6b5fb4d5e..9a18aa0a26 100644 --- a/doc/guides/prog_guide/profile_app.rst +++ b/doc/guides/prog_guide/profile_app.rst @@ -7,6 +7,39 @@ Profile Your Application The following sections describe methods of profiling DPDK applications on different architectures. +Performance counter based profiling +----------------------------------- + +Modern CPU architectures are equipped with Performance Monitoring Units (PMUs), which provide +programmable counters to monitor specific hardware events, such as cache hits, instruction counts, +and branch predictions. + +Tools like perf utilize PMUs to gather performance data. However, when CPU cores are isolated and +run dedicated tasks, and the performance of specific regions of code must be analyzed, the extra +overhead of such tools may be undesirable. In such cases, applications can directly access PMU data +using the ``rte_pmu_read()`` function. + +Access requirements +~~~~~~~~~~~~~~~~~~~ + +Userspace applications may be restricted, for various reasons, from accessing PMU internals. +To enable access, ``/proc/sys/kernel/perf_event_paranoid`` should be set to ``2`` and the application +should have the ``CAP_PERFMON`` capability assigned. + +For comprehensive information on security implications and configuration, refer to +`kernel documentation `_. + +Limitations +~~~~~~~~~~~ + +The current implementation imposes certain limitations: + +* Only EAL lcores are supported + +* EAL lcores must not share a CPU + +* Each EAL lcore measures the same group of events + Profiling on x86 ---------------- diff --git a/doc/guides/rel_notes/release_25_03.rst b/doc/guides/rel_notes/release_25_03.rst index 85986ffa61..546579a626 100644 --- a/doc/guides/rel_notes/release_25_03.rst +++ b/doc/guides/rel_notes/release_25_03.rst @@ -63,6 +63,13 @@ New Features and even substantial part of its code. It can be viewed as an extension of rte_ring functionality. 
+* **Added PMU library.** + + Added a new performance monitoring unit (PMU) library which allows applications + to perform self monitoring activities without depending on external utilities like perf. + After integration with :doc:`../prog_guide/trace_lib` data gathered from hardware counters + can be stored in CTF format for further analysis. + Removed Items ------------- diff --git a/lib/eal/meson.build b/lib/eal/meson.build index e1d6c4cf17..1349624653 100644 --- a/lib/eal/meson.build +++ b/lib/eal/meson.build @@ -18,6 +18,9 @@ deps += ['log', 'kvargs'] if not is_windows deps += ['telemetry'] endif +if dpdk_conf.has('RTE_LIB_PMU') + deps += ['pmu'] +endif if dpdk_conf.has('RTE_USE_LIBBSD') ext_deps += libbsd endif diff --git a/lib/meson.build b/lib/meson.build index ce92cb5537..968ad29e8d 100644 --- a/lib/meson.build +++ b/lib/meson.build @@ -13,6 +13,7 @@ libraries = [ 'kvargs', # eal depends on kvargs 'argparse', 'telemetry', # basic info querying + 'pmu', 'eal', # everything depends on eal 'ptr_compress', 'ring', diff --git a/lib/pmu/meson.build b/lib/pmu/meson.build new file mode 100644 index 0000000000..46eebda155 --- /dev/null +++ b/lib/pmu/meson.build @@ -0,0 +1,13 @@ +# SPDX-License-Identifier: BSD-3-Clause +# Copyright(C) 2025 Marvell International Ltd. + +if not is_linux + build = false + reason = 'only supported on Linux' + subdir_done() +endif + +headers = files('rte_pmu.h') +sources = files('rte_pmu.c') + +deps += ['log'] diff --git a/lib/pmu/pmu_private.h b/lib/pmu/pmu_private.h new file mode 100644 index 0000000000..388458850b --- /dev/null +++ b/lib/pmu/pmu_private.h @@ -0,0 +1,32 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2025 Marvell + */ + +#ifndef _PMU_PRIVATE_H_ +#define _PMU_PRIVATE_H_ + +/** + * Architecture specific PMU init callback. + * + * @return + * 0 in case of success, negative value otherwise. + */ +int +pmu_arch_init(void); + +/** + * Architecture specific PMU cleanup callback. 
+ */ +void +pmu_arch_fini(void); + +/** + * Apply architecture specific settings to config before passing it to syscall. + * + * @param config + * Architecture specific event configuration. Consult kernel sources for available options. + */ +void +pmu_arch_fixup_config(uint64_t config[3]); + +#endif /* _PMU_PRIVATE_H_ */ diff --git a/lib/pmu/rte_pmu.c b/lib/pmu/rte_pmu.c new file mode 100644 index 0000000000..ae24fdf2ee --- /dev/null +++ b/lib/pmu/rte_pmu.c @@ -0,0 +1,474 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(C) 2025 Marvell International Ltd. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include +#include + +#include "pmu_private.h" + +#define EVENT_SOURCE_DEVICES_PATH "/sys/bus/event_source/devices" + +#define FIELD_PREP(m, v) (((uint64_t)(v) << (__builtin_ffsll(m) - 1)) & (m)) + +RTE_LOG_REGISTER_DEFAULT(rte_pmu_logtype, INFO) +#define RTE_LOGTYPE_PMU rte_pmu_logtype + +#define PMU_LOG(level, ...) \ + RTE_LOG_LINE(level, PMU, ## __VA_ARGS__) + +/* A structure describing an event */ +struct rte_pmu_event { + char *name; + unsigned int index; + TAILQ_ENTRY(rte_pmu_event) next; +}; + +struct rte_pmu rte_pmu; + +/* + * Following __rte_weak functions provide default no-op. Architectures should override them if + * necessary. 
+ */ + +int +__rte_weak pmu_arch_init(void) +{ + return 0; +} + +void +__rte_weak pmu_arch_fini(void) +{ +} + +void +__rte_weak pmu_arch_fixup_config(uint64_t __rte_unused config[3]) +{ +} + +static int +get_term_format(const char *name, int *num, uint64_t *mask) +{ + char path[PATH_MAX]; + char *config = NULL; + int high, low, ret; + FILE *fp; + + *num = *mask = 0; + snprintf(path, sizeof(path), EVENT_SOURCE_DEVICES_PATH "/%s/format/%s", rte_pmu.name, name); + fp = fopen(path, "r"); + if (fp == NULL) + return -errno; + + errno = 0; + ret = fscanf(fp, "%m[^:]:%d-%d", &config, &low, &high); + if (ret < 2) { + ret = -ENODATA; + goto out; + } + if (errno) { + ret = -errno; + goto out; + } + + if (ret == 2) + high = low; + + *mask = RTE_GENMASK64(high, low); + /* Last digit should be [012]. If last digit is missing 0 is implied. */ + *num = config[strlen(config) - 1]; + *num = isdigit(*num) ? *num - '0' : 0; + + ret = 0; +out: + free(config); + fclose(fp); + + return ret; +} + +static int +parse_event(char *buf, uint64_t config[3]) +{ + char *token, *term; + int num, ret, val; + uint64_t mask; + char *tmp; + + config[0] = config[1] = config[2] = 0; + + token = strtok_r(buf, ",", &tmp); + while (token) { + errno = 0; + /* = */ + ret = sscanf(token, "%m[^=]=%i", &term, &val); + if (ret < 1) + return -ENODATA; + if (errno) + return -errno; + if (ret == 1) + val = 1; + + ret = get_term_format(term, &num, &mask); + free(term); + if (ret) + return ret; + + config[num] |= FIELD_PREP(mask, val); + token = strtok_r(NULL, ",", &tmp); + } + + return 0; +} + +static int +get_event_config(const char *name, uint64_t config[3]) +{ + char path[PATH_MAX], buf[BUFSIZ]; + FILE *fp; + int ret; + + snprintf(path, sizeof(path), EVENT_SOURCE_DEVICES_PATH "/%s/events/%s", rte_pmu.name, name); + fp = fopen(path, "r"); + if (fp == NULL) + return -errno; + + ret = fread(buf, 1, sizeof(buf), fp); + if (ret == 0) { + fclose(fp); + + return -EINVAL; + } + fclose(fp); + buf[ret] = '\0'; + + return 
parse_event(buf, config); +} + +static int +do_perf_event_open(uint64_t config[3], int group_fd) +{ + struct perf_event_attr attr = { + .size = sizeof(struct perf_event_attr), + .type = PERF_TYPE_RAW, + .exclude_kernel = 1, + .exclude_hv = 1, + .disabled = 1, + .pinned = group_fd == -1, + }; + + pmu_arch_fixup_config(config); + + attr.config = config[0]; + attr.config1 = config[1]; + attr.config2 = config[2]; + + return syscall(SYS_perf_event_open, &attr, 0, -1, group_fd, 0); +} + +static int +open_events(struct rte_pmu_event_group *group) +{ + struct rte_pmu_event *event; + uint64_t config[3]; + int num = 0, ret; + + /* group leader gets created first, with fd = -1 */ + group->fds[0] = -1; + + TAILQ_FOREACH(event, &rte_pmu.event_list, next) { + ret = get_event_config(event->name, config); + if (ret) + continue; + + ret = do_perf_event_open(config, group->fds[0]); + if (ret == -1) { + ret = -errno; + goto out; + } + + group->fds[event->index] = ret; + num++; + } + + return 0; +out: + for (--num; num >= 0; num--) { + close(group->fds[num]); + group->fds[num] = -1; + } + + return ret; +} + +static int +mmap_events(struct rte_pmu_event_group *group) +{ + long page_size = sysconf(_SC_PAGE_SIZE); + unsigned int i; + void *addr; + int ret; + + for (i = 0; i < rte_pmu.num_group_events; i++) { + addr = mmap(0, page_size, PROT_READ, MAP_SHARED, group->fds[i], 0); + if (addr == MAP_FAILED) { + ret = -errno; + goto out; + } + + group->mmap_pages[i] = addr; + } + + return 0; +out: + for (; i; i--) { + munmap(group->mmap_pages[i - 1], page_size); + group->mmap_pages[i - 1] = NULL; + } + + return ret; +} + +static void +cleanup_events(struct rte_pmu_event_group *group) +{ + unsigned int i; + + if (group->fds[0] != -1) + ioctl(group->fds[0], PERF_EVENT_IOC_DISABLE, PERF_IOC_FLAG_GROUP); + + for (i = 0; i < rte_pmu.num_group_events; i++) { + if (group->mmap_pages[i]) { + munmap(group->mmap_pages[i], sysconf(_SC_PAGE_SIZE)); + group->mmap_pages[i] = NULL; + } + + if (group->fds[i] 
!= -1) { + close(group->fds[i]); + group->fds[i] = -1; + } + } + + group->enabled = false; +} + +int +__rte_pmu_enable_group(struct rte_pmu_event_group *group) +{ + int ret; + + if (rte_pmu.num_group_events == 0) + return -ENODEV; + + ret = open_events(group); + if (ret) + goto out; + + ret = mmap_events(group); + if (ret) + goto out; + + if (ioctl(group->fds[0], PERF_EVENT_IOC_RESET, PERF_IOC_FLAG_GROUP) == -1) { + ret = -errno; + goto out; + } + + if (ioctl(group->fds[0], PERF_EVENT_IOC_ENABLE, PERF_IOC_FLAG_GROUP) == -1) { + ret = -errno; + goto out; + } + + group->enabled = true; + + return 0; +out: + cleanup_events(group); + + return ret; +} + +static int +scan_pmus(void) +{ + char path[PATH_MAX]; + struct dirent *dent; + const char *name; + DIR *dirp; + + dirp = opendir(EVENT_SOURCE_DEVICES_PATH); + if (dirp == NULL) + return -errno; + + while ((dent = readdir(dirp))) { + name = dent->d_name; + if (name[0] == '.') + continue; + + /* sysfs entry should either contain cpus or be a cpu */ + if (!strcmp(name, "cpu")) + break; + + snprintf(path, sizeof(path), EVENT_SOURCE_DEVICES_PATH "/%s/cpus", name); + if (access(path, F_OK) == 0) + break; + } + + if (dent) { + rte_pmu.name = strdup(name); + if (rte_pmu.name == NULL) { + closedir(dirp); + + return -ENOMEM; + } + } + + closedir(dirp); + + return rte_pmu.name ? 
0 : -ENODEV; +} + +static struct rte_pmu_event * +new_event(const char *name) +{ + struct rte_pmu_event *event; + + event = calloc(1, sizeof(*event)); + if (event == NULL) + goto out; + + event->name = strdup(name); + if (event->name == NULL) { + free(event); + event = NULL; + } + +out: + return event; +} + +static void +free_event(struct rte_pmu_event *event) +{ + free(event->name); + free(event); +} + +int +rte_pmu_add_event(const char *name) +{ + struct rte_pmu_event *event; + char path[PATH_MAX]; + + if (!rte_pmu.initialized) { + PMU_LOG(ERR, "PMU is not initialized"); + return -ENODEV; + } + + if (rte_pmu.num_group_events + 1 >= RTE_MAX_NUM_GROUP_EVENTS) { + PMU_LOG(ERR, "Excessive number of events in a group (%d > %d)", + rte_pmu.num_group_events, RTE_MAX_NUM_GROUP_EVENTS); + return -ENOSPC; + } + + snprintf(path, sizeof(path), EVENT_SOURCE_DEVICES_PATH "/%s/events/%s", rte_pmu.name, name); + if (access(path, R_OK)) { + PMU_LOG(ERR, "Cannot access %s", path); + return -ENODEV; + } + + TAILQ_FOREACH(event, &rte_pmu.event_list, next) { + if (strcmp(event->name, name)) + continue; + + return event->index; + } + + event = new_event(name); + if (event == NULL) { + PMU_LOG(ERR, "Failed to create event %s", name); + return -ENOMEM; + } + + event->index = rte_pmu.num_group_events++; + TAILQ_INSERT_TAIL(&rte_pmu.event_list, event, next); + + return event->index; +} + +int +rte_pmu_init(void) +{ + int ret; + + if (rte_pmu.initialized) + return 0; + + ret = scan_pmus(); + if (ret) { + PMU_LOG(ERR, "Failed to scan for event sources"); + goto out; + } + + ret = pmu_arch_init(); + if (ret) { + PMU_LOG(ERR, "Failed to setup arch internals"); + goto out; + } + + TAILQ_INIT(&rte_pmu.event_list); + rte_pmu.initialized = 1; +out: + if (ret) { + free(rte_pmu.name); + rte_pmu.name = NULL; + } + + return ret; +} + +void +rte_pmu_fini(void) +{ + struct rte_pmu_event *event, *tmp_event; + struct rte_pmu_event_group *group; + unsigned int i; + + if (!rte_pmu.initialized) + return; + 
+ RTE_TAILQ_FOREACH_SAFE(event, &rte_pmu.event_list, next, tmp_event) { + TAILQ_REMOVE(&rte_pmu.event_list, event, next); + free_event(event); + } + + for (i = 0; i < RTE_DIM(rte_pmu.event_groups); i++) { + group = &rte_pmu.event_groups[i]; + if (!group->enabled) + continue; + + cleanup_events(group); + } + + pmu_arch_fini(); + free(rte_pmu.name); + rte_pmu.name = NULL; + rte_pmu.num_group_events = 0; +} diff --git a/lib/pmu/rte_pmu.h b/lib/pmu/rte_pmu.h new file mode 100644 index 0000000000..a158091115 --- /dev/null +++ b/lib/pmu/rte_pmu.h @@ -0,0 +1,227 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2025 Marvell + */ + +#ifndef _RTE_PMU_H_ +#define _RTE_PMU_H_ + +/** + * @file + * + * PMU event tracing operations + * + * This file defines the generic API and types necessary to set up the PMU and + * read selected counters at runtime. Exported APIs are generally not MT-safe. + * One exception is rte_pmu_read() which can be called concurrently once + * everything has been set up. + * + * In order to initialize the library, the following sequence of calls, performed by the same + * EAL thread, is required: + * + * rte_pmu_init() + * rte_pmu_add_event() + * + * Afterwards all threads can read events by calling rte_pmu_read(). + */ + +#ifdef __cplusplus +extern "C" { +#endif + +#include +#include + +#include +#include +#include +#include +#include + +/** Maximum number of events in a group */ +#define RTE_MAX_NUM_GROUP_EVENTS 8 + +/** + * A structure describing a group of events. + */ +struct __rte_cache_aligned rte_pmu_event_group { + /** array of user pages */ + struct perf_event_mmap_page *mmap_pages[RTE_MAX_NUM_GROUP_EVENTS]; + int fds[RTE_MAX_NUM_GROUP_EVENTS]; /**< array of event descriptors */ + TAILQ_ENTRY(rte_pmu_event_group) next; /**< list entry */ + bool enabled; /**< true if group was enabled on particular lcore */ +}; + +/** + * A PMU state container. 
+ */ +struct rte_pmu { + struct rte_pmu_event_group event_groups[RTE_MAX_LCORE]; /**< event groups */ + unsigned int num_group_events; /**< number of events in a group */ + unsigned int initialized; /**< initialization counter */ + char *name; /**< name of core PMU listed under /sys/bus/event_source/devices */ + TAILQ_HEAD(, rte_pmu_event) event_list; /**< list of matching events */ +}; + +/** PMU state container */ +extern struct rte_pmu rte_pmu; + +/** Each architecture supporting PMU needs to provide its own version */ +#ifndef rte_pmu_pmc_read +#define rte_pmu_pmc_read(index) ({ RTE_SET_USED(index); 0; }) +#endif + +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice + * + * Read PMU counter. + * + * @warning This should not be called directly. + * + * @param pc + * Pointer to the mmapped user page. + * @return + * Counter value read from hardware. + */ +__rte_experimental +static __rte_always_inline uint64_t +__rte_pmu_read_userpage(struct perf_event_mmap_page *pc) +{ +#define __RTE_PMU_READ_ONCE(x) (*(const volatile typeof(x) *)&(x)) + uint64_t width, offset; + uint32_t seq, index; + int64_t pmc; + + for (;;) { + seq = __RTE_PMU_READ_ONCE(pc->lock); + rte_compiler_barrier(); + index = __RTE_PMU_READ_ONCE(pc->index); + offset = __RTE_PMU_READ_ONCE(pc->offset); + width = __RTE_PMU_READ_ONCE(pc->pmc_width); + + /* index set to 0 means that particular counter cannot be used */ + if (likely(pc->cap_user_rdpmc && index)) { + pmc = rte_pmu_pmc_read(index - 1); + pmc <<= 64 - width; + pmc >>= 64 - width; + offset += pmc; + } + + rte_compiler_barrier(); + + if (likely(__RTE_PMU_READ_ONCE(pc->lock) == seq)) + return offset; + } + + return 0; +} + +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice + * + * Enable group of events on the calling lcore. + * + * @warning This should not be called directly. + * + * @param group + * Pointer to the group which will be enabled. 
+ * @return + * 0 in case of success, negative value otherwise. + */ +__rte_experimental +int +__rte_pmu_enable_group(struct rte_pmu_event_group *group); + +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice + * + * Initialize PMU library. + * + * @return + * 0 in case of success, negative value otherwise. + */ +__rte_experimental +int +rte_pmu_init(void); + +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice + * + * Finalize PMU library. + */ +__rte_experimental +void +rte_pmu_fini(void); + +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice + * + * Add event to the group of enabled events. + * + * @param name + * Name of an event listed under /sys/bus/event_source/devices/pmu/events, where pmu is + * a placeholder for an event source. + * @return + * Event index in case of success, negative value otherwise. + */ +__rte_experimental +int +rte_pmu_add_event(const char *name); + +/* quiesce warnings produced by chkincs caused by calling internal functions directly */ +#ifndef ALLOW_EXPERIMENTAL_API +#define __rte_pmu_enable_group(group) ({ RTE_SET_USED(group); 0; }) +#define __rte_pmu_read_userpage(pc) ({ RTE_SET_USED(pc); 0; }) +#endif + +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice + * + * Read hardware counter configured to count occurrences of an event. This is called by an lcore + * (EAL thread) bound exclusively to a particular CPU and may not work as expected if it is migrated + * elsewhere, because the event group is pinned and hence not supposed to be multiplexed with any + * other events. This is the only API which can be called concurrently by different lcores. + * + * @param index + * Index of an event to be read. + * @return + * Event value read from register. In case of errors or lack of support + * 0 is returned. In other words, a stream of zeros in a trace file + * indicates a problem with reading a particular PMU event register. 
+ */ +__rte_experimental +static __rte_always_inline uint64_t +rte_pmu_read(unsigned int index) +{ + unsigned int lcore_id = rte_lcore_id(); + struct rte_pmu_event_group *group; + + if (unlikely(!rte_pmu.initialized)) + return 0; + + /* non-EAL threads are not supported */ + if (unlikely(lcore_id >= RTE_MAX_LCORE)) + return 0; + + if (unlikely(index >= rte_pmu.num_group_events)) + return 0; + + group = &rte_pmu.event_groups[lcore_id]; + if (unlikely(!group->enabled)) { + if (__rte_pmu_enable_group(group)) + return 0; + } + + return __rte_pmu_read_userpage(group->mmap_pages[index]); +} + +#ifdef __cplusplus +} +#endif + +#endif /* _RTE_PMU_H_ */ diff --git a/lib/pmu/version.map b/lib/pmu/version.map new file mode 100644 index 0000000000..d21f7cf933 --- /dev/null +++ b/lib/pmu/version.map @@ -0,0 +1,14 @@ +EXPERIMENTAL { + global: + + # added in 25.03 + __rte_pmu_enable_group; + __rte_pmu_read_userpage; + rte_pmu; + rte_pmu_add_event; + rte_pmu_fini; + rte_pmu_init; + rte_pmu_read; + + local: *; +}; From patchwork Fri Jan 17 09:00:31 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tomasz Duszynski X-Patchwork-Id: 150160 X-Patchwork-Delegate: thomas@monjalon.net From: Tomasz Duszynski To: , Tomasz Duszynski , "Wathsala Vithanage" CC: , , , , , , , , , , Subject: [PATCH v17 2/4] pmu: support reading ARM PMU events in runtime Date: Fri, 17 Jan 2025 10:00:31 +0100 Message-ID: <20250117090033.2807073-3-tduszynski@marvell.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20250117090033.2807073-1-tduszynski@marvell.com> References: <20241118073706.3129423-1-tduszynski@marvell.com> <20250117090033.2807073-1-tduszynski@marvell.com> Add support for reading ARM PMU events in runtime.
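As a rough illustration of the user-access handling this patch introduces for arm64: the current value of the perf_user_access sysctl is saved, set to 1 if needed, and written back on cleanup. The sketch below is self-contained and makes several assumptions: the helper names user_access_enable() and user_access_restore() are hypothetical, and the path is passed as a parameter so the logic can be tried on a scratch file rather than on /proc/sys/kernel/perf_user_access. It also terminates the buffer before parsing and captures errno before close() can overwrite it.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Read a decimal integer from a sysctl-style file; returns 0 or -errno. */
static int
read_attr_int(const char *path, int *val)
{
	char buf[32];
	ssize_t n;
	int fd, err;

	fd = open(path, O_RDONLY);
	if (fd == -1)
		return -errno;

	n = read(fd, buf, sizeof(buf) - 1);
	err = errno; /* capture before close() can overwrite it */
	close(fd);
	if (n == -1)
		return -err;

	buf[n] = '\0'; /* read() does not terminate the buffer */
	*val = (int)strtol(buf, NULL, 10);

	return 0;
}

/* Write a decimal integer to a sysctl-style file; returns 0 or -errno. */
static int
write_attr_int(const char *path, int val)
{
	char buf[32];
	ssize_t n;
	int fd, len, err;

	fd = open(path, O_WRONLY | O_TRUNC);
	if (fd == -1)
		return -errno;

	len = snprintf(buf, sizeof(buf), "%d", val);
	n = write(fd, buf, (size_t)len);
	err = errno;
	close(fd);

	return n == -1 ? -err : 0;
}

/* Hypothetical helper: enable user access and remember the previous value. */
static int
user_access_enable(const char *path, int *saved)
{
	int ret = read_attr_int(path, saved);

	if (ret)
		return ret;

	/* nothing to do if user access is already enabled */
	if (*saved == 1)
		return 0;

	return write_attr_int(path, 1);
}

/* Hypothetical helper: put back whatever value was found at enable time. */
static int
user_access_restore(const char *path, int saved)
{
	return write_attr_int(path, saved);
}
```

Keeping the saved value with the caller, rather than in a file-scope variable, makes it harder to restore a value that was never read, for example when cleanup runs after a failed initialization.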
Signed-off-by: Tomasz Duszynski --- app/test/test_pmu.c | 4 ++ lib/pmu/meson.build | 8 ++++ lib/pmu/pmu_arm64.c | 94 +++++++++++++++++++++++++++++++++++++ lib/pmu/rte_pmu.h | 4 ++ lib/pmu/rte_pmu_pmc_arm64.h | 30 ++++++++++++ 5 files changed, 140 insertions(+) create mode 100644 lib/pmu/pmu_arm64.c create mode 100644 lib/pmu/rte_pmu_pmc_arm64.h diff --git a/app/test/test_pmu.c b/app/test/test_pmu.c index 210190e583..e0c11e4898 100644 --- a/app/test/test_pmu.c +++ b/app/test/test_pmu.c @@ -13,6 +13,10 @@ test_pmu_read(void) int tries = 10, event; uint64_t val = 0; +#if defined(RTE_ARCH_ARM64) + name = "cpu_cycles"; +#endif + if (name == NULL) { printf("PMU not supported on this arch\n"); return TEST_SKIPPED; diff --git a/lib/pmu/meson.build b/lib/pmu/meson.build index 46eebda155..467841e431 100644 --- a/lib/pmu/meson.build +++ b/lib/pmu/meson.build @@ -10,4 +10,12 @@ endif headers = files('rte_pmu.h') sources = files('rte_pmu.c') +indirect_headers += files( + 'rte_pmu_pmc_arm64.h', +) + +if dpdk_conf.has('RTE_ARCH_ARM64') + sources += files('pmu_arm64.c') +endif + deps += ['log'] diff --git a/lib/pmu/pmu_arm64.c b/lib/pmu/pmu_arm64.c new file mode 100644 index 0000000000..a23f1864df --- /dev/null +++ b/lib/pmu/pmu_arm64.c @@ -0,0 +1,94 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(C) 2025 Marvell International Ltd. 
+ */ + +#include +#include +#include +#include + +#include +#include + +#include "pmu_private.h" + +#define PERF_USER_ACCESS_PATH "/proc/sys/kernel/perf_user_access" + +static int restore_uaccess; + +static int +read_attr_int(const char *path, int *val) +{ + char buf[BUFSIZ] = { 0 }; /* zeroed so the text read below stays NUL-terminated */ + int ret, fd; + + fd = open(path, O_RDONLY); + if (fd == -1) + return -errno; + + ret = read(fd, buf, sizeof(buf) - 1); + if (ret == -1) { + ret = -errno; /* save errno before close() can overwrite it */ + close(fd); + return ret; + } + + *val = strtol(buf, NULL, 10); + close(fd); + + return 0; +} + +static int +write_attr_int(const char *path, int val) +{ + char buf[BUFSIZ]; + int num, ret, fd; + + fd = open(path, O_WRONLY); + if (fd == -1) + return -errno; + + num = snprintf(buf, sizeof(buf), "%d", val); + ret = write(fd, buf, num); + if (ret == -1) { + ret = -errno; /* save errno before close() can overwrite it */ + close(fd); + return ret; + } + + close(fd); + + return 0; +} + +int +pmu_arch_init(void) +{ + int ret; + + ret = read_attr_int(PERF_USER_ACCESS_PATH, &restore_uaccess); + if (ret) + return ret; + + /* user access already enabled */ + if (restore_uaccess == 1) + return 0; + + return write_attr_int(PERF_USER_ACCESS_PATH, 1); +} + +void +pmu_arch_fini(void) +{ + write_attr_int(PERF_USER_ACCESS_PATH, restore_uaccess); +} + +void +pmu_arch_fixup_config(uint64_t config[3]) +{ + /* select 64 bit counters */ + config[1] |= RTE_BIT64(0); + /* enable userspace access */ + config[1] |= RTE_BIT64(1); +} diff --git a/lib/pmu/rte_pmu.h b/lib/pmu/rte_pmu.h index a158091115..171c52e692 100644 --- a/lib/pmu/rte_pmu.h +++ b/lib/pmu/rte_pmu.h @@ -37,6 +37,10 @@ extern "C" { #include #include +#if defined(RTE_ARCH_ARM64) +#include "rte_pmu_pmc_arm64.h" +#endif + /** Maximum number of events in a group */ #define RTE_MAX_NUM_GROUP_EVENTS 8 diff --git a/lib/pmu/rte_pmu_pmc_arm64.h b/lib/pmu/rte_pmu_pmc_arm64.h new file mode 100644 index 0000000000..2952f5d65e --- /dev/null +++ b/lib/pmu/rte_pmu_pmc_arm64.h @@ -0,0 +1,30 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2025 Marvell. 
+ */ +#ifndef _RTE_PMU_PMC_ARM64_H_ +#define _RTE_PMU_PMC_ARM64_H_ + +#include + +static __rte_always_inline uint64_t +rte_pmu_pmc_read(int index) +{ + uint64_t val; + + if (index == 31) { + /* CPU Cycles (0x11) must be read via pmccntr_el0 */ + asm volatile("mrs %0, pmccntr_el0" : "=r" (val)); + } else { + asm volatile( + "msr pmselr_el0, %x0\n" + "mrs %0, pmxevcntr_el0\n" + : "=r" (val) + : "rZ" (index) + ); + } + + return val; +} +#define rte_pmu_pmc_read rte_pmu_pmc_read + +#endif /* _RTE_PMU_PMC_ARM64_H_ */ From patchwork Fri Jan 17 09:00:32 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tomasz Duszynski X-Patchwork-Id: 150161 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from mails.dpdk.org (unknown [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id EF96A460AB; Fri, 17 Jan 2025 10:02:06 +0100 (CET) Received: from mails.dpdk.org (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 5549A427B5; Fri, 17 Jan 2025 10:01:11 +0100 (CET) Received: from mx0a-0016f401.pphosted.com (mx0a-0016f401.pphosted.com [67.231.148.174]) by mails.dpdk.org (Postfix) with ESMTP id 6A56D427B4 for ; Fri, 17 Jan 2025 10:01:10 +0100 (CET) Received: from pps.filterd (m0431384.ppops.net [127.0.0.1]) by mx0a-0016f401.pphosted.com (8.18.1.2/8.18.1.2) with ESMTP id 50H82J8F008546; Fri, 17 Jan 2025 01:01:05 -0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=marvell.com; h= cc:content-transfer-encoding:content-type:date:from:in-reply-to :message-id:mime-version:references:subject:to; s=pfpt0220; bh=U kaxJE7MhNlYBGGRuiVDGZY2hW7UWP4Fr490uCfIlI8=; b=P+GWmnTLxhRKQ6l9+ yND63m24GZv8ntGEsWkihwsxoiqMwoYyIuu+fq8pLfxf7hlQLmeB6FrzXU2J7pVs NLN410c4duKfTcaqg4B56byJ9wLkfN6qrGLvvj0KrQleoFAzUGLsKuDpKVTVwY+K 22u0K3uiTBfWyhEHxSN1HEgHe8NVdtmzznlMmqR1pyV/LOkqfFoW0Uy5RvSOkTqO 
Kud2bSDl8xlQM8H5JuGQuQF1hgIku9aIixmN4FER6F8MpaIDxBNRDpnMFQO68rkX 853vDY3eq4gMHPt70zXBQL6GgxnUjoyXHq7sEZW5Ch9p3mZcv01x8BESRcosAjuU l26MA== Received: from dc6wp-exch02.marvell.com ([4.21.29.225]) by mx0a-0016f401.pphosted.com (PPS) with ESMTPS id 447kb982vg-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 17 Jan 2025 01:01:05 -0800 (PST) Received: from DC6WP-EXCH02.marvell.com (10.76.176.209) by DC6WP-EXCH02.marvell.com (10.76.176.209) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1544.4; Fri, 17 Jan 2025 01:01:03 -0800 Received: from maili.marvell.com (10.69.176.80) by DC6WP-EXCH02.marvell.com (10.76.176.209) with Microsoft SMTP Server id 15.2.1544.4 via Frontend Transport; Fri, 17 Jan 2025 01:01:03 -0800 Received: from cavium-optiplex-3070-BM15.. (unknown [10.28.34.39]) by maili.marvell.com (Postfix) with ESMTP id C4A733F707D; Fri, 17 Jan 2025 01:00:59 -0800 (PST) From: Tomasz Duszynski To: , Tomasz Duszynski CC: , , , , , , , , , , Subject: [PATCH v17 3/4] pmu: support reading Intel x86_64 PMU events in runtime Date: Fri, 17 Jan 2025 10:00:32 +0100 Message-ID: <20250117090033.2807073-4-tduszynski@marvell.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20250117090033.2807073-1-tduszynski@marvell.com> References: <20241118073706.3129423-1-tduszynski@marvell.com> <20250117090033.2807073-1-tduszynski@marvell.com> MIME-Version: 1.0 X-Proofpoint-GUID: SYAvUxSwkay-jxIEa2G1CIIZ7rUW5Yx2 X-Proofpoint-ORIG-GUID: SYAvUxSwkay-jxIEa2G1CIIZ7rUW5Yx2 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1057,Hydra:6.0.680,FMLib:17.12.68.34 definitions=2025-01-17_03,2025-01-16_01,2024-11-22_01 X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Add support for reading Intel x86_64 PMU events in runtime. 
Signed-off-by: Tomasz Duszynski --- app/test/test_pmu.c | 2 ++ lib/pmu/meson.build | 1 + lib/pmu/rte_pmu.h | 2 ++ lib/pmu/rte_pmu_pmc_x86_64.h | 24 ++++++++++++++++++++++++ 4 files changed, 29 insertions(+) create mode 100644 lib/pmu/rte_pmu_pmc_x86_64.h diff --git a/app/test/test_pmu.c b/app/test/test_pmu.c index e0c11e4898..addb078005 100644 --- a/app/test/test_pmu.c +++ b/app/test/test_pmu.c @@ -15,6 +15,8 @@ test_pmu_read(void) #if defined(RTE_ARCH_ARM64) name = "cpu_cycles"; +#elif defined(RTE_ARCH_X86_64) + name = "cpu-cycles"; #endif if (name == NULL) { diff --git a/lib/pmu/meson.build b/lib/pmu/meson.build index 467841e431..ad16ead11f 100644 --- a/lib/pmu/meson.build +++ b/lib/pmu/meson.build @@ -12,6 +12,7 @@ sources = files('rte_pmu.c') indirect_headers += files( 'rte_pmu_pmc_arm64.h', + 'rte_pmu_pmc_x86_64.h', ) if dpdk_conf.has('RTE_ARCH_ARM64') diff --git a/lib/pmu/rte_pmu.h b/lib/pmu/rte_pmu.h index 171c52e692..c66898d1d3 100644 --- a/lib/pmu/rte_pmu.h +++ b/lib/pmu/rte_pmu.h @@ -39,6 +39,8 @@ extern "C" { #if defined(RTE_ARCH_ARM64) #include "rte_pmu_pmc_arm64.h" +#elif defined(RTE_ARCH_X86_64) +#include "rte_pmu_pmc_x86_64.h" #endif /** Maximum number of events in a group */ diff --git a/lib/pmu/rte_pmu_pmc_x86_64.h b/lib/pmu/rte_pmu_pmc_x86_64.h new file mode 100644 index 0000000000..9ba6e5bae8 --- /dev/null +++ b/lib/pmu/rte_pmu_pmc_x86_64.h @@ -0,0 +1,24 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2025 Marvell. 
+ */ +#ifndef _RTE_PMU_PMC_X86_64_H_ +#define _RTE_PMU_PMC_X86_64_H_ + +#include + +static __rte_always_inline uint64_t +rte_pmu_pmc_read(int index) +{ + uint64_t low, high; + + asm volatile( + "rdpmc\n" + : "=a" (low), "=d" (high) + : "c" (index) + ); + + return low | (high << 32); +} +#define rte_pmu_pmc_read rte_pmu_pmc_read + +#endif /* _RTE_PMU_PMC_X86_64_H_ */ From patchwork Fri Jan 17 09:00:33 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tomasz Duszynski X-Patchwork-Id: 150162 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from mails.dpdk.org (unknown [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 0ECBC460A7; Fri, 17 Jan 2025 10:02:28 +0100 (CET) Received: from mails.dpdk.org (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 3D686427AE; Fri, 17 Jan 2025 10:01:15 +0100 (CET) Received: from mx0b-0016f401.pphosted.com (mx0a-0016f401.pphosted.com [67.231.148.174]) by mails.dpdk.org (Postfix) with ESMTP id 08695427A6 for ; Fri, 17 Jan 2025 10:01:12 +0100 (CET) Received: from pps.filterd (m0045849.ppops.net [127.0.0.1]) by mx0a-0016f401.pphosted.com (8.18.1.2/8.18.1.2) with ESMTP id 50H8QRjI009513; Fri, 17 Jan 2025 01:01:09 -0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=marvell.com; h= cc:content-transfer-encoding:content-type:date:from:in-reply-to :message-id:mime-version:references:subject:to; s=pfpt0220; bh=H Ebrfi/nW1DltkH3/3N6W/ctCP8wQ4yJziLfN35NVWE=; b=N+Np1BAcJsU8NrGu8 oZR6PPC1Wzl6JywmH6/tjSjGk5hOjhBa7IcTGmKjIatSbZEE0GG0NL3aUlczePkJ OFYzER28qbUV99Z43wZcC/IvGcEzxt1njqhfWFQcvoMQefvm35I2azNfD8Nn7Bq0 NnC2To8NxnGexsNFGBBeibLZIdCirFJjYzdNfQGmNhXH3Xa62oz717vu1wWxuJOa +/nyBqJ50oNxRM1QL1/tefXrny16xOCSBzHYy+ggQo7spB1qe8SNRf/wz9iLb/O1 61/9M6PiB2hhqsznOgh/aoRAXiqCgrNGB6VeM7SewLOUDsc/ySoowt9Q2Pa8+KTo 0wQ3w== Received: from dc5-exch05.marvell.com 
([199.233.59.128]) by mx0a-0016f401.pphosted.com (PPS) with ESMTPS id 447kps81xj-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 17 Jan 2025 01:01:08 -0800 (PST) Received: from DC5-EXCH05.marvell.com (10.69.176.209) by DC5-EXCH05.marvell.com (10.69.176.209) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1544.4; Fri, 17 Jan 2025 01:01:08 -0800 Received: from maili.marvell.com (10.69.176.80) by DC5-EXCH05.marvell.com (10.69.176.209) with Microsoft SMTP Server id 15.2.1544.4 via Frontend Transport; Fri, 17 Jan 2025 01:01:08 -0800 Received: from cavium-optiplex-3070-BM15.. (unknown [10.28.34.39]) by maili.marvell.com (Postfix) with ESMTP id 1D7D03F707D; Fri, 17 Jan 2025 01:01:03 -0800 (PST) From: Tomasz Duszynski To: , Jerin Jacob , Sunil Kumar Kori , Tyler Retzlaff , "Tomasz Duszynski" CC: , , , , , , , , Subject: [PATCH v17 4/4] eal: add PMU support to tracing library Date: Fri, 17 Jan 2025 10:00:33 +0100 Message-ID: <20250117090033.2807073-5-tduszynski@marvell.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20250117090033.2807073-1-tduszynski@marvell.com> References: <20241118073706.3129423-1-tduszynski@marvell.com> <20250117090033.2807073-1-tduszynski@marvell.com> MIME-Version: 1.0 X-Proofpoint-ORIG-GUID: B9CCs1G-7cEPacTHKctqHZtealRzOLgL X-Proofpoint-GUID: B9CCs1G-7cEPacTHKctqHZtealRzOLgL X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1057,Hydra:6.0.680,FMLib:17.12.68.34 definitions=2025-01-17_03,2025-01-16_01,2024-11-22_01 X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org In order to profile app one needs to store significant amount of samples somewhere for an analysis later on. 
Since the trace library supports storing data in CTF format, let's take advantage of that and add a dedicated PMU tracepoint. Signed-off-by: Tomasz Duszynski --- app/test/test_trace_perf.c | 10 ++++ doc/guides/prog_guide/profile_app.rst | 5 ++ doc/guides/prog_guide/trace_lib.rst | 32 +++++++++++ lib/eal/common/eal_common_trace.c | 5 +- lib/eal/common/eal_common_trace_pmu.c | 38 ++++++++++++++ lib/eal/common/eal_common_trace_points.c | 5 ++ lib/eal/common/eal_trace.h | 4 ++ lib/eal/common/meson.build | 1 + lib/eal/include/rte_eal_trace.h | 11 ++++ lib/eal/version.map | 1 + lib/pmu/rte_pmu.c | 67 +++++++++++++++++++++++- lib/pmu/rte_pmu.h | 24 +++++++-- lib/pmu/version.map | 1 + 13 files changed, 198 insertions(+), 6 deletions(-) create mode 100644 lib/eal/common/eal_common_trace_pmu.c diff --git a/app/test/test_trace_perf.c b/app/test/test_trace_perf.c index 8257cc02be..28f908ce40 100644 --- a/app/test/test_trace_perf.c +++ b/app/test/test_trace_perf.c @@ -114,6 +114,10 @@ worker_fn_##func(void *arg) \ #define GENERIC_DOUBLE rte_eal_trace_generic_double(3.66666) #define GENERIC_STR rte_eal_trace_generic_str("hello world") #define VOID_FP app_dpdk_test_fp() +#ifdef RTE_LIB_PMU +/* 0 corresponds to the first event passed via --trace= */ +#define READ_PMU rte_pmu_trace_read(0) +#endif WORKER_DEFINE(GENERIC_VOID) WORKER_DEFINE(GENERIC_U64) @@ -122,6 +126,9 @@ WORKER_DEFINE(GENERIC_FLOAT) WORKER_DEFINE(GENERIC_DOUBLE) WORKER_DEFINE(GENERIC_STR) WORKER_DEFINE(VOID_FP) +#ifdef RTE_LIB_PMU +WORKER_DEFINE(READ_PMU) +#endif static void run_test(const char *str, lcore_function_t f, struct test_data *data, size_t sz) @@ -174,6 +181,9 @@ test_trace_perf(void) run_test("double", worker_fn_GENERIC_DOUBLE, data, sz); run_test("string", worker_fn_GENERIC_STR, data, sz); run_test("void_fp", worker_fn_VOID_FP, data, sz); +#ifdef RTE_LIB_PMU + run_test("read_pmu", worker_fn_READ_PMU, data, sz); +#endif rte_free(data); return TEST_SUCCESS; diff --git a/doc/guides/prog_guide/profile_app.rst
b/doc/guides/prog_guide/profile_app.rst index 9a18aa0a26..525b5012ed 100644 --- a/doc/guides/prog_guide/profile_app.rst +++ b/doc/guides/prog_guide/profile_app.rst @@ -40,6 +40,11 @@ Current implementation imposes certain limitations: * Each EAL lcore measures same group of events +Alternatively, the tracing library can be used, which offers a dedicated tracepoint +``rte_pmu_trace_read()``. + +Refer to :doc:`../prog_guide/trace_lib` for more details. + Profiling on x86 ---------------- diff --git a/doc/guides/prog_guide/trace_lib.rst b/doc/guides/prog_guide/trace_lib.rst index d9b17abe90..1cbfd42656 100644 --- a/doc/guides/prog_guide/trace_lib.rst +++ b/doc/guides/prog_guide/trace_lib.rst @@ -46,6 +46,7 @@ DPDK tracing library features trace format and is compatible with ``LTTng``. For detailed information, refer to `Common Trace Format `_. +- Support reading PMU events on ARM64 and x86-64 (Intel) How to add a tracepoint? ------------------------ @@ -139,6 +140,37 @@ the user must use ``RTE_TRACE_POINT_FP`` instead of ``RTE_TRACE_POINT``. ``RTE_TRACE_POINT_FP`` is compiled out by default and it can be enabled using the ``enable_trace_fp`` option for meson build. +PMU tracepoint +-------------- + +Performance monitoring unit (PMU) event values can be read from hardware +registers using the predefined ``rte_pmu_trace_read`` tracepoint. + +Tracing is enabled via the ``--trace`` EAL option by passing both an expression +matching the PMU tracepoint name, i.e. ``lib.pmu.read``, and an expression +``e=ev1[,ev2,...]`` matching particular events:: + + --trace='.*pmu.read\|e=cpu_cycles,l1d_cache' + +Event names are available under ``/sys/bus/event_source/devices/PMU/events`` +directory, where ``PMU`` is a placeholder for either a ``cpu`` or a directory +containing ``cpus``. + +Contrary to other tracepoints, this does not need any extra variables +added to source files. Instead, the caller passes an index which follows the order of +events specified via the ``--trace`` parameter.
In the following example, index ``0`` +corresponds to ``cpu_cycles`` while index ``1`` corresponds to ``l1d_cache``. + +.. code-block:: c + + ... + rte_pmu_trace_read(0); + rte_pmu_trace_read(1); + ... + +PMU tracing support must be explicitly enabled using the ``enable_trace_fp`` +option for meson build. + Event record mode ----------------- diff --git a/lib/eal/common/eal_common_trace.c b/lib/eal/common/eal_common_trace.c index 918f49bf4f..9be8724ec4 100644 --- a/lib/eal/common/eal_common_trace.c +++ b/lib/eal/common/eal_common_trace.c @@ -72,8 +72,10 @@ eal_trace_init(void) goto free_meta; /* Apply global configurations */ - STAILQ_FOREACH(arg, &trace.args, next) + STAILQ_FOREACH(arg, &trace.args, next) { trace_args_apply(arg->val); + trace_pmu_args_apply(arg->val); + } rte_trace_mode_set(trace.mode); @@ -89,6 +91,7 @@ eal_trace_init(void) void eal_trace_fini(void) { + trace_pmu_args_free(); trace_mem_free(); trace_metadata_destroy(); eal_trace_args_free(); diff --git a/lib/eal/common/eal_common_trace_pmu.c b/lib/eal/common/eal_common_trace_pmu.c new file mode 100644 index 0000000000..3824904481 --- /dev/null +++ b/lib/eal/common/eal_common_trace_pmu.c @@ -0,0 +1,38 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(C) 2025 Marvell International Ltd.
+ */ + +#include + +#include "eal_trace.h" + +#ifdef RTE_LIB_PMU + +#include + +void +trace_pmu_args_apply(const char *arg) +{ + static bool once; + + if (!once) { + if (rte_pmu_init()) + return; + once = true; + } + + rte_pmu_add_events_by_pattern(arg); +} + +void +trace_pmu_args_free(void) +{ + rte_pmu_fini(); +} + +#else /* !RTE_LIB_PMU */ + +void trace_pmu_args_apply(const char *arg __rte_unused) { return; } +void trace_pmu_args_free(void) { return; } + +#endif /* RTE_LIB_PMU */ diff --git a/lib/eal/common/eal_common_trace_points.c b/lib/eal/common/eal_common_trace_points.c index 0f1240ea3a..25aeb439df 100644 --- a/lib/eal/common/eal_common_trace_points.c +++ b/lib/eal/common/eal_common_trace_points.c @@ -100,3 +100,8 @@ RTE_TRACE_POINT_REGISTER(rte_eal_trace_intr_enable, lib.eal.intr.enable) RTE_TRACE_POINT_REGISTER(rte_eal_trace_intr_disable, lib.eal.intr.disable) + +#if defined(ALLOW_EXPERIMENTAL_API) && defined(RTE_LIB_PMU) +RTE_TRACE_POINT_REGISTER(rte_pmu_trace_read, + lib.pmu.read) +#endif diff --git a/lib/eal/common/eal_trace.h b/lib/eal/common/eal_trace.h index 55262677e0..58fa43472a 100644 --- a/lib/eal/common/eal_trace.h +++ b/lib/eal/common/eal_trace.h @@ -104,6 +104,10 @@ int trace_epoch_time_save(void); void trace_mem_free(void); void trace_mem_per_thread_free(void); +/* PMU wrappers */ +void trace_pmu_args_apply(const char *arg); +void trace_pmu_args_free(void); + /* EAL interface */ int eal_trace_init(void); void eal_trace_fini(void); diff --git a/lib/eal/common/meson.build b/lib/eal/common/meson.build index e273745e93..239c111461 100644 --- a/lib/eal/common/meson.build +++ b/lib/eal/common/meson.build @@ -28,6 +28,7 @@ sources += files( 'eal_common_tailqs.c', 'eal_common_thread.c', 'eal_common_timer.c', + 'eal_common_trace_pmu.c', 'eal_common_trace_points.c', 'eal_common_uuid.c', 'malloc_elem.c', diff --git a/lib/eal/include/rte_eal_trace.h b/lib/eal/include/rte_eal_trace.h index 9ad2112801..774ff9dcb2 100644 --- 
a/lib/eal/include/rte_eal_trace.h +++ b/lib/eal/include/rte_eal_trace.h @@ -127,6 +127,17 @@ RTE_TRACE_POINT( #define RTE_EAL_TRACE_GENERIC_FUNC rte_eal_trace_generic_func(__func__) +#if defined(ALLOW_EXPERIMENTAL_API) && defined(RTE_LIB_PMU) +#include + +RTE_TRACE_POINT_FP( + rte_pmu_trace_read, + RTE_TRACE_POINT_ARGS(unsigned int index), + uint64_t val = rte_pmu_read(index); + rte_trace_point_emit_u64(val); +) +#endif + #ifdef __cplusplus } #endif diff --git a/lib/eal/version.map b/lib/eal/version.map index a20c713eb1..945414c626 100644 --- a/lib/eal/version.map +++ b/lib/eal/version.map @@ -398,6 +398,7 @@ EXPERIMENTAL { # added in 24.11 rte_bitset_to_str; rte_lcore_var_alloc; + __rte_pmu_trace_read; # WINDOWS_NO_EXPORT }; INTERNAL { diff --git a/lib/pmu/rte_pmu.c b/lib/pmu/rte_pmu.c index ae24fdf2ee..0ff9e76064 100644 --- a/lib/pmu/rte_pmu.c +++ b/lib/pmu/rte_pmu.c @@ -413,12 +413,75 @@ rte_pmu_add_event(const char *name) return event->index; } +static int +add_events(const char *pattern) +{ + char *token, *copy, *tmp; + int ret = 0; + + copy = strdup(pattern); + if (copy == NULL) + return -ENOMEM; + + token = strtok_r(copy, ",", &tmp); + while (token) { + ret = rte_pmu_add_event(token); + if (ret < 0) + break; + + token = strtok_r(NULL, ",", &tmp); + } + + free(copy); + + return ret >= 0 ? 0 : ret; +} + +int +rte_pmu_add_events_by_pattern(const char *pattern) +{ + regmatch_t rmatch; + char buf[BUFSIZ]; + unsigned int num; + regex_t reg; + int ret; + + /* events are matched against occurrences of e=ev1[,ev2,..] 
pattern */ + ret = regcomp(&reg, "e=([_[:alnum:]-],?)+", REG_EXTENDED); + if (ret) { + PMU_LOG(ERR, "Failed to compile event matching regexp"); + return -EINVAL; + } + + for (;;) { + if (regexec(&reg, pattern, 1, &rmatch, 0)) + break; + + num = rmatch.rm_eo - rmatch.rm_so; + if (num > sizeof(buf)) + num = sizeof(buf); + + /* skip e= pattern prefix */ + memcpy(buf, pattern + rmatch.rm_so + 2, num - 2); + buf[num - 2] = '\0'; + ret = add_events(buf); + if (ret) + break; + + pattern += rmatch.rm_eo; + } + + regfree(&reg); + + return ret; +} + int rte_pmu_init(void) { int ret; - if (rte_pmu.initialized) + if (rte_pmu.initialized && ++rte_pmu.initialized) return 0; ret = scan_pmus(); @@ -451,7 +514,7 @@ rte_pmu_fini(void) struct rte_pmu_event_group *group; unsigned int i; - if (!rte_pmu.initialized) + if (!rte_pmu.initialized || --rte_pmu.initialized) return; RTE_TAILQ_FOREACH_SAFE(event, &rte_pmu.event_list, next, tmp_event) { diff --git a/lib/pmu/rte_pmu.h b/lib/pmu/rte_pmu.h index c66898d1d3..093e62f040 100644 --- a/lib/pmu/rte_pmu.h +++ b/lib/pmu/rte_pmu.h @@ -19,7 +19,9 @@ * is required: * * rte_pmu_init() - * rte_pmu_add_event() + * rte_pmu_add_event() [or rte_pmu_add_events_by_pattern()] + * + * Note that if -Denable_trace_fp=True was passed to meson rte_pmu_init() gets called automatically. * * Afterwards all threads can read events by calling rte_pmu_read(). */ @@ -143,7 +145,7 @@ __rte_pmu_enable_group(struct rte_pmu_event_group *group); * @warning * @b EXPERIMENTAL: this API may change without prior notice * - * Initialize PMU library. It's safe to call it multiple times. * * @return * 0 in case of success, negative value otherwise. @@ -156,7 +158,9 @@ rte_pmu_init(void); * @warning * @b EXPERIMENTAL: this API may change without prior notice * - * Finalize PMU library. + * Finalize PMU library. Number of calls must match number + * of times rte_pmu_init() was called. Otherwise memory + * won't be freed properly.
*/ __rte_experimental void @@ -184,6 +188,20 @@ rte_pmu_add_event(const char *name); #define __rte_pmu_read_userpage(pc) ({ RTE_SET_USED(pc); 0; }) #endif +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice + * + * Add events matching pattern to the group of enabled events. + * + * @param pattern + * Pattern e=ev1[,ev2,...] matching events, listed under /sys/bus/event_source/devices/pmu/events, + * where evX and pmu are placeholders for respectively an event and an event source. + */ +__rte_experimental +int +rte_pmu_add_events_by_pattern(const char *pattern); + /** * @warning * @b EXPERIMENTAL: this API may change without prior notice diff --git a/lib/pmu/version.map b/lib/pmu/version.map index d21f7cf933..2502326cdf 100644 --- a/lib/pmu/version.map +++ b/lib/pmu/version.map @@ -6,6 +6,7 @@ EXPERIMENTAL { __rte_pmu_read_userpage; rte_pmu; rte_pmu_add_event; + rte_pmu_add_events_by_pattern; rte_pmu_fini; rte_pmu_init; rte_pmu_read;