Message ID | 20221129092821.1304853-1-tduszynski@marvell.com (mailing list archive) |
---|---|
Headers | From: Tomasz Duszynski <tduszynski@marvell.com> To: <dev@dpdk.org> CC: <thomas@monjalon.net>, <jerinj@marvell.com>, Tomasz Duszynski <tduszynski@marvell.com> Subject: [PATCH v3 0/4] add support for self monitoring Date: Tue, 29 Nov 2022 10:28:17 +0100 Message-ID: <20221129092821.1304853-1-tduszynski@marvell.com> In-Reply-To: <20221121121121.3917194-1-tduszynski@marvell.com> References: <20221121121121.3917194-1-tduszynski@marvell.com> |
Series | add support for self monitoring | |
Message
Tomasz Duszynski
Nov. 29, 2022, 9:28 a.m. UTC
This series adds self monitoring support, i.e. it allows configuring and
reading performance measurement unit (PMU) counters at runtime without
using the perf utility. This has certain advantages when an application
runs on isolated cores with the nohz_full kernel parameter.

Events can be read directly using rte_pmu_read() or using the dedicated
tracepoint rte_eal_trace_pmu_read(). The latter will cause events to be
stored inside a CTF file.

By design, all enabled events are grouped together and the same group
is attached to lcores that use self monitoring functionality.

Events are enabled by names, which need to be read from the standard
location under sysfs, i.e.

/sys/bus/event_source/devices/PMU/events

where PMU is a core PMU, i.e. one measuring CPU events. As of today
raw events are not supported.

v3:
- fix shared build

v2:
- fix problems reported by test build infra

Tomasz Duszynski (4):
  eal: add generic support for reading PMU events
  eal/arm: support reading ARM PMU events in runtime
  eal/x86: support reading Intel PMU events in runtime
  eal: add PMU support to tracing library

 app/test/meson.build                     |   1 +
 app/test/test_pmu.c                      |  47 ++
 app/test/test_trace_perf.c               |   4 +
 doc/guides/prog_guide/profile_app.rst    |  13 +
 doc/guides/prog_guide/trace_lib.rst      |  32 ++
 lib/eal/arm/include/meson.build          |   1 +
 lib/eal/arm/include/rte_pmu_pmc.h        |  39 ++
 lib/eal/arm/meson.build                  |   4 +
 lib/eal/arm/rte_pmu.c                    | 104 +++++
 lib/eal/common/eal_common_trace_points.c |   3 +
 lib/eal/common/meson.build               |   3 +
 lib/eal/common/pmu_private.h             |  41 ++
 lib/eal/common/rte_pmu.c                 | 520 +++++++++++++++++++++++
 lib/eal/include/meson.build              |   1 +
 lib/eal/include/rte_eal_trace.h          |  11 +
 lib/eal/include/rte_pmu.h                | 207 +++++++++
 lib/eal/linux/eal.c                      |   4 +
 lib/eal/version.map                      |   7 +
 lib/eal/x86/include/meson.build          |   1 +
 lib/eal/x86/include/rte_pmu_pmc.h        |  33 ++
 20 files changed, 1076 insertions(+)
 create mode 100644 app/test/test_pmu.c
 create mode 100644 lib/eal/arm/include/rte_pmu_pmc.h
 create mode 100644 lib/eal/arm/rte_pmu.c
 create mode 100644 lib/eal/common/pmu_private.h
 create mode 100644 lib/eal/common/rte_pmu.c
 create mode 100644 lib/eal/include/rte_pmu.h
 create mode 100644 lib/eal/x86/include/rte_pmu_pmc.h

--
2.25.1
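Since events are enabled by name and the names come from the sysfs events directory mentioned above, a small helper that enumerates that directory can be handy when picking events. The following is only a sketch using plain POSIX directory calls; `list_pmu_events()` and `print_event()` are hypothetical names and are not part of this series or of DPDK.

```c
#include <dirent.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of the series): invoke cb() for every
 * entry found in an events directory such as
 * /sys/bus/event_source/devices/<PMU>/events.
 * Returns the number of entries seen, or -1 if the directory cannot
 * be opened.
 */
static int
list_pmu_events(const char *events_dir,
		void (*cb)(const char *name, void *arg), void *arg)
{
	DIR *dir = opendir(events_dir);
	struct dirent *ent;
	int count = 0;

	if (dir == NULL)
		return -1;

	while ((ent = readdir(dir)) != NULL) {
		/* Skip ".", ".." and any other dot-prefixed entries. */
		if (ent->d_name[0] == '.')
			continue;
		if (cb != NULL)
			cb(ent->d_name, arg);
		count++;
	}

	closedir(dir);
	return count;
}

/* Example callback: print one event name per line. */
static void
print_event(const char *name, void *arg)
{
	(void)arg;
	printf("%s\n", name);
}
```

Calling `list_pmu_events(path, print_event, NULL)` with the events directory of the core PMU prints the names that can be enabled; which directory name stands for the core PMU differs between platforms, so the path is left to the caller.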
Comments
> From: Tomasz Duszynski [mailto:tduszynski@marvell.com]
> Sent: Tuesday, 29 November 2022 10.28
>
> This series adds self monitoring support, i.e. it allows configuring and
> reading performance measurement unit (PMU) counters at runtime without
> using the perf utility. This has certain advantages when an application
> runs on isolated cores with the nohz_full kernel parameter.
>
> Events can be read directly using rte_pmu_read() or using the dedicated
> tracepoint rte_eal_trace_pmu_read(). The latter will cause events to be
> stored inside a CTF file.
>
> By design, all enabled events are grouped together and the same group
> is attached to lcores that use self monitoring functionality.
>
> Events are enabled by names, which need to be read from the standard
> location under sysfs, i.e.
>
> /sys/bus/event_source/devices/PMU/events
>
> where PMU is a core PMU, i.e. one measuring CPU events. As of today
> raw events are not supported.

Hi Tomasz,

I am very interested in this patch series for fast path profiling purposes. (Not using EAL trace, but our proprietary profiler.)

However, it seems that rte_pmu_read() is quite long-winded compared to rte_pmu_pmc_read().

But perhaps I am just worrying too much, so I will ask: What is the performance cost of using rte_pmu_read(), compared to rte_pmu_pmc_read(), in the fast path?

If there is a non-negligible difference, could you please provide an example of how to configure PMU events and use rte_pmu_pmc_read() in an application?

I would primarily be interested in data cache misses and branch mispredictions. But feel free to make your own choices for the example.
Hi Morten,

> -----Original Message-----
> From: Morten Brørup <mb@smartsharesystems.com>
> Sent: Tuesday, November 29, 2022 11:43 AM
> To: Tomasz Duszynski <tduszynski@marvell.com>; dev@dpdk.org
> Cc: thomas@monjalon.net; Jerin Jacob Kollanukkaran <jerinj@marvell.com>
> Subject: [EXT] RE: [PATCH v3 0/4] add support for self monitoring
>
> External Email
>
> ----------------------------------------------------------------------
> > From: Tomasz Duszynski [mailto:tduszynski@marvell.com]
> > Sent: Tuesday, 29 November 2022 10.28
> >
> > This series adds self monitoring support, i.e. it allows configuring and
> > reading performance measurement unit (PMU) counters at runtime without
> > using the perf utility. This has certain advantages when an application
> > runs on isolated cores with the nohz_full kernel parameter.
> >
> > Events can be read directly using rte_pmu_read() or using the dedicated
> > tracepoint rte_eal_trace_pmu_read(). The latter will cause events to
> > be stored inside a CTF file.
> >
> > By design, all enabled events are grouped together and the same group
> > is attached to lcores that use self monitoring functionality.
> >
> > Events are enabled by names, which need to be read from the standard
> > location under sysfs, i.e.
> >
> > /sys/bus/event_source/devices/PMU/events
> >
> > where PMU is a core PMU, i.e. one measuring CPU events. As of today raw
> > events are not supported.
>
> Hi Tomasz,
>
> I am very interested in this patch series for fast path profiling purposes. (Not using EAL trace,
> but our proprietary profiler.)
>
> However, it seems that rte_pmu_read() is quite long-winded compared to rte_pmu_pmc_read().
>

We need a bit of extra logic to set things up before reading the actual counter, but in reality most cycles are consumed by rte_pmu_pmc_read(). This obviously differs among platforms, so if you want precise measurements you need to get your hands dirty.

That said, below are results from dpdk-test after running trace_perf_autotest, just to give you some idea.

X86-64

RTE>>trace_perf_autotest
Timer running at 3000.00MHz
void: cycles=17.739375 ns=5.913125
u64: cycles=17.348296 ns=5.782765
int: cycles=17.098724 ns=5.699575
float: cycles=17.099946 ns=5.699982
double: cycles=17.229702 ns=5.743234
string: cycles=31.159907 ns=10.386636
void_fp: cycles=0.679842 ns=0.226614
read_pmu: cycles=49.325117 ns=16.441706

ARM64 with RTE_ARM_EAL_RDTSC_USE_PMU

RTE>>trace_perf_autotest
Timer running at 2480.00MHz
void: cycles=9.413568 ns=3.795793
u64: cycles=9.386003 ns=3.784678
int: cycles=9.438701 ns=3.805928
float: cycles=9.359377 ns=3.773942
double: cycles=9.372279 ns=3.779145
string: cycles=24.474899 ns=9.868911
void_fp: cycles=0.505513 ns=0.203836
read_pmu: cycles=17.442853 ns=7.033409

> But perhaps I am just worrying too much, so I will ask: What is the performance cost of using
> rte_pmu_read(), compared to rte_pmu_pmc_read(), in the fast path?
>
> If there is a non-negligible difference, could you please provide an example of how to configure
> PMU events and use rte_pmu_pmc_read() in an application?
>

The series comes with some docs, so you can check there how to run it.

> I would primarily be interested in data cache misses and branch mispredictions. But feel free to
> make your own choices for the example.

Raw events are not supported right now, which means you don't have fine control over all events. You can only use events from the CPU PMU (/sys/bus/event_source/devices/<PMU>/events).
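For readers comparing the two result sets, the ns column in the trace_perf_autotest output follows from the cycles column and the reported timer frequency. The sketch below shows that arithmetic. It also subtracts the `void` tracepoint cost from the `read_pmu` cost as a rough indicator of what the PMU read adds on top of emitting a tracepoint. That subtraction is an assumption made here for illustration, not a figure given in the thread, and both function names are hypothetical.

```c
/*
 * Convert a cycle count to nanoseconds for a timer running at
 * timer_mhz (one cycle lasts 1000 / timer_mhz nanoseconds).
 */
static double
cycles_to_ns(double cycles, double timer_mhz)
{
	return cycles * 1000.0 / timer_mhz;
}

/*
 * Hypothetical helper: extra cycles of one tracepoint relative to a
 * baseline tracepoint, e.g. read_pmu relative to void. This is only a
 * rough estimate of the added cost, not a precise measurement.
 */
static double
extra_cycles(double measured, double baseline)
{
	return measured - baseline;
}
```

With the x86-64 numbers, `cycles_to_ns(49.325117, 3000.0)` gives about 16.44 ns, matching the reported value, and `extra_cycles(49.325117, 17.739375)` gives about 31.6 cycles. With the ARM64 numbers, the same calls give about 7.03 ns and about 8.0 cycles.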