[v3] net/mlx5: enable PCI related counters

Message ID: 20240215182636.1395044-1-wathsala.vithanage@arm.com
State: Accepted, archived
Delegated to: Raslan Darawsheh
Series: [v3] net/mlx5: enable PCI related counters

Checks

Context                        Check    Description
ci/checkpatch                  success  coding style OK
ci/loongarch-compilation       success  Compilation OK
ci/loongarch-unit-testing      success  Unit Testing PASS
ci/github-robot: build         success  github build: passed
ci/Intel-compilation           success  Compilation OK
ci/intel-Testing               success  Testing PASS
ci/intel-Functional            success  Functional PASS
ci/iol-intel-Performance       success  Performance Testing PASS
ci/iol-intel-Functional        success  Functional Testing PASS
ci/iol-abi-testing             success  Testing PASS
ci/iol-unit-amd64-testing      success  Testing PASS
ci/iol-broadcom-Performance    success  Performance Testing PASS
ci/iol-unit-arm64-testing      success  Testing PASS
ci/iol-compile-amd64-testing   success  Testing PASS
ci/iol-broadcom-Functional     success  Functional Testing PASS
ci/iol-compile-arm64-testing   success  Testing PASS
ci/iol-mellanox-Performance    success  Performance Testing PASS
ci/iol-sample-apps-testing     success  Testing PASS

Commit Message

Wathsala Wathawana Vithanage Feb. 15, 2024, 6:26 p.m. UTC
Mellanox NICs starting from ConnectX-5 (CX5) have device counters
related to PCI. These counters are helpful in debugging I/O
bottlenecks. For instance, the outbound_pci_stalled_rd and
outbound_pci_stalled_wr counters can help identify NIC stalls
due to insufficient PCI credits, which would otherwise require
a PCI analyzer or a sophisticated PCI root port with a PMU.
Currently none of these counters are available in the MLX5 PMD,
even though ethtool is capable of reading some of them.
Since the PMD uses the same ioctl as ethtool (SIOCETHTOOL) and
reads via the kernel driver, support is easy to add.
There is one more PCI related counter and one device counter
that aren't implemented in the Linux driver at the moment. These
two are named outbound_pci_buffer_overflow and dev_out_of_buffer
respectively. As per Nvidia's documentation, they report the
number of packets dropped due to PCI buffer overflow and the
number of times the device owned queue did not have enough
buffers allocated.

Signed-off-by: Wathsala Vithanage <wathsala.vithanage@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
---
 .mailmap                                |  1 +
 drivers/net/mlx5/linux/mlx5_ethdev_os.c | 37 +++++++++++++++++++++++++
 2 files changed, 38 insertions(+)
  

Comments

Raslan Darawsheh Feb. 27, 2024, 4:14 p.m. UTC | #1
Hi,
> -----Original Message-----
> From: Wathsala Vithanage <wathsala.vithanage@arm.com>
> Sent: Thursday, February 15, 2024 8:27 PM
> To: NBU-Contact-Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>;
> Dariusz Sosnowski <dsosnowski@nvidia.com>; Slava Ovsiienko
> <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>; Suanming Mou
> <suanmingm@nvidia.com>; Matan Azrad <matan@nvidia.com>
> Cc: dev@dpdk.org; nd@arm.com; Wathsala Vithanage
> <wathsala.vithanage@arm.com>; Honnappa Nagarahalli
> <honnappa.nagarahalli@arm.com>
> Subject: [PATCH v3] net/mlx5: enable PCI related counters
> 
> Versions of Mellanox NICs starting from CX5 have device counters related to
> PCI. These counters are helpful in debugging IO bottlenecks. For instance, the
> outbound_pci_stalled_rd and outbound_pci_stalled_wr counters can help
> with identifying NIC stalls due to insufficient PCI credits, which otherwise
> would have required a PCI analyzer or a sophisticated PCI root port with a
> PMU.
> Currently none of these are available in the MLX5 PMD even though ethtool is
> capable of reading some of them.
> Since PMD uses the same ioctl used by ethtool (SIOCETHTOOL) and reads via
> the kernel driver it is possible to add support with ease.
> There is one more PCI related counter and a device counter that aren't
> implemented in the Linux driver at the moment. These two are named
> outbound_pci_buffer_overflow and dev_out_of_buffer respectively. As per
> Nvidia's documentation these two counters can tell the number of packets
> dropped due to pci buffer overflow and the number of times the device
> owned queue had not enough buffers allocated.
> 
> Signed-off-by: Wathsala Vithanage <wathsala.vithanage@arm.com>
> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
> Acked-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Patch applied to next-net-mlx,
Kindest regards
Raslan Darawsheh
  

Patch

diff --git a/.mailmap b/.mailmap
index aa569ff456..f57415f7a1 100644
--- a/.mailmap
+++ b/.mailmap
@@ -1510,6 +1510,7 @@  Walter Heymans <walter.heymans@corigine.com>
 Wang Sheng-Hui <shhuiw@gmail.com>
 Wangyu (Eric) <seven.wangyu@huawei.com>
 Waterman Cao <waterman.cao@intel.com>
+Wathsala Vithanage <wathsala.vithanage@arm.com>
 Weichun Chen <weichunx.chen@intel.com>
 Wei Dai <wei.dai@intel.com>
 Weifeng Li <liweifeng96@126.com>
diff --git a/drivers/net/mlx5/linux/mlx5_ethdev_os.c b/drivers/net/mlx5/linux/mlx5_ethdev_os.c
index dd5a0c546d..92c47a3b3d 100644
--- a/drivers/net/mlx5/linux/mlx5_ethdev_os.c
+++ b/drivers/net/mlx5/linux/mlx5_ethdev_os.c
@@ -1574,6 +1574,43 @@  static const struct mlx5_counter_ctrl mlx5_counters_init[] = {
 		.dpdk_name = "tx_vport_bytes",
 		.ctr_name = "vport_tx_bytes",
 	},
+	/**
+	 * Device counters: These counters are for the
+	 * entire PCI device (NIC). These counters are
+	 * not counting on a per port/queue basis.
+	 */
+	{
+		.dpdk_name = "rx_pci_signal_integrity",
+		.ctr_name = "rx_pci_signal_integrity",
+	},
+	{
+		.dpdk_name = "tx_pci_signal_integrity",
+		.ctr_name = "tx_pci_signal_integrity",
+	},
+	{
+		.dpdk_name = "outbound_pci_buffer_overflow",
+		.ctr_name = "outbound_pci_buffer_overflow",
+	},
+	{
+		.dpdk_name = "outbound_pci_stalled_rd",
+		.ctr_name = "outbound_pci_stalled_rd",
+	},
+	{
+		.dpdk_name = "outbound_pci_stalled_wr",
+		.ctr_name = "outbound_pci_stalled_wr",
+	},
+	{
+		.dpdk_name = "outbound_pci_stalled_rd_events",
+		.ctr_name = "outbound_pci_stalled_rd_events",
+	},
+	{
+		.dpdk_name = "outbound_pci_stalled_wr_events",
+		.ctr_name = "outbound_pci_stalled_wr_events",
+	},
+	{
+		.dpdk_name = "dev_out_of_buffer",
+		.ctr_name = "dev_out_of_buffer",
+	},
 };
 
 static const unsigned int xstats_n = RTE_DIM(mlx5_counters_init);
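
As a rough illustration of how a table like mlx5_counters_init can be used, the sketch below matches each entry's ctr_name against a list of counter names reported by the kernel and records the index of the match. This is a simplified stand-in, not the PMD's actual lookup code: struct counter_ctrl, pci_counters, and map_counters are hypothetical names, and the kernel name list would in practice come from the ethtool strings query. Entries that the kernel driver does not report (as the commit message says is currently the case for outbound_pci_buffer_overflow and dev_out_of_buffer) simply stay unmapped.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for a counter table entry (hypothetical). */
struct counter_ctrl {
	const char *dpdk_name; /* name exposed to the application */
	const char *ctr_name;  /* name reported by the kernel driver */
};

static const struct counter_ctrl pci_counters[] = {
	{ "outbound_pci_stalled_rd", "outbound_pci_stalled_rd" },
	{ "outbound_pci_stalled_wr", "outbound_pci_stalled_wr" },
	{ "outbound_pci_buffer_overflow", "outbound_pci_buffer_overflow" },
	{ "dev_out_of_buffer", "dev_out_of_buffer" },
};

#define N_COUNTERS (sizeof(pci_counters) / sizeof(pci_counters[0]))

/*
 * For each table entry, store the index of the matching kernel name in
 * index_out, or -1 when the kernel does not report that counter.
 * index_out must have room for N_COUNTERS entries.
 * Returns the number of table entries that were matched.
 */
static size_t
map_counters(const char *const *kernel_names, size_t n_kernel, int *index_out)
{
	size_t matched = 0;
	size_t i, j;

	for (i = 0; i < N_COUNTERS; i++) {
		index_out[i] = -1;
		for (j = 0; j < n_kernel; j++) {
			if (strcmp(pci_counters[i].ctr_name,
				   kernel_names[j]) == 0) {
				index_out[i] = (int)j;
				matched++;
				break;
			}
		}
	}
	return matched;
}
```

With this shape, adding a counter is a pure table change, which is consistent with the patch touching only the initializer list.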