[4/8] net/mlx5: add sysfs check for Multiport E-Switch
Commit Message
This patch adds a check for whether Multiport E-Switch is enabled
on a given PCI device, using the Linux kernel sysfs interface.
This facility will be used in follow-up commits,
which add support for this configuration to the mlx5 PMD.
MLNX_OFED mlx5_core kernel module versions which support
Multiport E-Switch expose this configuration through a sysfs
interface, not through Devlink.
If such a version is used, then Multiport E-Switch can be enabled
(or its state can be probed) through a sysfs file under the path:
# <ifname> should be substituted with Linux interface name.
/sys/class/net/<ifname>/compat/devlink/lag_port_select_mode
Writing "multiport_esw" to this file enables Multiport E-Switch.
If "multiport_esw" is read from this file, then
Multiport E-Switch is enabled.
If this file does not exist, or if writing "multiport_esw" to this file
raises an error, then Multiport E-Switch is not supported.
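The read-side rules above can be sketched as a small standalone classifier. This is only an illustration, not the code added by this patch: the enum, the `mpesw_state_from_file()` name and the `STR()` macro are hypothetical, and treating an unreadable or empty file the same as a missing one is an assumption.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical names for illustration; not part of the mlx5 PMD. */
enum mpesw_state {
	MPESW_UNSUPPORTED, /* file missing or unreadable (assumption: same case) */
	MPESW_DISABLED,    /* file readable, but holds another mode */
	MPESW_ENABLED,     /* file holds "multiport_esw" */
};

#define MPESW_VALUE_MAX_LEN 16
#define STR_(x) #x
#define STR(x) STR_(x)

/*
 * Classify the Multiport E-Switch state from the content of the file
 * at 'path', e.g. .../compat/devlink/lag_port_select_mode.
 */
static enum mpesw_state
mpesw_state_from_file(const char *path)
{
	char value[MPESW_VALUE_MAX_LEN + 1];
	FILE *f = fopen(path, "r");
	int n;

	if (f == NULL)
		return MPESW_UNSUPPORTED;
	/* Field width is bounded so the token cannot overflow 'value'. */
	n = fscanf(f, "%" STR(MPESW_VALUE_MAX_LEN) "s", value);
	fclose(f);
	if (n != 1)
		return MPESW_UNSUPPORTED;
	if (strcmp(value, "multiport_esw") == 0)
		return MPESW_ENABLED;
	return MPESW_DISABLED;
}
```

The path is passed in by the caller, so the sketch can be exercised against an ordinary file without the kernel module being loaded.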
Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
drivers/net/mlx5/linux/mlx5_os.c | 69 ++++++++++++++++++++++++++++++++
1 file changed, 69 insertions(+)
Comments
On Tue, 31 Oct 2023 16:27:29 +0200
Dariusz Sosnowski <dsosnowski@nvidia.com> wrote:
> + MKSTR(sysfs_if_path, "/sys/class/net/%s", ifname);
> + if (mlx5_get_pci_addr(sysfs_if_path, &if_pci_addr))
> + continue;
> + if (pci_addr->domain != if_pci_addr.domain ||
> + pci_addr->bus != if_pci_addr.bus ||
> + pci_addr->devid != if_pci_addr.devid ||
> + pci_addr->function != if_pci_addr.function)
> + continue;
> + MKSTR(sysfs_mpesw_path,
> + "/sys/class/net/%s/compat/devlink/lag_port_select_mode", ifname);
There is a lot of DPDK code that reads sysfs, but EAL and each driver end up
coding their own way of handling this. It would be good to have common helpers in EAL.
Hi Stephen,
Thank you for your comment.
> On Tue, 31 Oct 2023 16:27:29 +0200
> Dariusz Sosnowski <dsosnowski@nvidia.com> wrote:
>
> > + MKSTR(sysfs_if_path, "/sys/class/net/%s", ifname);
> > + if (mlx5_get_pci_addr(sysfs_if_path, &if_pci_addr))
> > + continue;
> > + if (pci_addr->domain != if_pci_addr.domain ||
> > + pci_addr->bus != if_pci_addr.bus ||
> > + pci_addr->devid != if_pci_addr.devid ||
> > + pci_addr->function != if_pci_addr.function)
> > + continue;
> > + MKSTR(sysfs_mpesw_path,
> > + "/sys/class/net/%s/compat/devlink/lag_port_select_mode", ifname);
>
> There is a lot of DPDK code that reads sysfs, but EAL and each driver end up
> coding their own way of handling this. It would be good to have common
> helpers in EAL.
Agreed.
From a quick glance, I see that there are a few sysfs paths with which several drivers interact, e.g.:
- /sys/class/net
- /sys/bus/pci/devices
- /sys/devices
I think that introducing common sysfs utilities in DPDK (for example, some way of interacting with such common paths, or just constructing sysfs paths) could be beneficial.
We can definitely look into it to see if it is viable.
Best regards,
Dariusz Sosnowski
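As a rough illustration of what such common helpers might look like, here is a minimal sketch of a path builder and a single-line attribute reader. Nothing here exists in EAL today: `sysfs_net_attr_path()` and `sysfs_read_line()` are hypothetical names, and the root directory is a parameter only so the sketch can be tried outside of /sys.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build "<root>/<ifname>/<attr>" into 'buf'.
 * Returns 0 on success, -1 if the result would not fit.
 */
static int
sysfs_net_attr_path(char *buf, size_t len, const char *root,
		    const char *ifname, const char *attr)
{
	int n = snprintf(buf, len, "%s/%s/%s", root, ifname, attr);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Hypothetical helper: read the first line of the file at 'path'
 * into 'out' and strip the trailing newline.
 * Returns 0 on success, -1 if the file cannot be opened or read.
 */
static int
sysfs_read_line(const char *path, char *out, size_t len)
{
	FILE *f;

	if (len == 0)
		return -1;
	f = fopen(path, "r");
	if (f == NULL)
		return -1;
	if (fgets(out, (int)len, f) == NULL) {
		fclose(f);
		return -1;
	}
	fclose(f);
	out[strcspn(out, "\n")] = '\0';
	return 0;
}
```

A driver could then call `sysfs_net_attr_path(path, sizeof(path), "/sys/class/net", ifname, "compat/devlink/lag_port_select_mode")` followed by `sysfs_read_line()`, instead of open-coding both steps.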
@@ -1931,6 +1931,75 @@ mlx5_device_bond_pci_match(const char *ibdev_name,
 	return pf;
 }
+#define SYSFS_MPESW_PARAM_MAX_LEN 16
+
+static __rte_unused int
+mlx5_sysfs_esw_multiport_get(struct ibv_device *ibv, struct rte_pci_addr *pci_addr, int *enabled)
+{
+	int nl_rdma;
+	unsigned int n_ports;
+	unsigned int i;
+	int ret;
+
+	/* Provide correct value to have defined enabled state in case of an error. */
+	*enabled = 0;
+	nl_rdma = mlx5_nl_init(NETLINK_RDMA, 0);
+	if (nl_rdma < 0)
+		return nl_rdma;
+	n_ports = mlx5_nl_portnum(nl_rdma, ibv->name);
+	if (!n_ports) {
+		ret = -rte_errno;
+		goto close_nl_rdma;
+	}
+	for (i = 1; i <= n_ports; ++i) {
+		unsigned int ifindex;
+		char ifname[IF_NAMESIZE + 1];
+		struct rte_pci_addr if_pci_addr;
+		char mpesw[SYSFS_MPESW_PARAM_MAX_LEN + 1];
+		FILE *sysfs;
+		int n;
+
+		ifindex = mlx5_nl_ifindex(nl_rdma, ibv->name, i);
+		if (!ifindex)
+			continue;
+		if (!if_indextoname(ifindex, ifname))
+			continue;
+		MKSTR(sysfs_if_path, "/sys/class/net/%s", ifname);
+		if (mlx5_get_pci_addr(sysfs_if_path, &if_pci_addr))
+			continue;
+		if (pci_addr->domain != if_pci_addr.domain ||
+		    pci_addr->bus != if_pci_addr.bus ||
+		    pci_addr->devid != if_pci_addr.devid ||
+		    pci_addr->function != if_pci_addr.function)
+			continue;
+		MKSTR(sysfs_mpesw_path,
+		      "/sys/class/net/%s/compat/devlink/lag_port_select_mode", ifname);
+		sysfs = fopen(sysfs_mpesw_path, "r");
+		if (!sysfs)
+			continue;
+		n = fscanf(sysfs, "%" RTE_STR(SYSFS_MPESW_PARAM_MAX_LEN) "s", mpesw);
+		fclose(sysfs);
+		if (n != 1)
+			continue;
+		ret = 0;
+		if (strcmp(mpesw, "multiport_esw") == 0) {
+			*enabled = 1;
+			break;
+		}
+		*enabled = 0;
+		break;
+	}
+	if (i > n_ports) {
+		DRV_LOG(DEBUG, "Unable to get Multiport E-Switch state by sysfs.");
+		rte_errno = ENOENT;
+		ret = -rte_errno;
+	}
+
+close_nl_rdma:
+	close(nl_rdma);
+	return ret;
+}
+
 /**
  * Register a PCI device within bonding.
  *