[v3] eal/common: fix inconsistent representation of PCI numbers

Message ID 20240708165145.1405107-1-shperetz@nvidia.com (mailing list archive)
State New
Delegated to: Thomas Monjalon
Headers
Series [v3] eal/common: fix inconsistent representation of PCI numbers |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/loongarch-compilation success Compilation OK
ci/loongarch-unit-testing success Unit Testing PASS
ci/Intel-compilation success Compilation OK
ci/intel-Testing success Testing PASS
ci/github-robot: build success github build: passed
ci/intel-Functional success Functional PASS
ci/iol-mellanox-Performance success Performance Testing PASS
ci/iol-abi-testing success Testing PASS
ci/iol-unit-amd64-testing success Testing PASS
ci/iol-unit-arm64-testing success Testing PASS
ci/iol-compile-arm64-testing success Testing PASS
ci/iol-compile-amd64-testing success Testing PASS
ci/iol-sample-apps-testing success Testing PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/iol-intel-Functional success Functional Testing PASS
ci/iol-broadcom-Functional success Functional Testing PASS
ci/iol-broadcom-Performance success Performance Testing PASS

Commit Message

Shani Peretz July 8, 2024, 4:51 p.m. UTC
  DPDK allows for two ways to specify PCI device numbers:
a full version ("0000:08:00.0") and a short version ("08:00.0").
The problem arises when the application uses one format (e.g., full)
when running testpmd, but then tries to use the other format (e.g., short)
in a subsequent command, leading to a failure.

The cmp_dev_name func, which is responsible for comparing PCI device names,
is not handling the inconsistent PCI number representations correctly.
The suggested fix is to use the pci_parse function, which can parse
the PCI device name and fill a struct rte_pci_addr with the standardized
representation of the PCI number.
By comparing the struct rte_pci_addr instances instead of the string
representations, the application can ensure consistent handling of
PCI device numbers, regardless of the format used.

Fixes: a3ee360f4440 ("eal: add hotplug add/remove device")
Cc: jblunck@infradead.org

Signed-off-by: Shani Peretz <shperetz@nvidia.com>
Acked-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
---
 app/test/test_vdev.c            | 10 ++--------
 drivers/bus/pci/pci_common.c    | 11 +++++++++++
 lib/eal/common/eal_common_dev.c | 11 ++++++-----
 lib/eal/common/hotplug_mp.c     | 11 ++---------
 lib/eal/include/bus_driver.h    | 18 ++++++++++++++++++
 lib/eal/include/rte_dev.h       | 16 ++++++++++++++++
 lib/eal/linux/eal_dev.c         | 10 +---------
 lib/eal/version.map             |  3 +++
 8 files changed, 59 insertions(+), 31 deletions(-)
  

Comments

David Marchand July 12, 2024, 1:49 p.m. UTC | #1
On Mon, Jul 8, 2024 at 6:52 PM Shani Peretz <shperetz@nvidia.com> wrote:
>
> DPDK allows for two ways to specify PCI device numbers:
> a full version ("0000:08:00.0") and a short version ("08:00.0").
> The problem arises when the application uses one format (e.g., full)
> when running testpmd, but then tries to use the other format (e.g., short)
> in a subsequent command, leading to a failure.
>
> The cmp_dev_name func, which is responsible for comparing PCI device names,
> is not handling the inconsistent PCI number representations correctly.
> The suggested fix is to use the pci_parse function, which can parse
> the PCI device name and fill a struct rte_pci_addr with the standardized
> representation of the PCI number.
> By comparing the struct rte_pci_addr instances instead of the string
> representations, the application can ensure consistent handling of
> PCI device numbers, regardless of the format used.
>
> Fixes: a3ee360f4440 ("eal: add hotplug add/remove device")
> Cc: jblunck@infradead.org
>
> Signed-off-by: Shani Peretz <shperetz@nvidia.com>
> Acked-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
> Acked-by: Thomas Monjalon <thomas@monjalon.net>

I find it strange that Thomas acked this patch (for example, the
commit title prefix is wrong).

I don't understand the issue.
Please provide a reproducer.
And ideally we need a unit test to track regressions on this topic.


Thanks.
  
Thomas Monjalon July 12, 2024, 5:55 p.m. UTC | #2
12/07/2024 15:49, David Marchand:
> On Mon, Jul 8, 2024 at 6:52 PM Shani Peretz <shperetz@nvidia.com> wrote:
> >
> > DPDK allows for two ways to specify PCI device numbers:
> > a full version ("0000:08:00.0") and a short version ("08:00.0").
> > The problem arises when the application uses one format (e.g., full)
> > when running testpmd, but then tries to use the other format (e.g., short)
> > in a subsequent command, leading to a failure.
> >
> > The cmp_dev_name func, which is responsible for comparing PCI device names,
> > is not handling the inconsistent PCI number representations correctly.
> > The suggested fix is to use the pci_parse function, which can parse
> > the PCI device name and fill a struct rte_pci_addr with the standardized
> > representation of the PCI number.
> > By comparing the struct rte_pci_addr instances instead of the string
> > representations, the application can ensure consistent handling of
> > PCI device numbers, regardless of the format used.
> >
> > Fixes: a3ee360f4440 ("eal: add hotplug add/remove device")
> > Cc: jblunck@infradead.org
> >
> > Signed-off-by: Shani Peretz <shperetz@nvidia.com>
> > Acked-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
> > Acked-by: Thomas Monjalon <thomas@monjalon.net>
> 
> I find it strange that Thomas acked this patch (for example, the
> commit title prefix is wrong).
> 
> I don't understand the issue.
> Please provide a reproducer.
> And ideally we need a unit test to track regressions on this topic.

Indeed, Shani we shouldn't add "Acked-by" until all is reviewed in detail.
  

Patch

diff --git a/app/test/test_vdev.c b/app/test/test_vdev.c
index 3e262f30bc..860fa260af 100644
--- a/app/test/test_vdev.c
+++ b/app/test/test_vdev.c
@@ -20,12 +20,6 @@  static const char * const valid_keys[] = {
 	NULL,
 };
 
-static int
-cmp_dev_name(const struct rte_device *dev, const void *name)
-{
-	return strcmp(rte_dev_name(dev), name);
-}
-
 static int
 cmp_dev_match(const struct rte_device *dev, const void *_kvlist)
 {
@@ -82,7 +76,7 @@  test_vdev_bus(void)
 		printf("Failed to create vdev net_null_test0\n");
 		goto fail;
 	}
-	dev0 = vdev_bus->find_device(NULL, cmp_dev_name, "net_null_test0");
+	dev0 = vdev_bus->find_device(NULL, rte_cmp_dev_name, "net_null_test0");
 	if (dev0 == NULL) {
 		printf("Cannot find net_null_test0 vdev\n");
 		goto fail;
@@ -93,7 +87,7 @@  test_vdev_bus(void)
 		printf("Failed to create vdev net_null_test1\n");
 		goto fail;
 	}
-	dev1 = vdev_bus->find_device(NULL, cmp_dev_name, "net_null_test1");
+	dev1 = vdev_bus->find_device(NULL, rte_cmp_dev_name, "net_null_test1");
 	if (dev1 == NULL) {
 		printf("Cannot find net_null_test1 vdev\n");
 		goto fail;
diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index 889a48d2af..538d491067 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -502,6 +502,16 @@  rte_pci_dump(FILE *f)
 	}
 }
 
+static int
+pci_cmp_name(const struct rte_device *dev, const void *name2)
+{
+	struct rte_pci_addr name2_addr;
+	const struct rte_pci_device *dev1 = RTE_DEV_TO_PCI_CONST(dev);
+
+	dev->bus->parse(name2, &name2_addr);
+	return rte_pci_addr_cmp(&dev1->addr, &name2_addr);
+}
+
 static int
 pci_parse(const char *name, void *addr)
 {
@@ -956,6 +966,7 @@  struct rte_pci_bus rte_pci_bus = {
 		.plug = pci_plug,
 		.unplug = pci_unplug,
 		.parse = pci_parse,
+		.cmp_name = pci_cmp_name,
 		.devargs_parse = rte_pci_devargs_parse,
 		.dma_map = pci_dma_map,
 		.dma_unmap = pci_dma_unmap,
diff --git a/lib/eal/common/eal_common_dev.c b/lib/eal/common/eal_common_dev.c
index a99252b02f..12d68c3605 100644
--- a/lib/eal/common/eal_common_dev.c
+++ b/lib/eal/common/eal_common_dev.c
@@ -107,11 +107,12 @@  struct dev_next_ctx {
 #define CLSCTX(ptr) \
 	(((struct dev_next_ctx *)(intptr_t)ptr)->cls_str)
 
-static int cmp_dev_name(const struct rte_device *dev, const void *_name)
+int rte_cmp_dev_name(const struct rte_device *dev1, const void *name2)
 {
-	const char *name = _name;
+	if (dev1->bus->cmp_name)
+		return dev1->bus->cmp_name(dev1, name2);
 
-	return strcmp(dev->name, name);
+	return strcmp(dev1->name, (const char *)name2);
 }
 
 int
@@ -197,7 +198,7 @@  local_dev_probe(const char *devargs, struct rte_device **new_dev)
 	if (ret)
 		goto err_devarg;
 
-	dev = da->bus->find_device(NULL, cmp_dev_name, da->name);
+	dev = da->bus->find_device(NULL, rte_cmp_dev_name, da->name);
 	if (dev == NULL) {
 		EAL_LOG(ERR, "Cannot find device (%s)",
 			da->name);
@@ -335,7 +336,7 @@  rte_eal_hotplug_remove(const char *busname, const char *devname)
 		return -ENOENT;
 	}
 
-	dev = bus->find_device(NULL, cmp_dev_name, devname);
+	dev = bus->find_device(NULL, rte_cmp_dev_name, devname);
 	if (dev == NULL) {
 		EAL_LOG(ERR, "Cannot find plugged device (%s)", devname);
 		return -EINVAL;
diff --git a/lib/eal/common/hotplug_mp.c b/lib/eal/common/hotplug_mp.c
index 17089ca3db..a2623c96c3 100644
--- a/lib/eal/common/hotplug_mp.c
+++ b/lib/eal/common/hotplug_mp.c
@@ -21,13 +21,6 @@  struct mp_reply_bundle {
 	void *peer;
 };
 
-static int cmp_dev_name(const struct rte_device *dev, const void *_name)
-{
-	const char *name = _name;
-
-	return strcmp(dev->name, name);
-}
-
 /**
  * Secondary to primary request.
  * start from function eal_dev_hotplug_request_to_primary.
@@ -135,7 +128,7 @@  __handle_secondary_request(void *param)
 			goto finish;
 		}
 
-		dev = bus->find_device(NULL, cmp_dev_name, da.name);
+		dev = bus->find_device(NULL, rte_cmp_dev_name, da.name);
 		if (dev == NULL) {
 			EAL_LOG(ERR, "Cannot find plugged device (%s)", da.name);
 			ret = -ENOENT;
@@ -262,7 +255,7 @@  static void __handle_primary_request(void *param)
 			goto quit;
 		}
 
-		dev = bus->find_device(NULL, cmp_dev_name, da->name);
+		dev = bus->find_device(NULL, rte_cmp_dev_name, da->name);
 		if (dev == NULL) {
 			EAL_LOG(ERR, "Cannot find plugged device (%s)", da->name);
 			ret = -ENOENT;
diff --git a/lib/eal/include/bus_driver.h b/lib/eal/include/bus_driver.h
index 7b85a17a09..eba820f36c 100644
--- a/lib/eal/include/bus_driver.h
+++ b/lib/eal/include/bus_driver.h
@@ -118,6 +118,23 @@  typedef int (*rte_bus_unplug_t)(struct rte_device *dev);
  */
 typedef int (*rte_bus_parse_t)(const char *name, void *addr);
 
+/**
+ * Bus specific device name comparison function.
+ *
+ * This type of function is used to compare a bus name with an arbitrary
+ * name.
+ *
+ * @param dev
+ *	Device handle.
+ *
+ * @param name
+ *	Name to compare against.
+ *
+ * @return
+ *	0 if the device matches the name. Nonzero otherwise.
+ */
+typedef int (*rte_bus_cmp_name_t)(const struct rte_device *dev, const void *name);
+
 /**
  * Parse bus part of the device arguments.
  *
@@ -258,6 +275,7 @@  struct rte_bus {
 	rte_bus_plug_t plug;         /**< Probe single device for drivers */
 	rte_bus_unplug_t unplug;     /**< Remove single device from driver */
 	rte_bus_parse_t parse;       /**< Parse a device name */
+	rte_bus_cmp_name_t cmp_name; /**< Compare device name */
 	rte_bus_devargs_parse_t devargs_parse; /**< Parse bus devargs */
 	rte_dev_dma_map_t dma_map;   /**< DMA map for device in the bus */
 	rte_dev_dma_unmap_t dma_unmap; /**< DMA unmap for device in the bus */
diff --git a/lib/eal/include/rte_dev.h b/lib/eal/include/rte_dev.h
index cefa04f905..a3d666c110 100644
--- a/lib/eal/include/rte_dev.h
+++ b/lib/eal/include/rte_dev.h
@@ -21,6 +21,7 @@  extern "C" {
 
 #include <rte_config.h>
 #include <rte_common.h>
+#include <rte_compat.h>
 #include <rte_log.h>
 
 struct rte_bus;
@@ -170,6 +171,21 @@  int rte_dev_is_probed(const struct rte_device *dev);
 int rte_eal_hotplug_add(const char *busname, const char *devname,
 			const char *drvargs);
 
+/**
+ * General device name comparison. Will compare by using the specific bus
+ * compare function or by comparing the names directly.
+ *
+ * @param dev
+ *   Device handle.
+ * @param name
+ *   Name to compare against.
+ * @return
+ *   0 if the device matches the name. Nonzero otherwise.
+ */
+__rte_experimental
+int
+rte_cmp_dev_name(const struct rte_device *dev, const void *name);
+
 /**
  * Add matching devices.
  *
diff --git a/lib/eal/linux/eal_dev.c b/lib/eal/linux/eal_dev.c
index fff4e8fa83..5c8effd7be 100644
--- a/lib/eal/linux/eal_dev.c
+++ b/lib/eal/linux/eal_dev.c
@@ -91,14 +91,6 @@  static void sigbus_handler(int signum, siginfo_t *info,
 	EAL_LOG(DEBUG, "Success to handle SIGBUS for hot-unplug!");
 }
 
-static int cmp_dev_name(const struct rte_device *dev,
-	const void *_name)
-{
-	const char *name = _name;
-
-	return strcmp(dev->name, name);
-}
-
 static int
 dev_uev_socket_fd_create(void)
 {
@@ -280,7 +272,7 @@  dev_uev_handler(__rte_unused void *param)
 				goto failure_handle_err;
 			}
 
-			dev = bus->find_device(NULL, cmp_dev_name,
+			dev = bus->find_device(NULL, rte_cmp_dev_name,
 					       uevent.devname);
 			if (dev == NULL) {
 				EAL_LOG(ERR, "Cannot find device (%s) on "
diff --git a/lib/eal/version.map b/lib/eal/version.map
index 3df50c3fbb..e4355ef890 100644
--- a/lib/eal/version.map
+++ b/lib/eal/version.map
@@ -396,6 +396,9 @@  EXPERIMENTAL {
 
 	# added in 24.03
 	rte_vfio_get_device_info; # WINDOWS_NO_EXPORT
+
+	# added in 24.07
+	rte_cmp_dev_name;
 };
 
 INTERNAL {