[v2] ethdev: fix device init without socket-local memory

Message ID: 20240722100228.618616-1-bruce.richardson@intel.com
State: Accepted, archived
Delegated to: Ferruh Yigit
Series: [v2] ethdev: fix device init without socket-local memory

Checks

Context                       Check    Description
ci/checkpatch                 success  coding style OK
ci/loongarch-compilation      success  Compilation OK
ci/loongarch-unit-testing     success  Unit Testing PASS
ci/Intel-compilation          success  Compilation OK
ci/intel-Testing              success  Testing PASS
ci/intel-Functional           success  Functional PASS
ci/github-robot: build        success  github build: passed
ci/iol-mellanox-Performance   success  Performance Testing PASS
ci/iol-broadcom-Performance   success  Performance Testing PASS
ci/iol-marvell-Functional     success  Functional Testing PASS
ci/iol-broadcom-Functional    success  Functional Testing PASS
ci/iol-intel-Functional       success  Functional Testing PASS
ci/iol-sample-apps-testing    success  Testing PASS
ci/iol-abi-testing            success  Testing PASS
ci/iol-unit-amd64-testing     success  Testing PASS
ci/iol-unit-arm64-testing     success  Testing PASS
ci/iol-compile-amd64-testing  success  Testing PASS
ci/iol-compile-arm64-testing  success  Testing PASS
ci/iol-intel-Performance      success  Performance Testing PASS

Commit Message

Bruce Richardson July 22, 2024, 10:02 a.m. UTC
When allocating private data for an ethdev, the rte_zmalloc_socket call
used only allocates memory on the NUMA node/socket local to the device.
This means that, even if the user wanted to, they could never use a
remote NIC without also having memory on that NIC's socket.

For example, if we change examples/skeleton/basicfwd.c to pass
SOCKET_ID_ANY as the socket_id parameter for the Rx and Tx rings, we
should be able to run the app cross-NUMA, e.g. as below, where the two
PCI devices are on socket 1, and core 1 is on socket 0:

 ./build/examples/dpdk-skeleton -l 1 --legacy-mem --socket-mem=1024,0 \
		-a a8:00.0 -a b8:00.0

This fails, however, with the error:

  ETHDEV: failed to allocate private data
  PCI_BUS: Requested device 0000:a8:00.0 cannot be used

We can remove this restriction by falling back to a general rte_zmalloc
call when the call to rte_zmalloc_socket fails. This should be safe to
do because the later ethdev calls to set up Rx/Tx queues all take a
socket_id parameter, which applications can use to enforce a
requirement for local-only memory for a device, if so desired. [If
device-local memory is present, it is used as before; if it is not
present, the rte_eth_dev_configure call will now pass, but subsequent
queue setup calls requesting local memory will fail.]

Fixes: e489007a411c ("ethdev: add generic create/destroy ethdev APIs")
Fixes: dcd5c8112bc3 ("ethdev: add PCI driver helpers")
Cc: stable@dpdk.org

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: Padraig Connolly <padraig.j.connolly@intel.com>

---
V2:
* Add warning printout in the case where we don't get device-local
  memory, but we do get memory on another socket.
---
 lib/ethdev/ethdev_driver.c | 20 +++++++++++++++-----
 lib/ethdev/ethdev_pci.h    | 20 +++++++++++++++++---
 2 files changed, 32 insertions(+), 8 deletions(-)
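
The fallback described in the commit message can be illustrated outside
DPDK with a small standalone sketch. It is not the patch itself and calls
no DPDK functions: sketch_zmalloc_node, sketch_alloc_private,
SKETCH_NODE_ANY and the per-node budget table are hypothetical stand-ins
invented for illustration. The budget is assumed to mimic
--socket-mem=1024,0, i.e. no memory on node 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical names throughout; none of these are DPDK APIs. */
#define SKETCH_MAX_NODES 2
#define SKETCH_NODE_ANY  (-1)

/* Assumed per-node byte budget, mimicking --socket-mem=1024,0.
 * For brevity the budget is never replenished when memory is freed. */
static size_t sketch_node_free[SKETCH_MAX_NODES] = { 1024, 0 };

/* Zeroed allocation charged to one node, or to the first node that
 * has room when node == SKETCH_NODE_ANY. Returns NULL on failure. */
static void *sketch_zmalloc_node(size_t size, int node)
{
	void *p;
	int n;

	if (node == SKETCH_NODE_ANY) {
		for (n = 0; n < SKETCH_MAX_NODES; n++) {
			p = sketch_zmalloc_node(size, n);
			if (p != NULL)
				return p;
		}
		return NULL;
	}
	if (node < 0 || node >= SKETCH_MAX_NODES || size == 0 ||
			sketch_node_free[node] < size)
		return NULL;

	p = calloc(1, size);
	if (p != NULL)
		sketch_node_free[node] -= size;
	return p;
}

/* Try the device-local node first, then fall back to any node.
 * *was_local reports whether the local attempt succeeded. */
static void *sketch_alloc_private(size_t size, int dev_node, int *was_local)
{
	void *priv = sketch_zmalloc_node(size, dev_node);

	*was_local = (priv != NULL);
	if (priv == NULL) {
		priv = sketch_zmalloc_node(size, SKETCH_NODE_ANY);
		if (priv == NULL) {
			fprintf(stderr, "failed to allocate private data\n");
			return NULL;
		}
		/* got memory, but not local, so warn rather than fail */
		fprintf(stderr,
			"private data not allocated on local NUMA node %d\n",
			dev_node);
	}
	return priv;
}
```

With the assumed budget, a request for a device on node 1 is served from
node 0 and reported as non-local, which corresponds to the warning case
added in v2; a request that no node can satisfy still fails, as before.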
  

Comments

Ferruh Yigit July 22, 2024, 1:24 p.m. UTC | #1
On 7/22/2024 11:02 AM, Bruce Richardson wrote:
> When allocating memory for an ethdev, the rte_malloc_socket call used
> only allocates memory on the NUMA node/socket local to the device. This
> means that even if the user wanted to, they could never use a remote NIC
> without also having memory on that NIC's socket.
> 
> For example, if we change examples/skeleton/basicfwd.c to have
> SOCKET_ID_ANY as the socket_id parameter for Rx and Tx rings, we should
> be able to run the app cross-numa e.g. as below, where the two PCI
> devices are on socket 1, and core 1 is on socket 0:
> 
>  ./build/examples/dpdk-skeleton -l 1 --legacy-mem --socket-mem=1024,0 \
> 		-a a8:00.0 -a b8:00.0
> 
> This fails however, with the error:
> 
>   ETHDEV: failed to allocate private data
>   PCI_BUS: Requested device 0000:a8:00.0 cannot be used
> 
> We can remove this restriction by doing a fallback call to general
> rte_malloc after a call to rte_malloc_socket fails. This should be safe
> to do because the later ethdev calls to setup Rx/Tx queues all take a
> socket_id parameter, which can be used by applications to enforce the
> requirement for local-only memory for a device, if so desired. [If
> device-local memory is present it will be used as before, while if not
> present the rte_eth_dev_configure call will now pass, but the subsequent
> queue setup calls requesting local memory will fail].
> 
> Fixes: e489007a411c ("ethdev: add generic create/destroy ethdev APIs")
> Fixes: dcd5c8112bc3 ("ethdev: add PCI driver helpers")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> Signed-off-by: Padraig Connolly <padraig.j.connolly@intel.com>
>

Reviewed-by: Ferruh Yigit <ferruh.yigit@amd.com>

Applied to dpdk-next-net/main, thanks.
  

Patch

diff --git a/lib/ethdev/ethdev_driver.c b/lib/ethdev/ethdev_driver.c
index f48c0eb8bc..c335a25a82 100644
--- a/lib/ethdev/ethdev_driver.c
+++ b/lib/ethdev/ethdev_driver.c
@@ -303,15 +303,25 @@  rte_eth_dev_create(struct rte_device *device, const char *name,
 			return -ENODEV;
 
 		if (priv_data_size) {
+			/* try alloc private data on device-local node. */
 			ethdev->data->dev_private = rte_zmalloc_socket(
 				name, priv_data_size, RTE_CACHE_LINE_SIZE,
 				device->numa_node);
 
-			if (!ethdev->data->dev_private) {
-				RTE_ETHDEV_LOG_LINE(ERR,
-					"failed to allocate private data");
-				retval = -ENOMEM;
-				goto probe_failed;
+			/* fall back to alloc on any socket on failure */
+			if (ethdev->data->dev_private == NULL) {
+				ethdev->data->dev_private = rte_zmalloc(name,
+						priv_data_size, RTE_CACHE_LINE_SIZE);
+
+				if (ethdev->data->dev_private == NULL) {
+					RTE_ETHDEV_LOG_LINE(ERR, "failed to allocate private data");
+					retval = -ENOMEM;
+					goto probe_failed;
+				}
+				/* got memory, but not local, so issue warning */
+				RTE_ETHDEV_LOG_LINE(WARNING,
+						"Private data for ethdev '%s' not allocated on local NUMA node %d",
+						device->name, device->numa_node);
 			}
 		}
 	} else {
diff --git a/lib/ethdev/ethdev_pci.h b/lib/ethdev/ethdev_pci.h
index 737fff1833..ec4f731270 100644
--- a/lib/ethdev/ethdev_pci.h
+++ b/lib/ethdev/ethdev_pci.h
@@ -93,12 +93,26 @@  rte_eth_dev_pci_allocate(struct rte_pci_device *dev, size_t private_data_size)
 			return NULL;
 
 		if (private_data_size) {
+			/* Try and alloc the private-data structure on socket local to the device */
 			eth_dev->data->dev_private = rte_zmalloc_socket(name,
 				private_data_size, RTE_CACHE_LINE_SIZE,
 				dev->device.numa_node);
-			if (!eth_dev->data->dev_private) {
-				rte_eth_dev_release_port(eth_dev);
-				return NULL;
+
+			/* if cannot allocate memory on the socket local to the device
+			 * use rte_malloc to allocate memory on some other socket, if available.
+			 */
+			if (eth_dev->data->dev_private == NULL) {
+				eth_dev->data->dev_private = rte_zmalloc(name,
+						private_data_size, RTE_CACHE_LINE_SIZE);
+
+				if (eth_dev->data->dev_private == NULL) {
+					rte_eth_dev_release_port(eth_dev);
+					return NULL;
+				}
+				/* got memory, but not local, so issue warning */
+				RTE_ETHDEV_LOG_LINE(WARNING,
+						"Private data for ethdev '%s' not allocated on local NUMA node %d",
+						dev->device.name, dev->device.numa_node);
 			}
 		}
 	} else {