mlx5: fix race at mlx5_dev_close

Message ID 20240411061740.16495-1-hepeng.0320@bytedance.com (mailing list archive)
State New
Delegated to: Raslan Darawsheh
Series mlx5: fix race at mlx5_dev_close

Checks

Context                       Check    Description
ci/checkpatch                 success  coding style OK
ci/loongarch-compilation      success  Compilation OK
ci/loongarch-unit-testing     success  Unit Testing PASS
ci/Intel-compilation          success  Compilation OK
ci/intel-Testing              success  Testing PASS
ci/intel-Functional           success  Functional PASS
ci/iol-intel-Performance      success  Performance Testing PASS
ci/iol-mellanox-Performance   success  Performance Testing PASS
ci/github-robot: build        success  github build: passed
ci/iol-abi-testing            success  Testing PASS
ci/iol-unit-arm64-testing     success  Testing PASS
ci/iol-compile-amd64-testing  success  Testing PASS
ci/iol-sample-apps-testing    success  Testing PASS
ci/iol-unit-amd64-testing     success  Testing PASS
ci/iol-compile-arm64-testing  success  Testing PASS
ci/iol-intel-Functional       success  Functional Testing PASS
ci/iol-broadcom-Performance   success  Performance Testing PASS
ci/iol-broadcom-Functional    success  Functional Testing PASS

Commit Message

hepeng April 11, 2024, 6:17 a.m. UTC
  From: "hepeng.0320" <hepeng.0320@bytedance.com>

mlx5_dev_close currently sets
priv->sh->port[priv->dev_port - 1].nl_ih_port_id to RTE_MAX_ETHPORTS so
that mlx5_dev_interrupt_nl_cb does not use the port's dev_private,
because rte_eth_dev_close later frees dev_private and sets the pointer
to NULL.

However, mlx5_dev_interrupt_nl_cb runs in another thread, so I think
the race still exists: a callback that has already read the old port
id may still be using dev_private when it is freed. An easy fix is
perhaps to wait 1 ms before releasing the shared context, so that a
running callback can finish first.

Signed-off-by: hepeng.0320 <hepeng.0320@bytedance.com>
---
 drivers/net/mlx5/mlx5.c | 4 ++++
 1 file changed, 4 insertions(+)
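
The window described in the commit message can be sketched with a
small single-threaded model that replays the two possible orderings.
This is only an illustration under stated assumptions: every model_*
name is hypothetical, MODEL_MAX_PORTS stands in for RTE_MAX_ETHPORTS,
and the split of the callback into a "lookup" step and a "use" step is
an assumption about its shape, not the driver's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_PORTS 32 /* hypothetical stand-in for RTE_MAX_ETHPORTS */

struct model_port { uint32_t nl_ih_port_id; }; /* per-port slot in the shared ctx */
struct model_dev { void *dev_private; };       /* per-port ethdev data */

/* Callback step 1: snapshot the port id and skip invalidated ports. */
static bool
model_cb_lookup(const struct model_port *p, uint32_t *port_id)
{
	*port_id = p->nl_ih_port_id;
	return *port_id < MODEL_MAX_PORTS;
}

/* Callback step 2: reports whether dev_private is still there to use. */
static bool
model_cb_use(const struct model_dev *devs, uint32_t port_id)
{
	assert(port_id < MODEL_MAX_PORTS);
	return devs[port_id].dev_private != NULL;
}

/* Close path: invalidate the port id, then drop dev_private. */
static void
model_close(struct model_port *p, struct model_dev *dev)
{
	p->nl_ih_port_id = MODEL_MAX_PORTS;
	dev->dev_private = NULL;
}

/*
 * Replays one ordering. Returns true when the callback reaches step 2
 * after dev_private has already been dropped, which is the race.
 */
static bool
model_race_hit(bool close_between_lookup_and_use)
{
	static int priv_storage;
	struct model_dev devs[MODEL_MAX_PORTS] = {0};
	struct model_port port = { .nl_ih_port_id = 0 };
	uint32_t id;

	devs[0].dev_private = &priv_storage;
	if (!close_between_lookup_and_use)
		model_close(&port, &devs[0]); /* close finishes first */
	if (!model_cb_lookup(&port, &id))
		return false; /* callback skipped the port: safe */
	if (close_between_lookup_and_use)
		model_close(&port, &devs[0]); /* close lands inside the window */
	return !model_cb_use(devs, id);
}
```

In this model, invalidating the port id only protects a callback that
has not yet performed its lookup. A callback that is already past the
lookup is unaffected by the reset, and that is the window the added
delay is meant to cover.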
  

Patch

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index d1a6382..283162f 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -2457,6 +2457,10 @@  mlx5_dev_close(struct rte_eth_dev *dev)
 	 * mlx5_os_mac_addr_flush() uses ibdev_path for retrieving
 	 * ifindex if Netlink fails.
 	 */
+
+	/* Avoid race condition if mlx5_dev_interrupt_nl_cb is running. */
+	rte_delay_us_sleep(1000);
+
 	mlx5_free_shared_dev_ctx(priv->sh);
 	if (priv->domain_id != RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID) {
 		unsigned int c = 0;