[v2] net/mlx5: workaround list management of Rx queue control

Message ID 20240723111411.1088810-1-bingz@nvidia.com (mailing list archive)
State Accepted, archived
Delegated to: Raslan Darawsheh
Headers
Series [v2] net/mlx5: workaround list management of Rx queue control |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/loongarch-compilation success Compilation OK
ci/loongarch-unit-testing success Unit Testing PASS
ci/Intel-compilation success Compilation OK
ci/intel-Testing success Testing PASS
ci/github-robot: build success github build: passed
ci/intel-Functional success Functional PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/iol-marvell-Functional success Functional Testing PASS
ci/iol-abi-testing success Testing PASS
ci/iol-broadcom-Performance success Performance Testing PASS
ci/iol-unit-arm64-testing success Testing PASS
ci/iol-broadcom-Functional success Functional Testing PASS
ci/iol-unit-amd64-testing success Testing PASS
ci/iol-intel-Functional success Functional Testing PASS
ci/iol-compile-amd64-testing success Testing PASS
ci/iol-compile-arm64-testing success Testing PASS
ci/iol-sample-apps-testing success Testing PASS

Commit Message

Bing Zhao July 23, 2024, 11:14 a.m. UTC
The LIST_REMOVE macro only removes the entry from the list and
updates list itself. The pointers of this entry are not reset to
NULL to prevent the accessing for the 2nd time.

In the previous fix for the memory accessing, the "rxq_ctrl" was
removed from the list in a device private data when the "refcnt" was
decreased to 0. Under only shared or non-shared queues scenarios,
this was safe since all the "rxq_ctrl" entries were freed or kept.

There is one case that shared and non-shared Rx queues are configured
simultaneously, for example, a hairpin Rx queue cannot be shared.
When closing the port that allocated the shared Rx queues'
"rxq_ctrl", if the next entry is hairpin "rxq_ctrl", the hairpin
"rxq_ctrl" will be freed directly with other resources. When trying
to close the another port sharing the "rxq_ctrl", the LIST_REMOVE
will be called again and cause some UFA issue. If the memory is no
longer mapped, there will be a SIGSEGV.

Adding a flag in the Rx queue private structure to remove the
"rxq_ctrl" from the list only on the port/queue that allocated it.

Fixes: bcc220cb57d7 ("net/mlx5: fix shared Rx queue list management")

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
v2: fix CI code style warning
---
 drivers/net/mlx5/mlx5_rx.h  | 1 +
 drivers/net/mlx5/mlx5_rxq.c | 5 ++++-
 2 files changed, 5 insertions(+), 1 deletion(-)
  

Comments

Raslan Darawsheh Aug. 29, 2024, 9 a.m. UTC | #1
Hi,

From: Bing Zhao <bingz@nvidia.com>
Sent: Tuesday, July 23, 2024 2:14 PM
To: Dariusz Sosnowski; Slava Ovsiienko; dev@dpdk.org; Raslan Darawsheh
Cc: Ori Kam; Suanming Mou; Matan Azrad
Subject: [PATCH v2] net/mlx5: workaround list management of Rx queue control

The LIST_REMOVE macro only removes the entry from the list and
updates list itself. The pointers of this entry are not reset to
NULL to prevent the accessing for the 2nd time.

In the previous fix for the memory accessing, the "rxq_ctrl" was
removed from the list in a device private data when the "refcnt" was
decreased to 0. Under only shared or non-shared queues scenarios,
this was safe since all the "rxq_ctrl" entries were freed or kept.

There is one case that shared and non-shared Rx queues are configured
simultaneously, for example, a hairpin Rx queue cannot be shared.
When closing the port that allocated the shared Rx queues'
"rxq_ctrl", if the next entry is hairpin "rxq_ctrl", the hairpin
"rxq_ctrl" will be freed directly with other resources. When trying
to close the another port sharing the "rxq_ctrl", the LIST_REMOVE
will be called again and cause some UFA issue. If the memory is no
longer mapped, there will be a SIGSEGV.

Adding a flag in the Rx queue private structure to remove the
"rxq_ctrl" from the list only on the port/queue that allocated it.

Fixes: bcc220cb57d7 ("net/mlx5: fix shared Rx queue list management")

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
v2: fix CI code style warning
---

Patch applied to next-net-mlx,


Kindest regards,
Raslan Darawsheh
  

Patch

diff --git a/drivers/net/mlx5/mlx5_rx.h b/drivers/net/mlx5/mlx5_rx.h
index 7d144921ab..9bcb43b007 100644
--- a/drivers/net/mlx5/mlx5_rx.h
+++ b/drivers/net/mlx5/mlx5_rx.h
@@ -173,6 +173,7 @@  struct mlx5_rxq_ctrl {
 /* RX queue private data. */
 struct mlx5_rxq_priv {
 	uint16_t idx; /* Queue index. */
+	bool possessor; /* Shared rxq_ctrl allocated for the 1st time. */
 	RTE_ATOMIC(uint32_t) refcnt; /* Reference counter. */
 	struct mlx5_rxq_ctrl *ctrl; /* Shared Rx Queue. */
 	LIST_ENTRY(mlx5_rxq_priv) owner_entry; /* Entry in shared rxq_ctrl. */
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index f13fc3b353..c6655b7db4 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -938,6 +938,7 @@  mlx5_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 			rte_errno = ENOMEM;
 			return -rte_errno;
 		}
+		rxq->possessor = true;
 	}
 	rxq->priv = priv;
 	rxq->idx = idx;
@@ -2015,6 +2016,7 @@  mlx5_rxq_hairpin_new(struct rte_eth_dev *dev, struct mlx5_rxq_priv *rxq,
 	tmpl->rxq.mr_ctrl.cache_bh = (struct mlx5_mr_btree) { 0 };
 	tmpl->rxq.idx = idx;
 	rxq->hairpin_conf = *hairpin_conf;
+	rxq->possessor = true;
 	mlx5_rxq_ref(dev, idx);
 	LIST_INSERT_HEAD(&priv->rxqsctrl, tmpl, next);
 	return tmpl;
@@ -2282,7 +2284,8 @@  mlx5_rxq_release(struct rte_eth_dev *dev, uint16_t idx)
 					RTE_ETH_QUEUE_STATE_STOPPED;
 		}
 	} else { /* Refcnt zero, closing device. */
-		LIST_REMOVE(rxq_ctrl, next);
+		if (rxq->possessor)
+			LIST_REMOVE(rxq_ctrl, next);
 		LIST_REMOVE(rxq, owner_entry);
 		if (LIST_EMPTY(&rxq_ctrl->owners)) {
 			if (!rxq_ctrl->is_hairpin)