net/mlx5: fix bond resource release

Message ID 20230804171536.1724554-1-dsosnowski@nvidia.com (mailing list archive)
State Accepted, archived
Delegated to: Raslan Darawsheh
Series net/mlx5: fix bond resource release

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/loongarch-compilation success Compilation OK
ci/loongarch-unit-testing success Unit Testing PASS
ci/Intel-compilation success Compilation OK
ci/intel-Testing success Testing PASS
ci/github-robot: build success github build: passed
ci/intel-Functional success Functional PASS
ci/iol-mellanox-Performance success Performance Testing PASS
ci/iol-unit-testing success Testing PASS
ci/iol-testing success Testing PASS
ci/iol-x86_64-unit-testing success Testing PASS
ci/iol-aarch-unit-testing success Testing PASS
ci/iol-x86_64-compile-testing success Testing PASS
ci/iol-intel-Functional success Functional Testing PASS
ci/iol-aarch64-compile-testing success Testing PASS
ci/iol-broadcom-Functional success Functional Testing PASS
ci/iol-broadcom-Performance success Performance Testing PASS
ci/iol-intel-Performance success Performance Testing PASS

Commit Message

Dariusz Sosnowski Aug. 4, 2023, 5:15 p.m. UTC
  When a port is spawned on top of an mlx5 bonding device,
the following TIS objects are created:

- TIS with index 0 - for the default HW hash bonding mode,
- TIS with index 1 - for sending packets on the 1st physical port,
- TIS with index 2 - for sending packets on the 2nd physical port,
- and so on.

These TIS objects are used according to the configured Tx queue affinity,
which is set up through the rte_eth_dev_map_aggr_tx_affinity() API.

Before this patch, when DPDK was compiled in debug mode and the
RTE_LIBRTE_MLX5_DEBUG macro was defined, applications were asserting
on a failed call to destroy the TD object (on which the TIS objects
depend) during port close.
The failure was caused by the fact that when the TD object was
destroyed, not all TIS objects had been destroyed yet, due to an
off-by-one issue in mlx5_free_shared_dev_ctx().
This function released n TIS objects, but it should release
n + 1 objects, where n is the number of aggregated ports
(n + 1, because there is one TIS object per physical port,
plus one TIS object for the default HW hash mode).

This patch fixes the off-by-one in the resource release of TIS objects.

Fixes: ce306af6341b ("net/mlx5: enhance Tx queue affinity")
Cc: jiaweiw@nvidia.com

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
---
 drivers/net/mlx5/mlx5.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
  

Comments

Slava Ovsiienko Aug. 18, 2023, 10:04 a.m. UTC | #1
> -----Original Message-----
> From: Dariusz Sosnowski <dsosnowski@nvidia.com>
> Sent: Friday, August 4, 2023 8:16 PM
> To: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
> <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>; Suanming Mou
> <suanmingm@nvidia.com>; Jiawei(Jonny) Wang <jiaweiw@nvidia.com>
> Cc: dev@dpdk.org
> Subject: [PATCH] net/mlx5: fix bond resource release
> 
> [patch description quoted in full -- snipped]
> 
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
  
Raslan Darawsheh Aug. 31, 2023, 8:39 a.m. UTC | #2
Hi,

> [patch description quoted in full -- snipped]

Patch applied to next-net-mlx,

Kindest regards,
Raslan Darawsheh
  

Patch

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index b373306f98..7d044bcd75 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1720,7 +1720,7 @@  mlx5_alloc_shared_dev_ctx(const struct mlx5_dev_spawn_data *spawn,
 	do {
 		if (sh->tis[i])
 			claim_zero(mlx5_devx_cmd_destroy(sh->tis[i]));
-	} while (++i < (uint32_t)sh->bond.n_port);
+	} while (++i <= (uint32_t)sh->bond.n_port);
 	if (sh->td)
 		claim_zero(mlx5_devx_cmd_destroy(sh->td));
 	mlx5_free(sh);
@@ -1864,7 +1864,7 @@  mlx5_free_shared_dev_ctx(struct mlx5_dev_ctx_shared *sh)
 	do {
 		if (sh->tis[i])
 			claim_zero(mlx5_devx_cmd_destroy(sh->tis[i]));
-	} while (++i < sh->bond.n_port);
+	} while (++i <= sh->bond.n_port);
 	if (sh->td)
 		claim_zero(mlx5_devx_cmd_destroy(sh->td));
 #ifdef HAVE_MLX5_HWS_SUPPORT