net/mlx5: fix bond resource release
Checks
Commit Message
When a port is spawned on top of mlx5 bonding device,
the following TIS objects are created:
- TIS with index 0 - for default HW hash bonding mode,
- TIS with index 1 - for sending packets on 1st physical port,
- TIS with index 2 - for sending packets on 2nd physical port,
- and so on.
These TIS objects are used according to configured Tx queue affinity,
which was set up using rte_eth_dev_map_aggr_tx_affinity() API.
Before this patch, when DPDK was compiled in debug mode and
RTE_LIBRTE_MLX5_DEBUG macro was declared, applications were asserting
on failed call to destroy the TD object (on which TIS objects
are dependent) during closing of the ports.
Failure was caused by the fact that when TD object was destroyed, not
all TIS objects were destroyed yet. This was caused by "off-by-one"
issue in mlx5_free_shared_dev_ctx().
This function was releasing n TIS objects, but it should
release n + 1 objects, where n is number of aggregated ports.
(n + 1, because there are n TIS objects for each physical port
and 1 TIS object for default HW hash mode).
This patch fixes this issue in resource release of TIS objects.
Fixes: ce306af6341b ("net/mlx5: enhance Tx queue affinity")
Cc: jiaweiw@nvidia.com
Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
---
drivers/net/mlx5/mlx5.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
Comments
> -----Original Message-----
> From: Dariusz Sosnowski <dsosnowski@nvidia.com>
> Sent: Friday, August 4, 2023 8:16 PM
> To: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
> <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>; Suanming Mou
> <suanmingm@nvidia.com>; Jiawei(Jonny) Wang <jiaweiw@nvidia.com>
> Cc: dev@dpdk.org
> Subject: [PATCH] net/mlx5: fix bond resource release
>
> When a port is spawned on top of mlx5 bonding device,
> the following TIS objects are created:
>
> - TIS with index 0 - for default HW hash bonding mode,
> - TIS with index 1 - for sending packets on 1st physical port,
> - TIS with index 2 - for sending packets on 2nd physical port,
> - and so on.
>
> These TIS objects are used according to configured Tx queue affinity,
> which was set up using rte_eth_dev_map_aggr_tx_affinity() API.
>
> Before this patch, when DPDK was compiled in debug mode and
> RTE_LIBRTE_MLX5_DEBUG macro was declared, applications were asserting
> on failed call to destroy the TD object (on which TIS objects
> are dependent) during closing of the ports.
> Failure was caused by the fact that when TD object was destroyed, not
> all TIS objects were destroyed yet. This was caused by "off-by-one"
> issue in mlx5_free_shared_dev_ctx().
> This function was releasing n TIS objects, but it should
> release n + 1 objects, where n is number of aggregated ports.
> (n + 1, because there are n TIS objects for each physical port
> and 1 TIS object for default HW hash mode).
>
> This patch fixes this issue in resource release of TIS objects.
>
> Fixes: ce306af6341b ("net/mlx5: enhance Tx queue affinity")
> Cc: jiaweiw@nvidia.com
>
> Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Hi,
> -----Original Message-----
> From: Dariusz Sosnowski <dsosnowski@nvidia.com>
> Sent: Friday, August 4, 2023 8:16 PM
> To: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
> <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>; Suanming Mou
> <suanmingm@nvidia.com>; Jiawei(Jonny) Wang <jiaweiw@nvidia.com>
> Cc: dev@dpdk.org
> Subject: [PATCH] net/mlx5: fix bond resource release
>
> When a port is spawned on top of mlx5 bonding device, the following TIS
> objects are created:
>
> - TIS with index 0 - for default HW hash bonding mode,
> - TIS with index 1 - for sending packets on 1st physical port,
> - TIS with index 2 - for sending packets on 2nd physical port,
> - and so on.
>
> These TIS objects are used according to configured Tx queue affinity, which
> was set up using rte_eth_dev_map_aggr_tx_affinity() API.
>
> Before this patch, when DPDK was compiled in debug mode and
> RTE_LIBRTE_MLX5_DEBUG macro was declared, applications were asserting
> on failed call to destroy the TD object (on which TIS objects are dependent)
> during closing of the ports.
> Failure was caused by the fact that when TD object was destroyed, not all TIS
> objects were destroyed yet. This was caused by "off-by-one"
> issue in mlx5_free_shared_dev_ctx().
> This function was releasing n TIS objects, but it should release n + 1 objects,
> where n is number of aggregated ports.
> (n + 1, because there are n TIS objects for each physical port and 1 TIS object
> for default HW hash mode).
>
> This patch fixes this issue in resource release of TIS objects.
>
> Fixes: ce306af6341b ("net/mlx5: enhance Tx queue affinity")
> Cc: jiaweiw@nvidia.com
>
> Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
> ---
Patch applied to next-net-mlx,
Kindest regards,
Raslan Darawsheh
@@ -1720,7 +1720,7 @@ mlx5_alloc_shared_dev_ctx(const struct mlx5_dev_spawn_data *spawn,
do {
if (sh->tis[i])
claim_zero(mlx5_devx_cmd_destroy(sh->tis[i]));
- } while (++i < (uint32_t)sh->bond.n_port);
+ } while (++i <= (uint32_t)sh->bond.n_port);
if (sh->td)
claim_zero(mlx5_devx_cmd_destroy(sh->td));
mlx5_free(sh);
@@ -1864,7 +1864,7 @@ mlx5_free_shared_dev_ctx(struct mlx5_dev_ctx_shared *sh)
do {
if (sh->tis[i])
claim_zero(mlx5_devx_cmd_destroy(sh->tis[i]));
- } while (++i < sh->bond.n_port);
+ } while (++i <= sh->bond.n_port);
if (sh->td)
claim_zero(mlx5_devx_cmd_destroy(sh->td));
#ifdef HAVE_MLX5_HWS_SUPPORT