net/mlx5: fix memory regions release deadlock

Message ID 1580810881-177458-1-git-send-email-michaelba@mellanox.com (mailing list archive)
State Superseded, archived
Delegated to: Raslan Darawsheh
Series net/mlx5: fix memory regions release deadlock

Checks

Context                  Check    Description
ci/checkpatch            success  coding style OK
ci/iol-nxp-Performance   success  Performance Testing PASS
ci/iol-testing           success  Testing PASS
ci/Intel-compilation     fail     apply issues

Commit Message

Michael Baum Feb. 4, 2020, 10:08 a.m. UTC
When the memory callback list is created, a callback function that
manages memory regions (MRs) is registered. This callback iterates over
the shared device list and takes the lock of each shared device to
prevent parallel access to that device's MR list.
When the PMD releases memory regions while the list exists, the
callback function processes the MRs while holding this lock.
When a shared device is closed and all its MRs are freed, the same lock
is taken.

Freeing the MRs calls rte_free(), which may trigger the memory callback.
The MR release path wrongly took the lock before the shared device was
removed from the callback list, which caused the deadlock.

To solve this, first remove the shared device from the callback list
and only then release the memory regions.
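
The ordering problem described above can be illustrated with a small
single-threaded model. This is only a sketch under stated assumptions:
struct shared_dev, mem_event_cb(), mr_release() and the two close_*()
helpers are hypothetical stand-ins, not the mlx5 code. The lock is
modeled as a flag so that a re-entry is recorded rather than blocking
forever, and the call to mem_event_cb() inside mr_release() stands in
for rte_free() triggering the memory callback.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a shared device context (not the mlx5 struct). */
struct shared_dev {
	bool mr_lock_held;   /* models the non-recursive per-device MR lock */
	bool on_cb_list;     /* models membership in the memory callback list */
	bool lock_reentered; /* set where a real lock would block forever */
};

/* Models the memory event callback: it only visits listed devices. */
static void mem_event_cb(struct shared_dev *sh)
{
	if (!sh->on_cb_list)
		return;
	if (sh->mr_lock_held) {
		/* The caller already holds the lock: a deadlock in real code. */
		sh->lock_reentered = true;
		return;
	}
	sh->mr_lock_held = true;
	/* ... the MR list would be updated here ... */
	sh->mr_lock_held = false;
}

/* Models MR release: frees memory under the lock, which fires the callback. */
static void mr_release(struct shared_dev *sh)
{
	assert(!sh->mr_lock_held);
	sh->mr_lock_held = true;
	mem_event_cb(sh); /* stands in for rte_free() triggering the callback */
	sh->mr_lock_held = false;
}

/* Old order: release MRs first, then unlist. Returns true on re-entry. */
static bool close_release_first(void)
{
	struct shared_dev sh = { .on_cb_list = true };

	mr_release(&sh);
	sh.on_cb_list = false;
	return sh.lock_reentered;
}

/* New order: unlist first, then release MRs. Returns true on re-entry. */
static bool close_unlist_first(void)
{
	struct shared_dev sh = { .on_cb_list = true };

	sh.on_cb_list = false;
	mr_release(&sh);
	return sh.lock_reentered;
}
```

In this model close_release_first() reports a lock re-entry, while
close_unlist_first() does not, because the callback no longer sees the
device once it has been removed from the list.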

Fixes: 0e3d0525b2f2 ("net/mlx5: fix memory event callback list")
Cc: viacheslavo@mellanox.com
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@mellanox.com>
---
 drivers/net/mlx5/mlx5.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
  

Patch

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index f80e403..759491f 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -679,12 +679,12 @@  struct mlx5_flow_id_pool *
 	MLX5_ASSERT(rte_eal_process_type() == RTE_PROC_PRIMARY);
 	if (--sh->refcnt)
 		goto exit;
-	/* Release created Memory Regions. */
-	mlx5_mr_release(sh);
 	/* Remove from memory callback device list. */
 	rte_rwlock_write_lock(&mlx5_shared_data->mem_event_rwlock);
 	LIST_REMOVE(sh, mem_event_cb);
 	rte_rwlock_write_unlock(&mlx5_shared_data->mem_event_rwlock);
+	/* Release created Memory Regions. */
+	mlx5_mr_release(sh);
 	/* Remove context from the global device list. */
 	LIST_REMOVE(sh, next);
 	/*