vhost: fix deadlock when vhost unregister

Message ID 20190128065549.98266-1-findtheonlyway@gmail.com
State New
Delegated to: Maxime Coquelin
Series
  • vhost: fix deadlock when vhost unregister

Checks

Context                          Check    Description
ci/mellanox-Performance-Testing  success  Performance Testing PASS
ci/intel-Performance-Testing     success  Performance Testing PASS
ci/Intel-compilation             success  Compilation OK
ci/checkpatch                    success  coding style OK

Commit Message

孙文杰 Jan. 28, 2019, 6:55 a.m.
When rte_vhost_driver_unregister() deletes the connection fd,
fdset_try_del() is retried without releasing vhost_user.mutex
while the fd is busy. Meanwhile, fdset_event_dispatch() marks the
fd as busy and calls vhost_user_msg_handler(), which tries to take
vhost_user.mutex, causing a deadlock. Unlock vhost_user.mutex when
fdset_try_del() fails and relock it on retry.

Fixes: 8b4b949144b8 ("vhost: fix dead lock on closing in server mode")
Cc: stable@dpdk.org

Signed-off-by: sunwenjie <findtheonlyway@gmail.com>
---
 lib/librte_vhost/socket.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Maxime Coquelin Feb. 8, 2019, 2:12 p.m. | #1
On 1/28/19 7:55 AM, sunwenjie wrote:
> When rte_vhost_driver_unregister() deletes the connection fd,
> fdset_try_del() is retried without releasing vhost_user.mutex
> while the fd is busy. Meanwhile, fdset_event_dispatch() marks the
> fd as busy and calls vhost_user_msg_handler(), which tries to take
> vhost_user.mutex, causing a deadlock. Unlock vhost_user.mutex when
> fdset_try_del() fails and relock it on retry.

What about this wording:

In rte_vhost_driver_unregister(), the connection fd is removed from
the fdset using fdset_try_del(). A call to this function may fail
if the corresponding fd is in the busy state, indicating that the
event dispatcher is executing the read or write callback on this fd.
When this happens, rte_vhost_driver_unregister() keeps trying to
remove the fd from the set until it is no longer busy.

This situation causes a deadlock, because
rte_vhost_driver_unregister() keeps trying to remove the fd from
the set with vhost_user.mutex held, while the callback executed
by the dispatcher, vhost_user_read_cb(), also takes this mutex in
numerous places.

The fix consists of releasing vhost_user.mutex between each retry
in rte_vhost_driver_unregister().


> 
> Fixes: 8b4b949144b8 ("vhost: fix dead lock on closing in server mode")
> Cc: stable@dpdk.org
> 
> Signed-off-by: sunwenjie <findtheonlyway@gmail.com>

We need your real name for legal reasons:
Signed-off-by: Firstname Lastname <findtheonlyway@gmail.com>

No need to resubmit, I can handle the commit message fixup and
the fix looks good to me:
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

As soon as I get your name in the above format, I will apply the patch
to the Virtio tree. Thanks for submitting the fix.

Maxime
> ---
>   lib/librte_vhost/socket.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c
> index 9cf34ad17..9883b0491 100644
> --- a/lib/librte_vhost/socket.c
> +++ b/lib/librte_vhost/socket.c
> @@ -961,13 +961,13 @@ rte_vhost_driver_unregister(const char *path)
>   	int count;
>   	struct vhost_user_connection *conn, *next;
>   
> +again:
>   	pthread_mutex_lock(&vhost_user.mutex);
>   
>   	for (i = 0; i < vhost_user.vsocket_cnt; i++) {
>   		struct vhost_user_socket *vsocket = vhost_user.vsockets[i];
>   
>   		if (!strcmp(vsocket->path, path)) {
> -again:
>   			pthread_mutex_lock(&vsocket->conn_mutex);
>   			for (conn = TAILQ_FIRST(&vsocket->conn_list);
>   			     conn != NULL;
> @@ -983,6 +983,7 @@ rte_vhost_driver_unregister(const char *path)
>   						  conn->connfd) == -1) {
>   					pthread_mutex_unlock(
>   							&vsocket->conn_mutex);
> +					pthread_mutex_unlock(&vhost_user.mutex);
>   					goto again;
>   				}
>   
>
孙文杰 Feb. 14, 2019, 4:05 a.m. | #2
Thanks, Maxime.

Your description is better. My real name is Wenjie Sun.

Signed-off-by: Wenjie Sun <findtheonlyway@gmail.com>


Patch

diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c
index 9cf34ad17..9883b0491 100644
--- a/lib/librte_vhost/socket.c
+++ b/lib/librte_vhost/socket.c
@@ -961,13 +961,13 @@  rte_vhost_driver_unregister(const char *path)
 	int count;
 	struct vhost_user_connection *conn, *next;
 
+again:
 	pthread_mutex_lock(&vhost_user.mutex);
 
 	for (i = 0; i < vhost_user.vsocket_cnt; i++) {
 		struct vhost_user_socket *vsocket = vhost_user.vsockets[i];
 
 		if (!strcmp(vsocket->path, path)) {
-again:
 			pthread_mutex_lock(&vsocket->conn_mutex);
 			for (conn = TAILQ_FIRST(&vsocket->conn_list);
 			     conn != NULL;
@@ -983,6 +983,7 @@  rte_vhost_driver_unregister(const char *path)
 						  conn->connfd) == -1) {
 					pthread_mutex_unlock(
 							&vsocket->conn_mutex);
+					pthread_mutex_unlock(&vhost_user.mutex);
 					goto again;
 				}
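
The retry pattern this patch introduces can be illustrated outside DPDK
with a small toy model. The sketch below is not the librte_vhost code:
global_mutex, conn_mutex, fdset_mutex, toy_try_del(), toy_callback(),
toy_unregister() and run_demo() are all hypothetical stand-ins, and the
busy flag is a simplification of the fdset's busy state. It only aims
to show why dropping both mutexes before "goto again" lets the
dispatcher's callback make progress.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>

/* Hypothetical stand-ins for vhost_user.mutex, vsocket->conn_mutex and
 * the fdset's own lock. None of these names come from librte_vhost. */
static pthread_mutex_t global_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t conn_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t fdset_mutex = PTHREAD_MUTEX_INITIALIZER;

static bool fd_busy;     /* true while the dispatcher runs a callback */
static bool fd_present;  /* true while the fd is still in the set */
static int callbacks_run;

/* Toy model of a try-delete: -1 if the fd is busy, 0 once removed. */
static int toy_try_del(void)
{
	int ret = 0;

	pthread_mutex_lock(&fdset_mutex);
	if (fd_busy)
		ret = -1;
	else
		fd_present = false;
	pthread_mutex_unlock(&fdset_mutex);
	return ret;
}

/* Toy model of the dispatcher's callback: it needs the global mutex
 * and clears the busy flag only after it has finished. */
static void *toy_callback(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&global_mutex);
	callbacks_run++;
	pthread_mutex_unlock(&global_mutex);

	pthread_mutex_lock(&fdset_mutex);
	fd_busy = false;
	pthread_mutex_unlock(&fdset_mutex);
	return NULL;
}

/* Retry loop shaped like the patched function: both mutexes are
 * released before retrying, so the callback can take global_mutex. */
static void toy_unregister(void)
{
again:
	pthread_mutex_lock(&global_mutex);
	pthread_mutex_lock(&conn_mutex);
	if (toy_try_del() == -1) {
		pthread_mutex_unlock(&conn_mutex);
		pthread_mutex_unlock(&global_mutex); /* the added unlock */
		sched_yield();
		goto again;
	}
	pthread_mutex_unlock(&conn_mutex);
	pthread_mutex_unlock(&global_mutex);
}

/* Returns 0 if the fd was removed and the callback ran exactly once. */
static int run_demo(void)
{
	pthread_t tid;
	int rc;

	fd_present = true;
	callbacks_run = 0;
	fd_busy = true; /* the dispatcher has already marked the fd busy */

	rc = pthread_create(&tid, NULL, toy_callback, NULL);
	assert(rc == 0);
	toy_unregister();
	pthread_join(tid, NULL);

	return (!fd_present && callbacks_run == 1) ? 0 : -1;
}
```

Calling run_demo() from a main() and linking with -pthread should
terminate with a return value of 0. If the unlock of global_mutex in
the retry branch is removed and "again:" is moved below the first lock,
as in the pre-patch layout, the toy can hang whenever toy_unregister()
wins the race for global_mutex.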