[v5] net/mlx5: fix mutex unlock in txpp cleanup

Message ID: 20211012100204.5569-1-cyeaa@connect.ust.hk (mailing list archive)
State: Superseded, archived
Delegated to: Raslan Darawsheh
Series: [v5] net/mlx5: fix mutex unlock in txpp cleanup

Checks

Context                          Check    Description
ci/checkpatch                    warning  coding style issues
ci/github-robot: build           success  github build: passed
ci/Intel-compilation             success  Compilation OK
ci/iol-broadcom-Functional       success  Functional Testing PASS
ci/iol-broadcom-Performance      success  Performance Testing PASS
ci/iol-x86_64-compile-testing    success  Testing PASS
ci/iol-x86_64-unit-testing       success  Testing PASS
ci/iol-mellanox-Performance      success  Performance Testing PASS
ci/iol-aarch64-unit-testing      success  Testing PASS
ci/intel-Testing                 success  Testing PASS
ci/iol-intel-Performance         success  Performance Testing PASS
ci/iol-intel-Functional          success  Functional Testing PASS
ci/iol-aarch64-compile-testing   success  Testing PASS

Commit Message

YE Chengfeng Oct. 12, 2021, 10:02 a.m. UTC
The lock sh->txpp.mutex was not released on one return path of the
cleanup function, potentially causing a deadlock.

Fixes: d133f4cdb7 ("net/mlx5: create clock queue for packet pacing")
Cc: stable@dpdk.org

Signed-off-by: Chengfeng Ye <cyeaa@connect.ust.hk>
---
 drivers/net/mlx5/mlx5_txpp.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)
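
For readers without the driver source at hand, the pattern being fixed can be sketched as follows. This is a minimal illustration, not the driver code: struct demo_txpp, demo_txpp_stop() and the destroyed flag are hypothetical stand-ins, and plain assert() replaces MLX5_ASSERT(). The point is only that the early return taken while references remain must release the mutex as well.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for the shared txpp state; not the driver's struct. */
struct demo_txpp {
	pthread_mutex_t mutex;
	unsigned int refcnt;
	int destroyed; /* set once the last reference is dropped */
};

/* Drop one reference; every return path releases the mutex. */
static void
demo_txpp_stop(struct demo_txpp *txpp)
{
	int ret;

	ret = pthread_mutex_lock(&txpp->mutex);
	assert(!ret);
	(void)ret;
	if (!txpp->refcnt || --txpp->refcnt) {
		/* Early return: never started, or still referenced. */
		ret = pthread_mutex_unlock(&txpp->mutex);
		assert(!ret);
		(void)ret;
		return;
	}
	/* No references any more, do the actual destroy. */
	txpp->destroyed = 1;
	ret = pthread_mutex_unlock(&txpp->mutex);
	assert(!ret);
	(void)ret;
}
```

Without the unlock in the early-return branch, a second caller of demo_txpp_stop() on the same state would block in pthread_mutex_lock().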
  

Comments

Slava Ovsiienko Nov. 2, 2021, 7:55 a.m. UTC | #1
> -----Original Message-----
> From: Chengfeng Ye <cyeaa@connect.ust.hk>
> Sent: Tuesday, October 12, 2021 13:02
> To: david.marchand@redhat.com; Slava Ovsiienko
> <viacheslavo@nvidia.com>; Shahaf Shuler <shahafs@nvidia.com>; Matan
> Azrad <matan@nvidia.com>
> Cc: dev@dpdk.org; Chengfeng Ye <cyeaa@connect.ust.hk>;
> stable@dpdk.org
> Subject: [PATCH v5] net/mlx5: fix mutex unlock in txpp cleanup
> 
> The lock sh->txpp.mutex was not correctly released on one path of cleanup
> function return, potentially causing the deadlock.
> 
> Fixes: d133f4cdb7 ("net/mlx5: create clock queue for packet pacing")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Chengfeng Ye <cyeaa@connect.ust.hk>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
  
Raslan Darawsheh Nov. 9, 2021, 11:08 a.m. UTC | #2
Hi,

> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Chengfeng Ye
> Sent: Tuesday, October 12, 2021 1:02 PM
> To: david.marchand@redhat.com; Slava Ovsiienko
> <viacheslavo@nvidia.com>; Shahaf Shuler <shahafs@nvidia.com>; Matan
> Azrad <matan@nvidia.com>
> Cc: dev@dpdk.org; Chengfeng Ye <cyeaa@connect.ust.hk>;
> stable@dpdk.org
> Subject: [dpdk-dev] [PATCH v5] net/mlx5: fix mutex unlock in txpp cleanup
> 
> The lock sh->txpp.mutex was not correctly released on one path of cleanup
> function return, potentially causing the deadlock.
> 
> Fixes: d133f4cdb7 ("net/mlx5: create clock queue for packet pacing")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Chengfeng Ye <cyeaa@connect.ust.hk>

Patch applied to next-net-mlx,

Kindest regards
Raslan Darawsheh
  
Ferruh Yigit Nov. 10, 2021, 4:57 p.m. UTC | #3
On 10/12/2021 11:02 AM, Chengfeng Ye wrote:
> The lock sh->txpp.mutex was not correctly released on one path
> of cleanup function return, potentially causing the deadlock.
> 
> Fixes: d133f4cdb7 ("net/mlx5: create clock queue for packet pacing")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Chengfeng Ye <cyeaa@connect.ust.hk>
> ---
>   drivers/net/mlx5/mlx5_txpp.c | 6 +++++-
>   1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/mlx5/mlx5_txpp.c b/drivers/net/mlx5/mlx5_txpp.c
> index 4f6da9f2d1..0ece788a84 100644
> --- a/drivers/net/mlx5/mlx5_txpp.c
> +++ b/drivers/net/mlx5/mlx5_txpp.c
> @@ -961,8 +961,12 @@ mlx5_txpp_stop(struct rte_eth_dev *dev)
>   	MLX5_ASSERT(!ret);
>   	RTE_SET_USED(ret);
>   	MLX5_ASSERT(sh->txpp.refcnt);
> -	if (!sh->txpp.refcnt || --sh->txpp.refcnt)
> +	if (!sh->txpp.refcnt || --sh->txpp.refcnt) {
> +		ret = pthread_mutex_unlock(&sh->txpp.mutex);
> +		MLX5_ASSERT(!ret);
> +		RTE_SET_USED(ret);

Does 'RTE_SET_USED()' need to be used multiple times for the same variable?

This usage looks ugly. I can see why it is used, but I wonder if this can
be solved differently. What about something like the following:

  #ifdef RTE_LIBRTE_MLX5_DEBUG
   #define MLX5_ASSERT(exp) RTE_VERIFY(exp)
  #else
   #ifdef RTE_ENABLE_ASSERT
    #define MLX5_ASSERT(exp) RTE_ASSERT(exp)
   #else
    #define MLX5_ASSERT(exp) RTE_SET_USED(exp)
   #endif
  #endif

>   		return;
> +	}
>   	/* No references any more, do actual destroy. */
>   	mlx5_txpp_destroy(sh);
>   	ret = pthread_mutex_unlock(&sh->txpp.mutex);
>
  
Slava Ovsiienko Nov. 11, 2021, 7:06 a.m. UTC | #4
Hi, Ferruh

> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@intel.com>
> Sent: Wednesday, November 10, 2021 18:57
> To: Chengfeng Ye <cyeaa@connect.ust.hk>; david.marchand@redhat.com;
> Slava Ovsiienko <viacheslavo@nvidia.com>; Shahaf Shuler
> <shahafs@nvidia.com>; Matan Azrad <matan@nvidia.com>
> Cc: dev@dpdk.org; stable@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v5] net/mlx5: fix mutex unlock in txpp
> cleanup
> 
> On 10/12/2021 11:02 AM, Chengfeng Ye wrote:
> > The lock sh->txpp.mutex was not correctly released on one path of
> > cleanup function return, potentially causing the deadlock.
> >
> > Fixes: d133f4cdb7 ("net/mlx5: create clock queue for packet pacing")
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Chengfeng Ye <cyeaa@connect.ust.hk>
> > ---
> >   drivers/net/mlx5/mlx5_txpp.c | 6 +++++-
> >   1 file changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/mlx5/mlx5_txpp.c
> > b/drivers/net/mlx5/mlx5_txpp.c index 4f6da9f2d1..0ece788a84 100644
> > --- a/drivers/net/mlx5/mlx5_txpp.c
> > +++ b/drivers/net/mlx5/mlx5_txpp.c
> > @@ -961,8 +961,12 @@ mlx5_txpp_stop(struct rte_eth_dev *dev)
> >   	MLX5_ASSERT(!ret);
> >   	RTE_SET_USED(ret);
> >   	MLX5_ASSERT(sh->txpp.refcnt);
> > -	if (!sh->txpp.refcnt || --sh->txpp.refcnt)
> > +	if (!sh->txpp.refcnt || --sh->txpp.refcnt) {
> > +		ret = pthread_mutex_unlock(&sh->txpp.mutex);
> > +		MLX5_ASSERT(!ret);
> > +		RTE_SET_USED(ret);
> 
> Is this 'RTE_SET_USED()' need to be used multiple times for same variable?
Hmm, it seems the "claim_zero()" macro would be better here:

claim_zero(pthread_mutex_lock(&sh->txpp.mutex));

I will provide the cleanup patch, thank you for noticing that.

> This usage looks ugly, I can see why it is used but I wonder if this can be
> solved differently, what about something like following:
> 
>   #ifdef RTE_LIBRTE_MLX5_DEBUG
>    #define MLX5_ASSERT(exp) RTE_VERIFY(exp)
>   #else
>    #ifdef RTE_ENABLE_ASSERT
>     #define MLX5_ASSERT(exp) RTE_ASSERT(exp)
>    #else
>     #define MLX5_ASSERT(exp) RTE_SET_USED(exp)
>    #endif
>   #endif
It would directly replace MLX5_ASSERT(exp) with RTE_SET_USED(exp)
if neither RTE_ENABLE_ASSERT nor RTE_LIBRTE_MLX5_DEBUG is defined.
We would not want to drop the "not used" check functionality altogether, right?

With best regards,
Slava
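
The claim_zero() idea mentioned above can be sketched roughly as below. This is an illustration under assumptions, not the driver's definition: DEMO_CLAIM_ZERO, demo_mutex, demo_refcnt and demo_ref_get() are hypothetical names, and NDEBUG stands in for whatever debug switch the build uses. The call is evaluated in both build types and only the check is optional, so no temporary 'ret' variable and no separate "used" marker are needed.

```c
#include <assert.h>
#include <pthread.h>

/*
 * Hypothetical helper in the spirit of claim_zero(); the name and the
 * NDEBUG switch are assumptions made for this sketch.
 * The argument is always evaluated; only the zero check is optional.
 */
#ifndef NDEBUG
#define DEMO_CLAIM_ZERO(...) assert((__VA_ARGS__) == 0)
#else
#define DEMO_CLAIM_ZERO(...) ((void)(__VA_ARGS__))
#endif

static pthread_mutex_t demo_mutex = PTHREAD_MUTEX_INITIALIZER;
static unsigned int demo_refcnt;

/* Take and release the lock without a temporary 'ret' variable. */
static unsigned int
demo_ref_get(void)
{
	unsigned int cnt;

	DEMO_CLAIM_ZERO(pthread_mutex_lock(&demo_mutex));
	cnt = ++demo_refcnt;
	DEMO_CLAIM_ZERO(pthread_mutex_unlock(&demo_mutex));
	return cnt;
}
```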
  
Ferruh Yigit Nov. 11, 2021, 11:25 a.m. UTC | #5
On 11/11/2021 7:06 AM, Slava Ovsiienko wrote:
> Hi, Ferruh
> 
>> -----Original Message-----
>> From: Ferruh Yigit <ferruh.yigit@intel.com>
>> Sent: Wednesday, November 10, 2021 18:57
>> To: Chengfeng Ye <cyeaa@connect.ust.hk>; david.marchand@redhat.com;
>> Slava Ovsiienko <viacheslavo@nvidia.com>; Shahaf Shuler
>> <shahafs@nvidia.com>; Matan Azrad <matan@nvidia.com>
>> Cc: dev@dpdk.org; stable@dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH v5] net/mlx5: fix mutex unlock in txpp
>> cleanup
>>
>> On 10/12/2021 11:02 AM, Chengfeng Ye wrote:
>>> The lock sh->txpp.mutex was not correctly released on one path of
>>> cleanup function return, potentially causing the deadlock.
>>>
>>> Fixes: d133f4cdb7 ("net/mlx5: create clock queue for packet pacing")
>>> Cc: stable@dpdk.org
>>>
>>> Signed-off-by: Chengfeng Ye <cyeaa@connect.ust.hk>
>>> ---
>>>    drivers/net/mlx5/mlx5_txpp.c | 6 +++++-
>>>    1 file changed, 5 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/net/mlx5/mlx5_txpp.c
>>> b/drivers/net/mlx5/mlx5_txpp.c index 4f6da9f2d1..0ece788a84 100644
>>> --- a/drivers/net/mlx5/mlx5_txpp.c
>>> +++ b/drivers/net/mlx5/mlx5_txpp.c
>>> @@ -961,8 +961,12 @@ mlx5_txpp_stop(struct rte_eth_dev *dev)
>>>    	MLX5_ASSERT(!ret);
>>>    	RTE_SET_USED(ret);
>>>    	MLX5_ASSERT(sh->txpp.refcnt);
>>> -	if (!sh->txpp.refcnt || --sh->txpp.refcnt)
>>> +	if (!sh->txpp.refcnt || --sh->txpp.refcnt) {
>>> +		ret = pthread_mutex_unlock(&sh->txpp.mutex);
>>> +		MLX5_ASSERT(!ret);
>>> +		RTE_SET_USED(ret);
>>
>> Is this 'RTE_SET_USED()' need to be used multiple times for same variable?
> mmm, It seems "claim_zero()" macro would be better here:
> 
> claim_zero(pthread_mutex_lock(&sh->txpp.mutex));
> 
> I will provide the cleanup patch, thank you for noticing that
> 
>> This usage looks ugly, I can see why it is used but I wonder if this can be
>> solved differently, what about something like following:
>>
>>    #ifdef RTE_LIBRTE_MLX5_DEBUG
>>     #define MLX5_ASSERT(exp) RTE_VERIFY(exp)
>>    #else
>>     #ifdef RTE_ENABLE_ASSERT
>>      #define MLX5_ASSERT(exp) RTE_ASSERT(exp)
>>     #else
>>      #define MLX5_ASSERT(exp) RTE_SET_USED(exp)
>>     #endif
>>    #endif
> It would directly replace MLX5_ASSERT(exp) with RTE_SET_USED(exp)
> if there is neither RTE_ENABLE_ASSERT nor RTE_LIBRTE_MLX5_DEBUG.
> We would not like to drop the "not used" check functionality at all , right?
> 

The suggestion was to prevent the following kind of usage:
  	MLX5_ASSERT(!ret);
  	RTE_SET_USED(ret);

I assume you need the above usage when a variable is used only in 'MLX5_ASSERT'.
If there is a way to prevent the warning in that case without 'RTE_SET_USED',
that may be better.
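
To make the trade-off in this exchange concrete, here is a small sketch of an assert macro whose non-debug fallback is a void cast, in the style of the proposal quoted above. DEMO_ASSERT, DEMO_DEBUG, demo_check() and demo_run() are hypothetical names used only for illustration. With such a fallback a variable read only by the assert needs no separate "used" marker, but the expression is still evaluated in non-debug builds, and a variable that really is unused elsewhere would no longer be reported by the compiler.

```c
#include <assert.h>

/* Hypothetical macros for illustration; DEMO_DEBUG is an assumed build flag. */
#ifdef DEMO_DEBUG
#define DEMO_ASSERT(exp) assert(exp)
#else
/* Fallback in the proposed style: the void cast marks 'exp' as used. */
#define DEMO_ASSERT(exp) ((void)(exp))
#endif

static int demo_calls;

/* Counts invocations so a caller can tell whether it was evaluated. */
static int
demo_check(int value)
{
	demo_calls++;
	return value;
}

/* 'ret' is read only by the assert macro, with no separate used-marker. */
static int
demo_run(void)
{
	int ret = demo_check(0);

	DEMO_ASSERT(!ret);
	/* The fallback still calls demo_check() in a non-debug build. */
	DEMO_ASSERT(demo_check(1));
	return demo_calls;
}
```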
  

Patch

diff --git a/drivers/net/mlx5/mlx5_txpp.c b/drivers/net/mlx5/mlx5_txpp.c
index 4f6da9f2d1..0ece788a84 100644
--- a/drivers/net/mlx5/mlx5_txpp.c
+++ b/drivers/net/mlx5/mlx5_txpp.c
@@ -961,8 +961,12 @@  mlx5_txpp_stop(struct rte_eth_dev *dev)
 	MLX5_ASSERT(!ret);
 	RTE_SET_USED(ret);
 	MLX5_ASSERT(sh->txpp.refcnt);
-	if (!sh->txpp.refcnt || --sh->txpp.refcnt)
+	if (!sh->txpp.refcnt || --sh->txpp.refcnt) {
+		ret = pthread_mutex_unlock(&sh->txpp.mutex);
+		MLX5_ASSERT(!ret);
+		RTE_SET_USED(ret);
 		return;
+	}
 	/* No references any more, do actual destroy. */
 	mlx5_txpp_destroy(sh);
 	ret = pthread_mutex_unlock(&sh->txpp.mutex);