net/mlx5: handle expected errno properly

Message ID 20180823063851.32559-1-jackmin@mellanox.com (mailing list archive)
State Superseded, archived
Headers
Series net/mlx5: handle expected errno properly |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation success Compilation OK

Commit Message

Xiaoyu Min Aug. 23, 2018, 6:38 a.m. UTC
  rte_errno is a per thread variable and is widely used as an
error indicator, which means a function could affect
other functions' results by setting rte_errno carelessly

During rxq setup, an EINVAL rte_errno is expected since
the queues are not created yet
So rte_errno is cleared when it is EINVAL as expected

Signed-off-by: Xiaoyu Min <jackmin@mellanox.com>
---
 drivers/net/mlx5/mlx5_rxq.c | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)
  

Comments

Yongseok Koh Aug. 23, 2018, 9:08 p.m. UTC | #1
On Thu, Aug 23, 2018 at 02:38:51PM +0800, Xiaoyu Min wrote:
> rte_errno is a per thread variable and is widely used as an
> error indicator, which means a function could affect
> other functions' results by setting rte_errno carelessly
> 
> During rxq setup, an EINVAL rte_errno is expected since
> the queues are not created yet
> So rte_errno is cleared when it is EINVAL as expected
> 
> Signed-off-by: Xiaoyu Min <jackmin@mellanox.com>
> ---
>  drivers/net/mlx5/mlx5_rxq.c | 20 +++++++++++++++-----
>  1 file changed, 15 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
> index 1f7bfd4..e7056e8 100644
> --- a/drivers/net/mlx5/mlx5_rxq.c
> +++ b/drivers/net/mlx5/mlx5_rxq.c
> @@ -443,6 +443,7 @@
>  	struct mlx5_rxq_data *rxq = (*priv->rxqs)[idx];
>  	struct mlx5_rxq_ctrl *rxq_ctrl =
>  		container_of(rxq, struct mlx5_rxq_ctrl, rxq);
> +	int ret = 0;
>  
>  	if (!rte_is_power_of_2(desc)) {
>  		desc = 1 << log2above(desc);
> @@ -459,13 +460,21 @@
>  		rte_errno = EOVERFLOW;
>  		return -rte_errno;
>  	}
> -	if (!mlx5_rxq_releasable(dev, idx)) {
> +	ret = mlx5_rxq_releasable(dev, idx);
> +	if (!ret) {
>  		DRV_LOG(ERR, "port %u unable to release queue index %u",
>  			dev->data->port_id, idx);
>  		rte_errno = EBUSY;
>  		return -rte_errno;
> +	} else if (ret == -EINVAL) {
> +		/**
> +		 * on the first time, rx queue doesn't exist,
> +		 * so just ignore this error and reset rte_errno.
> +		 */
> +		rte_errno = 0;

Unless this function returns failure, the rte_errno will be ignored by caller
and caller shouldn't assume rte_errno has 0. Caller will assume it is garbage
data in case of success. So we can silently ignore this case. Does it cause any
issue in application side?


Thanks,
Yongseok

> +	} else {
> +		mlx5_rxq_release(dev, idx);
>  	}
> -	mlx5_rxq_release(dev, idx);
>  	rxq_ctrl = mlx5_rxq_new(dev, idx, desc, socket, conf, mp);
>  	if (!rxq_ctrl) {
>  		DRV_LOG(ERR, "port %u unable to allocate queue index %u",
> @@ -1543,11 +1552,12 @@ struct mlx5_rxq_ctrl *
>   * @param dev
>   *   Pointer to Ethernet device.
>   * @param idx
> - *   TX queue index.
> + *   RX queue index.
>   *
>   * @return
> - *   1 if the queue can be released, negative errno otherwise and rte_errno is
> - *   set.
> + *   1 if the queue can be released
> + *   0 if the queue can not be released
> + *   -EINVAL if the queue doesn't exist
>   */
>  int
>  mlx5_rxq_releasable(struct rte_eth_dev *dev, uint16_t idx)
> -- 
> 1.8.3.1
>
  
Xiaoyu Min Aug. 24, 2018, 6:45 a.m. UTC | #2
On Thu, Aug 23, 2018 at 02:08:09PM -0700, Yongseok Koh wrote:
> On Thu, Aug 23, 2018 at 02:38:51PM +0800, Xiaoyu Min wrote:
> > rte_errno is a per thread variable and is widely used as an
> > error indicator, which means a function could affect
> > other functions' results by setting rte_errno carelessly
> > 
> > During rxq setup, an EINVAL rte_errno is expected since
> > the queues are not created yet
> > So rte_errno is cleared when it is EINVAL as expected
> > 
> > Signed-off-by: Xiaoyu Min <jackmin@mellanox.com>
> > ---
> >  drivers/net/mlx5/mlx5_rxq.c | 20 +++++++++++++++-----
> >  1 file changed, 15 insertions(+), 5 deletions(-)
> > 
> > diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
> > index 1f7bfd4..e7056e8 100644
> > --- a/drivers/net/mlx5/mlx5_rxq.c
> > +++ b/drivers/net/mlx5/mlx5_rxq.c
> > @@ -443,6 +443,7 @@
> >  	struct mlx5_rxq_data *rxq = (*priv->rxqs)[idx];
> >  	struct mlx5_rxq_ctrl *rxq_ctrl =
> >  		container_of(rxq, struct mlx5_rxq_ctrl, rxq);
> > +	int ret = 0;
> >  
> >  	if (!rte_is_power_of_2(desc)) {
> >  		desc = 1 << log2above(desc);
> > @@ -459,13 +460,21 @@
> >  		rte_errno = EOVERFLOW;
> >  		return -rte_errno;
> >  	}
> > -	if (!mlx5_rxq_releasable(dev, idx)) {
> > +	ret = mlx5_rxq_releasable(dev, idx);
> > +	if (!ret) {
> >  		DRV_LOG(ERR, "port %u unable to release queue index %u",
> >  			dev->data->port_id, idx);
> >  		rte_errno = EBUSY;
> >  		return -rte_errno;
> > +	} else if (ret == -EINVAL) {
> > +		/**
> > +		 * on the first time, rx queue doesn't exist,
> > +		 * so just ignore this error and reset rte_errno.
> > +		 */
> > +		rte_errno = 0;
> 
> Unless this function returns failure, the rte_errno will be ignored by caller
> and caller shouldn't assume rte_errno has 0. Caller will assume it is garbage
> data in case of success. So we can silently ignore this case. Does it cause any
> issue in application side?
> 
Not application side but mlx5 PMD this time:
**mlx5_fdir_filter_delete** 
which just _return -rte_errno;_
For sure, _mlx5_fdir_filter_delete_ should be more defensive, should not assume
rte_errno is zero if no error happened.
However if the caller know that an error will happen and rte_errno will become
meaningless (garbage) for the succeeding functions, Catching the expected error
and resetting rte_errno will be better. What do you think?

-Jack

> 
> Thanks,
> Yongseok
> 
> > +	} else {
> > +		mlx5_rxq_release(dev, idx);
> >  	}
> > -	mlx5_rxq_release(dev, idx);
> >  	rxq_ctrl = mlx5_rxq_new(dev, idx, desc, socket, conf, mp);
> >  	if (!rxq_ctrl) {
> >  		DRV_LOG(ERR, "port %u unable to allocate queue index %u",
> > @@ -1543,11 +1552,12 @@ struct mlx5_rxq_ctrl *
> >   * @param dev
> >   *   Pointer to Ethernet device.
> >   * @param idx
> > - *   TX queue index.
> > + *   RX queue index.
> >   *
> >   * @return
> > - *   1 if the queue can be released, negative errno otherwise and rte_errno is
> > - *   set.
> > + *   1 if the queue can be released
> > + *   0 if the queue can not be released
> > + *   -EINVAL if the queue doesn't exist
> >   */
> >  int
> >  mlx5_rxq_releasable(struct rte_eth_dev *dev, uint16_t idx)
> > -- 
> > 1.8.3.1
> >
  
Yongseok Koh Aug. 28, 2018, 8:42 p.m. UTC | #3
On Fri, Aug 24, 2018 at 02:45:00PM +0800, Jack MIN wrote:
> On Thu, Aug 23, 2018 at 02:08:09PM -0700, Yongseok Koh wrote:
> > On Thu, Aug 23, 2018 at 02:38:51PM +0800, Xiaoyu Min wrote:
> > > rte_errno is a per thread variable and is widely used as an
> > > error indicator, which means a function could affect
> > > other functions' results by setting rte_errno carelessly
> > > 
> > > During rxq setup, an EINVAL rte_errno is expected since
> > > the queues are not created yet
> > > So rte_errno is cleared when it is EINVAL as expected
> > > 
> > > Signed-off-by: Xiaoyu Min <jackmin@mellanox.com>
> > > ---
> > >  drivers/net/mlx5/mlx5_rxq.c | 20 +++++++++++++++-----
> > >  1 file changed, 15 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
> > > index 1f7bfd4..e7056e8 100644
> > > --- a/drivers/net/mlx5/mlx5_rxq.c
> > > +++ b/drivers/net/mlx5/mlx5_rxq.c
> > > @@ -443,6 +443,7 @@
> > >  	struct mlx5_rxq_data *rxq = (*priv->rxqs)[idx];
> > >  	struct mlx5_rxq_ctrl *rxq_ctrl =
> > >  		container_of(rxq, struct mlx5_rxq_ctrl, rxq);
> > > +	int ret = 0;
> > >  
> > >  	if (!rte_is_power_of_2(desc)) {
> > >  		desc = 1 << log2above(desc);
> > > @@ -459,13 +460,21 @@
> > >  		rte_errno = EOVERFLOW;
> > >  		return -rte_errno;
> > >  	}
> > > -	if (!mlx5_rxq_releasable(dev, idx)) {
> > > +	ret = mlx5_rxq_releasable(dev, idx);
> > > +	if (!ret) {
> > >  		DRV_LOG(ERR, "port %u unable to release queue index %u",
> > >  			dev->data->port_id, idx);
> > >  		rte_errno = EBUSY;
> > >  		return -rte_errno;
> > > +	} else if (ret == -EINVAL) {
> > > +		/**
> > > +		 * on the first time, rx queue doesn't exist,
> > > +		 * so just ignore this error and reset rte_errno.
> > > +		 */
> > > +		rte_errno = 0;
> > 
> > Unless this function returns failure, the rte_errno will be ignored by caller
> > and caller shouldn't assume rte_errno has 0. Caller will assume it is garbage
> > data in case of success. So we can silently ignore this case. Does it cause any
> > issue in application side?
> > 
> Not application side but mlx5 PMD this time:
> **mlx5_fdir_filter_delete** 
> which just _return -rte_errno;_

Looks like an error. mlx5_fdir_filter_delete() can't be like that. We seem to
have lost the code while refactoring it. Let take it offline.

> For sure, _mlx5_fdir_filter_delete_ should be more defensive, should not assume
> rte_errno is zero if no error happened.
> However if the caller know that an error will happen and rte_errno will become
> meaningless (garbage) for the succeeding functions, Catching the expected error
> and resetting rte_errno will be better. What do you think?

Still don't understand clearly. There would be many other similar cases where we
don't clear rte_errno when returning success. I don't understand why this case
should be taken as a special one??

Thanks
Yongseok
  
Xiaoyu Min Aug. 29, 2018, 3:52 a.m. UTC | #4
On Tue, Aug 28, 2018 at 01:42:18PM -0700, Yongseok Koh wrote:
> On Fri, Aug 24, 2018 at 02:45:00PM +0800, Jack MIN wrote:
> > On Thu, Aug 23, 2018 at 02:08:09PM -0700, Yongseok Koh wrote:
> > > On Thu, Aug 23, 2018 at 02:38:51PM +0800, Xiaoyu Min wrote:
> > > > rte_errno is a per thread variable and is widely used as an
> > > > error indicator, which means a function could affect
> > > > other functions' results by setting rte_errno carelessly
> > > > 
> > > > During rxq setup, an EINVAL rte_errno is expected since
> > > > the queues are not created yet
> > > > So rte_errno is cleared when it is EINVAL as expected
> > > > 
> > > > Signed-off-by: Xiaoyu Min <jackmin@mellanox.com>
> > > > ---
> > > >  drivers/net/mlx5/mlx5_rxq.c | 20 +++++++++++++++-----
> > > >  1 file changed, 15 insertions(+), 5 deletions(-)
> > > > 
> > > > diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
> > > > index 1f7bfd4..e7056e8 100644
> > > > --- a/drivers/net/mlx5/mlx5_rxq.c
> > > > +++ b/drivers/net/mlx5/mlx5_rxq.c
> > > > @@ -443,6 +443,7 @@
> > > >  	struct mlx5_rxq_data *rxq = (*priv->rxqs)[idx];
> > > >  	struct mlx5_rxq_ctrl *rxq_ctrl =
> > > >  		container_of(rxq, struct mlx5_rxq_ctrl, rxq);
> > > > +	int ret = 0;
> > > >  
> > > >  	if (!rte_is_power_of_2(desc)) {
> > > >  		desc = 1 << log2above(desc);
> > > > @@ -459,13 +460,21 @@
> > > >  		rte_errno = EOVERFLOW;
> > > >  		return -rte_errno;
> > > >  	}
> > > > -	if (!mlx5_rxq_releasable(dev, idx)) {
> > > > +	ret = mlx5_rxq_releasable(dev, idx);
> > > > +	if (!ret) {
> > > >  		DRV_LOG(ERR, "port %u unable to release queue index %u",
> > > >  			dev->data->port_id, idx);
> > > >  		rte_errno = EBUSY;
> > > >  		return -rte_errno;
> > > > +	} else if (ret == -EINVAL) {
> > > > +		/**
> > > > +		 * on the first time, rx queue doesn't exist,
> > > > +		 * so just ignore this error and reset rte_errno.
> > > > +		 */
> > > > +		rte_errno = 0;
> > > 
> > > Unless this function returns failure, the rte_errno will be ignored by caller
> > > and caller shouldn't assume rte_errno has 0. Caller will assume it is garbage
> > > data in case of success. So we can silently ignore this case. Does it cause any
> > > issue in application side?
> > > 
> > Not application side but mlx5 PMD this time:
> > **mlx5_fdir_filter_delete** 
> > which just _return -rte_errno;_
> 
> Looks like an error. mlx5_fdir_filter_delete() can't be like that. We seem to
> have lost the code while refactoring it. Let take it offline.
> 
Sure ~

> > For sure, _mlx5_fdir_filter_delete_ should be more defensive, should not assume
> > rte_errno is zero if no error happened.
> > However if the caller know that an error will happen and rte_errno will become
> > meaningless (garbage) for the succeeding functions, Catching the expected error
> > and resetting rte_errno will be better. What do you think?
> 
> Still don't understand clearly. There would be many other similar cases where we
> don't clear rte_errno when returning success. I don't understand why this case
> should be taken as a special one??
> 
No, the _mlx5_rxq_releasable()_ returns *error* (-EINVAL)(the rxq doesn't exist)
as my understanding
We only check if rxq is not releasable(==0) in previouse version but we didn't
check if function returns error or success
> Thanks
> Yongseok
  
Yongseok Koh Aug. 31, 2018, 9:48 p.m. UTC | #5
> On Aug 28, 2018, at 8:52 PM, Jack Min <jackmin@mellanox.com> wrote:
> 
> On Tue, Aug 28, 2018 at 01:42:18PM -0700, Yongseok Koh wrote:
>> On Fri, Aug 24, 2018 at 02:45:00PM +0800, Jack MIN wrote:
>>> On Thu, Aug 23, 2018 at 02:08:09PM -0700, Yongseok Koh wrote:
>>>> On Thu, Aug 23, 2018 at 02:38:51PM +0800, Xiaoyu Min wrote:
>>>>> rte_errno is a per thread variable and is widely used as an
>>>>> error indicator, which means a function could affect
>>>>> other functions' results by setting rte_errno carelessly
>>>>> 
>>>>> During rxq setup, an EINVAL rte_errno is expected since
>>>>> the queues are not created yet
>>>>> So rte_errno is cleared when it is EINVAL as expected
>>>>> 
>>>>> Signed-off-by: Xiaoyu Min <jackmin@mellanox.com>
>>>>> ---
>>>>> drivers/net/mlx5/mlx5_rxq.c | 20 +++++++++++++++-----
>>>>> 1 file changed, 15 insertions(+), 5 deletions(-)
>>>>> 
>>>>> diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
>>>>> index 1f7bfd4..e7056e8 100644
>>>>> --- a/drivers/net/mlx5/mlx5_rxq.c
>>>>> +++ b/drivers/net/mlx5/mlx5_rxq.c
>>>>> @@ -443,6 +443,7 @@
>>>>> 	struct mlx5_rxq_data *rxq = (*priv->rxqs)[idx];
>>>>> 	struct mlx5_rxq_ctrl *rxq_ctrl =
>>>>> 		container_of(rxq, struct mlx5_rxq_ctrl, rxq);
>>>>> +	int ret = 0;
>>>>> 
>>>>> 	if (!rte_is_power_of_2(desc)) {
>>>>> 		desc = 1 << log2above(desc);
>>>>> @@ -459,13 +460,21 @@
>>>>> 		rte_errno = EOVERFLOW;
>>>>> 		return -rte_errno;
>>>>> 	}
>>>>> -	if (!mlx5_rxq_releasable(dev, idx)) {
>>>>> +	ret = mlx5_rxq_releasable(dev, idx);
>>>>> +	if (!ret) {
>>>>> 		DRV_LOG(ERR, "port %u unable to release queue index %u",
>>>>> 			dev->data->port_id, idx);
>>>>> 		rte_errno = EBUSY;
>>>>> 		return -rte_errno;
>>>>> +	} else if (ret == -EINVAL) {
>>>>> +		/**
>>>>> +		 * on the first time, rx queue doesn't exist,
>>>>> +		 * so just ignore this error and reset rte_errno.
>>>>> +		 */
>>>>> +		rte_errno = 0;
>>>> 
>>>> Unless this function returns failure, the rte_errno will be ignored by caller
>>>> and caller shouldn't assume rte_errno has 0. Caller will assume it is garbage
>>>> data in case of success. So we can silently ignore this case. Does it cause any
>>>> issue in application side?
>>>> 
>>> Not application side but mlx5 PMD this time:
>>> **mlx5_fdir_filter_delete** 
>>> which just _return -rte_errno;_
>> 
>> Looks like an error. mlx5_fdir_filter_delete() can't be like that. We seem to
>> have lost the code while refactoring it. Let take it offline.
>> 
> Sure ~
> 
>>> For sure, _mlx5_fdir_filter_delete_ should be more defensive, should not assume
>>> rte_errno is zero if no error happened.
>>> However if the caller know that an error will happen and rte_errno will become
>>> meaningless (garbage) for the succeeding functions, Catching the expected error
>>> and resetting rte_errno will be better. What do you think?
>> 
>> Still don't understand clearly. There would be many other similar cases where we
>> don't clear rte_errno when returning success. I don't understand why this case
>> should be taken as a special one??
>> 
> No, the _mlx5_rxq_releasable()_ returns *error* (-EINVAL)(the rxq doesn't exist)
> as my understanding
> We only check if rxq is not releasable(==0) in previouse version but we didn't
> check if function returns error or success

I think that was intended. The case where rxq doesn't exist is silently ignored.
I know it is a little weird but still I don't see any need to clear rte_errno.


Thanks,
Yongseok
  

Patch

diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 1f7bfd4..e7056e8 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -443,6 +443,7 @@ 
 	struct mlx5_rxq_data *rxq = (*priv->rxqs)[idx];
 	struct mlx5_rxq_ctrl *rxq_ctrl =
 		container_of(rxq, struct mlx5_rxq_ctrl, rxq);
+	int ret = 0;
 
 	if (!rte_is_power_of_2(desc)) {
 		desc = 1 << log2above(desc);
@@ -459,13 +460,21 @@ 
 		rte_errno = EOVERFLOW;
 		return -rte_errno;
 	}
-	if (!mlx5_rxq_releasable(dev, idx)) {
+	ret = mlx5_rxq_releasable(dev, idx);
+	if (!ret) {
 		DRV_LOG(ERR, "port %u unable to release queue index %u",
 			dev->data->port_id, idx);
 		rte_errno = EBUSY;
 		return -rte_errno;
+	} else if (ret == -EINVAL) {
+		/**
+		 * on the first time, rx queue doesn't exist,
+		 * so just ignore this error and reset rte_errno.
+		 */
+		rte_errno = 0;
+	} else {
+		mlx5_rxq_release(dev, idx);
 	}
-	mlx5_rxq_release(dev, idx);
 	rxq_ctrl = mlx5_rxq_new(dev, idx, desc, socket, conf, mp);
 	if (!rxq_ctrl) {
 		DRV_LOG(ERR, "port %u unable to allocate queue index %u",
@@ -1543,11 +1552,12 @@  struct mlx5_rxq_ctrl *
  * @param dev
  *   Pointer to Ethernet device.
  * @param idx
- *   TX queue index.
+ *   RX queue index.
  *
  * @return
- *   1 if the queue can be released, negative errno otherwise and rte_errno is
- *   set.
+ *   1 if the queue can be released
+ *   0 if the queue can not be released
+ *   -EINVAL if the queue doesn't exist
  */
 int
 mlx5_rxq_releasable(struct rte_eth_dev *dev, uint16_t idx)