[dpdk-dev] net/mlx5: fix memory region cache init

Message ID 20180525063538.24363-1-xuemingl@mellanox.com (mailing list archive)
State Superseded, archived
Delegated to: Shahaf Shuler
Headers

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation fail apply issues

Commit Message

Xueming Li May 25, 2018, 6:35 a.m. UTC
  This patch moved MR cache init from device configuration function to
probe function to make sure init only once.

Fixes: 974f1e7ef146 ("net/mlx5: add new memory region support")
Cc: yskoh@mellanox.com

Signed-off-by: Xueming Li <xuemingl@mellanox.com>
---
 drivers/net/mlx5/mlx5.c        | 11 +++++++++++
 drivers/net/mlx5/mlx5_ethdev.c | 11 -----------
 drivers/net/mlx5/mlx5_mr.c     |  1 +
 3 files changed, 12 insertions(+), 11 deletions(-)
  

Comments

Xueming Li May 25, 2018, 6:38 a.m. UTC | #1
Hi Shahaf,

Please note that this patch depends on Adrien's patch:
	http://dpdk.org/dev/patchwork/patch/40404/

> -----Original Message-----

> From: Xueming Li <xuemingl@mellanox.com>

> Sent: Friday, May 25, 2018 2:36 PM

> To: Shahaf Shuler <shahafs@mellanox.com>; Yongseok Koh <yskoh@mellanox.com>

> Cc: Xueming(Steven) Li <xuemingl@mellanox.com>; dev@dpdk.org

> Subject: [PATCH] net/mlx5: fix memory region cache init

> 

> This patch moved MR cache init from device configuration function to probe function to make sure init

> only once.

> 

> Fixes: 974f1e7ef146 ("net/mlx5: add new memory region support")

> Cc: yskoh@mellanox.com

> 

> Signed-off-by: Xueming Li <xuemingl@mellanox.com>

> ---

>  drivers/net/mlx5/mlx5.c        | 11 +++++++++++

>  drivers/net/mlx5/mlx5_ethdev.c | 11 -----------

>  drivers/net/mlx5/mlx5_mr.c     |  1 +

>  3 files changed, 12 insertions(+), 11 deletions(-)

> 

> diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c index dae847493..77ed8e01f 100644

> --- a/drivers/net/mlx5/mlx5.c

> +++ b/drivers/net/mlx5/mlx5.c

> @@ -1193,6 +1193,17 @@ mlx5_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,

>  			goto port_error;

>  		}

>  		priv->config.max_verbs_prio = verb_priorities;

> +		/*

> +		 * Once the device is added to the list of memory event

> +		 * callback, its global MR cache table cannot be expanded

> +		 * on the fly because of deadlock. If it overflows, lookup

> +		 * should be done by searching MR list linearly, which is slow.

> +		 */

> +		err = -mlx5_mr_btree_init(&priv->mr.cache,

> +					  MLX5_MR_BTREE_CACHE_N * 2,

> +					  eth_dev->device->numa_node);

> +		if (err)

> +			goto port_error;

>  		/* Add device to memory callback list. */

>  		rte_rwlock_write_lock(&mlx5_shared_data->mem_event_rwlock);

>  		LIST_INSERT_HEAD(&mlx5_shared_data->mem_event_cb_list,

> diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c index

> f6cebae41..90488af33 100644

> --- a/drivers/net/mlx5/mlx5_ethdev.c

> +++ b/drivers/net/mlx5/mlx5_ethdev.c

> @@ -392,17 +392,6 @@ mlx5_dev_configure(struct rte_eth_dev *dev)

>  		if (++j == rxqs_n)

>  			j = 0;

>  	}

> -	/*

> -	 * Once the device is added to the list of memory event callback, its

> -	 * global MR cache table cannot be expanded on the fly because of

> -	 * deadlock. If it overflows, lookup should be done by searching MR list

> -	 * linearly, which is slow.

> -	 */

> -	if (mlx5_mr_btree_init(&priv->mr.cache, MLX5_MR_BTREE_CACHE_N * 2,

> -			       dev->device->numa_node)) {

> -		/* rte_errno is already set. */

> -		return -rte_errno;

> -	}

>  	return 0;

>  }

> 

> diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c index abb1f5179..08105a443 100644

> --- a/drivers/net/mlx5/mlx5_mr.c

> +++ b/drivers/net/mlx5/mlx5_mr.c

> @@ -191,6 +191,7 @@ mlx5_mr_btree_init(struct mlx5_mr_btree *bt, int n, int socket)

>  		rte_errno = EINVAL;

>  		return -rte_errno;

>  	}

> +	assert(!bt->table && !bt->size);

>  	memset(bt, 0, sizeof(*bt));

>  	bt->table = rte_calloc_socket("B-tree table",

>  				      n, sizeof(struct mlx5_mr_cache),

> --

> 2.13.3
  
Yongseok Koh May 25, 2018, 10:19 a.m. UTC | #2
> On May 24, 2018, at 11:35 PM, Xueming Li <xuemingl@mellanox.com> wrote:
> 
> This patch moved MR cache init from device configuration function to
> probe function to make sure init only once.
> 
> Fixes: 974f1e7ef146 ("net/mlx5: add new memory region support")
> Cc: yskoh@mellanox.com
> 
> Signed-off-by: Xueming Li <xuemingl@mellanox.com>
> ---
> drivers/net/mlx5/mlx5.c        | 11 +++++++++++
> drivers/net/mlx5/mlx5_ethdev.c | 11 -----------
> drivers/net/mlx5/mlx5_mr.c     |  1 +
> 3 files changed, 12 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
> index dae847493..77ed8e01f 100644
> --- a/drivers/net/mlx5/mlx5.c
> +++ b/drivers/net/mlx5/mlx5.c
> @@ -1193,6 +1193,17 @@ mlx5_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
> 			goto port_error;
> 		}
> 		priv->config.max_verbs_prio = verb_priorities;
> +		/*
> +		 * Once the device is added to the list of memory event
> +		 * callback, its global MR cache table cannot be expanded
> +		 * on the fly because of deadlock. If it overflows, lookup
> +		 * should be done by searching MR list linearly, which is slow.
> +		 */
> +		err = -mlx5_mr_btree_init(&priv->mr.cache,
> +					  MLX5_MR_BTREE_CACHE_N * 2,
> +					  eth_dev->device->numa_node);
> +		if (err)
> +			goto port_error;

A nit.
Like mlx5_flow_create_drop_queue(), please store rte_errno to err (err = rte_errno;)
instead of putting a minus sign to the function.

With that being fixed, you can put my acked-by tag when you submit v2.
 
Thanks,
Yongseok

> 		/* Add device to memory callback list. */
> 		rte_rwlock_write_lock(&mlx5_shared_data->mem_event_rwlock);
> 		LIST_INSERT_HEAD(&mlx5_shared_data->mem_event_cb_list,
> diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
> index f6cebae41..90488af33 100644
> --- a/drivers/net/mlx5/mlx5_ethdev.c
> +++ b/drivers/net/mlx5/mlx5_ethdev.c
> @@ -392,17 +392,6 @@ mlx5_dev_configure(struct rte_eth_dev *dev)
> 		if (++j == rxqs_n)
> 			j = 0;
> 	}
> -	/*
> -	 * Once the device is added to the list of memory event callback, its
> -	 * global MR cache table cannot be expanded on the fly because of
> -	 * deadlock. If it overflows, lookup should be done by searching MR list
> -	 * linearly, which is slow.
> -	 */
> -	if (mlx5_mr_btree_init(&priv->mr.cache, MLX5_MR_BTREE_CACHE_N * 2,
> -			       dev->device->numa_node)) {
> -		/* rte_errno is already set. */
> -		return -rte_errno;
> -	}
> 	return 0;
> }
> 
> diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
> index abb1f5179..08105a443 100644
> --- a/drivers/net/mlx5/mlx5_mr.c
> +++ b/drivers/net/mlx5/mlx5_mr.c
> @@ -191,6 +191,7 @@ mlx5_mr_btree_init(struct mlx5_mr_btree *bt, int n, int socket)
> 		rte_errno = EINVAL;
> 		return -rte_errno;
> 	}
> +	assert(!bt->table && !bt->size);
> 	memset(bt, 0, sizeof(*bt));
> 	bt->table = rte_calloc_socket("B-tree table",
> 				      n, sizeof(struct mlx5_mr_cache),
> -- 
> 2.13.3
>
  
Xueming Li May 25, 2018, 1:18 p.m. UTC | #3
> -----Original Message-----

> From: Yongseok Koh

> Sent: Friday, May 25, 2018 6:20 PM

> To: Xueming(Steven) Li <xuemingl@mellanox.com>

> Cc: Shahaf Shuler <shahafs@mellanox.com>; dev@dpdk.org

> Subject: Re: [PATCH] net/mlx5: fix memory region cache init

> 

> > On May 24, 2018, at 11:35 PM, Xueming Li <xuemingl@mellanox.com> wrote:

> >

> > This patch moved MR cache init from device configuration function to

> > probe function to make sure init only once.

> >

> > Fixes: 974f1e7ef146 ("net/mlx5: add new memory region support")

> > Cc: yskoh@mellanox.com

> >

> > Signed-off-by: Xueming Li <xuemingl@mellanox.com>

> > ---

> > drivers/net/mlx5/mlx5.c        | 11 +++++++++++

> > drivers/net/mlx5/mlx5_ethdev.c | 11 -----------

> > drivers/net/mlx5/mlx5_mr.c     |  1 +

> > 3 files changed, 12 insertions(+), 11 deletions(-)

> >

> > diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c index

> > dae847493..77ed8e01f 100644

> > --- a/drivers/net/mlx5/mlx5.c

> > +++ b/drivers/net/mlx5/mlx5.c

> > @@ -1193,6 +1193,17 @@ mlx5_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,

> > 			goto port_error;

> > 		}

> > 		priv->config.max_verbs_prio = verb_priorities;

> > +		/*

> > +		 * Once the device is added to the list of memory event

> > +		 * callback, its global MR cache table cannot be expanded

> > +		 * on the fly because of deadlock. If it overflows, lookup

> > +		 * should be done by searching MR list linearly, which is slow.

> > +		 */

> > +		err = -mlx5_mr_btree_init(&priv->mr.cache,

> > +					  MLX5_MR_BTREE_CACHE_N * 2,

> > +					  eth_dev->device->numa_node);

> > +		if (err)

> > +			goto port_error;

> 

> A nit.

> Like mlx5_flow_create_drop_queue(), please store rte_errno to err (err = rte_errno;) instead of

> putting a minus sign to the function.


Rte_errno is set inside mlx5_mr_btree_init() if any error, that's why I simply the code.

> 

> With that being fixed, you can put my acked-by tag when you submit v2.

> 

> Thanks,

> Yongseok

> 

> > 		/* Add device to memory callback list. */

> > 		rte_rwlock_write_lock(&mlx5_shared_data->mem_event_rwlock);

> > 		LIST_INSERT_HEAD(&mlx5_shared_data->mem_event_cb_list,

> > diff --git a/drivers/net/mlx5/mlx5_ethdev.c

> > b/drivers/net/mlx5/mlx5_ethdev.c index f6cebae41..90488af33 100644

> > --- a/drivers/net/mlx5/mlx5_ethdev.c

> > +++ b/drivers/net/mlx5/mlx5_ethdev.c

> > @@ -392,17 +392,6 @@ mlx5_dev_configure(struct rte_eth_dev *dev)

> > 		if (++j == rxqs_n)

> > 			j = 0;

> > 	}

> > -	/*

> > -	 * Once the device is added to the list of memory event callback, its

> > -	 * global MR cache table cannot be expanded on the fly because of

> > -	 * deadlock. If it overflows, lookup should be done by searching MR list

> > -	 * linearly, which is slow.

> > -	 */

> > -	if (mlx5_mr_btree_init(&priv->mr.cache, MLX5_MR_BTREE_CACHE_N * 2,

> > -			       dev->device->numa_node)) {

> > -		/* rte_errno is already set. */

> > -		return -rte_errno;

> > -	}

> > 	return 0;

> > }

> >

> > diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c

> > index abb1f5179..08105a443 100644

> > --- a/drivers/net/mlx5/mlx5_mr.c

> > +++ b/drivers/net/mlx5/mlx5_mr.c

> > @@ -191,6 +191,7 @@ mlx5_mr_btree_init(struct mlx5_mr_btree *bt, int n, int socket)

> > 		rte_errno = EINVAL;

> > 		return -rte_errno;

> > 	}

> > +	assert(!bt->table && !bt->size);

> > 	memset(bt, 0, sizeof(*bt));

> > 	bt->table = rte_calloc_socket("B-tree table",

> > 				      n, sizeof(struct mlx5_mr_cache),

> > --

> > 2.13.3

> >
  
Yongseok Koh May 25, 2018, 3:41 p.m. UTC | #4
> On May 25, 2018, at 6:18 AM, Xueming(Steven) Li <xuemingl@mellanox.com> wrote:

> 

> 

> 

>> -----Original Message-----

>> From: Yongseok Koh

>> Sent: Friday, May 25, 2018 6:20 PM

>> To: Xueming(Steven) Li <xuemingl@mellanox.com>

>> Cc: Shahaf Shuler <shahafs@mellanox.com>; dev@dpdk.org

>> Subject: Re: [PATCH] net/mlx5: fix memory region cache init

>> 

>>> On May 24, 2018, at 11:35 PM, Xueming Li <xuemingl@mellanox.com> wrote:

>>> 

>>> This patch moved MR cache init from device configuration function to

>>> probe function to make sure init only once.

>>> 

>>> Fixes: 974f1e7ef146 ("net/mlx5: add new memory region support")

>>> Cc: yskoh@mellanox.com

>>> 

>>> Signed-off-by: Xueming Li <xuemingl@mellanox.com>

>>> ---

>>> drivers/net/mlx5/mlx5.c        | 11 +++++++++++

>>> drivers/net/mlx5/mlx5_ethdev.c | 11 -----------

>>> drivers/net/mlx5/mlx5_mr.c     |  1 +

>>> 3 files changed, 12 insertions(+), 11 deletions(-)

>>> 

>>> diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c index

>>> dae847493..77ed8e01f 100644

>>> --- a/drivers/net/mlx5/mlx5.c

>>> +++ b/drivers/net/mlx5/mlx5.c

>>> @@ -1193,6 +1193,17 @@ mlx5_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,

>>>            goto port_error;

>>>        }

>>>        priv->config.max_verbs_prio = verb_priorities;

>>> +        /*

>>> +         * Once the device is added to the list of memory event

>>> +         * callback, its global MR cache table cannot be expanded

>>> +         * on the fly because of deadlock. If it overflows, lookup

>>> +         * should be done by searching MR list linearly, which is slow.

>>> +         */

>>> +        err = -mlx5_mr_btree_init(&priv->mr.cache,

>>> +                      MLX5_MR_BTREE_CACHE_N * 2,

>>> +                      eth_dev->device->numa_node);

>>> +        if (err)

>>> +            goto port_error;

>> 

>> A nit.

>> Like mlx5_flow_create_drop_queue(), please store rte_errno to err (err = rte_errno;) instead of

>> putting a minus sign to the function.

> 

> Rte_errno is set inside mlx5_mr_btree_init() if any error, that's why I simply the code.


I understood your intention and I know that’s not faulty either. 

However, here consistency is more important than simplicity. In the probe func, we have been fixing quite a few bugs due to code inconsistencies. I don’t want to put a minus sign but follow the way as we did for mlx5_flow_create_drop_queue() in the same function. Putting a minus sign unlike other code around it could be another buggy point if someone else makes changes there in the future. Please make it aligned because both approaches are same anyway. 

Thanks 
Yongseok 
>> 

>> With that being fixed, you can put my acked-by tag when you submit v2.

>> 

>> Thanks,

>> Yongseok

>> 

>>>        /* Add device to memory callback list. */

>>>        rte_rwlock_write_lock(&mlx5_shared_data->mem_event_rwlock);

>>>        LIST_INSERT_HEAD(&mlx5_shared_data->mem_event_cb_list,

>>> diff --git a/drivers/net/mlx5/mlx5_ethdev.c

>>> b/drivers/net/mlx5/mlx5_ethdev.c index f6cebae41..90488af33 100644

>>> --- a/drivers/net/mlx5/mlx5_ethdev.c

>>> +++ b/drivers/net/mlx5/mlx5_ethdev.c

>>> @@ -392,17 +392,6 @@ mlx5_dev_configure(struct rte_eth_dev *dev)

>>>        if (++j == rxqs_n)

>>>            j = 0;

>>>    }

>>> -    /*

>>> -     * Once the device is added to the list of memory event callback, its

>>> -     * global MR cache table cannot be expanded on the fly because of

>>> -     * deadlock. If it overflows, lookup should be done by searching MR list

>>> -     * linearly, which is slow.

>>> -     */

>>> -    if (mlx5_mr_btree_init(&priv->mr.cache, MLX5_MR_BTREE_CACHE_N * 2,

>>> -                   dev->device->numa_node)) {

>>> -        /* rte_errno is already set. */

>>> -        return -rte_errno;

>>> -    }

>>>    return 0;

>>> }

>>> 

>>> diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c

>>> index abb1f5179..08105a443 100644

>>> --- a/drivers/net/mlx5/mlx5_mr.c

>>> +++ b/drivers/net/mlx5/mlx5_mr.c

>>> @@ -191,6 +191,7 @@ mlx5_mr_btree_init(struct mlx5_mr_btree *bt, int n, int socket)

>>>        rte_errno = EINVAL;

>>>        return -rte_errno;

>>>    }

>>> +    assert(!bt->table && !bt->size);

>>>    memset(bt, 0, sizeof(*bt));

>>>    bt->table = rte_calloc_socket("B-tree table",

>>>                      n, sizeof(struct mlx5_mr_cache),

>>> --

>>> 2.13.3

>>> 

>
  
Xueming Li May 26, 2018, 2:31 a.m. UTC | #5
> -----Original Message-----

> From: Yongseok Koh

> Sent: Friday, May 25, 2018 11:41 PM

> To: Xueming(Steven) Li <xuemingl@mellanox.com>

> Cc: Shahaf Shuler <shahafs@mellanox.com>; dev@dpdk.org; Nélio Laranjeiro <nelio.laranjeiro@6wind.com>;

> Adrien Mazarguil <adrien.mazarguil@6wind.com>

> Subject: Re: [PATCH] net/mlx5: fix memory region cache init

> 

> 

> > On May 25, 2018, at 6:18 AM, Xueming(Steven) Li <xuemingl@mellanox.com> wrote:

> >

> >

> >

> >> -----Original Message-----

> >> From: Yongseok Koh

> >> Sent: Friday, May 25, 2018 6:20 PM

> >> To: Xueming(Steven) Li <xuemingl@mellanox.com>

> >> Cc: Shahaf Shuler <shahafs@mellanox.com>; dev@dpdk.org

> >> Subject: Re: [PATCH] net/mlx5: fix memory region cache init

> >>

> >>> On May 24, 2018, at 11:35 PM, Xueming Li <xuemingl@mellanox.com> wrote:

> >>>

> >>> This patch moved MR cache init from device configuration function to

> >>> probe function to make sure init only once.

> >>>

> >>> Fixes: 974f1e7ef146 ("net/mlx5: add new memory region support")

> >>> Cc: yskoh@mellanox.com

> >>>

> >>> Signed-off-by: Xueming Li <xuemingl@mellanox.com>

> >>> ---

> >>> drivers/net/mlx5/mlx5.c        | 11 +++++++++++

> >>> drivers/net/mlx5/mlx5_ethdev.c | 11 -----------

> >>> drivers/net/mlx5/mlx5_mr.c     |  1 +

> >>> 3 files changed, 12 insertions(+), 11 deletions(-)

> >>>

> >>> diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c index

> >>> dae847493..77ed8e01f 100644

> >>> --- a/drivers/net/mlx5/mlx5.c

> >>> +++ b/drivers/net/mlx5/mlx5.c

> >>> @@ -1193,6 +1193,17 @@ mlx5_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,

> >>>            goto port_error;

> >>>        }

> >>>        priv->config.max_verbs_prio = verb_priorities;

> >>> +        /*

> >>> +         * Once the device is added to the list of memory event

> >>> +         * callback, its global MR cache table cannot be expanded

> >>> +         * on the fly because of deadlock. If it overflows, lookup

> >>> +         * should be done by searching MR list linearly, which is slow.

> >>> +         */

> >>> +        err = -mlx5_mr_btree_init(&priv->mr.cache,

> >>> +                      MLX5_MR_BTREE_CACHE_N * 2,

> >>> +                      eth_dev->device->numa_node);

> >>> +        if (err)

> >>> +            goto port_error;

> >>

> >> A nit.

> >> Like mlx5_flow_create_drop_queue(), please store rte_errno to err

> >> (err = rte_errno;) instead of putting a minus sign to the function.

> >

> > Rte_errno is set inside mlx5_mr_btree_init() if any error, that's why I simply the code.

> 

> I understood your intention and I know that’s not faulty either.

> 

> However, here consistency is more important than simplicity. In the probe func, we have been fixing

> quite a few bugs due to code inconsistencies. I don’t want to put a minus sign but follow the way as

> we did for mlx5_flow_create_drop_queue() in the same function. Putting a minus sign unlike other code

> around it could be another buggy point if someone else makes changes there in the future. Please make

> it aligned because both approaches are same anyway.


Such design is confusing, error code just used to test error happens or not, in functions we spend many 
code to save errno to both rte_errno and return value, and then also caller side. If people intend to 
have error info saved in rte_errno, then maybe changing functions returning void could be better - this 
something also anti-design, like errno, rte_errno is used to indicate detail root cause when error happens,
for people who want to know detail, that can’t necessarily mean that we should pass it precisely in each
caller. Just personal concern.

> 

> Thanks

> Yongseok

> >>

> >> With that being fixed, you can put my acked-by tag when you submit v2.

> >>

> >> Thanks,

> >> Yongseok

> >>

> >>>        /* Add device to memory callback list. */

> >>>        rte_rwlock_write_lock(&mlx5_shared_data->mem_event_rwlock);

> >>>        LIST_INSERT_HEAD(&mlx5_shared_data->mem_event_cb_list,

> >>> diff --git a/drivers/net/mlx5/mlx5_ethdev.c

> >>> b/drivers/net/mlx5/mlx5_ethdev.c index f6cebae41..90488af33 100644

> >>> --- a/drivers/net/mlx5/mlx5_ethdev.c

> >>> +++ b/drivers/net/mlx5/mlx5_ethdev.c

> >>> @@ -392,17 +392,6 @@ mlx5_dev_configure(struct rte_eth_dev *dev)

> >>>        if (++j == rxqs_n)

> >>>            j = 0;

> >>>    }

> >>> -    /*

> >>> -     * Once the device is added to the list of memory event callback, its

> >>> -     * global MR cache table cannot be expanded on the fly because of

> >>> -     * deadlock. If it overflows, lookup should be done by searching MR list

> >>> -     * linearly, which is slow.

> >>> -     */

> >>> -    if (mlx5_mr_btree_init(&priv->mr.cache, MLX5_MR_BTREE_CACHE_N * 2,

> >>> -                   dev->device->numa_node)) {

> >>> -        /* rte_errno is already set. */

> >>> -        return -rte_errno;

> >>> -    }

> >>>    return 0;

> >>> }

> >>>

> >>> diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c

> >>> index abb1f5179..08105a443 100644

> >>> --- a/drivers/net/mlx5/mlx5_mr.c

> >>> +++ b/drivers/net/mlx5/mlx5_mr.c

> >>> @@ -191,6 +191,7 @@ mlx5_mr_btree_init(struct mlx5_mr_btree *bt, int n, int socket)

> >>>        rte_errno = EINVAL;

> >>>        return -rte_errno;

> >>>    }

> >>> +    assert(!bt->table && !bt->size);

> >>>    memset(bt, 0, sizeof(*bt));

> >>>    bt->table = rte_calloc_socket("B-tree table",

> >>>                      n, sizeof(struct mlx5_mr_cache),

> >>> --

> >>> 2.13.3

> >>>

> >
  
Yongseok Koh May 26, 2018, 5:01 a.m. UTC | #6
> On May 25, 2018, at 7:32 PM, Xueming(Steven) Li <xuemingl@mellanox.com> wrote:

> 

> 

> 

>> -----Original Message-----

>> From: Yongseok Koh

>> Sent: Friday, May 25, 2018 11:41 PM

>> To: Xueming(Steven) Li <xuemingl@mellanox.com>

>> Cc: Shahaf Shuler <shahafs@mellanox.com>; dev@dpdk.org; Nélio Laranjeiro <nelio.laranjeiro@6wind.com>;

>> Adrien Mazarguil <adrien.mazarguil@6wind.com>

>> Subject: Re: [PATCH] net/mlx5: fix memory region cache init

>> 

>> 

>>> On May 25, 2018, at 6:18 AM, Xueming(Steven) Li <xuemingl@mellanox.com> wrote:

>>> 

>>> 

>>> 

>>>> -----Original Message-----

>>>> From: Yongseok Koh

>>>> Sent: Friday, May 25, 2018 6:20 PM

>>>> To: Xueming(Steven) Li <xuemingl@mellanox.com>

>>>> Cc: Shahaf Shuler <shahafs@mellanox.com>; dev@dpdk.org

>>>> Subject: Re: [PATCH] net/mlx5: fix memory region cache init

>>>> 

>>>>> On May 24, 2018, at 11:35 PM, Xueming Li <xuemingl@mellanox.com> wrote:

>>>>> 

>>>>> This patch moved MR cache init from device configuration function to

>>>>> probe function to make sure init only once.

>>>>> 

>>>>> Fixes: 974f1e7ef146 ("net/mlx5: add new memory region support")

>>>>> Cc: yskoh@mellanox.com

>>>>> 

>>>>> Signed-off-by: Xueming Li <xuemingl@mellanox.com>

>>>>> ---

>>>>> drivers/net/mlx5/mlx5.c        | 11 +++++++++++

>>>>> drivers/net/mlx5/mlx5_ethdev.c | 11 -----------

>>>>> drivers/net/mlx5/mlx5_mr.c     |  1 +

>>>>> 3 files changed, 12 insertions(+), 11 deletions(-)

>>>>> 

>>>>> diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c index

>>>>> dae847493..77ed8e01f 100644

>>>>> --- a/drivers/net/mlx5/mlx5.c

>>>>> +++ b/drivers/net/mlx5/mlx5.c

>>>>> @@ -1193,6 +1193,17 @@ mlx5_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,

>>>>>           goto port_error;

>>>>>       }

>>>>>       priv->config.max_verbs_prio = verb_priorities;

>>>>> +        /*

>>>>> +         * Once the device is added to the list of memory event

>>>>> +         * callback, its global MR cache table cannot be expanded

>>>>> +         * on the fly because of deadlock. If it overflows, lookup

>>>>> +         * should be done by searching MR list linearly, which is slow.

>>>>> +         */

>>>>> +        err = -mlx5_mr_btree_init(&priv->mr.cache,

>>>>> +                      MLX5_MR_BTREE_CACHE_N * 2,

>>>>> +                      eth_dev->device->numa_node);

>>>>> +        if (err)

>>>>> +            goto port_error;

>>>> 

>>>> A nit.

>>>> Like mlx5_flow_create_drop_queue(), please store rte_errno to err

>>>> (err = rte_errno;) instead of putting a minus sign to the function.

>>> 

>>> Rte_errno is set inside mlx5_mr_btree_init() if any error, that's why I simply the code.

>> 

>> I understood your intention and I know that’s not faulty either.

>> 

>> However, here consistency is more important than simplicity. In the probe func, we have been fixing

>> quite a few bugs due to code inconsistencies. I don’t want to put a minus sign but follow the way as

>> we did for mlx5_flow_create_drop_queue() in the same function. Putting a minus sign unlike other code

>> around it could be another buggy point if someone else makes changes there in the future. Please make

>> it aligned because both approaches are same anyway.

> 

> Such design is confusing, error code just used to test error happens or not, in functions we spend many 

> code to save errno to both rte_errno and return value, and then also caller side. If people intend to 

> have error info saved in rte_errno, then maybe changing functions returning void could be better - this 

> something also anti-design, like errno, rte_errno is used to indicate detail root cause when error happens,

> for people who want to know detail, that can’t necessarily mean that we should pass it precisely in each

> caller. Just personal concern.


Of course, you can submit the idea to make such an enhancement. Please come up with a separate patch for 18.08. 

For now, let’s focus on fixing the bug for the current release. Schedule is really tight for 18.05. Please send out a new version. We don’t have much time.

Thanks
Yongseok 


>>>> 

>>>> With that being fixed, you can put my acked-by tag when you submit v2.

>>>> 

>>>> Thanks,

>>>> Yongseok

>>>> 

>>>>>       /* Add device to memory callback list. */

>>>>>       rte_rwlock_write_lock(&mlx5_shared_data->mem_event_rwlock);

>>>>>       LIST_INSERT_HEAD(&mlx5_shared_data->mem_event_cb_list,

>>>>> diff --git a/drivers/net/mlx5/mlx5_ethdev.c

>>>>> b/drivers/net/mlx5/mlx5_ethdev.c index f6cebae41..90488af33 100644

>>>>> --- a/drivers/net/mlx5/mlx5_ethdev.c

>>>>> +++ b/drivers/net/mlx5/mlx5_ethdev.c

>>>>> @@ -392,17 +392,6 @@ mlx5_dev_configure(struct rte_eth_dev *dev)

>>>>>       if (++j == rxqs_n)

>>>>>           j = 0;

>>>>>   }

>>>>> -    /*

>>>>> -     * Once the device is added to the list of memory event callback, its

>>>>> -     * global MR cache table cannot be expanded on the fly because of

>>>>> -     * deadlock. If it overflows, lookup should be done by searching MR list

>>>>> -     * linearly, which is slow.

>>>>> -     */

>>>>> -    if (mlx5_mr_btree_init(&priv->mr.cache, MLX5_MR_BTREE_CACHE_N * 2,

>>>>> -                   dev->device->numa_node)) {

>>>>> -        /* rte_errno is already set. */

>>>>> -        return -rte_errno;

>>>>> -    }

>>>>>   return 0;

>>>>> }

>>>>> 

>>>>> diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c

>>>>> index abb1f5179..08105a443 100644

>>>>> --- a/drivers/net/mlx5/mlx5_mr.c

>>>>> +++ b/drivers/net/mlx5/mlx5_mr.c

>>>>> @@ -191,6 +191,7 @@ mlx5_mr_btree_init(struct mlx5_mr_btree *bt, int n, int socket)

>>>>>       rte_errno = EINVAL;

>>>>>       return -rte_errno;

>>>>>   }

>>>>> +    assert(!bt->table && !bt->size);

>>>>>   memset(bt, 0, sizeof(*bt));

>>>>>   bt->table = rte_calloc_socket("B-tree table",

>>>>>                     n, sizeof(struct mlx5_mr_cache),

>>>>> --

>>>>> 2.13.3

>>>>> 

>>>
  

Patch

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index dae847493..77ed8e01f 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1193,6 +1193,17 @@  mlx5_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 			goto port_error;
 		}
 		priv->config.max_verbs_prio = verb_priorities;
+		/*
+		 * Once the device is added to the list of memory event
+		 * callback, its global MR cache table cannot be expanded
+		 * on the fly because of deadlock. If it overflows, lookup
+		 * should be done by searching MR list linearly, which is slow.
+		 */
+		err = -mlx5_mr_btree_init(&priv->mr.cache,
+					  MLX5_MR_BTREE_CACHE_N * 2,
+					  eth_dev->device->numa_node);
+		if (err)
+			goto port_error;
 		/* Add device to memory callback list. */
 		rte_rwlock_write_lock(&mlx5_shared_data->mem_event_rwlock);
 		LIST_INSERT_HEAD(&mlx5_shared_data->mem_event_cb_list,
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index f6cebae41..90488af33 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -392,17 +392,6 @@  mlx5_dev_configure(struct rte_eth_dev *dev)
 		if (++j == rxqs_n)
 			j = 0;
 	}
-	/*
-	 * Once the device is added to the list of memory event callback, its
-	 * global MR cache table cannot be expanded on the fly because of
-	 * deadlock. If it overflows, lookup should be done by searching MR list
-	 * linearly, which is slow.
-	 */
-	if (mlx5_mr_btree_init(&priv->mr.cache, MLX5_MR_BTREE_CACHE_N * 2,
-			       dev->device->numa_node)) {
-		/* rte_errno is already set. */
-		return -rte_errno;
-	}
 	return 0;
 }
 
diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
index abb1f5179..08105a443 100644
--- a/drivers/net/mlx5/mlx5_mr.c
+++ b/drivers/net/mlx5/mlx5_mr.c
@@ -191,6 +191,7 @@  mlx5_mr_btree_init(struct mlx5_mr_btree *bt, int n, int socket)
 		rte_errno = EINVAL;
 		return -rte_errno;
 	}
+	assert(!bt->table && !bt->size);
 	memset(bt, 0, sizeof(*bt));
 	bt->table = rte_calloc_socket("B-tree table",
 				      n, sizeof(struct mlx5_mr_cache),