[2/3] net/nfp: fix free resource problem

Message ID 20231214102431.2091608-3-chaoyong.he@corigine.com (mailing list archive)
State Accepted, archived
Delegated to: Ferruh Yigit
Headers
Series fix some problems of flower firmware |

Checks

Context Check Description
ci/checkpatch success coding style OK

Commit Message

Chaoyong He Dec. 14, 2023, 10:24 a.m. UTC
  From: Long Wu <long.wu@corigine.com>

Set the representor array to NULL to avoid that close interface
does not free some resource.

Fixes: a135bc1644d6 ("net/nfp: fix resource leak for flower firmware")
Cc: chaoyong.he@corigine.com
Cc: stable@dpdk.org

Signed-off-by: Long Wu <long.wu@corigine.com>
Reviewed-by: Chaoyong He <chaoyong.he@corigine.com>
Reviewed-by: Peng Zhang <peng.zhang@corigine.com>
---
 drivers/net/nfp/flower/nfp_flower_representor.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)
  

Comments

Ferruh Yigit Dec. 15, 2023, 12:54 p.m. UTC | #1
On 12/14/2023 10:24 AM, Chaoyong He wrote:
> From: Long Wu <long.wu@corigine.com>
> 
> Set the representor array to NULL to avoid that close interface
> does not free some resource.
> 
> Fixes: a135bc1644d6 ("net/nfp: fix resource leak for flower firmware")
> Cc: chaoyong.he@corigine.com
> Cc: stable@dpdk.org
> 
> Signed-off-by: Long Wu <long.wu@corigine.com>
> Reviewed-by: Chaoyong He <chaoyong.he@corigine.com>
> Reviewed-by: Peng Zhang <peng.zhang@corigine.com>
> ---
>  drivers/net/nfp/flower/nfp_flower_representor.c | 15 ++++++++++++++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/nfp/flower/nfp_flower_representor.c b/drivers/net/nfp/flower/nfp_flower_representor.c
> index 27ea3891bd..5f7c1fa737 100644
> --- a/drivers/net/nfp/flower/nfp_flower_representor.c
> +++ b/drivers/net/nfp/flower/nfp_flower_representor.c
> @@ -294,17 +294,30 @@ nfp_flower_repr_tx_burst(void *tx_queue,
>  static int
>  nfp_flower_repr_uninit(struct rte_eth_dev *eth_dev)
>  {
> +	uint16_t index;
>  	struct nfp_flower_representor *repr;
>  
>  	repr = eth_dev->data->dev_private;
>  	rte_ring_free(repr->ring);
>  
> +	if (repr->repr_type == NFP_REPR_TYPE_PHYS_PORT) {
> +		index = NFP_FLOWER_CMSG_PORT_PHYS_PORT_NUM(repr->port_id);
> +		repr->app_fw_flower->phy_reprs[index] = NULL;
> +	} else {
> +		index = repr->vf_id;
> +		repr->app_fw_flower->vf_reprs[index] = NULL;
> +	}
> +
>  	return 0;
>  }
>  
>  static int
> -nfp_flower_pf_repr_uninit(__rte_unused struct rte_eth_dev *eth_dev)
> +nfp_flower_pf_repr_uninit(struct rte_eth_dev *eth_dev)
>  {
> +	struct nfp_flower_representor *repr = eth_dev->data->dev_private;
> +
> +	repr->app_fw_flower->pf_repr = NULL;
> 

Here it is assigned to NULL but is it freed? If freed, why not set to
NULL where it is freed?

Same for above phy_reprs & vf_reprs.
  
Chaoyong He Dec. 18, 2023, 1:50 a.m. UTC | #2
> On 12/14/2023 10:24 AM, Chaoyong He wrote:
> > From: Long Wu <long.wu@corigine.com>
> >
> > Set the representor array to NULL to avoid that close interface does
> > not free some resource.
> >
> > Fixes: a135bc1644d6 ("net/nfp: fix resource leak for flower firmware")
> > Cc: chaoyong.he@corigine.com
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Long Wu <long.wu@corigine.com>
> > Reviewed-by: Chaoyong He <chaoyong.he@corigine.com>
> > Reviewed-by: Peng Zhang <peng.zhang@corigine.com>
> > ---
> >  drivers/net/nfp/flower/nfp_flower_representor.c | 15 ++++++++++++++-
> >  1 file changed, 14 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/nfp/flower/nfp_flower_representor.c
> > b/drivers/net/nfp/flower/nfp_flower_representor.c
> > index 27ea3891bd..5f7c1fa737 100644
> > --- a/drivers/net/nfp/flower/nfp_flower_representor.c
> > +++ b/drivers/net/nfp/flower/nfp_flower_representor.c
> > @@ -294,17 +294,30 @@ nfp_flower_repr_tx_burst(void *tx_queue,  static
> > int  nfp_flower_repr_uninit(struct rte_eth_dev *eth_dev)  {
> > +	uint16_t index;
> >  	struct nfp_flower_representor *repr;
> >
> >  	repr = eth_dev->data->dev_private;
> >  	rte_ring_free(repr->ring);
> >
> > +	if (repr->repr_type == NFP_REPR_TYPE_PHYS_PORT) {
> > +		index = NFP_FLOWER_CMSG_PORT_PHYS_PORT_NUM(repr-
> >port_id);
> > +		repr->app_fw_flower->phy_reprs[index] = NULL;
> > +	} else {
> > +		index = repr->vf_id;
> > +		repr->app_fw_flower->vf_reprs[index] = NULL;
> > +	}
> > +
> >  	return 0;
> >  }
> >
> >  static int
> > -nfp_flower_pf_repr_uninit(__rte_unused struct rte_eth_dev *eth_dev)
> > +nfp_flower_pf_repr_uninit(struct rte_eth_dev *eth_dev)
> >  {
> > +	struct nfp_flower_representor *repr = eth_dev->data->dev_private;
> > +
> > +	repr->app_fw_flower->pf_repr = NULL;
> >
> 
> Here it is assigned to NULL but is it freed? If freed, why not set to NULL where
> it is freed?
> 
> Same for above phy_reprs & vf_reprs.

The whole invoke view:
rte_eth_dev_close()
    --> nfp_flower_repr_dev_close()
        --> nfp_flower_repr_free()
            --> nfp_flower_pf_repr_uninit()
            --> nfp_flower_repr_uninit()
           // In these two functions, we just assigned to NULL but not freed yet.
           // It is still refer by the `eth_dev->data->dev_private`.
    --> rte_eth_dev_release_port()
        --> rte_free(eth_dev->data->dev_private);
        // And here it is really freed (by the rte framework).
  
Ferruh Yigit Jan. 8, 2024, 3:42 p.m. UTC | #3
On 12/18/2023 1:50 AM, Chaoyong He wrote:
>> On 12/14/2023 10:24 AM, Chaoyong He wrote:
>>> From: Long Wu <long.wu@corigine.com>
>>>
>>> Set the representor array to NULL to avoid that close interface does
>>> not free some resource.
>>>
>>> Fixes: a135bc1644d6 ("net/nfp: fix resource leak for flower firmware")
>>> Cc: chaoyong.he@corigine.com
>>> Cc: stable@dpdk.org
>>>
>>> Signed-off-by: Long Wu <long.wu@corigine.com>
>>> Reviewed-by: Chaoyong He <chaoyong.he@corigine.com>
>>> Reviewed-by: Peng Zhang <peng.zhang@corigine.com>
>>> ---
>>>  drivers/net/nfp/flower/nfp_flower_representor.c | 15 ++++++++++++++-
>>>  1 file changed, 14 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/net/nfp/flower/nfp_flower_representor.c
>>> b/drivers/net/nfp/flower/nfp_flower_representor.c
>>> index 27ea3891bd..5f7c1fa737 100644
>>> --- a/drivers/net/nfp/flower/nfp_flower_representor.c
>>> +++ b/drivers/net/nfp/flower/nfp_flower_representor.c
>>> @@ -294,17 +294,30 @@ nfp_flower_repr_tx_burst(void *tx_queue,  static
>>> int  nfp_flower_repr_uninit(struct rte_eth_dev *eth_dev)  {
>>> +	uint16_t index;
>>>  	struct nfp_flower_representor *repr;
>>>
>>>  	repr = eth_dev->data->dev_private;
>>>  	rte_ring_free(repr->ring);
>>>
>>> +	if (repr->repr_type == NFP_REPR_TYPE_PHYS_PORT) {
>>> +		index = NFP_FLOWER_CMSG_PORT_PHYS_PORT_NUM(repr-
>>> port_id);
>>> +		repr->app_fw_flower->phy_reprs[index] = NULL;
>>> +	} else {
>>> +		index = repr->vf_id;
>>> +		repr->app_fw_flower->vf_reprs[index] = NULL;
>>> +	}
>>> +
>>>  	return 0;
>>>  }
>>>
>>>  static int
>>> -nfp_flower_pf_repr_uninit(__rte_unused struct rte_eth_dev *eth_dev)
>>> +nfp_flower_pf_repr_uninit(struct rte_eth_dev *eth_dev)
>>>  {
>>> +	struct nfp_flower_representor *repr = eth_dev->data->dev_private;
>>> +
>>> +	repr->app_fw_flower->pf_repr = NULL;
>>>
>>
>> Here it is assigned to NULL but is it freed? If freed, why not set to NULL where
>> it is freed?
>>
>> Same for above phy_reprs & vf_reprs.
> 
> The whole invoke view:
> rte_eth_dev_close()
>     --> nfp_flower_repr_dev_close()
>         --> nfp_flower_repr_free()
>             --> nfp_flower_pf_repr_uninit()
>             --> nfp_flower_repr_uninit()
>            // In these two functions, we just assigned to NULL but not freed yet.
>            // It is still refer by the `eth_dev->data->dev_private`.
>     --> rte_eth_dev_release_port()
>         --> rte_free(eth_dev->data->dev_private);
>         // And here it is really freed (by the rte framework).
> 

'rte_eth_dev_release_port()' frees the device private data, but not all
pointers, like 'repr->app_fw_flower->pf_repr', in the struct are freed,
it is dev_close() or unint() functions responsibility.

Can you please double check if
'eth_dev->data->dev_private->app_fw_flower->pf_repr' freed or not?
  
Chaoyong He Jan. 9, 2024, 7:56 a.m. UTC | #4
> On 12/18/2023 1:50 AM, Chaoyong He wrote:
> >> On 12/14/2023 10:24 AM, Chaoyong He wrote:
> >>> From: Long Wu <long.wu@corigine.com>
> >>>
> >>> Set the representor array to NULL to avoid that close interface does
> >>> not free some resource.
> >>>
> >>> Fixes: a135bc1644d6 ("net/nfp: fix resource leak for flower
> >>> firmware")
> >>> Cc: chaoyong.he@corigine.com
> >>> Cc: stable@dpdk.org
> >>>
> >>> Signed-off-by: Long Wu <long.wu@corigine.com>
> >>> Reviewed-by: Chaoyong He <chaoyong.he@corigine.com>
> >>> Reviewed-by: Peng Zhang <peng.zhang@corigine.com>
> >>> ---
> >>>  drivers/net/nfp/flower/nfp_flower_representor.c | 15
> >>> ++++++++++++++-
> >>>  1 file changed, 14 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/drivers/net/nfp/flower/nfp_flower_representor.c
> >>> b/drivers/net/nfp/flower/nfp_flower_representor.c
> >>> index 27ea3891bd..5f7c1fa737 100644
> >>> --- a/drivers/net/nfp/flower/nfp_flower_representor.c
> >>> +++ b/drivers/net/nfp/flower/nfp_flower_representor.c
> >>> @@ -294,17 +294,30 @@ nfp_flower_repr_tx_burst(void *tx_queue,
> >>> static int  nfp_flower_repr_uninit(struct rte_eth_dev *eth_dev)  {
> >>> +	uint16_t index;
> >>>  	struct nfp_flower_representor *repr;
> >>>
> >>>  	repr = eth_dev->data->dev_private;
> >>>  	rte_ring_free(repr->ring);
> >>>
> >>> +	if (repr->repr_type == NFP_REPR_TYPE_PHYS_PORT) {
> >>> +		index = NFP_FLOWER_CMSG_PORT_PHYS_PORT_NUM(repr-
> >>> port_id);
> >>> +		repr->app_fw_flower->phy_reprs[index] = NULL;
> >>> +	} else {
> >>> +		index = repr->vf_id;
> >>> +		repr->app_fw_flower->vf_reprs[index] = NULL;
> >>> +	}
> >>> +
> >>>  	return 0;
> >>>  }
> >>>
> >>>  static int
> >>> -nfp_flower_pf_repr_uninit(__rte_unused struct rte_eth_dev *eth_dev)
> >>> +nfp_flower_pf_repr_uninit(struct rte_eth_dev *eth_dev)
> >>>  {
> >>> +	struct nfp_flower_representor *repr = eth_dev->data->dev_private;
> >>> +
> >>> +	repr->app_fw_flower->pf_repr = NULL;
> >>>
> >>
> >> Here it is assigned to NULL but is it freed? If freed, why not set to
> >> NULL where it is freed?
> >>
> >> Same for above phy_reprs & vf_reprs.
> >
> > The whole invoke view:
> > rte_eth_dev_close()
> >     --> nfp_flower_repr_dev_close()
> >         --> nfp_flower_repr_free()
> >             --> nfp_flower_pf_repr_uninit()
> >             --> nfp_flower_repr_uninit()
> >            // In these two functions, we just assigned to NULL but not freed yet.
> >            // It is still refer by the `eth_dev->data->dev_private`.
> >     --> rte_eth_dev_release_port()
> >         --> rte_free(eth_dev->data->dev_private);
> >         // And here it is really freed (by the rte framework).
> >
> 
> 'rte_eth_dev_release_port()' frees the device private data, but not all pointers,
> like 'repr->app_fw_flower->pf_repr', in the struct are freed, it is dev_close() or
> unint() functions responsibility.
> 
> Can you please double check if
> 'eth_dev->data->dev_private->app_fw_flower->pf_repr' freed or not?

(gdb) b nfp_flower_repr_dev_close
Breakpoint 1 at 0x7f839a4ad37f: file ../drivers/net/nfp/flower/nfp_flower_representor.c, line 356.
(gdb) c
Continuing.

Thread 1 "dpdk-testpmd" hit Breakpoint 1, nfp_flower_repr_dev_close (dev=0x7f839aed2340 <rte_eth_devices>)
    at ../drivers/net/nfp/flower/nfp_flower_representor.c:356
356             if (rte_eal_process_type() != RTE_PROC_PRIMARY)
(gdb) n
359             repr = dev->data->dev_private;
(gdb)
360             app_fw_flower = repr->app_fw_flower;
(gdb)
361             hw = app_fw_flower->pf_hw;
(gdb)
362             pf_dev = hw->pf_dev;
(gdb)
368             nfp_net_disable_queues(dev);
(gdb) p repr
$1 = (struct nfp_flower_representor *) 0x17c49c800
(gdb) p dev->data->dev_private
$2 = (void *) 0x17c49c800
(gdb) p repr->app_fw_flower->pf_repr
$3 = (struct nfp_flower_representor *) 0x17c49c800

As we can see, these three pointers point the same block of memory.

(gdb) until 384
nfp_flower_repr_dev_close (dev=0x7f839aed2340 <rte_eth_devices>)
    at ../drivers/net/nfp/flower/nfp_flower_representor.c:384
384             nfp_flower_repr_free(repr, repr->repr_type);
(gdb) s
nfp_flower_repr_free (repr=0x17c49c800, repr_type=NFP_REPR_TYPE_PF)
    at ../drivers/net/nfp/flower/nfp_flower_representor.c:328
328             switch (repr_type) {
(gdb) n
333                     nfp_flower_pf_repr_uninit(repr->eth_dev);
(gdb) s
nfp_flower_pf_repr_uninit (eth_dev=0x7f839aed2340 <rte_eth_devices>)
    at ../drivers/net/nfp/flower/nfp_flower_representor.c:317
317             struct nfp_flower_representor *repr = eth_dev->data->dev_private;
(gdb) n
319             repr->app_fw_flower->pf_repr = NULL;
(gdb) p eth_dev->data->dev_private
$4 = (void *) 0x17c49c800
(gdb) p repr
$5 = (struct nfp_flower_representor *) 0x17c49c800
(gdb) p repr->app_fw_flower->pf_repr
$6 = (struct nfp_flower_representor *) 0x17c49c800

As what I said, although I assign NULL to one of the pointers `repr->app_fw_flower->pf_repr`, it still can be access by the other one `eth_dev->data->dev_private`.

(gdb) fin
Run till exit from #0  nfp_flower_pf_repr_uninit (eth_dev=0x7f839aed2340 <rte_eth_devices>)
    at ../drivers/net/nfp/flower/nfp_flower_representor.c:319
nfp_flower_repr_free (repr=0x17c49c800, repr_type=NFP_REPR_TYPE_PF)
    at ../drivers/net/nfp/flower/nfp_flower_representor.c:334
334                     break;
Value returned is $7 = 0
(gdb) fin
Run till exit from #0  nfp_flower_repr_free (repr=0x17c49c800, repr_type=NFP_REPR_TYPE_PF)
    at ../drivers/net/nfp/flower/nfp_flower_representor.c:334
nfp_flower_repr_dev_close (dev=0x7f839aed2340 <rte_eth_devices>)
    at ../drivers/net/nfp/flower/nfp_flower_representor.c:386
386             for (i = 0; i < MAX_FLOWER_VFS; i++) {
(gdb) b rte_eth_dev_release_port
Breakpoint 2 at 0x7f839ae0d0d5: file ../lib/ethdev/ethdev_driver.c, line 230.
(gdb) c
Continuing.

Thread 1 "dpdk-testpmd" hit Breakpoint 2, rte_eth_dev_release_port (eth_dev=0x7f839aed2340 <rte_eth_devices>)
    at ../lib/ethdev/ethdev_driver.c:230
230             if (eth_dev == NULL)
(gdb) p eth_dev
$8 = (struct rte_eth_dev *) 0x7f839aed2340 <rte_eth_devices>
(gdb) until 267
rte_eth_dev_release_port (eth_dev=0x7f839aed2340 <rte_eth_devices>) at ../lib/ethdev/ethdev_driver.c:267
267                     rte_free(eth_dev->data->dev_private);
(gdb) p eth_dev->data->dev_private
$9 = (void *) 0x17c49c800
(gdb)

And here, we free this block of memory, and no memory leak happens, I think.
  
Ferruh Yigit Jan. 9, 2024, 5:48 p.m. UTC | #5
On 1/9/2024 7:56 AM, Chaoyong He wrote:
>> On 12/18/2023 1:50 AM, Chaoyong He wrote:
>>>> On 12/14/2023 10:24 AM, Chaoyong He wrote:
>>>>> From: Long Wu <long.wu@corigine.com>
>>>>>
>>>>> Set the representor array to NULL to avoid that close interface does
>>>>> not free some resource.
>>>>>
>>>>> Fixes: a135bc1644d6 ("net/nfp: fix resource leak for flower
>>>>> firmware")
>>>>> Cc: chaoyong.he@corigine.com
>>>>> Cc: stable@dpdk.org
>>>>>
>>>>> Signed-off-by: Long Wu <long.wu@corigine.com>
>>>>> Reviewed-by: Chaoyong He <chaoyong.he@corigine.com>
>>>>> Reviewed-by: Peng Zhang <peng.zhang@corigine.com>
>>>>> ---
>>>>>  drivers/net/nfp/flower/nfp_flower_representor.c | 15
>>>>> ++++++++++++++-
>>>>>  1 file changed, 14 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/net/nfp/flower/nfp_flower_representor.c
>>>>> b/drivers/net/nfp/flower/nfp_flower_representor.c
>>>>> index 27ea3891bd..5f7c1fa737 100644
>>>>> --- a/drivers/net/nfp/flower/nfp_flower_representor.c
>>>>> +++ b/drivers/net/nfp/flower/nfp_flower_representor.c
>>>>> @@ -294,17 +294,30 @@ nfp_flower_repr_tx_burst(void *tx_queue,
>>>>> static int  nfp_flower_repr_uninit(struct rte_eth_dev *eth_dev)  {
>>>>> +	uint16_t index;
>>>>>  	struct nfp_flower_representor *repr;
>>>>>
>>>>>  	repr = eth_dev->data->dev_private;
>>>>>  	rte_ring_free(repr->ring);
>>>>>
>>>>> +	if (repr->repr_type == NFP_REPR_TYPE_PHYS_PORT) {
>>>>> +		index = NFP_FLOWER_CMSG_PORT_PHYS_PORT_NUM(repr-
>>>>> port_id);
>>>>> +		repr->app_fw_flower->phy_reprs[index] = NULL;
>>>>> +	} else {
>>>>> +		index = repr->vf_id;
>>>>> +		repr->app_fw_flower->vf_reprs[index] = NULL;
>>>>> +	}
>>>>> +
>>>>>  	return 0;
>>>>>  }
>>>>>
>>>>>  static int
>>>>> -nfp_flower_pf_repr_uninit(__rte_unused struct rte_eth_dev *eth_dev)
>>>>> +nfp_flower_pf_repr_uninit(struct rte_eth_dev *eth_dev)
>>>>>  {
>>>>> +	struct nfp_flower_representor *repr = eth_dev->data->dev_private;
>>>>> +
>>>>> +	repr->app_fw_flower->pf_repr = NULL;
>>>>>
>>>>
>>>> Here it is assigned to NULL but is it freed? If freed, why not set to
>>>> NULL where it is freed?
>>>>
>>>> Same for above phy_reprs & vf_reprs.
>>>
>>> The whole invoke view:
>>> rte_eth_dev_close()
>>>     --> nfp_flower_repr_dev_close()
>>>         --> nfp_flower_repr_free()
>>>             --> nfp_flower_pf_repr_uninit()
>>>             --> nfp_flower_repr_uninit()
>>>            // In these two functions, we just assigned to NULL but not freed yet.
>>>            // It is still refer by the `eth_dev->data->dev_private`.
>>>     --> rte_eth_dev_release_port()
>>>         --> rte_free(eth_dev->data->dev_private);
>>>         // And here it is really freed (by the rte framework).
>>>
>>
>> 'rte_eth_dev_release_port()' frees the device private data, but not all pointers,
>> like 'repr->app_fw_flower->pf_repr', in the struct are freed, it is dev_close() or
>> unint() functions responsibility.
>>
>> Can you please double check if
>> 'eth_dev->data->dev_private->app_fw_flower->pf_repr' freed or not?
> 
> (gdb) b nfp_flower_repr_dev_close
> Breakpoint 1 at 0x7f839a4ad37f: file ../drivers/net/nfp/flower/nfp_flower_representor.c, line 356.
> (gdb) c
> Continuing.
> 
> Thread 1 "dpdk-testpmd" hit Breakpoint 1, nfp_flower_repr_dev_close (dev=0x7f839aed2340 <rte_eth_devices>)
>     at ../drivers/net/nfp/flower/nfp_flower_representor.c:356
> 356             if (rte_eal_process_type() != RTE_PROC_PRIMARY)
> (gdb) n
> 359             repr = dev->data->dev_private;
> (gdb)
> 360             app_fw_flower = repr->app_fw_flower;
> (gdb)
> 361             hw = app_fw_flower->pf_hw;
> (gdb)
> 362             pf_dev = hw->pf_dev;
> (gdb)
> 368             nfp_net_disable_queues(dev);
> (gdb) p repr
> $1 = (struct nfp_flower_representor *) 0x17c49c800
> (gdb) p dev->data->dev_private
> $2 = (void *) 0x17c49c800
> (gdb) p repr->app_fw_flower->pf_repr
> $3 = (struct nfp_flower_representor *) 0x17c49c800
> 
> As we can see, these three pointers point the same block of memory.
> 

Ahh, I missed that 'repr->app_fw_flower->pf_repr' points to
'dev_private', so your code makes sense.

But if it is 'dev_private', why free it in 'nfp_pf_uninit()' as it will
be freed by 'rte_eth_dev_release_port()'?
Won't removing 'rte_free(pf_dev);' from 'nfp_pf_uninit()' will have the
same effect, instead of setting it NULL in advance?
  
Chaoyong He Jan. 11, 2024, 2:02 a.m. UTC | #6
> On 1/9/2024 7:56 AM, Chaoyong He wrote:
> >> On 12/18/2023 1:50 AM, Chaoyong He wrote:
> >>>> On 12/14/2023 10:24 AM, Chaoyong He wrote:
> >>>>> From: Long Wu <long.wu@corigine.com>
> >>>>>
> >>>>> Set the representor array to NULL to avoid that close interface
> >>>>> does not free some resource.
> >>>>>
> >>>>> Fixes: a135bc1644d6 ("net/nfp: fix resource leak for flower
> >>>>> firmware")
> >>>>> Cc: chaoyong.he@corigine.com
> >>>>> Cc: stable@dpdk.org
> >>>>>
> >>>>> Signed-off-by: Long Wu <long.wu@corigine.com>
> >>>>> Reviewed-by: Chaoyong He <chaoyong.he@corigine.com>
> >>>>> Reviewed-by: Peng Zhang <peng.zhang@corigine.com>
> >>>>> ---
> >>>>>  drivers/net/nfp/flower/nfp_flower_representor.c | 15
> >>>>> ++++++++++++++-
> >>>>>  1 file changed, 14 insertions(+), 1 deletion(-)
> >>>>>
> >>>>> diff --git a/drivers/net/nfp/flower/nfp_flower_representor.c
> >>>>> b/drivers/net/nfp/flower/nfp_flower_representor.c
> >>>>> index 27ea3891bd..5f7c1fa737 100644
> >>>>> --- a/drivers/net/nfp/flower/nfp_flower_representor.c
> >>>>> +++ b/drivers/net/nfp/flower/nfp_flower_representor.c
> >>>>> @@ -294,17 +294,30 @@ nfp_flower_repr_tx_burst(void *tx_queue,
> >>>>> static int  nfp_flower_repr_uninit(struct rte_eth_dev *eth_dev)  {
> >>>>> +	uint16_t index;
> >>>>>  	struct nfp_flower_representor *repr;
> >>>>>
> >>>>>  	repr = eth_dev->data->dev_private;
> >>>>>  	rte_ring_free(repr->ring);
> >>>>>
> >>>>> +	if (repr->repr_type == NFP_REPR_TYPE_PHYS_PORT) {
> >>>>> +		index = NFP_FLOWER_CMSG_PORT_PHYS_PORT_NUM(repr-
> >>>>> port_id);
> >>>>> +		repr->app_fw_flower->phy_reprs[index] = NULL;
> >>>>> +	} else {
> >>>>> +		index = repr->vf_id;
> >>>>> +		repr->app_fw_flower->vf_reprs[index] = NULL;
> >>>>> +	}
> >>>>> +
> >>>>>  	return 0;
> >>>>>  }
> >>>>>
> >>>>>  static int
> >>>>> -nfp_flower_pf_repr_uninit(__rte_unused struct rte_eth_dev
> >>>>> *eth_dev)
> >>>>> +nfp_flower_pf_repr_uninit(struct rte_eth_dev *eth_dev)
> >>>>>  {
> >>>>> +	struct nfp_flower_representor *repr =
> >>>>> +eth_dev->data->dev_private;
> >>>>> +
> >>>>> +	repr->app_fw_flower->pf_repr = NULL;
> >>>>>
> >>>>
> >>>> Here it is assigned to NULL but is it freed? If freed, why not set
> >>>> to NULL where it is freed?
> >>>>
> >>>> Same for above phy_reprs & vf_reprs.
> >>>
> >>> The whole invoke view:
> >>> rte_eth_dev_close()
> >>>     --> nfp_flower_repr_dev_close()
> >>>         --> nfp_flower_repr_free()
> >>>             --> nfp_flower_pf_repr_uninit()
> >>>             --> nfp_flower_repr_uninit()
> >>>            // In these two functions, we just assigned to NULL but not freed
> yet.
> >>>            // It is still refer by the `eth_dev->data->dev_private`.
> >>>     --> rte_eth_dev_release_port()
> >>>         --> rte_free(eth_dev->data->dev_private);
> >>>         // And here it is really freed (by the rte framework).
> >>>
> >>
> >> 'rte_eth_dev_release_port()' frees the device private data, but not
> >> all pointers, like 'repr->app_fw_flower->pf_repr', in the struct are
> >> freed, it is dev_close() or
> >> unint() functions responsibility.
> >>
> >> Can you please double check if
> >> 'eth_dev->data->dev_private->app_fw_flower->pf_repr' freed or not?
> >
> > (gdb) b nfp_flower_repr_dev_close
> > Breakpoint 1 at 0x7f839a4ad37f:
> file ../drivers/net/nfp/flower/nfp_flower_representor.c, line 356.
> > (gdb) c
> > Continuing.
> >
> > Thread 1 "dpdk-testpmd" hit Breakpoint 1, nfp_flower_repr_dev_close
> (dev=0x7f839aed2340 <rte_eth_devices>)
> >     at ../drivers/net/nfp/flower/nfp_flower_representor.c:356
> > 356             if (rte_eal_process_type() != RTE_PROC_PRIMARY)
> > (gdb) n
> > 359             repr = dev->data->dev_private;
> > (gdb)
> > 360             app_fw_flower = repr->app_fw_flower;
> > (gdb)
> > 361             hw = app_fw_flower->pf_hw;
> > (gdb)
> > 362             pf_dev = hw->pf_dev;
> > (gdb)
> > 368             nfp_net_disable_queues(dev);
> > (gdb) p repr
> > $1 = (struct nfp_flower_representor *) 0x17c49c800
> > (gdb) p dev->data->dev_private
> > $2 = (void *) 0x17c49c800
> > (gdb) p repr->app_fw_flower->pf_repr
> > $3 = (struct nfp_flower_representor *) 0x17c49c800
> >
> > As we can see, these three pointers point the same block of memory.
> >
> 
> Ahh, I missed that 'repr->app_fw_flower->pf_repr' points to 'dev_private', so
> your code makes sense.
> 
> But if it is 'dev_private', why free it in 'nfp_pf_uninit()' as it will be freed by
> 'rte_eth_dev_release_port()'?

Sorry, I'm not understanding this.
The 'dev_private' is a 'struct nfp_flower_representor *', and it will be freed in 'rte_eth_dev_release_port()'.
What I freed in 'nfp_pf_uninit()' is a 'struct nfp_pf_dev *', so I'm not catch your point about this.

> Won't removing 'rte_free(pf_dev);' from 'nfp_pf_uninit()' will have the same
> effect, instead of setting it NULL in advance?
> 

If I remove the 'rte_free(pf_dev);' from 'nfp_pf_uninit()', there will be a memory leak as no one will free it, and actually I'm not 'setting it NULL in advance'.

359             repr = dev->data->dev_private;
360             app_fw_flower = repr->app_fw_flower;
361             hw = app_fw_flower->pf_hw;
362             pf_dev = hw->pf_dev;

Maybe you just confuse the 'pf_repr' and 'pf_dev'? Just a guess.
  
Ferruh Yigit Jan. 11, 2024, 12:32 p.m. UTC | #7
On 1/11/2024 2:02 AM, Chaoyong He wrote:
>> On 1/9/2024 7:56 AM, Chaoyong He wrote:
>>>> On 12/18/2023 1:50 AM, Chaoyong He wrote:
>>>>>> On 12/14/2023 10:24 AM, Chaoyong He wrote:
>>>>>>> From: Long Wu <long.wu@corigine.com>
>>>>>>>
>>>>>>> Set the representor array to NULL to avoid that close interface
>>>>>>> does not free some resource.
>>>>>>>
>>>>>>> Fixes: a135bc1644d6 ("net/nfp: fix resource leak for flower
>>>>>>> firmware")
>>>>>>> Cc: chaoyong.he@corigine.com
>>>>>>> Cc: stable@dpdk.org
>>>>>>>
>>>>>>> Signed-off-by: Long Wu <long.wu@corigine.com>
>>>>>>> Reviewed-by: Chaoyong He <chaoyong.he@corigine.com>
>>>>>>> Reviewed-by: Peng Zhang <peng.zhang@corigine.com>
>>>>>>> ---
>>>>>>>  drivers/net/nfp/flower/nfp_flower_representor.c | 15
>>>>>>> ++++++++++++++-
>>>>>>>  1 file changed, 14 insertions(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/drivers/net/nfp/flower/nfp_flower_representor.c
>>>>>>> b/drivers/net/nfp/flower/nfp_flower_representor.c
>>>>>>> index 27ea3891bd..5f7c1fa737 100644
>>>>>>> --- a/drivers/net/nfp/flower/nfp_flower_representor.c
>>>>>>> +++ b/drivers/net/nfp/flower/nfp_flower_representor.c
>>>>>>> @@ -294,17 +294,30 @@ nfp_flower_repr_tx_burst(void *tx_queue,
>>>>>>> static int  nfp_flower_repr_uninit(struct rte_eth_dev *eth_dev)  {
>>>>>>> +	uint16_t index;
>>>>>>>  	struct nfp_flower_representor *repr;
>>>>>>>
>>>>>>>  	repr = eth_dev->data->dev_private;
>>>>>>>  	rte_ring_free(repr->ring);
>>>>>>>
>>>>>>> +	if (repr->repr_type == NFP_REPR_TYPE_PHYS_PORT) {
>>>>>>> +		index = NFP_FLOWER_CMSG_PORT_PHYS_PORT_NUM(repr-
>>>>>>> port_id);
>>>>>>> +		repr->app_fw_flower->phy_reprs[index] = NULL;
>>>>>>> +	} else {
>>>>>>> +		index = repr->vf_id;
>>>>>>> +		repr->app_fw_flower->vf_reprs[index] = NULL;
>>>>>>> +	}
>>>>>>> +
>>>>>>>  	return 0;
>>>>>>>  }
>>>>>>>
>>>>>>>  static int
>>>>>>> -nfp_flower_pf_repr_uninit(__rte_unused struct rte_eth_dev
>>>>>>> *eth_dev)
>>>>>>> +nfp_flower_pf_repr_uninit(struct rte_eth_dev *eth_dev)
>>>>>>>  {
>>>>>>> +	struct nfp_flower_representor *repr =
>>>>>>> +eth_dev->data->dev_private;
>>>>>>> +
>>>>>>> +	repr->app_fw_flower->pf_repr = NULL;
>>>>>>>
>>>>>>
>>>>>> Here it is assigned to NULL but is it freed? If freed, why not set
>>>>>> to NULL where it is freed?
>>>>>>
>>>>>> Same for above phy_reprs & vf_reprs.
>>>>>
>>>>> The whole invoke view:
>>>>> rte_eth_dev_close()
>>>>>     --> nfp_flower_repr_dev_close()
>>>>>         --> nfp_flower_repr_free()
>>>>>             --> nfp_flower_pf_repr_uninit()
>>>>>             --> nfp_flower_repr_uninit()
>>>>>            // In these two functions, we just assigned to NULL but not freed
>> yet.
>>>>>            // It is still refer by the `eth_dev->data->dev_private`.
>>>>>     --> rte_eth_dev_release_port()
>>>>>         --> rte_free(eth_dev->data->dev_private);
>>>>>         // And here it is really freed (by the rte framework).
>>>>>
>>>>
>>>> 'rte_eth_dev_release_port()' frees the device private data, but not
>>>> all pointers, like 'repr->app_fw_flower->pf_repr', in the struct are
>>>> freed, it is dev_close() or
>>>> unint() functions responsibility.
>>>>
>>>> Can you please double check if
>>>> 'eth_dev->data->dev_private->app_fw_flower->pf_repr' freed or not?
>>>
>>> (gdb) b nfp_flower_repr_dev_close
>>> Breakpoint 1 at 0x7f839a4ad37f:
>> file ../drivers/net/nfp/flower/nfp_flower_representor.c, line 356.
>>> (gdb) c
>>> Continuing.
>>>
>>> Thread 1 "dpdk-testpmd" hit Breakpoint 1, nfp_flower_repr_dev_close
>> (dev=0x7f839aed2340 <rte_eth_devices>)
>>>     at ../drivers/net/nfp/flower/nfp_flower_representor.c:356
>>> 356             if (rte_eal_process_type() != RTE_PROC_PRIMARY)
>>> (gdb) n
>>> 359             repr = dev->data->dev_private;
>>> (gdb)
>>> 360             app_fw_flower = repr->app_fw_flower;
>>> (gdb)
>>> 361             hw = app_fw_flower->pf_hw;
>>> (gdb)
>>> 362             pf_dev = hw->pf_dev;
>>> (gdb)
>>> 368             nfp_net_disable_queues(dev);
>>> (gdb) p repr
>>> $1 = (struct nfp_flower_representor *) 0x17c49c800
>>> (gdb) p dev->data->dev_private
>>> $2 = (void *) 0x17c49c800
>>> (gdb) p repr->app_fw_flower->pf_repr
>>> $3 = (struct nfp_flower_representor *) 0x17c49c800
>>>
>>> As we can see, these three pointers point the same block of memory.
>>>
>>
>> Ahh, I missed that 'repr->app_fw_flower->pf_repr' points to 'dev_private', so
>> your code makes sense.
>>
>> But if it is 'dev_private', why free it in 'nfp_pf_uninit()' as it will be freed by
>> 'rte_eth_dev_release_port()'?
> 
> Sorry, I'm not understanding this.
> The 'dev_private' is a 'struct nfp_flower_representor *', and it will be freed in 'rte_eth_dev_release_port()'.
> What I freed in 'nfp_pf_uninit()' is a 'struct nfp_pf_dev *', so I'm not catch your point about this.
> 
>> Won't removing 'rte_free(pf_dev);' from 'nfp_pf_uninit()' will have the same
>> effect, instead of setting it NULL in advance?
>>
> 
> If I remove the 'rte_free(pf_dev);' from 'nfp_pf_uninit()', there will be a memory leak as no one will free it, and actually I'm not 'setting it NULL in advance'.
> 
> 359             repr = dev->data->dev_private;
> 360             app_fw_flower = repr->app_fw_flower;
> 361             hw = app_fw_flower->pf_hw;
> 362             pf_dev = hw->pf_dev;
> 
> Maybe you just confuse the 'pf_repr' and 'pf_dev'? Just a guess.
>

Yes I did confuse those two, sorry about that.

'repr->app_fw_flower->pf_repr' is 'dev_private', and I assumed you are
setting it NULL to escape from double free (and was checking where that
double free happens), but I guess that is not the case.

'rte_eth_dev_destroy()' calls 'rte_eth_dev_release_port()' and frees
'dev_private' but 'repr->app_fw_flower->pf_repr' remains as dangling
pointer and perhaps prevents 'nfp_flower_repr_dev_close()' move forward
(because of "if (app_fw_flower->pf_repr != NULL)" check), and you are
fixing it, is it the case?
  
Chaoyong He Jan. 12, 2024, 1:19 a.m. UTC | #8
> On 1/11/2024 2:02 AM, Chaoyong He wrote:
> >> On 1/9/2024 7:56 AM, Chaoyong He wrote:
> >>>> On 12/18/2023 1:50 AM, Chaoyong He wrote:
> >>>>>> On 12/14/2023 10:24 AM, Chaoyong He wrote:
> >>>>>>> From: Long Wu <long.wu@corigine.com>
> >>>>>>>
> >>>>>>> Set the representor array to NULL to avoid that close interface
> >>>>>>> does not free some resource.
> >>>>>>>
> >>>>>>> Fixes: a135bc1644d6 ("net/nfp: fix resource leak for flower
> >>>>>>> firmware")
> >>>>>>> Cc: chaoyong.he@corigine.com
> >>>>>>> Cc: stable@dpdk.org
> >>>>>>>
> >>>>>>> Signed-off-by: Long Wu <long.wu@corigine.com>
> >>>>>>> Reviewed-by: Chaoyong He <chaoyong.he@corigine.com>
> >>>>>>> Reviewed-by: Peng Zhang <peng.zhang@corigine.com>
> >>>>>>> ---
> >>>>>>>  drivers/net/nfp/flower/nfp_flower_representor.c | 15
> >>>>>>> ++++++++++++++-
> >>>>>>>  1 file changed, 14 insertions(+), 1 deletion(-)
> >>>>>>>
> >>>>>>> diff --git a/drivers/net/nfp/flower/nfp_flower_representor.c
> >>>>>>> b/drivers/net/nfp/flower/nfp_flower_representor.c
> >>>>>>> index 27ea3891bd..5f7c1fa737 100644
> >>>>>>> --- a/drivers/net/nfp/flower/nfp_flower_representor.c
> >>>>>>> +++ b/drivers/net/nfp/flower/nfp_flower_representor.c
> >>>>>>> @@ -294,17 +294,30 @@ nfp_flower_repr_tx_burst(void
> *tx_queue,
> >>>>>>> static int  nfp_flower_repr_uninit(struct rte_eth_dev *eth_dev)
> >>>>>>> {
> >>>>>>> +	uint16_t index;
> >>>>>>>  	struct nfp_flower_representor *repr;
> >>>>>>>
> >>>>>>>  	repr = eth_dev->data->dev_private;
> >>>>>>>  	rte_ring_free(repr->ring);
> >>>>>>>
> >>>>>>> +	if (repr->repr_type == NFP_REPR_TYPE_PHYS_PORT) {
> >>>>>>> +		index =
> NFP_FLOWER_CMSG_PORT_PHYS_PORT_NUM(repr-
> >>>>>>> port_id);
> >>>>>>> +		repr->app_fw_flower->phy_reprs[index] = NULL;
> >>>>>>> +	} else {
> >>>>>>> +		index = repr->vf_id;
> >>>>>>> +		repr->app_fw_flower->vf_reprs[index] = NULL;
> >>>>>>> +	}
> >>>>>>> +
> >>>>>>>  	return 0;
> >>>>>>>  }
> >>>>>>>
> >>>>>>>  static int
> >>>>>>> -nfp_flower_pf_repr_uninit(__rte_unused struct rte_eth_dev
> >>>>>>> *eth_dev)
> >>>>>>> +nfp_flower_pf_repr_uninit(struct rte_eth_dev *eth_dev)
> >>>>>>>  {
> >>>>>>> +	struct nfp_flower_representor *repr =
> >>>>>>> +eth_dev->data->dev_private;
> >>>>>>> +
> >>>>>>> +	repr->app_fw_flower->pf_repr = NULL;
> >>>>>>>
> >>>>>>
> >>>>>> Here it is assigned to NULL but is it freed? If freed, why not
> >>>>>> set to NULL where it is freed?
> >>>>>>
> >>>>>> Same for above phy_reprs & vf_reprs.
> >>>>>
> >>>>> The whole invoke view:
> >>>>> rte_eth_dev_close()
> >>>>>     --> nfp_flower_repr_dev_close()
> >>>>>         --> nfp_flower_repr_free()
> >>>>>             --> nfp_flower_pf_repr_uninit()
> >>>>>             --> nfp_flower_repr_uninit()
> >>>>>            // In these two functions, we just assigned to NULL but
> >>>>> not freed
> >> yet.
> >>>>>            // It is still refer by the `eth_dev->data->dev_private`.
> >>>>>     --> rte_eth_dev_release_port()
> >>>>>         --> rte_free(eth_dev->data->dev_private);
> >>>>>         // And here it is really freed (by the rte framework).
> >>>>>
> >>>>
> >>>> 'rte_eth_dev_release_port()' frees the device private data, but not
> >>>> all pointers, like 'repr->app_fw_flower->pf_repr', in the struct
> >>>> are freed, it is dev_close() or
> >>>> unint() functions responsibility.
> >>>>
> >>>> Can you please double check if
> >>>> 'eth_dev->data->dev_private->app_fw_flower->pf_repr' freed or not?
> >>>
> >>> (gdb) b nfp_flower_repr_dev_close
> >>> Breakpoint 1 at 0x7f839a4ad37f:
> >> file ../drivers/net/nfp/flower/nfp_flower_representor.c, line 356.
> >>> (gdb) c
> >>> Continuing.
> >>>
> >>> Thread 1 "dpdk-testpmd" hit Breakpoint 1, nfp_flower_repr_dev_close
> >> (dev=0x7f839aed2340 <rte_eth_devices>)
> >>>     at ../drivers/net/nfp/flower/nfp_flower_representor.c:356
> >>> 356             if (rte_eal_process_type() != RTE_PROC_PRIMARY)
> >>> (gdb) n
> >>> 359             repr = dev->data->dev_private;
> >>> (gdb)
> >>> 360             app_fw_flower = repr->app_fw_flower;
> >>> (gdb)
> >>> 361             hw = app_fw_flower->pf_hw;
> >>> (gdb)
> >>> 362             pf_dev = hw->pf_dev;
> >>> (gdb)
> >>> 368             nfp_net_disable_queues(dev);
> >>> (gdb) p repr
> >>> $1 = (struct nfp_flower_representor *) 0x17c49c800
> >>> (gdb) p dev->data->dev_private
> >>> $2 = (void *) 0x17c49c800
> >>> (gdb) p repr->app_fw_flower->pf_repr
> >>> $3 = (struct nfp_flower_representor *) 0x17c49c800
> >>>
> >>> As we can see, these three pointers point the same block of memory.
> >>>
> >>
> >> Ahh, I missed that 'repr->app_fw_flower->pf_repr' points to
> >> 'dev_private', so your code makes sense.
> >>
> >> But if it is 'dev_private', why free it in 'nfp_pf_uninit()' as it
> >> will be freed by 'rte_eth_dev_release_port()'?
> >
> > Sorry, I'm not understanding this.
> > The 'dev_private' is a 'struct nfp_flower_representor *', and it will be freed in
> 'rte_eth_dev_release_port()'.
> > What I freed in 'nfp_pf_uninit()' is a 'struct nfp_pf_dev *', so I'm not catch
> your point about this.
> >
> >> Won't removing 'rte_free(pf_dev);' from 'nfp_pf_uninit()' will have
> >> the same effect, instead of setting it NULL in advance?
> >>
> >
> > If I remove the 'rte_free(pf_dev);' from 'nfp_pf_uninit()', there will be a
> memory leak as no one will free it, and actually I'm not 'setting it NULL in
> advance'.
> >
> > 359             repr = dev->data->dev_private;
> > 360             app_fw_flower = repr->app_fw_flower;
> > 361             hw = app_fw_flower->pf_hw;
> > 362             pf_dev = hw->pf_dev;
> >
> > Maybe you just confuse the 'pf_repr' and 'pf_dev'? Just a guess.
> >
> 
> Yes I did confuse those two, sorry about that.
> 
> 'repr->app_fw_flower->pf_repr' is 'dev_private', and I assumed you are setting
> it NULL to escape from double free (and was checking where that double free
> happens), but I guess that is not the case.
> 
> 'rte_eth_dev_destroy()' calls 'rte_eth_dev_release_port()' and frees
> 'dev_private' but 'repr->app_fw_flower->pf_repr' remains as dangling pointer
> and perhaps prevents 'nfp_flower_repr_dev_close()' move forward (because
> of "if (app_fw_flower->pf_repr != NULL)" check), and you are fixing it, is it the
> case?

Correct, that's what we want to do by this patch and where the problem is, your description is very clear and brief.
  
Ferruh Yigit Jan. 12, 2024, 10:06 a.m. UTC | #9
On 1/12/2024 1:19 AM, Chaoyong He wrote:
>> On 1/11/2024 2:02 AM, Chaoyong He wrote:
>>>> On 1/9/2024 7:56 AM, Chaoyong He wrote:
>>>>>> On 12/18/2023 1:50 AM, Chaoyong He wrote:
>>>>>>>> On 12/14/2023 10:24 AM, Chaoyong He wrote:
>>>>>>>>> From: Long Wu <long.wu@corigine.com>
>>>>>>>>>
>>>>>>>>> Set the representor array to NULL to avoid that close interface
>>>>>>>>> does not free some resource.
>>>>>>>>>
>>>>>>>>> Fixes: a135bc1644d6 ("net/nfp: fix resource leak for flower
>>>>>>>>> firmware")
>>>>>>>>> Cc: chaoyong.he@corigine.com
>>>>>>>>> Cc: stable@dpdk.org
>>>>>>>>>
>>>>>>>>> Signed-off-by: Long Wu <long.wu@corigine.com>
>>>>>>>>> Reviewed-by: Chaoyong He <chaoyong.he@corigine.com>
>>>>>>>>> Reviewed-by: Peng Zhang <peng.zhang@corigine.com>
>>>>>>>>> ---
>>>>>>>>>  drivers/net/nfp/flower/nfp_flower_representor.c | 15
>>>>>>>>> ++++++++++++++-
>>>>>>>>>  1 file changed, 14 insertions(+), 1 deletion(-)
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/net/nfp/flower/nfp_flower_representor.c
>>>>>>>>> b/drivers/net/nfp/flower/nfp_flower_representor.c
>>>>>>>>> index 27ea3891bd..5f7c1fa737 100644
>>>>>>>>> --- a/drivers/net/nfp/flower/nfp_flower_representor.c
>>>>>>>>> +++ b/drivers/net/nfp/flower/nfp_flower_representor.c
>>>>>>>>> @@ -294,17 +294,30 @@ nfp_flower_repr_tx_burst(void
>> *tx_queue,
>>>>>>>>> static int  nfp_flower_repr_uninit(struct rte_eth_dev *eth_dev)
>>>>>>>>> {
>>>>>>>>> +	uint16_t index;
>>>>>>>>>  	struct nfp_flower_representor *repr;
>>>>>>>>>
>>>>>>>>>  	repr = eth_dev->data->dev_private;
>>>>>>>>>  	rte_ring_free(repr->ring);
>>>>>>>>>
>>>>>>>>> +	if (repr->repr_type == NFP_REPR_TYPE_PHYS_PORT) {
>>>>>>>>> +		index =
>> NFP_FLOWER_CMSG_PORT_PHYS_PORT_NUM(repr-
>>>>>>>>> port_id);
>>>>>>>>> +		repr->app_fw_flower->phy_reprs[index] = NULL;
>>>>>>>>> +	} else {
>>>>>>>>> +		index = repr->vf_id;
>>>>>>>>> +		repr->app_fw_flower->vf_reprs[index] = NULL;
>>>>>>>>> +	}
>>>>>>>>> +
>>>>>>>>>  	return 0;
>>>>>>>>>  }
>>>>>>>>>
>>>>>>>>>  static int
>>>>>>>>> -nfp_flower_pf_repr_uninit(__rte_unused struct rte_eth_dev
>>>>>>>>> *eth_dev)
>>>>>>>>> +nfp_flower_pf_repr_uninit(struct rte_eth_dev *eth_dev)
>>>>>>>>>  {
>>>>>>>>> +	struct nfp_flower_representor *repr =
>>>>>>>>> +eth_dev->data->dev_private;
>>>>>>>>> +
>>>>>>>>> +	repr->app_fw_flower->pf_repr = NULL;
>>>>>>>>>
>>>>>>>>
>>>>>>>> Here it is assigned to NULL but is it freed? If freed, why not
>>>>>>>> set to NULL where it is freed?
>>>>>>>>
>>>>>>>> Same for above phy_reprs & vf_reprs.
>>>>>>>
>>>>>>> The whole invoke view:
>>>>>>> rte_eth_dev_close()
>>>>>>>     --> nfp_flower_repr_dev_close()
>>>>>>>         --> nfp_flower_repr_free()
>>>>>>>             --> nfp_flower_pf_repr_uninit()
>>>>>>>             --> nfp_flower_repr_uninit()
>>>>>>>            // In these two functions, we just assigned to NULL but
>>>>>>> not freed
>>>> yet.
>>>>>>>            // It is still refer by the `eth_dev->data->dev_private`.
>>>>>>>     --> rte_eth_dev_release_port()
>>>>>>>         --> rte_free(eth_dev->data->dev_private);
>>>>>>>         // And here it is really freed (by the rte framework).
>>>>>>>
>>>>>>
>>>>>> 'rte_eth_dev_release_port()' frees the device private data, but not
>>>>>> all pointers, like 'repr->app_fw_flower->pf_repr', in the struct
>>>>>> are freed, it is dev_close() or
>>>>>> unint() functions responsibility.
>>>>>>
>>>>>> Can you please double check if
>>>>>> 'eth_dev->data->dev_private->app_fw_flower->pf_repr' freed or not?
>>>>>
>>>>> (gdb) b nfp_flower_repr_dev_close
>>>>> Breakpoint 1 at 0x7f839a4ad37f:
>>>> file ../drivers/net/nfp/flower/nfp_flower_representor.c, line 356.
>>>>> (gdb) c
>>>>> Continuing.
>>>>>
>>>>> Thread 1 "dpdk-testpmd" hit Breakpoint 1, nfp_flower_repr_dev_close
>>>> (dev=0x7f839aed2340 <rte_eth_devices>)
>>>>>     at ../drivers/net/nfp/flower/nfp_flower_representor.c:356
>>>>> 356             if (rte_eal_process_type() != RTE_PROC_PRIMARY)
>>>>> (gdb) n
>>>>> 359             repr = dev->data->dev_private;
>>>>> (gdb)
>>>>> 360             app_fw_flower = repr->app_fw_flower;
>>>>> (gdb)
>>>>> 361             hw = app_fw_flower->pf_hw;
>>>>> (gdb)
>>>>> 362             pf_dev = hw->pf_dev;
>>>>> (gdb)
>>>>> 368             nfp_net_disable_queues(dev);
>>>>> (gdb) p repr
>>>>> $1 = (struct nfp_flower_representor *) 0x17c49c800
>>>>> (gdb) p dev->data->dev_private
>>>>> $2 = (void *) 0x17c49c800
>>>>> (gdb) p repr->app_fw_flower->pf_repr
>>>>> $3 = (struct nfp_flower_representor *) 0x17c49c800
>>>>>
>>>>> As we can see, these three pointers point the same block of memory.
>>>>>
>>>>
>>>> Ahh, I missed that 'repr->app_fw_flower->pf_repr' points to
>>>> 'dev_private', so your code makes sense.
>>>>
>>>> But if it is 'dev_private', why free it in 'nfp_pf_uninit()' as it
>>>> will be freed by 'rte_eth_dev_release_port()'?
>>>
>>> Sorry, I'm not understanding this.
>>> The 'dev_private' is a 'struct nfp_flower_representor *', and it will be freed in
>> 'rte_eth_dev_release_port()'.
>>> What I freed in 'nfp_pf_uninit()' is a 'struct nfp_pf_dev *', so I'm not catch
>> your point about this.
>>>
>>>> Won't removing 'rte_free(pf_dev);' from 'nfp_pf_uninit()' will have
>>>> the same effect, instead of setting it NULL in advance?
>>>>
>>>
>>> If I remove the 'rte_free(pf_dev);' from 'nfp_pf_uninit()', there will be a
>> memory leak as no one will free it, and actually I'm not 'setting it NULL in
>> advance'.
>>>
>>> 359             repr = dev->data->dev_private;
>>> 360             app_fw_flower = repr->app_fw_flower;
>>> 361             hw = app_fw_flower->pf_hw;
>>> 362             pf_dev = hw->pf_dev;
>>>
>>> Maybe you just confuse the 'pf_repr' and 'pf_dev'? Just a guess.
>>>
>>
>> Yes I did confuse those two, sorry about that.
>>
>> 'repr->app_fw_flower->pf_repr' is 'dev_private', and I assumed you are setting
>> it NULL to escape from double free (and was checking where that double free
>> happens), but I guess that is not the case.
>>
>> 'rte_eth_dev_destroy()' calls 'rte_eth_dev_release_port()' and frees
>> 'dev_private' but 'repr->app_fw_flower->pf_repr' remains as dangling pointer
>> and perhaps prevents 'nfp_flower_repr_dev_close()' move forward (because
>> of "if (app_fw_flower->pf_repr != NULL)" check), and you are fixing it, is it the
>> case?
> 
> Correct, that's what we want to do by this patch and where the problem is, your description is very clear and brief.
> 

Got it, I will proceed with the set.
More details in the commit log helps reviewing the patches.
  

Patch

diff --git a/drivers/net/nfp/flower/nfp_flower_representor.c b/drivers/net/nfp/flower/nfp_flower_representor.c
index 27ea3891bd..5f7c1fa737 100644
--- a/drivers/net/nfp/flower/nfp_flower_representor.c
+++ b/drivers/net/nfp/flower/nfp_flower_representor.c
@@ -294,17 +294,30 @@  nfp_flower_repr_tx_burst(void *tx_queue,
 static int
 nfp_flower_repr_uninit(struct rte_eth_dev *eth_dev)
 {
+	uint16_t index;
 	struct nfp_flower_representor *repr;
 
 	repr = eth_dev->data->dev_private;
 	rte_ring_free(repr->ring);
 
+	if (repr->repr_type == NFP_REPR_TYPE_PHYS_PORT) {
+		index = NFP_FLOWER_CMSG_PORT_PHYS_PORT_NUM(repr->port_id);
+		repr->app_fw_flower->phy_reprs[index] = NULL;
+	} else {
+		index = repr->vf_id;
+		repr->app_fw_flower->vf_reprs[index] = NULL;
+	}
+
 	return 0;
 }
 
 static int
-nfp_flower_pf_repr_uninit(__rte_unused struct rte_eth_dev *eth_dev)
+nfp_flower_pf_repr_uninit(struct rte_eth_dev *eth_dev)
 {
+	struct nfp_flower_representor *repr = eth_dev->data->dev_private;
+
+	repr->app_fw_flower->pf_repr = NULL;
+
 	return 0;
 }