[5/7] net/nfp: fix resource leak for exit of CoreNIC firmware

Message ID 20231130085238.60290-6-chaoyong.he@corigine.com (mailing list archive)
State Superseded, archived
Delegated to: Ferruh Yigit
Headers
Series fix resource leak problems |

Checks

Context Check Description
ci/checkpatch success coding style OK

Commit Message

Chaoyong He Nov. 30, 2023, 8:52 a.m. UTC
  Fix the resource leak problem in the exit logic of CoreNIC firmware.

Fixes: 646ea79ce481 ("net/nfp: move PF functions into its own file")
Cc: stable@dpdk.org

Signed-off-by: Chaoyong He <chaoyong.he@corigine.com>
Reviewed-by: Long Wu <long.wu@corigine.com>
Reviewed-by: Peng Zhang <peng.zhang@corigine.com>
---
 drivers/net/nfp/nfp_ethdev.c     | 94 ++++++++++++++++++++++++--------
 drivers/net/nfp/nfp_net_common.h |  1 +
 2 files changed, 73 insertions(+), 22 deletions(-)
  

Comments

Ferruh Yigit Nov. 30, 2023, 11 a.m. UTC | #1
On 11/30/2023 8:52 AM, Chaoyong He wrote:
> Fix the resource leak problem in the exit logic of CoreNIC firmware.
> 
> Fixes: 646ea79ce481 ("net/nfp: move PF functions into its own file")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Chaoyong He <chaoyong.he@corigine.com>
> Reviewed-by: Long Wu <long.wu@corigine.com>
> Reviewed-by: Peng Zhang <peng.zhang@corigine.com>

<...>

> +static int
> +nfp_pf_secondary_uninit(struct nfp_pf_dev *pf_dev)
> +{
> +	free(pf_dev->sym_tbl);
> +	rte_free(pf_dev);
> +
> +	return 0;
> +}
> +
>  /* Reset and stop device. The device can not be restarted. */
>  static int
>  nfp_net_close(struct rte_eth_dev *dev)
> @@ -333,14 +381,25 @@ nfp_net_close(struct rte_eth_dev *dev)
>  	struct rte_pci_device *pci_dev;
>  	struct nfp_app_fw_nic *app_fw_nic;
>  
> -	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
> -		return 0;
> -
>  	hw = dev->data->dev_private;
>  	pf_dev = hw->pf_dev;
>  	pci_dev = RTE_ETH_DEV_TO_PCI(dev);
>  	app_fw_nic = NFP_PRIV_TO_APP_FW_NIC(pf_dev->app_fw_priv);
>  
> +	/*
> +	 * In secondary process, a released eth device can be found by its name
> +	 * in shared memory.
> +	 * If the state of the eth device is RTE_ETH_DEV_UNUSED, it means the
> +	 * eth device has been released.
> +	 */
> +	if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
> +		if (dev->state == RTE_ETH_DEV_UNUSED)
> +			return 0;
> +
> +		nfp_pf_secondary_uninit(pf_dev);
> +		return 0;
> +	}
> +
>

Mostly expectation is secondary process doesn't free shared resources,
but init and free done by primary process.

When there are multiple secondaries active, and if one of them closes
the port, will system behave properly? Can you please double check above
logic?
  
Chaoyong He Dec. 1, 2023, 3 a.m. UTC | #2
> On 11/30/2023 8:52 AM, Chaoyong He wrote:
> > Fix the resource leak problem in the exit logic of CoreNIC firmware.
> >
> > Fixes: 646ea79ce481 ("net/nfp: move PF functions into its own file")
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Chaoyong He <chaoyong.he@corigine.com>
> > Reviewed-by: Long Wu <long.wu@corigine.com>
> > Reviewed-by: Peng Zhang <peng.zhang@corigine.com>
> 
> <...>
> 
> > +static int
> > +nfp_pf_secondary_uninit(struct nfp_pf_dev *pf_dev) {
> > +	free(pf_dev->sym_tbl);
> > +	rte_free(pf_dev);
> > +
> > +	return 0;
> > +}
> > +
> >  /* Reset and stop device. The device can not be restarted. */  static
> > int  nfp_net_close(struct rte_eth_dev *dev) @@ -333,14 +381,25 @@
> > nfp_net_close(struct rte_eth_dev *dev)
> >  	struct rte_pci_device *pci_dev;
> >  	struct nfp_app_fw_nic *app_fw_nic;
> >
> > -	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
> > -		return 0;
> > -
> >  	hw = dev->data->dev_private;
> >  	pf_dev = hw->pf_dev;
> >  	pci_dev = RTE_ETH_DEV_TO_PCI(dev);
> >  	app_fw_nic = NFP_PRIV_TO_APP_FW_NIC(pf_dev->app_fw_priv);
> >
> > +	/*
> > +	 * In secondary process, a released eth device can be found by its
> name
> > +	 * in shared memory.
> > +	 * If the state of the eth device is RTE_ETH_DEV_UNUSED, it means the
> > +	 * eth device has been released.
> > +	 */
> > +	if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
> > +		if (dev->state == RTE_ETH_DEV_UNUSED)
> > +			return 0;
> > +
> > +		nfp_pf_secondary_uninit(pf_dev);
> > +		return 0;
> > +	}
> > +
> >
> 
> Mostly expectation is secondary process doesn't free shared resources, but
> init and free done by primary process.

I agree.
Maybe the comment here make reader a little confused.
But the `nfp_pf_secondary_uninit()` does not free any shared resources, it only free two memory which private to each secondary process.

> When there are multiple secondaries active, and if one of them closes the port,
> will system behave properly? Can you please double check above logic?

Yes, the system behave properly.
  
Ferruh Yigit Dec. 1, 2023, 9:41 a.m. UTC | #3
On 12/1/2023 3:00 AM, Chaoyong He wrote:
>> On 11/30/2023 8:52 AM, Chaoyong He wrote:
>>> Fix the resource leak problem in the exit logic of CoreNIC firmware.
>>>
>>> Fixes: 646ea79ce481 ("net/nfp: move PF functions into its own file")
>>> Cc: stable@dpdk.org
>>>
>>> Signed-off-by: Chaoyong He <chaoyong.he@corigine.com>
>>> Reviewed-by: Long Wu <long.wu@corigine.com>
>>> Reviewed-by: Peng Zhang <peng.zhang@corigine.com>
>>
>> <...>
>>
>>> +static int
>>> +nfp_pf_secondary_uninit(struct nfp_pf_dev *pf_dev) {
>>> +	free(pf_dev->sym_tbl);
>>> +	rte_free(pf_dev);
>>> +
>>> +	return 0;
>>> +}
>>> +
>>>  /* Reset and stop device. The device can not be restarted. */  static
>>> int  nfp_net_close(struct rte_eth_dev *dev) @@ -333,14 +381,25 @@
>>> nfp_net_close(struct rte_eth_dev *dev)
>>>  	struct rte_pci_device *pci_dev;
>>>  	struct nfp_app_fw_nic *app_fw_nic;
>>>
>>> -	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
>>> -		return 0;
>>> -
>>>  	hw = dev->data->dev_private;
>>>  	pf_dev = hw->pf_dev;
>>>  	pci_dev = RTE_ETH_DEV_TO_PCI(dev);
>>>  	app_fw_nic = NFP_PRIV_TO_APP_FW_NIC(pf_dev->app_fw_priv);
>>>
>>> +	/*
>>> +	 * In secondary process, a released eth device can be found by its
>> name
>>> +	 * in shared memory.
>>> +	 * If the state of the eth device is RTE_ETH_DEV_UNUSED, it means the
>>> +	 * eth device has been released.
>>> +	 */
>>> +	if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
>>> +		if (dev->state == RTE_ETH_DEV_UNUSED)
>>> +			return 0;
>>> +
>>> +		nfp_pf_secondary_uninit(pf_dev);
>>> +		return 0;
>>> +	}
>>> +
>>>
>>
>> Mostly expectation is secondary process doesn't free shared resources, but
>> init and free done by primary process.
> 
> I agree.
> Maybe the comment here make reader a little confused.
> But the `nfp_pf_secondary_uninit()` does not free any shared resources, it only free two memory which private to each secondary process.
> 

What freed is not process private, it is in the shared memory:

 	hw = dev->data->dev_private;
 	pf_dev = hw->pf_dev;
	rte_free(pf_dev);


And when there are multiple secondaries, one of them frees `pf_dev`, how
this is not effecting others that may use `pf_dev`?


>> When there are multiple secondaries active, and if one of them closes the port,
>> will system behave properly? Can you please double check above logic?
> 
> Yes, the system behave properly.
>
  
Chaoyong He Dec. 1, 2023, 10:03 a.m. UTC | #4
> On 12/1/2023 3:00 AM, Chaoyong He wrote:
> >> On 11/30/2023 8:52 AM, Chaoyong He wrote:
> >>> Fix the resource leak problem in the exit logic of CoreNIC firmware.
> >>>
> >>> Fixes: 646ea79ce481 ("net/nfp: move PF functions into its own file")
> >>> Cc: stable@dpdk.org
> >>>
> >>> Signed-off-by: Chaoyong He <chaoyong.he@corigine.com>
> >>> Reviewed-by: Long Wu <long.wu@corigine.com>
> >>> Reviewed-by: Peng Zhang <peng.zhang@corigine.com>
> >>
> >> <...>
> >>
> >>> +static int
> >>> +nfp_pf_secondary_uninit(struct nfp_pf_dev *pf_dev) {
> >>> +	free(pf_dev->sym_tbl);
> >>> +	rte_free(pf_dev);
> >>> +
> >>> +	return 0;
> >>> +}
> >>> +
> >>>  /* Reset and stop device. The device can not be restarted. */
> >>> static int  nfp_net_close(struct rte_eth_dev *dev) @@ -333,14
> >>> +381,25 @@ nfp_net_close(struct rte_eth_dev *dev)
> >>>  	struct rte_pci_device *pci_dev;
> >>>  	struct nfp_app_fw_nic *app_fw_nic;
> >>>
> >>> -	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
> >>> -		return 0;
> >>> -
> >>>  	hw = dev->data->dev_private;
> >>>  	pf_dev = hw->pf_dev;
> >>>  	pci_dev = RTE_ETH_DEV_TO_PCI(dev);
> >>>  	app_fw_nic = NFP_PRIV_TO_APP_FW_NIC(pf_dev->app_fw_priv);
> >>>
> >>> +	/*
> >>> +	 * In secondary process, a released eth device can be found by its
> >> name
> >>> +	 * in shared memory.
> >>> +	 * If the state of the eth device is RTE_ETH_DEV_UNUSED, it means the
> >>> +	 * eth device has been released.
> >>> +	 */
> >>> +	if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
> >>> +		if (dev->state == RTE_ETH_DEV_UNUSED)
> >>> +			return 0;
> >>> +
> >>> +		nfp_pf_secondary_uninit(pf_dev);
> >>> +		return 0;
> >>> +	}
> >>> +
> >>>
> >>
> >> Mostly expectation is secondary process doesn't free shared
> >> resources, but init and free done by primary process.
> >
> > I agree.
> > Maybe the comment here make reader a little confused.
> > But the `nfp_pf_secondary_uninit()` does not free any shared resources, it
> only free two memory which private to each secondary process.
> >
> 
> What freed is not process private, it is in the shared memory:
> 
>  	hw = dev->data->dev_private;
>  	pf_dev = hw->pf_dev;
> 	rte_free(pf_dev);
> 
> 
> And when there are multiple secondaries, one of them frees `pf_dev`, how
> this is not effecting others that may use `pf_dev`?

Oh, I see what you mean now.
I will a v2 patch to fix this.
Thanks.

> 
> >> When there are multiple secondaries active, and if one of them closes
> >> the port, will system behave properly? Can you please double check above
> logic?
> >
> > Yes, the system behave properly.
> >
  

Patch

diff --git a/drivers/net/nfp/nfp_ethdev.c b/drivers/net/nfp/nfp_ethdev.c
index bb0ddf3d54..5c5e658671 100644
--- a/drivers/net/nfp/nfp_ethdev.c
+++ b/drivers/net/nfp/nfp_ethdev.c
@@ -322,6 +322,54 @@  nfp_net_uninit(struct rte_eth_dev *eth_dev)
 		nfp_cpp_area_release_free(net_hw->mac_stats_area);
 }
 
+static void
+nfp_cleanup_port_app_fw_nic(struct nfp_pf_dev *pf_dev,
+		uint8_t id)
+{
+	struct rte_eth_dev *eth_dev;
+	struct nfp_app_fw_nic *app_fw_nic;
+
+	app_fw_nic = pf_dev->app_fw_priv;
+	if (app_fw_nic->ports[id] != NULL) {
+		eth_dev = app_fw_nic->ports[id]->eth_dev;
+		if (eth_dev != NULL)
+			nfp_net_uninit(eth_dev);
+
+		app_fw_nic->ports[id] = NULL;
+	}
+}
+
+static void
+nfp_uninit_app_fw_nic(struct nfp_pf_dev *pf_dev)
+{
+	nfp_cpp_area_release_free(pf_dev->ctrl_area);
+	rte_free(pf_dev->app_fw_priv);
+}
+
+void
+nfp_pf_uninit(struct nfp_pf_dev *pf_dev)
+{
+	nfp_cpp_area_release_free(pf_dev->qc_area);
+	free(pf_dev->sym_tbl);
+	if (pf_dev->multi_pf.enabled) {
+		nfp_net_keepalive_stop(&pf_dev->multi_pf);
+		nfp_net_keepalive_uninit(&pf_dev->multi_pf);
+	}
+	free(pf_dev->nfp_eth_table);
+	free(pf_dev->hwinfo);
+	nfp_cpp_free(pf_dev->cpp);
+	rte_free(pf_dev);
+}
+
+static int
+nfp_pf_secondary_uninit(struct nfp_pf_dev *pf_dev)
+{
+	free(pf_dev->sym_tbl);
+	rte_free(pf_dev);
+
+	return 0;
+}
+
 /* Reset and stop device. The device can not be restarted. */
 static int
 nfp_net_close(struct rte_eth_dev *dev)
@@ -333,14 +381,25 @@  nfp_net_close(struct rte_eth_dev *dev)
 	struct rte_pci_device *pci_dev;
 	struct nfp_app_fw_nic *app_fw_nic;
 
-	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
-		return 0;
-
 	hw = dev->data->dev_private;
 	pf_dev = hw->pf_dev;
 	pci_dev = RTE_ETH_DEV_TO_PCI(dev);
 	app_fw_nic = NFP_PRIV_TO_APP_FW_NIC(pf_dev->app_fw_priv);
 
+	/*
+	 * In secondary process, a released eth device can be found by its name
+	 * in shared memory.
+	 * If the state of the eth device is RTE_ETH_DEV_UNUSED, it means the
+	 * eth device has been released.
+	 */
+	if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
+		if (dev->state == RTE_ETH_DEV_UNUSED)
+			return 0;
+
+		nfp_pf_secondary_uninit(pf_dev);
+		return 0;
+	}
+
 	/*
 	 * We assume that the DPDK application is stopping all the
 	 * threads/queues before calling the device close function.
@@ -351,16 +410,17 @@  nfp_net_close(struct rte_eth_dev *dev)
 	nfp_net_close_tx_queue(dev);
 	nfp_net_close_rx_queue(dev);
 
-	/* Clear ipsec */
-	nfp_ipsec_uninit(dev);
-
 	/* Cancel possible impending LSC work here before releasing the port */
 	rte_eal_alarm_cancel(nfp_net_dev_interrupt_delayed_handler, (void *)dev);
 
 	/* Only free PF resources after all physical ports have been closed */
 	/* Mark this port as unused and free device priv resources */
 	nn_cfg_writeb(&hw->super, NFP_NET_CFG_LSC, 0xff);
-	app_fw_nic->ports[hw->idx] = NULL;
+
+	if (pf_dev->app_fw_id != NFP_APP_FW_CORE_NIC)
+		return -EINVAL;
+
+	nfp_cleanup_port_app_fw_nic(pf_dev, hw->idx);
 
 	for (i = 0; i < app_fw_nic->total_phyports; i++) {
 		id = nfp_function_id_get(pf_dev, i);
@@ -370,26 +430,16 @@  nfp_net_close(struct rte_eth_dev *dev)
 			return 0;
 	}
 
-	/* Now it is safe to free all PF resources */
-	PMD_INIT_LOG(INFO, "Freeing PF resources");
-	if (pf_dev->multi_pf.enabled) {
-		nfp_net_keepalive_stop(&pf_dev->multi_pf);
-		nfp_net_keepalive_uninit(&pf_dev->multi_pf);
-	}
-	nfp_cpp_area_free(pf_dev->ctrl_area);
-	nfp_cpp_area_free(pf_dev->qc_area);
-	free(pf_dev->hwinfo);
-	free(pf_dev->sym_tbl);
-	nfp_cpp_free(pf_dev->cpp);
-	rte_free(app_fw_nic);
-	rte_free(pf_dev);
-
+	/* Enable in nfp_net_start() */
 	rte_intr_disable(pci_dev->intr_handle);
 
-	/* Unregister callback func from eal lib */
+	/* Register in nfp_net_init() */
 	rte_intr_callback_unregister(pci_dev->intr_handle,
 			nfp_net_dev_interrupt_handler, (void *)dev);
 
+	nfp_uninit_app_fw_nic(pf_dev);
+	nfp_pf_uninit(pf_dev);
+
 	return 0;
 }
 
diff --git a/drivers/net/nfp/nfp_net_common.h b/drivers/net/nfp/nfp_net_common.h
index 30fea7ae02..ded491cbdc 100644
--- a/drivers/net/nfp/nfp_net_common.h
+++ b/drivers/net/nfp/nfp_net_common.h
@@ -272,6 +272,7 @@  int nfp_net_flow_ctrl_get(struct rte_eth_dev *dev,
 		struct rte_eth_fc_conf *fc_conf);
 int nfp_net_flow_ctrl_set(struct rte_eth_dev *dev,
 		struct rte_eth_fc_conf *fc_conf);
+void nfp_pf_uninit(struct nfp_pf_dev *pf_dev);
 
 #define NFP_PRIV_TO_APP_FW_NIC(app_fw_priv)\
 	((struct nfp_app_fw_nic *)app_fw_priv)