[v2,2/2] app/testpmd:fix testpmd quit failure

Message ID 20220223113251.723692-3-wenxuanx.wu@intel.com (mailing list archive)
State Superseded, archived
Delegated to: Ferruh Yigit
Headers
Series app/testpmd: fix testpmd quit with pf and vfs |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation success Compilation OK
ci/intel-Testing success Testing PASS
ci/github-robot: build success github build: passed
ci/iol-mellanox-Performance success Performance Testing PASS
ci/iol-broadcom-Performance success Performance Testing PASS
ci/iol-broadcom-Functional success Functional Testing PASS
ci/iol-aarch64-unit-testing success Testing PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/iol-intel-Functional success Functional Testing PASS
ci/iol-aarch64-compile-testing success Testing PASS
ci/iol-x86_64-unit-testing success Testing PASS
ci/iol-x86_64-compile-testing success Testing PASS
ci/iol-abi-testing success Testing PASS

Commit Message

Wu, WenxuanX Feb. 23, 2022, 11:32 a.m. UTC
  From: wenxuan wu <wenxuanx.wu@intel.com>

When testpmd start ed with 1 pf and 2 vfs, testpmd quited while vfs 
were still alive would result in failure. Root cause is that pf had
been released already but vfs were still accessing by func 
rte_eth_dev_info_get, which would result in heap-free-after-use error.

By quitting our ports in reverse order to avoid this.And the order is
guaranteed that vf are created after pfs.

Fixes: d3a274ce9dee ("app/testpmd: handle SIGINT and SIGTERM")
Cc: stable@dpdk.org

Signed-off-by: wenxuan wu <wenxuanx.wu@intel.com>
---
 app/test-pmd/testpmd.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
  

Comments

Ferruh Yigit Feb. 23, 2022, 12:09 p.m. UTC | #1
On 2/23/2022 11:32 AM, wenxuanx.wu@intel.com wrote:
> From: wenxuan wu <wenxuanx.wu@intel.com>
> 
> When testpmd start ed with 1 pf and 2 vfs, testpmd quited while vfs
> were still alive would result in failure. Root cause is that pf had
> been released already but vfs were still accessing by func
> rte_eth_dev_info_get, which would result in heap-free-after-use error.
> 
> By quitting our ports in reverse order to avoid this.And the order is
> guaranteed that vf are created after pfs.
> 
> Fixes: d3a274ce9dee ("app/testpmd: handle SIGINT and SIGTERM")
> Cc: stable@dpdk.org
> 
> Signed-off-by: wenxuan wu <wenxuanx.wu@intel.com>
> ---
>   app/test-pmd/testpmd.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
> index e1da961311..698b6d8cc4 100644
> --- a/app/test-pmd/testpmd.c
> +++ b/app/test-pmd/testpmd.c
> @@ -3384,12 +3384,12 @@ pmd_test_exit(void)
>   #endif
>   	if (ports != NULL) {
>   		no_link_check = 1;
> -		RTE_ETH_FOREACH_DEV(pt_id) {
> +		RTE_ETH_FOREACH_DEV_REVERSE(pt_id) {

The main problem with this patch was this logic,
can you please check comment on previous version?

>   			printf("\nStopping port %d...\n", pt_id);
>   			fflush(stdout);
>   			stop_port(pt_id);
>   		}
> -		RTE_ETH_FOREACH_DEV(pt_id) {
> +		RTE_ETH_FOREACH_DEV_REVERSE(pt_id) {
>   			printf("\nShutting down port %d...\n", pt_id);
>   			fflush(stdout);
>   			close_port(pt_id);
  
Wu, WenxuanX March 3, 2022, 1:22 p.m. UTC | #2
I found this meaning in DPDK testplan. 
Note that currently hot-plugging of representor ports is not supported so all the required representors must be specified on the creation of the PF or the trusted VF.
When testpmd is started with pf and vf representors, the order of representor is determined on creation. So it is guaranteed that ,pf is beneath the vf representors, we implemented in a reverse way is acceptable just at present,  depends on when the hot-plugging of representor is supported.  

> -----Original Message-----
> From: Wu, WenxuanX <wenxuanx.wu@intel.com>
> Sent: 2022年2月23日 19:33
> To: Li, Xiaoyun <xiaoyun.li@intel.com>; Yigit, Ferruh
> <ferruh.yigit@intel.com>; dev@dpdk.org
> Cc: Wu, WenxuanX <wenxuanx.wu@intel.com>; stable@dpdk.org
> Subject: [PATCH v2 2/2] app/testpmd:fix testpmd quit failure
> 
> From: wenxuan wu <wenxuanx.wu@intel.com>
> 
> When testpmd start ed with 1 pf and 2 vfs, testpmd quited while vfs were
> still alive would result in failure. Root cause is that pf had been released
> already but vfs were still accessing by func rte_eth_dev_info_get, which
> would result in heap-free-after-use error.
> 
> By quitting our ports in reverse order to avoid this.And the order is
> guaranteed that vf are created after pfs.
> 
> Fixes: d3a274ce9dee ("app/testpmd: handle SIGINT and SIGTERM")
> Cc: stable@dpdk.org
> 
> Signed-off-by: wenxuan wu <wenxuanx.wu@intel.com>
> ---
>  app/test-pmd/testpmd.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index
> e1da961311..698b6d8cc4 100644
> --- a/app/test-pmd/testpmd.c
> +++ b/app/test-pmd/testpmd.c
> @@ -3384,12 +3384,12 @@ pmd_test_exit(void)  #endif
>  	if (ports != NULL) {
>  		no_link_check = 1;
> -		RTE_ETH_FOREACH_DEV(pt_id) {
> +		RTE_ETH_FOREACH_DEV_REVERSE(pt_id) {
>  			printf("\nStopping port %d...\n", pt_id);
>  			fflush(stdout);
>  			stop_port(pt_id);
>  		}
> -		RTE_ETH_FOREACH_DEV(pt_id) {
> +		RTE_ETH_FOREACH_DEV_REVERSE(pt_id) {
>  			printf("\nShutting down port %d...\n", pt_id);
>  			fflush(stdout);
>  			close_port(pt_id);
> --
> 2.25.1
  
Ferruh Yigit March 4, 2022, 4:15 p.m. UTC | #3
On 3/3/2022 1:22 PM, Wu, WenxuanX wrote:

moved down, please don't top post

>> -----Original Message-----
>> From: Wu, WenxuanX <wenxuanx.wu@intel.com>
>> Sent: 2022年2月23日 19:33
>> To: Li, Xiaoyun <xiaoyun.li@intel.com>; Yigit, Ferruh
>> <ferruh.yigit@intel.com>; dev@dpdk.org
>> Cc: Wu, WenxuanX <wenxuanx.wu@intel.com>; stable@dpdk.org
>> Subject: [PATCH v2 2/2] app/testpmd:fix testpmd quit failure
>>
>> From: wenxuan wu <wenxuanx.wu@intel.com>
>>
>> When testpmd start ed with 1 pf and 2 vfs, testpmd quited while vfs were
>> still alive would result in failure. Root cause is that pf had been released
>> already but vfs were still accessing by func rte_eth_dev_info_get, which
>> would result in heap-free-after-use error.
>>
>> By quitting our ports in reverse order to avoid this.And the order is
>> guaranteed that vf are created after pfs.
>>
>> Fixes: d3a274ce9dee ("app/testpmd: handle SIGINT and SIGTERM")
>> Cc: stable@dpdk.org
>>
>> Signed-off-by: wenxuan wu <wenxuanx.wu@intel.com>
>> ---
>>   app/test-pmd/testpmd.c | 4 ++--
>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index
>> e1da961311..698b6d8cc4 100644
>> --- a/app/test-pmd/testpmd.c
>> +++ b/app/test-pmd/testpmd.c
>> @@ -3384,12 +3384,12 @@ pmd_test_exit(void)  #endif
>>   	if (ports != NULL) {
>>   		no_link_check = 1;
>> -		RTE_ETH_FOREACH_DEV(pt_id) {
>> +		RTE_ETH_FOREACH_DEV_REVERSE(pt_id) {
>>   			printf("\nStopping port %d...\n", pt_id);
>>   			fflush(stdout);
>>   			stop_port(pt_id);
>>   		}
>> -		RTE_ETH_FOREACH_DEV(pt_id) {
>> +		RTE_ETH_FOREACH_DEV_REVERSE(pt_id) {
>>   			printf("\nShutting down port %d...\n", pt_id);
>>   			fflush(stdout);
>>   			close_port(pt_id);
>> --
>> 2.25.1
> 
> 
> I found this meaning in DPDK testplan.
> Note that currently hot-plugging of representor ports is not supported so all the required representors must be specified on the creation of the PF or the trusted VF.
> When testpmd is started with pf and vf representors, the order of representor is determined on creation. So it is guaranteed that ,pf is beneath the vf representors, we implemented in a reverse way is acceptable just at present,  depends on when the hot-plugging of representor is supported.
> 

The patch mentions from PF and VFs, and now you are referring
to port representor.
Is the problem related to VF or port representor.

For both, VF and port reporesentor should be closed before
PF, that part is OK. My comment is if reversing port id
traverse will fix the issue or do we need more complex
solution.
Like have APIs to get VF and representor ports from a given
port id, and free them first.
  
Wu, WenxuanX March 9, 2022, 3:07 a.m. UTC | #4
> -----Original Message-----
> From: Yigit, Ferruh <ferruh.yigit@intel.com>
> Sent: 2022年3月5日 0:16
> To: Wu, WenxuanX <wenxuanx.wu@intel.com>; Li, Xiaoyun
> <xiaoyun.li@intel.com>; dev@dpdk.org
> Cc: stable@dpdk.org
> Subject: Re: [PATCH v2 2/2] app/testpmd:fix testpmd quit failure
> 
> On 3/3/2022 1:22 PM, Wu, WenxuanX wrote:
> 
> moved down, please don't top post
> 
> >> -----Original Message-----
> >> From: Wu, WenxuanX <wenxuanx.wu@intel.com>
> >> Sent: 2022年2月23日 19:33
> >> To: Li, Xiaoyun <xiaoyun.li@intel.com>; Yigit, Ferruh
> >> <ferruh.yigit@intel.com>; dev@dpdk.org
> >> Cc: Wu, WenxuanX <wenxuanx.wu@intel.com>; stable@dpdk.org
> >> Subject: [PATCH v2 2/2] app/testpmd:fix testpmd quit failure
> >>
> >> From: wenxuan wu <wenxuanx.wu@intel.com>
> >>
> >> When testpmd start ed with 1 pf and 2 vfs, testpmd quited while vfs
> >> were still alive would result in failure. Root cause is that pf had
> >> been released already but vfs were still accessing by func
> >> rte_eth_dev_info_get, which would result in heap-free-after-use error.
> >>
> >> By quitting our ports in reverse order to avoid this.And the order is
> >> guaranteed that vf are created after pfs.
> >>
> >> Fixes: d3a274ce9dee ("app/testpmd: handle SIGINT and SIGTERM")
> >> Cc: stable@dpdk.org
> >>
> >> Signed-off-by: wenxuan wu <wenxuanx.wu@intel.com>
> >> ---
> >>   app/test-pmd/testpmd.c | 4 ++--
> >>   1 file changed, 2 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index
> >> e1da961311..698b6d8cc4 100644
> >> --- a/app/test-pmd/testpmd.c
> >> +++ b/app/test-pmd/testpmd.c
> >> @@ -3384,12 +3384,12 @@ pmd_test_exit(void)  #endif
> >>   	if (ports != NULL) {
> >>   		no_link_check = 1;
> >> -		RTE_ETH_FOREACH_DEV(pt_id) {
> >> +		RTE_ETH_FOREACH_DEV_REVERSE(pt_id) {
> >>   			printf("\nStopping port %d...\n", pt_id);
> >>   			fflush(stdout);
> >>   			stop_port(pt_id);
> >>   		}
> >> -		RTE_ETH_FOREACH_DEV(pt_id) {
> >> +		RTE_ETH_FOREACH_DEV_REVERSE(pt_id) {
> >>   			printf("\nShutting down port %d...\n", pt_id);
> >>   			fflush(stdout);
> >>   			close_port(pt_id);
> >> --
> >> 2.25.1
> >
> >
> > I found this meaning in DPDK testplan.
> > Note that currently hot-plugging of representor ports is not supported so all
> the required representors must be specified on the creation of the PF or the
> trusted VF.
> > When testpmd is started with pf and vf representors, the order of
> representor is determined on creation. So it is guaranteed that ,pf is beneath
> the vf representors, we implemented in a reverse way is acceptable just at
> present,  depends on when the hot-plugging of representor is supported.
> >
> 
> The patch mentions from PF and VFs, and now you are referring to port
> representor.
> Is the problem related to VF or port representor.
> 
> For both, VF and port reporesentor should be closed before PF, that part is
> OK. My comment is if reversing port id traverse will fix the issue or do we
> need more complex solution.
> Like have APIs to get VF and representor ports from a given port id, and free
> them first.
This patch did fix the issue ,and was verified.But it was rejected.
  
Wu, WenxuanX March 10, 2022, 7:02 a.m. UTC | #5
> -----Original Message-----
> From: Wu, WenxuanX
> Sent: 2022年3月9日 11:07
> To: Yigit, Ferruh <ferruh.yigit@intel.com>; Li, Xiaoyun <xiaoyun.li@intel.com>;
> dev@dpdk.org
> Cc: stable@dpdk.org
> Subject: RE: [PATCH v2 2/2] app/testpmd:fix testpmd quit failure
> 
> 
> 
> > -----Original Message-----
> > From: Yigit, Ferruh <ferruh.yigit@intel.com>
> > Sent: 2022年3月5日 0:16
> > To: Wu, WenxuanX <wenxuanx.wu@intel.com>; Li, Xiaoyun
> > <xiaoyun.li@intel.com>; dev@dpdk.org
> > Cc: stable@dpdk.org
> > Subject: Re: [PATCH v2 2/2] app/testpmd:fix testpmd quit failure
> >
> > On 3/3/2022 1:22 PM, Wu, WenxuanX wrote:
> >
> > moved down, please don't top post
> >
> > >> -----Original Message-----
> > >> From: Wu, WenxuanX <wenxuanx.wu@intel.com>
> > >> Sent: 2022年2月23日 19:33
> > >> To: Li, Xiaoyun <xiaoyun.li@intel.com>; Yigit, Ferruh
> > >> <ferruh.yigit@intel.com>; dev@dpdk.org
> > >> Cc: Wu, WenxuanX <wenxuanx.wu@intel.com>; stable@dpdk.org
> > >> Subject: [PATCH v2 2/2] app/testpmd:fix testpmd quit failure
> > >>
> > >> From: wenxuan wu <wenxuanx.wu@intel.com>
> > >>
> > >> When testpmd start ed with 1 pf and 2 vfs, testpmd quited while vfs
> > >> were still alive would result in failure. Root cause is that pf had
> > >> been released already but vfs were still accessing by func
> > >> rte_eth_dev_info_get, which would result in heap-free-after-use error.
> > >>
> > >> By quitting our ports in reverse order to avoid this.And the order
> > >> is guaranteed that vf are created after pfs.
> > >>
> > >> Fixes: d3a274ce9dee ("app/testpmd: handle SIGINT and SIGTERM")
> > >> Cc: stable@dpdk.org
> > >>
> > >> Signed-off-by: wenxuan wu <wenxuanx.wu@intel.com>
> > >> ---
> > >>   app/test-pmd/testpmd.c | 4 ++--
> > >>   1 file changed, 2 insertions(+), 2 deletions(-)
> > >>
> > >> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index
> > >> e1da961311..698b6d8cc4 100644
> > >> --- a/app/test-pmd/testpmd.c
> > >> +++ b/app/test-pmd/testpmd.c
> > >> @@ -3384,12 +3384,12 @@ pmd_test_exit(void)  #endif
> > >>   	if (ports != NULL) {
> > >>   		no_link_check = 1;
> > >> -		RTE_ETH_FOREACH_DEV(pt_id) {
> > >> +		RTE_ETH_FOREACH_DEV_REVERSE(pt_id) {
> > >>   			printf("\nStopping port %d...\n", pt_id);
> > >>   			fflush(stdout);
> > >>   			stop_port(pt_id);
> > >>   		}
> > >> -		RTE_ETH_FOREACH_DEV(pt_id) {
> > >> +		RTE_ETH_FOREACH_DEV_REVERSE(pt_id) {
> > >>   			printf("\nShutting down port %d...\n", pt_id);
> > >>   			fflush(stdout);
> > >>   			close_port(pt_id);
> > >> --
> > >> 2.25.1
> > >
> > >
> > > I found this meaning in DPDK testplan.
> > > Note that currently hot-plugging of representor ports is not
> > > supported so all
> > the required representors must be specified on the creation of the PF
> > or the trusted VF.
> > > When testpmd is started with pf and vf representors, the order of
> > representor is determined on creation. So it is guaranteed that ,pf is
> > beneath the vf representors, we implemented in a reverse way is
> > acceptable just at present,  depends on when the hot-plugging of
> representor is supported.
> > >
> >
> > The patch mentions from PF and VFs, and now you are referring to port
> > representor.
> > Is the problem related to VF or port representor.
> >
> > For both, VF and port reporesentor should be closed before PF, that
> > part is OK. My comment is if reversing port id traverse will fix the
> > issue or do we need more complex solution.
> > Like have APIs to get VF and representor ports from a given port id,
> > and free them first.
The problem is that when I attempted to use a func like get_representor_info(pid,info); I didn't found this func implemented by driver ,
so I can not get the type of port(VF or PF ) directly by get_representor_info(pid,info),especially when the connected driver is i40e, 
representor_info_get occurred below.

./drivers/net/mlx5/mlx5.c:      .representor_info_get = mlx5_representor_info_get,
./drivers/net/mlx5/mlx5.c:      .representor_info_get = mlx5_representor_info_get,
./drivers/net/mlx5/mlx5.c:      .representor_info_get = mlx5_representor_info_get,
./drivers/net/mlx5/mlx5.h:int mlx5_representor_info_get(struct rte_eth_dev *dev,
./drivers/net/mlx5/mlx5_ethdev.c:mlx5_representor_info_get(struct rte_eth_dev *dev,
./drivers/net/sfc/sfc_ethdev.c:sfc_representor_info_get(struct rte_eth_dev *dev,
./drivers/net/sfc/sfc_ethdev.c: .representor_info_get           = sfc_representor_info_get,
./app/test-pmd/cmdline.c:       ret = rte_eth_representor_info_get(res->cmd_pid, NULL);
./app/test-pmd/cmdline.c:       ret = rte_eth_representor_info_get(res->cmd_pid, info);
./app/test-pmd/testpmd.c:       ret = rte_eth_representor_info_get(pi,&info);
./lib/ethdev/version.map:       rte_eth_representor_info_get;
./lib/ethdev/rte_ethdev.h:int rte_eth_representor_info_get(uint16_t port_id,
./lib/ethdev/ethdev_driver.h:typedef int (*eth_representor_info_get_t)(struct rte_eth_dev *dev,
./lib/ethdev/ethdev_driver.h:   eth_representor_info_get_t representor_info_get;
./lib/ethdev/ethdev_driver.h: * The mapping is retrieved from rte_eth_representor_info_get().
./lib/ethdev/rte_ethdev.c:      ret = rte_eth_representor_info_get(port_id, NULL);
./lib/ethdev/rte_ethdev.c:      ret = rte_eth_representor_info_get(port_id, info);
./lib/ethdev/rte_ethdev.c:rte_eth_representor_info_get(uint16_t port_id,
./lib/ethdev/rte_ethdev.c:      RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->representor_info_get, -ENOTSUP);
./lib/ethdev/rte_ethdev.c:      return eth_err(port_id, (*dev->dev_ops->representor_info_get)(dev, info));
We should focus on changing the logic of is_bonding_slave to avoid this bug ,right ? The procedure of port_close is like this :
FOR_EACH_DEV(pi):
	If Is_bonding_slave(pi)
		Continue;
	.... 
	Etc.
In is_bonding_slave(pi): 

This func get_dev_info(pid,dev_info) would be called to get dev.dev_flag &SLAVE ,this would 
be result in error ,when port sequence is like port 0  PF ,port 1 VF_REPR, port 2 VF_REPR, there is no 
obstacles in closing port 0 ,then pid iterate to port 1.

But port 1 is a VF_REPR which based on port 0 ,when in func is_bonding_slave(port 1), it would
 call get_dev_info(1,dev_info) ,error occurred. Because we use get_dev_info(especially when PF is released )
 not ports[1] which had been pre allocated . would result in this error.
 Two solutions :
                        1. Reverse order to close like I mentioned before (PF and VF_repr  order is determined at creation time with no representor hot_plug).
                        2.change func get_dev_info(pid,info) to  ports[pid)  to get dev_flag
Both can resolve this bug ,which one do you prefer?
  

Patch

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index e1da961311..698b6d8cc4 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -3384,12 +3384,12 @@  pmd_test_exit(void)
 #endif
 	if (ports != NULL) {
 		no_link_check = 1;
-		RTE_ETH_FOREACH_DEV(pt_id) {
+		RTE_ETH_FOREACH_DEV_REVERSE(pt_id) {
 			printf("\nStopping port %d...\n", pt_id);
 			fflush(stdout);
 			stop_port(pt_id);
 		}
-		RTE_ETH_FOREACH_DEV(pt_id) {
+		RTE_ETH_FOREACH_DEV_REVERSE(pt_id) {
 			printf("\nShutting down port %d...\n", pt_id);
 			fflush(stdout);
 			close_port(pt_id);