net/failsafe: fix fd leak

Message ID: 1587984259-18296-1-git-send-email-wangyunjian@huawei.com
State: Accepted, archived
Delegated to: Ferruh Yigit
Series: net/failsafe: fix fd leak

Checks

Context                      Check    Description
ci/checkpatch                success  coding style OK
ci/iol-mellanox-Performance  fail     Performance Testing issues
ci/iol-intel-Performance     fail     Performance Testing issues
ci/iol-nxp-Performance       success  Performance Testing PASS
ci/travis-robot              success  Travis build: passed
ci/iol-testing               success  Testing PASS
ci/Intel-compilation         success  Compilation OK

Commit Message

Yunjian Wang April 27, 2020, 10:44 a.m. UTC
  From: Yunjian Wang <wangyunjian@huawei.com>

Zero is a valid fd. When the fd is zero, it is never closed, which
leaks the fd.

Fixes: f234e5bd996d ("net/failsafe: register slaves Rx interrupts")
Fixes: 9e0360aebf23 ("net/failsafe: register as Rx interrupt mode")
Cc: stable@dpdk.org

Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
---
 drivers/net/failsafe/failsafe_intr.c | 2 +-
 drivers/net/failsafe/failsafe_ops.c  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)
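
A minimal sketch of the comparison this patch changes, isolated from the driver. The helper names fd_is_open_old and fd_is_open_new are hypothetical and do not exist in failsafe; they only reproduce the two tests from the diff so their behaviour on fd 0 can be compared.

```c
#include <stdbool.h>

/* Hypothetical helpers for illustration; not part of the failsafe driver. */

/* Pre-patch test: fd 0 is treated as "nothing to close", so it leaks. */
static bool fd_is_open_old(int fd)
{
	return fd > 0;
}

/* Post-patch test: only negative values mean "no descriptor". */
static bool fd_is_open_new(int fd)
{
	return fd >= 0;
}
```

With fd 0 the old test returns false and close() is skipped, while the new test returns true. Both agree on -1 and on any positive descriptor.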
  

Comments

Gaëtan Rivet April 27, 2020, 11:12 a.m. UTC | #1
On 27/04/20 18:44 +0800, wangyunjian wrote:
> From: Yunjian Wang <wangyunjian@huawei.com>
> 
> Zero is a valid fd. The fd won't be closed thus leading fd leak,
> when it is zero.
> 
> Fixes: f234e5bd996d ("net/failsafe: register slaves Rx interrupts")
> Fixes: 9e0360aebf23 ("net/failsafe: register as Rx interrupt mode")
> Cc: stable@dpdk.org
> 

Hello Yunjian,

Nothing prevents a DPDK app from closing 0 and getting it from
another call, good catch.

> Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>

Acked-by: Gaetan Rivet <grive@u256.net>

> ---
>  drivers/net/failsafe/failsafe_intr.c | 2 +-
>  drivers/net/failsafe/failsafe_ops.c  | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/failsafe/failsafe_intr.c b/drivers/net/failsafe/failsafe_intr.c
> index d8728fe7e..602c04033 100644
> --- a/drivers/net/failsafe/failsafe_intr.c
> +++ b/drivers/net/failsafe/failsafe_intr.c
> @@ -393,7 +393,7 @@ fs_rx_event_proxy_uninstall(struct fs_priv *priv)
>  		free(priv->rxp.evec);
>  		priv->rxp.evec = NULL;
>  	}
> -	if (priv->rxp.efd > 0) {
> +	if (priv->rxp.efd >= 0) {
>  		close(priv->rxp.efd);
>  		priv->rxp.efd = -1;
>  	}
> diff --git a/drivers/net/failsafe/failsafe_ops.c b/drivers/net/failsafe/failsafe_ops.c
> index 50f2aca4e..e1d08e46c 100644
> --- a/drivers/net/failsafe/failsafe_ops.c
> +++ b/drivers/net/failsafe/failsafe_ops.c
> @@ -380,7 +380,7 @@ fs_rx_queue_release(void *queue)
>  	rxq = queue;
>  	dev = &rte_eth_devices[rxq->priv->data->port_id];
>  	fs_lock(dev, 0);
> -	if (rxq->event_fd > 0)
> +	if (rxq->event_fd >= 0)
>  		close(rxq->event_fd);
>  	FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE) {
>  		if (ETH(sdev)->data->rx_queues != NULL &&
> -- 
> 2.19.1
> 
>
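
The remark about an application closing fd 0 and getting it back from a later call can be sketched as follows. POSIX specifies that open() returns the lowest-numbered descriptor not currently open in the process, so once stdin is closed the next open() yields 0. The helper name and the use of /dev/null are assumptions made for illustration, and the sketch assumes a single-threaded process.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical demo helper (not DPDK code): close stdin, open another
 * file, and report which descriptor number was handed back.
 * stdin is restored before returning when it was open to begin with.
 */
static int reopen_after_closing_stdin(void)
{
	int saved = dup(STDIN_FILENO);	/* -1 if stdin was not open */
	int fd, got;

	close(STDIN_FILENO);		/* descriptor 0 is now free */
	fd = open("/dev/null", O_RDONLY);	/* lowest free number is used */
	got = fd;

	if (fd >= 0)
		close(fd);		/* note: ">= 0", since fd may be 0 */
	if (saved >= 0) {
		dup2(saved, STDIN_FILENO);	/* put stdin back on fd 0 */
		close(saved);
	}
	return got;
}
```

Calling reopen_after_closing_stdin() returns 0 on a system where /dev/null can be opened, which is the case the patch is meant to handle.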
  
Ferruh Yigit April 27, 2020, 4:55 p.m. UTC | #2
On 4/27/2020 12:12 PM, Gaëtan Rivet wrote:
> On 27/04/20 18:44 +0800, wangyunjian wrote:
>> From: Yunjian Wang <wangyunjian@huawei.com>
>>
>> Zero is a valid fd. The fd won't be closed thus leading fd leak,
>> when it is zero.
>>
>> Fixes: f234e5bd996d ("net/failsafe: register slaves Rx interrupts")
>> Fixes: 9e0360aebf23 ("net/failsafe: register as Rx interrupt mode")
>> Cc: stable@dpdk.org
>>
> 
> Hello Yunjian,
> 
> Nothing prevents a DPDK app from closing 0 and getting it from
> another call, good catch.
> 
>> Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
> 
> Acked-by: Gaetan Rivet <grive@u256.net>

Applied to dpdk-next-net/master, thanks.
  
Ali Alnubani May 3, 2020, 11:33 a.m. UTC | #3
Hi,

> Applied to dpdk-next-net/master, thanks.

This patch is causing Testpmd to quit when I issue a "port stop" command. Testpmd log:

"""
x86_64-native-linuxapp-gcc/build/app/test-pmd/testpmd -n 4 -- -i --forward-mode=mac
EAL: Detected 8 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: No available hugepages reported in hugepages-1048576kB
EAL: Probing VFIO support...
EAL: PCI device 0002:00:02.0 on NUMA socket 0
EAL:   probe driver: 15b3:1004 net_mlx4
Interactive-mode selected
Set mac packet forwarding mode
Warning: NUMA should be configured manually by using --port-numa-config and --ring-numa-config parameters along with --numa.
testpmd: create a new mbuf pool <mbuf_pool_socket_0>: n=203456, size=2176, socket=0
testpmd: preferred mempool ops selected: ring_mp_mc

Warning! port-topology=paired and odd forward ports number, the last port will pair with itself.

Configuring Port 1 (socket 0)
Port 1: 00:15:5D:26:2B:00
Checking link statuses...
Done
testpmd> port stop 1
Stopping ports...
Checking link statuses...
Done
testpmd>
Stopping port 1...
Stopping ports...
Done

Shutting down port 1...
Closing ports...
Done

Bye...
"""

My terminal is left in a broken state at this point, and I have to reinitialize it with "reset".

- Ali
  
Gaëtan Rivet May 4, 2020, 4:22 p.m. UTC | #4
On 03/05/20 11:33 +0000, Ali Alnubani wrote:
> Hi,
> 
> This patch is causing Testpmd to quit when I issue a "port stop" command. Testpmd log:
> 
> """
> x86_64-native-linuxapp-gcc/build/app/test-pmd/testpmd -n 4 -- -i --forward-mode=mac
> EAL: Detected 8 lcore(s)
> EAL: Detected 1 NUMA nodes
> EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
> EAL: Selected IOVA mode 'PA'
> EAL: No available hugepages reported in hugepages-1048576kB
> EAL: Probing VFIO support...
> EAL: PCI device 0002:00:02.0 on NUMA socket 0
> EAL:   probe driver: 15b3:1004 net_mlx4
> Interactive-mode selected
> Set mac packet forwarding mode
> Warning: NUMA should be configured manually by using --port-numa-config and --ring-numa-config parameters along with --numa.
> testpmd: create a new mbuf pool <mbuf_pool_socket_0>: n=203456, size=2176, socket=0
> testpmd: preferred mempool ops selected: ring_mp_mc
> 
> Warning! port-topology=paired and odd forward ports number, the last port will pair with itself.
> 
> Configuring Port 1 (socket 0)
> Port 1: 00:15:5D:26:2B:00
> Checking link statuses...
> Done
> testpmd> port stop 1
> Stopping ports...
> Checking link statuses...
> Done
> testpmd>
> Stopping port 1...
> Stopping ports...
> Done
> 
> Shutting down port 1...
> Closing ports...
> Done
> 
> Bye...
> """
> 
> My terminal gets broken at this point, and I have to reinitialize it with a "reset".
> 
> - Ali

Hi Ali,

Thanks for the report, I am looking into it.

Are you testing failsafe on Azure?

I see a segfault currently at startup, so in any case there are fixes to be pushed.
I'll see afterward if I need a specific platform to reproduce your bug.

Regards,
  
Stephen Hemminger May 4, 2020, 4:28 p.m. UTC | #5
On Mon, 4 May 2020 18:22:26 +0200
Gaëtan Rivet <grive@u256.net> wrote:

> On 03/05/20 11:33 +0000, Ali Alnubani wrote:
> > Hi,
> >   
> > This patch is causing Testpmd to quit when I issue a "port stop" command. Testpmd log:
> > 
> > """
> > x86_64-native-linuxapp-gcc/build/app/test-pmd/testpmd -n 4 -- -i --forward-mode=mac
> > EAL: Detected 8 lcore(s)
> > EAL: Detected 1 NUMA nodes
> > EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
> > EAL: Selected IOVA mode 'PA'
> > EAL: No available hugepages reported in hugepages-1048576kB
> > EAL: Probing VFIO support...
> > EAL: PCI device 0002:00:02.0 on NUMA socket 0
> > EAL:   probe driver: 15b3:1004 net_mlx4
> > Interactive-mode selected
> > Set mac packet forwarding mode
> > Warning: NUMA should be configured manually by using --port-numa-config and --ring-numa-config parameters along with --numa.
> > testpmd: create a new mbuf pool <mbuf_pool_socket_0>: n=203456, size=2176, socket=0
> > testpmd: preferred mempool ops selected: ring_mp_mc
> > 
> > Warning! port-topology=paired and odd forward ports number, the last port will pair with itself.
> > 
> > Configuring Port 1 (socket 0)
> > Port 1: 00:15:5D:26:2B:00
> > Checking link statuses...
> > Done  
> > testpmd> port stop 1  
> > Stopping ports...
> > Checking link statuses...
> > Done  
> > testpmd>  
> > Stopping port 1...
> > Stopping ports...
> > Done
> > 
> > Shutting down port 1...
> > Closing ports...
> > Done
> > 
> > Bye...
> > """
> > 
> > My terminal gets broken at this point, and I have to reinitialize it with a "reset".

The problem is that you did not blacklist the PCI address of the Mellanox device associated with
your login session (normally this is the PCI device associated with eth0).

By default, DPDK will take over all VF devices it finds as part of the Mellanox device
startup. This means the traffic that was going to the VF associated with eth0 (your ssh)
is now going to DPDK, which is not what you want.

The solution is to use either the blacklist (-b) or the whitelist (-w) option so that DPDK
only takes the PCI devices you want it to use.
  
Ali Alnubani May 5, 2020, 9:14 a.m. UTC | #6
> Hi Ali,
> 
> Thanks for the report, I am looking into it.
> 
> Are you testing failsafe on Azure?

This reproduces with Failsafe, but not necessarily on Azure. You can try to reproduce on any platform if you pass something like '-w 00:00.0 --vdev="net_failsafe0,dev(0000:08:00.0)"'.

> 
> I see a segfault currently at startup, so in any case there are fixes to be
> pushed.
> I'll see afterward if I need a specific platform to reproduce your bug.
> 
> Regards,
> --
> Gaëtan

Regards,
Ali
  
Ali Alnubani May 5, 2020, 9:47 a.m. UTC | #7
Hi Stephen,

> The problem is that you did not blacklist the PCI address of the Mellanox device
> associated with your login session (normally this is the PCI device associated
> with eth0).
> 
> By default, DPDK will take over all VF devices it finds as part of the Mellanox
> device startup. This means the traffic that was going to the VF associated with
> eth0 (your ssh) is now going to DPDK; which is not what you want.
> 
> The solution is to either use blacklist (-b option) or whitelist (-w option) to get
> only the PCI devices you want to be part of the DPDK.

I'm confused.
I don't think my login interface was whitelisted in DPDK: my issue isn't that I lose my connection, but that testpmd quits when I stop a port.

Regards,
Ali
  
Gaëtan Rivet May 5, 2020, 6:35 p.m. UTC | #8
On 05/05/20 09:14 +0000, Ali Alnubani wrote:
> > Hi Ali,
> > 
> > Thanks for the report, I am looking into it.
> > 
> > Are you testing failsafe on Azure?
> 
> This reproduces with Failsafe, but not necessarily on Azure. You can try to reproduce on any platform if you pass something like '-w 00:00.0 --vdev="net_failsafe0,dev(0000:08:00.0)"'.
> 

Hi,

Indeed, I am able to reproduce the issue using this command:
   bash> ./build/app/dpdk-testpmd -n4 -m 4096 --no-huge --vdev='net_failsafe0,dev(net_ring0)' -- -i
(sometimes neither a PCI bus nor hugepages are needed to validate failsafe).

I was asking about Azure because you did not give the command line
options for fail-safe, so I assumed it had been probed automagically.

I made a fix, will send soon.

Regards,
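
The thread does not state the root cause of the reported regression. One explanation that would be consistent with the symptoms (testpmd exiting and the terminal needing a reset) is that a descriptor field left at its zero-initialized value now passes the ">= 0" test, so release closes fd 0, which is normally stdin. This is an assumption, not something confirmed here. The sketch below uses a hypothetical struct demo_rxq, not the real failsafe structures, to show why a zeroed field needs an explicit -1 sentinel once zero counts as a valid fd.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a queue structure; not the real failsafe rxq. */
struct demo_rxq {
	int event_fd;
};

/*
 * Allocate a zeroed queue, as calloc-style allocators do, so event_fd
 * starts at 0. Optionally mark it as "no descriptor yet" with -1.
 */
static struct demo_rxq *demo_rxq_alloc(bool init_sentinel)
{
	struct demo_rxq *q = calloc(1, sizeof(*q));

	if (q != NULL && init_sentinel)
		q->event_fd = -1;
	return q;
}

/* Mirrors the patched test: would a release path close this fd? */
static bool demo_release_would_close(const struct demo_rxq *q)
{
	return q->event_fd >= 0;
}
```

Without the sentinel, demo_release_would_close() is true for a queue that never opened anything. With it, the same test is false until a real descriptor is stored.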
  

Patch

diff --git a/drivers/net/failsafe/failsafe_intr.c b/drivers/net/failsafe/failsafe_intr.c
index d8728fe7e..602c04033 100644
--- a/drivers/net/failsafe/failsafe_intr.c
+++ b/drivers/net/failsafe/failsafe_intr.c
@@ -393,7 +393,7 @@  fs_rx_event_proxy_uninstall(struct fs_priv *priv)
 		free(priv->rxp.evec);
 		priv->rxp.evec = NULL;
 	}
-	if (priv->rxp.efd > 0) {
+	if (priv->rxp.efd >= 0) {
 		close(priv->rxp.efd);
 		priv->rxp.efd = -1;
 	}
diff --git a/drivers/net/failsafe/failsafe_ops.c b/drivers/net/failsafe/failsafe_ops.c
index 50f2aca4e..e1d08e46c 100644
--- a/drivers/net/failsafe/failsafe_ops.c
+++ b/drivers/net/failsafe/failsafe_ops.c
@@ -380,7 +380,7 @@  fs_rx_queue_release(void *queue)
 	rxq = queue;
 	dev = &rte_eth_devices[rxq->priv->data->port_id];
 	fs_lock(dev, 0);
-	if (rxq->event_fd > 0)
+	if (rxq->event_fd >= 0)
 		close(rxq->event_fd);
 	FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE) {
 		if (ETH(sdev)->data->rx_queues != NULL &&