net/mlx5: fix FD configuration for Rx interrupt

Message ID 20220310131923.1144368-1-michaelba@nvidia.com (mailing list archive)
State Rejected, archived
Delegated to: Raslan Darawsheh
Headers
Series net/mlx5: fix FD configuration for Rx interrupt |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation success Compilation OK
ci/intel-Testing success Testing PASS
ci/iol-mellanox-Performance success Performance Testing PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/iol-intel-Functional success Functional Testing PASS
ci/github-robot: build success github build: passed
ci/iol-aarch64-unit-testing success Testing PASS
ci/iol-aarch64-compile-testing success Testing PASS
ci/iol-x86_64-compile-testing success Testing PASS
ci/iol-x86_64-unit-testing success Testing PASS
ci/iol-abi-testing success Testing PASS

Commit Message

Michael Baum March 10, 2022, 1:19 p.m. UTC
  The mlx5_rx_intr_vec_enable() function allocates queue vector and fill
FD list for Rx interrupts.

The driver wrongly configured the FD with a non-blocking flag which
prevent waiting on this FD.

This patch removes O_NONBLOCK flag adding.

Fixes: 3c7d44af252a ("net/mlx5: support user space Rx interrupt event")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 drivers/net/mlx5/mlx5_rxq.c | 16 +---------------
 1 file changed, 1 insertion(+), 15 deletions(-)
  

Comments

David Marchand March 10, 2022, 3:12 p.m. UTC | #1
On Thu, Mar 10, 2022 at 2:19 PM Michael Baum <michaelba@nvidia.com> wrote:
>
> The mlx5_rx_intr_vec_enable() function allocates queue vector and fill
> FD list for Rx interrupts.
>
> The driver wrongly configured the FD with a non-blocking flag which
> prevent waiting on this FD.
>
> This patch removes O_NONBLOCK flag adding.

- Maybe I deserve a Reported-by: credit on this issue.
I sent a proposal to make use of Rx interrupts in OVS
https://patchwork.ozlabs.org/project/openvswitch/patch/20220304161132.22065-1-david.marchand@redhat.com/.
And that's when I noticed that mlx5 rx fds were waking OVS too often
and reported it to mlx5 maintainers.


- Testing this patch by starting dpdk-l3fwd-power example (and no
traffic sent at all):

# strace -r -f ./dpdk-dir/v21.11/examples/dpdk-l3fwd-power --lcores
0@3,1@5 -a 0000:82:00.0 --in-memory -- -p 0x1 -P --config '(0,0,1)'
...
[pid 534983]      0.000348 epoll_wait(26, [], 1, 10) = 0
[pid 534983]      0.010082 read(24,

For some reason, there is an event available for fd 18 right away
(which is the issue in the first place).
When reading this fd, read() blocks until an actual packet is received.

Then, I send exactly one packet:
[pid 534983]      0.010082 read(24, "@\217:\370\21\0\0\0", 136) = 8
[pid 534983]      9.228478 epoll_wait(26, [], 1, 10) = 0
[pid 534983]      0.010082 read(24,

That makes mlx5 rx interrupts unusable for an application that does
more than just polling one rxq.
  
Thomas Monjalon March 10, 2022, 4:16 p.m. UTC | #2
10/03/2022 16:12, David Marchand:
> On Thu, Mar 10, 2022 at 2:19 PM Michael Baum <michaelba@nvidia.com> wrote:
> >
> > The mlx5_rx_intr_vec_enable() function allocates queue vector and fill
> > FD list for Rx interrupts.
> >
> > The driver wrongly configured the FD with a non-blocking flag which
> > prevent waiting on this FD.
> >
> > This patch removes O_NONBLOCK flag adding.
> 
> - Maybe I deserve a Reported-by: credit on this issue.
> I sent a proposal to make use of Rx interrupts in OVS
> https://patchwork.ozlabs.org/project/openvswitch/patch/20220304161132.22065-1-david.marchand@redhat.com/.
> And that's when I noticed that mlx5 rx fds were waking OVS too often
> and reported it to mlx5 maintainers.
> 
> 
> - Testing this patch by starting dpdk-l3fwd-power example (and no
> traffic sent at all):
> 
> # strace -r -f ./dpdk-dir/v21.11/examples/dpdk-l3fwd-power --lcores
> 0@3,1@5 -a 0000:82:00.0 --in-memory -- -p 0x1 -P --config '(0,0,1)'
> ...
> [pid 534983]      0.000348 epoll_wait(26, [], 1, 10) = 0
> [pid 534983]      0.010082 read(24,
> 
> For some reason, there is an event available for fd 18 right away
> (which is the issue in the first place).
> When reading this fd, read() blocks until an actual packet is received.
> 
> Then, I send exactly one packet:
> [pid 534983]      0.010082 read(24, "@\217:\370\21\0\0\0", 136) = 8
> [pid 534983]      9.228478 epoll_wait(26, [], 1, 10) = 0
> [pid 534983]      0.010082 read(24,
> 
> That makes mlx5 rx interrupts unusable for an application that does
> more than just polling one rxq.

Excuse me, I don't understand this trace.
What is the first read?
Having a second read after epoll_wait is normal if a packet is received,
isn't it?
  
Michael Baum March 10, 2022, 5:04 p.m. UTC | #3
On Thu, Mar 10, 2022 at 6:16 PM Thomas Monjalon <thomas@monjalon.net> wrote: 
> 
> 10/03/2022 16:12, David Marchand:
> > On Thu, Mar 10, 2022 at 2:19 PM Michael Baum <michaelba@nvidia.com>
> wrote:
> > >
> > > The mlx5_rx_intr_vec_enable() function allocates queue vector and
> > > fill FD list for Rx interrupts.
> > >
> > > The driver wrongly configured the FD with a non-blocking flag which
> > > prevent waiting on this FD.
> > >
> > > This patch removes O_NONBLOCK flag adding.
> >
> > - Maybe I deserve a Reported-by: credit on this issue.

Hi David,

You are right, you should have get the credit.

> > I sent a proposal to make use of Rx interrupts in OVS
> >
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpatc
> hwork.ozlabs.org%2Fproject%2Fopenvswitch%2Fpatch%2F20220304161132.2
> 2065-1-
> david.marchand%40redhat.com%2F&amp;data=04%7C01%7Cmichaelba%40n
> vidia.com%7C3679e1e70986425d465b08da02b14b34%7C43083d15727340c1b7
> db39efd9ccc17a%7C0%7C0%7C637825257758477604%7CUnknown%7CTWFpb
> GZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI
> 6Mn0%3D%7C3000&amp;sdata=LamFMZTLgZzQoUDBYAGn2aL9z7RWylWCkn
> q2SGFuKOY%3D&amp;reserved=0.
> > And that's when I noticed that mlx5 rx fds were waking OVS too often
> > and reported it to mlx5 maintainers.
> >
> >
> > - Testing this patch by starting dpdk-l3fwd-power example (and no
> > traffic sent at all):
> >
> > # strace -r -f ./dpdk-dir/v21.11/examples/dpdk-l3fwd-power --lcores
> > 0@3,1@5 -a 0000:82:00.0 --in-memory -- -p 0x1 -P --config '(0,0,1)'
> > ...
> > [pid 534983]      0.000348 epoll_wait(26, [], 1, 10) = 0
> > [pid 534983]      0.010082 read(24,
> >
> > For some reason, there is an event available for fd 18 right away
> > (which is the issue in the first place).

I don't understand who is FD 18, I cannot see it in your log

> > When reading this fd, read() blocks until an actual packet is received.
> >
> > Then, I send exactly one packet:
> > [pid 534983]      0.010082 read(24, "@\217:\370\21\0\0\0", 136) = 8
> > [pid 534983]      9.228478 epoll_wait(26, [], 1, 10) = 0
> > [pid 534983]      0.010082 read(24,
> >
> > That makes mlx5 rx interrupts unusable for an application that does
> > more than just polling one rxq.
> 
> Excuse me, I don't understand this trace.
> What is the first read?
> Having a second read after epoll_wait is normal if a packet is received, isn't it?
>
  
David Marchand March 14, 2022, 9:31 a.m. UTC | #4
On Thu, Mar 10, 2022 at 6:04 PM Michael Baum <michaelba@nvidia.com> wrote:
> > > # strace -r -f ./dpdk-dir/v21.11/examples/dpdk-l3fwd-power --lcores
> > > 0@3,1@5 -a 0000:82:00.0 --in-memory -- -p 0x1 -P --config '(0,0,1)'
> > > ...
> > > [pid 534983]      0.000348 epoll_wait(26, [], 1, 10) = 0
> > > [pid 534983]      0.010082 read(24,
> > >
> > > For some reason, there is an event available for fd 18 right away
> > > (which is the issue in the first place).
>
> I don't understand who is FD 18, I cannot see it in your log

I had refreshed the trace before sending, but did not update the mail,
I meant fd 24.


Now, looking from scratch (rather than my one month old mail),
epoll_wait() returning 0 is normal since it's returning on a 10ms
timeout.
The mlx5 rx fd should be left in non blocking mode.
  
Matan Azrad March 14, 2022, 10:49 a.m. UTC | #5
From: David Marchand
> On Thu, Mar 10, 2022 at 6:04 PM Michael Baum <michaelba@nvidia.com>
> wrote:
> > > > # strace -r -f ./dpdk-dir/v21.11/examples/dpdk-l3fwd-power
> > > > --lcores
> > > > 0@3,1@5 -a 0000:82:00.0 --in-memory -- -p 0x1 -P --config '(0,0,1)'
> > > > ...
> > > > [pid 534983]      0.000348 epoll_wait(26, [], 1, 10) = 0
> > > > [pid 534983]      0.010082 read(24,
> > > >
> > > > For some reason, there is an event available for fd 18 right away
> > > > (which is the issue in the first place).
> >
> > I don't understand who is FD 18, I cannot see it in your log
> 
> I had refreshed the trace before sending, but did not update the mail, I meant fd
> 24.
> 
> 
> Now, looking from scratch (rather than my one month old mail),
> epoll_wait() returning 0 is normal since it's returning on a 10ms timeout.
> The mlx5 rx fd should be left in non blocking mode.

+1



 
> 
> --
> David Marchand
  

Patch

diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index f16795bac3..b38505b85c 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -1075,7 +1075,6 @@  mlx5_rx_intr_vec_enable(struct rte_eth_dev *dev)
 		/* This rxq obj must not be released in this function. */
 		struct mlx5_rxq_priv *rxq = mlx5_rxq_get(dev, i);
 		struct mlx5_rxq_obj *rxq_obj = rxq ? rxq->ctrl->obj : NULL;
-		int rc;
 
 		/* Skip queues that cannot request interrupts. */
 		if (!rxq_obj || (!rxq_obj->ibv_channel &&
@@ -1097,23 +1096,10 @@  mlx5_rx_intr_vec_enable(struct rte_eth_dev *dev)
 			rte_errno = ENOMEM;
 			return -rte_errno;
 		}
-		rc = mlx5_os_set_nonblock_channel_fd(rxq_obj->fd);
-		if (rc < 0) {
-			rte_errno = errno;
-			DRV_LOG(ERR,
-				"port %u failed to make Rx interrupt file"
-				" descriptor %d non-blocking for queue index"
-				" %d",
-				dev->data->port_id, rxq_obj->fd, i);
-			mlx5_rx_intr_vec_disable(dev);
-			return -rte_errno;
-		}
-
 		if (rte_intr_vec_list_index_set(intr_handle, i,
 					RTE_INTR_VEC_RXTX_OFFSET + count))
 			return -rte_errno;
-		if (rte_intr_efds_index_set(intr_handle, count,
-						   rxq_obj->fd))
+		if (rte_intr_efds_index_set(intr_handle, count, rxq_obj->fd))
 			return -rte_errno;
 		count++;
 	}