failsafe: fix segfault on hotplug event

Message ID 20221110163410.12734-1-lucp.at.work@gmail.com (mailing list archive)
State Superseded, archived
Delegated to: Andrew Rybchenko
Headers
Series failsafe: fix segfault on hotplug event |

Checks

Context Check Description
ci/checkpatch warning coding style issues
ci/Intel-compilation success Compilation OK
ci/iol-mellanox-Performance success Performance Testing PASS
ci/intel-Testing success Testing PASS
ci/github-robot: build success github build: passed
ci/iol-intel-Functional success Functional Testing PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/iol-aarch64-unit-testing success Testing PASS
ci/iol-x86_64-unit-testing success Testing PASS
ci/iol-x86_64-compile-testing success Testing PASS
ci/iol-aarch64-compile-testing success Testing PASS

Commit Message

Luc Pelletier Nov. 10, 2022, 4:34 p.m. UTC
  When the failsafe PMD encounters a hotplug event, it switches its rx/tx
functions to "safe" ones that validate the sub-device's rx/tx functions
before calling them. It switches the rx/tx functions by changing the
function pointers in the rte_eth_dev structure.

Following commit 7a0935239b, the rx/tx functions of PMDs are no longer
called through the function pointers in the rte_eth_dev structure. They
are rather called through a flat array named rte_eth_fp_ops. The
function pointers in that array are initialized when the devices start
and are initialized.

When a hotplug event occurs, the function pointers in rte_eth_fp_ops
still point to the "unsafe" rx/tx functions in the failsafe PMD since
they haven't been updated. This results in a segmentation fault because
it ends up using the "unsafe" functions, when the "safe" functions
should have been used.

To fix the problem, the failsafe PMD code was changed to update the
function pointers in the rte_eth_fp_ops array when a hotplug event
occurs.

Fixes: 7a0935239b ("ethdev: make fast-path functions to use new flat array")
Cc: Konstantin Ananyev <konstantin.ananyev@intel.com>
Cc: stable@dpdk.org

Signed-off-by: Luc Pelletier <lucp.at.work@gmail.com>
---
 drivers/net/failsafe/failsafe_rxtx.c | 9 +++++++++
 1 file changed, 9 insertions(+)
  

Comments

Konstantin Ananyev Nov. 16, 2022, 5:35 p.m. UTC | #1
> When the failsafe PMD encounters a hotplug event, it switches its rx/tx
> functions to "safe" ones that validate the sub-device's rx/tx functions
> before calling them. It switches the rx/tx functions by changing the
> function pointers in the rte_eth_dev structure.
> 
> Following commit 7a0935239b, the rx/tx functions of PMDs are no longer
> called through the function pointers in the rte_eth_dev structure. They
> are rather called through a flat array named rte_eth_fp_ops. The
> function pointers in that array are initialized when the devices start
> and are initialized.
> 
> When a hotplug event occurs, the function pointers in rte_eth_fp_ops
> still point to the "unsafe" rx/tx functions in the failsafe PMD since
> they haven't been updated. This results in a segmentation fault because
> it ends up using the "unsafe" functions, when the "safe" functions
> should have been used.
> 
> To fix the problem, the failsafe PMD code was changed to update the
> function pointers in the rte_eth_fp_ops array when a hotplug event
> occurs.

 
It is not recommended way to update rte_eth_fp_ops[] contents directly.
There are eth_dev_fp_ops_setup()/ eth_dev_fp_ops_reset() that supposed
to be used for that.
About the fix itself - while it might help till some extent,
I think it will not remove the problem completely.
There still remain a race-condition between rte_eth_rx_burst() and failsafe_eth_rmv_event_callback().
Right now DPDK doesn't support switching PMD fast-ops functions (or updating rxq/txq data)
on the fly.
  
> Fixes: 7a0935239b ("ethdev: make fast-path functions to use new flat array")
> Cc: Konstantin Ananyev <konstantin.ananyev@intel.com>
> Cc: stable@dpdk.org
> 
> Signed-off-by: Luc Pelletier <lucp.at.work@gmail.com>
> ---
>  drivers/net/failsafe/failsafe_rxtx.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/drivers/net/failsafe/failsafe_rxtx.c b/drivers/net/failsafe/failsafe_rxtx.c
> index fe67293299..34d59dfbb1 100644
> --- a/drivers/net/failsafe/failsafe_rxtx.c
> +++ b/drivers/net/failsafe/failsafe_rxtx.c
> @@ -5,6 +5,7 @@
> 
>  #include <rte_atomic.h>
>  #include <rte_debug.h>
> +#include <rte_ethdev.h>
>  #include <rte_mbuf.h>
>  #include <ethdev_driver.h>
> 
> @@ -44,9 +45,13 @@ failsafe_set_burst_fn(struct rte_eth_dev *dev, int force_safe)
>  		DEBUG("Using safe RX bursts%s",
>  		      (force_safe ? " (forced)" : ""));
>  		dev->rx_pkt_burst = &failsafe_rx_burst;
> +		rte_eth_fp_ops[dev->data->port_id].rx_pkt_burst =
> +			&failsafe_rx_burst;
>  	} else if (!need_safe && safe_set) {
>  		DEBUG("Using fast RX bursts");
>  		dev->rx_pkt_burst = &failsafe_rx_burst_fast;
> +		rte_eth_fp_ops[dev->data->port_id].rx_pkt_burst =
> +			&failsafe_rx_burst_fast;
>  	}
>  	need_safe = force_safe || fs_tx_unsafe(TX_SUBDEV(dev));
>  	safe_set = (dev->tx_pkt_burst == &failsafe_tx_burst);
> @@ -54,9 +59,13 @@ failsafe_set_burst_fn(struct rte_eth_dev *dev, int force_safe)
>  		DEBUG("Using safe TX bursts%s",
>  		      (force_safe ? " (forced)" : ""));
>  		dev->tx_pkt_burst = &failsafe_tx_burst;
> +		rte_eth_fp_ops[dev->data->port_id].tx_pkt_burst =
> +			&failsafe_tx_burst;
>  	} else if (!need_safe && safe_set) {
>  		DEBUG("Using fast TX bursts");
>  		dev->tx_pkt_burst = &failsafe_tx_burst_fast;
> +		rte_eth_fp_ops[dev->data->port_id].tx_pkt_burst =
> +			&failsafe_tx_burst_fast;
>  	}
>  	rte_wmb();
>  }
> --
> 2.25.1
  
Luc Pelletier Nov. 16, 2022, 9:51 p.m. UTC | #2
Hi Konstantin,

> It is not recommended way to update rte_eth_fp_ops[] contents directly.
> There are eth_dev_fp_ops_setup()/ eth_dev_fp_ops_reset() that supposed
> to be used for that.

Good to know. I see another fix that was made in a different PMD that
does exactly the same thing:

https://github.com/DPDK/dpdk/commit/bcd68b68415172815e55fc67cf3947c0433baf74

CC'ing the authors for awareness.

> About the fix itself - while it might help till some extent,
> I think it will not remove the problem completely.
> There still remain a race-condition between rte_eth_rx_burst() and failsafe_eth_rmv_event_callback().
> Right now DPDK doesn't support switching PMD fast-ops functions (or updating rxq/txq data)
> on the fly.

Thanks for the information. This is very helpful.

Are you saying that the previous code also had that same race
condition? It was only updating the rte_eth_dev structure, but I
assume the problem would have been the same since rte_eth_rx_burst()
in DPDK versions <=20 use the function pointers in rte_eth_dev, not
rte_eth_fp_ops.

Can you think of a possible solution to this problem? I'm happy to
provide a patch to properly fix the problem. Having your guidance
would be extremely helpful.

Thanks!
  
Stephen Hemminger Nov. 16, 2022, 10:25 p.m. UTC | #3
On Wed, 16 Nov 2022 16:51:59 -0500
Luc Pelletier <lucp.at.work@gmail.com> wrote:

> Hi Konstantin,
> 
> > It is not recommended way to update rte_eth_fp_ops[] contents directly.
> > There are eth_dev_fp_ops_setup()/ eth_dev_fp_ops_reset() that supposed
> > to be used for that.  
> 
> Good to know. I see another fix that was made in a different PMD that
> does exactly the same thing:
> 
> https://github.com/DPDK/dpdk/commit/bcd68b68415172815e55fc67cf3947c0433baf74
> 
> CC'ing the authors for awareness.
> 
> > About the fix itself - while it might help till some extent,
> > I think it will not remove the problem completely.
> > There still remain a race-condition between rte_eth_rx_burst() and failsafe_eth_rmv_event_callback().
> > Right now DPDK doesn't support switching PMD fast-ops functions (or updating rxq/txq data)
> > on the fly.  
> 
> Thanks for the information. This is very helpful.
> 
> Are you saying that the previous code also had that same race
> condition? It was only updating the rte_eth_dev structure, but I
> assume the problem would have been the same since rte_eth_rx_burst()
> in DPDK versions <=20 use the function pointers in rte_eth_dev, not
> rte_eth_fp_ops.
> 
> Can you think of a possible solution to this problem? I'm happy to
> provide a patch to properly fix the problem. Having your guidance
> would be extremely helpful.
> 
> Thanks!

Changing burst mode on a running device is not safe because
of lack of locking and/or memory barriers.

Would have been better to not to do this optimization.
Just have one rx_burst/tx_burst function and look at what
ever conditions are present there.
  
Konstantin Ananyev Nov. 18, 2022, 4:31 p.m. UTC | #4
Hi Luc,
 
> > Hi Konstantin,
> >
> > > It is not recommended way to update rte_eth_fp_ops[] contents directly.
> > > There are eth_dev_fp_ops_setup()/ eth_dev_fp_ops_reset() that supposed
> > > to be used for that.
> >
> > Good to know. I see another fix that was made in a different PMD that
> > does exactly the same thing:
> >
> > https://github.com/DPDK/dpdk/commit/bcd68b68415172815e55fc67cf3947c0433baf74
> >
> > CC'ing the authors for awareness.
> >
> > > About the fix itself - while it might help till some extent,
> > > I think it will not remove the problem completely.
> > > There still remain a race-condition between rte_eth_rx_burst() and failsafe_eth_rmv_event_callback().
> > > Right now DPDK doesn't support switching PMD fast-ops functions (or updating rxq/txq data)
> > > on the fly.
> >
> > Thanks for the information. This is very helpful.
> >
> > Are you saying that the previous code also had that same race
> > condition?

Yes, I believe so. 

> It was only updating the rte_eth_dev structure, but I
> > assume the problem would have been the same since rte_eth_rx_burst()
> > in DPDK versions <=20 use the function pointers in rte_eth_dev, not
> > rte_eth_fp_ops.
> >
> > Can you think of a possible solution to this problem? I'm happy to
> > provide a patch to properly fix the problem. Having your guidance
> > would be extremely helpful.
> >
> > Thanks!
> 
> Changing burst mode on a running device is not safe because
> of lack of locking and/or memory barriers.
> 
> Would have been better to not to do this optimization.
> Just have one rx_burst/tx_burst function and look at what
> ever conditions are present there.

I think Stephen is right - within DPDK it is just not possible to switch RX/TX
function on the fly (without some external synchronization).
So the safe way is to always use safe version of RX/TX call.
I personally don't think such few extra checks will affect performance that much.

As another nit: inside failsafe rx_burst functions it is probably better not to access dev->data->rx_queues directly,
but call rte_eth_rx_burst(sdev->sdev_port_id, ...); instead.
Same for TX.

Thanks
Konstantin
  

Patch

diff --git a/drivers/net/failsafe/failsafe_rxtx.c b/drivers/net/failsafe/failsafe_rxtx.c
index fe67293299..34d59dfbb1 100644
--- a/drivers/net/failsafe/failsafe_rxtx.c
+++ b/drivers/net/failsafe/failsafe_rxtx.c
@@ -5,6 +5,7 @@ 
 
 #include <rte_atomic.h>
 #include <rte_debug.h>
+#include <rte_ethdev.h>
 #include <rte_mbuf.h>
 #include <ethdev_driver.h>
 
@@ -44,9 +45,13 @@  failsafe_set_burst_fn(struct rte_eth_dev *dev, int force_safe)
 		DEBUG("Using safe RX bursts%s",
 		      (force_safe ? " (forced)" : ""));
 		dev->rx_pkt_burst = &failsafe_rx_burst;
+		rte_eth_fp_ops[dev->data->port_id].rx_pkt_burst =
+			&failsafe_rx_burst;
 	} else if (!need_safe && safe_set) {
 		DEBUG("Using fast RX bursts");
 		dev->rx_pkt_burst = &failsafe_rx_burst_fast;
+		rte_eth_fp_ops[dev->data->port_id].rx_pkt_burst =
+			&failsafe_rx_burst_fast;
 	}
 	need_safe = force_safe || fs_tx_unsafe(TX_SUBDEV(dev));
 	safe_set = (dev->tx_pkt_burst == &failsafe_tx_burst);
@@ -54,9 +59,13 @@  failsafe_set_burst_fn(struct rte_eth_dev *dev, int force_safe)
 		DEBUG("Using safe TX bursts%s",
 		      (force_safe ? " (forced)" : ""));
 		dev->tx_pkt_burst = &failsafe_tx_burst;
+		rte_eth_fp_ops[dev->data->port_id].tx_pkt_burst =
+			&failsafe_tx_burst;
 	} else if (!need_safe && safe_set) {
 		DEBUG("Using fast TX bursts");
 		dev->tx_pkt_burst = &failsafe_tx_burst_fast;
+		rte_eth_fp_ops[dev->data->port_id].tx_pkt_burst =
+			&failsafe_tx_burst_fast;
 	}
 	rte_wmb();
 }