net/mlx5: fix device removal handler for multiport device

Message ID 1557498982-27734-1-git-send-email-viacheslavo@mellanox.com (mailing list archive)
State Changes Requested, archived
Delegated to: Shahaf Shuler
Series net/mlx5: fix device removal handler for multiport device

Checks

Context                          Check    Description
ci/checkpatch                    success  coding style OK
ci/Intel-compilation             success  Compilation OK
ci/intel-Performance-Testing     success  Performance Testing PASS
ci/mellanox-Performance-Testing  success  Performance Testing PASS

Commit Message

Slava Ovsiienko May 10, 2019, 2:36 p.m. UTC
The IBV_EVENT_DEVICE_FATAL event is generated by the driver once for
the entire multiport Infiniband device, not once for each existing port.
The port index in the event is zero, which causes the device removal
event to be dropped. We should invoke the removal event processing
routine for each port that has a handler installed.

Fixes: 028b2a28c3cb ("net/mlx5: update event handler for multiport IB devices")

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5_ethdev.c | 43 +++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 42 insertions(+), 1 deletion(-)
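
The fan-out described in the commit message can be illustrated outside of
DPDK with a small self-contained sketch. Everything named sketch_* or
SKETCH_* below is hypothetical and exists only for this illustration: the
structure is a simplified stand-in for the shared IB context, the limit is a
stand-in for RTE_MAX_ETHPORTS, and a counter replaces the real removal
callback. The check of the per-device intr_conf.rmv setting done by the patch
is left out for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_ETHPORTS 32u /* stand-in for RTE_MAX_ETHPORTS */
#define SKETCH_MAX_IB_PORTS 4u  /* arbitrary size for this sketch */

/* Hypothetical, simplified stand-in for the shared IB device context. */
struct sketch_shared {
	uint32_t max_port; /* number of IB ports on the device */
	/* Ethdev id per IB port; >= SKETCH_MAX_ETHPORTS means no handler. */
	uint32_t ih_port_id[SKETCH_MAX_IB_PORTS];
};

enum sketch_event { SKETCH_PORT_ERR, SKETCH_DEVICE_FATAL };

/* Counts removal notifications per ethdev id (replaces the real callback). */
static unsigned int sketch_notified[SKETCH_MAX_ETHPORTS];

static void
sketch_notify_removal(uint32_t ethdev_id)
{
	sketch_notified[ethdev_id]++;
}

/*
 * Dispatch one asynchronous event. ib_port is 1-based; zero means the
 * event carries no port. Returns the number of ports notified.
 */
static unsigned int
sketch_dispatch(const struct sketch_shared *sh, enum sketch_event type,
		uint32_t ib_port)
{
	unsigned int n = 0;
	uint32_t i;

	assert(sh != NULL);
	if (ib_port == 0 && type == SKETCH_DEVICE_FATAL) {
		/* Device-wide event: notify every port with a handler. */
		for (i = 0; i < sh->max_port; ++i) {
			if (sh->ih_port_id[i] >= SKETCH_MAX_ETHPORTS)
				continue; /* no handler installed here */
			sketch_notify_removal(sh->ih_port_id[i]);
			++n;
		}
		return n;
	}
	/* Per-port event: drop it if the index or handler is invalid. */
	if (ib_port == 0 || ib_port > sh->max_port ||
	    sh->ih_port_id[ib_port - 1] >= SKETCH_MAX_ETHPORTS)
		return 0;
	if (type == SKETCH_DEVICE_FATAL) {
		sketch_notify_removal(sh->ih_port_id[ib_port - 1]);
		return 1;
	}
	return 0; /* other event types are not modelled in this sketch */
}
```

With three IB ports of which the second has no handler, a DEVICE_FATAL event
with port index zero notifies the two remaining ports, whereas without the
early branch the same event would fall into the invalid-index path and be
dropped.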
  

Comments

Yongseok Koh May 10, 2019, 9:24 p.m. UTC | #1
> On May 10, 2019, at 7:36 AM, Viacheslav Ovsiienko <viacheslavo@mellanox.com> wrote:
> 
> IBV_EVENT_DEVICE_FATAL event is generated by the driver once for
> the entire multiport Infiniband device, not for each existing ports.
> The port index is zero and it causes dropping the device removal
> event. We should invoke the removal event processing routine
> for each port we have installed handler for.
> 
> Fixes: 028b2a28c3cb ("net/mlx5: update event handler for multiport IB devices")
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> ---
> drivers/net/mlx5/mlx5_ethdev.c | 43 +++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 42 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
> index 80ee98f..3962d41 100644
> --- a/drivers/net/mlx5/mlx5_ethdev.c
> +++ b/drivers/net/mlx5/mlx5_ethdev.c
> @@ -1116,6 +1116,35 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
> }
> 
> /**
> + * Handle asynchronous removal event for entire multiport device.
> + *
> + * @param sh
> + *   Infiniband device shared context.
> + */
> +static void
> +mlx5_dev_interrupt_device_fatal(struct mlx5_ibv_shared *sh)
> +{
> +	uint32_t i;
> +
> +	for (i = 0; i < sh->max_port; ++i) {
> +		struct rte_eth_dev *dev;
> +
> +		if (sh->port[i].ih_port_id >= RTE_MAX_ETHPORTS) {
> +			/*
> +			 * Or not existing port either no
> +			 * handler installed for this port.
> +			 */
> +			continue;
> +		}
> +		dev = &rte_eth_devices[sh->port[i].ih_port_id];
> +		assert(dev);
> +		if (dev->data->dev_conf.intr_conf.rmv)
> +			_rte_eth_dev_callback_process
> +				(dev, RTE_ETH_EVENT_INTR_RMV, NULL);
> +	}
> +}
> +
> +/**
>  * Handle shared asynchronous events the NIC (removal event
>  * and link status change). Supports multiport IB device.
>  *
> @@ -1137,6 +1166,16 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
> 			break;
> 		/* Retrieve and check IB port index. */
> 		tmp = (uint32_t)event.element.port_num;
> +		if (!tmp && event.event_type == IBV_EVENT_DEVICE_FATAL) {
> +			/*
> +			 * The DEVICE_FATAL event is called once for
> +			 * entire device without port specifying.
> +			 * We should notify all existing ports.
> +			 */
> +			mlx5_glue->ack_async_event(&event);
> +			mlx5_dev_interrupt_device_fatal(sh);
> +			continue;
> +		}

Then you should remove the previous handling below, right? I mean this if-clause:

		if (event.event_type == IBV_EVENT_DEVICE_FATAL &&

> 		assert(tmp && (tmp <= sh->max_port));
> 		if (!tmp ||
> 		    tmp > sh->max_port ||
> @@ -1146,12 +1185,14 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
> 			 * installed for this port.
> 			 */
> 			mlx5_glue->ack_async_event(&event);
> +			DRV_LOG(DEBUG,
> +				"port %u event type %d on not handled",

This sentence needs fixing. Please be more specific. For example,
				port %u cannot handle an event (type %d) due to invalid IB port index

One more below. That should be something like 
				port %u cannot handle an unknown event (type %d)

Thanks,
Yongseok

> +				tmp, event.event_type);
> 			continue;
> 		}
> 		/* Retrieve ethernet device descriptor. */
> 		tmp = sh->port[tmp - 1].ih_port_id;
> 		dev = &rte_eth_devices[tmp];
> -		tmp = 0;
> 		assert(dev);
> 		if ((event.event_type == IBV_EVENT_PORT_ACTIVE ||
> 		     event.event_type == IBV_EVENT_PORT_ERR) &&
> -- 
> 1.8.3.1
>
  

Patch

diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 80ee98f..3962d41 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -1116,6 +1116,35 @@  int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
 }
 
 /**
+ * Handle asynchronous removal event for entire multiport device.
+ *
+ * @param sh
+ *   Infiniband device shared context.
+ */
+static void
+mlx5_dev_interrupt_device_fatal(struct mlx5_ibv_shared *sh)
+{
+	uint32_t i;
+
+	for (i = 0; i < sh->max_port; ++i) {
+		struct rte_eth_dev *dev;
+
+		if (sh->port[i].ih_port_id >= RTE_MAX_ETHPORTS) {
+			/*
+			 * Or not existing port either no
+			 * handler installed for this port.
+			 */
+			continue;
+		}
+		dev = &rte_eth_devices[sh->port[i].ih_port_id];
+		assert(dev);
+		if (dev->data->dev_conf.intr_conf.rmv)
+			_rte_eth_dev_callback_process
+				(dev, RTE_ETH_EVENT_INTR_RMV, NULL);
+	}
+}
+
+/**
  * Handle shared asynchronous events the NIC (removal event
  * and link status change). Supports multiport IB device.
  *
@@ -1137,6 +1166,16 @@  int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
 			break;
 		/* Retrieve and check IB port index. */
 		tmp = (uint32_t)event.element.port_num;
+		if (!tmp && event.event_type == IBV_EVENT_DEVICE_FATAL) {
+			/*
+			 * The DEVICE_FATAL event is called once for
+			 * entire device without port specifying.
+			 * We should notify all existing ports.
+			 */
+			mlx5_glue->ack_async_event(&event);
+			mlx5_dev_interrupt_device_fatal(sh);
+			continue;
+		}
 		assert(tmp && (tmp <= sh->max_port));
 		if (!tmp ||
 		    tmp > sh->max_port ||
@@ -1146,12 +1185,14 @@  int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
 			 * installed for this port.
 			 */
 			mlx5_glue->ack_async_event(&event);
+			DRV_LOG(DEBUG,
+				"port %u event type %d on not handled",
+				tmp, event.event_type);
 			continue;
 		}
 		/* Retrieve ethernet device descriptor. */
 		tmp = sh->port[tmp - 1].ih_port_id;
 		dev = &rte_eth_devices[tmp];
-		tmp = 0;
 		assert(dev);
 		if ((event.event_type == IBV_EVENT_PORT_ACTIVE ||
 		     event.event_type == IBV_EVENT_PORT_ERR) &&