[dpdk-dev] bonding: fix bond link detect in non-interrupt mode

Message ID 1458953090-12764-1-git-send-email-johndale@cisco.com (mailing list archive)
State Accepted, archived
Headers

Commit Message

John Daley (johndale) March 26, 2016, 12:44 a.m. UTC
  From: Nelson Escobar <neescoba@cisco.com>

Stopping then re-starting a bond interface containing slaves that
used polling for link detection caused the bond to think all slave
links were down and inactive.

Move the start of the polling for link from slave_add() to
bond_ethdev_start() and in bond_ethdev_stop() make sure we clear
the last_link_status of the slaves.

Signed-off-by: Nelson Escobar <neescoba@cisco.com>
Signed-off-by: John Daley <johndale@cisco.com>
---
 drivers/net/bonding/rte_eth_bond_pmd.c | 27 +++++++++++++++++----------
 1 file changed, 17 insertions(+), 10 deletions(-)
  

Comments

Thomas Monjalon March 30, 2016, 4:15 p.m. UTC | #1
Anyone to review, please?

2016-03-25 17:44, John Daley:
> From: Nelson Escobar <neescoba@cisco.com>
> 
> Stopping then re-starting a bond interface containing slaves that
> used polling for link detection caused the bond to think all slave
> links were down and inactive.
> 
> Move the start of the polling for link from slave_add() to
> bond_ethdev_start() and in bond_ethdev_stop() make sure we clear
> the last_link_status of the slaves.
> 
> Signed-off-by: Nelson Escobar <neescoba@cisco.com>
> Signed-off-by: John Daley <johndale@cisco.com>
> ---
>  drivers/net/bonding/rte_eth_bond_pmd.c | 27 +++++++++++++++++----------
>  1 file changed, 17 insertions(+), 10 deletions(-)
  
Thomas Monjalon March 31, 2016, 1:40 p.m. UTC | #2
2016-03-25 17:44, John Daley:
> From: Nelson Escobar <neescoba@cisco.com>
> 
> Stopping then re-starting a bond interface containing slaves that
> used polling for link detection caused the bond to think all slave
> links were down and inactive.
> 
> Move the start of the polling for link from slave_add() to
> bond_ethdev_start() and in bond_ethdev_stop() make sure we clear
> the last_link_status of the slaves.
> 
> Signed-off-by: Nelson Escobar <neescoba@cisco.com>
> Signed-off-by: John Daley <johndale@cisco.com>

A "Fixes:" line would be appreciated to know the origin of the bug.
Thanks
  
Nelson Escobar March 31, 2016, 5:33 p.m. UTC | #3
Thomas,

Sorry, I should have included the following fixes line:

Fixes: a45b288ef21a("bond: support link status polling")

Please add it to the commit message when applying if there are no other
problems with the patch.

Nelson.
On 3/31/2016 6:40 AM, Thomas Monjalon wrote:
> 2016-03-25 17:44, John Daley:
>> From: Nelson Escobar <neescoba@cisco.com>
>>
>> Stopping then re-starting a bond interface containing slaves that
>> used polling for link detection caused the bond to think all slave
>> links were down and inactive.
>>
>> Move the start of the polling for link from slave_add() to
>> bond_ethdev_start() and in bond_ethdev_stop() make sure we clear
>> the last_link_status of the slaves.
>>
>> Signed-off-by: Nelson Escobar <neescoba@cisco.com>
>> Signed-off-by: John Daley <johndale@cisco.com>
> 
> A "Fixes:" line would be appreciated to know the origin of the bug.
> Thanks
>
  
Doherty, Declan April 1, 2016, 1:35 p.m. UTC | #4
On 26/03/16 00:44, John Daley wrote:
> From: Nelson Escobar <neescoba@cisco.com>
>
> Stopping then re-starting a bond interface containing slaves that
> used polling for link detection caused the bond to think all slave
> links were down and inactive.
>
> Move the start of the polling for link from slave_add() to
> bond_ethdev_start() and in bond_ethdev_stop() make sure we clear
> the last_link_status of the slaves.
>
> Signed-off-by: Nelson Escobar <neescoba@cisco.com>
> Signed-off-by: John Daley <johndale@cisco.com>
> ---

Acked-by: Declan Doherty <declan.doherty@intel.com>
  

Patch

diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c
index fb26d35..f0960c6 100644
--- a/drivers/net/bonding/rte_eth_bond_pmd.c
+++ b/drivers/net/bonding/rte_eth_bond_pmd.c
@@ -1454,18 +1454,11 @@  slave_add(struct bond_dev_private *internals,
 	slave_details->port_id = slave_eth_dev->data->port_id;
 	slave_details->last_link_status = 0;
 
-	/* If slave device doesn't support interrupts then we need to enabled
-	 * polling to monitor link status */
+	/* Mark slave devices that don't support interrupts so we can
+	 * compensate when we start the bond
+	 */
 	if (!(slave_eth_dev->data->dev_flags & RTE_ETH_DEV_INTR_LSC)) {
 		slave_details->link_status_poll_enabled = 1;
-
-		if (!internals->link_status_polling_enabled) {
-			internals->link_status_polling_enabled = 1;
-
-			rte_eal_alarm_set(internals->link_status_polling_interval_ms * 1000,
-					bond_ethdev_slave_link_status_change_monitor,
-					(void *)&rte_eth_devices[internals->port_id]);
-		}
 	}
 
 	slave_details->link_status_wait_to_complete = 0;
@@ -1550,6 +1543,18 @@  bond_ethdev_start(struct rte_eth_dev *eth_dev)
 					eth_dev->data->port_id, internals->slaves[i].port_id);
 			return -1;
 		}
+		/* We will need to poll for link status if any slave doesn't
+		 * support interrupts
+		 */
+		if (internals->slaves[i].link_status_poll_enabled)
+			internals->link_status_polling_enabled = 1;
+	}
+	/* start polling if needed */
+	if (internals->link_status_polling_enabled) {
+		rte_eal_alarm_set(
+			internals->link_status_polling_interval_ms * 1000,
+			bond_ethdev_slave_link_status_change_monitor,
+			(void *)&rte_eth_devices[internals->port_id]);
 	}
 
 	if (internals->user_defined_primary_port)
@@ -1622,6 +1627,8 @@  bond_ethdev_stop(struct rte_eth_dev *eth_dev)
 
 	internals->active_slave_count = 0;
 	internals->link_status_polling_enabled = 0;
+	for (i = 0; i < internals->slave_count; i++)
+		internals->slaves[i].last_link_status = 0;
 
 	eth_dev->data->dev_link.link_status = 0;
 	eth_dev->data->dev_started = 0;