[v2] net/bonding: failover of LACP with mode 4 takes long time

Message ID 20220606143445.501-1-gaoxiangliu0@163.com (mailing list archive)
State Accepted
Delegated to: Ferruh Yigit
Headers
Series [v2] net/bonding: failover of LACP with mode 4 takes long time |

Checks

Context Check Description
ci/checkpatch warning coding style issues
ci/github-robot: build success github build: passed
ci/iol-aarch64-unit-testing success Testing PASS
ci/iol-aarch64-compile-testing success Testing PASS
ci/iol-x86_64-unit-testing success Testing PASS
ci/iol-x86_64-compile-testing success Testing PASS
ci/Intel-compilation success Compilation OK
ci/intel-Testing success Testing PASS
ci/iol-abi-testing success Testing PASS

Commit Message

Gaoxiang Liu June 6, 2022, 2:34 p.m. UTC
  When the primary port of bond slaves with bond mode 4 linked down,
the system id of the other slave ports channged.
It may cause some switches to renegotiate,
and the process takes a few seconds. It is not acceptable for any
Telcos.
We need sub-second switchover time like in linux.

Set the mac of the bond port to the slave port's system to solve the
problem.

Bugzilla ID: 551
Fixes: 46fb43683679 ("bond: add mode 4")
Cc: stable@dpdk.org

Signed-off-by: Gaoxiang Liu <liugaoxiang@huawei.com>

---
v2:
* Fixed compile issues.
---
 drivers/net/bonding/rte_eth_bond_8023ad.c | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)
  

Comments

Andrew Rybchenko June 9, 2022, 8:28 a.m. UTC | #1
On 6/6/22 17:34, Gaoxiang Liu wrote:
> When the primary port of bond slaves with bond mode 4 linked down,
> the system id of the other slave ports channged.
> It may cause some switches to renegotiate,
> and the process takes a few seconds. It is not acceptable for any
> Telcos.
> We need sub-second switchover time like in linux.
> 
> Set the mac of the bond port to the slave port's system to solve the
> problem.
> 
> Bugzilla ID: 551
> Fixes: 46fb43683679 ("bond: add mode 4")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Gaoxiang Liu <liugaoxiang@huawei.com>

Should I use E-mail address from Signed-off-by as the author of the
patch? E-mail From does not match it.
  
Ferruh Yigit April 16, 2024, 3:44 p.m. UTC | #2
On 6/6/2022 3:34 PM, Gaoxiang Liu wrote:
> When the primary port of bond slaves with bond mode 4 linked down,
> the system id of the other slave ports channged.
> It may cause some switches to renegotiate,
> and the process takes a few seconds. It is not acceptable for any
> Telcos.
> We need sub-second switchover time like in linux.
> 
> Set the mac of the bond port to the slave port's system to solve the
> problem.
> 
> Bugzilla ID: 551
> Fixes: 46fb43683679 ("bond: add mode 4")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Gaoxiang Liu <liugaoxiang@huawei.com>
> 
>

This is another old patch, as far as I can see from bugzilla, the patch
the issue is reported by Kiran and proposed by Nandu, and shared in mail
list by Gaoxiang.

As it is addressing two bugzilla items, and we are at the beginning of
the release cycle, I am willing to take risk and apply this patch
without ack from a bonding maintainer.
If we found some issues we can revert this back.

Chas, Connor, if you have any objection, or if you need time for review
please let me know, otherwise I will merge the patch soon.

And if anyone can provide test, like Kiran who reported the issue, or
any other stakeholder, it helps a lot to give confidence to the fix.


Thanks,
ferruh
  
Ferruh Yigit April 17, 2024, 1:16 p.m. UTC | #3
On 4/16/2024 4:44 PM, Ferruh Yigit wrote:
> On 6/6/2022 3:34 PM, Gaoxiang Liu wrote:
>> When the primary port of bond slaves with bond mode 4 linked down,
>> the system id of the other slave ports channged.
>> It may cause some switches to renegotiate,
>> and the process takes a few seconds. It is not acceptable for any
>> Telcos.
>> We need sub-second switchover time like in linux.
>>
>> Set the mac of the bond port to the slave port's system to solve the
>> problem.
>>
>> Bugzilla ID: 551
>> Fixes: 46fb43683679 ("bond: add mode 4")
>> Cc: stable@dpdk.org
>>
>> Signed-off-by: Gaoxiang Liu <liugaoxiang@huawei.com>
>>
>>
> 
> This is another old patch, as far as I can see from bugzilla, the patch
> the issue is reported by Kiran and proposed by Nandu, and shared in mail
> list by Gaoxiang.
> 
> As it is addressing two bugzilla items, and we are at the beginning of
> the release cycle, I am willing to take risk and apply this patch
> without ack from a bonding maintainer.
> If we found some issues we can revert this back.
> 
> Chas, Connor, if you have any objection, or if you need time for review
> please let me know, otherwise I will merge the patch soon.
> 
> And if anyone can provide test, like Kiran who reported the issue, or
> any other stakeholder, it helps a lot to give confidence to the fix.
> 

Applied to dpdk-next-net/main, thanks.


Please help on verifying the patch.
  

Patch

diff --git a/drivers/net/bonding/rte_eth_bond_8023ad.c b/drivers/net/bonding/rte_eth_bond_8023ad.c
index b3cddd8a20..8d5486dad3 100644
--- a/drivers/net/bonding/rte_eth_bond_8023ad.c
+++ b/drivers/net/bonding/rte_eth_bond_8023ad.c
@@ -866,7 +866,6 @@  bond_mode_8023ad_periodic_cb(void *arg)
 	struct bond_dev_private *internals = bond_dev->data->dev_private;
 	struct port *port;
 	struct rte_eth_link link_info;
-	struct rte_ether_addr slave_addr;
 	struct rte_mbuf *lacp_pkt = NULL;
 	uint16_t slave_id;
 	uint16_t i;
@@ -893,7 +892,6 @@  bond_mode_8023ad_periodic_cb(void *arg)
 			key = 0;
 		}
 
-		rte_eth_macaddr_get(slave_id, &slave_addr);
 		port = &bond_mode_8023ad_ports[slave_id];
 
 		key = rte_cpu_to_be_16(key);
@@ -905,8 +903,8 @@  bond_mode_8023ad_periodic_cb(void *arg)
 			SM_FLAG_SET(port, NTT);
 		}
 
-		if (!rte_is_same_ether_addr(&port->actor.system, &slave_addr)) {
-			rte_ether_addr_copy(&slave_addr, &port->actor.system);
+		if (!rte_is_same_ether_addr(&internals->mode4.mac_addr, &port->actor.system)) {
+			rte_ether_addr_copy(&internals->mode4.mac_addr, &port->actor.system);
 			if (port->aggregator_port_id == slave_id)
 				SM_FLAG_SET(port, NTT);
 		}
@@ -1172,21 +1170,20 @@  void
 bond_mode_8023ad_mac_address_update(struct rte_eth_dev *bond_dev)
 {
 	struct bond_dev_private *internals = bond_dev->data->dev_private;
-	struct rte_ether_addr slave_addr;
 	struct port *slave, *agg_slave;
 	uint16_t slave_id, i, j;
 
 	bond_mode_8023ad_stop(bond_dev);
 
+	rte_eth_macaddr_get(internals->port_id, &internals->mode4.mac_addr);
 	for (i = 0; i < internals->active_slave_count; i++) {
 		slave_id = internals->active_slaves[i];
 		slave = &bond_mode_8023ad_ports[slave_id];
-		rte_eth_macaddr_get(slave_id, &slave_addr);
 
-		if (rte_is_same_ether_addr(&slave_addr, &slave->actor.system))
+		if (rte_is_same_ether_addr(&internals->mode4.mac_addr, &slave->actor.system))
 			continue;
 
-		rte_ether_addr_copy(&slave_addr, &slave->actor.system);
+		rte_ether_addr_copy(&internals->mode4.mac_addr, &slave->actor.system);
 		/* Do nothing if this port is not an aggregator. In other case
 		 * Set NTT flag on every port that use this aggregator. */
 		if (slave->aggregator_port_id != slave_id)