[v2] net/bonding: failover of LACP with mode 4 takes long time
Checks
Commit Message
When the primary port of bond slaves with bond mode 4 linked down,
the system id of the other slave ports channged.
It may cause some switches to renegotiate,
and the process takes a few seconds. It is not acceptable for any
Telcos.
We need sub-second switchover time like in linux.
Set the mac of the bond port to the slave port's system to solve the
problem.
Bugzilla ID: 551
Fixes: 46fb43683679 ("bond: add mode 4")
Cc: stable@dpdk.org
Signed-off-by: Gaoxiang Liu <liugaoxiang@huawei.com>
---
v2:
* Fixed compile issues.
---
drivers/net/bonding/rte_eth_bond_8023ad.c | 13 +++++--------
1 file changed, 5 insertions(+), 8 deletions(-)
Comments
On 6/6/22 17:34, Gaoxiang Liu wrote:
> When the primary port of bond slaves with bond mode 4 linked down,
> the system id of the other slave ports channged.
> It may cause some switches to renegotiate,
> and the process takes a few seconds. It is not acceptable for any
> Telcos.
> We need sub-second switchover time like in linux.
>
> Set the mac of the bond port to the slave port's system to solve the
> problem.
>
> Bugzilla ID: 551
> Fixes: 46fb43683679 ("bond: add mode 4")
> Cc: stable@dpdk.org
>
> Signed-off-by: Gaoxiang Liu <liugaoxiang@huawei.com>
Should I use E-mail address from Signed-off-by as the author of the
patch? E-mail From does not match it.
On 6/6/2022 3:34 PM, Gaoxiang Liu wrote:
> When the primary port of bond slaves with bond mode 4 linked down,
> the system id of the other slave ports channged.
> It may cause some switches to renegotiate,
> and the process takes a few seconds. It is not acceptable for any
> Telcos.
> We need sub-second switchover time like in linux.
>
> Set the mac of the bond port to the slave port's system to solve the
> problem.
>
> Bugzilla ID: 551
> Fixes: 46fb43683679 ("bond: add mode 4")
> Cc: stable@dpdk.org
>
> Signed-off-by: Gaoxiang Liu <liugaoxiang@huawei.com>
>
>
This is another old patch, as far as I can see from bugzilla, the patch
the issue is reported by Kiran and proposed by Nandu, and shared in mail
list by Gaoxiang.
As it is addressing two bugzilla items, and we are at the beginning of
the release cycle, I am willing to take risk and apply this patch
without ack from a bonding maintainer.
If we found some issues we can revert this back.
Chas, Connor, if you have any objection, or if you need time for review
please let me know, otherwise I will merge the patch soon.
And if anyone can provide test, like Kiran who reported the issue, or
any other stakeholder, it helps a lot to give confidence to the fix.
Thanks,
ferruh
On 4/16/2024 4:44 PM, Ferruh Yigit wrote:
> On 6/6/2022 3:34 PM, Gaoxiang Liu wrote:
>> When the primary port of bond slaves with bond mode 4 linked down,
>> the system id of the other slave ports channged.
>> It may cause some switches to renegotiate,
>> and the process takes a few seconds. It is not acceptable for any
>> Telcos.
>> We need sub-second switchover time like in linux.
>>
>> Set the mac of the bond port to the slave port's system to solve the
>> problem.
>>
>> Bugzilla ID: 551
>> Fixes: 46fb43683679 ("bond: add mode 4")
>> Cc: stable@dpdk.org
>>
>> Signed-off-by: Gaoxiang Liu <liugaoxiang@huawei.com>
>>
>>
>
> This is another old patch, as far as I can see from bugzilla, the patch
> the issue is reported by Kiran and proposed by Nandu, and shared in mail
> list by Gaoxiang.
>
> As it is addressing two bugzilla items, and we are at the beginning of
> the release cycle, I am willing to take risk and apply this patch
> without ack from a bonding maintainer.
> If we found some issues we can revert this back.
>
> Chas, Connor, if you have any objection, or if you need time for review
> please let me know, otherwise I will merge the patch soon.
>
> And if anyone can provide test, like Kiran who reported the issue, or
> any other stakeholder, it helps a lot to give confidence to the fix.
>
Applied to dpdk-next-net/main, thanks.
Please help on verifying the patch.
@@ -866,7 +866,6 @@ bond_mode_8023ad_periodic_cb(void *arg)
struct bond_dev_private *internals = bond_dev->data->dev_private;
struct port *port;
struct rte_eth_link link_info;
- struct rte_ether_addr slave_addr;
struct rte_mbuf *lacp_pkt = NULL;
uint16_t slave_id;
uint16_t i;
@@ -893,7 +892,6 @@ bond_mode_8023ad_periodic_cb(void *arg)
key = 0;
}
- rte_eth_macaddr_get(slave_id, &slave_addr);
port = &bond_mode_8023ad_ports[slave_id];
key = rte_cpu_to_be_16(key);
@@ -905,8 +903,8 @@ bond_mode_8023ad_periodic_cb(void *arg)
SM_FLAG_SET(port, NTT);
}
- if (!rte_is_same_ether_addr(&port->actor.system, &slave_addr)) {
- rte_ether_addr_copy(&slave_addr, &port->actor.system);
+ if (!rte_is_same_ether_addr(&internals->mode4.mac_addr, &port->actor.system)) {
+ rte_ether_addr_copy(&internals->mode4.mac_addr, &port->actor.system);
if (port->aggregator_port_id == slave_id)
SM_FLAG_SET(port, NTT);
}
@@ -1172,21 +1170,20 @@ void
bond_mode_8023ad_mac_address_update(struct rte_eth_dev *bond_dev)
{
struct bond_dev_private *internals = bond_dev->data->dev_private;
- struct rte_ether_addr slave_addr;
struct port *slave, *agg_slave;
uint16_t slave_id, i, j;
bond_mode_8023ad_stop(bond_dev);
+ rte_eth_macaddr_get(internals->port_id, &internals->mode4.mac_addr);
for (i = 0; i < internals->active_slave_count; i++) {
slave_id = internals->active_slaves[i];
slave = &bond_mode_8023ad_ports[slave_id];
- rte_eth_macaddr_get(slave_id, &slave_addr);
- if (rte_is_same_ether_addr(&slave_addr, &slave->actor.system))
+ if (rte_is_same_ether_addr(&internals->mode4.mac_addr, &slave->actor.system))
continue;
- rte_ether_addr_copy(&slave_addr, &slave->actor.system);
+ rte_ether_addr_copy(&internals->mode4.mac_addr, &slave->actor.system);
/* Do nothing if this port is not an aggregator. In other case
* Set NTT flag on every port that use this aggregator. */
if (slave->aggregator_port_id != slave_id)