[dpdk-dev,v3] net/bonding: fix slave add for mode 4

Message ID: 1527783047-18201-1-git-send-email-radu.nicolau@intel.com
State: Superseded, archived
Delegated to: Ferruh Yigit
Checks

Context               Check    Description
ci/checkpatch         success  coding style OK
ci/Intel-compilation  success  Compilation OK

Commit Message

Radu Nicolau May 31, 2018, 4:10 p.m. UTC
Add a call to rte_eth_link_get_nowait on every slave to update
the internal link status struct. Otherwise, adding a slave in mode 4
fails when all the ports are stopped but the link status of only one
of them has been checked.

Fixes: b77d21cc2364 ("ethdev: add link status get/set helper functions")
Bugzilla ID: 52

Signed-off-by: Radu Nicolau <radu.nicolau@intel.com>
---
v3: updated commit msg
v2: add fix and Bugzilla references

 drivers/net/bonding/rte_eth_bond_api.c | 2 ++
 1 file changed, 2 insertions(+)
  

Comments

Chas Williams June 1, 2018, 12:05 a.m. UTC | #1
It's not clear to me that the issue here is the bonding slave add.
You can only add started PMDs.  When a PMD dev start is complete,
the PMD should have a valid link state and the link properties should be
valid.  A few of the PMDs aren't very good about this, particularly the
ones with LSC interrupts.  Those drivers often wait for the first
link interrupt before setting their link status.  So there is a
race where the link state isn't well defined.

And lastly, why do we care what the link state is when adding a
slave?  If the link state changes to down, do we remove the slave?
If the link speed of the slave changes, do we remove the slave?
So this test doesn't make much sense.  For mode 4, you should be
able to add a slave, but if the link state doesn't match what
has been negotiated, then the slave should fail to activate.

  
Radu Nicolau June 1, 2018, 9:59 a.m. UTC | #2
On 6/1/2018 1:05 AM, Chas Williams wrote:
> It's not clear to me that the issue here is the bonding slave add.
> You can only add started PMDs.  When a PMD dev start is complete,
> the PMD should have a valid link state and the link properties should be
> valid.  A few of the PMDs aren't very good about this, particularly the
> ones with LSC interrupts.  Those drivers often wait for the first
> link interrupt before setting their link status.  So there is a
> race where the link state isn't well defined.
Indeed, the source of the problem is that the link state is not properly
reflected across all the ports. The steps that lead to the issue are:

1. The user issues "port stop all" in testpmd; when a port is stopped, the
internal link state bits are cleared (and, from what I can see, the LSC
interrupt will not run).
2. testpmd tries to update the link status on all the ports. It reads the
state of the first port, which updates the bits that were cleared; for a
stopped port, the ixgbe PMD (and probably others) sets the link_autoneg
bit, but all other bits remain cleared.
3. Seeing a link down, testpmd stops checking; now the first port has the
link_autoneg bit set, but all other ports have it cleared.
4. Creating a bonded port in mode 4 then fails because of the
link_autoneg mismatch.

To reiterate, the issue is creating a bonding port from stopped ports
whose link status bits were cleared and then updated only on the first
port. My fix updates the bits on all the ports.
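
The four steps above can be modelled in a few lines of plain C. This is only a sketch under stated assumptions: `model_link` and every `model_*` function are hypothetical stand-ins, not the real `rte_eth_link` struct or any DPDK or testpmd code, and the rule that mode 4 compares speed, duplex and autoneg is assumed from the description in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a port's cached link struct (not rte_eth_link). */
struct model_link {
	unsigned int speed;
	bool duplex_full;
	bool autoneg;
	bool status_up;
};

/* Step 1: stopping a port clears its cached link bits. */
static void model_port_stop(struct model_link *link)
{
	memset(link, 0, sizeof(*link));
}

/* Assumed behaviour of a link query on a stopped port: the link stays
 * down, but the autoneg bit gets set. */
static void model_link_refresh(struct model_link *link)
{
	link->status_up = false;
	link->autoneg = true;
}

/* Steps 2-3: refresh ports in order and stop at the first link that is
 * down. Returns how many ports were actually refreshed. */
static size_t model_check_links(struct model_link *ports, size_t n)
{
	size_t i;

	assert(ports != NULL);
	for (i = 0; i < n; i++) {
		model_link_refresh(&ports[i]);
		if (!ports[i].status_up)
			return i + 1;
	}
	return n;
}

/* Step 4: assumed mode 4 rule, all compared properties must match. */
static bool model_props_match(const struct model_link *a,
			      const struct model_link *b)
{
	return a->speed == b->speed &&
	       a->duplex_full == b->duplex_full &&
	       a->autoneg == b->autoneg;
}

/* Slave add against the first slave's properties; refresh_first models
 * the extra link query that the patch adds before the comparison. */
static bool model_slave_add(const struct model_link *first,
			    struct model_link *slave, bool refresh_first)
{
	if (refresh_first)
		model_link_refresh(slave);
	return model_props_match(first, slave);
}
```

With two stopped ports, `model_check_links` refreshes only the first one, so `model_slave_add` rejects the second port unless `refresh_first` is true.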

>
> And lastly, why do we care what the link state is when adding a
> slave?  If the link state changes to down, do we remove the slave?
> If the link speed of the slave changes, do we remove the slave?
> So this test doesn't make much sense.  For mode 4, you should be
> able to add a slave, but if the link state doesn't match what
> has been negotiated, then the slave should fail to activate.
You are right, I will send an updated patch that checks the slave link
status before activation. This should also solve the initial issue, as
the link state will already be updated by then.
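
A rough model of that alternative, where adding a slave never looks at the link and the comparison happens only at activation, might look like the following. It is a sketch under assumptions: the `sketch_*` types and functions are invented for illustration and do not correspond to the bonding PMD's real structures or to the updated patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_SLAVES 8

/* Hypothetical link properties; not the real rte_eth_link. */
struct sketch_link {
	unsigned int speed;
	bool duplex_full;
	bool autoneg;
	bool status_up;
};

struct sketch_slave {
	struct sketch_link link;
	bool active;
};

struct sketch_bond {
	struct sketch_link negotiated;	/* properties the bond expects */
	struct sketch_slave slaves[SKETCH_MAX_SLAVES];
	size_t slave_count;
};

/* Adding a slave does not inspect its link at all.
 * Returns the new slave index, or -1 if the bond is full. */
static int sketch_bond_slave_add(struct sketch_bond *bond)
{
	assert(bond != NULL);
	if (bond->slave_count >= SKETCH_MAX_SLAVES)
		return -1;
	bond->slaves[bond->slave_count].active = false;
	return (int)bond->slave_count++;
}

/* Activation is where the freshly read link state is compared with what
 * was negotiated; a mismatch or a down link leaves the slave inactive. */
static bool sketch_slave_try_activate(struct sketch_bond *bond, size_t idx,
				      const struct sketch_link *current)
{
	struct sketch_slave *slave;

	assert(bond != NULL && idx < bond->slave_count);
	slave = &bond->slaves[idx];
	slave->link = *current;
	slave->active = current->status_up &&
			current->speed == bond->negotiated.speed &&
			current->duplex_full == bond->negotiated.duplex_full &&
			current->autoneg == bond->negotiated.autoneg;
	return slave->active;
}
```

In this model a stopped port can always be added, and it becomes active only once a later link read reports it up with matching speed, duplex and autoneg.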

  

Patch

diff --git a/drivers/net/bonding/rte_eth_bond_api.c b/drivers/net/bonding/rte_eth_bond_api.c
index d558df8..cad08b9 100644
--- a/drivers/net/bonding/rte_eth_bond_api.c
+++ b/drivers/net/bonding/rte_eth_bond_api.c
@@ -296,6 +296,8 @@  __eth_bond_slave_add_lock_free(uint16_t bonded_port_id, uint16_t slave_port_id)
 		return -1;
 	}
 
+	rte_eth_link_get_nowait(slave_port_id, &link_props);
+
 	slave_add(internals, slave_eth_dev);
 
 	/* We need to store slaves reta_size to be able to synchronize RETA for all