[V2,1/4] net/bonding: fix non-active slaves aren't stopped

Message ID 20220324030036.4761-2-humin29@huawei.com (mailing list archive)
State Superseded, archived
Delegated to: Ferruh Yigit
Headers
Series bugfix for bonding |

Checks

Context Check Description
ci/checkpatch warning coding style issues

Commit Message

humin (Q) March 24, 2022, 3 a.m. UTC
  From: Huisong Li <lihuisong@huawei.com>

When stopping a bonded port, all slaves should be deactivated. But only
active slaves are stopped. So fix it and do "deactivae_slave()" for active
slaves.

Fixes: 0911d4ec0183 ("net/bonding: fix crash when stopping mode 4 port")
Cc: stable@dpdk.org

Signed-off-by: Huisong Li <lihuisong@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
---
 drivers/net/bonding/rte_eth_bond_pmd.c | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)
  

Comments

Ferruh Yigit April 26, 2022, 6:19 p.m. UTC | #1
On 3/24/2022 3:00 AM, Min Hu (Connor) wrote:
> From: Huisong Li <lihuisong@huawei.com>
> 
> When stopping a bonded port, all slaves should be deactivated. But only

s/deactivated/stopped/ ?

> active slaves are stopped. So fix it and do "deactivae_slave()" for active

s/deactivae_slave()/deactivate_slave()/

> slaves.

Hi Connor,

When a bonding port is closed, is it clear if all slave ports or active 
slave ports should be stopped?

> 
> Fixes: 0911d4ec0183 ("net/bonding: fix crash when stopping mode 4 port")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Huisong Li <lihuisong@huawei.com>
> Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
> ---
>   drivers/net/bonding/rte_eth_bond_pmd.c | 20 +++++++++++---------
>   1 file changed, 11 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c
> index b305b6a35b..469dc71170 100644
> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
> @@ -2118,18 +2118,20 @@ bond_ethdev_stop(struct rte_eth_dev *eth_dev)
>   	internals->link_status_polling_enabled = 0;
>   	for (i = 0; i < internals->slave_count; i++) {
>   		uint16_t slave_id = internals->slaves[i].port_id;
> +
> +		internals->slaves[i].last_link_status = 0;
> +		ret = rte_eth_dev_stop(slave_id);
> +		if (ret != 0) {
> +			RTE_BOND_LOG(ERR, "Failed to stop device on port %u",
> +				     slave_id);
> +			return ret;

Should it return here or try to stop all ports?
What about to record the return status, but keep continue to stop all 
ports. And return error if any of the stop failed?

> +		}
> +
> +		/* active slaves need to deactivate. */

" active slaves need to be deactivated. " ?

>   		if (find_slave_by_id(internals->active_slaves,
>   				internals->active_slave_count, slave_id) !=
> -						internals->active_slave_count) {
> -			internals->slaves[i].last_link_status = 0;
> -			ret = rte_eth_dev_stop(slave_id);
> -			if (ret != 0) {
> -				RTE_BOND_LOG(ERR, "Failed to stop device on port %u",
> -					     slave_id);
> -				return ret;
> -			}
> +				internals->active_slave_count)

I think original indentation for this line is better.

>   			deactivate_slave(eth_dev, slave_id);
> -		}
>   	}
>   
>   	return 0;
  
humin (Q) April 29, 2022, 6:45 a.m. UTC | #2
Hi, Ferruh,

在 2022/4/27 2:19, Ferruh Yigit 写道:
> On 3/24/2022 3:00 AM, Min Hu (Connor) wrote:
>> From: Huisong Li <lihuisong@huawei.com>
>>
>> When stopping a bonded port, all slaves should be deactivated. But only
> 
> s/deactivated/stopped/ ?
not agreed. deactivated and stopped are different state for slave.

> 
>> active slaves are stopped. So fix it and do "deactivae_slave()" for 
>> active
> 
> s/deactivae_slave()/deactivate_slave()/
> 
agreed.

>> slaves.
> 
> Hi Connor,
> 
> When a bonding port is closed, is it clear if all slave ports or active 
> slave ports should be stopped?
Yes, I think all the slave ports should be stopped(or try to be stopped).
> 
>>
>> Fixes: 0911d4ec0183 ("net/bonding: fix crash when stopping mode 4 port")
>> Cc: stable@dpdk.org
>>
>> Signed-off-by: Huisong Li <lihuisong@huawei.com>
>> Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
>> ---
>>   drivers/net/bonding/rte_eth_bond_pmd.c | 20 +++++++++++---------
>>   1 file changed, 11 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c 
>> b/drivers/net/bonding/rte_eth_bond_pmd.c
>> index b305b6a35b..469dc71170 100644
>> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
>> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
>> @@ -2118,18 +2118,20 @@ bond_ethdev_stop(struct rte_eth_dev *eth_dev)
>>       internals->link_status_polling_enabled = 0;
>>       for (i = 0; i < internals->slave_count; i++) {
>>           uint16_t slave_id = internals->slaves[i].port_id;
>> +
>> +        internals->slaves[i].last_link_status = 0;
>> +        ret = rte_eth_dev_stop(slave_id);
>> +        if (ret != 0) {
>> +            RTE_BOND_LOG(ERR, "Failed to stop device on port %u",
>> +                     slave_id);
>> +            return ret;
> 
> Should it return here or try to stop all ports?
> What about to record the return status, but keep continue to stop all 
> ports. And return error if any of the stop failed?
I think no need to do this. APP only see the bonded device. If bonded
device stop failed, it means it works failed. And the number of 
"stopped" successfully slave does not make any sense.

> 
>> +        }
>> +
>> +        /* active slaves need to deactivate. */
> 
> " active slaves need to be deactivated. " ?
agreed.
> 
>>           if (find_slave_by_id(internals->active_slaves,
>>                   internals->active_slave_count, slave_id) !=
>> -                        internals->active_slave_count) {
>> -            internals->slaves[i].last_link_status = 0;
>> -            ret = rte_eth_dev_stop(slave_id);
>> -            if (ret != 0) {
>> -                RTE_BOND_LOG(ERR, "Failed to stop device on port %u",
>> -                         slave_id);
>> -                return ret;
>> -            }
>> +                internals->active_slave_count)
> 
> I think original indentation for this line is better.
> 
agreed.
>>               deactivate_slave(eth_dev, slave_id);
>> -        }
>>       }
>>       return 0;
> 
> .
  
Ferruh Yigit April 29, 2022, 1:31 p.m. UTC | #3
On 4/29/2022 7:45 AM, Min Hu (Connor) wrote:
> Hi, Ferruh,
> 
> 在 2022/4/27 2:19, Ferruh Yigit 写道:
>> On 3/24/2022 3:00 AM, Min Hu (Connor) wrote:
>>> From: Huisong Li <lihuisong@huawei.com>
>>>
>>> When stopping a bonded port, all slaves should be deactivated. But only
>>
>> s/deactivated/stopped/ ?
> not agreed. deactivated and stopped are different state for slave.
> 

Just to clarify the sentences, otherwise I see the 'stopped' and 
'deactivated' states are different.
Next sentences complains that not all ports are stopped: "But only 
active slaves are stopped.", so I thought intention in this sentences to 
claim that all slaves should be stopped (but it mentions all slaves 
should be 'deactivated').
As long as you address the disconnection between two sentences, I don't 
mind the wording.

>>
>>> active slaves are stopped. So fix it and do "deactivae_slave()" for 
>>> active
>>
>> s/deactivae_slave()/deactivate_slave()/
>>
> agreed.
> 
>>> slaves.
>>
>> Hi Connor,
>>
>> When a bonding port is closed, is it clear if all slave ports or 
>> active slave ports should be stopped?
> Yes, I think all the slave ports should be stopped(or try to be stopped).
>>
>>>
>>> Fixes: 0911d4ec0183 ("net/bonding: fix crash when stopping mode 4 port")
>>> Cc: stable@dpdk.org
>>>
>>> Signed-off-by: Huisong Li <lihuisong@huawei.com>
>>> Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
>>> ---
>>>   drivers/net/bonding/rte_eth_bond_pmd.c | 20 +++++++++++---------
>>>   1 file changed, 11 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c 
>>> b/drivers/net/bonding/rte_eth_bond_pmd.c
>>> index b305b6a35b..469dc71170 100644
>>> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
>>> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
>>> @@ -2118,18 +2118,20 @@ bond_ethdev_stop(struct rte_eth_dev *eth_dev)
>>>       internals->link_status_polling_enabled = 0;
>>>       for (i = 0; i < internals->slave_count; i++) {
>>>           uint16_t slave_id = internals->slaves[i].port_id;
>>> +
>>> +        internals->slaves[i].last_link_status = 0;
>>> +        ret = rte_eth_dev_stop(slave_id);
>>> +        if (ret != 0) {
>>> +            RTE_BOND_LOG(ERR, "Failed to stop device on port %u",
>>> +                     slave_id);
>>> +            return ret;
>>
>> Should it return here or try to stop all ports?
>> What about to record the return status, but keep continue to stop all 
>> ports. And return error if any of the stop failed?
> I think no need to do this. APP only see the bonded device. If bonded
> device stop failed, it means it works failed. And the number of 
> "stopped" successfully slave does not make any sense.
> 

OK if trying to stop as much as possible 'slave' devices doesn't make 
sense, we can keep as it is.

Btw, when functions fails at this point, bonding device itself already 
marked as stopped, right? And some of the slave devices may be stopped 
already before failure.
I don't know how confusing this is for the user, that stop() function is 
failed but bonding device state is 'stopped'. I don't know if function 
should recover at least bonding device status (back to started) on 
failure, what do you think?

>>
>>> +        }
>>> +
>>> +        /* active slaves need to deactivate. */
>>
>> " active slaves need to be deactivated. " ?
> agreed.
>>
>>>           if (find_slave_by_id(internals->active_slaves,
>>>                   internals->active_slave_count, slave_id) !=
>>> -                        internals->active_slave_count) {
>>> -            internals->slaves[i].last_link_status = 0;
>>> -            ret = rte_eth_dev_stop(slave_id);
>>> -            if (ret != 0) {
>>> -                RTE_BOND_LOG(ERR, "Failed to stop device on port %u",
>>> -                         slave_id);
>>> -                return ret;
>>> -            }
>>> +                internals->active_slave_count)
>>
>> I think original indentation for this line is better.
>>
> agreed.
>>>               deactivate_slave(eth_dev, slave_id);
>>> -        }
>>>       }
>>>       return 0;
>>
>> .
  
humin (Q) May 3, 2022, 6:54 a.m. UTC | #4
Hi, Ferruh,

在 2022/4/29 21:31, Ferruh Yigit 写道:
> On 4/29/2022 7:45 AM, Min Hu (Connor) wrote:
>> Hi, Ferruh,
>>
>> 在 2022/4/27 2:19, Ferruh Yigit 写道:
>>> On 3/24/2022 3:00 AM, Min Hu (Connor) wrote:
>>>> From: Huisong Li <lihuisong@huawei.com>
>>>>
>>>> When stopping a bonded port, all slaves should be deactivated. But only
>>>
>>> s/deactivated/stopped/ ?
>> not agreed. deactivated and stopped are different state for slave.
>>
> 
> Just to clarify the sentences, otherwise I see the 'stopped' and 
> 'deactivated' states are different.
> Next sentences complains that not all ports are stopped: "But only 
> active slaves are stopped.", so I thought intention in this sentences to 
> claim that all slaves should be stopped (but it mentions all slaves 
> should be 'deactivated').
> As long as you address the disconnection between two sentences, I don't 
> mind the wording.
Actually, there is something wrong with the wording.
Yes, I should take your advice.

> 
>>>
>>>> active slaves are stopped. So fix it and do "deactivae_slave()" for 
>>>> active
>>>
>>> s/deactivae_slave()/deactivate_slave()/
>>>
>> agreed.
>>
>>>> slaves.
>>>
>>> Hi Connor,
>>>
>>> When a bonding port is closed, is it clear if all slave ports or 
>>> active slave ports should be stopped?
>> Yes, I think all the slave ports should be stopped(or try to be stopped).
>>>
>>>>
>>>> Fixes: 0911d4ec0183 ("net/bonding: fix crash when stopping mode 4 
>>>> port")
>>>> Cc: stable@dpdk.org
>>>>
>>>> Signed-off-by: Huisong Li <lihuisong@huawei.com>
>>>> Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
>>>> ---
>>>>   drivers/net/bonding/rte_eth_bond_pmd.c | 20 +++++++++++---------
>>>>   1 file changed, 11 insertions(+), 9 deletions(-)
>>>>
>>>> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c 
>>>> b/drivers/net/bonding/rte_eth_bond_pmd.c
>>>> index b305b6a35b..469dc71170 100644
>>>> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
>>>> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
>>>> @@ -2118,18 +2118,20 @@ bond_ethdev_stop(struct rte_eth_dev *eth_dev)
>>>>       internals->link_status_polling_enabled = 0;
>>>>       for (i = 0; i < internals->slave_count; i++) {
>>>>           uint16_t slave_id = internals->slaves[i].port_id;
>>>> +
>>>> +        internals->slaves[i].last_link_status = 0;
>>>> +        ret = rte_eth_dev_stop(slave_id);
>>>> +        if (ret != 0) {
>>>> +            RTE_BOND_LOG(ERR, "Failed to stop device on port %u",
>>>> +                     slave_id);
>>>> +            return ret;
>>>
>>> Should it return here or try to stop all ports?
>>> What about to record the return status, but keep continue to stop all 
>>> ports. And return error if any of the stop failed?
Well, I am glad you have found something unreasaonable about 'stop'.
Let us see API 'rte_eth_dev_stop'

rte_eth_dev_stop(dev)
{
	....
	dev->data->dev_started = 0;
	ret = (*dev->dev_ops->dev_stop)(dev)
	retur ret;
}
This is unreasaonable. No matter 'dev_ops->dev_stop' succeed or fail,
the state 'dev_started ' will always set to be '0'.

But this does not only influence bonding device but other devices like
eth dev or vdev.
This is the bug in rte ethdev level. I will send another patch to fix
it.


>> I think no need to do this. APP only see the bonded device. If bonded
>> device stop failed, it means it works failed. And the number of 
>> "stopped" successfully slave does not make any sense.
>>
> 
> OK if trying to stop as much as possible 'slave' devices doesn't make 
> sense, we can keep as it is.
> 
> Btw, when functions fails at this point, bonding device itself already 
> marked as stopped, right? And some of the slave devices may be stopped 
> already before failure.
> I don't know how confusing this is for the user, that stop() function is 
> failed but bonding device state is 'stopped'. I don't know if function 
> should recover at least bonding device status (back to started) on 
> failure, what do you think?
> 
>>>
>>>> +        }
>>>> +
>>>> +        /* active slaves need to deactivate. */
>>>
>>> " active slaves need to be deactivated. " ?
>> agreed.
>>>
>>>>           if (find_slave_by_id(internals->active_slaves,
>>>>                   internals->active_slave_count, slave_id) !=
>>>> -                        internals->active_slave_count) {
>>>> -            internals->slaves[i].last_link_status = 0;
>>>> -            ret = rte_eth_dev_stop(slave_id);
>>>> -            if (ret != 0) {
>>>> -                RTE_BOND_LOG(ERR, "Failed to stop device on port %u",
>>>> -                         slave_id);
>>>> -                return ret;
>>>> -            }
>>>> +                internals->active_slave_count)
>>>
>>> I think original indentation for this line is better.
>>>
>> agreed.
>>>>               deactivate_slave(eth_dev, slave_id);
>>>> -        }
>>>>       }
>>>>       return 0;
>>>
>>> .
> 
> .
  
Ferruh Yigit May 3, 2022, 7:04 p.m. UTC | #5
On 5/3/2022 7:54 AM, Min Hu (Connor) wrote:
> Hi, Ferruh,
> 
> 在 2022/4/29 21:31, Ferruh Yigit 写道:
>> On 4/29/2022 7:45 AM, Min Hu (Connor) wrote:
>>> Hi, Ferruh,
>>>
>>> 在 2022/4/27 2:19, Ferruh Yigit 写道:
>>>> On 3/24/2022 3:00 AM, Min Hu (Connor) wrote:
>>>>> From: Huisong Li <lihuisong@huawei.com>
>>>>>
>>>>> When stopping a bonded port, all slaves should be deactivated. But 
>>>>> only
>>>>
>>>> s/deactivated/stopped/ ?
>>> not agreed. deactivated and stopped are different state for slave.
>>>
>>
>> Just to clarify the sentences, otherwise I see the 'stopped' and 
>> 'deactivated' states are different.
>> Next sentences complains that not all ports are stopped: "But only 
>> active slaves are stopped.", so I thought intention in this sentences 
>> to claim that all slaves should be stopped (but it mentions all slaves 
>> should be 'deactivated').
>> As long as you address the disconnection between two sentences, I 
>> don't mind the wording.
> Actually, there is something wrong with the wording.
> Yes, I should take your advice.
> 
>>
>>>>
>>>>> active slaves are stopped. So fix it and do "deactivae_slave()" for 
>>>>> active
>>>>
>>>> s/deactivae_slave()/deactivate_slave()/
>>>>
>>> agreed.
>>>
>>>>> slaves.
>>>>
>>>> Hi Connor,
>>>>
>>>> When a bonding port is closed, is it clear if all slave ports or 
>>>> active slave ports should be stopped?
>>> Yes, I think all the slave ports should be stopped(or try to be 
>>> stopped).
>>>>
>>>>>
>>>>> Fixes: 0911d4ec0183 ("net/bonding: fix crash when stopping mode 4 
>>>>> port")
>>>>> Cc: stable@dpdk.org
>>>>>
>>>>> Signed-off-by: Huisong Li <lihuisong@huawei.com>
>>>>> Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
>>>>> ---
>>>>>   drivers/net/bonding/rte_eth_bond_pmd.c | 20 +++++++++++---------
>>>>>   1 file changed, 11 insertions(+), 9 deletions(-)
>>>>>
>>>>> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c 
>>>>> b/drivers/net/bonding/rte_eth_bond_pmd.c
>>>>> index b305b6a35b..469dc71170 100644
>>>>> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
>>>>> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
>>>>> @@ -2118,18 +2118,20 @@ bond_ethdev_stop(struct rte_eth_dev *eth_dev)
>>>>>       internals->link_status_polling_enabled = 0;
>>>>>       for (i = 0; i < internals->slave_count; i++) {
>>>>>           uint16_t slave_id = internals->slaves[i].port_id;
>>>>> +
>>>>> +        internals->slaves[i].last_link_status = 0;
>>>>> +        ret = rte_eth_dev_stop(slave_id);
>>>>> +        if (ret != 0) {
>>>>> +            RTE_BOND_LOG(ERR, "Failed to stop device on port %u",
>>>>> +                     slave_id);
>>>>> +            return ret;
>>>>
>>>> Should it return here or try to stop all ports?
>>>> What about to record the return status, but keep continue to stop 
>>>> all ports. And return error if any of the stop failed?
> Well, I am glad you have found something unreasaonable about 'stop'.
> Let us see API 'rte_eth_dev_stop'
> 
> rte_eth_dev_stop(dev)
> {
>      ....
>      dev->data->dev_started = 0;
>      ret = (*dev->dev_ops->dev_stop)(dev)
>      retur ret;
> }
> This is unreasaonable. No matter 'dev_ops->dev_stop' succeed or fail,
> the state 'dev_started ' will always set to be '0'.
> 
> But this does not only influence bonding device but other devices like
> eth dev or vdev.
> This is the bug in rte ethdev level. I will send another patch to fix
> it.
> 

Hi Connor,

I agree this is an issue in the API, cc'ed Andrew and Thomas.

I vaguely remember that "dev_started = 0" was required for some dev_ops, 
but not quite sure, let me check this.
At worst we can do as following to be sure:

   dev->data->dev_started = 0;
   ret = (*dev->dev_ops->dev_stop)(dev)
   if (ret)
     dev->data->dev_started = 1;

Also we need to clarify in the API documentation (.h file), what is the 
status of the device if 'rte_eth_dev_stop()' returned error.


Btw, would you be OK to separate this ethdev patch from your bonding 
patch, to not stuck your series because of ethdev one.


> 
>>> I think no need to do this. APP only see the bonded device. If bonded
>>> device stop failed, it means it works failed. And the number of 
>>> "stopped" successfully slave does not make any sense.
>>>
>>
>> OK if trying to stop as much as possible 'slave' devices doesn't make 
>> sense, we can keep as it is.
>>
>> Btw, when functions fails at this point, bonding device itself already 
>> marked as stopped, right? And some of the slave devices may be stopped 
>> already before failure.
>> I don't know how confusing this is for the user, that stop() function 
>> is failed but bonding device state is 'stopped'. I don't know if 
>> function should recover at least bonding device status (back to 
>> started) on failure, what do you think?
>>
>>>>
>>>>> +        }
>>>>> +
>>>>> +        /* active slaves need to deactivate. */
>>>>
>>>> " active slaves need to be deactivated. " ?
>>> agreed.
>>>>
>>>>>           if (find_slave_by_id(internals->active_slaves,
>>>>>                   internals->active_slave_count, slave_id) !=
>>>>> -                        internals->active_slave_count) {
>>>>> -            internals->slaves[i].last_link_status = 0;
>>>>> -            ret = rte_eth_dev_stop(slave_id);
>>>>> -            if (ret != 0) {
>>>>> -                RTE_BOND_LOG(ERR, "Failed to stop device on port %u",
>>>>> -                         slave_id);
>>>>> -                return ret;
>>>>> -            }
>>>>> +                internals->active_slave_count)
>>>>
>>>> I think original indentation for this line is better.
>>>>
>>> agreed.
>>>>>               deactivate_slave(eth_dev, slave_id);
>>>>> -        }
>>>>>       }
>>>>>       return 0;
>>>>
>>>> .
>>
>> .
  
humin (Q) May 5, 2022, 1:16 a.m. UTC | #6
Hi, Ferruh,

在 2022/5/4 3:04, Ferruh Yigit 写道:
> On 5/3/2022 7:54 AM, Min Hu (Connor) wrote:
>> Hi, Ferruh,
>>
>> 在 2022/4/29 21:31, Ferruh Yigit 写道:
>>> On 4/29/2022 7:45 AM, Min Hu (Connor) wrote:
>>>> Hi, Ferruh,
>>>>
>>>> 在 2022/4/27 2:19, Ferruh Yigit 写道:
>>>>> On 3/24/2022 3:00 AM, Min Hu (Connor) wrote:
>>>>>> From: Huisong Li <lihuisong@huawei.com>
>>>>>>
>>>>>> When stopping a bonded port, all slaves should be deactivated. But 
>>>>>> only
>>>>>
>>>>> s/deactivated/stopped/ ?
>>>> not agreed. deactivated and stopped are different state for slave.
>>>>
>>>
>>> Just to clarify the sentences, otherwise I see the 'stopped' and 
>>> 'deactivated' states are different.
>>> Next sentences complains that not all ports are stopped: "But only 
>>> active slaves are stopped.", so I thought intention in this sentences 
>>> to claim that all slaves should be stopped (but it mentions all 
>>> slaves should be 'deactivated').
>>> As long as you address the disconnection between two sentences, I 
>>> don't mind the wording.
>> Actually, there is something wrong with the wording.
>> Yes, I should take your advice.
>>
>>>
>>>>>
>>>>>> active slaves are stopped. So fix it and do "deactivae_slave()" 
>>>>>> for active
>>>>>
>>>>> s/deactivae_slave()/deactivate_slave()/
>>>>>
>>>> agreed.
>>>>
>>>>>> slaves.
>>>>>
>>>>> Hi Connor,
>>>>>
>>>>> When a bonding port is closed, is it clear if all slave ports or 
>>>>> active slave ports should be stopped?
>>>> Yes, I think all the slave ports should be stopped(or try to be 
>>>> stopped).
>>>>>
>>>>>>
>>>>>> Fixes: 0911d4ec0183 ("net/bonding: fix crash when stopping mode 4 
>>>>>> port")
>>>>>> Cc: stable@dpdk.org
>>>>>>
>>>>>> Signed-off-by: Huisong Li <lihuisong@huawei.com>
>>>>>> Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
>>>>>> ---
>>>>>>   drivers/net/bonding/rte_eth_bond_pmd.c | 20 +++++++++++---------
>>>>>>   1 file changed, 11 insertions(+), 9 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c 
>>>>>> b/drivers/net/bonding/rte_eth_bond_pmd.c
>>>>>> index b305b6a35b..469dc71170 100644
>>>>>> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
>>>>>> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
>>>>>> @@ -2118,18 +2118,20 @@ bond_ethdev_stop(struct rte_eth_dev *eth_dev)
>>>>>>       internals->link_status_polling_enabled = 0;
>>>>>>       for (i = 0; i < internals->slave_count; i++) {
>>>>>>           uint16_t slave_id = internals->slaves[i].port_id;
>>>>>> +
>>>>>> +        internals->slaves[i].last_link_status = 0;
>>>>>> +        ret = rte_eth_dev_stop(slave_id);
>>>>>> +        if (ret != 0) {
>>>>>> +            RTE_BOND_LOG(ERR, "Failed to stop device on port %u",
>>>>>> +                     slave_id);
>>>>>> +            return ret;
>>>>>
>>>>> Should it return here or try to stop all ports?
>>>>> What about to record the return status, but keep continue to stop 
>>>>> all ports. And return error if any of the stop failed?
>> Well, I am glad you have found something unreasaonable about 'stop'.
>> Let us see API 'rte_eth_dev_stop'
>>
>> rte_eth_dev_stop(dev)
>> {
>>      ....
>>      dev->data->dev_started = 0;
>>      ret = (*dev->dev_ops->dev_stop)(dev)
>>      retur ret;
>> }
>> This is unreasaonable. No matter 'dev_ops->dev_stop' succeed or fail,
>> the state 'dev_started ' will always set to be '0'.
>>
>> But this does not only influence bonding device but other devices like
>> eth dev or vdev.
>> This is the bug in rte ethdev level. I will send another patch to fix
>> it.
>>
> 
> Hi Connor,
> 
> I agree this is an issue in the API, cc'ed Andrew and Thomas.
> 
> I vaguely remember that "dev_started = 0" was required for some dev_ops, 
> but not quite sure, let me check this.
> At worst we can do as following to be sure:
> 
>    dev->data->dev_started = 0;
>    ret = (*dev->dev_ops->dev_stop)(dev)
>    if (ret)
>      dev->data->dev_started = 1;
> 
> Also we need to clarify in the API documentation (.h file), what is the 
> status of the device if 'rte_eth_dev_stop()' returned error.
> 
> 
> Btw, would you be OK to separate this ethdev patch from your bonding 
> patch, to not stuck your series because of ethdev one.
Yes, this patch can be abandoned from this set.

> 
> 
>>
>>>> I think no need to do this. APP only see the bonded device. If bonded
>>>> device stop failed, it means it works failed. And the number of 
>>>> "stopped" successfully slave does not make any sense.
>>>>
>>>
>>> OK if trying to stop as much as possible 'slave' devices doesn't make 
>>> sense, we can keep as it is.
>>>
>>> Btw, when functions fails at this point, bonding device itself 
>>> already marked as stopped, right? And some of the slave devices may 
>>> be stopped already before failure.
>>> I don't know how confusing this is for the user, that stop() function 
>>> is failed but bonding device state is 'stopped'. I don't know if 
>>> function should recover at least bonding device status (back to 
>>> started) on failure, what do you think?
>>>
>>>>>
>>>>>> +        }
>>>>>> +
>>>>>> +        /* active slaves need to deactivate. */
>>>>>
>>>>> " active slaves need to be deactivated. " ?
>>>> agreed.
>>>>>
>>>>>>           if (find_slave_by_id(internals->active_slaves,
>>>>>>                   internals->active_slave_count, slave_id) !=
>>>>>> -                        internals->active_slave_count) {
>>>>>> -            internals->slaves[i].last_link_status = 0;
>>>>>> -            ret = rte_eth_dev_stop(slave_id);
>>>>>> -            if (ret != 0) {
>>>>>> -                RTE_BOND_LOG(ERR, "Failed to stop device on port 
>>>>>> %u",
>>>>>> -                         slave_id);
>>>>>> -                return ret;
>>>>>> -            }
>>>>>> +                internals->active_slave_count)
>>>>>
>>>>> I think original indentation for this line is better.
>>>>>
>>>> agreed.
>>>>>>               deactivate_slave(eth_dev, slave_id);
>>>>>> -        }
>>>>>>       }
>>>>>>       return 0;
>>>>>
>>>>> .
>>>
>>> .
> 
> .
  

Patch

diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c
index b305b6a35b..469dc71170 100644
--- a/drivers/net/bonding/rte_eth_bond_pmd.c
+++ b/drivers/net/bonding/rte_eth_bond_pmd.c
@@ -2118,18 +2118,20 @@  bond_ethdev_stop(struct rte_eth_dev *eth_dev)
 	internals->link_status_polling_enabled = 0;
 	for (i = 0; i < internals->slave_count; i++) {
 		uint16_t slave_id = internals->slaves[i].port_id;
+
+		internals->slaves[i].last_link_status = 0;
+		ret = rte_eth_dev_stop(slave_id);
+		if (ret != 0) {
+			RTE_BOND_LOG(ERR, "Failed to stop device on port %u",
+				     slave_id);
+			return ret;
+		}
+
+		/* active slaves need to deactivate. */
 		if (find_slave_by_id(internals->active_slaves,
 				internals->active_slave_count, slave_id) !=
-						internals->active_slave_count) {
-			internals->slaves[i].last_link_status = 0;
-			ret = rte_eth_dev_stop(slave_id);
-			if (ret != 0) {
-				RTE_BOND_LOG(ERR, "Failed to stop device on port %u",
-					     slave_id);
-				return ret;
-			}
+				internals->active_slave_count)
 			deactivate_slave(eth_dev, slave_id);
-		}
 	}
 
 	return 0;