[V2,2/4] net/bonding: fix non-terminable while loop

Message ID 20220324030036.4761-3-humin29@huawei.com (mailing list archive)
State Superseded, archived
Delegated to: Ferruh Yigit
Headers
Series bugfix for bonding |

Checks

Context Check Description
ci/checkpatch warning coding style issues

Commit Message

humin (Q) March 24, 2022, 3 a.m. UTC
  From: Huisong Li <lihuisong@huawei.com>

All slaves will be stopped and removed when closing a bonded port. But the
while loop can not stop if both rte_eth_dev_stop and
rte_eth_bond_slave_remove fail to run.

Fixes: fb0379bc5db3 ("net/bonding: check stop call status")
Cc: stable@dpdk.org

Signed-off-by: Huisong Li <lihuisong@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
---
 drivers/net/bonding/rte_eth_bond_pmd.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
  

Comments

Ferruh Yigit April 26, 2022, 6:19 p.m. UTC | #1
On 3/24/2022 3:00 AM, Min Hu (Connor) wrote:
> From: Huisong Li <lihuisong@huawei.com>
> 
> All slaves will be stopped and removed when closing a bonded port. But the
> while loop can not stop if both rte_eth_dev_stop and
> rte_eth_bond_slave_remove fail to run.
> 

Agree that this is a defect introduced in below commit. Thanks for the fix.

> Fixes: fb0379bc5db3 ("net/bonding: check stop call status")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Huisong Li <lihuisong@huawei.com>
> Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
> ---
>   drivers/net/bonding/rte_eth_bond_pmd.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c
> index 469dc71170..00d4deda44 100644
> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
> @@ -2149,13 +2149,14 @@ bond_ethdev_close(struct rte_eth_dev *dev)
>   		return 0;
>   
>   	RTE_BOND_LOG(INFO, "Closing bonded device %s", dev->device->name);
> -	while (internals->slave_count != skipped) {
> +	while (skipped < internals->slave_count) {

When below fixed with adding 'continue', no need to change the check, 
right? Although new one is also correct.

>   		uint16_t port_id = internals->slaves[skipped].port_id;
>   
>   		if (rte_eth_dev_stop(port_id) != 0) {
>   			RTE_BOND_LOG(ERR, "Failed to stop device on port %u",
>   				     port_id);
>   			skipped++;
> +			continue;

Can't we remove the slave even if 'stop()' failed? If so I think better 
to just log the error and keep continue in that case, what do you think?

>   		}
>   
>   		if (rte_eth_bond_slave_remove(bond_port_id, port_id) != 0) {
  
humin (Q) April 29, 2022, 6:52 a.m. UTC | #2
Hi, Ferruh,

在 2022/4/27 2:19, Ferruh Yigit 写道:
> On 3/24/2022 3:00 AM, Min Hu (Connor) wrote:
>> From: Huisong Li <lihuisong@huawei.com>
>>
>> All slaves will be stopped and removed when closing a bonded port. But 
>> the
>> while loop can not stop if both rte_eth_dev_stop and
>> rte_eth_bond_slave_remove fail to run.
>>
> 
> Agree that this is a defect introduced in below commit. Thanks for the fix.
thanks.
> 
>> Fixes: fb0379bc5db3 ("net/bonding: check stop call status")
>> Cc: stable@dpdk.org
>>
>> Signed-off-by: Huisong Li <lihuisong@huawei.com>
>> Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
>> ---
>>   drivers/net/bonding/rte_eth_bond_pmd.c | 3 ++-
>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c 
>> b/drivers/net/bonding/rte_eth_bond_pmd.c
>> index 469dc71170..00d4deda44 100644
>> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
>> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
>> @@ -2149,13 +2149,14 @@ bond_ethdev_close(struct rte_eth_dev *dev)
>>           return 0;
>>       RTE_BOND_LOG(INFO, "Closing bonded device %s", dev->device->name);
>> -    while (internals->slave_count != skipped) {
>> +    while (skipped < internals->slave_count) {
> 
> When below fixed with adding 'continue', no need to change the check, 
> right? Although new one is also correct.
Agreed.
> 
>>           uint16_t port_id = internals->slaves[skipped].port_id;
>>           if (rte_eth_dev_stop(port_id) != 0) {
>>               RTE_BOND_LOG(ERR, "Failed to stop device on port %u",
>>                        port_id);
>>               skipped++;
>> +            continue;
> 
> Can't we remove the slave even if 'stop()' failed? If so I think better 
> to just log the error and keep continue in that case, what do you think?
NO, if slave stop failed, we cannot remove the slave.
just see the function stack:
rte_eth_bond_slave_remove
__eth_bond_slave_remove_lock_free
slave_remove
rte_eth_dev_internal_reset
if (dev->data->dev_started) {
	RTE_ETHDEV_LOG(ERR, "Port %u must be stopped to allow reset\n",
		dev->data->port_id);
	return;
}

> 
>>           }
>>           if (rte_eth_bond_slave_remove(bond_port_id, port_id) != 0) {
> 
> .
  
Ferruh Yigit April 29, 2022, 1:35 p.m. UTC | #3
On 4/29/2022 7:52 AM, Min Hu (Connor) wrote:
> Hi, Ferruh,
> 
> 在 2022/4/27 2:19, Ferruh Yigit 写道:
>> On 3/24/2022 3:00 AM, Min Hu (Connor) wrote:
>>> From: Huisong Li <lihuisong@huawei.com>
>>>
>>> All slaves will be stopped and removed when closing a bonded port. 
>>> But the
>>> while loop can not stop if both rte_eth_dev_stop and
>>> rte_eth_bond_slave_remove fail to run.
>>>
>>
>> Agree that this is a defect introduced in below commit. Thanks for the 
>> fix.
> thanks.
>>
>>> Fixes: fb0379bc5db3 ("net/bonding: check stop call status")
>>> Cc: stable@dpdk.org
>>>
>>> Signed-off-by: Huisong Li <lihuisong@huawei.com>
>>> Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
>>> ---
>>>   drivers/net/bonding/rte_eth_bond_pmd.c | 3 ++-
>>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c 
>>> b/drivers/net/bonding/rte_eth_bond_pmd.c
>>> index 469dc71170..00d4deda44 100644
>>> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
>>> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
>>> @@ -2149,13 +2149,14 @@ bond_ethdev_close(struct rte_eth_dev *dev)
>>>           return 0;
>>>       RTE_BOND_LOG(INFO, "Closing bonded device %s", dev->device->name);
>>> -    while (internals->slave_count != skipped) {
>>> +    while (skipped < internals->slave_count) {
>>
>> When below fixed with adding 'continue', no need to change the check, 
>> right? Although new one is also correct.
> Agreed.
>>
>>>           uint16_t port_id = internals->slaves[skipped].port_id;
>>>           if (rte_eth_dev_stop(port_id) != 0) {
>>>               RTE_BOND_LOG(ERR, "Failed to stop device on port %u",
>>>                        port_id);
>>>               skipped++;
>>> +            continue;
>>
>> Can't we remove the slave even if 'stop()' failed? If so I think 
>> better to just log the error and keep continue in that case, what do 
>> you think?
> NO, if slave stop failed, we cannot remove the slave.

Got it, thanks for clarification.

> just see the function stack:
> rte_eth_bond_slave_remove
> __eth_bond_slave_remove_lock_free
> slave_remove
> rte_eth_dev_internal_reset
> if (dev->data->dev_started) {
>      RTE_ETHDEV_LOG(ERR, "Port %u must be stopped to allow reset\n",
>          dev->data->port_id);
>      return;
> }
> 
>>
>>>           }
>>>           if (rte_eth_bond_slave_remove(bond_port_id, port_id) != 0) {
>>
>> .
  

Patch

diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c
index 469dc71170..00d4deda44 100644
--- a/drivers/net/bonding/rte_eth_bond_pmd.c
+++ b/drivers/net/bonding/rte_eth_bond_pmd.c
@@ -2149,13 +2149,14 @@  bond_ethdev_close(struct rte_eth_dev *dev)
 		return 0;
 
 	RTE_BOND_LOG(INFO, "Closing bonded device %s", dev->device->name);
-	while (internals->slave_count != skipped) {
+	while (skipped < internals->slave_count) {
 		uint16_t port_id = internals->slaves[skipped].port_id;
 
 		if (rte_eth_dev_stop(port_id) != 0) {
 			RTE_BOND_LOG(ERR, "Failed to stop device on port %u",
 				     port_id);
 			skipped++;
+			continue;
 		}
 
 		if (rte_eth_bond_slave_remove(bond_port_id, port_id) != 0) {