net/bonding: fix iavf bond device query stats

Message ID 20230608072636.426803-1-kaiwenx.deng@intel.com (mailing list archive)
State Superseded, archived
Delegated to: Ferruh Yigit
Series net/bonding: fix iavf bond device query stats

Checks

Context Check Description
ci/checkpatch warning coding style issues
ci/loongarch-compilation success Compilation OK
ci/loongarch-unit-testing success Unit Testing PASS
ci/github-robot: build success github build: passed
ci/iol-aarch-unit-testing success Testing PASS
ci/iol-abi-testing success Testing PASS
ci/iol-mellanox-Performance success Performance Testing PASS
ci/iol-unit-testing success Testing PASS
ci/Intel-compilation success Compilation OK
ci/intel-Functional success Functional PASS
ci/iol-x86_64-compile-testing success Testing PASS
ci/iol-testing success Testing PASS
ci/iol-x86_64-unit-testing success Testing PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/iol-broadcom-Functional success Functional Testing PASS
ci/iol-intel-Functional success Functional Testing PASS
ci/iol-broadcom-Performance success Performance Testing PASS
ci/iol-aarch64-compile-testing success Testing PASS
ci/intel-Testing success Testing PASS

Commit Message

Kaiwen Deng June 8, 2023, 7:26 a.m. UTC
If rte_eth_stats_get() fails, the slave update function does not
work properly when the device is bonded in BONDING_MODE_TLB mode.

This commit adds handling for the case where the stats cannot be
retrieved.

Fixes: 7c76a747e68c ("bond: add mode 5")
Cc: stable@dpdk.org

Signed-off-by: Kaiwen Deng <kaiwenx.deng@intel.com>
---
 drivers/net/bonding/rte_eth_bond_pmd.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)
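
One way a failed stats query could upset the TLB update callback is through the byte-counter delta it computes. The sketch below is only an illustration under assumptions: `tx_delta` is a hypothetical helper, not a function in the bonding driver, and whether the stats structure is zeroed or left untouched on failure depends on the failure path. It shows that an unsigned subtraction against a counter smaller than the saved value wraps around instead of going negative.

```c
#include <stdint.h>

/*
 * Hypothetical helper mirroring the shape of
 *   tx_bytes = slave_stats.obytes - tlb_last_obytets[slave_id];
 * Both operands are unsigned 64-bit counters.
 */
static uint64_t tx_delta(uint64_t obytes_now, uint64_t obytes_last)
{
	/* Unsigned arithmetic wraps modulo 2^64 when obytes_now < obytes_last. */
	return obytes_now - obytes_last;
}
```

With a healthy counter, `tx_delta(1500, 1000)` is 500. If the query failed and the counter read as 0, `tx_delta(0, 1000)` is a value close to `UINT64_MAX`, which would make the port look saturated to any bandwidth calculation built on it.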
  

Comments

Stephen Hemminger June 8, 2023, 3:41 p.m. UTC | #1
On Thu,  8 Jun 2023 15:26:36 +0800
Kaiwen Deng <kaiwenx.deng@intel.com> wrote:

> If the rte_eth_stats_get function does not work properly,
> the update function of the slave device does not work
> properly When device is bonded as BONDING_MODE_TLB mode.
> 
> This commit adds handling for functions that do not get
> stats properly.
> 
> Fixes: 7c76a747e68c ("bond: add mode 5")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Kaiwen Deng <kaiwenx.deng@intel.com>
> ---
>  drivers/net/bonding/rte_eth_bond_pmd.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c
> index f0c4f7d26b..edce621496 100644
> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
> @@ -894,6 +894,7 @@ bond_ethdev_update_tlb_slave_cb(void *arg)
>  	uint8_t update_stats = 0;
>  	uint16_t slave_id;
>  	uint16_t i;
> +	int ret;
>  
>  	internals->slave_update_idx++;
>  
> @@ -903,7 +904,10 @@ bond_ethdev_update_tlb_slave_cb(void *arg)
>  
>  	for (i = 0; i < internals->active_slave_count; i++) {
>  		slave_id = internals->active_slaves[i];
> -		rte_eth_stats_get(slave_id, &slave_stats);
> +		ret = rte_eth_stats_get(slave_id, &slave_stats);
> +		if (ret)
> +			goto OUT;
> +
>  		tx_bytes = slave_stats.obytes - tlb_last_obytets[slave_id];
>  		bandwidth_left(slave_id, tx_bytes,
>  				internals->slave_update_idx, &bwg_array[i]);
> @@ -922,6 +926,7 @@ bond_ethdev_update_tlb_slave_cb(void *arg)
>  	for (i = 0; i < slave_count; i++)
>  		internals->tlb_slaves_order[i] = bwg_array[i].slave;
>  
> +OUT:
>  	rte_eal_alarm_set(REORDER_PERIOD_MS * 1000, bond_ethdev_update_tlb_slave_cb,
>  			(struct bond_dev_private *)internals);
>  }

Why is stats get failing on a device? It looks like the real bug is there.
It would be better to fix the buggy driver; other usages might already be affected.

Silently ignoring the error without logging is also not good.
Lastly, DPDK coding style is to use lower case for goto labels.
  
lihuisong (C) June 9, 2023, 1:38 a.m. UTC | #2
On 2023/6/8 23:41, Stephen Hemminger wrote:
> On Thu,  8 Jun 2023 15:26:36 +0800
> Kaiwen Deng <kaiwenx.deng@intel.com> wrote:
>
>> If the rte_eth_stats_get function does not work properly,
>> the update function of the slave device does not work
>> properly When device is bonded as BONDING_MODE_TLB mode.
>>
>> This commit adds handling for functions that do not get
>> stats properly.
>>
>> Fixes: 7c76a747e68c ("bond: add mode 5")
>> Cc: stable@dpdk.org
>>
>> Signed-off-by: Kaiwen Deng <kaiwenx.deng@intel.com>
>> ---
>>   drivers/net/bonding/rte_eth_bond_pmd.c | 7 ++++++-
>>   1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c
>> index f0c4f7d26b..edce621496 100644
>> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
>> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
>> @@ -894,6 +894,7 @@ bond_ethdev_update_tlb_slave_cb(void *arg)
>>   	uint8_t update_stats = 0;
>>   	uint16_t slave_id;
>>   	uint16_t i;
>> +	int ret;
>>   
>>   	internals->slave_update_idx++;
>>   
>> @@ -903,7 +904,10 @@ bond_ethdev_update_tlb_slave_cb(void *arg)
>>   
>>   	for (i = 0; i < internals->active_slave_count; i++) {
>>   		slave_id = internals->active_slaves[i];
>> -		rte_eth_stats_get(slave_id, &slave_stats);
>> +		ret = rte_eth_stats_get(slave_id, &slave_stats);
>> +		if (ret)
>> +			goto OUT;
>> +
>>   		tx_bytes = slave_stats.obytes - tlb_last_obytets[slave_id];
>>   		bandwidth_left(slave_id, tx_bytes,
>>   				internals->slave_update_idx, &bwg_array[i]);
>> @@ -922,6 +926,7 @@ bond_ethdev_update_tlb_slave_cb(void *arg)
>>   	for (i = 0; i < slave_count; i++)
>>   		internals->tlb_slaves_order[i] = bwg_array[i].slave;
>>   
>> +OUT:
>>   	rte_eal_alarm_set(REORDER_PERIOD_MS * 1000, bond_ethdev_update_tlb_slave_cb,
>>   			(struct bond_dev_private *)internals);
>>   }
> Why is stats get failing on a device, looks like the real bug is there?
> Better to fix the buggy driver. Other usages might already be affected.
I think this can happen if the driver hits an abnormal event
and does a reset.
So this failure needs to be handled here.
Additionally, I suggest that the rte_eth_stats_get() call in
bond_ethdev_stats_get gets the same handling.
> Silently ignoring the error without logging is also not good.
> Lastly, DPDK coding style is to use lower case for goto labels.
+1
> .
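
The suggestion to apply the same check when aggregating member stats could look roughly like the sketch below. It is self-contained and does not use DPDK: `struct port_stats`, `demo_stats_get` and `aggregate_stats` are hypothetical names, and the real bond_ethdev_stats_get is not shown in this thread, so this only illustrates the pattern of propagating the first error instead of summing a result that was never filled in.

```c
#include <stdint.h>

/* Hypothetical, reduced stand-in for the ethdev stats structure. */
struct port_stats {
	uint64_t ipackets;
	uint64_t opackets;
};

/* Callback type standing in for a per-port stats query: 0 on success. */
typedef int (*stats_get_fn)(uint16_t port_id, struct port_stats *stats);

/* Demo query: port 9 always fails, other ports report fixed counters. */
static int demo_stats_get(uint16_t port_id, struct port_stats *stats)
{
	if (port_id == 9)
		return -1;
	stats->ipackets = 10u * (port_id + 1u);
	stats->opackets = 20u * (port_id + 1u);
	return 0;
}

/*
 * Sum the counters of all member ports. On the first failed query the
 * error is returned and *total is left untouched, so the caller never
 * sees a partial sum.
 */
static int aggregate_stats(const uint16_t *members, uint16_t count,
		stats_get_fn get, struct port_stats *total)
{
	struct port_stats sum = {0, 0};
	struct port_stats s;
	uint16_t i;

	for (i = 0; i < count; i++) {
		int ret = get(members[i], &s);

		if (ret != 0)
			return ret;
		sum.ipackets += s.ipackets;
		sum.opackets += s.opackets;
	}

	*total = sum;
	return 0;
}
```

Writing to `*total` only after every query succeeded is a design choice of this sketch; returning partial sums together with an error code would be another option.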
  
Ferruh Yigit June 23, 2023, 1:13 p.m. UTC | #3
On 6/9/2023 2:38 AM, lihuisong (C) wrote:
> 
> On 2023/6/8 23:41, Stephen Hemminger wrote:
>> On Thu,  8 Jun 2023 15:26:36 +0800
>> Kaiwen Deng <kaiwenx.deng@intel.com> wrote:
>>
>>> If the rte_eth_stats_get function does not work properly,
>>> the update function of the slave device does not work
>>> properly When device is bonded as BONDING_MODE_TLB mode.
>>>
>>> This commit adds handling for functions that do not get
>>> stats properly.
>>>
>>> Fixes: 7c76a747e68c ("bond: add mode 5")
>>> Cc: stable@dpdk.org
>>>
>>> Signed-off-by: Kaiwen Deng <kaiwenx.deng@intel.com>
>>> ---
>>>   drivers/net/bonding/rte_eth_bond_pmd.c | 7 ++++++-
>>>   1 file changed, 6 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c
>>> index f0c4f7d26b..edce621496 100644
>>> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
>>> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
>>> @@ -894,6 +894,7 @@ bond_ethdev_update_tlb_slave_cb(void *arg)
>>>  	uint8_t update_stats = 0;
>>>  	uint16_t slave_id;
>>>  	uint16_t i;
>>> +	int ret;
>>>
>>>  	internals->slave_update_idx++;
>>>
>>> @@ -903,7 +904,10 @@ bond_ethdev_update_tlb_slave_cb(void *arg)
>>>
>>>  	for (i = 0; i < internals->active_slave_count; i++) {
>>>  		slave_id = internals->active_slaves[i];
>>> -		rte_eth_stats_get(slave_id, &slave_stats);
>>> +		ret = rte_eth_stats_get(slave_id, &slave_stats);
>>> +		if (ret)
>>> +			goto OUT;
>>> +
>>>  		tx_bytes = slave_stats.obytes - tlb_last_obytets[slave_id];
>>>  		bandwidth_left(slave_id, tx_bytes,
>>>  				internals->slave_update_idx, &bwg_array[i]);
>>> @@ -922,6 +926,7 @@ bond_ethdev_update_tlb_slave_cb(void *arg)
>>>  	for (i = 0; i < slave_count; i++)
>>>  		internals->tlb_slaves_order[i] = bwg_array[i].slave;
>>>
>>> +OUT:
>>>  	rte_eal_alarm_set(REORDER_PERIOD_MS * 1000, bond_ethdev_update_tlb_slave_cb,
>>>  			(struct bond_dev_private *)internals);
>>>  }
>> Why is stats get failing on a device, looks like the real bug is there?
>> Better to fix the buggy driver. Other usages might already be affected.
> I think this is the case if the driver happens to have an abnormal event
> and do reset.
> So here need to handle this failure.
>

I agree with Huisong: this is a DPDK API that can return an error, so
checking the return value is reasonable.


@Kaiwen, back to Stephen's point: the patch title mentions iavf. Is
there anything special about the iavf driver that makes it fail to get
stats? The main concern is whether this patch hides a defect in iavf, or
whether the driver is only mentioned because the problem was observed
with it.

If this is not an iavf-specific issue, we can continue with the patch.
In that case, can you please send a new version with the following changes:

1- Add an error check for both usages of 'rte_eth_stats_get()' in bonding
2- Make the label lowercase
3- Print a log message on failure
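
The requested changes can be sketched in a self-contained form as follows. This is not the v2 patch: `stub_stats_get` is a hypothetical stand-in for rte_eth_stats_get(), `fprintf` stands in for whatever logging macro the driver uses, and the `rearm_count` counter stands in for re-arming the alarm. The point is the control flow: check the return value, log the failure, jump to a lowercase label, and still re-arm the periodic callback.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_PORTS 8

/* Hypothetical, reduced stand-in for the ethdev stats structure. */
struct port_stats {
	uint64_t obytes;
};

static uint64_t port_obytes[MAX_PORTS]; /* simulated hardware counters */
static uint64_t last_obytes[MAX_PORTS]; /* counters saved by the caller */
static int failing_port = -1;           /* port whose query fails, -1 for none */
static int rearm_count;                 /* stands in for re-arming the alarm */

/* Hypothetical stub for the stats query: 0 on success, negative on error. */
static int stub_stats_get(uint16_t port_id, struct port_stats *stats)
{
	if ((int)port_id == failing_port)
		return -5; /* arbitrary negative error value for the demo */
	stats->obytes = port_obytes[port_id];
	return 0;
}

/* Periodic update: skip the reordering on failure, but always re-arm. */
static int update_cb(const uint16_t *active, uint16_t count, uint64_t *tx_bytes)
{
	struct port_stats stats;
	uint16_t i;
	int ret = 0;

	assert(count <= MAX_PORTS);

	for (i = 0; i < count; i++) {
		ret = stub_stats_get(active[i], &stats);
		if (ret != 0) {
			fprintf(stderr, "failed to get stats for port %u: %d\n",
				(unsigned int)active[i], ret);
			goto out;
		}
		tx_bytes[i] = stats.obytes - last_obytes[active[i]];
	}

	/* The reordering of ports by remaining bandwidth would happen here. */

out:
	rearm_count++;
	return ret;
}
```

Because the label sits just before the re-arm step, a transient failure on one port only skips a single reordering pass; the callback keeps running and can recover on the next period.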

Thanks,
ferruh

> Additionally, suggest that this API in bond_ethdev_stats_get should do
> the same things.
>> Silently ignoring the error without logging is also not good.
>> Lastly, DPDK coding style is to use lower case for goto labels.
> +1
>> .
  

Patch

diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c
index f0c4f7d26b..edce621496 100644
--- a/drivers/net/bonding/rte_eth_bond_pmd.c
+++ b/drivers/net/bonding/rte_eth_bond_pmd.c
@@ -894,6 +894,7 @@  bond_ethdev_update_tlb_slave_cb(void *arg)
 	uint8_t update_stats = 0;
 	uint16_t slave_id;
 	uint16_t i;
+	int ret;
 
 	internals->slave_update_idx++;
 
@@ -903,7 +904,10 @@  bond_ethdev_update_tlb_slave_cb(void *arg)
 
 	for (i = 0; i < internals->active_slave_count; i++) {
 		slave_id = internals->active_slaves[i];
-		rte_eth_stats_get(slave_id, &slave_stats);
+		ret = rte_eth_stats_get(slave_id, &slave_stats);
+		if (ret)
+			goto OUT;
+
 		tx_bytes = slave_stats.obytes - tlb_last_obytets[slave_id];
 		bandwidth_left(slave_id, tx_bytes,
 				internals->slave_update_idx, &bwg_array[i]);
@@ -922,6 +926,7 @@  bond_ethdev_update_tlb_slave_cb(void *arg)
 	for (i = 0; i < slave_count; i++)
 		internals->tlb_slaves_order[i] = bwg_array[i].slave;
 
+OUT:
 	rte_eal_alarm_set(REORDER_PERIOD_MS * 1000, bond_ethdev_update_tlb_slave_cb,
 			(struct bond_dev_private *)internals);
 }