[v4] net/ice: retry sending adminQ command after failure

Message ID 20220531174827.357629-1-peng1x.zhang@intel.com (mailing list archive)
State Superseded, archived
Delegated to: Qi Zhang
Series [v4] net/ice: retry sending adminQ command after failure

Checks

Context                          Check    Description
ci/checkpatch                    success  coding style OK
ci/iol-mellanox-Performance      success  Performance Testing PASS
ci/iol-aarch64-compile-testing   success  Testing PASS
ci/iol-x86_64-compile-testing    success  Testing PASS
ci/github-robot: build           success  github build: passed
ci/iol-aarch64-unit-testing      success  Testing PASS
ci/iol-x86_64-unit-testing       success  Testing PASS
ci/iol-abi-testing               warning  Testing issues
ci/iol-intel-Performance         success  Performance Testing PASS
ci/iol-intel-Functional          success  Functional Testing PASS

Commit Message

Zhang, Peng1X May 31, 2022, 5:48 p.m. UTC
  From: Peng Zhang <peng1x.zhang@intel.com>

In the original design, if an error occurs during step 3 of the situation
described below, the error is returned directly without any retry. With
this patch, the adminQ command is retried repeatedly for a limited time.
If a retry succeeds, rule creation can continue. This improves the success
rate of rule creation in the situation described below.

The situation arises as follows:
step 1. The kernel PF and the DCF are both ready at the beginning.
step 2. A VF reset happens; the kernel sends an event to the DPDK DCF and
sets STATE to pause.
step 3. Before the DPDK DCF receives the event, a rule creation may be
ongoing and an adminQ operation related to switch rules, recipes, or VSI
lists may be executing.
step 4. The operation then fails, and an error is returned to the DPDK
DCF. The DPDK DCF error code is set to EINVAL, not EAGAIN.

Fixes: 6bad5047be24 ("net/ice/base: return correct error code")
Fixes: 453d087ccaff ("net/ice/base: add common functions")
Cc: stable@dpdk.org

Signed-off-by: Peng Zhang <peng1x.zhang@intel.com>
---
 v4 changes:
 - Add a retry mechanism for when sending an adminQ command fails in the
   given situation.
 v3 changes:
 - Add the situation description, the expected error code, and the
   incorrect error code to the commit log.
 v2 changes:
 - Modify the DCF state checking mechanism.

 drivers/net/ice/base/ice_common.c |  2 +-
 drivers/net/ice/base/ice_switch.c | 46 +++++++++++++++++++++++++++----
 drivers/net/ice/base/ice_switch.h |  5 ++++
 3 files changed, 46 insertions(+), 7 deletions(-)
  

Comments

Qi Zhang May 31, 2022, 11:51 a.m. UTC | #1
> -----Original Message-----
> From: Zhang, Peng1X <peng1x.zhang@intel.com>
> Sent: Wednesday, June 1, 2022 1:48 AM
> To: Yang, Qiming <qiming.yang@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>; dev@dpdk.org
> Cc: Zhang, Peng1X <peng1x.zhang@intel.com>; stable@dpdk.org
> Subject: [PATCH v4] net/ice: retry sending adminQ command after failure
> 
> diff --git a/drivers/net/ice/base/ice_common.c
> b/drivers/net/ice/base/ice_common.c
> index db87bacd97..013c255371 100644
> --- a/drivers/net/ice/base/ice_common.c
> +++ b/drivers/net/ice/base/ice_common.c
> @@ -2127,7 +2127,7 @@ ice_aq_alloc_free_res(struct ice_hw *hw, u16
> num_entries,
> 
>  	cmd->num_entries = CPU_TO_LE16(num_entries);
> 
> -	return ice_aq_send_cmd(hw, &desc, buf, buf_size, cd);
> +	return ice_aq_retry_send_cmd(hw, &desc, buf, buf_size, cd);

This fix is only for DCF; we don't need to retry in a PF driver context.
It would be better to keep the same function name, implement the retry mechanism inside that function, and trigger it only in a DCF context.
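
For illustration only, the suggestion above could look roughly like the following sketch. It is not driver code: `sketch_hw`, `sketch_send_once`, and `sketch_send_cmd` are hypothetical stand-ins, and the failure counter exists only so the behaviour can be exercised without hardware. The assumption is that callers keep using a single entry point, and that entry point decides internally whether retries apply.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the HW struct; not the real ice definition. */
struct sketch_hw {
	bool dcf_enabled; /* mirrors the hw->dcf_enabled flag from the thread */
	int fail_budget;  /* test knob: sends that fail before one succeeds */
	int sends;        /* how many times the command was issued */
};

#define SKETCH_MAX_RETRIES 20 /* same value as ICE_DCF_ADMINQ_MAX_RETRIES */

/* Simulated single send: returns 0 on success, -1 on failure. */
static int sketch_send_once(struct sketch_hw *hw)
{
	hw->sends++;
	if (hw->fail_budget > 0) {
		hw->fail_budget--;
		return -1;
	}
	return 0;
}

/* Single entry point: PF callers get one attempt, DCF callers get retries. */
static int sketch_send_cmd(struct sketch_hw *hw)
{
	int attempts;
	int status = -1;

	assert(hw != NULL);
	attempts = hw->dcf_enabled ? SKETCH_MAX_RETRIES : 1;

	while (attempts-- > 0) {
		status = sketch_send_once(hw);
		if (status == 0)
			break;
		/* A real driver would wait here before the next attempt. */
	}
	return status;
}
```

With this shape, none of the call sites in ice_common.c or ice_switch.c would need to change, which appears to be the point of keeping the function name.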
  
Zhang, Peng1X June 1, 2022, 1:48 a.m. UTC | #2
I see, so the ice_aq_retry_send_cmd function will check whether DCF is enabled using
'hw->dcf_enabled'.
  
Qi Zhang June 1, 2022, 1:53 a.m. UTC | #3
> -----Original Message-----
> From: Zhang, Peng1X <peng1x.zhang@intel.com>
> Sent: Wednesday, June 1, 2022 9:49 AM
> To: Zhang, Qi Z <qi.z.zhang@intel.com>; Yang, Qiming
> <qiming.yang@intel.com>; dev@dpdk.org
> Cc: stable@dpdk.org
> Subject: RE: [PATCH v4] net/ice: retry sending adminQ command after failure
> 
> I see, so the ice_aq_retry_send_cmd function will check whether DCF is
> enabled using 'hw->dcf_enabled'.

Why not just enable retry in ice_dcf_send_aq_cmd? 


  
Zhang, Peng1X June 1, 2022, 2:46 a.m. UTC | #4
Because the patch currently focuses only on the given situation, and by using 'hw->dcf_enabled' it should affect only DCF.
Having checked the code of 'ice_dcf_send_aq_cmd', I see that it already has a retry mechanism.

  
Qi Zhang June 1, 2022, 3:22 a.m. UTC | #5
> -----Original Message-----
> From: Zhang, Peng1X <peng1x.zhang@intel.com>
> Sent: Wednesday, June 1, 2022 10:46 AM
> To: Zhang, Qi Z <qi.z.zhang@intel.com>; Yang, Qiming
> <qiming.yang@intel.com>; dev@dpdk.org
> Cc: stable@dpdk.org
> Subject: RE: [PATCH v4] net/ice: retry sending adminQ command after failure
> 
> Because the patch currently focuses only on the given situation, and by
> using 'hw->dcf_enabled' it should affect only DCF.
> Having checked the code of 'ice_dcf_send_aq_cmd', I see that it already
> has a retry mechanism.


If it already retries, then why do we need this patch?
By the way, we are talking about retrying the virtual channel command from the DCF to the kernel PF here, not about retrying the poll for the reply status of a request.
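
The distinction drawn above can be sketched as two nested loops. This is a toy model under stated assumptions, not the ice or DCF code: every name below (`sketch_chan`, `sketch_request_once`, `sketch_request_with_resend`) is hypothetical. The inner loop only waits for the reply to one request, so it cannot turn a rejected request into a successful one; the outer loop sends the request again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical channel model; none of these names come from the driver. */
struct sketch_chan {
	int polls_until_reply; /* polls needed before a reply shows up */
	int failing_requests;  /* how many requests the peer rejects first */
	int requests_sent;     /* statistics for the caller */
	int polls_done;
	int pending_polls;     /* internal: polls left for current request */
	bool pending_ok;       /* internal: outcome of current request */
};

static void sketch_post_request(struct sketch_chan *c)
{
	c->requests_sent++;
	c->pending_polls = c->polls_until_reply;
	c->pending_ok = (c->failing_requests == 0);
	if (c->failing_requests > 0)
		c->failing_requests--;
}

/* Returns true once the reply to the current request is available. */
static bool sketch_poll_reply(struct sketch_chan *c)
{
	c->polls_done++;
	if (c->pending_polls > 0)
		c->pending_polls--;
	return c->pending_polls == 0;
}

/* Inner retry: poll for ONE reply. 0 = ok, -1 = rejected, -2 = timeout. */
static int sketch_request_once(struct sketch_chan *c, int max_polls)
{
	assert(c != NULL);
	sketch_post_request(c);
	for (int i = 0; i < max_polls; i++) {
		if (sketch_poll_reply(c))
			return c->pending_ok ? 0 : -1;
	}
	return -2;
}

/* Outer retry: send the whole request again if the last attempt failed. */
static int sketch_request_with_resend(struct sketch_chan *c, int max_polls,
				      int max_sends)
{
	int status = -1;

	for (int i = 0; i < max_sends; i++) {
		status = sketch_request_once(c, max_polls);
		if (status == 0)
			break;
	}
	return status;
}
```

In this model, raising `max_polls` never helps a request that the peer rejects; only a resend does.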

  
Zhang, Peng1X June 1, 2022, 5:35 a.m. UTC | #6
Because in the given situation there is still an issue, and for that situation a retry mechanism is certainly necessary.
But has it been confirmed that in other situations retrying is always better than not retrying?
Modifying 'ice_dcf_send_aq_cmd' would affect all AdminQ handling under DCF, which looks like a much wider impact than the given situation alone.

  

Patch

diff --git a/drivers/net/ice/base/ice_common.c b/drivers/net/ice/base/ice_common.c
index db87bacd97..013c255371 100644
--- a/drivers/net/ice/base/ice_common.c
+++ b/drivers/net/ice/base/ice_common.c
@@ -2127,7 +2127,7 @@  ice_aq_alloc_free_res(struct ice_hw *hw, u16 num_entries,
 
 	cmd->num_entries = CPU_TO_LE16(num_entries);
 
-	return ice_aq_send_cmd(hw, &desc, buf, buf_size, cd);
+	return ice_aq_retry_send_cmd(hw, &desc, buf, buf_size, cd);
 }
 
 /**
diff --git a/drivers/net/ice/base/ice_switch.c b/drivers/net/ice/base/ice_switch.c
index d4cc664ad7..8db71f6edb 100644
--- a/drivers/net/ice/base/ice_switch.c
+++ b/drivers/net/ice/base/ice_switch.c
@@ -18,6 +18,9 @@ 
 #define ICE_ETH_P_8021Q			0x8100
 #define ICE_MPLS_ETHER_ID		0x8847
 
+#define ICE_DCF_ADMINQ_MAX_RETRIES	20
+#define ICE_DCF_ADMINQ_CHECK_TIME	2   /* msecs */
+
 /* Dummy ethernet header needed in the ice_aqc_sw_rules_elem
  * struct to configure any switch filter rules.
  * {DA (6 bytes), SA(6 bytes),
@@ -3296,7 +3299,7 @@  ice_aq_sw_rules(struct ice_hw *hw, void *rule_list, u16 rule_list_sz,
 	desc.flags |= CPU_TO_LE16(ICE_AQ_FLAG_RD);
 	desc.params.sw_rules.num_rules_fltr_entry_index =
 		CPU_TO_LE16(num_rules);
-	status = ice_aq_send_cmd(hw, &desc, rule_list, rule_list_sz, cd);
+	status = ice_aq_retry_send_cmd(hw, &desc, rule_list, rule_list_sz, cd);
 	if (opc != ice_aqc_opc_add_sw_rules &&
 	    hw->adminq.sq_last_status == ICE_AQ_RC_ENOENT)
 		status = ICE_ERR_DOES_NOT_EXIST;
@@ -3331,7 +3334,7 @@  ice_aq_add_recipe(struct ice_hw *hw,
 
 	buf_size = num_recipes * sizeof(*s_recipe_list);
 
-	return ice_aq_send_cmd(hw, &desc, s_recipe_list, buf_size, cd);
+	return ice_aq_retry_send_cmd(hw, &desc, s_recipe_list, buf_size, cd);
 }
 
 /**
@@ -3373,7 +3376,7 @@  ice_aq_get_recipe(struct ice_hw *hw,
 
 	buf_size = *num_recipes * sizeof(*s_recipe_list);
 
-	status = ice_aq_send_cmd(hw, &desc, s_recipe_list, buf_size, cd);
+	status = ice_aq_retry_send_cmd(hw, &desc, s_recipe_list, buf_size, cd);
 	*num_recipes = LE16_TO_CPU(cmd->num_sub_recipes);
 
 	return status;
@@ -3462,7 +3465,7 @@  ice_aq_map_recipe_to_profile(struct ice_hw *hw, u32 profile_id, u8 *r_bitmap,
 	ice_memcpy(cmd->recipe_assoc, r_bitmap, sizeof(cmd->recipe_assoc),
 		   ICE_NONDMA_TO_NONDMA);
 
-	return ice_aq_send_cmd(hw, &desc, NULL, 0, cd);
+	return ice_aq_retry_send_cmd(hw, &desc, NULL, 0, cd);
 }
 
 /**
@@ -3486,7 +3489,7 @@  ice_aq_get_recipe_to_profile(struct ice_hw *hw, u32 profile_id, u8 *r_bitmap,
 	ice_fill_dflt_direct_cmd_desc(&desc, ice_aqc_opc_get_recipe_to_profile);
 	cmd->profile_id = CPU_TO_LE16(profile_id);
 
-	status = ice_aq_send_cmd(hw, &desc, NULL, 0, cd);
+	status = ice_aq_retry_send_cmd(hw, &desc, NULL, 0, cd);
 	if (!status)
 		ice_memcpy(r_bitmap, cmd->recipe_assoc,
 			   sizeof(cmd->recipe_assoc), ICE_NONDMA_TO_NONDMA);
@@ -4119,7 +4122,6 @@  ice_update_vsi_list_rule(struct ice_hw *hw, u16 *vsi_handle_arr, u16 num_vsi,
 	s_rule->pdata.vsi_list.index = CPU_TO_LE16(vsi_list_id);
 
 	status = ice_aq_sw_rules(hw, s_rule, s_rule_size, 1, opc, NULL);
-
 exit:
 	ice_free(hw, s_rule);
 	return status;
@@ -9582,3 +9584,35 @@  void ice_rm_all_sw_replay_rule_info(struct ice_hw *hw)
 {
 	ice_rm_sw_replay_rule_info(hw, hw->switch_info);
 }
+
+/**
+ * ice_aq_retry_send_cmd - helper function to retry sending FW Admin Queue commands
+ * @hw: pointer to the HW struct
+ * @desc: descriptor describing the command
+ * @buf: buffer to use for indirect commands (NULL for direct commands)
+ * @buf_size: size of buffer for indirect commands (0 for direct commands)
+ * @cd: pointer to command details structure
+ *
+ * Retry sending the command to the FW Admin Queue if a send fails and the
+ * DCF function is enabled.
+ */
+enum ice_status
+ice_aq_retry_send_cmd(struct ice_hw *hw, struct ice_aq_desc *desc,
+		void *buf, u16 buf_size, struct ice_sq_cd *cd)
+{
+	enum ice_status status;
+	int i = 0;
+
+	for (;;) {
+		status = ice_aq_send_cmd(hw, desc, buf, buf_size, cd);
+		if (status == 0 || !hw->dcf_enabled)
+			break;
+
+		if (++i >= ICE_DCF_ADMINQ_MAX_RETRIES)
+			break;
+
+		rte_delay_ms(ICE_DCF_ADMINQ_CHECK_TIME);
+	}
+
+	return status;
+}
diff --git a/drivers/net/ice/base/ice_switch.h b/drivers/net/ice/base/ice_switch.h
index a2b3c80107..1a3fd5472f 100644
--- a/drivers/net/ice/base/ice_switch.h
+++ b/drivers/net/ice/base/ice_switch.h
@@ -552,4 +552,9 @@  enum ice_status
 ice_update_recipe_lkup_idx(struct ice_hw *hw,
 			   struct ice_update_recipe_lkup_idx_params *params);
 void ice_change_proto_id_to_dvm(void);
+
+/* Retry sending adminQ command */
+enum ice_status
+ice_aq_retry_send_cmd(struct ice_hw *hw, struct ice_aq_desc *desc,
+			void *buf, u16 buf_size, struct ice_sq_cd *cd);
 #endif /* _ICE_SWITCH_H_ */
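
As a rough check of how long the loop in ice_aq_retry_send_cmd can stall a caller, the sketch below replays its control flow with a send that always fails. It is a standalone model, not driver code: `sketch_counts` and `sketch_worst_case` are hypothetical names, and the time spent inside each send is ignored. Under those assumptions a DCF caller issues at most 20 sends separated by 19 delays of 2 ms, so about 38 ms of added delay, while a PF caller still gets a single attempt.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_RETRIES   20 /* ICE_DCF_ADMINQ_MAX_RETRIES in the patch */
#define CHECK_TIME_MS 2  /* ICE_DCF_ADMINQ_CHECK_TIME in the patch */

struct sketch_counts {
	int sends;    /* number of send attempts */
	int delay_ms; /* total time spent in delays */
};

/* Replays the patch's loop shape with a send that never succeeds. */
static struct sketch_counts sketch_worst_case(bool dcf_enabled)
{
	struct sketch_counts c = { 0, 0 };
	int i = 0;

	for (;;) {
		c.sends++; /* stands in for a failing ice_aq_send_cmd() */
		if (!dcf_enabled)
			break;

		if (++i >= MAX_RETRIES)
			break;

		c.delay_ms += CHECK_TIME_MS; /* stands in for rte_delay_ms() */
	}

	/* Invariant: the cap bounds the number of sends. */
	assert(c.sends <= MAX_RETRIES);
	return c;
}
```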