[v6] net/ice: add retry mechanism for DCF after failure

Message ID 20220706141709.7681-1-peng1x.zhang@intel.com (mailing list archive)
State Superseded, archived
Delegated to: Qi Zhang
Series [v6] net/ice: add retry mechanism for DCF after failure

Checks

Context                          Check    Description
ci/checkpatch                    warning  coding style issues
ci/Intel-compilation             success  Compilation OK
ci/iol-intel-Functional          success  Functional Testing PASS
ci/iol-intel-Performance         success  Performance Testing PASS
ci/iol-abi-testing               success  Testing PASS
ci/github-robot: build           success  github build: passed
ci/iol-aarch64-unit-testing      success  Testing PASS
ci/iol-x86_64-compile-testing    success  Testing PASS
ci/iol-x86_64-unit-testing       success  Testing PASS
ci/iol-aarch64-compile-testing   success  Testing PASS
ci/intel-Testing                 success  Testing PASS

Commit Message

Zhang, Peng1X July 6, 2022, 2:17 p.m. UTC
  From: Peng Zhang <peng1x.zhang@intel.com>

In the original design, if an error occurs during step 3 of the
situation described below, the error is returned directly without a
retry. With this patch, the DCF retries at a fixed interval for a
bounded period if it receives the dedicated error code
'VIRTCHNL_STATUS_ERR_RETRY' from the kernel. If a retry succeeds,
rule creation can continue.

The situation arises as follows:
step 1. The kernel PF and the DPDK DCF are ready at the beginning.
step 2. A VF reset happens; the kernel sends an event to the DCF and
sets STATE to pause.
step 3. Before the DCF receives the event, a rule creation may be in
progress, with a virtual channel command from the DCF to the kernel PF
still executing.
step 4. The command then fails, and an error code is returned to the
DCF. The error code is set to EINVAL, not EAGAIN.

Fixes: daa714d55c72 ("net/ice: handle AdminQ command by DCF")
Cc: stable@dpdk.org

Signed-off-by: Peng Zhang <peng1x.zhang@intel.com>
---
 v6 changes:
 - Add retry mechanism for DCF when the dedicated error code is
   received from the kernel.
 v5 changes:
 - Add retry mechanism for DCF if sending the virtual channel
   command fails.
 v4 changes:
 - Add retry mechanism if sending the AdminQ command fails.
 v3 changes:
 - Add the situation description, expected error code and incorrect
   error code to the commit log.
 v2 changes:
 - Modify DCF state checking mechanism.

 drivers/common/iavf/virtchnl.h |  1 +
 drivers/net/ice/ice_dcf.c      | 32 ++++++++++++++++++++------------
 2 files changed, 21 insertions(+), 12 deletions(-)
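
For readers following the description above, the retry behaviour can be sketched in isolation. This is a minimal illustration, not the driver code: every name prefixed with sketch_ is hypothetical, the send callback stands in for "send the virtual channel command and wait for its reply", and the only value taken from the thread is -63 for the retry status.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status values for this sketch. Only -63 comes from the
 * thread; the names are illustrative and not part of any real header. */
enum sketch_status {
	SKETCH_SUCCESS = 0,
	SKETCH_ERR_PARAM = -5,
	SKETCH_ERR_RETRY = -63,
};

/* Hypothetical callbacks: one sends a command and returns the peer's
 * reply status, the other sleeps for the given number of milliseconds. */
typedef enum sketch_status (*sketch_send_fn)(void *ctx);
typedef void (*sketch_delay_fn)(unsigned int ms);

/* Resend the command while the peer answers "retry", up to max_retries
 * extra attempts. Returns 0 on success and -1 on any other outcome. */
int
sketch_send_with_retry(sketch_send_fn send, void *ctx,
		       sketch_delay_fn delay, unsigned int interval_ms,
		       int max_retries, int *attempts)
{
	enum sketch_status st = SKETCH_ERR_RETRY;
	int n = 0;

	assert(send != NULL);

	while (n <= max_retries) {
		st = send(ctx);
		n++;
		if (st != SKETCH_ERR_RETRY)
			break;		/* success or a non-retryable error */
		if (n <= max_retries && delay != NULL)
			delay(interval_ms);	/* wait before the next attempt */
	}

	if (attempts != NULL)
		*attempts = n;
	return st == SKETCH_SUCCESS ? 0 : -1;
}

/* Fake peer for demonstration: answers "retry" busy_replies times,
 * then reports success. */
struct sketch_peer {
	int busy_replies;
};

enum sketch_status
sketch_peer_send(void *ctx)
{
	struct sketch_peer *peer = ctx;

	if (peer->busy_replies > 0) {
		peer->busy_replies--;
		return SKETCH_ERR_RETRY;
	}
	return SKETCH_SUCCESS;
}
```

With a fake peer that is busy for two replies and a budget of five retries, the helper would succeed on the third attempt; with a peer that stays busy longer than the budget allows, it gives up and returns -1. Bounding the loop is the important design choice here, since an unbounded retry could stall the caller if the peer never leaves the paused state.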
  

Comments

Qi Zhang July 7, 2022, 6:55 a.m. UTC | #1
> -----Original Message-----
> From: Zhang, Peng1X <peng1x.zhang@intel.com>
> Sent: Wednesday, July 6, 2022 10:17 PM
> To: Yang, Qiming <qiming.yang@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>; dev@dpdk.org
> Subject: [PATCH v6] net/ice: add retry mechanism for DCF after failure
> 
> diff --git a/drivers/common/iavf/virtchnl.h b/drivers/common/iavf/virtchnl.h
> index f123daec8e..e15e3a4439 100644
> --- a/drivers/common/iavf/virtchnl.h
> +++ b/drivers/common/iavf/virtchnl.h
> @@ -49,6 +49,7 @@ enum virtchnl_status_code {
>  	VIRTCHNL_STATUS_ERR_CQP_COMPL_ERROR		= -39,
>  	VIRTCHNL_STATUS_ERR_INVALID_VF_ID		= -40,
>  	VIRTCHNL_STATUS_ERR_ADMIN_QUEUE_ERROR		= -53,
> +	VIRTCHNL_STATUS_ERR_RETRY			= -63,

Where is this error code used?
  
Zhang, Peng1X July 29, 2022, 10:14 a.m. UTC | #2
The error code 'VIRTCHNL_STATUS_ERR_RETRY' is returned by the kernel when the DCF state is busy or paused.
When DPDK receives VIRTCHNL_STATUS_ERR_RETRY from the kernel, the 'ice_dcf_send_aq_cmd' function
actually treats it as 'IAVF_ERR_NOT_READY'.

> -----Original Message-----
> From: Zhang, Qi Z <qi.z.zhang@intel.com>
> Sent: Thursday, July 7, 2022 2:56 PM
> To: Zhang, Peng1X <peng1x.zhang@intel.com>; Yang, Qiming
> <qiming.yang@intel.com>; dev@dpdk.org
> Subject: RE: [PATCH v6] net/ice: add retry mechanism for DCF after failure
> 
> 
> 
> > diff --git a/drivers/common/iavf/virtchnl.h b/drivers/common/iavf/virtchnl.h
> > index f123daec8e..e15e3a4439 100644
> > --- a/drivers/common/iavf/virtchnl.h
> > +++ b/drivers/common/iavf/virtchnl.h
> > @@ -49,6 +49,7 @@ enum virtchnl_status_code {
> >  	VIRTCHNL_STATUS_ERR_CQP_COMPL_ERROR		= -39,
> >  	VIRTCHNL_STATUS_ERR_INVALID_VF_ID		= -40,
> >  	VIRTCHNL_STATUS_ERR_ADMIN_QUEUE_ERROR		= -53,
> > +	VIRTCHNL_STATUS_ERR_RETRY			= -63,
> 
> Where is this error code used?
  
Yiding Zhou Aug. 1, 2022, 5:44 a.m. UTC | #3
> -----Original Message-----
> From: Zhang, Qi Z <qi.z.zhang@intel.com>
> Sent: Thursday, July 7, 2022 2:56 PM
> To: Zhang, Peng1X <peng1x.zhang@intel.com>; Yang, Qiming
> <qiming.yang@intel.com>; dev@dpdk.org
> Subject: RE: [PATCH v6] net/ice: add retry mechanism for DCF after failure
> 
> 
> 
> > diff --git a/drivers/common/iavf/virtchnl.h b/drivers/common/iavf/virtchnl.h
> > index f123daec8e..e15e3a4439 100644
> > --- a/drivers/common/iavf/virtchnl.h
> > +++ b/drivers/common/iavf/virtchnl.h
> > @@ -49,6 +49,7 @@ enum virtchnl_status_code {
> >  	VIRTCHNL_STATUS_ERR_CQP_COMPL_ERROR		= -39,
> >  	VIRTCHNL_STATUS_ERR_INVALID_VF_ID		= -40,
> >  	VIRTCHNL_STATUS_ERR_ADMIN_QUEUE_ERROR		= -53,
> > +	VIRTCHNL_STATUS_ERR_RETRY			= -63,
> 
> Where is this error code used?

This error code is not used in DPDK, so it is unnecessary to add it to virtchnl.h.
DPDK uses IAVF_ERR_NOT_READY, which has the same value, -63.
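
The point about the shared value can be illustrated with a small standalone sketch. It assumes only what the thread states, namely that both constants equal -63; the SKETCH_ names below are local stand-ins rather than the real definitions from virtchnl.h or the iavf headers, and sketch_should_retry is a hypothetical helper.

```c
#include <assert.h>

/* Local stand-ins for the two constants discussed above. The value -63
 * is taken from the thread; the SKETCH_ names are hypothetical. */
enum { SKETCH_VIRTCHNL_STATUS_ERR_RETRY = -63 };
enum { SKETCH_IAVF_ERR_NOT_READY = -63 };

/* C11 compile-time check: the build fails if the two codes ever diverge. */
_Static_assert(SKETCH_VIRTCHNL_STATUS_ERR_RETRY == SKETCH_IAVF_ERR_NOT_READY,
	       "retry status codes must share one value");

/* Returns 1 when a reply status asks the sender to retry, else 0.
 * Comparing against either constant is equivalent while the values match. */
int
sketch_should_retry(int v_ret)
{
	return v_ret == SKETCH_IAVF_ERR_NOT_READY;
}
```

Because the comparison is on the numeric value, a status the kernel sends under one name is matched by a check written against the other name, which is why a second definition would add nothing on the DPDK side.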
  

Patch

diff --git a/drivers/common/iavf/virtchnl.h b/drivers/common/iavf/virtchnl.h
index f123daec8e..e15e3a4439 100644
--- a/drivers/common/iavf/virtchnl.h
+++ b/drivers/common/iavf/virtchnl.h
@@ -49,6 +49,7 @@  enum virtchnl_status_code {
 	VIRTCHNL_STATUS_ERR_CQP_COMPL_ERROR		= -39,
 	VIRTCHNL_STATUS_ERR_INVALID_VF_ID		= -40,
 	VIRTCHNL_STATUS_ERR_ADMIN_QUEUE_ERROR		= -53,
+	VIRTCHNL_STATUS_ERR_RETRY			= -63,
 	VIRTCHNL_STATUS_ERR_NOT_SUPPORTED		= -64,
 };
 
diff --git a/drivers/net/ice/ice_dcf.c b/drivers/net/ice/ice_dcf.c
index 885d58c0f4..d38c2afe8f 100644
--- a/drivers/net/ice/ice_dcf.c
+++ b/drivers/net/ice/ice_dcf.c
@@ -474,7 +474,7 @@  ice_dcf_send_aq_cmd(void *dcf_hw, struct ice_aq_desc *desc,
 	struct dcf_virtchnl_cmd desc_cmd, buff_cmd;
 	struct ice_dcf_hw *hw = dcf_hw;
 	int err = 0;
-	int i = 0;
+	int i, j = 0;
 
 	if ((buf && !buf_size) || (!buf && buf_size) ||
 	    buf_size > ICE_DCF_AQ_BUF_SZ)
@@ -501,25 +501,33 @@  ice_dcf_send_aq_cmd(void *dcf_hw, struct ice_aq_desc *desc,
 	ice_dcf_vc_cmd_set(hw, &desc_cmd);
 	ice_dcf_vc_cmd_set(hw, &buff_cmd);
 
-	if (ice_dcf_vc_cmd_send(hw, &desc_cmd) ||
-	    ice_dcf_vc_cmd_send(hw, &buff_cmd)) {
-		err = -1;
-		PMD_DRV_LOG(ERR, "fail to send OP_DCF_CMD_DESC/BUFF");
-		goto ret;
-	}
-
 	do {
-		if (!desc_cmd.pending && !buff_cmd.pending)
+		if (ice_dcf_vc_cmd_send(hw, &desc_cmd) ||
+		    ice_dcf_vc_cmd_send(hw, &buff_cmd)) {
+			err = -1;
+			PMD_DRV_LOG(ERR, "fail to send OP_DCF_CMD_DESC/BUFF");
+			goto ret;
+		}
+
+		i = 0;
+		do {
+			if (!desc_cmd.pending && !buff_cmd.pending)
+				break;
+
+			rte_delay_ms(ICE_DCF_ARQ_CHECK_TIME);
+		} while (i++ < ICE_DCF_ARQ_MAX_RETRIES);
+
+		if (desc_cmd.v_ret != IAVF_ERR_NOT_READY && buff_cmd.v_ret != IAVF_ERR_NOT_READY)
 			break;
 
 		rte_delay_ms(ICE_DCF_ARQ_CHECK_TIME);
-	} while (i++ < ICE_DCF_ARQ_MAX_RETRIES);
+	} while (j++ < ICE_DCF_ARQ_MAX_RETRIES);
 
 	if (desc_cmd.v_ret != IAVF_SUCCESS || buff_cmd.v_ret != IAVF_SUCCESS) {
 		err = -1;
 		PMD_DRV_LOG(ERR,
-			    "No response (%d times) or return failure (desc: %d / buff: %d)",
-			    i, desc_cmd.v_ret, buff_cmd.v_ret);
+			    "No response (%d times) or return failure (desc: %d / buff: %d)"
+			    " after retry %d times cmd", i, desc_cmd.v_ret, buff_cmd.v_ret, j);
 	}
 
 ret: