[v2] doc/compress: clarify error handling on data-plane

Message ID 1554821747-27868-1-git-send-email-fiona.trahe@intel.com
State Superseded, archived
Delegated to: akhil goyal
Headers show
Series
  • [v2] doc/compress: clarify error handling on data-plane
Related show

Checks

Context Check Description
ci/Intel-compilation success Compilation OK
ci/checkpatch success coding style OK

Commit Message

Trahe, Fiona April 9, 2019, 2:55 p.m.
Fixed some typos and clarified which errors should be returned
when and why on the enqueue and dequeue APIs.

Fixes: a584d3bea902 ("doc: add compressdev library guide")
cc: stable@dpdk.org

Signed-off-by: Fiona Trahe <fiona.trahe@intel.com>
---
v2 changes:
 - changed "0 or undefined" to just "undefined" as 0 is superfluous.


 doc/guides/prog_guide/compressdev.rst | 46 ++++++++++++++++++++++++++++++++---
 1 file changed, 43 insertions(+), 3 deletions(-)

Comments

Shally Verma April 18, 2019, 12:12 p.m. | #1
Hi Fiona,


> -----Original Message-----
> From: Fiona Trahe <fiona.trahe@intel.com>
> Sent: Tuesday, April 9, 2019 8:26 PM
> To: dev@dpdk.org
> Cc: akhil.goyal@nxp.com; Ashish Gupta <ashishg@marvell.com>;
> lee.daly@intel.com; Sunila Sahu <ssahu@marvell.com>; Shally Verma
> <shallyv@marvell.com>; Fiona Trahe <fiona.trahe@intel.com>;
> stable@dpdk.org
> Subject: [PATCH v2] doc/compress: clarify error handling on data-plane
> 
> Fixed some typos and clarified which errors should be returned when and
> why on the enqueue and dequeue APIs.
> 
> Fixes: a584d3bea902 ("doc: add compressdev library guide")
> cc: stable@dpdk.org
> 
> Signed-off-by: Fiona Trahe <fiona.trahe@intel.com>
> ---
> v2 changes:
>  - changed "0 or undefined" to just "undefined" as 0 is superfluous.
> 
> 
>  doc/guides/prog_guide/compressdev.rst | 46
> ++++++++++++++++++++++++++++++++---
>  1 file changed, 43 insertions(+), 3 deletions(-)
> 
> diff --git a/doc/guides/prog_guide/compressdev.rst
> b/doc/guides/prog_guide/compressdev.rst
> index ad9703753..c700dd103 100644
> --- a/doc/guides/prog_guide/compressdev.rst
> +++ b/doc/guides/prog_guide/compressdev.rst
> @@ -201,7 +201,7 @@ for stateful processing of ops.
>  Operation Status
>  ~~~~~~~~~~~~~~~~
>  Each operation carries a status information updated by PMD after it is
> processed.
> -following are currently supported status:
> +Following are currently supported:
> 
>  - RTE_COMP_OP_STATUS_SUCCESS,
>      Operation is successfully completed @@ -227,14 +227,54 @@ following are
> currently supported status:
>      is not an error case. Output data up to op.produced can be used and
>      next op in the stream should continue on from op.consumed+1.
> 
> +Operation status after enqueue / dequeue
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +Some of the above values will only arise in the op after an
> +``rte_compressdev_enqueue_burst()``, some only after an
> +``rte_compressdev_dequeue_burst()``. For optimal performance on the
> +data-plane an application is not expected to check the ``op.status`` of
> +all ops after both enqueue and dequeue, it should be sufficient to only
> +check after dequeue. To facilitate this optimisation, most errors which
> +may reasonably be expected to occur in a production environment will be
> returned by the PMD on the ``dequeue``.
> +op.status may hold the following values after dequeue:
> +
> +- RTE_COMP_OP_STATUS_SUCCESS
> +- RTE_COMP_OP_STATUS_ERROR
> +- RTE_COMP_OP_STATUS_OUT_OF_SPACE_TERMINATED
> +- RTE_COMP_OP_STATUS_OUT_OF_SPACE_RECOVERABLE
> +
> +There are some exceptions whereby errors can occur on the ``enqueue``.
> +For any error which can occur in a production environment and can be
> +successful after a retry with the same op the PMD may return the error
> +on the enqueue. 
This statement looks bit confusing. 
Seems like we are trying to add a description regarding op status check even after the enqueue call unlike current scenario, where app only check for it
after dequeue?

So if less than the full burst is enqueued there's no
> +need for the application to check op.status - the application can
> +simply retry in a later enqueue and expect success. Though the application
> is not expected to check for these, the values are as follows:
> +
> +- RTE_COMP_OP_STATUS_NOT_PROCESSED  - could occur if a hardware
> device's queue is full, after a dequeue a retry of the enqueue can be
> successful.
> +
> +- RTE_COMP_OP_STATUS_ERROR - could occur due to out-of-memory or
> other transient condition which could clear after a time.
> +
> +Other errors may also occur on an ``enqueue``, but they are only
> +expected to arise during development. As a retry with the same op won't
> +be successful, if a performant application wants to avoid checking
> +op.status on the enqueue it should ensure these never arise in a
> +production environment, e.g. by checking device capabilities and validating
> input parameters before sending operations. Examples are:
> +
> +- RTE_COMP_OP_STATUS_INVALID_ARGS
> +- RTE_COMP_OP_STATUS_ERROR (if due to a condition which is not
> +transient)
How does app identify error is transient or permanent?
 
> +- RTE_COMP_OP_STATUS_INVALID_STATE
> +
> +If an application doesn't safeguard against these AND doesn't check the
> +op.status of the next op which was not enqueued, but just retries, it could
> result in an infinite loop.
> +
May be , you are trying to insist whenever the number_enqueued ops < number actually enqueued , then caller should check for status of num_enqueued + 1th op to know exact reason, and 
take corrective measure before re-enqueuing it for following status condition:
1. INVALID_ARGS 
2. INVALID_STATE
3 ERROR 

Is that correct?




>  Produced, Consumed And Operation Status
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
>  - If status is RTE_COMP_OP_STATUS_SUCCESS,
>      consumed = amount of data read from input buffer, and
>      produced = amount of data written in destination buffer
> -- If status is RTE_COMP_OP_STATUS_FAILURE,
> -    consumed = produced = 0 or undefined
> +- If status is RTE_COMP_OP_STATUS_ERROR,
> +    consumed = produced = undefined
>  - If status is RTE_COMP_OP_STATUS_OUT_OF_SPACE_TERMINATED,
>      consumed = 0 and
>      produced = usually 0, but in decompression cases a PMD may return > 0
> --
Acked.

> 2.13.6
Trahe, Fiona April 30, 2019, 4:33 p.m. | #2
Hi Shally,


> -----Original Message-----
> From: Shally Verma [mailto:shallyv@marvell.com]
> Sent: Thursday, April 18, 2019 1:12 PM
> To: Trahe, Fiona <fiona.trahe@intel.com>; dev@dpdk.org
> Cc: akhil.goyal@nxp.com; Ashish Gupta <ashishg@marvell.com>; Daly, Lee <lee.daly@intel.com>; Sunila
> Sahu <ssahu@marvell.com>; stable@dpdk.org
> Subject: RE: [PATCH v2] doc/compress: clarify error handling on data-plane
> 
> Hi Fiona,
> 
> 
> > -----Original Message-----
> > From: Fiona Trahe <fiona.trahe@intel.com>
> > Sent: Tuesday, April 9, 2019 8:26 PM
> > To: dev@dpdk.org
> > Cc: akhil.goyal@nxp.com; Ashish Gupta <ashishg@marvell.com>;
> > lee.daly@intel.com; Sunila Sahu <ssahu@marvell.com>; Shally Verma
> > <shallyv@marvell.com>; Fiona Trahe <fiona.trahe@intel.com>;
> > stable@dpdk.org
> > Subject: [PATCH v2] doc/compress: clarify error handling on data-plane
> >
> > Fixed some typos and clarified which errors should be returned when and
> > why on the enqueue and dequeue APIs.
> >
> > Fixes: a584d3bea902 ("doc: add compressdev library guide")
> > cc: stable@dpdk.org
> >
> > Signed-off-by: Fiona Trahe <fiona.trahe@intel.com>
> > ---
> > v2 changes:
> >  - changed "0 or undefined" to just "undefined" as 0 is superfluous.
> >
> >
> >  doc/guides/prog_guide/compressdev.rst | 46
> > ++++++++++++++++++++++++++++++++---
> >  1 file changed, 43 insertions(+), 3 deletions(-)
> >
> > diff --git a/doc/guides/prog_guide/compressdev.rst
> > b/doc/guides/prog_guide/compressdev.rst
> > index ad9703753..c700dd103 100644
> > --- a/doc/guides/prog_guide/compressdev.rst
> > +++ b/doc/guides/prog_guide/compressdev.rst
> > @@ -201,7 +201,7 @@ for stateful processing of ops.
> >  Operation Status
> >  ~~~~~~~~~~~~~~~~
> >  Each operation carries a status information updated by PMD after it is
> > processed.
> > -following are currently supported status:
> > +Following are currently supported:
> >
> >  - RTE_COMP_OP_STATUS_SUCCESS,
> >      Operation is successfully completed @@ -227,14 +227,54 @@ following are
> > currently supported status:
> >      is not an error case. Output data up to op.produced can be used and
> >      next op in the stream should continue on from op.consumed+1.
> >
> > +Operation status after enqueue / dequeue
> > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > +Some of the above values will only arise in the op after an
> > +``rte_compressdev_enqueue_burst()``, some only after an
> > +``rte_compressdev_dequeue_burst()``. For optimal performance on the
> > +data-plane an application is not expected to check the ``op.status`` of
> > +all ops after both enqueue and dequeue, it should be sufficient to only
> > +check after dequeue. To facilitate this optimisation, most errors which
> > +may reasonably be expected to occur in a production environment will be
> > returned by the PMD on the ``dequeue``.
> > +op.status may hold the following values after dequeue:
> > +
> > +- RTE_COMP_OP_STATUS_SUCCESS
> > +- RTE_COMP_OP_STATUS_ERROR
> > +- RTE_COMP_OP_STATUS_OUT_OF_SPACE_TERMINATED
> > +- RTE_COMP_OP_STATUS_OUT_OF_SPACE_RECOVERABLE
> > +
> > +There are some exceptions whereby errors can occur on the ``enqueue``.
> > +For any error which can occur in a production environment and can be
> > +successful after a retry with the same op the PMD may return the error
> > +on the enqueue.
> This statement looks bit confusing.
> Seems like we are trying to add a description regarding op status check even after the enqueue call unlike
> current scenario, where app only check for it
> after dequeue?
[Fiona] The line following this explains that there is no need to check op.status in this case.
Maybe it's not obvious that the application SHOULD check that all ops are enqueued?
I can reword as:
The application should always check the value returned by the enqueue.
If less than the full burst is enqueued there's no
need for the application to check op.status of any or every op - it can
simply retry from the return value+1 in a later enqueue and expect success.

If this isn't clear, can you suggest other wording - or expand on what's not unclear.

 
> So if less than the full burst is enqueued there's no
> > +need for the application to check op.status - the application can
> > +simply retry in a later enqueue and expect success. Though the application
> > is not expected to check for these, the values are as follows:
> > +
> > +- RTE_COMP_OP_STATUS_NOT_PROCESSED  - could occur if a hardware
> > device's queue is full, after a dequeue a retry of the enqueue can be
> > successful.
> > +
> > +- RTE_COMP_OP_STATUS_ERROR - could occur due to out-of-memory or
> > other transient condition which could clear after a time.
> > +
> > +Other errors may also occur on an ``enqueue``, but they are only
> > +expected to arise during development. As a retry with the same op won't
> > +be successful, if a performant application wants to avoid checking
> > +op.status on the enqueue it should ensure these never arise in a
> > +production environment, e.g. by checking device capabilities and validating
> > input parameters before sending operations. Examples are:
> > +
> > +- RTE_COMP_OP_STATUS_INVALID_ARGS
> > +- RTE_COMP_OP_STATUS_ERROR (if due to a condition which is not
> > +transient)
> How does app identify error is transient or permanent?
[Fiona] The app doesn't - it's up to the PMD to decide what the appropriate
return is using the logic described above. If the PMD encounters an error that's 
not transient, and can be reasonably expected in a production environment,
then it should forward the error to the dequeue - where it expects the application
to check for it.
Should I add this note at the end of this section?

> 
> > +- RTE_COMP_OP_STATUS_INVALID_STATE
> > +
> > +If an application doesn't safeguard against these AND doesn't check the
> > +op.status of the next op which was not enqueued, but just retries, it could
> > result in an infinite loop.
> > +
> May be , you are trying to insist whenever the number_enqueued ops < number actually enqueued , then
> caller should check for status of num_enqueued + 1th op to know exact reason, and
> take corrective measure before re-enqueuing it for following status condition:
> 1. INVALID_ARGS
> 2. INVALID_STATE
> 3 ERROR
> 
> Is that correct?
[Fiona] Only if the application has a slow-path for debug in non-production code.
In production code, for high-performance, I'm saying it's reasonable not to do this check.

The reason for calling this out is we did have cases where errors were being returned on the enqueue,
But the performance tool was not checking op.status of enqueued+1 and so getting into an infinite loop.
Seems reasonable that a performant application would also not check.
We've removed those cases from QAT PMD, and behave according to above logic.
But  it's not described explicitly in this detail on the API - so think it's worth calling out this expectation.
Does it make sense?

 
> 
> 
> >  Produced, Consumed And Operation Status
> > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >
> >  - If status is RTE_COMP_OP_STATUS_SUCCESS,
> >      consumed = amount of data read from input buffer, and
> >      produced = amount of data written in destination buffer
> > -- If status is RTE_COMP_OP_STATUS_FAILURE,
> > -    consumed = produced = 0 or undefined
> > +- If status is RTE_COMP_OP_STATUS_ERROR,
> > +    consumed = produced = undefined
> >  - If status is RTE_COMP_OP_STATUS_OUT_OF_SPACE_TERMINATED,
> >      consumed = 0 and
> >      produced = usually 0, but in decompression cases a PMD may return > 0
> > --
> Acked.
> 
> > 2.13.6
Shally Verma May 7, 2019, 5:14 p.m. | #3
> -----Original Message-----
> From: Trahe, Fiona <fiona.trahe@intel.com>
> Sent: Tuesday, April 30, 2019 10:04 PM
> To: Shally Verma <shallyv@marvell.com>; dev@dpdk.org
> Cc: akhil.goyal@nxp.com; Ashish Gupta <ashishg@marvell.com>; Daly, Lee
> <lee.daly@intel.com>; Sunila Sahu <ssahu@marvell.com>; stable@dpdk.org;
> Trahe, Fiona <fiona.trahe@intel.com>
> Subject: RE: [PATCH v2] doc/compress: clarify error handling on data-plane
> 
> Hi Shally,
> 
> 
> > -----Original Message-----
> > From: Shally Verma [mailto:shallyv@marvell.com]
> > Sent: Thursday, April 18, 2019 1:12 PM
> > To: Trahe, Fiona <fiona.trahe@intel.com>; dev@dpdk.org
> > Cc: akhil.goyal@nxp.com; Ashish Gupta <ashishg@marvell.com>; Daly, Lee
> > <lee.daly@intel.com>; Sunila Sahu <ssahu@marvell.com>;
> stable@dpdk.org
> > Subject: RE: [PATCH v2] doc/compress: clarify error handling on
> > data-plane
> >
> > Hi Fiona,
> >
> >
> > > -----Original Message-----
> > > From: Fiona Trahe <fiona.trahe@intel.com>
> > > Sent: Tuesday, April 9, 2019 8:26 PM
> > > To: dev@dpdk.org
> > > Cc: akhil.goyal@nxp.com; Ashish Gupta <ashishg@marvell.com>;
> > > lee.daly@intel.com; Sunila Sahu <ssahu@marvell.com>; Shally Verma
> > > <shallyv@marvell.com>; Fiona Trahe <fiona.trahe@intel.com>;
> > > stable@dpdk.org
> > > Subject: [PATCH v2] doc/compress: clarify error handling on
> > > data-plane
> > >
> > > Fixed some typos and clarified which errors should be returned when
> > > and why on the enqueue and dequeue APIs.
> > >
> > > Fixes: a584d3bea902 ("doc: add compressdev library guide")
> > > cc: stable@dpdk.org
> > >
> > > Signed-off-by: Fiona Trahe <fiona.trahe@intel.com>
> > > ---
> > > v2 changes:
> > >  - changed "0 or undefined" to just "undefined" as 0 is superfluous.
> > >
> > >
> > >  doc/guides/prog_guide/compressdev.rst | 46
> > > ++++++++++++++++++++++++++++++++---
> > >  1 file changed, 43 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/doc/guides/prog_guide/compressdev.rst
> > > b/doc/guides/prog_guide/compressdev.rst
> > > index ad9703753..c700dd103 100644
> > > --- a/doc/guides/prog_guide/compressdev.rst
> > > +++ b/doc/guides/prog_guide/compressdev.rst
> > > @@ -201,7 +201,7 @@ for stateful processing of ops.
> > >  Operation Status
> > >  ~~~~~~~~~~~~~~~~
> > >  Each operation carries a status information updated by PMD after it
> > > is processed.
> > > -following are currently supported status:
> > > +Following are currently supported:
> > >
> > >  - RTE_COMP_OP_STATUS_SUCCESS,
> > >      Operation is successfully completed @@ -227,14 +227,54 @@
> > > following are currently supported status:
> > >      is not an error case. Output data up to op.produced can be used and
> > >      next op in the stream should continue on from op.consumed+1.
> > >
> > > +Operation status after enqueue / dequeue
> > > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > +Some of the above values will only arise in the op after an
> > > +``rte_compressdev_enqueue_burst()``, some only after an
> > > +``rte_compressdev_dequeue_burst()``. For optimal performance on the
> > > +data-plane an application is not expected to check the
> > > +``op.status`` of all ops after both enqueue and dequeue, it should
> > > +be sufficient to only check after dequeue. To facilitate this
> > > +optimisation, most errors which may reasonably be expected to occur
> > > +in a production environment will be
> > > returned by the PMD on the ``dequeue``.
> > > +op.status may hold the following values after dequeue:
> > > +
> > > +- RTE_COMP_OP_STATUS_SUCCESS
> > > +- RTE_COMP_OP_STATUS_ERROR
> > > +- RTE_COMP_OP_STATUS_OUT_OF_SPACE_TERMINATED
> > > +- RTE_COMP_OP_STATUS_OUT_OF_SPACE_RECOVERABLE
> > > +
> > > +There are some exceptions whereby errors can occur on the
> ``enqueue``.
> > > +For any error which can occur in a production environment and can
> > > +be successful after a retry with the same op the PMD may return the
> > > +error on the enqueue.
> > This statement looks bit confusing.
> > Seems like we are trying to add a description regarding op status
> > check even after the enqueue call unlike current scenario, where app
> > only check for it after dequeue?
> [Fiona] The line following this explains that there is no need to check
> op.status in this case.
> Maybe it's not obvious that the application SHOULD check that all ops are
> enqueued?
> I can reword as:
> The application should always check the value returned by the enqueue.
> If less than the full burst is enqueued there's no need for the application to
> check op.status of any or every op - it can simply retry from the return
> value+1 in a later enqueue and expect success.
> 
 I agree to purpose of patch but have these confusions when I read description above:

My understand is , if op status is INVALID_ARGS or any ERROR which is permanent in nature,
Then nb_enqd return will be less than actually passed. Regardless of whatever reason, if any time app gets nb_enqd < actually passed, then app should check status of nb_enqd + 1th op to find
exact cause of failure and then either attempt re-enqueue Or correct op preparation or take any other appropriate action.
Also, STATUS_ERROR is very generic, it can be when queue is full in which case app can re-attempt an enqueue of same op OR
It can also indicate any irrecoverable error on enqueue, in which app just probably has to reset everything. For such kind of case, it might not be possible for PMD design
to even push it into completion queue for an app to dequeue .  I would suggest  add another status code type which reflect permanent error condition i.e. irrecoverable error code
which tells an app to perform PMD qp reset/re-init to recover and simplify description just to state an expected APP behavior to avoid infinite loop condition.
It is then an app choice whether or not to check for op status for error after enqueue depending on whether its running in production environment or dev environment.

Thanks
Shally


> If this isn't clear, can you suggest other wording - or expand on what's not
> unclear.
> 
> 
> > So if less than the full burst is enqueued there's no
> > > +need for the application to check op.status - the application can
> > > +simply retry in a later enqueue and expect success. Though the
> > > +application
> > > is not expected to check for these, the values are as follows:
> > > +
> > > +- RTE_COMP_OP_STATUS_NOT_PROCESSED  - could occur if a hardware
> > > device's queue is full, after a dequeue a retry of the enqueue can
> > > be successful.
> > > +
> > > +- RTE_COMP_OP_STATUS_ERROR - could occur due to out-of-memory
> or
> > > other transient condition which could clear after a time.
> > > +
> > > +Other errors may also occur on an ``enqueue``, but they are only
> > > +expected to arise during development. As a retry with the same op
> > > +won't be successful, if a performant application wants to avoid
> > > +checking op.status on the enqueue it should ensure these never
> > > +arise in a production environment, e.g. by checking device
> > > +capabilities and validating
> > > input parameters before sending operations. Examples are:
> > > +
> > > +- RTE_COMP_OP_STATUS_INVALID_ARGS
> > > +- RTE_COMP_OP_STATUS_ERROR (if due to a condition which is not
> > > +transient)
> > How does app identify error is transient or permanent?
> [Fiona] The app doesn't - it's up to the PMD to decide what the appropriate
> return is using the logic described above. If the PMD encounters an error
> that's not transient, and can be reasonably expected in a production
> environment, then it should forward the error to the dequeue - where it
> expects the application to check for it.
> Should I add this note at the end of this section?
> 
> >
> > > +- RTE_COMP_OP_STATUS_INVALID_STATE
> > > +
> > > +If an application doesn't safeguard against these AND doesn't check
> > > +the op.status of the next op which was not enqueued, but just
> > > +retries, it could
> > > result in an infinite loop.
> > > +
> > May be , you are trying to insist whenever the number_enqueued ops <
> > number actually enqueued , then caller should check for status of
> > num_enqueued + 1th op to know exact reason, and take corrective
> measure before re-enqueuing it for following status condition:
> > 1. INVALID_ARGS
> > 2. INVALID_STATE
> > 3 ERROR
> >
> > Is that correct?
> [Fiona] Only if the application has a slow-path for debug in non-production
> code.
> In production code, for high-performance, I'm saying it's reasonable not to
> do this check.
> 
> The reason for calling this out is we did have cases where errors were being
> returned on the enqueue, But the performance tool was not checking
> op.status of enqueued+1 and so getting into an infinite loop.
> Seems reasonable that a performant application would also not check.
> We've removed those cases from QAT PMD, and behave according to above
> logic.
> But  it's not described explicitly in this detail on the API - so think it's worth
> calling out this expectation.
> Does it make sense?
> 
> 
> >
> >
> > >  Produced, Consumed And Operation Status
> > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > >
> > >  - If status is RTE_COMP_OP_STATUS_SUCCESS,
> > >      consumed = amount of data read from input buffer, and
> > >      produced = amount of data written in destination buffer
> > > -- If status is RTE_COMP_OP_STATUS_FAILURE,
> > > -    consumed = produced = 0 or undefined
> > > +- If status is RTE_COMP_OP_STATUS_ERROR,
> > > +    consumed = produced = undefined
> > >  - If status is RTE_COMP_OP_STATUS_OUT_OF_SPACE_TERMINATED,
> > >      consumed = 0 and
> > >      produced = usually 0, but in decompression cases a PMD may
> > > return > 0
> > > --
> > Acked.
> >
> > > 2.13.6
Trahe, Fiona May 7, 2019, 6:24 p.m. | #4
Hi Shally

> > > > +
> > > > +There are some exceptions whereby errors can occur on the
> > ``enqueue``.
> > > > +For any error which can occur in a production environment and can
> > > > +be successful after a retry with the same op the PMD may return the
> > > > +error on the enqueue.
> > > This statement looks bit confusing.
> > > Seems like we are trying to add a description regarding op status
> > > check even after the enqueue call unlike current scenario, where app
> > > only check for it after dequeue?
> > [Fiona] The line following this explains that there is no need to check
> > op.status in this case.
> > Maybe it's not obvious that the application SHOULD check that all ops are
> > enqueued?
> > I can reword as:
> > The application should always check the value returned by the enqueue.
> > If less than the full burst is enqueued there's no need for the application to
> > check op.status of any or every op - it can simply retry from the return
> > value+1 in a later enqueue and expect success.
> >
>  I agree to purpose of patch but have these confusions when I read description above:
> 
> My understand is , if op status is INVALID_ARGS or any ERROR which is permanent in nature,
> Then nb_enqd return will be less than actually passed. 
[Fiona] True.

> Regardless of whatever reason, if any time app gets
> nb_enqd < actually passed, then app should check status of nb_enqd + 1th op 
[Fiona]. No, that's exactly what I was proposing to avoid.

> to find exact cause of failure and then either attempt re-enqueue Or correct op preparation or take any other
> appropriate action.
[Fiona] I was proposing to constrain PMDs to only return a subset of errors on the enqueue, so apps could be optimised.
But if you think it's not possible for PMDs to comply with it, then yes, apps would always have to check
status of nb_enqd + 1th op, and fork depending on the status.
Is this the case?
If so, much of this patch is unnecessary and I'll send a simplified v3 as almost any status can be returned anywhere.
 
  
> Also, STATUS_ERROR is very generic, it can be when queue is full in which case app can re-attempt an
> enqueue of same op
 OR
> It can also indicate any irrecoverable error on enqueue, in which app just probably has to reset
> everything. For such kind of case, it might not be possible for PMD design
> to even push it into completion queue for an app to dequeue .  I would suggest  add another status code
> type which reflect permanent error condition i.e. irrecoverable error code
> which tells an app to perform PMD qp reset/re-init to recover and simplify description just to state an
> expected APP behavior to avoid infinite loop condition.
> It is then an app choice whether or not to check for op status for error after enqueue depending on
> whether its running in production environment or dev environment.
[Fiona] I wouldn't expect ERROR in a queue full case. I'd see ERROR as the catch-all when some
other specific status isn't appropriate. If you think there's a need for another specific status then
best send an API patch proposing it. This patch is only documenting the existing set.
Shally Verma May 8, 2019, 12:41 p.m. | #5
Hi Fiona


> -----Original Message-----
> From: Trahe, Fiona <fiona.trahe@intel.com>
> Sent: Tuesday, May 7, 2019 11:54 PM
> To: Shally Verma <shallyv@marvell.com>; dev@dpdk.org
> Cc: akhil.goyal@nxp.com; Ashish Gupta <ashishg@marvell.com>; Daly, Lee
> <lee.daly@intel.com>; Sunila Sahu <ssahu@marvell.com>; stable@dpdk.org;
> Trahe, Fiona <fiona.trahe@intel.com>
> Subject: RE: [PATCH v2] doc/compress: clarify error handling on data-plane
> 
> Hi Shally
> 
> > > > > +
> > > > > +There are some exceptions whereby errors can occur on the
> > > ``enqueue``.
> > > > > +For any error which can occur in a production environment and
> > > > > +can be successful after a retry with the same op the PMD may
> > > > > +return the error on the enqueue.
> > > > This statement looks bit confusing.
> > > > Seems like we are trying to add a description regarding op status
> > > > check even after the enqueue call unlike current scenario, where
> > > > app only check for it after dequeue?
> > > [Fiona] The line following this explains that there is no need to
> > > check op.status in this case.
> > > Maybe it's not obvious that the application SHOULD check that all
> > > ops are enqueued?
> > > I can reword as:
> > > The application should always check the value returned by the enqueue.
> > > If less than the full burst is enqueued there's no need for the
> > > application to check op.status of any or every op - it can simply
> > > retry from the return
> > > value+1 in a later enqueue and expect success.
> > >
> >  I agree to purpose of patch but have these confusions when I read
> description above:
> >
> > My understand is , if op status is INVALID_ARGS or any ERROR which is
> > permanent in nature, Then nb_enqd return will be less than actually
> passed.
> [Fiona] True.
> 
> > Regardless of whatever reason, if any time app gets nb_enqd < actually
> > passed, then app should check status of nb_enqd + 1th op
> [Fiona]. No, that's exactly what I was proposing to avoid.
> 
> > to find exact cause of failure and then either attempt re-enqueue Or
> > correct op preparation or take any other appropriate action.
> [Fiona] I was proposing to constrain PMDs to only return a subset of errors
> on the enqueue, so apps could be optimised.
> But if you think it's not possible for PMDs to comply with it, then yes, apps
> would always have to check status of nb_enqd + 1th op, and fork depending
> on the status.
> Is this the case?
> If so, much of this patch is unnecessary and I'll send a simplified v3 as almost
> any status can be returned anywhere.
> 
[Shally] Okay. I seem to understand it now. 
Purpose seem reasonable just a simpler rephrase would help.
It will be easier for me to further feedback on 1st v2 patch sent. So will send it another email.

> 
> > Also, STATUS_ERROR is very generic, it can be when queue is full in
> > which case app can re-attempt an enqueue of same op
>  OR
> > It can also indicate any irrecoverable error on enqueue, in which app
> > just probably has to reset everything. For such kind of case, it might
> > not be possible for PMD design to even push it into completion queue
> > for an app to dequeue .  I would suggest  add another status code type
> > which reflect permanent error condition i.e. irrecoverable error code
> > which tells an app to perform PMD qp reset/re-init to recover and simplify
> description just to state an expected APP behavior to avoid infinite loop
> condition.
> > It is then an app choice whether or not to check for op status for
> > error after enqueue depending on whether its running in production
> environment or dev environment.
> [Fiona] I wouldn't expect ERROR in a queue full case. I'd see ERROR as the
> catch-all when some other specific status isn't appropriate. If you think
> there's a need for another specific status then best send an API patch
> proposing it. This patch is only documenting the existing set.

[Shally] Sorry, I missed. STATUS_NOT_PROCESSED can be indication of queue_full. 
STATUS_ERROR on dequeue = any catch-all error case
STATUS_ERROR on enqueue = any irrecoverable error on op. app should not attempt same op or may be reset queue pair or PMD.

Is this interpretation correct?
Trahe, Fiona May 8, 2019, 2 p.m. | #6
HI Shally,

> -----Original Message-----
> From: Shally Verma [mailto:shallyv@marvell.com]
> Sent: Wednesday, May 8, 2019 1:41 PM
> To: Trahe, Fiona <fiona.trahe@intel.com>; dev@dpdk.org
> Cc: akhil.goyal@nxp.com; Ashish Gupta <ashishg@marvell.com>; Daly, Lee <lee.daly@intel.com>; Sunila
> Sahu <ssahu@marvell.com>; stable@dpdk.org
> Subject: RE: [PATCH v2] doc/compress: clarify error handling on data-plane
> 
> Hi Fiona
> 
> 
> > -----Original Message-----
> > From: Trahe, Fiona <fiona.trahe@intel.com>
> > Sent: Tuesday, May 7, 2019 11:54 PM
> > To: Shally Verma <shallyv@marvell.com>; dev@dpdk.org
> > Cc: akhil.goyal@nxp.com; Ashish Gupta <ashishg@marvell.com>; Daly, Lee
> > <lee.daly@intel.com>; Sunila Sahu <ssahu@marvell.com>; stable@dpdk.org;
> > Trahe, Fiona <fiona.trahe@intel.com>
> > Subject: RE: [PATCH v2] doc/compress: clarify error handling on data-plane
> >
> > Hi Shally
> >
> > > > > > +
> > > > > > +There are some exceptions whereby errors can occur on the
> > > > ``enqueue``.
> > > > > > +For any error which can occur in a production environment and
> > > > > > +can be successful after a retry with the same op the PMD may
> > > > > > +return the error on the enqueue.
> > > > > This statement looks bit confusing.
> > > > > Seems like we are trying to add a description regarding op status
> > > > > check even after the enqueue call unlike current scenario, where
> > > > > app only check for it after dequeue?
> > > > [Fiona] The line following this explains that there is no need to
> > > > check op.status in this case.
> > > > Maybe it's not obvious that the application SHOULD check that all
> > > > ops are enqueued?
> > > > I can reword as:
> > > > The application should always check the value returned by the enqueue.
> > > > If less than the full burst is enqueued there's no need for the
> > > > application to check op.status of any or every op - it can simply
> > > > retry from the return
> > > > value+1 in a later enqueue and expect success.
> > > >
> > >  I agree to purpose of patch but have these confusions when I read
> > description above:
> > >
> > > My understand is , if op status is INVALID_ARGS or any ERROR which is
> > > permanent in nature, Then nb_enqd return will be less than actually
> > passed.
> > [Fiona] True.
> >
> > > Regardless of whatever reason, if any time app gets nb_enqd < actually
> > > passed, then app should check status of nb_enqd + 1th op
> > [Fiona]. No, that's exactly what I was proposing to avoid.
> >
> > > to find exact cause of failure and then either attempt re-enqueue Or
> > > correct op preparation or take any other appropriate action.
> > [Fiona] I was proposing to constrain PMDs to only return a subset of errors
> > on the enqueue, so apps could be optimised.
> > But if you think it's not possible for PMDs to comply with it, then yes, apps
> > would always have to check status of nb_enqd + 1th op, and fork depending
> > on the status.
> > Is this the case?
> > If so, much of this patch is unnecessary and I'll send a simplified v3 as almost
> > any status can be returned anywhere.
> >
> [Shally] Okay. I seem to understand it now.
> Purpose seem reasonable just a simpler rephrase would help.
> It will be easier for me to further feedback on 1st v2 patch sent. So will send it another email.
[Fiona] ok, will look for that.
 

> > > Also, STATUS_ERROR is very generic, it can be when queue is full in
> > > which case app can re-attempt an enqueue of same op
> >  OR
> > > It can also indicate any irrecoverable error on enqueue, in which app
> > > just probably has to reset everything. For such kind of case, it might
> > > not be possible for PMD design to even push it into completion queue
> > > for an app to dequeue .  I would suggest  add another status code type
> > > which reflect permanent error condition i.e. irrecoverable error code
> > > which tells an app to perform PMD qp reset/re-init to recover and simplify
> > description just to state an expected APP behavior to avoid infinite loop
> > condition.
> > > It is then an app choice whether or not to check for op status for
> > > error after enqueue depending on whether its running in production
> > environment or dev environment.
> > [Fiona] I wouldn't expect ERROR in a queue full case. I'd see ERROR as the
> > catch-all when some other specific status isn't appropriate. If you think
> > there's a need for another specific status then best send an API patch
> > proposing it. This patch is only documenting the existing set.
> 
> [Shally] Sorry, I missed. STATUS_NOT_PROCESSED can be indication of queue_full.
> STATUS_ERROR on dequeue = any catch-all error case
> STATUS_ERROR on enqueue = any irrecoverable error on op. app should not attempt same op or may be
> reset queue pair or PMD.
> 
> Is this interpretation correct?
[Fiona] Yes
Akhil Goyal May 9, 2019, 8:58 a.m. | #7
Hi Fiona/Shally,

Can we get a closure on this patch by today EOD(We are closing tree for RC4 today). Thomas will pick this patch directly on master.

Thanks,
Akhil

> -----Original Message-----
> From: Trahe, Fiona <fiona.trahe@intel.com>
> Sent: Wednesday, May 8, 2019 7:31 PM
> To: Shally Verma <shallyv@marvell.com>; dev@dpdk.org
> Cc: Akhil Goyal <akhil.goyal@nxp.com>; Ashish Gupta <ashishg@marvell.com>;
> Daly, Lee <lee.daly@intel.com>; Sunila Sahu <ssahu@marvell.com>;
> stable@dpdk.org; Trahe, Fiona <fiona.trahe@intel.com>
> Subject: RE: [PATCH v2] doc/compress: clarify error handling on data-plane
> 
> HI Shally,
> 
> > -----Original Message-----
> > From: Shally Verma [mailto:shallyv@marvell.com]
> > Sent: Wednesday, May 8, 2019 1:41 PM
> > To: Trahe, Fiona <fiona.trahe@intel.com>; dev@dpdk.org
> > Cc: akhil.goyal@nxp.com; Ashish Gupta <ashishg@marvell.com>; Daly, Lee
> <lee.daly@intel.com>; Sunila
> > Sahu <ssahu@marvell.com>; stable@dpdk.org
> > Subject: RE: [PATCH v2] doc/compress: clarify error handling on data-plane
> >
> > Hi Fiona
> >
> >
> > > -----Original Message-----
> > > From: Trahe, Fiona <fiona.trahe@intel.com>
> > > Sent: Tuesday, May 7, 2019 11:54 PM
> > > To: Shally Verma <shallyv@marvell.com>; dev@dpdk.org
> > > Cc: akhil.goyal@nxp.com; Ashish Gupta <ashishg@marvell.com>; Daly, Lee
> > > <lee.daly@intel.com>; Sunila Sahu <ssahu@marvell.com>; stable@dpdk.org;
> > > Trahe, Fiona <fiona.trahe@intel.com>
> > > Subject: RE: [PATCH v2] doc/compress: clarify error handling on data-plane
> > >
> > > Hi Shally
> > >
> > > > > > > +
> > > > > > > +There are some exceptions whereby errors can occur on the
> > > > > ``enqueue``.
> > > > > > > +For any error which can occur in a production environment and
> > > > > > > +can be successful after a retry with the same op the PMD may
> > > > > > > +return the error on the enqueue.
> > > > > > This statement looks bit confusing.
> > > > > > Seems like we are trying to add a description regarding op status
> > > > > > check even after the enqueue call unlike current scenario, where
> > > > > > app only check for it after dequeue?
> > > > > [Fiona] The line following this explains that there is no need to
> > > > > check op.status in this case.
> > > > > Maybe it's not obvious that the application SHOULD check that all
> > > > > ops are enqueued?
> > > > > I can reword as:
> > > > > The application should always check the value returned by the enqueue.
> > > > > If less than the full burst is enqueued there's no need for the
> > > > > application to check op.status of any or every op - it can simply
> > > > > retry from the return
> > > > > value+1 in a later enqueue and expect success.
> > > > >
> > > >  I agree to purpose of patch but have these confusions when I read
> > > description above:
> > > >
> > > > My understand is , if op status is INVALID_ARGS or any ERROR which is
> > > > permanent in nature, Then nb_enqd return will be less than actually
> > > passed.
> > > [Fiona] True.
> > >
> > > > Regardless of whatever reason, if any time app gets nb_enqd < actually
> > > > passed, then app should check status of nb_enqd + 1th op
> > > [Fiona]. No, that's exactly what I was proposing to avoid.
> > >
> > > > to find exact cause of failure and then either attempt re-enqueue Or
> > > > correct op preparation or take any other appropriate action.
> > > [Fiona] I was proposing to constrain PMDs to only return a subset of errors
> > > on the enqueue, so apps could be optimised.
> > > But if you think it's not possible for PMDs to comply with it, then yes, apps
> > > would always have to check status of nb_enqd + 1th op, and fork depending
> > > on the status.
> > > Is this the case?
> > > If so, much of this patch is unnecessary and I'll send a simplified v3 as almost
> > > any status can be returned anywhere.
> > >
> > [Shally] Okay. I seem to understand it now.
> > Purpose seem reasonable just a simpler rephrase would help.
> > It will be easier for me to further feedback on 1st v2 patch sent. So will send it
> another email.
> [Fiona] ok, will look for that.
> 
> 
> > > > Also, STATUS_ERROR is very generic, it can be when queue is full in
> > > > which case app can re-attempt an enqueue of same op
> > >  OR
> > > > It can also indicate any irrecoverable error on enqueue, in which app
> > > > just probably has to reset everything. For such kind of case, it might
> > > > not be possible for PMD design to even push it into completion queue
> > > > for an app to dequeue .  I would suggest  add another status code type
> > > > which reflect permanent error condition i.e. irrecoverable error code
> > > > which tells an app to perform PMD qp reset/re-init to recover and simplify
> > > description just to state an expected APP behavior to avoid infinite loop
> > > condition.
> > > > It is then an app choice whether or not to check for op status for
> > > > error after enqueue depending on whether its running in production
> > > environment or dev environment.
> > > [Fiona] I wouldn't expect ERROR in a queue full case. I'd see ERROR as the
> > > catch-all when some other specific status isn't appropriate. If you think
> > > there's a need for another specific status then best send an API patch
> > > proposing it. This patch is only documenting the existing set.
> >
> > [Shally] Sorry, I missed. STATUS_NOT_PROCESSED can be indication of
> queue_full.
> > STATUS_ERROR on dequeue = any catch-all error case
> > STATUS_ERROR on enqueue = any irrecoverable error on op. app should not
> attempt same op or may be
> > reset queue pair or PMD.
> >
> > Is this interpretation correct?
> [Fiona] Yes
Shally Verma May 9, 2019, 10:44 a.m. | #8
HI Fiona

> -----Original Message-----
> From: Fiona Trahe <fiona.trahe@intel.com>
> Sent: Tuesday, April 9, 2019 8:26 PM
> To: dev@dpdk.org
> Cc: akhil.goyal@nxp.com; Ashish Gupta <ashishg@marvell.com>;
> lee.daly@intel.com; Sunila Sahu <ssahu@marvell.com>; Shally Verma
> <shallyv@marvell.com>; Fiona Trahe <fiona.trahe@intel.com>;
> stable@dpdk.org
> Subject: [PATCH v2] doc/compress: clarify error handling on data-plane
> 
> Fixed some typos and clarified which errors should be returned when and
> why on the enqueue and dequeue APIs.
> 
> Fixes: a584d3bea902 ("doc: add compressdev library guide")
> cc: stable@dpdk.org
> 
> Signed-off-by: Fiona Trahe <fiona.trahe@intel.com>
> ---
> v2 changes:
>  - changed "0 or undefined" to just "undefined" as 0 is superfluous.
> 
> 
>  doc/guides/prog_guide/compressdev.rst | 46
> ++++++++++++++++++++++++++++++++---
>  1 file changed, 43 insertions(+), 3 deletions(-)
> 
> diff --git a/doc/guides/prog_guide/compressdev.rst
> b/doc/guides/prog_guide/compressdev.rst
> index ad9703753..c700dd103 100644
> --- a/doc/guides/prog_guide/compressdev.rst
> +++ b/doc/guides/prog_guide/compressdev.rst
> @@ -201,7 +201,7 @@ for stateful processing of ops.
>  Operation Status
>  ~~~~~~~~~~~~~~~~
>  Each operation carries a status information updated by PMD after it is
> processed.
> -following are currently supported status:
> +Following are currently supported:
> 
>  - RTE_COMP_OP_STATUS_SUCCESS,
>      Operation is successfully completed @@ -227,14 +227,54 @@ following are
> currently supported status:
>      is not an error case. Output data up to op.produced can be used and
>      next op in the stream should continue on from op.consumed+1.
> 
> +Operation status after enqueue / dequeue
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +Some of the above values will only arise in the op after an
> +``rte_compressdev_enqueue_burst()``, some only after an
> +``rte_compressdev_dequeue_burst()``. 

[Shally] I think this could be simply stated as "some of the above values may arise in the op after ``rte_compressdev_enqueue_burst()``" because except STATUS_NOT_PROCESSED, I don't see any other error code that cannot be returned from enqueue. 
Like STATUS_ERROR, can be returned by both enqueue() and dequeue(). 

>For optimal performance on the
> +data-plane an application is not expected to check the ``op.status`` of
> +all ops after both enqueue and dequeue, it should be sufficient to only
> +check after dequeue.

[Shally] Ack

> To facilitate this optimisation, most errors which
> +may reasonably be expected to occur in a production environment will be
> returned by the PMD on the ``dequeue``.

[Shally] This statement should be moved after the next lines with bit of rephrasing

> +op.status may hold the following values after dequeue:
> +
> +- RTE_COMP_OP_STATUS_SUCCESS
> +- RTE_COMP_OP_STATUS_ERROR
> +- RTE_COMP_OP_STATUS_OUT_OF_SPACE_TERMINATED
> +- RTE_COMP_OP_STATUS_OUT_OF_SPACE_RECOVERABLE
> +
> +There are some exceptions whereby errors can occur on the ``enqueue``.

[Shally] this can be stated as  "To facilitate optimization in applications running in production environment, 
PMD should update and return op.status on dequeue unless in exceptional scenarios whereby error can occur on "enqueue"

> +For any error which can occur in a production environment and can be
> +successful after a retry with the same op the PMD may return the error
> +on the enqueue. 

[Shally] This statement looks redundant. If an op can be successful with retry, then no need for PMD to set an error on op. it can be left as OP_STATUS_NOT_PROCESSED.
Also, In normal scenario, if nb_enqd < actual passed, then app naturally attempt re-enqueue of an op.


So if less than the full burst is enqueued there's no
> +need for the application to check op.status - the application can
> +simply retry in a later enqueue and expect success. 
>Though the application
> is not expected to check for these, the values are as follows:

[Shally] these two statements can be combined as "if less than full burst is enqueued, then app shouldn't need to check for any error and simply should retry. However, if needed 
Then app can check for following errors:"

> +
> +- RTE_COMP_OP_STATUS_NOT_PROCESSED  - could occur if a hardware
> device's queue is full, after a dequeue a retry of the enqueue can be
> successful.
> +
> +- RTE_COMP_OP_STATUS_ERROR - could occur due to out-of-memory or
> other transient condition which could clear after a time.
> +
> +Other errors may also occur on an ``enqueue``, but they are only
> +expected to arise during development.
>As a retry with the same op won't
> +be successful, 

[Shally] Rephrasing could be "Or, any other irrecoverable error , example INVALID_ARGS or STATUS_ERROR.
In such case, the same op wont be successful"

>if a performant application wants to avoid checking
> +op.status on the enqueue it should ensure these never arise in a
> +production environment, e.g. by checking device capabilities and validating
> input parameters before sending operations. Examples are:
> +
> +- RTE_COMP_OP_STATUS_INVALID_ARGS
> +- RTE_COMP_OP_STATUS_ERROR (if due to a condition which is not
> +transient)
> +- RTE_COMP_OP_STATUS_INVALID_STATE
> +
> +If an application doesn't safeguard against these AND doesn't check the
> +op.status of the next op which was not enqueued, but just retries, it could
> result in an infinite loop.
> +
Ack. 

Thanks
Shally

>  Produced, Consumed And Operation Status
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
>  - If status is RTE_COMP_OP_STATUS_SUCCESS,
>      consumed = amount of data read from input buffer, and
>      produced = amount of data written in destination buffer
> -- If status is RTE_COMP_OP_STATUS_FAILURE,
> -    consumed = produced = 0 or undefined
> +- If status is RTE_COMP_OP_STATUS_ERROR,
> +    consumed = produced = undefined
>  - If status is RTE_COMP_OP_STATUS_OUT_OF_SPACE_TERMINATED,
>      consumed = 0 and
>      produced = usually 0, but in decompression cases a PMD may return > 0
> --
> 2.13.6
Trahe, Fiona May 14, 2019, 3:29 p.m. | #9
Hi Shally,

Although we're close to agreement on this, I'm reconsidering.
I think the difficulty we've had finding the best wording highlights the confusion an app
developer will have in figuring out how to handle errors on enqueue. 
So I'm proposing to drop this - which was intended to allow some optimisation - and instead
propose a more robust approach, i.e. add this to the doc:

    Operation status after enqueue / dequeue
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Some of the above values may arise in the op after an
    ``rte_compressdev_enqueue_burst()``. If number ops enqueued < number ops requested then
    the app should check the op.status of nb_enqd+1. If status is
    RTE_COMP_OP_STATUS_NOT_PROCESSED, it likely indicates a full-queue case for a hardware device
    and a retry after dequeuing some ops is likely to be successful. If the op holds any other status, e.g.
    RTE_COMP_OP_STATUS_INVALID_ARGS, a retry with the same op is unlikely to be successful.
 

I know this adds an extra fork, so is less optimal, but once there's even a small chance that an error may occur on the enqueue, a robust application should probably check anyway.
What do you think?
If you agree, I'll send the doc update and a perf tool update to add the status check on the enqueue.

Btw - this doesn't stop PMDs from minimising those cases, just means they're not bound by the API to do it.

Fiona
Shally Verma May 14, 2019, 3:37 p.m. | #10
HI Fiona

> -----Original Message-----
> From: Trahe, Fiona <fiona.trahe@intel.com>
> Sent: Tuesday, May 14, 2019 9:00 PM
> To: Shally Verma <shallyv@marvell.com>; dev@dpdk.org
> Cc: akhil.goyal@nxp.com; Ashish Gupta <ashishg@marvell.com>; Daly, Lee
> <lee.daly@intel.com>; Sunila Sahu <ssahu@marvell.com>; stable@dpdk.org;
> Trahe, Fiona <fiona.trahe@intel.com>
> Subject: RE: [PATCH v2] doc/compress: clarify error handling on data-plane
> 
> Hi Shally,
> 
> Although we're close to agreement on this, I'm reconsidering.
> I think the difficulty we've had finding the best wording highlights the confusion
> an app developer will have in figuring out how to handle errors on enqueue.
> So I'm proposing to drop this - which was intended to allow some optimisation -
> and instead propose a more robust approach, i.e. add this to the doc:
> 
>     Operation status after enqueue / dequeue
>     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>     Some of the above values may arise in the op after an
>     ``rte_compressdev_enqueue_burst()``. If number ops enqueued < number
> ops requested then
>     the app should check the op.status of nb_enqd+1. If status is
>     RTE_COMP_OP_STATUS_NOT_PROCESSED, it likely indicates a full-queue
> case for a hardware device
>     and a retry after dequeuing some ops is likely to be successful. If the op holds
> any other status, e.g.
>     RTE_COMP_OP_STATUS_INVALID_ARGS, a retry with the same op is unlikely
> to be successful.
> 
> 
> I know this adds an extra fork, so is less optimal, but once there's even a small
> chance that an error may occur on the enqueue, a robust application should
> probably check anyway.
> What do you think?
> If you agree, I'll send the doc update and a perf tool update to add the status
> check on the enqueue.

[Shally] Yup This looks absolutely perfect to me. So it is acked by me.

> 
> Btw - this doesn't stop PMDs from minimising those cases, just means they're
> not bound by the API to do it.
> 
> Fiona

Patch

diff --git a/doc/guides/prog_guide/compressdev.rst b/doc/guides/prog_guide/compressdev.rst
index ad9703753..c700dd103 100644
--- a/doc/guides/prog_guide/compressdev.rst
+++ b/doc/guides/prog_guide/compressdev.rst
@@ -201,7 +201,7 @@  for stateful processing of ops.
 Operation Status
 ~~~~~~~~~~~~~~~~
 Each operation carries a status information updated by PMD after it is processed.
-following are currently supported status:
+Following are currently supported:
 
 - RTE_COMP_OP_STATUS_SUCCESS,
     Operation is successfully completed
@@ -227,14 +227,54 @@  following are currently supported status:
     is not an error case. Output data up to op.produced can be used and
     next op in the stream should continue on from op.consumed+1.
 
+Operation status after enqueue / dequeue
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Some of the above values will only arise in the op after an
+``rte_compressdev_enqueue_burst()``, some only after an
+``rte_compressdev_dequeue_burst()``. For optimal performance on the data-plane an
+application is not expected to check the ``op.status`` of all ops after both enqueue and dequeue,
+it should be sufficient to only check after dequeue. To facilitate this optimisation, most errors
+which may reasonably be expected to occur in a production environment will be returned by
+the PMD on the ``dequeue``.
+op.status may hold the following values after dequeue:
+
+- RTE_COMP_OP_STATUS_SUCCESS
+- RTE_COMP_OP_STATUS_ERROR
+- RTE_COMP_OP_STATUS_OUT_OF_SPACE_TERMINATED
+- RTE_COMP_OP_STATUS_OUT_OF_SPACE_RECOVERABLE
+
+There are some exceptions whereby errors can occur on the ``enqueue``. For any error which
+can occur in a production environment and can be successful after a retry with the same op the
+PMD may return the error on the enqueue. So if less than the full burst is enqueued there's no
+need for the application to check op.status - the application can simply retry in a later enqueue
+and expect success. Though the application is not expected to check for these, the
+values are as follows:
+
+- RTE_COMP_OP_STATUS_NOT_PROCESSED  - could occur if a hardware device's queue is full, after a dequeue a retry of the enqueue can be successful.
+
+- RTE_COMP_OP_STATUS_ERROR - could occur due to out-of-memory or other transient condition which could clear after a time.
+
+Other errors may also occur on an ``enqueue``, but they are only expected to arise during
+development. As a retry with the same op won't be successful, if a performant application
+wants to avoid checking op.status on the enqueue it should ensure these never arise in a
+production environment, e.g. by checking device capabilities and validating input parameters
+before sending operations. Examples are:
+
+- RTE_COMP_OP_STATUS_INVALID_ARGS
+- RTE_COMP_OP_STATUS_ERROR (if due to a condition which is not transient)
+- RTE_COMP_OP_STATUS_INVALID_STATE
+
+If an application doesn't safeguard against these AND doesn't check the op.status of the next
+op which was not enqueued, but just retries, it could result in an infinite loop.
+
 Produced, Consumed And Operation Status
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 - If status is RTE_COMP_OP_STATUS_SUCCESS,
     consumed = amount of data read from input buffer, and
     produced = amount of data written in destination buffer
-- If status is RTE_COMP_OP_STATUS_FAILURE,
-    consumed = produced = 0 or undefined
+- If status is RTE_COMP_OP_STATUS_ERROR,
+    consumed = produced = undefined
 - If status is RTE_COMP_OP_STATUS_OUT_OF_SPACE_TERMINATED,
     consumed = 0 and
     produced = usually 0, but in decompression cases a PMD may return > 0