[v2,11/11] eventdev: RFC clarify docs on event object fields

Message ID 20240119174346.108905-12-bruce.richardson@intel.com (mailing list archive)
State Changes Requested, archived
Delegated to: Jerin Jacob
Series: improve eventdev API specification/documentation

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/loongarch-compilation success Compilation OK
ci/loongarch-unit-testing success Unit Testing PASS
ci/Intel-compilation success Compilation OK
ci/intel-Testing success Testing PASS
ci/intel-Functional success Functional PASS

Commit Message

Bruce Richardson Jan. 19, 2024, 5:43 p.m. UTC
  Clarify the meaning of the NEW, FORWARD and RELEASE event types.
For the fields in "rte_event" struct, enhance the comments on each to
clarify the field's use, and whether it is preserved between enqueue and
dequeue, and its role, if any, in scheduling.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---

As with the previous patch, please review this patch to ensure that the
expected semantics of the various event types and event fields have not
changed in an unexpected way.
---
 lib/eventdev/rte_eventdev.h | 105 ++++++++++++++++++++++++++----------
 1 file changed, 77 insertions(+), 28 deletions(-)

--
2.40.1
  

Comments

Mattias Rönnblom Jan. 24, 2024, 11:34 a.m. UTC | #1
On 2024-01-19 18:43, Bruce Richardson wrote:
> Clarify the meaning of the NEW, FORWARD and RELEASE event types.
> For the fields in "rte_event" struct, enhance the comments on each to
> clarify the field's use, and whether it is preserved between enqueue and
> dequeue, and it's role, if any, in scheduling.
> 
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> ---
> 
> As with the previous patch, please review this patch to ensure that the
> expected semantics of the various event types and event fields have not
> changed in an unexpected way.
> ---
>   lib/eventdev/rte_eventdev.h | 105 ++++++++++++++++++++++++++----------
>   1 file changed, 77 insertions(+), 28 deletions(-)
> 
> diff --git a/lib/eventdev/rte_eventdev.h b/lib/eventdev/rte_eventdev.h
> index cb13602ffb..4eff1c4958 100644
> --- a/lib/eventdev/rte_eventdev.h
> +++ b/lib/eventdev/rte_eventdev.h
> @@ -1416,21 +1416,25 @@ struct rte_event_vector {
> 
>   /* Event enqueue operations */
>   #define RTE_EVENT_OP_NEW                0
> -/**< The event producers use this operation to inject a new event to the
> +/**< The @ref rte_event.op field should be set to this type to inject a new event to the
>    * event device.
>    */

"type" -> "value"

"to" -> "into"?

You could also say "to mark the event as new".

What is new? Maybe "new (as opposed to a forwarded) event." or "new 
(i.e., not previously dequeued).".

"The application sets the @ref rte_event.op field of an enqueued event 
to this value to mark the event as new (i.e., not previously dequeued)."

>   #define RTE_EVENT_OP_FORWARD            1
> -/**< The CPU use this operation to forward the event to different event queue or
> - * change to new application specific flow or schedule type to enable
> - * pipelining.
> +/**< SW should set the @ref rte_event.op filed to this type to return a
> + * previously dequeued event to the event device for further processing.

"filed" -> "field"

"SW" -> "The application"

"to be scheduled for further processing (or transmission)"

The wording should otherwise be the same as NEW, whatever you choose there.

>    *
> - * This operation must only be enqueued to the same port that the
> + * This event *must* be enqueued to the same port that the
>    * event to be forwarded was dequeued from.

OK, so you "should" mark a new event RTE_EVENT_OP_FORWARD but you 
"*must*" enqueue it to the same port.

I think you "must" do both.
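
For illustration, a minimal sketch of the FORWARD semantics under discussion
(worker and queue names are hypothetical, not from the patch): dequeue on a
port, update the event as needed, mark it RTE_EVENT_OP_FORWARD, and enqueue it
back on the same port, leaving impl_opaque untouched.

#include <rte_eventdev.h>

/* Illustrative only: forward each dequeued event to a next-stage queue. */
static void
worker_forward_stage(uint8_t dev_id, uint8_t port_id, uint8_t next_queue_id)
{
	struct rte_event ev;

	/* impl_opaque is set by the event device on dequeue. */
	if (rte_event_dequeue_burst(dev_id, port_id, &ev, 1, 0) == 0)
		return;

	/* ... process ev.mbuf here ... */

	/* The application may update queue, flow, sched_type, payload, ... */
	ev.queue_id = next_queue_id;
	ev.op = RTE_EVENT_OP_FORWARD;

	/* ... but must not modify impl_opaque, and must enqueue on the
	 * same port the event was dequeued from.
	 */
	while (rte_event_enqueue_burst(dev_id, port_id, &ev, 1) == 0)
		;
}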

> + *
> + * The event's fields, including (but not limited to) flow_id, scheduling type,
> + * destination queue, and event payload e.g. mbuf pointer, may all be updated as
> + * desired by software, but the @ref rte_event.impl_opaque field must

"software" -> "application"

> + * be kept to the same value as was present when the event was dequeued.
>    */
>   #define RTE_EVENT_OP_RELEASE            2
>   /**< Release the flow context associated with the schedule type.
>    *
> - * If current flow's scheduler type method is *RTE_SCHED_TYPE_ATOMIC*
> + * If current flow's scheduler type method is @ref RTE_SCHED_TYPE_ATOMIC
>    * then this function hints the scheduler that the user has completed critical
>    * section processing in the current atomic context.
>    * The scheduler is now allowed to schedule events from the same flow from
> @@ -1442,21 +1446,19 @@ struct rte_event_vector {
>    * performance, but the user needs to design carefully the split into critical
>    * vs non-critical sections.
>    *
> - * If current flow's scheduler type method is *RTE_SCHED_TYPE_ORDERED*
> - * then this function hints the scheduler that the user has done all that need
> - * to maintain event order in the current ordered context.
> - * The scheduler is allowed to release the ordered context of this port and
> - * avoid reordering any following enqueues.
> - *
> - * Early ordered context release may increase parallelism and thus system
> - * performance.
> + * If current flow's scheduler type method is @ref RTE_SCHED_TYPE_ORDERED

Isn't a missing "or @ref RTE_SCHED_TYPE_ATOMIC" just an oversight (in 
the original API wording)?

> + * then this function informs the scheduler that the current event has
> + * completed processing and will not be returned to the scheduler, i.e.
> + * it has been dropped, and so the reordering context for that event
> + * should be considered filled.
>    *
> - * If current flow's scheduler type method is *RTE_SCHED_TYPE_PARALLEL*
> + * If current flow's scheduler type method is @ref RTE_SCHED_TYPE_PARALLEL
>    * or no scheduling context is held then this function may be an NOOP,
>    * depending on the implementation.

Maybe you can also fix this "function" -> "operation". I suggest you 
delete that sentence, because it makes no sense.

What it says in a somewhat vague manner is that you tread into the realm 
of undefined behavior if you release PARALLEL events.

>    *
>    * This operation must only be enqueued to the same port that the
> - * event to be released was dequeued from.
> + * event to be released was dequeued from. The @ref rte_event.impl_opaque
> + * field in the release event must match that in the original dequeued event.

I would say "same value" rather than "match".

"The @ref rte_event.impl_opaque field in the release event have the same 
value as in the original dequeued event."

>    */
> 
>   /**
> @@ -1473,53 +1475,100 @@ struct rte_event {
>   			/**< Targeted flow identifier for the enqueue and
>   			 * dequeue operation.
>   			 * The value must be in the range of
> -			 * [0, nb_event_queue_flows - 1] which
> +			 * [0, @ref rte_event_dev_config.nb_event_queue_flows - 1] which

The same comment as I had before about ranges for unsigned types.

>   			 * previously supplied to rte_event_dev_configure().
> +			 *
> +			 * For @ref RTE_SCHED_TYPE_ATOMIC, this field is used to identify a
> +			 * flow context for atomicity, such that events from each individual flow
> +			 * will only be scheduled to one port at a time.

flow_id alone doesn't identify an atomic flow. It's queue_id + flow_id. 
I'm not sure I think "flow context" adds much, unless it's defined 
somewhere. Sounds like some assumed implementation detail.
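
As an illustration that it is the (queue_id, flow_id) pair which identifies an
atomic flow, a minimal sketch of a producer injecting a NEW event (queue id,
helper name and flow-id derivation are hypothetical, not from the patch):

#include <rte_eventdev.h>
#include <rte_mbuf.h>

/* Illustrative only: for RTE_SCHED_TYPE_ATOMIC, the combination of
 * queue_id and flow_id selects the flow that is scheduled to at most
 * one port at a time.
 */
static int
inject_new_event(uint8_t dev_id, uint8_t port_id, struct rte_mbuf *m)
{
	struct rte_event ev = {
		.op = RTE_EVENT_OP_NEW,
		.queue_id = 0,                    /* hypothetical target queue */
		.flow_id = m->hash.rss & 0xFFFFF, /* 20-bit flow identifier */
		.sched_type = RTE_SCHED_TYPE_ATOMIC,
		.event_type = RTE_EVENT_TYPE_CPU,
		.priority = RTE_EVENT_DEV_PRIORITY_NORMAL,
		.mbuf = m,
	};

	return rte_event_enqueue_burst(dev_id, port_id, &ev, 1) == 1 ? 0 : -1;
}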

> +			 *
> +			 * This field is preserved between enqueue and dequeue when
> +			 * a device reports the @ref RTE_EVENT_DEV_CAP_CARRY_FLOW_ID
> +			 * capability. Otherwise the value is implementation dependent
> +			 * on dequeue.
>   			 */
>   			uint32_t sub_event_type:8;
>   			/**< Sub-event types based on the event source.
> +			 *
> +			 * This field is preserved between enqueue and dequeue.
> +			 * This field is for SW or event adapter use,

"SW" -> "application"

> +			 * and is unused in scheduling decisions.
> +			 *
>   			 * @see RTE_EVENT_TYPE_CPU
>   			 */
>   			uint32_t event_type:4;
> -			/**< Event type to classify the event source.
> -			 * @see RTE_EVENT_TYPE_ETHDEV, (RTE_EVENT_TYPE_*)
> +			/**< Event type to classify the event source. (RTE_EVENT_TYPE_*)
> +			 *
> +			 * This field is preserved between enqueue and dequeue
> +			 * This field is for SW or event adapter use,
> +			 * and is unused in scheduling decisions.

"unused" -> "is not considered"?

>   			 */
>   			uint8_t op:2;
> -			/**< The type of event enqueue operation - new/forward/
> -			 * etc.This field is not preserved across an instance
> +			/**< The type of event enqueue operation - new/forward/ etc.
> +			 *
> +			 * This field is *not* preserved across an instance
>   			 * and is undefined on dequeue.

Maybe you should use "undefined" rather than "implementation dependent", 
or change this instance of undefined to implementation dependent. Now 
two different terms are used for the same thing.

> -			 * @see RTE_EVENT_OP_NEW, (RTE_EVENT_OP_*)
> +			 *
> +			 * @see RTE_EVENT_OP_NEW
> +			 * @see RTE_EVENT_OP_FORWARD
> +			 * @see RTE_EVENT_OP_RELEASE
>   			 */
>   			uint8_t rsvd:4;
> -			/**< Reserved for future use */
> +			/**< Reserved for future use.
> +			 *
> +			 * Should be set to zero on enqueue. Zero on dequeue.
> +			 */

Why say they should be zero on dequeue? Doesn't this defeat the purpose 
of having reserved bits, for future use? With your suggested wording, you 
can't use these bits without breaking the ABI.

>   			uint8_t sched_type:2;
>   			/**< Scheduler synchronization type (RTE_SCHED_TYPE_*)
>   			 * associated with flow id on a given event queue
>   			 * for the enqueue and dequeue operation.
> +			 *
> +			 * This field is used to determine the scheduling type
> +			 * for events sent to queues where @ref RTE_EVENT_QUEUE_CFG_ALL_TYPES
> +			 * is supported.

"supported" -> "configured"

> +			 * For queues where only a single scheduling type is available,
> +			 * this field must be set to match the configured scheduling type.
> +			 *

Why is the API/event device asking this of the application?

> +			 * This field is preserved between enqueue and dequeue.
> +			 *
> +			 * @see RTE_SCHED_TYPE_ORDERED
> +			 * @see RTE_SCHED_TYPE_ATOMIC
> +			 * @see RTE_SCHED_TYPE_PARALLEL
>   			 */
>   			uint8_t queue_id;
>   			/**< Targeted event queue identifier for the enqueue or
>   			 * dequeue operation.
>   			 * The value must be in the range of
> -			 * [0, nb_event_queues - 1] which previously supplied to
> -			 * rte_event_dev_configure().
> +			 * [0, @ref rte_event_dev_config.nb_event_queues - 1] which was
> +			 * previously supplied to rte_event_dev_configure().
> +			 *
> +			 * This field is preserved between enqueue on dequeue.
>   			 */
>   			uint8_t priority;
>   			/**< Event priority relative to other events in the
>   			 * event queue. The requested priority should in the
> -			 * range of  [RTE_EVENT_DEV_PRIORITY_HIGHEST,
> -			 * RTE_EVENT_DEV_PRIORITY_LOWEST].
> +			 * range of  [@ref RTE_EVENT_DEV_PRIORITY_HIGHEST,
> +			 * @ref RTE_EVENT_DEV_PRIORITY_LOWEST].
>   			 * The implementation shall normalize the requested
>   			 * priority to supported priority value.
> +			 *
>   			 * Valid when the device has
> -			 * RTE_EVENT_DEV_CAP_EVENT_QOS capability.
> +			 * @ref RTE_EVENT_DEV_CAP_EVENT_QOS capability.
> +			 * Ignored otherwise.
> +			 *
> +			 * This field is preserved between enqueue and dequeue.

Is it the normalized or the unnormalized value that is preserved?

>   			 */
>   			uint8_t impl_opaque;
>   			/**< Implementation specific opaque value.

Maybe you can also fix "implementation" here to be something more 
specific. Implementation, of what?

"Event device implementation" or just "event device".

> +			 *
>   			 * An implementation may use this field to hold
>   			 * implementation specific value to share between
>   			 * dequeue and enqueue operation.
> +			 *
>   			 * The application should not modify this field.
> +			 * Its value is implementation dependent on dequeue,
> +			 * and must be returned unmodified on enqueue when
> +			 * op type is @ref RTE_EVENT_OP_FORWARD or @ref RTE_EVENT_OP_RELEASE

Should it be mentioned that impl_opaque is ignored by the event device 
for NEW events?

>   			 */
>   		};
>   	};
> --
> 2.40.1
>
  
Bruce Richardson Feb. 1, 2024, 9:35 a.m. UTC | #2
On Fri, Jan 19, 2024 at 05:43:46PM +0000, Bruce Richardson wrote:
> Clarify the meaning of the NEW, FORWARD and RELEASE event types.
> For the fields in "rte_event" struct, enhance the comments on each to
> clarify the field's use, and whether it is preserved between enqueue and
> dequeue, and it's role, if any, in scheduling.
> 
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> ---
> 
> As with the previous patch, please review this patch to ensure that the
> expected semantics of the various event types and event fields have not
> changed in an unexpected way.
> ---
>  lib/eventdev/rte_eventdev.h | 105 ++++++++++++++++++++++++++----------
>  1 file changed, 77 insertions(+), 28 deletions(-)
> 
<snip>

>  #define RTE_EVENT_OP_RELEASE            2
>  /**< Release the flow context associated with the schedule type.
>   *
> - * If current flow's scheduler type method is *RTE_SCHED_TYPE_ATOMIC*
> + * If current flow's scheduler type method is @ref RTE_SCHED_TYPE_ATOMIC
>   * then this function hints the scheduler that the user has completed critical
>   * section processing in the current atomic context.
>   * The scheduler is now allowed to schedule events from the same flow from
> @@ -1442,21 +1446,19 @@ struct rte_event_vector {
>   * performance, but the user needs to design carefully the split into critical
>   * vs non-critical sections.
>   *
> - * If current flow's scheduler type method is *RTE_SCHED_TYPE_ORDERED*
> - * then this function hints the scheduler that the user has done all that need
> - * to maintain event order in the current ordered context.
> - * The scheduler is allowed to release the ordered context of this port and
> - * avoid reordering any following enqueues.
> - *
> - * Early ordered context release may increase parallelism and thus system
> - * performance.

Before I do up a V3 of this patchset, I'd like to try and understand a bit
more what was meant by the original text for reordered here. The use of
"context" is very ambiguous, since it could refer to a number of different
things here.

For me, RELEASE for ordered queues should mean much the same as for atomic
queues - it means that the event being released is to be "dropped" from the
point of view of the eventdev scheduler - i.e. any atomic locks held for
that event should be released, and any reordering slots for it should be
skipped. However, the text above seems to imply that when we release one
event it means that we should stop reordering all subsequent events for
that port - which seems wrong to me. Especially in the case where
reordering may be done per flow, does one release mean that we need to go
through all flows and mark as skipped all reordered slots awaiting returned
events from that port? If this is what is intended, how is it better than
just doing another dequeue call from the port, which releases everything
automatically anyway?

Jerin, I believe you were the author of the original text, can you perhaps
clarify? Other PMD maintainers, can any of you chime in with current
supported behaviour when enqueuing a release of an ordered event?
Ideally, I'd like to see this simplified whereby release for ordered
behaves like that for atomic, and refers to the current event only, (and
drop any mention of contexts).

Thanks,
/Bruce
  
Jerin Jacob Feb. 1, 2024, 3 p.m. UTC | #3
On Thu, Feb 1, 2024 at 3:05 PM Bruce Richardson
<bruce.richardson@intel.com> wrote:
>
> On Fri, Jan 19, 2024 at 05:43:46PM +0000, Bruce Richardson wrote:
> > Clarify the meaning of the NEW, FORWARD and RELEASE event types.
> > For the fields in "rte_event" struct, enhance the comments on each to
> > clarify the field's use, and whether it is preserved between enqueue and
> > dequeue, and it's role, if any, in scheduling.
> >
> > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> > ---
> >
> > As with the previous patch, please review this patch to ensure that the
> > expected semantics of the various event types and event fields have not
> > changed in an unexpected way.
> > ---
> >  lib/eventdev/rte_eventdev.h | 105 ++++++++++++++++++++++++++----------
> >  1 file changed, 77 insertions(+), 28 deletions(-)
> >
> <snip>
>
> >  #define RTE_EVENT_OP_RELEASE            2
> >  /**< Release the flow context associated with the schedule type.
> >   *
> > - * If current flow's scheduler type method is *RTE_SCHED_TYPE_ATOMIC*
> > + * If current flow's scheduler type method is @ref RTE_SCHED_TYPE_ATOMIC
> >   * then this function hints the scheduler that the user has completed critical
> >   * section processing in the current atomic context.
> >   * The scheduler is now allowed to schedule events from the same flow from
> > @@ -1442,21 +1446,19 @@ struct rte_event_vector {
> >   * performance, but the user needs to design carefully the split into critical
> >   * vs non-critical sections.
> >   *
> > - * If current flow's scheduler type method is *RTE_SCHED_TYPE_ORDERED*
> > - * then this function hints the scheduler that the user has done all that need
> > - * to maintain event order in the current ordered context.
> > - * The scheduler is allowed to release the ordered context of this port and
> > - * avoid reordering any following enqueues.
> > - *
> > - * Early ordered context release may increase parallelism and thus system
> > - * performance.
>
> Before I do up a V3 of this patchset, I'd like to try and understand a bit
> more what was meant by the original text for reordered here. The use of
> "context" is very ambiguous, since it could refer to a number of different
> things here.
>
> For me, RELEASE for ordered queues should mean much the same as for atomic
> queues - it means that the event being released is to be "dropped" from the
> point of view of the eventdev scheduler - i.e. any atomic locks held for
> that event should be released, and any reordering slots for it should be
> skipped. However, the text above seems to imply that when we release one
> event it means that we should stop reordering all subsequent events for
> that port - which seems wrong to me. Especially in the case where
> reordering may be done per flow, does one release mean that we need to go
> through all flows and mark as skipped all reordered slots awaiting returned
> events from that port? If this is what is intended, how is it better than
> just doing another dequeue call from the port, which releases everything
> automatically anyway?
>
> Jerin, I believe you were the author of the original text, can you perhaps
> clarify? Other PMD maintainers, can any of you chime in with current
> supported behaviour when enqueuing a release of an ordered event?

If N cores call rte_event_dequeue_burst() and get events from the same
flow, and it is scheduled as RTE_SCHED_TYPE_ORDERED, then irrespective
of the timing of the downstream rte_event_enqueue_burst() invocation on
any core, upon rte_event_enqueue_burst() completion the events will be
enqueued to the downstream queue in the ingress order.

Assume one of the cores uses RTE_EVENT_OP_RELEASE in between dequeue
and enqueue; then that event is no longer eligible for the ingress
order maintenance.


> Ideally, I'd like to see this simplified whereby release for ordered
> behaves like that for atomic, and refers to the current event only, (and
> drop any mention of contexts).
>
> Thanks,
> /Bruce
  
Bruce Richardson Feb. 1, 2024, 3:24 p.m. UTC | #4
On Thu, Feb 01, 2024 at 08:30:26PM +0530, Jerin Jacob wrote:
> On Thu, Feb 1, 2024 at 3:05 PM Bruce Richardson
> <bruce.richardson@intel.com> wrote:
> >
> > On Fri, Jan 19, 2024 at 05:43:46PM +0000, Bruce Richardson wrote:
> > > Clarify the meaning of the NEW, FORWARD and RELEASE event types.
> > > For the fields in "rte_event" struct, enhance the comments on each to
> > > clarify the field's use, and whether it is preserved between enqueue and
> > > dequeue, and it's role, if any, in scheduling.
> > >
> > > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> > > ---
> > >
> > > As with the previous patch, please review this patch to ensure that the
> > > expected semantics of the various event types and event fields have not
> > > changed in an unexpected way.
> > > ---
> > >  lib/eventdev/rte_eventdev.h | 105 ++++++++++++++++++++++++++----------
> > >  1 file changed, 77 insertions(+), 28 deletions(-)
> > >
> > <snip>
> >
> > >  #define RTE_EVENT_OP_RELEASE            2
> > >  /**< Release the flow context associated with the schedule type.
> > >   *
> > > - * If current flow's scheduler type method is *RTE_SCHED_TYPE_ATOMIC*
> > > + * If current flow's scheduler type method is @ref RTE_SCHED_TYPE_ATOMIC
> > >   * then this function hints the scheduler that the user has completed critical
> > >   * section processing in the current atomic context.
> > >   * The scheduler is now allowed to schedule events from the same flow from
> > > @@ -1442,21 +1446,19 @@ struct rte_event_vector {
> > >   * performance, but the user needs to design carefully the split into critical
> > >   * vs non-critical sections.
> > >   *
> > > - * If current flow's scheduler type method is *RTE_SCHED_TYPE_ORDERED*
> > > - * then this function hints the scheduler that the user has done all that need
> > > - * to maintain event order in the current ordered context.
> > > - * The scheduler is allowed to release the ordered context of this port and
> > > - * avoid reordering any following enqueues.
> > > - *
> > > - * Early ordered context release may increase parallelism and thus system
> > > - * performance.
> >
> > Before I do up a V3 of this patchset, I'd like to try and understand a bit
> > more what was meant by the original text for reordered here. The use of
> > "context" is very ambiguous, since it could refer to a number of different
> > things here.
> >
> > For me, RELEASE for ordered queues should mean much the same as for atomic
> > queues - it means that the event being released is to be "dropped" from the
> > point of view of the eventdev scheduler - i.e. any atomic locks held for
> > that event should be released, and any reordering slots for it should be
> > skipped. However, the text above seems to imply that when we release one
> > event it means that we should stop reordering all subsequent events for
> > that port - which seems wrong to me. Especially in the case where
> > reordering may be done per flow, does one release mean that we need to go
> > through all flows and mark as skipped all reordered slots awaiting returned
> > events from that port? If this is what is intended, how is it better than
> > just doing another dequeue call from the port, which releases everything
> > automatically anyway?
> >
> > Jerin, I believe you were the author of the original text, can you perhaps
> > clarify? Other PMD maintainers, can any of you chime in with current
> > supported behaviour when enqueuing a release of an ordered event?
> 
> If N number of cores does rte_event_dequeue_burst() and got the same
> flow, and it is scheduled as
> RTE_SCHED_TYPE_ORDERED and then irrespective of the timing downstream
> rte_event_enqueue_burst()
> invocation any core. Upon rte_event_enqueue_burst() completion, the
> events will be enqueued the downstream
> queue in the ingress order.
> 
> Assume, one of the core, calls RTE_EVENT_OP_RELEASE  in between
> dequeue and enqueue, then that event no more
> eligible for the ingress order maintenance.
> 
Thanks for the reply. Just to confirm my understanding - the RELEASE
applies only to the event that is being skipped/dropped, so in a burst mode
of operation, i.e. when nb_dequeued > 1, the other events may still be enqueued
from that burst and reordered appropriately. Correct?

/Bruce
  
Jerin Jacob Feb. 1, 2024, 4:20 p.m. UTC | #5
On Thu, Feb 1, 2024 at 8:54 PM Bruce Richardson
<bruce.richardson@intel.com> wrote:
>
> On Thu, Feb 01, 2024 at 08:30:26PM +0530, Jerin Jacob wrote:
> > On Thu, Feb 1, 2024 at 3:05 PM Bruce Richardson
> > <bruce.richardson@intel.com> wrote:
> > >
> > > On Fri, Jan 19, 2024 at 05:43:46PM +0000, Bruce Richardson wrote:
> > > > Clarify the meaning of the NEW, FORWARD and RELEASE event types.
> > > > For the fields in "rte_event" struct, enhance the comments on each to
> > > > clarify the field's use, and whether it is preserved between enqueue and
> > > > dequeue, and it's role, if any, in scheduling.
> > > >
> > > > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> > > > ---
> > > >
> > > > As with the previous patch, please review this patch to ensure that the
> > > > expected semantics of the various event types and event fields have not
> > > > changed in an unexpected way.
> > > > ---
> > > >  lib/eventdev/rte_eventdev.h | 105 ++++++++++++++++++++++++++----------
> > > >  1 file changed, 77 insertions(+), 28 deletions(-)
> > > >
> > > <snip>
> > >
> > > >  #define RTE_EVENT_OP_RELEASE            2
> > > >  /**< Release the flow context associated with the schedule type.
> > > >   *
> > > > - * If current flow's scheduler type method is *RTE_SCHED_TYPE_ATOMIC*
> > > > + * If current flow's scheduler type method is @ref RTE_SCHED_TYPE_ATOMIC
> > > >   * then this function hints the scheduler that the user has completed critical
> > > >   * section processing in the current atomic context.
> > > >   * The scheduler is now allowed to schedule events from the same flow from
> > > > @@ -1442,21 +1446,19 @@ struct rte_event_vector {
> > > >   * performance, but the user needs to design carefully the split into critical
> > > >   * vs non-critical sections.
> > > >   *
> > > > - * If current flow's scheduler type method is *RTE_SCHED_TYPE_ORDERED*
> > > > - * then this function hints the scheduler that the user has done all that need
> > > > - * to maintain event order in the current ordered context.
> > > > - * The scheduler is allowed to release the ordered context of this port and
> > > > - * avoid reordering any following enqueues.
> > > > - *
> > > > - * Early ordered context release may increase parallelism and thus system
> > > > - * performance.
> > >
> > > Before I do up a V3 of this patchset, I'd like to try and understand a bit
> > > more what was meant by the original text for reordered here. The use of
> > > "context" is very ambiguous, since it could refer to a number of different
> > > things here.
> > >
> > > For me, RELEASE for ordered queues should mean much the same as for atomic
> > > queues - it means that the event being released is to be "dropped" from the
> > > point of view of the eventdev scheduler - i.e. any atomic locks held for
> > > that event should be released, and any reordering slots for it should be
> > > skipped. However, the text above seems to imply that when we release one
> > > event it means that we should stop reordering all subsequent events for
> > > that port - which seems wrong to me. Especially in the case where
> > > reordering may be done per flow, does one release mean that we need to go
> > > through all flows and mark as skipped all reordered slots awaiting returned
> > > events from that port? If this is what is intended, how is it better than
> > > just doing another dequeue call from the port, which releases everything
> > > automatically anyway?
> > >
> > > Jerin, I believe you were the author of the original text, can you perhaps
> > > clarify? Other PMD maintainers, can any of you chime in with current
> > > supported behaviour when enqueuing a release of an ordered event?
> >
> > If N number of cores does rte_event_dequeue_burst() and got the same
> > flow, and it is scheduled as
> > RTE_SCHED_TYPE_ORDERED and then irrespective of the timing downstream
> > rte_event_enqueue_burst()
> > invocation any core. Upon rte_event_enqueue_burst() completion, the
> > events will be enqueued the downstream
> > queue in the ingress order.
> >
> > Assume, one of the core, calls RTE_EVENT_OP_RELEASE  in between
> > dequeue and enqueue, then that event no more
> > eligible for the ingress order maintenance.
> >
> Thanks for the reply. Just to confirm my understanding - the RELEASE
> applies to the event that is being skipped/dropped, which in a burst-mode
> of operation i.e. when nb_dequeued > 1, other events may still be enqueued
> from that burst and reordered appropriately. Correct?

Yes. That's my understanding too.

>
> /Bruce
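
To make the per-event RELEASE behaviour confirmed above concrete, a minimal
sketch (drop condition and queue id are hypothetical, not from the thread) of
a worker that drops some events from a dequeued burst while forwarding the rest:

#include <rte_eventdev.h>

/* Illustrative only: release (drop) individual events from a burst while
 * forwarding the others; only the released events give up their place in
 * the ordered/atomic context, the remaining events are still reordered
 * normally on enqueue.
 */
static void
process_burst(uint8_t dev_id, uint8_t port_id, uint8_t next_queue_id)
{
	struct rte_event evs[32];
	uint16_t n, i, sent;

	n = rte_event_dequeue_burst(dev_id, port_id, evs, 32, 0);

	for (i = 0; i < n; i++) {
		if (evs[i].mbuf == NULL) {      /* hypothetical drop condition */
			/* impl_opaque must be left as it was on dequeue. */
			evs[i].op = RTE_EVENT_OP_RELEASE;
		} else {
			evs[i].op = RTE_EVENT_OP_FORWARD;
			evs[i].queue_id = next_queue_id;
		}
	}

	/* All ops must go back via the same port they were dequeued from. */
	for (sent = 0; sent < n; )
		sent += rte_event_enqueue_burst(dev_id, port_id,
						evs + sent, n - sent);
}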
  
Bruce Richardson Feb. 1, 2024, 4:59 p.m. UTC | #6
On Wed, Jan 24, 2024 at 12:34:50PM +0100, Mattias Rönnblom wrote:
> On 2024-01-19 18:43, Bruce Richardson wrote:
> > Clarify the meaning of the NEW, FORWARD and RELEASE event types.
> > For the fields in "rte_event" struct, enhance the comments on each to
> > clarify the field's use, and whether it is preserved between enqueue and
> > dequeue, and it's role, if any, in scheduling.
> > 
> > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> > ---
> > 
> > As with the previous patch, please review this patch to ensure that the
> > expected semantics of the various event types and event fields have not
> > changed in an unexpected way.
> > ---
> >   lib/eventdev/rte_eventdev.h | 105 ++++++++++++++++++++++++++----------
> >   1 file changed, 77 insertions(+), 28 deletions(-)
> > 
> > diff --git a/lib/eventdev/rte_eventdev.h b/lib/eventdev/rte_eventdev.h
> > index cb13602ffb..4eff1c4958 100644
> > --- a/lib/eventdev/rte_eventdev.h
> > +++ b/lib/eventdev/rte_eventdev.h
> > @@ -1416,21 +1416,25 @@ struct rte_event_vector {
> > 
> >   /* Event enqueue operations */
> >   #define RTE_EVENT_OP_NEW                0
> > -/**< The event producers use this operation to inject a new event to the
> > +/**< The @ref rte_event.op field should be set to this type to inject a new event to the
> >    * event device.
> >    */
> 
> "type" -> "value"
> 
> "to" -> "into"?
> 
> You could also say "to mark the event as new".
> 
> What is new? Maybe "new (as opposed to a forwarded) event." or "new (i.e.,
> not previously dequeued).".
> 

Using this latter suggested wording in V3.

> "The application sets the @ref rte_event.op field of an enqueued event to
> this value to mark the event as new (i.e., not previously dequeued)."
> 
> >   #define RTE_EVENT_OP_FORWARD            1
> > -/**< The CPU use this operation to forward the event to different event queue or
> > - * change to new application specific flow or schedule type to enable
> > - * pipelining.
> > +/**< SW should set the @ref rte_event.op filed to this type to return a
> > + * previously dequeued event to the event device for further processing.
> 
> "filed" -> "field"
> 
> "SW" -> "The application"
> 
> "to be scheduled for further processing (or transmission)"
> 
> The wording should otherwise be the same as NEW, whatever you choose there.
> 
Ack.

> >    *
> > - * This operation must only be enqueued to the same port that the
> > + * This event *must* be enqueued to the same port that the
> >    * event to be forwarded was dequeued from.
> 
> OK, so you "should" mark a new event RTE_EVENT_OP_FORWARD but you "*must*"
> enqueue it to the same port.
> 
> I think you "must" do both.
> 
Ack

> > + *
> > + * The event's fields, including (but not limited to) flow_id, scheduling type,
> > + * destination queue, and event payload e.g. mbuf pointer, may all be updated as
> > + * desired by software, but the @ref rte_event.impl_opaque field must
> 
> "software" -> "application"
>
Ack
 
> > + * be kept to the same value as was present when the event was dequeued.
> >    */
> >   #define RTE_EVENT_OP_RELEASE            2
> >   /**< Release the flow context associated with the schedule type.
> >    *
> > - * If current flow's scheduler type method is *RTE_SCHED_TYPE_ATOMIC*
> > + * If current flow's scheduler type method is @ref RTE_SCHED_TYPE_ATOMIC
> >    * then this function hints the scheduler that the user has completed critical
> >    * section processing in the current atomic context.
> >    * The scheduler is now allowed to schedule events from the same flow from
> > @@ -1442,21 +1446,19 @@ struct rte_event_vector {
> >    * performance, but the user needs to design carefully the split into critical
> >    * vs non-critical sections.
> >    *
> > - * If current flow's scheduler type method is *RTE_SCHED_TYPE_ORDERED*
> > - * then this function hints the scheduler that the user has done all that need
> > - * to maintain event order in the current ordered context.
> > - * The scheduler is allowed to release the ordered context of this port and
> > - * avoid reordering any following enqueues.
> > - *
> > - * Early ordered context release may increase parallelism and thus system
> > - * performance.
> > + * If current flow's scheduler type method is @ref RTE_SCHED_TYPE_ORDERED
> 
> Isn't a missing "or @ref RTE_SCHED_TYPE_ATOMIC" just an oversight (in the
> original API wording)?
> 

No, I don't think so, because ATOMIC is described above.

> > + * then this function informs the scheduler that the current event has
> > + * completed processing and will not be returned to the scheduler, i.e.
> > + * it has been dropped, and so the reordering context for that event
> > + * should be considered filled.
> >    *
> > - * If current flow's scheduler type method is *RTE_SCHED_TYPE_PARALLEL*
> > + * If current flow's scheduler type method is @ref RTE_SCHED_TYPE_PARALLEL
> >    * or no scheduling context is held then this function may be an NOOP,
> >    * depending on the implementation.
> 
> Maybe you can also fix this "function" -> "operation". I suggest you delete
> that sentence, because it makes no sense.
> 
> What is says in a somewhat vague manner is that you tread into the realm of
> undefined behavior if you release PARALLEL events.
> 

Agree. Just deleting.

> >    *
> >    * This operation must only be enqueued to the same port that the
> > - * event to be released was dequeued from.
> > + * event to be released was dequeued from. The @ref rte_event.impl_opaque
> > + * field in the release event must match that in the original dequeued event.
> 
> I would say "same value" rather than "match".
> 
> "The @ref rte_event.impl_opaque field in the release event have the same
> value as in the original dequeued event."
> 
Ack.

> >    */
> > 
> >   /**
> > @@ -1473,53 +1475,100 @@ struct rte_event {
> >   			/**< Targeted flow identifier for the enqueue and
> >   			 * dequeue operation.
> >   			 * The value must be in the range of
> > -			 * [0, nb_event_queue_flows - 1] which
> > +			 * [0, @ref rte_event_dev_config.nb_event_queue_flows - 1] which
> 
> The same comment as I had before about ranges for unsigned types.
> 
Ack.

> >   			 * previously supplied to rte_event_dev_configure().
> > +			 *
> > +			 * For @ref RTE_SCHED_TYPE_ATOMIC, this field is used to identify a
> > +			 * flow context for atomicity, such that events from each individual flow
> > +			 * will only be scheduled to one port at a time.
> 
> flow_id alone doesn't identify an atomic flow. It's queue_id + flow_id. I'm
> not sure I think "flow context" adds much, unless it's defined somewhere.
> Sounds like some assumed implementation detail.
> 
Removing the word context, and adding that it identifies a flow "within a
queue and priority level", to make it clear that it's not just the flow_id
involved here, as you say.

> > +			 *
> > +			 * This field is preserved between enqueue and dequeue when
> > +			 * a device reports the @ref RTE_EVENT_DEV_CAP_CARRY_FLOW_ID
> > +			 * capability. Otherwise the value is implementation dependent
> > +			 * on dequeue >   			 */
> >   			uint32_t sub_event_type:8;
> >   			/**< Sub-event types based on the event source.
> > +			 *
> > +			 * This field is preserved between enqueue and dequeue.
> > +			 * This field is for SW or event adapter use,
> 
> "SW" -> "application"
> 
Ack.

> > +			 * and is unused in scheduling decisions.
> > +			 *
> >   			 * @see RTE_EVENT_TYPE_CPU
> >   			 */
> >   			uint32_t event_type:4;
> > -			/**< Event type to classify the event source.
> > -			 * @see RTE_EVENT_TYPE_ETHDEV, (RTE_EVENT_TYPE_*)
> > +			/**< Event type to classify the event source. (RTE_EVENT_TYPE_*)
> > +			 *
> > +			 * This field is preserved between enqueue and dequeue
> > +			 * This field is for SW or event adapter use,
> > +			 * and is unused in scheduling decisions.
> 
> "unused" -> "is not considered"?
> 
Ack.

> >   			 */
> >   			uint8_t op:2;
> > -			/**< The type of event enqueue operation - new/forward/
> > -			 * etc.This field is not preserved across an instance
> > +			/**< The type of event enqueue operation - new/forward/ etc.
> > +			 *
> > +			 * This field is *not* preserved across an instance
> >   			 * and is undefined on dequeue.
> 
> Maybe you should use "undefined" rather than "implementation dependent", or
> change this instance of undefined to implementation dependent. Now two
> different terms are used for the same thing.
> 

Using implementation dependent.
Ideally, I think we should update all drivers to set this to "FORWARD" by
default on dequeue, but for now it's "implementation dependent".

> > -			 * @see RTE_EVENT_OP_NEW, (RTE_EVENT_OP_*)
> > +			 *
> > +			 * @see RTE_EVENT_OP_NEW
> > +			 * @see RTE_EVENT_OP_FORWARD
> > +			 * @see RTE_EVENT_OP_RELEASE
> >   			 */
> >   			uint8_t rsvd:4;
> > -			/**< Reserved for future use */
> > +			/**< Reserved for future use.
> > +			 *
> > +			 * Should be set to zero on enqueue. Zero on dequeue.
> > +			 */
> 
> Why say they should be zero on dequeue? Doesn't this defeat the purpose of
> having reserved bits, for future use? With you suggested wording, you can't
> use these bits without breaking the ABI.

Good point. Removing the dequeue value bit.

> 
> >   			uint8_t sched_type:2;
> >   			/**< Scheduler synchronization type (RTE_SCHED_TYPE_*)
> >   			 * associated with flow id on a given event queue
> >   			 * for the enqueue and dequeue operation.
> > +			 *
> > +			 * This field is used to determine the scheduling type
> > +			 * for events sent to queues where @ref RTE_EVENT_QUEUE_CFG_ALL_TYPES
> > +			 * is supported.
> 
> "supported" -> "configured"
> 
Ack.

> > +			 * For queues where only a single scheduling type is available,
> > +			 * this field must be set to match the configured scheduling type.
> > +			 *
> 
> Why is the API/event device asking this of the application?
> 
Historical reasons. I agree that it shouldn't; this should just be marked
as ignored on fixed-type queues, but the spec up till now says it must
match and some drivers do check this. Once we update the drivers to stop
checking then we can change the spec without affecting apps.
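
For illustration of what the current wording asks of the application here
(helper name is hypothetical), sched_type can be derived from the
rte_event_queue_conf that the application used at queue setup time:

#include <rte_eventdev.h>

/* Illustrative only: on an all-types queue the application chooses the
 * scheduling type per event; on a single-type queue the current spec
 * expects the value to match the configured schedule_type.
 */
static void
set_sched_type(const struct rte_event_queue_conf *qconf, struct rte_event *ev)
{
	if (qconf->event_queue_cfg & RTE_EVENT_QUEUE_CFG_ALL_TYPES)
		ev->sched_type = RTE_SCHED_TYPE_ATOMIC; /* app's per-event choice */
	else
		ev->sched_type = qconf->schedule_type;
}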

> > +			 * This field is preserved between enqueue and dequeue.
> > +			 *
> > +			 * @see RTE_SCHED_TYPE_ORDERED
> > +			 * @see RTE_SCHED_TYPE_ATOMIC
> > +			 * @see RTE_SCHED_TYPE_PARALLEL
> >   			 */
> >   			uint8_t queue_id;
> >   			/**< Targeted event queue identifier for the enqueue or
> >   			 * dequeue operation.
> >   			 * The value must be in the range of
> > -			 * [0, nb_event_queues - 1] which previously supplied to
> > -			 * rte_event_dev_configure().
> > +			 * [0, @ref rte_event_dev_config.nb_event_queues - 1] which was
> > +			 * previously supplied to rte_event_dev_configure().
> > +			 *
> > +			 * This field is preserved between enqueue on dequeue.
> >   			 */
> >   			uint8_t priority;
> >   			/**< Event priority relative to other events in the
> >   			 * event queue. The requested priority should in the
> > -			 * range of  [RTE_EVENT_DEV_PRIORITY_HIGHEST,
> > -			 * RTE_EVENT_DEV_PRIORITY_LOWEST].
> > +			 * range of  [@ref RTE_EVENT_DEV_PRIORITY_HIGHEST,
> > +			 * @ref RTE_EVENT_DEV_PRIORITY_LOWEST].
> >   			 * The implementation shall normalize the requested
> >   			 * priority to supported priority value.
> > +			 *
> >   			 * Valid when the device has
> > -			 * RTE_EVENT_DEV_CAP_EVENT_QOS capability.
> > +			 * @ref RTE_EVENT_DEV_CAP_EVENT_QOS capability.
> > +			 * Ignored otherwise.
> > +			 *
> > +			 * This field is preserved between enqueue and dequeue.
> 
> Is the normalized or unnormalized value that is preserved?
> 
Very good point. It's the normalized & then denormalised version that is
guaranteed to be preserved, I suspect. SW eventdevs probably preserve
as-is, but HW eventdevs may lose precision. Rather than making this
"implementation defined" or "not preserved" which would be annoying for
apps, I think, I'm going to document this as "preserved, but with possible
loss of precision".
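
As an illustration (hypothetical helper, not from the patch) of priority only
being honoured when the device reports @ref RTE_EVENT_DEV_CAP_EVENT_QOS, and
possibly being normalized to fewer levels by the implementation:

#include <rte_eventdev.h>

/* Illustrative only: set the event priority only when the device reports
 * RTE_EVENT_DEV_CAP_EVENT_QOS; it is ignored otherwise, and the value seen
 * after dequeue may have lost precision due to normalization.
 */
static void
set_event_priority(uint8_t dev_id, struct rte_event *ev, uint8_t prio)
{
	struct rte_event_dev_info info;

	if (rte_event_dev_info_get(dev_id, &info) == 0 &&
	    (info.event_dev_cap & RTE_EVENT_DEV_CAP_EVENT_QOS))
		ev->priority = prio;    /* HIGHEST (0) .. LOWEST (255) */
	else
		ev->priority = RTE_EVENT_DEV_PRIORITY_NORMAL;
}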

> >   			 */
> >   			uint8_t impl_opaque;
> >   			/**< Implementation specific opaque value.
> 
> Maybe you can also fix "implementation" here to be something more specific.
> Implementation, of what?
> 
> "Event device implementation" or just "event device".
> 
"Opaque field for event device use"

> > +			 *
> >   			 * An implementation may use this field to hold
> >   			 * implementation specific value to share between
> >   			 * dequeue and enqueue operation.
> > +			 *
> >   			 * The application should not modify this field.
> > +			 * Its value is implementation dependent on dequeue,
> > +			 * and must be returned unmodified on enqueue when
> > +			 * op type is @ref RTE_EVENT_OP_FORWARD or @ref RTE_EVENT_OP_RELEASE
> 
> Should it be mentioned that impl_opaque is ignored by the event device for
> NEW events?
> 
Added in V3.

> >   			 */
> >   		};
> >   	};
> > --
> > 2.40.1
> >
  
Bruce Richardson Feb. 1, 2024, 5:02 p.m. UTC | #7
On Wed, Jan 24, 2024 at 12:34:50PM +0100, Mattias Rönnblom wrote:
> On 2024-01-19 18:43, Bruce Richardson wrote:
> > Clarify the meaning of the NEW, FORWARD and RELEASE event types.
> > For the fields in "rte_event" struct, enhance the comments on each to
> > clarify the field's use, and whether it is preserved between enqueue and
> > dequeue, and it's role, if any, in scheduling.
> > 
> > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> > ---
> > 
> > As with the previous patch, please review this patch to ensure that the
> > expected semantics of the various event types and event fields have not
> > changed in an unexpected way.
> > ---
> >   lib/eventdev/rte_eventdev.h | 105 ++++++++++++++++++++++++++----------
> >   1 file changed, 77 insertions(+), 28 deletions(-)
> > 
> > diff --git a/lib/eventdev/rte_eventdev.h b/lib/eventdev/rte_eventdev.h
> > index cb13602ffb..4eff1c4958 100644
> > --- a/lib/eventdev/rte_eventdev.h
> > +++ b/lib/eventdev/rte_eventdev.h
<snip>

> >   /**
> > @@ -1473,53 +1475,100 @@ struct rte_event {
> >   			/**< Targeted flow identifier for the enqueue and
> >   			 * dequeue operation.
> >   			 * The value must be in the range of
> > -			 * [0, nb_event_queue_flows - 1] which
> > +			 * [0, @ref rte_event_dev_config.nb_event_queue_flows - 1] which
> 
> The same comment as I had before about ranges for unsigned types.
> 
Actually, is this correct, does a range actually apply here? 

I thought that the number of queue flows supported was a guide as to how
internal HW resources were to be allocated, and that the flow_id was always
a 20-bit value, where it was up to the scheduler to work out how to map
that to internal atomic locks (when combined with queue ids etc.). It
should not be up to the app to have to do the range limiting itself!

/Bruce
  
Bruce Richardson Feb. 2, 2024, 9:14 a.m. UTC | #8
On Thu, Feb 01, 2024 at 05:02:44PM +0000, Bruce Richardson wrote:
> On Wed, Jan 24, 2024 at 12:34:50PM +0100, Mattias Rönnblom wrote:
> > On 2024-01-19 18:43, Bruce Richardson wrote:
> > > Clarify the meaning of the NEW, FORWARD and RELEASE event types.
> > > For the fields in "rte_event" struct, enhance the comments on each to
> > > clarify the field's use, and whether it is preserved between enqueue and
> > > dequeue, and it's role, if any, in scheduling.
> > > 
> > > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> > > ---
> > > 
> > > As with the previous patch, please review this patch to ensure that the
> > > expected semantics of the various event types and event fields have not
> > > changed in an unexpected way.
> > > ---
> > >   lib/eventdev/rte_eventdev.h | 105 ++++++++++++++++++++++++++----------
> > >   1 file changed, 77 insertions(+), 28 deletions(-)
> > > 
> > > diff --git a/lib/eventdev/rte_eventdev.h b/lib/eventdev/rte_eventdev.h
> > > index cb13602ffb..4eff1c4958 100644
> > > --- a/lib/eventdev/rte_eventdev.h
> > > +++ b/lib/eventdev/rte_eventdev.h
> <snip>
> 
> > >   /**
> > > @@ -1473,53 +1475,100 @@ struct rte_event {
> > >   			/**< Targeted flow identifier for the enqueue and
> > >   			 * dequeue operation.
> > >   			 * The value must be in the range of
> > > -			 * [0, nb_event_queue_flows - 1] which
> > > +			 * [0, @ref rte_event_dev_config.nb_event_queue_flows - 1] which
> > 
> > The same comment as I had before about ranges for unsigned types.
> > 
> Actually, is this correct, does a range actually apply here? 
> 
> I thought that the number of queue flows supported was a guide as to how
> internal HW resources were to be allocated, and that the flow_id was always
> a 20-bit value, where it was up to the scheduler to work out how to map
> that to internal atomic locks (when combined with queue ids etc.). It
> should not be up to the app to have to do the range limiting itself!
> 
Looking at the RX adapter in eventdev, I don't see any obvious clamping of
the flow ids to the range of 0-nb_event_queue_flows, though I'm not that
familiar with that code, so I may have missed something. If I'm right,
it looks like this doc line may indeed be a mistake.

@Jerin, can you comment again here. Is flow_id really meant to be limited
to the specified range, or is it a full 20-bit value supplied in all cases?

Thanks,
/Bruce
  
Jerin Jacob Feb. 2, 2024, 9:22 a.m. UTC | #9
On Thu, Feb 1, 2024 at 10:33 PM Bruce Richardson
<bruce.richardson@intel.com> wrote:
>
> On Wed, Jan 24, 2024 at 12:34:50PM +0100, Mattias Rönnblom wrote:
> > On 2024-01-19 18:43, Bruce Richardson wrote:
> > > Clarify the meaning of the NEW, FORWARD and RELEASE event types.
> > > For the fields in "rte_event" struct, enhance the comments on each to
> > > clarify the field's use, and whether it is preserved between enqueue and
> > > dequeue, and it's role, if any, in scheduling.
> > >
> > > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> > > ---
> > >
> > > As with the previous patch, please review this patch to ensure that the
> > > expected semantics of the various event types and event fields have not
> > > changed in an unexpected way.
> > > ---
> > >   lib/eventdev/rte_eventdev.h | 105 ++++++++++++++++++++++++++----------
> > >   1 file changed, 77 insertions(+), 28 deletions(-)
> > >
> > > diff --git a/lib/eventdev/rte_eventdev.h b/lib/eventdev/rte_eventdev.h
> > > index cb13602ffb..4eff1c4958 100644
> > > --- a/lib/eventdev/rte_eventdev.h
> > > +++ b/lib/eventdev/rte_eventdev.h
> <snip>
>
> > >   /**
> > > @@ -1473,53 +1475,100 @@ struct rte_event {
> > >                     /**< Targeted flow identifier for the enqueue and
> > >                      * dequeue operation.
> > >                      * The value must be in the range of
> > > -                    * [0, nb_event_queue_flows - 1] which
> > > +                    * [0, @ref rte_event_dev_config.nb_event_queue_flows - 1] which
> >
> > The same comment as I had before about ranges for unsigned types.
> >
> Actually, is this correct, does a range actually apply here?
>
> I thought that the number of queue flows supported was a guide as to how
> internal HW resources were to be allocated, and that the flow_id was always
> a 20-bit value, where it was up to the scheduler to work out how to map
> that to internal atomic locks (when combined with queue ids etc.). It
> should not be up to the app to have to do the range limiting itself!

On CNXK HW, it supports a 20-bit value. I am not sure about other HW.
That is the reason I added this configuration parameter, allowing the HW
to be configured if it is NOT free.

>
> /Bruce
  
Bruce Richardson Feb. 2, 2024, 9:36 a.m. UTC | #10
On Fri, Feb 02, 2024 at 02:52:05PM +0530, Jerin Jacob wrote:
> On Thu, Feb 1, 2024 at 10:33 PM Bruce Richardson
> <bruce.richardson@intel.com> wrote:
> >
> > On Wed, Jan 24, 2024 at 12:34:50PM +0100, Mattias Rönnblom wrote:
> > > On 2024-01-19 18:43, Bruce Richardson wrote:
> > > > Clarify the meaning of the NEW, FORWARD and RELEASE event types.
> > > > For the fields in "rte_event" struct, enhance the comments on each to
> > > > clarify the field's use, and whether it is preserved between enqueue and
> > > > dequeue, and it's role, if any, in scheduling.
> > > >
> > > > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> > > > ---
> > > >
> > > > As with the previous patch, please review this patch to ensure that the
> > > > expected semantics of the various event types and event fields have not
> > > > changed in an unexpected way.
> > > > ---
> > > >   lib/eventdev/rte_eventdev.h | 105 ++++++++++++++++++++++++++----------
> > > >   1 file changed, 77 insertions(+), 28 deletions(-)
> > > >
> > > > diff --git a/lib/eventdev/rte_eventdev.h b/lib/eventdev/rte_eventdev.h
> > > > index cb13602ffb..4eff1c4958 100644
> > > > --- a/lib/eventdev/rte_eventdev.h
> > > > +++ b/lib/eventdev/rte_eventdev.h
> > <snip>
> >
> > > >   /**
> > > > @@ -1473,53 +1475,100 @@ struct rte_event {
> > > >                     /**< Targeted flow identifier for the enqueue and
> > > >                      * dequeue operation.
> > > >                      * The value must be in the range of
> > > > -                    * [0, nb_event_queue_flows - 1] which
> > > > +                    * [0, @ref rte_event_dev_config.nb_event_queue_flows - 1] which
> > >
> > > The same comment as I had before about ranges for unsigned types.
> > >
> > Actually, is this correct, does a range actually apply here?
> >
> > I thought that the number of queue flows supported was a guide as to how
> > internal HW resources were to be allocated, and that the flow_id was always
> > a 20-bit value, where it was up to the scheduler to work out how to map
> > that to internal atomic locks (when combined with queue ids etc.). It
> > should not be up to the app to have to do the range limiting itself!
> 
> On CNXK HW, it supports 20bit value. I am not sure about other HW.
> That is the reason I add this configuration parameter by allowing HW
> to be configured if it is NOT free.
> 
Ok, but that is making the assumption that the number of flow slots is
directly related to the number of bits of flow_id which can be passed in. I
think it's the driver or device's job to hash down the bits if necessary
internally.

For v3 I'm going to remove this sentence, as the event RX adapter doesn't
seem to be limiting things, and nobody's reported an issue with it, and
also because the rte_event_dev_config struct itself doesn't mention the
config value having an impact on the flow-ids that can be passed.

/Bruce
  
Mattias Rönnblom Feb. 2, 2024, 9:38 a.m. UTC | #11
On 2024-02-01 17:59, Bruce Richardson wrote:
> On Wed, Jan 24, 2024 at 12:34:50PM +0100, Mattias Rönnblom wrote:
>> On 2024-01-19 18:43, Bruce Richardson wrote:
>>> Clarify the meaning of the NEW, FORWARD and RELEASE event types.
>>> For the fields in "rte_event" struct, enhance the comments on each to
>>> clarify the field's use, and whether it is preserved between enqueue and
>>> dequeue, and it's role, if any, in scheduling.
>>>
>>> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
>>> ---
>>>
>>> As with the previous patch, please review this patch to ensure that the
>>> expected semantics of the various event types and event fields have not
>>> changed in an unexpected way.
>>> ---
>>>    lib/eventdev/rte_eventdev.h | 105 ++++++++++++++++++++++++++----------
>>>    1 file changed, 77 insertions(+), 28 deletions(-)
>>>
>>> diff --git a/lib/eventdev/rte_eventdev.h b/lib/eventdev/rte_eventdev.h
>>> index cb13602ffb..4eff1c4958 100644
>>> --- a/lib/eventdev/rte_eventdev.h
>>> +++ b/lib/eventdev/rte_eventdev.h
>>> @@ -1416,21 +1416,25 @@ struct rte_event_vector {
>>>
>>>    /* Event enqueue operations */
>>>    #define RTE_EVENT_OP_NEW                0
>>> -/**< The event producers use this operation to inject a new event to the
>>> +/**< The @ref rte_event.op field should be set to this type to inject a new event to the
>>>     * event device.
>>>     */
>>
>> "type" -> "value"
>>
>> "to" -> "into"?
>>
>> You could also say "to mark the event as new".
>>
>> What is new? Maybe "new (as opposed to a forwarded) event." or "new (i.e.,
>> not previously dequeued).".
>>
> 
> Using this latter suggested wording in V3.
> 
>> "The application sets the @ref rte_event.op field of an enqueued event to
>> this value to mark the event as new (i.e., not previously dequeued)."
>>
>>>    #define RTE_EVENT_OP_FORWARD            1
>>> -/**< The CPU use this operation to forward the event to different event queue or
>>> - * change to new application specific flow or schedule type to enable
>>> - * pipelining.
>>> +/**< SW should set the @ref rte_event.op filed to this type to return a
>>> + * previously dequeued event to the event device for further processing.
>>
>> "filed" -> "field"
>>
>> "SW" -> "The application"
>>
>> "to be scheduled for further processing (or transmission)"
>>
>> The wording should otherwise be the same as NEW, whatever you choose there.
>>
> Ack.
> 
>>>     *
>>> - * This operation must only be enqueued to the same port that the
>>> + * This event *must* be enqueued to the same port that the
>>>     * event to be forwarded was dequeued from.
>>
>> OK, so you "should" mark a new event RTE_EVENT_OP_FORWARD but you "*must*"
>> enqueue it to the same port.
>>
>> I think you "must" do both.
>>
> Ack
> 
>>> + *
>>> + * The event's fields, including (but not limited to) flow_id, scheduling type,
>>> + * destination queue, and event payload e.g. mbuf pointer, may all be updated as
>>> + * desired by software, but the @ref rte_event.impl_opaque field must
>>
>> "software" -> "application"
>>
> Ack
>   
>>> + * be kept to the same value as was present when the event was dequeued.
>>>     */
>>>    #define RTE_EVENT_OP_RELEASE            2
>>>    /**< Release the flow context associated with the schedule type.
>>>     *
>>> - * If current flow's scheduler type method is *RTE_SCHED_TYPE_ATOMIC*
>>> + * If current flow's scheduler type method is @ref RTE_SCHED_TYPE_ATOMIC
>>>     * then this function hints the scheduler that the user has completed critical
>>>     * section processing in the current atomic context.
>>>     * The scheduler is now allowed to schedule events from the same flow from
>>> @@ -1442,21 +1446,19 @@ struct rte_event_vector {
>>>     * performance, but the user needs to design carefully the split into critical
>>>     * vs non-critical sections.
>>>     *
>>> - * If current flow's scheduler type method is *RTE_SCHED_TYPE_ORDERED*
>>> - * then this function hints the scheduler that the user has done all that need
>>> - * to maintain event order in the current ordered context.
>>> - * The scheduler is allowed to release the ordered context of this port and
>>> - * avoid reordering any following enqueues.
>>> - *
>>> - * Early ordered context release may increase parallelism and thus system
>>> - * performance.
>>> + * If current flow's scheduler type method is @ref RTE_SCHED_TYPE_ORDERED
>>
>> Isn't a missing "or @ref RTE_SCHED_TYPE_ATOMIC" just an oversight (in the
>> original API wording)?
>>
> 
> No, I don't think so, because ATOMIC is described above.
> 
>>> + * then this function informs the scheduler that the current event has
>>> + * completed processing and will not be returned to the scheduler, i.e.
>>> + * it has been dropped, and so the reordering context for that event
>>> + * should be considered filled.
>>>     *
>>> - * If current flow's scheduler type method is *RTE_SCHED_TYPE_PARALLEL*
>>> + * If current flow's scheduler type method is @ref RTE_SCHED_TYPE_PARALLEL
>>>     * or no scheduling context is held then this function may be an NOOP,
>>>     * depending on the implementation.
>>
>> Maybe you can also fix this "function" -> "operation". I suggest you delete
>> that sentence, because it makes no sense.
>>
>> What is says in a somewhat vague manner is that you tread into the realm of
>> undefined behavior if you release PARALLEL events.
>>
> 
> Agree. Just deleting.
> 
>>>     *
>>>     * This operation must only be enqueued to the same port that the
>>> - * event to be released was dequeued from.
>>> + * event to be released was dequeued from. The @ref rte_event.impl_opaque
>>> + * field in the release event must match that in the original dequeued event.
>>
>> I would say "same value" rather than "match".
>>
>> "The @ref rte_event.impl_opaque field in the release event have the same
>> value as in the original dequeued event."
>>
> Ack.
> 
>>>     */
>>>
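Putting the above RELEASE rules together, this is how I understand an
application would drop an ordered event it has dequeued - again just an
illustrative sketch, with should_drop() made up, not something from the spec:

	struct rte_event ev;

	if (rte_event_dequeue_burst(dev_id, port_id, &ev, 1, 0) != 0 &&
	    should_drop(&ev)) {
		ev.op = RTE_EVENT_OP_RELEASE;  /* give back the scheduling context */
		/* impl_opaque must keep the value it had on dequeue */
		rte_event_enqueue_burst(dev_id, port_id, &ev, 1); /* same port */
	}
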
>>>    /**
>>> @@ -1473,53 +1475,100 @@ struct rte_event {
>>>    			/**< Targeted flow identifier for the enqueue and
>>>    			 * dequeue operation.
>>>    			 * The value must be in the range of
>>> -			 * [0, nb_event_queue_flows - 1] which
>>> +			 * [0, @ref rte_event_dev_config.nb_event_queue_flows - 1] which
>>
>> The same comment as I had before about ranges for unsigned types.
>>
> Ack.
> 
>>>    			 * previously supplied to rte_event_dev_configure().
>>> +			 *
>>> +			 * For @ref RTE_SCHED_TYPE_ATOMIC, this field is used to identify a
>>> +			 * flow context for atomicity, such that events from each individual flow
>>> +			 * will only be scheduled to one port at a time.
>>
>> flow_id alone doesn't identify an atomic flow. It's queue_id + flow_id. I'm
>> not sure I think "flow context" adds much, unless it's defined somewhere.
>> Sounds like some assumed implementation detail.
>>
> Removing the word context, and adding that it identifies a flow "within a
> queue and priority level", to make it clear that it's not just the flow_id
> involved here, as you say.
> 
>>> +			 *
>>> +			 * This field is preserved between enqueue and dequeue when
>>> +			 * a device reports the @ref RTE_EVENT_DEV_CAP_CARRY_FLOW_ID
>>> +			 * capability. Otherwise the value is implementation dependent
>>> +			 * on dequeue.
>>>    			 */
>>>    			uint32_t sub_event_type:8;
>>>    			/**< Sub-event types based on the event source.
>>> +			 *
>>> +			 * This field is preserved between enqueue and dequeue.
>>> +			 * This field is for SW or event adapter use,
>>
>> "SW" -> "application"
>>
> Ack.
> 
>>> +			 * and is unused in scheduling decisions.
>>> +			 *
>>>    			 * @see RTE_EVENT_TYPE_CPU
>>>    			 */
>>>    			uint32_t event_type:4;
>>> -			/**< Event type to classify the event source.
>>> -			 * @see RTE_EVENT_TYPE_ETHDEV, (RTE_EVENT_TYPE_*)
>>> +			/**< Event type to classify the event source. (RTE_EVENT_TYPE_*)
>>> +			 *
>>> +			 * This field is preserved between enqueue and dequeue
>>> +			 * This field is for SW or event adapter use,
>>> +			 * and is unused in scheduling decisions.
>>
>> "unused" -> "is not considered"?
>>
> Ack.
> 
>>>    			 */
>>>    			uint8_t op:2;
>>> -			/**< The type of event enqueue operation - new/forward/
>>> -			 * etc.This field is not preserved across an instance
>>> +			/**< The type of event enqueue operation - new/forward/ etc.
>>> +			 *
>>> +			 * This field is *not* preserved across an instance
>>>    			 * and is undefined on dequeue.
>>
>> Maybe you should use "undefined" rather than "implementation dependent", or
>> change this instance of undefined to implementation dependent. Now two
>> different terms are used for the same thing.
>>
> 
> Using implementation dependent.
> Ideally, I think we should update all drivers to set this to "FORWARD" by
> default on dequeue, but for now it's "implementation dependent".
> 

That would make a lot of sense.

>>> -			 * @see RTE_EVENT_OP_NEW, (RTE_EVENT_OP_*)
>>> +			 *
>>> +			 * @see RTE_EVENT_OP_NEW
>>> +			 * @see RTE_EVENT_OP_FORWARD
>>> +			 * @see RTE_EVENT_OP_RELEASE
>>>    			 */
>>>    			uint8_t rsvd:4;
>>> -			/**< Reserved for future use */
>>> +			/**< Reserved for future use.
>>> +			 *
>>> +			 * Should be set to zero on enqueue. Zero on dequeue.
>>> +			 */
>>
>> Why say they should be zero on dequeue? Doesn't this defeat the purpose of
>> having reserved bits, for future use? With you suggested wording, you can't
>> use these bits without breaking the ABI.
> 
> Good point. Removing the dequeue value bit.
> 
>>
>>>    			uint8_t sched_type:2;
>>>    			/**< Scheduler synchronization type (RTE_SCHED_TYPE_*)
>>>    			 * associated with flow id on a given event queue
>>>    			 * for the enqueue and dequeue operation.
>>> +			 *
>>> +			 * This field is used to determine the scheduling type
>>> +			 * for events sent to queues where @ref RTE_EVENT_QUEUE_CFG_ALL_TYPES
>>> +			 * is supported.
>>
>> "supported" -> "configured"
>>
> Ack.
> 
>>> +			 * For queues where only a single scheduling type is available,
>>> +			 * this field must be set to match the configured scheduling type.
>>> +			 *
>>
>> Why is the API/event device asking this of the application?
>>
> Historical reasons. I agree that it shouldn't; this should just be marked
> as ignored on fixed-type queues, but the spec up till now says it must
> match and some drivers do check this. Once we update the drivers to stop
> checking, we can change the spec without affecting apps.
> 
>>> +			 * This field is preserved between enqueue and dequeue.
>>> +			 *
>>> +			 * @see RTE_SCHED_TYPE_ORDERED
>>> +			 * @see RTE_SCHED_TYPE_ATOMIC
>>> +			 * @see RTE_SCHED_TYPE_PARALLEL
>>>    			 */
>>>    			uint8_t queue_id;
>>>    			/**< Targeted event queue identifier for the enqueue or
>>>    			 * dequeue operation.
>>>    			 * The value must be in the range of
>>> -			 * [0, nb_event_queues - 1] which previously supplied to
>>> -			 * rte_event_dev_configure().
>>> +			 * [0, @ref rte_event_dev_config.nb_event_queues - 1] which was
>>> +			 * previously supplied to rte_event_dev_configure().
>>> +			 *
>>> +			 * This field is preserved between enqueue on dequeue.
>>>    			 */
>>>    			uint8_t priority;
>>>    			/**< Event priority relative to other events in the
>>>    			 * event queue. The requested priority should in the
>>> -			 * range of  [RTE_EVENT_DEV_PRIORITY_HIGHEST,
>>> -			 * RTE_EVENT_DEV_PRIORITY_LOWEST].
>>> +			 * range of  [@ref RTE_EVENT_DEV_PRIORITY_HIGHEST,
>>> +			 * @ref RTE_EVENT_DEV_PRIORITY_LOWEST].
>>>    			 * The implementation shall normalize the requested
>>>    			 * priority to supported priority value.
>>> +			 *
>>>    			 * Valid when the device has
>>> -			 * RTE_EVENT_DEV_CAP_EVENT_QOS capability.
>>> +			 * @ref RTE_EVENT_DEV_CAP_EVENT_QOS capability.
>>> +			 * Ignored otherwise.
>>> +			 *
>>> +			 * This field is preserved between enqueue and dequeue.
>>
>> Is the normalized or unnormalized value that is preserved?
>>
> Very good point. It's the normalized & then denormalised version that is
> guaranteed to be preserved, I suspect. SW eventdevs probably preserve
> as-is, but HW eventdevs may lose precision. Rather than making this
> "implementation defined" or "not preserved" which would be annoying for
> apps, I think, I'm going to document this as "preserved, but with possible
> loss of precision".
> 

This makes me again think it may be worth noting that Eventdev -> API 
priority normalization is (event.priority * PMD_LEVELS) / 
EVENTDEV_LEVELS (rounded down) - assuming that's how it's supposed to be 
done - or something to that effect.
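
Something along those lines, in code form (assuming the eventdev range is
the full 0..255 of the uint8_t priority field; pmd_levels is whatever the
PMD reports as max_event_priority_levels):

	static inline uint8_t
	pmd_priority(uint8_t ev_prio, unsigned int pmd_levels)
	{
		/* rounds down; highest priority (0) always maps to 0 */
		return (uint8_t)(((unsigned int)ev_prio * pmd_levels) / 256);
	}

With pmd_levels = 4 that gives 0..63 -> 0, 64..127 -> 1, 128..191 -> 2 and
192..255 -> 3.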

>>>    			 */
>>>    			uint8_t impl_opaque;
>>>    			/**< Implementation specific opaque value.
>>
>> Maybe you can also fix "implementation" here to be something more specific.
>> Implementation, of what?
>>
>> "Event device implementation" or just "event device".
>>
> "Opaque field for event device use"
> 
>>> +			 *
>>>    			 * An implementation may use this field to hold
>>>    			 * implementation specific value to share between
>>>    			 * dequeue and enqueue operation.
>>> +			 *
>>>    			 * The application should not modify this field.
>>> +			 * Its value is implementation dependent on dequeue,
>>> +			 * and must be returned unmodified on enqueue when
>>> +			 * op type is @ref RTE_EVENT_OP_FORWARD or @ref RTE_EVENT_OP_RELEASE
>>
>> Should it be mentioned that impl_opaque is ignored by the event device for
>> NEW events?
>>
> Added in V3.
> 
>>>    			 */
>>>    		};
>>>    	};
>>> --
>>> 2.40.1
>>>
  
Mattias Rönnblom Feb. 2, 2024, 9:45 a.m. UTC | #12
On 2024-02-01 18:02, Bruce Richardson wrote:
> On Wed, Jan 24, 2024 at 12:34:50PM +0100, Mattias Rönnblom wrote:
>> On 2024-01-19 18:43, Bruce Richardson wrote:
>>> Clarify the meaning of the NEW, FORWARD and RELEASE event types.
>>> For the fields in "rte_event" struct, enhance the comments on each to
>>> clarify the field's use, and whether it is preserved between enqueue and
>>> dequeue, and it's role, if any, in scheduling.
>>>
>>> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
>>> ---
>>>
>>> As with the previous patch, please review this patch to ensure that the
>>> expected semantics of the various event types and event fields have not
>>> changed in an unexpected way.
>>> ---
>>>    lib/eventdev/rte_eventdev.h | 105 ++++++++++++++++++++++++++----------
>>>    1 file changed, 77 insertions(+), 28 deletions(-)
>>>
>>> diff --git a/lib/eventdev/rte_eventdev.h b/lib/eventdev/rte_eventdev.h
>>> index cb13602ffb..4eff1c4958 100644
>>> --- a/lib/eventdev/rte_eventdev.h
>>> +++ b/lib/eventdev/rte_eventdev.h
> <snip>
> 
>>>    /**
>>> @@ -1473,53 +1475,100 @@ struct rte_event {
>>>    			/**< Targeted flow identifier for the enqueue and
>>>    			 * dequeue operation.
>>>    			 * The value must be in the range of
>>> -			 * [0, nb_event_queue_flows - 1] which
>>> +			 * [0, @ref rte_event_dev_config.nb_event_queue_flows - 1] which
>>
>> The same comment as I had before about ranges for unsigned types.
>>
> Actually, is this correct, does a range actually apply here?
> 
> I thought that the number of queue flows supported was a guide as to how
> internal HW resources were to be allocated, and that the flow_id was always
> a 20-bit value, where it was up to the scheduler to work out how to map
> that to internal atomic locks (when combined with queue ids etc.). It
> should not be up to the app to have to do the range limiting itself!
> 

Indeed, I also operated under this belief, which is reflected in DSW, 
which just takes the flow_id and (pseudo-)hash+mask it into the 
appropriate range.

Leaving it to the app allows app-level attempts to avoid collisions 
between large flows, I guess. Not sure I think apps will (or even 
should) really do this.
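
To be concrete about what I mean by hash+mask (illustrative only, not the
actual DSW code):

	/* PMD-internal; the application never sees this value. */
	#define PMD_FLOWS_PER_QUEUE 1024	/* power of two, so the mask works */

	static inline uint32_t
	pmd_flow_index(uint32_t flow_id)
	{
		uint32_t h = flow_id * 2654435761u;	/* cheap pseudo-hash */
		return h & (PMD_FLOWS_PER_QUEUE - 1);
	}

i.e. the PMD, not the application, folds the 20-bit flow_id into whatever
internal range it uses for atomic flow state.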
  
Bruce Richardson Feb. 2, 2024, 10:32 a.m. UTC | #13
On Fri, Feb 02, 2024 at 10:45:34AM +0100, Mattias Rönnblom wrote:
> On 2024-02-01 18:02, Bruce Richardson wrote:
> > On Wed, Jan 24, 2024 at 12:34:50PM +0100, Mattias Rönnblom wrote:
> > > On 2024-01-19 18:43, Bruce Richardson wrote:
> > > > Clarify the meaning of the NEW, FORWARD and RELEASE event types.
> > > > For the fields in "rte_event" struct, enhance the comments on each to
> > > > clarify the field's use, and whether it is preserved between enqueue and
> > > > dequeue, and it's role, if any, in scheduling.
> > > > 
> > > > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> > > > ---
> > > > 
> > > > As with the previous patch, please review this patch to ensure that the
> > > > expected semantics of the various event types and event fields have not
> > > > changed in an unexpected way.
> > > > ---
> > > >    lib/eventdev/rte_eventdev.h | 105 ++++++++++++++++++++++++++----------
> > > >    1 file changed, 77 insertions(+), 28 deletions(-)
> > > > 
> > > > diff --git a/lib/eventdev/rte_eventdev.h b/lib/eventdev/rte_eventdev.h
> > > > index cb13602ffb..4eff1c4958 100644
> > > > --- a/lib/eventdev/rte_eventdev.h
> > > > +++ b/lib/eventdev/rte_eventdev.h
> > <snip>
> > 
> > > >    /**
> > > > @@ -1473,53 +1475,100 @@ struct rte_event {
> > > >    			/**< Targeted flow identifier for the enqueue and
> > > >    			 * dequeue operation.
> > > >    			 * The value must be in the range of
> > > > -			 * [0, nb_event_queue_flows - 1] which
> > > > +			 * [0, @ref rte_event_dev_config.nb_event_queue_flows - 1] which
> > > 
> > > The same comment as I had before about ranges for unsigned types.
> > > 
> > Actually, is this correct, does a range actually apply here?
> > 
> > I thought that the number of queue flows supported was a guide as to how
> > internal HW resources were to be allocated, and that the flow_id was always
> > a 20-bit value, where it was up to the scheduler to work out how to map
> > that to internal atomic locks (when combined with queue ids etc.). It
> > should not be up to the app to have to do the range limiting itself!
> > 
> 
> Indeed, I also operated under this belief, which is reflected in DSW, which
> just takes the flow_id and (pseudo-)hash+mask it into the appropriate range.
> 
> Leaving it to the app allows app-level attempts to avoid collisions between
> large flows, I guess. Not sure I think apps will (or even should) really do
> this.

I'm just going to drop this restriction from v3.
  
Bruce Richardson Feb. 2, 2024, 11:33 a.m. UTC | #14
On Fri, Feb 02, 2024 at 10:38:10AM +0100, Mattias Rönnblom wrote:
> On 2024-02-01 17:59, Bruce Richardson wrote:
> > On Wed, Jan 24, 2024 at 12:34:50PM +0100, Mattias Rönnblom wrote:
> > > On 2024-01-19 18:43, Bruce Richardson wrote:
> > > > Clarify the meaning of the NEW, FORWARD and RELEASE event types.
> > > > For the fields in "rte_event" struct, enhance the comments on each to
> > > > clarify the field's use, and whether it is preserved between enqueue and
> > > > dequeue, and it's role, if any, in scheduling.
> > > > 
> > > > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> > > > ---
> > > > 
<snip>
> > > Is the normalized or unnormalized value that is preserved?
> > > 
> > Very good point. It's the normalized & then denormalised version that is
> > guaranteed to be preserved, I suspect. SW eventdevs probably preserve
> > as-is, but HW eventdevs may lose precision. Rather than making this
> > "implementation defined" or "not preserved" which would be annoying for
> > apps, I think, I'm going to document this as "preserved, but with possible
> > loss of precision".
> > 
> 
> This makes me again think it may be worth noting that Eventdev -> API
> priority normalization is (event.priority * PMD_LEVELS) / EVENTDEV_LEVELS
> (rounded down) - assuming that's how it's supposed to be done - or something
> to that effect.
> 
Following my comment on the thread on the other patch about looking at
numbers of bits of priority being valid, I did a quick check of the evdev PMDs
by using grep for "max_event_priority_levels" in each driver. According to
that (and resolving some #defines), I see:

0 - dpaa, dpaa2
1 - cnxk, dsw, octeontx, opdl
4 - sw
8 - dlb2, skeleton

So it looks like switching to a bit-scheme is workable, where we measure
supported event levels in powers-of-two only. [And we can cut down priority
fields if we like].
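
As a rough sketch of what I mean by a bit-scheme (names invented here,
nothing of this exists yet): a PMD would advertise the number of valid
priority bits rather than a level count, and normalization becomes a shift:

	#define EVDEV_PRIO_BITS 8	/* width of rte_event.priority today */

	static inline uint8_t
	pmd_priority(uint8_t ev_prio, uint8_t pmd_prio_bits)
	{
		return ev_prio >> (EVDEV_PRIO_BITS - pmd_prio_bits);
	}

so e.g. pmd_prio_bits = 2 (4 levels) gives 0..63 -> 0, 64..127 -> 1,
128..191 -> 2, 192..255 -> 3, matching the multiply/divide scheme for
power-of-two level counts.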

Question for confirmation. For cases where the eventdev does not support
per-event prioritization, I suppose we should say that the priority field
is not preserved, as well as being ignored?

/Bruce
  
Bruce Richardson Feb. 2, 2024, 12:02 p.m. UTC | #15
On Fri, Feb 02, 2024 at 11:33:19AM +0000, Bruce Richardson wrote:
> On Fri, Feb 02, 2024 at 10:38:10AM +0100, Mattias Rönnblom wrote:
> > On 2024-02-01 17:59, Bruce Richardson wrote:
> > > On Wed, Jan 24, 2024 at 12:34:50PM +0100, Mattias Rönnblom wrote:
> > > > On 2024-01-19 18:43, Bruce Richardson wrote:
> > > > > Clarify the meaning of the NEW, FORWARD and RELEASE event types.
> > > > > For the fields in "rte_event" struct, enhance the comments on each to
> > > > > clarify the field's use, and whether it is preserved between enqueue and
> > > > > dequeue, and it's role, if any, in scheduling.
> > > > > 
> > > > > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> > > > > ---
> > > > > 
> <snip>
> > > > Is the normalized or unnormalized value that is preserved?
> > > > 
> > > Very good point. It's the normalized & then denormalised version that is
> > > guaranteed to be preserved, I suspect. SW eventdevs probably preserve
> > > as-is, but HW eventdevs may lose precision. Rather than making this
> > > "implementation defined" or "not preserved" which would be annoying for
> > > apps, I think, I'm going to document this as "preserved, but with possible
> > > loss of precision".
> > > 
> > 
> > This makes me again think it may be worth noting that Eventdev -> API
> > priority normalization is (event.priority * PMD_LEVELS) / EVENTDEV_LEVELS
> > (rounded down) - assuming that's how it's supposed to be done - or something
> > to that effect.
> > 
> Following my comment on the thread on the other patch about looking at
> numbers of bits of priority being valid, I did a quick check of the evdev PMDs
> by using grep for "max_event_priority_levels" in each driver. According to
> that (and resolving some #defines), I see:
> 
> 0 - dpaa, dpaa2
> 1 - cnxk, dsw, octeontx, opdl
> 4 - sw
> 8 - dlb2, skeleton
> 
> So it looks like switching to a bit-scheme is workable, where we measure
> supported event levels in powers-of-two only. [And we can cut down priority
> fields if we like].
> 
And just for reference, the advertised values for
max_event_queue_priority_levels are:

1 - dsw, opdl
8 - cnxk, dlb2, dpaa, dpaa2, octeontx, skeleton
255 - sw [though this should really be 256, it's an off-by-one error due to
          the range of uint8_t type. SW evdev just sorts queues by priority
          using the whole priority value specified.]

So I think we can treat queue priority similarly to event priority - giving
the number of bits which are valid. Also, if we decide to cut the event
priority level range to e.g. 0-15, I think we can do the same for the queue
priority levels, so that the ranges are similar, and then we can adjust the
min-max definitions to match.
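
For example - purely illustrative, not a proposal of the exact values - if
we went to a 0-15 range for both, the defines could become something like:

	#define RTE_EVENT_DEV_PRIORITY_HIGHEST  0
	#define RTE_EVENT_DEV_PRIORITY_NORMAL   8
	#define RTE_EVENT_DEV_PRIORITY_LOWEST   15

with the same values reused for queue priority, so apps only have to deal
with one priority scale.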

/Bruce
  

Patch

diff --git a/lib/eventdev/rte_eventdev.h b/lib/eventdev/rte_eventdev.h
index cb13602ffb..4eff1c4958 100644
--- a/lib/eventdev/rte_eventdev.h
+++ b/lib/eventdev/rte_eventdev.h
@@ -1416,21 +1416,25 @@  struct rte_event_vector {

 /* Event enqueue operations */
 #define RTE_EVENT_OP_NEW                0
-/**< The event producers use this operation to inject a new event to the
+/**< The @ref rte_event.op field should be set to this type to inject a new event to the
  * event device.
  */
 #define RTE_EVENT_OP_FORWARD            1
-/**< The CPU use this operation to forward the event to different event queue or
- * change to new application specific flow or schedule type to enable
- * pipelining.
+/**< SW should set the @ref rte_event.op filed to this type to return a
+ * previously dequeued event to the event device for further processing.
  *
- * This operation must only be enqueued to the same port that the
+ * This event *must* be enqueued to the same port that the
  * event to be forwarded was dequeued from.
+ *
+ * The event's fields, including (but not limited to) flow_id, scheduling type,
+ * destination queue, and event payload e.g. mbuf pointer, may all be updated as
+ * desired by software, but the @ref rte_event.impl_opaque field must
+ * be kept to the same value as was present when the event was dequeued.
  */
 #define RTE_EVENT_OP_RELEASE            2
 /**< Release the flow context associated with the schedule type.
  *
- * If current flow's scheduler type method is *RTE_SCHED_TYPE_ATOMIC*
+ * If current flow's scheduler type method is @ref RTE_SCHED_TYPE_ATOMIC
  * then this function hints the scheduler that the user has completed critical
  * section processing in the current atomic context.
  * The scheduler is now allowed to schedule events from the same flow from
@@ -1442,21 +1446,19 @@  struct rte_event_vector {
  * performance, but the user needs to design carefully the split into critical
  * vs non-critical sections.
  *
- * If current flow's scheduler type method is *RTE_SCHED_TYPE_ORDERED*
- * then this function hints the scheduler that the user has done all that need
- * to maintain event order in the current ordered context.
- * The scheduler is allowed to release the ordered context of this port and
- * avoid reordering any following enqueues.
- *
- * Early ordered context release may increase parallelism and thus system
- * performance.
+ * If current flow's scheduler type method is @ref RTE_SCHED_TYPE_ORDERED
+ * then this function informs the scheduler that the current event has
+ * completed processing and will not be returned to the scheduler, i.e.
+ * it has been dropped, and so the reordering context for that event
+ * should be considered filled.
  *
- * If current flow's scheduler type method is *RTE_SCHED_TYPE_PARALLEL*
+ * If current flow's scheduler type method is @ref RTE_SCHED_TYPE_PARALLEL
  * or no scheduling context is held then this function may be an NOOP,
  * depending on the implementation.
  *
  * This operation must only be enqueued to the same port that the
- * event to be released was dequeued from.
+ * event to be released was dequeued from. The @ref rte_event.impl_opaque
+ * field in the release event must match that in the original dequeued event.
  */

 /**
@@ -1473,53 +1475,100 @@  struct rte_event {
 			/**< Targeted flow identifier for the enqueue and
 			 * dequeue operation.
 			 * The value must be in the range of
-			 * [0, nb_event_queue_flows - 1] which
+			 * [0, @ref rte_event_dev_config.nb_event_queue_flows - 1] which
 			 * previously supplied to rte_event_dev_configure().
+			 *
+			 * For @ref RTE_SCHED_TYPE_ATOMIC, this field is used to identify a
+			 * flow context for atomicity, such that events from each individual flow
+			 * will only be scheduled to one port at a time.
+			 *
+			 * This field is preserved between enqueue and dequeue when
+			 * a device reports the @ref RTE_EVENT_DEV_CAP_CARRY_FLOW_ID
+			 * capability. Otherwise the value is implementation dependent
+			 * on dequeue.
 			 */
 			uint32_t sub_event_type:8;
 			/**< Sub-event types based on the event source.
+			 *
+			 * This field is preserved between enqueue and dequeue.
+			 * This field is for SW or event adapter use,
+			 * and is unused in scheduling decisions.
+			 *
 			 * @see RTE_EVENT_TYPE_CPU
 			 */
 			uint32_t event_type:4;
-			/**< Event type to classify the event source.
-			 * @see RTE_EVENT_TYPE_ETHDEV, (RTE_EVENT_TYPE_*)
+			/**< Event type to classify the event source. (RTE_EVENT_TYPE_*)
+			 *
+			 * This field is preserved between enqueue and dequeue
+			 * This field is for SW or event adapter use,
+			 * and is unused in scheduling decisions.
 			 */
 			uint8_t op:2;
-			/**< The type of event enqueue operation - new/forward/
-			 * etc.This field is not preserved across an instance
+			/**< The type of event enqueue operation - new/forward/ etc.
+			 *
+			 * This field is *not* preserved across an instance
 			 * and is undefined on dequeue.
-			 * @see RTE_EVENT_OP_NEW, (RTE_EVENT_OP_*)
+			 *
+			 * @see RTE_EVENT_OP_NEW
+			 * @see RTE_EVENT_OP_FORWARD
+			 * @see RTE_EVENT_OP_RELEASE
 			 */
 			uint8_t rsvd:4;
-			/**< Reserved for future use */
+			/**< Reserved for future use.
+			 *
+			 * Should be set to zero on enqueue. Zero on dequeue.
+			 */
 			uint8_t sched_type:2;
 			/**< Scheduler synchronization type (RTE_SCHED_TYPE_*)
 			 * associated with flow id on a given event queue
 			 * for the enqueue and dequeue operation.
+			 *
+			 * This field is used to determine the scheduling type
+			 * for events sent to queues where @ref RTE_EVENT_QUEUE_CFG_ALL_TYPES
+			 * is supported.
+			 * For queues where only a single scheduling type is available,
+			 * this field must be set to match the configured scheduling type.
+			 *
+			 * This field is preserved between enqueue and dequeue.
+			 *
+			 * @see RTE_SCHED_TYPE_ORDERED
+			 * @see RTE_SCHED_TYPE_ATOMIC
+			 * @see RTE_SCHED_TYPE_PARALLEL
 			 */
 			uint8_t queue_id;
 			/**< Targeted event queue identifier for the enqueue or
 			 * dequeue operation.
 			 * The value must be in the range of
-			 * [0, nb_event_queues - 1] which previously supplied to
-			 * rte_event_dev_configure().
+			 * [0, @ref rte_event_dev_config.nb_event_queues - 1] which was
+			 * previously supplied to rte_event_dev_configure().
+			 *
+			 * This field is preserved between enqueue on dequeue.
 			 */
 			uint8_t priority;
 			/**< Event priority relative to other events in the
 			 * event queue. The requested priority should in the
-			 * range of  [RTE_EVENT_DEV_PRIORITY_HIGHEST,
-			 * RTE_EVENT_DEV_PRIORITY_LOWEST].
+			 * range of  [@ref RTE_EVENT_DEV_PRIORITY_HIGHEST,
+			 * @ref RTE_EVENT_DEV_PRIORITY_LOWEST].
 			 * The implementation shall normalize the requested
 			 * priority to supported priority value.
+			 *
 			 * Valid when the device has
-			 * RTE_EVENT_DEV_CAP_EVENT_QOS capability.
+			 * @ref RTE_EVENT_DEV_CAP_EVENT_QOS capability.
+			 * Ignored otherwise.
+			 *
+			 * This field is preserved between enqueue and dequeue.
 			 */
 			uint8_t impl_opaque;
 			/**< Implementation specific opaque value.
+			 *
 			 * An implementation may use this field to hold
 			 * implementation specific value to share between
 			 * dequeue and enqueue operation.
+			 *
 			 * The application should not modify this field.
+			 * Its value is implementation dependent on dequeue,
+			 * and must be returned unmodified on enqueue when
+			 * op type is @ref RTE_EVENT_OP_FORWARD or @ref RTE_EVENT_OP_RELEASE
 			 */
 		};
 	};