[v3,03/10] ethdev: bring in async queue-based flow rules operations

Message ID 20220206032526.816079-4-akozyrev@nvidia.com (mailing list archive)
State Superseded, archived
Delegated to: Ferruh Yigit
Series: ethdev: datapath-focused flow rules management

Checks

Context         Check     Description
ci/checkpatch   success   coding style OK

Commit Message

Alexander Kozyrev Feb. 6, 2022, 3:25 a.m. UTC
  A new, faster, queue-based flow rules management mechanism is needed for
applications offloading rules inside the datapath. This asynchronous
and lockless mechanism frees the CPU for further packet processing and
reduces the performance impact of the flow rules creation/destruction
on the datapath. Note that queues are not thread-safe and a queue
should be accessed from the same thread for all queue operations.
It is the responsibility of the application to synchronize access
in case of multi-threaded use of the same queue.

The rte_flow_q_flow_create() function enqueues a flow creation to the
requested queue. It benefits from already configured resources and sets
unique values on top of item and action templates. A flow rule is enqueued
on the specified flow queue and offloaded asynchronously to the hardware.
The function returns immediately to spare CPU for further packet
processing. The application must invoke the rte_flow_q_pull() function
to complete the flow rule operation offloading, to clear the queue, and to
receive the operation status. The rte_flow_q_flow_destroy() function
enqueues a flow destruction to the requested queue.
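
A minimal usage sketch of the proposed workflow is shown below. The field
names of rte_flow_q_ops_attr and rte_flow_q_op_res, and treating the
rte_flow_q_pull() return value as a result count, are assumptions made
for illustration only.

#include <rte_common.h>
#include <rte_flow.h>

static int
insert_rule_async(uint16_t port_id, uint32_t queue_id,
                  struct rte_flow_table *table,
                  const struct rte_flow_item pattern[],
                  const struct rte_flow_action actions[])
{
    struct rte_flow_q_ops_attr attr = {
        .postpone = 1,            /* assumed field: batch the operation */
        .user_data = (void *)1,   /* assumed field: echoed back on pull */
    };
    struct rte_flow_q_op_res res[8];
    struct rte_flow_error error;
    struct rte_flow *flow;
    int n;

    /* Enqueue the creation; the call returns immediately. */
    flow = rte_flow_q_flow_create(port_id, queue_id, &attr, table,
                                  pattern, 0, actions, 0, &error);
    if (flow == NULL)
        return -1;

    /* Push the postponed operation(s) to the NIC. */
    rte_flow_q_push(port_id, queue_id, &error);

    /* Poll until the completion (and its status) is delivered. */
    do {
        n = rte_flow_q_pull(port_id, queue_id, res, RTE_DIM(res), &error);
    } while (n == 0);

    return n < 0 ? -1 : 0;
}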

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
---
 doc/guides/prog_guide/img/rte_flow_q_init.svg |  71 ++++
 .../prog_guide/img/rte_flow_q_usage.svg       |  60 +++
 doc/guides/prog_guide/rte_flow.rst            | 159 +++++++-
 doc/guides/rel_notes/release_22_03.rst        |   8 +
 lib/ethdev/rte_flow.c                         | 173 ++++++++-
 lib/ethdev/rte_flow.h                         | 342 ++++++++++++++++++
 lib/ethdev/rte_flow_driver.h                  |  55 +++
 lib/ethdev/version.map                        |   7 +
 8 files changed, 873 insertions(+), 2 deletions(-)
 create mode 100644 doc/guides/prog_guide/img/rte_flow_q_init.svg
 create mode 100644 doc/guides/prog_guide/img/rte_flow_q_usage.svg
  

Comments

Ori Kam Feb. 7, 2022, 1:18 p.m. UTC | #1
Hi Alexander,

> -----Original Message-----
> From: Alexander Kozyrev <akozyrev@nvidia.com>
> Sent: Sunday, February 6, 2022 5:25 AM
> To: dev@dpdk.org
> Cc: Ori Kam <orika@nvidia.com>; NBU-Contact-Thomas Monjalon (EXTERNAL)
> <thomas@monjalon.net>; ivan.malov@oktetlabs.ru; andrew.rybchenko@oktetlabs.ru;
> ferruh.yigit@intel.com; mohammad.abdul.awal@intel.com; qi.z.zhang@intel.com;
> jerinj@marvell.com; ajit.khaparde@broadcom.com
> Subject: [PATCH v3 03/10] ethdev: bring in async queue-based flow rules operations
> 
> A new, faster, queue-based flow rules management mechanism is needed for
> applications offloading rules inside the datapath. This asynchronous
> and lockless mechanism frees the CPU for further packet processing and
> reduces the performance impact of the flow rules creation/destruction
> on the datapath. Note that queues are not thread-safe and the queue
> should be accessed from the same thread for all queue operations.
> It is the responsibility of the app to sync the queue functions in case
> of multi-threaded access to the same queue.
> 
> The rte_flow_q_flow_create() function enqueues a flow creation to the
> requested queue. It benefits from already configured resources and sets
> unique values on top of item and action templates. A flow rule is enqueued
> on the specified flow queue and offloaded asynchronously to the hardware.
> The function returns immediately to spare CPU for further packet
> processing. The application must invoke the rte_flow_q_pull() function
> to complete the flow rule operation offloading, to clear the queue, and to
> receive the operation status. The rte_flow_q_flow_destroy() function
> enqueues a flow destruction to the requested queue.
> 
> Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
> ---

Acked-by: Ori Kam <orika@nvidia.com>
Best,
Ori
  
Jerin Jacob Feb. 8, 2022, 10:56 a.m. UTC | #2
On Sun, Feb 6, 2022 at 8:57 AM Alexander Kozyrev <akozyrev@nvidia.com> wrote:
>
> A new, faster, queue-based flow rules management mechanism is needed for
> applications offloading rules inside the datapath. This asynchronous
> and lockless mechanism frees the CPU for further packet processing and
> reduces the performance impact of the flow rules creation/destruction
> on the datapath. Note that queues are not thread-safe and the queue
> should be accessed from the same thread for all queue operations.
> It is the responsibility of the app to sync the queue functions in case
> of multi-threaded access to the same queue.
>
> The rte_flow_q_flow_create() function enqueues a flow creation to the
> requested queue. It benefits from already configured resources and sets
> unique values on top of item and action templates. A flow rule is enqueued
> on the specified flow queue and offloaded asynchronously to the hardware.
> The function returns immediately to spare CPU for further packet
> processing. The application must invoke the rte_flow_q_pull() function
> to complete the flow rule operation offloading, to clear the queue, and to
> receive the operation status. The rte_flow_q_flow_destroy() function
> enqueues a flow destruction to the requested queue.

It is good to see the implementation, specifically to understand,
1)
I understand, We are creating queues to make multiple producers to
enqueue multiple jobs in parallel.
On the consumer side, Is it HW or some other cores to consume the job?
Can we operate in consumer in parallel?

2) Is Queue part of HW or just SW primitive to submit the work as a channel.


>
> Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
> ---
>  doc/guides/prog_guide/img/rte_flow_q_init.svg |  71 ++++
>  .../prog_guide/img/rte_flow_q_usage.svg       |  60 +++
>  doc/guides/prog_guide/rte_flow.rst            | 159 +++++++-
>  doc/guides/rel_notes/release_22_03.rst        |   8 +
>  lib/ethdev/rte_flow.c                         | 173 ++++++++-
>  lib/ethdev/rte_flow.h                         | 342 ++++++++++++++++++
>  lib/ethdev/rte_flow_driver.h                  |  55 +++
>  lib/ethdev/version.map                        |   7 +
>  8 files changed, 873 insertions(+), 2 deletions(-)
>  create mode 100644 doc/guides/prog_guide/img/rte_flow_q_init.svg
>  create mode 100644 doc/guides/prog_guide/img/rte_flow_q_usage.svg
>
> diff --git a/doc/guides/prog_guide/img/rte_flow_q_init.svg b/doc/guides/prog_guide/img/rte_flow_q_init.svg
> new file mode 100644
> index 0000000000..2080bf4c04



Some comments on the diagrams:
# rte_flow_q_create_flow and rte_flow_q_destroy_flow used instead of
rte_flow_q_flow_create/destroy
# rte_flow_q_pull's brackets(i.e ()) not aligned


> +</svg>
> \ No newline at end of file
> diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
> index b7799c5abe..734294e65d 100644
> --- a/doc/guides/prog_guide/rte_flow.rst
> +++ b/doc/guides/prog_guide/rte_flow.rst
> @@ -3607,12 +3607,16 @@ Hints about the expected number of counters or meters in an application,
>  for example, allow PMD to prepare and optimize NIC memory layout in advance.
>  ``rte_flow_configure()`` must be called before any flow rule is created,
>  but after an Ethernet device is configured.
> +It also creates flow queues for asynchronous flow rules operations via
> +queue-based API, see `Asynchronous operations`_ section.
>
>  .. code-block:: c
>
>     int
>     rte_flow_configure(uint16_t port_id,
>                       const struct rte_flow_port_attr *port_attr,
> +                     uint16_t nb_queue,

# rte_flow_info_get() don't have number of queues, why not adding
number queues in rte_flow_port_attr.
# And additional APIs for queue_setup() like ethdev.


> +                     const struct rte_flow_queue_attr *queue_attr[],
>                       struct rte_flow_error *error);
>
>  Information about resources that can benefit from pre-allocation can be
> @@ -3737,7 +3741,7 @@ and pattern and actions templates are created.
>
>  .. code-block:: c
>
> -       rte_flow_configure(port, *port_attr, *error);
> +       rte_flow_configure(port, *port_attr, nb_queue, *queue_attr, *error);
>
>         struct rte_flow_pattern_template *pattern_templates[0] =
>                 rte_flow_pattern_template_create(port, &itr, &pattern, &error);
> @@ -3750,6 +3754,159 @@ and pattern and actions templates are created.
>                                 *actions_templates, nb_actions_templates,
>                                 *error);
>
> +Asynchronous operations
> +-----------------------
> +
> +Flow rules management can be done via special lockless flow management queues.
> +- Queue operations are asynchronous and not thread-safe.
> +- Operations can thus be invoked by the app's datapath,
> +packet processing can continue while queue operations are processed by NIC.
> +- The queue number is configured at initialization stage.
> +- Available operation types: rule creation, rule destruction,
> +indirect rule creation, indirect rule destruction, indirect rule update.
> +- Operations may be reordered within a queue.
> +- Operations can be postponed and pushed to NIC in batches.
> +- Results pulling must be done on time to avoid queue overflows.
> +- User data is returned as part of the result to identify an operation.
> +- Flow handle is valid once the creation operation is enqueued and must be
> +destroyed even if the operation is not successful and the rule is not inserted.

You need CR between lines as rendered text does comes as new line in
between the items.


> +
> +The asynchronous flow rule insertion logic can be broken into two phases.
> +
> +1. Initialization stage as shown here:
> +
> +.. _figure_rte_flow_q_init:
> +
> +.. figure:: img/rte_flow_q_init.*
> +
> +2. Main loop as presented on a datapath application example:
> +
> +.. _figure_rte_flow_q_usage:
> +
> +.. figure:: img/rte_flow_q_usage.*

it is better to add sequence operations as text to understand the flow.


> +
> +Enqueue creation operation
> +~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +Enqueueing a flow rule creation operation is similar to simple creation.

If it is enqueue operation, why not call it ad rte_flow_q_flow_enqueue()

> +
> +.. code-block:: c
> +
> +       struct rte_flow *
> +       rte_flow_q_flow_create(uint16_t port_id,
> +                               uint32_t queue_id,
> +                               const struct rte_flow_q_ops_attr *q_ops_attr,
> +                               struct rte_flow_table *table,
> +                               const struct rte_flow_item pattern[],
> +                               uint8_t pattern_template_index,
> +                               const struct rte_flow_action actions[],

If I understand correctly, table is the pre-configured object that has
N number of patterns and N number of actions.
Why giving items[] and actions[] again?

> +                               uint8_t actions_template_index,
> +                               struct rte_flow_error *error);
> +
> +A valid handle in case of success is returned. It must be destroyed later
> +by calling ``rte_flow_q_flow_destroy()`` even if the rule is rejected by HW.
> +
> +Enqueue destruction operation
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Queue destruction operation.


> +
> +Enqueueing a flow rule destruction operation is similar to simple destruction.
> +
> +.. code-block:: c
> +
> +       int
> +       rte_flow_q_flow_destroy(uint16_t port_id,
> +                               uint32_t queue_id,
> +                               const struct rte_flow_q_ops_attr *q_ops_attr,
> +                               struct rte_flow *flow,
> +                               struct rte_flow_error *error);
> +
> +Push enqueued operations
> +~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +Pushing all internally stored rules from a queue to the NIC.
> +
> +.. code-block:: c
> +
> +       int
> +       rte_flow_q_push(uint16_t port_id,
> +                       uint32_t queue_id,
> +                       struct rte_flow_error *error);
> +
> +There is the postpone attribute in the queue operation attributes.
> +When it is set, multiple operations can be bulked together and not sent to HW
> +right away to save SW/HW interactions and prioritize throughput over latency.
> +The application must invoke this function to actually push all outstanding
> +operations to HW in this case.
> +
> +Pull enqueued operations
> +~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +Pulling asynchronous operations results.
> +
> +The application must invoke this function in order to complete asynchronous
> +flow rule operations and to receive flow rule operations statuses.
> +
> +.. code-block:: c
> +
> +       int
> +       rte_flow_q_pull(uint16_t port_id,
> +                       uint32_t queue_id,
> +                       struct rte_flow_q_op_res res[],
> +                       uint16_t n_res,
> +                       struct rte_flow_error *error);
> +
> +Multiple outstanding operation results can be pulled simultaneously.
> +User data may be provided during a flow creation/destruction in order
> +to distinguish between multiple operations. User data is returned as part
> +of the result to provide a method to detect which operation is completed.
> +
> +Enqueue indirect action creation operation
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +Asynchronous version of indirect action creation API.
> +
> +.. code-block:: c
> +
> +       struct rte_flow_action_handle *
> +       rte_flow_q_action_handle_create(uint16_t port_id,

What is the use case for this?
How application needs to use this. We already creating flow_table. Is
that not sufficient?


> +                       uint32_t queue_id,
> +                       const struct rte_flow_q_ops_attr *q_ops_attr,
> +                       const struct rte_flow_indir_action_conf *indir_action_conf,
> +                       const struct rte_flow_action *action,
> +                       struct rte_flow_error *error);
> +
> +A valid handle in case of success is returned. It must be destroyed later by
> +calling ``rte_flow_q_action_handle_destroy()`` even if the rule is rejected.
> +
> +Enqueue indirect action destruction operation
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +Asynchronous version of indirect action destruction API.
> +
> +.. code-block:: c
> +
> +       int
> +       rte_flow_q_action_handle_destroy(uint16_t port_id,
> +                       uint32_t queue_id,
> +                       const struct rte_flow_q_ops_attr *q_ops_attr,
> +                       struct rte_flow_action_handle *action_handle,
> +                       struct rte_flow_error *error);
> +
> +Enqueue indirect action update operation
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +Asynchronous version of indirect action update API.
> +
> +.. code-block:: c
> +
> +       int
> +       rte_flow_q_action_handle_update(uint16_t port_id,
> +                       uint32_t queue_id,
> +                       const struct rte_flow_q_ops_attr *q_ops_attr,
> +                       struct rte_flow_action_handle *action_handle,
> +                       const void *update,
> +                       struct rte_flow_error *error);
> +
>  .. _flow_isolated_mode:
>
>  Flow isolated mode
> diff --git a/doc/guides/rel_notes/release_22_03.rst b/doc/guides/rel_notes/release_22_03.rst
> index d23d1591df..80a85124e6 100644
> --- a/doc/guides/rel_notes/release_22_03.rst
> +++ b/doc/guides/rel_notes/release_22_03.rst
> @@ -67,6 +67,14 @@ New Features
>    ``rte_flow_table_destroy``, ``rte_flow_pattern_template_destroy``
>    and ``rte_flow_actions_template_destroy``.
>
> +* ethdev: Added ``rte_flow_q_flow_create`` and ``rte_flow_q_flow_destroy`` API
> +  to enqueue flow creaion/destruction operations asynchronously as well as
> +  ``rte_flow_q_pull`` to poll and retrieve results of these operations and
> +  ``rte_flow_q_push`` to push all the in-flight operations to the NIC.
> +  Introduced asynchronous API for indirect actions management as well:
> +  ``rte_flow_q_action_handle_create``, ``rte_flow_q_action_handle_destroy`` and
> +  ``rte_flow_q_action_handle_update``.
> +
  
Alexander Kozyrev Feb. 8, 2022, 2:11 p.m. UTC | #3
> On Tuesday, February 8, 2022 5:57 Jerin Jacob <jerinjacobk@gmail.com> wrote:
> On Sun, Feb 6, 2022 at 8:57 AM Alexander Kozyrev <akozyrev@nvidia.com>
> wrote:

Hi Jerin, thank you for reviewing my patch. I appreciate your input.
I'm planning to send v4 with the addressed comments today to be on time for RC1.
I hope my answers are satisfactory for the rest of the questions you raised.

> >
> > A new, faster, queue-based flow rules management mechanism is needed
> for
> > applications offloading rules inside the datapath. This asynchronous
> > and lockless mechanism frees the CPU for further packet processing and
> > reduces the performance impact of the flow rules creation/destruction
> > on the datapath. Note that queues are not thread-safe and the queue
> > should be accessed from the same thread for all queue operations.
> > It is the responsibility of the app to sync the queue functions in case
> > of multi-threaded access to the same queue.
> >
> > The rte_flow_q_flow_create() function enqueues a flow creation to the
> > requested queue. It benefits from already configured resources and sets
> > unique values on top of item and action templates. A flow rule is enqueued
> > on the specified flow queue and offloaded asynchronously to the
> hardware.
> > The function returns immediately to spare CPU for further packet
> > processing. The application must invoke the rte_flow_q_pull() function
> > to complete the flow rule operation offloading, to clear the queue, and to
> > receive the operation status. The rte_flow_q_flow_destroy() function
> > enqueues a flow destruction to the requested queue.
> 
> It is good to see the implementation, specifically to understand,

We will send the PMD implementation in the next few days.

> 1)
> I understand, We are creating queues to make multiple producers to
> enqueue multiple jobs in parallel.
> On the consumer side, Is it HW or some other cores to consume the job?

From an API point of view there is no restriction on the type of consumer.
It could be a hardware or software implementation, but in most cases
(and in our driver) it will be the NIC that handles the requests.

> Can we operate in consumer in parallel?

Yes, we can have multiple separate hardware queues to handle operations
in parallel, independently and without any locking mechanism.

> 2) Is Queue part of HW or just SW primitive to submit the work as a channel.

The queue is a software primitive.
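
For example, here is a minimal sketch of how an application could give each
worker lcore its own queue so that no locking is needed. The 1:1
lcore-to-queue mapping and the res[].user_data field are assumptions for
illustration, not something mandated by the API.

#include <rte_common.h>
#include <rte_lcore.h>
#include <rte_flow.h>

/* Called from each worker's datapath loop with that worker's own queue;
 * since no queue is ever shared between threads, no locking is required. */
static void
poll_flow_completions(uint16_t port_id)
{
    /* Assumed application choice: worker N uses flow queue N. */
    uint32_t queue_id = rte_lcore_index(rte_lcore_id());
    struct rte_flow_q_op_res res[16];
    struct rte_flow_error error;
    int n, i;

    n = rte_flow_q_pull(port_id, queue_id, res, RTE_DIM(res), &error);
    for (i = 0; i < n; i++) {
        /* res[i].user_data (assumed field) identifies the operation. */
    }
}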

> 
> >
> > Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
> > ---
> >  doc/guides/prog_guide/img/rte_flow_q_init.svg |  71 ++++
> >  .../prog_guide/img/rte_flow_q_usage.svg       |  60 +++
> >  doc/guides/prog_guide/rte_flow.rst            | 159 +++++++-
> >  doc/guides/rel_notes/release_22_03.rst        |   8 +
> >  lib/ethdev/rte_flow.c                         | 173 ++++++++-
> >  lib/ethdev/rte_flow.h                         | 342 ++++++++++++++++++
> >  lib/ethdev/rte_flow_driver.h                  |  55 +++
> >  lib/ethdev/version.map                        |   7 +
> >  8 files changed, 873 insertions(+), 2 deletions(-)
> >  create mode 100644 doc/guides/prog_guide/img/rte_flow_q_init.svg
> >  create mode 100644 doc/guides/prog_guide/img/rte_flow_q_usage.svg
> >
> > diff --git a/doc/guides/prog_guide/img/rte_flow_q_init.svg
> b/doc/guides/prog_guide/img/rte_flow_q_init.svg
> > new file mode 100644
> > index 0000000000..2080bf4c04
> 
> 
> 
> Some comments on the diagrams:
> # rte_flow_q_create_flow and rte_flow_q_destroy_flow used instead of
> rte_flow_q_flow_create/destroy
> # rte_flow_q_pull's brackets(i.e ()) not aligned

Will fix this, thanks for noticing.
 
> 
> > +</svg>
> > \ No newline at end of file
> > diff --git a/doc/guides/prog_guide/rte_flow.rst
> b/doc/guides/prog_guide/rte_flow.rst
> > index b7799c5abe..734294e65d 100644
> > --- a/doc/guides/prog_guide/rte_flow.rst
> > +++ b/doc/guides/prog_guide/rte_flow.rst
> > @@ -3607,12 +3607,16 @@ Hints about the expected number of counters
> or meters in an application,
> >  for example, allow PMD to prepare and optimize NIC memory layout in
> advance.
> >  ``rte_flow_configure()`` must be called before any flow rule is created,
> >  but after an Ethernet device is configured.
> > +It also creates flow queues for asynchronous flow rules operations via
> > +queue-based API, see `Asynchronous operations`_ section.
> >
> >  .. code-block:: c
> >
> >     int
> >     rte_flow_configure(uint16_t port_id,
> >                       const struct rte_flow_port_attr *port_attr,
> > +                     uint16_t nb_queue,
> 
> # rte_flow_info_get() don't have number of queues, why not adding
> number queues in rte_flow_port_attr.

Good suggestion, I'll add it to the capabilities structure.

> # And additional APIs for queue_setup() like ethdev.

ethdev has the start function, which tells the PMD when all configurations are done.
In our case there is no such function and the device is ready to create flows as soon
as we exit rte_flow_configure(). In addition, since the number of queues may affect
the resource allocation, it is best to process all the requested resources at the same time.
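
As a minimal sketch (the contents of rte_flow_port_attr and the .size field
of rte_flow_queue_attr are assumptions for illustration), the whole setup is
done in one call:

#include <rte_flow.h>

#define NB_FLOW_QUEUE 4

static int
configure_flow_engine(uint16_t port_id)
{
    /* Resource hints; the actual fields depend on the final layout. */
    const struct rte_flow_port_attr port_attr = { 0 };
    /* Assumed field: per-queue depth. */
    const struct rte_flow_queue_attr qattr = { .size = 1024 };
    const struct rte_flow_queue_attr *queue_attr[NB_FLOW_QUEUE];
    struct rte_flow_error error;
    unsigned int i;

    for (i = 0; i < NB_FLOW_QUEUE; i++)
        queue_attr[i] = &qattr;

    /* Called after rte_eth_dev_configure() and before any template,
     * table or flow creation; queues are usable right after it returns. */
    return rte_flow_configure(port_id, &port_attr, NB_FLOW_QUEUE,
                              queue_attr, &error);
}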

> 
> > +                     const struct rte_flow_queue_attr *queue_attr[],
> >                       struct rte_flow_error *error);
> >
> >  Information about resources that can benefit from pre-allocation can be
> > @@ -3737,7 +3741,7 @@ and pattern and actions templates are created.
> >
> >  .. code-block:: c
> >
> > -       rte_flow_configure(port, *port_attr, *error);
> > +       rte_flow_configure(port, *port_attr, nb_queue, *queue_attr,
> *error);
> >
> >         struct rte_flow_pattern_template *pattern_templates[0] =
> >                 rte_flow_pattern_template_create(port, &itr, &pattern, &error);
> > @@ -3750,6 +3754,159 @@ and pattern and actions templates are created.
> >                                 *actions_templates, nb_actions_templates,
> >                                 *error);
> >
> > +Asynchronous operations
> > +-----------------------
> > +
> > +Flow rules management can be done via special lockless flow
> management queues.
> > +- Queue operations are asynchronous and not thread-safe.
> > +- Operations can thus be invoked by the app's datapath,
> > +packet processing can continue while queue operations are processed by
> NIC.
> > +- The queue number is configured at initialization stage.
> > +- Available operation types: rule creation, rule destruction,
> > +indirect rule creation, indirect rule destruction, indirect rule update.
> > +- Operations may be reordered within a queue.
> > +- Operations can be postponed and pushed to NIC in batches.
> > +- Results pulling must be done on time to avoid queue overflows.
> > +- User data is returned as part of the result to identify an operation.
> > +- Flow handle is valid once the creation operation is enqueued and must
> be
> > +destroyed even if the operation is not successful and the rule is not
> inserted.
> 
> You need CR between lines as rendered text does comes as new line in
> between the items.

OK.

> 
> > +
> > +The asynchronous flow rule insertion logic can be broken into two phases.
> > +
> > +1. Initialization stage as shown here:
> > +
> > +.. _figure_rte_flow_q_init:
> > +
> > +.. figure:: img/rte_flow_q_init.*
> > +
> > +2. Main loop as presented on a datapath application example:
> > +
> > +.. _figure_rte_flow_q_usage:
> > +
> > +.. figure:: img/rte_flow_q_usage.*
> 
> it is better to add sequence operations as text to understand the flow.

I prefer keeping the diagram here, it looks cleaner and more concise.
A block of text gives no new information and is harder to follow, imho.

> 
> > +
> > +Enqueue creation operation
> > +~~~~~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +Enqueueing a flow rule creation operation is similar to simple creation.
> 
> If it is enqueue operation, why not call it ad rte_flow_q_flow_enqueue()
> 
> > +
> > +.. code-block:: c
> > +
> > +       struct rte_flow *
> > +       rte_flow_q_flow_create(uint16_t port_id,
> > +                               uint32_t queue_id,
> > +                               const struct rte_flow_q_ops_attr *q_ops_attr,
> > +                               struct rte_flow_table *table,
> > +                               const struct rte_flow_item pattern[],
> > +                               uint8_t pattern_template_index,
> > +                               const struct rte_flow_action actions[],
> 
> If I understand correctly, table is the pre-configured object that has
> N number of patterns and N number of actions.
> Why giving items[] and actions[] again?

The table only contains templates for patterns and actions.
We still need to provide the values for those templates when we create a flow.
Thus we specify the patterns and actions here.
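
For instance, a sketch of filling in template values at enqueue time
(the rte_flow_q_ops_attr layout is assumed; the templates themselves would
have been created with the relevant masks beforehand):

#include <rte_flow.h>

/* The table fixes the rule structure via its pattern/actions templates;
 * only the per-rule values (here: IPv4 destination and Rx queue index)
 * are supplied when the creation is enqueued. */
static struct rte_flow *
enqueue_ipv4_rule(uint16_t port_id, uint32_t queue_id,
                  struct rte_flow_table *table,
                  rte_be32_t dst_ip, uint16_t rx_queue)
{
    struct rte_flow_item_ipv4 ip_spec = { .hdr.dst_addr = dst_ip };
    struct rte_flow_item pattern[] = {
        { .type = RTE_FLOW_ITEM_TYPE_ETH },
        { .type = RTE_FLOW_ITEM_TYPE_IPV4, .spec = &ip_spec },
        { .type = RTE_FLOW_ITEM_TYPE_END },
    };
    struct rte_flow_action_queue queue = { .index = rx_queue };
    struct rte_flow_action actions[] = {
        { .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &queue },
        { .type = RTE_FLOW_ACTION_TYPE_END },
    };
    struct rte_flow_q_ops_attr attr = { 0 }; /* assumed layout */
    struct rte_flow_error error;

    /* Index 0 selects which pattern/actions template of the table
     * these values fill in. */
    return rte_flow_q_flow_create(port_id, queue_id, &attr, table,
                                  pattern, 0, actions, 0, &error);
}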

> > +                               uint8_t actions_template_index,
> > +                               struct rte_flow_error *error);
> > +
> > +A valid handle in case of success is returned. It must be destroyed later
> > +by calling ``rte_flow_q_flow_destroy()`` even if the rule is rejected by
> HW.
> > +
> > +Enqueue destruction operation
> > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> Queue destruction operation.

We are not destroying a queue; we are enqueuing a flow destruction operation.

> 
> > +
> > +Enqueueing a flow rule destruction operation is similar to simple
> destruction.
> > +
> > +.. code-block:: c
> > +
> > +       int
> > +       rte_flow_q_flow_destroy(uint16_t port_id,
> > +                               uint32_t queue_id,
> > +                               const struct rte_flow_q_ops_attr *q_ops_attr,
> > +                               struct rte_flow *flow,
> > +                               struct rte_flow_error *error);
> > +
> > +Push enqueued operations
> > +~~~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +Pushing all internally stored rules from a queue to the NIC.
> > +
> > +.. code-block:: c
> > +
> > +       int
> > +       rte_flow_q_push(uint16_t port_id,
> > +                       uint32_t queue_id,
> > +                       struct rte_flow_error *error);
> > +
> > +There is the postpone attribute in the queue operation attributes.
> > +When it is set, multiple operations can be bulked together and not sent to
> HW
> > +right away to save SW/HW interactions and prioritize throughput over
> latency.
> > +The application must invoke this function to actually push all outstanding
> > +operations to HW in this case.
> > +
> > +Pull enqueued operations
> > +~~~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +Pulling asynchronous operations results.
> > +
> > +The application must invoke this function in order to complete
> asynchronous
> > +flow rule operations and to receive flow rule operations statuses.
> > +
> > +.. code-block:: c
> > +
> > +       int
> > +       rte_flow_q_pull(uint16_t port_id,
> > +                       uint32_t queue_id,
> > +                       struct rte_flow_q_op_res res[],
> > +                       uint16_t n_res,
> > +                       struct rte_flow_error *error);
> > +
> > +Multiple outstanding operation results can be pulled simultaneously.
> > +User data may be provided during a flow creation/destruction in order
> > +to distinguish between multiple operations. User data is returned as part
> > +of the result to provide a method to detect which operation is completed.
> > +
> > +Enqueue indirect action creation operation
> > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +Asynchronous version of indirect action creation API.
> > +
> > +.. code-block:: c
> > +
> > +       struct rte_flow_action_handle *
> > +       rte_flow_q_action_handle_create(uint16_t port_id,
> 
> What is the use case for this?

Indirect action creation may take time, as it may depend on hardware resource
allocation. So we add an asynchronous way of creating it as well.

> How application needs to use this. We already creating flow_table. Is
> that not sufficient?

The indirect action object is used in flow rules via its handle.
This is an extension to the already existing API in order to speed up
the creation of these objects.
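
A short sketch of enqueueing the creation of a shared counter, for example
(the rte_flow_q_ops_attr layout is assumed for illustration):

#include <rte_flow.h>

/* Enqueue creation of an indirect (shared) COUNT action; once the pull
 * reports the operation as completed, the handle can be referenced from
 * flow rules through RTE_FLOW_ACTION_TYPE_INDIRECT. */
static struct rte_flow_action_handle *
enqueue_shared_counter(uint16_t port_id, uint32_t queue_id)
{
    const struct rte_flow_indir_action_conf conf = { .ingress = 1 };
    const struct rte_flow_action action = {
        .type = RTE_FLOW_ACTION_TYPE_COUNT,
    };
    struct rte_flow_q_ops_attr attr = { 0 }; /* assumed layout */
    struct rte_flow_error error;

    return rte_flow_q_action_handle_create(port_id, queue_id, &attr,
                                           &conf, &action, &error);
}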

> 
> > +                       uint32_t queue_id,
> > +                       const struct rte_flow_q_ops_attr *q_ops_attr,
> > +                       const struct rte_flow_indir_action_conf *indir_action_conf,
> > +                       const struct rte_flow_action *action,
> > +                       struct rte_flow_error *error);
> > +
> > +A valid handle in case of success is returned. It must be destroyed later by
> > +calling ``rte_flow_q_action_handle_destroy()`` even if the rule is
> rejected.
> > +
> > +Enqueue indirect action destruction operation
> > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +Asynchronous version of indirect action destruction API.
> > +
> > +.. code-block:: c
> > +
> > +       int
> > +       rte_flow_q_action_handle_destroy(uint16_t port_id,
> > +                       uint32_t queue_id,
> > +                       const struct rte_flow_q_ops_attr *q_ops_attr,
> > +                       struct rte_flow_action_handle *action_handle,
> > +                       struct rte_flow_error *error);
> > +
> > +Enqueue indirect action update operation
> > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +Asynchronous version of indirect action update API.
> > +
> > +.. code-block:: c
> > +
> > +       int
> > +       rte_flow_q_action_handle_update(uint16_t port_id,
> > +                       uint32_t queue_id,
> > +                       const struct rte_flow_q_ops_attr *q_ops_attr,
> > +                       struct rte_flow_action_handle *action_handle,
> > +                       const void *update,
> > +                       struct rte_flow_error *error);
> > +
> >  .. _flow_isolated_mode:
> >
> >  Flow isolated mode
> > diff --git a/doc/guides/rel_notes/release_22_03.rst
> b/doc/guides/rel_notes/release_22_03.rst
> > index d23d1591df..80a85124e6 100644
> > --- a/doc/guides/rel_notes/release_22_03.rst
> > +++ b/doc/guides/rel_notes/release_22_03.rst
> > @@ -67,6 +67,14 @@ New Features
> >    ``rte_flow_table_destroy``, ``rte_flow_pattern_template_destroy``
> >    and ``rte_flow_actions_template_destroy``.
> >
> > +* ethdev: Added ``rte_flow_q_flow_create`` and
> ``rte_flow_q_flow_destroy`` API
> > +  to enqueue flow creaion/destruction operations asynchronously as well
> as
> > +  ``rte_flow_q_pull`` to poll and retrieve results of these operations and
> > +  ``rte_flow_q_push`` to push all the in-flight operations to the NIC.
> > +  Introduced asynchronous API for indirect actions management as well:
> > +  ``rte_flow_q_action_handle_create``,
> ``rte_flow_q_action_handle_destroy`` and
> > +  ``rte_flow_q_action_handle_update``.
> > +
  
Ivan Malov Feb. 8, 2022, 3:23 p.m. UTC | #4
Hi,

PSB

On Tue, 8 Feb 2022, Alexander Kozyrev wrote:

>> On Tuesday, February 8, 2022 5:57 Jerin Jacob <jerinjacobk@gmail.com> wrote:
>> On Sun, Feb 6, 2022 at 8:57 AM Alexander Kozyrev <akozyrev@nvidia.com>
>> wrote:
>
> Hi Jerin, thanks you for reviewing my patch. I appreciate your input.
> I'm planning to send v4 today with addressed comments today to be on time for RC1.
> I hope that my answers are satisfactory for the rest of questions raised by you.
>
>>>
>>> A new, faster, queue-based flow rules management mechanism is needed
>> for
>>> applications offloading rules inside the datapath. This asynchronous
>>> and lockless mechanism frees the CPU for further packet processing and
>>> reduces the performance impact of the flow rules creation/destruction
>>> on the datapath. Note that queues are not thread-safe and the queue
>>> should be accessed from the same thread for all queue operations.
>>> It is the responsibility of the app to sync the queue functions in case
>>> of multi-threaded access to the same queue.
>>>
>>> The rte_flow_q_flow_create() function enqueues a flow creation to the
>>> requested queue. It benefits from already configured resources and sets
>>> unique values on top of item and action templates. A flow rule is enqueued
>>> on the specified flow queue and offloaded asynchronously to the
>> hardware.
>>> The function returns immediately to spare CPU for further packet
>>> processing. The application must invoke the rte_flow_q_pull() function
>>> to complete the flow rule operation offloading, to clear the queue, and to
>>> receive the operation status. The rte_flow_q_flow_destroy() function
>>> enqueues a flow destruction to the requested queue.
>>
>> It is good to see the implementation, specifically to understand,
>
> We will send PMD implementation in the next few days.
>
>> 1)
>> I understand, We are creating queues to make multiple producers to
>> enqueue multiple jobs in parallel.
>> On the consumer side, Is it HW or some other cores to consume the job?
>
> From API point of view there is no restriction on the type of consumer.
> It could be hardware or software implementation, but in most cases
> (and in our driver) it will be the NIC to handle the requests.
>
>> Can we operate in consumer in parallel?
>
> Yes, we can have separate multiple hardware queues to handle operations
> in parallel independently and without any locking mechanism needed.
>
>> 2) Is Queue part of HW or just SW primitive to submit the work as a channel.
>
> The queue is a software primitive.
>
>>
>>>
>>> Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
>>> ---
>>>  doc/guides/prog_guide/img/rte_flow_q_init.svg |  71 ++++
>>>  .../prog_guide/img/rte_flow_q_usage.svg       |  60 +++
>>>  doc/guides/prog_guide/rte_flow.rst            | 159 +++++++-
>>>  doc/guides/rel_notes/release_22_03.rst        |   8 +
>>>  lib/ethdev/rte_flow.c                         | 173 ++++++++-
>>>  lib/ethdev/rte_flow.h                         | 342 ++++++++++++++++++
>>>  lib/ethdev/rte_flow_driver.h                  |  55 +++
>>>  lib/ethdev/version.map                        |   7 +
>>>  8 files changed, 873 insertions(+), 2 deletions(-)
>>>  create mode 100644 doc/guides/prog_guide/img/rte_flow_q_init.svg
>>>  create mode 100644 doc/guides/prog_guide/img/rte_flow_q_usage.svg
>>>
>>> diff --git a/doc/guides/prog_guide/img/rte_flow_q_init.svg
>> b/doc/guides/prog_guide/img/rte_flow_q_init.svg
>>> new file mode 100644
>>> index 0000000000..2080bf4c04
>>
>>
>>
>> Some comments on the diagrams:
>> # rte_flow_q_create_flow and rte_flow_q_destroy_flow used instead of
>> rte_flow_q_flow_create/destroy
>> # rte_flow_q_pull's brackets(i.e ()) not aligned
>
> Will fix this, thanks for noticing.
>
>>
>>> +</svg>
>>> \ No newline at end of file
>>> diff --git a/doc/guides/prog_guide/rte_flow.rst
>> b/doc/guides/prog_guide/rte_flow.rst
>>> index b7799c5abe..734294e65d 100644
>>> --- a/doc/guides/prog_guide/rte_flow.rst
>>> +++ b/doc/guides/prog_guide/rte_flow.rst
>>> @@ -3607,12 +3607,16 @@ Hints about the expected number of counters
>> or meters in an application,
>>>  for example, allow PMD to prepare and optimize NIC memory layout in
>> advance.
>>>  ``rte_flow_configure()`` must be called before any flow rule is created,
>>>  but after an Ethernet device is configured.
>>> +It also creates flow queues for asynchronous flow rules operations via
>>> +queue-based API, see `Asynchronous operations`_ section.
>>>
>>>  .. code-block:: c
>>>
>>>     int
>>>     rte_flow_configure(uint16_t port_id,
>>>                       const struct rte_flow_port_attr *port_attr,
>>> +                     uint16_t nb_queue,
>>
>> # rte_flow_info_get() don't have number of queues, why not adding
>> number queues in rte_flow_port_attr.
>
> Good suggestion, I'll add it to the capabilities structure.
>
>> # And additional APIs for queue_setup() like ethdev.
>
> ethdev has the start function which tells the PMD when all configurations are done.
> In our case there is no such function and device is ready to create a flows as soon
> as we exit the rte_flow_configure(). In addition, the number of queues may affect
> the resource allocation it is best to process all the requested resources at the same time.
>
>>
>>> +                     const struct rte_flow_queue_attr *queue_attr[],
>>>                       struct rte_flow_error *error);
>>>
>>>  Information about resources that can benefit from pre-allocation can be
>>> @@ -3737,7 +3741,7 @@ and pattern and actions templates are created.
>>>
>>>  .. code-block:: c
>>>
>>> -       rte_flow_configure(port, *port_attr, *error);
>>> +       rte_flow_configure(port, *port_attr, nb_queue, *queue_attr,
>> *error);
>>>
>>>         struct rte_flow_pattern_template *pattern_templates[0] =
>>>                 rte_flow_pattern_template_create(port, &itr, &pattern, &error);
>>> @@ -3750,6 +3754,159 @@ and pattern and actions templates are created.
>>>                                 *actions_templates, nb_actions_templates,
>>>                                 *error);
>>>
>>> +Asynchronous operations
>>> +-----------------------
>>> +
>>> +Flow rules management can be done via special lockless flow
>> management queues.
>>> +- Queue operations are asynchronous and not thread-safe.
>>> +- Operations can thus be invoked by the app's datapath,
>>> +packet processing can continue while queue operations are processed by
>> NIC.
>>> +- The queue number is configured at initialization stage.
>>> +- Available operation types: rule creation, rule destruction,
>>> +indirect rule creation, indirect rule destruction, indirect rule update.
>>> +- Operations may be reordered within a queue.
>>> +- Operations can be postponed and pushed to NIC in batches.
>>> +- Results pulling must be done on time to avoid queue overflows.
>>> +- User data is returned as part of the result to identify an operation.
>>> +- Flow handle is valid once the creation operation is enqueued and must
>> be
>>> +destroyed even if the operation is not successful and the rule is not
>> inserted.
>>
>> You need CR between lines as rendered text does comes as new line in
>> between the items.
>
> OK.
>
>>
>>> +
>>> +The asynchronous flow rule insertion logic can be broken into two phases.
>>> +
>>> +1. Initialization stage as shown here:
>>> +
>>> +.. _figure_rte_flow_q_init:
>>> +
>>> +.. figure:: img/rte_flow_q_init.*
>>> +
>>> +2. Main loop as presented on a datapath application example:
>>> +
>>> +.. _figure_rte_flow_q_usage:
>>> +
>>> +.. figure:: img/rte_flow_q_usage.*
>>
>> it is better to add sequence operations as text to understand the flow.
>
> I prefer keeping the diagram here, it looks more clean and concise.
> Block of text gives no new information and harder to follow, imho.
>
>>
>>> +
>>> +Enqueue creation operation
>>> +~~~~~~~~~~~~~~~~~~~~~~~~~~
>>> +
>>> +Enqueueing a flow rule creation operation is similar to simple creation.
>>
>> If it is enqueue operation, why not call it ad rte_flow_q_flow_enqueue()
>>
>>> +
>>> +.. code-block:: c
>>> +
>>> +       struct rte_flow *
>>> +       rte_flow_q_flow_create(uint16_t port_id,
>>> +                               uint32_t queue_id,
>>> +                               const struct rte_flow_q_ops_attr *q_ops_attr,
>>> +                               struct rte_flow_table *table,
>>> +                               const struct rte_flow_item pattern[],
>>> +                               uint8_t pattern_template_index,
>>> +                               const struct rte_flow_action actions[],
>>
>> If I understand correctly, table is the pre-configured object that has
>> N number of patterns and N number of actions.
>> Why giving items[] and actions[] again?
>
> Table only contains templates for pattern and actions.

Then why not reflect it in the argument name? Perhaps, "template_table"?
Or even in the struct name: "struct rte_flow_template_table".
Chances are that readers will misread "rte_flow_table"
as "flow entry table" in the OpenFlow sense.

> We still need to provide the values for those templates when we create a flow.
> Thus we specify patterns and action here.

All of that is clear in terms of this review cycle, but please
consider improving the argument names to help future readers.

>
>>> +                               uint8_t actions_template_index,
>>> +                               struct rte_flow_error *error);
>>> +
>>> +A valid handle in case of success is returned. It must be destroyed later
>>> +by calling ``rte_flow_q_flow_destroy()`` even if the rule is rejected by
>> HW.
>>> +
>>> +Enqueue destruction operation
>>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>
>> Queue destruction operation.
>
> We are not destroying queue, we are enqueuing the flow destruction operation.
>
>>
>>> +
>>> +Enqueueing a flow rule destruction operation is similar to simple
>> destruction.
>>> +
>>> +.. code-block:: c
>>> +
>>> +       int
>>> +       rte_flow_q_flow_destroy(uint16_t port_id,
>>> +                               uint32_t queue_id,
>>> +                               const struct rte_flow_q_ops_attr *q_ops_attr,
>>> +                               struct rte_flow *flow,
>>> +                               struct rte_flow_error *error);
>>> +
>>> +Push enqueued operations
>>> +~~~~~~~~~~~~~~~~~~~~~~~~
>>> +
>>> +Pushing all internally stored rules from a queue to the NIC.
>>> +
>>> +.. code-block:: c
>>> +
>>> +       int
>>> +       rte_flow_q_push(uint16_t port_id,
>>> +                       uint32_t queue_id,
>>> +                       struct rte_flow_error *error);
>>> +
>>> +There is the postpone attribute in the queue operation attributes.
>>> +When it is set, multiple operations can be bulked together and not sent to
>> HW
>>> +right away to save SW/HW interactions and prioritize throughput over
>> latency.
>>> +The application must invoke this function to actually push all outstanding
>>> +operations to HW in this case.
>>> +
>>> +Pull enqueued operations
>>> +~~~~~~~~~~~~~~~~~~~~~~~~
>>> +
>>> +Pulling asynchronous operations results.
>>> +
>>> +The application must invoke this function in order to complete
>> asynchronous
>>> +flow rule operations and to receive flow rule operations statuses.
>>> +
>>> +.. code-block:: c
>>> +
>>> +       int
>>> +       rte_flow_q_pull(uint16_t port_id,
>>> +                       uint32_t queue_id,
>>> +                       struct rte_flow_q_op_res res[],
>>> +                       uint16_t n_res,
>>> +                       struct rte_flow_error *error);
>>> +
>>> +Multiple outstanding operation results can be pulled simultaneously.
>>> +User data may be provided during a flow creation/destruction in order
>>> +to distinguish between multiple operations. User data is returned as part
>>> +of the result to provide a method to detect which operation is completed.
>>> +
>>> +Enqueue indirect action creation operation
>>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>> +
>>> +Asynchronous version of indirect action creation API.
>>> +
>>> +.. code-block:: c
>>> +
>>> +       struct rte_flow_action_handle *
>>> +       rte_flow_q_action_handle_create(uint16_t port_id,
>>
>> What is the use case for this?
>
> Indirect action creation may take time, it may depend on hardware resources
> allocation. So we add the asynchronous way of creating it the same way.
>
>> How application needs to use this. We already creating flow_table. Is
>> that not sufficient?
>
> The indirect action object is used in flow rules via its handle.
> This is an extension to the already existing API in order to speed up
> the creation of these objects.
>
>>
>>> +                       uint32_t queue_id,
>>> +                       const struct rte_flow_q_ops_attr *q_ops_attr,
>>> +                       const struct rte_flow_indir_action_conf *indir_action_conf,
>>> +                       const struct rte_flow_action *action,
>>> +                       struct rte_flow_error *error);
>>> +
>>> +A valid handle in case of success is returned. It must be destroyed later by
>>> +calling ``rte_flow_q_action_handle_destroy()`` even if the rule is
>> rejected.
>>> +
>>> +Enqueue indirect action destruction operation
>>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>> +
>>> +Asynchronous version of indirect action destruction API.
>>> +
>>> +.. code-block:: c
>>> +
>>> +       int
>>> +       rte_flow_q_action_handle_destroy(uint16_t port_id,
>>> +                       uint32_t queue_id,
>>> +                       const struct rte_flow_q_ops_attr *q_ops_attr,
>>> +                       struct rte_flow_action_handle *action_handle,
>>> +                       struct rte_flow_error *error);
>>> +
>>> +Enqueue indirect action update operation
>>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>> +
>>> +Asynchronous version of indirect action update API.
>>> +
>>> +.. code-block:: c
>>> +
>>> +       int
>>> +       rte_flow_q_action_handle_update(uint16_t port_id,
>>> +                       uint32_t queue_id,
>>> +                       const struct rte_flow_q_ops_attr *q_ops_attr,
>>> +                       struct rte_flow_action_handle *action_handle,
>>> +                       const void *update,
>>> +                       struct rte_flow_error *error);
>>> +
>>>  .. _flow_isolated_mode:
>>>
>>>  Flow isolated mode
>>> diff --git a/doc/guides/rel_notes/release_22_03.rst
>> b/doc/guides/rel_notes/release_22_03.rst
>>> index d23d1591df..80a85124e6 100644
>>> --- a/doc/guides/rel_notes/release_22_03.rst
>>> +++ b/doc/guides/rel_notes/release_22_03.rst
>>> @@ -67,6 +67,14 @@ New Features
>>>    ``rte_flow_table_destroy``, ``rte_flow_pattern_template_destroy``
>>>    and ``rte_flow_actions_template_destroy``.
>>>
>>> +* ethdev: Added ``rte_flow_q_flow_create`` and
>> ``rte_flow_q_flow_destroy`` API
>>> +  to enqueue flow creaion/destruction operations asynchronously as well
>> as
>>> +  ``rte_flow_q_pull`` to poll and retrieve results of these operations and
>>> +  ``rte_flow_q_push`` to push all the in-flight operations to the NIC.
>>> +  Introduced asynchronous API for indirect actions management as well:
>>> +  ``rte_flow_q_action_handle_create``,
>> ``rte_flow_q_action_handle_destroy`` and
>>> +  ``rte_flow_q_action_handle_update``.
>>> +
>
  
Jerin Jacob Feb. 8, 2022, 5:36 p.m. UTC | #5
On Tue, Feb 8, 2022 at 7:42 PM Alexander Kozyrev <akozyrev@nvidia.com> wrote:
>
> > On Tuesday, February 8, 2022 5:57 Jerin Jacob <jerinjacobk@gmail.com> wrote:
> > On Sun, Feb 6, 2022 at 8:57 AM Alexander Kozyrev <akozyrev@nvidia.com>
> > wrote:
>
> Hi Jerin, thanks you for reviewing my patch. I appreciate your input.

Hi Alex,

> I'm planning to send v4 today with addressed comments today to be on time for RC1.
> I hope that my answers are satisfactory for the rest of questions raised by you.

Comments look good to me. Please remove the version field in the next patch.


>
> > >
> > > A new, faster, queue-based flow rules management mechanism is needed
> > for
> > > applications offloading rules inside the datapath. This asynchronous
> > > and lockless mechanism frees the CPU for further packet processing and
> > > reduces the performance impact of the flow rules creation/destruction
> > > on the datapath. Note that queues are not thread-safe and the queue
> > > should be accessed from the same thread for all queue operations.
> > > It is the responsibility of the app to sync the queue functions in case
> > > of multi-threaded access to the same queue.
> > >
> > > The rte_flow_q_flow_create() function enqueues a flow creation to the
> > > requested queue. It benefits from already configured resources and sets
> > > unique values on top of item and action templates. A flow rule is enqueued
> > > on the specified flow queue and offloaded asynchronously to the
> > hardware.
> > > The function returns immediately to spare CPU for further packet
> > > processing. The application must invoke the rte_flow_q_pull() function
> > > to complete the flow rule operation offloading, to clear the queue, and to
> > > receive the operation status. The rte_flow_q_flow_destroy() function
> > > enqueues a flow destruction to the requested queue.
> >
> > It is good to see the implementation, specifically to understand,
>
> We will send PMD implementation in the next few days.
>
> > 1)
> > I understand, We are creating queues to make multiple producers to
> > enqueue multiple jobs in parallel.
> > On the consumer side, Is it HW or some other cores to consume the job?
>
> From API point of view there is no restriction on the type of consumer.
> It could be hardware or software implementation, but in most cases
> (and in our driver) it will be the NIC to handle the requests.
>
> > Can we operate in consumer in parallel?
>
> Yes, we can have separate multiple hardware queues to handle operations
> in parallel independently and without any locking mechanism needed.
>
> > 2) Is Queue part of HW or just SW primitive to submit the work as a channel.
>
> The queue is a software primitive.
>
> >
> > >
> > > Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
> > > ---
> > >  doc/guides/prog_guide/img/rte_flow_q_init.svg |  71 ++++
> > >  .../prog_guide/img/rte_flow_q_usage.svg       |  60 +++
> > >  doc/guides/prog_guide/rte_flow.rst            | 159 +++++++-
> > >  doc/guides/rel_notes/release_22_03.rst        |   8 +
> > >  lib/ethdev/rte_flow.c                         | 173 ++++++++-
> > >  lib/ethdev/rte_flow.h                         | 342 ++++++++++++++++++
> > >  lib/ethdev/rte_flow_driver.h                  |  55 +++
> > >  lib/ethdev/version.map                        |   7 +
> > >  8 files changed, 873 insertions(+), 2 deletions(-)
> > >  create mode 100644 doc/guides/prog_guide/img/rte_flow_q_init.svg
> > >  create mode 100644 doc/guides/prog_guide/img/rte_flow_q_usage.svg
> > >
> > > diff --git a/doc/guides/prog_guide/img/rte_flow_q_init.svg
> > b/doc/guides/prog_guide/img/rte_flow_q_init.svg
> > > new file mode 100644
> > > index 0000000000..2080bf4c04
> >
> >
> >
> > Some comments on the diagrams:
> > # rte_flow_q_create_flow and rte_flow_q_destroy_flow used instead of
> > rte_flow_q_flow_create/destroy
> > # rte_flow_q_pull's brackets(i.e ()) not aligned
>
> Will fix this, thanks for noticing.
>
> >
> > > +</svg>
> > > \ No newline at end of file
> > > diff --git a/doc/guides/prog_guide/rte_flow.rst
> > b/doc/guides/prog_guide/rte_flow.rst
> > > index b7799c5abe..734294e65d 100644
> > > --- a/doc/guides/prog_guide/rte_flow.rst
> > > +++ b/doc/guides/prog_guide/rte_flow.rst
> > > @@ -3607,12 +3607,16 @@ Hints about the expected number of counters
> > or meters in an application,
> > >  for example, allow PMD to prepare and optimize NIC memory layout in
> > advance.
> > >  ``rte_flow_configure()`` must be called before any flow rule is created,
> > >  but after an Ethernet device is configured.
> > > +It also creates flow queues for asynchronous flow rules operations via
> > > +queue-based API, see `Asynchronous operations`_ section.
> > >
> > >  .. code-block:: c
> > >
> > >     int
> > >     rte_flow_configure(uint16_t port_id,
> > >                       const struct rte_flow_port_attr *port_attr,
> > > +                     uint16_t nb_queue,
> >
> > # rte_flow_info_get() don't have number of queues, why not adding
> > number queues in rte_flow_port_attr.
>
> Good suggestion, I'll add it to the capabilities structure.
>
> > # And additional APIs for queue_setup() like ethdev.
>
> ethdev has the start function which tells the PMD when all configurations are done.
> In our case there is no such function and device is ready to create a flows as soon
> as we exit the rte_flow_configure(). In addition, the number of queues may affect
> the resource allocation it is best to process all the requested resources at the same time.
>
> >
> > > +                     const struct rte_flow_queue_attr *queue_attr[],
> > >                       struct rte_flow_error *error);
> > >
> > >  Information about resources that can benefit from pre-allocation can be
> > > @@ -3737,7 +3741,7 @@ and pattern and actions templates are created.
> > >
> > >  .. code-block:: c
> > >
> > > -       rte_flow_configure(port, *port_attr, *error);
> > > +       rte_flow_configure(port, *port_attr, nb_queue, *queue_attr,
> > *error);
> > >
> > >         struct rte_flow_pattern_template *pattern_templates[0] =
> > >                 rte_flow_pattern_template_create(port, &itr, &pattern, &error);
> > > @@ -3750,6 +3754,159 @@ and pattern and actions templates are created.
> > >                                 *actions_templates, nb_actions_templates,
> > >                                 *error);
> > >
> > > +Asynchronous operations
> > > +-----------------------
> > > +
> > > +Flow rules management can be done via special lockless flow
> > management queues.
> > > +- Queue operations are asynchronous and not thread-safe.
> > > +- Operations can thus be invoked by the app's datapath,
> > > +packet processing can continue while queue operations are processed by
> > NIC.
> > > +- The queue number is configured at initialization stage.
> > > +- Available operation types: rule creation, rule destruction,
> > > +indirect rule creation, indirect rule destruction, indirect rule update.
> > > +- Operations may be reordered within a queue.
> > > +- Operations can be postponed and pushed to NIC in batches.
> > > +- Results pulling must be done on time to avoid queue overflows.
> > > +- User data is returned as part of the result to identify an operation.
> > > +- Flow handle is valid once the creation operation is enqueued and must
> > be
> > > +destroyed even if the operation is not successful and the rule is not
> > inserted.
> >
> > You need CR between lines as rendered text does comes as new line in
> > between the items.
>
> OK.
>
> >
> > > +
> > > +The asynchronous flow rule insertion logic can be broken into two phases.
> > > +
> > > +1. Initialization stage as shown here:
> > > +
> > > +.. _figure_rte_flow_q_init:
> > > +
> > > +.. figure:: img/rte_flow_q_init.*
> > > +
> > > +2. Main loop as presented on a datapath application example:
> > > +
> > > +.. _figure_rte_flow_q_usage:
> > > +
> > > +.. figure:: img/rte_flow_q_usage.*
> >
> > it is better to add sequence operations as text to understand the flow.
>
> I prefer keeping the diagram here, it looks more clean and concise.
> Block of text gives no new information and harder to follow, imho.
>
> >
> > > +
> > > +Enqueue creation operation
> > > +~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > +
> > > +Enqueueing a flow rule creation operation is similar to simple creation.
> >
> > If it is enqueue operation, why not call it ad rte_flow_q_flow_enqueue()
> >
> > > +
> > > +.. code-block:: c
> > > +
> > > +       struct rte_flow *
> > > +       rte_flow_q_flow_create(uint16_t port_id,
> > > +                               uint32_t queue_id,
> > > +                               const struct rte_flow_q_ops_attr *q_ops_attr,
> > > +                               struct rte_flow_table *table,
> > > +                               const struct rte_flow_item pattern[],
> > > +                               uint8_t pattern_template_index,
> > > +                               const struct rte_flow_action actions[],
> >
> > If I understand correctly, table is the pre-configured object that has
> > N number of patterns and N number of actions.
> > Why giving items[] and actions[] again?
>
> Table only contains templates for pattern and actions.
> We still need to provide the values for those templates when we create a flow.
> Thus we specify patterns and action here.
>
> > > +                               uint8_t actions_template_index,
> > > +                               struct rte_flow_error *error);
> > > +
> > > +A valid handle in case of success is returned. It must be destroyed later
> > > +by calling ``rte_flow_q_flow_destroy()`` even if the rule is rejected by
> > HW.
> > > +
> > > +Enqueue destruction operation
> > > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >
> > Queue destruction operation.
>
> We are not destroying queue, we are enqueuing the flow destruction operation.
>
> >
> > > +
> > > +Enqueueing a flow rule destruction operation is similar to simple
> > destruction.
> > > +
> > > +.. code-block:: c
> > > +
> > > +       int
> > > +       rte_flow_q_flow_destroy(uint16_t port_id,
> > > +                               uint32_t queue_id,
> > > +                               const struct rte_flow_q_ops_attr *q_ops_attr,
> > > +                               struct rte_flow *flow,
> > > +                               struct rte_flow_error *error);
> > > +
> > > +Push enqueued operations
> > > +~~~~~~~~~~~~~~~~~~~~~~~~
> > > +
> > > +Pushing all internally stored rules from a queue to the NIC.
> > > +
> > > +.. code-block:: c
> > > +
> > > +       int
> > > +       rte_flow_q_push(uint16_t port_id,
> > > +                       uint32_t queue_id,
> > > +                       struct rte_flow_error *error);
> > > +
> > > +There is the postpone attribute in the queue operation attributes.
> > > +When it is set, multiple operations can be bulked together and not sent to
> > HW
> > > +right away to save SW/HW interactions and prioritize throughput over
> > latency.
> > > +The application must invoke this function to actually push all outstanding
> > > +operations to HW in this case.
> > > +
> > > +Pull enqueued operations
> > > +~~~~~~~~~~~~~~~~~~~~~~~~
> > > +
> > > +Pulling asynchronous operations results.
> > > +
> > > +The application must invoke this function in order to complete
> > asynchronous
> > > +flow rule operations and to receive flow rule operations statuses.
> > > +
> > > +.. code-block:: c
> > > +
> > > +       int
> > > +       rte_flow_q_pull(uint16_t port_id,
> > > +                       uint32_t queue_id,
> > > +                       struct rte_flow_q_op_res res[],
> > > +                       uint16_t n_res,
> > > +                       struct rte_flow_error *error);
> > > +
> > > +Multiple outstanding operation results can be pulled simultaneously.
> > > +User data may be provided during a flow creation/destruction in order
> > > +to distinguish between multiple operations. User data is returned as part
> > > +of the result to provide a method to detect which operation is completed.
> > > +
> > > +Enqueue indirect action creation operation
> > > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > +
> > > +Asynchronous version of indirect action creation API.
> > > +
> > > +.. code-block:: c
> > > +
> > > +       struct rte_flow_action_handle *
> > > +       rte_flow_q_action_handle_create(uint16_t port_id,
> >
> > What is the use case for this?
>
> Indirect action creation may take time; it may depend on hardware resource
> allocation. So we add an asynchronous way of creating it as well.
>
> > How does the application need to use this? We are already creating the
> > flow_table. Is that not sufficient?
>
> The indirect action object is used in flow rules via its handle.
> This is an extension to the already existing API in order to speed up
> the creation of these objects.
>
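[Editorial note: to make the intended usage a bit more concrete, a minimal sketch is given below. The shared COUNT action, the ingress flag and all values are just examples, not part of the patch; port_id and queue_id are assumed from the earlier sketches.]

    /* Enqueue creation of a shared counter object; completion is reported
     * later through rte_flow_q_pull() on the same queue. */
    struct rte_flow_indir_action_conf conf = { .ingress = 1 };
    struct rte_flow_action count = { .type = RTE_FLOW_ACTION_TYPE_COUNT };
    struct rte_flow_q_ops_attr attr = { .version = 0, .user_data = NULL };
    struct rte_flow_error error;
    struct rte_flow_action_handle *handle;

    handle = rte_flow_q_action_handle_create(port_id, queue_id, &attr,
                                             &conf, &count, &error);
    /* Once created, the handle is referenced from flow rules through
     * RTE_FLOW_ACTION_TYPE_INDIRECT, as with the synchronous API. */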
> >
> > > +                       uint32_t queue_id,
> > > +                       const struct rte_flow_q_ops_attr *q_ops_attr,
> > > +                       const struct rte_flow_indir_action_conf *indir_action_conf,
> > > +                       const struct rte_flow_action *action,
> > > +                       struct rte_flow_error *error);
> > > +
> > > +A valid handle in case of success is returned. It must be destroyed later by
> > > +calling ``rte_flow_q_action_handle_destroy()`` even if the rule is
> > rejected.
> > > +
> > > +Enqueue indirect action destruction operation
> > > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > +
> > > +Asynchronous version of indirect action destruction API.
> > > +
> > > +.. code-block:: c
> > > +
> > > +       int
> > > +       rte_flow_q_action_handle_destroy(uint16_t port_id,
> > > +                       uint32_t queue_id,
> > > +                       const struct rte_flow_q_ops_attr *q_ops_attr,
> > > +                       struct rte_flow_action_handle *action_handle,
> > > +                       struct rte_flow_error *error);
> > > +
> > > +Enqueue indirect action update operation
> > > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > +
> > > +Asynchronous version of indirect action update API.
> > > +
> > > +.. code-block:: c
> > > +
> > > +       int
> > > +       rte_flow_q_action_handle_update(uint16_t port_id,
> > > +                       uint32_t queue_id,
> > > +                       const struct rte_flow_q_ops_attr *q_ops_attr,
> > > +                       struct rte_flow_action_handle *action_handle,
> > > +                       const void *update,
> > > +                       struct rte_flow_error *error);
> > > +
> > >  .. _flow_isolated_mode:
> > >
> > >  Flow isolated mode
> > > diff --git a/doc/guides/rel_notes/release_22_03.rst
> > b/doc/guides/rel_notes/release_22_03.rst
> > > index d23d1591df..80a85124e6 100644
> > > --- a/doc/guides/rel_notes/release_22_03.rst
> > > +++ b/doc/guides/rel_notes/release_22_03.rst
> > > @@ -67,6 +67,14 @@ New Features
> > >    ``rte_flow_table_destroy``, ``rte_flow_pattern_template_destroy``
> > >    and ``rte_flow_actions_template_destroy``.
> > >
> > > +* ethdev: Added ``rte_flow_q_flow_create`` and
> > ``rte_flow_q_flow_destroy`` API
> > > +  to enqueue flow creation/destruction operations asynchronously as well
> > as
> > > +  ``rte_flow_q_pull`` to poll and retrieve results of these operations and
> > > +  ``rte_flow_q_push`` to push all the in-flight operations to the NIC.
> > > +  Introduced asynchronous API for indirect actions management as well:
> > > +  ``rte_flow_q_action_handle_create``,
> > ``rte_flow_q_action_handle_destroy`` and
> > > +  ``rte_flow_q_action_handle_update``.
> > > +
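[Editorial note: tying the quoted sections together, a datapath loop exercising the postpone/push/pull semantics described above might be sketched as follows. BURST, needs_offload(), handle_failure() and the pattern/actions/table variables are placeholders reused from the earlier sketches or application-defined, not part of the patch.]

    #define BURST 32

    struct rte_flow_q_op_res res[BURST];
    struct rte_mbuf *pkts[BURST];
    struct rte_flow_error error;
    uint16_t nb_rx, i;
    int nb_res;

    for (;;) {
        nb_rx = rte_eth_rx_burst(port_id, 0, pkts, BURST);
        for (i = 0; i < nb_rx; i++) {
            if (!needs_offload(pkts[i]))      /* application-defined check */
                continue;
            struct rte_flow_q_ops_attr attr = {
                .version = 0,
                .user_data = pkts[i],         /* returned with the completion */
                .postpone = 1,                /* batch, pushed explicitly below */
            };
            rte_flow_q_flow_create(port_id, queue_id, &attr, table,
                                   pattern, 0, actions, 0, &error);
        }
        /* Send the whole batch of postponed operations to the HW at once. */
        rte_flow_q_push(port_id, queue_id, &error);

        /* Pull completions in time so the flow queue does not overflow. */
        nb_res = rte_flow_q_pull(port_id, queue_id, res, BURST, &error);
        for (i = 0; nb_res > 0 && i < (uint16_t)nb_res; i++)
            if (res[i].status != RTE_FLOW_Q_OP_SUCCESS)
                handle_failure(res[i].user_data);   /* application-defined */
        /* ... regular packet processing of pkts[] continues here ... */
    }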
  
Alexander Kozyrev Feb. 9, 2022, 5:40 a.m. UTC | #6
On Tuesday, February 8, 2022 10:24 Ivan Malov <ivan.malov@oktetlabs.ru> wrote:
> On Tue, 8 Feb 2022, Alexander Kozyrev wrote:
> >>
> >>> +
> >>> +Enqueue creation operation
> >>> +~~~~~~~~~~~~~~~~~~~~~~~~~~
> >>> +
> >>> +Enqueueing a flow rule creation operation is similar to simple creation.
> >>
> >> If it is an enqueue operation, why not call it rte_flow_q_flow_enqueue()?
> >>
> >>> +
> >>> +.. code-block:: c
> >>> +
> >>> +       struct rte_flow *
> >>> +       rte_flow_q_flow_create(uint16_t port_id,
> >>> +                               uint32_t queue_id,
> >>> +                               const struct rte_flow_q_ops_attr *q_ops_attr,
> >>> +                               struct rte_flow_table *table,
> >>> +                               const struct rte_flow_item pattern[],
> >>> +                               uint8_t pattern_template_index,
> >>> +                               const struct rte_flow_action actions[],
> >>
> >> If I understand correctly, table is the pre-configured object that has
> >> N number of patterns and N number of actions.
> >> Why giving items[] and actions[] again?
> >
> > Table only contains templates for pattern and actions.
> 
> Then why not reflect it in the argument name? Perhaps, "template_table"?
> Or even in the struct name: "struct rte_flow_template_table".
> Chances are that readers will misread "rte_flow_table"
> as "flow entry table" in the OpenFlow sense.
> > We still need to provide the values for those templates when we create a
> > flow.
> > Thus we specify patterns and actions here.
> 
> All of that is clear in terms of this review cycle, but please
> consider improving the argument names to help future readers.

Agree, it is a good idea to rename it to template_table, thanks.
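[Editorial note: if the rename goes ahead, the call sites would presumably read along these lines. This is hypothetical, since the rename is not part of v3; the other arguments reuse the earlier sketches.]

    /* Hypothetical v4 spelling after the rename discussed above. */
    struct rte_flow_template_table *template_table;   /* was: struct rte_flow_table */

    flow = rte_flow_q_flow_create(port_id, queue_id, &attr, template_table,
                                  pattern, 0, actions, 0, &error);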

> >
> >>> +                               uint8_t actions_template_index,
> >>> +                               struct rte_flow_error *error);
> >>> +
> >>> +A valid handle in case of success is returned. It must be destroyed later
> >>> +by calling ``rte_flow_q_flow_destroy()`` even if the rule is rejected by
> >> HW.
> >>> +
> >>> +Enqueue destruction operation
> >>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >>
> >> Queue destruction operation.
> >
> > We are not destroying the queue; we are enqueuing the flow destruction
> > operation.
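[Editorial note: for clarity, what gets enqueued here is the destruction of a single flow rule; the queue itself stays intact. A minimal sketch, reusing the flow handle, port_id and queue_id from the earlier examples:]

    /* Must be enqueued on the same queue the rule was created on. */
    struct rte_flow_q_ops_attr attr = { .version = 0, .user_data = flow };
    struct rte_flow_error error;

    rte_flow_q_flow_destroy(port_id, queue_id, &attr, flow, &error);
    /* Actual removal from the HW is confirmed by a later rte_flow_q_pull(). */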
  
Jerin Jacob Feb. 9, 2022, 5:50 a.m. UTC | #7
On Tue, Feb 8, 2022 at 7:42 PM Alexander Kozyrev <akozyrev@nvidia.com> wrote:
>
> > On Tuesday, February 8, 2022 5:57 Jerin Jacob <jerinjacobk@gmail.com> wrote:
> > On Sun, Feb 6, 2022 at 8:57 AM Alexander Kozyrev <akozyrev@nvidia.com>
> > wrote:
>
>
> > > +The asynchronous flow rule insertion logic can be broken into two phases.
> > > +
> > > +1. Initialization stage as shown here:
> > > +
> > > +.. _figure_rte_flow_q_init:
> > > +
> > > +.. figure:: img/rte_flow_q_init.*
> > > +
> > > +2. Main loop as presented on a datapath application example:
> > > +
> > > +.. _figure_rte_flow_q_usage:
> > > +
> > > +.. figure:: img/rte_flow_q_usage.*
> >
> > it is better to add sequence operations as text to understand the flow.
>
> I prefer keeping the diagram here, it looks cleaner and more concise.
> A block of text gives no new information and is harder to follow, IMHO.

I forgot to reply yesterday on this specific item.

IMO, the diagram is good. The request is: in the diagram, you can add items like
[1], [2], etc., and the corresponding text can be added either in the image or as text.
See example https://doc.dpdk.org/guides/prog_guide/event_crypto_adapter.html
Fig. 49.2.
  

Patch

diff --git a/doc/guides/prog_guide/img/rte_flow_q_init.svg b/doc/guides/prog_guide/img/rte_flow_q_init.svg
new file mode 100644
index 0000000000..2080bf4c04
--- /dev/null
+++ b/doc/guides/prog_guide/img/rte_flow_q_init.svg
@@ -0,0 +1,71 @@ 
+<?xml version="1.0" encoding="UTF-8" standalone="no"?>
+<!-- SPDX-License-Identifier: BSD-3-Clause -->
+<!-- Copyright(c) 2022 NVIDIA Corporation & Affiliates -->
+
+<svg
+   width="485" height="535"
+   xmlns="http://www.w3.org/2000/svg"
+   xmlns:xlink="http://www.w3.org/1999/xlink"
+   overflow="hidden">
+   <defs>
+      <clipPath id="clip0">
+         <rect x="0" y="0" width="485" height="535"/>
+      </clipPath>
+   </defs>
+   <g clip-path="url(#clip0)">
+      <rect x="0" y="0" width="485" height="535" fill="#FFFFFF"/>
+      <rect x="0.500053" y="79.5001" width="482" height="59"
+         stroke="#000000" stroke-width="1.33333" stroke-miterlimit="8"
+         fill="#A6A6A6"/>
+      <text font-family="Calibri,Calibri_MSFontService,sans-serif"
+         font-weight="400" font-size="24" transform="translate(121.6 116)">
+         rte_eth_dev_configure
+         <tspan font-size="24" x="224.007" y="0">()</tspan>
+      </text>
+      <rect x="0.500053" y="158.5" width="482" height="59" stroke="#000000"
+         stroke-width="1.33333" stroke-miterlimit="8" fill="#FFFFFF"/>
+      <text font-family="Calibri,Calibri_MSFontService,sans-serif"
+         font-weight="400" font-size="24" transform="translate(140.273 195)">
+         rte_flow_configure()
+      </text>
+      <rect x="0.500053" y="236.5" width="482" height="60" stroke="#000000"
+         stroke-width="1.33333" stroke-miterlimit="8" fill="#FFFFFF"/>
+      <text font-family="Calibri,Calibri_MSFontService,sans-serif"
+         font-weight="400" font-size="24" transform="translate(77.4259 274)">
+         rte_flow_pattern_template_create()
+      </text>
+      <rect x="0.500053" y="316.5" width="482" height="59" stroke="#000000"
+         stroke-width="1.33333" stroke-miterlimit="8" fill="#FFFFFF"/>
+      <text font-family="Calibri,Calibri_MSFontService,sans-serif"
+         font-weight="400" font-size="24" transform="translate(69.3792 353)">
+         rte_flow_actions_template_create
+         <tspan font-size="24" x="328.447" y="0">(</tspan>)
+      </text>
+      <rect x="0.500053" y="0.500053" width="482" height="60" stroke="#000000"
+         stroke-width="1.33333" stroke-miterlimit="8" fill="#A6A6A6"/>
+      <text font-family="Calibri,Calibri_MSFontService,sans-serif"
+         font-weight="400" font-size="24" transform="translate(177.233 37)">
+         rte_eal_init
+         <tspan font-size="24" x="112.74" y="0">()</tspan>
+      </text>
+      <path d="M2-1.09108e-05 2.00005 9.2445-1.99995 9.24452-2 1.09108e-05ZM6.00004 7.24448 0.000104987 19.2445-5.99996 7.24455Z" transform="matrix(-1 0 0 1 241 60)"/>
+      <path d="M2-1.08133e-05 2.00005 9.41805-1.99995 9.41807-2 1.08133e-05ZM6.00004 7.41802 0.000104987 19.4181-5.99996 7.41809Z" transform="matrix(-1 0 0 1 241 138)"/>
+      <path d="M2-1.09108e-05 2.00005 9.2445-1.99995 9.24452-2 1.09108e-05ZM6.00004 7.24448 0.000104987 19.2445-5.99996 7.24455Z" transform="matrix(-1 0 0 1 241 217)"/>
+      <rect x="0.500053" y="395.5" width="482" height="59" stroke="#000000"
+         stroke-width="1.33333" stroke-miterlimit="8" fill="#FFFFFF"/>
+      <text font-family="Calibri,Calibri_MSFontService,sans-serif"
+         font-weight="400" font-size="24" transform="translate(124.989 432)">
+         rte_flow_table_create
+         <tspan font-size="24" x="217.227" y="0">(</tspan>
+         <tspan font-size="24" x="224.56" y="0">)</tspan>
+      </text>
+      <path d="M2-1.05859e-05 2.00005 9.83526-1.99995 9.83529-2 1.05859e-05ZM6.00004 7.83524 0.000104987 19.8353-5.99996 7.83531Z" transform="matrix(-1 0 0 1 241 296)"/>
+      <path d="M243 375 243 384.191 239 384.191 239 375ZM247 382.191 241 394.191 235 382.191Z"/>
+      <rect x="0.500053" y="473.5" width="482" height="60" stroke="#000000"
+         stroke-width="1.33333" stroke-miterlimit="8" fill="#A6A6A6"/>
+      <text font-family="Calibri,Calibri_MSFontService,sans-serif"
+         font-weight="400" font-size="24" transform="translate(145.303 511)">
+         rte_eth_dev_start()</text>
+      <path d="M245 454 245 463.191 241 463.191 241 454ZM249 461.191 243 473.191 237 461.191Z"/>
+   </g>
+</svg>
\ No newline at end of file
diff --git a/doc/guides/prog_guide/img/rte_flow_q_usage.svg b/doc/guides/prog_guide/img/rte_flow_q_usage.svg
new file mode 100644
index 0000000000..113da764ba
--- /dev/null
+++ b/doc/guides/prog_guide/img/rte_flow_q_usage.svg
@@ -0,0 +1,60 @@ 
+<?xml version="1.0" encoding="UTF-8" standalone="no"?>
+<!-- SPDX-License-Identifier: BSD-3-Clause -->
+<!-- Copyright(c) 2022 NVIDIA Corporation & Affiliates -->
+
+<svg
+   width="880" height="610"
+   xmlns="http://www.w3.org/2000/svg"
+   xmlns:xlink="http://www.w3.org/1999/xlink"
+   overflow="hidden">
+   <defs>
+      <clipPath id="clip0">
+         <rect x="0" y="0" width="880" height="610"/>
+      </clipPath>
+   </defs>
+   <g clip-path="url(#clip0)">
+      <rect x="0" y="0" width="880" height="610" fill="#FFFFFF"/>
+      <rect x="333.5" y="0.500053" width="234" height="45" stroke="#000000" stroke-width="1.33333" stroke-miterlimit="8" fill="#A6A6A6"/>
+      <text font-family="Consolas,Consolas_MSFontService,sans-serif" font-weight="400" font-size="19" transform="translate(357.196 29)">rte_eth_rx_burst()</text>
+      <rect x="333.5" y="63.5001" width="234" height="45" stroke="#000000" stroke-width="1.33333" stroke-miterlimit="8" fill="#D9D9D9"/>
+      <text font-family="Calibri,Calibri_MSFontService,sans-serif" font-weight="400" font-size="19" transform="translate(394.666 91)">analyze <tspan font-size="19" x="60.9267" y="0">packet </tspan></text>
+      <rect x="572.5" y="279.5" width="234" height="46" stroke="#000000" stroke-width="1.33333" stroke-miterlimit="8" fill="#FFFFFF"/>
+      <text font-family="Calibri,Calibri_MSFontService,sans-serif" font-weight="400" font-size="19" transform="translate(591.429 308)">rte_flow_q_create_flow()</text>
+      <path d="M333.5 384 450.5 350.5 567.5 384 450.5 417.5Z" stroke="#000000" stroke-width="1.33333" stroke-miterlimit="8" fill="#D9D9D9" fill-rule="evenodd"/>
+      <text font-family="Calibri,Calibri_MSFontService,sans-serif" font-weight="400" font-size="19" transform="translate(430.069 378)">more <tspan font-size="19" x="-12.94" y="23">packets?</tspan></text>
+      <path d="M689.249 325.5 689.249 338.402 450.5 338.402 450.833 338.069 450.833 343.971 450.167 343.971 450.167 337.735 688.916 337.735 688.582 338.069 688.582 325.5ZM454.5 342.638 450.5 350.638 446.5 342.638Z"/>
+      <path d="M450.833 45.5 450.833 56.8197 450.167 56.8197 450.167 45.5001ZM454.5 55.4864 450.5 63.4864 446.5 55.4864Z"/>
+      <path d="M450.833 108.5 450.833 120.375 450.167 120.375 450.167 108.5ZM454.5 119.041 450.5 127.041 446.5 119.041Z"/>
+      <path d="M451.833 507.5 451.833 533.61 451.167 533.61 451.167 507.5ZM455.5 532.277 451.5 540.277 447.5 532.277Z"/>
+      <path d="M0 0.333333-23.9993 0.333333-23.666 0-23.666 141.649-23.9993 141.316 562.966 141.316 562.633 141.649 562.633 124.315 563.299 124.315 563.299 141.983-24.3327 141.983-24.3327-0.333333 0-0.333333ZM558.966 125.649 562.966 117.649 566.966 125.649Z" transform="matrix(-6.12323e-17 -1 -1 6.12323e-17 451.149 585.466)"/>
+      <path d="M333.5 160.5 450.5 126.5 567.5 160.5 450.5 194.5Z" stroke="#000000" stroke-width="1.33333" stroke-miterlimit="8" fill="#D9D9D9" fill-rule="evenodd"/>
+      <text font-family="Calibri,Calibri_MSFontService,sans-serif" font-weight="400" font-size="19" transform="translate(417.576 155)">add new <tspan font-size="19" x="13.2867" y="23">rule?</tspan></text>
+      <path d="M567.5 160.167 689.267 160.167 689.267 273.228 688.6 273.228 688.6 160.5 688.933 160.833 567.5 160.833ZM692.933 271.894 688.933 279.894 684.933 271.894Z"/>
+      <rect x="602.5" y="127.5" width="46" height="30" stroke="#000000" stroke-width="0.666667" stroke-miterlimit="8" fill="#D9D9D9"/>
+      <text font-family="Trebuchet MS,Trebuchet MS_MSFontService,sans-serif" font-weight="400" font-size="19" transform="translate(611.34 148)">yes</text>
+      <rect x="254.5" y="126.5" width="46" height="31" stroke="#000000" stroke-width="0.666667" stroke-miterlimit="8" fill="#D9D9D9"/>
+      <text font-family="Trebuchet MS,Trebuchet MS_MSFontService,sans-serif" font-weight="400" font-size="19" transform="translate(267.182 147)">no</text>
+      <path d="M0-0.333333 251.563-0.333333 251.563 298.328 8.00002 298.328 8.00002 297.662 251.229 297.662 250.896 297.995 250.896 0 251.229 0.333333 0 0.333333ZM9.33333 301.995 1.33333 297.995 9.33333 293.995Z" transform="matrix(1 0 0 -1 567.5 383.495)"/>
+      <path d="M86.5001 213.5 203.5 180.5 320.5 213.5 203.5 246.5Z" stroke="#000000" stroke-width="1.33333" stroke-miterlimit="8" fill="#D9D9D9" fill-rule="evenodd"/>
+      <text font-family="Calibri,Calibri_MSFontService,sans-serif" font-weight="400" font-size="19" transform="translate(159.155 208)">destroy the <tspan font-size="19" x="24.0333" y="23">rule?</tspan></text>
+      <path d="M0-0.333333 131.029-0.333333 131.029 12.9778 130.363 12.9778 130.363 0 130.696 0.333333 0 0.333333ZM134.696 11.6445 130.696 19.6445 126.696 11.6445Z" transform="matrix(-1 1.22465e-16 1.22465e-16 1 334.196 160.5)"/>
+      <rect x="81.5001" y="280.5" width="234" height="45" stroke="#000000" stroke-width="1.33333" stroke-miterlimit="8" fill="#FFFFFF"/>
+      <text font-family="Calibri,Calibri_MSFontService,sans-serif" font-weight="400" font-size="19" transform="translate(96.2282 308)">rte_flow_q_destroy_flow()</text>
+      <path d="M0 0.333333-24.0001 0.333333-23.6667 0-23.6667 49.9498-24.0001 49.6165 121.748 49.6165 121.748 59.958 121.082 59.958 121.082 49.9498 121.415 50.2832-24.3334 50.2832-24.3334-0.333333 0-0.333333ZM125.415 58.6247 121.415 66.6247 117.415 58.6247Z" transform="matrix(-1 0 0 1 319.915 213.5)"/>
+      <path d="M86.5001 213.833 62.5002 213.833 62.8335 213.5 62.8335 383.95 62.5002 383.617 327.511 383.617 327.511 384.283 62.1668 384.283 62.1668 213.167 86.5001 213.167ZM326.178 379.95 334.178 383.95 326.178 387.95Z"/>
+      <path d="M0-0.333333 12.8273-0.333333 12.8273 252.111 12.494 251.778 18.321 251.778 18.321 252.445 12.1607 252.445 12.1607 0 12.494 0.333333 0 0.333333ZM16.9877 248.111 24.9877 252.111 16.9877 256.111Z" transform="matrix(1.83697e-16 1 1 -1.83697e-16 198.5 325.5)"/>
+      <rect x="334.5" y="540.5" width="234" height="45" stroke="#000000" stroke-width="1.33333" stroke-miterlimit="8" fill="#FFFFFF"/>
+      <text font-family="Calibri,Calibri_MSFontService,sans-serif" font-weight="400" font-size="19" transform="translate(365.083 569)">rte_flow_q_pull<tspan font-size="19" x="160.227" y="0">()</tspan></text>
+      <rect x="334.5" y="462.5" width="234" height="45" stroke="#000000" stroke-width="1.33333" stroke-miterlimit="8" fill="#FFFFFF"/>
+      <text font-family="Calibri,Calibri_MSFontService,sans-serif" font-weight="400" font-size="19" transform="translate(379.19 491)">rte_flow_q<tspan font-size="19" x="83.56" y="0">_push</tspan>()</text>
+      <path d="M450.833 417.495 451.402 455.999 450.735 456.008 450.167 417.505ZM455.048 454.611 451.167 462.669 447.049 454.729Z"/>
+      <rect x="0.500053" y="287.5" width="46" height="30" stroke="#000000" stroke-width="0.666667" stroke-miterlimit="8" fill="#D9D9D9"/>
+      <text font-family="Trebuchet MS,Trebuchet MS_MSFontService,sans-serif" font-weight="400" font-size="19" transform="translate(12.8617 308)">no</text>
+      <rect x="357.5" y="223.5" width="47" height="31" stroke="#000000" stroke-width="0.666667" stroke-miterlimit="8" fill="#D9D9D9"/>
+      <text font-family="Trebuchet MS,Trebuchet MS_MSFontService,sans-serif" font-weight="400" font-size="19" transform="translate(367.001 244)">yes</text>
+      <rect x="469.5" y="421.5" width="46" height="30" stroke="#000000" stroke-width="0.666667" stroke-miterlimit="8" fill="#D9D9D9"/>
+      <text font-family="Trebuchet MS,Trebuchet MS_MSFontService,sans-serif" font-weight="400" font-size="19" transform="translate(481.872 442)">no</text>
+      <rect x="832.5" y="223.5" width="46" height="31" stroke="#000000" stroke-width="0.666667" stroke-miterlimit="8" fill="#D9D9D9"/>
+      <text font-family="Trebuchet MS,Trebuchet MS_MSFontService,sans-serif" font-weight="400" font-size="19" transform="translate(841.777 244)">yes</text>
+   </g>
+</svg>
\ No newline at end of file
diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
index b7799c5abe..734294e65d 100644
--- a/doc/guides/prog_guide/rte_flow.rst
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -3607,12 +3607,16 @@  Hints about the expected number of counters or meters in an application,
 for example, allow PMD to prepare and optimize NIC memory layout in advance.
 ``rte_flow_configure()`` must be called before any flow rule is created,
 but after an Ethernet device is configured.
+It also creates flow queues for asynchronous flow rule operations via
+the queue-based API, see the `Asynchronous operations`_ section.
 
 .. code-block:: c
 
    int
    rte_flow_configure(uint16_t port_id,
                      const struct rte_flow_port_attr *port_attr,
+                     uint16_t nb_queue,
+                     const struct rte_flow_queue_attr *queue_attr[],
                      struct rte_flow_error *error);
 
 Information about resources that can benefit from pre-allocation can be
@@ -3737,7 +3741,7 @@  and pattern and actions templates are created.
 
 .. code-block:: c
 
-	rte_flow_configure(port, *port_attr, *error);
+	rte_flow_configure(port, *port_attr, nb_queue, *queue_attr, *error);
 
 	struct rte_flow_pattern_template *pattern_templates[0] =
 		rte_flow_pattern_template_create(port, &itr, &pattern, &error);
@@ -3750,6 +3754,159 @@  and pattern and actions templates are created.
 				*actions_templates, nb_actions_templates,
 				*error);
 
+Asynchronous operations
+-----------------------
+
+Flow rules management can be done via special lockless flow management queues.
+- Queue operations are asynchronous and not thread-safe.
+- Operations can thus be invoked by the application's datapath;
+  packet processing can continue while queue operations are processed by the NIC.
+- The number of queues is configured at the initialization stage.
+- Available operation types: rule creation, rule destruction,
+  indirect rule creation, indirect rule destruction, indirect rule update.
+- Operations may be reordered within a queue.
+- Operations can be postponed and pushed to the NIC in batches.
+- Results must be pulled in a timely manner to avoid queue overflows.
+- User data is returned as part of the result to identify an operation.
+- Flow handle is valid once the creation operation is enqueued and must be
+  destroyed even if the operation is not successful and the rule is not inserted.
+
+The asynchronous flow rule insertion logic can be broken into two phases.
+
+1. Initialization stage as shown here:
+
+.. _figure_rte_flow_q_init:
+
+.. figure:: img/rte_flow_q_init.*
+
+2. Main loop as presented on a datapath application example:
+
+.. _figure_rte_flow_q_usage:
+
+.. figure:: img/rte_flow_q_usage.*
+
+Enqueue creation operation
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Enqueueing a flow rule creation operation is similar to simple creation.
+
+.. code-block:: c
+
+	struct rte_flow *
+	rte_flow_q_flow_create(uint16_t port_id,
+				uint32_t queue_id,
+				const struct rte_flow_q_ops_attr *q_ops_attr,
+				struct rte_flow_table *table,
+				const struct rte_flow_item pattern[],
+				uint8_t pattern_template_index,
+				const struct rte_flow_action actions[],
+				uint8_t actions_template_index,
+				struct rte_flow_error *error);
+
+A valid handle in case of success is returned. It must be destroyed later
+by calling ``rte_flow_q_flow_destroy()`` even if the rule is rejected by HW.
+
+Enqueue destruction operation
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Enqueueing a flow rule destruction operation is similar to simple destruction.
+
+.. code-block:: c
+
+	int
+	rte_flow_q_flow_destroy(uint16_t port_id,
+				uint32_t queue_id,
+				const struct rte_flow_q_ops_attr *q_ops_attr,
+				struct rte_flow *flow,
+				struct rte_flow_error *error);
+
+Push enqueued operations
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+Pushing all internally stored rules from a queue to the NIC.
+
+.. code-block:: c
+
+	int
+	rte_flow_q_push(uint16_t port_id,
+			uint32_t queue_id,
+			struct rte_flow_error *error);
+
+There is a postpone attribute in the queue operation attributes.
+When it is set, multiple operations can be bulked together and not sent to HW
+right away to save SW/HW interactions and prioritize throughput over latency.
+The application must invoke this function to actually push all outstanding
+operations to HW in this case.
+
+Pull enqueued operations
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+Pulling asynchronous operation results.
+
+The application must invoke this function in order to complete asynchronous
+flow rule operations and to receive flow rule operation statuses.
+
+.. code-block:: c
+
+	int
+	rte_flow_q_pull(uint16_t port_id,
+			uint32_t queue_id,
+			struct rte_flow_q_op_res res[],
+			uint16_t n_res,
+			struct rte_flow_error *error);
+
+Multiple outstanding operation results can be pulled simultaneously.
+User data may be provided during a flow creation/destruction in order
+to distinguish between multiple operations. User data is returned as part
+of the result to provide a method to detect which operation is completed.
+
+Enqueue indirect action creation operation
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Asynchronous version of indirect action creation API.
+
+.. code-block:: c
+
+	struct rte_flow_action_handle *
+	rte_flow_q_action_handle_create(uint16_t port_id,
+			uint32_t queue_id,
+			const struct rte_flow_q_ops_attr *q_ops_attr,
+			const struct rte_flow_indir_action_conf *indir_action_conf,
+			const struct rte_flow_action *action,
+			struct rte_flow_error *error);
+
+A valid handle in case of success is returned. It must be destroyed later by
+calling ``rte_flow_q_action_handle_destroy()`` even if the rule is rejected.
+
+Enqueue indirect action destruction operation
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Asynchronous version of indirect action destruction API.
+
+.. code-block:: c
+
+	int
+	rte_flow_q_action_handle_destroy(uint16_t port_id,
+			uint32_t queue_id,
+			const struct rte_flow_q_ops_attr *q_ops_attr,
+			struct rte_flow_action_handle *action_handle,
+			struct rte_flow_error *error);
+
+Enqueue indirect action update operation
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Asynchronous version of indirect action update API.
+
+.. code-block:: c
+
+	int
+	rte_flow_q_action_handle_update(uint16_t port_id,
+			uint32_t queue_id,
+			const struct rte_flow_q_ops_attr *q_ops_attr,
+			struct rte_flow_action_handle *action_handle,
+			const void *update,
+			struct rte_flow_error *error);
+
 .. _flow_isolated_mode:
 
 Flow isolated mode
diff --git a/doc/guides/rel_notes/release_22_03.rst b/doc/guides/rel_notes/release_22_03.rst
index d23d1591df..80a85124e6 100644
--- a/doc/guides/rel_notes/release_22_03.rst
+++ b/doc/guides/rel_notes/release_22_03.rst
@@ -67,6 +67,14 @@  New Features
   ``rte_flow_table_destroy``, ``rte_flow_pattern_template_destroy``
   and ``rte_flow_actions_template_destroy``.
 
+* ethdev: Added ``rte_flow_q_flow_create`` and ``rte_flow_q_flow_destroy`` API
+  to enqueue flow creation/destruction operations asynchronously as well as
+  ``rte_flow_q_pull`` to poll and retrieve results of these operations and
+  ``rte_flow_q_push`` to push all the in-flight operations to the NIC.
+  Introduced asynchronous API for indirect actions management as well:
+  ``rte_flow_q_action_handle_create``, ``rte_flow_q_action_handle_destroy`` and
+  ``rte_flow_q_action_handle_update``.
+
 * **Updated AF_XDP PMD**
 
   * Added support for libxdp >=v1.2.2.
diff --git a/lib/ethdev/rte_flow.c b/lib/ethdev/rte_flow.c
index ab942117d0..127dbb13cb 100644
--- a/lib/ethdev/rte_flow.c
+++ b/lib/ethdev/rte_flow.c
@@ -1415,6 +1415,8 @@  rte_flow_info_get(uint16_t port_id,
 int
 rte_flow_configure(uint16_t port_id,
 		   const struct rte_flow_port_attr *port_attr,
+		   uint16_t nb_queue,
+		   const struct rte_flow_queue_attr *queue_attr[],
 		   struct rte_flow_error *error)
 {
 	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
@@ -1424,7 +1426,7 @@  rte_flow_configure(uint16_t port_id,
 		return -rte_errno;
 	if (likely(!!ops->configure)) {
 		return flow_err(port_id,
-				ops->configure(dev, port_attr, error),
+				ops->configure(dev, port_attr, nb_queue, queue_attr, error),
 				error);
 	}
 	return rte_flow_error_set(error, ENOTSUP,
@@ -1572,3 +1574,172 @@  rte_flow_table_destroy(uint16_t port_id,
 				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				  NULL, rte_strerror(ENOTSUP));
 }
+
+struct rte_flow *
+rte_flow_q_flow_create(uint16_t port_id,
+		       uint32_t queue_id,
+		       const struct rte_flow_q_ops_attr *q_ops_attr,
+		       struct rte_flow_table *table,
+		       const struct rte_flow_item pattern[],
+		       uint8_t pattern_template_index,
+		       const struct rte_flow_action actions[],
+		       uint8_t actions_template_index,
+		       struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+	struct rte_flow *flow;
+
+	if (unlikely(!ops))
+		return NULL;
+	if (likely(!!ops->q_flow_create)) {
+		flow = ops->q_flow_create(dev, queue_id, q_ops_attr, table,
+					  pattern, pattern_template_index,
+					  actions, actions_template_index,
+					  error);
+		if (flow == NULL)
+			flow_err(port_id, -rte_errno, error);
+		return flow;
+	}
+	rte_flow_error_set(error, ENOTSUP,
+			   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+			   NULL, rte_strerror(ENOTSUP));
+	return NULL;
+}
+
+int
+rte_flow_q_flow_destroy(uint16_t port_id,
+			uint32_t queue_id,
+			const struct rte_flow_q_ops_attr *q_ops_attr,
+			struct rte_flow *flow,
+			struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (unlikely(!ops))
+		return -rte_errno;
+	if (likely(!!ops->q_flow_destroy)) {
+		return flow_err(port_id,
+				ops->q_flow_destroy(dev, queue_id,
+						    q_ops_attr, flow, error),
+				error);
+	}
+	return rte_flow_error_set(error, ENOTSUP,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, rte_strerror(ENOTSUP));
+}
+
+struct rte_flow_action_handle *
+rte_flow_q_action_handle_create(uint16_t port_id,
+		uint32_t queue_id,
+		const struct rte_flow_q_ops_attr *q_ops_attr,
+		const struct rte_flow_indir_action_conf *indir_action_conf,
+		const struct rte_flow_action *action,
+		struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+	struct rte_flow_action_handle *handle;
+
+	if (unlikely(!ops))
+		return NULL;
+	if (unlikely(!ops->q_action_handle_create)) {
+		rte_flow_error_set(error, ENOSYS,
+				   RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+				   rte_strerror(ENOSYS));
+		return NULL;
+	}
+	handle = ops->q_action_handle_create(dev, queue_id, q_ops_attr,
+					     indir_action_conf, action, error);
+	if (handle == NULL)
+		flow_err(port_id, -rte_errno, error);
+	return handle;
+}
+
+int
+rte_flow_q_action_handle_destroy(uint16_t port_id,
+		uint32_t queue_id,
+		const struct rte_flow_q_ops_attr *q_ops_attr,
+		struct rte_flow_action_handle *action_handle,
+		struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+	int ret;
+
+	if (unlikely(!ops))
+		return -rte_errno;
+	if (unlikely(!ops->q_action_handle_destroy))
+		return rte_flow_error_set(error, ENOSYS,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					  NULL, rte_strerror(ENOSYS));
+	ret = ops->q_action_handle_destroy(dev, queue_id, q_ops_attr,
+					   action_handle, error);
+	return flow_err(port_id, ret, error);
+}
+
+int
+rte_flow_q_action_handle_update(uint16_t port_id,
+		uint32_t queue_id,
+		const struct rte_flow_q_ops_attr *q_ops_attr,
+		struct rte_flow_action_handle *action_handle,
+		const void *update,
+		struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+	int ret;
+
+	if (unlikely(!ops))
+		return -rte_errno;
+	if (unlikely(!ops->q_action_handle_update))
+		return rte_flow_error_set(error, ENOSYS,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					  NULL, rte_strerror(ENOSYS));
+	ret = ops->q_action_handle_update(dev, queue_id, q_ops_attr,
+					  action_handle, update, error);
+	return flow_err(port_id, ret, error);
+}
+
+int
+rte_flow_q_push(uint16_t port_id,
+		uint32_t queue_id,
+		struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (unlikely(!ops))
+		return -rte_errno;
+	if (likely(!!ops->q_push)) {
+		return flow_err(port_id,
+				ops->q_push(dev, queue_id, error),
+				error);
+	}
+	return rte_flow_error_set(error, ENOTSUP,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, rte_strerror(ENOTSUP));
+}
+
+int
+rte_flow_q_pull(uint16_t port_id,
+		uint32_t queue_id,
+		struct rte_flow_q_op_res res[],
+		uint16_t n_res,
+		struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+	int ret;
+
+	if (unlikely(!ops))
+		return -rte_errno;
+	if (likely(!!ops->q_pull)) {
+		ret = ops->q_pull(dev, queue_id, res, n_res, error);
+		return ret ? ret : flow_err(port_id, ret, error);
+	}
+	return rte_flow_error_set(error, ENOTSUP,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, rte_strerror(ENOTSUP));
+}
diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h
index a65f5d4e6a..25a6ad5b64 100644
--- a/lib/ethdev/rte_flow.h
+++ b/lib/ethdev/rte_flow.h
@@ -4883,6 +4883,21 @@  struct rte_flow_port_attr {
 	uint32_t nb_meters;
 };
 
+/**
+ * Flow engine queue configuration.
+ */
+__extension__
+struct rte_flow_queue_attr {
+	/**
+	 * Version of the struct layout, should be 0.
+	 */
+	uint32_t version;
+	/**
+	 * Number of flow rule operations a queue can hold.
+	 */
+	uint32_t size;
+};
+
 /**
  * @warning
  * @b EXPERIMENTAL: this API may change without prior notice.
@@ -4922,6 +4937,11 @@  rte_flow_info_get(uint16_t port_id,
  *   Port identifier of Ethernet device.
  * @param[in] port_attr
  *   Port configuration attributes.
+ * @param[in] nb_queue
+ *   Number of flow queues to be configured.
+ * @param[in] queue_attr
+ *   Array that holds attributes for each flow queue.
+ *   Number of elements is given by @p nb_queue.
  * @param[out] error
  *   Perform verbose error reporting if not NULL.
  *   PMDs initialize this structure in case of error only.
@@ -4933,6 +4953,8 @@  __rte_experimental
 int
 rte_flow_configure(uint16_t port_id,
 		   const struct rte_flow_port_attr *port_attr,
+		   uint16_t nb_queue,
+		   const struct rte_flow_queue_attr *queue_attr[],
 		   struct rte_flow_error *error);
 
 /**
@@ -5209,6 +5231,326 @@  rte_flow_table_destroy(uint16_t port_id,
 		       struct rte_flow_table *table,
 		       struct rte_flow_error *error);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Queue operation attributes.
+ */
+struct rte_flow_q_ops_attr {
+	/**
+	 * Version of the struct layout, should be 0.
+	 */
+	uint32_t version;
+	/**
+	 * The user data that will be returned on the completion events.
+	 */
+	void *user_data;
+	 /**
+	  * When set, the requested action will not be sent to the HW immediately.
+	  * The application must call rte_flow_q_push() to actually send it.
+	  */
+	uint32_t postpone:1;
+};
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Enqueue rule creation operation.
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param queue_id
+ *   Flow queue used to insert the rule.
+ * @param[in] q_ops_attr
+ *   Rule creation operation attributes.
+ * @param[in] table
+ *   Table to select templates from.
+ * @param[in] pattern
+ *   List of pattern items to be used.
+ *   The list order should match the order in the pattern template.
+ *   The spec is the only relevant member of the item that is being used.
+ * @param[in] pattern_template_index
+ *   Pattern template index in the table.
+ * @param[in] actions
+ *   List of actions to be used.
+ *   The list order should match the order in the actions template.
+ * @param[in] actions_template_index
+ *   Actions template index in the table.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL.
+ *   PMDs initialize this structure in case of error only.
+ *
+ * @return
+ *   Handle on success, NULL otherwise and rte_errno is set.
+ *   The rule handle doesn't mean that the rule was offloaded.
+ *   Only completion result indicates that the rule was offloaded.
+ */
+__rte_experimental
+struct rte_flow *
+rte_flow_q_flow_create(uint16_t port_id,
+		       uint32_t queue_id,
+		       const struct rte_flow_q_ops_attr *q_ops_attr,
+		       struct rte_flow_table *table,
+		       const struct rte_flow_item pattern[],
+		       uint8_t pattern_template_index,
+		       const struct rte_flow_action actions[],
+		       uint8_t actions_template_index,
+		       struct rte_flow_error *error);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Enqueue rule destruction operation.
+ *
+ * This function enqueues a destruction operation on the queue.
+ * Application should assume that after calling this function
+ * the rule handle is not valid anymore.
+ * Completion indicates the full removal of the rule from the HW.
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param queue_id
+ *   Flow queue which is used to destroy the rule.
+ *   This must match the queue on which the rule was created.
+ * @param[in] q_ops_attr
+ *   Rule destroy operation attributes.
+ * @param[in] flow
+ *   Flow handle to be destroyed.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL.
+ *   PMDs initialize this structure in case of error only.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_flow_q_flow_destroy(uint16_t port_id,
+			uint32_t queue_id,
+			const struct rte_flow_q_ops_attr *q_ops_attr,
+			struct rte_flow *flow,
+			struct rte_flow_error *error);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Enqueue indirect action creation operation.
+ * @see rte_flow_action_handle_create
+ *
+ * @param[in] port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] queue_id
+ *   Flow queue which is used to create the action.
+ * @param[in] q_ops_attr
+ *   Queue operation attributes.
+ * @param[in] indir_action_conf
+ *   Action configuration for the indirect action object creation.
+ * @param[in] action
+ *   Specific configuration of the indirect action object.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL.
+ *   PMDs initialize this structure in case of error only.
+ *
+ * @return
+ *   A valid handle in case of success, NULL otherwise and rte_errno is set
+ *   to one of the error codes below:
+ *   - (ENODEV) if *port_id* invalid.
+ *   - (ENOSYS) if underlying device does not support this functionality.
+ *   - (EIO) if underlying device is removed.
+ *   - (ENOENT) if action pointed by *action* handle was not found.
+ *   - (EBUSY) if action pointed by *action* handle still used by some rules.
+ */
+__rte_experimental
+struct rte_flow_action_handle *
+rte_flow_q_action_handle_create(uint16_t port_id,
+		uint32_t queue_id,
+		const struct rte_flow_q_ops_attr *q_ops_attr,
+		const struct rte_flow_indir_action_conf *indir_action_conf,
+		const struct rte_flow_action *action,
+		struct rte_flow_error *error);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Enqueue indirect action destruction operation.
+ * The destroy queue must be the same
+ * as the queue on which the action was created.
+ *
+ * @param[in] port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] queue_id
+ *   Flow queue which is used to destroy the action.
+ * @param[in] q_ops_attr
+ *   Queue operation attributes.
+ * @param[in] action_handle
+ *   Handle for the indirect action object to be destroyed.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL.
+ *   PMDs initialize this structure in case of error only.
+ *
+ * @return
+ *   - (0) if success.
+ *   - (-ENODEV) if *port_id* invalid.
+ *   - (-ENOSYS) if underlying device does not support this functionality.
+ *   - (-EIO) if underlying device is removed.
+ *   - (-ENOENT) if action pointed by *action* handle was not found.
+ *   - (-EBUSY) if action pointed by *action* handle still used by some rules
+ *   rte_errno is also set.
+ */
+__rte_experimental
+int
+rte_flow_q_action_handle_destroy(uint16_t port_id,
+		uint32_t queue_id,
+		const struct rte_flow_q_ops_attr *q_ops_attr,
+		struct rte_flow_action_handle *action_handle,
+		struct rte_flow_error *error);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Enqueue indirect action update operation.
+ * @see rte_flow_action_handle_create
+ *
+ * @param[in] port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] queue_id
+ *   Flow queue which is used to update the action.
+ * @param[in] q_ops_attr
+ *   Queue operation attributes.
+ * @param[in] action_handle
+ *   Handle for the indirect action object to be updated.
+ * @param[in] update
+ *   Update profile specification used to modify the action pointed to by handle.
+ *   *update* can be of the same type as the immediate action that corresponds
+ *   to the *handle* argument at creation time, or a wrapper structure that
+ *   includes the action configuration to be updated and bit fields indicating
+ *   which fields inside the action to update.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL.
+ *   PMDs initialize this structure in case of error only.
+ *
+ * @return
+ *   - (0) if success.
+ *   - (-ENODEV) if *port_id* invalid.
+ *   - (-ENOSYS) if underlying device does not support this functionality.
+ *   - (-EIO) if underlying device is removed.
+ *   - (-ENOENT) if action pointed by *action* handle was not found.
+ *   - (-EBUSY) if action pointed by *action* handle still used by some rules
+ *   rte_errno is also set.
+ */
+__rte_experimental
+int
+rte_flow_q_action_handle_update(uint16_t port_id,
+		uint32_t queue_id,
+		const struct rte_flow_q_ops_attr *q_ops_attr,
+		struct rte_flow_action_handle *action_handle,
+		const void *update,
+		struct rte_flow_error *error);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Push all internally stored rules to the HW.
+ * Postponed rules are rules that were inserted with the postpone flag set.
+ * Can be used to notify the HW about batch of rules prepared by the SW to
+ * reduce the number of communications between the HW and SW.
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param queue_id
+ *   Flow queue to be pushed.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL.
+ *   PMDs initialize this structure in case of error only.
+ *
+ * @return
+ *    0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_flow_q_push(uint16_t port_id,
+		uint32_t queue_id,
+		struct rte_flow_error *error);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Queue operation status.
+ */
+enum rte_flow_q_op_status {
+	/**
+	 * The operation was completed successfully.
+	 */
+	RTE_FLOW_Q_OP_SUCCESS,
+	/**
+	 * The operation was not completed successfully.
+	 */
+	RTE_FLOW_Q_OP_ERROR,
+};
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Queue operation results.
+ */
+__extension__
+struct rte_flow_q_op_res {
+	/**
+	 * Version of the struct layout, should be 0.
+	 */
+	uint32_t version;
+	/**
+	 * Returns the status of the operation that this completion signals.
+	 */
+	enum rte_flow_q_op_status status;
+	/**
+	 * The user data that will be returned on the completion events.
+	 */
+	void *user_data;
+};
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Pull rte_flow operation results.
+ * The application must invoke this function in order to complete
+ * the flow rule offloading and to retrieve the flow rule operation status.
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param queue_id
+ *   Flow queue which is used to pull the operation.
+ * @param[out] res
+ *   Array of results that will be set.
+ * @param[in] n_res
+ *   Maximum number of results that can be returned.
+ *   This value is equal to the size of the res array.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL.
+ *   PMDs initialize this structure in case of error only.
+ *
+ * @return
+ *   Number of results that were pulled,
+ *   a negative errno value otherwise and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_flow_q_pull(uint16_t port_id,
+		uint32_t queue_id,
+		struct rte_flow_q_op_res res[],
+		uint16_t n_res,
+		struct rte_flow_error *error);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/ethdev/rte_flow_driver.h b/lib/ethdev/rte_flow_driver.h
index 04b0960825..0edd933bf3 100644
--- a/lib/ethdev/rte_flow_driver.h
+++ b/lib/ethdev/rte_flow_driver.h
@@ -161,6 +161,8 @@  struct rte_flow_ops {
 	int (*configure)
 		(struct rte_eth_dev *dev,
 		 const struct rte_flow_port_attr *port_attr,
+		 uint16_t nb_queue,
+		 const struct rte_flow_queue_attr *queue_attr[],
 		 struct rte_flow_error *err);
 	/** See rte_flow_pattern_template_create() */
 	struct rte_flow_pattern_template *(*pattern_template_create)
@@ -199,6 +201,59 @@  struct rte_flow_ops {
 		(struct rte_eth_dev *dev,
 		 struct rte_flow_table *table,
 		 struct rte_flow_error *err);
+	/** See rte_flow_q_flow_create() */
+	struct rte_flow *(*q_flow_create)
+		(struct rte_eth_dev *dev,
+		 uint32_t queue_id,
+		 const struct rte_flow_q_ops_attr *q_ops_attr,
+		 struct rte_flow_table *table,
+		 const struct rte_flow_item pattern[],
+		 uint8_t pattern_template_index,
+		 const struct rte_flow_action actions[],
+		 uint8_t actions_template_index,
+		 struct rte_flow_error *err);
+	/** See rte_flow_q_flow_destroy() */
+	int (*q_flow_destroy)
+		(struct rte_eth_dev *dev,
+		 uint32_t queue_id,
+		 const struct rte_flow_q_ops_attr *q_ops_attr,
+		 struct rte_flow *flow,
+		 struct rte_flow_error *err);
+	/** See rte_flow_q_action_handle_create() */
+	struct rte_flow_action_handle *(*q_action_handle_create)
+		(struct rte_eth_dev *dev,
+		 uint32_t queue_id,
+		 const struct rte_flow_q_ops_attr *q_ops_attr,
+		 const struct rte_flow_indir_action_conf *indir_action_conf,
+		 const struct rte_flow_action *action,
+		 struct rte_flow_error *err);
+	/** See rte_flow_q_action_handle_destroy() */
+	int (*q_action_handle_destroy)
+		(struct rte_eth_dev *dev,
+		 uint32_t queue_id,
+		 const struct rte_flow_q_ops_attr *q_ops_attr,
+		 struct rte_flow_action_handle *action_handle,
+		 struct rte_flow_error *error);
+	/** See rte_flow_q_action_handle_update() */
+	int (*q_action_handle_update)
+		(struct rte_eth_dev *dev,
+		 uint32_t queue_id,
+		 const struct rte_flow_q_ops_attr *q_ops_attr,
+		 struct rte_flow_action_handle *action_handle,
+		 const void *update,
+		 struct rte_flow_error *error);
+	/** See rte_flow_q_push() */
+	int (*q_push)
+		(struct rte_eth_dev *dev,
+		 uint32_t queue_id,
+		 struct rte_flow_error *err);
+	/** See rte_flow_q_pull() */
+	int (*q_pull)
+		(struct rte_eth_dev *dev,
+		 uint32_t queue_id,
+		 struct rte_flow_q_op_res res[],
+		 uint16_t n_res,
+		 struct rte_flow_error *error);
 };
 
 /**
diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
index 01c004d491..f431ef0a5d 100644
--- a/lib/ethdev/version.map
+++ b/lib/ethdev/version.map
@@ -266,6 +266,13 @@  EXPERIMENTAL {
 	rte_flow_actions_template_destroy;
 	rte_flow_table_create;
 	rte_flow_table_destroy;
+	rte_flow_q_flow_create;
+	rte_flow_q_flow_destroy;
+	rte_flow_q_action_handle_create;
+	rte_flow_q_action_handle_destroy;
+	rte_flow_q_action_handle_update;
+	rte_flow_q_push;
+	rte_flow_q_pull;
 };
 
 INTERNAL {