[v4,2/4] ethdev: tunnel offload model

Message ID 20201004135040.10307-3-getelson@nvidia.com (mailing list archive)
State Superseded, archived
Delegated to: Ferruh Yigit
Series: Tunnel Offload API

Checks

Context         Check     Description
ci/checkpatch   success   coding style OK

Commit Message

Etelson, Gregory Oct. 4, 2020, 1:50 p.m. UTC
From: Eli Britstein <elibr@mellanox.com>

The rte_flow API provides the building blocks for vendor-agnostic flow
classification offloads.  The rte_flow match and action primitives are
fine grained, thus giving DPDK applications the flexibility to
offload network stacks and complex pipelines.

Applications wishing to offload complex data structures (e.g. tunnel
virtual ports) are required to use rte_flow primitives such as
group, meta, mark, tag and others to model their high-level objects.

The hardware model design for high level software objects is not
trivial.  Furthermore, an optimal design is often vendor specific.

The goal of this API is to provide applications with a hardware
offload model for common high-level software objects which is optimal
with regard to the underlying hardware.

Tunnel ports are the first of such objects.

Tunnel ports
------------
Ingress processing of tunneled traffic requires the classification of
the tunnel type followed by a decap action.

In software, once a packet is decapsulated its in_port field is
changed to a virtual port representing the tunnel type. The outer
header fields are stored as packet metadata members and may be matched
by subsequent flows.

Openvswitch, for example, uses two flows:
1. A classification flow that sets the virtual port representing the
tunnel type. For example: match on udp port 4789,
actions=tnl_pop(vxlan_vport).
2. A steering flow according to outer and inner header matches: match
on in_port=vxlan_vport and outer/inner headers,
actions=forward to port X.
The benefits of multi-flow tables are described in [1].

Offloading tunnel ports
-----------------------
Tunnel ports introduce a new stateless field that can be matched on.
Currently the rte_flow library provides an API to encap, decap and
match on tunnel headers. However, there is no rte_flow primitive to
set and match tunnel virtual ports.

There are several possible hardware models for offloading virtual
tunnel port flows including, but not limited to, the following:
1. Setting the virtual port on a hw register using the
rte_flow_action_mark / rte_flow_action_tag / rte_flow_action_set_meta
actions.
2. Mapping a virtual port to an rte_flow group.
3. Avoiding the need to match on transient objects by merging
multi-table flows into a single rte_flow rule.
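
A minimal sketch of approach 1, assuming the application encodes the
virtual port in a mark value (VXLAN_VPORT_ID and the target group are
hypothetical, application-defined choices):

/* Classification rule actions: tag the packet with the virtual port
 * and send it to the group holding the steering rules.
 */
struct rte_flow_action_mark mark = { .id = VXLAN_VPORT_ID };
struct rte_flow_action_jump jump = { .group = 1 };
struct rte_flow_action actions[] = {
    { .type = RTE_FLOW_ACTION_TYPE_MARK, .conf = &mark },
    { .type = RTE_FLOW_ACTION_TYPE_JUMP, .conf = &jump },
    { .type = RTE_FLOW_ACTION_TYPE_END },
};
/* Steering rules in group 1 would then match RTE_FLOW_ITEM_TYPE_MARK
 * to emulate in_port=vxlan_vport.
 */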

Every approach has its pros and cons.  The preferred approach should
take into account the entire system architecture and is very often
vendor specific.

The proposed rte_flow_tunnel_decap_set helper function (drafted below)
is designed to provide a common, vendor-agnostic API for setting the
virtual port value.  The helper API enables PMD implementations to
return a vendor-specific combination of rte_flow actions realizing the
vendor's hardware model for setting a tunnel port.  Applications may
append the list of actions returned from the helper function when
creating an rte_flow rule in hardware.
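
A sketch of the append step, assuming pmd_actions holds
num_pmd_actions entries without an END terminator (the exact
convention is an assumption here) and app_actions is END-terminated:

/* Build the final actions array for rte_flow_create(): PMD-provided
 * actions first, application actions after, one END terminator.
 */
struct rte_flow_action rule_actions[16]; /* illustrative bound */
uint32_t i, n = 0;

for (i = 0; i < num_pmd_actions; i++)
    rule_actions[n++] = pmd_actions[i];
for (i = 0; app_actions[i].type != RTE_FLOW_ACTION_TYPE_END; i++)
    rule_actions[n++] = app_actions[i];
rule_actions[n].type = RTE_FLOW_ACTION_TYPE_END;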

Similarly, the rte_flow_tunnel_match helper (drafted below)
allows multiple hardware implementations to return a list of
rte_flow items.

Miss handling
-------------
Packets going through multiple rte_flow groups are exposed to hw
misses due to partial packet processing. In such cases, the software
should continue the packet's processing from the point where the
hardware missed.

We propose a generic rte_flow_restore_info structure providing the
state that was stored in hardware when the packet missed.

Currently, the structure will provide the tunnel state of the packet
that missed, namely:
1. The group id that missed
2. The tunnel port that missed
3. Tunnel information that was stored in memory (due to decap action).
In the future, we may add additional fields as more state may be
stored in the device memory (e.g. ct_state).
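
As proposed in this patch, the structure is:

struct rte_flow_restore_info {
    uint64_t flags;    /**< RTE_FLOW_RESTORE_INFO_* validity bits. */
    uint32_t group_id; /**< Group ID where the packet missed. */
    struct rte_flow_tunnel tunnel; /**< Tunnel information. */
};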

Applications may query the state via the new
rte_flow_get_restore_info(mbuf) API, thus allowing
a vendor-specific implementation.
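
A sketch of the expected receive path (burst size, queue id and the
miss handling itself are illustrative):

/* Recover HW processing state for packets punted to SW. */
struct rte_mbuf *pkts[32];
uint16_t i, nb;

nb = rte_eth_rx_burst(port_id, 0, pkts, 32);
for (i = 0; i < nb; i++) {
    struct rte_flow_restore_info info;
    struct rte_flow_error error;

    if (rte_flow_get_restore_info(port_id, pkts[i], &info,
                                  &error) == 0 &&
        (info.flags & RTE_FLOW_RESTORE_INFO_TUNNEL)) {
        /* info.tunnel describes the tunnel context saved by HW;
         * if RTE_FLOW_RESTORE_INFO_GROUP_ID is set, info.group_id
         * tells where processing stopped. Continue in SW from there.
         */
    } else {
        /* No HW state attached - normal SW processing path. */
    }
}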

VXLAN code example:
Assume the application needs to do inner NAT on VXLAN packets.
The first rule in group 0:

flow create <port id> ingress group 0
  pattern eth / ipv4 / udp dst is 4789 / vxlan / end
  actions {pmd actions} / jump group 3 / end

The first VXLAN packet that arrives matches the rule in group 0 and
jumps to group 3. In group 3 the packet will miss, since there is no
flow to match, and will be uploaded to the application.  The
application will call rte_flow_get_restore_info() to get the packet
outer header.  The application will then insert a new rule in group 3
to match outer and inner headers:

flow create <port id> ingress group 3
  pattern {pmd items} / eth / ipv4 dst is 172.10.10.1 /
          udp dst 4789 / vxlan vni is 10 /
          ipv4 dst is 184.1.2.3 / end
  actions set_ipv4_dst 186.1.1.1 / queue index 3 / end

The result of these rules is that a VXLAN packet with vni=10, outer
IPv4 dst=172.10.10.1 and inner IPv4 dst=184.1.2.3 will be received
decapsulated on queue 3, with inner IPv4 dst rewritten to 186.1.1.1.

Note: a packet in group 3 is considered decapsulated. All actions in
that group will be applied to the header that was inner before decap.
The application may still specify the outer header to be matched on.
It is the PMD's responsibility to translate these items to outer
metadata.

API usage:
/**
 * 1. Initiate RTE flow tunnel object
 */
struct rte_flow_tunnel tunnel = {
  .type = RTE_FLOW_ITEM_TYPE_VXLAN,
  .tun_id = 10,
};

/**
 * 2. Obtain PMD tunnel actions
 *
 * pmd_actions is an intermediate variable the application uses to
 * compile the final actions array
 */
struct rte_flow_action *pmd_actions;
uint32_t num_pmd_actions;
struct rte_flow_error error;

rte_flow_tunnel_decap_set(port_id, &tunnel, &pmd_actions,
                          &num_pmd_actions, &error);

/**
 * 3. Offload the first rule:
 * match VXLAN traffic and jump to group 3
 * (implicitly decaps the packet)
 */
app_actions = jump group 3
rule_items = app_items;  /* eth / ipv4 / udp / vxlan */
rule_actions = { pmd_actions, app_actions };
attr.group = 0;
flow_1 = rte_flow_create(port_id, &attr,
                         rule_items, rule_actions, &error);
/**
 * 4. After flow creation the application does not need to keep the
 * tunnel action resources.
 */
rte_flow_tunnel_action_decap_release(port_id, pmd_actions,
                                     num_pmd_actions, &error);

/**
 * 5. A partially offloaded packet missed because there was no
 * matching rule in group 3. Handle the miss:
 */
struct rte_flow_restore_info info;
rte_flow_get_restore_info(port_id, mbuf, &info, &error);

/**
 * 6. Offload NAT rule:
 */
app_items = { eth / ipv4 dst is 172.10.10.1 / udp dst 4789 /
            vxlan vni is 10 / ipv4 dst is 184.1.2.3 }
app_actions = { set_ipv4_dst 186.1.1.1 / queue index 3 }

struct rte_flow_item *pmd_items;
uint32_t num_pmd_items;

rte_flow_tunnel_match(port_id, &info.tunnel, &pmd_items,
                      &num_pmd_items, &error);
rule_items = {pmd_items, app_items};
rule_actions = app_actions;
attr.group = info.group_id;
flow_2 = rte_flow_create(port_id, &attr,
                         rule_items, rule_actions, &error);

/**
 * 7. Release PMD items after rule creation
 */
rte_flow_tunnel_item_release(port_id, pmd_items, num_pmd_items, &error);

References
[1] https://mails.dpdk.org/archives/dev/2020-June/index.html

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 doc/guides/prog_guide/rte_flow.rst       | 105 ++++++++++++
 doc/guides/rel_notes/release_20_11.rst   |   9 ++
 lib/librte_ethdev/rte_ethdev_version.map |   6 +
 lib/librte_ethdev/rte_flow.c             | 112 +++++++++++++
 lib/librte_ethdev/rte_flow.h             | 195 +++++++++++++++++++++++
 lib/librte_ethdev/rte_flow_driver.h      |  32 ++++
 6 files changed, 459 insertions(+)
  

Comments

Sriharsha Basavapatna Oct. 6, 2020, 9:47 a.m. UTC | #1
On Sun, Oct 4, 2020 at 7:23 PM Gregory Etelson <getelson@nvidia.com> wrote:
>
> From: Eli Britstein <elibr@mellanox.com>
>
> Rte_flow API provides the building blocks for vendor agnostic flow
> classification offloads.  The rte_flow match and action primitives are
> fine grained, thus enabling DPDK applications the flexibility to
> offload network stacks and complex pipelines.
>
> Applications wishing to offload complex data structures (e.g. tunnel
> virtual ports) are required to use the rte_flow primitives, such as
> group, meta, mark, tag and others to model their high level objects.
>
> The hardware model design for high level software objects is not
> trivial.  Furthermore, an optimal design is often vendor specific.
>
> The goal of this API is to provide applications with the hardware
> offload model for common high level software objects which is optimal
> in regards to the underlying hardware.
>
> Tunnel ports are the first of such objects.
>
> Tunnel ports
> ------------
> Ingress processing of tunneled traffic requires the classification of
> the tunnel type followed by a decap action.
>
> In software, once a packet is decapsulated the in_port field is
> changed to a virtual port representing the tunnel type. The outer
> header fields are stored as packet metadata members and may be matched
> by proceeding flows.
>
> Openvswitch, for example, uses two flows:
> 1. classification flow - setting the virtual port representing the
> tunnel type For example: match on udp port 4789
> actions=tnl_pop(vxlan_vport)
> 2. steering flow according to outer and inner header matches match on
> in_port=vxlan_vport and outer/inner header matches actions=forward to
> p ort X The benefits of multi-flow tables are described in [1].
>
> Offloading tunnel ports
> -----------------------
> Tunnel ports introduce a new stateless field that can be matched on.
> Currently the rte_flow library provides an API to encap, decap and
> match on tunnel headers. However, there is no rte_flow primitive to
> set and match tunnel virtual ports.

Tunnel vport is an internal construct used by one specific
application: OVS. So, shouldn't the rte APIs also be application
agnostic apart from being vendor agnostic ? For OVS, the match fields
in the existing datapath flow rules contain enough information to
identify the tunnel type.
>
> There are several possible hardware models for offloading virtual
> tunnel port flows including, but not limited to, the following:
> 1. Setting the virtual port on a hw register using the
> rte_flow_action_mark/ rte_flow_action_tag/rte_flow_set_meta objects.
> 2. Mapping a virtual port to an rte_flow group
> 3. Avoiding the need to match on transient objects by merging
> multi-table flows to a single rte_flow rule.
>
> Every approach has its pros and cons.  The preferred approach should
> take into account the entire system architecture and is very often
> vendor specific.
>
> The proposed rte_flow_tunnel_decap_set helper function (drafted below)
> is designed to provide a common, vendor agnostic, API for setting the
> virtual port value.  The helper API enables PMD implementations to
> return vendor specific combination of rte_flow actions realizing the
> vendor's hardware model for setting a tunnel port.  Applications may
> append the list of actions returned from the helper function when
> creating an rte_flow rule in hardware.

Wouldn't it be better if the APIs do not refer to vports and avoid
percolating it down to the PMD ? My point here is to avoid bringing in
the knowledge of an application specific virtual object (vport) to the
PMD.

Here's some other issues that I see with the helper APIs and
vendor-specific variable actions.
1) The application needs some kind of validation (or understanding) of
the actions returned by the PMD. The application can't just blindly
use the actions specified by the PMD. That is, the decision to pick
the set of actions can't be left entirely to the PMD.
2) The application needs to learn a PMD-specific way of action
processing for each vendor. For example, how should the application
handle flow-miss, given a different set of actions between two vendors
(if one vendor has already popped the tunnel header while the other
one hasn't).
3) The end-users/customers won't have a common interface (as in,
consistent actions) to perform tunnel decap action. This becomes a
manageability/maintenance issue for the application while working with
different vendors.

IMO, the API shouldn't expect the PMD to understand the notion of
vport. The goal here is to offload a flow rule to decap the tunnel
header and forward the packet to a HW endpoint.  The problem is that
we don't have a way to express the "tnl_pop" datapath action to the HW
(decap flow #1, in the context of br-phy in OVS-DPDK) and also we may
not want the HW to really pop the tunnel header at that stage. If this
cannot be expressed with existing rte action types, maybe we should
introduce a new action that clearly defines what is expected to the
PMD.
>
> Similarly, the rte_flow_tunnel_match helper (drafted below)
> allows for multiple hardware implementations to return a list of
> fte_flow items.
>
> Miss handling
> -------------
> Packets going through multiple rte_flow groups are exposed to hw
> misses due to partial packet processing. In such cases, the software
> should continue the packet's processing from the point where the
> hardware missed.

Whether the packet goes through multiple groups or not for tunnel
decap processing, should be left to the PMD/HW.  These assumptions
shouldn't be built into the APIs. The encapsulated packet (i,e with
outer headers) should be provided to the application, rather than
making SW understand that there was a miss in stage-1, or stage-n in
HW. That is, HW either processes it entirely, or punts the whole
packet to SW if there's a miss. And the packet should take the normal
processing path in SW (no action offload).

Thanks,
-Harsha

> [snip]
  
Etelson, Gregory Oct. 7, 2020, 12:36 p.m. UTC | #2
Hello Harsha,

> -----Original Message-----

[snip]
> 
> Tunnel vport is an internal construct used by one specific
> application: OVS. So, shouldn't the rte APIs also be application
> agnostic apart from being vendor agnostic ? For OVS, the match fields
> in the existing datapath flow rules contain enough information to
> identify the tunnel type.

Tunnel offload model was inspired by OVS vport, but it is not part of the existing API.
It looks like the API documentation should not use that term to avoid confusion.

[snip]

[snip]
> 
> Wouldn't it be better if the APIs do not refer to vports and avoid
> percolating it down to the PMD ? My point here is to avoid bringing in
> the knowledge of an application specific virtual object (vport) to the
> PMD.
> 

As I have mentioned above, the API description should not mention vport.
I'll post updated documents. 

> Here's some other issues that I see with the helper APIs and
> vendor-specific variable actions.
> 1) The application needs some kind of validation (or understanding) of
> the actions returned by the PMD. The application can't just blindly
> use the actions specified by the PMD. That is, the decision to pick
> the set of actions can't be left entirely to the PMD.
> 2) The application needs to learn a PMD-specific way of action
> processing for each vendor. For example, how should the application
> handle flow-miss, given a different set of actions between two vendors
> (if one vendor has already popped the tunnel header while the other
> one hasn't).
> 3) The end-users/customers won't have a common interface (as in,
> consistent actions) to perform tunnel decap action. This becomes a
> manageability/maintenance issue for the application while working with
> different vendors.
> 
> IMO, the API shouldn't expect the PMD to understand the notion of
> vport. The goal here is to offload a flow rule to decap the tunnel
> header and forward the packet to a HW endpoint.  The problem is that
> we don't have a way to express the "tnl_pop" datapath action to the HW
> (decap flow #1, in the context of br-phy in OVS-DPDK) and also we may
> not want the HW to really pop the tunnel header at that stage. If this
> cannot be expressed with existing rte action types, maybe we should
> introduce a new action that clearly defines what is expected to the
> PMD.

The Tunnel Offload API provides a common interface for all HW vendors:
Rule #1: define tunneled traffic and steer / group traffic related to
that tunnel.
Rule #2: within the tunnel selection, run matchers on all packet headers,
outer and inner, and perform actions on inner headers in case of a match.
For rule #1 the application provides tunnel matchers and traffic selection
actions, and for rule #2 the application provides full header matchers and
actions for the inner parts.
The rest is supplied by the PMD according to HW and rule type. The
application does not need to understand the exact PMD implementation of
these elements.
The helper return value notifies the application whether it received the
requested PMD elements or not. If the helper completed successfully, the
application received the required elements and can complete flow rule
compilation.
As a result, a packet will be fully offloaded or returned to the
application with enough information to continue processing in SW.

[snip]

[snip]

> > Miss handling
> > -------------
> > Packets going through multiple rte_flow groups are exposed to hw
> > misses due to partial packet processing. In such cases, the software
> > should continue the packet's processing from the point where the
> > hardware missed.
> 
> Whether the packet goes through multiple groups or not for tunnel
> decap processing, should be left to the PMD/HW.  These assumptions
> shouldn't be built into the APIs. The encapsulated packet (i,e with
> outer headers) should be provided to the application, rather than
> making SW understand that there was a miss in stage-1, or stage-n in
> HW. That is, HW either processes it entirely, or punts the whole
> packet to SW if there's a miss. And the packet should take the normal
> processing path in SW (no action offload).
> 
> Thanks,
> -Harsha

The packet is provided to the application via the standard rte_eth_rx_burst API.
Additional information about the HW packet processing state is provided to
the application by the suggested rte_flow_get_restore_info API. It is up to the
application whether to use the provided info, or even whether to call this API at all.

[snip]

Regards,
Gregory
  
Ferruh Yigit Oct. 14, 2020, 5:23 p.m. UTC | #3
On 10/7/2020 1:36 PM, Gregory Etelson wrote:
> [snip]


Hi Gregory, Sriharsha,

Is there any output of the discussion?
  
Thomas Monjalon Oct. 14, 2020, 11:55 p.m. UTC | #4
Formatting review below (someone has to do it):

04/10/2020 15:50, Gregory Etelson:
> --- a/doc/guides/prog_guide/rte_flow.rst
> +++ b/doc/guides/prog_guide/rte_flow.rst
> @@ -3034,6 +3034,111 @@ operations include:
> +Tunneled traffic offload
> +~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +Provide software application with unified rules model for tunneled traffic
> +regardless underlying hardware.
> +
> + - The model introduces a concept of a virtual tunnel port (VTP).

Given the API does not use this terminology, it is confusing to find
this wording in the doc.

> + - The model uses VTP to offload ingress tunneled network traffic 
> +   with RTE flow rules.
> + - The model is implemented as set of helper functions. Each PMD
> +   implements VTP offload according to underlying hardware offload
> +   capabilities.  Applications must query PMD for VTP flow
> +   items / actions before using in creation of a VTP flow rule.
> +
> +The model components:
> +
> +- Virtual Tunnel Port (VTP) is a stateless software object that
> +  describes tunneled network traffic.  VTP object usually contains
> +  descriptions of outer headers, tunnel headers and inner headers.
> +- Tunnel Steering flow Rule (TSR) detects tunneled packets and
> +  delegates them to tunnel processing infrastructure, implemented
> +  in PMD for optimal hardware utilization, for further processing.
> +- Tunnel Matching flow Rule (TMR) verifies packet configuration and
> +  runs offload actions in case of a match.

I'm not a fan of all those acronyms.
It makes reading more difficult in my opinion.

> +
> +Application actions:
> +
> +1 Initialize VTP object according to tunnel network parameters.
> +
> +2 Create TSR flow rule.
> +
> +2.1 Query PMD for VTP actions. Application can query for VTP actions more than once.
> +
> +  .. code-block:: c
> +
> +    int
> +    rte_flow_tunnel_decap_set(uint16_t port_id,
> +                              struct rte_flow_tunnel *tunnel,
> +                              struct rte_flow_action **pmd_actions,
> +                              uint32_t *num_of_pmd_actions,
> +                              struct rte_flow_error *error);
> +
> +2.2 Integrate PMD actions into TSR actions list.
> +
> +2.3 Create TSR flow rule.
> +
> +    .. code-block:: console
> +
> +      flow create <port> group 0 match {tunnel items} / end actions {PMD actions} / {App actions} / end

Not sure about the testpmd syntax here.
Is it bringing a value compared to some text description?

> +
> +3 Create TMR flow rule.
> +
> +3.1 Query PMD for VTP items. Application can query for VTP items more than once.
> +
> +    .. code-block:: c
> +
> +      int
> +      rte_flow_tunnel_match(uint16_t port_id,
> +                            struct rte_flow_tunnel *tunnel,
> +                            struct rte_flow_item **pmd_items,
> +                            uint32_t *num_of_pmd_items,
> +                            struct rte_flow_error *error);
> +
> +3.2 Integrate PMD items into TMR items list.
> +
> +3.3 Create TMR flow rule.
> +
> +    .. code-block:: console
> +
> +      flow create <port> group 0 match {PMD items} / {APP items} / end actions {offload actions} / end
> +
> +The model provides helper function call to restore packets that miss
> +tunnel TMR rules to its original state:
> +
> +.. code-block:: c
> +
> +  int
> +  rte_flow_get_restore_info(uint16_t port_id,
> +                            struct rte_mbuf *mbuf,
> +                            struct rte_flow_restore_info *info,
> +                            struct rte_flow_error *error);
> +
> +rte_tunnel object filled by the call inside
> +``rte_flow_restore_info *info parameter`` can be used by the application
> +to create new TMR rule for that tunnel.
> +
> +The model requirements:

Should it be a section title?

> +
> +Software application must initialize
> +rte_tunnel object with tunnel parameters before calling
> +rte_flow_tunnel_decap_set() & rte_flow_tunnel_match().

It is preferred having code symbols between double backquotes.

> +
> +PMD actions array obtained in rte_flow_tunnel_decap_set() must be
> +released by application with rte_flow_action_release() call.
> +Application can release the actionsfter TSR rule was created.

typo: actionsfter

> +
> +PMD items array obtained with rte_flow_tunnel_match() must be released
> +by application with rte_flow_item_release() call.  Application can
> +release the items after rule was created. However, if the application
> +needs to create additional TMR rule for the same tunnel it will need
> +to obtain PMD items again.
> +
> +Application cannot destroy rte_tunnel object before it releases all
> +PMD actions & PMD items referencing that tunnel.
[...]
> --- a/doc/guides/rel_notes/release_20_11.rst
> +++ b/doc/guides/rel_notes/release_20_11.rst
> @@ -62,6 +62,15 @@ New Features
>    * Added support for 200G PAM4 link speed.
>    * Added support for RSS hash level selection.
>    * Updated HWRM structures to 1.10.1.70 version.
> +* **Flow rules allowed to use private PMD items / actions.**
> +
> +  * Flow rule verification was updated to accept private PMD
> +    items and actions.

This should be in the previous patch, but not sure it's worth noting at all.

> +
> +* **Added generic API to offload tunneled traffic and restore missed packet.**
> +
> +  * Added a new hardware independent helper API to RTE flow library that
> +    offloads tunneled traffic and restores missed packets.

Here and elsewhere, "hardware independent" is implied for rte_flow API.
Please write "flow API" or "rte_flow".
This block should be before driver ones, with other ethdev features.

> --- a/lib/librte_ethdev/rte_ethdev_version.map
> +++ b/lib/librte_ethdev/rte_ethdev_version.map
> @@ -226,6 +226,12 @@ EXPERIMENTAL {
>  	rte_tm_wred_profile_add;
>  	rte_tm_wred_profile_delete;
>  
> +	rte_flow_tunnel_decap_set;
> +	rte_flow_tunnel_match;
> +	rte_flow_get_restore_info;
> +	rte_flow_tunnel_action_decap_release;
> +	rte_flow_tunnel_item_release;

It is for 20.11 now, so should be placed below.

> +
>  	# added in 20.11
>  	rte_eth_link_speed_to_str;
>  	rte_eth_link_to_str;
  
Etelson, Gregory Oct. 16, 2020, 9:15 a.m. UTC | #5
Hello Ferruh,

> Subject: Re: [dpdk-dev] [PATCH v4 2/4] ethdev: tunnel offload model
> 
> On 10/7/2020 1:36 PM, Gregory Etelson wrote:
> [snip]
> 
> Hi Gregory, Sriharsha,
> 
> Is there any output of the discussion?

Tunnel API documentation was updated in V6.

Regards,
Gregory
  

Patch

diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
index df71bd2eeb..28f2aca984 100644
--- a/doc/guides/prog_guide/rte_flow.rst
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -3034,6 +3034,111 @@  operations include:
 - Duplication of a complete flow rule description.
 - Pattern item or action name retrieval.
 
+Tunneled traffic offload
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+Provide applications with a unified flow rule model for tunneled traffic
+regardless of the underlying hardware.
+
+ - The model introduces the concept of a virtual tunnel port (VTP).
+ - The model uses VTPs to offload ingress tunneled network traffic
+   with rte_flow rules.
+ - The model is implemented as a set of helper functions. Each PMD
+   implements VTP offload according to the underlying hardware offload
+   capabilities.  Applications must query the PMD for VTP flow
+   items / actions before using them in the creation of a VTP flow rule.
+
+The model components:
+
+- Virtual Tunnel Port (VTP) is a stateless software object that
+  describes tunneled network traffic.  A VTP object usually contains
+  descriptions of the outer headers, tunnel headers and inner headers.
+- Tunnel Steering flow Rule (TSR) detects tunneled packets and
+  delegates them for further processing to the tunnel processing
+  infrastructure, implemented in the PMD for optimal hardware utilization.
+- Tunnel Matching flow Rule (TMR) verifies packet configuration and
+  runs offload actions in case of a match.
+
+Application actions:
+
+1 Initialize VTP object according to tunnel network parameters.
+
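+A minimal initialization sketch for a VXLAN tunnel (illustrative
+values only; the application fills the fields from its own tunnel
+configuration):
+
+  .. code-block:: c
+
+    struct rte_flow_tunnel tunnel = {
+        .type = RTE_FLOW_ITEM_TYPE_VXLAN,
+        .tun_id = 42,                              /* VNI */
+        .ipv4 = {
+            .src_addr = RTE_BE32(RTE_IPV4(192, 168, 10, 1)),
+            .dst_addr = RTE_BE32(RTE_IPV4(192, 168, 10, 2)),
+        },
+        .tp_dst = RTE_BE16(4789),                  /* VXLAN UDP port */
+        .is_ipv6 = false,
+    };
+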
+2 Create TSR flow rule.
+
+2.1 Query the PMD for VTP actions. The application can query for VTP actions more than once.
+
+  .. code-block:: c
+
+    int
+    rte_flow_tunnel_decap_set(uint16_t port_id,
+                              struct rte_flow_tunnel *tunnel,
+                              struct rte_flow_action **pmd_actions,
+                              uint32_t *num_of_pmd_actions,
+                              struct rte_flow_error *error);
+
+2.2 Integrate PMD actions into TSR actions list.
+
+2.3 Create TSR flow rule.
+
+    .. code-block:: console
+
+      flow create <port> group 0 match {tunnel items} / end actions {PMD actions} / {App actions} / end
+
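+An equivalent TSR creation sketch in C (illustrative only; assumes the
+``tunnel`` object from step 1 and application-prepared ``tunnel_items``
+and ``app_actions`` arrays; error handling is omitted):
+
+  .. code-block:: c
+
+    struct rte_flow_action *pmd_actions;
+    struct rte_flow_action tsr_actions[16]; /* sized for this example */
+    struct rte_flow_attr attr = { .group = 0, .ingress = 1 };
+    struct rte_flow_error error;
+    struct rte_flow *tsr;
+    uint32_t num_pmd_actions, i, n = 0;
+
+    if (rte_flow_tunnel_decap_set(port_id, &tunnel, &pmd_actions,
+                                  &num_pmd_actions, &error) != 0)
+        return; /* tunnel cannot be offloaded on this port */
+    /* PMD actions go first, application actions follow. */
+    for (i = 0; i < num_pmd_actions; i++)
+        tsr_actions[n++] = pmd_actions[i];
+    for (i = 0; app_actions[i].type != RTE_FLOW_ACTION_TYPE_END; i++)
+        tsr_actions[n++] = app_actions[i];
+    tsr_actions[n].type = RTE_FLOW_ACTION_TYPE_END;
+    tsr = rte_flow_create(port_id, &attr, tunnel_items,
+                          tsr_actions, &error);
+    /* PMD actions may be released once the TSR rule is created. */
+    rte_flow_tunnel_action_decap_release(port_id, pmd_actions,
+                                         num_pmd_actions, &error);
+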
+3 Create TMR flow rule.
+
+3.1 Query the PMD for VTP items. The application can query for VTP items more than once.
+
+    .. code-block:: c
+
+      int
+      rte_flow_tunnel_match(uint16_t port_id,
+                            struct rte_flow_tunnel *tunnel,
+                            struct rte_flow_item **pmd_items,
+                            uint32_t *num_of_pmd_items,
+                            struct rte_flow_error *error);
+
+3.2 Integrate PMD items into TMR items list.
+
+3.3 Create TMR flow rule.
+
+    .. code-block:: console
+
+      flow create <port> group 0 match {PMD items} / {APP items} / end actions {offload actions} / end
+
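+A TMR creation sketch in C (illustrative only; assumes the same
+``tunnel`` object, application-prepared ``app_items`` matching the
+outer/inner headers, and an ``offload_actions`` array):
+
+  .. code-block:: c
+
+    struct rte_flow_item *pmd_items;
+    struct rte_flow_item tmr_items[16]; /* sized for this example */
+    struct rte_flow_attr attr = { .group = 0, .ingress = 1 };
+    struct rte_flow_error error;
+    struct rte_flow *tmr;
+    uint32_t num_pmd_items, i, n = 0;
+
+    if (rte_flow_tunnel_match(port_id, &tunnel, &pmd_items,
+                              &num_pmd_items, &error) != 0)
+        return;
+    /* PMD items go first, application matches follow. */
+    for (i = 0; i < num_pmd_items; i++)
+        tmr_items[n++] = pmd_items[i];
+    for (i = 0; app_items[i].type != RTE_FLOW_ITEM_TYPE_END; i++)
+        tmr_items[n++] = app_items[i];
+    tmr_items[n].type = RTE_FLOW_ITEM_TYPE_END;
+    tmr = rte_flow_create(port_id, &attr, tmr_items,
+                          offload_actions, &error);
+    /* Items may be released now, but must be re-obtained to create
+     * additional TMR rules for the same tunnel. */
+    rte_flow_tunnel_item_release(port_id, pmd_items,
+                                 num_pmd_items, &error);
+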
+The model provides a helper function to restore packets that miss
+tunnel TMR rules to their original state:
+
+.. code-block:: c
+
+  int
+  rte_flow_get_restore_info(uint16_t port_id,
+                            struct rte_mbuf *mbuf,
+                            struct rte_flow_restore_info *info,
+                            struct rte_flow_error *error);
+
+The ``rte_flow_tunnel`` object filled by the call inside the
+``rte_flow_restore_info *info`` parameter can be used by the application
+to create a new TMR rule for that tunnel.
+
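+A minimal receive-path sketch (illustrative only; ``handle_sw_miss()``
+is a hypothetical application handler, ``mbuf`` a received packet):
+
+.. code-block:: c
+
+  struct rte_flow_restore_info info;
+  struct rte_flow_error error;
+
+  if (rte_flow_get_restore_info(port_id, mbuf, &info, &error) == 0 &&
+      (info.flags & RTE_FLOW_RESTORE_INFO_TUNNEL)) {
+      /*
+       * The packet missed in hardware after the TSR stage.
+       * info.tunnel describes the tunnel; if
+       * RTE_FLOW_RESTORE_INFO_ENCAPSULATED is set, the outer
+       * headers are still present on the mbuf.
+       */
+      handle_sw_miss(mbuf, &info);
+  }
+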
+The model requirements:
+
+The application must initialize the ``rte_flow_tunnel`` object with
+tunnel parameters before calling rte_flow_tunnel_decap_set() &
+rte_flow_tunnel_match().
+
+The PMD actions array obtained in rte_flow_tunnel_decap_set() must be
+released by the application with a rte_flow_tunnel_action_decap_release()
+call.  The application can release the actions after the TSR rule was created.
+
+The PMD items array obtained with rte_flow_tunnel_match() must be released
+by the application with a rte_flow_tunnel_item_release() call.  The
+application can release the items after the rule was created. However, if
+the application needs to create an additional TMR rule for the same tunnel
+it will need to obtain the PMD items again.
+
+The application must not destroy the ``rte_flow_tunnel`` object before it
+releases all PMD actions & PMD items referencing that tunnel.
+
 Caveats
 -------
 
diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst
index 1a9945f314..9ea8ee3170 100644
--- a/doc/guides/rel_notes/release_20_11.rst
+++ b/doc/guides/rel_notes/release_20_11.rst
@@ -62,6 +62,15 @@  New Features
   * Added support for 200G PAM4 link speed.
   * Added support for RSS hash level selection.
   * Updated HWRM structures to 1.10.1.70 version.
+* **Flow rules allowed to use private PMD items / actions.**
+
+  * Flow rule verification was updated to accept private PMD
+    items and actions.
+
+* **Added generic API to offload tunneled traffic and restore missed packet.**
+
+  * Added a new hardware independent helper API to RTE flow library that
+    offloads tunneled traffic and restores missed packets.
 
 * **Updated Cisco enic driver.**
 
diff --git a/lib/librte_ethdev/rte_ethdev_version.map b/lib/librte_ethdev/rte_ethdev_version.map
index c95ef5157a..9832c138a2 100644
--- a/lib/librte_ethdev/rte_ethdev_version.map
+++ b/lib/librte_ethdev/rte_ethdev_version.map
@@ -226,6 +226,12 @@  EXPERIMENTAL {
 	rte_tm_wred_profile_add;
 	rte_tm_wred_profile_delete;
 
+	rte_flow_tunnel_decap_set;
+	rte_flow_tunnel_match;
+	rte_flow_get_restore_info;
+	rte_flow_tunnel_action_decap_release;
+	rte_flow_tunnel_item_release;
+
 	# added in 20.11
 	rte_eth_link_speed_to_str;
 	rte_eth_link_to_str;
diff --git a/lib/librte_ethdev/rte_flow.c b/lib/librte_ethdev/rte_flow.c
index c8c6d62a8b..181c02792d 100644
--- a/lib/librte_ethdev/rte_flow.c
+++ b/lib/librte_ethdev/rte_flow.c
@@ -1267,3 +1267,115 @@  rte_flow_get_aged_flows(uint16_t port_id, void **contexts,
 				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				  NULL, rte_strerror(ENOTSUP));
 }
+
+int
+rte_flow_tunnel_decap_set(uint16_t port_id,
+			  struct rte_flow_tunnel *tunnel,
+			  struct rte_flow_action **actions,
+			  uint32_t *num_of_actions,
+			  struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (unlikely(!ops))
+		return -rte_errno;
+	if (likely(!!ops->tunnel_decap_set)) {
+		return flow_err(port_id,
+				ops->tunnel_decap_set(dev, tunnel, actions,
+						      num_of_actions, error),
+				error);
+	}
+	return rte_flow_error_set(error, ENOTSUP,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, rte_strerror(ENOTSUP));
+}
+
+int
+rte_flow_tunnel_match(uint16_t port_id,
+		      struct rte_flow_tunnel *tunnel,
+		      struct rte_flow_item **items,
+		      uint32_t *num_of_items,
+		      struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (unlikely(!ops))
+		return -rte_errno;
+	if (likely(!!ops->tunnel_match)) {
+		return flow_err(port_id,
+				ops->tunnel_match(dev, tunnel, items,
+						  num_of_items, error),
+				error);
+	}
+	return rte_flow_error_set(error, ENOTSUP,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, rte_strerror(ENOTSUP));
+}
+
+int
+rte_flow_get_restore_info(uint16_t port_id,
+			  struct rte_mbuf *m,
+			  struct rte_flow_restore_info *restore_info,
+			  struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (unlikely(!ops))
+		return -rte_errno;
+	if (likely(!!ops->get_restore_info)) {
+		return flow_err(port_id,
+				ops->get_restore_info(dev, m, restore_info,
+						      error),
+				error);
+	}
+	return rte_flow_error_set(error, ENOTSUP,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, rte_strerror(ENOTSUP));
+}
+
+int
+rte_flow_tunnel_action_decap_release(uint16_t port_id,
+				     struct rte_flow_action *actions,
+				     uint32_t num_of_actions,
+				     struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (unlikely(!ops))
+		return -rte_errno;
+	if (likely(!!ops->action_release)) {
+		return flow_err(port_id,
+				ops->action_release(dev, actions,
+						    num_of_actions, error),
+				error);
+	}
+	return rte_flow_error_set(error, ENOTSUP,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, rte_strerror(ENOTSUP));
+}
+
+int
+rte_flow_tunnel_item_release(uint16_t port_id,
+			     struct rte_flow_item *items,
+			     uint32_t num_of_items,
+			     struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (unlikely(!ops))
+		return -rte_errno;
+	if (likely(!!ops->item_release)) {
+		return flow_err(port_id,
+				ops->item_release(dev, items,
+						  num_of_items, error),
+				error);
+	}
+	return rte_flow_error_set(error, ENOTSUP,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, rte_strerror(ENOTSUP));
+}
diff --git a/lib/librte_ethdev/rte_flow.h b/lib/librte_ethdev/rte_flow.h
index da8bfa5489..2f12d3ea1a 100644
--- a/lib/librte_ethdev/rte_flow.h
+++ b/lib/librte_ethdev/rte_flow.h
@@ -3357,6 +3357,201 @@  int
 rte_flow_get_aged_flows(uint16_t port_id, void **contexts,
 			uint32_t nb_contexts, struct rte_flow_error *error);
 
+/* Tunnel has a type and the key information. */
+struct rte_flow_tunnel {
+	/**
+	 * Tunnel type, for example RTE_FLOW_ITEM_TYPE_VXLAN,
+	 * RTE_FLOW_ITEM_TYPE_NVGRE etc.
+	 */
+	enum rte_flow_item_type	type;
+	uint64_t tun_id; /**< Tunnel identification. */
+
+	RTE_STD_C11
+	union {
+		struct {
+			rte_be32_t src_addr; /**< IPv4 source address. */
+			rte_be32_t dst_addr; /**< IPv4 destination address. */
+		} ipv4;
+		struct {
+			uint8_t src_addr[16]; /**< IPv6 source address. */
+			uint8_t dst_addr[16]; /**< IPv6 destination address. */
+		} ipv6;
+	};
+	rte_be16_t tp_src; /**< Tunnel port source. */
+	rte_be16_t tp_dst; /**< Tunnel port destination. */
+	uint16_t   tun_flags; /**< Tunnel flags. */
+
+	bool       is_ipv6; /**< True for valid IPv6 fields. Otherwise IPv4. */
+
+	/**
+	 * The following members are required to restore a packet
+	 * after a miss.
+	 */
+	uint8_t    tos; /**< TOS for IPv4, TC for IPv6. */
+	uint8_t    ttl; /**< TTL for IPv4, HL for IPv6. */
+	uint32_t label; /**< Flow Label for IPv6. */
+};
+
+/**
+ * Indicate that the packet has a tunnel.
+ */
+#define RTE_FLOW_RESTORE_INFO_TUNNEL  (1ULL << 0)
+
+/**
+ * Indicate that the packet has a non decapsulated tunnel header.
+ */
+#define RTE_FLOW_RESTORE_INFO_ENCAPSULATED  (1ULL << 1)
+
+/**
+ * Indicate that the packet has a group_id.
+ */
+#define RTE_FLOW_RESTORE_INFO_GROUP_ID  (1ULL << 2)
+
+/**
+ * Restore information structure to communicate the current packet processing
+ * state when some of the processing pipeline is done in hardware and should
+ * continue in software.
+ */
+struct rte_flow_restore_info {
+	/**
+	 * Bitwise flags (RTE_FLOW_RESTORE_INFO_*) to indicate the validity of
+	 * other fields in struct rte_flow_restore_info.
+	 */
+	uint64_t flags;
+	uint32_t group_id; /**< Group ID where the packet missed. */
+	struct rte_flow_tunnel tunnel; /**< Tunnel information. */
+};
+
+/**
+ * Allocate an array of actions to be used in rte_flow_create, to implement
+ * tunnel-decap-set for the given tunnel.
+ * Sample usage:
+ *   actions vxlan_decap / tunnel-decap-set(tunnel properties) /
+ *            jump group 0 / end
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] tunnel
+ *   Tunnel properties.
+ * @param[out] actions
+ *   Array of actions to be allocated by the PMD. This array should be
+ *   concatenated with the actions array provided to rte_flow_create.
+ * @param[out] num_of_actions
+ *   Number of actions allocated.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_flow_tunnel_decap_set(uint16_t port_id,
+			  struct rte_flow_tunnel *tunnel,
+			  struct rte_flow_action **actions,
+			  uint32_t *num_of_actions,
+			  struct rte_flow_error *error);
+
+/**
+ * Allocate an array of items to be used in rte_flow_create, to implement
+ * tunnel-match for the given tunnel.
+ * Sample usage:
+ *   pattern tunnel-match(tunnel properties) / outer-header-matches /
+ *           inner-header-matches / end
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] tunnel
+ *   Tunnel properties.
+ * @param[out] items
+ *   Array of items to be allocated by the PMD. This array should be
+ *   concatenated with the items array provided to rte_flow_create.
+ * @param[out] num_of_items
+ *   Number of items allocated.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_flow_tunnel_match(uint16_t port_id,
+		      struct rte_flow_tunnel *tunnel,
+		      struct rte_flow_item **items,
+		      uint32_t *num_of_items,
+		      struct rte_flow_error *error);
+
+/**
+ * Populate the current packet processing state, if it exists, for the given mbuf.
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] m
+ *   Mbuf struct.
+ * @param[out] info
+ *   Restore information. Upon success contains the HW state.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_flow_get_restore_info(uint16_t port_id,
+			  struct rte_mbuf *m,
+			  struct rte_flow_restore_info *info,
+			  struct rte_flow_error *error);
+
+/**
+ * Release the action array as allocated by rte_flow_tunnel_decap_set.
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] actions
+ *   Array of actions to be released.
+ * @param[in] num_of_actions
+ *   Number of elements in actions array.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_flow_tunnel_action_decap_release(uint16_t port_id,
+				     struct rte_flow_action *actions,
+				     uint32_t num_of_actions,
+				     struct rte_flow_error *error);
+
+/**
+ * Release the item array as allocated by rte_flow_tunnel_match.
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] items
+ *   Array of items to be released.
+ * @param[in] num_of_items
+ *   Number of elements in item array.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_flow_tunnel_item_release(uint16_t port_id,
+			     struct rte_flow_item *items,
+			     uint32_t num_of_items,
+			     struct rte_flow_error *error);
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_ethdev/rte_flow_driver.h b/lib/librte_ethdev/rte_flow_driver.h
index 3ee871d3eb..9d87407203 100644
--- a/lib/librte_ethdev/rte_flow_driver.h
+++ b/lib/librte_ethdev/rte_flow_driver.h
@@ -108,6 +108,38 @@  struct rte_flow_ops {
 		 void **context,
 		 uint32_t nb_contexts,
 		 struct rte_flow_error *err);
+	/** See rte_flow_tunnel_decap_set() */
+	int (*tunnel_decap_set)
+		(struct rte_eth_dev *dev,
+		 struct rte_flow_tunnel *tunnel,
+		 struct rte_flow_action **pmd_actions,
+		 uint32_t *num_of_actions,
+		 struct rte_flow_error *err);
+	/** See rte_flow_tunnel_match() */
+	int (*tunnel_match)
+		(struct rte_eth_dev *dev,
+		 struct rte_flow_tunnel *tunnel,
+		 struct rte_flow_item **pmd_items,
+		 uint32_t *num_of_items,
+		 struct rte_flow_error *err);
+	/** See rte_flow_get_restore_info() */
+	int (*get_restore_info)
+		(struct rte_eth_dev *dev,
+		 struct rte_mbuf *m,
+		 struct rte_flow_restore_info *info,
+		 struct rte_flow_error *err);
+	/** See rte_flow_tunnel_action_decap_release() */
+	int (*action_release)
+		(struct rte_eth_dev *dev,
+		 struct rte_flow_action *pmd_actions,
+		 uint32_t num_of_actions,
+		 struct rte_flow_error *err);
+	/** See rte_flow_item_release() */
+	int (*item_release)
+		(struct rte_eth_dev *dev,
+		 struct rte_flow_item *pmd_items,
+		 uint32_t num_of_items,
+		 struct rte_flow_error *err);
 };
 
 /**