[v2,01/14] ethdev: introduce configurable flexible item

Message ID 20211001193415.23288-2-viacheslavo@nvidia.com (mailing list archive)
State Superseded, archived
Delegated to: Ferruh Yigit
Series ethdev: introduce configurable flexible item

Checks

Context        Check    Description
ci/checkpatch  warning  coding style issues

Commit Message

Slava Ovsiienko Oct. 1, 2021, 7:34 p.m. UTC
  1. Introduction and Retrospective

Networks are evolving quickly, network structures are becoming
more and more complicated, and new application areas keep
emerging. To address these challenges, new network protocols
are continuously being developed, considered by technical
communities, adopted by industry and, eventually, implemented
in hardware and software. The DPDK framework follows these
trends: a glance at the RTE Flow API header shows that
multiple new items have been introduced over the years since
the initial release.

Adopting and implementing a new protocol is not
straightforward and takes time: the protocol passes through
development, consideration, adoption, and implementation
phases. The industry tries to anticipate forthcoming network
protocols; for example, many hardware vendors are
implementing flexible and configurable network protocol
parsers. As DPDK developers, could we anticipate the near
future in the same fashion and introduce similar flexibility
in the RTE Flow API?

Let's check what is already merged in our project: there is
the raw item (rte_flow_item_raw). At first glance it looks
suitable, and we can try to implement flow matching on the
header of some relatively new tunnel protocol, say the GENEVE
header with variable length options. On further
consideration, however, we run into the raw item
limitations:

- only fixed size network header can be represented
- the entire network header pattern of fixed format
  (header field offsets are fixed) must be provided
- the search for patterns is not robust (the wrong matches
  might be triggered), and actually is not supported
  by existing PMDs
- no explicitly specified relations with preceding
  and following items
- no tunnel hint support

As a result, supporting tunnel protocols such as the
aforementioned GENEVE with variable-length protocol options
using the raw item becomes very complicated and would require
multiple flows and multiple raw items chained in the same
flow (by the way, no support for chained raw items was found
in the existing drivers).

This RFC introduces the dedicated flex item (rte_flow_item_flex)
to handle matches with existing and new network protocol headers
in a unified fashion.

2. Flex Item Life Cycle

Let's assume we are required to support a new network
protocol with RTE Flows. The protocol specification
provides:

  - header format
  - header length (can be variable, depending on options)
  - potential presence of extra options following or included
    in the header
  - the relations with preceding protocols. For example,
    GENEVE follows UDP, eCPRI can follow either UDP
    or an L2 header
  - the relations with following protocols. For example,
    the next layer after tunnel header can be L2 or L3
  - whether the new protocol is a tunnel and the header
    is a splitting point between outer and inner layers

The intended way to operate with a flex item:

  - application defines the header structures according to
    protocol specification

  - application calls rte_flow_flex_item_create() with the desired
    configuration according to the protocol specification. The
    call creates the flex item object over the specified ethernet
    device and prepares the PMD and underlying hardware to handle
    the flex item. On item creation, the PMD backing the specified
    ethernet device returns an opaque handle identifying the
    object that has been created

  - application uses the rte_flow_item_flex with obtained handle
    in the flows, the values/masks to match with fields in the
    header are specified in the flex item per flow as for regular
    items (except that pattern buffer combines all fields)

  - flows with flex items match with packets in a regular fashion,
    the values and masks for the new protocol header match are
    taken from the flex items in the flows

  - application destroys flows with flex items

  - application calls rte_flow_flex_item_release(), which is part
    of the ethernet device API; this destroys the flex item object
    in the PMD and releases the engaged hardware resources

3. Flex Item Structure

The flex item structure is intended to be used as part of the flow
pattern like regular RTE flow items and provides the mask and
value to match with fields of the protocol the item was configured
for.

  struct rte_flow_item_flex {
    void *handle;
    uint32_t length;
    const uint8_t* pattern;
  };

The handle is an opaque object maintained on a per-device basis
by the underlying driver.

The protocol header fields are considered as bit fields, all
offsets and widths are expressed in bits. The pattern is the
buffer containing the bit concatenation of all the fields
presented at item configuration time, in the same order and
the same number. If byte boundary alignment is needed, an
application can use a dummy type field, which is just a gap
filler.

The length field specifies the pattern buffer length in bytes
and is needed to allow rte_flow_copy() operations. The approach
of multiple pattern pointers and lengths (per field) was
considered and found clumsy - it seems much more suitable for
the application to maintain a single structure within a
single pattern buffer.
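
Since field widths are given in bits while the length is given in
bytes, a small sketch may help relate the two. This is only an
illustration under the assumption that the pattern length is the
sum of all configured field widths rounded up to whole bytes; the
helper name flex_pattern_bytes() is hypothetical and is not part
of the proposed API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the proposed API): given the
 * widths in bits of all configured fields, in configuration order,
 * return the number of bytes a pattern buffer would need, assuming
 * the fields are concatenated bit by bit without gaps.
 */
size_t flex_pattern_bytes(const uint32_t *widths_bits, size_t count)
{
	size_t total_bits = 0;
	size_t i;

	assert(widths_bits != NULL || count == 0);
	for (i = 0; i < count; i++)
		total_bits += widths_bits[i];
	/* Round up: a partially used last byte still occupies a byte. */
	return (total_bits + 7) / 8;
}
```

For instance, a 96-bit dummy field followed by a 32-bit field gives
128 bits, so the length would be 16 bytes.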

4. Flex Item Configuration

The flex item configuration consists of the following parts:

  - header field descriptors:
    - next header
    - next protocol
    - sample to match
  - input link descriptors
  - output link descriptors

The field descriptors tell the driver and hardware what data
should be extracted from the packet and then presented for
matching in the flows. Each field is a bit pattern. It has a
width, an offset from the header beginning, a mode of offset
calculation, and offset-related parameters.

The next header field is special: no data are actually taken
from the packet, but its offset is used as a pointer to the next
header in the packet. In other words, the next header offset
specifies the size of the header being parsed by the flex item.

There is one more special field, next protocol. It specifies
where the next protocol identifier is contained; packet data
sampled from this field will be used to determine the next
protocol header type to continue packet parsing. The next
protocol field is like the eth_type field in MAC2, or the proto
field in IPv4/v6 headers.

The sample fields represent the data to be sampled
from the packet and then matched with established flows.

There are several methods to calculate the field offset
at runtime, depending on configuration and packet content:

  - FIELD_MODE_FIXED - fixed offset. The bit offset from
    header beginning is permanent and defined by field_base
    configuration parameter.

  - FIELD_MODE_OFFSET - the field bit offset is extracted
    from other header field (indirect offset field). The
    resulting field offset to match is calculated as:

  field_base + ((*field_offset & offset_mask) << field_shift)

    This mode is useful to sample some extra options following
    the main header with field containing main header length.
    Also, this mode can be used to calculate offset to the
    next protocol header, for example - IPv4 header contains
    the 4-bit field with IPv4 header length expressed in dwords.
    One more example - this mode would allow us to skip GENEVE
    header variable length options.

  - FIELD_MODE_BITMASK - the field bit offset is extracted
    from other header field (indirect offset field), the latter
    is considered as bitmask containing some number of one bits,
    the resulting field offset to match is calculated as:

  field_base + (bitcount(*field_offset & offset_mask) << field_shift)

    This mode would be useful to skip the GTP header and its
    extra options with specified flags.

  - FIELD_MODE_DUMMY - dummy field, optionally used for byte
    boundary alignment in pattern. Pattern mask and data are
    ignored in the match. All configuration parameters besides
    field size and offset are ignored.

The offset mode list can be extended by vendors according to
hardware supported options.
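
The offset modes above can be sketched in plain C as follows. This
is a minimal illustration, not the driver implementation: the enum
and the function flex_field_offset() are hypothetical stand-ins,
the value read from the indirect offset field is passed in as a
plain integer, and the shift is assumed to apply only to the masked
value before field_base is added.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the modes named in the text. */
enum flex_field_mode {
	FIELD_MODE_FIXED,
	FIELD_MODE_OFFSET,
	FIELD_MODE_BITMASK,
	FIELD_MODE_DUMMY,
};

/* Count the one bits in v (portable, no compiler builtins). */
static uint32_t bitcount32(uint32_t v)
{
	uint32_t n = 0;

	while (v != 0) {
		v &= v - 1; /* clear the lowest set bit */
		n++;
	}
	return n;
}

/*
 * Compute the field offset for the given mode. "indirect" is the
 * value already read from the indirect offset field in the packet;
 * it is ignored for the FIXED and DUMMY modes.
 */
uint32_t flex_field_offset(enum flex_field_mode mode, uint32_t field_base,
			   uint32_t indirect, uint32_t offset_mask,
			   uint32_t field_shift)
{
	assert(field_shift < 32); /* larger shifts are undefined in C */
	switch (mode) {
	case FIELD_MODE_OFFSET:
		return field_base + ((indirect & offset_mask) << field_shift);
	case FIELD_MODE_BITMASK:
		return field_base +
		       (bitcount32(indirect & offset_mask) << field_shift);
	case FIELD_MODE_FIXED:
	case FIELD_MODE_DUMMY:
	default:
		return field_base;
	}
}
```

With the IPv4 case mentioned above, a first header byte of 0x45,
a mask of 0xF and a shift of 2 yield 5 dwords, i.e. 20.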

The input link configuration section tells the driver after
what protocols and under what conditions the flex item can
follow. An input link specifies the preceding header pattern;
for example, for GENEVE it can be a UDP item specifying a match
on destination port with value 6081. The flex item can follow
multiple header types, in which case multiple input links
should be specified. At flow creation time, an item with one of
the input link types should precede the flex item, and the
driver will select the correct flex item settings depending on
the actual flow pattern.

The output link configuration section tells the driver how
to continue packet parsing after the flex item protocol.
If multiple protocols can follow the flex item header the
flex item should contain the field with next protocol
identifier, and the parsing will be continued depending
on the data contained in this field in the actual packet.

The flex item fields can participate in RSS hash calculation,
the dedicated flag is present in field description to specify
what fields should be provided for hashing.

5. Flex Item Chaining

If multiple protocols are to be supported with flex items in
chained fashion - two or more flex items within the same flow,
possibly neighbors in the pattern - the flex items reference
each other. In this case, the item that occurs first should be
created with an empty output link list or with a list including
only existing items, and then the second flex item should be
created referencing the first flex item as input arc.

Also, the hardware resources used by flex items to handle
the packet can be limited. If multiple flex items are
supposed to be used within the same flow, it would be
nice to give the driver a hint that these two or more
flex items are intended for simultaneous usage.
The fields of the items should be assigned hint indices,
and these indices from two or more flex items should not
overlap (they should be unique per field). In this case, the
driver will try to engage non-overlapping hardware resources
and provide independent handling of the fields with unique
indices. If the hint index is zero, the driver assigns
resources on its own.
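
An application could check the hint rule before creating the items.
The sketch below is only an illustration under stated assumptions:
the hint indices of all fields of all flex items intended for the
same flow are collected into one array, zero means "let the driver
decide", and the helper name flex_hints_unique() is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the proposed API): return true
 * if every nonzero hint index occurs at most once in the array.
 * Zero entries are skipped because a zero hint leaves the resource
 * assignment to the driver.
 */
bool flex_hints_unique(const uint32_t *hints, size_t count)
{
	size_t i, j;

	assert(hints != NULL || count == 0);
	for (i = 0; i < count; i++) {
		if (hints[i] == 0)
			continue;
		for (j = i + 1; j < count; j++) {
			if (hints[j] == hints[i])
				return false;
		}
	}
	return true;
}
```

The quadratic scan is chosen for brevity; the number of fields per
flow is expected to be small.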

6. Example of New Protocol Handling

Let's suppose we need to handle a new tunnel protocol that
follows the UDP header with destination port 0xFADE and is
followed by a MAC header. Let the new protocol header format
be like this:

  struct new_protocol_header {
    rte_be32_t header_length; /* length in dwords, including options */
    rte_be32_t specific0;     /* some protocol data, no intention */
    rte_be32_t specific1;     /* to match in flows on these fields */
    rte_be32_t crucial;       /* data of interest, match is needed */
    rte_be32_t options[0];    /* optional protocol data, variable length */
  };

The supposed flex item configuration:

  struct rte_flow_item_flex_field field0 = {
    .field_mode = FIELD_MODE_DUMMY,  /* Affects match pattern only */
    .field_size = 96,                /* three dwords from the beginning */
  };
  struct rte_flow_item_flex_field field1 = {
    .field_mode = FIELD_MODE_FIXED,
    .field_size = 32,       /* Field size is one dword */
    .field_base = 96,       /* Skip three dwords from the beginning */
  };
  struct rte_flow_item_udp spec0 = {
    .hdr = {
      .dst_port = RTE_BE16(0xFADE),
    }
  };
  struct rte_flow_item_udp mask0 = {
    .hdr = {
      .dst_port = RTE_BE16(0xFFFF),
    }
  };
  struct rte_flow_item_flex_link link0 = {
    .item = {
       .type = RTE_FLOW_ITEM_TYPE_UDP,
       .spec = &spec0,
       .mask = &mask0,
    }
  };

  struct rte_flow_item_flex_conf conf = {
    .next_header = {
      .field_mode = FIELD_MODE_OFFSET,
      .field_base = 0,
      .offset_base = 0,
      .offset_mask = 0xFFFFFFFF,
      .offset_shift = 2	   /* Expressed in dwords, shift left by 2 */
    },
    .sample = {
       &field0,
       &field1,
    },
    .sample_num = 2,
    .input_link[0] = &link0,
    .input_num = 1
  };

Let's suppose we have created the flex item successfully, and PMD
returned the handle 0x123456789A. We can use the following item
pattern to match the crucial field in the packet with value 0x00112233:

  struct new_protocol_header spec_pattern = {
    .crucial = RTE_BE32(0x00112233),
  };
  struct new_protocol_header mask_pattern = {
    .crucial = RTE_BE32(0xFFFFFFFF),
  };
  struct rte_flow_item_flex spec_flex = {
    .handle = (void *)(uintptr_t)0x123456789A, /* returned by PMD */
    .length = sizeof(struct new_protocol_header),
    .pattern = (const uint8_t *)&spec_pattern,
  };
  struct rte_flow_item_flex mask_flex = {
    .length = sizeof(struct new_protocol_header),
    .pattern = (const uint8_t *)&mask_pattern,
  };
  struct rte_flow_item item_to_match = {
    .type = RTE_FLOW_ITEM_TYPE_FLEX,
    .spec = &spec_flex,
    .mask = &mask_flex,
  };
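
The relation between the header layout and the field configuration
in this example can be checked with a short self-contained sketch.
The struct below is a stand-in for the example header: uint32_t
replaces the big-endian type and the variable length options are
omitted; the function names are hypothetical. It shows that the
crucial field starts 96 bits into the header, matching field_base
of the fixed field and the size of the dummy field, and that the
two sampled fields together cover the 128-bit fixed part.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the fixed part of the example header. */
struct new_protocol_header_fixed {
	uint32_t header_length;
	uint32_t specific0;
	uint32_t specific1;
	uint32_t crucial;
};

/* Bit offset of the crucial field from the header beginning. */
size_t crucial_bit_offset(void)
{
	return offsetof(struct new_protocol_header_fixed, crucial) * 8;
}

/* Size in bits of the fixed part of the header. */
size_t fixed_part_bits(void)
{
	/* Four 32-bit members are expected to be laid out without padding. */
	assert(sizeof(struct new_protocol_header_fixed) ==
	       4 * sizeof(uint32_t));
	return sizeof(struct new_protocol_header_fixed) * 8;
}
```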

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 doc/guides/prog_guide/rte_flow.rst     |  24 +++
 doc/guides/rel_notes/release_21_11.rst |   7 +
 lib/ethdev/rte_ethdev.h                |   1 +
 lib/ethdev/rte_flow.h                  | 228 +++++++++++++++++++++++++
 4 files changed, 260 insertions(+)
  

Comments

Ori Kam Oct. 7, 2021, 11:08 a.m. UTC | #1
Hi Slava,

> -----Original Message-----
> From: Slava Ovsiienko <viacheslavo@nvidia.com>
> Sent: Friday, October 1, 2021 10:34 PM
> Subject: [PATCH v2 01/14] ethdev: introduce configurable flexible item
> 
> 1. Introduction and Retrospective
> 
> Nowadays the networks are evolving fast and wide, the network structures are
> getting more and more complicated, the new application areas are emerging.
> To address these challenges the new network protocols are continuously being
> developed, considered by technical communities, adopted by industry and,
> eventually implemented in hardware and software. The DPDK framework
> follows the common trends and if we bother to glance at the RTE Flow API
> header we see the multiple new items were introduced during the last years
> since the initial release.
> 
> The new protocol adoption and implementation process is not straightforward
> and takes time, the new protocol passes development, consideration,
> adoption, and implementation phases. The industry tries to mitigate and
> address the forthcoming network protocols, for example, many hardware
> vendors are implementing flexible and configurable network protocol parsers.
> As DPDK developers, could we anticipate the near future in the same fashion
> and introduce the similar flexibility in RTE Flow API?
> 
> Let's check what we already have merged in our project, and we see the nice
> raw item (rte_flow_item_raw). At the first glance, it looks superior and we can
> try to implement a flow matching on the header of some relatively new tunnel
> protocol, say on the GENEVE header with variable length options. And, under
> further consideration, we run into the raw item
> limitations:
> 
> - only fixed size network header can be represented
> - the entire network header pattern of fixed format
>   (header field offsets are fixed) must be provided
> - the search for patterns is not robust (the wrong matches
>   might be triggered), and actually is not supported
>   by existing PMDs
> - no explicitly specified relations with preceding
>   and following items
> - no tunnel hint support
> 
> As the result, implementing the support for tunnel protocols like
> aforementioned GENEVE with variable extra protocol option with flow raw
> item becomes very complicated and would require multiple flows and
> multiple raw items chained in the same flow (by the way, there is no support
> found for chained raw items in implemented drivers).
> 
> This RFC introduces the dedicated flex item (rte_flow_item_flex) to handle
> matches with existing and new network protocol headers in a unified fashion.
> 
> 2. Flex Item Life Cycle
> 
> Let's assume there are the requirements to support the new network protocol
> with RTE Flows. What is given within protocol
> specification:
> 
>   - header format
>   - header length, (can be variable, depending on options)
>   - potential presence of extra options following or included
>     in the header the header
>   - the relations with preceding protocols. For example,
>     the GENEVE follows UDP, eCPRI can follow either UDP
>     or L2 header
>   - the relations with following protocols. For example,
>     the next layer after tunnel header can be L2 or L3
>   - whether the new protocol is a tunnel and the header
>     is a splitting point between outer and inner layers
> 
> The supposed way to operate with flex item:
> 
>   - application defines the header structures according to
>     protocol specification
> 
>   - application calls rte_flow_flex_item_create() with desired
>     configuration according to the protocol specification, it
>     creates the flex item object over specified ethernet device
>     and prepares PMD and underlying hardware to handle flex
>     item. On item creation call PMD backing the specified
>     ethernet device returns the opaque handle identifying
>     the object have been created
> 
>   - application uses the rte_flow_item_flex with obtained handle
>     in the flows, the values/masks to match with fields in the
>     header are specified in the flex item per flow as for regular
>     items (except that pattern buffer combines all fields)
> 
>   - flows with flex items match with packets in a regular fashion,
>     the values and masks for the new protocol header match are
>     taken from the flex items in the flows
> 
>   - application destroys flows with flex items
> 
>   - application calls rte_flow_flex_item_release() as part of
>     ethernet device API and destroys the flex item object in
>     PMD and releases the engaged hardware resources
> 
> 3. Flex Item Structure
> 
> The flex item structure is intended to be used as part of the flow pattern like
> regular RTE flow items and provides the mask and value to match with fields of
> the protocol item was configured for.
> 
>   struct rte_flow_item_flex {
>     void *handle;
>     uint32_t length;
>     const uint8_t* pattern;
>   };
> 
> The handle is some opaque object maintained on per device basis by
> underlying driver.
> 
> The protocol header fields are considered as bit fields, all offsets and widths
> are expressed in bits. The pattern is the buffer containing the bit
> concatenation of all the fields presented at item configuration time, in the
> same order and same amount. If byte boundary alignment is needed an
> application can use a dummy type field, this is just some kind of gap filler.
> 
> The length field specifies the pattern buffer length in bytes and is needed to
> allow rte_flow_copy() operations. The approach of multiple pattern pointers
> and lengths (per field) was considered and found clumsy - it seems to be much
> suitable for the application to maintain the single structure within the single
> pattern buffer.
> 

The main thing that was unclear to me, and that I think I now understand
from reading the code, is that the pattern is the entire flex header structure.
Maybe a better word would be "header"?
At first I thought that you should only give the matchable fields.
Also, you say everything is in bits and then suddenly you are talking in bytes.

> 4. Flex Item Configuration
> 
> The flex item configuration consists of the following parts:
> 
>   - header field descriptors:
>     - next header
>     - next protocol
>     - sample to match
>   - input link descriptors
>   - output link descriptors
> 
> The field descriptors tell driver and hardware what data should be extracted
> from the packet and then presented to match in the flows. Each field is a bit
> pattern. It has width, offset from the header beginning, mode of offset
> calculation, and offset related parameters.
> 

I'm not sure your indentation is correct for next header, next protocol, and sample to match.
The first line reads as if all fields are going to be matched,
while in the following sections only the sample fields are matchable.

> The next header field is special, no data are actually taken from the packet,
> but its offset is used as pointer to the next header in the packet, in other word
> the next header offset specifies the size of the header being parsed by flex
> item.
> 

So the name of the next header should be len?

> There is one more special field - next protocol, it specifies where the next
> protocol identifier is contained and packet data sampled from this field will be
> used to determine the next protocol header type to continue packet parsing.
> The next protocol field is like eth_type field in MAC2, or proto field in IPv4/v6
> headers.
> 
> The sample fields are used to represent the data be sampled from the packet
> and then matched with established flows.

Should this be samples?

> 
> There are several methods supposed to calculate field offset in runtime
> depending on configuration and packet content:
> 
>   - FIELD_MODE_FIXED - fixed offset. The bit offset from
>     header beginning is permanent and defined by field_base
>     configuration parameter.
> 
>   - FIELD_MODE_OFFSET - the field bit offset is extracted
>     from other header field (indirect offset field). The
>     resulting field offset to match is calculated from as:
> 
>   field_base + (*field_offset & offset_mask) << field_shift
> 

Not all of those field names are defined later in this patch, and I'm not
sure what they mean.
Does * mean take the value that is in field_offset?
How do we know the width of the field (from the value of the mask)?

>     This mode is useful to sample some extra options following
>     the main header with field containing main header length.
>     Also, this mode can be used to calculate offset to the
>     next protocol header, for example - IPv4 header contains
>     the 4-bit field with IPv4 header length expressed in dwords.
>     One more example - this mode would allow us to skip GENEVE
>     header variable length options.
> 
>   - FIELD_MODE_BITMASK - the field bit offset is extracted
>     from other header field (indirect offset field), the latter
>     is considered as bitmask containing some number of one bits,
>     the resulting field offset to match is calculated as:
> 
>   field_base + bitcount(*field_offset & offset_mask) << field_shift

Same comment as above: you are using names that are not defined later.

> 
>     This mode would be useful to skip the GTP header and its
>     extra options with specified flags.
> 
>   - FIELD_MODE_DUMMY - dummy field, optionally used for byte
>     boundary alignment in pattern. Pattern mask and data are
>     ignored in the match. All configuration parameters besides
>     field size and offset are ignored.
> 
> The offset mode list can be extended by vendors according to hardware
> supported options.
> 
> The input link configuration section tells the driver after what protocols and at
> what conditions the flex item can follow.
> Input link specified the preceding header pattern, for example for GENEVE it
> can be UDP item specifying match on destination port with value 6081. The
> flex item can follow multiple header types and multiple input links should be
> specified. At flow creation type the item with one of input link types should
> precede the flex item and driver will select the correct flex item settings,
> depending on actual flow pattern.
> 
> The output link configuration section tells the driver how to continue packet
> parsing after the flex item protocol.
> If multiple protocols can follow the flex item header the flex item should
> contain the field with next protocol identifier, and the parsing will be
> continued depending on the data contained in this field in the actual packet.
> 
> The flex item fields can participate in RSS hash calculation, the dedicated flag
> is present in field description to specify what fields should be provided for
> hashing.
> 
> 5. Flex Item Chaining
> 
> If there are multiple protocols supposed to be supported with flex items in
> chained fashion - two or more flex items within the same flow and these ones
> might be neighbors in pattern - it means the flex items are mutual referencing.
> In this case, the item that occurred first should be created with empty output
> link list or with the list including existing items, and then the second flex item
> should be created referencing the first flex item as input arc.
> 

And then I assume we should update the output list.

> Also, the hardware resources used by flex items to handle the packet can be
> limited. If there are multiple flex items that are supposed to be used within the
> same flow it would be nice to provide some hint for the driver that these two
> or more flex items are intended for simultaneous usage.
> The fields of items should be assigned with hint indices and these indices from
> two or more flex items should not overlap (be unique per field). For this case,
> the driver will try to engage not overlapping hardware resources and provide
> independent handling of the fields with unique indices. If the hint index is zero
> the driver assigns resources on its own.
> 
> 6. Example of New Protocol Handling
> 
> Let's suppose we have the requirements to handle the new tunnel protocol
> that follows UDP header with destination port 0xFADE and is followed by MAC
> header. Let the new protocol header format be like this:
> 
>   struct new_protocol_header {
>     rte_be32 header_length; /* length in dwords, including options */
>     rte_be32 specific0;     /* some protocol data, no intention */
>     rte_be32 specific1;     /* to match in flows on these fields */
>     rte_be32 crucial;       /* data of interest, match is needed */
>     rte_be32 options[0];    /* optional protocol data, variable length */
>   };
> 
> The supposed flex item configuration:
> 
>   struct rte_flow_item_flex_field field0 = {
>     .field_mode = FIELD_MODE_DUMMY,  /* Affects match pattern only */
>     .field_size = 96,                /* three dwords from the beginning */
>   };
>   struct rte_flow_item_flex_field field1 = {
>     .field_mode = FIELD_MODE_FIXED,
>     .field_size = 32,       /* Field size is one dword */
>     .field_base = 96,       /* Skip three dwords from the beginning */
>   };
>   struct rte_flow_item_udp spec0 = {
>     .hdr = {
>       .dst_port = RTE_BE16(0xFADE),
>     }
>   };
>   struct rte_flow_item_udp mask0 = {
>     .hdr = {
>       .dst_port = RTE_BE16(0xFFFF),
>     }
>   };
>   struct rte_flow_item_flex_link link0 = {
>     .item = {
>        .type = RTE_FLOW_ITEM_TYPE_UDP,
>        .spec = &spec0,
>        .mask = &mask0,
>   };
> 
>   struct rte_flow_item_flex_conf conf = {
>     .next_header = {
>       .field_mode = FIELD_MODE_OFFSET,
>       .field_base = 0,
>       .offset_base = 0,
>       .offset_mask = 0xFFFFFFFF,
>       .offset_shift = 2	   /* Expressed in dwords, shift left by 2 */
>     },
>     .sample = {
>        &field0,
>        &field1,
>     },

Why in sample you give both fields?
by your decision we just want to match on field1.

>     .sample_num = 2,
>     .input_link[0] = &link0,
>     .input_num = 1
>   };
> 
> Let's suppose we have created the flex item successfully, and PMD returned
> the handle 0x123456789A. We can use the following item pattern to match the
> crucial field in the packet with value 0x00112233:
> 
>   struct new_protocol_header spec_pattern =
>   {
>     .crucial = RTE_BE32(0x00112233),
>   };
>   struct new_protocol_header mask_pattern =
>   {
>     .crucial = RTE_BE32(0xFFFFFFFF),
>   };
>   struct rte_flow_item_flex spec_flex = {
>     .handle = 0x123456789A
>     .length = sizeiof(struct new_protocol_header),
>     .pattern = &spec_pattern,
>   };
>   struct rte_flow_item_flex mask_flex = {
>     .length = sizeof(struct new_protocol_header),
>     .pattern = &mask_pattern,
>   };
>   struct rte_flow_item item_to_match = {
>     .type = RTE_FLOW_ITEM_TYPE_FLEX,
>     .spec = &spec_flex,
>     .mask = &mask_flex,
>   };
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
> ---
>  doc/guides/prog_guide/rte_flow.rst     |  24 +++
>  doc/guides/rel_notes/release_21_11.rst |   7 +
>  lib/ethdev/rte_ethdev.h                |   1 +
>  lib/ethdev/rte_flow.h                  | 228 +++++++++++++++++++++++++
>  4 files changed, 260 insertions(+)
> 
> diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
> index 2b42d5ec8c..628f30cea7 100644
> --- a/doc/guides/prog_guide/rte_flow.rst
> +++ b/doc/guides/prog_guide/rte_flow.rst
> @@ -1425,6 +1425,30 @@ Matches a conntrack state after conntrack action.
>  - ``flags``: conntrack packet state flags.
>  - Default ``mask`` matches all state bits.
> 
> +Item: ``FLEX``
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +Matches with the network protocol header of preliminary configured format.
> +The application describes the desired header structure, defines the
> +header fields attributes and header relations with preceding and
> +following protocols and configures the ethernet devices accordingly via
> +rte_flow_flex_item_create() routine.

How about: matches a custom header that was created using
rte_flow_flex_item_create

> +
> +- ``handle``: the flex item handle returned by the PMD on successful
> +  rte_flow_flex_item_create() call. The item handle is unique within
> +  the device port, mask for this field is ignored.

I think you can remove the statement that the handle is unique.

> +- ``length``: match pattern length in bytes. If the length does not cover
> +  all fields defined in item configuration, the pattern spec and mask are
> +  supposed to be appended with zeroes till the full configured item length.

It looks buggy to say that any length can be given while expecting the application
to supply the full length.
 
> +- ``pattern``: pattern to match. The protocol header fields are considered
> +  as bit fields, all offsets and widths are expressed in bits. The pattern
> +  is the buffer containing the bit concatenation of all the fields presented
> +  at item configuration time, in the same order and same amount. The most
> +  regular way is to define all the header fields in the flex item configuration
> +  and directly use the header structure as pattern template, i.e. application
> +  just can fill the header structures with desired match values and masks and
> +  specify these structures as flex item pattern directly.
> +

It is hard to understand this comment and what the application should set.
I suggest taking the basic approach and just explaining it (I think that is
the last few lines).

>  Actions
>  ~~~~~~~
> 
> diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst
> index 73e377a007..170797f9e9 100644
> --- a/doc/guides/rel_notes/release_21_11.rst
> +++ b/doc/guides/rel_notes/release_21_11.rst
> @@ -55,6 +55,13 @@ New Features
>       Also, make sure to start the actual text at the margin.
>       =======================================================
> 
> +* **Introduced RTE Flow Flex Item.**
> +
> +  * The configurable RTE Flow Flex Item provides the capability to introdude
> +    the arbitrary user specified network protocol header, configure the device
> +    hardware accordingly, and perform match on this header with desired patterns
> +    and masks.
> +
>  * **Enabled new devargs parser.**
> 
>    * Enabled devargs syntax
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> index afdc53b674..e9ad7673e9 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -558,6 +558,7 @@ struct rte_eth_rss_conf {
>   * it takes the reserved value 0 as input for the hash function.
>   */
>  #define ETH_RSS_L4_CHKSUM          (1ULL << 35)
> +#define ETH_RSS_FLEX		   (1ULL << 36)

Is the indentation right?
How do you support FLEX RSS if more than one FLEX item is configured?

> 
>  /*
>   * We use the following macros to combine with above ETH_RSS_* for diff --git
> a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h index
> 7b1ed7f110..eccb1e1791 100644
> --- a/lib/ethdev/rte_flow.h
> +++ b/lib/ethdev/rte_flow.h
> @@ -574,6 +574,15 @@ enum rte_flow_item_type {
>  	 * @see struct rte_flow_item_conntrack.
>  	 */
>  	RTE_FLOW_ITEM_TYPE_CONNTRACK,
> +
> +	/**
> +	 * Matches a configured set of fields at runtime calculated offsets
> +	 * over the generic network header with variable length and
> +	 * flexible pattern
> +	 *

I think it should say that it matches on an application-configured header.

> +	 * @see struct rte_flow_item_flex.
> +	 */
> +	RTE_FLOW_ITEM_TYPE_FLEX,
>  };
> 
>  /**
> @@ -1839,6 +1848,160 @@ struct rte_flow_item {
>  	const void *mask; /**< Bit-mask applied to spec and last. */  };
> 
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this structure may change without prior notice
> + *
> + * RTE_FLOW_ITEM_TYPE_FLEX
> + *
> + * Matches a specified set of fields within the network protocol
> + * header. Each field is presented as set of bits with specified width,
> +and
> + * bit offset (this is dynamic one - can be calulated by several
> +methods
> + * in runtime) from the header beginning.
> + *
> + * The pattern is concatenation of all bit fields configured at item
> +creation
> + * by rte_flow_flex_item_create() exactly in the same order and amount,
> +no
> + * fields can be omitted or swapped. The dummy mode field can be used
> +for
> + * pattern byte boundary alignment, least significant bit in byte goes first.
> + * Only the fields specified in sample_data configuration parameter
> +participate
> + * in pattern construction.
> + *
> + * If pattern length is smaller than configured fields overall length
> +it is
> + * extended with trailing zeroes, both for value and mask.
> + *
> + * This type does not support ranges (struct rte_flow_item.last).
> + */

I think it is too complex to understand; see my comment above.

> +struct rte_flow_item_flex {
> +	struct rte_flow_item_flex_handle *handle; /**< Opaque item handle.
> */
> +	uint32_t length; /**< Pattern length in bytes. */
> +	const uint8_t *pattern; /**< Combined bitfields pattern to match. */
> +};
> +/**
> + * Field bit offset calculation mode.
> + */
> +enum rte_flow_item_flex_field_mode {
> +	/**
> +	 * Dummy field, used for byte boundary alignment in pattern.
> +	 * Pattern mask and data are ignored in the match. All configuration
> +	 * parameters besides field size are ignored.

Since in the item we just set value and mask, what will happen if
we set the mask to be different from 0 at an offset where we have such a field?

> +	 */
> +	FIELD_MODE_DUMMY = 0,
> +	/**
> +	 * Fixed offset field. The bit offset from header beginning is
> +	 * is permanent and defined by field_base parameter.
> +	 */
> +	FIELD_MODE_FIXED,
> +	/**
> +	 * The field bit offset is extracted from other header field (indirect
> +	 * offset field). The resulting field offset to match is calculated as:
> +	 *
> +	 *    field_base + (*field_offset & offset_mask) << field_shift

I can't find those names in the patch and I'm not clear on what they mean.

> +	 */
> +	FIELD_MODE_OFFSET,
> +	/**
> +	 * The field bit offset is extracted from other header field (indirect
> +	 * offset field), the latter is considered as bitmask containing some
> +	 * number of one bits, the resulting field offset to match is
> +	 * calculated as:

Just like above. 

> +	 *
> +	 *    field_base + bitcount(*field_offset & offset_mask) << field_shift
> +	 */
> +	FIELD_MODE_BITMASK,
> +};
> +
> +/**
> + * Flex item field tunnel mode
> + */
> +enum rte_flow_item_flex_tunnel_mode {
> +	FLEX_TUNNEL_MODE_FIRST = 0, /**< First item occurrence. */
> +	FLEX_TUNNEL_MODE_OUTER = 1, /**< Outer item. */
> +	FLEX_TUNNEL_MODE_INNER = 2  /**< Inner item. */ };
> +

The '}' should be on a new line.
If the item can be inner and outer, do we need to define two flex objects?
Also, why an enum and not defines?
From an API point of view I think it should have the following options:
mode_outer, mode_inner, mode_global and mode_tunnel.
Why is it per field and not per object?

> +/**
> + * @warning
> + * @b EXPERIMENTAL: this structure may change without prior notice  */
> +__extension__ struct rte_flow_item_flex_field {
> +	/** Defines how match field offset is calculated over the packet. */
> +	enum rte_flow_item_flex_field_mode field_mode;
> +	uint32_t field_size; /**< Match field size in bits. */

I think it will be better to remove the word Match.

> +	int32_t field_base; /**< Match field offset in bits. */

I think it will be better to remove the word Match.

> +	uint32_t offset_base; /**< Indirect offset field offset in bits. */

I think a better name will be offset_field /* the offset of the field that holds the offset that
should be used from the field_base */ what do you think?

Maybe just change from offset_base to offset?

> +	uint32_t offset_mask; /**< Indirect offset field bit mask. */

Maybe better wording?
The mask to apply to the value that is set in the offset_field.

> +	int32_t offset_shift; /**< Indirect offset multiply factor. */
> +	uint16_t tunnel_count:2; /**< 0-first occurrence, 1-outer, 2-inner.*/

I think this may result in a warning since you try to cast an enum to 2 bits.
Also, the same question as above: to support inner and outer, do we need
two objects?

> +	uint16_t rss_hash:1; /**< Field participates in RSS hash calculation. */

Please see my comment on the RSS, it is not clear how more than one flex item
can be created and the RSS will still work.

> +	uint16_t field_id; /**< device hint, for flows with multiple items. */

How should this be used? 
Should be capital D in device.

> +};
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this structure may change without prior notice  */
> +struct rte_flow_item_flex_link {
> +	/**
> +	 * Preceding/following header. The item type must be always
> provided.
> +	 * For preceding one item must specify the header value/mask to
> match
> +	 * for the link be taken and start the flex item header parsing.
> +	 */
> +	struct rte_flow_item item;
> +	/**
> +	 * Next field value to match to continue with one of the configured
> +	 * next protocols.
> +	 */
> +	uint32_t next;

Is this offset of the field or the value?

> +	/**
> +	 * Specifies whether flex item represents tunnel protocol
> +	 */
> +	bool tunnel;
> +};
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this structure may change without prior notice  */
> +struct rte_flow_item_flex_conf {
> +	/**
> +	 * The next header offset, it presents the network header size covered
> +	 * by the flex item and can be obtained with all supported offset
> +	 * calculating methods (fixed, dedicated field, bitmask, etc).
> +	 */
> +	struct rte_flow_item_flex_field next_header;

I think a better name will be size/len

> +	/**
> +	 * Specifies the next protocol field to match with link next protocol
> +	 * values and continue packet parsing with matching link.
> +	 */
> +	struct rte_flow_item_flex_field next_protocol;
> +	/**
> +	 * The fields will be sampled and presented for explicit match
> +	 * with pattern in the rte_flow_flex_item. There can be multiple
> +	 * fields descriptors, the number should be specified by sample_num.
> +	 */
> +	struct rte_flow_item_flex_field *sample_data;
> +	/** Number of field descriptors in the sample_data array. */
> +	uint32_t sample_num;

nb_samples?

> +	/**
> +	 * Input link defines the flex item relation with preceding
> +	 * header. It specified the preceding item type and provides pattern
> +	 * to match. The flex item will continue parsing and will provide the
> +	 * data to flow match in case if there is the match with one of input
> +	 * links.
> +	 */
> +	struct rte_flow_item_flex_link *input_link;
> +	/** Number of link descriptors in the input link array. */
> +	uint32_t input_num;
nb_inputs?
> +	/**
> +	 * Output link defines the next protocol field value to match and
> +	 * the following protocol header to continue packet parsing. Also
> +	 * defines the tunnel-related behaviour.
> +	 */
> +	struct rte_flow_item_flex_link *output_link;
> +	/** Number of link descriptors in the output link array. */
> +	uint32_t output_num;
> +};
> +
>  /**
>   * Action types.
>   *
> @@ -4288,6 +4451,71 @@ rte_flow_tunnel_item_release(uint16_t port_id,
>  			     struct rte_flow_item *items,
>  			     uint32_t num_of_items,
>  			     struct rte_flow_error *error);
> +
> +/**
> + * Create the flex item with specified configuration over
> + * the Ethernet device.
> + *
> + * @param port_id
> + *   Port identifier of Ethernet device.
> + * @param[in] conf
> + *   Item configuration.
> + * @param[out] error
> + *   Perform verbose error reporting if not NULL. PMDs initialize this
> + *   structure in case of error only.
> + *
> + * @return
> + *   Non-NULL opaque pointer on success, NULL otherwise and rte_errno is
> set.
> + */
> +__rte_experimental
> +struct rte_flow_item_flex_handle *
> +rte_flow_flex_item_create(uint16_t port_id,
> +			  const struct rte_flow_item_flex_conf *conf,
> +			  struct rte_flow_error *error);
> +
> +/**
> + * Release the flex item on the specified Ethernet device.
> + *
> + * @param port_id
> + *   Port identifier of Ethernet device.
> + * @param[in] handle
> + *   Handle of the item existing on the specified device.
> + * @param[out] error
> + *   Perform verbose error reporting if not NULL. PMDs initialize this
> + *   structure in case of error only.
> + *
> + * @return
> + *   0 on success, a negative errno value otherwise and rte_errno is set.
> + */
> +__rte_experimental
> +int
> +rte_flow_flex_item_release(uint16_t port_id,
> +			   const struct rte_flow_item_flex_handle *handle,
> +			   struct rte_flow_error *error);
> +
> +/**
> + * Modify the flex item on the specified Ethernet device.
> + *
> + * @param port_id
> + *   Port identifier of Ethernet device.
> + * @param[in] handle
> + *   Handle of the item existing on the specified device.
> + * @param[in] conf
> + *   Item new configuration.

Do you need to supply the full configuration for each update?
Maybe add a mask?

> + * @param[out] error
> + *   Perform verbose error reporting if not NULL. PMDs initialize this
> + *   structure in case of error only.
> + *
> + * @return
> + *   0 on success, a negative errno value otherwise and rte_errno is set.
> + */
> +__rte_experimental
> +int
> +rte_flow_flex_item_update(uint16_t port_id,
> +			  const struct rte_flow_item_flex_handle *handle,
> +			  const struct rte_flow_item_flex_conf *conf,
> +			  struct rte_flow_error *error);
> +
>  #ifdef __cplusplus
>  }
>  #endif
> --
> 2.18.1

Best,
Ori
  
Slava Ovsiienko Oct. 12, 2021, 6:42 a.m. UTC | #2
Hi, Ori

Thank you very much for the review, I found some of your comments extremely useful.
Please, see below.

> -----Original Message-----
> From: Ori Kam <orika@nvidia.com>
> Sent: Thursday, October 7, 2021 14:08
> To: Slava Ovsiienko <viacheslavo@nvidia.com>; dev@dpdk.org
> Cc: Raslan Darawsheh <rasland@nvidia.com>; Matan Azrad
> <matan@nvidia.com>; Shahaf Shuler <shahafs@nvidia.com>; Gregory Etelson
> <getelson@nvidia.com>; NBU-Contact-Thomas Monjalon
> <thomas@monjalon.net>
> Subject: RE: [PATCH v2 01/14] ethdev: introduce configurable flexible item
> 
> Hi Slava,
> 
> > -----Original Message-----
> > From: Slava Ovsiienko <viacheslavo@nvidia.com>
> > Sent: Friday, October 1, 2021 10:34 PM
> > Subject: [PATCH v2 01/14] ethdev: introduce configurable flexible item
> >
> > 1. Introduction and Retrospective

.. snip ..
> >
> > The length field specifies the pattern buffer length in bytes and is
> > needed to allow rte_flow_copy() operations. The approach of multiple
> > pattern pointers and lengths (per field) was considered and found
> > clumsy - it seems to be much suitable for the application to maintain
> > the single structure within the single pattern buffer.
> >
> 
> I think that the main thing that is unclear to me and I think I understand it
> from reading the code is that the pattern is the entire flex header structure.
> maybe a better word will be header?

"pattern is the entire flex header structure" - it is not completely correct, sorry
Pattern represents the set of fields of the header. Yes, usually it coincides
with the entire header (it is just simpler for understanding). But it does not have to!
The flex item can be constructed in a more generic way. It can include fields
in arbitrary order (in general, we may not follow the strict field order in the
header while defining the flex item), and a field can be split into subfields
and reordered. Theoretically it even allows many interesting things,
say byte order conversion - the item will sample byte subfields into a single
integer field in the desired host endianness.

Another possibility is to gather some split fields (say we have some offset
split into multiple locations in the header) into one. Yes, for this case the
pattern will not correspond to the header structure 1-to-1, but that is not
required. 1-to-1 pattern-to-header mapping is just the most straightforward
way to operate, but it is not the only possible one.
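
To illustrate the gathering and reordering described above, here is a minimal
sketch in plain C. It is not driver code: struct sample_desc, build_pattern()
and the bit helpers are hypothetical names invented for this example. It only
assumes what the patch comment states, namely that the pattern is the bit
concatenation of the configured fields, with the least significant bit in a
byte going first.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical descriptor: one pattern field taken from the header. */
struct sample_desc {
	uint32_t bit_offset; /* field offset from header start, in bits */
	uint32_t bit_width;  /* field width, in bits */
};

/* Read one bit; the least significant bit in a byte goes first. */
static int get_bit(const uint8_t *buf, uint32_t pos)
{
	return (buf[pos / 8] >> (pos % 8)) & 1;
}

/* Set one bit in a buffer that was zeroed beforehand. */
static void put_bit(uint8_t *buf, uint32_t pos, int bit)
{
	if (bit)
		buf[pos / 8] |= (uint8_t)(1u << (pos % 8));
}

/*
 * Concatenate the described fields, in descriptor order, into pattern.
 * The descriptors may appear in any order and may point at split parts
 * of one logical field. Returns the number of pattern bits written.
 */
static uint32_t build_pattern(const uint8_t *hdr,
			      const struct sample_desc *desc, uint32_t num,
			      uint8_t *pattern, uint32_t pattern_size)
{
	uint32_t out = 0;

	memset(pattern, 0, pattern_size);
	for (uint32_t i = 0; i < num; i++) {
		for (uint32_t b = 0; b < desc[i].bit_width; b++) {
			assert(out < pattern_size * 8);
			put_bit(pattern, out++,
				get_bit(hdr, desc[i].bit_offset + b));
		}
	}
	return out;
}
```

With descriptors {24, 8} and {0, 8}, the fourth and the first header bytes end
up next to each other in the pattern, so the pattern no longer maps 1-to-1 to
the header layout.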

> In the beginning I thought that you should only give the matchable fields.
> also you say everything is in bits and suddenly you are talking in bytes.
Yes, data are presented as bytes. But we must provide the capability
to operate with bitfields. If we introduced some byte alignment for
bitfields in the pattern we would lose the opportunity to match
with header structure (that might define bitfields) strictly. That's
why all offsets in flex config are expressed in bits, it provides the
precise control over pattern structure.

> 
> > 4. Flex Item Configuration
> >
> > The flex item configuration consists of the following parts:
> >
> >   - header field descriptors:
> >     - next header
> >     - next protocol
> >     - sample to match
> >   - input link descriptors
> >   - output link descriptors
> >
> > The field descriptors tell driver and hardware what data should be
> > extracted from the packet and then presented to match in the flows.
> > Each field is a bit pattern. It has width, offset from the header
> > beginning, mode of offset calculation, and offset related parameters.
> >
> 
> I'm not sure your indentation is correct for the next header, next protocol,
> sample to match.
> Since reading the first line means that all fields are going to be matched while
> in following sections only the sample to match are matchable.

"reading the first line means". M-m-m-m, sorry I do not follow.
The first line is "- header field descriptors". It tells what field descriptors we have.
It tells nothing about match. All indented bullets have the same structure type.
Indentation is correct, but the claim
"The field descriptors tell driver .... then presented to match in the flows"
is not. So - agree, fixed.

> 
> > The next header field is special, no data are actually taken from the
> > packet, but its offset is used as pointer to the next header in the
> > packet, in other word the next header offset specifies the size of the
> > header being parsed by flex item.
> >
> 
> So the name of the next header should be len?

We considered using the naming "len". We even started the code development
with this one. But there is some level of indirection. The header length
can be obtained with indirect methods (offset or bitmask field in the packet),
and this field descriptor provides rather some "pointer/offset to next header"
than length itself. So, naming next_header (pointer) is more precise,
in my opinion. I would prefer to keep this.

> 
> > There is one more special field - next protocol, it specifies where
> > the next protocol identifier is contained and packet data sampled from
> > this field will be used to determine the next protocol header type to
> continue packet parsing.
> > The next protocol field is like eth_type field in MAC2, or proto field
> > in IPv4/v6 headers.
> >
> > The sample fields are used to represent the data be sampled from the
> > packet and then matched with established flows.
> 
> Should this be samples?
? IIUC  - "sample" is adjective here, "fieldS" is plural.

> 
> >
> > There are several methods supposed to calculate field offset in
> > runtime depending on configuration and packet content:
> >
> >   - FIELD_MODE_FIXED - fixed offset. The bit offset from
> >     header beginning is permanent and defined by field_base
> >     configuration parameter.
> >
> >   - FIELD_MODE_OFFSET - the field bit offset is extracted
> >     from other header field (indirect offset field). The
> >     resulting field offset to match is calculated from as:
> >
> >   field_base + (*field_offset & offset_mask) << field_shift
> >
> 
> Not all of those fields names are defined later in this patch, and I'm not sure
> about what they mean.
Yes, sorry, missed this fix in commit message once code was updated.
> Does * means take the value this is in field_offset?
Yes, it means we should calculate indirect field offset, and extract field
data from the packet (like by pointer *p from memory).
Added the note about this.

> How do we know the width of the field (by the value of the mask)?
By the mask, it is a common and advanced way to specify the field.
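
A small sketch of the FIELD_MODE_OFFSET arithmetic, for illustration only.
Read as a plain C expression, the quoted formula would group as
(field_base + value) << shift, because + binds tighter than <<. The code below
assumes the shift is meant to apply to the masked value only; that reading is
an assumption, as are the function name indirect_offset() and the choice of
an unsigned shift argument. The raw argument stands for the value already
read from the packet at the indirect offset field (the "*" in the formula).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Offset for the indirect-offset mode:
 *   field_base + ((raw & offset_mask) << offset_shift)
 * The result is in whatever unit the configuration uses.
 */
static int64_t indirect_offset(int32_t field_base, uint32_t raw,
			       uint32_t offset_mask, uint32_t offset_shift)
{
	assert(offset_shift < 32);
	return (int64_t)field_base +
	       ((int64_t)(raw & offset_mask) << offset_shift);
}
```

For the IPv4 case mentioned in the commit message, the first header byte 0x45
masked with 0x0F gives 5 dwords, and a shift of 2 turns that into 20 bytes.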

> 
> >     This mode is useful to sample some extra options following
> >     the main header with field containing main header length.
> >     Also, this mode can be used to calculate offset to the
> >     next protocol header, for example - IPv4 header contains
> >     the 4-bit field with IPv4 header length expressed in dwords.
> >     One more example - this mode would allow us to skip GENEVE
> >     header variable length options.
> >
> >   - FIELD_MODE_BITMASK - the field bit offset is extracted
> >     from other header field (indirect offset field), the latter
> >     is considered as bitmask containing some number of one bits,
> >     the resulting field offset to match is calculated as:
> >
> >   field_base + bitcount(*field_offset & offset_mask) << field_shift
> 
> Same comment as above you are using name that are not defined later.
Yes, fixed.
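
The FIELD_MODE_BITMASK variant can be sketched the same way. Again this is
only an illustration: bitcount32() and bitmask_offset() are hypothetical
helpers, and the shift is assumed to apply to the bit count only, not to
field_base.

```c
#include <assert.h>
#include <stdint.h>

/* Count the one bits in v (clears the lowest set bit each iteration). */
static uint32_t bitcount32(uint32_t v)
{
	uint32_t n = 0;

	while (v) {
		v &= v - 1;
		n++;
	}
	return n;
}

/*
 * Offset for the bitmask mode:
 *   field_base + (bitcount(raw & offset_mask) << offset_shift)
 * raw is the value read from the packet at the indirect offset field.
 */
static int64_t bitmask_offset(int32_t field_base, uint32_t raw,
			      uint32_t offset_mask, uint32_t offset_shift)
{
	assert(offset_shift < 32);
	return (int64_t)field_base +
	       ((int64_t)bitcount32(raw & offset_mask) << offset_shift);
}
```

Each flag bit that is set under the mask adds one unit of (1 << offset_shift)
to the base, which is how a header with flag-selected optional parts could be
skipped.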

> 
> >
> >     This mode would be useful to skip the GTP header and its
> >     extra options with specified flags.
> >
> >   - FIELD_MODE_DUMMY - dummy field, optionally used for byte
> >     boundary alignment in pattern. Pattern mask and data are
> >     ignored in the match. All configuration parameters besides
> >     field size and offset are ignored.
> >
> > The offset mode list can be extended by vendors according to hardware
> > supported options.
> >
> > The input link configuration section tells the driver after what
> > protocols and at what conditions the flex item can follow.
> > Input link specified the preceding header pattern, for example for
> > GENEVE it can be UDP item specifying match on destination port with
> > value 6081. The flex item can follow multiple header types and
> > multiple input links should be specified. At flow creation type the
> > item with one of input link types should precede the flex item and
> > driver will select the correct flex item settings, depending on actual flow
> pattern.
> >
> > The output link configuration section tells the driver how to continue
> > packet parsing after the flex item protocol.
> > If multiple protocols can follow the flex item header the flex item
> > should contain the field with next protocol identifier, and the
> > parsing will be continued depending on the data contained in this field in
> the actual packet.
> >
> > The flex item fields can participate in RSS hash calculation, the
> > dedicated flag is present in field description to specify what fields
> > should be provided for hashing.
> >
> > 5. Flex Item Chaining
> >
> > If there are multiple protocols supposed to be supported with flex
> > items in chained fashion - two or more flex items within the same flow
> > and these ones might be neighbors in pattern - it means the flex items are
> mutual referencing.
> > In this case, the item that occurred first should be created with
> > empty output link list or with the list including existing items, and
> > then the second flex item should be created referencing the first flex item as
> input arc.
> >
> 
> And then I assume we should update the output list.

It is supposed to be done by the driver on creation of the second item.
Currently the update API is not supported (it depends on FW, and there
are no plans to support object modification), so - no support - we should
not include the code, and rte_flow_flex_item_update() will be missing,
at least in this release.

And, currently there is no code supporting chaining, it is just
an attempt to consider the potential scenario of flex item chaining and
to get as complete API as we can.

> 
> > Also, the hardware resources used by flex items to handle the packet
> > can be limited. If there are multiple flex items that are supposed to
> > be used within the same flow it would be nice to provide some hint for
> > the driver that these two or more flex items are intended for simultaneous
> usage.
> > The fields of items should be assigned with hint indices and these
> > indices from two or more flex items should not overlap (be unique per
> > field). For this case, the driver will try to engage not overlapping
> > hardware resources and provide independent handling of the fields with
> > unique indices. If the hint index is zero the driver assigns resources on its
> own.
> >
> > 6. Example of New Protocol Handling
> >
> > Let's suppose we have the requirements to handle the new tunnel
> > protocol that follows UDP header with destination port 0xFADE and is
> > followed by MAC header. Let the new protocol header format be like this:
> >
> >   struct new_protocol_header {
> >     rte_be32 header_length; /* length in dwords, including options */
> >     rte_be32 specific0;     /* some protocol data, no intention */
> >     rte_be32 specific1;     /* to match in flows on these fields */
> >     rte_be32 crucial;       /* data of interest, match is needed */
> >     rte_be32 options[0];    /* optional protocol data, variable length */
> >   };
> >
> > The supposed flex item configuration:
> >
> >   struct rte_flow_item_flex_field field0 = {
> >     .field_mode = FIELD_MODE_DUMMY,  /* Affects match pattern only */
> >     .field_size = 96,                /* three dwords from the beginning */
> >   };
> >   struct rte_flow_item_flex_field field1 = {
> >     .field_mode = FIELD_MODE_FIXED,
> >     .field_size = 32,       /* Field size is one dword */
> >     .field_base = 96,       /* Skip three dwords from the beginning */
> >   };
> >   struct rte_flow_item_udp spec0 = {
> >     .hdr = {
> >       .dst_port = RTE_BE16(0xFADE),
> >     }
> >   };
> >   struct rte_flow_item_udp mask0 = {
> >     .hdr = {
> >       .dst_port = RTE_BE16(0xFFFF),
> >     }
> >   };
> >   struct rte_flow_item_flex_link link0 = {
> >     .item = {
> >        .type = RTE_FLOW_ITEM_TYPE_UDP,
> >        .spec = &spec0,
> >        .mask = &mask0,
> >   };
> >
> >   struct rte_flow_item_flex_conf conf = {
> >     .next_header = {
> >       .field_mode = FIELD_MODE_OFFSET,
> >       .field_base = 0,
> >       .offset_base = 0,
> >       .offset_mask = 0xFFFFFFFF,
> >       .offset_shift = 2	   /* Expressed in dwords, shift left by 2 */
> >     },
> >     .sample = {
> >        &field0,
> >        &field1,
> >     },
> 
> Why in sample you give both fields?
> by your decision we just want to match on field1.

Field0 is a placeholder, it covers the gap in the pattern and
makes sure the pattern has exactly the same format as the
protocol header. As an option (by application design choice)
we can omit field0 and, for this case, we'll get a compact
pattern structure, but it won't be the exact protocol header
structure.
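
The two layouts can be compared with a short sketch. It assumes only that the
pattern is the concatenation of the configured sample fields in order;
pattern_bit_offset() is a hypothetical helper, and the structure repeats the
fixed part of the example header with plain uint32_t members.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fixed part of the example header (variable options omitted). */
struct new_protocol_header {
	uint32_t header_length;
	uint32_t specific0;
	uint32_t specific1;
	uint32_t crucial;
};

/*
 * Bit offset of sample field number index inside the pattern: the sum
 * of the sizes (in bits) of all sample fields configured before it.
 */
static uint32_t pattern_bit_offset(const uint32_t *field_size, uint32_t index)
{
	uint32_t off = 0;

	assert(field_size != NULL);
	for (uint32_t i = 0; i < index; i++)
		off += field_size[i];
	return off;
}
```

With sample sizes {96, 32} (dummy plus crucial) the crucial field lands at
pattern bit 96, the same place it has in the header structure. With the
dummy omitted the only sample has size 32 and sits at pattern bit 0, so the
header structure can no longer be used as the pattern template.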
> 
> >     .sample_num = 2,
> >     .input_link[0] = &link0,
> >     .input_num = 1
> >   };
> >
> > Let's suppose we have created the flex item successfully, and PMD
> > returned the handle 0x123456789A. We can use the following item
> > pattern to match the crucial field in the packet with value 0x00112233:
> >
> >   struct new_protocol_header spec_pattern =
> >   {
> >     .crucial = RTE_BE32(0x00112233),
> >   };
> >   struct new_protocol_header mask_pattern =
> >   {
> >     .crucial = RTE_BE32(0xFFFFFFFF),
> >   };
> >   struct rte_flow_item_flex spec_flex = {
> >     .handle = 0x123456789A
> >     .length = sizeiof(struct new_protocol_header),
> >     .pattern = &spec_pattern,
> >   };
> >   struct rte_flow_item_flex mask_flex = {
> >     .length = sizeof(struct new_protocol_header),
> >     .pattern = &mask_pattern,
> >   };
> >   struct rte_flow_item item_to_match = {
> >     .type = RTE_FLOW_ITEM_TYPE_FLEX,
> >     .spec = &spec_flex,
> >     .mask = &mask_flex,
> >   };
> >
> > Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
> > ---
> >  doc/guides/prog_guide/rte_flow.rst     |  24 +++
> >  doc/guides/rel_notes/release_21_11.rst |   7 +
> >  lib/ethdev/rte_ethdev.h                |   1 +
> >  lib/ethdev/rte_flow.h                  | 228 +++++++++++++++++++++++++
> >  4 files changed, 260 insertions(+)
> >
> > diff --git a/doc/guides/prog_guide/rte_flow.rst
> > b/doc/guides/prog_guide/rte_flow.rst
> > index 2b42d5ec8c..628f30cea7 100644
> > --- a/doc/guides/prog_guide/rte_flow.rst
> > +++ b/doc/guides/prog_guide/rte_flow.rst
> > @@ -1425,6 +1425,30 @@ Matches a conntrack state after conntrack
> action.
> >  - ``flags``: conntrack packet state flags.
> >  - Default ``mask`` matches all state bits.
> >
> > +Item: ``FLEX``
> > +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > +
> > +Matches with the network protocol header of preliminary configured
> format.
> > +The application describes the desired header structure, defines the
> > +header fields attributes and header relations with preceding and
> > +following protocols and configures the ethernet devices accordingly
> > +via
> > +rte_flow_flex_item_create() routine.
> 
> How about: matches a custom header that was created using
> rte_flow_flex_item_create
Np, fixed.

> 
> > +
> > +- ``handle``: the flex item handle returned by the PMD on successful
> > +  rte_flow_flex_item_create() call. The item handle is unique within
> > +  the device port, mask for this field is ignored.
> 
> I think you can remove that it is unique handle.
> 
> > +- ``length``: match pattern length in bytes. If the length does not
> > +cover
> > +  all fields defined in item configuration, the pattern spec and mask
> > +are
> > +  supposed to be appended with zeroes till the full configured item length.
> 
> It looks bugy saying that you can give any length but expect the application to
> supply the full length.
Yes. Application can configure 128B protocol header with rte_flow_flex_item_create(). The "full configured length" is 128B.
The application can then provide only 4 bytes of pattern in the flow. The driver
should consider the pattern as the 4 bytes provided followed by 124
zero bytes.
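
A sketch of that zero extension, assuming it is applied the same way to both
spec and mask. extend_pattern() is a hypothetical helper written for this
example, not part of the proposed API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy length bytes of the supplied pattern into a buffer of the full
 * configured length and fill the remainder with zeroes.
 */
static void extend_pattern(const uint8_t *pattern, uint32_t length,
			   uint8_t *full, uint32_t full_length)
{
	assert(length <= full_length);
	memset(full, 0, full_length);
	if (length > 0)
		memcpy(full, pattern, length);
}
```

Since the mask is extended with zeroes as well, the bytes beyond the supplied
length do not participate in the match.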

> 
> > +- ``pattern``: pattern to match. The protocol header fields are
> > +considered
> > +  as bit fields, all offsets and widths are expressed in bits. The
> > +pattern
> > +  is the buffer containing the bit concatenation of all the fields
> > +presented
> > +  at item configuration time, in the same order and same amount. The
> > +most
> > +  regular way is to define all the header fields in the flex item
> > +configuration
> > +  and directly use the header structure as pattern template, i.e.
> > +application
> > +  just can fill the header structures with desired match values and
> > +masks and
> > +  specify these structures as flex item pattern directly.
> > +
> 
> It hard to understand this comment and what the application should set.
> I suggest to take the basic approach and just explain it. ( I think those are the
> last few lines)
The last few lines describe just one supposed option.
Generally speaking, there are TWO structures - protocol header and match pattern. The easiest way ("the most regular way") to use the flex item is to make these TWO structures coincide. But this is not the only way. Fields in the pattern
can reference the same protocol header fields multiple times, in arbitrary
number and combinations. The application can gather split fields together, do byte order conversion, etc.


> 
> >  Actions
> >  ~~~~~~~
> >
> > diff --git a/doc/guides/rel_notes/release_21_11.rst
> > b/doc/guides/rel_notes/release_21_11.rst
> > index 73e377a007..170797f9e9 100644
> > --- a/doc/guides/rel_notes/release_21_11.rst
> > +++ b/doc/guides/rel_notes/release_21_11.rst
> > @@ -55,6 +55,13 @@ New Features
> >       Also, make sure to start the actual text at the margin.
> >       =======================================================
> >
> > +* **Introduced RTE Flow Flex Item.**
> > +
> > +  * The configurable RTE Flow Flex Item provides the capability to introdude
> > +    the arbitrary user specified network protocol header, configure the
> device
> > +    hardware accordingly, and perform match on this header with
> > + desired
> > patterns
> > +    and masks.
> > +
> >  * **Enabled new devargs parser.**
> >
> >    * Enabled devargs syntax
> > diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h index
> > afdc53b674..e9ad7673e9 100644
> > --- a/lib/ethdev/rte_ethdev.h
> > +++ b/lib/ethdev/rte_ethdev.h
> > @@ -558,6 +558,7 @@ struct rte_eth_rss_conf {
> >   * it takes the reserved value 0 as input for the hash function.
> >   */
> >  #define ETH_RSS_L4_CHKSUM          (1ULL << 35)
> > +#define ETH_RSS_FLEX		   (1ULL << 36)
> 
> Is the indentation right?
> How do you support FLEX RSS if more then on FLEX item is configured?
> 
As we found, we missed some options in the RSS-related API (we have to invent
a way to tell the drivers about flex item fields while creating the indirect RSS action), and there are no PMDs supporting RSS over flex fields yet, so we
can omit the RSS-related stuff in this release.

> >
> >  /*
> >   * We use the following macros to combine with above ETH_RSS_* for
> > diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h index
> > 7b1ed7f110..eccb1e1791 100644
> > --- a/lib/ethdev/rte_flow.h
> > +++ b/lib/ethdev/rte_flow.h
> > @@ -574,6 +574,15 @@ enum rte_flow_item_type {
> >  	 * @see struct rte_flow_item_conntrack.
> >  	 */
> >  	RTE_FLOW_ITEM_TYPE_CONNTRACK,
> > +
> > +	/**
> > +	 * Matches a configured set of fields at runtime calculated offsets
> > +	 * over the generic network header with variable length and
> > +	 * flexible pattern
> > +	 *
> 
> I think it should say matches on application configured header.
No, no. Matches on pattern. That may be configured with the same format
as the protocol header has. Or maybe not (a more complicated way to operate, but it could provide some optimizations).
> 
> > +	 * @see struct rte_flow_item_flex.
> > +	 */
> > +	RTE_FLOW_ITEM_TYPE_FLEX,
> >  };
> >
> >  /**
> > @@ -1839,6 +1848,160 @@ struct rte_flow_item {
> >  	const void *mask; /**< Bit-mask applied to spec and last. */  };
> >
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this structure may change without prior notice
> > + *
> > + * RTE_FLOW_ITEM_TYPE_FLEX
> > + *
> > + * Matches a specified set of fields within the network protocol
> > + * header. Each field is presented as set of bits with specified width, and
> > + * bit offset (this is dynamic one - can be calulated by several methods
> > + * in runtime) from the header beginning.
> > + *
> > + * The pattern is concatenation of all bit fields configured at item creation
> > + * by rte_flow_flex_item_create() exactly in the same order and amount, no
> > + * fields can be omitted or swapped. The dummy mode field can be used for
> > + * pattern byte boundary alignment, least significant bit in byte goes first.
> > + * Only the fields specified in sample_data configuration parameter participate
> > + * in pattern construction.
> > + *
> > + * If pattern length is smaller than configured fields overall length it is
> > + * extended with trailing zeroes, both for value and mask.
> > + *
> > + * This type does not support ranges (struct rte_flow_item.last).
> > + */
> 
> I think it is to complex to understand see my comment above.
> 
> > +struct rte_flow_item_flex {
> > +	struct rte_flow_item_flex_handle *handle; /**< Opaque item handle. */
> > +	uint32_t length; /**< Pattern length in bytes. */
> > +	const uint8_t *pattern; /**< Combined bitfields pattern to match. */
> > +};
> > +/**
> > + * Field bit offset calculation mode.
> > + */
> > +enum rte_flow_item_flex_field_mode {
> > +	/**
> > +	 * Dummy field, used for byte boundary alignment in pattern.
> > +	 * Pattern mask and data are ignored in the match. All configuration
> > +	 * parameters besides field size are ignored.
> 
> Since in the item we just set value and mask what will happen if we set mask
> to be different then 0 in an offset that we have such a field?
Nothing. The mask and value for bits covered by DUMMY fields are simply
ignored. Any values and masks can be there; they will not be translated to
the actual flow matcher. DUMMY is just a placeholder that aligns the
substantial fields with the actual protocol header structure. DUMMY usage is
optional; we need these fields only to build a pattern structure that
coincides with the protocol header without covering the entire header with
actual sampling fields (to save HW resources).
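
To illustrate the DUMMY behaviour described above, here is a minimal sketch.
The types and the function name are hypothetical simplifications, not the
patch's API, and the bit numbering (bit 0 is the least significant bit of
byte 0) is an assumption based on the "least significant bit in byte goes
first" wording. It only shows that whatever the application puts in the mask
over a DUMMY field would not reach the matcher:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the field descriptors in the patch. */
enum field_mode { MODE_DUMMY = 0, MODE_FIXED };

struct field_desc {
	enum field_mode mode;
	uint32_t size; /* field width in bits */
};

/*
 * Clear the mask bits covered by DUMMY fields. Fields are assumed to be
 * laid out back to back in the pattern, in configuration order.
 * Bits beyond mask_len are treated as already zero (trailing zeroes).
 */
static void
drop_dummy_bits(const struct field_desc *fields, size_t nb_fields,
		uint8_t *mask, size_t mask_len)
{
	size_t bit = 0; /* running bit position inside the pattern */

	assert(fields != NULL && mask != NULL);
	for (size_t i = 0; i < nb_fields; i++) {
		for (uint32_t b = 0; b < fields[i].size; b++, bit++) {
			if (fields[i].mode != MODE_DUMMY)
				continue;
			if (bit / 8 < mask_len)
				mask[bit / 8] &= (uint8_t)~(1u << (bit % 8));
		}
	}
}
```

With a 4-bit sampled field, a 4-bit DUMMY field and an 8-bit sampled field,
an all-ones two-byte mask would effectively become 0x0f, 0xff.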

> 
> > +	 */
> > +	FIELD_MODE_DUMMY = 0,
> > +	/**
> > +	 * Fixed offset field. The bit offset from header beginning is
> > +	 * is permanent and defined by field_base parameter.
> > +	 */
> > +	FIELD_MODE_FIXED,
> > +	/**
> > +	 * The field bit offset is extracted from other header field (indirect
> > +	 * offset field). The resulting field offset to match is calculated as:
> > +	 *
> > +	 *    field_base + (*field_offset & offset_mask) << field_shift
> 
> I can't find those name in the patch and I'm not clear on what they mean.
Yes, it is a typo, fixed, thank you.

> 
> > +	 */
> > +	FIELD_MODE_OFFSET,
> > +	/**
> > +	 * The field bit offset is extracted from other header field (indirect
> > +	 * offset field), the latter is considered as bitmask containing some
> > +	 * number of one bits, the resulting field offset to match is
> > +	 * calculated as:
> 
> Just like above.
> 
> > +	 *
> > +	 *    field_base + bitcount(*field_offset & offset_mask) << field_shift
> > +	 */
> > +	FIELD_MODE_BITMASK,
> > +};
> > +
> > +/**
> > + * Flex item field tunnel mode
> > + */
> > +enum rte_flow_item_flex_tunnel_mode {
> > +	FLEX_TUNNEL_MODE_FIRST = 0, /**< First item occurrence. */
> > +	FLEX_TUNNEL_MODE_OUTER = 1, /**< Outer item. */
> > +	FLEX_TUNNEL_MODE_INNER = 2  /**< Inner item. */ };
> > +
> 
> The '}' should be at a new line.
> If the item can be inner and outer do we need to define two flex objects?
> Also why enum and not defines?
I just looked at rte_flow.h and saw that #defines are not so common there;
they are used only for bit flags. Sets of values are mostly enums. Also, we
have updated the tunnel settings, so this enum is not needed anymore.

> From API point of view I think it should hav the following options:
> Mode_outer , mode_inner, mode_global and mode_tunnel, Why is per field
> and not per object.
Yes, agreed, thank you very much for discovering this architecture gap, updated.

> 
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this structure may change without prior notice
> > +*/ __extension__ struct rte_flow_item_flex_field {
> > +	/** Defines how match field offset is calculated over the packet. */
> > +	enum rte_flow_item_flex_field_mode field_mode;
> > +	uint32_t field_size; /**< Match field size in bits. */
> 
> I think it will be better to remove the word Match.
Yes, right, fixed
> 
> > +	int32_t field_base; /**< Match field offset in bits. */
> 
> I think it will be better to remove the word Match.
Yes, right, fixed

> 
> > +	uint32_t offset_base; /**< Indirect offset field offset in bits. */
> 
> I think a better name will be offset_field /* the offset of the field that holds
> the offset that should be used from the field_base */ what do you think?
It is just one term (the first one in the sum giving the resulting offset), so xxxx_base looks OK.
> 
> Maybe just change from offset_base to offset?
> 
> > +	uint32_t offset_mask; /**< Indirect offset field bit mask. */
> 
> Maybe better wording?
> The mask to apply to the value that is set in the offset_field.

We have an entity: the offset field.
The entity has 3 attributes: base, mask, shift.
So, the naming scheme is "entity-name_attribute-name":
offset_base  - "base" attribute of "offset field"
offset_mask - "mask" attribute of "offset field"
offset_shift - "shift" attribute of "offset field"
field_base - "base" attribute of "field" entity

The word "field" is omitted from the "offset_field" name to make the name
shorter and to avoid confusion with the pure "field" entity.

> 
> > +	int32_t offset_shift; /**< Indirect offset multiply factor. */
> > +	uint16_t tunnel_count:2; /**< 0-first occurrence, 1-outer, 2-inner.*/
> 
> I think this may result in some warning since you try to cast enum to 2 bits.
> Also the same question from above to support inner and outer do we need
> two objects?
We refactored the tunneling attributes.

> 
> > +	uint16_t rss_hash:1; /**< Field participates in RSS hash calculation. */
> 
> Please see my comment on the RSS, it is not clear how more then one flex
> item can be created and the rss will work.
Yes, you are right, we must update the RSS API as well. For now we have no
drivers supporting RSS over flex item fields. But we considered the
opportunity, and now (as a result of the review) we have a better
understanding of what we should develop to provide RSS over flex.

> 
> > +	uint16_t field_id; /**< device hint, for flows with multiple items.
> > +*/
> 
> How should this be used?
> Should be capital D in device.
Updated the documentation.
> 
> > +};
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this structure may change without prior notice
> > +*/ struct rte_flow_item_flex_link {
> > +	/**
> > +	 * Preceding/following header. The item type must be always
> > provided.
> > +	 * For preceding one item must specify the header value/mask to
> > match
> > +	 * for the link be taken and start the flex item header parsing.
> > +	 */
> > +	struct rte_flow_item item;
> > +	/**
> > +	 * Next field value to match to continue with one of the configured
> > +	 * next protocols.
> > +	 */
> > +	uint32_t next;
> 
> Is this offset of the field or the value?
" Next field VALUE"
It is the value, like 0x0800 in eth_type to specify IPv4 as the next protocol,
or 17 in IPv4.proto to specify that UDP follows.
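
A minimal sketch of how the "next" value could select the following header.
The enum, struct and function names here are hypothetical and much simpler
than struct rte_flow_item_flex_link; the sketch only shows a lookup of the
next-protocol field value among the configured links:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical identifiers for the headers that may follow. */
enum next_item { NEXT_NONE = 0, NEXT_IPV4, NEXT_IPV6, NEXT_UDP };

/* Simplified link: which header follows for a given next-protocol value. */
struct out_link {
	enum next_item item; /* following header */
	uint32_t next;       /* next-protocol field VALUE selecting it */
};

/*
 * Return the header selected by the next-protocol value found in the
 * packet, or NEXT_NONE when no configured link matches.
 */
static enum next_item
select_next(const struct out_link *links, size_t nb_links, uint32_t proto)
{
	assert(links != NULL || nb_links == 0);
	for (size_t i = 0; i < nb_links; i++) {
		if (links[i].next == proto)
			return links[i].item;
	}
	return NEXT_NONE;
}
```

With links for 0x0800 (IPv4) and 0x86DD (IPv6), a packet carrying 0x0800 in
the next-protocol field would continue parsing with the IPv4 link, and an
unknown value would match no link.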

> 
> > +	/**
> > +	 * Specifies whether flex item represents tunnel protocol
> > +	 */
> > +	bool tunnel;
> > +};
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this structure may change without prior notice
> > +*/ struct rte_flow_item_flex_conf {
> > +	/**
> > +	 * The next header offset, it presents the network header size covered
> > +	 * by the flex item and can be obtained with all supported offset
> > +	 * calculating methods (fixed, dedicated field, bitmask, etc).
> > +	 */
> > +	struct rte_flow_item_flex_field next_header;
> 
> I think a better name will be size/len
Replied above about level of indirection.

> 
> > +	/**
> > +	 * Specifies the next protocol field to match with link next protocol
> > +	 * values and continue packet parsing with matching link.
> > +	 */
> > +	struct rte_flow_item_flex_field next_protocol;
> > +	/**
> > +	 * The fields will be sampled and presented for explicit match
> > +	 * with pattern in the rte_flow_flex_item. There can be multiple
> > +	 * fields descriptors, the number should be specified by sample_num.
> > +	 */
> > +	struct rte_flow_item_flex_field *sample_data;
> > +	/** Number of field descriptors in the sample_data array. */
> > +	uint32_t sample_num;
> 
> nb_samples?
> 
> > +	/**
> > +	 * Input link defines the flex item relation with preceding
> > +	 * header. It specified the preceding item type and provides pattern
> > +	 * to match. The flex item will continue parsing and will provide the
> > +	 * data to flow match in case if there is the match with one of input
> > +	 * links.
> > +	 */
> > +	struct rte_flow_item_flex_link *input_link;
> > +	/** Number of link descriptors in the input link array. */
> > +	uint32_t input_num;
> Nb_inputs
OK, let's rename.
.. snip ..
> > +
> > +/**
> > + * Modify the flex item on the specified Ethernet device.
> > + *
> > + * @param port_id
> > + *   Port identifier of Ethernet device.
> > + * @param[in] handle
> > + *   Handle of the item existing on the specified device.
> > + * @param[in] conf
> > + *   Item new configuration.
> 
> Do you to supply full configuration for each update?
> Maybe add a mask?
Currently no drivers support update, so let's remove the update routine.

With best regards,
Slava
  

Patch

diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
index 2b42d5ec8c..628f30cea7 100644
--- a/doc/guides/prog_guide/rte_flow.rst
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -1425,6 +1425,30 @@  Matches a conntrack state after conntrack action.
 - ``flags``: conntrack packet state flags.
 - Default ``mask`` matches all state bits.
 
+Item: ``FLEX``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Matches with the network protocol header of preliminary configured format.
+The application describes the desired header structure, defines the header
+fields attributes and header relations with preceding and following
+protocols and configures the ethernet devices accordingly via
+rte_flow_flex_item_create() routine.
+
+- ``handle``: the flex item handle returned by the PMD on successful
+  rte_flow_flex_item_create() call. The item handle is unique within
+  the device port, mask for this field is ignored.
+- ``length``: match pattern length in bytes. If the length does not cover
+  all fields defined in item configuration, the pattern spec and mask are
+  supposed to be appended with zeroes till the full configured item length.
+- ``pattern``: pattern to match. The protocol header fields are considered
+  as bit fields, all offsets and widths are expressed in bits. The pattern
+  is the buffer containing the bit concatenation of all the fields presented
+  at item configuration time, in the same order and same amount. The most
+  regular way is to define all the header fields in the flex item configuration
+  and directly use the header structure as pattern template, i.e. application
+  just can fill the header structures with desired match values and masks and
+  specify these structures as flex item pattern directly.
+
 Actions
 ~~~~~~~
 
diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst
index 73e377a007..170797f9e9 100644
--- a/doc/guides/rel_notes/release_21_11.rst
+++ b/doc/guides/rel_notes/release_21_11.rst
@@ -55,6 +55,13 @@  New Features
      Also, make sure to start the actual text at the margin.
      =======================================================
 
+* **Introduced RTE Flow Flex Item.**
+
+  * The configurable RTE Flow Flex Item provides the capability to introdude
+    the arbitrary user specified network protocol header, configure the device
+    hardware accordingly, and perform match on this header with desired patterns
+    and masks.
+
 * **Enabled new devargs parser.**
 
   * Enabled devargs syntax
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index afdc53b674..e9ad7673e9 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -558,6 +558,7 @@  struct rte_eth_rss_conf {
  * it takes the reserved value 0 as input for the hash function.
  */
 #define ETH_RSS_L4_CHKSUM          (1ULL << 35)
+#define ETH_RSS_FLEX		   (1ULL << 36)
 
 /*
  * We use the following macros to combine with above ETH_RSS_* for
diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h
index 7b1ed7f110..eccb1e1791 100644
--- a/lib/ethdev/rte_flow.h
+++ b/lib/ethdev/rte_flow.h
@@ -574,6 +574,15 @@  enum rte_flow_item_type {
 	 * @see struct rte_flow_item_conntrack.
 	 */
 	RTE_FLOW_ITEM_TYPE_CONNTRACK,
+
+	/**
+	 * Matches a configured set of fields at runtime calculated offsets
+	 * over the generic network header with variable length and
+	 * flexible pattern
+	 *
+	 * @see struct rte_flow_item_flex.
+	 */
+	RTE_FLOW_ITEM_TYPE_FLEX,
 };
 
 /**
@@ -1839,6 +1848,160 @@  struct rte_flow_item {
 	const void *mask; /**< Bit-mask applied to spec and last. */
 };
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this structure may change without prior notice
+ *
+ * RTE_FLOW_ITEM_TYPE_FLEX
+ *
+ * Matches a specified set of fields within the network protocol
+ * header. Each field is presented as set of bits with specified width, and
+ * bit offset (this is dynamic one - can be calulated by several methods
+ * in runtime) from the header beginning.
+ *
+ * The pattern is concatenation of all bit fields configured at item creation
+ * by rte_flow_flex_item_create() exactly in the same order and amount, no
+ * fields can be omitted or swapped. The dummy mode field can be used for
+ * pattern byte boundary alignment, least significant bit in byte goes first.
+ * Only the fields specified in sample_data configuration parameter participate
+ * in pattern construction.
+ *
+ * If pattern length is smaller than configured fields overall length it is
+ * extended with trailing zeroes, both for value and mask.
+ *
+ * This type does not support ranges (struct rte_flow_item.last).
+ */
+struct rte_flow_item_flex {
+	struct rte_flow_item_flex_handle *handle; /**< Opaque item handle. */
+	uint32_t length; /**< Pattern length in bytes. */
+	const uint8_t *pattern; /**< Combined bitfields pattern to match. */
+};
+/**
+ * Field bit offset calculation mode.
+ */
+enum rte_flow_item_flex_field_mode {
+	/**
+	 * Dummy field, used for byte boundary alignment in pattern.
+	 * Pattern mask and data are ignored in the match. All configuration
+	 * parameters besides field size are ignored.
+	 */
+	FIELD_MODE_DUMMY = 0,
+	/**
+	 * Fixed offset field. The bit offset from header beginning is
+	 * is permanent and defined by field_base parameter.
+	 */
+	FIELD_MODE_FIXED,
+	/**
+	 * The field bit offset is extracted from other header field (indirect
+	 * offset field). The resulting field offset to match is calculated as:
+	 *
+	 *    field_base + (*field_offset & offset_mask) << field_shift
+	 */
+	FIELD_MODE_OFFSET,
+	/**
+	 * The field bit offset is extracted from other header field (indirect
+	 * offset field), the latter is considered as bitmask containing some
+	 * number of one bits, the resulting field offset to match is
+	 * calculated as:
+	 *
+	 *    field_base + bitcount(*field_offset & offset_mask) << field_shift
+	 */
+	FIELD_MODE_BITMASK,
+};
+
+/**
+ * Flex item field tunnel mode
+ */
+enum rte_flow_item_flex_tunnel_mode {
+	FLEX_TUNNEL_MODE_FIRST = 0, /**< First item occurrence. */
+	FLEX_TUNNEL_MODE_OUTER = 1, /**< Outer item. */
+	FLEX_TUNNEL_MODE_INNER = 2  /**< Inner item. */
+};
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this structure may change without prior notice
+ */
+__extension__
+struct rte_flow_item_flex_field {
+	/** Defines how match field offset is calculated over the packet. */
+	enum rte_flow_item_flex_field_mode field_mode;
+	uint32_t field_size; /**< Match field size in bits. */
+	int32_t field_base; /**< Match field offset in bits. */
+	uint32_t offset_base; /**< Indirect offset field offset in bits. */
+	uint32_t offset_mask; /**< Indirect offset field bit mask. */
+	int32_t offset_shift; /**< Indirect offset multiply factor. */
+	uint16_t tunnel_count:2; /**< 0-first occurrence, 1-outer, 2-inner.*/
+	uint16_t rss_hash:1; /**< Field participates in RSS hash calculation. */
+	uint16_t field_id; /**< device hint, for flows with multiple items. */
+};
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this structure may change without prior notice
+ */
+struct rte_flow_item_flex_link {
+	/**
+	 * Preceding/following header. The item type must be always provided.
+	 * For preceding one item must specify the header value/mask to match
+	 * for the link be taken and start the flex item header parsing.
+	 */
+	struct rte_flow_item item;
+	/**
+	 * Next field value to match to continue with one of the configured
+	 * next protocols.
+	 */
+	uint32_t next;
+	/**
+	 * Specifies whether flex item represents tunnel protocol
+	 */
+	bool tunnel;
+};
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this structure may change without prior notice
+ */
+struct rte_flow_item_flex_conf {
+	/**
+	 * The next header offset, it presents the network header size covered
+	 * by the flex item and can be obtained with all supported offset
+	 * calculating methods (fixed, dedicated field, bitmask, etc).
+	 */
+	struct rte_flow_item_flex_field next_header;
+	/**
+	 * Specifies the next protocol field to match with link next protocol
+	 * values and continue packet parsing with matching link.
+	 */
+	struct rte_flow_item_flex_field next_protocol;
+	/**
+	 * The fields will be sampled and presented for explicit match
+	 * with pattern in the rte_flow_flex_item. There can be multiple
+	 * fields descriptors, the number should be specified by sample_num.
+	 */
+	struct rte_flow_item_flex_field *sample_data;
+	/** Number of field descriptors in the sample_data array. */
+	uint32_t sample_num;
+	/**
+	 * Input link defines the flex item relation with preceding
+	 * header. It specified the preceding item type and provides pattern
+	 * to match. The flex item will continue parsing and will provide the
+	 * data to flow match in case if there is the match with one of input
+	 * links.
+	 */
+	struct rte_flow_item_flex_link *input_link;
+	/** Number of link descriptors in the input link array. */
+	uint32_t input_num;
+	/**
+	 * Output link defines the next protocol field value to match and
+	 * the following protocol header to continue packet parsing. Also
+	 * defines the tunnel-related behaviour.
+	 */
+	struct rte_flow_item_flex_link *output_link;
+	/** Number of link descriptors in the output link array. */
+	uint32_t output_num;
+};
+
 /**
  * Action types.
  *
@@ -4288,6 +4451,71 @@  rte_flow_tunnel_item_release(uint16_t port_id,
 			     struct rte_flow_item *items,
 			     uint32_t num_of_items,
 			     struct rte_flow_error *error);
+
+/**
+ * Create the flex item with specified configuration over
+ * the Ethernet device.
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] conf
+ *   Item configuration.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   Non-NULL opaque pointer on success, NULL otherwise and rte_errno is set.
+ */
+__rte_experimental
+struct rte_flow_item_flex_handle *
+rte_flow_flex_item_create(uint16_t port_id,
+			  const struct rte_flow_item_flex_conf *conf,
+			  struct rte_flow_error *error);
+
+/**
+ * Release the flex item on the specified Ethernet device.
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] handle
+ *   Handle of the item existing on the specified device.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_flow_flex_item_release(uint16_t port_id,
+			   const struct rte_flow_item_flex_handle *handle,
+			   struct rte_flow_error *error);
+
+/**
+ * Modify the flex item on the specified Ethernet device.
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] handle
+ *   Handle of the item existing on the specified device.
+ * @param[in] conf
+ *   Item new configuration.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_flow_flex_item_update(uint16_t port_id,
+			  const struct rte_flow_item_flex_handle *handle,
+			  const struct rte_flow_item_flex_conf *conf,
+			  struct rte_flow_error *error);
+
 #ifdef __cplusplus
 }
 #endif