diff mbox series

[v4] ethdev: extend flow metadata

Message ID 1572201636-16374-1-git-send-email-viacheslavo@mellanox.com (mailing list archive)
State Superseded, archived
Delegated to: Ferruh Yigit
Headers show
Series [v4] ethdev: extend flow metadata | expand

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/iol-intel-Performance success Performance Testing PASS
ci/iol-compilation success Compile Testing PASS
ci/iol-mellanox-Performance success Performance Testing PASS
ci/travis-robot success Travis build: passed
ci/Intel-compilation fail Compilation issues

Commit Message

Slava Ovsiienko Oct. 27, 2019, 6:40 p.m. UTC
Currently, metadata can be set on egress path via mbuf tx_metadata field
with PKT_TX_METADATA flag and RTE_FLOW_ITEM_TYPE_META matches metadata.

This patch extends the metadata feature usability.

1) RTE_FLOW_ACTION_TYPE_SET_META

When supporting multiple tables, Tx metadata can also be set by a rule and
matched by another rule. This new action allows metadata to be set as a
result of flow match.

2) Metadata on ingress

There's also need to support metadata on ingress. Metadata can be set by
SET_META action and matched by META item like Tx. The final value set by
the action will be delivered to application via metadata dynamic field of
mbuf which can be accessed by RTE_FLOW_DYNF_METADATA().
PKT_RX_DYNF_METADATA flag will be set along with the data.

The mbuf dynamic field must be registered by calling
rte_flow_dynf_metadata_register() prior to use SET_META action.

The availability of dynamic mbuf metadata field can be checked
with rte_flow_dynf_metadata_avail() routine.

For loopback/hairpin packet, metadata set on Rx/Tx may or may not be
propagated to the other path depending on hardware capability.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

---
v4: documentation comments addressed
    deprecation notice for Tx metadata offload flag
    rebased

v3: http://patches.dpdk.org/patch/61902/
    rebased, neat updates

v2: http://patches.dpdk.org/patch/60909/
v1: http://patches.dpdk.org/patch/56104/
rfc: http://patches.dpdk.org/patch/54271/

 app/test-pmd/cmdline_flow.c              | 57 +++++++++++++++++-
 app/test-pmd/util.c                      |  5 ++
 doc/guides/prog_guide/rte_flow.rst       | 72 ++++++++++++++++++-----
 doc/guides/rel_notes/deprecation.rst     |  4 ++
 doc/guides/rel_notes/release_19_11.rst   | 15 +++++
 lib/librte_ethdev/rte_ethdev.h           |  1 -
 lib/librte_ethdev/rte_ethdev_version.map |  3 +
 lib/librte_ethdev/rte_flow.c             | 41 +++++++++++++
 lib/librte_ethdev/rte_flow.h             | 99 +++++++++++++++++++++++++++++++-
 lib/librte_mbuf/rte_mbuf_dyn.h           |  8 ++-
 10 files changed, 283 insertions(+), 22 deletions(-)

Comments

Ori Kam Oct. 27, 2019, 7:10 p.m. UTC | #1
> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Viacheslav Ovsiienko
> Subject: [dpdk-dev] [PATCH v4] ethdev: extend flow metadata
> 
> Currently, metadata can be set on egress path via mbuf tx_metadata field
> with PKT_TX_METADATA flag and RTE_FLOW_ITEM_TYPE_META matches
> metadata.
> 
> This patch extends the metadata feature usability.
> 
> 1) RTE_FLOW_ACTION_TYPE_SET_META
> 
> When supporting multiple tables, Tx metadata can also be set by a rule and
> matched by another rule. This new action allows metadata to be set as a
> result of flow match.
> 
> 2) Metadata on ingress
> 
> There's also need to support metadata on ingress. Metadata can be set by
> SET_META action and matched by META item like Tx. The final value set by
> the action will be delivered to application via metadata dynamic field of
> mbuf which can be accessed by RTE_FLOW_DYNF_METADATA().
> PKT_RX_DYNF_METADATA flag will be set along with the data.
> 
> The mbuf dynamic field must be registered by calling
> rte_flow_dynf_metadata_register() prior to use SET_META action.
> 
> The availability of dynamic mbuf metadata field can be checked
> with rte_flow_dynf_metadata_avail() routine.
> 
> For loopback/hairpin packet, metadata set on Rx/Tx may or may not be
> propagated to the other path depending on hardware capability.
> 
> Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> 
> ---
> v4: documentation comments addressed
>     deprecation notice for Tx metadata offload flag
>     rebased
> 
> v3:
> https://eur03.safelinks.protection.outlook.com/?url=http%3A%2F%2Fpatches.dp
> dk.org%2Fpatch%2F61902%2F&amp;data=02%7C01%7Corika%40mellanox.com
> %7Ce5a38cd79f30468e76d408d75b0d2f4c%7Ca652971c7d2e4d9ba6a4d149256
> f461b%7C0%7C0%7C637077984504455741&amp;sdata=C1yyYY8M8LpoOg1bTz
> wM8nIx19RcDzP96GVNA%2FABRb8%3D&amp;reserved=0
>     rebased, neat updates
> 
> v2:
> https://eur03.safelinks.protection.outlook.com/?url=http%3A%2F%2Fpatches.dp
> dk.org%2Fpatch%2F60909%2F&amp;data=02%7C01%7Corika%40mellanox.com
> %7Ce5a38cd79f30468e76d408d75b0d2f4c%7Ca652971c7d2e4d9ba6a4d149256
> f461b%7C0%7C0%7C637077984504455741&amp;sdata=H1zpBrDfxQaTAQwETE
> St9uiY3rgVHQEMw%2FeEveZSdx4%3D&amp;reserved=0
> v1:
> https://eur03.safelinks.protection.outlook.com/?url=http%3A%2F%2Fpatches.dp
> dk.org%2Fpatch%2F56104%2F&amp;data=02%7C01%7Corika%40mellanox.com
> %7Ce5a38cd79f30468e76d408d75b0d2f4c%7Ca652971c7d2e4d9ba6a4d149256
> f461b%7C0%7C0%7C637077984504455741&amp;sdata=olUN2iPv38TqFHIX8a0
> b3Uz505Cqz34BOlckZHsl8rw%3D&amp;reserved=0
> rfc:
> https://eur03.safelinks.protection.outlook.com/?url=http%3A%2F%2Fpatches.dp
> dk.org%2Fpatch%2F54271%2F&amp;data=02%7C01%7Corika%40mellanox.com
> %7Ce5a38cd79f30468e76d408d75b0d2f4c%7Ca652971c7d2e4d9ba6a4d149256
> f461b%7C0%7C0%7C637077984504455741&amp;sdata=%2BMP4tWWQHO6Vd
> NBGJNM1om%2BwoM5ARrbXx0DP44et5mA%3D&amp;reserved=0
> 
>  app/test-pmd/cmdline_flow.c              | 57 +++++++++++++++++-
>  app/test-pmd/util.c                      |  5 ++
>  doc/guides/prog_guide/rte_flow.rst       | 72 ++++++++++++++++++-----
>  doc/guides/rel_notes/deprecation.rst     |  4 ++
>  doc/guides/rel_notes/release_19_11.rst   | 15 +++++
>  lib/librte_ethdev/rte_ethdev.h           |  1 -
>  lib/librte_ethdev/rte_ethdev_version.map |  3 +
>  lib/librte_ethdev/rte_flow.c             | 41 +++++++++++++
>  lib/librte_ethdev/rte_flow.h             | 99
> +++++++++++++++++++++++++++++++-

Acked-by: Ori Kam <orika@mellanox.com>

Thanks,
Ori
Andrew Rybchenko Oct. 29, 2019, 4:22 p.m. UTC | #2
On 10/27/19 9:40 PM, Viacheslav Ovsiienko wrote:
> Currently, metadata can be set on egress path via mbuf tx_metadata field
> with PKT_TX_METADATA flag and RTE_FLOW_ITEM_TYPE_META matches metadata.
>
> This patch extends the metadata feature usability.
>
> 1) RTE_FLOW_ACTION_TYPE_SET_META
>
> When supporting multiple tables, Tx metadata can also be set by a rule and
> matched by another rule. This new action allows metadata to be set as a
> result of flow match.
>
> 2) Metadata on ingress
>
> There's also need to support metadata on ingress. Metadata can be set by
> SET_META action and matched by META item like Tx. The final value set by
> the action will be delivered to application via metadata dynamic field of
> mbuf which can be accessed by RTE_FLOW_DYNF_METADATA().
> PKT_RX_DYNF_METADATA flag will be set along with the data.
>
> The mbuf dynamic field must be registered by calling
> rte_flow_dynf_metadata_register() prior to use SET_META action.
>
> The availability of dynamic mbuf metadata field can be checked
> with rte_flow_dynf_metadata_avail() routine.
>
> For loopback/hairpin packet, metadata set on Rx/Tx may or may not be
> propagated to the other path depending on hardware capability.
>
> Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

Above explanations lack information about "meta" vs "mark" which
may be set on Rx as well and delivered in other mbuf field.
It should be explained by one more field is required and rules
defined. Otherwise we can endup in half PMDs supporting
mark only, half PMDs supporting meta only and applications
in an interesting situation to make a choice which one to use.

[snip]

> diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
> index 159ce19..c943aca 100644
> --- a/doc/guides/prog_guide/rte_flow.rst
> +++ b/doc/guides/prog_guide/rte_flow.rst
> @@ -658,6 +658,32 @@ the physical device, with virtual groups in the PMD or not at all.
>      | ``mask`` | ``id``   | zeroed to match any value |
>      +----------+----------+---------------------------+
>   
> +Item: ``META``
> +^^^^^^^^^^^^^^^^^
> +
> +Matches 32 bit metadata item set.
> +
> +On egress, metadata can be set either by mbuf metadata field with
> +PKT_TX_METADATA flag or ``SET_META`` action. On ingress, ``SET_META``
> +action sets metadata for a packet and the metadata will be reported via
> +``metadata`` dynamic field of ``rte_mbuf`` with PKT_RX_DYNF_METADATA flag.
> +
> +- Default ``mask`` matches the specified Rx metadata value.
> +
> +.. _table_rte_flow_item_meta:
> +
> +.. table:: META
> +
> +   +----------+----------+---------------------------------------+
> +   | Field    | Subfield | Value                                 |
> +   +==========+==========+=======================================+
> +   | ``spec`` | ``data`` | 32 bit metadata value                 |
> +   +----------+----------+---------------------------------------+
> +   | ``last`` | ``data`` | upper range value                     |
> +   +----------+----------+---------------------------------------+
> +   | ``mask`` | ``data`` | bit-mask applies to "spec" and "last" |
> +   +----------+----------+---------------------------------------+
> +
>   Data matching item types
>   ~~~~~~~~~~~~~~~~~~~~~~~~
>   
> @@ -1232,21 +1258,6 @@ Matches a PPPoE session protocol identifier.
>   - ``proto_id``: PPP protocol identifier.
>   - Default ``mask`` matches proto_id only.
>   
> -
> -.. _table_rte_flow_item_meta:
> -
> -.. table:: META
> -
> -   +----------+----------+---------------------------------------+
> -   | Field    | Subfield | Value                                 |
> -   +==========+==========+=======================================+
> -   | ``spec`` | ``data`` | 32 bit metadata value                 |
> -   +----------+--------------------------------------------------+
> -   | ``last`` | ``data`` | upper range value                     |
> -   +----------+----------+---------------------------------------+
> -   | ``mask`` | ``data`` | bit-mask applies to "spec" and "last" |
> -   +----------+----------+---------------------------------------+
> -
>   Item: ``NSH``
>   ^^^^^^^^^^^^^
>   
> @@ -2466,6 +2477,37 @@ Value to decrease TCP acknowledgment number by is a big-endian 32 bit integer.
>   
>   Using this action on non-matching traffic will result in undefined behavior.
>   
> +Action: ``SET_META``
> +^^^^^^^^^^^^^^^^^^^^^^^
> +
> +Set metadata. Item ``META`` matches metadata.
> +
> +Metadata set by mbuf metadata field with PKT_TX_METADATA flag on egress will be
> +overridden by this action. On ingress, the metadata will be carried by
> +``metadata`` dynamic field of ``rte_mbuf`` which can be accessed by
> +``RTE_FLOW_DYNF_METADATA()``. PKT_RX_DYNF_METADATA flag will be set along
> +with the data.
> +
> +The mbuf dynamic field must be registered by calling
> +``rte_flow_dynf_metadata_register()`` prior to use ``SET_META`` action.
> +
> +Altering partial bits is supported with ``mask``. For bits which have never been
> +set, unpredictable value will be seen depending on driver implementation. For
> +loopback/hairpin packet, metadata set on Rx/Tx may or may not be propagated to
> +the other path depending on HW capability.
> +
> +.. _table_rte_flow_action_set_meta:
> +
> +.. table:: SET_META
> +
> +   +----------+----------------------------+
> +   | Field    | Value                      |
> +   +==========+============================+
> +   | ``data`` | 32 bit metadata value      |
> +   +----------+----------------------------+
> +   | ``mask`` | bit-mask applies to "data" |
> +   +----------+----------------------------+
> +
>   Negative types
>   ~~~~~~~~~~~~~~
>   
> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
> index 3aa1634..9d54d8e 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -106,6 +106,10 @@ Deprecation Notices
>     struct ``rte_eth_dev_info`` for the port capability and in struct
>     ``rte_eth_rxmode`` for the port configuration.
>   
> +* ethdev: DEV_TX_OFFLOAD_MATCH_METADATA will be removed, static metadata
> +  mbuf field will be removed in 20.02, metadata feature will use dynamic mbuf
> +  field and flag instead.
> +

Isn't it breaking stable API/ABI? Should target release be 20.11?

I think that DEV_TX_OFFLOAD_MATCH_METADATA should marked
as deprecated now as well as tx_metadata field in mbuf.

>   * cryptodev: support for using IV with all sizes is added, J0 still can
>     be used but only when IV length in following structs ``rte_crypto_auth_xform``,
>     ``rte_crypto_aead_xform`` is set to zero. When IV length is greater or equal
> diff --git a/doc/guides/rel_notes/release_19_11.rst b/doc/guides/rel_notes/release_19_11.rst
> index 0e5bb5d..6d331f6 100644
> --- a/doc/guides/rel_notes/release_19_11.rst
> +++ b/doc/guides/rel_notes/release_19_11.rst
> @@ -232,6 +232,21 @@ New Features
>       gives ability to print port supported ptypes in different protocol layers.
>   
>   
> +* **Add support of support dynamic fields and flags in mbuf.**
> +
> +  This new feature adds the ability to dynamically register some room
> +  for a field or a flag in the mbuf structure. This is typically used
> +  for specific offload features, where adding a static field or flag
> +  in the mbuf is not justified.
> +

I guess above is just incorrect merge.

> +* **Extended metadata support in rte_flow.**
> +
> +  Flow metadata is extended to both Rx and Tx.
> +
> +  * Tx metadata can also be set by SET_META action of rte_flow.
> +  * Rx metadata is delivered to host via a dynamic field of ``rte_mbuf`` with
> +    PKT_RX_DYNF_METADATA.
> +

Two empty lines are required before the next section.

>   Removed Items
>   -------------
>   
> diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
> index c36c1b6..b19c86b 100644
> --- a/lib/librte_ethdev/rte_ethdev.h
> +++ b/lib/librte_ethdev/rte_ethdev.h
> @@ -1048,7 +1048,6 @@ struct rte_eth_conf {
>   #define DEV_RX_OFFLOAD_KEEP_CRC		0x00010000
>   #define DEV_RX_OFFLOAD_SCTP_CKSUM	0x00020000
>   #define DEV_RX_OFFLOAD_OUTER_UDP_CKSUM  0x00040000
> -

Unrelated change.

>   #define DEV_RX_OFFLOAD_CHECKSUM (DEV_RX_OFFLOAD_IPV4_CKSUM | \
>   				 DEV_RX_OFFLOAD_UDP_CKSUM | \
>   				 DEV_RX_OFFLOAD_TCP_CKSUM)

[snip]

> diff --git a/lib/librte_ethdev/rte_flow.c b/lib/librte_ethdev/rte_flow.c
> index ca0f680..6090177 100644
> --- a/lib/librte_ethdev/rte_flow.c
> +++ b/lib/librte_ethdev/rte_flow.c
> @@ -12,10 +12,18 @@
>   #include <rte_errno.h>
>   #include <rte_branch_prediction.h>
>   #include <rte_string_fns.h>
> +#include <rte_mbuf.h>
> +#include <rte_mbuf_dyn.h>
>   #include "rte_ethdev.h"
>   #include "rte_flow_driver.h"
>   #include "rte_flow.h"
>   
> +/* Mbuf dynamic field name for metadata. */
> +int rte_flow_dynf_metadata_offs = -1;
> +
> +/* Mbuf dynamic field flag bit number for metadata. */
> +uint64_t rte_flow_dynf_metadata_mask;
> +
>   /**
>    * Flow elements description tables.
>    */
> @@ -157,8 +165,41 @@ struct rte_flow_desc_data {
>   	MK_FLOW_ACTION(DEC_TCP_SEQ, sizeof(rte_be32_t)),
>   	MK_FLOW_ACTION(INC_TCP_ACK, sizeof(rte_be32_t)),
>   	MK_FLOW_ACTION(DEC_TCP_ACK, sizeof(rte_be32_t)),
> +	MK_FLOW_ACTION(SET_META, sizeof(struct rte_flow_action_set_meta)),
>   };
>   
> +int
> +rte_flow_dynf_metadata_register(void)
> +{
> +	int offset;
> +	int flag;
> +
> +	static const struct rte_mbuf_dynfield desc_offs = {
> +		.name = MBUF_DYNF_METADATA_NAME,
> +		.size = sizeof(uint32_t),
> +		.align = __alignof__(uint32_t),
> +		.flags = 0,

I think flags initialization to 0 is redundant.

> +	};
> +	static const struct rte_mbuf_dynflag desc_flag = {
> +		.name = MBUF_DYNF_METADATA_NAME,
> +	};
> +
> +	offset = rte_mbuf_dynfield_register(&desc_offs);
> +	if (offset < 0)
> +		goto error;
> +	flag = rte_mbuf_dynflag_register(&desc_flag);
> +	if (flag < 0)
> +		goto error;
> +	rte_flow_dynf_metadata_offs = offset;
> +	rte_flow_dynf_metadata_mask = (1ULL << flag);
> +	return 0;
> +
> +error:

Just an observation...
Impossibility to unregister hits here. Field may be registered,
but will be used.

> +	rte_flow_dynf_metadata_offs = -1;
> +	rte_flow_dynf_metadata_mask = 0ULL;
> +	return -rte_errno;
> +}
> +
>   static int
>   flow_err(uint16_t port_id, int ret, struct rte_flow_error *error)
>   {
> diff --git a/lib/librte_ethdev/rte_flow.h b/lib/librte_ethdev/rte_flow.h
> index 4fee105..b821557 100644
> --- a/lib/librte_ethdev/rte_flow.h
> +++ b/lib/librte_ethdev/rte_flow.h
> @@ -28,6 +28,8 @@
>   #include <rte_byteorder.h>
>   #include <rte_esp.h>
>   #include <rte_higig.h>
> +#include <rte_mbuf.h>
> +#include <rte_mbuf_dyn.h>
>   
>   #ifdef __cplusplus
>   extern "C" {
> @@ -418,7 +420,8 @@ enum rte_flow_item_type {
>   	/**
>   	 * [META]
>   	 *
> -	 * Matches a metadata value specified in mbuf metadata field.
> +	 * Matches a metadata value.
> +	 *
>   	 * See struct rte_flow_item_meta.
>   	 */
>   	RTE_FLOW_ITEM_TYPE_META,
> @@ -1263,9 +1266,17 @@ struct rte_flow_item_icmp6_nd_opt_tla_eth {
>   #endif
>   
>   /**
> - * RTE_FLOW_ITEM_TYPE_META.
> + * @warning
> + * @b EXPERIMENTAL: this structure may change without prior notice

Is it allowed to make experimental back?

>    *
> - * Matches a specified metadata value.
> + * RTE_FLOW_ITEM_TYPE_META
> + *
> + * Matches a specified metadata value. On egress, metadata can be set either by
> + * mbuf tx_metadata field with PKT_TX_METADATA flag or
> + * RTE_FLOW_ACTION_TYPE_SET_META. On ingress, RTE_FLOW_ACTION_TYPE_SET_META sets
> + * metadata for a packet and the metadata will be reported via mbuf metadata
> + * dynamic field with PKT_RX_DYNF_METADATA flag. The dynamic mbuf field must be
> + * registered in advance by rte_flow_dynf_metadata_register().
>    */
>   struct rte_flow_item_meta {
>   	rte_be32_t data;

[snip]

> @@ -2429,6 +2447,55 @@ struct rte_flow_action_set_mac {
>   	uint8_t mac_addr[RTE_ETHER_ADDR_LEN];
>   };
>   
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this structure may change without prior notice
> + *
> + * RTE_FLOW_ACTION_TYPE_SET_META
> + *
> + * Set metadata. Metadata set by mbuf tx_metadata field with
> + * PKT_TX_METADATA flag on egress will be overridden by this action. On
> + * ingress, the metadata will be carried by mbuf metadata dynamic field
> + * with PKT_RX_DYNF_METADATA flag if set.  The dynamic mbuf field must be
> + * registered in advance by rte_flow_dynf_metadata_register().
> + *
> + * Altering partial bits is supported with mask. For bits which have never
> + * been set, unpredictable value will be seen depending on driver
> + * implementation. For loopback/hairpin packet, metadata set on Rx/Tx may
> + * or may not be propagated to the other path depending on HW capability.
> + *
> + * RTE_FLOW_ITEM_TYPE_META matches metadata.
> + */
> +struct rte_flow_action_set_meta {
> +	rte_be32_t data;
> +	rte_be32_t mask;

As I understand tx_metadata is host endian. Just double-checking.
Is a new dynamic field host endian or big endian?
I definitely would like to see motivation in comments why data/mask
are big-endian here.

> +};
> +
> +/* Mbuf dynamic field offset for metadata. */
> +extern int rte_flow_dynf_metadata_offs;
> +
> +/* Mbuf dynamic field flag mask for metadata. */
> +extern uint64_t rte_flow_dynf_metadata_mask;

These two global variables look frightening to me.
It does not look good to me.

> +
> +/* Mbuf dynamic field pointer for metadata. */
> +#define RTE_FLOW_DYNF_METADATA(m) \
> +	RTE_MBUF_DYNFIELD((m), rte_flow_dynf_metadata_offs, uint32_t *)
> +
> +/* Mbuf dynamic flag for metadata. */
> +#define PKT_RX_DYNF_METADATA (rte_flow_dynf_metadata_mask)
> +
> +__rte_experimental
> +static inline uint32_t
> +rte_flow_dynf_metadata_get(struct rte_mbuf *m) {

Above curly bracket should be on its own line in the case of function
definition.

Shouldn't m be 'const struct rte_mbuf *'?

> +	return *RTE_FLOW_DYNF_METADATA(m);
> +}
> +
> +__rte_experimental
> +static inline void
> +rte_flow_dynf_metadata_set(struct rte_mbuf *m, uint32_t v) {

Above curly bracket should be on its own line in the case of function
definition.

> +	*RTE_FLOW_DYNF_METADATA(m) = v;
> +}
> +
>   /*
>    * Definition of a single action.
>    *
> @@ -2662,6 +2729,32 @@ enum rte_flow_conv_op {
>   };
>   
>   /**
> + * Check if mbuf dynamic field for metadata is registered.
> + *
> + * @return
> + *   True if registered, false otherwise.
> + */
> +__rte_experimental
> +static inline int
> +rte_flow_dynf_metadata_avail(void) {

Above curly bracket should be on its own line in the case of function
definition.

> +	return !!rte_flow_dynf_metadata_mask;
> +}
> +
> +/**
> + * Register mbuf dynamic field and flag for metadata.
> + *
> + * This function must be called prior to use SET_META action in order to
> + * register the dynamic mbuf field. Otherwise, the data cannot be delivered to
> + * application.
> + *
> + * @return
> + *   0 on success, a negative errno value otherwise and rte_errno is set.
> + */
> +__rte_experimental
> +int
> +rte_flow_dynf_metadata_register(void);
> +
> +/**
>    * Check whether a flow rule can be created on a given port.
>    *
>    * The flow rule is validated for correctness and whether it could be accepted
> diff --git a/lib/librte_mbuf/rte_mbuf_dyn.h b/lib/librte_mbuf/rte_mbuf_dyn.h
> index 2e9d418..a4a0cf5 100644
> --- a/lib/librte_mbuf/rte_mbuf_dyn.h
> +++ b/lib/librte_mbuf/rte_mbuf_dyn.h
> @@ -234,6 +234,10 @@ int rte_mbuf_dynflag_lookup(const char *name,
>   __rte_experimental
>   void rte_mbuf_dyn_dump(FILE *out);
>   
> -/* Placeholder for dynamic fields and flags declarations. */
> -
> +/*
> + * Placeholder for dynamic fields and flags declarations.
> + * This is centralizing point to gather all field names
> + * and parameters together.
> + */

It is not a comment for below define. So, I think empty line is
required to separate the comment from below define.
I'm not sure that the clarification is required, but it is up to Olivier.

> +#define MBUF_DYNF_METADATA_NAME "rte_flow_dynfield_metadata"

Empty line is missing here

>   #endif
Olivier Matz Oct. 29, 2019, 4:25 p.m. UTC | #3
Hi Slava,

Looks good to me overall. Few minor comments below.

On Sun, Oct 27, 2019 at 06:40:36PM +0000, Viacheslav Ovsiienko wrote:
> Currently, metadata can be set on egress path via mbuf tx_metadata field
> with PKT_TX_METADATA flag and RTE_FLOW_ITEM_TYPE_META matches metadata.
> 
> This patch extends the metadata feature usability.
> 
> 1) RTE_FLOW_ACTION_TYPE_SET_META
> 
> When supporting multiple tables, Tx metadata can also be set by a rule and
> matched by another rule. This new action allows metadata to be set as a
> result of flow match.
> 
> 2) Metadata on ingress
> 
> There's also need to support metadata on ingress. Metadata can be set by
> SET_META action and matched by META item like Tx. The final value set by
> the action will be delivered to application via metadata dynamic field of
> mbuf which can be accessed by RTE_FLOW_DYNF_METADATA().
> PKT_RX_DYNF_METADATA flag will be set along with the data.
> 
> The mbuf dynamic field must be registered by calling
> rte_flow_dynf_metadata_register() prior to use SET_META action.
> 
> The availability of dynamic mbuf metadata field can be checked
> with rte_flow_dynf_metadata_avail() routine.
> 
> For loopback/hairpin packet, metadata set on Rx/Tx may or may not be
> propagated to the other path depending on hardware capability.
> 
> Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

(...)

> diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
> index c36c1b6..b19c86b 100644
> --- a/lib/librte_ethdev/rte_ethdev.h
> +++ b/lib/librte_ethdev/rte_ethdev.h
> @@ -1048,7 +1048,6 @@ struct rte_eth_conf {
>  #define DEV_RX_OFFLOAD_KEEP_CRC		0x00010000
>  #define DEV_RX_OFFLOAD_SCTP_CKSUM	0x00020000
>  #define DEV_RX_OFFLOAD_OUTER_UDP_CKSUM  0x00040000
> -
>  #define DEV_RX_OFFLOAD_CHECKSUM (DEV_RX_OFFLOAD_IPV4_CKSUM | \
>  				 DEV_RX_OFFLOAD_UDP_CKSUM | \
>  				 DEV_RX_OFFLOAD_TCP_CKSUM)

Undue removed line here.

(...)

> +/* Mbuf dynamic field offset for metadata. */
> +extern int rte_flow_dynf_metadata_offs;
> +
> +/* Mbuf dynamic field flag mask for metadata. */
> +extern uint64_t rte_flow_dynf_metadata_mask;
> +
> +/* Mbuf dynamic field pointer for metadata. */
> +#define RTE_FLOW_DYNF_METADATA(m) \
> +	RTE_MBUF_DYNFIELD((m), rte_flow_dynf_metadata_offs, uint32_t *)
> +
> +/* Mbuf dynamic flag for metadata. */
> +#define PKT_RX_DYNF_METADATA (rte_flow_dynf_metadata_mask)
> +
> +__rte_experimental
> +static inline uint32_t
> +rte_flow_dynf_metadata_get(struct rte_mbuf *m) {
> +	return *RTE_FLOW_DYNF_METADATA(m);
> +}
> +
> +__rte_experimental
> +static inline void
> +rte_flow_dynf_metadata_set(struct rte_mbuf *m, uint32_t v) {
> +	*RTE_FLOW_DYNF_METADATA(m) = v;
> +}
> +

(...)

> +__rte_experimental
> +static inline int
> +rte_flow_dynf_metadata_avail(void) {
> +       return !!rte_flow_dynf_metadata_mask;
> +}

I think, in DPDK:

	static inline void
	rte_flow_dynf_metadata_set(struct rte_mbuf *m, uint32_t v)
	{
		...

is prefered over:

	static inline void
	rte_flow_dynf_metadata_set(struct rte_mbuf *m, uint32_t v) {
		...

> --- a/lib/librte_mbuf/rte_mbuf_dyn.h
> +++ b/lib/librte_mbuf/rte_mbuf_dyn.h
> @@ -234,6 +234,10 @@ int rte_mbuf_dynflag_lookup(const char *name,
>  __rte_experimental
>  void rte_mbuf_dyn_dump(FILE *out);
>  
> -/* Placeholder for dynamic fields and flags declarations. */
> -
> +/*
> + * Placeholder for dynamic fields and flags declarations.
> + * This is centralizing point to gather all field names
> + * and parameters together.
> + */
> +#define MBUF_DYNF_METADATA_NAME "rte_flow_dynfield_metadata"
>  #endif

The RTE_ prefix is missing. Also, thi name is called dynfield but it is
used for both field and flag. I suggest RTE_MBUF_DYNFIELD_METADATA_NAME
and RTE_MBUF_DYNFLAG_METADATA_NAME, to be consistent with the other
naming conventions in rte_mbuf_dyn.[ch].

One more comment: as previously discussed, changing the size or
alignement of a dynamic field should not be allowed, because it can
break the users of the field.

Depending on how it is implemented (is the registration function inline?
is the rte_mbuf_dynfield structure private, shared, or static const in a
.h? are we using #defines for name, size, align?), I think the impact on
users will be different. This is something we need to think about for
next versions: how to detect these changes before pushing the commit,
and/or at runtime?

Regards,
Olivier
Olivier Matz Oct. 29, 2019, 4:33 p.m. UTC | #4
On Tue, Oct 29, 2019 at 05:25:22PM +0100, Olivier Matz wrote:
> Hi Slava,
> 
> Looks good to me overall. Few minor comments below.
> 
> On Sun, Oct 27, 2019 at 06:40:36PM +0000, Viacheslav Ovsiienko wrote:
> > Currently, metadata can be set on egress path via mbuf tx_metadata field
> > with PKT_TX_METADATA flag and RTE_FLOW_ITEM_TYPE_META matches metadata.
> > 
> > This patch extends the metadata feature usability.
> > 
> > 1) RTE_FLOW_ACTION_TYPE_SET_META
> > 
> > When supporting multiple tables, Tx metadata can also be set by a rule and
> > matched by another rule. This new action allows metadata to be set as a
> > result of flow match.
> > 
> > 2) Metadata on ingress
> > 
> > There's also need to support metadata on ingress. Metadata can be set by
> > SET_META action and matched by META item like Tx. The final value set by
> > the action will be delivered to application via metadata dynamic field of
> > mbuf which can be accessed by RTE_FLOW_DYNF_METADATA().
> > PKT_RX_DYNF_METADATA flag will be set along with the data.
> > 
> > The mbuf dynamic field must be registered by calling
> > rte_flow_dynf_metadata_register() prior to use SET_META action.
> > 
> > The availability of dynamic mbuf metadata field can be checked
> > with rte_flow_dynf_metadata_avail() routine.
> > 
> > For loopback/hairpin packet, metadata set on Rx/Tx may or may not be
> > propagated to the other path depending on hardware capability.
> > 
> > Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
> > Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> 
> (...)
> 
> > diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
> > index c36c1b6..b19c86b 100644
> > --- a/lib/librte_ethdev/rte_ethdev.h
> > +++ b/lib/librte_ethdev/rte_ethdev.h
> > @@ -1048,7 +1048,6 @@ struct rte_eth_conf {
> >  #define DEV_RX_OFFLOAD_KEEP_CRC		0x00010000
> >  #define DEV_RX_OFFLOAD_SCTP_CKSUM	0x00020000
> >  #define DEV_RX_OFFLOAD_OUTER_UDP_CKSUM  0x00040000
> > -
> >  #define DEV_RX_OFFLOAD_CHECKSUM (DEV_RX_OFFLOAD_IPV4_CKSUM | \
> >  				 DEV_RX_OFFLOAD_UDP_CKSUM | \
> >  				 DEV_RX_OFFLOAD_TCP_CKSUM)
> 
> Undue removed line here.
> 
> (...)
> 
> > +/* Mbuf dynamic field offset for metadata. */
> > +extern int rte_flow_dynf_metadata_offs;
> > +
> > +/* Mbuf dynamic field flag mask for metadata. */
> > +extern uint64_t rte_flow_dynf_metadata_mask;
> > +
> > +/* Mbuf dynamic field pointer for metadata. */
> > +#define RTE_FLOW_DYNF_METADATA(m) \
> > +	RTE_MBUF_DYNFIELD((m), rte_flow_dynf_metadata_offs, uint32_t *)
> > +
> > +/* Mbuf dynamic flag for metadata. */
> > +#define PKT_RX_DYNF_METADATA (rte_flow_dynf_metadata_mask)
> > +
> > +__rte_experimental
> > +static inline uint32_t
> > +rte_flow_dynf_metadata_get(struct rte_mbuf *m) {
> > +	return *RTE_FLOW_DYNF_METADATA(m);
> > +}
> > +
> > +__rte_experimental
> > +static inline void
> > +rte_flow_dynf_metadata_set(struct rte_mbuf *m, uint32_t v) {
> > +	*RTE_FLOW_DYNF_METADATA(m) = v;
> > +}
> > +
> 
> (...)
> 
> > +__rte_experimental
> > +static inline int
> > +rte_flow_dynf_metadata_avail(void) {
> > +       return !!rte_flow_dynf_metadata_mask;
> > +}
> 
> I think, in DPDK:
> 
> 	static inline void
> 	rte_flow_dynf_metadata_set(struct rte_mbuf *m, uint32_t v)
> 	{
> 		...
> 
> is prefered over:
> 
> 	static inline void
> 	rte_flow_dynf_metadata_set(struct rte_mbuf *m, uint32_t v) {
> 		...
> 
> > --- a/lib/librte_mbuf/rte_mbuf_dyn.h
> > +++ b/lib/librte_mbuf/rte_mbuf_dyn.h
> > @@ -234,6 +234,10 @@ int rte_mbuf_dynflag_lookup(const char *name,
> >  __rte_experimental
> >  void rte_mbuf_dyn_dump(FILE *out);
> >  
> > -/* Placeholder for dynamic fields and flags declarations. */
> > -
> > +/*
> > + * Placeholder for dynamic fields and flags declarations.
> > + * This is centralizing point to gather all field names
> > + * and parameters together.
> > + */
> > +#define MBUF_DYNF_METADATA_NAME "rte_flow_dynfield_metadata"
> >  #endif
> 
> The RTE_ prefix is missing. Also, thi name is called dynfield but it is
> used for both field and flag. I suggest RTE_MBUF_DYNFIELD_METADATA_NAME
> and RTE_MBUF_DYNFLAG_METADATA_NAME, to be consistent with the other
> naming conventions in rte_mbuf_dyn.[ch].

I forgot: can you please document the goal/usage of these field and flag
here?  Not necessarily a detailed explanation, but a high level view:
what is transported, when it is registered, ...


> One more comment: as previously discussed, changing the size or
> alignement of a dynamic field should not be allowed, because it can
> break the users of the field.
> 
> Depending on how it is implemented (is the registration function inline?
> is the rte_mbuf_dynfield structure private, shared, or static const in a
> .h? are we using #defines for name, size, align?), I think the impact on
> users will be different. This is something we need to think about for
> next versions: how to detect these changes before pushing the commit,
> and/or at runtime?
> 
> Regards,
> Olivier
Slava Ovsiienko Oct. 29, 2019, 5:19 p.m. UTC | #5
Hi, Andrew

Thank you for the review.

> -----Original Message-----
> From: Andrew Rybchenko <arybchenko@solarflare.com>
> Sent: Tuesday, October 29, 2019 18:22
> To: Slava Ovsiienko <viacheslavo@mellanox.com>; dev@dpdk.org
> Cc: Thomas Monjalon <thomas@monjalon.net>; Matan Azrad
> <matan@mellanox.com>; olivier.matz@6wind.com; Ori Kam
> <orika@mellanox.com>; Yongseok Koh <yskoh@mellanox.com>
> Subject: Re: [dpdk-dev] [PATCH v4] ethdev: extend flow metadata
> 
> On 10/27/19 9:40 PM, Viacheslav Ovsiienko wrote:
> > Currently, metadata can be set on egress path via mbuf tx_metadata
> > field with PKT_TX_METADATA flag and RTE_FLOW_ITEM_TYPE_META
> matches metadata.
> >
> > This patch extends the metadata feature usability.
> >
> > 1) RTE_FLOW_ACTION_TYPE_SET_META
> >
> > When supporting multiple tables, Tx metadata can also be set by a rule
> > and matched by another rule. This new action allows metadata to be set
> > as a result of flow match.
> >
> > 2) Metadata on ingress
> >
> > There's also need to support metadata on ingress. Metadata can be set
> > by SET_META action and matched by META item like Tx. The final value
> > set by the action will be delivered to application via metadata
> > dynamic field of mbuf which can be accessed by
> RTE_FLOW_DYNF_METADATA().
> > PKT_RX_DYNF_METADATA flag will be set along with the data.
> >
> > The mbuf dynamic field must be registered by calling
> > rte_flow_dynf_metadata_register() prior to use SET_META action.
> >
> > The availability of dynamic mbuf metadata field can be checked with
> > rte_flow_dynf_metadata_avail() routine.
> >
> > For loopback/hairpin packet, metadata set on Rx/Tx may or may not be
> > propagated to the other path depending on hardware capability.
> >
> > Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
> > Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> 
> Above explanations lack information about "meta" vs "mark" which may be
> set on Rx as well and delivered in other mbuf field.
> It should be explained by one more field is required and rules defined.

There is some story about metadata features.
Initially, there were proposed two metadata related actions:

- RTE_FLOW_ACTION_TYPE_FLAG
- RTE_FLOW_ACTION_TYPE_MARK

These actions set the special flag in the packet metadata, MARK action stores some
specified value in the metadata storage, and, on the packet receiving PMD puts the flag
and value to the mbuf and applications can see the packet was threated inside flow engine
according to the appropriate RTE flow(s). MARK and FLAG are like some kind of gateway
to transfer some per-packet information from the flow engine to the application
via receiving datapath.

From the datapath point of view, the MARK and FLAG are related to the receiving side only.
It would useful to have the same gateway on the transmitting side and there was the feature
of type RTE_FLOW_ITEM_TYPE_META was proposed. The application can fill the field in mbuf
and this value will be transferred to some field in the packet metadata inside the flow engine.
It did not matter whether these metadata fields are shared because of MARK and META items
belonged to different domains (receiving and transmitting) and could be vendor-specific.

So far, so good, DPDK proposes some entities to control metadata inside the flow engine
and gateways to exchange these values on a per-packet basis via datapaths.

As we can see, the MARK and META means are not symmetric, there is absent action which
would allow us to set META value on the transmitting path. So, the action of type:
- RTE_FLOW_ACTION_TYPE_SET_META is proposed.

The next, applications raise the new requirements for packet metadata. The flow engines are
getting more complex, internal switches are introduced, multiple ports might be supported within
the same flow engine namespace. From the DPDK points of view, it means the packets might be sent
on one eth_dev port and received on the other one, and the packet path inside the flow engine entirely
belongs to the same hardware device. The simplest example is SR-IOV with PF, VFs and the representors.
And there is a brilliant opportunity to provide some out-of-band channel to transfer some extra data
 from one port to another one, besides the packet data itself.


> Above explanations lack information about "meta" vs "mark" which may be
> set on Rx as well and delivered in other mbuf field.
> It should be explained by one more field is required and rules defined.
> Otherwise we can endup in half PMDs supporting mark only, half PMDs
> supporting meta only and applications in an interesting situation to make a
> choice which one to use.

There is no "mark" vs "meta". MARK and META means are kept for compatibility issues
and legacy part works exactly as before. The trials (with flow_validate)  is supposed
to check whether PMD supports MARK or META feature on appropriate domain. It depends
on PMD implementation, configuration and underlaying HW/FW/kernel capabilities and
should be resolved in runtime.

> 
> [snip]
> 
> > diff --git a/doc/guides/prog_guide/rte_flow.rst
> > b/doc/guides/prog_guide/rte_flow.rst
> > index 159ce19..c943aca 100644
> > --- a/doc/guides/prog_guide/rte_flow.rst
> > +++ b/doc/guides/prog_guide/rte_flow.rst
> > @@ -658,6 +658,32 @@ the physical device, with virtual groups in the
> PMD or not at all.
> >      | ``mask`` | ``id``   | zeroed to match any value |
> >      +----------+----------+---------------------------+
> >
> > +Item: ``META``
> > +^^^^^^^^^^^^^^^^^
> > +
> > +Matches 32 bit metadata item set.
> > +
> > +On egress, metadata can be set either by mbuf metadata field with
> > +PKT_TX_METADATA flag or ``SET_META`` action. On ingress,
> ``SET_META``
> > +action sets metadata for a packet and the metadata will be reported
> > +via ``metadata`` dynamic field of ``rte_mbuf`` with
> PKT_RX_DYNF_METADATA flag.
> > +
> > +- Default ``mask`` matches the specified Rx metadata value.
> > +
> > +.. _table_rte_flow_item_meta:
> > +
> > +.. table:: META
> > +
> > +   +----------+----------+---------------------------------------+
> > +   | Field    | Subfield | Value                                 |
> > +
> +==========+==========+=======================================
> +
> > +   | ``spec`` | ``data`` | 32 bit metadata value                 |
> > +   +----------+----------+---------------------------------------+
> > +   | ``last`` | ``data`` | upper range value                     |
> > +   +----------+----------+---------------------------------------+
> > +   | ``mask`` | ``data`` | bit-mask applies to "spec" and "last" |
> > +   +----------+----------+---------------------------------------+
> > +
> >   Data matching item types
> >   ~~~~~~~~~~~~~~~~~~~~~~~~
> >
> > @@ -1232,21 +1258,6 @@ Matches a PPPoE session protocol identifier.
> >   - ``proto_id``: PPP protocol identifier.
> >   - Default ``mask`` matches proto_id only.
> >
> > -
> > -.. _table_rte_flow_item_meta:
> > -
> > -.. table:: META
> > -
> > -   +----------+----------+---------------------------------------+
> > -   | Field    | Subfield | Value                                 |
> > -
> +==========+==========+=======================================
> +
> > -   | ``spec`` | ``data`` | 32 bit metadata value                 |
> > -   +----------+--------------------------------------------------+
> > -   | ``last`` | ``data`` | upper range value                     |
> > -   +----------+----------+---------------------------------------+
> > -   | ``mask`` | ``data`` | bit-mask applies to "spec" and "last" |
> > -   +----------+----------+---------------------------------------+
> > -
> >   Item: ``NSH``
> >   ^^^^^^^^^^^^^
> >
> > @@ -2466,6 +2477,37 @@ Value to decrease TCP acknowledgment
> number by is a big-endian 32 bit integer.
> >
> >   Using this action on non-matching traffic will result in undefined behavior.
> >
> > +Action: ``SET_META``
> > +^^^^^^^^^^^^^^^^^^^^^^^
> > +
> > +Set metadata. Item ``META`` matches metadata.
> > +
> > +Metadata set by mbuf metadata field with PKT_TX_METADATA flag on
> > +egress will be overridden by this action. On ingress, the metadata
> > +will be carried by ``metadata`` dynamic field of ``rte_mbuf`` which
> > +can be accessed by ``RTE_FLOW_DYNF_METADATA()``.
> PKT_RX_DYNF_METADATA
> > +flag will be set along with the data.
> > +
> > +The mbuf dynamic field must be registered by calling
> > +``rte_flow_dynf_metadata_register()`` prior to use ``SET_META`` action.
> > +
> > +Altering partial bits is supported with ``mask``. For bits which have
> > +never been set, unpredictable value will be seen depending on driver
> > +implementation. For loopback/hairpin packet, metadata set on Rx/Tx
> > +may or may not be propagated to the other path depending on HW
> capability.
> > +
> > +.. _table_rte_flow_action_set_meta:
> > +
> > +.. table:: SET_META
> > +
> > +   +----------+----------------------------+
> > +   | Field    | Value                      |
> > +   +==========+============================+
> > +   | ``data`` | 32 bit metadata value      |
> > +   +----------+----------------------------+
> > +   | ``mask`` | bit-mask applies to "data" |
> > +   +----------+----------------------------+
> > +
> >   Negative types
> >   ~~~~~~~~~~~~~~
> >
> > diff --git a/doc/guides/rel_notes/deprecation.rst
> > b/doc/guides/rel_notes/deprecation.rst
> > index 3aa1634..9d54d8e 100644
> > --- a/doc/guides/rel_notes/deprecation.rst
> > +++ b/doc/guides/rel_notes/deprecation.rst
> > @@ -106,6 +106,10 @@ Deprecation Notices
> >     struct ``rte_eth_dev_info`` for the port capability and in struct
> >     ``rte_eth_rxmode`` for the port configuration.
> >
> > +* ethdev: DEV_TX_OFFLOAD_MATCH_METADATA will be removed, static
> > +metadata
> > +  mbuf field will be removed in 20.02, metadata feature will use
> > +dynamic mbuf
> > +  field and flag instead.
> > +
> 
> Isn't it breaking stable API/ABI? Should target release be 20.11?
tx_metadata is in union, it should not be ABI break.
And we propose to remove tx_metadata at all in 19.11 
and share the dynamic metadata field between Rx and Tx METAdata.

> I think that DEV_TX_OFFLOAD_MATCH_METADATA should marked as
> deprecated now as well as tx_metadata field in mbuf.
> 
> >   * cryptodev: support for using IV with all sizes is added, J0 still can
> >     be used but only when IV length in following structs
> ``rte_crypto_auth_xform``,
> >     ``rte_crypto_aead_xform`` is set to zero. When IV length is
> > greater or equal diff --git a/doc/guides/rel_notes/release_19_11.rst
> > b/doc/guides/rel_notes/release_19_11.rst
> > index 0e5bb5d..6d331f6 100644
> > --- a/doc/guides/rel_notes/release_19_11.rst
> > +++ b/doc/guides/rel_notes/release_19_11.rst
> > @@ -232,6 +232,21 @@ New Features
> >       gives ability to print port supported ptypes in different protocol layers.
> >
> >
> > +* **Add support of support dynamic fields and flags in mbuf.**
> > +
> > +  This new feature adds the ability to dynamically register some room
> > + for a field or a flag in the mbuf structure. This is typically used
> > + for specific offload features, where adding a static field or flag
> > + in the mbuf is not justified.
> > +
> 
> I guess above is just incorrect merge.
Oops, thanks for spotting,

> 
> > +* **Extended metadata support in rte_flow.**
> > +
> > +  Flow metadata is extended to both Rx and Tx.
> > +
> > +  * Tx metadata can also be set by SET_META action of rte_flow.
> > +  * Rx metadata is delivered to host via a dynamic field of ``rte_mbuf``
> with
> > +    PKT_RX_DYNF_METADATA.
> > +
> 
> Two empty lines are required before the next section.
Accepted.

> 
> >   Removed Items
> >   -------------
> >
> > diff --git a/lib/librte_ethdev/rte_ethdev.h
> > b/lib/librte_ethdev/rte_ethdev.h index c36c1b6..b19c86b 100644
> > --- a/lib/librte_ethdev/rte_ethdev.h
> > +++ b/lib/librte_ethdev/rte_ethdev.h
> > @@ -1048,7 +1048,6 @@ struct rte_eth_conf {
> >   #define DEV_RX_OFFLOAD_KEEP_CRC		0x00010000
> >   #define DEV_RX_OFFLOAD_SCTP_CKSUM	0x00020000
> >   #define DEV_RX_OFFLOAD_OUTER_UDP_CKSUM  0x00040000
> > -
OK, accepted.

> 
> Unrelated change.
> 
> >   #define DEV_RX_OFFLOAD_CHECKSUM (DEV_RX_OFFLOAD_IPV4_CKSUM
> | \
> >   				 DEV_RX_OFFLOAD_UDP_CKSUM | \
> >   				 DEV_RX_OFFLOAD_TCP_CKSUM)
> 
> [snip]
> 
> > diff --git a/lib/librte_ethdev/rte_flow.c
> > b/lib/librte_ethdev/rte_flow.c index ca0f680..6090177 100644
> > --- a/lib/librte_ethdev/rte_flow.c
> > +++ b/lib/librte_ethdev/rte_flow.c
> > @@ -12,10 +12,18 @@
> >   #include <rte_errno.h>
> >   #include <rte_branch_prediction.h>
> >   #include <rte_string_fns.h>
> > +#include <rte_mbuf.h>
> > +#include <rte_mbuf_dyn.h>
> >   #include "rte_ethdev.h"
> >   #include "rte_flow_driver.h"
> >   #include "rte_flow.h"
> >
> > +/* Mbuf dynamic field name for metadata. */ int
> > +rte_flow_dynf_metadata_offs = -1;
> > +
> > +/* Mbuf dynamic field flag bit number for metadata. */ uint64_t
> > +rte_flow_dynf_metadata_mask;
> > +
> >   /**
> >    * Flow elements description tables.
> >    */
> > @@ -157,8 +165,41 @@ struct rte_flow_desc_data {
> >   	MK_FLOW_ACTION(DEC_TCP_SEQ, sizeof(rte_be32_t)),
> >   	MK_FLOW_ACTION(INC_TCP_ACK, sizeof(rte_be32_t)),
> >   	MK_FLOW_ACTION(DEC_TCP_ACK, sizeof(rte_be32_t)),
> > +	MK_FLOW_ACTION(SET_META, sizeof(struct
> rte_flow_action_set_meta)),
> >   };
> >
> > +int
> > +rte_flow_dynf_metadata_register(void)
> > +{
> > +	int offset;
> > +	int flag;
> > +
> > +	static const struct rte_mbuf_dynfield desc_offs = {
> > +		.name = MBUF_DYNF_METADATA_NAME,
> > +		.size = sizeof(uint32_t),
> > +		.align = __alignof__(uint32_t),
> > +		.flags = 0,
> 
> I think flags initialization to 0 is redundant.
It was left just for reminding that field exist. Do you think we do not need the reminding? OK.

> 
> > +	};
> > +	static const struct rte_mbuf_dynflag desc_flag = {
> > +		.name = MBUF_DYNF_METADATA_NAME,
> > +	};
> > +
> > +	offset = rte_mbuf_dynfield_register(&desc_offs);
> > +	if (offset < 0)
> > +		goto error;
> > +	flag = rte_mbuf_dynflag_register(&desc_flag);
> > +	if (flag < 0)
> > +		goto error;
> > +	rte_flow_dynf_metadata_offs = offset;
> > +	rte_flow_dynf_metadata_mask = (1ULL << flag);
> > +	return 0;
> > +
> > +error:
> 
> Just an observation...
> Impossibility to unregister hits here. Field may be registered, but will be used.

Metadata field is useless without flag. Yes, we have no opportunity to unregister,
so we just forget about "field with no flag"  and that's it.

> 
> > +	rte_flow_dynf_metadata_offs = -1;
> > +	rte_flow_dynf_metadata_mask = 0ULL;
> > +	return -rte_errno;
> > +}
> > +
> >   static int
> >   flow_err(uint16_t port_id, int ret, struct rte_flow_error *error)
> >   {
> > diff --git a/lib/librte_ethdev/rte_flow.h
> > b/lib/librte_ethdev/rte_flow.h index 4fee105..b821557 100644
> > --- a/lib/librte_ethdev/rte_flow.h
> > +++ b/lib/librte_ethdev/rte_flow.h
> > @@ -28,6 +28,8 @@
> >   #include <rte_byteorder.h>
> >   #include <rte_esp.h>
> >   #include <rte_higig.h>
> > +#include <rte_mbuf.h>
> > +#include <rte_mbuf_dyn.h>
> >
> >   #ifdef __cplusplus
> >   extern "C" {
> > @@ -418,7 +420,8 @@ enum rte_flow_item_type {
> >   	/**
> >   	 * [META]
> >   	 *
> > -	 * Matches a metadata value specified in mbuf metadata field.
> > +	 * Matches a metadata value.
> > +	 *
> >   	 * See struct rte_flow_item_meta.
> >   	 */
> >   	RTE_FLOW_ITEM_TYPE_META,
> > @@ -1263,9 +1266,17 @@ struct rte_flow_item_icmp6_nd_opt_tla_eth {
> >   #endif
> >
> >   /**
> > - * RTE_FLOW_ITEM_TYPE_META.
> > + * @warning
> > + * @b EXPERIMENTAL: this structure may change without prior notice
> 
> Is it allowed to make experimental back?
I think we should remove EXPERIMENTAL here. We do not introduce new
feature, but just extend the apply area.

> 
> >    *
> > - * Matches a specified metadata value.
> > + * RTE_FLOW_ITEM_TYPE_META
> > + *
> > + * Matches a specified metadata value. On egress, metadata can be set
> > + either by
> > + * mbuf tx_metadata field with PKT_TX_METADATA flag or
> > + * RTE_FLOW_ACTION_TYPE_SET_META. On ingress,
> > + RTE_FLOW_ACTION_TYPE_SET_META sets
> > + * metadata for a packet and the metadata will be reported via mbuf
> > + metadata
> > + * dynamic field with PKT_RX_DYNF_METADATA flag. The dynamic mbuf
> > + field must be
> > + * registered in advance by rte_flow_dynf_metadata_register().
> >    */
> >   struct rte_flow_item_meta {
> >   	rte_be32_t data;
> 
> [snip]
> 
> > @@ -2429,6 +2447,55 @@ struct rte_flow_action_set_mac {
> >   	uint8_t mac_addr[RTE_ETHER_ADDR_LEN];
> >   };
> >
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this structure may change without prior notice
> > + *
> > + * RTE_FLOW_ACTION_TYPE_SET_META
> > + *
> > + * Set metadata. Metadata set by mbuf tx_metadata field with
> > + * PKT_TX_METADATA flag on egress will be overridden by this action.
> > +On
> > + * ingress, the metadata will be carried by mbuf metadata dynamic
> > +field
> > + * with PKT_RX_DYNF_METADATA flag if set.  The dynamic mbuf field
> > +must be
> > + * registered in advance by rte_flow_dynf_metadata_register().
> > + *
> > + * Altering partial bits is supported with mask. For bits which have
> > +never
> > + * been set, unpredictable value will be seen depending on driver
> > + * implementation. For loopback/hairpin packet, metadata set on Rx/Tx
> > +may
> > + * or may not be propagated to the other path depending on HW
> capability.
> > + *
> > + * RTE_FLOW_ITEM_TYPE_META matches metadata.
> > + */
> > +struct rte_flow_action_set_meta {
> > +	rte_be32_t data;
> > +	rte_be32_t mask;
> 
> As I understand tx_metadata is host endian. Just double-checking.
> Is a new dynamic field host endian or big endian?
> I definitely would like to see motivation in comments why data/mask are big-
> endian here.

metadata is opaque value, endianness does not matter, there are no some 
special motivations for choosing endiannes. rte_flow_item_meta() structure
provides data with rte_be32_t type, so meta related action does the same. 
I could assume the origin of selecting bigendian type was the endianness
of metadata field in Tx descriptor of ConnectX NICs.

> 
> > +};
> > +
> > +/* Mbuf dynamic field offset for metadata. */ extern int
> > +rte_flow_dynf_metadata_offs;
> > +
> > +/* Mbuf dynamic field flag mask for metadata. */ extern uint64_t
> > +rte_flow_dynf_metadata_mask;
> 
> These two global variables look frightening to me.
> It does not look good to me.
For me too. But we need the performance, these ones are 
intended for usage in datapath, any overhead is painful.

> 
> > +
> > +/* Mbuf dynamic field pointer for metadata. */ #define
> > +RTE_FLOW_DYNF_METADATA(m) \
> > +	RTE_MBUF_DYNFIELD((m), rte_flow_dynf_metadata_offs, uint32_t
> *)
> > +
> > +/* Mbuf dynamic flag for metadata. */ #define PKT_RX_DYNF_METADATA
> > +(rte_flow_dynf_metadata_mask)
> > +
> > +__rte_experimental
> > +static inline uint32_t
> > +rte_flow_dynf_metadata_get(struct rte_mbuf *m) {
> 
> Above curly bracket should be on its own line in the case of function
> definition.
> 
> Shouldn't m be 'const struct rte_mbuf *'?
You are right, it would be better, will update.
> 
> > +	return *RTE_FLOW_DYNF_METADATA(m);
> > +}
> > +
> > +__rte_experimental
> > +static inline void
> > +rte_flow_dynf_metadata_set(struct rte_mbuf *m, uint32_t v) {
> 
> Above curly bracket should be on its own line in the case of function
> definition.
> 
> > +	*RTE_FLOW_DYNF_METADATA(m) = v;
> > +}
> > +
> >   /*
> >    * Definition of a single action.
> >    *
> > @@ -2662,6 +2729,32 @@ enum rte_flow_conv_op {
> >   };
> >
> >   /**
> > + * Check if mbuf dynamic field for metadata is registered.
> > + *
> > + * @return
> > + *   True if registered, false otherwise.
> > + */
> > +__rte_experimental
> > +static inline int
> > +rte_flow_dynf_metadata_avail(void) {
> 
> Above curly bracket should be on its own line in the case of function
> definition.
> 
> > +	return !!rte_flow_dynf_metadata_mask; }
> > +
> > +/**
> > + * Register mbuf dynamic field and flag for metadata.
> > + *
> > + * This function must be called prior to use SET_META action in order
> > +to
> > + * register the dynamic mbuf field. Otherwise, the data cannot be
> > +delivered to
> > + * application.
> > + *
> > + * @return
> > + *   0 on success, a negative errno value otherwise and rte_errno is set.
> > + */
> > +__rte_experimental
> > +int
> > +rte_flow_dynf_metadata_register(void);
> > +
> > +/**
> >    * Check whether a flow rule can be created on a given port.
> >    *
> >    * The flow rule is validated for correctness and whether it could
> > be accepted diff --git a/lib/librte_mbuf/rte_mbuf_dyn.h
> > b/lib/librte_mbuf/rte_mbuf_dyn.h index 2e9d418..a4a0cf5 100644
> > --- a/lib/librte_mbuf/rte_mbuf_dyn.h
> > +++ b/lib/librte_mbuf/rte_mbuf_dyn.h
> > @@ -234,6 +234,10 @@ int rte_mbuf_dynflag_lookup(const char *name,
> >   __rte_experimental
> >   void rte_mbuf_dyn_dump(FILE *out);
> >
> > -/* Placeholder for dynamic fields and flags declarations. */
> > -
> > +/*
> > + * Placeholder for dynamic fields and flags declarations.
> > + * This is centralizing point to gather all field names
> > + * and parameters together.
> > + */
> 
> It is not a comment for below define. So, I think empty line is required to
> separate the comment from below define.
> I'm not sure that the clarification is required, but it is up to Olivier.
> 
> > +#define MBUF_DYNF_METADATA_NAME "rte_flow_dynfield_metadata"
> 
> Empty line is missing here
Thanks, will add one.

> 
> >   #endif

With best regards, Slava
Slava Ovsiienko Oct. 29, 2019, 5:43 p.m. UTC | #6
Hi, Olivier

Thanks a lot for the review.

> -----Original Message-----
> From: Olivier Matz <olivier.matz@6wind.com>
> Sent: Tuesday, October 29, 2019 18:25
> To: Slava Ovsiienko <viacheslavo@mellanox.com>
> Cc: dev@dpdk.org; Thomas Monjalon <thomas@monjalon.net>; Matan
> Azrad <matan@mellanox.com>; Ori Kam <orika@mellanox.com>; Yongseok
> Koh <yskoh@mellanox.com>
> Subject: Re: [PATCH v4] ethdev: extend flow metadata
> 
> Hi Slava,
> 
> Looks good to me overall. Few minor comments below.
> 
> On Sun, Oct 27, 2019 at 06:40:36PM +0000, Viacheslav Ovsiienko wrote:
> > Currently, metadata can be set on egress path via mbuf tx_metadata
> > field with PKT_TX_METADATA flag and RTE_FLOW_ITEM_TYPE_META
> matches metadata.
> >
> > This patch extends the metadata feature usability.
> >
> > 1) RTE_FLOW_ACTION_TYPE_SET_META
> >
> > When supporting multiple tables, Tx metadata can also be set by a rule
> > and matched by another rule. This new action allows metadata to be set
> > as a result of flow match.
> >
> > 2) Metadata on ingress
> >
> > There's also need to support metadata on ingress. Metadata can be set
> > by SET_META action and matched by META item like Tx. The final value
> > set by the action will be delivered to application via metadata
> > dynamic field of mbuf which can be accessed by
> RTE_FLOW_DYNF_METADATA().
> > PKT_RX_DYNF_METADATA flag will be set along with the data.
> >
> > The mbuf dynamic field must be registered by calling
> > rte_flow_dynf_metadata_register() prior to use SET_META action.
> >
> > The availability of dynamic mbuf metadata field can be checked with
> > rte_flow_dynf_metadata_avail() routine.
> >
> > For loopback/hairpin packet, metadata set on Rx/Tx may or may not be
> > propagated to the other path depending on hardware capability.
> >
> > Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
> > Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> 
> (...)
> 
> > diff --git a/lib/librte_ethdev/rte_ethdev.h
> > b/lib/librte_ethdev/rte_ethdev.h index c36c1b6..b19c86b 100644
> > --- a/lib/librte_ethdev/rte_ethdev.h
> > +++ b/lib/librte_ethdev/rte_ethdev.h
> > @@ -1048,7 +1048,6 @@ struct rte_eth_conf {
> >  #define DEV_RX_OFFLOAD_KEEP_CRC		0x00010000
> >  #define DEV_RX_OFFLOAD_SCTP_CKSUM	0x00020000
> >  #define DEV_RX_OFFLOAD_OUTER_UDP_CKSUM  0x00040000
> > -
> >  #define DEV_RX_OFFLOAD_CHECKSUM (DEV_RX_OFFLOAD_IPV4_CKSUM
> | \
> >  				 DEV_RX_OFFLOAD_UDP_CKSUM | \
> >  				 DEV_RX_OFFLOAD_TCP_CKSUM)
> 
> Undue removed line here.

Right, will fix.

> 
> (...)
> 
> > +/* Mbuf dynamic field offset for metadata. */ extern int
> > +rte_flow_dynf_metadata_offs;
> > +
> > +/* Mbuf dynamic field flag mask for metadata. */ extern uint64_t
> > +rte_flow_dynf_metadata_mask;
> > +
> > +/* Mbuf dynamic field pointer for metadata. */ #define
> > +RTE_FLOW_DYNF_METADATA(m) \
> > +	RTE_MBUF_DYNFIELD((m), rte_flow_dynf_metadata_offs, uint32_t
> *)
> > +
> > +/* Mbuf dynamic flag for metadata. */ #define PKT_RX_DYNF_METADATA
> > +(rte_flow_dynf_metadata_mask)
> > +
> > +__rte_experimental
> > +static inline uint32_t
> > +rte_flow_dynf_metadata_get(struct rte_mbuf *m) {
> > +	return *RTE_FLOW_DYNF_METADATA(m);
> > +}
> > +
> > +__rte_experimental
> > +static inline void
> > +rte_flow_dynf_metadata_set(struct rte_mbuf *m, uint32_t v) {
> > +	*RTE_FLOW_DYNF_METADATA(m) = v;
> > +}
> > +
> 
> (...)
> 
> > +__rte_experimental
> > +static inline int
> > +rte_flow_dynf_metadata_avail(void) {
> > +       return !!rte_flow_dynf_metadata_mask; }
> 
> I think, in DPDK:
> 
> 	static inline void
> 	rte_flow_dynf_metadata_set(struct rte_mbuf *m, uint32_t v)
> 	{
> 		...
> 
> is prefered over:
> 
> 	static inline void
> 	rte_flow_dynf_metadata_set(struct rte_mbuf *m, uint32_t v) {

Right. It is strange it passed the validator. Will fix.
> 		...
> 
> > --- a/lib/librte_mbuf/rte_mbuf_dyn.h
> > +++ b/lib/librte_mbuf/rte_mbuf_dyn.h
> > @@ -234,6 +234,10 @@ int rte_mbuf_dynflag_lookup(const char *name,
> > __rte_experimental  void rte_mbuf_dyn_dump(FILE *out);
> >
> > -/* Placeholder for dynamic fields and flags declarations. */
> > -
> > +/*
> > + * Placeholder for dynamic fields and flags declarations.
> > + * This is centralizing point to gather all field names
> > + * and parameters together.
> > + */
> > +#define MBUF_DYNF_METADATA_NAME "rte_flow_dynfield_metadata"
> >  #endif
> 
> The RTE_ prefix is missing. Also, thi name is called dynfield but it is used for
> both field and flag. I suggest RTE_MBUF_DYNFIELD_METADATA_NAME and
> RTE_MBUF_DYNFLAG_METADATA_NAME, to be consistent with the other
> naming conventions in rte_mbuf_dyn.[ch].

Well, it makes sense for complex features, say, with multiple dynaflags.
But it is OK for me, will update.

> 
> One more comment: as previously discussed, changing the size or alignement
> of a dynamic field should not be allowed, because it can break the users of
> the field.
> 
> Depending on how it is implemented (is the registration function inline?
> is the rte_mbuf_dynfield structure private, shared, or static const in a .h? are
> we using #defines for name, size, align?), I think the impact on users will be
> different. This is something we need to think about for next versions: how to
> detect these changes before pushing the commit, and/or at runtime?
> 
I'm not sure if we will have a lot of implementation invariants for dynamic fields.
These ones are intended to be used in datapath, performance is a king. I think if 
someone invent the more efficient way to handle dynafields, we'll update the
metadata code either.


> Regards,
> Olivier
Slava Ovsiienko Oct. 29, 2019, 5:53 p.m. UTC | #7
> -----Original Message-----
> From: Olivier Matz <olivier.matz@6wind.com>
> Sent: Tuesday, October 29, 2019 18:34
> To: Slava Ovsiienko <viacheslavo@mellanox.com>
> Cc: dev@dpdk.org; Thomas Monjalon <thomas@monjalon.net>; Matan
> Azrad <matan@mellanox.com>; Ori Kam <orika@mellanox.com>; Yongseok
> Koh <yskoh@mellanox.com>
> Subject: Re: [PATCH v4] ethdev: extend flow metadata
> 
> On Tue, Oct 29, 2019 at 05:25:22PM +0100, Olivier Matz wrote:
> > Hi Slava,
> >
> > Looks good to me overall. Few minor comments below.

[snip]

> 
> I forgot: can you please document the goal/usage of these field and flag
> here?  Not necessarily a detailed explanation, but a high level view:
> what is transported, when it is registered, ...
> 
Even the brief explanation is wordy, because it involves some 
metadata features history, I will think how I could elaborate the 
description.

With best regards, Slava
Thomas Monjalon Oct. 29, 2019, 6:30 p.m. UTC | #8
29/10/2019 18:19, Slava Ovsiienko:
> From: Andrew Rybchenko <arybchenko@solarflare.com>
> > > --- a/doc/guides/rel_notes/deprecation.rst
> > > +++ b/doc/guides/rel_notes/deprecation.rst
> > > +* ethdev: DEV_TX_OFFLOAD_MATCH_METADATA will be removed, static
> > > +metadata
> > > +  mbuf field will be removed in 20.02, metadata feature will use
> > > +dynamic mbuf
> > > +  field and flag instead.
> > > +
> > 
> > Isn't it breaking stable API/ABI? Should target release be 20.11?
> 
> tx_metadata is in union, it should not be ABI break.
> And we propose to remove tx_metadata at all in 19.11 
> and share the dynamic metadata field between Rx and Tx METAdata.

Yes please, let's remove tx_metadata from mbuf now,
while adding metadata dynamic field.
I am sure nobody will complain that we sanitized this feature
before releasing DPDK 19.11 LTS.
Slava Ovsiienko Oct. 29, 2019, 6:35 p.m. UTC | #9
> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Tuesday, October 29, 2019 20:30
> To: Slava Ovsiienko <viacheslavo@mellanox.com>
> Cc: Andrew Rybchenko <arybchenko@solarflare.com>; dev@dpdk.org;
> Matan Azrad <matan@mellanox.com>; olivier.matz@6wind.com; Ori Kam
> <orika@mellanox.com>
> Subject: Re: [dpdk-dev] [PATCH v4] ethdev: extend flow metadata
> 
> 29/10/2019 18:19, Slava Ovsiienko:
> > From: Andrew Rybchenko <arybchenko@solarflare.com>
> > > > --- a/doc/guides/rel_notes/deprecation.rst
> > > > +++ b/doc/guides/rel_notes/deprecation.rst
> > > > +* ethdev: DEV_TX_OFFLOAD_MATCH_METADATA will be removed,
> static
> > > > +metadata
> > > > +  mbuf field will be removed in 20.02, metadata feature will use
> > > > +dynamic mbuf
> > > > +  field and flag instead.
> > > > +
> > >
> > > Isn't it breaking stable API/ABI? Should target release be 20.11?
> >
> > tx_metadata is in union, it should not be ABI break.
> > And we propose to remove tx_metadata at all in 19.11 and share the
> > dynamic metadata field between Rx and Tx METAdata.
> 
> Yes please, let's remove tx_metadata from mbuf now, while adding
> metadata dynamic field.
> I am sure nobody will complain that we sanitized this feature before
> releasing DPDK 19.11 LTS.

OK, I'll extend this patch to small series and update tx_metadata in the
dedicated commit (to simplify reviewing either).

With best regards, Slava
Andrew Rybchenko Oct. 30, 2019, 6:28 a.m. UTC | #10
On 10/29/19 9:30 PM, Thomas Monjalon wrote:
> 29/10/2019 18:19, Slava Ovsiienko:
>> From: Andrew Rybchenko <arybchenko@solarflare.com>
>>>> --- a/doc/guides/rel_notes/deprecation.rst
>>>> +++ b/doc/guides/rel_notes/deprecation.rst
>>>> +* ethdev: DEV_TX_OFFLOAD_MATCH_METADATA will be removed, static
>>>> +metadata
>>>> +  mbuf field will be removed in 20.02, metadata feature will use
>>>> +dynamic mbuf
>>>> +  field and flag instead.
>>>> +
>>> Isn't it breaking stable API/ABI? Should target release be 20.11?
>> tx_metadata is in union, it should not be ABI break.
>> And we propose to remove tx_metadata at all in 19.11
>> and share the dynamic metadata field between Rx and Tx METAdata.
> Yes please, let's remove tx_metadata from mbuf now,
> while adding metadata dynamic field.
> I am sure nobody will complain that we sanitized this feature
> before releasing DPDK 19.11 LTS.

OK for me.
Andrew Rybchenko Oct. 30, 2019, 7:35 a.m. UTC | #11
@Olivier, please, take a look at the end of the mail.

On 10/29/19 8:19 PM, Slava Ovsiienko wrote:
> Hi, Andrew
>
> Thank you for the review.
>
>> -----Original Message-----
>> From: Andrew Rybchenko <arybchenko@solarflare.com>
>> Sent: Tuesday, October 29, 2019 18:22
>> To: Slava Ovsiienko <viacheslavo@mellanox.com>; dev@dpdk.org
>> Cc: Thomas Monjalon <thomas@monjalon.net>; Matan Azrad
>> <matan@mellanox.com>; olivier.matz@6wind.com; Ori Kam
>> <orika@mellanox.com>; Yongseok Koh <yskoh@mellanox.com>
>> Subject: Re: [dpdk-dev] [PATCH v4] ethdev: extend flow metadata
>>
>> On 10/27/19 9:40 PM, Viacheslav Ovsiienko wrote:
>>> Currently, metadata can be set on egress path via mbuf tx_metadata
>>> field with PKT_TX_METADATA flag and RTE_FLOW_ITEM_TYPE_META
>> matches metadata.
>>> This patch extends the metadata feature usability.
>>>
>>> 1) RTE_FLOW_ACTION_TYPE_SET_META
>>>
>>> When supporting multiple tables, Tx metadata can also be set by a rule
>>> and matched by another rule. This new action allows metadata to be set
>>> as a result of flow match.
>>>
>>> 2) Metadata on ingress
>>>
>>> There's also need to support metadata on ingress. Metadata can be set
>>> by SET_META action and matched by META item like Tx. The final value
>>> set by the action will be delivered to application via metadata
>>> dynamic field of mbuf which can be accessed by
>> RTE_FLOW_DYNF_METADATA().
>>> PKT_RX_DYNF_METADATA flag will be set along with the data.
>>>
>>> The mbuf dynamic field must be registered by calling
>>> rte_flow_dynf_metadata_register() prior to use SET_META action.
>>>
>>> The availability of dynamic mbuf metadata field can be checked with
>>> rte_flow_dynf_metadata_avail() routine.
>>>
>>> For loopback/hairpin packet, metadata set on Rx/Tx may or may not be
>>> propagated to the other path depending on hardware capability.
>>>
>>> Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
>>> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
>> Above explanations lack information about "meta" vs "mark" which may be
>> set on Rx as well and delivered in other mbuf field.
>> It should be explained by one more field is required and rules defined.
> There is some story about metadata features.
> Initially, there were proposed two metadata related actions:
>
> - RTE_FLOW_ACTION_TYPE_FLAG
> - RTE_FLOW_ACTION_TYPE_MARK
>
> These actions set the special flag in the packet metadata, MARK action stores some
> specified value in the metadata storage, and, on the packet receiving PMD puts the flag
> and value to the mbuf and applications can see the packet was threated inside flow engine
> according to the appropriate RTE flow(s). MARK and FLAG are like some kind of gateway
> to transfer some per-packet information from the flow engine to the application
> via receiving datapath.
>
>  From the datapath point of view, the MARK and FLAG are related to the receiving side only.
> It would useful to have the same gateway on the transmitting side and there was the feature
> of type RTE_FLOW_ITEM_TYPE_META was proposed. The application can fill the field in mbuf
> and this value will be transferred to some field in the packet metadata inside the flow engine.
> It did not matter whether these metadata fields are shared because of MARK and META items
> belonged to different domains (receiving and transmitting) and could be vendor-specific.
>
> So far, so good, DPDK proposes some entities to control metadata inside the flow engine
> and gateways to exchange these values on a per-packet basis via datapaths.
>
> As we can see, the MARK and META means are not symmetric, there is absent action which
> would allow us to set META value on the transmitting path. So, the action of type:
> - RTE_FLOW_ACTION_TYPE_SET_META is proposed.
>
> The next, applications raise the new requirements for packet metadata. The flow engines are
> getting more complex, internal switches are introduced, multiple ports might be supported within
> the same flow engine namespace. From the DPDK points of view, it means the packets might be sent
> on one eth_dev port and received on the other one, and the packet path inside the flow engine entirely
> belongs to the same hardware device. The simplest example is SR-IOV with PF, VFs and the representors.
> And there is a brilliant opportunity to provide some out-of-band channel to transfer some extra data
>   from one port to another one, besides the packet data itself.
>
>
>> Above explanations lack information about "meta" vs "mark" which may be
>> set on Rx as well and delivered in other mbuf field.
>> It should be explained by one more field is required and rules defined.
>> Otherwise we can endup in half PMDs supporting mark only, half PMDs
>> supporting meta only and applications in an interesting situation to make a
>> choice which one to use.
> There is no "mark" vs "meta". MARK and META means are kept for compatibility issues
> and legacy part works exactly as before. The trials (with flow_validate)  is supposed
> to check whether PMD supports MARK or META feature on appropriate domain. It depends
> on PMD implementation, configuration and underlaying HW/FW/kernel capabilities and
> should be resolved in runtime.

The trials a way, but very tricky way. My imagination draws me
pictures how an application code could look like in attempt to use
either mark or meta for Rx only and these pictures are not nice.
May be it will look acceptable when mark becomes a dynamic
since usage of either one or another dynamic field is definitely
easier than usage of either fixed or dynamic field.

May be dynamic field for mark at fixed offset should be
introduced in the release or the nearest future? It will allow
to preserve ABI up to 20.11 and provide future proof API.
The trick is to register dynamic meta field at fixed offset
at start of a day to be sure that it is guaranteed to succeed.
It sounds like it is a transition mechanism from fixed to
dynamic fields.

[snip]

>>> diff --git a/lib/librte_ethdev/rte_flow.h
>>> b/lib/librte_ethdev/rte_flow.h index 4fee105..b821557 100644
>>> --- a/lib/librte_ethdev/rte_flow.h
>>> +++ b/lib/librte_ethdev/rte_flow.h
>>> @@ -28,6 +28,8 @@
>>>    #include <rte_byteorder.h>
>>>    #include <rte_esp.h>
>>>    #include <rte_higig.h>
>>> +#include <rte_mbuf.h>
>>> +#include <rte_mbuf_dyn.h>
>>>
>>>    #ifdef __cplusplus
>>>    extern "C" {
>>> @@ -418,7 +420,8 @@ enum rte_flow_item_type {
>>>    	/**
>>>    	 * [META]
>>>    	 *
>>> -	 * Matches a metadata value specified in mbuf metadata field.
>>> +	 * Matches a metadata value.
>>> +	 *
>>>    	 * See struct rte_flow_item_meta.
>>>    	 */
>>>    	RTE_FLOW_ITEM_TYPE_META,
>>> @@ -1263,9 +1266,17 @@ struct rte_flow_item_icmp6_nd_opt_tla_eth {
>>>    #endif
>>>
>>>    /**
>>> - * RTE_FLOW_ITEM_TYPE_META.
>>> + * @warning
>>> + * @b EXPERIMENTAL: this structure may change without prior notice
>> Is it allowed to make experimental back?
> I think we should remove EXPERIMENTAL here. We do not introduce new
> feature, but just extend the apply area.

Agreed.

>>>     *
>>> - * Matches a specified metadata value.
>>> + * RTE_FLOW_ITEM_TYPE_META
>>> + *
>>> + * Matches a specified metadata value. On egress, metadata can be set
>>> + either by
>>> + * mbuf tx_metadata field with PKT_TX_METADATA flag or
>>> + * RTE_FLOW_ACTION_TYPE_SET_META. On ingress,
>>> + RTE_FLOW_ACTION_TYPE_SET_META sets
>>> + * metadata for a packet and the metadata will be reported via mbuf
>>> + metadata
>>> + * dynamic field with PKT_RX_DYNF_METADATA flag. The dynamic mbuf
>>> + field must be
>>> + * registered in advance by rte_flow_dynf_metadata_register().
>>>     */
>>>    struct rte_flow_item_meta {
>>>    	rte_be32_t data;
>> [snip]
>>
>>> @@ -2429,6 +2447,55 @@ struct rte_flow_action_set_mac {
>>>    	uint8_t mac_addr[RTE_ETHER_ADDR_LEN];
>>>    };
>>>
>>> +/**
>>> + * @warning
>>> + * @b EXPERIMENTAL: this structure may change without prior notice
>>> + *
>>> + * RTE_FLOW_ACTION_TYPE_SET_META
>>> + *
>>> + * Set metadata. Metadata set by mbuf tx_metadata field with
>>> + * PKT_TX_METADATA flag on egress will be overridden by this action.
>>> +On
>>> + * ingress, the metadata will be carried by mbuf metadata dynamic
>>> +field
>>> + * with PKT_RX_DYNF_METADATA flag if set.  The dynamic mbuf field
>>> +must be
>>> + * registered in advance by rte_flow_dynf_metadata_register().
>>> + *
>>> + * Altering partial bits is supported with mask. For bits which have
>>> +never
>>> + * been set, unpredictable value will be seen depending on driver
>>> + * implementation. For loopback/hairpin packet, metadata set on Rx/Tx
>>> +may
>>> + * or may not be propagated to the other path depending on HW
>> capability.
>>> + *
>>> + * RTE_FLOW_ITEM_TYPE_META matches metadata.
>>> + */
>>> +struct rte_flow_action_set_meta {
>>> +	rte_be32_t data;
>>> +	rte_be32_t mask;
>> As I understand tx_metadata is host endian. Just double-checking.
>> Is a new dynamic field host endian or big endian?
>> I definitely would like to see motivation in comments why data/mask are big-
>> endian here.
> metadata is opaque value, endianness does not matter, there are no some
> special motivations for choosing endiannes. rte_flow_item_meta() structure
> provides data with rte_be32_t type, so meta related action does the same.

Endianness of meta in mbuf and flow API should match and it must be
documented. Endianness is important if a HW supports less bits since
it makes a hit for application to use LSB first if the bit space is 
sufficient.
mark is defined as host-endian (uint32_t) and I think meta should be the
same. Otherwise it complicates even more either mark or meta usage
as discussed above .

Yes, I think that rte_flow_item_meta should be fixed since both
mark and tx_metadata are host-endian.

(it says nothing about HW interface which is vendor specific and
vendor PMDs should care about it)

> I could assume the origin of selecting bigendian type was the endianness
> of metadata field in Tx descriptor of ConnectX NICs.
>
>>> +};
>>> +
>>> +/* Mbuf dynamic field offset for metadata. */ extern int
>>> +rte_flow_dynf_metadata_offs;
>>> +
>>> +/* Mbuf dynamic field flag mask for metadata. */ extern uint64_t
>>> +rte_flow_dynf_metadata_mask;
>> These two global variables look frightening to me.
>> It does not look good to me.
> For me too. But we need the performance, these ones are
> intended for usage in datapath, any overhead is painful.

@Olivier, could you share your thoughts, please.

Andrew.
Slava Ovsiienko Oct. 30, 2019, 8:59 a.m. UTC | #12
> -----Original Message-----
> From: Andrew Rybchenko <arybchenko@solarflare.com>
> Sent: Wednesday, October 30, 2019 9:35
> To: Slava Ovsiienko <viacheslavo@mellanox.com>; Thomas Monjalon
> <thomas@monjalon.net>; olivier.matz@6wind.com
> Cc: dev@dpdk.org; Matan Azrad <matan@mellanox.com>; Ori Kam
> <orika@mellanox.com>; Yongseok Koh <yskoh@mellanox.com>
> Subject: Re: [dpdk-dev] [PATCH v4] ethdev: extend flow metadata
> 
> @Olivier, please, take a look at the end of the mail.
> 
> On 10/29/19 8:19 PM, Slava Ovsiienko wrote:
> > Hi, Andrew
> >
> > Thank you for the review.
> >
> >> -----Original Message-----
> >> From: Andrew Rybchenko <arybchenko@solarflare.com>
> >> Sent: Tuesday, October 29, 2019 18:22
> >> To: Slava Ovsiienko <viacheslavo@mellanox.com>; dev@dpdk.org
> >> Cc: Thomas Monjalon <thomas@monjalon.net>; Matan Azrad
> >> <matan@mellanox.com>; olivier.matz@6wind.com; Ori Kam
> >> <orika@mellanox.com>; Yongseok Koh <yskoh@mellanox.com>
> >> Subject: Re: [dpdk-dev] [PATCH v4] ethdev: extend flow metadata
> >>
> >> On 10/27/19 9:40 PM, Viacheslav Ovsiienko wrote:
> >>> Currently, metadata can be set on egress path via mbuf tx_metadata
> >>> field with PKT_TX_METADATA flag and RTE_FLOW_ITEM_TYPE_META
> >> matches metadata.
> >>> This patch extends the metadata feature usability.
> >>>
> >>> 1) RTE_FLOW_ACTION_TYPE_SET_META
> >>>
> >>> When supporting multiple tables, Tx metadata can also be set by a
> >>> rule and matched by another rule. This new action allows metadata to
> >>> be set as a result of flow match.
> >>>
> >>> 2) Metadata on ingress
> >>>
> >>> There's also need to support metadata on ingress. Metadata can be
> >>> set by SET_META action and matched by META item like Tx. The final
> >>> value set by the action will be delivered to application via
> >>> metadata dynamic field of mbuf which can be accessed by
> >> RTE_FLOW_DYNF_METADATA().
> >>> PKT_RX_DYNF_METADATA flag will be set along with the data.
> >>>
> >>> The mbuf dynamic field must be registered by calling
> >>> rte_flow_dynf_metadata_register() prior to use SET_META action.
> >>>
> >>> The availability of dynamic mbuf metadata field can be checked with
> >>> rte_flow_dynf_metadata_avail() routine.
> >>>
> >>> For loopback/hairpin packet, metadata set on Rx/Tx may or may not be
> >>> propagated to the other path depending on hardware capability.
> >>>
> >>> Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
> >>> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> >> Above explanations lack information about "meta" vs "mark" which may
> >> be set on Rx as well and delivered in other mbuf field.
> >> It should be explained by one more field is required and rules defined.
> > There is some story about metadata features.
> > Initially, there were proposed two metadata related actions:
> >
> > - RTE_FLOW_ACTION_TYPE_FLAG
> > - RTE_FLOW_ACTION_TYPE_MARK
> >
> > These actions set the special flag in the packet metadata, MARK action
> > stores some specified value in the metadata storage, and, on the
> > packet receiving PMD puts the flag and value to the mbuf and
> > applications can see the packet was threated inside flow engine
> > according to the appropriate RTE flow(s). MARK and FLAG are like some
> > kind of gateway to transfer some per-packet information from the flow
> engine to the application via receiving datapath.
> >
> >  From the datapath point of view, the MARK and FLAG are related to the
> receiving side only.
> > It would useful to have the same gateway on the transmitting side and
> > there was the feature of type RTE_FLOW_ITEM_TYPE_META was
> proposed.
> > The application can fill the field in mbuf and this value will be transferred to
> some field in the packet metadata inside the flow engine.
> > It did not matter whether these metadata fields are shared because of
> > MARK and META items belonged to different domains (receiving and
> transmitting) and could be vendor-specific.
> >
> > So far, so good, DPDK proposes some entities to control metadata
> > inside the flow engine and gateways to exchange these values on a per-
> packet basis via datapaths.
> >
> > As we can see, the MARK and META means are not symmetric, there is
> > absent action which would allow us to set META value on the transmitting
> path. So, the action of type:
> > - RTE_FLOW_ACTION_TYPE_SET_META is proposed.
> >
> > The next, applications raise the new requirements for packet metadata.
> > The flow engines are getting more complex, internal switches are
> > introduced, multiple ports might be supported within the same flow
> > engine namespace. From the DPDK points of view, it means the packets
> > might be sent on one eth_dev port and received on the other one, and the
> packet path inside the flow engine entirely belongs to the same hardware
> device. The simplest example is SR-IOV with PF, VFs and the representors.
> > And there is a brilliant opportunity to provide some out-of-band channel to
> transfer some extra data
> >   from one port to another one, besides the packet data itself.
> >
> >
> >> Above explanations lack information about "meta" vs "mark" which may
> >> be set on Rx as well and delivered in other mbuf field.
> >> It should be explained by one more field is required and rules defined.
> >> Otherwise we can endup in half PMDs supporting mark only, half PMDs
> >> supporting meta only and applications in an interesting situation to
> >> make a choice which one to use.
> > There is no "mark" vs "meta". MARK and META means are kept for
> > compatibility issues and legacy part works exactly as before. The
> > trials (with flow_validate)  is supposed to check whether PMD supports
> > MARK or META feature on appropriate domain. It depends on PMD
> > implementation, configuration and underlaying HW/FW/kernel capabilities
> and should be resolved in runtime.
> 
> The trials a way, but very tricky way. My imagination draws me pictures how
> an application code could look like in attempt to use either mark or meta for
> Rx only and these pictures are not nice.
Agree, trials is not the best way.
For now there is no application trying to choose mark or meta, because these ones
belonged to other domains, and extension is newly introduced. 
So, the applications using mark will continue use mark, the same is regarding meta.
The new application definitely will ask for both mark and of them,
we have the requirements from customers  - "give us as many through bits as you can".
This new application just may refuse to work if metadata features are not detected,
because relays on it strongly.
BTW, trials are not so complex: rte_flow_validate(mark), rte_flow_validate_metadata()
and that's it.

> May be it will look acceptable when mark becomes a dynamic since usage of
> either one or another dynamic field is definitely easier than usage of either
> fixed or dynamic field.
At least in PMD datapath it does not look very ugly.
> 
> May be dynamic field for mark at fixed offset should be introduced in the
> release or the nearest future? It will allow to preserve ABI up to 20.11 and
> provide future proof API.
> The trick is to register dynamic meta field at fixed offset at start of a day to
> be sure that it is guaranteed to succeed.
> It sounds like it is a transition mechanism from fixed to dynamic fields.
> 
> [snip]
> 
> >>> diff --git a/lib/librte_ethdev/rte_flow.h
> >>> b/lib/librte_ethdev/rte_flow.h index 4fee105..b821557 100644
> >>> --- a/lib/librte_ethdev/rte_flow.h
> >>> +++ b/lib/librte_ethdev/rte_flow.h
> >>> @@ -28,6 +28,8 @@
> >>>    #include <rte_byteorder.h>
> >>>    #include <rte_esp.h>
> >>>    #include <rte_higig.h>
> >>> +#include <rte_mbuf.h>
> >>> +#include <rte_mbuf_dyn.h>
> >>>
> >>>    #ifdef __cplusplus
> >>>    extern "C" {
> >>> @@ -418,7 +420,8 @@ enum rte_flow_item_type {
> >>>    	/**
> >>>    	 * [META]
> >>>    	 *
> >>> -	 * Matches a metadata value specified in mbuf metadata field.
> >>> +	 * Matches a metadata value.
> >>> +	 *
> >>>    	 * See struct rte_flow_item_meta.
> >>>    	 */
> >>>    	RTE_FLOW_ITEM_TYPE_META,
> >>> @@ -1263,9 +1266,17 @@ struct rte_flow_item_icmp6_nd_opt_tla_eth
> {
> >>>    #endif
> >>>
> >>>    /**
> >>> - * RTE_FLOW_ITEM_TYPE_META.
> >>> + * @warning
> >>> + * @b EXPERIMENTAL: this structure may change without prior notice
> >> Is it allowed to make experimental back?
> > I think we should remove EXPERIMENTAL here. We do not introduce new
> > feature, but just extend the apply area.
> 
> Agreed.
> 
> >>>     *
> >>> - * Matches a specified metadata value.
> >>> + * RTE_FLOW_ITEM_TYPE_META
> >>> + *
> >>> + * Matches a specified metadata value. On egress, metadata can be
> >>> + set either by
> >>> + * mbuf tx_metadata field with PKT_TX_METADATA flag or
> >>> + * RTE_FLOW_ACTION_TYPE_SET_META. On ingress,
> >>> + RTE_FLOW_ACTION_TYPE_SET_META sets
> >>> + * metadata for a packet and the metadata will be reported via mbuf
> >>> + metadata
> >>> + * dynamic field with PKT_RX_DYNF_METADATA flag. The dynamic
> mbuf
> >>> + field must be
> >>> + * registered in advance by rte_flow_dynf_metadata_register().
> >>>     */
> >>>    struct rte_flow_item_meta {
> >>>    	rte_be32_t data;
> >> [snip]
> >>
> >>> @@ -2429,6 +2447,55 @@ struct rte_flow_action_set_mac {
> >>>    	uint8_t mac_addr[RTE_ETHER_ADDR_LEN];
> >>>    };
> >>>
> >>> +/**
> >>> + * @warning
> >>> + * @b EXPERIMENTAL: this structure may change without prior notice
> >>> + *
> >>> + * RTE_FLOW_ACTION_TYPE_SET_META
> >>> + *
> >>> + * Set metadata. Metadata set by mbuf tx_metadata field with
> >>> + * PKT_TX_METADATA flag on egress will be overridden by this action.
> >>> +On
> >>> + * ingress, the metadata will be carried by mbuf metadata dynamic
> >>> +field
> >>> + * with PKT_RX_DYNF_METADATA flag if set.  The dynamic mbuf field
> >>> +must be
> >>> + * registered in advance by rte_flow_dynf_metadata_register().
> >>> + *
> >>> + * Altering partial bits is supported with mask. For bits which
> >>> +have never
> >>> + * been set, unpredictable value will be seen depending on driver
> >>> + * implementation. For loopback/hairpin packet, metadata set on
> >>> +Rx/Tx may
> >>> + * or may not be propagated to the other path depending on HW
> >> capability.
> >>> + *
> >>> + * RTE_FLOW_ITEM_TYPE_META matches metadata.
> >>> + */
> >>> +struct rte_flow_action_set_meta {
> >>> +	rte_be32_t data;
> >>> +	rte_be32_t mask;
> >> As I understand tx_metadata is host endian. Just double-checking.
> >> Is a new dynamic field host endian or big endian?
> >> I definitely would like to see motivation in comments why data/mask
> >> are big- endian here.
> > metadata is opaque value, endianness does not matter, there are no
> > some special motivations for choosing endiannes. rte_flow_item_meta()
> > structure provides data with rte_be32_t type, so meta related action does
> the same.
> 
> Endianness of meta in mbuf and flow API should match and it must be
Yes, and they match (for meta), both are rte_be32_t. OK, will emphasize it in docs.

> documented. Endianness is important if a HW supports less bits since it
> makes a hit for application to use LSB first if the bit space is sufficient.
> mark is defined as host-endian (uint32_t) and I think meta should be the
> same. Otherwise it complicates even more either mark or meta usage as
> discussed above .
mark is mark, meta is meta, these ones are not interrelated (despite they
are becoming similar). And there is no choice between mark and meta.
It is not obvious we should have the same endianness for both.
There are some applications using this legacy features, so there might be compatibility
issues either.

> 
> Yes, I think that rte_flow_item_meta should be fixed since both mark and
> tx_metadata are host-endian.
> 
> (it says nothing about HW interface which is vendor specific and vendor
> PMDs should care about it)
Handling this in PMD might introduce the extra endianness conversion in datapath,
impacting performance. Not nice, IMO.
> 
> > I could assume the origin of selecting bigendian type was the
> > endianness of metadata field in Tx descriptor of ConnectX NICs.
> >
> >>> +};
> >>> +
> >>> +/* Mbuf dynamic field offset for metadata. */ extern int
> >>> +rte_flow_dynf_metadata_offs;
> >>> +
> >>> +/* Mbuf dynamic field flag mask for metadata. */ extern uint64_t
> >>> +rte_flow_dynf_metadata_mask;
> >> These two global variables look frightening to me.
> >> It does not look good to me.
> > For me too. But we need the performance, these ones are intended for
> > usage in datapath, any overhead is painful.
> 
> @Olivier, could you share your thoughts, please.
> 
> Andrew.

With best regards, Slava
Andrew Rybchenko Oct. 30, 2019, 9:20 a.m. UTC | #13
On 10/30/19 11:59 AM, Slava Ovsiienko wrote:
>> -----Original Message-----
>> From: Andrew Rybchenko <arybchenko@solarflare.com>
>> Sent: Wednesday, October 30, 2019 9:35
>> To: Slava Ovsiienko <viacheslavo@mellanox.com>; Thomas Monjalon
>> <thomas@monjalon.net>; olivier.matz@6wind.com
>> Cc: dev@dpdk.org; Matan Azrad <matan@mellanox.com>; Ori Kam
>> <orika@mellanox.com>; Yongseok Koh <yskoh@mellanox.com>
>> Subject: Re: [dpdk-dev] [PATCH v4] ethdev: extend flow metadata
>>
>> @Olivier, please, take a look at the end of the mail.
>>
>> On 10/29/19 8:19 PM, Slava Ovsiienko wrote:
>>> Hi, Andrew
>>>
>>> Thank you for the review.
>>>
>>>> -----Original Message-----
>>>> From: Andrew Rybchenko <arybchenko@solarflare.com>
>>>> Sent: Tuesday, October 29, 2019 18:22
>>>> To: Slava Ovsiienko <viacheslavo@mellanox.com>; dev@dpdk.org
>>>> Cc: Thomas Monjalon <thomas@monjalon.net>; Matan Azrad
>>>> <matan@mellanox.com>; olivier.matz@6wind.com; Ori Kam
>>>> <orika@mellanox.com>; Yongseok Koh <yskoh@mellanox.com>
>>>> Subject: Re: [dpdk-dev] [PATCH v4] ethdev: extend flow metadata
>>>>
>>>> On 10/27/19 9:40 PM, Viacheslav Ovsiienko wrote:
>>>>> Currently, metadata can be set on egress path via mbuf tx_metadata
>>>>> field with PKT_TX_METADATA flag and RTE_FLOW_ITEM_TYPE_META
>>>> matches metadata.
>>>>> This patch extends the metadata feature usability.
>>>>>
>>>>> 1) RTE_FLOW_ACTION_TYPE_SET_META
>>>>>
>>>>> When supporting multiple tables, Tx metadata can also be set by a
>>>>> rule and matched by another rule. This new action allows metadata to
>>>>> be set as a result of flow match.
>>>>>
>>>>> 2) Metadata on ingress
>>>>>
>>>>> There's also need to support metadata on ingress. Metadata can be
>>>>> set by SET_META action and matched by META item like Tx. The final
>>>>> value set by the action will be delivered to application via
>>>>> metadata dynamic field of mbuf which can be accessed by
>>>> RTE_FLOW_DYNF_METADATA().
>>>>> PKT_RX_DYNF_METADATA flag will be set along with the data.
>>>>>
>>>>> The mbuf dynamic field must be registered by calling
>>>>> rte_flow_dynf_metadata_register() prior to use SET_META action.
>>>>>
>>>>> The availability of dynamic mbuf metadata field can be checked with
>>>>> rte_flow_dynf_metadata_avail() routine.
>>>>>
>>>>> For loopback/hairpin packet, metadata set on Rx/Tx may or may not be
>>>>> propagated to the other path depending on hardware capability.
>>>>>
>>>>> Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
>>>>> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
>>>> Above explanations lack information about "meta" vs "mark" which may
>>>> be set on Rx as well and delivered in other mbuf field.
>>>> It should be explained by one more field is required and rules defined.
>>> There is some story about metadata features.
>>> Initially, there were proposed two metadata related actions:
>>>
>>> - RTE_FLOW_ACTION_TYPE_FLAG
>>> - RTE_FLOW_ACTION_TYPE_MARK
>>>
>>> These actions set the special flag in the packet metadata, MARK action
>>> stores some specified value in the metadata storage, and, on the
>>> packet receiving PMD puts the flag and value to the mbuf and
>>> applications can see the packet was threated inside flow engine
>>> according to the appropriate RTE flow(s). MARK and FLAG are like some
>>> kind of gateway to transfer some per-packet information from the flow
>> engine to the application via receiving datapath.
>>>   From the datapath point of view, the MARK and FLAG are related to the
>> receiving side only.
>>> It would useful to have the same gateway on the transmitting side and
>>> there was the feature of type RTE_FLOW_ITEM_TYPE_META was
>> proposed.
>>> The application can fill the field in mbuf and this value will be transferred to
>> some field in the packet metadata inside the flow engine.
>>> It did not matter whether these metadata fields are shared because of
>>> MARK and META items belonged to different domains (receiving and
>> transmitting) and could be vendor-specific.
>>> So far, so good, DPDK proposes some entities to control metadata
>>> inside the flow engine and gateways to exchange these values on a per-
>> packet basis via datapaths.
>>> As we can see, the MARK and META means are not symmetric, there is
>>> absent action which would allow us to set META value on the transmitting
>> path. So, the action of type:
>>> - RTE_FLOW_ACTION_TYPE_SET_META is proposed.
>>>
>>> The next, applications raise the new requirements for packet metadata.
>>> The flow engines are getting more complex, internal switches are
>>> introduced, multiple ports might be supported within the same flow
>>> engine namespace. From the DPDK points of view, it means the packets
>>> might be sent on one eth_dev port and received on the other one, and the
>> packet path inside the flow engine entirely belongs to the same hardware
>> device. The simplest example is SR-IOV with PF, VFs and the representors.
>>> And there is a brilliant opportunity to provide some out-of-band channel to
>> transfer some extra data
>>>    from one port to another one, besides the packet data itself.
>>>
>>>
>>>> Above explanations lack information about "meta" vs "mark" which may
>>>> be set on Rx as well and delivered in other mbuf field.
>>>> It should be explained by one more field is required and rules defined.
>>>> Otherwise we can endup in half PMDs supporting mark only, half PMDs
>>>> supporting meta only and applications in an interesting situation to
>>>> make a choice which one to use.
>>> There is no "mark" vs "meta". MARK and META means are kept for
>>> compatibility issues and legacy part works exactly as before. The
>>> trials (with flow_validate)  is supposed to check whether PMD supports
>>> MARK or META feature on appropriate domain. It depends on PMD
>>> implementation, configuration and underlaying HW/FW/kernel capabilities
>> and should be resolved in runtime.
>>
>> The trials a way, but very tricky way. My imagination draws me pictures how
>> an application code could look like in attempt to use either mark or meta for
>> Rx only and these pictures are not nice.
> Agree, trials is not the best way.
> For now there is no application trying to choose mark or meta, because these ones
> belonged to other domains, and extension is newly introduced.
> So, the applications using mark will continue use mark, the same is regarding meta.
> The new application definitely will ask for both mark and of them,
> we have the requirements from customers  - "give us as many through bits as you can".
> This new application just may refuse to work if metadata features are not detected,
> because relays on it strongly.
> BTW, trials are not so complex: rte_flow_validate(mark), rte_flow_validate_metadata()
> and that's it.

In assumption that pattern is supported and fate action used together 
with mark
is supported as well. Not that easy, but OK.

>> May be it will look acceptable when mark becomes a dynamic since usage of
>> either one or another dynamic field is definitely easier than usage of either
>> fixed or dynamic field.
> At least in PMD datapath it does not look very ugly.
>> May be dynamic field for mark at fixed offset should be introduced in the
>> release or the nearest future? It will allow to preserve ABI up to 20.11 and
>> provide future proof API.
>> The trick is to register dynamic meta field at fixed offset at start of a day to
>> be sure that it is guaranteed to succeed.
>> It sounds like it is a transition mechanism from fixed to dynamic fields.
>>
>> [snip]
>>
>>>>> diff --git a/lib/librte_ethdev/rte_flow.h
>>>>> b/lib/librte_ethdev/rte_flow.h index 4fee105..b821557 100644
>>>>> --- a/lib/librte_ethdev/rte_flow.h
>>>>> +++ b/lib/librte_ethdev/rte_flow.h
>>>>> @@ -28,6 +28,8 @@
>>>>>     #include <rte_byteorder.h>
>>>>>     #include <rte_esp.h>
>>>>>     #include <rte_higig.h>
>>>>> +#include <rte_mbuf.h>
>>>>> +#include <rte_mbuf_dyn.h>
>>>>>
>>>>>     #ifdef __cplusplus
>>>>>     extern "C" {
>>>>> @@ -418,7 +420,8 @@ enum rte_flow_item_type {
>>>>>     	/**
>>>>>     	 * [META]
>>>>>     	 *
>>>>> -	 * Matches a metadata value specified in mbuf metadata field.
>>>>> +	 * Matches a metadata value.
>>>>> +	 *
>>>>>     	 * See struct rte_flow_item_meta.
>>>>>     	 */
>>>>>     	RTE_FLOW_ITEM_TYPE_META,
>>>>> @@ -1263,9 +1266,17 @@ struct rte_flow_item_icmp6_nd_opt_tla_eth
>> {
>>>>>     #endif
>>>>>
>>>>>     /**
>>>>> - * RTE_FLOW_ITEM_TYPE_META.
>>>>> + * @warning
>>>>> + * @b EXPERIMENTAL: this structure may change without prior notice
>>>> Is it allowed to make experimental back?
>>> I think we should remove EXPERIMENTAL here. We do not introduce new
>>> feature, but just extend the apply area.
>> Agreed.
>>
>>>>>      *
>>>>> - * Matches a specified metadata value.
>>>>> + * RTE_FLOW_ITEM_TYPE_META
>>>>> + *
>>>>> + * Matches a specified metadata value. On egress, metadata can be
>>>>> + set either by
>>>>> + * mbuf tx_metadata field with PKT_TX_METADATA flag or
>>>>> + * RTE_FLOW_ACTION_TYPE_SET_META. On ingress,
>>>>> + RTE_FLOW_ACTION_TYPE_SET_META sets
>>>>> + * metadata for a packet and the metadata will be reported via mbuf
>>>>> + metadata
>>>>> + * dynamic field with PKT_RX_DYNF_METADATA flag. The dynamic
>> mbuf
>>>>> + field must be
>>>>> + * registered in advance by rte_flow_dynf_metadata_register().
>>>>>      */
>>>>>     struct rte_flow_item_meta {
>>>>>     	rte_be32_t data;
>>>> [snip]
>>>>
>>>>> @@ -2429,6 +2447,55 @@ struct rte_flow_action_set_mac {
>>>>>     	uint8_t mac_addr[RTE_ETHER_ADDR_LEN];
>>>>>     };
>>>>>
>>>>> +/**
>>>>> + * @warning
>>>>> + * @b EXPERIMENTAL: this structure may change without prior notice
>>>>> + *
>>>>> + * RTE_FLOW_ACTION_TYPE_SET_META
>>>>> + *
>>>>> + * Set metadata. Metadata set by mbuf tx_metadata field with
>>>>> + * PKT_TX_METADATA flag on egress will be overridden by this action.
>>>>> +On
>>>>> + * ingress, the metadata will be carried by mbuf metadata dynamic
>>>>> +field
>>>>> + * with PKT_RX_DYNF_METADATA flag if set.  The dynamic mbuf field
>>>>> +must be
>>>>> + * registered in advance by rte_flow_dynf_metadata_register().
>>>>> + *
>>>>> + * Altering partial bits is supported with mask. For bits which
>>>>> +have never
>>>>> + * been set, unpredictable value will be seen depending on driver
>>>>> + * implementation. For loopback/hairpin packet, metadata set on
>>>>> +Rx/Tx may
>>>>> + * or may not be propagated to the other path depending on HW
>>>> capability.
>>>>> + *
>>>>> + * RTE_FLOW_ITEM_TYPE_META matches metadata.
>>>>> + */
>>>>> +struct rte_flow_action_set_meta {
>>>>> +	rte_be32_t data;
>>>>> +	rte_be32_t mask;
>>>> As I understand tx_metadata is host endian. Just double-checking.
>>>> Is a new dynamic field host endian or big endian?
>>>> I definitely would like to see motivation in comments why data/mask
>>>> are big- endian here.
>>> metadata is opaque value, endianness does not matter, there are no
>>> some special motivations for choosing endiannes. rte_flow_item_meta()
>>> structure provides data with rte_be32_t type, so meta related action does
>> the same.
>>
>> Endianness of meta in mbuf and flow API should match and it must be
> Yes, and they match (for meta), both are rte_be32_t. OK, will emphasize it in docs.
>
>> documented. Endianness is important if a HW supports less bits since it
>> makes a hit for application to use LSB first if the bit space is sufficient.
>> mark is defined as host-endian (uint32_t) and I think meta should be the
>> same. Otherwise it complicates even more either mark or meta usage as
>> discussed above .
> mark is mark, meta is meta, these ones are not interrelated (despite they
> are becoming similar). And there is no choice between mark and meta.
> It is not obvious we should have the same endianness for both.
> There are some applications using this legacy features, so there might be compatibility
> issues either.

There are few reasons above to make meta host endian:
- match mark endianness (explained above, I still think that my reasons 
vaild)
- make it easier for application to use it without endianness conversion
   if a sequence number is used to allocate metas (similar to mark in OVS)
   and bit space is less than 32-bits

I see no single reason to keep it big-endian except a reason to keep it.

Since tx_metadata is going away and metadata was used for Tx only
before it is a good moment to fix it.

>> Yes, I think that rte_flow_item_meta should be fixed since both mark and
>> tx_metadata are host-endian.
>>
>> (it says nothing about HW interface which is vendor specific and vendor
>> PMDs should care about it)
> Handling this in PMD might introduce the extra endianness conversion in datapath,
> impacting performance. Not nice, IMO.

These concerns should not affect external interface since
it could be different for different HW vendors.
Slava Ovsiienko Oct. 30, 2019, 10:03 a.m. UTC | #14
> -----Original Message-----
> From: Slava Ovsiienko
> Sent: Wednesday, October 30, 2019 11:00
> To: Andrew Rybchenko <arybchenko@solarflare.com>; Thomas Monjalon
> <thomas@monjalon.net>; olivier.matz@6wind.com
> Cc: dev@dpdk.org; Matan Azrad <matan@mellanox.com>; Ori Kam
> <orika@mellanox.com>; Yongseok Koh <yskoh@mellanox.com>
> Subject: RE: [dpdk-dev] [PATCH v4] ethdev: extend flow metadata
> 
> > -----Original Message-----
> > From: Andrew Rybchenko <arybchenko@solarflare.com>
> > Sent: Wednesday, October 30, 2019 9:35
> > To: Slava Ovsiienko <viacheslavo@mellanox.com>; Thomas Monjalon
> > <thomas@monjalon.net>; olivier.matz@6wind.com
> > Cc: dev@dpdk.org; Matan Azrad <matan@mellanox.com>; Ori Kam
> > <orika@mellanox.com>; Yongseok Koh <yskoh@mellanox.com>
> > Subject: Re: [dpdk-dev] [PATCH v4] ethdev: extend flow metadata
> >
> > @Olivier, please, take a look at the end of the mail.
> >
> > On 10/29/19 8:19 PM, Slava Ovsiienko wrote:
> > > Hi, Andrew
> > >
> > > Thank you for the review.
> > >
> > >> -----Original Message-----
> > >> From: Andrew Rybchenko <arybchenko@solarflare.com>
> > >> Sent: Tuesday, October 29, 2019 18:22
> > >> To: Slava Ovsiienko <viacheslavo@mellanox.com>; dev@dpdk.org
> > >> Cc: Thomas Monjalon <thomas@monjalon.net>; Matan Azrad
> > >> <matan@mellanox.com>; olivier.matz@6wind.com; Ori Kam
> > >> <orika@mellanox.com>; Yongseok Koh <yskoh@mellanox.com>
> > >> Subject: Re: [dpdk-dev] [PATCH v4] ethdev: extend flow metadata
> > >>
> > >> On 10/27/19 9:40 PM, Viacheslav Ovsiienko wrote:
> > >>> Currently, metadata can be set on egress path via mbuf tx_metadata
> > >>> field with PKT_TX_METADATA flag and RTE_FLOW_ITEM_TYPE_META
> > >> matches metadata.
> > >>> This patch extends the metadata feature usability.
> > >>>
> > >>> 1) RTE_FLOW_ACTION_TYPE_SET_META
> > >>>
> > >>> When supporting multiple tables, Tx metadata can also be set by a
> > >>> rule and matched by another rule. This new action allows metadata
> > >>> to be set as a result of flow match.
> > >>>
> > >>> 2) Metadata on ingress
> > >>>
> > >>> There's also need to support metadata on ingress. Metadata can be
> > >>> set by SET_META action and matched by META item like Tx. The final
> > >>> value set by the action will be delivered to application via
> > >>> metadata dynamic field of mbuf which can be accessed by
> > >> RTE_FLOW_DYNF_METADATA().
> > >>> PKT_RX_DYNF_METADATA flag will be set along with the data.
> > >>>
> > >>> The mbuf dynamic field must be registered by calling
> > >>> rte_flow_dynf_metadata_register() prior to use SET_META action.
> > >>>
> > >>> The availability of dynamic mbuf metadata field can be checked
> > >>> with
> > >>> rte_flow_dynf_metadata_avail() routine.
> > >>>
> > >>> For loopback/hairpin packet, metadata set on Rx/Tx may or may not
> > >>> be propagated to the other path depending on hardware capability.
> > >>>
> > >>> Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
> > >>> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> > >> Above explanations lack information about "meta" vs "mark" which
> > >> may be set on Rx as well and delivered in other mbuf field.
> > >> It should be explained by one more field is required and rules defined.
> > > There is some story about metadata features.
> > > Initially, there were proposed two metadata related actions:
> > >
> > > - RTE_FLOW_ACTION_TYPE_FLAG
> > > - RTE_FLOW_ACTION_TYPE_MARK
> > >
> > > These actions set the special flag in the packet metadata, MARK
> > > action stores some specified value in the metadata storage, and, on
> > > the packet receiving PMD puts the flag and value to the mbuf and
> > > applications can see the packet was threated inside flow engine
> > > according to the appropriate RTE flow(s). MARK and FLAG are like
> > > some kind of gateway to transfer some per-packet information from
> > > the flow
> > engine to the application via receiving datapath.
> > >
> > >  From the datapath point of view, the MARK and FLAG are related to
> > > the
> > receiving side only.
> > > It would useful to have the same gateway on the transmitting side
> > > and there was the feature of type RTE_FLOW_ITEM_TYPE_META was
> > proposed.
> > > The application can fill the field in mbuf and this value will be
> > > transferred to
> > some field in the packet metadata inside the flow engine.
> > > It did not matter whether these metadata fields are shared because
> > > of MARK and META items belonged to different domains (receiving and
> > transmitting) and could be vendor-specific.
> > >
> > > So far, so good, DPDK proposes some entities to control metadata
> > > inside the flow engine and gateways to exchange these values on a
> > > per-
> > packet basis via datapaths.
> > >
> > > As we can see, the MARK and META means are not symmetric, there is
> > > absent action which would allow us to set META value on the
> > > transmitting
> > path. So, the action of type:
> > > - RTE_FLOW_ACTION_TYPE_SET_META is proposed.
> > >
> > > The next, applications raise the new requirements for packet metadata.
> > > The flow engines are getting more complex, internal switches are
> > > introduced, multiple ports might be supported within the same flow
> > > engine namespace. From the DPDK points of view, it means the packets
> > > might be sent on one eth_dev port and received on the other one, and
> > > the
> > packet path inside the flow engine entirely belongs to the same
> > hardware device. The simplest example is SR-IOV with PF, VFs and the
> representors.
> > > And there is a brilliant opportunity to provide some out-of-band
> > > channel to
> > transfer some extra data
> > >   from one port to another one, besides the packet data itself.
> > >
> > >
> > >> Above explanations lack information about "meta" vs "mark" which
> > >> may be set on Rx as well and delivered in other mbuf field.
> > >> It should be explained by one more field is required and rules defined.
> > >> Otherwise we can endup in half PMDs supporting mark only, half PMDs
> > >> supporting meta only and applications in an interesting situation
> > >> to make a choice which one to use.
> > > There is no "mark" vs "meta". MARK and META means are kept for
> > > compatibility issues and legacy part works exactly as before. The
> > > trials (with flow_validate)  is supposed to check whether PMD
> > > supports MARK or META feature on appropriate domain. It depends on
> > > PMD implementation, configuration and underlaying HW/FW/kernel
> > > capabilities
> > and should be resolved in runtime.
> >
> > The trials a way, but very tricky way. My imagination draws me
> > pictures how an application code could look like in attempt to use
> > either mark or meta for Rx only and these pictures are not nice.
> Agree, trials is not the best way.
> For now there is no application trying to choose mark or meta, because
> these ones belonged to other domains, and extension is newly introduced.
> So, the applications using mark will continue use mark, the same is regarding
> meta.
> The new application definitely will ask for both mark and of them, we have
> the requirements from customers  - "give us as many through bits as you
> can".
> This new application just may refuse to work if metadata features are not
> detected, because relays on it strongly.
> BTW, trials are not so complex: rte_flow_validate(mark),
> rte_flow_validate_metadata() and that's it.
> 
> > May be it will look acceptable when mark becomes a dynamic since usage
> > of either one or another dynamic field is definitely easier than usage
> > of either fixed or dynamic field.
> At least in PMD datapath it does not look very ugly.
> >
> > May be dynamic field for mark at fixed offset should be introduced in
> > the release or the nearest future? It will allow to preserve ABI up to
> > 20.11 and provide future proof API.
> > The trick is to register dynamic meta field at fixed offset at start
> > of a day to be sure that it is guaranteed to succeed.
> > It sounds like it is a transition mechanism from fixed to dynamic fields.
> >
> > [snip]
> >
> > >>> diff --git a/lib/librte_ethdev/rte_flow.h
> > >>> b/lib/librte_ethdev/rte_flow.h index 4fee105..b821557 100644
> > >>> --- a/lib/librte_ethdev/rte_flow.h
> > >>> +++ b/lib/librte_ethdev/rte_flow.h
> > >>> @@ -28,6 +28,8 @@
> > >>>    #include <rte_byteorder.h>
> > >>>    #include <rte_esp.h>
> > >>>    #include <rte_higig.h>
> > >>> +#include <rte_mbuf.h>
> > >>> +#include <rte_mbuf_dyn.h>
> > >>>
> > >>>    #ifdef __cplusplus
> > >>>    extern "C" {
> > >>> @@ -418,7 +420,8 @@ enum rte_flow_item_type {
> > >>>    	/**
> > >>>    	 * [META]
> > >>>    	 *
> > >>> -	 * Matches a metadata value specified in mbuf metadata field.
> > >>> +	 * Matches a metadata value.
> > >>> +	 *
> > >>>    	 * See struct rte_flow_item_meta.
> > >>>    	 */
> > >>>    	RTE_FLOW_ITEM_TYPE_META,
> > >>> @@ -1263,9 +1266,17 @@ struct
> rte_flow_item_icmp6_nd_opt_tla_eth
> > {
> > >>>    #endif
> > >>>
> > >>>    /**
> > >>> - * RTE_FLOW_ITEM_TYPE_META.
> > >>> + * @warning
> > >>> + * @b EXPERIMENTAL: this structure may change without prior
> > >>> + notice
> > >> Is it allowed to make experimental back?
> > > I think we should remove EXPERIMENTAL here. We do not introduce new
> > > feature, but just extend the apply area.
> >
> > Agreed.
> >
> > >>>     *
> > >>> - * Matches a specified metadata value.
> > >>> + * RTE_FLOW_ITEM_TYPE_META
> > >>> + *
> > >>> + * Matches a specified metadata value. On egress, metadata can be
> > >>> + set either by
> > >>> + * mbuf tx_metadata field with PKT_TX_METADATA flag or
> > >>> + * RTE_FLOW_ACTION_TYPE_SET_META. On ingress,
> > >>> + RTE_FLOW_ACTION_TYPE_SET_META sets
> > >>> + * metadata for a packet and the metadata will be reported via
> > >>> + mbuf metadata
> > >>> + * dynamic field with PKT_RX_DYNF_METADATA flag. The dynamic
> > mbuf
> > >>> + field must be
> > >>> + * registered in advance by rte_flow_dynf_metadata_register().
> > >>>     */
> > >>>    struct rte_flow_item_meta {
> > >>>    	rte_be32_t data;
> > >> [snip]
> > >>
> > >>> @@ -2429,6 +2447,55 @@ struct rte_flow_action_set_mac {
> > >>>    	uint8_t mac_addr[RTE_ETHER_ADDR_LEN];
> > >>>    };
> > >>>
> > >>> +/**
> > >>> + * @warning
> > >>> + * @b EXPERIMENTAL: this structure may change without prior
> > >>> +notice
> > >>> + *
> > >>> + * RTE_FLOW_ACTION_TYPE_SET_META
> > >>> + *
> > >>> + * Set metadata. Metadata set by mbuf tx_metadata field with
> > >>> + * PKT_TX_METADATA flag on egress will be overridden by this action.
> > >>> +On
> > >>> + * ingress, the metadata will be carried by mbuf metadata dynamic
> > >>> +field
> > >>> + * with PKT_RX_DYNF_METADATA flag if set.  The dynamic mbuf field
> > >>> +must be
> > >>> + * registered in advance by rte_flow_dynf_metadata_register().
> > >>> + *
> > >>> + * Altering partial bits is supported with mask. For bits which
> > >>> +have never
> > >>> + * been set, unpredictable value will be seen depending on driver
> > >>> + * implementation. For loopback/hairpin packet, metadata set on
> > >>> +Rx/Tx may
> > >>> + * or may not be propagated to the other path depending on HW
> > >> capability.
> > >>> + *
> > >>> + * RTE_FLOW_ITEM_TYPE_META matches metadata.
> > >>> + */
> > >>> +struct rte_flow_action_set_meta {
> > >>> +	rte_be32_t data;
> > >>> +	rte_be32_t mask;
> > >> As I understand tx_metadata is host endian. Just double-checking.
> > >> Is a new dynamic field host endian or big endian?
> > >> I definitely would like to see motivation in comments why data/mask
> > >> are big- endian here.
> > > metadata is opaque value, endianness does not matter, there are no
> > > some special motivations for choosing endiannes.
> > > rte_flow_item_meta() structure provides data with rte_be32_t type,
> > > so meta related action does
> > the same.
> >
> > Endianness of meta in mbuf and flow API should match and it must be
> Yes, and they match (for meta), both are rte_be32_t. OK, will emphasize it in
> docs.
> 
> > documented. Endianness is important if a HW supports less bits since
> > it makes a hit for application to use LSB first if the bit space is sufficient.
> > mark is defined as host-endian (uint32_t) and I think meta should be
> > the same. Otherwise it complicates even more either mark or meta usage
> > as discussed above .
> mark is mark, meta is meta, these ones are not interrelated (despite they are
> becoming similar). And there is no choice between mark and meta.
> It is not obvious we should have the same endianness for both.
> There are some applications using this legacy features, so there might be
> compatibility issues either.
> 
> >
> > Yes, I think that rte_flow_item_meta should be fixed since both mark
> > and tx_metadata are host-endian.
> >
> > (it says nothing about HW interface which is vendor specific and
> > vendor PMDs should care about it)
> Handling this in PMD might introduce the extra endianness conversion in
> datapath, impacting performance. Not nice, IMO.

Update: I've reviewed other [META] items/actions, all of them are in
host endianness. OK, we are moving tx_metadata to dynfield, it seems to
be good point to alter metadata endianness. And I see the way to avoid
conversions in data path

With best regards, Slava

> >
> > > I could assume the origin of selecting bigendian type was the
> > > endianness of metadata field in Tx descriptor of ConnectX NICs.
> > >
> > >>> +};
> > >>> +
> > >>> +/* Mbuf dynamic field offset for metadata. */ extern int
> > >>> +rte_flow_dynf_metadata_offs;
> > >>> +
> > >>> +/* Mbuf dynamic field flag mask for metadata. */ extern uint64_t
> > >>> +rte_flow_dynf_metadata_mask;
> > >> These two global variables look frightening to me.
> > >> It does not look good to me.
> > > For me too. But we need the performance, these ones are intended for
> > > usage in datapath, any overhead is painful.
> >
> > @Olivier, could you share your thoughts, please.
> >
> > Andrew.
> 
> With best regards, Slava
Slava Ovsiienko Oct. 30, 2019, 10:05 a.m. UTC | #15
> -----Original Message-----
> From: Andrew Rybchenko <arybchenko@solarflare.com>
> Sent: Wednesday, October 30, 2019 11:20
> To: Slava Ovsiienko <viacheslavo@mellanox.com>; Thomas Monjalon
> <thomas@monjalon.net>; olivier.matz@6wind.com
> Cc: dev@dpdk.org; Matan Azrad <matan@mellanox.com>; Ori Kam
> <orika@mellanox.com>; Yongseok Koh <yskoh@mellanox.com>
> Subject: Re: [dpdk-dev] [PATCH v4] ethdev: extend flow metadata
> 
> On 10/30/19 11:59 AM, Slava Ovsiienko wrote:
> >> -----Original Message-----
> >> From: Andrew Rybchenko <arybchenko@solarflare.com>
> >> Sent: Wednesday, October 30, 2019 9:35
> >> To: Slava Ovsiienko <viacheslavo@mellanox.com>; Thomas Monjalon
> >> <thomas@monjalon.net>; olivier.matz@6wind.com
> >> Cc: dev@dpdk.org; Matan Azrad <matan@mellanox.com>; Ori Kam
> >> <orika@mellanox.com>; Yongseok Koh <yskoh@mellanox.com>
> >> Subject: Re: [dpdk-dev] [PATCH v4] ethdev: extend flow metadata
> >>
> >> @Olivier, please, take a look at the end of the mail.
> >>
> >> On 10/29/19 8:19 PM, Slava Ovsiienko wrote:
> >>> Hi, Andrew
> >>>
> >>> Thank you for the review.
> >>>
> >>>> -----Original Message-----
> >>>> From: Andrew Rybchenko <arybchenko@solarflare.com>
> >>>> Sent: Tuesday, October 29, 2019 18:22
> >>>> To: Slava Ovsiienko <viacheslavo@mellanox.com>; dev@dpdk.org
> >>>> Cc: Thomas Monjalon <thomas@monjalon.net>; Matan Azrad
> >>>> <matan@mellanox.com>; olivier.matz@6wind.com; Ori Kam
> >>>> <orika@mellanox.com>; Yongseok Koh <yskoh@mellanox.com>
> >>>> Subject: Re: [dpdk-dev] [PATCH v4] ethdev: extend flow metadata
> >>>>
> >>>> On 10/27/19 9:40 PM, Viacheslav Ovsiienko wrote:
> >>>>> Currently, metadata can be set on egress path via mbuf tx_metadata
> >>>>> field with PKT_TX_METADATA flag and RTE_FLOW_ITEM_TYPE_META
> >>>> matches metadata.
> >>>>> This patch extends the metadata feature usability.
> >>>>>
> >>>>> 1) RTE_FLOW_ACTION_TYPE_SET_META
> >>>>>
> >>>>> When supporting multiple tables, Tx metadata can also be set by a
> >>>>> rule and matched by another rule. This new action allows metadata
> >>>>> to be set as a result of flow match.
> >>>>>
> >>>>> 2) Metadata on ingress
> >>>>>
> >>>>> There's also need to support metadata on ingress. Metadata can be
> >>>>> set by SET_META action and matched by META item like Tx. The final
> >>>>> value set by the action will be delivered to application via
> >>>>> metadata dynamic field of mbuf which can be accessed by
> >>>> RTE_FLOW_DYNF_METADATA().
> >>>>> PKT_RX_DYNF_METADATA flag will be set along with the data.
> >>>>>
> >>>>> The mbuf dynamic field must be registered by calling
> >>>>> rte_flow_dynf_metadata_register() prior to use SET_META action.
> >>>>>
> >>>>> The availability of dynamic mbuf metadata field can be checked
> >>>>> with
> >>>>> rte_flow_dynf_metadata_avail() routine.
> >>>>>
> >>>>> For loopback/hairpin packet, metadata set on Rx/Tx may or may not
> >>>>> be propagated to the other path depending on hardware capability.
> >>>>>
> >>>>> Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
> >>>>> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> >>>> Above explanations lack information about "meta" vs "mark" which
> >>>> may be set on Rx as well and delivered in other mbuf field.
> >>>> It should be explained by one more field is required and rules defined.
> >>> There is some story about metadata features.
> >>> Initially, there were proposed two metadata related actions:
> >>>
> >>> - RTE_FLOW_ACTION_TYPE_FLAG
> >>> - RTE_FLOW_ACTION_TYPE_MARK
> >>>
> >>> These actions set the special flag in the packet metadata, MARK
> >>> action stores some specified value in the metadata storage, and, on
> >>> the packet receiving PMD puts the flag and value to the mbuf and
> >>> applications can see the packet was threated inside flow engine
> >>> according to the appropriate RTE flow(s). MARK and FLAG are like
> >>> some kind of gateway to transfer some per-packet information from
> >>> the flow
> >> engine to the application via receiving datapath.
> >>>   From the datapath point of view, the MARK and FLAG are related to
> >>> the
> >> receiving side only.
> >>> It would useful to have the same gateway on the transmitting side
> >>> and there was the feature of type RTE_FLOW_ITEM_TYPE_META was
> >> proposed.
> >>> The application can fill the field in mbuf and this value will be
> >>> transferred to
> >> some field in the packet metadata inside the flow engine.
> >>> It did not matter whether these metadata fields are shared because
> >>> of MARK and META items belonged to different domains (receiving and
> >> transmitting) and could be vendor-specific.
> >>> So far, so good, DPDK proposes some entities to control metadata
> >>> inside the flow engine and gateways to exchange these values on a
> >>> per-
> >> packet basis via datapaths.
> >>> As we can see, the MARK and META means are not symmetric, there is
> >>> absent action which would allow us to set META value on the
> >>> transmitting
> >> path. So, the action of type:
> >>> - RTE_FLOW_ACTION_TYPE_SET_META is proposed.
> >>>
> >>> The next, applications raise the new requirements for packet metadata.
> >>> The flow engines are getting more complex, internal switches are
> >>> introduced, multiple ports might be supported within the same flow
> >>> engine namespace. From the DPDK points of view, it means the packets
> >>> might be sent on one eth_dev port and received on the other one, and
> >>> the
> >> packet path inside the flow engine entirely belongs to the same
> >> hardware device. The simplest example is SR-IOV with PF, VFs and the
> representors.
> >>> And there is a brilliant opportunity to provide some out-of-band
> >>> channel to
> >> transfer some extra data
> >>>    from one port to another one, besides the packet data itself.
> >>>
> >>>
> >>>> Above explanations lack information about "meta" vs "mark" which
> >>>> may be set on Rx as well and delivered in other mbuf field.
> >>>> It should be explained by one more field is required and rules defined.
> >>>> Otherwise we can endup in half PMDs supporting mark only, half PMDs
> >>>> supporting meta only and applications in an interesting situation
> >>>> to make a choice which one to use.
> >>> There is no "mark" vs "meta". MARK and META means are kept for
> >>> compatibility issues and legacy part works exactly as before. The
> >>> trials (with flow_validate)  is supposed to check whether PMD
> >>> supports MARK or META feature on appropriate domain. It depends on
> >>> PMD implementation, configuration and underlaying HW/FW/kernel
> >>> capabilities
> >> and should be resolved in runtime.
> >>
> >> The trials a way, but very tricky way. My imagination draws me
> >> pictures how an application code could look like in attempt to use
> >> either mark or meta for Rx only and these pictures are not nice.
> > Agree, trials is not the best way.
> > For now there is no application trying to choose mark or meta, because
> > these ones belonged to other domains, and extension is newly introduced.
> > So, the applications using mark will continue use mark, the same is
> regarding meta.
> > The new application definitely will ask for both mark and of them, we
> > have the requirements from customers  - "give us as many through bits as
> you can".
> > This new application just may refuse to work if metadata features are
> > not detected, because relays on it strongly.
> > BTW, trials are not so complex: rte_flow_validate(mark),
> > rte_flow_validate_metadata() and that's it.
> 
> In assumption that pattern is supported and fate action used together with
> mark is supported as well. Not that easy, but OK.
> 
> >> May be it will look acceptable when mark becomes a dynamic since
> >> usage of either one or another dynamic field is definitely easier
> >> than usage of either fixed or dynamic field.
> > At least in PMD datapath it does not look very ugly.
> >> May be dynamic field for mark at fixed offset should be introduced in
> >> the release or the nearest future? It will allow to preserve ABI up
> >> to 20.11 and provide future proof API.
> >> The trick is to register dynamic meta field at fixed offset at start
> >> of a day to be sure that it is guaranteed to succeed.
> >> It sounds like it is a transition mechanism from fixed to dynamic fields.
> >>
> >> [snip]
> >>
> >>>>> diff --git a/lib/librte_ethdev/rte_flow.h
> >>>>> b/lib/librte_ethdev/rte_flow.h index 4fee105..b821557 100644
> >>>>> --- a/lib/librte_ethdev/rte_flow.h
> >>>>> +++ b/lib/librte_ethdev/rte_flow.h
> >>>>> @@ -28,6 +28,8 @@
> >>>>>     #include <rte_byteorder.h>
> >>>>>     #include <rte_esp.h>
> >>>>>     #include <rte_higig.h>
> >>>>> +#include <rte_mbuf.h>
> >>>>> +#include <rte_mbuf_dyn.h>
> >>>>>
> >>>>>     #ifdef __cplusplus
> >>>>>     extern "C" {
> >>>>> @@ -418,7 +420,8 @@ enum rte_flow_item_type {
> >>>>>     	/**
> >>>>>     	 * [META]
> >>>>>     	 *
> >>>>> -	 * Matches a metadata value specified in mbuf metadata field.
> >>>>> +	 * Matches a metadata value.
> >>>>> +	 *
> >>>>>     	 * See struct rte_flow_item_meta.
> >>>>>     	 */
> >>>>>     	RTE_FLOW_ITEM_TYPE_META,
> >>>>> @@ -1263,9 +1266,17 @@ struct
> rte_flow_item_icmp6_nd_opt_tla_eth
> >> {
> >>>>>     #endif
> >>>>>
> >>>>>     /**
> >>>>> - * RTE_FLOW_ITEM_TYPE_META.
> >>>>> + * @warning
> >>>>> + * @b EXPERIMENTAL: this structure may change without prior
> >>>>> + notice
> >>>> Is it allowed to make experimental back?
> >>> I think we should remove EXPERIMENTAL here. We do not introduce
> new
> >>> feature, but just extend the apply area.
> >> Agreed.
> >>
> >>>>>      *
> >>>>> - * Matches a specified metadata value.
> >>>>> + * RTE_FLOW_ITEM_TYPE_META
> >>>>> + *
> >>>>> + * Matches a specified metadata value. On egress, metadata can be
> >>>>> + set either by
> >>>>> + * mbuf tx_metadata field with PKT_TX_METADATA flag or
> >>>>> + * RTE_FLOW_ACTION_TYPE_SET_META. On ingress,
> >>>>> + RTE_FLOW_ACTION_TYPE_SET_META sets
> >>>>> + * metadata for a packet and the metadata will be reported via
> >>>>> + mbuf metadata
> >>>>> + * dynamic field with PKT_RX_DYNF_METADATA flag. The dynamic
> >> mbuf
> >>>>> + field must be
> >>>>> + * registered in advance by rte_flow_dynf_metadata_register().
> >>>>>      */
> >>>>>     struct rte_flow_item_meta {
> >>>>>     	rte_be32_t data;
> >>>> [snip]
> >>>>
> >>>>> @@ -2429,6 +2447,55 @@ struct rte_flow_action_set_mac {
> >>>>>     	uint8_t mac_addr[RTE_ETHER_ADDR_LEN];
> >>>>>     };
> >>>>>
> >>>>> +/**
> >>>>> + * @warning
> >>>>> + * @b EXPERIMENTAL: this structure may change without prior
> >>>>> +notice
> >>>>> + *
> >>>>> + * RTE_FLOW_ACTION_TYPE_SET_META
> >>>>> + *
> >>>>> + * Set metadata. Metadata set by mbuf tx_metadata field with
> >>>>> + * PKT_TX_METADATA flag on egress will be overridden by this
> action.
> >>>>> +On
> >>>>> + * ingress, the metadata will be carried by mbuf metadata dynamic
> >>>>> +field
> >>>>> + * with PKT_RX_DYNF_METADATA flag if set.  The dynamic mbuf field
> >>>>> +must be
> >>>>> + * registered in advance by rte_flow_dynf_metadata_register().
> >>>>> + *
> >>>>> + * Altering partial bits is supported with mask. For bits which
> >>>>> +have never
> >>>>> + * been set, unpredictable value will be seen depending on driver
> >>>>> + * implementation. For loopback/hairpin packet, metadata set on
> >>>>> +Rx/Tx may
> >>>>> + * or may not be propagated to the other path depending on HW
> >>>> capability.
> >>>>> + *
> >>>>> + * RTE_FLOW_ITEM_TYPE_META matches metadata.
> >>>>> + */
> >>>>> +struct rte_flow_action_set_meta {
> >>>>> +	rte_be32_t data;
> >>>>> +	rte_be32_t mask;
> >>>> As I understand tx_metadata is host endian. Just double-checking.
> >>>> Is a new dynamic field host endian or big endian?
> >>>> I definitely would like to see motivation in comments why data/mask
> >>>> are big- endian here.
> >>> metadata is opaque value, endianness does not matter, there are no
> >>> some special motivations for choosing endiannes.
> >>> rte_flow_item_meta() structure provides data with rte_be32_t type,
> >>> so meta related action does
> >> the same.
> >>
> >> Endianness of meta in mbuf and flow API should match and it must be
> > Yes, and they match (for meta), both are rte_be32_t. OK, will emphasize it
> in docs.
> >
> >> documented. Endianness is important if a HW supports less bits since
> >> it makes a hit for application to use LSB first if the bit space is sufficient.
> >> mark is defined as host-endian (uint32_t) and I think meta should be
> >> the same. Otherwise it complicates even more either mark or meta
> >> usage as discussed above .
> > mark is mark, meta is meta, these ones are not interrelated (despite
> > they are becoming similar). And there is no choice between mark and meta.
> > It is not obvious we should have the same endianness for both.
> > There are some applications using this legacy features, so there might
> > be compatibility issues either.
> 
> There are few reasons above to make meta host endian:
> - match mark endianness (explained above, I still think that my reasons
> vaild)
> - make it easier for application to use it without endianness conversion
>    if a sequence number is used to allocate metas (similar to mark in OVS)
>    and bit space is less than 32-bits
> 
> I see no single reason to keep it big-endian except a reason to keep it.
> 
> Since tx_metadata is going away and metadata was used for Tx only before
> it is a good moment to fix it.
> 
> >> Yes, I think that rte_flow_item_meta should be fixed since both mark
> >> and tx_metadata are host-endian.
> >>
> >> (it says nothing about HW interface which is vendor specific and
> >> vendor PMDs should care about it)
> > Handling this in PMD might introduce the extra endianness conversion
> > in datapath, impacting performance. Not nice, IMO.
> 
> These concerns should not affect external interface since it could be
> different for different HW vendors.

OK, will alter metadata endianness to host one. There Is undefeatable argument
the other [META] items/actions are all in HE.

With best regards, Slava
Olivier Matz Oct. 30, 2019, 3:49 p.m. UTC | #16
Hi,

On Wed, Oct 30, 2019 at 10:35:16AM +0300, Andrew Rybchenko wrote:
> @Olivier, please, take a look at the end of the mail.
> 

(...)

> On 10/29/19 8:19 PM, Slava Ovsiienko wrote:
> > > > +};
> > > > +
> > > > +/* Mbuf dynamic field offset for metadata. */ extern int
> > > > +rte_flow_dynf_metadata_offs;
> > > > +
> > > > +/* Mbuf dynamic field flag mask for metadata. */ extern uint64_t
> > > > +rte_flow_dynf_metadata_mask;
> > > These two global variables look frightening to me.
> > > It does not look good to me.
> > For me too. But we need the performance, these ones are
> > intended for usage in datapath, any overhead is painful.
> 
> @Olivier, could you share your thoughts, please.

Having a global variable looks unavoidable to me, if we want
performance.

An alternative can be to use static global variables in every file that
use this dynamic field, and call the register method from it. But I
don't think it would scale if a dynamic field is widely used.

Why does it look frigthening to you?

The constraint is that before using this variable, the register function
has to be called. I don't think there are race conditions, because the
field/flag registration is lock protected and always return the same
value.
Andrew Rybchenko Oct. 31, 2019, 9:25 a.m. UTC | #17
On 10/30/19 6:49 PM, Olivier Matz wrote:
> Hi,
>
> On Wed, Oct 30, 2019 at 10:35:16AM +0300, Andrew Rybchenko wrote:
>> @Olivier, please, take a look at the end of the mail.
>>
> (...)
>
>> On 10/29/19 8:19 PM, Slava Ovsiienko wrote:
>>>>> +};
>>>>> +
>>>>> +/* Mbuf dynamic field offset for metadata. */ extern int
>>>>> +rte_flow_dynf_metadata_offs;
>>>>> +
>>>>> +/* Mbuf dynamic field flag mask for metadata. */ extern uint64_t
>>>>> +rte_flow_dynf_metadata_mask;
>>>> These two global variables look frightening to me.
>>>> It does not look good to me.
>>> For me too. But we need the performance, these ones are
>>> intended for usage in datapath, any overhead is painful.
>> @Olivier, could you share your thoughts, please.
> Having a global variable looks unavoidable to me, if we want
> performance.
>
> An alternative can be to use static global variables in every file that
> use this dynamic field, and call the register method from it. But I
> don't think it would scale if a dynamic field is widely used.

Yes, I see.

> Why does it look frigthening to you?

It is just a good/bad design feeling. No specific technical reasons 
right now.
Let's try and take a look how it goes.

> The constraint is that before using this variable, the register function
> has to be called. I don't think there are race conditions, because the
> field/flag registration is lock protected and always return the same
> value.
diff mbox series

Patch

diff --git a/app/test-pmd/cmdline_flow.c b/app/test-pmd/cmdline_flow.c
index 0d0bc0a..e4ef066 100644
--- a/app/test-pmd/cmdline_flow.c
+++ b/app/test-pmd/cmdline_flow.c
@@ -316,6 +316,9 @@  enum index {
 	ACTION_RAW_ENCAP_INDEX_VALUE,
 	ACTION_RAW_DECAP_INDEX,
 	ACTION_RAW_DECAP_INDEX_VALUE,
+	ACTION_SET_META,
+	ACTION_SET_META_DATA,
+	ACTION_SET_META_MASK,
 };
 
 /** Maximum size for pattern in struct rte_flow_item_raw. */
@@ -1067,6 +1070,7 @@  struct parse_action_priv {
 	ACTION_DEC_TCP_ACK,
 	ACTION_RAW_ENCAP,
 	ACTION_RAW_DECAP,
+	ACTION_SET_META,
 	ZERO,
 };
 
@@ -1265,6 +1269,13 @@  struct parse_action_priv {
 	ZERO,
 };
 
+static const enum index action_set_meta[] = {
+	ACTION_SET_META_DATA,
+	ACTION_SET_META_MASK,
+	ACTION_NEXT,
+	ZERO,
+};
+
 static int parse_set_raw_encap_decap(struct context *, const struct token *,
 				     const char *, unsigned int,
 				     void *, unsigned int);
@@ -1329,6 +1340,10 @@  static int parse_vc_action_raw_encap_index(struct context *,
 static int parse_vc_action_raw_decap_index(struct context *,
 					   const struct token *, const char *,
 					   unsigned int, void *, unsigned int);
+static int parse_vc_action_set_meta(struct context *ctx,
+				    const struct token *token, const char *str,
+				    unsigned int len, void *buf,
+				    unsigned int size);
 static int parse_destroy(struct context *, const struct token *,
 			 const char *, unsigned int,
 			 void *, unsigned int);
@@ -3378,7 +3393,31 @@  static int comp_set_raw_index(struct context *, const struct token *,
 		.help = "index of raw_encap/raw_decap data",
 		.next = NEXT(next_item),
 		.call = parse_port,
-	}
+	},
+	[ACTION_SET_META] = {
+		.name = "set_meta",
+		.help = "set metadata",
+		.priv = PRIV_ACTION(SET_META,
+			sizeof(struct rte_flow_action_set_meta)),
+		.next = NEXT(action_set_meta),
+		.call = parse_vc_action_set_meta,
+	},
+	[ACTION_SET_META_DATA] = {
+		.name = "data",
+		.help = "metadata value",
+		.next = NEXT(action_set_meta, NEXT_ENTRY(UNSIGNED)),
+		.args = ARGS(ARGS_ENTRY_HTON
+			     (struct rte_flow_action_set_meta, data)),
+		.call = parse_vc_conf,
+	},
+	[ACTION_SET_META_MASK] = {
+		.name = "mask",
+		.help = "mask for metadata value",
+		.next = NEXT(action_set_meta, NEXT_ENTRY(UNSIGNED)),
+		.args = ARGS(ARGS_ENTRY_HTON
+			     (struct rte_flow_action_set_meta, mask)),
+		.call = parse_vc_conf,
+	},
 };
 
 /** Remove and return last entry from argument stack. */
@@ -4818,6 +4857,22 @@  static int comp_set_raw_index(struct context *, const struct token *,
 	return ret;
 }
 
+static int
+parse_vc_action_set_meta(struct context *ctx, const struct token *token,
+			 const char *str, unsigned int len, void *buf,
+			 unsigned int size)
+{
+	int ret;
+
+	ret = parse_vc(ctx, token, str, len, buf, size);
+	if (ret < 0)
+		return ret;
+	ret = rte_flow_dynf_metadata_register();
+	if (ret < 0)
+		return -1;
+	return len;
+}
+
 /** Parse tokens for destroy command. */
 static int
 parse_destroy(struct context *ctx, const struct token *token,
diff --git a/app/test-pmd/util.c b/app/test-pmd/util.c
index f20531d..56075b3 100644
--- a/app/test-pmd/util.c
+++ b/app/test-pmd/util.c
@@ -82,6 +82,11 @@ 
 			       mb->vlan_tci, mb->vlan_tci_outer);
 		else if (ol_flags & PKT_RX_VLAN)
 			printf(" - VLAN tci=0x%x", mb->vlan_tci);
+		if (ol_flags & PKT_TX_METADATA)
+			printf(" - Tx metadata: 0x%x", mb->tx_metadata);
+		if (ol_flags & PKT_RX_DYNF_METADATA)
+			printf(" - Rx metadata: 0x%x",
+			       *RTE_FLOW_DYNF_METADATA(mb));
 		if (mb->packet_type) {
 			rte_get_ptype_name(mb->packet_type, buf, sizeof(buf));
 			printf(" - hw ptype: %s", buf);
diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
index 159ce19..c943aca 100644
--- a/doc/guides/prog_guide/rte_flow.rst
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -658,6 +658,32 @@  the physical device, with virtual groups in the PMD or not at all.
    | ``mask`` | ``id``   | zeroed to match any value |
    +----------+----------+---------------------------+
 
+Item: ``META``
+^^^^^^^^^^^^^^^^^
+
+Matches 32 bit metadata item set.
+
+On egress, metadata can be set either by mbuf metadata field with
+PKT_TX_METADATA flag or ``SET_META`` action. On ingress, ``SET_META``
+action sets metadata for a packet and the metadata will be reported via
+``metadata`` dynamic field of ``rte_mbuf`` with PKT_RX_DYNF_METADATA flag.
+
+- Default ``mask`` matches the specified Rx metadata value.
+
+.. _table_rte_flow_item_meta:
+
+.. table:: META
+
+   +----------+----------+---------------------------------------+
+   | Field    | Subfield | Value                                 |
+   +==========+==========+=======================================+
+   | ``spec`` | ``data`` | 32 bit metadata value                 |
+   +----------+----------+---------------------------------------+
+   | ``last`` | ``data`` | upper range value                     |
+   +----------+----------+---------------------------------------+
+   | ``mask`` | ``data`` | bit-mask applies to "spec" and "last" |
+   +----------+----------+---------------------------------------+
+
 Data matching item types
 ~~~~~~~~~~~~~~~~~~~~~~~~
 
@@ -1232,21 +1258,6 @@  Matches a PPPoE session protocol identifier.
 - ``proto_id``: PPP protocol identifier.
 - Default ``mask`` matches proto_id only.
 
-
-.. _table_rte_flow_item_meta:
-
-.. table:: META
-
-   +----------+----------+---------------------------------------+
-   | Field    | Subfield | Value                                 |
-   +==========+==========+=======================================+
-   | ``spec`` | ``data`` | 32 bit metadata value                 |
-   +----------+--------------------------------------------------+
-   | ``last`` | ``data`` | upper range value                     |
-   +----------+----------+---------------------------------------+
-   | ``mask`` | ``data`` | bit-mask applies to "spec" and "last" |
-   +----------+----------+---------------------------------------+
-
 Item: ``NSH``
 ^^^^^^^^^^^^^
 
@@ -2466,6 +2477,37 @@  Value to decrease TCP acknowledgment number by is a big-endian 32 bit integer.
 
 Using this action on non-matching traffic will result in undefined behavior.
 
+Action: ``SET_META``
+^^^^^^^^^^^^^^^^^^^^^^^
+
+Set metadata. Item ``META`` matches metadata.
+
+Metadata set by mbuf metadata field with PKT_TX_METADATA flag on egress will be
+overridden by this action. On ingress, the metadata will be carried by
+``metadata`` dynamic field of ``rte_mbuf`` which can be accessed by
+``RTE_FLOW_DYNF_METADATA()``. PKT_RX_DYNF_METADATA flag will be set along
+with the data.
+
+The mbuf dynamic field must be registered by calling
+``rte_flow_dynf_metadata_register()`` prior to use ``SET_META`` action.
+
+Altering partial bits is supported with ``mask``. For bits which have never been
+set, unpredictable value will be seen depending on driver implementation. For
+loopback/hairpin packet, metadata set on Rx/Tx may or may not be propagated to
+the other path depending on HW capability.
+
+.. _table_rte_flow_action_set_meta:
+
+.. table:: SET_META
+
+   +----------+----------------------------+
+   | Field    | Value                      |
+   +==========+============================+
+   | ``data`` | 32 bit metadata value      |
+   +----------+----------------------------+
+   | ``mask`` | bit-mask applies to "data" |
+   +----------+----------------------------+
+
 Negative types
 ~~~~~~~~~~~~~~
 
diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 3aa1634..9d54d8e 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -106,6 +106,10 @@  Deprecation Notices
   struct ``rte_eth_dev_info`` for the port capability and in struct
   ``rte_eth_rxmode`` for the port configuration.
 
+* ethdev: DEV_TX_OFFLOAD_MATCH_METADATA will be removed, static metadata
+  mbuf field will be removed in 20.02, metadata feature will use dynamic mbuf
+  field and flag instead.
+
 * cryptodev: support for using IV with all sizes is added, J0 still can
   be used but only when IV length in following structs ``rte_crypto_auth_xform``,
   ``rte_crypto_aead_xform`` is set to zero. When IV length is greater or equal
diff --git a/doc/guides/rel_notes/release_19_11.rst b/doc/guides/rel_notes/release_19_11.rst
index 0e5bb5d..6d331f6 100644
--- a/doc/guides/rel_notes/release_19_11.rst
+++ b/doc/guides/rel_notes/release_19_11.rst
@@ -232,6 +232,21 @@  New Features
     gives ability to print port supported ptypes in different protocol layers.
 
 
+* **Add support of support dynamic fields and flags in mbuf.**
+
+  This new feature adds the ability to dynamically register some room
+  for a field or a flag in the mbuf structure. This is typically used
+  for specific offload features, where adding a static field or flag
+  in the mbuf is not justified.
+
+* **Extended metadata support in rte_flow.**
+
+  Flow metadata is extended to both Rx and Tx.
+
+  * Tx metadata can also be set by SET_META action of rte_flow.
+  * Rx metadata is delivered to host via a dynamic field of ``rte_mbuf`` with
+    PKT_RX_DYNF_METADATA.
+
 Removed Items
 -------------
 
diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
index c36c1b6..b19c86b 100644
--- a/lib/librte_ethdev/rte_ethdev.h
+++ b/lib/librte_ethdev/rte_ethdev.h
@@ -1048,7 +1048,6 @@  struct rte_eth_conf {
 #define DEV_RX_OFFLOAD_KEEP_CRC		0x00010000
 #define DEV_RX_OFFLOAD_SCTP_CKSUM	0x00020000
 #define DEV_RX_OFFLOAD_OUTER_UDP_CKSUM  0x00040000
-
 #define DEV_RX_OFFLOAD_CHECKSUM (DEV_RX_OFFLOAD_IPV4_CKSUM | \
 				 DEV_RX_OFFLOAD_UDP_CKSUM | \
 				 DEV_RX_OFFLOAD_TCP_CKSUM)
diff --git a/lib/librte_ethdev/rte_ethdev_version.map b/lib/librte_ethdev/rte_ethdev_version.map
index e59d516..a5bf643 100644
--- a/lib/librte_ethdev/rte_ethdev_version.map
+++ b/lib/librte_ethdev/rte_ethdev_version.map
@@ -288,4 +288,7 @@  EXPERIMENTAL {
 	rte_eth_rx_burst_mode_get;
 	rte_eth_tx_burst_mode_get;
 	rte_eth_burst_mode_option_name;
+	rte_flow_dynf_metadata_offs;
+	rte_flow_dynf_metadata_mask;
+	rte_flow_dynf_metadata_register;
 };
diff --git a/lib/librte_ethdev/rte_flow.c b/lib/librte_ethdev/rte_flow.c
index ca0f680..6090177 100644
--- a/lib/librte_ethdev/rte_flow.c
+++ b/lib/librte_ethdev/rte_flow.c
@@ -12,10 +12,18 @@ 
 #include <rte_errno.h>
 #include <rte_branch_prediction.h>
 #include <rte_string_fns.h>
+#include <rte_mbuf.h>
+#include <rte_mbuf_dyn.h>
 #include "rte_ethdev.h"
 #include "rte_flow_driver.h"
 #include "rte_flow.h"
 
+/* Mbuf dynamic field name for metadata. */
+int rte_flow_dynf_metadata_offs = -1;
+
+/* Mbuf dynamic field flag bit number for metadata. */
+uint64_t rte_flow_dynf_metadata_mask;
+
 /**
  * Flow elements description tables.
  */
@@ -157,8 +165,41 @@  struct rte_flow_desc_data {
 	MK_FLOW_ACTION(DEC_TCP_SEQ, sizeof(rte_be32_t)),
 	MK_FLOW_ACTION(INC_TCP_ACK, sizeof(rte_be32_t)),
 	MK_FLOW_ACTION(DEC_TCP_ACK, sizeof(rte_be32_t)),
+	MK_FLOW_ACTION(SET_META, sizeof(struct rte_flow_action_set_meta)),
 };
 
+int
+rte_flow_dynf_metadata_register(void)
+{
+	int offset;
+	int flag;
+
+	static const struct rte_mbuf_dynfield desc_offs = {
+		.name = MBUF_DYNF_METADATA_NAME,
+		.size = sizeof(uint32_t),
+		.align = __alignof__(uint32_t),
+		.flags = 0,
+	};
+	static const struct rte_mbuf_dynflag desc_flag = {
+		.name = MBUF_DYNF_METADATA_NAME,
+	};
+
+	offset = rte_mbuf_dynfield_register(&desc_offs);
+	if (offset < 0)
+		goto error;
+	flag = rte_mbuf_dynflag_register(&desc_flag);
+	if (flag < 0)
+		goto error;
+	rte_flow_dynf_metadata_offs = offset;
+	rte_flow_dynf_metadata_mask = (1ULL << flag);
+	return 0;
+
+error:
+	rte_flow_dynf_metadata_offs = -1;
+	rte_flow_dynf_metadata_mask = 0ULL;
+	return -rte_errno;
+}
+
 static int
 flow_err(uint16_t port_id, int ret, struct rte_flow_error *error)
 {
diff --git a/lib/librte_ethdev/rte_flow.h b/lib/librte_ethdev/rte_flow.h
index 4fee105..b821557 100644
--- a/lib/librte_ethdev/rte_flow.h
+++ b/lib/librte_ethdev/rte_flow.h
@@ -28,6 +28,8 @@ 
 #include <rte_byteorder.h>
 #include <rte_esp.h>
 #include <rte_higig.h>
+#include <rte_mbuf.h>
+#include <rte_mbuf_dyn.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -418,7 +420,8 @@  enum rte_flow_item_type {
 	/**
 	 * [META]
 	 *
-	 * Matches a metadata value specified in mbuf metadata field.
+	 * Matches a metadata value.
+	 *
 	 * See struct rte_flow_item_meta.
 	 */
 	RTE_FLOW_ITEM_TYPE_META,
@@ -1263,9 +1266,17 @@  struct rte_flow_item_icmp6_nd_opt_tla_eth {
 #endif
 
 /**
- * RTE_FLOW_ITEM_TYPE_META.
+ * @warning
+ * @b EXPERIMENTAL: this structure may change without prior notice
  *
- * Matches a specified metadata value.
+ * RTE_FLOW_ITEM_TYPE_META
+ *
+ * Matches a specified metadata value. On egress, metadata can be set either by
+ * mbuf tx_metadata field with PKT_TX_METADATA flag or
+ * RTE_FLOW_ACTION_TYPE_SET_META. On ingress, RTE_FLOW_ACTION_TYPE_SET_META sets
+ * metadata for a packet and the metadata will be reported via mbuf metadata
+ * dynamic field with PKT_RX_DYNF_METADATA flag. The dynamic mbuf field must be
+ * registered in advance by rte_flow_dynf_metadata_register().
  */
 struct rte_flow_item_meta {
 	rte_be32_t data;
@@ -1942,6 +1953,13 @@  enum rte_flow_action_type {
 	 * undefined behavior.
 	 */
 	RTE_FLOW_ACTION_TYPE_DEC_TCP_ACK,
+
+	/**
+	 * Set metadata on ingress or egress path.
+	 *
+	 * See struct rte_flow_action_set_meta.
+	 */
+	RTE_FLOW_ACTION_TYPE_SET_META,
 };
 
 /**
@@ -2429,6 +2447,55 @@  struct rte_flow_action_set_mac {
 	uint8_t mac_addr[RTE_ETHER_ADDR_LEN];
 };
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this structure may change without prior notice
+ *
+ * RTE_FLOW_ACTION_TYPE_SET_META
+ *
+ * Set metadata. Metadata set by mbuf tx_metadata field with
+ * PKT_TX_METADATA flag on egress will be overridden by this action. On
+ * ingress, the metadata will be carried by mbuf metadata dynamic field
+ * with PKT_RX_DYNF_METADATA flag if set.  The dynamic mbuf field must be
+ * registered in advance by rte_flow_dynf_metadata_register().
+ *
+ * Altering partial bits is supported with mask. For bits which have never
+ * been set, unpredictable value will be seen depending on driver
+ * implementation. For loopback/hairpin packet, metadata set on Rx/Tx may
+ * or may not be propagated to the other path depending on HW capability.
+ *
+ * RTE_FLOW_ITEM_TYPE_META matches metadata.
+ */
+struct rte_flow_action_set_meta {
+	rte_be32_t data;
+	rte_be32_t mask;
+};
+
+/* Mbuf dynamic field offset for metadata. */
+extern int rte_flow_dynf_metadata_offs;
+
+/* Mbuf dynamic field flag mask for metadata. */
+extern uint64_t rte_flow_dynf_metadata_mask;
+
+/* Mbuf dynamic field pointer for metadata. */
+#define RTE_FLOW_DYNF_METADATA(m) \
+	RTE_MBUF_DYNFIELD((m), rte_flow_dynf_metadata_offs, uint32_t *)
+
+/* Mbuf dynamic flag for metadata. */
+#define PKT_RX_DYNF_METADATA (rte_flow_dynf_metadata_mask)
+
+__rte_experimental
+static inline uint32_t
+rte_flow_dynf_metadata_get(struct rte_mbuf *m) {
+	return *RTE_FLOW_DYNF_METADATA(m);
+}
+
+__rte_experimental
+static inline void
+rte_flow_dynf_metadata_set(struct rte_mbuf *m, uint32_t v) {
+	*RTE_FLOW_DYNF_METADATA(m) = v;
+}
+
 /*
  * Definition of a single action.
  *
@@ -2662,6 +2729,32 @@  enum rte_flow_conv_op {
 };
 
 /**
+ * Check if mbuf dynamic field for metadata is registered.
+ *
+ * @return
+ *   True if registered, false otherwise.
+ */
+__rte_experimental
+static inline int
+rte_flow_dynf_metadata_avail(void) {
+	return !!rte_flow_dynf_metadata_mask;
+}
+
+/**
+ * Register mbuf dynamic field and flag for metadata.
+ *
+ * This function must be called prior to use SET_META action in order to
+ * register the dynamic mbuf field. Otherwise, the data cannot be delivered to
+ * application.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_flow_dynf_metadata_register(void);
+
+/**
  * Check whether a flow rule can be created on a given port.
  *
  * The flow rule is validated for correctness and whether it could be accepted
diff --git a/lib/librte_mbuf/rte_mbuf_dyn.h b/lib/librte_mbuf/rte_mbuf_dyn.h
index 2e9d418..a4a0cf5 100644
--- a/lib/librte_mbuf/rte_mbuf_dyn.h
+++ b/lib/librte_mbuf/rte_mbuf_dyn.h
@@ -234,6 +234,10 @@  int rte_mbuf_dynflag_lookup(const char *name,
 __rte_experimental
 void rte_mbuf_dyn_dump(FILE *out);
 
-/* Placeholder for dynamic fields and flags declarations. */
-
+/*
+ * Placeholder for dynamic fields and flags declarations.
+ * This is centralizing point to gather all field names
+ * and parameters together.
+ */
+#define MBUF_DYNF_METADATA_NAME "rte_flow_dynfield_metadata"
 #endif