ethdev: extend flow metadata

Message ID: 20190704232122.19477-1-yskoh@mellanox.com (mailing list archive)
State: Changes Requested, archived
Delegated to: Ferruh Yigit
Series: ethdev: extend flow metadata

Checks

Context                          Check    Description
ci/checkpatch                    success  coding style OK
ci/mellanox-Performance-Testing  success  Performance Testing PASS
ci/intel-Performance-Testing     success  Performance Testing PASS
ci/Intel-compilation             fail     apply issues

Commit Message

Yongseok Koh July 4, 2019, 11:21 p.m. UTC
Currently, metadata can be set on egress path via mbuf tx_metadata field
with PKT_TX_METADATA flag and RTE_FLOW_ITEM_TYPE_META matches metadata.

This patch extends the usability.

1) RTE_FLOW_ACTION_TYPE_SET_META

When supporting multiple tables, Tx metadata can also be set by a rule and
matched by another rule. This new action allows metadata to be set as a
result of flow match.

2) Metadata on ingress

There is also a need to support metadata on packet Rx. Metadata can be set by
SET_META action and matched by META item like Tx. The final value set by
the action will be delivered to application via mbuf metadata field with
PKT_RX_METADATA ol_flag.

For this purpose, mbuf->tx_metadata is moved as a separate new field and
renamed to 'metadata' to support both Rx and Tx metadata.

For loopback/hairpin packet, metadata set on Rx/Tx may or may not be
propagated to the other path depending on HW capability.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
---
 app/test-pmd/cmdline_flow.c            | 35 ++++++++++++
 app/test-pmd/util.c                    |  2 +-
 doc/guides/prog_guide/rte_flow.rst     | 73 ++++++++++++++++++--------
 doc/guides/rel_notes/release_19_08.rst | 10 ++++
 drivers/net/mlx5/mlx5_rxtx.c           | 12 ++---
 drivers/net/mlx5/mlx5_rxtx_vec.c       |  4 +-
 drivers/net/mlx5/mlx5_rxtx_vec_neon.h  |  2 +-
 drivers/net/mlx5/mlx5_rxtx_vec_sse.h   |  2 +-
 lib/librte_ethdev/rte_ethdev.h         |  5 ++
 lib/librte_ethdev/rte_flow.h           | 43 +++++++++++++--
 lib/librte_mbuf/rte_mbuf.h             | 21 ++++----
 11 files changed, 161 insertions(+), 48 deletions(-)
  

Comments

Olivier Matz July 10, 2019, 9:31 a.m. UTC | #1
Hi Yongseok,

On Thu, Jul 04, 2019 at 04:21:22PM -0700, Yongseok Koh wrote:
> Currently, metadata can be set on egress path via mbuf tx_meatadata field
> with PKT_TX_METADATA flag and RTE_FLOW_ITEM_TYPE_RX_META matches metadata.
> 
> This patch extends the usability.
> 
> 1) RTE_FLOW_ACTION_TYPE_SET_META
> 
> When supporting multiple tables, Tx metadata can also be set by a rule and
> matched by another rule. This new action allows metadata to be set as a
> result of flow match.
> 
> 2) Metadata on ingress
> 
> There's also need to support metadata on packet Rx. Metadata can be set by
> SET_META action and matched by META item like Tx. The final value set by
> the action will be delivered to application via mbuf metadata field with
> PKT_RX_METADATA ol_flag.
> 
> For this purpose, mbuf->tx_metadata is moved as a separate new field and
> renamed to 'metadata' to support both Rx and Tx metadata.
> 
> For loopback/hairpin packet, metadata set on Rx/Tx may or may not be
> propagated to the other path depending on HW capability.
> 
> Signed-off-by: Yongseok Koh <yskoh@mellanox.com>

(...)

> --- a/lib/librte_mbuf/rte_mbuf.h
> +++ b/lib/librte_mbuf/rte_mbuf.h
> @@ -200,6 +200,11 @@ extern "C" {
>  
>  /* add new RX flags here */
>  
> +/**
> + * Indicate that mbuf has metadata from device.
> + */
> +#define PKT_RX_METADATA	(1ULL << 23)
> +
>  /* add new TX flags here */
>  
>  /**
> @@ -648,17 +653,6 @@ struct rte_mbuf {
>  			/**< User defined tags. See rte_distributor_process() */
>  			uint32_t usr;
>  		} hash;                   /**< hash information */
> -		struct {
> -			/**
> -			 * Application specific metadata value
> -			 * for egress flow rule match.
> -			 * Valid if PKT_TX_METADATA is set.
> -			 * Located here to allow conjunct use
> -			 * with hash.sched.hi.
> -			 */
> -			uint32_t tx_metadata;
> -			uint32_t reserved;
> -		};
>  	};
>  
>  	/** Outer VLAN TCI (CPU order), valid if PKT_RX_QINQ is set. */
> @@ -727,6 +721,11 @@ struct rte_mbuf {
>  	 */
>  	struct rte_mbuf_ext_shared_info *shinfo;
>  
> +	/** Application specific metadata value for flow rule match.
> +	 * Valid if PKT_RX_METADATA or PKT_TX_METADATA is set.
> +	 */
> +	uint32_t metadata;
> +
>  } __rte_cache_aligned;

This will break the ABI, so we cannot put it in 19.08, and we need a
deprecation notice.

That said, it shows again that we need something to make the addition of
fields in mbufs more flexible. Knowing that Thomas will present
something about it at the next DPDK Userspace summit [1], I drafted
something in an RFC [2]. It is simpler than I expected, and I think your
commit could be a good candidate for a first user. Do you mind having a
try? Feedback and comments are of course welcome.

If it fits your needs, we can target its introduction for 19.11. Having
a user for this new feature would be a plus for its introduction :)

Thanks,
Olivier

[1] https://dpdkbordeaux2019.sched.com/
[2] http://mails.dpdk.org/archives/dev/2019-July/137772.html
  
Bruce Richardson July 10, 2019, 9:55 a.m. UTC | #2
On Wed, Jul 10, 2019 at 11:31:56AM +0200, Olivier Matz wrote:
> Hi Yongseok,
> 
> On Thu, Jul 04, 2019 at 04:21:22PM -0700, Yongseok Koh wrote:
> > Currently, metadata can be set on egress path via mbuf tx_meatadata field
> > with PKT_TX_METADATA flag and RTE_FLOW_ITEM_TYPE_RX_META matches metadata.
> > 
> > This patch extends the usability.
> > 
> > 1) RTE_FLOW_ACTION_TYPE_SET_META
> > 
> > When supporting multiple tables, Tx metadata can also be set by a rule and
> > matched by another rule. This new action allows metadata to be set as a
> > result of flow match.
> > 
> > 2) Metadata on ingress
> > 
> > There's also need to support metadata on packet Rx. Metadata can be set by
> > SET_META action and matched by META item like Tx. The final value set by
> > the action will be delivered to application via mbuf metadata field with
> > PKT_RX_METADATA ol_flag.
> > 
> > For this purpose, mbuf->tx_metadata is moved as a separate new field and
> > renamed to 'metadata' to support both Rx and Tx metadata.
> > 
> > For loopback/hairpin packet, metadata set on Rx/Tx may or may not be
> > propagated to the other path depending on HW capability.
> > 
> > Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
> 
> (...)
> 
> > --- a/lib/librte_mbuf/rte_mbuf.h
> > +++ b/lib/librte_mbuf/rte_mbuf.h
> > @@ -200,6 +200,11 @@ extern "C" {
> >  
> >  /* add new RX flags here */
> >  
> > +/**
> > + * Indicate that mbuf has metadata from device.
> > + */
> > +#define PKT_RX_METADATA	(1ULL << 23)
> > +
> >  /* add new TX flags here */
> >  
> >  /**
> > @@ -648,17 +653,6 @@ struct rte_mbuf {
> >  			/**< User defined tags. See rte_distributor_process() */
> >  			uint32_t usr;
> >  		} hash;                   /**< hash information */
> > -		struct {
> > -			/**
> > -			 * Application specific metadata value
> > -			 * for egress flow rule match.
> > -			 * Valid if PKT_TX_METADATA is set.
> > -			 * Located here to allow conjunct use
> > -			 * with hash.sched.hi.
> > -			 */
> > -			uint32_t tx_metadata;
> > -			uint32_t reserved;
> > -		};
> >  	};
> >  
> >  	/** Outer VLAN TCI (CPU order), valid if PKT_RX_QINQ is set. */
> > @@ -727,6 +721,11 @@ struct rte_mbuf {
> >  	 */
> >  	struct rte_mbuf_ext_shared_info *shinfo;
> >  
> > +	/** Application specific metadata value for flow rule match.
> > +	 * Valid if PKT_RX_METADATA or PKT_TX_METADATA is set.
> > +	 */
> > +	uint32_t metadata;
> > +
> >  } __rte_cache_aligned;
> 
> This will break the ABI, so we cannot put it in 19.08, and we need a
> deprecation notice.
> 
Does it actually break the ABI? Adding a new field to the mbuf should only
break the ABI if it either causes new fields to move or changes the
structure size. Since this is at the end, it's not going to move any older
fields, and since everything is cache-aligned I don't think the structure
size changes either.
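
As a rough illustration of the size argument above, here is a toy structure,
not the real rte_mbuf. The `toy_` names, the 64-byte cache line, the 112 bytes
of pre-existing fields and the GCC/Clang `aligned` attribute are all
assumptions made for the sketch. It only shows that a 4-byte field appended to
a cache-aligned structure can land in what used to be tail padding:

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_CACHE_LINE 64 /* assumed cache line size */

/* Toy "before" layout: 112 bytes of fields, padded up to 128 by alignment. */
struct toy_mbuf_before {
    uint8_t existing_fields[112];
} __attribute__((aligned(TOY_CACHE_LINE)));

/* Toy "after" layout: the new field is appended after all existing ones. */
struct toy_mbuf_after {
    uint8_t existing_fields[112];
    uint32_t metadata; /* occupies bytes that used to be tail padding */
} __attribute__((aligned(TOY_CACHE_LINE)));

/* Returns 1 when appending the field left the structure size unchanged. */
static inline int toy_size_unchanged(void)
{
    return sizeof(struct toy_mbuf_before) == sizeof(struct toy_mbuf_after);
}

/* Offset of the appended field; the older fields keep their offsets. */
static inline size_t toy_metadata_offset(void)
{
    return offsetof(struct toy_mbuf_after, metadata);
}
```

With these assumed sizes both structures occupy two cache lines. The check
covers size and the offsets of untouched fields only; it says nothing about
code that reads or writes a field whose location changed.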
  
Olivier Matz July 10, 2019, 10:07 a.m. UTC | #3
On Wed, Jul 10, 2019 at 10:55:34AM +0100, Bruce Richardson wrote:
> On Wed, Jul 10, 2019 at 11:31:56AM +0200, Olivier Matz wrote:
> > Hi Yongseok,
> > 
> > On Thu, Jul 04, 2019 at 04:21:22PM -0700, Yongseok Koh wrote:
> > > Currently, metadata can be set on egress path via mbuf tx_meatadata field
> > > with PKT_TX_METADATA flag and RTE_FLOW_ITEM_TYPE_RX_META matches metadata.
> > > 
> > > This patch extends the usability.
> > > 
> > > 1) RTE_FLOW_ACTION_TYPE_SET_META
> > > 
> > > When supporting multiple tables, Tx metadata can also be set by a rule and
> > > matched by another rule. This new action allows metadata to be set as a
> > > result of flow match.
> > > 
> > > 2) Metadata on ingress
> > > 
> > > There's also need to support metadata on packet Rx. Metadata can be set by
> > > SET_META action and matched by META item like Tx. The final value set by
> > > the action will be delivered to application via mbuf metadata field with
> > > PKT_RX_METADATA ol_flag.
> > > 
> > > For this purpose, mbuf->tx_metadata is moved as a separate new field and
> > > renamed to 'metadata' to support both Rx and Tx metadata.
> > > 
> > > For loopback/hairpin packet, metadata set on Rx/Tx may or may not be
> > > propagated to the other path depending on HW capability.
> > > 
> > > Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
> > 
> > (...)
> > 
> > > --- a/lib/librte_mbuf/rte_mbuf.h
> > > +++ b/lib/librte_mbuf/rte_mbuf.h
> > > @@ -200,6 +200,11 @@ extern "C" {
> > >  
> > >  /* add new RX flags here */
> > >  
> > > +/**
> > > + * Indicate that mbuf has metadata from device.
> > > + */
> > > +#define PKT_RX_METADATA	(1ULL << 23)
> > > +
> > >  /* add new TX flags here */
> > >  
> > >  /**
> > > @@ -648,17 +653,6 @@ struct rte_mbuf {
> > >  			/**< User defined tags. See rte_distributor_process() */
> > >  			uint32_t usr;
> > >  		} hash;                   /**< hash information */
> > > -		struct {
> > > -			/**
> > > -			 * Application specific metadata value
> > > -			 * for egress flow rule match.
> > > -			 * Valid if PKT_TX_METADATA is set.
> > > -			 * Located here to allow conjunct use
> > > -			 * with hash.sched.hi.
> > > -			 */
> > > -			uint32_t tx_metadata;
> > > -			uint32_t reserved;
> > > -		};
> > >  	};
> > >  
> > >  	/** Outer VLAN TCI (CPU order), valid if PKT_RX_QINQ is set. */
> > > @@ -727,6 +721,11 @@ struct rte_mbuf {
> > >  	 */
> > >  	struct rte_mbuf_ext_shared_info *shinfo;
> > >  
> > > +	/** Application specific metadata value for flow rule match.
> > > +	 * Valid if PKT_RX_METADATA or PKT_TX_METADATA is set.
> > > +	 */
> > > +	uint32_t metadata;
> > > +
> > >  } __rte_cache_aligned;
> > 
> > This will break the ABI, so we cannot put it in 19.08, and we need a
> > deprecation notice.
> > 
> Does it actually break the ABI? Adding a new field to the mbuf should only
> break the ABI if it either causes new fields to move or changes the
> structure size. Since this is at the end, it's not going to move any older
> fields, and since everything is cache-aligned I don't think the structure
> size changes either.

I think it does break the ABI: in the previous version, when the PKT_TX_METADATA
flag is set, the associated value is put in m->tx_metadata (offset 44 on
x86-64), and in the next version, it will be in m->metadata (offset 112). So,
these two versions are not binary compatible.

Anyway, at least it breaks the API.
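
A minimal sketch of that incompatibility, assuming toy layouts rather than the
real rte_mbuf: only the offsets 44 and 112 come from the message above, while
the `toy_` names, the filler arrays and the flag bit are made up for
illustration. An application built against the old layout stores the value at
offset 44; a driver built against the new layout looks at offset 112 and does
not find it:

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_PKT_TX_METADATA (1ULL << 40) /* stand-in flag bit, assumed */

/* Toy "old" layout: metadata lives at byte offset 44. */
struct toy_mbuf_v1 {
    uint64_t ol_flags;
    uint8_t  other_a[36];
    uint32_t tx_metadata;   /* offset 44 */
    uint8_t  other_b[80];
};

/* Toy "new" layout: same size, metadata moved to byte offset 112. */
struct toy_mbuf_v2 {
    uint64_t ol_flags;
    uint8_t  other_a[36];
    uint32_t unused_slot;   /* offset 44, no longer read */
    uint8_t  other_b[64];
    uint32_t metadata;      /* offset 112 */
    uint8_t  tail[12];
};

/* One buffer seen through both layouts, as across a library upgrade. */
union toy_mbuf_view {
    struct toy_mbuf_v1 v1;
    struct toy_mbuf_v2 v2;
};

/* Application built against the old headers. */
static inline void toy_old_app_set(union toy_mbuf_view *m, uint32_t value)
{
    m->v1.ol_flags |= TOY_PKT_TX_METADATA;
    m->v1.tx_metadata = value;
}

/* Driver built against the new headers. */
static inline uint32_t toy_new_driver_get(const union toy_mbuf_view *m)
{
    return (m->v2.ol_flags & TOY_PKT_TX_METADATA) ? m->v2.metadata : 0;
}

/* Returns 1 only if the driver sees the value the application stored. */
static inline int toy_value_survives(uint32_t value)
{
    union toy_mbuf_view m = { .v1 = { 0 } };

    toy_old_app_set(&m, value);
    return toy_new_driver_get(&m) == value;
}
```

The flag is visible to both sides because `ol_flags` did not move, which is
what makes the mismatch silent: the driver believes metadata is present and
reads the wrong bytes.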
  
Bruce Richardson July 10, 2019, 12:01 p.m. UTC | #4
On Wed, Jul 10, 2019 at 12:07:43PM +0200, Olivier Matz wrote:
> On Wed, Jul 10, 2019 at 10:55:34AM +0100, Bruce Richardson wrote:
> > On Wed, Jul 10, 2019 at 11:31:56AM +0200, Olivier Matz wrote:
> > > Hi Yongseok,
> > > 
> > > On Thu, Jul 04, 2019 at 04:21:22PM -0700, Yongseok Koh wrote:
> > > > Currently, metadata can be set on egress path via mbuf tx_meatadata field
> > > > with PKT_TX_METADATA flag and RTE_FLOW_ITEM_TYPE_RX_META matches metadata.
> > > > 
> > > > This patch extends the usability.
> > > > 
> > > > 1) RTE_FLOW_ACTION_TYPE_SET_META
> > > > 
> > > > When supporting multiple tables, Tx metadata can also be set by a rule and
> > > > matched by another rule. This new action allows metadata to be set as a
> > > > result of flow match.
> > > > 
> > > > 2) Metadata on ingress
> > > > 
> > > > There's also need to support metadata on packet Rx. Metadata can be set by
> > > > SET_META action and matched by META item like Tx. The final value set by
> > > > the action will be delivered to application via mbuf metadata field with
> > > > PKT_RX_METADATA ol_flag.
> > > > 
> > > > For this purpose, mbuf->tx_metadata is moved as a separate new field and
> > > > renamed to 'metadata' to support both Rx and Tx metadata.
> > > > 
> > > > For loopback/hairpin packet, metadata set on Rx/Tx may or may not be
> > > > propagated to the other path depending on HW capability.
> > > > 
> > > > Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
> > > 
> > > (...)
> > > 
> > > > --- a/lib/librte_mbuf/rte_mbuf.h
> > > > +++ b/lib/librte_mbuf/rte_mbuf.h
> > > > @@ -200,6 +200,11 @@ extern "C" {
> > > >  
> > > >  /* add new RX flags here */
> > > >  
> > > > +/**
> > > > + * Indicate that mbuf has metadata from device.
> > > > + */
> > > > +#define PKT_RX_METADATA	(1ULL << 23)
> > > > +
> > > >  /* add new TX flags here */
> > > >  
> > > >  /**
> > > > @@ -648,17 +653,6 @@ struct rte_mbuf {
> > > >  			/**< User defined tags. See rte_distributor_process() */
> > > >  			uint32_t usr;
> > > >  		} hash;                   /**< hash information */
> > > > -		struct {
> > > > -			/**
> > > > -			 * Application specific metadata value
> > > > -			 * for egress flow rule match.
> > > > -			 * Valid if PKT_TX_METADATA is set.
> > > > -			 * Located here to allow conjunct use
> > > > -			 * with hash.sched.hi.
> > > > -			 */
> > > > -			uint32_t tx_metadata;
> > > > -			uint32_t reserved;
> > > > -		};
> > > >  	};
> > > >  
> > > >  	/** Outer VLAN TCI (CPU order), valid if PKT_RX_QINQ is set. */
> > > > @@ -727,6 +721,11 @@ struct rte_mbuf {
> > > >  	 */
> > > >  	struct rte_mbuf_ext_shared_info *shinfo;
> > > >  
> > > > +	/** Application specific metadata value for flow rule match.
> > > > +	 * Valid if PKT_RX_METADATA or PKT_TX_METADATA is set.
> > > > +	 */
> > > > +	uint32_t metadata;
> > > > +
> > > >  } __rte_cache_aligned;
> > > 
> > > This will break the ABI, so we cannot put it in 19.08, and we need a
> > > deprecation notice.
> > > 
> > Does it actually break the ABI? Adding a new field to the mbuf should only
> > break the ABI if it either causes new fields to move or changes the
> > structure size. Since this is at the end, it's not going to move any older
> > fields, and since everything is cache-aligned I don't think the structure
> > size changes either.
> 
> I think it does break the ABI: in previous version, when the PKT_TX_METADATA
> flag is set, the associated value is put in m->tx_metadata (offset 44 on
> x86-64), and in the next version, it will be in m->metadata (offset 112). So,
> these 2 versions are not binary compatible.
> 
> Anyway, at least it breaks the API.

Ok, I misunderstood. I thought it was the structure change itself you were
saying broke the ABI. Yes, putting the data in a different place is indeed
an ABI break.

/Bruce
  
Thomas Monjalon July 10, 2019, 12:26 p.m. UTC | #5
10/07/2019 14:01, Bruce Richardson:
> On Wed, Jul 10, 2019 at 12:07:43PM +0200, Olivier Matz wrote:
> > On Wed, Jul 10, 2019 at 10:55:34AM +0100, Bruce Richardson wrote:
> > > On Wed, Jul 10, 2019 at 11:31:56AM +0200, Olivier Matz wrote:
> > > > On Thu, Jul 04, 2019 at 04:21:22PM -0700, Yongseok Koh wrote:
> > > > > Currently, metadata can be set on egress path via mbuf tx_meatadata field
> > > > > with PKT_TX_METADATA flag and RTE_FLOW_ITEM_TYPE_RX_META matches metadata.
> > > > > 
> > > > > This patch extends the usability.
> > > > > 
> > > > > 1) RTE_FLOW_ACTION_TYPE_SET_META
> > > > > 
> > > > > When supporting multiple tables, Tx metadata can also be set by a rule and
> > > > > matched by another rule. This new action allows metadata to be set as a
> > > > > result of flow match.
> > > > > 
> > > > > 2) Metadata on ingress
> > > > > 
> > > > > There's also need to support metadata on packet Rx. Metadata can be set by
> > > > > SET_META action and matched by META item like Tx. The final value set by
> > > > > the action will be delivered to application via mbuf metadata field with
> > > > > PKT_RX_METADATA ol_flag.
> > > > > 
> > > > > For this purpose, mbuf->tx_metadata is moved as a separate new field and
> > > > > renamed to 'metadata' to support both Rx and Tx metadata.
> > > > > 
> > > > > For loopback/hairpin packet, metadata set on Rx/Tx may or may not be
> > > > > propagated to the other path depending on HW capability.
> > > > > 
> > > > > Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
> > > > 
> > > > > --- a/lib/librte_mbuf/rte_mbuf.h
> > > > > +++ b/lib/librte_mbuf/rte_mbuf.h
> > > > > @@ -648,17 +653,6 @@ struct rte_mbuf {
> > > > >  			/**< User defined tags. See rte_distributor_process() */
> > > > >  			uint32_t usr;
> > > > >  		} hash;                   /**< hash information */
> > > > > -		struct {
> > > > > -			/**
> > > > > -			 * Application specific metadata value
> > > > > -			 * for egress flow rule match.
> > > > > -			 * Valid if PKT_TX_METADATA is set.
> > > > > -			 * Located here to allow conjunct use
> > > > > -			 * with hash.sched.hi.
> > > > > -			 */
> > > > > -			uint32_t tx_metadata;
> > > > > -			uint32_t reserved;
> > > > > -		};
> > > > >  	};
> > > > >  
> > > > >  	/** Outer VLAN TCI (CPU order), valid if PKT_RX_QINQ is set. */
> > > > > @@ -727,6 +721,11 @@ struct rte_mbuf {
> > > > >  	 */
> > > > >  	struct rte_mbuf_ext_shared_info *shinfo;
> > > > >  
> > > > > +	/** Application specific metadata value for flow rule match.
> > > > > +	 * Valid if PKT_RX_METADATA or PKT_TX_METADATA is set.
> > > > > +	 */
> > > > > +	uint32_t metadata;
> > > > > +
> > > > >  } __rte_cache_aligned;
> > > > 
> > > > This will break the ABI, so we cannot put it in 19.08, and we need a
> > > > deprecation notice.
> > > > 
> > > Does it actually break the ABI? Adding a new field to the mbuf should only
> > > break the ABI if it either causes new fields to move or changes the
> > > structure size. Since this is at the end, it's not going to move any older
> > > fields, and since everything is cache-aligned I don't think the structure
> > > size changes either.
> > 
> > I think it does break the ABI: in previous version, when the PKT_TX_METADATA
> > flag is set, the associated value is put in m->tx_metadata (offset 44 on
> > x86-64), and in the next version, it will be in m->metadata (offset 112). So,
> > these 2 versions are not binary compatible.
> > 
> > Anyway, at least it breaks the API.
> 
> Ok, I misunderstood. I thought it was the structure change itself you were
> saying broke the ABI. Yes, putting the data in a different place is indeed
> an ABI break.

We could add the new field and keep the old one unused,
so it does not break the ABI.
However I suppose everybody will prefer a version using dynamic fields.
Is someone against using dynamic field for such usage?
  
Yongseok Koh July 10, 2019, 4:37 p.m. UTC | #6
> On Jul 10, 2019, at 5:26 AM, Thomas Monjalon <thomas@monjalon.net> wrote:
> 
> 10/07/2019 14:01, Bruce Richardson:
>> On Wed, Jul 10, 2019 at 12:07:43PM +0200, Olivier Matz wrote:
>>> On Wed, Jul 10, 2019 at 10:55:34AM +0100, Bruce Richardson wrote:
>>>> On Wed, Jul 10, 2019 at 11:31:56AM +0200, Olivier Matz wrote:
>>>>> On Thu, Jul 04, 2019 at 04:21:22PM -0700, Yongseok Koh wrote:
>>>>>> Currently, metadata can be set on egress path via mbuf tx_meatadata field
>>>>>> with PKT_TX_METADATA flag and RTE_FLOW_ITEM_TYPE_RX_META matches metadata.
>>>>>> 
>>>>>> This patch extends the usability.
>>>>>> 
>>>>>> 1) RTE_FLOW_ACTION_TYPE_SET_META
>>>>>> 
>>>>>> When supporting multiple tables, Tx metadata can also be set by a rule and
>>>>>> matched by another rule. This new action allows metadata to be set as a
>>>>>> result of flow match.
>>>>>> 
>>>>>> 2) Metadata on ingress
>>>>>> 
>>>>>> There's also need to support metadata on packet Rx. Metadata can be set by
>>>>>> SET_META action and matched by META item like Tx. The final value set by
>>>>>> the action will be delivered to application via mbuf metadata field with
>>>>>> PKT_RX_METADATA ol_flag.
>>>>>> 
>>>>>> For this purpose, mbuf->tx_metadata is moved as a separate new field and
>>>>>> renamed to 'metadata' to support both Rx and Tx metadata.
>>>>>> 
>>>>>> For loopback/hairpin packet, metadata set on Rx/Tx may or may not be
>>>>>> propagated to the other path depending on HW capability.
>>>>>> 
>>>>>> Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
>>>>> 
>>>>>> --- a/lib/librte_mbuf/rte_mbuf.h
>>>>>> +++ b/lib/librte_mbuf/rte_mbuf.h
>>>>>> @@ -648,17 +653,6 @@ struct rte_mbuf {
>>>>>> 			/**< User defined tags. See rte_distributor_process() */
>>>>>> 			uint32_t usr;
>>>>>> 		} hash;                   /**< hash information */
>>>>>> -		struct {
>>>>>> -			/**
>>>>>> -			 * Application specific metadata value
>>>>>> -			 * for egress flow rule match.
>>>>>> -			 * Valid if PKT_TX_METADATA is set.
>>>>>> -			 * Located here to allow conjunct use
>>>>>> -			 * with hash.sched.hi.
>>>>>> -			 */
>>>>>> -			uint32_t tx_metadata;
>>>>>> -			uint32_t reserved;
>>>>>> -		};
>>>>>> 	};
>>>>>> 
>>>>>> 	/** Outer VLAN TCI (CPU order), valid if PKT_RX_QINQ is set. */
>>>>>> @@ -727,6 +721,11 @@ struct rte_mbuf {
>>>>>> 	 */
>>>>>> 	struct rte_mbuf_ext_shared_info *shinfo;
>>>>>> 
>>>>>> +	/** Application specific metadata value for flow rule match.
>>>>>> +	 * Valid if PKT_RX_METADATA or PKT_TX_METADATA is set.
>>>>>> +	 */
>>>>>> +	uint32_t metadata;
>>>>>> +
>>>>>> } __rte_cache_aligned;
>>>>> 
>>>>> This will break the ABI, so we cannot put it in 19.08, and we need a
>>>>> deprecation notice.
>>>>> 
>>>> Does it actually break the ABI? Adding a new field to the mbuf should only
>>>> break the ABI if it either causes new fields to move or changes the
>>>> structure size. Since this is at the end, it's not going to move any older
>>>> fields, and since everything is cache-aligned I don't think the structure
>>>> size changes either.
>>> 
>>> I think it does break the ABI: in previous version, when the PKT_TX_METADATA
>>> flag is set, the associated value is put in m->tx_metadata (offset 44 on
>>> x86-64), and in the next version, it will be in m->metadata (offset 112). So,
>>> these 2 versions are not binary compatible.
>>> 
>>> Anyway, at least it breaks the API.
>> 
>> Ok, I misunderstood. I thought it was the structure change itself you were
>> saying broke the ABI. Yes, putting the data in a different place is indeed
>> an ABI break.
> 
> We could add the new field and keep the old one unused,
> so it does not break the ABI.

Still breaks ABI if PKT_TX_METADATA is set. :-) In order not to break it, I can
keep the current union'd field (tx_metadata) as is with PKT_TX_METADATA, add
the new one at the end and make it used with the new PKT_RX_METADATA.
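
A sketch of what I mean, on a toy structure and under stated assumptions: the
`toy_` names and the `rx_metadata` field name are hypothetical, the Tx flag
value is assumed, and only the Rx flag bit is the one from the patch. The Tx
field stays where it is and a separate field is appended for Rx:

```c
#include <stdint.h>

#define TOY_PKT_RX_METADATA (1ULL << 23) /* bit proposed in the patch */
#define TOY_PKT_TX_METADATA (1ULL << 40) /* assumed value for illustration */

/* Toy mbuf: the Tx field keeps its place, the Rx field is appended. */
struct toy_mbuf {
    uint64_t ol_flags;
    uint32_t tx_metadata;  /* unchanged location, valid with the Tx flag */
    uint32_t reserved;
    /* ... other existing fields would sit here ... */
    uint32_t rx_metadata;  /* hypothetical new field, valid with the Rx flag */
};

/* Application side: attach metadata to a packet about to be sent. */
static inline void toy_set_tx_metadata(struct toy_mbuf *m, uint32_t value)
{
    m->tx_metadata = value;
    m->ol_flags |= TOY_PKT_TX_METADATA;
}

/* Driver side: report metadata for a received packet. */
static inline void toy_set_rx_metadata(struct toy_mbuf *m, uint32_t value)
{
    m->rx_metadata = value;
    m->ol_flags |= TOY_PKT_RX_METADATA;
}

/* Application side: returns 1 and fills *value only if the Rx flag is set. */
static inline int toy_get_rx_metadata(const struct toy_mbuf *m, uint32_t *value)
{
    if (!(m->ol_flags & TOY_PKT_RX_METADATA))
        return 0;
    *value = m->rx_metadata;
    return 1;
}
```

Existing binaries that set PKT_TX_METADATA keep working because nothing they
touch has moved; only new code knows about the Rx flag and field.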

> However I suppose everybody will prefer a version using dynamic fields.
> Is someone against using dynamic field for such usage?

However, given that the amazing dynamic fields are coming soon (thanks for your
effort, Olivier and Thomas!), I'd be honored to be their first user.

Olivier, I'll take a look at your RFC.


Thanks,
Yongseok
  
Adrien Mazarguil July 11, 2019, 7:44 a.m. UTC | #7
On Wed, Jul 10, 2019 at 04:37:46PM +0000, Yongseok Koh wrote:
> 
> > On Jul 10, 2019, at 5:26 AM, Thomas Monjalon <thomas@monjalon.net> wrote:
> > 
> > 10/07/2019 14:01, Bruce Richardson:
> >> On Wed, Jul 10, 2019 at 12:07:43PM +0200, Olivier Matz wrote:
> >>> On Wed, Jul 10, 2019 at 10:55:34AM +0100, Bruce Richardson wrote:
> >>>> On Wed, Jul 10, 2019 at 11:31:56AM +0200, Olivier Matz wrote:
> >>>>> On Thu, Jul 04, 2019 at 04:21:22PM -0700, Yongseok Koh wrote:
> >>>>>> Currently, metadata can be set on egress path via mbuf tx_meatadata field
> >>>>>> with PKT_TX_METADATA flag and RTE_FLOW_ITEM_TYPE_RX_META matches metadata.
> >>>>>> 
> >>>>>> This patch extends the usability.
> >>>>>> 
> >>>>>> 1) RTE_FLOW_ACTION_TYPE_SET_META
> >>>>>> 
> >>>>>> When supporting multiple tables, Tx metadata can also be set by a rule and
> >>>>>> matched by another rule. This new action allows metadata to be set as a
> >>>>>> result of flow match.
> >>>>>> 
> >>>>>> 2) Metadata on ingress
> >>>>>> 
> >>>>>> There's also need to support metadata on packet Rx. Metadata can be set by
> >>>>>> SET_META action and matched by META item like Tx. The final value set by
> >>>>>> the action will be delivered to application via mbuf metadata field with
> >>>>>> PKT_RX_METADATA ol_flag.
> >>>>>> 
> >>>>>> For this purpose, mbuf->tx_metadata is moved as a separate new field and
> >>>>>> renamed to 'metadata' to support both Rx and Tx metadata.
> >>>>>> 
> >>>>>> For loopback/hairpin packet, metadata set on Rx/Tx may or may not be
> >>>>>> propagated to the other path depending on HW capability.
> >>>>>> 
> >>>>>> Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
> >>>>> 
> >>>>>> --- a/lib/librte_mbuf/rte_mbuf.h
> >>>>>> +++ b/lib/librte_mbuf/rte_mbuf.h
> >>>>>> @@ -648,17 +653,6 @@ struct rte_mbuf {
> >>>>>> 			/**< User defined tags. See rte_distributor_process() */
> >>>>>> 			uint32_t usr;
> >>>>>> 		} hash;                   /**< hash information */
> >>>>>> -		struct {
> >>>>>> -			/**
> >>>>>> -			 * Application specific metadata value
> >>>>>> -			 * for egress flow rule match.
> >>>>>> -			 * Valid if PKT_TX_METADATA is set.
> >>>>>> -			 * Located here to allow conjunct use
> >>>>>> -			 * with hash.sched.hi.
> >>>>>> -			 */
> >>>>>> -			uint32_t tx_metadata;
> >>>>>> -			uint32_t reserved;
> >>>>>> -		};
> >>>>>> 	};
> >>>>>> 
> >>>>>> 	/** Outer VLAN TCI (CPU order), valid if PKT_RX_QINQ is set. */
> >>>>>> @@ -727,6 +721,11 @@ struct rte_mbuf {
> >>>>>> 	 */
> >>>>>> 	struct rte_mbuf_ext_shared_info *shinfo;
> >>>>>> 
> >>>>>> +	/** Application specific metadata value for flow rule match.
> >>>>>> +	 * Valid if PKT_RX_METADATA or PKT_TX_METADATA is set.
> >>>>>> +	 */
> >>>>>> +	uint32_t metadata;
> >>>>>> +
> >>>>>> } __rte_cache_aligned;
> >>>>> 
> >>>>> This will break the ABI, so we cannot put it in 19.08, and we need a
> >>>>> deprecation notice.
> >>>>> 
> >>>> Does it actually break the ABI? Adding a new field to the mbuf should only
> >>>> break the ABI if it either causes new fields to move or changes the
> >>>> structure size. Since this is at the end, it's not going to move any older
> >>>> fields, and since everything is cache-aligned I don't think the structure
> >>>> size changes either.
> >>> 
> >>> I think it does break the ABI: in previous version, when the PKT_TX_METADATA
> >>> flag is set, the associated value is put in m->tx_metadata (offset 44 on
> >>> x86-64), and in the next version, it will be in m->metadata (offset 112). So,
> >>> these 2 versions are not binary compatible.
> >>> 
> >>> Anyway, at least it breaks the API.
> >> 
> >> Ok, I misunderstood. I thought it was the structure change itself you were
> >> saying broke the ABI. Yes, putting the data in a different place is indeed
> >> an ABI break.
> > 
> > We could add the new field and keep the old one unused,
> > so it does not break the ABI.
> 
> Still breaks ABI if PKT_TX_METADATA is set. :-) In order not to break it, I can
> keep the current union'd field (tx_metadata) as is with PKT_TX_METADATA, add
> the new one at the end and make it used with the new PKT_RX_METADATA.
> 
> > However I suppose everybody will prefer a version using dynamic fields.
> > Is someone against using dynamic field for such usage?
> 
> However, given that the amazing dynamic fields is coming soon (thanks for your
> effort, Olivier and Thomas!), I'd be honored to be the first user of it.
> 
> Olivier, I'll take a look at your RFC.

Just got a crazy idea while reading this thread... How about repurposing
that "reserved" field as "rx_metadata" in the meantime?

I know reserved fields are cursed and no one's ever supposed to touch them
but this risk is mitigated by having the end user explicitly request its
use, so the patch author (and his relatives) should be safe from the
resulting bad juju.

Joke aside, while I like the idea of Tx/Rx META, I think the similarities
with MARK (and TAG eventually) are a problem. I wasn't available and couldn't
comment when META was originally added to the Tx path, but there's a lot of
overlap between these items/actions, without anything explaining to the end
user how and why they should pick one over the other, whether they can be
combined at all, and what happens in that case.

All this must be documented, then we should think about unifying their
respective features and deprecating the less capable items/actions. In my
opinion, users need exactly one method to mark/match some mark while
processing Rx/Tx traffic and *optionally* have that mark read from/written
to the mbuf, which may or may not be possible depending on HW features.
  
Andrew Rybchenko July 14, 2019, 11:46 a.m. UTC | #8
On 11.07.2019 10:44, Adrien Mazarguil wrote:
> On Wed, Jul 10, 2019 at 04:37:46PM +0000, Yongseok Koh wrote:
>>> On Jul 10, 2019, at 5:26 AM, Thomas Monjalon <thomas@monjalon.net> wrote:
>>>
>>> 10/07/2019 14:01, Bruce Richardson:
>>>> On Wed, Jul 10, 2019 at 12:07:43PM +0200, Olivier Matz wrote:
>>>>> On Wed, Jul 10, 2019 at 10:55:34AM +0100, Bruce Richardson wrote:
>>>>>> On Wed, Jul 10, 2019 at 11:31:56AM +0200, Olivier Matz wrote:
>>>>>>> On Thu, Jul 04, 2019 at 04:21:22PM -0700, Yongseok Koh wrote:
>>>>>>>> Currently, metadata can be set on egress path via mbuf tx_meatadata field
>>>>>>>> with PKT_TX_METADATA flag and RTE_FLOW_ITEM_TYPE_RX_META matches metadata.
>>>>>>>>
>>>>>>>> This patch extends the usability.
>>>>>>>>
>>>>>>>> 1) RTE_FLOW_ACTION_TYPE_SET_META
>>>>>>>>
>>>>>>>> When supporting multiple tables, Tx metadata can also be set by a rule and
>>>>>>>> matched by another rule. This new action allows metadata to be set as a
>>>>>>>> result of flow match.
>>>>>>>>
>>>>>>>> 2) Metadata on ingress
>>>>>>>>
>>>>>>>> There's also need to support metadata on packet Rx. Metadata can be set by
>>>>>>>> SET_META action and matched by META item like Tx. The final value set by
>>>>>>>> the action will be delivered to application via mbuf metadata field with
>>>>>>>> PKT_RX_METADATA ol_flag.
>>>>>>>>
>>>>>>>> For this purpose, mbuf->tx_metadata is moved as a separate new field and
>>>>>>>> renamed to 'metadata' to support both Rx and Tx metadata.
>>>>>>>>
>>>>>>>> For loopback/hairpin packet, metadata set on Rx/Tx may or may not be
>>>>>>>> propagated to the other path depending on HW capability.
>>>>>>>>
>>>>>>>> Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
>>>>>>>> --- a/lib/librte_mbuf/rte_mbuf.h
>>>>>>>> +++ b/lib/librte_mbuf/rte_mbuf.h
>>>>>>>> @@ -648,17 +653,6 @@ struct rte_mbuf {
>>>>>>>> 			/**< User defined tags. See rte_distributor_process() */
>>>>>>>> 			uint32_t usr;
>>>>>>>> 		} hash;                   /**< hash information */
>>>>>>>> -		struct {
>>>>>>>> -			/**
>>>>>>>> -			 * Application specific metadata value
>>>>>>>> -			 * for egress flow rule match.
>>>>>>>> -			 * Valid if PKT_TX_METADATA is set.
>>>>>>>> -			 * Located here to allow conjunct use
>>>>>>>> -			 * with hash.sched.hi.
>>>>>>>> -			 */
>>>>>>>> -			uint32_t tx_metadata;
>>>>>>>> -			uint32_t reserved;
>>>>>>>> -		};
>>>>>>>> 	};
>>>>>>>>
>>>>>>>> 	/** Outer VLAN TCI (CPU order), valid if PKT_RX_QINQ is set. */
>>>>>>>> @@ -727,6 +721,11 @@ struct rte_mbuf {
>>>>>>>> 	 */
>>>>>>>> 	struct rte_mbuf_ext_shared_info *shinfo;
>>>>>>>>
>>>>>>>> +	/** Application specific metadata value for flow rule match.
>>>>>>>> +	 * Valid if PKT_RX_METADATA or PKT_TX_METADATA is set.
>>>>>>>> +	 */
>>>>>>>> +	uint32_t metadata;
>>>>>>>> +
>>>>>>>> } __rte_cache_aligned;
>>>>>>> This will break the ABI, so we cannot put it in 19.08, and we need a
>>>>>>> deprecation notice.
>>>>>>>
>>>>>> Does it actually break the ABI? Adding a new field to the mbuf should only
>>>>>> break the ABI if it either causes new fields to move or changes the
>>>>>> structure size. Since this is at the end, it's not going to move any older
>>>>>> fields, and since everything is cache-aligned I don't think the structure
>>>>>> size changes either.
>>>>> I think it does break the ABI: in previous version, when the PKT_TX_METADATA
>>>>> flag is set, the associated value is put in m->tx_metadata (offset 44 on
>>>>> x86-64), and in the next version, it will be in m->metadata (offset 112). So,
>>>>> these 2 versions are not binary compatible.
>>>>>
>>>>> Anyway, at least it breaks the API.
>>>> Ok, I misunderstood. I thought it was the structure change itself you were
>>>> saying broke the ABI. Yes, putting the data in a different place is indeed
>>>> an ABI break.
>>> We could add the new field and keep the old one unused,
>>> so it does not break the ABI.
>> Still breaks ABI if PKT_TX_METADATA is set. :-) In order not to break it, I can
>> keep the current union'd field (tx_metadata) as is with PKT_TX_METADATA, add
>> the new one at the end and make it used with the new PKT_RX_METADATA.
>>
>>> However I suppose everybody will prefer a version using dynamic fields.
>>> Is someone against using dynamic field for such usage?
>> However, given that the amazing dynamic fields are coming soon (thanks for your
>> effort, Olivier and Thomas!), I'd be honored to be the first user of them.
>>
>> Olivier, I'll take a look at your RFC.
> Just got a crazy idea while reading this thread... How about repurposing
> that "reserved" field as "rx_metadata" in the meantime?

It overlaps with hash.fdir.hi which has RSS hash.

> I know reserved fields are cursed and no one's ever supposed to touch them
> but this risk is mitigated by having the end user explicitly request its
> use, so the patch author (and his relatives) should be safe from the
> resulting bad juju.
>
> Joke aside, while I like the idea of Tx/Rx META, I think the similarities
> with MARK (and TAG eventually) are a problem. I wasn't available and couldn't
> comment when META was originally added to the Tx path, but there's a lot of
> overlap between these items/actions, without anything explaining to the end
> user how and why they should pick one over the other, if they can be
> combined at all and what happens in that case.
>
> All this must be documented, then we should think about unifying their
> respective features and deprecate the less capable items/actions. In my
> opinion, users need exactly one method to mark/match some mark while
> processing Rx/Tx traffic and *optionally* have that mark read from/written
> to the mbuf, which may or may not be possible depending on HW features.
>
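
To make the offset argument quoted above concrete, here is a minimal C sketch.
The two structs are hypothetical, heavily simplified stand-ins for struct
rte_mbuf before and after the patch. Only the offsets 44 and 112 and the
cache-line alignment are taken from the thread; every name (mbuf_before,
mbuf_after, old_app_set_metadata, ...) is invented for illustration. It shows
that the structure size stays the same, yet a value written by code built
against the old layout is not seen by code built against the new one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the mbuf layout before the patch. */
struct mbuf_before {
	_Alignas(64) uint8_t head[44]; /* everything in front of the hash union */
	uint32_t tx_metadata;          /* offset 44, valid with PKT_TX_METADATA */
	uint32_t reserved;
	uint8_t rest[60];              /* remaining fields up to offset 112 */
};

/* Hypothetical stand-in for the mbuf layout after the patch. */
struct mbuf_after {
	_Alignas(64) uint8_t head[44];
	uint32_t hash_lo;              /* offset 44 is now only the hash area */
	uint32_t hash_hi;
	uint8_t rest[60];
	uint32_t metadata;             /* new field appended at offset 112 */
};

/* The size does not change and no older field moves... */
static_assert(sizeof(struct mbuf_before) == sizeof(struct mbuf_after),
	      "cache alignment hides the appended field");
/* ...but the metadata value itself lives at a different offset. */
static_assert(offsetof(struct mbuf_before, tx_metadata) == 44, "old offset");
static_assert(offsetof(struct mbuf_after, metadata) == 112, "new offset");

/* Application compiled against the old headers: writes at offset 44. */
static void old_app_set_metadata(uint8_t *raw, uint32_t value)
{
	memcpy(raw + offsetof(struct mbuf_before, tx_metadata),
	       &value, sizeof(value));
}

/* Driver compiled against the new headers: reads at offset 112. */
static uint32_t new_driver_get_metadata(const uint8_t *raw)
{
	uint32_t value;

	memcpy(&value, raw + offsetof(struct mbuf_after, metadata),
	       sizeof(value));
	return value;
}

/* Returns what the new driver sees after the old application wrote 'value'. */
static uint32_t abi_mismatch_demo(uint32_t value)
{
	uint8_t raw[sizeof(struct mbuf_after)] = {0};

	old_app_set_metadata(raw, value);
	return new_driver_get_metadata(raw);
}
```

Calling abi_mismatch_demo() with any non-zero value returns 0 in this model:
the write landed in what the new layout treats as the hash area, which is the
binary incompatibility described in the quoted exchange.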
  
Adrien Mazarguil July 29, 2019, 3:06 p.m. UTC | #9
On Sun, Jul 14, 2019 at 02:46:58PM +0300, Andrew Rybchenko wrote:
> On 11.07.2019 10:44, Adrien Mazarguil wrote:
> > Just got a crazy idea while reading this thread... How about repurposing
> > that "reserved" field as "rx_metadata" in the meantime?
> 
> It overlaps with hash.fdir.hi which has RSS hash.

While it does overlap with hash.fdir.hi, isn't the RSS hash stored in the
"rss" field overlapping with hash.fdir.lo? (see struct rte_flow_action_rss)

hash.fdir.hi was originally used by FDIR and later repurposed by rte_flow
for its MARK action, which neatly qualifies as Rx metadata so renaming
"reserved" as "rx_metadata" could already make sense.

That is, assuming users do not need two different kinds of Rx metadata
returned simultaneously with their packets. I think it's safe.
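
For readers following the overlap argument, the sketch below models the 8-byte
hash area in C. The layout is reproduced from memory of rte_mbuf.h of that
period, combined with the union visible in the quoted diff; the "sched" and
"usr" members are omitted, and the wrapper struct hash_area and the helper
functions are hypothetical names. Treat it as an assumption-laden model, not
as the actual header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified model of the hash union (assumed layout, see note above). */
struct hash_area {
	union {
		union {
			uint32_t rss;             /* RSS hash result */
			struct {
				union {
					struct {
						uint16_t hash;
						uint16_t id;
					};
					uint32_t lo;
				};
				uint32_t hi;      /* FDIR ID / rte_flow MARK */
			} fdir;
		} hash;
		struct {
			uint32_t tx_metadata;
			uint32_t reserved;
		};
	};
};

/* "rss" shares its bytes with fdir.lo and tx_metadata... */
static_assert(offsetof(struct hash_area, hash.rss) ==
	      offsetof(struct hash_area, hash.fdir.lo), "rss aliases fdir.lo");
static_assert(offsetof(struct hash_area, hash.rss) ==
	      offsetof(struct hash_area, tx_metadata), "rss aliases tx_metadata");
/* ...while "reserved" shares its bytes with fdir.hi. */
static_assert(offsetof(struct hash_area, reserved) ==
	      offsetof(struct hash_area, hash.fdir.hi), "reserved aliases fdir.hi");

/* A value stored as a MARK in fdir.hi is readable through "reserved". */
static uint32_t mark_seen_through_reserved(uint32_t mark)
{
	struct hash_area h;

	memset(&h, 0, sizeof(h));
	h.hash.fdir.hi = mark;
	return h.reserved;
}

/* The RSS hash and a value in "reserved" do not clobber each other. */
static int rss_and_reserved_coexist(uint32_t rss, uint32_t meta)
{
	struct hash_area h;

	memset(&h, 0, sizeof(h));
	h.hash.rss = rss;
	h.reserved = meta;
	return h.hash.rss == rss && h.reserved == meta;
}
```

Under this model the RSS hash and "reserved" occupy different 32-bit words, so
the only conflict is with whatever else uses fdir.hi, such as a MARK value.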

> > I know reserved fields are cursed and no one's ever supposed to touch them
> > but this risk is mitigated by having the end user explicitly request its
> > use, so the patch author (and his relatives) should be safe from the
> > resulting bad juju.
> > 
> > Joke aside, while I like the idea of Tx/Rx META, I think the similarities
> > with MARK (and TAG eventually) are a problem. I wasn't available and couldn't
> > comment when META was originally added to the Tx path, but there's a lot of
> > overlap between these items/actions, without anything explaining to the end
> > user how and why they should pick one over the other, if they can be
> > combined at all and what happens in that case.
> > 
> > All this must be documented, then we should think about unifying their
> > respective features and deprecate the less capable items/actions. In my
> > opinion, users need exactly one method to mark/match some mark while
> > processing Rx/Tx traffic and *optionally* have that mark read from/written
> > to the mbuf, which may or may not be possible depending on HW features.

Thoughts regarding this suggestion? From a user perspective I think all
these actions should be unified but maybe there are good reasons to keep
them separate?
  
Ferruh Yigit Oct. 8, 2019, 12:51 p.m. UTC | #10
On 7/29/2019 4:06 PM, Adrien Mazarguil wrote:
> Thoughts regarding this suggestion? From a user perspective I think all
> these actions should be unified but maybe there are good reasons to keep
> them separate?
> 

I think the more recent plan is to introduce dynamic fields in the remaining 16
bytes of the second cacheline.

I will mark the patch as rejected; are there any objections?
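
As a rough illustration of the dynamic-field idea mentioned here, the following
is a toy model in C. It is not the DPDK API: toy_mbuf, toy_dynfield_register
and the get/set helpers are hypothetical names, and the only detail taken from
the thread is the 16-byte budget. The point is that a field's offset is handed
out at run time by name, so no fixed struct member is added for it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DYN_AREA_SIZE  16 /* the "remaining 16 bytes" from the thread */
#define DYN_MAX_FIELDS 8

/* Toy packet buffer: only the dynamic area is modelled. */
struct toy_mbuf {
	uint8_t dyn_area[DYN_AREA_SIZE];
};

struct dyn_field {
	char name[32];
	size_t offset;
	size_t size;
};

static struct dyn_field dyn_fields[DYN_MAX_FIELDS];
static size_t dyn_count; /* number of registered fields */
static size_t dyn_used;  /* bytes of the area already handed out */

/*
 * Reserve 'size' bytes aligned to 'align' (a power of two) under 'name'.
 * Returns the offset inside dyn_area, or -1 if the request cannot be met.
 * Registering an existing name with the same size returns its offset.
 */
static int toy_dynfield_register(const char *name, size_t size, size_t align)
{
	size_t i, start;

	if (size == 0 || align == 0 || (align & (align - 1)) != 0)
		return -1;
	if (strlen(name) >= sizeof(dyn_fields[0].name))
		return -1;
	for (i = 0; i < dyn_count; i++) {
		if (strcmp(dyn_fields[i].name, name) == 0)
			return dyn_fields[i].size == size ?
				(int)dyn_fields[i].offset : -1;
	}
	start = (dyn_used + align - 1) & ~(align - 1);
	if (dyn_count == DYN_MAX_FIELDS || start + size > DYN_AREA_SIZE)
		return -1;
	strcpy(dyn_fields[dyn_count].name, name);
	dyn_fields[dyn_count].offset = start;
	dyn_fields[dyn_count].size = size;
	dyn_count++;
	dyn_used = start + size;
	return (int)start;
}

/* Store a 32-bit value at a previously registered offset. */
static void toy_dynfield_set_u32(struct toy_mbuf *m, int offset, uint32_t v)
{
	memcpy(m->dyn_area + offset, &v, sizeof(v));
}

/* Load a 32-bit value from a previously registered offset. */
static uint32_t toy_dynfield_get_u32(const struct toy_mbuf *m, int offset)
{
	uint32_t v;

	memcpy(&v, m->dyn_area + offset, sizeof(v));
	return v;
}
```

In this model an application and a driver would both register the same name,
for example "rx_metadata", and then use the returned offset. Requests that do
not fit in the 16 bytes fail instead of growing the structure.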
  
Slava Ovsiienko Oct. 8, 2019, 1:17 p.m. UTC | #11
> -----Original Message-----
> From: Yigit, Ferruh <ferruh.yigit@linux.intel.com>
> Sent: Tuesday, October 8, 2019 15:51
> To: Adrien Mazarguil <adrien.mazarguil@6wind.com>; Andrew Rybchenko
> <arybchenko@solarflare.com>
> Cc: Yongseok Koh <yskoh@mellanox.com>; Thomas Monjalon
> <thomas@monjalon.net>; Olivier Matz <olivier.matz@6wind.com>; Bruce
> Richardson <bruce.richardson@intel.com>; Shahaf Shuler
> <shahafs@mellanox.com>; Ferruh Yigit <ferruh.yigit@intel.com>; dev
> <dev@dpdk.org>; Slava Ovsiienko <viacheslavo@mellanox.com>
> Subject: Re: [dpdk-dev] [PATCH] ethdev: extend flow metadata
> 
> I think more recent plan is introducing dynamic fields for the remaining 16
> bytes in the second cacheline.
> 
> I will update the patch as rejected, is there any objection?

v2 is coming and will be based on dynamic mbuf fields.
I think Superseded / Changes Requested would be a more fitting state.

WBR, Slava
  

Patch

diff --git a/app/test-pmd/cmdline_flow.c b/app/test-pmd/cmdline_flow.c
index 201bd9de56..eda5c5491f 100644
--- a/app/test-pmd/cmdline_flow.c
+++ b/app/test-pmd/cmdline_flow.c
@@ -272,6 +272,9 @@  enum index {
 	ACTION_SET_MAC_SRC_MAC_SRC,
 	ACTION_SET_MAC_DST,
 	ACTION_SET_MAC_DST_MAC_DST,
+	ACTION_SET_META,
+	ACTION_SET_META_DATA,
+	ACTION_SET_META_MASK,
 };
 
 /** Maximum size for pattern in struct rte_flow_item_raw. */
@@ -885,6 +888,7 @@  static const enum index next_action[] = {
 	ACTION_SET_TTL,
 	ACTION_SET_MAC_SRC,
 	ACTION_SET_MAC_DST,
+	ACTION_SET_META,
 	ZERO,
 };
 
@@ -1047,6 +1051,13 @@  static const enum index action_set_mac_dst[] = {
 	ZERO,
 };
 
+static const enum index action_set_meta[] = {
+	ACTION_SET_META_DATA,
+	ACTION_SET_META_MASK,
+	ACTION_NEXT,
+	ZERO,
+};
+
 static int parse_init(struct context *, const struct token *,
 		      const char *, unsigned int,
 		      void *, unsigned int);
@@ -2854,6 +2865,30 @@  static const struct token token_list[] = {
 			     (struct rte_flow_action_set_mac, mac_addr)),
 		.call = parse_vc_conf,
 	},
+	[ACTION_SET_META] = {
+		.name = "set_meta",
+		.help = "set metadata",
+		.priv = PRIV_ACTION(SET_META,
+			sizeof(struct rte_flow_action_set_meta)),
+		.next = NEXT(action_set_meta),
+		.call = parse_vc,
+	},
+	[ACTION_SET_META_DATA] = {
+		.name = "data",
+		.help = "metadata value",
+		.next = NEXT(action_set_meta, NEXT_ENTRY(UNSIGNED)),
+		.args = ARGS(ARGS_ENTRY_HTON
+			     (struct rte_flow_action_set_meta, data)),
+		.call = parse_vc_conf,
+	},
+	[ACTION_SET_META_MASK] = {
+		.name = "mask",
+		.help = "mask for metadata value",
+		.next = NEXT(action_set_meta, NEXT_ENTRY(UNSIGNED)),
+		.args = ARGS(ARGS_ENTRY_HTON
+			     (struct rte_flow_action_set_meta, mask)),
+		.call = parse_vc_conf,
+	},
 };
 
 /** Remove and return last entry from argument stack. */
diff --git a/app/test-pmd/util.c b/app/test-pmd/util.c
index a1164b7053..6ecc97351f 100644
--- a/app/test-pmd/util.c
+++ b/app/test-pmd/util.c
@@ -182,7 +182,7 @@  tx_pkt_set_md(uint16_t port_id, __rte_unused uint16_t queue,
 	 * and set ol_flags accordingly.
 	 */
 	for (i = 0; i < nb_pkts; i++) {
-		pkts[i]->tx_metadata = ports[port_id].tx_metadata;
+		pkts[i]->metadata = ports[port_id].tx_metadata;
 		pkts[i]->ol_flags |= PKT_TX_METADATA;
 	}
 	return nb_pkts;
diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
index a34d012e55..5092f0074e 100644
--- a/doc/guides/prog_guide/rte_flow.rst
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -658,6 +658,32 @@  the physical device, with virtual groups in the PMD or not at all.
    | ``mask`` | ``id``   | zeroed to match any value |
    +----------+----------+---------------------------+
 
+Item: ``META``
+^^^^^^^^^^^^^^^^^
+
+Matches a 32 bit metadata item.
+
+On egress, metadata can be set either by mbuf metadata field with
+PKT_TX_METADATA flag or ``SET_META`` action. On ingress, ``SET_META``
+action sets metadata for a packet and the metadata will be reported via
+``metadata`` field of ``rte_mbuf`` with PKT_RX_METADATA flag.
+
+- Default ``mask`` matches the specified Rx metadata value.
+
+.. _table_rte_flow_item_meta:
+
+.. table:: META
+
+   +----------+----------+---------------------------------------+
+   | Field    | Subfield | Value                                 |
+   +==========+==========+=======================================+
+   | ``spec`` | ``data`` | 32 bit metadata value                 |
+   +----------+----------+---------------------------------------+
+   | ``last`` | ``data`` | upper range value                     |
+   +----------+----------+---------------------------------------+
+   | ``mask`` | ``data`` | bit-mask applies to "spec" and "last" |
+   +----------+----------+---------------------------------------+
+
 Data matching item types
 ~~~~~~~~~~~~~~~~~~~~~~~~
 
@@ -1189,27 +1215,6 @@  Normally preceded by any of:
 - `Item: ICMP6_ND_NS`_
 - `Item: ICMP6_ND_OPT`_
 
-Item: ``META``
-^^^^^^^^^^^^^^
-
-Matches an application specific 32 bit metadata item.
-
-- Default ``mask`` matches the specified metadata value.
-
-.. _table_rte_flow_item_meta:
-
-.. table:: META
-
-   +----------+----------+---------------------------------------+
-   | Field    | Subfield | Value                                 |
-   +==========+==========+=======================================+
-   | ``spec`` | ``data`` | 32 bit metadata value                 |
-   +----------+--------------------------------------------------+
-   | ``last`` | ``data`` | upper range value                     |
-   +----------+----------+---------------------------------------+
-   | ``mask`` | ``data`` | bit-mask applies to "spec" and "last" |
-   +----------+----------+---------------------------------------+
-
 Actions
 ~~~~~~~
 
@@ -2345,6 +2350,32 @@  Otherwise, RTE_FLOW_ERROR_TYPE_ACTION error will be returned.
    | ``mac_addr`` | MAC address   |
    +--------------+---------------+
 
+Action: ``SET_META``
+^^^^^^^^^^^^^^^^^^^^^^^
+
+Set metadata. Item ``META`` matches metadata.
+
+Metadata set by mbuf metadata field with PKT_TX_METADATA flag on egress will be
+overridden by this action. On ingress, the metadata will be carried by mbuf
+metadata field with PKT_RX_METADATA flag if set.
+
+Altering partial bits is supported with ``mask``. For bits which have never been
+set, unpredictable value will be seen depending on driver implementation. For
+loopback/hairpin packet, metadata set on Rx/Tx may or may not be propagated to
+the other path depending on HW capability.
+
+.. _table_rte_flow_action_set_meta:
+
+.. table:: SET_META
+
+   +----------+----------------------------+
+   | Field    | Value                      |
+   +==========+============================+
+   | ``data`` | 32 bit metadata value      |
+   +----------+----------------------------+
+   | ``mask`` | bit-mask applies to "data" |
+   +----------+----------------------------+
+
 Negative types
 ~~~~~~~~~~~~~~
 
diff --git a/doc/guides/rel_notes/release_19_08.rst b/doc/guides/rel_notes/release_19_08.rst
index 223479c6d4..e087266da0 100644
--- a/doc/guides/rel_notes/release_19_08.rst
+++ b/doc/guides/rel_notes/release_19_08.rst
@@ -68,6 +68,16 @@  New Features
   rte_rand_max() which supplies unbiased, bounded pseudo-random
   numbers.
 
+* **Extended metadata support in rte_flow.**
+
+  Flow metadata is extended to both Rx and Tx.
+
+  * ``tx_metadata`` field of ``rte_mbuf`` has been moved to an independent
+    field and renamed as ``metadata``.
+  * Tx metadata can also be set by SET_META action of rte_flow.
+  * Rx metadata is delivered to host via ``metadata`` field of ``rte_mbuf``
+    with PKT_RX_METADATA.
+
 * **Updated the bnxt PMD.**
 
   Updated the bnxt PMD. The major enhancements include:
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index c1dc8c4e17..4b23a0176d 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -784,8 +784,7 @@  mlx5_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
 		txq_mbuf_to_swp(txq, buf, (uint8_t *)&swp_offsets, &swp_types);
 		raw = ((uint8_t *)(uintptr_t)wqe) + 2 * MLX5_WQE_DWORD_SIZE;
 		/* Copy metadata from mbuf if valid */
-		metadata = buf->ol_flags & PKT_TX_METADATA ? buf->tx_metadata :
-							     0;
+		metadata = buf->ol_flags & PKT_TX_METADATA ? buf->metadata : 0;
 		/* Replace the Ethernet type by the VLAN if necessary. */
 		if (buf->ol_flags & PKT_TX_VLAN_PKT) {
 			uint32_t vlan = rte_cpu_to_be_32(0x81000000 |
@@ -1193,8 +1192,7 @@  mlx5_tx_burst_mpw(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
 		--pkts_n;
 		cs_flags = txq_ol_cksum_to_cs(buf);
 		/* Copy metadata from mbuf if valid */
-		metadata = buf->ol_flags & PKT_TX_METADATA ? buf->tx_metadata :
-							     0;
+		metadata = buf->ol_flags & PKT_TX_METADATA ? buf->metadata : 0;
 		/* Retrieve packet information. */
 		length = PKT_LEN(buf);
 		assert(length);
@@ -1430,8 +1428,7 @@  mlx5_tx_burst_mpw_inline(void *dpdk_txq, struct rte_mbuf **pkts,
 		max_wqe = (1u << txq->wqe_n) - (txq->wqe_ci - txq->wqe_pi);
 		cs_flags = txq_ol_cksum_to_cs(buf);
 		/* Copy metadata from mbuf if valid */
-		metadata = buf->ol_flags & PKT_TX_METADATA ? buf->tx_metadata :
-							     0;
+		metadata = buf->ol_flags & PKT_TX_METADATA ? buf->metadata : 0;
 		/* Retrieve packet information. */
 		length = PKT_LEN(buf);
 		/* Start new session if packet differs. */
@@ -1715,8 +1712,7 @@  txq_burst_empw(struct mlx5_txq_data *txq, struct rte_mbuf **pkts,
 			break;
 		cs_flags = txq_ol_cksum_to_cs(buf);
 		/* Copy metadata from mbuf if valid */
-		metadata = buf->ol_flags & PKT_TX_METADATA ? buf->tx_metadata :
-							     0;
+		metadata = buf->ol_flags & PKT_TX_METADATA ? buf->metadata : 0;
 		/* Retrieve packet information. */
 		length = PKT_LEN(buf);
 		/* Start new session if:
diff --git a/drivers/net/mlx5/mlx5_rxtx_vec.c b/drivers/net/mlx5/mlx5_rxtx_vec.c
index 073044f6d1..b8e042c5d2 100644
--- a/drivers/net/mlx5/mlx5_rxtx_vec.c
+++ b/drivers/net/mlx5/mlx5_rxtx_vec.c
@@ -71,7 +71,7 @@  txq_calc_offload(struct rte_mbuf **pkts, uint16_t pkts_n, uint8_t *cs_flags,
 	if (!pkts_n)
 		return 0;
 	p0_metadata = pkts[0]->ol_flags & PKT_TX_METADATA ?
-			pkts[0]->tx_metadata : 0;
+		      pkts[0]->metadata : 0;
 	/* Count the number of packets having same offload parameters. */
 	for (pos = 1; pos < pkts_n; ++pos) {
 		/* Check if packet has same checksum flags. */
@@ -81,7 +81,7 @@  txq_calc_offload(struct rte_mbuf **pkts, uint16_t pkts_n, uint8_t *cs_flags,
 		/* Check if packet has same metadata. */
 		if (txq_offloads & DEV_TX_OFFLOAD_MATCH_METADATA) {
 			pn_metadata = pkts[pos]->ol_flags & PKT_TX_METADATA ?
-					pkts[pos]->tx_metadata : 0;
+				      pkts[pos]->metadata : 0;
 			if (pn_metadata != p0_metadata)
 				break;
 		}
diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
index 1c7e3b444a..900cd9db43 100644
--- a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
+++ b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
@@ -131,7 +131,7 @@  txq_scatter_v(struct mlx5_txq_data *txq, struct rte_mbuf **pkts,
 		uint8x16_t ctrl;
 		rte_be32_t metadata =
 			metadata_ol && (buf->ol_flags & PKT_TX_METADATA) ?
-			buf->tx_metadata : 0;
+			buf->metadata : 0;
 
 		assert(segs_n);
 		max_elts = elts_n - (elts_head - txq->elts_tail);
diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_sse.h b/drivers/net/mlx5/mlx5_rxtx_vec_sse.h
index 503ca0f6ad..df7e22b9b9 100644
--- a/drivers/net/mlx5/mlx5_rxtx_vec_sse.h
+++ b/drivers/net/mlx5/mlx5_rxtx_vec_sse.h
@@ -129,7 +129,7 @@  txq_scatter_v(struct mlx5_txq_data *txq, struct rte_mbuf **pkts,
 		__m128i ctrl;
 		rte_be32_t metadata =
 			metadata_ol && (buf->ol_flags & PKT_TX_METADATA) ?
-			buf->tx_metadata : 0;
+			buf->metadata : 0;
 
 		assert(segs_n);
 		max_elts = elts_n - (elts_head - txq->elts_tail);
diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
index c85212649c..ee0707e2d8 100644
--- a/lib/librte_ethdev/rte_ethdev.h
+++ b/lib/librte_ethdev/rte_ethdev.h
@@ -1011,6 +1011,11 @@  struct rte_eth_conf {
 #define DEV_RX_OFFLOAD_KEEP_CRC		0x00010000
 #define DEV_RX_OFFLOAD_SCTP_CKSUM	0x00020000
 #define DEV_RX_OFFLOAD_OUTER_UDP_CKSUM  0x00040000
+/**
+ * Device supports match on metadata Rx offload.
+ * Driver sets PKT_RX_METADATA and mbuf metadata field.
+ */
+#define DEV_RX_OFFLOAD_MATCH_METADATA   0x00080000
 
 #define DEV_RX_OFFLOAD_CHECKSUM (DEV_RX_OFFLOAD_IPV4_CKSUM | \
 				 DEV_RX_OFFLOAD_UDP_CKSUM | \
diff --git a/lib/librte_ethdev/rte_flow.h b/lib/librte_ethdev/rte_flow.h
index f3a8fb103f..cda8628183 100644
--- a/lib/librte_ethdev/rte_flow.h
+++ b/lib/librte_ethdev/rte_flow.h
@@ -417,7 +417,8 @@  enum rte_flow_item_type {
 	/**
 	 * [META]
 	 *
-	 * Matches a metadata value specified in mbuf metadata field.
+	 * Matches a metadata value.
+	 *
 	 * See struct rte_flow_item_meta.
 	 */
 	RTE_FLOW_ITEM_TYPE_META,
@@ -1164,9 +1165,16 @@  rte_flow_item_icmp6_nd_opt_tla_eth_mask = {
 #endif
 
 /**
- * RTE_FLOW_ITEM_TYPE_META.
+ * @warning
+ * @b EXPERIMENTAL: this structure may change without prior notice
  *
- * Matches a specified metadata value.
+ * RTE_FLOW_ITEM_TYPE_META
+ *
+ * Matches a specified metadata value. On egress, metadata can be set either by
+ * mbuf metadata field with PKT_TX_METADATA flag or
+ * RTE_FLOW_ACTION_TYPE_SET_META. On ingress, RTE_FLOW_ACTION_TYPE_SET_META sets
+ * metadata for a packet and the metadata will be reported via mbuf metadata
+ * field with PKT_RX_METADATA flag.
  */
 struct rte_flow_item_meta {
 	rte_be32_t data;
@@ -1650,6 +1658,13 @@  enum rte_flow_action_type {
 	 * See struct rte_flow_action_set_mac.
 	 */
 	RTE_FLOW_ACTION_TYPE_SET_MAC_DST,
+
+	/**
+	 * Set metadata on ingress or egress path.
+	 *
+	 * See struct rte_flow_action_set_meta.
+	 */
+	RTE_FLOW_ACTION_TYPE_SET_META,
 };
 
 /**
@@ -2131,6 +2146,28 @@  struct rte_flow_action_set_mac {
 	uint8_t mac_addr[RTE_ETHER_ADDR_LEN];
 };
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this structure may change without prior notice
+ *
+ * RTE_FLOW_ACTION_TYPE_SET_META
+ *
+ * Set metadata. Metadata set by mbuf metadata field with PKT_TX_METADATA flag
+ * on egress will be overridden by this action. On ingress, the metadata will be
+ * carried by mbuf metadata field with PKT_RX_METADATA flag if set.
+ *
+ * Altering partial bits is supported with mask. Bits which have never been set
+ * hold an unpredictable value that depends on the driver implementation. For
+ * loopback/hairpin packets, metadata set on Rx/Tx may or may not be propagated
+ * to the other path depending on HW capability.
+ *
+ * RTE_FLOW_ITEM_TYPE_META matches metadata.
+ */
+struct rte_flow_action_set_meta {
+	rte_be32_t data;
+	rte_be32_t mask;
+};
+
 /*
  * Definition of a single action.
  *
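Both fields of the new action structure are declared rte_be32_t, which suggests the application fills them in big-endian order. The sketch below shows one way that could look; it is an assumption, not part of the patch. `set_meta_be_stub` and `make_set_meta()` are hypothetical, and POSIX `htonl()` stands in for DPDK's `rte_cpu_to_be_32()` so the fragment builds without DPDK headers.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct rte_flow_action_set_meta (big-endian). */
struct set_meta_be_stub {
	uint32_t data;
	uint32_t mask;
};

/* Build the action configuration from CPU-order values. */
static struct set_meta_be_stub
make_set_meta(uint32_t data_cpu, uint32_t mask_cpu)
{
	struct set_meta_be_stub conf = {
		.data = htonl(data_cpu), /* rte_cpu_to_be_32() in DPDK code */
		.mask = htonl(mask_cpu),
	};

	return conf;
}
```

Whether the mbuf metadata field itself carries CPU-order or big-endian data is not stated in this section, so the conversion is shown for the action structure only.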
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 9542488554..ba2da874f5 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -200,6 +200,11 @@  extern "C" {
 
 /* add new RX flags here */
 
+/**
+ * Indicate that mbuf has metadata from device.
+ */
+#define PKT_RX_METADATA	(1ULL << 23)
+
 /* add new TX flags here */
 
 /**
@@ -648,17 +653,6 @@  struct rte_mbuf {
 			/**< User defined tags. See rte_distributor_process() */
 			uint32_t usr;
 		} hash;                   /**< hash information */
-		struct {
-			/**
-			 * Application specific metadata value
-			 * for egress flow rule match.
-			 * Valid if PKT_TX_METADATA is set.
-			 * Located here to allow conjunct use
-			 * with hash.sched.hi.
-			 */
-			uint32_t tx_metadata;
-			uint32_t reserved;
-		};
 	};
 
 	/** Outer VLAN TCI (CPU order), valid if PKT_RX_QINQ is set. */
@@ -727,6 +721,11 @@  struct rte_mbuf {
 	 */
 	struct rte_mbuf_ext_shared_info *shinfo;
 
+	/** Application specific metadata value for flow rule match.
+	 * Valid if PKT_RX_METADATA or PKT_TX_METADATA is set.
+	 */
+	uint32_t metadata;
+
 } __rte_cache_aligned;
 
 /**