[v4] ethdev: advertise flow restore in mbuf

Message ID 20230621144327.2202591-1-david.marchand@redhat.com (mailing list archive)
State Accepted, archived
Delegated to: Ferruh Yigit
Series [v4] ethdev: advertise flow restore in mbuf

Checks

Context                          Check    Description
ci/checkpatch                    success  coding style OK
ci/loongarch-compilation         success  Compilation OK
ci/loongarch-unit-testing        success  Unit Testing PASS
ci/Intel-compilation             success  Compilation OK
ci/github-robot: build           success  github build: passed
ci/intel-Testing                 success  Testing PASS
ci/intel-Functional              success  Functional PASS
ci/iol-mellanox-Performance      success  Performance Testing PASS
ci/iol-broadcom-Performance      success  Performance Testing PASS
ci/iol-x86_64-compile-testing    success  Testing PASS
ci/iol-abi-testing               success  Testing PASS
ci/iol-aarch-unit-testing        success  Testing PASS
ci/iol-unit-testing              success  Testing PASS
ci/iol-intel-Functional          success  Functional Testing PASS
ci/iol-broadcom-Functional       success  Functional Testing PASS
ci/iol-intel-Performance         success  Performance Testing PASS
ci/iol-testing                   success  Testing PASS
ci/iol-x86_64-unit-testing       success  Testing PASS

Commit Message

David Marchand June 21, 2023, 2:43 p.m. UTC
As reported by Ilya [1], unconditionally calling
rte_flow_get_restore_info() impacts application performance for drivers
that do not provide this op.
It could also impact processing of packets that require no call to
rte_flow_get_restore_info() at all.

Register a dynamic mbuf flag when an application negotiates tunnel
metadata delivery (calling rte_eth_rx_metadata_negotiate() with
RTE_ETH_RX_METADATA_TUNNEL_ID).

Drivers then advertise that metadata can be extracted by setting this
dynamic flag in each mbuf.

The application then calls rte_flow_get_restore_info() only when required.
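
To illustrate the intended application-side pattern, here is a minimal
sketch that does not use DPDK headers. struct pkt, get_restore_info() and
process_burst() are hypothetical stand-ins (for struct rte_mbuf and
rte_flow_get_restore_info()), and the flag value passed in is assumed to
come from rte_flow_restore_info_dynflag(). It only shows the gating logic,
not a real datapath.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct rte_mbuf: only ol_flags matters here. */
struct pkt {
	uint64_t ol_flags;
};

/* Counts how often the potentially expensive restore call is made. */
static unsigned int restore_calls;

/* Hypothetical stand-in for rte_flow_get_restore_info(); 0 means success. */
static int
get_restore_info(const struct pkt *p)
{
	(void)p;
	restore_calls++;
	return 0;
}

/*
 * Process a burst and call get_restore_info() only for packets that carry
 * the advertised flag. Returns the number of packets with restore info.
 * A restore_flag of 0 (flag never registered) results in no calls at all.
 */
static unsigned int
process_burst(const struct pkt *pkts, size_t n, uint64_t restore_flag)
{
	unsigned int restored = 0;
	size_t i;

	assert(pkts != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if ((pkts[i].ol_flags & restore_flag) != 0 &&
				get_restore_info(&pkts[i]) == 0)
			restored++;
	}
	return restored;
}
```

With this shape, packets without the flag cost one mask test each, and a
port that never negotiated tunnel metadata never reaches the restore call.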

Link: http://inbox.dpdk.org/dev/5248c2ca-f2a6-3fb0-38b8-7f659bfa40de@ovn.org/
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Tested-by: Ali Alnubani <alialnu@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
---
Changes since RFC v3:
- rebased on next-net,
- sending as non RFC for CIs that skip RFC patches,

Changes since RFC v2:
- fixed crash introduced in v2 and removed unneeded argument to 
  rte_flow_restore_info_dynflag_register(),

Changes since RFC v1:
- rebased,
- updated vectorized datapath functions for net/mlx5,
- moved dynamic flag register to rte_eth_rx_metadata_negotiate() and
  hid rte_flow_restore_info_dynflag_register() into ethdev internals,

---
 app/test-pmd/util.c                      |  9 +++--
 drivers/net/mlx5/mlx5.c                  |  2 +
 drivers/net/mlx5/mlx5.h                  |  5 ++-
 drivers/net/mlx5/mlx5_flow.c             | 47 +++++++++++++++++++++---
 drivers/net/mlx5/mlx5_rx.c               |  2 +-
 drivers/net/mlx5/mlx5_rx.h               |  1 +
 drivers/net/mlx5/mlx5_rxtx_vec_altivec.h | 16 ++++----
 drivers/net/mlx5/mlx5_rxtx_vec_neon.h    |  6 +--
 drivers/net/mlx5/mlx5_rxtx_vec_sse.h     |  6 +--
 drivers/net/mlx5/mlx5_trigger.c          |  4 +-
 drivers/net/sfc/sfc_dp.c                 | 14 +------
 lib/ethdev/rte_ethdev.c                  |  5 +++
 lib/ethdev/rte_flow.c                    | 27 ++++++++++++++
 lib/ethdev/rte_flow.h                    | 18 ++++++++-
 lib/ethdev/rte_flow_driver.h             |  6 +++
 lib/ethdev/version.map                   |  1 +
 16 files changed, 128 insertions(+), 41 deletions(-)
  

Comments

Ferruh Yigit June 21, 2023, 6:52 p.m. UTC | #1
On 6/21/2023 3:43 PM, David Marchand wrote:

Applied to dpdk-next-net/main, thanks.
  
Ilya Maximets July 31, 2023, 8:41 p.m. UTC | #2
On 6/21/23 16:43, David Marchand wrote:

<snip>

> diff --git a/lib/ethdev/rte_flow_driver.h b/lib/ethdev/rte_flow_driver.h
> index 356b60f523..f9fb01b8a2 100644
> --- a/lib/ethdev/rte_flow_driver.h
> +++ b/lib/ethdev/rte_flow_driver.h
> @@ -376,6 +376,12 @@ struct rte_flow_ops {
>  const struct rte_flow_ops *
>  rte_flow_ops_get(uint16_t port_id, struct rte_flow_error *error);
>  
> +/**
> + * Register mbuf dynamic flag for rte_flow_get_restore_info.
> + */
> +int
> +rte_flow_restore_info_dynflag_register(void);
> +

Hi, David, others.

Is there a reason to not expose this function to the application?

The point is that application will likely want to know the value
of the flag before creating any devices.  I.e. request it once
and use for all devices later without performing a call to an
external library (DPDK).  In current implementation, application
will need to open some device first, and only then the result of
rte_flow_restore_info_dynflag() will become meaningful.

There is no need to require application to call this function,
it can still be called from the rx negotiation API, but it would
be nice if application could know it beforehand, i.e. had control
over when the flag actually becomes visible.

Alternatively, the _register() could also be called right from
the rte_flow_restore_info_dynflag() at a slight performance cost.
It shouldn't be important though, since drivers do not seem to
call it from a performance-sensitive code.
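
A rough sketch of that alternative, for illustration only: the getter
registers the flag on first use. dynflag_register() below is a hypothetical
stand-in for rte_mbuf_dynflag_register() (simplified to take a name and
return a bit offset or -1), and the starting bit 40 is an arbitrary
assumption. This is not the code in the patch, and it is not thread-safe
as written.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for rte_mbuf_dynflag_register(). */
static int next_free_bit = 40;
static unsigned int register_calls;

static int
dynflag_register(const char *name)
{
	(void)name;
	register_calls++;
	if (next_free_bit > 63)
		return -1; /* no dynamic flag bit left */
	return next_free_bit++;
}

/* Cached flag value; 0 means "not registered yet". */
static uint64_t restore_info_flag;

/* Mirrors the register function in the patch: idempotent, 0 on success. */
static int
restore_info_dynflag_register(void)
{
	if (restore_info_flag == 0) {
		int offset = dynflag_register("RTE_MBUF_F_RX_RESTORE_INFO");

		if (offset < 0)
			return -1;
		assert(offset < 64);
		restore_info_flag = UINT64_C(1) << offset;
	}
	return 0;
}

/* Getter variant that registers lazily; returns 0 if registration failed. */
static uint64_t
restore_info_dynflag_lazy(void)
{
	if (restore_info_flag == 0)
		(void)restore_info_dynflag_register();
	return restore_info_flag;
}
```

The extra branch in the getter is the "slight performance cost" mentioned
above; after the first call it only compares the cached value against zero.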

Thoughts?

Best regards, Ilya Maximets.
  
David Marchand Sept. 26, 2023, 9:17 a.m. UTC | #3
Hello Ilya,

On Mon, Jul 31, 2023 at 10:40 PM Ilya Maximets <i.maximets@ovn.org> wrote:
> On 6/21/23 16:43, David Marchand wrote:
> <snip>
>
> > diff --git a/lib/ethdev/rte_flow_driver.h b/lib/ethdev/rte_flow_driver.h
> > index 356b60f523..f9fb01b8a2 100644
> > --- a/lib/ethdev/rte_flow_driver.h
> > +++ b/lib/ethdev/rte_flow_driver.h
> > @@ -376,6 +376,12 @@ struct rte_flow_ops {
> >  const struct rte_flow_ops *
> >  rte_flow_ops_get(uint16_t port_id, struct rte_flow_error *error);
> >
> > +/**
> > + * Register mbuf dynamic flag for rte_flow_get_restore_info.
> > + */
> > +int
> > +rte_flow_restore_info_dynflag_register(void);
> > +
>
> Hi, David, others.
>
> Is there a reason to not expose this function to the application?
>
> The point is that application will likely want to know the value
> of the flag before creating any devices.  I.e. request it once
> and use for all devices later without performing a call to an
> external library (DPDK).  In current implementation, application
> will need to open some device first, and only then the result of
> rte_flow_restore_info_dynflag() will become meaningful.
>
> There is no need to require application to call this function,
> it can still be called from the rx negotiation API, but it would
> be nice if application could know it beforehand, i.e. had control
> over when the flag actually becomes visible.

DPDK tries to register flags only when needed, as there is not a lot
of space for dyn flags.
Some drivers take some space and applications want some share too.

DPDK can export the _register function for applications to call it
regardless of what driver will be used later.

Yet, I want to be sure why it matters in OVS context.
Is it not enough resolving the flag (by calling
rte_flow_restore_info_dynflag()) once rte_eth_rx_metadata_negotiate
for tunnel metadata is called?
Do you want to avoid an atomic store/load between OVS main thread and
PMD threads?
  
Ilya Maximets Sept. 26, 2023, 7:49 p.m. UTC | #4
On 9/26/23 11:17, David Marchand wrote:
> Hello Ilya,
> 
> On Mon, Jul 31, 2023 at 10:40 PM Ilya Maximets <i.maximets@ovn.org> wrote:
>> On 6/21/23 16:43, David Marchand wrote:
>> <snip>
>>
>>> diff --git a/lib/ethdev/rte_flow_driver.h b/lib/ethdev/rte_flow_driver.h
>>> index 356b60f523..f9fb01b8a2 100644
>>> --- a/lib/ethdev/rte_flow_driver.h
>>> +++ b/lib/ethdev/rte_flow_driver.h
>>> @@ -376,6 +376,12 @@ struct rte_flow_ops {
>>>  const struct rte_flow_ops *
>>>  rte_flow_ops_get(uint16_t port_id, struct rte_flow_error *error);
>>>
>>> +/**
>>> + * Register mbuf dynamic flag for rte_flow_get_restore_info.
>>> + */
>>> +int
>>> +rte_flow_restore_info_dynflag_register(void);
>>> +
>>
>> Hi, David, others.
>>
>> Is there a reason to not expose this function to the application?
>>
>> The point is that application will likely want to know the value
>> of the flag before creating any devices.  I.e. request it once
>> and use for all devices later without performing a call to an
>> external library (DPDK).  In current implementation, application
>> will need to open some device first, and only then the result of
>> rte_flow_restore_info_dynflag() will become meaningful.
>>
>> There is no need to require application to call this function,
>> it can still be called from the rx negotiation API, but it would
>> be nice if application could know it beforehand, i.e. had control
>> over when the flag actually becomes visible.
> 
> DPDK tries to register flags only when needed, as there is not a lot
> of space for dyn flags.
> Some drivers take some space and applications want some share too.
> 
> DPDK can export the _register function for applications to call it
> regardless of what driver will be used later.
> 
> Yet, I want to be sure why it matters in OVS context.
> Is it not enough resolving the flag (by calling
> rte_flow_restore_info_dynflag()) once rte_eth_rx_metadata_negotiate
> for tunnel metadata is called?
> Do you want to avoid an atomic store/load between OVS main thread and
> PMD threads?

Yes, something like this.  I do not have a solid implementation idea
for that though.  I replied to your OVS patch.

Best regards, Ilya Maximets.
  

Patch

diff --git a/app/test-pmd/util.c b/app/test-pmd/util.c
index f9df5f69ef..5aa69ed545 100644
--- a/app/test-pmd/util.c
+++ b/app/test-pmd/util.c
@@ -88,18 +88,20 @@  dump_pkt_burst(uint16_t port_id, uint16_t queue, struct rte_mbuf *pkts[],
 	char print_buf[MAX_STRING_LEN];
 	size_t buf_size = MAX_STRING_LEN;
 	size_t cur_len = 0;
+	uint64_t restore_info_dynflag;
 
 	if (!nb_pkts)
 		return;
+	restore_info_dynflag = rte_flow_restore_info_dynflag();
 	MKDUMPSTR(print_buf, buf_size, cur_len,
 		  "port %u/queue %u: %s %u packets\n", port_id, queue,
 		  is_rx ? "received" : "sent", (unsigned int) nb_pkts);
 	for (i = 0; i < nb_pkts; i++) {
-		int ret;
 		struct rte_flow_error error;
 		struct rte_flow_restore_info info = { 0, };
 
 		mb = pkts[i];
+		ol_flags = mb->ol_flags;
 		if (rxq_share > 0)
 			MKDUMPSTR(print_buf, buf_size, cur_len, "port %u, ",
 				  mb->port);
@@ -107,8 +109,8 @@  dump_pkt_burst(uint16_t port_id, uint16_t queue, struct rte_mbuf *pkts[],
 		eth_type = RTE_BE_TO_CPU_16(eth_hdr->ether_type);
 		packet_type = mb->packet_type;
 		is_encapsulation = RTE_ETH_IS_TUNNEL_PKT(packet_type);
-		ret = rte_flow_get_restore_info(port_id, mb, &info, &error);
-		if (!ret) {
+		if ((ol_flags & restore_info_dynflag) != 0 &&
+				rte_flow_get_restore_info(port_id, mb, &info, &error) == 0) {
 			MKDUMPSTR(print_buf, buf_size, cur_len,
 				  "restore info:");
 			if (info.flags & RTE_FLOW_RESTORE_INFO_TUNNEL) {
@@ -153,7 +155,6 @@  dump_pkt_burst(uint16_t port_id, uint16_t queue, struct rte_mbuf *pkts[],
 			  " - pool=%s - type=0x%04x - length=%u - nb_segs=%d",
 			  mb->pool->name, eth_type, (unsigned int) mb->pkt_len,
 			  (int)mb->nb_segs);
-		ol_flags = mb->ol_flags;
 		if (ol_flags & RTE_MBUF_F_RX_RSS_HASH) {
 			MKDUMPSTR(print_buf, buf_size, cur_len,
 				  " - RSS hash=0x%x",
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index a75fa1b7f0..58fde3af22 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -2347,6 +2347,7 @@  const struct eth_dev_ops mlx5_dev_ops = {
 	.get_monitor_addr = mlx5_get_monitor_addr,
 	.count_aggr_ports = mlx5_count_aggr_ports,
 	.map_aggr_tx_affinity = mlx5_map_aggr_tx_affinity,
+	.rx_metadata_negotiate = mlx5_flow_rx_metadata_negotiate,
 };
 
 /* Available operations from secondary process. */
@@ -2372,6 +2373,7 @@  const struct eth_dev_ops mlx5_dev_sec_ops = {
 	.get_module_eeprom = mlx5_get_module_eeprom,
 	.count_aggr_ports = mlx5_count_aggr_ports,
 	.map_aggr_tx_affinity = mlx5_map_aggr_tx_affinity,
+	.rx_metadata_negotiate = mlx5_flow_rx_metadata_negotiate,
 };
 
 /* Available operations in flow isolated mode. */
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 021049ad2b..bf464613f0 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -1745,6 +1745,7 @@  struct mlx5_priv {
 	unsigned int lb_used:1; /* Loopback queue is referred to. */
 	uint32_t mark_enabled:1; /* If mark action is enabled on rxqs. */
 	uint32_t num_lag_ports:4; /* Number of ports can be bonded. */
+	uint32_t tunnel_enabled:1; /* If tunnel offloading is enabled on rxqs. */
 	uint16_t domain_id; /* Switch domain identifier. */
 	uint16_t vport_id; /* Associated VF vport index (if any). */
 	uint32_t vport_meta_tag; /* Used for vport index match ove VF LAG. */
@@ -2154,7 +2155,9 @@  int mlx5_flow_query_counter(struct rte_eth_dev *dev, struct rte_flow *flow,
 int mlx5_flow_dev_dump_ipool(struct rte_eth_dev *dev, struct rte_flow *flow,
 		FILE *file, struct rte_flow_error *error);
 #endif
-void mlx5_flow_rxq_dynf_metadata_set(struct rte_eth_dev *dev);
+int mlx5_flow_rx_metadata_negotiate(struct rte_eth_dev *dev,
+	uint64_t *features);
+void mlx5_flow_rxq_dynf_set(struct rte_eth_dev *dev);
 int mlx5_flow_get_aged_flows(struct rte_eth_dev *dev, void **contexts,
 			uint32_t nb_contexts, struct rte_flow_error *error);
 int mlx5_validate_action_ct(struct rte_eth_dev *dev,
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index eb1d7a6be2..19e567401b 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -1796,6 +1796,38 @@  flow_rxq_flags_clear(struct rte_eth_dev *dev)
 	priv->sh->shared_mark_enabled = 0;
 }
 
+static uint64_t mlx5_restore_info_dynflag;
+
+int
+mlx5_flow_rx_metadata_negotiate(struct rte_eth_dev *dev, uint64_t *features)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	uint64_t supported = 0;
+
+	if (!is_tunnel_offload_active(dev)) {
+		supported |= RTE_ETH_RX_METADATA_USER_FLAG;
+		supported |= RTE_ETH_RX_METADATA_USER_MARK;
+		if ((*features & RTE_ETH_RX_METADATA_TUNNEL_ID) != 0) {
+			DRV_LOG(DEBUG,
+				"tunnel offload was not activated, consider setting dv_xmeta_en=%d",
+				MLX5_XMETA_MODE_MISS_INFO);
+		}
+	} else {
+		supported |= RTE_ETH_RX_METADATA_TUNNEL_ID;
+		if ((*features & RTE_ETH_RX_METADATA_TUNNEL_ID) != 0 &&
+				mlx5_restore_info_dynflag == 0)
+			mlx5_restore_info_dynflag = rte_flow_restore_info_dynflag();
+	}
+
+	if (((*features & supported) & RTE_ETH_RX_METADATA_TUNNEL_ID) != 0)
+		priv->tunnel_enabled = 1;
+	else
+		priv->tunnel_enabled = 0;
+
+	*features &= supported;
+	return 0;
+}
+
 /**
  * Set the Rx queue dynamic metadata (mask and offset) for a flow
  *
@@ -1803,11 +1835,15 @@  flow_rxq_flags_clear(struct rte_eth_dev *dev)
  *   Pointer to the Ethernet device structure.
  */
 void
-mlx5_flow_rxq_dynf_metadata_set(struct rte_eth_dev *dev)
+mlx5_flow_rxq_dynf_set(struct rte_eth_dev *dev)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
+	uint64_t mark_flag = RTE_MBUF_F_RX_FDIR_ID;
 	unsigned int i;
 
+	if (priv->tunnel_enabled)
+		mark_flag |= mlx5_restore_info_dynflag;
+
 	for (i = 0; i != priv->rxqs_n; ++i) {
 		struct mlx5_rxq_priv *rxq = mlx5_rxq_get(dev, i);
 		struct mlx5_rxq_data *data;
@@ -1826,6 +1862,7 @@  mlx5_flow_rxq_dynf_metadata_set(struct rte_eth_dev *dev)
 			data->flow_meta_offset = rte_flow_dynf_metadata_offs;
 			data->flow_meta_port_mask = priv->sh->dv_meta_mask;
 		}
+		data->mark_flag = mark_flag;
 	}
 }
 
@@ -11560,12 +11597,10 @@  mlx5_flow_tunnel_get_restore_info(struct rte_eth_dev *dev,
 	uint64_t ol_flags = m->ol_flags;
 	const struct mlx5_flow_tbl_data_entry *tble;
 	const uint64_t mask = RTE_MBUF_F_RX_FDIR | RTE_MBUF_F_RX_FDIR_ID;
+	struct mlx5_priv *priv = dev->data->dev_private;
 
-	if (!is_tunnel_offload_active(dev)) {
-		info->flags = 0;
-		return 0;
-	}
-
+	if (priv->tunnel_enabled == 0)
+		goto err;
 	if ((ol_flags & mask) != mask)
 		goto err;
 	tble = tunnel_mark_decode(dev, m->hash.fdir.hi);
diff --git a/drivers/net/mlx5/mlx5_rx.c b/drivers/net/mlx5/mlx5_rx.c
index a2be523e9e..71c4638251 100644
--- a/drivers/net/mlx5/mlx5_rx.c
+++ b/drivers/net/mlx5/mlx5_rx.c
@@ -857,7 +857,7 @@  rxq_cq_to_mbuf(struct mlx5_rxq_data *rxq, struct rte_mbuf *pkt,
 		if (MLX5_FLOW_MARK_IS_VALID(mark)) {
 			pkt->ol_flags |= RTE_MBUF_F_RX_FDIR;
 			if (mark != RTE_BE32(MLX5_FLOW_MARK_DEFAULT)) {
-				pkt->ol_flags |= RTE_MBUF_F_RX_FDIR_ID;
+				pkt->ol_flags |= rxq->mark_flag;
 				pkt->hash.fdir.hi = mlx5_flow_mark_get(mark);
 			}
 		}
diff --git a/drivers/net/mlx5/mlx5_rx.h b/drivers/net/mlx5/mlx5_rx.h
index 52c35c83f8..3514edd84e 100644
--- a/drivers/net/mlx5/mlx5_rx.h
+++ b/drivers/net/mlx5/mlx5_rx.h
@@ -136,6 +136,7 @@  struct mlx5_rxq_data {
 	struct mlx5_uar_data uar_data; /* CQ doorbell. */
 	uint32_t cqn; /* CQ number. */
 	uint8_t cq_arm_sn; /* CQ arm seq number. */
+	uint64_t mark_flag; /* ol_flags to set with marks. */
 	uint32_t tunnel; /* Tunnel information. */
 	int timestamp_offset; /* Dynamic mbuf field for timestamp. */
 	uint64_t timestamp_rx_flag; /* Dynamic mbuf flag for timestamp. */
diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h b/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h
index 14ffff26f4..4d0d05c376 100644
--- a/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h
+++ b/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h
@@ -296,15 +296,15 @@  rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq,
 				const __vector unsigned char fdir_all_flags =
 					(__vector unsigned char)
 					(__vector unsigned int){
-					RTE_MBUF_F_RX_FDIR | RTE_MBUF_F_RX_FDIR_ID,
-					RTE_MBUF_F_RX_FDIR | RTE_MBUF_F_RX_FDIR_ID,
-					RTE_MBUF_F_RX_FDIR | RTE_MBUF_F_RX_FDIR_ID,
-					RTE_MBUF_F_RX_FDIR | RTE_MBUF_F_RX_FDIR_ID};
+					RTE_MBUF_F_RX_FDIR | rxq->mark_flag,
+					RTE_MBUF_F_RX_FDIR | rxq->mark_flag,
+					RTE_MBUF_F_RX_FDIR | rxq->mark_flag,
+					RTE_MBUF_F_RX_FDIR | rxq->mark_flag};
 				__vector unsigned char fdir_id_flags =
 					(__vector unsigned char)
 					(__vector unsigned int){
-					RTE_MBUF_F_RX_FDIR_ID, RTE_MBUF_F_RX_FDIR_ID,
-					RTE_MBUF_F_RX_FDIR_ID, RTE_MBUF_F_RX_FDIR_ID};
+					rxq->mark_flag, rxq->mark_flag,
+					rxq->mark_flag, rxq->mark_flag};
 				/* Extract flow_tag field. */
 				__vector unsigned char ftag0 = vec_perm(mcqe1,
 							zero, flow_mark_shuf);
@@ -632,8 +632,8 @@  rxq_cq_to_ptype_oflags_v(struct mlx5_rxq_data *rxq,
 			RTE_MBUF_F_RX_FDIR, RTE_MBUF_F_RX_FDIR};
 		__vector unsigned char fdir_id_flags =
 			(__vector unsigned char)(__vector unsigned int){
-			RTE_MBUF_F_RX_FDIR_ID, RTE_MBUF_F_RX_FDIR_ID,
-			RTE_MBUF_F_RX_FDIR_ID, RTE_MBUF_F_RX_FDIR_ID};
+			rxq->mark_flag, rxq->mark_flag,
+			rxq->mark_flag, rxq->mark_flag};
 		__vector unsigned char flow_tag, invalid_mask;
 
 		flow_tag = (__vector unsigned char)
diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
index 75e8ed7e5a..91c85bec6d 100644
--- a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
+++ b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
@@ -231,9 +231,9 @@  rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq,
 					vdupq_n_u32(RTE_MBUF_F_RX_FDIR);
 				const uint32x4_t fdir_all_flags =
 					vdupq_n_u32(RTE_MBUF_F_RX_FDIR |
-						    RTE_MBUF_F_RX_FDIR_ID);
+						    rxq->mark_flag);
 				uint32x4_t fdir_id_flags =
-					vdupq_n_u32(RTE_MBUF_F_RX_FDIR_ID);
+					vdupq_n_u32(rxq->mark_flag);
 				uint32x4_t invalid_mask, ftag;
 
 				__asm__ volatile
@@ -446,7 +446,7 @@  rxq_cq_to_ptype_oflags_v(struct mlx5_rxq_data *rxq,
 	if (rxq->mark) {
 		const uint32x4_t ft_def = vdupq_n_u32(MLX5_FLOW_MARK_DEFAULT);
 		const uint32x4_t fdir_flags = vdupq_n_u32(RTE_MBUF_F_RX_FDIR);
-		uint32x4_t fdir_id_flags = vdupq_n_u32(RTE_MBUF_F_RX_FDIR_ID);
+		uint32x4_t fdir_id_flags = vdupq_n_u32(rxq->mark_flag);
 		uint32x4_t invalid_mask;
 
 		/* Check if flow tag is non-zero then set RTE_MBUF_F_RX_FDIR. */
diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_sse.h b/drivers/net/mlx5/mlx5_rxtx_vec_sse.h
index b282f8b8e6..0766952255 100644
--- a/drivers/net/mlx5/mlx5_rxtx_vec_sse.h
+++ b/drivers/net/mlx5/mlx5_rxtx_vec_sse.h
@@ -214,9 +214,9 @@  rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq,
 					_mm_set1_epi32(RTE_MBUF_F_RX_FDIR);
 				const __m128i fdir_all_flags =
 					_mm_set1_epi32(RTE_MBUF_F_RX_FDIR |
-						       RTE_MBUF_F_RX_FDIR_ID);
+						       rxq->mark_flag);
 				__m128i fdir_id_flags =
-					_mm_set1_epi32(RTE_MBUF_F_RX_FDIR_ID);
+					_mm_set1_epi32(rxq->mark_flag);
 
 				/* Extract flow_tag field. */
 				__m128i ftag0 =
@@ -442,7 +442,7 @@  rxq_cq_to_ptype_oflags_v(struct mlx5_rxq_data *rxq, __m128i cqes[4],
 	if (rxq->mark) {
 		const __m128i pinfo_ft_mask = _mm_set1_epi32(0xffffff00);
 		const __m128i fdir_flags = _mm_set1_epi32(RTE_MBUF_F_RX_FDIR);
-		__m128i fdir_id_flags = _mm_set1_epi32(RTE_MBUF_F_RX_FDIR_ID);
+		__m128i fdir_id_flags = _mm_set1_epi32(rxq->mark_flag);
 		__m128i flow_tag, invalid_mask;
 
 		flow_tag = _mm_and_si128(pinfo, pinfo_ft_mask);
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index bbaa7d2aa0..7bdb897612 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -1282,8 +1282,8 @@  mlx5_dev_start(struct rte_eth_dev *dev)
 			dev->data->port_id);
 		goto error;
 	}
-	/* Set a mask and offset of dynamic metadata flows into Rx queues. */
-	mlx5_flow_rxq_dynf_metadata_set(dev);
+	/* Set dynamic fields and flags into Rx queues. */
+	mlx5_flow_rxq_dynf_set(dev);
 	/* Set flags and context to convert Rx timestamps. */
 	mlx5_rxq_timestamp_set(dev);
 	/* Set a mask and offset of scheduling on timestamp into Tx queues. */
diff --git a/drivers/net/sfc/sfc_dp.c b/drivers/net/sfc/sfc_dp.c
index 9f2093b353..2b0a1d749d 100644
--- a/drivers/net/sfc/sfc_dp.c
+++ b/drivers/net/sfc/sfc_dp.c
@@ -11,6 +11,7 @@ 
 #include <string.h>
 #include <errno.h>
 
+#include <rte_ethdev.h>
 #include <rte_log.h>
 #include <rte_mbuf_dyn.h>
 
@@ -135,12 +136,8 @@  sfc_dp_ft_ctx_id_register(void)
 		.size = sizeof(uint8_t),
 		.align = __alignof__(uint8_t),
 	};
-	static const struct rte_mbuf_dynflag ft_ctx_id_valid = {
-		.name = "rte_net_sfc_dynflag_ft_ctx_id_valid",
-	};
 
 	int field_offset;
-	int flag;
 
 	SFC_GENERIC_LOG(INFO, "%s() entry", __func__);
 
@@ -156,15 +153,8 @@  sfc_dp_ft_ctx_id_register(void)
 		return -1;
 	}
 
-	flag = rte_mbuf_dynflag_register(&ft_ctx_id_valid);
-	if (flag < 0) {
-		SFC_GENERIC_LOG(ERR, "%s() failed to register ft_ctx_id dynflag",
-				__func__);
-		return -1;
-	}
-
+	sfc_dp_ft_ctx_id_valid = rte_flow_restore_info_dynflag();
 	sfc_dp_ft_ctx_id_offset = field_offset;
-	sfc_dp_ft_ctx_id_valid = UINT64_C(1) << flag;
 
 	SFC_GENERIC_LOG(INFO, "%s() done", __func__);
 
diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 5f1a656420..2ee0f63321 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -30,6 +30,7 @@ 
 #include "rte_ethdev.h"
 #include "rte_ethdev_trace_fp.h"
 #include "ethdev_driver.h"
+#include "rte_flow_driver.h"
 #include "ethdev_profile.h"
 #include "ethdev_private.h"
 #include "ethdev_trace.h"
@@ -6502,6 +6503,10 @@  rte_eth_rx_metadata_negotiate(uint16_t port_id, uint64_t *features)
 		return -EINVAL;
 	}
 
+	if ((*features & RTE_ETH_RX_METADATA_TUNNEL_ID) != 0 &&
+			rte_flow_restore_info_dynflag_register() < 0)
+		*features &= ~RTE_ETH_RX_METADATA_TUNNEL_ID;
+
 	if (*dev->dev_ops->rx_metadata_negotiate == NULL)
 		return -ENOTSUP;
 	ret = eth_err(port_id,
diff --git a/lib/ethdev/rte_flow.c b/lib/ethdev/rte_flow.c
index d41963b42e..271d854f78 100644
--- a/lib/ethdev/rte_flow.c
+++ b/lib/ethdev/rte_flow.c
@@ -1441,6 +1441,33 @@  rte_flow_get_restore_info(uint16_t port_id,
 				  NULL, rte_strerror(ENOTSUP));
 }
 
+static struct {
+	const struct rte_mbuf_dynflag desc;
+	uint64_t value;
+} flow_restore_info_dynflag = {
+	.desc = { .name = "RTE_MBUF_F_RX_RESTORE_INFO", },
+};
+
+uint64_t
+rte_flow_restore_info_dynflag(void)
+{
+	return flow_restore_info_dynflag.value;
+}
+
+int
+rte_flow_restore_info_dynflag_register(void)
+{
+	if (flow_restore_info_dynflag.value == 0) {
+		int offset = rte_mbuf_dynflag_register(&flow_restore_info_dynflag.desc);
+
+		if (offset < 0)
+			return -1;
+		flow_restore_info_dynflag.value = RTE_BIT64(offset);
+	}
+
+	return 0;
+}
+
 int
 rte_flow_tunnel_action_decap_release(uint16_t port_id,
 				     struct rte_flow_action *actions,
diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h
index dec454275f..261d95378b 100644
--- a/lib/ethdev/rte_flow.h
+++ b/lib/ethdev/rte_flow.h
@@ -5128,7 +5128,23 @@  rte_flow_tunnel_match(uint16_t port_id,
 		      struct rte_flow_error *error);
 
 /**
- * Populate the current packet processing state, if exists, for the given mbuf.
+ * On reception of a mbuf from HW, a call to rte_flow_get_restore_info() may be
+ * required to retrieve some metadata.
+ * This function returns the associated mbuf ol_flags.
+ *
+ * Note: the dynamic flag is registered during a call to
+ * rte_eth_rx_metadata_negotiate() with RTE_ETH_RX_METADATA_TUNNEL_ID.
+ *
+ * @return
+ *   The offload flag indicating rte_flow_get_restore_info() must be called.
+ */
+__rte_experimental
+uint64_t
+rte_flow_restore_info_dynflag(void);
+
+/**
+ * If a mbuf contains the rte_flow_restore_info_dynflag() flag in ol_flags,
+ * populate the current packet processing state.
  *
  * One should negotiate tunnel metadata delivery from the NIC to the HW.
  * @see rte_eth_rx_metadata_negotiate()
diff --git a/lib/ethdev/rte_flow_driver.h b/lib/ethdev/rte_flow_driver.h
index 356b60f523..f9fb01b8a2 100644
--- a/lib/ethdev/rte_flow_driver.h
+++ b/lib/ethdev/rte_flow_driver.h
@@ -376,6 +376,12 @@  struct rte_flow_ops {
 const struct rte_flow_ops *
 rte_flow_ops_get(uint16_t port_id, struct rte_flow_error *error);
 
+/**
+ * Register mbuf dynamic flag for rte_flow_get_restore_info.
+ */
+int
+rte_flow_restore_info_dynflag_register(void);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
index 69c9f09a82..fc492ee839 100644
--- a/lib/ethdev/version.map
+++ b/lib/ethdev/version.map
@@ -311,6 +311,7 @@  EXPERIMENTAL {
 	rte_flow_async_action_list_handle_destroy;
 	rte_flow_async_action_list_handle_query_update;
 	rte_flow_async_actions_update;
+	rte_flow_restore_info_dynflag;
 };
 
 INTERNAL {