RFC: ethdev: add reassembly offload

Message ID 20210823100259.1619886-1-gakhil@marvell.com (mailing list archive)
State Changes Requested, archived
Delegated to: Ferruh Yigit
Headers
Series RFC: ethdev: add reassembly offload |

Checks

Context Check Description
ci/checkpatch warning coding style issues
ci/iol-broadcom-Performance success Performance Testing PASS
ci/iol-broadcom-Functional success Functional Testing PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/iol-intel-Functional success Functional Testing PASS
ci/iol-x86_64-compile-testing success Testing PASS
ci/iol-aarch64-compile-testing success Testing PASS
ci/iol-testing fail Testing issues
ci/iol-aarch64-unit-testing fail Testing issues
ci/iol-mellanox-Performance success Performance Testing PASS
ci/Intel-compilation success Compilation OK
ci/intel-Testing success Testing PASS

Commit Message

Akhil Goyal Aug. 23, 2021, 10:02 a.m. UTC
  Reassembly is a costly operation if it is done in
software, however, if it is offloaded to HW, it can
considerably save application cycles.
The operation becomes even more costlier if IP fragmants
are encrypted.

To resolve above two issues, a new offload
DEV_RX_OFFLOAD_REASSEMBLY is introduced in ethdev for
devices which can attempt reassembly of packets in hardware.
rte_eth_dev_info is added with the reassembly capabilities
which a device can support.
Now, if IP fragments are encrypted, reassembly can also be
attempted while doing inline IPsec processing.
This is controlled by a flag in rte_security_ipsec_sa_options
to enable reassembly of encrypted IP fragments in the inline
path.

The resulting reassembled packet would be a typical
segmented mbuf in case of success.

And if reassembly of fragments is failed or is incomplete (if
fragments do not come before the reass_timeout), the mbuf is
updated with an ol_flag PKT_RX_REASSEMBLY_INCOMPLETE and
mbuf is returned as is. Now application may decide the fate
of the packet to wait more for fragments to come or drop.

Signed-off-by: Akhil Goyal <gakhil@marvell.com>
---
 lib/ethdev/rte_ethdev.c     |  1 +
 lib/ethdev/rte_ethdev.h     | 18 +++++++++++++++++-
 lib/mbuf/rte_mbuf_core.h    |  3 ++-
 lib/security/rte_security.h | 10 ++++++++++
 4 files changed, 30 insertions(+), 2 deletions(-)
  

Comments

Andrew Rybchenko Aug. 23, 2021, 10:18 a.m. UTC | #1
On 8/23/21 1:02 PM, Akhil Goyal wrote:
> Reassembly is a costly operation if it is done in
> software, however, if it is offloaded to HW, it can
> considerably save application cycles.
> The operation becomes even more costlier if IP fragmants
> are encrypted.
> 
> To resolve above two issues, a new offload
> DEV_RX_OFFLOAD_REASSEMBLY is introduced in ethdev for
> devices which can attempt reassembly of packets in hardware.
> rte_eth_dev_info is added with the reassembly capabilities
> which a device can support.
> Now, if IP fragments are encrypted, reassembly can also be
> attempted while doing inline IPsec processing.
> This is controlled by a flag in rte_security_ipsec_sa_options
> to enable reassembly of encrypted IP fragments in the inline
> path.
> 
> The resulting reassembled packet would be a typical
> segmented mbuf in case of success.
> 
> And if reassembly of fragments is failed or is incomplete (if
> fragments do not come before the reass_timeout), the mbuf is
> updated with an ol_flag PKT_RX_REASSEMBLY_INCOMPLETE and
> mbuf is returned as is. Now application may decide the fate
> of the packet to wait more for fragments to come or drop.
> 
> Signed-off-by: Akhil Goyal <gakhil@marvell.com>

Is it IPv4 only or IPv6 as well? I guess IPv4 only to start
with. If so, I think offload name should say so. See below.

I'd say that the feature should be added to
doc/guides/nics/features.rst

Do we really need RX_REASSEMBLY_INCOMPLETE if we provide
buffered packets for incomplete reassembly anyway?
I guess it is sufficient to cover simply reassembly case
only in HW when there is no overlapping fragments etc.
Everything else should be handled in SW anyway as without
the offload support at all.

> ---
>  lib/ethdev/rte_ethdev.c     |  1 +
>  lib/ethdev/rte_ethdev.h     | 18 +++++++++++++++++-
>  lib/mbuf/rte_mbuf_core.h    |  3 ++-
>  lib/security/rte_security.h | 10 ++++++++++
>  4 files changed, 30 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
> index 9d95cd11e1..1ab3a093cf 100644
> --- a/lib/ethdev/rte_ethdev.c
> +++ b/lib/ethdev/rte_ethdev.c
> @@ -119,6 +119,7 @@ static const struct {
>  	RTE_RX_OFFLOAD_BIT2STR(VLAN_FILTER),
>  	RTE_RX_OFFLOAD_BIT2STR(VLAN_EXTEND),
>  	RTE_RX_OFFLOAD_BIT2STR(JUMBO_FRAME),
> +	RTE_RX_OFFLOAD_BIT2STR(REASSEMBLY),
>  	RTE_RX_OFFLOAD_BIT2STR(SCATTER),
>  	RTE_RX_OFFLOAD_BIT2STR(TIMESTAMP),
>  	RTE_RX_OFFLOAD_BIT2STR(SECURITY),
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> index d2b27c351f..e89a4dc1eb 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -1360,6 +1360,7 @@ struct rte_eth_conf {
>  #define DEV_RX_OFFLOAD_VLAN_FILTER	0x00000200
>  #define DEV_RX_OFFLOAD_VLAN_EXTEND	0x00000400
>  #define DEV_RX_OFFLOAD_JUMBO_FRAME	0x00000800
> +#define DEV_RX_OFFLOAD_REASSEMBLY	0x00001000

I think it should be:
RTE_ETH_RX_OFFLOAD_IPV4_REASSEMBLY

i.e. have correct prefix similar to
RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT and mention IPv4.

If we'd like to cover IPv6 as well, it could be
RTE_ETH_RX_OFFLOAD_IP_REASSEMBLY and have IPv4/6
support bits in the offload capabilities below.

>  #define DEV_RX_OFFLOAD_SCATTER		0x00002000
>  /**
>   * Timestamp is set by the driver in RTE_MBUF_DYNFIELD_TIMESTAMP_NAME
> @@ -1477,6 +1478,20 @@ struct rte_eth_dev_portconf {
>   */
>  #define RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID	(UINT16_MAX)
>  
> +/**
> + * Reassembly capabilities that a device can support.
> + * The device which can support reassembly offload should set
> + * DEV_RX_OFFLOAD_REASSEMBLY
> + */
> +struct rte_eth_reass_capa {
> +	/** Maximum time in ns that a fragment can wait for further fragments */
> +	uint64_t reass_timeout;
> +	/** Maximum number of fragments that device can reassemble */
> +	uint16_t max_frags;
> +	/** Reserved for future capabilities */
> +	uint16_t reserved[3];
> +};
> +
>  /**
>   * Ethernet device associated switch information
>   */
> @@ -1582,8 +1597,9 @@ struct rte_eth_dev_info {
>  	 * embedded managed interconnect/switch.
>  	 */
>  	struct rte_eth_switch_info switch_info;
> +	/* Reassembly capabilities of a device for reassembly offload */
> +	struct rte_eth_reass_capa reass_capa;
>  
> -	uint64_t reserved_64s[2]; /**< Reserved for future fields */
>  	void *reserved_ptrs[2];   /**< Reserved for future fields */
>  };
>  
> diff --git a/lib/mbuf/rte_mbuf_core.h b/lib/mbuf/rte_mbuf_core.h
> index bb38d7f581..cea25c87f7 100644
> --- a/lib/mbuf/rte_mbuf_core.h
> +++ b/lib/mbuf/rte_mbuf_core.h
> @@ -200,10 +200,11 @@ extern "C" {
>  #define PKT_RX_OUTER_L4_CKSUM_BAD	(1ULL << 21)
>  #define PKT_RX_OUTER_L4_CKSUM_GOOD	(1ULL << 22)
>  #define PKT_RX_OUTER_L4_CKSUM_INVALID	((1ULL << 21) | (1ULL << 22))
> +#define PKT_RX_REASSEMBLY_INCOMPLETE	(1ULL << 23)

In accordance with deprecation notice it should be
RTE_MBUF_F_RX_REASSEMBLY_INCOMPLETE

>  
>  /* add new RX flags here, don't forget to update PKT_FIRST_FREE */
>  
> -#define PKT_FIRST_FREE (1ULL << 23)
> +#define PKT_FIRST_FREE (1ULL << 24)
>  #define PKT_LAST_FREE (1ULL << 40)
>  
>  /* add new TX flags here, don't forget to update PKT_LAST_FREE  */
> diff --git a/lib/security/rte_security.h b/lib/security/rte_security.h
> index 88d31de0a6..364eeb5cd4 100644
> --- a/lib/security/rte_security.h
> +++ b/lib/security/rte_security.h
> @@ -181,6 +181,16 @@ struct rte_security_ipsec_sa_options {
>  	 * * 0: Disable per session security statistics collection for this SA.
>  	 */
>  	uint32_t stats : 1;
> +
> +	/** Enable reassembly on incoming packets.
> +	 *
> +	 * * 1: Enable driver to try reassembly of encrypted IP packets for
> +	 *      this SA, if supported by the driver. This feature will work
> +	 *      only if rx_offload DEV_RX_OFFLOAD_REASSEMBLY is set in
> +	 *      inline ethernet device.

ethernet -> Ethernet

> +	 * * 0: Disable reassembly of packets (default).
> +	 */
> +	uint32_t reass_en : 1;
>  };
>  
>  /** IPSec security association direction */
>
  
Akhil Goyal Aug. 29, 2021, 1:14 p.m. UTC | #2
> On 8/23/21 1:02 PM, Akhil Goyal wrote:
> > Reassembly is a costly operation if it is done in
> > software, however, if it is offloaded to HW, it can
> > considerably save application cycles.
> > The operation becomes even more costlier if IP fragmants
> > are encrypted.
> >
> > To resolve above two issues, a new offload
> > DEV_RX_OFFLOAD_REASSEMBLY is introduced in ethdev for
> > devices which can attempt reassembly of packets in hardware.
> > rte_eth_dev_info is added with the reassembly capabilities
> > which a device can support.
> > Now, if IP fragments are encrypted, reassembly can also be
> > attempted while doing inline IPsec processing.
> > This is controlled by a flag in rte_security_ipsec_sa_options
> > to enable reassembly of encrypted IP fragments in the inline
> > path.
> >
> > The resulting reassembled packet would be a typical
> > segmented mbuf in case of success.
> >
> > And if reassembly of fragments is failed or is incomplete (if
> > fragments do not come before the reass_timeout), the mbuf is
> > updated with an ol_flag PKT_RX_REASSEMBLY_INCOMPLETE and
> > mbuf is returned as is. Now application may decide the fate
> > of the packet to wait more for fragments to come or drop.
> >
> > Signed-off-by: Akhil Goyal <gakhil@marvell.com>
> 
> Is it IPv4 only or IPv6 as well? I guess IPv4 only to start
> with. If so, I think offload name should say so. See below.
> 
We can update spec for both and update capabilities for both.
See below.

> I'd say that the feature should be added to
> doc/guides/nics/features.rst

OK will update in next version
> 
> Do we really need RX_REASSEMBLY_INCOMPLETE if we provide
> buffered packets for incomplete reassembly anyway?
> I guess it is sufficient to cover simply reassembly case
> only in HW when there is no overlapping fragments etc.
> Everything else should be handled in SW anyway as without
> the offload support at all.
> 
In that case, application would need to again parse the packet
to check whether it is a fragment or not even when the reassembly
is not required. However, we would consider your suggestion in
implementation.

> > ---
> >  lib/ethdev/rte_ethdev.c     |  1 +
> >  lib/ethdev/rte_ethdev.h     | 18 +++++++++++++++++-
> >  lib/mbuf/rte_mbuf_core.h    |  3 ++-
> >  lib/security/rte_security.h | 10 ++++++++++
> >  4 files changed, 30 insertions(+), 2 deletions(-)
> >
> > diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
> > index 9d95cd11e1..1ab3a093cf 100644
> > --- a/lib/ethdev/rte_ethdev.c
> > +++ b/lib/ethdev/rte_ethdev.c
> > @@ -119,6 +119,7 @@ static const struct {
> >  	RTE_RX_OFFLOAD_BIT2STR(VLAN_FILTER),
> >  	RTE_RX_OFFLOAD_BIT2STR(VLAN_EXTEND),
> >  	RTE_RX_OFFLOAD_BIT2STR(JUMBO_FRAME),
> > +	RTE_RX_OFFLOAD_BIT2STR(REASSEMBLY),
> >  	RTE_RX_OFFLOAD_BIT2STR(SCATTER),
> >  	RTE_RX_OFFLOAD_BIT2STR(TIMESTAMP),
> >  	RTE_RX_OFFLOAD_BIT2STR(SECURITY),
> > diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> > index d2b27c351f..e89a4dc1eb 100644
> > --- a/lib/ethdev/rte_ethdev.h
> > +++ b/lib/ethdev/rte_ethdev.h
> > @@ -1360,6 +1360,7 @@ struct rte_eth_conf {
> >  #define DEV_RX_OFFLOAD_VLAN_FILTER	0x00000200
> >  #define DEV_RX_OFFLOAD_VLAN_EXTEND	0x00000400
> >  #define DEV_RX_OFFLOAD_JUMBO_FRAME	0x00000800
> > +#define DEV_RX_OFFLOAD_REASSEMBLY	0x00001000
> 
> I think it should be:
> RTE_ETH_RX_OFFLOAD_IPV4_REASSEMBLY
> 
> i.e. have correct prefix similar to
> RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT and mention IPv4.
> 
> If we'd like to cover IPv6 as well, it could be
> RTE_ETH_RX_OFFLOAD_IP_REASSEMBLY and have IPv4/6
> support bits in the offload capabilities below.

Intention is to update spec for both.
Will update the capabilities accordingly to have both IPv4 and IPv6.

> 
> >  #define DEV_RX_OFFLOAD_SCATTER		0x00002000
> >  /**
> >   * Timestamp is set by the driver in
> RTE_MBUF_DYNFIELD_TIMESTAMP_NAME
> > @@ -1477,6 +1478,20 @@ struct rte_eth_dev_portconf {
> >   */
> >  #define RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID
> 	(UINT16_MAX)
> >
> > +/**
> > + * Reassembly capabilities that a device can support.
> > + * The device which can support reassembly offload should set
> > + * DEV_RX_OFFLOAD_REASSEMBLY
> > + */
> > +struct rte_eth_reass_capa {
> > +	/** Maximum time in ns that a fragment can wait for further
> fragments */
> > +	uint64_t reass_timeout;
> > +	/** Maximum number of fragments that device can reassemble */
> > +	uint16_t max_frags;
> > +	/** Reserved for future capabilities */
> > +	uint16_t reserved[3];
> > +};
> > +
> >  /**
> >   * Ethernet device associated switch information
> >   */
> > @@ -1582,8 +1597,9 @@ struct rte_eth_dev_info {
> >  	 * embedded managed interconnect/switch.
> >  	 */
> >  	struct rte_eth_switch_info switch_info;
> > +	/* Reassembly capabilities of a device for reassembly offload */
> > +	struct rte_eth_reass_capa reass_capa;
> >
> > -	uint64_t reserved_64s[2]; /**< Reserved for future fields */
> >  	void *reserved_ptrs[2];   /**< Reserved for future fields */
> >  };
> >
> > diff --git a/lib/mbuf/rte_mbuf_core.h b/lib/mbuf/rte_mbuf_core.h
> > index bb38d7f581..cea25c87f7 100644
> > --- a/lib/mbuf/rte_mbuf_core.h
> > +++ b/lib/mbuf/rte_mbuf_core.h
> > @@ -200,10 +200,11 @@ extern "C" {
> >  #define PKT_RX_OUTER_L4_CKSUM_BAD	(1ULL << 21)
> >  #define PKT_RX_OUTER_L4_CKSUM_GOOD	(1ULL << 22)
> >  #define PKT_RX_OUTER_L4_CKSUM_INVALID	((1ULL << 21) | (1ULL << 22))
> > +#define PKT_RX_REASSEMBLY_INCOMPLETE	(1ULL << 23)
> 
> In accordance with deprecation notice it should be
> RTE_MBUF_F_RX_REASSEMBLY_INCOMPLETE
> 
Ok will correct in next version.

> >
> >  /* add new RX flags here, don't forget to update PKT_FIRST_FREE */
> >
> > -#define PKT_FIRST_FREE (1ULL << 23)
> > +#define PKT_FIRST_FREE (1ULL << 24)
> >  #define PKT_LAST_FREE (1ULL << 40)
> >
> >  /* add new TX flags here, don't forget to update PKT_LAST_FREE  */
> > diff --git a/lib/security/rte_security.h b/lib/security/rte_security.h
> > index 88d31de0a6..364eeb5cd4 100644
> > --- a/lib/security/rte_security.h
> > +++ b/lib/security/rte_security.h
> > @@ -181,6 +181,16 @@ struct rte_security_ipsec_sa_options {
> >  	 * * 0: Disable per session security statistics collection for this SA.
> >  	 */
> >  	uint32_t stats : 1;
> > +
> > +	/** Enable reassembly on incoming packets.
> > +	 *
> > +	 * * 1: Enable driver to try reassembly of encrypted IP packets for
> > +	 *      this SA, if supported by the driver. This feature will work
> > +	 *      only if rx_offload DEV_RX_OFFLOAD_REASSEMBLY is set in
> > +	 *      inline ethernet device.
> 
> ethernet -> Ethernet
> 
> > +	 * * 0: Disable reassembly of packets (default).
> > +	 */
> > +	uint32_t reass_en : 1;
> >  };
> >
> >  /** IPSec security association direction */
> >
  
Ferruh Yigit Sept. 7, 2021, 8:47 a.m. UTC | #3
On 8/23/2021 11:02 AM, Akhil Goyal wrote:
> Reassembly is a costly operation if it is done in
> software, however, if it is offloaded to HW, it can
> considerably save application cycles.
> The operation becomes even more costlier if IP fragmants
> are encrypted.
> 
> To resolve above two issues, a new offload
> DEV_RX_OFFLOAD_REASSEMBLY is introduced in ethdev for
> devices which can attempt reassembly of packets in hardware.
> rte_eth_dev_info is added with the reassembly capabilities
> which a device can support.
> Now, if IP fragments are encrypted, reassembly can also be
> attempted while doing inline IPsec processing.
> This is controlled by a flag in rte_security_ipsec_sa_options
> to enable reassembly of encrypted IP fragments in the inline
> path.
> 
> The resulting reassembled packet would be a typical
> segmented mbuf in case of success.
> 
> And if reassembly of fragments is failed or is incomplete (if
> fragments do not come before the reass_timeout), the mbuf is
> updated with an ol_flag PKT_RX_REASSEMBLY_INCOMPLETE and
> mbuf is returned as is. Now application may decide the fate
> of the packet to wait more for fragments to come or drop.
> 
> Signed-off-by: Akhil Goyal <gakhil@marvell.com>
> ---
>  lib/ethdev/rte_ethdev.c     |  1 +
>  lib/ethdev/rte_ethdev.h     | 18 +++++++++++++++++-
>  lib/mbuf/rte_mbuf_core.h    |  3 ++-
>  lib/security/rte_security.h | 10 ++++++++++
>  4 files changed, 30 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
> index 9d95cd11e1..1ab3a093cf 100644
> --- a/lib/ethdev/rte_ethdev.c
> +++ b/lib/ethdev/rte_ethdev.c
> @@ -119,6 +119,7 @@ static const struct {
>  	RTE_RX_OFFLOAD_BIT2STR(VLAN_FILTER),
>  	RTE_RX_OFFLOAD_BIT2STR(VLAN_EXTEND),
>  	RTE_RX_OFFLOAD_BIT2STR(JUMBO_FRAME),
> +	RTE_RX_OFFLOAD_BIT2STR(REASSEMBLY),
>  	RTE_RX_OFFLOAD_BIT2STR(SCATTER),
>  	RTE_RX_OFFLOAD_BIT2STR(TIMESTAMP),
>  	RTE_RX_OFFLOAD_BIT2STR(SECURITY),
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> index d2b27c351f..e89a4dc1eb 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -1360,6 +1360,7 @@ struct rte_eth_conf {
>  #define DEV_RX_OFFLOAD_VLAN_FILTER	0x00000200
>  #define DEV_RX_OFFLOAD_VLAN_EXTEND	0x00000400
>  #define DEV_RX_OFFLOAD_JUMBO_FRAME	0x00000800
> +#define DEV_RX_OFFLOAD_REASSEMBLY	0x00001000

previous '0x00001000' was 'DEV_RX_OFFLOAD_CRC_STRIP', it has been long that
offload has been removed, but not sure if it cause any problem to re-use it.

>  #define DEV_RX_OFFLOAD_SCATTER		0x00002000
>  /**
>   * Timestamp is set by the driver in RTE_MBUF_DYNFIELD_TIMESTAMP_NAME
> @@ -1477,6 +1478,20 @@ struct rte_eth_dev_portconf {
>   */
>  #define RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID	(UINT16_MAX)
>  
> +/**
> + * Reassembly capabilities that a device can support.
> + * The device which can support reassembly offload should set
> + * DEV_RX_OFFLOAD_REASSEMBLY
> + */
> +struct rte_eth_reass_capa {
> +	/** Maximum time in ns that a fragment can wait for further fragments */
> +	uint64_t reass_timeout;
> +	/** Maximum number of fragments that device can reassemble */
> +	uint16_t max_frags;
> +	/** Reserved for future capabilities */
> +	uint16_t reserved[3];
> +};
> +

I wonder if there is any other hardware around supports reassembly offload, it
would be good to get more feedback on the capabilities list.

>  /**
>   * Ethernet device associated switch information
>   */
> @@ -1582,8 +1597,9 @@ struct rte_eth_dev_info {
>  	 * embedded managed interconnect/switch.
>  	 */
>  	struct rte_eth_switch_info switch_info;
> +	/* Reassembly capabilities of a device for reassembly offload */
> +	struct rte_eth_reass_capa reass_capa;
>  
> -	uint64_t reserved_64s[2]; /**< Reserved for future fields */

Reserved fields were added to be able to update the struct without breaking the
ABI, so that a critical change doesn't have to wait until next ABI break release.
Since this is ABI break release, we can keep the reserved field and add the new
struct. Or this can be an opportunity to get rid of the reserved field.

Personally I have no objection to get rid of the reserved field, but better to
agree on this explicitly.

>  	void *reserved_ptrs[2];   /**< Reserved for future fields */
>  };
>  
> diff --git a/lib/mbuf/rte_mbuf_core.h b/lib/mbuf/rte_mbuf_core.h
> index bb38d7f581..cea25c87f7 100644
> --- a/lib/mbuf/rte_mbuf_core.h
> +++ b/lib/mbuf/rte_mbuf_core.h
> @@ -200,10 +200,11 @@ extern "C" {
>  #define PKT_RX_OUTER_L4_CKSUM_BAD	(1ULL << 21)
>  #define PKT_RX_OUTER_L4_CKSUM_GOOD	(1ULL << 22)
>  #define PKT_RX_OUTER_L4_CKSUM_INVALID	((1ULL << 21) | (1ULL << 22))
> +#define PKT_RX_REASSEMBLY_INCOMPLETE	(1ULL << 23)
>  

Similar comment with Andrew's, what is the expectation from application if this
flag exists? Can we drop it to simplify the logic in the application?

>  /* add new RX flags here, don't forget to update PKT_FIRST_FREE */
>  
> -#define PKT_FIRST_FREE (1ULL << 23)
> +#define PKT_FIRST_FREE (1ULL << 24)
>  #define PKT_LAST_FREE (1ULL << 40)
>  
>  /* add new TX flags here, don't forget to update PKT_LAST_FREE  */
> diff --git a/lib/security/rte_security.h b/lib/security/rte_security.h
> index 88d31de0a6..364eeb5cd4 100644
> --- a/lib/security/rte_security.h
> +++ b/lib/security/rte_security.h
> @@ -181,6 +181,16 @@ struct rte_security_ipsec_sa_options {
>  	 * * 0: Disable per session security statistics collection for this SA.
>  	 */
>  	uint32_t stats : 1;
> +
> +	/** Enable reassembly on incoming packets.
> +	 *
> +	 * * 1: Enable driver to try reassembly of encrypted IP packets for
> +	 *      this SA, if supported by the driver. This feature will work
> +	 *      only if rx_offload DEV_RX_OFFLOAD_REASSEMBLY is set in
> +	 *      inline ethernet device.
> +	 * * 0: Disable reassembly of packets (default).
> +	 */
> +	uint32_t reass_en : 1;
>  };
>  
>  /** IPSec security association direction */
>
  
Xu, Rosen Sept. 8, 2021, 6:34 a.m. UTC | #4
Hi,

> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Akhil Goyal
> Sent: Monday, August 23, 2021 18:03
> To: dev@dpdk.org
> Cc: anoobj@marvell.com; Nicolau, Radu <radu.nicolau@intel.com>; Doherty,
> Declan <declan.doherty@intel.com>; hemant.agrawal@nxp.com;
> matan@nvidia.com; Ananyev, Konstantin <konstantin.ananyev@intel.com>;
> thomas@monjalon.net; adwivedi@marvell.com; Yigit, Ferruh
> <ferruh.yigit@intel.com>; andrew.rybchenko@oktetlabs.ru; Akhil Goyal
> <gakhil@marvell.com>
> Subject: [dpdk-dev] [PATCH] RFC: ethdev: add reassembly offload
> 
> Reassembly is a costly operation if it is done in software, however, if it is
> offloaded to HW, it can considerably save application cycles.
> The operation becomes even more costlier if IP fragmants are encrypted.
> 
> To resolve above two issues, a new offload DEV_RX_OFFLOAD_REASSEMBLY
> is introduced in ethdev for devices which can attempt reassembly of packets
> in hardware.
> rte_eth_dev_info is added with the reassembly capabilities which a device
> can support.
> Now, if IP fragments are encrypted, reassembly can also be attempted while
> doing inline IPsec processing.
> This is controlled by a flag in rte_security_ipsec_sa_options to enable
> reassembly of encrypted IP fragments in the inline path.
> 
> The resulting reassembled packet would be a typical segmented mbuf in case
> of success.
> 
> And if reassembly of fragments is failed or is incomplete (if fragments do not
> come before the reass_timeout), the mbuf is updated with an ol_flag
> PKT_RX_REASSEMBLY_INCOMPLETE and mbuf is returned as is. Now
> application may decide the fate of the packet to wait more for fragments to
> come or drop.
> 
> Signed-off-by: Akhil Goyal <gakhil@marvell.com>
> ---
>  lib/ethdev/rte_ethdev.c     |  1 +
>  lib/ethdev/rte_ethdev.h     | 18 +++++++++++++++++-
>  lib/mbuf/rte_mbuf_core.h    |  3 ++-
>  lib/security/rte_security.h | 10 ++++++++++
>  4 files changed, 30 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c index
> 9d95cd11e1..1ab3a093cf 100644
> --- a/lib/ethdev/rte_ethdev.c
> +++ b/lib/ethdev/rte_ethdev.c
> @@ -119,6 +119,7 @@ static const struct {
>  	RTE_RX_OFFLOAD_BIT2STR(VLAN_FILTER),
>  	RTE_RX_OFFLOAD_BIT2STR(VLAN_EXTEND),
>  	RTE_RX_OFFLOAD_BIT2STR(JUMBO_FRAME),
> +	RTE_RX_OFFLOAD_BIT2STR(REASSEMBLY),
>  	RTE_RX_OFFLOAD_BIT2STR(SCATTER),
>  	RTE_RX_OFFLOAD_BIT2STR(TIMESTAMP),
>  	RTE_RX_OFFLOAD_BIT2STR(SECURITY),
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h index
> d2b27c351f..e89a4dc1eb 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -1360,6 +1360,7 @@ struct rte_eth_conf {
>  #define DEV_RX_OFFLOAD_VLAN_FILTER	0x00000200
>  #define DEV_RX_OFFLOAD_VLAN_EXTEND	0x00000400
>  #define DEV_RX_OFFLOAD_JUMBO_FRAME	0x00000800
> +#define DEV_RX_OFFLOAD_REASSEMBLY	0x00001000
>  #define DEV_RX_OFFLOAD_SCATTER		0x00002000
>  /**
>   * Timestamp is set by the driver in
> RTE_MBUF_DYNFIELD_TIMESTAMP_NAME @@ -1477,6 +1478,20 @@ struct
> rte_eth_dev_portconf {
>   */
>  #define RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID
> 	(UINT16_MAX)
> 
> +/**
> + * Reassembly capabilities that a device can support.
> + * The device which can support reassembly offload should set
> + * DEV_RX_OFFLOAD_REASSEMBLY
> + */
> +struct rte_eth_reass_capa {
> +	/** Maximum time in ns that a fragment can wait for further
> fragments */
> +	uint64_t reass_timeout;
> +	/** Maximum number of fragments that device can reassemble */
> +	uint16_t max_frags;
> +	/** Reserved for future capabilities */
> +	uint16_t reserved[3];
> +};

IP reassembly occurs at the final recipient of the message, NIC attempts to do it has a fer challenges. The reason is that having NICs need to worry about reassembling fragments would increase their complexity, so most likely it only can handle range length of datagrams. Seems rte_eth_reass_capa miss the max original datagrams length which NIC can support, this features is better to be negotiated between NIC and SW as well.

>  /**
>   * Ethernet device associated switch information
>   */
> @@ -1582,8 +1597,9 @@ struct rte_eth_dev_info {
>  	 * embedded managed interconnect/switch.
>  	 */
>  	struct rte_eth_switch_info switch_info;
> +	/* Reassembly capabilities of a device for reassembly offload */
> +	struct rte_eth_reass_capa reass_capa;
> 
> -	uint64_t reserved_64s[2]; /**< Reserved for future fields */
>  	void *reserved_ptrs[2];   /**< Reserved for future fields */
>  };
> 
> diff --git a/lib/mbuf/rte_mbuf_core.h b/lib/mbuf/rte_mbuf_core.h index
> bb38d7f581..cea25c87f7 100644
> --- a/lib/mbuf/rte_mbuf_core.h
> +++ b/lib/mbuf/rte_mbuf_core.h
> @@ -200,10 +200,11 @@ extern "C" {
>  #define PKT_RX_OUTER_L4_CKSUM_BAD	(1ULL << 21)
>  #define PKT_RX_OUTER_L4_CKSUM_GOOD	(1ULL << 22)
>  #define PKT_RX_OUTER_L4_CKSUM_INVALID	((1ULL << 21) | (1ULL << 22))
> +#define PKT_RX_REASSEMBLY_INCOMPLETE	(1ULL << 23)
> 
>  /* add new RX flags here, don't forget to update PKT_FIRST_FREE */
> 
> -#define PKT_FIRST_FREE (1ULL << 23)
> +#define PKT_FIRST_FREE (1ULL << 24)
>  #define PKT_LAST_FREE (1ULL << 40)
> 
>  /* add new TX flags here, don't forget to update PKT_LAST_FREE  */ diff --git
> a/lib/security/rte_security.h b/lib/security/rte_security.h index
> 88d31de0a6..364eeb5cd4 100644
> --- a/lib/security/rte_security.h
> +++ b/lib/security/rte_security.h
> @@ -181,6 +181,16 @@ struct rte_security_ipsec_sa_options {
>  	 * * 0: Disable per session security statistics collection for this SA.
>  	 */
>  	uint32_t stats : 1;
> +
> +	/** Enable reassembly on incoming packets.
> +	 *
> +	 * * 1: Enable driver to try reassembly of encrypted IP packets for
> +	 *      this SA, if supported by the driver. This feature will work
> +	 *      only if rx_offload DEV_RX_OFFLOAD_REASSEMBLY is set in
> +	 *      inline ethernet device.
> +	 * * 0: Disable reassembly of packets (default).
> +	 */
> +	uint32_t reass_en : 1;
>  };
> 
>  /** IPSec security association direction */
> --
> 2.25.1
  
Xu, Rosen Sept. 8, 2021, 6:36 a.m. UTC | #5
Cc  myself

> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Xu, Rosen
> Sent: Wednesday, September 08, 2021 14:34
> To: Akhil Goyal <gakhil@marvell.com>; dev@dpdk.org
> Cc: anoobj@marvell.com; Nicolau, Radu <radu.nicolau@intel.com>; Doherty,
> Declan <declan.doherty@intel.com>; hemant.agrawal@nxp.com;
> matan@nvidia.com; Ananyev, Konstantin <konstantin.ananyev@intel.com>;
> thomas@monjalon.net; adwivedi@marvell.com; Yigit, Ferruh
> <ferruh.yigit@intel.com>; andrew.rybchenko@oktetlabs.ru
> Subject: Re: [dpdk-dev] [PATCH] RFC: ethdev: add reassembly offload
> 
> Hi,
> 
> > -----Original Message-----
> > From: dev <dev-bounces@dpdk.org> On Behalf Of Akhil Goyal
> > Sent: Monday, August 23, 2021 18:03
> > To: dev@dpdk.org
> > Cc: anoobj@marvell.com; Nicolau, Radu <radu.nicolau@intel.com>;
> > Doherty, Declan <declan.doherty@intel.com>; hemant.agrawal@nxp.com;
> > matan@nvidia.com; Ananyev, Konstantin
> <konstantin.ananyev@intel.com>;
> > thomas@monjalon.net; adwivedi@marvell.com; Yigit, Ferruh
> > <ferruh.yigit@intel.com>; andrew.rybchenko@oktetlabs.ru; Akhil Goyal
> > <gakhil@marvell.com>
> > Subject: [dpdk-dev] [PATCH] RFC: ethdev: add reassembly offload
> >
> > Reassembly is a costly operation if it is done in software, however,
> > if it is offloaded to HW, it can considerably save application cycles.
> > The operation becomes even more costlier if IP fragmants are encrypted.
> >
> > To resolve above two issues, a new offload
> DEV_RX_OFFLOAD_REASSEMBLY
> > is introduced in ethdev for devices which can attempt reassembly of
> > packets in hardware.
> > rte_eth_dev_info is added with the reassembly capabilities which a
> > device can support.
> > Now, if IP fragments are encrypted, reassembly can also be attempted
> > while doing inline IPsec processing.
> > This is controlled by a flag in rte_security_ipsec_sa_options to
> > enable reassembly of encrypted IP fragments in the inline path.
> >
> > The resulting reassembled packet would be a typical segmented mbuf in
> > case of success.
> >
> > And if reassembly of fragments is failed or is incomplete (if
> > fragments do not come before the reass_timeout), the mbuf is updated
> > with an ol_flag PKT_RX_REASSEMBLY_INCOMPLETE and mbuf is returned
> as
> > is. Now application may decide the fate of the packet to wait more for
> > fragments to come or drop.
> >
> > Signed-off-by: Akhil Goyal <gakhil@marvell.com>
> > ---
> >  lib/ethdev/rte_ethdev.c     |  1 +
> >  lib/ethdev/rte_ethdev.h     | 18 +++++++++++++++++-
> >  lib/mbuf/rte_mbuf_core.h    |  3 ++-
> >  lib/security/rte_security.h | 10 ++++++++++
> >  4 files changed, 30 insertions(+), 2 deletions(-)
> >
> > diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c index
> > 9d95cd11e1..1ab3a093cf 100644
> > --- a/lib/ethdev/rte_ethdev.c
> > +++ b/lib/ethdev/rte_ethdev.c
> > @@ -119,6 +119,7 @@ static const struct {
> >  	RTE_RX_OFFLOAD_BIT2STR(VLAN_FILTER),
> >  	RTE_RX_OFFLOAD_BIT2STR(VLAN_EXTEND),
> >  	RTE_RX_OFFLOAD_BIT2STR(JUMBO_FRAME),
> > +	RTE_RX_OFFLOAD_BIT2STR(REASSEMBLY),
> >  	RTE_RX_OFFLOAD_BIT2STR(SCATTER),
> >  	RTE_RX_OFFLOAD_BIT2STR(TIMESTAMP),
> >  	RTE_RX_OFFLOAD_BIT2STR(SECURITY),
> > diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h index
> > d2b27c351f..e89a4dc1eb 100644
> > --- a/lib/ethdev/rte_ethdev.h
> > +++ b/lib/ethdev/rte_ethdev.h
> > @@ -1360,6 +1360,7 @@ struct rte_eth_conf {
> >  #define DEV_RX_OFFLOAD_VLAN_FILTER	0x00000200
> >  #define DEV_RX_OFFLOAD_VLAN_EXTEND	0x00000400
> >  #define DEV_RX_OFFLOAD_JUMBO_FRAME	0x00000800
> > +#define DEV_RX_OFFLOAD_REASSEMBLY	0x00001000
> >  #define DEV_RX_OFFLOAD_SCATTER		0x00002000
> >  /**
> >   * Timestamp is set by the driver in
> > RTE_MBUF_DYNFIELD_TIMESTAMP_NAME @@ -1477,6 +1478,20 @@
> struct
> > rte_eth_dev_portconf {
> >   */
> >  #define RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID
> > 	(UINT16_MAX)
> >
> > +/**
> > + * Reassembly capabilities that a device can support.
> > + * The device which can support reassembly offload should set
> > + * DEV_RX_OFFLOAD_REASSEMBLY
> > + */
> > +struct rte_eth_reass_capa {
> > +	/** Maximum time in ns that a fragment can wait for further
> > fragments */
> > +	uint64_t reass_timeout;
> > +	/** Maximum number of fragments that device can reassemble */
> > +	uint16_t max_frags;
> > +	/** Reserved for future capabilities */
> > +	uint16_t reserved[3];
> > +};
> 
> IP reassembly occurs at the final recipient of the message, NIC attempts to
> do it has a fer challenges. The reason is that having NICs need to worry about
> reassembling fragments would increase their complexity, so most likely it
> only can handle range length of datagrams. Seems rte_eth_reass_capa miss
> the max original datagrams length which NIC can support, this features is
> better to be negotiated between NIC and SW as well.
> 
> >  /**
> >   * Ethernet device associated switch information
> >   */
> > @@ -1582,8 +1597,9 @@ struct rte_eth_dev_info {
> >  	 * embedded managed interconnect/switch.
> >  	 */
> >  	struct rte_eth_switch_info switch_info;
> > +	/* Reassembly capabilities of a device for reassembly offload */
> > +	struct rte_eth_reass_capa reass_capa;
> >
> > -	uint64_t reserved_64s[2]; /**< Reserved for future fields */
> >  	void *reserved_ptrs[2];   /**< Reserved for future fields */
> >  };
> >
> > diff --git a/lib/mbuf/rte_mbuf_core.h b/lib/mbuf/rte_mbuf_core.h index
> > bb38d7f581..cea25c87f7 100644
> > --- a/lib/mbuf/rte_mbuf_core.h
> > +++ b/lib/mbuf/rte_mbuf_core.h
> > @@ -200,10 +200,11 @@ extern "C" {
> >  #define PKT_RX_OUTER_L4_CKSUM_BAD	(1ULL << 21)
> >  #define PKT_RX_OUTER_L4_CKSUM_GOOD	(1ULL << 22)
> >  #define PKT_RX_OUTER_L4_CKSUM_INVALID	((1ULL << 21) | (1ULL
> << 22))
> > +#define PKT_RX_REASSEMBLY_INCOMPLETE	(1ULL << 23)
> >
> >  /* add new RX flags here, don't forget to update PKT_FIRST_FREE */
> >
> > -#define PKT_FIRST_FREE (1ULL << 23)
> > +#define PKT_FIRST_FREE (1ULL << 24)
> >  #define PKT_LAST_FREE (1ULL << 40)
> >
> >  /* add new TX flags here, don't forget to update PKT_LAST_FREE  */
> > diff --git a/lib/security/rte_security.h b/lib/security/rte_security.h
> > index
> > 88d31de0a6..364eeb5cd4 100644
> > --- a/lib/security/rte_security.h
> > +++ b/lib/security/rte_security.h
> > @@ -181,6 +181,16 @@ struct rte_security_ipsec_sa_options {
> >  	 * * 0: Disable per session security statistics collection for this SA.
> >  	 */
> >  	uint32_t stats : 1;
> > +
> > +	/** Enable reassembly on incoming packets.
> > +	 *
> > +	 * * 1: Enable driver to try reassembly of encrypted IP packets for
> > +	 *      this SA, if supported by the driver. This feature will work
> > +	 *      only if rx_offload DEV_RX_OFFLOAD_REASSEMBLY is set in
> > +	 *      inline ethernet device.
> > +	 * * 0: Disable reassembly of packets (default).
> > +	 */
> > +	uint32_t reass_en : 1;
> >  };
> >
> >  /** IPSec security association direction */
> > --
> > 2.25.1
  
Anoob Joseph Sept. 8, 2021, 10:29 a.m. UTC | #6
Hi Ferruh, Rosen, Andrew,

Please see inline.

Thanks,
Anoob

> Subject: [EXT] Re: [PATCH] RFC: ethdev: add reassembly offload
> 
> External Email
> 
> ----------------------------------------------------------------------
> On 8/23/2021 11:02 AM, Akhil Goyal wrote:
> > Reassembly is a costly operation if it is done in software, however,
> > if it is offloaded to HW, it can considerably save application cycles.
> > The operation becomes even more costlier if IP fragmants are
> > encrypted.
> >
> > To resolve above two issues, a new offload
> DEV_RX_OFFLOAD_REASSEMBLY
> > is introduced in ethdev for devices which can attempt reassembly of
> > packets in hardware.
> > rte_eth_dev_info is added with the reassembly capabilities which a
> > device can support.
> > Now, if IP fragments are encrypted, reassembly can also be attempted
> > while doing inline IPsec processing.
> > This is controlled by a flag in rte_security_ipsec_sa_options to
> > enable reassembly of encrypted IP fragments in the inline path.
> >
> > The resulting reassembled packet would be a typical segmented mbuf in
> > case of success.
> >
> > And if reassembly of fragments is failed or is incomplete (if
> > fragments do not come before the reass_timeout), the mbuf is updated
> > with an ol_flag PKT_RX_REASSEMBLY_INCOMPLETE and mbuf is returned
> as
> > is. Now application may decide the fate of the packet to wait more for
> > fragments to come or drop.
> >
> > Signed-off-by: Akhil Goyal <gakhil@marvell.com>
> > ---
> >  lib/ethdev/rte_ethdev.c     |  1 +
> >  lib/ethdev/rte_ethdev.h     | 18 +++++++++++++++++-
> >  lib/mbuf/rte_mbuf_core.h    |  3 ++-
> >  lib/security/rte_security.h | 10 ++++++++++
> >  4 files changed, 30 insertions(+), 2 deletions(-)
> >
> > diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c index
> > 9d95cd11e1..1ab3a093cf 100644
> > --- a/lib/ethdev/rte_ethdev.c
> > +++ b/lib/ethdev/rte_ethdev.c
> > @@ -119,6 +119,7 @@ static const struct {
> >  	RTE_RX_OFFLOAD_BIT2STR(VLAN_FILTER),
> >  	RTE_RX_OFFLOAD_BIT2STR(VLAN_EXTEND),
> >  	RTE_RX_OFFLOAD_BIT2STR(JUMBO_FRAME),
> > +	RTE_RX_OFFLOAD_BIT2STR(REASSEMBLY),
> >  	RTE_RX_OFFLOAD_BIT2STR(SCATTER),
> >  	RTE_RX_OFFLOAD_BIT2STR(TIMESTAMP),
> >  	RTE_RX_OFFLOAD_BIT2STR(SECURITY),
> > diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h index
> > d2b27c351f..e89a4dc1eb 100644
> > --- a/lib/ethdev/rte_ethdev.h
> > +++ b/lib/ethdev/rte_ethdev.h
> > @@ -1360,6 +1360,7 @@ struct rte_eth_conf {
> >  #define DEV_RX_OFFLOAD_VLAN_FILTER	0x00000200
> >  #define DEV_RX_OFFLOAD_VLAN_EXTEND	0x00000400
> >  #define DEV_RX_OFFLOAD_JUMBO_FRAME	0x00000800
> > +#define DEV_RX_OFFLOAD_REASSEMBLY	0x00001000
> 
> previous '0x00001000' was 'DEV_RX_OFFLOAD_CRC_STRIP', it has been long
> that offload has been removed, but not sure if it cause any problem to re-
> use it.
> 
> >  #define DEV_RX_OFFLOAD_SCATTER		0x00002000
> >  /**
> >   * Timestamp is set by the driver in
> RTE_MBUF_DYNFIELD_TIMESTAMP_NAME
> > @@ -1477,6 +1478,20 @@ struct rte_eth_dev_portconf {
> >   */
> >  #define RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID
> 	(UINT16_MAX)
> >
> > +/**
> > + * Reassembly capabilities that a device can support.
> > + * The device which can support reassembly offload should set
> > + * DEV_RX_OFFLOAD_REASSEMBLY
> > + */
> > +struct rte_eth_reass_capa {
> > +	/** Maximum time in ns that a fragment can wait for further
> fragments */
> > +	uint64_t reass_timeout;
> > +	/** Maximum number of fragments that device can reassemble */
> > +	uint16_t max_frags;
> > +	/** Reserved for future capabilities */
> > +	uint16_t reserved[3];
> > +};
> > +
> 
> I wonder if there is any other hardware around supports reassembly offload,
> it would be good to get more feedback on the capabilities list.
> 
> >  /**
> >   * Ethernet device associated switch information
> >   */
> > @@ -1582,8 +1597,9 @@ struct rte_eth_dev_info {
> >  	 * embedded managed interconnect/switch.
> >  	 */
> >  	struct rte_eth_switch_info switch_info;
> > +	/* Reassembly capabilities of a device for reassembly offload */
> > +	struct rte_eth_reass_capa reass_capa;
> >
> > -	uint64_t reserved_64s[2]; /**< Reserved for future fields */
> 
> Reserved fields were added to be able to update the struct without breaking
> the ABI, so that a critical change doesn't have to wait until next ABI break
> release.
> Since this is ABI break release, we can keep the reserved field and add the
> new struct. Or this can be an opportunity to get rid of the reserved field.
> 
> Personally I have no objection to get rid of the reserved field, but better to
> agree on this explicitly.
> 
> >  	void *reserved_ptrs[2];   /**< Reserved for future fields */
> >  };
> >
> > diff --git a/lib/mbuf/rte_mbuf_core.h b/lib/mbuf/rte_mbuf_core.h index
> > bb38d7f581..cea25c87f7 100644
> > --- a/lib/mbuf/rte_mbuf_core.h
> > +++ b/lib/mbuf/rte_mbuf_core.h
> > @@ -200,10 +200,11 @@ extern "C" {
> >  #define PKT_RX_OUTER_L4_CKSUM_BAD	(1ULL << 21)
> >  #define PKT_RX_OUTER_L4_CKSUM_GOOD	(1ULL << 22)
> >  #define PKT_RX_OUTER_L4_CKSUM_INVALID	((1ULL << 21) | (1ULL
> << 22))
> > +#define PKT_RX_REASSEMBLY_INCOMPLETE	(1ULL << 23)
> >
> 
> Similar comment with Andrew's, what is the expectation from application if
> this flag exists? Can we drop it to simplify the logic in the application?

[Anoob] There can be few cases where hardware/NIC attempts inline reassembly but it fails to complete it

1. Number of fragments is larger than what is supported by the hardware
2. Hardware reassembly resources are exhausted (due to limited reassembly contexts etc)
3. Reassembly errors such as overlapping fragments
4. Wait time exhausted (or reassembly timeout)

In such cases, application would be required to retrieve the original fragments so that it can attempt reassembly in software. The incomplete flag is useful for 2 purposes basically,
1. Application would need to retrieve the time the fragment has already spend in hardware reassembly so that software reassembly attempt can compensate for it. Otherwise, reassembly timeout across hardware + software will not be accurate
2. Retrieve original fragments. With this proposal, an incomplete reassembly would result in a chained mbuf but the segments need not be consecutive. To explain bit more,

Suppose we have a packet that is fragmented into 3 fragments, and fragment 3 & fragment 1 arrives in that order. Fragment 2 didn't arrive and hardware ultimately pushes it. In that case, application would be receiving a chained/segmented mbuf with fragment 1 & fragment 3 chained.

Now, this chained mbuf can't be treated like a regular chained mbuf. Each fragment would have its IP hdr and there are fragments missing in between. The only thing application is expected to do is, retrieve fragments, push it to s/w reassembly.
 
> 
> >  /* add new RX flags here, don't forget to update PKT_FIRST_FREE */
> >
> > -#define PKT_FIRST_FREE (1ULL << 23)
> > +#define PKT_FIRST_FREE (1ULL << 24)
> >  #define PKT_LAST_FREE (1ULL << 40)
> >
> >  /* add new TX flags here, don't forget to update PKT_LAST_FREE  */
> > diff --git a/lib/security/rte_security.h b/lib/security/rte_security.h
> > index 88d31de0a6..364eeb5cd4 100644
> > --- a/lib/security/rte_security.h
> > +++ b/lib/security/rte_security.h
> > @@ -181,6 +181,16 @@ struct rte_security_ipsec_sa_options {
> >  	 * * 0: Disable per session security statistics collection for this SA.
> >  	 */
> >  	uint32_t stats : 1;
> > +
> > +	/** Enable reassembly on incoming packets.
> > +	 *
> > +	 * * 1: Enable driver to try reassembly of encrypted IP packets for
> > +	 *      this SA, if supported by the driver. This feature will work
> > +	 *      only if rx_offload DEV_RX_OFFLOAD_REASSEMBLY is set in
> > +	 *      inline ethernet device.
> > +	 * * 0: Disable reassembly of packets (default).
> > +	 */
> > +	uint32_t reass_en : 1;
> >  };
> >
> >  /** IPSec security association direction */
> >
  
Xu, Rosen Sept. 13, 2021, 6:56 a.m. UTC | #7
Hi,

> -----Original Message-----
> From: Anoob Joseph <anoobj@marvell.com>
> Sent: Wednesday, September 08, 2021 18:30
> To: Yigit, Ferruh <ferruh.yigit@intel.com>; Xu, Rosen <rosen.xu@intel.com>;
> Andrew Rybchenko <arybchenko@solarflare.com>
> Cc: Nicolau, Radu <radu.nicolau@intel.com>; Doherty, Declan
> <declan.doherty@intel.com>; hemant.agrawal@nxp.com;
> matan@nvidia.com; Ananyev, Konstantin <konstantin.ananyev@intel.com>;
> thomas@monjalon.net; Ankur Dwivedi <adwivedi@marvell.com>;
> andrew.rybchenko@oktetlabs.ru; Akhil Goyal <gakhil@marvell.com>;
> dev@dpdk.org
> Subject: RE: [EXT] Re: [PATCH] RFC: ethdev: add reassembly offload
> 
> Hi Ferruh, Rosen, Andrew,
> 
> Please see inline.
> 
> Thanks,
> Anoob
> 
> > Subject: [EXT] Re: [PATCH] RFC: ethdev: add reassembly offload
> >
> > External Email
> >
> > ----------------------------------------------------------------------
> > On 8/23/2021 11:02 AM, Akhil Goyal wrote:
> > > Reassembly is a costly operation if it is done in software, however,
> > > if it is offloaded to HW, it can considerably save application cycles.
> > > The operation becomes even more costlier if IP fragmants are
> > > encrypted.
> > >
> > > To resolve above two issues, a new offload
> > DEV_RX_OFFLOAD_REASSEMBLY
> > > is introduced in ethdev for devices which can attempt reassembly of
> > > packets in hardware.
> > > rte_eth_dev_info is added with the reassembly capabilities which a
> > > device can support.
> > > Now, if IP fragments are encrypted, reassembly can also be attempted
> > > while doing inline IPsec processing.
> > > This is controlled by a flag in rte_security_ipsec_sa_options to
> > > enable reassembly of encrypted IP fragments in the inline path.
> > >
> > > The resulting reassembled packet would be a typical segmented mbuf
> > > in case of success.
> > >
> > > And if reassembly of fragments is failed or is incomplete (if
> > > fragments do not come before the reass_timeout), the mbuf is updated
> > > with an ol_flag PKT_RX_REASSEMBLY_INCOMPLETE and mbuf is returned
> > as
> > > is. Now application may decide the fate of the packet to wait more
> > > for fragments to come or drop.
> > >
> > > Signed-off-by: Akhil Goyal <gakhil@marvell.com>
> > > ---
> > >  lib/ethdev/rte_ethdev.c     |  1 +
> > >  lib/ethdev/rte_ethdev.h     | 18 +++++++++++++++++-
> > >  lib/mbuf/rte_mbuf_core.h    |  3 ++-
> > >  lib/security/rte_security.h | 10 ++++++++++
> > >  4 files changed, 30 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c index
> > > 9d95cd11e1..1ab3a093cf 100644
> > > --- a/lib/ethdev/rte_ethdev.c
> > > +++ b/lib/ethdev/rte_ethdev.c
> > > @@ -119,6 +119,7 @@ static const struct {
> > >  	RTE_RX_OFFLOAD_BIT2STR(VLAN_FILTER),
> > >  	RTE_RX_OFFLOAD_BIT2STR(VLAN_EXTEND),
> > >  	RTE_RX_OFFLOAD_BIT2STR(JUMBO_FRAME),
> > > +	RTE_RX_OFFLOAD_BIT2STR(REASSEMBLY),
> > >  	RTE_RX_OFFLOAD_BIT2STR(SCATTER),
> > >  	RTE_RX_OFFLOAD_BIT2STR(TIMESTAMP),
> > >  	RTE_RX_OFFLOAD_BIT2STR(SECURITY),
> > > diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h index
> > > d2b27c351f..e89a4dc1eb 100644
> > > --- a/lib/ethdev/rte_ethdev.h
> > > +++ b/lib/ethdev/rte_ethdev.h
> > > @@ -1360,6 +1360,7 @@ struct rte_eth_conf {
> > >  #define DEV_RX_OFFLOAD_VLAN_FILTER	0x00000200
> > >  #define DEV_RX_OFFLOAD_VLAN_EXTEND	0x00000400
> > >  #define DEV_RX_OFFLOAD_JUMBO_FRAME	0x00000800
> > > +#define DEV_RX_OFFLOAD_REASSEMBLY	0x00001000
> >
> > previous '0x00001000' was 'DEV_RX_OFFLOAD_CRC_STRIP', it has been
> long
> > that offload has been removed, but not sure if it cause any problem to
> > re- use it.
> >
> > >  #define DEV_RX_OFFLOAD_SCATTER		0x00002000
> > >  /**
> > >   * Timestamp is set by the driver in
> > RTE_MBUF_DYNFIELD_TIMESTAMP_NAME
> > > @@ -1477,6 +1478,20 @@ struct rte_eth_dev_portconf {
> > >   */
> > >  #define RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID
> > 	(UINT16_MAX)
> > >
> > > +/**
> > > + * Reassembly capabilities that a device can support.
> > > + * The device which can support reassembly offload should set
> > > + * DEV_RX_OFFLOAD_REASSEMBLY
> > > + */
> > > +struct rte_eth_reass_capa {
> > > +	/** Maximum time in ns that a fragment can wait for further
> > fragments */
> > > +	uint64_t reass_timeout;
> > > +	/** Maximum number of fragments that device can reassemble */
> > > +	uint16_t max_frags;
> > > +	/** Reserved for future capabilities */
> > > +	uint16_t reserved[3];
> > > +};
> > > +
> >
> > I wonder if there is any other hardware around supports reassembly
> > offload, it would be good to get more feedback on the capabilities list.
> >
> > >  /**
> > >   * Ethernet device associated switch information
> > >   */
> > > @@ -1582,8 +1597,9 @@ struct rte_eth_dev_info {
> > >  	 * embedded managed interconnect/switch.
> > >  	 */
> > >  	struct rte_eth_switch_info switch_info;
> > > +	/* Reassembly capabilities of a device for reassembly offload */
> > > +	struct rte_eth_reass_capa reass_capa;
> > >
> > > -	uint64_t reserved_64s[2]; /**< Reserved for future fields */
> >
> > Reserved fields were added to be able to update the struct without
> > breaking the ABI, so that a critical change doesn't have to wait until
> > next ABI break release.
> > Since this is ABI break release, we can keep the reserved field and
> > add the new struct. Or this can be an opportunity to get rid of the reserved
> field.
> >
> > Personally I have no objection to get rid of the reserved field, but
> > better to agree on this explicitly.
> >
> > >  	void *reserved_ptrs[2];   /**< Reserved for future fields */
> > >  };
> > >
> > > diff --git a/lib/mbuf/rte_mbuf_core.h b/lib/mbuf/rte_mbuf_core.h
> > > index
> > > bb38d7f581..cea25c87f7 100644
> > > --- a/lib/mbuf/rte_mbuf_core.h
> > > +++ b/lib/mbuf/rte_mbuf_core.h
> > > @@ -200,10 +200,11 @@ extern "C" {
> > >  #define PKT_RX_OUTER_L4_CKSUM_BAD	(1ULL << 21)
> > >  #define PKT_RX_OUTER_L4_CKSUM_GOOD	(1ULL << 22)
> > >  #define PKT_RX_OUTER_L4_CKSUM_INVALID	((1ULL << 21) | (1ULL
> > << 22))
> > > +#define PKT_RX_REASSEMBLY_INCOMPLETE	(1ULL << 23)
> > >
> >
> > Similar comment with Andrew's, what is the expectation from
> > application if this flag exists? Can we drop it to simplify the logic in the
> application?
> 
> [Anoob] There can be few cases where hardware/NIC attempts inline
> reassembly but it fails to complete it
> 
> 1. Number of fragments is larger than what is supported by the hardware 2.
> Hardware reassembly resources are exhausted (due to limited reassembly
> contexts etc) 3. Reassembly errors such as overlapping fragments 4. Wait
> time exhausted (or reassembly timeout)
> 
> In such cases, application would be required to retrieve the original
> fragments so that it can attempt reassembly in software. The incomplete flag
> is useful for 2 purposes basically, 1. Application would need to retrieve the
> time the fragment has already spend in hardware reassembly so that
> software reassembly attempt can compensate for it. Otherwise, reassembly
> timeout across hardware + software will not be accurate 2. Retrieve original
> fragments. With this proposal, an incomplete reassembly would result in a
> chained mbuf but the segments need not be consecutive. To explain bit more,
> 
> Suppose we have a packet that is fragmented into 3 fragments, and fragment
> 3 & fragment 1 arrives in that order. Fragment 2 didn't arrive and hardware
> ultimately pushes it. In that case, application would be receiving a
> chained/segmented mbuf with fragment 1 & fragment 3 chained.
> 
> Now, this chained mbuf can't be treated like a regular chained mbuf. Each
> fragment would have its IP hdr and there are fragments missing in between.
> The only thing application is expected to do is, retrieve fragments, push it to
> s/w reassembly.

What you mentioned is error identification. But actually a negotiation about max frame size is needed before datagrams tx/rx.
> >
> > >  /* add new RX flags here, don't forget to update PKT_FIRST_FREE */
> > >
> > > -#define PKT_FIRST_FREE (1ULL << 23)
> > > +#define PKT_FIRST_FREE (1ULL << 24)
> > >  #define PKT_LAST_FREE (1ULL << 40)
> > >
> > >  /* add new TX flags here, don't forget to update PKT_LAST_FREE  */
> > > diff --git a/lib/security/rte_security.h
> > > b/lib/security/rte_security.h index 88d31de0a6..364eeb5cd4 100644
> > > --- a/lib/security/rte_security.h
> > > +++ b/lib/security/rte_security.h
> > > @@ -181,6 +181,16 @@ struct rte_security_ipsec_sa_options {
> > >  	 * * 0: Disable per session security statistics collection for this SA.
> > >  	 */
> > >  	uint32_t stats : 1;
> > > +
> > > +	/** Enable reassembly on incoming packets.
> > > +	 *
> > > +	 * * 1: Enable driver to try reassembly of encrypted IP packets for
> > > +	 *      this SA, if supported by the driver. This feature will work
> > > +	 *      only if rx_offload DEV_RX_OFFLOAD_REASSEMBLY is set in
> > > +	 *      inline ethernet device.
> > > +	 * * 0: Disable reassembly of packets (default).
> > > +	 */
> > > +	uint32_t reass_en : 1;
> > >  };
> > >
> > >  /** IPSec security association direction */
> > >
  
Andrew Rybchenko Sept. 13, 2021, 7:22 a.m. UTC | #8
On 9/13/21 9:56 AM, Xu, Rosen wrote:
> Hi,
> 
>> -----Original Message-----
>> From: Anoob Joseph <anoobj@marvell.com>
>> Sent: Wednesday, September 08, 2021 18:30
>> To: Yigit, Ferruh <ferruh.yigit@intel.com>; Xu, Rosen <rosen.xu@intel.com>;
>> Andrew Rybchenko <arybchenko@solarflare.com>
>> Cc: Nicolau, Radu <radu.nicolau@intel.com>; Doherty, Declan
>> <declan.doherty@intel.com>; hemant.agrawal@nxp.com;
>> matan@nvidia.com; Ananyev, Konstantin <konstantin.ananyev@intel.com>;
>> thomas@monjalon.net; Ankur Dwivedi <adwivedi@marvell.com>;
>> andrew.rybchenko@oktetlabs.ru; Akhil Goyal <gakhil@marvell.com>;
>> dev@dpdk.org
>> Subject: RE: [EXT] Re: [PATCH] RFC: ethdev: add reassembly offload
>>
>> Hi Ferruh, Rosen, Andrew,
>>
>> Please see inline.
>>
>> Thanks,
>> Anoob
>>
>>> Subject: [EXT] Re: [PATCH] RFC: ethdev: add reassembly offload
>>>
>>> External Email
>>>
>>> ----------------------------------------------------------------------
>>> On 8/23/2021 11:02 AM, Akhil Goyal wrote:
>>>> Reassembly is a costly operation if it is done in software, however,
>>>> if it is offloaded to HW, it can considerably save application cycles.
>>>> The operation becomes even more costlier if IP fragmants are
>>>> encrypted.
>>>>
>>>> To resolve above two issues, a new offload
>>> DEV_RX_OFFLOAD_REASSEMBLY
>>>> is introduced in ethdev for devices which can attempt reassembly of
>>>> packets in hardware.
>>>> rte_eth_dev_info is added with the reassembly capabilities which a
>>>> device can support.
>>>> Now, if IP fragments are encrypted, reassembly can also be attempted
>>>> while doing inline IPsec processing.
>>>> This is controlled by a flag in rte_security_ipsec_sa_options to
>>>> enable reassembly of encrypted IP fragments in the inline path.
>>>>
>>>> The resulting reassembled packet would be a typical segmented mbuf
>>>> in case of success.
>>>>
>>>> And if reassembly of fragments is failed or is incomplete (if
>>>> fragments do not come before the reass_timeout), the mbuf is updated
>>>> with an ol_flag PKT_RX_REASSEMBLY_INCOMPLETE and mbuf is returned
>>> as
>>>> is. Now application may decide the fate of the packet to wait more
>>>> for fragments to come or drop.
>>>>
>>>> Signed-off-by: Akhil Goyal <gakhil@marvell.com>
>>>> ---
>>>>  lib/ethdev/rte_ethdev.c     |  1 +
>>>>  lib/ethdev/rte_ethdev.h     | 18 +++++++++++++++++-
>>>>  lib/mbuf/rte_mbuf_core.h    |  3 ++-
>>>>  lib/security/rte_security.h | 10 ++++++++++
>>>>  4 files changed, 30 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c index
>>>> 9d95cd11e1..1ab3a093cf 100644
>>>> --- a/lib/ethdev/rte_ethdev.c
>>>> +++ b/lib/ethdev/rte_ethdev.c
>>>> @@ -119,6 +119,7 @@ static const struct {
>>>>  	RTE_RX_OFFLOAD_BIT2STR(VLAN_FILTER),
>>>>  	RTE_RX_OFFLOAD_BIT2STR(VLAN_EXTEND),
>>>>  	RTE_RX_OFFLOAD_BIT2STR(JUMBO_FRAME),
>>>> +	RTE_RX_OFFLOAD_BIT2STR(REASSEMBLY),
>>>>  	RTE_RX_OFFLOAD_BIT2STR(SCATTER),
>>>>  	RTE_RX_OFFLOAD_BIT2STR(TIMESTAMP),
>>>>  	RTE_RX_OFFLOAD_BIT2STR(SECURITY),
>>>> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h index
>>>> d2b27c351f..e89a4dc1eb 100644
>>>> --- a/lib/ethdev/rte_ethdev.h
>>>> +++ b/lib/ethdev/rte_ethdev.h
>>>> @@ -1360,6 +1360,7 @@ struct rte_eth_conf {
>>>>  #define DEV_RX_OFFLOAD_VLAN_FILTER	0x00000200
>>>>  #define DEV_RX_OFFLOAD_VLAN_EXTEND	0x00000400
>>>>  #define DEV_RX_OFFLOAD_JUMBO_FRAME	0x00000800
>>>> +#define DEV_RX_OFFLOAD_REASSEMBLY	0x00001000
>>>
>>> previous '0x00001000' was 'DEV_RX_OFFLOAD_CRC_STRIP', it has been
>> long
>>> that offload has been removed, but not sure if it cause any problem to
>>> re- use it.
>>>
>>>>  #define DEV_RX_OFFLOAD_SCATTER		0x00002000
>>>>  /**
>>>>   * Timestamp is set by the driver in
>>> RTE_MBUF_DYNFIELD_TIMESTAMP_NAME
>>>> @@ -1477,6 +1478,20 @@ struct rte_eth_dev_portconf {
>>>>   */
>>>>  #define RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID
>>> 	(UINT16_MAX)
>>>>
>>>> +/**
>>>> + * Reassembly capabilities that a device can support.
>>>> + * The device which can support reassembly offload should set
>>>> + * DEV_RX_OFFLOAD_REASSEMBLY
>>>> + */
>>>> +struct rte_eth_reass_capa {
>>>> +	/** Maximum time in ns that a fragment can wait for further
>>> fragments */
>>>> +	uint64_t reass_timeout;
>>>> +	/** Maximum number of fragments that device can reassemble */
>>>> +	uint16_t max_frags;
>>>> +	/** Reserved for future capabilities */
>>>> +	uint16_t reserved[3];
>>>> +};
>>>> +
>>>
>>> I wonder if there is any other hardware around supports reassembly
>>> offload, it would be good to get more feedback on the capabilities list.
>>>
>>>>  /**
>>>>   * Ethernet device associated switch information
>>>>   */
>>>> @@ -1582,8 +1597,9 @@ struct rte_eth_dev_info {
>>>>  	 * embedded managed interconnect/switch.
>>>>  	 */
>>>>  	struct rte_eth_switch_info switch_info;
>>>> +	/* Reassembly capabilities of a device for reassembly offload */
>>>> +	struct rte_eth_reass_capa reass_capa;
>>>>
>>>> -	uint64_t reserved_64s[2]; /**< Reserved for future fields */
>>>
>>> Reserved fields were added to be able to update the struct without
>>> breaking the ABI, so that a critical change doesn't have to wait until
>>> next ABI break release.
>>> Since this is ABI break release, we can keep the reserved field and
>>> add the new struct. Or this can be an opportunity to get rid of the reserved
>> field.
>>>
>>> Personally I have no objection to get rid of the reserved field, but
>>> better to agree on this explicitly.
>>>
>>>>  	void *reserved_ptrs[2];   /**< Reserved for future fields */
>>>>  };
>>>>
>>>> diff --git a/lib/mbuf/rte_mbuf_core.h b/lib/mbuf/rte_mbuf_core.h
>>>> index
>>>> bb38d7f581..cea25c87f7 100644
>>>> --- a/lib/mbuf/rte_mbuf_core.h
>>>> +++ b/lib/mbuf/rte_mbuf_core.h
>>>> @@ -200,10 +200,11 @@ extern "C" {
>>>>  #define PKT_RX_OUTER_L4_CKSUM_BAD	(1ULL << 21)
>>>>  #define PKT_RX_OUTER_L4_CKSUM_GOOD	(1ULL << 22)
>>>>  #define PKT_RX_OUTER_L4_CKSUM_INVALID	((1ULL << 21) | (1ULL
>>> << 22))
>>>> +#define PKT_RX_REASSEMBLY_INCOMPLETE	(1ULL << 23)
>>>>
>>>
>>> Similar comment with Andrew's, what is the expectation from
>>> application if this flag exists? Can we drop it to simplify the logic in the
>> application?
>>
>> [Anoob] There can be few cases where hardware/NIC attempts inline
>> reassembly but it fails to complete it
>>
>> 1. Number of fragments is larger than what is supported by the hardware 2.
>> Hardware reassembly resources are exhausted (due to limited reassembly
>> contexts etc) 3. Reassembly errors such as overlapping fragments 4. Wait
>> time exhausted (or reassembly timeout)
>>
>> In such cases, application would be required to retrieve the original
>> fragments so that it can attempt reassembly in software. The incomplete flag
>> is useful for 2 purposes basically, 1. Application would need to retrieve the
>> time the fragment has already spend in hardware reassembly so that
>> software reassembly attempt can compensate for it. Otherwise, reassembly
>> timeout across hardware + software will not be accurate 

Could you clarify how application will find out the time spent
in HW.

>> 2. Retrieve original
>> fragments. With this proposal, an incomplete reassembly would result in a
>> chained mbuf but the segments need not be consecutive. To explain bit more,
>>
>> Suppose we have a packet that is fragmented into 3 fragments, and fragment
>> 3 & fragment 1 arrives in that order. Fragment 2 didn't arrive and hardware
>> ultimately pushes it. In that case, application would be receiving a
>> chained/segmented mbuf with fragment 1 & fragment 3 chained.
>>
>> Now, this chained mbuf can't be treated like a regular chained mbuf. Each
>> fragment would have its IP hdr and there are fragments missing in between.
>> The only thing application is expected to do is, retrieve fragments, push it to
>> s/w reassembly.

It sounds like it conflicts with SCATTER and BUFFER_SPLIT
offloads which allow to return chained mbuf's. Don't know
if it is good or bad, but anyway it must be documented.

> 
> What you mentioned is error identification. But actually a negotiation about max frame size is needed before datagrams tx/rx.

It sounds like it is OK for informational purposes, but
right now I don't understand how it could be used by the
application. Application still has to support reassembly
in SW regardless of the information.

>>>
>>>>  /* add new RX flags here, don't forget to update PKT_FIRST_FREE */
>>>>
>>>> -#define PKT_FIRST_FREE (1ULL << 23)
>>>> +#define PKT_FIRST_FREE (1ULL << 24)
>>>>  #define PKT_LAST_FREE (1ULL << 40)
>>>>
>>>>  /* add new TX flags here, don't forget to update PKT_LAST_FREE  */
>>>> diff --git a/lib/security/rte_security.h
>>>> b/lib/security/rte_security.h index 88d31de0a6..364eeb5cd4 100644
>>>> --- a/lib/security/rte_security.h
>>>> +++ b/lib/security/rte_security.h
>>>> @@ -181,6 +181,16 @@ struct rte_security_ipsec_sa_options {
>>>>  	 * * 0: Disable per session security statistics collection for this SA.
>>>>  	 */
>>>>  	uint32_t stats : 1;
>>>> +
>>>> +	/** Enable reassembly on incoming packets.
>>>> +	 *
>>>> +	 * * 1: Enable driver to try reassembly of encrypted IP packets for
>>>> +	 *      this SA, if supported by the driver. This feature will work
>>>> +	 *      only if rx_offload DEV_RX_OFFLOAD_REASSEMBLY is set in
>>>> +	 *      inline ethernet device.
>>>> +	 * * 0: Disable reassembly of packets (default).
>>>> +	 */
>>>> +	uint32_t reass_en : 1;
>>>>  };
>>>>
>>>>  /** IPSec security association direction */
>>>>
>
  
Anoob Joseph Sept. 14, 2021, 5:14 a.m. UTC | #9
Hi Andrew, Rosen,

Please see inline.

Thanks,
Anoob

> -----Original Message-----
> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Sent: Monday, September 13, 2021 12:52 PM
> To: Xu, Rosen <rosen.xu@intel.com>; Anoob Joseph
> <anoobj@marvell.com>; Yigit, Ferruh <ferruh.yigit@intel.com>; Andrew
> Rybchenko <arybchenko@solarflare.com>
> Cc: Nicolau, Radu <radu.nicolau@intel.com>; Doherty, Declan
> <declan.doherty@intel.com>; hemant.agrawal@nxp.com;
> matan@nvidia.com; Ananyev, Konstantin <konstantin.ananyev@intel.com>;
> thomas@monjalon.net; Ankur Dwivedi <adwivedi@marvell.com>; Akhil
> Goyal <gakhil@marvell.com>; dev@dpdk.org
> Subject: Re: [EXT] Re: [PATCH] RFC: ethdev: add reassembly offload
> 
> On 9/13/21 9:56 AM, Xu, Rosen wrote:
> > Hi,
> >
> >> -----Original Message-----
> >> From: Anoob Joseph <anoobj@marvell.com>
> >> Sent: Wednesday, September 08, 2021 18:30
> >> To: Yigit, Ferruh <ferruh.yigit@intel.com>; Xu, Rosen
> >> <rosen.xu@intel.com>; Andrew Rybchenko
> <arybchenko@solarflare.com>
> >> Cc: Nicolau, Radu <radu.nicolau@intel.com>; Doherty, Declan
> >> <declan.doherty@intel.com>; hemant.agrawal@nxp.com;
> matan@nvidia.com;
> >> Ananyev, Konstantin <konstantin.ananyev@intel.com>;
> >> thomas@monjalon.net; Ankur Dwivedi <adwivedi@marvell.com>;
> >> andrew.rybchenko@oktetlabs.ru; Akhil Goyal <gakhil@marvell.com>;
> >> dev@dpdk.org
> >> Subject: RE: [EXT] Re: [PATCH] RFC: ethdev: add reassembly offload
> >>
> >> Hi Ferruh, Rosen, Andrew,
> >>
> >> Please see inline.
> >>
> >> Thanks,
> >> Anoob
> >>
> >>> Subject: [EXT] Re: [PATCH] RFC: ethdev: add reassembly offload
> >>>
> >>> External Email
> >>>
> >>> --------------------------------------------------------------------
> >>> -- On 8/23/2021 11:02 AM, Akhil Goyal wrote:
> >>>> Reassembly is a costly operation if it is done in software,
> >>>> however, if it is offloaded to HW, it can considerably save application
> cycles.
> >>>> The operation becomes even more costlier if IP fragmants are
> >>>> encrypted.
> >>>>
> >>>> To resolve above two issues, a new offload
> >>> DEV_RX_OFFLOAD_REASSEMBLY
> >>>> is introduced in ethdev for devices which can attempt reassembly of
> >>>> packets in hardware.
> >>>> rte_eth_dev_info is added with the reassembly capabilities which a
> >>>> device can support.
> >>>> Now, if IP fragments are encrypted, reassembly can also be
> >>>> attempted while doing inline IPsec processing.
> >>>> This is controlled by a flag in rte_security_ipsec_sa_options to
> >>>> enable reassembly of encrypted IP fragments in the inline path.
> >>>>
> >>>> The resulting reassembled packet would be a typical segmented mbuf
> >>>> in case of success.
> >>>>
> >>>> And if reassembly of fragments is failed or is incomplete (if
> >>>> fragments do not come before the reass_timeout), the mbuf is
> >>>> updated with an ol_flag PKT_RX_REASSEMBLY_INCOMPLETE and mbuf
> is
> >>>> returned
> >>> as
> >>>> is. Now application may decide the fate of the packet to wait more
> >>>> for fragments to come or drop.
> >>>>
> >>>> Signed-off-by: Akhil Goyal <gakhil@marvell.com>
> >>>> ---
> >>>>  lib/ethdev/rte_ethdev.c     |  1 +
> >>>>  lib/ethdev/rte_ethdev.h     | 18 +++++++++++++++++-
> >>>>  lib/mbuf/rte_mbuf_core.h    |  3 ++-
> >>>>  lib/security/rte_security.h | 10 ++++++++++
> >>>>  4 files changed, 30 insertions(+), 2 deletions(-)
> >>>>
> >>>> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
> >>>> index 9d95cd11e1..1ab3a093cf 100644
> >>>> --- a/lib/ethdev/rte_ethdev.c
> >>>> +++ b/lib/ethdev/rte_ethdev.c
> >>>> @@ -119,6 +119,7 @@ static const struct {
> >>>>  	RTE_RX_OFFLOAD_BIT2STR(VLAN_FILTER),
> >>>>  	RTE_RX_OFFLOAD_BIT2STR(VLAN_EXTEND),
> >>>>  	RTE_RX_OFFLOAD_BIT2STR(JUMBO_FRAME),
> >>>> +	RTE_RX_OFFLOAD_BIT2STR(REASSEMBLY),
> >>>>  	RTE_RX_OFFLOAD_BIT2STR(SCATTER),
> >>>>  	RTE_RX_OFFLOAD_BIT2STR(TIMESTAMP),
> >>>>  	RTE_RX_OFFLOAD_BIT2STR(SECURITY), diff --git
> >>>> a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h index
> >>>> d2b27c351f..e89a4dc1eb 100644
> >>>> --- a/lib/ethdev/rte_ethdev.h
> >>>> +++ b/lib/ethdev/rte_ethdev.h
> >>>> @@ -1360,6 +1360,7 @@ struct rte_eth_conf {
> >>>>  #define DEV_RX_OFFLOAD_VLAN_FILTER	0x00000200
> >>>>  #define DEV_RX_OFFLOAD_VLAN_EXTEND	0x00000400
> >>>>  #define DEV_RX_OFFLOAD_JUMBO_FRAME	0x00000800
> >>>> +#define DEV_RX_OFFLOAD_REASSEMBLY	0x00001000
> >>>
> >>> previous '0x00001000' was 'DEV_RX_OFFLOAD_CRC_STRIP', it has been
> >> long
> >>> that offload has been removed, but not sure if it cause any problem
> >>> to
> >>> re- use it.
> >>>
> >>>>  #define DEV_RX_OFFLOAD_SCATTER		0x00002000
> >>>>  /**
> >>>>   * Timestamp is set by the driver in
> >>> RTE_MBUF_DYNFIELD_TIMESTAMP_NAME
> >>>> @@ -1477,6 +1478,20 @@ struct rte_eth_dev_portconf {
> >>>>   */
> >>>>  #define RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID
> >>> 	(UINT16_MAX)
> >>>>
> >>>> +/**
> >>>> + * Reassembly capabilities that a device can support.
> >>>> + * The device which can support reassembly offload should set
> >>>> + * DEV_RX_OFFLOAD_REASSEMBLY
> >>>> + */
> >>>> +struct rte_eth_reass_capa {
> >>>> +	/** Maximum time in ns that a fragment can wait for further
> >>> fragments */
> >>>> +	uint64_t reass_timeout;
> >>>> +	/** Maximum number of fragments that device can reassemble */
> >>>> +	uint16_t max_frags;
> >>>> +	/** Reserved for future capabilities */
> >>>> +	uint16_t reserved[3];
> >>>> +};
> >>>> +
> >>>
> >>> I wonder if there is any other hardware around supports reassembly
> >>> offload, it would be good to get more feedback on the capabilities list.
> >>>
> >>>>  /**
> >>>>   * Ethernet device associated switch information
> >>>>   */
> >>>> @@ -1582,8 +1597,9 @@ struct rte_eth_dev_info {
> >>>>  	 * embedded managed interconnect/switch.
> >>>>  	 */
> >>>>  	struct rte_eth_switch_info switch_info;
> >>>> +	/* Reassembly capabilities of a device for reassembly offload */
> >>>> +	struct rte_eth_reass_capa reass_capa;
> >>>>
> >>>> -	uint64_t reserved_64s[2]; /**< Reserved for future fields */
> >>>
> >>> Reserved fields were added to be able to update the struct without
> >>> breaking the ABI, so that a critical change doesn't have to wait
> >>> until next ABI break release.
> >>> Since this is ABI break release, we can keep the reserved field and
> >>> add the new struct. Or this can be an opportunity to get rid of the
> >>> reserved
> >> field.
> >>>
> >>> Personally I have no objection to get rid of the reserved field, but
> >>> better to agree on this explicitly.
> >>>
> >>>>  	void *reserved_ptrs[2];   /**< Reserved for future fields */
> >>>>  };
> >>>>
> >>>> diff --git a/lib/mbuf/rte_mbuf_core.h b/lib/mbuf/rte_mbuf_core.h
> >>>> index
> >>>> bb38d7f581..cea25c87f7 100644
> >>>> --- a/lib/mbuf/rte_mbuf_core.h
> >>>> +++ b/lib/mbuf/rte_mbuf_core.h
> >>>> @@ -200,10 +200,11 @@ extern "C" {
> >>>>  #define PKT_RX_OUTER_L4_CKSUM_BAD	(1ULL << 21)
> >>>>  #define PKT_RX_OUTER_L4_CKSUM_GOOD	(1ULL << 22)
> >>>>  #define PKT_RX_OUTER_L4_CKSUM_INVALID	((1ULL << 21) | (1ULL
> >>> << 22))
> >>>> +#define PKT_RX_REASSEMBLY_INCOMPLETE	(1ULL << 23)
> >>>>
> >>>
> >>> Similar comment with Andrew's, what is the expectation from
> >>> application if this flag exists? Can we drop it to simplify the
> >>> logic in the
> >> application?
> >>
> >> [Anoob] There can be few cases where hardware/NIC attempts inline
> >> reassembly but it fails to complete it
> >>
> >> 1. Number of fragments is larger than what is supported by the hardware
> 2.
> >> Hardware reassembly resources are exhausted (due to limited
> >> reassembly contexts etc) 3. Reassembly errors such as overlapping
> >> fragments 4. Wait time exhausted (or reassembly timeout)
> >>
> >> In such cases, application would be required to retrieve the original
> >> fragments so that it can attempt reassembly in software. The
> >> incomplete flag is useful for 2 purposes basically, 1. Application
> >> would need to retrieve the time the fragment has already spend in
> >> hardware reassembly so that software reassembly attempt can
> >> compensate for it. Otherwise, reassembly timeout across hardware +
> >> software will not be accurate
> 
> Could you clarify how application will find out the time spent in HW.

[Anoob] We could use rte_mbuf dynamic fields for the same. Looks like RFC hasn't touched on this aspect yet. 
 
> 
> >> 2. Retrieve original
> >> fragments. With this proposal, an incomplete reassembly would result
> >> in a chained mbuf but the segments need not be consecutive. To
> >> explain bit more,
> >>
> >> Suppose we have a packet that is fragmented into 3 fragments, and
> >> fragment
> >> 3 & fragment 1 arrives in that order. Fragment 2 didn't arrive and
> >> hardware ultimately pushes it. In that case, application would be
> >> receiving a chained/segmented mbuf with fragment 1 & fragment 3
> chained.
> >>
> >> Now, this chained mbuf can't be treated like a regular chained mbuf.
> >> Each fragment would have its IP hdr and there are fragments missing in
> between.
> >> The only thing application is expected to do is, retrieve fragments,
> >> push it to s/w reassembly.
> 
> It sounds like it conflicts with SCATTER and BUFFER_SPLIT offloads which
> allow to return chained mbuf's. Don't know if it is good or bad, but anyway it
> must be documented.

[Anoob] Agreed.
 
> 
> >
> > What you mentioned is error identification. But actually a negotiation about
> max frame size is needed before datagrams tx/rx.

[Anoob] The actually reassembly settings would be negotiated by the s/w. The offload can be thought of like how checksum is being done now. S/w negotiates with peer and then enables the hardware to accelerate. If hardware is able to reassemble, then well and good. If not, we would have software compensate for it.
 
> 
> It sounds like it is OK for informational purposes, but right now I don't
> understand how it could be used by the application. Application still has to
> support reassembly in SW regardless of the information.

[Anoob] The additional information from "incomplete reassembly" attempt would be useful for software to properly compensate for the hardware reassembly attempt (basically, the reassembly timeout is honored across s/w + h/w  reassembly attempt). 

Benefit of such an offload is in accelerating reassembly in hardware for performance use cases. If application expects heavy fragmentation, then every packet would have a cost of ~1000 cycles (typically) to get it reassembled. By offloading this (atleast some portion of it) to hardware, application would be able to save significant cycles.

Since IP reassembly presents varying challenges depending on hardware implementation, we cannot expect complete reassembly offload in hardware. For some vendors, maximum number of fragments supported could be limited. Some vendors could have limited reassembly timeout (or wait_time). Some vendors could have limitations depending on datagram sizes. So s/w reassembly is not going away even with the proposed hardware assisted inline reassembly.

> 
> >>>
> >>>>  /* add new RX flags here, don't forget to update PKT_FIRST_FREE */
> >>>>
> >>>> -#define PKT_FIRST_FREE (1ULL << 23)
> >>>> +#define PKT_FIRST_FREE (1ULL << 24)
> >>>>  #define PKT_LAST_FREE (1ULL << 40)
> >>>>
> >>>>  /* add new TX flags here, don't forget to update PKT_LAST_FREE  */
> >>>> diff --git a/lib/security/rte_security.h
> >>>> b/lib/security/rte_security.h index 88d31de0a6..364eeb5cd4 100644
> >>>> --- a/lib/security/rte_security.h
> >>>> +++ b/lib/security/rte_security.h
> >>>> @@ -181,6 +181,16 @@ struct rte_security_ipsec_sa_options {
> >>>>  	 * * 0: Disable per session security statistics collection for this SA.
> >>>>  	 */
> >>>>  	uint32_t stats : 1;
> >>>> +
> >>>> +	/** Enable reassembly on incoming packets.
> >>>> +	 *
> >>>> +	 * * 1: Enable driver to try reassembly of encrypted IP packets for
> >>>> +	 *      this SA, if supported by the driver. This feature will work
> >>>> +	 *      only if rx_offload DEV_RX_OFFLOAD_REASSEMBLY is set in
> >>>> +	 *      inline ethernet device.
> >>>> +	 * * 0: Disable reassembly of packets (default).
> >>>> +	 */
> >>>> +	uint32_t reass_en : 1;
> >>>>  };
> >>>>
> >>>>  /** IPSec security association direction */
> >>>>
> >
  
Thomas Monjalon Sept. 21, 2021, 7:59 p.m. UTC | #10
29/08/2021 15:14, Akhil Goyal:
> > On 8/23/21 1:02 PM, Akhil Goyal wrote:
> > > +#define DEV_RX_OFFLOAD_REASSEMBLY	0x00001000
> > 
> > I think it should be:
> > RTE_ETH_RX_OFFLOAD_IPV4_REASSEMBLY
> > 
> > i.e. have correct prefix similar to
> > RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT and mention IPv4.
> > 
> > If we'd like to cover IPv6 as well, it could be
> > RTE_ETH_RX_OFFLOAD_IP_REASSEMBLY and have IPv4/6
> > support bits in the offload capabilities below.
> 
> Intention is to update spec for both.
> Will update the capabilities accordingly to have both IPv4 and IPv6.
> 
> > 
> > >  #define DEV_RX_OFFLOAD_SCATTER		0x00002000
> > >  /**
> > >   * Timestamp is set by the driver in
> > RTE_MBUF_DYNFIELD_TIMESTAMP_NAME
> > > @@ -1477,6 +1478,20 @@ struct rte_eth_dev_portconf {
> > >   */
> > >  #define RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID
> > 	(UINT16_MAX)
> > >
> > > +/**
> > > + * Reassembly capabilities that a device can support.
> > > + * The device which can support reassembly offload should set
> > > + * DEV_RX_OFFLOAD_REASSEMBLY
> > > + */
> > > +struct rte_eth_reass_capa {

Please add "IP" in flags, struct and comments.
  

Patch

diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 9d95cd11e1..1ab3a093cf 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -119,6 +119,7 @@  static const struct {
 	RTE_RX_OFFLOAD_BIT2STR(VLAN_FILTER),
 	RTE_RX_OFFLOAD_BIT2STR(VLAN_EXTEND),
 	RTE_RX_OFFLOAD_BIT2STR(JUMBO_FRAME),
+	RTE_RX_OFFLOAD_BIT2STR(REASSEMBLY),
 	RTE_RX_OFFLOAD_BIT2STR(SCATTER),
 	RTE_RX_OFFLOAD_BIT2STR(TIMESTAMP),
 	RTE_RX_OFFLOAD_BIT2STR(SECURITY),
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index d2b27c351f..e89a4dc1eb 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -1360,6 +1360,7 @@  struct rte_eth_conf {
 #define DEV_RX_OFFLOAD_VLAN_FILTER	0x00000200
 #define DEV_RX_OFFLOAD_VLAN_EXTEND	0x00000400
 #define DEV_RX_OFFLOAD_JUMBO_FRAME	0x00000800
+#define DEV_RX_OFFLOAD_REASSEMBLY	0x00001000
 #define DEV_RX_OFFLOAD_SCATTER		0x00002000
 /**
  * Timestamp is set by the driver in RTE_MBUF_DYNFIELD_TIMESTAMP_NAME
@@ -1477,6 +1478,20 @@  struct rte_eth_dev_portconf {
  */
 #define RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID	(UINT16_MAX)
 
+/**
+ * Reassembly capabilities that a device can support.
+ * The device which can support reassembly offload should set
+ * DEV_RX_OFFLOAD_REASSEMBLY
+ */
+struct rte_eth_reass_capa {
+	/** Maximum time in ns that a fragment can wait for further fragments */
+	uint64_t reass_timeout;
+	/** Maximum number of fragments that device can reassemble */
+	uint16_t max_frags;
+	/** Reserved for future capabilities */
+	uint16_t reserved[3];
+};
+
 /**
  * Ethernet device associated switch information
  */
@@ -1582,8 +1597,9 @@  struct rte_eth_dev_info {
 	 * embedded managed interconnect/switch.
 	 */
 	struct rte_eth_switch_info switch_info;
+	/* Reassembly capabilities of a device for reassembly offload */
+	struct rte_eth_reass_capa reass_capa;
 
-	uint64_t reserved_64s[2]; /**< Reserved for future fields */
 	void *reserved_ptrs[2];   /**< Reserved for future fields */
 };
 
diff --git a/lib/mbuf/rte_mbuf_core.h b/lib/mbuf/rte_mbuf_core.h
index bb38d7f581..cea25c87f7 100644
--- a/lib/mbuf/rte_mbuf_core.h
+++ b/lib/mbuf/rte_mbuf_core.h
@@ -200,10 +200,11 @@  extern "C" {
 #define PKT_RX_OUTER_L4_CKSUM_BAD	(1ULL << 21)
 #define PKT_RX_OUTER_L4_CKSUM_GOOD	(1ULL << 22)
 #define PKT_RX_OUTER_L4_CKSUM_INVALID	((1ULL << 21) | (1ULL << 22))
+#define PKT_RX_REASSEMBLY_INCOMPLETE	(1ULL << 23)
 
 /* add new RX flags here, don't forget to update PKT_FIRST_FREE */
 
-#define PKT_FIRST_FREE (1ULL << 23)
+#define PKT_FIRST_FREE (1ULL << 24)
 #define PKT_LAST_FREE (1ULL << 40)
 
 /* add new TX flags here, don't forget to update PKT_LAST_FREE  */
diff --git a/lib/security/rte_security.h b/lib/security/rte_security.h
index 88d31de0a6..364eeb5cd4 100644
--- a/lib/security/rte_security.h
+++ b/lib/security/rte_security.h
@@ -181,6 +181,16 @@  struct rte_security_ipsec_sa_options {
 	 * * 0: Disable per session security statistics collection for this SA.
 	 */
 	uint32_t stats : 1;
+
+	/** Enable reassembly on incoming packets.
+	 *
+	 * * 1: Enable driver to try reassembly of encrypted IP packets for
+	 *      this SA, if supported by the driver. This feature will work
+	 *      only if rx_offload DEV_RX_OFFLOAD_REASSEMBLY is set in
+	 *      inline ethernet device.
+	 * * 0: Disable reassembly of packets (default).
+	 */
+	uint32_t reass_en : 1;
 };
 
 /** IPSec security association direction */