[v4,1/3] ethdev: introduce protocol type based header split

Message ID 20220402104109.472078-2-wenxuanx.wu@intel.com (mailing list archive)
State Superseded, archived
Delegated to: Andrew Rybchenko
Series ethdev: introduce protocol type based header split

Checks

Context Check Description
ci/checkpatch success coding style OK

Commit Message

Wu, WenxuanX April 2, 2022, 10:41 a.m. UTC
  From: Xuan Ding <xuan.ding@intel.com>

Header split consists of splitting a received packet into two separate
regions based on the packet content. The split happens after the
packet header and before the packet payload. Splitting is usually between
the packet header that can be posted to a dedicated buffer and the packet
payload that can be posted to a different buffer.

Currently, Rx buffer split supports length and offset based packet split.
Although header split is a subset of buffer split, configuring buffer
split based on length is not suitable for NICs that split based on
header protocol types, because tunneling makes the conversion from length
to protocol type impossible.

This patch extends the current buffer split to support protocol type and
offset based header split. A new proto field, carved out of the reserved
field of the rte_eth_rxseg_split structure, specifies the header protocol
type. With the Rx offload flag RTE_ETH_RX_OFFLOAD_HEADER_SPLIT enabled and
the protocol type configured, the PMD will split ingress packets into two
separate regions. Currently, both inner and outer L2/L3/L4 level header
split can be supported.

For example, suppose the Rx queue is configured with the
following segments:
    seg0 - pool0, off0=2B
    seg1 - pool1, off1=128B

With the header split type set to RTE_ETH_RX_HEADER_SPLIT_UDP,
a packet consisting of MAC_IP_UDP_PAYLOAD will be split as follows:
    seg0 - udp header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
    seg1 - payload @ 128 in mbuf from pool1

The memory attributes of the split parts may also differ; for example,
pool0 and pool1 may belong to DPDK memory and external memory,
respectively.

Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
---
 lib/ethdev/rte_ethdev.c | 34 ++++++++++++++++++++++-------
 lib/ethdev/rte_ethdev.h | 48 +++++++++++++++++++++++++++++++++++++++--
 2 files changed, 72 insertions(+), 10 deletions(-)
  

Comments

Andrew Rybchenko April 7, 2022, 10:47 a.m. UTC | #1
On 4/2/22 13:41, wenxuanx.wu@intel.com wrote:
> From: Xuan Ding <xuan.ding@intel.com>
> 
> Header split consists of splitting a received packet into two separate
> regions based on the packet content. The split happens after the
> packet header and before the packet payload. Splitting is usually between
> the packet header that can be posted to a dedicated buffer and the packet
> payload that can be posted to a different buffer.
> 
> Currently, Rx buffer split supports length and offset based packet split.
> Although header split is a subset of buffer split, configuring buffer
> split based on length is not suitable for NICs that split based on
> header protocol types, because tunneling makes the conversion from length
> to protocol type impossible.
> 
> This patch extends the current buffer split to support protocol type and
> offset based header split. A new proto field, carved out of the reserved
> field of the rte_eth_rxseg_split structure, specifies the header protocol
> type. With the Rx offload flag RTE_ETH_RX_OFFLOAD_HEADER_SPLIT enabled and
> the protocol type configured, the PMD will split ingress packets into two
> separate regions. Currently, both inner and outer L2/L3/L4 level header
> split can be supported.

The RTE_ETH_RX_OFFLOAD_HEADER_SPLIT offload was introduced
some time ago to replace the header_split bit-field in struct
rte_eth_rxmode. It enables the header split offload, with
the header size controlled by split_hdr_size in the same
structure.

Right now I don't see a single PMD that actually supports
RTE_ETH_RX_OFFLOAD_HEADER_SPLIT with the above definition.
Many examples and test apps explicitly initialize the field
to 0. Most drivers simply ignore split_hdr_size since the
offload is not advertised, but some double-check that its
value is 0.

I think this means that the field should be removed in
the next LTS, and, I'd say, together with the
RTE_ETH_RX_OFFLOAD_HEADER_SPLIT offload bit.

We should not redefine the meaning of the offload.


> 
> For example, suppose the Rx queue is configured with the
> following segments:
>      seg0 - pool0, off0=2B
>      seg1 - pool1, off1=128B

The corresponding feature is named Rx buffer split.
Does this mean that protocol type based header split
requires the Rx buffer split feature to be supported?

> 
> With the header split type set to RTE_ETH_RX_HEADER_SPLIT_UDP,
> a packet consisting of MAC_IP_UDP_PAYLOAD will be split as follows:
>      seg0 - udp header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
>      seg1 - payload @ 128 in mbuf from pool1

Is it always the outermost UDP? Does it require both UDP over IPv4
and UDP over IPv6 to be supported? What happens if only one
is supported? How can an application find out which protocol stacks
are supported?

> 
> The memory attributes of the split parts may also differ; for example,
> pool0 and pool1 may belong to DPDK memory and external memory,
> respectively.
> 
> Signed-off-by: Xuan Ding <xuan.ding@intel.com>
> Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
> Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
> Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
> ---
>   lib/ethdev/rte_ethdev.c | 34 ++++++++++++++++++++++-------
>   lib/ethdev/rte_ethdev.h | 48 +++++++++++++++++++++++++++++++++++++++--
>   2 files changed, 72 insertions(+), 10 deletions(-)
> 
> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
> index 29a3d80466..29adcdc2f0 100644
> --- a/lib/ethdev/rte_ethdev.c
> +++ b/lib/ethdev/rte_ethdev.c
> @@ -1661,6 +1661,7 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
>   		struct rte_mempool *mpl = rx_seg[seg_idx].mp;
>   		uint32_t length = rx_seg[seg_idx].length;
>   		uint32_t offset = rx_seg[seg_idx].offset;
> +		uint16_t proto = rx_seg[seg_idx].proto;
>   
>   		if (mpl == NULL) {
>   			RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
> @@ -1694,13 +1695,29 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
>   		}
>   		offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
>   		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
> -		length = length != 0 ? length : *mbp_buf_size;
> -		if (*mbp_buf_size < length + offset) {
> -			RTE_ETHDEV_LOG(ERR,
> -				       "%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
> -				       mpl->name, *mbp_buf_size,
> -				       length + offset, length, offset);
> -			return -EINVAL;
> +		if (proto == RTE_ETH_RX_HEADER_SPLIT_NONE) {
> +			/* Check buffer split. */
> +			length = length != 0 ? length : *mbp_buf_size;
> +			if (*mbp_buf_size < length + offset) {
> +				RTE_ETHDEV_LOG(ERR,
> +					"%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
> +					mpl->name, *mbp_buf_size,
> +					length + offset, length, offset);
> +				return -EINVAL;
> +			}
> +		} else {
> +			/* Check header split. */
> +			if (length != 0) {
> +				RTE_ETHDEV_LOG(ERR, "segment length should be set to zero in header split\n");
> +				return -EINVAL;
> +			}
> +			if (*mbp_buf_size < offset) {
> +				RTE_ETHDEV_LOG(ERR,
> +					"%s mbuf_data_room_size %u < %u segment offset)\n",
> +					mpl->name, *mbp_buf_size,
> +					offset);
> +				return -EINVAL;
> +			}
>   		}
>   	}
>   	return 0;
> @@ -1778,7 +1795,8 @@ rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
>   		rx_seg = (const struct rte_eth_rxseg_split *)rx_conf->rx_seg;
>   		n_seg = rx_conf->rx_nseg;
>   
> -		if (rx_conf->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
> +		if (rx_conf->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT ||
> +			rx_conf->offloads & RTE_ETH_RX_OFFLOAD_HEADER_SPLIT) {
>   			ret = rte_eth_rx_queue_check_split(rx_seg, n_seg,
>   							   &mbp_buf_size,
>   							   &dev_info);
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> index 04cff8ee10..e8371b98ed 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -1197,12 +1197,31 @@ struct rte_eth_txmode {
>    *     - pool from the last valid element
>    *     - the buffer size from this pool
>    *     - zero offset
> + *
> + * Header split is a subset of buffer split. The split happens after the
> + * packet header and before the packet payload. For PMDs that do not
> + * support header split configuration by length, the location of the split
> + * needs to be specified by the header protocol type. While for buffer split,
> + * this field should not be configured.
> + *
> + * If RTE_ETH_RX_OFFLOAD_HEADER_SPLIT flag is set in offloads field,
> + * the PMD will split the received packets into two separate regions:
> + * - The header buffer will be allocated from the memory pool,
> + *   specified in the first array element, the second buffer, from the
> + *   pool in the second element.
> + *
> + * - The lengths do not need to be configured in header split.
> + *
> + * - The offsets from the segment description elements specify
> + *   the data offset from the buffer beginning except the first mbuf.
> + *   The first segment offset is added with RTE_PKTMBUF_HEADROOM.
>    */
>   struct rte_eth_rxseg_split {
>   	struct rte_mempool *mp; /**< Memory pool to allocate segment from. */
>   	uint16_t length; /**< Segment data length, configures split point. */
>   	uint16_t offset; /**< Data offset from beginning of mbuf data buffer. */
> -	uint32_t reserved; /**< Reserved field. */
> +	uint16_t proto; /**< header protocol type, configures header split point. */

I realize that you don't want to use the enum defined above here,
in order to save some reserved space, but the description must
refer to enum rte_eth_rx_header_split_protocol_type.

> +	uint16_t reserved; /**< Reserved field. */

As far as I can see, the structure is experimental, so extending
it should not be a problem, but Stephen raised a really good
question in the RFC v1 discussion.
Shouldn't we require that all reserved fields are initialized
to zero and ignored on processing? Frankly speaking, I always
thought so, but failed to find where it is documented.

@Thomas, @David, @Ferruh?

>   };
>   
>   /**
> @@ -1212,7 +1231,7 @@ struct rte_eth_rxseg_split {
>    * A common structure used to describe Rx packet segment properties.
>    */
>   union rte_eth_rxseg {
> -	/* The settings for buffer split offload. */
> +	/* The settings for buffer split and header split offload. */
>   	struct rte_eth_rxseg_split split;
>   	/* The other features settings should be added here. */
>   };
> @@ -1664,6 +1683,31 @@ struct rte_eth_conf {
>   			     RTE_ETH_RX_OFFLOAD_QINQ_STRIP)
>   #define DEV_RX_OFFLOAD_VLAN RTE_DEPRECATED(DEV_RX_OFFLOAD_VLAN) RTE_ETH_RX_OFFLOAD_VLAN
>   
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this enum may change without prior notice.
> + * This enum indicates the header split protocol type
> + */
> +enum rte_eth_rx_header_split_protocol_type {
> +	RTE_ETH_RX_HEADER_SPLIT_NONE = 0,
> +	RTE_ETH_RX_HEADER_SPLIT_MAC,
> +	RTE_ETH_RX_HEADER_SPLIT_IPV4,
> +	RTE_ETH_RX_HEADER_SPLIT_IPV6,
> +	RTE_ETH_RX_HEADER_SPLIT_L3,
> +	RTE_ETH_RX_HEADER_SPLIT_TCP,
> +	RTE_ETH_RX_HEADER_SPLIT_UDP,
> +	RTE_ETH_RX_HEADER_SPLIT_SCTP,
> +	RTE_ETH_RX_HEADER_SPLIT_L4,
> +	RTE_ETH_RX_HEADER_SPLIT_INNER_MAC,
> +	RTE_ETH_RX_HEADER_SPLIT_INNER_IPV4,
> +	RTE_ETH_RX_HEADER_SPLIT_INNER_IPV6,
> +	RTE_ETH_RX_HEADER_SPLIT_INNER_L3,
> +	RTE_ETH_RX_HEADER_SPLIT_INNER_TCP,
> +	RTE_ETH_RX_HEADER_SPLIT_INNER_UDP,
> +	RTE_ETH_RX_HEADER_SPLIT_INNER_SCTP,
> +	RTE_ETH_RX_HEADER_SPLIT_INNER_L4,

Enumeration members should be documented. See my question
in the patch description.

> +};
> +
>   /*
>    * If new Rx offload capabilities are defined, they also must be
>    * mentioned in rte_rx_offload_names in rte_ethdev.c file.
  
Jerin Jacob April 7, 2022, 1:26 p.m. UTC | #2
On Sat, Apr 2, 2022 at 4:33 PM <wenxuanx.wu@intel.com> wrote:
>
> From: Xuan Ding <xuan.ding@intel.com>
>
> Header split consists of splitting a received packet into two separate
> regions based on the packet content. The split happens after the
> packet header and before the packet payload. Splitting is usually between
> the packet header that can be posted to a dedicated buffer and the packet
> payload that can be posted to a different buffer.
>
> Currently, Rx buffer split supports length and offset based packet split.
> Although header split is a subset of buffer split, configuring buffer
> split based on length is not suitable for NICs that split based on
> header protocol types, because tunneling makes the conversion from length
> to protocol type impossible.
>
> This patch extends the current buffer split to support protocol type and
> offset based header split. A new proto field, carved out of the reserved
> field of the rte_eth_rxseg_split structure, specifies the header protocol
> type. With the Rx offload flag RTE_ETH_RX_OFFLOAD_HEADER_SPLIT enabled and
> the protocol type configured, the PMD will split ingress packets into two
> separate regions. Currently, both inner and outer L2/L3/L4 level header
> split can be supported.
>
> For example, suppose the Rx queue is configured with the
> following segments:
>     seg0 - pool0, off0=2B
>     seg1 - pool1, off1=128B
>
> With the header split type set to RTE_ETH_RX_HEADER_SPLIT_UDP,
> a packet consisting of MAC_IP_UDP_PAYLOAD will be split as follows:
>     seg0 - udp header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0

If we set rte_eth_rxseg_split::proto = RTE_ETH_RX_HEADER_SPLIT_UDP and
rte_eth_rxseg_split.offset = 2,
what will be the content of seg0?
Will it be:
- offset: starts at the UDP header
- size of segment: MAX(size of UDP header + 2, 128) (as seg1 starts at 128)
Right? If not, please describe.

Also, I don't think we need to duplicate
rte_eth_rx_header_split_protocol_type; instead we can
reuse the existing RTE_PTYPE_* flags.


>     seg1 - payload @ 128 in mbuf from pool1
>
> The memory attributes of the split parts may also differ; for example,
> pool0 and pool1 may belong to DPDK memory and external memory,
> respectively.
  
Ding, Xuan April 12, 2022, 4:15 p.m. UTC | #3
Hi Andrew,

> -----Original Message-----
> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Sent: Thursday, April 7, 2022 6:48 PM
> To: Wu, WenxuanX <wenxuanx.wu@intel.com>; thomas@monjalon.net; Li,
> Xiaoyun <xiaoyun.li@intel.com>; Singh, Aman Deep
> <aman.deep.singh@intel.com>; Zhang, Yuying <yuying.zhang@intel.com>;
> Zhang, Qi Z <qi.z.zhang@intel.com>
> Cc: dev@dpdk.org; stephen@networkplumber.org;
> mb@smartsharesystems.com; viacheslavo@nvidia.com; Yu, Ping
> <ping.yu@intel.com>; Ding, Xuan <xuan.ding@intel.com>; Wang, YuanX
> <yuanx.wang@intel.com>; david.marchand@redhat.com; Ferruh Yigit
> <ferruhy@xilinx.com>
> Subject: Re: [v4 1/3] ethdev: introduce protocol type based header split
> 
> On 4/2/22 13:41, wenxuanx.wu@intel.com wrote:
> > From: Xuan Ding <xuan.ding@intel.com>
> >
> > Header split consists of splitting a received packet into two separate
> > regions based on the packet content. The split happens after the
> > packet header and before the packet payload. Splitting is usually
> > between the packet header that can be posted to a dedicated buffer and
> > the packet payload that can be posted to a different buffer.
> >
> > Currently, Rx buffer split supports length and offset based packet split.
> > Although header split is a subset of buffer split, configuring buffer
> > split based on length is not suitable for NICs that split based on
> > header protocol types, because tunneling makes the conversion from
> > length to protocol type impossible.
> >
> > This patch extends the current buffer split to support protocol type
> > and offset based header split. A new proto field, carved out of the
> > reserved field of the rte_eth_rxseg_split structure, specifies the
> > header protocol type. With the Rx offload flag
> > RTE_ETH_RX_OFFLOAD_HEADER_SPLIT enabled and the protocol type
> > configured, the PMD will split ingress packets into two separate
> > regions. Currently, both inner and outer L2/L3/L4 level header split
> > can be supported.
> 
> The RTE_ETH_RX_OFFLOAD_HEADER_SPLIT offload was introduced some time
> ago to replace the header_split bit-field in struct rte_eth_rxmode. It
> enables the header split offload, with the header size controlled by
> split_hdr_size in the same structure.
> 
> Right now I don't see a single PMD that actually supports
> RTE_ETH_RX_OFFLOAD_HEADER_SPLIT with the above definition.
> Many examples and test apps explicitly initialize the field to 0. Most
> drivers simply ignore split_hdr_size since the offload is not advertised,
> but some double-check that its value is 0.
> 
> I think this means that the field should be removed in the next LTS, and,
> I'd say, together with the RTE_ETH_RX_OFFLOAD_HEADER_SPLIT offload bit.
> 
> We should not redefine the meaning of the offload.

Yes, you are right. No PMD supports RTE_ETH_RX_OFFLOAD_HEADER_SPLIT now.
Previously, I used this flag to distinguish buffer split from header split.
The former supports multi-segment split by length and offset;
the latter supports two-segment split by proto and offset.
In this sense, header split is a subset of buffer split.

Since we shouldn't redefine the meaning of this offload,
I will use the RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT flag instead.
Because of tunneling, buffer split needs a proto field,
since some PMDs do not support split based on length and offset.

> 
> >
> > For example, suppose the Rx queue is configured with the
> > following segments:
> >      seg0 - pool0, off0=2B
> >      seg1 - pool1, off1=128B
> 
> The corresponding feature is named Rx buffer split.
> Does this mean that protocol type based header split requires the Rx buffer
> split feature to be supported?

Protocol type based header split does not require Rx buffer split.
In the previous design, header split and buffer split were mutually exclusive,
because only one split offload is configured per Rx queue.

> 
> >
> > With the header split type set to RTE_ETH_RX_HEADER_SPLIT_UDP,
> > a packet consisting of MAC_IP_UDP_PAYLOAD will be split as follows:
> >      seg0 - udp header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
> >      seg1 - payload @ 128 in mbuf from pool1
> 
> Is it always the outermost UDP? Does it require both UDP over IPv4 and UDP
> over IPv6 to be supported? What happens if only one is supported? How can
> an application find out which protocol stacks are supported?

Both inner and outer UDP are considered.
The current design does not distinguish UDP over IPv4 from UDP over IPv6.
If we want to support finer granularity, such as only IPv4 or only IPv6,
the user needs to add more configuration.

If an application wants to find out which protocol stacks are supported,
one way, I think, is for the driver to expose them through dev_info.
Any thoughts are welcome :)

> 
> >
> > The memory attributes of the split parts may also differ; for example,
> > pool0 and pool1 may belong to DPDK memory and external memory,
> > respectively.
> >
> > Signed-off-by: Xuan Ding <xuan.ding@intel.com>
> > Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
> > Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
> > Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
> > ---
> >   lib/ethdev/rte_ethdev.c | 34 ++++++++++++++++++++++-------
> >   lib/ethdev/rte_ethdev.h | 48 +++++++++++++++++++++++++++++++++++++++--
> >   2 files changed, 72 insertions(+), 10 deletions(-)
> >
> > diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
> > index 29a3d80466..29adcdc2f0 100644
> > --- a/lib/ethdev/rte_ethdev.c
> > +++ b/lib/ethdev/rte_ethdev.c
> > @@ -1661,6 +1661,7 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
> >   		struct rte_mempool *mpl = rx_seg[seg_idx].mp;
> >   		uint32_t length = rx_seg[seg_idx].length;
> >   		uint32_t offset = rx_seg[seg_idx].offset;
> > +		uint16_t proto = rx_seg[seg_idx].proto;
> >
> >   		if (mpl == NULL) {
> >   			RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
> > @@ -1694,13 +1695,29 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
> >   		}
> >   		offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
> >   		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
> > -		length = length != 0 ? length : *mbp_buf_size;
> > -		if (*mbp_buf_size < length + offset) {
> > -			RTE_ETHDEV_LOG(ERR,
> > -				       "%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
> > -				       mpl->name, *mbp_buf_size,
> > -				       length + offset, length, offset);
> > -			return -EINVAL;
> > +		if (proto == RTE_ETH_RX_HEADER_SPLIT_NONE) {
> > +			/* Check buffer split. */
> > +			length = length != 0 ? length : *mbp_buf_size;
> > +			if (*mbp_buf_size < length + offset) {
> > +				RTE_ETHDEV_LOG(ERR,
> > +					"%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
> > +					mpl->name, *mbp_buf_size,
> > +					length + offset, length, offset);
> > +				return -EINVAL;
> > +			}
> > +		} else {
> > +			/* Check header split. */
> > +			if (length != 0) {
> > +				RTE_ETHDEV_LOG(ERR, "segment length should be set to zero in header split\n");
> > +				return -EINVAL;
> > +			}
> > +			if (*mbp_buf_size < offset) {
> > +				RTE_ETHDEV_LOG(ERR,
> > +					"%s mbuf_data_room_size %u < %u segment offset)\n",
> > +					mpl->name, *mbp_buf_size,
> > +					offset);
> > +				return -EINVAL;
> > +			}
> >   		}
> >   	}
> >   	return 0;
> > @@ -1778,7 +1795,8 @@ rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
> >   		rx_seg = (const struct rte_eth_rxseg_split *)rx_conf->rx_seg;
> >   		n_seg = rx_conf->rx_nseg;
> >
> > -		if (rx_conf->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
> > +		if (rx_conf->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT ||
> > +			rx_conf->offloads & RTE_ETH_RX_OFFLOAD_HEADER_SPLIT) {
> >   			ret = rte_eth_rx_queue_check_split(rx_seg, n_seg,
> >   							   &mbp_buf_size,
> >   							   &dev_info);
> > diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> > index 04cff8ee10..e8371b98ed 100644
> > --- a/lib/ethdev/rte_ethdev.h
> > +++ b/lib/ethdev/rte_ethdev.h
> > @@ -1197,12 +1197,31 @@ struct rte_eth_txmode {
> >    *     - pool from the last valid element
> >    *     - the buffer size from this pool
> >    *     - zero offset
> > + *
> > + * Header split is a subset of buffer split. The split happens after the
> > + * packet header and before the packet payload. For PMDs that do not
> > + * support header split configuration by length, the location of the split
> > + * needs to be specified by the header protocol type. While for buffer split,
> > + * this field should not be configured.
> > + *
> > + * If RTE_ETH_RX_OFFLOAD_HEADER_SPLIT flag is set in offloads field,
> > + * the PMD will split the received packets into two separate regions:
> > + * - The header buffer will be allocated from the memory pool,
> > + *   specified in the first array element, the second buffer, from the
> > + *   pool in the second element.
> > + *
> > + * - The lengths do not need to be configured in header split.
> > + *
> > + * - The offsets from the segment description elements specify
> > + *   the data offset from the buffer beginning except the first mbuf.
> > + *   The first segment offset is added with RTE_PKTMBUF_HEADROOM.
> >    */
> >   struct rte_eth_rxseg_split {
> >   	struct rte_mempool *mp; /**< Memory pool to allocate segment from. */
> >   	uint16_t length; /**< Segment data length, configures split point. */
> >   	uint16_t offset; /**< Data offset from beginning of mbuf data buffer. */
> > -	uint32_t reserved; /**< Reserved field. */
> > +	uint16_t proto; /**< header protocol type, configures header split point. */
> 
> I realize that you don't want to use the enum defined above here, in order
> to save some reserved space, but the description must refer to enum
> rte_eth_rx_header_split_protocol_type.

Thanks for your suggestion, will fix it in next version.

> 
> > +	uint16_t reserved; /**< Reserved field. */
> 
> As far as I can see, the structure is experimental, so extending it should
> not be a problem, but Stephen raised a really good question in the RFC v1
> discussion.
> Shouldn't we require that all reserved fields are initialized to zero and
> ignored on processing? Frankly speaking, I always thought so, but failed to
> find where it is documented.

Yes, it can be documented. By default it should be zero, and it can be
configured to enable protocol type based buffer split.

> 
> @Thomas, @David, @Ferruh?
> 
> >   };
> >
> >   /**
> > @@ -1212,7 +1231,7 @@ struct rte_eth_rxseg_split {
> >    * A common structure used to describe Rx packet segment properties.
> >    */
> >   union rte_eth_rxseg {
> > -	/* The settings for buffer split offload. */
> > +	/* The settings for buffer split and header split offload. */
> >   	struct rte_eth_rxseg_split split;
> >   	/* The other features settings should be added here. */
> >   };
> > @@ -1664,6 +1683,31 @@ struct rte_eth_conf {
> >   			     RTE_ETH_RX_OFFLOAD_QINQ_STRIP)
> >   #define DEV_RX_OFFLOAD_VLAN RTE_DEPRECATED(DEV_RX_OFFLOAD_VLAN) RTE_ETH_RX_OFFLOAD_VLAN
> >
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this enum may change without prior notice.
> > + * This enum indicates the header split protocol type
> > + */
> > +enum rte_eth_rx_header_split_protocol_type {
> > +	RTE_ETH_RX_HEADER_SPLIT_NONE = 0,
> > +	RTE_ETH_RX_HEADER_SPLIT_MAC,
> > +	RTE_ETH_RX_HEADER_SPLIT_IPV4,
> > +	RTE_ETH_RX_HEADER_SPLIT_IPV6,
> > +	RTE_ETH_RX_HEADER_SPLIT_L3,
> > +	RTE_ETH_RX_HEADER_SPLIT_TCP,
> > +	RTE_ETH_RX_HEADER_SPLIT_UDP,
> > +	RTE_ETH_RX_HEADER_SPLIT_SCTP,
> > +	RTE_ETH_RX_HEADER_SPLIT_L4,
> > +	RTE_ETH_RX_HEADER_SPLIT_INNER_MAC,
> > +	RTE_ETH_RX_HEADER_SPLIT_INNER_IPV4,
> > +	RTE_ETH_RX_HEADER_SPLIT_INNER_IPV6,
> > +	RTE_ETH_RX_HEADER_SPLIT_INNER_L3,
> > +	RTE_ETH_RX_HEADER_SPLIT_INNER_TCP,
> > +	RTE_ETH_RX_HEADER_SPLIT_INNER_UDP,
> > +	RTE_ETH_RX_HEADER_SPLIT_INNER_SCTP,
> > +	RTE_ETH_RX_HEADER_SPLIT_INNER_L4,
> 
> Enumeration members should be documented. See my question in the patch
> description.

Thanks for your detailed comments; I have answered the questions inline.

Best Regards,
Xuan

> 
> > +};
> > +
> >   /*
> >    * If new Rx offload capabilities are defined, they also must be
> >    * mentioned in rte_rx_offload_names in rte_ethdev.c file.
  
Ding, Xuan April 12, 2022, 4:40 p.m. UTC | #4
Hi Jacob,

> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Thursday, April 7, 2022 9:27 PM
> To: Wu, WenxuanX <wenxuanx.wu@intel.com>
> Cc: Thomas Monjalon <thomas@monjalon.net>; Andrew Rybchenko
> <andrew.rybchenko@oktetlabs.ru>; Li, Xiaoyun <xiaoyun.li@intel.com>;
> Singh, Aman Deep <aman.deep.singh@intel.com>; Zhang, Yuying
> <yuying.zhang@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>; dpdk-dev
> <dev@dpdk.org>; Stephen Hemminger <stephen@networkplumber.org>;
> Morten Brørup <mb@smartsharesystems.com>; Viacheslav Ovsiienko
> <viacheslavo@nvidia.com>; Yu, Ping <ping.yu@intel.com>; Ding, Xuan
> <xuan.ding@intel.com>; Wang, YuanX <yuanx.wang@intel.com>
> Subject: Re: [v4 1/3] ethdev: introduce protocol type based header split
> 
> On Sat, Apr 2, 2022 at 4:33 PM <wenxuanx.wu@intel.com> wrote:
> >
> > From: Xuan Ding <xuan.ding@intel.com>
> >
> > Header split consists of splitting a received packet into two separate
> > regions based on the packet content. The split happens after the
> > packet header and before the packet payload. Splitting is usually
> > between the packet header that can be posted to a dedicated buffer and
> > the packet payload that can be posted to a different buffer.
> >
> > Currently, Rx buffer split supports length and offset based packet split.
> > Although header split is a subset of buffer split, configuring buffer
> > split based on length is not suitable for NICs that split based on
> > header protocol types, because tunneling makes the conversion from
> > length to protocol type impossible.
> >
> > This patch extends the current buffer split to support protocol type
> > and offset based header split. A new proto field, carved out of the
> > reserved field of the rte_eth_rxseg_split structure, specifies the
> > header protocol type. With the Rx offload flag
> > RTE_ETH_RX_OFFLOAD_HEADER_SPLIT enabled and the protocol type
> > configured, the PMD will split ingress packets into two separate
> > regions. Currently, both inner and outer L2/L3/L4 level header split
> > can be supported.
> >
> > For example, suppose the Rx queue is configured with the
> > following segments:
> >     seg0 - pool0, off0=2B
> >     seg1 - pool1, off1=128B
> >
> > With the header split type set to RTE_ETH_RX_HEADER_SPLIT_UDP,
> > a packet consisting of MAC_IP_UDP_PAYLOAD will be split as follows:
> >     seg0 - udp header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
> 
> If we set rte_eth_rxseg_split::proto = RTE_ETH_RX_HEADER_SPLIT_UDP and
> rte_eth_rxseg_split.offset = 2, what will be the content of seg0? Will it be:
> - offset: starts at the UDP header
> - size of segment: MAX(size of UDP header + 2, 128) (as seg1 starts at 128)
> Right? If not, please describe.

Proto defines the location in the packet where the split happens.
Offset defines the data offset from the beginning of the mbuf data buffer; it can be zero.
With proto and offset configured, received packets will be split into two segments.

So in this configuration, seg0 contains the UDP header and seg1 contains the payload.
The size of seg0 is the size of the UDP header, and the size of seg1 is the size of the payload.
rte_eth_rxseg_split.offset = 2/128 decides the offset in the mbuf, not the segment size.

> 
> Also, I don't think we need to duplicate
> rte_eth_rx_header_split_protocol_type; instead we can reuse the existing
> RTE_PTYPE_* flags.

That's a good idea. Yes, I can use RTE_PTYPE_* here. My only
concern is that the 32-bit RTE_PTYPE_* values would use up the whole 32-bit reserved field.
If this proposal is agreed, I will use RTE_PTYPE_* instead of rte_eth_rx_header_split_protocol_type.

Best Regards,
Xuan

> 
> 
> >     seg1 - payload @ 128 in mbuf from pool1
> >
> > The memory attributes of the split parts may also differ; for example,
> > pool0 and pool1 may belong to DPDK memory and external memory,
> > respectively.
  
Andrew Rybchenko April 20, 2022, 2:39 p.m. UTC | #5
On 4/12/22 19:40, Ding, Xuan wrote:
> Hi Jacob,
> 
>> -----Original Message-----
>> From: Jerin Jacob <jerinjacobk@gmail.com>
>> Sent: Thursday, April 7, 2022 9:27 PM
>> To: Wu, WenxuanX <wenxuanx.wu@intel.com>
>> Cc: Thomas Monjalon <thomas@monjalon.net>; Andrew Rybchenko
>> <andrew.rybchenko@oktetlabs.ru>; Li, Xiaoyun <xiaoyun.li@intel.com>;
>> Singh, Aman Deep <aman.deep.singh@intel.com>; Zhang, Yuying
>> <yuying.zhang@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>; dpdk-dev
>> <dev@dpdk.org>; Stephen Hemminger <stephen@networkplumber.org>;
>> Morten Brørup <mb@smartsharesystems.com>; Viacheslav Ovsiienko
>> <viacheslavo@nvidia.com>; Yu, Ping <ping.yu@intel.com>; Ding, Xuan
>> <xuan.ding@intel.com>; Wang, YuanX <yuanx.wang@intel.com>
>> Subject: Re: [v4 1/3] ethdev: introduce protocol type based header split
>>
>> On Sat, Apr 2, 2022 at 4:33 PM <wenxuanx.wu@intel.com> wrote:
>>>
>>> From: Xuan Ding <xuan.ding@intel.com>
>>>
>>> Header split consists of splitting a received packet into two separate
>>> regions based on the packet content. The split happens after the
>>> packet header and before the packet payload. Splitting is usually
>>> between the packet header that can be posted to a dedicated buffer and
>>> the packet payload that can be posted to a different buffer.
>>>
>>> Currently, Rx buffer split supports length and offset based packet split.
>>> Although header split is a subset of buffer split, configuring buffer
>>> split based on length is not suitable for NICs that do split based on
>>> header protocol types, because tunneling makes the conversion from
>>> length to protocol type impossible.
>>>
>>> This patch extends the current buffer split to support protocol type
>>> and offset based header split. A new proto field is introduced in the
>>> rte_eth_rxseg_split structure reserved field to specify header
>>> protocol type. With Rx offload flag RTE_ETH_RX_OFFLOAD_HEADER_SPLIT
>>> enabled and protocol type configured, PMD will split the ingress
>>> packets into two separate regions. Currently, both inner and outer
>>> L2/L3/L4 level header split can be supported.
>>>
>>> For example, let's suppose we configured the Rx queue with the
>>> following segments:
>>>      seg0 - pool0, off0=2B
>>>      seg1 - pool1, off1=128B
>>>
>>> With the header split type set to RTE_ETH_RX_HEADER_SPLIT_UDP,
>>> a packet consisting of MAC_IP_UDP_PAYLOAD will be split as follows:
>>>      seg0 - udp header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
>>
>> If we set rte_eth_rxseg_split::proto = RTE_ETH_RX_HEADER_SPLIT_UDP and
>> rte_eth_rxseg_split.offset = 2, what will the content of seg0 be? Will it be:
>> - offset: starts at the UDP header
>> - size of segment: MAX(size of UDP header + 2, 128) (as seg1 starts from 128)
>> Right? If not, please describe.
> 
> Proto defines the location in the packet where the split happens.
> Offset defines the data offset from the beginning of the mbuf data buffer; it can be zero.
> With proto and offset configured, received packets will be split into two segments.
> 
> So in this configuration, seg0 holds the packet header up to and including the UDP header, and seg1 holds the payload.
> The size of seg0 is the header size, and the size of seg1 is the payload size.
> rte_eth_rxseg_split.offset (2 and 128, respectively) determines the data offset within each mbuf, not the segment size.

The above discussion proves that the definition of struct
rte_eth_rxseg_split is misleading. It is hard to tell
from the naming that length defines the maximum amount of data
to be copied, while offset is an offset in the destination
mbuf. The structure is still experimental, and I think
we should improve the naming: offset -> mbuf_offset?

> 
>>
>> Also, I don't think we need the duplicate
>> rte_eth_rx_header_split_protocol_type enum; instead we can reuse the existing
>> RTE_PTYPE_* flags.
> 
> That's a good idea. Yes, I can use RTE_PTYPE_* here. My only
> concern is that a 32-bit RTE_PTYPE_* value will use up the entire 32-bit reserved field.
> If this proposal is agreed on, I will use RTE_PTYPE_* instead of rte_eth_rx_header_split_protocol_type.
> 
> Best Regards,
> Xuan
> 
>>
>>
>>>      seg1 - payload @ 128 in mbuf from pool1
>>>
>>> The memory attributes for the split parts may also differ; for
>>> example, mempool0 and mempool1 may belong to DPDK memory and
>>> external memory, respectively.
  
Andrew Rybchenko April 20, 2022, 3:48 p.m. UTC | #6
Hi Xuan,

On 4/12/22 19:15, Ding, Xuan wrote:
> Hi Andrew,
> 
>> -----Original Message-----
>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>> Sent: Thursday, April 7, 2022 6:48 PM
>> To: Wu, WenxuanX <wenxuanx.wu@intel.com>; thomas@monjalon.net; Li,
>> Xiaoyun <xiaoyun.li@intel.com>; Singh, Aman Deep
>> <aman.deep.singh@intel.com>; Zhang, Yuying <yuying.zhang@intel.com>;
>> Zhang, Qi Z <qi.z.zhang@intel.com>
>> Cc: dev@dpdk.org; stephen@networkplumber.org;
>> mb@smartsharesystems.com; viacheslavo@nvidia.com; Yu, Ping
>> <ping.yu@intel.com>; Ding, Xuan <xuan.ding@intel.com>; Wang, YuanX
>> <yuanx.wang@intel.com>; david.marchand@redhat.com; Ferruh Yigit
>> <ferruhy@xilinx.com>
>> Subject: Re: [v4 1/3] ethdev: introduce protocol type based header split
>>
>> On 4/2/22 13:41, wenxuanx.wu@intel.com wrote:
>>> From: Xuan Ding <xuan.ding@intel.com>
>>>
>>> Header split consists of splitting a received packet into two separate
>>> regions based on the packet content. The split happens after the
>>> packet header and before the packet payload. Splitting is usually
>>> between the packet header that can be posted to a dedicated buffer and
>>> the packet payload that can be posted to a different buffer.
>>>
>>> Currently, Rx buffer split supports length and offset based packet split.
>>> Although header split is a subset of buffer split, configuring buffer
>>> split based on length is not suitable for NICs that do split based on
>>> header protocol types, because tunneling makes the conversion from
>>> length to protocol type impossible.
>>>
>>> This patch extends the current buffer split to support protocol type
>>> and offset based header split. A new proto field is introduced in the
>>> rte_eth_rxseg_split structure reserved field to specify header
>>> protocol type. With Rx offload flag RTE_ETH_RX_OFFLOAD_HEADER_SPLIT
>>> enabled and protocol type configured, PMD will split the ingress
>>> packets into two separate regions. Currently, both inner and outer
>>> L2/L3/L4 level header split can be supported.
>>
>> RTE_ETH_RX_OFFLOAD_HEADER_SPLIT offload was introduced some time
>> ago to substitute the bit-field header_split in struct rte_eth_rxmode. It allows
>> enabling header split offload with the header size controlled using
>> split_hdr_size in the same structure.
>>
>> Right now I see no single PMD which actually supports
>> RTE_ETH_RX_OFFLOAD_HEADER_SPLIT with the above definition.
>> Many examples and test apps initialize the field to 0 explicitly. Most
>> drivers simply ignore split_hdr_size since the offload is not advertised, but
>> some double-check that its value is 0.
>>
>> I think this means that the field should be removed in the next LTS, and I'd
>> say, together with the RTE_ETH_RX_OFFLOAD_HEADER_SPLIT offload bit.
>>
>> We should not redefine the offload meaning.
> 
> Yes, you are right. No single PMD supports RTE_ETH_RX_OFFLOAD_HEADER_SPLIT now.
> Previously, I used this flag to distinguish buffer split from header split.
> The former supports multi-segment split by length and offset.

Offset is misleading here, since the split offset is derived from
segment lengths. The offset specified in segments is a different
thing.

> The latter supports two-segment split by proto and offset.
> At this level, header split is a subset of buffer split.

IMHO, the generic definition of header split should not limit
it to just two segments.

> 
> Since we shouldn't redefine the meaning of this offload,
> I will use the RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT flag.
> Because of tunneling, a proto field needs to be defined in buffer split,
> since some PMDs do not support split based on length and offset.

Not sure that I fully understand, but I'm looking forward
to review v5.

>>>
>>> For example, let's suppose we configured the Rx queue with the
>>> following segments:
>>>       seg0 - pool0, off0=2B
>>>       seg1 - pool1, off1=128B
>>
>> Corresponding feature is named Rx buffer split.
>> Does it mean that protocol type based header split requires Rx buffer split
>> feature to be supported?
> 
> Protocol type based header split does not require Rx buffer split.
> In the previous design, header split and buffer split were mutually exclusive,
> because only one split offload is configured per Rx queue.
> 
>>
>>>
>>> With the header split type set to RTE_ETH_RX_HEADER_SPLIT_UDP,
>>> a packet consisting of MAC_IP_UDP_PAYLOAD will be split as follows:
>>>       seg0 - udp header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
>>>       seg1 - payload @ 128 in mbuf from pool1
>>
>> Is it always the outermost UDP? Does it require both UDP over IPv4 and UDP over
>> IPv6 to be supported? What will happen if only one is supported? How can an
>> application find out which protocol stacks are supported?
> 
> Both inner and outer UDP are considered.
> The current design does not distinguish UDP over IPv4 from UDP over IPv6.
> If we want to support finer granularity, such as only IPv4 or only IPv6,
> the user needs to add more configuration.

You should make it clear to applications how to use it.
What happens if an unsupported packet is received on an RxQ
configured to do header split?

> 
> If an application wants to find out which protocol stacks are supported,
> one way, I think, is for the driver to expose the supported protocol stacks through dev_info.
> Any thoughts are welcome :)

dev_info is nice, but very heavily overloaded. We can start
from dev_info and understand if it should be factored out
to a separate API or it is OK to have it in dev_info if
it is just a few simple fields.

>>
>>>
>>> The memory attributes for the split parts may also differ; for
>>> example, mempool0 and mempool1 may belong to DPDK memory and
>>> external memory, respectively.
>>>
>>> Signed-off-by: Xuan Ding <xuan.ding@intel.com>
>>> Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
>>> Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
>>> Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
>>> ---
>>>    lib/ethdev/rte_ethdev.c | 34 ++++++++++++++++++++++-------
>>>    lib/ethdev/rte_ethdev.h | 48 +++++++++++++++++++++++++++++++++++++++--
>>>    2 files changed, 72 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
>>> index 29a3d80466..29adcdc2f0 100644
>>> --- a/lib/ethdev/rte_ethdev.c
>>> +++ b/lib/ethdev/rte_ethdev.c
>>> @@ -1661,6 +1661,7 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
>>>    		struct rte_mempool *mpl = rx_seg[seg_idx].mp;
>>>    		uint32_t length = rx_seg[seg_idx].length;
>>>    		uint32_t offset = rx_seg[seg_idx].offset;
>>> +		uint16_t proto = rx_seg[seg_idx].proto;
>>>
>>>    		if (mpl == NULL) {
>>>    			RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
>>> @@ -1694,13 +1695,29 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
>>>    		}
>>>    		offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
>>>    		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
>>> -		length = length != 0 ? length : *mbp_buf_size;
>>> -		if (*mbp_buf_size < length + offset) {
>>> -			RTE_ETHDEV_LOG(ERR,
>>> -				       "%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
>>> -				       mpl->name, *mbp_buf_size,
>>> -				       length + offset, length, offset);
>>> -			return -EINVAL;
>>> +		if (proto == RTE_ETH_RX_HEADER_SPLIT_NONE) {
>>> +			/* Check buffer split. */
>>> +			length = length != 0 ? length : *mbp_buf_size;
>>> +			if (*mbp_buf_size < length + offset) {
>>> +				RTE_ETHDEV_LOG(ERR,
>>> +					"%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
>>> +					mpl->name, *mbp_buf_size,
>>> +					length + offset, length, offset);
>>> +				return -EINVAL;
>>> +			}
>>> +		} else {
>>> +			/* Check header split. */
>>> +			if (length != 0) {
>>> +				RTE_ETHDEV_LOG(ERR, "segment length should be set to zero in header split\n");
>>> +				return -EINVAL;
>>> +			}
>>> +			if (*mbp_buf_size < offset) {
>>> +				RTE_ETHDEV_LOG(ERR,
>>> +					"%s mbuf_data_room_size %u < %u segment offset)\n",
>>> +					mpl->name, *mbp_buf_size,
>>> +					offset);
>>> +				return -EINVAL;
>>> +			}
>>>    		}
>>>    	}
>>>    	return 0;
>>> @@ -1778,7 +1795,8 @@ rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
>>>    		rx_seg = (const struct rte_eth_rxseg_split *)rx_conf->rx_seg;
>>>    		n_seg = rx_conf->rx_nseg;
>>>
>>> -		if (rx_conf->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
>>> +		if (rx_conf->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT ||
>>> +			rx_conf->offloads & RTE_ETH_RX_OFFLOAD_HEADER_SPLIT) {
>>>    			ret = rte_eth_rx_queue_check_split(rx_seg, n_seg,
>>>    							   &mbp_buf_size,
>>>    							   &dev_info);
>>> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
>>> index 04cff8ee10..e8371b98ed 100644
>>> --- a/lib/ethdev/rte_ethdev.h
>>> +++ b/lib/ethdev/rte_ethdev.h
>>> @@ -1197,12 +1197,31 @@ struct rte_eth_txmode {
>>>     *     - pool from the last valid element
>>>     *     - the buffer size from this pool
>>>     *     - zero offset
>>> + *
>>> + * Header split is a subset of buffer split. The split happens after the
>>> + * packet header and before the packet payload. For PMDs that do not
>>> + * support header split configuration by length, the location of the split
>>> + * needs to be specified by the header protocol type. While for buffer split,
>>> + * this field should not be configured.
>>> + *
>>> + * If RTE_ETH_RX_OFFLOAD_HEADER_SPLIT flag is set in offloads field,
>>> + * the PMD will split the received packets into two separate regions:
>>> + * - The header buffer will be allocated from the memory pool,
>>> + *   specified in the first array element, the second buffer, from the
>>> + *   pool in the second element.
>>> + *
>>> + * - The lengths do not need to be configured in header split.
>>> + *
>>> + * - The offsets from the segment description elements specify
>>> + *   the data offset from the buffer beginning except the first mbuf.
>>> + *   The first segment offset is added with RTE_PKTMBUF_HEADROOM.
>>>     */
>>>    struct rte_eth_rxseg_split {
>>>    	struct rte_mempool *mp; /**< Memory pool to allocate segment from. */
>>>    	uint16_t length; /**< Segment data length, configures split point. */
>>>    	uint16_t offset; /**< Data offset from beginning of mbuf data buffer. */
>>> -	uint32_t reserved; /**< Reserved field. */
>>> +	uint16_t proto; /**< header protocol type, configures header split point. */
>>
>> I realize that you don't want to use the enum defined above here, to save some
>> reserved space, but the description must refer to the enum
>> rte_eth_rx_header_split_protocol_type.
> 
> Thanks for your suggestion, will fix it in next version.
> 
>>
>>> +	uint16_t reserved; /**< Reserved field. */
>>
>> As far as I can see, the structure is experimental. So it should not be a
>> problem to extend it, but it is a really good question raised by Stephen in the RFC
>> v1 discussion.
>> Shouldn't we require that all reserved fields are initialized to zero and
>> ignored on processing? Frankly speaking, I always thought so, but failed to
>> find the place where it is documented.
> 
> Yes, it can be documented. By default it should be zero, and we can configure
> it to enable protocol type based buffer split.
> 
>>
>> @Thomas, @David, @Ferruh?
>>
>>>    };
>>>
>>>    /**
>>> @@ -1212,7 +1231,7 @@ struct rte_eth_rxseg_split {
>>>     * A common structure used to describe Rx packet segment properties.
>>>     */
>>>    union rte_eth_rxseg {
>>> -	/* The settings for buffer split offload. */
>>> +	/* The settings for buffer split and header split offload. */
>>>    	struct rte_eth_rxseg_split split;
>>>    	/* The other features settings should be added here. */
>>>    };
>>> @@ -1664,6 +1683,31 @@ struct rte_eth_conf {
>>>    			     RTE_ETH_RX_OFFLOAD_QINQ_STRIP)
>>>    #define DEV_RX_OFFLOAD_VLAN RTE_DEPRECATED(DEV_RX_OFFLOAD_VLAN) RTE_ETH_RX_OFFLOAD_VLAN
>>>
>>> +/**
>>> + * @warning
>>> + * @b EXPERIMENTAL: this enum may change without prior notice.
>>> + * This enum indicates the header split protocol type
>>> + */
>>> +enum rte_eth_rx_header_split_protocol_type {
>>> +	RTE_ETH_RX_HEADER_SPLIT_NONE = 0,
>>> +	RTE_ETH_RX_HEADER_SPLIT_MAC,
>>> +	RTE_ETH_RX_HEADER_SPLIT_IPV4,
>>> +	RTE_ETH_RX_HEADER_SPLIT_IPV6,
>>> +	RTE_ETH_RX_HEADER_SPLIT_L3,
>>> +	RTE_ETH_RX_HEADER_SPLIT_TCP,
>>> +	RTE_ETH_RX_HEADER_SPLIT_UDP,
>>> +	RTE_ETH_RX_HEADER_SPLIT_SCTP,
>>> +	RTE_ETH_RX_HEADER_SPLIT_L4,
>>> +	RTE_ETH_RX_HEADER_SPLIT_INNER_MAC,
>>> +	RTE_ETH_RX_HEADER_SPLIT_INNER_IPV4,
>>> +	RTE_ETH_RX_HEADER_SPLIT_INNER_IPV6,
>>> +	RTE_ETH_RX_HEADER_SPLIT_INNER_L3,
>>> +	RTE_ETH_RX_HEADER_SPLIT_INNER_TCP,
>>> +	RTE_ETH_RX_HEADER_SPLIT_INNER_UDP,
>>> +	RTE_ETH_RX_HEADER_SPLIT_INNER_SCTP,
>>> +	RTE_ETH_RX_HEADER_SPLIT_INNER_L4,
>>
>> Enumeration members should be documented. See my question in the patch
>> description.
> 
> Thanks for your detailed comments; I have answered the questions inline.
> 
> Best Regards,
> Xuan
> 
>>
>>> +};
>>> +
>>>    /*
>>>     * If new Rx offload capabilities are defined, they also must be
>>>     * mentioned in rte_rx_offload_names in rte_ethdev.c file.
>
  
Thomas Monjalon April 21, 2022, 10:27 a.m. UTC | #7
12/04/2022 18:15, Ding, Xuan:
> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> > On 4/2/22 13:41, wenxuanx.wu@intel.com wrote:
> > > From: Xuan Ding <xuan.ding@intel.com>
> > >
> > > Header split consists of splitting a received packet into two separate
> > > regions based on the packet content. The split happens after the
> > > packet header and before the packet payload. Splitting is usually
> > > between the packet header that can be posted to a dedicated buffer and
> > > the packet payload that can be posted to a different buffer.
> > >
> > > Currently, Rx buffer split supports length and offset based packet split.
> > > Although header split is a subset of buffer split, configuring buffer
> > > split based on length is not suitable for NICs that do split based on
> > > header protocol types, because tunneling makes the conversion from
> > > length to protocol type impossible.
> > >
> > > This patch extends the current buffer split to support protocol type
> > > and offset based header split. A new proto field is introduced in the
> > > rte_eth_rxseg_split structure reserved field to specify header
> > > protocol type. With Rx offload flag RTE_ETH_RX_OFFLOAD_HEADER_SPLIT
> > > enabled and protocol type configured, PMD will split the ingress
> > > packets into two separate regions. Currently, both inner and outer
> > > L2/L3/L4 level header split can be supported.
> > 
> > RTE_ETH_RX_OFFLOAD_HEADER_SPLIT offload was introduced some time
> > ago to substitute the bit-field header_split in struct rte_eth_rxmode. It allows
> > enabling header split offload with the header size controlled using
> > split_hdr_size in the same structure.
> > 
> > Right now I see no single PMD which actually supports
> > RTE_ETH_RX_OFFLOAD_HEADER_SPLIT with the above definition.
> > Many examples and test apps initialize the field to 0 explicitly. Most
> > drivers simply ignore split_hdr_size since the offload is not advertised, but
> > some double-check that its value is 0.
> > 
> > I think this means that the field should be removed in the next LTS, and I'd
> > say, together with the RTE_ETH_RX_OFFLOAD_HEADER_SPLIT offload bit.
> > 
> > We should not redefine the offload meaning.
> 
> Yes, you are right. No single PMD supports RTE_ETH_RX_OFFLOAD_HEADER_SPLIT now.
> Previously, I used this flag to distinguish buffer split from header split.
> The former supports multi-segment split by length and offset.
> The latter supports two-segment split by proto and offset.
> At this level, header split is a subset of buffer split.
> 
> Since we shouldn't redefine the meaning of this offload,
> I will use the RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT flag.
> Because of tunneling, a proto field needs to be defined in buffer split,
> since some PMDs do not support split based on length and offset.

Before doing anything, the first patch of this series should make
the current status clearer.
For example, this line does not explain what it does:
	uint16_t split_hdr_size;  /**< hdr buf size (header_split enabled).*/
And header_split has been removed in ab3ce1e0c193 ("ethdev: remove old offload API")

If RTE_ETH_RX_OFFLOAD_HEADER_SPLIT is not needed,
let's add a comment to start a deprecation.

> > > For example, let's suppose we configured the Rx queue with the
> > > following segments:
> > >      seg0 - pool0, off0=2B
> > >      seg1 - pool1, off1=128B
> > 
> > Corresponding feature is named Rx buffer split.
> > Does it mean that protocol type based header split requires Rx buffer split
> > feature to be supported?
> 
> Protocol type based header split does not require Rx buffer split.
> In the previous design, header split and buffer split were mutually exclusive,
> because only one split offload is configured per Rx queue.

Things must be made clear and documented.

> > > With the header split type set to RTE_ETH_RX_HEADER_SPLIT_UDP,
> > > a packet consisting of MAC_IP_UDP_PAYLOAD will be split as follows:
> > >      seg0 - udp header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
> > >      seg1 - payload @ 128 in mbuf from pool1
> > 
> > Is it always the outermost UDP? Does it require both UDP over IPv4 and UDP over
> > IPv6 to be supported? What will happen if only one is supported? How can an
> > application find out which protocol stacks are supported?
> 
> Both inner and outer UDP are considered.
> The current design does not distinguish UDP over IPv4 from UDP over IPv6.
> If we want to support finer granularity, such as only IPv4 or only IPv6,
> the user needs to add more configuration.
> 
> If an application wants to find out which protocol stacks are supported,
> one way, I think, is for the driver to expose the supported protocol stacks through dev_info.
> Any thoughts are welcome :)
[...]
> > > +	uint16_t reserved; /**< Reserved field. */
> > 
> > As far as I can see, the structure is experimental. So it should not be a
> > problem to extend it, but it is a really good question raised by Stephen in the RFC
> > v1 discussion.
> > Shouldn't we require that all reserved fields are initialized to zero and
> > ignored on processing? Frankly speaking, I always thought so, but failed to
> > find the place where it is documented.
> 
> Yes, it can be documented. By default it should be zero, and we can configure
> it to enable protocol type based buffer split.
> 
> > @Thomas, @David, @Ferruh?

Yes, it's very important to have a clearly defined state for the reserved fields.
A value must be set and documented.
  
Thomas Monjalon April 21, 2022, 10:36 a.m. UTC | #8
20/04/2022 16:39, Andrew Rybchenko:
> On 4/12/22 19:40, Ding, Xuan wrote:
> > From: Jerin Jacob <jerinjacobk@gmail.com>
> >> On Sat, Apr 2, 2022 at 4:33 PM <wenxuanx.wu@intel.com> wrote:
> >>> From: Xuan Ding <xuan.ding@intel.com>
> >>>
> >>> Header split consists of splitting a received packet into two separate
> >>> regions based on the packet content. The split happens after the
> >>> packet header and before the packet payload. Splitting is usually
> >>> between the packet header that can be posted to a dedicated buffer and
> >>> the packet payload that can be posted to a different buffer.
> >>>
> >>> Currently, Rx buffer split supports length and offset based packet split.
> >>> Although header split is a subset of buffer split, configuring buffer
> >>> split based on length is not suitable for NICs that do split based on
> >>> header protocol types, because tunneling makes the conversion from
> >>> length to protocol type impossible.
> >>>
> >>> This patch extends the current buffer split to support protocol type
> >>> and offset based header split. A new proto field is introduced in the
> >>> rte_eth_rxseg_split structure reserved field to specify header
> >>> protocol type. With Rx offload flag RTE_ETH_RX_OFFLOAD_HEADER_SPLIT
> >>> enabled and protocol type configured, PMD will split the ingress
> >>> packets into two separate regions. Currently, both inner and outer
> >>> L2/L3/L4 level header split can be supported.
> >>>
> >>> For example, let's suppose we configured the Rx queue with the
> >>> following segments:
> >>>      seg0 - pool0, off0=2B
> >>>      seg1 - pool1, off1=128B
> >>>
> >>> With the header split type set to RTE_ETH_RX_HEADER_SPLIT_UDP,
> >>> a packet consisting of MAC_IP_UDP_PAYLOAD will be split as follows:
> >>>      seg0 - udp header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
> >>
> >> If we set rte_eth_rxseg_split::proto = RTE_ETH_RX_HEADER_SPLIT_UDP and
> >> rte_eth_rxseg_split.offset = 2, what will the content of seg0 be? Will it be:
> >> - offset: starts at the UDP header
> >> - size of segment: MAX(size of UDP header + 2, 128) (as seg1 starts from 128)
> >> Right? If not, please describe.
> > 
> > Proto defines the location in the packet where the split happens.
> > Offset defines the data offset from the beginning of the mbuf data buffer; it can be zero.
> > With proto and offset configured, received packets will be split into two segments.
> > 
> > So in this configuration, seg0 holds the packet header up to and including the UDP header, and seg1 holds the payload.
> > The size of seg0 is the header size, and the size of seg1 is the payload size.
> > rte_eth_rxseg_split.offset (2 and 128, respectively) determines the data offset within each mbuf, not the segment size.
> 
> The above discussion proves that the definition of struct
> rte_eth_rxseg_split is misleading. It is hard to tell
> from the naming that length defines the maximum amount of data
> to be copied, while offset is an offset in the destination
> mbuf. The structure is still experimental, and I think
> we should improve the naming: offset -> mbuf_offset?

I agree it is confusing.
mbuf_offset could be a better name.
length could be renamed as well. Is data_length better?

But the most important is to have a clear description
in the doxygen comment of the field.
We must specify the starting point and the "end" of those fields.

> >> Also, I don't think we need the duplicate
> >> rte_eth_rx_header_split_protocol_type enum; instead we can reuse the existing
> >> RTE_PTYPE_* flags.
> > 
> > That's a good idea. Yes, I can use RTE_PTYPE_* here. My only
> > concern is that a 32-bit RTE_PTYPE_* value will use up the entire 32-bit reserved field.
> > If this proposal is agreed on, I will use RTE_PTYPE_* instead of rte_eth_rx_header_split_protocol_type.

Yes I think RTE_PTYPE_* is appropriate.
  
Ding, Xuan April 25, 2022, 9:23 a.m. UTC | #9
Hi Andrew,

> -----Original Message-----
> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Sent: Wednesday, April 20, 2022 10:40 PM
> To: Ding, Xuan <xuan.ding@intel.com>; Jerin Jacob <jerinjacobk@gmail.com>;
> Wu, WenxuanX <wenxuanx.wu@intel.com>
> Cc: Thomas Monjalon <thomas@monjalon.net>; Li, Xiaoyun
> <xiaoyun.li@intel.com>; Singh, Aman Deep <aman.deep.singh@intel.com>;
> Zhang, Yuying <yuying.zhang@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>; dpdk-dev <dev@dpdk.org>; Stephen Hemminger
> <stephen@networkplumber.org>; Morten Brørup
> <mb@smartsharesystems.com>; Viacheslav Ovsiienko
> <viacheslavo@nvidia.com>; Yu, Ping <ping.yu@intel.com>; Wang, YuanX
> <yuanx.wang@intel.com>
> Subject: Re: [v4 1/3] ethdev: introduce protocol type based header split
> 
> On 4/12/22 19:40, Ding, Xuan wrote:
> > Hi Jacob,
> >
> >> -----Original Message-----
> >> From: Jerin Jacob <jerinjacobk@gmail.com>
> >> Sent: Thursday, April 7, 2022 9:27 PM
> >> To: Wu, WenxuanX <wenxuanx.wu@intel.com>
> >> Cc: Thomas Monjalon <thomas@monjalon.net>; Andrew Rybchenko
> >> <andrew.rybchenko@oktetlabs.ru>; Li, Xiaoyun <xiaoyun.li@intel.com>;
> >> Singh, Aman Deep <aman.deep.singh@intel.com>; Zhang, Yuying
> >> <yuying.zhang@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>;
> >> dpdk-dev <dev@dpdk.org>; Stephen Hemminger
> >> <stephen@networkplumber.org>; Morten Brørup
> >> <mb@smartsharesystems.com>; Viacheslav Ovsiienko
> >> <viacheslavo@nvidia.com>; Yu, Ping <ping.yu@intel.com>; Ding, Xuan
> >> <xuan.ding@intel.com>; Wang, YuanX <yuanx.wang@intel.com>
> >> Subject: Re: [v4 1/3] ethdev: introduce protocol type based header split
> >>
> >> On Sat, Apr 2, 2022 at 4:33 PM <wenxuanx.wu@intel.com> wrote:
> >>>
> >>> From: Xuan Ding <xuan.ding@intel.com>
> >>>
> >>> Header split consists of splitting a received packet into two
> >>> separate regions based on the packet content. The split happens
> >>> after the packet header and before the packet payload. Splitting is
> >>> usually between the packet header that can be posted to a dedicated
> >>> buffer and the packet payload that can be posted to a different buffer.
> >>>
> >>> Currently, Rx buffer split supports length and offset based packet split.
> >>> Although header split is a subset of buffer split, configuring
> >>> buffer split based on length is not suitable for NICs that do split
> >>> based on header protocol types, because tunneling makes the
> >>> conversion from length to protocol type impossible.
> >>>
> >>> This patch extends the current buffer split to support protocol type
> >>> and offset based header split. A new proto field is introduced in
> >>> the rte_eth_rxseg_split structure reserved field to specify header
> >>> protocol type. With Rx offload flag RTE_ETH_RX_OFFLOAD_HEADER_SPLIT
> >>> enabled and protocol type configured, PMD will split the ingress
> >>> packets into two separate regions. Currently, both inner and outer
> >>> L2/L3/L4 level header split can be supported.
> >>>
> >>> For example, let's suppose we configured the Rx queue with the
> >>> following segments:
> >>>      seg0 - pool0, off0=2B
> >>>      seg1 - pool1, off1=128B
> >>>
> >>> With the header split type set to RTE_ETH_RX_HEADER_SPLIT_UDP,
> >>> a packet consisting of MAC_IP_UDP_PAYLOAD will be split as follows:
> >>>      seg0 - udp header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
> >>
> >> If we set rte_eth_rxseg_split::proto = RTE_ETH_RX_HEADER_SPLIT_UDP
> >> and rte_eth_rxseg_split.offset = 2, what will the content of
> >> seg0 be? Will it be:
> >> - offset: starts at the UDP header
> >> - size of segment: MAX(size of UDP header + 2, 128) (as seg1 starts
> >> from 128)
> >> Right? If not, please describe.
> >
> > Proto defines the location in the packet where the split happens.
> > Offset defines the data offset from the beginning of the mbuf data buffer; it can be zero.
> > With proto and offset configured, received packets will be split into two segments.
> >
> > So in this configuration, seg0 holds the packet header up to and including the UDP header, and seg1 holds the payload.
> > The size of seg0 is the header size, and the size of seg1 is the payload size.
> > rte_eth_rxseg_split.offset (2 and 128, respectively) determines the data offset within each mbuf, not the segment size.
> 
> The above discussion proves that the definition of struct rte_eth_rxseg_split is
> misleading. It is hard to tell from the naming that length defines the maximum
> amount of data to be copied, while offset is an offset in the destination mbuf. The
> structure is still experimental, and I think we should improve the naming: offset ->
> mbuf_offset?

Yes, you are right. In the rte_eth_rxseg_split structure, even though the length and offset
fields are documented, they are hard to understand just from the naming.

Thanks,
Xuan

> 
> >
> >>
> >> Also, I don't think we need the duplicate
> >> rte_eth_rx_header_split_protocol_type enum; instead we can reuse the existing
> >> RTE_PTYPE_* flags.
> >
> > That's a good idea. Yes, I can use RTE_PTYPE_* here. My only
> > concern is that a 32-bit RTE_PTYPE_* value will use up the entire 32-bit reserved field.
> > If this proposal is agreed on, I will use RTE_PTYPE_* instead of
> > rte_eth_rx_header_split_protocol_type.
> >
> > Best Regards,
> > Xuan
> >
> >>
> >>
> >>>      seg1 - payload @ 128 in mbuf from pool1
> >>>
> >>> The memory attributes for the split parts may also differ; for
> >>> example, mempool0 and mempool1 may belong to DPDK memory and
> >>> external memory, respectively.
  
Ding, Xuan April 25, 2022, 2:57 p.m. UTC | #10
Hi Andrew,

> -----Original Message-----
> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Sent: Wednesday, April 20, 2022 11:48 PM
> To: Ding, Xuan <xuan.ding@intel.com>; Wu, WenxuanX
> <wenxuanx.wu@intel.com>; thomas@monjalon.net; Li, Xiaoyun
> <xiaoyun.li@intel.com>; Singh, Aman Deep <aman.deep.singh@intel.com>;
> Zhang, Yuying <yuying.zhang@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>
> Cc: dev@dpdk.org; stephen@networkplumber.org;
> mb@smartsharesystems.com; viacheslavo@nvidia.com; Yu, Ping
> <ping.yu@intel.com>; Wang, YuanX <yuanx.wang@intel.com>;
> david.marchand@redhat.com; Ferruh Yigit <ferruhy@xilinx.com>
> Subject: Re: [v4 1/3] ethdev: introduce protocol type based header split
> 
> Hi Xuan,
> 
> On 4/12/22 19:15, Ding, Xuan wrote:
> > Hi Andrew,
> >
> >> -----Original Message-----
> >> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> >> Sent: Thursday, April 7, 2022 6:48 PM
> >> To: Wu, WenxuanX <wenxuanx.wu@intel.com>; thomas@monjalon.net;
> >> Li, Xiaoyun <xiaoyun.li@intel.com>; Singh, Aman Deep
> >> <aman.deep.singh@intel.com>; Zhang, Yuying <yuying.zhang@intel.com>;
> >> Zhang, Qi Z <qi.z.zhang@intel.com>
> >> Cc: dev@dpdk.org; stephen@networkplumber.org;
> >> mb@smartsharesystems.com; viacheslavo@nvidia.com; Yu, Ping
> >> <ping.yu@intel.com>; Ding, Xuan <xuan.ding@intel.com>; Wang, YuanX
> >> <yuanx.wang@intel.com>; david.marchand@redhat.com; Ferruh Yigit
> >> <ferruhy@xilinx.com>
> >> Subject: Re: [v4 1/3] ethdev: introduce protocol type based header split
> >>
> >> On 4/2/22 13:41, wenxuanx.wu@intel.com wrote:
> >>> From: Xuan Ding <xuan.ding@intel.com>
> >>>
> >>> Header split consists of splitting a received packet into two
> >>> separate regions based on the packet content. The split happens
> >>> after the packet header and before the packet payload. Splitting is
> >>> usually between the packet header that can be posted to a dedicated
> >>> buffer and the packet payload that can be posted to a different buffer.
> >>>
> >>> Currently, Rx buffer split supports length and offset based packet split.
> >>> Although header split is a subset of buffer split, configuring
> >>> buffer split based on length is not suitable for NICs that do split
> >>> based on header protocol types. Because tunneling makes the
> >>> conversion from length to protocol type impossible.
> >>>
> >>> This patch extends the current buffer split to support protocol type
> >>> and offset based header split. A new proto field is introduced in
> >>> the rte_eth_rxseg_split structure reserved field to specify header
> >>> protocol type. With Rx offload flag RTE_ETH_RX_OFFLOAD_HEADER_SPLIT
> >>> enabled and protocol type configured, PMD will split the ingress
> >>> packets into two separate regions. Currently, both inner and outer
> >>> L2/L3/L4 level header split can be supported.
> >>
> >> RTE_ETH_RX_OFFLOAD_HEADER_SPLIT offload was introduced some time ago
> >> to substitute bit-field header_split in struct rte_eth_rxmode. It
> >> allows enabling header split offload with the header size controlled
> >> using split_hdr_size in the same structure.
> >>
> >> Right now I see no single PMD which actually supports
> >> RTE_ETH_RX_OFFLOAD_HEADER_SPLIT with above definition.
> >> Many examples and test apps initialize the field to 0 explicitly.
> >> Most drivers simply ignore split_hdr_size since the offload is not
> >> advertised, but some double-check that its value is 0.
> >>
> >> I think that it means that the field should be removed on the next
> >> LTS, and I'd say, together with the RTE_ETH_RX_OFFLOAD_HEADER_SPLIT
> >> offload bit.
> >>
> >> We should not redefine the offload meaning.
> >
> > Yes, you are right. No PMD supports RTE_ETH_RX_OFFLOAD_HEADER_SPLIT now.
> > Previously, I used this flag to distinguish buffer split from header split.
> > The former supports a multi-segment split by length and offset.
> 
> offset is misleading here, since split offset is derived from segment lengths.
> Offset specified in segments is a different thing.

Yes, length defines the segment length, and offset defines the data offset in the mbuf.
Their usage is explained in the comments, but the names alone are somewhat misleading.

> 
> > The latter supports a two-segment split by proto and offset.
> > At this level, header split is a subset of buffer split.
> 
> IMHO, generic definition of the header split should not limit it by just two
> segments.

Does "header split" here refer to the traditional header split?
If so, since you mentioned earlier that we should not redefine the offload meaning,
I will use protocol and mbuf_offset based buffer split in the next version.

It is worth noting that specifying the split location by protocol is meant
to divide a packet into two segments. To divide a packet into more segments,
the split points should still be specified by length.
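The two-segment rule described above could be enforced by a check along these lines. This is a sketch under stated assumptions, not the ethdev implementation: `struct seg_cfg` and `check_split_cfg()` are hypothetical, a non-zero `proto` on any segment is assumed to select protocol mode, and the mempool and buffer-size checks done by rte_eth_rx_queue_check_split() are omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical segment descriptor used only for this sketch. */
struct seg_cfg {
	uint16_t length; /* split point in length mode; must be 0 in proto mode */
	uint16_t offset; /* data offset inside the segment buffer */
	uint16_t proto;  /* 0 = no protocol split (like RTE_ETH_RX_HEADER_SPLIT_NONE) */
};

/* Returns 0 if the configuration follows the rule, -EINVAL otherwise. */
static int check_split_cfg(const struct seg_cfg *segs, size_t n_seg)
{
	int proto_mode = 0;
	size_t i;

	assert(segs != NULL);
	if (n_seg == 0)
		return -EINVAL;
	for (i = 0; i < n_seg; i++)
		if (segs[i].proto != 0)
			proto_mode = 1;
	if (!proto_mode)
		return 0; /* length-based split: any number of segments */
	if (n_seg != 2)
		return -EINVAL; /* protocol split yields exactly header + payload */
	for (i = 0; i < n_seg; i++)
		if (segs[i].length != 0)
			return -EINVAL; /* lengths are not used in protocol mode */
	return 0;
}
```

Keeping the two modes mutually exclusive per queue keeps the check simple; a multi-segment protocol split would need a different descriptor layout.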

> 
> >
> > Since we shouldn't redefine the meaning of this offload, I will use
> > the RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT flag.
> > Tunneling makes a proto field in buffer split necessary,
> > because some PMDs do not support split based on length and offset.
> 
> Not sure that I fully understand, but I'm looking forward to review v5.

Thanks for your comments. I will send a v5 with these main changes:
1. Use protocol and mbuf_offset based buffer split instead of header split.
2. Use RTE_PTYPE* instead of enum rte_eth_rx_header_split_protocol_type.
3. Improve the description of rte_eth_rxseg_split.proto.

Your comments are welcome. 😊

> 
> >>>
> >>> For example, let's suppose we configured the Rx queue with the
> >>> following segments:
> >>>       seg0 - pool0, off0=2B
> >>>       seg1 - pool1, off1=128B
> >>
> >> Corresponding feature is named Rx buffer split.
> >> Does it mean that protocol type based header split requires Rx buffer
> >> split feature to be supported?
> >
> > Protocol type based header split does not require Rx buffer split.
> > In the previous design, header split and buffer split were mutually
> > exclusive, because only one split offload is configured per Rx queue.
> >
> >>
> >>>
> >>> With header split type configured with RTE_ETH_RX_HEADER_SPLIT_UDP,
> >>> the packet consists of MAC_IP_UDP_PAYLOAD will be split like following:
> >>>       seg0 - udp header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
> >>>       seg1 - payload @ 128 in mbuf from pool1
> >>
> >> Is it always the outermost UDP? Does it require both UDP over IPv4 and
> >> UDP over IPv6 to be supported? What will happen if only one is supported?
> >> How can the application find out which protocol stacks are supported?
> >
> > Both inner and outer UDP are considered.
> > The current design does not distinguish UDP over IPv4 from UDP over IPv6.
> > To support finer granularity, such as IPv4 only or IPv6 only,
> > the user needs to add more configuration.

Thanks for your suggestion.
I will improve the documentation on the usage of proto-based buffer split.

> 
> You should make it clear for application how to use it.
> What happens if unsupported packet is received on an RxQ configured to do
> header split?


In practice, buffer split is used in combination with rte_flow: received packets
are expected to be steered to the Rx queue configured with the buffer split
offload, so no unsupported packets should be received on that queue.

> 
> >
> > If the application wants to find out which protocol stacks are supported,
> > one way is to expose the protocol stacks supported by the driver through
> > dev_info.
> > Any thoughts are welcomed :)
> 
> dev_info is nice, but very heavily overloaded. We can start from dev_info
> and understand if it should be factored out to a separate API or it is OK to
> have it in dev_info if it just few simple fields.

I also think exposing the protocol stacks through dev_info is heavy.
Alternatively, the application could configure any protocol stack, while the driver
supports only part of them and returns an error for the protocols it does not support.
What do you think of this design?
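On the driver side, the "configure any stack, reject what is unsupported" approach could look roughly like this. The capability bits and `drv_check_split_proto()` are hypothetical and not part of DPDK; returning -ENOTSUP follows the usual ethdev convention for unsupported features.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical capability bits a driver could keep internally. */
enum split_cap {
	SPLIT_CAP_OUTER_L2 = 1u << 0,
	SPLIT_CAP_OUTER_L3 = 1u << 1,
	SPLIT_CAP_OUTER_L4 = 1u << 2,
	SPLIT_CAP_INNER_L4 = 1u << 3,
};

/* Accepts a request only if every requested protocol bit is supported.
 * A request of 0 means no protocol split and is always accepted. */
static int drv_check_split_proto(uint32_t supported, uint32_t requested)
{
	if (requested == 0)
		return 0;
	if ((requested & ~supported) != 0)
		return -ENOTSUP;
	return 0;
}
```

This keeps dev_info untouched, at the cost that the application only learns about unsupported protocols when queue setup fails.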

Regards,
Xuan

> 
> >>
> >>>
> >>> The memory attributes for the split parts may differ either - for
> >>> example the mempool0 and mempool1 belong to dpdk memory and
> >>> external memory, respectively.
> >>>
> >>> Signed-off-by: Xuan Ding <xuan.ding@intel.com>
> >>> Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
> >>> Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
> >>> Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
> >>> ---
> >>>    lib/ethdev/rte_ethdev.c | 34 ++++++++++++++++++++++-------
> >>>    lib/ethdev/rte_ethdev.h | 48 +++++++++++++++++++++++++++++++++++++++--
> >>>    2 files changed, 72 insertions(+), 10 deletions(-)
> >>>
> >>> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
> >>> index 29a3d80466..29adcdc2f0 100644
> >>> --- a/lib/ethdev/rte_ethdev.c
> >>> +++ b/lib/ethdev/rte_ethdev.c
> >>> @@ -1661,6 +1661,7 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
> >>>    		struct rte_mempool *mpl = rx_seg[seg_idx].mp;
> >>>    		uint32_t length = rx_seg[seg_idx].length;
> >>>    		uint32_t offset = rx_seg[seg_idx].offset;
> >>> +		uint16_t proto = rx_seg[seg_idx].proto;
> >>>
> >>>    		if (mpl == NULL) {
> >>>    			RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
> >>> @@ -1694,13 +1695,29 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
> >>>    		}
> >>>    		offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
> >>>    		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
> >>> -		length = length != 0 ? length : *mbp_buf_size;
> >>> -		if (*mbp_buf_size < length + offset) {
> >>> -			RTE_ETHDEV_LOG(ERR,
> >>> -				       "%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
> >>> -				       mpl->name, *mbp_buf_size,
> >>> -				       length + offset, length, offset);
> >>> -			return -EINVAL;
> >>> +		if (proto == RTE_ETH_RX_HEADER_SPLIT_NONE) {
> >>> +			/* Check buffer split. */
> >>> +			length = length != 0 ? length : *mbp_buf_size;
> >>> +			if (*mbp_buf_size < length + offset) {
> >>> +				RTE_ETHDEV_LOG(ERR,
> >>> +					"%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
> >>> +					mpl->name, *mbp_buf_size,
> >>> +					length + offset, length, offset);
> >>> +				return -EINVAL;
> >>> +			}
> >>> +		} else {
> >>> +			/* Check header split. */
> >>> +			if (length != 0) {
> >>> +				RTE_ETHDEV_LOG(ERR, "segment length should be set to zero in header split\n");
> >>> +				return -EINVAL;
> >>> +			}
> >>> +			if (*mbp_buf_size < offset) {
> >>> +				RTE_ETHDEV_LOG(ERR,
> >>> +					"%s mbuf_data_room_size %u < %u (segment offset)\n",
> >>> +					mpl->name, *mbp_buf_size,
> >>> +					offset);
> >>> +				return -EINVAL;
> >>> +			}
> >>>    		}
> >>>    	}
> >>>    	return 0;
> >>> @@ -1778,7 +1795,8 @@ rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
> >>>    		rx_seg = (const struct rte_eth_rxseg_split *)rx_conf->rx_seg;
> >>>    		n_seg = rx_conf->rx_nseg;
> >>>
> >>> -		if (rx_conf->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
> >>> +		if (rx_conf->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT ||
> >>> +			rx_conf->offloads & RTE_ETH_RX_OFFLOAD_HEADER_SPLIT) {
> >>>    			ret = rte_eth_rx_queue_check_split(rx_seg, n_seg,
> >>>    							   &mbp_buf_size,
> >>>    							   &dev_info);
> >>> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> >>> index 04cff8ee10..e8371b98ed 100644
> >>> --- a/lib/ethdev/rte_ethdev.h
> >>> +++ b/lib/ethdev/rte_ethdev.h
> >>> @@ -1197,12 +1197,31 @@ struct rte_eth_txmode {
> >>>     *     - pool from the last valid element
> >>>     *     - the buffer size from this pool
> >>>     *     - zero offset
> >>> + *
> >>> + * Header split is a subset of buffer split. The split happens after the
> >>> + * packet header and before the packet payload. For PMDs that do not
> >>> + * support header split configuration by length, the location of the split
> >>> + * needs to be specified by the header protocol type. While for buffer split,
> >>> + * this field should not be configured.
> >>> + *
> >>> + * If RTE_ETH_RX_OFFLOAD_HEADER_SPLIT flag is set in offloads field,
> >>> + * the PMD will split the received packets into two separate regions:
> >>> + * - The header buffer will be allocated from the memory pool,
> >>> + *   specified in the first array element, the second buffer, from the
> >>> + *   pool in the second element.
> >>> + *
> >>> + * - The lengths do not need to be configured in header split.
> >>> + *
> >>> + * - The offsets from the segment description elements specify
> >>> + *   the data offset from the buffer beginning except the first mbuf.
> >>> + *   The first segment offset is added with RTE_PKTMBUF_HEADROOM.
> >>>     */
> >>>    struct rte_eth_rxseg_split {
> >>>    	struct rte_mempool *mp; /**< Memory pool to allocate segment from. */
> >>>    	uint16_t length; /**< Segment data length, configures split point. */
> >>>    	uint16_t offset; /**< Data offset from beginning of mbuf data buffer. */
> >>> -	uint32_t reserved; /**< Reserved field. */
> >>> +	uint16_t proto; /**< header protocol type, configures header split point. */
> >>
> >> I realize that you don't want to use here enum defined above to save
> >> some reserved space, but description must refer to the enum
> >> rte_eth_rx_header_split_protocol_type.
> >
> > Thanks for your suggestion, will fix it in next version.
> >
> >>
> >>> +	uint16_t reserved; /**< Reserved field. */
> >>
> >> As far as I can see the structure is experimental. So, it should not
> >> be the problem to extend it, but it is a really good question raised
> >> by Stephen in RFC
> >> v1 discussion.
> >> Shouldn't we require that all reserved fields are initialized to zero
> >> and ignored on processing? Frankly speaking I always thought so, but
> >> failed to find the place where it is documented.
> >
> > Yes, it can be documented. By default it should be zero, and we can
> > configure it to enable protocol type based buffer split.
> >
> >>
> >> @Thomas, @David, @Ferruh?
> >>
> >>>    };
> >>>
> >>>    /**
> >>> @@ -1212,7 +1231,7 @@ struct rte_eth_rxseg_split {
> >>>     * A common structure used to describe Rx packet segment properties.
> >>>     */
> >>>    union rte_eth_rxseg {
> >>> -	/* The settings for buffer split offload. */
> >>> +	/* The settings for buffer split and header split offload. */
> >>>    	struct rte_eth_rxseg_split split;
> >>>    	/* The other features settings should be added here. */
> >>>    };
> >>> @@ -1664,6 +1683,31 @@ struct rte_eth_conf {
> >>>    			     RTE_ETH_RX_OFFLOAD_QINQ_STRIP)
> >>>    #define DEV_RX_OFFLOAD_VLAN RTE_DEPRECATED(DEV_RX_OFFLOAD_VLAN) RTE_ETH_RX_OFFLOAD_VLAN
> >>>
> >>> +/**
> >>> + * @warning
> >>> + * @b EXPERIMENTAL: this enum may change without prior notice.
> >>> + * This enum indicates the header split protocol type
> >>> + */
> >>> +enum rte_eth_rx_header_split_protocol_type {
> >>> +	RTE_ETH_RX_HEADER_SPLIT_NONE = 0,
> >>> +	RTE_ETH_RX_HEADER_SPLIT_MAC,
> >>> +	RTE_ETH_RX_HEADER_SPLIT_IPV4,
> >>> +	RTE_ETH_RX_HEADER_SPLIT_IPV6,
> >>> +	RTE_ETH_RX_HEADER_SPLIT_L3,
> >>> +	RTE_ETH_RX_HEADER_SPLIT_TCP,
> >>> +	RTE_ETH_RX_HEADER_SPLIT_UDP,
> >>> +	RTE_ETH_RX_HEADER_SPLIT_SCTP,
> >>> +	RTE_ETH_RX_HEADER_SPLIT_L4,
> >>> +	RTE_ETH_RX_HEADER_SPLIT_INNER_MAC,
> >>> +	RTE_ETH_RX_HEADER_SPLIT_INNER_IPV4,
> >>> +	RTE_ETH_RX_HEADER_SPLIT_INNER_IPV6,
> >>> +	RTE_ETH_RX_HEADER_SPLIT_INNER_L3,
> >>> +	RTE_ETH_RX_HEADER_SPLIT_INNER_TCP,
> >>> +	RTE_ETH_RX_HEADER_SPLIT_INNER_UDP,
> >>> +	RTE_ETH_RX_HEADER_SPLIT_INNER_SCTP,
> >>> +	RTE_ETH_RX_HEADER_SPLIT_INNER_L4,
> >>
> >> Enumeration members should be documented. See my question in the
> >> patch description.
> >
> > Thanks for your detailed comments, questions are answered accordingly.
> >
> > Best Regards,
> > Xuan
> >
> >>
> >>> +};
> >>> +
> >>>    /*
> >>>     * If new Rx offload capabilities are defined, they also must be
> >>>     * mentioned in rte_rx_offload_names in rte_ethdev.c file.
> >
  
Ding, Xuan April 25, 2022, 3:05 p.m. UTC | #11
Hi Thomas,

> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Thursday, April 21, 2022 6:28 PM
> To: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>; Wu, WenxuanX
> <wenxuanx.wu@intel.com>; Li, Xiaoyun <xiaoyun.li@intel.com>; Singh,
> Aman Deep <aman.deep.singh@intel.com>; Zhang, Yuying
> <yuying.zhang@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>;
> dev@dpdk.org
> Cc: dev@dpdk.org; stephen@networkplumber.org;
> mb@smartsharesystems.com; viacheslavo@nvidia.com; Yu, Ping
> <ping.yu@intel.com>; Wang, YuanX <yuanx.wang@intel.com>;
> david.marchand@redhat.com; Ferruh Yigit <ferruhy@xilinx.com>; Ding, Xuan
> <xuan.ding@intel.com>
> Subject: Re: [v4 1/3] ethdev: introduce protocol type based header split
> 
> 12/04/2022 18:15, Ding, Xuan:
> > From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> > > On 4/2/22 13:41, wenxuanx.wu@intel.com wrote:
> > > > From: Xuan Ding <xuan.ding@intel.com>
> > > >
> > > > Header split consists of splitting a received packet into two
> > > > separate regions based on the packet content. The split happens
> > > > after the packet header and before the packet payload. Splitting
> > > > is usually between the packet header that can be posted to a
> > > > dedicated buffer and the packet payload that can be posted to a
> > > > different buffer.
> > > >
> > > > Currently, Rx buffer split supports length and offset based packet split.
> > > > Although header split is a subset of buffer split, configuring
> > > > buffer split based on length is not suitable for NICs that do
> > > > split based on header protocol types. Because tunneling makes the
> > > > conversion from length to protocol type impossible.
> > > >
> > > > This patch extends the current buffer split to support protocol
> > > > type and offset based header split. A new proto field is
> > > > introduced in the rte_eth_rxseg_split structure reserved field to
> > > > specify header protocol type. With Rx offload flag
> > > > RTE_ETH_RX_OFFLOAD_HEADER_SPLIT enabled and protocol type
> > > > configured, PMD will split the ingress packets into two separate
> > > > regions. Currently, both inner and outer
> > > > L2/L3/L4 level header split can be supported.
> > >
> > > RTE_ETH_RX_OFFLOAD_HEADER_SPLIT offload was introduced some time ago
> > > to substitute bit-field header_split in struct rte_eth_rxmode. It
> > > allows enabling header split offload with the header size
> > > controlled using split_hdr_size in the same structure.
> > >
> > > Right now I see no single PMD which actually supports
> > > RTE_ETH_RX_OFFLOAD_HEADER_SPLIT with above definition.
> > > Many examples and test apps initialize the field to 0 explicitly.
> > > Most drivers simply ignore split_hdr_size since the offload
> > > is not advertised, but some double-check that its value is 0.
> > >
> > > I think that it means that the field should be removed on the next
> > > LTS, and I'd say, together with the
> > > RTE_ETH_RX_OFFLOAD_HEADER_SPLIT offload bit.
> > >
> > > We should not redefine the offload meaning.
> >
> > Yes, you are right. No PMD supports RTE_ETH_RX_OFFLOAD_HEADER_SPLIT now.
> > Previously, I used this flag to distinguish buffer split from header split.
> > The former supports a multi-segment split by length and offset.
> > The latter supports a two-segment split by proto and offset.
> > At this level, header split is a subset of buffer split.
> >
> > Since we shouldn't redefine the meaning of this offload, I will use
> > the RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT flag.
> > Tunneling makes a proto field in buffer split necessary,
> > because some PMDs do not support split based on length and offset.
> 
> Before doing anything, the first patch of this series should make the current
> status clearer.
> Example, this line does not explain what it does:
> 	uint16_t split_hdr_size;  /**< hdr buf size (header_split enabled).*/
> And header_split has been removed in ab3ce1e0c193 ("ethdev: remove old
> offload API")
> 
> If RTE_ETH_RX_OFFLOAD_HEADER_SPLIT is not needed, let's add a comment
> to start a deprecation.

Agreed. As I discussed with Andrew before, RTE_ETH_RX_OFFLOAD_HEADER_SPLIT
is no longer supported by any PMD.

I can send a separate patch with a header split deprecation notice in 22.07,
and start removing the code in 22.11. What do you think?

> 
> > > > For example, let's suppose we configured the Rx queue with the
> > > > following segments:
> > > >      seg0 - pool0, off0=2B
> > > >      seg1 - pool1, off1=128B
> > >
> > > Corresponding feature is named Rx buffer split.
> > > Does it mean that protocol type based header split requires Rx
> > > buffer split feature to be supported?
> >
> > Protocol type based header split does not require Rx buffer split.
> > In the previous design, header split and buffer split were mutually
> > exclusive, because only one split offload is configured per Rx queue.
> 
> Things must be made clear and documented.

Thanks for your suggestion, the documents will be improved in v5.

> 
> > > > With header split type configured with
> > > > RTE_ETH_RX_HEADER_SPLIT_UDP, the packet consists of
> > > > MAC_IP_UDP_PAYLOAD will be split like following:
> > > >      seg0 - udp header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
> > > >      seg1 - payload @ 128 in mbuf from pool1
> > >
> > > Is it always the outermost UDP? Does it require both UDP over IPv4 and
> > > UDP over IPv6 to be supported? What will happen if only one is supported?
> > > How can the application find out which protocol stacks are supported?
> >
> > Both inner and outer UDP are considered.
> > The current design does not distinguish UDP over IPv4 from UDP over IPv6.
> > To support finer granularity, such as IPv4 only or IPv6 only,
> > the user needs to add more configuration.
> >
> > If the application wants to find out which protocol stacks are supported,
> > one way is to expose the protocol stacks supported by the driver through
> > dev_info.
> > Any thoughts are welcomed :)
> [...]
> > > > +	uint16_t reserved; /**< Reserved field. */
> > >
> > > As far as I can see the structure is experimental. So, it should not
> > > be the problem to extend it, but it is a really good question raised
> > > by Stephen in RFC
> > > v1 discussion.
> > > Shouldn't we require that all reserved fields are initialized to
> > > zero and ignored on processing? Frankly speaking I always thought
> > > so, but failed to find the place where it is documented.
> >
> > Yes, it can be documented. By default it should be zero, and we can
> > configure it to enable protocol type based buffer split.
> >
> > > @Thomas, @David, @Ferruh?
> 
> Yes that's very important to have a clear state of the reserved fields.
> A value must be set and documented.

Ditto, thanks for your comments. :)
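One possible way to give the reserved field a defined value, as requested above, is to require zero and reject anything else at queue setup. The sketch uses a hypothetical `struct seg_sketch` mirroring the v4 layout (16-bit proto plus 16-bit reserved replacing the original 32-bit reserved field) and a hypothetical `check_reserved_zero()`. Whether ethdev should reject non-zero values or only document them as "must be zero, ignored" is still the open question in this thread; this shows the stricter option.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of the v4 rte_eth_rxseg_split layout. */
struct seg_sketch {
	void *mp;          /* stands in for struct rte_mempool * */
	uint16_t length;
	uint16_t offset;
	uint16_t proto;
	uint16_t reserved; /* must be zero under the policy sketched here */
};

/* Stricter policy: a non-zero reserved field is a configuration error. */
static int check_reserved_zero(const struct seg_sketch *seg)
{
	return seg->reserved == 0 ? 0 : -EINVAL;
}
```

Applications can satisfy such a rule simply by zero-initializing the whole segment descriptor (for example with `memset()` or a `{0}` initializer) before filling in the fields they use.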

Regards,
Xuan

> 
>
  

Patch

diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 29a3d80466..29adcdc2f0 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -1661,6 +1661,7 @@  rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
 		struct rte_mempool *mpl = rx_seg[seg_idx].mp;
 		uint32_t length = rx_seg[seg_idx].length;
 		uint32_t offset = rx_seg[seg_idx].offset;
+		uint16_t proto = rx_seg[seg_idx].proto;
 
 		if (mpl == NULL) {
 			RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
@@ -1694,13 +1695,29 @@  rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
 		}
 		offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
 		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
-		length = length != 0 ? length : *mbp_buf_size;
-		if (*mbp_buf_size < length + offset) {
-			RTE_ETHDEV_LOG(ERR,
-				       "%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
-				       mpl->name, *mbp_buf_size,
-				       length + offset, length, offset);
-			return -EINVAL;
+		if (proto == RTE_ETH_RX_HEADER_SPLIT_NONE) {
+			/* Check buffer split. */
+			length = length != 0 ? length : *mbp_buf_size;
+			if (*mbp_buf_size < length + offset) {
+				RTE_ETHDEV_LOG(ERR,
+					"%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
+					mpl->name, *mbp_buf_size,
+					length + offset, length, offset);
+				return -EINVAL;
+			}
+		} else {
+			/* Check header split. */
+			if (length != 0) {
+				RTE_ETHDEV_LOG(ERR, "segment length should be set to zero in header split\n");
+				return -EINVAL;
+			}
+			if (*mbp_buf_size < offset) {
+				RTE_ETHDEV_LOG(ERR,
+					"%s mbuf_data_room_size %u < %u (segment offset)\n",
+					mpl->name, *mbp_buf_size,
+					offset);
+				return -EINVAL;
+			}
 		}
 	}
 	return 0;
@@ -1778,7 +1795,8 @@  rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
 		rx_seg = (const struct rte_eth_rxseg_split *)rx_conf->rx_seg;
 		n_seg = rx_conf->rx_nseg;
 
-		if (rx_conf->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
+		if (rx_conf->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT ||
+			rx_conf->offloads & RTE_ETH_RX_OFFLOAD_HEADER_SPLIT) {
 			ret = rte_eth_rx_queue_check_split(rx_seg, n_seg,
 							   &mbp_buf_size,
 							   &dev_info);
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index 04cff8ee10..e8371b98ed 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -1197,12 +1197,31 @@  struct rte_eth_txmode {
  *     - pool from the last valid element
  *     - the buffer size from this pool
  *     - zero offset
+ *
+ * Header split is a subset of buffer split. The split happens after the
+ * packet header and before the packet payload. For PMDs that do not
+ * support header split configuration by length, the location of the split
+ * needs to be specified by the header protocol type. While for buffer split,
+ * this field should not be configured.
+ *
+ * If RTE_ETH_RX_OFFLOAD_HEADER_SPLIT flag is set in offloads field,
+ * the PMD will split the received packets into two separate regions:
+ * - The header buffer will be allocated from the memory pool,
+ *   specified in the first array element, the second buffer, from the
+ *   pool in the second element.
+ *
+ * - The lengths do not need to be configured in header split.
+ *
+ * - The offsets from the segment description elements specify
+ *   the data offset from the buffer beginning except the first mbuf.
+ *   The first segment offset is added with RTE_PKTMBUF_HEADROOM.
  */
 struct rte_eth_rxseg_split {
 	struct rte_mempool *mp; /**< Memory pool to allocate segment from. */
 	uint16_t length; /**< Segment data length, configures split point. */
 	uint16_t offset; /**< Data offset from beginning of mbuf data buffer. */
-	uint32_t reserved; /**< Reserved field. */
+	uint16_t proto; /**< header protocol type, configures header split point. */
+	uint16_t reserved; /**< Reserved field. */
 };
 
 /**
@@ -1212,7 +1231,7 @@  struct rte_eth_rxseg_split {
  * A common structure used to describe Rx packet segment properties.
  */
 union rte_eth_rxseg {
-	/* The settings for buffer split offload. */
+	/* The settings for buffer split and header split offload. */
 	struct rte_eth_rxseg_split split;
 	/* The other features settings should be added here. */
 };
@@ -1664,6 +1683,31 @@  struct rte_eth_conf {
 			     RTE_ETH_RX_OFFLOAD_QINQ_STRIP)
 #define DEV_RX_OFFLOAD_VLAN RTE_DEPRECATED(DEV_RX_OFFLOAD_VLAN) RTE_ETH_RX_OFFLOAD_VLAN
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this enum may change without prior notice.
+ * This enum indicates the header split protocol type
+ */
+enum rte_eth_rx_header_split_protocol_type {
+	RTE_ETH_RX_HEADER_SPLIT_NONE = 0,
+	RTE_ETH_RX_HEADER_SPLIT_MAC,
+	RTE_ETH_RX_HEADER_SPLIT_IPV4,
+	RTE_ETH_RX_HEADER_SPLIT_IPV6,
+	RTE_ETH_RX_HEADER_SPLIT_L3,
+	RTE_ETH_RX_HEADER_SPLIT_TCP,
+	RTE_ETH_RX_HEADER_SPLIT_UDP,
+	RTE_ETH_RX_HEADER_SPLIT_SCTP,
+	RTE_ETH_RX_HEADER_SPLIT_L4,
+	RTE_ETH_RX_HEADER_SPLIT_INNER_MAC,
+	RTE_ETH_RX_HEADER_SPLIT_INNER_IPV4,
+	RTE_ETH_RX_HEADER_SPLIT_INNER_IPV6,
+	RTE_ETH_RX_HEADER_SPLIT_INNER_L3,
+	RTE_ETH_RX_HEADER_SPLIT_INNER_TCP,
+	RTE_ETH_RX_HEADER_SPLIT_INNER_UDP,
+	RTE_ETH_RX_HEADER_SPLIT_INNER_SCTP,
+	RTE_ETH_RX_HEADER_SPLIT_INNER_L4,
+};
+
 /*
  * If new Rx offload capabilities are defined, they also must be
  * mentioned in rte_rx_offload_names in rte_ethdev.c file.