[v7,2/4] ethdev: introduce protocol hdr based buffer split

Message ID: 20221001210521.15955-3-yuanx.wang@intel.com
State: Changes Requested, archived
Delegated to: Andrew Rybchenko
Series: support protocol based buffer split

Checks

Context        Check    Description
ci/checkpatch  success  coding style OK

Commit Message

Wang, YuanX Oct. 1, 2022, 9:05 p.m. UTC
  Currently, Rx buffer split supports length based split. With Rx queue
offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT enabled and Rx packet segment
configured, PMD will be able to split the received packets into
multiple segments.

However, length based buffer split is not suitable for NICs that split
based on protocol headers. Given an arbitrarily variable length in the Rx
packet segment, it is almost impossible to pass a fixed protocol header to
the driver. Besides, tunneling makes the composition of a packet vary,
which makes the situation even worse.

This patch extends the current buffer split to support protocol header
based buffer split. A new proto_hdr field replaces the reserved field
of the rte_eth_rxseg_split structure to specify the protocol header. The
proto_hdr field defines the split position of the packet; splitting always
happens after the protocol header defined in the Rx packet segment. When
the Rx queue offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT is enabled and the
corresponding protocol header is configured, the driver will split ingress
packets into multiple segments.

Examples of proto_hdr field definitions:
To split after ETH-IPV4-UDP, it should be defined as
RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_L4_UDP

For inner ETH-IPV4-UDP, it should be defined as
RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_UDP

struct rte_eth_rxseg_split {
        struct rte_mempool *mp; /* memory pools to allocate segment from */
        uint16_t length; /* segment maximal data length,
                            configures split point */
        uint16_t offset; /* data offset from beginning
                            of mbuf data buffer */
        /**
         * Proto_hdr defines a bit mask of the protocol sequence as
         * RTE_PTYPE_*, configures split point. The last RTE_PTYPE*
         * in the mask indicates the split position.
         * For non-tunneling packets, the complete protocol sequence
         * should be defined.
         * For tunneling packets, for simplicity, only the tunnel and
         * inner protocol sequence should be defined.
         */
        uint32_t proto_hdr;
};
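
As an illustration of "the last RTE_PTYPE* in the mask indicates the split
position", the sketch below finds the deepest protocol layer present in a
proto_hdr mask. It is not part of the patch. The DEMO_* masks are local
stand-ins intended to mirror the RTE_PTYPE_*_MASK layout in rte_mbuf_ptype.h,
and the enum and function names are hypothetical.

```c
#include <stdint.h>

/* Stand-in field masks, intended to mirror the RTE_PTYPE_*_MASK layout in
 * rte_mbuf_ptype.h. Redefined here so the sketch compiles without DPDK. */
#define DEMO_L2_MASK       0x0000000fu
#define DEMO_L3_MASK       0x000000f0u
#define DEMO_L4_MASK       0x00000f00u
#define DEMO_TUNNEL_MASK   0x0000f000u
#define DEMO_INNER_L2_MASK 0x000f0000u
#define DEMO_INNER_L3_MASK 0x00f00000u
#define DEMO_INNER_L4_MASK 0x0f000000u

/* Hypothetical names for the layer after which the split would happen. */
enum demo_split_layer {
	DEMO_SPLIT_NONE,
	DEMO_SPLIT_L2,
	DEMO_SPLIT_L3,
	DEMO_SPLIT_L4,
	DEMO_SPLIT_TUNNEL,
	DEMO_SPLIT_INNER_L2,
	DEMO_SPLIT_INNER_L3,
	DEMO_SPLIT_INNER_L4,
};

/* Return the deepest layer that has any bit set in proto_hdr. */
static enum demo_split_layer
demo_split_layer(uint32_t proto_hdr)
{
	if (proto_hdr & DEMO_INNER_L4_MASK)
		return DEMO_SPLIT_INNER_L4;
	if (proto_hdr & DEMO_INNER_L3_MASK)
		return DEMO_SPLIT_INNER_L3;
	if (proto_hdr & DEMO_INNER_L2_MASK)
		return DEMO_SPLIT_INNER_L2;
	if (proto_hdr & DEMO_TUNNEL_MASK)
		return DEMO_SPLIT_TUNNEL;
	if (proto_hdr & DEMO_L4_MASK)
		return DEMO_SPLIT_L4;
	if (proto_hdr & DEMO_L3_MASK)
		return DEMO_SPLIT_L3;
	if (proto_hdr & DEMO_L2_MASK)
		return DEMO_SPLIT_L2;
	return DEMO_SPLIT_NONE;
}
```

With these stand-in values, an outer ETH-IPV4-UDP mask resolves to the L4
layer and the tunnel example resolves to the inner L4 layer.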

If protocol header split is supported by a PMD, the
rte_eth_buffer_split_get_supported_hdr_ptypes function can
be used to obtain a list of these protocol headers.

For example, let's suppose we configured the Rx queue with the
following segments:
        seg0 - pool0, proto_hdr0=RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4,
               off0=2B
        seg1 - pool1, proto_hdr1=RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4
               | RTE_PTYPE_L4_UDP, off1=128B
        seg2 - pool2, off2=0B

A packet consisting of ETH_IPV4_UDP_PAYLOAD will be split as
follows:
        seg0 - ipv4 header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
        seg1 - udp header @ 128 in mbuf from pool1
        seg2 - payload @ 0 in mbuf from pool2
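
The three-segment configuration above could be written roughly as follows.
This is a sketch only: struct demo_rxseg_split is a local mirror of the
structure proposed in this patch (with a void pointer in place of
struct rte_mempool *), the DEMO_PTYPE_* values are stand-ins intended to
match rte_mbuf_ptype.h, and demo_fill_segments is a hypothetical helper. In
an application the array would be passed through the rx_seg and rx_nseg
members of the Rx queue configuration.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in ptype bits, intended to match rte_mbuf_ptype.h. */
#define DEMO_PTYPE_L2_ETHER 0x00000001u
#define DEMO_PTYPE_L3_IPV4  0x00000010u
#define DEMO_PTYPE_L4_UDP   0x00000200u

/* Local mirror of the structure proposed in this patch. */
struct demo_rxseg_split {
	void *mp;           /* pool to allocate the segment from */
	uint16_t length;    /* left at 0 in protocol header mode */
	uint16_t offset;    /* data offset inside the mbuf data buffer */
	uint32_t proto_hdr; /* protocol sequence; split after the last one */
};

/* Fill the three segments of the example; unnamed fields become 0. */
static void
demo_fill_segments(struct demo_rxseg_split seg[3],
		   void *pool0, void *pool1, void *pool2)
{
	/* seg0: split after ETH + IPv4, data starts 2 bytes in. */
	seg[0] = (struct demo_rxseg_split){
		.mp = pool0,
		.offset = 2,
		.proto_hdr = DEMO_PTYPE_L2_ETHER | DEMO_PTYPE_L3_IPV4,
	};
	/* seg1: split after ETH + IPv4 + UDP, data starts 128 bytes in. */
	seg[1] = (struct demo_rxseg_split){
		.mp = pool1,
		.offset = 128,
		.proto_hdr = DEMO_PTYPE_L2_ETHER | DEMO_PTYPE_L3_IPV4 |
			     DEMO_PTYPE_L4_UDP,
	};
	/* seg2: payload. The example gives no proto_hdr for it, so it is
	 * left at 0 here; that choice is an assumption of this sketch. */
	seg[2] = (struct demo_rxseg_split){
		.mp = pool2,
		.offset = 0,
	};
}
```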

Note: the NIC will only split packets that exactly match all the
protocol headers in the segments. For example, if ARP packets are received
with the above config, the NIC won't split them since they do not
contain IPv4 and UDP headers. These packets will be put
into the last valid mempool, with zero offset.

Now buffer split can be configured in two modes. For length based
buffer split, the mp, length and offset fields in the Rx packet segment
should be configured, while the proto_hdr field will be ignored.
For protocol header based buffer split, the mp, offset and proto_hdr fields
in the Rx packet segment should be configured, while the length field will
be ignored.
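
A minimal sketch of how a segment could be classified into one of the two
modes is shown below. The mode is chosen by whether proto_hdr is non-zero,
which matches the "proto_hdr > 0" test in the patch. The strict flag is an
assumption of this sketch: it models a tighter variant that rejects a
non-zero length in protocol header mode instead of ignoring it. All names
are hypothetical.

```c
#include <errno.h>
#include <stdint.h>

enum demo_split_mode {
	DEMO_MODE_LENGTH = 0, /* split at a fixed length */
	DEMO_MODE_PROTO = 1,  /* split after a protocol header */
};

/*
 * Classify one segment description.
 * Returns a demo_split_mode value, or -EINVAL when strict checking is
 * requested and length is set together with proto_hdr.
 */
static int
demo_classify_seg(uint16_t length, uint32_t proto_hdr, int strict)
{
	if (proto_hdr == 0)
		return DEMO_MODE_LENGTH;
	if (strict && length != 0)
		return -EINVAL;
	return DEMO_MODE_PROTO;
}
```

With strict set to 0 the helper follows the commit message (the unused
field is ignored); with strict set to 1 a mixed configuration is reported
as an error.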

The split limitations imposed by the underlying driver are reported in the
rte_eth_dev_info->rx_seg_capa field. The memory attributes of the split
parts may also differ, for example DPDK memory and external memory.

Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
---
 doc/guides/rel_notes/release_22_11.rst |  7 +++
 lib/ethdev/rte_ethdev.c                | 74 ++++++++++++++++++++++----
 lib/ethdev/rte_ethdev.h                | 29 +++++++++-
 3 files changed, 98 insertions(+), 12 deletions(-)
  

Comments

Wang, YuanX Oct. 2, 2022, 4:01 a.m. UTC | #1
Hi All,

Could you please review and provide suggestions, if any?

Thanks,
Yuan

> -----Original Message-----
> From: Wang, YuanX <yuanx.wang@intel.com>
> Sent: Sunday, October 2, 2022 5:05 AM
> To: dev@dpdk.org; Thomas Monjalon <thomas@monjalon.net>; Ferruh Yigit
> <ferruh.yigit@amd.com>; Andrew Rybchenko
> <andrew.rybchenko@oktetlabs.ru>
> Cc: ferruh.yigit@xilinx.com; mdr@ashroe.eu; Li, Xiaoyun
> <xiaoyun.li@intel.com>; Singh, Aman Deep <aman.deep.singh@intel.com>;
> Zhang, Yuying <yuying.zhang@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>; Yang, Qiming <qiming.yang@intel.com>;
> jerinjacobk@gmail.com; viacheslavo@nvidia.com;
> stephen@networkplumber.org; Ding, Xuan <xuan.ding@intel.com>;
> hpothula@marvell.com; Tang, Yaqi <yaqi.tang@intel.com>; Wang, YuanX
> <yuanx.wang@intel.com>; Wenxuan Wu <wenxuanx.wu@intel.com>
> Subject: [PATCH v7 2/4] ethdev: introduce protocol hdr based buffer split
> 
> Currently, Rx buffer split supports length based split. With Rx queue offload
> RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT enabled and Rx packet segment
> configured, PMD will be able to split the received packets into multiple
> segments.
> 
> However, length based buffer split is not suitable for NICs that do split based
> on protocol headers. Given an arbitrarily variable length in Rx packet
> segment, it is almost impossible to pass a fixed protocol header to driver.
> Besides, the existence of tunneling results in the composition of a packet is
> various, which makes the situation even worse.
> 
> This patch extends current buffer split to support protocol header based
> buffer split. A new proto_hdr field is introduced in the reserved field of
> rte_eth_rxseg_split structure to specify protocol header. The proto_hdr field
> defines the split position of packet, splitting will always happen after the
> protocol header defined in the Rx packet segment. When Rx queue offload
> RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT is enabled and corresponding
> protocol header is configured, driver will split the ingress packets into
> multiple segments.
> 
> Examples for proto_hdr field defines:
> To split after ETH-IPV4-UDP, it should be defined as RTE_PTYPE_L2_ETHER |
> RTE_PTYPE_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_L4_UDP
> 
> For inner ETH-IPV4-UDP, it should be defined as
> RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
> RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_UDP
> 
> struct rte_eth_rxseg_split {
>         struct rte_mempool *mp; /* memory pools to allocate segment from */
>         uint16_t length; /* segment maximal data length,
>                             configures split point */
>         uint16_t offset; /* data offset from beginning
>                             of mbuf data buffer */
>         /**
> 	 * Proto_hdr defines a bit mask of the protocol sequence as
>          * RTE_PTYPE_*, configures split point. The last RTE_PTYPE*
>          * in the mask indicates the split position.
> 	 * For non-tunneling packets, the complete protocol sequence
>          * should be defined.
> 	 * For tunneling packets, for simplicity, only the tunnel and
>          * inner protocol sequence should be defined.
> 	 */
>         uint32_t proto_hdr;
> };
> 
> If protocol header split can be supported by a PMD, the
> rte_eth_buffer_split_get_supported_hdr_ptypes function can be use to
> obtain a list of these protocol headers.
> 
> For example, let's suppose we configured the Rx queue with the following
> segments:
>         seg0 - pool0, proto_hdr0=RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4,
>                off0=2B
>         seg1 - pool1, proto_hdr1=RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4
>                | RTE_PTYPE_L4_UDP, off1=128B
>         seg2 - pool2, off1=0B
> 
> The packet consists of ETH_IPV4_UDP_PAYLOAD will be split like
> following:
>         seg0 - ipv4 header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
>         seg1 - udp header @ 128 in mbuf from pool1
>         seg2 - payload @ 0 in mbuf from pool2
> 
> Note: NIC will only do split when the packets exactly match all the protocol
> headers in the segments. For example, if ARP packets received with above
> config, the NIC won't do split for ARP packets since it does not contains ipv4
> header and udp header. These packets will be put into the last valid
> mempool, with zero offset.
> 
> Now buffer split can be configured in two modes. For length based buffer
> split, the mp, length, offset field in Rx packet segment should be configured,
> while the proto_hdr field will be ignored.
> For protocol header based buffer split, the mp, offset, proto_hdr field in Rx
> packet segment should be configured, while the length field will be ignored.
> 
> The split limitations imposed by underlying driver is reported in the
> rte_eth_dev_info->rx_seg_capa field. The memory attributes for the split
> parts may differ either, dpdk memory and external memory, respectively.
> 
> Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
> Signed-off-by: Xuan Ding <xuan.ding@intel.com>
> Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
> ---
>  doc/guides/rel_notes/release_22_11.rst |  7 +++
>  lib/ethdev/rte_ethdev.c                | 74 ++++++++++++++++++++++----
>  lib/ethdev/rte_ethdev.h                | 29 +++++++++-
>  3 files changed, 98 insertions(+), 12 deletions(-)
> 
> diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
> index 6a7474a3d6..510869c73a 100644
> --- a/doc/guides/rel_notes/release_22_11.rst
> +++ b/doc/guides/rel_notes/release_22_11.rst
> @@ -101,6 +101,13 @@ New Features
>    * Added ``rte_eth_buffer_split_get_supported_hdr_ptypes()``, to get supported
>      header protocols of a PMD to split.
> 
> +* **Added protocol header based buffer split.**
> +
> +  * Ethdev: The ``reserved`` field in the ``rte_eth_rxseg_split`` structure is
> +    replaced with ``proto_hdr`` to support protocol header based buffer split.
> +    User can choose length or protocol header to configure buffer split
> +    according to NIC's capability.
> +
> 
>  Removed Items
>  -------------
> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
> index 1f0a7f8f3f..27ec19faed 100644
> --- a/lib/ethdev/rte_ethdev.c
> +++ b/lib/ethdev/rte_ethdev.c
> @@ -1649,9 +1649,10 @@ rte_eth_dev_is_removed(uint16_t port_id)
>  }
> 
>  static int
> -rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
> -			     uint16_t n_seg, uint32_t *mbp_buf_size,
> -			     const struct rte_eth_dev_info *dev_info)
> +rte_eth_rx_queue_check_split(uint16_t port_id,
> +			const struct rte_eth_rxseg_split *rx_seg,
> +			uint16_t n_seg, uint32_t *mbp_buf_size,
> +			const struct rte_eth_dev_info *dev_info)
>  {
>  	const struct rte_eth_rxseg_capa *seg_capa = &dev_info->rx_seg_capa;
>  	struct rte_mempool *mp_first;
> @@ -1674,6 +1675,7 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
>  		struct rte_mempool *mpl = rx_seg[seg_idx].mp;
>  		uint32_t length = rx_seg[seg_idx].length;
>  		uint32_t offset = rx_seg[seg_idx].offset;
> +		uint32_t proto_hdr = rx_seg[seg_idx].proto_hdr;
> 
>  		if (mpl == NULL) {
>  			RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
> @@ -1707,13 +1709,63 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
>  		}
>  		offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
>  		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
> -		length = length != 0 ? length : *mbp_buf_size;
> -		if (*mbp_buf_size < length + offset) {
> -			RTE_ETHDEV_LOG(ERR,
> -				       "%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
> -				       mpl->name, *mbp_buf_size,
> -				       length + offset, length, offset);
> -			return -EINVAL;
> +
> +		if (proto_hdr > 0) {
> +			/* Split based on protocol headers. */
> +
> +			/* skip the payload */
> +			if (proto_hdr == RTE_PTYPE_ALL_MASK)
> +				continue;
> +
> +			int ptype_cnt;
> +
> +			ptype_cnt = rte_eth_buffer_split_get_supported_hdr_ptypes(port_id, NULL, 0);
> +			if (ptype_cnt <= 0) {
> +				RTE_ETHDEV_LOG(ERR,
> +					"Port %u failed to supported buffer split header protocols\n",
> +					port_id);
> +				return -EINVAL;
> +			}
> +
> +			uint32_t ptypes[ptype_cnt];
> +			int i;
> +
> +			ptype_cnt = rte_eth_buffer_split_get_supported_hdr_ptypes(port_id,
> +										ptypes, ptype_cnt);
> +			if (ptype_cnt < 0) {
> +				RTE_ETHDEV_LOG(ERR,
> +					"Port %u failed to supported buffer split header protocols\n",
> +					port_id);
> +				return -EINVAL;
> +			}
> +
> +			for (i = 0; i < ptype_cnt; i++)
> +				if (ptypes[i] == proto_hdr)
> +					break;
> +			if (i == ptype_cnt) {
> +				RTE_ETHDEV_LOG(ERR,
> +					"Requested Rx split header protocols 0x%x is not supported.\n",
> +					proto_hdr);
> +				return -EINVAL;
> +			}
> +
> +			if (*mbp_buf_size < offset) {
> +				RTE_ETHDEV_LOG(ERR,
> +						"%s mbuf_data_room_size %u < %u segment offset)\n",
> +						mpl->name, *mbp_buf_size,
> +						offset);
> +				return -EINVAL;
> +			}
> +		} else {
> +			/* Split at fixed length. */
> +			length = length != 0 ? length : *mbp_buf_size;
> +			if (*mbp_buf_size < length + offset) {
> +				RTE_ETHDEV_LOG(ERR,
> +					"%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
> +					mpl->name, *mbp_buf_size,
> +					length + offset, length, offset);
> +				return -EINVAL;
> +			}
>  		}
>  	}
>  	return 0;
> @@ -1793,7 +1845,7 @@ rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
>  		n_seg = rx_conf->rx_nseg;
> 
>  		if (rx_conf->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
> -			ret = rte_eth_rx_queue_check_split(rx_seg, n_seg,
> +			ret = rte_eth_rx_queue_check_split(port_id, rx_seg, n_seg,
>  							   &mbp_buf_size,
>  							   &dev_info);
>  			if (ret != 0)
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> index cf14e04010..a5f9647bd3 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -994,6 +994,9 @@ struct rte_eth_txmode {
>   *   specified in the first array element, the second buffer, from the
>   *   pool in the second element, and so on.
>   *
> + * - The proto_hdrs in the elements define the split position of
> + *   received packets.
> + *
>   * - The offsets from the segment description elements specify
>   *   the data offset from the buffer beginning except the first mbuf.
>   *   The first segment offset is added with RTE_PKTMBUF_HEADROOM.
> @@ -1015,12 +1018,36 @@ struct rte_eth_txmode {
>   *     - pool from the last valid element
>   *     - the buffer size from this pool
>   *     - zero offset
> + *
> + * - Length based buffer split:
> + *     - mp, length, offset should be configured.
> + *     - The proto_hdr field will be ignored.
> + *
> + * - Protocol header based buffer split:
> + *     - mp, offset, proto_hdr should be configured.
> + *     - The length field will be ignored.
> + *
> + * - For Protocol header based buffer split, if the received packets
> + *   don't exactly match all protocol headers in the elements, packets
> + *   will not be split.
> + *   These packets will be put into:
> + *     - pool from the last valid element
> + *     - the buffer size from this pool
> + *     - zero offset
>   */
>  struct rte_eth_rxseg_split {
>  	struct rte_mempool *mp; /**< Memory pool to allocate segment from. */
>  	uint16_t length; /**< Segment data length, configures split point. */
>  	uint16_t offset; /**< Data offset from beginning of mbuf data buffer. */
> -	uint32_t reserved; /**< Reserved field. */
> +	/**
> +	 * Proto_hdr defines a bit mask of the protocol sequence as RTE_PTYPE_*,
> +	 * configures split point. The last RTE_PTYPE* in the mask indicates the
> +	 * split position.
> +	 * For non-tunneling packets, the complete protocol sequence should be defined.
> +	 * For tunneling packets, for simplicity, only the tunnel and inner
> +	 * protocol sequence should be defined.
> +	 */
> +	uint32_t proto_hdr;
>  };
> 
>  /**
> --
> 2.25.1
  
Andrew Rybchenko Oct. 3, 2022, 7:47 a.m. UTC | #2
On 10/2/22 00:05, Yuan Wang wrote:
> Currently, Rx buffer split supports length based split. With Rx queue
> offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT enabled and Rx packet segment
> configured, PMD will be able to split the received packets into
> multiple segments.
> 
> However, length based buffer split is not suitable for NICs that do split
> based on protocol headers. Given an arbitrarily variable length in Rx
> packet segment, it is almost impossible to pass a fixed protocol header to
> driver. Besides, the existence of tunneling results in the composition of
> a packet is various, which makes the situation even worse.
> 
> This patch extends current buffer split to support protocol header based
> buffer split. A new proto_hdr field is introduced in the reserved field
> of rte_eth_rxseg_split structure to specify protocol header. The proto_hdr
> field defines the split position of packet, splitting will always happen
> after the protocol header defined in the Rx packet segment. When Rx queue
> offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT is enabled and corresponding
> protocol header is configured, driver will split the ingress packets into
> multiple segments.
> 
> Examples for proto_hdr field defines:
> To split after ETH-IPV4-UDP, it should be defined as
> RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_L4_UDP
> 
> For inner ETH-IPV4-UDP, it should be defined as
> RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
> RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_UDP
> 
> struct rte_eth_rxseg_split {
>          struct rte_mempool *mp; /* memory pools to allocate segment from */
>          uint16_t length; /* segment maximal data length,
>                              configures split point */
>          uint16_t offset; /* data offset from beginning
>                              of mbuf data buffer */
>          /**
> 	 * Proto_hdr defines a bit mask of the protocol sequence as
>           * RTE_PTYPE_*, configures split point. The last RTE_PTYPE*
>           * in the mask indicates the split position.
> 	 * For non-tunneling packets, the complete protocol sequence
>           * should be defined.
> 	 * For tunneling packets, for simplicity, only the tunnel and
>           * inner protocol sequence should be defined.
> 	 */
>          uint32_t proto_hdr;
> };
> 
> If protocol header split can be supported by a PMD, the
> rte_eth_buffer_split_get_supported_hdr_ptypes function can
> be use to obtain a list of these protocol headers.
> 
> For example, let's suppose we configured the Rx queue with the
> following segments:
>          seg0 - pool0, proto_hdr0=RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4,
>                 off0=2B
>          seg1 - pool1, proto_hdr1=RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4
>                 | RTE_PTYPE_L4_UDP, off1=128B
>          seg2 - pool2, off1=0B
> 
> The packet consists of ETH_IPV4_UDP_PAYLOAD will be split like
> following:
>          seg0 - ipv4 header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
>          seg1 - udp header @ 128 in mbuf from pool1
>          seg2 - payload @ 0 in mbuf from pool2
> 
> Note: NIC will only do split when the packets exactly match all the
> protocol headers in the segments. For example, if ARP packets received
> with above config, the NIC won't do split for ARP packets since
> it does not contains ipv4 header and udp header. These packets will be put
> into the last valid mempool, with zero offset.
> 
> Now buffer split can be configured in two modes. For length based
> buffer split, the mp, length, offset field in Rx packet segment should
> be configured, while the proto_hdr field will be ignored.
> For protocol header based buffer split, the mp, offset, proto_hdr field
> in Rx packet segment should be configured, while the length field will
> be ignored.
> 
> The split limitations imposed by underlying driver is reported in the
> rte_eth_dev_info->rx_seg_capa field. The memory attributes for the split
> parts may differ either, dpdk memory and external memory, respectively.
> 
> Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
> Signed-off-by: Xuan Ding <xuan.ding@intel.com>
> Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>

I apologize for the delay with the review. Overall LGTM now. See a few
notes below.

> ---
>   doc/guides/rel_notes/release_22_11.rst |  7 +++
>   lib/ethdev/rte_ethdev.c                | 74 ++++++++++++++++++++++----
>   lib/ethdev/rte_ethdev.h                | 29 +++++++++-
>   3 files changed, 98 insertions(+), 12 deletions(-)
> 
> diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
> index 6a7474a3d6..510869c73a 100644
> --- a/doc/guides/rel_notes/release_22_11.rst
> +++ b/doc/guides/rel_notes/release_22_11.rst
> @@ -101,6 +101,13 @@ New Features
>     * Added ``rte_eth_buffer_split_get_supported_hdr_ptypes()``, to get supported
>       header protocols of a PMD to split.
>   
> +* **Added protocol header based buffer split.**
> +
> +  * Ethdev: The ``reserved`` field in the ``rte_eth_rxseg_split`` structure is
> +    replaced with ``proto_hdr`` to support protocol header based buffer split.
> +    User can choose length or protocol header to configure buffer split
> +    according to NIC's capability.
> +

It should be grouped together with other ethdev features.

>   
>   Removed Items
>   -------------
> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
> index 1f0a7f8f3f..27ec19faed 100644
> --- a/lib/ethdev/rte_ethdev.c
> +++ b/lib/ethdev/rte_ethdev.c
> @@ -1649,9 +1649,10 @@ rte_eth_dev_is_removed(uint16_t port_id)
>   }
>   
>   static int
> -rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
> -			     uint16_t n_seg, uint32_t *mbp_buf_size,
> -			     const struct rte_eth_dev_info *dev_info)
> +rte_eth_rx_queue_check_split(uint16_t port_id,
> +			const struct rte_eth_rxseg_split *rx_seg,
> +			uint16_t n_seg, uint32_t *mbp_buf_size,
> +			const struct rte_eth_dev_info *dev_info)
>   {
>   	const struct rte_eth_rxseg_capa *seg_capa = &dev_info->rx_seg_capa;
>   	struct rte_mempool *mp_first;
> @@ -1674,6 +1675,7 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
>   		struct rte_mempool *mpl = rx_seg[seg_idx].mp;
>   		uint32_t length = rx_seg[seg_idx].length;
>   		uint32_t offset = rx_seg[seg_idx].offset;
> +		uint32_t proto_hdr = rx_seg[seg_idx].proto_hdr;
>   
>   		if (mpl == NULL) {
>   			RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
> @@ -1707,13 +1709,63 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
>   		}
>   		offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
>   		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
> -		length = length != 0 ? length : *mbp_buf_size;
> -		if (*mbp_buf_size < length + offset) {
> -			RTE_ETHDEV_LOG(ERR,
> -				       "%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
> -				       mpl->name, *mbp_buf_size,
> -				       length + offset, length, offset);
> -			return -EINVAL;
> +
> +		if (proto_hdr > 0) {
> +			/* Split based on protocol headers. */

Isn't it safer here to ensure that segment length is set to 0?
Just to protect against misuse, etc.

> +
> +			/* skip the payload */

Sorry, it is confusing. What do you mean here?

> +			if (proto_hdr == RTE_PTYPE_ALL_MASK)
> +				continue;
> +
> +			int ptype_cnt;
> +
> +			ptype_cnt = rte_eth_buffer_split_get_supported_hdr_ptypes(port_id, NULL, 0);
> +			if (ptype_cnt <= 0) {
> +				RTE_ETHDEV_LOG(ERR,
> +					"Port %u failed to supported buffer split header protocols\n",
> +					port_id);
> +				return -EINVAL;
> +			}
> +
> +			uint32_t ptypes[ptype_cnt];
> +			int i;

First of all, do not mix code and variable declarations.
It significantly complicates code reading.
Second, creating an array on the stack sized by a function
return value is very dangerous from a security point of
view: potential stack overflow and corresponding
vulnerabilities.
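
One way to address both points is sketched below: declarations stay at the
top of the block, and the variable-length array is replaced by a bounded
heap allocation. This is an illustration, not the code that was merged.
demo_query is a hypothetical stand-in with the same call shape as
rte_eth_buffer_split_get_supported_hdr_ptypes() as used in the patch, and
DEMO_MAX_PTYPES is an arbitrary sanity cap chosen for the sketch.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_MAX_PTYPES 64 /* arbitrary sanity cap, an assumption */

/* Same call shape as the query used in the patch: (port, array, size). */
typedef int (*demo_query_fn)(uint16_t port_id, uint32_t *ptypes, int num);

/* Hypothetical stand-in driver query that reports two supported masks. */
static int
demo_query(uint16_t port_id, uint32_t *ptypes, int num)
{
	static const uint32_t supported[] = { 0x00000011u, 0x00000211u };
	const int total = (int)(sizeof(supported) / sizeof(supported[0]));
	int i;

	(void)port_id;
	for (i = 0; ptypes != NULL && i < num && i < total; i++)
		ptypes[i] = supported[i];
	return total;
}

/* Return 0 if proto_hdr is supported, a negative errno value otherwise. */
static int
demo_check_proto_hdr(demo_query_fn query, uint16_t port_id, uint32_t proto_hdr)
{
	uint32_t *ptypes;
	int cnt, got, i;
	int ret = -EINVAL;

	/* First call: ask only for the number of entries. */
	cnt = query(port_id, NULL, 0);
	if (cnt <= 0 || cnt > DEMO_MAX_PTYPES)
		return -EINVAL;

	ptypes = calloc((size_t)cnt, sizeof(*ptypes));
	if (ptypes == NULL)
		return -ENOMEM;

	/* Second call: fetch the entries, never reading past the buffer. */
	got = query(port_id, ptypes, cnt);
	if (got > cnt)
		got = cnt;
	for (i = 0; i < got; i++) {
		if (ptypes[i] == proto_hdr) {
			ret = 0;
			break;
		}
	}

	free(ptypes);
	return ret;
}
```

Queue setup is a control-path operation, so a heap allocation here is
unlikely to matter for performance; a fixed-size array with the same cap
would be another option.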

> +
> +			ptype_cnt = rte_eth_buffer_split_get_supported_hdr_ptypes(port_id,
> +										ptypes, ptype_cnt);
> +			if (ptype_cnt < 0) {
> +				RTE_ETHDEV_LOG(ERR,
> +					"Port %u failed to supported buffer split header protocols\n",
> +					port_id);
> +				return -EINVAL;
> +			}
> +
> +			for (i = 0; i < ptype_cnt; i++)
> +				if (ptypes[i] == proto_hdr)
> +					break;
> +			if (i == ptype_cnt) {
> +				RTE_ETHDEV_LOG(ERR,
> +					"Requested Rx split header protocols 0x%x is not supported.\n",
> +					proto_hdr);
> +				return -EINVAL;
> +			}
> +
> +			if (*mbp_buf_size < offset) {

The check is obviously insufficient, but I agree that it should
be the driver's responsibility to do extra checks for required space
in mbuf.

> +				RTE_ETHDEV_LOG(ERR,
> +						"%s mbuf_data_room_size %u < %u segment offset)\n",
> +						mpl->name, *mbp_buf_size,
> +						offset);
> +				return -EINVAL;
> +			}
> +		} else {
> +			/* Split at fixed length. */
> +			length = length != 0 ? length : *mbp_buf_size;
> +			if (*mbp_buf_size < length + offset) {
> +				RTE_ETHDEV_LOG(ERR,
> +					"%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
> +					mpl->name, *mbp_buf_size,
> +					length + offset, length, offset);
> +				return -EINVAL;
> +			}
>   		}
>   	}
>   	return 0;
> @@ -1793,7 +1845,7 @@ rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
>   		n_seg = rx_conf->rx_nseg;
>   
>   		if (rx_conf->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
> -			ret = rte_eth_rx_queue_check_split(rx_seg, n_seg,
> +			ret = rte_eth_rx_queue_check_split(port_id, rx_seg, n_seg,
>   							   &mbp_buf_size,
>   							   &dev_info);
>   			if (ret != 0)
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> index cf14e04010..a5f9647bd3 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -994,6 +994,9 @@ struct rte_eth_txmode {
>    *   specified in the first array element, the second buffer, from the
>    *   pool in the second element, and so on.
>    *
> + * - The proto_hdrs in the elements define the split position of
> + *   received packets.
> + *
>    * - The offsets from the segment description elements specify
>    *   the data offset from the buffer beginning except the first mbuf.
>    *   The first segment offset is added with RTE_PKTMBUF_HEADROOM.
> @@ -1015,12 +1018,36 @@ struct rte_eth_txmode {
>    *     - pool from the last valid element
>    *     - the buffer size from this pool
>    *     - zero offset
> + *
> + * - Length based buffer split:
> + *     - mp, length, offset should be configured.
> + *     - The proto_hdr field will be ignored.

Looking at the code above I think proto_hdr must be 0.

> + *
> + * - Protocol header based buffer split:
> + *     - mp, offset, proto_hdr should be configured.
> + *     - The length field will be ignored.

I'd require length to be 0 to avoid misuse of the API.

> + *
> + * - For Protocol header based buffer split, if the received packets
> + *   don't exactly match all protocol headers in the elements, packets
> + *   will not be split.
> + *   These packets will be put into:
> + *     - pool from the last valid element
> + *     - the buffer size from this pool
> + *     - zero offset

Shouldn't we check that the data room in the last segment mempool
is sufficient for up to an MTU-sized packet if Rx scatter is disabled?
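
The check asked about here could look roughly like the sketch below. It
assumes that unsplit packets land at zero offset in a buffer from the last
pool, as the doc comment states, and that the caller already knows the
maximum frame length for the configured MTU. The function and parameter
names are hypothetical and nothing like this is in the patch.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Return 0 if an unsplit packet of up to max_frame_len bytes fits into
 * one buffer of the last pool, or if Rx scatter can chain buffers.
 * Return -EINVAL otherwise.
 */
static int
demo_check_unsplit_fit(uint32_t last_pool_data_room, uint32_t max_frame_len,
		       int rx_scatter_enabled)
{
	/* With scatter enabled, a long packet may span several mbufs. */
	if (rx_scatter_enabled)
		return 0;
	/* Zero offset: the whole data room is available to the packet. */
	return last_pool_data_room >= max_frame_len ? 0 : -EINVAL;
}
```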

>    */
>   struct rte_eth_rxseg_split {
>   	struct rte_mempool *mp; /**< Memory pool to allocate segment from. */
>   	uint16_t length; /**< Segment data length, configures split point. */
>   	uint16_t offset; /**< Data offset from beginning of mbuf data buffer. */
> -	uint32_t reserved; /**< Reserved field. */
> +	/**
> +	 * Proto_hdr defines a bit mask of the protocol sequence as RTE_PTYPE_*,
> +	 * configures split point. The last RTE_PTYPE* in the mask indicates the
> +	 * split position.
> +	 * For non-tunneling packets, the complete protocol sequence should be defined.
> +	 * For tunneling packets, for simplicity, only the tunnel and inner
> +	 * protocol sequence should be defined.
> +	 */
> +	uint32_t proto_hdr;
>   };
>   
>   /**
  
Wang, YuanX Oct. 4, 2022, 2:48 a.m. UTC | #3
Hi Andrew,

> -----Original Message-----
> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Sent: Monday, October 3, 2022 3:47 PM
> To: Wang, YuanX <yuanx.wang@intel.com>; dev@dpdk.org; Thomas
> Monjalon <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@amd.com>
> Cc: ferruh.yigit@xilinx.com; mdr@ashroe.eu; Li, Xiaoyun
> <xiaoyun.li@intel.com>; Singh, Aman Deep <aman.deep.singh@intel.com>;
> Zhang, Yuying <yuying.zhang@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>; Yang, Qiming <qiming.yang@intel.com>;
> jerinjacobk@gmail.com; viacheslavo@nvidia.com;
> stephen@networkplumber.org; Ding, Xuan <xuan.ding@intel.com>;
> hpothula@marvell.com; Tang, Yaqi <yaqi.tang@intel.com>; Wenxuan Wu
> <wenxuanx.wu@intel.com>
> Subject: Re: [PATCH v7 2/4] ethdev: introduce protocol hdr based buffer split
> 
> On 10/2/22 00:05, Yuan Wang wrote:
> > Currently, Rx buffer split supports length based split. With Rx queue
> > offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT enabled and Rx packet segment
> > configured, PMD will be able to split the received packets into
> > multiple segments.
> >
> > However, length based buffer split is not suitable for NICs that do
> > split based on protocol headers. Given an arbitrarily variable length
> > in Rx packet segment, it is almost impossible to pass a fixed protocol
> > header to driver. Besides, the existence of tunneling results in the
> > composition of a packet is various, which makes the situation even worse.
> >
> > This patch extends current buffer split to support protocol header
> > based buffer split. A new proto_hdr field is introduced in the
> > reserved field of rte_eth_rxseg_split structure to specify protocol
> > header. The proto_hdr field defines the split position of packet,
> > splitting will always happen after the protocol header defined in the
> > Rx packet segment. When Rx queue offload
> > RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT is enabled and corresponding protocol
> > header is configured, driver will split the ingress packets into multiple
> > segments.
> >
> > Examples for proto_hdr field defines:
> > To split after ETH-IPV4-UDP, it should be defined as
> > RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_L4_UDP
> >
> > For inner ETH-IPV4-UDP, it should be defined as
> > RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
> > RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_UDP
> >
> > struct rte_eth_rxseg_split {
> >          struct rte_mempool *mp; /* memory pools to allocate segment from */
> >          uint16_t length; /* segment maximal data length,
> >                              configures split point */
> >          uint16_t offset; /* data offset from beginning
> >                              of mbuf data buffer */
> >          /**
> > 	 * Proto_hdr defines a bit mask of the protocol sequence as
> >           * RTE_PTYPE_*, configures split point. The last RTE_PTYPE*
> >           * in the mask indicates the split position.
> > 	 * For non-tunneling packets, the complete protocol sequence
> >           * should be defined.
> > 	 * For tunneling packets, for simplicity, only the tunnel and
> >           * inner protocol sequence should be defined.
> > 	 */
> >          uint32_t proto_hdr;
> > };
> >
> > If protocol header split can be supported by a PMD, the
> > rte_eth_buffer_split_get_supported_hdr_ptypes function can be use to
> > obtain a list of these protocol headers.
> >
> > For example, let's suppose we configured the Rx queue with the
> > following segments:
> >          seg0 - pool0, proto_hdr0=RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4,
> >                 off0=2B
> >          seg1 - pool1, proto_hdr1=RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4
> >                 | RTE_PTYPE_L4_UDP, off1=128B
> >          seg2 - pool2, off1=0B
> >
> > The packet consists of ETH_IPV4_UDP_PAYLOAD will be split like
> > following:
> >          seg0 - ipv4 header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
> >          seg1 - udp header @ 128 in mbuf from pool1
> >          seg2 - payload @ 0 in mbuf from pool2
> >
> > Note: NIC will only do split when the packets exactly match all the
> > protocol headers in the segments. For example, if ARP packets received
> > with above config, the NIC won't do split for ARP packets since it
> > does not contains ipv4 header and udp header. These packets will be
> > put into the last valid mempool, with zero offset.
> >
> > Now buffer split can be configured in two modes. For length based
> > buffer split, the mp, length, offset field in Rx packet segment should
> > be configured, while the proto_hdr field will be ignored.
> > For protocol header based buffer split, the mp, offset, proto_hdr
> > field in Rx packet segment should be configured, while the length
> > field will be ignored.
> >
> > The split limitations imposed by underlying driver is reported in the
> > rte_eth_dev_info->rx_seg_capa field. The memory attributes for the
> > split parts may differ either, dpdk memory and external memory,
> > respectively.
> >
> > Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
> > Signed-off-by: Xuan Ding <xuan.ding@intel.com>
> > Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
> 
> I apologize for delay with review. Overall LGTM now. See few notes below.

Thank you very much for your time and patience with this patch series.

> 
> > ---
> >   doc/guides/rel_notes/release_22_11.rst |  7 +++
> >   lib/ethdev/rte_ethdev.c                | 74 ++++++++++++++++++++++----
> >   lib/ethdev/rte_ethdev.h                | 29 +++++++++-
> >   3 files changed, 98 insertions(+), 12 deletions(-)
> >
> > diff --git a/doc/guides/rel_notes/release_22_11.rst
> > b/doc/guides/rel_notes/release_22_11.rst
> > index 6a7474a3d6..510869c73a 100644
> > --- a/doc/guides/rel_notes/release_22_11.rst
> > +++ b/doc/guides/rel_notes/release_22_11.rst
> > @@ -101,6 +101,13 @@ New Features
> >     * Added ``rte_eth_buffer_split_get_supported_hdr_ptypes()``, to get
> supported
> >       header protocols of a PMD to split.
> >
> > +* **Added protocol header based buffer split.**
> > +
> > +  * Ethdev: The ``reserved`` field in the ``rte_eth_rxseg_split`` structure is
> > +    replaced with ``proto_hdr`` to support protocol header based buffer
> split.
> > +    User can choose length or protocol header to configure buffer split
> > +    according to NIC's capability.
> > +
> 
> It should be grouped together with other ethdev features.

We will send a new version. As with patch 1, could you help adjust the doc changes?
Thanks very much.

> 
> >
> >   Removed Items
> >   -------------
> > diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c index
> > 1f0a7f8f3f..27ec19faed 100644
> > --- a/lib/ethdev/rte_ethdev.c
> > +++ b/lib/ethdev/rte_ethdev.c
> > @@ -1649,9 +1649,10 @@ rte_eth_dev_is_removed(uint16_t port_id)
> >   }
> >
> >   static int
> > -rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
> > -			     uint16_t n_seg, uint32_t *mbp_buf_size,
> > -			     const struct rte_eth_dev_info *dev_info)
> > +rte_eth_rx_queue_check_split(uint16_t port_id,
> > +			const struct rte_eth_rxseg_split *rx_seg,
> > +			uint16_t n_seg, uint32_t *mbp_buf_size,
> > +			const struct rte_eth_dev_info *dev_info)
> >   {
> >   	const struct rte_eth_rxseg_capa *seg_capa = &dev_info-
> >rx_seg_capa;
> >   	struct rte_mempool *mp_first;
> > @@ -1674,6 +1675,7 @@ rte_eth_rx_queue_check_split(const struct
> rte_eth_rxseg_split *rx_seg,
> >   		struct rte_mempool *mpl = rx_seg[seg_idx].mp;
> >   		uint32_t length = rx_seg[seg_idx].length;
> >   		uint32_t offset = rx_seg[seg_idx].offset;
> > +		uint32_t proto_hdr = rx_seg[seg_idx].proto_hdr;
> >
> >   		if (mpl == NULL) {
> >   			RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
> @@ -1707,13
> > +1709,63 @@ rte_eth_rx_queue_check_split(const struct
> rte_eth_rxseg_split *rx_seg,
> >   		}
> >   		offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
> >   		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
> > -		length = length != 0 ? length : *mbp_buf_size;
> > -		if (*mbp_buf_size < length + offset) {
> > -			RTE_ETHDEV_LOG(ERR,
> > -				       "%s mbuf_data_room_size %u < %u
> (segment length=%u + segment offset=%u)\n",
> > -				       mpl->name, *mbp_buf_size,
> > -				       length + offset, length, offset);
> > -			return -EINVAL;
> > +
> > +		if (proto_hdr > 0) {
> > +			/* Split based on protocol headers. */
> 
> Isn't it safer here to ensure that segment length is set to 0?
> Just to protect against misuse etc.

That is a reasonable suggestion. I will adopt it; please see v8.

> 
> > +
> > +			/* skip the payload */
> 
> Sorry, it is confusing. What do you mean here?

Setting n proto_hdr values generates (n+1) segments, so to split the packet into n segments we only need to check the first (n-1) proto_hdr values.
For example, to split ETH-IPV4-UDP-PAYLOAD after the UDP header, we only need to set and check the UDP header in the first segment.

A mask may not be the best approach, so we will instead use the segment index to skip the proto_hdr check for the last segment.

> 
> > +			if (proto_hdr == RTE_PTYPE_ALL_MASK)
> > +				continue;
> > +
> > +			int ptype_cnt;
> > +
> > +			ptype_cnt =
> rte_eth_buffer_split_get_supported_hdr_ptypes(port_id, NULL, 0);
> > +			if (ptype_cnt <= 0) {
> > +				RTE_ETHDEV_LOG(ERR,
> > +					"Port %u failed to supported buffer
> split header protocols\n",
> > +					port_id);
> > +				return -EINVAL;
> > +			}
> > +
> > +			uint32_t ptypes[ptype_cnt];
> > +			int i;
> 
> First of all do not mix code and variable declaration.
> It significantly complicates code reading.

Thanks, the code and variable declaration will be separated.

> Second, creation of an array on stack based on function return value is very
> dangerous from security point of view - potential stack overflow and
> corresponding vulnerabilities.

The function's return value determines how much space is needed to store the ptypes. Thanks for pointing out the stack overflow risk; we will allocate the array from the heap instead.
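
A minimal sketch of the heap-based lookup described in this reply, with declarations kept at the top of the function. It is not the v8 code. get_supported_hdr_ptypes() is a hypothetical stand-in for rte_eth_buffer_split_get_supported_hdr_ptypes() with a hard-coded capability list so the example builds without DPDK, and check_proto_hdr_supported() is an illustrative name.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for rte_eth_buffer_split_get_supported_hdr_ptypes().
 * Returns the number of supported ptypes and, when ptypes != NULL, fills at
 * most num entries. The two values are assumed to be ETH|IPV4 and
 * ETH|IPV4|UDP, hard-coded only for this sketch.
 */
static int
get_supported_hdr_ptypes(uint16_t port_id, uint32_t *ptypes, int num)
{
	static const uint32_t supported[] = { 0x00000011, 0x00000211 };
	const int total = (int)(sizeof(supported) / sizeof(supported[0]));
	int i;

	(void)port_id;
	assert(num >= 0);
	for (i = 0; ptypes != NULL && i < num && i < total; i++)
		ptypes[i] = supported[i];
	return total;
}

/* Return 0 if proto_hdr is supported, -EINVAL or -ENOMEM otherwise. */
static int
check_proto_hdr_supported(uint16_t port_id, uint32_t proto_hdr)
{
	uint32_t *ptypes;
	int ret = -EINVAL;
	int cap;
	int cnt;
	int i;

	/* First call only asks how many entries are needed. */
	cap = get_supported_hdr_ptypes(port_id, NULL, 0);
	if (cap <= 0)
		return -EINVAL;

	/* Heap allocation instead of a variable-length array on the stack. */
	ptypes = malloc(sizeof(*ptypes) * (size_t)cap);
	if (ptypes == NULL)
		return -ENOMEM;

	cnt = get_supported_hdr_ptypes(port_id, ptypes, cap);
	if (cnt > cap)
		cnt = cap; /* never read past what was allocated */

	for (i = 0; i < cnt; i++) {
		if (ptypes[i] == proto_hdr) {
			ret = 0;
			break;
		}
	}

	free(ptypes);
	return ret;
}
```

Clamping the second return value to the allocated capacity keeps the loop in bounds even if the reported count grows between the two calls.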

> 
> > +
> > +			ptype_cnt =
> rte_eth_buffer_split_get_supported_hdr_ptypes(port_id,
> > +
> 	ptypes, ptype_cnt);
> > +			if (ptype_cnt < 0) {
> > +				RTE_ETHDEV_LOG(ERR,
> > +					"Port %u failed to supported buffer
> split header protocols\n",
> > +					port_id);
> > +				return -EINVAL;
> > +			}
> > +
> > +			for (i = 0; i < ptype_cnt; i++)
> > +				if (ptypes[i] == proto_hdr)
> > +					break;
> > +			if (i == ptype_cnt) {
> > +				RTE_ETHDEV_LOG(ERR,
> > +					"Requested Rx split header protocols
> 0x%x is not supported.\n",
> > +					proto_hdr);
> > +				return -EINVAL;
> > +			}
> > +
> > +			if (*mbp_buf_size < offset) {
> 
> The check is obviously insufficient, but I agree that it should be driver
> responsibility to do extra checks for required space in mbuf.
> 
> > +				RTE_ETHDEV_LOG(ERR,
> > +						"%s
> mbuf_data_room_size %u < %u segment offset)\n",
> > +						mpl->name, *mbp_buf_size,
> > +						offset);
> > +				return -EINVAL;
> > +			}
> > +		} else {
> > +			/* Split at fixed length. */
> > +			length = length != 0 ? length : *mbp_buf_size;
> > +			if (*mbp_buf_size < length + offset) {
> > +				RTE_ETHDEV_LOG(ERR,
> > +					"%s mbuf_data_room_size %u < %u
> (segment length=%u + segment offset=%u)\n",
> > +					mpl->name, *mbp_buf_size,
> > +					length + offset, length, offset);
> > +				return -EINVAL;
> > +			}
> >   		}
> >   	}
> >   	return 0;
> > @@ -1793,7 +1845,7 @@ rte_eth_rx_queue_setup(uint16_t port_id,
> uint16_t rx_queue_id,
> >   		n_seg = rx_conf->rx_nseg;
> >
> >   		if (rx_conf->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)
> {
> > -			ret = rte_eth_rx_queue_check_split(rx_seg, n_seg,
> > +			ret = rte_eth_rx_queue_check_split(port_id, rx_seg,
> n_seg,
> >   							   &mbp_buf_size,
> >   							   &dev_info);
> >   			if (ret != 0)
> > diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h index
> > cf14e04010..a5f9647bd3 100644
> > --- a/lib/ethdev/rte_ethdev.h
> > +++ b/lib/ethdev/rte_ethdev.h
> > @@ -994,6 +994,9 @@ struct rte_eth_txmode {
> >    *   specified in the first array element, the second buffer, from the
> >    *   pool in the second element, and so on.
> >    *
> > + * - The proto_hdrs in the elements define the split position of
> > + *   received packets.
> > + *
> >    * - The offsets from the segment description elements specify
> >    *   the data offset from the buffer beginning except the first mbuf.
> >    *   The first segment offset is added with RTE_PKTMBUF_HEADROOM.
> > @@ -1015,12 +1018,36 @@ struct rte_eth_txmode {
> >    *     - pool from the last valid element
> >    *     - the buffer size from this pool
> >    *     - zero offset
> > + *
> > + * - Length based buffer split:
> > + *     - mp, length, offset should be configured.
> > + *     - The proto_hdr field will be ignored.
> 
> Looking at the code above I think proto_hdr must be 0.
> 
> > + *
> > + * - Protocol header based buffer split:
> > + *     - mp, offset, proto_hdr should be configured.
> > + *     - The length field will be ignored.
> 
> I'd require length to be 0 to avoid misusage of the API.

Sure, we will fix them in v8.

> 
> > + *
> > + * - For Protocol header based buffer split, if the received packets
> > + *   don't exactly match all protocol headers in the elements, packets
> > + *   will not be split.
> > + *   These packets will be put into:
> > + *     - pool from the last valid element
> > + *     - the buffer size from this pool
> > + *     - zero offset
> 
> Shouldn't we check that dataroom in the last segment mempool is sufficient
> for up to MTU packet if Rx scatter is disabled?

Yes, we will add this check in the last segment.

Thanks,
Yuan

> 
> >    */
> >   struct rte_eth_rxseg_split {
> >   	struct rte_mempool *mp; /**< Memory pool to allocate segment
> from. */
> >   	uint16_t length; /**< Segment data length, configures split point. */
> >   	uint16_t offset; /**< Data offset from beginning of mbuf data buffer.
> */
> > -	uint32_t reserved; /**< Reserved field. */
> > +	/**
> > +	 * Proto_hdr defines a bit mask of the protocol sequence as
> RTE_PTYPE_*,
> > +	 * configures split point. The last RTE_PTYPE* in the mask indicates
> the
> > +	 * split position.
> > +	 * For non-tunneling packets, the complete protocol sequence should
> be defined.
> > +	 * For tunneling packets, for simplicity, only the tunnel and inner
> > +	 * protocol sequence should be defined.
> > +	 */
> > +	uint32_t proto_hdr;
> >   };
> >
> >   /**
  
Andrew Rybchenko Oct. 4, 2022, 8:22 a.m. UTC | #4
On 10/4/22 05:48, Wang, YuanX wrote:
> Hi Andrew,
> 
>> -----Original Message-----
>> On 10/2/22 00:05, Yuan Wang wrote:
>>> +
>>> +			/* skip the payload */
>>
>> Sorry, it is confusing. What do you mean here?
> 
> Because setting n proto_hdr will generate (n+1) segments. If we want to split the packet into n segments, we only need to check the first (n-1) proto_hdr.
> For example, for ETH-IPV4-UDP-PAYLOAD, if we want to split after the UDP header, we only need to set and check the UDP header in the first segment.
> 
> Maybe mask is not a good way, so we will use index to filter out the check of proto_hdr inside the last segment.

I see your point and understand the problem now.
Thinking a bit more about it I realize that consistency check
here should be more sophisticated.
It should not allow:
  - seg1 - length-based, seg2 - proto-based, seg3 - payload
  - seg1 - proto-based, seg2 - length-based, seg3 - proto-based, seg4 - payload
I.e. no protocol-based split after length-based.
But should allow:
  - seg1 - proto-based, seg2 - length-based, seg3 - payload
I.e. length-based split after protocol-based.

Taking the last point above into account, proto_hdr in the last
segment should be 0 like in length-based split (not
RTE_PTYPE_ALL_MASK).

It is an interesting question how to request:
  - seg1 - ETH, seg2 - IPv4, seg3 - UDP, seg4 - payload
Should we really repeat ETH in seg2->proto_hdr and
seg3->proto_hdr header and IPv4 in seg3->proto_hdr again?
I tend to say no since when packet comes to seg2 it already
has no ETH header.

If so, how should a configuration be handled when ETH is repeated in seg2?
For example,
   - seg1 ETH+IPv4+UDP
   - seg2 ETH+IPv6+UDP
   - seg3 0
Should we deny it, or should we define the behaviour as follows:
if a packet does not match segX proto_hdr, the segment is
skipped and segX+1 is considered?
Of course, not all drivers/HW support it. If not, such a
configuration should simply be rejected by the driver itself.
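
The ordering rules proposed earlier in this message (no protocol-based split after a length-based one, and proto_hdr equal to 0 in the last segment) can be sketched as below. This is only an illustration under stated assumptions: struct seg_conf is a hypothetical, trimmed-down descriptor and check_split_order() is not an ethdev function. It also folds in the earlier review request that length be 0 for protocol-based segments.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor holding only the fields this check needs. */
struct seg_conf {
	uint16_t length;    /* used by length-based segments */
	uint32_t proto_hdr; /* non-zero selects a protocol-based split point */
};

/* Return 0 if the segment sequence is consistent, -EINVAL otherwise. */
static int
check_split_order(const struct seg_conf *seg, uint16_t n_seg)
{
	int seen_length_based = 0;
	uint16_t i;

	assert(seg != NULL && n_seg > 0);

	/* The last segment takes the remainder, so it names no header. */
	if (seg[n_seg - 1].proto_hdr != 0)
		return -EINVAL;

	for (i = 0; i + 1 < n_seg; i++) {
		if (seg[i].proto_hdr != 0) {
			/*
			 * Protocol-based: length must be 0 and no
			 * length-based segment may come before it.
			 */
			if (seg[i].length != 0 || seen_length_based)
				return -EINVAL;
		} else {
			seen_length_based = 1;
		}
	}
	return 0;
}
```

With this rule, "proto-based, length-based, payload" is accepted, while "length-based, proto-based, payload" is rejected.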
  
Wang, YuanX Oct. 4, 2022, 3:01 p.m. UTC | #5
Hi Andrew,

> -----Original Message-----
> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Sent: Tuesday, October 4, 2022 4:23 PM
> To: Wang, YuanX <yuanx.wang@intel.com>; dev@dpdk.org; Thomas
> Monjalon <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@amd.com>
> Cc: ferruh.yigit@xilinx.com; mdr@ashroe.eu; Li, Xiaoyun
> <xiaoyun.li@intel.com>; Singh, Aman Deep <aman.deep.singh@intel.com>;
> Zhang, Yuying <yuying.zhang@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>; Yang, Qiming <qiming.yang@intel.com>;
> jerinjacobk@gmail.com; viacheslavo@nvidia.com;
> stephen@networkplumber.org; Ding, Xuan <xuan.ding@intel.com>;
> hpothula@marvell.com; Tang, Yaqi <yaqi.tang@intel.com>
> Subject: Re: [PATCH v7 2/4] ethdev: introduce protocol hdr based buffer split
> 
> On 10/4/22 05:48, Wang, YuanX wrote:
> > Hi Andrew,
> >
> >> -----Original Message-----
> >> On 10/2/22 00:05, Yuan Wang wrote:
> >>> +
> >>> +			/* skip the payload */
> >>
> >> Sorry, it is confusing. What do you mean here?
> >
> > Because setting n proto_hdr will generate (n+1) segments. If we want to
> split the packet into n segments, we only need to check the first (n-1)
> proto_hdr.
> > For example, for ETH-IPV4-UDP-PAYLOAD, if we want to split after the UDP
> header, we only need to set and check the UDP header in the first segment.
> >
> > Maybe mask is not a good way, so we will use index to filter out the check
> of proto_hdr inside the last segment.
> 
> I see your point and understand the problem now.
> Thinking a bit more about it I realize that consistency check here should be
> more sophisticated.
> It should not allow:
>   - seg1 - length-based, seg2 - proto-based, seg3 - payload
>   - seg1 - proto-based, seg2 - length-based, seg3 - proto-based, seg4 - payload
> I.e. no protocol-based split after length-based.
> But should allow:
>   - seg1 - proto-based, seg2 - length-based, seg3 - payload
> I.e. length-based split after protocol-based.
> 
> Taking the last point above into account, proto_hdr in the last segment
> should be 0 like in length-based split (not RTE_PTYPE_ALL_MASK).

Just to confirm, do you mean that the payload in the last segment should be treated as a length-based split (proto_hdr == 0)?
If so, regarding the question 'check that dataroom in the last segment mempool is sufficient for up to MTU packet if Rx scatter is disabled':
is it then unnecessary to compare the MTU size with mbuf_size, since the check in length-based split is already sufficient? We will send v8 soon based on this understanding; please help to check.

> 
> It is an interesting question how to request:
>   - seg1 - ETH, seg2 - IPv4, seg3 - UDP, seg4 - payload Should we really repeat
> ETH in seg2->proto_hdr and
> seg3->proto_hdr header and IPv4 in seg3->proto_hdr again?
> I tend to say no since when packet comes to seg2 it already has no ETH
> header.
> 
> If so, how to handle configuration when ETH is repeat in seg2?
> For example,
>    - seg1 ETH+IPv4+UDP
>    - seg2 ETH+IPv6+UDP
>    - seg2 0
> Should we deny it or should we define behaviour like.
> If a packet does not match segX proto_hdr, the segment is skipped and
> segX+1 considered.
> Of course, not all drivers/HW supports it. If so, such configuration should be
> just discarded by the driver itself.

One question needs to be clarified here: are the segments sequential or independent? I prefer the former because it is more readable. It is also consistent with length-based split, which configures the lengths sequentially. With sequential segments, the following configuration cannot occur:
- seg1 ETH+IPv4+UDP
- seg2 ETH+IPv6+UDP
- seg3 0

For the case of repeating ETH, such as seg1 - ETH, seg2 - IPv4, seg3 - UDP, seg4 - payload, we can omit ETH in the following segments as you suggested. However, IPV4-UDP and IPV6-UDP still need to be distinguished, following our previous discussion (the user may want to split at IPV4-UDP rather than IPV6-UDP even though the driver supports both). So for seg1 - ETH, seg2 - IPv4, seg3 - UDP, seg4 - payload,
we set proto_hdr as:
seg1 proto_hdr1=RTE_PTYPE_L2_ETHER
seg2 proto_hdr2=RTE_PTYPE_L3_IPV4
seg3 proto_hdr3=RTE_PTYPE_L3_IPV4 | RTE_PTYPE_L4_UDP
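
As a rough illustration of the sequential layout proposed above, the four segments could be filled as follows. The PTYPE_* macros are local stand-ins believed to mirror the RTE_PTYPE_* values in rte_mbuf_ptype.h, struct rxseg_split is a hypothetical mirror of the patched rte_eth_rxseg_split with an opaque pool pointer, and fill_eth_ipv4_udp_split() is an invented helper name. Real code would include the DPDK headers instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for RTE_PTYPE_L2_ETHER, RTE_PTYPE_L3_IPV4, RTE_PTYPE_L4_UDP. */
#define PTYPE_L2_ETHER 0x00000001u
#define PTYPE_L3_IPV4  0x00000010u
#define PTYPE_L4_UDP   0x00000200u

/* Hypothetical mirror of the patched rte_eth_rxseg_split layout. */
struct rxseg_split {
	void *mp;           /* stands in for struct rte_mempool * */
	uint16_t length;    /* kept 0 for protocol-based segments */
	uint16_t offset;
	uint32_t proto_hdr; /* 0 in the last (payload) segment */
};

/* Fill seg1..seg4 for the ETH / IPv4 / UDP / payload split. */
static void
fill_eth_ipv4_udp_split(struct rxseg_split seg[4], void *pools[4])
{
	static const uint32_t hdrs[4] = {
		PTYPE_L2_ETHER,               /* seg1: split after ETH */
		PTYPE_L3_IPV4,                /* seg2: split after IPv4 */
		PTYPE_L3_IPV4 | PTYPE_L4_UDP, /* seg3: split after IPv4 UDP */
		0,                            /* seg4: payload */
	};
	size_t i;

	assert(seg != NULL && pools != NULL);
	for (i = 0; i < 4; i++) {
		seg[i].mp = pools[i];
		seg[i].length = 0;
		seg[i].offset = 0;
		seg[i].proto_hdr = hdrs[i];
	}
}
```

Keeping IPV4 in the seg3 mask is what distinguishes an IPv4 UDP split from an IPv6 UDP split, which is the point made in the paragraph above.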

Thanks,
Yuan
  

Patch

diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
index 6a7474a3d6..510869c73a 100644
--- a/doc/guides/rel_notes/release_22_11.rst
+++ b/doc/guides/rel_notes/release_22_11.rst
@@ -101,6 +101,13 @@  New Features
   * Added ``rte_eth_buffer_split_get_supported_hdr_ptypes()``, to get supported
     header protocols of a PMD to split.
 
+* **Added protocol header based buffer split.**
+
+  * Ethdev: The ``reserved`` field in the ``rte_eth_rxseg_split`` structure is
+    replaced with ``proto_hdr`` to support protocol header based buffer split.
+    User can choose length or protocol header to configure buffer split
+    according to NIC's capability.
+
 
 Removed Items
 -------------
diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 1f0a7f8f3f..27ec19faed 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -1649,9 +1649,10 @@  rte_eth_dev_is_removed(uint16_t port_id)
 }
 
 static int
-rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
-			     uint16_t n_seg, uint32_t *mbp_buf_size,
-			     const struct rte_eth_dev_info *dev_info)
+rte_eth_rx_queue_check_split(uint16_t port_id,
+			const struct rte_eth_rxseg_split *rx_seg,
+			uint16_t n_seg, uint32_t *mbp_buf_size,
+			const struct rte_eth_dev_info *dev_info)
 {
 	const struct rte_eth_rxseg_capa *seg_capa = &dev_info->rx_seg_capa;
 	struct rte_mempool *mp_first;
@@ -1674,6 +1675,7 @@  rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
 		struct rte_mempool *mpl = rx_seg[seg_idx].mp;
 		uint32_t length = rx_seg[seg_idx].length;
 		uint32_t offset = rx_seg[seg_idx].offset;
+		uint32_t proto_hdr = rx_seg[seg_idx].proto_hdr;
 
 		if (mpl == NULL) {
 			RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
@@ -1707,13 +1709,63 @@  rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
 		}
 		offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
 		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
-		length = length != 0 ? length : *mbp_buf_size;
-		if (*mbp_buf_size < length + offset) {
-			RTE_ETHDEV_LOG(ERR,
-				       "%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
-				       mpl->name, *mbp_buf_size,
-				       length + offset, length, offset);
-			return -EINVAL;
+
+		if (proto_hdr > 0) {
+			/* Split based on protocol headers. */
+
+			/* skip the payload */
+			if (proto_hdr == RTE_PTYPE_ALL_MASK)
+				continue;
+
+			int ptype_cnt;
+
+			ptype_cnt = rte_eth_buffer_split_get_supported_hdr_ptypes(port_id, NULL, 0);
+			if (ptype_cnt <= 0) {
+				RTE_ETHDEV_LOG(ERR,
+					"Port %u failed to supported buffer split header protocols\n",
+					port_id);
+				return -EINVAL;
+			}
+
+			uint32_t ptypes[ptype_cnt];
+			int i;
+
+			ptype_cnt = rte_eth_buffer_split_get_supported_hdr_ptypes(port_id,
+										ptypes, ptype_cnt);
+			if (ptype_cnt < 0) {
+				RTE_ETHDEV_LOG(ERR,
+					"Port %u failed to supported buffer split header protocols\n",
+					port_id);
+				return -EINVAL;
+			}
+
+			for (i = 0; i < ptype_cnt; i++)
+				if (ptypes[i] == proto_hdr)
+					break;
+			if (i == ptype_cnt) {
+				RTE_ETHDEV_LOG(ERR,
+					"Requested Rx split header protocols 0x%x is not supported.\n",
+					proto_hdr);
+				return -EINVAL;
+			}
+
+			if (*mbp_buf_size < offset) {
+				RTE_ETHDEV_LOG(ERR,
+						"%s mbuf_data_room_size %u < %u segment offset)\n",
+						mpl->name, *mbp_buf_size,
+						offset);
+				return -EINVAL;
+			}
+		} else {
+			/* Split at fixed length. */
+			length = length != 0 ? length : *mbp_buf_size;
+			if (*mbp_buf_size < length + offset) {
+				RTE_ETHDEV_LOG(ERR,
+					"%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
+					mpl->name, *mbp_buf_size,
+					length + offset, length, offset);
+				return -EINVAL;
+			}
 		}
 	}
 	return 0;
@@ -1793,7 +1845,7 @@  rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
 		n_seg = rx_conf->rx_nseg;
 
 		if (rx_conf->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
-			ret = rte_eth_rx_queue_check_split(rx_seg, n_seg,
+			ret = rte_eth_rx_queue_check_split(port_id, rx_seg, n_seg,
 							   &mbp_buf_size,
 							   &dev_info);
 			if (ret != 0)
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index cf14e04010..a5f9647bd3 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -994,6 +994,9 @@  struct rte_eth_txmode {
  *   specified in the first array element, the second buffer, from the
  *   pool in the second element, and so on.
  *
+ * - The proto_hdrs in the elements define the split position of
+ *   received packets.
+ *
  * - The offsets from the segment description elements specify
  *   the data offset from the buffer beginning except the first mbuf.
  *   The first segment offset is added with RTE_PKTMBUF_HEADROOM.
@@ -1015,12 +1018,36 @@  struct rte_eth_txmode {
  *     - pool from the last valid element
  *     - the buffer size from this pool
  *     - zero offset
+ *
+ * - Length based buffer split:
+ *     - mp, length, offset should be configured.
+ *     - The proto_hdr field will be ignored.
+ *
+ * - Protocol header based buffer split:
+ *     - mp, offset, proto_hdr should be configured.
+ *     - The length field will be ignored.
+ *
+ * - For Protocol header based buffer split, if the received packets
+ *   don't exactly match all protocol headers in the elements, packets
+ *   will not be split.
+ *   These packets will be put into:
+ *     - pool from the last valid element
+ *     - the buffer size from this pool
+ *     - zero offset
  */
 struct rte_eth_rxseg_split {
 	struct rte_mempool *mp; /**< Memory pool to allocate segment from. */
 	uint16_t length; /**< Segment data length, configures split point. */
 	uint16_t offset; /**< Data offset from beginning of mbuf data buffer. */
-	uint32_t reserved; /**< Reserved field. */
+	/**
+	 * Proto_hdr defines a bit mask of the protocol sequence as RTE_PTYPE_*,
+	 * configures split point. The last RTE_PTYPE* in the mask indicates the
+	 * split position.
+	 * For non-tunneling packets, the complete protocol sequence should be defined.
+	 * For tunneling packets, for simplicity, only the tunnel and inner
+	 * protocol sequence should be defined.
+	 */
+	uint32_t proto_hdr;
 };
 
 /**