[v3,2/4] ethdev: introduce protocol hdr based buffer split

Message ID 20220902191027.666923-3-yuanx.wang@intel.com (mailing list archive)
State Superseded, archived
Delegated to: Andrew Rybchenko
Headers
Series support protocol based buffer split |

Checks

Context Check Description
ci/checkpatch success coding style OK

Commit Message

Wang, YuanX Sept. 2, 2022, 7:10 p.m. UTC
  Currently, Rx buffer split supports length based split. With Rx queue
offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT enabled and Rx packet segment
configured, PMD will be able to split the received packets into
multiple segments.

However, length based buffer split is not suitable for NICs that do split
based on protocol headers. Given an arbitrarily variable length in Rx
packet segment, it is almost impossible to pass a fixed protocol header to
driver. Besides, the existence of tunneling results in the composition of
a packet is various, which makes the situation even worse.

This patch extends current buffer split to support protocol header based
buffer split. A new proto_hdr field is introduced in the reserved field
of rte_eth_rxseg_split structure to specify protocol header. The proto_hdr
field defines the split position of packet, splitting will always happens
after the protocol header defined in the Rx packet segment. When Rx queue
offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT is enabled and corresponding
protocol header is configured, driver will split the ingress packets into
multiple segments.

struct rte_eth_rxseg_split {
        struct rte_mempool *mp; /* memory pools to allocate segment from */
        uint16_t length; /* segment maximal data length,
                            configures split point */
        uint16_t offset; /* data offset from beginning
                            of mbuf data buffer */
        uint32_t proto_hdr; /* supported ptype of a specific pmd,
                               configures split point.
			       It should be defined by RTE_PTYPE_*
			     */
};

If protocol header split can be supported by a PMD. The
rte_eth_buffer_split_get_supported_hdr_ptypes function can
be use to obtain a list of these protocol headers.

For example, let's suppose we configured the Rx queue with the
following segments:
        seg0 - pool0, proto_hdr0=RTE_PTYPE_L3_IPV4, off0=2B
        seg1 - pool1, proto_hdr1=RTE_PTYPE_L4_UDP, off1=128B
        seg2 - pool2, off1=0B

The packet consists of MAC_IPV4_UDP_PAYLOAD will be split like
following:
        seg0 - ipv4 header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
        seg1 - udp header @ 128 in mbuf from pool1
        seg2 - payload @ 0 in mbuf from pool2

Note: NIC will only do split when the packets exactly match all the
protocol headers in the segments. For example, if ARP packets received
with above config, the NIC won't do split for ARP packets since
it does not contains ipv4 header and udp header.

Now buffer split can be configured in two modes. For length based
buffer split, the mp, length, offset field in Rx packet segment should
be configured, while the proto_hdr field will be ignored.
For protocol header based buffer split, the mp, offset, proto_hdr field
in Rx packet segment should be configured, while the length field will
be ignored.

The split limitations imposed by underlying driver is reported in the
rte_eth_dev_info->rx_seg_capa field. The memory attributes for the split
parts may differ either, dpdk memory and external memory, respectively.

Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
---
 doc/guides/rel_notes/release_22_11.rst |  5 +++
 lib/ethdev/rte_ethdev.c                | 55 ++++++++++++++++++++------
 lib/ethdev/rte_ethdev.h                | 17 +++++++-
 3 files changed, 65 insertions(+), 12 deletions(-)
  

Comments

Andrew Rybchenko Sept. 12, 2022, 11:47 a.m. UTC | #1
On 9/2/22 22:10, Yuan Wang wrote:
> Currently, Rx buffer split supports length based split. With Rx queue
> offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT enabled and Rx packet segment
> configured, PMD will be able to split the received packets into
> multiple segments.
> 
> However, length based buffer split is not suitable for NICs that do split
> based on protocol headers. Given an arbitrarily variable length in Rx
> packet segment, it is almost impossible to pass a fixed protocol header to
> driver. Besides, the existence of tunneling results in the composition of
> a packet is various, which makes the situation even worse.
> 
> This patch extends current buffer split to support protocol header based
> buffer split. A new proto_hdr field is introduced in the reserved field
> of rte_eth_rxseg_split structure to specify protocol header. The proto_hdr
> field defines the split position of packet, splitting will always happens
> after the protocol header defined in the Rx packet segment. When Rx queue
> offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT is enabled and corresponding
> protocol header is configured, driver will split the ingress packets into
> multiple segments.
> 
> struct rte_eth_rxseg_split {
>          struct rte_mempool *mp; /* memory pools to allocate segment from */
>          uint16_t length; /* segment maximal data length,
>                              configures split point */
>          uint16_t offset; /* data offset from beginning
>                              of mbuf data buffer */
>          uint32_t proto_hdr; /* supported ptype of a specific pmd,
>                                 configures split point.
> 			       It should be defined by RTE_PTYPE_*

If I understand correctly, the statement is a bit misleading
since it should be a bit mask of RTE_PTYPE_* defines. Not
exactly one RTE_PTYPE_*.

> 			     */
> };
> 
> If protocol header split can be supported by a PMD. The
> rte_eth_buffer_split_get_supported_hdr_ptypes function can
> be use to obtain a list of these protocol headers.
> 
> For example, let's suppose we configured the Rx queue with the
> following segments:
>          seg0 - pool0, proto_hdr0=RTE_PTYPE_L3_IPV4, off0=2B
>          seg1 - pool1, proto_hdr1=RTE_PTYPE_L4_UDP, off1=128B
>          seg2 - pool2, off1=0B
> 
> The packet consists of MAC_IPV4_UDP_PAYLOAD will be split like

What is MAC_IPV4_UDP_PAYLOAD? Do you mean ETH_IPV4_UDP_PAYLOAD?

> following:
>          seg0 - ipv4 header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
>          seg1 - udp header @ 128 in mbuf from pool1
>          seg2 - payload @ 0 in mbuf from pool2
> 
> Note: NIC will only do split when the packets exactly match all the
> protocol headers in the segments. For example, if ARP packets received
> with above config, the NIC won't do split for ARP packets since
> it does not contains ipv4 header and udp header.

You must define which mempool is used in the case.

> 
> Now buffer split can be configured in two modes. For length based
> buffer split, the mp, length, offset field in Rx packet segment should
> be configured, while the proto_hdr field will be ignored.
> For protocol header based buffer split, the mp, offset, proto_hdr field
> in Rx packet segment should be configured, while the length field will
> be ignored.
> 
> The split limitations imposed by underlying driver is reported in the
> rte_eth_dev_info->rx_seg_capa field. The memory attributes for the split
> parts may differ either, dpdk memory and external memory, respectively.
> 
> Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
> Signed-off-by: Xuan Ding <xuan.ding@intel.com>
> Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
> ---
>   doc/guides/rel_notes/release_22_11.rst |  5 +++
>   lib/ethdev/rte_ethdev.c                | 55 ++++++++++++++++++++------
>   lib/ethdev/rte_ethdev.h                | 17 +++++++-
>   3 files changed, 65 insertions(+), 12 deletions(-)
> 
> diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
> index 4d90514a9a..f3b58c7895 100644
> --- a/doc/guides/rel_notes/release_22_11.rst
> +++ b/doc/guides/rel_notes/release_22_11.rst
> @@ -60,6 +60,11 @@ New Features
>     Added ``rte_eth_buffer_split_get_supported_hdr_ptypes()``, to get supported
>     header protocols of a PMD to split.
>   
> +* **Added protocol header based buffer split.**
> +  Ethdev: The ``reserved`` field in the  ``rte_eth_rxseg_split`` structure is
> +  replaced with ``proto_hdr`` to support protocol header based buffer split.
> +  User can choose length or protocol header to configure buffer split
> +  according to NIC's capability.

Add one more empty line to have two before the next sectoin.

>   
>   Removed Items
>   -------------
> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
> index 093c577add..dfceb723ee 100644
> --- a/lib/ethdev/rte_ethdev.c
> +++ b/lib/ethdev/rte_ethdev.c
> @@ -1635,9 +1635,10 @@ rte_eth_dev_is_removed(uint16_t port_id)
>   }
>   
>   static int
> -rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
> -			     uint16_t n_seg, uint32_t *mbp_buf_size,
> -			     const struct rte_eth_dev_info *dev_info)
> +rte_eth_rx_queue_check_split(uint16_t port_id,
> +			const struct rte_eth_rxseg_split *rx_seg,
> +			uint16_t n_seg, uint32_t *mbp_buf_size,
> +			const struct rte_eth_dev_info *dev_info)
>   {
>   	const struct rte_eth_rxseg_capa *seg_capa = &dev_info->rx_seg_capa;
>   	struct rte_mempool *mp_first;
> @@ -1660,6 +1661,7 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
>   		struct rte_mempool *mpl = rx_seg[seg_idx].mp;
>   		uint32_t length = rx_seg[seg_idx].length;
>   		uint32_t offset = rx_seg[seg_idx].offset;
> +		uint32_t proto_hdr = rx_seg[seg_idx].proto_hdr;
>   
>   		if (mpl == NULL) {
>   			RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
> @@ -1693,13 +1695,44 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
>   		}
>   		offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
>   		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
> -		length = length != 0 ? length : *mbp_buf_size;
> -		if (*mbp_buf_size < length + offset) {
> -			RTE_ETHDEV_LOG(ERR,
> -				       "%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
> -				       mpl->name, *mbp_buf_size,
> -				       length + offset, length, offset);
> -			return -EINVAL;
> +
> +		int ret = rte_eth_buffer_split_get_supported_hdr_ptypes(port_id, NULL, 0);

Do not mix variable declaration and code.
It is better to give the variable some sensible name.
Otherwise else branch code is hard to read.

> +		if (ret <= 0) {

May be I'm missing something, but nothing prevetns a driver/HW
to support both protocol-based and fixed-length split.
So, ability to support protocol based split should be treated
as a request to do it. It must be based on rx_seg->proto_hdr
content (for all segments).

Also nothing should prevent to mix protocol and fixed-length
split. I.e. split just after UDP in the first segment,
40 bytes in the second segment, everything else in the third.

> +			/* Split at fixed length. */
> +			length = length != 0 ? length : *mbp_buf_size;
> +			if (*mbp_buf_size < length + offset) {
> +				RTE_ETHDEV_LOG(ERR,
> +					"%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
> +					mpl->name, *mbp_buf_size,
> +					length + offset, length, offset);
> +				return -EINVAL;
> +			}
> +		} else {
> +			/* Split after specified protocol header. */
> +			uint32_t ptypes[ret];
> +			int i;
> +
> +			ret = rte_eth_buffer_split_get_supported_hdr_ptypes(port_id, ptypes, ret);

In theory, the funciton could fail since input arguments
differ. So, it should be handled.

> +			for (i = 0; i < ret; i++)
> +				if (ptypes[i] & proto_hdr)

IMHO it should be ==, not &. I think that
rte_eth_buffer_split_get_supported_hdr_ptypes() should define
points at which split could happen and we should match the
point exactly.

> +					break;
> +
> +			if (i == ret) {
> +#define PTYPE_NAMESIZE	256

Why? It is looks really strange that it is defined here.

> +				char ptype_name[PTYPE_NAMESIZE];
> +				rte_get_ptype_name(proto_hdr, ptype_name, sizeof(ptype_name));
> +				RTE_ETHDEV_LOG(ERR,
> +					"Protocol header %s is not supported.\n",
> +					ptype_name);
> +				return -EINVAL;
> +			}
> +			if (*mbp_buf_size < offset) {
> +				RTE_ETHDEV_LOG(ERR,
> +						"%s mbuf_data_room_size %u < %u segment offset)\n",
> +						mpl->name, *mbp_buf_size,
> +						offset);
> +				return -EINVAL;
> +			}
>   		}
>   	}
>   	return 0;
> @@ -1778,7 +1811,7 @@ rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
>   		n_seg = rx_conf->rx_nseg;
>   
>   		if (rx_conf->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
> -			ret = rte_eth_rx_queue_check_split(rx_seg, n_seg,
> +			ret = rte_eth_rx_queue_check_split(port_id, rx_seg, n_seg,
>   							   &mbp_buf_size,
>   							   &dev_info);
>   			if (ret != 0)
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> index c58c908c3a..410fba5eab 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -1175,6 +1175,9 @@ struct rte_eth_txmode {
>    *   specified in the first array element, the second buffer, from the
>    *   pool in the second element, and so on.
>    *
> + * - The proto_hdrs in the elements define the split position of
> + *   received packets.
> + *
>    * - The offsets from the segment description elements specify
>    *   the data offset from the buffer beginning except the first mbuf.
>    *   The first segment offset is added with RTE_PKTMBUF_HEADROOM.
> @@ -1196,12 +1199,24 @@ struct rte_eth_txmode {
>    *     - pool from the last valid element
>    *     - the buffer size from this pool
>    *     - zero offset
> + *
> + * - Length based buffer split:
> + *     - mp, length, offset should be configured.
> + *     - The proto_hdr field will be ignored.
> + *
> + * - Protocol header based buffer split:
> + *     - mp, offset, proto_hdr should be configured.
> + *     - The length field will be ignored.
>    */
>   struct rte_eth_rxseg_split {
>   	struct rte_mempool *mp; /**< Memory pool to allocate segment from. */
>   	uint16_t length; /**< Segment data length, configures split point. */
>   	uint16_t offset; /**< Data offset from beginning of mbuf data buffer. */
> -	uint32_t reserved; /**< Reserved field. */
> +	/**
> +	 * Supported ptype of a specific pmd, configures split point.
> +	 * It should be defined by RTE_PTYPE_*.
> +	 */
> +	uint32_t proto_hdr;
>   };
>   
>   /**
  
Suanming Mou Sept. 13, 2022, 7:56 a.m. UTC | #2
Hi

> -----Original Message-----
> From: Yuan Wang <yuanx.wang@intel.com>
> Sent: Saturday, September 3, 2022 3:10 AM
> To: dev@dpdk.org; NBU-Contact-Thomas Monjalon (EXTERNAL)
> <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@xilinx.com>; Andrew
> Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Cc: mdr@ashroe.eu; xiaoyun.li@intel.com; aman.deep.singh@intel.com;
> yuying.zhang@intel.com; qi.z.zhang@intel.com; qiming.yang@intel.com;
> jerinjacobk@gmail.com; Slava Ovsiienko <viacheslavo@nvidia.com>;
> stephen@networkplumber.org; xuan.ding@intel.com; hpothula@marvell.com;
> yaqi.tang@intel.com; Yuan Wang <yuanx.wang@intel.com>; Wenxuan Wu
> <wenxuanx.wu@intel.com>
> Subject: [PATCH v3 2/4] ethdev: introduce protocol hdr based buffer split
> 

snip

> @@ -1693,13 +1695,44 @@ rte_eth_rx_queue_check_split(const struct
> rte_eth_rxseg_split *rx_seg,
>  		}
>  		offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
>  		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
> -		length = length != 0 ? length : *mbp_buf_size;
> -		if (*mbp_buf_size < length + offset) {
> -			RTE_ETHDEV_LOG(ERR,
> -				       "%s mbuf_data_room_size %u < %u
> (segment length=%u + segment offset=%u)\n",
> -				       mpl->name, *mbp_buf_size,
> -				       length + offset, length, offset);
> -			return -EINVAL;
> +
> +		int ret =
> rte_eth_buffer_split_get_supported_hdr_ptypes(port_id, NULL, 0);

One small question, since the  ptypes == NULL and num == 0, I assume ret will always be <=0, right?

> +		if (ret <= 0) {
> +			/* Split at fixed length. */
> +			length = length != 0 ? length : *mbp_buf_size;
> +			if (*mbp_buf_size < length + offset) {
> +				RTE_ETHDEV_LOG(ERR,
> +					"%s mbuf_data_room_size %u < %u
> (segment length=%u + segment offset=%u)\n",
> +					mpl->name, *mbp_buf_size,
> +					length + offset, length, offset);
> +				return -EINVAL;
> +			}
> +		} else {
> +			/* Split after specified protocol header. */
> +			uint32_t ptypes[ret];
> +			int i;
> +
> +			ret =
> rte_eth_buffer_split_get_supported_hdr_ptypes(port_id, ptypes, ret);
> +			for (i = 0; i < ret; i++)
> +				if (ptypes[i] & proto_hdr)
> +					break;
> +

snip
  
Wang, YuanX Sept. 16, 2022, 8:38 a.m. UTC | #3
Hi Andrew,

> -----Original Message-----
> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Sent: Monday, September 12, 2022 7:47 PM
> To: Wang, YuanX <yuanx.wang@intel.com>; dev@dpdk.org; Thomas
> Monjalon <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@xilinx.com>
> Cc: mdr@ashroe.eu; Li, Xiaoyun <xiaoyun.li@intel.com>; Singh, Aman Deep
> <aman.deep.singh@intel.com>; Zhang, Yuying <yuying.zhang@intel.com>;
> Zhang, Qi Z <qi.z.zhang@intel.com>; Yang, Qiming <qiming.yang@intel.com>;
> jerinjacobk@gmail.com; viacheslavo@nvidia.com;
> stephen@networkplumber.org; Ding, Xuan <xuan.ding@intel.com>;
> hpothula@marvell.com; Tang, Yaqi <yaqi.tang@intel.com>; Wenxuan Wu
> <wenxuanx.wu@intel.com>
> Subject: Re: [PATCH v3 2/4] ethdev: introduce protocol hdr based buffer split
> 
> On 9/2/22 22:10, Yuan Wang wrote:
> > Currently, Rx buffer split supports length based split. With Rx queue
> > offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT enabled and Rx packet
> segment
> > configured, PMD will be able to split the received packets into
> > multiple segments.
> >
> > However, length based buffer split is not suitable for NICs that do
> > split based on protocol headers. Given an arbitrarily variable length
> > in Rx packet segment, it is almost impossible to pass a fixed protocol
> > header to driver. Besides, the existence of tunneling results in the
> > composition of a packet is various, which makes the situation even worse.
> >
> > This patch extends current buffer split to support protocol header
> > based buffer split. A new proto_hdr field is introduced in the
> > reserved field of rte_eth_rxseg_split structure to specify protocol
> > header. The proto_hdr field defines the split position of packet,
> > splitting will always happens after the protocol header defined in the
> > Rx packet segment. When Rx queue offload
> > RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT is enabled and corresponding
> protocol
> > header is configured, driver will split the ingress packets into multiple
> segments.
> >
> > struct rte_eth_rxseg_split {
> >          struct rte_mempool *mp; /* memory pools to allocate segment from
> */
> >          uint16_t length; /* segment maximal data length,
> >                              configures split point */
> >          uint16_t offset; /* data offset from beginning
> >                              of mbuf data buffer */
> >          uint32_t proto_hdr; /* supported ptype of a specific pmd,
> >                                 configures split point.
> > 			       It should be defined by RTE_PTYPE_*
> 
> If I understand correctly, the statement is a bit misleading since it should be a
> bit mask of RTE_PTYPE_* defines. Not exactly one RTE_PTYPE_*.

Do you mean that a segment should support multiple protocol headers, such as splitting both tcp and udp headers?

> 
> > 			     */
> > };
> >
> > If protocol header split can be supported by a PMD. The
> > rte_eth_buffer_split_get_supported_hdr_ptypes function can be use to
> > obtain a list of these protocol headers.
> >
> > For example, let's suppose we configured the Rx queue with the
> > following segments:
> >          seg0 - pool0, proto_hdr0=RTE_PTYPE_L3_IPV4, off0=2B
> >          seg1 - pool1, proto_hdr1=RTE_PTYPE_L4_UDP, off1=128B
> >          seg2 - pool2, off1=0B
> >
> > The packet consists of MAC_IPV4_UDP_PAYLOAD will be split like
> 
> What is MAC_IPV4_UDP_PAYLOAD? Do you mean ETH_IPV4_UDP_PAYLOAD?

Thanks for your correction, it should be ETH_IPV4_UDP_PAYLOAD.

> 
> > following:
> >          seg0 - ipv4 header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from
> pool0
> >          seg1 - udp header @ 128 in mbuf from pool1
> >          seg2 - payload @ 0 in mbuf from pool2
> >
> > Note: NIC will only do split when the packets exactly match all the
> > protocol headers in the segments. For example, if ARP packets received
> > with above config, the NIC won't do split for ARP packets since it
> > does not contains ipv4 header and udp header.
> 
> You must define which mempool is used in the case.

IMHO I don't think we can define which mempool to use, it depends on NIC behavior.
For our NIC, packets that are unable to split will be put into the last valid pool, with zero offset. 
So here we would like to define to put these packets into the last valid mempool, with zero offset.

> 
> >
> > Now buffer split can be configured in two modes. For length based
> > buffer split, the mp, length, offset field in Rx packet segment should
> > be configured, while the proto_hdr field will be ignored.
> > For protocol header based buffer split, the mp, offset, proto_hdr
> > field in Rx packet segment should be configured, while the length
> > field will be ignored.
> >
> > The split limitations imposed by underlying driver is reported in the
> > rte_eth_dev_info->rx_seg_capa field. The memory attributes for the
> > split parts may differ either, dpdk memory and external memory,
> respectively.
> >
> > Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
> > Signed-off-by: Xuan Ding <xuan.ding@intel.com>
> > Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
> > ---
> >   doc/guides/rel_notes/release_22_11.rst |  5 +++
> >   lib/ethdev/rte_ethdev.c                | 55 ++++++++++++++++++++------
> >   lib/ethdev/rte_ethdev.h                | 17 +++++++-
> >   3 files changed, 65 insertions(+), 12 deletions(-)
> >
> > diff --git a/doc/guides/rel_notes/release_22_11.rst
> > b/doc/guides/rel_notes/release_22_11.rst
> > index 4d90514a9a..f3b58c7895 100644
> > --- a/doc/guides/rel_notes/release_22_11.rst
> > +++ b/doc/guides/rel_notes/release_22_11.rst
> > @@ -60,6 +60,11 @@ New Features
> >     Added ``rte_eth_buffer_split_get_supported_hdr_ptypes()``, to get
> supported
> >     header protocols of a PMD to split.
> >
> > +* **Added protocol header based buffer split.**
> > +  Ethdev: The ``reserved`` field in the  ``rte_eth_rxseg_split``
> > +structure is
> > +  replaced with ``proto_hdr`` to support protocol header based buffer split.
> > +  User can choose length or protocol header to configure buffer split
> > +  according to NIC's capability.
> 
> Add one more empty line to have two before the next sectoin.

Thanks for your catch.

> 
> >
> >   Removed Items
> >   -------------
> > diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c index
> > 093c577add..dfceb723ee 100644
> > --- a/lib/ethdev/rte_ethdev.c
> > +++ b/lib/ethdev/rte_ethdev.c
> > @@ -1635,9 +1635,10 @@ rte_eth_dev_is_removed(uint16_t port_id)
> >   }
> >
> >   static int
> > -rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
> > -			     uint16_t n_seg, uint32_t *mbp_buf_size,
> > -			     const struct rte_eth_dev_info *dev_info)
> > +rte_eth_rx_queue_check_split(uint16_t port_id,
> > +			const struct rte_eth_rxseg_split *rx_seg,
> > +			uint16_t n_seg, uint32_t *mbp_buf_size,
> > +			const struct rte_eth_dev_info *dev_info)
> >   {
> >   	const struct rte_eth_rxseg_capa *seg_capa = &dev_info-
> >rx_seg_capa;
> >   	struct rte_mempool *mp_first;
> > @@ -1660,6 +1661,7 @@ rte_eth_rx_queue_check_split(const struct
> rte_eth_rxseg_split *rx_seg,
> >   		struct rte_mempool *mpl = rx_seg[seg_idx].mp;
> >   		uint32_t length = rx_seg[seg_idx].length;
> >   		uint32_t offset = rx_seg[seg_idx].offset;
> > +		uint32_t proto_hdr = rx_seg[seg_idx].proto_hdr;
> >
> >   		if (mpl == NULL) {
> >   			RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
> @@ -1693,13
> > +1695,44 @@ rte_eth_rx_queue_check_split(const struct
> rte_eth_rxseg_split *rx_seg,
> >   		}
> >   		offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
> >   		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
> > -		length = length != 0 ? length : *mbp_buf_size;
> > -		if (*mbp_buf_size < length + offset) {
> > -			RTE_ETHDEV_LOG(ERR,
> > -				       "%s mbuf_data_room_size %u < %u
> (segment length=%u + segment offset=%u)\n",
> > -				       mpl->name, *mbp_buf_size,
> > -				       length + offset, length, offset);
> > -			return -EINVAL;
> > +
> > +		int ret =
> rte_eth_buffer_split_get_supported_hdr_ptypes(port_id,
> > +NULL, 0);
> 
> Do not mix variable declaration and code.
> It is better to give the variable some sensible name.
> Otherwise else branch code is hard to read.

Thanks for the suggestion, will  take care of naming.

> 
> > +		if (ret <= 0) {
> 
> May be I'm missing something, but nothing prevetns a driver/HW to support
> both protocol-based and fixed-length split.
> So, ability to support protocol based split should be treated as a request to
> do it. It must be based on rx_seg->proto_hdr content (for all segments).
> 
> Also nothing should prevent to mix protocol and fixed-length split. I.e. split
> just after UDP in the first segment,
> 40 bytes in the second segment, everything else in the third.

Mix mode is an interesting idea. Currently testpmd and driver do not support mixed mode, but it does not affect the library to support this mode.

> 
> > +			/* Split at fixed length. */
> > +			length = length != 0 ? length : *mbp_buf_size;
> > +			if (*mbp_buf_size < length + offset) {
> > +				RTE_ETHDEV_LOG(ERR,
> > +					"%s mbuf_data_room_size %u < %u
> (segment length=%u + segment offset=%u)\n",
> > +					mpl->name, *mbp_buf_size,
> > +					length + offset, length, offset);
> > +				return -EINVAL;
> > +			}
> > +		} else {
> > +			/* Split after specified protocol header. */
> > +			uint32_t ptypes[ret];
> > +			int i;
> > +
> > +			ret =
> rte_eth_buffer_split_get_supported_hdr_ptypes(port_id,
> > +ptypes, ret);
> 
> In theory, the funciton could fail since input arguments differ. So, it should be
> handled.

Thanks for your catch, will fix in the next version.

> 
> > +			for (i = 0; i < ret; i++)
> > +				if (ptypes[i] & proto_hdr)
> 
> IMHO it should be ==, not &. I think that
> rte_eth_buffer_split_get_supported_hdr_ptypes() should define points at
> which split could happen and we should match the point exactly.

Sure, == is better. Thanks for the suggestion.

> 
> > +					break;
> > +
> > +			if (i == ret) {
> > +#define PTYPE_NAMESIZE	256
> 
> Why? It is looks really strange that it is defined here.

I intend to display the protocol name in the log, but if the proto_hdr is a bit mask, can I just show the number?
Please see v4 for this modification.

> 
> > +				char ptype_name[PTYPE_NAMESIZE];
> > +				rte_get_ptype_name(proto_hdr,
> ptype_name, sizeof(ptype_name));
> > +				RTE_ETHDEV_LOG(ERR,
> > +					"Protocol header %s is not
> supported.\n",
> > +					ptype_name);
> > +				return -EINVAL;
> > +			}
> > +			if (*mbp_buf_size < offset) {
> > +				RTE_ETHDEV_LOG(ERR,
> > +						"%s
> mbuf_data_room_size %u < %u segment offset)\n",
> > +						mpl->name, *mbp_buf_size,
> > +						offset);
> > +				return -EINVAL;
> > +			}
> >   		}
> >   	}
> >   	return 0;
> > @@ -1778,7 +1811,7 @@ rte_eth_rx_queue_setup(uint16_t port_id,
> uint16_t rx_queue_id,
> >   		n_seg = rx_conf->rx_nseg;
> >
> >   		if (rx_conf->offloads &
> RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
> > -			ret = rte_eth_rx_queue_check_split(rx_seg, n_seg,
> > +			ret = rte_eth_rx_queue_check_split(port_id, rx_seg,
> n_seg,
> >   							   &mbp_buf_size,
> >   							   &dev_info);
> >   			if (ret != 0)
> > diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h index
> > c58c908c3a..410fba5eab 100644
> > --- a/lib/ethdev/rte_ethdev.h
> > +++ b/lib/ethdev/rte_ethdev.h
> > @@ -1175,6 +1175,9 @@ struct rte_eth_txmode {
> >    *   specified in the first array element, the second buffer, from the
> >    *   pool in the second element, and so on.
> >    *
> > + * - The proto_hdrs in the elements define the split position of
> > + *   received packets.
> > + *
> >    * - The offsets from the segment description elements specify
> >    *   the data offset from the buffer beginning except the first mbuf.
> >    *   The first segment offset is added with RTE_PKTMBUF_HEADROOM.
> > @@ -1196,12 +1199,24 @@ struct rte_eth_txmode {
> >    *     - pool from the last valid element
> >    *     - the buffer size from this pool
> >    *     - zero offset
> > + *
> > + * - Length based buffer split:
> > + *     - mp, length, offset should be configured.
> > + *     - The proto_hdr field will be ignored.
> > + *
> > + * - Protocol header based buffer split:
> > + *     - mp, offset, proto_hdr should be configured.
> > + *     - The length field will be ignored.
> >    */
> >   struct rte_eth_rxseg_split {
> >   	struct rte_mempool *mp; /**< Memory pool to allocate segment
> from. */
> >   	uint16_t length; /**< Segment data length, configures split point. */
> >   	uint16_t offset; /**< Data offset from beginning of mbuf data buffer.
> */
> > -	uint32_t reserved; /**< Reserved field. */
> > +	/**
> > +	 * Supported ptype of a specific pmd, configures split point.
> > +	 * It should be defined by RTE_PTYPE_*.
> > +	 */
> > +	uint32_t proto_hdr;
> >   };
> >
> >   /**
  
Wang, YuanX Sept. 16, 2022, 8:39 a.m. UTC | #4
Hi 

> -----Original Message-----
> From: Suanming Mou <suanmingm@nvidia.com>
> Sent: Tuesday, September 13, 2022 3:57 PM
> To: Wang, YuanX <yuanx.wang@intel.com>; dev@dpdk.org; NBU-Contact-
> Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>; Ferruh Yigit
> <ferruh.yigit@xilinx.com>; Andrew Rybchenko
> <andrew.rybchenko@oktetlabs.ru>
> Cc: mdr@ashroe.eu; Li, Xiaoyun <xiaoyun.li@intel.com>; Singh, Aman Deep
> <aman.deep.singh@intel.com>; Zhang, Yuying <yuying.zhang@intel.com>;
> Zhang, Qi Z <qi.z.zhang@intel.com>; Yang, Qiming <qiming.yang@intel.com>;
> jerinjacobk@gmail.com; Slava Ovsiienko <viacheslavo@nvidia.com>;
> stephen@networkplumber.org; Ding, Xuan <xuan.ding@intel.com>;
> hpothula@marvell.com; Tang, Yaqi <yaqi.tang@intel.com>; Wenxuan Wu
> <wenxuanx.wu@intel.com>
> Subject: RE: [PATCH v3 2/4] ethdev: introduce protocol hdr based buffer split
> 
> Hi
> 
> > -----Original Message-----
> > From: Yuan Wang <yuanx.wang@intel.com>
> > Sent: Saturday, September 3, 2022 3:10 AM
> > To: dev@dpdk.org; NBU-Contact-Thomas Monjalon (EXTERNAL)
> > <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@xilinx.com>; Andrew
> > Rybchenko <andrew.rybchenko@oktetlabs.ru>
> > Cc: mdr@ashroe.eu; xiaoyun.li@intel.com; aman.deep.singh@intel.com;
> > yuying.zhang@intel.com; qi.z.zhang@intel.com; qiming.yang@intel.com;
> > jerinjacobk@gmail.com; Slava Ovsiienko <viacheslavo@nvidia.com>;
> > stephen@networkplumber.org; xuan.ding@intel.com;
> hpothula@marvell.com;
> > yaqi.tang@intel.com; Yuan Wang <yuanx.wang@intel.com>; Wenxuan Wu
> > <wenxuanx.wu@intel.com>
> > Subject: [PATCH v3 2/4] ethdev: introduce protocol hdr based buffer
> > split
> >
> 
> snip
> 
> > @@ -1693,13 +1695,44 @@ rte_eth_rx_queue_check_split(const struct
> > rte_eth_rxseg_split *rx_seg,
> >  		}
> >  		offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
> >  		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
> > -		length = length != 0 ? length : *mbp_buf_size;
> > -		if (*mbp_buf_size < length + offset) {
> > -			RTE_ETHDEV_LOG(ERR,
> > -				       "%s mbuf_data_room_size %u < %u
> > (segment length=%u + segment offset=%u)\n",
> > -				       mpl->name, *mbp_buf_size,
> > -				       length + offset, length, offset);
> > -			return -EINVAL;
> > +
> > +		int ret =
> > rte_eth_buffer_split_get_supported_hdr_ptypes(port_id, NULL, 0);
> 
> One small question, since the  ptypes == NULL and num == 0, I assume ret
> will always be <=0, right?

The usage of rte_eth_buffer_split_get_supported_hdr_ptypes is the same as rte_eth_dev_get_supported_ptypes.
 In this scenario, the function returns the total number of supported ptypes, or an error code less than 0.

> 
> > +		if (ret <= 0) {
> > +			/* Split at fixed length. */
> > +			length = length != 0 ? length : *mbp_buf_size;
> > +			if (*mbp_buf_size < length + offset) {
> > +				RTE_ETHDEV_LOG(ERR,
> > +					"%s mbuf_data_room_size %u < %u
> > (segment length=%u + segment offset=%u)\n",
> > +					mpl->name, *mbp_buf_size,
> > +					length + offset, length, offset);
> > +				return -EINVAL;
> > +			}
> > +		} else {
> > +			/* Split after specified protocol header. */
> > +			uint32_t ptypes[ret];
> > +			int i;
> > +
> > +			ret =
> > rte_eth_buffer_split_get_supported_hdr_ptypes(port_id, ptypes, ret);
> > +			for (i = 0; i < ret; i++)
> > +				if (ptypes[i] & proto_hdr)
> > +					break;
> > +
> 
> snip
  
Andrew Rybchenko Sept. 20, 2022, 5:35 a.m. UTC | #5
On 9/16/22 11:38, Wang, YuanX wrote:
> Hi Andrew,
> 
>> -----Original Message-----
>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>> Sent: Monday, September 12, 2022 7:47 PM
>> To: Wang, YuanX <yuanx.wang@intel.com>; dev@dpdk.org; Thomas
>> Monjalon <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@xilinx.com>
>> Cc: mdr@ashroe.eu; Li, Xiaoyun <xiaoyun.li@intel.com>; Singh, Aman Deep
>> <aman.deep.singh@intel.com>; Zhang, Yuying <yuying.zhang@intel.com>;
>> Zhang, Qi Z <qi.z.zhang@intel.com>; Yang, Qiming <qiming.yang@intel.com>;
>> jerinjacobk@gmail.com; viacheslavo@nvidia.com;
>> stephen@networkplumber.org; Ding, Xuan <xuan.ding@intel.com>;
>> hpothula@marvell.com; Tang, Yaqi <yaqi.tang@intel.com>; Wenxuan Wu
>> <wenxuanx.wu@intel.com>
>> Subject: Re: [PATCH v3 2/4] ethdev: introduce protocol hdr based buffer split
>>
>> On 9/2/22 22:10, Yuan Wang wrote:
>>> Currently, Rx buffer split supports length based split. With Rx queue
>>> offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT enabled and Rx packet
>> segment
>>> configured, PMD will be able to split the received packets into
>>> multiple segments.
>>>
>>> However, length based buffer split is not suitable for NICs that do
>>> split based on protocol headers. Given an arbitrarily variable length
>>> in Rx packet segment, it is almost impossible to pass a fixed protocol
>>> header to driver. Besides, the existence of tunneling results in the
>>> composition of a packet is various, which makes the situation even worse.
>>>
>>> This patch extends current buffer split to support protocol header
>>> based buffer split. A new proto_hdr field is introduced in the
>>> reserved field of rte_eth_rxseg_split structure to specify protocol
>>> header. The proto_hdr field defines the split position of packet,
>>> splitting will always happens after the protocol header defined in the
>>> Rx packet segment. When Rx queue offload
>>> RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT is enabled and corresponding
>> protocol
>>> header is configured, driver will split the ingress packets into multiple
>> segments.
>>>
>>> struct rte_eth_rxseg_split {
>>>           struct rte_mempool *mp; /* memory pools to allocate segment from
>> */
>>>           uint16_t length; /* segment maximal data length,
>>>                               configures split point */
>>>           uint16_t offset; /* data offset from beginning
>>>                               of mbuf data buffer */
>>>           uint32_t proto_hdr; /* supported ptype of a specific pmd,
>>>                                  configures split point.
>>> 			       It should be defined by RTE_PTYPE_*
>>
>> If I understand correctly, the statement is a bit misleading since it should be a
>> bit mask of RTE_PTYPE_* defines. Not exactly one RTE_PTYPE_*.
> 
> Do you mean that a segment should support multiple protocol headers, such as splitting both tcp and udp headers?

No-no. Look. In order to split after some protocol, for example
UDP, NIC should recognice all previous protocols. Moreover, in
the case of a tunenl UDP could be inner and outer. At which
point would you like to split if NIC supports both?
Another example, if NIC support stplit after Eth-IPv4-UDP and
after Eth-IPv6-UDP, how to request to split just after
Eth-IPv4-UDP, but not Eth-IPv6-UDP?

> 
>>
>>> 			     */
>>> };
>>>
>>> If protocol header split can be supported by a PMD. The
>>> rte_eth_buffer_split_get_supported_hdr_ptypes function can be use to
>>> obtain a list of these protocol headers.
>>>
>>> For example, let's suppose we configured the Rx queue with the
>>> following segments:
>>>           seg0 - pool0, proto_hdr0=RTE_PTYPE_L3_IPV4, off0=2B
>>>           seg1 - pool1, proto_hdr1=RTE_PTYPE_L4_UDP, off1=128B
>>>           seg2 - pool2, off1=0B
>>>
>>> The packet consists of MAC_IPV4_UDP_PAYLOAD will be split like
>>
>> What is MAC_IPV4_UDP_PAYLOAD? Do you mean ETH_IPV4_UDP_PAYLOAD?
> 
> Thanks for your correction, it should be ETH_IPV4_UDP_PAYLOAD.
> 
>>
>>> following:
>>>           seg0 - ipv4 header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from
>> pool0
>>>           seg1 - udp header @ 128 in mbuf from pool1
>>>           seg2 - payload @ 0 in mbuf from pool2
>>>
>>> Note: NIC will only do split when the packets exactly match all the
>>> protocol headers in the segments. For example, if ARP packets received
>>> with above config, the NIC won't do split for ARP packets since it
>>> does not contains ipv4 header and udp header.
>>
>> You must define which mempool is used in the case.
> 
> IMHO I don't think we can define which mempool to use, it depends on NIC behavior.
> For our NIC, packets that are unable to split will be put into the last valid pool, with zero offset.
> So here we would like to define to put these packets into the last valid mempool, with zero offset.


Anyway API should not be silent about the case since it is the
first question at least in my head. IMHO the last segment is
the only sensible option since it is typically will be big
enough. Other mempool for protocol headers are likely to be
small. So, I suggest to define this way.

>>
>>>
>>> Now buffer split can be configured in two modes. For length based
>>> buffer split, the mp, length, offset field in Rx packet segment should
>>> be configured, while the proto_hdr field will be ignored.
>>> For protocol header based buffer split, the mp, offset, proto_hdr
>>> field in Rx packet segment should be configured, while the length
>>> field will be ignored.
>>>
>>> The split limitations imposed by underlying driver is reported in the
>>> rte_eth_dev_info->rx_seg_capa field. The memory attributes for the
>>> split parts may differ either, dpdk memory and external memory,
>> respectively.
>>>
>>> Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
>>> Signed-off-by: Xuan Ding <xuan.ding@intel.com>
>>> Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
>>> ---
>>>    doc/guides/rel_notes/release_22_11.rst |  5 +++
>>>    lib/ethdev/rte_ethdev.c                | 55 ++++++++++++++++++++------
>>>    lib/ethdev/rte_ethdev.h                | 17 +++++++-
>>>    3 files changed, 65 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/doc/guides/rel_notes/release_22_11.rst
>>> b/doc/guides/rel_notes/release_22_11.rst
>>> index 4d90514a9a..f3b58c7895 100644
>>> --- a/doc/guides/rel_notes/release_22_11.rst
>>> +++ b/doc/guides/rel_notes/release_22_11.rst
>>> @@ -60,6 +60,11 @@ New Features
>>>      Added ``rte_eth_buffer_split_get_supported_hdr_ptypes()``, to get
>> supported
>>>      header protocols of a PMD to split.
>>>
>>> +* **Added protocol header based buffer split.**
>>> +  Ethdev: The ``reserved`` field in the  ``rte_eth_rxseg_split``
>>> +structure is
>>> +  replaced with ``proto_hdr`` to support protocol header based buffer split.
>>> +  User can choose length or protocol header to configure buffer split
>>> +  according to NIC's capability.
>>
>> Add one more empty line to have two before the next sectoin.
> 
> Thanks for your catch.
> 
>>
>>>
>>>    Removed Items
>>>    -------------
>>> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c index
>>> 093c577add..dfceb723ee 100644
>>> --- a/lib/ethdev/rte_ethdev.c
>>> +++ b/lib/ethdev/rte_ethdev.c
>>> @@ -1635,9 +1635,10 @@ rte_eth_dev_is_removed(uint16_t port_id)
>>>    }
>>>
>>>    static int
>>> -rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
>>> -			     uint16_t n_seg, uint32_t *mbp_buf_size,
>>> -			     const struct rte_eth_dev_info *dev_info)
>>> +rte_eth_rx_queue_check_split(uint16_t port_id,
>>> +			const struct rte_eth_rxseg_split *rx_seg,
>>> +			uint16_t n_seg, uint32_t *mbp_buf_size,
>>> +			const struct rte_eth_dev_info *dev_info)
>>>    {
>>>    	const struct rte_eth_rxseg_capa *seg_capa = &dev_info-
>>> rx_seg_capa;
>>>    	struct rte_mempool *mp_first;
>>> @@ -1660,6 +1661,7 @@ rte_eth_rx_queue_check_split(const struct
>> rte_eth_rxseg_split *rx_seg,
>>>    		struct rte_mempool *mpl = rx_seg[seg_idx].mp;
>>>    		uint32_t length = rx_seg[seg_idx].length;
>>>    		uint32_t offset = rx_seg[seg_idx].offset;
>>> +		uint32_t proto_hdr = rx_seg[seg_idx].proto_hdr;
>>>
>>>    		if (mpl == NULL) {
>>>    			RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
>> @@ -1693,13
>>> +1695,44 @@ rte_eth_rx_queue_check_split(const struct
>> rte_eth_rxseg_split *rx_seg,
>>>    		}
>>>    		offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
>>>    		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
>>> -		length = length != 0 ? length : *mbp_buf_size;
>>> -		if (*mbp_buf_size < length + offset) {
>>> -			RTE_ETHDEV_LOG(ERR,
>>> -				       "%s mbuf_data_room_size %u < %u
>> (segment length=%u + segment offset=%u)\n",
>>> -				       mpl->name, *mbp_buf_size,
>>> -				       length + offset, length, offset);
>>> -			return -EINVAL;
>>> +
>>> +		int ret =
>> rte_eth_buffer_split_get_supported_hdr_ptypes(port_id,
>>> +NULL, 0);
>>
>> Do not mix variable declaration and code.
>> It is better to give the variable some sensible name.
>> Otherwise else branch code is hard to read.
> 
> Thanks for the suggestion, will  take care of naming.
> 
>>
>>> +		if (ret <= 0) {
>>
>> May be I'm missing something, but nothing prevetns a driver/HW to support
>> both protocol-based and fixed-length split.
>> So, ability to support protocol based split should be treated as a request to
>> do it. It must be based on rx_seg->proto_hdr content (for all segments).
>>
>> Also nothing should prevent to mix protocol and fixed-length split. I.e. split
>> just after UDP in the first segment,
>> 40 bytes in the second segment, everything else in the third.
> 
> Mix mode is an interesting idea. Currently testpmd and driver do not support mixed mode, but it does not affect the library to support this mode.

That's OK.

> 
>>
>>> +			/* Split at fixed length. */
>>> +			length = length != 0 ? length : *mbp_buf_size;
>>> +			if (*mbp_buf_size < length + offset) {
>>> +				RTE_ETHDEV_LOG(ERR,
>>> +					"%s mbuf_data_room_size %u < %u
>> (segment length=%u + segment offset=%u)\n",
>>> +					mpl->name, *mbp_buf_size,
>>> +					length + offset, length, offset);
>>> +				return -EINVAL;
>>> +			}
>>> +		} else {
>>> +			/* Split after specified protocol header. */
>>> +			uint32_t ptypes[ret];
>>> +			int i;
>>> +
>>> +			ret =
>> rte_eth_buffer_split_get_supported_hdr_ptypes(port_id,
>>> +ptypes, ret);
>>
>> In theory, the funciton could fail since input arguments differ. So, it should be
>> handled.
> 
> Thanks for your catch, will fix in the next version.
> 
>>
>>> +			for (i = 0; i < ret; i++)
>>> +				if (ptypes[i] & proto_hdr)
>>
>> IMHO it should be ==, not &. I think that
>> rte_eth_buffer_split_get_supported_hdr_ptypes() should define points at
>> which split could happen and we should match the point exactly.
> 
> Sure, == is better. Thanks for the suggestion.
> 
>>
>>> +					break;
>>> +
>>> +			if (i == ret) {
>>> +#define PTYPE_NAMESIZE	256
>>
>> Why? It is looks really strange that it is defined here.
> 
> I intend to display the protocol name in the log, but if the proto_hdr is a bit mask, can I just show the number?
> Please see v4 for this modification.
> 
>>
>>> +				char ptype_name[PTYPE_NAMESIZE];
>>> +				rte_get_ptype_name(proto_hdr,
>> ptype_name, sizeof(ptype_name));
>>> +				RTE_ETHDEV_LOG(ERR,
>>> +					"Protocol header %s is not
>> supported.\n",
>>> +					ptype_name);
>>> +				return -EINVAL;
>>> +			}
>>> +			if (*mbp_buf_size < offset) {
>>> +				RTE_ETHDEV_LOG(ERR,
>>> +						"%s
>> mbuf_data_room_size %u < %u segment offset)\n",
>>> +						mpl->name, *mbp_buf_size,
>>> +						offset);
>>> +				return -EINVAL;
>>> +			}
>>>    		}
>>>    	}
>>>    	return 0;
>>> @@ -1778,7 +1811,7 @@ rte_eth_rx_queue_setup(uint16_t port_id,
>> uint16_t rx_queue_id,
>>>    		n_seg = rx_conf->rx_nseg;
>>>
>>>    		if (rx_conf->offloads &
>> RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
>>> -			ret = rte_eth_rx_queue_check_split(rx_seg, n_seg,
>>> +			ret = rte_eth_rx_queue_check_split(port_id, rx_seg,
>> n_seg,
>>>    							   &mbp_buf_size,
>>>    							   &dev_info);
>>>    			if (ret != 0)
>>> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h index
>>> c58c908c3a..410fba5eab 100644
>>> --- a/lib/ethdev/rte_ethdev.h
>>> +++ b/lib/ethdev/rte_ethdev.h
>>> @@ -1175,6 +1175,9 @@ struct rte_eth_txmode {
>>>     *   specified in the first array element, the second buffer, from the
>>>     *   pool in the second element, and so on.
>>>     *
>>> + * - The proto_hdrs in the elements define the split position of
>>> + *   received packets.
>>> + *
>>>     * - The offsets from the segment description elements specify
>>>     *   the data offset from the buffer beginning except the first mbuf.
>>>     *   The first segment offset is added with RTE_PKTMBUF_HEADROOM.
>>> @@ -1196,12 +1199,24 @@ struct rte_eth_txmode {
>>>     *     - pool from the last valid element
>>>     *     - the buffer size from this pool
>>>     *     - zero offset
>>> + *
>>> + * - Length based buffer split:
>>> + *     - mp, length, offset should be configured.
>>> + *     - The proto_hdr field will be ignored.
>>> + *
>>> + * - Protocol header based buffer split:
>>> + *     - mp, offset, proto_hdr should be configured.
>>> + *     - The length field will be ignored.
>>>     */
>>>    struct rte_eth_rxseg_split {
>>>    	struct rte_mempool *mp; /**< Memory pool to allocate segment
>> from. */
>>>    	uint16_t length; /**< Segment data length, configures split point. */
>>>    	uint16_t offset; /**< Data offset from beginning of mbuf data buffer.
>> */
>>> -	uint32_t reserved; /**< Reserved field. */
>>> +	/**
>>> +	 * Supported ptype of a specific pmd, configures split point.
>>> +	 * It should be defined by RTE_PTYPE_*.
>>> +	 */
>>> +	uint32_t proto_hdr;
>>>    };
>>>
>>>    /**
>
  
Wang, YuanX Sept. 22, 2022, 3:13 a.m. UTC | #6
Hi Andrew,

> -----Original Message-----
> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Sent: Tuesday, September 20, 2022 1:35 PM
> To: Wang, YuanX <yuanx.wang@intel.com>; dev@dpdk.org; Thomas
> Monjalon <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@xilinx.com>
> Cc: mdr@ashroe.eu; Li, Xiaoyun <xiaoyun.li@intel.com>; Singh, Aman Deep
> <aman.deep.singh@intel.com>; Zhang, Yuying <yuying.zhang@intel.com>;
> Zhang, Qi Z <qi.z.zhang@intel.com>; Yang, Qiming <qiming.yang@intel.com>;
> jerinjacobk@gmail.com; viacheslavo@nvidia.com;
> stephen@networkplumber.org; Ding, Xuan <xuan.ding@intel.com>;
> hpothula@marvell.com; Tang, Yaqi <yaqi.tang@intel.com>; Wenxuan Wu
> <wenxuanx.wu@intel.com>
> Subject: Re: [PATCH v3 2/4] ethdev: introduce protocol hdr based buffer split
> 
> On 9/16/22 11:38, Wang, YuanX wrote:
> > Hi Andrew,
> >
> >> -----Original Message-----
> >> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> >> Sent: Monday, September 12, 2022 7:47 PM
> >> To: Wang, YuanX <yuanx.wang@intel.com>; dev@dpdk.org; Thomas
> Monjalon
> >> <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@xilinx.com>
> >> Cc: mdr@ashroe.eu; Li, Xiaoyun <xiaoyun.li@intel.com>; Singh, Aman
> >> Deep <aman.deep.singh@intel.com>; Zhang, Yuying
> >> <yuying.zhang@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>; Yang,
> >> Qiming <qiming.yang@intel.com>; jerinjacobk@gmail.com;
> >> viacheslavo@nvidia.com; stephen@networkplumber.org; Ding, Xuan
> >> <xuan.ding@intel.com>; hpothula@marvell.com; Tang, Yaqi
> >> <yaqi.tang@intel.com>; Wenxuan Wu <wenxuanx.wu@intel.com>
> >> Subject: Re: [PATCH v3 2/4] ethdev: introduce protocol hdr based
> >> buffer split
> >>
> >> On 9/2/22 22:10, Yuan Wang wrote:
> >>> Currently, Rx buffer split supports length based split. With Rx
> >>> queue offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT enabled and Rx
> packet
> >> segment
> >>> configured, PMD will be able to split the received packets into
> >>> multiple segments.
> >>>
> >>> However, length based buffer split is not suitable for NICs that do
> >>> split based on protocol headers. Given an arbitrarily variable
> >>> length in Rx packet segment, it is almost impossible to pass a fixed
> >>> protocol header to driver. Besides, the existence of tunneling
> >>> results in the composition of a packet is various, which makes the
> situation even worse.
> >>>
> >>> This patch extends current buffer split to support protocol header
> >>> based buffer split. A new proto_hdr field is introduced in the
> >>> reserved field of rte_eth_rxseg_split structure to specify protocol
> >>> header. The proto_hdr field defines the split position of packet,
> >>> splitting will always happens after the protocol header defined in
> >>> the Rx packet segment. When Rx queue offload
> >>> RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT is enabled and corresponding
> >> protocol
> >>> header is configured, driver will split the ingress packets into
> >>> multiple
> >> segments.
> >>>
> >>> struct rte_eth_rxseg_split {
> >>>           struct rte_mempool *mp; /* memory pools to allocate
> >>> segment from
> >> */
> >>>           uint16_t length; /* segment maximal data length,
> >>>                               configures split point */
> >>>           uint16_t offset; /* data offset from beginning
> >>>                               of mbuf data buffer */
> >>>           uint32_t proto_hdr; /* supported ptype of a specific pmd,
> >>>                                  configures split point.
> >>> 			       It should be defined by RTE_PTYPE_*
> >>
> >> If I understand correctly, the statement is a bit misleading since it
> >> should be a bit mask of RTE_PTYPE_* defines. Not exactly one
> RTE_PTYPE_*.
> >
> > Do you mean that a segment should support multiple protocol headers,
> such as splitting both tcp and udp headers?
> 
> No-no. Look. In order to split after some protocol, for example UDP, NIC
> should recognice all previous protocols. Moreover, in the case of a tunenl
> UDP could be inner and outer. At which point would you like to split if NIC
> supports both?
> Another example, if NIC support stplit after Eth-IPv4-UDP and after Eth-IPv6-
> UDP, how to request to split just after Eth-IPv4-UDP, but not Eth-IPv6-UDP?

Thank you for your patience. 

We have a proposal to solve this problem. 
We define the proto_hdr as a bit mask. Each mask should contain the composition of the packet, but inner and outer are written separately (to avoid unnecessary trouble).
We use the highest RTE_PTYPE* in the mask to define the split position.

For the first example, since ptype has distinguished between outer and inner, external UDP can be simply written as RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_L4_UDP.
The inner UDP can be simply written as RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_UDP.

 For the second example, Eth-IPv4-UDP can be written as RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_L4_UDP.
 Eth-IPv6-UDP can be written as RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_L4_UDP.

What do you think?

Thanks,
Yuan

> 
> >
> >>
> >>> 			     */
> >>> };
> >>>
> >>> If protocol header split can be supported by a PMD. The
> >>> rte_eth_buffer_split_get_supported_hdr_ptypes function can be use
> to
> >>> obtain a list of these protocol headers.
> >>>
> >>> For example, let's suppose we configured the Rx queue with the
> >>> following segments:
> >>>           seg0 - pool0, proto_hdr0=RTE_PTYPE_L3_IPV4, off0=2B
> >>>           seg1 - pool1, proto_hdr1=RTE_PTYPE_L4_UDP, off1=128B
> >>>           seg2 - pool2, off1=0B
> >>>
> >>> The packet consists of MAC_IPV4_UDP_PAYLOAD will be split like
> >>
> >> What is MAC_IPV4_UDP_PAYLOAD? Do you mean
> ETH_IPV4_UDP_PAYLOAD?
> >
> > Thanks for your correction, it should be ETH_IPV4_UDP_PAYLOAD.
> >
> >>
> >>> following:
> >>>           seg0 - ipv4 header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from
> >> pool0
> >>>           seg1 - udp header @ 128 in mbuf from pool1
> >>>           seg2 - payload @ 0 in mbuf from pool2
> >>>
> >>> Note: NIC will only do split when the packets exactly match all the
> >>> protocol headers in the segments. For example, if ARP packets
> >>> received with above config, the NIC won't do split for ARP packets
> >>> since it does not contains ipv4 header and udp header.
> >>
> >> You must define which mempool is used in the case.
> >
> > IMHO I don't think we can define which mempool to use, it depends on NIC
> behavior.
> > For our NIC, packets that are unable to split will be put into the last valid
> pool, with zero offset.
> > So here we would like to define to put these packets into the last valid
> mempool, with zero offset.
> 
> 
> Anyway API should not be silent about the case since it is the first question at
> least in my head. IMHO the last segment is the only sensible option since it is
> typically will be big enough. Other mempool for protocol headers are likely to
> be small. So, I suggest to define this way.
> 
> >>
> >>>
> >>> Now buffer split can be configured in two modes. For length based
> >>> buffer split, the mp, length, offset field in Rx packet segment
> >>> should be configured, while the proto_hdr field will be ignored.
> >>> For protocol header based buffer split, the mp, offset, proto_hdr
> >>> field in Rx packet segment should be configured, while the length
> >>> field will be ignored.
> >>>
> >>> The split limitations imposed by underlying driver is reported in
> >>> the rte_eth_dev_info->rx_seg_capa field. The memory attributes for
> >>> the split parts may differ either, dpdk memory and external memory,
> >> respectively.
> >>>
> >>> Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
> >>> Signed-off-by: Xuan Ding <xuan.ding@intel.com>
> >>> Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
> >>> ---
> >>>    doc/guides/rel_notes/release_22_11.rst |  5 +++
> >>>    lib/ethdev/rte_ethdev.c                | 55 ++++++++++++++++++++------
> >>>    lib/ethdev/rte_ethdev.h                | 17 +++++++-
> >>>    3 files changed, 65 insertions(+), 12 deletions(-)
> >>>
> >>> diff --git a/doc/guides/rel_notes/release_22_11.rst
> >>> b/doc/guides/rel_notes/release_22_11.rst
> >>> index 4d90514a9a..f3b58c7895 100644
> >>> --- a/doc/guides/rel_notes/release_22_11.rst
> >>> +++ b/doc/guides/rel_notes/release_22_11.rst
> >>> @@ -60,6 +60,11 @@ New Features
> >>>      Added ``rte_eth_buffer_split_get_supported_hdr_ptypes()``, to
> >>> get
> >> supported
> >>>      header protocols of a PMD to split.
> >>>
> >>> +* **Added protocol header based buffer split.**
> >>> +  Ethdev: The ``reserved`` field in the  ``rte_eth_rxseg_split``
> >>> +structure is
> >>> +  replaced with ``proto_hdr`` to support protocol header based buffer
> split.
> >>> +  User can choose length or protocol header to configure buffer
> >>> +split
> >>> +  according to NIC's capability.
> >>
> >> Add one more empty line to have two before the next sectoin.
> >
> > Thanks for your catch.
> >
> >>
> >>>
> >>>    Removed Items
> >>>    -------------
> >>> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c index
> >>> 093c577add..dfceb723ee 100644
> >>> --- a/lib/ethdev/rte_ethdev.c
> >>> +++ b/lib/ethdev/rte_ethdev.c
> >>> @@ -1635,9 +1635,10 @@ rte_eth_dev_is_removed(uint16_t port_id)
> >>>    }
> >>>
> >>>    static int
> >>> -rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split
> *rx_seg,
> >>> -			     uint16_t n_seg, uint32_t *mbp_buf_size,
> >>> -			     const struct rte_eth_dev_info *dev_info)
> >>> +rte_eth_rx_queue_check_split(uint16_t port_id,
> >>> +			const struct rte_eth_rxseg_split *rx_seg,
> >>> +			uint16_t n_seg, uint32_t *mbp_buf_size,
> >>> +			const struct rte_eth_dev_info *dev_info)
> >>>    {
> >>>    	const struct rte_eth_rxseg_capa *seg_capa = &dev_info-
> >>> rx_seg_capa;
> >>>    	struct rte_mempool *mp_first;
> >>> @@ -1660,6 +1661,7 @@ rte_eth_rx_queue_check_split(const struct
> >> rte_eth_rxseg_split *rx_seg,
> >>>    		struct rte_mempool *mpl = rx_seg[seg_idx].mp;
> >>>    		uint32_t length = rx_seg[seg_idx].length;
> >>>    		uint32_t offset = rx_seg[seg_idx].offset;
> >>> +		uint32_t proto_hdr = rx_seg[seg_idx].proto_hdr;
> >>>
> >>>    		if (mpl == NULL) {
> >>>    			RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
> >> @@ -1693,13
> >>> +1695,44 @@ rte_eth_rx_queue_check_split(const struct
> >> rte_eth_rxseg_split *rx_seg,
> >>>    		}
> >>>    		offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
> >>>    		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
> >>> -		length = length != 0 ? length : *mbp_buf_size;
> >>> -		if (*mbp_buf_size < length + offset) {
> >>> -			RTE_ETHDEV_LOG(ERR,
> >>> -				       "%s mbuf_data_room_size %u < %u
> >> (segment length=%u + segment offset=%u)\n",
> >>> -				       mpl->name, *mbp_buf_size,
> >>> -				       length + offset, length, offset);
> >>> -			return -EINVAL;
> >>> +
> >>> +		int ret =
> >> rte_eth_buffer_split_get_supported_hdr_ptypes(port_id,
> >>> +NULL, 0);
> >>
> >> Do not mix variable declaration and code.
> >> It is better to give the variable some sensible name.
> >> Otherwise else branch code is hard to read.
> >
> > Thanks for the suggestion, will  take care of naming.
> >
> >>
> >>> +		if (ret <= 0) {
> >>
> >> May be I'm missing something, but nothing prevetns a driver/HW to
> >> support both protocol-based and fixed-length split.
> >> So, ability to support protocol based split should be treated as a
> >> request to do it. It must be based on rx_seg->proto_hdr content (for all
> segments).
> >>
> >> Also nothing should prevent to mix protocol and fixed-length split.
> >> I.e. split just after UDP in the first segment,
> >> 40 bytes in the second segment, everything else in the third.
> >
> > Mix mode is an interesting idea. Currently testpmd and driver do not
> support mixed mode, but it does not affect the library to support this mode.
> 
> That's OK.
> 
> >
> >>
> >>> +			/* Split at fixed length. */
> >>> +			length = length != 0 ? length : *mbp_buf_size;
> >>> +			if (*mbp_buf_size < length + offset) {
> >>> +				RTE_ETHDEV_LOG(ERR,
> >>> +					"%s mbuf_data_room_size %u < %u
> >> (segment length=%u + segment offset=%u)\n",
> >>> +					mpl->name, *mbp_buf_size,
> >>> +					length + offset, length, offset);
> >>> +				return -EINVAL;
> >>> +			}
> >>> +		} else {
> >>> +			/* Split after specified protocol header. */
> >>> +			uint32_t ptypes[ret];
> >>> +			int i;
> >>> +
> >>> +			ret =
> >> rte_eth_buffer_split_get_supported_hdr_ptypes(port_id,
> >>> +ptypes, ret);
> >>
> >> In theory, the funciton could fail since input arguments differ. So,
> >> it should be handled.
> >
> > Thanks for your catch, will fix in the next version.
> >
> >>
> >>> +			for (i = 0; i < ret; i++)
> >>> +				if (ptypes[i] & proto_hdr)
> >>
> >> IMHO it should be ==, not &. I think that
> >> rte_eth_buffer_split_get_supported_hdr_ptypes() should define points
> >> at which split could happen and we should match the point exactly.
> >
> > Sure, == is better. Thanks for the suggestion.
> >
> >>
> >>> +					break;
> >>> +
> >>> +			if (i == ret) {
> >>> +#define PTYPE_NAMESIZE	256
> >>
> >> Why? It is looks really strange that it is defined here.
> >
> > I intend to display the protocol name in the log, but if the proto_hdr is a bit
> mask, can I just show the number?
> > Please see v4 for this modification.
> >
> >>
> >>> +				char ptype_name[PTYPE_NAMESIZE];
> >>> +				rte_get_ptype_name(proto_hdr,
> >> ptype_name, sizeof(ptype_name));
> >>> +				RTE_ETHDEV_LOG(ERR,
> >>> +					"Protocol header %s is not
> >> supported.\n",
> >>> +					ptype_name);
> >>> +				return -EINVAL;
> >>> +			}
> >>> +			if (*mbp_buf_size < offset) {
> >>> +				RTE_ETHDEV_LOG(ERR,
> >>> +						"%s
> >> mbuf_data_room_size %u < %u segment offset)\n",
> >>> +						mpl->name, *mbp_buf_size,
> >>> +						offset);
> >>> +				return -EINVAL;
> >>> +			}
> >>>    		}
> >>>    	}
> >>>    	return 0;
> >>> @@ -1778,7 +1811,7 @@ rte_eth_rx_queue_setup(uint16_t port_id,
> >> uint16_t rx_queue_id,
> >>>    		n_seg = rx_conf->rx_nseg;
> >>>
> >>>    		if (rx_conf->offloads &
> >> RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
> >>> -			ret = rte_eth_rx_queue_check_split(rx_seg, n_seg,
> >>> +			ret = rte_eth_rx_queue_check_split(port_id, rx_seg,
> >> n_seg,
> >>>    							   &mbp_buf_size,
> >>>    							   &dev_info);
> >>>    			if (ret != 0)
> >>> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h index
> >>> c58c908c3a..410fba5eab 100644
> >>> --- a/lib/ethdev/rte_ethdev.h
> >>> +++ b/lib/ethdev/rte_ethdev.h
> >>> @@ -1175,6 +1175,9 @@ struct rte_eth_txmode {
> >>>     *   specified in the first array element, the second buffer, from the
> >>>     *   pool in the second element, and so on.
> >>>     *
> >>> + * - The proto_hdrs in the elements define the split position of
> >>> + *   received packets.
> >>> + *
> >>>     * - The offsets from the segment description elements specify
> >>>     *   the data offset from the buffer beginning except the first mbuf.
> >>>     *   The first segment offset is added with RTE_PKTMBUF_HEADROOM.
> >>> @@ -1196,12 +1199,24 @@ struct rte_eth_txmode {
> >>>     *     - pool from the last valid element
> >>>     *     - the buffer size from this pool
> >>>     *     - zero offset
> >>> + *
> >>> + * - Length based buffer split:
> >>> + *     - mp, length, offset should be configured.
> >>> + *     - The proto_hdr field will be ignored.
> >>> + *
> >>> + * - Protocol header based buffer split:
> >>> + *     - mp, offset, proto_hdr should be configured.
> >>> + *     - The length field will be ignored.
> >>>     */
> >>>    struct rte_eth_rxseg_split {
> >>>    	struct rte_mempool *mp; /**< Memory pool to allocate segment
> >> from. */
> >>>    	uint16_t length; /**< Segment data length, configures split point. */
> >>>    	uint16_t offset; /**< Data offset from beginning of mbuf data buffer.
> >> */
> >>> -	uint32_t reserved; /**< Reserved field. */
> >>> +	/**
> >>> +	 * Supported ptype of a specific pmd, configures split point.
> >>> +	 * It should be defined by RTE_PTYPE_*.
> >>> +	 */
> >>> +	uint32_t proto_hdr;
> >>>    };
> >>>
> >>>    /**
> >
  

Patch

diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
index 4d90514a9a..f3b58c7895 100644
--- a/doc/guides/rel_notes/release_22_11.rst
+++ b/doc/guides/rel_notes/release_22_11.rst
@@ -60,6 +60,11 @@  New Features
   Added ``rte_eth_buffer_split_get_supported_hdr_ptypes()``, to get supported
   header protocols of a PMD to split.
 
+* **Added protocol header based buffer split.**
+  Ethdev: The ``reserved`` field in the  ``rte_eth_rxseg_split`` structure is
+  replaced with ``proto_hdr`` to support protocol header based buffer split.
+  User can choose length or protocol header to configure buffer split
+  according to NIC's capability.
 
 Removed Items
 -------------
diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 093c577add..dfceb723ee 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -1635,9 +1635,10 @@  rte_eth_dev_is_removed(uint16_t port_id)
 }
 
 static int
-rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
-			     uint16_t n_seg, uint32_t *mbp_buf_size,
-			     const struct rte_eth_dev_info *dev_info)
+rte_eth_rx_queue_check_split(uint16_t port_id,
+			const struct rte_eth_rxseg_split *rx_seg,
+			uint16_t n_seg, uint32_t *mbp_buf_size,
+			const struct rte_eth_dev_info *dev_info)
 {
 	const struct rte_eth_rxseg_capa *seg_capa = &dev_info->rx_seg_capa;
 	struct rte_mempool *mp_first;
@@ -1660,6 +1661,7 @@  rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
 		struct rte_mempool *mpl = rx_seg[seg_idx].mp;
 		uint32_t length = rx_seg[seg_idx].length;
 		uint32_t offset = rx_seg[seg_idx].offset;
+		uint32_t proto_hdr = rx_seg[seg_idx].proto_hdr;
 
 		if (mpl == NULL) {
 			RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
@@ -1693,13 +1695,44 @@  rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
 		}
 		offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
 		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
-		length = length != 0 ? length : *mbp_buf_size;
-		if (*mbp_buf_size < length + offset) {
-			RTE_ETHDEV_LOG(ERR,
-				       "%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
-				       mpl->name, *mbp_buf_size,
-				       length + offset, length, offset);
-			return -EINVAL;
+
+		int ret = rte_eth_buffer_split_get_supported_hdr_ptypes(port_id, NULL, 0);
+		if (ret <= 0) {
+			/* Split at fixed length. */
+			length = length != 0 ? length : *mbp_buf_size;
+			if (*mbp_buf_size < length + offset) {
+				RTE_ETHDEV_LOG(ERR,
+					"%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
+					mpl->name, *mbp_buf_size,
+					length + offset, length, offset);
+				return -EINVAL;
+			}
+		} else {
+			/* Split after specified protocol header. */
+			uint32_t ptypes[ret];
+			int i;
+
+			ret = rte_eth_buffer_split_get_supported_hdr_ptypes(port_id, ptypes, ret);
+			for (i = 0; i < ret; i++)
+				if (ptypes[i] & proto_hdr)
+					break;
+
+			if (i == ret) {
+#define PTYPE_NAMESIZE	256
+				char ptype_name[PTYPE_NAMESIZE];
+				rte_get_ptype_name(proto_hdr, ptype_name, sizeof(ptype_name));
+				RTE_ETHDEV_LOG(ERR,
+					"Protocol header %s is not supported.\n",
+					ptype_name);
+				return -EINVAL;
+			}
+			if (*mbp_buf_size < offset) {
+				RTE_ETHDEV_LOG(ERR,
+						"%s mbuf_data_room_size %u < %u segment offset)\n",
+						mpl->name, *mbp_buf_size,
+						offset);
+				return -EINVAL;
+			}
 		}
 	}
 	return 0;
@@ -1778,7 +1811,7 @@  rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
 		n_seg = rx_conf->rx_nseg;
 
 		if (rx_conf->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
-			ret = rte_eth_rx_queue_check_split(rx_seg, n_seg,
+			ret = rte_eth_rx_queue_check_split(port_id, rx_seg, n_seg,
 							   &mbp_buf_size,
 							   &dev_info);
 			if (ret != 0)
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index c58c908c3a..410fba5eab 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -1175,6 +1175,9 @@  struct rte_eth_txmode {
  *   specified in the first array element, the second buffer, from the
  *   pool in the second element, and so on.
  *
+ * - The proto_hdrs in the elements define the split position of
+ *   received packets.
+ *
  * - The offsets from the segment description elements specify
  *   the data offset from the buffer beginning except the first mbuf.
  *   The first segment offset is added with RTE_PKTMBUF_HEADROOM.
@@ -1196,12 +1199,24 @@  struct rte_eth_txmode {
  *     - pool from the last valid element
  *     - the buffer size from this pool
  *     - zero offset
+ *
+ * - Length based buffer split:
+ *     - mp, length, offset should be configured.
+ *     - The proto_hdr field will be ignored.
+ *
+ * - Protocol header based buffer split:
+ *     - mp, offset, proto_hdr should be configured.
+ *     - The length field will be ignored.
  */
 struct rte_eth_rxseg_split {
 	struct rte_mempool *mp; /**< Memory pool to allocate segment from. */
 	uint16_t length; /**< Segment data length, configures split point. */
 	uint16_t offset; /**< Data offset from beginning of mbuf data buffer. */
-	uint32_t reserved; /**< Reserved field. */
+	/**
+	 * Supported ptype of a specific pmd, configures split point.
+	 * It should be defined by RTE_PTYPE_*.
+	 */
+	uint32_t proto_hdr;
 };
 
 /**