[v5,1/4] lib/ethdev: introduce protocol type based buffer split
Checks
Commit Message
From: Wenxuan Wu <wenxuanx.wu@intel.com>
Protocol based buffer split consists of splitting a received packet into two
separate regions based on its content. The split happens after the packet
protocol header and before the packet payload. Splitting is usually between
the packet protocol header that can be posted to a dedicated buffer and the
packet payload that can be posted to a different buffer.
Currently, Rx buffer split supports length and offset based packet split.
protocol split is based on buffer split, configuring length of buffer split
is not suitable for NICs that do split based on protocol types. Because
tunneling makes the conversion from length to protocol type impossible.
This patch extends the current buffer split to support protocol and offset
based buffer split. A new proto field is introduced in the rte_eth_rxseg_split
structure reserved field to specify header protocol type. With Rx queue
offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT enabled and corresponding protocol
type configured. PMD will split the ingress packets into two separate regions.
Currently, both inner and outer L2/L3/L4 level protocol based buffer split
can be supported.
For example, let's suppose we configured the Rx queue with the
following segments:
seg0 - pool0, off0=2B
seg1 - pool1, off1=128B
With protocol split type configured with RTE_PTYPE_L4_UDP. The packet
consists of MAC_IP_UDP_PAYLOAD will be splitted like following:
seg0 - udp header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
seg1 - payload @ 128 in mbuf from pool1
The memory attributes for the split parts may differ either - for example
the mempool0 and mempool1 belong to dpdk memory and external memory,
respectively.
Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
---
lib/ethdev/rte_ethdev.c | 36 +++++++++++++++++++++++++++++-------
lib/ethdev/rte_ethdev.h | 15 ++++++++++++++-
2 files changed, 43 insertions(+), 8 deletions(-)
Comments
Hello,
It seems you didn't try to address my main comment on v4:
"
Before doing anything, the first patch of this series should make
the current status clearer.
Example, this line does not explain what it does:
uint16_t split_hdr_size; /**< hdr buf size (header_split enabled).*/
And header_split has been removed in ab3ce1e0c193 ("ethdev: remove old offload API")
If RTE_ETH_RX_OFFLOAD_HEADER_SPLIT is not needed,
let's add a comment to start a deprecation.
"
Also the comment from Andrew about removing limitation to 2 packets
is not addressed.
All the part about the protocols capability is missing here.
It is not encouraging.
26/04/2022 13:13, wenxuanx.wu@intel.com:
> From: Wenxuan Wu <wenxuanx.wu@intel.com>
>
> Protocol based buffer split consists of splitting a received packet into two
> separate regions based on its content. The split happens after the packet
> protocol header and before the packet payload. Splitting is usually between
> the packet protocol header that can be posted to a dedicated buffer and the
> packet payload that can be posted to a different buffer.
>
> Currently, Rx buffer split supports length and offset based packet split.
> protocol split is based on buffer split, configuring length of buffer split
> is not suitable for NICs that do split based on protocol types.
Why? Is it impossible to support length split on Intel NIC?
> Because tunneling makes the conversion from length
> to protocol type impossible.
This is not a HW issue.
I agree on the need but that a different usage than length split.
> This patch extends the current buffer split to support protocol and offset
> based buffer split. A new proto field is introduced in the rte_eth_rxseg_split
> structure reserved field to specify header protocol type. With Rx queue
> offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT enabled and corresponding protocol
> type configured. PMD will split the ingress packets into two separate regions.
> Currently, both inner and outer L2/L3/L4 level protocol based buffer split
> can be supported.
>
> For example, let's suppose we configured the Rx queue with the
> following segments:
> seg0 - pool0, off0=2B
> seg1 - pool1, off1=128B
>
> With protocol split type configured with RTE_PTYPE_L4_UDP. The packet
> consists of MAC_IP_UDP_PAYLOAD will be splitted like following:
> seg0 - udp header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
> seg1 - payload @ 128 in mbuf from pool1
Not clear what is the calculation.
> The memory attributes for the split parts may differ either - for example
> the mempool0 and mempool1 belong to dpdk memory and external memory,
> respectively.
>
> Signed-off-by: Xuan Ding <xuan.ding@intel.com>
> Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
> Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
> Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
> ---
> lib/ethdev/rte_ethdev.c | 36 +++++++++++++++++++++++++++++-------
> lib/ethdev/rte_ethdev.h | 15 ++++++++++++++-
> 2 files changed, 43 insertions(+), 8 deletions(-)
>
> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
> index 29a3d80466..1a2bc172ab 100644
> --- a/lib/ethdev/rte_ethdev.c
> +++ b/lib/ethdev/rte_ethdev.c
> @@ -1661,6 +1661,7 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
> struct rte_mempool *mpl = rx_seg[seg_idx].mp;
> uint32_t length = rx_seg[seg_idx].length;
> uint32_t offset = rx_seg[seg_idx].offset;
> + uint32_t proto = rx_seg[seg_idx].proto;
>
> if (mpl == NULL) {
> RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
> @@ -1694,13 +1695,34 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
> }
> offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
> *mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
> - length = length != 0 ? length : *mbp_buf_size;
> - if (*mbp_buf_size < length + offset) {
> - RTE_ETHDEV_LOG(ERR,
> - "%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
> - mpl->name, *mbp_buf_size,
> - length + offset, length, offset);
> - return -EINVAL;
> + if (proto == 0) {
Add a comment here, /* split at fixed length */
> + length = length != 0 ? length : *mbp_buf_size;
> + if (*mbp_buf_size < length + offset) {
> + RTE_ETHDEV_LOG(ERR,
> + "%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
> + mpl->name, *mbp_buf_size,
> + length + offset, length, offset);
> + return -EINVAL;
> + }
> + } else {
Add a comment here, /* split after specified protocol header */
> + /* Ensure n_seg is 2 in protocol based buffer split. */
> + if (n_seg != 2) {
(should be a space, not a tab before brace)
Why do you limit the feature to 2 segments only?
> + RTE_ETHDEV_LOG(ERR, "number of buffer split protocol segments should be 2.\n");
> + return -EINVAL;
> + }
> + /* Length and protocol are exclusive here, so make sure length is 0 in protocol
> + based buffer split. */
> + if (length != 0) {
> + RTE_ETHDEV_LOG(ERR, "segment length should be set to zero in buffer split\n");
> + return -EINVAL;
> + }
> + if (*mbp_buf_size < offset) {
> + RTE_ETHDEV_LOG(ERR,
> + "%s mbuf_data_room_size %u < %u segment offset)\n",
> + mpl->name, *mbp_buf_size,
> + offset);
> + return -EINVAL;
[...]
> + * - The proto in the elements defines the split position of received packets.
> + *
> * - If the length in the segment description element is zero
> * the actual buffer size will be deduced from the appropriate
> * memory pool properties.
> @@ -1197,12 +1200,21 @@ struct rte_eth_txmode {
> * - pool from the last valid element
> * - the buffer size from this pool
> * - zero offset
> + *
> + * - Length based buffer split:
> + * - mp, length, offset should be configured.
> + * - The proto should not be configured in length split. Zero default.
> + *
> + * - Protocol based buffer split:
> + * - mp, offset, proto should be configured.
> + * - The length should not be configured in protocol split. Zero default.
What means "Zero default"?
You should just ignore the non relevant field.
> struct rte_eth_rxseg_split {
> struct rte_mempool *mp; /**< Memory pool to allocate segment from. */
> uint16_t length; /**< Segment data length, configures split point. */
> uint16_t offset; /**< Data offset from beginning of mbuf data buffer. */
> - uint32_t reserved; /**< Reserved field. */
How do you manage ABI compatibility?
Was the reserved field initialized to 0 in previous versions?
> + uint32_t proto; /**< Protocol of buffer split, determines protocol split point. */
What are the values for "proto"?
> @@ -1664,6 +1676,7 @@ struct rte_eth_conf {
> RTE_ETH_RX_OFFLOAD_QINQ_STRIP)
> #define DEV_RX_OFFLOAD_VLAN RTE_DEPRECATED(DEV_RX_OFFLOAD_VLAN) RTE_ETH_RX_OFFLOAD_VLAN
>
> +
It looks to be an useless change.
> /*
> * If new Rx offload capabilities are defined, they also must be
> * mentioned in rte_rx_offload_names in rte_ethdev.c file.
>
Hi Thomas,
> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Wednesday, May 18, 2022 5:12 AM
> To: Ding, Xuan <xuan.ding@intel.com>; Wang, YuanX
> <yuanx.wang@intel.com>; Wu, WenxuanX <wenxuanx.wu@intel.com>
> Cc: andrew.rybchenko@oktetlabs.ru; Li, Xiaoyun <xiaoyun.li@intel.com>;
> ferruh.yigit@xilinx.com; Singh, Aman Deep <aman.deep.singh@intel.com>;
> dev@dpdk.org; Zhang, Yuying <yuying.zhang@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>; jerinjacobk@gmail.com;
> stephen@networkplumber.org; mb@smartsharesystems.com;
> viacheslavo@nvidia.com; Yu, Ping <ping.yu@intel.com>
> Subject: Re: [PATCH v5 1/4] lib/ethdev: introduce protocol type based buffer
> split
>
> Hello,
>
> It seems you didn't try to address my main comment on v4:
> "
> Before doing anything, the first patch of this series should make the current
> status clearer.
> Example, this line does not explain what it does:
> uint16_t split_hdr_size; /**< hdr buf size (header_split enabled).*/ And
> header_split has been removed in ab3ce1e0c193 ("ethdev: remove old
> offload API")
>
> If RTE_ETH_RX_OFFLOAD_HEADER_SPLIT is not needed, let's add a comment
> to start a deprecation.
> "
Thank you for the detailed review.
First of all, I agree that header split should be deprecated.
Since it is unrelated with buffer split, I was planning to send the deprecation notice
in 22.07 sometime later and start the deprecation in 22.11.
If you think it should be the first step, I will send the deprecation notice first.
>
> Also the comment from Andrew about removing limitation to 2 packets is not
> addressed.
Secondly, it is said that the protocol based buffer split will divide the packet into two segments.
Because I thought it will only be used in the split between header and payload.
In fact, protocol based buffer split can support multi-segment split.
That is to say, like length-based buffer split, we define a series of protos,
as the split point of protocol based buffer split. And this series of protos,
like lengths, indicate the split location.
For example, a packet consists of MAC/IPV4/UDP/payload.
If we define the buffer split proto with IPV4, and UDP, the packet will be
split into three segments:
seg0: MAC and IPV4 header, 34 bytes
seg1: UDP header, 8 bytes
seg2: Payload, the actual payload size
What do you think of this design?
>
> All the part about the protocols capability is missing here.
Yes, I missed the protocols capability with RTE_PTYPE* now.
I will update the doc with supported protocol capability in v6.
>
> It is not encouraging.
>
> 26/04/2022 13:13, wenxuanx.wu@intel.com:
> > From: Wenxuan Wu <wenxuanx.wu@intel.com>
> >
> > Protocol based buffer split consists of splitting a received packet
> > into two separate regions based on its content. The split happens
> > after the packet protocol header and before the packet payload.
> > Splitting is usually between the packet protocol header that can be
> > posted to a dedicated buffer and the packet payload that can be posted to
> a different buffer.
> >
> > Currently, Rx buffer split supports length and offset based packet split.
> > protocol split is based on buffer split, configuring length of buffer
> > split is not suitable for NICs that do split based on protocol types.
>
> Why? Is it impossible to support length split on Intel NIC?
Yes, our NIC supports split based on protocol types. And I think there are other vendors too.
The existence of tunneling results in the composition of a packet is various.
Given a changeable length, it is impossible to tell the driver a fixed protocol type.
>
> > Because tunneling makes the conversion from length to protocol type
> > impossible.
>
> This is not a HW issue.
> I agree on the need but that a different usage than length split.
I think the new usage can solve this problem, so that length split
and proto split can have the same result.
>
> > This patch extends the current buffer split to support protocol and
> > offset based buffer split. A new proto field is introduced in the
> > rte_eth_rxseg_split structure reserved field to specify header
> > protocol type. With Rx queue offload
> RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT
> > enabled and corresponding protocol type configured. PMD will split the
> ingress packets into two separate regions.
> > Currently, both inner and outer L2/L3/L4 level protocol based buffer
> > split can be supported.
> >
> > For example, let's suppose we configured the Rx queue with the
> > following segments:
> > seg0 - pool0, off0=2B
> > seg1 - pool1, off1=128B
> >
> > With protocol split type configured with RTE_PTYPE_L4_UDP. The packet
> > consists of MAC_IP_UDP_PAYLOAD will be splitted like following:
> > seg0 - udp header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
> > seg1 - payload @ 128 in mbuf from pool1
>
> Not clear what is the calculation.
The previous usage of protocol based split is the split between header and payload.
Here for a packet composed of MAC_IP_UDP_PAYLOAD, with protocol split type RTE_PTYPE_L4_UDP
configured, it means split between the UDP header and payload.
In length configuration, the proto = RTE_PTYPE_L4_UDP means length = 42B.
>
> > The memory attributes for the split parts may differ either - for
> > example the mempool0 and mempool1 belong to dpdk memory and
> external
> > memory, respectively.
> >
> > Signed-off-by: Xuan Ding <xuan.ding@intel.com>
> > Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
> > Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
> > Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
> > ---
> > lib/ethdev/rte_ethdev.c | 36 +++++++++++++++++++++++++++++-------
> > lib/ethdev/rte_ethdev.h | 15 ++++++++++++++-
> > 2 files changed, 43 insertions(+), 8 deletions(-)
> >
> > diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c index
> > 29a3d80466..1a2bc172ab 100644
> > --- a/lib/ethdev/rte_ethdev.c
> > +++ b/lib/ethdev/rte_ethdev.c
> > @@ -1661,6 +1661,7 @@ rte_eth_rx_queue_check_split(const struct
> rte_eth_rxseg_split *rx_seg,
> > struct rte_mempool *mpl = rx_seg[seg_idx].mp;
> > uint32_t length = rx_seg[seg_idx].length;
> > uint32_t offset = rx_seg[seg_idx].offset;
> > + uint32_t proto = rx_seg[seg_idx].proto;
> >
> > if (mpl == NULL) {
> > RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
> @@ -1694,13
> > +1695,34 @@ rte_eth_rx_queue_check_split(const struct
> rte_eth_rxseg_split *rx_seg,
> > }
> > offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
> > *mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
> > - length = length != 0 ? length : *mbp_buf_size;
> > - if (*mbp_buf_size < length + offset) {
> > - RTE_ETHDEV_LOG(ERR,
> > - "%s mbuf_data_room_size %u < %u
> (segment length=%u + segment offset=%u)\n",
> > - mpl->name, *mbp_buf_size,
> > - length + offset, length, offset);
> > - return -EINVAL;
> > + if (proto == 0) {
>
> Add a comment here, /* split at fixed length */
Thanks for the suggestion, will add it in next version.
>
> > + length = length != 0 ? length : *mbp_buf_size;
> > + if (*mbp_buf_size < length + offset) {
> > + RTE_ETHDEV_LOG(ERR,
> > + "%s mbuf_data_room_size %u < %u
> (segment length=%u + segment offset=%u)\n",
> > + mpl->name, *mbp_buf_size,
> > + length + offset, length, offset);
> > + return -EINVAL;
> > + }
> > + } else {
>
> Add a comment here, /* split after specified protocol header */
Thanks for the suggestion, will add it in next version.
>
> > + /* Ensure n_seg is 2 in protocol based buffer split. */
> > + if (n_seg != 2) {
>
> (should be a space, not a tab before brace)
Get it.
>
> Why do you limit the feature to 2 segments only?
Please see the new usage explained above.
>
> > + RTE_ETHDEV_LOG(ERR, "number of buffer
> split protocol segments should be 2.\n");
> > + return -EINVAL;
> > + }
> > + /* Length and protocol are exclusive here, so make
> sure length is 0 in protocol
> > + based buffer split. */
> > + if (length != 0) {
> > + RTE_ETHDEV_LOG(ERR, "segment length
> should be set to zero in buffer split\n");
> > + return -EINVAL;
> > + }
> > + if (*mbp_buf_size < offset) {
> > + RTE_ETHDEV_LOG(ERR,
> > + "%s
> mbuf_data_room_size %u < %u segment offset)\n",
> > + mpl->name, *mbp_buf_size,
> > + offset);
> > + return -EINVAL;
> [...]
> > + * - The proto in the elements defines the split position of received packets.
> > + *
> > * - If the length in the segment description element is zero
> > * the actual buffer size will be deduced from the appropriate
> > * memory pool properties.
> > @@ -1197,12 +1200,21 @@ struct rte_eth_txmode {
> > * - pool from the last valid element
> > * - the buffer size from this pool
> > * - zero offset
> > + *
> > + * - Length based buffer split:
> > + * - mp, length, offset should be configured.
> > + * - The proto should not be configured in length split. Zero default.
> > + *
> > + * - Protocol based buffer split:
> > + * - mp, offset, proto should be configured.
> > + * - The length should not be configured in protocol split. Zero default.
>
> What means "Zero default"?
> You should just ignore the non relevant field.
Yes, you are right, the none relevant field should be just ignored.
I will update the doc in v6.
>
> > struct rte_eth_rxseg_split {
> > struct rte_mempool *mp; /**< Memory pool to allocate segment
> from. */
> > uint16_t length; /**< Segment data length, configures split point. */
> > uint16_t offset; /**< Data offset from beginning of mbuf data buffer.
> */
> > - uint32_t reserved; /**< Reserved field. */
>
> How do you manage ABI compatibility?
> Was the reserved field initialized to 0 in previous versions?
I think we reached an agreement in RFC v1. There is no document for the reserved
field in the previous release. And it is always initialized to zero in real cases.
And now splitting based on fixed length and protocol header parsing is exclusive,
we can ignore the none relevant field.
>
> > + uint32_t proto; /**< Protocol of buffer split, determines protocol
> > +split point. */
>
> What are the values for "proto"?
Yes, I missed the protocol capability here, will fix it in v6.
>
> > @@ -1664,6 +1676,7 @@ struct rte_eth_conf {
> > RTE_ETH_RX_OFFLOAD_QINQ_STRIP) #define
> DEV_RX_OFFLOAD_VLAN
> > RTE_DEPRECATED(DEV_RX_OFFLOAD_VLAN) RTE_ETH_RX_OFFLOAD_VLAN
> >
> > +
>
> It looks to be an useless change.
Thanks for the catch, will fix it in next version.
Thanks again for your time.
Regards,
Xuan
>
> > /*
> > * If new Rx offload capabilities are defined, they also must be
> > * mentioned in rte_rx_offload_names in rte_ethdev.c file.
> >
>
>
>
>
Hi,
> -----Original Message-----
> From: Ding, Xuan <xuan.ding@intel.com>
> Sent: Thursday, May 19, 2022 10:40 PM
> To: Thomas Monjalon <thomas@monjalon.net>; Wang, YuanX
> <yuanx.wang@intel.com>; Wu, WenxuanX <wenxuanx.wu@intel.com>
> Cc: andrew.rybchenko@oktetlabs.ru; Li, Xiaoyun <xiaoyun.li@intel.com>;
> ferruh.yigit@xilinx.com; Singh, Aman Deep <aman.deep.singh@intel.com>;
> dev@dpdk.org; Zhang, Yuying <yuying.zhang@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>; jerinjacobk@gmail.com;
> stephen@networkplumber.org; mb@smartsharesystems.com;
> viacheslavo@nvidia.com; Yu, Ping <ping.yu@intel.com>
> Subject: RE: [PATCH v5 1/4] lib/ethdev: introduce protocol type based buffer
> split
>
> Hi Thomas,
>
> > -----Original Message-----
> > From: Thomas Monjalon <thomas@monjalon.net>
> > Sent: Wednesday, May 18, 2022 5:12 AM
> > To: Ding, Xuan <xuan.ding@intel.com>; Wang, YuanX
> > <yuanx.wang@intel.com>; Wu, WenxuanX <wenxuanx.wu@intel.com>
> > Cc: andrew.rybchenko@oktetlabs.ru; Li, Xiaoyun <xiaoyun.li@intel.com>;
> > ferruh.yigit@xilinx.com; Singh, Aman Deep <aman.deep.singh@intel.com>;
> > dev@dpdk.org; Zhang, Yuying <yuying.zhang@intel.com>; Zhang, Qi Z
> > <qi.z.zhang@intel.com>; jerinjacobk@gmail.com;
> > stephen@networkplumber.org; mb@smartsharesystems.com;
> > viacheslavo@nvidia.com; Yu, Ping <ping.yu@intel.com>
> > Subject: Re: [PATCH v5 1/4] lib/ethdev: introduce protocol type based
> > buffer split
> >
> > Hello,
> >
> > It seems you didn't try to address my main comment on v4:
> > "
> > Before doing anything, the first patch of this series should make the
> > current status clearer.
> > Example, this line does not explain what it does:
> > uint16_t split_hdr_size; /**< hdr buf size (header_split
> > enabled).*/ And header_split has been removed in ab3ce1e0c193
> > ("ethdev: remove old offload API")
> >
> > If RTE_ETH_RX_OFFLOAD_HEADER_SPLIT is not needed, let's add a
> comment
> > to start a deprecation.
> > "
>
> Thank you for the detailed review.
>
> First of all, I agree that header split should be deprecated.
> Since it is unrelated with buffer split, I was planning to send the deprecation
> notice in 22.07 sometime later and start the deprecation in 22.11.
>
> If you think it should be the first step, I will send the deprecation notice first.
>
> >
> > Also the comment from Andrew about removing limitation to 2 packets is
> > not addressed.
>
> Secondly, it is said that the protocol based buffer split will divide the packet
> into two segments.
> Because I thought it will only be used in the split between header and
> payload.
>
> In fact, protocol based buffer split can support multi-segment split.
> That is to say, like length-based buffer split, we define a series of protos, as
> the split point of protocol based buffer split. And this series of protos, like
> lengths, indicate the split location.
>
> For example, a packet consists of MAC/IPV4/UDP/payload.
> If we define the buffer split proto with IPV4, and UDP, the packet will be split
> into three segments:
> seg0: MAC and IPV4 header, 34 bytes
> seg1: UDP header, 8 bytes
> seg2: Payload, the actual payload size
>
> What do you think of this design?
>
> >
> > All the part about the protocols capability is missing here.
>
> Yes, I missed the protocols capability with RTE_PTYPE* now.
> I will update the doc with supported protocol capability in v6.
>
> >
> > It is not encouraging.
> >
> > 26/04/2022 13:13, wenxuanx.wu@intel.com:
> > > From: Wenxuan Wu <wenxuanx.wu@intel.com>
> > >
> > > Protocol based buffer split consists of splitting a received packet
> > > into two separate regions based on its content. The split happens
> > > after the packet protocol header and before the packet payload.
> > > Splitting is usually between the packet protocol header that can be
> > > posted to a dedicated buffer and the packet payload that can be
> > > posted to
> > a different buffer.
> > >
> > > Currently, Rx buffer split supports length and offset based packet split.
> > > protocol split is based on buffer split, configuring length of
> > > buffer split is not suitable for NICs that do split based on protocol types.
> >
> > Why? Is it impossible to support length split on Intel NIC?
>
> Yes, our NIC supports split based on protocol types. And I think there are
> other vendors too.
> The existence of tunneling results in the composition of a packet is various.
> Given a changeable length, it is impossible to tell the driver a fixed protocol
> type.
>
> >
> > > Because tunneling makes the conversion from length to protocol type
> > > impossible.
> >
> > This is not a HW issue.
> > I agree on the need but that a different usage than length split.
>
> I think the new usage can solve this problem, so that length split and proto
> split can have the same result.
>
> >
> > > This patch extends the current buffer split to support protocol and
> > > offset based buffer split. A new proto field is introduced in the
> > > rte_eth_rxseg_split structure reserved field to specify header
> > > protocol type. With Rx queue offload
> > RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT
> > > enabled and corresponding protocol type configured. PMD will split
> > > the
> > ingress packets into two separate regions.
> > > Currently, both inner and outer L2/L3/L4 level protocol based buffer
> > > split can be supported.
> > >
> > > For example, let's suppose we configured the Rx queue with the
> > > following segments:
> > > seg0 - pool0, off0=2B
> > > seg1 - pool1, off1=128B
> > >
> > > With protocol split type configured with RTE_PTYPE_L4_UDP. The
> > > packet consists of MAC_IP_UDP_PAYLOAD will be splitted like following:
> > > seg0 - udp header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from
> pool0
> > > seg1 - payload @ 128 in mbuf from pool1
> >
> > Not clear what is the calculation.
>
> The previous usage of protocol based split is the split between header and
> payload.
> Here for a packet composed of MAC_IP_UDP_PAYLOAD, with protocol split
> type RTE_PTYPE_L4_UDP configured, it means split between the UDP header
> and payload.
> In length configuration, the proto = RTE_PTYPE_L4_UDP means length = 42B.
>
> >
> > > The memory attributes for the split parts may differ either - for
> > > example the mempool0 and mempool1 belong to dpdk memory and
> > external
> > > memory, respectively.
> > >
> > > Signed-off-by: Xuan Ding <xuan.ding@intel.com>
> > > Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
> > > Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
> > > Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
> > > ---
> > > lib/ethdev/rte_ethdev.c | 36 +++++++++++++++++++++++++++++-------
> > > lib/ethdev/rte_ethdev.h | 15 ++++++++++++++-
> > > 2 files changed, 43 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c index
> > > 29a3d80466..1a2bc172ab 100644
> > > --- a/lib/ethdev/rte_ethdev.c
> > > +++ b/lib/ethdev/rte_ethdev.c
> > > @@ -1661,6 +1661,7 @@ rte_eth_rx_queue_check_split(const struct
> > rte_eth_rxseg_split *rx_seg,
> > > struct rte_mempool *mpl = rx_seg[seg_idx].mp;
> > > uint32_t length = rx_seg[seg_idx].length;
> > > uint32_t offset = rx_seg[seg_idx].offset;
> > > + uint32_t proto = rx_seg[seg_idx].proto;
> > >
> > > if (mpl == NULL) {
> > > RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
> > @@ -1694,13
> > > +1695,34 @@ rte_eth_rx_queue_check_split(const struct
> > rte_eth_rxseg_split *rx_seg,
> > > }
> > > offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
> > > *mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
> > > - length = length != 0 ? length : *mbp_buf_size;
> > > - if (*mbp_buf_size < length + offset) {
> > > - RTE_ETHDEV_LOG(ERR,
> > > - "%s mbuf_data_room_size %u < %u
> > (segment length=%u + segment offset=%u)\n",
> > > - mpl->name, *mbp_buf_size,
> > > - length + offset, length, offset);
> > > - return -EINVAL;
> > > + if (proto == 0) {
> >
> > Add a comment here, /* split at fixed length */
>
> Thanks for the suggestion, will add it in next version.
>
> >
> > > + length = length != 0 ? length : *mbp_buf_size;
> > > + if (*mbp_buf_size < length + offset) {
> > > + RTE_ETHDEV_LOG(ERR,
> > > + "%s mbuf_data_room_size %u < %u
> > (segment length=%u + segment offset=%u)\n",
> > > + mpl->name, *mbp_buf_size,
> > > + length + offset, length, offset);
> > > + return -EINVAL;
> > > + }
> > > + } else {
> >
> > Add a comment here, /* split after specified protocol header */
>
> Thanks for the suggestion, will add it in next version.
>
> >
> > > + /* Ensure n_seg is 2 in protocol based buffer split. */
> > > + if (n_seg != 2) {
> >
> > (should be a space, not a tab before brace)
>
> Get it.
>
> >
> > Why do you limit the feature to 2 segments only?
>
> Please see the new usage explained above.
>
> >
> > > + RTE_ETHDEV_LOG(ERR, "number of buffer
> > split protocol segments should be 2.\n");
> > > + return -EINVAL;
> > > + }
> > > + /* Length and protocol are exclusive here, so make
> > sure length is 0 in protocol
> > > + based buffer split. */
> > > + if (length != 0) {
> > > + RTE_ETHDEV_LOG(ERR, "segment length
> > should be set to zero in buffer split\n");
> > > + return -EINVAL;
> > > + }
> > > + if (*mbp_buf_size < offset) {
> > > + RTE_ETHDEV_LOG(ERR,
> > > + "%s
> > mbuf_data_room_size %u < %u segment offset)\n",
> > > + mpl->name, *mbp_buf_size,
> > > + offset);
> > > + return -EINVAL;
> > [...]
> > > + * - The proto in the elements defines the split position of received
> packets.
> > > + *
> > > * - If the length in the segment description element is zero
> > > * the actual buffer size will be deduced from the appropriate
> > > * memory pool properties.
> > > @@ -1197,12 +1200,21 @@ struct rte_eth_txmode {
> > > * - pool from the last valid element
> > > * - the buffer size from this pool
> > > * - zero offset
> > > + *
> > > + * - Length based buffer split:
> > > + * - mp, length, offset should be configured.
> > > + * - The proto should not be configured in length split. Zero default.
> > > + *
> > > + * - Protocol based buffer split:
> > > + * - mp, offset, proto should be configured.
> > > + * - The length should not be configured in protocol split. Zero default.
> >
> > What means "Zero default"?
> > You should just ignore the non relevant field.
>
> Yes, you are right, the none relevant field should be just ignored.
> I will update the doc in v6.
Sorry for replying myself.
After consideration, I found it is hard to just ignore the non-relevant field.
Because when length and proto both exist, it is hard to decide whether it is
length based buffer split or protocol based buffer split, which affects the
check in rte_eth_rx_queue_check_split().
So I would like to keep the current design in v6. When choosing one mode of
buffer split, the non-relevant field should not be configured.
Hope to get your feedbacks. :)
Regards,
Xuan
>
> >
> > > struct rte_eth_rxseg_split {
> > > struct rte_mempool *mp; /**< Memory pool to allocate segment
> > from. */
> > > uint16_t length; /**< Segment data length, configures split point. */
> > > uint16_t offset; /**< Data offset from beginning of mbuf data buffer.
> > */
> > > - uint32_t reserved; /**< Reserved field. */
> >
> > How do you manage ABI compatibility?
> > Was the reserved field initialized to 0 in previous versions?
>
> I think we reached an agreement in RFC v1. There is no document for the
> reserved field in the previous release. And it is always initialized to zero in
> real cases.
> And now splitting based on fixed length and protocol header parsing is
> exclusive, we can ignore the none relevant field.
>
> >
> > > + uint32_t proto; /**< Protocol of buffer split, determines protocol
> > > +split point. */
> >
> > What are the values for "proto"?
>
> Yes, I missed the protocol capability here, will fix it in v6.
>
> >
> > > @@ -1664,6 +1676,7 @@ struct rte_eth_conf {
> > > RTE_ETH_RX_OFFLOAD_QINQ_STRIP) #define
> > DEV_RX_OFFLOAD_VLAN
> > > RTE_DEPRECATED(DEV_RX_OFFLOAD_VLAN)
> RTE_ETH_RX_OFFLOAD_VLAN
> > >
> > > +
> >
> > It looks to be an useless change.
>
> Thanks for the catch, will fix it in next version.
>
> Thanks again for your time.
>
> Regards,
> Xuan
>
> >
> > > /*
> > > * If new Rx offload capabilities are defined, they also must be
> > > * mentioned in rte_rx_offload_names in rte_ethdev.c file.
> > >
> >
> >
> >
> >
@@ -1661,6 +1661,7 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
struct rte_mempool *mpl = rx_seg[seg_idx].mp;
uint32_t length = rx_seg[seg_idx].length;
uint32_t offset = rx_seg[seg_idx].offset;
+ uint32_t proto = rx_seg[seg_idx].proto;
if (mpl == NULL) {
RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
@@ -1694,13 +1695,34 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
}
offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
- length = length != 0 ? length : *mbp_buf_size;
- if (*mbp_buf_size < length + offset) {
- RTE_ETHDEV_LOG(ERR,
- "%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
- mpl->name, *mbp_buf_size,
- length + offset, length, offset);
- return -EINVAL;
+ if (proto == 0) {
+ length = length != 0 ? length : *mbp_buf_size;
+ if (*mbp_buf_size < length + offset) {
+ RTE_ETHDEV_LOG(ERR,
+ "%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
+ mpl->name, *mbp_buf_size,
+ length + offset, length, offset);
+ return -EINVAL;
+ }
+ } else {
+ /* Ensure n_seg is 2 in protocol based buffer split. */
+ if (n_seg != 2) {
+ RTE_ETHDEV_LOG(ERR, "number of buffer split protocol segments should be 2.\n");
+ return -EINVAL;
+ }
+ /* Length and protocol are exclusive here, so make sure length is 0 in protocol
+ based buffer split. */
+ if (length != 0) {
+ RTE_ETHDEV_LOG(ERR, "segment length should be set to zero in buffer split\n");
+ return -EINVAL;
+ }
+ if (*mbp_buf_size < offset) {
+ RTE_ETHDEV_LOG(ERR,
+ "%s mbuf_data_room_size %u < %u segment offset)\n",
+ mpl->name, *mbp_buf_size,
+ offset);
+ return -EINVAL;
+ }
}
}
return 0;
@@ -1187,6 +1187,9 @@ struct rte_eth_txmode {
* mbuf) the following data will be pushed to the next segment
* up to its own length, and so on.
*
+ *
+ * - The proto in the elements defines the split position of received packets.
+ *
* - If the length in the segment description element is zero
* the actual buffer size will be deduced from the appropriate
* memory pool properties.
@@ -1197,12 +1200,21 @@ struct rte_eth_txmode {
* - pool from the last valid element
* - the buffer size from this pool
* - zero offset
+ *
+ * - Length based buffer split:
+ * - mp, length, offset should be configured.
+ * - The proto should not be configured in length split. Zero default.
+ *
+ * - Protocol based buffer split:
+ * - mp, offset, proto should be configured.
+ * - The length should not be configured in protocol split. Zero default.
+ *
*/
struct rte_eth_rxseg_split {
struct rte_mempool *mp; /**< Memory pool to allocate segment from. */
uint16_t length; /**< Segment data length, configures split point. */
uint16_t offset; /**< Data offset from beginning of mbuf data buffer. */
- uint32_t reserved; /**< Reserved field. */
+ uint32_t proto; /**< Protocol of buffer split, determines protocol split point. */
};
/**
@@ -1664,6 +1676,7 @@ struct rte_eth_conf {
RTE_ETH_RX_OFFLOAD_QINQ_STRIP)
#define DEV_RX_OFFLOAD_VLAN RTE_DEPRECATED(DEV_RX_OFFLOAD_VLAN) RTE_ETH_RX_OFFLOAD_VLAN
+
/*
* If new Rx offload capabilities are defined, they also must be
* mentioned in rte_rx_offload_names in rte_ethdev.c file.