[v3,1/3] ethdev: introduce pool sort capability

Message ID 20220902070047.2812906-1-hpothula@marvell.com (mailing list archive)
State Superseded, archived
Delegated to: Andrew Rybchenko
Series [v3,1/3] ethdev: introduce pool sort capability

Checks

Context Check Description
ci/checkpatch warning coding style issues

Commit Message

Hanumanth Pothula Sept. 2, 2022, 7 a.m. UTC
  This patch adds support for the pool sort capability.
Some HW has support for choosing memory pools based on the
packet's size. The pool sort capability allows the PMD to choose
a memory pool based on the packet's length.

This is often useful for saving memory: the application can
create different pools to steer packets of specific sizes,
thus enabling effective use of memory.

For example, let's say the HW supports three pools:
 - pool-1 size is 2K
 - pool-2 size is > 2K and < 4K
 - pool-3 size is > 4K
Here,
        pool-1 can accommodate packets with sizes < 2K
        pool-2 can accommodate packets with sizes > 2K and < 4K
        pool-3 can accommodate packets with sizes > 4K

With the pool sort capability enabled in SW, an application may
create three pools of different sizes and pass them to the PMD,
allowing the PMD to program the HW based on packet lengths: packets
shorter than 2K are received on pool-1, packets with lengths between
2K and 4K on pool-2, and packets longer than 4K on pool-3.

The following two capabilities are added to the rte_eth_rxseg_capa
structure:
1. pool_sort --> indicates that the pool sort capability is supported by the HW.
2. max_npool --> maximum number of pools supported by the HW.

A new structure, rte_eth_rxseg_sort, is defined, to be used only when
the pool sort capability is present. If required, this may be extended
further to support more configurations.
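
A minimal application-side sketch of the intended usage, assuming the
three example pool sizes above; pool names, element counts, queue
parameters and the surrounding device configuration are illustrative
only, and most error handling is omitted:

#include <string.h>

#include <rte_ethdev.h>
#include <rte_lcore.h>
#include <rte_mbuf.h>

static int
setup_sorted_rx_queue(uint16_t port_id, uint16_t nb_rxd)
{
	struct rte_eth_dev_info dev_info;
	struct rte_eth_rxconf rxconf;
	union rte_eth_rxseg rx_seg[3];
	struct rte_mempool *p2k, *p4k, *p8k;

	if (rte_eth_dev_info_get(port_id, &dev_info) != 0)
		return -1;

	/* One pool per packet-size class; element counts and data room
	 * sizes are examples only. */
	p2k = rte_pktmbuf_pool_create("pool_2k", 4096, 256, 0,
			2048 + RTE_PKTMBUF_HEADROOM, rte_socket_id());
	p4k = rte_pktmbuf_pool_create("pool_4k", 2048, 256, 0,
			4096 + RTE_PKTMBUF_HEADROOM, rte_socket_id());
	p8k = rte_pktmbuf_pool_create("pool_8k", 1024, 256, 0,
			8192 + RTE_PKTMBUF_HEADROOM, rte_socket_id());
	if (p2k == NULL || p4k == NULL || p8k == NULL)
		return -1;

	memset(rx_seg, 0, sizeof(rx_seg));
	rx_seg[0].sort.mp = p2k; /* packets shorter than 2K */
	rx_seg[1].sort.mp = p4k; /* packets between 2K and 4K */
	rx_seg[2].sort.mp = p8k; /* packets longer than 4K */
	/* length is left at 0: derived from each pool's data room size. */

	rxconf = dev_info.default_rxconf;
	rxconf.offloads |= RTE_ETH_RX_OFFLOAD_BUFFER_SORT;
	rxconf.rx_seg = rx_seg;
	rxconf.rx_nseg = 3;

	/* mb_pool argument is NULL: the pools are passed via rx_seg. */
	return rte_eth_rx_queue_setup(port_id, 0, nb_rxd, rte_socket_id(),
				      &rxconf, NULL);
}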

Signed-off-by: Hanumanth Pothula <hpothula@marvell.com>

v3:
 - Implemented Pool Sort capability as new Rx offload capability,
   RTE_ETH_RX_OFFLOAD_BUFFER_SORT.
v2:
 - Along with spec changes, uploading testpmd and driver changes.
---
 lib/ethdev/rte_ethdev.c | 69 ++++++++++++++++++++++++++++++++++++++---
 lib/ethdev/rte_ethdev.h | 24 +++++++++++++-
 2 files changed, 88 insertions(+), 5 deletions(-)
  

Comments

Andrew Rybchenko Sept. 13, 2022, 8:06 a.m. UTC | #1
On 9/2/22 10:00, Hanumanth Pothula wrote:
> This patch adds support for the pool sort capability.

"Add support for serveral (?) mbuf pools per Rx queue."

I dislike the word "sort" in summary and the feature
description. IMHO it is too restrictive for intended behaviour.

The key feature here is just support for more than one mbuf
pool per Rx queue. That's it. Everything else should be out
of scope of the definiteion.

If buffers from many pools are provided, the hardware may do
whatever it wants with it. Use smaller buffers for small
packets and bigger for big. Use bigger buffers for small
packets if there is no small buffers available. Use big plus
small buffer if Rx scatter is enabled and a packet fits in
such combination. And so so on.

I.e. the feature should be orthogoal to Rx scatter.
Rx scatter just says if driver/application allows to chain
mbufs to receive a packet. If Rx scatter is disabled,
a packet must be delivered in a single mbuf (either big or
small). If Rx scatter is enable, a packet may be delivered
using a chain of mbufs obtained from provided pools (either
just one or many if several pools are supported).

Ideally the feature should be orthogonal to buffer split as
well. I.e. provide many pools for different segments.
May be it is an overkill to provide pools A and B for the first
segment and C and D for the second. It could be limitted to the
last segment only. If so, we need separate strcture (not
rte_eth_rxseg) to pass many pools. IMHO, an array of mempools
is sufficient - similar to Rx queue configuration.
I.e. no extra length since data length may be derived from
mempool element size.
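
As a rough sketch of that idea, a queue configuration could boil down
to something like the following; the rx_mempools / rx_nmempool names
are purely hypothetical and not part of this patch:

	/* Hypothetical rte_eth_rxconf extension: a plain array of mempools
	 * instead of per-segment descriptors. No per-pool length is needed,
	 * since the usable data length follows from each pool's mbuf data
	 * room size. Pools are assumed to be created elsewhere. */
	struct rte_mempool *pools[3] = { pool_2k, pool_4k, pool_8k };

	rxconf.rx_mempools = pools;	/* hypothetical field */
	rxconf.rx_nmempool = 3;		/* hypothetical field */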

> Some of the HW has support for choosing memory pools based on the
> packet's size. The pool sort capability allows PMD to choose a
> memory pool based on the packet's length.
> 
> This is often useful for saving the memory where the application
> can create a different pool to steer the specific size of the
> packet, thus enabling effective use of memory.
> 
> For example, let's say HW has a capability of three pools,
>   - pool-1 size is 2K
>   - pool-2 size is > 2K and < 4K
>   - pool-3 size is > 4K
> Here,
>          pool-1 can accommodate packets with sizes < 2K
>          pool-2 can accommodate packets with sizes > 2K and < 4K
>          pool-3 can accommodate packets with sizes > 4K
> 
> With pool sort capability enabled in SW, an application may create
> three pools of different sizes and send them to PMD. Allowing PMD
> to program HW based on packet lengths. So that packets with less
> than 2K are received on pool-1, packets with lengths between 2K
> and 4K are received on pool-2 and finally packets greater than 4K
> are received on pool-3.
> 
> The following two capabilities are added to the rte_eth_rxseg_capa
> structure,
> 1. pool_sort --> tells pool sort capability is supported by HW.
> 2. max_npool --> max number of pools supported by HW.
> 
> Defined new structure rte_eth_rxseg_sort, to be used only when pool
> sort capability is present. If required this may be extended further
> to support more configurations.
> 
> Signed-off-by: Hanumanth Pothula <hpothula@marvell.com>
> 
> v3:
>   - Implemented Pool Sort capability as new Rx offload capability,
>     RTE_ETH_RX_OFFLOAD_BUFFER_SORT.
> v2:
>   - Along with spec changes, uploading testpmd and driver changes.
> ---

[snip]
  
Ferruh Yigit Sept. 13, 2022, 9:31 a.m. UTC | #2
On 9/13/2022 9:06 AM, Andrew Rybchenko wrote:
> On 9/2/22 10:00, Hanumanth Pothula wrote:
>> This patch adds support for the pool sort capability.
> 
> "Add support for serveral (?) mbuf pools per Rx queue."
> 
> I dislike the word "sort" in summary and the feature
> description. IMHO it is too restrictive for intended behaviour.
> 
> The key feature here is just support for more than one mbuf
> pool per Rx queue. That's it. Everything else should be out
> of scope of the definition.
> 

ack, and the author already agreed to update it as 'MULTIPLE_POOL';
perhaps we should say 'MULTIPLE_MEMPOOL'?

> 
> [snip]
>
  
Hanumanth Pothula Sept. 13, 2022, 10:41 a.m. UTC | #3
> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@xilinx.com>
> Sent: Tuesday, September 13, 2022 3:01 PM
> To: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>; Hanumanth Reddy
> Pothula <hpothula@marvell.com>; Thomas Monjalon <thomas@monjalon.net>
> Cc: dev@dpdk.org; xuan.ding@intel.com; wenxuanx.wu@intel.com;
> xiaoyun.li@intel.com; stephen@networkplumber.org; yuanx.wang@intel.com;
> mdr@ashroe.eu; yuying.zhang@intel.com; qi.z.zhang@intel.com;
> viacheslavo@nvidia.com; Jerin Jacob Kollanukkaran <jerinj@marvell.com>;
> Nithin Kumar Dabilpuram <ndabilpuram@marvell.com>
> Subject: [EXT] Re: [PATCH v3 1/3] ethdev: introduce pool sort capability
> 
> On 9/13/2022 9:06 AM, Andrew Rybchenko wrote:
> > On 9/2/22 10:00, Hanumanth Pothula wrote:
> >> This patch adds support for the pool sort capability.
> >
> > "Add support for serveral (?) mbuf pools per Rx queue."
> >
> > I dislike the word "sort" in summary and the feature description. IMHO
> > it is too restrictive for intended behaviour.
> >
> > The key feature here is just support for more than one mbuf pool per
> > Rx queue. That's it. Everything else should be out of scope of the
> > definition.
> >
> 
> ack, and author already agreed to update it as 'MULTIPLE_POOL', perhaps
> should we say 'MULTIPLE_MEMPOOL' ?

Yes, will take care of the naming.

> 
> > If buffers from many pools are provided, the hardware may do whatever
> > it wants with them: use smaller buffers for small packets and bigger
> > ones for big packets, use bigger buffers for small packets if no small
> > buffers are available, use a big plus a small buffer if Rx scatter is
> > enabled and a packet fits in such a combination, and so on.
> >
> > I.e. the feature should be orthogonal to Rx scatter.
> > Rx scatter just says whether the driver/application allows chaining
> > mbufs to receive a packet. If Rx scatter is disabled, a packet must be
> > delivered in a single mbuf (either big or small). If Rx scatter is
> > enabled, a packet may be delivered using a chain of mbufs obtained from
> > the provided pools (either just one pool or many, if several pools are
> > supported).
> >
> > Ideally the feature should be orthogonal to buffer split as well, i.e.
> > provide many pools for different segments.
> > Maybe it is overkill to provide pools A and B for the first segment
> > and C and D for the second; it could be limited to the last segment
> > only. If so, we need a separate structure (not rte_eth_rxseg) to pass
> > many pools. IMHO, an array of mempools is sufficient - similar to the
> > Rx queue configuration, i.e. no extra length, since the data length
> > may be derived from the mempool element size.

Thanks Andrew for your valuable inputs.
Yes, an array of mempools works fine.

Will upload v4 taking care of the above.

> > [snip]
> >
  

Patch

diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 1979dc0850..5152c08f1e 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -1634,6 +1634,58 @@  rte_eth_dev_is_removed(uint16_t port_id)
 	return ret;
 }
 
+static int
+rte_eth_rx_queue_check_sort(const struct rte_eth_rxseg_sort *rx_seg,
+			     uint16_t n_seg, uint32_t *mbp_buf_size,
+			     const struct rte_eth_dev_info *dev_info)
+{
+	const struct rte_eth_rxseg_capa *seg_capa = &dev_info->rx_seg_capa;
+	uint16_t seg_idx;
+
+	if (!seg_capa->multi_pools || n_seg > seg_capa->max_npool) {
+		RTE_ETHDEV_LOG(ERR,
+			       "Invalid capabilities, multi_pools:%d, number of pools %u exceeds supported maximum %u\n",
+			       seg_capa->multi_pools, n_seg, seg_capa->max_npool);
+		return -EINVAL;
+	}
+
+	for (seg_idx = 0; seg_idx < n_seg; seg_idx++) {
+		struct rte_mempool *mpl = rx_seg[seg_idx].mp;
+		uint32_t length = rx_seg[seg_idx].length;
+
+		if (mpl == NULL) {
+			RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
+			return -EINVAL;
+		}
+
+		if (mpl->private_data_size <
+			sizeof(struct rte_pktmbuf_pool_private)) {
+			RTE_ETHDEV_LOG(ERR,
+				       "%s private_data_size %u < %u\n",
+				       mpl->name, mpl->private_data_size,
+				       (unsigned int)sizeof
+					(struct rte_pktmbuf_pool_private));
+			return -ENOSPC;
+		}
+
+		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
+		/* If the segment length is 0, set it to the pool's data
+		 * room size minus the headroom, so that there is always
+		 * enough space left to accommodate the headroom.
+		 */
+		length = length != 0 ? length : (*mbp_buf_size - RTE_PKTMBUF_HEADROOM);
+		if (*mbp_buf_size < length + RTE_PKTMBUF_HEADROOM) {
+			RTE_ETHDEV_LOG(ERR,
+				       "%s mbuf_data_room_size %u < %u\n",
+				       mpl->name, *mbp_buf_size,
+				       length);
+			return -EINVAL;
+		}
+	}
+
+	return 0;
+}
+
 static int
 rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
 			     uint16_t n_seg, uint32_t *mbp_buf_size,
@@ -1764,7 +1816,6 @@  rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
 			return -EINVAL;
 		}
 	} else {
-		const struct rte_eth_rxseg_split *rx_seg;
 		uint16_t n_seg;
 
 		/* Extended multi-segment configuration check. */
@@ -1774,13 +1825,23 @@  rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
 			return -EINVAL;
 		}
 
-		rx_seg = (const struct rte_eth_rxseg_split *)rx_conf->rx_seg;
 		n_seg = rx_conf->rx_nseg;
 
 		if (rx_conf->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
+			const struct rte_eth_rxseg_split *rx_seg =
+				(const struct rte_eth_rxseg_split *)rx_conf->rx_seg;
 			ret = rte_eth_rx_queue_check_split(rx_seg, n_seg,
-							   &mbp_buf_size,
-							   &dev_info);
+								   &mbp_buf_size,
+								   &dev_info);
+			if (ret != 0)
+				return ret;
+		} else if (rx_conf->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SORT) {
+			const struct rte_eth_rxseg_sort *rx_seg =
+				(const struct rte_eth_rxseg_sort *)rx_conf->rx_seg;
+			ret = rte_eth_rx_queue_check_sort(rx_seg, n_seg,
+								  &mbp_buf_size,
+								  &dev_info);
+
 			if (ret != 0)
 				return ret;
 		} else {
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index de9e970d4d..f7b5901a40 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -1204,6 +1204,21 @@  struct rte_eth_rxseg_split {
 	uint32_t reserved; /**< Reserved field. */
 };
 
+/**
+ * The pool sort capability allows the PMD to choose a memory pool based on
+ * the packet's length: the PMD programs the HW to receive packets into
+ * different pools depending on their length.
+ *
+ * This is often useful for saving memory: the application can create
+ * different pools to steer specific packet sizes, thus enabling effective
+ * use of memory.
+ */
+struct rte_eth_rxseg_sort {
+	struct rte_mempool *mp; /**< Memory pool to allocate packets from. */
+	uint16_t length; /**< Packet data length. */
+	uint32_t reserved; /**< Reserved field. */
+};
+
 /**
  * @warning
  * @b EXPERIMENTAL: this structure may change without prior notice.
@@ -1213,7 +1228,9 @@  struct rte_eth_rxseg_split {
 union rte_eth_rxseg {
 	/* The settings for buffer split offload. */
 	struct rte_eth_rxseg_split split;
-	/* The other features settings should be added here. */
+
+	/* The settings for packet sort offload. */
+	struct rte_eth_rxseg_sort sort;
 };
 
 /**
@@ -1633,6 +1650,7 @@  struct rte_eth_conf {
 #define RTE_ETH_RX_OFFLOAD_OUTER_UDP_CKSUM  RTE_BIT64(18)
 #define RTE_ETH_RX_OFFLOAD_RSS_HASH         RTE_BIT64(19)
 #define RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT     RTE_BIT64(20)
+#define RTE_ETH_RX_OFFLOAD_BUFFER_SORT      RTE_BIT64(21)
 
 #define DEV_RX_OFFLOAD_VLAN_STRIP       RTE_DEPRECATED(DEV_RX_OFFLOAD_VLAN_STRIP)       RTE_ETH_RX_OFFLOAD_VLAN_STRIP
 #define DEV_RX_OFFLOAD_IPV4_CKSUM       RTE_DEPRECATED(DEV_RX_OFFLOAD_IPV4_CKSUM)       RTE_ETH_RX_OFFLOAD_IPV4_CKSUM
@@ -1827,10 +1845,14 @@  struct rte_eth_switch_info {
  */
 struct rte_eth_rxseg_capa {
 	__extension__
+	uint32_t mode_split : 1; /**< Supports buffer split capability @see struct rte_eth_rxseg_split */
+	uint32_t mode_sort : 1; /**< Supports pool sort capability @see struct rte_eth_rxseg_sort */
 	uint32_t multi_pools:1; /**< Supports receiving to multiple pools.*/
 	uint32_t offset_allowed:1; /**< Supports buffer offsets. */
 	uint32_t offset_align_log2:4; /**< Required offset alignment. */
 	uint16_t max_nseg; /**< Maximum amount of segments to split. */
+	/** Maximum number of pools the PMD can sort into, based on packet/segment lengths. */
+	uint16_t max_npool;
 	uint16_t reserved; /**< Reserved field. */
 };
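
Assuming the capability bits added above, an application could gate its
configuration on them along these lines (sketch only, using the fields
proposed in this patch):

#include <errno.h>

#include <rte_ethdev.h>

/* Return 0 if the port advertises the proposed pool sort capability,
 * clamping the requested pool count to the advertised maximum. */
static int
check_sort_capability(uint16_t port_id, uint16_t *n_pools)
{
	struct rte_eth_dev_info dev_info;
	int ret = rte_eth_dev_info_get(port_id, &dev_info);

	if (ret != 0)
		return ret;
	if ((dev_info.rx_offload_capa & RTE_ETH_RX_OFFLOAD_BUFFER_SORT) == 0 ||
	    !dev_info.rx_seg_capa.mode_sort)
		return -ENOTSUP;
	if (*n_pools > dev_info.rx_seg_capa.max_npool)
		*n_pools = dev_info.rx_seg_capa.max_npool;
	return 0;
}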