[1/3] net/af_xdp: Increase max batch size to 512
Commit Message
Prior to this the max size was 32 which was unnecessarily
small. Also enforce the max batch size for TX for both
copy and zero copy modes. Prior to this only copy mode
enforced the max size.
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
---
drivers/net/af_xdp/rte_eth_af_xdp.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
Comments
On 2/24/2021 11:18 AM, Ciara Loftus wrote:
> Prior to this the max size was 32 which was unnecessarily
> small.
Can you please describe the impact? Why was it changed from 32 to 512?
I assume this is to improve the performance but can you please explicitly
document it in the commit log?
> Also enforce the max batch size for TX for both
> copy and zero copy modes. Prior to this only copy mode
> enforced the max size.
>
By enforcing, the PMD silently ignores the user-provided burst value if it is
more than the PMD-supported MAX. Also there is no way to discover this MAX
value without checking the code.
Overall, why are these max values required at all? After a quick check I can
see they are used for some bulk operations, which I assume can be eliminated,
what do you think?
> Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
<...>
>
> On 2/24/2021 11:18 AM, Ciara Loftus wrote:
> > Prior to this the max size was 32 which was unnecessarily
> > small.
>
> Can you please describe the impact? Why was it changed from 32 to 512?
> I assume this is to improve the performance but can you please explicitly
> document it in the commit log?
Indeed - improved performance due to bulk operations and fewer ring accesses and syscalls.
The value 512 was arbitrary. I will change this to the default ring size as defined by libbpf (2048) in v2.
Will update the commit log with this info.
>
> > Also enforce the max batch size for TX for both
> > copy and zero copy modes. Prior to this only copy mode
> > enforced the max size.
> >
>
> By enforcing, the PMD silently ignores the user-provided burst value if it is
> more than the PMD-supported MAX. Also there is no way to discover this MAX
> value without checking the code.
>
> Overall, why are these max values required at all? After a quick check I can
> see they are used for some bulk operations, which I assume can be eliminated,
> what do you think?
We need to size some arrays at compile time with this max value.
Instead of removing the bulk operations, which may impact performance, how about splitting batches larger than 2048 into smaller batches so that all packets are still handled rather than those beyond 2048 being discarded? Something like what's done in ixgbe, for example:
http://code.dpdk.org/dpdk/v21.02/source/drivers/net/ixgbe/ixgbe_rxtx.c#L318
Thanks,
Ciara
>
> > Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
>
> <...>
On 3/3/2021 3:07 PM, Loftus, Ciara wrote:
>>
>> On 2/24/2021 11:18 AM, Ciara Loftus wrote:
>>> Prior to this the max size was 32 which was unnecessarily
>>> small.
>>
>> Can you please describe the impact? Why was it changed from 32 to 512?
>> I assume this is to improve the performance but can you please explicitly
>> document it in the commit log?
>
> Indeed - improved performance due to bulk operations and fewer ring accesses and syscalls.
> The value 512 was arbitrary. I will change this to the default ring size as defined by libbpf (2048) in v2.
> Will update the commit log with this info.
>
>>
>>> Also enforce the max batch size for TX for both
>>> copy and zero copy modes. Prior to this only copy mode
>>> enforced the max size.
>>>
>>
>> By enforcing, the PMD silently ignores the user-provided burst value if it is
>> more than the PMD-supported MAX. Also there is no way to discover this MAX
>> value without checking the code.
>>
>> Overall, why are these max values required at all? After a quick check I can
>> see they are used for some bulk operations, which I assume can be eliminated,
>> what do you think?
>
> We need to size some arrays at compile time with this max value.
>
> Instead of removing the bulk operations, which may impact performance, how about splitting batches larger than 2048 into smaller batches so that all packets are still handled rather than those beyond 2048 being discarded? Something like what's done in ixgbe, for example:
> http://code.dpdk.org/dpdk/v21.02/source/drivers/net/ixgbe/ixgbe_rxtx.c#L318
>
If there is no reasonable way to eliminate the fixed-size arrays, the above
suggestion looks good.
Thanks,
ferruh
@@ -66,8 +66,8 @@ RTE_LOG_REGISTER(af_xdp_logtype, pmd.net.af_xdp, NOTICE);
#define ETH_AF_XDP_DFLT_START_QUEUE_IDX 0
#define ETH_AF_XDP_DFLT_QUEUE_COUNT 1
-#define ETH_AF_XDP_RX_BATCH_SIZE 32
-#define ETH_AF_XDP_TX_BATCH_SIZE 32
+#define ETH_AF_XDP_RX_BATCH_SIZE 512
+#define ETH_AF_XDP_TX_BATCH_SIZE 512
struct xsk_umem_info {
@@ -535,8 +535,6 @@ af_xdp_tx_cp(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
uint32_t idx_tx;
struct xsk_ring_cons *cq = &txq->pair->cq;
- nb_pkts = RTE_MIN(nb_pkts, ETH_AF_XDP_TX_BATCH_SIZE);
-
pull_umem_cq(umem, nb_pkts, cq);
nb_pkts = rte_ring_dequeue_bulk(umem->buf_ring, addrs,
@@ -580,6 +578,8 @@ af_xdp_tx_cp(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
static uint16_t
eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
{
+ nb_pkts = RTE_MIN(nb_pkts, ETH_AF_XDP_TX_BATCH_SIZE);
+
#if defined(XDP_UMEM_UNALIGNED_CHUNK_FLAG)
return af_xdp_tx_zc(queue, bufs, nb_pkts);
#else