[v2] examples/l3fwd: adjust Tx burst size based on Rx burst
Commit Message
Previously, the TX burst size was fixed at 256, leading to performance
degradation in certain scenarios.
This patch introduces logic to set the TX burst size to match the
configured RX burst size (--burst option, default 32, max 512)
for better efficiency.
Fixes: d5c4897ecfb2 ("examples/l3fwd: add option to set Rx burst size")
Cc: haijie1@huawei.com
Cc: stable@dpdk.org
Signed-off-by: Sivaprasad Tummala <sivaprasad.tummala@amd.com>
Tested-by: Venkat Kumar Ande <VenkatKumar.Ande@amd.com>
Tested-by: Dengdui Huang <huangdengdui@huawei.com>
---
examples/l3fwd/l3fwd.h | 8 ++------
examples/l3fwd/l3fwd_common.h | 11 +++++++----
examples/l3fwd/main.c | 2 ++
3 files changed, 11 insertions(+), 10 deletions(-)
Comments
On Mon, 9 Jun 2025 09:58:27 +0000
Sivaprasad Tummala <sivaprasad.tummala@amd.com> wrote:
> Previously, the TX burst size was fixed at 256, leading to performance
> degradation in certain scenarios.
>
> This patch introduces logic to set the TX burst size to match the
> configured RX burst size (--burst option, default 32, max 512)
> for better efficiency.
>
> Fixes: d5c4897ecfb2 ("examples/l3fwd: add option to set Rx burst size")
> Cc: haijie1@huawei.com
> Cc: stable@dpdk.org
>
> Signed-off-by: Sivaprasad Tummala <sivaprasad.tummala@amd.com>
> Tested-by: Venkat Kumar Ande <VenkatKumar.Ande@amd.com>
> Tested-by: Dengdui Huang <huangdengdui@huawei.com>
What driver? Why not fix the driver.
If RX burst is small, there should be no way to get TX burst larger
than that to happen.
On 2025/6/9 23:21, Stephen Hemminger wrote:
> On Mon, 9 Jun 2025 09:58:27 +0000
> Sivaprasad Tummala <sivaprasad.tummala@amd.com> wrote:
>
>> Previously, the TX burst size was fixed at 256, leading to performance
>> degradation in certain scenarios.
>>
>> This patch introduces logic to set the TX burst size to match the
>> configured RX burst size (--burst option, default 32, max 512)
>> for better efficiency.
>>
>> Fixes: d5c4897ecfb2 ("examples/l3fwd: add option to set Rx burst size")
>> Cc: haijie1@huawei.com
>> Cc: stable@dpdk.org
>>
>> Signed-off-by: Sivaprasad Tummala <sivaprasad.tummala@amd.com>
>> Tested-by: Venkat Kumar Ande <VenkatKumar.Ande@amd.com>
>> Tested-by: Dengdui Huang <huangdengdui@huawei.com>
>
> What driver? Why not fix the driver.
> If RX burst is small, there should be no way to get TX burst larger
> than that to happen.
If the Tx burst is too large, a number of mbufs will be temporarily stored in l3fwd's mbuf_table in a short period of time.
This leads to a decrease in the hit rate of the mempool cache, resulting in a drop in performance.
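As a rough illustration of the point above (a back-of-the-envelope sketch, not code from l3fwd): if each port's staging table can hold up to threshold - 1 mbufs between flushes, the worst-case number of mbufs parked outside the mempool grows linearly with the flush threshold. Both helper names and the comparison against a per-lcore cache size are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: worst-case number of mbufs sitting in the Tx
 * staging tables of one lcore, assuming each port's table is flushed
 * only once it holds flush_threshold packets.
 */
static uint32_t
staged_mbufs_worst_case(uint32_t nb_ports, uint32_t flush_threshold)
{
	assert(flush_threshold > 0);
	return nb_ports * (flush_threshold - 1);
}

/*
 * Hypothetical check: can the staged mbufs alone outnumber the
 * per-lcore mempool cache (cache_size objects)?
 */
static int
staging_can_exceed_cache(uint32_t nb_ports, uint32_t flush_threshold,
			 uint32_t cache_size)
{
	return staged_mbufs_worst_case(nb_ports, flush_threshold) > cache_size;
}
```

Under this simplified model, two ports with a threshold of 512 can hold up to 1022 staged mbufs, against 62 with a threshold of 32.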
On 2025/6/10 14:42, huangdengdui wrote:
>
> On 2025/6/9 23:21, Stephen Hemminger wrote:
>> On Mon, 9 Jun 2025 09:58:27 +0000
>> Sivaprasad Tummala <sivaprasad.tummala@amd.com> wrote:
>>
>>> Previously, the TX burst size was fixed at 256, leading to performance
>>> degradation in certain scenarios.
>>>
>>> This patch introduces logic to set the TX burst size to match the
>>> configured RX burst size (--burst option, default 32, max 512)
>>> for better efficiency.
>>>
>>> Fixes: d5c4897ecfb2 ("examples/l3fwd: add option to set Rx burst size")
>>> Cc: haijie1@huawei.com
>>> Cc: stable@dpdk.org
>>>
>>> Signed-off-by: Sivaprasad Tummala <sivaprasad.tummala@amd.com>
>>> Tested-by: Venkat Kumar Ande <VenkatKumar.Ande@amd.com>
>>> Tested-by: Dengdui Huang <huangdengdui@huawei.com>
>>
>> What driver? Why not fix the driver.
>> If RX burst is small, there should be no way to get TX burst larger
>> than that to happen.
>
> If the Tx burst is too large, a number of mbufs will be temporarily stored in l3fwd's mbuf_table in a short period of time.
> This leads to a decrease in the hit rate of the mempool cache, resulting in a drop in performance.
This commit couples the Tx burst size to the Rx burst size.
Different architectures may need different combinations of Rx/Tx burst sizes and descriptor counts.
So how about adding a separate option for the Tx burst size, defaulting to MAX_PKT_BURST?
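A minimal sketch of what such a decoupled configuration could look like, assuming a separate Tx burst value that defaults to the maximum. The struct, the helper names and the SKETCH_* constants are hypothetical; the values 32 and 512 are taken from the commit message, and nothing here is part of the posted patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_DEFAULT_PKT_BURST 32u	/* "default 32" in the commit message */
#define SKETCH_MAX_PKT_BURST 512u	/* "max 512" in the commit message */

/* Hypothetical container for independent Rx and Tx burst sizes. */
struct burst_conf {
	uint32_t rx_burst;
	uint32_t tx_burst;
};

/* Rx keeps its default; Tx defaults to the maximum, as suggested above. */
static void
burst_conf_init(struct burst_conf *c)
{
	assert(c != NULL);
	c->rx_burst = SKETCH_DEFAULT_PKT_BURST;
	c->tx_burst = SKETCH_MAX_PKT_BURST;
}

/*
 * Parse a decimal burst size in [1, SKETCH_MAX_PKT_BURST].
 * Returns 0 and stores the value on success; returns -1 and leaves
 * *out untouched on any error.
 */
static int
parse_burst_arg(const char *arg, uint32_t *out)
{
	char *end = NULL;
	unsigned long v;

	if (arg == NULL || *arg == '\0')
		return -1;
	errno = 0;
	v = strtoul(arg, &end, 10);
	if (errno != 0 || *end != '\0' || v == 0 || v > SKETCH_MAX_PKT_BURST)
		return -1;
	*out = (uint32_t)v;
	return 0;
}
```

An option handler would call parse_burst_arg() once per option, so that setting the Rx burst never changes the Tx burst implicitly.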
>
>Previously, the TX burst size was fixed at 256, leading to performance
>degradation in certain scenarios.
>
>This patch introduces logic to set the TX burst size to match the
>configured RX burst size (--burst option, default 32, max 512)
>for better efficiency.
>
>Fixes: d5c4897ecfb2 ("examples/l3fwd: add option to set Rx burst size")
>Cc: haijie1@huawei.com
>Cc: stable@dpdk.org
>
>Signed-off-by: Sivaprasad Tummala <sivaprasad.tummala@amd.com>
>Tested-by: Venkat Kumar Ande <VenkatKumar.Ande@amd.com>
>Tested-by: Dengdui Huang <huangdengdui@huawei.com>
It would be good to log the selected Rx and Tx burst sizes.
On the CN10K platform we see up to 5% improvement, and up to 30% improvement on CN9K.
Tested-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
>---
> examples/l3fwd/l3fwd.h | 8 ++------
> examples/l3fwd/l3fwd_common.h | 11 +++++++----
> examples/l3fwd/main.c | 2 ++
> 3 files changed, 11 insertions(+), 10 deletions(-)
diff --git a/examples/l3fwd/l3fwd.h b/examples/l3fwd/l3fwd.h
--- a/examples/l3fwd/l3fwd.h
+++ b/examples/l3fwd/l3fwd.h
@@ -32,10 +32,6 @@
#define VECTOR_SIZE_DEFAULT MAX_PKT_BURST
#define VECTOR_TMO_NS_DEFAULT 1E6 /* 1ms */
-/*
- * Try to avoid TX buffering if we have at least MAX_TX_BURST packets to send.
- */
-#define MAX_TX_BURST (MAX_PKT_BURST / 2)
#define NB_SOCKETS 8
@@ -152,8 +148,8 @@ send_single_packet(struct lcore_conf *qconf,
len++;
/* enough pkts to be sent */
- if (unlikely(len == MAX_PKT_BURST)) {
- send_burst(qconf, MAX_PKT_BURST, port);
+ if (unlikely(len == nb_pkt_per_burst)) {
+ send_burst(qconf, nb_pkt_per_burst, port);
len = 0;
}
diff --git a/examples/l3fwd/l3fwd_common.h b/examples/l3fwd/l3fwd_common.h
--- a/examples/l3fwd/l3fwd_common.h
+++ b/examples/l3fwd/l3fwd_common.h
@@ -25,6 +25,9 @@
*/
#define SENDM_PORT_OVERHEAD(x) (x)
+extern uint32_t nb_pkt_per_burst;
+extern uint32_t max_tx_burst;
+
/*
* From http://www.rfc-editor.org/rfc/rfc1812.txt section 5.2.2:
* - The IP version number must be 4.
@@ -71,7 +74,7 @@ send_packetsx4(struct lcore_conf *qconf, uint16_t port, struct rte_mbuf *m[],
* If TX buffer for that queue is empty, and we have enough packets,
* then send them straightway.
*/
- if (num >= MAX_TX_BURST && len == 0) {
+ if (num >= max_tx_burst && len == 0) {
n = rte_eth_tx_burst(port, qconf->tx_queue_id[port], m, num);
if (unlikely(n < num)) {
do {
@@ -86,7 +89,7 @@ send_packetsx4(struct lcore_conf *qconf, uint16_t port, struct rte_mbuf *m[],
*/
n = len + num;
- n = (n > MAX_PKT_BURST) ? MAX_PKT_BURST - len : num;
+ n = (n > nb_pkt_per_burst) ? nb_pkt_per_burst - len : num;
j = 0;
switch (n % FWDSTEP) {
@@ -112,9 +115,9 @@ send_packetsx4(struct lcore_conf *qconf, uint16_t port, struct rte_mbuf *m[],
len += n;
/* enough pkts to be sent */
- if (unlikely(len == MAX_PKT_BURST)) {
+ if (unlikely(len == nb_pkt_per_burst)) {
- send_burst(qconf, MAX_PKT_BURST, port);
+ send_burst(qconf, nb_pkt_per_burst, port);
/* copy rest of the packets into the TX buffer. */
len = num - n;
diff --git a/examples/l3fwd/main.c b/examples/l3fwd/main.c
--- a/examples/l3fwd/main.c
+++ b/examples/l3fwd/main.c
@@ -59,6 +59,7 @@ uint16_t nb_rxd = RX_DESC_DEFAULT;
uint16_t nb_txd = TX_DESC_DEFAULT;
uint32_t nb_pkt_per_burst = DEFAULT_PKT_BURST;
uint32_t mb_mempool_cache_size = MEMPOOL_CACHE_SIZE;
+uint32_t max_tx_burst = DEFAULT_PKT_BURST / 2;
/**< Ports set in promiscuous mode off by default. */
static int promiscuous_on;
@@ -734,6 +735,7 @@ parse_pkt_burst(const char *optarg)
return;
}
nb_pkt_per_burst = burst_size;
+ max_tx_burst = burst_size / 2;
RTE_LOG(INFO, L3FWD, "Using PMD-provided burst value %d\n", burst_size);
}
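
For readers following the hunks above, the control flow of send_packetsx4() with runtime thresholds can be modelled roughly as below. This is a simplified sketch that only counts packets: it does not touch mbufs or call rte_eth_tx_burst(), and struct tx_stage and stage_packets() are hypothetical names introduced for illustration. Here burst plays the role of nb_pkt_per_burst and direct_thresh the role of max_tx_burst.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical counters standing in for one port's Tx staging buffer. */
struct tx_stage {
	uint32_t len;		/* packets currently staged */
	uint32_t sent;		/* packets handed to the modelled Tx call */
	uint32_t flushes;	/* number of modelled Tx calls */
};

/* Model of the send_packetsx4() decisions for num new packets. */
static void
stage_packets(struct tx_stage *s, uint32_t num, uint32_t burst,
	      uint32_t direct_thresh)
{
	uint32_t n;

	/* The model assumes one Rx burst never exceeds the Tx burst. */
	assert(burst > 0 && s->len < burst && num <= burst);

	/* Buffer empty and enough packets: send them straight away. */
	if (num >= direct_thresh && s->len == 0) {
		s->sent += num;
		s->flushes++;
		return;
	}

	/* Stage as many packets as still fit into the buffer. */
	n = (s->len + num > burst) ? burst - s->len : num;
	s->len += n;

	/* Buffer full: flush it, then stage the remaining packets. */
	if (s->len == burst) {
		s->sent += burst;
		s->flushes++;
		s->len = num - n;
	}
}
```

With burst = 32 and direct_thresh = 16, a call with 8 packets only stages them, while a later call with 30 packets fills the buffer, triggers one flush of 32 and leaves 6 packets staged.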