[3/4] examples/l3fwd: eliminate unnecessary reloads in loop

Message ID 20210318102550.59265-4-ruifeng.wang@arm.com (mailing list archive)
State Superseded, archived
Delegated to: David Marchand
Headers
Series l3fwd improvements |

Checks

Context Check Description
ci/checkpatch success coding style OK

Commit Message

Ruifeng Wang March 18, 2021, 10:25 a.m. UTC
  Number of rx queue and number of rx port in lcore config are constants
during the period of l3 forward application running. But compiler has
no this information.

Copied values from lcore config to local variables and used the local
variables for iteration. Compiler can see that the local variables are
not changed, so qconf reloads at each iteration can be eliminated.

The change showed 1.8% performance uplift in single core, single port,
single queue test on N1SDP platform with MLX5 NIC.

Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 examples/l3fwd/l3fwd_lpm.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)
  

Comments

Jerin Jacob April 13, 2021, 5:43 p.m. UTC | #1
On Thu, Mar 18, 2021 at 3:56 PM Ruifeng Wang <ruifeng.wang@arm.com> wrote:
>
> Number of rx queue and number of rx port in lcore config are constants
> during the period of l3 forward application running. But compiler has
> no this information.
>
> Copied values from lcore config to local variables and used the local
> variables for iteration. Compiler can see that the local variables are
> not changed, so qconf reloads at each iteration can be eliminated.
>
> The change showed 1.8% performance uplift in single core, single port,
> single queue test on N1SDP platform with MLX5 NIC.

At least, in octeontx2, I dont see any performance improvement.
But change looks good. Please find below a comment.

>
> Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
>  examples/l3fwd/l3fwd_lpm.c | 10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/examples/l3fwd/l3fwd_lpm.c b/examples/l3fwd/l3fwd_lpm.c
> index 3dcf1fef1..d338590b9 100644
> --- a/examples/l3fwd/l3fwd_lpm.c
> +++ b/examples/l3fwd/l3fwd_lpm.c
> @@ -190,14 +190,16 @@ lpm_main_loop(__rte_unused void *dummy)
>         lcore_id = rte_lcore_id();
>         qconf = &lcore_conf[lcore_id];
>
> -       if (qconf->n_rx_queue == 0) {
> +       uint16_t n_rx_q = qconf->n_rx_queue;
> +       uint16_t n_tx_p = qconf->n_tx_port;

How about adding const?
  
Ruifeng Wang April 14, 2021, 6:02 a.m. UTC | #2
> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Wednesday, April 14, 2021 1:43 AM
> To: Ruifeng Wang <Ruifeng.Wang@arm.com>
> Cc: jerinj@marvell.com; hemant.agrawal@nxp.com; Ferruh Yigit
> <ferruh.yigit@intel.com>; thomas@monjalon.net; David Marchand
> <david.marchand@redhat.com>; dpdk-dev <dev@dpdk.org>; nd
> <nd@arm.com>; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> Subject: Re: [dpdk-dev] [PATCH 3/4] examples/l3fwd: eliminate unnecessary
> reloads in loop
> 
> On Thu, Mar 18, 2021 at 3:56 PM Ruifeng Wang <ruifeng.wang@arm.com>
> wrote:
> >
> > Number of rx queue and number of rx port in lcore config are constants
> > during the period of l3 forward application running. But compiler has
> > no this information.
> >
> > Copied values from lcore config to local variables and used the local
> > variables for iteration. Compiler can see that the local variables are
> > not changed, so qconf reloads at each iteration can be eliminated.
> >
> > The change showed 1.8% performance uplift in single core, single port,
> > single queue test on N1SDP platform with MLX5 NIC.
> 
> At least, in octeontx2, I dont see any performance improvement.
> But change looks good. Please find below a comment.
> 
> >
> > Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > ---
> >  examples/l3fwd/l3fwd_lpm.c | 10 ++++++----
> >  1 file changed, 6 insertions(+), 4 deletions(-)
> >
> > diff --git a/examples/l3fwd/l3fwd_lpm.c b/examples/l3fwd/l3fwd_lpm.c
> > index 3dcf1fef1..d338590b9 100644
> > --- a/examples/l3fwd/l3fwd_lpm.c
> > +++ b/examples/l3fwd/l3fwd_lpm.c
> > @@ -190,14 +190,16 @@ lpm_main_loop(__rte_unused void *dummy)
> >         lcore_id = rte_lcore_id();
> >         qconf = &lcore_conf[lcore_id];
> >
> > -       if (qconf->n_rx_queue == 0) {
> > +       uint16_t n_rx_q = qconf->n_rx_queue;
> > +       uint16_t n_tx_p = qconf->n_tx_port;
> 
> How about adding const?

Ack. The values are not expected to be changed.
Will update in next version.
  

Patch

diff --git a/examples/l3fwd/l3fwd_lpm.c b/examples/l3fwd/l3fwd_lpm.c
index 3dcf1fef1..d338590b9 100644
--- a/examples/l3fwd/l3fwd_lpm.c
+++ b/examples/l3fwd/l3fwd_lpm.c
@@ -190,14 +190,16 @@  lpm_main_loop(__rte_unused void *dummy)
 	lcore_id = rte_lcore_id();
 	qconf = &lcore_conf[lcore_id];
 
-	if (qconf->n_rx_queue == 0) {
+	uint16_t n_rx_q = qconf->n_rx_queue;
+	uint16_t n_tx_p = qconf->n_tx_port;
+	if (n_rx_q == 0) {
 		RTE_LOG(INFO, L3FWD, "lcore %u has nothing to do\n", lcore_id);
 		return 0;
 	}
 
 	RTE_LOG(INFO, L3FWD, "entering main loop on lcore %u\n", lcore_id);
 
-	for (i = 0; i < qconf->n_rx_queue; i++) {
+	for (i = 0; i < n_rx_q; i++) {
 
 		portid = qconf->rx_queue_list[i].port_id;
 		queueid = qconf->rx_queue_list[i].queue_id;
@@ -216,7 +218,7 @@  lpm_main_loop(__rte_unused void *dummy)
 		diff_tsc = cur_tsc - prev_tsc;
 		if (unlikely(diff_tsc > drain_tsc)) {
 
-			for (i = 0; i < qconf->n_tx_port; ++i) {
+			for (i = 0; i < n_tx_p; ++i) {
 				portid = qconf->tx_port_id[i];
 				if (qconf->tx_mbufs[portid].len == 0)
 					continue;
@@ -232,7 +234,7 @@  lpm_main_loop(__rte_unused void *dummy)
 		/*
 		 * Read packet from RX queues
 		 */
-		for (i = 0; i < qconf->n_rx_queue; ++i) {
+		for (i = 0; i < n_rx_q; ++i) {
 			portid = qconf->rx_queue_list[i].port_id;
 			queueid = qconf->rx_queue_list[i].queue_id;
 			nb_rx = rte_eth_rx_burst(portid, queueid, pkts_burst,