[v2,2/2] net/mlx5: reduce unnecessary memory access

Message ID 20210707090307.1650632-3-ruifeng.wang@arm.com (mailing list archive)
State Accepted, archived
Delegated to: Raslan Darawsheh
Series MLX5 PMD tuning

Checks

Context                    Check    Description
ci/checkpatch              success  coding style OK
ci/github-robot            success  github build: passed
ci/Intel-compilation       success  Compilation OK
ci/iol-intel-Performance   success  Performance Testing PASS
ci/iol-intel-Functional    success  Functional Testing PASS
ci/intel-Testing           success  Testing PASS
ci/iol-testing             success  Testing PASS
ci/iol-abi-testing         success  Testing PASS

Commit Message

Ruifeng Wang July 7, 2021, 9:03 a.m. UTC
The MR btree length is constant during Rx replenish.
Move the retrieval of this value out of the loop to reduce data loads.
A slight performance uplift was measured on both N1SDP and x86.

Suggested-by: Slava Ovsiienko <viacheslavo@nvidia.com>
Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 drivers/net/mlx5/mlx5_rxtx_vec.c | 35 ++++++++++++++++++--------------
 1 file changed, 20 insertions(+), 15 deletions(-)
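
The change described in the commit message is an instance of hoisting a loop-invariant check out of a loop (often called loop unswitching). The sketch below is a minimal, standalone illustration of that idea and is not the driver code: every `demo_*` name, the `DEMO_HEADROOM` value, and the read counter are hypothetical stand-ins, and the byte-order conversion done by the real code is left out. It assumes, as the commit message states, that the checked length does not change while the loop runs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are NOT the real mlx5 structures. */
struct demo_buf {
	void *buf_addr;   /* start of the data buffer */
	uint32_t lkey;    /* key that a lookup would return for this buffer */
};

struct demo_wqe {
	uint64_t addr;
	uint32_t lkey;
};

#define DEMO_HEADROOM 128u /* assumed value, for illustration only */

/* Counts reads of the loop-invariant length so the saving is observable. */
static unsigned int demo_len_reads;

static unsigned int
demo_mr_len(const unsigned int *len)
{
	++demo_len_reads;
	return *len;
}

/* Before: the invariant length is re-read on every iteration. */
static void
demo_replenish_before(struct demo_wqe *wq, struct demo_buf **elts,
		      unsigned int n, const unsigned int *mr_len)
{
	unsigned int i;

	assert(wq != NULL && elts != NULL && mr_len != NULL);
	for (i = 0; i < n; ++i) {
		wq[i].addr = (uint64_t)(uintptr_t)elts[i]->buf_addr +
			     DEMO_HEADROOM;
		/* With a single MR there is no need to replace the key. */
		if (demo_mr_len(mr_len) > 1)
			wq[i].lkey = elts[i]->lkey;
	}
}

/* After: the length is read once and selects one of two simpler loops. */
static void
demo_replenish_after(struct demo_wqe *wq, struct demo_buf **elts,
		     unsigned int n, const unsigned int *mr_len)
{
	unsigned int i;

	assert(wq != NULL && elts != NULL && mr_len != NULL);
	if (demo_mr_len(mr_len) > 1) {
		for (i = 0; i < n; ++i) {
			wq[i].addr = (uint64_t)(uintptr_t)elts[i]->buf_addr +
				     DEMO_HEADROOM;
			wq[i].lkey = elts[i]->lkey;
		}
	} else {
		for (i = 0; i < n; ++i)
			wq[i].addr = (uint64_t)(uintptr_t)elts[i]->buf_addr +
				     DEMO_HEADROOM;
	}
}
```

Both functions fill the descriptors identically; the second reads the length once per call instead of once per element, at the cost of duplicating the loop body. A compiler may not perform this hoisting by itself when it cannot prove that the stores inside the loop leave the length unchanged, which is one plausible reason for doing it by hand.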
  

Comments

Slava Ovsiienko July 12, 2021, 3:33 p.m. UTC | #1
> -----Original Message-----
> From: Ruifeng Wang <ruifeng.wang@arm.com>
> Sent: Wednesday, July 7, 2021 12:03
> To: Raslan Darawsheh <rasland@nvidia.com>; Matan Azrad
> <matan@nvidia.com>; Shahaf Shuler <shahafs@nvidia.com>; Slava
> Ovsiienko <viacheslavo@nvidia.com>
> Cc: dev@dpdk.org; jerinj@marvell.com; nd@arm.com;
> honnappa.nagarahalli@arm.com; Ruifeng Wang <ruifeng.wang@arm.com>
> Subject: [PATCH v2 2/2] net/mlx5: reduce unnecessary memory access
> 
> MR btree len is a constant during Rx replenish.
> Moved retrieve of the value out of loop to reduce data loads.
> Slight performance uplift was measured on both N1SDP and x86.
> 
> Suggested-by: Slava Ovsiienko <viacheslavo@nvidia.com>
> Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Thank you for the update,
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
  

Patch

diff --git a/drivers/net/mlx5/mlx5_rxtx_vec.c b/drivers/net/mlx5/mlx5_rxtx_vec.c
index d5af2d91ff..e64ef70181 100644
--- a/drivers/net/mlx5/mlx5_rxtx_vec.c
+++ b/drivers/net/mlx5/mlx5_rxtx_vec.c
@@ -106,22 +106,27 @@  mlx5_rx_replenish_bulk_mbuf(struct mlx5_rxq_data *rxq)
 			rxq->stats.rx_nombuf += n;
 			return;
 		}
-		for (i = 0; i < n; ++i) {
-			void *buf_addr;
-
-			/*
-			 * In order to support the mbufs with external attached
-			 * data buffer we should use the buf_addr pointer
-			 * instead of rte_mbuf_buf_addr(). It touches the mbuf
-			 * itself and may impact the performance.
-			 */
-			buf_addr = elts[i]->buf_addr;
-			wq[i].addr = rte_cpu_to_be_64((uintptr_t)buf_addr +
-						      RTE_PKTMBUF_HEADROOM);
-			/* If there's a single MR, no need to replace LKey. */
-			if (unlikely(mlx5_mr_btree_len(&rxq->mr_ctrl.cache_bh)
-				     > 1))
+		if (unlikely(mlx5_mr_btree_len(&rxq->mr_ctrl.cache_bh) > 1)) {
+			for (i = 0; i < n; ++i) {
+				/*
+				 * In order to support the mbufs with external attached
+				 * data buffer we should use the buf_addr pointer
+				 * instead of rte_mbuf_buf_addr(). It touches the mbuf
+				 * itself and may impact the performance.
+				 */
+				void *buf_addr = elts[i]->buf_addr;
+
+				wq[i].addr = rte_cpu_to_be_64((uintptr_t)buf_addr +
+							      RTE_PKTMBUF_HEADROOM);
 				wq[i].lkey = mlx5_rx_mb2mr(rxq, elts[i]);
+			}
+		} else {
+			for (i = 0; i < n; ++i) {
+				void *buf_addr = elts[i]->buf_addr;
+
+				wq[i].addr = rte_cpu_to_be_64((uintptr_t)buf_addr +
+							      RTE_PKTMBUF_HEADROOM);
+			}
 		}
 		rxq->rq_ci += n;
 		/* Prevent overflowing into consumed mbufs. */