[2/2] net/bnxt: remove software prefetches from AVX2 Rx path

Message ID 20211115182410.5545-3-lance.richardson@broadcom.com
State Accepted, archived
Delegated to: Ajit Khaparde
Series net/bnxt: minor performance fixes for AVX2 Rx

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation success Compilation OK
ci/intel-Testing success Testing PASS
ci/github-robot: build success github build: passed
ci/iol-mellanox-Performance success Performance Testing PASS
ci/iol-broadcom-Functional success Functional Testing PASS
ci/iol-broadcom-Performance success Performance Testing PASS
ci/iol-x86_64-compile-testing success Testing PASS
ci/iol-aarch64-unit-testing success Testing PASS
ci/iol-x86_64-unit-testing success Testing PASS
ci/iol-aarch64-compile-testing success Testing PASS
ci/iol-intel-Functional fail Functional Testing issues
ci/iol-intel-Performance success Performance Testing PASS

Commit Message

Lance Richardson Nov. 15, 2021, 6:24 p.m. UTC
Testing has shown no performance benefit from software prefetching
of receive completion descriptors in the AVX2 burst receive path,
and slightly better performance without the prefetches on some CPU
families, so this patch removes them.
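
A software prefetch is only a cache hint, so removing it cannot change which completions are read or what the burst returns; it can only change timing. The sketch below illustrates this under stated assumptions: it uses the GCC/Clang builtin `__builtin_prefetch(addr, 0, 3)` (read access, high temporal locality) as a portable stand-in for DPDK's `rte_prefetch0()`, and `struct cmpl_desc`, `RING_SIZE`, `fill_ring()` and `consume()` are hypothetical names, not the bnxt driver's layout or API. It does not reproduce the performance measurements described above, which depend on the CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 16-byte completion entry (not the real bnxt layout). */
struct cmpl_desc {
    uint32_t word[4];
};

#define RING_SIZE 256u /* power of two, so masking wraps the index */

/* Fill each entry with its own index so sums are easy to check. */
static void fill_ring(struct cmpl_desc *ring)
{
    assert(ring != NULL);
    for (uint32_t i = 0; i < RING_SIZE; i++) {
        ring[i].word[0] = i;
        ring[i].word[1] = ring[i].word[2] = ring[i].word[3] = 0;
    }
}

/*
 * Consume nb entries starting at the free-running index raw_cons and
 * sum word[0]. With use_prefetch set, issue a read prefetch with high
 * temporal locality 16 entries ahead on every fourth entry (roughly
 * one prefetch per 64-byte line), similar in spirit to the removed
 * rte_prefetch0() calls. The prefetch never affects the result.
 */
static uint64_t consume(const struct cmpl_desc *ring, uint32_t raw_cons,
                        uint32_t nb, int use_prefetch)
{
    uint64_t sum = 0;

    for (uint32_t i = 0; i < nb; i++) {
        uint32_t cons = (raw_cons + i) & (RING_SIZE - 1);

        if (use_prefetch && (i % 4) == 0)
            __builtin_prefetch(&ring[(cons + 16) & (RING_SIZE - 1)], 0, 3);
        sum += ring[cons].word[0];
    }
    return sum;
}
```

For a filled ring, `consume(ring, 0, 64, 0)` and `consume(ring, 0, 64, 1)` return the same sum; only the run time can differ, and whether the hint helps or hurts is exactly what has to be measured per CPU family.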

Fixes: c4e4c18963b0 ("net/bnxt: add AVX2 RX/Tx")
Cc: stable@dpdk.org
Signed-off-by: Lance Richardson <lance.richardson@broadcom.com>
---
 drivers/net/bnxt/bnxt_rxtx_vec_avx2.c | 14 --------------
 1 file changed, 14 deletions(-)

Patch

diff --git a/drivers/net/bnxt/bnxt_rxtx_vec_avx2.c b/drivers/net/bnxt/bnxt_rxtx_vec_avx2.c
index 54e3af22ac..34bd22edf0 100644
--- a/drivers/net/bnxt/bnxt_rxtx_vec_avx2.c
+++ b/drivers/net/bnxt/bnxt_rxtx_vec_avx2.c
@@ -92,12 +92,6 @@  recv_burst_vec_avx2(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 	cons = raw_cons & (cp_ring_size - 1);
 	mbcons = (raw_cons / 2) & (rx_ring_size - 1);
 
-	/* Prefetch first four descriptor pairs. */
-	rte_prefetch0(&cp_desc_ring[cons + 0]);
-	rte_prefetch0(&cp_desc_ring[cons + 4]);
-	rte_prefetch0(&cp_desc_ring[cons + 8]);
-	rte_prefetch0(&cp_desc_ring[cons + 12]);
-
 	/* Return immediately if there is not at least one completed packet. */
 	if (!bnxt_cpr_cmp_valid(&cp_desc_ring[cons], raw_cons, cp_ring_size))
 		return 0;
@@ -136,14 +130,6 @@  recv_burst_vec_avx2(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 		_mm256_storeu_si256((void *)&rx_pkts[i + 4], t0);
 #endif
 
-		/* Prefetch eight descriptor pairs for next iteration. */
-		if (i + BNXT_RX_DESCS_PER_LOOP_VEC256 < nb_pkts) {
-			rte_prefetch0(&cp_desc_ring[cons + 16]);
-			rte_prefetch0(&cp_desc_ring[cons + 20]);
-			rte_prefetch0(&cp_desc_ring[cons + 24]);
-			rte_prefetch0(&cp_desc_ring[cons + 28]);
-		}
-
 		/*
 		 * Load eight receive completion descriptors into 256-bit
 		 * registers. Loads are issued in reverse order in order to