[v2,1/2] net/txgbe: add proper memory barriers in Rx

Message ID 20231101033241.623190-1-jiawenwu@trustnetic.com (mailing list archive)
State Accepted, archived
Delegated to: Ferruh Yigit
Series [v2,1/2] net/txgbe: add proper memory barriers in Rx

Checks

Context Check Description
ci/checkpatch success coding style OK

Commit Message

Jiawen Wu Nov. 1, 2023, 3:32 a.m. UTC
  Refer to commit 85e46c532bc7 ("net/ixgbe: add proper memory barriers in
Rx"). Fix the same issue as ixgbe.

A segmentation fault has been observed while running the
txgbe_recv_pkts_lro() function to receive packets on the Loongson 3A5000
processor. It is caused by out-of-order execution on the CPU, so add a
proper memory barrier to ensure the read ordering is correct.

We also did the same in the txgbe_recv_pkts() function to make the rxd
data valid, even though we did not observe a segmentation fault in that
function.

Fixes: 0e484278c85f ("net/txgbe: support Rx")
Cc: stable@dpdk.org

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
---
 drivers/net/txgbe/txgbe_rxtx.c | 49 +++++++++++++++-------------------
 1 file changed, 22 insertions(+), 27 deletions(-)
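
For reference, the bug boils down to the read pattern sketched below. This is
a minimal, self-contained sketch, not the driver code: struct rx_desc,
RXD_STAT_DD and read_done_desc() are simplified stand-ins for the txgbe
descriptor layout and flag definitions. The point is that the DD ("descriptor
done") bit must be observed as set before any other descriptor word is loaded;
"volatile" only constrains the compiler, so on weakly ordered CPUs an explicit
acquire fence is required.

#include <stdint.h>
#include <rte_atomic.h>
#include <rte_byteorder.h>
#include <rte_stdatomic.h>

/* Simplified stand-in for a txgbe Rx descriptor (illustrative layout). */
struct rx_desc {
	uint64_t pkt_addr;
	uint32_t length;
	uint32_t status;	/* DD (descriptor done) bit lives here */
};

#define RXD_STAT_DD UINT32_C(0x1)	/* illustrative bit value */

/*
 * Copy out a descriptor only once the NIC has marked it done.
 * Returns 1 if a completed descriptor was read, 0 otherwise.
 */
static int
read_done_desc(volatile struct rx_desc *rxdp, struct rx_desc *out)
{
	uint32_t staterr = rxdp->status;

	if (!(staterr & rte_cpu_to_le_32(RXD_STAT_DD)))
		return 0;

	/*
	 * Acquire fence: the DD-bit load above must complete before the
	 * loads below. "volatile" only constrains the compiler; without
	 * the fence, a weakly ordered CPU (e.g. Loongson 3A5000, Arm,
	 * Power) may load the other descriptor words first and hand back
	 * stale fields even though DD reads as 1.
	 */
	rte_atomic_thread_fence(rte_memory_order_acquire);

	*out = *rxdp;
	return 1;
}

Both txgbe_recv_pkts() and txgbe_recv_pkts_lro() in the patch below insert
exactly this fence between the DD-bit check and the full descriptor read.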
  

Comments

Ferruh Yigit Nov. 1, 2023, 4:55 p.m. UTC | #1
On 11/1/2023 3:32 AM, Jiawen Wu wrote:
> Refer to commit 85e46c532bc7 ("net/ixgbe: add proper memory barriers in
> Rx"). Fix the same issue as ixgbe.
> 
> A segmentation fault has been observed while running the
> txgbe_recv_pkts_lro() function to receive packets on the Loongson 3A5000
> processor. It is caused by out-of-order execution on the CPU, so add a
> proper memory barrier to ensure the read ordering is correct.
> 
> We also did the same in the txgbe_recv_pkts() function to make the rxd
> data valid, even though we did not observe a segmentation fault in that
> function.
> 
> Fixes: 0e484278c85f ("net/txgbe: support Rx")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
>

Series applied to dpdk-next-net/main, thanks.
  

Patch

diff --git a/drivers/net/txgbe/txgbe_rxtx.c b/drivers/net/txgbe/txgbe_rxtx.c
index 834ada886a..1cd4b25965 100644
--- a/drivers/net/txgbe/txgbe_rxtx.c
+++ b/drivers/net/txgbe/txgbe_rxtx.c
@@ -1226,7 +1226,7 @@ txgbe_rx_scan_hw_ring(struct txgbe_rx_queue *rxq)
 		for (j = 0; j < LOOK_AHEAD; j++)
 			s[j] = rte_le_to_cpu_32(rxdp[j].qw1.lo.status);
 
-		rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
+		rte_atomic_thread_fence(rte_memory_order_acquire);
 
 		/* Compute how many status bits were set */
 		for (nb_dd = 0; nb_dd < LOOK_AHEAD &&
@@ -1476,11 +1476,22 @@ txgbe_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 		 * of accesses cannot be reordered by the compiler. If they were
 		 * not volatile, they could be reordered which could lead to
 		 * using invalid descriptor fields when read from rxd.
+		 *
+		 * Meanwhile, to prevent the CPU from executing out of order, we
+		 * need to use a proper memory barrier to ensure the memory
+		 * ordering below.
 		 */
 		rxdp = &rx_ring[rx_id];
 		staterr = rxdp->qw1.lo.status;
 		if (!(staterr & rte_cpu_to_le_32(TXGBE_RXD_STAT_DD)))
 			break;
+
+		/*
+		 * Use acquire fence to ensure that status_error which includes
+		 * DD bit is loaded before loading of other descriptor words.
+		 */
+		rte_atomic_thread_fence(rte_memory_order_acquire);
+
 		rxd = *rxdp;
 
 		/*
@@ -1726,32 +1737,10 @@ txgbe_recv_pkts_lro(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts,
 
 next_desc:
 		/*
-		 * The code in this whole file uses the volatile pointer to
-		 * ensure the read ordering of the status and the rest of the
-		 * descriptor fields (on the compiler level only!!!). This is so
-		 * UGLY - why not to just use the compiler barrier instead? DPDK
-		 * even has the rte_compiler_barrier() for that.
-		 *
-		 * But most importantly this is just wrong because this doesn't
-		 * ensure memory ordering in a general case at all. For
-		 * instance, DPDK is supposed to work on Power CPUs where
-		 * compiler barrier may just not be enough!
-		 *
-		 * I tried to write only this function properly to have a
-		 * starting point (as a part of an LRO/RSC series) but the
-		 * compiler cursed at me when I tried to cast away the
-		 * "volatile" from rx_ring (yes, it's volatile too!!!). So, I'm
-		 * keeping it the way it is for now.
-		 *
-		 * The code in this file is broken in so many other places and
-		 * will just not work on a big endian CPU anyway therefore the
-		 * lines below will have to be revisited together with the rest
-		 * of the txgbe PMD.
-		 *
-		 * TODO:
-		 *    - Get rid of "volatile" and let the compiler do its job.
-		 *    - Use the proper memory barrier (rte_rmb()) to ensure the
-		 *      memory ordering below.
+		 * "Volatile" only prevents caching of the variable marked
+		 * volatile. Most importantly, "volatile" cannot prevent the CPU
+		 * from executing out of order. So, it is necessary to use a
+		 * proper memory barrier to ensure the memory ordering below.
 		 */
 		rxdp = &rx_ring[rx_id];
 		staterr = rte_le_to_cpu_32(rxdp->qw1.lo.status);
@@ -1759,6 +1748,12 @@ txgbe_recv_pkts_lro(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts,
 		if (!(staterr & TXGBE_RXD_STAT_DD))
 			break;
 
+		/*
+		 * Use acquire fence to ensure that status_error which includes
+		 * DD bit is loaded before loading of other descriptor words.
+		 */
+		rte_atomic_thread_fence(rte_memory_order_acquire);
+
 		rxd = *rxdp;
 
 		PMD_RX_LOG(DEBUG, "port_id=%u queue_id=%u rx_id=%u "