net/mlx5: fix error packets drop in the regular Rx

Message ID 20240220114520.14531-1-viacheslavo@nvidia.com (mailing list archive)
State Accepted, archived
Delegated to: Raslan Darawsheh
Headers
Series net/mlx5: fix error packets drop in the regular Rx |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/loongarch-compilation success Compilation OK
ci/loongarch-unit-testing success Unit Testing PASS
ci/Intel-compilation success Compilation OK
ci/intel-Testing success Testing PASS
ci/intel-Functional success Functional PASS
ci/github-robot: build success github build: passed
ci/iol-mellanox-Performance success Performance Testing PASS
ci/iol-abi-testing success Testing PASS
ci/iol-unit-amd64-testing success Testing PASS
ci/iol-compile-amd64-testing success Testing PASS
ci/iol-compile-arm64-testing success Testing PASS
ci/iol-unit-arm64-testing success Testing PASS
ci/iol-broadcom-Performance success Performance Testing PASS
ci/iol-broadcom-Functional success Functional Testing PASS
ci/iol-sample-apps-testing success Testing PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/iol-intel-Functional success Functional Testing PASS

Commit Message

Slava Ovsiienko Feb. 20, 2024, 11:45 a.m. UTC
  When packet gets received with error it is reported in CQE
structure and PMD analyzes the error syndrome and provides
two options - either reset the entire queue for the critical
errors, or just ignore the packet.

The non-vectorized rx_burst did not ignore the non-critical
error packets, and in case of packet length exceeding the
mbuf data buffer length it took the next element in the queue
WQE ring, resulting in CQE/WQE consume indices synchronization
lost.

Fixes: aa67ed308458 ("net/mlx5: ignore non-critical syndromes for Rx queue")
Cc: stable@dpdk.org

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 drivers/net/mlx5/mlx5_rx.c | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)
  

Comments

Dariusz Sosnowski Feb. 20, 2024, 2:04 p.m. UTC | #1
> -----Original Message-----
> From: Slava Ovsiienko <viacheslavo@nvidia.com>
> Sent: Tuesday, February 20, 2024 12:45
> To: dev@dpdk.org
> Cc: Matan Azrad <matan@nvidia.com>; Raslan Darawsheh
> <rasland@nvidia.com>; Ori Kam <orika@nvidia.com>; Dariusz Sosnowski
> <dsosnowski@nvidia.com>; stable@dpdk.org
> Subject: [PATCH] net/mlx5: fix error packets drop in the regular Rx
> 
> When packet gets received with error it is reported in CQE structure and PMD
> analyzes the error syndrome and provides two options - either reset the entire
> queue for the critical errors, or just ignore the packet.
> 
> The non-vectorized rx_burst did not ignore the non-critical error packets, and
> in case of packet length exceeding the mbuf data buffer length it took the next
> element in the queue WQE ring, resulting in CQE/WQE consume indices
> synchronization lost.
> 
> Fixes: aa67ed308458 ("net/mlx5: ignore non-critical syndromes for Rx
> queue")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Dariusz Sosnowski <dsosnowski@nvidia.com>

Best regards,
Dariusz Sosnowski
  
Raslan Darawsheh Feb. 27, 2024, 4:16 p.m. UTC | #2
Hi,

> -----Original Message-----
> From: Slava Ovsiienko <viacheslavo@nvidia.com>
> Sent: Tuesday, February 20, 2024 1:45 PM
> To: dev@dpdk.org
> Cc: Matan Azrad <matan@nvidia.com>; Raslan Darawsheh
> <rasland@nvidia.com>; Ori Kam <orika@nvidia.com>; Dariusz Sosnowski
> <dsosnowski@nvidia.com>; stable@dpdk.org
> Subject: [PATCH] net/mlx5: fix error packets drop in the regular Rx
> 
> When packet gets received with error it is reported in CQE structure and PMD
> analyzes the error syndrome and provides two options - either reset the entire
> queue for the critical errors, or just ignore the packet.
> 
> The non-vectorized rx_burst did not ignore the non-critical error packets, and
> in case of packet length exceeding the mbuf data buffer length it took the next
> element in the queue WQE ring, resulting in CQE/WQE consume indices
> synchronization lost.
> 
> Fixes: aa67ed308458 ("net/mlx5: ignore non-critical syndromes for Rx
> queue")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Patch applied to next-net-mlx,
Kindest regards,
Raslan Darawsheh
  

Patch

diff --git a/drivers/net/mlx5/mlx5_rx.c b/drivers/net/mlx5/mlx5_rx.c
index 5bf1a679b2..cc087348a4 100644
--- a/drivers/net/mlx5/mlx5_rx.c
+++ b/drivers/net/mlx5/mlx5_rx.c
@@ -613,7 +613,8 @@  mlx5_rx_err_handle(struct mlx5_rxq_data *rxq, uint8_t vec,
  * @param mprq
  *   Indication if it is called from MPRQ.
  * @return
- *   0 in case of empty CQE, MLX5_REGULAR_ERROR_CQE_RET in case of error CQE,
+ *   0 in case of empty CQE,
+ *   MLX5_REGULAR_ERROR_CQE_RET in case of error CQE,
  *   MLX5_CRITICAL_ERROR_CQE_RET in case of error CQE lead to Rx queue reset,
  *   otherwise the packet size in regular RxQ,
  *   and striding byte count format in mprq case.
@@ -697,6 +698,11 @@  mlx5_rx_poll_len(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cqe,
 					if (ret == MLX5_RECOVERY_ERROR_RET ||
 						ret == MLX5_RECOVERY_COMPLETED_RET)
 						return MLX5_CRITICAL_ERROR_CQE_RET;
+					if (!mprq && ret == MLX5_RECOVERY_IGNORE_RET) {
+						*skip_cnt = 1;
+						++rxq->cq_ci;
+						return MLX5_ERROR_CQE_MASK;
+					}
 				} else {
 					return 0;
 				}
@@ -971,19 +977,18 @@  mlx5_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 			cqe = &(*rxq->cqes)[rxq->cq_ci & cqe_mask];
 			len = mlx5_rx_poll_len(rxq, cqe, cqe_n, cqe_mask, &mcqe, &skip_cnt, false);
 			if (unlikely(len & MLX5_ERROR_CQE_MASK)) {
+				/* We drop packets with non-critical errors */
+				rte_mbuf_raw_free(rep);
 				if (len == MLX5_CRITICAL_ERROR_CQE_RET) {
-					rte_mbuf_raw_free(rep);
 					rq_ci = rxq->rq_ci << sges_n;
 					break;
 				}
+				/* Skip specified amount of error CQEs packets */
 				rq_ci >>= sges_n;
 				rq_ci += skip_cnt;
 				rq_ci <<= sges_n;
-				idx = rq_ci & wqe_mask;
-				wqe = &((volatile struct mlx5_wqe_data_seg *)rxq->wqes)[idx];
-				seg = (*rxq->elts)[idx];
-				cqe = &(*rxq->cqes)[rxq->cq_ci & cqe_mask];
-				len = len & ~MLX5_ERROR_CQE_MASK;
+				MLX5_ASSERT(!pkt);
+				continue;
 			}
 			if (len == 0) {
 				rte_mbuf_raw_free(rep);