common/mlx5: fix QP ack timeout configuration

Message ID 20220214060319.1669846-1-yajunw@nvidia.com (mailing list archive)
State Accepted, archived
Delegated to: Raslan Darawsheh

Checks

Context                         Check    Description
ci/checkpatch                   success  coding style OK
ci/iol-broadcom-Functional      success  Functional Testing PASS
ci/iol-mellanox-Performance     success  Performance Testing PASS
ci/iol-broadcom-Performance     success  Performance Testing PASS
ci/iol-intel-Performance        success  Performance Testing PASS
ci/github-robot: build          success  github build: passed
ci/iol-intel-Functional         success  Functional Testing PASS
ci/iol-x86_64-compile-testing   success  Testing PASS
ci/iol-aarch64-unit-testing     success  Testing PASS
ci/iol-abi-testing              success  Testing PASS
ci/iol-aarch64-compile-testing  success  Testing PASS
ci/Intel-compilation            success  Compilation OK
ci/intel-Testing                success  Testing PASS
ci/iol-x86_64-unit-testing      success  Testing PASS

Commit Message

Yajun Wu Feb. 14, 2022, 6:03 a.m. UTC
The vDPA driver creates two QPs (a queue pair consists of one send
queue and one receive queue) per virtio queue to deliver traffic
events from the NIC to SW.
The two QPs (called FW QP and SW QP) are created as loopback QPs,
and the FW QP's SQ is connected to the SW QP's RQ internally.

When a packet is received or sent, HW sends a WQE through the FW QP's
SQ, and SW then gets a CQE from the CQ of the SW QP.

At large scale and under heavy traffic, the SQ's request may fail
to get an ACK from the RQ HW because the HW is busy.
The SQ retries the request up to qpc.retry_count times, each time
waiting 4.096 us * 2^(ack_timeout) for the response. If the RQ HW
still does not respond, the SQ goes to an error state.

Raise ack_timeout from 14 to 16. The value 16 was chosen empirically;
it should be neither too high nor too low.
Too high makes the QP wait too long when a packet is actually dropped.
Too low makes the QP go to an error state (retry-exceeded) too easily.

Fixes: 15c3807e86a ("common/mlx5: support DevX QP operations")
Cc: stable@dpdk.org

Signed-off-by: Yajun Wu <yajunw@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 drivers/common/mlx5/mlx5_devx_cmds.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
  

Comments

Raslan Darawsheh Feb. 22, 2022, 2:44 p.m. UTC | #1
Hi,

> -----Original Message-----
> From: Yajun Wu <yajunw@nvidia.com>
> Sent: Monday, February 14, 2022 8:03 AM
> To: Ori Kam <orika@nvidia.com>; Slava Ovsiienko
> <viacheslavo@nvidia.com>; Matan Azrad <matan@nvidia.com>
> Cc: dev@dpdk.org; NBU-Contact-Thomas Monjalon (EXTERNAL)
> <thomas@monjalon.net>; Raslan Darawsheh <rasland@nvidia.com>; Roni
> Bar Yanai <roniba@nvidia.com>; stable@dpdk.org
> Subject: [PATCH] common/mlx5: fix QP ack timeout configuration

Patch applied to next-net-mlx.

Kindest regards,
Raslan Darawsheh
  

Patch

diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index 2e807a0829..7732613c69 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -2279,7 +2279,7 @@  mlx5_devx_cmd_modify_qp_state(struct mlx5_devx_obj *qp, uint32_t qp_st_mod_op,
 	case MLX5_CMD_OP_RTR2RTS_QP:
 		qpc = MLX5_ADDR_OF(rtr2rts_qp_in, &in, qpc);
 		MLX5_SET(rtr2rts_qp_in, &in, qpn, qp->id);
-		MLX5_SET(qpc, qpc, primary_address_path.ack_timeout, 14);
+		MLX5_SET(qpc, qpc, primary_address_path.ack_timeout, 16);
 		MLX5_SET(qpc, qpc, log_ack_req_freq, 0);
 		MLX5_SET(qpc, qpc, retry_count, 7);
 		MLX5_SET(qpc, qpc, rnr_retry, 7);