From patchwork Tue Jan 4 03:00:56 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ruifeng Wang X-Patchwork-Id: 105598 X-Patchwork-Delegate: rasland@nvidia.com Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 26BC1A034D; Tue, 4 Jan 2022 04:01:28 +0100 (CET) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id B4CE240042; Tue, 4 Jan 2022 04:01:27 +0100 (CET) Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by mails.dpdk.org (Postfix) with ESMTP id A89D14003C; Tue, 4 Jan 2022 04:01:26 +0100 (CET) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 0F4346D; Mon, 3 Jan 2022 19:01:26 -0800 (PST) Received: from net-arm-n1amp-02.shanghai.arm.com (net-arm-n1amp-02.shanghai.arm.com [10.169.210.112]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id AFA1C3F66F; Mon, 3 Jan 2022 19:01:23 -0800 (PST) From: Ruifeng Wang To: matan@nvidia.com, viacheslavo@nvidia.com Cc: dev@dpdk.org, honnappa.nagarahalli@arm.com, stable@dpdk.org, nd@arm.com, Ruifeng Wang Subject: [PATCH] net/mlx5: fix risk in Rx descriptor read in NEON vector path Date: Tue, 4 Jan 2022 11:00:56 +0800 Message-Id: <20220104030056.268974-1-ruifeng.wang@arm.com> X-Mailer: git-send-email 2.25.1 MIME-Version: 1.0 X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org In NEON vector PMD, vector load loads two contiguous 8B of descriptor data into vector register. Given vector load ensures no 16B atomicity, read of the word that includes op_own field could be reordered after read of other words. In this case, some words could contain invalid data. Reloaded qword0 after read barrier to update vector register. This ensures that the fetched data is correct. Testpmd single core test on N1SDP/ThunderX2 showed no performance drop. Fixes: 1742c2d9fab0 ("net/mlx5: fix synchronization on polling Rx completions") Cc: stable@dpdk.org Signed-off-by: Ruifeng Wang Tested-by: Ali Alnubani Acked-by: Viacheslav Ovsiienko --- drivers/net/mlx5/mlx5_rxtx_vec_neon.h | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h index b1d16baa61..b1ec615b51 100644 --- a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h +++ b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h @@ -647,6 +647,14 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, c0 = vld1q_u64((uint64_t *)(p0 + 48)); /* Synchronize for loading the rest of blocks. */ rte_io_rmb(); + /* B.0 (CQE 3) reload lower half of the block. */ + c3 = vld1q_lane_u64((uint64_t *)(p3 + 48), c3, 0); + /* B.0 (CQE 2) reload lower half of the block. */ + c2 = vld1q_lane_u64((uint64_t *)(p2 + 48), c2, 0); + /* B.0 (CQE 1) reload lower half of the block. */ + c1 = vld1q_lane_u64((uint64_t *)(p1 + 48), c1, 0); + /* B.0 (CQE 0) reload lower half of the block. */ + c0 = vld1q_lane_u64((uint64_t *)(p0 + 48), c0, 0); /* Prefetch next 4 CQEs. */ if (pkts_n - pos >= 2 * MLX5_VPMD_DESCS_PER_LOOP) { unsigned int next = pos + MLX5_VPMD_DESCS_PER_LOOP;