Message ID | 20220104030056.268974-1-ruifeng.wang@arm.com (mailing list archive) |
---|---|
State | New |
Delegated to: | Raslan Darawsheh |
Headers | show |
Series | net/mlx5: fix risk in Rx descriptor read in NEON vector path | expand |
Context | Check | Description |
---|---|---|
ci/intel-Testing | success | Testing PASS |
ci/Intel-compilation | success | Compilation OK |
ci/iol-aarch64-compile-testing | success | Testing PASS |
ci/iol-x86_64-unit-testing | success | Testing PASS |
ci/iol-x86_64-compile-testing | success | Testing PASS |
ci/iol-aarch64-unit-testing | success | Testing PASS |
ci/iol-abi-testing | success | Testing PASS |
ci/iol-intel-Functional | success | Functional Testing PASS |
ci/iol-intel-Performance | success | Performance Testing PASS |
ci/iol-broadcom-Performance | success | Performance Testing PASS |
ci/github-robot: build | success | github build: passed |
ci/iol-mellanox-Performance | success | Performance Testing PASS |
ci/iol-broadcom-Functional | success | Functional Testing PASS |
ci/checkpatch | success | coding style OK |
Ping. Please could you help to review this patch? Thanks. Ruifeng > -----Original Message----- > From: Ruifeng Wang <ruifeng.wang@arm.com> > Sent: Tuesday, January 4, 2022 11:01 AM > To: matan@nvidia.com; viacheslavo@nvidia.com > Cc: dev@dpdk.org; Honnappa Nagarahalli > <Honnappa.Nagarahalli@arm.com>; stable@dpdk.org; nd <nd@arm.com>; > Ruifeng Wang <Ruifeng.Wang@arm.com> > Subject: [PATCH] net/mlx5: fix risk in Rx descriptor read in NEON vector path > > In NEON vector PMD, vector load loads two contiguous 8B of descriptor data > into vector register. Given vector load ensures no 16B atomicity, read of the > word that includes op_own field could be reordered after read of other > words. In this case, some words could contain invalid data. > > Reloaded qword0 after read barrier to update vector register. This ensures > that the fetched data is correct. > > Testpmd single core test on N1SDP/ThunderX2 showed no performance > drop. > > Fixes: 1742c2d9fab0 ("net/mlx5: fix synchronization on polling Rx > completions") > Cc: stable@dpdk.org > > Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com> > --- > drivers/net/mlx5/mlx5_rxtx_vec_neon.h | 8 ++++++++ > 1 file changed, 8 insertions(+) > > diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h > b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h > index b1d16baa61..b1ec615b51 100644 > --- a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h > +++ b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h > @@ -647,6 +647,14 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq, > volatile struct mlx5_cqe *cq, > c0 = vld1q_u64((uint64_t *)(p0 + 48)); > /* Synchronize for loading the rest of blocks. */ > rte_io_rmb(); > + /* B.0 (CQE 3) reload lower half of the block. */ > + c3 = vld1q_lane_u64((uint64_t *)(p3 + 48), c3, 0); > + /* B.0 (CQE 2) reload lower half of the block. */ > + c2 = vld1q_lane_u64((uint64_t *)(p2 + 48), c2, 0); > + /* B.0 (CQE 1) reload lower half of the block. */ > + c1 = vld1q_lane_u64((uint64_t *)(p1 + 48), c1, 0); > + /* B.0 (CQE 0) reload lower half of the block. */ > + c0 = vld1q_lane_u64((uint64_t *)(p0 + 48), c0, 0); > /* Prefetch next 4 CQEs. */ > if (pkts_n - pos >= 2 * MLX5_VPMD_DESCS_PER_LOOP) { > unsigned int next = pos + > MLX5_VPMD_DESCS_PER_LOOP; > -- > 2.25.1
Hi Ruifeng, Patch looks reasonable, thank you. Just curious - did you see the real issue with re-ordering in this code fragment? And, please, let us do performance check. With best regards, Slava > -----Original Message----- > From: Ruifeng Wang <Ruifeng.Wang@arm.com> > Sent: Thursday, February 10, 2022 8:25 > To: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko > <viacheslavo@nvidia.com> > Cc: dev@dpdk.org; Honnappa Nagarahalli > <Honnappa.Nagarahalli@arm.com>; stable@dpdk.org; nd <nd@arm.com>; > Ruifeng Wang <Ruifeng.Wang@arm.com>; nd <nd@arm.com> > Subject: RE: [PATCH] net/mlx5: fix risk in Rx descriptor read in NEON vector > path > > Ping. > Please could you help to review this patch? > > Thanks. > Ruifeng > > > -----Original Message----- > > From: Ruifeng Wang <ruifeng.wang@arm.com> > > Sent: Tuesday, January 4, 2022 11:01 AM > > To: matan@nvidia.com; viacheslavo@nvidia.com > > Cc: dev@dpdk.org; Honnappa Nagarahalli > <Honnappa.Nagarahalli@arm.com>; > > stable@dpdk.org; nd <nd@arm.com>; Ruifeng Wang > <Ruifeng.Wang@arm.com> > > Subject: [PATCH] net/mlx5: fix risk in Rx descriptor read in NEON > > vector path > > > > In NEON vector PMD, vector load loads two contiguous 8B of descriptor > > data into vector register. Given vector load ensures no 16B atomicity, > > read of the word that includes op_own field could be reordered after > > read of other words. In this case, some words could contain invalid data. > > > > Reloaded qword0 after read barrier to update vector register. This > > ensures that the fetched data is correct. > > > > Testpmd single core test on N1SDP/ThunderX2 showed no performance > > drop. > > > > Fixes: 1742c2d9fab0 ("net/mlx5: fix synchronization on polling Rx > > completions") > > Cc: stable@dpdk.org > > > > Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com> > > --- > > drivers/net/mlx5/mlx5_rxtx_vec_neon.h | 8 ++++++++ > > 1 file changed, 8 insertions(+) > > > > diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h > > b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h > > index b1d16baa61..b1ec615b51 100644 > > --- a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h > > +++ b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h > > @@ -647,6 +647,14 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq, > > volatile struct mlx5_cqe *cq, > > c0 = vld1q_u64((uint64_t *)(p0 + 48)); > > /* Synchronize for loading the rest of blocks. */ > > rte_io_rmb(); > > + /* B.0 (CQE 3) reload lower half of the block. */ > > + c3 = vld1q_lane_u64((uint64_t *)(p3 + 48), c3, 0); > > + /* B.0 (CQE 2) reload lower half of the block. */ > > + c2 = vld1q_lane_u64((uint64_t *)(p2 + 48), c2, 0); > > + /* B.0 (CQE 1) reload lower half of the block. */ > > + c1 = vld1q_lane_u64((uint64_t *)(p1 + 48), c1, 0); > > + /* B.0 (CQE 0) reload lower half of the block. */ > > + c0 = vld1q_lane_u64((uint64_t *)(p0 + 48), c0, 0); > > /* Prefetch next 4 CQEs. */ > > if (pkts_n - pos >= 2 * MLX5_VPMD_DESCS_PER_LOOP) { > > unsigned int next = pos + > > MLX5_VPMD_DESCS_PER_LOOP; > > -- > > 2.25.1
> -----Original Message----- > From: Slava Ovsiienko <viacheslavo@nvidia.com> > Sent: Thursday, February 10, 2022 4:17 PM > To: Ruifeng Wang <Ruifeng.Wang@arm.com>; Matan Azrad > <matan@nvidia.com> > Cc: dev@dpdk.org; Honnappa Nagarahalli > <Honnappa.Nagarahalli@arm.com>; stable@dpdk.org; nd <nd@arm.com>; > nd <nd@arm.com> > Subject: RE: [PATCH] net/mlx5: fix risk in Rx descriptor read in NEON vector > path > > Hi Ruifeng, Hi Slava, > > Patch looks reasonable, thank you. > Just curious - did you see the real issue with re-ordering in this code > fragment? No real issue was seen. It is analysis from architecture perspective. > And, please, let us do performance check. Sure. Thank you. > > With best regards, > Slava > > > -----Original Message----- > > From: Ruifeng Wang <Ruifeng.Wang@arm.com> > > Sent: Thursday, February 10, 2022 8:25 > > To: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko > > <viacheslavo@nvidia.com> > > Cc: dev@dpdk.org; Honnappa Nagarahalli > <Honnappa.Nagarahalli@arm.com>; > > stable@dpdk.org; nd <nd@arm.com>; Ruifeng Wang > <Ruifeng.Wang@arm.com>; > > nd <nd@arm.com> > > Subject: RE: [PATCH] net/mlx5: fix risk in Rx descriptor read in NEON > > vector path > > > > Ping. > > Please could you help to review this patch? > > > > Thanks. > > Ruifeng > > > > > -----Original Message----- > > > From: Ruifeng Wang <ruifeng.wang@arm.com> > > > Sent: Tuesday, January 4, 2022 11:01 AM > > > To: matan@nvidia.com; viacheslavo@nvidia.com > > > Cc: dev@dpdk.org; Honnappa Nagarahalli > > <Honnappa.Nagarahalli@arm.com>; > > > stable@dpdk.org; nd <nd@arm.com>; Ruifeng Wang > > <Ruifeng.Wang@arm.com> > > > Subject: [PATCH] net/mlx5: fix risk in Rx descriptor read in NEON > > > vector path > > > > > > In NEON vector PMD, vector load loads two contiguous 8B of > > > descriptor data into vector register. Given vector load ensures no > > > 16B atomicity, read of the word that includes op_own field could be > > > reordered after read of other words. In this case, some words could > contain invalid data. > > > > > > Reloaded qword0 after read barrier to update vector register. This > > > ensures that the fetched data is correct. > > > > > > Testpmd single core test on N1SDP/ThunderX2 showed no performance > > > drop. > > > > > > Fixes: 1742c2d9fab0 ("net/mlx5: fix synchronization on polling Rx > > > completions") > > > Cc: stable@dpdk.org > > > > > > Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com> > > > --- > > > drivers/net/mlx5/mlx5_rxtx_vec_neon.h | 8 ++++++++ > > > 1 file changed, 8 insertions(+) > > > > > > diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h > > > b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h > > > index b1d16baa61..b1ec615b51 100644 > > > --- a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h > > > +++ b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h > > > @@ -647,6 +647,14 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq, > > > volatile struct mlx5_cqe *cq, > > > c0 = vld1q_u64((uint64_t *)(p0 + 48)); > > > /* Synchronize for loading the rest of blocks. */ > > > rte_io_rmb(); > > > + /* B.0 (CQE 3) reload lower half of the block. */ > > > + c3 = vld1q_lane_u64((uint64_t *)(p3 + 48), c3, 0); > > > + /* B.0 (CQE 2) reload lower half of the block. */ > > > + c2 = vld1q_lane_u64((uint64_t *)(p2 + 48), c2, 0); > > > + /* B.0 (CQE 1) reload lower half of the block. */ > > > + c1 = vld1q_lane_u64((uint64_t *)(p1 + 48), c1, 0); > > > + /* B.0 (CQE 0) reload lower half of the block. */ > > > + c0 = vld1q_lane_u64((uint64_t *)(p0 + 48), c0, 0); > > > /* Prefetch next 4 CQEs. */ > > > if (pkts_n - pos >= 2 * MLX5_VPMD_DESCS_PER_LOOP) { > > > unsigned int next = pos + > > > MLX5_VPMD_DESCS_PER_LOOP; > > > -- > > > 2.25.1
> -----Original Message----- > From: Ruifeng Wang <ruifeng.wang@arm.com> > Sent: Tuesday, January 4, 2022 5:01 AM > To: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko > <viacheslavo@nvidia.com> > Cc: dev@dpdk.org; honnappa.nagarahalli@arm.com; stable@dpdk.org; > nd@arm.com; Ruifeng Wang <ruifeng.wang@arm.com> > Subject: [PATCH] net/mlx5: fix risk in Rx descriptor read in NEON vector path > > In NEON vector PMD, vector load loads two contiguous 8B of > descriptor data into vector register. Given vector load ensures no > 16B atomicity, read of the word that includes op_own field could be > reordered after read of other words. In this case, some words could > contain invalid data. > > Reloaded qword0 after read barrier to update vector register. This > ensures that the fetched data is correct. > > Testpmd single core test on N1SDP/ThunderX2 showed no performance > drop. > > Fixes: 1742c2d9fab0 ("net/mlx5: fix synchronization on polling Rx > completions") > Cc: stable@dpdk.org > > Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com> > --- Tested with BlueField-2 and didn't see a performance impact. Tested-by: Ali Alnubani <alialnu@nvidia.com> Thanks, Ali
diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h index b1d16baa61..b1ec615b51 100644 --- a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h +++ b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h @@ -647,6 +647,14 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, c0 = vld1q_u64((uint64_t *)(p0 + 48)); /* Synchronize for loading the rest of blocks. */ rte_io_rmb(); + /* B.0 (CQE 3) reload lower half of the block. */ + c3 = vld1q_lane_u64((uint64_t *)(p3 + 48), c3, 0); + /* B.0 (CQE 2) reload lower half of the block. */ + c2 = vld1q_lane_u64((uint64_t *)(p2 + 48), c2, 0); + /* B.0 (CQE 1) reload lower half of the block. */ + c1 = vld1q_lane_u64((uint64_t *)(p1 + 48), c1, 0); + /* B.0 (CQE 0) reload lower half of the block. */ + c0 = vld1q_lane_u64((uint64_t *)(p0 + 48), c0, 0); /* Prefetch next 4 CQEs. */ if (pkts_n - pos >= 2 * MLX5_VPMD_DESCS_PER_LOOP) { unsigned int next = pos + MLX5_VPMD_DESCS_PER_LOOP;
In NEON vector PMD, vector load loads two contiguous 8B of descriptor data into vector register. Given vector load ensures no 16B atomicity, read of the word that includes op_own field could be reordered after read of other words. In this case, some words could contain invalid data. Reloaded qword0 after read barrier to update vector register. This ensures that the fetched data is correct. Testpmd single core test on N1SDP/ThunderX2 showed no performance drop. Fixes: 1742c2d9fab0 ("net/mlx5: fix synchronization on polling Rx completions") Cc: stable@dpdk.org Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com> --- drivers/net/mlx5/mlx5_rxtx_vec_neon.h | 8 ++++++++ 1 file changed, 8 insertions(+)