diff mbox series

net/mlx5: fix risk in Rx descriptor read in NEON vector path

Message ID 20220104030056.268974-1-ruifeng.wang@arm.com (mailing list archive)
State New
Delegated to: Raslan Darawsheh
Headers show
Series net/mlx5: fix risk in Rx descriptor read in NEON vector path | expand

Checks

Context Check Description
ci/intel-Testing success Testing PASS
ci/Intel-compilation success Compilation OK
ci/iol-aarch64-compile-testing success Testing PASS
ci/iol-x86_64-unit-testing success Testing PASS
ci/iol-x86_64-compile-testing success Testing PASS
ci/iol-aarch64-unit-testing success Testing PASS
ci/iol-abi-testing success Testing PASS
ci/iol-intel-Functional success Functional Testing PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/iol-broadcom-Performance success Performance Testing PASS
ci/github-robot: build success github build: passed
ci/iol-mellanox-Performance success Performance Testing PASS
ci/iol-broadcom-Functional success Functional Testing PASS
ci/checkpatch success coding style OK

Commit Message

Ruifeng Wang Jan. 4, 2022, 3 a.m. UTC
In NEON vector PMD, vector load loads two contiguous 8B of
descriptor data into vector register. Given vector load ensures no
16B atomicity, read of the word that includes op_own field could be
reordered after read of other words. In this case, some words could
contain invalid data.

Reloaded qword0 after read barrier to update vector register. This
ensures that the fetched data is correct.

Testpmd single core test on N1SDP/ThunderX2 showed no performance drop.

Fixes: 1742c2d9fab0 ("net/mlx5: fix synchronization on polling Rx completions")
Cc: stable@dpdk.org

Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 drivers/net/mlx5/mlx5_rxtx_vec_neon.h | 8 ++++++++
 1 file changed, 8 insertions(+)

Comments

Ruifeng Wang Feb. 10, 2022, 6:24 a.m. UTC | #1
Ping.
Please could you help to review this patch?

Thanks.
Ruifeng

> -----Original Message-----
> From: Ruifeng Wang <ruifeng.wang@arm.com>
> Sent: Tuesday, January 4, 2022 11:01 AM
> To: matan@nvidia.com; viacheslavo@nvidia.com
> Cc: dev@dpdk.org; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; stable@dpdk.org; nd <nd@arm.com>;
> Ruifeng Wang <Ruifeng.Wang@arm.com>
> Subject: [PATCH] net/mlx5: fix risk in Rx descriptor read in NEON vector path
> 
> In NEON vector PMD, vector load loads two contiguous 8B of descriptor data
> into vector register. Given vector load ensures no 16B atomicity, read of the
> word that includes op_own field could be reordered after read of other
> words. In this case, some words could contain invalid data.
> 
> Reloaded qword0 after read barrier to update vector register. This ensures
> that the fetched data is correct.
> 
> Testpmd single core test on N1SDP/ThunderX2 showed no performance
> drop.
> 
> Fixes: 1742c2d9fab0 ("net/mlx5: fix synchronization on polling Rx
> completions")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
>  drivers/net/mlx5/mlx5_rxtx_vec_neon.h | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
> b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
> index b1d16baa61..b1ec615b51 100644
> --- a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
> +++ b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
> @@ -647,6 +647,14 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq,
> volatile struct mlx5_cqe *cq,
>  		c0 = vld1q_u64((uint64_t *)(p0 + 48));
>  		/* Synchronize for loading the rest of blocks. */
>  		rte_io_rmb();
> +		/* B.0 (CQE 3) reload lower half of the block. */
> +		c3 = vld1q_lane_u64((uint64_t *)(p3 + 48), c3, 0);
> +		/* B.0 (CQE 2) reload lower half of the block. */
> +		c2 = vld1q_lane_u64((uint64_t *)(p2 + 48), c2, 0);
> +		/* B.0 (CQE 1) reload lower half of the block. */
> +		c1 = vld1q_lane_u64((uint64_t *)(p1 + 48), c1, 0);
> +		/* B.0 (CQE 0) reload lower half of the block. */
> +		c0 = vld1q_lane_u64((uint64_t *)(p0 + 48), c0, 0);
>  		/* Prefetch next 4 CQEs. */
>  		if (pkts_n - pos >= 2 * MLX5_VPMD_DESCS_PER_LOOP) {
>  			unsigned int next = pos +
> MLX5_VPMD_DESCS_PER_LOOP;
> --
> 2.25.1
Slava Ovsiienko Feb. 10, 2022, 8:16 a.m. UTC | #2
Hi Ruifeng,

Patch looks reasonable, thank you.
Just curious - did you see the real issue with re-ordering in this code fragment?
And, please, let us do performance check.

With best regards,
Slava

> -----Original Message-----
> From: Ruifeng Wang <Ruifeng.Wang@arm.com>
> Sent: Thursday, February 10, 2022 8:25
> To: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
> <viacheslavo@nvidia.com>
> Cc: dev@dpdk.org; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; stable@dpdk.org; nd <nd@arm.com>;
> Ruifeng Wang <Ruifeng.Wang@arm.com>; nd <nd@arm.com>
> Subject: RE: [PATCH] net/mlx5: fix risk in Rx descriptor read in NEON vector
> path
> 
> Ping.
> Please could you help to review this patch?
> 
> Thanks.
> Ruifeng
> 
> > -----Original Message-----
> > From: Ruifeng Wang <ruifeng.wang@arm.com>
> > Sent: Tuesday, January 4, 2022 11:01 AM
> > To: matan@nvidia.com; viacheslavo@nvidia.com
> > Cc: dev@dpdk.org; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>;
> > stable@dpdk.org; nd <nd@arm.com>; Ruifeng Wang
> <Ruifeng.Wang@arm.com>
> > Subject: [PATCH] net/mlx5: fix risk in Rx descriptor read in NEON
> > vector path
> >
> > In NEON vector PMD, vector load loads two contiguous 8B of descriptor
> > data into vector register. Given vector load ensures no 16B atomicity,
> > read of the word that includes op_own field could be reordered after
> > read of other words. In this case, some words could contain invalid data.
> >
> > Reloaded qword0 after read barrier to update vector register. This
> > ensures that the fetched data is correct.
> >
> > Testpmd single core test on N1SDP/ThunderX2 showed no performance
> > drop.
> >
> > Fixes: 1742c2d9fab0 ("net/mlx5: fix synchronization on polling Rx
> > completions")
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > ---
> >  drivers/net/mlx5/mlx5_rxtx_vec_neon.h | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
> > b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
> > index b1d16baa61..b1ec615b51 100644
> > --- a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
> > +++ b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
> > @@ -647,6 +647,14 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq,
> > volatile struct mlx5_cqe *cq,
> >  		c0 = vld1q_u64((uint64_t *)(p0 + 48));
> >  		/* Synchronize for loading the rest of blocks. */
> >  		rte_io_rmb();
> > +		/* B.0 (CQE 3) reload lower half of the block. */
> > +		c3 = vld1q_lane_u64((uint64_t *)(p3 + 48), c3, 0);
> > +		/* B.0 (CQE 2) reload lower half of the block. */
> > +		c2 = vld1q_lane_u64((uint64_t *)(p2 + 48), c2, 0);
> > +		/* B.0 (CQE 1) reload lower half of the block. */
> > +		c1 = vld1q_lane_u64((uint64_t *)(p1 + 48), c1, 0);
> > +		/* B.0 (CQE 0) reload lower half of the block. */
> > +		c0 = vld1q_lane_u64((uint64_t *)(p0 + 48), c0, 0);
> >  		/* Prefetch next 4 CQEs. */
> >  		if (pkts_n - pos >= 2 * MLX5_VPMD_DESCS_PER_LOOP) {
> >  			unsigned int next = pos +
> > MLX5_VPMD_DESCS_PER_LOOP;
> > --
> > 2.25.1
Ruifeng Wang Feb. 10, 2022, 8:29 a.m. UTC | #3
> -----Original Message-----
> From: Slava Ovsiienko <viacheslavo@nvidia.com>
> Sent: Thursday, February 10, 2022 4:17 PM
> To: Ruifeng Wang <Ruifeng.Wang@arm.com>; Matan Azrad
> <matan@nvidia.com>
> Cc: dev@dpdk.org; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; stable@dpdk.org; nd <nd@arm.com>;
> nd <nd@arm.com>
> Subject: RE: [PATCH] net/mlx5: fix risk in Rx descriptor read in NEON vector
> path
> 
> Hi Ruifeng,

Hi Slava,
> 
> Patch looks reasonable, thank you.
> Just curious - did you see the real issue with re-ordering in this code
> fragment?

No real issue was seen. It is analysis from architecture perspective.
> And, please, let us do performance check.

Sure. Thank you.
> 
> With best regards,
> Slava
> 
> > -----Original Message-----
> > From: Ruifeng Wang <Ruifeng.Wang@arm.com>
> > Sent: Thursday, February 10, 2022 8:25
> > To: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
> > <viacheslavo@nvidia.com>
> > Cc: dev@dpdk.org; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>;
> > stable@dpdk.org; nd <nd@arm.com>; Ruifeng Wang
> <Ruifeng.Wang@arm.com>;
> > nd <nd@arm.com>
> > Subject: RE: [PATCH] net/mlx5: fix risk in Rx descriptor read in NEON
> > vector path
> >
> > Ping.
> > Please could you help to review this patch?
> >
> > Thanks.
> > Ruifeng
> >
> > > -----Original Message-----
> > > From: Ruifeng Wang <ruifeng.wang@arm.com>
> > > Sent: Tuesday, January 4, 2022 11:01 AM
> > > To: matan@nvidia.com; viacheslavo@nvidia.com
> > > Cc: dev@dpdk.org; Honnappa Nagarahalli
> > <Honnappa.Nagarahalli@arm.com>;
> > > stable@dpdk.org; nd <nd@arm.com>; Ruifeng Wang
> > <Ruifeng.Wang@arm.com>
> > > Subject: [PATCH] net/mlx5: fix risk in Rx descriptor read in NEON
> > > vector path
> > >
> > > In NEON vector PMD, vector load loads two contiguous 8B of
> > > descriptor data into vector register. Given vector load ensures no
> > > 16B atomicity, read of the word that includes op_own field could be
> > > reordered after read of other words. In this case, some words could
> contain invalid data.
> > >
> > > Reloaded qword0 after read barrier to update vector register. This
> > > ensures that the fetched data is correct.
> > >
> > > Testpmd single core test on N1SDP/ThunderX2 showed no performance
> > > drop.
> > >
> > > Fixes: 1742c2d9fab0 ("net/mlx5: fix synchronization on polling Rx
> > > completions")
> > > Cc: stable@dpdk.org
> > >
> > > Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > ---
> > >  drivers/net/mlx5/mlx5_rxtx_vec_neon.h | 8 ++++++++
> > >  1 file changed, 8 insertions(+)
> > >
> > > diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
> > > b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
> > > index b1d16baa61..b1ec615b51 100644
> > > --- a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
> > > +++ b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
> > > @@ -647,6 +647,14 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq,
> > > volatile struct mlx5_cqe *cq,
> > >  		c0 = vld1q_u64((uint64_t *)(p0 + 48));
> > >  		/* Synchronize for loading the rest of blocks. */
> > >  		rte_io_rmb();
> > > +		/* B.0 (CQE 3) reload lower half of the block. */
> > > +		c3 = vld1q_lane_u64((uint64_t *)(p3 + 48), c3, 0);
> > > +		/* B.0 (CQE 2) reload lower half of the block. */
> > > +		c2 = vld1q_lane_u64((uint64_t *)(p2 + 48), c2, 0);
> > > +		/* B.0 (CQE 1) reload lower half of the block. */
> > > +		c1 = vld1q_lane_u64((uint64_t *)(p1 + 48), c1, 0);
> > > +		/* B.0 (CQE 0) reload lower half of the block. */
> > > +		c0 = vld1q_lane_u64((uint64_t *)(p0 + 48), c0, 0);
> > >  		/* Prefetch next 4 CQEs. */
> > >  		if (pkts_n - pos >= 2 * MLX5_VPMD_DESCS_PER_LOOP) {
> > >  			unsigned int next = pos +
> > > MLX5_VPMD_DESCS_PER_LOOP;
> > > --
> > > 2.25.1
Ali Alnubani May 19, 2022, 2:56 p.m. UTC | #4
> -----Original Message-----
> From: Ruifeng Wang <ruifeng.wang@arm.com>
> Sent: Tuesday, January 4, 2022 5:01 AM
> To: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
> <viacheslavo@nvidia.com>
> Cc: dev@dpdk.org; honnappa.nagarahalli@arm.com; stable@dpdk.org;
> nd@arm.com; Ruifeng Wang <ruifeng.wang@arm.com>
> Subject: [PATCH] net/mlx5: fix risk in Rx descriptor read in NEON vector path
> 
> In NEON vector PMD, vector load loads two contiguous 8B of
> descriptor data into vector register. Given vector load ensures no
> 16B atomicity, read of the word that includes op_own field could be
> reordered after read of other words. In this case, some words could
> contain invalid data.
> 
> Reloaded qword0 after read barrier to update vector register. This
> ensures that the fetched data is correct.
> 
> Testpmd single core test on N1SDP/ThunderX2 showed no performance
> drop.
> 
> Fixes: 1742c2d9fab0 ("net/mlx5: fix synchronization on polling Rx
> completions")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---

Tested with BlueField-2 and didn't see a performance impact.

Tested-by: Ali Alnubani <alialnu@nvidia.com>

Thanks,
Ali
diff mbox series

Patch

diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
index b1d16baa61..b1ec615b51 100644
--- a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
+++ b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
@@ -647,6 +647,14 @@  rxq_cq_process_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq,
 		c0 = vld1q_u64((uint64_t *)(p0 + 48));
 		/* Synchronize for loading the rest of blocks. */
 		rte_io_rmb();
+		/* B.0 (CQE 3) reload lower half of the block. */
+		c3 = vld1q_lane_u64((uint64_t *)(p3 + 48), c3, 0);
+		/* B.0 (CQE 2) reload lower half of the block. */
+		c2 = vld1q_lane_u64((uint64_t *)(p2 + 48), c2, 0);
+		/* B.0 (CQE 1) reload lower half of the block. */
+		c1 = vld1q_lane_u64((uint64_t *)(p1 + 48), c1, 0);
+		/* B.0 (CQE 0) reload lower half of the block. */
+		c0 = vld1q_lane_u64((uint64_t *)(p0 + 48), c0, 0);
 		/* Prefetch next 4 CQEs. */
 		if (pkts_n - pos >= 2 * MLX5_VPMD_DESCS_PER_LOOP) {
 			unsigned int next = pos + MLX5_VPMD_DESCS_PER_LOOP;