[v3,2/2] net/i40e: replace SMP barrier with thread fence

Message ID 20210706065404.25137-3-joyce.kong@arm.com (mailing list archive)
State Accepted, archived
Delegated to: Qi Zhang
Headers
Series fixes for i40e hw scan ring |

Checks

Context Check Description
ci/checkpatch success coding style OK

Commit Message

Joyce Kong July 6, 2021, 6:54 a.m. UTC
  Simply replace the SMP barrier with atomic thread fence for
i40e hw ring sacn, if there is no synchronization point.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 drivers/net/i40e/i40e_rxtx.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
  

Comments

Qi Zhang July 8, 2021, 12:09 p.m. UTC | #1
> -----Original Message-----
> From: Joyce Kong <joyce.kong@arm.com>
> Sent: Tuesday, July 6, 2021 2:54 PM
> To: Xing, Beilei <beilei.xing@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>;
> ruifeng.wang@arm.com; honnappa.nagarahalli@arm.com; Richardson, Bruce
> <bruce.richardson@intel.com>; Zhang, Helin <helin.zhang@intel.com>
> Cc: dev@dpdk.org; stable@dpdk.org; nd@arm.com
> Subject: [PATCH v3 2/2] net/i40e: replace SMP barrier with thread fence
> 
> Simply replace the SMP barrier with atomic thread fence for i40e hw ring sacn,
> if there is no synchronization point.
> 
> Signed-off-by: Joyce Kong <joyce.kong@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
>  drivers/net/i40e/i40e_rxtx.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c index
> 9aaabfd92..86e2f083e 100644
> --- a/drivers/net/i40e/i40e_rxtx.c
> +++ b/drivers/net/i40e/i40e_rxtx.c
> @@ -482,7 +482,8 @@ i40e_rx_scan_hw_ring(struct i40e_rx_queue *rxq)
>  					I40E_RXD_QW1_STATUS_SHIFT;
>  		}
> 
> -		rte_smp_rmb();
> +		/* This barrier is to order loads of different words in the descriptor */
> +		rte_atomic_thread_fence(__ATOMIC_ACQUIRE);

Now for x86, you actually replace a compiler barrier with a memory fence, this may have potential performance impact which need additional resource to investigate 
So there are 2 options:
1. if you want this patch be merged into DPDK 21.08, please change this for ARM only.
2. you can wait for our update for x86 but I guess it will miss 21.08.

What do you think?

Btw for patch 1/2, I think I can merge it independently right?


> 
>  		/* Compute how many status bits were set */
>  		for (j = 0, nb_dd = 0; j < I40E_LOOK_AHEAD; j++) {
> --
> 2.17.1
  
Lance Richardson July 8, 2021, 1:51 p.m. UTC | #2
On Thu, Jul 8, 2021 at 8:09 AM Zhang, Qi Z <qi.z.zhang@intel.com> wrote:
>
>
>
> > -----Original Message-----
> > From: Joyce Kong <joyce.kong@arm.com>
> > Sent: Tuesday, July 6, 2021 2:54 PM
> > To: Xing, Beilei <beilei.xing@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>;
> > ruifeng.wang@arm.com; honnappa.nagarahalli@arm.com; Richardson, Bruce
> > <bruce.richardson@intel.com>; Zhang, Helin <helin.zhang@intel.com>
> > Cc: dev@dpdk.org; stable@dpdk.org; nd@arm.com
> > Subject: [PATCH v3 2/2] net/i40e: replace SMP barrier with thread fence
> >
> > Simply replace the SMP barrier with atomic thread fence for i40e hw ring sacn,
> > if there is no synchronization point.
> >
> > Signed-off-by: Joyce Kong <joyce.kong@arm.com>
> > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > ---
> >  drivers/net/i40e/i40e_rxtx.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c index
> > 9aaabfd92..86e2f083e 100644
> > --- a/drivers/net/i40e/i40e_rxtx.c
> > +++ b/drivers/net/i40e/i40e_rxtx.c
> > @@ -482,7 +482,8 @@ i40e_rx_scan_hw_ring(struct i40e_rx_queue *rxq)
> >                                       I40E_RXD_QW1_STATUS_SHIFT;
> >               }
> >
> > -             rte_smp_rmb();
> > +             /* This barrier is to order loads of different words in the descriptor */
> > +             rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
>
> Now for x86, you actually replace a compiler barrier with a memory fence, this may have potential performance impact which need additional resource to investigate

No memory fence instruction is generated for
__ATOMIC_ACQUIRE on x86 for any version of gcc
or clang that I've tried, based on experiments here:

    https://godbolt.org/z/Yxr1vGhKP
  
Qi Zhang July 8, 2021, 2:26 p.m. UTC | #3
> -----Original Message-----
> From: Lance Richardson <lance.richardson@broadcom.com>
> Sent: Thursday, July 8, 2021 9:51 PM
> To: Zhang, Qi Z <qi.z.zhang@intel.com>
> Cc: Joyce Kong <joyce.kong@arm.com>; Xing, Beilei <beilei.xing@intel.com>;
> ruifeng.wang@arm.com; honnappa.nagarahalli@arm.com; Richardson, Bruce
> <bruce.richardson@intel.com>; Zhang, Helin <helin.zhang@intel.com>;
> dev@dpdk.org; stable@dpdk.org; nd@arm.com
> Subject: Re: [dpdk-dev] [PATCH v3 2/2] net/i40e: replace SMP barrier with
> thread fence
> 
> On Thu, Jul 8, 2021 at 8:09 AM Zhang, Qi Z <qi.z.zhang@intel.com> wrote:
> >
> >
> >
> > > -----Original Message-----
> > > From: Joyce Kong <joyce.kong@arm.com>
> > > Sent: Tuesday, July 6, 2021 2:54 PM
> > > To: Xing, Beilei <beilei.xing@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>;
> > > ruifeng.wang@arm.com; honnappa.nagarahalli@arm.com; Richardson,
> Bruce
> > > <bruce.richardson@intel.com>; Zhang, Helin <helin.zhang@intel.com>
> > > Cc: dev@dpdk.org; stable@dpdk.org; nd@arm.com
> > > Subject: [PATCH v3 2/2] net/i40e: replace SMP barrier with thread fence
> > >
> > > Simply replace the SMP barrier with atomic thread fence for i40e hw ring
> sacn,
> > > if there is no synchronization point.
> > >
> > > Signed-off-by: Joyce Kong <joyce.kong@arm.com>
> > > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > ---
> > >  drivers/net/i40e/i40e_rxtx.c | 3 ++-
> > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c index
> > > 9aaabfd92..86e2f083e 100644
> > > --- a/drivers/net/i40e/i40e_rxtx.c
> > > +++ b/drivers/net/i40e/i40e_rxtx.c
> > > @@ -482,7 +482,8 @@ i40e_rx_scan_hw_ring(struct i40e_rx_queue *rxq)
> > >
> I40E_RXD_QW1_STATUS_SHIFT;
> > >               }
> > >
> > > -             rte_smp_rmb();
> > > +             /* This barrier is to order loads of different words in the
> descriptor */
> > > +             rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
> >
> > Now for x86, you actually replace a compiler barrier with a memory fence,
> this may have potential performance impact which need additional resource to
> investigate
> 
> No memory fence instruction is generated for
> __ATOMIC_ACQUIRE on x86 for any version of gcc
> or clang that I've tried, based on experiments here:
> 
>     https://godbolt.org/z/Yxr1vGhKP

Nice tool!
I try to write some dummy code combined with or without __atomic_thread_fence(__ATOMIC_ACQUIRE)
but I didn't see any difference of the generated assembly code, does that means __atomic_thread_fence(__ATOMIC_ACQUIRE) just does nothing on x86?
  
Honnappa Nagarahalli July 8, 2021, 2:44 p.m. UTC | #4
<snip>

> > > >
> > > > Simply replace the SMP barrier with atomic thread fence for i40e
> > > > hw ring
> > sacn,
> > > > if there is no synchronization point.
> > > >
> > > > Signed-off-by: Joyce Kong <joyce.kong@arm.com>
> > > > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > ---
> > > >  drivers/net/i40e/i40e_rxtx.c | 3 ++-
> > > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/net/i40e/i40e_rxtx.c
> > > > b/drivers/net/i40e/i40e_rxtx.c index 9aaabfd92..86e2f083e 100644
> > > > --- a/drivers/net/i40e/i40e_rxtx.c
> > > > +++ b/drivers/net/i40e/i40e_rxtx.c
> > > > @@ -482,7 +482,8 @@ i40e_rx_scan_hw_ring(struct i40e_rx_queue
> > > > *rxq)
> > > >
> > I40E_RXD_QW1_STATUS_SHIFT;
> > > >               }
> > > >
> > > > -             rte_smp_rmb();
> > > > +             /* This barrier is to order loads of different words
> > > > + in the
> > descriptor */
> > > > +             rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
> > >
> > > Now for x86, you actually replace a compiler barrier with a memory
> > > fence,
> > this may have potential performance impact which need additional
> > resource to investigate
> >
> > No memory fence instruction is generated for __ATOMIC_ACQUIRE on x86
> > for any version of gcc or clang that I've tried, based on experiments
> > here:
> >
> >     https://godbolt.org/z/Yxr1vGhKP
> 
> Nice tool!
> I try to write some dummy code combined with or without
> __atomic_thread_fence(__ATOMIC_ACQUIRE)
> but I didn't see any difference of the generated assembly code, does that means
> __atomic_thread_fence(__ATOMIC_ACQUIRE) just does nothing on x86?
Yes, it should not have any barriers generated for x86. At the same time it also acts as a compiler barrier.
  
Qi Zhang July 13, 2021, 12:46 a.m. UTC | #5
> -----Original Message-----
> From: Joyce Kong <joyce.kong@arm.com>
> Sent: Tuesday, July 6, 2021 2:54 PM
> To: Xing, Beilei <beilei.xing@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>;
> ruifeng.wang@arm.com; honnappa.nagarahalli@arm.com; Richardson, Bruce
> <bruce.richardson@intel.com>; Zhang, Helin <helin.zhang@intel.com>
> Cc: dev@dpdk.org; stable@dpdk.org; nd@arm.com
> Subject: [PATCH v3 2/2] net/i40e: replace SMP barrier with thread fence
> 
> Simply replace the SMP barrier with atomic thread fence for i40e hw ring sacn,
> if there is no synchronization point.
> 
> Signed-off-by: Joyce Kong <joyce.kong@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>

Acked-by: Qi Zhang <qi.z.zhang@intel.com>

Applied to dpdk-next-net-intel.

Thanks
Qi
  

Patch

diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index 9aaabfd92..86e2f083e 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -482,7 +482,8 @@  i40e_rx_scan_hw_ring(struct i40e_rx_queue *rxq)
 					I40E_RXD_QW1_STATUS_SHIFT;
 		}
 
-		rte_smp_rmb();
+		/* This barrier is to order loads of different words in the descriptor */
+		rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
 
 		/* Compute how many status bits were set */
 		for (j = 0, nb_dd = 0; j < I40E_LOOK_AHEAD; j++) {