[v3,2/2] net/i40e: replace SMP barrier with thread fence
Checks
Commit Message
Simply replace the SMP barrier with atomic thread fence for
i40e hw ring sacn, if there is no synchronization point.
Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
drivers/net/i40e/i40e_rxtx.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
Comments
> -----Original Message-----
> From: Joyce Kong <joyce.kong@arm.com>
> Sent: Tuesday, July 6, 2021 2:54 PM
> To: Xing, Beilei <beilei.xing@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>;
> ruifeng.wang@arm.com; honnappa.nagarahalli@arm.com; Richardson, Bruce
> <bruce.richardson@intel.com>; Zhang, Helin <helin.zhang@intel.com>
> Cc: dev@dpdk.org; stable@dpdk.org; nd@arm.com
> Subject: [PATCH v3 2/2] net/i40e: replace SMP barrier with thread fence
>
> Simply replace the SMP barrier with atomic thread fence for i40e hw ring sacn,
> if there is no synchronization point.
>
> Signed-off-by: Joyce Kong <joyce.kong@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
> drivers/net/i40e/i40e_rxtx.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c index
> 9aaabfd92..86e2f083e 100644
> --- a/drivers/net/i40e/i40e_rxtx.c
> +++ b/drivers/net/i40e/i40e_rxtx.c
> @@ -482,7 +482,8 @@ i40e_rx_scan_hw_ring(struct i40e_rx_queue *rxq)
> I40E_RXD_QW1_STATUS_SHIFT;
> }
>
> - rte_smp_rmb();
> + /* This barrier is to order loads of different words in the descriptor */
> + rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
Now for x86, you actually replace a compiler barrier with a memory fence, this may have potential performance impact which need additional resource to investigate
So there are 2 options:
1. if you want this patch be merged into DPDK 21.08, please change this for ARM only.
2. you can wait for our update for x86 but I guess it will miss 21.08.
What do you think?
Btw for patch 1/2, I think I can merge it independently right?
>
> /* Compute how many status bits were set */
> for (j = 0, nb_dd = 0; j < I40E_LOOK_AHEAD; j++) {
> --
> 2.17.1
On Thu, Jul 8, 2021 at 8:09 AM Zhang, Qi Z <qi.z.zhang@intel.com> wrote:
>
>
>
> > -----Original Message-----
> > From: Joyce Kong <joyce.kong@arm.com>
> > Sent: Tuesday, July 6, 2021 2:54 PM
> > To: Xing, Beilei <beilei.xing@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>;
> > ruifeng.wang@arm.com; honnappa.nagarahalli@arm.com; Richardson, Bruce
> > <bruce.richardson@intel.com>; Zhang, Helin <helin.zhang@intel.com>
> > Cc: dev@dpdk.org; stable@dpdk.org; nd@arm.com
> > Subject: [PATCH v3 2/2] net/i40e: replace SMP barrier with thread fence
> >
> > Simply replace the SMP barrier with atomic thread fence for i40e hw ring sacn,
> > if there is no synchronization point.
> >
> > Signed-off-by: Joyce Kong <joyce.kong@arm.com>
> > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > ---
> > drivers/net/i40e/i40e_rxtx.c | 3 ++-
> > 1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c index
> > 9aaabfd92..86e2f083e 100644
> > --- a/drivers/net/i40e/i40e_rxtx.c
> > +++ b/drivers/net/i40e/i40e_rxtx.c
> > @@ -482,7 +482,8 @@ i40e_rx_scan_hw_ring(struct i40e_rx_queue *rxq)
> > I40E_RXD_QW1_STATUS_SHIFT;
> > }
> >
> > - rte_smp_rmb();
> > + /* This barrier is to order loads of different words in the descriptor */
> > + rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
>
> Now for x86, you actually replace a compiler barrier with a memory fence, this may have potential performance impact which need additional resource to investigate
No memory fence instruction is generated for
__ATOMIC_ACQUIRE on x86 for any version of gcc
or clang that I've tried, based on experiments here:
https://godbolt.org/z/Yxr1vGhKP
> -----Original Message-----
> From: Lance Richardson <lance.richardson@broadcom.com>
> Sent: Thursday, July 8, 2021 9:51 PM
> To: Zhang, Qi Z <qi.z.zhang@intel.com>
> Cc: Joyce Kong <joyce.kong@arm.com>; Xing, Beilei <beilei.xing@intel.com>;
> ruifeng.wang@arm.com; honnappa.nagarahalli@arm.com; Richardson, Bruce
> <bruce.richardson@intel.com>; Zhang, Helin <helin.zhang@intel.com>;
> dev@dpdk.org; stable@dpdk.org; nd@arm.com
> Subject: Re: [dpdk-dev] [PATCH v3 2/2] net/i40e: replace SMP barrier with
> thread fence
>
> On Thu, Jul 8, 2021 at 8:09 AM Zhang, Qi Z <qi.z.zhang@intel.com> wrote:
> >
> >
> >
> > > -----Original Message-----
> > > From: Joyce Kong <joyce.kong@arm.com>
> > > Sent: Tuesday, July 6, 2021 2:54 PM
> > > To: Xing, Beilei <beilei.xing@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>;
> > > ruifeng.wang@arm.com; honnappa.nagarahalli@arm.com; Richardson,
> Bruce
> > > <bruce.richardson@intel.com>; Zhang, Helin <helin.zhang@intel.com>
> > > Cc: dev@dpdk.org; stable@dpdk.org; nd@arm.com
> > > Subject: [PATCH v3 2/2] net/i40e: replace SMP barrier with thread fence
> > >
> > > Simply replace the SMP barrier with atomic thread fence for i40e hw ring
> sacn,
> > > if there is no synchronization point.
> > >
> > > Signed-off-by: Joyce Kong <joyce.kong@arm.com>
> > > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > ---
> > > drivers/net/i40e/i40e_rxtx.c | 3 ++-
> > > 1 file changed, 2 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c index
> > > 9aaabfd92..86e2f083e 100644
> > > --- a/drivers/net/i40e/i40e_rxtx.c
> > > +++ b/drivers/net/i40e/i40e_rxtx.c
> > > @@ -482,7 +482,8 @@ i40e_rx_scan_hw_ring(struct i40e_rx_queue *rxq)
> > >
> I40E_RXD_QW1_STATUS_SHIFT;
> > > }
> > >
> > > - rte_smp_rmb();
> > > + /* This barrier is to order loads of different words in the
> descriptor */
> > > + rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
> >
> > Now for x86, you actually replace a compiler barrier with a memory fence,
> this may have potential performance impact which need additional resource to
> investigate
>
> No memory fence instruction is generated for
> __ATOMIC_ACQUIRE on x86 for any version of gcc
> or clang that I've tried, based on experiments here:
>
> https://godbolt.org/z/Yxr1vGhKP
Nice tool!
I try to write some dummy code combined with or without __atomic_thread_fence(__ATOMIC_ACQUIRE)
but I didn't see any difference of the generated assembly code, does that means __atomic_thread_fence(__ATOMIC_ACQUIRE) just does nothing on x86?
<snip>
> > > >
> > > > Simply replace the SMP barrier with atomic thread fence for i40e
> > > > hw ring
> > sacn,
> > > > if there is no synchronization point.
> > > >
> > > > Signed-off-by: Joyce Kong <joyce.kong@arm.com>
> > > > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > ---
> > > > drivers/net/i40e/i40e_rxtx.c | 3 ++-
> > > > 1 file changed, 2 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/net/i40e/i40e_rxtx.c
> > > > b/drivers/net/i40e/i40e_rxtx.c index 9aaabfd92..86e2f083e 100644
> > > > --- a/drivers/net/i40e/i40e_rxtx.c
> > > > +++ b/drivers/net/i40e/i40e_rxtx.c
> > > > @@ -482,7 +482,8 @@ i40e_rx_scan_hw_ring(struct i40e_rx_queue
> > > > *rxq)
> > > >
> > I40E_RXD_QW1_STATUS_SHIFT;
> > > > }
> > > >
> > > > - rte_smp_rmb();
> > > > + /* This barrier is to order loads of different words
> > > > + in the
> > descriptor */
> > > > + rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
> > >
> > > Now for x86, you actually replace a compiler barrier with a memory
> > > fence,
> > this may have potential performance impact which need additional
> > resource to investigate
> >
> > No memory fence instruction is generated for __ATOMIC_ACQUIRE on x86
> > for any version of gcc or clang that I've tried, based on experiments
> > here:
> >
> > https://godbolt.org/z/Yxr1vGhKP
>
> Nice tool!
> I try to write some dummy code combined with or without
> __atomic_thread_fence(__ATOMIC_ACQUIRE)
> but I didn't see any difference of the generated assembly code, does that means
> __atomic_thread_fence(__ATOMIC_ACQUIRE) just does nothing on x86?
Yes, it should not have any barriers generated for x86. At the same time it also acts as a compiler barrier.
> -----Original Message-----
> From: Joyce Kong <joyce.kong@arm.com>
> Sent: Tuesday, July 6, 2021 2:54 PM
> To: Xing, Beilei <beilei.xing@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>;
> ruifeng.wang@arm.com; honnappa.nagarahalli@arm.com; Richardson, Bruce
> <bruce.richardson@intel.com>; Zhang, Helin <helin.zhang@intel.com>
> Cc: dev@dpdk.org; stable@dpdk.org; nd@arm.com
> Subject: [PATCH v3 2/2] net/i40e: replace SMP barrier with thread fence
>
> Simply replace the SMP barrier with atomic thread fence for i40e hw ring sacn,
> if there is no synchronization point.
>
> Signed-off-by: Joyce Kong <joyce.kong@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
Applied to dpdk-next-net-intel.
Thanks
Qi
@@ -482,7 +482,8 @@ i40e_rx_scan_hw_ring(struct i40e_rx_queue *rxq)
I40E_RXD_QW1_STATUS_SHIFT;
}
- rte_smp_rmb();
+ /* This barrier is to order loads of different words in the descriptor */
+ rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
/* Compute how many status bits were set */
for (j = 0, nb_dd = 0; j < I40E_LOOK_AHEAD; j++) {