Message ID | 20210706065404.25137-3-joyce.kong@arm.com (mailing list archive) |
---|---|
State | Accepted, archived |
Delegated to: | Qi Zhang |
Headers | show |
Series | fixes for i40e hw scan ring | expand |
Context | Check | Description |
---|---|---|
ci/checkpatch | success | coding style OK |
> -----Original Message----- > From: Joyce Kong <joyce.kong@arm.com> > Sent: Tuesday, July 6, 2021 2:54 PM > To: Xing, Beilei <beilei.xing@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>; > ruifeng.wang@arm.com; honnappa.nagarahalli@arm.com; Richardson, Bruce > <bruce.richardson@intel.com>; Zhang, Helin <helin.zhang@intel.com> > Cc: dev@dpdk.org; stable@dpdk.org; nd@arm.com > Subject: [PATCH v3 2/2] net/i40e: replace SMP barrier with thread fence > > Simply replace the SMP barrier with atomic thread fence for i40e hw ring sacn, > if there is no synchronization point. > > Signed-off-by: Joyce Kong <joyce.kong@arm.com> > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> > --- > drivers/net/i40e/i40e_rxtx.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c index > 9aaabfd92..86e2f083e 100644 > --- a/drivers/net/i40e/i40e_rxtx.c > +++ b/drivers/net/i40e/i40e_rxtx.c > @@ -482,7 +482,8 @@ i40e_rx_scan_hw_ring(struct i40e_rx_queue *rxq) > I40E_RXD_QW1_STATUS_SHIFT; > } > > - rte_smp_rmb(); > + /* This barrier is to order loads of different words in the descriptor */ > + rte_atomic_thread_fence(__ATOMIC_ACQUIRE); Now for x86, you actually replace a compiler barrier with a memory fence, this may have potential performance impact which need additional resource to investigate So there are 2 options: 1. if you want this patch be merged into DPDK 21.08, please change this for ARM only. 2. you can wait for our update for x86 but I guess it will miss 21.08. What do you think? Btw for patch 1/2, I think I can merge it independently right? > > /* Compute how many status bits were set */ > for (j = 0, nb_dd = 0; j < I40E_LOOK_AHEAD; j++) { > -- > 2.17.1
On Thu, Jul 8, 2021 at 8:09 AM Zhang, Qi Z <qi.z.zhang@intel.com> wrote: > > > > > -----Original Message----- > > From: Joyce Kong <joyce.kong@arm.com> > > Sent: Tuesday, July 6, 2021 2:54 PM > > To: Xing, Beilei <beilei.xing@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>; > > ruifeng.wang@arm.com; honnappa.nagarahalli@arm.com; Richardson, Bruce > > <bruce.richardson@intel.com>; Zhang, Helin <helin.zhang@intel.com> > > Cc: dev@dpdk.org; stable@dpdk.org; nd@arm.com > > Subject: [PATCH v3 2/2] net/i40e: replace SMP barrier with thread fence > > > > Simply replace the SMP barrier with atomic thread fence for i40e hw ring sacn, > > if there is no synchronization point. > > > > Signed-off-by: Joyce Kong <joyce.kong@arm.com> > > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> > > --- > > drivers/net/i40e/i40e_rxtx.c | 3 ++- > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c index > > 9aaabfd92..86e2f083e 100644 > > --- a/drivers/net/i40e/i40e_rxtx.c > > +++ b/drivers/net/i40e/i40e_rxtx.c > > @@ -482,7 +482,8 @@ i40e_rx_scan_hw_ring(struct i40e_rx_queue *rxq) > > I40E_RXD_QW1_STATUS_SHIFT; > > } > > > > - rte_smp_rmb(); > > + /* This barrier is to order loads of different words in the descriptor */ > > + rte_atomic_thread_fence(__ATOMIC_ACQUIRE); > > Now for x86, you actually replace a compiler barrier with a memory fence, this may have potential performance impact which need additional resource to investigate No memory fence instruction is generated for __ATOMIC_ACQUIRE on x86 for any version of gcc or clang that I've tried, based on experiments here: https://godbolt.org/z/Yxr1vGhKP
> -----Original Message----- > From: Lance Richardson <lance.richardson@broadcom.com> > Sent: Thursday, July 8, 2021 9:51 PM > To: Zhang, Qi Z <qi.z.zhang@intel.com> > Cc: Joyce Kong <joyce.kong@arm.com>; Xing, Beilei <beilei.xing@intel.com>; > ruifeng.wang@arm.com; honnappa.nagarahalli@arm.com; Richardson, Bruce > <bruce.richardson@intel.com>; Zhang, Helin <helin.zhang@intel.com>; > dev@dpdk.org; stable@dpdk.org; nd@arm.com > Subject: Re: [dpdk-dev] [PATCH v3 2/2] net/i40e: replace SMP barrier with > thread fence > > On Thu, Jul 8, 2021 at 8:09 AM Zhang, Qi Z <qi.z.zhang@intel.com> wrote: > > > > > > > > > -----Original Message----- > > > From: Joyce Kong <joyce.kong@arm.com> > > > Sent: Tuesday, July 6, 2021 2:54 PM > > > To: Xing, Beilei <beilei.xing@intel.com>; Zhang, Qi Z > <qi.z.zhang@intel.com>; > > > ruifeng.wang@arm.com; honnappa.nagarahalli@arm.com; Richardson, > Bruce > > > <bruce.richardson@intel.com>; Zhang, Helin <helin.zhang@intel.com> > > > Cc: dev@dpdk.org; stable@dpdk.org; nd@arm.com > > > Subject: [PATCH v3 2/2] net/i40e: replace SMP barrier with thread fence > > > > > > Simply replace the SMP barrier with atomic thread fence for i40e hw ring > sacn, > > > if there is no synchronization point. > > > > > > Signed-off-by: Joyce Kong <joyce.kong@arm.com> > > > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> > > > --- > > > drivers/net/i40e/i40e_rxtx.c | 3 ++- > > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > > > diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c index > > > 9aaabfd92..86e2f083e 100644 > > > --- a/drivers/net/i40e/i40e_rxtx.c > > > +++ b/drivers/net/i40e/i40e_rxtx.c > > > @@ -482,7 +482,8 @@ i40e_rx_scan_hw_ring(struct i40e_rx_queue *rxq) > > > > I40E_RXD_QW1_STATUS_SHIFT; > > > } > > > > > > - rte_smp_rmb(); > > > + /* This barrier is to order loads of different words in the > descriptor */ > > > + rte_atomic_thread_fence(__ATOMIC_ACQUIRE); > > > > Now for x86, you actually replace a compiler barrier with a memory fence, > this may have potential performance impact which need additional resource to > investigate > > No memory fence instruction is generated for > __ATOMIC_ACQUIRE on x86 for any version of gcc > or clang that I've tried, based on experiments here: > > https://godbolt.org/z/Yxr1vGhKP Nice tool! I try to write some dummy code combined with or without __atomic_thread_fence(__ATOMIC_ACQUIRE) but I didn't see any difference of the generated assembly code, does that means __atomic_thread_fence(__ATOMIC_ACQUIRE) just does nothing on x86?
<snip> > > > > > > > > Simply replace the SMP barrier with atomic thread fence for i40e > > > > hw ring > > sacn, > > > > if there is no synchronization point. > > > > > > > > Signed-off-by: Joyce Kong <joyce.kong@arm.com> > > > > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> > > > > --- > > > > drivers/net/i40e/i40e_rxtx.c | 3 ++- > > > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/drivers/net/i40e/i40e_rxtx.c > > > > b/drivers/net/i40e/i40e_rxtx.c index 9aaabfd92..86e2f083e 100644 > > > > --- a/drivers/net/i40e/i40e_rxtx.c > > > > +++ b/drivers/net/i40e/i40e_rxtx.c > > > > @@ -482,7 +482,8 @@ i40e_rx_scan_hw_ring(struct i40e_rx_queue > > > > *rxq) > > > > > > I40E_RXD_QW1_STATUS_SHIFT; > > > > } > > > > > > > > - rte_smp_rmb(); > > > > + /* This barrier is to order loads of different words > > > > + in the > > descriptor */ > > > > + rte_atomic_thread_fence(__ATOMIC_ACQUIRE); > > > > > > Now for x86, you actually replace a compiler barrier with a memory > > > fence, > > this may have potential performance impact which need additional > > resource to investigate > > > > No memory fence instruction is generated for __ATOMIC_ACQUIRE on x86 > > for any version of gcc or clang that I've tried, based on experiments > > here: > > > > https://godbolt.org/z/Yxr1vGhKP > > Nice tool! > I try to write some dummy code combined with or without > __atomic_thread_fence(__ATOMIC_ACQUIRE) > but I didn't see any difference of the generated assembly code, does that means > __atomic_thread_fence(__ATOMIC_ACQUIRE) just does nothing on x86? Yes, it should not have any barriers generated for x86. At the same time it also acts as a compiler barrier.
> -----Original Message----- > From: Joyce Kong <joyce.kong@arm.com> > Sent: Tuesday, July 6, 2021 2:54 PM > To: Xing, Beilei <beilei.xing@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>; > ruifeng.wang@arm.com; honnappa.nagarahalli@arm.com; Richardson, Bruce > <bruce.richardson@intel.com>; Zhang, Helin <helin.zhang@intel.com> > Cc: dev@dpdk.org; stable@dpdk.org; nd@arm.com > Subject: [PATCH v3 2/2] net/i40e: replace SMP barrier with thread fence > > Simply replace the SMP barrier with atomic thread fence for i40e hw ring sacn, > if there is no synchronization point. > > Signed-off-by: Joyce Kong <joyce.kong@arm.com> > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Acked-by: Qi Zhang <qi.z.zhang@intel.com> Applied to dpdk-next-net-intel. Thanks Qi
diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c index 9aaabfd92..86e2f083e 100644 --- a/drivers/net/i40e/i40e_rxtx.c +++ b/drivers/net/i40e/i40e_rxtx.c @@ -482,7 +482,8 @@ i40e_rx_scan_hw_ring(struct i40e_rx_queue *rxq) I40E_RXD_QW1_STATUS_SHIFT; } - rte_smp_rmb(); + /* This barrier is to order loads of different words in the descriptor */ + rte_atomic_thread_fence(__ATOMIC_ACQUIRE); /* Compute how many status bits were set */ for (j = 0, nb_dd = 0; j < I40E_LOOK_AHEAD; j++) {