[dpdk-dev,v2,09/29] eal/arm64: define I/O device memory barriers for arm64

Message ID 1482832175-27199-10-git-send-email-jerin.jacob@caviumnetworks.com (mailing list archive)
State Superseded, archived
Headers

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel compilation success Compilation OK

Commit Message

Jerin Jacob Dec. 27, 2016, 9:49 a.m. UTC
  CC: Jianbo Liu <jianbo.liu@linaro.org>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
---
 lib/librte_eal/common/include/arch/arm/rte_atomic_64.h | 6 ++++++
 1 file changed, 6 insertions(+)
  

Comments

Jianbo Liu Jan. 3, 2017, 7:48 a.m. UTC | #1
On 27 December 2016 at 17:49, Jerin Jacob
<jerin.jacob@caviumnetworks.com> wrote:
> CC: Jianbo Liu <jianbo.liu@linaro.org>
> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> ---
>  lib/librte_eal/common/include/arch/arm/rte_atomic_64.h | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h b/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h
> index 78ebea2..ef0efc7 100644
> --- a/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h
> +++ b/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h
> @@ -88,6 +88,12 @@ static inline void rte_rmb(void)
>
>  #define rte_smp_rmb() dmb(ishld)
>
> +#define rte_io_mb() rte_mb()
> +
> +#define rte_io_wmb() rte_wmb()
> +
> +#define rte_io_rmb() rte_rmb()
> +

I think it's better to use outer shareable dmb for io barrier, instead of dsb.
  
Jerin Jacob Jan. 4, 2017, 10:01 a.m. UTC | #2
On Tue, Jan 03, 2017 at 03:48:32PM +0800, Jianbo Liu wrote:
> On 27 December 2016 at 17:49, Jerin Jacob
> <jerin.jacob@caviumnetworks.com> wrote:
> > CC: Jianbo Liu <jianbo.liu@linaro.org>
> > Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> > ---
> >  lib/librte_eal/common/include/arch/arm/rte_atomic_64.h | 6 ++++++
> >  1 file changed, 6 insertions(+)
> >
> > diff --git a/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h b/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h
> > index 78ebea2..ef0efc7 100644
> > --- a/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h
> > +++ b/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h
> > @@ -88,6 +88,12 @@ static inline void rte_rmb(void)
> >
> >  #define rte_smp_rmb() dmb(ishld)
> >
> > +#define rte_io_mb() rte_mb()
> > +
> > +#define rte_io_wmb() rte_wmb()
> > +
> > +#define rte_io_rmb() rte_rmb()
> > +
> 
> I think it's better to use outer shareable dmb for io barrier, instead of dsb.

Its is difficult to generalize. AFAIK, from the IO barrier perspective
dsb would be the right candidate. But just for the DMA barrier between IO may
be outer sharable dmb is enough. In-terms of performance implication, the
fastpath code(door bell write) has been changed to relaxed write in all
the drivers in this patchset and rte_io_* will be only
used by rte_[read/write]8/16/32/64 which will be in slow-path.
So, IMO, it better stick with dsb and its safe from the complete IO barrier
perspective.

At least on ThunderX, I couldn't see any performance difference between
using dsb(st) and dmb(oshst) for dma write barrier before the doorbell register
write in fastpath. In case there are platforms which has such performance difference,
may be could add rte_dma_wmb() and rte_dma_rmb() in future like Linux kernel
dma_wmb() and dma_rmb().(But i couldn't  see all the driver are using it,
though)

Jerin
  
Jianbo Liu Jan. 5, 2017, 5:31 a.m. UTC | #3
On 4 January 2017 at 18:01, Jerin Jacob <jerin.jacob@caviumnetworks.com> wrote:
> On Tue, Jan 03, 2017 at 03:48:32PM +0800, Jianbo Liu wrote:
>> On 27 December 2016 at 17:49, Jerin Jacob
>> <jerin.jacob@caviumnetworks.com> wrote:
>> > CC: Jianbo Liu <jianbo.liu@linaro.org>
>> > Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
>> > ---
>> >  lib/librte_eal/common/include/arch/arm/rte_atomic_64.h | 6 ++++++
>> >  1 file changed, 6 insertions(+)
>> >
>> > diff --git a/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h b/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h
>> > index 78ebea2..ef0efc7 100644
>> > --- a/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h
>> > +++ b/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h
>> > @@ -88,6 +88,12 @@ static inline void rte_rmb(void)
>> >
>> >  #define rte_smp_rmb() dmb(ishld)
>> >
>> > +#define rte_io_mb() rte_mb()
>> > +
>> > +#define rte_io_wmb() rte_wmb()
>> > +
>> > +#define rte_io_rmb() rte_rmb()
>> > +
>>
>> I think it's better to use outer shareable dmb for io barrier, instead of dsb.
>
> Its is difficult to generalize. AFAIK, from the IO barrier perspective
> dsb would be the right candidate. But just for the DMA barrier between IO may
> be outer sharable dmb is enough. In-terms of performance implication, the
> fastpath code(door bell write) has been changed to relaxed write in all
> the drivers in this patchset and rte_io_* will be only
> used by rte_[read/write]8/16/32/64 which will be in slow-path.
> So, IMO, it better stick with dsb and its safe from the complete IO barrier
> perspective.

If so, why not use *mb() directly?

>
> At least on ThunderX, I couldn't see any performance difference between
> using dsb(st) and dmb(oshst) for dma write barrier before the doorbell register
> write in fastpath. In case there are platforms which has such performance difference,
> may be could add rte_dma_wmb() and rte_dma_rmb() in future like Linux kernel
> dma_wmb() and dma_rmb().(But i couldn't  see all the driver are using it,
> though)
>

But there is no io_*mb() in the kernel, so you want to be different?
  
Jerin Jacob Jan. 5, 2017, 6:24 a.m. UTC | #4
On Thu, Jan 05, 2017 at 01:31:44PM +0800, Jianbo Liu wrote:
> On 4 January 2017 at 18:01, Jerin Jacob <jerin.jacob@caviumnetworks.com> wrote:
> > On Tue, Jan 03, 2017 at 03:48:32PM +0800, Jianbo Liu wrote:
> >> On 27 December 2016 at 17:49, Jerin Jacob
> >> <jerin.jacob@caviumnetworks.com> wrote:
> >> > CC: Jianbo Liu <jianbo.liu@linaro.org>
> >> > Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> >> > ---
> >> >  lib/librte_eal/common/include/arch/arm/rte_atomic_64.h | 6 ++++++
> >> >  1 file changed, 6 insertions(+)
> >> >
> >> > diff --git a/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h b/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h
> >> > index 78ebea2..ef0efc7 100644
> >> > --- a/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h
> >> > +++ b/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h
> >> > @@ -88,6 +88,12 @@ static inline void rte_rmb(void)
> >> >
> >> >  #define rte_smp_rmb() dmb(ishld)
> >> >
> >> > +#define rte_io_mb() rte_mb()
> >> > +
> >> > +#define rte_io_wmb() rte_wmb()
> >> > +
> >> > +#define rte_io_rmb() rte_rmb()
> >> > +
> >>
> >> I think it's better to use outer shareable dmb for io barrier, instead of dsb.
> >
> > Its is difficult to generalize. AFAIK, from the IO barrier perspective
> > dsb would be the right candidate. But just for the DMA barrier between IO may
> > be outer sharable dmb is enough. In-terms of performance implication, the
> > fastpath code(door bell write) has been changed to relaxed write in all
> > the drivers in this patchset and rte_io_* will be only
> > used by rte_[read/write]8/16/32/64 which will be in slow-path.
> > So, IMO, it better stick with dsb and its safe from the complete IO barrier
> > perspective.
> 
> If so, why not use *mb() directly?

Adding David Marchand, EAL Maintainer.

Instead of rte_io_?. I thought, IO specific constraints can be abstracted
here in rte_io_*. Apart from arm, there other arch like "arc" has similar
constraints. IMHO, no harm in keeping that abstraction.

Thoughts ?

http://lxr.free-electrons.com/ident?i=__iormb

> 
> >
> > At least on ThunderX, I couldn't see any performance difference between
> > using dsb(st) and dmb(oshst) for dma write barrier before the doorbell register
> > write in fastpath. In case there are platforms which has such performance difference,
> > may be could add rte_dma_wmb() and rte_dma_rmb() in future like Linux kernel
> > dma_wmb() and dma_rmb().(But i couldn't  see all the driver are using it,
> > though)
> >
> 
> But there is no io_*mb() in the kernel, so you want to be different?

It is their for arm,arm64,arc architectures in Linux kernel. Please check writel
implementation for arm64

http://lxr.free-electrons.com/source/arch/arm64/include/asm/io.h#L143
  
Jianbo Liu Jan. 5, 2017, 6:47 a.m. UTC | #5
On 5 January 2017 at 14:24, Jerin Jacob <jerin.jacob@caviumnetworks.com> wrote:
> On Thu, Jan 05, 2017 at 01:31:44PM +0800, Jianbo Liu wrote:
>> On 4 January 2017 at 18:01, Jerin Jacob <jerin.jacob@caviumnetworks.com> wrote:
>> > On Tue, Jan 03, 2017 at 03:48:32PM +0800, Jianbo Liu wrote:
>> >> On 27 December 2016 at 17:49, Jerin Jacob
>> >> <jerin.jacob@caviumnetworks.com> wrote:
>> >> > CC: Jianbo Liu <jianbo.liu@linaro.org>
>> >> > Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
>> >> > ---
>> >> >  lib/librte_eal/common/include/arch/arm/rte_atomic_64.h | 6 ++++++
>> >> >  1 file changed, 6 insertions(+)
>> >> >
>> >> > diff --git a/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h b/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h
>> >> > index 78ebea2..ef0efc7 100644
>> >> > --- a/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h
>> >> > +++ b/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h
>> >> > @@ -88,6 +88,12 @@ static inline void rte_rmb(void)
>> >> >
>> >> >  #define rte_smp_rmb() dmb(ishld)
>> >> >
>> >> > +#define rte_io_mb() rte_mb()
>> >> > +
>> >> > +#define rte_io_wmb() rte_wmb()
>> >> > +
>> >> > +#define rte_io_rmb() rte_rmb()
>> >> > +
>> >>
>> >> I think it's better to use outer shareable dmb for io barrier, instead of dsb.
>> >
>> > Its is difficult to generalize. AFAIK, from the IO barrier perspective
>> > dsb would be the right candidate. But just for the DMA barrier between IO may
>> > be outer sharable dmb is enough. In-terms of performance implication, the
>> > fastpath code(door bell write) has been changed to relaxed write in all
>> > the drivers in this patchset and rte_io_* will be only
>> > used by rte_[read/write]8/16/32/64 which will be in slow-path.
>> > So, IMO, it better stick with dsb and its safe from the complete IO barrier
>> > perspective.
>>
>> If so, why not use *mb() directly?
>
> Adding David Marchand, EAL Maintainer.
>
> Instead of rte_io_?. I thought, IO specific constraints can be abstracted
> here in rte_io_*. Apart from arm, there other arch like "arc" has similar
> constraints. IMHO, no harm in keeping that abstraction.
>
> Thoughts ?
>
> http://lxr.free-electrons.com/ident?i=__iormb
>
>>
>> >
>> > At least on ThunderX, I couldn't see any performance difference between
>> > using dsb(st) and dmb(oshst) for dma write barrier before the doorbell register
>> > write in fastpath. In case there are platforms which has such performance difference,
>> > may be could add rte_dma_wmb() and rte_dma_rmb() in future like Linux kernel
>> > dma_wmb() and dma_rmb().(But i couldn't  see all the driver are using it,
>> > though)
>> >
>>
>> But there is no io_*mb() in the kernel, so you want to be different?
>
> It is their for arm,arm64,arc architectures in Linux kernel. Please check writel
> implementation for arm64
>
> http://lxr.free-electrons.com/source/arch/arm64/include/asm/io.h#L143
>

Yes, I knew. But I'm afraid it will be mixed with dma_*mb by someone else.
  
Jerin Jacob Jan. 5, 2017, 7:22 a.m. UTC | #6
On Thu, Jan 05, 2017 at 02:47:27PM +0800, Jianbo Liu wrote:
> On 5 January 2017 at 14:24, Jerin Jacob <jerin.jacob@caviumnetworks.com> wrote:
> > On Thu, Jan 05, 2017 at 01:31:44PM +0800, Jianbo Liu wrote:
> >> On 4 January 2017 at 18:01, Jerin Jacob <jerin.jacob@caviumnetworks.com> wrote:
> >> > On Tue, Jan 03, 2017 at 03:48:32PM +0800, Jianbo Liu wrote:
> >> >> On 27 December 2016 at 17:49, Jerin Jacob
> >> >> <jerin.jacob@caviumnetworks.com> wrote:
> >> >> > CC: Jianbo Liu <jianbo.liu@linaro.org>
> >> >> I think it's better to use outer shareable dmb for io barrier, instead of dsb.
> >> >
> >> > Its is difficult to generalize. AFAIK, from the IO barrier perspective
> >> > dsb would be the right candidate. But just for the DMA barrier between IO may
> >> > be outer sharable dmb is enough. In-terms of performance implication, the
> >> > fastpath code(door bell write) has been changed to relaxed write in all
> >> > the drivers in this patchset and rte_io_* will be only
> >> > used by rte_[read/write]8/16/32/64 which will be in slow-path.
> >> > So, IMO, it better stick with dsb and its safe from the complete IO barrier
> >> > perspective.
> >>
> >> If so, why not use *mb() directly?
> >
> > Adding David Marchand, EAL Maintainer.
> >
> > Instead of rte_io_?. I thought, IO specific constraints can be abstracted
> > here in rte_io_*. Apart from arm, there other arch like "arc" has similar
> > constraints. IMHO, no harm in keeping that abstraction.
> >
> > Thoughts ?
> >
> > http://lxr.free-electrons.com/ident?i=__iormb
> >
> >>
> >> >
> >> > At least on ThunderX, I couldn't see any performance difference between
> >> > using dsb(st) and dmb(oshst) for dma write barrier before the doorbell register
> >> > write in fastpath. In case there are platforms which has such performance difference,
> >> > may be could add rte_dma_wmb() and rte_dma_rmb() in future like Linux kernel
> >> > dma_wmb() and dma_rmb().(But i couldn't  see all the driver are using it,
> >> > though)
> >> >
> >>
> >> But there is no io_*mb() in the kernel, so you want to be different?
> >
> > It is their for arm,arm64,arc architectures in Linux kernel. Please check writel
> > implementation for arm64
> >
> > http://lxr.free-electrons.com/source/arch/arm64/include/asm/io.h#L143
> >
> 
> Yes, I knew. But I'm afraid it will be mixed with dma_*mb by someone else.

OK. Got it. To me both are totally different.
Feel free introduce additional dma_*mb* then, if you think its solving
any problem in DPDK.I am not seeing any performance different between
dsb sy and dmb outer one. If you have any platform that has performance
difference then I _think_ you can introduce dma_*mb()
  

Patch

diff --git a/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h b/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h
index 78ebea2..ef0efc7 100644
--- a/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h
+++ b/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h
@@ -88,6 +88,12 @@  static inline void rte_rmb(void)
 
 #define rte_smp_rmb() dmb(ishld)
 
+#define rte_io_mb() rte_mb()
+
+#define rte_io_wmb() rte_wmb()
+
+#define rte_io_rmb() rte_rmb()
+
 #ifdef __cplusplus
 }
 #endif