eal/armv7: add support for rte pause

Message ID 20181007063127.27960-1-jerin.jacob@caviumnetworks.com
State Changes Requested, archived
Delegated to: Thomas Monjalon
Series
  • eal/armv7: add support for rte pause

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation fail Compilation issues

Commit Message

Jerin Jacob Oct. 7, 2018, 6:31 a.m. UTC
Add support for rte_pause() implementation for armv7.

Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
---

The reference implementation for Linux's cpu_relax() for armv7 is at
https://elixir.bootlin.com/linux/latest/source/arch/arm/include/asm/processor.h#L100

---
 lib/librte_eal/common/include/arch/arm/rte_pause_32.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Ola Liljedahl Oct. 7, 2018, 9:09 p.m. UTC | #1
On 07/10/2018, 08:32, "Jerin Jacob" <jerin.jacob@caviumnetworks.com> wrote:

    Add support for rte_pause() implementation for armv7.

    Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
    ---

    The reference implementation for Linux's cpu_relax() for armv7 is at
    https://elixir.bootlin.com/linux/latest/source/arch/arm/include/asm/processor.h#L100

    ---
     lib/librte_eal/common/include/arch/arm/rte_pause_32.h | 4 +++-
     1 file changed, 3 insertions(+), 1 deletion(-)

    diff --git a/lib/librte_eal/common/include/arch/arm/rte_pause_32.h b/lib/librte_eal/common/include/arch/arm/rte_pause_32.h
    index d4768c7a9..9b856e0cf 100644
    --- a/lib/librte_eal/common/include/arch/arm/rte_pause_32.h
    +++ b/lib/librte_eal/common/include/arch/arm/rte_pause_32.h
    @@ -9,11 +9,13 @@
     extern "C" {
     #endif

    -#include <rte_common.h>
    +#include <rte_atomic.h>
    +
     #include "generic/rte_pause.h"

     static inline void rte_pause(void)
     {
    +rte_compiler_barrier();
The compiler barrier is not mandated by the DPDK documentation for rte_pause():
http://doc.dpdk.org/api/rte__pause_8h.html

You have to go all the way to the source and GCC documentation to discover that for GCC, rte_pause calls _mm_pause() which in turn is implemented using __builtin_ia32_pause().
https://gcc.gnu.org/onlinedocs/gcc-4.9.2/gcc/X86-Built-in-Functions.html
void __builtin_ia32_pause (void)
Generates the pause machine instruction with a compiler memory barrier.


If you are using C11 atomic operations, e.g. for polling a location, the atomic operations will be able to provide the required semantics (e.g. don't merge atomic loads from different iterations of a loop, optionally provide acquire and/or release (or stronger) ordering). A compiler barrier here interferes with the (possibly weaker) barriers from the atomic operations. We could use a C11 version of rte_pause() that doesn't have the compiler barrier. But actually, we want support for WFE; x86 also has something similar now (MONITOR/MWAIT?).
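
A minimal sketch of the C11-style polling described above. The helper name `poll_until_equal` is hypothetical and not a DPDK API. The atomic acquire load is re-issued on every iteration, so the loop needs no separate compiler barrier:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a DPDK API): spin until *loc equals `expected`.
 * The acquire load is performed anew on each iteration, and it also orders
 * later accesses after the load that observed `expected`.
 * Returns the number of extra iterations that were needed. */
static inline uint32_t poll_until_equal(_Atomic uint32_t *loc, uint32_t expected)
{
    uint32_t spins = 0;

    assert(loc != NULL);
    while (atomic_load_explicit(loc, memory_order_acquire) != expected)
        spins++;
    return spins;
}
```

If weaker ordering is enough, `memory_order_relaxed` could be used for the load instead; the choice stays with the caller rather than being forced by a barrier inside the pause function.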

-- Ola


     }

     #ifdef __cplusplus
    --
    2.19.0



Jerin Jacob Oct. 8, 2018, 6:27 a.m. UTC | #2
On Sun, 7 Oct 2018 21:09:25 +0000, Ola Liljedahl <Ola.Liljedahl@arm.com> wrote:
> 
> On 07/10/2018, 08:32, "Jerin Jacob" <jerin.jacob@caviumnetworks.com> wrote:
> 
>      static inline void rte_pause(void)
>      {
>     +rte_compiler_barrier();
> The compiler barrier is not mandated by the DPDK documentation for rte_pause():
> http://doc.dpdk.org/api/rte__pause_8h.html

We can add that explicitly if required, to be in line with other architectures. Just like
the Linux kernel's cpu_relax().

> 
> You have to go all the way to the source and GCC documentation to discover that for GCC, rte_pause calls _mm_pause() which in turn is implemented using __builtin_ia32_pause().
> https://gcc.gnu.org/onlinedocs/gcc-4.9.2/gcc/X86-Built-in-Functions.html
> void __builtin_ia32_pause (void)
> Generates the pause machine instruction with a compiler memory barrier.

Yes. IMO, it makes sense to have a compiler memory barrier to make sure it
waits semantically, at least WRT current rte_pause() usage.
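
To illustrate the non-C11 usage in question, here is a hedged sketch of a legacy-style spin on a plain (non-atomic, non-volatile) variable. `compiler_barrier()` is a stand-in that mirrors the usual empty-asm-with-memory-clobber idiom; `pause_with_barrier` and `spin_until_nonzero` are hypothetical names. Without the barrier, the compiler would be allowed to read the flag once and spin forever:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a compiler barrier: an empty asm statement with a "memory"
 * clobber, which forces the compiler to reload memory operands afterwards. */
#define compiler_barrier() __asm__ volatile("" ::: "memory")

/* Hypothetical pause that carries only the compiler barrier. */
static inline void pause_with_barrier(void)
{
    compiler_barrier();
}

/* Legacy-style polling of a plain variable. The barrier inside the pause
 * is what makes the compiler re-read *flag on every iteration.
 * Returns the number of pause calls that were needed. */
static inline uint32_t spin_until_nonzero(const uint32_t *flag)
{
    uint32_t spins = 0;

    assert(flag != NULL);
    while (*flag == 0) {
        pause_with_barrier();
        spins++;
    }
    return spins;
}
```

Strictly speaking, concurrent access to a plain variable is a data race under the C11 model; the sketch only shows why existing callers of this style depend on the barrier.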

> 
> If you are using C11 atomic operations, e.g. for polling a location, the atomic operations will be able to provide the required semantics (e.g. don't merge atomic loads from different iterations of a loop, optionally provide acquire and/or release (or stronger) ordering). A compiler barrier here interferes with the (possibly weaker) barriers from the atomic operations. We could use a C11 version of rte_pause() that doesn't have the compiler barrier. But actually, we want support for WFE; x86 also has something similar now (MONITOR/MWAIT?).

If it is WFE, then who will wake the core up from the power-saving state? SEV from the
other thread?

What would be a C11 version of rte_pause()?

Ola Liljedahl Oct. 8, 2018, 8:25 a.m. UTC | #3
On 08/10/2018, 08:27, "Jerin Jacob" <jerin.jacob@caviumnetworks.com> wrote:

    > The compiler barrier is not mandated by the DPDK documentation for rte_pause():
    > http://doc.dpdk.org/api/rte__pause_8h.html

    We can add that explicitly if required, to be in line with other architectures. Just like
    the Linux kernel's cpu_relax().
I think the documentation should specify this compiler barrier if it is needed for correct behaviour.


    >
    > You have to go all the way to the source and GCC documentation to discover that for GCC, rte_pause calls _mm_pause() which in turn is implemented using __builtin_ia32_pause().
    > https://gcc.gnu.org/onlinedocs/gcc-4.9.2/gcc/X86-Built-in-Functions.html
    > void __builtin_ia32_pause (void)
    > Generates the pause machine instruction with a compiler memory barrier.

    Yes. IMO, it makes sense to have a compiler memory barrier to make sure it
    waits semantically, at least WRT current rte_pause() usage.
Current *non-C11* usage. But more and more code in DPDK uses the C11 memory model.


    >
    > If you are using C11 atomic operations, e.g. for polling a location, the atomic operations will be able to provide the required semantics (e.g. don't merge atomic loads from different iterations of a loop, optionally provide acquire and/or release (or stronger) ordering). A compiler barrier here interferes with the (possibly weaker) barriers from the atomic operations. We could use a C11 version of rte_pause() that doesn't have the compiler barrier. But actually, we want support for WFE; x86 also has something similar now (MONITOR/MWAIT?).

    If it is WFE, then who will wake the core up from the power-saving state? SEV from the
    other thread?
SEV/WFE is the ARMv7 way of waiting for event but the waking up is very crude (SEV broadcasts an event to *all* cores). ARMv8 introduces a new way where the waiting thread uses SEVL/WFE/LDXR/WFE to wait for a specific location (in practice cache line) to be updated and whichever thread writes the location will automatically notify any waiters (no SEV needed). See code example in other email thread.


    What would be a C11 version of rte_pause()?
A function that stalls the CPU for some ten(s) of cycles. No implicit or explicit (compiler) barriers. E.g. ISB on ARM which, unlike NOP, actually stalls the pipeline for 10-20 cycles (but ISB will also have HW barrier semantics). But as I wrote above, using WFE would be better (at least it has been better in the internal benchmarks I have done/seen). Much better to focus our efforts on how to make use of WFE for C11 code.
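
A rough sketch of what such a barrier-free pause could look like, under stated assumptions: `cpu_stall_hint` and `poll_with_stall` are hypothetical names, ISB is used on ARMv7/AArch64 as suggested above, and PAUSE is used on x86. The asm statements deliberately omit the "memory" clobber, so they are not compiler memory barriers; ordering comes only from the atomic load in the caller:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical barrier-free stall hint (not an existing DPDK function).
 * No "memory" clobber: the compiler is not forced to reload or spill
 * memory operands around the instruction. */
static inline void cpu_stall_hint(void)
{
#if defined(__aarch64__) || (defined(__ARM_ARCH) && __ARM_ARCH >= 7)
    __asm__ volatile("isb");
#elif defined(__x86_64__) || defined(__i386__)
    __asm__ volatile("pause");
#endif
    /* Other targets: no instruction is emitted in this sketch. */
}

/* Poll a location with C11 atomics; the atomic load supplies the ordering
 * and the stall hint only slows the loop down. Returns the value seen. */
static inline uint32_t poll_with_stall(_Atomic uint32_t *loc, uint32_t expected)
{
    uint32_t v;

    assert(loc != NULL);
    while ((v = atomic_load_explicit(loc, memory_order_acquire)) != expected)
        cpu_stall_hint();
    return v;
}
```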


Jerin Jacob Oct. 8, 2018, 8:41 a.m. UTC | #4
On Mon, 8 Oct 2018 08:25:28 +0000, Ola Liljedahl <Ola.Liljedahl@arm.com> wrote:
> 
> 
> I think the documentation should specify this compiler barrier if it is needed for correct behaviour.

Yes.

> Current *non-C11* usage. But more and more code in DPDK uses the C11 memory model.

We probably need a different API for the CPU wait; otherwise existing
rte_pause() users will break. For example, lib/librte_ring/rte_ring_generic.h
also uses rte_pause().


> SEV/WFE is the ARMv7 way of waiting for event but the waking up is very crude (SEV broadcasts an event to *all* cores). ARMv8 introduces a new way where the waiting thread uses SEVL/WFE/LDXR/WFE to wait for a specific location (in practice cache line) to be updated and whichever thread writes the location will automatically notify any waiters (no SEV needed). See code example in other email thread.

Yes. The context was the ARMv7 patch, so I mentioned SEV.

> 
> 
>     What would be a C11 version of rte_pause()?
> A function that stalls the CPU for some ten(s) of cycles. No implicit or explicit (compiler) barriers. E.g. ISB on ARM which, unlike NOP, actually stalls the pipeline for 10-20 cycles (but ISB will also have HW barrier semantics). But as I wrote above, using WFE would be better (at least it has been better in the internal benchmarks I have done/seen). Much better to focus our efforts on how to make use of WFE for C11 code.

Is there any API in C11 which maps to the WFE and LDXR pair? If not, we need to
introduce a new API to use in conjunction with LDXR.

I would say that, more than C11, the use case should be addressed through load-acquire and store-release semantics.
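
As a small illustration of load-acquire and store-release semantics, here is a sketch of a one-slot handoff. The `mailbox` type and both function names are hypothetical and exist only for this example. The release store publishes the payload; the acquire load in the consumer guarantees the payload is visible once the flag is seen:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical one-slot mailbox used only for illustration. */
struct mailbox {
    uint32_t payload;        /* plain data, published via `ready` */
    _Atomic uint32_t ready;  /* 0 = empty, 1 = payload is valid */
};

/* Producer: write the payload, then publish it with a store-release. */
static inline void mailbox_publish(struct mailbox *m, uint32_t value)
{
    assert(m != NULL);
    m->payload = value;
    atomic_store_explicit(&m->ready, 1, memory_order_release);
}

/* Consumer: wait with load-acquire until the flag is set, then read. */
static inline uint32_t mailbox_consume(struct mailbox *m)
{
    assert(m != NULL);
    while (atomic_load_explicit(&m->ready, memory_order_acquire) == 0)
        ; /* a CPU wait primitive would go here */
    return m->payload;
}
```

The empty loop body is where a pause or WFE-based wait would be placed; nothing in the ordering depends on it.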

 
Ola Liljedahl Oct. 8, 2018, 10:51 a.m. UTC | #5
On 08/10/2018, 10:42, "Jerin Jacob" <jerin.jacob@caviumnetworks.com> wrote:


    Is there any API in C11 which maps to the WFE and LDXR pair? If not, we need to
    introduce a new API to use in conjunction with LDXR.
I learned something new today:
https://gcc.gnu.org/onlinedocs/gcc/ARM-C-Language-Extensions-_0028ACLE_0029.html
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0053c/IHI0053C_acle_2_0.pdf
8.4 Hints
The intrinsics in this section are available for all targets. They may be no-ops (i.e. generate no code, but possibly
act as a code motion barrier in compilers) on targets where the relevant instructions do not exist. On targets where
the relevant instructions exist but are implemented as no-ops, these intrinsics generate the instructions.
 void __wfi(void);
Generates a WFI (wait for interrupt) hint instruction, or nothing. The WFI instruction allows (but does not require)
the processor to enter a low-power state until one of a number of asynchronous events occurs.
 void __wfe(void);
Generates a WFE (wait for event) hint instruction, or nothing. The WFE instruction allows (but does not require)
the processor to enter a low-power state until some event occurs such as a SEV being issued by another
processor.
 void __sev(void);
Generates a SEV (send a global event) hint instruction. This causes an event to be signaled to all processors in a
multiprocessor system. It is a NOP on a uniprocessor system.
 void __sevl(void);
Generates a “send a local event” hint instruction. This causes an event to be signaled to only the processor
executing this instruction. In a multiprocessor system, it is not required to affect the other processors.
 void __yield(void);

But unfortunately, these definitions are missing from GCC 7.3. I have reported this to the ARM GCC maintainers.
Need to check GCC 8, see if I can update my target.

So it seems my current method of using inline assembler has to be continued for a while.

As described before, here is how I support both WFE and non-WFE targets. Possibly there is some other, more
abstract way to use WFE that does not expose the use of SEVL, WFE and LDXR. But in some situations, the condition for
continuing to wait is more complicated than just comparing (equality/inequality) with another value.


        if (UNLIKELY(__atomic_load_n(loc, __ATOMIC_RELAXED) != idx))
        {
            SEVL();
            while (WFE() && LDXR32(loc, __ATOMIC_RELAXED) != idx)
            {
                DOZE();
            }
        }
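
For readers who want to try the snippet above, the following is one possible set of definitions for the macros it uses. These definitions are assumptions made for illustration (they are not taken from this thread): on AArch64 they map to inline assembler, and on other targets they fall back to a plain atomic load with a no-op wait. `wait_until_equal32` is a hypothetical wrapper around the loop:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UNLIKELY(x) __builtin_expect(!!(x), 0)

#if defined(__aarch64__)
/* WFE target: assumed inline-assembler wrappers. */
static inline void sevl(void) { __asm__ volatile("sevl" ::: "memory"); }
static inline int wfe(void) { __asm__ volatile("wfe" ::: "memory"); return 1; }
static inline uint32_t ldxr32(const uint32_t *addr)
{
    uint32_t old;
    /* Load-exclusive arms the monitor so a later write wakes WFE. */
    __asm__ volatile("ldxr %w0, %1" : "=&r"(old) : "Q"(*addr));
    return old;
}
#define SEVL() sevl()
#define WFE() wfe()
#define LDXR32(addr, mo) ldxr32(addr)   /* only relaxed in this sketch */
#define DOZE() ((void)0)                /* WFE already does the waiting */
#else
/* Non-WFE target: plain atomic load, WFE() is always true. */
#define SEVL() ((void)0)
#define WFE() 1
#define LDXR32(addr, mo) __atomic_load_n((addr), (mo))
#define DOZE() ((void)0)                /* placeholder for a pause hint */
#endif

/* Hypothetical wrapper: wait until *loc equals idx. */
static inline void wait_until_equal32(uint32_t *loc, uint32_t idx)
{
    assert(loc != NULL);
    if (UNLIKELY(__atomic_load_n(loc, __ATOMIC_RELAXED) != idx)) {
        SEVL();
        while (WFE() && LDXR32(loc, __ATOMIC_RELAXED) != idx) {
            DOZE();
        }
    }
}
```

On the WFE path, SEVL makes the first WFE fall through, the load-exclusive arms the monitor, and each later WFE sleeps until the monitored location is written.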




Patch

diff --git a/lib/librte_eal/common/include/arch/arm/rte_pause_32.h b/lib/librte_eal/common/include/arch/arm/rte_pause_32.h
index d4768c7a9..9b856e0cf 100644
--- a/lib/librte_eal/common/include/arch/arm/rte_pause_32.h
+++ b/lib/librte_eal/common/include/arch/arm/rte_pause_32.h
@@ -9,11 +9,13 @@ 
 extern "C" {
 #endif
 
-#include <rte_common.h>
+#include <rte_atomic.h>
+
 #include "generic/rte_pause.h"
 
 static inline void rte_pause(void)
 {
+	rte_compiler_barrier();
 }
 
 #ifdef __cplusplus