[dpdk-dev] librte_eal:Using compiler memory barrier for IA processor's rte_wmb/rte_rmb.

Message ID BLU436-SMTP1816B53EC78CA7DDCB12FAABFD10@phx.gbl (mailing list archive)
State Changes Requested, archived
Headers

Commit Message

WangDong May 5, 2015, 3:38 p.m. UTC
The current implementation of rte_wmb/rte_rmb for x86 is using processor memory barrier. It's unnessary for IA processor, compiler memory barrier is enough. But if dpdk runing on a AMD processor, maybe we should use processor memory barrier.
I add a macro to distinguish them, if we compile DPDK for IA processor, add the macro (RTE_ARCH_X86_IA) can improve performance with compiler memory barrier. Or we can add RTE_ARCH_X86_AMD for using processor memory barrier, in this case, if didn't add the macro, the memory ordering will not be guaranteed. Which macro is better?
If this patch applied, the PMD's old implementation of compiler memory barrier (some volatile variable) can be fixed with rte_rmb() and rte_wmb() for any architecture.

---
 lib/librte_eal/common/include/arch/x86/rte_atomic.h | 10 ++++++++++
 1 file changed, 10 insertions(+)
  

Comments

Ananyev, Konstantin May 5, 2015, 10:46 p.m. UTC | #1
Hi Dong,

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of WangDong
> Sent: Tuesday, May 05, 2015 4:38 PM
> To: dev@dpdk.org
> Subject: [dpdk-dev] [PATCH] librte_eal:Using compiler memory barrier for IA processor's rte_wmb/rte_rmb.
> 
> The current implementation of rte_wmb/rte_rmb for x86 is using processor memory barrier. It's unnessary for IA processor, compiler
> memory barrier is enough. 

I wouldn't say they are 'unnecessary'.
There are situations, even on IA, when you need _fence_ isntructions.
So, please leave rte_*mb() macros unmodified.
I still think that we need to create a new set of architecture dependent macros, as what discussed before.
Probably by analogy with linux kernel rte_smp_*mb() is a good name for them.  
Though if you have some better name in mind, I am open to suggestions here.

> But if dpdk runing on a AMD processor, maybe we should use processor memory barrier.

As far as I remember, amd has the same memory ordering model.
So, I don't think we need  #ifdef RTE_ARCH_X86_IA here.

Konstantin

> I add a macro to distinguish them, if we compile DPDK for IA processor, add the macro (RTE_ARCH_X86_IA) can improve performance
> with compiler memory barrier. Or we can add RTE_ARCH_X86_AMD for using processor memory barrier, in this case, if didn't add the
> macro, the memory ordering will not be guaranteed. Which macro is better?
> If this patch applied, the PMD's old implementation of compiler memory barrier (some volatile variable) can be fixed with rte_rmb()
> and rte_wmb() for any architecture.
> 
> ---
>  lib/librte_eal/common/include/arch/x86/rte_atomic.h | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/lib/librte_eal/common/include/arch/x86/rte_atomic.h b/lib/librte_eal/common/include/arch/x86/rte_atomic.h
> index e93e8ee..52b1e81 100644
> --- a/lib/librte_eal/common/include/arch/x86/rte_atomic.h
> +++ b/lib/librte_eal/common/include/arch/x86/rte_atomic.h
> @@ -49,10 +49,20 @@ extern "C" {
> 
>  #define	rte_mb() _mm_mfence()
> 
> +#ifdef RTE_ARCH_X86_IA
> +
> +#define rte_wmb() rte_compiler_barrier()
> +
> +#define rte_rmb() rte_compiler_barrier()
> +
> +#else
> +
>  #define	rte_wmb() _mm_sfence()
> 
>  #define	rte_rmb() _mm_lfence()
> 
> +#endif
> +
>  /*------------------------- 16 bit atomic operations -------------------------*/
> 
>  #ifndef RTE_FORCE_INTRINSICS
> --
> 1.9.1
  
WangDong May 7, 2015, 3:28 p.m. UTC | #2
Hi Konstantin,

> Hi Dong,
>
>> -----Original Message-----
>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of WangDong
>> Sent: Tuesday, May 05, 2015 4:38 PM
>> To: dev@dpdk.org
>> Subject: [dpdk-dev] [PATCH] librte_eal:Using compiler memory barrier for IA processor's rte_wmb/rte_rmb.
>>
>> The current implementation of rte_wmb/rte_rmb for x86 is using processor memory barrier. It's unnessary for IA processor, compiler
>> memory barrier is enough.
>
> I wouldn't say they are 'unnecessary'.
> There are situations, even on IA, when you need _fence_ isntructions.
> So, please leave rte_*mb() macros unmodified.
OK, leave them unmodified, but I really can't find a situation to use 
sfence and lfence instructions.


> I still think that we need to create a new set of architecture dependent macros, as what discussed before.
> Probably by analogy with linux kernel rte_smp_*mb() is a good name for them.
> Though if you have some better name in mind, I am open to suggestions here.
What abount rte_dma_*mb()? I find dma_*mb() in linux-4.0.1, it looks good~~

>
>> But if dpdk runing on a AMD processor, maybe we should use processor memory barrier.
>
> As far as I remember, amd has the same memory ordering model.
It's too hard to find a AMD's software developer manual.....

Dong

> So, I don't think we need  #ifdef RTE_ARCH_X86_IA here.
>
> Konstantin
>
>> I add a macro to distinguish them, if we compile DPDK for IA processor, add the macro (RTE_ARCH_X86_IA) can improve performance
>> with compiler memory barrier. Or we can add RTE_ARCH_X86_AMD for using processor memory barrier, in this case, if didn't add the
>> macro, the memory ordering will not be guaranteed. Which macro is better?
>> If this patch applied, the PMD's old implementation of compiler memory barrier (some volatile variable) can be fixed with rte_rmb()
>> and rte_wmb() for any architecture.
>>
>> ---
>>   lib/librte_eal/common/include/arch/x86/rte_atomic.h | 10 ++++++++++
>>   1 file changed, 10 insertions(+)
>>
>> diff --git a/lib/librte_eal/common/include/arch/x86/rte_atomic.h b/lib/librte_eal/common/include/arch/x86/rte_atomic.h
>> index e93e8ee..52b1e81 100644
>> --- a/lib/librte_eal/common/include/arch/x86/rte_atomic.h
>> +++ b/lib/librte_eal/common/include/arch/x86/rte_atomic.h
>> @@ -49,10 +49,20 @@ extern "C" {
>>
>>   #define	rte_mb() _mm_mfence()
>>
>> +#ifdef RTE_ARCH_X86_IA
>> +
>> +#define rte_wmb() rte_compiler_barrier()
>> +
>> +#define rte_rmb() rte_compiler_barrier()
>> +
>> +#else
>> +
>>   #define	rte_wmb() _mm_sfence()
>>
>>   #define	rte_rmb() _mm_lfence()
>>
>> +#endif
>> +
>>   /*------------------------- 16 bit atomic operations -------------------------*/
>>
>>   #ifndef RTE_FORCE_INTRINSICS
>> --
>> 1.9.1
>
  
Ananyev, Konstantin May 7, 2015, 4:34 p.m. UTC | #3
Hi Dong,

> -----Original Message-----
> From: Wang Dong [mailto:dong.wang.pro@hotmail.com]
> Sent: Thursday, May 07, 2015 4:28 PM
> To: Ananyev, Konstantin; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] librte_eal:Using compiler memory barrier for IA processor's rte_wmb/rte_rmb.
> 
> Hi Konstantin,
> 
> > Hi Dong,
> >
> >> -----Original Message-----
> >> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of WangDong
> >> Sent: Tuesday, May 05, 2015 4:38 PM
> >> To: dev@dpdk.org
> >> Subject: [dpdk-dev] [PATCH] librte_eal:Using compiler memory barrier for IA processor's rte_wmb/rte_rmb.
> >>
> >> The current implementation of rte_wmb/rte_rmb for x86 is using processor memory barrier. It's unnessary for IA processor,
> compiler
> >> memory barrier is enough.
> >
> > I wouldn't say they are 'unnecessary'.
> > There are situations, even on IA, when you need _fence_ isntructions.
> > So, please leave rte_*mb() macros unmodified.
> OK, leave them unmodified, but I really can't find a situation to use
> sfence and lfence instructions.

For example:
http://bartoszmilewski.com/2008/11/05/who-ordered-memory-fences-on-an-x86/
http://dpdk.org/ml/archives/dev/2014-May/002613.html

> 
> 
> > I still think that we need to create a new set of architecture dependent macros, as what discussed before.
> > Probably by analogy with linux kernel rte_smp_*mb() is a good name for them.
> > Though if you have some better name in mind, I am open to suggestions here.
> What abount rte_dma_*mb()? I find dma_*mb() in linux-4.0.1, it looks good~~

Hmm, but why _dma_?
We need same thing for multi-core communication too.
If rte_smp_ is not good enough, might be: rte_arch_?

> 
> >
> >> But if dpdk runing on a AMD processor, maybe we should use processor memory barrier.
> >
> > As far as I remember, amd has the same memory ordering model.
> It's too hard to find a AMD's software developer manual.....

There for example:
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/24593_APM_v21.pdf
?

Konstantin

> 
> Dong
> 
> > So, I don't think we need  #ifdef RTE_ARCH_X86_IA here.
> >
> > Konstantin
> >
> >> I add a macro to distinguish them, if we compile DPDK for IA processor, add the macro (RTE_ARCH_X86_IA) can improve
> performance
> >> with compiler memory barrier. Or we can add RTE_ARCH_X86_AMD for using processor memory barrier, in this case, if didn't add
> the
> >> macro, the memory ordering will not be guaranteed. Which macro is better?
> >> If this patch applied, the PMD's old implementation of compiler memory barrier (some volatile variable) can be fixed with
> rte_rmb()
> >> and rte_wmb() for any architecture.
> >>
> >> ---
> >>   lib/librte_eal/common/include/arch/x86/rte_atomic.h | 10 ++++++++++
> >>   1 file changed, 10 insertions(+)
> >>
> >> diff --git a/lib/librte_eal/common/include/arch/x86/rte_atomic.h b/lib/librte_eal/common/include/arch/x86/rte_atomic.h
> >> index e93e8ee..52b1e81 100644
> >> --- a/lib/librte_eal/common/include/arch/x86/rte_atomic.h
> >> +++ b/lib/librte_eal/common/include/arch/x86/rte_atomic.h
> >> @@ -49,10 +49,20 @@ extern "C" {
> >>
> >>   #define	rte_mb() _mm_mfence()
> >>
> >> +#ifdef RTE_ARCH_X86_IA
> >> +
> >> +#define rte_wmb() rte_compiler_barrier()
> >> +
> >> +#define rte_rmb() rte_compiler_barrier()
> >> +
> >> +#else
> >> +
> >>   #define	rte_wmb() _mm_sfence()
> >>
> >>   #define	rte_rmb() _mm_lfence()
> >>
> >> +#endif
> >> +
> >>   /*------------------------- 16 bit atomic operations -------------------------*/
> >>
> >>   #ifndef RTE_FORCE_INTRINSICS
> >> --
> >> 1.9.1
> >
  
WangDong May 9, 2015, 10:24 a.m. UTC | #4
Hi Konstantin,

>
> Hi Dong,
>
>> -----Original Message-----
>> From: Wang Dong [mailto:dong.wang.pro@hotmail.com]
>> Sent: Thursday, May 07, 2015 4:28 PM
>> To: Ananyev, Konstantin; dev@dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH] librte_eal:Using compiler memory barrier for IA processor's rte_wmb/rte_rmb.
>>
>> Hi Konstantin,
>>
>>> Hi Dong,
>>>
>>>> -----Original Message-----
>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of WangDong
>>>> Sent: Tuesday, May 05, 2015 4:38 PM
>>>> To: dev@dpdk.org
>>>> Subject: [dpdk-dev] [PATCH] librte_eal:Using compiler memory barrier for IA processor's rte_wmb/rte_rmb.
>>>>
>>>> The current implementation of rte_wmb/rte_rmb for x86 is using processor memory barrier. It's unnessary for IA processor,
>> compiler
>>>> memory barrier is enough.
>>>
>>> I wouldn't say they are 'unnecessary'.
>>> There are situations, even on IA, when you need _fence_ isntructions.
>>> So, please leave rte_*mb() macros unmodified.
>> OK, leave them unmodified, but I really can't find a situation to use
>> sfence and lfence instructions.
>
> For example:
> http://bartoszmilewski.com/2008/11/05/who-ordered-memory-fences-on-an-x86/
> http://dpdk.org/ml/archives/dev/2014-May/002613.html
>
>>
>>
>>> I still think that we need to create a new set of architecture dependent macros, as what discussed before.
>>> Probably by analogy with linux kernel rte_smp_*mb() is a good name for them.
>>> Though if you have some better name in mind, I am open to suggestions here.
>> What abount rte_dma_*mb()? I find dma_*mb() in linux-4.0.1, it looks good~~
>
> Hmm, but why _dma_?
> We need same thing for multi-core communication too.
> If rte_smp_ is not good enough, might be: rte_arch_?
I want these two macro only used in PMD, so I think _dma_ is better. The 
memory barrier of processor-processor maybe more complex, and I'm not 
familiar with it... Someone can add rte_smp_*mb for multi-core.

I think _arch_ is means nothing here, because rte_*mb is already for 
architectures that dpdk supported, they are redefined in these architecture.

>
>>
>>>
>>>> But if dpdk runing on a AMD processor, maybe we should use processor memory barrier.
>>>
>>> As far as I remember, amd has the same memory ordering model.
>> It's too hard to find a AMD's software developer manual.....
>
> There for example:
> http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/24593_APM_v21.pdf
> ?
Search such document on AMD offical website for a long time, this manual 
is what I want, thanks very much!!!

Dong

>
> Konstantin
>
>>
>> Dong
>>
>>> So, I don't think we need  #ifdef RTE_ARCH_X86_IA here.
>>>
>>> Konstantin
>>>
>>>> I add a macro to distinguish them, if we compile DPDK for IA processor, add the macro (RTE_ARCH_X86_IA) can improve
>> performance
>>>> with compiler memory barrier. Or we can add RTE_ARCH_X86_AMD for using processor memory barrier, in this case, if didn't add
>> the
>>>> macro, the memory ordering will not be guaranteed. Which macro is better?
>>>> If this patch applied, the PMD's old implementation of compiler memory barrier (some volatile variable) can be fixed with
>> rte_rmb()
>>>> and rte_wmb() for any architecture.
>>>>
>>>> ---
>>>>    lib/librte_eal/common/include/arch/x86/rte_atomic.h | 10 ++++++++++
>>>>    1 file changed, 10 insertions(+)
>>>>
>>>> diff --git a/lib/librte_eal/common/include/arch/x86/rte_atomic.h b/lib/librte_eal/common/include/arch/x86/rte_atomic.h
>>>> index e93e8ee..52b1e81 100644
>>>> --- a/lib/librte_eal/common/include/arch/x86/rte_atomic.h
>>>> +++ b/lib/librte_eal/common/include/arch/x86/rte_atomic.h
>>>> @@ -49,10 +49,20 @@ extern "C" {
>>>>
>>>>    #define	rte_mb() _mm_mfence()
>>>>
>>>> +#ifdef RTE_ARCH_X86_IA
>>>> +
>>>> +#define rte_wmb() rte_compiler_barrier()
>>>> +
>>>> +#define rte_rmb() rte_compiler_barrier()
>>>> +
>>>> +#else
>>>> +
>>>>    #define	rte_wmb() _mm_sfence()
>>>>
>>>>    #define	rte_rmb() _mm_lfence()
>>>>
>>>> +#endif
>>>> +
>>>>    /*------------------------- 16 bit atomic operations -------------------------*/
>>>>
>>>>    #ifndef RTE_FORCE_INTRINSICS
>>>> --
>>>> 1.9.1
>>>
  
Ananyev, Konstantin May 11, 2015, 9:59 a.m. UTC | #5
Hi Dong,

> -----Original Message-----

> From: Wang Dong [mailto:dong.wang.pro@hotmail.com]

> Sent: Saturday, May 09, 2015 11:24 AM

> To: Ananyev, Konstantin; dev@dpdk.org

> Subject: Re: [dpdk-dev] [PATCH] librte_eal:Using compiler memory barrier for IA processor's rte_wmb/rte_rmb.

> 

> Hi Konstantin,

> 

> >

> > Hi Dong,

> >

> >> -----Original Message-----

> >> From: Wang Dong [mailto:dong.wang.pro@hotmail.com]

> >> Sent: Thursday, May 07, 2015 4:28 PM

> >> To: Ananyev, Konstantin; dev@dpdk.org

> >> Subject: Re: [dpdk-dev] [PATCH] librte_eal:Using compiler memory barrier for IA processor's rte_wmb/rte_rmb.

> >>

> >> Hi Konstantin,

> >>

> >>> Hi Dong,

> >>>

> >>>> -----Original Message-----

> >>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of WangDong

> >>>> Sent: Tuesday, May 05, 2015 4:38 PM

> >>>> To: dev@dpdk.org

> >>>> Subject: [dpdk-dev] [PATCH] librte_eal:Using compiler memory barrier for IA processor's rte_wmb/rte_rmb.

> >>>>

> >>>> The current implementation of rte_wmb/rte_rmb for x86 is using processor memory barrier. It's unnessary for IA processor,

> >> compiler

> >>>> memory barrier is enough.

> >>>

> >>> I wouldn't say they are 'unnecessary'.

> >>> There are situations, even on IA, when you need _fence_ isntructions.

> >>> So, please leave rte_*mb() macros unmodified.

> >> OK, leave them unmodified, but I really can't find a situation to use

> >> sfence and lfence instructions.

> >

> > For example:

> > http://bartoszmilewski.com/2008/11/05/who-ordered-memory-fences-on-an-x86/

> > http://dpdk.org/ml/archives/dev/2014-May/002613.html

> >

> >>

> >>

> >>> I still think that we need to create a new set of architecture dependent macros, as what discussed before.

> >>> Probably by analogy with linux kernel rte_smp_*mb() is a good name for them.

> >>> Though if you have some better name in mind, I am open to suggestions here.

> >> What abount rte_dma_*mb()? I find dma_*mb() in linux-4.0.1, it looks good~~

> >

> > Hmm, but why _dma_?

> > We need same thing for multi-core communication too.

> > If rte_smp_ is not good enough, might be: rte_arch_?

> I want these two macro only used in PMD, so I think _dma_ is better. The

> memory barrier of processor-processor maybe more complex, and I'm not

> familiar with it... Someone can add rte_smp_*mb for multi-core.


Sorry, what you are talking about?
At the end, it will use same instructions, whateve we'll name it: _dma_, _smp_, _arch_.
Konstantin

> 

> I think _arch_ is means nothing here, because rte_*mb is already for

> architectures that dpdk supported, they are redefined in these architecture.

> 

> >

> >>

> >>>

> >>>> But if dpdk runing on a AMD processor, maybe we should use processor memory barrier.

> >>>

> >>> As far as I remember, amd has the same memory ordering model.

> >> It's too hard to find a AMD's software developer manual.....

> >

> > There for example:

> > http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/24593_APM_v21.pdf

> > ?

> Search such document on AMD offical website for a long time, this manual

> is what I want, thanks very much!!!

> 

> Dong

> 

> >

> > Konstantin

> >

> >>

> >> Dong

> >>

> >>> So, I don't think we need  #ifdef RTE_ARCH_X86_IA here.

> >>>

> >>> Konstantin

> >>>

> >>>> I add a macro to distinguish them, if we compile DPDK for IA processor, add the macro (RTE_ARCH_X86_IA) can improve

> >> performance

> >>>> with compiler memory barrier. Or we can add RTE_ARCH_X86_AMD for using processor memory barrier, in this case, if didn't

> add

> >> the

> >>>> macro, the memory ordering will not be guaranteed. Which macro is better?

> >>>> If this patch applied, the PMD's old implementation of compiler memory barrier (some volatile variable) can be fixed with

> >> rte_rmb()

> >>>> and rte_wmb() for any architecture.

> >>>>

> >>>> ---

> >>>>    lib/librte_eal/common/include/arch/x86/rte_atomic.h | 10 ++++++++++

> >>>>    1 file changed, 10 insertions(+)

> >>>>

> >>>> diff --git a/lib/librte_eal/common/include/arch/x86/rte_atomic.h b/lib/librte_eal/common/include/arch/x86/rte_atomic.h

> >>>> index e93e8ee..52b1e81 100644

> >>>> --- a/lib/librte_eal/common/include/arch/x86/rte_atomic.h

> >>>> +++ b/lib/librte_eal/common/include/arch/x86/rte_atomic.h

> >>>> @@ -49,10 +49,20 @@ extern "C" {

> >>>>

> >>>>    #define	rte_mb() _mm_mfence()

> >>>>

> >>>> +#ifdef RTE_ARCH_X86_IA

> >>>> +

> >>>> +#define rte_wmb() rte_compiler_barrier()

> >>>> +

> >>>> +#define rte_rmb() rte_compiler_barrier()

> >>>> +

> >>>> +#else

> >>>> +

> >>>>    #define	rte_wmb() _mm_sfence()

> >>>>

> >>>>    #define	rte_rmb() _mm_lfence()

> >>>>

> >>>> +#endif

> >>>> +

> >>>>    /*------------------------- 16 bit atomic operations -------------------------*/

> >>>>

> >>>>    #ifndef RTE_FORCE_INTRINSICS

> >>>> --

> >>>> 1.9.1

> >>>
  
WangDong May 12, 2015, 3:23 p.m. UTC | #6
> Hi Dong,
>
>> -----Original Message-----
>> From: Wang Dong [mailto:dong.wang.pro@hotmail.com]
>> Sent: Saturday, May 09, 2015 11:24 AM
>> To: Ananyev, Konstantin; dev@dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH] librte_eal:Using compiler memory barrier for IA processor's rte_wmb/rte_rmb.
>>
>> Hi Konstantin,
>>
>>>
>>> Hi Dong,
>>>
>>>> -----Original Message-----
>>>> From: Wang Dong [mailto:dong.wang.pro@hotmail.com]
>>>> Sent: Thursday, May 07, 2015 4:28 PM
>>>> To: Ananyev, Konstantin; dev@dpdk.org
>>>> Subject: Re: [dpdk-dev] [PATCH] librte_eal:Using compiler memory barrier for IA processor's rte_wmb/rte_rmb.
>>>>
>>>> Hi Konstantin,
>>>>
>>>>> Hi Dong,
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of WangDong
>>>>>> Sent: Tuesday, May 05, 2015 4:38 PM
>>>>>> To: dev@dpdk.org
>>>>>> Subject: [dpdk-dev] [PATCH] librte_eal:Using compiler memory barrier for IA processor's rte_wmb/rte_rmb.
>>>>>>
>>>>>> The current implementation of rte_wmb/rte_rmb for x86 is using processor memory barrier. It's unnessary for IA processor,
>>>> compiler
>>>>>> memory barrier is enough.
>>>>>
>>>>> I wouldn't say they are 'unnecessary'.
>>>>> There are situations, even on IA, when you need _fence_ isntructions.
>>>>> So, please leave rte_*mb() macros unmodified.
>>>> OK, leave them unmodified, but I really can't find a situation to use
>>>> sfence and lfence instructions.
>>>
>>> For example:
>>> http://bartoszmilewski.com/2008/11/05/who-ordered-memory-fences-on-an-x86/
>>> http://dpdk.org/ml/archives/dev/2014-May/002613.html
>>>
>>>>
>>>>
>>>>> I still think that we need to create a new set of architecture dependent macros, as what discussed before.
>>>>> Probably by analogy with linux kernel rte_smp_*mb() is a good name for them.
>>>>> Though if you have some better name in mind, I am open to suggestions here.
>>>> What abount rte_dma_*mb()? I find dma_*mb() in linux-4.0.1, it looks good~~
>>>
>>> Hmm, but why _dma_?
>>> We need same thing for multi-core communication too.
>>> If rte_smp_ is not good enough, might be: rte_arch_?
>> I want these two macro only used in PMD, so I think _dma_ is better. The
>> memory barrier of processor-processor maybe more complex, and I'm not
>> familiar with it... Someone can add rte_smp_*mb for multi-core.
>
> Sorry, what you are talking about?
> At the end, it will use same instructions, whateve we'll name it: _dma_, _smp_, _arch_.
> Konstantin

Hi Konstantin,

In previous mail, I want to say, both rte_smp_*mb() and rte_dma_*mb() 
can be added, but the context of rte_smp_*mb() is different from 
rte_dma_*mb(), maybe rte_smp_*mb() is for thread and rte_dma_*mb() is 
for PMD. I'm not sure how to implement rte_smp_*mb(), hope it can be 
implemented by other developer. In Linux, I find it same as _dma_, if 
so, they will use same instructions.

linux-4.0.1/arch/x86/include/asm/barrier.h, line 27:
#ifdef CONFIG_X86_PPRO_FENCE
#define dma_rmb()       rmb()
#else
#define dma_rmb()       barrier()
#endif
#define dma_wmb()       barrier()

#ifdef CONFIG_SMP
#define smp_mb()        mb()
#define smp_rmb()       dma_rmb()
#define smp_wmb()       barrier()
#define set_mb(var, value) do { (void)xchg(&var, value); } while (0)
#else /* !SMP */
#define smp_mb()        barrier()
#define smp_rmb()       barrier()
#define smp_wmb()       barrier()
#define set_mb(var, value) do { var = value; barrier(); } while (0)
#endif /* SMP */

Dong
>
>>
>> I think _arch_ is means nothing here, because rte_*mb is already for
>> architectures that dpdk supported, they are redefined in these architecture.
>>
>>>
>>>>
>>>>>
>>>>>> But if dpdk runing on a AMD processor, maybe we should use processor memory barrier.
>>>>>
>>>>> As far as I remember, amd has the same memory ordering model.
>>>> It's too hard to find a AMD's software developer manual.....
>>>
>>> There for example:
>>> http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/24593_APM_v21.pdf
>>> ?
>> Search such document on AMD offical website for a long time, this manual
>> is what I want, thanks very much!!!
>>
>> Dong
>>
>>>
>>> Konstantin
>>>
>>>>
>>>> Dong
>>>>
>>>>> So, I don't think we need  #ifdef RTE_ARCH_X86_IA here.
>>>>>
>>>>> Konstantin
>>>>>
>>>>>> I add a macro to distinguish them, if we compile DPDK for IA processor, add the macro (RTE_ARCH_X86_IA) can improve
>>>> performance
>>>>>> with compiler memory barrier. Or we can add RTE_ARCH_X86_AMD for using processor memory barrier, in this case, if didn't
>> add
>>>> the
>>>>>> macro, the memory ordering will not be guaranteed. Which macro is better?
>>>>>> If this patch applied, the PMD's old implementation of compiler memory barrier (some volatile variable) can be fixed with
>>>> rte_rmb()
>>>>>> and rte_wmb() for any architecture.
>>>>>>
>>>>>> ---
>>>>>>     lib/librte_eal/common/include/arch/x86/rte_atomic.h | 10 ++++++++++
>>>>>>     1 file changed, 10 insertions(+)
>>>>>>
>>>>>> diff --git a/lib/librte_eal/common/include/arch/x86/rte_atomic.h b/lib/librte_eal/common/include/arch/x86/rte_atomic.h
>>>>>> index e93e8ee..52b1e81 100644
>>>>>> --- a/lib/librte_eal/common/include/arch/x86/rte_atomic.h
>>>>>> +++ b/lib/librte_eal/common/include/arch/x86/rte_atomic.h
>>>>>> @@ -49,10 +49,20 @@ extern "C" {
>>>>>>
>>>>>>     #define	rte_mb() _mm_mfence()
>>>>>>
>>>>>> +#ifdef RTE_ARCH_X86_IA
>>>>>> +
>>>>>> +#define rte_wmb() rte_compiler_barrier()
>>>>>> +
>>>>>> +#define rte_rmb() rte_compiler_barrier()
>>>>>> +
>>>>>> +#else
>>>>>> +
>>>>>>     #define	rte_wmb() _mm_sfence()
>>>>>>
>>>>>>     #define	rte_rmb() _mm_lfence()
>>>>>>
>>>>>> +#endif
>>>>>> +
>>>>>>     /*------------------------- 16 bit atomic operations -------------------------*/
>>>>>>
>>>>>>     #ifndef RTE_FORCE_INTRINSICS
>>>>>> --
>>>>>> 1.9.1
>>>>>
  

Patch

diff --git a/lib/librte_eal/common/include/arch/x86/rte_atomic.h b/lib/librte_eal/common/include/arch/x86/rte_atomic.h
index e93e8ee..52b1e81 100644
--- a/lib/librte_eal/common/include/arch/x86/rte_atomic.h
+++ b/lib/librte_eal/common/include/arch/x86/rte_atomic.h
@@ -49,10 +49,20 @@  extern "C" {
 
 #define	rte_mb() _mm_mfence()
 
+#ifdef RTE_ARCH_X86_IA
+
+#define rte_wmb() rte_compiler_barrier()
+
+#define rte_rmb() rte_compiler_barrier()
+
+#else
+
 #define	rte_wmb() _mm_sfence()
 
 #define	rte_rmb() _mm_lfence()
 
+#endif
+
 /*------------------------- 16 bit atomic operations -------------------------*/
 
 #ifndef RTE_FORCE_INTRINSICS