Message ID | 20210720035125.14214-5-joyce.kong@arm.com (mailing list archive) |
---|---|
State | Accepted, archived |
Delegated to: | David Marchand |
Series | use compiler atomic builtins for test |

Context | Check | Description |
---|---|---|
ci/checkpatch | success | coding style OK |
Hi Joyce,

On Mon, Jul 19, 2021 at 10:51:21PM -0500, Joyce Kong wrote:
> Convert rte_atomic usages to compiler atomic built-ins for lcores
> sync in mcslock testcases.
>
> Signed-off-by: Joyce Kong <joyce.kong@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> Acked-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
>  app/test/test_mcslock.c | 14 ++++++--------
>  1 file changed, 6 insertions(+), 8 deletions(-)
>
> diff --git a/app/test/test_mcslock.c b/app/test/test_mcslock.c
> index 80eaecc90a..52e45e7e2a 100644
> --- a/app/test/test_mcslock.c
> +++ b/app/test/test_mcslock.c
> @@ -17,7 +17,6 @@
>  #include <rte_lcore.h>
>  #include <rte_cycles.h>
>  #include <rte_mcslock.h>
> -#include <rte_atomic.h>
>
>  #include "test.h"
>
> @@ -43,7 +42,7 @@ rte_mcslock_t *p_ml_perf;
>
>  static unsigned int count;
>
> -static rte_atomic32_t synchro;
> +static uint32_t synchro;
>
>  static int
>  test_mcslock_per_core(__rte_unused void *arg)
> @@ -76,8 +75,7 @@ load_loop_fn(void *func_param)
>  	rte_mcslock_t ml_perf_me;
>
>  	/* wait synchro */
> -	while (rte_atomic32_read(&synchro) == 0)
> -		;
> +	rte_wait_until_equal_32(&synchro, 1, __ATOMIC_RELAXED);
>
>  	begin = rte_get_timer_cycles();
>  	while (lcount < MAX_LOOP) {
> @@ -102,15 +100,15 @@ test_mcslock_perf(void)
>  	const unsigned int lcore = rte_lcore_id();
>
>  	printf("\nTest with no lock on single core...\n");
> -	rte_atomic32_set(&synchro, 1);
> +	__atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);
>  	load_loop_fn(&lock);
>  	printf("Core [%u] Cost Time = %"PRIu64" us\n",
>  			lcore, time_count[lcore]);
>  	memset(time_count, 0, sizeof(time_count));
>
>  	printf("\nTest with lock on single core...\n");
> +	__atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);
>  	lock = 1;
> -	rte_atomic32_set(&synchro, 1);

nit: is there a reason for moving this line?

>  	load_loop_fn(&lock);
>  	printf("Core [%u] Cost Time = %"PRIu64" us\n",
>  			lcore, time_count[lcore]);
> @@ -118,11 +116,11 @@ test_mcslock_perf(void)
>
>  	printf("\nTest with lock on %u cores...\n", (rte_lcore_count()));
>
> -	rte_atomic32_set(&synchro, 0);
> +	__atomic_store_n(&synchro, 0, __ATOMIC_RELAXED);
>  	rte_eal_mp_remote_launch(load_loop_fn, &lock, SKIP_MAIN);
>
>  	/* start synchro and launch test on main */
> -	rte_atomic32_set(&synchro, 1);
> +	__atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);
>  	load_loop_fn(&lock);

I have a more general question. Please forgive my ignorance about the
C++11 atomic builtins and memory model. Both the gcc manual and the C11
standard are not that easy to understand :)

In all the patches of this patchset, __ATOMIC_RELAXED is used. My
understanding is that it does not add any inter-thread ordering
constraint. I suppose that in this particular case, we rely on the call
to rte_eal_mp_remote_launch() being a compiler barrier, and on the
function itself being a memory barrier. This ensures that worker threads
see synchro=0 until it is set to 1 by the master.
Is it correct?

What is the reason for using the atomic API here? Wouldn't a plain
assignment work too? (I mean "synchro = 1;")

>
>  	rte_eal_mp_wait_lcore();
> --
> 2.17.1
>
Hi Olivier,

> -----Original Message-----
> From: Olivier Matz <olivier.matz@6wind.com>
> Sent: Wednesday, July 28, 2021 5:57 PM
> To: Joyce Kong <Joyce.Kong@arm.com>
> Cc: thomas@monjalon.net; david.marchand@redhat.com;
> roretzla@linux.microsoft.com; stephen@networkplumber.org;
> andrew.rybchenko@oktetlabs.ru; harry.van.haaren@intel.com; Honnappa
> Nagarahalli <Honnappa.Nagarahalli@arm.com>; Ruifeng Wang
> <Ruifeng.Wang@arm.com>; dev@dpdk.org; nd <nd@arm.com>
> Subject: Re: [PATCH v3 4/8] test/mcslock: use compiler atomics for lcores
> sync
>
> Hi Joyce,
>
> On Mon, Jul 19, 2021 at 10:51:21PM -0500, Joyce Kong wrote:
> > Convert rte_atomic usages to compiler atomic built-ins for lcores sync
> > in mcslock testcases.
> >
> > [...]
> >
> >  	printf("\nTest with lock on single core...\n");
> > +	__atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);
> >  	lock = 1;
> > -	rte_atomic32_set(&synchro, 1);
>
> nit: is there a reason for moving this line?

I meant to use __atomic_store_n() instead of rte_atomic32_set() to set
synchro, but put the operation on the line above 'lock = 1' by mistake.
I will change it.

> > [...]
> >
> >  	/* start synchro and launch test on main */
> > -	rte_atomic32_set(&synchro, 1);
> > +	__atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);
> >  	load_loop_fn(&lock);
>
> I have a more general question. Please forgive my ignorance about the
> C++11 atomic builtins and memory model. Both the gcc manual and the C11
> standard are not that easy to understand :)
>
> In all the patches of this patchset, __ATOMIC_RELAXED is used. My
> understanding is that it does not add any inter-thread ordering
> constraint. I suppose that in this particular case, we rely on the call
> to rte_eal_mp_remote_launch() being a compiler barrier, and on the
> function itself being a memory barrier. This ensures that worker threads
> see synchro=0 until it is set to 1 by the master.
> Is it correct?

Yes, you are right. __ATOMIC_RELAXED introduces no barrier, and the
worker threads sync with the master thread through 'synchro'.

> What is the reason for using the atomic API here? Wouldn't a plain
> assignment work too? (I mean "synchro = 1;")

Here, __atomic_store_n(__ATOMIC_RELAXED) is used to ensure worker
threads see 'synchro=1' after it is changed by the master. A plain
assignment cannot ensure that worker threads get the new value.
On Thu, Jul 29, 2021 at 07:19:13AM +0000, Joyce Kong wrote:
> Hi Olivier,
>
> [...]
>
> > In all the patches of this patchset, __ATOMIC_RELAXED is used. My
> > understanding is that it does not add any inter-thread ordering
> > constraint. I suppose that in this particular case, we rely on the call
> > to rte_eal_mp_remote_launch() being a compiler barrier, and on the
> > function itself being a memory barrier. This ensures that worker
> > threads see synchro=0 until it is set to 1 by the master.
> > Is it correct?
>
> Yes, you are right. __ATOMIC_RELAXED introduces no barrier, and the
> worker threads sync with the master thread through 'synchro'.
>
> > What is the reason for using the atomic API here? Wouldn't a plain
> > assignment work too? (I mean "synchro = 1;")
>
> Here, __atomic_store_n(__ATOMIC_RELAXED) is used to ensure worker
> threads see 'synchro=1' after it is changed by the master. A plain
> assignment cannot ensure that worker threads get the new value.

So, if I understand correctly, using __atomic_store_n() acts as if the
variable is volatile, and this is indeed needed to ensure visibility to
the worker threads. I did some tests to convince myself:
https://godbolt.org/z/3qWYeneGf

Thank you for the clarification.
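The comparison between a plain variable, a volatile one and a relaxed
atomic load can be sketched as three spin loops suitable for pasting
into a compiler explorer. This is an independent illustration, not the
content of the godbolt link above, and every name in it is hypothetical.
The assumption being illustrated is that a compiler is allowed to read
a plain variable once and keep the result for the whole loop, whereas
volatile and atomic loads are reloaded on every iteration by current
compilers. A plain variable written concurrently from another thread is
also a data race under the C11 memory model.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flags, one per access style being compared. */
uint32_t flag_plain;
volatile uint32_t flag_volatile;
uint32_t flag_atomic;

/*
 * Plain load: nothing in the loop modifies flag_plain, so an optimizing
 * compiler may load it once and never look at memory again.
 */
uint32_t
spin_plain(void)
{
	uint32_t spins = 0;

	while (flag_plain == 0)
		spins++;
	return spins;
}

/* Volatile load: every iteration must perform a real memory read. */
uint32_t
spin_volatile(void)
{
	uint32_t spins = 0;

	while (flag_volatile == 0)
		spins++;
	return spins;
}

/* Relaxed atomic load: reloaded each iteration, no ordering implied. */
uint32_t
spin_atomic(void)
{
	uint32_t spins = 0;

	while (__atomic_load_n(&flag_atomic, __ATOMIC_RELAXED) == 0)
		spins++;
	return spins;
}
```

Comparing the generated code for the three functions at an optimization
level such as -O2 is one way to check how a given compiler treats each
case. Only the volatile and atomic variants are safe to run against a
flag that another thread sets later.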
diff --git a/app/test/test_mcslock.c b/app/test/test_mcslock.c
index 80eaecc90a..52e45e7e2a 100644
--- a/app/test/test_mcslock.c
+++ b/app/test/test_mcslock.c
@@ -17,7 +17,6 @@
 #include <rte_lcore.h>
 #include <rte_cycles.h>
 #include <rte_mcslock.h>
-#include <rte_atomic.h>
 
 #include "test.h"
 
@@ -43,7 +42,7 @@ rte_mcslock_t *p_ml_perf;
 
 static unsigned int count;
 
-static rte_atomic32_t synchro;
+static uint32_t synchro;
 
 static int
 test_mcslock_per_core(__rte_unused void *arg)
@@ -76,8 +75,7 @@ load_loop_fn(void *func_param)
 	rte_mcslock_t ml_perf_me;
 
 	/* wait synchro */
-	while (rte_atomic32_read(&synchro) == 0)
-		;
+	rte_wait_until_equal_32(&synchro, 1, __ATOMIC_RELAXED);
 
 	begin = rte_get_timer_cycles();
 	while (lcount < MAX_LOOP) {
@@ -102,15 +100,15 @@ test_mcslock_perf(void)
 	const unsigned int lcore = rte_lcore_id();
 
 	printf("\nTest with no lock on single core...\n");
-	rte_atomic32_set(&synchro, 1);
+	__atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);
 	load_loop_fn(&lock);
 	printf("Core [%u] Cost Time = %"PRIu64" us\n",
 			lcore, time_count[lcore]);
 	memset(time_count, 0, sizeof(time_count));
 
 	printf("\nTest with lock on single core...\n");
+	__atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);
 	lock = 1;
-	rte_atomic32_set(&synchro, 1);
 	load_loop_fn(&lock);
 	printf("Core [%u] Cost Time = %"PRIu64" us\n",
 			lcore, time_count[lcore]);
@@ -118,11 +116,11 @@ test_mcslock_perf(void)
 
 	printf("\nTest with lock on %u cores...\n", (rte_lcore_count()));
 
-	rte_atomic32_set(&synchro, 0);
+	__atomic_store_n(&synchro, 0, __ATOMIC_RELAXED);
 	rte_eal_mp_remote_launch(load_loop_fn, &lock, SKIP_MAIN);
 
 	/* start synchro and launch test on main */
-	rte_atomic32_set(&synchro, 1);
+	__atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);
 	load_loop_fn(&lock);
 
 	rte_eal_mp_wait_lcore();
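
The start-line pattern this patch converts can be tried outside DPDK.
The following is a minimal sketch, not the test's code: it assumes POSIX
threads in place of lcores, and wait_until_equal_32(), run_sync_demo(),
NUM_WORKERS and the started counter are hypothetical names introduced
for illustration. The stand-in wait loop only approximates
rte_wait_until_equal_32() as a spin on an atomic load; the real function
may use architecture-specific wait instructions.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_WORKERS 4		/* hypothetical worker count */

static uint32_t synchro;	/* start flag, as in the patch */
static uint32_t started;	/* illustrative count of released workers */

/* Stand-in for rte_wait_until_equal_32(): spin on an atomic load. */
static void
wait_until_equal_32(uint32_t *addr, uint32_t expected, int memorder)
{
	assert(memorder == __ATOMIC_ACQUIRE || memorder == __ATOMIC_RELAXED);
	while (__atomic_load_n(addr, memorder) != expected)
		;
}

static void *
worker(void *arg)
{
	(void)arg;
	/* wait synchro */
	wait_until_equal_32(&synchro, 1, __ATOMIC_RELAXED);
	__atomic_fetch_add(&started, 1, __ATOMIC_RELAXED);
	return NULL;
}

/*
 * Launch the workers, release them with one relaxed store, join them,
 * and return how many passed the start line.
 */
static uint32_t
run_sync_demo(void)
{
	pthread_t tid[NUM_WORKERS];
	uint32_t created = 0;
	uint32_t i;

	__atomic_store_n(&synchro, 0, __ATOMIC_RELAXED);
	__atomic_store_n(&started, 0, __ATOMIC_RELAXED);

	for (i = 0; i < NUM_WORKERS; i++) {
		if (pthread_create(&tid[i], NULL, worker, NULL) != 0)
			break;
		created++;
	}

	/* start synchro: workers spinning on the flag may now proceed */
	__atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);

	for (i = 0; i < created; i++)
		pthread_join(tid[i], NULL);

	return __atomic_load_n(&started, __ATOMIC_RELAXED);
}
```

Calling run_sync_demo() should return NUM_WORKERS once every worker has
observed the store. Relaxed ordering is enough here because the flag
carries no other data with it; thread creation and join provide the
remaining ordering. Build with -pthread where the toolchain requires it.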