Message ID  20190419212138.174223mattias.ronnblom@ericsson.com 

State  Superseded 
Delegated to:  Thomas Monjalon 
Headers  show 
Series 

Related  show 
Context  Check  Description 

ci/Intelcompilation  success  Compilation OK 
ci/checkpatch  success  coding style OK 
> On Apr 19, 2019, at 5:21 PM, Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote: > > Add a function rte_rand_max() which generates an uniformly distributed > pseudorandom number less than a userspecified upper bound. > > The commonly used pattern rte_rand() % SOME_VALUE, in addition to > being slow, also creates biased results if SOME_VALUE is not a power > of 2. This bias is very small for small SOME_VALUE, but increases > linearly with larger SOME_VALUE. > > Signedoffby: Mattias Rönnblom <mattias.ronnblom@ericsson.com> >  > lib/librte_eal/common/include/rte_random.h  16 ++++++++++++++++ > lib/librte_eal/common/rte_random.c  20 ++++++++++++++++++++ > lib/librte_eal/rte_eal_version.map  1 + > 3 files changed, 37 insertions(+) > > diff git a/lib/librte_eal/common/include/rte_random.h b/lib/librte_eal/common/include/rte_random.h > index 66dfe8ae7..053912168 100644 >  a/lib/librte_eal/common/include/rte_random.h > +++ b/lib/librte_eal/common/include/rte_random.h > @@ 47,6 +47,22 @@ rte_srand(uint64_t seedval); > uint64_t > rte_rand(void); > > +/** > + * Generates a pseudorandom number less than upper_bound. > + * > + * This function returns an uniformly distributed (unbiased) random > + * number lower than a userspecified maximum value. > + * > + * If called from lcore threads, this function is threadsafe. > + * > + * @param upper_bound > + * The upper bound of the generated number. > + * @return > + * A pseudorandom value between 0 and (upper_bound1). > + */ > +uint64_t > +rte_rand_max(uint64_t upper_bound); > + > #ifdef __cplusplus > } > #endif > diff git a/lib/librte_eal/common/rte_random.c b/lib/librte_eal/common/rte_random.c > index 288e7b8c5..bf9240470 100644 >  a/lib/librte_eal/common/rte_random.c > +++ b/lib/librte_eal/common/rte_random.c > @@ 131,6 +131,26 @@ rte_rand(void) > return __rte_rand_lfsr258(state); > } > > +uint64_t __rte_experimental > +rte_rand_max(uint64_t upper_bound) > +{ > + uint8_t zeros; > + uint64_t mask = ~((uint64_t)0); > + uint64_t res; > + > + if (upper_bound < 2) > + return 0; > + > + zeros = __builtin_clzll(upper_bound); > + mask >>= zeros; > + > + do { > + res = rte_rand() & mask; > + } while (unlikely(res >= upper_bound)); My concern here is the numbers of times this loop could be executed as the upper bound could be just over a power of 2 and it is a large number meaning the number of values above upper max and the power of 2 could be huge. Am I looking this loop correctly. If my thought process is correct it could take a long time to get a value less tha n upper max, correct? If every thing aligns in a bad way it could be a large number of loops and cause all kinds of problems. What could be done here or is this a nonissue? > + > + return res; > +} > + > RTE_INIT(rte_rand_init) > { > rte_srand(rte_get_timer_cycles()); > diff git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map > index 0d60668fa..8f5b8c6a6 100644 >  a/lib/librte_eal/rte_eal_version.map > +++ b/lib/librte_eal/rte_eal_version.map > @@ 367,6 +367,7 @@ EXPERIMENTAL { > rte_mp_sendmsg; > rte_option_register; > rte_rand; > + rte_rand_max; > rte_realloc_socket; > rte_service_lcore_attr_get; > rte_service_lcore_attr_reset_all; >  > 2.17.1 > Regards, Keith
On 20190420 23:08, Wiles, Keith wrote: >> +uint64_t __rte_experimental >> +rte_rand_max(uint64_t upper_bound) >> +{ >> + uint8_t zeros; >> + uint64_t mask = ~((uint64_t)0); >> + uint64_t res; >> + >> + if (upper_bound < 2) >> + return 0; >> + >> + zeros = __builtin_clzll(upper_bound); >> + mask >>= zeros; >> + >> + do { >> + res = rte_rand() & mask; >> + } while (unlikely(res >= upper_bound)); > > My concern here is the numbers of times this loop could be executed as the upper bound could be just over a power of 2 and it is a large number meaning the number of values above upper max and the power of 2 could be huge. Am I looking this loop correctly. If my thought process is correct it could take a long time to get a value less tha n upper max, correct? > If the number of values over the limit is huge, so is the value range below it. > If every thing aligns in a bad way it could be a large number of loops and cause all kinds of problems. What could be done here or is this a nonissue? Formally, the execution has no upper bound, but in practice I think this is a nonissue. The cycle cost of an iteration of the loop (including the rte_rand() call) is ~10 cc. With the worstcase upper_limit, the probability of having to redo the rte_rand() call is ~0.5, for every iteration. This is with a pseudorandom number generator featuring an uniform distribution. Let's assume we care about determinism, and we have a system which suffers a catastrophic failure if the rte_rand_max() latency is more than 10k clock cycles. The system performs 10M rte_rand_max() calls per second. The probability of this system running for a year without any rte_rand_max()related crashes is: 0.999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999970568598526 So crashes due to excessive rte_rand_max() latency may occur, but this risk is dwarfed by that of the system being hit by an asteroid. (Btw, I doubt even the people in our sales force have seen that many nines before.) From a performance point of view, the highloopcount cases are rare enough not to pose a serious threat. For example, being forced to redo rte_rand() more than five times is only a ~3% risk.
Sent from my iPhone > On Apr 21, 2019, at 2:05 PM, Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote: > > On 20190420 23:08, Wiles, Keith wrote: > >>> +uint64_t __rte_experimental >>> +rte_rand_max(uint64_t upper_bound) >>> +{ >>> + uint8_t zeros; >>> + uint64_t mask = ~((uint64_t)0); >>> + uint64_t res; >>> + >>> + if (upper_bound < 2) >>> + return 0; >>> + >>> + zeros = __builtin_clzll(upper_bound); >>> + mask >>= zeros; >>> + >>> + do { >>> + res = rte_rand() & mask; >>> + } while (unlikely(res >= upper_bound)); >> My concern here is the numbers of times this loop could be executed as the upper bound could be just over a power of 2 and it is a large number meaning the number of values above upper max and the power of 2 could be huge. Am I looking this loop correctly. If my thought process is correct it could take a long time to get a value less tha n upper max, correct? > > If the number of values over the limit is huge, so is the value range below it. > >> If every thing aligns in a bad way it could be a large number of loops and cause all kinds of problems. What could be done here or is this a nonissue? > > Formally, the execution has no upper bound, but in practice I think this is a nonissue. > > The cycle cost of an iteration of the loop (including the rte_rand() call) is ~10 cc. With the worstcase upper_limit, the probability of having to redo the rte_rand() call is ~0.5, for every iteration. This is with a pseudorandom number generator featuring an uniform distribution. > > Let's assume we care about determinism, and we have a system which suffers a catastrophic failure if the rte_rand_max() latency is more than 10k clock cycles. The system performs 10M rte_rand_max() calls per second. > > The probability of this system running for a year without any rte_rand_max()related crashes is: > 0.999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999970568598526 > > So crashes due to excessive rte_rand_max() latency may occur, but this risk is dwarfed by that of the system being hit by an asteroid. > > (Btw, I doubt even the people in our sales force have seen that many nines before.) > > From a performance point of view, the highloopcount cases are rare enough not to pose a serious threat. For example, being forced to redo rte_rand() more than five times is only a ~3% risk. Even a few loops can have an effect on performance when we are talking about microseconds plus it leads to indeterminate results. The numbers you reported here are interesting, but I would be happier if you added a limit to the loop. If you state the likely hood of doing 5 loops is only 3% then adding a loop limit would be reasonable, right?
On 20190422 06:33, Wiles, Keith wrote: >> From a performance point of view, the highloopcount cases are rare enough not to pose a serious threat. For example, being forced to redo rte_rand() more than five times is only a ~3% risk. > > Even a few loops can have an effect on performance when we are talking about microseconds plus it leads to indeterminate results. The numbers you reported here are interesting, but I would be happier if you added a limit to the loop. If you state the likely hood of doing 5 loops is only 3% then adding a loop limit would be reasonable, right? > Probability is already effectively putting a limit to the loop. The risk of being stuck for >1us is p=~6e73. The variations in execution times will in most cases be less than a LLC miss. A loop variable will not have any effect on performance, pollute the code, and hurt uniformity. Here's what rte_rand_max() performance looks like on my Skylake. Average rte_rand_max() latency with worstcase upper_bound: rte_rand_max() w/o loop limit: 47 cc rte_rand_max() w/ max 8 retries: 49 cc rte_rand_max() w/ max 4 retries: 47 cc rte_rand_max() w/ max 2 retries: 40 cc So you need to be very aggressive in limiting the loop count for that loop variable to pay off. Otherwise, you will just be at a loss, doing all that bookkeeping which very rarely turns out to be useful.
> On Apr 22, 2019, at 2:07 AM, Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote: > > On 20190422 06:33, Wiles, Keith wrote: > >>> From a performance point of view, the highloopcount cases are rare enough not to pose a serious threat. For example, being forced to redo rte_rand() more than five times is only a ~3% risk. >> Even a few loops can have an effect on performance when we are talking about microseconds plus it leads to indeterminate results. The numbers you reported here are interesting, but I would be happier if you added a limit to the loop. If you state the likely hood of doing 5 loops is only 3% then adding a loop limit would be reasonable, right? > > Probability is already effectively putting a limit to the loop. The risk of being stuck for >1us is p=~6e73. The variations in execution times will in most cases be less than a LLC miss. > > A loop variable will not have any effect on performance, pollute the code, and hurt uniformity. > > Here's what rte_rand_max() performance looks like on my Skylake. > > Average rte_rand_max() latency with worstcase upper_bound: > rte_rand_max() w/o loop limit: 47 cc > rte_rand_max() w/ max 8 retries: 49 cc > rte_rand_max() w/ max 4 retries: 47 cc > rte_rand_max() w/ max 2 retries: 40 cc > > So you need to be very aggressive in limiting the loop count for that loop variable to pay off. Otherwise, you will just be at a loss, doing all that bookkeeping which very rarely turns out to be useful. The adding of a loop counter is not a performance problem (as you stated only a few cycles are added) and it does not hurt the readability of the code. As for hurting uniformity I do not see that being a problem, as most loops have a visible way to terminate the loop instead of relying on a random number generator to give a valid value in a few loops. It is only reasonable IMO to limit the loop. But I will let it go and let you add this code to DPDK if someone else wants to ACK it. Regards, Keith
diff git a/lib/librte_eal/common/include/rte_random.h b/lib/librte_eal/common/include/rte_random.h index 66dfe8ae7..053912168 100644  a/lib/librte_eal/common/include/rte_random.h +++ b/lib/librte_eal/common/include/rte_random.h @@ 47,6 +47,22 @@ rte_srand(uint64_t seedval); uint64_t rte_rand(void); +/** + * Generates a pseudorandom number less than upper_bound. + * + * This function returns an uniformly distributed (unbiased) random + * number lower than a userspecified maximum value. + * + * If called from lcore threads, this function is threadsafe. + * + * @param upper_bound + * The upper bound of the generated number. + * @return + * A pseudorandom value between 0 and (upper_bound1). + */ +uint64_t +rte_rand_max(uint64_t upper_bound); + #ifdef __cplusplus } #endif diff git a/lib/librte_eal/common/rte_random.c b/lib/librte_eal/common/rte_random.c index 288e7b8c5..bf9240470 100644  a/lib/librte_eal/common/rte_random.c +++ b/lib/librte_eal/common/rte_random.c @@ 131,6 +131,26 @@ rte_rand(void) return __rte_rand_lfsr258(state); } +uint64_t __rte_experimental +rte_rand_max(uint64_t upper_bound) +{ + uint8_t zeros; + uint64_t mask = ~((uint64_t)0); + uint64_t res; + + if (upper_bound < 2) + return 0; + + zeros = __builtin_clzll(upper_bound); + mask >>= zeros; + + do { + res = rte_rand() & mask; + } while (unlikely(res >= upper_bound)); + + return res; +} + RTE_INIT(rte_rand_init) { rte_srand(rte_get_timer_cycles()); diff git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map index 0d60668fa..8f5b8c6a6 100644  a/lib/librte_eal/rte_eal_version.map +++ b/lib/librte_eal/rte_eal_version.map @@ 367,6 +367,7 @@ EXPERIMENTAL { rte_mp_sendmsg; rte_option_register; rte_rand; + rte_rand_max; rte_realloc_socket; rte_service_lcore_attr_get; rte_service_lcore_attr_reset_all;
Add a function rte_rand_max() which generates an uniformly distributed pseudorandom number less than a userspecified upper bound. The commonly used pattern rte_rand() % SOME_VALUE, in addition to being slow, also creates biased results if SOME_VALUE is not a power of 2. This bias is very small for small SOME_VALUE, but increases linearly with larger SOME_VALUE. Signedoffby: Mattias Rönnblom <mattias.ronnblom@ericsson.com>  lib/librte_eal/common/include/rte_random.h  16 ++++++++++++++++ lib/librte_eal/common/rte_random.c  20 ++++++++++++++++++++ lib/librte_eal/rte_eal_version.map  1 + 3 files changed, 37 insertions(+)