[v3] mempool: micro-optimize put function

Message ID 20221224104630.32235-1-mb@smartsharesystems.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Headers
Series [v3] mempool: micro-optimize put function |

Checks

Context Check Description
ci/checkpatch warning coding style issues
ci/Intel-compilation success Compilation OK
ci/iol-broadcom-Performance success Performance Testing PASS
ci/iol-intel-Functional success Functional Testing PASS
ci/intel-Testing success Testing PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/github-robot: build success github build: passed
ci/iol-abi-testing success Testing PASS
ci/iol-testing success Testing PASS
ci/iol-x86_64-unit-testing success Testing PASS
ci/iol-x86_64-compile-testing success Testing PASS
ci/iol-aarch64-unit-testing success Testing PASS
ci/iol-aarch64-compile-testing success Testing PASS
ci/loongarch-compilation success Compilation OK
ci/loongarch-unit-testing success Unit Testing PASS

Commit Message

Morten Brørup Dec. 24, 2022, 10:46 a.m. UTC
  Micro-optimization:
Reduced the most likely code path in the generic put function by moving an
unlikely check out of the most likely code path and further down.

Also updated the comments in the function.

v3 (feedback from Konstantin Ananyev):
* Removed assertion and comment about the invariant preventing overflow
  in the comparison. They were more confusing than enlightening.
v2 (feedback from Andrew Rybchenko):
* Modified comparison to prevent overflow if n is really huge and len is
  non-zero.
* Added assertion about the invariant preventing overflow in the
  comparison.
* Crossing the threshold is not extremely unlikely, so removed likely()
  from that comparison.
  The compiler will generate code with optimal static branch prediction
  here anyway.

Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>
---
 lib/mempool/rte_mempool.h | 35 ++++++++++++++++++-----------------
 1 file changed, 18 insertions(+), 17 deletions(-)
  

Comments

Andrew Rybchenko Dec. 27, 2022, 8:54 a.m. UTC | #1
On 12/24/22 13:46, Morten Brørup wrote:
> Micro-optimization:
> Reduced the most likely code path in the generic put function by moving an
> unlikely check out of the most likely code path and further down.
> 
> Also updated the comments in the function.
> 
> v3 (feedback from Konstantin Ananyev):
> * Removed assertion and comment about the invariant preventing overflow
>    in the comparison. They were more confusing than enlightening.
> v2 (feedback from Andrew Rybchenko):
> * Modified comparison to prevent overflow if n is really huge and len is
>    non-zero.
> * Added assertion about the invariant preventing overflow in the
>    comparison.
> * Crossing the threshold is not extremely unlikely, so removed likely()
>    from that comparison.
>    The compiler will generate code with optimal static branch prediction
>    here anyway.
> 
> Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> Acked-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>

Thanks for optimizing it further.

Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>

> ---
>   lib/mempool/rte_mempool.h | 35 ++++++++++++++++++-----------------
>   1 file changed, 18 insertions(+), 17 deletions(-)
> 
> diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
> index 9f530db24b..61ca0c6b65 100644
> --- a/lib/mempool/rte_mempool.h
> +++ b/lib/mempool/rte_mempool.h
> @@ -1364,32 +1364,33 @@ rte_mempool_do_generic_put(struct rte_mempool *mp, void * const *obj_table,
>   {
>   	void **cache_objs;
>   
> -	/* No cache provided */
> +	/* No cache provided? */

IMHO such changes do not add value and just add noise.
There are few similar cases below.
No strong opinion in any case.
  
Morten Brørup Dec. 27, 2022, 3:37 p.m. UTC | #2
> From: Andrew Rybchenko [mailto:andrew.rybchenko@oktetlabs.ru]
> Sent: Tuesday, 27 December 2022 09.54
> 
> On 12/24/22 13:46, Morten Brørup wrote:
> > Micro-optimization:
> > Reduced the most likely code path in the generic put function by
> moving an
> > unlikely check out of the most likely code path and further down.
> >
> > Also updated the comments in the function.
> >
> > v3 (feedback from Konstantin Ananyev):
> > * Removed assertion and comment about the invariant preventing
> overflow
> >    in the comparison. They were more confusing than enlightening.
> > v2 (feedback from Andrew Rybchenko):
> > * Modified comparison to prevent overflow if n is really huge and len
> is
> >    non-zero.
> > * Added assertion about the invariant preventing overflow in the
> >    comparison.
> > * Crossing the threshold is not extremely unlikely, so removed
> likely()
> >    from that comparison.
> >    The compiler will generate code with optimal static branch
> prediction
> >    here anyway.
> >
> > Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> > Acked-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>
> 
> Thanks for optimizing it further.
> 
> Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> 
> > ---
> >   lib/mempool/rte_mempool.h | 35 ++++++++++++++++++-----------------
> >   1 file changed, 18 insertions(+), 17 deletions(-)
> >
> > diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
> > index 9f530db24b..61ca0c6b65 100644
> > --- a/lib/mempool/rte_mempool.h
> > +++ b/lib/mempool/rte_mempool.h
> > @@ -1364,32 +1364,33 @@ rte_mempool_do_generic_put(struct rte_mempool
> *mp, void * const *obj_table,
> >   {
> >   	void **cache_objs;
> >
> > -	/* No cache provided */
> > +	/* No cache provided? */
> 
> IMHO such changes do not add value and just add noise.
> There are few similar cases below.
> No strong opinion in any case.
> 

This patch is obsolete, because the zero-copy patch v5 [1] uses the zero-copy put function, which is optimized similarly, in rte_mempool_do_generic_put().

[1]: https://patchwork.dpdk.org/project/dpdk/patch/20221227151700.80887-1-mb@smartsharesystems.com/
  

Patch

diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
index 9f530db24b..61ca0c6b65 100644
--- a/lib/mempool/rte_mempool.h
+++ b/lib/mempool/rte_mempool.h
@@ -1364,32 +1364,33 @@  rte_mempool_do_generic_put(struct rte_mempool *mp, void * const *obj_table,
 {
 	void **cache_objs;
 
-	/* No cache provided */
+	/* No cache provided? */
 	if (unlikely(cache == NULL))
 		goto driver_enqueue;
 
-	/* increment stat now, adding in mempool always success */
+	/* Increment stats now, adding in mempool always succeeds. */
 	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
 	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
 
-	/* The request itself is too big for the cache */
-	if (unlikely(n > cache->flushthresh))
-		goto driver_enqueue_stats_incremented;
-
-	/*
-	 * The cache follows the following algorithm:
-	 *   1. If the objects cannot be added to the cache without crossing
-	 *      the flush threshold, flush the cache to the backend.
-	 *   2. Add the objects to the cache.
-	 */
-
-	if (cache->len + n <= cache->flushthresh) {
+	if (n <= cache->flushthresh - cache->len) {
+		/*
+		 * The objects can be added to the cache without crossing the
+		 * flush threshold.
+		 */
 		cache_objs = &cache->objs[cache->len];
 		cache->len += n;
-	} else {
+	} else if (likely(n <= cache->flushthresh)) {
+		/*
+		 * The request itself fits into the cache.
+		 * But first, the cache must be flushed to the backend, so
+		 * adding the objects does not cross the flush threshold.
+		 */
 		cache_objs = &cache->objs[0];
 		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache->len);
 		cache->len = n;
+	} else {
+		/* The request itself is too big for the cache. */
+		goto driver_enqueue_stats_incremented;
 	}
 
 	/* Add the objects to the cache. */
@@ -1399,13 +1400,13 @@  rte_mempool_do_generic_put(struct rte_mempool *mp, void * const *obj_table,
 
 driver_enqueue:
 
-	/* increment stat now, adding in mempool always success */
+	/* Increment stats now, adding in mempool always succeeds. */
 	RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
 	RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);
 
 driver_enqueue_stats_incremented:
 
-	/* push objects to the backend */
+	/* Push the objects to the backend. */
 	rte_mempool_ops_enqueue_bulk(mp, obj_table, n);
 }