[v2] mbuf: optimize memory loads during mbuf freeing

Message ID 1584719715-17818-1-git-send-email-akozyrev@mellanox.com (mailing list archive)
State Accepted, archived
Delegated to: Thomas Monjalon
Headers
Series [v2] mbuf: optimize memory loads during mbuf freeing |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/iol-intel-Performance fail Performance Testing issues
ci/Intel-compilation success Compilation OK
ci/iol-mellanox-Performance success Performance Testing PASS
ci/travis-robot success Travis build: passed
ci/iol-testing success Testing PASS

Commit Message

Alexander Kozyrev March 20, 2020, 3:55 p.m. UTC
  Introduction of pinned external buffers doubled memory loads in the
rte_pktmbuf_prefree_seg() function. Analysis of the generated assembly
code shows unnecessary load of the pool field of the rte_mbuf structure.
Here is the snippet of the assembly for "if (!RTE_MBUF_DIRECT(m))":
Before the change the code was:
	movq  0x18(%rbx), %rax // load the ol_flags field
	test %r13, %rax	       // check if ol_flags equals to 0x60...0
	jz 0x9a8718 <Block 2>  // jump out to "if (m->next != NULL)"
After the change the code became:
	movq  0x18(%rbx), %rax // load ol_flags
	test %r14, %rax	       // check if ol_flags equals to 0x60...0
	jnz 0x9bea38 <Block 2> // jump in to "if (!RTE_MBUF_HAS_EXTBUF(m)"
	movq  0x48(%rbx), %rax // load the pool field
	jmp 0x9bea78 <Block 7> // jump out to "if (m->next != NULL)"
Look like this absolutely unneeded memory load of the pool field is an
optimization for the external buffer case in GCC (4.8.5), since Clang
generates the same assembly for both before and after the change versions.
Plus, GCC favors the external buffer case over the simple case.
This assembly code layout causes the performance degradation because the
rte_pktmbuf_prefree_seg() function is a part of a very hot path.
Workaround this compilation issue by moving the check for pinned buffer
apart from the check for external buffer and restore the initial code
flow that favors the direct mbuf case over the external one.

Fixes: 6ef1107ad4c6 ("mbuf: detach mbuf with pinned external buffer")
Cc: stable@dpdk.org

Signed-off-by: Alexander Kozyrev <akozyrev@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 lib/librte_mbuf/rte_mbuf.h | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)
  

Comments

Olivier Matz March 27, 2020, 8:13 a.m. UTC | #1
Hi,

On Fri, Mar 20, 2020 at 03:55:15PM +0000, Alexander Kozyrev wrote:
> Introduction of pinned external buffers doubled memory loads in the
> rte_pktmbuf_prefree_seg() function. Analysis of the generated assembly
> code shows unnecessary load of the pool field of the rte_mbuf structure.
> Here is the snippet of the assembly for "if (!RTE_MBUF_DIRECT(m))":
> Before the change the code was:
> 	movq  0x18(%rbx), %rax // load the ol_flags field
> 	test %r13, %rax	       // check if ol_flags equals to 0x60...0
> 	jz 0x9a8718 <Block 2>  // jump out to "if (m->next != NULL)"
> After the change the code became:
> 	movq  0x18(%rbx), %rax // load ol_flags
> 	test %r14, %rax	       // check if ol_flags equals to 0x60...0
> 	jnz 0x9bea38 <Block 2> // jump in to "if (!RTE_MBUF_HAS_EXTBUF(m)"
> 	movq  0x48(%rbx), %rax // load the pool field
> 	jmp 0x9bea78 <Block 7> // jump out to "if (m->next != NULL)"
> Look like this absolutely unneeded memory load of the pool field is an
> optimization for the external buffer case in GCC (4.8.5), since Clang
> generates the same assembly for both before and after the change versions.
> Plus, GCC favors the external buffer case over the simple case.
> This assembly code layout causes the performance degradation because the
> rte_pktmbuf_prefree_seg() function is a part of a very hot path.
> Workaround this compilation issue by moving the check for pinned buffer
> apart from the check for external buffer and restore the initial code
> flow that favors the direct mbuf case over the external one.
> 
> Fixes: 6ef1107ad4c6 ("mbuf: detach mbuf with pinned external buffer")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Alexander Kozyrev <akozyrev@mellanox.com>
> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

Acked-by: Olivier Matz <olivier.matz@6wind.com>

Thanks!
  
Thomas Monjalon March 31, 2020, 1:46 a.m. UTC | #2
27/03/2020 09:13, Olivier Matz:
> On Fri, Mar 20, 2020 at 03:55:15PM +0000, Alexander Kozyrev wrote:
> > Introduction of pinned external buffers doubled memory loads in the
> > rte_pktmbuf_prefree_seg() function. Analysis of the generated assembly
> > code shows unnecessary load of the pool field of the rte_mbuf structure.
> > Here is the snippet of the assembly for "if (!RTE_MBUF_DIRECT(m))":
> > Before the change the code was:
> > 	movq  0x18(%rbx), %rax // load the ol_flags field
> > 	test %r13, %rax	       // check if ol_flags equals to 0x60...0
> > 	jz 0x9a8718 <Block 2>  // jump out to "if (m->next != NULL)"
> > After the change the code became:
> > 	movq  0x18(%rbx), %rax // load ol_flags
> > 	test %r14, %rax	       // check if ol_flags equals to 0x60...0
> > 	jnz 0x9bea38 <Block 2> // jump in to "if (!RTE_MBUF_HAS_EXTBUF(m)"
> > 	movq  0x48(%rbx), %rax // load the pool field
> > 	jmp 0x9bea78 <Block 7> // jump out to "if (m->next != NULL)"
> > Look like this absolutely unneeded memory load of the pool field is an
> > optimization for the external buffer case in GCC (4.8.5), since Clang
> > generates the same assembly for both before and after the change versions.
> > Plus, GCC favors the external buffer case over the simple case.
> > This assembly code layout causes the performance degradation because the
> > rte_pktmbuf_prefree_seg() function is a part of a very hot path.
> > Workaround this compilation issue by moving the check for pinned buffer
> > apart from the check for external buffer and restore the initial code
> > flow that favors the direct mbuf case over the external one.
> > 
> > Fixes: 6ef1107ad4c6 ("mbuf: detach mbuf with pinned external buffer")
> > Cc: stable@dpdk.org
> > 
> > Signed-off-by: Alexander Kozyrev <akozyrev@mellanox.com>
> > Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> 
> Acked-by: Olivier Matz <olivier.matz@6wind.com>
> 
> Thanks!

Applied, thanks
  

Patch

diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 34679e0..f8e492e 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -1335,10 +1335,10 @@  static inline int __rte_pktmbuf_pinned_extbuf_decref(struct rte_mbuf *m)
 	if (likely(rte_mbuf_refcnt_read(m) == 1)) {
 
 		if (!RTE_MBUF_DIRECT(m)) {
-			if (!RTE_MBUF_HAS_EXTBUF(m) ||
-			    !RTE_MBUF_HAS_PINNED_EXTBUF(m))
-				rte_pktmbuf_detach(m);
-			else if (__rte_pktmbuf_pinned_extbuf_decref(m))
+			rte_pktmbuf_detach(m);
+			if (RTE_MBUF_HAS_EXTBUF(m) &&
+			    RTE_MBUF_HAS_PINNED_EXTBUF(m) &&
+			    __rte_pktmbuf_pinned_extbuf_decref(m))
 				return NULL;
 		}
 
@@ -1352,10 +1352,10 @@  static inline int __rte_pktmbuf_pinned_extbuf_decref(struct rte_mbuf *m)
 	} else if (__rte_mbuf_refcnt_update(m, -1) == 0) {
 
 		if (!RTE_MBUF_DIRECT(m)) {
-			if (!RTE_MBUF_HAS_EXTBUF(m) ||
-			    !RTE_MBUF_HAS_PINNED_EXTBUF(m))
-				rte_pktmbuf_detach(m);
-			else if (__rte_pktmbuf_pinned_extbuf_decref(m))
+			rte_pktmbuf_detach(m);
+			if (RTE_MBUF_HAS_EXTBUF(m) &&
+			    RTE_MBUF_HAS_PINNED_EXTBUF(m) &&
+			    __rte_pktmbuf_pinned_extbuf_decref(m))
 				return NULL;
 		}