[v1,2/4] mbuf: add second dynamic field member for VA only build

Message ID 20220829151626.2101336-3-sthotton@marvell.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Headers
Series mbuf dynamic field expansion

Checks

Context Check Description
ci/checkpatch warning coding style issues

Commit Message

Shijith Thotton Aug. 29, 2022, 3:16 p.m. UTC
  The mbuf physical address field is not used in builds which use only VA. Use
this field instead to expand the dynamic field area.

Signed-off-by: Shijith Thotton <sthotton@marvell.com>
---
 lib/mbuf/rte_mbuf_core.h | 26 +++++++++++++++++---------
 lib/mbuf/rte_mbuf_dyn.c  |  2 ++
 2 files changed, 19 insertions(+), 9 deletions(-)
  

Comments

Morten Brørup Aug. 29, 2022, 6:32 p.m. UTC | #1
> From: Shijith Thotton [mailto:sthotton@marvell.com]
> Sent: Monday, 29 August 2022 17.16
> 
> mbuf physical address field is not used in builds which only uses VA.
> It is used to expand the dynamic field area.
> 
> Signed-off-by: Shijith Thotton <sthotton@marvell.com>
> ---
>  lib/mbuf/rte_mbuf_core.h | 26 +++++++++++++++++---------
>  lib/mbuf/rte_mbuf_dyn.c  |  2 ++
>  2 files changed, 19 insertions(+), 9 deletions(-)
> 
> diff --git a/lib/mbuf/rte_mbuf_core.h b/lib/mbuf/rte_mbuf_core.h
> index 81cb07c2e4..98ce62fd6a 100644
> --- a/lib/mbuf/rte_mbuf_core.h
> +++ b/lib/mbuf/rte_mbuf_core.h
> @@ -579,15 +579,23 @@ struct rte_mbuf {
>  	RTE_MARKER cacheline0;
> 
> 	void *buf_addr;           /**< Virtual address of segment buffer. */
> -	/**
> -	 * Physical address of segment buffer.
> -	 * This field is invalid if the build is configured to use only
> -	 * virtual address as IOVA (i.e. RTE_IOVA_AS_VA is defined).
> -	 * Force alignment to 8-bytes, so as to ensure we have the exact
> -	 * same mbuf cacheline0 layout for 32-bit and 64-bit. This makes
> -	 * working on vector drivers easier.
> -	 */
> -	rte_iova_t buf_iova __rte_aligned(sizeof(rte_iova_t));
> +	RTE_STD_C11
> +	union {
> +		/**
> +		 * Physical address of segment buffer.
> +		 * This field is invalid if the build is configured to use only
> +		 * virtual address as IOVA (i.e. RTE_IOVA_AS_VA is defined).
> +		 * Force alignment to 8-bytes, so as to ensure we have the exact
> +		 * same mbuf cacheline0 layout for 32-bit and 64-bit. This makes
> +		 * working on vector drivers easier.
> +		 */
> +		rte_iova_t buf_iova __rte_aligned(sizeof(rte_iova_t));
> +		/**
> +		 * Reserved for dynamic field in builds where physical address
> +		 * field is invalid.
> +		 */
> +		uint64_t dynfield2;
> +	};
> 
>  	/* next 8 bytes are initialised on RX descriptor rearm */
>  	RTE_MARKER64 rearm_data;

I know that the intention here is to keep the rte_mbuf structure intact, which will certainly improve the probability of getting this patch series into DPDK.

So, I will add a comment for the benefit of the other participants in the discussion:

With this patch, and in RTE_IOVA_AS_VA mode, it becomes possible to move m->next into the first cache line, so rte_pktmbuf_prefree_seg() does not have to touch the second cache line, thus potentially improving performance by eliminating one cache miss per freed packet segment. (I also recall someone mentioning that some PMDs set m->next on RX... If that is the case, a cache miss per packet might also be avoidable in those PMDs.)

Obviously, moving m->next to the first cache line is not related to this patch series, but would belong in a completely different patch.
  
Bruce Richardson Aug. 30, 2022, 8:35 a.m. UTC | #2
On Mon, Aug 29, 2022 at 08:32:20PM +0200, Morten Brørup wrote:
> 
> > From: Shijith Thotton [mailto:sthotton@marvell.com]
> > Sent: Monday, 29 August 2022 17.16
> > 
> > mbuf physical address field is not used in builds which only uses VA.
> > It is used to expand the dynamic field area.
> > 
> > Signed-off-by: Shijith Thotton <sthotton@marvell.com>
> > ---
> >  lib/mbuf/rte_mbuf_core.h | 26 +++++++++++++++++---------
> >  lib/mbuf/rte_mbuf_dyn.c  |  2 ++
> >  2 files changed, 19 insertions(+), 9 deletions(-)
> > 
> > <snip>
> 
> I know that the intention here is to keep the rte_mbuf structure intact, which will certainly improve the probability of getting this patch series into DPDK.
> 
> So, I will add a comment for the benefit of the other participants in the discussion:
> 
> With this patch, and in RTE_IOVA_AS_VA mode, it becomes possible to move m->next into the first cache line, so rte_pktmbuf_prefree_seg() does not have to touch the second cache line, thus potentially improving performance by eliminating one cache miss per freed packet segment. (I also recall someone mentioning that some PMDs set m->next on RX... If that is the case, a cache miss per packet might also be avoidable in those PMDs.)
> 
> Obviously, moving m->next to the first cache line is not related to this patch series, but would belong in a completely different patch.
>

+1 to that, with the exception that if it is decided to move the next
pointer rather than use this as dynamic space, I think it *should* be in
this patch series, rather than mucking about with mbuf twice. :-)
  
Pavan Nikhilesh Bhagavatula Aug. 30, 2022, 8:41 a.m. UTC | #3
> -----Original Message-----
> From: Bruce Richardson <bruce.richardson@intel.com>
> Sent: Tuesday, August 30, 2022 2:06 PM
> To: Morten Brørup <mb@smartsharesystems.com>
> Cc: Shijith Thotton <sthotton@marvell.com>; dev@dpdk.org; Pavan
> Nikhilesh Bhagavatula <pbhagavatula@marvell.com>;
> Honnappa.Nagarahalli@arm.com; Jerin Jacob Kollanukkaran
> <jerinj@marvell.com>; olivier.matz@6wind.com;
> stephen@networkplumber.org; thomas@monjalon.net
> Subject: [EXT] Re: [PATCH v1 2/4] mbuf: add second dynamic field member
> for VA only build
> 
> External Email
> 
> ----------------------------------------------------------------------
> On Mon, Aug 29, 2022 at 08:32:20PM +0200, Morten Brørup wrote:
> >
> > > From: Shijith Thotton [mailto:sthotton@marvell.com]
> > > Sent: Monday, 29 August 2022 17.16
> > >
> > > mbuf physical address field is not used in builds which only uses VA.
> > > It is used to expand the dynamic field area.
> > >
> > > Signed-off-by: Shijith Thotton <sthotton@marvell.com>
> > > ---
> > >  lib/mbuf/rte_mbuf_core.h | 26 +++++++++++++++++---------
> > >  lib/mbuf/rte_mbuf_dyn.c  |  2 ++
> > >  2 files changed, 19 insertions(+), 9 deletions(-)
> > >
> > > <snip>
> >
> > I know that the intention here is to keep the rte_mbuf structure intact,
> which will certainly improve the probability of getting this patch series into
> DPDK.
> >
> > So, I will add a comment for the benefit of the other participants in the
> discussion:
> >
> > With this patch, and in RTE_IOVA_AS_VA mode, it becomes possible to
> move m->next into the first cache line, so rte_pktmbuf_prefree_seg() does
> not have to touch the second cache line, thus potentially improving
> performance by eliminating one cache miss per freed packet segment. (I also
> recall someone mentioning that some PMDs set m->next on RX... If that is
> the case, a cache miss per packet might also be avoidable in those PMDs.)
> >
> > Obviously, moving m->next to the first cache line is not related to this patch
> series, but would belong in a completely different patch.
> >
> 
> +1 to that, with the exception that if it is decided to move the next
> pointer rather than use this as dynamic space, I think it *should* be in
> this patch series, rather than mucking about with mbuf twice. :-)

+1. When RTE_IOVA_AS_VA is set, we can make mbuf->next a dynamic field and move it into mbuf->buf_iova.
The mbuf->next write is one of the prominent hotspots on Arm platforms.
  
Honnappa Nagarahalli Aug. 30, 2022, 1:22 p.m. UTC | #4
<snip>
> >
> > ----------------------------------------------------------------------
> > On Mon, Aug 29, 2022 at 08:32:20PM +0200, Morten Brørup wrote:
> > >
> > > > From: Shijith Thotton [mailto:sthotton@marvell.com]
> > > > Sent: Monday, 29 August 2022 17.16
> > > >
> > > > mbuf physical address field is not used in builds which only uses VA.
> > > > It is used to expand the dynamic field area.
> > > >
> > > > Signed-off-by: Shijith Thotton <sthotton@marvell.com>
> > > > ---
> > > >  lib/mbuf/rte_mbuf_core.h | 26 +++++++++++++++++---------
> > > > lib/mbuf/rte_mbuf_dyn.c  |  2 ++
> > > >  2 files changed, 19 insertions(+), 9 deletions(-)
> > > >
> > > > <snip>
> > >
> > > I know that the intention here is to keep the rte_mbuf structure
> > > intact,
> > which will certainly improve the probability of getting this patch
> > series into DPDK.
> > >
> > > So, I will add a comment for the benefit of the other participants
> > > in the
> > discussion:
> > >
> > > With this patch, and in RTE_IOVA_AS_VA mode, it becomes possible to
> > move m->next into the first cache line, so rte_pktmbuf_prefree_seg()
> > does not have to touch the second cache line, thus potentially
> > improving performance by eliminating one cache miss per freed packet
> > segment. (I also recall someone mentioning that some PMDs set m->next
> > on RX... If that is the case, a cache miss per packet might also be
> > avoidable in those PMDs.)
> > >
> > > Obviously, moving m->next to the first cache line is not related to
> > > this patch
> > series, but would belong in a completely different patch.
> > >
> >
> > +1 to that, with the exception that if it is decided to move the next
> > pointer rather than use this as dynamic space, I think it *should* be
> > in this patch series, rather than mucking about with mbuf twice. :-)
> 
> +1 When RTE_IOVA_AS_VA is set we can set mbuf->next as the dynamic field
> and move it to mbuf->buf_iova.
> mbuf->next write is one of the prominent hotspot in arm platforms.
+1 for reducing the cachelines that need to be touched
  
Shijith Thotton Sept. 7, 2022, 1:55 p.m. UTC | #5
>> >
>> > ----------------------------------------------------------------------
>> > On Mon, Aug 29, 2022 at 08:32:20PM +0200, Morten Brørup wrote:
>> > >
>> > > > From: Shijith Thotton [mailto:sthotton@marvell.com]
>> > > > Sent: Monday, 29 August 2022 17.16
>> > > >
>> > > > mbuf physical address field is not used in builds which only uses VA.
>> > > > It is used to expand the dynamic field area.
>> > > >
>> > > > Signed-off-by: Shijith Thotton <sthotton@marvell.com>
>> > > > ---
>> > > >  lib/mbuf/rte_mbuf_core.h | 26 +++++++++++++++++---------
>> > > > lib/mbuf/rte_mbuf_dyn.c  |  2 ++
>> > > >  2 files changed, 19 insertions(+), 9 deletions(-)
>> > > >
>> > > > <snip>
>> > >
>> > > I know that the intention here is to keep the rte_mbuf structure
>> > > intact,
>> > which will certainly improve the probability of getting this patch
>> > series into DPDK.
>> > >
>> > > So, I will add a comment for the benefit of the other participants
>> > > in the
>> > discussion:
>> > >
>> > > With this patch, and in RTE_IOVA_AS_VA mode, it becomes possible to
>> > move m->next into the first cache line, so rte_pktmbuf_prefree_seg()
>> > does not have to touch the second cache line, thus potentially
>> > improving performance by eliminating one cache miss per freed packet
>> > segment. (I also recall someone mentioning that some PMDs set m->next
>> > on RX... If that is the case, a cache miss per packet might also be
>> > avoidable in those PMDs.)
>> > >
>> > > Obviously, moving m->next to the first cache line is not related to
>> > > this patch
>> > series, but would belong in a completely different patch.
>> > >
>> >
>> > +1 to that, with the exception that if it is decided to move the next
>> > pointer rather than use this as dynamic space, I think it *should* be
>> > in this patch series, rather than mucking about with mbuf twice. :-)
>>
>> +1 When RTE_IOVA_AS_VA is set we can set mbuf->next as the dynamic field
>> and move it to mbuf->buf_iova.
>> mbuf->next write is one of the prominent hotspot in arm platforms.
>+1 for reducing the cachelines that need to be touched
 
Added a new patch in v2 to move the next pointer to the first cache line. Please review.
https://patchwork.dpdk.org/project/dpdk/patch/20220907134340.3629224-4-sthotton@marvell.com/
  

Patch

diff --git a/lib/mbuf/rte_mbuf_core.h b/lib/mbuf/rte_mbuf_core.h
index 81cb07c2e4..98ce62fd6a 100644
--- a/lib/mbuf/rte_mbuf_core.h
+++ b/lib/mbuf/rte_mbuf_core.h
@@ -579,15 +579,23 @@  struct rte_mbuf {
 	RTE_MARKER cacheline0;
 
 	void *buf_addr;           /**< Virtual address of segment buffer. */
-	/**
-	 * Physical address of segment buffer.
-	 * This field is invalid if the build is configured to use only
-	 * virtual address as IOVA (i.e. RTE_IOVA_AS_VA is defined).
-	 * Force alignment to 8-bytes, so as to ensure we have the exact
-	 * same mbuf cacheline0 layout for 32-bit and 64-bit. This makes
-	 * working on vector drivers easier.
-	 */
-	rte_iova_t buf_iova __rte_aligned(sizeof(rte_iova_t));
+	RTE_STD_C11
+	union {
+		/**
+		 * Physical address of segment buffer.
+		 * This field is invalid if the build is configured to use only
+		 * virtual address as IOVA (i.e. RTE_IOVA_AS_VA is defined).
+		 * Force alignment to 8-bytes, so as to ensure we have the exact
+		 * same mbuf cacheline0 layout for 32-bit and 64-bit. This makes
+		 * working on vector drivers easier.
+		 */
+		rte_iova_t buf_iova __rte_aligned(sizeof(rte_iova_t));
+		/**
+		 * Reserved for dynamic field in builds where physical address
+		 * field is invalid.
+		 */
+		uint64_t dynfield2;
+	};
 
 	/* next 8 bytes are initialised on RX descriptor rearm */
 	RTE_MARKER64 rearm_data;
diff --git a/lib/mbuf/rte_mbuf_dyn.c b/lib/mbuf/rte_mbuf_dyn.c
index 4ae79383b5..0813d5fb34 100644
--- a/lib/mbuf/rte_mbuf_dyn.c
+++ b/lib/mbuf/rte_mbuf_dyn.c
@@ -128,6 +128,8 @@  init_shared_mem(void)
 		 */
 		memset(shm, 0, sizeof(*shm));
 		mark_free(dynfield1);
+		if (rte_is_iova_as_va_build())
+			mark_free(dynfield2);
 
 		/* init free_flags */
 		for (mask = RTE_MBUF_F_FIRST_FREE; mask <= RTE_MBUF_F_LAST_FREE; mask <<= 1)