diff mbox series

mbuf: add mbuf physical address field to dynamic field

Message ID 57d2ab7fff672716d37ba4078e2e3bb2db126607.1656605763.git.sthotton@marvell.com (mailing list archive)
State New
Delegated to: Thomas Monjalon
Headers show
Series mbuf: add mbuf physical address field to dynamic field | expand

Checks

Context Check Description
ci/iol-abi-testing success Testing PASS
ci/iol-x86_64-unit-testing success Testing PASS
ci/iol-aarch64-unit-testing success Testing PASS
ci/iol-x86_64-compile-testing success Testing PASS
ci/iol-aarch64-compile-testing success Testing PASS
ci/github-robot: build fail github build: failed
ci/iol-intel-Functional success Functional Testing PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/intel-Testing success Testing PASS
ci/Intel-compilation fail Compilation issues
ci/checkpatch warning coding style issues

Commit Message

Shijith Thotton June 30, 2022, 4:25 p.m. UTC
If all devices are configured to run in IOVA mode as VA, physical
address field of mbuf (buf_iova) won't be used. In such cases, buf_iova
space is free to use as a dynamic field. So a new dynamic field member
(dynfield2) is added in mbuf structure to make use of that space.

A new mbuf flag RTE_MBUF_F_DYNFIELD2 is introduced to help identify the
mbuf that can use dynfield2.

Signed-off-by: Shijith Thotton <sthotton@marvell.com>
---
 lib/mbuf/rte_mbuf.c      |  8 ++++++++
 lib/mbuf/rte_mbuf.h      | 16 +++++++++++++---
 lib/mbuf/rte_mbuf_core.h | 29 ++++++++++++++++++++++-------
 lib/mbuf/rte_mbuf_dyn.c  |  3 +++
 4 files changed, 46 insertions(+), 10 deletions(-)

Comments

Stephen Hemminger June 30, 2022, 4:45 p.m. UTC | #1
On Thu, 30 Jun 2022 21:55:16 +0530
Shijith Thotton <sthotton@marvell.com> wrote:

> If all devices are configured to run in IOVA mode as VA, physical
> address field of mbuf (buf_iova) won't be used. In such cases, buf_iova
> space is free to use as a dynamic field. So a new dynamic field member
> (dynfield2) is added in mbuf structure to make use of that space.
> 
> A new mbuf flag RTE_MBUF_F_DYNFIELD2 is introduced to help identify the
> mbuf that can use dynfield2.
> 
> Signed-off-by: Shijith Thotton <sthotton@marvell.com>

This seems like a complex and potentially error prone way to do this.
What is the use case? How much of a performance gain?
Bruce Richardson June 30, 2022, 4:55 p.m. UTC | #2
On Thu, Jun 30, 2022 at 09:55:16PM +0530, Shijith Thotton wrote:
> If all devices are configured to run in IOVA mode as VA, physical
> address field of mbuf (buf_iova) won't be used. In such cases, buf_iova
> space is free to use as a dynamic field. So a new dynamic field member
> (dynfield2) is added in mbuf structure to make use of that space.
> 
> A new mbuf flag RTE_MBUF_F_DYNFIELD2 is introduced to help identify the
> mbuf that can use dynfield2.
> 
> Signed-off-by: Shijith Thotton <sthotton@marvell.com>
> ---
I disagree with this patch. The mbuf should always record the iova of the
buffer directly, rather than forcing the drivers to query the EAL mode.
This will likely also break all vector drivers right now, as they are
sensitive to the mbuf layout and the position of the IOVA address in the
buffer.

/Bruce
Olivier Matz July 1, 2022, 9:48 a.m. UTC | #3
Hi,

On Thu, Jun 30, 2022 at 05:55:21PM +0100, Bruce Richardson wrote:
> On Thu, Jun 30, 2022 at 09:55:16PM +0530, Shijith Thotton wrote:
> > If all devices are configured to run in IOVA mode as VA, physical
> > address field of mbuf (buf_iova) won't be used. In such cases, buf_iova
> > space is free to use as a dynamic field. So a new dynamic field member
> > (dynfield2) is added in mbuf structure to make use of that space.
> > 
> > A new mbuf flag RTE_MBUF_F_DYNFIELD2 is introduced to help identify the
> > mbuf that can use dynfield2.
> > 
> > Signed-off-by: Shijith Thotton <sthotton@marvell.com>
> > ---
> I disagree with this patch. The mbuf should always record the iova of the
> buffer directly, rather than forcing the drivers to query the EAL mode.
> This will likely also break all vector drivers right now, as they are
> sensitive to the mbuf layout and the position of the IOVA address in the
> buffer.

I have the same opinion than Stephen and Bruce. This field is widely used
in DPDK, I don't think it is a good idea to disable it if some conditions
are met.
Slava Ovsiienko July 1, 2022, 11:53 a.m. UTC | #4
Hi,

Just to note, some PMDs do not use physical address field at all.
As an example - mlx5 PMD (and it is far from being the only one)
could take an advantage from this patch. Nonetheless, I tend to agree -
for the whole DPDK framework it looks risky. I had the similar thoughts
about removing iova field and I did not dare to propose 😊

With best regards,
Slava

> -----Original Message-----
> From: Olivier Matz <olivier.matz@6wind.com>
> Sent: Friday, July 1, 2022 12:49
> To: Bruce Richardson <bruce.richardson@intel.com>
> Cc: Shijith Thotton <sthotton@marvell.com>; jerinj@marvell.com; NBU-
> Contact-Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>; dev@dpdk.org
> Subject: Re: [PATCH] mbuf: add mbuf physical address field to dynamic
> field
> 
> Hi,
> 
> On Thu, Jun 30, 2022 at 05:55:21PM +0100, Bruce Richardson wrote:
> > On Thu, Jun 30, 2022 at 09:55:16PM +0530, Shijith Thotton wrote:
> > > If all devices are configured to run in IOVA mode as VA, physical
> > > address field of mbuf (buf_iova) won't be used. In such cases,
> > > buf_iova space is free to use as a dynamic field. So a new dynamic
> > > field member
> > > (dynfield2) is added in mbuf structure to make use of that space.
> > >
> > > A new mbuf flag RTE_MBUF_F_DYNFIELD2 is introduced to help identify
> > > the mbuf that can use dynfield2.
> > >
> > > Signed-off-by: Shijith Thotton <sthotton@marvell.com>
> > > ---
> > I disagree with this patch. The mbuf should always record the iova of
> > the buffer directly, rather than forcing the drivers to query the EAL
> mode.
> > This will likely also break all vector drivers right now, as they are
> > sensitive to the mbuf layout and the position of the IOVA address in
> > the buffer.
> 
> I have the same opinion than Stephen and Bruce. This field is widely
> used in DPDK, I don't think it is a good idea to disable it if some
> conditions are met.
Shijith Thotton July 1, 2022, 12:01 p.m. UTC | #5
>
>On Thu, Jun 30, 2022 at 05:55:21PM +0100, Bruce Richardson wrote:
>> On Thu, Jun 30, 2022 at 09:55:16PM +0530, Shijith Thotton wrote:
>> > If all devices are configured to run in IOVA mode as VA, physical
>> > address field of mbuf (buf_iova) won't be used. In such cases, buf_iova
>> > space is free to use as a dynamic field. So a new dynamic field member
>> > (dynfield2) is added in mbuf structure to make use of that space.
>> >
>> > A new mbuf flag RTE_MBUF_F_DYNFIELD2 is introduced to help identify the
>> > mbuf that can use dynfield2.
>> >
>> > Signed-off-by: Shijith Thotton <sthotton@marvell.com>
>> > ---
>> I disagree with this patch. The mbuf should always record the iova of the
>> buffer directly, rather than forcing the drivers to query the EAL mode.
>> This will likely also break all vector drivers right now, as they are
>> sensitive to the mbuf layout and the position of the IOVA address in the
>> buffer.
>
 
Hi Bruce,

The IOVA check should have been bus specific, instead of eal.  The bus IOVA mode
will be VA, only if all devices on the bus has the flag
RTE_PCI_DRV_NEED_IOVA_AS_VA. It was our thought process, but used wrong API for
the check. It should have avoided the issue which you mentioned above.

>I have the same opinion than Stephen and Bruce. This field is widely used
>in DPDK, I don't think it is a good idea to disable it if some conditions
>are met.

Hi Olivier, 

I was under the assumption, buf_iova won't be used directly by the application
(only through wrapper). So that wrappers can check ol_flags before setting
buf_iova.
Shijith Thotton July 1, 2022, 12:24 p.m. UTC | #6
>> If all devices are configured to run in IOVA mode as VA, physical
>> address field of mbuf (buf_iova) won't be used. In such cases, buf_iova
>> space is free to use as a dynamic field. So a new dynamic field member
>> (dynfield2) is added in mbuf structure to make use of that space.
>>
>> A new mbuf flag RTE_MBUF_F_DYNFIELD2 is introduced to help identify the
>> mbuf that can use dynfield2.
>>
>> Signed-off-by: Shijith Thotton <sthotton@marvell.com>
>
> This seems like a complex and potentially error prone way to do this.
> What is the use case?
>

PCI drivers with the flag RTE_PCI_DRV_NEED_IOVA_AS_VA only works in IOVA mode as
VA. buf_iova field of mbuf is not used by those PMDs and can be used as a
dynamic area to save space.

> How much of a performance gain?

No change in performance.
Morten Brørup July 3, 2022, 7:31 a.m. UTC | #7
> From: Shijith Thotton [mailto:sthotton@marvell.com]
> Sent: Friday, 1 July 2022 14.25
> 
> >> If all devices are configured to run in IOVA mode as VA, physical
> >> address field of mbuf (buf_iova) won't be used.

Will some of the hardware vendors please comment on this: Has IOVA VA mode become common over time, or is it still an exotic bleeding edge feature?

If it has become common, we should let DPDK evolve accordingly, and consider PA (non-VA) mode legacy, treating it as such. Don't get stuck in the past.

> >> In such cases,
> buf_iova
> >> space is free to use as a dynamic field. So a new dynamic field
> member
> >> (dynfield2) is added in mbuf structure to make use of that space.
> >>
> >> A new mbuf flag RTE_MBUF_F_DYNFIELD2 is introduced to help identify
> the
> >> mbuf that can use dynfield2.
> >>
> >> Signed-off-by: Shijith Thotton <sthotton@marvell.com>
> >
> > This seems like a complex and potentially error prone way to do this.

Perhaps this optimization should be a compile time option instead?

> > What is the use case?
> >
> 
> PCI drivers with the flag RTE_PCI_DRV_NEED_IOVA_AS_VA only works in
> IOVA mode as
> VA. buf_iova field of mbuf is not used by those PMDs and can be used as
> a
> dynamic area to save space.
> 
> > How much of a performance gain?
> 
> No change in performance.

Freeing up 8 bytes in the first mbuf cache line is a major improvement!

This could provide a significant performance gain for some applications, by moving private/dynamic mbuf fields from the second to the first cache line, thus avoiding to write to the second cache line in the application's first pipeline stage.
Bruce Richardson July 4, 2022, 2 p.m. UTC | #8
On Sun, Jul 03, 2022 at 09:31:01AM +0200, Morten Brørup wrote:
> > From: Shijith Thotton [mailto:sthotton@marvell.com]
> > Sent: Friday, 1 July 2022 14.25
> > 
> > >> If all devices are configured to run in IOVA mode as VA, physical
> > >> address field of mbuf (buf_iova) won't be used.
> 
> Will some of the hardware vendors please comment on this: Has IOVA VA mode become common over time, or is it still an exotic bleeding edge feature?
> 
> If it has become common, we should let DPDK evolve accordingly, and consider PA (non-VA) mode legacy, treating it as such. Don't get stuck in the past.
> 

IOVA as VA mode is indeed common and we are constantly encouraging users to
switch to using vfio to try and take advantage of this.

However, in my experience IOVA as PA is still very, very common too. We
cannot drop support for this mode just yet, unfortunately.

> > >> In such cases,
> > buf_iova
> > >> space is free to use as a dynamic field. So a new dynamic field
> > member
> > >> (dynfield2) is added in mbuf structure to make use of that space.
> > >>
> > >> A new mbuf flag RTE_MBUF_F_DYNFIELD2 is introduced to help identify
> > the
> > >> mbuf that can use dynfield2.
> > >>
> > >> Signed-off-by: Shijith Thotton <sthotton@marvell.com>
> > >
> > > This seems like a complex and potentially error prone way to do this.
> 
> Perhaps this optimization should be a compile time option instead?
>

It could indeed be, and probably could be done very safely, in that we could
mark as disabled all drivers when the mode is enabled for a build.
Thereafter, drivers could be marked as VA-only safe as they are updated as
necessary, i.e. use the build system to enforce that only drivers known to
work with the mode are built when the mode is enabled.

That said, verifying all drivers to work with this mode is a decent effort.
Do we have indications of the perf benefit we would get from doing this for
some real-world app?

/Bruce
Shijith Thotton Aug. 3, 2022, 3:34 p.m. UTC | #9
Hi Bruce,

>> > >> If all devices are configured to run in IOVA mode as VA, physical
>> > >> address field of mbuf (buf_iova) won't be used.
>>
>> Will some of the hardware vendors please comment on this: Has IOVA VA mode
>become common over time, or is it still an exotic bleeding edge feature?
>>
>> If it has become common, we should let DPDK evolve accordingly, and consider
>PA (non-VA) mode legacy, treating it as such. Don't get stuck in the past.
>>
>
>IOVA as VA mode is indeed common and we are constantly encouraging users to
>switch to using vfio to try and take advantage of this.
>
>However, in my experience IOVA as PA is still very, very common too. We
>cannot drop support for this mode just yet, unfortunately.
>
>> > >> In such cases,
>> > buf_iova
>> > >> space is free to use as a dynamic field. So a new dynamic field
>> > member
>> > >> (dynfield2) is added in mbuf structure to make use of that space.
>> > >>
>> > >> A new mbuf flag RTE_MBUF_F_DYNFIELD2 is introduced to help identify
>> > the
>> > >> mbuf that can use dynfield2.
>> > >>
>> > >> Signed-off-by: Shijith Thotton <sthotton@marvell.com>
>> > >
>> > > This seems like a complex and potentially error prone way to do this.
>>
>> Perhaps this optimization should be a compile time option instead?
>>
>
>It could indeed be, and probably could be done very safely, in that we could
>mark as disabled all drivers when the mode is enabled for a build.
>Thereafter, drivers could be marked as VA-only safe as they are updated as
>necessary, i.e. use the build system to enforce that only drivers known to
>work with the mode are built when the mode is enabled.
>

I will prepare a patch to enable VA-only build.

>That said, verifying all drivers to work with this mode is a decent effort.
>Do we have indications of the perf benefit we would get from doing this for
>some real-world app?
>
diff mbox series

Patch

diff --git a/lib/mbuf/rte_mbuf.c b/lib/mbuf/rte_mbuf.c
index a2307cebe6..718b4505c4 100644
--- a/lib/mbuf/rte_mbuf.c
+++ b/lib/mbuf/rte_mbuf.c
@@ -101,6 +101,10 @@  rte_pktmbuf_init(struct rte_mempool *mp,
 	m->port = RTE_MBUF_PORT_INVALID;
 	rte_mbuf_refcnt_set(m, 1);
 	m->next = NULL;
+
+	/* enable dynfield2 if IOVA mode is VA */
+	if (rte_eal_iova_mode() == RTE_IOVA_VA)
+		m->ol_flags = RTE_MBUF_F_DYNFIELD2;
 }
 
 /*
@@ -206,6 +210,10 @@  __rte_pktmbuf_init_extmem(struct rte_mempool *mp,
 	rte_mbuf_refcnt_set(m, 1);
 	m->next = NULL;
 
+	/* enable dynfield2 if IOVA mode is VA */
+	if (rte_eal_iova_mode() == RTE_IOVA_VA)
+		m->ol_flags |= RTE_MBUF_F_DYNFIELD2;
+
 	/* init external buffer shared info items */
 	shinfo = RTE_PTR_ADD(m, mbuf_size);
 	m->shinfo = shinfo;
diff --git a/lib/mbuf/rte_mbuf.h b/lib/mbuf/rte_mbuf.h
index 9811e8c760..59485f04ed 100644
--- a/lib/mbuf/rte_mbuf.h
+++ b/lib/mbuf/rte_mbuf.h
@@ -1056,9 +1056,11 @@  rte_pktmbuf_attach_extbuf(struct rte_mbuf *m, void *buf_addr,
 	RTE_ASSERT(shinfo->free_cb != NULL);
 
 	m->buf_addr = buf_addr;
-	m->buf_iova = buf_iova;
 	m->buf_len = buf_len;
 
+	if (!RTE_MBUF_HAS_DYNFIELD2(m))
+		m->buf_iova = buf_iova;
+
 	m->data_len = 0;
 	m->data_off = 0;
 
@@ -1087,6 +1089,10 @@  static inline void
 rte_mbuf_dynfield_copy(struct rte_mbuf *mdst, const struct rte_mbuf *msrc)
 {
 	memcpy(&mdst->dynfield1, msrc->dynfield1, sizeof(mdst->dynfield1));
+
+	if (RTE_MBUF_HAS_DYNFIELD2(mdst))
+		memcpy(&mdst->dynfield2, &msrc->dynfield2,
+		       sizeof(mdst->dynfield2));
 }
 
 /* internal */
@@ -1143,10 +1149,12 @@  static inline void rte_pktmbuf_attach(struct rte_mbuf *mi, struct rte_mbuf *m)
 
 	mi->data_off = m->data_off;
 	mi->data_len = m->data_len;
-	mi->buf_iova = m->buf_iova;
 	mi->buf_addr = m->buf_addr;
 	mi->buf_len = m->buf_len;
 
+	if (!RTE_MBUF_HAS_DYNFIELD2(mi))
+		mi->buf_iova = m->buf_iova;
+
 	mi->next = NULL;
 	mi->pkt_len = mi->data_len;
 	mi->nb_segs = 1;
@@ -1245,11 +1253,13 @@  static inline void rte_pktmbuf_detach(struct rte_mbuf *m)
 
 	m->priv_size = priv_size;
 	m->buf_addr = (char *)m + mbuf_size;
-	m->buf_iova = rte_mempool_virt2iova(m) + mbuf_size;
 	m->buf_len = (uint16_t)buf_len;
 	rte_pktmbuf_reset_headroom(m);
 	m->data_len = 0;
 	m->ol_flags = 0;
+
+	if (!RTE_MBUF_HAS_DYNFIELD2(m))
+		m->buf_iova = rte_mempool_virt2iova(m) + mbuf_size;
 }
 
 /**
diff --git a/lib/mbuf/rte_mbuf_core.h b/lib/mbuf/rte_mbuf_core.h
index 3d6ddd6773..a549e36464 100644
--- a/lib/mbuf/rte_mbuf_core.h
+++ b/lib/mbuf/rte_mbuf_core.h
@@ -504,6 +504,8 @@  extern "C" {
 #define RTE_MBUF_F_INDIRECT    (1ULL << 62) /**< Indirect attached mbuf */
 #define IND_ATTACHED_MBUF RTE_DEPRECATED(IND_ATTACHED_MBUF) RTE_MBUF_F_INDIRECT
 
+#define RTE_MBUF_F_DYNFIELD2	(1ULL << 63) /**< dynfield2 mbuf field enabled */
+
 /** Alignment constraint of mbuf private area. */
 #define RTE_MBUF_PRIV_ALIGN 8
 
@@ -579,13 +581,18 @@  struct rte_mbuf {
 	RTE_MARKER cacheline0;
 
 	void *buf_addr;           /**< Virtual address of segment buffer. */
-	/**
-	 * Physical address of segment buffer.
-	 * Force alignment to 8-bytes, so as to ensure we have the exact
-	 * same mbuf cacheline0 layout for 32-bit and 64-bit. This makes
-	 * working on vector drivers easier.
-	 */
-	rte_iova_t buf_iova __rte_aligned(sizeof(rte_iova_t));
+	RTE_STD_C11
+	union {
+		/**
+		 * Physical address of segment buffer if IOVA mode is not VA.
+		 * Force alignment to 8-bytes, so as to ensure we have the exact
+		 * same mbuf cacheline0 layout for 32-bit and 64-bit. This makes
+		 * working on vector drivers easier.
+		 */
+		rte_iova_t buf_iova __rte_aligned(sizeof(rte_iova_t));
+		/* Reserved for dynamic field if IOVA mode is VA. */
+		uint64_t dynfield2;
+	};
 
 	/* next 8 bytes are initialised on RX descriptor rearm */
 	RTE_MARKER64 rearm_data;
@@ -803,6 +810,14 @@  struct rte_mbuf_ext_shared_info {
 #define RTE_MBUF_DIRECT(mb) \
 	(!((mb)->ol_flags & (RTE_MBUF_F_INDIRECT | RTE_MBUF_F_EXTERNAL)))
 
+/**
+ *
+ * Retrurns TRUE if given mbuf has dynfield2 field enabled, or FALSE otherwise.
+ *
+ * dynfield2 field can be enabled if IOVA mode is configured as VA.
+ */
+#define RTE_MBUF_HAS_DYNFIELD2(mb) (!!((mb)->ol_flags & RTE_MBUF_F_DYNFIELD2))
+
 /** Uninitialized or unspecified port. */
 #define RTE_MBUF_PORT_INVALID UINT16_MAX
 /** For backwards compatibility. */
diff --git a/lib/mbuf/rte_mbuf_dyn.c b/lib/mbuf/rte_mbuf_dyn.c
index 4ae79383b5..7bfe50e0e2 100644
--- a/lib/mbuf/rte_mbuf_dyn.c
+++ b/lib/mbuf/rte_mbuf_dyn.c
@@ -127,7 +127,10 @@  init_shared_mem(void)
 		 * rte_mbuf_dynfield_copy().
 		 */
 		memset(shm, 0, sizeof(*shm));
+
 		mark_free(dynfield1);
+		if (rte_eal_iova_mode() == RTE_IOVA_VA)
+			mark_free(dynfield2);
 
 		/* init free_flags */
 		for (mask = RTE_MBUF_F_FIRST_FREE; mask <= RTE_MBUF_F_LAST_FREE; mask <<= 1)