[RFC,20.02] mbuf: hint PMD not to inline packet

Message ID 20191017072723.36509-1-shahafs@mellanox.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Series [RFC,20.02] mbuf: hint PMD not to inline packet

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation success Compilation OK

Commit Message

Shahaf Shuler Oct. 17, 2019, 7:27 a.m. UTC
  Some PMDs inline the mbuf data buffer directly to the device. This is
done to save the overhead of the PCI headers involved when the device
DMA reads the buffer pointer. For some devices it is essential in order
to reach the peak BW.

However, there are cases where such inlining is inefficient. For example,
when the data buffer resides on another device's memory (like a GPU or
storage device), an attempt to inline such a buffer will result in high
PCI overhead for reading and copying the data from the remote device.

To support a mixed traffic pattern (some buffers from local DRAM, some
buffers from other devices) with high BW, a hint flag is introduced in
the mbuf.
The application will hint the PMD whether or not it should try to inline
the given mbuf data buffer. The PMD should make a best effort to act upon
this request.

Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
---
 lib/librte_mbuf/rte_mbuf.h | 9 +++++++++
 1 file changed, 9 insertions(+)
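For illustration only (not part of the patch): a minimal sketch of how an
application that knows a given mbuf's payload resides on a remote device might
use the proposed flag before transmitting. The predicate
data_is_on_remote_device() is hypothetical, and the sketch assumes the
PKT_TX_DONT_INLINE_HINT definition from the patch below is applied.

    #include <stdbool.h>
    #include <rte_ethdev.h>
    #include <rte_mbuf.h>

    /* Hypothetical: the application tracks which buffers live in remote
     * (e.g. GPU) memory, for instance because it attached them itself. */
    extern bool data_is_on_remote_device(const struct rte_mbuf *m);

    static inline uint16_t
    tx_one(uint16_t port_id, uint16_t queue_id, struct rte_mbuf *m)
    {
            if (data_is_on_remote_device(m))
                    /* Hint the PMD not to copy/inline the payload into the
                     * descriptor; let the device DMA it by pointer instead. */
                    m->ol_flags |= PKT_TX_DONT_INLINE_HINT;

            return rte_eth_tx_burst(port_id, queue_id, &m, 1);
    }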
  

Comments

Jerin Jacob Oct. 17, 2019, 8:16 a.m. UTC | #1
On Thu, Oct 17, 2019 at 12:57 PM Shahaf Shuler <shahafs@mellanox.com> wrote:
>
> Some PMDs inline the mbuf data buffer directly to device. This is in
> order to save the overhead of the PCI headers involved when the device
> DMA read the buffer pointer. For some devices it is essential in order
> to reach the pick BW.
>
> However, there are cases where such inlining is in-efficient. For example
> when the data buffer resides on other device memory (like GPU or storage
> device). attempt to inline such buffer will result in high PCI overhead
> for reading and copying the data from the remote device.

Some questions to understand the use case:
# Is this a use case where the CPU, local DRAM, NW card and GPU memory are
connected on a coherent bus?
# Assuming the CPU needs to touch the buffer prior to Tx, is it still
useful in that case?
# How does the application know the data buffer is in GPU memory, in order
to use this flag efficiently?
# Just a random thought: does it help if we create two different mempools,
one from local DRAM and one from GPU memory, so that the application can
work transparently?





>
> To support a mixed traffic pattern (some buffers from local DRAM, some
> buffers from other devices) with high BW, a hint flag is introduced in
> the mbuf.
> Application will hint the PMD whether or not it should try to inline the
> given mbuf data buffer. PMD should do best effort to act upon this
> request.
>
> Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
> ---
>  lib/librte_mbuf/rte_mbuf.h | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> index 98225ec80b..5934532b7f 100644
> --- a/lib/librte_mbuf/rte_mbuf.h
> +++ b/lib/librte_mbuf/rte_mbuf.h
> @@ -203,6 +203,15 @@ extern "C" {
>  /* add new TX flags here */
>
>  /**
> + * Hint to PMD to not inline the mbuf data buffer to device
> + * rather let the device use its DMA engine to fetch the data with the
> + * provided pointer.
> + *
> + * This flag is a only a hint. PMD should enforce it as best effort.
> + */
> +#define PKT_TX_DONT_INLINE_HINT (1ULL << 39)
> +
> +/**
>   * Indicate that the metadata field in the mbuf is in use.
>   */
>  #define PKT_TX_METADATA        (1ULL << 40)
> --
> 2.12.0
>
  
Shahaf Shuler Oct. 17, 2019, 10:59 a.m. UTC | #2
Thursday, October 17, 2019 11:17 AM, Jerin Jacob:
> Subject: Re: [dpdk-dev] [RFC PATCH 20.02] mbuf: hint PMD not to inline
> packet
> 
> On Thu, Oct 17, 2019 at 12:57 PM Shahaf Shuler <shahafs@mellanox.com>
> wrote:
> >
> > Some PMDs inline the mbuf data buffer directly to device. This is in
> > order to save the overhead of the PCI headers involved when the device
> > DMA read the buffer pointer. For some devices it is essential in order
> > to reach the pick BW.
> >
> > However, there are cases where such inlining is in-efficient. For
> > example when the data buffer resides on other device memory (like GPU
> > or storage device). attempt to inline such buffer will result in high
> > PCI overhead for reading and copying the data from the remote device.
> 
> Some questions to understand the use case 
> # Is this use case where CPU, local DRAM, NW card and GPU memory connected on the coherent bus

Yes. For example, one can allocate GPU memory and map it to the GPU BAR, making it accessible from the host CPU through LD/ST.

> # Assuming the CPU needs to touch the buffer prior to Tx, In that case, it will
> be useful?

If the CPU needs to modify the data then no, it will be more efficient to copy the data to the CPU and then send it.
However, there are use cases where the data is DMA'd with zero copy to the GPU (for example), the GPU performs the processing on the data, and then the CPU sends the mbuf (without touching the data).

> # How the application knows, The data buffer is in GPU memory in order to
> use this flag efficiently?

Because the application made it happen. For example, it attached the mbuf external buffer from the other device's memory.
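[For reference, a rough sketch of what "attached the mbuf external buffer from
the other device's memory" can look like with the existing external-buffer
API; gpu_va/gpu_iova and the gpu_buf_free() callback are placeholders for
whatever the application obtained from the other device.]

    #include <rte_mbuf.h>

    /* Placeholder release hook: return the buffer to the device allocator. */
    static void
    gpu_buf_free(void *addr, void *opaque)
    {
            (void)addr;
            (void)opaque;
    }

    static struct rte_mbuf *
    wrap_remote_buffer(struct rte_mempool *hdr_pool, void *gpu_va,
                       rte_iova_t gpu_iova, uint16_t buf_size, uint16_t pkt_len)
    {
            struct rte_mbuf_ext_shared_info *shinfo;
            struct rte_mbuf *m = rte_pktmbuf_alloc(hdr_pool);

            if (m == NULL)
                    return NULL;
            /* The helper carves the shared info from the tail of the buffer;
             * a real application would rather keep it in host memory. */
            shinfo = rte_pktmbuf_ext_shinfo_init_helper(gpu_va, &buf_size,
                                                        gpu_buf_free, NULL);
            if (shinfo == NULL || pkt_len > buf_size) {
                    rte_pktmbuf_free(m);
                    return NULL;
            }
            /* The mbuf header stays in host DRAM; the data buffer does not. */
            rte_pktmbuf_attach_extbuf(m, gpu_va, gpu_iova, buf_size, shinfo);
            rte_pktmbuf_append(m, pkt_len);
            return m;
    }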

> # Just an random thought, Does it help, if we create two different mempools
> one from local DRAM and one from GPU memory so that the application can
> work transparently.

But you will still need to teach the PMD which pool it can inline and which it cannot.
IMO it is more generic to have it per mbuf. Moreover, the application has this info.

> 
> 
> 
> 
> 
> >
> > To support a mixed traffic pattern (some buffers from local DRAM, some
> > buffers from other devices) with high BW, a hint flag is introduced in
> > the mbuf.
> > Application will hint the PMD whether or not it should try to inline
> > the given mbuf data buffer. PMD should do best effort to act upon this
> > request.
> >
> > Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
> > ---
> >  lib/librte_mbuf/rte_mbuf.h | 9 +++++++++
> >  1 file changed, 9 insertions(+)
> >
> > diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> > index 98225ec80b..5934532b7f 100644
> > --- a/lib/librte_mbuf/rte_mbuf.h
> > +++ b/lib/librte_mbuf/rte_mbuf.h
> > @@ -203,6 +203,15 @@ extern "C" {
> >  /* add new TX flags here */
> >
> >  /**
> > + * Hint to PMD to not inline the mbuf data buffer to device
> > + * rather let the device use its DMA engine to fetch the data with
> > +the
> > + * provided pointer.
> > + *
> > + * This flag is a only a hint. PMD should enforce it as best effort.
> > + */
> > +#define PKT_TX_DONT_INLINE_HINT (1ULL << 39)
> > +
> > +/**
> >   * Indicate that the metadata field in the mbuf is in use.
> >   */
> >  #define PKT_TX_METADATA        (1ULL << 40)
> > --
> > 2.12.0
> >
  
Stephen Hemminger Oct. 17, 2019, 3:14 p.m. UTC | #3
On Thu, 17 Oct 2019 07:27:34 +0000
Shahaf Shuler <shahafs@mellanox.com> wrote:

> Some PMDs inline the mbuf data buffer directly to device. This is in
> order to save the overhead of the PCI headers involved when the device
> DMA read the buffer pointer. For some devices it is essential in order
> to reach the pick BW.
> 
> However, there are cases where such inlining is in-efficient. For example
> when the data buffer resides on other device memory (like GPU or storage
> device). attempt to inline such buffer will result in high PCI overhead
> for reading and copying the data from the remote device.
> 
> To support a mixed traffic pattern (some buffers from local DRAM, some
> buffers from other devices) with high BW, a hint flag is introduced in
> the mbuf.
> Application will hint the PMD whether or not it should try to inline the
> given mbuf data buffer. PMD should do best effort to act upon this
> request.
> 
> Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>

This kind of optimization is hard, and pushing the problem to the application
to decide seems like the wrong step. Can the driver just infer this
already because some mbufs are external?
  
Jerin Jacob Oct. 17, 2019, 5:18 p.m. UTC | #4
On Thu, Oct 17, 2019 at 4:30 PM Shahaf Shuler <shahafs@mellanox.com> wrote:
>
> Thursday, October 17, 2019 11:17 AM, Jerin Jacob:
> > Subject: Re: [dpdk-dev] [RFC PATCH 20.02] mbuf: hint PMD not to inline
> > packet
> >
> > On Thu, Oct 17, 2019 at 12:57 PM Shahaf Shuler <shahafs@mellanox.com>
> > wrote:
> > >
> > > Some PMDs inline the mbuf data buffer directly to device. This is in
> > > order to save the overhead of the PCI headers involved when the device
> > > DMA read the buffer pointer. For some devices it is essential in order
> > > to reach the pick BW.
> > >
> > > However, there are cases where such inlining is in-efficient. For
> > > example when the data buffer resides on other device memory (like GPU
> > > or storage device). attempt to inline such buffer will result in high
> > > PCI overhead for reading and copying the data from the remote device.
> >
> > Some questions to understand the use case
> > # Is this use case where CPU, local DRAM, NW card and GPU memory connected on the coherent bus
>
> Yes. For example one can allocate GPU memory and map it to the GPU bar, make it accessible from the host CPU through LD/ST.
>
> > # Assuming the CPU needs to touch the buffer prior to Tx, In that case, it will
> > be useful?
>
> If the CPU needs to modify the data then no. it will be more efficient to copy the data to CPU and then send it.
> However there are use cases where the data is DMA w/ zero copy to the GPU (for example) , GPU perform the processing on the data, and then CPU send the mbuf (w/o touching the data).

OK. If I understand it correctly, it is for offloading the
network/compute functions to the GPU from the NW card and/or CPU.

>
> > # How the application knows, The data buffer is in GPU memory in order to
> > use this flag efficiently?
>
> Because it made it happen. For example it attached the mbuf external buffer from the other device memory.
>
> > # Just an random thought, Does it help, if we create two different mempools
> > one from local DRAM and one from GPU memory so that the application can
> > work transparently.
>
> But you will still need to teach the PMD which pool it can inline and which cannot.
> IMO it is more generic to have it per mbuf. Moreover, application has this info.

IMO, we cannot use the PKT_TX_DONT_INLINE_HINT flag for generic applications.
The application usage will be tightly coupled with the platform and
capabilities of the GPU or host CPU etc.

I think pushing this logic to the application is a bad idea. But if you
are writing some custom application and you need per-packet-level control,
then this flag may be the only way.


>
> >
> >
> >
> >
> >
> > >
> > > To support a mixed traffic pattern (some buffers from local DRAM, some
> > > buffers from other devices) with high BW, a hint flag is introduced in
> > > the mbuf.
> > > Application will hint the PMD whether or not it should try to inline
> > > the given mbuf data buffer. PMD should do best effort to act upon this
> > > request.
> > >
> > > Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
> > > ---
> > >  lib/librte_mbuf/rte_mbuf.h | 9 +++++++++
> > >  1 file changed, 9 insertions(+)
> > >
> > > diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> > > index 98225ec80b..5934532b7f 100644
> > > --- a/lib/librte_mbuf/rte_mbuf.h
> > > +++ b/lib/librte_mbuf/rte_mbuf.h
> > > @@ -203,6 +203,15 @@ extern "C" {
> > >  /* add new TX flags here */
> > >
> > >  /**
> > > + * Hint to PMD to not inline the mbuf data buffer to device
> > > + * rather let the device use its DMA engine to fetch the data with
> > > +the
> > > + * provided pointer.
> > > + *
> > > + * This flag is a only a hint. PMD should enforce it as best effort.
> > > + */
> > > +#define PKT_TX_DONT_INLINE_HINT (1ULL << 39)
> > > +
> > > +/**
> > >   * Indicate that the metadata field in the mbuf is in use.
> > >   */
> > >  #define PKT_TX_METADATA        (1ULL << 40)
> > > --
> > > 2.12.0
> > >
  
Shahaf Shuler Oct. 22, 2019, 6:26 a.m. UTC | #5
Thursday, October 17, 2019 8:19 PM, Jerin Jacob:
> Subject: Re: [dpdk-dev] [RFC PATCH 20.02] mbuf: hint PMD not to inline
> packet
> 
> On Thu, Oct 17, 2019 at 4:30 PM Shahaf Shuler <shahafs@mellanox.com>
> wrote:
> >
> > Thursday, October 17, 2019 11:17 AM, Jerin Jacob:
> > > Subject: Re: [dpdk-dev] [RFC PATCH 20.02] mbuf: hint PMD not to
> > > inline packet
> > >
> > > On Thu, Oct 17, 2019 at 12:57 PM Shahaf Shuler
> > > <shahafs@mellanox.com>
> > > wrote:
> > > >
> > > > Some PMDs inline the mbuf data buffer directly to device. This is
> > > > in order to save the overhead of the PCI headers involved when the
> > > > device DMA read the buffer pointer. For some devices it is
> > > > essential in order to reach the pick BW.
> > > >
> > > > However, there are cases where such inlining is in-efficient. For
> > > > example when the data buffer resides on other device memory (like
> > > > GPU or storage device). attempt to inline such buffer will result
> > > > in high PCI overhead for reading and copying the data from the remote
> device.
> > >
> > > Some questions to understand the use case # Is this use case where
> > > CPU, local DRAM, NW card and GPU memory connected on the coherent
> > > bus
> >
> > Yes. For example one can allocate GPU memory and map it to the GPU bar,
> make it accessible from the host CPU through LD/ST.
> >
> > > # Assuming the CPU needs to touch the buffer prior to Tx, In that
> > > case, it will be useful?
> >
> > If the CPU needs to modify the data then no. it will be more efficient to
> copy the data to CPU and then send it.
> > However there are use cases where the data is DMA w/ zero copy to the
> GPU (for example) , GPU perform the processing on the data, and then CPU
> send the mbuf (w/o touching the data).
> 
> OK. If I understanding it correctly it is for offloading the Network/Compute
> functions to GPU from NW card and/or CPU.

Mostly the compute. The networking in this model is expected to be done by the CPU.
Note this is only one use case.

> 
> >
> > > # How the application knows, The data buffer is in GPU memory in
> > > order to use this flag efficiently?
> >
> > Because it made it happen. For example it attached the mbuf external
> buffer from the other device memory.
> >
> > > # Just an random thought, Does it help, if we create two different
> > > mempools one from local DRAM and one from GPU memory so that the
> > > application can work transparently.
> >
> > But you will still need to teach the PMD which pool it can inline and which
> cannot.
> > IMO it is more generic to have it per mbuf. Moreover, application has this
> info.
> 
> IMO, we can not use PKT_TX_DONT_INLINE_HINT flag for generic
> applications, The application usage will be tightly coupled with the platform
> and capabilities of GPU or Host CPU etc.
> 
> I think, pushing this logic to the application is bad idea. But if you are writing
> some custom application and the per packet-level you need to control then
> this flag may be the only way.

Yes. This flag is for custom applications that do unique acceleration (by doing zero copy for compute/compression/encryption accelerators) on specific platforms.
Such an application is fully aware of the platform and the location where the data resides, hence it is very simple for it to know how to set this flag.

Note, this flag is 0 by default - meaning no hint, and a generic application works the same as today.

> 
> 
> >
> > >
> > >
> > >
> > >
> > >
> > > >
> > > > To support a mixed traffic pattern (some buffers from local DRAM,
> > > > some buffers from other devices) with high BW, a hint flag is
> > > > introduced in the mbuf.
> > > > Application will hint the PMD whether or not it should try to
> > > > inline the given mbuf data buffer. PMD should do best effort to
> > > > act upon this request.
> > > >
> > > > Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
> > > > ---
> > > >  lib/librte_mbuf/rte_mbuf.h | 9 +++++++++
> > > >  1 file changed, 9 insertions(+)
> > > >
> > > > diff --git a/lib/librte_mbuf/rte_mbuf.h
> > > > b/lib/librte_mbuf/rte_mbuf.h index 98225ec80b..5934532b7f 100644
> > > > --- a/lib/librte_mbuf/rte_mbuf.h
> > > > +++ b/lib/librte_mbuf/rte_mbuf.h
> > > > @@ -203,6 +203,15 @@ extern "C" {
> > > >  /* add new TX flags here */
> > > >
> > > >  /**
> > > > + * Hint to PMD to not inline the mbuf data buffer to device
> > > > + * rather let the device use its DMA engine to fetch the data
> > > > +with the
> > > > + * provided pointer.
> > > > + *
> > > > + * This flag is a only a hint. PMD should enforce it as best effort.
> > > > + */
> > > > +#define PKT_TX_DONT_INLINE_HINT (1ULL << 39)
> > > > +
> > > > +/**
> > > >   * Indicate that the metadata field in the mbuf is in use.
> > > >   */
> > > >  #define PKT_TX_METADATA        (1ULL << 40)
> > > > --
> > > > 2.12.0
> > > >
  
Shahaf Shuler Oct. 22, 2019, 6:29 a.m. UTC | #6
Thursday, October 17, 2019 6:15 PM, Stephen Hemminger:
> Subject: Re: [dpdk-dev] [RFC PATCH 20.02] mbuf: hint PMD not to inline
> packet
> 
> On Thu, 17 Oct 2019 07:27:34 +0000
> Shahaf Shuler <shahafs@mellanox.com> wrote:
> 
> > Some PMDs inline the mbuf data buffer directly to device. This is in
> > order to save the overhead of the PCI headers involved when the device
> > DMA read the buffer pointer. For some devices it is essential in order
> > to reach the pick BW.
> >
> > However, there are cases where such inlining is in-efficient. For
> > example when the data buffer resides on other device memory (like GPU
> > or storage device). attempt to inline such buffer will result in high
> > PCI overhead for reading and copying the data from the remote device.
> >
> > To support a mixed traffic pattern (some buffers from local DRAM, some
> > buffers from other devices) with high BW, a hint flag is introduced in
> > the mbuf.
> > Application will hint the PMD whether or not it should try to inline
> > the given mbuf data buffer. PMD should do best effort to act upon this
> > request.
> >
> > Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
> 
> This kind of optimization is hard, and pushing the problem to the application
> to decide seems like the wrong step.

See my comments to Jerin on the other thread. This optimization is for custom applications that do unique acceleration using look-aside accelerators for compute while utilizing network device zero copy.

> Can the driver just infer this already
> because some mbufs are external?

Having an mbuf with an external buffer does not necessarily mean the buffer is located on another PCI device.
Making optimizations based on such heuristics may lead to unexpected behavior.
  
Jerin Jacob Oct. 22, 2019, 3:17 p.m. UTC | #7
On Tue, Oct 22, 2019 at 11:56 AM Shahaf Shuler <shahafs@mellanox.com> wrote:
>
> Thursday, October 17, 2019 8:19 PM, Jerin Jacob:
> > Subject: Re: [dpdk-dev] [RFC PATCH 20.02] mbuf: hint PMD not to inline
> > packet
> >
> > On Thu, Oct 17, 2019 at 4:30 PM Shahaf Shuler <shahafs@mellanox.com>
> > wrote:
> > >
> > > Thursday, October 17, 2019 11:17 AM, Jerin Jacob:
> > > > Subject: Re: [dpdk-dev] [RFC PATCH 20.02] mbuf: hint PMD not to
> > > > inline packet
> > > >
> > > > On Thu, Oct 17, 2019 at 12:57 PM Shahaf Shuler
> > > > <shahafs@mellanox.com>
> > > > wrote:
> > > > >
> > > > > Some PMDs inline the mbuf data buffer directly to device. This is
> > > > > in order to save the overhead of the PCI headers involved when the
> > > > > device DMA read the buffer pointer. For some devices it is
> > > > > essential in order to reach the pick BW.
> > > > >
> > > > > However, there are cases where such inlining is in-efficient. For
> > > > > example when the data buffer resides on other device memory (like
> > > > > GPU or storage device). attempt to inline such buffer will result
> > > > > in high PCI overhead for reading and copying the data from the remote
> > device.
> > > >
> > > > Some questions to understand the use case # Is this use case where
> > > > CPU, local DRAM, NW card and GPU memory connected on the coherent
> > > > bus
> > >
> > > Yes. For example one can allocate GPU memory and map it to the GPU bar,
> > make it accessible from the host CPU through LD/ST.
> > >
> > > > # Assuming the CPU needs to touch the buffer prior to Tx, In that
> > > > case, it will be useful?
> > >
> > > If the CPU needs to modify the data then no. it will be more efficient to
> > copy the data to CPU and then send it.
> > > However there are use cases where the data is DMA w/ zero copy to the
> > GPU (for example) , GPU perform the processing on the data, and then CPU
> > send the mbuf (w/o touching the data).
> >
> > OK. If I understanding it correctly it is for offloading the Network/Compute
> > functions to GPU from NW card and/or CPU.
>
> Mostly the compute. The networking on this model is expected to be done by the CPU.
> Note this is only one use case.
>
> >
> > >
> > > > # How the application knows, The data buffer is in GPU memory in
> > > > order to use this flag efficiently?
> > >
> > > Because it made it happen. For example it attached the mbuf external
> > buffer from the other device memory.
> > >
> > > > # Just an random thought, Does it help, if we create two different
> > > > mempools one from local DRAM and one from GPU memory so that the
> > > > application can work transparently.
> > >
> > > But you will still need to teach the PMD which pool it can inline and which
> > cannot.
> > > IMO it is more generic to have it per mbuf. Moreover, application has this
> > info.
> >
> > IMO, we can not use PKT_TX_DONT_INLINE_HINT flag for generic
> > applications, The application usage will be tightly coupled with the platform
> > and capabilities of GPU or Host CPU etc.
> >
> > I think, pushing this logic to the application is bad idea. But if you are writing
> > some custom application and the per packet-level you need to control then
> > this flag may be the only way.
>
> Yes. This flag is for custom application who do unique acceleration (by doing Zero copy for compute/compression/encryption accelerators) on specific platforms.
> Such application is fully aware to the platform and the location where the data resides hence it is very simple for it to know how to set this flag.

# If it is per packet, it will be an implicit requirement to add it to the mbuf.

If so,
# Does it make sense to add it through a dynamic mbuf flag? Maybe it is not
worth it for a single bit.

Since we have only 17 bits (40 - 23) remaining for Rx and Tx, and this is a
custom application requirement, how about adding a PKT_PMD_CUSTOM1 flag so
that a similar requirement from other PMDs can leverage the same bit for such
custom applications? (We have a similar use case for a smart NIC - it does
not make much sense for generic applications but is needed per packet.)

>
> Note, This flag is 0 by default - meaning no hint and generic application works same as today.






>
> >
> >
> > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > >
> > > > > To support a mixed traffic pattern (some buffers from local DRAM,
> > > > > some buffers from other devices) with high BW, a hint flag is
> > > > > introduced in the mbuf.
> > > > > Application will hint the PMD whether or not it should try to
> > > > > inline the given mbuf data buffer. PMD should do best effort to
> > > > > act upon this request.
> > > > >
> > > > > Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
> > > > > ---
> > > > >  lib/librte_mbuf/rte_mbuf.h | 9 +++++++++
> > > > >  1 file changed, 9 insertions(+)
> > > > >
> > > > > diff --git a/lib/librte_mbuf/rte_mbuf.h
> > > > > b/lib/librte_mbuf/rte_mbuf.h index 98225ec80b..5934532b7f 100644
> > > > > --- a/lib/librte_mbuf/rte_mbuf.h
> > > > > +++ b/lib/librte_mbuf/rte_mbuf.h
> > > > > @@ -203,6 +203,15 @@ extern "C" {
> > > > >  /* add new TX flags here */
> > > > >
> > > > >  /**
> > > > > + * Hint to PMD to not inline the mbuf data buffer to device
> > > > > + * rather let the device use its DMA engine to fetch the data
> > > > > +with the
> > > > > + * provided pointer.
> > > > > + *
> > > > > + * This flag is a only a hint. PMD should enforce it as best effort.
> > > > > + */
> > > > > +#define PKT_TX_DONT_INLINE_HINT (1ULL << 39)
> > > > > +
> > > > > +/**
> > > > >   * Indicate that the metadata field in the mbuf is in use.
> > > > >   */
> > > > >  #define PKT_TX_METADATA        (1ULL << 40)
> > > > > --
> > > > > 2.12.0
> > > > >
  
Shahaf Shuler Oct. 23, 2019, 11:24 a.m. UTC | #8
Tuesday, October 22, 2019 6:17 PM, Jerin Jacob:
> <viacheslavo@mellanox.com>
> Subject: Re: [dpdk-dev] [RFC PATCH 20.02] mbuf: hint PMD not to inline
> packet

[...]

> > > I think, pushing this logic to the application is bad idea. But if
> > > you are writing some custom application and the per packet-level you
> > > need to control then this flag may be the only way.
> >
> > Yes. This flag is for custom application who do unique acceleration (by doing
> Zero copy for compute/compression/encryption accelerators) on specific
> platforms.
> > Such application is fully aware to the platform and the location where the
> data resides hence it is very simple for it to know how to set this flag.
> 
> # if it is per packet, it will be an implicit requirement to add it mbuf.
> 
> If so,
> # Does it makes sense to add through dynamic mbuf? Maybe it is not worth it
> for a single bit.

You mean:
1. expose a PMD cap for it
2. application enables it in dev offloads
3. PMD registers a bitfield in the dynamic mbuf flags (rte_mbuf_dynflag_register)
4. application registers the same flag to get the bit offset

It can be OK, if the community doesn't see a common use for such a flag.
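[A minimal sketch of steps 3/4 using the rte_mbuf_dynflag_register() API named
above; the flag name string is illustrative only, not a defined constant.]

    #include <rte_mbuf.h>
    #include <rte_mbuf_dyn.h>

    static uint64_t no_inline_mask; /* resolved bit mask, 0 if unavailable */

    static int
    setup_no_inline_dynflag(void)
    {
            /* PMD and application register/look up the same name; the
             * returned bit number gives the position inside ol_flags. */
            static const struct rte_mbuf_dynflag desc = {
                    .name = "example-no-inline-hint", /* illustrative name */
            };
            int bitnum = rte_mbuf_dynflag_register(&desc);

            if (bitnum < 0)
                    return -1;
            no_inline_mask = 1ULL << bitnum;
            return 0;
    }

    /* In the datapath the application then sets:
     *         m->ol_flags |= no_inline_mask;
     */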


> 
> Since we have only 17 bits (40 - 23) remaining for Rx and Tx and it is custom
> application requirement, how about adding PKT_PMD_CUSTOM1 flags so
> that similar requirement by other PMDs can leverage the same bit for such
> custom applications.(We have a similar use case for smart NIC (not so make
> much sense for generic
> applications)  but needed for per packet)
> 
> >
> > Note, This flag is 0 by default - meaning no hint and generic application
> works same as today.
> 
> 
> 
> 
> 
> 
> >
> > >
> > >
> > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > >
> > > > > > To support a mixed traffic pattern (some buffers from local
> > > > > > DRAM, some buffers from other devices) with high BW, a hint
> > > > > > flag is introduced in the mbuf.
> > > > > > Application will hint the PMD whether or not it should try to
> > > > > > inline the given mbuf data buffer. PMD should do best effort
> > > > > > to act upon this request.
> > > > > >
> > > > > > Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
> > > > > > ---
> > > > > >  lib/librte_mbuf/rte_mbuf.h | 9 +++++++++
> > > > > >  1 file changed, 9 insertions(+)
> > > > > >
> > > > > > diff --git a/lib/librte_mbuf/rte_mbuf.h
> > > > > > b/lib/librte_mbuf/rte_mbuf.h index 98225ec80b..5934532b7f
> > > > > > 100644
> > > > > > --- a/lib/librte_mbuf/rte_mbuf.h
> > > > > > +++ b/lib/librte_mbuf/rte_mbuf.h
> > > > > > @@ -203,6 +203,15 @@ extern "C" {
> > > > > >  /* add new TX flags here */
> > > > > >
> > > > > >  /**
> > > > > > + * Hint to PMD to not inline the mbuf data buffer to device
> > > > > > + * rather let the device use its DMA engine to fetch the data
> > > > > > +with the
> > > > > > + * provided pointer.
> > > > > > + *
> > > > > > + * This flag is a only a hint. PMD should enforce it as best effort.
> > > > > > + */
> > > > > > +#define PKT_TX_DONT_INLINE_HINT (1ULL << 39)
> > > > > > +
> > > > > > +/**
> > > > > >   * Indicate that the metadata field in the mbuf is in use.
> > > > > >   */
> > > > > >  #define PKT_TX_METADATA        (1ULL << 40)
> > > > > > --
> > > > > > 2.12.0
> > > > > >
  
Jerin Jacob Oct. 25, 2019, 11:17 a.m. UTC | #9
On Wed, Oct 23, 2019 at 4:54 PM Shahaf Shuler <shahafs@mellanox.com> wrote:
>
> Tuesday, October 22, 2019 6:17 PM, Jerin Jacob:
> > <viacheslavo@mellanox.com>
> > Subject: Re: [dpdk-dev] [RFC PATCH 20.02] mbuf: hint PMD not to inline
> > packet
>
> [...]
>
> > > > I think, pushing this logic to the application is bad idea. But if
> > > > you are writing some custom application and the per packet-level you
> > > > need to control then this flag may be the only way.
> > >
> > > Yes. This flag is for custom application who do unique acceleration (by doing
> > Zero copy for compute/compression/encryption accelerators) on specific
> > platforms.
> > > Such application is fully aware to the platform and the location where the
> > data resides hence it is very simple for it to know how to set this flag.
> >
> > # if it is per packet, it will be an implicit requirement to add it mbuf.
> >
> > If so,
> > # Does it makes sense to add through dynamic mbuf? Maybe it is not worth it
> > for a single bit.
>
> You mean
> 1. expose PMD cap for it
> 2. application enables it on dev offloads
> 3. PMD register bitfield to the dynamic mbuf flags (rte_mbuf_dynflag_register)
> 4. application register same flag to get the bit offset
>
> It can be OK, if the community don't see common use for such flag.

Any scheme based on dynamic mbuf flags should be fine.

>
>
> >
> > Since we have only 17 bits (40 - 23) remaining for Rx and Tx and it is custom
> > application requirement, how about adding PKT_PMD_CUSTOM1 flags so
> > that similar requirement by other PMDs can leverage the same bit for such
> > custom applications.(We have a similar use case for smart NIC (not so make
> > much sense for generic
> > applications)  but needed for per packet)
> >
> > >
> > > Note, This flag is 0 by default - meaning no hint and generic application
> > works same as today.
> >
> >
> >
> >
> >
> >
> > >
> > > >
> > > >
> > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > To support a mixed traffic pattern (some buffers from local
> > > > > > > DRAM, some buffers from other devices) with high BW, a hint
> > > > > > > flag is introduced in the mbuf.
> > > > > > > Application will hint the PMD whether or not it should try to
> > > > > > > inline the given mbuf data buffer. PMD should do best effort
> > > > > > > to act upon this request.
> > > > > > >
> > > > > > > Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
> > > > > > > ---
> > > > > > >  lib/librte_mbuf/rte_mbuf.h | 9 +++++++++
> > > > > > >  1 file changed, 9 insertions(+)
> > > > > > >
> > > > > > > diff --git a/lib/librte_mbuf/rte_mbuf.h
> > > > > > > b/lib/librte_mbuf/rte_mbuf.h index 98225ec80b..5934532b7f
> > > > > > > 100644
> > > > > > > --- a/lib/librte_mbuf/rte_mbuf.h
> > > > > > > +++ b/lib/librte_mbuf/rte_mbuf.h
> > > > > > > @@ -203,6 +203,15 @@ extern "C" {
> > > > > > >  /* add new TX flags here */
> > > > > > >
> > > > > > >  /**
> > > > > > > + * Hint to PMD to not inline the mbuf data buffer to device
> > > > > > > + * rather let the device use its DMA engine to fetch the data
> > > > > > > +with the
> > > > > > > + * provided pointer.
> > > > > > > + *
> > > > > > > + * This flag is a only a hint. PMD should enforce it as best effort.
> > > > > > > + */
> > > > > > > +#define PKT_TX_DONT_INLINE_HINT (1ULL << 39)
> > > > > > > +
> > > > > > > +/**
> > > > > > >   * Indicate that the metadata field in the mbuf is in use.
> > > > > > >   */
> > > > > > >  #define PKT_TX_METADATA        (1ULL << 40)
> > > > > > > --
> > > > > > > 2.12.0
> > > > > > >
  
Slava Ovsiienko Dec. 11, 2019, 5:01 p.m. UTC | #10
Some PMDs inline the mbuf data buffer directly into the device transmit descriptor.
This is in order to save the overhead of the PCI headers imposed when the
device DMA reads the data by buffer pointer. For some devices it is essential
in order to provide the full bandwidth.

However, there are cases where such inlining is inefficient. For example, when
the data buffer resides on other device memory (like a GPU or storage device),
an attempt to inline such a buffer will result in high PCI overhead for reading
and copying the data from the remote device to the host memory.

To support a mixed traffic pattern (some buffers from local host memory, some
buffers from other devices) with high bandwidth, a hint flag is introduced in
the mbuf.

The application will hint the PMD whether or not it should try to inline the
given mbuf data buffer. The PMD should do its best effort to act upon this request.

The hint flag RTE_NET_MLX5_DYNFLAG_NO_INLINE_NAME is supposed to be dynamic,
registered by the application with rte_mbuf_dynflag_register(). This flag is
purely vendor specific and declared in the PMD-specific header rte_pmd_mlx5.h,
which is intended to be used by specific applications.

To query the supported specific flags at runtime, a private routine is
introduced:

int rte_pmd_mlx5_get_dyn_flag_names(uint16_t port,
                                    char *names[],
                                    uint16_t n);

It returns the array of specific flags currently supported (for the present
hardware and configuration).

The "not inline hint" feature operating flow is the following one:
- application start
- probe the devices, ports are created
- query the port capabilities
- if port supporting the feature is found
  - register dynamic flag RTE_NET_MLX5_DYNFLAG_NO_INLINE_NAME
- application starts the ports
- on dev_start() PMD checks whether the feature flag is registered and
  enables the feature support in datapath
- application might set this flag in ol_flags field of mbuf in the packets
  being sent and PMD will handle ones appropriately.

Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

---
v1: https://patches.dpdk.org/patch/61348/
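[A minimal application-side sketch of the flow described above, assuming
rte_pmd_mlx5.h declares RTE_NET_MLX5_DYNFLAG_NO_INLINE_NAME and the query
routine exactly as stated in this message.]

    #include <string.h>
    #include <rte_common.h>
    #include <rte_mbuf_dyn.h>
    #include <rte_pmd_mlx5.h>   /* assumed per the description above */

    static uint64_t no_inline_mask;

    static int
    enable_no_inline_hint(uint16_t port)
    {
            char *names[64];
            int n, i;

            /* Query the vendor-specific flags supported by this port. */
            n = rte_pmd_mlx5_get_dyn_flag_names(port, names, RTE_DIM(names));
            if (n > (int)RTE_DIM(names))
                    n = RTE_DIM(names);
            for (i = 0; i < n; i++) {
                    if (strcmp(names[i],
                               RTE_NET_MLX5_DYNFLAG_NO_INLINE_NAME) != 0)
                            continue;
                    /* Supported: register the dynamic flag to get its bit. */
                    const struct rte_mbuf_dynflag desc = {
                            .name = RTE_NET_MLX5_DYNFLAG_NO_INLINE_NAME,
                    };
                    int bit = rte_mbuf_dynflag_register(&desc);

                    if (bit < 0)
                            return -1;
                    no_inline_mask = 1ULL << bit;
                    return 0;
            }
            return -1;  /* feature not supported on this port */
    }

    /* After dev_start(), for packets whose data must not be inlined:
     *         m->ol_flags |= no_inline_mask;
     */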
  
Olivier Matz Dec. 27, 2019, 8:59 a.m. UTC | #11
Hi Viacheslav,

On Wed, Dec 11, 2019 at 05:01:33PM +0000, Viacheslav Ovsiienko wrote:
> Some PMDs inline the mbuf data buffer directly to device transmit descriptor.
> This is in order to save the overhead of the PCI headers imposed when the
> device DMA reads the data by buffer pointer. For some devices it is essential
> in order to provide the full bandwidth.
> 
> However, there are cases where such inlining is in-efficient. For example, when
> the data buffer resides on other device memory (like GPU or storage device).
> Attempt to inline such buffer will result in high PCI overhead for reading
> and copying the data from the remote device to the host memory.
> 
> To support a mixed traffic pattern (some buffers from local host memory, some
> buffers from other devices) with high bandwidth, a hint flag is introduced in
> the mbuf.
> 
> Application will hint the PMD whether or not it should try to inline the
> given mbuf data buffer. PMD should do the best effort to act upon this request.
> 
> The hint flag RTE_NET_MLX5_DYNFLAG_NO_INLINE_NAME is supposed to be dynamic,
> registered by application with rte_mbuf_dynflag_register(). This flag is
> purely vendor specific and declared in PMD specific header rte_pmd_mlx5.h,
> which is intended to be used by specific application.
> 
> To query the supported specific flags in runtime the private routine is
> introduced:
> 
> int rte_pmd_mlx5_get_dyn_flag_names(
>         uint16_t port,
> 	char *names[],
>         uint16_t n)
> 
> It returns the array of currently (over present hardware and configuration)
> supported specific flags.
> 
> The "not inline hint" feature operating flow is the following one:
> - application start
> - probe the devices, ports are created
> - query the port capabilities
> - if port supporting the feature is found
>   - register dynamic flag RTE_NET_MLX5_DYNFLAG_NO_INLINE_NAME
> - application starts the ports
> - on dev_start() PMD checks whether the feature flag is registered and
>   enables the feature support in datapath
> - application might set this flag in ol_flags field of mbuf in the packets
>   being sent and PMD will handle ones appropriately.
> 
> Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> 
> ---
> v1: https://patches.dpdk.org/patch/61348/
> 

It looks like the patch is missing.

I think a dynamic flag is a good solution for this problem: the application
can pass a PMD-specific hint to the PMD, without impacting the way it works
today.


Olivier
  

Patch

diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 98225ec80b..5934532b7f 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -203,6 +203,15 @@  extern "C" {
 /* add new TX flags here */
 
 /**
+ * Hint to PMD to not inline the mbuf data buffer to device
+ * rather let the device use its DMA engine to fetch the data with the
+ * provided pointer.
+ *
+ * This flag is a only a hint. PMD should enforce it as best effort.
+ */
+#define PKT_TX_DONT_INLINE_HINT (1ULL << 39)
+
+/**
  * Indicate that the metadata field in the mbuf is in use.
  */
 #define PKT_TX_METADATA	(1ULL << 40)