[dpdk-dev,4/6] ethdev: introduce TX common tunnel offloads

Message ID 20180109141110.146250-5-xuemingl@mellanox.com (mailing list archive)
State Superseded, archived

Checks

Context               Check    Description
ci/checkpatch         success  coding style OK
ci/Intel-compilation  fail     apply patch file failure

Commit Message

Xueming Li Jan. 9, 2018, 2:11 p.m. UTC
This patch introduces a new DEV_TX_OFFLOAD_GENERIC_TNL_CKSUM_TSO flag for
devices that support tunnel-agnostic TX checksum and TSO offloading.

Checksum offsets and the TSO header length are calculated from the mbuf
inner lengths l*_len, the outer lengths outer_l*_len and the TX offload
flags PKT_TX_*. The tunnel header length is counted as part of the inner
l2_len, so the device hardware can do checksum and TSO calculation
without knowledge of the particular tunnel type.

When the flag is set, the application must guarantee correct header types
and lengths for each inner and outer header in the mbuf; there is no need
to specify the tunnel type.
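
A minimal sketch of the length convention described in the commit message, for Ether / IPv4 / UDP / VXLAN / Ether / IPv4 / TCP. The `struct tx_lens` type only mirrors the names of the rte_mbuf TX length fields so that the example compiles without DPDK; it and the `HLEN_*` constants are hypothetical, and the header sizes assume no VLAN tags and no IP or TCP options.

```c
#include <stdint.h>

/* Hypothetical stand-in for the TX length fields of struct rte_mbuf. */
struct tx_lens {
	uint16_t outer_l2_len; /* outer Ethernet */
	uint16_t outer_l3_len; /* outer IP */
	uint16_t l2_len;       /* outer L4 + tunnel header + inner L2 */
	uint16_t l3_len;       /* inner IP */
	uint16_t l4_len;       /* inner L4 */
};

/* Header sizes, assuming no VLAN tags and no IP/TCP options. */
enum {
	HLEN_ETH = 14,
	HLEN_IPV4 = 20,
	HLEN_UDP = 8,
	HLEN_VXLAN = 8,
	HLEN_TCP = 20,
};

/* Ether / IPv4 / UDP / VXLAN / Ether / IPv4 / TCP / Data */
static struct tx_lens vxlan_tcp_lens(void)
{
	struct tx_lens l = {
		.outer_l2_len = HLEN_ETH,
		.outer_l3_len = HLEN_IPV4,
		/* The tunnel is opaque to the device: outer UDP, VXLAN
		 * and inner Ethernet are all folded into l2_len. */
		.l2_len = HLEN_UDP + HLEN_VXLAN + HLEN_ETH,
		.l3_len = HLEN_IPV4,
		.l4_len = HLEN_TCP,
	};
	return l;
}
```

For this layout the headers total outer_l2_len + outer_l3_len + l2_len + l3_len + l4_len = 14 + 20 + 30 + 20 + 20 = 104 bytes. Another tunnel type would change only how l2_len is composed.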

Signed-off-by: Xueming Li <xuemingl@mellanox.com>
---
 lib/librte_ether/rte_ethdev.h | 9 +++++++++
 1 file changed, 9 insertions(+)
  

Comments

Ferruh Yigit Jan. 11, 2018, 6:38 p.m. UTC | #1
On 1/9/2018 2:11 PM, Xueming Li wrote:
> This patch introduce new DEV_TX_OFFLOAD_GENERIC_TNL_CKSUM_TSO flag for
> devices that support tunnel agnostic TX checksum and tso offloading.
> 
> Checksum offset and TSO header length are calculated based on mbuf
> inner length l*_len, outer_l*_len and tx offload flags PKT_TX_*, tunnel
> header length is part of inner l2_len, so device HW do cheksum and TSO
> calculation w/o knowledge of perticular tunnel type.
> 
> When set application must guarantee that correct header types and
> lengths for each inner and outer headers in mbuf header, no need to
> specify tunnel type.
> 
> Signed-off-by: Xueming Li <xuemingl@mellanox.com>

Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
  
Olivier Matz Jan. 16, 2018, 5:10 p.m. UTC | #2
Hi Xueming,

On Tue, Jan 09, 2018 at 10:11:08PM +0800, Xueming Li wrote:
> This patch introduce new DEV_TX_OFFLOAD_GENERIC_TNL_CKSUM_TSO flag for
> devices that support tunnel agnostic TX checksum and tso offloading.
> 
> Checksum offset and TSO header length are calculated based on mbuf
> inner length l*_len, outer_l*_len and tx offload flags PKT_TX_*, tunnel
> header length is part of inner l2_len, so device HW do cheksum and TSO
> calculation w/o knowledge of perticular tunnel type.
> 
> When set application must guarantee that correct header types and
> lengths for each inner and outer headers in mbuf header, no need to
> specify tunnel type.
> 
> Signed-off-by: Xueming Li <xuemingl@mellanox.com>
> ---
>  lib/librte_ether/rte_ethdev.h | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
> index 57b61ed41..8457d01be 100644
> --- a/lib/librte_ether/rte_ethdev.h
> +++ b/lib/librte_ether/rte_ethdev.h
> @@ -1003,6 +1003,15 @@ struct rte_eth_conf {
>   *   the same mempool and has refcnt = 1.
>   */
>  #define DEV_TX_OFFLOAD_SECURITY         0x00020000
> +/**< Device supports arbitrary tunnel chksum and tso offloading w/o knowing
> + *   tunnel detail. Checksum and TSO are calculated based on mbuf fields:
> + *     l*_len, outer_l*_len
> + *     PKT_TX_OUTER_IPV6, PKT_TX_IPV6
> + *     PKT_TX_IP_CKSUM, PKT_TX_TCP_CKSUM, PKT_TX_UDP_CKSUM
> + *   When set application must guarantee correct header fields, no need to
> + *   specify tunnel type PKT_TX_TUNNEL_* for HW.
> + */
> +#define DEV_TX_OFFLOAD_GENERIC_TNL_CKSUM_TSO	0x00040000
>  
>  struct rte_pci_device;
>  

I'd like to have more details about this flag and its meaning.

Let's say we want to do TSO on the following vxlan packet:
  Ether / IP1 / UDP / VXLAN / Ether / IP2 / TCP / Data

With the current API, we need to pass the tunnel type to the hardware
with the flag PKT_TX_TUNNEL_VXLAN. Thanks to that, the driver can
forward this information to the hardware so it knows that the
ip1->length, udp->length and optionally the udp->chksum have to be
updated for each generated segment.

With your proposal, if I understand properly, it is not expected to pass
the tunnel type in the mbuf. So how can the hardware know if some fields
have to be updated in the outer header?
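
To make the question concrete, here is a sketch of the outer VXLAN fields that change from one generated TSO segment to the next. It assumes IPv4 outer and inner headers without options, a TCP header without options and an 8-byte VXLAN header; the function and type names are hypothetical.

```c
#include <stdint.h>

enum {
	HLEN_ETH = 14, HLEN_IPV4 = 20, HLEN_UDP = 8,
	HLEN_VXLAN = 8, HLEN_TCP = 20,
};

/* Outer fields rewritten per segment (hypothetical type). */
struct outer_seg_fields {
	uint16_t ip_total_len; /* outer IPv4 total length (ip1->length) */
	uint16_t udp_len;      /* outer UDP length (udp->length) */
};

/* Outer lengths for one segment carrying seg_payload TCP bytes.
 * udp_len covers everything from the outer UDP header onward. */
static struct outer_seg_fields vxlan_outer_fields(uint16_t seg_payload)
{
	struct outer_seg_fields f;

	f.udp_len = HLEN_UDP + HLEN_VXLAN + HLEN_ETH + HLEN_IPV4 +
		    HLEN_TCP + seg_payload;
	f.ip_total_len = HLEN_IPV4 + f.udp_len;
	return f;
}

/* TCP payload carried by segment idx (0-based) when total bytes are
 * split into chunks of at most mss bytes; 0 past the last segment. */
static uint16_t seg_payload_len(uint32_t total, uint16_t mss, uint32_t idx)
{
	uint32_t off = idx * mss;

	if (off >= total)
		return 0;
	return (total - off < mss) ? (uint16_t)(total - off) : mss;
}
```

With 2500 bytes of TCP payload and an MSS of 1000, the last segment carries 500 bytes, so its outer UDP length is 570 and its outer IPv4 total length is 590, while the first two segments get 1070 and 1090. The outer IPv4 header checksum, and the outer UDP checksum when it is non-zero, change along with these lengths and are not computed here.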
  
Xueming Li Jan. 16, 2018, 5:28 p.m. UTC | #3
Hi Olivier,

Thanks for looking into this.

> -----Original Message-----
> From: Olivier Matz [mailto:olivier.matz@6wind.com]
> Sent: Wednesday, January 17, 2018 1:10 AM
> To: Xueming(Steven) Li <xuemingl@mellanox.com>
> Cc: Thomas Monjalon <thomas@monjalon.net>; Jingjing Wu
> <jingjing.wu@intel.com>; Yongseok Koh <yskoh@mellanox.com>; Shahaf Shuler
> <shahafs@mellanox.com>; dev@dpdk.org
> Subject: Re: [PATCH 4/6] ethdev: introduce TX common tunnel offloads
>
> Hi Xueming,
>
> On Tue, Jan 09, 2018 at 10:11:08PM +0800, Xueming Li wrote:
> > This patch introduce new DEV_TX_OFFLOAD_GENERIC_TNL_CKSUM_TSO flag for
> > devices that support tunnel agnostic TX checksum and tso offloading.
> >
> > Checksum offset and TSO header length are calculated based on mbuf
> > inner length l*_len, outer_l*_len and tx offload flags PKT_TX_*,
> > tunnel header length is part of inner l2_len, so device HW do cheksum
> > and TSO calculation w/o knowledge of perticular tunnel type.
> >
> > When set application must guarantee that correct header types and
> > lengths for each inner and outer headers in mbuf header, no need to
> > specify tunnel type.
> >
> > Signed-off-by: Xueming Li <xuemingl@mellanox.com>
> > ---
> >  lib/librte_ether/rte_ethdev.h | 9 +++++++++
> >  1 file changed, 9 insertions(+)
> >
> > diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
> > index 57b61ed41..8457d01be 100644
> > --- a/lib/librte_ether/rte_ethdev.h
> > +++ b/lib/librte_ether/rte_ethdev.h
> > @@ -1003,6 +1003,15 @@ struct rte_eth_conf {
> >   *   the same mempool and has refcnt = 1.
> >   */
> >  #define DEV_TX_OFFLOAD_SECURITY         0x00020000
> > +/**< Device supports arbitrary tunnel chksum and tso offloading w/o knowing
> > + *   tunnel detail. Checksum and TSO are calculated based on mbuf fields:
> > + *     l*_len, outer_l*_len
> > + *     PKT_TX_OUTER_IPV6, PKT_TX_IPV6
> > + *     PKT_TX_IP_CKSUM, PKT_TX_TCP_CKSUM, PKT_TX_UDP_CKSUM
> > + *   When set application must guarantee correct header fields, no need to
> > + *   specify tunnel type PKT_TX_TUNNEL_* for HW.
> > + */
> > +#define DEV_TX_OFFLOAD_GENERIC_TNL_CKSUM_TSO	0x00040000
> >
> >  struct rte_pci_device;
> >
>
> I'd like to have more details about this flag and its meaning.
>
> Let's say we want to do TSO on the following vxlan packet:
>   Ether / IP1 / UDP / VXLAN / Ether / IP2 / TCP / Data
>
> With the current API, we need to pass the tunnel type to the hardware with
> the flag PKT_TX_TUNNEL_VXLAN. Thanks to that, the driver can forward this
> information to the hardware so it knows that the
> ip1->length, udp->length and optionally the udp->chksum have to be
> updated for each generated segment.
>
> With your proposal, if I understand properly, it is not expected to pass
> the tunnel type in the mbuf. So how can the hardware know if some fields
> have to be updated in the outer header?

I'm not an expert on the hardware. The driver has to supply the outer and
inner L3/L4 offsets, the header types, and which field(s) to fill with
checksums; as far as I know, no length update is involved.
  
Shahaf Shuler Jan. 16, 2018, 7:06 p.m. UTC | #4
Hi Olivier, Xueming,

Tuesday, January 16, 2018 7:29 PM, Xueming(Steven) Li:
> > Hi Xueming,
> >
> > On Tue, Jan 09, 2018 at 10:11:08PM +0800, Xueming Li wrote:
> > >   */
> > >  #define DEV_TX_OFFLOAD_SECURITY         0x00020000
> > > +/**< Device supports arbitrary tunnel chksum and tso offloading w/o knowing
> > > + *   tunnel detail. Checksum and TSO are calculated based on mbuf fields:
> > > + *     l*_len, outer_l*_len
> > > + *     PKT_TX_OUTER_IPV6, PKT_TX_IPV6
> > > + *     PKT_TX_IP_CKSUM, PKT_TX_TCP_CKSUM, PKT_TX_UDP_CKSUM
> > > + *   When set application must guarantee correct header fields, no need to
> > > + *   specify tunnel type PKT_TX_TUNNEL_* for HW.
> > > + */

I think some documentation is missing here.
To support the generic tunnel TSO and checksum offloads, the NIC needs to know:
1. the length of each header;
2. the type of the outer/inner L3/L4, i.e. whether it is IPv4 or IPv6 and whether it is UDP or TCP.

The outer IPv6 type seems covered. The inner L4 type seems missing.

More details about this offload below.

> > > +#define DEV_TX_OFFLOAD_GENERIC_TNL_CKSUM_TSO	0x00040000
> > >
> > >  struct rte_pci_device;
> > >
> >
> > I'd like to have more details about this flag and its meaning.
> >
> > Let's say we want to do TSO on the following vxlan packet:
> >   Ether / IP1 / UDP / VXLAN / Ether / IP2 / TCP / Data
> >
> > With the current API, we need to pass the tunnel type to the hardware
> > with the flag PKT_TX_TUNNEL_VXLAN. Thanks to that, the driver can
> > forward this information to the hardware so it knows that the
> > ip1->length, udp->length and optionally the udp->chksum have to be
> > updated for each generated segment.
> >
> > With your proposal, if I understand properly, it is not expected to
> > pass the tunnel type in the mbuf. So how can the hardware know if some
> > fields have to be updated in the outer header?
>
> I'm not expert on hardware, the driver has to supply outer and inner
> L3/L4 offsets, types and which field(s) to fill checksum, no length update as
> far as I know.

Mellanox HW is capable of parsing the packet according to hints from the driver.

If you think about it, to calculate the IP checksum all you need to know is the inner/outer IP offset and the fact that it is IPv4.
The same goes for the inner TCP/UDP checksum: everything after the L4 header is counted as payload, and the pseudo header can be built from the information in the IP header.
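
As a sketch of why no tunnel knowledge is needed for the L4 checksum: the IPv4 TCP/UDP pseudo header (RFC 768 / RFC 793) is built only from the addresses and protocol of the IP header that precedes the L4 header, plus the L4 length. This is plain illustrative arithmetic with addresses in host byte order for simplicity, not driver code; DPDK's own helper for this is rte_ipv4_phdr_cksum() in rte_ip.h.

```c
#include <stdint.h>

/* Fold a 32-bit one's-complement accumulator into 16 bits. */
static uint16_t csum_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* One's-complement sum of the IPv4 pseudo header used by TCP/UDP:
 * source address, destination address, zero byte + protocol, L4 length.
 * The result is not inverted, so it can seed the checksum computed
 * over the L4 header and payload. */
static uint16_t ipv4_pseudo_sum(uint32_t src, uint32_t dst,
				uint8_t proto, uint16_t l4_len)
{
	uint32_t sum = 0;

	sum += src >> 16;
	sum += src & 0xffff;
	sum += dst >> 16;
	sum += dst & 0xffff;
	sum += proto; /* 16-bit word: zero byte, then protocol */
	sum += l4_len;
	return csum_fold(sum);
}
```

For 192.168.0.1 to 192.168.0.2, TCP (protocol 6) and a 20-byte L4 length, the folded sum is 0x816E; nothing in it depends on what sits between the outer and inner headers.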

About TSO: the NIC just needs the offset up to the end of the inner headers, so that it can prepend the full headers to every segment and update the inner/outer L3 and L4 fields accordingly (their locations are known).

All of this can be derived from the mbuf fields. Given those fields, the driver can calculate:
Outer_l3_offset = outer_l2_len
Outer_l4_offset = Outer_l3_offset + outer_l3_len
Inner_l3_offset = Outer_l4_offset + l2_len
Inner_l4_offset = Inner_l3_offset + l3_len

These offsets are then passed to the device. Theoretically, multiple encapsulations are supported by enlarging l2_len.
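
The offset arithmetic can be sketched in C as follows. The `struct tx_lens` mirror of the rte_mbuf length fields and the `tx_offsets()` name are hypothetical; the function reproduces the four offset formulas, plus the end of the inner L4 header, which is the header length a TSO engine would replicate into each segment.

```c
#include <stdint.h>

/* Hypothetical mirror of the rte_mbuf TX length fields. */
struct tx_lens {
	uint16_t outer_l2_len, outer_l3_len;
	uint16_t l2_len, l3_len, l4_len;
};

/* Byte offsets from the start of the packet (hypothetical type). */
struct hdr_offsets {
	uint16_t outer_l3, outer_l4, inner_l3, inner_l4, payload;
};

static struct hdr_offsets tx_offsets(const struct tx_lens *m)
{
	struct hdr_offsets o;

	o.outer_l3 = m->outer_l2_len;
	o.outer_l4 = o.outer_l3 + m->outer_l3_len;
	/* l2_len spans outer L4 + tunnel header(s) + inner L2. */
	o.inner_l3 = o.outer_l4 + m->l2_len;
	o.inner_l4 = o.inner_l3 + m->l3_len;
	/* End of all headers: what TSO copies into every segment. */
	o.payload = o.inner_l4 + m->l4_len;
	return o;
}
```

For an Ether / IPv4 / UDP / VXLAN / Ether / IPv4 / TCP packet without options (lengths 14, 20, 30, 20, 20) this gives offsets 14, 34, 64 and 84, with 104 bytes of headers. When there is no outer L4 header, for instance GRE directly over IP, outer_l4 simply points at the first tunnel header.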
 
Hope this explains more about this generic offload.
  
Yongseok Koh Jan. 17, 2018, 12:50 a.m. UTC | #5
> On Jan 9, 2018, at 6:11 AM, Xueming Li <xuemingl@mellanox.com> wrote:
> 
> This patch introduce new DEV_TX_OFFLOAD_GENERIC_TNL_CKSUM_TSO flag for
> devices that support tunnel agnostic TX checksum and tso offloading.
> 
> Checksum offset and TSO header length are calculated based on mbuf
> inner length l*_len, outer_l*_len and tx offload flags PKT_TX_*, tunnel
> header length is part of inner l2_len, so device HW do cheksum and TSO
> calculation w/o knowledge of perticular tunnel type.
> 
> When set application must guarantee that correct header types and
> lengths for each inner and outer headers in mbuf header, no need to
> specify tunnel type.
> 
> Signed-off-by: Xueming Li <xuemingl@mellanox.com>
> ---
> lib/librte_ether/rte_ethdev.h | 9 +++++++++
> 1 file changed, 9 insertions(+)
> 
> diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
> index 57b61ed41..8457d01be 100644
> --- a/lib/librte_ether/rte_ethdev.h
> +++ b/lib/librte_ether/rte_ethdev.h
> @@ -1003,6 +1003,15 @@ struct rte_eth_conf {
>  *   the same mempool and has refcnt = 1.
>  */
> #define DEV_TX_OFFLOAD_SECURITY         0x00020000
> +/**< Device supports arbitrary tunnel chksum and tso offloading w/o knowing
> + *   tunnel detail. Checksum and TSO are calculated based on mbuf fields:
> + *     l*_len, outer_l*_len
> + *     PKT_TX_OUTER_IPV6, PKT_TX_IPV6
> + *     PKT_TX_IP_CKSUM, PKT_TX_TCP_CKSUM, PKT_TX_UDP_CKSUM
> + *   When set application must guarantee correct header fields, no need to
> + *   specify tunnel type PKT_TX_TUNNEL_* for HW.
> + */
> +#define DEV_TX_OFFLOAD_GENERIC_TNL_CKSUM_TSO	0x00040000
> 
> struct rte_pci_device;

I'm wondering why generic tunnel offload has to support checksum and TSO
together. Those two are orthogonal, aren't they? App can request HW checksum
offload even for non-TSO packets. Does DEV_TX_OFFLOAD_GENERIC_TNL_CKSUM_TSO mean
HW can support checksum and TSO for generic tunnel? Then shouldn't it be two
flags instead? E.g.
DEV_TX_OFFLOAD_GENERIC_TNL_TSO
DEV_TX_OFFLOAD_GENERIC_TNL_CKSUM

Thanks
Yongseok
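
A sketch of how an application might test the two capabilities if they were split as suggested in this message. Neither flag exists in rte_ethdev.h in this thread; the names are taken from the suggestion, the bit values are made up for illustration, and `capa` stands in for the device's TX offload capability mask.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical capability bits; values are illustrative only. */
#define DEV_TX_OFFLOAD_GENERIC_TNL_CKSUM 0x00040000ULL
#define DEV_TX_OFFLOAD_GENERIC_TNL_TSO   0x00080000ULL

/* Can the device offload this tunneled packet without being told
 * the tunnel type, given what the packet requests? */
static bool generic_tnl_ok(uint64_t capa, bool want_cksum, bool want_tso)
{
	if (want_cksum && !(capa & DEV_TX_OFFLOAD_GENERIC_TNL_CKSUM))
		return false;
	if (want_tso && !(capa & DEV_TX_OFFLOAD_GENERIC_TNL_TSO))
		return false;
	return true;
}
```

A device advertising only the checksum bit would accept checksum-only packets and reject TSO ones, a distinction a single combined flag cannot express.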
  
Olivier Matz Jan. 22, 2018, 12:46 p.m. UTC | #6
Hi,

On Tue, Jan 16, 2018 at 07:06:15PM +0000, Shahaf Shuler wrote:
> Hi Oliver, Xueming,
> 
> Tuesday, January 16, 2018 7:29 PM, Xueming(Steven) Li:
> > > Hi Xueming,
> > >
> > > On Tue, Jan 09, 2018 at 10:11:08PM +0800, Xueming Li wrote:
> > > >   */
> > > >  #define DEV_TX_OFFLOAD_SECURITY         0x00020000
> > > > +/**< Device supports arbitrary tunnel chksum and tso offloading w/o knowing
> > > > + *   tunnel detail. Checksum and TSO are calculated based on mbuf fields:
> > > > + *     l*_len, outer_l*_len
> > > > + *     PKT_TX_OUTER_IPV6, PKT_TX_IPV6
> > > > + *     PKT_TX_IP_CKSUM, PKT_TX_TCP_CKSUM, PKT_TX_UDP_CKSUM
> > > > + *   When set application must guarantee correct header fields, no need to
> > > > + *   specify tunnel type PKT_TX_TUNNEL_* for HW.
> > > > + */
> 
> I think some documentation is missing here.
> What the NIC needs to know to support the generic tunnel TSO and checksum offloads is:
> 1. the length of each header
> 2. the type of the outer/inner l3/l4. Meaning is it IPv4/IPv6 and whether it is UDP/TCP.
> 
> The outer IPv6 seems covered. The inner L4 seems missing. 
> 
> More details about this offload below.
> 
> > > > +#define DEV_TX_OFFLOAD_GENERIC_TNL_CKSUM_TSO	0x00040000
> > > >
> > > >  struct rte_pci_device;
> > > >
> > >
> > > I'd like to have more details about this flag and its meaning.
> > >
> > > Let's say we want to do TSO on the following vxlan packet:
> > >   Ether / IP1 / UDP / VXLAN / Ether / IP2 / TCP / Data
> > >
> > > With the current API, we need to pass the tunnel type to the hardware
> > > with the flag PKT_TX_TUNNEL_VXLAN. Thanks to that, the driver can
> > > forward this information to the hardware so it knows that the
> > > ip1->length, udp->length and optionally the udp->chksum have to be
> > > updated for each generated segment.
> > >
> > > With your proposal, if I understand properly, it is not expected to
> > > pass the tunnel type in the mbuf. So how can the hardware know if some
> > > fields have to be updated in the outer header?
> > 
> > I'm not expert on hardware, the driver has to supply outer and inner
> > L3/L4 offsets, types and which field(s) to fill checksum, no length update as
> > far as I know.
> 
> Mellanox HW is capable to parse the packet according to hints from the driver.
> 
> If you think about it, to calculate the IP checksum all you need to do is know the inner/outer IP offset, and the fact it is an IPv4.
> To calculate the inner TCP/UDP checksum it is the same. all that after the L4 is counted as payload and the pseudo header can be done with the information about the IP.
> 
> About TSO - just need to get the offset till the inner header so that the NIC can append the full headers to every segment and update the inner/outer L3 and L4 fields accordingly (which their location is known). 

I think that's partially true. Let me try to clarify:

- in case of VXLAN (my previous example), the hw needs to update the
  outer L3 (ip length) and L4 (udp length and optionally checksum)
- in case of GRE, an update of the checksum is required if present. The
  sequence number may also be increased (I don't know how widely it is
  used).
- in case of a proprietary or unsupported tunnel, the hardware cannot
  know which fields to update in the outer header. So I'm not sure
  a "generic" flag is possible.

How can the application know which tunnels types are supported by the
hardware and which should be done in software?
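
Olivier's point can be sketched as a table of per-segment outer-header updates keyed by tunnel type. The enum names and the function are hypothetical; the VXLAN and GRE entries follow the list in this message, and the outer IP length is included for every tunnel because it always covers the whole segment.

```c
#include <stdbool.h>

/* Outer-header fields that change per TSO segment (hypothetical). */
enum outer_update {
	UPD_OUTER_IP_LEN   = 1 << 0, /* outer IP length */
	UPD_OUTER_UDP_LEN  = 1 << 1, /* outer UDP length */
	UPD_OUTER_UDP_CSUM = 1 << 2, /* outer UDP checksum, if non-zero */
	UPD_GRE_CSUM       = 1 << 3, /* GRE checksum, if present */
	UPD_GRE_SEQ        = 1 << 4, /* GRE sequence number, if present */
};

enum tnl_kind { TNL_VXLAN, TNL_GRE, TNL_UNKNOWN };

/* Fills *mask with the fields to update; returns false when the
 * tunnel-specific fields cannot be known. has_csum means "non-zero
 * outer UDP checksum" for VXLAN and "checksum present" for GRE. */
static bool outer_updates(enum tnl_kind kind, bool has_csum, bool has_seq,
			  unsigned *mask)
{
	*mask = UPD_OUTER_IP_LEN;

	switch (kind) {
	case TNL_VXLAN:
		*mask |= UPD_OUTER_UDP_LEN;
		if (has_csum)
			*mask |= UPD_OUTER_UDP_CSUM;
		return true;
	case TNL_GRE:
		if (has_csum)
			*mask |= UPD_GRE_CSUM;
		if (has_seq)
			*mask |= UPD_GRE_SEQ;
		return true;
	default:
		/* Proprietary or unsupported tunnel. */
		return false;
	}
}
```

The last case is the crux of the question: without a tunnel type, a device could still adjust fields at offsets it is given, but not tunnel-specific ones such as a GRE checksum or sequence number.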


Olivier
  
Shahaf Shuler Jan. 22, 2018, 8:06 p.m. UTC | #7
Monday, January 22, 2018 2:47 PM, Olivier Matz:
> Hi,
> 
> On Tue, Jan 16, 2018 at 07:06:15PM +0000, Shahaf Shuler wrote:
> > Hi Oliver, Xueming,
> >
> > Tuesday, January 16, 2018 7:29 PM, Xueming(Steven) Li:
> > > > Hi Xueming,
> > > >
> > > > On Tue, Jan 09, 2018 at 10:11:08PM +0800, Xueming Li wrote:
> > > > >   */
> > > > >  #define DEV_TX_OFFLOAD_SECURITY         0x00020000
> > > > > +/**< Device supports arbitrary tunnel chksum and tso offloading w/o knowing
> > > > > + *   tunnel detail. Checksum and TSO are calculated based on mbuf fields:
> > > > > + *     l*_len, outer_l*_len
> > > > > + *     PKT_TX_OUTER_IPV6, PKT_TX_IPV6
> > > > > + *     PKT_TX_IP_CKSUM, PKT_TX_TCP_CKSUM, PKT_TX_UDP_CKSUM
> > > > > + *   When set application must guarantee correct header fields, no need to
> > > > > + *   specify tunnel type PKT_TX_TUNNEL_* for HW.
> > > > > + */
> >
> > I think some documentation is missing here.
> > What the NIC needs to know to support the generic tunnel TSO and checksum offloads is:
> > 1. the length of each header
> > 2. the type of the outer/inner l3/l4. Meaning is it IPv4/IPv6 and whether it is UDP/TCP.
> >
> > The outer IPv6 seems covered. The inner L4 seems missing.
> >
> > More details about this offload below.
> >
> > > > > +#define DEV_TX_OFFLOAD_GENERIC_TNL_CKSUM_TSO	0x00040000
> > > > >
> > > > >  struct rte_pci_device;
> > > > >
> > > >
> > > > I'd like to have more details about this flag and its meaning.
> > > >
> > > > Let's say we want to do TSO on the following vxlan packet:
> > > >   Ether / IP1 / UDP / VXLAN / Ether / IP2 / TCP / Data
> > > >
> > > > With the current API, we need to pass the tunnel type to the
> > > > hardware with the flag PKT_TX_TUNNEL_VXLAN. Thanks to that, the
> > > > driver can forward this information to the hardware so it knows
> > > > that the
> > > > ip1->length, udp->length and optionally the udp->chksum have to be
> > > > updated for each generated segment.
> > > >
> > > > With your proposal, if I understand properly, it is not expected
> > > > to pass the tunnel type in the mbuf. So how can the hardware know
> > > > if some fields have to be updated in the outer header?
> > >
> > > I'm not expert on hardware, the driver has to supply outer and inner
> > > L3/L4 offsets, types and which field(s) to fill checksum, no length
> > > update as far as I know.
> >
> > Mellanox HW is capable to parse the packet according to hints from the driver.
> >
> > If you think about it, to calculate the IP checksum all you need to do is know the inner/outer IP offset, and the fact it is an IPv4.
> > To calculate the inner TCP/UDP checksum it is the same. all that after the L4 is counted as payload and the pseudo header can be done with the information about the IP.
> >
> > About TSO - just need to get the offset till the inner header so that the NIC can append the full headers to every segment and update the inner/outer L3 and L4 fields accordingly (which their location is known).
> 
> I think that's partially true. Let me try to clarify:
> 
> - in case of VXLAN (my previous example), the hw needs to update the
>   outer L3 (ip length) and L4 (udp length and optionnally checksum)
> - in case of GRE, an update of the checksum is required if present. The
>   sequence number may also be increased (I don't know how widely it is
>   used).
> - in case of a proprietary or unsupported tunnel, the hardware cannot
>   know which fields to update in the outer header. So I'm not sure
>   a "generic" flag is possible.
> 
> How can the application know which tunnels types are supported by the
> hardware and which should be done in software?

Yes, I understand your point. Maybe we should rephrase and change the name of the feature.

The device supports inner and outer IPv4/TCP/UDP checksums and TSO for *any packet with the following format*:

< some headers > / [optional IPv4/IPv6] / [optional TCP/UDP] / <some headers> / [optional inner IPv4/IPv6] / [optional TCP/UDP]

For example, the following packets can use this feature:

1. eth / ipv4 / udp / VXLAN / ip / tcp
2. eth / ipv4 / GRE / MPLS / ipv4 / udp
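
A sketch of how a packet in this format maps onto mbuf-style length fields. The header descriptors, `struct tx_lens` and `lens_from_layout()` are hypothetical. For simplicity the sketch requires an outer IP header, an inner IP header and an inner TCP/UDP header, and it folds any outer L4 and tunnel headers into l2_len, as the generic scheme in this patch proposes.

```c
#include <stddef.h>
#include <stdint.h>

enum hdr_kind { H_OPAQUE, H_IPV4, H_IPV6, H_TCP, H_UDP };

struct hdr {
	enum hdr_kind kind;
	uint16_t len;
};

/* Hypothetical mirror of the rte_mbuf TX length fields. */
struct tx_lens {
	uint16_t outer_l2_len, outer_l3_len, l2_len, l3_len, l4_len;
};

static int is_ip(enum hdr_kind k) { return k == H_IPV4 || k == H_IPV6; }
static int is_l4(enum hdr_kind k) { return k == H_TCP || k == H_UDP; }

/* <hdrs> / IP / [L4] / <hdrs> / IP / L4  ->  mbuf-style lengths.
 * Returns 0 on success, -1 if the layout does not fit. */
static int lens_from_layout(const struct hdr *h, size_t n,
			    struct tx_lens *out)
{
	size_t i = 0;
	uint16_t acc = 0;

	*out = (struct tx_lens){0};
	while (i < n && !is_ip(h[i].kind))  /* outer L2 */
		acc += h[i++].len;
	if (i == n)
		return -1;
	out->outer_l2_len = acc;
	out->outer_l3_len = h[i++].len;     /* outer IP */

	acc = 0;
	while (i < n && !is_ip(h[i].kind))  /* outer L4, tunnel, inner L2 */
		acc += h[i++].len;
	if (i == n)
		return -1;
	out->l2_len = acc;
	out->l3_len = h[i++].len;           /* inner IP */

	if (i == n || !is_l4(h[i].kind))
		return -1;
	out->l4_len = h[i].len;             /* inner L4 */
	return 0;
}
```

Assuming option-less IP headers and 4-byte GRE and MPLS headers, example 1 yields outer_l2_len 14, outer_l3_len 20, l2_len 30 (UDP + VXLAN + inner Ethernet), l3_len 20 and l4_len 20; example 2 yields l2_len 8 (GRE + MPLS, no inner Ethernet) and l4_len 8.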


> 
> 
> Olivier
  

Patch

diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 57b61ed41..8457d01be 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1003,6 +1003,15 @@  struct rte_eth_conf {
  *   the same mempool and has refcnt = 1.
  */
 #define DEV_TX_OFFLOAD_SECURITY         0x00020000
+/**< Device supports arbitrary tunnel chksum and tso offloading w/o knowing
+ *   tunnel detail. Checksum and TSO are calculated based on mbuf fields:
+ *     l*_len, outer_l*_len
+ *     PKT_TX_OUTER_IPV6, PKT_TX_IPV6
+ *     PKT_TX_IP_CKSUM, PKT_TX_TCP_CKSUM, PKT_TX_UDP_CKSUM
+ *   When set application must guarantee correct header fields, no need to
+ *   specify tunnel type PKT_TX_TUNNEL_* for HW.
+ */
+#define DEV_TX_OFFLOAD_GENERIC_TNL_CKSUM_TSO	0x00040000
 
 struct rte_pci_device;