[dpdk-dev,v3,1/7] ethdev: introduce Tx generic tunnel L3/L4 offload

Message ID 20180305145121.71866-2-xuemingl@mellanox.com (mailing list archive)
State Superseded, archived
Delegated to: Shahaf Shuler
Headers

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation success Compilation OK

Commit Message

Xueming Li March 5, 2018, 2:51 p.m. UTC
  This patch introduce new TX offload flags for device that supports
tunnel agnostic L3/L4 checksum and TSO offload.

The support from the device is for inner and outer checksums on
IPV4/TCP/UDP and TSO for *any packet with the following format*:

< some headers > / [optional IPv4/IPv6] / [optional TCP/UDP] / <some
headers> / [optional inner IPv4/IPv6] / [optional TCP/UDP]

For example the following packets can use this feature:

1. eth / ipv4 / udp / VXLAN / ip / tcp
2. eth / ipv4 / GRE / MPLS / ipv4 / udp

Signed-off-by: Xueming Li <xuemingl@mellanox.com>
---
 lib/librte_ether/rte_ethdev.h | 24 ++++++++++++++++++++++++
 lib/librte_mbuf/rte_mbuf.c    |  5 +++++
 lib/librte_mbuf/rte_mbuf.h    | 18 ++++++++++++++++--
 3 files changed, 45 insertions(+), 2 deletions(-)
  

Comments

Yongseok Koh March 21, 2018, 1:40 a.m. UTC | #1
On Mon, Mar 05, 2018 at 10:51:15PM +0800, Xueming Li wrote:
> This patch introduce new TX offload flags for device that supports
> tunnel agnostic L3/L4 checksum and TSO offload.
> 
> The support from the device is for inner and outer checksums on
> IPV4/TCP/UDP and TSO for *any packet with the following format*:
> 
> < some headers > / [optional IPv4/IPv6] / [optional TCP/UDP] / <some
> headers> / [optional inner IPv4/IPv6] / [optional TCP/UDP]
> 
> For example the following packets can use this feature:
> 
> 1. eth / ipv4 / udp / VXLAN / ip / tcp
> 2. eth / ipv4 / GRE / MPLS / ipv4 / udp
> 
> Signed-off-by: Xueming Li <xuemingl@mellanox.com>
> ---
>  lib/librte_ether/rte_ethdev.h | 24 ++++++++++++++++++++++++
>  lib/librte_mbuf/rte_mbuf.c    |  5 +++++
>  lib/librte_mbuf/rte_mbuf.h    | 18 ++++++++++++++++--
>  3 files changed, 45 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
> index 036153306..66d12d3e0 100644
> --- a/lib/librte_ether/rte_ethdev.h
> +++ b/lib/librte_ether/rte_ethdev.h
> @@ -980,6 +980,30 @@ struct rte_eth_conf {
>   *   the same mempool and has refcnt = 1.
>   */
>  #define DEV_TX_OFFLOAD_SECURITY         0x00020000
> +/**< Generic tunnel L3/L4 checksum offload. To enable this offload feature
> + * for a packet to be transmitted on hardware supporting generic tunnel L3/L4
> + * checksum offload:
> + *  - fill outer_l2_len and outer_l3_len in mbuf
> + *  - fill l2_len and l3_len in mbuf
> + *  - set the flags PKT_TX_TUNNEL_xxx (use PKT_TX_TUNNEL_UNKNOWN if undefined)
> + *  - set the flags PKT_TX_OUTER_IP_CKSUM
> + *  - set the flags PKT_TX_IP_CKSUM
> + *  - set the flags PKT_TX_TCP_CKSUM, PKT_TX_SCTP_CKSUM or PKT_TX_UDP_CKSUM
> + */
> +#define DEV_TX_OFFLOAD_GENERIC_TNL_CKSUM	0x00040000

It looks redundant to me. Isn't it same as having DEV_TX_OFFLOAD_*_CKSUM?
According to the API document, when PKT_TX_*_CKSUM is set, all the necessary
length fields should be filled in (e.g. mbuf->outer_l?_len and mbuf->l?_len). It
doesn't need to specify any tunnel type for checksum. For example, in order to
request checksum offload for an unknown tunnel type which is similar to VxLAN,
what should app do? Even without defining this new offload flag, it is currently
possible by setting
	(PKT_TX_OUTER_IP_CKSUM | PKT_TX_OUTER_IPV4 | PKT_TX_IP_CKSUM |
	 PKT_TX_IPV4 | PKT_TX_UDP_CKSUM)
along with various length fields.

> +/**< Generic tunnel segmentation offload. To enable it, the user needs to:
> + *  - fill outer_l2_len and outer_l3_len in mbuf
> + *  - fill l2_len and l3_len in mbuf
> + *  - set the flags PKT_TX_TUNNEL_xxx (use PKT_TX_TUNNEL_UNKNOWN if undefined)
> + *  - set the flags PKT_TX_OUTER_IPV4 or PKT_TX_OUTER_IPV6
> + *  - if it's UDP tunnel, set the flags PKT_TX_OUTER_UDP
> + *  - set the flags PKT_TX_IPV4 or PKT_TX_IPV6
> + *  - set the PKT_TX_TCP_SEG flag in mbuf->ol_flags (this flag implies
> + *    PKT_TX_OUTER_IP_CKSUM, PKT_TX_IP_CKSUM and PKT_TX_TCP_CKSUM)
> + * Hardware that supports generic tunnel TSO offload only update outer/inner
> + * L3/L4 fields, tunnel fields are not touched.
> + */
> +#define DEV_TX_OFFLOAD_GENERIC_TNL_TSO		0x00080000

Practically, for the existing TSO offloads for tunneled packets, tunnel type
(PKT_TX_TUNNEL_*) is needed for knowing if transport is UDP-based or IP-based
because the length field of UDP header should be updated for TSO. AFAIK, there's
no TCP-based tunnel. For PKT_TX_TUNNEL_UNKNOWN, the only thing _unknown_ seems
to be the type of transport and by defining PKT_TX_OUTER_UDP, you seem to
recognize the type of transport between UDP and IP (it's not non-UDP). Given
that, the new flags still look redundant and ambiguous. If PKT_TX_TUNNEL_VXLAN
is set, there's no need to set PKT_TX_OUTER_UDP redundantly.

Instead, I think defining PKT_TX_TUNNEL_UDP_UNKNOWN and PKT_TX_TUNNEL_IP_UNKNOWN
could be a good alternative. It is only my humble two cents and I'd like to hear
from more people regarding this.


Thanks,
Yongseok
  
Xueming Li March 22, 2018, 1:55 p.m. UTC | #2
> -----Original Message-----

> From: Yongseok Koh

> Sent: Wednesday, March 21, 2018 9:41 AM

> To: Xueming(Steven) Li <xuemingl@mellanox.com>

> Cc: Wenzhuo Lu <wenzhuo.lu@intel.com>; Jingjing Wu <jingjing.wu@intel.com>;

> Thomas Monjalon <thomas@monjalon.net>; Olivier MATZ

> <olivier.matz@6wind.com>; Shahaf Shuler <shahafs@mellanox.com>; Ferruh

> Yigit <ferruh.yigit@intel.com>; dev@dpdk.org

> Subject: Re: [PATCH v3 1/7] ethdev: introduce Tx generic tunnel L3/L4

> offload

> 

> On Mon, Mar 05, 2018 at 10:51:15PM +0800, Xueming Li wrote:

> > This patch introduce new TX offload flags for device that supports

> > tunnel agnostic L3/L4 checksum and TSO offload.

> >

> > The support from the device is for inner and outer checksums on

> > IPV4/TCP/UDP and TSO for *any packet with the following format*:

> >

> > < some headers > / [optional IPv4/IPv6] / [optional TCP/UDP] / <some

> > headers> / [optional inner IPv4/IPv6] / [optional TCP/UDP]

> >

> > For example the following packets can use this feature:

> >

> > 1. eth / ipv4 / udp / VXLAN / ip / tcp 2. eth / ipv4 / GRE / MPLS /

> > ipv4 / udp

> >

> > Signed-off-by: Xueming Li <xuemingl@mellanox.com>

> > ---

> >  lib/librte_ether/rte_ethdev.h | 24 ++++++++++++++++++++++++

> >  lib/librte_mbuf/rte_mbuf.c    |  5 +++++

> >  lib/librte_mbuf/rte_mbuf.h    | 18 ++++++++++++++++--

> >  3 files changed, 45 insertions(+), 2 deletions(-)

> >

> > diff --git a/lib/librte_ether/rte_ethdev.h

> > b/lib/librte_ether/rte_ethdev.h index 036153306..66d12d3e0 100644

> > --- a/lib/librte_ether/rte_ethdev.h

> > +++ b/lib/librte_ether/rte_ethdev.h

> > @@ -980,6 +980,30 @@ struct rte_eth_conf {

> >   *   the same mempool and has refcnt = 1.

> >   */

> >  #define DEV_TX_OFFLOAD_SECURITY         0x00020000

> > +/**< Generic tunnel L3/L4 checksum offload. To enable this offload

> > +feature

> > + * for a packet to be transmitted on hardware supporting generic

> > +tunnel L3/L4

> > + * checksum offload:

> > + *  - fill outer_l2_len and outer_l3_len in mbuf

> > + *  - fill l2_len and l3_len in mbuf

> > + *  - set the flags PKT_TX_TUNNEL_xxx (use PKT_TX_TUNNEL_UNKNOWN if

> > +undefined)

> > + *  - set the flags PKT_TX_OUTER_IP_CKSUM

> > + *  - set the flags PKT_TX_IP_CKSUM

> > + *  - set the flags PKT_TX_TCP_CKSUM, PKT_TX_SCTP_CKSUM or

> > +PKT_TX_UDP_CKSUM  */

> > +#define DEV_TX_OFFLOAD_GENERIC_TNL_CKSUM	0x00040000

> 

> It looks redundant to me. Isn't it same as having DEV_TX_OFFLOAD_*_CKSUM?

> According to the API document, when PKT_TX_*_CKSUM is set, all the

> necessary length fields should be filled in (e.g. mbuf->outer_l?_len and

> mbuf->l?_len). It doesn't need to specify any tunnel type for checksum.

> For example, in order to request checksum offload for an unknown tunnel

> type which is similar to VxLAN, what should app do? Even without defining

> this new offload flag, it is currently possible by setting

> 	(PKT_TX_OUTER_IP_CKSUM | PKT_TX_OUTER_IPV4 | PKT_TX_IP_CKSUM |

> 	 PKT_TX_IPV4 | PKT_TX_UDP_CKSUM)

> along with various length fields.


Agree it almost same as existing checksum offloading requirement, besides:
1. it does not demand pseudo checksum
2. no need to clear IP checksum
I guess the requirement on API doc is a super set and differences could be 
documented on PMD doc.

For mlx5 original chksum offloading, hw will parse everything w/o demanding
mbuf headers, a little better performance than SWP. The question is how to 
choose between SWP and HW checksum offloading - assuming PKT_TX_TUNNEL_UNKNOWN
is the key. Take L3 VXLAN(no inner L2 header) as example, only SWP could 
offload it correctly and PKT_TX_TUNNEL_UNKOWN should be used here, not 
PKT_TX_TUNNEL_VXLAN, make sense?

Another question, how does user app know the NIC capability of generic tunnel 
checksum offload?

> 

> > +/**< Generic tunnel segmentation offload. To enable it, the user needs

> to:

> > + *  - fill outer_l2_len and outer_l3_len in mbuf

> > + *  - fill l2_len and l3_len in mbuf

> > + *  - set the flags PKT_TX_TUNNEL_xxx (use PKT_TX_TUNNEL_UNKNOWN if

> > +undefined)

> > + *  - set the flags PKT_TX_OUTER_IPV4 or PKT_TX_OUTER_IPV6

> > + *  - if it's UDP tunnel, set the flags PKT_TX_OUTER_UDP

> > + *  - set the flags PKT_TX_IPV4 or PKT_TX_IPV6

> > + *  - set the PKT_TX_TCP_SEG flag in mbuf->ol_flags (this flag implies

> > + *    PKT_TX_OUTER_IP_CKSUM, PKT_TX_IP_CKSUM and PKT_TX_TCP_CKSUM)

> > + * Hardware that supports generic tunnel TSO offload only update

> > +outer/inner

> > + * L3/L4 fields, tunnel fields are not touched.

> > + */

> > +#define DEV_TX_OFFLOAD_GENERIC_TNL_TSO		0x00080000

> 

> Practically, for the existing TSO offloads for tunneled packets, tunnel

> type

> (PKT_TX_TUNNEL_*) is needed for knowing if transport is UDP-based or IP-

> based because the length field of UDP header should be updated for TSO.

> AFAIK, there's no TCP-based tunnel. For PKT_TX_TUNNEL_UNKNOWN, the only

> thing _unknown_ seems to be the type of transport and by defining

> PKT_TX_OUTER_UDP, you seem to recognize the type of transport between UDP

> and IP (it's not non-UDP). Given that, the new flags still look redundant

> and ambiguous. If PKT_TX_TUNNEL_VXLAN is set, there's no need to set

> PKT_TX_OUTER_UDP redundantly.

> 

> Instead, I think defining PKT_TX_TUNNEL_UDP_UNKNOWN and

> PKT_TX_TUNNEL_IP_UNKNOWN could be a good alternative. It is only my humble

> two cents and I'd like to hear from more people regarding this.

> 


This flag looks more like a feature notification. Current tunnel offloading 
capability like "DEV_TX_OFFLOAD_VXLAN_TNL_TSO" ties to VXLAN tunnel type,
DEV_TX_OFFLOAD_GENERIC_TNL_TSO should be able to let users know that NIC's
capability of offloading generic tunnel type like MPLS-in-UDP that not defined
yet. PKT_TX_OUTER_UDP is something must to have to flag an exists of outer UDP
in such case.

> 

> Thanks,

> Yongseok
  
Olivier Matz March 28, 2018, 12:52 p.m. UTC | #3
Hi Xueming, Yongseok,

On Thu, Mar 22, 2018 at 01:55:05PM +0000, Xueming(Steven) Li wrote:
> 
> 
> > -----Original Message-----
> > From: Yongseok Koh
> > Sent: Wednesday, March 21, 2018 9:41 AM
> > To: Xueming(Steven) Li <xuemingl@mellanox.com>
> > Cc: Wenzhuo Lu <wenzhuo.lu@intel.com>; Jingjing Wu <jingjing.wu@intel.com>;
> > Thomas Monjalon <thomas@monjalon.net>; Olivier MATZ
> > <olivier.matz@6wind.com>; Shahaf Shuler <shahafs@mellanox.com>; Ferruh
> > Yigit <ferruh.yigit@intel.com>; dev@dpdk.org
> > Subject: Re: [PATCH v3 1/7] ethdev: introduce Tx generic tunnel L3/L4
> > offload
> > 
> > On Mon, Mar 05, 2018 at 10:51:15PM +0800, Xueming Li wrote:
> > > This patch introduce new TX offload flags for device that supports
> > > tunnel agnostic L3/L4 checksum and TSO offload.
> > >
> > > The support from the device is for inner and outer checksums on
> > > IPV4/TCP/UDP and TSO for *any packet with the following format*:
> > >
> > > < some headers > / [optional IPv4/IPv6] / [optional TCP/UDP] / <some
> > > headers> / [optional inner IPv4/IPv6] / [optional TCP/UDP]
> > >
> > > For example the following packets can use this feature:
> > >
> > > 1. eth / ipv4 / udp / VXLAN / ip / tcp 2. eth / ipv4 / GRE / MPLS /
> > > ipv4 / udp
> > >
> > > Signed-off-by: Xueming Li <xuemingl@mellanox.com>
> > > ---
> > >  lib/librte_ether/rte_ethdev.h | 24 ++++++++++++++++++++++++
> > >  lib/librte_mbuf/rte_mbuf.c    |  5 +++++
> > >  lib/librte_mbuf/rte_mbuf.h    | 18 ++++++++++++++++--
> > >  3 files changed, 45 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/lib/librte_ether/rte_ethdev.h
> > > b/lib/librte_ether/rte_ethdev.h index 036153306..66d12d3e0 100644
> > > --- a/lib/librte_ether/rte_ethdev.h
> > > +++ b/lib/librte_ether/rte_ethdev.h
> > > @@ -980,6 +980,30 @@ struct rte_eth_conf {
> > >   *   the same mempool and has refcnt = 1.
> > >   */
> > >  #define DEV_TX_OFFLOAD_SECURITY         0x00020000
> > > +/**< Generic tunnel L3/L4 checksum offload. To enable this offload
> > > +feature
> > > + * for a packet to be transmitted on hardware supporting generic
> > > +tunnel L3/L4
> > > + * checksum offload:
> > > + *  - fill outer_l2_len and outer_l3_len in mbuf
> > > + *  - fill l2_len and l3_len in mbuf
> > > + *  - set the flags PKT_TX_TUNNEL_xxx (use PKT_TX_TUNNEL_UNKNOWN if
> > > +undefined)
> > > + *  - set the flags PKT_TX_OUTER_IP_CKSUM
> > > + *  - set the flags PKT_TX_IP_CKSUM
> > > + *  - set the flags PKT_TX_TCP_CKSUM, PKT_TX_SCTP_CKSUM or
> > > +PKT_TX_UDP_CKSUM  */
> > > +#define DEV_TX_OFFLOAD_GENERIC_TNL_CKSUM	0x00040000
> > 
> > It looks redundant to me. Isn't it same as having DEV_TX_OFFLOAD_*_CKSUM?
> > According to the API document, when PKT_TX_*_CKSUM is set, all the
> > necessary length fields should be filled in (e.g. mbuf->outer_l?_len and
> > mbuf->l?_len). It doesn't need to specify any tunnel type for checksum.
> > For example, in order to request checksum offload for an unknown tunnel
> > type which is similar to VxLAN, what should app do? Even without defining
> > this new offload flag, it is currently possible by setting
> > 	(PKT_TX_OUTER_IP_CKSUM | PKT_TX_OUTER_IPV4 | PKT_TX_IP_CKSUM |
> > 	 PKT_TX_IPV4 | PKT_TX_UDP_CKSUM)
> > along with various length fields.
> 
> Agree it almost same as existing checksum offloading requirement, besides:
> 1. it does not demand pseudo checksum
> 2. no need to clear IP checksum
> I guess the requirement on API doc is a super set and differences could be 
> documented on PMD doc.

Since commit 4fb7e803eb1a ("ethdev: add Tx preparation"), setting the
IP checksum to 0 and the pseudo header checksum in TCP is not required.
The user just need to set the flags, then before transmitting the packet,
call tx_prepare() which will do the proper work, according to the hw
requirements.

It looks that the API documentation of PKT_TX_TCP_SEG should be updated.
I'll submit a patch for this.


> For mlx5 original chksum offloading, hw will parse everything w/o demanding
> mbuf headers, a little better performance than SWP. The question is how to 
> choose between SWP and HW checksum offloading - assuming PKT_TX_TUNNEL_UNKNOWN
> is the key. Take L3 VXLAN(no inner L2 header) as example, only SWP could 
> offload it correctly and PKT_TX_TUNNEL_UNKOWN should be used here, not 
> PKT_TX_TUNNEL_VXLAN, make sense?
> 
> Another question, how does user app know the NIC capability of generic tunnel 
> checksum offload?

After reading the patchset, I still don't see what you want to achieve.
In patch 1, we talk about these 2 packets:

1. eth / ipv4 / udp / VXLAN / ip / tcp     (L3 VXLAN as you said above)
2. eth / ipv4 / GRE / MPLS / ipv4 / udp

What is not doable with current API?

I didn't try, but I think the following would work currently
with tx_prepare() + tx_burst().

- inner TSO on 1.

  flags = PKT_TX_TUNNEL_VXLAN | PKT_TX_IPV4 | PKT_TX_IP_CKSUM |
          PKT_TX_TCP_CKSUM | PKT_TX_TCP_SEG
  outer_l2_len = size(eth)
  outer_l3_len = size(ipv4)
  l2_len = size(udp + vxlan)
  l3_len = size(ip)

- inner TCP checksum on 1.

  flags = PKT_TX_TUNNEL_VXLAN | PKT_TX_IPV4 | PKT_TX_IP_CKSUM |
          PKT_TX_TCP_CKSUM
  outer_l2_len = size(eth)
  outer_l3_len = size(ipv4)
  l2_len = size(udp + vxlan)
  l3_len = size(ip)

- inner UDP checksum on 2.

  flags = PKT_TX_TUNNEL_GRE | PKT_TX_IPV4 | PKT_TX_IP_CKSUM |
          PKT_TX_UDP_CKSUM
  outer_l2_len = size(eth)
  outer_l3_len = size(ipv4)
  l2_len = size(gre + mpls)
  l3_len = size(ipv4)

Yes, writing the tunnel len in l2_len is a bit strange, but it
should do the trick.

It's still unclear for me why you need to add a "generic" flag.
Is there something you can't do today with the current API?


> 
> > 
> > > +/**< Generic tunnel segmentation offload. To enable it, the user needs
> > to:
> > > + *  - fill outer_l2_len and outer_l3_len in mbuf
> > > + *  - fill l2_len and l3_len in mbuf
> > > + *  - set the flags PKT_TX_TUNNEL_xxx (use PKT_TX_TUNNEL_UNKNOWN if
> > > +undefined)
> > > + *  - set the flags PKT_TX_OUTER_IPV4 or PKT_TX_OUTER_IPV6
> > > + *  - if it's UDP tunnel, set the flags PKT_TX_OUTER_UDP
> > > + *  - set the flags PKT_TX_IPV4 or PKT_TX_IPV6
> > > + *  - set the PKT_TX_TCP_SEG flag in mbuf->ol_flags (this flag implies
> > > + *    PKT_TX_OUTER_IP_CKSUM, PKT_TX_IP_CKSUM and PKT_TX_TCP_CKSUM)
> > > + * Hardware that supports generic tunnel TSO offload only update
> > > +outer/inner
> > > + * L3/L4 fields, tunnel fields are not touched.
> > > + */
> > > +#define DEV_TX_OFFLOAD_GENERIC_TNL_TSO		0x00080000
> > 
> > Practically, for the existing TSO offloads for tunneled packets, tunnel
> > type
> > (PKT_TX_TUNNEL_*) is needed for knowing if transport is UDP-based or IP-
> > based because the length field of UDP header should be updated for TSO.
> > AFAIK, there's no TCP-based tunnel. For PKT_TX_TUNNEL_UNKNOWN, the only
> > thing _unknown_ seems to be the type of transport and by defining
> > PKT_TX_OUTER_UDP, you seem to recognize the type of transport between UDP
> > and IP (it's not non-UDP). Given that, the new flags still look redundant
> > and ambiguous. If PKT_TX_TUNNEL_VXLAN is set, there's no need to set
> > PKT_TX_OUTER_UDP redundantly.
> > 
> > Instead, I think defining PKT_TX_TUNNEL_UDP_UNKNOWN and
> > PKT_TX_TUNNEL_IP_UNKNOWN could be a good alternative. It is only my humble
> > two cents and I'd like to hear from more people regarding this.
> > 
> 

I agree with Yongseok here. The flag PKT_TX_TUNNEL_VXLAN implies that
the protocol is UDP. Having a flag PKT_TX_TUNNEL_UDP or PKT_TX_TUNNEL_IP
could make sense if we find a use case that cannot be done currently with
the existing API (for instance, another tunnel type over UDP).

> This flag looks more like a feature notification. Current tunnel offloading 
> capability like "DEV_TX_OFFLOAD_VXLAN_TNL_TSO" ties to VXLAN tunnel type,
> DEV_TX_OFFLOAD_GENERIC_TNL_TSO should be able to let users know that NIC's
> capability of offloading generic tunnel type like MPLS-in-UDP that not defined
> yet. PKT_TX_OUTER_UDP is something must to have to flag an exists of outer UDP
> in such case.

So, maybe, instead of DEV_TX_OFFLOAD_GENERIC_TNL_TSO it could be
DEV_TX_OFFLOAD_UDP_TUNNEL_TSO. Because it is not possible for a hardware
to support TSO on top of any kind of tunnel type.

For instance, if the tunnel header has a checksum or a sequence id, the
hardware must be aware of it.
  
Xueming Li April 4, 2018, 8:20 a.m. UTC | #4
Hi Olivier, 

> -----Original Message-----

> From: Olivier Matz <olivier.matz@6wind.com>

> Sent: Wednesday, March 28, 2018 8:53 PM

> To: Xueming(Steven) Li <xuemingl@mellanox.com>

> Cc: Yongseok Koh <yskoh@mellanox.com>; Wenzhuo Lu <wenzhuo.lu@intel.com>;

> Jingjing Wu <jingjing.wu@intel.com>; Thomas Monjalon <thomas@monjalon.net>;

> Shahaf Shuler <shahafs@mellanox.com>; Ferruh Yigit

> <ferruh.yigit@intel.com>; dev@dpdk.org

> Subject: Re: [PATCH v3 1/7] ethdev: introduce Tx generic tunnel L3/L4

> offload

> 

> Hi Xueming, Yongseok,

> 

> On Thu, Mar 22, 2018 at 01:55:05PM +0000, Xueming(Steven) Li wrote:

> >

> >

> > > -----Original Message-----

> > > From: Yongseok Koh

> > > Sent: Wednesday, March 21, 2018 9:41 AM

> > > To: Xueming(Steven) Li <xuemingl@mellanox.com>

> > > Cc: Wenzhuo Lu <wenzhuo.lu@intel.com>; Jingjing Wu

> > > <jingjing.wu@intel.com>; Thomas Monjalon <thomas@monjalon.net>;

> > > Olivier MATZ <olivier.matz@6wind.com>; Shahaf Shuler

> > > <shahafs@mellanox.com>; Ferruh Yigit <ferruh.yigit@intel.com>;

> > > dev@dpdk.org

> > > Subject: Re: [PATCH v3 1/7] ethdev: introduce Tx generic tunnel

> > > L3/L4 offload

> > >

> > > On Mon, Mar 05, 2018 at 10:51:15PM +0800, Xueming Li wrote:

> > > > This patch introduce new TX offload flags for device that supports

> > > > tunnel agnostic L3/L4 checksum and TSO offload.

> > > >

> > > > The support from the device is for inner and outer checksums on

> > > > IPV4/TCP/UDP and TSO for *any packet with the following format*:

> > > >

> > > > < some headers > / [optional IPv4/IPv6] / [optional TCP/UDP] /

> > > > <some

> > > > headers> / [optional inner IPv4/IPv6] / [optional TCP/UDP]

> > > >

> > > > For example the following packets can use this feature:

> > > >

> > > > 1. eth / ipv4 / udp / VXLAN / ip / tcp 2. eth / ipv4 / GRE / MPLS

> > > > /

> > > > ipv4 / udp

> > > >

> > > > Signed-off-by: Xueming Li <xuemingl@mellanox.com>

> > > > ---

> > > >  lib/librte_ether/rte_ethdev.h | 24 ++++++++++++++++++++++++

> > > >  lib/librte_mbuf/rte_mbuf.c    |  5 +++++

> > > >  lib/librte_mbuf/rte_mbuf.h    | 18 ++++++++++++++++--

> > > >  3 files changed, 45 insertions(+), 2 deletions(-)

> > > >

> > > > diff --git a/lib/librte_ether/rte_ethdev.h

> > > > b/lib/librte_ether/rte_ethdev.h index 036153306..66d12d3e0 100644

> > > > --- a/lib/librte_ether/rte_ethdev.h

> > > > +++ b/lib/librte_ether/rte_ethdev.h

> > > > @@ -980,6 +980,30 @@ struct rte_eth_conf {

> > > >   *   the same mempool and has refcnt = 1.

> > > >   */

> > > >  #define DEV_TX_OFFLOAD_SECURITY         0x00020000

> > > > +/**< Generic tunnel L3/L4 checksum offload. To enable this

> > > > +offload feature

> > > > + * for a packet to be transmitted on hardware supporting generic

> > > > +tunnel L3/L4

> > > > + * checksum offload:

> > > > + *  - fill outer_l2_len and outer_l3_len in mbuf

> > > > + *  - fill l2_len and l3_len in mbuf

> > > > + *  - set the flags PKT_TX_TUNNEL_xxx (use PKT_TX_TUNNEL_UNKNOWN

> > > > +if

> > > > +undefined)

> > > > + *  - set the flags PKT_TX_OUTER_IP_CKSUM

> > > > + *  - set the flags PKT_TX_IP_CKSUM

> > > > + *  - set the flags PKT_TX_TCP_CKSUM, PKT_TX_SCTP_CKSUM or

> > > > +PKT_TX_UDP_CKSUM  */

> > > > +#define DEV_TX_OFFLOAD_GENERIC_TNL_CKSUM	0x00040000

> > >

> > > It looks redundant to me. Isn't it same as having

> DEV_TX_OFFLOAD_*_CKSUM?

> > > According to the API document, when PKT_TX_*_CKSUM is set, all the

> > > necessary length fields should be filled in (e.g. mbuf->outer_l?_len

> > > and

> > > mbuf->l?_len). It doesn't need to specify any tunnel type for checksum.

> > > For example, in order to request checksum offload for an unknown

> > > tunnel type which is similar to VxLAN, what should app do? Even

> > > without defining this new offload flag, it is currently possible by

> setting

> > > 	(PKT_TX_OUTER_IP_CKSUM | PKT_TX_OUTER_IPV4 | PKT_TX_IP_CKSUM |

> > > 	 PKT_TX_IPV4 | PKT_TX_UDP_CKSUM)

> > > along with various length fields.

> >

> > Agree it almost same as existing checksum offloading requirement,

> besides:

> > 1. it does not demand pseudo checksum

> > 2. no need to clear IP checksum

> > I guess the requirement on API doc is a super set and differences

> > could be documented on PMD doc.

> 

> Since commit 4fb7e803eb1a ("ethdev: add Tx preparation"), setting the IP

> checksum to 0 and the pseudo header checksum in TCP is not required.

> The user just need to set the flags, then before transmitting the packet,

> call tx_prepare() which will do the proper work, according to the hw

> requirements.

> 

> It looks that the API documentation of PKT_TX_TCP_SEG should be updated.

> I'll submit a patch for this.

> 


Nice to hide intel specific requirement and now it's much clear, thanks.

> 

> > For mlx5 original chksum offloading, hw will parse everything w/o

> > demanding mbuf headers, a little better performance than SWP. The

> > question is how to choose between SWP and HW checksum offloading -

> > assuming PKT_TX_TUNNEL_UNKNOWN is the key. Take L3 VXLAN(no inner L2

> > header) as example, only SWP could offload it correctly and

> > PKT_TX_TUNNEL_UNKOWN should be used here, not PKT_TX_TUNNEL_VXLAN, make

> sense?

> >

> > Another question, how does user app know the NIC capability of generic

> > tunnel checksum offload?

> 

> After reading the patchset, I still don't see what you want to achieve.

> In patch 1, we talk about these 2 packets:

> 

> 1. eth / ipv4 / udp / VXLAN / ip / tcp     (L3 VXLAN as you said above)

> 2. eth / ipv4 / GRE / MPLS / ipv4 / udp

> 

> What is not doable with current API?

> 

> I didn't try, but I think the following would work currently with

> tx_prepare() + tx_burst().

> 

> - inner TSO on 1.

> 

>   flags = PKT_TX_TUNNEL_VXLAN | PKT_TX_IPV4 | PKT_TX_IP_CKSUM |

>           PKT_TX_TCP_CKSUM | PKT_TX_TCP_SEG

>   outer_l2_len = size(eth)

>   outer_l3_len = size(ipv4)

>   l2_len = size(udp + vxlan)

>   l3_len = size(ip)

> 

> - inner TCP checksum on 1.

> 

>   flags = PKT_TX_TUNNEL_VXLAN | PKT_TX_IPV4 | PKT_TX_IP_CKSUM |

>           PKT_TX_TCP_CKSUM

>   outer_l2_len = size(eth)

>   outer_l3_len = size(ipv4)

>   l2_len = size(udp + vxlan)

>   l3_len = size(ip)

> 

> - inner UDP checksum on 2.

> 

>   flags = PKT_TX_TUNNEL_GRE | PKT_TX_IPV4 | PKT_TX_IP_CKSUM |

>           PKT_TX_UDP_CKSUM

>   outer_l2_len = size(eth)

>   outer_l3_len = size(ipv4)

>   l2_len = size(gre + mpls)

>   l3_len = size(ipv4)

> 

> Yes, writing the tunnel len in l2_len is a bit strange, but it should do

> the trick.

> 

> It's still unclear for me why you need to add a "generic" flag.

> Is there something you can't do today with the current API?

> 


I think the problem is mlx5 current hw so smart that not generic enough, 
that's why this patchset tried to introduce new hw feature and new generic flag.
Since current checksum API already covered, no reason to add new, thanks.

> 

> >

> > >

> > > > +/**< Generic tunnel segmentation offload. To enable it, the user

> > > > +needs

> > > to:

> > > > + *  - fill outer_l2_len and outer_l3_len in mbuf

> > > > + *  - fill l2_len and l3_len in mbuf

> > > > + *  - set the flags PKT_TX_TUNNEL_xxx (use PKT_TX_TUNNEL_UNKNOWN

> > > > +if

> > > > +undefined)

> > > > + *  - set the flags PKT_TX_OUTER_IPV4 or PKT_TX_OUTER_IPV6

> > > > + *  - if it's UDP tunnel, set the flags PKT_TX_OUTER_UDP

> > > > + *  - set the flags PKT_TX_IPV4 or PKT_TX_IPV6

> > > > + *  - set the PKT_TX_TCP_SEG flag in mbuf->ol_flags (this flag

> implies

> > > > + *    PKT_TX_OUTER_IP_CKSUM, PKT_TX_IP_CKSUM and PKT_TX_TCP_CKSUM)

> > > > + * Hardware that supports generic tunnel TSO offload only update

> > > > +outer/inner

> > > > + * L3/L4 fields, tunnel fields are not touched.

> > > > + */

> > > > +#define DEV_TX_OFFLOAD_GENERIC_TNL_TSO		0x00080000

> > >

> > > Practically, for the existing TSO offloads for tunneled packets,

> > > tunnel type

> > > (PKT_TX_TUNNEL_*) is needed for knowing if transport is UDP-based or

> > > IP- based because the length field of UDP header should be updated for

> TSO.

> > > AFAIK, there's no TCP-based tunnel. For PKT_TX_TUNNEL_UNKNOWN, the

> > > only thing _unknown_ seems to be the type of transport and by

> > > defining PKT_TX_OUTER_UDP, you seem to recognize the type of

> > > transport between UDP and IP (it's not non-UDP). Given that, the new

> > > flags still look redundant and ambiguous. If PKT_TX_TUNNEL_VXLAN is

> > > set, there's no need to set PKT_TX_OUTER_UDP redundantly.

> > >

> > > Instead, I think defining PKT_TX_TUNNEL_UDP_UNKNOWN and

> > > PKT_TX_TUNNEL_IP_UNKNOWN could be a good alternative. It is only my

> > > humble two cents and I'd like to hear from more people regarding this.

> > >

> >

> 

> I agree with Yongseok here. The flag PKT_TX_TUNNEL_VXLAN implies that the

> protocol is UDP. Having a flag PKT_TX_TUNNEL_UDP or PKT_TX_TUNNEL_IP could

> make sense if we find a use case that cannot be done currently with the

> existing API (for instance, another tunnel type over UDP).


PKT_TX_TUNNEL_UDP + PKT_TX_TUNNEL_IP looks similar to PKT_TX_TUNNEL_UNKOWN + 
PKT_TX_OUTER_UDP, I'm fine with this proposal.

> 

> > This flag looks more like a feature notification. Current tunnel

> > offloading capability like "DEV_TX_OFFLOAD_VXLAN_TNL_TSO" ties to

> > VXLAN tunnel type, DEV_TX_OFFLOAD_GENERIC_TNL_TSO should be able to

> > let users know that NIC's capability of offloading generic tunnel type

> > like MPLS-in-UDP that not defined yet. PKT_TX_OUTER_UDP is something

> > must to have to flag an exists of outer UDP in such case.

> 

> So, maybe, instead of DEV_TX_OFFLOAD_GENERIC_TNL_TSO it could be

> DEV_TX_OFFLOAD_UDP_TUNNEL_TSO. Because it is not possible for a hardware

> to support TSO on top of any kind of tunnel type.

> 

> For instance, if the tunnel header has a checksum or a sequence id, the

> hardware must be aware of it.


Similarly, I'll split DEV_TX_OFFLOAD_GENERIC_TNL_TSO into DEV_TX_OFFLOAD_UDP_TNL_TSO
and DEV_TX_OFFLOAD_IP_TNL_TSO to cover tunnel types that no checksum and seq id.

Overall speaking, existing app using tunnel offloading should work w/o any change.
I'm working on a new version to reflect this change, and split this patch set into 
public and mlx5.
  

Patch

diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 036153306..66d12d3e0 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -980,6 +980,30 @@  struct rte_eth_conf {
  *   the same mempool and has refcnt = 1.
  */
 #define DEV_TX_OFFLOAD_SECURITY         0x00020000
+/**< Generic tunnel L3/L4 checksum offload. To enable this offload feature
+ * for a packet to be transmitted on hardware supporting generic tunnel L3/L4
+ * checksum offload:
+ *  - fill outer_l2_len and outer_l3_len in mbuf
+ *  - fill l2_len and l3_len in mbuf
+ *  - set the flags PKT_TX_TUNNEL_xxx (use PKT_TX_TUNNEL_UNKNOWN if undefined)
+ *  - set the flags PKT_TX_OUTER_IP_CKSUM
+ *  - set the flags PKT_TX_IP_CKSUM
+ *  - set the flags PKT_TX_TCP_CKSUM, PKT_TX_SCTP_CKSUM or PKT_TX_UDP_CKSUM
+ */
+#define DEV_TX_OFFLOAD_GENERIC_TNL_CKSUM	0x00040000
+/**< Generic tunnel segmentation offload. To enable it, the user needs to:
+ *  - fill outer_l2_len and outer_l3_len in mbuf
+ *  - fill l2_len and l3_len in mbuf
+ *  - set the flags PKT_TX_TUNNEL_xxx (use PKT_TX_TUNNEL_UNKNOWN if undefined)
+ *  - set the flags PKT_TX_OUTER_IPV4 or PKT_TX_OUTER_IPV6
+ *  - if it's UDP tunnel, set the flags PKT_TX_OUTER_UDP
+ *  - set the flags PKT_TX_IPV4 or PKT_TX_IPV6
+ *  - set the PKT_TX_TCP_SEG flag in mbuf->ol_flags (this flag implies
+ *    PKT_TX_OUTER_IP_CKSUM, PKT_TX_IP_CKSUM and PKT_TX_TCP_CKSUM)
+ * Hardware that supports generic tunnel TSO offload only update outer/inner
+ * L3/L4 fields, tunnel fields are not touched.
+ */
+#define DEV_TX_OFFLOAD_GENERIC_TNL_TSO		0x00080000
 
 /*
  * If new Tx offload capabilities are defined, they also must be
diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index 091d388d3..c139d5b30 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -400,11 +400,13 @@  const char *rte_get_tx_ol_flag_name(uint64_t mask)
 	case PKT_TX_OUTER_IP_CKSUM: return "PKT_TX_OUTER_IP_CKSUM";
 	case PKT_TX_OUTER_IPV4: return "PKT_TX_OUTER_IPV4";
 	case PKT_TX_OUTER_IPV6: return "PKT_TX_OUTER_IPV6";
+	case PKT_TX_OUTER_UDP: return "PKT_TX_OUTER_UDP";
 	case PKT_TX_TUNNEL_VXLAN: return "PKT_TX_TUNNEL_VXLAN";
 	case PKT_TX_TUNNEL_GRE: return "PKT_TX_TUNNEL_GRE";
 	case PKT_TX_TUNNEL_IPIP: return "PKT_TX_TUNNEL_IPIP";
 	case PKT_TX_TUNNEL_GENEVE: return "PKT_TX_TUNNEL_GENEVE";
 	case PKT_TX_TUNNEL_MPLSINUDP: return "PKT_TX_TUNNEL_MPLSINUDP";
+	case PKT_TX_TUNNEL_UNKNOWN: return "PKT_TX_TUNNEL_UNKNOWN";
 	case PKT_TX_MACSEC: return "PKT_TX_MACSEC";
 	case PKT_TX_SEC_OFFLOAD: return "PKT_TX_SEC_OFFLOAD";
 	default: return NULL;
@@ -429,6 +431,7 @@  rte_get_tx_ol_flag_list(uint64_t mask, char *buf, size_t buflen)
 		{ PKT_TX_OUTER_IP_CKSUM, PKT_TX_OUTER_IP_CKSUM, NULL },
 		{ PKT_TX_OUTER_IPV4, PKT_TX_OUTER_IPV4, NULL },
 		{ PKT_TX_OUTER_IPV6, PKT_TX_OUTER_IPV6, NULL },
+		{ PKT_TX_OUTER_UDP, PKT_TX_OUTER_UDP, NULL },
 		{ PKT_TX_TUNNEL_VXLAN, PKT_TX_TUNNEL_MASK,
 		  "PKT_TX_TUNNEL_NONE" },
 		{ PKT_TX_TUNNEL_GRE, PKT_TX_TUNNEL_MASK,
@@ -439,6 +442,8 @@  rte_get_tx_ol_flag_list(uint64_t mask, char *buf, size_t buflen)
 		  "PKT_TX_TUNNEL_NONE" },
 		{ PKT_TX_TUNNEL_MPLSINUDP, PKT_TX_TUNNEL_MASK,
 		  "PKT_TX_TUNNEL_NONE" },
+		{ PKT_TX_TUNNEL_UNKNOWN, PKT_TX_TUNNEL_MASK,
+		  "PKT_TX_TUNNEL_NONE" },
 		{ PKT_TX_MACSEC, PKT_TX_MACSEC, NULL },
 		{ PKT_TX_SEC_OFFLOAD, PKT_TX_SEC_OFFLOAD, NULL },
 	};
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 62740254d..53cc1b713 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -210,6 +210,13 @@  extern "C" {
 #define PKT_TX_TUNNEL_GENEVE  (0x4ULL << 45)
 /**< TX packet with MPLS-in-UDP RFC 7510 header. */
 #define PKT_TX_TUNNEL_MPLSINUDP (0x5ULL << 45)
+/**
+ * Used by generic tunnel checksum and TSO. Please refer to document of below
+ * fields to enable this feature on hardware support Generic tunnel offload:
+ *  - DEV_TX_OFFLOAD_GENERIC_TNL_CKSUM
+ *  - DEV_TX_OFFLOAD_GENERIC_TNL_TSO
+ */
+#define PKT_TX_TUNNEL_UNKNOWN (0xFULL << 45)
 /* add new TX TUNNEL type here */
 #define PKT_TX_TUNNEL_MASK    (0xFULL << 45)
 
@@ -232,6 +239,8 @@  extern "C" {
  *  - calculate the pseudo header checksum without taking ip_len in account,
  *    and set it in the TCP header. Refer to rte_ipv4_phdr_cksum() and
  *    rte_ipv6_phdr_cksum() that can be used as helpers.
+ *  PLease refer to DEV_TX_OFFLOAD_GENERIC_TNL_TSO to enable Generic tunnel
+ *  TSO.
  */
 #define PKT_TX_TCP_SEG       (1ULL << 50)
 
@@ -311,6 +320,13 @@  extern "C" {
 #define PKT_TX_OUTER_IPV6    (1ULL << 60)
 
 /**
+ * Packet outer header is UDP. Set when using generic tunnel TSO offload
+ * (DEV_TX_OFFLOAD_GENERIC_TNL_TSO) to tell the NIC that the outer L4
+ * header is UDP.
+ */
+#define PKT_TX_OUTER_UDP     (1ULL << 61) /**< Outer L4 header is UDP. */
+
+/**
  * Bitmask of all supported packet Tx offload features flags,
  * which can be set for packet.
  */
@@ -326,8 +342,6 @@  extern "C" {
 		PKT_TX_MACSEC |		 \
 		PKT_TX_SEC_OFFLOAD)
 
-#define __RESERVED           (1ULL << 61) /**< reserved for future mbuf use */
-
 #define IND_ATTACHED_MBUF    (1ULL << 62) /**< Indirect attached mbuf */
 
 /* Use final bit of flags to indicate a control mbuf */