[RFC] ethdev: add min/max MTU to device info

Message ID 20180905164157.844-1-stephen@networkplumber.org (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Headers
Series [RFC] ethdev: add min/max MTU to device info |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation success Compilation OK

Commit Message

Stephen Hemminger Sept. 5, 2018, 4:41 p.m. UTC
  This addresses the usability issue raised by OVS at DPDK Userspace
summit. It adds general min/max mtu into device info. For compatiablity,
and to save space, it fits in a hole in existing structure.

The initial version sets max mtu to normal Ethernet, it is up to
PMD to set larger value if it supports Jumbo frames.

Fixing the drivers to use this is trivial and can be done by 18.11.
Already have some of the patches done.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/librte_ethdev/rte_ethdev.c | 7 +++++++
 lib/librte_ethdev/rte_ethdev.h | 2 ++
 2 files changed, 9 insertions(+)
  

Comments

Shahaf Shuler Sept. 6, 2018, 5:51 a.m. UTC | #1
Wednesday, September 5, 2018 7:42 PM, Stephen Hemminger:
> Subject: [dpdk-dev] [RFC] ethdev: add min/max MTU to device info
> 
> This addresses the usability issue raised by OVS at DPDK Userspace summit.
> It adds general min/max mtu into device info. For compatiablity, and to save
> space, it fits in a hole in existing structure.
> 
> The initial version sets max mtu to normal Ethernet, it is up to PMD to set
> larger value if it supports Jumbo frames.
> 
> Fixing the drivers to use this is trivial and can be done by 18.11.
> Already have some of the patches done.
> 
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

Makes sense.
Acked-by: Shahaf Shuler <shahafs@mellanox.com>

> ---
>  lib/librte_ethdev/rte_ethdev.c | 7 +++++++  lib/librte_ethdev/rte_ethdev.h
> | 2 ++
>  2 files changed, 9 insertions(+)
> 
> diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
> index 4c320250589a..df0c7536a7c4 100644
> --- a/lib/librte_ethdev/rte_ethdev.c
> +++ b/lib/librte_ethdev/rte_ethdev.c
> @@ -2408,6 +2408,8 @@ rte_eth_dev_info_get(uint16_t port_id, struct
> rte_eth_dev_info *dev_info)
>  	dev_info->rx_desc_lim = lim;
>  	dev_info->tx_desc_lim = lim;
>  	dev_info->device = dev->device;
> +	dev_info->min_mtu = ETHER_MIN_MTU;
> +	dev_info->max_mtu = ETHER_MTU;
> 
>  	RTE_FUNC_PTR_OR_RET(*dev->dev_ops->dev_infos_get);
>  	(*dev->dev_ops->dev_infos_get)(dev, dev_info); @@ -2471,12
> +2473,17 @@ int  rte_eth_dev_set_mtu(uint16_t port_id, uint16_t mtu)  {
>  	int ret;
> +	struct rte_eth_dev_info dev_info;
>  	struct rte_eth_dev *dev;
> 
>  	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
>  	dev = &rte_eth_devices[port_id];
>  	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->mtu_set, -
> ENOTSUP);
> 
> +	rte_eth_dev_info_get(port_id, &dev_info);
> +	if (mtu < dev_info.min_mtu || mtu > dev_info.max_mtu)
> +		return -EINVAL;
> +
>  	ret = (*dev->dev_ops->mtu_set)(dev, mtu);
>  	if (!ret)
>  		dev->data->mtu = mtu;
> diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
> index 7070e9ab408f..5171a9083288 100644
> --- a/lib/librte_ethdev/rte_ethdev.h
> +++ b/lib/librte_ethdev/rte_ethdev.h
> @@ -1015,6 +1015,8 @@ struct rte_eth_dev_info {
>  	const char *driver_name; /**< Device Driver name. */
>  	unsigned int if_index; /**< Index to bound host interface, or 0 if
> none.
>  		Use if_indextoname() to translate into an interface name. */
> +	uint16_t min_mtu;	/**< Minimum MTU allowed */
> +	uint16_t max_mtu;	/**< Maximum MTU allowed */
>  	const uint32_t *dev_flags; /**< Device flags */
>  	uint32_t min_rx_bufsize; /**< Minimum size of RX buffer. */
>  	uint32_t max_rx_pktlen; /**< Maximum configurable length of RX
> pkt. */
> --
> 2.17.1
  
Andrew Rybchenko Sept. 6, 2018, 6:29 a.m. UTC | #2
On 09/05/2018 07:41 PM, Stephen Hemminger wrote:
> This addresses the usability issue raised by OVS at DPDK Userspace
> summit. It adds general min/max mtu into device info. For compatiablity,
> and to save space, it fits in a hole in existing structure.

It is true for amd64, but it looks like it is false on 32-bit. So, ABI 
breakage.

> The initial version sets max mtu to normal Ethernet, it is up to
> PMD to set larger value if it supports Jumbo frames.
>
> Fixing the drivers to use this is trivial and can be done by 18.11.
> Already have some of the patches done.
>
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
>   lib/librte_ethdev/rte_ethdev.c | 7 +++++++
>   lib/librte_ethdev/rte_ethdev.h | 2 ++
>   2 files changed, 9 insertions(+)
>
> diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
> index 4c320250589a..df0c7536a7c4 100644
> --- a/lib/librte_ethdev/rte_ethdev.c
> +++ b/lib/librte_ethdev/rte_ethdev.c
> @@ -2408,6 +2408,8 @@ rte_eth_dev_info_get(uint16_t port_id, struct rte_eth_dev_info *dev_info)
>   	dev_info->rx_desc_lim = lim;
>   	dev_info->tx_desc_lim = lim;
>   	dev_info->device = dev->device;
> +	dev_info->min_mtu = ETHER_MIN_MTU;
> +	dev_info->max_mtu = ETHER_MTU;
>   
>   	RTE_FUNC_PTR_OR_RET(*dev->dev_ops->dev_infos_get);
>   	(*dev->dev_ops->dev_infos_get)(dev, dev_info);
> @@ -2471,12 +2473,17 @@ int
>   rte_eth_dev_set_mtu(uint16_t port_id, uint16_t mtu)
>   {
>   	int ret;
> +	struct rte_eth_dev_info dev_info;
>   	struct rte_eth_dev *dev;
>   
>   	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
>   	dev = &rte_eth_devices[port_id];
>   	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->mtu_set, -ENOTSUP);
>   
> +	rte_eth_dev_info_get(port_id, &dev_info);
> +	if (mtu < dev_info.min_mtu || mtu > dev_info.max_mtu)
> +		return -EINVAL;
> +

The check breaks set MTU to value larger than ETHER_MTU for not
updated drivers. So, IMHO, it should be pushed only with appropriate
updates in all drivers which support bigger MTU.

>   	ret = (*dev->dev_ops->mtu_set)(dev, mtu);
>   	if (!ret)
>   		dev->data->mtu = mtu;
> diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
> index 7070e9ab408f..5171a9083288 100644
> --- a/lib/librte_ethdev/rte_ethdev.h
> +++ b/lib/librte_ethdev/rte_ethdev.h
> @@ -1015,6 +1015,8 @@ struct rte_eth_dev_info {
>   	const char *driver_name; /**< Device Driver name. */
>   	unsigned int if_index; /**< Index to bound host interface, or 0 if none.
>   		Use if_indextoname() to translate into an interface name. */
> +	uint16_t min_mtu;	/**< Minimum MTU allowed */
> +	uint16_t max_mtu;	/**< Maximum MTU allowed */
>   	const uint32_t *dev_flags; /**< Device flags */
>   	uint32_t min_rx_bufsize; /**< Minimum size of RX buffer. */
>   	uint32_t max_rx_pktlen; /**< Maximum configurable length of RX pkt. */
  
Stephen Hemminger Sept. 6, 2018, 10:52 a.m. UTC | #3
On Thu, 6 Sep 2018 09:29:32 +0300
Andrew Rybchenko <arybchenko@solarflare.com> wrote:

> On 09/05/2018 07:41 PM, Stephen Hemminger wrote:
> > This addresses the usability issue raised by OVS at DPDK Userspace
> > summit. It adds general min/max mtu into device info. For compatiablity,
> > and to save space, it fits in a hole in existing structure.  
> 
> It is true for amd64, but it looks like it is false on 32-bit. So, ABI 
> breakage.

Yes it is ABI change on 32 bit, but 18.11 is a major release where
this is allowed/expected.
  
Ian Stokes Nov. 22, 2018, 9:58 a.m. UTC | #4
> On Thu, 6 Sep 2018 09:29:32 +0300
> Andrew Rybchenko <arybchenko@solarflare.com> wrote:
> 
> > On 09/05/2018 07:41 PM, Stephen Hemminger wrote:
> > > This addresses the usability issue raised by OVS at DPDK Userspace
> > > summit. It adds general min/max mtu into device info. For
> > > compatiablity, and to save space, it fits in a hole in existing
> structure.
> >
> > It is true for amd64, but it looks like it is false on 32-bit. So, ABI
> > breakage.
> 
> Yes it is ABI change on 32 bit, but 18.11 is a major release where this is
> allowed/expected.

Thanks for this work Stephen, I've tested it with OVS DPDK and it resolves the issues as described, if it's to be part of DPDK 19.02 I guess there should be an ABI breakage notification in 18.11?
  
Thomas Monjalon Dec. 19, 2018, 2:37 a.m. UTC | #5
Hi Stephen,

Can we expect a v2 of this patch? At least for removing deprecation
and bump ethdev version, etc?

05/09/2018 18:41, Stephen Hemminger:
> This addresses the usability issue raised by OVS at DPDK Userspace
> summit. It adds general min/max mtu into device info. For compatiablity,
> and to save space, it fits in a hole in existing structure.
> 
> The initial version sets max mtu to normal Ethernet, it is up to
> PMD to set larger value if it supports Jumbo frames.
> 
> Fixing the drivers to use this is trivial and can be done by 18.11.
> Already have some of the patches done.
> 
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
  
Stephen Hemminger Jan. 16, 2019, 9:30 p.m. UTC | #6
On Wed, 19 Dec 2018 03:37:16 +0100
Thomas Monjalon <thomas@monjalon.net> wrote:

> Hi Stephen,
> 
> Can we expect a v2 of this patch? At least for removing deprecation
> and bump ethdev version, etc?
> 
> 05/09/2018 18:41, Stephen Hemminger:
> > This addresses the usability issue raised by OVS at DPDK Userspace
> > summit. It adds general min/max mtu into device info. For compatiablity,
> > and to save space, it fits in a hole in existing structure.
> > 
> > The initial version sets max mtu to normal Ethernet, it is up to
> > PMD to set larger value if it supports Jumbo frames.
> > 
> > Fixing the drivers to use this is trivial and can be done by 18.11.
> > Already have some of the patches done.
> > 
> > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>  
> 
> 
> 
> 

It needs more than just this patch to be worthwhile.
Need device drivers to fill in the max_mtu field, otherwise it would
just break jumbo frame support.

I suppose an initial patch to bump ethdev version would be possible for 19.05,
are there other API changes doing that already for that relaease.
  
Morten Brørup Feb. 6, 2019, 1:05 p.m. UTC | #7
Good work, Stephen.

It should also be documented how PMDs should interpret this MTU.

Obviously, a VLAN tagged Ethernet frame grows from 1518 to 1522 bytes incl. header and CRC, and should be allowed with an Ethernet MTU of 1500 bytes. There's even a #define ETHER_MAX_VLAN_FRAME_LEN for this, but that's as far as it goes...

But how about frames with even larger headers, e.g. 4 MPLS labels makes a frame 16 bytes longer, i.e. it grows from 1518 to 1534 bytes... is such a frame acceptable with an MTU of 1500 bytes?

According to Wikipedia (https://en.wikipedia.org/wiki/Maximum_transmission_unit), MTU describes the maximum payload size, and is not tied to the maximum frame size. However, the Linux kernel (at least the older versions I have been working with) incorrectly ties the MTU directly to the maximum frame size, so the MTU has to increased to support larger headers (e.g. QinQ or an MPLS stack). Do none, all or some DPDK PMDs suffer from the same error?


Med venlig hilsen / kind regards
- Morten Brørup
  
Stephen Hemminger Feb. 6, 2019, 4:11 p.m. UTC | #8
On Wed, 6 Feb 2019 14:05:34 +0100
Morten Brørup <mb@smartsharesystems.com> wrote:

> Good work, Stephen.
> 
> It should also be documented how PMDs should interpret this MTU.
> 
> Obviously, a VLAN tagged Ethernet frame grows from 1518 to 1522 bytes incl. header and CRC, and should be allowed with an Ethernet MTU of 1500 bytes. There's even a #define ETHER_MAX_VLAN_FRAME_LEN for this, but that's as far as it goes...
> 
> But how about frames with even larger headers, e.g. 4 MPLS labels makes a frame 16 bytes longer, i.e. it grows from 1518 to 1534 bytes... is such a frame acceptable with an MTU of 1500 bytes?

No. According to standard practice in Linux and FreeBSD, only the first VLAN tag is free.
After that any other headers count against MTU.

> 
> According to Wikipedia (https://en.wikipedia.org/wiki/Maximum_transmission_unit), MTU describes the maximum payload size, and is not tied to the maximum frame size. However, the Linux kernel (at least the older versions I have been working with) incorrectly ties the MTU directly to the maximum frame size, so the MTU has to increased to support larger headers (e.g. QinQ or an MPLS stack). Do none, all or some DPDK PMDs suffer from the same error?
> 

Maximum Transmission Unit and Maximum Receive Unit (MRU) are separate. On most hardware they are the same but
there is variation. Some hardware can receive larger sizes than MTU and some have max receive length register.
  
Morten Brørup Feb. 6, 2019, 9:58 p.m. UTC | #9
> From: dev on behalf of Stephen Hemminger
> On Wed, 6 Feb 2019 14:05:34 +0100
> Morten Brørup <mb@smartsharesystems.com> wrote:
> 
> > Good work, Stephen.
> > 
> > It should also be documented how PMDs should interpret this MTU.
> > 
> > Obviously, a VLAN tagged Ethernet frame grows from 1518 to 1522 bytes incl. header and CRC, and should be allowed with an Ethernet MTU of 1500 bytes. There's even a #define ETHER_MAX_VLAN_FRAME_LEN for this, but that's as far as it goes...
> > 
> > But how about frames with even larger headers, e.g. 4 MPLS labels makes a frame 16 bytes longer, i.e. it grows from 1518 to 1534 bytes... is such a frame acceptable with an MTU of 1500 bytes?
> 
> No. According to standard practice in Linux and FreeBSD, only the first VLAN tag is free.
> After that any other headers count against MTU.

Thank you for the insights. Just to clarify:
1 VLAN tag is allowed for free.
But on order to support two VLAN tags, the MTU must be increased by the size of one VLAN tag, because the first VLAN tag is free?
Or must the MTU be increased by the size of two VLAN tags, because only the special case of exactly one VLAN tag is free?

I wish details like this were documented... for the benefit of both PMD developers and DPDK users. Which is why I CC'ed the Documentatation and Ethernet API maintainers.

It reminds me of the previous discussion about some Ethernet PMD's not padding to minimum Ethernet frame size... DPDK users expecting a certain behavior based on Linux/BSD behavior, but not documented, so the PMD developers were unaware (or possibly disagreed).

We all know that the devil is in the details! Which is why documenting such details is important.

Outside the DPDK core libraries, the Ethernet PMDs are probably the closest to the DPDK core you are going to get.
The core libraries have a bunch of test cases. Perhaps there should be more PMD test cases.

> 
> > 
> > According to Wikipedia (https://en.wikipedia.org/wiki/Maximum_transmission_unit), MTU describes the maximum payload size, and is not tied to the maximum frame size. However, the Linux kernel (at least the older versions I have been working with) incorrectly ties the MTU directly to the maximum frame size, so the MTU has to increased to support larger headers (e.g. QinQ or an MPLS stack). Do none, all or some DPDK PMDs suffer from the same error?
> > 
> 
> Maximum Transmission Unit and Maximum Receive Unit (MRU) are separate. On most hardware they are the same but
> there is variation. Some hardware can receive larger sizes than MTU and some have max receive length register.

Yes. This is also not obvious, although the name Maximum Transmission Unit strongly indicates that it does not apply to ingress. It took me by surprise a few years ago while working with the Linux kernel.
  
Ananyev, Konstantin Feb. 7, 2019, 10:25 a.m. UTC | #10
> 
> > From: dev on behalf of Stephen Hemminger
> > On Wed, 6 Feb 2019 14:05:34 +0100
> > Morten Brørup <mb@smartsharesystems.com> wrote:
> >
> > > Good work, Stephen.
> > >
> > > It should also be documented how PMDs should interpret this MTU.
> > >
> > > Obviously, a VLAN tagged Ethernet frame grows from 1518 to 1522 bytes incl. header and CRC, and should be allowed with an Ethernet
> MTU of 1500 bytes. There's even a #define ETHER_MAX_VLAN_FRAME_LEN for this, but that's as far as it goes...
> > >
> > > But how about frames with even larger headers, e.g. 4 MPLS labels makes a frame 16 bytes longer, i.e. it grows from 1518 to 1534
> bytes... is such a frame acceptable with an MTU of 1500 bytes?
> >
> > No. According to standard practice in Linux and FreeBSD, only the first VLAN tag is free.
> > After that any other headers count against MTU.
> 
> Thank you for the insights. Just to clarify:
> 1 VLAN tag is allowed for free.
> But on order to support two VLAN tags, the MTU must be increased by the size of one VLAN tag, because the first VLAN tag is free?
> Or must the MTU be increased by the size of two VLAN tags, because only the special case of exactly one VLAN tag is free?

Can we introduce new function at ehtdev API to query PMD frame size based on MTU?
Something like: rte_ethdev_mtu_to_frame_size(uint32_t mtu);
Provide default behavior and allow PMD to overwrite it?
Konstantin 

> 
> I wish details like this were documented... for the benefit of both PMD developers and DPDK users. Which is why I CC'ed the
> Documentatation and Ethernet API maintainers.
> 
> It reminds me of the previous discussion about some Ethernet PMD's not padding to minimum Ethernet frame size... DPDK users expecting a
> certain behavior based on Linux/BSD behavior, but not documented, so the PMD developers were unaware (or possibly disagreed).
> 
> We all know that the devil is in the details! Which is why documenting such details is important.
> 
> Outside the DPDK core libraries, the Ethernet PMDs are probably the closest to the DPDK core you are going to get.
> The core libraries have a bunch of test cases. Perhaps there should be more PMD test cases.
> 
> >
> > >
> > > According to Wikipedia (https://en.wikipedia.org/wiki/Maximum_transmission_unit), MTU describes the maximum payload size, and is
> not tied to the maximum frame size. However, the Linux kernel (at least the older versions I have been working with) incorrectly ties the
> MTU directly to the maximum frame size, so the MTU has to increased to support larger headers (e.g. QinQ or an MPLS stack). Do none, all
> or some DPDK PMDs suffer from the same error?
> > >
> >
> > Maximum Transmission Unit and Maximum Receive Unit (MRU) are separate. On most hardware they are the same but
> > there is variation. Some hardware can receive larger sizes than MTU and some have max receive length register.
> 
> Yes. This is also not obvious, although the name Maximum Transmission Unit strongly indicates that it does not apply to ingress. It took me
> by surprise a few years ago while working with the Linux kernel.
> 
> 
> 
>
  
Morten Brørup Feb. 7, 2019, 11:10 a.m. UTC | #11
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Ananyev,
> Konstantin
> >
> > > From: dev on behalf of Stephen Hemminger
> > > On Wed, 6 Feb 2019 14:05:34 +0100
> > > Morten Brørup <mb@smartsharesystems.com> wrote:
> > >
> > > > Good work, Stephen.
> > > >
> > > > It should also be documented how PMDs should interpret this MTU.
> > > >
> > > > Obviously, a VLAN tagged Ethernet frame grows from 1518 to 1522
> bytes incl. header and CRC, and should be allowed with an Ethernet
> > MTU of 1500 bytes. There's even a #define ETHER_MAX_VLAN_FRAME_LEN
> for this, but that's as far as it goes...
> > > >
> > > > But how about frames with even larger headers, e.g. 4 MPLS labels
> makes a frame 16 bytes longer, i.e. it grows from 1518 to 1534
> > bytes... is such a frame acceptable with an MTU of 1500 bytes?
> > >
> > > No. According to standard practice in Linux and FreeBSD, only the
> first VLAN tag is free.
> > > After that any other headers count against MTU.
> >
> > Thank you for the insights. Just to clarify:
> > 1 VLAN tag is allowed for free.
> > But on order to support two VLAN tags, the MTU must be increased by
> the size of one VLAN tag, because the first VLAN tag is free?
> > Or must the MTU be increased by the size of two VLAN tags, because
> only the special case of exactly one VLAN tag is free?
> 
> Can we introduce new function at ehtdev API to query PMD frame size
> based on MTU?
> Something like: rte_ethdev_mtu_to_frame_size(uint32_t mtu);
> Provide default behavior and allow PMD to overwrite it?
> Konstantin

This assumes that the Layer 2 headers are fixed size. If you add e.g. an MPLS stack to the packet, the number of MPLS labels can vary, and thus the size of the Layer 2 headers varies with each packet.

It is a problem if Layer 3/4 offload features make assumptions about the Layer 3/4 MTUs based on the Layer 2 MTU without considering the size of the actual Ethernet headers of each packet, but simply assume that the Ethernet header size is fixed.

It might currently be calculated correctly for untagged or single VLAN tagged packets (assuming the VLAN tag is not already part of the packet data, but is set in the mbuf for the NIC to add).

-Morten
  
Ananyev, Konstantin Feb. 7, 2019, noon UTC | #12
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Ananyev,
> > Konstantin
> > >
> > > > From: dev on behalf of Stephen Hemminger
> > > > On Wed, 6 Feb 2019 14:05:34 +0100
> > > > Morten Brørup <mb@smartsharesystems.com> wrote:
> > > >
> > > > > Good work, Stephen.
> > > > >
> > > > > It should also be documented how PMDs should interpret this MTU.
> > > > >
> > > > > Obviously, a VLAN tagged Ethernet frame grows from 1518 to 1522
> > bytes incl. header and CRC, and should be allowed with an Ethernet
> > > MTU of 1500 bytes. There's even a #define ETHER_MAX_VLAN_FRAME_LEN
> > for this, but that's as far as it goes...
> > > > >
> > > > > But how about frames with even larger headers, e.g. 4 MPLS labels
> > makes a frame 16 bytes longer, i.e. it grows from 1518 to 1534
> > > bytes... is such a frame acceptable with an MTU of 1500 bytes?
> > > >
> > > > No. According to standard practice in Linux and FreeBSD, only the
> > first VLAN tag is free.
> > > > After that any other headers count against MTU.
> > >
> > > Thank you for the insights. Just to clarify:
> > > 1 VLAN tag is allowed for free.
> > > But on order to support two VLAN tags, the MTU must be increased by
> > the size of one VLAN tag, because the first VLAN tag is free?
> > > Or must the MTU be increased by the size of two VLAN tags, because
> > only the special case of exactly one VLAN tag is free?
> >
> > Can we introduce new function at ehtdev API to query PMD frame size
> > based on MTU?
> > Something like: rte_ethdev_mtu_to_frame_size(uint32_t mtu);
> > Provide default behavior and allow PMD to overwrite it?
> > Konstantin
> 
> This assumes that the Layer 2 headers are fixed size. If you add e.g. an MPLS stack to the packet, the number of MPLS labels can vary, and
> thus the size of the Layer 2 headers varies with each packet.

If all this extra stuff (MPLS labels, etc.) is added by SW - then yes, it will be SW (upper layer) to take account about such overhead
and it would be accounted by PMD frame-size caluclation.
If these fields would be added by HW/PMD - then it would be PMD responsibility to take these extras into account when calculating
frame size.

> 
> It is a problem if Layer 3/4 offload features make assumptions about the Layer 3/4 MTUs based on the Layer 2 MTU without considering the
> size of the actual Ethernet headers of each packet, but simply assume that the Ethernet header size is fixed.
> 
> It might currently be calculated correctly for untagged or single VLAN tagged packets (assuming the VLAN tag is not already part of the
> packet data, but is set in the mbuf for the NIC to add).

That could be a default behavior for that function.
Konstantin
  
Ian Stokes Feb. 20, 2019, 4:02 p.m. UTC | #13
On 2/7/2019 12:00 PM, Ananyev, Konstantin wrote:
> 
>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Ananyev,
>>> Konstantin
>>>>
>>>>> From: dev on behalf of Stephen Hemminger
>>>>> On Wed, 6 Feb 2019 14:05:34 +0100
>>>>> Morten Brørup <mb@smartsharesystems.com> wrote:
>>>>>
>>>>>> Good work, Stephen.
>>>>>>
>>>>>> It should also be documented how PMDs should interpret this MTU.
>>>>>>
>>>>>> Obviously, a VLAN tagged Ethernet frame grows from 1518 to 1522
>>> bytes incl. header and CRC, and should be allowed with an Ethernet
>>>> MTU of 1500 bytes. There's even a #define ETHER_MAX_VLAN_FRAME_LEN
>>> for this, but that's as far as it goes...
>>>>>>
>>>>>> But how about frames with even larger headers, e.g. 4 MPLS labels
>>> makes a frame 16 bytes longer, i.e. it grows from 1518 to 1534
>>>> bytes... is such a frame acceptable with an MTU of 1500 bytes?
>>>>>
>>>>> No. According to standard practice in Linux and FreeBSD, only the
>>> first VLAN tag is free.
>>>>> After that any other headers count against MTU.
>>>>
>>>> Thank you for the insights. Just to clarify:
>>>> 1 VLAN tag is allowed for free.
>>>> But on order to support two VLAN tags, the MTU must be increased by
>>> the size of one VLAN tag, because the first VLAN tag is free?
>>>> Or must the MTU be increased by the size of two VLAN tags, because
>>> only the special case of exactly one VLAN tag is free?
>>>
>>> Can we introduce new function at ehtdev API to query PMD frame size
>>> based on MTU?
>>> Something like: rte_ethdev_mtu_to_frame_size(uint32_t mtu);
>>> Provide default behavior and allow PMD to overwrite it?
>>> Konstantin
>>
>> This assumes that the Layer 2 headers are fixed size. If you add e.g. an MPLS stack to the packet, the number of MPLS labels can vary, and
>> thus the size of the Layer 2 headers varies with each packet.
> 
> If all this extra stuff (MPLS labels, etc.) is added by SW - then yes, it will be SW (upper layer) to take account about such overhead
> and it would be accounted by PMD frame-size caluclation.
> If these fields would be added by HW/PMD - then it would be PMD responsibility to take these extras into account when calculating
> frame size.
> 
>>
>> It is a problem if Layer 3/4 offload features make assumptions about the Layer 3/4 MTUs based on the Layer 2 MTU without considering the
>> size of the actual Ethernet headers of each packet, but simply assume that the Ethernet header size is fixed.
>>
>> It might currently be calculated correctly for untagged or single VLAN tagged packets (assuming the VLAN tag is not already part of the
>> packet data, but is set in the mbuf for the NIC to add).
> 
> That could be a default behavior for that function.
> Konstantin
> 

Hi All,

with a view to help progress this I've posted an RFC series based on 
Stephens work and some of the PMD drivers

http://mails.dpdk.org/archives/dev/2019-February/124938.html

Any feedback welcome

Thanks
Ian
  
Nithin Dabilpuram June 24, 2019, 1:18 p.m. UTC | #14
Hi All,

Since Ian's change didn't include MTU interpretation issue, we want to see if we can send out a patch for this discussion to conclude as we have a similar problem.

Currently two ways are as below.
#1 Konstantin's solution of adding new api rte_ethdev_mtu_to_frame_size(devid, mtu), that would dynamically provide the new max mtu.
#2 Can we have the value of max_mtu field in rte_eth_dev_info  be dynamic i.e based on device configure to indicate that new Tx offloads would need more space for L2.
       For this we can update max_mtu field documentation without abi change.

I also see a problem with above two solutions that is not present with Linux like approach. It is that currently MTU configuration both effects Rx frame size and Tx frame size.
If we change the MTU interpretation based on Tx offloads, then for Rx, there is a confusion of what will be its max frame size accepted.

Thanks
Nithin

 -----Original Message-----
From: dev <dev-bounces@dpdk.org> On Behalf Of Ian Stokes
Sent: Wednesday, February 20, 2019 9:32 PM
To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Morten Brørup <mb@smartsharesystems.com>; Stephen Hemminger <stephen@networkplumber.org>
Cc: dev@dpdk.org; Mcnamara, John <john.mcnamara@intel.com>; Kovacevic, Marko <marko.kovacevic@intel.com>; Thomas Monjalon <thomas@monjalon.net>; Yigit, Ferruh <ferruh.yigit@intel.com>; Andrew Rybchenko <arybchenko@solarflare.com>
Subject: Re: [dpdk-dev] [RFC] ethdev: add min/max MTU to device info

On 2/7/2019 12:00 PM, Ananyev, Konstantin wrote:
> 
>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Ananyev,
>>> Konstantin
>>>>
>>>>> From: dev on behalf of Stephen Hemminger
>>>>> On Wed, 6 Feb 2019 14:05:34 +0100
>>>>> Morten Brørup <mb@smartsharesystems.com> wrote:
>>>>>
>>>>>> Good work, Stephen.
>>>>>>
>>>>>> It should also be documented how PMDs should interpret this MTU.
>>>>>>
>>>>>> Obviously, a VLAN tagged Ethernet frame grows from 1518 to 1522
>>> bytes incl. header and CRC, and should be allowed with an Ethernet
>>>> MTU of 1500 bytes. There's even a #define ETHER_MAX_VLAN_FRAME_LEN
>>> for this, but that's as far as it goes...
>>>>>>
>>>>>> But how about frames with even larger headers, e.g. 4 MPLS labels
>>> makes a frame 16 bytes longer, i.e. it grows from 1518 to 1534
>>>> bytes... is such a frame acceptable with an MTU of 1500 bytes?
>>>>>
>>>>> No. According to standard practice in Linux and FreeBSD, only the
>>> first VLAN tag is free.
>>>>> After that any other headers count against MTU.
>>>>
>>>> Thank you for the insights. Just to clarify:
>>>> 1 VLAN tag is allowed for free.
>>>> But on order to support two VLAN tags, the MTU must be increased by
>>> the size of one VLAN tag, because the first VLAN tag is free?
>>>> Or must the MTU be increased by the size of two VLAN tags, because
>>> only the special case of exactly one VLAN tag is free?
>>>
>>> Can we introduce new function at ehtdev API to query PMD frame size
>>> based on MTU?
>>> Something like: rte_ethdev_mtu_to_frame_size(uint32_t mtu);
>>> Provide default behavior and allow PMD to overwrite it?
>>> Konstantin
>>
>> This assumes that the Layer 2 headers are fixed size. If you add e.g. an MPLS stack to the packet, the number of MPLS labels can vary, and
>> thus the size of the Layer 2 headers varies with each packet.
> 
> If all this extra stuff (MPLS labels, etc.) is added by SW - then yes, it will be SW (upper layer) to take account about such overhead
> and it would be accounted by PMD frame-size caluclation.
> If these fields would be added by HW/PMD - then it would be PMD responsibility to take these extras into account when calculating
> frame size.
> 
>>
>> It is a problem if Layer 3/4 offload features make assumptions about the Layer 3/4 MTUs based on the Layer 2 MTU without considering the
>> size of the actual Ethernet headers of each packet, but simply assume that the Ethernet header size is fixed.
>>
>> It might currently be calculated correctly for untagged or single VLAN tagged packets (assuming the VLAN tag is not already part of the
>> packet data, but is set in the mbuf for the NIC to add).
> 
> That could be a default behavior for that function.
> Konstantin
> 

Hi All,

with a view to help progress this I've posted an RFC series based on 
Stephens work and some of the PMD drivers

http://mails.dpdk.org/archives/dev/2019-February/124938.html

Any feedback welcome

Thanks
Ian
  

Patch

diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
index 4c320250589a..df0c7536a7c4 100644
--- a/lib/librte_ethdev/rte_ethdev.c
+++ b/lib/librte_ethdev/rte_ethdev.c
@@ -2408,6 +2408,8 @@  rte_eth_dev_info_get(uint16_t port_id, struct rte_eth_dev_info *dev_info)
 	dev_info->rx_desc_lim = lim;
 	dev_info->tx_desc_lim = lim;
 	dev_info->device = dev->device;
+	dev_info->min_mtu = ETHER_MIN_MTU;
+	dev_info->max_mtu = ETHER_MTU;
 
 	RTE_FUNC_PTR_OR_RET(*dev->dev_ops->dev_infos_get);
 	(*dev->dev_ops->dev_infos_get)(dev, dev_info);
@@ -2471,12 +2473,17 @@  int
 rte_eth_dev_set_mtu(uint16_t port_id, uint16_t mtu)
 {
 	int ret;
+	struct rte_eth_dev_info dev_info;
 	struct rte_eth_dev *dev;
 
 	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
 	dev = &rte_eth_devices[port_id];
 	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->mtu_set, -ENOTSUP);
 
+	rte_eth_dev_info_get(port_id, &dev_info);
+	if (mtu < dev_info.min_mtu || mtu > dev_info.max_mtu)
+		return -EINVAL;
+
 	ret = (*dev->dev_ops->mtu_set)(dev, mtu);
 	if (!ret)
 		dev->data->mtu = mtu;
diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
index 7070e9ab408f..5171a9083288 100644
--- a/lib/librte_ethdev/rte_ethdev.h
+++ b/lib/librte_ethdev/rte_ethdev.h
@@ -1015,6 +1015,8 @@  struct rte_eth_dev_info {
 	const char *driver_name; /**< Device Driver name. */
 	unsigned int if_index; /**< Index to bound host interface, or 0 if none.
 		Use if_indextoname() to translate into an interface name. */
+	uint16_t min_mtu;	/**< Minimum MTU allowed */
+	uint16_t max_mtu;	/**< Maximum MTU allowed */
 	const uint32_t *dev_flags; /**< Device flags */
 	uint32_t min_rx_bufsize; /**< Minimum size of RX buffer. */
 	uint32_t max_rx_pktlen; /**< Maximum configurable length of RX pkt. */