[v3] vhost: add header check in dequeue offload

Message ID 20210317063109.135662-1-xiao.w.wang@intel.com (mailing list archive)
State Rejected, archived
Delegated to: Maxime Coquelin
Series: [v3] vhost: add header check in dequeue offload

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/travis-robot success travis build: passed
ci/github-robot success github build: passed
ci/Intel-compilation success Compilation OK
ci/intel-Testing success Testing PASS
ci/iol-abi-testing success Testing PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/iol-testing success Testing PASS
ci/iol-mellanox-Performance success Performance Testing PASS

Commit Message

Xiao Wang March 17, 2021, 6:31 a.m. UTC
When parsing the virtio net header and the packet header for dequeue offload,
we need to perform a sanity check on the packet header to ensure:
  - No out-of-bounds memory access.
  - The packet header and the virtio_net header are valid and aligned.

Fixes: d0cf91303d73 ("vhost: add Tx offload capabilities")
Cc: stable@dpdk.org

Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
---
v3:
- Check data_len before calling rte_pktmbuf_mtod. (David)

v2:
- Allow empty L4 payload for cksum offload. (Konstantin)
---
 lib/librte_vhost/virtio_net.c | 49 +++++++++++++++++++++++++++++++++++++------
 1 file changed, 43 insertions(+), 6 deletions(-)
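
For context, the hazard being fixed is dereferencing packet headers via rte_pktmbuf_mtod() before confirming that the first mbuf segment is long enough to hold them. A minimal sketch of the check pattern, mirroring the diff at the bottom of this page (the helper name is illustrative only):

#include <rte_ether.h>
#include <rte_mbuf.h>

/* Illustrative helper: return the Ethernet header, or NULL if the
 * first segment is too short to contain one. */
static inline struct rte_ether_hdr *
eth_hdr_or_null(struct rte_mbuf *m)
{
    if (m->data_len < sizeof(struct rte_ether_hdr))
        return NULL; /* reading here would run past the segment end */

    return rte_pktmbuf_mtod(m, struct rte_ether_hdr *);
}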
  

Comments

David Marchand April 1, 2021, 12:04 p.m. UTC | #1
On Wed, Mar 17, 2021 at 7:50 AM Xiao Wang <xiao.w.wang@intel.com> wrote:
>
> When parsing the virtio net header and packet header for dequeue offload,
> we need to perform sanity check on the packet header to ensure:
>   - No out-of-boundary memory access.
>   - The packet header and virtio_net header are valid and aligned.
>
> Fixes: d0cf91303d73 ("vhost: add Tx offload capabilities")
> Cc: stable@dpdk.org
>
> Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>

I spent some time digging on this topic.

Afaiu the offload API, vhost is not supposed to populate tx offloads.
I would drop this whole parse_ethernet function and replace
vhost_dequeue_offload with what virtio does on the rx side.

Please have a look at this series (especially the last patch):
http://patchwork.dpdk.org/project/dpdk/list/?series=16052


Thanks.
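
For illustration, a minimal sketch of the Rx-side approach referred to above: report what the guest asked for as Rx flags and leave any Tx decision to the application. The function name is hypothetical and the flag choice is an assumption drawn from the flags mentioned later in this thread:

#include <linux/virtio_net.h>
#include <rte_mbuf.h>

/* Hypothetical sketch: translate the virtio-net header into Rx flags,
 * the way a virtio PMD reports offloads on its Rx path, instead of
 * pre-filling Tx offload requests on behalf of the application. */
static void
vhost_dequeue_rx_flags(const struct virtio_net_hdr *hdr, struct rte_mbuf *m)
{
    if (hdr->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM)
        m->ol_flags |= PKT_RX_L4_CKSUM_NONE; /* guest left the L4 csum unset */

    if ((hdr->gso_type & ~VIRTIO_NET_HDR_GSO_ECN) != VIRTIO_NET_HDR_GSO_NONE) {
        m->ol_flags |= PKT_RX_LRO;      /* coalesced packet */
        m->tso_segsz = hdr->gso_size;   /* guest-provided MSS */
    }
}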
  
Xiao Wang April 2, 2021, 8:38 a.m. UTC | #2
> -----Original Message-----
> From: David Marchand <david.marchand@redhat.com>
> Sent: Thursday, April 1, 2021 8:04 PM
> To: Wang, Xiao W <xiao.w.wang@intel.com>
> Cc: Xia, Chenbo <chenbo.xia@intel.com>; Maxime Coquelin
> <maxime.coquelin@redhat.com>; Liu, Yong <yong.liu@intel.com>; dev
> <dev@dpdk.org>; Ananyev, Konstantin <konstantin.ananyev@intel.com>;
> dpdk stable <stable@dpdk.org>
> Subject: Re: [PATCH v3] vhost: add header check in dequeue offload
> 
> On Wed, Mar 17, 2021 at 7:50 AM Xiao Wang <xiao.w.wang@intel.com>
> wrote:
> >
> > When parsing the virtio net header and packet header for dequeue offload,
> > we need to perform sanity check on the packet header to ensure:
> >   - No out-of-boundary memory access.
> >   - The packet header and virtio_net header are valid and aligned.
> >
> > Fixes: d0cf91303d73 ("vhost: add Tx offload capabilities")
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
> 
> I spent some time digging on this topic.
> 
> Afaiu the offload API, vhost is not supposed to populate tx offloads.
> I would drop this whole parse_ethernet function and replace
> vhost_dequeue_offload with what virtio does on the rx side.
> 
> Please have a look at this series (especially the last patch):
> http://patchwork.dpdk.org/project/dpdk/list/?series=16052
> 
> 
> Thanks.
> 
> --
> David Marchand

+Yang, Yi into this loop, who may have comments especially from the OVS perspective on CKSUM/TSO/TSO-in-tunnel/etc.

I think the original vhost implementation here is meant to help pass virtio's offload request on to the next output port, either a physical device or a virtio device.
If we go with series http://patchwork.dpdk.org/project/dpdk/list/?series=16052, then the virtual switch needs to do an extra translation on the flags:
e.g. PKT_RX_LRO --> PKT_TX_TCP_SEG. The question is that a packet marked with PKT_RX_LRO may come from different types of ports (non-vhost), so how can the vSwitch tell whether a TSO request should be set for this packet at transmission?

If I think from an endpoint app's perspective, I'm inclined to agree with your series. If I think from a switch/router's perspective, I'm inclined to keep the current implementation. Maybe we can add the PKT_RX_L4_CKSUM_NONE/PKT_RX_LRO flags to the current implementation; that method seems to cover both scenarios.

BRs,
Xiao
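
A minimal sketch of the extra translation discussed above, i.e. what a vswitch would have to do per packet before transmission if vhost only reported Rx flags; the function is illustrative, not a proposed OVS change:

#include <rte_mbuf.h>

/* Illustrative only: map an Rx-side offload hint onto a Tx offload
 * request before the packet leaves through another port. */
static void
rx_to_tx_offload(struct rte_mbuf *m)
{
    if (m->ol_flags & PKT_RX_LRO) {
        /* Coalesced on Rx: ask the egress port to segment it again.
         * This relies on l2_len/l3_len/l4_len and tso_segsz being
         * valid, which may require re-parsing the headers here; that
         * is exactly the ambiguity raised above for non-vhost ports. */
        m->ol_flags |= PKT_TX_TCP_SEG;
    }
}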
  
Xiao Wang April 12, 2021, 9:08 a.m. UTC | #3
Hi,

> -----Original Message-----
> From: Wang, Xiao W
> Sent: Friday, April 2, 2021 4:39 PM
> To: David Marchand <david.marchand@redhat.com>
> Cc: Xia, Chenbo <Chenbo.Xia@intel.com>; Maxime Coquelin
> <maxime.coquelin@redhat.com>; Liu, Yong <yong.liu@intel.com>; dev
> <dev@dpdk.org>; Ananyev, Konstantin <konstantin.ananyev@intel.com>;
> dpdk stable <stable@dpdk.org>; yangyi01@inspur.com
> Subject: RE: [PATCH v3] vhost: add header check in dequeue offload
> 
> 
> > -----Original Message-----
> > From: David Marchand <david.marchand@redhat.com>
> > Sent: Thursday, April 1, 2021 8:04 PM
> > To: Wang, Xiao W <xiao.w.wang@intel.com>
> > Cc: Xia, Chenbo <chenbo.xia@intel.com>; Maxime Coquelin
> > <maxime.coquelin@redhat.com>; Liu, Yong <yong.liu@intel.com>; dev
> > <dev@dpdk.org>; Ananyev, Konstantin <konstantin.ananyev@intel.com>;
> > dpdk stable <stable@dpdk.org>
> > Subject: Re: [PATCH v3] vhost: add header check in dequeue offload
> >
> > On Wed, Mar 17, 2021 at 7:50 AM Xiao Wang <xiao.w.wang@intel.com>
> > wrote:
> > >
> > > When parsing the virtio net header and packet header for dequeue
> offload,
> > > we need to perform sanity check on the packet header to ensure:
> > >   - No out-of-boundary memory access.
> > >   - The packet header and virtio_net header are valid and aligned.
> > >
> > > Fixes: d0cf91303d73 ("vhost: add Tx offload capabilities")
> > > Cc: stable@dpdk.org
> > >
> > > Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
> >
> > I spent some time digging on this topic.
> >
> > Afaiu the offload API, vhost is not supposed to populate tx offloads.
> > I would drop this whole parse_ethernet function and replace
> > vhost_dequeue_offload with what virtio does on the rx side.
> >
> > Please have a look at this series (especially the last patch):
> > http://patchwork.dpdk.org/project/dpdk/list/?series=16052
> >
> >
> > Thanks.
> >
> > --
> > David Marchand
> 
> +Yang ,Yi into this loop who may have comments especially from OVS
> perspective on CKSUM/TSO/TSO in tunnel/etc..
> 
> I think the original vhost implementation here is to help pass virtio's offload
> request onto the next output port, either physical device or a virtio device.
> If we go with series
> http://patchwork.dpdk.org/project/dpdk/list/?series=16052, then virtual
> switch need to do an extra translation on the flags:
> e.g. PKT_RX_LRO --> PKT_TX_TCP_SEG. The question is that a packet
> marked with PKT_RX_LRO may come from different types of ports (non-
> vhost), how vSwitch can tell if TSO request should be set for this packet at
> transmission?
> 
> If I think from an endpoint app's perspective, I'm inclined to agree with your
> series. If I think from a switch/router's perspective, I'm inclined to keep the
> current implementation. Maybe we can add
> PKT_RX_L4_CKSUM_NONE/PKT_RX_LRO flags into the current
> implementation, seems this method can cover both scenarios.
> 
> BRs,
> Xiao
> 
> 

Considering that the major consumers of the vhost API are virtual switches/routers, I tend to keep the current implementation and apply this fix patch.
Any comments?

BRs,
Xiao
  
David Marchand April 12, 2021, 9:33 a.m. UTC | #4
On Mon, Apr 12, 2021 at 11:09 AM Wang, Xiao W <xiao.w.wang@intel.com> wrote:
> Considering the major consumer of vhost API is virtual switch/router, I tend to keep the current implementation and apply this fix patch.
> Any comments?

This is just a hack that bypasses the vswitch control.

It happens to work when the vswitch does nothing.
If anything is done, like popping a VLAN header, the vswitch needs to
update the l3 offset.
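
To make the concern concrete, a hedged sketch of the fix-up a vswitch would need after popping a VLAN tag, because the offsets were filled in by vhost at dequeue time (the helper is hypothetical):

#include <rte_ether.h>
#include <rte_mbuf.h>

/* Illustrative only: after stripping the outer VLAN tag, the l2_len
 * recorded at dequeue time is 4 bytes too large, so any offload that
 * relies on l2_len + l3_len now points at the wrong offset unless the
 * vswitch adjusts it. */
static void
fixup_offsets_after_vlan_pop(struct rte_mbuf *m)
{
    if (m->l2_len >= sizeof(struct rte_ether_hdr) + sizeof(struct rte_vlan_hdr))
        m->l2_len -= sizeof(struct rte_vlan_hdr);
}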
  
Maxime Coquelin April 13, 2021, 2:30 p.m. UTC | #5
On 4/12/21 11:33 AM, David Marchand wrote:
> On Mon, Apr 12, 2021 at 11:09 AM Wang, Xiao W <xiao.w.wang@intel.com> wrote:
>> Considering the major consumer of vhost API is virtual switch/router, I tend to keep the current implementation and apply this fix patch.
>> Any comments?
> 
> This is just a hack that bypasses the vswitch control.
> 
> It happens to work when the vswitch does nothing.
> If anything is done, like popping a vlan header, the vswitch needs to
> update l3 offset.
> 
> 

I agree with David; the current behavior is wrong.

Furthermore, when the lib is used via the Vhost PMD, the application
should not have to handle things differently depending on whether it is
the Vhost PMD or any physical NIC PMD.
  
Xiao Wang May 8, 2021, 5:54 a.m. UTC | #6
Hi Maxime and David,

I see the patch "vhost: fix offload flags in Rx path" (http://patches.dpdk.org/project/dpdk/patch/20210503164344.27916-4-david.marchand@redhat.com/) has been merged, and the legacy implementation is kept. Do you think we still need to fix the header check for the legacy implementation?

BRs,
Xiao

> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Tuesday, April 13, 2021 10:31 PM
> To: David Marchand <david.marchand@redhat.com>; Wang, Xiao W
> <xiao.w.wang@intel.com>
> Cc: Xia, Chenbo <chenbo.xia@intel.com>; Liu, Yong <yong.liu@intel.com>;
> dev <dev@dpdk.org>; Ananyev, Konstantin
> <konstantin.ananyev@intel.com>; dpdk stable <stable@dpdk.org>;
> yangyi01@inspur.com
> Subject: Re: [PATCH v3] vhost: add header check in dequeue offload
> 
> 
> 
> On 4/12/21 11:33 AM, David Marchand wrote:
> > On Mon, Apr 12, 2021 at 11:09 AM Wang, Xiao W
> <xiao.w.wang@intel.com> wrote:
> >> Considering the major consumer of vhost API is virtual switch/router, I
> tend to keep the current implementation and apply this fix patch.
> >> Any comments?
> >
> > This is just a hack that bypasses the vswitch control.
> >
> > It happens to work when the vswitch does nothing.
> > If anything is done, like popping a vlan header, the vswitch needs to
> > update l3 offset.
> >
> >
> 
> I agree with David, current behavior is wrong.
> 
> Furthermore, when the lib is used via the Vhost PMD, the application
> should not have to handle it differently on whether it is Vhost PMD or
> any physical NIC PMD.
  

Patch

diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index 583bf379c6..f8712f5417 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -1821,44 +1821,64 @@  virtio_net_with_host_offload(struct virtio_net *dev)
 	return false;
 }
 
-static void
-parse_ethernet(struct rte_mbuf *m, uint16_t *l4_proto, void **l4_hdr)
+static int
+parse_ethernet(struct rte_mbuf *m, uint16_t *l4_proto, void **l4_hdr,
+		uint16_t *len)
 {
 	struct rte_ipv4_hdr *ipv4_hdr;
 	struct rte_ipv6_hdr *ipv6_hdr;
 	void *l3_hdr = NULL;
 	struct rte_ether_hdr *eth_hdr;
 	uint16_t ethertype;
+	uint16_t data_len = m->data_len;
+
+	if (data_len <= sizeof(struct rte_ether_hdr))
+		return -EINVAL;
 
 	eth_hdr = rte_pktmbuf_mtod(m, struct rte_ether_hdr *);
 
 	m->l2_len = sizeof(struct rte_ether_hdr);
 	ethertype = rte_be_to_cpu_16(eth_hdr->ether_type);
+	data_len -= sizeof(struct rte_ether_hdr);
 
 	if (ethertype == RTE_ETHER_TYPE_VLAN) {
+		if (data_len <= sizeof(struct rte_vlan_hdr))
+			return -EINVAL;
+
 		struct rte_vlan_hdr *vlan_hdr =
 			(struct rte_vlan_hdr *)(eth_hdr + 1);
 
 		m->l2_len += sizeof(struct rte_vlan_hdr);
 		ethertype = rte_be_to_cpu_16(vlan_hdr->eth_proto);
+		data_len -= sizeof(struct rte_vlan_hdr);
 	}
 
 	l3_hdr = (char *)eth_hdr + m->l2_len;
 
 	switch (ethertype) {
 	case RTE_ETHER_TYPE_IPV4:
+		if (data_len <= sizeof(struct rte_ipv4_hdr))
+			return -EINVAL;
 		ipv4_hdr = l3_hdr;
 		*l4_proto = ipv4_hdr->next_proto_id;
 		m->l3_len = rte_ipv4_hdr_len(ipv4_hdr);
+		if (data_len <= m->l3_len) {
+			m->l3_len = 0;
+			return -EINVAL;
+		}
 		*l4_hdr = (char *)l3_hdr + m->l3_len;
 		m->ol_flags |= PKT_TX_IPV4;
+		data_len -= m->l3_len;
 		break;
 	case RTE_ETHER_TYPE_IPV6:
+		if (data_len <= sizeof(struct rte_ipv6_hdr))
+			return -EINVAL;
 		ipv6_hdr = l3_hdr;
 		*l4_proto = ipv6_hdr->proto;
 		m->l3_len = sizeof(struct rte_ipv6_hdr);
 		*l4_hdr = (char *)l3_hdr + m->l3_len;
 		m->ol_flags |= PKT_TX_IPV6;
+		data_len -= m->l3_len;
 		break;
 	default:
 		m->l3_len = 0;
@@ -1866,6 +1886,9 @@  parse_ethernet(struct rte_mbuf *m, uint16_t *l4_proto, void **l4_hdr)
 		*l4_hdr = NULL;
 		break;
 	}
+
+	*len = data_len;
+	return 0;
 }
 
 static __rte_always_inline void
@@ -1874,24 +1897,30 @@  vhost_dequeue_offload(struct virtio_net_hdr *hdr, struct rte_mbuf *m)
 	uint16_t l4_proto = 0;
 	void *l4_hdr = NULL;
 	struct rte_tcp_hdr *tcp_hdr = NULL;
+	uint16_t len = 0;
 
 	if (hdr->flags == 0 && hdr->gso_type == VIRTIO_NET_HDR_GSO_NONE)
 		return;
 
-	parse_ethernet(m, &l4_proto, &l4_hdr);
+	if (parse_ethernet(m, &l4_proto, &l4_hdr, &len) < 0)
+		return;
+
 	if (hdr->flags == VIRTIO_NET_HDR_F_NEEDS_CSUM) {
 		if (hdr->csum_start == (m->l2_len + m->l3_len)) {
 			switch (hdr->csum_offset) {
 			case (offsetof(struct rte_tcp_hdr, cksum)):
-				if (l4_proto == IPPROTO_TCP)
+				if (l4_proto == IPPROTO_TCP &&
+					len >= sizeof(struct rte_tcp_hdr))
 					m->ol_flags |= PKT_TX_TCP_CKSUM;
 				break;
 			case (offsetof(struct rte_udp_hdr, dgram_cksum)):
-				if (l4_proto == IPPROTO_UDP)
+				if (l4_proto == IPPROTO_UDP &&
+					len >= sizeof(struct rte_udp_hdr))
 					m->ol_flags |= PKT_TX_UDP_CKSUM;
 				break;
 			case (offsetof(struct rte_sctp_hdr, cksum)):
-				if (l4_proto == IPPROTO_SCTP)
+				if (l4_proto == IPPROTO_SCTP &&
+					len >= sizeof(struct rte_sctp_hdr))
 					m->ol_flags |= PKT_TX_SCTP_CKSUM;
 				break;
 			default:
@@ -1904,12 +1933,20 @@  vhost_dequeue_offload(struct virtio_net_hdr *hdr, struct rte_mbuf *m)
 		switch (hdr->gso_type & ~VIRTIO_NET_HDR_GSO_ECN) {
 		case VIRTIO_NET_HDR_GSO_TCPV4:
 		case VIRTIO_NET_HDR_GSO_TCPV6:
+			if (l4_proto != IPPROTO_TCP ||
+					len <= sizeof(struct rte_tcp_hdr))
+				break;
 			tcp_hdr = l4_hdr;
+			if (len <= (tcp_hdr->data_off & 0xf0) >> 2)
+				break;
 			m->ol_flags |= PKT_TX_TCP_SEG;
 			m->tso_segsz = hdr->gso_size;
 			m->l4_len = (tcp_hdr->data_off & 0xf0) >> 2;
 			break;
 		case VIRTIO_NET_HDR_GSO_UDP:
+			if (l4_proto != IPPROTO_UDP ||
+					len <= sizeof(struct rte_udp_hdr))
+				break;
 			m->ol_flags |= PKT_TX_UDP_SEG;
 			m->tso_segsz = hdr->gso_size;
 			m->l4_len = sizeof(struct rte_udp_hdr);