net: do not insert VLAN tag to shared mbufs

Message ID: 20190416155126.26438-1-ferruh.yigit@intel.com (mailing list archive)
State: Accepted, archived
Delegated to: Thomas Monjalon
Series: net: do not insert VLAN tag to shared mbufs

Checks

Context                          Check    Description
ci/checkpatch                    success  coding style OK
ci/Intel-compilation             success  Compilation OK
ci/intel-Performance-Testing     success  Performance Testing PASS
ci/mellanox-Performance-Testing  success  Performance Testing PASS

Commit Message

Ferruh Yigit April 16, 2019, 3:51 p.m. UTC
The vlan_insert() is buggy when it tries to handle shared mbufs.
Instead, don't support inserting a VLAN tag into shared mbufs and return
an error for that case.

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
---
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Chas Williams <chas3@att.com>

This is an alternative approach to the RFC that fixes vlan_insert():
https://patches.dpdk.org/patch/51870/

vlan_insert() is mostly used by drivers to insert a VLAN tag into packet
data in the Tx path. Drivers creating new copies of mbufs in the Tx path
may cause unexpected behavior, such as mbufs that are never freed or are
freed twice.
---
 lib/librte_net/rte_ether.h | 11 ++---------
 1 file changed, 2 insertions(+), 9 deletions(-)
  

Comments

Bruce Richardson April 16, 2019, 4:28 p.m. UTC | #1
On Tue, Apr 16, 2019 at 04:51:26PM +0100, Ferruh Yigit wrote:
> The vlan_insert() is buggy when it tires to handle the shared mbufs,
> instead don't support inserting VLAN tag into shared mbufs and return
> an error for that case.
> 
> Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
> ---
> Cc: Stephen Hemminger <stephen@networkplumber.org>
> Cc: Chas Williams <chas3@att.com>
> 
> This is another approach to RFC to fix the vlan_insert:
> https://patches.dpdk.org/patch/51870/
> 
> vlan_insert() mostly used by drivers to insert VLAN tag into packet
> data in Tx path, drivers creating new copies of mbufs in Tx path may
> result unexpected behavior, like not freed or double freed mbufs.
> ---
>  lib/librte_net/rte_ether.h | 11 ++---------
>  1 file changed, 2 insertions(+), 9 deletions(-)
>
So what is the API to be used if one does want to insert a vlan tag into a
shared mbuf?

Also, why is it such a problem to create new copies of data inside the
driver if that is necessary? You create a copy and use that, freeing the
original (i.e. in all likelihood decrementing the ref-count since you no
longer use it). You already have the pointer to the mbuf pool from the
original buffer so you can get a copy from the same place. I'm curious to
know why it would be impossible to do a functionally correct
implementation?

/Bruce
  
Chas Williams April 16, 2019, 6:22 p.m. UTC | #2
On 4/16/19 11:51 AM, Ferruh Yigit wrote:
> The vlan_insert() is buggy when it tires to handle the shared mbufs,

s/tires/tries/

> instead don't support inserting VLAN tag into shared mbufs and return
> an error for that case.
> 
> Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
> ---
> Cc: Stephen Hemminger <stephen@networkplumber.org>
> Cc: Chas Williams <chas3@att.com>
> 
> This is another approach to RFC to fix the vlan_insert:
> https://patches.dpdk.org/patch/51870/
> 
> vlan_insert() mostly used by drivers to insert VLAN tag into packet
> data in Tx path, drivers creating new copies of mbufs in Tx path may
> result unexpected behavior, like not freed or double freed mbufs.
> ---
>   lib/librte_net/rte_ether.h | 11 ++---------
>   1 file changed, 2 insertions(+), 9 deletions(-)
> 
> diff --git a/lib/librte_net/rte_ether.h b/lib/librte_net/rte_ether.h
> index 3a87ff184..a1df911b6 100644
> --- a/lib/librte_net/rte_ether.h
> +++ b/lib/librte_net/rte_ether.h
> @@ -388,15 +388,8 @@ static inline int rte_vlan_insert(struct rte_mbuf **m)
>   	struct vlan_hdr *vh;
>   
>   	/* Can't insert header if mbuf is shared */
> -	if (rte_mbuf_refcnt_read(*m) > 1) {
> -		struct rte_mbuf *copy;
> -
> -		copy = rte_pktmbuf_clone(*m, (*m)->pool);
> -		if (unlikely(copy == NULL))
> -			return -ENOMEM;
> -		rte_pktmbuf_free(*m);
> -		*m = copy;
> -	}
> +	if (!RTE_MBUF_DIRECT(*m) || rte_mbuf_refcnt_read(*m) > 1)
> +		return -EINVAL;
>   
>   	oh = rte_pktmbuf_mtod(*m, struct ether_hdr *);
>   	nh = (struct ether_hdr *)
>
  
Chas Williams April 16, 2019, 6:32 p.m. UTC | #3
On 4/16/19 12:28 PM, Bruce Richardson wrote:
> On Tue, Apr 16, 2019 at 04:51:26PM +0100, Ferruh Yigit wrote:
>> The vlan_insert() is buggy when it tires to handle the shared mbufs,
>> instead don't support inserting VLAN tag into shared mbufs and return
>> an error for that case.
>>
>> Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
>> ---
>> Cc: Stephen Hemminger <stephen@networkplumber.org>
>> Cc: Chas Williams <chas3@att.com>
>>
>> This is another approach to RFC to fix the vlan_insert:
>> https://patches.dpdk.org/patch/51870/
>>
>> vlan_insert() mostly used by drivers to insert VLAN tag into packet
>> data in Tx path, drivers creating new copies of mbufs in Tx path may
>> result unexpected behavior, like not freed or double freed mbufs.
>> ---
>>   lib/librte_net/rte_ether.h | 11 ++---------
>>   1 file changed, 2 insertions(+), 9 deletions(-)
>>
> So what is the API to be used if one does want to insert a vlan tag into a
> shared mbuf?

It's unlikely you would ever want to do that. Would one thread perform
some operation on the mbuf while the other threads expect it to have
happened? It seems counter to the way that packets might flow through an
application. Typically, you would insert the VLAN tag and then share
the mbuf. Modifying a shared mbuf should make you ask: what are the
other copies expecting?

> Also, why is it such a problem to create new copies of data inside the
> driver if that is necessary? You create a copy and use that, freeing the
> original (i.e. in all likelyhood decrememting the ref-count since you no
> longer use it). You already have the pointer to the mbuf pool from the
> original buffer so you can get a copy from the same place. I'm curious to
> know why it would be impossible to do a functionally correct
> implementation?

It is not an issue to do this correctly. Hemminger did submit a patch
that appeared to do this correctly (I haven't tested it). As mentioned
earlier, the tricky part is returning the buffer to the application. If
you create a copy and transmit fails, you need to free that copy or
return it to the application for it to free. If you free the original
buffer when making the copy, you certainly can't return it to the
application for it to be freed a second time.
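
The ownership problem described above can be sketched with a toy
reference-counted buffer. This is only an illustration under stated
assumptions: struct toy_buf, toy_get(), toy_put() and
toy_copy_on_insert() are hypothetical stand-ins, not DPDK APIs, and the
"copy" here is a fresh toy buffer rather than a real mbuf clone. The
sketch shows one way a failed transmit can go wrong once the insert
helper has swapped the driver's pointer.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_POOL_SIZE 4

/* Toy stand-in for an mbuf: a reference count and an "allocated" flag. */
struct toy_buf {
	int refcnt;
	int in_use;
};

static struct toy_buf toy_pool[TOY_POOL_SIZE];

/* Take an unused buffer from the static pool, or NULL if none is left. */
static struct toy_buf *toy_get(void)
{
	for (size_t i = 0; i < TOY_POOL_SIZE; i++) {
		if (!toy_pool[i].in_use) {
			toy_pool[i].in_use = 1;
			toy_pool[i].refcnt = 1;
			return &toy_pool[i];
		}
	}
	return NULL;
}

/* Drop one reference; the buffer returns to the pool at zero. */
static void toy_put(struct toy_buf *b)
{
	assert(b != NULL && b->refcnt > 0);
	if (--b->refcnt == 0)
		b->in_use = 0;
}

/* Mimics the removed branch: make a copy, drop one reference, swap. */
static int toy_copy_on_insert(struct toy_buf **b)
{
	struct toy_buf *copy = toy_get();

	if (copy == NULL)
		return -1;
	toy_put(*b);
	*b = copy;
	return 0;
}

/*
 * Returns 1 if, after a failed transmit, the original buffer was fully
 * released although another holder never dropped its reference, and
 * the copy was never released by anyone.
 */
static int toy_failed_tx_goes_wrong(void)
{
	struct toy_buf *app_view = toy_get();
	struct toy_buf *drv_view = app_view;

	if (app_view == NULL)
		return 0;
	app_view->refcnt = 2; /* shared: the application plus one other holder */

	if (toy_copy_on_insert(&drv_view) != 0)
		return 0;

	/* Transmit fails here; the application frees the pointer it still has. */
	toy_put(app_view);

	return app_view->in_use == 0 && drv_view->in_use == 1;
}
```

In this toy model the application's pointer and the driver's pointer
diverge after the swap, so the original loses two references while the
copy keeps its single reference forever.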

> /Bruce
>
  
Bruce Richardson April 17, 2019, 8:12 a.m. UTC | #4
On Tue, Apr 16, 2019 at 02:32:18PM -0400, Chas Williams wrote:
> 
> 
> On 4/16/19 12:28 PM, Bruce Richardson wrote:
> > On Tue, Apr 16, 2019 at 04:51:26PM +0100, Ferruh Yigit wrote:
> > > The vlan_insert() is buggy when it tires to handle the shared mbufs,
> > > instead don't support inserting VLAN tag into shared mbufs and return
> > > an error for that case.
> > > 
> > > Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
> > > ---
> > > Cc: Stephen Hemminger <stephen@networkplumber.org>
> > > Cc: Chas Williams <chas3@att.com>
> > > 
> > > This is another approach to RFC to fix the vlan_insert:
> > > https://patches.dpdk.org/patch/51870/
> > > 
> > > vlan_insert() mostly used by drivers to insert VLAN tag into packet
> > > data in Tx path, drivers creating new copies of mbufs in Tx path may
> > > result unexpected behavior, like not freed or double freed mbufs.
> > > ---
> > >   lib/librte_net/rte_ether.h | 11 ++---------
> > >   1 file changed, 2 insertions(+), 9 deletions(-)
> > > 
> > So what is the API to be used if one does want to insert a vlan tag into a
> > shared mbuf?
> 
> It's unlikely you would ever want to do that.  Have one thread perform
> some operation on the mbuf and other threads would expect this to have
> happened? It seems counter to the way that packets might flow through an
> application. Typically, you would insert the vlan and then share
> the mbuf. Modifying a shared mbuf should make you ask, what are the
> other copies expecting?
> 
The thing is that the reference count only indicates the number of pointers
to a buffer; it doesn't identify what parts are in use. So in the
fragmentation case, there may only be one mbuf actually referencing the
header part of the packet, with all other references to the memory being to
other parts further in. However, point taken about how the app pipeline layout
would probably make this issue unlikely.

> > Also, why is it such a problem to create new copies of data inside the
> > driver if that is necessary? You create a copy and use that, freeing the
> > original (i.e. in all likelyhood decrememting the ref-count since you no
> > longer use it). You already have the pointer to the mbuf pool from the
> > original buffer so you can get a copy from the same place. I'm curious to
> > know why it would be impossible to do a functionally correct
> > implementation?
> 
> It is not an issue to do this correctly. Hemminger did submit a patch
> that appeared to do this correctly (I haven't tested it). As mentioned
> earlier the tricky part is returning the buffer to the application. If
> you create a copy and transmit fails, you need to free that buffer or
> return it to the application for it to free. If you free the buffer when
> making a buffer, you certainly can't return it to the application for
> it to be freed a second time.
> 
Right. For transmit though, in most cases the only reason for failure is
lack of space in a transmit ring, so most NIC drivers can be sure of
success before cloning.

Overall, it seems the consensus is that for real-world cases it's better to
have this patch than not, so I'm ok for it to go into DPDK.

/Bruce
  
Olivier Matz May 13, 2019, 12:43 p.m. UTC | #5
On Wed, Apr 17, 2019 at 09:12:55AM +0100, Bruce Richardson wrote:
> On Tue, Apr 16, 2019 at 02:32:18PM -0400, Chas Williams wrote:
> > 
> > 
> > On 4/16/19 12:28 PM, Bruce Richardson wrote:
> > > On Tue, Apr 16, 2019 at 04:51:26PM +0100, Ferruh Yigit wrote:
> > > > The vlan_insert() is buggy when it tires to handle the shared mbufs,
> > > > instead don't support inserting VLAN tag into shared mbufs and return
> > > > an error for that case.
> > > > 
> > > > Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
> > > > ---
> > > > Cc: Stephen Hemminger <stephen@networkplumber.org>
> > > > Cc: Chas Williams <chas3@att.com>
> > > > 
> > > > This is another approach to RFC to fix the vlan_insert:
> > > > https://patches.dpdk.org/patch/51870/
> > > > 
> > > > vlan_insert() mostly used by drivers to insert VLAN tag into packet
> > > > data in Tx path, drivers creating new copies of mbufs in Tx path may
> > > > result unexpected behavior, like not freed or double freed mbufs.
> > > > ---
> > > >   lib/librte_net/rte_ether.h | 11 ++---------
> > > >   1 file changed, 2 insertions(+), 9 deletions(-)
> > > > 
> > > So what is the API to be used if one does want to insert a vlan tag into a
> > > shared mbuf?
> > 
> > It's unlikely you would ever want to do that.  Have one thread perform
> > some operation on the mbuf and other threads would expect this to have
> > happened? It seems counter to the way that packets might flow through an
> > application. Typically, you would insert the vlan and then share
> > the mbuf. Modifying a shared mbuf should make you ask, what are the
> > other copies expecting?
> > 
> The thing is that the reference count only indicates the number of pointers
> to a buffer, it doesn't identify what parts are in use. So in the
> fragmentation case, there may only be one mbuf actually referencing the
> header part of the packet, with all other references to the memory being to
> other parts further in. However, point taken about how the app pipeline layout
> would probably make this issue unlikely.

Yes, the difficulty here is that the condition
(!RTE_MBUF_DIRECT(*m) || rte_mbuf_refcnt_read(*m) > 1)
is not an exact equivalent of "the mbuf is writable".

Of course, if the mbuf is direct and refcnt is 1, the mbuf is writable.
But we can imagine other cases where the mbuf is writable. For instance, a
PMD that receives several packets in one big mbuf (with an appropriate
headroom for each), then creates one indirect mbuf for each packet.

We are probably missing an API to express that the mbuf is writable.

> > > Also, why is it such a problem to create new copies of data inside the
> > > driver if that is necessary? You create a copy and use that, freeing the
> > > original (i.e. in all likelyhood decrememting the ref-count since you no
> > > longer use it). You already have the pointer to the mbuf pool from the
> > > original buffer so you can get a copy from the same place. I'm curious to
> > > know why it would be impossible to do a functionally correct
> > > implementation?
> > 
> > It is not an issue to do this correctly. Hemminger did submit a patch
> > that appeared to do this correctly (I haven't tested it). As mentioned
> > earlier the tricky part is returning the buffer to the application. If
> > you create a copy and transmit fails, you need to free that buffer or
> > return it to the application for it to free. If you free the buffer when
> > making a buffer, you certainly can't return it to the application for
> > it to be freed a second time.
> > 
> Right. For transmit though, in most cases the only reason for failure is
> lack of space in a transmit ring, so most NIC drivers can be sure of
> success before cloning.
> 
> Overall, it seems the consensus is that for real-world cases it's better to
> have this patch than not, so I'm ok for it to go into DPDK.

Agree.

Acked-by: Olivier Matz <olivier.matz@6wind.com>
  
Thomas Monjalon July 4, 2019, 2:01 p.m. UTC | #6
13/05/2019 14:43, Olivier Matz:
> On Wed, Apr 17, 2019 at 09:12:55AM +0100, Bruce Richardson wrote:
> > On Tue, Apr 16, 2019 at 02:32:18PM -0400, Chas Williams wrote:
> > > 
> > > 
> > > On 4/16/19 12:28 PM, Bruce Richardson wrote:
> > > > On Tue, Apr 16, 2019 at 04:51:26PM +0100, Ferruh Yigit wrote:
> > > > > The vlan_insert() is buggy when it tires to handle the shared mbufs,
> > > > > instead don't support inserting VLAN tag into shared mbufs and return
> > > > > an error for that case.
> > > > > 
> > > > > Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
> > > > > ---
> > > > > Cc: Stephen Hemminger <stephen@networkplumber.org>
> > > > > Cc: Chas Williams <chas3@att.com>
> > > > > 
> > > > > This is another approach to RFC to fix the vlan_insert:
> > > > > https://patches.dpdk.org/patch/51870/
> > > > > 
> > > > > vlan_insert() mostly used by drivers to insert VLAN tag into packet
> > > > > data in Tx path, drivers creating new copies of mbufs in Tx path may
> > > > > result unexpected behavior, like not freed or double freed mbufs.
> > > > > ---
> > > > >   lib/librte_net/rte_ether.h | 11 ++---------
> > > > >   1 file changed, 2 insertions(+), 9 deletions(-)
> > > > > 
> > > > So what is the API to be used if one does want to insert a vlan tag into a
> > > > shared mbuf?
> > > 
> > > It's unlikely you would ever want to do that.  Have one thread perform
> > > some operation on the mbuf and other threads would expect this to have
> > > happened? It seems counter to the way that packets might flow through an
> > > application. Typically, you would insert the vlan and then share
> > > the mbuf. Modifying a shared mbuf should make you ask, what are the
> > > other copies expecting?
> > > 
> > The thing is that the reference count only indicates the number of pointers
> > to a buffer, it doesn't identify what parts are in use. So in the
> > fragmentation case, there may only be one mbuf actually referencing the
> > header part of the packet, with all other references to the memory being to
> > other parts further in. However, point taken about how the app pipeline layout
> > would probably make this issue unlikely.
> 
> Yes, the difficulty here is that the condition
> (!RTE_MBUF_DIRECT(*m) || rte_mbuf_refcnt_read(*m) > 1)
> is not an exact equivalent of "the mbuf is writable".
> 
> Of course, it the mbuf is direct and refcnt is 1, the mbuf is writable.
> But we can imagine other cases where mbuf is writable. For instance, a
> PMD that receives several packets in one big mbuf (with an appropriate
> headroom for each), then create one indirect mbuf for each packet.
> 
> We probably miss an API to express that the mbuf is writable.
> 
> > > > Also, why is it such a problem to create new copies of data inside the
> > > > driver if that is necessary? You create a copy and use that, freeing the
> > > > original (i.e. in all likelyhood decrememting the ref-count since you no
> > > > longer use it). You already have the pointer to the mbuf pool from the
> > > > original buffer so you can get a copy from the same place. I'm curious to
> > > > know why it would be impossible to do a functionally correct
> > > > implementation?
> > > 
> > > It is not an issue to do this correctly. Hemminger did submit a patch
> > > that appeared to do this correctly (I haven't tested it). As mentioned
> > > earlier the tricky part is returning the buffer to the application. If
> > > you create a copy and transmit fails, you need to free that buffer or
> > > return it to the application for it to free. If you free the buffer when
> > > making a buffer, you certainly can't return it to the application for
> > > it to be freed a second time.
> > > 
> > Right. For transmit though, in most cases the only reason for failure is
> > lack of space in a transmit ring, so most NIC drivers can be sure of
> > success before cloning.
> > 
> > Overall, it seems the consensus is that for real-world cases it's better to
> > have this patch than not, so I'm ok for it to go into DPDK.
> 
> Agree.
> 
> Acked-by: Olivier Matz <olivier.matz@6wind.com>

Applied, sorry this patch was forgotten.
  

Patch

diff --git a/lib/librte_net/rte_ether.h b/lib/librte_net/rte_ether.h
index 3a87ff184..a1df911b6 100644
--- a/lib/librte_net/rte_ether.h
+++ b/lib/librte_net/rte_ether.h
@@ -388,15 +388,8 @@  static inline int rte_vlan_insert(struct rte_mbuf **m)
 	struct vlan_hdr *vh;
 
 	/* Can't insert header if mbuf is shared */
-	if (rte_mbuf_refcnt_read(*m) > 1) {
-		struct rte_mbuf *copy;
-
-		copy = rte_pktmbuf_clone(*m, (*m)->pool);
-		if (unlikely(copy == NULL))
-			return -ENOMEM;
-		rte_pktmbuf_free(*m);
-		*m = copy;
-	}
+	if (!RTE_MBUF_DIRECT(*m) || rte_mbuf_refcnt_read(*m) > 1)
+		return -EINVAL;
 
 	oh = rte_pktmbuf_mtod(*m, struct ether_hdr *);
 	nh = (struct ether_hdr *)
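
For context on why a shared buffer matters here, the following is a
rough sketch of an in-place VLAN tag insertion on a plain byte array.
It is not the code from rte_ether.h: toy_vlan_insert() and the TOY_*
constants are assumptions for illustration, and the caller is assumed
to provide 4 writable bytes of headroom in front of the frame. Because
the bytes are rewritten in place, any other holder of the same data
would see the modified header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_ETHER_ADDRS_LEN 12 /* destination MAC + source MAC */
#define TOY_VLAN_TAG_LEN 4     /* TPID (2 bytes) + TCI (2 bytes) */
#define TOY_TPID_8021Q 0x8100  /* 802.1Q tag protocol identifier */

/*
 * frame points at the first byte of the Ethernet header. The caller
 * guarantees TOY_VLAN_TAG_LEN writable bytes directly before it.
 * Returns the new start of the frame.
 */
static uint8_t *toy_vlan_insert(uint8_t *frame, uint16_t tci)
{
	uint8_t *new_start;

	assert(frame != NULL);
	new_start = frame - TOY_VLAN_TAG_LEN;

	/* Slide both MAC addresses into the headroom; the regions overlap. */
	memmove(new_start, frame, TOY_ETHER_ADDRS_LEN);

	/* Write the tag in network byte order right after the addresses. */
	new_start[12] = (uint8_t)(TOY_TPID_8021Q >> 8);
	new_start[13] = (uint8_t)(TOY_TPID_8021Q & 0xff);
	new_start[14] = (uint8_t)(tci >> 8);
	new_start[15] = (uint8_t)(tci & 0xff);

	/* The original EtherType is untouched and now follows the tag. */
	return new_start;
}
```

The original EtherType stays where it was, so it ends up immediately
after the 4-byte tag without being copied.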