[RFC,4/4] net/ether: use bitops to speedup comparison

Message ID 20190515221952.21959-5-stephen@networkplumber.org (mailing list archive)
State Superseded, archived
Headers
Series net/ether: improvements |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation success Compilation OK

Commit Message

Stephen Hemminger May 15, 2019, 10:19 p.m. UTC
  Using bit operations like or and xor is faster than a loop
on all architectures. Really just explicit unrolling.

Similar cast to uint16 unaligned is already done in
other functions here.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/librte_net/rte_ether.h | 17 +++++++----------
 1 file changed, 7 insertions(+), 10 deletions(-)
  

Comments

Mattias Rönnblom May 16, 2019, 9:03 a.m. UTC | #1
On 2019-05-16 00:19, Stephen Hemminger wrote:
> Using bit operations like or and xor is faster than a loop
> on all architectures. Really just explicit unrolling.
> 
> Similar cast to uint16 unaligned is already done in
> other functions here.
> 
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
>   lib/librte_net/rte_ether.h | 17 +++++++----------
>   1 file changed, 7 insertions(+), 10 deletions(-)
> 
> diff --git a/lib/librte_net/rte_ether.h b/lib/librte_net/rte_ether.h
> index b94e64b2195e..5d9242cda230 100644
> --- a/lib/librte_net/rte_ether.h
> +++ b/lib/librte_net/rte_ether.h
> @@ -78,11 +78,10 @@ struct ether_addr {
>   static inline int is_same_ether_addr(const struct ether_addr *ea1,
>   				     const struct ether_addr *ea2)
>   {
> -	int i;
> -	for (i = 0; i < ETHER_ADDR_LEN; i++)
> -		if (ea1->addr_bytes[i] != ea2->addr_bytes[i])
> -			return 0;
> -	return 1;
> +	const unaligned_uint16_t *w1 = (const uint16_t *)ea1;
> +	const unaligned_uint16_t *w2 = (const uint16_t *)ea2;
> +
> +	return ((w1[0] ^ w2[0]) | (w1[1] ^ w2[1]) | (w1[2] ^ w2[2])) == 0;
>   }
>   

If you want to shave off a couple of instructions, you can switch the 
three 16-bit loads to one 32-bit and one 16-bit load.

Something like:

	const uint8_t *ea1_b = (const uint8_t *)ea1;
	const uint8_t *ea2_b = (const uint8_t *)ea2;
	uint32_t ea1_h;
	uint32_t ea2_h;
	uint16_t ea1_l;
	uint16_t ea2_l;

	memcpy(&ea1_h, &ea1_b[0], sizeof(ea1_h));
	memcpy(&ea1_l, &ea1_b[sizeof(ea1_h)], sizeof(ea1_l));

	memcpy(&ea2_h, &ea2_b[0], sizeof(ea2_h));
	memcpy(&ea2_l, &ea2_b[sizeof(ea2_h)], sizeof(ea2_l));

	return ((ea1_l ^ ea2_l) | (ea1_h ^ ea2_h)) == 0;

Code is not as clean as your solution though.

>   /**
> @@ -97,11 +96,9 @@ static inline int is_same_ether_addr(const struct ether_addr *ea1,
>    */
>   static inline int is_zero_ether_addr(const struct ether_addr *ea)
>   {
> -	int i;
> -	for (i = 0; i < ETHER_ADDR_LEN; i++)
> -		if (ea->addr_bytes[i] != 0x00)
> -			return 0;
> -	return 1;
> +	const unaligned_uint16_t *w = (const uint16_t *)ea;
> +
> +	return (w[0] | w[1] | w[2]) == 0;
>   }
>   
>   /**
>
  
Stephen Hemminger May 16, 2019, 3:32 p.m. UTC | #2
On Thu, 16 May 2019 11:03:10 +0200
Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote:

> On 2019-05-16 00:19, Stephen Hemminger wrote:
> > Using bit operations like or and xor is faster than a loop
> > on all architectures. Really just explicit unrolling.
> > 
> > Similar cast to uint16 unaligned is already done in
> > other functions here.
> > 
> > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> > ---
> >   lib/librte_net/rte_ether.h | 17 +++++++----------
> >   1 file changed, 7 insertions(+), 10 deletions(-)
> > 
> > diff --git a/lib/librte_net/rte_ether.h b/lib/librte_net/rte_ether.h
> > index b94e64b2195e..5d9242cda230 100644
> > --- a/lib/librte_net/rte_ether.h
> > +++ b/lib/librte_net/rte_ether.h
> > @@ -78,11 +78,10 @@ struct ether_addr {
> >   static inline int is_same_ether_addr(const struct ether_addr *ea1,
> >   				     const struct ether_addr *ea2)
> >   {
> > -	int i;
> > -	for (i = 0; i < ETHER_ADDR_LEN; i++)
> > -		if (ea1->addr_bytes[i] != ea2->addr_bytes[i])
> > -			return 0;
> > -	return 1;
> > +	const unaligned_uint16_t *w1 = (const uint16_t *)ea1;
> > +	const unaligned_uint16_t *w2 = (const uint16_t *)ea2;
> > +
> > +	return ((w1[0] ^ w2[0]) | (w1[1] ^ w2[1]) | (w1[2] ^ w2[2])) == 0;
> >   }
> >     
> 
> If you want to shave off a couple of instructions, you can switch the 
> three 16-bit loads to one 32-bit and one 16-bit load.
> 
> Something like:
> 
> 	const uint8_t *ea1_b = (const uint8_t *)ea1;
> 	const uint8_t *ea2_b = (const uint8_t *)ea2;
> 	uint32_t ea1_h;
> 	uint32_t ea2_h;
> 	uint16_t ea1_l;
> 	uint16_t ea2_l;
> 
> 	memcpy(&ea1_h, &ea1_b[0], sizeof(ea1_h));
> 	memcpy(&ea1_l, &ea1_b[sizeof(ea1_h)], sizeof(ea1_l));
> 
> 	memcpy(&ea2_h, &ea2_b[0], sizeof(ea2_h));
> 	memcpy(&ea2_l, &ea2_b[sizeof(ea2_h)], sizeof(ea2_l));
> 
> 	return ((ea1_l ^ ea2_l) | (ea1_h ^ ea2_h)) == 0;
> 
> Code is not as clean as your solution though.
> 
> >   /**
> > @@ -97,11 +96,9 @@ static inline int is_same_ether_addr(const struct ether_addr *ea1,
> >    */
> >   static inline int is_zero_ether_addr(const struct ether_addr *ea)
> >   {
> > -	int i;
> > -	for (i = 0; i < ETHER_ADDR_LEN; i++)
> > -		if (ea->addr_bytes[i] != 0x00)
> > -			return 0;
> > -	return 1;
> > +	const unaligned_uint16_t *w = (const uint16_t *)ea;
> > +
> > +	return (w[0] | w[1] | w[2]) == 0;
> >   }
> >   
> >   /**
> >   

You can even do 64 bit load and then mask, which is what Linux kernel
does. But not sure it matters. The cost of the loop is what is expensive
  
Bruce Richardson May 16, 2019, 4:03 p.m. UTC | #3
On Wed, May 15, 2019 at 03:19:52PM -0700, Stephen Hemminger wrote:
> Using bit operations like or and xor is faster than a loop
> on all architectures. Really just explicit unrolling.
> 
> Similar cast to uint16 unaligned is already done in
> other functions here.
> 
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
>  lib/librte_net/rte_ether.h | 17 +++++++----------
>  1 file changed, 7 insertions(+), 10 deletions(-)
> 
Rather than casting to unaligned values, which gives compiler warnings in
some cases, I believe we should just mark the ethernet addresses as always
being 2-byte aligned and simplify things. [unless we have a good use case
where we won't have 2-byte alignment???].

See patch: http://patches.dpdk.org/patch/53482/

Regards,
/Bruce
  
Stephen Hemminger May 16, 2019, 4:06 p.m. UTC | #4
On Thu, 16 May 2019 17:03:37 +0100
Bruce Richardson <bruce.richardson@intel.com> wrote:

> On Wed, May 15, 2019 at 03:19:52PM -0700, Stephen Hemminger wrote:
> > Using bit operations like or and xor is faster than a loop
> > on all architectures. Really just explicit unrolling.
> > 
> > Similar cast to uint16 unaligned is already done in
> > other functions here.
> > 
> > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> > ---
> >  lib/librte_net/rte_ether.h | 17 +++++++----------
> >  1 file changed, 7 insertions(+), 10 deletions(-)
> >   
> Rather than casting to unaligned values, which gives compiler warnings in
> some cases, I believe we should just mark the ethernet addresses as always
> being 2-byte aligned and simplify things. [unless we have a good use case
> where we won't have 2-byte alignment???].
> 
> See patch: http://patches.dpdk.org/patch/53482/
> 
> Regards,
> /Bruce

I agree. Then you could also remove the unaligned_uint16_t that
already exists in rte_ether.h

Do you want me to put your patch in my series?
  
Bruce Richardson May 16, 2019, 4:07 p.m. UTC | #5
On Thu, May 16, 2019 at 09:06:52AM -0700, Stephen Hemminger wrote:
> On Thu, 16 May 2019 17:03:37 +0100
> Bruce Richardson <bruce.richardson@intel.com> wrote:
> 
> > On Wed, May 15, 2019 at 03:19:52PM -0700, Stephen Hemminger wrote:
> > > Using bit operations like or and xor is faster than a loop
> > > on all architectures. Really just explicit unrolling.
> > > 
> > > Similar cast to uint16 unaligned is already done in
> > > other functions here.
> > > 
> > > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> > > ---
> > >  lib/librte_net/rte_ether.h | 17 +++++++----------
> > >  1 file changed, 7 insertions(+), 10 deletions(-)
> > >   
> > Rather than casting to unaligned values, which gives compiler warnings in
> > some cases, I believe we should just mark the ethernet addresses as always
> > being 2-byte aligned and simplify things. [unless we have a good use case
> > where we won't have 2-byte alignment???].
> > 
> > See patch: http://patches.dpdk.org/patch/53482/
> > 
> > Regards,
> > /Bruce
> 
> I agree. Then you could also remove the unaligned_uint16_t that
> already exists in rte_ether.h
> 
> Do you want me to put your patch in my series?

Sure, feel free.

Thanks,
/Bruce
  
Bruce Richardson May 16, 2019, 4:36 p.m. UTC | #6
On Thu, May 16, 2019 at 05:07:45PM +0100, Bruce Richardson wrote:
> On Thu, May 16, 2019 at 09:06:52AM -0700, Stephen Hemminger wrote:
> > On Thu, 16 May 2019 17:03:37 +0100
> > Bruce Richardson <bruce.richardson@intel.com> wrote:
> > 
> > > On Wed, May 15, 2019 at 03:19:52PM -0700, Stephen Hemminger wrote:
> > > > Using bit operations like or and xor is faster than a loop
> > > > on all architectures. Really just explicit unrolling.
> > > > 
> > > > Similar cast to uint16 unaligned is already done in
> > > > other functions here.
> > > > 
> > > > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> > > > ---
> > > >  lib/librte_net/rte_ether.h | 17 +++++++----------
> > > >  1 file changed, 7 insertions(+), 10 deletions(-)
> > > >   
> > > Rather than casting to unaligned values, which gives compiler warnings in
> > > some cases, I believe we should just mark the ethernet addresses as always
> > > being 2-byte aligned and simplify things. [unless we have a good use case
> > > where we won't have 2-byte alignment???].
> > > 
> > > See patch: http://patches.dpdk.org/patch/53482/
> > > 
> > > Regards,
> > > /Bruce
> > 
> > I agree. Then you could also remove the unaligned_uint16_t that
> > already exists in rte_ether.h
> > 
> > Do you want me to put your patch in my series?
> 
> Sure, feel free.
> 

I've just found another problem in this area. Compiling l2fwd with gcc 9, I
get:

main.c:164:54: warning: taking address of packed member of ‘struct ether_hdr’ may result in an unaligned pointer value [-Waddress-of-packed-member]
  164 |  ether_addr_copy(&l2fwd_ports_eth_addr[dest_portid], &eth->s_addr);
      | 

Looking at some of the structures, it appears that not all the packet
attributes may be necessary. For example, while AFAIK there are not
absolute guarantees about structure padding, for all compilers I remember
using the following changes are safe:

diff --git a/lib/librte_net/rte_ether.h b/lib/librte_net/rte_ether.h
index 8090b7c01..7d9f34791 100644
--- a/lib/librte_net/rte_ether.h
+++ b/lib/librte_net/rte_ether.h
@@ -57,7 +57,7 @@ extern "C" {
 struct ether_addr {
        /** Addr bytes in tx order */
        uint8_t addr_bytes[ETHER_ADDR_LEN] __rte_aligned(2);
-} __attribute__((__packed__));
+};

 #define ETHER_LOCAL_ADMIN_ADDR 0x02 /**< Locally assigned Eth. address. */
 #define ETHER_GROUP_ADDR       0x01 /**< Multicast or broadcast Eth. address. */
@@ -273,7 +273,7 @@ struct ether_hdr {
        struct ether_addr d_addr; /**< Destination address. */
        struct ether_addr s_addr; /**< Source address. */
        uint16_t ether_type;      /**< Frame type. */
-} __attribute__((__packed__));
+};

 /**
  * Ethernet VLAN Header.
@@ -283,7 +283,7 @@ struct ether_hdr {
 struct vlan_hdr {
        uint16_t vlan_tci; /**< Priority (3) + CFI (1) + Identifier Code (12) */
        uint16_t eth_proto;/**< Ethernet type of encapsulated frame. */
-} __attribute__((__packed__));
+};

 /**
  * VXLAN protocol header.

I think we therefore should consider removing those packed attributes to
avoid application warnings.

/Bruce
  
Stephen Hemminger May 16, 2019, 5:04 p.m. UTC | #7
On Thu, 16 May 2019 17:36:43 +0100
Bruce Richardson <bruce.richardson@intel.com> wrote:

> On Thu, May 16, 2019 at 05:07:45PM +0100, Bruce Richardson wrote:
> > On Thu, May 16, 2019 at 09:06:52AM -0700, Stephen Hemminger wrote:  
> > > On Thu, 16 May 2019 17:03:37 +0100
> > > Bruce Richardson <bruce.richardson@intel.com> wrote:
> > >   
> > > > On Wed, May 15, 2019 at 03:19:52PM -0700, Stephen Hemminger wrote:  
> > > > > Using bit operations like or and xor is faster than a loop
> > > > > on all architectures. Really just explicit unrolling.
> > > > > 
> > > > > Similar cast to uint16 unaligned is already done in
> > > > > other functions here.
> > > > > 
> > > > > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> > > > > ---
> > > > >  lib/librte_net/rte_ether.h | 17 +++++++----------
> > > > >  1 file changed, 7 insertions(+), 10 deletions(-)
> > > > >     
> > > > Rather than casting to unaligned values, which gives compiler warnings in
> > > > some cases, I believe we should just mark the ethernet addresses as always
> > > > being 2-byte aligned and simplify things. [unless we have a good use case
> > > > where we won't have 2-byte alignment???].
> > > > 
> > > > See patch: http://patches.dpdk.org/patch/53482/
> > > > 
> > > > Regards,
> > > > /Bruce  
> > > 
> > > I agree. Then you could also remove the unaligned_uint16_t that
> > > already exists in rte_ether.h
> > > 
> > > Do you want me to put your patch in my series?  
> > 
> > Sure, feel free.
> >   
> 
> I've just found another problem in this area. Compiling l2fwd with gcc 9, I
> get:
> 
> main.c:164:54: warning: taking address of packed member of ‘struct ether_hdr’ may result in an unaligned pointer value [-Waddress-of-packed-member]
>   164 |  ether_addr_copy(&l2fwd_ports_eth_addr[dest_portid], &eth->s_addr);
>       | 
> 
> Looking at some of the structures, it appears that not all the packet
> attributes may be necessary. For example, while AFAIK there are not
> absolute guarantees about structure padding, for all compilers I remember
> using the following changes are safe:
> 
> diff --git a/lib/librte_net/rte_ether.h b/lib/librte_net/rte_ether.h
> index 8090b7c01..7d9f34791 100644
> --- a/lib/librte_net/rte_ether.h
> +++ b/lib/librte_net/rte_ether.h
> @@ -57,7 +57,7 @@ extern "C" {
>  struct ether_addr {
>         /** Addr bytes in tx order */
>         uint8_t addr_bytes[ETHER_ADDR_LEN] __rte_aligned(2);
> -} __attribute__((__packed__));
> +};
> 
>  #define ETHER_LOCAL_ADMIN_ADDR 0x02 /**< Locally assigned Eth. address. */
>  #define ETHER_GROUP_ADDR       0x01 /**< Multicast or broadcast Eth. address. */
> @@ -273,7 +273,7 @@ struct ether_hdr {
>         struct ether_addr d_addr; /**< Destination address. */
>         struct ether_addr s_addr; /**< Source address. */
>         uint16_t ether_type;      /**< Frame type. */
> -} __attribute__((__packed__));
> +};
> 
>  /**
>   * Ethernet VLAN Header.
> @@ -283,7 +283,7 @@ struct ether_hdr {
>  struct vlan_hdr {
>         uint16_t vlan_tci; /**< Priority (3) + CFI (1) + Identifier Code (12) */
>         uint16_t eth_proto;/**< Ethernet type of encapsulated frame. */
> -} __attribute__((__packed__));
> +};
> 
>  /**
>   * VXLAN protocol header.
> 
> I think we therefore should consider removing those packed attributes to
> avoid application warnings.
> 
> /Bruce

Agree if structure is naturally packed, adding packed attribute is a mistake.
  
Bruce Richardson May 16, 2019, 8:37 p.m. UTC | #8
On Thu, May 16, 2019 at 10:04:54AM -0700, Stephen Hemminger wrote:
> On Thu, 16 May 2019 17:36:43 +0100
> Bruce Richardson <bruce.richardson@intel.com> wrote:
> 
> > On Thu, May 16, 2019 at 05:07:45PM +0100, Bruce Richardson wrote:
> > > On Thu, May 16, 2019 at 09:06:52AM -0700, Stephen Hemminger wrote:  
> > > > On Thu, 16 May 2019 17:03:37 +0100
> > > > Bruce Richardson <bruce.richardson@intel.com> wrote:
> > > >   
> > > > > On Wed, May 15, 2019 at 03:19:52PM -0700, Stephen Hemminger wrote:  
> > > > > > Using bit operations like or and xor is faster than a loop
> > > > > > on all architectures. Really just explicit unrolling.
> > > > > > 
> > > > > > Similar cast to uint16 unaligned is already done in
> > > > > > other functions here.
> > > > > > 
> > > > > > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> > > > > > ---
> > > > > >  lib/librte_net/rte_ether.h | 17 +++++++----------
> > > > > >  1 file changed, 7 insertions(+), 10 deletions(-)
> > > > > >     
> > > > > Rather than casting to unaligned values, which gives compiler warnings in
> > > > > some cases, I believe we should just mark the ethernet addresses as always
> > > > > being 2-byte aligned and simplify things. [unless we have a good use case
> > > > > where we won't have 2-byte alignment???].
> > > > > 
> > > > > See patch: http://patches.dpdk.org/patch/53482/
> > > > > 
> > > > > Regards,
> > > > > /Bruce  
> > > > 
> > > > I agree. Then you could also remove the unaligned_uint16_t that
> > > > already exists in rte_ether.h
> > > > 
> > > > Do you want me to put your patch in my series?  
> > > 
> > > Sure, feel free.
> > >   
> > 
> > I've just found another problem in this area. Compiling l2fwd with gcc 9, I
> > get:
> > 
> > main.c:164:54: warning: taking address of packed member of ‘struct ether_hdr’ may result in an unaligned pointer value [-Waddress-of-packed-member]
> >   164 |  ether_addr_copy(&l2fwd_ports_eth_addr[dest_portid], &eth->s_addr);
> >       | 
> > 
> > Looking at some of the structures, it appears that not all the packet
> > attributes may be necessary. For example, while AFAIK there are not
> > absolute guarantees about structure padding, for all compilers I remember
> > using the following changes are safe:
> > 
> > diff --git a/lib/librte_net/rte_ether.h b/lib/librte_net/rte_ether.h
> > index 8090b7c01..7d9f34791 100644
> > --- a/lib/librte_net/rte_ether.h
> > +++ b/lib/librte_net/rte_ether.h
> > @@ -57,7 +57,7 @@ extern "C" {
> >  struct ether_addr {
> >         /** Addr bytes in tx order */
> >         uint8_t addr_bytes[ETHER_ADDR_LEN] __rte_aligned(2);
> > -} __attribute__((__packed__));
> > +};
> > 
> >  #define ETHER_LOCAL_ADMIN_ADDR 0x02 /**< Locally assigned Eth. address. */
> >  #define ETHER_GROUP_ADDR       0x01 /**< Multicast or broadcast Eth. address. */
> > @@ -273,7 +273,7 @@ struct ether_hdr {
> >         struct ether_addr d_addr; /**< Destination address. */
> >         struct ether_addr s_addr; /**< Source address. */
> >         uint16_t ether_type;      /**< Frame type. */
> > -} __attribute__((__packed__));
> > +};
> > 
> >  /**
> >   * Ethernet VLAN Header.
> > @@ -283,7 +283,7 @@ struct ether_hdr {
> >  struct vlan_hdr {
> >         uint16_t vlan_tci; /**< Priority (3) + CFI (1) + Identifier Code (12) */
> >         uint16_t eth_proto;/**< Ethernet type of encapsulated frame. */
> > -} __attribute__((__packed__));
> > +};
> > 
> >  /**
> >   * VXLAN protocol header.
> > 
> > I think we therefore should consider removing those packed attributes to
> > avoid application warnings.
> > 
> > /Bruce
> 
> Agree if structure is naturally packed, adding packed attribute is a mistake.

Do you want to also include such a change in your patchset, or will I do up
a patch for it - perhaps a two-patch set with this and my previous alignment
patch?
  
Bruce Richardson May 16, 2019, 8:41 p.m. UTC | #9
On Thu, May 16, 2019 at 09:37:42PM +0100, Bruce Richardson wrote:
> On Thu, May 16, 2019 at 10:04:54AM -0700, Stephen Hemminger wrote:
> > Agree if structure is naturally packed, adding packed attribute is a mistake.
> 
> Do you want to also include such a change in your patchset, or will I do up
> a patch for it - perhaps a two-patch set with this and my previous alignment
> patch?

Never mind, see patchset on list now...
  

Patch

diff --git a/lib/librte_net/rte_ether.h b/lib/librte_net/rte_ether.h
index b94e64b2195e..5d9242cda230 100644
--- a/lib/librte_net/rte_ether.h
+++ b/lib/librte_net/rte_ether.h
@@ -78,11 +78,10 @@  struct ether_addr {
 static inline int is_same_ether_addr(const struct ether_addr *ea1,
 				     const struct ether_addr *ea2)
 {
-	int i;
-	for (i = 0; i < ETHER_ADDR_LEN; i++)
-		if (ea1->addr_bytes[i] != ea2->addr_bytes[i])
-			return 0;
-	return 1;
+	const unaligned_uint16_t *w1 = (const uint16_t *)ea1;
+	const unaligned_uint16_t *w2 = (const uint16_t *)ea2;
+
+	return ((w1[0] ^ w2[0]) | (w1[1] ^ w2[1]) | (w1[2] ^ w2[2])) == 0;
 }
 
 /**
@@ -97,11 +96,9 @@  static inline int is_same_ether_addr(const struct ether_addr *ea1,
  */
 static inline int is_zero_ether_addr(const struct ether_addr *ea)
 {
-	int i;
-	for (i = 0; i < ETHER_ADDR_LEN; i++)
-		if (ea->addr_bytes[i] != 0x00)
-			return 0;
-	return 1;
+	const unaligned_uint16_t *w = (const uint16_t *)ea;
+
+	return (w[0] | w[1] | w[2]) == 0;
 }
 
 /**