diff mbox series

[1/1] net: fix aliasing issue in checksum computation

Message ID	20210918114930.245387-2-mail@gms.tf (mailing list archive)
State	Superseded, archived
Delegated to:	Ferruh Yigit
Headers	sender: georg) by goedel.lru.li (Postfix) with ESMTPSA id 62C4C3EAA03; Sat, 18 Sep 2021 11:49:48 +0000 (UTC) From: Georg Sauthoff <mail@gms.tf> To: Olivier Matz <olivier.matz@6wind.com>, Thomas Monjalon <thomas@monjalon.net>, David Marchand <david.marchand@redhat.com> Cc: dev@dpdk.org, Georg Sauthoff <mail@gms.tf> Date: Sat, 18 Sep 2021 13:49:30 +0200 Message-Id: <20210918114930.245387-2-mail@gms.tf> In-Reply-To: <20210918114930.245387-1-mail@gms.tf> References: <20210918114930.245387-1-mail@gms.tf> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Subject: [dpdk-dev] [PATCH 1/1] net: fix aliasing issue in checksum computation Precedence: list Errors-To: dev-bounces@dpdk.org Sender: "dev" <dev-bounces@dpdk.org>
Series	net: fix aliasing issue in checksum computation \| [0/1] net: fix aliasing issue in checksum computation [1/1] net: fix aliasing issue in checksum computation

Checks

Context	Check	Description
ci/checkpatch	warning	coding style issues
ci/github-robot: build	success	github build: passed
ci/iol-mellanox-Functional	fail	Functional Testing issues
ci/Intel-compilation	success	Compilation OK
ci/intel-Testing	fail	Testing issues
ci/iol-broadcom-Functional	success	Functional Testing PASS
ci/iol-mellanox-Performance	success	Performance Testing PASS
ci/iol-broadcom-Performance	success	Performance Testing PASS
ci/iol-x86_64-compile-testing	success	Testing PASS
ci/iol-intel-Functional	success	Functional Testing PASS
ci/iol-intel-Performance	success	Performance Testing PASS
ci/iol-aarch64-compile-testing	success	Testing PASS
ci/iol-x86_64-unit-testing	success	Testing PASS

Commit Message

Georg Sauthoff Sept. 18, 2021, 11:49 a.m. UTC

  That means a superfluous cast is removed and aliasing through a uint8_t
pointer is eliminated. Note that uint8_t doesn't have the same
strict-aliasing properties as unsigned char.

Also simplified the loop since a modern C compiler can speed up (i.e.
auto-vectorize) it in a similar way. For example, GCC auto-vectorizes it
for Haswell using AVX registers while halving the number of instructions
in the generated code.

Signed-off-by: Georg Sauthoff <mail@gms.tf>
---
 lib/net/rte_ip.h | 27 ++++++++-------------------
 1 file changed, 8 insertions(+), 19 deletions(-)

Comments

Ferruh Yigit Oct. 14, 2021, 5:20 p.m. UTC | #1

On 9/18/2021 12:49 PM, Georg Sauthoff wrote:
> That means a superfluous cast is removed and aliasing through a uint8_t
> pointer is eliminated. Note that uint8_t doesn't have the same
> strict-aliasing properties as unsigned char.
> 
> Also simplified the loop since a modern C compiler can speed up (i.e.
> auto-vectorize) it in a similar way. For example, GCC auto-vectorizes it
> for Haswell using AVX registers while halving the number of instructions
> in the generated code.
> 
> Signed-off-by: Georg Sauthoff <mail@gms.tf>

+ Morten. (Because of past reviews on cksum code)

> ---
>   lib/net/rte_ip.h | 27 ++++++++-------------------
>   1 file changed, 8 insertions(+), 19 deletions(-)
> 
> diff --git a/lib/net/rte_ip.h b/lib/net/rte_ip.h
> index 05948b69b7..386db94c85 100644
> --- a/lib/net/rte_ip.h
> +++ b/lib/net/rte_ip.h
> @@ -141,29 +141,18 @@ rte_ipv4_hdr_len(const struct rte_ipv4_hdr *ipv4_hdr)
>   static inline uint32_t
>   __rte_raw_cksum(const void *buf, size_t len, uint32_t sum)
>   {
> -	/* workaround gcc strict-aliasing warning */
> -	uintptr_t ptr = (uintptr_t)buf;
> +	/* extend strict-aliasing rules */
>   	typedef uint16_t __attribute__((__may_alias__)) u16_p;
> -	const u16_p *u16_buf = (const u16_p *)ptr;
> -
> -	while (len >= (sizeof(*u16_buf) * 4)) {
> -		sum += u16_buf[0];
> -		sum += u16_buf[1];
> -		sum += u16_buf[2];
> -		sum += u16_buf[3];
> -		len -= sizeof(*u16_buf) * 4;
> -		u16_buf += 4;
> -	}
> -	while (len >= sizeof(*u16_buf)) {
> +	const u16_p *u16_buf = (const u16_p *)buf;
> +	const u16_p *end = u16_buf + len / sizeof(*u16_buf);
> +
> +	for (; u16_buf != end; ++u16_buf)
>   		sum += *u16_buf;
> -		len -= sizeof(*u16_buf);
> -		u16_buf += 1;
> -	}
>   
> -	/* if length is in odd bytes */
> -	if (len == 1) {
> +	/* if length is odd, keeping it byte order independent */
> +	if (len % 2) {
>   		uint16_t left = 0;
> -		*(uint8_t *)&left = *(const uint8_t *)u16_buf;
> +		*(unsigned char*)&left = *(const unsigned char *)end;
>   		sum += left;
>   	}
>   
>

Morten Brørup Oct. 14, 2021, 8:22 p.m. UTC | #2

> -----Original Message-----
> From: Ferruh Yigit [mailto:ferruh.yigit@intel.com]
> Sent: Thursday, 14 October 2021 19.20
> 
> On 9/18/2021 12:49 PM, Georg Sauthoff wrote:
> > That means a superfluous cast is removed and aliasing through a
> uint8_t
> > pointer is eliminated. Note that uint8_t doesn't have the same
> > strict-aliasing properties as unsigned char.
> >
> > Also simplified the loop since a modern C compiler can speed up (i.e.
> > auto-vectorize) it in a similar way. For example, GCC auto-vectorizes
> it
> > for Haswell using AVX registers while halving the number of
> instructions
> > in the generated code.
> >
> > Signed-off-by: Georg Sauthoff <mail@gms.tf>
> 
> + Morten. (Because of past reviews on cksum code)

Thanks, Ferruh.

I have not verified the claimed benefits of the patch, but I have reviewed the code thoroughly, and it looks perfectly good to me.

Reviewed-by: Morten Brørup <mb@smartsharesystems.com>

BTW: It makes me wonder if other parts of DPDK could benefit from the same treatment. Especially some of the older DPDK code, where we were trying to optimize by hand what a modern compiler can optimize for us today.

> 
> > ---
> >   lib/net/rte_ip.h | 27 ++++++++-------------------
> >   1 file changed, 8 insertions(+), 19 deletions(-)
> >
> > diff --git a/lib/net/rte_ip.h b/lib/net/rte_ip.h
> > index 05948b69b7..386db94c85 100644
> > --- a/lib/net/rte_ip.h
> > +++ b/lib/net/rte_ip.h
> > @@ -141,29 +141,18 @@ rte_ipv4_hdr_len(const struct rte_ipv4_hdr
> *ipv4_hdr)
> >   static inline uint32_t
> >   __rte_raw_cksum(const void *buf, size_t len, uint32_t sum)
> >   {
> > -	/* workaround gcc strict-aliasing warning */
> > -	uintptr_t ptr = (uintptr_t)buf;
> > +	/* extend strict-aliasing rules */
> >   	typedef uint16_t __attribute__((__may_alias__)) u16_p;
> > -	const u16_p *u16_buf = (const u16_p *)ptr;
> > -
> > -	while (len >= (sizeof(*u16_buf) * 4)) {
> > -		sum += u16_buf[0];
> > -		sum += u16_buf[1];
> > -		sum += u16_buf[2];
> > -		sum += u16_buf[3];
> > -		len -= sizeof(*u16_buf) * 4;
> > -		u16_buf += 4;
> > -	}
> > -	while (len >= sizeof(*u16_buf)) {
> > +	const u16_p *u16_buf = (const u16_p *)buf;
> > +	const u16_p *end = u16_buf + len / sizeof(*u16_buf);
> > +
> > +	for (; u16_buf != end; ++u16_buf)

Personally I would prefer post-incrementing here. It makes no difference, so I don't see any need to revise the patch.

> >   		sum += *u16_buf;
> > -		len -= sizeof(*u16_buf);
> > -		u16_buf += 1;
> > -	}
> >
> > -	/* if length is in odd bytes */
> > -	if (len == 1) {
> > +	/* if length is odd, keeping it byte order independent */
> > +	if (len % 2) {
> >   		uint16_t left = 0;
> > -		*(uint8_t *)&left = *(const uint8_t *)u16_buf;
> > +		*(unsigned char*)&left = *(const unsigned char *)end;
> >   		sum += left;
> >   	}
> >
> >
>

Olivier Matz Oct. 15, 2021, 2:39 p.m. UTC | #3

Hi Georg,

On Sat, Sep 18, 2021 at 01:49:30PM +0200, Georg Sauthoff wrote:
> That means a superfluous cast is removed and aliasing through a uint8_t
> pointer is eliminated. Note that uint8_t doesn't have the same
> strict-aliasing properties as unsigned char.

Interesting. Out of curiosity, do you have links that explains
this?

I found these, but these are just discussions:
  https://stackoverflow.com/questions/16138237/when-is-uint8-t-%E2%89%A0-unsigned-char
  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66110

What about rewording the sentence "uint8_t doesn't have the same
strict-aliasing properties as unsigned char" to clarify that unsigned
char may alias, but uint8_t may not?

> Also simplified the loop since a modern C compiler can speed up (i.e.
> auto-vectorize) it in a similar way. For example, GCC auto-vectorizes it
> for Haswell using AVX registers while halving the number of instructions
> in the generated code.
> 
> Signed-off-by: Georg Sauthoff <mail@gms.tf>

The patch looks good to me, thanks!

Acked-by: Olivier Matz <olivier.matz@6wind.com>

Morten Brørup Oct. 16, 2021, 8:21 a.m. UTC | #4

Geoff,

I have given this some more thoughts.

Most bytes transferred in real life are transferred in large packets, so faster processing of large packets is a great improvement!

Furthermore, a quick analysis of a recent packet sample from an ISP customer of ours shows that less than 8 % of the packets are odd size. Would you consider adding an unlikely() to the branch handling the odd byte at the end?

-Morten

> -----Original Message-----
> From: Morten Brørup
> Sent: Thursday, 14 October 2021 22.22
> 
> > -----Original Message-----
> > From: Ferruh Yigit [mailto:ferruh.yigit@intel.com]
> > Sent: Thursday, 14 October 2021 19.20
> >
> > On 9/18/2021 12:49 PM, Georg Sauthoff wrote:
> > > That means a superfluous cast is removed and aliasing through a
> > uint8_t
> > > pointer is eliminated. Note that uint8_t doesn't have the same
> > > strict-aliasing properties as unsigned char.
> > >
> > > Also simplified the loop since a modern C compiler can speed up
> (i.e.
> > > auto-vectorize) it in a similar way. For example, GCC auto-
> vectorizes
> > it
> > > for Haswell using AVX registers while halving the number of
> > instructions
> > > in the generated code.
> > >
> > > Signed-off-by: Georg Sauthoff <mail@gms.tf>
> >
> > + Morten. (Because of past reviews on cksum code)
> 
> Thanks, Ferruh.
> 
> I have not verified the claimed benefits of the patch, but I have
> reviewed the code thoroughly, and it looks perfectly good to me.
> 
> Reviewed-by: Morten Brørup <mb@smartsharesystems.com>
> 
> BTW: It makes me wonder if other parts of DPDK could benefit from the
> same treatment. Especially some of the older DPDK code, where we were
> trying to optimize by hand what a modern compiler can optimize for us
> today.
> 
> >
> > > ---
> > >   lib/net/rte_ip.h | 27 ++++++++-------------------
> > >   1 file changed, 8 insertions(+), 19 deletions(-)
> > >
> > > diff --git a/lib/net/rte_ip.h b/lib/net/rte_ip.h
> > > index 05948b69b7..386db94c85 100644
> > > --- a/lib/net/rte_ip.h
> > > +++ b/lib/net/rte_ip.h
> > > @@ -141,29 +141,18 @@ rte_ipv4_hdr_len(const struct rte_ipv4_hdr
> > *ipv4_hdr)
> > >   static inline uint32_t
> > >   __rte_raw_cksum(const void *buf, size_t len, uint32_t sum)
> > >   {
> > > -	/* workaround gcc strict-aliasing warning */
> > > -	uintptr_t ptr = (uintptr_t)buf;
> > > +	/* extend strict-aliasing rules */
> > >   	typedef uint16_t __attribute__((__may_alias__)) u16_p;
> > > -	const u16_p *u16_buf = (const u16_p *)ptr;
> > > -
> > > -	while (len >= (sizeof(*u16_buf) * 4)) {
> > > -		sum += u16_buf[0];
> > > -		sum += u16_buf[1];
> > > -		sum += u16_buf[2];
> > > -		sum += u16_buf[3];
> > > -		len -= sizeof(*u16_buf) * 4;
> > > -		u16_buf += 4;
> > > -	}
> > > -	while (len >= sizeof(*u16_buf)) {
> > > +	const u16_p *u16_buf = (const u16_p *)buf;
> > > +	const u16_p *end = u16_buf + len / sizeof(*u16_buf);
> > > +
> > > +	for (; u16_buf != end; ++u16_buf)
> 
> Personally I would prefer post-incrementing here. It makes no
> difference, so I don't see any need to revise the patch.
> 
> > >   		sum += *u16_buf;
> > > -		len -= sizeof(*u16_buf);
> > > -		u16_buf += 1;
> > > -	}
> > >
> > > -	/* if length is in odd bytes */
> > > -	if (len == 1) {
> > > +	/* if length is odd, keeping it byte order independent */
> > > +	if (len % 2) {

I assume that the compiler already optimizes "% 2" to "& 1".

> > >   		uint16_t left = 0;
> > > -		*(uint8_t *)&left = *(const uint8_t *)u16_buf;
> > > +		*(unsigned char*)&left = *(const unsigned char *)end;
> > >   		sum += left;
> > >   	}
> > >

Morten Brørup Oct. 16, 2021, 8:24 a.m. UTC | #5

Georg, I apologize for calling you Geoff below! Just realized my mistake.

Med venlig hilsen / Kind regards,
-Morten Brørup


> -----Original Message-----
> From: Morten Brørup
> Sent: Saturday, 16 October 2021 10.21
> To: 'Georg Sauthoff'
> Cc: 'dev@dpdk.org'; 'Ferruh Yigit'; 'Olivier Matz'; 'Thomas Monjalon';
> 'David Marchand'
> Subject: RE: [dpdk-dev] [PATCH 1/1] net: fix aliasing issue in checksum
> computation
> 
> Geoff,
> 
> I have given this some more thoughts.
> 
> Most bytes transferred in real life are transferred in large packets,
> so faster processing of large packets is a great improvement!
> 
> Furthermore, a quick analysis of a recent packet sample from an ISP
> customer of ours shows that less than 8 % of the packets are odd size.
> Would you consider adding an unlikely() to the branch handling the odd
> byte at the end?
> 
> -Morten
> 
> > -----Original Message-----
> > From: Morten Brørup
> > Sent: Thursday, 14 October 2021 22.22
> >
> > > -----Original Message-----
> > > From: Ferruh Yigit [mailto:ferruh.yigit@intel.com]
> > > Sent: Thursday, 14 October 2021 19.20
> > >
> > > On 9/18/2021 12:49 PM, Georg Sauthoff wrote:
> > > > That means a superfluous cast is removed and aliasing through a
> > > uint8_t
> > > > pointer is eliminated. Note that uint8_t doesn't have the same
> > > > strict-aliasing properties as unsigned char.
> > > >
> > > > Also simplified the loop since a modern C compiler can speed up
> > (i.e.
> > > > auto-vectorize) it in a similar way. For example, GCC auto-
> > vectorizes
> > > it
> > > > for Haswell using AVX registers while halving the number of
> > > instructions
> > > > in the generated code.
> > > >
> > > > Signed-off-by: Georg Sauthoff <mail@gms.tf>
> > >
> > > + Morten. (Because of past reviews on cksum code)
> >
> > Thanks, Ferruh.
> >
> > I have not verified the claimed benefits of the patch, but I have
> > reviewed the code thoroughly, and it looks perfectly good to me.
> >
> > Reviewed-by: Morten Brørup <mb@smartsharesystems.com>
> >
> > BTW: It makes me wonder if other parts of DPDK could benefit from the
> > same treatment. Especially some of the older DPDK code, where we were
> > trying to optimize by hand what a modern compiler can optimize for us
> > today.
> >
> > >
> > > > ---
> > > >   lib/net/rte_ip.h | 27 ++++++++-------------------
> > > >   1 file changed, 8 insertions(+), 19 deletions(-)
> > > >
> > > > diff --git a/lib/net/rte_ip.h b/lib/net/rte_ip.h
> > > > index 05948b69b7..386db94c85 100644
> > > > --- a/lib/net/rte_ip.h
> > > > +++ b/lib/net/rte_ip.h
> > > > @@ -141,29 +141,18 @@ rte_ipv4_hdr_len(const struct rte_ipv4_hdr
> > > *ipv4_hdr)
> > > >   static inline uint32_t
> > > >   __rte_raw_cksum(const void *buf, size_t len, uint32_t sum)
> > > >   {
> > > > -	/* workaround gcc strict-aliasing warning */
> > > > -	uintptr_t ptr = (uintptr_t)buf;
> > > > +	/* extend strict-aliasing rules */
> > > >   	typedef uint16_t __attribute__((__may_alias__)) u16_p;
> > > > -	const u16_p *u16_buf = (const u16_p *)ptr;
> > > > -
> > > > -	while (len >= (sizeof(*u16_buf) * 4)) {
> > > > -		sum += u16_buf[0];
> > > > -		sum += u16_buf[1];
> > > > -		sum += u16_buf[2];
> > > > -		sum += u16_buf[3];
> > > > -		len -= sizeof(*u16_buf) * 4;
> > > > -		u16_buf += 4;
> > > > -	}
> > > > -	while (len >= sizeof(*u16_buf)) {
> > > > +	const u16_p *u16_buf = (const u16_p *)buf;
> > > > +	const u16_p *end = u16_buf + len / sizeof(*u16_buf);
> > > > +
> > > > +	for (; u16_buf != end; ++u16_buf)
> >
> > Personally I would prefer post-incrementing here. It makes no
> > difference, so I don't see any need to revise the patch.
> >
> > > >   		sum += *u16_buf;
> > > > -		len -= sizeof(*u16_buf);
> > > > -		u16_buf += 1;
> > > > -	}
> > > >
> > > > -	/* if length is in odd bytes */
> > > > -	if (len == 1) {
> > > > +	/* if length is odd, keeping it byte order independent */
> > > > +	if (len % 2) {
> 
> I assume that the compiler already optimizes "% 2" to "& 1".
> 
> > > >   		uint16_t left = 0;
> > > > -		*(uint8_t *)&left = *(const uint8_t *)u16_buf;
> > > > +		*(unsigned char*)&left = *(const unsigned char *)end;
> > > >   		sum += left;
> > > >   	}
> > > >

Georg Sauthoff Oct. 16, 2021, 5:02 p.m. UTC | #6

Hello,

On Fri, Oct 15, 2021 at 04:39:02PM +0200, Olivier Matz wrote:
> On Sat, Sep 18, 2021 at 01:49:30PM +0200, Georg Sauthoff wrote:
> > That means a superfluous cast is removed and aliasing through a uint8_t
> > pointer is eliminated. Note that uint8_t doesn't have the same
> > strict-aliasing properties as unsigned char.
> 
> Interesting. Out of curiosity, do you have links that explains
> this?

yes, I do. https://stefansf.de/post/type-based-alias-analysis/ has some
nice examples and explains some things. Especially, it makes the point
that it's the access that matters for yielding undefined behaviour (i.e.
when violating strict-aliasing rules) and not the cast itself:

"N.B. the standard only speaks about the type of an object and the type
of an lvalue in order to access an object. Thus a pointer to an object
x may be converted arbitrarily often to arbitrary object pointer
types, and therefore even to incompatible types, as long as every
access to x is done through an lvalue which type conforms to C11
section 6.5 paragraph 7."

Section 'Character Type' in that article also addresses how uint8_t
isn't special as unsigned char while quoting the standard and
referencing below Bugzilla bug.

Another good article on strict aliasing:

https://gustedt.wordpress.com/2016/08/17/effective-types-and-aliasing/

> I found these, but these are just discussions:
>   https://stackoverflow.com/questions/16138237/when-is-uint8-t-%E2%89%A0-unsigned-char
>   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66110

I like the Bugzilla link as it shows how some code benefits from
uint8_t not having the same aliasing requirements as e.g. unsigned char.
Thus, it's an example of why compiler developers might be motivated to
decide against making uint8_t a typedef of unsigned char, since the
standard doesn't require it.

> What about rewording the sentence "uint8_t doesn't have the same
> strict-aliasing properties as unsigned char" to clarify that unsigned
> char may alias, but uint8_t may not?

I can change

"That means a superfluous cast is removed and aliasing through a uint8_t
pointer is eliminated. Note that uint8_t doesn't have the same
strict-aliasing properties as unsigned char."

to

"That means a superfluous cast is removed and aliasing through a uint8_t
pointer is eliminated. NB: The C standard specifies that a unsigned char
pointer may alias while the C standard doesn't include such requirement
for uint8_t pointers."

Better?

Best regards
Georg

Georg Sauthoff Oct. 16, 2021, 5:17 p.m. UTC | #7

Hello,

On Sat, Oct 16, 2021 at 10:21:03AM +0200, Morten Brørup wrote:
> I have given this some more thoughts.
> 
> Most bytes transferred in real life are transferred in large packets,
> so faster processing of large packets is a great improvement!
> 
> Furthermore, a quick analysis of a recent packet sample from an ISP
> customer of ours shows that less than 8 % of the packets are odd size.
> Would you consider adding an unlikely() to the branch handling the odd
> byte at the end?

sure, I don't see a problem with adding unlikely() there.

I'll post a version 2 of that patch then, tomorrow.

Best regards
Georg

diff mbox series

Patch

diff --git a/lib/net/rte_ip.h b/lib/net/rte_ip.h
index 05948b69b7..386db94c85 100644
--- a/lib/net/rte_ip.h
+++ b/lib/net/rte_ip.h
@@ -141,29 +141,18 @@  rte_ipv4_hdr_len(const struct rte_ipv4_hdr *ipv4_hdr)
 static inline uint32_t
 __rte_raw_cksum(const void *buf, size_t len, uint32_t sum)
 {
-	/* workaround gcc strict-aliasing warning */
-	uintptr_t ptr = (uintptr_t)buf;
+	/* extend strict-aliasing rules */
 	typedef uint16_t __attribute__((__may_alias__)) u16_p;
-	const u16_p *u16_buf = (const u16_p *)ptr;
-
-	while (len >= (sizeof(*u16_buf) * 4)) {
-		sum += u16_buf[0];
-		sum += u16_buf[1];
-		sum += u16_buf[2];
-		sum += u16_buf[3];
-		len -= sizeof(*u16_buf) * 4;
-		u16_buf += 4;
-	}
-	while (len >= sizeof(*u16_buf)) {
+	const u16_p *u16_buf = (const u16_p *)buf;
+	const u16_p *end = u16_buf + len / sizeof(*u16_buf);
+
+	for (; u16_buf != end; ++u16_buf)
 		sum += *u16_buf;
-		len -= sizeof(*u16_buf);
-		u16_buf += 1;
-	}
 
-	/* if length is in odd bytes */
-	if (len == 1) {
+	/* if length is odd, keeping it byte order independent */
+	if (len % 2) {
 		uint16_t left = 0;
-		*(uint8_t *)&left = *(const uint8_t *)u16_buf;
+		*(unsigned char*)&left = *(const unsigned char *)end;
 		sum += left;
 	}