diff mbox series

net: fix checksum with unaligned buffer

Message ID	20220617084505.62071-1-mb@smartsharesystems.com (mailing list archive)
State	Superseded, archived
Delegated to:	Thomas Monjalon
Headers	From: =?utf-8?q?Morten_Br=C3=B8rup?= <mb@smartsharesystems.com> To: emil.berg@ericsson.com, dev@dpdk.org Cc: stable@dpdk.org, bugzilla@dpdk.org, hofors@lysator.liu.se, olivier.matz@6wind.com, =?utf-8?q?Morten_Br=C3=B8rup?= <mb@smartsharesystems.com> Subject: [PATCH] net: fix checksum with unaligned buffer Date: Fri, 17 Jun 2022 10:45:05 +0200 Message-Id: <20220617084505.62071-1-mb@smartsharesystems.com> In-Reply-To: <98CBD80474FA8B44BF855DF32C47DC35D87139@smartserver.smartshare.dk> References: <98CBD80474FA8B44BF855DF32C47DC35D87139@smartserver.smartshare.dk> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Precedence: list Errors-To: dev-bounces@dpdk.org
Series	net: fix checksum with unaligned buffer \| net: fix checksum with unaligned buffer

Checks

Context	Check	Description
ci/checkpatch	success	coding style OK
ci/Intel-compilation	success	Compilation OK
ci/intel-Testing	success	Testing PASS
ci/iol-mellanox-Performance	success	Performance Testing PASS
ci/github-robot: build	success	github build: passed
ci/iol-x86_64-compile-testing	success	Testing PASS
ci/iol-aarch64-unit-testing	success	Testing PASS
ci/iol-x86_64-unit-testing	success	Testing PASS
ci/iol-aarch64-compile-testing	success	Testing PASS
ci/iol-intel-Performance	success	Performance Testing PASS
ci/iol-intel-Functional	success	Functional Testing PASS
ci/iol-abi-testing	success	Testing PASS

Commit Message

Morten Brørup June 17, 2022, 8:45 a.m. UTC

  With this patch, the checksum can be calculated on an unligned part of
a packet buffer.
I.e. the buf parameter is no longer required to be 16 bit aligned.

The DPDK invariant that packet buffers must be 16 bit aligned remains
unchanged.
This invariant also defines how to calculate the 16 bit checksum on an
unaligned part of a packet buffer.

Bugzilla ID: 1035
Cc: stable@dpdk.org

Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
---
 lib/net/rte_ip.h | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

Comments

Morten Brørup June 17, 2022, 9:06 a.m. UTC | #1

> From: Morten Brørup [mailto:mb@smartsharesystems.com]
> Sent: Friday, 17 June 2022 10.45
> 
> With this patch, the checksum can be calculated on an unligned part of
> a packet buffer.
> I.e. the buf parameter is no longer required to be 16 bit aligned.
> 
> The DPDK invariant that packet buffers must be 16 bit aligned remains
> unchanged.
> This invariant also defines how to calculate the 16 bit checksum on an
> unaligned part of a packet buffer.
> 
> Bugzilla ID: 1035
> Cc: stable@dpdk.org
> 
> Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> ---
>  lib/net/rte_ip.h | 17 +++++++++++++++--
>  1 file changed, 15 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/net/rte_ip.h b/lib/net/rte_ip.h
> index b502481670..8e301d9c26 100644
> --- a/lib/net/rte_ip.h
> +++ b/lib/net/rte_ip.h
> @@ -162,9 +162,22 @@ __rte_raw_cksum(const void *buf, size_t len,
> uint32_t sum)
>  {
>  	/* extend strict-aliasing rules */
>  	typedef uint16_t __attribute__((__may_alias__)) u16_p;
> -	const u16_p *u16_buf = (const u16_p *)buf;
> -	const u16_p *end = u16_buf + len / sizeof(*u16_buf);
> +	const u16_p *u16_buf;
> +	const u16_p *end;
> +
> +	/* if buffer is unaligned, keeping it byte order independent */
> +	if (unlikely((uintptr_t)buf & 1)) {
> +		uint16_t first = 0;
> +		if (unlikely(len == 0))
> +			return 0;
> +		((unsigned char *)&first)[1] = *(const unsigned char *)buf;
> +		sum += first;
> +		buf = (const void *)((uintptr_t)buf + 1);
> +		len--;
> +	}
> 
> +	u16_buf = (const u16_p *)buf;
> +	end = u16_buf + len / sizeof(*u16_buf);
>  	for (; u16_buf != end; ++u16_buf)
>  		sum += *u16_buf;
> 
> --
> 2.17.1

@Emil, can you please test this patch with an unaligned buffer on your application to confirm that it produces the expected result.

Emil Berg June 17, 2022, 12:17 p.m. UTC | #2

> -----Original Message-----
> From: Morten Brørup <mb@smartsharesystems.com>
> Sent: den 17 juni 2022 11:07
> To: Emil Berg <emil.berg@ericsson.com>
> Cc: stable@dpdk.org; bugzilla@dpdk.org; hofors@lysator.liu.se;
> olivier.matz@6wind.com; dev@dpdk.org
> Subject: RE: [PATCH] net: fix checksum with unaligned buffer
> 
> > From: Morten Brørup [mailto:mb@smartsharesystems.com]
> > Sent: Friday, 17 June 2022 10.45
> >
> > With this patch, the checksum can be calculated on an unligned part of
> > a packet buffer.
> > I.e. the buf parameter is no longer required to be 16 bit aligned.
> >
> > The DPDK invariant that packet buffers must be 16 bit aligned remains
> > unchanged.
> > This invariant also defines how to calculate the 16 bit checksum on an
> > unaligned part of a packet buffer.
> >
> > Bugzilla ID: 1035
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> > ---
> >  lib/net/rte_ip.h | 17 +++++++++++++++--
> >  1 file changed, 15 insertions(+), 2 deletions(-)
> >
> > diff --git a/lib/net/rte_ip.h b/lib/net/rte_ip.h index
> > b502481670..8e301d9c26 100644
> > --- a/lib/net/rte_ip.h
> > +++ b/lib/net/rte_ip.h
> > @@ -162,9 +162,22 @@ __rte_raw_cksum(const void *buf, size_t len,
> > uint32_t sum)  {
> >  	/* extend strict-aliasing rules */
> >  	typedef uint16_t __attribute__((__may_alias__)) u16_p;
> > -	const u16_p *u16_buf = (const u16_p *)buf;
> > -	const u16_p *end = u16_buf + len / sizeof(*u16_buf);
> > +	const u16_p *u16_buf;
> > +	const u16_p *end;
> > +
> > +	/* if buffer is unaligned, keeping it byte order independent */
> > +	if (unlikely((uintptr_t)buf & 1)) {
> > +		uint16_t first = 0;
> > +		if (unlikely(len == 0))
> > +			return 0;
> > +		((unsigned char *)&first)[1] = *(const unsigned
> char *)buf;
> > +		sum += first;
> > +		buf = (const void *)((uintptr_t)buf + 1);
> > +		len--;
> > +	}
> >
> > +	u16_buf = (const u16_p *)buf;
> > +	end = u16_buf + len / sizeof(*u16_buf);
> >  	for (; u16_buf != end; ++u16_buf)
> >  		sum += *u16_buf;
> >
> > --
> > 2.17.1
> 
> @Emil, can you please test this patch with an unaligned buffer on your
> application to confirm that it produces the expected result.

All right. I'll test, perhaps on Monday.

Emil Berg June 20, 2022, 10:37 a.m. UTC | #3

> -----Original Message-----
> From: Morten Brørup <mb@smartsharesystems.com>
> Sent: den 17 juni 2022 11:07
> To: Emil Berg <emil.berg@ericsson.com>
> Cc: stable@dpdk.org; bugzilla@dpdk.org; hofors@lysator.liu.se;
> olivier.matz@6wind.com; dev@dpdk.org
> Subject: RE: [PATCH] net: fix checksum with unaligned buffer
> 
> > From: Morten Brørup [mailto:mb@smartsharesystems.com]
> > Sent: Friday, 17 June 2022 10.45
> >
> > With this patch, the checksum can be calculated on an unligned part of
> > a packet buffer.
> > I.e. the buf parameter is no longer required to be 16 bit aligned.
> >
> > The DPDK invariant that packet buffers must be 16 bit aligned remains
> > unchanged.
> > This invariant also defines how to calculate the 16 bit checksum on an
> > unaligned part of a packet buffer.
> >
> > Bugzilla ID: 1035
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> > ---
> >  lib/net/rte_ip.h | 17 +++++++++++++++--
> >  1 file changed, 15 insertions(+), 2 deletions(-)
> >
> > diff --git a/lib/net/rte_ip.h b/lib/net/rte_ip.h index
> > b502481670..8e301d9c26 100644
> > --- a/lib/net/rte_ip.h
> > +++ b/lib/net/rte_ip.h
> > @@ -162,9 +162,22 @@ __rte_raw_cksum(const void *buf, size_t len,
> > uint32_t sum)  {
> >  	/* extend strict-aliasing rules */
> >  	typedef uint16_t __attribute__((__may_alias__)) u16_p;
> > -	const u16_p *u16_buf = (const u16_p *)buf;
> > -	const u16_p *end = u16_buf + len / sizeof(*u16_buf);
> > +	const u16_p *u16_buf;
> > +	const u16_p *end;
> > +
> > +	/* if buffer is unaligned, keeping it byte order independent */
> > +	if (unlikely((uintptr_t)buf & 1)) {
> > +		uint16_t first = 0;
> > +		if (unlikely(len == 0))
> > +			return 0;
> > +		((unsigned char *)&first)[1] = *(const unsigned
> char *)buf;
> > +		sum += first;
> > +		buf = (const void *)((uintptr_t)buf + 1);
> > +		len--;
> > +	}
> >
> > +	u16_buf = (const u16_p *)buf;
> > +	end = u16_buf + len / sizeof(*u16_buf);
> >  	for (; u16_buf != end; ++u16_buf)
> >  		sum += *u16_buf;
> >
> > --
> > 2.17.1
> 
> @Emil, can you please test this patch with an unaligned buffer on your
> application to confirm that it produces the expected result.

Hi!

I tested the patch. It doesn't seem to produce the same results. I think the problem is that it always starts summing from an even address, the sum should always start from the first byte according to the checksum specification. Can I instead propose something Mattias Rönnblom sent me?
--------------------------------------------------------------------------------------------------------------
const void *end = RTE_PTR_ADD(buf, (len / sizeof(uint16_t)) * sizeof(uint16_t));

for (; buf != end; buf = RTE_PTR_ADD(buf, sizeof(uint16_t))) {
    uint16_t v;
    memcpy(&v, buf, sizeof(uint16_t));
    sum += v;
}

/* if length is odd, keeping it byte order independent */
if (unlikely(len % 2)) {
    uint16_t left = 0;
    *(unsigned char *)&left = *(const unsigned char *)end;
    sum += left;
}
--------------------------------------------------------------------------------------------------------------
Note that the last block is the same as before. Amazingly I see no measurable performance hit from this compared to the previous one (-O3, march=native). Looking at the previous the loop body may compile to (x86):
--------------------------------------------------------------------------------------------------------------
vmovdqa (%rdx),%xmm1
vpmovzxwd %xmm1,%xmm0
vpsrldq $0x8,%xmm1,%xmm1
vpmovzxwd %xmm1,%xmm1
vpaddd %xmm1,%xmm0,%xmm0
cmp    $0xf,%rax
jbe    0x7ff7a0dfb1a9
--------------------------------------------------------------------------------------------------------------
while Mattias' memcpy solution:
--------------------------------------------------------------------------------------------------------------
vmovdqu (%rcx),%ymm0
add    $0x20,%rcx
vpmovzxwd %xmm0,%ymm1
vextracti128 $0x1,%ymm0,%xmm0
vpmovzxwd %xmm0,%ymm0
vpaddd %ymm0,%ymm1,%ymm0
vpaddd %ymm0,%ymm2,%ymm2
cmp    %r9,%rcx
jne    0x555555556380
--------------------------------------------------------------------------------------------------------------
Thus two extra instructions in the loop, but I suspect it may be memory bound, leading to no measurable performance difference. 

Any comments?

/Emil

Morten Brørup June 20, 2022, 10:57 a.m. UTC | #4

> From: Emil Berg [mailto:emil.berg@ericsson.com]
> Sent: Monday, 20 June 2022 12.38
> 
> > From: Morten Brørup <mb@smartsharesystems.com>
> > Sent: den 17 juni 2022 11:07
> >
> > > From: Morten Brørup [mailto:mb@smartsharesystems.com]
> > > Sent: Friday, 17 June 2022 10.45
> > >
> > > With this patch, the checksum can be calculated on an unligned part
> of
> > > a packet buffer.
> > > I.e. the buf parameter is no longer required to be 16 bit aligned.
> > >
> > > The DPDK invariant that packet buffers must be 16 bit aligned
> remains
> > > unchanged.
> > > This invariant also defines how to calculate the 16 bit checksum on
> an
> > > unaligned part of a packet buffer.
> > >
> > > Bugzilla ID: 1035
> > > Cc: stable@dpdk.org
> > >
> > > Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> > > ---
> > >  lib/net/rte_ip.h | 17 +++++++++++++++--
> > >  1 file changed, 15 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/lib/net/rte_ip.h b/lib/net/rte_ip.h index
> > > b502481670..8e301d9c26 100644
> > > --- a/lib/net/rte_ip.h
> > > +++ b/lib/net/rte_ip.h
> > > @@ -162,9 +162,22 @@ __rte_raw_cksum(const void *buf, size_t len,
> > > uint32_t sum)  {
> > >  	/* extend strict-aliasing rules */
> > >  	typedef uint16_t __attribute__((__may_alias__)) u16_p;
> > > -	const u16_p *u16_buf = (const u16_p *)buf;
> > > -	const u16_p *end = u16_buf + len / sizeof(*u16_buf);
> > > +	const u16_p *u16_buf;
> > > +	const u16_p *end;
> > > +
> > > +	/* if buffer is unaligned, keeping it byte order independent */
> > > +	if (unlikely((uintptr_t)buf & 1)) {
> > > +		uint16_t first = 0;
> > > +		if (unlikely(len == 0))
> > > +			return 0;
> > > +		((unsigned char *)&first)[1] = *(const unsigned
> > char *)buf;
> > > +		sum += first;
> > > +		buf = (const void *)((uintptr_t)buf + 1);
> > > +		len--;
> > > +	}
> > >
> > > +	u16_buf = (const u16_p *)buf;
> > > +	end = u16_buf + len / sizeof(*u16_buf);
> > >  	for (; u16_buf != end; ++u16_buf)
> > >  		sum += *u16_buf;
> > >
> > > --
> > > 2.17.1
> >
> > @Emil, can you please test this patch with an unaligned buffer on
> your
> > application to confirm that it produces the expected result.
> 
> Hi!
> 
> I tested the patch. It doesn't seem to produce the same results. I
> think the problem is that it always starts summing from an even
> address, the sum should always start from the first byte according to
> the checksum specification. Can I instead propose something Mattias
> Rönnblom sent me?

I assume that it produces the same result when the "buf" parameter is aligned?

And when the "buf" parameter is unaligned, I don't expect it to produce the same results as the simple algorithm!

This was the whole point of the patch: I expect the overall packet buffer to be 16 bit aligned, and the checksum to be a partial checksum of such a 16 bit aligned packet buffer. When calling this function, I assume that the "buf" and "len" parameters point to a part of such a packet buffer. If these expectations are correct, the simple algorithm will produce incorrect results when "buf" is unaligned.

I was asking you to test if the checksum on the packet is correct when your application modifies an unaligned part of the packet and uses this function to update the checksum.


> -----------------------------------------------------------------------
> ---------------------------------------
> const void *end = RTE_PTR_ADD(buf, (len / sizeof(uint16_t)) *
> sizeof(uint16_t));
> 
> for (; buf != end; buf = RTE_PTR_ADD(buf, sizeof(uint16_t))) {
>     uint16_t v;
>     memcpy(&v, buf, sizeof(uint16_t));
>     sum += v;
> }
> 
> /* if length is odd, keeping it byte order independent */
> if (unlikely(len % 2)) {
>     uint16_t left = 0;
>     *(unsigned char *)&left = *(const unsigned char *)end;
>     sum += left;
> }
> -----------------------------------------------------------------------
> ---------------------------------------
> Note that the last block is the same as before. Amazingly I see no
> measurable performance hit from this compared to the previous one (-O3,
> march=native). Looking at the previous the loop body may compile to
> (x86):
> -----------------------------------------------------------------------
> ---------------------------------------
> vmovdqa (%rdx),%xmm1
> vpmovzxwd %xmm1,%xmm0
> vpsrldq $0x8,%xmm1,%xmm1
> vpmovzxwd %xmm1,%xmm1
> vpaddd %xmm1,%xmm0,%xmm0
> cmp    $0xf,%rax
> jbe    0x7ff7a0dfb1a9
> -----------------------------------------------------------------------
> ---------------------------------------
> while Mattias' memcpy solution:
> -----------------------------------------------------------------------
> ---------------------------------------
> vmovdqu (%rcx),%ymm0
> add    $0x20,%rcx
> vpmovzxwd %xmm0,%ymm1
> vextracti128 $0x1,%ymm0,%xmm0
> vpmovzxwd %xmm0,%ymm0
> vpaddd %ymm0,%ymm1,%ymm0
> vpaddd %ymm0,%ymm2,%ymm2
> cmp    %r9,%rcx
> jne    0x555555556380
> -----------------------------------------------------------------------
> ---------------------------------------
> Thus two extra instructions in the loop, but I suspect it may be memory
> bound, leading to no measurable performance difference.
> 
> Any comments?
> 
> /Emil

Emil Berg June 21, 2022, 7:16 a.m. UTC | #5

> -----Original Message-----
> From: Morten Brørup <mb@smartsharesystems.com>
> Sent: den 20 juni 2022 12:58
> To: Emil Berg <emil.berg@ericsson.com>
> Cc: stable@dpdk.org; bugzilla@dpdk.org; hofors@lysator.liu.se;
> olivier.matz@6wind.com; dev@dpdk.org
> Subject: RE: [PATCH] net: fix checksum with unaligned buffer
> 
> > From: Emil Berg [mailto:emil.berg@ericsson.com]
> > Sent: Monday, 20 June 2022 12.38
> >
> > > From: Morten Brørup <mb@smartsharesystems.com>
> > > Sent: den 17 juni 2022 11:07
> > >
> > > > From: Morten Brørup [mailto:mb@smartsharesystems.com]
> > > > Sent: Friday, 17 June 2022 10.45
> > > >
> > > > With this patch, the checksum can be calculated on an unligned
> > > > part
> > of
> > > > a packet buffer.
> > > > I.e. the buf parameter is no longer required to be 16 bit aligned.
> > > >
> > > > The DPDK invariant that packet buffers must be 16 bit aligned
> > remains
> > > > unchanged.
> > > > This invariant also defines how to calculate the 16 bit checksum
> > > > on
> > an
> > > > unaligned part of a packet buffer.
> > > >
> > > > Bugzilla ID: 1035
> > > > Cc: stable@dpdk.org
> > > >
> > > > Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> > > > ---
> > > >  lib/net/rte_ip.h | 17 +++++++++++++++--
> > > >  1 file changed, 15 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/lib/net/rte_ip.h b/lib/net/rte_ip.h index
> > > > b502481670..8e301d9c26 100644
> > > > --- a/lib/net/rte_ip.h
> > > > +++ b/lib/net/rte_ip.h
> > > > @@ -162,9 +162,22 @@ __rte_raw_cksum(const void *buf, size_t len,
> > > > uint32_t sum)  {
> > > >  	/* extend strict-aliasing rules */
> > > >  	typedef uint16_t __attribute__((__may_alias__)) u16_p;
> > > > -	const u16_p *u16_buf = (const u16_p *)buf;
> > > > -	const u16_p *end = u16_buf + len / sizeof(*u16_buf);
> > > > +	const u16_p *u16_buf;
> > > > +	const u16_p *end;
> > > > +
> > > > +	/* if buffer is unaligned, keeping it byte order independent */
> > > > +	if (unlikely((uintptr_t)buf & 1)) {
> > > > +		uint16_t first = 0;
> > > > +		if (unlikely(len == 0))
> > > > +			return 0;
> > > > +		((unsigned char *)&first)[1] = *(const unsigned
> > > char *)buf;
> > > > +		sum += first;
> > > > +		buf = (const void *)((uintptr_t)buf + 1);
> > > > +		len--;
> > > > +	}
> > > >
> > > > +	u16_buf = (const u16_p *)buf;
> > > > +	end = u16_buf + len / sizeof(*u16_buf);
> > > >  	for (; u16_buf != end; ++u16_buf)
> > > >  		sum += *u16_buf;
> > > >
> > > > --
> > > > 2.17.1
> > >
> > > @Emil, can you please test this patch with an unaligned buffer on
> > your
> > > application to confirm that it produces the expected result.
> >
> > Hi!
> >
> > I tested the patch. It doesn't seem to produce the same results. I
> > think the problem is that it always starts summing from an even
> > address, the sum should always start from the first byte according to
> > the checksum specification. Can I instead propose something Mattias
> > Rönnblom sent me?
> 
> I assume that it produces the same result when the "buf" parameter is
> aligned?
> 
> And when the "buf" parameter is unaligned, I don't expect it to produce the
> same results as the simple algorithm!
> 
> This was the whole point of the patch: I expect the overall packet buffer to
> be 16 bit aligned, and the checksum to be a partial checksum of such a 16 bit
> aligned packet buffer. When calling this function, I assume that the "buf" and
> "len" parameters point to a part of such a packet buffer. If these
> expectations are correct, the simple algorithm will produce incorrect results
> when "buf" is unaligned.
> 
> I was asking you to test if the checksum on the packet is correct when your
> application modifies an unaligned part of the packet and uses this function to
> update the checksum.
> 

Now I understand your use case. Your use case seems to be about partial checksums, of which some partial checksums may start on unaligned addresses in an otherwise aligned packet. 

Our use case is about calculating the full checksum on a nested packet. That nested packet may start on unaligned addresses.

The difference is basically if we want to sum over aligned addresses or not, handling the heading and trailing bytes appropriately.

Your method does not work in our case since we want to treat the first two bytes as the first word in our case. But I do understand that both methods are useful.

Note that your method breaks the API. Previously (assuming no crashing due to low optimization levels, more accepting hardware, or a different compiler (version)) the current method would calculate the checksum assuming the first two bytes is the first word.

> 
> > ----------------------------------------------------------------------
> > -
> > ---------------------------------------
> > const void *end = RTE_PTR_ADD(buf, (len / sizeof(uint16_t)) *
> > sizeof(uint16_t));
> >
> > for (; buf != end; buf = RTE_PTR_ADD(buf, sizeof(uint16_t))) {
> >     uint16_t v;
> >     memcpy(&v, buf, sizeof(uint16_t));
> >     sum += v;
> > }
> >
> > /* if length is odd, keeping it byte order independent */ if
> > (unlikely(len % 2)) {
> >     uint16_t left = 0;
> >     *(unsigned char *)&left = *(const unsigned char *)end;
> >     sum += left;
> > }
> > ----------------------------------------------------------------------
> > -
> > ---------------------------------------
> > Note that the last block is the same as before. Amazingly I see no
> > measurable performance hit from this compared to the previous one
> > (-O3, march=native). Looking at the previous the loop body may compile
> > to
> > (x86):
> > ----------------------------------------------------------------------
> > -
> > ---------------------------------------
> > vmovdqa (%rdx),%xmm1
> > vpmovzxwd %xmm1,%xmm0
> > vpsrldq $0x8,%xmm1,%xmm1
> > vpmovzxwd %xmm1,%xmm1
> > vpaddd %xmm1,%xmm0,%xmm0
> > cmp    $0xf,%rax
> > jbe    0x7ff7a0dfb1a9
> > ----------------------------------------------------------------------
> > -
> > ---------------------------------------
> > while Mattias' memcpy solution:
> > ----------------------------------------------------------------------
> > -
> > ---------------------------------------
> > vmovdqu (%rcx),%ymm0
> > add    $0x20,%rcx
> > vpmovzxwd %xmm0,%ymm1
> > vextracti128 $0x1,%ymm0,%xmm0
> > vpmovzxwd %xmm0,%ymm0
> > vpaddd %ymm0,%ymm1,%ymm0
> > vpaddd %ymm0,%ymm2,%ymm2
> > cmp    %r9,%rcx
> > jne    0x555555556380
> > ----------------------------------------------------------------------
> > -
> > ---------------------------------------
> > Thus two extra instructions in the loop, but I suspect it may be
> > memory bound, leading to no measurable performance difference.
> >
> > Any comments?
> >
> > /Emil

Morten Brørup June 21, 2022, 8:05 a.m. UTC | #6

+TO: @Bruce and @Stephen: You signed off on the 16 bit alignment requirement. We need background info on this.

> From: Emil Berg [mailto:emil.berg@ericsson.com]
> Sent: Tuesday, 21 June 2022 09.17
> 
> > From: Morten Brørup <mb@smartsharesystems.com>
> > Sent: den 20 juni 2022 12:58
> >
> > > From: Emil Berg [mailto:emil.berg@ericsson.com]
> > > Sent: Monday, 20 June 2022 12.38
> > >
> > > > From: Morten Brørup <mb@smartsharesystems.com>
> > > > Sent: den 17 juni 2022 11:07
> > > >
> > > > > From: Morten Brørup [mailto:mb@smartsharesystems.com]
> > > > > Sent: Friday, 17 June 2022 10.45
> > > > >
> > > > > With this patch, the checksum can be calculated on an unligned
> > > > > part
> > > of
> > > > > a packet buffer.
> > > > > I.e. the buf parameter is no longer required to be 16 bit
> aligned.
> > > > >
> > > > > The DPDK invariant that packet buffers must be 16 bit aligned
> > > remains
> > > > > unchanged.
> > > > > This invariant also defines how to calculate the 16 bit
> checksum
> > > > > on
> > > an
> > > > > unaligned part of a packet buffer.
> > > > >
> > > > > Bugzilla ID: 1035
> > > > > Cc: stable@dpdk.org
> > > > >
> > > > > Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> > > > > ---
> > > > >  lib/net/rte_ip.h | 17 +++++++++++++++--
> > > > >  1 file changed, 15 insertions(+), 2 deletions(-)
> > > > >
> > > > > diff --git a/lib/net/rte_ip.h b/lib/net/rte_ip.h index
> > > > > b502481670..8e301d9c26 100644
> > > > > --- a/lib/net/rte_ip.h
> > > > > +++ b/lib/net/rte_ip.h
> > > > > @@ -162,9 +162,22 @@ __rte_raw_cksum(const void *buf, size_t
> len,
> > > > > uint32_t sum)  {
> > > > >  	/* extend strict-aliasing rules */
> > > > >  	typedef uint16_t __attribute__((__may_alias__)) u16_p;
> > > > > -	const u16_p *u16_buf = (const u16_p *)buf;
> > > > > -	const u16_p *end = u16_buf + len / sizeof(*u16_buf);
> > > > > +	const u16_p *u16_buf;
> > > > > +	const u16_p *end;
> > > > > +
> > > > > +	/* if buffer is unaligned, keeping it byte order
> independent */
> > > > > +	if (unlikely((uintptr_t)buf & 1)) {
> > > > > +		uint16_t first = 0;
> > > > > +		if (unlikely(len == 0))
> > > > > +			return 0;
> > > > > +		((unsigned char *)&first)[1] = *(const unsigned
> > > > char *)buf;
> > > > > +		sum += first;
> > > > > +		buf = (const void *)((uintptr_t)buf + 1);
> > > > > +		len--;
> > > > > +	}
> > > > >
> > > > > +	u16_buf = (const u16_p *)buf;
> > > > > +	end = u16_buf + len / sizeof(*u16_buf);
> > > > >  	for (; u16_buf != end; ++u16_buf)
> > > > >  		sum += *u16_buf;
> > > > >
> > > > > --
> > > > > 2.17.1
> > > >
> > > > @Emil, can you please test this patch with an unaligned buffer on
> > > your
> > > > application to confirm that it produces the expected result.
> > >
> > > Hi!
> > >
> > > I tested the patch. It doesn't seem to produce the same results. I
> > > think the problem is that it always starts summing from an even
> > > address, the sum should always start from the first byte according
> to
> > > the checksum specification. Can I instead propose something Mattias
> > > Rönnblom sent me?
> >
> > I assume that it produces the same result when the "buf" parameter is
> > aligned?
> >
> > And when the "buf" parameter is unaligned, I don't expect it to
> produce the
> > same results as the simple algorithm!
> >
> > This was the whole point of the patch: I expect the overall packet
> buffer to
> > be 16 bit aligned, and the checksum to be a partial checksum of such
> a 16 bit
> > aligned packet buffer. When calling this function, I assume that the
> "buf" and
> > "len" parameters point to a part of such a packet buffer. If these
> > expectations are correct, the simple algorithm will produce incorrect
> results
> > when "buf" is unaligned.
> >
> > I was asking you to test if the checksum on the packet is correct
> when your
> > application modifies an unaligned part of the packet and uses this
> function to
> > update the checksum.
> >
> 
> Now I understand your use case. Your use case seems to be about partial
> checksums, of which some partial checksums may start on unaligned
> addresses in an otherwise aligned packet.
> 
> Our use case is about calculating the full checksum on a nested packet.
> That nested packet may start on unaligned addresses.
> 
> The difference is basically if we want to sum over aligned addresses or
> not, handling the heading and trailing bytes appropriately.
> 
> Your method does not work in our case since we want to treat the first
> two bytes as the first word in our case. But I do understand that both
> methods are useful.

Yes, that certainly are two different use cases, requiring two different ways of calculating the 16 bit checksum.

> 
> Note that your method breaks the API. Previously (assuming no crashing
> due to low optimization levels, more accepting hardware, or a different
> compiler (version)) the current method would calculate the checksum
> assuming the first two bytes is the first word.
> 

Depending on the point of view, my patch either fixes a bug (where the checksum was calculated incorrectly when the buf pointer was unaligned) or breaks the API (by calculating the differently when the buffer is unaligned).

I cannot say with certainty which one is correct, but perhaps some of the people with a deeper DPDK track record can...

@Bruce and @Stephen, in 2019 you signed off on a patch [1] introducing a 16 bit alignment requirement to the Ethernet address structure.

It is my understanding that DPDK has an invariant requiring packets to be 16 bit aligned, which that patch supports. Is this invariant documented anywhere, or am I completely wrong? If I'm wrong, then the alignment requirement introduced in that patch needs to be removed, as well as any similar alignment requirements elsewhere in DPDK.

[1] http://git.dpdk.org/dpdk/commit/lib/librte_net/rte_ether.h?id=da5350ef29afd35c1adabe76f60832f3092269ad

@Emil, we should wait for a conclusion about the alignment invariant before we proceed.

If there is no such invariant, my patch is wrong, and we need to provide a v2 of the patch, which will then fit your use case.
If there is such an invariant, my patch is correct, and another function must be added for your use case.

Bruce Richardson June 21, 2022, 8:23 a.m. UTC | #7

On Tue, Jun 21, 2022 at 10:05:15AM +0200, Morten Brørup wrote:
> +TO: @Bruce and @Stephen: You signed off on the 16 bit alignment requirement. We need background info on this.
> 
> > From: Emil Berg [mailto:emil.berg@ericsson.com]
> > Sent: Tuesday, 21 June 2022 09.17
> > 
> > > From: Morten Brørup <mb@smartsharesystems.com>
> > > Sent: den 20 juni 2022 12:58
> > >
> > > > From: Emil Berg [mailto:emil.berg@ericsson.com]
> > > > Sent: Monday, 20 June 2022 12.38
> > > >
> > > > > From: Morten Brørup <mb@smartsharesystems.com>
> > > > > Sent: den 17 juni 2022 11:07
> > > > >
> > > > > > From: Morten Brørup [mailto:mb@smartsharesystems.com]
> > > > > > Sent: Friday, 17 June 2022 10.45
> > > > > >
> > > > > > With this patch, the checksum can be calculated on an unligned
> > > > > > part
> > > > of
> > > > > > a packet buffer.
> > > > > > I.e. the buf parameter is no longer required to be 16 bit
> > aligned.
> > > > > >
> > > > > > The DPDK invariant that packet buffers must be 16 bit aligned
> > > > remains
> > > > > > unchanged.
> > > > > > This invariant also defines how to calculate the 16 bit
> > checksum
> > > > > > on
> > > > an
> > > > > > unaligned part of a packet buffer.
> > > > > >
> > > > > > Bugzilla ID: 1035
> > > > > > Cc: stable@dpdk.org
> > > > > >
> > > > > > Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> > > > > > ---
> > > > > >  lib/net/rte_ip.h | 17 +++++++++++++++--
> > > > > >  1 file changed, 15 insertions(+), 2 deletions(-)
> > > > > >
> > > > > > diff --git a/lib/net/rte_ip.h b/lib/net/rte_ip.h index
> > > > > > b502481670..8e301d9c26 100644
> > > > > > --- a/lib/net/rte_ip.h
> > > > > > +++ b/lib/net/rte_ip.h
> > > > > > @@ -162,9 +162,22 @@ __rte_raw_cksum(const void *buf, size_t
> > len,
> > > > > > uint32_t sum)  {
> > > > > >  	/* extend strict-aliasing rules */
> > > > > >  	typedef uint16_t __attribute__((__may_alias__)) u16_p;
> > > > > > -	const u16_p *u16_buf = (const u16_p *)buf;
> > > > > > -	const u16_p *end = u16_buf + len / sizeof(*u16_buf);
> > > > > > +	const u16_p *u16_buf;
> > > > > > +	const u16_p *end;
> > > > > > +
> > > > > > +	/* if buffer is unaligned, keeping it byte order
> > independent */
> > > > > > +	if (unlikely((uintptr_t)buf & 1)) {
> > > > > > +		uint16_t first = 0;
> > > > > > +		if (unlikely(len == 0))
> > > > > > +			return 0;
> > > > > > +		((unsigned char *)&first)[1] = *(const unsigned
> > > > > char *)buf;
> > > > > > +		sum += first;
> > > > > > +		buf = (const void *)((uintptr_t)buf + 1);
> > > > > > +		len--;
> > > > > > +	}
> > > > > >
> > > > > > +	u16_buf = (const u16_p *)buf;
> > > > > > +	end = u16_buf + len / sizeof(*u16_buf);
> > > > > >  	for (; u16_buf != end; ++u16_buf)
> > > > > >  		sum += *u16_buf;
> > > > > >
> > > > > > --
> > > > > > 2.17.1
> > > > >
> > > > > @Emil, can you please test this patch with an unaligned buffer on
> > > > your
> > > > > application to confirm that it produces the expected result.
> > > >
> > > > Hi!
> > > >
> > > > I tested the patch. It doesn't seem to produce the same results. I
> > > > think the problem is that it always starts summing from an even
> > > > address, the sum should always start from the first byte according
> > to
> > > > the checksum specification. Can I instead propose something Mattias
> > > > Rönnblom sent me?
> > >
> > > I assume that it produces the same result when the "buf" parameter is
> > > aligned?
> > >
> > > And when the "buf" parameter is unaligned, I don't expect it to
> > produce the
> > > same results as the simple algorithm!
> > >
> > > This was the whole point of the patch: I expect the overall packet
> > buffer to
> > > be 16 bit aligned, and the checksum to be a partial checksum of such
> > a 16 bit
> > > aligned packet buffer. When calling this function, I assume that the
> > "buf" and
> > > "len" parameters point to a part of such a packet buffer. If these
> > > expectations are correct, the simple algorithm will produce incorrect
> > results
> > > when "buf" is unaligned.
> > >
> > > I was asking you to test if the checksum on the packet is correct
> > when your
> > > application modifies an unaligned part of the packet and uses this
> > function to
> > > update the checksum.
> > >
> > 
> > Now I understand your use case. Your use case seems to be about partial
> > checksums, of which some partial checksums may start on unaligned
> > addresses in an otherwise aligned packet.
> > 
> > Our use case is about calculating the full checksum on a nested packet.
> > That nested packet may start on unaligned addresses.
> > 
> > The difference is basically if we want to sum over aligned addresses or
> > not, handling the heading and trailing bytes appropriately.
> > 
> > Your method does not work in our case since we want to treat the first
> > two bytes as the first word in our case. But I do understand that both
> > methods are useful.
> 
> Yes, that certainly are two different use cases, requiring two different ways of calculating the 16 bit checksum.
> 
> > 
> > Note that your method breaks the API. Previously (assuming no crashing
> > due to low optimization levels, more accepting hardware, or a different
> > compiler (version)) the current method would calculate the checksum
> > assuming the first two bytes is the first word.
> > 
> 
> Depending on the point of view, my patch either fixes a bug (where the checksum was calculated incorrectly when the buf pointer was unaligned) or breaks the API (by calculating the differently when the buffer is unaligned).
> 
> I cannot say with certainty which one is correct, but perhaps some of the people with a deeper DPDK track record can...
> 
> @Bruce and @Stephen, in 2019 you signed off on a patch [1] introducing a 16 bit alignment requirement to the Ethernet address structure.
> 
> It is my understanding that DPDK has an invariant requiring packets to be 16 bit aligned, which that patch supports. Is this invariant documented anywhere, or am I completely wrong? If I'm wrong, then the alignment requirement introduced in that patch needs to be removed, as well as any similar alignment requirements elsewhere in DPDK.

I don't believe it is explicitly documented as a global invariant, but I
think it should be unless there is a definite case where we need to allow
packets to be completely unaligned. Across all packet headers we looked at,
there was no tunneling protocol where the resulting packet was left
unaligned.

That said, if there are real use cases where we need to allow packets to
start at an unaligned address, then I agree with you that we need to roll
back the patch and work to ensure everything works with unaligned
addresses.

/Bruce

> 
> [1] http://git.dpdk.org/dpdk/commit/lib/librte_net/rte_ether.h?id=da5350ef29afd35c1adabe76f60832f3092269ad
> 
> @Emil, we should wait for a conclusion about the alignment invariant before we proceed.
> 
> If there is no such invariant, my patch is wrong, and we need to provide a v2 of the patch, which will then fit your use case.
> If there is such an invariant, my patch is correct, and another function must be added for your use case.
>

Morten Brørup June 21, 2022, 9:35 a.m. UTC | #8

> From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> Sent: Tuesday, 21 June 2022 10.23
> 
> On Tue, Jun 21, 2022 at 10:05:15AM +0200, Morten Brørup wrote:
> > +TO: @Bruce and @Stephen: You signed off on the 16 bit alignment
> requirement. We need background info on this.
> >
> > > From: Emil Berg [mailto:emil.berg@ericsson.com]
> > > Sent: Tuesday, 21 June 2022 09.17
> > >
> > > > From: Morten Brørup <mb@smartsharesystems.com>
> > > > Sent: den 20 juni 2022 12:58
> > > >
> > > > > From: Emil Berg [mailto:emil.berg@ericsson.com]
> > > > > Sent: Monday, 20 June 2022 12.38
> > > > >
> > > > > > From: Morten Brørup <mb@smartsharesystems.com>
> > > > > > Sent: den 17 juni 2022 11:07
> > > > > >
> > > > > > > From: Morten Brørup [mailto:mb@smartsharesystems.com]
> > > > > > > Sent: Friday, 17 June 2022 10.45
> > > > > > >
> > > > > > > With this patch, the checksum can be calculated on an
> unligned
> > > > > > > part
> > > > > of
> > > > > > > a packet buffer.
> > > > > > > I.e. the buf parameter is no longer required to be 16 bit
> > > aligned.
> > > > > > >
> > > > > > > The DPDK invariant that packet buffers must be 16 bit
> aligned
> > > > > remains
> > > > > > > unchanged.
> > > > > > > This invariant also defines how to calculate the 16 bit
> > > checksum
> > > > > > > on
> > > > > an
> > > > > > > unaligned part of a packet buffer.
> > > > > > >
> > > > > > > Bugzilla ID: 1035
> > > > > > > Cc: stable@dpdk.org
> > > > > > >
> > > > > > > Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> > > > > > > ---
> > > > > > >  lib/net/rte_ip.h | 17 +++++++++++++++--
> > > > > > >  1 file changed, 15 insertions(+), 2 deletions(-)
> > > > > > >
> > > > > > > diff --git a/lib/net/rte_ip.h b/lib/net/rte_ip.h index
> > > > > > > b502481670..8e301d9c26 100644
> > > > > > > --- a/lib/net/rte_ip.h
> > > > > > > +++ b/lib/net/rte_ip.h
> > > > > > > @@ -162,9 +162,22 @@ __rte_raw_cksum(const void *buf,
> size_t
> > > len,
> > > > > > > uint32_t sum)  {
> > > > > > >  	/* extend strict-aliasing rules */
> > > > > > >  	typedef uint16_t __attribute__((__may_alias__))
> u16_p;
> > > > > > > -	const u16_p *u16_buf = (const u16_p *)buf;
> > > > > > > -	const u16_p *end = u16_buf + len / sizeof(*u16_buf);
> > > > > > > +	const u16_p *u16_buf;
> > > > > > > +	const u16_p *end;
> > > > > > > +
> > > > > > > +	/* if buffer is unaligned, keeping it byte order
> > > independent */
> > > > > > > +	if (unlikely((uintptr_t)buf & 1)) {
> > > > > > > +		uint16_t first = 0;
> > > > > > > +		if (unlikely(len == 0))
> > > > > > > +			return 0;
> > > > > > > +		((unsigned char *)&first)[1] = *(const unsigned
> > > > > > char *)buf;
> > > > > > > +		sum += first;
> > > > > > > +		buf = (const void *)((uintptr_t)buf + 1);
> > > > > > > +		len--;
> > > > > > > +	}
> > > > > > >
> > > > > > > +	u16_buf = (const u16_p *)buf;
> > > > > > > +	end = u16_buf + len / sizeof(*u16_buf);
> > > > > > >  	for (; u16_buf != end; ++u16_buf)
> > > > > > >  		sum += *u16_buf;
> > > > > > >
> > > > > > > --
> > > > > > > 2.17.1
> > > > > >
> > > > > > @Emil, can you please test this patch with an unaligned
> buffer on
> > > > > your
> > > > > > application to confirm that it produces the expected result.
> > > > >
> > > > > Hi!
> > > > >
> > > > > I tested the patch. It doesn't seem to produce the same
> results. I
> > > > > think the problem is that it always starts summing from an even
> > > > > address, the sum should always start from the first byte
> according
> > > to
> > > > > the checksum specification. Can I instead propose something
> Mattias
> > > > > Rönnblom sent me?
> > > >
> > > > I assume that it produces the same result when the "buf"
> parameter is
> > > > aligned?
> > > >
> > > > And when the "buf" parameter is unaligned, I don't expect it to
> > > produce the
> > > > same results as the simple algorithm!
> > > >
> > > > This was the whole point of the patch: I expect the overall
> packet
> > > buffer to
> > > > be 16 bit aligned, and the checksum to be a partial checksum of
> such
> > > a 16 bit
> > > > aligned packet buffer. When calling this function, I assume that
> the
> > > "buf" and
> > > > "len" parameters point to a part of such a packet buffer. If
> these
> > > > expectations are correct, the simple algorithm will produce
> incorrect
> > > results
> > > > when "buf" is unaligned.
> > > >
> > > > I was asking you to test if the checksum on the packet is correct
> > > when your
> > > > application modifies an unaligned part of the packet and uses
> this
> > > function to
> > > > update the checksum.
> > > >
> > >
> > > Now I understand your use case. Your use case seems to be about
> partial
> > > checksums, of which some partial checksums may start on unaligned
> > > addresses in an otherwise aligned packet.
> > >
> > > Our use case is about calculating the full checksum on a nested
> packet.
> > > That nested packet may start on unaligned addresses.
> > >
> > > The difference is basically if we want to sum over aligned
> addresses or
> > > not, handling the heading and trailing bytes appropriately.
> > >
> > > Your method does not work in our case since we want to treat the
> first
> > > two bytes as the first word in our case. But I do understand that
> both
> > > methods are useful.
> >
> > Yes, that certainly are two different use cases, requiring two
> different ways of calculating the 16 bit checksum.
> >
> > >
> > > Note that your method breaks the API. Previously (assuming no
> crashing
> > > due to low optimization levels, more accepting hardware, or a
> different
> > > compiler (version)) the current method would calculate the checksum
> > > assuming the first two bytes is the first word.
> > >
> >
> > Depending on the point of view, my patch either fixes a bug (where
> the checksum was calculated incorrectly when the buf pointer was
> unaligned) or breaks the API (by calculating the differently when the
> buffer is unaligned).
> >
> > I cannot say with certainty which one is correct, but perhaps some of
> the people with a deeper DPDK track record can...
> >
> > @Bruce and @Stephen, in 2019 you signed off on a patch [1]
> introducing a 16 bit alignment requirement to the Ethernet address
> structure.
> >
> > It is my understanding that DPDK has an invariant requiring packets
> to be 16 bit aligned, which that patch supports. Is this invariant
> documented anywhere, or am I completely wrong? If I'm wrong, then the
> alignment requirement introduced in that patch needs to be removed, as
> well as any similar alignment requirements elsewhere in DPDK.
> 
> I don't believe it is explicitly documented as a global invariant, but
> I
> think it should be unless there is a definite case where we need to
> allow
> packets to be completely unaligned. Across all packet headers we looked
> at,
> there was no tunneling protocol where the resulting packet was left
> unaligned.
> 
> That said, if there are real use cases where we need to allow packets
> to
> start at an unaligned address, then I agree with you that we need to
> roll
> back the patch and work to ensure everything works with unaligned
> addresses.
> 
> /Bruce
>

@Emil, can you please describe or refer to which tunneling protocol you are using, where the nested packet can be unaligned?

I am asking to determine if your use case is exotic (maybe some Ericsson proprietary protocol), or more generic (rooted in some standard protocol). This information affects the DPDK community's opinion about how it should be supported by DPDK.

If possible, please provide more details about the tunneling protocol and nested packets... E.g. do the nested packets also contain Layer 2 (Ethernet, VLAN, etc.) headers, or only Layer 3 (IP) or Layer 4 (TCP, UDP, etc.)? And how about ARP packets and Layer 2 control protocol packets (STP, LACP, etc.)?

> >
> > [1]
> http://git.dpdk.org/dpdk/commit/lib/librte_net/rte_ether.h?id=da5350ef2
> 9afd35c1adabe76f60832f3092269ad
> >
> > @Emil, we should wait for a conclusion about the alignment invariant
> before we proceed.
> >
> > If there is no such invariant, my patch is wrong, and we need to
> provide a v2 of the patch, which will then fit your use case.
> > If there is such an invariant, my patch is correct, and another
> function must be added for your use case.
> >

Emil Berg June 22, 2022, 6:26 a.m. UTC | #9

> -----Original Message-----
> From: Morten Brørup <mb@smartsharesystems.com>
> Sent: den 21 juni 2022 11:35
> To: Emil Berg <emil.berg@ericsson.com>
> Cc: Bruce Richardson <bruce.richardson@intel.com>; Stephen Hemminger
> <stephen@networkplumber.org>; stable@dpdk.org; bugzilla@dpdk.org;
> hofors@lysator.liu.se; olivier.matz@6wind.com; dev@dpdk.org
> Subject: RE: [PATCH] net: fix checksum with unaligned buffer
> 
> > From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> > Sent: Tuesday, 21 June 2022 10.23
> >
> > On Tue, Jun 21, 2022 at 10:05:15AM +0200, Morten Brørup wrote:
> > > +TO: @Bruce and @Stephen: You signed off on the 16 bit alignment
> > requirement. We need background info on this.
> > >
> > > > From: Emil Berg [mailto:emil.berg@ericsson.com]
> > > > Sent: Tuesday, 21 June 2022 09.17
> > > >
> > > > > From: Morten Brørup <mb@smartsharesystems.com>
> > > > > Sent: den 20 juni 2022 12:58
> > > > >
> > > > > > From: Emil Berg [mailto:emil.berg@ericsson.com]
> > > > > > Sent: Monday, 20 June 2022 12.38
> > > > > >
> > > > > > > From: Morten Brørup <mb@smartsharesystems.com>
> > > > > > > Sent: den 17 juni 2022 11:07
> > > > > > >
> > > > > > > > From: Morten Brørup [mailto:mb@smartsharesystems.com]
> > > > > > > > Sent: Friday, 17 June 2022 10.45
> > > > > > > >
> > > > > > > > With this patch, the checksum can be calculated on an
> > unligned
> > > > > > > > part
> > > > > > of
> > > > > > > > a packet buffer.
> > > > > > > > I.e. the buf parameter is no longer required to be 16 bit
> > > > aligned.
> > > > > > > >
> > > > > > > > The DPDK invariant that packet buffers must be 16 bit
> > aligned
> > > > > > remains
> > > > > > > > unchanged.
> > > > > > > > This invariant also defines how to calculate the 16 bit
> > > > checksum
> > > > > > > > on
> > > > > > an
> > > > > > > > unaligned part of a packet buffer.
> > > > > > > >
> > > > > > > > Bugzilla ID: 1035
> > > > > > > > Cc: stable@dpdk.org
> > > > > > > >
> > > > > > > > Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> > > > > > > > ---
> > > > > > > >  lib/net/rte_ip.h | 17 +++++++++++++++--
> > > > > > > >  1 file changed, 15 insertions(+), 2 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/lib/net/rte_ip.h b/lib/net/rte_ip.h index
> > > > > > > > b502481670..8e301d9c26 100644
> > > > > > > > --- a/lib/net/rte_ip.h
> > > > > > > > +++ b/lib/net/rte_ip.h
> > > > > > > > @@ -162,9 +162,22 @@ __rte_raw_cksum(const void *buf,
> > size_t
> > > > len,
> > > > > > > > uint32_t sum)  {
> > > > > > > >  	/* extend strict-aliasing rules */
> > > > > > > >  	typedef uint16_t __attribute__((__may_alias__))
> > u16_p;
> > > > > > > > -	const u16_p *u16_buf = (const u16_p *)buf;
> > > > > > > > -	const u16_p *end = u16_buf + len / sizeof(*u16_buf);
> > > > > > > > +	const u16_p *u16_buf;
> > > > > > > > +	const u16_p *end;
> > > > > > > > +
> > > > > > > > +	/* if buffer is unaligned, keeping it byte order
> > > > independent */
> > > > > > > > +	if (unlikely((uintptr_t)buf & 1)) {
> > > > > > > > +		uint16_t first = 0;
> > > > > > > > +		if (unlikely(len == 0))
> > > > > > > > +			return 0;
> > > > > > > > +		((unsigned char *)&first)[1] = *(const unsigned
> > > > > > > char *)buf;
> > > > > > > > +		sum += first;
> > > > > > > > +		buf = (const void *)((uintptr_t)buf + 1);
> > > > > > > > +		len--;
> > > > > > > > +	}
> > > > > > > >
> > > > > > > > +	u16_buf = (const u16_p *)buf;
> > > > > > > > +	end = u16_buf + len / sizeof(*u16_buf);
> > > > > > > >  	for (; u16_buf != end; ++u16_buf)
> > > > > > > >  		sum += *u16_buf;
> > > > > > > >
> > > > > > > > --
> > > > > > > > 2.17.1
> > > > > > >
> > > > > > > @Emil, can you please test this patch with an unaligned
> > buffer on
> > > > > > your
> > > > > > > application to confirm that it produces the expected result.
> > > > > >
> > > > > > Hi!
> > > > > >
> > > > > > I tested the patch. It doesn't seem to produce the same
> > results. I
> > > > > > think the problem is that it always starts summing from an
> > > > > > even address, the sum should always start from the first byte
> > according
> > > > to
> > > > > > the checksum specification. Can I instead propose something
> > Mattias
> > > > > > Rönnblom sent me?
> > > > >
> > > > > I assume that it produces the same result when the "buf"
> > parameter is
> > > > > aligned?
> > > > >
> > > > > And when the "buf" parameter is unaligned, I don't expect it to
> > > > produce the
> > > > > same results as the simple algorithm!
> > > > >
> > > > > This was the whole point of the patch: I expect the overall
> > packet
> > > > buffer to
> > > > > be 16 bit aligned, and the checksum to be a partial checksum of
> > such
> > > > a 16 bit
> > > > > aligned packet buffer. When calling this function, I assume that
> > the
> > > > "buf" and
> > > > > "len" parameters point to a part of such a packet buffer. If
> > these
> > > > > expectations are correct, the simple algorithm will produce
> > incorrect
> > > > results
> > > > > when "buf" is unaligned.
> > > > >
> > > > > I was asking you to test if the checksum on the packet is
> > > > > correct
> > > > when your
> > > > > application modifies an unaligned part of the packet and uses
> > this
> > > > function to
> > > > > update the checksum.
> > > > >
> > > >
> > > > Now I understand your use case. Your use case seems to be about
> > partial
> > > > checksums, of which some partial checksums may start on unaligned
> > > > addresses in an otherwise aligned packet.
> > > >
> > > > Our use case is about calculating the full checksum on a nested
> > packet.
> > > > That nested packet may start on unaligned addresses.
> > > >
> > > > The difference is basically if we want to sum over aligned
> > addresses or
> > > > not, handling the heading and trailing bytes appropriately.
> > > >
> > > > Your method does not work in our case since we want to treat the
> > first
> > > > two bytes as the first word in our case. But I do understand that
> > both
> > > > methods are useful.
> > >
> > > Yes, that certainly are two different use cases, requiring two
> > different ways of calculating the 16 bit checksum.
> > >
> > > >
> > > > Note that your method breaks the API. Previously (assuming no
> > crashing
> > > > due to low optimization levels, more accepting hardware, or a
> > different
> > > > compiler (version)) the current method would calculate the
> > > > checksum assuming the first two bytes is the first word.
> > > >
> > >
> > > Depending on the point of view, my patch either fixes a bug (where
> > the checksum was calculated incorrectly when the buf pointer was
> > unaligned) or breaks the API (by calculating the differently when the
> > buffer is unaligned).
> > >
> > > I cannot say with certainty which one is correct, but perhaps some
> > > of
> > the people with a deeper DPDK track record can...
> > >
> > > @Bruce and @Stephen, in 2019 you signed off on a patch [1]
> > introducing a 16 bit alignment requirement to the Ethernet address
> > structure.
> > >
> > > It is my understanding that DPDK has an invariant requiring packets
> > to be 16 bit aligned, which that patch supports. Is this invariant
> > documented anywhere, or am I completely wrong? If I'm wrong, then the
> > alignment requirement introduced in that patch needs to be removed, as
> > well as any similar alignment requirements elsewhere in DPDK.
> >
> > I don't believe it is explicitly documented as a global invariant, but
> > I think it should be unless there is a definite case where we need to
> > allow packets to be completely unaligned. Across all packet headers we
> > looked at, there was no tunneling protocol where the resulting packet
> > was left unaligned.
> >
> > That said, if there are real use cases where we need to allow packets
> > to start at an unaligned address, then I agree with you that we need
> > to roll back the patch and work to ensure everything works with
> > unaligned addresses.
> >
> > /Bruce
> >
> 
> @Emil, can you please describe or refer to which tunneling protocol you are
> using, where the nested packet can be unaligned?
> 
> I am asking to determine if your use case is exotic (maybe some Ericsson
> proprietary protocol), or more generic (rooted in some standard protocol).
> This information affects the DPDK community's opinion about how it should
> be supported by DPDK.
> 
> If possible, please provide more details about the tunneling protocol and
> nested packets... E.g. do the nested packets also contain Layer 2 (Ethernet,
> VLAN, etc.) headers, or only Layer 3 (IP) or Layer 4 (TCP, UDP, etc.)? And how
> about ARP packets and Layer 2 control protocol packets (STP, LACP, etc.)?
> 

Well, if you append or adjust an odd number of bytes (e.g. a PDCP header) from a previously aligned payload the entire packet will then be unaligned.

> > >
> > > [1]
> > https://protect2.fireeye.com/v1/url?k=31323334-501d5122-313273af-45444
> > 5555731-713e91ae28ea4a95&q=1&e=91f8f355-4366-43bd-ac93-
> 4c2f375f8d25&u=
> >
> http%3A%2F%2Fgit.dpdk.org%2Fdpdk%2Fcommit%2Flib%2Flibrte_net%2Frt
> e_eth
> > er.h%3Fid%3Dda5350ef2
> > 9afd35c1adabe76f60832f3092269ad
> > >
> > > @Emil, we should wait for a conclusion about the alignment invariant
> > before we proceed.
> > >
> > > If there is no such invariant, my patch is wrong, and we need to
> > provide a v2 of the patch, which will then fit your use case.
> > > If there is such an invariant, my patch is correct, and another
> > function must be added for your use case.
> > >

Bruce Richardson June 22, 2022, 9:18 a.m. UTC | #10

On Wed, Jun 22, 2022 at 06:26:07AM +0000, Emil Berg wrote:
> 
> 
> > -----Original Message-----
> > From: Morten Brørup <mb@smartsharesystems.com>
> > Sent: den 21 juni 2022 11:35
> > To: Emil Berg <emil.berg@ericsson.com>
> > Cc: Bruce Richardson <bruce.richardson@intel.com>; Stephen Hemminger
> > <stephen@networkplumber.org>; stable@dpdk.org; bugzilla@dpdk.org;
> > hofors@lysator.liu.se; olivier.matz@6wind.com; dev@dpdk.org
> > Subject: RE: [PATCH] net: fix checksum with unaligned buffer
> > 
> > > From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> > > Sent: Tuesday, 21 June 2022 10.23
> > >
> > > On Tue, Jun 21, 2022 at 10:05:15AM +0200, Morten Brørup wrote:
> > > > +TO: @Bruce and @Stephen: You signed off on the 16 bit alignment
> > > requirement. We need background info on this.
> > > >
> > > > > From: Emil Berg [mailto:emil.berg@ericsson.com]
> > > > > Sent: Tuesday, 21 June 2022 09.17
> > > > >
> > > > > > From: Morten Brørup <mb@smartsharesystems.com>
> > > > > > Sent: den 20 juni 2022 12:58
> > > > > >
> > > > > > > From: Emil Berg [mailto:emil.berg@ericsson.com]
> > > > > > > Sent: Monday, 20 June 2022 12.38
> > > > > > >
> > > > > > > > From: Morten Brørup <mb@smartsharesystems.com>
> > > > > > > > Sent: den 17 juni 2022 11:07
> > > > > > > >
> > > > > > > > > From: Morten Brørup [mailto:mb@smartsharesystems.com]
> > > > > > > > > Sent: Friday, 17 June 2022 10.45
> > > > > > > > >
> > > > > > > > > With this patch, the checksum can be calculated on an
> > > unligned
> > > > > > > > > part
> > > > > > > of
> > > > > > > > > a packet buffer.
> > > > > > > > > I.e. the buf parameter is no longer required to be 16 bit
> > > > > aligned.
> > > > > > > > >
> > > > > > > > > The DPDK invariant that packet buffers must be 16 bit
> > > aligned
> > > > > > > remains
> > > > > > > > > unchanged.
> > > > > > > > > This invariant also defines how to calculate the 16 bit
> > > > > checksum
> > > > > > > > > on
> > > > > > > an
> > > > > > > > > unaligned part of a packet buffer.
> > > > > > > > >
> > > > > > > > > Bugzilla ID: 1035
> > > > > > > > > Cc: stable@dpdk.org
> > > > > > > > >
> > > > > > > > > Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> > > > > > > > > ---
> > > > > > > > >  lib/net/rte_ip.h | 17 +++++++++++++++--
> > > > > > > > >  1 file changed, 15 insertions(+), 2 deletions(-)
> > > > > > > > >
> > > > > > > > > diff --git a/lib/net/rte_ip.h b/lib/net/rte_ip.h index
> > > > > > > > > b502481670..8e301d9c26 100644
> > > > > > > > > --- a/lib/net/rte_ip.h
> > > > > > > > > +++ b/lib/net/rte_ip.h
> > > > > > > > > @@ -162,9 +162,22 @@ __rte_raw_cksum(const void *buf,
> > > size_t
> > > > > len,
> > > > > > > > > uint32_t sum)  {
> > > > > > > > >  	/* extend strict-aliasing rules */
> > > > > > > > >  	typedef uint16_t __attribute__((__may_alias__))
> > > u16_p;
> > > > > > > > > -	const u16_p *u16_buf = (const u16_p *)buf;
> > > > > > > > > -	const u16_p *end = u16_buf + len / sizeof(*u16_buf);
> > > > > > > > > +	const u16_p *u16_buf;
> > > > > > > > > +	const u16_p *end;
> > > > > > > > > +
> > > > > > > > > +	/* if buffer is unaligned, keeping it byte order
> > > > > independent */
> > > > > > > > > +	if (unlikely((uintptr_t)buf & 1)) {
> > > > > > > > > +		uint16_t first = 0;
> > > > > > > > > +		if (unlikely(len == 0))
> > > > > > > > > +			return 0;
> > > > > > > > > +		((unsigned char *)&first)[1] = *(const unsigned
> > > > > > > > char *)buf;
> > > > > > > > > +		sum += first;
> > > > > > > > > +		buf = (const void *)((uintptr_t)buf + 1);
> > > > > > > > > +		len--;
> > > > > > > > > +	}
> > > > > > > > >
> > > > > > > > > +	u16_buf = (const u16_p *)buf;
> > > > > > > > > +	end = u16_buf + len / sizeof(*u16_buf);
> > > > > > > > >  	for (; u16_buf != end; ++u16_buf)
> > > > > > > > >  		sum += *u16_buf;
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > 2.17.1
> > > > > > > >
> > > > > > > > @Emil, can you please test this patch with an unaligned
> > > buffer on
> > > > > > > your
> > > > > > > > application to confirm that it produces the expected result.
> > > > > > >
> > > > > > > Hi!
> > > > > > >
> > > > > > > I tested the patch. It doesn't seem to produce the same
> > > results. I
> > > > > > > think the problem is that it always starts summing from an
> > > > > > > even address, the sum should always start from the first byte
> > > according
> > > > > to
> > > > > > > the checksum specification. Can I instead propose something
> > > Mattias
> > > > > > > Rönnblom sent me?
> > > > > >
> > > > > > I assume that it produces the same result when the "buf"
> > > parameter is
> > > > > > aligned?
> > > > > >
> > > > > > And when the "buf" parameter is unaligned, I don't expect it to
> > > > > produce the
> > > > > > same results as the simple algorithm!
> > > > > >
> > > > > > This was the whole point of the patch: I expect the overall
> > > packet
> > > > > buffer to
> > > > > > be 16 bit aligned, and the checksum to be a partial checksum of
> > > such
> > > > > a 16 bit
> > > > > > aligned packet buffer. When calling this function, I assume that
> > > the
> > > > > "buf" and
> > > > > > "len" parameters point to a part of such a packet buffer. If
> > > these
> > > > > > expectations are correct, the simple algorithm will produce
> > > incorrect
> > > > > results
> > > > > > when "buf" is unaligned.
> > > > > >
> > > > > > I was asking you to test if the checksum on the packet is
> > > > > > correct
> > > > > when your
> > > > > > application modifies an unaligned part of the packet and uses
> > > this
> > > > > function to
> > > > > > update the checksum.
> > > > > >
> > > > >
> > > > > Now I understand your use case. Your use case seems to be about
> > > partial
> > > > > checksums, of which some partial checksums may start on unaligned
> > > > > addresses in an otherwise aligned packet.
> > > > >
> > > > > Our use case is about calculating the full checksum on a nested
> > > packet.
> > > > > That nested packet may start on unaligned addresses.
> > > > >
> > > > > The difference is basically if we want to sum over aligned
> > > addresses or
> > > > > not, handling the heading and trailing bytes appropriately.
> > > > >
> > > > > Your method does not work in our case since we want to treat the
> > > first
> > > > > two bytes as the first word in our case. But I do understand that
> > > both
> > > > > methods are useful.
> > > >
> > > > Yes, that certainly are two different use cases, requiring two
> > > different ways of calculating the 16 bit checksum.
> > > >
> > > > >
> > > > > Note that your method breaks the API. Previously (assuming no
> > > crashing
> > > > > due to low optimization levels, more accepting hardware, or a
> > > different
> > > > > compiler (version)) the current method would calculate the
> > > > > checksum assuming the first two bytes is the first word.
> > > > >
> > > >
> > > > Depending on the point of view, my patch either fixes a bug (where
> > > the checksum was calculated incorrectly when the buf pointer was
> > > unaligned) or breaks the API (by calculating the differently when the
> > > buffer is unaligned).
> > > >
> > > > I cannot say with certainty which one is correct, but perhaps some
> > > > of
> > > the people with a deeper DPDK track record can...
> > > >
> > > > @Bruce and @Stephen, in 2019 you signed off on a patch [1]
> > > introducing a 16 bit alignment requirement to the Ethernet address
> > > structure.
> > > >
> > > > It is my understanding that DPDK has an invariant requiring packets
> > > to be 16 bit aligned, which that patch supports. Is this invariant
> > > documented anywhere, or am I completely wrong? If I'm wrong, then the
> > > alignment requirement introduced in that patch needs to be removed, as
> > > well as any similar alignment requirements elsewhere in DPDK.
> > >
> > > I don't believe it is explicitly documented as a global invariant, but
> > > I think it should be unless there is a definite case where we need to
> > > allow packets to be completely unaligned. Across all packet headers we
> > > looked at, there was no tunneling protocol where the resulting packet
> > > was left unaligned.
> > >
> > > That said, if there are real use cases where we need to allow packets
> > > to start at an unaligned address, then I agree with you that we need
> > > to roll back the patch and work to ensure everything works with
> > > unaligned addresses.
> > >
> > > /Bruce
> > >
> > 
> > @Emil, can you please describe or refer to which tunneling protocol you are
> > using, where the nested packet can be unaligned?
> > 
> > I am asking to determine if your use case is exotic (maybe some Ericsson
> > proprietary protocol), or more generic (rooted in some standard protocol).
> > This information affects the DPDK community's opinion about how it should
> > be supported by DPDK.
> > 
> > If possible, please provide more details about the tunneling protocol and
> > nested packets... E.g. do the nested packets also contain Layer 2 (Ethernet,
> > VLAN, etc.) headers, or only Layer 3 (IP) or Layer 4 (TCP, UDP, etc.)? And how
> > about ARP packets and Layer 2 control protocol packets (STP, LACP, etc.)?
> > 
> 
> Well, if you append or adjust an odd number of bytes (e.g. a PDCP header) from a previously aligned payload the entire packet will then be unaligned.
> 

If PDCP headers can leave the rest of the packet field unaligned, then we
had better remove the alignment restrictions through all of DPDK.

/Bruce

Morten Brørup June 22, 2022, 11:26 a.m. UTC | #11

> From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> Sent: Wednesday, 22 June 2022 11.18
> 
> On Wed, Jun 22, 2022 at 06:26:07AM +0000, Emil Berg wrote:
> >
> > > From: Morten Brørup <mb@smartsharesystems.com>
> > > Sent: den 21 juni 2022 11:35
> > >
> > > > From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> > > > Sent: Tuesday, 21 June 2022 10.23
> > > >
> > > > On Tue, Jun 21, 2022 at 10:05:15AM +0200, Morten Brørup wrote:
> > > > > +TO: @Bruce and @Stephen: You signed off on the 16 bit
> alignment
> > > > requirement. We need background info on this.
> > > > >
> > > > > > From: Emil Berg [mailto:emil.berg@ericsson.com]
> > > > > > Sent: Tuesday, 21 June 2022 09.17
> > > > > >
> > > > > > > From: Morten Brørup <mb@smartsharesystems.com>
> > > > > > > Sent: den 20 juni 2022 12:58
> > > > > > >
> > > > > > > > From: Emil Berg [mailto:emil.berg@ericsson.com]
> > > > > > > > Sent: Monday, 20 June 2022 12.38
> > > > > > > >
> > > > > > > > > From: Morten Brørup <mb@smartsharesystems.com>
> > > > > > > > > Sent: den 17 juni 2022 11:07
> > > > > > > > >
> > > > > > > > > > From: Morten Brørup [mailto:mb@smartsharesystems.com]
> > > > > > > > > > Sent: Friday, 17 June 2022 10.45
> > > > > > > > > >
> > > > > > > > > > With this patch, the checksum can be calculated on an
> > > > unligned
> > > > > > > > > > part
> > > > > > > > of
> > > > > > > > > > a packet buffer.
> > > > > > > > > > I.e. the buf parameter is no longer required to be 16
> bit
> > > > > > aligned.
> > > > > > > > > >
> > > > > > > > > > The DPDK invariant that packet buffers must be 16 bit
> > > > aligned
> > > > > > > > remains
> > > > > > > > > > unchanged.
> > > > > > > > > > This invariant also defines how to calculate the 16
> bit
> > > > > > checksum
> > > > > > > > > > on
> > > > > > > > an
> > > > > > > > > > unaligned part of a packet buffer.
> > > > > > > > > >
> > > > > > > > > > Bugzilla ID: 1035
> > > > > > > > > > Cc: stable@dpdk.org
> > > > > > > > > >
> > > > > > > > > > Signed-off-by: Morten Brørup
> <mb@smartsharesystems.com>
> > > > > > > > > > ---
> > > > > > > > > >  lib/net/rte_ip.h | 17 +++++++++++++++--
> > > > > > > > > >  1 file changed, 15 insertions(+), 2 deletions(-)
> > > > > > > > > >
> > > > > > > > > > diff --git a/lib/net/rte_ip.h b/lib/net/rte_ip.h
> index
> > > > > > > > > > b502481670..8e301d9c26 100644
> > > > > > > > > > --- a/lib/net/rte_ip.h
> > > > > > > > > > +++ b/lib/net/rte_ip.h
> > > > > > > > > > @@ -162,9 +162,22 @@ __rte_raw_cksum(const void *buf,
> > > > size_t
> > > > > > len,
> > > > > > > > > > uint32_t sum)  {
> > > > > > > > > >  	/* extend strict-aliasing rules */
> > > > > > > > > >  	typedef uint16_t __attribute__((__may_alias__))
> > > > u16_p;
> > > > > > > > > > -	const u16_p *u16_buf = (const u16_p *)buf;
> > > > > > > > > > -	const u16_p *end = u16_buf + len /
> sizeof(*u16_buf);
> > > > > > > > > > +	const u16_p *u16_buf;
> > > > > > > > > > +	const u16_p *end;
> > > > > > > > > > +
> > > > > > > > > > +	/* if buffer is unaligned, keeping it byte
> order
> > > > > > independent */
> > > > > > > > > > +	if (unlikely((uintptr_t)buf & 1)) {
> > > > > > > > > > +		uint16_t first = 0;
> > > > > > > > > > +		if (unlikely(len == 0))
> > > > > > > > > > +			return 0;
> > > > > > > > > > +		((unsigned char *)&first)[1] = *(const
> unsigned
> > > > > > > > > char *)buf;
> > > > > > > > > > +		sum += first;
> > > > > > > > > > +		buf = (const void *)((uintptr_t)buf + 1);
> > > > > > > > > > +		len--;
> > > > > > > > > > +	}
> > > > > > > > > >
> > > > > > > > > > +	u16_buf = (const u16_p *)buf;
> > > > > > > > > > +	end = u16_buf + len / sizeof(*u16_buf);
> > > > > > > > > >  	for (; u16_buf != end; ++u16_buf)
> > > > > > > > > >  		sum += *u16_buf;
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > 2.17.1
> > > > > > > > >
> > > > > > > > > @Emil, can you please test this patch with an unaligned
> > > > buffer on
> > > > > > > > your
> > > > > > > > > application to confirm that it produces the expected
> result.
> > > > > > > >
> > > > > > > > Hi!
> > > > > > > >
> > > > > > > > I tested the patch. It doesn't seem to produce the same
> > > > results. I
> > > > > > > > think the problem is that it always starts summing from
> an
> > > > > > > > even address, the sum should always start from the first
> byte
> > > > according
> > > > > > to
> > > > > > > > the checksum specification. Can I instead propose
> something
> > > > Mattias
> > > > > > > > Rönnblom sent me?
> > > > > > >
> > > > > > > I assume that it produces the same result when the "buf"
> > > > parameter is
> > > > > > > aligned?
> > > > > > >
> > > > > > > And when the "buf" parameter is unaligned, I don't expect
> it to
> > > > > > produce the
> > > > > > > same results as the simple algorithm!
> > > > > > >
> > > > > > > This was the whole point of the patch: I expect the overall
> > > > packet
> > > > > > buffer to
> > > > > > > be 16 bit aligned, and the checksum to be a partial
> checksum of
> > > > such
> > > > > > a 16 bit
> > > > > > > aligned packet buffer. When calling this function, I assume
> that
> > > > the
> > > > > > "buf" and
> > > > > > > "len" parameters point to a part of such a packet buffer.
> If
> > > > these
> > > > > > > expectations are correct, the simple algorithm will produce
> > > > incorrect
> > > > > > results
> > > > > > > when "buf" is unaligned.
> > > > > > >
> > > > > > > I was asking you to test if the checksum on the packet is
> > > > > > > correct
> > > > > > when your
> > > > > > > application modifies an unaligned part of the packet and
> uses
> > > > this
> > > > > > function to
> > > > > > > update the checksum.
> > > > > > >
> > > > > >
> > > > > > Now I understand your use case. Your use case seems to be
> about
> > > > partial
> > > > > > checksums, of which some partial checksums may start on
> unaligned
> > > > > > addresses in an otherwise aligned packet.
> > > > > >
> > > > > > Our use case is about calculating the full checksum on a
> nested
> > > > packet.
> > > > > > That nested packet may start on unaligned addresses.
> > > > > >
> > > > > > The difference is basically if we want to sum over aligned
> > > > addresses or
> > > > > > not, handling the heading and trailing bytes appropriately.
> > > > > >
> > > > > > Your method does not work in our case since we want to treat
> the
> > > > first
> > > > > > two bytes as the first word in our case. But I do understand
> that
> > > > both
> > > > > > methods are useful.
> > > > >
> > > > > Yes, that certainly are two different use cases, requiring two
> > > > different ways of calculating the 16 bit checksum.
> > > > >
> > > > > >
> > > > > > Note that your method breaks the API. Previously (assuming no
> > > > crashing
> > > > > > due to low optimization levels, more accepting hardware, or a
> > > > different
> > > > > > compiler (version)) the current method would calculate the
> > > > > > checksum assuming the first two bytes is the first word.
> > > > > >
> > > > >
> > > > > Depending on the point of view, my patch either fixes a bug
> (where
> > > > the checksum was calculated incorrectly when the buf pointer was
> > > > unaligned) or breaks the API (by calculating the differently when
> the
> > > > buffer is unaligned).
> > > > >
> > > > > I cannot say with certainty which one is correct, but perhaps
> some
> > > > > of
> > > > the people with a deeper DPDK track record can...
> > > > >
> > > > > @Bruce and @Stephen, in 2019 you signed off on a patch [1]
> > > > introducing a 16 bit alignment requirement to the Ethernet
> address
> > > > structure.
> > > > >
> > > > > It is my understanding that DPDK has an invariant requiring
> packets
> > > > to be 16 bit aligned, which that patch supports. Is this
> invariant
> > > > documented anywhere, or am I completely wrong? If I'm wrong, then
> the
> > > > alignment requirement introduced in that patch needs to be
> removed, as
> > > > well as any similar alignment requirements elsewhere in DPDK.
> > > >
> > > > I don't believe it is explicitly documented as a global
> invariant, but
> > > > I think it should be unless there is a definite case where we
> need to
> > > > allow packets to be completely unaligned. Across all packet
> headers we
> > > > looked at, there was no tunneling protocol where the resulting
> packet
> > > > was left unaligned.
> > > >
> > > > That said, if there are real use cases where we need to allow
> packets
> > > > to start at an unaligned address, then I agree with you that we
> need
> > > > to roll back the patch and work to ensure everything works with
> > > > unaligned addresses.
> > > >
> > > > /Bruce
> > > >
> > >
> > > @Emil, can you please describe or refer to which tunneling protocol
> you are
> > > using, where the nested packet can be unaligned?
> > >
> > > I am asking to determine if your use case is exotic (maybe some
> Ericsson
> > > proprietary protocol), or more generic (rooted in some standard
> protocol).
> > > This information affects the DPDK community's opinion about how it
> should
> > > be supported by DPDK.
> > >
> > > If possible, please provide more details about the tunneling
> protocol and
> > > nested packets... E.g. do the nested packets also contain Layer 2
> (Ethernet,
> > > VLAN, etc.) headers, or only Layer 3 (IP) or Layer 4 (TCP, UDP,
> etc.)? And how
> > > about ARP packets and Layer 2 control protocol packets (STP, LACP,
> etc.)?
> > >
> >
> > Well, if you append or adjust an odd number of bytes (e.g. a PDCP
> header) from a previously aligned payload the entire packet will then
> be unaligned.
> >
> 
> If PDCP headers can leave the rest of the packet field unaligned, then
> we
> had better remove the alignment restrictions through all of DPDK.
> 
> /Bruce

Re-reading the details regarding unaligned pointers in C11, as posted by Emil in Bugzilla [2], I interpret it as follows: Any 16 bit or wider pointer type a must point to data aligned with that type, i.e. a pointer of the type "uint16_t *" must point to 16 bit aligned data, and a pointer of the type "uint64_t *" must point to 64 bit aligned data. Please, someone tell me I got this wrong, and wake me up from my nightmare!

Updating DPDK's packet structures to fully support this C11 limitation with unaligned access would be a nightmare, as we would need to use byte arrays for all structure fields. Functions would also be unable to use other pointer types than "void *" and "char *", which seems to be the actual problem in the __rte_raw_cksum() function. I guess that it also would prevent the compiler from auto-vectorizing the functions.

I am usually a big proponent of academically correct solutions, but such a change would be too wide ranging, so I would like to narrow it down to the actual use case, and perhaps extrapolate a bit from there.

@Emil: Do you only need to calculate the checksum of the (potentially unaligned) embedded packet? Or do you also need to use other DPDK functions with the embedded packet, potentially accessing it at an unaligned address?

I'm trying to determine the scope of this C11 pointer alignment limitation for your use case, i.e. whether or not other DPDK functions need to be updated to support unaligned packet access too.

[2] https://bugs.dpdk.org/show_bug.cgi?id=1035

Emil Berg June 22, 2022, 12:25 p.m. UTC | #12

> -----Original Message-----
> From: Morten Brørup <mb@smartsharesystems.com>
> Sent: den 22 juni 2022 13:26
> To: Bruce Richardson <bruce.richardson@intel.com>; Emil Berg
> <emil.berg@ericsson.com>
> Cc: Stephen Hemminger <stephen@networkplumber.org>;
> stable@dpdk.org; bugzilla@dpdk.org; hofors@lysator.liu.se;
> olivier.matz@6wind.com; dev@dpdk.org
> Subject: RE: [PATCH] net: fix checksum with unaligned buffer
> 
> > From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> > Sent: Wednesday, 22 June 2022 11.18
> >
> > On Wed, Jun 22, 2022 at 06:26:07AM +0000, Emil Berg wrote:
> > >
> > > > From: Morten Brørup <mb@smartsharesystems.com>
> > > > Sent: den 21 juni 2022 11:35
> > > >
> > > > > From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> > > > > Sent: Tuesday, 21 June 2022 10.23
> > > > >
> > > > > On Tue, Jun 21, 2022 at 10:05:15AM +0200, Morten Brørup wrote:
> > > > > > +TO: @Bruce and @Stephen: You signed off on the 16 bit
> > alignment
> > > > > requirement. We need background info on this.
> > > > > >
> > > > > > > From: Emil Berg [mailto:emil.berg@ericsson.com]
> > > > > > > Sent: Tuesday, 21 June 2022 09.17
> > > > > > >
> > > > > > > > From: Morten Brørup <mb@smartsharesystems.com>
> > > > > > > > Sent: den 20 juni 2022 12:58
> > > > > > > >
> > > > > > > > > From: Emil Berg [mailto:emil.berg@ericsson.com]
> > > > > > > > > Sent: Monday, 20 June 2022 12.38
> > > > > > > > >
> > > > > > > > > > From: Morten Brørup <mb@smartsharesystems.com>
> > > > > > > > > > Sent: den 17 juni 2022 11:07
> > > > > > > > > >
> > > > > > > > > > > From: Morten Brørup
> > > > > > > > > > > [mailto:mb@smartsharesystems.com]
> > > > > > > > > > > Sent: Friday, 17 June 2022 10.45
> > > > > > > > > > >
> > > > > > > > > > > With this patch, the checksum can be calculated on
> > > > > > > > > > > an
> > > > > unligned
> > > > > > > > > > > part
> > > > > > > > > of
> > > > > > > > > > > a packet buffer.
> > > > > > > > > > > I.e. the buf parameter is no longer required to be
> > > > > > > > > > > 16
> > bit
> > > > > > > aligned.
> > > > > > > > > > >
> > > > > > > > > > > The DPDK invariant that packet buffers must be 16
> > > > > > > > > > > bit
> > > > > aligned
> > > > > > > > > remains
> > > > > > > > > > > unchanged.
> > > > > > > > > > > This invariant also defines how to calculate the 16
> > bit
> > > > > > > checksum
> > > > > > > > > > > on
> > > > > > > > > an
> > > > > > > > > > > unaligned part of a packet buffer.
> > > > > > > > > > >
> > > > > > > > > > > Bugzilla ID: 1035
> > > > > > > > > > > Cc: stable@dpdk.org
> > > > > > > > > > >
> > > > > > > > > > > Signed-off-by: Morten Brørup
> > <mb@smartsharesystems.com>
> > > > > > > > > > > ---
> > > > > > > > > > >  lib/net/rte_ip.h | 17 +++++++++++++++--
> > > > > > > > > > >  1 file changed, 15 insertions(+), 2 deletions(-)
> > > > > > > > > > >
> > > > > > > > > > > diff --git a/lib/net/rte_ip.h b/lib/net/rte_ip.h
> > index
> > > > > > > > > > > b502481670..8e301d9c26 100644
> > > > > > > > > > > --- a/lib/net/rte_ip.h
> > > > > > > > > > > +++ b/lib/net/rte_ip.h
> > > > > > > > > > > @@ -162,9 +162,22 @@ __rte_raw_cksum(const void
> > > > > > > > > > > *buf,
> > > > > size_t
> > > > > > > len,
> > > > > > > > > > > uint32_t sum)  {
> > > > > > > > > > >  	/* extend strict-aliasing rules */
> > > > > > > > > > >  	typedef uint16_t
> __attribute__((__may_alias__))
> > > > > u16_p;
> > > > > > > > > > > -	const u16_p *u16_buf = (const u16_p *)buf;
> > > > > > > > > > > -	const u16_p *end = u16_buf + len /
> > sizeof(*u16_buf);
> > > > > > > > > > > +	const u16_p *u16_buf;
> > > > > > > > > > > +	const u16_p *end;
> > > > > > > > > > > +
> > > > > > > > > > > +	/* if buffer is unaligned, keeping it byte
> > order
> > > > > > > independent */
> > > > > > > > > > > +	if (unlikely((uintptr_t)buf & 1)) {
> > > > > > > > > > > +		uint16_t first = 0;
> > > > > > > > > > > +		if (unlikely(len == 0))
> > > > > > > > > > > +			return 0;
> > > > > > > > > > > +		((unsigned char *)&first)[1] =
> *(const
> > unsigned
> > > > > > > > > > char *)buf;
> > > > > > > > > > > +		sum += first;
> > > > > > > > > > > +		buf = (const void *)((uintptr_t)buf
> + 1);
> > > > > > > > > > > +		len--;
> > > > > > > > > > > +	}
> > > > > > > > > > >
> > > > > > > > > > > +	u16_buf = (const u16_p *)buf;
> > > > > > > > > > > +	end = u16_buf + len / sizeof(*u16_buf);
> > > > > > > > > > >  	for (; u16_buf != end; ++u16_buf)
> > > > > > > > > > >  		sum += *u16_buf;
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > > 2.17.1
> > > > > > > > > >
> > > > > > > > > > @Emil, can you please test this patch with an
> > > > > > > > > > unaligned
> > > > > buffer on
> > > > > > > > > your
> > > > > > > > > > application to confirm that it produces the expected
> > result.
> > > > > > > > >
> > > > > > > > > Hi!
> > > > > > > > >
> > > > > > > > > I tested the patch. It doesn't seem to produce the same
> > > > > results. I
> > > > > > > > > think the problem is that it always starts summing from
> > an
> > > > > > > > > even address, the sum should always start from the first
> > byte
> > > > > according
> > > > > > > to
> > > > > > > > > the checksum specification. Can I instead propose
> > something
> > > > > Mattias
> > > > > > > > > Rönnblom sent me?
> > > > > > > >
> > > > > > > > I assume that it produces the same result when the "buf"
> > > > > parameter is
> > > > > > > > aligned?
> > > > > > > >
> > > > > > > > And when the "buf" parameter is unaligned, I don't expect
> > it to
> > > > > > > produce the
> > > > > > > > same results as the simple algorithm!
> > > > > > > >
> > > > > > > > This was the whole point of the patch: I expect the
> > > > > > > > overall
> > > > > packet
> > > > > > > buffer to
> > > > > > > > be 16 bit aligned, and the checksum to be a partial
> > checksum of
> > > > > such
> > > > > > > a 16 bit
> > > > > > > > aligned packet buffer. When calling this function, I
> > > > > > > > assume
> > that
> > > > > the
> > > > > > > "buf" and
> > > > > > > > "len" parameters point to a part of such a packet buffer.
> > If
> > > > > these
> > > > > > > > expectations are correct, the simple algorithm will
> > > > > > > > produce
> > > > > incorrect
> > > > > > > results
> > > > > > > > when "buf" is unaligned.
> > > > > > > >
> > > > > > > > I was asking you to test if the checksum on the packet is
> > > > > > > > correct
> > > > > > > when your
> > > > > > > > application modifies an unaligned part of the packet and
> > uses
> > > > > this
> > > > > > > function to
> > > > > > > > update the checksum.
> > > > > > > >
> > > > > > >
> > > > > > > Now I understand your use case. Your use case seems to be
> > about
> > > > > partial
> > > > > > > checksums, of which some partial checksums may start on
> > unaligned
> > > > > > > addresses in an otherwise aligned packet.
> > > > > > >
> > > > > > > Our use case is about calculating the full checksum on a
> > nested
> > > > > packet.
> > > > > > > That nested packet may start on unaligned addresses.
> > > > > > >
> > > > > > > The difference is basically if we want to sum over aligned
> > > > > addresses or
> > > > > > > not, handling the heading and trailing bytes appropriately.
> > > > > > >
> > > > > > > Your method does not work in our case since we want to treat
> > the
> > > > > first
> > > > > > > two bytes as the first word in our case. But I do understand
> > that
> > > > > both
> > > > > > > methods are useful.
> > > > > >
> > > > > > Yes, that certainly are two different use cases, requiring two
> > > > > different ways of calculating the 16 bit checksum.
> > > > > >
> > > > > > >
> > > > > > > Note that your method breaks the API. Previously (assuming
> > > > > > > no
> > > > > crashing
> > > > > > > due to low optimization levels, more accepting hardware, or
> > > > > > > a
> > > > > different
> > > > > > > compiler (version)) the current method would calculate the
> > > > > > > checksum assuming the first two bytes is the first word.
> > > > > > >
> > > > > >
> > > > > > Depending on the point of view, my patch either fixes a bug
> > (where
> > > > > the checksum was calculated incorrectly when the buf pointer was
> > > > > unaligned) or breaks the API (by calculating the differently
> > > > > when
> > the
> > > > > buffer is unaligned).
> > > > > >
> > > > > > I cannot say with certainty which one is correct, but perhaps
> > some
> > > > > > of
> > > > > the people with a deeper DPDK track record can...
> > > > > >
> > > > > > @Bruce and @Stephen, in 2019 you signed off on a patch [1]
> > > > > introducing a 16 bit alignment requirement to the Ethernet
> > address
> > > > > structure.
> > > > > >
> > > > > > It is my understanding that DPDK has an invariant requiring
> > packets
> > > > > to be 16 bit aligned, which that patch supports. Is this
> > invariant
> > > > > documented anywhere, or am I completely wrong? If I'm wrong,
> > > > > then
> > the
> > > > > alignment requirement introduced in that patch needs to be
> > removed, as
> > > > > well as any similar alignment requirements elsewhere in DPDK.
> > > > >
> > > > > I don't believe it is explicitly documented as a global
> > invariant, but
> > > > > I think it should be unless there is a definite case where we
> > need to
> > > > > allow packets to be completely unaligned. Across all packet
> > headers we
> > > > > looked at, there was no tunneling protocol where the resulting
> > packet
> > > > > was left unaligned.
> > > > >
> > > > > That said, if there are real use cases where we need to allow
> > packets
> > > > > to start at an unaligned address, then I agree with you that we
> > need
> > > > > to roll back the patch and work to ensure everything works with
> > > > > unaligned addresses.
> > > > >
> > > > > /Bruce
> > > > >
> > > >
> > > > @Emil, can you please describe or refer to which tunneling
> > > > protocol
> > you are
> > > > using, where the nested packet can be unaligned?
> > > >
> > > > I am asking to determine if your use case is exotic (maybe some
> > Ericsson
> > > > proprietary protocol), or more generic (rooted in some standard
> > protocol).
> > > > This information affects the DPDK community's opinion about how it
> > should
> > > > be supported by DPDK.
> > > >
> > > > If possible, please provide more details about the tunneling
> > protocol and
> > > > nested packets... E.g. do the nested packets also contain Layer 2
> > (Ethernet,
> > > > VLAN, etc.) headers, or only Layer 3 (IP) or Layer 4 (TCP, UDP,
> > etc.)? And how
> > > > about ARP packets and Layer 2 control protocol packets (STP, LACP,
> > etc.)?
> > > >
> > >
> > > Well, if you append or adjust an odd number of bytes (e.g. a PDCP
> > header) from a previously aligned payload the entire packet will then
> > be unaligned.
> > >
> >
> > If PDCP headers can leave the rest of the packet field unaligned, then
> > we had better remove the alignment restrictions through all of DPDK.
> >
> > /Bruce
> 
> Re-reading the details regarding unaligned pointers in C11, as posted by Emil
> in Bugzilla [2], I interpret it as follows: Any 16 bit or wider pointer type a must
> point to data aligned with that type, i.e. a pointer of the type "uint16_t *"
> must point to 16 bit aligned data, and a pointer of the type "uint64_t *" must
> point to 64 bit aligned data. Please, someone tell me I got this wrong, and
> wake me up from my nightmare!
> 
> Updating DPDK's packet structures to fully support this C11 limitation with
> unaligned access would be a nightmare, as we would need to use byte arrays
> for all structure fields. Functions would also be unable to use other pointer
> types than "void *" and "char *", which seems to be the actual problem in
> the __rte_raw_cksum() function. I guess that it also would prevent the
> compiler from auto-vectorizing the functions.
> 
> I am usually a big proponent of academically correct solutions, but such a
> change would be too wide ranging, so I would like to narrow it down to the
> actual use case, and perhaps extrapolate a bit from there.
> 
> @Emil: Do you only need to calculate the checksum of the (potentially
> unaligned) embedded packet? Or do you also need to use other DPDK
> functions with the embedded packet, potentially accessing it at an unaligned
> address?
> 
> I'm trying to determine the scope of this C11 pointer alignment limitation for
> your use case, i.e. whether or not other DPDK functions need to be updated
> to support unaligned packet access too.
> 
> [2] https://protect2.fireeye.com/v1/url?k=31323334-501cfaf3-313273af-
> 454445554331-2ffe58e5caaeb74e&q=1&e=3f0544d3-8a71-4676-b4f9-
> 27e0952f7de0&u=https%3A%2F%2Fbugs.dpdk.org%2Fshow_bug.cgi%3Fid%
> 3D1035

That's my interpretation of the standard as well; For example an uint16_t* must be on even addresses. If not it is undefined behavior. I think this is a bigger problem on ARM for example.

Without being that invested in dpdk, adding unaligned support for everything seems like a steep step, but I'm not sure what it entails in practice.

We are actually only interested in the checksumming.

Morten Brørup June 22, 2022, 2:01 p.m. UTC | #13

> From: Emil Berg [mailto:emil.berg@ericsson.com]
> Sent: Wednesday, 22 June 2022 14.25
> 
> > From: Morten Brørup <mb@smartsharesystems.com>
> > Sent: den 22 juni 2022 13:26
> >
> > > From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> > > Sent: Wednesday, 22 June 2022 11.18
> > >
> > > On Wed, Jun 22, 2022 at 06:26:07AM +0000, Emil Berg wrote:
> > > >
> > > > > From: Morten Brørup <mb@smartsharesystems.com>
> > > > > Sent: den 21 juni 2022 11:35
> > > > >
> > > > > > From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> > > > > > Sent: Tuesday, 21 June 2022 10.23
> > > > > >
> > > > > > On Tue, Jun 21, 2022 at 10:05:15AM +0200, Morten Brørup
> wrote:
> > > > > > > +TO: @Bruce and @Stephen: You signed off on the 16 bit
> > > alignment
> > > > > > requirement. We need background info on this.
> > > > > > >
> > > > > > > > From: Emil Berg [mailto:emil.berg@ericsson.com]
> > > > > > > > Sent: Tuesday, 21 June 2022 09.17
> > > > > > > >
> > > > > > > > > From: Morten Brørup <mb@smartsharesystems.com>
> > > > > > > > > Sent: den 20 juni 2022 12:58
> > > > > > > > >
> > > > > > > > > > From: Emil Berg [mailto:emil.berg@ericsson.com]
> > > > > > > > > > Sent: Monday, 20 June 2022 12.38
> > > > > > > > > >
> > > > > > > > > > > From: Morten Brørup <mb@smartsharesystems.com>
> > > > > > > > > > > Sent: den 17 juni 2022 11:07
> > > > > > > > > > >
> > > > > > > > > > > > From: Morten Brørup
> > > > > > > > > > > > [mailto:mb@smartsharesystems.com]
> > > > > > > > > > > > Sent: Friday, 17 June 2022 10.45
> > > > > > > > > > > >
> > > > > > > > > > > > With this patch, the checksum can be calculated
> on
> > > > > > > > > > > > an
> > > > > > unligned
> > > > > > > > > > > > part
> > > > > > > > > > of
> > > > > > > > > > > > a packet buffer.
> > > > > > > > > > > > I.e. the buf parameter is no longer required to
> be
> > > > > > > > > > > > 16
> > > bit
> > > > > > > > aligned.
> > > > > > > > > > > >
> > > > > > > > > > > > The DPDK invariant that packet buffers must be 16
> > > > > > > > > > > > bit
> > > > > > aligned
> > > > > > > > > > remains
> > > > > > > > > > > > unchanged.
> > > > > > > > > > > > This invariant also defines how to calculate the
> 16
> > > bit
> > > > > > > > checksum
> > > > > > > > > > > > on
> > > > > > > > > > an
> > > > > > > > > > > > unaligned part of a packet buffer.
> > > > > > > > > > > >
> > > > > > > > > > > > Bugzilla ID: 1035
> > > > > > > > > > > > Cc: stable@dpdk.org
> > > > > > > > > > > >
> > > > > > > > > > > > Signed-off-by: Morten Brørup
> > > <mb@smartsharesystems.com>
> > > > > > > > > > > > ---
> > > > > > > > > > > >  lib/net/rte_ip.h | 17 +++++++++++++++--
> > > > > > > > > > > >  1 file changed, 15 insertions(+), 2 deletions(-)
> > > > > > > > > > > >
> > > > > > > > > > > > diff --git a/lib/net/rte_ip.h b/lib/net/rte_ip.h
> > > index
> > > > > > > > > > > > b502481670..8e301d9c26 100644
> > > > > > > > > > > > --- a/lib/net/rte_ip.h
> > > > > > > > > > > > +++ b/lib/net/rte_ip.h
> > > > > > > > > > > > @@ -162,9 +162,22 @@ __rte_raw_cksum(const void
> > > > > > > > > > > > *buf,
> > > > > > size_t
> > > > > > > > len,
> > > > > > > > > > > > uint32_t sum)  {
> > > > > > > > > > > >  	/* extend strict-aliasing rules */
> > > > > > > > > > > >  	typedef uint16_t
> > __attribute__((__may_alias__))
> > > > > > u16_p;
> > > > > > > > > > > > -	const u16_p *u16_buf = (const u16_p *)buf;
> > > > > > > > > > > > -	const u16_p *end = u16_buf + len /
> > > sizeof(*u16_buf);
> > > > > > > > > > > > +	const u16_p *u16_buf;
> > > > > > > > > > > > +	const u16_p *end;
> > > > > > > > > > > > +
> > > > > > > > > > > > +	/* if buffer is unaligned, keeping it byte
> > > order
> > > > > > > > independent */
> > > > > > > > > > > > +	if (unlikely((uintptr_t)buf & 1)) {
> > > > > > > > > > > > +		uint16_t first = 0;
> > > > > > > > > > > > +		if (unlikely(len == 0))
> > > > > > > > > > > > +			return 0;
> > > > > > > > > > > > +		((unsigned char *)&first)[1] =
> > *(const
> > > unsigned
> > > > > > > > > > > char *)buf;
> > > > > > > > > > > > +		sum += first;
> > > > > > > > > > > > +		buf = (const void *)((uintptr_t)buf
> > + 1);
> > > > > > > > > > > > +		len--;
> > > > > > > > > > > > +	}
> > > > > > > > > > > >
> > > > > > > > > > > > +	u16_buf = (const u16_p *)buf;
> > > > > > > > > > > > +	end = u16_buf + len / sizeof(*u16_buf);
> > > > > > > > > > > >  	for (; u16_buf != end; ++u16_buf)
> > > > > > > > > > > >  		sum += *u16_buf;
> > > > > > > > > > > >
> > > > > > > > > > > > --
> > > > > > > > > > > > 2.17.1
> > > > > > > > > > >
> > > > > > > > > > > @Emil, can you please test this patch with an
> > > > > > > > > > > unaligned
> > > > > > buffer on
> > > > > > > > > > your
> > > > > > > > > > > application to confirm that it produces the
> expected
> > > result.
> > > > > > > > > >
> > > > > > > > > > Hi!
> > > > > > > > > >
> > > > > > > > > > I tested the patch. It doesn't seem to produce the
> same
> > > > > > results. I
> > > > > > > > > > think the problem is that it always starts summing
> from
> > > an
> > > > > > > > > > even address, the sum should always start from the
> first
> > > byte
> > > > > > according
> > > > > > > > to
> > > > > > > > > > the checksum specification. Can I instead propose
> > > something
> > > > > > Mattias
> > > > > > > > > > Rönnblom sent me?
> > > > > > > > >
> > > > > > > > > I assume that it produces the same result when the
> "buf"
> > > > > > parameter is
> > > > > > > > > aligned?
> > > > > > > > >
> > > > > > > > > And when the "buf" parameter is unaligned, I don't
> expect
> > > it to
> > > > > > > > produce the
> > > > > > > > > same results as the simple algorithm!
> > > > > > > > >
> > > > > > > > > This was the whole point of the patch: I expect the
> > > > > > > > > overall
> > > > > > packet
> > > > > > > > buffer to
> > > > > > > > > be 16 bit aligned, and the checksum to be a partial
> > > checksum of
> > > > > > such
> > > > > > > > a 16 bit
> > > > > > > > > aligned packet buffer. When calling this function, I
> > > > > > > > > assume
> > > that
> > > > > > the
> > > > > > > > "buf" and
> > > > > > > > > "len" parameters point to a part of such a packet
> buffer.
> > > If
> > > > > > these
> > > > > > > > > expectations are correct, the simple algorithm will
> > > > > > > > > produce
> > > > > > incorrect
> > > > > > > > results
> > > > > > > > > when "buf" is unaligned.
> > > > > > > > >
> > > > > > > > > I was asking you to test if the checksum on the packet
> is
> > > > > > > > > correct
> > > > > > > > when your
> > > > > > > > > application modifies an unaligned part of the packet
> and
> > > uses
> > > > > > this
> > > > > > > > function to
> > > > > > > > > update the checksum.
> > > > > > > > >
> > > > > > > >
> > > > > > > > Now I understand your use case. Your use case seems to be
> > > about
> > > > > > partial
> > > > > > > > checksums, of which some partial checksums may start on
> > > unaligned
> > > > > > > > addresses in an otherwise aligned packet.
> > > > > > > >
> > > > > > > > Our use case is about calculating the full checksum on a
> > > nested
> > > > > > packet.
> > > > > > > > That nested packet may start on unaligned addresses.
> > > > > > > >
> > > > > > > > The difference is basically if we want to sum over
> aligned
> > > > > > addresses or
> > > > > > > > not, handling the heading and trailing bytes
> appropriately.
> > > > > > > >
> > > > > > > > Your method does not work in our case since we want to
> treat
> > > the
> > > > > > first
> > > > > > > > two bytes as the first word in our case. But I do
> understand
> > > that
> > > > > > both
> > > > > > > > methods are useful.
> > > > > > >
> > > > > > > Yes, that certainly are two different use cases, requiring
> two
> > > > > > different ways of calculating the 16 bit checksum.
> > > > > > >
> > > > > > > >
> > > > > > > > Note that your method breaks the API. Previously
> (assuming
> > > > > > > > no
> > > > > > crashing
> > > > > > > > due to low optimization levels, more accepting hardware,
> or
> > > > > > > > a
> > > > > > different
> > > > > > > > compiler (version)) the current method would calculate
> the
> > > > > > > > checksum assuming the first two bytes is the first word.
> > > > > > > >
> > > > > > >
> > > > > > > Depending on the point of view, my patch either fixes a bug
> > > (where
> > > > > > the checksum was calculated incorrectly when the buf pointer
> was
> > > > > > unaligned) or breaks the API (by calculating the differently
> > > > > > when
> > > the
> > > > > > buffer is unaligned).
> > > > > > >
> > > > > > > I cannot say with certainty which one is correct, but
> perhaps
> > > some
> > > > > > > of
> > > > > > the people with a deeper DPDK track record can...
> > > > > > >
> > > > > > > @Bruce and @Stephen, in 2019 you signed off on a patch [1]
> > > > > > introducing a 16 bit alignment requirement to the Ethernet
> > > address
> > > > > > structure.
> > > > > > >
> > > > > > > It is my understanding that DPDK has an invariant requiring
> > > packets
> > > > > > to be 16 bit aligned, which that patch supports. Is this
> > > invariant
> > > > > > documented anywhere, or am I completely wrong? If I'm wrong,
> > > > > > then
> > > the
> > > > > > alignment requirement introduced in that patch needs to be
> > > removed, as
> > > > > > well as any similar alignment requirements elsewhere in DPDK.
> > > > > >
> > > > > > I don't believe it is explicitly documented as a global
> > > invariant, but
> > > > > > I think it should be unless there is a definite case where we
> > > need to
> > > > > > allow packets to be completely unaligned. Across all packet
> > > headers we
> > > > > > looked at, there was no tunneling protocol where the
> resulting
> > > packet
> > > > > > was left unaligned.
> > > > > >
> > > > > > That said, if there are real use cases where we need to allow
> > > packets
> > > > > > to start at an unaligned address, then I agree with you that
> we
> > > need
> > > > > > to roll back the patch and work to ensure everything works
> with
> > > > > > unaligned addresses.
> > > > > >
> > > > > > /Bruce
> > > > > >
> > > > >
> > > > > @Emil, can you please describe or refer to which tunneling
> > > > > protocol
> > > you are
> > > > > using, where the nested packet can be unaligned?
> > > > >
> > > > > I am asking to determine if your use case is exotic (maybe some
> > > Ericsson
> > > > > proprietary protocol), or more generic (rooted in some standard
> > > protocol).
> > > > > This information affects the DPDK community's opinion about how
> it
> > > should
> > > > > be supported by DPDK.
> > > > >
> > > > > If possible, please provide more details about the tunneling
> > > protocol and
> > > > > nested packets... E.g. do the nested packets also contain Layer
> 2
> > > (Ethernet,
> > > > > VLAN, etc.) headers, or only Layer 3 (IP) or Layer 4 (TCP, UDP,
> > > etc.)? And how
> > > > > about ARP packets and Layer 2 control protocol packets (STP,
> LACP,
> > > etc.)?
> > > > >
> > > >
> > > > Well, if you append or adjust an odd number of bytes (e.g. a PDCP
> > > header) from a previously aligned payload the entire packet will
> then
> > > be unaligned.
> > > >
> > >
> > > If PDCP headers can leave the rest of the packet field unaligned,
> then
> > > we had better remove the alignment restrictions through all of
> DPDK.
> > >
> > > /Bruce
> >
> > Re-reading the details regarding unaligned pointers in C11, as posted
> by Emil
> > in Bugzilla [2], I interpret it as follows: Any 16 bit or wider
> pointer type a must
> > point to data aligned with that type, i.e. a pointer of the type
> "uint16_t *"
> > must point to 16 bit aligned data, and a pointer of the type
> "uint64_t *" must
> > point to 64 bit aligned data. Please, someone tell me I got this
> wrong, and
> > wake me up from my nightmare!
> >
> > Updating DPDK's packet structures to fully support this C11
> limitation with
> > unaligned access would be a nightmare, as we would need to use byte
> arrays
> > for all structure fields. Functions would also be unable to use other
> pointer
> > types than "void *" and "char *", which seems to be the actual
> problem in
> > the __rte_raw_cksum() function. I guess that it also would prevent
> the
> > compiler from auto-vectorizing the functions.
> >
> > I am usually a big proponent of academically correct solutions, but
> such a
> > change would be too wide ranging, so I would like to narrow it down
> to the
> > actual use case, and perhaps extrapolate a bit from there.
> >
> > @Emil: Do you only need to calculate the checksum of the (potentially
> > unaligned) embedded packet? Or do you also need to use other DPDK
> > functions with the embedded packet, potentially accessing it at an
> unaligned
> > address?
> >
> > I'm trying to determine the scope of this C11 pointer alignment
> limitation for
> > your use case, i.e. whether or not other DPDK functions need to be
> updated
> > to support unaligned packet access too.
> >
> > [2] https://protect2.fireeye.com/v1/url?k=31323334-501cfaf3-313273af-
> > 454445554331-2ffe58e5caaeb74e&q=1&e=3f0544d3-8a71-4676-b4f9-
> > 27e0952f7de0&u=https%3A%2F%2Fbugs.dpdk.org%2Fshow_bug.cgi%3Fid%
> > 3D1035
> 
> That's my interpretation of the standard as well; For example an
> uint16_t* must be on even addresses. If not it is undefined behavior. I
> think this is a bigger problem on ARM for example.
> 
> Without being that invested in dpdk, adding unaligned support for
> everything seems like a steep step, but I'm not sure what it entails in
> practice.
> 
> We are actually only interested in the checksumming.

Great! Then we can cancel the panic about rewriting DPDK Core completely. Although it might still need some review for similar alignment bugs, where we have been forcing the compiler shut up when trying to warn us. :-)

I have provided v3 of the patch, which should do as requested - and still allow the compiler to auto-vectorize.

@Emil, will you please test v3 of the patch?

Emil Berg June 22, 2022, 2:03 p.m. UTC | #14

> -----Original Message-----
> From: Morten Brørup <mb@smartsharesystems.com>
> Sent: den 22 juni 2022 16:02
> To: Emil Berg <emil.berg@ericsson.com>; Bruce Richardson
> <bruce.richardson@intel.com>
> Cc: Stephen Hemminger <stephen@networkplumber.org>;
> stable@dpdk.org; bugzilla@dpdk.org; hofors@lysator.liu.se;
> olivier.matz@6wind.com; dev@dpdk.org
> Subject: RE: [PATCH] net: fix checksum with unaligned buffer
> 
> > From: Emil Berg [mailto:emil.berg@ericsson.com]
> > Sent: Wednesday, 22 June 2022 14.25
> >
> > > From: Morten Brørup <mb@smartsharesystems.com>
> > > Sent: den 22 juni 2022 13:26
> > >
> > > > From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> > > > Sent: Wednesday, 22 June 2022 11.18
> > > >
> > > > On Wed, Jun 22, 2022 at 06:26:07AM +0000, Emil Berg wrote:
> > > > >
> > > > > > From: Morten Brørup <mb@smartsharesystems.com>
> > > > > > Sent: den 21 juni 2022 11:35
> > > > > >
> > > > > > > From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> > > > > > > Sent: Tuesday, 21 June 2022 10.23
> > > > > > >
> > > > > > > On Tue, Jun 21, 2022 at 10:05:15AM +0200, Morten Brørup
> > wrote:
> > > > > > > > +TO: @Bruce and @Stephen: You signed off on the 16 bit
> > > > alignment
> > > > > > > requirement. We need background info on this.
> > > > > > > >
> > > > > > > > > From: Emil Berg [mailto:emil.berg@ericsson.com]
> > > > > > > > > Sent: Tuesday, 21 June 2022 09.17
> > > > > > > > >
> > > > > > > > > > From: Morten Brørup <mb@smartsharesystems.com>
> > > > > > > > > > Sent: den 20 juni 2022 12:58
> > > > > > > > > >
> > > > > > > > > > > From: Emil Berg [mailto:emil.berg@ericsson.com]
> > > > > > > > > > > Sent: Monday, 20 June 2022 12.38
> > > > > > > > > > >
> > > > > > > > > > > > From: Morten Brørup <mb@smartsharesystems.com>
> > > > > > > > > > > > Sent: den 17 juni 2022 11:07
> > > > > > > > > > > >
> > > > > > > > > > > > > From: Morten Brørup
> > > > > > > > > > > > > [mailto:mb@smartsharesystems.com]
> > > > > > > > > > > > > Sent: Friday, 17 June 2022 10.45
> > > > > > > > > > > > >
> > > > > > > > > > > > > With this patch, the checksum can be calculated
> > on
> > > > > > > > > > > > > an
> > > > > > > unligned
> > > > > > > > > > > > > part
> > > > > > > > > > > of
> > > > > > > > > > > > > a packet buffer.
> > > > > > > > > > > > > I.e. the buf parameter is no longer required to
> > be
> > > > > > > > > > > > > 16
> > > > bit
> > > > > > > > > aligned.
> > > > > > > > > > > > >
> > > > > > > > > > > > > The DPDK invariant that packet buffers must be
> > > > > > > > > > > > > 16 bit
> > > > > > > aligned
> > > > > > > > > > > remains
> > > > > > > > > > > > > unchanged.
> > > > > > > > > > > > > This invariant also defines how to calculate the
> > 16
> > > > bit
> > > > > > > > > checksum
> > > > > > > > > > > > > on
> > > > > > > > > > > an
> > > > > > > > > > > > > unaligned part of a packet buffer.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Bugzilla ID: 1035
> > > > > > > > > > > > > Cc: stable@dpdk.org
> > > > > > > > > > > > >
> > > > > > > > > > > > > Signed-off-by: Morten Brørup
> > > > <mb@smartsharesystems.com>
> > > > > > > > > > > > > ---
> > > > > > > > > > > > >  lib/net/rte_ip.h | 17 +++++++++++++++--
> > > > > > > > > > > > >  1 file changed, 15 insertions(+), 2
> > > > > > > > > > > > > deletions(-)
> > > > > > > > > > > > >
> > > > > > > > > > > > > diff --git a/lib/net/rte_ip.h b/lib/net/rte_ip.h
> > > > index
> > > > > > > > > > > > > b502481670..8e301d9c26 100644
> > > > > > > > > > > > > --- a/lib/net/rte_ip.h
> > > > > > > > > > > > > +++ b/lib/net/rte_ip.h
> > > > > > > > > > > > > @@ -162,9 +162,22 @@ __rte_raw_cksum(const void
> > > > > > > > > > > > > *buf,
> > > > > > > size_t
> > > > > > > > > len,
> > > > > > > > > > > > > uint32_t sum)  {
> > > > > > > > > > > > >  	/* extend strict-aliasing rules */
> > > > > > > > > > > > >  	typedef uint16_t
> > > __attribute__((__may_alias__))
> > > > > > > u16_p;
> > > > > > > > > > > > > -	const u16_p *u16_buf = (const u16_p *)buf;
> > > > > > > > > > > > > -	const u16_p *end = u16_buf + len /
> > > > sizeof(*u16_buf);
> > > > > > > > > > > > > +	const u16_p *u16_buf;
> > > > > > > > > > > > > +	const u16_p *end;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +	/* if buffer is unaligned, keeping it byte
> > > > order
> > > > > > > > > independent */
> > > > > > > > > > > > > +	if (unlikely((uintptr_t)buf & 1)) {
> > > > > > > > > > > > > +		uint16_t first = 0;
> > > > > > > > > > > > > +		if (unlikely(len == 0))
> > > > > > > > > > > > > +			return 0;
> > > > > > > > > > > > > +		((unsigned char *)&first)[1] =
> > > *(const
> > > > unsigned
> > > > > > > > > > > > char *)buf;
> > > > > > > > > > > > > +		sum += first;
> > > > > > > > > > > > > +		buf = (const void *)((uintptr_t)buf
> > > + 1);
> > > > > > > > > > > > > +		len--;
> > > > > > > > > > > > > +	}
> > > > > > > > > > > > >
> > > > > > > > > > > > > +	u16_buf = (const u16_p *)buf;
> > > > > > > > > > > > > +	end = u16_buf + len / sizeof(*u16_buf);
> > > > > > > > > > > > >  	for (; u16_buf != end; ++u16_buf)
> > > > > > > > > > > > >  		sum += *u16_buf;
> > > > > > > > > > > > >
> > > > > > > > > > > > > --
> > > > > > > > > > > > > 2.17.1
> > > > > > > > > > > >
> > > > > > > > > > > > @Emil, can you please test this patch with an
> > > > > > > > > > > > unaligned
> > > > > > > buffer on
> > > > > > > > > > > your
> > > > > > > > > > > > application to confirm that it produces the
> > expected
> > > > result.
> > > > > > > > > > >
> > > > > > > > > > > Hi!
> > > > > > > > > > >
> > > > > > > > > > > I tested the patch. It doesn't seem to produce the
> > same
> > > > > > > results. I
> > > > > > > > > > > think the problem is that it always starts summing
> > from
> > > > an
> > > > > > > > > > > even address, the sum should always start from the
> > first
> > > > byte
> > > > > > > according
> > > > > > > > > to
> > > > > > > > > > > the checksum specification. Can I instead propose
> > > > something
> > > > > > > Mattias
> > > > > > > > > > > Rönnblom sent me?
> > > > > > > > > >
> > > > > > > > > > I assume that it produces the same result when the
> > "buf"
> > > > > > > parameter is
> > > > > > > > > > aligned?
> > > > > > > > > >
> > > > > > > > > > And when the "buf" parameter is unaligned, I don't
> > expect
> > > > it to
> > > > > > > > > produce the
> > > > > > > > > > same results as the simple algorithm!
> > > > > > > > > >
> > > > > > > > > > This was the whole point of the patch: I expect the
> > > > > > > > > > overall
> > > > > > > packet
> > > > > > > > > buffer to
> > > > > > > > > > be 16 bit aligned, and the checksum to be a partial
> > > > checksum of
> > > > > > > such
> > > > > > > > > a 16 bit
> > > > > > > > > > aligned packet buffer. When calling this function, I
> > > > > > > > > > assume
> > > > that
> > > > > > > the
> > > > > > > > > "buf" and
> > > > > > > > > > "len" parameters point to a part of such a packet
> > buffer.
> > > > If
> > > > > > > these
> > > > > > > > > > expectations are correct, the simple algorithm will
> > > > > > > > > > produce
> > > > > > > incorrect
> > > > > > > > > results
> > > > > > > > > > when "buf" is unaligned.
> > > > > > > > > >
> > > > > > > > > > I was asking you to test if the checksum on the packet
> > is
> > > > > > > > > > correct
> > > > > > > > > when your
> > > > > > > > > > application modifies an unaligned part of the packet
> > and
> > > > uses
> > > > > > > this
> > > > > > > > > function to
> > > > > > > > > > update the checksum.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Now I understand your use case. Your use case seems to
> > > > > > > > > be
> > > > about
> > > > > > > partial
> > > > > > > > > checksums, of which some partial checksums may start on
> > > > unaligned
> > > > > > > > > addresses in an otherwise aligned packet.
> > > > > > > > >
> > > > > > > > > Our use case is about calculating the full checksum on a
> > > > nested
> > > > > > > packet.
> > > > > > > > > That nested packet may start on unaligned addresses.
> > > > > > > > >
> > > > > > > > > The difference is basically if we want to sum over
> > aligned
> > > > > > > addresses or
> > > > > > > > > not, handling the heading and trailing bytes
> > appropriately.
> > > > > > > > >
> > > > > > > > > Your method does not work in our case since we want to
> > treat
> > > > the
> > > > > > > first
> > > > > > > > > two bytes as the first word in our case. But I do
> > understand
> > > > that
> > > > > > > both
> > > > > > > > > methods are useful.
> > > > > > > >
> > > > > > > > Yes, that certainly are two different use cases, requiring
> > two
> > > > > > > different ways of calculating the 16 bit checksum.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > Note that your method breaks the API. Previously
> > (assuming
> > > > > > > > > no
> > > > > > > crashing
> > > > > > > > > due to low optimization levels, more accepting hardware,
> > or
> > > > > > > > > a
> > > > > > > different
> > > > > > > > > compiler (version)) the current method would calculate
> > the
> > > > > > > > > checksum assuming the first two bytes is the first word.
> > > > > > > > >
> > > > > > > >
> > > > > > > > Depending on the point of view, my patch either fixes a
> > > > > > > > bug
> > > > (where
> > > > > > > the checksum was calculated incorrectly when the buf pointer
> > was
> > > > > > > unaligned) or breaks the API (by calculating the differently
> > > > > > > when
> > > > the
> > > > > > > buffer is unaligned).
> > > > > > > >
> > > > > > > > I cannot say with certainty which one is correct, but
> > perhaps
> > > > some
> > > > > > > > of
> > > > > > > the people with a deeper DPDK track record can...
> > > > > > > >
> > > > > > > > @Bruce and @Stephen, in 2019 you signed off on a patch [1]
> > > > > > > introducing a 16 bit alignment requirement to the Ethernet
> > > > address
> > > > > > > structure.
> > > > > > > >
> > > > > > > > It is my understanding that DPDK has an invariant
> > > > > > > > requiring
> > > > packets
> > > > > > > to be 16 bit aligned, which that patch supports. Is this
> > > > invariant
> > > > > > > documented anywhere, or am I completely wrong? If I'm wrong,
> > > > > > > then
> > > > the
> > > > > > > alignment requirement introduced in that patch needs to be
> > > > removed, as
> > > > > > > well as any similar alignment requirements elsewhere in DPDK.
> > > > > > >
> > > > > > > I don't believe it is explicitly documented as a global
> > > > invariant, but
> > > > > > > I think it should be unless there is a definite case where
> > > > > > > we
> > > > need to
> > > > > > > allow packets to be completely unaligned. Across all packet
> > > > headers we
> > > > > > > looked at, there was no tunneling protocol where the
> > resulting
> > > > packet
> > > > > > > was left unaligned.
> > > > > > >
> > > > > > > That said, if there are real use cases where we need to
> > > > > > > allow
> > > > packets
> > > > > > > to start at an unaligned address, then I agree with you that
> > we
> > > > need
> > > > > > > to roll back the patch and work to ensure everything works
> > with
> > > > > > > unaligned addresses.
> > > > > > >
> > > > > > > /Bruce
> > > > > > >
> > > > > >
> > > > > > @Emil, can you please describe or refer to which tunneling
> > > > > > protocol
> > > > you are
> > > > > > using, where the nested packet can be unaligned?
> > > > > >
> > > > > > I am asking to determine if your use case is exotic (maybe
> > > > > > some
> > > > Ericsson
> > > > > > proprietary protocol), or more generic (rooted in some
> > > > > > standard
> > > > protocol).
> > > > > > This information affects the DPDK community's opinion about
> > > > > > how
> > it
> > > > should
> > > > > > be supported by DPDK.
> > > > > >
> > > > > > If possible, please provide more details about the tunneling
> > > > protocol and
> > > > > > nested packets... E.g. do the nested packets also contain
> > > > > > Layer
> > 2
> > > > (Ethernet,
> > > > > > VLAN, etc.) headers, or only Layer 3 (IP) or Layer 4 (TCP,
> > > > > > UDP,
> > > > etc.)? And how
> > > > > > about ARP packets and Layer 2 control protocol packets (STP,
> > LACP,
> > > > etc.)?
> > > > > >
> > > > >
> > > > > Well, if you append or adjust an odd number of bytes (e.g. a
> > > > > PDCP
> > > > header) from a previously aligned payload the entire packet will
> > then
> > > > be unaligned.
> > > > >
> > > >
> > > > If PDCP headers can leave the rest of the packet field unaligned,
> > then
> > > > we had better remove the alignment restrictions through all of
> > DPDK.
> > > >
> > > > /Bruce
> > >
> > > Re-reading the details regarding unaligned pointers in C11, as
> > > posted
> > by Emil
> > > in Bugzilla [2], I interpret it as follows: Any 16 bit or wider
> > pointer type a must
> > > point to data aligned with that type, i.e. a pointer of the type
> > "uint16_t *"
> > > must point to 16 bit aligned data, and a pointer of the type
> > "uint64_t *" must
> > > point to 64 bit aligned data. Please, someone tell me I got this
> > wrong, and
> > > wake me up from my nightmare!
> > >
> > > Updating DPDK's packet structures to fully support this C11
> > limitation with
> > > unaligned access would be a nightmare, as we would need to use byte
> > arrays
> > > for all structure fields. Functions would also be unable to use
> > > other
> > pointer
> > > types than "void *" and "char *", which seems to be the actual
> > problem in
> > > the __rte_raw_cksum() function. I guess that it also would prevent
> > the
> > > compiler from auto-vectorizing the functions.
> > >
> > > I am usually a big proponent of academically correct solutions, but
> > such a
> > > change would be too wide ranging, so I would like to narrow it down
> > to the
> > > actual use case, and perhaps extrapolate a bit from there.
> > >
> > > @Emil: Do you only need to calculate the checksum of the
> > > (potentially
> > > unaligned) embedded packet? Or do you also need to use other DPDK
> > > functions with the embedded packet, potentially accessing it at an
> > unaligned
> > > address?
> > >
> > > I'm trying to determine the scope of this C11 pointer alignment
> > limitation for
> > > your use case, i.e. whether or not other DPDK functions need to be
> > updated
> > > to support unaligned packet access too.
> > >
> > > [2]
> > > https://protect2.fireeye.com/v1/url?k=31323334-501cfaf3-313273af-
> > > 454445554331-2ffe58e5caaeb74e&q=1&e=3f0544d3-8a71-4676-b4f9-
> > >
> 27e0952f7de0&u=https%3A%2F%2Fbugs.dpdk.org%2Fshow_bug.cgi%3Fid%
> > > 3D1035
> >
> > That's my interpretation of the standard as well; For example an
> > uint16_t* must be on even addresses. If not it is undefined behavior.
> > I think this is a bigger problem on ARM for example.
> >
> > Without being that invested in dpdk, adding unaligned support for
> > everything seems like a steep step, but I'm not sure what it entails
> > in practice.
> >
> > We are actually only interested in the checksumming.
> 
> Great! Then we can cancel the panic about rewriting DPDK Core completely.
> Although it might still need some review for similar alignment bugs, where
> we have been forcing the compiler shut up when trying to warn us. :-)
> 
> I have provided v3 of the patch, which should do as requested - and still allow
> the compiler to auto-vectorize.
> 
> @Emil, will you please test v3 of the patch?

I will test the patch tomorrow.

Emil Berg June 23, 2022, 5:21 a.m. UTC | #15

> -----Original Message-----
> From: Morten Brørup <mb@smartsharesystems.com>
> Sent: den 22 juni 2022 16:02
> To: Emil Berg <emil.berg@ericsson.com>; Bruce Richardson
> <bruce.richardson@intel.com>
> Cc: Stephen Hemminger <stephen@networkplumber.org>;
> stable@dpdk.org; bugzilla@dpdk.org; hofors@lysator.liu.se;
> olivier.matz@6wind.com; dev@dpdk.org
> Subject: RE: [PATCH] net: fix checksum with unaligned buffer
> 
> > From: Emil Berg [mailto:emil.berg@ericsson.com]
> > Sent: Wednesday, 22 June 2022 14.25
> >
> > > From: Morten Brørup <mb@smartsharesystems.com>
> > > Sent: den 22 juni 2022 13:26
> > >
> > > > From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> > > > Sent: Wednesday, 22 June 2022 11.18
> > > >
> > > > On Wed, Jun 22, 2022 at 06:26:07AM +0000, Emil Berg wrote:
> > > > >
> > > > > > From: Morten Brørup <mb@smartsharesystems.com>
> > > > > > Sent: den 21 juni 2022 11:35
> > > > > >
> > > > > > > From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> > > > > > > Sent: Tuesday, 21 June 2022 10.23
> > > > > > >
> > > > > > > On Tue, Jun 21, 2022 at 10:05:15AM +0200, Morten Brørup
> > wrote:
> > > > > > > > +TO: @Bruce and @Stephen: You signed off on the 16 bit
> > > > alignment
> > > > > > > requirement. We need background info on this.
> > > > > > > >
> > > > > > > > > From: Emil Berg [mailto:emil.berg@ericsson.com]
> > > > > > > > > Sent: Tuesday, 21 June 2022 09.17
> > > > > > > > >
> > > > > > > > > > From: Morten Brørup <mb@smartsharesystems.com>
> > > > > > > > > > Sent: den 20 juni 2022 12:58
> > > > > > > > > >
> > > > > > > > > > > From: Emil Berg [mailto:emil.berg@ericsson.com]
> > > > > > > > > > > Sent: Monday, 20 June 2022 12.38
> > > > > > > > > > >
> > > > > > > > > > > > From: Morten Brørup <mb@smartsharesystems.com>
> > > > > > > > > > > > Sent: den 17 juni 2022 11:07
> > > > > > > > > > > >
> > > > > > > > > > > > > From: Morten Brørup
> > > > > > > > > > > > > [mailto:mb@smartsharesystems.com]
> > > > > > > > > > > > > Sent: Friday, 17 June 2022 10.45
> > > > > > > > > > > > >
> > > > > > > > > > > > > With this patch, the checksum can be calculated
> > on
> > > > > > > > > > > > > an
> > > > > > > unligned
> > > > > > > > > > > > > part
> > > > > > > > > > > of
> > > > > > > > > > > > > a packet buffer.
> > > > > > > > > > > > > I.e. the buf parameter is no longer required to
> > be
> > > > > > > > > > > > > 16
> > > > bit
> > > > > > > > > aligned.
> > > > > > > > > > > > >
> > > > > > > > > > > > > The DPDK invariant that packet buffers must be
> > > > > > > > > > > > > 16 bit
> > > > > > > aligned
> > > > > > > > > > > remains
> > > > > > > > > > > > > unchanged.
> > > > > > > > > > > > > This invariant also defines how to calculate the
> > 16
> > > > bit
> > > > > > > > > checksum
> > > > > > > > > > > > > on
> > > > > > > > > > > an
> > > > > > > > > > > > > unaligned part of a packet buffer.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Bugzilla ID: 1035
> > > > > > > > > > > > > Cc: stable@dpdk.org
> > > > > > > > > > > > >
> > > > > > > > > > > > > Signed-off-by: Morten Brørup
> > > > <mb@smartsharesystems.com>
> > > > > > > > > > > > > ---
> > > > > > > > > > > > >  lib/net/rte_ip.h | 17 +++++++++++++++--
> > > > > > > > > > > > >  1 file changed, 15 insertions(+), 2
> > > > > > > > > > > > > deletions(-)
> > > > > > > > > > > > >
> > > > > > > > > > > > > diff --git a/lib/net/rte_ip.h b/lib/net/rte_ip.h
> > > > index
> > > > > > > > > > > > > b502481670..8e301d9c26 100644
> > > > > > > > > > > > > --- a/lib/net/rte_ip.h
> > > > > > > > > > > > > +++ b/lib/net/rte_ip.h
> > > > > > > > > > > > > @@ -162,9 +162,22 @@ __rte_raw_cksum(const void
> > > > > > > > > > > > > *buf,
> > > > > > > size_t
> > > > > > > > > len,
> > > > > > > > > > > > > uint32_t sum)  {
> > > > > > > > > > > > >  	/* extend strict-aliasing rules */
> > > > > > > > > > > > >  	typedef uint16_t
> > > __attribute__((__may_alias__))
> > > > > > > u16_p;
> > > > > > > > > > > > > -	const u16_p *u16_buf = (const u16_p *)buf;
> > > > > > > > > > > > > -	const u16_p *end = u16_buf + len /
> > > > sizeof(*u16_buf);
> > > > > > > > > > > > > +	const u16_p *u16_buf;
> > > > > > > > > > > > > +	const u16_p *end;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +	/* if buffer is unaligned, keeping it byte
> > > > order
> > > > > > > > > independent */
> > > > > > > > > > > > > +	if (unlikely((uintptr_t)buf & 1)) {
> > > > > > > > > > > > > +		uint16_t first = 0;
> > > > > > > > > > > > > +		if (unlikely(len == 0))
> > > > > > > > > > > > > +			return 0;
> > > > > > > > > > > > > +		((unsigned char *)&first)[1] =
> > > *(const
> > > > unsigned
> > > > > > > > > > > > char *)buf;
> > > > > > > > > > > > > +		sum += first;
> > > > > > > > > > > > > +		buf = (const void *)((uintptr_t)buf
> > > + 1);
> > > > > > > > > > > > > +		len--;
> > > > > > > > > > > > > +	}
> > > > > > > > > > > > >
> > > > > > > > > > > > > +	u16_buf = (const u16_p *)buf;
> > > > > > > > > > > > > +	end = u16_buf + len / sizeof(*u16_buf);
> > > > > > > > > > > > >  	for (; u16_buf != end; ++u16_buf)
> > > > > > > > > > > > >  		sum += *u16_buf;
> > > > > > > > > > > > >
> > > > > > > > > > > > > --
> > > > > > > > > > > > > 2.17.1
> > > > > > > > > > > >
> > > > > > > > > > > > @Emil, can you please test this patch with an
> > > > > > > > > > > > unaligned
> > > > > > > buffer on
> > > > > > > > > > > your
> > > > > > > > > > > > application to confirm that it produces the
> > expected
> > > > result.
> > > > > > > > > > >
> > > > > > > > > > > Hi!
> > > > > > > > > > >
> > > > > > > > > > > I tested the patch. It doesn't seem to produce the
> > same
> > > > > > > results. I
> > > > > > > > > > > think the problem is that it always starts summing
> > from
> > > > an
> > > > > > > > > > > even address, the sum should always start from the
> > first
> > > > byte
> > > > > > > according
> > > > > > > > > to
> > > > > > > > > > > the checksum specification. Can I instead propose
> > > > something
> > > > > > > Mattias
> > > > > > > > > > > Rönnblom sent me?
> > > > > > > > > >
> > > > > > > > > > I assume that it produces the same result when the
> > "buf"
> > > > > > > parameter is
> > > > > > > > > > aligned?
> > > > > > > > > >
> > > > > > > > > > And when the "buf" parameter is unaligned, I don't
> > expect
> > > > it to
> > > > > > > > > produce the
> > > > > > > > > > same results as the simple algorithm!
> > > > > > > > > >
> > > > > > > > > > This was the whole point of the patch: I expect the
> > > > > > > > > > overall
> > > > > > > packet
> > > > > > > > > buffer to
> > > > > > > > > > be 16 bit aligned, and the checksum to be a partial
> > > > checksum of
> > > > > > > such
> > > > > > > > > a 16 bit
> > > > > > > > > > aligned packet buffer. When calling this function, I
> > > > > > > > > > assume
> > > > that
> > > > > > > the
> > > > > > > > > "buf" and
> > > > > > > > > > "len" parameters point to a part of such a packet
> > buffer.
> > > > If
> > > > > > > these
> > > > > > > > > > expectations are correct, the simple algorithm will
> > > > > > > > > > produce
> > > > > > > incorrect
> > > > > > > > > results
> > > > > > > > > > when "buf" is unaligned.
> > > > > > > > > >
> > > > > > > > > > I was asking you to test if the checksum on the packet
> > is
> > > > > > > > > > correct
> > > > > > > > > when your
> > > > > > > > > > application modifies an unaligned part of the packet
> > and
> > > > uses
> > > > > > > this
> > > > > > > > > function to
> > > > > > > > > > update the checksum.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Now I understand your use case. Your use case seems to
> > > > > > > > > be
> > > > about
> > > > > > > partial
> > > > > > > > > checksums, of which some partial checksums may start on
> > > > unaligned
> > > > > > > > > addresses in an otherwise aligned packet.
> > > > > > > > >
> > > > > > > > > Our use case is about calculating the full checksum on a
> > > > nested
> > > > > > > packet.
> > > > > > > > > That nested packet may start on unaligned addresses.
> > > > > > > > >
> > > > > > > > > The difference is basically if we want to sum over
> > aligned
> > > > > > > addresses or
> > > > > > > > > not, handling the heading and trailing bytes
> > appropriately.
> > > > > > > > >
> > > > > > > > > Your method does not work in our case since we want to
> > treat
> > > > the
> > > > > > > first
> > > > > > > > > two bytes as the first word in our case. But I do
> > understand
> > > > that
> > > > > > > both
> > > > > > > > > methods are useful.
> > > > > > > >
> > > > > > > > Yes, that certainly are two different use cases, requiring
> > two
> > > > > > > different ways of calculating the 16 bit checksum.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > Note that your method breaks the API. Previously
> > (assuming
> > > > > > > > > no
> > > > > > > crashing
> > > > > > > > > due to low optimization levels, more accepting hardware,
> > or
> > > > > > > > > a
> > > > > > > different
> > > > > > > > > compiler (version)) the current method would calculate
> > the
> > > > > > > > > checksum assuming the first two bytes is the first word.
> > > > > > > > >
> > > > > > > >
> > > > > > > > Depending on the point of view, my patch either fixes a
> > > > > > > > bug
> > > > (where
> > > > > > > the checksum was calculated incorrectly when the buf pointer
> > was
> > > > > > > unaligned) or breaks the API (by calculating the differently
> > > > > > > when
> > > > the
> > > > > > > buffer is unaligned).
> > > > > > > >
> > > > > > > > I cannot say with certainty which one is correct, but
> > perhaps
> > > > some
> > > > > > > > of
> > > > > > > the people with a deeper DPDK track record can...
> > > > > > > >
> > > > > > > > @Bruce and @Stephen, in 2019 you signed off on a patch [1]
> > > > > > > introducing a 16 bit alignment requirement to the Ethernet
> > > > address
> > > > > > > structure.
> > > > > > > >
> > > > > > > > It is my understanding that DPDK has an invariant
> > > > > > > > requiring
> > > > packets
> > > > > > > to be 16 bit aligned, which that patch supports. Is this
> > > > invariant
> > > > > > > documented anywhere, or am I completely wrong? If I'm wrong,
> > > > > > > then
> > > > the
> > > > > > > alignment requirement introduced in that patch needs to be
> > > > removed, as
> > > > > > > well as any similar alignment requirements elsewhere in DPDK.
> > > > > > >
> > > > > > > I don't believe it is explicitly documented as a global
> > > > invariant, but
> > > > > > > I think it should be unless there is a definite case where
> > > > > > > we
> > > > need to
> > > > > > > allow packets to be completely unaligned. Across all packet
> > > > headers we
> > > > > > > looked at, there was no tunneling protocol where the
> > resulting
> > > > packet
> > > > > > > was left unaligned.
> > > > > > >
> > > > > > > That said, if there are real use cases where we need to
> > > > > > > allow
> > > > packets
> > > > > > > to start at an unaligned address, then I agree with you that
> > we
> > > > need
> > > > > > > to roll back the patch and work to ensure everything works
> > with
> > > > > > > unaligned addresses.
> > > > > > >
> > > > > > > /Bruce
> > > > > > >
> > > > > >
> > > > > > @Emil, can you please describe or refer to which tunneling
> > > > > > protocol
> > > > you are
> > > > > > using, where the nested packet can be unaligned?
> > > > > >
> > > > > > I am asking to determine if your use case is exotic (maybe
> > > > > > some
> > > > Ericsson
> > > > > > proprietary protocol), or more generic (rooted in some
> > > > > > standard
> > > > protocol).
> > > > > > This information affects the DPDK community's opinion about
> > > > > > how
> > it
> > > > should
> > > > > > be supported by DPDK.
> > > > > >
> > > > > > If possible, please provide more details about the tunneling
> > > > protocol and
> > > > > > nested packets... E.g. do the nested packets also contain
> > > > > > Layer
> > 2
> > > > (Ethernet,
> > > > > > VLAN, etc.) headers, or only Layer 3 (IP) or Layer 4 (TCP,
> > > > > > UDP,
> > > > etc.)? And how
> > > > > > about ARP packets and Layer 2 control protocol packets (STP,
> > LACP,
> > > > etc.)?
> > > > > >
> > > > >
> > > > > Well, if you append or adjust an odd number of bytes (e.g. a
> > > > > PDCP
> > > > header) from a previously aligned payload the entire packet will
> > then
> > > > be unaligned.
> > > > >
> > > >
> > > > If PDCP headers can leave the rest of the packet field unaligned,
> > then
> > > > we had better remove the alignment restrictions through all of
> > DPDK.
> > > >
> > > > /Bruce
> > >
> > > Re-reading the details regarding unaligned pointers in C11, as
> > > posted
> > by Emil
> > > in Bugzilla [2], I interpret it as follows: Any 16 bit or wider
> > pointer type a must
> > > point to data aligned with that type, i.e. a pointer of the type
> > "uint16_t *"
> > > must point to 16 bit aligned data, and a pointer of the type
> > "uint64_t *" must
> > > point to 64 bit aligned data. Please, someone tell me I got this
> > wrong, and
> > > wake me up from my nightmare!
> > >
> > > Updating DPDK's packet structures to fully support this C11
> > limitation with
> > > unaligned access would be a nightmare, as we would need to use byte
> > arrays
> > > for all structure fields. Functions would also be unable to use
> > > other
> > pointer
> > > types than "void *" and "char *", which seems to be the actual
> > problem in
> > > the __rte_raw_cksum() function. I guess that it also would prevent
> > the
> > > compiler from auto-vectorizing the functions.
> > >
> > > I am usually a big proponent of academically correct solutions, but
> > such a
> > > change would be too wide ranging, so I would like to narrow it down
> > to the
> > > actual use case, and perhaps extrapolate a bit from there.
> > >
> > > @Emil: Do you only need to calculate the checksum of the
> > > (potentially
> > > unaligned) embedded packet? Or do you also need to use other DPDK
> > > functions with the embedded packet, potentially accessing it at an
> > unaligned
> > > address?
> > >
> > > I'm trying to determine the scope of this C11 pointer alignment
> > limitation for
> > > your use case, i.e. whether or not other DPDK functions need to be
> > updated
> > > to support unaligned packet access too.
> > >
> > > [2]
> > > https://protect2.fireeye.com/v1/url?k=31323334-501cfaf3-313273af-
> > > 454445554331-2ffe58e5caaeb74e&q=1&e=3f0544d3-8a71-4676-b4f9-
> > >
> 27e0952f7de0&u=https%3A%2F%2Fbugs.dpdk.org%2Fshow_bug.cgi%3Fid%
> > > 3D1035
> >
> > That's my interpretation of the standard as well; For example an
> > uint16_t* must be on even addresses. If not it is undefined behavior.
> > I think this is a bigger problem on ARM for example.
> >
> > Without being that invested in dpdk, adding unaligned support for
> > everything seems like a steep step, but I'm not sure what it entails
> > in practice.
> >
> > We are actually only interested in the checksumming.
> 
> Great! Then we can cancel the panic about rewriting DPDK Core completely.
> Although it might still need some review for similar alignment bugs, where
> we have been forcing the compiler shut up when trying to warn us. :-)
> 
> I have provided v3 of the patch, which should do as requested - and still allow
> the compiler to auto-vectorize.
> 
> @Emil, will you please test v3 of the patch?

It seems to work in these two cases:
* Even address, even length
* Even address, odd length
But it breaks in these two cases:
* Odd address, even length (although it works for small buffers, probably when the sum fits inside a uint16_t integer or something)
* Odd address, odd length
I get (and like) the main idea of the algorithm but haven't yet figured out what the problem is with odd addresses.

/Emil

Morten Brørup June 23, 2022, 7:01 a.m. UTC | #16

> From: Emil Berg [mailto:emil.berg@ericsson.com]
> Sent: Thursday, 23 June 2022 07.22
> 
> > From: Morten Brørup <mb@smartsharesystems.com>
> > Sent: den 22 juni 2022 16:02
> >
> > > From: Emil Berg [mailto:emil.berg@ericsson.com]
> > > Sent: Wednesday, 22 June 2022 14.25
> > >
> > > > From: Morten Brørup <mb@smartsharesystems.com>
> > > > Sent: den 22 juni 2022 13:26
> > > >
> > > > > From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> > > > > Sent: Wednesday, 22 June 2022 11.18
> > > > >
> > > > > On Wed, Jun 22, 2022 at 06:26:07AM +0000, Emil Berg wrote:
> > > > > >
> > > > > > > From: Morten Brørup <mb@smartsharesystems.com>
> > > > > > > Sent: den 21 juni 2022 11:35
> > > > > > >
> > > > > > > > From: Bruce Richardson
> [mailto:bruce.richardson@intel.com]
> > > > > > > > Sent: Tuesday, 21 June 2022 10.23
> > > > > > > >
> > > > > > > > On Tue, Jun 21, 2022 at 10:05:15AM +0200, Morten Brørup
> > > wrote:
> > > > > > > > > +TO: @Bruce and @Stephen: You signed off on the 16 bit
> > > > > alignment
> > > > > > > > requirement. We need background info on this.
> > > > > > > > >
> > > > > > > > > > From: Emil Berg [mailto:emil.berg@ericsson.com]
> > > > > > > > > > Sent: Tuesday, 21 June 2022 09.17
> > > > > > > > > >
> > > > > > > > > > > From: Morten Brørup <mb@smartsharesystems.com>
> > > > > > > > > > > Sent: den 20 juni 2022 12:58
> > > > > > > > > > >
> > > > > > > > > > > > From: Emil Berg [mailto:emil.berg@ericsson.com]
> > > > > > > > > > > > Sent: Monday, 20 June 2022 12.38
> > > > > > > > > > > >
> > > > > > > > > > > > > From: Morten Brørup <mb@smartsharesystems.com>
> > > > > > > > > > > > > Sent: den 17 juni 2022 11:07
> > > > > > > > > > > > >
> > > > > > > > > > > > > > From: Morten Brørup
> > > > > > > > > > > > > > [mailto:mb@smartsharesystems.com]
> > > > > > > > > > > > > > Sent: Friday, 17 June 2022 10.45
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > With this patch, the checksum can be
> calculated
> > > on
> > > > > > > > > > > > > > an
> > > > > > > > unligned
> > > > > > > > > > > > > > part
> > > > > > > > > > > > of
> > > > > > > > > > > > > > a packet buffer.
> > > > > > > > > > > > > > I.e. the buf parameter is no longer required
> to
> > > be
> > > > > > > > > > > > > > 16
> > > > > bit
> > > > > > > > > > aligned.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > The DPDK invariant that packet buffers must
> be
> > > > > > > > > > > > > > 16 bit
> > > > > > > > aligned
> > > > > > > > > > > > remains
> > > > > > > > > > > > > > unchanged.
> > > > > > > > > > > > > > This invariant also defines how to calculate
> the
> > > 16
> > > > > bit
> > > > > > > > > > checksum
> > > > > > > > > > > > > > on
> > > > > > > > > > > > an
> > > > > > > > > > > > > > unaligned part of a packet buffer.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Bugzilla ID: 1035
> > > > > > > > > > > > > > Cc: stable@dpdk.org
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Signed-off-by: Morten Brørup
> > > > > <mb@smartsharesystems.com>
> > > > > > > > > > > > > > ---
> > > > > > > > > > > > > >  lib/net/rte_ip.h | 17 +++++++++++++++--
> > > > > > > > > > > > > >  1 file changed, 15 insertions(+), 2
> > > > > > > > > > > > > > deletions(-)
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > diff --git a/lib/net/rte_ip.h
> b/lib/net/rte_ip.h
> > > > > index
> > > > > > > > > > > > > > b502481670..8e301d9c26 100644
> > > > > > > > > > > > > > --- a/lib/net/rte_ip.h
> > > > > > > > > > > > > > +++ b/lib/net/rte_ip.h
> > > > > > > > > > > > > > @@ -162,9 +162,22 @@ __rte_raw_cksum(const
> void
> > > > > > > > > > > > > > *buf,
> > > > > > > > size_t
> > > > > > > > > > len,
> > > > > > > > > > > > > > uint32_t sum)  {
> > > > > > > > > > > > > >  	/* extend strict-aliasing rules */
> > > > > > > > > > > > > >  	typedef uint16_t
> > > > __attribute__((__may_alias__))
> > > > > > > > u16_p;
> > > > > > > > > > > > > > -	const u16_p *u16_buf = (const u16_p
> *)buf;
> > > > > > > > > > > > > > -	const u16_p *end = u16_buf + len /
> > > > > sizeof(*u16_buf);
> > > > > > > > > > > > > > +	const u16_p *u16_buf;
> > > > > > > > > > > > > > +	const u16_p *end;
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +	/* if buffer is unaligned, keeping it
> byte
> > > > > order
> > > > > > > > > > independent */
> > > > > > > > > > > > > > +	if (unlikely((uintptr_t)buf & 1)) {
> > > > > > > > > > > > > > +		uint16_t first = 0;
> > > > > > > > > > > > > > +		if (unlikely(len == 0))
> > > > > > > > > > > > > > +			return 0;
> > > > > > > > > > > > > > +		((unsigned char *)&first)[1] =
> > > > *(const
> > > > > unsigned
> > > > > > > > > > > > > char *)buf;
> > > > > > > > > > > > > > +		sum += first;
> > > > > > > > > > > > > > +		buf = (const void *)((uintptr_t)buf
> > > > + 1);
> > > > > > > > > > > > > > +		len--;
> > > > > > > > > > > > > > +	}
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > +	u16_buf = (const u16_p *)buf;
> > > > > > > > > > > > > > +	end = u16_buf + len / sizeof(*u16_buf);
> > > > > > > > > > > > > >  	for (; u16_buf != end; ++u16_buf)
> > > > > > > > > > > > > >  		sum += *u16_buf;
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > --
> > > > > > > > > > > > > > 2.17.1
> > > > > > > > > > > > >
> > > > > > > > > > > > > @Emil, can you please test this patch with an
> > > > > > > > > > > > > unaligned
> > > > > > > > buffer on
> > > > > > > > > > > > your
> > > > > > > > > > > > > application to confirm that it produces the
> > > expected
> > > > > result.
> > > > > > > > > > > >
> > > > > > > > > > > > Hi!
> > > > > > > > > > > >
> > > > > > > > > > > > I tested the patch. It doesn't seem to produce
> the
> > > same
> > > > > > > > results. I
> > > > > > > > > > > > think the problem is that it always starts
> summing
> > > from
> > > > > an
> > > > > > > > > > > > even address, the sum should always start from
> the
> > > first
> > > > > byte
> > > > > > > > according
> > > > > > > > > > to
> > > > > > > > > > > > the checksum specification. Can I instead propose
> > > > > something
> > > > > > > > Mattias
> > > > > > > > > > > > Rönnblom sent me?
> > > > > > > > > > >
> > > > > > > > > > > I assume that it produces the same result when the
> > > "buf"
> > > > > > > > parameter is
> > > > > > > > > > > aligned?
> > > > > > > > > > >
> > > > > > > > > > > And when the "buf" parameter is unaligned, I don't
> > > expect
> > > > > it to
> > > > > > > > > > produce the
> > > > > > > > > > > same results as the simple algorithm!
> > > > > > > > > > >
> > > > > > > > > > > This was the whole point of the patch: I expect the
> > > > > > > > > > > overall
> > > > > > > > packet
> > > > > > > > > > buffer to
> > > > > > > > > > > be 16 bit aligned, and the checksum to be a partial
> > > > > checksum of
> > > > > > > > such
> > > > > > > > > > a 16 bit
> > > > > > > > > > > aligned packet buffer. When calling this function,
> I
> > > > > > > > > > > assume
> > > > > that
> > > > > > > > the
> > > > > > > > > > "buf" and
> > > > > > > > > > > "len" parameters point to a part of such a packet
> > > buffer.
> > > > > If
> > > > > > > > these
> > > > > > > > > > > expectations are correct, the simple algorithm will
> > > > > > > > > > > produce
> > > > > > > > incorrect
> > > > > > > > > > results
> > > > > > > > > > > when "buf" is unaligned.
> > > > > > > > > > >
> > > > > > > > > > > I was asking you to test if the checksum on the
> packet
> > > is
> > > > > > > > > > > correct
> > > > > > > > > > when your
> > > > > > > > > > > application modifies an unaligned part of the
> packet
> > > and
> > > > > uses
> > > > > > > > this
> > > > > > > > > > function to
> > > > > > > > > > > update the checksum.
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Now I understand your use case. Your use case seems
> to
> > > > > > > > > > be
> > > > > about
> > > > > > > > partial
> > > > > > > > > > checksums, of which some partial checksums may start
> on
> > > > > unaligned
> > > > > > > > > > addresses in an otherwise aligned packet.
> > > > > > > > > >
> > > > > > > > > > Our use case is about calculating the full checksum
> on a
> > > > > nested
> > > > > > > > packet.
> > > > > > > > > > That nested packet may start on unaligned addresses.
> > > > > > > > > >
> > > > > > > > > > The difference is basically if we want to sum over
> > > aligned
> > > > > > > > addresses or
> > > > > > > > > > not, handling the heading and trailing bytes
> > > appropriately.
> > > > > > > > > >
> > > > > > > > > > Your method does not work in our case since we want
> to
> > > treat
> > > > > the
> > > > > > > > first
> > > > > > > > > > two bytes as the first word in our case. But I do
> > > understand
> > > > > that
> > > > > > > > both
> > > > > > > > > > methods are useful.
> > > > > > > > >
> > > > > > > > > Yes, that certainly are two different use cases,
> requiring
> > > two
> > > > > > > > different ways of calculating the 16 bit checksum.
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Note that your method breaks the API. Previously
> > > (assuming
> > > > > > > > > > no
> > > > > > > > crashing
> > > > > > > > > > due to low optimization levels, more accepting
> hardware,
> > > or
> > > > > > > > > > a
> > > > > > > > different
> > > > > > > > > > compiler (version)) the current method would
> calculate
> > > the
> > > > > > > > > > checksum assuming the first two bytes is the first
> word.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Depending on the point of view, my patch either fixes a
> > > > > > > > > bug
> > > > > (where
> > > > > > > > the checksum was calculated incorrectly when the buf
> pointer
> > > was
> > > > > > > > unaligned) or breaks the API (by calculating the
> differently
> > > > > > > > when
> > > > > the
> > > > > > > > buffer is unaligned).
> > > > > > > > >
> > > > > > > > > I cannot say with certainty which one is correct, but
> > > perhaps
> > > > > some
> > > > > > > > > of
> > > > > > > > the people with a deeper DPDK track record can...
> > > > > > > > >
> > > > > > > > > @Bruce and @Stephen, in 2019 you signed off on a patch
> [1]
> > > > > > > > introducing a 16 bit alignment requirement to the
> Ethernet
> > > > > address
> > > > > > > > structure.
> > > > > > > > >
> > > > > > > > > It is my understanding that DPDK has an invariant
> > > > > > > > > requiring
> > > > > packets
> > > > > > > > to be 16 bit aligned, which that patch supports. Is this
> > > > > invariant
> > > > > > > > documented anywhere, or am I completely wrong? If I'm
> wrong,
> > > > > > > > then
> > > > > the
> > > > > > > > alignment requirement introduced in that patch needs to
> be
> > > > > removed, as
> > > > > > > > well as any similar alignment requirements elsewhere in
> DPDK.
> > > > > > > >
> > > > > > > > I don't believe it is explicitly documented as a global
> > > > > invariant, but
> > > > > > > > I think it should be unless there is a definite case
> where
> > > > > > > > we
> > > > > need to
> > > > > > > > allow packets to be completely unaligned. Across all
> packet
> > > > > headers we
> > > > > > > > looked at, there was no tunneling protocol where the
> > > resulting
> > > > > packet
> > > > > > > > was left unaligned.
> > > > > > > >
> > > > > > > > That said, if there are real use cases where we need to
> > > > > > > > allow
> > > > > packets
> > > > > > > > to start at an unaligned address, then I agree with you
> that
> > > we
> > > > > need
> > > > > > > > to roll back the patch and work to ensure everything
> works
> > > with
> > > > > > > > unaligned addresses.
> > > > > > > >
> > > > > > > > /Bruce
> > > > > > > >
> > > > > > >
> > > > > > > @Emil, can you please describe or refer to which tunneling
> > > > > > > protocol
> > > > > you are
> > > > > > > using, where the nested packet can be unaligned?
> > > > > > >
> > > > > > > I am asking to determine if your use case is exotic (maybe
> > > > > > > some
> > > > > Ericsson
> > > > > > > proprietary protocol), or more generic (rooted in some
> > > > > > > standard
> > > > > protocol).
> > > > > > > This information affects the DPDK community's opinion about
> > > > > > > how
> > > it
> > > > > should
> > > > > > > be supported by DPDK.
> > > > > > >
> > > > > > > If possible, please provide more details about the
> tunneling
> > > > > protocol and
> > > > > > > nested packets... E.g. do the nested packets also contain
> > > > > > > Layer
> > > 2
> > > > > (Ethernet,
> > > > > > > VLAN, etc.) headers, or only Layer 3 (IP) or Layer 4 (TCP,
> > > > > > > UDP,
> > > > > etc.)? And how
> > > > > > > about ARP packets and Layer 2 control protocol packets
> (STP,
> > > LACP,
> > > > > etc.)?
> > > > > > >
> > > > > >
> > > > > > Well, if you append or adjust an odd number of bytes (e.g. a
> > > > > > PDCP
> > > > > header) from a previously aligned payload the entire packet
> will
> > > then
> > > > > be unaligned.
> > > > > >
> > > > >
> > > > > If PDCP headers can leave the rest of the packet field
> unaligned,
> > > then
> > > > > we had better remove the alignment restrictions through all of
> > > DPDK.
> > > > >
> > > > > /Bruce
> > > >
> > > > Re-reading the details regarding unaligned pointers in C11, as
> > > > posted
> > > by Emil
> > > > in Bugzilla [2], I interpret it as follows: Any 16 bit or wider
> > > pointer type a must
> > > > point to data aligned with that type, i.e. a pointer of the type
> > > "uint16_t *"
> > > > must point to 16 bit aligned data, and a pointer of the type
> > > "uint64_t *" must
> > > > point to 64 bit aligned data. Please, someone tell me I got this
> > > wrong, and
> > > > wake me up from my nightmare!
> > > >
> > > > Updating DPDK's packet structures to fully support this C11
> > > limitation with
> > > > unaligned access would be a nightmare, as we would need to use
> byte
> > > arrays
> > > > for all structure fields. Functions would also be unable to use
> > > > other
> > > pointer
> > > > types than "void *" and "char *", which seems to be the actual
> > > problem in
> > > > the __rte_raw_cksum() function. I guess that it also would
> prevent
> > > the
> > > > compiler from auto-vectorizing the functions.
> > > >
> > > > I am usually a big proponent of academically correct solutions,
> but
> > > such a
> > > > change would be too wide ranging, so I would like to narrow it
> down
> > > to the
> > > > actual use case, and perhaps extrapolate a bit from there.
> > > >
> > > > @Emil: Do you only need to calculate the checksum of the
> > > > (potentially
> > > > unaligned) embedded packet? Or do you also need to use other DPDK
> > > > functions with the embedded packet, potentially accessing it at
> an
> > > unaligned
> > > > address?
> > > >
> > > > I'm trying to determine the scope of this C11 pointer alignment
> > > limitation for
> > > > your use case, i.e. whether or not other DPDK functions need to
> be
> > > updated
> > > > to support unaligned packet access too.
> > > >
> > > > [2]
> > > > https://protect2.fireeye.com/v1/url?k=31323334-501cfaf3-313273af-
> > > > 454445554331-2ffe58e5caaeb74e&q=1&e=3f0544d3-8a71-4676-b4f9-
> > > >
> > 27e0952f7de0&u=https%3A%2F%2Fbugs.dpdk.org%2Fshow_bug.cgi%3Fid%
> > > > 3D1035
> > >
> > > That's my interpretation of the standard as well; For example an
> > > uint16_t* must be on even addresses. If not it is undefined
> behavior.
> > > I think this is a bigger problem on ARM for example.
> > >
> > > Without being that invested in dpdk, adding unaligned support for
> > > everything seems like a steep step, but I'm not sure what it
> entails
> > > in practice.
> > >
> > > We are actually only interested in the checksumming.
> >
> > Great! Then we can cancel the panic about rewriting DPDK Core
> completely.
> > Although it might still need some review for similar alignment bugs,
> where
> > we have been forcing the compiler shut up when trying to warn us. :-)
> >
> > I have provided v3 of the patch, which should do as requested - and
> still allow
> > the compiler to auto-vectorize.
> >
> > @Emil, will you please test v3 of the patch?
> 
> It seems to work in these two cases:
> * Even address, even length
> * Even address, odd length
> But it breaks in these two cases:
> * Odd address, even length (although it works for small buffers,
> probably when the sum fits inside a uint16_t integer or something)

Interesting observation, good analysis.

> * Odd address, odd length

Does this also work for small buffers?

> I get (and like) the main idea of the algorithm but haven't yet figured
> out what the problem is with odd addresses.

I wonder if I messed up the algorithm for swapping back the bytes in bsum after the calculation... Is the checksum also wrong when compiling without optimization?

And just to be sure: The algorithm requires that __rte_raw_cksum_reduce() is also applied to the sum. Please confirm that you call rte_raw_cksum() (or __rte_raw_cksum() followed by __rte_raw_cksum_reduce())?

> 
> /Emil

Emil Berg June 23, 2022, 11:39 a.m. UTC | #17

> -----Original Message-----
> From: Morten Brørup <mb@smartsharesystems.com>
> Sent: den 23 juni 2022 09:01
> To: Emil Berg <emil.berg@ericsson.com>; Bruce Richardson
> <bruce.richardson@intel.com>
> Cc: Stephen Hemminger <stephen@networkplumber.org>;
> stable@dpdk.org; bugzilla@dpdk.org; hofors@lysator.liu.se;
> olivier.matz@6wind.com; dev@dpdk.org
> Subject: RE: [PATCH] net: fix checksum with unaligned buffer
> 
> > From: Emil Berg [mailto:emil.berg@ericsson.com]
> > Sent: Thursday, 23 June 2022 07.22
> >
> > > From: Morten Brørup <mb@smartsharesystems.com>
> > > Sent: den 22 juni 2022 16:02
> > >
> > > > From: Emil Berg [mailto:emil.berg@ericsson.com]
> > > > Sent: Wednesday, 22 June 2022 14.25
> > > >
> > > > > From: Morten Brørup <mb@smartsharesystems.com>
> > > > > Sent: den 22 juni 2022 13:26
> > > > >
> > > > > > From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> > > > > > Sent: Wednesday, 22 June 2022 11.18
> > > > > >
> > > > > > On Wed, Jun 22, 2022 at 06:26:07AM +0000, Emil Berg wrote:
> > > > > > >
> > > > > > > > From: Morten Brørup <mb@smartsharesystems.com>
> > > > > > > > Sent: den 21 juni 2022 11:35
> > > > > > > >
> > > > > > > > > From: Bruce Richardson
> > [mailto:bruce.richardson@intel.com]
> > > > > > > > > Sent: Tuesday, 21 June 2022 10.23
> > > > > > > > >
> > > > > > > > > On Tue, Jun 21, 2022 at 10:05:15AM +0200, Morten Brørup
> > > > wrote:
> > > > > > > > > > +TO: @Bruce and @Stephen: You signed off on the 16 bit
> > > > > > alignment
> > > > > > > > > requirement. We need background info on this.
> > > > > > > > > >
> > > > > > > > > > > From: Emil Berg [mailto:emil.berg@ericsson.com]
> > > > > > > > > > > Sent: Tuesday, 21 June 2022 09.17
> > > > > > > > > > >
> > > > > > > > > > > > From: Morten Brørup <mb@smartsharesystems.com>
> > > > > > > > > > > > Sent: den 20 juni 2022 12:58
> > > > > > > > > > > >
> > > > > > > > > > > > > From: Emil Berg [mailto:emil.berg@ericsson.com]
> > > > > > > > > > > > > Sent: Monday, 20 June 2022 12.38
> > > > > > > > > > > > >
> > > > > > > > > > > > > > From: Morten Brørup
> <mb@smartsharesystems.com>
> > > > > > > > > > > > > > Sent: den 17 juni 2022 11:07
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > From: Morten Brørup
> > > > > > > > > > > > > > > [mailto:mb@smartsharesystems.com]
> > > > > > > > > > > > > > > Sent: Friday, 17 June 2022 10.45
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > With this patch, the checksum can be
> > calculated
> > > > on
> > > > > > > > > > > > > > > an
> > > > > > > > > unligned
> > > > > > > > > > > > > > > part
> > > > > > > > > > > > > of
> > > > > > > > > > > > > > > a packet buffer.
> > > > > > > > > > > > > > > I.e. the buf parameter is no longer required
> > to
> > > > be
> > > > > > > > > > > > > > > 16
> > > > > > bit
> > > > > > > > > > > aligned.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > The DPDK invariant that packet buffers must
> > be
> > > > > > > > > > > > > > > 16 bit
> > > > > > > > > aligned
> > > > > > > > > > > > > remains
> > > > > > > > > > > > > > > unchanged.
> > > > > > > > > > > > > > > This invariant also defines how to calculate
> > the
> > > > 16
> > > > > > bit
> > > > > > > > > > > checksum
> > > > > > > > > > > > > > > on
> > > > > > > > > > > > > an
> > > > > > > > > > > > > > > unaligned part of a packet buffer.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Bugzilla ID: 1035
> > > > > > > > > > > > > > > Cc: stable@dpdk.org
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Signed-off-by: Morten Brørup
> > > > > > <mb@smartsharesystems.com>
> > > > > > > > > > > > > > > ---
> > > > > > > > > > > > > > >  lib/net/rte_ip.h | 17 +++++++++++++++--
> > > > > > > > > > > > > > >  1 file changed, 15 insertions(+), 2
> > > > > > > > > > > > > > > deletions(-)
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > diff --git a/lib/net/rte_ip.h
> > b/lib/net/rte_ip.h
> > > > > > index
> > > > > > > > > > > > > > > b502481670..8e301d9c26 100644
> > > > > > > > > > > > > > > --- a/lib/net/rte_ip.h
> > > > > > > > > > > > > > > +++ b/lib/net/rte_ip.h
> > > > > > > > > > > > > > > @@ -162,9 +162,22 @@ __rte_raw_cksum(const
> > void
> > > > > > > > > > > > > > > *buf,
> > > > > > > > > size_t
> > > > > > > > > > > len,
> > > > > > > > > > > > > > > uint32_t sum)  {
> > > > > > > > > > > > > > >  	/* extend strict-aliasing rules */
> > > > > > > > > > > > > > >  	typedef uint16_t
> > > > > __attribute__((__may_alias__))
> > > > > > > > > u16_p;
> > > > > > > > > > > > > > > -	const u16_p *u16_buf = (const u16_p
> > *)buf;
> > > > > > > > > > > > > > > -	const u16_p *end = u16_buf + len /
> > > > > > sizeof(*u16_buf);
> > > > > > > > > > > > > > > +	const u16_p *u16_buf;
> > > > > > > > > > > > > > > +	const u16_p *end;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +	/* if buffer is unaligned, keeping it
> > byte
> > > > > > order
> > > > > > > > > > > independent */
> > > > > > > > > > > > > > > +	if (unlikely((uintptr_t)buf & 1)) {
> > > > > > > > > > > > > > > +		uint16_t first = 0;
> > > > > > > > > > > > > > > +		if (unlikely(len == 0))
> > > > > > > > > > > > > > > +			return 0;
> > > > > > > > > > > > > > > +		((unsigned char *)&first)[1] =
> > > > > *(const
> > > > > > unsigned
> > > > > > > > > > > > > > char *)buf;
> > > > > > > > > > > > > > > +		sum += first;
> > > > > > > > > > > > > > > +		buf = (const void *)((uintptr_t)buf
> > > > > + 1);
> > > > > > > > > > > > > > > +		len--;
> > > > > > > > > > > > > > > +	}
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > +	u16_buf = (const u16_p *)buf;
> > > > > > > > > > > > > > > +	end = u16_buf + len / sizeof(*u16_buf);
> > > > > > > > > > > > > > >  	for (; u16_buf != end; ++u16_buf)
> > > > > > > > > > > > > > >  		sum += *u16_buf;
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > 2.17.1
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > @Emil, can you please test this patch with an
> > > > > > > > > > > > > > unaligned
> > > > > > > > > buffer on
> > > > > > > > > > > > > your
> > > > > > > > > > > > > > application to confirm that it produces the
> > > > expected
> > > > > > result.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Hi!
> > > > > > > > > > > > >
> > > > > > > > > > > > > I tested the patch. It doesn't seem to produce
> > the
> > > > same
> > > > > > > > > results. I
> > > > > > > > > > > > > think the problem is that it always starts
> > summing
> > > > from
> > > > > > an
> > > > > > > > > > > > > even address, the sum should always start from
> > the
> > > > first
> > > > > > byte
> > > > > > > > > according
> > > > > > > > > > > to
> > > > > > > > > > > > > the checksum specification. Can I instead
> > > > > > > > > > > > > propose
> > > > > > something
> > > > > > > > > Mattias
> > > > > > > > > > > > > Rönnblom sent me?
> > > > > > > > > > > >
> > > > > > > > > > > > I assume that it produces the same result when the
> > > > "buf"
> > > > > > > > > parameter is
> > > > > > > > > > > > aligned?
> > > > > > > > > > > >
> > > > > > > > > > > > And when the "buf" parameter is unaligned, I don't
> > > > expect
> > > > > > it to
> > > > > > > > > > > produce the
> > > > > > > > > > > > same results as the simple algorithm!
> > > > > > > > > > > >
> > > > > > > > > > > > This was the whole point of the patch: I expect
> > > > > > > > > > > > the overall
> > > > > > > > > packet
> > > > > > > > > > > buffer to
> > > > > > > > > > > > be 16 bit aligned, and the checksum to be a
> > > > > > > > > > > > partial
> > > > > > checksum of
> > > > > > > > > such
> > > > > > > > > > > a 16 bit
> > > > > > > > > > > > aligned packet buffer. When calling this function,
> > I
> > > > > > > > > > > > assume
> > > > > > that
> > > > > > > > > the
> > > > > > > > > > > "buf" and
> > > > > > > > > > > > "len" parameters point to a part of such a packet
> > > > buffer.
> > > > > > If
> > > > > > > > > these
> > > > > > > > > > > > expectations are correct, the simple algorithm
> > > > > > > > > > > > will produce
> > > > > > > > > incorrect
> > > > > > > > > > > results
> > > > > > > > > > > > when "buf" is unaligned.
> > > > > > > > > > > >
> > > > > > > > > > > > I was asking you to test if the checksum on the
> > packet
> > > > is
> > > > > > > > > > > > correct
> > > > > > > > > > > when your
> > > > > > > > > > > > application modifies an unaligned part of the
> > packet
> > > > and
> > > > > > uses
> > > > > > > > > this
> > > > > > > > > > > function to
> > > > > > > > > > > > update the checksum.
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Now I understand your use case. Your use case seems
> > to
> > > > > > > > > > > be
> > > > > > about
> > > > > > > > > partial
> > > > > > > > > > > checksums, of which some partial checksums may start
> > on
> > > > > > unaligned
> > > > > > > > > > > addresses in an otherwise aligned packet.
> > > > > > > > > > >
> > > > > > > > > > > Our use case is about calculating the full checksum
> > on a
> > > > > > nested
> > > > > > > > > packet.
> > > > > > > > > > > That nested packet may start on unaligned addresses.
> > > > > > > > > > >
> > > > > > > > > > > The difference is basically if we want to sum over
> > > > aligned
> > > > > > > > > addresses or
> > > > > > > > > > > not, handling the heading and trailing bytes
> > > > appropriately.
> > > > > > > > > > >
> > > > > > > > > > > Your method does not work in our case since we want
> > to
> > > > treat
> > > > > > the
> > > > > > > > > first
> > > > > > > > > > > two bytes as the first word in our case. But I do
> > > > understand
> > > > > > that
> > > > > > > > > both
> > > > > > > > > > > methods are useful.
> > > > > > > > > >
> > > > > > > > > > Yes, that certainly are two different use cases,
> > requiring
> > > > two
> > > > > > > > > different ways of calculating the 16 bit checksum.
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Note that your method breaks the API. Previously
> > > > (assuming
> > > > > > > > > > > no
> > > > > > > > > crashing
> > > > > > > > > > > due to low optimization levels, more accepting
> > hardware,
> > > > or
> > > > > > > > > > > a
> > > > > > > > > different
> > > > > > > > > > > compiler (version)) the current method would
> > calculate
> > > > the
> > > > > > > > > > > checksum assuming the first two bytes is the first
> > word.
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Depending on the point of view, my patch either fixes
> > > > > > > > > > a bug
> > > > > > (where
> > > > > > > > > the checksum was calculated incorrectly when the buf
> > pointer
> > > > was
> > > > > > > > > unaligned) or breaks the API (by calculating the
> > differently
> > > > > > > > > when
> > > > > > the
> > > > > > > > > buffer is unaligned).
> > > > > > > > > >
> > > > > > > > > > I cannot say with certainty which one is correct, but
> > > > perhaps
> > > > > > some
> > > > > > > > > > of
> > > > > > > > > the people with a deeper DPDK track record can...
> > > > > > > > > >
> > > > > > > > > > @Bruce and @Stephen, in 2019 you signed off on a patch
> > [1]
> > > > > > > > > introducing a 16 bit alignment requirement to the
> > Ethernet
> > > > > > address
> > > > > > > > > structure.
> > > > > > > > > >
> > > > > > > > > > It is my understanding that DPDK has an invariant
> > > > > > > > > > requiring
> > > > > > packets
> > > > > > > > > to be 16 bit aligned, which that patch supports. Is this
> > > > > > invariant
> > > > > > > > > documented anywhere, or am I completely wrong? If I'm
> > wrong,
> > > > > > > > > then
> > > > > > the
> > > > > > > > > alignment requirement introduced in that patch needs to
> > be
> > > > > > removed, as
> > > > > > > > > well as any similar alignment requirements elsewhere in
> > DPDK.
> > > > > > > > >
> > > > > > > > > I don't believe it is explicitly documented as a global
> > > > > > invariant, but
> > > > > > > > > I think it should be unless there is a definite case
> > where
> > > > > > > > > we
> > > > > > need to
> > > > > > > > > allow packets to be completely unaligned. Across all
> > packet
> > > > > > headers we
> > > > > > > > > looked at, there was no tunneling protocol where the
> > > > resulting
> > > > > > packet
> > > > > > > > > was left unaligned.
> > > > > > > > >
> > > > > > > > > That said, if there are real use cases where we need to
> > > > > > > > > allow
> > > > > > packets
> > > > > > > > > to start at an unaligned address, then I agree with you
> > that
> > > > we
> > > > > > need
> > > > > > > > > to roll back the patch and work to ensure everything
> > works
> > > > with
> > > > > > > > > unaligned addresses.
> > > > > > > > >
> > > > > > > > > /Bruce
> > > > > > > > >
> > > > > > > >
> > > > > > > > @Emil, can you please describe or refer to which tunneling
> > > > > > > > protocol
> > > > > > you are
> > > > > > > > using, where the nested packet can be unaligned?
> > > > > > > >
> > > > > > > > I am asking to determine if your use case is exotic (maybe
> > > > > > > > some
> > > > > > Ericsson
> > > > > > > > proprietary protocol), or more generic (rooted in some
> > > > > > > > standard
> > > > > > protocol).
> > > > > > > > This information affects the DPDK community's opinion
> > > > > > > > about how
> > > > it
> > > > > > should
> > > > > > > > be supported by DPDK.
> > > > > > > >
> > > > > > > > If possible, please provide more details about the
> > tunneling
> > > > > > protocol and
> > > > > > > > nested packets... E.g. do the nested packets also contain
> > > > > > > > Layer
> > > > 2
> > > > > > (Ethernet,
> > > > > > > > VLAN, etc.) headers, or only Layer 3 (IP) or Layer 4 (TCP,
> > > > > > > > UDP,
> > > > > > etc.)? And how
> > > > > > > > about ARP packets and Layer 2 control protocol packets
> > (STP,
> > > > LACP,
> > > > > > etc.)?
> > > > > > > >
> > > > > > >
> > > > > > > Well, if you append or adjust an odd number of bytes (e.g. a
> > > > > > > PDCP
> > > > > > header) from a previously aligned payload the entire packet
> > will
> > > > then
> > > > > > be unaligned.
> > > > > > >
> > > > > >
> > > > > > If PDCP headers can leave the rest of the packet field
> > unaligned,
> > > > then
> > > > > > we had better remove the alignment restrictions through all of
> > > > DPDK.
> > > > > >
> > > > > > /Bruce
> > > > >
> > > > > Re-reading the details regarding unaligned pointers in C11, as
> > > > > posted
> > > > by Emil
> > > > > in Bugzilla [2], I interpret it as follows: Any 16 bit or wider
> > > > pointer type a must
> > > > > point to data aligned with that type, i.e. a pointer of the type
> > > > "uint16_t *"
> > > > > must point to 16 bit aligned data, and a pointer of the type
> > > > "uint64_t *" must
> > > > > point to 64 bit aligned data. Please, someone tell me I got this
> > > > wrong, and
> > > > > wake me up from my nightmare!
> > > > >
> > > > > Updating DPDK's packet structures to fully support this C11
> > > > limitation with
> > > > > unaligned access would be a nightmare, as we would need to use
> > byte
> > > > arrays
> > > > > for all structure fields. Functions would also be unable to use
> > > > > other
> > > > pointer
> > > > > types than "void *" and "char *", which seems to be the actual
> > > > problem in
> > > > > the __rte_raw_cksum() function. I guess that it also would
> > prevent
> > > > the
> > > > > compiler from auto-vectorizing the functions.
> > > > >
> > > > > I am usually a big proponent of academically correct solutions,
> > but
> > > > such a
> > > > > change would be too wide ranging, so I would like to narrow it
> > down
> > > > to the
> > > > > actual use case, and perhaps extrapolate a bit from there.
> > > > >
> > > > > @Emil: Do you only need to calculate the checksum of the
> > > > > (potentially
> > > > > unaligned) embedded packet? Or do you also need to use other
> > > > > DPDK functions with the embedded packet, potentially accessing
> > > > > it at
> > an
> > > > unaligned
> > > > > address?
> > > > >
> > > > > I'm trying to determine the scope of this C11 pointer alignment
> > > > limitation for
> > > > > your use case, i.e. whether or not other DPDK functions need to
> > be
> > > > updated
> > > > > to support unaligned packet access too.
> > > > >
> > > > > [2]
> > > > > https://protect2.fireeye.com/v1/url?k=31323334-501cfaf3-313273af
> > > > > -
> > > > > 454445554331-2ffe58e5caaeb74e&q=1&e=3f0544d3-8a71-4676-b4f9-
> > > > >
> > >
> 27e0952f7de0&u=https%3A%2F%2Fbugs.dpdk.org%2Fshow_bug.cgi%3Fid%
> > > > > 3D1035
> > > >
> > > > That's my interpretation of the standard as well; For example an
> > > > uint16_t* must be on even addresses. If not it is undefined
> > behavior.
> > > > I think this is a bigger problem on ARM for example.
> > > >
> > > > Without being that invested in dpdk, adding unaligned support for
> > > > everything seems like a steep step, but I'm not sure what it
> > entails
> > > > in practice.
> > > >
> > > > We are actually only interested in the checksumming.
> > >
> > > Great! Then we can cancel the panic about rewriting DPDK Core
> > completely.
> > > Although it might still need some review for similar alignment bugs,
> > where
> > > we have been forcing the compiler shut up when trying to warn us.
> > > :-)
> > >
> > > I have provided v3 of the patch, which should do as requested - and
> > still allow
> > > the compiler to auto-vectorize.
> > >
> > > @Emil, will you please test v3 of the patch?
> >
> > It seems to work in these two cases:
> > * Even address, even length
> > * Even address, odd length
> > But it breaks in these two cases:
> > * Odd address, even length (although it works for small buffers,
> > probably when the sum fits inside a uint16_t integer or something)
> 
> Interesting observation, good analysis.
> 
> > * Odd address, odd length
> 
> Does this also work for small buffers?
> 
> > I get (and like) the main idea of the algorithm but haven't yet
> > figured out what the problem is with odd addresses.
> 
> I wonder if I messed up the algorithm for swapping back the bytes in bsum
> after the calculation... Is the checksum also wrong when compiling without
> optimization?
> 
> And just to be sure: The algorithm requires that __rte_raw_cksum_reduce()
> is also applied to the sum. Please confirm that you call rte_raw_cksum() (or
> __rte_raw_cksum() followed by __rte_raw_cksum_reduce())?
> 

Yes, I messed up. I didn't run the reduction part. When I do the output seems to be the same.

It seems to be about as fast as the previous algorithm, obviously. Both valgrind and fsanitize=undefined are happy.

Some minor improvements:
* #include <stdbool.h>?
* Use RTE_PTR_ADD to make the casts cleaner?
* I guess you could skip using 'bsum' and add to 'sum' instead, but that's a matter of preference
* Can't you just do bsum += *(const unsigned char *)buf; to avoid 'first', making it a bit more readable?

> >
> > /Emil

Morten Brørup June 23, 2022, 12:18 p.m. UTC | #18

> From: Emil Berg [mailto:emil.berg@ericsson.com]
> Sent: Thursday, 23 June 2022 13.39
> 
> > From: Morten Brørup <mb@smartsharesystems.com>
> > Sent: den 23 juni 2022 09:01
> >
> > > From: Emil Berg [mailto:emil.berg@ericsson.com]
> > > Sent: Thursday, 23 June 2022 07.22
> > >
> > > > From: Morten Brørup <mb@smartsharesystems.com>
> > > > Sent: den 22 juni 2022 16:02
> > > >
> > > > > From: Emil Berg [mailto:emil.berg@ericsson.com]
> > > > > Sent: Wednesday, 22 June 2022 14.25
> > > > >
> > > > > > From: Morten Brørup <mb@smartsharesystems.com>
> > > > > > Sent: den 22 juni 2022 13:26
> > > > > >
> > > > > > > From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> > > > > > > Sent: Wednesday, 22 June 2022 11.18
> > > > > > >
> > > > > > > On Wed, Jun 22, 2022 at 06:26:07AM +0000, Emil Berg wrote:
> > > > > > > >
> > > > > > > > > From: Morten Brørup <mb@smartsharesystems.com>
> > > > > > > > > Sent: den 21 juni 2022 11:35
> > > > > > > > >
> > > > > > > > > > From: Bruce Richardson
> > > [mailto:bruce.richardson@intel.com]
> > > > > > > > > > Sent: Tuesday, 21 June 2022 10.23
> > > > > > > > > >
> > > > > > > > > > On Tue, Jun 21, 2022 at 10:05:15AM +0200, Morten
> Brørup
> > > > > wrote:
> > > > > > > > > > > +TO: @Bruce and @Stephen: You signed off on the 16
> bit
> > > > > > > alignment
> > > > > > > > > > requirement. We need background info on this.
> > > > > > > > > > >
> > > > > > > > > > > > From: Emil Berg [mailto:emil.berg@ericsson.com]
> > > > > > > > > > > > Sent: Tuesday, 21 June 2022 09.17
> > > > > > > > > > > >
> > > > > > > > > > > > > From: Morten Brørup <mb@smartsharesystems.com>
> > > > > > > > > > > > > Sent: den 20 juni 2022 12:58
> > > > > > > > > > > > >
> > > > > > > > > > > > > > From: Emil Berg
> [mailto:emil.berg@ericsson.com]
> > > > > > > > > > > > > > Sent: Monday, 20 June 2022 12.38
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > From: Morten Brørup
> > <mb@smartsharesystems.com>
> > > > > > > > > > > > > > > Sent: den 17 juni 2022 11:07
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > From: Morten Brørup
> > > > > > > > > > > > > > > > [mailto:mb@smartsharesystems.com]
> > > > > > > > > > > > > > > > Sent: Friday, 17 June 2022 10.45
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > With this patch, the checksum can be
> > > calculated
> > > > > on
> > > > > > > > > > > > > > > > an
> > > > > > > > > > unligned
> > > > > > > > > > > > > > > > part
> > > > > > > > > > > > > > of
> > > > > > > > > > > > > > > > a packet buffer.
> > > > > > > > > > > > > > > > I.e. the buf parameter is no longer
> required
> > > to
> > > > > be
> > > > > > > > > > > > > > > > 16
> > > > > > > bit
> > > > > > > > > > > > aligned.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > The DPDK invariant that packet buffers
> must
> > > be
> > > > > > > > > > > > > > > > 16 bit
> > > > > > > > > > aligned
> > > > > > > > > > > > > > remains
> > > > > > > > > > > > > > > > unchanged.
> > > > > > > > > > > > > > > > This invariant also defines how to
> calculate
> > > the
> > > > > 16
> > > > > > > bit
> > > > > > > > > > > > checksum
> > > > > > > > > > > > > > > > on
> > > > > > > > > > > > > > an
> > > > > > > > > > > > > > > > unaligned part of a packet buffer.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Bugzilla ID: 1035
> > > > > > > > > > > > > > > > Cc: stable@dpdk.org
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Signed-off-by: Morten Brørup
> > > > > > > <mb@smartsharesystems.com>
> > > > > > > > > > > > > > > > ---
> > > > > > > > > > > > > > > >  lib/net/rte_ip.h | 17 +++++++++++++++--
> > > > > > > > > > > > > > > >  1 file changed, 15 insertions(+), 2
> > > > > > > > > > > > > > > > deletions(-)
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > diff --git a/lib/net/rte_ip.h
> > > b/lib/net/rte_ip.h
> > > > > > > index
> > > > > > > > > > > > > > > > b502481670..8e301d9c26 100644
> > > > > > > > > > > > > > > > --- a/lib/net/rte_ip.h
> > > > > > > > > > > > > > > > +++ b/lib/net/rte_ip.h
> > > > > > > > > > > > > > > > @@ -162,9 +162,22 @@
> __rte_raw_cksum(const
> > > void
> > > > > > > > > > > > > > > > *buf,
> > > > > > > > > > size_t
> > > > > > > > > > > > len,
> > > > > > > > > > > > > > > > uint32_t sum)  {
> > > > > > > > > > > > > > > >  	/* extend strict-aliasing rules */
> > > > > > > > > > > > > > > >  	typedef uint16_t
> > > > > > __attribute__((__may_alias__))
> > > > > > > > > > u16_p;
> > > > > > > > > > > > > > > > -	const u16_p *u16_buf = (const u16_p
> > > *)buf;
> > > > > > > > > > > > > > > > -	const u16_p *end = u16_buf + len /
> > > > > > > sizeof(*u16_buf);
> > > > > > > > > > > > > > > > +	const u16_p *u16_buf;
> > > > > > > > > > > > > > > > +	const u16_p *end;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +	/* if buffer is unaligned, keeping
> it
> > > byte
> > > > > > > order
> > > > > > > > > > > > independent */
> > > > > > > > > > > > > > > > +	if (unlikely((uintptr_t)buf & 1)) {
> > > > > > > > > > > > > > > > +		uint16_t first = 0;
> > > > > > > > > > > > > > > > +		if (unlikely(len == 0))
> > > > > > > > > > > > > > > > +			return 0;
> > > > > > > > > > > > > > > > +		((unsigned char *)&first)[1]
> =
> > > > > > *(const
> > > > > > > unsigned
> > > > > > > > > > > > > > > char *)buf;
> > > > > > > > > > > > > > > > +		sum += first;
> > > > > > > > > > > > > > > > +		buf = (const void
> *)((uintptr_t)buf
> > > > > > + 1);
> > > > > > > > > > > > > > > > +		len--;
> > > > > > > > > > > > > > > > +	}
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > +	u16_buf = (const u16_p *)buf;
> > > > > > > > > > > > > > > > +	end = u16_buf + len /
> sizeof(*u16_buf);
> > > > > > > > > > > > > > > >  	for (; u16_buf != end; ++u16_buf)
> > > > > > > > > > > > > > > >  		sum += *u16_buf;
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > > 2.17.1
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > @Emil, can you please test this patch with
> an
> > > > > > > > > > > > > > > unaligned
> > > > > > > > > > buffer on
> > > > > > > > > > > > > > your
> > > > > > > > > > > > > > > application to confirm that it produces the
> > > > > expected
> > > > > > > result.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi!
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I tested the patch. It doesn't seem to
> produce
> > > the
> > > > > same
> > > > > > > > > > results. I
> > > > > > > > > > > > > > think the problem is that it always starts
> > > summing
> > > > > from
> > > > > > > an
> > > > > > > > > > > > > > even address, the sum should always start
> from
> > > the
> > > > > first
> > > > > > > byte
> > > > > > > > > > according
> > > > > > > > > > > > to
> > > > > > > > > > > > > > the checksum specification. Can I instead
> > > > > > > > > > > > > > propose
> > > > > > > something
> > > > > > > > > > Mattias
> > > > > > > > > > > > > > Rönnblom sent me?
> > > > > > > > > > > > >
> > > > > > > > > > > > > I assume that it produces the same result when
> the
> > > > > "buf"
> > > > > > > > > > parameter is
> > > > > > > > > > > > > aligned?
> > > > > > > > > > > > >
> > > > > > > > > > > > > And when the "buf" parameter is unaligned, I
> don't
> > > > > expect
> > > > > > > it to
> > > > > > > > > > > > produce the
> > > > > > > > > > > > > same results as the simple algorithm!
> > > > > > > > > > > > >
> > > > > > > > > > > > > This was the whole point of the patch: I expect
> > > > > > > > > > > > > the overall
> > > > > > > > > > packet
> > > > > > > > > > > > buffer to
> > > > > > > > > > > > > be 16 bit aligned, and the checksum to be a
> > > > > > > > > > > > > partial
> > > > > > > checksum of
> > > > > > > > > > such
> > > > > > > > > > > > a 16 bit
> > > > > > > > > > > > > aligned packet buffer. When calling this
> function,
> > > I
> > > > > > > > > > > > > assume
> > > > > > > that
> > > > > > > > > > the
> > > > > > > > > > > > "buf" and
> > > > > > > > > > > > > "len" parameters point to a part of such a
> packet
> > > > > buffer.
> > > > > > > If
> > > > > > > > > > these
> > > > > > > > > > > > > expectations are correct, the simple algorithm
> > > > > > > > > > > > > will produce
> > > > > > > > > > incorrect
> > > > > > > > > > > > results
> > > > > > > > > > > > > when "buf" is unaligned.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I was asking you to test if the checksum on the
> > > packet
> > > > > is
> > > > > > > > > > > > > correct
> > > > > > > > > > > > when your
> > > > > > > > > > > > > application modifies an unaligned part of the
> > > packet
> > > > > and
> > > > > > > uses
> > > > > > > > > > this
> > > > > > > > > > > > function to
> > > > > > > > > > > > > update the checksum.
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Now I understand your use case. Your use case
> seems
> > > to
> > > > > > > > > > > > be
> > > > > > > about
> > > > > > > > > > partial
> > > > > > > > > > > > checksums, of which some partial checksums may
> start
> > > on
> > > > > > > unaligned
> > > > > > > > > > > > addresses in an otherwise aligned packet.
> > > > > > > > > > > >
> > > > > > > > > > > > Our use case is about calculating the full
> checksum
> > > on a
> > > > > > > nested
> > > > > > > > > > packet.
> > > > > > > > > > > > That nested packet may start on unaligned
> addresses.
> > > > > > > > > > > >
> > > > > > > > > > > > The difference is basically if we want to sum
> over
> > > > > aligned
> > > > > > > > > > addresses or
> > > > > > > > > > > > not, handling the heading and trailing bytes
> > > > > appropriately.
> > > > > > > > > > > >
> > > > > > > > > > > > Your method does not work in our case since we
> want
> > > to
> > > > > treat
> > > > > > > the
> > > > > > > > > > first
> > > > > > > > > > > > two bytes as the first word in our case. But I do
> > > > > understand
> > > > > > > that
> > > > > > > > > > both
> > > > > > > > > > > > methods are useful.
> > > > > > > > > > >
> > > > > > > > > > > Yes, that certainly are two different use cases,
> > > requiring
> > > > > two
> > > > > > > > > > different ways of calculating the 16 bit checksum.
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Note that your method breaks the API. Previously
> > > > > (assuming
> > > > > > > > > > > > no
> > > > > > > > > > crashing
> > > > > > > > > > > > due to low optimization levels, more accepting
> > > hardware,
> > > > > or
> > > > > > > > > > > > a
> > > > > > > > > > different
> > > > > > > > > > > > compiler (version)) the current method would
> > > calculate
> > > > > the
> > > > > > > > > > > > checksum assuming the first two bytes is the
> first
> > > word.
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Depending on the point of view, my patch either
> fixes
> > > > > > > > > > > a bug
> > > > > > > (where
> > > > > > > > > > the checksum was calculated incorrectly when the buf
> > > pointer
> > > > > was
> > > > > > > > > > unaligned) or breaks the API (by calculating the
> > > differently
> > > > > > > > > > when
> > > > > > > the
> > > > > > > > > > buffer is unaligned).
> > > > > > > > > > >
> > > > > > > > > > > I cannot say with certainty which one is correct,
> but
> > > > > perhaps
> > > > > > > some
> > > > > > > > > > > of
> > > > > > > > > > the people with a deeper DPDK track record can...
> > > > > > > > > > >
> > > > > > > > > > > @Bruce and @Stephen, in 2019 you signed off on a
> patch
> > > [1]
> > > > > > > > > > introducing a 16 bit alignment requirement to the
> > > Ethernet
> > > > > > > address
> > > > > > > > > > structure.
> > > > > > > > > > >
> > > > > > > > > > > It is my understanding that DPDK has an invariant
> > > > > > > > > > > requiring
> > > > > > > packets
> > > > > > > > > > to be 16 bit aligned, which that patch supports. Is
> this
> > > > > > > invariant
> > > > > > > > > > documented anywhere, or am I completely wrong? If I'm
> > > wrong,
> > > > > > > > > > then
> > > > > > > the
> > > > > > > > > > alignment requirement introduced in that patch needs
> to
> > > be
> > > > > > > removed, as
> > > > > > > > > > well as any similar alignment requirements elsewhere
> in
> > > DPDK.
> > > > > > > > > >
> > > > > > > > > > I don't believe it is explicitly documented as a
> global
> > > > > > > invariant, but
> > > > > > > > > > I think it should be unless there is a definite case
> > > where
> > > > > > > > > > we
> > > > > > > need to
> > > > > > > > > > allow packets to be completely unaligned. Across all
> > > packet
> > > > > > > headers we
> > > > > > > > > > looked at, there was no tunneling protocol where the
> > > > > resulting
> > > > > > > packet
> > > > > > > > > > was left unaligned.
> > > > > > > > > >
> > > > > > > > > > That said, if there are real use cases where we need
> to
> > > > > > > > > > allow
> > > > > > > packets
> > > > > > > > > > to start at an unaligned address, then I agree with
> you
> > > that
> > > > > we
> > > > > > > need
> > > > > > > > > > to roll back the patch and work to ensure everything
> > > works
> > > > > with
> > > > > > > > > > unaligned addresses.
> > > > > > > > > >
> > > > > > > > > > /Bruce
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > @Emil, can you please describe or refer to which
> tunneling
> > > > > > > > > protocol
> > > > > > > you are
> > > > > > > > > using, where the nested packet can be unaligned?
> > > > > > > > >
> > > > > > > > > I am asking to determine if your use case is exotic
> (maybe
> > > > > > > > > some
> > > > > > > Ericsson
> > > > > > > > > proprietary protocol), or more generic (rooted in some
> > > > > > > > > standard
> > > > > > > protocol).
> > > > > > > > > This information affects the DPDK community's opinion
> > > > > > > > > about how
> > > > > it
> > > > > > > should
> > > > > > > > > be supported by DPDK.
> > > > > > > > >
> > > > > > > > > If possible, please provide more details about the
> > > tunneling
> > > > > > > protocol and
> > > > > > > > > nested packets... E.g. do the nested packets also
> contain
> > > > > > > > > Layer
> > > > > 2
> > > > > > > (Ethernet,
> > > > > > > > > VLAN, etc.) headers, or only Layer 3 (IP) or Layer 4
> (TCP,
> > > > > > > > > UDP,
> > > > > > > etc.)? And how
> > > > > > > > > about ARP packets and Layer 2 control protocol packets
> > > (STP,
> > > > > LACP,
> > > > > > > etc.)?
> > > > > > > > >
> > > > > > > >
> > > > > > > > Well, if you append or adjust an odd number of bytes
> (e.g. a
> > > > > > > > PDCP
> > > > > > > header) from a previously aligned payload the entire packet
> > > will
> > > > > then
> > > > > > > be unaligned.
> > > > > > > >
> > > > > > >
> > > > > > > If PDCP headers can leave the rest of the packet field
> > > unaligned,
> > > > > then
> > > > > > > we had better remove the alignment restrictions through all
> of
> > > > > DPDK.
> > > > > > >
> > > > > > > /Bruce
> > > > > >
> > > > > > Re-reading the details regarding unaligned pointers in C11,
> as
> > > > > > posted
> > > > > by Emil
> > > > > > in Bugzilla [2], I interpret it as follows: Any 16 bit or
> wider
> > > > > pointer type a must
> > > > > > point to data aligned with that type, i.e. a pointer of the
> type
> > > > > "uint16_t *"
> > > > > > must point to 16 bit aligned data, and a pointer of the type
> > > > > "uint64_t *" must
> > > > > > point to 64 bit aligned data. Please, someone tell me I got
> this
> > > > > wrong, and
> > > > > > wake me up from my nightmare!
> > > > > >
> > > > > > Updating DPDK's packet structures to fully support this C11
> > > > > limitation with
> > > > > > unaligned access would be a nightmare, as we would need to
> use
> > > byte
> > > > > arrays
> > > > > > for all structure fields. Functions would also be unable to
> use
> > > > > > other
> > > > > pointer
> > > > > > types than "void *" and "char *", which seems to be the
> actual
> > > > > problem in
> > > > > > the __rte_raw_cksum() function. I guess that it also would
> > > prevent
> > > > > the
> > > > > > compiler from auto-vectorizing the functions.
> > > > > >
> > > > > > I am usually a big proponent of academically correct
> solutions,
> > > but
> > > > > such a
> > > > > > change would be too wide ranging, so I would like to narrow
> it
> > > down
> > > > > to the
> > > > > > actual use case, and perhaps extrapolate a bit from there.
> > > > > >
> > > > > > @Emil: Do you only need to calculate the checksum of the
> > > > > > (potentially
> > > > > > unaligned) embedded packet? Or do you also need to use other
> > > > > > DPDK functions with the embedded packet, potentially
> accessing
> > > > > > it at
> > > an
> > > > > unaligned
> > > > > > address?
> > > > > >
> > > > > > I'm trying to determine the scope of this C11 pointer
> alignment
> > > > > limitation for
> > > > > > your use case, i.e. whether or not other DPDK functions need
> to
> > > be
> > > > > updated
> > > > > > to support unaligned packet access too.
> > > > > >
> > > > > > [2]
> > > > > > https://protect2.fireeye.com/v1/url?k=31323334-501cfaf3-
> 313273af
> > > > > > -
> > > > > > 454445554331-2ffe58e5caaeb74e&q=1&e=3f0544d3-8a71-4676-b4f9-
> > > > > >
> > > >
> > 27e0952f7de0&u=https%3A%2F%2Fbugs.dpdk.org%2Fshow_bug.cgi%3Fid%
> > > > > > 3D1035
> > > > >
> > > > > That's my interpretation of the standard as well; For example
> an
> > > > > uint16_t* must be on even addresses. If not it is undefined
> > > behavior.
> > > > > I think this is a bigger problem on ARM for example.
> > > > >
> > > > > Without being that invested in dpdk, adding unaligned support
> for
> > > > > everything seems like a steep step, but I'm not sure what it
> > > entails
> > > > > in practice.
> > > > >
> > > > > We are actually only interested in the checksumming.
> > > >
> > > > Great! Then we can cancel the panic about rewriting DPDK Core
> > > completely.
> > > > Although it might still need some review for similar alignment
> bugs,
> > > where
> > > > we have been forcing the compiler shut up when trying to warn us.
> > > > :-)
> > > >
> > > > I have provided v3 of the patch, which should do as requested -
> and
> > > still allow
> > > > the compiler to auto-vectorize.
> > > >
> > > > @Emil, will you please test v3 of the patch?
> > >
> > > It seems to work in these two cases:
> > > * Even address, even length
> > > * Even address, odd length
> > > But it breaks in these two cases:
> > > * Odd address, even length (although it works for small buffers,
> > > probably when the sum fits inside a uint16_t integer or something)
> >
> > Interesting observation, good analysis.
> >
> > > * Odd address, odd length
> >
> > Does this also work for small buffers?
> >
> > > I get (and like) the main idea of the algorithm but haven't yet
> > > figured out what the problem is with odd addresses.
> >
> > I wonder if I messed up the algorithm for swapping back the bytes in
> bsum
> > after the calculation... Is the checksum also wrong when compiling
> without
> > optimization?
> >
> > And just to be sure: The algorithm requires that
> __rte_raw_cksum_reduce()
> > is also applied to the sum. Please confirm that you call
> rte_raw_cksum() (or
> > __rte_raw_cksum() followed by __rte_raw_cksum_reduce())?
> >
> 
> Yes, I messed up. I didn't run the reduction part. When I do the output
> seems to be the same.

I'm really happy to hear that! Thank you for informing me quickly.

> 
> It seems to be about as fast as the previous algorithm, obviously. Both
> valgrind and fsanitize=undefined are happy.
> 
> Some minor improvements:
> * #include <stdbool.h>?

Some include it, some don't.
Looking at other header files, I can't see any pattern, but I agree that it should be included here.

> * Use RTE_PTR_ADD to make the casts cleaner?

Yes. Will do.

> * I guess you could skip using 'bsum' and add to 'sum' instead, but
> that's a matter of preference

No, because we want to keep the byte order of the initial sum intact, and only swap the bytes of the buffer's sum (when the buffer is unaligned).
	
> * Can't you just do bsum += *(const unsigned char *)buf; to avoid
> 'first', making it a bit more readable?

No, because on little endian CPUs, the second byte is the MSB of the uint16_t.
It would only work on big endian CPUs.

I considered doing it differently. But for consistency, I reused the 'left' block's method; only I'm putting the spare byte in the second byte of the 16 bit word, whereas the 'first' block puts the spare byte in the first byte of the 16 bit word.


PS: A lot of smart tricks are available for IP checksumming... RFC 1071 is worth a read. :-)

diff mbox series

Patch

diff --git a/lib/net/rte_ip.h b/lib/net/rte_ip.h
index b502481670..8e301d9c26 100644
--- a/lib/net/rte_ip.h
+++ b/lib/net/rte_ip.h
@@ -162,9 +162,22 @@  __rte_raw_cksum(const void *buf, size_t len, uint32_t sum)
 {
 	/* extend strict-aliasing rules */
 	typedef uint16_t __attribute__((__may_alias__)) u16_p;
-	const u16_p *u16_buf = (const u16_p *)buf;
-	const u16_p *end = u16_buf + len / sizeof(*u16_buf);
+	const u16_p *u16_buf;
+	const u16_p *end;
+
+	/* if buffer is unaligned, keeping it byte order independent */
+	if (unlikely((uintptr_t)buf & 1)) {
+		uint16_t first = 0;
+		if (unlikely(len == 0))
+			return 0;
+		((unsigned char *)&first)[1] = *(const unsigned char *)buf;
+		sum += first;
+		buf = (const void *)((uintptr_t)buf + 1);
+		len--;
+	}
 
+	u16_buf = (const u16_p *)buf;
+	end = u16_buf + len / sizeof(*u16_buf);
 	for (; u16_buf != end; ++u16_buf)
 		sum += *u16_buf;