[v3,2/2] net: have checksum routines accept unaligned data

Message ID 20220711121132.34546-2-mattias.ronnblom@ericsson.com
State Accepted, archived
Delegated to: Thomas Monjalon
Series [v3,1/2] app/test: add cksum performance test

Checks

Context                         Check    Description
ci/checkpatch                   success  coding style OK
ci/iol-intel-Performance        success  Performance Testing PASS
ci/iol-intel-Functional         success  Functional Testing PASS
ci/iol-x86_64-compile-testing   success  Testing PASS
ci/github-robot: build          success  github build: passed
ci/iol-aarch64-compile-testing  success  Testing PASS
ci/iol-abi-testing              success  Testing PASS
ci/iol-aarch64-unit-testing     success  Testing PASS
ci/iol-x86_64-unit-testing      success  Testing PASS
ci/Intel-compilation            success  Compilation OK
ci/intel-Testing                success  Testing PASS

Commit Message

Mattias Rönnblom July 11, 2022, 12:11 p.m. UTC
__rte_raw_cksum() (used by rte_raw_cksum() among others) accessed its
data through a uint16_t pointer, which allowed the compiler to assume
the data was 16-bit aligned. This in turn would, with certain
architectures and compiler flag combinations, result in code with SIMD
load or store instructions with restrictions on data alignment.
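A minimal plain-C sketch of the precondition a `uint16_t` access carries (not DPDK code; `is_u16_aligned()` is a hypothetical helper). On mainstream ABIs `alignof(uint16_t)` is 2, so a pointer to an odd address fails the check. Converting such an address to `uint16_t *` is undefined behavior, which is what leaves the compiler free to pick alignment-sensitive vector loads for a `uint16_t`-based loop.

```c
#include <stdalign.h>
#include <stdint.h>

/*
 * Hypothetical helper: nonzero if p meets uint16_t's alignment
 * requirement (2 bytes on mainstream ABIs). Converting an address that
 * fails this check to uint16_t * (and reading through it) is undefined
 * behavior in C.
 */
static int is_u16_aligned(const void *p)
{
	return ((uintptr_t)p % alignof(uint16_t)) == 0;
}
```

With `alignas(uint16_t) unsigned char frame[8]`, `is_u16_aligned(frame)` holds while `is_u16_aligned(frame + 1)` does not, which is the situation any buffer starting at an odd offset creates.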

This patch keeps the old algorithm, but data is read using memcpy()
instead of direct pointer access, forcing the compiler to always
generate code that handles unaligned input. The __may_alias__ GCC
attribute is no longer needed.
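The `memcpy()` technique can be sketched in isolation as below. `load_u16()` and `sum_words()` are hypothetical names; the loop sums complete 16-bit words in the same way as the patched `__rte_raw_cksum()`, but indexes with a byte offset instead of DPDK's pointer macros.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a native-endian 16-bit word from any address. memcpy() carries
 * no alignment assumption about its source. */
static inline uint16_t load_u16(const void *p)
{
	uint16_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}

/* Sum the complete 16-bit words of buf; a trailing odd byte is ignored. */
static uint32_t sum_words(const void *buf, size_t len, uint32_t sum)
{
	const unsigned char *p = buf;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += load_u16(p + i);
	return sum;
}
```

Because the copy has a constant size of two bytes, GCC and Clang usually turn it into a single 16-bit load on targets that permit unaligned access, and into byte loads on targets that do not.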

The data on which the Internet checksum functions operate is almost
always 16-bit aligned, but there are exceptions. In particular, the
PDCP protocol header may (literally) have an odd size.
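The odd-length case follows the Internet checksum rule that a block with an odd number of bytes is summed as if padded with one zero byte. Copying the final byte into the lowest-addressed byte of a zeroed `uint16_t` achieves exactly that regardless of host byte order. A sketch with a hypothetical stand-alone `raw_sum()`:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-alone sum mirroring the patch's odd-length handling. */
static uint32_t raw_sum(const void *buf, size_t len)
{
	const unsigned char *p = buf;
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2) {
		uint16_t v;

		memcpy(&v, p + i, sizeof(v));
		sum += v;
	}

	if (len % 2) {
		/* Last byte goes into the first memory byte of a zeroed word:
		 * equivalent to appending one 0x00 pad byte, on little- and
		 * big-endian hosts alike. */
		uint16_t left = 0;

		memcpy(&left, p + len - 1, 1);
		sum += left;
	}
	return sum;
}
```

For a 5-byte input, `raw_sum(p, 5)` equals the sum over the same five bytes followed by a 0x00 pad byte.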

Performance impact seems to range from none to a very slight
regression.

Bugzilla ID: 1035
Cc: stable@dpdk.org

---

v3:
  * Use RTE_ALIGN_FLOOR() in the pointer arithmetic (Olivier Matz).
v2:
  * Simplified the odd-length conditional (Morten Brørup).

Reviewed-by: Morten Brørup <mb@smartsharesystems.com>

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
---
 lib/net/rte_ip.h | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)
  

Comments

Olivier Matz July 11, 2022, 1:25 p.m. UTC | #1
On Mon, Jul 11, 2022 at 02:11:32PM +0200, Mattias Rönnblom wrote:
> __rte_raw_cksum() (used by rte_raw_cksum() among others) accessed its
> data through an uint16_t pointer, which allowed the compiler to assume
> the data was 16-bit aligned. This in turn would, with certain
> architectures and compiler flag combinations, result in code with SIMD
> load or store instructions with restrictions on data alignment.
> 
> This patch keeps the old algorithm, but data is read using memcpy()
> instead of direct pointer access, forcing the compiler to always
> generate code that handles unaligned input. The __may_alias__ GCC
> attribute is no longer needed.
> 
> The data on which the Internet checksum functions operates are almost
> always 16-bit aligned, but there are exceptions. In particular, the
> PDCP protocol header may (literally) have an odd size.
> 
> Performance impact seems to range from none to a very slight
> regression.
> 
> Bugzilla ID: 1035
> Cc: stable@dpdk.org

Fixes: 6006818cfb26 ("net: new checksum functions")

> ---
> 
> v3:
>   * Use RTE_ALIGN_FLOOR() in the pointer arithmetic (Olivier Matz).
> v2:
>   * Simplified the odd-length conditional (Morten Brørup).
> 
> Reviewed-by: Morten Brørup <mb@smartsharesystems.com>
> 
> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>

Acked-by: Olivier Matz <olivier.matz@6wind.com>

Thank you!
  
Mattias Rönnblom Aug. 8, 2022, 9:25 a.m. UTC | #2
On 2022-07-11 15:25, Olivier Matz wrote:
> Acked-by: Olivier Matz <olivier.matz@6wind.com>
> 
> Thank you!

Will this be merged into 22.11? Into the stable branches?
  
Mattias Rönnblom Sept. 20, 2022, 12:09 p.m. UTC | #3
On 2022-07-11 15:25, Olivier Matz wrote:
> Acked-by: Olivier Matz <olivier.matz@6wind.com>
> 
> Thank you!

Are there any plans to merge this patchset?
  
Thomas Monjalon Sept. 20, 2022, 4:10 p.m. UTC | #4
20/09/2022 14:09, Mattias Rönnblom:
> On 2022-07-11 15:25, Olivier Matz wrote:
> > Acked-by: Olivier Matz <olivier.matz@6wind.com>
> > 
> > Thank you!
> 
> Are there any plans to merge this patchset?

Applied, thanks.
Sorry for the delay.
  

Patch

diff --git a/lib/net/rte_ip.h b/lib/net/rte_ip.h
index b502481670..ecd250e9be 100644
--- a/lib/net/rte_ip.h
+++ b/lib/net/rte_ip.h
@@ -160,18 +160,21 @@  rte_ipv4_hdr_len(const struct rte_ipv4_hdr *ipv4_hdr)
 static inline uint32_t
 __rte_raw_cksum(const void *buf, size_t len, uint32_t sum)
 {
-	/* extend strict-aliasing rules */
-	typedef uint16_t __attribute__((__may_alias__)) u16_p;
-	const u16_p *u16_buf = (const u16_p *)buf;
-	const u16_p *end = u16_buf + len / sizeof(*u16_buf);
+	const void *end;
 
-	for (; u16_buf != end; ++u16_buf)
-		sum += *u16_buf;
+	for (end = RTE_PTR_ADD(buf, RTE_ALIGN_FLOOR(len, sizeof(uint16_t)));
+	     buf != end; buf = RTE_PTR_ADD(buf, sizeof(uint16_t))) {
+		uint16_t v;
+
+		memcpy(&v, buf, sizeof(uint16_t));
+		sum += v;
+	}
 
 	/* if length is odd, keeping it byte order independent */
 	if (unlikely(len % 2)) {
 		uint16_t left = 0;
-		*(unsigned char *)&left = *(const unsigned char *)end;
+
+		memcpy(&left, end, 1);
 		sum += left;
 	}
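For experimenting outside a DPDK build, the patched function can be reproduced in plain C. This is a sketch under stated assumptions: `PTR_ADD()` and `ALIGN_FLOOR()` are local stand-ins written to behave like `RTE_PTR_ADD()` and `RTE_ALIGN_FLOOR()` for this use (power-of-two alignment only), and `cksum_fold()` is a hypothetical helper that folds the 32-bit accumulator to 16 bits. DPDK's own reduction and final complement are not shown.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local stand-ins so the sketch builds without DPDK headers. */
#define PTR_ADD(p, n) ((const void *)((uintptr_t)(p) + (n)))
#define ALIGN_FLOOR(v, a) ((v) & ~((size_t)(a) - 1))

/* Mirrors the patched __rte_raw_cksum(): every load goes through memcpy(). */
static uint32_t raw_cksum_partial(const void *buf, size_t len, uint32_t sum)
{
	const void *end;

	/* Walk whole 16-bit words; len is rounded down to an even count. */
	for (end = PTR_ADD(buf, ALIGN_FLOOR(len, sizeof(uint16_t)));
	     buf != end; buf = PTR_ADD(buf, sizeof(uint16_t))) {
		uint16_t v;

		memcpy(&v, buf, sizeof(uint16_t));
		sum += v;
	}

	/* Odd trailing byte, handled byte-order independently. */
	if (len % 2) {
		uint16_t left = 0;

		memcpy(&left, end, 1);
		sum += left;
	}
	return sum;
}

/* Hypothetical helper: fold to 16 bits with end-around carry. */
static uint16_t cksum_fold(uint32_t sum)
{
	sum = (sum >> 16) + (sum & 0xffff);
	sum += sum >> 16;
	return (uint16_t)sum;
}
```

Calling `raw_cksum_partial()` on the same bytes at an even and at an odd address returns the same value, which is the property the patch is after; `cksum_fold()` can then be applied to that result.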