[v3] hash: add XOR32 hash function

Message ID 20230215110630.3885175-1-qobilidop@gmail.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Headers
Series [v3] hash: add XOR32 hash function |

Checks

Context Check Description
ci/checkpatch warning coding style issues
ci/loongarch-compilation success Compilation OK
ci/loongarch-unit-testing success Unit Testing PASS
ci/iol-mellanox-Performance success Performance Testing PASS
ci/iol-broadcom-Functional success Functional Testing PASS
ci/iol-broadcom-Performance success Performance Testing PASS
ci/iol-intel-Functional success Functional Testing PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/github-robot: build success github build: passed
ci/iol-testing success Testing PASS
ci/iol-x86_64-unit-testing success Testing PASS
ci/Intel-compilation success Compilation OK
ci/intel-Testing success Testing PASS
ci/iol-x86_64-compile-testing success Testing PASS
ci/iol-abi-testing success Testing PASS

Commit Message

Bili Dong Feb. 15, 2023, 11:06 a.m. UTC
  An XOR32 hash is needed in the Software Switch (SWX) Pipeline for its
use case in P4. We implement it in this patch so it could be easily
registered in the pipeline later.

Signed-off-by: Bili Dong <qobilidop@gmail.com>
---
 .mailmap                       |  1 +
 app/test/test_hash_functions.c | 33 +++++++++++++++--
 lib/hash/rte_hash_xor.h        | 65 ++++++++++++++++++++++++++++++++++
 3 files changed, 96 insertions(+), 3 deletions(-)
 create mode 100644 lib/hash/rte_hash_xor.h
  

Comments

Morten Brørup Feb. 15, 2023, 11:39 a.m. UTC | #1
> From: Bili Dong [mailto:qobilidop@gmail.com]
> Sent: Wednesday, 15 February 2023 12.07
> 
> An XOR32 hash is needed in the Software Switch (SWX) Pipeline for its
> use case in P4. We implement it in this patch so it could be easily
> registered in the pipeline later.
> 
> Signed-off-by: Bili Dong <qobilidop@gmail.com>
> ---

[...]

> +#define LEFT8b_MASK rte_cpu_to_be_32(0xff000000)
> +#define LEFT16b_MASK rte_cpu_to_be_32(0xffff0000)
> +
> +/**
> + * Calculate XOR32 hash on user-supplied byte array.
> + *
> + * @param data
> + *   Data to perform hash on.
> + * @param data_len
> + *   How many bytes to use to calculate hash value.
> + * @param init_val
> + *   Value to initialise hash generator.
> + * @return
> + *   32bit calculated hash value.
> + */
> +static inline uint32_t
> +rte_hash_xor(const void *data, uint32_t data_len, uint32_t init_val)
> +{
> +	uint32_t i;
> +	uintptr_t pd = (uintptr_t) data;
> +	init_val = rte_cpu_to_be_32(init_val);
> +
> +	for (i = 0; i < data_len / 4; i++) {
> +		init_val ^= *(const uint32_t *)pd;
> +		pd += 4;
> +	}
> +
> +	if (data_len & 0x2) {
> +		init_val ^= *(const uint32_t *)pd & LEFT16b_MASK;
> +		pd += 2;
> +	}
> +
> +	if (data_len & 0x1)
> +		init_val ^= *(const uint32_t *)pd & LEFT8b_MASK;
> +
> +	init_val = rte_be_to_cpu_32(init_val);
> +	return init_val;
> +}

I think that this function has swapped big endian and CPU endian everywhere. The result is the same, but the code would be much less confusing if using rte_cpu_32_to_be() when converting from CPU endian to big endian, and rte_be_to_cpu_32() when converting the other way.

I also suppose that the return type and the init_val parameter were meant to be rte_be32_t.

Also, please document that the byte array must be 32 bit aligned. Alternatively, implement support for unaligned data. You can find inspiration for handling of unaligned data in the __rte_raw_cksum() function:
https://elixir.bootlin.com/dpdk/v22.11.1/source/lib/net/rte_ip.h#L162
  
Bili Dong Feb. 15, 2023, 9:39 p.m. UTC | #2
Hi Morten,

Thanks for your comments!

For endianness conversion, I double-checked my usages. I did use both
rte_cpu_to_be_32() and rte_be_to_cpu_32(). I might have missed something
but I think I used them (4 occurrences) in a semantically meaningful way.
Could you point me to the lines that are confusing?

The hash function signature has to conform to
https://elixir.bootlin.com/dpdk/v22.11.1/source/lib/table/rte_swx_hash_func.h#L31,
so I don't have the freedom to change the parameter type to rte_be32_t,
although personally I agree with you and would prefer to make everything
consistently big-endian here.

I'm not sure about the byte alignment assumptions used in hash functions.
My implementation basically follows the existing CRC32 hash:
https://elixir.bootlin.com/dpdk/v22.11.1/source/lib/hash/rte_hash_crc.h#L168,
and I don't see byte alignment handled there. Maybe someone more familiar
with lib/hash/ could provide some context on this?

Thanks,
Bili

On Wed, Feb 15, 2023 at 3:39 AM Morten Brørup <mb@smartsharesystems.com>
wrote:

> > From: Bili Dong [mailto:qobilidop@gmail.com]
> > Sent: Wednesday, 15 February 2023 12.07
> >
> > An XOR32 hash is needed in the Software Switch (SWX) Pipeline for its
> > use case in P4. We implement it in this patch so it could be easily
> > registered in the pipeline later.
> >
> > Signed-off-by: Bili Dong <qobilidop@gmail.com>
> > ---
>
> [...]
>
> > +#define LEFT8b_MASK rte_cpu_to_be_32(0xff000000)
> > +#define LEFT16b_MASK rte_cpu_to_be_32(0xffff0000)
> > +
> > +/**
> > + * Calculate XOR32 hash on user-supplied byte array.
> > + *
> > + * @param data
> > + *   Data to perform hash on.
> > + * @param data_len
> > + *   How many bytes to use to calculate hash value.
> > + * @param init_val
> > + *   Value to initialise hash generator.
> > + * @return
> > + *   32bit calculated hash value.
> > + */
> > +static inline uint32_t
> > +rte_hash_xor(const void *data, uint32_t data_len, uint32_t init_val)
> > +{
> > +     uint32_t i;
> > +     uintptr_t pd = (uintptr_t) data;
> > +     init_val = rte_cpu_to_be_32(init_val);
> > +
> > +     for (i = 0; i < data_len / 4; i++) {
> > +             init_val ^= *(const uint32_t *)pd;
> > +             pd += 4;
> > +     }
> > +
> > +     if (data_len & 0x2) {
> > +             init_val ^= *(const uint32_t *)pd & LEFT16b_MASK;
> > +             pd += 2;
> > +     }
> > +
> > +     if (data_len & 0x1)
> > +             init_val ^= *(const uint32_t *)pd & LEFT8b_MASK;
> > +
> > +     init_val = rte_be_to_cpu_32(init_val);
> > +     return init_val;
> > +}
>
> I think that this function has swapped big endian and CPU endian
> everywhere. The result is the same, but the code would be much less
> confusing if using rte_cpu_32_to_be() when converting from CPU endian to
> big endian, and rte_be_to_cpu_32() when converting the other way.
>
> I also suppose that the return type and the init_val parameter were meant
> to be rte_be32_t.
>
> Also, please document that the byte array must be 32 bit aligned.
> Alternatively, implement support for unaligned data. You can find
> inspiration for handling of unaligned data in the __rte_raw_cksum()
> function:
> https://elixir.bootlin.com/dpdk/v22.11.1/source/lib/net/rte_ip.h#L162
>
>
>
  
Morten Brørup Feb. 16, 2023, 9:49 a.m. UTC | #3
Hi Bili,

 

OK. I’m not familiar with this library, but the other function assumes 8 byte aligned data, so this should be OK too.

 

And with the required function signature, I now understand why you use endian conversions this way.

 

Acked-by: Morten Brørup <mb@smartsharesystems.com>

 

 

Med venlig hilsen / Kind regards,

-Morten Brørup

 

From: Bili Dong [mailto:qobilidop@gmail.com] 
Sent: Wednesday, 15 February 2023 22.40
To: Morten Brørup
Cc: yipeng1.wang@intel.com; sameh.gobriel@intel.com; bruce.richardson@intel.com; vladimir.medvedkin@intel.com; cristian.dumitrescu@intel.com; dev@dpdk.org
Subject: Re: [PATCH v3] hash: add XOR32 hash function

 

Hi Morten,

 

Thanks for your comments!

 

For endianness conversion, I double-checked my usages. I did use both rte_cpu_to_be_32() and rte_be_to_cpu_32(). I might have missed something but I think I used them (4 occurrences) in a semantically meaningful way. Could you point me to the lines that are confusing?

 

The hash function signature has to conform to https://elixir.bootlin.com/dpdk/v22.11.1/source/lib/table/rte_swx_hash_func.h#L31, so I don't have the freedom to change the parameter type to rte_be32_t, although personally I agree with you and would prefer to make everything consistently big-endian here.

 

I'm not sure about the byte alignment assumptions used in hash functions. My implementation basically follows the existing CRC32 hash: https://elixir.bootlin.com/dpdk/v22.11.1/source/lib/hash/rte_hash_crc.h#L168, and I don't see byte alignment handled there. Maybe someone more familiar with lib/hash/ could provide some context on this?

 

Thanks,

Bili

 

On Wed, Feb 15, 2023 at 3:39 AM Morten Brørup <mb@smartsharesystems.com> wrote:

	> From: Bili Dong [mailto:qobilidop@gmail.com]
	> Sent: Wednesday, 15 February 2023 12.07
	> 
	> An XOR32 hash is needed in the Software Switch (SWX) Pipeline for its
	> use case in P4. We implement it in this patch so it could be easily
	> registered in the pipeline later.
	> 
	> Signed-off-by: Bili Dong <qobilidop@gmail.com>
	> ---
	
	[...]
	
	> +#define LEFT8b_MASK rte_cpu_to_be_32(0xff000000)
	> +#define LEFT16b_MASK rte_cpu_to_be_32(0xffff0000)
	> +
	> +/**
	> + * Calculate XOR32 hash on user-supplied byte array.
	> + *
	> + * @param data
	> + *   Data to perform hash on.
	> + * @param data_len
	> + *   How many bytes to use to calculate hash value.
	> + * @param init_val
	> + *   Value to initialise hash generator.
	> + * @return
	> + *   32bit calculated hash value.
	> + */
	> +static inline uint32_t
	> +rte_hash_xor(const void *data, uint32_t data_len, uint32_t init_val)
	> +{
	> +     uint32_t i;
	> +     uintptr_t pd = (uintptr_t) data;
	> +     init_val = rte_cpu_to_be_32(init_val);
	> +
	> +     for (i = 0; i < data_len / 4; i++) {
	> +             init_val ^= *(const uint32_t *)pd;
	> +             pd += 4;
	> +     }
	> +
	> +     if (data_len & 0x2) {
	> +             init_val ^= *(const uint32_t *)pd & LEFT16b_MASK;
	> +             pd += 2;
	> +     }
	> +
	> +     if (data_len & 0x1)
	> +             init_val ^= *(const uint32_t *)pd & LEFT8b_MASK;
	> +
	> +     init_val = rte_be_to_cpu_32(init_val);
	> +     return init_val;
	> +}
	
	I think that this function has swapped big endian and CPU endian everywhere. The result is the same, but the code would be much less confusing if using rte_cpu_32_to_be() when converting from CPU endian to big endian, and rte_be_to_cpu_32() when converting the other way.
	
	I also suppose that the return type and the init_val parameter were meant to be rte_be32_t.
	
	Also, please document that the byte array must be 32 bit aligned. Alternatively, implement support for unaligned data. You can find inspiration for handling of unaligned data in the __rte_raw_cksum() function:
	https://elixir.bootlin.com/dpdk/v22.11.1/source/lib/net/rte_ip.h#L162
  
Thomas Monjalon Feb. 20, 2023, 1:49 p.m. UTC | #4
15/02/2023 12:06, Bili Dong:
> An XOR32 hash is needed in the Software Switch (SWX) Pipeline for its
> use case in P4. We implement it in this patch so it could be easily
> registered in the pipeline later.
> 
> Signed-off-by: Bili Dong <qobilidop@gmail.com>
> ---
> +/**
> + * Calculate XOR32 hash on user-supplied byte array.
> + *
> + * @param data
> + *   Data to perform hash on.
> + * @param data_len
> + *   How many bytes to use to calculate hash value.
> + * @param init_val
> + *   Value to initialise hash generator.
> + * @return
> + *   32bit calculated hash value.
> + */
> +static inline uint32_t
> +rte_hash_xor(const void *data, uint32_t data_len, uint32_t init_val)

Should we add "32" in the function name?
  
Bili Dong Feb. 20, 2023, 5:21 p.m. UTC | #5
The naming is following the existing CRC32 hash:
https://elixir.bootlin.com/dpdk/v22.11.1/source/lib/hash/rte_hash_crc.h#L168.
I believe all existing hash functions in DPDK are 32 bits, so "32" didn't
appear in other hash function names. If we add "32" here, we probably
should also rename rte_hash_crc(). I'm fine with either option.

On Mon, Feb 20, 2023 at 5:49 AM Thomas Monjalon <thomas@monjalon.net> wrote:

> 15/02/2023 12:06, Bili Dong:
> > An XOR32 hash is needed in the Software Switch (SWX) Pipeline for its
> > use case in P4. We implement it in this patch so it could be easily
> > registered in the pipeline later.
> >
> > Signed-off-by: Bili Dong <qobilidop@gmail.com>
> > ---
> > +/**
> > + * Calculate XOR32 hash on user-supplied byte array.
> > + *
> > + * @param data
> > + *   Data to perform hash on.
> > + * @param data_len
> > + *   How many bytes to use to calculate hash value.
> > + * @param init_val
> > + *   Value to initialise hash generator.
> > + * @return
> > + *   32bit calculated hash value.
> > + */
> > +static inline uint32_t
> > +rte_hash_xor(const void *data, uint32_t data_len, uint32_t init_val)
>
> Should we add "32" in the function name?
>
>
>
  
Thomas Monjalon Feb. 20, 2023, 5:38 p.m. UTC | #6
20/02/2023 18:21, Bili Dong:
> The naming is following the existing CRC32 hash:
> https://elixir.bootlin.com/dpdk/v22.11.1/source/lib/hash/rte_hash_crc.h#L168.
> I believe all existing hash functions in DPDK are 32 bits, so "32" didn't
> appear in other hash function names. If we add "32" here, we probably
> should also rename rte_hash_crc(). I'm fine with either option.

Why all functions would be 32-bit?
I don't think we need to rename all.
We can just make the right thing when adding a new function.

What maintainers of rte_hash think?
  
Bruce Richardson Feb. 20, 2023, 5:51 p.m. UTC | #7
On Mon, Feb 20, 2023 at 06:38:23PM +0100, Thomas Monjalon wrote:
> 20/02/2023 18:21, Bili Dong:
> > The naming is following the existing CRC32 hash:
> > https://elixir.bootlin.com/dpdk/v22.11.1/source/lib/hash/rte_hash_crc.h#L168.
> > I believe all existing hash functions in DPDK are 32 bits, so "32" didn't
> > appear in other hash function names. If we add "32" here, we probably
> > should also rename rte_hash_crc(). I'm fine with either option.
> 
> Why all functions would be 32-bit?
> I don't think we need to rename all.
> We can just make the right thing when adding a new function.
> 
> What maintainers of rte_hash think?
> 
+1 to adding the 32 for clarity.

If we want consistency, it's easy enough to create some aliases for the
existing functions with the "32" extension on them. No need to remove the
old names so there would be no compatibility issues.
  
Bili Dong Feb. 20, 2023, 5:54 p.m. UTC | #8
Got it. I’ll update the patch.

On Mon, Feb 20, 2023 at 9:52 AM Bruce Richardson <bruce.richardson@intel.com>
wrote:

> On Mon, Feb 20, 2023 at 06:38:23PM +0100, Thomas Monjalon wrote:
> > 20/02/2023 18:21, Bili Dong:
> > > The naming is following the existing CRC32 hash:
> > >
> https://elixir.bootlin.com/dpdk/v22.11.1/source/lib/hash/rte_hash_crc.h#L168
> .
> > > I believe all existing hash functions in DPDK are 32 bits, so "32"
> didn't
> > > appear in other hash function names. If we add "32" here, we probably
> > > should also rename rte_hash_crc(). I'm fine with either option.
> >
> > Why all functions would be 32-bit?
> > I don't think we need to rename all.
> > We can just make the right thing when adding a new function.
> >
> > What maintainers of rte_hash think?
> >
> +1 to adding the 32 for clarity.
>
> If we want consistency, it's easy enough to create some aliases for the
> existing functions with the "32" extension on them. No need to remove the
> old names so there would be no compatibility issues.
>
  
Cristian Dumitrescu Feb. 20, 2023, 6:19 p.m. UTC | #9
Hi Bili,

Would it be possible to also remove the endianness conversion from this hash function? What would be the impact if the rte_cpu_to_be() functions are removed?

Thanks,
Cristian
  
Bili Dong Feb. 20, 2023, 6:50 p.m. UTC | #10
I think the endianness conversions are necessary, otherwise, the hash
function return value will be host endianness dependent.

For example, what should rte_hash_xor32(/*data*/ &{0x00, 0x01, 0x02, 0x03},
/*data_len*/ 4, /*init_val*/ 0x0) return? With the current implementation,
it returns 0x00010203 consistently. If we remove the endianness
conversions, it will return 0x00010203 on big-endian hosts and 0x03020100
on little-endian hosts. Please correct me but I don't think that's the
expected behavior.

On Mon, Feb 20, 2023 at 10:19 AM Dumitrescu, Cristian <
cristian.dumitrescu@intel.com> wrote:

> Hi Bili,
>
>
>
> Would it be possible to also remove the endianness conversion from this
> hash function? What would be the impact if the rte_cpu_to_be() functions
> are removed?
>
>
>
> Thanks,
>
> Cristian
>
>
  
Vladimir Medvedkin Feb. 20, 2023, 8:10 p.m. UTC | #11
Hi Bill,

On 15/02/2023 11:06, Bili Dong wrote:
> An XOR32 hash is needed in the Software Switch (SWX) Pipeline for its
> use case in P4. We implement it in this patch so it could be easily
> registered in the pipeline later.
>
> Signed-off-by: Bili Dong <qobilidop@gmail.com>
> ---
> +static inline uint32_t
> +rte_hash_xor(const void *data, uint32_t data_len, uint32_t init_val)
> +{
> +	uint32_t i;
> +	uintptr_t pd = (uintptr_t) data;
> +	init_val = rte_cpu_to_be_32(init_val);
> +
> +	for (i = 0; i < data_len / 4; i++) {
> +		init_val ^= *(const uint32_t *)pd;
> +		pd += 4;
> +	}
> +
> +	if (data_len & 0x2) {
> +		init_val ^= *(const uint32_t *)pd & LEFT16b_MASK;

Here you are reading 2 bytes after the data buffer, which can sometimes 
lead to segfault. I think it would be better just to:

init_val ^= *(const uint16_t *)pd << 16;

The same with the section bellow

> +		pd += 2;
> +	}
> +
> +	if (data_len & 0x1)
> +		init_val ^= *(const uint32_t *)pd & LEFT8b_MASK;
> +
> +	init_val = rte_be_to_cpu_32(init_val);
> +	return init_val;
> +}
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* _RTE_HASH_XOR_H_ */
  
Bili Dong Feb. 20, 2023, 8:17 p.m. UTC | #12
Got it. Will fix. Thanks!

On Mon, Feb 20, 2023 at 12:10 PM Medvedkin, Vladimir <
vladimir.medvedkin@intel.com> wrote:

> Hi Bill,
>
> On 15/02/2023 11:06, Bili Dong wrote:
> > An XOR32 hash is needed in the Software Switch (SWX) Pipeline for its
> > use case in P4. We implement it in this patch so it could be easily
> > registered in the pipeline later.
> >
> > Signed-off-by: Bili Dong <qobilidop@gmail.com>
> > ---
> > +static inline uint32_t
> > +rte_hash_xor(const void *data, uint32_t data_len, uint32_t init_val)
> > +{
> > +     uint32_t i;
> > +     uintptr_t pd = (uintptr_t) data;
> > +     init_val = rte_cpu_to_be_32(init_val);
> > +
> > +     for (i = 0; i < data_len / 4; i++) {
> > +             init_val ^= *(const uint32_t *)pd;
> > +             pd += 4;
> > +     }
> > +
> > +     if (data_len & 0x2) {
> > +             init_val ^= *(const uint32_t *)pd & LEFT16b_MASK;
>
> Here you are reading 2 bytes after the data buffer, which can sometimes
> lead to segfault. I think it would be better just to:
>
> init_val ^= *(const uint16_t *)pd << 16;
>
> The same with the section bellow
>
> > +             pd += 2;
> > +     }
> > +
> > +     if (data_len & 0x1)
> > +             init_val ^= *(const uint32_t *)pd & LEFT8b_MASK;
> > +
> > +     init_val = rte_be_to_cpu_32(init_val);
> > +     return init_val;
> > +}
> > +
> > +#ifdef __cplusplus
> > +}
> > +#endif
> > +
> > +#endif /* _RTE_HASH_XOR_H_ */
>
> --
> Regards,
> Vladimir
>
>
  
Cristian Dumitrescu Feb. 20, 2023, 8:19 p.m. UTC | #13
HI Bili,

Comments inline below:

<snip>

> diff --git a/lib/hash/rte_hash_xor.h b/lib/hash/rte_hash_xor.h
> new file mode 100644
> index 0000000000..7004f83ec2
> --- /dev/null
> +++ b/lib/hash/rte_hash_xor.h
> @@ -0,0 +1,65 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2023 Intel Corporation
> + */
> +
> +#ifndef _RTE_HASH_XOR_H_
> +#define _RTE_HASH_XOR_H_
> +
> +/**
> + * @file
> + *
> + * RTE XOR Hash
> + */
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +#include <stdint.h>
> +
> +#include <rte_byteorder.h>
> +
> +#define LEFT8b_MASK rte_cpu_to_be_32(0xff000000)

I know that cpu_to_be() and be_to_cpu() are doing the same thing, but to me the correct function to use here is be_to_cpu(), as the for loop in the function below works with values in the CPU endianness. Would you agree?

> +#define LEFT16b_MASK rte_cpu_to_be_32(0xffff0000)

I know that cpu_to_be() and be_to_cpu() are doing the same thing, but to me the correct function to use here is be_to_cpu(), as the for loop in the function below works with values in the CPU endianness. Would you agree?

> +
> +/**
> + * Calculate XOR32 hash on user-supplied byte array.
> + *
> + * @param data
> + *   Data to perform hash on.
> + * @param data_len
> + *   How many bytes to use to calculate hash value.
> + * @param init_val
> + *   Value to initialise hash generator.
> + * @return
> + *   32bit calculated hash value.
> + */
> +static inline uint32_t
> +rte_hash_xor(const void *data, uint32_t data_len, uint32_t init_val)
> +{
> +	uint32_t i;
> +	uintptr_t pd = (uintptr_t) data;
> +	init_val = rte_cpu_to_be_32(init_val);
I don't think this is correct, I the correct version here is to remove the above assignment, as I think the intention of this function (for performance reasons) is to do the endianness conversion only when needed, which is once at the end of the function (and also when handling the 2-byte and 1-byte left-overs).

> +
> +	for (i = 0; i < data_len / 4; i++) {
> +		init_val ^= *(const uint32_t *)pd;
> +		pd += 4;
> +	}
> +
> +	if (data_len & 0x2) {
> +		init_val ^= *(const uint32_t *)pd & LEFT16b_MASK;
> +		pd += 2;
> +	}
> +
> +	if (data_len & 0x1)
> +		init_val ^= *(const uint32_t *)pd & LEFT8b_MASK;
> +
> +	init_val = rte_be_to_cpu_32(init_val);
> +	return init_val;
> +}
> +

Due to the XOR properties (endianness-insensitivity, no carry bits, etc) and for performance reasons, I would also recommend implementing a 64-bit version of this function (while keeping the 32-bit result), similar to this:

uint64_t *data64 = (uint64_t *)data;
uint64_t result = init_data;

/* Read & accumulate input data in 8-byte increments. */
for (i = 0; i < data_len / 8; i++)
	result ^= *data64++;

data_len &= 0x7;

/* Handle any remaining bytes in the input data (up to 7 bytes). */
if (data_len >= 4) {
	uint32_t *data32 = (uint32_t *)data64;

	/* Read and accumulate  the next 4 bytes from the input data. */
	result ^= *data32++;
	data_len -= 4;

	if (data_len >= 2) {
		uint16_t *data16 = (uint16_t *)data32;

		/* Read and accumulate the next 2 bytes from the input data. */
		result ^= *data16++
		data_len -= 2;

		if (data_len) {
			uint8_t *data8 = (uint8_t *)data8;

			/* Read and accumulate the next byte from the input data. */
			result ^= *data8;
		}
	}
}

/* Accumulate the upper 32 bits on top of the lower 32 bits. */
result ^= result >> 32;

/* Single endianness swap at the very end. */
return rte_cpu_to_be32((uint32_t)result);

What do you think?

Regards,
Cristian
  
Bili Dong Feb. 20, 2023, 8:44 p.m. UTC | #14
Hi Cristian,

I agree the 64-bit version could enable better performance, and I will do
it in the next version.

For endianness, see my comments below (inline):

On Mon, Feb 20, 2023 at 12:19 PM Dumitrescu, Cristian <
cristian.dumitrescu@intel.com> wrote:

> HI Bili,
>
> Comments inline below:
>
> <snip>
>
> > diff --git a/lib/hash/rte_hash_xor.h b/lib/hash/rte_hash_xor.h
> > new file mode 100644
> > index 0000000000..7004f83ec2
> > --- /dev/null
> > +++ b/lib/hash/rte_hash_xor.h
> > @@ -0,0 +1,65 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright(c) 2023 Intel Corporation
> > + */
> > +
> > +#ifndef _RTE_HASH_XOR_H_
> > +#define _RTE_HASH_XOR_H_
> > +
> > +/**
> > + * @file
> > + *
> > + * RTE XOR Hash
> > + */
> > +
> > +#ifdef __cplusplus
> > +extern "C" {
> > +#endif
> > +
> > +#include <stdint.h>
> > +
> > +#include <rte_byteorder.h>
> > +
> > +#define LEFT8b_MASK rte_cpu_to_be_32(0xff000000)
>
> I know that cpu_to_be() and be_to_cpu() are doing the same thing, but to
> me the correct function to use here is be_to_cpu(), as the for loop in the
> function below works with values in the CPU endianness. Would you agree?
>
What I have in mind is the 0xff000000 literal is in CPU endianness, and I'm
converting it to big-endian. For performance reasons, I think it's better
to work in big-endian in the loop.
Also as Vladimir suggested above, I'll remove these masks and switch to
using shifts directly. So I guess this won't matter anymore.

>
> > +#define LEFT16b_MASK rte_cpu_to_be_32(0xffff0000)
>
> I know that cpu_to_be() and be_to_cpu() are doing the same thing, but to
> me the correct function to use here is be_to_cpu(), as the for loop in the
> function below works with values in the CPU endianness. Would you agree?
>
> > +
> > +/**
> > + * Calculate XOR32 hash on user-supplied byte array.
> > + *
> > + * @param data
> > + *   Data to perform hash on.
> > + * @param data_len
> > + *   How many bytes to use to calculate hash value.
> > + * @param init_val
> > + *   Value to initialise hash generator.
> > + * @return
> > + *   32bit calculated hash value.
> > + */
> > +static inline uint32_t
> > +rte_hash_xor(const void *data, uint32_t data_len, uint32_t init_val)
> > +{
> > +     uint32_t i;
> > +     uintptr_t pd = (uintptr_t) data;
> > +     init_val = rte_cpu_to_be_32(init_val);
> I don't think this is correct, I the correct version here is to remove the
> above assignment, as I think the intention of this function (for
> performance reasons) is to do the endianness conversion only when needed,
> which is once at the end of the function (and also when handling the 2-byte
> and 1-byte left-overs).
>
What I have in mind is to convert init_val to big-endian once here, instead
of having to convert every byte array chunks from big-endian to host endian
in the loop (more conversions, worse performance, see comment below).

>
> > +
> > +     for (i = 0; i < data_len / 4; i++) {
> > +             init_val ^= *(const uint32_t *)pd;
>
Look at this line, if init_val is still in host endianness, we need to do
init_val ^= rte_be_to_cpu_32(*(const uint32_t *)pd) instead.
I think the essential tradeoff here is that, if we convert init_val from
host to be and back, we pay the constant cost of 2 conversions. The
alternative is to convert byte array chunks from be to host. The number of
conversions depends on data_len. Now I think of it more, maybe the best
option is to condition on the value of data_len, and choose the method with
fewer conversions.

> > +             pd += 4;
> > +     }
> > +
> > +     if (data_len & 0x2) {
> > +             init_val ^= *(const uint32_t *)pd & LEFT16b_MASK;
> > +             pd += 2;
> > +     }
> > +
> > +     if (data_len & 0x1)
> > +             init_val ^= *(const uint32_t *)pd & LEFT8b_MASK;
> > +
> > +     init_val = rte_be_to_cpu_32(init_val);
> > +     return init_val;
> > +}
> > +
>
> Due to the XOR properties (endianness-insensitivity, no carry bits, etc)
> and for performance reasons, I would also recommend implementing a 64-bit
> version of this function (while keeping the 32-bit result), similar to this:
>
> uint64_t *data64 = (uint64_t *)data;
> uint64_t result = init_data;
>
> /* Read & accumulate input data in 8-byte increments. */
> for (i = 0; i < data_len / 8; i++)
>         result ^= *data64++;
>
> data_len &= 0x7;
>
> /* Handle any remaining bytes in the input data (up to 7 bytes). */
> if (data_len >= 4) {
>         uint32_t *data32 = (uint32_t *)data64;
>
>         /* Read and accumulate  the next 4 bytes from the input data. */
>         result ^= *data32++;
>         data_len -= 4;
>
>         if (data_len >= 2) {
>                 uint16_t *data16 = (uint16_t *)data32;
>
>                 /* Read and accumulate the next 2 bytes from the input
> data. */
>                 result ^= *data16++
>                 data_len -= 2;
>
>                 if (data_len) {
>                         uint8_t *data8 = (uint8_t *)data8;
>
>                         /* Read and accumulate the next byte from the
> input data. */
>                         result ^= *data8;
>                 }
>         }
> }
>
> /* Accumulate the upper 32 bits on top of the lower 32 bits. */
> result ^= result >> 32;
>
> /* Single endianness swap at the very end. */
> return rte_cpu_to_be32((uint32_t)result);
>
> What do you think?
>
> Regards,
> Cristian
>
>
  
Stephen Hemminger Feb. 20, 2023, 11:04 p.m. UTC | #15
On Mon, 20 Feb 2023 12:44:06 -0800
Bili Dong <qobilidop@gmail.com> wrote:

> Hi Cristian,
> 
> I agree the 64-bit version could enable better performance, and I will do
> it in the next version.
> 

Same optimizations as ipv4 checksum probably apply.
Aligning data, and duffs device, etc.

You might even be able to use vector instructions
  
Bili Dong Feb. 21, 2023, 1:38 a.m. UTC | #16
Thanks, I'll check it out.

On Mon, Feb 20, 2023 at 3:04 PM Stephen Hemminger <
stephen@networkplumber.org> wrote:

> On Mon, 20 Feb 2023 12:44:06 -0800
> Bili Dong <qobilidop@gmail.com> wrote:
>
> > Hi Cristian,
> >
> > I agree the 64-bit version could enable better performance, and I will do
> > it in the next version.
> >
>
> Same optimizations as ipv4 checksum probably apply.
> Aligning data, and duffs device, etc.
>
> You might even be able to use vector instructions
>
>
  

Patch

diff --git a/.mailmap b/.mailmap
index 5015494210..176dd93b66 100644
--- a/.mailmap
+++ b/.mailmap
@@ -159,6 +159,7 @@  Bernard Iremonger <bernard.iremonger@intel.com>
 Bert van Leeuwen <bert.vanleeuwen@netronome.com>
 Bhagyada Modali <bhagyada.modali@amd.com>
 Bharat Mota <bmota@vmware.com>
+Bili Dong <qobilidop@gmail.com>
 Bill Hong <bhong@brocade.com>
 Billy McFall <bmcfall@redhat.com>
 Billy O'Mahony <billy.o.mahony@intel.com>
diff --git a/app/test/test_hash_functions.c b/app/test/test_hash_functions.c
index 76d51b6e71..14d69d90c4 100644
--- a/app/test/test_hash_functions.c
+++ b/app/test/test_hash_functions.c
@@ -15,6 +15,7 @@ 
 #include <rte_hash.h>
 #include <rte_jhash.h>
 #include <rte_hash_crc.h>
+#include <rte_hash_xor.h>
 
 #include "test.h"
 
@@ -22,8 +23,8 @@ 
  * Hash values calculated for key sizes from array "hashtest_key_lens"
  * and for initial values from array "hashtest_initvals.
  * Each key will be formed by increasing each byte by 1:
- * e.g.: key size = 4, key = 0x03020100
- *       key size = 8, key = 0x0706050403020100
+ * e.g.: key size = 4, key = 0x00010203
+ *       key size = 8, key = 0x0001020304050607
  */
 static uint32_t hash_values_jhash[2][12] = {{
 	0x8ba9414b, 0xdf0d39c9,
@@ -51,6 +52,19 @@  static uint32_t hash_values_crc[2][12] = {{
 	0x789c104f, 0x53028d3e
 }
 };
+static uint32_t hash_values_xor[2][12] = {{
+	0x00000000, 0x00010000,
+	0x00010203, 0x04040404, 0x00000000, 0x00000000,
+	0x00000000, 0x00000000, 0x0c040404, 0x000d0e0f,
+	0x04212223, 0x04040404
+},
+{
+	0xdeadbeef, 0xdeacbeef,
+	0xdeacbcec, 0xdaa9baeb, 0xdeadbeef, 0xdeadbeef,
+	0xdeadbeef, 0xdeadbeef, 0xd2a9baeb, 0xdea0b0e0,
+	0xda8c9ccc, 0xdaa9baeb
+}
+};
 
 /*******************************************************************************
  * Hash function performance test configuration section. Each performance test
@@ -61,7 +75,7 @@  static uint32_t hash_values_crc[2][12] = {{
  */
 #define HASHTEST_ITERATIONS 1000000
 #define MAX_KEYSIZE 64
-static rte_hash_function hashtest_funcs[] = {rte_jhash, rte_hash_crc};
+static rte_hash_function hashtest_funcs[] = {rte_jhash, rte_hash_crc, rte_hash_xor};
 static uint32_t hashtest_initvals[] = {0, 0xdeadbeef};
 static uint32_t hashtest_key_lens[] = {
 	1, 2,                 /* Unusual key sizes */
@@ -85,6 +99,9 @@  get_hash_name(rte_hash_function f)
 	if (f == rte_hash_crc)
 		return "rte_hash_crc";
 
+	if (f == rte_hash_xor)
+		return "rte_hash_xor";
+
 	return "UnknownHash";
 }
 
@@ -173,6 +190,16 @@  verify_precalculated_hash_func_tests(void)
 				       hash_values_crc[j][i], hash);
 				return -1;
 			}
+
+			hash = rte_hash_xor(key, hashtest_key_lens[i],
+					hashtest_initvals[j]);
+			if (hash != hash_values_xor[j][i]) {
+				printf("XOR for %u bytes with initial value 0x%x."
+				       "Expected 0x%x, but got 0x%x\n",
+				       hashtest_key_lens[i], hashtest_initvals[j],
+				       hash_values_xor[j][i], hash);
+				return -1;
+			}
 		}
 	}
 
diff --git a/lib/hash/rte_hash_xor.h b/lib/hash/rte_hash_xor.h
new file mode 100644
index 0000000000..7004f83ec2
--- /dev/null
+++ b/lib/hash/rte_hash_xor.h
@@ -0,0 +1,65 @@ 
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2023 Intel Corporation
+ */
+
+#ifndef _RTE_HASH_XOR_H_
+#define _RTE_HASH_XOR_H_
+
+/**
+ * @file
+ *
+ * RTE XOR Hash
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <stdint.h>
+
+#include <rte_byteorder.h>
+
+#define LEFT8b_MASK rte_cpu_to_be_32(0xff000000)
+#define LEFT16b_MASK rte_cpu_to_be_32(0xffff0000)
+
+/**
+ * Calculate XOR32 hash on user-supplied byte array.
+ *
+ * @param data
+ *   Data to perform hash on.
+ * @param data_len
+ *   How many bytes to use to calculate hash value.
+ * @param init_val
+ *   Value to initialise hash generator.
+ * @return
+ *   32bit calculated hash value.
+ */
+static inline uint32_t
+rte_hash_xor(const void *data, uint32_t data_len, uint32_t init_val)
+{
+	uint32_t i;
+	uintptr_t pd = (uintptr_t) data;
+	init_val = rte_cpu_to_be_32(init_val);
+
+	for (i = 0; i < data_len / 4; i++) {
+		init_val ^= *(const uint32_t *)pd;
+		pd += 4;
+	}
+
+	if (data_len & 0x2) {
+		init_val ^= *(const uint32_t *)pd & LEFT16b_MASK;
+		pd += 2;
+	}
+
+	if (data_len & 0x1)
+		init_val ^= *(const uint32_t *)pd & LEFT8b_MASK;
+
+	init_val = rte_be_to_cpu_32(init_val);
+	return init_val;
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_HASH_XOR_H_ */