[RFC,v6,5/6] eal: add atomic bit operations

Message ID 20240502055706.112443-6-mattias.ronnblom@ericsson.com (mailing list archive)
State New
Delegated to: Thomas Monjalon
Series Improve EAL bit operations API

Commit Message

Mattias Rönnblom May 2, 2024, 5:57 a.m. UTC
  Add atomic bit test/set/clear/assign/flip and
test-and-set/clear/assign/flip functions.

All atomic bit functions allow (and indeed, require) the caller to
specify a memory order.
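
A minimal usage sketch (illustration only, not part of the patch):

#include <stdbool.h>
#include <rte_bitops.h>
#include <rte_stdatomic.h>

static uint64_t flags; /* word shared between threads */

static void
flags_example(void)
{
	/* Atomically set bit 5, with release ordering. */
	rte_bit_atomic_set(&flags, 5, rte_memory_order_release);

	/* Atomically load bit 5, with acquire ordering. */
	if (rte_bit_atomic_test(&flags, 5, rte_memory_order_acquire)) {
		/* Atomically claim bit 5; the return value is the
		 * bit's value before the operation. */
		bool was_set = rte_bit_atomic_test_and_clear(
			&flags, 5, rte_memory_order_acq_rel);
		(void)was_set;
	}
}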

RFC v6:
 * Have rte_bit_atomic_test() accept const-marked bitsets.

RFC v4:
 * Add atomic bit flip.
 * Mark macro-generated private functions experimental.

RFC v3:
 * Work around lack of C++ support for _Generic (Tyler Retzlaff).

RFC v2:
 * Add rte_bit_atomic_test_and_assign() (for consistency).
 * Fix bugs in rte_bit_atomic_test_and_[set|clear]().
 * Use <rte_stdatomic.h> to support MSVC.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/eal/include/rte_bitops.h | 428 +++++++++++++++++++++++++++++++++++
 1 file changed, 428 insertions(+)
  

Comments

Mattias Rönnblom May 3, 2024, 6:41 a.m. UTC | #1
On 2024-05-02 07:57, Mattias Rönnblom wrote:
> Add atomic bit test/set/clear/assign/flip and
> test-and-set/clear/assign/flip functions.
> 
> <snip>
> 
> +#define rte_bit_atomic_test(addr, nr, memory_order)			\
> +	_Generic((addr),						\
> +		 uint32_t *: __rte_bit_atomic_test32,			\
> +		 const uint32_t *: __rte_bit_atomic_test32,		\
> +		 uint64_t *: __rte_bit_atomic_test64,			\
> +		 const uint64_t *: __rte_bit_atomic_test64)(addr, nr,	\
> +							    memory_order)

Should __rte_bit_atomic_test32()'s addr parameter be marked volatile,
and two volatile-marked branches added to the above list? Both the
C11-style GCC built-ins and the C11-proper atomic functions have
addresses marked volatile. The Linux kernel and the old __sync GCC
built-ins, on the other hand, don't (although I think you still get
volatile semantics). The only point of "volatile", as far as I can see,
is to avoid warnings in case the user passed a volatile-marked pointer.
The drawback is that *you're asking for volatile semantics*, although
with the current compilers, it seems like that is what you get,
regardless of whether you asked for it or not.
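
To illustrate (a sketch only, not something this patch does), the
volatile-accepting variant would look something like:

#define rte_bit_atomic_test(addr, nr, memory_order)			\
	_Generic((addr),						\
		 uint32_t *: __rte_bit_atomic_test32,			\
		 const uint32_t *: __rte_bit_atomic_test32,		\
		 volatile uint32_t *: __rte_bit_atomic_test32,		\
		 const volatile uint32_t *: __rte_bit_atomic_test32,	\
		 uint64_t *: __rte_bit_atomic_test64,			\
		 const uint64_t *: __rte_bit_atomic_test64,		\
		 volatile uint64_t *: __rte_bit_atomic_test64,		\
		 const volatile uint64_t *: __rte_bit_atomic_test64)	\
		(addr, nr, memory_order)

...with __rte_bit_atomic_test32()'s parameter changed to "const
volatile uint32_t *addr", so that all four 32-bit branches can forward
to the same function.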

Just to be clear: even if these functions were to accept volatile-marked
pointers, non-volatile pointers should be accepted as well (and should
generally be preferred).

Isn't parallel programming in C lovely.

<snip>
  
Tyler Retzlaff May 3, 2024, 11:30 p.m. UTC | #2
On Fri, May 03, 2024 at 08:41:09AM +0200, Mattias Rönnblom wrote:
> On 2024-05-02 07:57, Mattias Rönnblom wrote:
> >Add atomic bit test/set/clear/assign/flip and
> >test-and-set/clear/assign/flip functions.
> >
> ><snip>
> >
> >+#define rte_bit_atomic_test(addr, nr, memory_order)			\
> >+	_Generic((addr),						\
> >+		 uint32_t *: __rte_bit_atomic_test32,			\
> >+		 const uint32_t *: __rte_bit_atomic_test32,		\
> >+		 uint64_t *: __rte_bit_atomic_test64,			\
> >+		 const uint64_t *: __rte_bit_atomic_test64)(addr, nr,	\
> >+							    memory_order)
> 
> Should __rte_bit_atomic_test32()'s addr parameter be marked
> volatile, and two volatile-marked branches added to the above list?

off-topic comment relating to the generic type selection list above: i was
reading C17 DR481 recently and i think we may want to avoid providing
qualified and unqualified types in the list.

DR481: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2396.htm#dr_481

"the controlling expression of a generic selection shall have type
compatibile with at most one of the types named in its generic
association list."

"the type of the controlling expression is the type of the expression as
if it had undergone an lvalue conversion"

"lvalue conversion drops type qualifiers"

so only the unqualified type of the controlling expression is matched
against the selection list, which i guess means the qualified entries
in the list are never selected.
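
for instance (a toy example of mine, not from the DR), a const-qualified
controlling expression of non-pointer type selects the unqualified
association:

#include <stdio.h>

int
main(void)
{
	const int x = 17;

	/* lvalue conversion drops the top-level const, so this selects
	 * the "int" association; a "const int:" association could never
	 * be selected. */
	puts(_Generic(x, int: "int", default: "something else"));

	return 0;
}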

i suppose the implication here is that we couldn't then provide 2 inline
functions, one volatile-qualified and one not.

as for a single function where the parameter is or isn't volatile
qualified: if we're always forwarding to an intrinsic, i've always
assumed (perhaps incorrectly) that the intrinsic itself does what is
semantically correct even without qualification.

as you note, i believe there is a convenience element in providing the
volatile-qualified version, since it means the function-like macro /
inline function will accept both volatile-qualified and unqualified
pointers, whereas if we did not qualify the parameter it would require
the caller/user to strip the volatile qualification, if present, with
casts.

i imagine in most cases we are just forwarding, in which case it seems
not horrible to provide the qualified version.

> <snip>
> 
> Isn't parallel programming in C lovely.

it's super!

> 
> <snip>
  
Mattias Rönnblom May 4, 2024, 3:36 p.m. UTC | #3
On 2024-05-04 01:30, Tyler Retzlaff wrote:
> On Fri, May 03, 2024 at 08:41:09AM +0200, Mattias Rönnblom wrote:
>> On 2024-05-02 07:57, Mattias Rönnblom wrote:
>>> Add atomic bit test/set/clear/assign/flip and
>>> test-and-set/clear/assign/flip functions.
>>>
>>> <snip>
>>>
>>> +#define rte_bit_atomic_test(addr, nr, memory_order)			\
>>> +	_Generic((addr),						\
>>> +		 uint32_t *: __rte_bit_atomic_test32,			\
>>> +		 const uint32_t *: __rte_bit_atomic_test32,		\
>>> +		 uint64_t *: __rte_bit_atomic_test64,			\
>>> +		 const uint64_t *: __rte_bit_atomic_test64)(addr, nr,	\
>>> +							    memory_order)
>>
>> Should __rte_bit_atomic_test32()'s addr parameter be marked
>> volatile, and two volatile-marked branches added to the above list?
> 
> off-topic comment relating to the generic type selection list above: i was
> reading C17 DR481 recently and i think we may want to avoid providing
> qualified and unqualified types in the list.
> 
> DR481: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2396.htm#dr_481
> 
> "the controlling expression of a generic selection shall have type
> compatible with at most one of the types named in its generic
> association list."
> 

Const and unqualified pointers are not compatible. Without the "const 
uint32_t *" element in the association list, passing const-qualified 
pointers to rte_bit_test() will cause a compiler error.

So, if you want to support passing both const-qualified and unqualified 
pointers (which you do, obviously), then you have no other option than 
to treat them separately.

GCC, clang and ICC all seem to agree on this. The standard also is 
pretty clear on this, from what I can tell. "No two generic associations 
in the same generic selection shall specify compatible types." (6.5.1.1, 
note *compatible*). "For two pointer types to be compatible, both shall 
be identically qualified and both shall be pointers to compatible 
types." (6.7.6.1)

> "the type of the controlling expression is the type of the expression as
> if it had undergone an lvalue conversion"
> 
> "lvalue conversion drops type qualifiers"
> 
> so the unqualified type of the controlling expression is only matched
> selection list which i guess that means the qualified entries in the
> list are never selected.
> 
> i suppose the implication here is we couldn't then provide 2 inline
> functions one for volatile qualified and for not volatile qualified.
> 
> as for a single function where the parameter is or isn't volatile
> qualified. if we're always forwarding to an intrinsic i've always
> assumed (perhaps incorrectly) that the intrinsic itself did what was
> semantically correct even without qualification.
> 
> as you note i believe there is a convenience element in providing the
> volatile qualified version since it means the function like macro /
> inline function will accept both volatile qualified and unqualified
> whereas if we did not qualify the parameter it would require the
> caller/user to strip the volatile qualification if present with casts.
> 
> i imagine in most cases we are just forwarding, in which case it seems
> not horrible to provide the qualified version.
> 
>> Both the C11-style GCC built-ins and the C11-proper atomic functions
>> have addresses marked volatile. The Linux kernel and the old __sync
>> GCC built-ins on the other hand, doesn't (although I think you still
>> get volatile semantics). The only point of "volatile", as far as I
>> can see, is to avoid warnings in case the user passed a
>> volatile-marked pointer. The drawback is that *you're asking for
>> volatile semantics*, although with the current compilers, it seems
>> like that is what you get, regardless if you asked for it or not.
>>
>> Just to be clear: even these functions would accept volatile-marked
>> pointers, non-volatile pointers should be accepted as well (and
>> should generally be preferred).
>>
>> Isn't parallel programming in C lovely.
> 
> it's super!
> 
>>
>> <snip>
  

Patch

diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index caec4f36bb..9cde982113 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -21,6 +21,7 @@ 
 
 #include <rte_compat.h>
 #include <rte_debug.h>
+#include <rte_stdatomic.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -399,6 +400,202 @@  extern "C" {
 		 uint32_t *: __rte_bit_once_flip32,		\
 		 uint64_t *: __rte_bit_once_flip64)(addr, nr)
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Test if a particular bit in a word is set with a particular memory
+ * order.
+ *
+ * Test a bit with the resulting memory load ordered as per the
+ * specified memory order.
+ *
+ * @param addr
+ *   A pointer to the word to query.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ * @return
+ *   Returns true if the bit is set, and false otherwise.
+ */
+#define rte_bit_atomic_test(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test32,			\
+		 const uint32_t *: __rte_bit_atomic_test32,		\
+		 uint64_t *: __rte_bit_atomic_test64,			\
+		 const uint64_t *: __rte_bit_atomic_test64)(addr, nr,	\
+							    memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically set bit in word.
+ *
+ * Atomically set bit specified by @c nr in the word pointed to by @c
+ * addr to '1', with the memory ordering as specified by @c
+ * memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ */
+#define rte_bit_atomic_set(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_set32,			\
+		 uint64_t *: __rte_bit_atomic_set64)(addr, nr, memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically clear bit in word.
+ *
+ * Atomically set bit specified by @c nr in the word pointed to by @c
+ * addr to '0', with the memory ordering as specified by @c
+ * memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ */
+#define rte_bit_atomic_clear(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_clear32,			\
+		 uint64_t *: __rte_bit_atomic_clear64)(addr, nr, memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically assign a value to bit in word.
+ *
+ * Atomically set bit specified by @c nr in the word pointed to by @c
+ * addr to the value indicated by @c value, with the memory ordering
+ * as specified with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ */
+#define rte_bit_atomic_assign(addr, nr, value, memory_order)		\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_assign32,			\
+		 uint64_t *: __rte_bit_atomic_assign64)(addr, nr, value, \
+							memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically flip bit in word.
+ *
+ * Atomically negate the value of the bit specified by @c nr in the
+ * word pointed to by @c addr, with the memory ordering as specified
+ * with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ */
+#define rte_bit_atomic_flip(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_flip32,			\
+		 uint64_t *: __rte_bit_atomic_flip64)(addr, nr, memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically test and set a bit in word.
+ *
+ * Atomically test and set bit specified by @c nr in the word pointed
+ * to by @c addr to '1', with the memory ordering as specified with @c
+ * memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ * @return
+ *   Returns true if the bit was set, and false otherwise.
+ */
+#define rte_bit_atomic_test_and_set(addr, nr, memory_order)		\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test_and_set32,		\
+		 uint64_t *: __rte_bit_atomic_test_and_set64)(addr, nr,	\
+							      memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically test and clear a bit in word.
+ *
+ * Atomically test and clear bit specified by @c nr in the word
+ * pointed to by @c addr to '0', with the memory ordering as specified
+ * with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ * @return
+ *   Returns true if the bit was set, and false otherwise.
+ */
+#define rte_bit_atomic_test_and_clear(addr, nr, memory_order)		\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test_and_clear32,		\
+		 uint64_t *: __rte_bit_atomic_test_and_clear64)(addr, nr, \
+								memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically test and assign a bit in word.
+ *
+ * Atomically test and assign the bit specified by @c nr in the word
+ * pointed to by @c addr to the value specified by @c value, with the
+ * memory ordering as specified with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ * @return
+ *   Returns true if the bit was set, and false otherwise.
+ */
+#define rte_bit_atomic_test_and_assign(addr, nr, value, memory_order)	\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test_and_assign32,	\
+		 uint64_t *: __rte_bit_atomic_test_and_assign64)(addr, nr, \
+								 value, \
+								 memory_order)
+
 #define __RTE_GEN_BIT_TEST(family, fun, qualifier, size)		\
 	__rte_experimental						\
 	static inline bool						\
@@ -483,6 +680,162 @@  __RTE_GEN_BIT_CLEAR(once_, clear, volatile, 64)
 __RTE_GEN_BIT_ASSIGN(once_, assign, volatile, 64)
 __RTE_GEN_BIT_FLIP(once_, flip, volatile, 64)
 
+#define __RTE_GEN_BIT_ATOMIC_TEST(size)					\
+	__rte_experimental						\
+	static inline bool						\
+	__rte_bit_atomic_test ## size(const uint ## size ## _t *addr,	\
+				      unsigned int nr, int memory_order) \
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		const RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(const RTE_ATOMIC(uint ## size ## _t) *)addr;	\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		return rte_atomic_load_explicit(a_addr, memory_order) & mask; \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_SET(size)					\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_atomic_set ## size(uint ## size ## _t *addr,		\
+				     unsigned int nr, int memory_order)	\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		rte_atomic_fetch_or_explicit(a_addr, mask, memory_order); \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_CLEAR(size)				\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_atomic_clear ## size(uint ## size ## _t *addr,	\
+				       unsigned int nr, int memory_order) \
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		rte_atomic_fetch_and_explicit(a_addr, ~mask, memory_order); \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_ASSIGN(size)				\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_atomic_assign ## size(uint ## size ## _t *addr,	\
+					unsigned int nr, bool value,	\
+					int memory_order)		\
+	{								\
+		if (value)						\
+			__rte_bit_atomic_set ## size(addr, nr, memory_order); \
+		else							\
+			__rte_bit_atomic_clear ## size(addr, nr,	\
+						       memory_order);	\
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_TEST_AND_ASSIGN(size)			\
+	__rte_experimental						\
+	static inline bool						\
+	__rte_bit_atomic_test_and_assign ## size(uint ## size ## _t *addr, \
+						 unsigned int nr,	\
+						 bool value,		\
+						 int memory_order)	\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
+		uint ## size ## _t before;				\
+		uint ## size ## _t target;				\
+									\
+		before = rte_atomic_load_explicit(a_addr,		\
+						  rte_memory_order_relaxed); \
+									\
+		do {							\
+			target = before;				\
+			__rte_bit_assign ## size(&target, nr, value);	\
+		} while (!rte_atomic_compare_exchange_weak_explicit(	\
+				a_addr, &before, target,		\
+				rte_memory_order_relaxed,		\
+				memory_order));				\
+		return __rte_bit_test ## size(&before, nr);		\
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_FLIP(size)					\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_atomic_flip ## size(uint ## size ## _t *addr,		\
+				      unsigned int nr, int memory_order) \
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
+		uint ## size ## _t before;				\
+		uint ## size ## _t target;				\
+									\
+		before = rte_atomic_load_explicit(a_addr,		\
+						  rte_memory_order_relaxed); \
+									\
+		do {							\
+			target = before;				\
+			__rte_bit_flip ## size(&target, nr);		\
+		} while (!rte_atomic_compare_exchange_weak_explicit(	\
+				a_addr, &before, target,		\
+				rte_memory_order_relaxed,		\
+				memory_order));				\
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_OPS(size)			\
+	__RTE_GEN_BIT_ATOMIC_TEST(size)			\
+	__RTE_GEN_BIT_ATOMIC_SET(size)			\
+	__RTE_GEN_BIT_ATOMIC_CLEAR(size)		\
+	__RTE_GEN_BIT_ATOMIC_ASSIGN(size)		\
+	__RTE_GEN_BIT_ATOMIC_TEST_AND_ASSIGN(size)	\
+	__RTE_GEN_BIT_ATOMIC_FLIP(size)
+
+__RTE_GEN_BIT_ATOMIC_OPS(32)
+__RTE_GEN_BIT_ATOMIC_OPS(64)
+
+__rte_experimental
+static inline bool
+__rte_bit_atomic_test_and_set32(uint32_t *addr, unsigned int nr,
+			      int memory_order)
+{
+	return __rte_bit_atomic_test_and_assign32(addr, nr, true,
+						  memory_order);
+}
+
+__rte_experimental
+static inline bool
+__rte_bit_atomic_test_and_clear32(uint32_t *addr, unsigned int nr,
+				int memory_order)
+{
+	return __rte_bit_atomic_test_and_assign32(addr, nr, false,
+						  memory_order);
+}
+
+__rte_experimental
+static inline bool
+__rte_bit_atomic_test_and_set64(uint64_t *addr, unsigned int nr,
+			      int memory_order)
+{
+	return __rte_bit_atomic_test_and_assign64(addr, nr, true,
+						  memory_order);
+}
+
+__rte_experimental
+static inline bool
+__rte_bit_atomic_test_and_clear64(uint64_t *addr, unsigned int nr,
+			      int memory_order)
+{
+	return __rte_bit_atomic_test_and_assign64(addr, nr, false,
+						  memory_order);
+}
+
 /*------------------------ 32-bit relaxed operations ------------------------*/
 
 /**
@@ -1184,6 +1537,14 @@  rte_log2_u64(uint64_t v)
 #undef rte_bit_once_assign
 #undef rte_bit_once_flip
 
+#undef rte_bit_atomic_test
+#undef rte_bit_atomic_set
+#undef rte_bit_atomic_clear
+#undef rte_bit_atomic_assign
+#undef rte_bit_atomic_flip
+#undef rte_bit_atomic_test_and_set
+#undef rte_bit_atomic_test_and_clear
+#undef rte_bit_atomic_test_and_assign
+
 #define __RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, size, arg1_type, arg1_name) \
 	static inline void						\
 	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
@@ -1227,6 +1588,59 @@  rte_log2_u64(uint64_t v)
 	__RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, 64, arg1_type, arg1_name, \
 				arg2_type, arg2_name)
 
+#define __RTE_BIT_OVERLOAD_SZ_3R(fun, qualifier, size, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name)	\
+	static inline ret_type						\
+	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
+			arg1_type arg1_name, arg2_type arg2_name)	\
+	{								\
+		return __rte_bit_ ## fun ## size(addr, arg1_name, arg2_name); \
+	}
+
+#define __RTE_BIT_OVERLOAD_3R(fun, qualifier, ret_type, arg1_type, arg1_name, \
+			      arg2_type, arg2_name)			\
+	__RTE_BIT_OVERLOAD_SZ_3R(fun, qualifier, 32, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name)	\
+	__RTE_BIT_OVERLOAD_SZ_3R(fun, qualifier, 64, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name)
+
+#define __RTE_BIT_OVERLOAD_SZ_4(fun, qualifier, size, arg1_type, arg1_name, \
+				arg2_type, arg2_name, arg3_type, arg3_name) \
+	static inline void						\
+	rte_bit_ ## fun(uint ## size ## _t *addr, arg1_type arg1_name,	\
+			arg2_type arg2_name, arg3_type arg3_name)	\
+	{								\
+		__rte_bit_ ## fun ## size(addr, arg1_name, arg2_name,	\
+					  arg3_name);		      \
+	}
+
+#define __RTE_BIT_OVERLOAD_4(fun, qualifier, arg1_type, arg1_name, arg2_type, \
+			     arg2_name, arg3_type, arg3_name)		\
+	__RTE_BIT_OVERLOAD_SZ_4(fun, qualifier, 32, arg1_type, arg1_name, \
+				arg2_type, arg2_name, arg3_type, arg3_name) \
+	__RTE_BIT_OVERLOAD_SZ_4(fun, qualifier, 64, arg1_type, arg1_name, \
+				arg2_type, arg2_name, arg3_type, arg3_name)
+
+#define __RTE_BIT_OVERLOAD_SZ_4R(fun, qualifier, size, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name, arg3_type, \
+				 arg3_name)				\
+	static inline ret_type						\
+	rte_bit_ ## fun(uint ## size ## _t *addr, arg1_type arg1_name,	\
+			arg2_type arg2_name, arg3_type arg3_name)	\
+	{								\
+		return __rte_bit_ ## fun ## size(addr, arg1_name, arg2_name, \
+						 arg3_name);		\
+	}
+
+#define __RTE_BIT_OVERLOAD_4R(fun, qualifier, ret_type, arg1_type, arg1_name, \
+			      arg2_type, arg2_name, arg3_type, arg3_name) \
+	__RTE_BIT_OVERLOAD_SZ_4R(fun, qualifier, 32, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name, arg3_type, \
+				 arg3_name)				\
+	__RTE_BIT_OVERLOAD_SZ_4R(fun, qualifier, 64, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name, arg3_type, \
+				 arg3_name)
+
 __RTE_BIT_OVERLOAD_2R(test, const, bool, unsigned int, nr)
 __RTE_BIT_OVERLOAD_2(set,, unsigned int, nr)
 __RTE_BIT_OVERLOAD_2(clear,, unsigned int, nr)
@@ -1239,6 +1653,20 @@  __RTE_BIT_OVERLOAD_2(once_clear, volatile, unsigned int, nr)
 __RTE_BIT_OVERLOAD_3(once_assign, volatile, unsigned int, nr, bool, value)
 __RTE_BIT_OVERLOAD_2(once_flip, volatile, unsigned int, nr)
 
+__RTE_BIT_OVERLOAD_3R(atomic_test, const, bool, unsigned int, nr,
+		      int, memory_order)
+__RTE_BIT_OVERLOAD_3(atomic_set,, unsigned int, nr, int, memory_order)
+__RTE_BIT_OVERLOAD_3(atomic_clear,, unsigned int, nr, int, memory_order)
+__RTE_BIT_OVERLOAD_4(atomic_assign,, unsigned int, nr, bool, value,
+		     int, memory_order)
+__RTE_BIT_OVERLOAD_3(atomic_flip,, unsigned int, nr, int, memory_order)
+__RTE_BIT_OVERLOAD_3R(atomic_test_and_set,, bool, unsigned int, nr,
+		      int, memory_order)
+__RTE_BIT_OVERLOAD_3R(atomic_test_and_clear,, bool, unsigned int, nr,
+		      int, memory_order)
+__RTE_BIT_OVERLOAD_4R(atomic_test_and_assign,, bool, unsigned int, nr,
+		      bool, value, int, memory_order)
+
 #endif
 
 #endif /* _RTE_BITOPS_H_ */