[1/6] eal: add static per-lcore memory allocation facility

Message ID 20240910070344.699183-2-mattias.ronnblom@ericsson.com (mailing list archive)
State Superseded
Delegated to: Thomas Monjalon
Headers
Series Lcore variables |

Checks

Context Check Description
ci/checkpatch warning coding style issues

Commit Message

Mattias Rönnblom Sept. 10, 2024, 7:03 a.m. UTC
Introduce DPDK per-lcore id variables, or lcore variables for short.

An lcore variable has one value for every current and future lcore
id-equipped thread.

The primary <rte_lcore_var.h> use case is for statically allocating
small, frequently-accessed data structures, for which one instance
should exist for each lcore.

Lcore variables are similar to thread-local storage (TLS, e.g., C11
_Thread_local), but decoupling the values' life time with that of the
threads.

Lcore variables are also similar in terms of functionality provided by
FreeBSD kernel's DPCPU_*() family of macros and the associated
build-time machinery. DPCPU uses linker scripts, which effectively
prevents the reuse of its, otherwise seemingly viable, approach.

The currently-prevailing way to solve the same problem as lcore
variables is to keep a module's per-lcore data as RTE_MAX_LCORE-sized
array of cache-aligned, RTE_CACHE_GUARDed structs. The benefit of
lcore variables over this approach is that data related to the same
lcore now is close (spatially, in memory), rather than data used by
the same module, which in turn avoid excessive use of padding,
polluting caches with unused data.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>

--

PATCH:
 * Update MAINTAINERS and release notes.
 * Stop covering included files in extern "C" {}.

RFC v6:
 * Include <stdlib.h> to get aligned_alloc().
 * Tweak documentation (grammar).
 * Provide API-level guarantees that lcore variable values take on an
   initial value of zero.
 * Fix misplaced __rte_cache_aligned in the API doc example.

RFC v5:
 * In Doxygen, consistenly use @<cmd> (and not \<cmd>).
 * The RTE_LCORE_VAR_GET() and SET() convience access macros
   covered an uncommon use case, where the lcore value is of a
   primitive type, rather than a struct, and is thus eliminated
   from the API. (Morten Brørup)
 * In the wake up GET()/SET() removeal, rename RTE_LCORE_VAR_PTR()
   RTE_LCORE_VAR_VALUE().
 * The underscores are removed from __rte_lcore_var_lcore_ptr() to
   signal that this function is a part of the public API.
 * Macro arguments are documented.

RFV v4:
 * Replace large static array with libc heap-allocated memory. One
   implication of this change is there no longer exists a fixed upper
   bound for the total amount of memory used by lcore variables.
   RTE_MAX_LCORE_VAR has changed meaning, and now represent the
   maximum size of any individual lcore variable value.
 * Fix issues in example. (Morten Brørup)
 * Improve access macro type checking. (Morten Brørup)
 * Refer to the lcore variable handle as "handle" and not "name" in
   various macros.
 * Document lack of thread safety in rte_lcore_var_alloc().
 * Provide API-level assurance the lcore variable handle is
   always non-NULL, to all applications to use NULL to mean
   "not yet allocated".
 * Note zero-sized allocations are not allowed.
 * Give API-level guarantee the lcore variable values are zeroed.

RFC v3:
 * Replace use of GCC-specific alignof(<expression>) with alignof(<type>).
 * Update example to reflect FOREACH macro name change (in RFC v2).

RFC v2:
 * Use alignof to derive alignment requirements. (Morten Brørup)
 * Change name of FOREACH to make it distinct from <rte_lcore.h>'s
   *per-EAL-thread* RTE_LCORE_FOREACH(). (Morten Brørup)
 * Allow user-specified alignment, but limit max to cache line size.
---
 MAINTAINERS                            |   6 +
 config/rte_config.h                    |   1 +
 doc/api/doxy-api-index.md              |   1 +
 doc/guides/rel_notes/release_24_11.rst |  14 +
 lib/eal/common/eal_common_lcore_var.c  |  69 +++++
 lib/eal/common/meson.build             |   1 +
 lib/eal/include/meson.build            |   1 +
 lib/eal/include/rte_lcore_var.h        | 384 +++++++++++++++++++++++++
 lib/eal/version.map                    |   3 +
 9 files changed, 480 insertions(+)
 create mode 100644 lib/eal/common/eal_common_lcore_var.c
 create mode 100644 lib/eal/include/rte_lcore_var.h
  

Comments

Morten Brørup Sept. 10, 2024, 9:32 a.m. UTC | #1
> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
> Sent: Tuesday, 10 September 2024 09.04
> 
> Introduce DPDK per-lcore id variables, or lcore variables for short.

Throughout the descriptions and comments,
please replace "lcore id" with "lcore" (e.g. "per-lcore variables"),
when referring to the lcore, and not the index of the lcore.
(Your intention might be to highlight that it only covers threads with an lcore id,
but if that wasn't the case, you would refer to them as "threads" not "lcores".)
Except, of course, when referring to an actual lcore id, e.g. lcore_id function parameters.

Paraphrasing:
Consider the type of what you are referring to;
use "lcore" if its type is "thread", and
use "lcore id" if its type is "int".

I might be wrong here, but please think hard about it.

> 
> An lcore variable has one value for every current and future lcore
> id-equipped thread.
> 
> The primary <rte_lcore_var.h> use case is for statically allocating
> small, frequently-accessed data structures, for which one instance
> should exist for each lcore.
> 
> Lcore variables are similar to thread-local storage (TLS, e.g., C11
> _Thread_local), but decoupling the values' life time with that of the
> threads.
> 
> Lcore variables are also similar in terms of functionality provided by
> FreeBSD kernel's DPCPU_*() family of macros and the associated
> build-time machinery. DPCPU uses linker scripts, which effectively
> prevents the reuse of its, otherwise seemingly viable, approach.
> 
> The currently-prevailing way to solve the same problem as lcore
> variables is to keep a module's per-lcore data as RTE_MAX_LCORE-sized
> array of cache-aligned, RTE_CACHE_GUARDed structs. The benefit of
> lcore variables over this approach is that data related to the same
> lcore now is close (spatially, in memory), rather than data used by
> the same module, which in turn avoid excessive use of padding,
> polluting caches with unused data.
> 
> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> Acked-by: Morten Brørup <mb@smartsharesystems.com>
> 
> --

> +++ b/doc/api/doxy-api-index.md
> @@ -99,6 +99,7 @@ The public API headers are grouped by topics:
>    [interrupts](@ref rte_interrupts.h),
>    [launch](@ref rte_launch.h),
>    [lcore](@ref rte_lcore.h),
> +  [lcore-varible](@ref rte_lcore_var.h),

Typo: varible -> variable


> +++ b/doc/guides/rel_notes/release_24_11.rst
> @@ -55,6 +55,20 @@ New Features
>       Also, make sure to start the actual text at the margin.
>       =======================================================
> 
> +* **Added EAL per-lcore static memory allocation facility.**
> +
> +    Added EAL API <rte_lcore_var.h> for statically allocating small,
> +    frequently-accessed data structures, for which one instance should
> +    exist for each lcore.
> +
> +    With lcore variables, data is organized spatially on a per-lcore
> +    basis, rather than per library or PMD, avoiding the need for cache
> +    aligning (or RTE_CACHE_GUARDing) data structures, which in turn
> +    reduces CPU cache internal fragmentation, improving performance.
> +
> +    Lcore variables are similar to thread-local storage (TLS, e.g.,
> +    C11 _Thread_local), but decoupling the values' life time from that
> +    of the threads.

When referring to TLS, you might want to clarify that lcore variables are not instantiated for unregistered threads.


> +static void *lcore_buffer;
> +static size_t offset = RTE_MAX_LCORE_VAR;
> +
> +static void *
> +lcore_var_alloc(size_t size, size_t align)
> +{
> +	void *handle;
> +	void *value;
> +
> +	offset = RTE_ALIGN_CEIL(offset, align);
> +
> +	if (offset + size > RTE_MAX_LCORE_VAR) {
> +		lcore_buffer = aligned_alloc(RTE_CACHE_LINE_SIZE,
> +					     LCORE_BUFFER_SIZE);
> +		RTE_VERIFY(lcore_buffer != NULL);
> +
> +		offset = 0;
> +	}

To determine if the lcore_buffer memory should be allocated, why not just check if lcore_buffer == NULL?
Then offset wouldn't need an initial value of RTE_MAX_LCORE_VAR.

> +
> +	handle = RTE_PTR_ADD(lcore_buffer, offset);
> +
> +	offset += size;
> +
> +	RTE_LCORE_VAR_FOREACH_VALUE(value, handle)
> +		memset(value, 0, size);
> +
> +	EAL_LOG(DEBUG, "Allocated %"PRIuPTR" bytes of per-lcore data with a "
> +		"%"PRIuPTR"-byte alignment", size, align);
> +
> +	return handle;
> +}


> +/**
> + * @file
> + *
> + * RTE Per-lcore id variables

Suggest mentioning the short form too, e.g.:
"RTE Per-lcore id variables (RTE Lcore variables)"

> + *
> + * This API provides a mechanism to create and access per-lcore id
> + * variables in a space- and cycle-efficient manner.
> + *
> + * A per-lcore id variable (or lcore variable for short) has one value
> + * for each EAL thread and registered non-EAL thread.

And service thread.

> + * There is one
> + * copy for each current and future lcore id-equipped thread, with the

"one copy" -> "one instance"

> + * total number of copies amounting to @c RTE_MAX_LCORE. The value of

"copies" -> "instances"

> + * an lcore variable for a particular lcore id is independent from
> + * other values (for other lcore ids) within the same lcore variable.
> + *
> + * In order to access the values of an lcore variable, a handle is
> + * used. The type of the handle is a pointer to the value's type
> + * (e.g., for @c uint32_t lcore variable, the handle is a
> + * <code>uint32_t *</code>. The handler type is used to inform the

Typo: "handler" -> "handle", I think :-/
Found this typo multiple times; search-replace.

> + * access macros the type of the values. A handle may be passed
> + * between modules and threads just like any pointer, but its value
> + * must be treated as a an opaque identifier. An allocated handle
> + * never has the value NULL.
> + *
> + * @b Creation
> + *
> + * An lcore variable is created in two steps:
> + *  1. Define a lcore variable handle by using @ref RTE_LCORE_VAR_HANDLE.
> + *  2. Allocate lcore variable storage and initialize the handle with
> + *     a unique identifier by @ref RTE_LCORE_VAR_ALLOC or
> + *     @ref RTE_LCORE_VAR_INIT. Allocation generally occurs the time of
> + *     module initialization, but may be done at any time.
> + *
> + * An lcore variable is not tied to the owning thread's lifetime. It's
> + * available for use by any thread immediately after having been
> + * allocated, and continues to be available throughout the lifetime of
> + * the EAL.
> + *
> + * Lcore variables cannot and need not be freed.
> + *
> + * @b Access
> + *
> + * The value of any lcore variable for any lcore id may be accessed
> + * from any thread (including unregistered threads), but it should
> + * only be *frequently* read from or written to by the owner.
> + *
> + * Values of the same lcore variable but owned by to different lcore

Typo: to -> two

> + * ids may be frequently read or written by the owners without risking
> + * false sharing.
> + *
> + * An appropriate synchronization mechanism (e.g., atomic loads and
> + * stores) should employed to assure there are no data races between
> + * the owning thread and any non-owner threads accessing the same
> + * lcore variable instance.
> + *
> + * The value of the lcore variable for a particular lcore id is
> + * accessed using @ref RTE_LCORE_VAR_LCORE_VALUE.
> + *
> + * A common pattern is for an EAL thread or a registered non-EAL
> + * thread to access its own lcore variable value. For this purpose, a
> + * short-hand exists in the form of @ref RTE_LCORE_VAR_VALUE.
> + *
> + * Although the handle (as defined by @ref RTE_LCORE_VAR_HANDLE) is a
> + * pointer with the same type as the value, it may not be directly
> + * dereferenced and must be treated as an opaque identifier.
> + *
> + * Lcore variable handles and value pointers may be freely passed
> + * between different threads.
> + *
> + * @b Storage
> + *
> + * An lcore variable's values may by of a primitive type like @c int,

Two typos: "values may by" -> "value may be"

> + * but would more typically be a @c struct.
> + *
> + * The lcore variable handle introduces a per-variable (not
> + * per-value/per-lcore id) overhead of @c sizeof(void *) bytes, so
> + * there are some memory footprint gains to be made by organizing all
> + * per-lcore id data for a particular module as one lcore variable
> + * (e.g., as a struct).
> + *
> + * An application may choose to define an lcore variable handle, which
> + * it then it goes on to never allocate.
> + *
> + * The size of a lcore variable's value must be less than the DPDK
> + * build-time constant @c RTE_MAX_LCORE_VAR.
> + *
> + * The lcore variable are stored in a series of lcore buffers, which
> + * are allocated from the libc heap. Heap allocation failures are
> + * treated as fatal.
> + *
> + * Lcore variables should generally *not* be @ref __rte_cache_aligned
> + * and need *not* include a @ref RTE_CACHE_GUARD field, since the use
> + * of these constructs are designed to avoid false sharing. In the
> + * case of an lcore variable instance, the thread most recently
> + * accessing nearby data structures should almost-always the lcore

Missing word: should almost-always *be* the lcore variables' owner.


> + * variables' owner. Adding padding will increase the effective memory
> + * working set size, potentially reducing performance.
> + *
> + * Lcore variable values take on an initial value of zero.
> + *
> + * @b Example
> + *
> + * Below is an example of the use of an lcore variable:
> + *
> + * @code{.c}
> + * struct foo_lcore_state {
> + *         int a;
> + *         long b;
> + * };
> + *
> + * static RTE_LCORE_VAR_HANDLE(struct foo_lcore_state, lcore_states);
> + *
> + * long foo_get_a_plus_b(void)
> + * {
> + *         struct foo_lcore_state *state = RTE_LCORE_VAR_VALUE(lcore_states);
> + *
> + *         return state->a + state->b;
> + * }
> + *
> + * RTE_INIT(rte_foo_init)
> + * {
> + *         RTE_LCORE_VAR_ALLOC(lcore_states);
> + *
> + *         struct foo_lcore_state *state;
> + *         RTE_LCORE_VAR_FOREACH_VALUE(state, lcore_states) {
> + *                 (initialize 'state')

Consider: (initialize 'state') -> /* initialize 'state' */

> + *         }
> + *
> + *         (other initialization)

Consider: (other initialization) -> /* other initialization */

> + * }
> + * @endcode
> + *
> + *
> + * @b Alternatives
> + *
> + * Lcore variables are designed to replace a pattern exemplified below:
> + * @code{.c}
> + * struct __rte_cache_aligned foo_lcore_state {
> + *         int a;
> + *         long b;
> + *         RTE_CACHE_GUARD;
> + * };
> + *
> + * static struct foo_lcore_state lcore_states[RTE_MAX_LCORE];
> + * @endcode
> + *
> + * This scheme is simple and effective, but has one drawback: the data
> + * is organized so that objects related to all lcores for a particular
> + * module is kept close in memory. At a bare minimum, this forces the
> + * use of cache-line alignment to avoid false sharing. With CPU

Consider adding: use of *padding to* cache-line alignment
My point here is:
This sentence should somehow include the word "padding".
This paragraph is not only aboud cache line alignment, it is primarily about padding.

> + * hardware prefetching and memory loads resulting from speculative
> + * execution (functions which seemingly are getting more eager faster
> + * than they are getting more intelligent), one or more "guard" cache
> + * lines may be required to separate one lcore's data from another's.
> + *
> + * Lcore variables has the upside of working with, not against, the

Typo: has -> have

> + * CPU's assumptions and for example next-line prefetchers may well
> + * work the way its designers intended (i.e., to the benefit, not
> + * detriment, of system performance).
> + *
> + * Another alternative to @ref rte_lcore_var.h is the @ref
> + * rte_per_lcore.h API, which make use of thread-local storage (TLS,

Typo: make -> makes

> + * e.g., GCC __thread or C11 _Thread_local). The main differences
> + * between by using the various forms of TLS (e.g., @ref
> + * RTE_DEFINE_PER_LCORE or _Thread_local) and the use of lcore
> + * variables are:
> + *
> + *   * The existence and non-existence of a thread-local variable
> + *     instance follow that of particular thread's. The data cannot be

Typo: "thread's" -> "threads", I think. :-/

> + *     accessed before the thread has been created, nor after it has
> + *     exited. As a result, thread-local variables must initialized in

Missing word: must *be* initialized

> + *     a "lazy" manner (e.g., at the point of thread creation). Lcore
> + *     variables may be accessed immediately after having been
> + *     allocated (which may be prior any thread beyond the main
> + *     thread is running).
> + *   * A thread-local variable is duplicated across all threads in the
> + *     process, including unregistered non-EAL threads (i.e.,
> + *     "regular" threads). For DPDK applications heavily relying on
> + *     multi-threading (in conjunction to DPDK's "one thread per core"
> + *     pattern), either by having many concurrent threads or
> + *     creating/destroying threads at a high rate, an excessive use of
> + *     thread-local variables may cause inefficiencies (e.g.,
> + *     increased thread creation overhead due to thread-local storage
> + *     initialization or increased total RAM footprint usage). Lcore
> + *     variables *only* exist for threads with an lcore id.
> + *   * If data in thread-local storage may be shared between threads
> + *     (i.e., can a pointer to a thread-local variable be passed to
> + *     and successfully dereferenced by non-owning thread) depends on
> + *     the details of the TLS implementation. With GCC __thread and
> + *     GCC _Thread_local, such data sharing is supported. In the C11
> + *     standard, the result of accessing another thread's
> + *     _Thread_local object is implementation-defined. Lcore variable
> + *     instances may be accessed reliably by any thread.
> + */
> +
> +#include <stddef.h>
> +#include <stdalign.h>
> +
> +#include <rte_common.h>
> +#include <rte_config.h>
> +#include <rte_lcore.h>
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +/**
> + * Given the lcore variable type, produces the type of the lcore
> + * variable handle.
> + */
> +#define RTE_LCORE_VAR_HANDLE_TYPE(type)		\
> +	type *
> +
> +/**
> + * Define a lcore variable handle.

Typo: "a lcore" -> "an lcore"
Found this typo multiple times; search-replace "a lcore".

> + *
> + * This macro defines a variable which is used as a handle to access
> + * the various per-lcore id instances of a per-lcore id variable.

Suggest:
"the various per-lcore id instances of a per-lcore id variable" ->
"the various instances of a per-lcore id variable"

> + *
> + * The aim with this macro is to make clear at the point of
> + * declaration that this is an lcore handler, rather than a regular
> + * pointer.
> + *
> + * Add @b static as a prefix in case the lcore variable are only to be

Typo: are -> is

> + * accessed from a particular translation unit.
> + */
> +#define RTE_LCORE_VAR_HANDLE(type, name)	\
> +	RTE_LCORE_VAR_HANDLE_TYPE(type) name
> +
> +/**
> + * Allocate space for an lcore variable, and initialize its handle.
> + *
> + * The values of the lcore variable are initialized to zero.

Consider adding: "the lcore variable *instances* are initialized"
Found this typo multiple times; search-replace.

> + */
> +#define RTE_LCORE_VAR_ALLOC_SIZE_ALIGN(handle, size, align)	\
> +	handle = rte_lcore_var_alloc(size, align)
> +
> +/**
> + * Allocate space for an lcore variable, and initialize its handle,
> + * with values aligned for any type of object.
> + *
> + * The values of the lcore variable are initialized to zero.
> + */
> +#define RTE_LCORE_VAR_ALLOC_SIZE(handle, size)	\
> +	RTE_LCORE_VAR_ALLOC_SIZE_ALIGN(handle, size, 0)
> +
> +/**
> + * Allocate space for an lcore variable of the size and alignment
> requirements
> + * suggested by the handler pointer type, and initialize its handle.
> + *
> + * The values of the lcore variable are initialized to zero.
> + */
> +#define RTE_LCORE_VAR_ALLOC(handle)					\
> +	RTE_LCORE_VAR_ALLOC_SIZE_ALIGN(handle, sizeof(*(handle)),	\
> +				       alignof(typeof(*(handle))))
> +
> +/**
> + * Allocate an explicitly-sized, explicitly-aligned lcore variable by
> + * means of a @ref RTE_INIT constructor.
> + *
> + * The values of the lcore variable are initialized to zero.
> + */
> +#define RTE_LCORE_VAR_INIT_SIZE_ALIGN(name, size, align)		\
> +	RTE_INIT(rte_lcore_var_init_ ## name)				\
> +	{								\
> +		RTE_LCORE_VAR_ALLOC_SIZE_ALIGN(name, size, align);	\
> +	}
> +
> +/**
> + * Allocate an explicitly-sized lcore variable by means of a @ref
> + * RTE_INIT constructor.
> + *
> + * The values of the lcore variable are initialized to zero.
> + */
> +#define RTE_LCORE_VAR_INIT_SIZE(name, size)		\
> +	RTE_LCORE_VAR_INIT_SIZE_ALIGN(name, size, 0)
> +
> +/**
> + * Allocate an lcore variable by means of a @ref RTE_INIT constructor.
> + *
> + * The values of the lcore variable are initialized to zero.
> + */
> +#define RTE_LCORE_VAR_INIT(name)					\
> +	RTE_INIT(rte_lcore_var_init_ ## name)				\
> +	{								\
> +		RTE_LCORE_VAR_ALLOC(name);				\
> +	}
> +
> +/**
> + * Get void pointer to lcore variable instance with the specified
> + * lcore id.
> + *
> + * @param lcore_id
> + *   The lcore id specifying which of the @c RTE_MAX_LCORE value
> + *   instances should be accessed. The lcore id need not be valid
> + *   (e.g., may be @ref LCORE_ID_ANY), but in such a case, the pointer
> + *   is also not valid (and thus should not be dereferenced).
> + * @param handle
> + *   The lcore variable handle.
> + */
> +static inline void *
> +rte_lcore_var_lcore_ptr(unsigned int lcore_id, void *handle)
> +{
> +	return RTE_PTR_ADD(handle, lcore_id * RTE_MAX_LCORE_VAR);
> +}
> +
> +/**
> + * Get pointer to lcore variable instance with the specified lcore id.
> + *
> + * @param lcore_id
> + *   The lcore id specifying which of the @c RTE_MAX_LCORE value
> + *   instances should be accessed. The lcore id need not be valid
> + *   (e.g., may be @ref LCORE_ID_ANY), but in such a case, the pointer
> + *   is also not valid (and thus should not be dereferenced).
> + * @param handle
> + *   The lcore variable handle.
> + */
> +#define RTE_LCORE_VAR_LCORE_VALUE(lcore_id, handle)			\
> +	((typeof(handle))rte_lcore_var_lcore_ptr(lcore_id, handle))
> +
> +/**
> + * Get pointer to lcore variable instance of the current thread.
> + *
> + * May only be used by EAL threads and registered non-EAL threads.
> + */
> +#define RTE_LCORE_VAR_VALUE(handle) \
> +	RTE_LCORE_VAR_LCORE_VALUE(rte_lcore_id(), handle)
> +
> +/**
> + * Iterate over each lcore id's value for a lcore variable.
> + *
> + * @param value
> + *   A pointer set successivly set to point to lcore variable value

"set successivly set" -> "successivly set"

Thinking out loud, ignore at your preference:
During the RFC discussions, the term used for referring to an lcore variable was discussed;
we considered "pointer", but settled for "value".
Perhaps "instance" would be usable in comments like like the one describing this function...
"A pointer set successivly set to point to lcore variable value" ->
"A pointer set successivly set to point to lcore variable instance".
I don't know.


> + *   corresponding to every lcore id (up to @c RTE_MAX_LCORE).
> + * @param handle
> + *   The lcore variable handle.
> + */
> +#define RTE_LCORE_VAR_FOREACH_VALUE(value, handle)			\
> +	for (unsigned int lcore_id =					\
> +		     (((value) = RTE_LCORE_VAR_LCORE_VALUE(0, handle)), 0); \
> +	     lcore_id < RTE_MAX_LCORE;					\
> +	     lcore_id++, (value) = RTE_LCORE_VAR_LCORE_VALUE(lcore_id, handle))
> +
> +/**
> + * Allocate space in the per-lcore id buffers for a lcore variable.
> + *
> + * The pointer returned is only an opaque identifer of the variable. To
> + * get an actual pointer to a particular instance of the variable use
> + * @ref RTE_LCORE_VAR_VALUE or @ref RTE_LCORE_VAR_LCORE_VALUE.
> + *
> + * The lcore variable values' memory is set to zero.
> + *
> + * The allocation is always successful, barring a fatal exhaustion of
> + * the per-lcore id buffer space.
> + *
> + * rte_lcore_var_alloc() is not multi-thread safe.
> + *
> + * @param size
> + *   The size (in bytes) of the variable's per-lcore id value. Must be > 0.
> + * @param align
> + *   If 0, the values will be suitably aligned for any kind of type
> + *   (i.e., alignof(max_align_t)). Otherwise, the values will be aligned
> + *   on a multiple of *align*, which must be a power of 2 and equal or
> + *   less than @c RTE_CACHE_LINE_SIZE.
> + * @return
> + *   The id of the variable, stored in a void pointer value. The value

"id" -> "handle"

> + *   is always non-NULL.
> + */
> +__rte_experimental
> +void *
> +rte_lcore_var_alloc(size_t size, size_t align);
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* _RTE_LCORE_VAR_H_ */
> diff --git a/lib/eal/version.map b/lib/eal/version.map
> index e3ff412683..5f5a3522c0 100644
> --- a/lib/eal/version.map
> +++ b/lib/eal/version.map
> @@ -396,6 +396,9 @@ EXPERIMENTAL {
> 
>  	# added in 24.03
>  	rte_vfio_get_device_info; # WINDOWS_NO_EXPORT
> +
> +	rte_lcore_var_alloc;
> +	rte_lcore_var;

No such function: rte_lcore_var

>  };
> 
>  INTERNAL {
> --
> 2.34.1
  
Mattias Rönnblom Sept. 10, 2024, 10:44 a.m. UTC | #2
On 2024-09-10 11:32, Morten Brørup wrote:
>> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
>> Sent: Tuesday, 10 September 2024 09.04
>>
>> Introduce DPDK per-lcore id variables, or lcore variables for short.
> 
> Throughout the descriptions and comments,
> please replace "lcore id" with "lcore" (e.g. "per-lcore variables"),
> when referring to the lcore, and not the index of the lcore.
> (Your intention might be to highlight that it only covers threads with an lcore id,
> but if that wasn't the case, you would refer to them as "threads" not "lcores".)
> Except, of course, when referring to an actual lcore id, e.g. lcore_id function parameters.

"lcore" is just another word for "EAL thread." The lcore variables exist 
in one instance for every thread with an lcore id, thus also for 
registered non-EAL threads (i.e., threads which are not lcores).

I've tried to summarize the (very confusing) terminology of DPDK's 
threading model here:
https://ericsson.github.io/dataplanebook/threading/threading.html#eal-threads

So, in my world, "per-lcore id variables" is pretty accurate. You could 
say "variables with per-lcore id values" if you want to make it even 
more clear, what's going on.

> 
> Paraphrasing:
> Consider the type of what you are referring to;
> use "lcore" if its type is "thread", and
> use "lcore id" if its type is "int".
> 
> I might be wrong here, but please think hard about it.
> 
>>
>> An lcore variable has one value for every current and future lcore
>> id-equipped thread.
>>
>> The primary <rte_lcore_var.h> use case is for statically allocating
>> small, frequently-accessed data structures, for which one instance
>> should exist for each lcore.
>>
>> Lcore variables are similar to thread-local storage (TLS, e.g., C11
>> _Thread_local), but decoupling the values' life time with that of the
>> threads.
>>
>> Lcore variables are also similar in terms of functionality provided by
>> FreeBSD kernel's DPCPU_*() family of macros and the associated
>> build-time machinery. DPCPU uses linker scripts, which effectively
>> prevents the reuse of its, otherwise seemingly viable, approach.
>>
>> The currently-prevailing way to solve the same problem as lcore
>> variables is to keep a module's per-lcore data as RTE_MAX_LCORE-sized
>> array of cache-aligned, RTE_CACHE_GUARDed structs. The benefit of
>> lcore variables over this approach is that data related to the same
>> lcore now is close (spatially, in memory), rather than data used by
>> the same module, which in turn avoid excessive use of padding,
>> polluting caches with unused data.
>>
>> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
>> Acked-by: Morten Brørup <mb@smartsharesystems.com>
>>
>> --
> 
>> +++ b/doc/api/doxy-api-index.md
>> @@ -99,6 +99,7 @@ The public API headers are grouped by topics:
>>     [interrupts](@ref rte_interrupts.h),
>>     [launch](@ref rte_launch.h),
>>     [lcore](@ref rte_lcore.h),
>> +  [lcore-varible](@ref rte_lcore_var.h),
> 
> Typo: varible -> variable
> 
> 

I'll change it to "lcore variables" (no dash, plural).

>> +++ b/doc/guides/rel_notes/release_24_11.rst
>> @@ -55,6 +55,20 @@ New Features
>>        Also, make sure to start the actual text at the margin.
>>        =======================================================
>>
>> +* **Added EAL per-lcore static memory allocation facility.**
>> +
>> +    Added EAL API <rte_lcore_var.h> for statically allocating small,
>> +    frequently-accessed data structures, for which one instance should
>> +    exist for each lcore.
>> +
>> +    With lcore variables, data is organized spatially on a per-lcore
>> +    basis, rather than per library or PMD, avoiding the need for cache
>> +    aligning (or RTE_CACHE_GUARDing) data structures, which in turn
>> +    reduces CPU cache internal fragmentation, improving performance.
>> +
>> +    Lcore variables are similar to thread-local storage (TLS, e.g.,
>> +    C11 _Thread_local), but decoupling the values' life time from that
>> +    of the threads.
> 
> When referring to TLS, you might want to clarify that lcore variables are not instantiated for unregistered threads.
> 

Isn't that clear from the first paragraph? Although it should say "per 
lcore id", rather than "per lcore."

> 
>> +static void *lcore_buffer;
>> +static size_t offset = RTE_MAX_LCORE_VAR;
>> +
>> +static void *
>> +lcore_var_alloc(size_t size, size_t align)
>> +{
>> +	void *handle;
>> +	void *value;
>> +
>> +	offset = RTE_ALIGN_CEIL(offset, align);
>> +
>> +	if (offset + size > RTE_MAX_LCORE_VAR) {
>> +		lcore_buffer = aligned_alloc(RTE_CACHE_LINE_SIZE,
>> +					     LCORE_BUFFER_SIZE);
>> +		RTE_VERIFY(lcore_buffer != NULL);
>> +
>> +		offset = 0;
>> +	}
> 
> To determine if the lcore_buffer memory should be allocated, why not just check if lcore_buffer == NULL?

Because it may be the case, lcore_buffer is non-NULL but the remaining 
space is too small to service the allocation.

> Then offset wouldn't need an initial value of RTE_MAX_LCORE_VAR.
> 
>> +
>> +	handle = RTE_PTR_ADD(lcore_buffer, offset);
>> +
>> +	offset += size;
>> +
>> +	RTE_LCORE_VAR_FOREACH_VALUE(value, handle)
>> +		memset(value, 0, size);
>> +
>> +	EAL_LOG(DEBUG, "Allocated %"PRIuPTR" bytes of per-lcore data with a "
>> +		"%"PRIuPTR"-byte alignment", size, align);
>> +
>> +	return handle;
>> +}
> 
> 
>> +/**
>> + * @file
>> + *
>> + * RTE Per-lcore id variables
> 
> Suggest mentioning the short form too, e.g.:
> "RTE Per-lcore id variables (RTE Lcore variables)"

What about just "RTE Lcore variables"?

Exactly what they are is thoroughly described in the text that follows.

> 
>> + *
>> + * This API provides a mechanism to create and access per-lcore id
>> + * variables in a space- and cycle-efficient manner.
>> + *
>> + * A per-lcore id variable (or lcore variable for short) has one value
>> + * for each EAL thread and registered non-EAL thread.
> 
> And service thread.

Service threads are EAL threads, or, at a bare minimum, must have a 
lcore id, and thus must be registered.

> 
>> + * There is one
>> + * copy for each current and future lcore id-equipped thread, with the
> 
> "one copy" -> "one instance"
> 

Fixed.

>> + * total number of copies amounting to @c RTE_MAX_LCORE. The value of
> 
> "copies" -> "instances"
> 

OK, I'll rephrase that sentence.

>> + * an lcore variable for a particular lcore id is independent from
>> + * other values (for other lcore ids) within the same lcore variable.
>> + *
>> + * In order to access the values of an lcore variable, a handle is
>> + * used. The type of the handle is a pointer to the value's type
>> + * (e.g., for @c uint32_t lcore variable, the handle is a
>> + * <code>uint32_t *</code>. The handler type is used to inform the
> 
> Typo: "handler" -> "handle", I think :-/
> Found this typo multiple times; search-replace.

Fixed.

> 
>> + * access macros the type of the values. A handle may be passed
>> + * between modules and threads just like any pointer, but its value
>> + * must be treated as a an opaque identifier. An allocated handle
>> + * never has the value NULL.
>> + *
>> + * @b Creation
>> + *
>> + * An lcore variable is created in two steps:
>> + *  1. Define a lcore variable handle by using @ref RTE_LCORE_VAR_HANDLE.
>> + *  2. Allocate lcore variable storage and initialize the handle with
>> + *     a unique identifier by @ref RTE_LCORE_VAR_ALLOC or
>> + *     @ref RTE_LCORE_VAR_INIT. Allocation generally occurs the time of
>> + *     module initialization, but may be done at any time.
>> + *
>> + * An lcore variable is not tied to the owning thread's lifetime. It's
>> + * available for use by any thread immediately after having been
>> + * allocated, and continues to be available throughout the lifetime of
>> + * the EAL.
>> + *
>> + * Lcore variables cannot and need not be freed.
>> + *
>> + * @b Access
>> + *
>> + * The value of any lcore variable for any lcore id may be accessed
>> + * from any thread (including unregistered threads), but it should
>> + * only be *frequently* read from or written to by the owner.
>> + *
>> + * Values of the same lcore variable but owned by to different lcore
> 
> Typo: to -> two
> 

Fixed.

>> + * ids may be frequently read or written by the owners without risking
>> + * false sharing.
>> + *
>> + * An appropriate synchronization mechanism (e.g., atomic loads and
>> + * stores) should employed to assure there are no data races between
>> + * the owning thread and any non-owner threads accessing the same
>> + * lcore variable instance.
>> + *
>> + * The value of the lcore variable for a particular lcore id is
>> + * accessed using @ref RTE_LCORE_VAR_LCORE_VALUE.
>> + *
>> + * A common pattern is for an EAL thread or a registered non-EAL
>> + * thread to access its own lcore variable value. For this purpose, a
>> + * short-hand exists in the form of @ref RTE_LCORE_VAR_VALUE.
>> + *
>> + * Although the handle (as defined by @ref RTE_LCORE_VAR_HANDLE) is a
>> + * pointer with the same type as the value, it may not be directly
>> + * dereferenced and must be treated as an opaque identifier.
>> + *
>> + * Lcore variable handles and value pointers may be freely passed
>> + * between different threads.
>> + *
>> + * @b Storage
>> + *
>> + * An lcore variable's values may by of a primitive type like @c int,
> 
> Two typos: "values may by" -> "value may be"
> 

That's not a typo. An lcore variable take on multiple values, one for 
each lcore id. That said, I guess you could refer to the whole thing 
(the set of values) as the "value" as well.

>> + * but would more typically be a @c struct.
>> + *
>> + * The lcore variable handle introduces a per-variable (not
>> + * per-value/per-lcore id) overhead of @c sizeof(void *) bytes, so
>> + * there are some memory footprint gains to be made by organizing all
>> + * per-lcore id data for a particular module as one lcore variable
>> + * (e.g., as a struct).
>> + *
>> + * An application may choose to define an lcore variable handle, which
>> + * it then it goes on to never allocate.
>> + *
>> + * The size of a lcore variable's value must be less than the DPDK
>> + * build-time constant @c RTE_MAX_LCORE_VAR.
>> + *
>> + * The lcore variable are stored in a series of lcore buffers, which
>> + * are allocated from the libc heap. Heap allocation failures are
>> + * treated as fatal.
>> + *
>> + * Lcore variables should generally *not* be @ref __rte_cache_aligned
>> + * and need *not* include a @ref RTE_CACHE_GUARD field, since the use
>> + * of these constructs are designed to avoid false sharing. In the
>> + * case of an lcore variable instance, the thread most recently
>> + * accessing nearby data structures should almost-always the lcore
> 
> Missing word: should almost-always *be* the lcore variables' owner.
> 

Fixed.

> 
>> + * variables' owner. Adding padding will increase the effective memory
>> + * working set size, potentially reducing performance.
>> + *
>> + * Lcore variable values take on an initial value of zero.
>> + *
>> + * @b Example
>> + *
>> + * Below is an example of the use of an lcore variable:
>> + *
>> + * @code{.c}
>> + * struct foo_lcore_state {
>> + *         int a;
>> + *         long b;
>> + * };
>> + *
>> + * static RTE_LCORE_VAR_HANDLE(struct foo_lcore_state, lcore_states);
>> + *
>> + * long foo_get_a_plus_b(void)
>> + * {
>> + *         struct foo_lcore_state *state = RTE_LCORE_VAR_VALUE(lcore_states);
>> + *
>> + *         return state->a + state->b;
>> + * }
>> + *
>> + * RTE_INIT(rte_foo_init)
>> + * {
>> + *         RTE_LCORE_VAR_ALLOC(lcore_states);
>> + *
>> + *         struct foo_lcore_state *state;
>> + *         RTE_LCORE_VAR_FOREACH_VALUE(state, lcore_states) {
>> + *                 (initialize 'state')
> 
> Consider: (initialize 'state') -> /* initialize 'state' */
> 

I think I tried that, and it failed because the compiler didn't like 
nested comments.

>> + *         }
>> + *
>> + *         (other initialization)
> 
> Consider: (other initialization) -> /* other initialization */
> 
>> + * }
>> + * @endcode
>> + *
>> + *
>> + * @b Alternatives
>> + *
>> + * Lcore variables are designed to replace a pattern exemplified below:
>> + * @code{.c}
>> + * struct __rte_cache_aligned foo_lcore_state {
>> + *         int a;
>> + *         long b;
>> + *         RTE_CACHE_GUARD;
>> + * };
>> + *
>> + * static struct foo_lcore_state lcore_states[RTE_MAX_LCORE];
>> + * @endcode
>> + *
>> + * This scheme is simple and effective, but has one drawback: the data
>> + * is organized so that objects related to all lcores for a particular
>> + * module is kept close in memory. At a bare minimum, this forces the
>> + * use of cache-line alignment to avoid false sharing. With CPU
> 
> Consider adding: use of *padding to* cache-line alignment
> My point here is:
> This sentence should somehow include the word "padding".

I'm not sure everyone thinks about __rte_cache_aligned or cache-aligned 
heap allocations as "padded."

> This paragraph is not only aboud cache line alignment, it is primarily about padding.
> 

"At a bare minimum, this requires sizing data structures (e.g., using 
`__rte_cache_aligned`) to an even number of cache lines to avoid false 
sharing."

How about this?

>> + * hardware prefetching and memory loads resulting from speculative
>> + * execution (functions which seemingly are getting more eager faster
>> + * than they are getting more intelligent), one or more "guard" cache
>> + * lines may be required to separate one lcore's data from another's.
>> + *
>> + * Lcore variables has the upside of working with, not against, the
> 
> Typo: has -> have
> 

Fixed.

>> + * CPU's assumptions and for example next-line prefetchers may well
>> + * work the way its designers intended (i.e., to the benefit, not
>> + * detriment, of system performance).
>> + *
>> + * Another alternative to @ref rte_lcore_var.h is the @ref
>> + * rte_per_lcore.h API, which make use of thread-local storage (TLS,
> 
> Typo: make -> makes >

Fixed.

>> + * e.g., GCC __thread or C11 _Thread_local). The main differences
>> + * between by using the various forms of TLS (e.g., @ref
>> + * RTE_DEFINE_PER_LCORE or _Thread_local) and the use of lcore
>> + * variables are:
>> + *
>> + *   * The existence and non-existence of a thread-local variable
>> + *     instance follow that of particular thread's. The data cannot be
> 
> Typo: "thread's" -> "threads", I think. :-/
> 

It's not a typo.

>> + *     accessed before the thread has been created, nor after it has
>> + *     exited. As a result, thread-local variables must initialized in
> 
> Missing word: must *be* initialized
> 

Fixed.

>> + *     a "lazy" manner (e.g., at the point of thread creation). Lcore
>> + *     variables may be accessed immediately after having been
>> + *     allocated (which may be prior any thread beyond the main
>> + *     thread is running).
>> + *   * A thread-local variable is duplicated across all threads in the
>> + *     process, including unregistered non-EAL threads (i.e.,
>> + *     "regular" threads). For DPDK applications heavily relying on
>> + *     multi-threading (in conjunction to DPDK's "one thread per core"
>> + *     pattern), either by having many concurrent threads or
>> + *     creating/destroying threads at a high rate, an excessive use of
>> + *     thread-local variables may cause inefficiencies (e.g.,
>> + *     increased thread creation overhead due to thread-local storage
>> + *     initialization or increased total RAM footprint usage). Lcore
>> + *     variables *only* exist for threads with an lcore id.
>> + *   * If data in thread-local storage may be shared between threads
>> + *     (i.e., can a pointer to a thread-local variable be passed to
>> + *     and successfully dereferenced by non-owning thread) depends on
>> + *     the details of the TLS implementation. With GCC __thread and
>> + *     GCC _Thread_local, such data sharing is supported. In the C11
>> + *     standard, the result of accessing another thread's
>> + *     _Thread_local object is implementation-defined. Lcore variable
>> + *     instances may be accessed reliably by any thread.
>> + */
>> +
>> +#include <stddef.h>
>> +#include <stdalign.h>
>> +
>> +#include <rte_common.h>
>> +#include <rte_config.h>
>> +#include <rte_lcore.h>
>> +
>> +#ifdef __cplusplus
>> +extern "C" {
>> +#endif
>> +
>> +/**
>> + * Given the lcore variable type, produces the type of the lcore
>> + * variable handle.
>> + */
>> +#define RTE_LCORE_VAR_HANDLE_TYPE(type)		\
>> +	type *
>> +
>> +/**
>> + * Define a lcore variable handle.
> 
> Typo: "a lcore" -> "an lcore"
> Found this typo multiple times; search-replace "a lcore".
> 

Yes, fixed.

>> + *
>> + * This macro defines a variable which is used as a handle to access
>> + * the various per-lcore id instances of a per-lcore id variable.
> 
> Suggest:
> "the various per-lcore id instances of a per-lcore id variable" ->
> "the various instances of a per-lcore id variable" >

Sounds good.

>> + *
>> + * The aim with this macro is to make clear at the point of
>> + * declaration that this is an lcore handler, rather than a regular
>> + * pointer.
>> + *
>> + * Add @b static as a prefix in case the lcore variable are only to be
> 
> Typo: are -> is
> 

Fixed.

>> + * accessed from a particular translation unit.
>> + */
>> +#define RTE_LCORE_VAR_HANDLE(type, name)	\
>> +	RTE_LCORE_VAR_HANDLE_TYPE(type) name
>> +
>> +/**
>> + * Allocate space for an lcore variable, and initialize its handle.
>> + *
>> + * The values of the lcore variable are initialized to zero.
> 
> Consider adding: "the lcore variable *instances* are initialized"
> Found this typo multiple times; search-replace.
> 

It's not a typo. "Values" is just short for "instances of the value", 
just like "instances" is. Using instances everywhere may confuse the 
reader that an instance both a name and a value, which is not the case. 
I don't know, maybe I should be using "values" everywhere instead of 
"instances".

I agree there's some lack of consistency here and potential room for 
improvement, but I'm not sure exactly how improvement looks like.

>> + */
>> +#define RTE_LCORE_VAR_ALLOC_SIZE_ALIGN(handle, size, align)	\
>> +	handle = rte_lcore_var_alloc(size, align)
>> +
>> +/**
>> + * Allocate space for an lcore variable, and initialize its handle,
>> + * with values aligned for any type of object.
>> + *
>> + * The values of the lcore variable are initialized to zero.
>> + */
>> +#define RTE_LCORE_VAR_ALLOC_SIZE(handle, size)	\
>> +	RTE_LCORE_VAR_ALLOC_SIZE_ALIGN(handle, size, 0)
>> +
>> +/**
>> + * Allocate space for an lcore variable of the size and alignment
>> requirements
>> + * suggested by the handler pointer type, and initialize its handle.
>> + *
>> + * The values of the lcore variable are initialized to zero.
>> + */
>> +#define RTE_LCORE_VAR_ALLOC(handle)					\
>> +	RTE_LCORE_VAR_ALLOC_SIZE_ALIGN(handle, sizeof(*(handle)),	\
>> +				       alignof(typeof(*(handle))))
>> +
>> +/**
>> + * Allocate an explicitly-sized, explicitly-aligned lcore variable by
>> + * means of a @ref RTE_INIT constructor.
>> + *
>> + * The values of the lcore variable are initialized to zero.
>> + */
>> +#define RTE_LCORE_VAR_INIT_SIZE_ALIGN(name, size, align)		\
>> +	RTE_INIT(rte_lcore_var_init_ ## name)				\
>> +	{								\
>> +		RTE_LCORE_VAR_ALLOC_SIZE_ALIGN(name, size, align);	\
>> +	}
>> +
>> +/**
>> + * Allocate an explicitly-sized lcore variable by means of a @ref
>> + * RTE_INIT constructor.
>> + *
>> + * The values of the lcore variable are initialized to zero.
>> + */
>> +#define RTE_LCORE_VAR_INIT_SIZE(name, size)		\
>> +	RTE_LCORE_VAR_INIT_SIZE_ALIGN(name, size, 0)
>> +
>> +/**
>> + * Allocate an lcore variable by means of a @ref RTE_INIT constructor.
>> + *
>> + * The values of the lcore variable are initialized to zero.
>> + */
>> +#define RTE_LCORE_VAR_INIT(name)					\
>> +	RTE_INIT(rte_lcore_var_init_ ## name)				\
>> +	{								\
>> +		RTE_LCORE_VAR_ALLOC(name);				\
>> +	}
>> +
>> +/**
>> + * Get void pointer to lcore variable instance with the specified
>> + * lcore id.
>> + *
>> + * @param lcore_id
>> + *   The lcore id specifying which of the @c RTE_MAX_LCORE value
>> + *   instances should be accessed. The lcore id need not be valid
>> + *   (e.g., may be @ref LCORE_ID_ANY), but in such a case, the pointer
>> + *   is also not valid (and thus should not be dereferenced).
>> + * @param handle
>> + *   The lcore variable handle.
>> + */
>> +static inline void *
>> +rte_lcore_var_lcore_ptr(unsigned int lcore_id, void *handle)
>> +{
>> +	return RTE_PTR_ADD(handle, lcore_id * RTE_MAX_LCORE_VAR);
>> +}
>> +
>> +/**
>> + * Get pointer to lcore variable instance with the specified lcore id.
>> + *
>> + * @param lcore_id
>> + *   The lcore id specifying which of the @c RTE_MAX_LCORE value
>> + *   instances should be accessed. The lcore id need not be valid
>> + *   (e.g., may be @ref LCORE_ID_ANY), but in such a case, the pointer
>> + *   is also not valid (and thus should not be dereferenced).
>> + * @param handle
>> + *   The lcore variable handle.
>> + */
>> +#define RTE_LCORE_VAR_LCORE_VALUE(lcore_id, handle)			\
>> +	((typeof(handle))rte_lcore_var_lcore_ptr(lcore_id, handle))
>> +
>> +/**
>> + * Get pointer to lcore variable instance of the current thread.
>> + *
>> + * May only be used by EAL threads and registered non-EAL threads.
>> + */
>> +#define RTE_LCORE_VAR_VALUE(handle) \
>> +	RTE_LCORE_VAR_LCORE_VALUE(rte_lcore_id(), handle)
>> +
>> +/**
>> + * Iterate over each lcore id's value for a lcore variable.
>> + *
>> + * @param value
>> + *   A pointer set successivly set to point to lcore variable value
> 
> "set successivly set" -> "successivly set"
> 
> Thinking out loud, ignore at your preference:
> During the RFC discussions, the term used for referring to an lcore variable was discussed;
> we considered "pointer", but settled for "value".
> Perhaps "instance" would be usable in comments like like the one describing this function...
> "A pointer set successivly set to point to lcore variable value" ->
> "A pointer set successivly set to point to lcore variable instance".
> I don't know.
> 

I also don't know.

> 
>> + *   corresponding to every lcore id (up to @c RTE_MAX_LCORE).
>> + * @param handle
>> + *   The lcore variable handle.
>> + */
>> +#define RTE_LCORE_VAR_FOREACH_VALUE(value, handle)			\
>> +	for (unsigned int lcore_id =					\
>> +		     (((value) = RTE_LCORE_VAR_LCORE_VALUE(0, handle)), 0); \
>> +	     lcore_id < RTE_MAX_LCORE;					\
>> +	     lcore_id++, (value) = RTE_LCORE_VAR_LCORE_VALUE(lcore_id, handle))
>> +
>> +/**
>> + * Allocate space in the per-lcore id buffers for a lcore variable.
>> + *
>> + * The pointer returned is only an opaque identifer of the variable. To
>> + * get an actual pointer to a particular instance of the variable use
>> + * @ref RTE_LCORE_VAR_VALUE or @ref RTE_LCORE_VAR_LCORE_VALUE.
>> + *
>> + * The lcore variable values' memory is set to zero.
>> + *
>> + * The allocation is always successful, barring a fatal exhaustion of
>> + * the per-lcore id buffer space.
>> + *
>> + * rte_lcore_var_alloc() is not multi-thread safe.
>> + *
>> + * @param size
>> + *   The size (in bytes) of the variable's per-lcore id value. Must be > 0.
>> + * @param align
>> + *   If 0, the values will be suitably aligned for any kind of type
>> + *   (i.e., alignof(max_align_t)). Otherwise, the values will be aligned
>> + *   on a multiple of *align*, which must be a power of 2 and equal or
>> + *   less than @c RTE_CACHE_LINE_SIZE.
>> + * @return
>> + *   The id of the variable, stored in a void pointer value. The value
> 
> "id" -> "handle"
> 

Fixed.

>> + *   is always non-NULL.
>> + */
>> +__rte_experimental
>> +void *
>> +rte_lcore_var_alloc(size_t size, size_t align);
>> +
>> +#ifdef __cplusplus
>> +}
>> +#endif
>> +
>> +#endif /* _RTE_LCORE_VAR_H_ */
>> diff --git a/lib/eal/version.map b/lib/eal/version.map
>> index e3ff412683..5f5a3522c0 100644
>> --- a/lib/eal/version.map
>> +++ b/lib/eal/version.map
>> @@ -396,6 +396,9 @@ EXPERIMENTAL {
>>
>>   	# added in 24.03
>>   	rte_vfio_get_device_info; # WINDOWS_NO_EXPORT
>> +
>> +	rte_lcore_var_alloc;
>> +	rte_lcore_var;
> 
> No such function: rte_lcore_var

Indeed. That variable is gone. Fixed.

Thanks a lot of your review Morten.

> 
>>   };
>>
>>   INTERNAL {
>> --
>> 2.34.1
>
  
Morten Brørup Sept. 10, 2024, 1:07 p.m. UTC | #3
> From: Mattias Rönnblom [mailto:hofors@lysator.liu.se]
> Sent: Tuesday, 10 September 2024 12.45
> 
> On 2024-09-10 11:32, Morten Brørup wrote:
> >> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
> >> Sent: Tuesday, 10 September 2024 09.04
> >>
> >> Introduce DPDK per-lcore id variables, or lcore variables for short.
> >
> > Throughout the descriptions and comments,
> > please replace "lcore id" with "lcore" (e.g. "per-lcore variables"),
> > when referring to the lcore, and not the index of the lcore.
> > (Your intention might be to highlight that it only covers threads with
> an lcore id,
> > but if that wasn't the case, you would refer to them as "threads" not
> "lcores".)
> > Except, of course, when referring to an actual lcore id, e.g. lcore_id
> function parameters.
> 
> "lcore" is just another word for "EAL thread." The lcore variables exist
> in one instance for every thread with an lcore id, thus also for
> registered non-EAL threads (i.e., threads which are not lcores).
> 
> I've tried to summarize the (very confusing) terminology of DPDK's
> threading model here:
> https://ericsson.github.io/dataplanebook/threading/threading.html#eal-
> threads
> 
> So, in my world, "per-lcore id variables" is pretty accurate. You could
> say "variables with per-lcore id values" if you want to make it even
> more clear, what's going on.

With your reference terminology in mind, "per-lcore id variables" is OK with me.

<rant>
DPDK's lcore terminology has drifted quite far away from its original 1:1 meaning, but I'm not going to try to clean it up.
It also seems the meaning of "socket" is drifting.

And the DPDK's project's API/API compatibility ambitions seem to favor bolting on new features to the pile, rather than replacing APIs that have grown misleading with new APIs serving new requirements.
</rant>

> 
> >
> > Paraphrasing:
> > Consider the type of what you are referring to;
> > use "lcore" if its type is "thread", and
> > use "lcore id" if its type is "int".
> >
> > I might be wrong here, but please think hard about it.
> >
> >>
> >> An lcore variable has one value for every current and future lcore
> >> id-equipped thread.
> >>
> >> The primary <rte_lcore_var.h> use case is for statically allocating
> >> small, frequently-accessed data structures, for which one instance
> >> should exist for each lcore.
> >>
> >> Lcore variables are similar to thread-local storage (TLS, e.g., C11
> >> _Thread_local), but decoupling the values' life time with that of the
> >> threads.
> >>
> >> Lcore variables are also similar in terms of functionality provided
> by
> >> FreeBSD kernel's DPCPU_*() family of macros and the associated
> >> build-time machinery. DPCPU uses linker scripts, which effectively
> >> prevents the reuse of its, otherwise seemingly viable, approach.
> >>
> >> The currently-prevailing way to solve the same problem as lcore
> >> variables is to keep a module's per-lcore data as RTE_MAX_LCORE-sized
> >> array of cache-aligned, RTE_CACHE_GUARDed structs. The benefit of
> >> lcore variables over this approach is that data related to the same
> >> lcore now is close (spatially, in memory), rather than data used by
> >> the same module, which in turn avoid excessive use of padding,
> >> polluting caches with unused data.
> >>
> >> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> >> Acked-by: Morten Brørup <mb@smartsharesystems.com>
> >>
> >> --
> >
> >> +++ b/doc/api/doxy-api-index.md
> >> @@ -99,6 +99,7 @@ The public API headers are grouped by topics:
> >>     [interrupts](@ref rte_interrupts.h),
> >>     [launch](@ref rte_launch.h),
> >>     [lcore](@ref rte_lcore.h),
> >> +  [lcore-varible](@ref rte_lcore_var.h),
> >
> > Typo: varible -> variable
> >
> >
> 
> I'll change it to "lcore variables" (no dash, plural).

+1

> 
> >> +++ b/doc/guides/rel_notes/release_24_11.rst
> >> @@ -55,6 +55,20 @@ New Features
> >>        Also, make sure to start the actual text at the margin.
> >>        =======================================================
> >>
> >> +* **Added EAL per-lcore static memory allocation facility.**
> >> +
> >> +    Added EAL API <rte_lcore_var.h> for statically allocating small,
> >> +    frequently-accessed data structures, for which one instance
> should
> >> +    exist for each lcore.
> >> +
> >> +    With lcore variables, data is organized spatially on a per-lcore
> >> +    basis, rather than per library or PMD, avoiding the need for
> cache
> >> +    aligning (or RTE_CACHE_GUARDing) data structures, which in turn
> >> +    reduces CPU cache internal fragmentation, improving performance.
> >> +
> >> +    Lcore variables are similar to thread-local storage (TLS, e.g.,
> >> +    C11 _Thread_local), but decoupling the values' life time from
> that
> >> +    of the threads.
> >
> > When referring to TLS, you might want to clarify that lcore variables
> are not instantiated for unregistered threads.
> >
> 
> Isn't that clear from the first paragraph? Although it should say "per
> lcore id", rather than "per lcore."

Yes, almost.
But in this paragraph, when you mention that they are similar to TLS, someone might not catch that it still applies (that they only are instantiated for lcores and not all threads). So clarify one extra time, just to ensure that everyone gets it.

> 
> >
> >> +static void *lcore_buffer;
> >> +static size_t offset = RTE_MAX_LCORE_VAR;
> >> +
> >> +static void *
> >> +lcore_var_alloc(size_t size, size_t align)
> >> +{
> >> +	void *handle;
> >> +	void *value;
> >> +
> >> +	offset = RTE_ALIGN_CEIL(offset, align);
> >> +
> >> +	if (offset + size > RTE_MAX_LCORE_VAR) {
> >> +		lcore_buffer = aligned_alloc(RTE_CACHE_LINE_SIZE,
> >> +					     LCORE_BUFFER_SIZE);
> >> +		RTE_VERIFY(lcore_buffer != NULL);
> >> +
> >> +		offset = 0;
> >> +	}
> >
> > To determine if the lcore_buffer memory should be allocated, why not
> just check if lcore_buffer == NULL?
> 
> Because it may be the case, lcore_buffer is non-NULL but the remaining
> space is too small to service the allocation.

There's no error handling of that case. You simply forget about the allocated memory, and behave like initial allocation/initialization.

> 
> > Then offset wouldn't need an initial value of RTE_MAX_LCORE_VAR.
> >
> >> +
> >> +	handle = RTE_PTR_ADD(lcore_buffer, offset);
> >> +
> >> +	offset += size;
> >> +
> >> +	RTE_LCORE_VAR_FOREACH_VALUE(value, handle)
> >> +		memset(value, 0, size);
> >> +
> >> +	EAL_LOG(DEBUG, "Allocated %"PRIuPTR" bytes of per-lcore data with
> a "
> >> +		"%"PRIuPTR"-byte alignment", size, align);
> >> +
> >> +	return handle;
> >> +}
> >
> >
> >> +/**
> >> + * @file
> >> + *
> >> + * RTE Per-lcore id variables
> >
> > Suggest mentioning the short form too, e.g.:
> > "RTE Per-lcore id variables (RTE Lcore variables)"
> 
> What about just "RTE Lcore variables"?

+1

> 
> Exactly what they are is thoroughly described in the text that follows.
> 
> >
> >> + *
> >> + * This API provides a mechanism to create and access per-lcore id
> >> + * variables in a space- and cycle-efficient manner.
> >> + *
> >> + * A per-lcore id variable (or lcore variable for short) has one
> value
> >> + * for each EAL thread and registered non-EAL thread.
> >
> > And service thread.
> 
> Service threads are EAL threads, or, at a bare minimum, must have a
> lcore id, and thus must be registered.

Service threads have an lcore id, yes, but they have rte_lcore_role_t enum value ROLE_SERVICE, which differs from that of EAL threads (ROLE_EAL). Registered non-EAL threads have yet another role, ROLE_NON_EAL.

> 
> >
> >> + * There is one
> >> + * copy for each current and future lcore id-equipped thread, with
> the
> >
> > "one copy" -> "one instance"
> >
> 
> Fixed.
> 
> >> + * total number of copies amounting to @c RTE_MAX_LCORE. The value
> of
> >
> > "copies" -> "instances"
> >
> 
> OK, I'll rephrase that sentence.
> 
> >> + * an lcore variable for a particular lcore id is independent from
> >> + * other values (for other lcore ids) within the same lcore
> variable.
> >> + *
> >> + * In order to access the values of an lcore variable, a handle is
> >> + * used. The type of the handle is a pointer to the value's type
> >> + * (e.g., for @c uint32_t lcore variable, the handle is a
> >> + * <code>uint32_t *</code>. The handler type is used to inform the
> >
> > Typo: "handler" -> "handle", I think :-/
> > Found this typo multiple times; search-replace.
> 
> Fixed.
> 
> >
> >> + * access macros the type of the values. A handle may be passed
> >> + * between modules and threads just like any pointer, but its value
> >> + * must be treated as a an opaque identifier. An allocated handle
> >> + * never has the value NULL.
> >> + *
> >> + * @b Creation
> >> + *
> >> + * An lcore variable is created in two steps:
> >> + *  1. Define a lcore variable handle by using @ref
> RTE_LCORE_VAR_HANDLE.
> >> + *  2. Allocate lcore variable storage and initialize the handle
> with
> >> + *     a unique identifier by @ref RTE_LCORE_VAR_ALLOC or
> >> + *     @ref RTE_LCORE_VAR_INIT. Allocation generally occurs the time
> of
> >> + *     module initialization, but may be done at any time.
> >> + *
> >> + * An lcore variable is not tied to the owning thread's lifetime.
> It's
> >> + * available for use by any thread immediately after having been
> >> + * allocated, and continues to be available throughout the lifetime
> of
> >> + * the EAL.
> >> + *
> >> + * Lcore variables cannot and need not be freed.
> >> + *
> >> + * @b Access
> >> + *
> >> + * The value of any lcore variable for any lcore id may be accessed
> >> + * from any thread (including unregistered threads), but it should
> >> + * only be *frequently* read from or written to by the owner.
> >> + *
> >> + * Values of the same lcore variable but owned by to different lcore
> >
> > Typo: to -> two
> >
> 
> Fixed.
> 
> >> + * ids may be frequently read or written by the owners without
> risking
> >> + * false sharing.
> >> + *
> >> + * An appropriate synchronization mechanism (e.g., atomic loads and
> >> + * stores) should employed to assure there are no data races between
> >> + * the owning thread and any non-owner threads accessing the same
> >> + * lcore variable instance.
> >> + *
> >> + * The value of the lcore variable for a particular lcore id is
> >> + * accessed using @ref RTE_LCORE_VAR_LCORE_VALUE.
> >> + *
> >> + * A common pattern is for an EAL thread or a registered non-EAL
> >> + * thread to access its own lcore variable value. For this purpose,
> a
> >> + * short-hand exists in the form of @ref RTE_LCORE_VAR_VALUE.
> >> + *
> >> + * Although the handle (as defined by @ref RTE_LCORE_VAR_HANDLE) is
> a
> >> + * pointer with the same type as the value, it may not be directly
> >> + * dereferenced and must be treated as an opaque identifier.
> >> + *
> >> + * Lcore variable handles and value pointers may be freely passed
> >> + * between different threads.
> >> + *
> >> + * @b Storage
> >> + *
> >> + * An lcore variable's values may by of a primitive type like @c
> int,
> >
> > Two typos: "values may by" -> "value may be"
> >
> 
> That's not a typo. An lcore variable take on multiple values, one for
> each lcore id. That said, I guess you could refer to the whole thing
> (the set of values) as the "value" as well.

OK. Reading it the way you explain, I get it. No typo.

> 
> >> + * but would more typically be a @c struct.
> >> + *
> >> + * The lcore variable handle introduces a per-variable (not
> >> + * per-value/per-lcore id) overhead of @c sizeof(void *) bytes, so
> >> + * there are some memory footprint gains to be made by organizing
> all
> >> + * per-lcore id data for a particular module as one lcore variable
> >> + * (e.g., as a struct).
> >> + *
> >> + * An application may choose to define an lcore variable handle,
> which
> >> + * it then it goes on to never allocate.
> >> + *
> >> + * The size of a lcore variable's value must be less than the DPDK
> >> + * build-time constant @c RTE_MAX_LCORE_VAR.
> >> + *
> >> + * The lcore variable are stored in a series of lcore buffers, which
> >> + * are allocated from the libc heap. Heap allocation failures are
> >> + * treated as fatal.
> >> + *
> >> + * Lcore variables should generally *not* be @ref
> __rte_cache_aligned
> >> + * and need *not* include a @ref RTE_CACHE_GUARD field, since the
> use
> >> + * of these constructs are designed to avoid false sharing. In the
> >> + * case of an lcore variable instance, the thread most recently
> >> + * accessing nearby data structures should almost-always the lcore
> >
> > Missing word: should almost-always *be* the lcore variables' owner.
> >
> 
> Fixed.
> 
> >
> >> + * variables' owner. Adding padding will increase the effective
> memory
> >> + * working set size, potentially reducing performance.
> >> + *
> >> + * Lcore variable values take on an initial value of zero.
> >> + *
> >> + * @b Example
> >> + *
> >> + * Below is an example of the use of an lcore variable:
> >> + *
> >> + * @code{.c}
> >> + * struct foo_lcore_state {
> >> + *         int a;
> >> + *         long b;
> >> + * };
> >> + *
> >> + * static RTE_LCORE_VAR_HANDLE(struct foo_lcore_state,
> lcore_states);
> >> + *
> >> + * long foo_get_a_plus_b(void)
> >> + * {
> >> + *         struct foo_lcore_state *state =
> RTE_LCORE_VAR_VALUE(lcore_states);
> >> + *
> >> + *         return state->a + state->b;
> >> + * }
> >> + *
> >> + * RTE_INIT(rte_foo_init)
> >> + * {
> >> + *         RTE_LCORE_VAR_ALLOC(lcore_states);
> >> + *
> >> + *         struct foo_lcore_state *state;
> >> + *         RTE_LCORE_VAR_FOREACH_VALUE(state, lcore_states) {
> >> + *                 (initialize 'state')
> >
> > Consider: (initialize 'state') -> /* initialize 'state' */
> >
> 
> I think I tried that, and it failed because the compiler didn't like
> nested comments.

OK, no objections. Just leave it as is.

> 
> >> + *         }
> >> + *
> >> + *         (other initialization)
> >
> > Consider: (other initialization) -> /* other initialization */
> >
> >> + * }
> >> + * @endcode
> >> + *
> >> + *
> >> + * @b Alternatives
> >> + *
> >> + * Lcore variables are designed to replace a pattern exemplified
> below:
> >> + * @code{.c}
> >> + * struct __rte_cache_aligned foo_lcore_state {
> >> + *         int a;
> >> + *         long b;
> >> + *         RTE_CACHE_GUARD;
> >> + * };
> >> + *
> >> + * static struct foo_lcore_state lcore_states[RTE_MAX_LCORE];
> >> + * @endcode
> >> + *
> >> + * This scheme is simple and effective, but has one drawback: the
> data
> >> + * is organized so that objects related to all lcores for a
> particular
> >> + * module is kept close in memory. At a bare minimum, this forces
> the
> >> + * use of cache-line alignment to avoid false sharing. With CPU
> >
> > Consider adding: use of *padding to* cache-line alignment
> > My point here is:
> > This sentence should somehow include the word "padding".
> 
> I'm not sure everyone thinks about __rte_cache_aligned or cache-aligned
> heap allocations as "padded."
> 
> > This paragraph is not only aboud cache line alignment, it is primarily
> about padding.
> >
> 
> "At a bare minimum, this requires sizing data structures (e.g., using
> `__rte_cache_aligned`) to an even number of cache lines to avoid false
> sharing."
> 
> How about this?

OK. Sizing might imply padding, so it serves the point I was targeting.
But "even number" -> "whole number". The number might be odd. :-)

> 
> >> + * hardware prefetching and memory loads resulting from speculative
> >> + * execution (functions which seemingly are getting more eager
> faster
> >> + * than they are getting more intelligent), one or more "guard"
> cache
> >> + * lines may be required to separate one lcore's data from
> another's.
> >> + *
> >> + * Lcore variables has the upside of working with, not against, the
> >
> > Typo: has -> have
> >
> 
> Fixed.
> 
> >> + * CPU's assumptions and for example next-line prefetchers may well
> >> + * work the way its designers intended (i.e., to the benefit, not
> >> + * detriment, of system performance).
> >> + *
> >> + * Another alternative to @ref rte_lcore_var.h is the @ref
> >> + * rte_per_lcore.h API, which make use of thread-local storage (TLS,
> >
> > Typo: make -> makes >
> 
> Fixed.
> 
> >> + * e.g., GCC __thread or C11 _Thread_local). The main differences
> >> + * between by using the various forms of TLS (e.g., @ref
> >> + * RTE_DEFINE_PER_LCORE or _Thread_local) and the use of lcore
> >> + * variables are:
> >> + *
> >> + *   * The existence and non-existence of a thread-local variable
> >> + *     instance follow that of particular thread's. The data cannot
> be
> >
> > Typo: "thread's" -> "threads", I think. :-/
> >
> 
> It's not a typo.

OK.

> 
> >> + *     accessed before the thread has been created, nor after it has
> >> + *     exited. As a result, thread-local variables must initialized
> in
> >
> > Missing word: must *be* initialized
> >
> 
> Fixed.
> 
> >> + *     a "lazy" manner (e.g., at the point of thread creation).
> Lcore
> >> + *     variables may be accessed immediately after having been
> >> + *     allocated (which may be prior any thread beyond the main
> >> + *     thread is running).
> >> + *   * A thread-local variable is duplicated across all threads in
> the
> >> + *     process, including unregistered non-EAL threads (i.e.,
> >> + *     "regular" threads). For DPDK applications heavily relying on
> >> + *     multi-threading (in conjunction to DPDK's "one thread per
> core"
> >> + *     pattern), either by having many concurrent threads or
> >> + *     creating/destroying threads at a high rate, an excessive use
> of
> >> + *     thread-local variables may cause inefficiencies (e.g.,
> >> + *     increased thread creation overhead due to thread-local
> storage
> >> + *     initialization or increased total RAM footprint usage). Lcore
> >> + *     variables *only* exist for threads with an lcore id.
> >> + *   * If data in thread-local storage may be shared between threads
> >> + *     (i.e., can a pointer to a thread-local variable be passed to
> >> + *     and successfully dereferenced by non-owning thread) depends
> on
> >> + *     the details of the TLS implementation. With GCC __thread and
> >> + *     GCC _Thread_local, such data sharing is supported. In the C11
> >> + *     standard, the result of accessing another thread's
> >> + *     _Thread_local object is implementation-defined. Lcore
> variable
> >> + *     instances may be accessed reliably by any thread.
> >> + */
> >> +
> >> +#include <stddef.h>
> >> +#include <stdalign.h>
> >> +
> >> +#include <rte_common.h>
> >> +#include <rte_config.h>
> >> +#include <rte_lcore.h>
> >> +
> >> +#ifdef __cplusplus
> >> +extern "C" {
> >> +#endif
> >> +
> >> +/**
> >> + * Given the lcore variable type, produces the type of the lcore
> >> + * variable handle.
> >> + */
> >> +#define RTE_LCORE_VAR_HANDLE_TYPE(type)		\
> >> +	type *
> >> +
> >> +/**
> >> + * Define a lcore variable handle.
> >
> > Typo: "a lcore" -> "an lcore"
> > Found this typo multiple times; search-replace "a lcore".
> >
> 
> Yes, fixed.
> 
> >> + *
> >> + * This macro defines a variable which is used as a handle to access
> >> + * the various per-lcore id instances of a per-lcore id variable.
> >
> > Suggest:
> > "the various per-lcore id instances of a per-lcore id variable" ->
> > "the various instances of a per-lcore id variable" >
> 
> Sounds good.
> 
> >> + *
> >> + * The aim with this macro is to make clear at the point of
> >> + * declaration that this is an lcore handler, rather than a regular
> >> + * pointer.
> >> + *
> >> + * Add @b static as a prefix in case the lcore variable are only to
> be
> >
> > Typo: are -> is
> >
> 
> Fixed.
> 
> >> + * accessed from a particular translation unit.
> >> + */
> >> +#define RTE_LCORE_VAR_HANDLE(type, name)	\
> >> +	RTE_LCORE_VAR_HANDLE_TYPE(type) name
> >> +
> >> +/**
> >> + * Allocate space for an lcore variable, and initialize its handle.
> >> + *
> >> + * The values of the lcore variable are initialized to zero.
> >
> > Consider adding: "the lcore variable *instances* are initialized"
> > Found this typo multiple times; search-replace.
> >
> 
> It's not a typo. "Values" is just short for "instances of the value",
> just like "instances" is. Using instances everywhere may confuse the
> reader that an instance both a name and a value, which is not the case.
> I don't know, maybe I should be using "values" everywhere instead of
> "instances".
> 
> I agree there's some lack of consistency here and potential room for
> improvement, but I'm not sure exactly how improvement looks like.

Yes, perhaps using "values" (instead of "instances of the value") everywhere,
and avoiding "instances", might be better.

If you repeat/paraphrase your above explanation in the documentation and/or source code, it should cover it.

> 
> >> + */
> >> +#define RTE_LCORE_VAR_ALLOC_SIZE_ALIGN(handle, size, align)	\
> >> +	handle = rte_lcore_var_alloc(size, align)
> >> +
> >> +/**
> >> + * Allocate space for an lcore variable, and initialize its handle,
> >> + * with values aligned for any type of object.
> >> + *
> >> + * The values of the lcore variable are initialized to zero.
> >> + */
> >> +#define RTE_LCORE_VAR_ALLOC_SIZE(handle, size)	\
> >> +	RTE_LCORE_VAR_ALLOC_SIZE_ALIGN(handle, size, 0)
> >> +
> >> +/**
> >> + * Allocate space for an lcore variable of the size and alignment
> >> requirements
> >> + * suggested by the handler pointer type, and initialize its handle.
> >> + *
> >> + * The values of the lcore variable are initialized to zero.
> >> + */
> >> +#define RTE_LCORE_VAR_ALLOC(handle)					\
> >> +	RTE_LCORE_VAR_ALLOC_SIZE_ALIGN(handle, sizeof(*(handle)),	\
> >> +				       alignof(typeof(*(handle))))
> >> +
> >> +/**
> >> + * Allocate an explicitly-sized, explicitly-aligned lcore variable
> by
> >> + * means of a @ref RTE_INIT constructor.
> >> + *
> >> + * The values of the lcore variable are initialized to zero.
> >> + */
> >> +#define RTE_LCORE_VAR_INIT_SIZE_ALIGN(name, size, align)		\
> >> +	RTE_INIT(rte_lcore_var_init_ ## name)				\
> >> +	{								\
> >> +		RTE_LCORE_VAR_ALLOC_SIZE_ALIGN(name, size, align);	\
> >> +	}
> >> +
> >> +/**
> >> + * Allocate an explicitly-sized lcore variable by means of a @ref
> >> + * RTE_INIT constructor.
> >> + *
> >> + * The values of the lcore variable are initialized to zero.
> >> + */
> >> +#define RTE_LCORE_VAR_INIT_SIZE(name, size)		\
> >> +	RTE_LCORE_VAR_INIT_SIZE_ALIGN(name, size, 0)
> >> +
> >> +/**
> >> + * Allocate an lcore variable by means of a @ref RTE_INIT
> constructor.
> >> + *
> >> + * The values of the lcore variable are initialized to zero.
> >> + */
> >> +#define RTE_LCORE_VAR_INIT(name)					\
> >> +	RTE_INIT(rte_lcore_var_init_ ## name)				\
> >> +	{								\
> >> +		RTE_LCORE_VAR_ALLOC(name);				\
> >> +	}
> >> +
> >> +/**
> >> + * Get void pointer to lcore variable instance with the specified
> >> + * lcore id.
> >> + *
> >> + * @param lcore_id
> >> + *   The lcore id specifying which of the @c RTE_MAX_LCORE value
> >> + *   instances should be accessed. The lcore id need not be valid
> >> + *   (e.g., may be @ref LCORE_ID_ANY), but in such a case, the
> pointer
> >> + *   is also not valid (and thus should not be dereferenced).
> >> + * @param handle
> >> + *   The lcore variable handle.
> >> + */
> >> +static inline void *
> >> +rte_lcore_var_lcore_ptr(unsigned int lcore_id, void *handle)
> >> +{
> >> +	return RTE_PTR_ADD(handle, lcore_id * RTE_MAX_LCORE_VAR);
> >> +}
> >> +
> >> +/**
> >> + * Get pointer to lcore variable instance with the specified lcore
> id.
> >> + *
> >> + * @param lcore_id
> >> + *   The lcore id specifying which of the @c RTE_MAX_LCORE value
> >> + *   instances should be accessed. The lcore id need not be valid
> >> + *   (e.g., may be @ref LCORE_ID_ANY), but in such a case, the
> pointer
> >> + *   is also not valid (and thus should not be dereferenced).
> >> + * @param handle
> >> + *   The lcore variable handle.
> >> + */
> >> +#define RTE_LCORE_VAR_LCORE_VALUE(lcore_id, handle)
> 	\
> >> +	((typeof(handle))rte_lcore_var_lcore_ptr(lcore_id, handle))
> >> +
> >> +/**
> >> + * Get pointer to lcore variable instance of the current thread.
> >> + *
> >> + * May only be used by EAL threads and registered non-EAL threads.
> >> + */
> >> +#define RTE_LCORE_VAR_VALUE(handle) \
> >> +	RTE_LCORE_VAR_LCORE_VALUE(rte_lcore_id(), handle)
> >> +
> >> +/**
> >> + * Iterate over each lcore id's value for a lcore variable.
> >> + *
> >> + * @param value
> >> + *   A pointer set successivly set to point to lcore variable value
> >
> > "set successivly set" -> "successivly set"

Don't forget.

> >
> > Thinking out loud, ignore at your preference:
> > During the RFC discussions, the term used for referring to an lcore
> variable was discussed;
> > we considered "pointer", but settled for "value".
> > Perhaps "instance" would be usable in comments like like the one
> describing this function...
> > "A pointer set successivly set to point to lcore variable value" ->
> > "A pointer set successivly set to point to lcore variable instance".
> > I don't know.
> >
> 
> I also don't know.

Referring to the terminology above, if you go for "value" rather than "instance" (or "instance of the value"), stick with "value" here too.

> 
> >
> >> + *   corresponding to every lcore id (up to @c RTE_MAX_LCORE).
> >> + * @param handle
> >> + *   The lcore variable handle.
> >> + */
> >> +#define RTE_LCORE_VAR_FOREACH_VALUE(value, handle)			\
> >> +	for (unsigned int lcore_id =					\
> >> +		     (((value) = RTE_LCORE_VAR_LCORE_VALUE(0, handle)), 0);
> \
> >> +	     lcore_id < RTE_MAX_LCORE;					\
> >> +	     lcore_id++, (value) = RTE_LCORE_VAR_LCORE_VALUE(lcore_id,
> handle))
> >> +
> >> +/**
> >> + * Allocate space in the per-lcore id buffers for a lcore variable.
> >> + *
> >> + * The pointer returned is only an opaque identifer of the variable.
> To
> >> + * get an actual pointer to a particular instance of the variable
> use
> >> + * @ref RTE_LCORE_VAR_VALUE or @ref RTE_LCORE_VAR_LCORE_VALUE.
> >> + *
> >> + * The lcore variable values' memory is set to zero.
> >> + *
> >> + * The allocation is always successful, barring a fatal exhaustion
> of
> >> + * the per-lcore id buffer space.
> >> + *
> >> + * rte_lcore_var_alloc() is not multi-thread safe.
> >> + *
> >> + * @param size
> >> + *   The size (in bytes) of the variable's per-lcore id value. Must
> be > 0.
> >> + * @param align
> >> + *   If 0, the values will be suitably aligned for any kind of type
> >> + *   (i.e., alignof(max_align_t)). Otherwise, the values will be
> aligned
> >> + *   on a multiple of *align*, which must be a power of 2 and equal
> or
> >> + *   less than @c RTE_CACHE_LINE_SIZE.
> >> + * @return
> >> + *   The id of the variable, stored in a void pointer value. The
> value
> >
> > "id" -> "handle"
> >
> 
> Fixed.
> 
> >> + *   is always non-NULL.
> >> + */
> >> +__rte_experimental
> >> +void *
> >> +rte_lcore_var_alloc(size_t size, size_t align);
> >> +
> >> +#ifdef __cplusplus
> >> +}
> >> +#endif
> >> +
> >> +#endif /* _RTE_LCORE_VAR_H_ */
> >> diff --git a/lib/eal/version.map b/lib/eal/version.map
> >> index e3ff412683..5f5a3522c0 100644
> >> --- a/lib/eal/version.map
> >> +++ b/lib/eal/version.map
> >> @@ -396,6 +396,9 @@ EXPERIMENTAL {
> >>
> >>   	# added in 24.03
> >>   	rte_vfio_get_device_info; # WINDOWS_NO_EXPORT
> >> +
> >> +	rte_lcore_var_alloc;
> >> +	rte_lcore_var;
> >
> > No such function: rte_lcore_var
> 
> Indeed. That variable is gone. Fixed.
> 
> Thanks a lot of your review Morten.

Thanks a lot for your contribution, Mattias. :-)

> 
> >
> >>   };
> >>
> >>   INTERNAL {
> >> --
> >> 2.34.1
> >
  
Stephen Hemminger Sept. 10, 2024, 3:55 p.m. UTC | #4
On Tue, 10 Sep 2024 12:44:49 +0200
Mattias Rönnblom <hofors@lysator.liu.se> wrote:

> "lcore" is just another word for "EAL thread." The lcore variables exist 
> in one instance for every thread with an lcore id, thus also for 
> registered non-EAL threads (i.e., threads which are not lcores).
> 
> I've tried to summarize the (very confusing) terminology of DPDK's 
> threading model here:
> https://ericsson.github.io/dataplanebook/threading/threading.html#eal-threads
> 
> So, in my world, "per-lcore id variables" is pretty accurate. You could 
> say "variables with per-lcore id values" if you want to make it even 
> more clear, what's going on.

This is good and should be in DPDK documentation along with references
to other Intel/Arm documentation.

I don't see a glossary section in current documentation.
The issue goes deeper there is no clear introduction in the current DPDK documentation.

My suggestion would be something similar to Fd.io VPP and other projects

	About DPDK
	- Introduction
	- Glossary
	- Supported platforms
	- Release notes
	- FAQ

	Getting stated
	- Getting started on Linux
	...
	- Sample Applications

	Developer documentation
	- Programmer’s Guide
	- HowTo Guides
	- DPDK Tools User Guides
	- Testpmd Application User Guide
	- Drivers
	    - Network Interface
	    - Baseband
		...
  
Morten Brørup Sept. 11, 2024, 10:32 a.m. UTC | #5
> +static void *lcore_buffer;
[...]
> +		lcore_buffer = aligned_alloc(RTE_CACHE_LINE_SIZE,
> +					     LCORE_BUFFER_SIZE);

Since lcore_buffer is never freed again, it is easy to support Windows:

#ifdef RTE_EXEC_ENV_WINDOWS
#include <malloc.h>
#endif

#ifndef RTE_EXEC_ENV_WINDOWS
lcore_buffer = aligned_alloc(RTE_CACHE_LINE_SIZE,
		LCORE_BUFFER_SIZE);
#else
/* Never freed again, so don't worry about _aligned_free(). */
lcore_buffer = _aligned_malloc(LCORE_BUFFER_SIZE,
		RTE_CACHE_LINE_SIZE);
#endif

Ref:
https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/aligned-malloc?view=msvc-170

NB: Note the opposite parameter order.
  
Mattias Rönnblom Sept. 11, 2024, 3:05 p.m. UTC | #6
On 2024-09-11 12:32, Morten Brørup wrote:
>> +static void *lcore_buffer;
> [...]
>> +		lcore_buffer = aligned_alloc(RTE_CACHE_LINE_SIZE,
>> +					     LCORE_BUFFER_SIZE);
> 
> Since lcore_buffer is never freed again, it is easy to support Windows:
> 
> #ifdef RTE_EXEC_ENV_WINDOWS
> #include <malloc.h>
> #endif
> 
> #ifndef RTE_EXEC_ENV_WINDOWS
> lcore_buffer = aligned_alloc(RTE_CACHE_LINE_SIZE,
> 		LCORE_BUFFER_SIZE);
> #else
> /* Never freed again, so don't worry about _aligned_free(). */

What is the reason for this comment? It seems like it addresses the 
Windows code path in particular.

> lcore_buffer = _aligned_malloc(LCORE_BUFFER_SIZE,
> 		RTE_CACHE_LINE_SIZE);
> #endif
> 
> Ref:
> https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/aligned-malloc?view=msvc-170
> 
> NB: Note the opposite parameter order.
> 

Thanks. I will add something like this.
  
Morten Brørup Sept. 11, 2024, 3:07 p.m. UTC | #7
> From: Mattias Rönnblom [mailto:hofors@lysator.liu.se]
> Sent: Wednesday, 11 September 2024 17.05
> 
> On 2024-09-11 12:32, Morten Brørup wrote:
> >> +static void *lcore_buffer;
> > [...]
> >> +		lcore_buffer = aligned_alloc(RTE_CACHE_LINE_SIZE,
> >> +					     LCORE_BUFFER_SIZE);
> >
> > Since lcore_buffer is never freed again, it is easy to support
> Windows:
> >
> > #ifdef RTE_EXEC_ENV_WINDOWS
> > #include <malloc.h>
> > #endif
> >
> > #ifndef RTE_EXEC_ENV_WINDOWS
> > lcore_buffer = aligned_alloc(RTE_CACHE_LINE_SIZE,
> > 		LCORE_BUFFER_SIZE);
> > #else
> > /* Never freed again, so don't worry about _aligned_free(). */
> 
> What is the reason for this comment? It seems like it addresses the
> Windows code path in particular.

It is Windows specific.
Memory allocated with _aligned_malloc() cannot be freed with free(); it needs to be freed with _aligned_free().

> 
> > lcore_buffer = _aligned_malloc(LCORE_BUFFER_SIZE,
> > 		RTE_CACHE_LINE_SIZE);
> > #endif
> >
> > Ref:
> > https://learn.microsoft.com/en-us/cpp/c-runtime-
> library/reference/aligned-malloc?view=msvc-170
> >
> > NB: Note the opposite parameter order.
> >
> 
> Thanks. I will add something like this.
  

Patch

diff --git a/MAINTAINERS b/MAINTAINERS
index c5a703b5c0..362d9a3f28 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -282,6 +282,12 @@  F: lib/eal/include/rte_random.h
 F: lib/eal/common/rte_random.c
 F: app/test/test_rand_perf.c
 
+Lcore Variables
+M: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
+F: lib/eal/include/rte_lcore_var.h
+F: lib/eal/common/eal_common_lcore_var.c
+F: app/test/test_lcore_var.c
+
 ARM v7
 M: Wathsala Vithanage <wathsala.vithanage@arm.com>
 F: config/arm/
diff --git a/config/rte_config.h b/config/rte_config.h
index dd7bb0d35b..311692e498 100644
--- a/config/rte_config.h
+++ b/config/rte_config.h
@@ -41,6 +41,7 @@ 
 /* EAL defines */
 #define RTE_CACHE_GUARD_LINES 1
 #define RTE_MAX_HEAPS 32
+#define RTE_MAX_LCORE_VAR 1048576
 #define RTE_MAX_MEMSEG_LISTS 128
 #define RTE_MAX_MEMSEG_PER_LIST 8192
 #define RTE_MAX_MEM_MB_PER_LIST 32768
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index f9f0300126..07d7cbc66c 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -99,6 +99,7 @@  The public API headers are grouped by topics:
   [interrupts](@ref rte_interrupts.h),
   [launch](@ref rte_launch.h),
   [lcore](@ref rte_lcore.h),
+  [lcore-varible](@ref rte_lcore_var.h),
   [per-lcore](@ref rte_per_lcore.h),
   [service cores](@ref rte_service.h),
   [keepalive](@ref rte_keepalive.h),
diff --git a/doc/guides/rel_notes/release_24_11.rst b/doc/guides/rel_notes/release_24_11.rst
index 0ff70d9057..adb8eb404d 100644
--- a/doc/guides/rel_notes/release_24_11.rst
+++ b/doc/guides/rel_notes/release_24_11.rst
@@ -55,6 +55,20 @@  New Features
      Also, make sure to start the actual text at the margin.
      =======================================================
 
+* **Added EAL per-lcore static memory allocation facility.**
+
+    Added EAL API <rte_lcore_var.h> for statically allocating small,
+    frequently-accessed data structures, for which one instance should
+    exist for each lcore.
+
+    With lcore variables, data is organized spatially on a per-lcore
+    basis, rather than per library or PMD, avoiding the need for cache
+    aligning (or RTE_CACHE_GUARDing) data structures, which in turn
+    reduces CPU cache internal fragmentation, improving performance.
+
+    Lcore variables are similar to thread-local storage (TLS, e.g.,
+    C11 _Thread_local), but decoupling the values' life time from that
+    of the threads.
 
 Removed Items
 -------------
diff --git a/lib/eal/common/eal_common_lcore_var.c b/lib/eal/common/eal_common_lcore_var.c
new file mode 100644
index 0000000000..74ad8272ec
--- /dev/null
+++ b/lib/eal/common/eal_common_lcore_var.c
@@ -0,0 +1,69 @@ 
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 Ericsson AB
+ */
+
+#include <inttypes.h>
+#include <stdlib.h>
+
+#include <rte_common.h>
+#include <rte_debug.h>
+#include <rte_log.h>
+
+#include <rte_lcore_var.h>
+
+#include "eal_private.h"
+
+#define LCORE_BUFFER_SIZE (RTE_MAX_LCORE_VAR * RTE_MAX_LCORE)
+
+static void *lcore_buffer;
+static size_t offset = RTE_MAX_LCORE_VAR;
+
+static void *
+lcore_var_alloc(size_t size, size_t align)
+{
+	void *handle;
+	void *value;
+
+	offset = RTE_ALIGN_CEIL(offset, align);
+
+	if (offset + size > RTE_MAX_LCORE_VAR) {
+		lcore_buffer = aligned_alloc(RTE_CACHE_LINE_SIZE,
+					     LCORE_BUFFER_SIZE);
+		RTE_VERIFY(lcore_buffer != NULL);
+
+		offset = 0;
+	}
+
+	handle = RTE_PTR_ADD(lcore_buffer, offset);
+
+	offset += size;
+
+	RTE_LCORE_VAR_FOREACH_VALUE(value, handle)
+		memset(value, 0, size);
+
+	EAL_LOG(DEBUG, "Allocated %"PRIuPTR" bytes of per-lcore data with a "
+		"%"PRIuPTR"-byte alignment", size, align);
+
+	return handle;
+}
+
+void *
+rte_lcore_var_alloc(size_t size, size_t align)
+{
+	/* Having the per-lcore buffer size aligned on cache lines
+	 * assures as well as having the base pointer aligned on cache
+	 * size assures that aligned offsets also translate to alipgned
+	 * pointers across all values.
+	 */
+	RTE_BUILD_BUG_ON(RTE_MAX_LCORE_VAR % RTE_CACHE_LINE_SIZE != 0);
+	RTE_ASSERT(align <= RTE_CACHE_LINE_SIZE);
+	RTE_ASSERT(size <= RTE_MAX_LCORE_VAR);
+
+	/* '0' means asking for worst-case alignment requirements */
+	if (align == 0)
+		align = alignof(max_align_t);
+
+	RTE_ASSERT(rte_is_power_of_2(align));
+
+	return lcore_var_alloc(size, align);
+}
diff --git a/lib/eal/common/meson.build b/lib/eal/common/meson.build
index 22a626ba6f..d41403680b 100644
--- a/lib/eal/common/meson.build
+++ b/lib/eal/common/meson.build
@@ -18,6 +18,7 @@  sources += files(
         'eal_common_interrupts.c',
         'eal_common_launch.c',
         'eal_common_lcore.c',
+        'eal_common_lcore_var.c',
         'eal_common_mcfg.c',
         'eal_common_memalloc.c',
         'eal_common_memory.c',
diff --git a/lib/eal/include/meson.build b/lib/eal/include/meson.build
index e94b056d46..9449253e23 100644
--- a/lib/eal/include/meson.build
+++ b/lib/eal/include/meson.build
@@ -27,6 +27,7 @@  headers += files(
         'rte_keepalive.h',
         'rte_launch.h',
         'rte_lcore.h',
+        'rte_lcore_var.h',
         'rte_lock_annotations.h',
         'rte_malloc.h',
         'rte_mcslock.h',
diff --git a/lib/eal/include/rte_lcore_var.h b/lib/eal/include/rte_lcore_var.h
new file mode 100644
index 0000000000..7d3178c424
--- /dev/null
+++ b/lib/eal/include/rte_lcore_var.h
@@ -0,0 +1,384 @@ 
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 Ericsson AB
+ */
+
+#ifndef _RTE_LCORE_VAR_H_
+#define _RTE_LCORE_VAR_H_
+
+/**
+ * @file
+ *
+ * RTE Per-lcore id variables
+ *
+ * This API provides a mechanism to create and access per-lcore id
+ * variables in a space- and cycle-efficient manner.
+ *
+ * A per-lcore id variable (or lcore variable for short) has one value
+ * for each EAL thread and registered non-EAL thread. There is one
+ * copy for each current and future lcore id-equipped thread, with the
+ * total number of copies amounting to @c RTE_MAX_LCORE. The value of
+ * an lcore variable for a particular lcore id is independent from
+ * other values (for other lcore ids) within the same lcore variable.
+ *
+ * In order to access the values of an lcore variable, a handle is
+ * used. The type of the handle is a pointer to the value's type
+ * (e.g., for @c uint32_t lcore variable, the handle is a
+ * <code>uint32_t *</code>. The handler type is used to inform the
+ * access macros the type of the values. A handle may be passed
+ * between modules and threads just like any pointer, but its value
+ * must be treated as a an opaque identifier. An allocated handle
+ * never has the value NULL.
+ *
+ * @b Creation
+ *
+ * An lcore variable is created in two steps:
+ *  1. Define a lcore variable handle by using @ref RTE_LCORE_VAR_HANDLE.
+ *  2. Allocate lcore variable storage and initialize the handle with
+ *     a unique identifier by @ref RTE_LCORE_VAR_ALLOC or
+ *     @ref RTE_LCORE_VAR_INIT. Allocation generally occurs the time of
+ *     module initialization, but may be done at any time.
+ *
+ * An lcore variable is not tied to the owning thread's lifetime. It's
+ * available for use by any thread immediately after having been
+ * allocated, and continues to be available throughout the lifetime of
+ * the EAL.
+ *
+ * Lcore variables cannot and need not be freed.
+ *
+ * @b Access
+ *
+ * The value of any lcore variable for any lcore id may be accessed
+ * from any thread (including unregistered threads), but it should
+ * only be *frequently* read from or written to by the owner.
+ *
+ * Values of the same lcore variable but owned by to different lcore
+ * ids may be frequently read or written by the owners without risking
+ * false sharing.
+ *
+ * An appropriate synchronization mechanism (e.g., atomic loads and
+ * stores) should employed to assure there are no data races between
+ * the owning thread and any non-owner threads accessing the same
+ * lcore variable instance.
+ *
+ * The value of the lcore variable for a particular lcore id is
+ * accessed using @ref RTE_LCORE_VAR_LCORE_VALUE.
+ *
+ * A common pattern is for an EAL thread or a registered non-EAL
+ * thread to access its own lcore variable value. For this purpose, a
+ * short-hand exists in the form of @ref RTE_LCORE_VAR_VALUE.
+ *
+ * Although the handle (as defined by @ref RTE_LCORE_VAR_HANDLE) is a
+ * pointer with the same type as the value, it may not be directly
+ * dereferenced and must be treated as an opaque identifier.
+ *
+ * Lcore variable handles and value pointers may be freely passed
+ * between different threads.
+ *
+ * @b Storage
+ *
+ * An lcore variable's values may by of a primitive type like @c int,
+ * but would more typically be a @c struct.
+ *
+ * The lcore variable handle introduces a per-variable (not
+ * per-value/per-lcore id) overhead of @c sizeof(void *) bytes, so
+ * there are some memory footprint gains to be made by organizing all
+ * per-lcore id data for a particular module as one lcore variable
+ * (e.g., as a struct).
+ *
+ * An application may choose to define an lcore variable handle, which
+ * it then it goes on to never allocate.
+ *
+ * The size of a lcore variable's value must be less than the DPDK
+ * build-time constant @c RTE_MAX_LCORE_VAR.
+ *
+ * The lcore variable are stored in a series of lcore buffers, which
+ * are allocated from the libc heap. Heap allocation failures are
+ * treated as fatal.
+ *
+ * Lcore variables should generally *not* be @ref __rte_cache_aligned
+ * and need *not* include a @ref RTE_CACHE_GUARD field, since the use
+ * of these constructs are designed to avoid false sharing. In the
+ * case of an lcore variable instance, the thread most recently
+ * accessing nearby data structures should almost-always the lcore
+ * variables' owner. Adding padding will increase the effective memory
+ * working set size, potentially reducing performance.
+ *
+ * Lcore variable values take on an initial value of zero.
+ *
+ * @b Example
+ *
+ * Below is an example of the use of an lcore variable:
+ *
+ * @code{.c}
+ * struct foo_lcore_state {
+ *         int a;
+ *         long b;
+ * };
+ *
+ * static RTE_LCORE_VAR_HANDLE(struct foo_lcore_state, lcore_states);
+ *
+ * long foo_get_a_plus_b(void)
+ * {
+ *         struct foo_lcore_state *state = RTE_LCORE_VAR_VALUE(lcore_states);
+ *
+ *         return state->a + state->b;
+ * }
+ *
+ * RTE_INIT(rte_foo_init)
+ * {
+ *         RTE_LCORE_VAR_ALLOC(lcore_states);
+ *
+ *         struct foo_lcore_state *state;
+ *         RTE_LCORE_VAR_FOREACH_VALUE(state, lcore_states) {
+ *                 (initialize 'state')
+ *         }
+ *
+ *         (other initialization)
+ * }
+ * @endcode
+ *
+ *
+ * @b Alternatives
+ *
+ * Lcore variables are designed to replace a pattern exemplified below:
+ * @code{.c}
+ * struct __rte_cache_aligned foo_lcore_state {
+ *         int a;
+ *         long b;
+ *         RTE_CACHE_GUARD;
+ * };
+ *
+ * static struct foo_lcore_state lcore_states[RTE_MAX_LCORE];
+ * @endcode
+ *
+ * This scheme is simple and effective, but has one drawback: the data
+ * is organized so that objects related to all lcores for a particular
+ * module is kept close in memory. At a bare minimum, this forces the
+ * use of cache-line alignment to avoid false sharing. With CPU
+ * hardware prefetching and memory loads resulting from speculative
+ * execution (functions which seemingly are getting more eager faster
+ * than they are getting more intelligent), one or more "guard" cache
+ * lines may be required to separate one lcore's data from another's.
+ *
+ * Lcore variables has the upside of working with, not against, the
+ * CPU's assumptions and for example next-line prefetchers may well
+ * work the way its designers intended (i.e., to the benefit, not
+ * detriment, of system performance).
+ *
+ * Another alternative to @ref rte_lcore_var.h is the @ref
+ * rte_per_lcore.h API, which make use of thread-local storage (TLS,
+ * e.g., GCC __thread or C11 _Thread_local). The main differences
+ * between by using the various forms of TLS (e.g., @ref
+ * RTE_DEFINE_PER_LCORE or _Thread_local) and the use of lcore
+ * variables are:
+ *
+ *   * The existence and non-existence of a thread-local variable
+ *     instance follow that of particular thread's. The data cannot be
+ *     accessed before the thread has been created, nor after it has
+ *     exited. As a result, thread-local variables must initialized in
+ *     a "lazy" manner (e.g., at the point of thread creation). Lcore
+ *     variables may be accessed immediately after having been
+ *     allocated (which may be prior any thread beyond the main
+ *     thread is running).
+ *   * A thread-local variable is duplicated across all threads in the
+ *     process, including unregistered non-EAL threads (i.e.,
+ *     "regular" threads). For DPDK applications heavily relying on
+ *     multi-threading (in conjunction to DPDK's "one thread per core"
+ *     pattern), either by having many concurrent threads or
+ *     creating/destroying threads at a high rate, an excessive use of
+ *     thread-local variables may cause inefficiencies (e.g.,
+ *     increased thread creation overhead due to thread-local storage
+ *     initialization or increased total RAM footprint usage). Lcore
+ *     variables *only* exist for threads with an lcore id.
+ *   * If data in thread-local storage may be shared between threads
+ *     (i.e., can a pointer to a thread-local variable be passed to
+ *     and successfully dereferenced by non-owning thread) depends on
+ *     the details of the TLS implementation. With GCC __thread and
+ *     GCC _Thread_local, such data sharing is supported. In the C11
+ *     standard, the result of accessing another thread's
+ *     _Thread_local object is implementation-defined. Lcore variable
+ *     instances may be accessed reliably by any thread.
+ */
+
+#include <stddef.h>
+#include <stdalign.h>
+
+#include <rte_common.h>
+#include <rte_config.h>
+#include <rte_lcore.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * Given the lcore variable type, produces the type of the lcore
+ * variable handle.
+ */
+#define RTE_LCORE_VAR_HANDLE_TYPE(type)		\
+	type *
+
+/**
+ * Define a lcore variable handle.
+ *
+ * This macro defines a variable which is used as a handle to access
+ * the various per-lcore id instances of a per-lcore id variable.
+ *
+ * The aim with this macro is to make clear at the point of
+ * declaration that this is an lcore handler, rather than a regular
+ * pointer.
+ *
+ * Add @b static as a prefix in case the lcore variable are only to be
+ * accessed from a particular translation unit.
+ */
+#define RTE_LCORE_VAR_HANDLE(type, name)	\
+	RTE_LCORE_VAR_HANDLE_TYPE(type) name
+
+/**
+ * Allocate space for an lcore variable, and initialize its handle.
+ *
+ * The values of the lcore variable are initialized to zero.
+ */
+#define RTE_LCORE_VAR_ALLOC_SIZE_ALIGN(handle, size, align)	\
+	handle = rte_lcore_var_alloc(size, align)
+
+/**
+ * Allocate space for an lcore variable, and initialize its handle,
+ * with values aligned for any type of object.
+ *
+ * The values of the lcore variable are initialized to zero.
+ */
+#define RTE_LCORE_VAR_ALLOC_SIZE(handle, size)	\
+	RTE_LCORE_VAR_ALLOC_SIZE_ALIGN(handle, size, 0)
+
+/**
+ * Allocate space for an lcore variable of the size and alignment requirements
+ * suggested by the handler pointer type, and initialize its handle.
+ *
+ * The values of the lcore variable are initialized to zero.
+ */
+#define RTE_LCORE_VAR_ALLOC(handle)					\
+	RTE_LCORE_VAR_ALLOC_SIZE_ALIGN(handle, sizeof(*(handle)),	\
+				       alignof(typeof(*(handle))))
+
+/**
+ * Allocate an explicitly-sized, explicitly-aligned lcore variable by
+ * means of a @ref RTE_INIT constructor.
+ *
+ * The values of the lcore variable are initialized to zero.
+ */
+#define RTE_LCORE_VAR_INIT_SIZE_ALIGN(name, size, align)		\
+	RTE_INIT(rte_lcore_var_init_ ## name)				\
+	{								\
+		RTE_LCORE_VAR_ALLOC_SIZE_ALIGN(name, size, align);	\
+	}
+
+/**
+ * Allocate an explicitly-sized lcore variable by means of a @ref
+ * RTE_INIT constructor.
+ *
+ * The values of the lcore variable are initialized to zero.
+ */
+#define RTE_LCORE_VAR_INIT_SIZE(name, size)		\
+	RTE_LCORE_VAR_INIT_SIZE_ALIGN(name, size, 0)
+
+/**
+ * Allocate an lcore variable by means of a @ref RTE_INIT constructor.
+ *
+ * The values of the lcore variable are initialized to zero.
+ */
+#define RTE_LCORE_VAR_INIT(name)					\
+	RTE_INIT(rte_lcore_var_init_ ## name)				\
+	{								\
+		RTE_LCORE_VAR_ALLOC(name);				\
+	}
+
+/**
+ * Get void pointer to lcore variable instance with the specified
+ * lcore id.
+ *
+ * @param lcore_id
+ *   The lcore id specifying which of the @c RTE_MAX_LCORE value
+ *   instances should be accessed. The lcore id need not be valid
+ *   (e.g., may be @ref LCORE_ID_ANY), but in such a case, the pointer
+ *   is also not valid (and thus should not be dereferenced).
+ * @param handle
+ *   The lcore variable handle.
+ */
+static inline void *
+rte_lcore_var_lcore_ptr(unsigned int lcore_id, void *handle)
+{
+	return RTE_PTR_ADD(handle, lcore_id * RTE_MAX_LCORE_VAR);
+}
+
+/**
+ * Get pointer to lcore variable instance with the specified lcore id.
+ *
+ * @param lcore_id
+ *   The lcore id specifying which of the @c RTE_MAX_LCORE value
+ *   instances should be accessed. The lcore id need not be valid
+ *   (e.g., may be @ref LCORE_ID_ANY), but in such a case, the pointer
+ *   is also not valid (and thus should not be dereferenced).
+ * @param handle
+ *   The lcore variable handle.
+ */
+#define RTE_LCORE_VAR_LCORE_VALUE(lcore_id, handle)			\
+	((typeof(handle))rte_lcore_var_lcore_ptr(lcore_id, handle))
+
+/**
+ * Get pointer to lcore variable instance of the current thread.
+ *
+ * May only be used by EAL threads and registered non-EAL threads.
+ */
+#define RTE_LCORE_VAR_VALUE(handle) \
+	RTE_LCORE_VAR_LCORE_VALUE(rte_lcore_id(), handle)
+
+/**
+ * Iterate over each lcore id's value for a lcore variable.
+ *
+ * @param value
+ *   A pointer set successivly set to point to lcore variable value
+ *   corresponding to every lcore id (up to @c RTE_MAX_LCORE).
+ * @param handle
+ *   The lcore variable handle.
+ */
+#define RTE_LCORE_VAR_FOREACH_VALUE(value, handle)			\
+	for (unsigned int lcore_id =					\
+		     (((value) = RTE_LCORE_VAR_LCORE_VALUE(0, handle)), 0); \
+	     lcore_id < RTE_MAX_LCORE;					\
+	     lcore_id++, (value) = RTE_LCORE_VAR_LCORE_VALUE(lcore_id, handle))
+
+/**
+ * Allocate space in the per-lcore id buffers for a lcore variable.
+ *
+ * The pointer returned is only an opaque identifer of the variable. To
+ * get an actual pointer to a particular instance of the variable use
+ * @ref RTE_LCORE_VAR_VALUE or @ref RTE_LCORE_VAR_LCORE_VALUE.
+ *
+ * The lcore variable values' memory is set to zero.
+ *
+ * The allocation is always successful, barring a fatal exhaustion of
+ * the per-lcore id buffer space.
+ *
+ * rte_lcore_var_alloc() is not multi-thread safe.
+ *
+ * @param size
+ *   The size (in bytes) of the variable's per-lcore id value. Must be > 0.
+ * @param align
+ *   If 0, the values will be suitably aligned for any kind of type
+ *   (i.e., alignof(max_align_t)). Otherwise, the values will be aligned
+ *   on a multiple of *align*, which must be a power of 2 and equal or
+ *   less than @c RTE_CACHE_LINE_SIZE.
+ * @return
+ *   The id of the variable, stored in a void pointer value. The value
+ *   is always non-NULL.
+ */
+__rte_experimental
+void *
+rte_lcore_var_alloc(size_t size, size_t align);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_LCORE_VAR_H_ */
diff --git a/lib/eal/version.map b/lib/eal/version.map
index e3ff412683..5f5a3522c0 100644
--- a/lib/eal/version.map
+++ b/lib/eal/version.map
@@ -396,6 +396,9 @@  EXPERIMENTAL {
 
 	# added in 24.03
 	rte_vfio_get_device_info; # WINDOWS_NO_EXPORT
+
+	rte_lcore_var_alloc;
+	rte_lcore_var;
 };
 
 INTERNAL {