[v3] eal: add seqlock

Message ID 20220401150749.136921-1-mattias.ronnblom@ericsson.com (mailing list archive)
State Superseded, archived
Delegated to: David Marchand
Series [v3] eal: add seqlock

Checks

Context Check Description
ci/checkpatch warning coding style issues
ci/iol-abi-testing success Testing PASS
ci/Intel-compilation success Compilation OK
ci/iol-mellanox-Performance success Performance Testing PASS
ci/github-robot: build success github build: passed
ci/intel-Testing success Testing PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/iol-intel-Functional success Functional Testing PASS
ci/iol-aarch64-unit-testing success Testing PASS
ci/iol-x86_64-compile-testing success Testing PASS
ci/iol-aarch64-compile-testing success Testing PASS
ci/iol-x86_64-unit-testing success Testing PASS

Commit Message

Mattias Rönnblom April 1, 2022, 3:07 p.m. UTC
  A sequence lock (seqlock) is a synchronization primitive which allows
for data-race free, low-overhead, high-frequency reads, especially for
data structures shared across many cores and which are updated
relatively infrequently.

A seqlock permits multiple parallel readers. The variant of seqlock
implemented in this patch supports multiple writers as well. A
spinlock is used for writer-writer serialization.

To avoid resource reclamation and other issues, the data protected by
a seqlock is best off being self-contained (i.e., no pointers [except
to constant data]).

One way to think about seqlocks is that they provide means to perform
atomic operations on data objects larger than what the native atomic
machine instructions allow for.

DPDK seqlocks are not preemption safe on the writer side. A thread
preemption affects performance, not correctness.

A seqlock contains a sequence number, which can be thought of as the
generation of the data it protects.

A reader will
  1. Load the sequence number (sn).
  2. Load, in arbitrary order, the seqlock-protected data.
  3. Load the sn again.
  4. Check if the first and second sn are equal, and even numbered.
     If they are not, discard the loaded data, and restart from 1.

The first three steps need to be ordered using suitable memory fences.

A writer will
  1. Take the spinlock, to serialize writer access.
  2. Load the sn.
  3. Store the original sn + 1 as the new sn.
  4. Perform load and stores to the seqlock-protected data.
  5. Store the original sn + 2 as the new sn.
  6. Release the spinlock.

Proper memory fencing is required to make sure the first sn store, the
data stores, and the second sn store appear to the reader in the
mentioned order.

The sn loads and stores must be atomic, but the data loads and stores
need not be.
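
For illustration only, the steps above might be expressed roughly as
follows, mirroring the C11 atomics used in the patch further down. The
names my_seqlock, my_read and my_write are invented for this sketch and
are not part of the proposed API:

  #include <stdint.h>
  #include <rte_atomic.h>
  #include <rte_spinlock.h>

  struct my_seqlock {
          uint32_t sn;         /* even: no write in progress; odd: write in progress */
          rte_spinlock_t lock; /* writer-writer serialization */
  };

  /* Reader: steps 1-4 above; retries until a consistent snapshot is read. */
  static inline uint64_t
  my_read(struct my_seqlock *sl, const uint64_t *data)
  {
          uint32_t begin_sn, end_sn;
          uint64_t snapshot;

          do {
                  /* 1. load sn; acquire orders the data loads after it */
                  begin_sn = __atomic_load_n(&sl->sn, __ATOMIC_ACQUIRE);
                  /* 2. load the protected data (need not be atomic) */
                  snapshot = *data;
                  /* order the data loads before the second sn load */
                  rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
                  /* 3. load sn again */
                  end_sn = __atomic_load_n(&sl->sn, __ATOMIC_RELAXED);
                  /* 4. retry unless both sns are equal and even */
          } while ((begin_sn & 1) || begin_sn != end_sn);

          return snapshot;
  }

  /* Writer: steps 1-6 above. */
  static inline void
  my_write(struct my_seqlock *sl, uint64_t *data, uint64_t value)
  {
          uint32_t sn;

          rte_spinlock_lock(&sl->lock);                        /* 1 */
          sn = sl->sn;                                         /* 2 */
          __atomic_store_n(&sl->sn, sn + 1, __ATOMIC_RELAXED); /* 3 */
          /* order the first sn store before the data stores */
          rte_atomic_thread_fence(__ATOMIC_RELEASE);
          *data = value;                                       /* 4 */
          __atomic_store_n(&sl->sn, sn + 2, __ATOMIC_RELEASE); /* 5 */
          rte_spinlock_unlock(&sl->lock);                      /* 6 */
  }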

The original seqlock design and implementation was done by Stephen
Hemminger. This is an independent implementation, using C11 atomics.

For more information on seqlocks, see
https://en.wikipedia.org/wiki/Seqlock

PATCH v3:
  * Renamed both read and write-side critical section begin/end functions
    to better match rwlock naming, per Ola Liljedahl's suggestion.
  * Added 'extern "C"' guards for C++ compatibility.
  * Refer to the main lcore as the main, and not the master.

PATCH v2:
  * Skip instead of fail unit test in case too few lcores are available.
  * Use main lcore for testing, reducing the minimum number of lcores
    required to run the unit tests to four.
  * Consistently refer to sn field as the "sequence number" in the
    documentation.
  * Fixed spelling mistakes in documentation.

Updates since RFC:
  * Added API documentation.
  * Added link to Wikipedia article in the commit message.
  * Changed seqlock sequence number field from uint64_t (which was
    overkill) to uint32_t. The sn type needs to be sufficiently large
    to assure that no reader can read an sn, access the data, and then
    read the same sn again only because the sn has been updated so many
    times during the read that it has wrapped.
  * Added RTE_SEQLOCK_INITIALIZER macro for static initialization.
  * Replaced the rte_seqlock struct + separate rte_seqlock_t typedef
    with an anonymous struct typedef:ed to rte_seqlock_t.

Acked-by: Morten Brørup <mb@smartsharesystems.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
---
 app/test/meson.build          |   2 +
 app/test/test_seqlock.c       | 202 +++++++++++++++++++++++
 lib/eal/common/meson.build    |   1 +
 lib/eal/common/rte_seqlock.c  |  12 ++
 lib/eal/include/meson.build   |   1 +
 lib/eal/include/rte_seqlock.h | 302 ++++++++++++++++++++++++++++++++++
 lib/eal/version.map           |   3 +
 7 files changed, 523 insertions(+)
 create mode 100644 app/test/test_seqlock.c
 create mode 100644 lib/eal/common/rte_seqlock.c
 create mode 100644 lib/eal/include/rte_seqlock.h
  

Comments

Honnappa Nagarahalli April 2, 2022, 12:21 a.m. UTC | #1
Hi Mattias,
	Few comments inline.

> -----Original Message-----
> From: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> Sent: Friday, April 1, 2022 10:08 AM
> To: dev@dpdk.org
> Cc: thomas@monjalon.net; David Marchand <david.marchand@redhat.com>;
> onar.olsen@ericsson.com; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>;
> konstantin.ananyev@intel.com; mb@smartsharesystems.com;
> stephen@networkplumber.org; Mattias Rönnblom
> <mattias.ronnblom@ericsson.com>; Ola Liljedahl <Ola.Liljedahl@arm.com>
> Subject: [PATCH v3] eal: add seqlock
> 
> A sequence lock (seqlock) is synchronization primitive which allows for data-
> race free, low-overhead, high-frequency reads, especially for data structures
> shared across many cores and which are updated with relatively infrequently.
> 
> A seqlock permits multiple parallel readers. The variant of seqlock implemented
> in this patch supports multiple writers as well. A spinlock is used for writer-
> writer serialization.
> 
> To avoid resource reclamation and other issues, the data protected by a seqlock
> is best off being self-contained (i.e., no pointers [except to constant data]).
> 
> One way to think about seqlocks is that they provide means to perform atomic
> operations on data objects larger what the native atomic machine instructions
> allow for.
> 
> DPDK seqlocks are not preemption safe on the writer side. A thread preemption
> affects performance, not correctness.
> 
> A seqlock contains a sequence number, which can be thought of as the
> generation of the data it protects.
> 
> A reader will
>   1. Load the sequence number (sn).
>   2. Load, in arbitrary order, the seqlock-protected data.
>   3. Load the sn again.
>   4. Check if the first and second sn are equal, and even numbered.
>      If they are not, discard the loaded data, and restart from 1.
> 
> The first three steps need to be ordered using suitable memory fences.
> 
> A writer will
>   1. Take the spinlock, to serialize writer access.
>   2. Load the sn.
>   3. Store the original sn + 1 as the new sn.
>   4. Perform load and stores to the seqlock-protected data.
>   5. Store the original sn + 2 as the new sn.
>   6. Release the spinlock.
> 
> Proper memory fencing is required to make sure the first sn store, the data
> stores, and the second sn store appear to the reader in the mentioned order.
> 
> The sn loads and stores must be atomic, but the data loads and stores need not
> be.
> 
> The original seqlock design and implementation was done by Stephen
> Hemminger. This is an independent implementation, using C11 atomics.
> 
> For more information on seqlocks, see
> https://en.wikipedia.org/wiki/Seqlock
> 
> PATCH v3:
>   * Renamed both read and write-side critical section begin/end functions
>     to better match rwlock naming, per Ola Liljedahl's suggestion.
>   * Added 'extern "C"' guards for C++ compatibility.
>   * Refer to the main lcore as the main, and not the master.
> 
> PATCH v2:
>   * Skip instead of fail unit test in case too few lcores are available.
>   * Use main lcore for testing, reducing the minimum number of lcores
>     required to run the unit tests to four.
>   * Consistently refer to sn field as the "sequence number" in the
>     documentation.
>   * Fixed spelling mistakes in documentation.
> 
> Updates since RFC:
>   * Added API documentation.
>   * Added link to Wikipedia article in the commit message.
>   * Changed seqlock sequence number field from uint64_t (which was
>     overkill) to uint32_t. The sn type needs to be sufficiently large
>     to assure no reader will read a sn, access the data, and then read
>     the same sn, but the sn has been updated to many times during the
>     read, so it has wrapped.
>   * Added RTE_SEQLOCK_INITIALIZER macro for static initialization.
>   * Removed the rte_seqlock struct + separate rte_seqlock_t typedef
>     with an anonymous struct typedef:ed to rte_seqlock_t.
> 
> Acked-by: Morten Brørup <mb@smartsharesystems.com>
> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> ---
>  app/test/meson.build          |   2 +
>  app/test/test_seqlock.c       | 202 +++++++++++++++++++++++
>  lib/eal/common/meson.build    |   1 +
>  lib/eal/common/rte_seqlock.c  |  12 ++
>  lib/eal/include/meson.build   |   1 +
>  lib/eal/include/rte_seqlock.h | 302 ++++++++++++++++++++++++++++++++++
>  lib/eal/version.map           |   3 +
>  7 files changed, 523 insertions(+)
>  create mode 100644 app/test/test_seqlock.c  create mode 100644
> lib/eal/common/rte_seqlock.c  create mode 100644
> lib/eal/include/rte_seqlock.h
> 
> diff --git a/app/test/meson.build b/app/test/meson.build index
> 5fc1dd1b7b..5e418e8766 100644
> --- a/app/test/meson.build
> +++ b/app/test/meson.build
> @@ -125,6 +125,7 @@ test_sources = files(
>          'test_rwlock.c',
>          'test_sched.c',
>          'test_security.c',
> +        'test_seqlock.c',
>          'test_service_cores.c',
>          'test_spinlock.c',
>          'test_stack.c',
> @@ -214,6 +215,7 @@ fast_tests = [
>          ['rwlock_rde_wro_autotest', true],
>          ['sched_autotest', true],
>          ['security_autotest', false],
> +        ['seqlock_autotest', true],
>          ['spinlock_autotest', true],
>          ['stack_autotest', false],
>          ['stack_lf_autotest', false],
> diff --git a/app/test/test_seqlock.c b/app/test/test_seqlock.c new file mode
> 100644 index 0000000000..54fadf8025
> --- /dev/null
> +++ b/app/test/test_seqlock.c
> @@ -0,0 +1,202 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2022 Ericsson AB
> + */
> +
> +#include <rte_seqlock.h>
> +
> +#include <rte_cycles.h>
> +#include <rte_malloc.h>
> +#include <rte_random.h>
> +
> +#include <inttypes.h>
> +
> +#include "test.h"
> +
> +struct data {
> +	rte_seqlock_t lock;
> +
> +	uint64_t a;
> +	uint64_t b __rte_cache_aligned;
> +	uint64_t c __rte_cache_aligned;
> +} __rte_cache_aligned;
> +
> +struct reader {
> +	struct data *data;
> +	uint8_t stop;
> +};
> +
> +#define WRITER_RUNTIME (2.0) /* s */
> +
> +#define WRITER_MAX_DELAY (100) /* us */
> +
> +#define INTERRUPTED_WRITER_FREQUENCY (1000) #define
> +WRITER_INTERRUPT_TIME (1) /* us */
> +
> +static int
> +writer_run(void *arg)
> +{
> +	struct data *data = arg;
> +	uint64_t deadline;
> +
> +	deadline = rte_get_timer_cycles() +
> +		WRITER_RUNTIME * rte_get_timer_hz();
> +
> +	while (rte_get_timer_cycles() < deadline) {
> +		bool interrupted;
> +		uint64_t new_value;
> +		unsigned int delay;
> +
> +		new_value = rte_rand();
> +
> +		interrupted =
> rte_rand_max(INTERRUPTED_WRITER_FREQUENCY) == 0;
> +
> +		rte_seqlock_write_lock(&data->lock);
> +
> +		data->c = new_value;
> +
> +		/* These compiler barriers (both on the test reader
> +		 * and the test writer side) are here to ensure that
> +		 * loads/stores *usually* happen in test program order
> +		 * (always on a TSO machine). They are arrange in such
> +		 * a way that the writer stores in a different order
> +		 * than the reader loads, to emulate an arbitrary
> +		 * order. A real application using a seqlock does not
> +		 * require any compiler barriers.
> +		 */
> +		rte_compiler_barrier();
The compiler barriers are not sufficient on all architectures (if the intention is to maintain the program order).

> +		data->b = new_value;
> +
> +		if (interrupted)
> +			rte_delay_us_block(WRITER_INTERRUPT_TIME);
> +
> +		rte_compiler_barrier();
> +		data->a = new_value;
> +
> +		rte_seqlock_write_unlock(&data->lock);
> +
> +		delay = rte_rand_max(WRITER_MAX_DELAY);
> +
> +		rte_delay_us_block(delay);
> +	}
> +
> +	return 0;
> +}
> +
> +#define INTERRUPTED_READER_FREQUENCY (1000) #define
> +READER_INTERRUPT_TIME (1000) /* us */
> +
> +static int
> +reader_run(void *arg)
> +{
> +	struct reader *r = arg;
> +	int rc = 0;
> +
> +	while (__atomic_load_n(&r->stop, __ATOMIC_RELAXED) == 0 && rc ==
> 0) {
> +		struct data *data = r->data;
> +		bool interrupted;
> +		uint64_t a;
> +		uint64_t b;
> +		uint64_t c;
> +		uint32_t sn;
> +
> +		interrupted =
> rte_rand_max(INTERRUPTED_READER_FREQUENCY) == 0;
> +
> +		sn = rte_seqlock_read_lock(&data->lock);
> +
> +		do {
> +			a = data->a;
> +			/* See writer_run() for an explanation why
> +			 * these barriers are here.
> +			 */
> +			rte_compiler_barrier();
> +
> +			if (interrupted)
> +
> 	rte_delay_us_block(READER_INTERRUPT_TIME);
> +
> +			c = data->c;
> +
> +			rte_compiler_barrier();
> +			b = data->b;
> +
> +		} while (!rte_seqlock_read_tryunlock(&data->lock, &sn));
> +
> +		if (a != b || b != c) {
> +			printf("Reader observed inconsistent data values "
> +			       "%" PRIu64 " %" PRIu64 " %" PRIu64 "\n",
> +			       a, b, c);
> +			rc = -1;
> +		}
> +	}
> +
> +	return rc;
> +}
> +
> +static void
> +reader_stop(struct reader *reader)
> +{
> +	__atomic_store_n(&reader->stop, 1, __ATOMIC_RELAXED); }
> +
> +#define NUM_WRITERS (2) /* main lcore + one worker */ #define
> +MIN_NUM_READERS (2) #define MAX_READERS (RTE_MAX_LCORE -
> NUM_WRITERS -
> +1) #define MIN_LCORE_COUNT (NUM_WRITERS + MIN_NUM_READERS)
> +
> +/* Only a compile-time test */
> +static rte_seqlock_t __rte_unused static_init_lock =
> +RTE_SEQLOCK_INITIALIZER;
> +
> +static int
> +test_seqlock(void)
> +{
> +	struct reader readers[MAX_READERS];
> +	unsigned int num_readers;
> +	unsigned int num_lcores;
> +	unsigned int i;
> +	unsigned int lcore_id;
> +	unsigned int reader_lcore_ids[MAX_READERS];
> +	unsigned int worker_writer_lcore_id = 0;
> +	int rc = 0;
> +
> +	num_lcores = rte_lcore_count();
> +
> +	if (num_lcores < MIN_LCORE_COUNT) {
> +		printf("Too few cores to run test. Skipping.\n");
> +		return 0;
> +	}
> +
> +	num_readers = num_lcores - NUM_WRITERS;
> +
> +	struct data *data = rte_zmalloc(NULL, sizeof(struct data), 0);
> +
> +	i = 0;
> +	RTE_LCORE_FOREACH_WORKER(lcore_id) {
> +		if (i == 0) {
> +			rte_eal_remote_launch(writer_run, data, lcore_id);
> +			worker_writer_lcore_id = lcore_id;
> +		} else {
> +			unsigned int reader_idx = i - 1;
> +			struct reader *reader = &readers[reader_idx];
> +
> +			reader->data = data;
> +			reader->stop = 0;
> +
> +			rte_eal_remote_launch(reader_run, reader, lcore_id);
> +			reader_lcore_ids[reader_idx] = lcore_id;
> +		}
> +		i++;
> +	}
> +
> +	if (writer_run(data) != 0 ||
> +	    rte_eal_wait_lcore(worker_writer_lcore_id) != 0)
> +		rc = -1;
> +
> +	for (i = 0; i < num_readers; i++) {
> +		reader_stop(&readers[i]);
> +		if (rte_eal_wait_lcore(reader_lcore_ids[i]) != 0)
> +			rc = -1;
> +	}
> +
> +	return rc;
> +}
> +
> +REGISTER_TEST_COMMAND(seqlock_autotest, test_seqlock);
> diff --git a/lib/eal/common/meson.build b/lib/eal/common/meson.build index
> 917758cc65..a41343bfed 100644
> --- a/lib/eal/common/meson.build
> +++ b/lib/eal/common/meson.build
> @@ -35,6 +35,7 @@ sources += files(
>          'rte_malloc.c',
>          'rte_random.c',
>          'rte_reciprocal.c',
> +	'rte_seqlock.c',
>          'rte_service.c',
>          'rte_version.c',
>  )
> diff --git a/lib/eal/common/rte_seqlock.c b/lib/eal/common/rte_seqlock.c new
> file mode 100644 index 0000000000..d4fe648799
> --- /dev/null
> +++ b/lib/eal/common/rte_seqlock.c
> @@ -0,0 +1,12 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2022 Ericsson AB
> + */
> +
> +#include <rte_seqlock.h>
> +
> +void
> +rte_seqlock_init(rte_seqlock_t *seqlock) {
> +	seqlock->sn = 0;
> +	rte_spinlock_init(&seqlock->lock);
> +}
> diff --git a/lib/eal/include/meson.build b/lib/eal/include/meson.build index
> 9700494816..48df5f1a21 100644
> --- a/lib/eal/include/meson.build
> +++ b/lib/eal/include/meson.build
> @@ -36,6 +36,7 @@ headers += files(
>          'rte_per_lcore.h',
>          'rte_random.h',
>          'rte_reciprocal.h',
> +        'rte_seqlock.h',
>          'rte_service.h',
>          'rte_service_component.h',
>          'rte_string_fns.h',
> diff --git a/lib/eal/include/rte_seqlock.h b/lib/eal/include/rte_seqlock.h new file
Other lock implementations are in lib/eal/include/generic.

> mode 100644 index 0000000000..44eacd66e8
> --- /dev/null
> +++ b/lib/eal/include/rte_seqlock.h
> @@ -0,0 +1,302 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2022 Ericsson AB
> + */
> +
> +#ifndef _RTE_SEQLOCK_H_
> +#define _RTE_SEQLOCK_H_
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +/**
> + * @file
> + * RTE Seqlock
> + *
> + * A sequence lock (seqlock) is a synchronization primitive allowing
> + * multiple, parallel, readers to efficiently and safely (i.e., in a
> + * data-race free manner) access the lock-protected data. The RTE
> + * seqlock permits multiple writers as well. A spinlock is used to
> + * writer-writer synchronization.
> + *
> + * A reader never blocks a writer. Very high frequency writes may
> + * prevent readers from making progress.
> + *
> + * A seqlock is not preemption-safe on the writer side. If a writer is
> + * preempted, it may block readers until the writer thread is again
> + * allowed to execute. Heavy computations should be kept out of the
> + * writer-side critical section, to avoid delaying readers.
> + *
> + * Seqlocks are useful for data which are read by many cores, at a
> + * high frequency, and relatively infrequently written to.
> + *
> + * One way to think about seqlocks is that they provide means to
> + * perform atomic operations on objects larger than what the native
> + * machine instructions allow for.
> + *
> + * To avoid resource reclamation issues, the data protected by a
> + * seqlock should typically be kept self-contained (e.g., no pointers
> + * to mutable, dynamically allocated data).
> + *
> + * Example usage:
> + * @code{.c}
> + * #define MAX_Y_LEN (16)
> + * // Application-defined example data structure, protected by a seqlock.
> + * struct config {
> + *         rte_seqlock_t lock;
> + *         int param_x;
> + *         char param_y[MAX_Y_LEN];
> + * };
> + *
> + * // Accessor function for reading config fields.
> + * void
> + * config_read(const struct config *config, int *param_x, char
> +*param_y)
> + * {
> + *         // Temporary variables, just to improve readability.
I think the above comment is not necessary. It is beneficial to copy the protected data to keep the read side critical section small.

> + *         int tentative_x;
> + *         char tentative_y[MAX_Y_LEN];
> + *         uint32_t sn;
> + *
> + *         sn = rte_seqlock_read_lock(&config->lock);
> + *         do {
> + *                 // Loads may be atomic or non-atomic, as in this example.
> + *                 tentative_x = config->param_x;
> + *                 strcpy(tentative_y, config->param_y);
> + *         } while (!rte_seqlock_read_tryunlock(&config->lock, &sn));
> + *         // An application could skip retrying, and try again later, if
> + *         // progress is possible without the data.
> + *
> + *         *param_x = tentative_x;
> + *         strcpy(param_y, tentative_y);
> + * }
> + *
> + * // Accessor function for writing config fields.
> + * void
> + * config_update(struct config *config, int param_x, const char
> +*param_y)
> + * {
> + *         rte_seqlock_write_lock(&config->lock);
> + *         // Stores may be atomic or non-atomic, as in this example.
> + *         config->param_x = param_x;
> + *         strcpy(config->param_y, param_y);
> + *         rte_seqlock_write_unlock(&config->lock);
> + * }
> + * @endcode
> + *
> + * @see
> + * https://en.wikipedia.org/wiki/Seqlock.
> + */
> +
> +#include <stdbool.h>
> +#include <stdint.h>
> +
> +#include <rte_atomic.h>
> +#include <rte_branch_prediction.h>
> +#include <rte_spinlock.h>
> +
> +/**
> + * The RTE seqlock type.
> + */
> +typedef struct {
> +	uint32_t sn; /**< A sequence number for the protected data. */
> +	rte_spinlock_t lock; /**< Spinlock used to serialize writers.  */ }
Suggest using ticket lock for the writer side. It should have low overhead when there is a single writer, but provides better functionality when there are multiple writers. 

> +rte_seqlock_t;
> +
> +/**
> + * A static seqlock initializer.
> + */
> +#define RTE_SEQLOCK_INITIALIZER { 0, RTE_SPINLOCK_INITIALIZER }
> +
> +/**
> + * Initialize the seqlock.
> + *
> + * This function initializes the seqlock, and leaves the writer-side
> + * spinlock unlocked.
> + *
> + * @param seqlock
> + *   A pointer to the seqlock.
> + */
> +__rte_experimental
> +void
> +rte_seqlock_init(rte_seqlock_t *seqlock);
> +
> +/**
> + * Begin a read-side critical section.
> + *
> + * A call to this function marks the beginning of a read-side critical
> + * section, for @p seqlock.
> + *
> + * rte_seqlock_read_lock() returns a sequence number, which is later
> + * used in rte_seqlock_read_tryunlock() to check if the protected data
> + * underwent any modifications during the read transaction.
> + *
> + * After (in program order) rte_seqlock_read_lock() has been called,
> + * the calling thread reads the protected data, for later use. The
> + * protected data read *must* be copied (either in pristine form, or
> + * in the form of some derivative), since the caller may only read the
> + * data from within the read-side critical section (i.e., after
> + * rte_seqlock_read_lock() and before rte_seqlock_read_tryunlock()),
> + * but must not act upon the retrieved data while in the critical
> + * section, since it does not yet know if it is consistent.
> + *
> + * The protected data may be read using atomic and/or non-atomic
> + * operations.
> + *
> + * After (in program order) all required data loads have been
> + * performed, rte_seqlock_read_tryunlock() should be called, marking
> + * the end of the read-side critical section.
> + *
> + * If rte_seqlock_read_tryunlock() returns true, the data was read
> + * atomically and the copied data is consistent.
> + *
> + * If rte_seqlock_read_tryunlock() returns false, the just-read data
> + * is inconsistent and should be discarded. The caller has the option
> + * to either re-read the data and call rte_seqlock_read_tryunlock()
> + * again, or to restart the whole procedure (i.e., from
> + * rte_seqlock_read_lock()) at some later time.
> + *
> + * @param seqlock
> + *   A pointer to the seqlock.
> + * @return
> + *   The seqlock sequence number for this critical section, to
> + *   later be passed to rte_seqlock_read_tryunlock().
> + *
> + * @see rte_seqlock_read_tryunlock()
> + */
> +__rte_experimental
> +static inline uint32_t
> +rte_seqlock_read_lock(const rte_seqlock_t *seqlock) {
> +	/* __ATOMIC_ACQUIRE to prevent loads after (in program order)
> +	 * from happening before the sn load. Synchronizes-with the
> +	 * store release in rte_seqlock_write_unlock().
> +	 */
> +	return __atomic_load_n(&seqlock->sn, __ATOMIC_ACQUIRE); }
> +
> +/**
> + * End a read-side critical section.
> + *
> + * A call to this function marks the end of a read-side critical
Should we capture that it also begins a new critical-section for the subsequent calls to rte_seqlock_tryunlock()?

> + * section, for @p seqlock. The application must supply the sequence
> + * number produced by the corresponding rte_seqlock_read_lock() (or,
> + * in case of a retry, the rte_seqlock_tryunlock()) call.
> + *
> + * After this function has been called, the caller should not access
> + * the protected data.
I understand what you mean here. But, I think this needs clarity.
In the documentation for rte_seqlock_read_lock() you have mentioned, if rte_seqlock_read_tryunlock() returns false, one could re-read the data.
May be this should be changed to:
" After this function returns true, the caller should not access the protected data."?
Or may be combine it with the following para.

> + *
> + * In case this function returns true, the just-read data was
> + * consistent and the set of atomic and non-atomic load operations
> + * performed between rte_seqlock_read_lock() and
> + * rte_seqlock_read_tryunlock() were atomic, as a whole.
> + *
> + * In case rte_seqlock_read_tryunlock() returns false, the data was
> + * modified as it was being read and may be inconsistent, and thus
> + * should be discarded. The @p begin_sn is updated with the
> + * now-current sequence number.
May be
" The @p begin_sn is updated with the sequence number for the next critical section."

> + *
> + * @param seqlock
> + *   A pointer to the seqlock.
> + * @param begin_sn
> + *   The seqlock sequence number returned by
> + *   rte_seqlock_read_lock() (potentially updated in subsequent
> + *   rte_seqlock_read_tryunlock() calls) for this critical section.
> + * @return
> + *   true or false, if the just-read seqlock-protected data was consistent
> + *   or inconsistent, respectively, at the time it was read.
true - just read protected data was consistent
false - just read protected data was inconsistent

> + *
> + * @see rte_seqlock_read_lock()
> + */
> +__rte_experimental
> +static inline bool
> +rte_seqlock_read_tryunlock(const rte_seqlock_t *seqlock, uint32_t
> +*begin_sn) {
> +	uint32_t end_sn;
> +
> +	/* make sure the data loads happens before the sn load */
> +	rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
> +
> +	end_sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED);
> +
> +	if (unlikely(end_sn & 1 || *begin_sn != end_sn)) {
> +		*begin_sn = end_sn;
> +		return false;
> +	}
> +
> +	return true;
> +}
> +
> +/**
> + * Begin a write-side critical section.
> + *
> + * A call to this function acquires the write lock associated @p
> + * seqlock, and marks the beginning of a write-side critical section.
> + *
> + * After having called this function, the caller may go on to modify
> + * (both read and write) the protected data, in an atomic or
> + * non-atomic manner.
> + *
> + * After the necessary updates have been performed, the application
> + * calls rte_seqlock_write_unlock().
> + *
> + * This function is not preemption-safe in the sense that preemption
> + * of the calling thread may block reader progress until the writer
> + * thread is rescheduled.
> + *
> + * Unlike rte_seqlock_read_lock(), each call made to
> + * rte_seqlock_write_lock() must be matched with an unlock call.
> + *
> + * @param seqlock
> + *   A pointer to the seqlock.
> + *
> + * @see rte_seqlock_write_unlock()
> + */
> +__rte_experimental
> +static inline void
> +rte_seqlock_write_lock(rte_seqlock_t *seqlock) {
> +	uint32_t sn;
> +
> +	/* to synchronize with other writers */
> +	rte_spinlock_lock(&seqlock->lock);
> +
> +	sn = seqlock->sn + 1;
The load of seqlock->sn could use __atomic_load_n to be consistent.
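For example, a sketch of the suggested change (not part of the patch):

	sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED) + 1;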

> +
> +	__atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELAXED);
> +
> +	/* __ATOMIC_RELEASE to prevent stores after (in program order)
> +	 * from happening before the sn store.
> +	 */
> +	rte_atomic_thread_fence(__ATOMIC_RELEASE);
> +}
> +
> +/**
> + * End a write-side critical section.
> + *
> + * A call to this function marks the end of the write-side critical
> + * section, for @p seqlock. After this call has been made, the
> +protected
> + * data may no longer be modified.
> + *
> + * @param seqlock
> + *   A pointer to the seqlock.
> + *
> + * @see rte_seqlock_write_lock()
> + */
> +__rte_experimental
> +static inline void
> +rte_seqlock_write_unlock(rte_seqlock_t *seqlock) {
> +	uint32_t sn;
> +
> +	sn = seqlock->sn + 1;
Same here, the load of seqlock->sn could use __atomic_load_n

> +
> +	/* synchronizes-with the load acquire in rte_seqlock_read_lock() */
> +	__atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELEASE);
> +
> +	rte_spinlock_unlock(&seqlock->lock);
> +}
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif  /* _RTE_SEQLOCK_H_ */
> diff --git a/lib/eal/version.map b/lib/eal/version.map index
> b53eeb30d7..4a9d0ed899 100644
> --- a/lib/eal/version.map
> +++ b/lib/eal/version.map
> @@ -420,6 +420,9 @@ EXPERIMENTAL {
>  	rte_intr_instance_free;
>  	rte_intr_type_get;
>  	rte_intr_type_set;
> +
> +	# added in 22.07
> +	rte_seqlock_init;
>  };
> 
>  INTERNAL {
> --
> 2.25.1
  
Morten Brørup April 2, 2022, 11:01 a.m. UTC | #2
> From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
> Sent: Saturday, 2 April 2022 02.22
> 
> > From: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> > Sent: Friday, April 1, 2022 10:08 AM
> >
> > diff --git a/lib/eal/include/rte_seqlock.h
> > b/lib/eal/include/rte_seqlock.h new file
> Other lock implementations are in lib/eal/include/generic.

I'm not sure why what goes where... e.g. rte_branch_prediction.h and rte_bitops.h are not in include/generic.

But I agree that keeping lock implementations in the same location makes sense.

Also, other lock implementations have their init() function in their header file, so you could consider getting rid of the C file. I don't care, just mentioning it.

> > +/**
> > + * The RTE seqlock type.
> > + */
> > +typedef struct {
> > +	uint32_t sn; /**< A sequence number for the protected data. */
> > +	rte_spinlock_t lock; /**< Spinlock used to serialize writers.  */
> }
> Suggest using ticket lock for the writer side. It should have low
> overhead when there is a single writer, but provides better
> functionality when there are multiple writers.

A spinlock and a ticket lock have the same size, so there is no memory cost either.

Unless using a ticket lock stands in the way of future extensions to the seqlock library, it seems like a good idea.
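
For illustration, a rough sketch of what the writer side could look like with
a ticket lock, assuming the existing <rte_ticketlock.h> API. The seqlock_tl_*
names are invented for this sketch and are not part of the patch:

  #include <rte_atomic.h>
  #include <rte_ticketlock.h>

  /* Hypothetical seqlock variant using a ticket lock for writer-writer
   * serialization, so contending writers acquire the lock in FIFO order.
   * Only the writer-side lock type and calls differ from the patch.
   */
  typedef struct {
          uint32_t sn;
          rte_ticketlock_t lock;
  } seqlock_tl_t;

  static inline void
  seqlock_tl_write_lock(seqlock_tl_t *seqlock)
  {
          rte_ticketlock_lock(&seqlock->lock);
          __atomic_store_n(&seqlock->sn, seqlock->sn + 1, __ATOMIC_RELAXED);
          /* order the sn store before the subsequent data stores */
          rte_atomic_thread_fence(__ATOMIC_RELEASE);
  }

  static inline void
  seqlock_tl_write_unlock(seqlock_tl_t *seqlock)
  {
          /* synchronizes-with the load-acquire on the reader side */
          __atomic_store_n(&seqlock->sn, seqlock->sn + 1, __ATOMIC_RELEASE);
          rte_ticketlock_unlock(&seqlock->lock);
  }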

> > +__rte_experimental
> > +static inline bool
> > +rte_seqlock_read_tryunlock(const rte_seqlock_t *seqlock, uint32_t
> > +*begin_sn) {

Did anyone object to adding the __attribute__((warn_unused_result))?

Otherwise, I think you should add it.
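
E.g., a sketch of where the attribute would go (illustrative only):

  __rte_experimental
  __attribute__((warn_unused_result))
  static inline bool
  rte_seqlock_read_tryunlock(const rte_seqlock_t *seqlock, uint32_t *begin_sn);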
  
Ola Liljedahl April 2, 2022, 6:15 p.m. UTC | #3
On 4/1/22 17:07, Mattias Rönnblom wrote:
> +
> +/**
> + * End a read-side critical section.
> + *
> + * A call to this function marks the end of a read-side critical
> + * section, for @p seqlock. The application must supply the sequence
> + * number produced by the corresponding rte_seqlock_read_lock() (or,
> + * in case of a retry, the rte_seqlock_tryunlock()) call.
> + *
> + * After this function has been called, the caller should not access
> + * the protected data.
> + *
> + * In case this function returns true, the just-read data was
> + * consistent and the set of atomic and non-atomic load operations
> + * performed between rte_seqlock_read_lock() and
> + * rte_seqlock_read_tryunlock() were atomic, as a whole.
> + *
> + * In case rte_seqlock_read_tryunlock() returns false, the data was
> + * modified as it was being read and may be inconsistent, and thus
> + * should be discarded. The @p begin_sn is updated with the
> + * now-current sequence number.
> + *
> + * @param seqlock
> + *   A pointer to the seqlock.
> + * @param begin_sn
> + *   The seqlock sequence number returned by
> + *   rte_seqlock_read_lock() (potentially updated in subsequent
> + *   rte_seqlock_read_tryunlock() calls) for this critical section.
> + * @return
> + *   true or false, if the just-read seqlock-protected data was consistent
> + *   or inconsistent, respectively, at the time it was read.
> + *
> + * @see rte_seqlock_read_lock()
> + */
> +__rte_experimental
> +static inline bool
> +rte_seqlock_read_tryunlock(const rte_seqlock_t *seqlock, uint32_t *begin_sn)
> +{
> +	uint32_t end_sn;
> +
> +	/* make sure the data loads happens before the sn load */
> +	rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
> +
> +	end_sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED);

Since we are reading and potentially returning the sequence number here 
(repeating the read of the protected data), we need to use load-acquire. 
I assume it is not expected that the user will call 
rte_seqlock_read_lock() again.

Seeing this implementation, I might actually prefer the original 
implementation, I think it is cleaner. But I would like for the begin 
function also to wait for an even sequence number, the end function 
would only have to check for same sequence number, this might improve 
performance a little bit as readers won't perform one or several broken 
reads while a write is in progress. The function names are a different 
thing though.
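
A hypothetical sketch of that read-side variant (invented names; the write
side would stay as in the patch):

  /* Begin function that waits until no write is in progress (even sn),
   * so the end/retry function only needs to compare sequence numbers.
   */
  static inline uint32_t
  seqlock_read_begin_even(const rte_seqlock_t *seqlock)
  {
          uint32_t sn;

          do {
                  sn = __atomic_load_n(&seqlock->sn, __ATOMIC_ACQUIRE);
          } while (sn & 1);

          return sn;
  }

  /* Returns true if the read-side critical section must be retried. */
  static inline bool
  seqlock_read_retry_even(const rte_seqlock_t *seqlock, uint32_t begin_sn)
  {
          uint32_t end_sn;

          /* order the data loads before the second sn load */
          rte_atomic_thread_fence(__ATOMIC_ACQUIRE);

          end_sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED);

          return begin_sn != end_sn;
  }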

The writer side behaves much more like a lock with mutual exclusion so 
write_lock/write_unlock makes sense.

> +
> +	if (unlikely(end_sn & 1 || *begin_sn != end_sn)) {
> +		*begin_sn = end_sn;
> +		return false;
> +	}
> +
> +	return true;
> +}
> +
  
Honnappa Nagarahalli April 2, 2022, 7:31 p.m. UTC | #4
<snip>

> > +__rte_experimental
> > +static inline bool
> > +rte_seqlock_read_tryunlock(const rte_seqlock_t *seqlock, uint32_t
> > +*begin_sn) {
> > +	uint32_t end_sn;
> > +
> > +	/* make sure the data loads happens before the sn load */
> > +	rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
> > +
> > +	end_sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED);
> 
> Since we are reading and potentially returning the sequence number here
> (repeating the read of the protected data), we need to use load-acquire.
> I assume it is not expected that the user will call
> rte_seqlock_read_lock() again.
Good point, we need a load-acquire (due to changes done in v3).

> 
> Seeing this implementation, I might actually prefer the original
> implementation, I think it is cleaner. But I would like for the begin function
> also to wait for an even sequence number, the end function would only have
> to check for same sequence number, this might improve performance a little
> bit as readers won't perform one or several broken reads while a write is in
> progress. The function names are a different thing though.
I think we need to be optimizing for the case where there is no contention between readers and writers (as that happens most of the time). From this perspective, not checking for an even seq number in the begin function would save one 'if' statement.

Going back to the earlier model is better as well, because of the load-acquire required in the 'rte_seqlock_read_tryunlock' function. The earlier model would not introduce the load-acquire for the no contention case.

> 
> The writer side behaves much more like a lock with mutual exclusion so
> write_lock/write_unlock makes sense.
> 
> > +
> > +	if (unlikely(end_sn & 1 || *begin_sn != end_sn)) {
> > +		*begin_sn = end_sn;
> > +		return false;
> > +	}
> > +
> > +	return true;
> > +}
> > +
  
Honnappa Nagarahalli April 2, 2022, 7:38 p.m. UTC | #5
<snip>

> 
> > > +__rte_experimental
> > > +static inline bool
> > > +rte_seqlock_read_tryunlock(const rte_seqlock_t *seqlock, uint32_t
> > > +*begin_sn) {
> 
> Did anyone object to adding the __attribute__((warn_unused_result))?
> 
> Otherwise, I think you should add it.
+1
  
Morten Brørup April 2, 2022, 8:36 p.m. UTC | #6
> From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
> Sent: Saturday, 2 April 2022 21.31
> 
> <snip>
> 
> > > +__rte_experimental
> > > +static inline bool
> > > +rte_seqlock_read_tryunlock(const rte_seqlock_t *seqlock, uint32_t
> > > +*begin_sn) {
> > > +	uint32_t end_sn;
> > > +
> > > +	/* make sure the data loads happens before the sn load */
> > > +	rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
> > > +
> > > +	end_sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED);
> >
> > Since we are reading and potentially returning the sequence number
> here
> > (repeating the read of the protected data), we need to use load-
> acquire.
> > I assume it is not expected that the user will call
> > rte_seqlock_read_lock() again.
> Good point, we need a load-acquire (due to changes done in v3).
> 
> >
> > Seeing this implementation, I might actually prefer the original
> > implementation, I think it is cleaner. But I would like for the begin
> function
> > also to wait for an even sequence number, the end function would only
> have
> > to check for same sequence number, this might improve performance a
> little
> > bit as readers won't perform one or several broken reads while a
> write is in
> > progress. The function names are a different thing though.
> I think we need to be optimizing for the case where there is no
> contention between readers and writers (as that happens most of the
> time). From this perspective, not checking for an even seq number in
> the begin function would reduce one 'if' statement.

I might be siding with Ola on this, but with a twist: The read_lock() should not wait, but test. (Or both variants could be available. Or all three, including the variant without checking for an even sequence number.)

My argument for this is: The write operation could take a long time to complete, and while this goes on, it is good for the reading threads to know at entry of their critical read section that the read operation will fail, so they can take the alternative code path instead of proceeding into the critical read section. Otherwise, the reading threads have to waste time reading the protected data, only to discard them at the end. It's an optimization based on the assumption that reading the protected data has some small cost, because this small cost adds up if done many times during a longwinded write operation.

And, although checking for the sequence number in read_trylock() adds an 'if' statement to it, that 'if' statement should be surrounded by likely() to reduce its cost in the case we are optimizing for, i.e. when no write operation is ongoing.

This means that read_trylock() returns a boolean, and the sequence number is returned in an output parameter.

Please note that it doesn't change the fact that read_tryunlock() can still fail, even though read_trylock() gave the go-ahead.

I'm trying to highlight that while we all agree to optimize for the case of reading while no writing is ongoing, there might be opportunity for optimizing for the opposite case (i.e. trying to read while writing is ongoing) at the same time.

I only hope it can be done with negligible performance cost for the primary case.

I'll respectfully leave the hardcore implementation details and performance considerations to you experts in this area. :-)
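
For illustration, a hedged sketch of the kind of API described above. The
function is hypothetical and not part of the patch:

  /* Hypothetical non-waiting read-side begin function: reports whether a
   * write is known to be in progress, and returns the sequence number
   * through an output parameter. A later tryunlock call may still fail if
   * a write starts in the meantime.
   */
  static inline bool
  seqlock_read_trylock(const rte_seqlock_t *seqlock, uint32_t *begin_sn)
  {
          uint32_t sn = __atomic_load_n(&seqlock->sn, __ATOMIC_ACQUIRE);

          *begin_sn = sn;

          /* even sn: no write in progress, go ahead and read the data */
          return likely((sn & 1) == 0);
  }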
  
Honnappa Nagarahalli April 2, 2022, 10:01 p.m. UTC | #7
<snip>
> >
> > > > +__rte_experimental
> > > > +static inline bool
> > > > +rte_seqlock_read_tryunlock(const rte_seqlock_t *seqlock, uint32_t
> > > > +*begin_sn) {
> > > > +	uint32_t end_sn;
> > > > +
> > > > +	/* make sure the data loads happens before the sn load */
> > > > +	rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
> > > > +
> > > > +	end_sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED);
> > >
> > > Since we are reading and potentially returning the sequence number
> > here
> > > (repeating the read of the protected data), we need to use load-
> > acquire.
> > > I assume it is not expected that the user will call
> > > rte_seqlock_read_lock() again.
> > Good point, we need a load-acquire (due to changes done in v3).
> >
> > >
> > > Seeing this implementation, I might actually prefer the original
> > > implementation, I think it is cleaner. But I would like for the
> > > begin
> > function
> > > also to wait for an even sequence number, the end function would
> > > only
> > have
> > > to check for same sequence number, this might improve performance a
> > little
> > > bit as readers won't perform one or several broken reads while a
> > write is in
> > > progress. The function names are a different thing though.
> > I think we need to be optimizing for the case where there is no
> > contention between readers and writers (as that happens most of the
> > time). From this perspective, not checking for an even seq number in
> > the begin function would reduce one 'if' statement.
> 
> I might be siding with Ola on this, but with a twist: The read_lock() should not
> wait, but test. (Or both variants could be available. Or all three, including the
> variant without checking for an even sequence number.)
> 
> My argument for this is: The write operation could take a long time to
> complete, and while this goes on, it is good for the reading threads to know at
> entry of their critical read section that the read operation will fail, so they can
> take the alternative code path instead of proceeding into the critical read
> section. Otherwise, the reading threads have to waste time reading the
> protected data, only to discard them at the end. It's an optimization based on
> the assumption that reading the protected data has some small cost, because
> this small cost adds up if done many times during a longwinded write
> operation.
> 
> And, although checking for the sequence number in read_trylock() adds an 'if'
> statement to it, that 'if' statement should be surrounded by likely() to reduce
> its cost in the case we are optimizing for, i.e. when no write operation is
> ongoing.
This 'if' statement can be part of the application code as well. This would allow for multiple models to exist.

> 
> This means that read_trylock() returns a boolean, and the sequence number is
> returned in an output parameter.
> 
> Please note that it doesn't change the fact that read_tryunlock() can still fail,
> even though read_trylock() gave the go-ahead.
> 
> I'm trying to highlight that while we all agree to optimize for the case of
> reading while no writing is ongoing, there might be opportunity for optimizing
> for the opposite case (i.e. trying to read while writing is ongoing) at the same
> time.
> 
> I only hope it can be done with negligent performance cost for the primary
> case.
> 
> I'll respectfully leave the hardcore implementation details and performance
> considerations to you experts in this area. :-)
  
Mattias Rönnblom April 3, 2022, 6:10 a.m. UTC | #8
On 2022-04-02 02:21, Honnappa Nagarahalli wrote:
> Hi Mattias,
> 	Few comments inline.
> 
>> -----Original Message-----
>> From: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
>> Sent: Friday, April 1, 2022 10:08 AM
>> To: dev@dpdk.org
>> Cc: thomas@monjalon.net; David Marchand <david.marchand@redhat.com>;
>> onar.olsen@ericsson.com; Honnappa Nagarahalli
>> <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>;
>> konstantin.ananyev@intel.com; mb@smartsharesystems.com;
>> stephen@networkplumber.org; Mattias Rönnblom
>> <mattias.ronnblom@ericsson.com>; Ola Liljedahl <Ola.Liljedahl@arm.com>
>> Subject: [PATCH v3] eal: add seqlock
>>
>> A sequence lock (seqlock) is synchronization primitive which allows for data-
>> race free, low-overhead, high-frequency reads, especially for data structures
>> shared across many cores and which are updated with relatively infrequently.
>>
>> A seqlock permits multiple parallel readers. The variant of seqlock implemented
>> in this patch supports multiple writers as well. A spinlock is used for writer-
>> writer serialization.
>>
>> To avoid resource reclamation and other issues, the data protected by a seqlock
>> is best off being self-contained (i.e., no pointers [except to constant data]).
>>
>> One way to think about seqlocks is that they provide means to perform atomic
>> operations on data objects larger what the native atomic machine instructions
>> allow for.
>>
>> DPDK seqlocks are not preemption safe on the writer side. A thread preemption
>> affects performance, not correctness.
>>
>> A seqlock contains a sequence number, which can be thought of as the
>> generation of the data it protects.
>>
>> A reader will
>>    1. Load the sequence number (sn).
>>    2. Load, in arbitrary order, the seqlock-protected data.
>>    3. Load the sn again.
>>    4. Check if the first and second sn are equal, and even numbered.
>>       If they are not, discard the loaded data, and restart from 1.
>>
>> The first three steps need to be ordered using suitable memory fences.
>>
>> A writer will
>>    1. Take the spinlock, to serialize writer access.
>>    2. Load the sn.
>>    3. Store the original sn + 1 as the new sn.
>>    4. Perform load and stores to the seqlock-protected data.
>>    5. Store the original sn + 2 as the new sn.
>>    6. Release the spinlock.
>>
>> Proper memory fencing is required to make sure the first sn store, the data
>> stores, and the second sn store appear to the reader in the mentioned order.
>>
>> The sn loads and stores must be atomic, but the data loads and stores need not
>> be.
>>
>> The original seqlock design and implementation was done by Stephen
>> Hemminger. This is an independent implementation, using C11 atomics.
>>
>> For more information on seqlocks, see
>> https://en.wikipedia.org/wiki/Seqlock
>>
>> PATCH v3:
>>    * Renamed both read and write-side critical section begin/end functions
>>      to better match rwlock naming, per Ola Liljedahl's suggestion.
>>    * Added 'extern "C"' guards for C++ compatibility.
>>    * Refer to the main lcore as the main, and not the master.
>>
>> PATCH v2:
>>    * Skip instead of fail unit test in case too few lcores are available.
>>    * Use main lcore for testing, reducing the minimum number of lcores
>>      required to run the unit tests to four.
>>    * Consistently refer to sn field as the "sequence number" in the
>>      documentation.
>>    * Fixed spelling mistakes in documentation.
>>
>> Updates since RFC:
>>    * Added API documentation.
>>    * Added link to Wikipedia article in the commit message.
>>    * Changed seqlock sequence number field from uint64_t (which was
>>      overkill) to uint32_t. The sn type needs to be sufficiently large
>>      to assure no reader will read a sn, access the data, and then read
>>      the same sn, but the sn has been updated to many times during the
>>      read, so it has wrapped.
>>    * Added RTE_SEQLOCK_INITIALIZER macro for static initialization.
>>    * Removed the rte_seqlock struct + separate rte_seqlock_t typedef
>>      with an anonymous struct typedef:ed to rte_seqlock_t.
>>
>> Acked-by: Morten Brørup <mb@smartsharesystems.com>
>> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
>> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
>> ---
>>   app/test/meson.build          |   2 +
>>   app/test/test_seqlock.c       | 202 +++++++++++++++++++++++
>>   lib/eal/common/meson.build    |   1 +
>>   lib/eal/common/rte_seqlock.c  |  12 ++
>>   lib/eal/include/meson.build   |   1 +
>>   lib/eal/include/rte_seqlock.h | 302 ++++++++++++++++++++++++++++++++++
>>   lib/eal/version.map           |   3 +
>>   7 files changed, 523 insertions(+)
>>   create mode 100644 app/test/test_seqlock.c  create mode 100644
>> lib/eal/common/rte_seqlock.c  create mode 100644
>> lib/eal/include/rte_seqlock.h
>>
>> diff --git a/app/test/meson.build b/app/test/meson.build index
>> 5fc1dd1b7b..5e418e8766 100644
>> --- a/app/test/meson.build
>> +++ b/app/test/meson.build
>> @@ -125,6 +125,7 @@ test_sources = files(
>>           'test_rwlock.c',
>>           'test_sched.c',
>>           'test_security.c',
>> +        'test_seqlock.c',
>>           'test_service_cores.c',
>>           'test_spinlock.c',
>>           'test_stack.c',
>> @@ -214,6 +215,7 @@ fast_tests = [
>>           ['rwlock_rde_wro_autotest', true],
>>           ['sched_autotest', true],
>>           ['security_autotest', false],
>> +        ['seqlock_autotest', true],
>>           ['spinlock_autotest', true],
>>           ['stack_autotest', false],
>>           ['stack_lf_autotest', false],
>> diff --git a/app/test/test_seqlock.c b/app/test/test_seqlock.c new file mode
>> 100644 index 0000000000..54fadf8025
>> --- /dev/null
>> +++ b/app/test/test_seqlock.c
>> @@ -0,0 +1,202 @@
>> +/* SPDX-License-Identifier: BSD-3-Clause
>> + * Copyright(c) 2022 Ericsson AB
>> + */
>> +
>> +#include <rte_seqlock.h>
>> +
>> +#include <rte_cycles.h>
>> +#include <rte_malloc.h>
>> +#include <rte_random.h>
>> +
>> +#include <inttypes.h>
>> +
>> +#include "test.h"
>> +
>> +struct data {
>> +	rte_seqlock_t lock;
>> +
>> +	uint64_t a;
>> +	uint64_t b __rte_cache_aligned;
>> +	uint64_t c __rte_cache_aligned;
>> +} __rte_cache_aligned;
>> +
>> +struct reader {
>> +	struct data *data;
>> +	uint8_t stop;
>> +};
>> +
>> +#define WRITER_RUNTIME (2.0) /* s */
>> +
>> +#define WRITER_MAX_DELAY (100) /* us */
>> +
>> +#define INTERRUPTED_WRITER_FREQUENCY (1000) #define
>> +WRITER_INTERRUPT_TIME (1) /* us */
>> +
>> +static int
>> +writer_run(void *arg)
>> +{
>> +	struct data *data = arg;
>> +	uint64_t deadline;
>> +
>> +	deadline = rte_get_timer_cycles() +
>> +		WRITER_RUNTIME * rte_get_timer_hz();
>> +
>> +	while (rte_get_timer_cycles() < deadline) {
>> +		bool interrupted;
>> +		uint64_t new_value;
>> +		unsigned int delay;
>> +
>> +		new_value = rte_rand();
>> +
>> +		interrupted =
>> rte_rand_max(INTERRUPTED_WRITER_FREQUENCY) == 0;
>> +
>> +		rte_seqlock_write_lock(&data->lock);
>> +
>> +		data->c = new_value;
>> +
>> +		/* These compiler barriers (both on the test reader
>> +		 * and the test writer side) are here to ensure that
>> +		 * loads/stores *usually* happen in test program order
>> +		 * (always on a TSO machine). They are arrange in such
>> +		 * a way that the writer stores in a different order
>> +		 * than the reader loads, to emulate an arbitrary
>> +		 * order. A real application using a seqlock does not
>> +		 * require any compiler barriers.
>> +		 */
>> +		rte_compiler_barrier();
> The compiler barriers are not sufficient on all architectures (if the intention is to maintain the program order).
> 

The intention is what is described in the comment (i.e., to make it 
likely, but not guaranteed, that the stores will be globally visible in 
the program order).

The reason I didn't put in a release memory barrier, was that it seems a 
little intrusive.

Maybe I should remove these compiler barriers. They are also intrusive 
in that they may prevent some compiler optimizations that could expose 
a seqlock bug. Or, I could have two variants of the tests. I don't know.

>> +		data->b = new_value;
>> +
>> +		if (interrupted)
>> +			rte_delay_us_block(WRITER_INTERRUPT_TIME);
>> +
>> +		rte_compiler_barrier();
>> +		data->a = new_value;
>> +
>> +		rte_seqlock_write_unlock(&data->lock);
>> +
>> +		delay = rte_rand_max(WRITER_MAX_DELAY);
>> +
>> +		rte_delay_us_block(delay);
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +#define INTERRUPTED_READER_FREQUENCY (1000) #define
>> +READER_INTERRUPT_TIME (1000) /* us */
>> +
>> +static int
>> +reader_run(void *arg)
>> +{
>> +	struct reader *r = arg;
>> +	int rc = 0;
>> +
>> +	while (__atomic_load_n(&r->stop, __ATOMIC_RELAXED) == 0 && rc ==
>> 0) {
>> +		struct data *data = r->data;
>> +		bool interrupted;
>> +		uint64_t a;
>> +		uint64_t b;
>> +		uint64_t c;
>> +		uint32_t sn;
>> +
>> +		interrupted =
>> rte_rand_max(INTERRUPTED_READER_FREQUENCY) == 0;
>> +
>> +		sn = rte_seqlock_read_lock(&data->lock);
>> +
>> +		do {
>> +			a = data->a;
>> +			/* See writer_run() for an explanation why
>> +			 * these barriers are here.
>> +			 */
>> +			rte_compiler_barrier();
>> +
>> +			if (interrupted)
>> +
>> 	rte_delay_us_block(READER_INTERRUPT_TIME);
>> +
>> +			c = data->c;
>> +
>> +			rte_compiler_barrier();
>> +			b = data->b;
>> +
>> +		} while (!rte_seqlock_read_tryunlock(&data->lock, &sn));
>> +
>> +		if (a != b || b != c) {
>> +			printf("Reader observed inconsistent data values "
>> +			       "%" PRIu64 " %" PRIu64 " %" PRIu64 "\n",
>> +			       a, b, c);
>> +			rc = -1;
>> +		}
>> +	}
>> +
>> +	return rc;
>> +}
>> +
>> +static void
>> +reader_stop(struct reader *reader)
>> +{
>> +	__atomic_store_n(&reader->stop, 1, __ATOMIC_RELAXED); }
>> +
>> +#define NUM_WRITERS (2) /* main lcore + one worker */ #define
>> +MIN_NUM_READERS (2) #define MAX_READERS (RTE_MAX_LCORE -
>> NUM_WRITERS -
>> +1) #define MIN_LCORE_COUNT (NUM_WRITERS + MIN_NUM_READERS)
>> +
>> +/* Only a compile-time test */
>> +static rte_seqlock_t __rte_unused static_init_lock =
>> +RTE_SEQLOCK_INITIALIZER;
>> +
>> +static int
>> +test_seqlock(void)
>> +{
>> +	struct reader readers[MAX_READERS];
>> +	unsigned int num_readers;
>> +	unsigned int num_lcores;
>> +	unsigned int i;
>> +	unsigned int lcore_id;
>> +	unsigned int reader_lcore_ids[MAX_READERS];
>> +	unsigned int worker_writer_lcore_id = 0;
>> +	int rc = 0;
>> +
>> +	num_lcores = rte_lcore_count();
>> +
>> +	if (num_lcores < MIN_LCORE_COUNT) {
>> +		printf("Too few cores to run test. Skipping.\n");
>> +		return 0;
>> +	}
>> +
>> +	num_readers = num_lcores - NUM_WRITERS;
>> +
>> +	struct data *data = rte_zmalloc(NULL, sizeof(struct data), 0);
>> +
>> +	i = 0;
>> +	RTE_LCORE_FOREACH_WORKER(lcore_id) {
>> +		if (i == 0) {
>> +			rte_eal_remote_launch(writer_run, data, lcore_id);
>> +			worker_writer_lcore_id = lcore_id;
>> +		} else {
>> +			unsigned int reader_idx = i - 1;
>> +			struct reader *reader = &readers[reader_idx];
>> +
>> +			reader->data = data;
>> +			reader->stop = 0;
>> +
>> +			rte_eal_remote_launch(reader_run, reader, lcore_id);
>> +			reader_lcore_ids[reader_idx] = lcore_id;
>> +		}
>> +		i++;
>> +	}
>> +
>> +	if (writer_run(data) != 0 ||
>> +	    rte_eal_wait_lcore(worker_writer_lcore_id) != 0)
>> +		rc = -1;
>> +
>> +	for (i = 0; i < num_readers; i++) {
>> +		reader_stop(&readers[i]);
>> +		if (rte_eal_wait_lcore(reader_lcore_ids[i]) != 0)
>> +			rc = -1;
>> +	}
>> +
>> +	return rc;
>> +}
>> +
>> +REGISTER_TEST_COMMAND(seqlock_autotest, test_seqlock);
>> diff --git a/lib/eal/common/meson.build b/lib/eal/common/meson.build index
>> 917758cc65..a41343bfed 100644
>> --- a/lib/eal/common/meson.build
>> +++ b/lib/eal/common/meson.build
>> @@ -35,6 +35,7 @@ sources += files(
>>           'rte_malloc.c',
>>           'rte_random.c',
>>           'rte_reciprocal.c',
>> +	'rte_seqlock.c',
>>           'rte_service.c',
>>           'rte_version.c',
>>   )
>> diff --git a/lib/eal/common/rte_seqlock.c b/lib/eal/common/rte_seqlock.c new
>> file mode 100644 index 0000000000..d4fe648799
>> --- /dev/null
>> +++ b/lib/eal/common/rte_seqlock.c
>> @@ -0,0 +1,12 @@
>> +/* SPDX-License-Identifier: BSD-3-Clause
>> + * Copyright(c) 2022 Ericsson AB
>> + */
>> +
>> +#include <rte_seqlock.h>
>> +
>> +void
>> +rte_seqlock_init(rte_seqlock_t *seqlock) {
>> +	seqlock->sn = 0;
>> +	rte_spinlock_init(&seqlock->lock);
>> +}
>> diff --git a/lib/eal/include/meson.build b/lib/eal/include/meson.build index
>> 9700494816..48df5f1a21 100644
>> --- a/lib/eal/include/meson.build
>> +++ b/lib/eal/include/meson.build
>> @@ -36,6 +36,7 @@ headers += files(
>>           'rte_per_lcore.h',
>>           'rte_random.h',
>>           'rte_reciprocal.h',
>> +        'rte_seqlock.h',
>>           'rte_service.h',
>>           'rte_service_component.h',
>>           'rte_string_fns.h',
>> diff --git a/lib/eal/include/rte_seqlock.h b/lib/eal/include/rte_seqlock.h new file
> Other lock implementations are in lib/eal/include/generic.
> 
>> mode 100644 index 0000000000..44eacd66e8
>> --- /dev/null
>> +++ b/lib/eal/include/rte_seqlock.h
>> @@ -0,0 +1,302 @@
>> +/* SPDX-License-Identifier: BSD-3-Clause
>> + * Copyright(c) 2022 Ericsson AB
>> + */
>> +
>> +#ifndef _RTE_SEQLOCK_H_
>> +#define _RTE_SEQLOCK_H_
>> +
>> +#ifdef __cplusplus
>> +extern "C" {
>> +#endif
>> +
>> +/**
>> + * @file
>> + * RTE Seqlock
>> + *
>> + * A sequence lock (seqlock) is a synchronization primitive allowing
>> + * multiple, parallel, readers to efficiently and safely (i.e., in a
>> + * data-race free manner) access the lock-protected data. The RTE
>> + * seqlock permits multiple writers as well. A spinlock is used to
>> + * writer-writer synchronization.
>> + *
>> + * A reader never blocks a writer. Very high frequency writes may
>> + * prevent readers from making progress.
>> + *
>> + * A seqlock is not preemption-safe on the writer side. If a writer is
>> + * preempted, it may block readers until the writer thread is again
>> + * allowed to execute. Heavy computations should be kept out of the
>> + * writer-side critical section, to avoid delaying readers.
>> + *
>> + * Seqlocks are useful for data which are read by many cores, at a
>> + * high frequency, and relatively infrequently written to.
>> + *
>> + * One way to think about seqlocks is that they provide means to
>> + * perform atomic operations on objects larger than what the native
>> + * machine instructions allow for.
>> + *
>> + * To avoid resource reclamation issues, the data protected by a
>> + * seqlock should typically be kept self-contained (e.g., no pointers
>> + * to mutable, dynamically allocated data).
>> + *
>> + * Example usage:
>> + * @code{.c}
>> + * #define MAX_Y_LEN (16)
>> + * // Application-defined example data structure, protected by a seqlock.
>> + * struct config {
>> + *         rte_seqlock_t lock;
>> + *         int param_x;
>> + *         char param_y[MAX_Y_LEN];
>> + * };
>> + *
>> + * // Accessor function for reading config fields.
>> + * void
>> + * config_read(const struct config *config, int *param_x, char *param_y)
>> + * {
>> + *         // Temporary variables, just to improve readability.
> I think the above comment is not necessary. It is beneficial to copy the protected data to keep the read side critical section small.
> 
>> + *         int tentative_x;
>> + *         char tentative_y[MAX_Y_LEN];
>> + *         uint32_t sn;
>> + *
>> + *         sn = rte_seqlock_read_lock(&config->lock);
>> + *         do {
>> + *                 // Loads may be atomic or non-atomic, as in this example.
>> + *                 tentative_x = config->param_x;
>> + *                 strcpy(tentative_y, config->param_y);
>> + *         } while (!rte_seqlock_read_tryunlock(&config->lock, &sn));
>> + *         // An application could skip retrying, and try again later, if
>> + *         // progress is possible without the data.
>> + *
>> + *         *param_x = tentative_x;
>> + *         strcpy(param_y, tentative_y);
>> + * }
>> + *
>> + * // Accessor function for writing config fields.
>> + * void
>> + * config_update(struct config *config, int param_x, const char *param_y)
>> + * {
>> + *         rte_seqlock_write_lock(&config->lock);
>> + *         // Stores may be atomic or non-atomic, as in this example.
>> + *         config->param_x = param_x;
>> + *         strcpy(config->param_y, param_y);
>> + *         rte_seqlock_write_unlock(&config->lock);
>> + * }
>> + * @endcode
>> + *
>> + * @see
>> + * https://en.wikipedia.org/wiki/Seqlock.
>> + */
>> +
>> +#include <stdbool.h>
>> +#include <stdint.h>
>> +
>> +#include <rte_atomic.h>
>> +#include <rte_branch_prediction.h>
>> +#include <rte_spinlock.h>
>> +
>> +/**
>> + * The RTE seqlock type.
>> + */
>> +typedef struct {
>> +	uint32_t sn; /**< A sequence number for the protected data. */
>> +	rte_spinlock_t lock; /**< Spinlock used to serialize writers. */
> Suggest using ticket lock for the writer side. It should have low overhead when there is a single writer, but provides better functionality when there are multiple writers.
> 
>> +} rte_seqlock_t;
>> +
>> +/**
>> + * A static seqlock initializer.
>> + */
>> +#define RTE_SEQLOCK_INITIALIZER { 0, RTE_SPINLOCK_INITIALIZER }
>> +
>> +/**
>> + * Initialize the seqlock.
>> + *
>> + * This function initializes the seqlock, and leaves the writer-side
>> + * spinlock unlocked.
>> + *
>> + * @param seqlock
>> + *   A pointer to the seqlock.
>> + */
>> +__rte_experimental
>> +void
>> +rte_seqlock_init(rte_seqlock_t *seqlock);
>> +
>> +/**
>> + * Begin a read-side critical section.
>> + *
>> + * A call to this function marks the beginning of a read-side critical
>> + * section, for @p seqlock.
>> + *
>> + * rte_seqlock_read_lock() returns a sequence number, which is later
>> + * used in rte_seqlock_read_tryunlock() to check if the protected data
>> + * underwent any modifications during the read transaction.
>> + *
>> + * After (in program order) rte_seqlock_read_lock() has been called,
>> + * the calling thread reads the protected data, for later use. The
>> + * protected data read *must* be copied (either in pristine form, or
>> + * in the form of some derivative), since the caller may only read the
>> + * data from within the read-side critical section (i.e., after
>> + * rte_seqlock_read_lock() and before rte_seqlock_read_tryunlock()),
>> + * but must not act upon the retrieved data while in the critical
>> + * section, since it does not yet know if it is consistent.
>> + *
>> + * The protected data may be read using atomic and/or non-atomic
>> + * operations.
>> + *
>> + * After (in program order) all required data loads have been
>> + * performed, rte_seqlock_read_tryunlock() should be called, marking
>> + * the end of the read-side critical section.
>> + *
>> + * If rte_seqlock_read_tryunlock() returns true, the data was read
>> + * atomically and the copied data is consistent.
>> + *
>> + * If rte_seqlock_read_tryunlock() returns false, the just-read data
>> + * is inconsistent and should be discarded. The caller has the option
>> + * to either re-read the data and call rte_seqlock_read_tryunlock()
>> + * again, or to restart the whole procedure (i.e., from
>> + * rte_seqlock_read_lock()) at some later time.
>> + *
>> + * @param seqlock
>> + *   A pointer to the seqlock.
>> + * @return
>> + *   The seqlock sequence number for this critical section, to
>> + *   later be passed to rte_seqlock_read_tryunlock().
>> + *
>> + * @see rte_seqlock_read_tryunlock()
>> + */
>> +__rte_experimental
>> +static inline uint32_t
>> +rte_seqlock_read_lock(const rte_seqlock_t *seqlock) {
>> +	/* __ATOMIC_ACQUIRE to prevent loads after (in program order)
>> +	 * from happening before the sn load. Synchronizes-with the
>> +	 * store release in rte_seqlock_write_unlock().
>> +	 */
>> +	return __atomic_load_n(&seqlock->sn, __ATOMIC_ACQUIRE);
>> +}
>> +
>> +/**
>> + * End a read-side critical section.
>> + *
>> + * A call to this function marks the end of a read-side critical
> Should we capture that it also begins a new critical-section for the subsequent calls to rte_seqlock_tryunlock()?
> 
>> + * section, for @p seqlock. The application must supply the sequence
>> + * number produced by the corresponding rte_seqlock_read_lock() (or,
>> + * in case of a retry, the rte_seqlock_read_tryunlock()) call.
>> + *
>> + * After this function has been called, the caller should not access
>> + * the protected data.
> I understand what you mean here. But, I think this needs clarity.
> In the documentation for rte_seqlock_read_lock() you have mentioned, if rte_seqlock_read_tryunlock() returns false, one could re-read the data.
> May be this should be changed to:
> " After this function returns true, the caller should not access the protected data."?
> Or may be combine it with the following para.
> 
>> + *
>> + * In case this function returns true, the just-read data was
>> + * consistent and the set of atomic and non-atomic load operations
>> + * performed between rte_seqlock_read_lock() and
>> + * rte_seqlock_read_tryunlock() were atomic, as a whole.
>> + *
>> + * In case rte_seqlock_read_tryunlock() returns false, the data was
>> + * modified as it was being read and may be inconsistent, and thus
>> + * should be discarded. The @p begin_sn is updated with the
>> + * now-current sequence number.
> May be
> " The @p begin_sn is updated with the sequence number for the next critical section."
> 

Sounds good.

>> + *
>> + * @param seqlock
>> + *   A pointer to the seqlock.
>> + * @param begin_sn
>> + *   The seqlock sequence number returned by
>> + *   rte_seqlock_read_lock() (potentially updated in subsequent
>> + *   rte_seqlock_read_tryunlock() calls) for this critical section.
>> + * @return
>> + *   true or false, if the just-read seqlock-protected data was consistent
>> + *   or inconsistent, respectively, at the time it was read.
> true - just read protected data was consistent
> false - just read protected data was inconsistent
> 
>> + *
>> + * @see rte_seqlock_read_lock()
>> + */
>> +__rte_experimental
>> +static inline bool
>> +rte_seqlock_read_tryunlock(const rte_seqlock_t *seqlock, uint32_t *begin_sn)
>> +{
>> +	uint32_t end_sn;
>> +
>> +	/* make sure the data loads happen before the sn load */
>> +	rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
>> +
>> +	end_sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED);
>> +
>> +	if (unlikely(end_sn & 1 || *begin_sn != end_sn)) {
>> +		*begin_sn = end_sn;
>> +		return false;
>> +	}
>> +
>> +	return true;
>> +}
>> +
>> +/**
>> + * Begin a write-side critical section.
>> + *
>> + * A call to this function acquires the write lock associated with
>> + * @p seqlock, and marks the beginning of a write-side critical section.
>> + *
>> + * After having called this function, the caller may go on to modify
>> + * (both read and write) the protected data, in an atomic or
>> + * non-atomic manner.
>> + *
>> + * After the necessary updates have been performed, the application
>> + * calls rte_seqlock_write_unlock().
>> + *
>> + * This function is not preemption-safe in the sense that preemption
>> + * of the calling thread may block reader progress until the writer
>> + * thread is rescheduled.
>> + *
>> + * Unlike rte_seqlock_read_lock(), each call made to
>> + * rte_seqlock_write_lock() must be matched with an unlock call.
>> + *
>> + * @param seqlock
>> + *   A pointer to the seqlock.
>> + *
>> + * @see rte_seqlock_write_unlock()
>> + */
>> +__rte_experimental
>> +static inline void
>> +rte_seqlock_write_lock(rte_seqlock_t *seqlock) {
>> +	uint32_t sn;
>> +
>> +	/* to synchronize with other writers */
>> +	rte_spinlock_lock(&seqlock->lock);
>> +
>> +	sn = seqlock->sn + 1;
> The load of seqlock->sn could use __atomic_load_n to be consistent.
> 

But why? I know it doesn't have any cost (these loads are going to be 
atomic anyways), but why use a construct with stronger guarantees than 
you have to?

>> +
>> +	__atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELAXED);
>> +
>> +	/* __ATOMIC_RELEASE to prevent stores after (in program order)
>> +	 * from happening before the sn store.
>> +	 */
>> +	rte_atomic_thread_fence(__ATOMIC_RELEASE);
>> +}
>> +
>> +/**
>> + * End a write-side critical section.
>> + *
>> + * A call to this function marks the end of the write-side critical
>> + * section, for @p seqlock. After this call has been made, the protected
>> + * data may no longer be modified.
>> + *
>> + * @param seqlock
>> + *   A pointer to the seqlock.
>> + *
>> + * @see rte_seqlock_write_lock()
>> + */
>> +__rte_experimental
>> +static inline void
>> +rte_seqlock_write_unlock(rte_seqlock_t *seqlock) {
>> +	uint32_t sn;
>> +
>> +	sn = seqlock->sn + 1;
> Same here, the load of seqlock->sn could use __atomic_load_n
> 
>> +
>> +	/* synchronizes-with the load acquire in rte_seqlock_read_lock() */
>> +	__atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELEASE);
>> +
>> +	rte_spinlock_unlock(&seqlock->lock);
>> +}
>> +
>> +#ifdef __cplusplus
>> +}
>> +#endif
>> +
>> +#endif  /* _RTE_SEQLOCK_H_ */
>> diff --git a/lib/eal/version.map b/lib/eal/version.map
>> index b53eeb30d7..4a9d0ed899 100644
>> --- a/lib/eal/version.map
>> +++ b/lib/eal/version.map
>> @@ -420,6 +420,9 @@ EXPERIMENTAL {
>>   	rte_intr_instance_free;
>>   	rte_intr_type_get;
>>   	rte_intr_type_set;
>> +
>> +	# added in 22.07
>> +	rte_seqlock_init;
>>   };
>>
>>   INTERNAL {
>> --
>> 2.25.1
>
  
Mattias Rönnblom April 3, 2022, 6:33 a.m. UTC | #9
I missed some of your comments.

On 2022-04-02 02:21, Honnappa Nagarahalli wrote:

<snip>

>> + * Example usage:
>> + * @code{.c}
>> + * #define MAX_Y_LEN (16)
>> + * // Application-defined example data structure, protected by a seqlock.
>> + * struct config {
>> + *         rte_seqlock_t lock;
>> + *         int param_x;
>> + *         char param_y[MAX_Y_LEN];
>> + * };
>> + *
>> + * // Accessor function for reading config fields.
>> + * void
>> + * config_read(const struct config *config, int *param_x, char
>> +*param_y)
>> + * {
>> + *         // Temporary variables, just to improve readability.
> I think the above comment is not necessary. It is beneficial to copy the protected data to keep the read side critical section small.
> 

The data here would be copied into the buffers supplied by config_read() 
anyways, so it's a copy regardless.

>> + *         int tentative_x;
>> + *         char tentative_y[MAX_Y_LEN];
>> + *         uint32_t sn;
>> + *
>> + *         sn = rte_seqlock_read_lock(&config->lock);
>> + *         do {
>> + *                 // Loads may be atomic or non-atomic, as in this example.
>> + *                 tentative_x = config->param_x;
>> + *                 strcpy(tentative_y, config->param_y);
>> + *         } while (!rte_seqlock_read_tryunlock(&config->lock, &sn));
>> + *         // An application could skip retrying, and try again later, if
>> + *         // progress is possible without the data.
>> + *
>> + *         *param_x = tentative_x;
>> + *         strcpy(param_y, tentative_y);
>> + * }
>> + *
>> + * // Accessor function for writing config fields.
>> + * void
>> + * config_update(struct config *config, int param_x, const char
>> +*param_y)
>> + * {
>> + *         rte_seqlock_write_lock(&config->lock);
>> + *         // Stores may be atomic or non-atomic, as in this example.
>> + *         config->param_x = param_x;
>> + *         strcpy(config->param_y, param_y);
>> + *         rte_seqlock_write_unlock(&config->lock);
>> + * }
>> + * @endcode
>> + *
>> + * @see
>> + * https://en.wikipedia.org/wiki/Seqlock.
>> + */
>> +
>> +#include <stdbool.h>
>> +#include <stdint.h>
>> +
>> +#include <rte_atomic.h>
>> +#include <rte_branch_prediction.h>
>> +#include <rte_spinlock.h>
>> +
>> +/**
>> + * The RTE seqlock type.
>> + */
>> +typedef struct {
>> +	uint32_t sn; /**< A sequence number for the protected data. */
>> +	rte_spinlock_t lock; /**< Spinlock used to serialize writers.  */ }
> Suggest using ticket lock for the writer side. It should have low overhead when there is a single writer, but provides better functionality when there are multiple writers.
> 

Is a seqlock the synchronization primitive of choice for high-contention 
cases? I would say no, but I'm not sure what you would use instead.

<snip>
  
Mattias Rönnblom April 3, 2022, 6:51 a.m. UTC | #10
On 2022-04-02 20:15, Ola Liljedahl wrote:
> On 4/1/22 17:07, Mattias Rönnblom wrote:
>> +
>> +/**
>> + * End a read-side critical section.
>> + *
>> + * A call to this function marks the end of a read-side critical
>> + * section, for @p seqlock. The application must supply the sequence
>> + * number produced by the corresponding rte_seqlock_read_lock() (or,
>> + * in case of a retry, the rte_seqlock_tryunlock()) call.
>> + *
>> + * After this function has been called, the caller should not access
>> + * the protected data.
>> + *
>> + * In case this function returns true, the just-read data was
>> + * consistent and the set of atomic and non-atomic load operations
>> + * performed between rte_seqlock_read_lock() and
>> + * rte_seqlock_read_tryunlock() were atomic, as a whole.
>> + *
>> + * In case rte_seqlock_read_tryunlock() returns false, the data was
>> + * modified as it was being read and may be inconsistent, and thus
>> + * should be discarded. The @p begin_sn is updated with the
>> + * now-current sequence number.
>> + *
>> + * @param seqlock
>> + *   A pointer to the seqlock.
>> + * @param begin_sn
>> + *   The seqlock sequence number returned by
>> + *   rte_seqlock_read_lock() (potentially updated in subsequent
>> + *   rte_seqlock_read_tryunlock() calls) for this critical section.
>> + * @return
>> + *   true or false, if the just-read seqlock-protected data was 
>> consistent
>> + *   or inconsistent, respectively, at the time it was read.
>> + *
>> + * @see rte_seqlock_read_lock()
>> + */
>> +__rte_experimental
>> +static inline bool
>> +rte_seqlock_read_tryunlock(const rte_seqlock_t *seqlock, uint32_t 
>> *begin_sn)
>> +{
>> +    uint32_t end_sn;
>> +
>> +    /* make sure the data loads happen before the sn load */
>> +    rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
>> +
>> +    end_sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED);
> 
> Since we are reading and potentially returning the sequence number here 
> (repeating the read of the protected data), we need to use load-acquire. 
> I assume it is not expected that the user will call 
> rte_seqlock_read_lock() again.
> 

Good point.
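
For illustration, a sketch of the change being discussed (only the end-of-section
sequence number load changes, from relaxed to acquire), assuming the rest of
rte_seqlock.h stays as in v3:

static inline bool
rte_seqlock_read_tryunlock(const rte_seqlock_t *seqlock, uint32_t *begin_sn)
{
	uint32_t end_sn;

	/* make sure the data loads happen before the sn load */
	rte_atomic_thread_fence(__ATOMIC_ACQUIRE);

	/* Load-acquire, since on failure this sn is handed back as the
	 * begin_sn for the retry, and must order the re-read of the data.
	 */
	end_sn = __atomic_load_n(&seqlock->sn, __ATOMIC_ACQUIRE);

	if (unlikely(end_sn & 1 || *begin_sn != end_sn)) {
		*begin_sn = end_sn;
		return false;
	}

	return true;
}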

> Seeing this implementation, I might actually prefer the original 
> implementation, I think it is cleaner.

Me too.

> But I would like for the begin 
> function also to wait for an even sequence number, the end function 
> would only have to check for same sequence number, this might improve 
> performance a little bit as readers won't perform one or several broken 
> reads while a write is in progress. The function names are a different 
> thing though.
> 

The odd sn should be a rare case, if the seqlock is used for relatively 
low frequency update scenarios, which is what I think it should be 
designed for.

Waiting for an even sn in read_begin() would exclude the option for the 
caller to defer reading the new data to some later time, in case it's
being written. That in turn would force even a single writer to make 
sure its thread is not preempted, or risk blocking all lcore worker 
cores attempting to read the protected data.

You could complete the above API with a read_trybegin() function to 
address that issue, for those who care, but that would force some extra 
complexity on the user.
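
Purely as an illustration of that idea (no such function exists in the patch),
a hypothetical read_trybegin() could look roughly like this:

/* Hypothetical: returns false instead of handing back an odd (write in
 * progress) sequence number, so the caller may defer the read attempt.
 */
static inline bool
rte_seqlock_read_trybegin(const rte_seqlock_t *seqlock, uint32_t *begin_sn)
{
	uint32_t sn = __atomic_load_n(&seqlock->sn, __ATOMIC_ACQUIRE);

	if (sn & 1) /* write in progress */
		return false;

	*begin_sn = sn;
	return true;
}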

> The writer side behaves much more like a lock with mutual exclusion so 
> write_lock/write_unlock makes sense.
> 
>> +
>> +    if (unlikely(end_sn & 1 || *begin_sn != end_sn)) {
>> +        *begin_sn = end_sn;
>> +        return false;
>> +    }
>> +
>> +    return true;
>> +}
>> +
  
Honnappa Nagarahalli April 3, 2022, 5:27 p.m. UTC | #11
<snip>

> >> diff --git a/app/test/test_seqlock.c b/app/test/test_seqlock.c
> >> new file mode 100644
> >> index 0000000000..54fadf8025
> >> --- /dev/null
> >> +++ b/app/test/test_seqlock.c
> >> @@ -0,0 +1,202 @@
> >> +/* SPDX-License-Identifier: BSD-3-Clause
> >> + * Copyright(c) 2022 Ericsson AB
> >> + */
> >> +
> >> +#include <rte_seqlock.h>
> >> +
> >> +#include <rte_cycles.h>
> >> +#include <rte_malloc.h>
> >> +#include <rte_random.h>
> >> +
> >> +#include <inttypes.h>
> >> +
> >> +#include "test.h"
> >> +
> >> +struct data {
> >> +	rte_seqlock_t lock;
> >> +
> >> +	uint64_t a;
> >> +	uint64_t b __rte_cache_aligned;
> >> +	uint64_t c __rte_cache_aligned;
> >> +} __rte_cache_aligned;
> >> +
> >> +struct reader {
> >> +	struct data *data;
> >> +	uint8_t stop;
> >> +};
> >> +
> >> +#define WRITER_RUNTIME (2.0) /* s */
> >> +
> >> +#define WRITER_MAX_DELAY (100) /* us */
> >> +
> >> +#define INTERRUPTED_WRITER_FREQUENCY (1000)
> >> +#define WRITER_INTERRUPT_TIME (1) /* us */
> >> +
> >> +static int
> >> +writer_run(void *arg)
> >> +{
> >> +	struct data *data = arg;
> >> +	uint64_t deadline;
> >> +
> >> +	deadline = rte_get_timer_cycles() +
> >> +		WRITER_RUNTIME * rte_get_timer_hz();
> >> +
> >> +	while (rte_get_timer_cycles() < deadline) {
> >> +		bool interrupted;
> >> +		uint64_t new_value;
> >> +		unsigned int delay;
> >> +
> >> +		new_value = rte_rand();
> >> +
> >> +		interrupted = rte_rand_max(INTERRUPTED_WRITER_FREQUENCY) == 0;
> >> +
> >> +		rte_seqlock_write_lock(&data->lock);
> >> +
> >> +		data->c = new_value;
> >> +
> >> +		/* These compiler barriers (both on the test reader
> >> +		 * and the test writer side) are here to ensure that
> >> +		 * loads/stores *usually* happen in test program order
> >> +		 * (always on a TSO machine). They are arranged in such
> >> +		 * a way that the writer stores in a different order
> >> +		 * than the reader loads, to emulate an arbitrary
> >> +		 * order. A real application using a seqlock does not
> >> +		 * require any compiler barriers.
> >> +		 */
> >> +		rte_compiler_barrier();
> > The compiler barriers are not sufficient on all architectures (if the intention
> > is to maintain the program order).
> >
> 
> The intention is what is described in the comment (i.e., to make it likely, but
> not guaranteed, that the stores will be globally visible in the program order).
> 
> The reason I didn't put in a release memory barrier was that it seems a little
> intrusive.
> 
> Maybe I should remove these compiler barriers. They are also intrusive in the
> sense that they may prevent some compiler optimizations that could expose a
> seqlock bug. Or, I could have two variants of the tests. I don't know.
I would suggest removing the compiler barriers and leaving it to the CPU to do what it can do.

> 
> >> +		data->b = new_value;
> >> +
> >> +		if (interrupted)
> >> +			rte_delay_us_block(WRITER_INTERRUPT_TIME);
> >> +
> >> +		rte_compiler_barrier();
> >> +		data->a = new_value;
> >> +
> >> +		rte_seqlock_write_unlock(&data->lock);
> >> +
> >> +		delay = rte_rand_max(WRITER_MAX_DELAY);
> >> +
> >> +		rte_delay_us_block(delay);
> >> +	}
> >> +
> >> +	return 0;
> >> +}
> >> +

<snip>

> >> +
> >> +/**
> >> + * Begin a write-side critical section.
> >> + *
> >> + * A call to this function acquires the write lock associated @p
> >> + * seqlock, and marks the beginning of a write-side critical section.
> >> + *
> >> + * After having called this function, the caller may go on to modify
> >> + * (both read and write) the protected data, in an atomic or
> >> + * non-atomic manner.
> >> + *
> >> + * After the necessary updates have been performed, the application
> >> + * calls rte_seqlock_write_unlock().
> >> + *
> >> + * This function is not preemption-safe in the sense that preemption
> >> + * of the calling thread may block reader progress until the writer
> >> + * thread is rescheduled.
> >> + *
> >> + * Unlike rte_seqlock_read_lock(), each call made to
> >> + * rte_seqlock_write_lock() must be matched with an unlock call.
> >> + *
> >> + * @param seqlock
> >> + *   A pointer to the seqlock.
> >> + *
> >> + * @see rte_seqlock_write_unlock()
> >> + */
> >> +__rte_experimental
> >> +static inline void
> >> +rte_seqlock_write_lock(rte_seqlock_t *seqlock) {
> >> +	uint32_t sn;
> >> +
> >> +	/* to synchronize with other writers */
> >> +	rte_spinlock_lock(&seqlock->lock);
> >> +
> >> +	sn = seqlock->sn + 1;
> > The load of seqlock->sn could use __atomic_load_n to be consistent.
> >
> 
> But why? I know it doesn't have any cost (these loads are going to be atomic
> anyways), but why use a construct with stronger guarantees than you have
> to?
Using __atomic_xxx ensures that the operation is atomic always. I believe (I am not sure) that, when not using __atomic_xxx, the compiler is allowed to use non-atomic operations.
The other reason is we are not qualifying 'sn' as volatile. Use of __atomic_xxx inherently indicates to the compiler not to cache 'sn' in a register. I do not know the compiler behavior if some operations on 'sn' use __atomic_xxx and some do not.
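
For reference, the suggestion would amount to something like the below sketch
(only the read of the old sn changes; behavior is otherwise the same):

static inline void
rte_seqlock_write_lock(rte_seqlock_t *seqlock)
{
	uint32_t sn;

	/* to synchronize with other writers */
	rte_spinlock_lock(&seqlock->lock);

	/* relaxed atomic load, for consistency with the other sn accesses */
	sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED) + 1;

	__atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELAXED);

	/* __ATOMIC_RELEASE to prevent stores after (in program order)
	 * from happening before the sn store.
	 */
	rte_atomic_thread_fence(__ATOMIC_RELEASE);
}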

> 
> >> +
> >> +	__atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELAXED);
> >> +
> >> +	/* __ATOMIC_RELEASE to prevent stores after (in program order)
> >> +	 * from happening before the sn store.
> >> +	 */
> >> +	rte_atomic_thread_fence(__ATOMIC_RELEASE);
> >> +}
> >> +
> >> +/**
> >> + * End a write-side critical section.
> >> + *
> >> + * A call to this function marks the end of the write-side critical
> >> + * section, for @p seqlock. After this call has been made, the
> >> +protected
> >> + * data may no longer be modified.
> >> + *
> >> + * @param seqlock
> >> + *   A pointer to the seqlock.
> >> + *
> >> + * @see rte_seqlock_write_lock()
> >> + */
> >> +__rte_experimental
> >> +static inline void
> >> +rte_seqlock_write_unlock(rte_seqlock_t *seqlock) {
> >> +	uint32_t sn;
> >> +
> >> +	sn = seqlock->sn + 1;
> > Same here, the load of seqlock->sn could use __atomic_load_n
> >
> >> +
> >> +	/* synchronizes-with the load acquire in rte_seqlock_read_lock() */
> >> +	__atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELEASE);
> >> +
> >> +	rte_spinlock_unlock(&seqlock->lock);
> >> +}
> >> +
> >> +#ifdef __cplusplus
> >> +}
> >> +#endif
> >> +
> >> +#endif  /* _RTE_SEQLOCK_H_ */
<snip>
  
Honnappa Nagarahalli April 3, 2022, 5:37 p.m. UTC | #12
<snip>

> 
> >> + * Example usage:
> >> + * @code{.c}
> >> + * #define MAX_Y_LEN (16)
> >> + * // Application-defined example data structure, protected by a seqlock.
> >> + * struct config {
> >> + *         rte_seqlock_t lock;
> >> + *         int param_x;
> >> + *         char param_y[MAX_Y_LEN];
> >> + * };
> >> + *
> >> + * // Accessor function for reading config fields.
> >> + * void
> >> + * config_read(const struct config *config, int *param_x, char *param_y)
> >> + * {
> >> + *         // Temporary variables, just to improve readability.
> > I think the above comment is not necessary. It is beneficial to copy the
> protected data to keep the read side critical section small.
> >
> 
> The data here would be copied into the buffers supplied by config_read()
> anyways, so it's a copy regardless.
I see what you mean here. I would think the local variables add confusion, the copy can happen to the passed parameters directly. I will leave it to you to decide.

> 
> >> + *         int tentative_x;
> >> + *         char tentative_y[MAX_Y_LEN];
> >> + *         uint32_t sn;
> >> + *
> >> + *         sn = rte_seqlock_read_lock(&config->lock);
> >> + *         do {
> >> + *                 // Loads may be atomic or non-atomic, as in this example.
> >> + *                 tentative_x = config->param_x;
> >> + *                 strcpy(tentative_y, config->param_y);
> >> + *         } while (!rte_seqlock_read_tryunlock(&config->lock, &sn));
> >> + *         // An application could skip retrying, and try again later, if
> >> + *         // progress is possible without the data.
> >> + *
> >> + *         *param_x = tentative_x;
> >> + *         strcpy(param_y, tentative_y);
> >> + * }
> >> + *
> >> + * // Accessor function for writing config fields.
> >> + * void
> >> + * config_update(struct config *config, int param_x, const char *param_y)
> >> + * {
> >> + *         rte_seqlock_write_lock(&config->lock);
> >> + *         // Stores may be atomic or non-atomic, as in this example.
> >> + *         config->param_x = param_x;
> >> + *         strcpy(config->param_y, param_y);
> >> + *         rte_seqlock_write_unlock(&config->lock);
> >> + * }
> >> + * @endcode
> >> + *
> >> + * @see
> >> + * https://en.wikipedia.org/wiki/Seqlock.
> >> + */
> >> +
> >> +#include <stdbool.h>
> >> +#include <stdint.h>
> >> +
> >> +#include <rte_atomic.h>
> >> +#include <rte_branch_prediction.h>
> >> +#include <rte_spinlock.h>
> >> +
> >> +/**
> >> + * The RTE seqlock type.
> >> + */
> >> +typedef struct {
> >> +	uint32_t sn; /**< A sequence number for the protected data. */
> >> +	rte_spinlock_t lock; /**< Spinlock used to serialize writers.  */ }
> > Suggest using ticket lock for the writer side. It should have low overhead
> when there is a single writer, but provides better functionality when there are
> multiple writers.
> >
> 
> Is a seqlock the synchronization primitive of choice for high-contention cases?
> I would say no, but I'm not sure what you would use instead.
I think Stephen has come across some use cases of high contention writers with readers, maybe Stephen can provide some input.

IMO, there are no harm/perf issues in using a ticket lock.
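
For illustration, and assuming DPDK's existing rte_ticketlock.h API
(rte_ticketlock_t, rte_ticketlock_lock()/rte_ticketlock_unlock(),
RTE_TICKETLOCK_INITIALIZER), the change would be confined to the lock type
and the writer-side lock/unlock calls, roughly:

#include <rte_ticketlock.h>

typedef struct {
	uint32_t sn; /**< A sequence number for the protected data. */
	rte_ticketlock_t lock; /**< Ticket lock serializing writers (FIFO). */
} rte_seqlock_t;

#define RTE_SEQLOCK_INITIALIZER { 0, RTE_TICKETLOCK_INITIALIZER }

/* In rte_seqlock_write_lock() and rte_seqlock_write_unlock(), the
 * rte_spinlock_lock()/rte_spinlock_unlock() calls would become
 * rte_ticketlock_lock(&seqlock->lock) and rte_ticketlock_unlock(&seqlock->lock).
 */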

> 
> <snip>
  
Ola Liljedahl April 3, 2022, 6:11 p.m. UTC | #13
(Now using macOS Mail program in plain text mode, hope this works)

> On 2 Apr 2022, at 21:31, Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com> wrote:
> 
> <snip>
> 
>>> +__rte_experimental
>>> +static inline bool
>>> +rte_seqlock_read_tryunlock(const rte_seqlock_t *seqlock, uint32_t *begin_sn)
>>> +{
>>> +	uint32_t end_sn;
>>> +
>>> +	/* make sure the data loads happen before the sn load */
>>> +	rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
>>> +
>>> +	end_sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED);
>> 
>> Since we are reading and potentially returning the sequence number here
>> (repeating the read of the protected data), we need to use load-acquire.
>> I assume it is not expected that the user will call
>> rte_seqlock_read_lock() again.
> Good point, we need a load-acquire (due to changes done in v3).
> 
>> 
>> Seeing this implementation, I might actually prefer the original
>> implementation, I think it is cleaner. But I would like for the begin function
>> also to wait for an even sequence number, the end function would only have
>> to check for same sequence number, this might improve performance a little
>> bit as readers won't perform one or several broken reads while a write is in
>> progress. The function names are a different thing though.
> I think we need to be optimizing for the case where there is no contention between readers and writers (as that happens most of the time). From this perspective, not checking for an even seq number in the begin function would save one 'if' statement.
The number of statements in C is not relevant; instead we need to look at the generated code. On x86, I would assume an if-statement like “if ((sn & 1) || (sn == sl->sn))” to generate two separate evaluations with their own conditional jump instructions. On AArch64, the two evaluations could probably be combined using a CCMP instruction and need only one conditional branch instruction. With branch prediction, it is doubtful we will see any difference in performance.

> 
> Going back to the earlier model is better as well, because of the load-acquire required in the 'rte_seqlock_read_tryunlock' function. The earlier model would not introduce the load-acquire for the no contention case.
The earlier model still had load-acquire in the read_begin function which would have to be invoked again. There is no difference in the number or type of memory accesses. We just need to copy the implementation of read_begin into the read_tryunlock function if we decide that the user should not have to re-invoke read_begin on a failed read_tryunlock.

> 
>> 
>> The writer side behaves much more like a lock with mutual exclusion so
>> write_lock/write_unlock makes sense.
>> 
>>> +
>>> +	if (unlikely(end_sn & 1 || *begin_sn != end_sn)) {
>>> +		*begin_sn = end_sn;
>>> +		return false;
>>> +	}
>>> +
>>> +	return true;
>>> +}
>>> +
  
Ola Liljedahl April 3, 2022, 6:37 p.m. UTC | #14
>>>> +__rte_experimental
>>>> +static inline void
>>>> +rte_seqlock_write_lock(rte_seqlock_t *seqlock) {
>>>> +  uint32_t sn;
>>>> +
>>>> +  /* to synchronize with other writers */
>>>> +  rte_spinlock_lock(&seqlock->lock);
>>>> +
>>>> +  sn = seqlock->sn + 1;
>>> The load of seqlock->sn could use __atomic_load_n to be consistent.
>>> 
>> 
>> But why? I know it doesn't have any cost (these loads are going to be atomic
>> anyways), but why use a construct with stronger guarantees than you have
>> to?
> Using __atomic_xxx ensures that the operation is atomic always. I believe (I am not sure) that, when not using __atomic_xxx, the compiler is allowed to use non-atomic operations.
> The other reason is we are not qualifying 'sn' as volatile. Use of __atomic_xxx inherently indicates to the compiler not to cache 'sn' in a register. I do not know the compiler behavior if some operations on 'sn' use __atomic_xxx and some do not.
We don’t need an atomic read here as the seqlock->lock protects (serialises) writer-side accesses to seqlock->sn. There is no other thread which could update seqlock->sn while this thread owns the lock. The seqlock owner could read seqlock->sn byte for byte without any problems.
Only writes to seqlock->sn need to be atomic as there might be readers who read seqlock->sn and in such multi-access scenarios, all accesses need to be atomic in order to avoid data races.

If seqlock->sn was a C11 _Atomic type, all accesses would automatically be atomic.
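
As a sketch of that alternative (not what the patch does), a C11 atomic
sequence number would make the plain accesses atomic by default:

#include <stdatomic.h>

/* Hypothetical variant, for illustration only: plain accesses to sn, such
 * as "seqlock->sn + 1", are then atomic (and sequentially consistent);
 * weaker orderings still require atomic_load_explicit()/atomic_store_explicit().
 */
typedef struct {
	_Atomic uint32_t sn;
	rte_spinlock_t lock;
} seqlock_c11_t;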

- Ola
  
Honnappa Nagarahalli April 4, 2022, 9:56 p.m. UTC | #15
<snip>
> 
> 
> >>>> +__rte_experimental
> >>>> +static inline void
> >>>> +rte_seqlock_write_lock(rte_seqlock_t *seqlock) {
> >>>> +  uint32_t sn;
> >>>> +
> >>>> +  /* to synchronize with other writers */
> >>>> + rte_spinlock_lock(&seqlock->lock);
> >>>> +
> >>>> +  sn = seqlock->sn + 1;
> >>> The load of seqlock->sn could use __atomic_load_n to be consistent.
> >>>
> >>
> >> But why? I know it doesn't have any cost (these loads are going to be
> >> atomic anyways), but why use a construct with stronger guarantees
> >> than you have to?
> > Using __atomic_xxx ensures that the operation is atomic always. I believe (I
> am not sure) that, when not using __atomic_xxx, the compiler is allowed to
> use non-atomic operations.
> > The other reason is we are not qualifying 'sn' as volatile. Use of
> __atomic_xxx inherently indicates to the compiler not to cache 'sn' in a
> register. I do not know the compiler behavior if some operations on 'sn' use
> __atomic_xxx and some do not.
> We don’t need an atomic read here as the seqlock->lock protects (serialises)
> writer-side accesses to seqlock->sn. There is no other thread which could
> update seqlock->sn while this thread owns the lock. The seqlock owner could
> read seqlock->sn byte for byte without any problems.
> Only writes to seqlock->sn need to be atomic as there might be readers who
> read seqlock->sn and in such multi-access scenarios, all accesses need to be
> atomic in order to avoid data races.
How does the compiler interpret a mix of __atomic_xxx and non-atomic access to a memory location? What are the guarantees the compiler provides in such cases?
Do you see any harm in making this an atomic operation?

> 
> If seqlock->sn was a C11 _Atomic type, all accesses would automatically be
> atomic.
Exactly, we do not have that currently.

> 
> - Ola
>
  
Mattias Rönnblom April 8, 2022, 1:45 p.m. UTC | #16
On 2022-04-03 19:37, Honnappa Nagarahalli wrote:
> <snip>
> 
>>
>>>> + * Example usage:
>>>> + * @code{.c}
>>>> + * #define MAX_Y_LEN (16)
>>>> + * // Application-defined example data structure, protected by a seqlock.
>>>> + * struct config {
>>>> + *         rte_seqlock_t lock;
>>>> + *         int param_x;
>>>> + *         char param_y[MAX_Y_LEN];
>>>> + * };
>>>> + *
>>>> + * // Accessor function for reading config fields.
>>>> + * void
>>>> + * config_read(const struct config *config, int *param_x, char
>>>> +*param_y)
>>>> + * {
>>>> + *         // Temporary variables, just to improve readability.
>>> I think the above comment is not necessary. It is beneficial to copy the
>> protected data to keep the read side critical section small.
>>>
>>
>> The data here would be copied into the buffers supplied by config_read()
>> anyways, so it's a copy regardless.
> I see what you mean here. I would think the local variables add confusion, the copy can happen to the passed parameters directly. I will leave it to you to decide.
> 

I'll remove the temp variables.
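
For reference, the example reader without the temporaries would then read
roughly as follows (same struct config as in the patch's example code):

void
config_read(const struct config *config, int *param_x, char *param_y)
{
	uint32_t sn;

	sn = rte_seqlock_read_lock(&config->lock);
	do {
		/* copy directly into the caller-supplied buffers */
		*param_x = config->param_x;
		strcpy(param_y, config->param_y);
	} while (!rte_seqlock_read_tryunlock(&config->lock, &sn));
}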

>>
>>>> + *         int tentative_x;
>>>> + *         char tentative_y[MAX_Y_LEN];
>>>> + *         uint32_t sn;
>>>> + *
>>>> + *         sn = rte_seqlock_read_lock(&config->lock);
>>>> + *         do {
>>>> + *                 // Loads may be atomic or non-atomic, as in this example.
>>>> + *                 tentative_x = config->param_x;
>>>> + *                 strcpy(tentative_y, config->param_y);
>>>> + *         } while (!rte_seqlock_read_tryunlock(&config->lock, &sn));
>>>> + *         // An application could skip retrying, and try again later, if
>>>> + *         // progress is possible without the data.
>>>> + *
>>>> + *         *param_x = tentative_x;
>>>> + *         strcpy(param_y, tentative_y);
>>>> + * }
>>>> + *
>>>> + * // Accessor function for writing config fields.
>>>> + * void
>>>> + * config_update(struct config *config, int param_x, const char
>>>> +*param_y)
>>>> + * {
>>>> + *         rte_seqlock_write_lock(&config->lock);
>>>> + *         // Stores may be atomic or non-atomic, as in this example.
>>>> + *         config->param_x = param_x;
>>>> + *         strcpy(config->param_y, param_y);
>>>> + *         rte_seqlock_write_unlock(&config->lock);
>>>> + * }
>>>> + * @endcode
>>>> + *
>>>> + * @see
>>>> + * https://en.wikipedia.org/wiki/Seqlock.
>>>> + */
>>>> +
>>>> +#include <stdbool.h>
>>>> +#include <stdint.h>
>>>> +
>>>> +#include <rte_atomic.h>
>>>> +#include <rte_branch_prediction.h>
>>>> +#include <rte_spinlock.h>
>>>> +
>>>> +/**
>>>> + * The RTE seqlock type.
>>>> + */
>>>> +typedef struct {
>>>> +	uint32_t sn; /**< A sequence number for the protected data. */
>>>> +	rte_spinlock_t lock; /**< Spinlock used to serialize writers.  */ }
>>> Suggest using ticket lock for the writer side. It should have low overhead
>> when there is a single writer, but provides better functionality when there are
>> multiple writers.
>>>
>>
>> Is a seqlock the synchronization primitive of choice for high-contention cases?
>> I would say no, but I'm not sure what you would use instead.
> I think Stephen has come across some use cases of high contention writers with readers, maybe Stephen can provide some input.
> 
> IMO, there are no harm/perf issues in using a ticket lock.
> 

OK. I will leave it as a spinlock for now (PATCH v4).
  

Patch

diff --git a/app/test/meson.build b/app/test/meson.build
index 5fc1dd1b7b..5e418e8766 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -125,6 +125,7 @@  test_sources = files(
         'test_rwlock.c',
         'test_sched.c',
         'test_security.c',
+        'test_seqlock.c',
         'test_service_cores.c',
         'test_spinlock.c',
         'test_stack.c',
@@ -214,6 +215,7 @@  fast_tests = [
         ['rwlock_rde_wro_autotest', true],
         ['sched_autotest', true],
         ['security_autotest', false],
+        ['seqlock_autotest', true],
         ['spinlock_autotest', true],
         ['stack_autotest', false],
         ['stack_lf_autotest', false],
diff --git a/app/test/test_seqlock.c b/app/test/test_seqlock.c
new file mode 100644
index 0000000000..54fadf8025
--- /dev/null
+++ b/app/test/test_seqlock.c
@@ -0,0 +1,202 @@ 
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2022 Ericsson AB
+ */
+
+#include <rte_seqlock.h>
+
+#include <rte_cycles.h>
+#include <rte_malloc.h>
+#include <rte_random.h>
+
+#include <inttypes.h>
+
+#include "test.h"
+
+struct data {
+	rte_seqlock_t lock;
+
+	uint64_t a;
+	uint64_t b __rte_cache_aligned;
+	uint64_t c __rte_cache_aligned;
+} __rte_cache_aligned;
+
+struct reader {
+	struct data *data;
+	uint8_t stop;
+};
+
+#define WRITER_RUNTIME (2.0) /* s */
+
+#define WRITER_MAX_DELAY (100) /* us */
+
+#define INTERRUPTED_WRITER_FREQUENCY (1000)
+#define WRITER_INTERRUPT_TIME (1) /* us */
+
+static int
+writer_run(void *arg)
+{
+	struct data *data = arg;
+	uint64_t deadline;
+
+	deadline = rte_get_timer_cycles() +
+		WRITER_RUNTIME * rte_get_timer_hz();
+
+	while (rte_get_timer_cycles() < deadline) {
+		bool interrupted;
+		uint64_t new_value;
+		unsigned int delay;
+
+		new_value = rte_rand();
+
+		interrupted = rte_rand_max(INTERRUPTED_WRITER_FREQUENCY) == 0;
+
+		rte_seqlock_write_lock(&data->lock);
+
+		data->c = new_value;
+
+		/* These compiler barriers (both on the test reader
+		 * and the test writer side) are here to ensure that
+		 * loads/stores *usually* happen in test program order
+		 * (always on a TSO machine). They are arranged in such
+		 * a way that the writer stores in a different order
+		 * than the reader loads, to emulate an arbitrary
+		 * order. A real application using a seqlock does not
+		 * require any compiler barriers.
+		 */
+		rte_compiler_barrier();
+		data->b = new_value;
+
+		if (interrupted)
+			rte_delay_us_block(WRITER_INTERRUPT_TIME);
+
+		rte_compiler_barrier();
+		data->a = new_value;
+
+		rte_seqlock_write_unlock(&data->lock);
+
+		delay = rte_rand_max(WRITER_MAX_DELAY);
+
+		rte_delay_us_block(delay);
+	}
+
+	return 0;
+}
+
+#define INTERRUPTED_READER_FREQUENCY (1000)
+#define READER_INTERRUPT_TIME (1000) /* us */
+
+static int
+reader_run(void *arg)
+{
+	struct reader *r = arg;
+	int rc = 0;
+
+	while (__atomic_load_n(&r->stop, __ATOMIC_RELAXED) == 0 && rc == 0) {
+		struct data *data = r->data;
+		bool interrupted;
+		uint64_t a;
+		uint64_t b;
+		uint64_t c;
+		uint32_t sn;
+
+		interrupted = rte_rand_max(INTERRUPTED_READER_FREQUENCY) == 0;
+
+		sn = rte_seqlock_read_lock(&data->lock);
+
+		do {
+			a = data->a;
+			/* See writer_run() for an explanation why
+			 * these barriers are here.
+			 */
+			rte_compiler_barrier();
+
+			if (interrupted)
+				rte_delay_us_block(READER_INTERRUPT_TIME);
+
+			c = data->c;
+
+			rte_compiler_barrier();
+			b = data->b;
+
+		} while (!rte_seqlock_read_tryunlock(&data->lock, &sn));
+
+		if (a != b || b != c) {
+			printf("Reader observed inconsistent data values "
+			       "%" PRIu64 " %" PRIu64 " %" PRIu64 "\n",
+			       a, b, c);
+			rc = -1;
+		}
+	}
+
+	return rc;
+}
+
+static void
+reader_stop(struct reader *reader)
+{
+	__atomic_store_n(&reader->stop, 1, __ATOMIC_RELAXED);
+}
+
+#define NUM_WRITERS (2) /* main lcore + one worker */
+#define MIN_NUM_READERS (2)
+#define MAX_READERS (RTE_MAX_LCORE - NUM_WRITERS - 1)
+#define MIN_LCORE_COUNT (NUM_WRITERS + MIN_NUM_READERS)
+
+/* Only a compile-time test */
+static rte_seqlock_t __rte_unused static_init_lock = RTE_SEQLOCK_INITIALIZER;
+
+static int
+test_seqlock(void)
+{
+	struct reader readers[MAX_READERS];
+	unsigned int num_readers;
+	unsigned int num_lcores;
+	unsigned int i;
+	unsigned int lcore_id;
+	unsigned int reader_lcore_ids[MAX_READERS];
+	unsigned int worker_writer_lcore_id = 0;
+	int rc = 0;
+
+	num_lcores = rte_lcore_count();
+
+	if (num_lcores < MIN_LCORE_COUNT) {
+		printf("Too few cores to run test. Skipping.\n");
+		return 0;
+	}
+
+	num_readers = num_lcores - NUM_WRITERS;
+
+	struct data *data = rte_zmalloc(NULL, sizeof(struct data), 0);
+
+	i = 0;
+	RTE_LCORE_FOREACH_WORKER(lcore_id) {
+		if (i == 0) {
+			rte_eal_remote_launch(writer_run, data, lcore_id);
+			worker_writer_lcore_id = lcore_id;
+		} else {
+			unsigned int reader_idx = i - 1;
+			struct reader *reader = &readers[reader_idx];
+
+			reader->data = data;
+			reader->stop = 0;
+
+			rte_eal_remote_launch(reader_run, reader, lcore_id);
+			reader_lcore_ids[reader_idx] = lcore_id;
+		}
+		i++;
+	}
+
+	if (writer_run(data) != 0 ||
+	    rte_eal_wait_lcore(worker_writer_lcore_id) != 0)
+		rc = -1;
+
+	for (i = 0; i < num_readers; i++) {
+		reader_stop(&readers[i]);
+		if (rte_eal_wait_lcore(reader_lcore_ids[i]) != 0)
+			rc = -1;
+	}
+
+	return rc;
+}
+
+REGISTER_TEST_COMMAND(seqlock_autotest, test_seqlock);
diff --git a/lib/eal/common/meson.build b/lib/eal/common/meson.build
index 917758cc65..a41343bfed 100644
--- a/lib/eal/common/meson.build
+++ b/lib/eal/common/meson.build
@@ -35,6 +35,7 @@  sources += files(
         'rte_malloc.c',
         'rte_random.c',
         'rte_reciprocal.c',
+	'rte_seqlock.c',
         'rte_service.c',
         'rte_version.c',
 )
diff --git a/lib/eal/common/rte_seqlock.c b/lib/eal/common/rte_seqlock.c
new file mode 100644
index 0000000000..d4fe648799
--- /dev/null
+++ b/lib/eal/common/rte_seqlock.c
@@ -0,0 +1,12 @@ 
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2022 Ericsson AB
+ */
+
+#include <rte_seqlock.h>
+
+void
+rte_seqlock_init(rte_seqlock_t *seqlock)
+{
+	seqlock->sn = 0;
+	rte_spinlock_init(&seqlock->lock);
+}
diff --git a/lib/eal/include/meson.build b/lib/eal/include/meson.build
index 9700494816..48df5f1a21 100644
--- a/lib/eal/include/meson.build
+++ b/lib/eal/include/meson.build
@@ -36,6 +36,7 @@  headers += files(
         'rte_per_lcore.h',
         'rte_random.h',
         'rte_reciprocal.h',
+        'rte_seqlock.h',
         'rte_service.h',
         'rte_service_component.h',
         'rte_string_fns.h',
diff --git a/lib/eal/include/rte_seqlock.h b/lib/eal/include/rte_seqlock.h
new file mode 100644
index 0000000000..44eacd66e8
--- /dev/null
+++ b/lib/eal/include/rte_seqlock.h
@@ -0,0 +1,302 @@ 
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2022 Ericsson AB
+ */
+
+#ifndef _RTE_SEQLOCK_H_
+#define _RTE_SEQLOCK_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * @file
+ * RTE Seqlock
+ *
+ * A sequence lock (seqlock) is a synchronization primitive allowing
+ * multiple, parallel, readers to efficiently and safely (i.e., in a
+ * data-race free manner) access the lock-protected data. The RTE
+ * seqlock permits multiple writers as well. A spinlock is used for
+ * writer-writer synchronization.
+ *
+ * A reader never blocks a writer. Very high frequency writes may
+ * prevent readers from making progress.
+ *
+ * A seqlock is not preemption-safe on the writer side. If a writer is
+ * preempted, it may block readers until the writer thread is again
+ * allowed to execute. Heavy computations should be kept out of the
+ * writer-side critical section, to avoid delaying readers.
+ *
+ * Seqlocks are useful for data which are read by many cores, at a
+ * high frequency, and relatively infrequently written to.
+ *
+ * One way to think about seqlocks is that they provide means to
+ * perform atomic operations on objects larger than what the native
+ * machine instructions allow for.
+ *
+ * To avoid resource reclamation issues, the data protected by a
+ * seqlock should typically be kept self-contained (e.g., no pointers
+ * to mutable, dynamically allocated data).
+ *
+ * Example usage:
+ * @code{.c}
+ * #define MAX_Y_LEN (16)
+ * // Application-defined example data structure, protected by a seqlock.
+ * struct config {
+ *         rte_seqlock_t lock;
+ *         int param_x;
+ *         char param_y[MAX_Y_LEN];
+ * };
+ *
+ * // Accessor function for reading config fields.
+ * void
+ * config_read(const struct config *config, int *param_x, char *param_y)
+ * {
+ *         // Temporary variables, just to improve readability.
+ *         int tentative_x;
+ *         char tentative_y[MAX_Y_LEN];
+ *         uint32_t sn;
+ *
+ *         sn = rte_seqlock_read_lock(&config->lock);
+ *         do {
+ *                 // Loads may be atomic or non-atomic, as in this example.
+ *                 tentative_x = config->param_x;
+ *                 strcpy(tentative_y, config->param_y);
+ *         } while (!rte_seqlock_read_tryunlock(&config->lock, &sn));
+ *         // An application could skip retrying, and try again later, if
+ *         // progress is possible without the data.
+ *
+ *         *param_x = tentative_x;
+ *         strcpy(param_y, tentative_y);
+ * }
+ *
+ * // Accessor function for writing config fields.
+ * void
+ * config_update(struct config *config, int param_x, const char *param_y)
+ * {
+ *         rte_seqlock_write_lock(&config->lock);
+ *         // Stores may be atomic or non-atomic, as in this example.
+ *         config->param_x = param_x;
+ *         strcpy(config->param_y, param_y);
+ *         rte_seqlock_write_unlock(&config->lock);
+ * }
+ * @endcode
+ *
+ * @see
+ * https://en.wikipedia.org/wiki/Seqlock.
+ */
+
+#include <stdbool.h>
+#include <stdint.h>
+
+#include <rte_atomic.h>
+#include <rte_branch_prediction.h>
+#include <rte_spinlock.h>
+
+/**
+ * The RTE seqlock type.
+ */
+typedef struct {
+	uint32_t sn; /**< A sequence number for the protected data. */
+	rte_spinlock_t lock; /**< Spinlock used to serialize writers.  */
+} rte_seqlock_t;
+
+/**
+ * A static seqlock initializer.
+ */
+#define RTE_SEQLOCK_INITIALIZER { 0, RTE_SPINLOCK_INITIALIZER }
+
+/**
+ * Initialize the seqlock.
+ *
+ * This function initializes the seqlock, and leaves the writer-side
+ * spinlock unlocked.
+ *
+ * @param seqlock
+ *   A pointer to the seqlock.
+ */
+__rte_experimental
+void
+rte_seqlock_init(rte_seqlock_t *seqlock);
+
+/**
+ * Begin a read-side critical section.
+ *
+ * A call to this function marks the beginning of a read-side critical
+ * section, for @p seqlock.
+ *
+ * rte_seqlock_read_lock() returns a sequence number, which is later
+ * used in rte_seqlock_read_tryunlock() to check if the protected data
+ * underwent any modifications during the read transaction.
+ *
+ * After (in program order) rte_seqlock_read_lock() has been called,
+ * the calling thread reads the protected data, for later use. The
+ * protected data read *must* be copied (either in pristine form, or
+ * in the form of some derivative), since the caller may only read the
+ * data from within the read-side critical section (i.e., after
+ * rte_seqlock_read_lock() and before rte_seqlock_read_tryunlock()),
+ * but must not act upon the retrieved data while in the critical
+ * section, since it does not yet know if it is consistent.
+ *
+ * The protected data may be read using atomic and/or non-atomic
+ * operations.
+ *
+ * After (in program order) all required data loads have been
+ * performed, rte_seqlock_read_tryunlock() should be called, marking
+ * the end of the read-side critical section.
+ *
+ * If rte_seqlock_read_tryunlock() returns true, the data was read
+ * atomically and the copied data is consistent.
+ *
+ * If rte_seqlock_read_tryunlock() returns false, the just-read data
+ * is inconsistent and should be discarded. The caller has the option
+ * to either re-read the data and call rte_seqlock_read_tryunlock()
+ * again, or to restart the whole procedure (i.e., from
+ * rte_seqlock_read_lock()) at some later time.
+ *
+ * @param seqlock
+ *   A pointer to the seqlock.
+ * @return
+ *   The seqlock sequence number for this critical section, to
+ *   later be passed to rte_seqlock_read_tryunlock().
+ *
+ * @see rte_seqlock_read_tryunlock()
+ */
+__rte_experimental
+static inline uint32_t
+rte_seqlock_read_lock(const rte_seqlock_t *seqlock)
+{
+	/* __ATOMIC_ACQUIRE to prevent loads after (in program order)
+	 * from happening before the sn load. Synchronizes-with the
+	 * store release in rte_seqlock_write_unlock().
+	 */
+	return __atomic_load_n(&seqlock->sn, __ATOMIC_ACQUIRE);
+}
+
+/**
+ * End a read-side critical section.
+ *
+ * A call to this function marks the end of a read-side critical
+ * section, for @p seqlock. The application must supply the sequence
+ * number produced by the corresponding rte_seqlock_read_lock() (or,
+ * in case of a retry, the rte_seqlock_read_tryunlock()) call.
+ *
+ * After this function has been called, the caller should not access
+ * the protected data.
+ *
+ * In case this function returns true, the just-read data was
+ * consistent and the set of atomic and non-atomic load operations
+ * performed between rte_seqlock_read_lock() and
+ * rte_seqlock_read_tryunlock() were atomic, as a whole.
+ *
+ * In case rte_seqlock_read_tryunlock() returns false, the data was
+ * modified as it was being read and may be inconsistent, and thus
+ * should be discarded. The @p begin_sn is updated with the
+ * now-current sequence number.
+ *
+ * @param seqlock
+ *   A pointer to the seqlock.
+ * @param begin_sn
+ *   The seqlock sequence number returned by
+ *   rte_seqlock_read_lock() (potentially updated in subsequent
+ *   rte_seqlock_read_tryunlock() calls) for this critical section.
+ * @return
+ *   true or false, if the just-read seqlock-protected data was consistent
+ *   or inconsistent, respectively, at the time it was read.
+ *
+ * @see rte_seqlock_read_lock()
+ */
+__rte_experimental
+static inline bool
+rte_seqlock_read_tryunlock(const rte_seqlock_t *seqlock, uint32_t *begin_sn)
+{
+	uint32_t end_sn;
+
+	/* make sure the data loads happen before the sn load */
+	rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
+
+	end_sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED);
+
+	if (unlikely(end_sn & 1 || *begin_sn != end_sn)) {
+		*begin_sn = end_sn;
+		return false;
+	}
+
+	return true;
+}
+
+/**
+ * Begin a write-side critical section.
+ *
+ * A call to this function acquires the write lock associated with
+ * @p seqlock, and marks the beginning of a write-side critical section.
+ *
+ * After having called this function, the caller may go on to modify
+ * (both read and write) the protected data, in an atomic or
+ * non-atomic manner.
+ *
+ * After the necessary updates have been performed, the application
+ * calls rte_seqlock_write_unlock().
+ *
+ * This function is not preemption-safe in the sense that preemption
+ * of the calling thread may block reader progress until the writer
+ * thread is rescheduled.
+ *
+ * Unlike rte_seqlock_read_lock(), each call made to
+ * rte_seqlock_write_lock() must be matched with an unlock call.
+ *
+ * @param seqlock
+ *   A pointer to the seqlock.
+ *
+ * @see rte_seqlock_write_unlock()
+ */
+__rte_experimental
+static inline void
+rte_seqlock_write_lock(rte_seqlock_t *seqlock)
+{
+	uint32_t sn;
+
+	/* to synchronize with other writers */
+	rte_spinlock_lock(&seqlock->lock);
+
+	sn = seqlock->sn + 1;
+
+	__atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELAXED);
+
+	/* __ATOMIC_RELEASE to prevent stores after (in program order)
+	 * from happening before the sn store.
+	 */
+	rte_atomic_thread_fence(__ATOMIC_RELEASE);
+}
+
+/**
+ * End a write-side critical section.
+ *
+ * A call to this function marks the end of the write-side critical
+ * section, for @p seqlock. After this call has been made, the protected
+ * data may no longer be modified.
+ *
+ * @param seqlock
+ *   A pointer to the seqlock.
+ *
+ * @see rte_seqlock_write_lock()
+ */
+__rte_experimental
+static inline void
+rte_seqlock_write_unlock(rte_seqlock_t *seqlock)
+{
+	uint32_t sn;
+
+	sn = seqlock->sn + 1;
+
+	/* synchronizes-with the load acquire in rte_seqlock_read_lock() */
+	__atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELEASE);
+
+	rte_spinlock_unlock(&seqlock->lock);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif  /* _RTE_SEQLOCK_H_ */
diff --git a/lib/eal/version.map b/lib/eal/version.map
index b53eeb30d7..4a9d0ed899 100644
--- a/lib/eal/version.map
+++ b/lib/eal/version.map
@@ -420,6 +420,9 @@  EXPERIMENTAL {
 	rte_intr_instance_free;
 	rte_intr_type_get;
 	rte_intr_type_set;
+
+	# added in 22.07
+	rte_seqlock_init;
 };
 
 INTERNAL {