[v2] eal: add seqlock

Message ID 20220330142602.108061-1-mattias.ronnblom@ericsson.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Headers
Series [v2] eal: add seqlock

Checks

Context Check Description
ci/checkpatch warning coding style issues
ci/Intel-compilation success Compilation OK
ci/github-robot: build fail github build: failed
ci/intel-Testing success Testing PASS
ci/iol-mellanox-Performance success Performance Testing PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/iol-intel-Functional success Functional Testing PASS
ci/iol-x86_64-unit-testing success Testing PASS
ci/iol-aarch64-unit-testing success Testing PASS
ci/iol-abi-testing success Testing PASS
ci/iol-x86_64-compile-testing success Testing PASS
ci/iol-aarch64-compile-testing success Testing PASS

Commit Message

Mattias Rönnblom March 30, 2022, 2:26 p.m. UTC
  A sequence lock (seqlock) is a synchronization primitive which allows
for data-race-free, low-overhead, high-frequency reads, especially for
data structures shared across many cores and updated relatively
infrequently.

A seqlock permits multiple parallel readers. The variant of seqlock
implemented in this patch supports multiple writers as well. A
spinlock is used for writer-writer serialization.

To avoid resource reclamation and other issues, the data protected by
a seqlock is best off being self-contained (i.e., no pointers [except
to constant data]).

One way to think about seqlocks is that they provide a means to perform
atomic operations on data objects larger than what the native atomic
machine instructions allow for.

DPDK seqlocks are not preemption-safe on the writer side. Thread
preemption affects performance, not correctness.

A seqlock contains a sequence number, which can be thought of as the
generation of the data it protects.

A reader will
  1. Load the sequence number (sn).
  2. Load, in arbitrary order, the seqlock-protected data.
  3. Load the sn again.
  4. Check if the first and second sn are equal, and even numbered.
     If they are not, discard the loaded data, and restart from 1.

The first three steps need to be ordered using suitable memory fences.

A writer will
  1. Take the spinlock, to serialize writer access.
  2. Load the sn.
  3. Store the original sn + 1 as the new sn.
  4. Perform load and stores to the seqlock-protected data.
  5. Store the original sn + 2 as the new sn.
  6. Release the spinlock.

Proper memory fencing is required to make sure the first sn store, the
data stores, and the second sn store appear to the reader in the
mentioned order.

The sn loads and stores must be atomic, but the data loads and stores
need not be.
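The reader and writer protocols above can be sketched with C11-style atomics. This is a hedged, illustrative toy, not the patch's implementation: the "toy_" names are invented for the example, and the writer-side spinlock is omitted (a single writer is assumed).

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy seqlock for illustration; these names are not the DPDK API.
 * Writer serialization (the spinlock) is omitted here. */
struct toy_seqlock {
	uint32_t sn;
};

/* Reader step 1: load the sn with acquire ordering, so the data
 * loads that follow cannot be reordered before it. */
static inline uint32_t
toy_read_begin(const struct toy_seqlock *sl)
{
	return __atomic_load_n(&sl->sn, __ATOMIC_ACQUIRE);
}

/* Reader steps 3-4: an acquire fence keeps the preceding data loads
 * before the second sn load. Retry if a write was in progress when
 * the read began (odd sn) or if the sn changed during the read. */
static inline bool
toy_read_retry(const struct toy_seqlock *sl, uint32_t begin_sn)
{
	uint32_t end_sn;

	__atomic_thread_fence(__ATOMIC_ACQUIRE);
	end_sn = __atomic_load_n(&sl->sn, __ATOMIC_RELAXED);

	return (begin_sn & 1) != 0 || begin_sn != end_sn;
}

/* Writer steps 2-3: make the sn odd. The release fence mirrors the
 * patch under discussion: it is intended to keep the subsequent data
 * stores from being reordered before the odd-sn store. */
static inline void
toy_write_begin(struct toy_seqlock *sl)
{
	__atomic_store_n(&sl->sn, sl->sn + 1, __ATOMIC_RELAXED);
	__atomic_thread_fence(__ATOMIC_RELEASE);
}

/* Writer step 5: make the sn even again; the release store makes the
 * data stores visible no later than the new sn. */
static inline void
toy_write_end(struct toy_seqlock *sl)
{
	__atomic_store_n(&sl->sn, sl->sn + 1, __ATOMIC_RELEASE);
}
```

A reader then loops: toy_read_begin(), load the protected data, and restart for as long as toy_read_retry() returns true.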

The original seqlock design and implementation was done by Stephen
Hemminger. This is an independent implementation, using C11 atomics.

For more information on seqlocks, see
https://en.wikipedia.org/wiki/Seqlock

PATCH v2:
  * Skip instead of fail unit test in case too few lcores are available.
  * Use main lcore for testing, reducing the minimum number of lcores
    required to run the unit tests to four.
  * Consistently refer to sn field as the "sequence number" in the
    documentation.
  * Fixed spelling mistakes in documentation.

Updates since RFC:
  * Added API documentation.
  * Added link to Wikipedia article in the commit message.
  * Changed seqlock sequence number field from uint64_t (which was
    overkill) to uint32_t. The sn type needs to be sufficiently large
    to ensure that no reader can read a sn, access the data, and then
    read the same sn merely because the sn has been updated so many
    times during the read that it has wrapped.
  * Added RTE_SEQLOCK_INITIALIZER macro for static initialization.
  * Replaced the rte_seqlock struct + separate rte_seqlock_t typedef
    with an anonymous struct typedef:ed to rte_seqlock_t.

Acked-by: Morten Brørup <mb@smartsharesystems.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
---
 app/test/meson.build          |   2 +
 app/test/test_seqlock.c       | 202 ++++++++++++++++++++++++
 lib/eal/common/meson.build    |   1 +
 lib/eal/common/rte_seqlock.c  |  12 ++
 lib/eal/include/meson.build   |   1 +
 lib/eal/include/rte_seqlock.h | 282 ++++++++++++++++++++++++++++++++++
 lib/eal/version.map           |   3 +
 7 files changed, 503 insertions(+)
 create mode 100644 app/test/test_seqlock.c
 create mode 100644 lib/eal/common/rte_seqlock.c
 create mode 100644 lib/eal/include/rte_seqlock.h
  

Comments

Mattias Rönnblom March 31, 2022, 7:46 a.m. UTC | #1
On 2022-03-30 16:26, Mattias Rönnblom wrote:
> A sequence lock (seqlock) is a synchronization primitive which allows
> for data-race-free, low-overhead, high-frequency reads, especially for
> data structures shared across many cores and updated relatively
> infrequently.
>
>

<snip>

Some questions I have:

Is a variant of the seqlock without the spinlock required? The reason I 
left it out was that I thought that in most cases where only a single 
writer is used (or serialization is external to the seqlock), the 
spinlock overhead is negligible, since updates are relatively infrequent.

Should the rte_seqlock_read_retry() be called rte_seqlock_read_end(), or 
some third alternative? I wanted to make clear it's not just a "release 
the lock" function. You could use the 
__attribute__((warn_unused_result)) annotation to make clear the 
return value cannot be ignored, although I'm not sure DPDK ever uses 
that attribute.
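As a minimal, hypothetical illustration of the attribute in question (the function name is invented for the example and is not part of the DPDK API), GCC and Clang both support:

```c
/* Hypothetical demonstration of __attribute__((warn_unused_result));
 * the function name is invented for illustration only. */
__attribute__((warn_unused_result))
static int toy_read_tryend(void)
{
	return 1;
}

/* With the attribute in place, a caller that discards the result,
 * e.g.
 *
 *     toy_read_tryend();
 *
 * triggers -Wunused-result on GCC and Clang, while a caller that
 * tests the return value compiles silently. */
```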


  
Ola Liljedahl March 31, 2022, 9:04 a.m. UTC | #2
On 3/31/22 09:46, Mattias Rönnblom wrote:
> On 2022-03-30 16:26, Mattias Rönnblom wrote:
>> A sequence lock (seqlock) is a synchronization primitive which allows
>> for data-race-free, low-overhead, high-frequency reads, especially for
>> data structures shared across many cores and updated relatively
>> infrequently.
>>
>>
> 
> <snip>
> 
> Some questions I have:
> 
> Is a variant of the seqlock without the spinlock required? The reason I
> left such out was that I thought that in most cases where only a single
> writer is used (or serialization is external to the seqlock), the
> spinlock overhead is negligible, since updates are relatively infrequent.
You can combine the spinlock and the sequence number. Odd sequence 
number means the seqlock is busy. That would replace a non-atomic RMW of 
the sequence number with an atomic RMW CAS and avoid the spin lock 
atomic RMW operation. Not sure how much it helps.
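The combined spinlock-and-sequence-number idea above could be sketched as follows. This is a hedged, illustrative sketch, not code from the patch; the "cas_" names are invented, and an odd sn doubles as the writer lock:

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch of the suggestion above: the sequence number itself serves
 * as the writer lock, with an odd value meaning "write in progress".
 * Invented names; not part of the patch under review. */
struct cas_seqlock {
	uint32_t sn;
};

static void
cas_write_begin(struct cas_seqlock *sl)
{
	for (;;) {
		uint32_t sn = __atomic_load_n(&sl->sn, __ATOMIC_RELAXED);

		/* only attempt to take the lock when sn is even; the
		 * acquire CAS keeps the data stores after the odd-sn
		 * store */
		if ((sn & 1) == 0 &&
		    __atomic_compare_exchange_n(&sl->sn, &sn, sn + 1,
						false, __ATOMIC_ACQUIRE,
						__ATOMIC_RELAXED))
			return; /* sn now odd: other writers excluded */
	}
}

static void
cas_write_end(struct cas_seqlock *sl)
{
	/* release: the protected-data stores become visible no later
	 * than the even sn; only the lock holder stores here, so a
	 * plain read of sn is fine */
	__atomic_store_n(&sl->sn, sl->sn + 1, __ATOMIC_RELEASE);
}
```

This trades the spinlock's atomic RMW for a CAS on the sn, at the cost of turning the writer-side sn update into an atomic RMW as well.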

> 
> Should the rte_seqlock_read_retry() be called rte_seqlock_read_end(), or
> some third alternative? I wanted to make clear it's not just a "release
> the lock" function. You could use
> the __attribute__((warn_unused_result)) annotation to make clear the
> return value cannot be ignored, although I'm not sure DPDK ever uses
> that attribute.
We have to decide how to use the seqlock API from the application 
perspective.
Your current proposal:
do {
     sn = rte_seqlock_read_begin(&seqlock);
     //read protected data
} while (rte_seqlock_read_retry(&seqlock, sn));

or perhaps
sn = rte_seqlock_read_lock(&seqlock);
do {
     //read protected data
} while (!rte_seqlock_read_tryunlock(&seqlock, &sn));

Tryunlock should signal to the user that the unlock operation might not 
succeed and something needs to be repeated.
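The read_lock()/read_tryunlock() shape could be sketched as below. This is a hedged, illustrative sketch only (the "demo_" names are invented, not the patch's API); the key point is that a failed tryunlock refreshes the caller's sn so read_lock() need not be called again:

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy type for illustrating the proposed API shape; all names here
 * are invented for the example. */
struct demo_seqlock {
	uint32_t sn;
};

static uint32_t
demo_read_lock(const struct demo_seqlock *sl)
{
	return __atomic_load_n(&sl->sn, __ATOMIC_ACQUIRE);
}

/* Returns true if the data loads since demo_read_lock() (or since
 * the previous failed tryunlock) form a consistent snapshot. On
 * failure, *sn is refreshed so the caller can simply retry. */
static bool
demo_read_tryunlock(const struct demo_seqlock *sl, uint32_t *sn)
{
	uint32_t end_sn;

	/* order the protected-data loads before the sn re-load */
	__atomic_thread_fence(__ATOMIC_ACQUIRE);
	/* acquire, since on failure this load doubles as the next
	 * attempt's "lock" load */
	end_sn = __atomic_load_n(&sl->sn, __ATOMIC_ACQUIRE);

	if (end_sn == *sn && (*sn & 1) == 0)
		return true;

	*sn = end_sn;
	return false;
}
```

Usage then matches the loop above: take sn once with demo_read_lock(), and re-read the protected data until demo_read_tryunlock() returns true.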

-- Ola
  
Morten Brørup March 31, 2022, 9:25 a.m. UTC | #3
> From: Ola Liljedahl [mailto:ola.liljedahl@arm.com]
> Sent: Thursday, 31 March 2022 11.05
> 
> On 3/31/22 09:46, Mattias Rönnblom wrote:
> > On 2022-03-30 16:26, Mattias Rönnblom wrote:
> >> A sequence lock (seqlock) is a synchronization primitive which allows
> >> for data-race-free, low-overhead, high-frequency reads, especially for
> >> data structures shared across many cores and updated relatively
> >> infrequently.
> >>
> >>
> >
> > <snip>
> >
> > Some questions I have:
> >
> > Is a variant of the seqlock without the spinlock required? The reason
> I
> > left such out was that I thought that in most cases where only a
> single
> > writer is used (or serialization is external to the seqlock), the
> > spinlock overhead is negligible, since updates are relatively
> infrequent.

Mattias, when you suggested adding the seqlock, I considered this too, and came to the same conclusion as you.

> You can combine the spinlock and the sequence number. Odd sequence
> number means the seqlock is busy. That would replace a non-atomic RMW
> of
> the sequence number with an atomic RMW CAS and avoid the spin lock
> atomic RMW operation. Not sure how much it helps.
> 
> >
> > Should the rte_seqlock_read_retry() be called rte_seqlock_read_end(),
> or
> > some third alternative? I wanted to make clear it's not just a
> "release
> > the lock" function. You could use
> > the __attribute__((warn_unused_result)) annotation to make clear the
> > return value cannot be ignored, although I'm not sure DPDK ever uses
> > that attribute.

I strongly support adding __attribute__((warn_unused_result)) to the function. There's a first time for everything, and this attribute is very relevant here!

> We have to decide how to use the seqlock API from the application
> perspective.
> Your current proposal:
> do {
>      sn = rte_seqlock_read_begin(&seqlock)
>      //read protected data
> } while (rte_seqlock_read_retry(&seqlock, sn));
> 
> or perhaps
> sn = rte_seqlock_read_lock(&seqlock);
> do {
>      //read protected data
> } while (!rte_seqlock_read_tryunlock(&seqlock, &sn));
> 
> Tryunlock should signal to the user that the unlock operation might not
> succeed and something needs to be repeated.

Perhaps rename rte_seqlock_read_retry() to rte_seqlock_read_tryend()? As Ola mentions, this also inverts the boolean result value. If you consider this, please check that the resulting assembly output remains efficient.

I think lock()/unlock() should be avoided in the read operation names, because no lock is taken during read. I like the critical region begin()/end() names.

Regarding naming, you should also consider renaming rte_seqlock_write_begin/end() to rte_seqlock_write_lock/unlock(), following the naming convention of the other locks. This could prepare for future extensions, such as rte_seqlock_write_trylock(). Just a thought; I don't feel strongly about this.

Ola, the rte_seqlock_read_lock(&seqlock) must remain inside the loop, because retries can be triggered by a write operation happening between the read_begin() and read_tryend(), and then the new sn must be used by the read operation.
  
Ola Liljedahl March 31, 2022, 9:38 a.m. UTC | #4
On 3/31/22 11:25, Morten Brørup wrote:
>> From: Ola Liljedahl [mailto:ola.liljedahl@arm.com]
>> Sent: Thursday, 31 March 2022 11.05
>>
>> On 3/31/22 09:46, Mattias Rönnblom wrote:
>>> On 2022-03-30 16:26, Mattias Rönnblom wrote:
>>>> A sequence lock (seqlock) is a synchronization primitive which allows
>>>> for data-race-free, low-overhead, high-frequency reads, especially for
>>>> data structures shared across many cores and updated relatively
>>>> infrequently.
>>>>
>>>>
>>>
>>> <snip>
>>>
>>> Some questions I have:
>>>
>>> Is a variant of the seqlock without the spinlock required? The reason
>> I
>>> left such out was that I thought that in most cases where only a
>> single
>>> writer is used (or serialization is external to the seqlock), the
>>> spinlock overhead is negligible, since updates are relatively
>> infrequent.
> 
> Mattias, when you suggested adding the seqlock, I considered this too, and came to the same conclusion as you.
> 
>> You can combine the spinlock and the sequence number. Odd sequence
>> number means the seqlock is busy. That would replace a non-atomic RMW
>> of
>> the sequence number with an atomic RMW CAS and avoid the spin lock
>> atomic RMW operation. Not sure how much it helps.
>>
>>>
>>> Should the rte_seqlock_read_retry() be called rte_seqlock_read_end(),
>> or
>>> some third alternative? I wanted to make clear it's not just a
>> "release
>>> the lock" function. You could use
>>> the __attribute__((warn_unused_result)) annotation to make clear the
>>> return value cannot be ignored, although I'm not sure DPDK ever uses
>>> that attribute.
> 
> I strongly support adding __attribute__((warn_unused_result)) to the function. There's a first time for everything, and this attribute is very relevant here!
> 
>> We have to decide how to use the seqlock API from the application
>> perspective.
>> Your current proposal:
>> do {
>>       sn = rte_seqlock_read_begin(&seqlock)
>>       //read protected data
>> } while (rte_seqlock_read_retry(&seqlock, sn));
>>
>> or perhaps
>> sn = rte_seqlock_read_lock(&seqlock);
>> do {
>>       //read protected data
>> } while (!rte_seqlock_read_tryunlock(&seqlock, &sn));
>>
>> Tryunlock should signal to the user that the unlock operation might not
>> succeed and something needs to be repeated.
> 
> Perhaps rename rte_seqlock_read_retry() to rte_seqlock_read_tryend()? As Ola mentions, this also inverses the boolean result value. If you consider this, please check that the resulting assembly output remains efficient.
> 
> I think lock()/unlock() should be avoided in the read operation names, because no lock is taken during read. I like the critical region begin()/end() names.
I was following the naming convention of rte_rwlock. Isn't the seqlock 
just a more scalable implementation of a reader/writer lock?


> 
> Regarding naming, you should also consider renaming rte_seqlock_write_begin/end() to rte_seqlock_write_lock/unlock(), following the naming convention of the other locks. This could prepare for future extensions, such as rte_seqlock_write_trylock(). Just a thought; I don't feel strongly about this.
> 
> Ola, the rte_seqlock_read_lock(&seqlock) must remain inside the loop, because retries can be triggered by a write operation happening between the read_begin() and read_tryend(), and then the new sn must be used by the read operation.
That's why my rte_seqlock_read_tryunlock() function takes the sequence 
number as a parameter passed by reference. Then the sequence number can 
be updated if necessary. I didn't want to force a new call to 
rte_seqlock_read_lock() because there should be a one-to-one match 
between rte_seqlock_read_lock() and a successful call to 
rte_seqlock_read_tryunlock().

- Ola
>
  
Morten Brørup March 31, 2022, 10:03 a.m. UTC | #5
> From: Ola Liljedahl [mailto:ola.liljedahl@arm.com]
> Sent: Thursday, 31 March 2022 11.39
> 
> On 3/31/22 11:25, Morten Brørup wrote:
> >> From: Ola Liljedahl [mailto:ola.liljedahl@arm.com]
> >> Sent: Thursday, 31 March 2022 11.05
> >>
> >> On 3/31/22 09:46, Mattias Rönnblom wrote:
> >>> On 2022-03-30 16:26, Mattias Rönnblom wrote:
> >>>> A sequence lock (seqlock) is a synchronization primitive which
> >>>> allows for data-race-free, low-overhead, high-frequency reads,
> >>>> especially for data structures shared across many cores and
> >>>> updated relatively infrequently.
> >>>>
> >>>>
> >>>
> >>> <snip>
> >>>
> >>> Some questions I have:
> >>>
> >>> Is a variant of the seqlock without the spinlock required? The
> >>> reason I
> >>> left such out was that I thought that in most cases where only a
> >>> single
> >>> writer is used (or serialization is external to the seqlock), the
> >>> spinlock overhead is negligible, since updates are relatively
> >>> infrequent.
> >
> > Mattias, when you suggested adding the seqlock, I considered this
> > too, and came to the same conclusion as you.
> >
> >> You can combine the spinlock and the sequence number. Odd sequence
> >> number means the seqlock is busy. That would replace a non-atomic
> >> RMW of
> >> the sequence number with an atomic RMW CAS and avoid the spin lock
> >> atomic RMW operation. Not sure how much it helps.
> >>
> >>>
> >>> Should the rte_seqlock_read_retry() be called
> >>> rte_seqlock_read_end(), or
> >>> some third alternative? I wanted to make clear it's not just a
> >>> "release the lock" function. You could use
> >>> the __attribute__((warn_unused_result)) annotation to make clear the
> >>> return value cannot be ignored, although I'm not sure DPDK ever uses
> >>> that attribute.
> >
> > I strongly support adding __attribute__((warn_unused_result)) to the
> > function. There's a first time for everything, and this attribute is
> > very relevant here!
> >
> >> We have to decide how to use the seqlock API from the application
> >> perspective.
> >> Your current proposal:
> >> do {
> >>       sn = rte_seqlock_read_begin(&seqlock)
> >>       //read protected data
> >> } while (rte_seqlock_read_retry(&seqlock, sn));
> >>
> >> or perhaps
> >> sn = rte_seqlock_read_lock(&seqlock);
> >> do {
> >>       //read protected data
> >> } while (!rte_seqlock_read_tryunlock(&seqlock, &sn));
> >>
> >> Tryunlock should signal to the user that the unlock operation might
> >> not succeed and something needs to be repeated.
> >
> > Perhaps rename rte_seqlock_read_retry() to rte_seqlock_read_tryend()?
> > As Ola mentions, this also inverses the boolean result value. If you
> > consider this, please check that the resulting assembly output remains
> > efficient.
> >
> > I think lock()/unlock() should be avoided in the read operation
> > names, because no lock is taken during read. I like the critical region
> > begin()/end() names.
> I was following the naming convention of rte_rwlock. Isn't the seqlock
> just a more scalable implementation of a reader/writer lock?

I see your point. However, no lock is taken, so using lock()/unlock() is somewhat misleading.

I have no strong opinion about this, so I'll leave it up to Mattias.

> > Regarding naming, you should also consider renaming
> > rte_seqlock_write_begin/end() to rte_seqlock_write_lock/unlock(),
> > following the naming convention of the other locks. This could prepare
> > for future extensions, such as rte_seqlock_write_trylock(). Just a
> > thought; I don't feel strongly about this.
> >
> > Ola, the rte_seqlock_read_lock(&seqlock) must remain inside the loop,
> > because retries can be triggered by a write operation happening between
> > the read_begin() and read_tryend(), and then the new sn must be used by
> > the read operation.
> That's why my rte_seqlock_read_tryunlock() function takes the sequence
> number as a parameter passed by reference. Then the sequence number can
> be updated if necessary. I didn't want to force a new call to
> rte_seqlock_read_lock() because there should be a one-to-one match
> between rte_seqlock_read_lock() and a successful call to
> rte_seqlock_read_tryunlock().

Uhh... I missed that point.

In that case, consider passing sn to read_begin() as an output parameter by reference too, like the Linux kernel's spin_lock_irqsave() takes the flags as an output parameter. I don't have a strong opinion here; just mentioning the possibility.

Performance-wise, the resulting assembly output is probably the same, regardless of whether sn is the return value or an output parameter passed by reference.
  
Ola Liljedahl March 31, 2022, 11:44 a.m. UTC | #6
<snip>
>>> I think lock()/unlock() should be avoided in the read operation
>>> names, because no lock is taken during read. I like the critical region
>>> begin()/end() names.
>> I was following the naming convention of rte_rwlock. Isn't the seqlock
>> just a more scalable implementation of a reader/writer lock?
> 
> I see your point. However, no lock is taken, so using lock()/unlock() is somewhat misleading.
Conceptually, a reader lock is acquired and should be released. Now, 
there would be no negative effect from skipping the unlock operation, 
but then you wouldn't know if the data was properly read, so you would 
have to ignore any read data as well. Why even call 
rte_seqlock_read_lock() in such a situation?

In the only meaningful case, the lock is acquired, the protected data is 
read and the lock is released. The only difference compared to a more 
vanilla lock implementation is that the release operation may fail and 
the operation must restart.

<snip>

- Ola
  
Morten Brørup March 31, 2022, 11:50 a.m. UTC | #7
> From: Ola Liljedahl [mailto:ola.liljedahl@arm.com]
> Sent: Thursday, 31 March 2022 13.44
> 
> <snip>
> >>> I think lock()/unlock() should be avoided in the read operation
> >>> names, because no lock is taken during read. I like the critical
> region
> >>> begin()/end() names.
> >> I was following the naming convention of rte_rwlock. Isn't the
> seqlock
> >> just a more scalable implementation of a reader/writer lock?
> >
> > I see your point. However, no lock is taken, so using lock()/unlock()
> is somewhat misleading.
> Conceptually, a reader lock is acquired and should be released. Now
> there wouldn't be any negative effects of skipping the unlock operation
> but then you wouldn't know if the data was properly read so you would
> have to ignore any read data as well. Why even call
> rte_seqlock_read_lock() in such a situation?
> 
> In the only meaningful case, the lock is acquired, the protected data
> is
> read and the lock is released. The only difference compared to a more
> vanilla lock implementation is that the release operation may fail and
> the operation must restart.

Thank you for taking the time to correct me on this...

I was stuck on the "lock" variable not being touched, but you are right: the sequence number is conceptually a lock that is taken.

Then I agree about the lock()/unlock() names for the read operation too.

-Morten
  
Mattias Rönnblom March 31, 2022, 1:38 p.m. UTC | #8
On 2022-03-31 11:04, Ola Liljedahl wrote:
> 
> On 3/31/22 09:46, Mattias Rönnblom wrote:
>> On 2022-03-30 16:26, Mattias Rönnblom wrote:
>>> A sequence lock (seqlock) is a synchronization primitive which allows
>>> for data-race-free, low-overhead, high-frequency reads, especially for
>>> data structures shared across many cores and updated relatively
>>> infrequently.
>>>
>>>
>>
>> <snip>
>>
>> Some questions I have:
>>
>> Is a variant of the seqlock without the spinlock required? The reason I
>> left such out was that I thought that in most cases where only a single
>> writer is used (or serialization is external to the seqlock), the
>> spinlock overhead is negligible, since updates are relatively infrequent.
> You can combine the spinlock and the sequence number. Odd sequence 
> number means the seqlock is busy. That would replace a non-atomic RMW of 
> the sequence number with an atomic RMW CAS and avoid the spin lock 
> atomic RMW operation. Not sure how much it helps.
> 
>>
>> Should the rte_seqlock_read_retry() be called rte_seqlock_read_end(), or
>> some third alternative? I wanted to make clear it's not just a "release
>> the lock" function. You could use
>> the __attribute__((warn_unused_result)) annotation to make clear the
>> return value cannot be ignored, although I'm not sure DPDK ever uses
>> that attribute.
> We have to decide how to use the seqlock API from the application 
> perspective.
> Your current proposal:
> do {
>      sn = rte_seqlock_read_begin(&seqlock)
>      //read protected data
> } while (rte_seqlock_read_retry(&seqlock, sn));
> 
> or perhaps
> sn = rte_seqlock_read_lock(&seqlock);
> do {
>      //read protected data
> } while (!rte_seqlock_read_tryunlock(&seqlock, &sn));
> 
> Tryunlock should signal to the user that the unlock operation might not 
> succeed and something needs to be repeated.
> 

I like that your proposal is consistent with rwlock API, although I tend 
to think about a seqlock more like an arbitrary-size atomic load/store, 
where begin() is the beginning of the read transaction.

What I don't like so much about "tryunlock" is that it's not obvious what 
return type and values it should have. I seem not to be the only one 
who suffers from a lack of intuition here, since the DPDK spinlock 
trylock() function returns '1' in case the lock is taken (using an int, 
but treating it like a bool), while the rwlock equivalent returns '0' 
(also an int, but treating it as an error code).

"lock" also suggests you prevent something from occurring, which is not 
the case on the reader side. A calling application also need not call 
the reader unlock (or retry) function for all seqlocks it has locked, 
although I don't see a point why it wouldn't. (I don't see why a 
read-side critical section should contain much logic at all, since you 
can't act on the just-read data.)
  
Mattias Rönnblom March 31, 2022, 1:51 p.m. UTC | #9
On 2022-03-31 11:25, Morten Brørup wrote:
>> From: Ola Liljedahl [mailto:ola.liljedahl@arm.com]
>> Sent: Thursday, 31 March 2022 11.05
>>
>> On 3/31/22 09:46, Mattias Rönnblom wrote:
>>> On 2022-03-30 16:26, Mattias Rönnblom wrote:
>>>> A sequence lock (seqlock) is a synchronization primitive which allows
>>>> for data-race-free, low-overhead, high-frequency reads, especially for
>>>> data structures shared across many cores and updated relatively
>>>> infrequently.
>>>>
>>>>
>>>
>>> <snip>
>>>
>>> Some questions I have:
>>>
>>> Is a variant of the seqlock without the spinlock required? The reason
>> I
>>> left such out was that I thought that in most cases where only a
>> single
>>> writer is used (or serialization is external to the seqlock), the
>>> spinlock overhead is negligible, since updates are relatively
>> infrequent.
> 
> Mattias, when you suggested adding the seqlock, I considered this too, and came to the same conclusion as you.
> 
>> You can combine the spinlock and the sequence number. Odd sequence
>> number means the seqlock is busy. That would replace a non-atomic RMW
>> of
>> the sequence number with an atomic RMW CAS and avoid the spin lock
>> atomic RMW operation. Not sure how much it helps.
>>
>>>
>>> Should the rte_seqlock_read_retry() be called rte_seqlock_read_end(),
>> or
>>> some third alternative? I wanted to make clear it's not just a
>> "release
>>> the lock" function. You could use
>>> the __attribute__((warn_unused_result)) annotation to make clear the
>>> return value cannot be ignored, although I'm not sure DPDK ever uses
>>> that attribute.
> 
> I strongly support adding __attribute__((warn_unused_result)) to the function. There's a first time for everything, and this attribute is very relevant here!
> 

That would be a separate patch, I assume. Does anyone know if this 
attribute is available in all supported compilers?

>> We have to decide how to use the seqlock API from the application
>> perspective.
>> Your current proposal:
>> do {
>>       sn = rte_seqlock_read_begin(&seqlock)
>>       //read protected data
>> } while (rte_seqlock_read_retry(&seqlock, sn));
>>
>> or perhaps
>> sn = rte_seqlock_read_lock(&seqlock);
>> do {
>>       //read protected data
>> } while (!rte_seqlock_read_tryunlock(&seqlock, &sn));
>>
>> Tryunlock should signal to the user that the unlock operation might not
>> succeed and something needs to be repeated.
> 
> Perhaps rename rte_seqlock_read_retry() to rte_seqlock_read_tryend()? As Ola mentions, this also inverses the boolean result value. If you consider this, please check that the resulting assembly output remains efficient.
> 
> I think lock()/unlock() should be avoided in the read operation names, because no lock is taken during read. I like the critical region begin()/end() names.
> 
> Regarding naming, you should also consider renaming rte_seqlock_write_begin/end() to rte_seqlock_write_lock/unlock(), following the naming convention of the other locks. This could prepare for future extensions, such as rte_seqlock_write_trylock(). Just a thought; I don't feel strongly about this.
> 
> Ola, the rte_seqlock_read_lock(&seqlock) must remain inside the loop, because retries can be triggered by a write operation happening between the read_begin() and read_tryend(), and then the new sn must be used by the read operation.
>
  
Mattias Rönnblom March 31, 2022, 2:02 p.m. UTC | #10
On 2022-03-31 13:44, Ola Liljedahl wrote:
> <snip>
>>>> I think lock()/unlock() should be avoided in the read operation
>>>> names, because no lock is taken during read. I like the critical region
>>>> begin()/end() names.
>>> I was following the naming convention of rte_rwlock. Isn't the seqlock
>>> just a more scalable implementation of a reader/writer lock?
>>
>> I see your point. However, no lock is taken, so using lock()/unlock() 
>> is somewhat misleading.
> Conceptually, a reader lock is acquired and should be released. Now 
> there wouldn't be any negative effects of skipping the unlock operation 
> but then you wouldn't know if the data was properly read so you would 
> have to ignore any read data as well. Why even call 
> rte_seqlock_read_lock() in such a situation?
> 
> In the only meaningful case, the lock is acquired, the protected data is 
> read and the lock is released. The only difference compared to a more 
> vanilla lock implementation is that the release operation may fail and 
> the operation must restart.
> 
> <snip>
> 
> - Ola

The RCU library also uses the terms "lock" and "unlock" for the reader side.
  
Ola Liljedahl March 31, 2022, 2:53 p.m. UTC | #11
(Thunderbird suddenly refuses to edit in plain text mode, hope the mail 
gets sent as text anyway)

On 3/31/22 15:38, Mattias Rönnblom wrote:

> On 2022-03-31 11:04, Ola Liljedahl wrote:
>> On 3/31/22 09:46, Mattias Rönnblom wrote:
>>> On 2022-03-30 16:26, Mattias Rönnblom wrote:
>>>
<snip>
>>> Should the rte_seqlock_read_retry() be called rte_seqlock_read_end(), or
>>> some third alternative? I wanted to make clear it's not just a "release
>>> the lock" function. You could use
>>> the __attribute__((warn_unused_result)) annotation to make clear the
>>> return value cannot be ignored, although I'm not sure DPDK ever uses
>>> that attribute.
>> We have to decide how to use the seqlock API from the application
>> perspective.
>> Your current proposal:
>> do {
>>       sn = rte_seqlock_read_begin(&seqlock)
>>       //read protected data
>> } while (rte_seqlock_read_retry(&seqlock, sn));
>>
>> or perhaps
>> sn = rte_seqlock_read_lock(&seqlock);
>> do {
>>       //read protected data
>> } while (!rte_seqlock_read_tryunlock(&seqlock, &sn));
>>
>> Tryunlock should signal to the user that the unlock operation might not
>> succeed and something needs to be repeated.
>>
> I like that your proposal is consistent with rwlock API, although I tend
> to think about a seqlock more like an arbitrary-size atomic load/store,
> where begin() is the beginning of the read transaction.

I can see the evolution of an application where it starts with plain 
spin locks, moves to reader/writer locks for better performance, and 
eventually moves to seqlocks. The usage is the same; only the 
characteristics (including performance) differ.

>
> What I don't like so much with "tryunlock" is that it's not obvious what
> return type and values it should have. I seem not to be the only one
> which suffers from a lack of intuition here, since the DPDK spinlock
> trylock() function returns '1' in case lock is taken (using an int, but
> treating it like a bool), while the rwlock equivalent returns '0' (also
> int, but treating it as an error code).
Then you have two different ways of doing it! Or invent a third since 
there seems to be no consistent pattern.
>
> "lock" also suggests you prevent something from occurring, which is not
> the case on the reader side.

That's why my implementations in Progress64 use the terms acquire and 
release. Locks are acquired and released (with acquire and release 
semantics!). Hazard pointers are acquired and released (with acquire and 
release semantics!). Slots in reorder buffers are acquired and released. 
Etc.

https://github.com/ARM-software/progress64

>   A calling application also need not call
> the reader unlock (or retry) function for all seqlocks it has locked,
> although I don't see a point why it wouldn't. (I don't see why a
> read-side critical section should contain much logic at all, since you
> can't act on the just-read data.)

Lock without unlock/retry is meaningless and not something we need to 
consider IMO.


- Ola
  
Stephen Hemminger April 2, 2022, 12:50 a.m. UTC | #12
On Wed, 30 Mar 2022 16:26:02 +0200
Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote:

> +	__atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELAXED);
> +
> +	/* __ATOMIC_RELEASE to prevent stores after (in program order)
> +	 * from happening before the sn store.
> +	 */
> +	rte_atomic_thread_fence(__ATOMIC_RELEASE);

Couldn't atomic store with __ATOMIC_RELEASE do same thing?

> +static inline void
> +rte_seqlock_write_end(rte_seqlock_t *seqlock)
> +{
> +	uint32_t sn;
> +
> +	sn = seqlock->sn + 1;
> +
> +	/* synchronizes-with the load acquire in rte_seqlock_read_begin() */
> +	__atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELEASE);
> +
> +	rte_spinlock_unlock(&seqlock->lock);

Atomic store is not necessary here, the atomic operation in
spinlock_unlock will assure that the sequence number update is
ordered correctly.
  
Stephen Hemminger April 2, 2022, 12:52 a.m. UTC | #13
On Thu, 31 Mar 2022 16:53:00 +0200
Ola Liljedahl <ola.liljedahl@arm.com> wrote:

> From: Ola Liljedahl <ola.liljedahl@arm.com>
> To: Mattias Rönnblom <mattias.ronnblom@ericsson.com>,  "dev@dpdk.org"
> <dev@dpdk.org> Cc: Thomas Monjalon <thomas@monjalon.net>,  David
> Marchand <david.marchand@redhat.com>,  Onar Olsen
> <onar.olsen@ericsson.com>,  "Honnappa.Nagarahalli@arm.com"
> <Honnappa.Nagarahalli@arm.com>,  "nd@arm.com" <nd@arm.com>,
> "konstantin.ananyev@intel.com" <konstantin.ananyev@intel.com>,
> "mb@smartsharesystems.com" <mb@smartsharesystems.com>,
> "stephen@networkplumber.org" <stephen@networkplumber.org> Subject:
> Re: [PATCH v2] eal: add seqlock Date: Thu, 31 Mar 2022 16:53:00 +0200
> User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101
> Thunderbird/91.7.0
> 
> (Thunderbird suddenly refuses to edit in plain text mode, hope the
> mail gets sent as text anyway)
> 
> On 3/31/22 15:38, Mattias Rönnblom wrote:
> 
> > On 2022-03-31 11:04, Ola Liljedahl wrote:  
> >> On 3/31/22 09:46, Mattias Rönnblom wrote:  
> >>> On 2022-03-30 16:26, Mattias Rönnblom wrote:
> >>>  
> <snip>
> >>> Should the rte_seqlock_read_retry() be called
> >>> rte_seqlock_read_end(), or some third alternative? I wanted to
> >>> make clear it's not just a "release the lock" function. You could
> >>> use the __attribute__((warn_unused_result)) annotation to make
> >>> clear the return value cannot be ignored, although I'm not sure
> >>> DPDK ever uses that attribute.  
> >> We have to decide how to use the seqlock API from the application
> >> perspective.
> >> Your current proposal:
> >> do {
> >>       sn = rte_seqlock_read_begin(&seqlock)
> >>       //read protected data
> >> } while (rte_seqlock_read_retry(&seqlock, sn));
> >>
> >> or perhaps
> >> sn = rte_seqlock_read_lock(&seqlock);
> >> do {
> >>       //read protected data
> >> } while (!rte_seqlock_read_tryunlock(&seqlock, &sn));
> >>
> >> Tryunlock should signal to the user that the unlock operation
> >> might not succeed and something needs to be repeated.
> >>  
> > I like that your proposal is consistent with rwlock API, although I
> > tend to think about a seqlock more like an arbitrary-size atomic
> > load/store, where begin() is the beginning of the read transaction.
> >  
> 
> I can see the evolution of an application where it starts to use
> plain spin locks, moves to reader/writer locks for better performance
> and eventually moves to seqlocks. The usage is the same, only the 
> characteristics (including performance) differ.


The semantics of a seqlock in DPDK must be the same as what the Linux kernel
does or you are asking for trouble.  It is not a reader-writer lock in
the traditional sense.
  
Stephen Hemminger April 2, 2022, 12:54 a.m. UTC | #14
On Thu, 31 Mar 2022 13:51:32 +0000
Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote:

> > 
> > Regarding naming, you should also consider renaming
> > rte_seqlock_write_begin/end() to rte_seqlock_write_lock/unlock(),
> > following the naming convention of the other locks. This could
> > prepare for future extensions, such as rte_seqlock_write_trylock().
> > Just a thought; I don't feel strongly about this.

Semantics and naming should be the same as Linux kernel or you risk
having to reeducate too many people.
  
Morten Brørup April 2, 2022, 10:25 a.m. UTC | #15
> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Saturday, 2 April 2022 02.54
> 
> Semantics and naming should be the same as Linux kernel or you risk
> having to reeducate too many people.

Although I do see significant value in that point, I don't consider the Linux kernel API the definitive golden standard in all regards. If DPDK can do better, it should.

However, if different naming/semantics does a better job for DPDK, then we should take care to avoid similar function names with different behavior than Linux, to reduce the risk of incorrect use by seasoned Linux kernel developers.
  
Ola Liljedahl April 2, 2022, 5:43 p.m. UTC | #16
On 4/2/22 12:25, Morten Brørup wrote:
>> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
>> Sent: Saturday, 2 April 2022 02.54
>>
>> Semantics and naming should be the same as Linux kernel or you risk
>> having to reeducate too many people.
> Although I do see significant value in that point, I don't consider the Linux kernel API the definitive golden standard in all regards. If DPDK can do better, it should.
>
> However, if different naming/semantics does a better job for DPDK, then we should take care to avoid similar function names with different behavior than Linux, to reduce the risk of incorrect use by seasoned Linux kernel developers.
Couldn't have said it better myself.
  
Ola Liljedahl April 2, 2022, 5:54 p.m. UTC | #17
On 4/2/22 02:50, Stephen Hemminger wrote:
> On Wed, 30 Mar 2022 16:26:02 +0200
> Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote:
>
>> +	__atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELAXED);
>> +
>> +	/* __ATOMIC_RELEASE to prevent stores after (in program order)
>> +	 * from happening before the sn store.
>> +	 */
>> +	rte_atomic_thread_fence(__ATOMIC_RELEASE);
> Couldn't atomic store with __ATOMIC_RELEASE do same thing?

No, store-release wouldn't prevent later stores from moving up. It only 
ensures that earlier loads and stores have completed before 
store-release completes. If later stores could move before a supposed 
store-release(seqlock->sn), readers could see inconsistent (torn) data 
with a valid sequence number.


>> +static inline void
>> +rte_seqlock_write_end(rte_seqlock_t *seqlock)
>> +{
>> +	uint32_t sn;
>> +
>> +	sn = seqlock->sn + 1;
>> +
>> +	/* synchronizes-with the load acquire in rte_seqlock_read_begin() */
>> +	__atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELEASE);
>> +
>> +	rte_spinlock_unlock(&seqlock->lock);
> Atomic store is not necessary here, the atomic operation in
> spinlock_unlock will assure that the sequence number update is
> ordered correctly.
Load-acquire(seqlock->sn) in rte_seqlock_begin() must be paired with 
store-release(seqlock->sn) in rte_seqlock_write_end() or there wouldn't 
exist any synchronize-with relationship. Readers don't access the spin 
lock so any writer-side updates to the spin lock don't mean anything to 
readers.
  
Honnappa Nagarahalli April 2, 2022, 7:37 p.m. UTC | #18
<snip>

> >> +static inline void
> >> +rte_seqlock_write_end(rte_seqlock_t *seqlock) {
> >> +	uint32_t sn;
> >> +
> >> +	sn = seqlock->sn + 1;
> >> +
> >> +	/* synchronizes-with the load acquire in rte_seqlock_read_begin() */
> >> +	__atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELEASE);
> >> +
> >> +	rte_spinlock_unlock(&seqlock->lock);
> > Atomic store is not necessary here, the atomic operation in
> > spinlock_unlock will assure that the sequence number update is
> > ordered correctly.
> Load-acquire(seqlock->sn) in rte_seqlock_begin() must be paired with
> store-release(seqlock->sn) in rte_seqlock_write_end() or there wouldn't exist
> any synchronize-with relationship. Readers don't access the spin lock so any
> writer-side updates to the spin lock don't mean anything to readers.
Agree with this assessment. The store-release in spin-lock unlock does not synchronize with the readers.
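The pairing being described, a load-acquire of sn matched by a store-release of sn, with fences ordering the data accesses in between, can be sketched in portable C11 atomics. This is a simplified, hypothetical single-writer variant (all names invented here); the actual patch uses GCC __atomic builtins and an rte_spinlock for writer-writer serialization:

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical single-writer seqlock sketch; illustration only. */
struct seqlock {
	atomic_uint sn;
};

static void seqlock_init(struct seqlock *s)
{
	atomic_init(&s->sn, 0);
}

static uint32_t seqlock_read_begin(struct seqlock *s)
{
	/* load-acquire: data loads after this cannot move above it;
	 * pairs with the store-release in seqlock_write_end() */
	return atomic_load_explicit(&s->sn, memory_order_acquire);
}

static int seqlock_read_retry(struct seqlock *s, uint32_t begin_sn)
{
	/* acquire fence: the data loads complete before the second
	 * sn load below */
	atomic_thread_fence(memory_order_acquire);
	uint32_t end_sn = atomic_load_explicit(&s->sn,
					       memory_order_relaxed);
	/* odd begin_sn: a write was in progress when the read began;
	 * changed sn: a write completed during the read */
	return (begin_sn & 1) || begin_sn != end_sn;
}

static void seqlock_write_begin(struct seqlock *s)
{
	/* a single writer is assumed, so no writer lock is taken here */
	uint32_t sn = atomic_load_explicit(&s->sn,
					   memory_order_relaxed) + 1;
	atomic_store_explicit(&s->sn, sn, memory_order_relaxed);
	/* release fence: the data stores that follow cannot be
	 * reordered before the sn store; a store-release of sn alone
	 * would not give that ordering, as noted above */
	atomic_thread_fence(memory_order_release);
}

static void seqlock_write_end(struct seqlock *s)
{
	uint32_t sn = atomic_load_explicit(&s->sn,
					   memory_order_relaxed) + 1;
	/* store-release: pairs with the load-acquire in
	 * seqlock_read_begin() */
	atomic_store_explicit(&s->sn, sn, memory_order_release);
}
```

A single-threaded walk-through already shows the protocol: sn is odd exactly while a write is in progress, and each completed write advances it by two.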
  
Mattias Rönnblom April 3, 2022, 6:23 a.m. UTC | #19
On 2022-04-02 02:52, Stephen Hemminger wrote:
> On Thu, 31 Mar 2022 16:53:00 +0200
> Ola Liljedahl <ola.liljedahl@arm.com> wrote:
> 
>>
>> (Thunderbird suddenly refuses to edit in plain text mode, hope the
>> mail gets sent as text anyway)
>>
>> On 3/31/22 15:38, Mattias Rönnblom wrote:
>>
>>> On 2022-03-31 11:04, Ola Liljedahl wrote:
>>>> On 3/31/22 09:46, Mattias Rönnblom wrote:
>>>>> On 2022-03-30 16:26, Mattias Rönnblom wrote:
>>>>>   
>> <snip>
>>>>> Should the rte_seqlock_read_retry() be called
>>>>> rte_seqlock_read_end(), or some third alternative? I wanted to
>>>>> make clear it's not just a "release the lock" function. You could
>>>>> use the __attribute__((warn_unused_result)) annotation to make
>>>>> clear the return value cannot be ignored, although I'm not sure
>>>>> DPDK ever uses that attribute.
>>>> We have to decide how to use the seqlock API from the application
>>>> perspective.
>>>> Your current proposal:
>>>> do {
>>>>        sn = rte_seqlock_read_begin(&seqlock)
>>>>        //read protected data
>>>> } while (rte_seqlock_read_retry(&seqlock, sn));
>>>>
>>>> or perhaps
>>>> sn = rte_seqlock_read_lock(&seqlock);
>>>> do {
>>>>        //read protected data
>>>> } while (!rte_seqlock_read_tryunlock(&seqlock, &sn));
>>>>
>>>> Tryunlock should signal to the user that the unlock operation
>>>> might not succeed and something needs to be repeated.
>>>>   
>>> I like that your proposal is consistent with rwlock API, although I
>>> tend to think about a seqlock more like an arbitrary-size atomic
>>> load/store, where begin() is the beginning of the read transaction.
>>>   
>>
>> I can see the evolution of an application where it starts to use
>> plain spin locks, moves to reader/writer locks for better performance
>> and eventually moves to seqlocks. The usage is the same, only the
>> characteristics (including performance) differ.
> 
> 
> The semantics of a seqlock in DPDK must be the same as what the Linux kernel
> does or you are asking for trouble.  It is not a reader-writer lock in
> the traditional sense.

Does "semantics" here include the naming of the functions? The overall 
semantics will be the same, except the kernel has a bunch of variants 
with different kinds of write-side synchronization, if I recall correctly.

I'll try to summarize the options as I see them:

Option A: (PATCH v3):
rte_seqlock_read_lock()
rte_seqlock_read_tryunlock() /* with built-in "read restart" */ 
rte_seqlock_write_lock()
rte_seqlock_write_unlock()


Option B: (Linux kernel-style naming):
rte_seqlock_read_begin()
rte_seqlock_read_end()
rte_seqlock_write_begin()
rte_seqlock_write_end()

A combination, acknowledging there's a lock on the writer side, but not 
on the read side.

Option C:

rte_seqlock_read_begin()
rte_seqlock_read_retry()
rte_seqlock_write_lock()
rte_seqlock_write_unlock()
  
Stephen Hemminger April 5, 2022, 8:16 p.m. UTC | #20
On Wed, 30 Mar 2022 16:26:02 +0200
Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote:

> +/**
> + * A static seqlock initializer.
> + */
> +#define RTE_SEQLOCK_INITIALIZER { 0, RTE_SPINLOCK_INITIALIZER }

Use named field initializers here, please.

> +/**
> + * Initialize the seqlock.
> + *
> + * This function initializes the seqlock, and leaves the writer-side
> + * spinlock unlocked.
> + *
> + * @param seqlock
> + *   A pointer to the seqlock.
> + */
> +__rte_experimental
> +void
> +rte_seqlock_init(rte_seqlock_t *seqlock);

You need to add the standard experimental prefix
to the comment so that doxygen marks the API as experimental
in documentation.

> +static inline bool
> +rte_seqlock_read_retry(const rte_seqlock_t *seqlock, uint32_t begin_sn)
> +{
> +	uint32_t end_sn;
> +
> +	/* make sure the data loads happen before the sn load */
> +	rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
> +
> +	end_sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED);
> +
> +	return unlikely(begin_sn & 1 || begin_sn != end_sn);

Please add parenthesis around the and to test if odd.
It would be good to document why, if begin_sn is odd, it returns true (forcing a retry).
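The retry condition being questioned can be restated as a plain predicate (the helper name is invented here): a writer increments sn once on entry, making it odd, and once on exit, making it even again, so either an odd begin_sn or a changed sn means the read may have observed torn data.

```c
#include <stdint.h>

/* Hypothetical helper restating the rte_seqlock_read_retry() check:
 *  - an odd begin_sn means the read overlapped a write in progress;
 *  - begin_sn != end_sn means a write completed during the read.
 * In either case the copied data must be discarded and re-read.
 */
static int needs_retry(uint32_t begin_sn, uint32_t end_sn)
{
	return (begin_sn & 1) || (begin_sn != end_sn);
}
```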
  
Mattias Rönnblom April 8, 2022, 1:50 p.m. UTC | #21
On 2022-04-05 22:16, Stephen Hemminger wrote:
> On Wed, 30 Mar 2022 16:26:02 +0200
> Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote:
> 
>> +/**
>> + * A static seqlock initializer.
>> + */
>> +#define RTE_SEQLOCK_INITIALIZER { 0, RTE_SPINLOCK_INITIALIZER }
> 
> Use named field initializers here, please.
> 

OK.

>> +/**
>> + * Initialize the seqlock.
>> + *
>> + * This function initializes the seqlock, and leaves the writer-side
>> + * spinlock unlocked.
>> + *
>> + * @param seqlock
>> + *   A pointer to the seqlock.
>> + */
>> +__rte_experimental
>> +void
>> +rte_seqlock_init(rte_seqlock_t *seqlock);
> 
> You need to add the standard experimental prefix
> to the comment so that doxygen marks the API as experimental
> in documentation.
> 

OK.

>> +static inline bool
>> +rte_seqlock_read_retry(const rte_seqlock_t *seqlock, uint32_t begin_sn)
>> +{
>> +	uint32_t end_sn;
>> +
>> +	/* make sure the data loads happen before the sn load */
>> +	rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
>> +
>> +	end_sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED);
>> +
>> +	return unlikely(begin_sn & 1 || begin_sn != end_sn);
> 
> Please add parenthesis around the and to test if odd.
> It would be good to document why, if begin_sn is odd, it returns true (forcing a retry).
> 
> 

Will do. Thanks.
  

Patch

diff --git a/app/test/meson.build b/app/test/meson.build
index 5fc1dd1b7b..5e418e8766 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -125,6 +125,7 @@  test_sources = files(
         'test_rwlock.c',
         'test_sched.c',
         'test_security.c',
+        'test_seqlock.c',
         'test_service_cores.c',
         'test_spinlock.c',
         'test_stack.c',
@@ -214,6 +215,7 @@  fast_tests = [
         ['rwlock_rde_wro_autotest', true],
         ['sched_autotest', true],
         ['security_autotest', false],
+        ['seqlock_autotest', true],
         ['spinlock_autotest', true],
         ['stack_autotest', false],
         ['stack_lf_autotest', false],
diff --git a/app/test/test_seqlock.c b/app/test/test_seqlock.c
new file mode 100644
index 0000000000..ba1755d9ad
--- /dev/null
+++ b/app/test/test_seqlock.c
@@ -0,0 +1,202 @@ 
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2022 Ericsson AB
+ */
+
+#include <rte_seqlock.h>
+
+#include <rte_cycles.h>
+#include <rte_malloc.h>
+#include <rte_random.h>
+
+#include <inttypes.h>
+
+#include "test.h"
+
+struct data {
+	rte_seqlock_t lock;
+
+	uint64_t a;
+	uint64_t b __rte_cache_aligned;
+	uint64_t c __rte_cache_aligned;
+} __rte_cache_aligned;
+
+struct reader {
+	struct data *data;
+	uint8_t stop;
+};
+
+#define WRITER_RUNTIME (2.0) /* s */
+
+#define WRITER_MAX_DELAY (100) /* us */
+
+#define INTERRUPTED_WRITER_FREQUENCY (1000)
+#define WRITER_INTERRUPT_TIME (1) /* us */
+
+static int
+writer_run(void *arg)
+{
+	struct data *data = arg;
+	uint64_t deadline;
+
+	deadline = rte_get_timer_cycles() +
+		WRITER_RUNTIME * rte_get_timer_hz();
+
+	while (rte_get_timer_cycles() < deadline) {
+		bool interrupted;
+		uint64_t new_value;
+		unsigned int delay;
+
+		new_value = rte_rand();
+
+		interrupted = rte_rand_max(INTERRUPTED_WRITER_FREQUENCY) == 0;
+
+		rte_seqlock_write_begin(&data->lock);
+
+		data->c = new_value;
+
+		/* These compiler barriers (both on the test reader
+		 * and the test writer side) are here to ensure that
+		 * loads/stores *usually* happen in test program order
+		 * (always on a TSO machine). They are arranged in such
+		 * a way that the writer stores in a different order
+		 * than the reader loads, to emulate an arbitrary
+		 * order. A real application using a seqlock does not
+		 * require any compiler barriers.
+		 */
+		rte_compiler_barrier();
+		data->b = new_value;
+
+		if (interrupted)
+			rte_delay_us_block(WRITER_INTERRUPT_TIME);
+
+		rte_compiler_barrier();
+		data->a = new_value;
+
+		rte_seqlock_write_end(&data->lock);
+
+		delay = rte_rand_max(WRITER_MAX_DELAY);
+
+		rte_delay_us_block(delay);
+	}
+
+	return 0;
+}
+
+#define INTERRUPTED_READER_FREQUENCY (1000)
+#define READER_INTERRUPT_TIME (1000) /* us */
+
+static int
+reader_run(void *arg)
+{
+	struct reader *r = arg;
+	int rc = 0;
+
+	while (__atomic_load_n(&r->stop, __ATOMIC_RELAXED) == 0 && rc == 0) {
+		struct data *data = r->data;
+		bool interrupted;
+		uint64_t a;
+		uint64_t b;
+		uint64_t c;
+		uint32_t sn;
+
+		interrupted = rte_rand_max(INTERRUPTED_READER_FREQUENCY) == 0;
+
+		do {
+			sn = rte_seqlock_read_begin(&data->lock);
+
+			a = data->a;
+			/* See writer_run() for an explanation why
+			 * these barriers are here.
+			 */
+			rte_compiler_barrier();
+
+			if (interrupted)
+				rte_delay_us_block(READER_INTERRUPT_TIME);
+
+			c = data->c;
+
+			rte_compiler_barrier();
+			b = data->b;
+
+		} while (rte_seqlock_read_retry(&data->lock, sn));
+
+		if (a != b || b != c) {
+			printf("Reader observed inconsistent data values "
+			       "%" PRIu64 " %" PRIu64 " %" PRIu64 "\n",
+			       a, b, c);
+			rc = -1;
+		}
+	}
+
+	return rc;
+}
+
+static void
+reader_stop(struct reader *reader)
+{
+	__atomic_store_n(&reader->stop, 1, __ATOMIC_RELAXED);
+}
+
+#define NUM_WRITERS (2) /* master lcore + one worker */
+#define MIN_NUM_READERS (2)
+#define MAX_READERS (RTE_MAX_LCORE - NUM_WRITERS - 1)
+#define MIN_LCORE_COUNT (NUM_WRITERS + MIN_NUM_READERS)
+
+/* Only a compile-time test */
+static rte_seqlock_t __rte_unused static_init_lock = RTE_SEQLOCK_INITIALIZER;
+
+static int
+test_seqlock(void)
+{
+	struct reader readers[MAX_READERS];
+	unsigned int num_readers;
+	unsigned int num_lcores;
+	unsigned int i;
+	unsigned int lcore_id;
+	unsigned int reader_lcore_ids[MAX_READERS];
+	unsigned int worker_writer_lcore_id = 0;
+	int rc = 0;
+
+	num_lcores = rte_lcore_count();
+
+	if (num_lcores < MIN_LCORE_COUNT) {
+		printf("Too few cores to run test. Skipping.\n");
+		return 0;
+	}
+
+	num_readers = num_lcores - NUM_WRITERS;
+
+	struct data *data = rte_zmalloc(NULL, sizeof(struct data), 0);
+
+	i = 0;
+	RTE_LCORE_FOREACH_WORKER(lcore_id) {
+		if (i == 0) {
+			rte_eal_remote_launch(writer_run, data, lcore_id);
+			worker_writer_lcore_id = lcore_id;
+		} else {
+			unsigned int reader_idx = i - 1;
+			struct reader *reader = &readers[reader_idx];
+
+			reader->data = data;
+			reader->stop = 0;
+
+			rte_eal_remote_launch(reader_run, reader, lcore_id);
+			reader_lcore_ids[reader_idx] = lcore_id;
+		}
+		i++;
+	}
+
+	if (writer_run(data) != 0 ||
+	    rte_eal_wait_lcore(worker_writer_lcore_id) != 0)
+		rc = -1;
+
+	for (i = 0; i < num_readers; i++) {
+		reader_stop(&readers[i]);
+		if (rte_eal_wait_lcore(reader_lcore_ids[i]) != 0)
+			rc = -1;
+	}
+
+	return rc;
+}
+
+REGISTER_TEST_COMMAND(seqlock_autotest, test_seqlock);
diff --git a/lib/eal/common/meson.build b/lib/eal/common/meson.build
index 917758cc65..a41343bfed 100644
--- a/lib/eal/common/meson.build
+++ b/lib/eal/common/meson.build
@@ -35,6 +35,7 @@  sources += files(
         'rte_malloc.c',
         'rte_random.c',
         'rte_reciprocal.c',
        'rte_seqlock.c',
         'rte_service.c',
         'rte_version.c',
 )
diff --git a/lib/eal/common/rte_seqlock.c b/lib/eal/common/rte_seqlock.c
new file mode 100644
index 0000000000..d4fe648799
--- /dev/null
+++ b/lib/eal/common/rte_seqlock.c
@@ -0,0 +1,12 @@ 
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2022 Ericsson AB
+ */
+
+#include <rte_seqlock.h>
+
+void
+rte_seqlock_init(rte_seqlock_t *seqlock)
+{
+	seqlock->sn = 0;
+	rte_spinlock_init(&seqlock->lock);
+}
diff --git a/lib/eal/include/meson.build b/lib/eal/include/meson.build
index 9700494816..48df5f1a21 100644
--- a/lib/eal/include/meson.build
+++ b/lib/eal/include/meson.build
@@ -36,6 +36,7 @@  headers += files(
         'rte_per_lcore.h',
         'rte_random.h',
         'rte_reciprocal.h',
+        'rte_seqlock.h',
         'rte_service.h',
         'rte_service_component.h',
         'rte_string_fns.h',
diff --git a/lib/eal/include/rte_seqlock.h b/lib/eal/include/rte_seqlock.h
new file mode 100644
index 0000000000..12cc3cdcb2
--- /dev/null
+++ b/lib/eal/include/rte_seqlock.h
@@ -0,0 +1,282 @@ 
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2022 Ericsson AB
+ */
+
+#ifndef _RTE_SEQLOCK_H_
+#define _RTE_SEQLOCK_H_
+
+/**
+ * @file
+ * RTE Seqlock
+ *
+ * A sequence lock (seqlock) is a synchronization primitive allowing
+ * multiple, parallel, readers to efficiently and safely (i.e., in a
+ * data-race free manner) access the lock-protected data. The RTE
+ * seqlock permits multiple writers as well. A spinlock is used for
+ * writer-writer synchronization.
+ *
+ * A reader never blocks a writer. Very high frequency writes may
+ * prevent readers from making progress.
+ *
+ * A seqlock is not preemption-safe on the writer side. If a writer is
+ * preempted, it may block readers until the writer thread is again
+ * allowed to execute. Heavy computations should be kept out of the
+ * writer-side critical section, to avoid delaying readers.
+ *
+ * Seqlocks are useful for data which are read by many cores, at a
+ * high frequency, and relatively infrequently written to.
+ *
+ * One way to think about seqlocks is that they provide means to
+ * perform atomic operations on objects larger than what the native
+ * machine instructions allow for.
+ *
+ * To avoid resource reclamation issues, the data protected by a
+ * seqlock should typically be kept self-contained (e.g., no pointers
+ * to mutable, dynamically allocated data).
+ *
+ * Example usage:
+ * @code{.c}
+ * #define MAX_Y_LEN (16)
+ * // Application-defined example data structure, protected by a seqlock.
+ * struct config {
+ *         rte_seqlock_t lock;
+ *         int param_x;
+ *         char param_y[MAX_Y_LEN];
+ * };
+ *
+ * // Accessor function for reading config fields.
+ * void
+ * config_read(const struct config *config, int *param_x, char *param_y)
+ * {
+ *         // Temporary variables, just to improve readability.
+ *         int tentative_x;
+ *         char tentative_y[MAX_Y_LEN];
+ *         uint32_t sn;
+ *
+ *         do {
+ *                 sn = rte_seqlock_read_begin(&config->lock);
+ *                 // Loads may be atomic or non-atomic, as in this example.
+ *                 tentative_x = config->param_x;
+ *                 strcpy(tentative_y, config->param_y);
+ *         } while (rte_seqlock_read_retry(&config->lock, sn));
+ *         // An application could skip retrying, and try again later, if
+ *         // it can make progress without the data.
+ *
+ *         *param_x = tentative_x;
+ *         strcpy(param_y, tentative_y);
+ * }
+ *
+ * // Accessor function for writing config fields.
+ * void
+ * config_update(struct config *config, int param_x, const char *param_y)
+ * {
+ *         rte_seqlock_write_begin(&config->lock);
+ *         // Stores may be atomic or non-atomic, as in this example.
+ *         config->param_x = param_x;
+ *         strcpy(config->param_y, param_y);
+ *         rte_seqlock_write_end(&config->lock);
+ * }
+ * @endcode
+ *
+ * @see
+ * https://en.wikipedia.org/wiki/Seqlock.
+ */
+
+#include <stdbool.h>
+#include <stdint.h>
+
+#include <rte_atomic.h>
+#include <rte_branch_prediction.h>
+#include <rte_spinlock.h>
+
+/**
+ * The RTE seqlock type.
+ */
+typedef struct {
+	uint32_t sn; /**< A sequence number for the protected data. */
+	rte_spinlock_t lock; /**< Spinlock used to serialize writers.  */
+} rte_seqlock_t;
+
+/**
+ * A static seqlock initializer.
+ */
+#define RTE_SEQLOCK_INITIALIZER { 0, RTE_SPINLOCK_INITIALIZER }
+
+/**
+ * Initialize the seqlock.
+ *
+ * This function initializes the seqlock, and leaves the writer-side
+ * spinlock unlocked.
+ *
+ * @param seqlock
+ *   A pointer to the seqlock.
+ */
+__rte_experimental
+void
+rte_seqlock_init(rte_seqlock_t *seqlock);
+
+/**
+ * Begin a read-side critical section.
+ *
+ * A call to this function marks the beginning of a read-side critical
+ * section, for @p seqlock.
+ *
+ * rte_seqlock_read_begin() returns a sequence number, which is later
+ * used in rte_seqlock_read_retry() to check if the protected data
+ * underwent any modifications during the read transaction.
+ *
+ * After (in program order) rte_seqlock_read_begin() has been called,
+ * the calling thread may read and copy the protected data. The
+ * protected data read *must* be copied (either in pristine form, or
+ * in the form of some derivative). A copy is required since the
+ * application only may read the data in the read-side critical
+ * section (i.e., after rte_seqlock_read_begin() and before
+ * rte_seqlock_read_retry()), but must not act upon the retrieved data
+ * while in the critical section, since it does not yet know if it is
+ * consistent.
+ *
+ * The data may be accessed with both atomic and/or non-atomic loads.
+ *
+ * After (in program order) all required data loads have been
+ * performed, rte_seqlock_read_retry() must be called, marking the end
+ * of the read-side critical section.
+ *
+ * If rte_seqlock_read_retry() returns true, the just-read data is
+ * inconsistent and should be discarded. If rte_seqlock_read_retry()
+ * returns false, the data was read atomically and the copied data is
+ * consistent.
+ *
+ * If rte_seqlock_read_retry() returns true, the application has the
+ * option to immediately restart the whole procedure (e.g., calling
+ * rte_seqlock_read_begin() again), or do the same at some later time.
+ *
+ * @param seqlock
+ *   A pointer to the seqlock.
+ * @return
+ *   The seqlock sequence number for this critical section, to
+ *   later be passed to rte_seqlock_read_retry().
+ *
+ * @see rte_seqlock_read_retry()
+ */
+__rte_experimental
+static inline uint32_t
+rte_seqlock_read_begin(const rte_seqlock_t *seqlock)
+{
+	/* __ATOMIC_ACQUIRE to prevent loads after (in program order)
+	 * from happening before the sn load. Synchronizes-with the
+	 * store release in rte_seqlock_write_end().
+	 */
+	return __atomic_load_n(&seqlock->sn, __ATOMIC_ACQUIRE);
+}
+
+/**
+ * End a read-side critical section.
+ *
+ * A call to this function marks the end of a read-side critical
+ * section, for @p seqlock. The application must supply the sequence
+ * number returned from the corresponding rte_seqlock_read_begin()
+ * call.
+ *
+ * After this function has been called, the caller should not access
+ * the protected data.
+ *
+ * In case this function returns false, the just-read data was
+ * consistent and the set of atomic and non-atomic load operations
+ * performed between rte_seqlock_read_begin() and
+ * rte_seqlock_read_retry() were atomic, as a whole.
+ *
+ * In case rte_seqlock_read_retry() returns true, the data was
+ * modified as it was being read and may be inconsistent, and thus
+ * should be discarded.
+ *
+ * @param seqlock
+ *   A pointer to the seqlock.
+ * @param begin_sn
+ *   The seqlock sequence number that was returned by
+ *   rte_seqlock_read_begin() for this critical section.
+ * @return
+ *   true or false, if the just-read seqlock-protected data is inconsistent
+ *   or consistent, respectively.
+ *
+ * @see rte_seqlock_read_begin()
+ */
+__rte_experimental
+static inline bool
+rte_seqlock_read_retry(const rte_seqlock_t *seqlock, uint32_t begin_sn)
+{
+	uint32_t end_sn;
+
+	/* make sure the data loads happen before the sn load */
+	rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
+
+	end_sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED);
+
+	return unlikely(begin_sn & 1 || begin_sn != end_sn);
+}
+
+/**
+ * Begin write-side critical section.
+ *
+ * A call to this function acquires the write lock associated with @p
+ * seqlock, and marks the beginning of a write-side critical section.
+ *
+ * After having called this function, the caller may go on to modify
+ * the protected data, in an atomic or non-atomic manner.
+ *
+ * After the necessary updates have been performed, the application
+ * calls rte_seqlock_write_end().
+ *
+ * This function is not preemption-safe in the sense that preemption
+ * of the calling thread may block reader progress until the writer
+ * thread is rescheduled.
+ *
+ * @param seqlock
+ *   A pointer to the seqlock.
+ *
+ * @see rte_seqlock_write_end()
+ */
+__rte_experimental
+static inline void
+rte_seqlock_write_begin(rte_seqlock_t *seqlock)
+{
+	uint32_t sn;
+
+	/* to synchronize with other writers */
+	rte_spinlock_lock(&seqlock->lock);
+
+	sn = seqlock->sn + 1;
+
+	__atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELAXED);
+
+	/* __ATOMIC_RELEASE to prevent stores after (in program order)
+	 * from happening before the sn store.
+	 */
+	rte_atomic_thread_fence(__ATOMIC_RELEASE);
+}
+
+/**
+ * End write-side critical section.
+ *
+ * A call to this function marks the end of the write-side critical
+ * section, for @p seqlock. After this call has been made, the protected
+ * data may no longer be modified.
+ *
+ * @param seqlock
+ *   A pointer to the seqlock.
+ *
+ * @see rte_seqlock_write_begin()
+ */
+__rte_experimental
+static inline void
+rte_seqlock_write_end(rte_seqlock_t *seqlock)
+{
+	uint32_t sn;
+
+	sn = seqlock->sn + 1;
+
+	/* synchronizes-with the load acquire in rte_seqlock_read_begin() */
+	__atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELEASE);
+
+	rte_spinlock_unlock(&seqlock->lock);
+}
+
+#endif  /* _RTE_SEQLOCK_H_ */
diff --git a/lib/eal/version.map b/lib/eal/version.map
index b53eeb30d7..4a9d0ed899 100644
--- a/lib/eal/version.map
+++ b/lib/eal/version.map
@@ -420,6 +420,9 @@  EXPERIMENTAL {
 	rte_intr_instance_free;
 	rte_intr_type_get;
 	rte_intr_type_set;
+
+	# added in 22.07
+	rte_seqlock_init;
 };
 
 INTERNAL {