eal/unix: allow creating thread with real-time priority
Checks
Commit Message
When adding an API for creating threads,
the real-time priority has been forbidden on Unix.
There is a known issue with ring behaviour,
but it should not be completely forbidden.
Fixes: ca04c78b6262 ("eal: get/set thread priority per thread identifier")
Fixes: ce6e911d20f6 ("eal: add thread lifetime API")
Fixes: a7ba40b2b1bf ("drivers: convert to internal control threads")
Cc: stable@dpdk.org
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
---
app/test/test_threads.c | 9 ---------
doc/guides/prog_guide/env_abstraction_layer.rst | 4 +++-
lib/eal/include/rte_thread.h | 2 +-
lib/eal/unix/rte_thread.c | 9 ---------
4 files changed, 4 insertions(+), 20 deletions(-)
Comments
> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> Sent: Tuesday, 24 October 2023 14.54
>
> When adding an API for creating threads,
> the real-time priority has been forbidden on Unix.
>
> There is a known issue with ring behaviour,
> but it should not be completely forbidden.
>
> Fixes: ca04c78b6262 ("eal: get/set thread priority per thread identifier")
> Fixes: ce6e911d20f6 ("eal: add thread lifetime API")
> Fixes: a7ba40b2b1bf ("drivers: convert to internal control threads")
> Cc: stable@dpdk.org
>
> Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
> ---
[...]
> @@ -815,7 +815,9 @@ Known Issues
>
> 4. It MAY be used by preemptible multi-producer and/or preemptible multi-
> consumer pthreads whose scheduling policy are all SCHED_OTHER(cfs), SCHED_IDLE
> or SCHED_BATCH. User SHOULD be aware of the performance penalty before using
> it.
>
> - 5. It MUST not be used by multi-producer/consumer pthreads, whose
> scheduling policies are SCHED_FIFO or SCHED_RR.
> + 5. It MUST not be used by multi-producer/consumer pthreads
> + whose scheduling policies are ``SCHED_FIFO``
> + or ``SCHED_RR`` (``RTE_THREAD_PRIORITY_REALTIME_CRITICAL``).
Do the RTS or HTS ring modes make any difference here?
Anyway, I agree that real-time priority should not be forbidden on Unix.
Acked-by: Morten Brørup <mb@smartsharesystems.com>
On Tue, 24 Oct 2023 15:55:13 +0200
Morten Brørup <mb@smartsharesystems.com> wrote:
> >
> > 4. It MAY be used by preemptible multi-producer and/or preemptible multi-
> > consumer pthreads whose scheduling policy are all SCHED_OTHER(cfs), SCHED_IDLE
> > or SCHED_BATCH. User SHOULD be aware of the performance penalty before using
> > it.
> >
> > - 5. It MUST not be used by multi-producer/consumer pthreads, whose
> > scheduling policies are SCHED_FIFO or SCHED_RR.
> > + 5. It MUST not be used by multi-producer/consumer pthreads
> > + whose scheduling policies are ``SCHED_FIFO``
> > + or ``SCHED_RR`` (``RTE_THREAD_PRIORITY_REALTIME_CRITICAL``).
>
> Do the RTS or HTS ring modes make any difference here?
>
> Anyway, I agree that real-time priority should not be forbidden on Unix.
>
> Acked-by: Morten Brørup <mb@smartsharesystems.com>
Please add a big warning message in the rte_thread.c and the documentation
to describe the problem. Need to have the "you have been warned" action.
Use of RT priority is incompatible with 100% poll mode as is typically done
in DPDK applications. A real time thread has higher priority than other necessary
kernel threads on the same CPU. Therefore if the RT thread never sleeps, critical
system actions such as delayed writes, network packet processing and timer updates
will not happen which makes the system unstable.
Multiple DPDK users have learned this the hard way.
24/10/2023 18:04, Stephen Hemminger:
> On Tue, 24 Oct 2023 15:55:13 +0200
> Morten Brørup <mb@smartsharesystems.com> wrote:
>
> > >
> > > 4. It MAY be used by preemptible multi-producer and/or preemptible multi-
> > > consumer pthreads whose scheduling policy are all SCHED_OTHER(cfs), SCHED_IDLE
> > > or SCHED_BATCH. User SHOULD be aware of the performance penalty before using
> > > it.
> > >
> > > - 5. It MUST not be used by multi-producer/consumer pthreads, whose
> > > scheduling policies are SCHED_FIFO or SCHED_RR.
> > > + 5. It MUST not be used by multi-producer/consumer pthreads
> > > + whose scheduling policies are ``SCHED_FIFO``
> > > + or ``SCHED_RR`` (``RTE_THREAD_PRIORITY_REALTIME_CRITICAL``).
> >
> > Do the RTS or HTS ring modes make any difference here?
> >
> > Anyway, I agree that real-time priority should not be forbidden on Unix.
> >
> > Acked-by: Morten Brørup <mb@smartsharesystems.com>
>
> Please add a big warning message in the rte_thread.c and the documentation
> to describe the problem. Need to have the "you have been warned" action.
Yes I can add more warnings.
> Use of RT priority is incompatible with 100% poll mode as is typically done
> in DPDK applications. A real time thread has higher priority than other necessary
> kernel threads on the same CPU. Therefore if the RT thread never sleeps, critical
> system actions such as delayed writes, network packet processing and timer updates
> will not happen which makes the system unstable.
Yes, and it is shown by the test on loongarch:
DPDK:fast-tests / threads_autotest TIMEOUT 80.01s
http://mails.dpdk.org/archives/test-report/2023-October/488760.html
I'll try to pass the test by adding a sleep in the test thread.
> Multiple DPDK users have learned this the hard way.
On Wed, Oct 25, 2023 at 03:15:49PM +0200, Thomas Monjalon wrote:
> 24/10/2023 18:04, Stephen Hemminger:
> > On Tue, 24 Oct 2023 15:55:13 +0200
> > Morten Brørup <mb@smartsharesystems.com> wrote:
> >
> > > >
> > > > 4. It MAY be used by preemptible multi-producer and/or preemptible multi-
> > > > consumer pthreads whose scheduling policy are all SCHED_OTHER(cfs), SCHED_IDLE
> > > > or SCHED_BATCH. User SHOULD be aware of the performance penalty before using
> > > > it.
> > > >
> > > > - 5. It MUST not be used by multi-producer/consumer pthreads, whose
> > > > scheduling policies are SCHED_FIFO or SCHED_RR.
> > > > + 5. It MUST not be used by multi-producer/consumer pthreads
> > > > + whose scheduling policies are ``SCHED_FIFO``
> > > > + or ``SCHED_RR`` (``RTE_THREAD_PRIORITY_REALTIME_CRITICAL``).
> > >
> > > Do the RTS or HTS ring modes make any difference here?
> > >
> > > Anyway, I agree that real-time priority should not be forbidden on Unix.
> > >
> > > Acked-by: Morten Brørup <mb@smartsharesystems.com>
> >
> > Please add a big warning message in the rte_thread.c and the documentation
> > to describe the problem. Need to have the "you have been warned" action.
>
> Yes I can add more warnings.
>
> > Use of RT priority is incompatible with 100% poll mode as is typically done
> > in DPDK applications. A real time thread has higher priority than other necessary
> > kernel threads on the same CPU. Therefore if the RT thread never sleeps, critical
> > system actions such as delayed writes, network packet processing and timer updates
> > will not happen which makes the system unstable.
>
> Yes, and it is shown by the test on loongarch:
> DPDK:fast-tests / threads_autotest TIMEOUT 80.01s
> http://mails.dpdk.org/archives/test-report/2023-October/488760.html
>
> I'll try to pass the test by adding a sleep in the test thread.
>
"sched_yield()" rather than sleep perhaps? Might better convey the
intention of the call.
25/10/2023 15:34, Bruce Richardson:
> On Wed, Oct 25, 2023 at 03:15:49PM +0200, Thomas Monjalon wrote:
> > 24/10/2023 18:04, Stephen Hemminger:
> > > On Tue, 24 Oct 2023 15:55:13 +0200
> > > Morten Brørup <mb@smartsharesystems.com> wrote:
> > >
> > > > >
> > > > > 4. It MAY be used by preemptible multi-producer and/or preemptible multi-
> > > > > consumer pthreads whose scheduling policy are all SCHED_OTHER(cfs), SCHED_IDLE
> > > > > or SCHED_BATCH. User SHOULD be aware of the performance penalty before using
> > > > > it.
> > > > >
> > > > > - 5. It MUST not be used by multi-producer/consumer pthreads, whose
> > > > > scheduling policies are SCHED_FIFO or SCHED_RR.
> > > > > + 5. It MUST not be used by multi-producer/consumer pthreads
> > > > > + whose scheduling policies are ``SCHED_FIFO``
> > > > > + or ``SCHED_RR`` (``RTE_THREAD_PRIORITY_REALTIME_CRITICAL``).
> > > >
> > > > Do the RTS or HTS ring modes make any difference here?
> > > >
> > > > Anyway, I agree that real-time priority should not be forbidden on Unix.
> > > >
> > > > Acked-by: Morten Brørup <mb@smartsharesystems.com>
> > >
> > > Please add a big warning message in the rte_thread.c and the documentation
> > > to describe the problem. Need to have the "you have been warned" action.
> >
> > Yes I can add more warnings.
> >
> > > Use of RT priority is incompatible with 100% poll mode as is typically done
> > > in DPDK applications. A real time thread has higher priority than other necessary
> > > kernel threads on the same CPU. Therefore if the RT thread never sleeps, critical
> > > system actions such as delayed writes, network packet processing and timer updates
> > > will not happen which makes the system unstable.
> >
> > Yes, and it is shown by the test on loongarch:
> > DPDK:fast-tests / threads_autotest TIMEOUT 80.01s
> > http://mails.dpdk.org/archives/test-report/2023-October/488760.html
> >
> > I'll try to pass the test by adding a sleep in the test thread.
> >
>
> "sched_yield()" rather than sleep perhaps? Might better convey the
> intention of the call.
Do we have sched_yield on Windows?
On Wed, 25 Oct 2023 15:44:25 +0200
Thomas Monjalon <thomas@monjalon.net> wrote:
> > >
> > > I'll try to pass the test by adding a sleep in the test thread.
> > >
> >
> > "sched_yield()" rather than sleep perhaps? Might better convey the
> > intention of the call.
>
> Do we have sched_yield on Windows?
Windows has an equivalent but sched_yield() won't work here.
Since the DPDK thread is still higher priority than the kernel thread,
the scheduler will reschedule the DPDK thread. You need to sleep
to let kthread run.
On Wed, Oct 25, 2023 at 08:08:52AM -0700, Stephen Hemminger wrote:
> On Wed, 25 Oct 2023 15:44:25 +0200
> Thomas Monjalon <thomas@monjalon.net> wrote:
>
> > > >
> > > > I'll try to pass the test by adding a sleep in the test thread.
> > > >
> > >
> > > "sched_yield()" rather than sleep perhaps? Might better convey the
> > > intention of the call.
> >
> > Do we have sched_yield on Windows?
>
> Windows has an equivalent but sched_yield() won't work here.
> Since the DPDK thread is still higher priority than the kernel thread,
> the scheduler will reschedule the DPDK thread. You need to sleep
> to let kthread run.
Interesting. Thanks for clarifying the situation.
25/10/2023 17:14, Bruce Richardson:
> On Wed, Oct 25, 2023 at 08:08:52AM -0700, Stephen Hemminger wrote:
> > On Wed, 25 Oct 2023 15:44:25 +0200
> > Thomas Monjalon <thomas@monjalon.net> wrote:
> >
> > > > >
> > > > > I'll try to pass the test by adding a sleep in the test thread.
> > > > >
> > > >
> > > > "sched_yield()" rather than sleep perhaps? Might better convey the
> > > > intention of the call.
> > >
> > > Do we have sched_yield on Windows?
> >
> > Windows has an equivalent but sched_yield() won't work here.
> > Since the DPDK thread is still higher priority than the kernel thread,
> > the scheduler will reschedule the DPDK thread. You need to sleep
> > to let kthread run.
>
> Interesting. Thanks for clarifying the situation.
Indeed interesting.
I've just sent a v2 before reading this.
So I should try a v3 with a sleep.
But then I need to find a better name than rte_thread_yield.
Ideas?
25/10/2023 17:18, Thomas Monjalon:
> 25/10/2023 17:14, Bruce Richardson:
> > On Wed, Oct 25, 2023 at 08:08:52AM -0700, Stephen Hemminger wrote:
> > > On Wed, 25 Oct 2023 15:44:25 +0200
> > > Thomas Monjalon <thomas@monjalon.net> wrote:
> > >
> > > > > >
> > > > > > I'll try to pass the test by adding a sleep in the test thread.
> > > > > >
> > > > >
> > > > > "sched_yield()" rather than sleep perhaps? Might better convey the
> > > > > intention of the call.
> > > >
> > > > Do we have sched_yield on Windows?
> > >
> > > Windows has an equivalent but sched_yield() won't work here.
> > > Since the DPDK thread is still higher priority than the kernel thread,
> > > the scheduler will reschedule the DPDK thread. You need to sleep
> > > to let kthread run.
> >
> > Interesting. Thanks for clarifying the situation.
>
> Indeed interesting.
> I've just sent a v2 before reading this.
>
> So I should try a v3 with a sleep.
> But then I need to find a better name than rte_thread_yield.
> Ideas?
I will go with rte_thread_yield_realtime().
Any sleep will suffice on Linux? What about a nanosleep?
I suppose Sleep(0) is OK on Windows?
25/10/2023 18:31, Thomas Monjalon:
> Real-time thread priority was been forbidden on Unix
> because of problems they can cause.
> Warnings and helpers are added to avoid deadlocks,
> so real-time can be allowed on all systems.
Unit test is failing:
DPDK:fast-tests / threads_autotest TIMEOUT 600.01 s
It is seen in only 1 target (maybe the failure occurence is random):
Debian 11 (Buster) (ARM) | PASS
Fedora 37 (ARM) | PASS
CentOS Stream 9 (ARM) | FAIL
Fedora 38 (ARM) | PASS
Fedora 38 (ARM Clang) | PASS
Ubuntu 20.04 (ARM) | PASS
I need to send a v4 with new implementation and better comments.
The Unix sleep will be upgraded from 1 ns to 1 us in case it makes a difference.
> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> Sent: Thursday, 26 October 2023 15.37
>
> 25/10/2023 18:31, Thomas Monjalon:
> > Real-time thread priority was been forbidden on Unix
> > because of problems they can cause.
> > Warnings and helpers are added to avoid deadlocks,
> > so real-time can be allowed on all systems.
>
> Unit test is failing:
> DPDK:fast-tests / threads_autotest TIMEOUT 600.01 s
>
> It is seen in only 1 target (maybe the failure occurence is random):
> Debian 11 (Buster) (ARM) | PASS
> Fedora 37 (ARM) | PASS
> CentOS Stream 9 (ARM) | FAIL
> Fedora 38 (ARM) | PASS
> Fedora 38 (ARM Clang) | PASS
> Ubuntu 20.04 (ARM) | PASS
>
> I need to send a v4 with new implementation and better comments.
> The Unix sleep will be upgraded from 1 ns to 1 us in case it makes a
> difference.
It will not make a difference. The kernel will go through the sleeping steps, then wake up again and see the real-time thread is ready to run, and then immediately schedule it.
For testing purposes, consider sleeping 10 milliseconds or something significant like that.
> From: Morten Brørup [mailto:mb@smartsharesystems.com]
> Sent: Thursday, 26 October 2023 15.45
>
> > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > Sent: Thursday, 26 October 2023 15.37
> >
> > 25/10/2023 18:31, Thomas Monjalon:
> > > Real-time thread priority was been forbidden on Unix
> > > because of problems they can cause.
> > > Warnings and helpers are added to avoid deadlocks,
> > > so real-time can be allowed on all systems.
> >
> > Unit test is failing:
> > DPDK:fast-tests / threads_autotest TIMEOUT 600.01 s
> >
> > It is seen in only 1 target (maybe the failure occurence is random):
> > Debian 11 (Buster) (ARM) | PASS
> > Fedora 37 (ARM) | PASS
> > CentOS Stream 9 (ARM) | FAIL
> > Fedora 38 (ARM) | PASS
> > Fedora 38 (ARM Clang) | PASS
> > Ubuntu 20.04 (ARM) | PASS
> >
> > I need to send a v4 with new implementation and better comments.
> > The Unix sleep will be upgraded from 1 ns to 1 us in case it makes a
> > difference.
>
> It will not make a difference. The kernel will go through the sleeping steps,
> then wake up again and see the real-time thread is ready to run, and then
> immediately schedule it.
>
> For testing purposes, consider sleeping 10 milliseconds or something
> significant like that.
A bit more details...
In our recent tests, nanosleep() itself took around 50 us. So you need to sleep longer than that for your thread not to be runnable when the nanosleep() wakes up again, because 50 us has already passed in "nanosleep overhead".
10 milliseconds provides plenty of margin, and corresponds to 10 jiffies on a 1000 Hz kernel. (I don't know if it makes any difference for the kernel scheduler if the timer crosses a jiffy border or not.)
26/10/2023 15:57, Morten Brørup:
> > From: Morten Brørup [mailto:mb@smartsharesystems.com]
> > Sent: Thursday, 26 October 2023 15.45
> >
> > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > Sent: Thursday, 26 October 2023 15.37
> > >
> > > 25/10/2023 18:31, Thomas Monjalon:
> > > > Real-time thread priority was been forbidden on Unix
> > > > because of problems they can cause.
> > > > Warnings and helpers are added to avoid deadlocks,
> > > > so real-time can be allowed on all systems.
> > >
> > > Unit test is failing:
> > > DPDK:fast-tests / threads_autotest TIMEOUT 600.01 s
> > >
> > > It is seen in only 1 target (maybe the failure occurence is random):
> > > Debian 11 (Buster) (ARM) | PASS
> > > Fedora 37 (ARM) | PASS
> > > CentOS Stream 9 (ARM) | FAIL
> > > Fedora 38 (ARM) | PASS
> > > Fedora 38 (ARM Clang) | PASS
> > > Ubuntu 20.04 (ARM) | PASS
> > >
> > > I need to send a v4 with new implementation and better comments.
> > > The Unix sleep will be upgraded from 1 ns to 1 us in case it makes a
> > > difference.
> >
> > It will not make a difference. The kernel will go through the sleeping steps,
> > then wake up again and see the real-time thread is ready to run, and then
> > immediately schedule it.
> >
> > For testing purposes, consider sleeping 10 milliseconds or something
> > significant like that.
>
> A bit more details...
>
> In our recent tests, nanosleep() itself took around 50 us. So you need to sleep longer than that for your thread not to be runnable when the nanosleep() wakes up again, because 50 us has already passed in "nanosleep overhead".
> 10 milliseconds provides plenty of margin, and corresponds to 10 jiffies on a 1000 Hz kernel. (I don't know if it makes any difference for the kernel scheduler if the timer crosses a jiffy border or not.)
10 ms looks like an eternity.
I will try.
(Anyway I did a mistake when sending v4)
> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> Sent: Thursday, 26 October 2023 16.05
>
> 26/10/2023 15:57, Morten Brørup:
> > > From: Morten Brørup [mailto:mb@smartsharesystems.com]
> > > Sent: Thursday, 26 October 2023 15.45
> > >
> > > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > Sent: Thursday, 26 October 2023 15.37
> > > >
> > > > 25/10/2023 18:31, Thomas Monjalon:
> > > > > Real-time thread priority was been forbidden on Unix
> > > > > because of problems they can cause.
> > > > > Warnings and helpers are added to avoid deadlocks,
> > > > > so real-time can be allowed on all systems.
> > > >
> > > > Unit test is failing:
> > > > DPDK:fast-tests / threads_autotest TIMEOUT 600.01 s
> > > >
> > > > It is seen in only 1 target (maybe the failure occurence is random):
> > > > Debian 11 (Buster) (ARM) | PASS
> > > > Fedora 37 (ARM) | PASS
> > > > CentOS Stream 9 (ARM) | FAIL
> > > > Fedora 38 (ARM) | PASS
> > > > Fedora 38 (ARM Clang) | PASS
> > > > Ubuntu 20.04 (ARM) | PASS
> > > >
> > > > I need to send a v4 with new implementation and better comments.
> > > > The Unix sleep will be upgraded from 1 ns to 1 us in case it makes a
> > > > difference.
> > >
> > > It will not make a difference. The kernel will go through the sleeping
> steps,
> > > then wake up again and see the real-time thread is ready to run, and then
> > > immediately schedule it.
> > >
> > > For testing purposes, consider sleeping 10 milliseconds or something
> > > significant like that.
> >
> > A bit more details...
> >
> > In our recent tests, nanosleep() itself took around 50 us. So you need to
> sleep longer than that for your thread not to be runnable when the nanosleep()
> wakes up again, because 50 us has already passed in "nanosleep overhead".
> > 10 milliseconds provides plenty of margin, and corresponds to 10 jiffies on
> a 1000 Hz kernel. (I don't know if it makes any difference for the kernel
> scheduler if the timer crosses a jiffy border or not.)
>
> 10 ms looks like an eternity.
Agree. It is only for functional testing, not for production!
> I will try.
> (Anyway I did a mistake when sending v4)
>
>
On Thu, Oct 26, 2023 at 3:57 PM Morten Brørup <mb@smartsharesystems.com> wrote:
>
> > From: Morten Brørup [mailto:mb@smartsharesystems.com]
> > Sent: Thursday, 26 October 2023 15.45
> >
> > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > Sent: Thursday, 26 October 2023 15.37
> > >
> > > 25/10/2023 18:31, Thomas Monjalon:
> > > > Real-time thread priority was been forbidden on Unix
> > > > because of problems they can cause.
> > > > Warnings and helpers are added to avoid deadlocks,
> > > > so real-time can be allowed on all systems.
> > >
> > > Unit test is failing:
> > > DPDK:fast-tests / threads_autotest TIMEOUT 600.01 s
> > >
> > > It is seen in only 1 target (maybe the failure occurence is random):
> > > Debian 11 (Buster) (ARM) | PASS
> > > Fedora 37 (ARM) | PASS
> > > CentOS Stream 9 (ARM) | FAIL
> > > Fedora 38 (ARM) | PASS
> > > Fedora 38 (ARM Clang) | PASS
> > > Ubuntu 20.04 (ARM) | PASS
> > >
> > > I need to send a v4 with new implementation and better comments.
> > > The Unix sleep will be upgraded from 1 ns to 1 us in case it makes a
> > > difference.
> >
> > It will not make a difference. The kernel will go through the sleeping steps,
> > then wake up again and see the real-time thread is ready to run, and then
> > immediately schedule it.
> >
> > For testing purposes, consider sleeping 10 milliseconds or something
> > significant like that.
>
> A bit more details...
>
> In our recent tests, nanosleep() itself took around 50 us. So you need to sleep longer than that for your thread not to be runnable when the nanosleep() wakes up again, because 50 us has already passed in "nanosleep overhead".
You may want to read manual for prctl and look for PR_SET_TIMERSLACK.
https://github.com/openvswitch/ovs/commit/f62629a55894546ff043e8a116c3c57aff73c285
> 10 milliseconds provides plenty of margin, and corresponds to 10 jiffies on a 1000 Hz kernel. (I don't know if it makes any difference for the kernel scheduler if the timer crosses a jiffy border or not.)
26/10/2023 16:08, Morten Brørup:
> > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > Sent: Thursday, 26 October 2023 16.05
> >
> > 26/10/2023 15:57, Morten Brørup:
> > > > From: Morten Brørup [mailto:mb@smartsharesystems.com]
> > > > Sent: Thursday, 26 October 2023 15.45
> > > >
> > > > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > > Sent: Thursday, 26 October 2023 15.37
> > > > >
> > > > > 25/10/2023 18:31, Thomas Monjalon:
> > > > > > Real-time thread priority was been forbidden on Unix
> > > > > > because of problems they can cause.
> > > > > > Warnings and helpers are added to avoid deadlocks,
> > > > > > so real-time can be allowed on all systems.
> > > > >
> > > > > Unit test is failing:
> > > > > DPDK:fast-tests / threads_autotest TIMEOUT 600.01 s
> > > > >
> > > > > It is seen in only 1 target (maybe the failure occurence is random):
> > > > > Debian 11 (Buster) (ARM) | PASS
> > > > > Fedora 37 (ARM) | PASS
> > > > > CentOS Stream 9 (ARM) | FAIL
> > > > > Fedora 38 (ARM) | PASS
> > > > > Fedora 38 (ARM Clang) | PASS
> > > > > Ubuntu 20.04 (ARM) | PASS
> > > > >
> > > > > I need to send a v4 with new implementation and better comments.
> > > > > The Unix sleep will be upgraded from 1 ns to 1 us in case it makes a
> > > > > difference.
> > > >
> > > > It will not make a difference. The kernel will go through the sleeping
> > steps,
> > > > then wake up again and see the real-time thread is ready to run, and then
> > > > immediately schedule it.
> > > >
> > > > For testing purposes, consider sleeping 10 milliseconds or something
> > > > significant like that.
> > >
> > > A bit more details...
> > >
> > > In our recent tests, nanosleep() itself took around 50 us. So you need to
> > sleep longer than that for your thread not to be runnable when the nanosleep()
> > wakes up again, because 50 us has already passed in "nanosleep overhead".
> > > 10 milliseconds provides plenty of margin, and corresponds to 10 jiffies on
> > a 1000 Hz kernel. (I don't know if it makes any difference for the kernel
> > scheduler if the timer crosses a jiffy border or not.)
> >
> > 10 ms looks like an eternity.
>
> Agree. It is only for functional testing, not for production!
Realtime thread won't make any sense if we have to insert a long sleep.
We need to find a good sleep value or give up with real-time threads.
(note I'm not sure how much it is useful)
> > I will try.
> > (Anyway I did a mistake when sending v4)
I've sent a trial with 1 ms.
> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> Sent: Thursday, 26 October 2023 16.31
>
> 26/10/2023 16:08, Morten Brørup:
> > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > Sent: Thursday, 26 October 2023 16.05
> > >
> > > 26/10/2023 15:57, Morten Brørup:
> > > > > From: Morten Brørup [mailto:mb@smartsharesystems.com]
> > > > > Sent: Thursday, 26 October 2023 15.45
> > > > >
> > > > > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > > > Sent: Thursday, 26 October 2023 15.37
> > > > > >
> > > > > > 25/10/2023 18:31, Thomas Monjalon:
> > > > > > > Real-time thread priority was been forbidden on Unix
> > > > > > > because of problems they can cause.
> > > > > > > Warnings and helpers are added to avoid deadlocks,
> > > > > > > so real-time can be allowed on all systems.
> > > > > >
> > > > > > Unit test is failing:
> > > > > > DPDK:fast-tests / threads_autotest TIMEOUT 600.01 s
> > > > > >
> > > > > > It is seen in only 1 target (maybe the failure occurence is random):
> > > > > > Debian 11 (Buster) (ARM) | PASS
> > > > > > Fedora 37 (ARM) | PASS
> > > > > > CentOS Stream 9 (ARM) | FAIL
> > > > > > Fedora 38 (ARM) | PASS
> > > > > > Fedora 38 (ARM Clang) | PASS
> > > > > > Ubuntu 20.04 (ARM) | PASS
> > > > > >
> > > > > > I need to send a v4 with new implementation and better comments.
> > > > > > The Unix sleep will be upgraded from 1 ns to 1 us in case it makes a
> > > > > > difference.
> > > > >
> > > > > It will not make a difference. The kernel will go through the sleeping
> > > steps,
> > > > > then wake up again and see the real-time thread is ready to run, and
> then
> > > > > immediately schedule it.
> > > > >
> > > > > For testing purposes, consider sleeping 10 milliseconds or something
> > > > > significant like that.
> > > >
> > > > A bit more details...
> > > >
> > > > In our recent tests, nanosleep() itself took around 50 us. So you need
> to
> > > sleep longer than that for your thread not to be runnable when the
> nanosleep()
> > > wakes up again, because 50 us has already passed in "nanosleep overhead".
> > > > 10 milliseconds provides plenty of margin, and corresponds to 10 jiffies
> on
> > > a 1000 Hz kernel. (I don't know if it makes any difference for the kernel
> > > scheduler if the timer crosses a jiffy border or not.)
> > >
> > > 10 ms looks like an eternity.
> >
> > Agree. It is only for functional testing, not for production!
>
> Realtime thread won't make any sense if we have to insert a long sleep.
It seems David came to our rescue here!
I have just tried running our test again with prctl(PR_SET_TIMERSLACK) of 1 ns, and the nanosleep(1 ns) delay dropped from ca. 50 us to ca. 2.5 us.
The timeout parameter to epoll_wait() is in milliseconds, which is useless for low-latency.
Perhaps real-time threads can be used with epoll() combined with timerfd for nanosecond resolution timeout.
> We need to find a good sleep value or give up with real-time threads.
> (note I'm not sure how much it is useful)
Only the application developer knows how much delay is acceptable. Which is why I mentioned that the new yield functions should document the delay.
>
> > > I will try.
> > > (Anyway I did a mistake when sending v4)
>
> I've sent a trial with 1 ms.
>
> From: Morten Brørup [mailto:mb@smartsharesystems.com]
> Sent: Thursday, 26 October 2023 16.50
>
> > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > Sent: Thursday, 26 October 2023 16.31
> >
> > 26/10/2023 16:08, Morten Brørup:
> > > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > Sent: Thursday, 26 October 2023 16.05
> > > >
> > > > 26/10/2023 15:57, Morten Brørup:
> > > > > > From: Morten Brørup [mailto:mb@smartsharesystems.com]
> > > > > > Sent: Thursday, 26 October 2023 15.45
> > > > > >
> > > > > > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > > > > Sent: Thursday, 26 October 2023 15.37
> > > > > > >
> > > > > > > 25/10/2023 18:31, Thomas Monjalon:
> > > > > > > > Real-time thread priority was been forbidden on Unix
> > > > > > > > because of problems they can cause.
> > > > > > > > Warnings and helpers are added to avoid deadlocks,
> > > > > > > > so real-time can be allowed on all systems.
> > > > > > >
> > > > > > > Unit test is failing:
> > > > > > > DPDK:fast-tests / threads_autotest TIMEOUT 600.01 s
> > > > > > >
> > > > > > > It is seen in only 1 target (maybe the failure occurence is
> random):
> > > > > > > Debian 11 (Buster) (ARM) | PASS
> > > > > > > Fedora 37 (ARM) | PASS
> > > > > > > CentOS Stream 9 (ARM) | FAIL
> > > > > > > Fedora 38 (ARM) | PASS
> > > > > > > Fedora 38 (ARM Clang) | PASS
> > > > > > > Ubuntu 20.04 (ARM) | PASS
> > > > > > >
> > > > > > > I need to send a v4 with new implementation and better comments.
> > > > > > > The Unix sleep will be upgraded from 1 ns to 1 us in case it makes
> a
> > > > > > > difference.
> > > > > >
> > > > > > It will not make a difference. The kernel will go through the
> sleeping
> > > > steps,
> > > > > > then wake up again and see the real-time thread is ready to run, and
> > then
> > > > > > immediately schedule it.
> > > > > >
> > > > > > For testing purposes, consider sleeping 10 milliseconds or something
> > > > > > significant like that.
> > > > >
> > > > > A bit more details...
> > > > >
> > > > > In our recent tests, nanosleep() itself took around 50 us. So you need
> > to
> > > > sleep longer than that for your thread not to be runnable when the
> > nanosleep()
> > > > wakes up again, because 50 us has already passed in "nanosleep
> overhead".
> > > > > 10 milliseconds provides plenty of margin, and corresponds to 10
> jiffies
> > on
> > > > a 1000 Hz kernel. (I don't know if it makes any difference for the
> kernel
> > > > scheduler if the timer crosses a jiffy border or not.)
> > > >
> > > > 10 ms looks like an eternity.
> > >
> > > Agree. It is only for functional testing, not for production!
> >
> > Realtime thread won't make any sense if we have to insert a long sleep.
>
> It seems David came to our rescue here!
>
> I have just tried running our test again with prctl(PR_SET_TIMERSLACK) of 1
> ns, and the nanosleep(1 ns) delay dropped from ca. 50 us to ca. 2.5 us.
>
> The timeout parameter to epoll_wait() is in milliseconds, which is useless for
> low-latency.
> Perhaps real-time threads can be used with epoll() combined with timerfd for
> nanosecond resolution timeout.
Or epoll_pwait2(), which has nanosecond resolution timeout.
Unfortunately, rte_epoll_wait() is not an experimental API anymore, so we cannot change its timeout parameter from milliseconds to micro- or nanoseconds. We would have to introduce a new API for this.
>
> > We need to find a good sleep value or give up with real-time threads.
> > (note I'm not sure how much it is useful)
>
> Only the application developer knows how much delay is acceptable. Which is
> why I mentioned that the new yield functions should document the delay.
>
> >
> > > > I will try.
> > > > (Anyway I did a mistake when sending v4)
> >
> > I've sent a trial with 1 ms.
> >
On Thu, Oct 26, 2023 at 04:59:51PM +0200, Morten Brørup wrote:
> > From: Morten Brørup [mailto:mb@smartsharesystems.com]
> > Sent: Thursday, 26 October 2023 16.50
> >
> > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > Sent: Thursday, 26 October 2023 16.31
> > >
> > > 26/10/2023 16:08, Morten Brørup:
> > > > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > > Sent: Thursday, 26 October 2023 16.05
> > > > >
> > > > > 26/10/2023 15:57, Morten Brørup:
> > > > > > > From: Morten Brørup [mailto:mb@smartsharesystems.com]
> > > > > > > Sent: Thursday, 26 October 2023 15.45
> > > > > > >
> > > > > > > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > > > > > Sent: Thursday, 26 October 2023 15.37
> > > > > > > >
> > > > > > > > 25/10/2023 18:31, Thomas Monjalon:
> > > > > > > > > Real-time thread priority was been forbidden on Unix
> > > > > > > > > because of problems they can cause.
> > > > > > > > > Warnings and helpers are added to avoid deadlocks,
> > > > > > > > > so real-time can be allowed on all systems.
> > > > > > > >
> > > > > > > > Unit test is failing:
> > > > > > > > DPDK:fast-tests / threads_autotest TIMEOUT 600.01 s
> > > > > > > >
> > > > > > > > It is seen in only 1 target (maybe the failure occurence is
> > random):
> > > > > > > > Debian 11 (Buster) (ARM) | PASS
> > > > > > > > Fedora 37 (ARM) | PASS
> > > > > > > > CentOS Stream 9 (ARM) | FAIL
> > > > > > > > Fedora 38 (ARM) | PASS
> > > > > > > > Fedora 38 (ARM Clang) | PASS
> > > > > > > > Ubuntu 20.04 (ARM) | PASS
> > > > > > > >
> > > > > > > > I need to send a v4 with new implementation and better comments.
> > > > > > > > The Unix sleep will be upgraded from 1 ns to 1 us in case it makes
> > a
> > > > > > > > difference.
> > > > > > >
> > > > > > > It will not make a difference. The kernel will go through the
> > sleeping
> > > > > steps,
> > > > > > > then wake up again and see the real-time thread is ready to run, and
> > > then
> > > > > > > immediately schedule it.
> > > > > > >
> > > > > > > For testing purposes, consider sleeping 10 milliseconds or something
> > > > > > > significant like that.
> > > > > >
> > > > > > A bit more details...
> > > > > >
> > > > > > In our recent tests, nanosleep() itself took around 50 us. So you need
> > > to
> > > > > sleep longer than that for your thread not to be runnable when the
> > > nanosleep()
> > > > > wakes up again, because 50 us has already passed in "nanosleep
> > overhead".
> > > > > > 10 milliseconds provides plenty of margin, and corresponds to 10
> > jiffies
> > > on
> > > > > a 1000 Hz kernel. (I don't know if it makes any difference for the
> > kernel
> > > > > scheduler if the timer crosses a jiffy border or not.)
> > > > >
> > > > > 10 ms looks like an eternity.
> > > >
> > > > Agree. It is only for functional testing, not for production!
> > >
> > > Realtime thread won't make any sense if we have to insert a long sleep.
> >
> > It seems David came to our rescue here!
> >
> > I have just tried running our test again with prctl(PR_SET_TIMERSLACK) of 1
> > ns, and the nanosleep(1 ns) delay dropped from ca. 50 us to ca. 2.5 us.
> >
> > The timeout parameter to epoll_wait() is in milliseconds, which is useless for
> > low-latency.
> > Perhaps real-time threads can be used with epoll() combined with timerfd for
> > nanosecond resolution timeout.
>
> Or epoll_pwait2(), which has nanosecond resolution timeout.
>
> Unfortunately, rte_epoll_wait() is not an experimental API anymore, so we cannot change its timeout parameter from milliseconds to micro- or nanoseconds. We would have to introduce a new API for this.
>
Just an idea - can we change the timeout parameter to float rather than int,
and then use function versioning for backward compatibility for any
binaries passing int?
That way the actual meaning of the parameter doesn't change, but it still
allows sub-millisecond values (all-be-it with some loss of accuracy due to
float).
/Bruce
26/10/2023 17:54, Bruce Richardson:
> On Thu, Oct 26, 2023 at 04:59:51PM +0200, Morten Brørup wrote:
> > > From: Morten Brørup [mailto:mb@smartsharesystems.com]
> > > Sent: Thursday, 26 October 2023 16.50
> > >
> > > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > Sent: Thursday, 26 October 2023 16.31
> > > >
> > > > 26/10/2023 16:08, Morten Brørup:
> > > > > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > > > Sent: Thursday, 26 October 2023 16.05
> > > > > >
> > > > > > 26/10/2023 15:57, Morten Brørup:
> > > > > > > > From: Morten Brørup [mailto:mb@smartsharesystems.com]
> > > > > > > > Sent: Thursday, 26 October 2023 15.45
> > > > > > > >
> > > > > > > > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > > > > > > Sent: Thursday, 26 October 2023 15.37
> > > > > > > > >
> > > > > > > > > 25/10/2023 18:31, Thomas Monjalon:
> > > > > > > > > > Real-time thread priority was been forbidden on Unix
> > > > > > > > > > because of problems they can cause.
> > > > > > > > > > Warnings and helpers are added to avoid deadlocks,
> > > > > > > > > > so real-time can be allowed on all systems.
> > > > > > > > >
> > > > > > > > > Unit test is failing:
> > > > > > > > > DPDK:fast-tests / threads_autotest TIMEOUT 600.01 s
> > > > > > > > >
> > > > > > > > > It is seen in only 1 target (maybe the failure occurence is
> > > random):
> > > > > > > > > Debian 11 (Buster) (ARM) | PASS
> > > > > > > > > Fedora 37 (ARM) | PASS
> > > > > > > > > CentOS Stream 9 (ARM) | FAIL
> > > > > > > > > Fedora 38 (ARM) | PASS
> > > > > > > > > Fedora 38 (ARM Clang) | PASS
> > > > > > > > > Ubuntu 20.04 (ARM) | PASS
> > > > > > > > >
> > > > > > > > > I need to send a v4 with new implementation and better comments.
> > > > > > > > > The Unix sleep will be upgraded from 1 ns to 1 us in case it makes
> > > a
> > > > > > > > > difference.
> > > > > > > >
> > > > > > > > It will not make a difference. The kernel will go through the
> > > sleeping
> > > > > > steps,
> > > > > > > > then wake up again and see the real-time thread is ready to run, and
> > > > then
> > > > > > > > immediately schedule it.
> > > > > > > >
> > > > > > > > For testing purposes, consider sleeping 10 milliseconds or something
> > > > > > > > significant like that.
> > > > > > >
> > > > > > > A bit more details...
> > > > > > >
> > > > > > > In our recent tests, nanosleep() itself took around 50 us. So you need
> > > > to
> > > > > > sleep longer than that for your thread not to be runnable when the
> > > > nanosleep()
> > > > > > wakes up again, because 50 us has already passed in "nanosleep
> > > overhead".
> > > > > > > 10 milliseconds provides plenty of margin, and corresponds to 10
> > > jiffies
> > > > on
> > > > > > a 1000 Hz kernel. (I don't know if it makes any difference for the
> > > kernel
> > > > > > scheduler if the timer crosses a jiffy border or not.)
> > > > > >
> > > > > > 10 ms looks like an eternity.
> > > > >
> > > > > Agree. It is only for functional testing, not for production!
> > > >
> > > > Realtime thread won't make any sense if we have to insert a long sleep.
> > >
> > > It seems David came to our rescue here!
> > >
> > > I have just tried running our test again with prctl(PR_SET_TIMERSLACK) of 1
> > > ns, and the nanosleep(1 ns) delay dropped from ca. 50 us to ca. 2.5 us.
> > >
> > > The timeout parameter to epoll_wait() is in milliseconds, which is useless for
> > > low-latency.
> > > Perhaps real-time threads can be used with epoll() combined with timerfd for
> > > nanosecond resolution timeout.
> >
> > Or epoll_pwait2(), which has nanosecond resolution timeout.
> >
> > Unfortunately, rte_epoll_wait() is not an experimental API anymore, so we cannot change its timeout parameter from milliseconds to micro- or nanoseconds. We would have to introduce a new API for this.
> >
>
> Just an idea - can we change the timeout parameter to float rather than int,
> and then use function versioning for backward compatibility for any
> binaries passing int?
> That way the actual meaning of the parameter doesn't change, but it still
> allows sub-millisecond values (all-be-it with some loss of accuracy due to
> float).
Sorry I'm not following why you want to use rte_epoll_wait()?
If the realtime thread has some blocking system calls,
no sleep is needed I think.
For average realtime thread, I suggest the API rte_thread_yield_realtime()
which could wait for 1 ms or less by default.
For smaller sleep, we can use PR_SET_TIMERSLACK and rte_delay_us_sleep().
If we provide an API for PR_SET_TIMERSLACK, we could adapt the duration
of rte_thread_yield_realtime() dynamically after calling prctl().
On Thu, 26 Oct 2023 16:08:02 +0200
Morten Brørup <mb@smartsharesystems.com> wrote:
> > > In our recent tests, nanosleep() itself took around 50 us. So you need to
> > sleep longer than that for your thread not to be runnable when the nanosleep()
> > wakes up again, because 50 us has already passed in "nanosleep overhead".
> > > 10 milliseconds provides plenty of margin, and corresponds to 10 jiffies on
> > a 1000 Hz kernel. (I don't know if it makes any difference for the kernel
> > scheduler if the timer crosses a jiffy border or not.)
> >
> > 10 ms looks like an eternity.
>
> Agree. It is only for functional testing, not for production!
To be safe the sleep has to be longer than the system clock tick.
Most systems are built today with HZ=250 but really should be using HZ=1000
on modern CPU's.
> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> Sent: Thursday, 26 October 2023 18.07
>
> 26/10/2023 17:54, Bruce Richardson:
> > On Thu, Oct 26, 2023 at 04:59:51PM +0200, Morten Brørup wrote:
> > > > From: Morten Brørup [mailto:mb@smartsharesystems.com]
> > > > Sent: Thursday, 26 October 2023 16.50
> > > >
> > > > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > > Sent: Thursday, 26 October 2023 16.31
> > > > >
> > > > > 26/10/2023 16:08, Morten Brørup:
> > > > > > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > > > > Sent: Thursday, 26 October 2023 16.05
> > > > > > >
> > > > > > > 26/10/2023 15:57, Morten Brørup:
> > > > > > > > > From: Morten Brørup [mailto:mb@smartsharesystems.com]
> > > > > > > > > Sent: Thursday, 26 October 2023 15.45
> > > > > > > > >
> > > > > > > > > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > > > > > > > Sent: Thursday, 26 October 2023 15.37
> > > > > > > > > >
> > > > > > > > > > 25/10/2023 18:31, Thomas Monjalon:
> > > > > > > > > > > Real-time thread priority was been forbidden on Unix
> > > > > > > > > > > because of problems they can cause.
> > > > > > > > > > > Warnings and helpers are added to avoid deadlocks,
> > > > > > > > > > > so real-time can be allowed on all systems.
> > > > > > > > > >
> > > > > > > > > > Unit test is failing:
> > > > > > > > > > DPDK:fast-tests / threads_autotest TIMEOUT 600.01
> s
> > > > > > > > > >
> > > > > > > > > > It is seen in only 1 target (maybe the failure
> occurence is
> > > > random):
> > > > > > > > > > Debian 11 (Buster) (ARM) | PASS
> > > > > > > > > > Fedora 37 (ARM) | PASS
> > > > > > > > > > CentOS Stream 9 (ARM) | FAIL
> > > > > > > > > > Fedora 38 (ARM) | PASS
> > > > > > > > > > Fedora 38 (ARM Clang) | PASS
> > > > > > > > > > Ubuntu 20.04 (ARM) | PASS
> > > > > > > > > >
> > > > > > > > > > I need to send a v4 with new implementation and better
> comments.
> > > > > > > > > > The Unix sleep will be upgraded from 1 ns to 1 us in
> case it makes
> > > > a
> > > > > > > > > > difference.
> > > > > > > > >
> > > > > > > > > It will not make a difference. The kernel will go
> through the
> > > > sleeping
> > > > > > > steps,
> > > > > > > > > then wake up again and see the real-time thread is ready
> to run, and
> > > > > then
> > > > > > > > > immediately schedule it.
> > > > > > > > >
> > > > > > > > > For testing purposes, consider sleeping 10 milliseconds
> or something
> > > > > > > > > significant like that.
> > > > > > > >
> > > > > > > > A bit more details...
> > > > > > > >
> > > > > > > > In our recent tests, nanosleep() itself took around 50 us.
> So you need
> > > > > to
> > > > > > > sleep longer than that for your thread not to be runnable
> when the
> > > > > nanosleep()
> > > > > > > wakes up again, because 50 us has already passed in
> "nanosleep
> > > > overhead".
> > > > > > > > 10 milliseconds provides plenty of margin, and corresponds
> to 10
> > > > jiffies
> > > > > on
> > > > > > > a 1000 Hz kernel. (I don't know if it makes any difference
> for the
> > > > kernel
> > > > > > > scheduler if the timer crosses a jiffy border or not.)
> > > > > > >
> > > > > > > 10 ms looks like an eternity.
> > > > > >
> > > > > > Agree. It is only for functional testing, not for production!
> > > > >
> > > > > Realtime thread won't make any sense if we have to insert a long
> sleep.
> > > >
> > > > It seems David came to our rescue here!
> > > >
> > > > I have just tried running our test again with
> prctl(PR_SET_TIMERSLACK) of 1
> > > > ns, and the nanosleep(1 ns) delay dropped from ca. 50 us to ca.
> 2.5 us.
> > > >
> > > > The timeout parameter to epoll_wait() is in milliseconds, which is
> useless for
> > > > low-latency.
> > > > Perhaps real-time threads can be used with epoll() combined with
> timerfd for
> > > > nanosecond resolution timeout.
> > >
> > > Or epoll_pwait2(), which has nanosecond resolution timeout.
> > >
> > > Unfortunately, rte_epoll_wait() is not an experimental API anymore,
> so we cannot change its timeout parameter from milliseconds to micro- or
> nanoseconds. We would have to introduce a new API for this.
> > >
> >
> > Just an idea - can we change the timeout parameter to float rather
> than int,
> > and then use function versioning for backward compatibility for any
> > binaries passing int?
> > That way the actual meaning of the parameter doesn't change, but it
> still
> > allows sub-millisecond values (all-be-it with some loss of accuracy
> due to
> > float).
Too exotic for my taste. I would rather introduce rte_epoll_wait_ns() with timeout in nanoseconds than pass a float.
>
> Sorry I'm not following why you want to use rte_epoll_wait()?
I don't have experience with it yet, but it seems to be the official DPDK API for blocking I/O system call.
>
> If the realtime thread has some blocking system calls,
> no sleep is needed I think.
Correct.
> For average realtime thread, I suggest the API
> rte_thread_yield_realtime()
> which could wait for 1 ms or less by default.
If we introduce a yield() function, it should yield according to the O/S scheduling policy, e.g. the rest of the time slice allocated to the thread by the O/S scheduler (although that might not be available for real-time prioritized threads in Linux). I don't think we can make it O/S agnostic.
I don't think it should wait a fixed amount of time - we already have rte_delay_us_sleep() for that.
In my experiments with power saving, I ended up with a varying sleep duration, depending on traffic load. The l3fwd-power example also uses a varying sleep duration depending on traffic load.
> For smaller sleep, we can use PR_SET_TIMERSLACK and
> rte_delay_us_sleep().
Agree.
> If we provide an API for PR_SET_TIMERSLACK, we could adapt the duration
> of rte_thread_yield_realtime() dynamically after calling prctl().
>
I'm not sure exposing an API for PR_SET_TIMERSLACK is the right solution.
I would rather have the EAL set the timer slack to minimum (1 ns) at EAL initialization. An EAL command line parameter could be added to change the default from 1 ns.
Also, something similar needs to be done for Windows.
26/10/2023 18:50, Morten Brørup:
> > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > Sent: Thursday, 26 October 2023 18.07
> >
> > 26/10/2023 17:54, Bruce Richardson:
> > > On Thu, Oct 26, 2023 at 04:59:51PM +0200, Morten Brørup wrote:
> > > > > From: Morten Brørup [mailto:mb@smartsharesystems.com]
> > > > > Sent: Thursday, 26 October 2023 16.50
> > > > >
> > > > > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > > > Sent: Thursday, 26 October 2023 16.31
> > > > > >
> > > > > > 26/10/2023 16:08, Morten Brørup:
> > > > > > > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > > > > > Sent: Thursday, 26 October 2023 16.05
> > > > > > > >
> > > > > > > > 26/10/2023 15:57, Morten Brørup:
> > > > > > > > > > From: Morten Brørup [mailto:mb@smartsharesystems.com]
> > > > > > > > > > Sent: Thursday, 26 October 2023 15.45
> > > > > > > > > >
> > > > > > > > > > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > > > > > > > > Sent: Thursday, 26 October 2023 15.37
> > > > > > > > > > >
> > > > > > > > > > > 25/10/2023 18:31, Thomas Monjalon:
> > > > > > > > > > > > Real-time thread priority was been forbidden on Unix
> > > > > > > > > > > > because of problems they can cause.
> > > > > > > > > > > > Warnings and helpers are added to avoid deadlocks,
> > > > > > > > > > > > so real-time can be allowed on all systems.
> > > > > > > > > > >
> > > > > > > > > > > Unit test is failing:
> > > > > > > > > > > DPDK:fast-tests / threads_autotest TIMEOUT 600.01
> > s
> > > > > > > > > > >
> > > > > > > > > > > It is seen in only 1 target (maybe the failure
> > occurence is
> > > > > random):
> > > > > > > > > > > Debian 11 (Buster) (ARM) | PASS
> > > > > > > > > > > Fedora 37 (ARM) | PASS
> > > > > > > > > > > CentOS Stream 9 (ARM) | FAIL
> > > > > > > > > > > Fedora 38 (ARM) | PASS
> > > > > > > > > > > Fedora 38 (ARM Clang) | PASS
> > > > > > > > > > > Ubuntu 20.04 (ARM) | PASS
> > > > > > > > > > >
> > > > > > > > > > > I need to send a v4 with new implementation and better
> > comments.
> > > > > > > > > > > The Unix sleep will be upgraded from 1 ns to 1 us in
> > case it makes
> > > > > a
> > > > > > > > > > > difference.
> > > > > > > > > >
> > > > > > > > > > It will not make a difference. The kernel will go
> > through the
> > > > > sleeping
> > > > > > > > steps,
> > > > > > > > > > then wake up again and see the real-time thread is ready
> > to run, and
> > > > > > then
> > > > > > > > > > immediately schedule it.
> > > > > > > > > >
> > > > > > > > > > For testing purposes, consider sleeping 10 milliseconds
> > or something
> > > > > > > > > > significant like that.
> > > > > > > > >
> > > > > > > > > A bit more details...
> > > > > > > > >
> > > > > > > > > In our recent tests, nanosleep() itself took around 50 us.
> > So you need
> > > > > > to
> > > > > > > > sleep longer than that for your thread not to be runnable
> > when the
> > > > > > nanosleep()
> > > > > > > > wakes up again, because 50 us has already passed in
> > "nanosleep
> > > > > overhead".
> > > > > > > > > 10 milliseconds provides plenty of margin, and corresponds
> > to 10
> > > > > jiffies
> > > > > > on
> > > > > > > > a 1000 Hz kernel. (I don't know if it makes any difference
> > for the
> > > > > kernel
> > > > > > > > scheduler if the timer crosses a jiffy border or not.)
> > > > > > > >
> > > > > > > > 10 ms looks like an eternity.
> > > > > > >
> > > > > > > Agree. It is only for functional testing, not for production!
> > > > > >
> > > > > > Realtime thread won't make any sense if we have to insert a long
> > sleep.
> > > > >
> > > > > It seems David came to our rescue here!
> > > > >
> > > > > I have just tried running our test again with
> > prctl(PR_SET_TIMERSLACK) of 1
> > > > > ns, and the nanosleep(1 ns) delay dropped from ca. 50 us to ca.
> > 2.5 us.
> > > > >
> > > > > The timeout parameter to epoll_wait() is in milliseconds, which is
> > useless for
> > > > > low-latency.
> > > > > Perhaps real-time threads can be used with epoll() combined with
> > timerfd for
> > > > > nanosecond resolution timeout.
> > > >
> > > > Or epoll_pwait2(), which has nanosecond resolution timeout.
> > > >
> > > > Unfortunately, rte_epoll_wait() is not an experimental API anymore,
> > so we cannot change its timeout parameter from milliseconds to micro- or
> > nanoseconds. We would have to introduce a new API for this.
> > > >
> > >
> > > Just an idea - can we change the timeout parameter to float rather
> > than int,
> > > and then use function versioning for backward compatibility for any
> > > binaries passing int?
> > > That way the actual meaning of the parameter doesn't change, but it
> > still
> > > allows sub-millisecond values (all-be-it with some loss of accuracy
> > due to
> > > float).
>
> Too exotic for my taste. I would rather introduce rte_epoll_wait_ns() with timeout in nanoseconds than pass a float.
>
> >
> > Sorry I'm not following why you want to use rte_epoll_wait()?
>
> I don't have experience with it yet, but it seems to be the official DPDK API for blocking I/O system call.
>
> >
> > If the realtime thread has some blocking system calls,
> > no sleep is needed I think.
>
> Correct.
>
> > For average realtime thread, I suggest the API
> > rte_thread_yield_realtime()
> > which could wait for 1 ms or less by default.
>
> If we introduce a yield() function, it should yield according to the O/S scheduling policy, e.g. the rest of the time slice allocated to the thread by the O/S scheduler (although that might not be available for real-time prioritized threads in Linux). I don't think we can make it O/S agnostic.
>
> I don't think it should wait a fixed amount of time - we already have rte_delay_us_sleep() for that.
>
> In my experiments with power saving, I ended up with a varying sleep duration, depending on traffic load. The l3fwd-power example also uses a varying sleep duration depending on traffic load.
I feel you lost the goal of this: schedule other threads (especially kernel threads).
So we don't care about the sleep duration at all,
except we want it minimal while allowing scheduling.
> > For smaller sleep, we can use PR_SET_TIMERSLACK and
> > rte_delay_us_sleep().
>
> Agree.
>
> > If we provide an API for PR_SET_TIMERSLACK, we could adapt the duration
> > of rte_thread_yield_realtime() dynamically after calling prctl().
> >
>
> I'm not sure exposing an API for PR_SET_TIMERSLACK is the right solution.
>
> I would rather have the EAL set the timer slack to minimum (1 ns) at EAL initialization. An EAL command line parameter could be added to change the default from 1 ns.
>
> Also, something similar needs to be done for Windows.
Windows should be fine with Sleep(0).
26/10/2023 18:28, Stephen Hemminger:
> On Thu, 26 Oct 2023 16:08:02 +0200
> Morten Brørup <mb@smartsharesystems.com> wrote:
>
> > > > In our recent tests, nanosleep() itself took around 50 us. So you need to
> > > sleep longer than that for your thread not to be runnable when the nanosleep()
> > > wakes up again, because 50 us has already passed in "nanosleep overhead".
> > > > 10 milliseconds provides plenty of margin, and corresponds to 10 jiffies on
> > > a 1000 Hz kernel. (I don't know if it makes any difference for the kernel
> > > scheduler if the timer crosses a jiffy border or not.)
> > >
> > > 10 ms looks like an eternity.
> >
> > Agree. It is only for functional testing, not for production!
>
> To be safe the sleep has to be longer than the system clock tick.
> Most systems are built today with HZ=250 but really should be using HZ=1000
> on modern CPU's.
If it has to be more than 1 ms,
we should mention it is a slow call
which may be skipped if the thread is already blocking on something else.
26/10/2023 16:19, Thomas Monjalon:
> Real-time thread priority was been forbidden on Unix
> because of problems they can cause.
> Warnings and helpers are added to avoid deadlocks,
> so real-time can be allowed on all systems.
>
> Thomas Monjalon (2):
> eal: add thread yield functions
> eal/unix: allow creating thread with real-time priority
>
> v1: no yield at all
> v2: more comments, sched_yield() and Sleep(0) on Windows
> v3: 2 yield functions with sleep in realtime version
> v4: runtime warning, longer sleep on Unix and lighter yield on Windows
> v5: fix build and increase Unix sleep to 1 ms
>
> Thomas Monjalon (2):
> eal: add thread yield functions
> eal/unix: allow creating thread with real-time priority
Now there is a test failing on Windows:
http://mails.dpdk.org/archives/test-report/2023-October/491475.html
On Thu, 26 Oct 2023 21:55:24 +0200
Thomas Monjalon <thomas@monjalon.net> wrote:
> > To be safe the sleep has to be longer than the system clock tick.
> > Most systems are built today with HZ=250 but really should be using HZ=1000
> > on modern CPU's.
>
> If it has to be more than 1 ms,
> we should mention it is a slow call
> which may be skipped if the thread is already blocking on something else.
A thread that is real time is not subject to timer slack.
Therefore even 100ns would probably work fine.
https://man7.org/linux/man-pages/man7/time.7.html
High-resolution timers
Before Linux 2.6.21, the accuracy of timer and sleep system calls
(see below) was also limited by the size of the jiffy.
Since Linux 2.6.21, Linux supports high-resolution timers (HRTs),
optionally configurable via CONFIG_HIGH_RES_TIMERS. On a system
that supports HRTs, the accuracy of sleep and timer system calls
is no longer constrained by the jiffy, but instead can be as
accurate as the hardware allows (microsecond accuracy is typical
of modern hardware). You can determine whether high-resolution
timers are supported by checking the resolution returned by a
call to clock_getres(2) or looking at the "resolution" entries in
/proc/timer_list.
HRTs are not supported on all hardware architectures. (Support
is provided on x86, ARM, and PowerPC, among others.)
...
Timer slack
Since Linux 2.6.28, it is possible to control the "timer slack"
value for a thread. The timer slack is the length of time by
which the kernel may delay the wake-up of certain system calls
that block with a timeout. Permitting this delay allows the
kernel to coalesce wake-up events, thus possibly reducing the
number of system wake-ups and saving power. For more details,
see the description of PR_SET_TIMERSLACK in prctl(2).
...
https://man7.org/linux/man-pages/man2/prctl.2.html
PR_SET_TIMERSLACK (since Linux 2.6.28)
Each thread has two associated timer slack values: a
"default" value, and a "current" value. This operation
sets the "current" timer slack value for the calling
thread. arg2 is an unsigned long value, then maximum
"current" value is ULONG_MAX and the minimum "current"
value is 1. If the nanosecond value supplied in arg2 is
greater than zero, then the "current" value is set to this
value. If arg2 is equal to zero, the "current" timer
slack is reset to the thread's "default" timer slack
value.
The "current" timer slack is used by the kernel to group
timer expirations for the calling thread that are close to
one another; as a consequence, timer expirations for the
thread may be up to the specified number of nanoseconds
late (but will never expire early). Grouping timer
expirations can help reduce system power consumption by
minimizing CPU wake-ups.
The timer expirations affected by timer slack are those
set by select(2), pselect(2), poll(2), ppoll(2),
epoll_wait(2), epoll_pwait(2), clock_nanosleep(2),
nanosleep(2), and futex(2) (and thus the library functions
implemented via futexes, including
pthread_cond_timedwait(3), pthread_mutex_timedlock(3),
pthread_rwlock_timedrdlock(3),
pthread_rwlock_timedwrlock(3), and sem_timedwait(3)).
Timer slack is not applied to threads that are scheduled
under a real-time scheduling policy (see
sched_setscheduler(2)).
When a new thread is created, the two timer slack values
are made the same as the "current" value of the creating
thread. Thereafter, a thread can adjust its "current"
timer slack value via PR_SET_TIMERSLACK. The "default"
value can't be changed. The timer slack values of init
(PID 1), the ancestor of all processes, are 50,000
nanoseconds (50 microseconds). The timer slack value is
inherited by a child created via fork(2), and is preserved
across execve(2).
Since Linux 4.6, the "current" timer slack value of any
process can be examined and changed via the file
/proc/pid/timerslack_ns. See proc(5).
On Thu, Oct 26, 2023 at 10:00:33PM +0200, Thomas Monjalon wrote:
> 26/10/2023 16:19, Thomas Monjalon:
> > Real-time thread priority was been forbidden on Unix
> > because of problems they can cause.
> > Warnings and helpers are added to avoid deadlocks,
> > so real-time can be allowed on all systems.
> >
> > Thomas Monjalon (2):
> > eal: add thread yield functions
> > eal/unix: allow creating thread with real-time priority
> >
> > v1: no yield at all
> > v2: more comments, sched_yield() and Sleep(0) on Windows
> > v3: 2 yield functions with sleep in realtime version
> > v4: runtime warning, longer sleep on Unix and lighter yield on Windows
> > v5: fix build and increase Unix sleep to 1 ms
> >
> > Thomas Monjalon (2):
> > eal: add thread yield functions
> > eal/unix: allow creating thread with real-time priority
>
> Now there is a test failing on Windows:
> http://mails.dpdk.org/archives/test-report/2023-October/491475.html
40/84 DPDK:fast-tests / lcores_autotest TIMEOUT
626.38s
i don't think this test is reliable. if the failure is being reported
due to the lcores_autotest timeout.
i also see this test fail on linux for me occasionally when run in my
local lab.
ty
> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> Sent: Thursday, 26 October 2023 21.51
>
> 26/10/2023 18:50, Morten Brørup:
> > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > Sent: Thursday, 26 October 2023 18.07
[...]
> > > For average realtime thread, I suggest the API
> > > rte_thread_yield_realtime()
> > > which could wait for 1 ms or less by default.
> >
> > If we introduce a yield() function, it should yield according to the
> O/S scheduling policy, e.g. the rest of the time slice allocated to the
> thread by the O/S scheduler (although that might not be available for
> real-time prioritized threads in Linux). I don't think we can make it
> O/S agnostic.
> >
> > I don't think it should wait a fixed amount of time - we already have
> rte_delay_us_sleep() for that.
> >
> > In my experiments with power saving, I ended up with a varying sleep
> duration, depending on traffic load. The l3fwd-power example also uses
> a varying sleep duration depending on traffic load.
>
> I feel you lost the goal of this: schedule other threads (especially
> kernel threads).
> So we don't care about the sleep duration at all,
> except we want it minimal while allowing scheduling.
I do understand the goal: We want to relinquish the CPU core briefly, so the RT-priority thread gives way for other threads to run on the same CPU core.
But I don't think it is possible for an RT-priority thread to yield without waiting in a blocking call or waiting for an explicit timeout. This is the reason why Stephen wants the big fat warning sign when assigning RT-priority to a thread.
This will happen if we don't provide something for the RT-priority thread to wait for:
The application thread calls "yield()", e.g. sleep(0), to pass control to the kernel scheduler.
The kernel scheduler looks at all tasks available. The application thread has RT-priority and is ready to run, so the kernel scheduler will pick this thread for running. Only if other RT-priority threads are ready to run, perhaps one of those will be picked. No other threads will be picked, not even kernel threads (because RT-priority is higher than the priority of the kernel threads).
[...]
> Windows should be fine with Sleep(0).
Reading the Win32 API documentation [1] carefully, it think it will generally behave like the Linux scheduler. It says:
"A value of zero causes the thread to relinquish the remainder of its time slice to any other thread of equal priority that is ready to run. If there are no other threads of equal priority ready to run, the function returns immediately, and the thread continues execution. This behavior changed starting with Windows Server 2003."
How I interpret this is: The Windows scheduler will not relinquish the CPU core to lower priority threads when calling Sleep(0). If the application thread has RT-priority, the scheduler will immediately pick the same thread. Only if other RT-priority threads are ready to run, one of those will be picked.
[1]: https://learn.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-sleep
@@ -97,21 +97,12 @@ test_thread_priority(void)
"Priority set mismatches priority get");
priority = RTE_THREAD_PRIORITY_REALTIME_CRITICAL;
-#ifndef RTE_EXEC_ENV_WINDOWS
- RTE_TEST_ASSERT(rte_thread_set_priority(thread_id, priority) == ENOTSUP,
- "Priority set to critical should fail");
- RTE_TEST_ASSERT(rte_thread_get_priority(thread_id, &priority) == 0,
- "Failed to get thread priority");
- RTE_TEST_ASSERT(priority == RTE_THREAD_PRIORITY_NORMAL,
- "Failed set to critical should have retained normal");
-#else
RTE_TEST_ASSERT(rte_thread_set_priority(thread_id, priority) == 0,
"Priority set to critical should succeed");
RTE_TEST_ASSERT(rte_thread_get_priority(thread_id, &priority) == 0,
"Failed to get thread priority");
RTE_TEST_ASSERT(priority == RTE_THREAD_PRIORITY_REALTIME_CRITICAL,
"Priority set mismatches priority get");
-#endif
priority = RTE_THREAD_PRIORITY_NORMAL;
RTE_TEST_ASSERT(rte_thread_set_priority(thread_id, priority) == 0,
@@ -815,7 +815,9 @@ Known Issues
4. It MAY be used by preemptible multi-producer and/or preemptible multi-consumer pthreads whose scheduling policy are all SCHED_OTHER(cfs), SCHED_IDLE or SCHED_BATCH. User SHOULD be aware of the performance penalty before using it.
- 5. It MUST not be used by multi-producer/consumer pthreads, whose scheduling policies are SCHED_FIFO or SCHED_RR.
+ 5. It MUST not be used by multi-producer/consumer pthreads
+ whose scheduling policies are ``SCHED_FIFO``
+ or ``SCHED_RR`` (``RTE_THREAD_PRIORITY_REALTIME_CRITICAL``).
Alternatively, applications can use the lock-free stack mempool handler. When
considering this handler, note that:
@@ -59,7 +59,7 @@ enum rte_thread_priority {
RTE_THREAD_PRIORITY_NORMAL = 0,
/**< normal thread priority, the default */
RTE_THREAD_PRIORITY_REALTIME_CRITICAL = 1,
- /**< highest thread priority allowed */
+ /**< highest thread priority, use with caution */
};
/**
@@ -153,11 +153,6 @@ rte_thread_create(rte_thread_t *thread_id,
goto cleanup;
}
- if (thread_attr->priority ==
- RTE_THREAD_PRIORITY_REALTIME_CRITICAL) {
- ret = ENOTSUP;
- goto cleanup;
- }
ret = thread_map_priority_to_os_value(thread_attr->priority,
¶m.sched_priority, &policy);
if (ret != 0)
@@ -275,10 +270,6 @@ rte_thread_set_priority(rte_thread_t thread_id,
int policy;
int ret;
- /* Realtime priority can cause crashes on non-Windows platforms. */
- if (priority == RTE_THREAD_PRIORITY_REALTIME_CRITICAL)
- return ENOTSUP;
-
ret = thread_map_priority_to_os_value(priority, ¶m.sched_priority,
&policy);
if (ret != 0)