diff mbox series

[1/4] latencystats: use alloca instead of vla trivial

Message ID	1712250913-1977-2-git-send-email-roretzla@linux.microsoft.com (mailing list archive)
State	Superseded
Headers	From: Tyler Retzlaff <roretzla@linux.microsoft.com> To: dev@dpdk.org Cc: Bruce Richardson <bruce.richardson@intel.com>, Stephen Hemminger <stephen@networkplumber.org>, Thomas Monjalon <thomas@monjalon.net>, Morten Brørup <mb@smartsharesystems.com>, Tyler Retzlaff <roretzla@linux.microsoft.com> Subject: [PATCH 1/4] latencystats: use alloca instead of vla trivial Date: Thu, 4 Apr 2024 10:15:10 -0700 Message-Id: <1712250913-1977-2-git-send-email-roretzla@linux.microsoft.com> In-Reply-To: <1712250913-1977-1-git-send-email-roretzla@linux.microsoft.com> References: <20231107193220.GA15232@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net> <1712250913-1977-1-git-send-email-roretzla@linux.microsoft.com> Precedence: list Errors-To: dev-bounces@dpdk.org
Series	RFC samples converting VLA to alloca | [0/4] RFC samples converting VLA to alloca [1/4] latencystats: use alloca instead of vla trivial [2/4] hash: use alloca instead of vla trivial [3/4] vhost: use alloca instead of vla sizeof [4/4] dispatcher: use alloca instead of vla multi dimensional

Checks

Context	Check	Description
ci/checkpatch	success	coding style OK

Commit Message

Tyler Retzlaff April 4, 2024, 5:15 p.m. UTC

  RFC sample illustrating simple conversion of VLA to alloca().

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/latencystats/rte_latencystats.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Morten Brørup April 6, 2024, 3:28 p.m. UTC | #1

> From: Tyler Retzlaff [mailto:roretzla@linux.microsoft.com]
> Sent: Thursday, 4 April 2024 19.15
> 
> RFC sample illustrating simple conversion of VLA to alloca().
> 
> Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> ---

[...]

> --- a/lib/latencystats/rte_latencystats.c
> +++ b/lib/latencystats/rte_latencystats.c
> @@ -159,7 +159,7 @@ struct latency_stats_nameoff {
>  {
>  	unsigned int i, cnt = 0;
>  	uint64_t now;
> -	float latency[nb_pkts];
> +	float *latency = alloca(sizeof(float) * nb_pkts);

In cases where we are processing packet bursts, I would prefer introducing a global #define RTE_MAX_PKT_BURST_SIZE, indicating the max packet burst size supported by libraries and drivers.
For reference, rte_config.h already has #define RTE_GRAPH_BURST_SIZE 256.

Such a common define should also be used by functions such as rte_pktmbuf_free_bulk(); although it also supports segmented packets, so it must still be able to handle more mbufs.
https://elixir.bootlin.com/dpdk/v24.03/source/lib/mbuf/rte_mbuf.c#L486
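
A minimal sketch of the fixed-bound alternative suggested above, for illustration only. RTE_MAX_PKT_BURST_SIZE is the define proposed in this comment and does not exist in DPDK; the value 256 is borrowed from RTE_GRAPH_BURST_SIZE as an assumption. The function collect_latencies(), its parameters, and the "timestamp of 0 means unstamped" convention are hypothetical and do not mirror the real calc_latency() code.

```c
#include <stdint.h>

/* Hypothetical: proposed in this thread, not an existing DPDK define. */
#define RTE_MAX_PKT_BURST_SIZE 256

/*
 * Hypothetical helper: collect per-packet latencies into a fixed-size
 * stack array instead of a VLA or alloca() buffer, then sum them.
 * Returns the number of samples, or -1 if the burst exceeds the bound.
 */
static int
collect_latencies(const uint64_t *tx_stamps, uint16_t nb_pkts,
		uint64_t now, float *sum_out)
{
	float latency[RTE_MAX_PKT_BURST_SIZE];
	unsigned int i, cnt = 0;
	float sum = 0.0f;

	/* A compile-time bound means oversized bursts must be rejected. */
	if (nb_pkts > RTE_MAX_PKT_BURST_SIZE)
		return -1;

	for (i = 0; i < nb_pkts; i++) {
		/* In this sketch a timestamp of 0 marks an unstamped packet. */
		if (tx_stamps[i] == 0)
			continue;
		latency[cnt++] = (float)(now - tx_stamps[i]);
	}

	for (i = 0; i < cnt; i++)
		sum += latency[i];

	*sum_out = sum;
	return (int)cnt;
}
```

The trade-off is that stack usage becomes constant and known at compile time, but every caller has to respect the bound or handle the error return.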

Mattias Rönnblom April 7, 2024, 9:36 a.m. UTC | #2

On 2024-04-06 17:28, Morten Brørup wrote:
>> From: Tyler Retzlaff [mailto:roretzla@linux.microsoft.com]
>> Sent: Thursday, 4 April 2024 19.15
>>
>> RFC sample illustrating simple conversion of VLA to alloca().
>>
>> Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
>> ---
> 
> [...]
> 
>> --- a/lib/latencystats/rte_latencystats.c
>> +++ b/lib/latencystats/rte_latencystats.c
>> @@ -159,7 +159,7 @@ struct latency_stats_nameoff {
>>   {
>>   	unsigned int i, cnt = 0;
>>   	uint64_t now;
>> -	float latency[nb_pkts];
>> +	float *latency = alloca(sizeof(float) * nb_pkts);
> 
> In cases where we are processing packet bursts, I would prefer introducing a global #define RTE_MAX_PKT_BURST_SIZE, indicating the max packet burst size supported by libraries and drivers.

First question: what is meant by a "packet" here? An mbuf? A 
network-layer PDU? Something that in some way relates to zero or more 
packets, like an rte_event? Or just any object that is sent to or received 
from some DPDK API in batches or bursts?

Second question: is RTE_MAX_PKT_BURST_SIZE meant as an upper bound, so 
no API can consume or produce a burst larger than this, or do all 
APIs literally have to support that burst size?

Third question: why not just keep VLAs?

> For reference, rte_config.h already has #define RTE_GRAPH_BURST_SIZE 256.
> 
> Such a common define should also be used by functions such as rte_pktmbuf_free_bulk(); although it also supports segmented packets, so it must still be able to handle more mbufs.
> https://elixir.bootlin.com/dpdk/v24.03/source/lib/mbuf/rte_mbuf.c#L486
>

Stephen Hemminger April 7, 2024, 5 p.m. UTC | #3

On Sun, 7 Apr 2024 11:36:59 +0200
Mattias Rönnblom <hofors@lysator.liu.se> wrote:

> On 2024-04-06 17:28, Morten Brørup wrote:
> >> From: Tyler Retzlaff [mailto:roretzla@linux.microsoft.com]
> >> Sent: Thursday, 4 April 2024 19.15
> >>
> >> RFC sample illustrating simple conversion of VLA to alloca().
> >>
> >> Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> >> ---  
> > 
> > [...]
> >   
> >> --- a/lib/latencystats/rte_latencystats.c
> >> +++ b/lib/latencystats/rte_latencystats.c
> >> @@ -159,7 +159,7 @@ struct latency_stats_nameoff {
> >>   {
> >>   	unsigned int i, cnt = 0;
> >>   	uint64_t now;
> >> -	float latency[nb_pkts];
> >> +	float *latency = alloca(sizeof(float) * nb_pkts);  
> > 
> > In cases where we are processing packet bursts, I would prefer introducing a global #define RTE_MAX_PKT_BURST_SIZE, indicating the max packet burst size supported by libraries and drivers.  
> 
> First question: what is meant by a "packet" here? An mbuf? A 
> network-layer PDU? Something that in some way relates to zero or more 
> packets, like an rte_event? Or just any object that is sent to or received 
> from some DPDK API in batches or bursts?
> 
> Second question: is RTE_MAX_PKT_BURST_SIZE meant as an upper bound, so 
> no API can consume or produce a burst larger than this, or do all 
> APIs literally have to support that burst size?
> 
> Third question: why not just keep VLAs?
> 
> > For reference, rte_config.h already has #define RTE_GRAPH_BURST_SIZE 256.
> > 
> > Such a common define should also be used by functions such as rte_pktmbuf_free_bulk(); although it also supports segmented packets, so it must still be able to handle more mbufs.
> > https://elixir.bootlin.com/dpdk/v24.03/source/lib/mbuf/rte_mbuf.c#L486
> >   

Looking at the maths here, calc_latency can be seriously improved:
   - calc_latency is in the fast path for transmit.
   - it is doing floating point math; floating point is much slower than
     fixed point.
   - the latency[] array is a temporary; it should be possible to compute
     total latency without it.
   - it acquires a lock; to achieve DPDK-level performance of 40 Mpps, it is
     necessary to do the absolute minimum of locking.
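
A rough sketch of two of the points above: updating the statistics in a single pass with no temporary latency[] array, and using integer arithmetic instead of floats. This is not the DPDK implementation. struct lat_stats, lat_update(), lat_update_burst(), and the "timestamp of 0 means unstamped" convention are hypothetical names and assumptions, and the smoothing factor is taken as 1/4 (a power of two) rather than the 0.2 used by the existing code. Locking is deliberately left out.

```c
#include <stdint.h>

/* Hypothetical smoothing factor: alpha = 1 / (1 << 2) = 0.25. */
#define LAT_EWMA_SHIFT 2

/* Hypothetical stats container; all values are in timer cycles. */
struct lat_stats {
	uint64_t min;
	uint64_t max;
	uint64_t avg;
};

/* Fold one latency sample into the running statistics. */
static void
lat_update(struct lat_stats *s, uint64_t latency)
{
	int64_t delta;

	/* As a simplification, min == 0 means "no sample seen yet". */
	if (s->min == 0 || latency < s->min)
		s->min = latency;
	if (latency > s->max)
		s->max = latency;

	/* Integer EWMA: avg += alpha * (latency - avg). */
	delta = (int64_t)latency - (int64_t)s->avg;
	s->avg = (uint64_t)((int64_t)s->avg + delta / (1 << LAT_EWMA_SHIFT));
}

/*
 * Process a burst in one pass, without a temporary array.
 * Returns the number of samples folded into the statistics.
 */
static unsigned int
lat_update_burst(struct lat_stats *s, const uint64_t *tx_stamps,
		uint16_t nb_pkts, uint64_t now)
{
	unsigned int i, cnt = 0;

	for (i = 0; i < nb_pkts; i++) {
		/* In this sketch a timestamp of 0 marks an unstamped packet. */
		if (tx_stamps[i] == 0)
			continue;
		lat_update(s, now - tx_stamps[i]);
		cnt++;
	}
	return cnt;
}
```

Because no per-burst buffer is needed, the VLA versus alloca() question does not arise for this function at all. Whether the integer rounding is acceptable for the reported average would need to be checked against the library's actual requirements.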

Patch

diff --git a/lib/latencystats/rte_latencystats.c b/lib/latencystats/rte_latencystats.c
index 4ea9b0d..f59a9eb 100644
--- a/lib/latencystats/rte_latencystats.c
+++ b/lib/latencystats/rte_latencystats.c
@@ -159,7 +159,7 @@  struct latency_stats_nameoff {
 {
 	unsigned int i, cnt = 0;
 	uint64_t now;
-	float latency[nb_pkts];
+	float *latency = alloca(sizeof(float) * nb_pkts);
 	static float prev_latency;
 	/*
 	 * Alpha represents degree of weighting decrease in EWMA,