examples/ipsec-secgw: fix IPsec performance drop

Message ID 20240206123811.3452587-1-rbhansali@marvell.com (mailing list archive)
State Accepted, archived
Delegated to: akhil goyal
Headers
Series examples/ipsec-secgw: fix IPsec performance drop |

Checks

Context Check Description
ci/loongarch-compilation warning apply patch failure
ci/checkpatch success coding style OK
ci/Intel-compilation warning apply issues
ci/iol-testing warning apply patch failure

Commit Message

Rahul Bhansali Feb. 6, 2024, 12:38 p.m. UTC
Single packet free using rte_pktmbuf_free_bulk() is dropping the
performance. On cn10k, maximum of ~4% drop observed for IPsec
event mode single SA outbound case.

To fix this issue, single packet free will use rte_pktmbuf_free
API.

Fixes: bd7c063561b3 ("examples/ipsec-secgw: use bulk free")

Signed-off-by: Rahul Bhansali <rbhansali@marvell.com>
---
 examples/ipsec-secgw/ipsec-secgw.h | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)
  

Comments

Ferruh Yigit Feb. 6, 2024, 6:25 p.m. UTC | #1
On 2/6/2024 12:38 PM, Rahul Bhansali wrote:
> Single packet free using rte_pktmbuf_free_bulk() is dropping the
> performance. On cn10k, maximum of ~4% drop observed for IPsec
> event mode single SA outbound case.
> 
> To fix this issue, single packet free will use rte_pktmbuf_free
> API.
> 
> Fixes: bd7c063561b3 ("examples/ipsec-secgw: use bulk free")
> 
> Signed-off-by: Rahul Bhansali <rbhansali@marvell.com>
> ---
>  examples/ipsec-secgw/ipsec-secgw.h | 7 +++----
>  1 file changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/examples/ipsec-secgw/ipsec-secgw.h b/examples/ipsec-secgw/ipsec-secgw.h
> index 8baab44ee7..ec33a982df 100644
> --- a/examples/ipsec-secgw/ipsec-secgw.h
> +++ b/examples/ipsec-secgw/ipsec-secgw.h
> @@ -229,11 +229,10 @@ free_reassembly_fail_pkt(struct rte_mbuf *mb)
>  }
>  
>  /* helper routine to free bulk of packets */
> -static inline void
> -free_pkts(struct rte_mbuf *mb[], uint32_t n)
> +static __rte_always_inline void
> +free_pkts(struct rte_mbuf *mb[], const uint32_t n)
>  {
> -	rte_pktmbuf_free_bulk(mb, n);
> -
> +	n == 1 ? rte_pktmbuf_free(mb[0]) : rte_pktmbuf_free_bulk(mb, n);
>  	core_stats_update_drop(n);
>  }
>  

Hi Rahul,

Do you think the 'rte_pktmbuf_free_bulk()' API performance can be
improved by similar change?
  
Rahul Bhansali Feb. 7, 2024, 6:46 a.m. UTC | #2
> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> Sent: Tuesday, February 6, 2024 11:55 PM
> To: Rahul Bhansali <rbhansali@marvell.com>; dev@dpdk.org; Radu Nicolau
> <radu.nicolau@intel.com>; Akhil Goyal <gakhil@marvell.com>; Konstantin
> Ananyev <konstantin.ananyev@huawei.com>; Anoob Joseph
> <anoobj@marvell.com>
> Subject: [EXT] Re: [PATCH] examples/ipsec-secgw: fix IPsec performance drop
> 
> External Email
> 
> ----------------------------------------------------------------------
> On 2/6/2024 12:38 PM, Rahul Bhansali wrote:
> > Single packet free using rte_pktmbuf_free_bulk() is dropping the
> > performance. On cn10k, maximum of ~4% drop observed for IPsec event
> > mode single SA outbound case.
> >
> > To fix this issue, single packet free will use rte_pktmbuf_free API.
> >
> > Fixes: bd7c063561b3 ("examples/ipsec-secgw: use bulk free")
> >
> > Signed-off-by: Rahul Bhansali <rbhansali@marvell.com>
> > ---
> >  examples/ipsec-secgw/ipsec-secgw.h | 7 +++----
> >  1 file changed, 3 insertions(+), 4 deletions(-)
> >
> > diff --git a/examples/ipsec-secgw/ipsec-secgw.h
> > b/examples/ipsec-secgw/ipsec-secgw.h
> > index 8baab44ee7..ec33a982df 100644
> > --- a/examples/ipsec-secgw/ipsec-secgw.h
> > +++ b/examples/ipsec-secgw/ipsec-secgw.h
> > @@ -229,11 +229,10 @@ free_reassembly_fail_pkt(struct rte_mbuf *mb)  }
> >
> >  /* helper routine to free bulk of packets */ -static inline void
> > -free_pkts(struct rte_mbuf *mb[], uint32_t n)
> > +static __rte_always_inline void
> > +free_pkts(struct rte_mbuf *mb[], const uint32_t n)
> >  {
> > -	rte_pktmbuf_free_bulk(mb, n);
> > -
> > +	n == 1 ? rte_pktmbuf_free(mb[0]) : rte_pktmbuf_free_bulk(mb, n);
> >  	core_stats_update_drop(n);
> >  }
> >
> 
> Hi Rahul,
> 
> Do you think the 'rte_pktmbuf_free_bulk()' API performance can be improved by
> similar change?

Hi Ferruh,
Currently 'rte_pktmbuf_free_bulk() is not inline. If we make that along with __rte_pktmbuf_free_seg_via_array()  both inline then performance can be improved similar.
  
Ferruh Yigit Feb. 7, 2024, 10:35 a.m. UTC | #3
On 2/7/2024 6:46 AM, Rahul Bhansali wrote:
> 
> 
>> -----Original Message-----
>> From: Ferruh Yigit <ferruh.yigit@amd.com>
>> Sent: Tuesday, February 6, 2024 11:55 PM
>> To: Rahul Bhansali <rbhansali@marvell.com>; dev@dpdk.org; Radu Nicolau
>> <radu.nicolau@intel.com>; Akhil Goyal <gakhil@marvell.com>; Konstantin
>> Ananyev <konstantin.ananyev@huawei.com>; Anoob Joseph
>> <anoobj@marvell.com>
>> Subject: [EXT] Re: [PATCH] examples/ipsec-secgw: fix IPsec performance drop
>>
>> External Email
>>
>> ----------------------------------------------------------------------
>> On 2/6/2024 12:38 PM, Rahul Bhansali wrote:
>>> Single packet free using rte_pktmbuf_free_bulk() is dropping the
>>> performance. On cn10k, maximum of ~4% drop observed for IPsec event
>>> mode single SA outbound case.
>>>
>>> To fix this issue, single packet free will use rte_pktmbuf_free API.
>>>
>>> Fixes: bd7c063561b3 ("examples/ipsec-secgw: use bulk free")
>>>
>>> Signed-off-by: Rahul Bhansali <rbhansali@marvell.com>
>>> ---
>>>  examples/ipsec-secgw/ipsec-secgw.h | 7 +++----
>>>  1 file changed, 3 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/examples/ipsec-secgw/ipsec-secgw.h
>>> b/examples/ipsec-secgw/ipsec-secgw.h
>>> index 8baab44ee7..ec33a982df 100644
>>> --- a/examples/ipsec-secgw/ipsec-secgw.h
>>> +++ b/examples/ipsec-secgw/ipsec-secgw.h
>>> @@ -229,11 +229,10 @@ free_reassembly_fail_pkt(struct rte_mbuf *mb)  }
>>>
>>>  /* helper routine to free bulk of packets */ -static inline void
>>> -free_pkts(struct rte_mbuf *mb[], uint32_t n)
>>> +static __rte_always_inline void
>>> +free_pkts(struct rte_mbuf *mb[], const uint32_t n)
>>>  {
>>> -	rte_pktmbuf_free_bulk(mb, n);
>>> -
>>> +	n == 1 ? rte_pktmbuf_free(mb[0]) : rte_pktmbuf_free_bulk(mb, n);
>>>  	core_stats_update_drop(n);
>>>  }
>>>
>>
>> Hi Rahul,
>>
>> Do you think the 'rte_pktmbuf_free_bulk()' API performance can be improved by
>> similar change?
> 
> Hi Ferruh,
> Currently 'rte_pktmbuf_free_bulk() is not inline. If we make that along with __rte_pktmbuf_free_seg_via_array()  both inline then performance can be improved similar.
>

Ah, so performance improvement is coming from 'rte_pktmbuf_free()' being
inline, OK.

As you are doing performance testing in that area, can you please check
if '__rte_pktmbuf_free_seg_via_array()' is inlined, as it is static
function I expect it to be inlined. If not, can you please test with
force inlining it (__rte_always_inline)?


And I wonder if bulk() API may get single mbuf is a common theme, does
it makes sense add a new inline wrapper to library to cover this case,
if it is bringing ~4% improvement, like:
```
static inline void
rte_pktmbuf_free_bulk_or_one(... **mb, unsigned int n)
{
	if (n == 1)
		return rte_pktmbuf_free(mb[0]);
	return rte_pktmbuf_free_bulk(mb, n);
}
```
  
Rahul Bhansali Feb. 9, 2024, 1:10 p.m. UTC | #4
> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> Sent: Wednesday, February 7, 2024 4:06 PM
> To: Rahul Bhansali <rbhansali@marvell.com>; dev@dpdk.org; Radu Nicolau
> <radu.nicolau@intel.com>; Akhil Goyal <gakhil@marvell.com>; Konstantin
> Ananyev <konstantin.ananyev@huawei.com>; Anoob Joseph
> <anoobj@marvell.com>
> Subject: Re: [EXT] Re: [PATCH] examples/ipsec-secgw: fix IPsec performance drop
> 
> On 2/7/2024 6:46 AM, Rahul Bhansali wrote:
> >
> >
> >> -----Original Message-----
> >> From: Ferruh Yigit <ferruh.yigit@amd.com>
> >> Sent: Tuesday, February 6, 2024 11:55 PM
> >> To: Rahul Bhansali <rbhansali@marvell.com>; dev@dpdk.org; Radu
> >> Nicolau <radu.nicolau@intel.com>; Akhil Goyal <gakhil@marvell.com>;
> >> Konstantin Ananyev <konstantin.ananyev@huawei.com>; Anoob Joseph
> >> <anoobj@marvell.com>
> >> Subject: [EXT] Re: [PATCH] examples/ipsec-secgw: fix IPsec
> >> performance drop
> >>
> >> External Email
> >>
> >> ---------------------------------------------------------------------
> >> - On 2/6/2024 12:38 PM, Rahul Bhansali wrote:
> >>> Single packet free using rte_pktmbuf_free_bulk() is dropping the
> >>> performance. On cn10k, maximum of ~4% drop observed for IPsec event
> >>> mode single SA outbound case.
> >>>
> >>> To fix this issue, single packet free will use rte_pktmbuf_free API.
> >>>
> >>> Fixes: bd7c063561b3 ("examples/ipsec-secgw: use bulk free")
> >>>
> >>> Signed-off-by: Rahul Bhansali <rbhansali@marvell.com>
> >>> ---
> >>>  examples/ipsec-secgw/ipsec-secgw.h | 7 +++----
> >>>  1 file changed, 3 insertions(+), 4 deletions(-)
> >>>
> >>> diff --git a/examples/ipsec-secgw/ipsec-secgw.h
> >>> b/examples/ipsec-secgw/ipsec-secgw.h
> >>> index 8baab44ee7..ec33a982df 100644
> >>> --- a/examples/ipsec-secgw/ipsec-secgw.h
> >>> +++ b/examples/ipsec-secgw/ipsec-secgw.h
> >>> @@ -229,11 +229,10 @@ free_reassembly_fail_pkt(struct rte_mbuf *mb)
> >>> }
> >>>
> >>>  /* helper routine to free bulk of packets */ -static inline void
> >>> -free_pkts(struct rte_mbuf *mb[], uint32_t n)
> >>> +static __rte_always_inline void
> >>> +free_pkts(struct rte_mbuf *mb[], const uint32_t n)
> >>>  {
> >>> -	rte_pktmbuf_free_bulk(mb, n);
> >>> -
> >>> +	n == 1 ? rte_pktmbuf_free(mb[0]) : rte_pktmbuf_free_bulk(mb, n);
> >>>  	core_stats_update_drop(n);
> >>>  }
> >>>
> >>
> >> Hi Rahul,
> >>
> >> Do you think the 'rte_pktmbuf_free_bulk()' API performance can be
> >> improved by similar change?
> >
> > Hi Ferruh,
> > Currently 'rte_pktmbuf_free_bulk() is not inline. If we make that along with
> __rte_pktmbuf_free_seg_via_array()  both inline then performance can be
> improved similar.
> >
> 
> Ah, so performance improvement is coming from 'rte_pktmbuf_free()' being
> inline, OK.
> 
> As you are doing performance testing in that area, can you please check if
> '__rte_pktmbuf_free_seg_via_array()' is inlined, as it is static function I expect it
> to be inlined. If not, can you please test with force inlining it
> (__rte_always_inline)?
It was not inline, did check with force inline also and no impact with this, so I can make it force inline.
> 
> 
> And I wonder if bulk() API may get single mbuf is a common theme, does it makes
> sense add a new inline wrapper to library to cover this case, if it is bringing ~4%
> improvement, like:
> ```
> static inline void
> rte_pktmbuf_free_bulk_or_one(... **mb, unsigned int n) {
> 	if (n == 1)
> 		return rte_pktmbuf_free(mb[0]);
> 	return rte_pktmbuf_free_bulk(mb, n);
> }
Agree, can make this wrapper to cover a case where bulk free API is called but might have single mbuf to get better perf. It can be further optimize " if (n == 1)" with compile time constant check,
```
static inline void
rte_pktmbuf_free_bulk_or_one(struct rte_mbuf **mb, unsigned int n)
{
       if (__builtin_constant_p(n) && (n == 1))
               rte_pktmbuf_free(mb[0]);
       else
               rte_pktmbuf_free_bulk(mb, n);
}
```
Let me know if it is fine. I'll send v2. And, this will be " __rte_experimental" right ?
> ```
  
Ferruh Yigit Feb. 9, 2024, 1:51 p.m. UTC | #5
On 2/9/2024 1:10 PM, Rahul Bhansali wrote:
> 
> 
>> -----Original Message-----
>> From: Ferruh Yigit <ferruh.yigit@amd.com>
>> Sent: Wednesday, February 7, 2024 4:06 PM
>> To: Rahul Bhansali <rbhansali@marvell.com>; dev@dpdk.org; Radu Nicolau
>> <radu.nicolau@intel.com>; Akhil Goyal <gakhil@marvell.com>; Konstantin
>> Ananyev <konstantin.ananyev@huawei.com>; Anoob Joseph
>> <anoobj@marvell.com>
>> Subject: Re: [EXT] Re: [PATCH] examples/ipsec-secgw: fix IPsec performance drop
>>
>> On 2/7/2024 6:46 AM, Rahul Bhansali wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Ferruh Yigit <ferruh.yigit@amd.com>
>>>> Sent: Tuesday, February 6, 2024 11:55 PM
>>>> To: Rahul Bhansali <rbhansali@marvell.com>; dev@dpdk.org; Radu
>>>> Nicolau <radu.nicolau@intel.com>; Akhil Goyal <gakhil@marvell.com>;
>>>> Konstantin Ananyev <konstantin.ananyev@huawei.com>; Anoob Joseph
>>>> <anoobj@marvell.com>
>>>> Subject: [EXT] Re: [PATCH] examples/ipsec-secgw: fix IPsec
>>>> performance drop
>>>>
>>>> External Email
>>>>
>>>> ---------------------------------------------------------------------
>>>> - On 2/6/2024 12:38 PM, Rahul Bhansali wrote:
>>>>> Single packet free using rte_pktmbuf_free_bulk() is dropping the
>>>>> performance. On cn10k, maximum of ~4% drop observed for IPsec event
>>>>> mode single SA outbound case.
>>>>>
>>>>> To fix this issue, single packet free will use rte_pktmbuf_free API.
>>>>>
>>>>> Fixes: bd7c063561b3 ("examples/ipsec-secgw: use bulk free")
>>>>>
>>>>> Signed-off-by: Rahul Bhansali <rbhansali@marvell.com>
>>>>> ---
>>>>>  examples/ipsec-secgw/ipsec-secgw.h | 7 +++----
>>>>>  1 file changed, 3 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/examples/ipsec-secgw/ipsec-secgw.h
>>>>> b/examples/ipsec-secgw/ipsec-secgw.h
>>>>> index 8baab44ee7..ec33a982df 100644
>>>>> --- a/examples/ipsec-secgw/ipsec-secgw.h
>>>>> +++ b/examples/ipsec-secgw/ipsec-secgw.h
>>>>> @@ -229,11 +229,10 @@ free_reassembly_fail_pkt(struct rte_mbuf *mb)
>>>>> }
>>>>>
>>>>>  /* helper routine to free bulk of packets */ -static inline void
>>>>> -free_pkts(struct rte_mbuf *mb[], uint32_t n)
>>>>> +static __rte_always_inline void
>>>>> +free_pkts(struct rte_mbuf *mb[], const uint32_t n)
>>>>>  {
>>>>> -	rte_pktmbuf_free_bulk(mb, n);
>>>>> -
>>>>> +	n == 1 ? rte_pktmbuf_free(mb[0]) : rte_pktmbuf_free_bulk(mb, n);
>>>>>  	core_stats_update_drop(n);
>>>>>  }
>>>>>
>>>>
>>>> Hi Rahul,
>>>>
>>>> Do you think the 'rte_pktmbuf_free_bulk()' API performance can be
>>>> improved by similar change?
>>>
>>> Hi Ferruh,
>>> Currently 'rte_pktmbuf_free_bulk() is not inline. If we make that along with
>> __rte_pktmbuf_free_seg_via_array()  both inline then performance can be
>> improved similar.
>>>
>>
>> Ah, so performance improvement is coming from 'rte_pktmbuf_free()' being
>> inline, OK.
>>
>> As you are doing performance testing in that area, can you please check if
>> '__rte_pktmbuf_free_seg_via_array()' is inlined, as it is static function I expect it
>> to be inlined. If not, can you please test with force inlining it
>> (__rte_always_inline)?
> It was not inline, did check with force inline also and no impact with this, so I can make it force inline.
>

If there is no performance improvement, I think no need to force inline
'__rte_pktmbuf_free_seg_via_array()'.

>>
>>
>> And I wonder if bulk() API may get single mbuf is a common theme, does it makes
>> sense add a new inline wrapper to library to cover this case, if it is bringing ~4%
>> improvement, like:
>> ```
>> static inline void
>> rte_pktmbuf_free_bulk_or_one(... **mb, unsigned int n) {
>> 	if (n == 1)
>> 		return rte_pktmbuf_free(mb[0]);
>> 	return rte_pktmbuf_free_bulk(mb, n);
>> }
> Agree, can make this wrapper to cover a case where bulk free API is called but might have single mbuf to get better perf. It can be further optimize " if (n == 1)" with compile time constant check,
> ```
> static inline void
> rte_pktmbuf_free_bulk_or_one(struct rte_mbuf **mb, unsigned int n)
> {
>        if (__builtin_constant_p(n) && (n == 1))
>                rte_pktmbuf_free(mb[0]);
>        else
>                rte_pktmbuf_free_bulk(mb, n);
> }
> ```
> Let me know if it is fine. I'll send v2. And, this will be " __rte_experimental" right ?
>

Compile time constant check can prevent penalty from additional check,
which is good, and I can see this can work for the examples/ipsec-secgw
usecase above, which has some hardcoded single mbuf free calls.

But most of the other usecases I think 'n' won't be known in compile
time, so API will be effectively same as free_bulk().

If you have it with runtime check, do you still observe any performance
improvement? If not perhaps we can go only with example code update,
without new API.
  
Rahul Bhansali Feb. 13, 2024, 12:50 p.m. UTC | #6
> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> Sent: Friday, February 9, 2024 7:21 PM
> To: Rahul Bhansali <rbhansali@marvell.com>; dev@dpdk.org; Radu Nicolau
> <radu.nicolau@intel.com>; Akhil Goyal <gakhil@marvell.com>; Konstantin
> Ananyev <konstantin.ananyev@huawei.com>; Anoob Joseph
> <anoobj@marvell.com>
> Subject: Re: [EXT] Re: [PATCH] examples/ipsec-secgw: fix IPsec performance drop
> 
> On 2/9/2024 1:10 PM, Rahul Bhansali wrote:
> >
> >
> >> -----Original Message-----
> >> From: Ferruh Yigit <ferruh.yigit@amd.com>
> >> Sent: Wednesday, February 7, 2024 4:06 PM
> >> To: Rahul Bhansali <rbhansali@marvell.com>; dev@dpdk.org; Radu
> >> Nicolau <radu.nicolau@intel.com>; Akhil Goyal <gakhil@marvell.com>;
> >> Konstantin Ananyev <konstantin.ananyev@huawei.com>; Anoob Joseph
> >> <anoobj@marvell.com>
> >> Subject: Re: [EXT] Re: [PATCH] examples/ipsec-secgw: fix IPsec
> >> performance drop
> >>
> >> On 2/7/2024 6:46 AM, Rahul Bhansali wrote:
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: Ferruh Yigit <ferruh.yigit@amd.com>
> >>>> Sent: Tuesday, February 6, 2024 11:55 PM
> >>>> To: Rahul Bhansali <rbhansali@marvell.com>; dev@dpdk.org; Radu
> >>>> Nicolau <radu.nicolau@intel.com>; Akhil Goyal <gakhil@marvell.com>;
> >>>> Konstantin Ananyev <konstantin.ananyev@huawei.com>; Anoob Joseph
> >>>> <anoobj@marvell.com>
> >>>> Subject: [EXT] Re: [PATCH] examples/ipsec-secgw: fix IPsec
> >>>> performance drop
> >>>>
> >>>> External Email
> >>>>
> >>>> -------------------------------------------------------------------
> >>>> --
> >>>> - On 2/6/2024 12:38 PM, Rahul Bhansali wrote:
> >>>>> Single packet free using rte_pktmbuf_free_bulk() is dropping the
> >>>>> performance. On cn10k, maximum of ~4% drop observed for IPsec
> >>>>> event mode single SA outbound case.
> >>>>>
> >>>>> To fix this issue, single packet free will use rte_pktmbuf_free API.
> >>>>>
> >>>>> Fixes: bd7c063561b3 ("examples/ipsec-secgw: use bulk free")
> >>>>>
> >>>>> Signed-off-by: Rahul Bhansali <rbhansali@marvell.com>
> >>>>> ---
> >>>>>  examples/ipsec-secgw/ipsec-secgw.h | 7 +++----
> >>>>>  1 file changed, 3 insertions(+), 4 deletions(-)
> >>>>>
> >>>>> diff --git a/examples/ipsec-secgw/ipsec-secgw.h
> >>>>> b/examples/ipsec-secgw/ipsec-secgw.h
> >>>>> index 8baab44ee7..ec33a982df 100644
> >>>>> --- a/examples/ipsec-secgw/ipsec-secgw.h
> >>>>> +++ b/examples/ipsec-secgw/ipsec-secgw.h
> >>>>> @@ -229,11 +229,10 @@ free_reassembly_fail_pkt(struct rte_mbuf
> >>>>> *mb) }
> >>>>>
> >>>>>  /* helper routine to free bulk of packets */ -static inline void
> >>>>> -free_pkts(struct rte_mbuf *mb[], uint32_t n)
> >>>>> +static __rte_always_inline void
> >>>>> +free_pkts(struct rte_mbuf *mb[], const uint32_t n)
> >>>>>  {
> >>>>> -	rte_pktmbuf_free_bulk(mb, n);
> >>>>> -
> >>>>> +	n == 1 ? rte_pktmbuf_free(mb[0]) : rte_pktmbuf_free_bulk(mb, n);
> >>>>>  	core_stats_update_drop(n);
> >>>>>  }
> >>>>>
> >>>>
> >>>> Hi Rahul,
> >>>>
> >>>> Do you think the 'rte_pktmbuf_free_bulk()' API performance can be
> >>>> improved by similar change?
> >>>
> >>> Hi Ferruh,
> >>> Currently 'rte_pktmbuf_free_bulk() is not inline. If we make that
> >>> along with
> >> __rte_pktmbuf_free_seg_via_array()  both inline then performance can
> >> be improved similar.
> >>>
> >>
> >> Ah, so performance improvement is coming from 'rte_pktmbuf_free()'
> >> being inline, OK.
> >>
> >> As you are doing performance testing in that area, can you please
> >> check if '__rte_pktmbuf_free_seg_via_array()' is inlined, as it is
> >> static function I expect it to be inlined. If not, can you please
> >> test with force inlining it (__rte_always_inline)?
> > It was not inline, did check with force inline also and no impact with this, so I
> can make it force inline.
> >
> 
> If there is no performance improvement, I think no need to force inline
> '__rte_pktmbuf_free_seg_via_array()'.
> 
> >>
> >>
> >> And I wonder if bulk() API may get single mbuf is a common theme,
> >> does it makes sense add a new inline wrapper to library to cover this
> >> case, if it is bringing ~4% improvement, like:
> >> ```
> >> static inline void
> >> rte_pktmbuf_free_bulk_or_one(... **mb, unsigned int n) {
> >> 	if (n == 1)
> >> 		return rte_pktmbuf_free(mb[0]);
> >> 	return rte_pktmbuf_free_bulk(mb, n); }
> > Agree, can make this wrapper to cover a case where bulk free API is
> > called but might have single mbuf to get better perf. It can be
> > further optimize " if (n == 1)" with compile time constant check, ```
> > static inline void rte_pktmbuf_free_bulk_or_one(struct rte_mbuf **mb,
> > unsigned int n) {
> >        if (__builtin_constant_p(n) && (n == 1))
> >                rte_pktmbuf_free(mb[0]);
> >        else
> >                rte_pktmbuf_free_bulk(mb, n); } ``` Let me know if it
> > is fine. I'll send v2. And, this will be " __rte_experimental" right ?
> >
> 
> Compile time constant check can prevent penalty from additional check, which is
> good, and I can see this can work for the examples/ipsec-secgw usecase above,
> which has some hardcoded single mbuf free calls.
> 
> But most of the other usecases I think 'n' won't be known in compile time, so API
> will be effectively same as free_bulk().
Agree.
> 
> If you have it with runtime check, do you still observe any performance
> improvement? If not perhaps we can go only with example code update, without
> new API.
With runtime check, performance improvement is small only in compare to compile time check. So can continue without this new API.
  
Akhil Goyal Feb. 29, 2024, 5:06 p.m. UTC | #7
> > Subject: Re: [EXT] Re: [PATCH] examples/ipsec-secgw: fix IPsec performance
> drop
> >
> > On 2/9/2024 1:10 PM, Rahul Bhansali wrote:
> > >
> > >
> > >> -----Original Message-----
> > >> From: Ferruh Yigit <ferruh.yigit@amd.com>
> > >> Sent: Wednesday, February 7, 2024 4:06 PM
> > >> To: Rahul Bhansali <rbhansali@marvell.com>; dev@dpdk.org; Radu
> > >> Nicolau <radu.nicolau@intel.com>; Akhil Goyal <gakhil@marvell.com>;
> > >> Konstantin Ananyev <konstantin.ananyev@huawei.com>; Anoob Joseph
> > >> <anoobj@marvell.com>
> > >> Subject: Re: [EXT] Re: [PATCH] examples/ipsec-secgw: fix IPsec
> > >> performance drop
> > >>
> > >> On 2/7/2024 6:46 AM, Rahul Bhansali wrote:
> > >>>
> > >>>
> > >>>> -----Original Message-----
> > >>>> From: Ferruh Yigit <ferruh.yigit@amd.com>
> > >>>> Sent: Tuesday, February 6, 2024 11:55 PM
> > >>>> To: Rahul Bhansali <rbhansali@marvell.com>; dev@dpdk.org; Radu
> > >>>> Nicolau <radu.nicolau@intel.com>; Akhil Goyal <gakhil@marvell.com>;
> > >>>> Konstantin Ananyev <konstantin.ananyev@huawei.com>; Anoob Joseph
> > >>>> <anoobj@marvell.com>
> > >>>> Subject: [EXT] Re: [PATCH] examples/ipsec-secgw: fix IPsec
> > >>>> performance drop
> > >>>>
> > >>>> External Email
> > >>>>
> > >>>> -------------------------------------------------------------------
> > >>>> --
> > >>>> - On 2/6/2024 12:38 PM, Rahul Bhansali wrote:
> > >>>>> Single packet free using rte_pktmbuf_free_bulk() is dropping the
> > >>>>> performance. On cn10k, maximum of ~4% drop observed for IPsec
> > >>>>> event mode single SA outbound case.
> > >>>>>
> > >>>>> To fix this issue, single packet free will use rte_pktmbuf_free API.
> > >>>>>
> > >>>>> Fixes: bd7c063561b3 ("examples/ipsec-secgw: use bulk free")
> > >>>>>
> > >>>>> Signed-off-by: Rahul Bhansali <rbhansali@marvell.com>
> > >>>>> ---
> > >>>>>  examples/ipsec-secgw/ipsec-secgw.h | 7 +++----
> > >>>>>  1 file changed, 3 insertions(+), 4 deletions(-)
> > >>>>>
> > >>>>> diff --git a/examples/ipsec-secgw/ipsec-secgw.h
> > >>>>> b/examples/ipsec-secgw/ipsec-secgw.h
> > >>>>> index 8baab44ee7..ec33a982df 100644
> > >>>>> --- a/examples/ipsec-secgw/ipsec-secgw.h
> > >>>>> +++ b/examples/ipsec-secgw/ipsec-secgw.h
> > >>>>> @@ -229,11 +229,10 @@ free_reassembly_fail_pkt(struct rte_mbuf
> > >>>>> *mb) }
> > >>>>>
> > >>>>>  /* helper routine to free bulk of packets */ -static inline void
> > >>>>> -free_pkts(struct rte_mbuf *mb[], uint32_t n)
> > >>>>> +static __rte_always_inline void
> > >>>>> +free_pkts(struct rte_mbuf *mb[], const uint32_t n)
> > >>>>>  {
> > >>>>> -	rte_pktmbuf_free_bulk(mb, n);
> > >>>>> -
> > >>>>> +	n == 1 ? rte_pktmbuf_free(mb[0]) : rte_pktmbuf_free_bulk(mb,
> n);
> > >>>>>  	core_stats_update_drop(n);
> > >>>>>  }
> > >>>>>
> > >>>>
> > >>>> Hi Rahul,
> > >>>>
> > >>>> Do you think the 'rte_pktmbuf_free_bulk()' API performance can be
> > >>>> improved by similar change?
> > >>>
> > >>> Hi Ferruh,
> > >>> Currently 'rte_pktmbuf_free_bulk() is not inline. If we make that
> > >>> along with
> > >> __rte_pktmbuf_free_seg_via_array()  both inline then performance can
> > >> be improved similar.
> > >>>
> > >>
> > >> Ah, so performance improvement is coming from 'rte_pktmbuf_free()'
> > >> being inline, OK.
> > >>
> > >> As you are doing performance testing in that area, can you please
> > >> check if '__rte_pktmbuf_free_seg_via_array()' is inlined, as it is
> > >> static function I expect it to be inlined. If not, can you please
> > >> test with force inlining it (__rte_always_inline)?
> > > It was not inline, did check with force inline also and no impact with this, so I
> > can make it force inline.
> > >
> >
> > If there is no performance improvement, I think no need to force inline
> > '__rte_pktmbuf_free_seg_via_array()'.
> >
> > >>
> > >>
> > >> And I wonder if bulk() API may get single mbuf is a common theme,
> > >> does it makes sense add a new inline wrapper to library to cover this
> > >> case, if it is bringing ~4% improvement, like:
> > >> ```
> > >> static inline void
> > >> rte_pktmbuf_free_bulk_or_one(... **mb, unsigned int n) {
> > >> 	if (n == 1)
> > >> 		return rte_pktmbuf_free(mb[0]);
> > >> 	return rte_pktmbuf_free_bulk(mb, n); }
> > > Agree, can make this wrapper to cover a case where bulk free API is
> > > called but might have single mbuf to get better perf. It can be
> > > further optimize " if (n == 1)" with compile time constant check, ```
> > > static inline void rte_pktmbuf_free_bulk_or_one(struct rte_mbuf **mb,
> > > unsigned int n) {
> > >        if (__builtin_constant_p(n) && (n == 1))
> > >                rte_pktmbuf_free(mb[0]);
> > >        else
> > >                rte_pktmbuf_free_bulk(mb, n); } ``` Let me know if it
> > > is fine. I'll send v2. And, this will be " __rte_experimental" right ?
> > >
> >
> > Compile time constant check can prevent penalty from additional check, which
> is
> > good, and I can see this can work for the examples/ipsec-secgw usecase above,
> > which has some hardcoded single mbuf free calls.
> >
> > But most of the other usecases I think 'n' won't be known in compile time, so
> API
> > will be effectively same as free_bulk().
> Agree.
> >
> > If you have it with runtime check, do you still observe any performance
> > improvement? If not perhaps we can go only with example code update,
> without
> > new API.
> With runtime check, performance improvement is small only in compare to
> compile time check. So can continue without this new API.

Acked-by: Akhil Goyal <gakhil@marvell.com>
Applied to dpdk-next-crypto
Thanks.
  

Patch

diff --git a/examples/ipsec-secgw/ipsec-secgw.h b/examples/ipsec-secgw/ipsec-secgw.h
index 8baab44ee7..ec33a982df 100644
--- a/examples/ipsec-secgw/ipsec-secgw.h
+++ b/examples/ipsec-secgw/ipsec-secgw.h
@@ -229,11 +229,10 @@  free_reassembly_fail_pkt(struct rte_mbuf *mb)
 }
 
 /* helper routine to free bulk of packets */
-static inline void
-free_pkts(struct rte_mbuf *mb[], uint32_t n)
+static __rte_always_inline void
+free_pkts(struct rte_mbuf *mb[], const uint32_t n)
 {
-	rte_pktmbuf_free_bulk(mb, n);
-
+	n == 1 ? rte_pktmbuf_free(mb[0]) : rte_pktmbuf_free_bulk(mb, n);
 	core_stats_update_drop(n);
 }