examples/ipsec-secgw: fix IPsec performance drop
Commit Message
Freeing a single packet via rte_pktmbuf_free_bulk() degrades
performance. On cn10k, a drop of up to ~4% is observed for the IPsec
event mode single SA outbound case.
To fix this, free single packets with the rte_pktmbuf_free() API
instead.
Fixes: bd7c063561b3 ("examples/ipsec-secgw: use bulk free")
Signed-off-by: Rahul Bhansali <rbhansali@marvell.com>
---
examples/ipsec-secgw/ipsec-secgw.h | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
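For reference, the helper after this change (taken from the diff at the bottom of this page) reads:
```c
/* helper routine to free bulk of packets */
static __rte_always_inline void
free_pkts(struct rte_mbuf *mb[], const uint32_t n)
{
	n == 1 ? rte_pktmbuf_free(mb[0]) : rte_pktmbuf_free_bulk(mb, n);
	core_stats_update_drop(n);
}
```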
Comments
On 2/6/2024 12:38 PM, Rahul Bhansali wrote:
> Single packet free using rte_pktmbuf_free_bulk() is dropping the
> performance. On cn10k, maximum of ~4% drop observed for IPsec
> event mode single SA outbound case.
>
> To fix this issue, single packet free will use rte_pktmbuf_free
> API.
>
> Fixes: bd7c063561b3 ("examples/ipsec-secgw: use bulk free")
>
> Signed-off-by: Rahul Bhansali <rbhansali@marvell.com>
> ---
> examples/ipsec-secgw/ipsec-secgw.h | 7 +++----
> 1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/examples/ipsec-secgw/ipsec-secgw.h b/examples/ipsec-secgw/ipsec-secgw.h
> index 8baab44ee7..ec33a982df 100644
> --- a/examples/ipsec-secgw/ipsec-secgw.h
> +++ b/examples/ipsec-secgw/ipsec-secgw.h
> @@ -229,11 +229,10 @@ free_reassembly_fail_pkt(struct rte_mbuf *mb)
>  }
> 
>  /* helper routine to free bulk of packets */
> -static inline void
> -free_pkts(struct rte_mbuf *mb[], uint32_t n)
> +static __rte_always_inline void
> +free_pkts(struct rte_mbuf *mb[], const uint32_t n)
>  {
> -	rte_pktmbuf_free_bulk(mb, n);
> -
> +	n == 1 ? rte_pktmbuf_free(mb[0]) : rte_pktmbuf_free_bulk(mb, n);
>  	core_stats_update_drop(n);
>  }
>
Hi Rahul,
Do you think the 'rte_pktmbuf_free_bulk()' API performance can be
improved by a similar change?
Hi Ferruh,
Currently 'rte_pktmbuf_free_bulk()' is not inline. If we make both it and __rte_pktmbuf_free_seg_via_array() inline, then performance can be improved similarly.
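To illustrate the inlining point: rte_pktmbuf_free() is a static inline in rte_mbuf.h, while rte_pktmbuf_free_bulk() is an out-of-line call. A minimal sketch (not an existing DPDK API; the helper name is made up) of a bulk free built only on the inline API:
```c
#include <rte_common.h>
#include <rte_mbuf.h>

/*
 * Sketch only: an always-inlined bulk free built purely on the inline
 * rte_pktmbuf_free().  It avoids the out-of-line call that
 * rte_pktmbuf_free_bulk() costs for small counts, but does not get the
 * batched mempool return that the real bulk API provides for large bursts.
 */
static __rte_always_inline void
pktmbuf_free_inline_sketch(struct rte_mbuf **mb, unsigned int n)
{
	unsigned int i;

	for (i = 0; i < n; i++)
		rte_pktmbuf_free(mb[i]);
}
```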
On 2/7/2024 6:46 AM, Rahul Bhansali wrote:
> Hi Ferruh,
> Currently 'rte_pktmbuf_free_bulk() is not inline. If we make that along with __rte_pktmbuf_free_seg_via_array() both inline then performance can be improved similar.
>
Ah, so the performance improvement comes from 'rte_pktmbuf_free()' being
inline, OK.
As you are doing performance testing in that area, can you please check
whether '__rte_pktmbuf_free_seg_via_array()' is inlined? As it is a static
function, I expect it to be. If not, can you please test with force
inlining it (__rte_always_inline)?
And if passing a single mbuf to the bulk() API is a common theme, I wonder
whether it makes sense to add a new inline wrapper to the library to cover
this case, given it brings ~4% improvement, like:
```
static inline void
rte_pktmbuf_free_bulk_or_one(... **mb, unsigned int n)
{
	if (n == 1)
		return rte_pktmbuf_free(mb[0]);
	return rte_pktmbuf_free_bulk(mb, n);
}
```
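Assuming such a wrapper existed (it is only a suggestion above, not a current DPDK API), the example's helper could simply delegate to it; a sketch:
```c
#include <rte_common.h>
#include <rte_mbuf.h>

/* Sketch: the example's free_pkts() helper rewritten on top of the proposed
 * (hypothetical) rte_pktmbuf_free_bulk_or_one() wrapper.
 * core_stats_update_drop() is the example's existing stats helper. */
static __rte_always_inline void
free_pkts(struct rte_mbuf *mb[], const uint32_t n)
{
	rte_pktmbuf_free_bulk_or_one(mb, n);
	core_stats_update_drop(n);
}
```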
> Ah, so performance improvement is coming from 'rte_pktmbuf_free()' being
> inline, OK.
>
> As you are doing performance testing in that area, can you please check if
> '__rte_pktmbuf_free_seg_via_array()' is inlined, as it is static function I expect it
> to be inlined. If not, can you please test with force inlining it
> (__rte_always_inline)?
It was not inlined; I also checked with force inline, and there was no impact, so I can make it force inline.
>
>
> And I wonder if bulk() API may get single mbuf is a common theme, does it makes
> sense add a new inline wrapper to library to cover this case, if it is bringing ~4%
> improvement, like:
> ```
> static inline void
> rte_pktmbuf_free_bulk_or_one(... **mb, unsigned int n) {
> if (n == 1)
> return rte_pktmbuf_free(mb[0]);
> return rte_pktmbuf_free_bulk(mb, n);
> }
Agree, we can add this wrapper to cover the case where the bulk free API is called but may receive a single mbuf, to get better performance. The "if (n == 1)" check can be further optimized with a compile-time constant check:
```
static inline void
rte_pktmbuf_free_bulk_or_one(struct rte_mbuf **mb, unsigned int n)
{
	if (__builtin_constant_p(n) && (n == 1))
		rte_pktmbuf_free(mb[0]);
	else
		rte_pktmbuf_free_bulk(mb, n);
}
```
Let me know if this is fine; I'll send a v2. And this will be "__rte_experimental", right?
> ```
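For illustration, hypothetical call sites for the wrapper sketched above (assuming GCC/Clang __builtin_constant_p() semantics): a literal count folds the branch at compile time to the inline single-mbuf path, while a runtime count always takes the bulk path.
```c
#include <rte_mbuf.h>

/* Hypothetical call sites using the rte_pktmbuf_free_bulk_or_one() wrapper
 * sketched above (not an existing DPDK API). */
static inline void
drop_one(struct rte_mbuf *mb)
{
	/* n is the literal 1: __builtin_constant_p(n) is true, so the branch
	 * folds at compile time to the inline rte_pktmbuf_free(mb). */
	rte_pktmbuf_free_bulk_or_one(&mb, 1);
}

static inline void
drop_burst(struct rte_mbuf **pkts, unsigned int nb_rx)
{
	/* n is known only at run time: the wrapper behaves exactly like
	 * rte_pktmbuf_free_bulk(). */
	rte_pktmbuf_free_bulk_or_one(pkts, nb_rx);
}
```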
On 2/9/2024 1:10 PM, Rahul Bhansali wrote:
> It was not inline, did check with force inline also and no impact with this, so I can make it force inline.
>
If there is no performance improvement, I think there is no need to force
inline '__rte_pktmbuf_free_seg_via_array()'.
> Agree, can make this wrapper to cover a case where bulk free API is called but might have single mbuf to get better perf. It can be further optimize " if (n == 1)" with compile time constant check,
> ```
> static inline void
> rte_pktmbuf_free_bulk_or_one(struct rte_mbuf **mb, unsigned int n)
> {
> if (__builtin_constant_p(n) && (n == 1))
> rte_pktmbuf_free(mb[0]);
> else
> rte_pktmbuf_free_bulk(mb, n);
> }
> ```
> Let me know if it is fine. I'll send v2. And, this will be " __rte_experimental" right ?
>
A compile-time constant check avoids the penalty of the additional check,
which is good, and I can see it working for the examples/ipsec-secgw
use case above, which has some hardcoded single-mbuf free calls.
But in most other use cases I think 'n' won't be known at compile time,
so the API will effectively be the same as free_bulk().
With only a runtime check, do you still observe any performance
improvement? If not, perhaps we can go with just the example code update,
without a new API.
> Compile time constant check can prevent penalty from additional check, which is
> good, and I can see this can work for the examples/ipsec-secgw usecase above,
> which has some hardcoded single mbuf free calls.
>
> But most of the other usecases I think 'n' won't be known in compile time, so API
> will be effectively same as free_bulk().
Agree.
>
> If you have it with runtime check, do you still observe any performance
> improvement? If not perhaps we can go only with example code update, without
> new API.
With a runtime check, the performance improvement is only small compared to the compile-time check, so we can continue without this new API.
Acked-by: Akhil Goyal <gakhil@marvell.com>
Applied to dpdk-next-crypto
Thanks.
@@ -229,11 +229,10 @@ free_reassembly_fail_pkt(struct rte_mbuf *mb)
 }
 
 /* helper routine to free bulk of packets */
-static inline void
-free_pkts(struct rte_mbuf *mb[], uint32_t n)
+static __rte_always_inline void
+free_pkts(struct rte_mbuf *mb[], const uint32_t n)
 {
-	rte_pktmbuf_free_bulk(mb, n);
-
+	n == 1 ? rte_pktmbuf_free(mb[0]) : rte_pktmbuf_free_bulk(mb, n);
 	core_stats_update_drop(n);
 }