[v2] mbuf: add fast free bulk function

Message ID 20250114163951.125667-1-mb@smartsharesystems.com (mailing list archive)
State Superseded
Delegated to: Thomas Monjalon
Series: [v2] mbuf: add fast free bulk function

Checks

Context Check Description
ci/loongarch-compilation success Compilation OK
ci/checkpatch success coding style OK
ci/loongarch-unit-testing success Unit Testing PASS
ci/Intel-compilation success Compilation OK
ci/intel-Testing success Testing PASS
ci/github-robot: build success github build: passed
ci/intel-Functional success Functional PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/iol-mellanox-Performance success Performance Testing PASS
ci/iol-marvell-Functional success Functional Testing PASS
ci/iol-broadcom-Performance success Performance Testing PASS
ci/iol-intel-Functional success Functional Testing PASS
ci/iol-abi-testing success Testing PASS
ci/iol-unit-amd64-testing success Testing PASS
ci/iol-sample-apps-testing success Testing PASS
ci/iol-unit-arm64-testing success Testing PASS
ci/iol-compile-arm64-testing success Testing PASS
ci/iol-compile-amd64-testing success Testing PASS

Commit Message

Morten Brørup Jan. 14, 2025, 4:39 p.m. UTC
mbuf: add fast free bulk function

When putting an mbuf back into its mempool, certain requirements apply to
the mbuf. Specifically, some of its fields must be initialized.

These requirements are in fact invariants for free mbufs held in mempools,
and thus also apply when allocating an mbuf from a mempool.
With this in mind, the additional assertions in rte_mbuf_raw_free() were
moved to __rte_mbuf_raw_sanity_check().
Furthermore, the assertion regarding pinned external buffer was enhanced;
it now also asserts that the referenced pinned external buffer has
refcnt == 1.

The description of RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE was updated to
include the remaining requirements, which were previously missing from it.

And finally:
A new rte_mbuf_fast_free_bulk() inline function was added for the benefit
of ethdev drivers supporting fast release of mbufs.
It asserts these requirements and that the mbufs belong to the specified
mempool, and then calls rte_mempool_put_bulk().

Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
---
v2:
* Fixed missing inline.
---
 lib/ethdev/rte_ethdev.h |  6 ++++--
 lib/mbuf/rte_mbuf.h     | 39 +++++++++++++++++++++++++++++++++++++--
 2 files changed, 41 insertions(+), 4 deletions(-)
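For context, here is a minimal sketch of how a driver's Tx completion path could use the new function when the application has requested RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE. The txq structure, its field names and the fast_free flag are hypothetical illustrations, not part of the patch; only rte_mbuf_fast_free_bulk() and rte_pktmbuf_free_bulk() are existing/proposed API.

#include <stdbool.h>
#include <rte_mbuf.h>

/* Hypothetical Tx queue state, for illustration only. */
struct my_txq {
	struct rte_mempool *mp;      /* mempool pointer cached at queue setup */
	struct rte_mbuf *done[64];   /* mbufs whose transmission has completed */
	unsigned int nb_done;
	bool fast_free;              /* RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE enabled */
};

static void
my_txq_release_completed(struct my_txq *txq)
{
	if (txq->fast_free)
		/* The offload contract guarantees the mbufs are direct,
		 * have refcnt=1, next=NULL, nb_segs=1 and come from txq->mp. */
		rte_mbuf_fast_free_bulk(txq->mp, txq->done, txq->nb_done);
	else
		rte_pktmbuf_free_bulk(txq->done, txq->nb_done);
	txq->nb_done = 0;
}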
  

Comments

Stephen Hemminger Jan. 14, 2025, 5:39 p.m. UTC | #1
On Tue, 14 Jan 2025 16:39:51 +0000
Morten Brørup <mb@smartsharesystems.com> wrote:

> mbuf: add fast free bulk function
> 
> When putting an mbuf back into its mempool, there are certain requirements
> to the mbuf. Specifically, some of its fields must be initialized.
> 
> These requirements are in fact invariants about free mbufs, held in
> mempools, and thus also apply when allocating an mbuf from a mempool.
> With this in mind, the additional assertions in rte_mbuf_raw_free() were
> moved to __rte_mbuf_raw_sanity_check().
> Furthermore, the assertion regarding pinned external buffer was enhanced;
> it now also asserts that the referenced pinned external buffer has
> refcnt == 1.
> 
> The description of RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE was updated to
> include the remaining requirements, which were missing here.
> 
> And finally:
> A new rte_mbuf_fast_free_bulk() inline function was added for the benefit
> of ethdev drivers supporting fast release of mbufs.
> It asserts these requirements and that the mbufs belong to the specified
> mempool, and then calls rte_mempool_put_bulk().
> 
> Signed-off-by: Morten Brørup <mb@smartsharesystems.com>

Since it is a new function, it should be marked experimental for now.
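For reference, a sketch of what that annotation would roughly look like on the new inline helper, following the usual DPDK pattern; callers would then need to define ALLOW_EXPERIMENTAL_API. This is only an illustration, not part of the submitted patch.

__rte_experimental
static __rte_always_inline void
rte_mbuf_fast_free_bulk(struct rte_mempool *mp, struct rte_mbuf **mbufs,
		unsigned int count)
{
	/* body unchanged from the patch */
}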
  
huangdengdui Jan. 15, 2025, 6:52 a.m. UTC | #2
On 2025/1/15 0:39, Morten Brørup wrote:
> mbuf: add fast free bulk function
> 
> When putting an mbuf back into its mempool, there are certain requirements
> to the mbuf. Specifically, some of its fields must be initialized.
> 
> These requirements are in fact invariants about free mbufs, held in
> mempools, and thus also apply when allocating an mbuf from a mempool.
> With this in mind, the additional assertions in rte_mbuf_raw_free() were
> moved to __rte_mbuf_raw_sanity_check().
> Furthermore, the assertion regarding pinned external buffer was enhanced;
> it now also asserts that the referenced pinned external buffer has
> refcnt == 1.
> 
> The description of RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE was updated to
> include the remaining requirements, which were missing here.
> 
> And finally:
> A new rte_mbuf_fast_free_bulk() inline function was added for the benefit
> of ethdev drivers supporting fast release of mbufs.
> It asserts these requirements and that the mbufs belong to the specified
> mempool, and then calls rte_mempool_put_bulk().
> 
> Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> ---
> v2:
> * Fixed missing inline.
> ---
>  lib/ethdev/rte_ethdev.h |  6 ++++--
>  lib/mbuf/rte_mbuf.h     | 39 +++++++++++++++++++++++++++++++++++++--
>  2 files changed, 41 insertions(+), 4 deletions(-)
> 
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> index 1f71cad244..e9267fca79 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -1612,8 +1612,10 @@ struct rte_eth_conf {
>  #define RTE_ETH_TX_OFFLOAD_MULTI_SEGS       RTE_BIT64(15)
>  /**
>   * Device supports optimization for fast release of mbufs.
> - * When set application must guarantee that per-queue all mbufs comes from
> - * the same mempool and has refcnt = 1.
> + * When set application must guarantee that per-queue all mbufs come from the same mempool,
> + * are direct, have refcnt=1, next=NULL and nb_segs=1, as done by rte_pktmbuf_prefree_seg().
> + *
> + * @see rte_mbuf_fast_free_bulk()
>   */
>  #define RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE   RTE_BIT64(16)
>  #define RTE_ETH_TX_OFFLOAD_SECURITY         RTE_BIT64(17)
> diff --git a/lib/mbuf/rte_mbuf.h b/lib/mbuf/rte_mbuf.h
> index 0d2e0e64b3..7590d82689 100644
> --- a/lib/mbuf/rte_mbuf.h
> +++ b/lib/mbuf/rte_mbuf.h
> @@ -568,6 +568,10 @@ __rte_mbuf_raw_sanity_check(__rte_unused const struct rte_mbuf *m)
>  	RTE_ASSERT(rte_mbuf_refcnt_read(m) == 1);
>  	RTE_ASSERT(m->next == NULL);
>  	RTE_ASSERT(m->nb_segs == 1);
> +	RTE_ASSERT(!RTE_MBUF_CLONED(m));
> +	RTE_ASSERT(!RTE_MBUF_HAS_EXTBUF(m) ||
> +			(RTE_MBUF_HAS_PINNED_EXTBUF(m) &&
> +			rte_mbuf_ext_refcnt_read(m->shinfo) == 1));
>  	__rte_mbuf_sanity_check(m, 0);
>  }
>  
> @@ -623,12 +627,43 @@ static inline struct rte_mbuf *rte_mbuf_raw_alloc(struct rte_mempool *mp)
>  static __rte_always_inline void
>  rte_mbuf_raw_free(struct rte_mbuf *m)
>  {
> -	RTE_ASSERT(!RTE_MBUF_CLONED(m) &&
> -		  (!RTE_MBUF_HAS_EXTBUF(m) || RTE_MBUF_HAS_PINNED_EXTBUF(m)));
>  	__rte_mbuf_raw_sanity_check(m);
>  	rte_mempool_put(m->pool, m);
>  }
>  
> +/**
> + * Put a bulk of mbufs allocated from the same mempool back into the mempool.
> + *
> + * The caller must ensure that the mbufs come from the specified mempool,
> + * are direct and properly reinitialized (refcnt=1, next=NULL, nb_segs=1), as done by
> + * rte_pktmbuf_prefree_seg().
> + *
> + * This function should be used with care, when optimization is
> + * required. For standard needs, prefer rte_pktmbuf_free_bulk().
> + *
> + * @see RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE
> + *
> + * @param mp
> + *   The mempool to which the mbufs belong.
> + * @param mbufs
> + *   Array of pointers to packet mbufs.
> + *   The array must not contain NULL pointers.
> + * @param count
> + *   Array size.
> + */
> +static __rte_always_inline void
> +rte_mbuf_fast_free_bulk(struct rte_mempool *mp, struct rte_mbuf **mbufs, unsigned int count)
> +{
> +	for (unsigned int idx = 0; idx < count; idx++) {
> +		const struct rte_mbuf *m = mbufs[idx];
> +		RTE_ASSERT(m != NULL);
> +		RTE_ASSERT(m->pool == mp);
> +		__rte_mbuf_raw_sanity_check(m);
> +	}

Is there some way to avoid executing the loop in non-debug mode? Like the following, or is there a better way?

#ifdef RTE_LIBRTE_MBUF_DEBUG
	{
		for (unsigned int idx = 0; idx < count; idx++) {
			const struct rte_mbuf *m = mbufs[idx];
			RTE_ASSERT(m != NULL);
			RTE_ASSERT(m->pool == mp);
			__rte_mbuf_raw_sanity_check(m);
		}
	}
#endif

> +
> +	rte_mempool_put_bulk(mp, (void **)mbufs, count);

Can the mp be obtained from the mbuf?

> +}
> +
>  /**
>   * The packet mbuf constructor.
>   *
  
Morten Brørup Jan. 15, 2025, 9:38 a.m. UTC | #3
> From: huangdengdui [mailto:huangdengdui@huawei.com]
> Sent: Wednesday, 15 January 2025 07.52
> 
> On 2025/1/15 0:39, Morten Brørup wrote:
> > mbuf: add fast free bulk function
> >
> > When putting an mbuf back into its mempool, there are certain
> requirements
> > to the mbuf. Specifically, some of its fields must be initialized.
> >
> > These requirements are in fact invariants about free mbufs, held in
> > mempools, and thus also apply when allocating an mbuf from a mempool.
> > With this in mind, the additional assertions in rte_mbuf_raw_free()
> were
> > moved to __rte_mbuf_raw_sanity_check().
> > Furthermore, the assertion regarding pinned external buffer was
> enhanced;
> > it now also asserts that the referenced pinned external buffer has
> > refcnt == 1.
> >
> > The description of RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE was updated to
> > include the remaining requirements, which were missing here.
> >
> > And finally:
> > A new rte_mbuf_fast_free_bulk() inline function was added for the
> benefit
> > of ethdev drivers supporting fast release of mbufs.
> > It asserts these requirements and that the mbufs belong to the
> specified
> > mempool, and then calls rte_mempool_put_bulk().
> >
> > Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> > ---
> > v2:
> > * Fixed missing inline.
> > ---
> >  lib/ethdev/rte_ethdev.h |  6 ++++--
> >  lib/mbuf/rte_mbuf.h     | 39 +++++++++++++++++++++++++++++++++++++--
> >  2 files changed, 41 insertions(+), 4 deletions(-)
> >
> > diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> > index 1f71cad244..e9267fca79 100644
> > --- a/lib/ethdev/rte_ethdev.h
> > +++ b/lib/ethdev/rte_ethdev.h
> > @@ -1612,8 +1612,10 @@ struct rte_eth_conf {
> >  #define RTE_ETH_TX_OFFLOAD_MULTI_SEGS       RTE_BIT64(15)
> >  /**
> >   * Device supports optimization for fast release of mbufs.
> > - * When set application must guarantee that per-queue all mbufs
> comes from
> > - * the same mempool and has refcnt = 1.
> > + * When set application must guarantee that per-queue all mbufs come
> from the same mempool,
> > + * are direct, have refcnt=1, next=NULL and nb_segs=1, as done by
> rte_pktmbuf_prefree_seg().
> > + *
> > + * @see rte_mbuf_fast_free_bulk()
> >   */
> >  #define RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE   RTE_BIT64(16)
> >  #define RTE_ETH_TX_OFFLOAD_SECURITY         RTE_BIT64(17)
> > diff --git a/lib/mbuf/rte_mbuf.h b/lib/mbuf/rte_mbuf.h
> > index 0d2e0e64b3..7590d82689 100644
> > --- a/lib/mbuf/rte_mbuf.h
> > +++ b/lib/mbuf/rte_mbuf.h
> > @@ -568,6 +568,10 @@ __rte_mbuf_raw_sanity_check(__rte_unused const
> struct rte_mbuf *m)
> >  	RTE_ASSERT(rte_mbuf_refcnt_read(m) == 1);
> >  	RTE_ASSERT(m->next == NULL);
> >  	RTE_ASSERT(m->nb_segs == 1);
> > +	RTE_ASSERT(!RTE_MBUF_CLONED(m));
> > +	RTE_ASSERT(!RTE_MBUF_HAS_EXTBUF(m) ||
> > +			(RTE_MBUF_HAS_PINNED_EXTBUF(m) &&
> > +			rte_mbuf_ext_refcnt_read(m->shinfo) == 1));
> >  	__rte_mbuf_sanity_check(m, 0);
> >  }
> >
> > @@ -623,12 +627,43 @@ static inline struct rte_mbuf
> *rte_mbuf_raw_alloc(struct rte_mempool *mp)
> >  static __rte_always_inline void
> >  rte_mbuf_raw_free(struct rte_mbuf *m)
> >  {
> > -	RTE_ASSERT(!RTE_MBUF_CLONED(m) &&
> > -		  (!RTE_MBUF_HAS_EXTBUF(m) ||
> RTE_MBUF_HAS_PINNED_EXTBUF(m)));
> >  	__rte_mbuf_raw_sanity_check(m);
> >  	rte_mempool_put(m->pool, m);
> >  }
> >
> > +/**
> > + * Put a bulk of mbufs allocated from the same mempool back into the
> mempool.
> > + *
> > + * The caller must ensure that the mbufs come from the specified
> mempool,
> > + * are direct and properly reinitialized (refcnt=1, next=NULL,
> nb_segs=1), as done by
> > + * rte_pktmbuf_prefree_seg().
> > + *
> > + * This function should be used with care, when optimization is
> > + * required. For standard needs, prefer rte_pktmbuf_free_bulk().
> > + *
> > + * @see RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE
> > + *
> > + * @param mp
> > + *   The mempool to which the mbufs belong.
> > + * @param mbufs
> > + *   Array of pointers to packet mbufs.
> > + *   The array must not contain NULL pointers.
> > + * @param count
> > + *   Array size.
> > + */
> > +static __rte_always_inline void
> > +rte_mbuf_fast_free_bulk(struct rte_mempool *mp, struct rte_mbuf
> **mbufs, unsigned int count)
> > +{
> > +	for (unsigned int idx = 0; idx < count; idx++) {
> > +		const struct rte_mbuf *m = mbufs[idx];
> > +		RTE_ASSERT(m != NULL);
> > +		RTE_ASSERT(m->pool == mp);
> > +		__rte_mbuf_raw_sanity_check(m);
> > +	}
> 
> Is there some way to avoid executing a loop in non-debug mode? Like the
> following or other better way
> 
> #ifdef RTE_LIBRTE_MBUF_DEBUG
> 	{
> 		for (unsigned int idx = 0; idx < count; idx++) {
> 			const struct rte_mbuf *m = mbufs[idx];
> 			RTE_ASSERT(m != NULL);
> 			RTE_ASSERT(m->pool == mp);
> 			__rte_mbuf_raw_sanity_check(m);
> 		}
> 	}
> #endif

The loop is already omitted in non-debug mode:
RTE_ASSERT() [1] is omitted unless RTE_ENABLE_ASSERT is set.
__rte_mbuf_raw_sanity_check() [2] consists of some RTE_ASSERTs and __rte_mbuf_sanity_check().
__rte_mbuf_sanity_check() [3] is omitted unless RTE_LIBRTE_MBUF_DEBUG is set.

So the compiler will detect that the loop has no effect, and optimize away the loop.

[1]: https://elixir.bootlin.com/dpdk/v24.11.1/source/lib/eal/include/rte_debug.h#L46
[2]: https://elixir.bootlin.com/dpdk/v24.11.1/source/lib/mbuf/rte_mbuf.h#L566
[3]: https://elixir.bootlin.com/dpdk/v24.11.1/source/lib/mbuf/rte_mbuf.h#L348
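To make that concrete, in a default (non-debug) build the relevant macros reduce to roughly the following (paraphrased from [1]-[3], not a verbatim copy of the sources), so the loop body is empty and dead-code elimination removes the whole loop:

/* Approximate expansion without RTE_ENABLE_ASSERT / RTE_LIBRTE_MBUF_DEBUG: */
#define RTE_ASSERT(exp) do { } while (0)
#define __rte_mbuf_sanity_check(m, is_h) do { } while (0)

/* The loop in rte_mbuf_fast_free_bulk() then degenerates to: */
for (unsigned int idx = 0; idx < count; idx++) {
	const struct rte_mbuf *m = mbufs[idx];
	(void)m;	/* every check expanded to nothing */
}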

> 
> > +
> > +	rte_mempool_put_bulk(mp, (void **)mbufs, count);
> 
> Can the mp be obtained from the mbuf?

Passing "mp" as a parameter avoids a potential CPU cache miss by not dereferencing the first mbuf, if the driver/application already has the mempool pointer readily available (and hot in the CPU cache) from somewhere else.
If the driver/application doesn't have the mempool pointer readily available, it can obtain it from the mbuf when calling this function.

And as a bonus side effect, passing "mp" as a parameter allows calling the function with count=0 without special handling inside the function.

Obviously, if the driver/application gets "mp" from mbuf[0]->pool, it first needs to check that count>0; but that would be the situation for the driver/application whenever it accesses an mbuf array, regardless of what it is doing with that array.
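As a quick illustration of the two calling patterns described above (txq->mp is a hypothetical cached pointer, not from the patch):

/* Caller already has the mempool pointer, e.g. cached in its Tx queue: */
rte_mbuf_fast_free_bulk(txq->mp, mbufs, count);	/* fine even for count == 0 */

/* Caller derives it from the first mbuf, so it must check count first: */
if (count > 0)
	rte_mbuf_fast_free_bulk(mbufs[0]->pool, mbufs, count);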
  
huangdengdui Jan. 15, 2025, 12:14 p.m. UTC | #4
On 2025/1/15 17:38, Morten Brørup wrote:
>> From: huangdengdui [mailto:huangdengdui@huawei.com]
>> Sent: Wednesday, 15 January 2025 07.52
>>
>> On 2025/1/15 0:39, Morten Brørup wrote:
>>> mbuf: add fast free bulk function
>>>
>>> When putting an mbuf back into its mempool, there are certain
>> requirements
>>> to the mbuf. Specifically, some of its fields must be initialized.
>>>
>>> These requirements are in fact invariants about free mbufs, held in
>>> mempools, and thus also apply when allocating an mbuf from a mempool.
>>> With this in mind, the additional assertions in rte_mbuf_raw_free()
>> were
>>> moved to __rte_mbuf_raw_sanity_check().
>>> Furthermore, the assertion regarding pinned external buffer was
>> enhanced;
>>> it now also asserts that the referenced pinned external buffer has
>>> refcnt == 1.
>>>
>>> The description of RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE was updated to
>>> include the remaining requirements, which were missing here.
>>>
>>> And finally:
>>> A new rte_mbuf_fast_free_bulk() inline function was added for the
>> benefit
>>> of ethdev drivers supporting fast release of mbufs.
>>> It asserts these requirements and that the mbufs belong to the
>> specified
>>> mempool, and then calls rte_mempool_put_bulk().
>>>
>>> Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
>>> ---
>>> v2:
>>> * Fixed missing inline.
>>> ---
>>>  lib/ethdev/rte_ethdev.h |  6 ++++--
>>>  lib/mbuf/rte_mbuf.h     | 39 +++++++++++++++++++++++++++++++++++++--
>>>  2 files changed, 41 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
>>> index 1f71cad244..e9267fca79 100644
>>> --- a/lib/ethdev/rte_ethdev.h
>>> +++ b/lib/ethdev/rte_ethdev.h
>>> @@ -1612,8 +1612,10 @@ struct rte_eth_conf {
>>>  #define RTE_ETH_TX_OFFLOAD_MULTI_SEGS       RTE_BIT64(15)
>>>  /**
>>>   * Device supports optimization for fast release of mbufs.
>>> - * When set application must guarantee that per-queue all mbufs
>> comes from
>>> - * the same mempool and has refcnt = 1.
>>> + * When set application must guarantee that per-queue all mbufs come
>> from the same mempool,
>>> + * are direct, have refcnt=1, next=NULL and nb_segs=1, as done by
>> rte_pktmbuf_prefree_seg().
>>> + *
>>> + * @see rte_mbuf_fast_free_bulk()
>>>   */
>>>  #define RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE   RTE_BIT64(16)
>>>  #define RTE_ETH_TX_OFFLOAD_SECURITY         RTE_BIT64(17)
>>> diff --git a/lib/mbuf/rte_mbuf.h b/lib/mbuf/rte_mbuf.h
>>> index 0d2e0e64b3..7590d82689 100644
>>> --- a/lib/mbuf/rte_mbuf.h
>>> +++ b/lib/mbuf/rte_mbuf.h
>>> @@ -568,6 +568,10 @@ __rte_mbuf_raw_sanity_check(__rte_unused const
>> struct rte_mbuf *m)
>>>  	RTE_ASSERT(rte_mbuf_refcnt_read(m) == 1);
>>>  	RTE_ASSERT(m->next == NULL);
>>>  	RTE_ASSERT(m->nb_segs == 1);
>>> +	RTE_ASSERT(!RTE_MBUF_CLONED(m));
>>> +	RTE_ASSERT(!RTE_MBUF_HAS_EXTBUF(m) ||
>>> +			(RTE_MBUF_HAS_PINNED_EXTBUF(m) &&
>>> +			rte_mbuf_ext_refcnt_read(m->shinfo) == 1));
>>>  	__rte_mbuf_sanity_check(m, 0);
>>>  }
>>>
>>> @@ -623,12 +627,43 @@ static inline struct rte_mbuf
>> *rte_mbuf_raw_alloc(struct rte_mempool *mp)
>>>  static __rte_always_inline void
>>>  rte_mbuf_raw_free(struct rte_mbuf *m)
>>>  {
>>> -	RTE_ASSERT(!RTE_MBUF_CLONED(m) &&
>>> -		  (!RTE_MBUF_HAS_EXTBUF(m) ||
>> RTE_MBUF_HAS_PINNED_EXTBUF(m)));
>>>  	__rte_mbuf_raw_sanity_check(m);
>>>  	rte_mempool_put(m->pool, m);
>>>  }
>>>
>>> +/**
>>> + * Put a bulk of mbufs allocated from the same mempool back into the
>> mempool.
>>> + *
>>> + * The caller must ensure that the mbufs come from the specified
>> mempool,
>>> + * are direct and properly reinitialized (refcnt=1, next=NULL,
>> nb_segs=1), as done by
>>> + * rte_pktmbuf_prefree_seg().
>>> + *
>>> + * This function should be used with care, when optimization is
>>> + * required. For standard needs, prefer rte_pktmbuf_free_bulk().
>>> + *
>>> + * @see RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE
>>> + *
>>> + * @param mp
>>> + *   The mempool to which the mbufs belong.
>>> + * @param mbufs
>>> + *   Array of pointers to packet mbufs.
>>> + *   The array must not contain NULL pointers.
>>> + * @param count
>>> + *   Array size.
>>> + */
>>> +static __rte_always_inline void
>>> +rte_mbuf_fast_free_bulk(struct rte_mempool *mp, struct rte_mbuf
>> **mbufs, unsigned int count)
>>> +{
>>> +	for (unsigned int idx = 0; idx < count; idx++) {
>>> +		const struct rte_mbuf *m = mbufs[idx];
>>> +		RTE_ASSERT(m != NULL);
>>> +		RTE_ASSERT(m->pool == mp);
>>> +		__rte_mbuf_raw_sanity_check(m);
>>> +	}
>>
>> Is there some way to avoid executing a loop in non-debug mode? Like the
>> following or other better way
>>
>> #ifdef RTE_LIBRTE_MBUF_DEBUG
>> 	{
>> 		for (unsigned int idx = 0; idx < count; idx++) {
>> 			const struct rte_mbuf *m = mbufs[idx];
>> 			RTE_ASSERT(m != NULL);
>> 			RTE_ASSERT(m->pool == mp);
>> 			__rte_mbuf_raw_sanity_check(m);
>> 		}
>> 	}
>> #endif
> 
> The loop is already omitted in non-debug mode:
> RTE_ASSERT() [1] is omitted unless RTE_ENABLE_ASSERT is set.
> __rte_mbuf_raw_sanity_check() [2] consist of some RTE_ASSERTs and __rte_mbuf_sanity_check().
> __rte_mbuf_sanity_check() [3] is omitted unless RTE_LIBRTE_MBUF_DEBUG is set.
> 
> So the compiler will detect that the loop has no effect, and optimize away the loop.

Okay, you're right. I compiled the code, and this loop didn't generate any instructions once compiler optimization was enabled.

> 
> [1]: https://elixir.bootlin.com/dpdk/v24.11.1/source/lib/eal/include/rte_debug.h#L46
> [2]: https://elixir.bootlin.com/dpdk/v24.11.1/source/lib/mbuf/rte_mbuf.h#L566
> [3]: https://elixir.bootlin.com/dpdk/v24.11.1/source/lib/mbuf/rte_mbuf.h#L348
> 
>>
>>> +
>>> +	rte_mempool_put_bulk(mp, (void **)mbufs, count);
>>
>> Can the mp be obtained from the mbuf?
> 
> Passing "mp" as a parameter avoids a potential CPU cache miss by not dereferencing the first mbuf, if the driver/application already has the mempool pointer readily available (and hot in the CPU cache) from somewhere else.
> If the driver/or application doesn't have the mempool pointer readily available, it can obtain it from the mbuf when calling this function.
> 
> And as a bonus side effect, passing "mp" as a parameter allows calling the function with count=0 without special handling inside the function.
> 
> Obviously, if the driver/application gets "mp" from mbuf[0]->pool, it needs to first check that count>0; but that would be the situation for the driver/application whenever it accesses an mbuf array, regardless what it is doing with that array.
> 

Okay, it's better this way.

Acked-by: huangdengdui@huawei.com
  
Stephen Hemminger Jan. 16, 2025, 3:50 a.m. UTC | #5
On Wed, 15 Jan 2025 14:52:16 +0800
huangdengdui <huangdengdui@huawei.com> wrote:

> On 2025/1/15 0:39, Morten Brørup wrote:
> > mbuf: add fast free bulk function
> > 
> > When putting an mbuf back into its mempool, there are certain requirements
> > to the mbuf. Specifically, some of its fields must be initialized.
> > 
> > These requirements are in fact invariants about free mbufs, held in
> > mempools, and thus also apply when allocating an mbuf from a mempool.
> > With this in mind, the additional assertions in rte_mbuf_raw_free() were
> > moved to __rte_mbuf_raw_sanity_check().
> > Furthermore, the assertion regarding pinned external buffer was enhanced;
> > it now also asserts that the referenced pinned external buffer has
> > refcnt == 1.
> > 
> > The description of RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE was updated to
> > include the remaining requirements, which were missing here.
> > 
> > And finally:
> > A new rte_mbuf_fast_free_bulk() inline function was added for the benefit
> > of ethdev drivers supporting fast release of mbufs.
> > It asserts these requirements and that the mbufs belong to the specified
> > mempool, and then calls rte_mempool_put_bulk().
> > 
> > Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> > ---
> > v2:
> > * Fixed missing inline.
> > ---
> >  lib/ethdev/rte_ethdev.h |  6 ++++--
> >  lib/mbuf/rte_mbuf.h     | 39 +++++++++++++++++++++++++++++++++++++--
> >  2 files changed, 41 insertions(+), 4 deletions(-)
> > 
> > diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> > index 1f71cad244..e9267fca79 100644
> > --- a/lib/ethdev/rte_ethdev.h
> > +++ b/lib/ethdev/rte_ethdev.h
> > @@ -1612,8 +1612,10 @@ struct rte_eth_conf {
> >  #define RTE_ETH_TX_OFFLOAD_MULTI_SEGS       RTE_BIT64(15)
> >  /**
> >   * Device supports optimization for fast release of mbufs.
> > - * When set application must guarantee that per-queue all mbufs comes from
> > - * the same mempool and has refcnt = 1.
> > + * When set application must guarantee that per-queue all mbufs come from the same mempool,
> > + * are direct, have refcnt=1, next=NULL and nb_segs=1, as done by rte_pktmbuf_prefree_seg().
> > + *
> > + * @see rte_mbuf_fast_free_bulk()
> >   */
> >  #define RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE   RTE_BIT64(16)
> >  #define RTE_ETH_TX_OFFLOAD_SECURITY         RTE_BIT64(17)
> > diff --git a/lib/mbuf/rte_mbuf.h b/lib/mbuf/rte_mbuf.h
> > index 0d2e0e64b3..7590d82689 100644
> > --- a/lib/mbuf/rte_mbuf.h
> > +++ b/lib/mbuf/rte_mbuf.h
> > @@ -568,6 +568,10 @@ __rte_mbuf_raw_sanity_check(__rte_unused const struct rte_mbuf *m)
> >  	RTE_ASSERT(rte_mbuf_refcnt_read(m) == 1);
> >  	RTE_ASSERT(m->next == NULL);
> >  	RTE_ASSERT(m->nb_segs == 1);
> > +	RTE_ASSERT(!RTE_MBUF_CLONED(m));
> > +	RTE_ASSERT(!RTE_MBUF_HAS_EXTBUF(m) ||
> > +			(RTE_MBUF_HAS_PINNED_EXTBUF(m) &&
> > +			rte_mbuf_ext_refcnt_read(m->shinfo) == 1));
> >  	__rte_mbuf_sanity_check(m, 0);
> >  }
> >  
> > @@ -623,12 +627,43 @@ static inline struct rte_mbuf *rte_mbuf_raw_alloc(struct rte_mempool *mp)
> >  static __rte_always_inline void
> >  rte_mbuf_raw_free(struct rte_mbuf *m)
> >  {
> > -	RTE_ASSERT(!RTE_MBUF_CLONED(m) &&
> > -		  (!RTE_MBUF_HAS_EXTBUF(m) || RTE_MBUF_HAS_PINNED_EXTBUF(m)));
> >  	__rte_mbuf_raw_sanity_check(m);
> >  	rte_mempool_put(m->pool, m);
> >  }
> >  
> > +/**
> > + * Put a bulk of mbufs allocated from the same mempool back into the mempool.
> > + *
> > + * The caller must ensure that the mbufs come from the specified mempool,
> > + * are direct and properly reinitialized (refcnt=1, next=NULL, nb_segs=1), as done by
> > + * rte_pktmbuf_prefree_seg().
> > + *
> > + * This function should be used with care, when optimization is
> > + * required. For standard needs, prefer rte_pktmbuf_free_bulk().
> > + *
> > + * @see RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE
> > + *
> > + * @param mp
> > + *   The mempool to which the mbufs belong.
> > + * @param mbufs
> > + *   Array of pointers to packet mbufs.
> > + *   The array must not contain NULL pointers.
> > + * @param count
> > + *   Array size.
> > + */
> > +static __rte_always_inline void
> > +rte_mbuf_fast_free_bulk(struct rte_mempool *mp, struct rte_mbuf **mbufs, unsigned int count)
> > +{
> > +	for (unsigned int idx = 0; idx < count; idx++) {
> > +		const struct rte_mbuf *m = mbufs[idx];
> > +		RTE_ASSERT(m != NULL);
> > +		RTE_ASSERT(m->pool == mp);
> > +		__rte_mbuf_raw_sanity_check(m);
> > +	}  
> 
> Is there some way to avoid executing a loop in non-debug mode? Like the following or other better way
> 
> #ifdef RTE_LIBRTE_MBUF_DEBUG
> 	{
> 		for (unsigned int idx = 0; idx < count; idx++) {
> 			const struct rte_mbuf *m = mbufs[idx];
> 			RTE_ASSERT(m != NULL);
> 			RTE_ASSERT(m->pool == mp);
> 			__rte_mbuf_raw_sanity_check(m);
> 		}
> 	}
> #endif

I suspect the compiler will optimize it away to nothing and drop the loop.
Use godbolt to see.

> > +
> > +	rte_mempool_put_bulk(mp, (void **)mbufs, count);  
> 
> Can the mp be obtained from the mbuf?

Yes, see rte_pktmbuf_free().

> 
> > +}
> > +
> >  /**
> >   * The packet mbuf constructor.
> >   *
  

Patch

diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index 1f71cad244..e9267fca79 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -1612,8 +1612,10 @@  struct rte_eth_conf {
 #define RTE_ETH_TX_OFFLOAD_MULTI_SEGS       RTE_BIT64(15)
 /**
  * Device supports optimization for fast release of mbufs.
- * When set application must guarantee that per-queue all mbufs comes from
- * the same mempool and has refcnt = 1.
+ * When set application must guarantee that per-queue all mbufs come from the same mempool,
+ * are direct, have refcnt=1, next=NULL and nb_segs=1, as done by rte_pktmbuf_prefree_seg().
+ *
+ * @see rte_mbuf_fast_free_bulk()
  */
 #define RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE   RTE_BIT64(16)
 #define RTE_ETH_TX_OFFLOAD_SECURITY         RTE_BIT64(17)
diff --git a/lib/mbuf/rte_mbuf.h b/lib/mbuf/rte_mbuf.h
index 0d2e0e64b3..7590d82689 100644
--- a/lib/mbuf/rte_mbuf.h
+++ b/lib/mbuf/rte_mbuf.h
@@ -568,6 +568,10 @@  __rte_mbuf_raw_sanity_check(__rte_unused const struct rte_mbuf *m)
 	RTE_ASSERT(rte_mbuf_refcnt_read(m) == 1);
 	RTE_ASSERT(m->next == NULL);
 	RTE_ASSERT(m->nb_segs == 1);
+	RTE_ASSERT(!RTE_MBUF_CLONED(m));
+	RTE_ASSERT(!RTE_MBUF_HAS_EXTBUF(m) ||
+			(RTE_MBUF_HAS_PINNED_EXTBUF(m) &&
+			rte_mbuf_ext_refcnt_read(m->shinfo) == 1));
 	__rte_mbuf_sanity_check(m, 0);
 }
 
@@ -623,12 +627,43 @@  static inline struct rte_mbuf *rte_mbuf_raw_alloc(struct rte_mempool *mp)
 static __rte_always_inline void
 rte_mbuf_raw_free(struct rte_mbuf *m)
 {
-	RTE_ASSERT(!RTE_MBUF_CLONED(m) &&
-		  (!RTE_MBUF_HAS_EXTBUF(m) || RTE_MBUF_HAS_PINNED_EXTBUF(m)));
 	__rte_mbuf_raw_sanity_check(m);
 	rte_mempool_put(m->pool, m);
 }
 
+/**
+ * Put a bulk of mbufs allocated from the same mempool back into the mempool.
+ *
+ * The caller must ensure that the mbufs come from the specified mempool,
+ * are direct and properly reinitialized (refcnt=1, next=NULL, nb_segs=1), as done by
+ * rte_pktmbuf_prefree_seg().
+ *
+ * This function should be used with care, when optimization is
+ * required. For standard needs, prefer rte_pktmbuf_free_bulk().
+ *
+ * @see RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE
+ *
+ * @param mp
+ *   The mempool to which the mbufs belong.
+ * @param mbufs
+ *   Array of pointers to packet mbufs.
+ *   The array must not contain NULL pointers.
+ * @param count
+ *   Array size.
+ */
+static __rte_always_inline void
+rte_mbuf_fast_free_bulk(struct rte_mempool *mp, struct rte_mbuf **mbufs, unsigned int count)
+{
+	for (unsigned int idx = 0; idx < count; idx++) {
+		const struct rte_mbuf *m = mbufs[idx];
+		RTE_ASSERT(m != NULL);
+		RTE_ASSERT(m->pool == mp);
+		__rte_mbuf_raw_sanity_check(m);
+	}
+
+	rte_mempool_put_bulk(mp, (void **)mbufs, count);
+}
+
 /**
  * The packet mbuf constructor.
  *