[1/2] net/i40e: replace put function

Message ID 20230209062501.142828-1-kamalakshitha.aligeri@arm.com (mailing list archive)
State Superseded, archived
Delegated to: David Marchand
Headers
Series [1/2] net/i40e: replace put function |

Checks

Context Check Description
ci/checkpatch warning coding style issues
ci/iol-testing fail build patch failure

Commit Message

Kamalakshitha Aligeri Feb. 9, 2023, 6:25 a.m. UTC
  Integrated the zero-copy put API of the mempool cache into the i40e PMD.
On an Ampere Altra server, l3fwd single-core performance improves by 5%
with the new API.

Signed-off-by: Kamalakshitha Aligeri <kamalakshitha.aligeri@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Feifei Wang <feifei.wang2@arm.com>
---
Link: https://patchwork.dpdk.org/project/dpdk/patch/20221227151700.80887-1-mb@smartsharesystems.com/

 .mailmap                                |  1 +
 drivers/net/i40e/i40e_rxtx_vec_common.h | 34 ++++++++++++++++++++-----
 2 files changed, 28 insertions(+), 7 deletions(-)

--
2.25.1
  

Comments

Morten Brørup Feb. 9, 2023, 9:34 a.m. UTC | #1
> From: Kamalakshitha Aligeri [mailto:kamalakshitha.aligeri@arm.com]
> Sent: Thursday, 9 February 2023 07.25
> 
> Integrated zero-copy put API in mempool cache in i40e PMD.
> On Ampere Altra server, l3fwd single core's performance improves by 5%
> with the new API
> 
> Signed-off-by: Kamalakshitha Aligeri <kamalakshitha.aligeri@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> Reviewed-by: Feifei Wang <feifei.wang2@arm.com>
> ---
> Link:
> https://patchwork.dpdk.org/project/dpdk/patch/20221227151700.80887-1-
> mb@smartsharesystems.com/
> 
>  .mailmap                                |  1 +
>  drivers/net/i40e/i40e_rxtx_vec_common.h | 34 ++++++++++++++++++++-----
>  2 files changed, 28 insertions(+), 7 deletions(-)
> 
> diff --git a/.mailmap b/.mailmap
> index 75884b6fe2..05a42edbcf 100644
> --- a/.mailmap
> +++ b/.mailmap
> @@ -670,6 +670,7 @@ Kai Ji <kai.ji@intel.com>
>  Kaiwen Deng <kaiwenx.deng@intel.com>
>  Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
>  Kamalakannan R <kamalakannan.r@intel.com>
> +Kamalakshitha Aligeri <kamalakshitha.aligeri@arm.com>
>  Kamil Bednarczyk <kamil.bednarczyk@intel.com>
>  Kamil Chalupnik <kamilx.chalupnik@intel.com>
>  Kamil Rytarowski <kamil.rytarowski@caviumnetworks.com>
> diff --git a/drivers/net/i40e/i40e_rxtx_vec_common.h
> b/drivers/net/i40e/i40e_rxtx_vec_common.h
> index fe1a6ec75e..80d4a159e6 100644
> --- a/drivers/net/i40e/i40e_rxtx_vec_common.h
> +++ b/drivers/net/i40e/i40e_rxtx_vec_common.h
> @@ -95,17 +95,37 @@ i40e_tx_free_bufs(struct i40e_tx_queue *txq)
> 
>  	n = txq->tx_rs_thresh;
> 
> -	 /* first buffer to free from S/W ring is at index
> -	  * tx_next_dd - (tx_rs_thresh-1)
> -	  */
> +	/* first buffer to free from S/W ring is at index
> +	 * tx_next_dd - (tx_rs_thresh-1)
> +	 */
>  	txep = &txq->sw_ring[txq->tx_next_dd - (n - 1)];
> 
>  	if (txq->offloads & RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE) {
> -		for (i = 0; i < n; i++) {
> -			free[i] = txep[i].mbuf;
> -			/* no need to reset txep[i].mbuf in vector path */
> +		struct rte_mempool *mp = txep[0].mbuf->pool;
> +		struct rte_mempool_cache *cache = rte_mempool_default_cache(mp, rte_lcore_id());
> +
> +		if (!cache || n > RTE_MEMPOOL_CACHE_MAX_SIZE) {

If the mempool has a cache, do not compare n to RTE_MEMPOOL_CACHE_MAX_SIZE. Instead, call rte_mempool_cache_zc_put_bulk() to determine if n is acceptable for zero-copy.

It looks like this patch behaves incorrectly if the cache is configured to be smaller than RTE_MEMPOOL_CACHE_MAX_SIZE. Let's say the cache size is 8, which will make the flush threshold 12. If n is 32, your code will not enter this branch, but proceed to call rte_mempool_cache_zc_put_bulk(), which will return NULL, and then you will goto done.

Obviously, if there is no cache, fall back to the standard rte_mempool_put_bulk().

> +			for (i = 0; i < n ; i++)
> +				free[i] = txep[i].mbuf;
> +			if (!cache) {
> +				rte_mempool_generic_put(mp, (void **)free, n, cache);
> +				goto done;
> +			}
> +			if (n > RTE_MEMPOOL_CACHE_MAX_SIZE) {
> +				rte_mempool_ops_enqueue_bulk(mp, (void **)free, n);
> +				goto done;
> +			}
> +		}
> +		void **cache_objs;
> +
> +		cache_objs = rte_mempool_cache_zc_put_bulk(cache, mp, n);
> +		if (cache_objs) {
> +			for (i = 0; i < n; i++) {
> +				cache_objs[i] = txep->mbuf;
> +				/* no need to reset txep[i].mbuf in vector path */
> +				txep++;
> +			}
>  		}
> -		rte_mempool_put_bulk(free[0]->pool, (void **)free, n);
>  		goto done;
>  	}
> 
> --
> 2.25.1
>
  
Feifei Wang Feb. 9, 2023, 10:58 a.m. UTC | #2
Hi, Morten

> -----Original Message-----
> From: Morten Brørup <mb@smartsharesystems.com>
> Sent: Thursday, February 9, 2023 5:34 PM
> To: Kamalakshitha Aligeri <Kamalakshitha.Aligeri@arm.com>;
> Yuying.Zhang@intel.com; beilei.xing@intel.com; olivier.matz@6wind.com;
> andrew.rybchenko@oktetlabs.ru; bruce.richardson@intel.com;
> konstantin.ananyev@huawei.com; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>
> Cc: dev@dpdk.org; nd <nd@arm.com>; Ruifeng Wang
> <Ruifeng.Wang@arm.com>; Feifei Wang <Feifei.Wang2@arm.com>
> Subject: RE: [PATCH 1/2] net/i40e: replace put function
> 
> > From: Kamalakshitha Aligeri [mailto:kamalakshitha.aligeri@arm.com]
> > Sent: Thursday, 9 February 2023 07.25
> >
> > Integrated zero-copy put API in mempool cache in i40e PMD.
> > On Ampere Altra server, l3fwd single core's performance improves by 5%
> > with the new API
> >
> > Signed-off-by: Kamalakshitha Aligeri <kamalakshitha.aligeri@arm.com>
> > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > Reviewed-by: Feifei Wang <feifei.wang2@arm.com>
> > ---
> > Link:
> > https://patchwork.dpdk.org/project/dpdk/patch/20221227151700.80887-1-
> > mb@smartsharesystems.com/
> >
> >  .mailmap                                |  1 +
> >  drivers/net/i40e/i40e_rxtx_vec_common.h | 34
> > ++++++++++++++++++++-----
> >  2 files changed, 28 insertions(+), 7 deletions(-)
> >
> > diff --git a/.mailmap b/.mailmap
> > index 75884b6fe2..05a42edbcf 100644
> > --- a/.mailmap
> > +++ b/.mailmap
> > @@ -670,6 +670,7 @@ Kai Ji <kai.ji@intel.com>  Kaiwen Deng
> > <kaiwenx.deng@intel.com>  Kalesh AP
> > <kalesh-anakkur.purayil@broadcom.com>
> >  Kamalakannan R <kamalakannan.r@intel.com>
> > +Kamalakshitha Aligeri <kamalakshitha.aligeri@arm.com>
> >  Kamil Bednarczyk <kamil.bednarczyk@intel.com>  Kamil Chalupnik
> > <kamilx.chalupnik@intel.com>  Kamil Rytarowski
> > <kamil.rytarowski@caviumnetworks.com>
> > diff --git a/drivers/net/i40e/i40e_rxtx_vec_common.h
> > b/drivers/net/i40e/i40e_rxtx_vec_common.h
> > index fe1a6ec75e..80d4a159e6 100644
> > --- a/drivers/net/i40e/i40e_rxtx_vec_common.h
> > +++ b/drivers/net/i40e/i40e_rxtx_vec_common.h
> > @@ -95,17 +95,37 @@ i40e_tx_free_bufs(struct i40e_tx_queue *txq)
> >
> >  	n = txq->tx_rs_thresh;
> >
> > -	 /* first buffer to free from S/W ring is at index
> > -	  * tx_next_dd - (tx_rs_thresh-1)
> > -	  */
> > +	/* first buffer to free from S/W ring is at index
> > +	 * tx_next_dd - (tx_rs_thresh-1)
> > +	 */
> >  	txep = &txq->sw_ring[txq->tx_next_dd - (n - 1)];
> >
> >  	if (txq->offloads & RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE) {
> > -		for (i = 0; i < n; i++) {
> > -			free[i] = txep[i].mbuf;
> > -			/* no need to reset txep[i].mbuf in vector path */
> > +		struct rte_mempool *mp = txep[0].mbuf->pool;
> > +		struct rte_mempool_cache *cache =
> > rte_mempool_default_cache(mp, rte_lcore_id());
> > +
> > +		if (!cache || n > RTE_MEMPOOL_CACHE_MAX_SIZE) {
> 
> If the mempool has a cache, do not compare n to
> RTE_MEMPOOL_CACHE_MAX_SIZE. Instead, call
> rte_mempool_cache_zc_put_bulk() to determine if n is acceptable for zero-
> copy.
> 

> It looks like this patch behaves incorrectly if the cache is configured to be
> smaller than RTE_MEMPOOL_CACHE_MAX_SIZE. Let's say the cache size is 8,
> which will make the flush threshold 12. If n is 32, your code will not enter this
> branch, but proceed to call rte_mempool_cache_zc_put_bulk(), which will
> return NULL, and then you will goto done.
> 
> Obviously, if there is no cache, fall back to the standard
> rte_mempool_put_bulk().

Agree with this. I think we ignored the case where (cache->flushthresh < n < RTE_MEMPOOL_CACHE_MAX_SIZE).

Our goal is that if (!cache || n > cache->flushthresh), we can put the buffers
into the mempool directly.

Thus maybe we can change it as follows:
struct rte_mempool_cache *cache = rte_mempool_default_cache(mp, rte_lcore_id());
if (!cache || n > cache->flushthresh) {
	for (i = 0; i < n; i++)
		free[i] = txep[i].mbuf;
	if (!cache) {
		rte_mempool_generic_put(mp, (void **)free, n, cache);
		goto done;
	} else {
		rte_mempool_ops_enqueue_bulk(mp, (void **)free, n);
		goto done;
	}
}

Can we change it like this?

> 
> > +			for (i = 0; i < n ; i++)
> > +				free[i] = txep[i].mbuf;
> > +			if (!cache) {
> > +				rte_mempool_generic_put(mp, (void
> **)free, n,
> > cache);
> > +				goto done;
> > +			}
> > +			if (n > RTE_MEMPOOL_CACHE_MAX_SIZE) {
> > +				rte_mempool_ops_enqueue_bulk(mp, (void
> **)free,
> > n);
> > +				goto done;
> > +			}
> > +		}
> > +		void **cache_objs;
> > +
> > +		cache_objs = rte_mempool_cache_zc_put_bulk(cache, mp,
> n);
> > +		if (cache_objs) {
> > +			for (i = 0; i < n; i++) {
> > +				cache_objs[i] = txep->mbuf;
> > +				/* no need to reset txep[i].mbuf in vector
> path
> > */
> > +				txep++;
> > +			}
> >  		}
> > -		rte_mempool_put_bulk(free[0]->pool, (void **)free, n);
> >  		goto done;
> >  	}
> >
> > --
> > 2.25.1
> >
  
Morten Brørup Feb. 9, 2023, 11:30 a.m. UTC | #3
> From: Feifei Wang [mailto:Feifei.Wang2@arm.com]
> Sent: Thursday, 9 February 2023 11.59
> 
> Hi, Morten
> 
> > From: Morten Brørup <mb@smartsharesystems.com>
> > Sent: Thursday, February 9, 2023 5:34 PM
> >
> > > From: Kamalakshitha Aligeri [mailto:kamalakshitha.aligeri@arm.com]
> > > Sent: Thursday, 9 February 2023 07.25
> > >
> > > Integrated zero-copy put API in mempool cache in i40e PMD.
> > > On Ampere Altra server, l3fwd single core's performance improves by
> 5%
> > > with the new API
> > >
> > > Signed-off-by: Kamalakshitha Aligeri
> <kamalakshitha.aligeri@arm.com>
> > > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > Reviewed-by: Feifei Wang <feifei.wang2@arm.com>
> > > ---
> > > Link:
> > > https://patchwork.dpdk.org/project/dpdk/patch/20221227151700.80887-
> 1-
> > > mb@smartsharesystems.com/
> > >
> > >  .mailmap                                |  1 +
> > >  drivers/net/i40e/i40e_rxtx_vec_common.h | 34
> > > ++++++++++++++++++++-----
> > >  2 files changed, 28 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/.mailmap b/.mailmap
> > > index 75884b6fe2..05a42edbcf 100644
> > > --- a/.mailmap
> > > +++ b/.mailmap
> > > @@ -670,6 +670,7 @@ Kai Ji <kai.ji@intel.com>  Kaiwen Deng
> > > <kaiwenx.deng@intel.com>  Kalesh AP
> > > <kalesh-anakkur.purayil@broadcom.com>
> > >  Kamalakannan R <kamalakannan.r@intel.com>
> > > +Kamalakshitha Aligeri <kamalakshitha.aligeri@arm.com>
> > >  Kamil Bednarczyk <kamil.bednarczyk@intel.com>  Kamil Chalupnik
> > > <kamilx.chalupnik@intel.com>  Kamil Rytarowski
> > > <kamil.rytarowski@caviumnetworks.com>
> > > diff --git a/drivers/net/i40e/i40e_rxtx_vec_common.h
> > > b/drivers/net/i40e/i40e_rxtx_vec_common.h
> > > index fe1a6ec75e..80d4a159e6 100644
> > > --- a/drivers/net/i40e/i40e_rxtx_vec_common.h
> > > +++ b/drivers/net/i40e/i40e_rxtx_vec_common.h
> > > @@ -95,17 +95,37 @@ i40e_tx_free_bufs(struct i40e_tx_queue *txq)
> > >
> > >  	n = txq->tx_rs_thresh;
> > >
> > > -	 /* first buffer to free from S/W ring is at index
> > > -	  * tx_next_dd - (tx_rs_thresh-1)
> > > -	  */
> > > +	/* first buffer to free from S/W ring is at index
> > > +	 * tx_next_dd - (tx_rs_thresh-1)
> > > +	 */
> > >  	txep = &txq->sw_ring[txq->tx_next_dd - (n - 1)];
> > >
> > >  	if (txq->offloads & RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE) {
> > > -		for (i = 0; i < n; i++) {
> > > -			free[i] = txep[i].mbuf;
> > > -			/* no need to reset txep[i].mbuf in vector path */
> > > +		struct rte_mempool *mp = txep[0].mbuf->pool;
> > > +		struct rte_mempool_cache *cache =
> > > rte_mempool_default_cache(mp, rte_lcore_id());
> > > +
> > > +		if (!cache || n > RTE_MEMPOOL_CACHE_MAX_SIZE) {
> >
> > If the mempool has a cache, do not compare n to
> > RTE_MEMPOOL_CACHE_MAX_SIZE. Instead, call
> > rte_mempool_cache_zc_put_bulk() to determine if n is acceptable for
> zero-
> > copy.
> >
> 
> > It looks like this patch behaves incorrectly if the cache is
> configured to be
> > smaller than RTE_MEMPOOL_CACHE_MAX_SIZE. Let's say the cache size is
> 8,
> > which will make the flush threshold 12. If n is 32, your code will
> not enter this
> > branch, but proceed to call rte_mempool_cache_zc_put_bulk(), which
> will
> > return NULL, and then you will goto done.
> >
> > Obviously, if there is no cache, fall back to the standard
> > rte_mempool_put_bulk().
> 
> Agree with this. I think we ignore the case that (cache -> flushthresh
> < n <  RTE_MEMPOOL_CACHE_MAX_SIZE).
> 
> Our goal is that if (!cache || n > cache -> flushthresh), we can put
> the buffers
> into mempool directly.
> 
> Thus maybe we can change as:
> struct rte_mempool_cache *cache = rte_mempool_default_cache(mp,
> rte_lcore_id());
> if (!cache || n > cache -> flushthresh) {
>       for (i = 0; i < n ; i++)
>           free[i] = txep[i].mbuf;
>       if (!cache) {
>                 rte_mempool_generic_put;
>                 goto done;
>       } else if {
>                 rte_mempool_ops_enqueue_bulk;
>                 goto done;
>       }
> }
> 
> If we can change like this?

Since I consider "flushthreshold" private to the cache structure, it shouldn't be accessed directly. If its meaning changes in the future, you will have to rewrite the PMD code again. Use the mempool API instead of accessing the mempool structures directly. (Yeah, I know the mempool and mempool cache structures are not marked as internal, and thus formally public, but I still dislike accessing their internals from outside the mempool library.)

I would change to something like:

struct rte_mempool_cache *cache;
void **cache_objs;

cache = rte_mempool_default_cache(mp, rte_lcore_id());
if (unlikely(cache == NULL))
	goto fallback;

/* Try zero-copy put. */
cache_objs = rte_mempool_cache_zc_put_bulk(cache, mp, n);
if (unlikely(cache_objs == NULL))
	goto fallback;

/* Zero-copy put. */
/* no need to reset txep[i].mbuf in vector path */
for (i = 0; i < n; i++)
	cache_objs[i] = txep[i].mbuf;
goto done;

fallback:
/* Ordinary put. */
/* no need to reset txep[i].mbuf in vector path */
for (i = 0; i < n ; i++)
	free[i] = txep[i].mbuf;
rte_mempool_generic_put(mp, free, n, cache);
goto done;


> 
> >
> > > +			for (i = 0; i < n ; i++)
> > > +				free[i] = txep[i].mbuf;
> > > +			if (!cache) {
> > > +				rte_mempool_generic_put(mp, (void
> > **)free, n,
> > > cache);
> > > +				goto done;
> > > +			}
> > > +			if (n > RTE_MEMPOOL_CACHE_MAX_SIZE) {
> > > +				rte_mempool_ops_enqueue_bulk(mp, (void
> > **)free,
> > > n);
> > > +				goto done;
> > > +			}
> > > +		}
> > > +		void **cache_objs;
> > > +
> > > +		cache_objs = rte_mempool_cache_zc_put_bulk(cache, mp,
> > n);
> > > +		if (cache_objs) {
> > > +			for (i = 0; i < n; i++) {
> > > +				cache_objs[i] = txep->mbuf;
> > > +				/* no need to reset txep[i].mbuf in vector
> > path
> > > */
> > > +				txep++;
> > > +			}
> > >  		}
> > > -		rte_mempool_put_bulk(free[0]->pool, (void **)free, n);
> > >  		goto done;
> > >  	}
> > >
> > > --
> > > 2.25.1
> > >
  
Feifei Wang Feb. 10, 2023, 2:43 a.m. UTC | #4
> -----Original Message-----
> From: Morten Brørup <mb@smartsharesystems.com>
> Sent: Thursday, February 9, 2023 7:31 PM
> To: Feifei Wang <Feifei.Wang2@arm.com>; Kamalakshitha Aligeri
> <Kamalakshitha.Aligeri@arm.com>; Yuying.Zhang@intel.com;
> beilei.xing@intel.com; olivier.matz@6wind.com;
> andrew.rybchenko@oktetlabs.ru; bruce.richardson@intel.com;
> konstantin.ananyev@huawei.com; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>
> Cc: dev@dpdk.org; nd <nd@arm.com>; Ruifeng Wang
> <Ruifeng.Wang@arm.com>; nd <nd@arm.com>
> Subject: RE: [PATCH 1/2] net/i40e: replace put function
> 
> > From: Feifei Wang [mailto:Feifei.Wang2@arm.com]
> > Sent: Thursday, 9 February 2023 11.59
> >
> > Hi, Morten
> >
> > > From: Morten Brørup <mb@smartsharesystems.com>
> > > Sent: Thursday, February 9, 2023 5:34 PM
> > >
> > > > From: Kamalakshitha Aligeri [mailto:kamalakshitha.aligeri@arm.com]
> > > > Sent: Thursday, 9 February 2023 07.25
> > > >
> > > > Integrated zero-copy put API in mempool cache in i40e PMD.
> > > > On Ampere Altra server, l3fwd single core's performance improves
> > > > by
> > 5%
> > > > with the new API
> > > >
> > > > Signed-off-by: Kamalakshitha Aligeri
> > <kamalakshitha.aligeri@arm.com>
> > > > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > Reviewed-by: Feifei Wang <feifei.wang2@arm.com>
> > > > ---
> > > > Link:
> > > >
> https://patchwork.dpdk.org/project/dpdk/patch/20221227151700.80887
> > > > -
> > 1-
> > > > mb@smartsharesystems.com/
> > > >
> > > >  .mailmap                                |  1 +
> > > >  drivers/net/i40e/i40e_rxtx_vec_common.h | 34
> > > > ++++++++++++++++++++-----
> > > >  2 files changed, 28 insertions(+), 7 deletions(-)
> > > >
> > > > diff --git a/.mailmap b/.mailmap
> > > > index 75884b6fe2..05a42edbcf 100644
> > > > --- a/.mailmap
> > > > +++ b/.mailmap
> > > > @@ -670,6 +670,7 @@ Kai Ji <kai.ji@intel.com>  Kaiwen Deng
> > > > <kaiwenx.deng@intel.com>  Kalesh AP
> > > > <kalesh-anakkur.purayil@broadcom.com>
> > > >  Kamalakannan R <kamalakannan.r@intel.com>
> > > > +Kamalakshitha Aligeri <kamalakshitha.aligeri@arm.com>
> > > >  Kamil Bednarczyk <kamil.bednarczyk@intel.com>  Kamil Chalupnik
> > > > <kamilx.chalupnik@intel.com>  Kamil Rytarowski
> > > > <kamil.rytarowski@caviumnetworks.com>
> > > > diff --git a/drivers/net/i40e/i40e_rxtx_vec_common.h
> > > > b/drivers/net/i40e/i40e_rxtx_vec_common.h
> > > > index fe1a6ec75e..80d4a159e6 100644
> > > > --- a/drivers/net/i40e/i40e_rxtx_vec_common.h
> > > > +++ b/drivers/net/i40e/i40e_rxtx_vec_common.h
> > > > @@ -95,17 +95,37 @@ i40e_tx_free_bufs(struct i40e_tx_queue *txq)
> > > >
> > > >  	n = txq->tx_rs_thresh;
> > > >
> > > > -	 /* first buffer to free from S/W ring is at index
> > > > -	  * tx_next_dd - (tx_rs_thresh-1)
> > > > -	  */
> > > > +	/* first buffer to free from S/W ring is at index
> > > > +	 * tx_next_dd - (tx_rs_thresh-1)
> > > > +	 */
> > > >  	txep = &txq->sw_ring[txq->tx_next_dd - (n - 1)];
> > > >
> > > >  	if (txq->offloads & RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE) {
> > > > -		for (i = 0; i < n; i++) {
> > > > -			free[i] = txep[i].mbuf;
> > > > -			/* no need to reset txep[i].mbuf in vector path */
> > > > +		struct rte_mempool *mp = txep[0].mbuf->pool;
> > > > +		struct rte_mempool_cache *cache =
> > > > rte_mempool_default_cache(mp, rte_lcore_id());
> > > > +
> > > > +		if (!cache || n > RTE_MEMPOOL_CACHE_MAX_SIZE) {
> > >
> > > If the mempool has a cache, do not compare n to
> > > RTE_MEMPOOL_CACHE_MAX_SIZE. Instead, call
> > > rte_mempool_cache_zc_put_bulk() to determine if n is acceptable for
> > zero-
> > > copy.
> > >
> >
> > > It looks like this patch behaves incorrectly if the cache is
> > configured to be
> > > smaller than RTE_MEMPOOL_CACHE_MAX_SIZE. Let's say the cache size
> is
> > 8,
> > > which will make the flush threshold 12. If n is 32, your code will
> > not enter this
> > > branch, but proceed to call rte_mempool_cache_zc_put_bulk(), which
> > will
> > > return NULL, and then you will goto done.
> > >
> > > Obviously, if there is no cache, fall back to the standard
> > > rte_mempool_put_bulk().
> >
> > Agree with this. I think we ignore the case that (cache -> flushthresh
> > < n <  RTE_MEMPOOL_CACHE_MAX_SIZE).
> >
> > Our goal is that if (!cache || n > cache -> flushthresh), we can put
> > the buffers into mempool directly.
> >
> > Thus maybe we can change as:
> > struct rte_mempool_cache *cache = rte_mempool_default_cache(mp,
> > rte_lcore_id()); if (!cache || n > cache -> flushthresh) {
> >       for (i = 0; i < n ; i++)
> >           free[i] = txep[i].mbuf;
> >       if (!cache) {
> >                 rte_mempool_generic_put;
> >                 goto done;
> >       } else if {
> >                 rte_mempool_ops_enqueue_bulk;
> >                 goto done;
> >       }
> > }
> >
> > If we can change like this?
> 
> Since I consider "flushthreshold" private to the cache structure, it shouldn't
> be accessed directly. If its meaning changes in the future, you will have to
> rewrite the PMD code again.
Ok, agreed. Accessing private variables of the cache needs to be treated with
caution.

> Use the mempool API instead of accessing the
> mempool structures directly. (Yeah, I know the mempool and mempool
> cache structures are not marked as internal, and thus formally public, but I
> still dislike accessing their internals from outside the mempool library.)
> 
> I would change to something like:
> 
> struct rte_mempool_cache *cache;
> void **cache_objs;
> 
> cache = rte_mempool_default_cache(mp, rte_lcore_id());
> if (unlikely(cache == NULL))
> 	goto fallback;
> 
> /* Try zero-copy put. */
> cache_objs = rte_mempool_cache_zc_put_bulk(cache, mp, n);
> if (unlikely(cache_objs == NULL))
> 	goto fallback;
> 
> /* Zero-copy put. */
> /* no need to reset txep[i].mbuf in vector path */
> for (i = 0; i < n; i++)
> 	cache_objs[i] = txep[i].mbuf;
> goto done;
> 
> fallback:
> /* Ordinary put. */
> /* no need to reset txep[i].mbuf in vector path */ 
This comment should be deleted, because here we need to reset txep[i].mbuf.

> for (i = 0; i < n ; i++)
> 	free[i] = txep[i].mbuf;
> rte_mempool_generic_put(mp, free, n, cache);
> goto done;
> 
> 
Agree with this change; just some minor comments on the notes in the 'fallback' path.

> >
> > >
> > > > +			for (i = 0; i < n ; i++)
> > > > +				free[i] = txep[i].mbuf;
> > > > +			if (!cache) {
> > > > +				rte_mempool_generic_put(mp, (void
> > > **)free, n,
> > > > cache);
> > > > +				goto done;
> > > > +			}
> > > > +			if (n > RTE_MEMPOOL_CACHE_MAX_SIZE) {
> > > > +				rte_mempool_ops_enqueue_bulk(mp, (void
> > > **)free,
> > > > n);
> > > > +				goto done;
> > > > +			}
> > > > +		}
> > > > +		void **cache_objs;
> > > > +
> > > > +		cache_objs = rte_mempool_cache_zc_put_bulk(cache, mp,
> > > n);
> > > > +		if (cache_objs) {
> > > > +			for (i = 0; i < n; i++) {
> > > > +				cache_objs[i] = txep->mbuf;
> > > > +				/* no need to reset txep[i].mbuf in vector
> > > path
> > > > */
> > > > +				txep++;
> > > > +			}
> > > >  		}
> > > > -		rte_mempool_put_bulk(free[0]->pool, (void **)free, n);
> > > >  		goto done;
> > > >  	}
> > > >
> > > > --
> > > > 2.25.1
> > > >
  

Patch

diff --git a/.mailmap b/.mailmap
index 75884b6fe2..05a42edbcf 100644
--- a/.mailmap
+++ b/.mailmap
@@ -670,6 +670,7 @@  Kai Ji <kai.ji@intel.com>
 Kaiwen Deng <kaiwenx.deng@intel.com>
 Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
 Kamalakannan R <kamalakannan.r@intel.com>
+Kamalakshitha Aligeri <kamalakshitha.aligeri@arm.com>
 Kamil Bednarczyk <kamil.bednarczyk@intel.com>
 Kamil Chalupnik <kamilx.chalupnik@intel.com>
 Kamil Rytarowski <kamil.rytarowski@caviumnetworks.com>
diff --git a/drivers/net/i40e/i40e_rxtx_vec_common.h b/drivers/net/i40e/i40e_rxtx_vec_common.h
index fe1a6ec75e..80d4a159e6 100644
--- a/drivers/net/i40e/i40e_rxtx_vec_common.h
+++ b/drivers/net/i40e/i40e_rxtx_vec_common.h
@@ -95,17 +95,37 @@  i40e_tx_free_bufs(struct i40e_tx_queue *txq)

 	n = txq->tx_rs_thresh;

-	 /* first buffer to free from S/W ring is at index
-	  * tx_next_dd - (tx_rs_thresh-1)
-	  */
+	/* first buffer to free from S/W ring is at index
+	 * tx_next_dd - (tx_rs_thresh-1)
+	 */
 	txep = &txq->sw_ring[txq->tx_next_dd - (n - 1)];

 	if (txq->offloads & RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE) {
-		for (i = 0; i < n; i++) {
-			free[i] = txep[i].mbuf;
-			/* no need to reset txep[i].mbuf in vector path */
+		struct rte_mempool *mp = txep[0].mbuf->pool;
+		struct rte_mempool_cache *cache = rte_mempool_default_cache(mp, rte_lcore_id());
+
+		if (!cache || n > RTE_MEMPOOL_CACHE_MAX_SIZE) {
+			for (i = 0; i < n ; i++)
+				free[i] = txep[i].mbuf;
+			if (!cache) {
+				rte_mempool_generic_put(mp, (void **)free, n, cache);
+				goto done;
+			}
+			if (n > RTE_MEMPOOL_CACHE_MAX_SIZE) {
+				rte_mempool_ops_enqueue_bulk(mp, (void **)free, n);
+				goto done;
+			}
+		}
+		void **cache_objs;
+
+		cache_objs = rte_mempool_cache_zc_put_bulk(cache, mp, n);
+		if (cache_objs) {
+			for (i = 0; i < n; i++) {
+				cache_objs[i] = txep->mbuf;
+				/* no need to reset txep[i].mbuf in vector path */
+				txep++;
+			}
 		}
-		rte_mempool_put_bulk(free[0]->pool, (void **)free, n);
 		goto done;
 	}