[1/8] app/test-crypto-perf: improve dequeue logic

Message ID 20220425041423.2232034-1-g.singh@nxp.com (mailing list archive)
State Changes Requested, archived
Delegated to: akhil goyal
Headers
Series [1/8] app/test-crypto-perf: improve dequeue logic |

Checks

Context Check Description
ci/checkpatch success coding style OK

Commit Message

Gagandeep Singh April 25, 2022, 4:14 a.m. UTC
  Issue more dequeue commands if the gap between enqueued
and dequeued packets is more than burst size *8

Signed-off-by: Gagandeep Singh <g.singh@nxp.com>
---
 app/test-crypto-perf/cperf_test_throughput.c | 42 +++++++++++---------
 1 file changed, 23 insertions(+), 19 deletions(-)
  

Comments

Akhil Goyal May 13, 2022, 9:46 a.m. UTC | #1
Hi Gagan,
> Issue more dequeue commands if the gap between enqueued
> and dequeued packets is more than burst size *8
> 
> Signed-off-by: Gagandeep Singh <g.singh@nxp.com>
> ---
Why is this change required? What gain are we getting?
I see a performance drop due to this patch.

>  app/test-crypto-perf/cperf_test_throughput.c | 42 +++++++++++---------
>  1 file changed, 23 insertions(+), 19 deletions(-)
> 
> diff --git a/app/test-crypto-perf/cperf_test_throughput.c b/app/test-crypto-
> perf/cperf_test_throughput.c
> index cecf30e470..5cd8919c91 100644
> --- a/app/test-crypto-perf/cperf_test_throughput.c
> +++ b/app/test-crypto-perf/cperf_test_throughput.c
> @@ -223,26 +223,30 @@ cperf_throughput_test_runner(void *test_ctx)
>  			ops_unused = burst_size - ops_enqd;
>  			ops_enqd_total += ops_enqd;
> 
> -
>  			/* Dequeue processed burst of ops from crypto device
> */
> -			ops_deqd = rte_cryptodev_dequeue_burst(ctx->dev_id,
> ctx->qp_id,
> -					ops_processed, test_burst_size);
> -
> -			if (likely(ops_deqd))  {
> -				/* Free crypto ops so they can be reused. */
> -				rte_mempool_put_bulk(ctx->pool,
> -						(void **)ops_processed,
> ops_deqd);
> -
> -				ops_deqd_total += ops_deqd;
> -			} else {
> -				/**
> -				 * Count dequeue polls which didn't return any
> -				 * processed operations. This statistic is mainly
> -				 * relevant to hw accelerators.
> -				 */
> -				ops_deqd_failed++;
> -			}
> -
> +			do {
> +				ops_deqd = rte_cryptodev_dequeue_burst(
> +						ctx->dev_id, ctx->qp_id,
> +						ops_processed,
> test_burst_size);
> +
> +				if (likely(ops_deqd))  {
> +					/* Free crypto ops for reuse */
> +					rte_mempool_put_bulk(ctx->pool,
> +							(void
> **)ops_processed,
> +							ops_deqd);
> +
> +					ops_deqd_total += ops_deqd;
> +				} else {
> +					/**
> +					 * Count dequeue polls which didn't
> +					 * return any processed operations.
> +					 * This statistic is mainly relevant
> +					 * to hw accelerators.
> +					 */
> +					ops_deqd_failed++;
> +				}
> +			} while (ops_enqd_total - ops_deqd_total >
> +					test_burst_size * 8);
>  		}
> 
>  		/* Dequeue any operations still in the crypto device */
> --
> 2.25.1
  
Gagandeep Singh May 16, 2022, 7:14 a.m. UTC | #2
Hi

> -----Original Message-----
> From: Akhil Goyal <gakhil@marvell.com>
> Sent: Friday, May 13, 2022 3:17 PM
> To: Gagandeep Singh <G.Singh@nxp.com>; dev@dpdk.org; Hemant Agrawal
> <hemant.agrawal@nxp.com>
> Cc: Anoob Joseph <anoobj@marvell.com>
> Subject: RE: [EXT] [PATCH 1/8] app/test-crypto-perf: improve dequeue logic
> 
> Hi Gagan,
> > Issue more dequeue commands if the gap between enqueued
> > and dequeued packets is more than burst size *8
> >
> > Signed-off-by: Gagandeep Singh <g.singh@nxp.com>
> > ---
> Why is this change required? What gain are we getting?
> I see a performance drop due to this patch.

Issue is, in case if security engine/driver is slow in processing the
Jobs especially for larger packet sizes then in that case application
will keep enqueuing packets with higher rate than dequeue which
may results in buffer pool exhaustion.
Application has option to increase pool size but that may not be
Helpful for the platforms those with memory constraints.

We can work on limiting the enqueue side instead of keeping the dequeue
to avoid any performance drop due to any empty dequeue.

Dropping this patch from this series. I will update the logic and
will try to send as separate patch.

> 
> >  app/test-crypto-perf/cperf_test_throughput.c | 42 +++++++++++---------
> >  1 file changed, 23 insertions(+), 19 deletions(-)
> >
> > diff --git a/app/test-crypto-perf/cperf_test_throughput.c b/app/test-crypto-
> > perf/cperf_test_throughput.c
> > index cecf30e470..5cd8919c91 100644
> > --- a/app/test-crypto-perf/cperf_test_throughput.c
> > +++ b/app/test-crypto-perf/cperf_test_throughput.c
> > @@ -223,26 +223,30 @@ cperf_throughput_test_runner(void *test_ctx)
> >  			ops_unused = burst_size - ops_enqd;
> >  			ops_enqd_total += ops_enqd;
> >
> > -
> >  			/* Dequeue processed burst of ops from crypto device
> > */
> > -			ops_deqd = rte_cryptodev_dequeue_burst(ctx->dev_id,
> > ctx->qp_id,
> > -					ops_processed, test_burst_size);
> > -
> > -			if (likely(ops_deqd))  {
> > -				/* Free crypto ops so they can be reused. */
> > -				rte_mempool_put_bulk(ctx->pool,
> > -						(void **)ops_processed,
> > ops_deqd);
> > -
> > -				ops_deqd_total += ops_deqd;
> > -			} else {
> > -				/**
> > -				 * Count dequeue polls which didn't return any
> > -				 * processed operations. This statistic is mainly
> > -				 * relevant to hw accelerators.
> > -				 */
> > -				ops_deqd_failed++;
> > -			}
> > -
> > +			do {
> > +				ops_deqd = rte_cryptodev_dequeue_burst(
> > +						ctx->dev_id, ctx->qp_id,
> > +						ops_processed,
> > test_burst_size);
> > +
> > +				if (likely(ops_deqd))  {
> > +					/* Free crypto ops for reuse */
> > +					rte_mempool_put_bulk(ctx->pool,
> > +							(void
> > **)ops_processed,
> > +							ops_deqd);
> > +
> > +					ops_deqd_total += ops_deqd;
> > +				} else {
> > +					/**
> > +					 * Count dequeue polls which didn't
> > +					 * return any processed operations.
> > +					 * This statistic is mainly relevant
> > +					 * to hw accelerators.
> > +					 */
> > +					ops_deqd_failed++;
> > +				}
> > +			} while (ops_enqd_total - ops_deqd_total >
> > +					test_burst_size * 8);
> >  		}
> >
> >  		/* Dequeue any operations still in the crypto device */
> > --
> > 2.25.1
  
Anoob Joseph May 16, 2022, 7:26 a.m. UTC | #3
Hi Gagandeep,

Please see inline.

Thanks,
Anoob

> -----Original Message-----
> From: Gagandeep Singh <G.Singh@nxp.com>
> Sent: Monday, May 16, 2022 12:44 PM
> To: Akhil Goyal <gakhil@marvell.com>; dev@dpdk.org; Hemant Agrawal
> <hemant.agrawal@nxp.com>
> Cc: Anoob Joseph <anoobj@marvell.com>
> Subject: RE: [EXT] [PATCH 1/8] app/test-crypto-perf: improve dequeue logic
> 
> Hi
> 
> > -----Original Message-----
> > From: Akhil Goyal <gakhil@marvell.com>
> > Sent: Friday, May 13, 2022 3:17 PM
> > To: Gagandeep Singh <G.Singh@nxp.com>; dev@dpdk.org; Hemant Agrawal
> > <hemant.agrawal@nxp.com>
> > Cc: Anoob Joseph <anoobj@marvell.com>
> > Subject: RE: [EXT] [PATCH 1/8] app/test-crypto-perf: improve dequeue
> > logic
> >
> > Hi Gagan,
> > > Issue more dequeue commands if the gap between enqueued and dequeued
> > > packets is more than burst size *8
> > >
> > > Signed-off-by: Gagandeep Singh <g.singh@nxp.com>
> > > ---
> > Why is this change required? What gain are we getting?
> > I see a performance drop due to this patch.
> 
> Issue is, in case if security engine/driver is slow in processing the Jobs especially
> for larger packet sizes then in that case application will keep enqueuing packets
> with higher rate than dequeue which may results in buffer pool exhaustion.
> Application has option to increase pool size but that may not be Helpful for the
> platforms those with memory constraints.

[Anoob] Can you elaborate the issue that you are hitting? 

> 
> We can work on limiting the enqueue side instead of keeping the dequeue to
> avoid any performance drop due to any empty dequeue.

[Anoob] Shouldn't PMD take care of limiting the enqueue side? Application can specify the desired queue depth and when application enqueues beyond that, enqueue API can return 0. Wouldn't that good enough?
 
> 
> Dropping this patch from this series. I will update the logic and will try to send as
> separate patch.
> 
> >
> > >  app/test-crypto-perf/cperf_test_throughput.c | 42
> > > +++++++++++---------
> > >  1 file changed, 23 insertions(+), 19 deletions(-)
> > >
> > > diff --git a/app/test-crypto-perf/cperf_test_throughput.c
> > > b/app/test-crypto- perf/cperf_test_throughput.c index
> > > cecf30e470..5cd8919c91 100644
> > > --- a/app/test-crypto-perf/cperf_test_throughput.c
> > > +++ b/app/test-crypto-perf/cperf_test_throughput.c
> > > @@ -223,26 +223,30 @@ cperf_throughput_test_runner(void *test_ctx)
> > >  			ops_unused = burst_size - ops_enqd;
> > >  			ops_enqd_total += ops_enqd;
> > >
> > > -
> > >  			/* Dequeue processed burst of ops from crypto device
> */
> > > -			ops_deqd = rte_cryptodev_dequeue_burst(ctx->dev_id,
> > > ctx->qp_id,
> > > -					ops_processed, test_burst_size);
> > > -
> > > -			if (likely(ops_deqd))  {
> > > -				/* Free crypto ops so they can be reused. */
> > > -				rte_mempool_put_bulk(ctx->pool,
> > > -						(void **)ops_processed,
> > > ops_deqd);
> > > -
> > > -				ops_deqd_total += ops_deqd;
> > > -			} else {
> > > -				/**
> > > -				 * Count dequeue polls which didn't return any
> > > -				 * processed operations. This statistic is mainly
> > > -				 * relevant to hw accelerators.
> > > -				 */
> > > -				ops_deqd_failed++;
> > > -			}
> > > -
> > > +			do {
> > > +				ops_deqd = rte_cryptodev_dequeue_burst(
> > > +						ctx->dev_id, ctx->qp_id,
> > > +						ops_processed,
> > > test_burst_size);
> > > +
> > > +				if (likely(ops_deqd))  {
> > > +					/* Free crypto ops for reuse */
> > > +					rte_mempool_put_bulk(ctx->pool,
> > > +							(void
> > > **)ops_processed,
> > > +							ops_deqd);
> > > +
> > > +					ops_deqd_total += ops_deqd;
> > > +				} else {
> > > +					/**
> > > +					 * Count dequeue polls which didn't
> > > +					 * return any processed operations.
> > > +					 * This statistic is mainly relevant
> > > +					 * to hw accelerators.
> > > +					 */
> > > +					ops_deqd_failed++;
> > > +				}
> > > +			} while (ops_enqd_total - ops_deqd_total >
> > > +					test_burst_size * 8);
> > >  		}
> > >
> > >  		/* Dequeue any operations still in the crypto device */
> > > --
> > > 2.25.1
  
Gagandeep Singh May 16, 2022, 7:54 a.m. UTC | #4
Hi,

> -----Original Message-----
> From: Anoob Joseph <anoobj@marvell.com>
> Sent: Monday, May 16, 2022 12:57 PM
> To: Gagandeep Singh <G.Singh@nxp.com>
> Cc: Akhil Goyal <gakhil@marvell.com>; dev@dpdk.org; Hemant Agrawal
> <hemant.agrawal@nxp.com>
> Subject: RE: [EXT] [PATCH 1/8] app/test-crypto-perf: improve dequeue logic
> 
> Hi Gagandeep,
> 
> Please see inline.
> 
> Thanks,
> Anoob
> 
> > -----Original Message-----
> > From: Gagandeep Singh <G.Singh@nxp.com>
> > Sent: Monday, May 16, 2022 12:44 PM
> > To: Akhil Goyal <gakhil@marvell.com>; dev@dpdk.org; Hemant Agrawal
> > <hemant.agrawal@nxp.com>
> > Cc: Anoob Joseph <anoobj@marvell.com>
> > Subject: RE: [EXT] [PATCH 1/8] app/test-crypto-perf: improve dequeue
> > logic
> >
> > Hi
> >
> > > -----Original Message-----
> > > From: Akhil Goyal <gakhil@marvell.com>
> > > Sent: Friday, May 13, 2022 3:17 PM
> > > To: Gagandeep Singh <G.Singh@nxp.com>; dev@dpdk.org; Hemant Agrawal
> > > <hemant.agrawal@nxp.com>
> > > Cc: Anoob Joseph <anoobj@marvell.com>
> > > Subject: RE: [EXT] [PATCH 1/8] app/test-crypto-perf: improve dequeue
> > > logic
> > >
> > > Hi Gagan,
> > > > Issue more dequeue commands if the gap between enqueued and
> > > > dequeued packets is more than burst size *8
> > > >
> > > > Signed-off-by: Gagandeep Singh <g.singh@nxp.com>
> > > > ---
> > > Why is this change required? What gain are we getting?
> > > I see a performance drop due to this patch.
> >
> > Issue is, in case if security engine/driver is slow in processing the
> > Jobs especially for larger packet sizes then in that case application
> > will keep enqueuing packets with higher rate than dequeue which may results
> in buffer pool exhaustion.
> > Application has option to increase pool size but that may not be
> > Helpful for the platforms those with memory constraints.
> 
> [Anoob] Can you elaborate the issue that you are hitting?
> 
> >
> > We can work on limiting the enqueue side instead of keeping the
> > dequeue to avoid any performance drop due to any empty dequeue.
> 
> [Anoob] Shouldn't PMD take care of limiting the enqueue side? Application can
> specify the desired queue depth and when application enqueues beyond that,
> enqueue API can return 0. Wouldn't that good enough?

Agree. Unfortunately, Currently In the NXP platform drivers like dpaa_sec, caam_jr
not using the API's given queue depth. This is what I will discuss internally to support
this in the PMDs itself to limit the enqueue, if we can do so without impacting
the NXP's PMDs performance.

> >
> > Dropping this patch from this series. I will update the logic and will
> > try to send as separate patch.
> >
> > >
> > > >  app/test-crypto-perf/cperf_test_throughput.c | 42
> > > > +++++++++++---------
> > > >  1 file changed, 23 insertions(+), 19 deletions(-)
> > > >
> > > > diff --git a/app/test-crypto-perf/cperf_test_throughput.c
> > > > b/app/test-crypto- perf/cperf_test_throughput.c index
> > > > cecf30e470..5cd8919c91 100644
> > > > --- a/app/test-crypto-perf/cperf_test_throughput.c
> > > > +++ b/app/test-crypto-perf/cperf_test_throughput.c
> > > > @@ -223,26 +223,30 @@ cperf_throughput_test_runner(void *test_ctx)
> > > >  			ops_unused = burst_size - ops_enqd;
> > > >  			ops_enqd_total += ops_enqd;
> > > >
> > > > -
> > > >  			/* Dequeue processed burst of ops from crypto device
> > */
> > > > -			ops_deqd = rte_cryptodev_dequeue_burst(ctx->dev_id,
> > > > ctx->qp_id,
> > > > -					ops_processed, test_burst_size);
> > > > -
> > > > -			if (likely(ops_deqd))  {
> > > > -				/* Free crypto ops so they can be reused. */
> > > > -				rte_mempool_put_bulk(ctx->pool,
> > > > -						(void **)ops_processed,
> > > > ops_deqd);
> > > > -
> > > > -				ops_deqd_total += ops_deqd;
> > > > -			} else {
> > > > -				/**
> > > > -				 * Count dequeue polls which didn't return any
> > > > -				 * processed operations. This statistic is mainly
> > > > -				 * relevant to hw accelerators.
> > > > -				 */
> > > > -				ops_deqd_failed++;
> > > > -			}
> > > > -
> > > > +			do {
> > > > +				ops_deqd = rte_cryptodev_dequeue_burst(
> > > > +						ctx->dev_id, ctx->qp_id,
> > > > +						ops_processed,
> > > > test_burst_size);
> > > > +
> > > > +				if (likely(ops_deqd))  {
> > > > +					/* Free crypto ops for reuse */
> > > > +					rte_mempool_put_bulk(ctx->pool,
> > > > +							(void
> > > > **)ops_processed,
> > > > +							ops_deqd);
> > > > +
> > > > +					ops_deqd_total += ops_deqd;
> > > > +				} else {
> > > > +					/**
> > > > +					 * Count dequeue polls which didn't
> > > > +					 * return any processed operations.
> > > > +					 * This statistic is mainly relevant
> > > > +					 * to hw accelerators.
> > > > +					 */
> > > > +					ops_deqd_failed++;
> > > > +				}
> > > > +			} while (ops_enqd_total - ops_deqd_total >
> > > > +					test_burst_size * 8);
> > > >  		}
> > > >
> > > >  		/* Dequeue any operations still in the crypto device */
> > > > --
> > > > 2.25.1
  

Patch

diff --git a/app/test-crypto-perf/cperf_test_throughput.c b/app/test-crypto-perf/cperf_test_throughput.c
index cecf30e470..5cd8919c91 100644
--- a/app/test-crypto-perf/cperf_test_throughput.c
+++ b/app/test-crypto-perf/cperf_test_throughput.c
@@ -223,26 +223,30 @@  cperf_throughput_test_runner(void *test_ctx)
 			ops_unused = burst_size - ops_enqd;
 			ops_enqd_total += ops_enqd;
 
-
 			/* Dequeue processed burst of ops from crypto device */
-			ops_deqd = rte_cryptodev_dequeue_burst(ctx->dev_id, ctx->qp_id,
-					ops_processed, test_burst_size);
-
-			if (likely(ops_deqd))  {
-				/* Free crypto ops so they can be reused. */
-				rte_mempool_put_bulk(ctx->pool,
-						(void **)ops_processed, ops_deqd);
-
-				ops_deqd_total += ops_deqd;
-			} else {
-				/**
-				 * Count dequeue polls which didn't return any
-				 * processed operations. This statistic is mainly
-				 * relevant to hw accelerators.
-				 */
-				ops_deqd_failed++;
-			}
-
+			do {
+				ops_deqd = rte_cryptodev_dequeue_burst(
+						ctx->dev_id, ctx->qp_id,
+						ops_processed, test_burst_size);
+
+				if (likely(ops_deqd))  {
+					/* Free crypto ops for reuse */
+					rte_mempool_put_bulk(ctx->pool,
+							(void **)ops_processed,
+							ops_deqd);
+
+					ops_deqd_total += ops_deqd;
+				} else {
+					/**
+					 * Count dequeue polls which didn't
+					 * return any processed operations.
+					 * This statistic is mainly relevant
+					 * to hw accelerators.
+					 */
+					ops_deqd_failed++;
+				}
+			} while (ops_enqd_total - ops_deqd_total >
+					test_burst_size * 8);
 		}
 
 		/* Dequeue any operations still in the crypto device */