From: Fiona Trahe <fiona.trahe@intel.com>
To: dev@dpdk.org, slawomirx.mrozowicz@intel.com
Cc: Declan.doherty@intel.com, john.griffin@intel.com,
 pablo.de.lara.guarch@intel.com, Fiona Trahe <fiona.trahe@intel.com>
Date: Tue, 9 May 2017 17:14:03 +0100
Message-Id: <1494346443-11130-1-git-send-email-fiona.trahe@intel.com>
Subject: [dpdk-dev] [PATCH] app/crypto-perf: add minimise-offload-cost flag

The throughput test enqueues and dequeues bursts of operations to the
device. For software devices the full burst size is usually en/dequeued
successfully; on hardware devices, however, the CPU can call the API more
frequently than necessary, since it has nothing else to do. Minimum
offload cost is achieved when the specified burst_size is en/dequeued.
So rather than wasting CPU cycles continually retrying, with only a
fraction of the burst being en/dequeued each time, fewer CPU cycles are
used by backing off until a full burst can be en/dequeued.

This patch adds a --minimise-offload-cost flag. When set, the test backs
off until full bursts are en/dequeued and counts the cycles spent
waiting. These cycles represent cycles saved by offloading, which in a
real application would be available for other work. Hence they are
deducted from the total cycle count to show the minimum offload cost
that can be achieved.
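For illustration only (not part of the patch), the back-off idea described
above can be sketched in isolation. The +10 / -1 adjustments and the >30
threshold are taken from the diff below; the enqueue function and cycle
counter here are stand-ins for rte_cryptodev_enqueue_burst() and
rte_rdtsc_precise():

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Portable stand-in for rte_rdtsc_precise(): monotonic time in ns. */
static uint64_t
read_cycles(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + ts.tv_nsec;
}

/* Stand-in for rte_cryptodev_enqueue_burst(): accepts a random
 * fraction of the burst, as a busy hardware queue might. */
static uint16_t
fake_enqueue_burst(uint16_t burst_size)
{
	return (uint16_t)(rand() % (burst_size + 1));
}

int
main(void)
{
	const uint16_t burst_size = 32;
	uint32_t wait_cycles = 0;   /* current back-off estimate */
	uint64_t wait_duration = 0; /* cycles available for other work */
	int i;

	for (i = 0; i < 100000; i++) {
		uint16_t ops_enqd = fake_enqueue_burst(burst_size);

		if (ops_enqd < burst_size)
			wait_cycles += 10;  /* partial burst: back off harder */
		else if (wait_cycles)
			wait_cycles--;      /* full burst: relax the back-off */

		/* Burn the back-off budget here rather than hammering the
		 * API, and account for it separately. */
		if (wait_cycles > 30) {
			uint64_t start = read_cycles(), end = start;

			while (end - start < wait_cycles)
				end = read_cycles();
			wait_duration += end - start;
		}
	}

	printf("cycles counted as saved by offloading: %" PRIu64 "\n",
			wait_duration);
	return 0;
}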
Signed-off-by: Fiona Trahe
Acked-by: Arek Kusztal
---
 app/test-crypto-perf/cperf_options.h         |  2 +
 app/test-crypto-perf/cperf_options_parsing.c | 12 +++++
 app/test-crypto-perf/cperf_test_throughput.c | 76 ++++++++++++++++++++++------
 3 files changed, 75 insertions(+), 15 deletions(-)

diff --git a/app/test-crypto-perf/cperf_options.h b/app/test-crypto-perf/cperf_options.h
index b928c58..48ca1de 100644
--- a/app/test-crypto-perf/cperf_options.h
+++ b/app/test-crypto-perf/cperf_options.h
@@ -31,6 +31,7 @@
 #define CPERF_AUTH_DIGEST_SZ	("auth-digest-sz")
 #define CPERF_AUTH_AAD_SZ	("auth-aad-sz")
 #define CPERF_CSV		("csv-friendly")
+#define CPERF_MINIMISE_OFFLOAD_COST ("minimise-offload-cost")

 #define MAX_LIST 32

@@ -65,6 +66,7 @@ struct cperf_options {
 	uint32_t out_of_place:1;
 	uint32_t silent:1;
 	uint32_t csv:1;
+	uint32_t minimise_offload_cost:1;

 	enum rte_crypto_cipher_algorithm cipher_algo;
 	enum rte_crypto_cipher_operation cipher_op;
diff --git a/app/test-crypto-perf/cperf_options_parsing.c b/app/test-crypto-perf/cperf_options_parsing.c
index d172671..c8d31ef 100644
--- a/app/test-crypto-perf/cperf_options_parsing.c
+++ b/app/test-crypto-perf/cperf_options_parsing.c
@@ -439,6 +439,15 @@ parse_silent(struct cperf_options *opts,
 }

 static int
+parse_minimise_offload_cost(struct cperf_options *opts,
+		const char *arg __rte_unused)
+{
+	opts->minimise_offload_cost = 1;
+
+	return 0;
+}
+
+static int
 parse_cipher_algo(struct cperf_options *opts,
 		const char *arg)
 {
@@ -603,6 +612,7 @@ static struct option lgopts[] = {
 	{ CPERF_AUTH_DIGEST_SZ, required_argument, 0, 0 },
 	{ CPERF_AUTH_AAD_SZ, required_argument, 0, 0 },
 	{ CPERF_CSV, no_argument, 0, 0},
+	{ CPERF_MINIMISE_OFFLOAD_COST, no_argument, 0, 0 },

 	{ NULL, 0, 0, 0 }
 };
@@ -640,6 +650,7 @@ cperf_options_default(struct cperf_options *opts)
 	opts->sessionless = 0;
 	opts->out_of_place = 0;
 	opts->csv = 0;
+	opts->minimise_offload_cost = 0;

 	opts->cipher_algo = RTE_CRYPTO_CIPHER_AES_CBC;
 	opts->cipher_op = RTE_CRYPTO_CIPHER_OP_ENCRYPT;
@@ -681,6 +692,7 @@ cperf_opts_parse_long(int opt_idx, struct cperf_options *opts)
 		{ CPERF_AUTH_DIGEST_SZ,	parse_auth_digest_sz },
 		{ CPERF_AUTH_AAD_SZ,	parse_auth_aad_sz },
 		{ CPERF_CSV,		parse_csv_friendly},
+		{ CPERF_MINIMISE_OFFLOAD_COST, parse_minimise_offload_cost },
 	};

 	unsigned int i;
diff --git a/app/test-crypto-perf/cperf_test_throughput.c b/app/test-crypto-perf/cperf_test_throughput.c
index 61b27ea..17044ee 100644
--- a/app/test-crypto-perf/cperf_test_throughput.c
+++ b/app/test-crypto-perf/cperf_test_throughput.c
@@ -320,6 +320,9 @@ cperf_throughput_test_runner(void *test_ctx)
 	uint64_t ops_deqd = 0, ops_deqd_total = 0, ops_deqd_failed = 0;

 	uint64_t m_idx = 0, tsc_start, tsc_end, tsc_duration;
+	uint64_t wait_start = 0, wait_end = 0;
+	uint64_t wait_duration = 0;
+	uint32_t wait_cycles = 0;

 	uint16_t ops_unused = 0;

@@ -375,8 +378,13 @@
 			/* Enqueue burst of ops on crypto device */
 			ops_enqd = rte_cryptodev_enqueue_burst(ctx->dev_id, ctx->qp_id,
 					ops, burst_size);
-			if (ops_enqd < burst_size)
+			if (ops_enqd < burst_size) {
 				ops_enqd_failed++;
+				wait_cycles += 10;
+			} else {
+				if (wait_cycles)
+					wait_cycles--;
+			}

 			/**
 			 * Calculate number of ops not enqueued (mainly for hw
@@ -385,6 +393,26 @@
 			ops_unused = burst_size - ops_enqd;
 			ops_enqd_total += ops_enqd;

+			/**
+			 * Minimum offload cost will be achieved when the
+			 * specified burst_size is enqueued. Rather than
+			 * wasting CPU cycles continually retrying, with a
+			 * fraction of the burst being enqueued each time,
+			 * back off until a full burst can be enqueued.
+			 * The cycles counted here represent cycles saved by
+			 * offloading, which in a real application are
+			 * available for other work. Hence these cycles are
+			 * deducted from the total cycle-count to show the
+			 * offload cost.
+			 */
+			if (ctx->options->minimise_offload_cost &&
+					wait_cycles > 30) {
+				wait_start = wait_end = rte_rdtsc_precise();
+				while ((wait_end - wait_start) < wait_cycles)
+					wait_end = rte_rdtsc_precise();
+				wait_duration += (wait_end - wait_start);
+			}
+
 			/* Dequeue processed burst of ops from crypto device */
 			ops_deqd = rte_cryptodev_dequeue_burst(ctx->dev_id, ctx->qp_id,
@@ -400,13 +428,19 @@
 					rte_crypto_op_free(ops_processed[i]);

 				ops_deqd_total += ops_deqd;
-			} else {
+			}
+
+			if (ops_deqd != test_burst_size) {
 				/**
-				 * Count dequeue polls which didn't return any
-				 * processed operations. This statistic is mainly
+				 * Count dequeue polls which don't return a
+				 * full burst. This statistic is mainly
 				 * relevant to hw accelerators.
 				 */
 				ops_deqd_failed++;
+				wait_cycles += 10;
+			} else {
+				if (wait_cycles)
+					wait_cycles--;
 			}

 			m_idx += ops_needed;
@@ -415,8 +449,7 @@
 	}

 	/* Dequeue any operations still in the crypto device */
-
-	while (ops_deqd_total < ctx->options->total_ops) {
+	while (ops_deqd_total < ops_enqd_total) {
 		/* Sending 0 length burst to flush sw crypto device */
 		rte_cryptodev_enqueue_burst(ctx->dev_id, ctx->qp_id,
 				NULL, 0);
@@ -447,18 +480,22 @@
 	/* Calculate average cycles per packet */
 	double cycles_per_packet = ((double)tsc_duration /
			ctx->options->total_ops);
+	double available_cycles_per_packet = ((double)wait_duration /
+			ctx->options->total_ops);

 	if (!ctx->options->csv) {
 		if (!only_once)
-			printf("%12s%12s%12s%12s%12s%12s%12s%12s%12s%12s\n\n",
-				"lcore id", "Buf Size", "Burst Size",
+			printf(
+			"%8s%10s%6s%12s%12s%12s%12s%8s%8s%18s%10s%12s\n\n",
+				"lcore id", "Buf Size", "Burst",
 				"Enqueued", "Dequeued", "Failed Enq",
 				"Failed Deq", "MOps", "Gbps",
-				"Cycles/Buf");
+				"Cycles/Buf(Total)", "(Offload)",
+				"(Available)");

 		only_once = 1;

-		printf("%12u%12u%12u%12"PRIu64"%12"PRIu64"%12"PRIu64
-				"%12"PRIu64"%12.4f%12.4f%12.2f\n",
+		printf("%8u%10u%6u%12"PRIu64"%12"PRIu64"%12"PRIu64
+				"%12"PRIu64"%8.4f%8.4f%12.0f%12.0f%12.0f\n",
 				ctx->lcore_id,
 				ctx->options->test_buffer_size,
 				test_burst_size,
@@ -468,17 +505,22 @@
 				ops_deqd_failed,
 				ops_per_second/1000000,
 				throughput_gbps,
-				cycles_per_packet);
+				cycles_per_packet,
+				cycles_per_packet
+				- available_cycles_per_packet,
+				available_cycles_per_packet);
+
 	} else {
 		if (!only_once)
 			printf("# lcore id, Buffer Size(B),"
 				"Burst Size,Enqueued,Dequeued,Failed Enq,"
 				"Failed Deq,Ops(Millions),Throughput(Gbps),"
-				"Cycles/Buf\n\n");
+				"Cycles/Buf(Total),(Offload),"
+				"(Available)\n\n");

 		only_once = 1;

 		printf("%10u;%10u;%u;%"PRIu64";%"PRIu64";%"PRIu64";%"PRIu64";"
-				"%.f3;%.f3;%.f3\n",
+				"%.f3;%.f3;%.f3;%.f3;%.f3\n",
 				ctx->lcore_id,
 				ctx->options->test_buffer_size,
 				test_burst_size,
@@ -488,7 +530,11 @@
 				ops_deqd_failed,
 				ops_per_second/1000000,
 				throughput_gbps,
-				cycles_per_packet);
+				cycles_per_packet,
+				cycles_per_packet
+				- available_cycles_per_packet,
+				available_cycles_per_packet);
+
 	}

 	/* Get next size from range or list */
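To make the reporting change concrete: the three Cycles/Buf columns relate
as Total = Offload + Available, where Available is wait_duration averaged
over total_ops. A small worked example, with invented numbers; only the
formulas mirror the patch:

#include <stdint.h>
#include <stdio.h>

int
main(void)
{
	/* Hypothetical measurements for one run. */
	uint64_t tsc_duration = 160000000;  /* total cycles for the test */
	uint64_t wait_duration = 60000000;  /* cycles spent in the back-off wait */
	uint64_t total_ops = 1000000;

	/* Cycles/Buf(Total): 160 */
	double cycles_per_packet = (double)tsc_duration / total_ops;
	/* (Available): 60 */
	double available_cycles_per_packet = (double)wait_duration / total_ops;
	/* (Offload): 100 */
	double offload_cost = cycles_per_packet - available_cycles_per_packet;

	printf("Total %.0f, Offload %.0f, Available %.0f\n",
			cycles_per_packet, offload_cost, available_cycles_per_packet);
	return 0;
}

With --minimise-offload-cost unset, wait_duration stays 0, so (Available)
reports 0 and (Offload) equals (Total), matching the old single column.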