[v2,07/12] crypto/octeontx2: add enqueue/dequeue ops
diff mbox series

Message ID 1570970402-20278-8-git-send-email-anoobj@marvell.com
State Changes Requested
Delegated to: akhil goyal
Headers show
Series
  • add OCTEON TX2 crypto PMD
Related show

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation fail apply issues

Commit Message

Anoob Joseph Oct. 13, 2019, 12:39 p.m. UTC
This patch adds the enqueue burst and dequeue burst callbacks for the
OCTEON TX2 crypto driver.

Signed-off-by: Ankur Dwivedi <adwivedi@marvell.com>
Signed-off-by: Anoob Joseph <anoobj@marvell.com>
Signed-off-by: Tejasree Kondoj <ktejasree@marvell.com>
---
 drivers/common/cpt/cpt_hw_types.h                  |  52 ++++
 drivers/crypto/octeontx2/Makefile                  |   1 +
 drivers/crypto/octeontx2/meson.build               |   1 +
 .../crypto/octeontx2/otx2_cryptodev_hw_access.h    |  16 ++
 drivers/crypto/octeontx2/otx2_cryptodev_ops.c      | 289 +++++++++++++++++++++
 5 files changed, 359 insertions(+)

Comments

Gavin Hu (Arm Technology China) Oct. 15, 2019, 11:19 a.m. UTC | #1
Hi Anoob,

This is a typical producer-consumer case, enqueue and dequeue operations must be conducted in a synchronized way, otherwise stale or earlier-than-arrival data will be got. 
Out-of-synchronization issues are more prone to happen on weak memory ordered platforms, like arm and PPC, if execution in the program order is assumed. 
I see in this patch MOD_INC() and other reads/writes of the indexes might get reordered with regard to real enqueue and deque operations, this may cause synchronization errors.
I have a fix for rte ring using C11 to keep synchronized operations, please refer to: http://patches.dpdk.org/patch/47733/ 
/Gavin

> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Anoob Joseph
> Sent: Sunday, October 13, 2019 8:40 PM
> To: Akhil.goyal@nxp.com; Pablo de Lara <pablo.de.lara.guarch@intel.com>
> Cc: Anoob Joseph <anoobj@marvell.com>; Fiona Trahe
> <fiona.trahe@intel.com>; jerinj@marvell.com; Narayana Prasad
> <pathreya@marvell.com>; Shally Verma <shallyv@marvell.com>; Ankur
> Dwivedi <adwivedi@marvell.com>; Kanaka Durga Kotamarthy
> <kkotamarthy@marvell.com>; Sunila Sahu <ssahu@marvell.com>; Tejasree
> Kondoj <ktejasree@marvell.com>; dev@dpdk.org
> Subject: [dpdk-dev] [PATCH v2 07/12] crypto/octeontx2: add
> enqueue/dequeue ops
> 
> This patch adds the enqueue burst and dequeue burst callbacks for the
> OCTEON TX2 crypto driver.

<snip>
Anoob Joseph Oct. 16, 2019, 9:57 a.m. UTC | #2
Hi Gavin,

Thanks for your review and suggestion. I agree with your suggestion and would be taking up the rework. But since we are close to RC1, I would like to defer this to the next release cycle. This change would touch all features supported and hence, will require an extensive QA internally.

Thanks,
Anoob

> -----Original Message-----
> From: Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>
> Sent: Tuesday, October 15, 2019 4:50 PM
> To: Anoob Joseph <anoobj@marvell.com>; Akhil.goyal@nxp.com; Pablo de
> Lara <pablo.de.lara.guarch@intel.com>
> Cc: Fiona Trahe <fiona.trahe@intel.com>; Jerin Jacob Kollanukkaran
> <jerinj@marvell.com>; Narayana Prasad Raju Athreya
> <pathreya@marvell.com>; Shally Verma <shallyv@marvell.com>; Ankur
> Dwivedi <adwivedi@marvell.com>; Kanaka Durga Kotamarthy
> <kkotamarthy@marvell.com>; Sunila Sahu <ssahu@marvell.com>; Tejasree
> Kondoj <ktejasree@marvell.com>; dev@dpdk.org; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>; nd <nd@arm.com>
> Subject: [EXT] RE: [dpdk-dev] [PATCH v2 07/12] crypto/octeontx2: add
> enqueue/dequeue ops
> 
> External Email
> 
> ----------------------------------------------------------------------
> Hi Anoob,
> 
> This is a typical producer-consumer case, enqueue and dequeue operations
> must be conducted in a synchronized way, otherwise stale or earlier-than-
> arrival data will be got.
> Out-of-synchronization issues are more prone to happen on weak memory
> ordered platforms, like arm and PPC, if execution in the program order is
> assumed.
> I see in this patch MOD_INC() and other reads/writes of the indexes might
> get reordered with regard to real enqueue and deque operations, this may
> cause synchronization errors.
> I have a fix for rte ring using C11 to keep synchronized operations, please
> refer to: https://urldefense.proofpoint.com/v2/url?u=http-
> 3A__patches.dpdk.org_patch_47733_&d=DwIFAg&c=nKjWec2b6R0mOyPaz7
> xtfQ&r=jPfB8rwwviRSxyLWs2n6B-WYLn1v9SyTMrT5EQqh2TU&m=-
> CQNFLhmDggI38cEto5lpU668oUfOgbIO8Akw7ViaVM&s=c5-
> AH2gQk6Kqi_AKdacCs0k-w42Qrisy4Ev9SF3fG6w&e=
> /Gavin
> 
> > -----Original Message-----
> > From: dev <dev-bounces@dpdk.org> On Behalf Of Anoob Joseph
> > Sent: Sunday, October 13, 2019 8:40 PM
> > To: Akhil.goyal@nxp.com; Pablo de Lara
> > <pablo.de.lara.guarch@intel.com>
> > Cc: Anoob Joseph <anoobj@marvell.com>; Fiona Trahe
> > <fiona.trahe@intel.com>; jerinj@marvell.com; Narayana Prasad
> > <pathreya@marvell.com>; Shally Verma <shallyv@marvell.com>; Ankur
> > Dwivedi <adwivedi@marvell.com>; Kanaka Durga Kotamarthy
> > <kkotamarthy@marvell.com>; Sunila Sahu <ssahu@marvell.com>;
> Tejasree
> > Kondoj <ktejasree@marvell.com>; dev@dpdk.org
> > Subject: [dpdk-dev] [PATCH v2 07/12] crypto/octeontx2: add
> > enqueue/dequeue ops
> >
> > This patch adds the enqueue burst and dequeue burst callbacks for the
> > OCTEON TX2 crypto driver.
> 
> <snip>

Patch
diff mbox series

diff --git a/drivers/common/cpt/cpt_hw_types.h b/drivers/common/cpt/cpt_hw_types.h
index 4c2893b..e2b127d 100644
--- a/drivers/common/cpt/cpt_hw_types.h
+++ b/drivers/common/cpt/cpt_hw_types.h
@@ -197,6 +197,44 @@  typedef union cpt_inst_s {
 		};
 #endif /* Word 7 - End */
 	} s8x;
+	struct cpt_inst_s_9s {
+#if (RTE_BYTE_ORDER == RTE_BIG_ENDIAN) /* Word 0 - Big Endian */
+		uint64_t nixtx_addr            : 60;
+		uint64_t doneint               : 1;
+		uint64_t nixtxl                : 3;
+#else /* Word 0 - Little Endian */
+		uint64_t nixtxl                : 3;
+		uint64_t doneint               : 1;
+		uint64_t nixtx_addr            : 60;
+#endif /* Word 0 - End */
+		uint64_t res_addr;
+#if (RTE_BYTE_ORDER == RTE_BIG_ENDIAN) /* Word 2 - Big Endian */
+		uint64_t rvu_pf_func           : 16;
+		uint64_t reserved_172_175      : 4;
+		uint64_t grp                   : 10;
+		uint64_t tt                    : 2;
+		uint64_t tag                   : 32;
+#else /* Word 2 - Little Endian */
+		uint64_t tag                   : 32;
+		uint64_t tt                    : 2;
+		uint64_t grp                   : 10;
+		uint64_t reserved_172_175      : 4;
+		uint64_t rvu_pf_func           : 16;
+#endif /* Word 2 - End */
+#if (RTE_BYTE_ORDER == RTE_BIG_ENDIAN) /* Word 3 - Big Endian */
+		uint64_t wq_ptr                : 61;
+		uint64_t reserved_194_193      : 2;
+		uint64_t qord                  : 1;
+#else /* Word 3 - Little Endian */
+		uint64_t qord                  : 1;
+		uint64_t reserved_194_193      : 2;
+		uint64_t wq_ptr                : 61;
+#endif /* Word 3 - End */
+		uint64_t ei0;
+		uint64_t ei1;
+		uint64_t ei2;
+		uint64_t ei3;
+	} s9x;
 } cpt_inst_s_t;
 
 /**
@@ -243,6 +281,20 @@  typedef union cpt_res_s {
 		uint64_t reserved_64_127       : 64;
 #endif /* Word 1 - End */
 	} s8x;
+	struct cpt_res_s_9s {
+#if (RTE_BYTE_ORDER == RTE_BIG_ENDIAN) /* Word 0 - Big Endian */
+		uint64_t reserved_17_63:47;
+		uint64_t doneint:1;
+		uint64_t uc_compcode:8;
+		uint64_t compcode:8;
+#else /* Word 0 - Little Endian */
+		uint64_t compcode:8;
+		uint64_t uc_compcode:8;
+		uint64_t doneint:1;
+		uint64_t reserved_17_63:47;
+#endif /* Word 0 - End */
+		uint64_t reserved_64_127;
+	} s9x;
 } cpt_res_s_t;
 
 /**
diff --git a/drivers/crypto/octeontx2/Makefile b/drivers/crypto/octeontx2/Makefile
index 968efac..ce116b8 100644
--- a/drivers/crypto/octeontx2/Makefile
+++ b/drivers/crypto/octeontx2/Makefile
@@ -24,6 +24,7 @@  CFLAGS += -O3
 CFLAGS += -I$(RTE_SDK)/drivers/common/cpt
 CFLAGS += -I$(RTE_SDK)/drivers/common/octeontx2
 CFLAGS += -I$(RTE_SDK)/drivers/mempool/octeontx2
+CFLAGS += -DALLOW_EXPERIMENTAL_API
 
 ifneq ($(CONFIG_RTE_ARCH_64),y)
 CFLAGS += -Wno-int-to-pointer-cast
diff --git a/drivers/crypto/octeontx2/meson.build b/drivers/crypto/octeontx2/meson.build
index ee2e907..b6e5b73 100644
--- a/drivers/crypto/octeontx2/meson.build
+++ b/drivers/crypto/octeontx2/meson.build
@@ -10,6 +10,7 @@  deps += ['common_cpt']
 deps += ['common_octeontx2']
 name = 'octeontx2_crypto'
 
+allow_experimental_apis = true
 sources = files('otx2_cryptodev.c',
 		'otx2_cryptodev_capabilities.c',
 		'otx2_cryptodev_hw_access.c',
diff --git a/drivers/crypto/octeontx2/otx2_cryptodev_hw_access.h b/drivers/crypto/octeontx2/otx2_cryptodev_hw_access.h
index 87d4e77..6f78aa4 100644
--- a/drivers/crypto/octeontx2/otx2_cryptodev_hw_access.h
+++ b/drivers/crypto/octeontx2/otx2_cryptodev_hw_access.h
@@ -12,6 +12,7 @@ 
 
 #include "cpt_common.h"
 #include "cpt_hw_types.h"
+#include "cpt_mcode_defines.h"
 
 #include "otx2_dev.h"
 
@@ -119,6 +120,21 @@  union otx2_cpt_lf_q_grp_ptr {
 	} s;
 };
 
+/*
+ * Enumeration cpt_9x_comp_e
+ *
+ * CPT 9X Completion Enumeration
+ * Enumerates the values of CPT_RES_S[COMPCODE].
+ */
+enum cpt_9x_comp_e {
+	CPT_9X_COMP_E_NOTDONE = 0x00,
+	CPT_9X_COMP_E_GOOD = 0x01,
+	CPT_9X_COMP_E_FAULT = 0x02,
+	CPT_9X_COMP_E_HWERR = 0x04,
+	CPT_9X_COMP_E_INSTERR = 0x05,
+	CPT_9X_COMP_E_LAST_ENTRY = 0x06
+};
+
 struct otx2_cpt_qp {
 	uint32_t id;
 	/**< Queue pair id */
diff --git a/drivers/crypto/octeontx2/otx2_cryptodev_ops.c b/drivers/crypto/octeontx2/otx2_cryptodev_ops.c
index d4dc1de..ef6e7e7 100644
--- a/drivers/crypto/octeontx2/otx2_cryptodev_ops.c
+++ b/drivers/crypto/octeontx2/otx2_cryptodev_ops.c
@@ -329,6 +329,292 @@  sym_session_clear(int driver_id, struct rte_cryptodev_sym_session *sess)
 	rte_mempool_put(pool, priv);
 }
 
+static __rte_always_inline int32_t __hot
+otx2_cpt_enqueue_req(const struct otx2_cpt_qp *qp,
+		     struct pending_queue *pend_q,
+		     struct cpt_request_info *req)
+{
+	void *lmtline = qp->lmtline;
+	union cpt_inst_s inst;
+	uint64_t lmt_status;
+
+	if (unlikely(pend_q->pending_count >= OTX2_CPT_DEFAULT_CMD_QLEN))
+		return -EAGAIN;
+
+	inst.u[0] = 0;
+	inst.s9x.res_addr = req->comp_baddr;
+	inst.u[2] = 0;
+	inst.u[3] = 0;
+
+	inst.s9x.ei0 = req->ist.ei0;
+	inst.s9x.ei1 = req->ist.ei1;
+	inst.s9x.ei2 = req->ist.ei2;
+	inst.s9x.ei3 = req->ist.ei3;
+
+	req->time_out = rte_get_timer_cycles() +
+			DEFAULT_COMMAND_TIMEOUT * rte_get_timer_hz();
+
+	do {
+		/* Copy CPT command to LMTLINE */
+		memcpy(lmtline, &inst, sizeof(inst));
+
+		/*
+		 * Make sure compiler does not reorder memcpy and ldeor.
+		 * LMTST transactions are always flushed from the write
+		 * buffer immediately, a DMB is not required to push out
+		 * LMTSTs.
+		 */
+		rte_cio_wmb();
+		lmt_status = otx2_lmt_submit(qp->lf_nq_reg);
+	} while (lmt_status == 0);
+
+	pend_q->rid_queue[pend_q->enq_tail].rid = (uintptr_t)req;
+
+	/* We will use soft queue length here to limit requests */
+	MOD_INC(pend_q->enq_tail, OTX2_CPT_DEFAULT_CMD_QLEN);
+	pend_q->pending_count += 1;
+
+	return 0;
+}
+
+static __rte_always_inline int __hot
+otx2_cpt_enqueue_sym(struct otx2_cpt_qp *qp, struct rte_crypto_op *op,
+		     struct pending_queue *pend_q)
+{
+	struct rte_crypto_sym_op *sym_op = op->sym;
+	struct cpt_request_info *req;
+	struct cpt_sess_misc *sess;
+	vq_cmd_word3_t *w3;
+	uint64_t cpt_op;
+	void *mdata;
+	int ret;
+
+	sess = get_sym_session_private_data(sym_op->session,
+					    otx2_cryptodev_driver_id);
+
+	cpt_op = sess->cpt_op;
+
+	if (cpt_op & CPT_OP_CIPHER_MASK)
+		ret = fill_fc_params(op, sess, &qp->meta_info, &mdata,
+				     (void **)&req);
+	else
+		ret = fill_digest_params(op, sess, &qp->meta_info, &mdata,
+					 (void **)&req);
+
+	if (unlikely(ret)) {
+		CPT_LOG_DP_ERR("Crypto req : op %p, cpt_op 0x%x ret 0x%x",
+				op, (unsigned int)cpt_op, ret);
+		return ret;
+	}
+
+	w3 = ((vq_cmd_word3_t *)(&req->ist.ei3));
+	w3->s.grp = sess->egrp;
+
+	ret = otx2_cpt_enqueue_req(qp, pend_q, req);
+
+	if (unlikely(ret)) {
+		/* Free buffer allocated by fill params routines */
+		free_op_meta(mdata, qp->meta_info.pool);
+	}
+
+	return ret;
+}
+
+static __rte_always_inline int __hot
+otx2_cpt_enqueue_sym_sessless(struct otx2_cpt_qp *qp, struct rte_crypto_op *op,
+			      struct pending_queue *pend_q)
+{
+	const int driver_id = otx2_cryptodev_driver_id;
+	struct rte_crypto_sym_op *sym_op = op->sym;
+	struct rte_cryptodev_sym_session *sess;
+	int ret;
+
+	/* Create temporary session */
+
+	if (rte_mempool_get(qp->sess_mp, (void **)&sess))
+		return -ENOMEM;
+
+	ret = sym_session_configure(driver_id, sym_op->xform, sess,
+				    qp->sess_mp_priv);
+	if (ret)
+		goto sess_put;
+
+	sym_op->session = sess;
+
+	ret = otx2_cpt_enqueue_sym(qp, op, pend_q);
+
+	if (unlikely(ret))
+		goto priv_put;
+
+	return 0;
+
+priv_put:
+	sym_session_clear(driver_id, sess);
+sess_put:
+	rte_mempool_put(qp->sess_mp, sess);
+	return ret;
+}
+
+static uint16_t
+otx2_cpt_enqueue_burst(void *qptr, struct rte_crypto_op **ops, uint16_t nb_ops)
+{
+	uint16_t nb_allowed, count = 0;
+	struct otx2_cpt_qp *qp = qptr;
+	struct pending_queue *pend_q;
+	struct rte_crypto_op *op;
+	int ret;
+
+	pend_q = &qp->pend_q;
+
+	nb_allowed = OTX2_CPT_DEFAULT_CMD_QLEN - pend_q->pending_count;
+	if (nb_ops > nb_allowed)
+		nb_ops = nb_allowed;
+
+	for (count = 0; count < nb_ops; count++) {
+		op = ops[count];
+		if (op->type == RTE_CRYPTO_OP_TYPE_SYMMETRIC) {
+			if (op->sess_type == RTE_CRYPTO_OP_WITH_SESSION)
+				ret = otx2_cpt_enqueue_sym(qp, op, pend_q);
+			else
+				ret = otx2_cpt_enqueue_sym_sessless(qp, op,
+								    pend_q);
+		} else
+			break;
+
+		if (unlikely(ret))
+			break;
+	}
+
+	return count;
+}
+
+static inline void
+otx2_cpt_dequeue_post_process(struct otx2_cpt_qp *qp, struct rte_crypto_op *cop,
+			      uintptr_t *rsp, uint8_t cc)
+{
+	if (cop->type == RTE_CRYPTO_OP_TYPE_SYMMETRIC) {
+		if (likely(cc == NO_ERR)) {
+			/* Verify authentication data if required */
+			if (unlikely(rsp[2]))
+				compl_auth_verify(cop, (uint8_t *)rsp[2],
+						 rsp[3]);
+			else
+				cop->status = RTE_CRYPTO_OP_STATUS_SUCCESS;
+		} else {
+			if (cc == ERR_GC_ICV_MISCOMPARE)
+				cop->status = RTE_CRYPTO_OP_STATUS_AUTH_FAILED;
+			else
+				cop->status = RTE_CRYPTO_OP_STATUS_ERROR;
+		}
+
+		if (unlikely(cop->sess_type == RTE_CRYPTO_OP_SESSIONLESS)) {
+			sym_session_clear(otx2_cryptodev_driver_id,
+					  cop->sym->session);
+			rte_mempool_put(qp->sess_mp, cop->sym->session);
+			cop->sym->session = NULL;
+		}
+	}
+}
+
+static __rte_always_inline uint8_t
+otx2_cpt_compcode_get(struct cpt_request_info *req)
+{
+	volatile struct cpt_res_s_9s *res;
+	uint8_t ret;
+
+	res = (volatile struct cpt_res_s_9s *)req->completion_addr;
+
+	if (unlikely(res->compcode == CPT_9X_COMP_E_NOTDONE)) {
+		if (rte_get_timer_cycles() < req->time_out)
+			return ERR_REQ_PENDING;
+
+		CPT_LOG_DP_ERR("Request timed out");
+		return ERR_REQ_TIMEOUT;
+	}
+
+	if (likely(res->compcode == CPT_9X_COMP_E_GOOD)) {
+		ret = NO_ERR;
+		if (unlikely(res->uc_compcode)) {
+			ret = res->uc_compcode;
+			CPT_LOG_DP_DEBUG("Request failed with microcode error");
+			CPT_LOG_DP_DEBUG("MC completion code 0x%x",
+					 res->uc_compcode);
+		}
+	} else {
+		CPT_LOG_DP_DEBUG("HW completion code 0x%x", res->compcode);
+
+		ret = res->compcode;
+		switch (res->compcode) {
+		case CPT_9X_COMP_E_INSTERR:
+			CPT_LOG_DP_ERR("Request failed with instruction error");
+			break;
+		case CPT_9X_COMP_E_FAULT:
+			CPT_LOG_DP_ERR("Request failed with DMA fault");
+			break;
+		case CPT_9X_COMP_E_HWERR:
+			CPT_LOG_DP_ERR("Request failed with hardware error");
+			break;
+		default:
+			CPT_LOG_DP_ERR("Request failed with unknown completion code");
+		}
+	}
+
+	return ret;
+}
+
+static uint16_t
+otx2_cpt_dequeue_burst(void *qptr, struct rte_crypto_op **ops, uint16_t nb_ops)
+{
+	int i, nb_pending, nb_completed;
+	struct otx2_cpt_qp *qp = qptr;
+	struct pending_queue *pend_q;
+	struct cpt_request_info *req;
+	struct rte_crypto_op *cop;
+	uint8_t cc[nb_ops];
+	struct rid *rid;
+	uintptr_t *rsp;
+	void *metabuf;
+
+	pend_q = &qp->pend_q;
+
+	nb_pending = pend_q->pending_count;
+
+	if (nb_ops > nb_pending)
+		nb_ops = nb_pending;
+
+	for (i = 0; i < nb_ops; i++) {
+		rid = &pend_q->rid_queue[pend_q->deq_head];
+		req = (struct cpt_request_info *)(rid->rid);
+
+		cc[i] = otx2_cpt_compcode_get(req);
+
+		if (unlikely(cc[i] == ERR_REQ_PENDING))
+			break;
+
+		ops[i] = req->op;
+
+		MOD_INC(pend_q->deq_head, OTX2_CPT_DEFAULT_CMD_QLEN);
+		pend_q->pending_count -= 1;
+	}
+
+	nb_completed = i;
+
+	for (i = 0; i < nb_completed; i++) {
+		rsp = (void *)ops[i];
+
+		metabuf = (void *)rsp[0];
+		cop = (void *)rsp[1];
+
+		ops[i] = cop;
+
+		otx2_cpt_dequeue_post_process(qp, cop, rsp, cc[i]);
+
+		free_op_meta(metabuf, qp->meta_info.pool);
+	}
+
+	return nb_completed;
+}
+
 /* PMD ops */
 
 static int
@@ -378,6 +664,9 @@  otx2_cpt_dev_config(struct rte_cryptodev *dev,
 		goto queues_detach;
 	}
 
+	dev->enqueue_burst = otx2_cpt_enqueue_burst;
+	dev->dequeue_burst = otx2_cpt_dequeue_burst;
+
 	rte_mb();
 	return 0;