[RFC,1/9] security: introduce CPU Crypto action type and API

Message ID 20190903154046.55992-2-roy.fan.zhang@intel.com (mailing list archive)
State Changes Requested, archived
Delegated to: akhil goyal
Series: security: add software synchronous crypto process

Checks

Context                 Check    Description
ci/checkpatch           success  coding style OK
ci/Intel-compilation    success  Compilation OK

Commit Message

Fan Zhang Sept. 3, 2019, 3:40 p.m. UTC
This patch introduces a new RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO action type
to the security library. The type represents crypto operations performed with
CPU cycles. The patch also includes a new API to process crypto operations in
bulk, and the function pointers for PMDs.
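A minimal sketch of the split the commit message describes: a library entry point that validates its arguments and forwards to an optional per-PMD callback. All names here (`sec_ops`, `sec_ctx`, `sec_process_cpu_crypto_bulk`, `toy_pmd_process`) are hypothetical stand-ins, not the declarations the patch adds to rte_security.h and rte_security_driver.h; returning -ENOTSUP when a PMD leaves the callback NULL is likewise an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the real security session type. */
struct sec_session;

/* Hypothetical per-PMD op table: a PMD without CPU crypto support
 * simply leaves the pointer NULL. */
struct sec_ops {
	int (*process_cpu_crypto_bulk)(void *dev, struct sec_session *sess,
				       void *bufs[], const uint32_t lens[],
				       int status[], uint32_t num);
};

struct sec_ctx {
	void *device;              /* PMD private device pointer */
	const struct sec_ops *ops; /* filled in by the PMD */
};

/* Library wrapper: check that the PMD implements the callback, then forward. */
static int
sec_process_cpu_crypto_bulk(struct sec_ctx *ctx, struct sec_session *sess,
			    void *bufs[], const uint32_t lens[],
			    int status[], uint32_t num)
{
	assert(ctx != NULL);
	if (ctx->ops == NULL || ctx->ops->process_cpu_crypto_bulk == NULL)
		return -ENOTSUP;
	return ctx->ops->process_cpu_crypto_bulk(ctx->device, sess, bufs,
						 lens, status, num);
}

/* Toy PMD callback: marks every buffer as processed (no real crypto). */
static int
toy_pmd_process(void *dev, struct sec_session *sess, void *bufs[],
		const uint32_t lens[], int status[], uint32_t num)
{
	(void)dev; (void)sess; (void)bufs; (void)lens;
	for (uint32_t i = 0; i < num; i++)
		status[i] = 0;
	return (int)num;
}

static const struct sec_ops toy_pmd_ops = { toy_pmd_process };
```

Keeping the callback optional means PMDs that cannot process on the CPU need no changes; the caller just sees -ENOTSUP.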

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
---
 lib/librte_security/rte_security.c           | 16 +++++++++
 lib/librte_security/rte_security.h           | 51 +++++++++++++++++++++++++++-
 lib/librte_security/rte_security_driver.h    | 19 +++++++++++
 lib/librte_security/rte_security_version.map |  1 +
 4 files changed, 86 insertions(+), 1 deletion(-)
  

Comments

Akhil Goyal Sept. 4, 2019, 10:32 a.m. UTC | #1
Hi Fan,

> 
> This patch introduces a new RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO action type
> to the security library. The type represents crypto operations performed with
> CPU cycles. The patch also includes a new API to process crypto operations in
> bulk, and the function pointers for PMDs.
> 
I am not able to follow the flow of execution for this action type. Could you please elaborate
the flow in the documentation? If it is not in the documentation right now, then please elaborate
the flow in the cover letter.
Also, I see that there are new APIs for processing crypto operations in bulk.
What does that mean? How are they different from the existing APIs, which also
handle bulk crypto ops depending on the budget?


-Akhil
  
Fan Zhang Sept. 4, 2019, 1:06 p.m. UTC | #2
Hi Akhil,

This action type allows a burst of symmetric crypto workload that uses the same
algorithm, key, and direction to be processed synchronously with CPU cycles.
This flexible action type does not require external hardware involvement,
processes the crypto workload synchronously, and is more performant than the
Cryptodev SW PMDs thanks to the cycles saved by removing the "async mode
simulation" as well as the 3 cache-line accesses of the crypto ops.

The AESNI-GCM and AESNI-MB PMDs are updated with this support. There is a small
performance test app under app/test/security_aesni_gcm(mb)_perftest to
prove it.

For the new API:
The packet is sent to the crypto device for symmetric crypto
processing. The device will encrypt or decrypt the buffer based on the session
data specified and preprocessed in the security session. Unlike the inline or
lookaside modes, when the function exits the user can expect each buffer to be
either processed successfully or to have an error number assigned at the
corresponding index of the status array.
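The completion semantics Fan describes (every buffer handled by the time the call returns, with per-index error codes) can be sketched as follows. This is a hedged illustration, not the AESNI-GCM/MB implementation: `toy_session` and `toy_process_bulk` are hypothetical names, and XOR with a key byte is a placeholder for a real cipher, not cryptography.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical session: everything constant for the burst (algorithm, key,
 * direction) is fixed when the session is created. */
struct toy_session {
	uint8_t key; /* placeholder "key" for the toy transform */
};

/* Process num raw buffers synchronously with one session. On return every
 * buffer has been handled: status[i] is 0 on success or a negative errno.
 * Returns the number of successfully processed buffers.
 * NOTE: XOR with a key byte is a stand-in, NOT real cryptography. */
static uint32_t
toy_process_bulk(const struct toy_session *sess, uint8_t *bufs[],
		 const uint32_t lens[], int status[], uint32_t num)
{
	uint32_t ok = 0;

	assert(sess != NULL);
	for (uint32_t i = 0; i < num; i++) {
		if (bufs[i] == NULL || lens[i] == 0) {
			status[i] = -EINVAL; /* error reported at this index */
			continue;
		}
		for (uint32_t j = 0; j < lens[i]; j++)
			bufs[i][j] ^= sess->key;
		status[i] = 0;
		ok++;
	}
	return ok;
}
```

The buffer, length, and status arrays can be plain stack arrays declared per call and discarded afterwards; nothing is allocated from a pool and freed, which is part of the overhead the thread attributes to crypto ops.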

Will update the programmer's guide in the v1 patch.

Regards,
Fan

  
Akhil Goyal Sept. 6, 2019, 9:01 a.m. UTC | #3
Hi Fan,
> 
> Hi Akhil,
> 
> This action type allows a burst of symmetric crypto workload that uses the same
> algorithm, key, and direction to be processed synchronously with CPU cycles.
> This flexible action type does not require external hardware involvement,
> processes the crypto workload synchronously, and is more performant than the
> Cryptodev SW PMDs thanks to the cycles saved by removing the "async mode
> simulation" as well as the 3 cache-line accesses of the crypto ops.

Does that mean the application will not call cryptodev_enqueue_burst and the corresponding dequeue burst?
It would be a new API, something like process_packets, and it will have the crypto-processed packets when returning from the API?

I still do not understand why we cannot do this with the conventional crypto lib only.
As far as I can understand, you are not doing any protocol processing or adding any value
to the crypto processing. IMO, you just need a synchronous crypto processing API, which
can be defined in cryptodev; you don't need to re-create a crypto session in the name of a
security session in the driver just to do synchronous processing.

  
Fan Zhang Sept. 6, 2019, 1:12 p.m. UTC | #4
Hi Akhil,

You are right: the new API will process the crypto workload directly; no heavy
enqueue/dequeue operations are required.

Cryptodev is designed to support multiple crypto devices, including HW and SW
ones. The 3-cache-line access, IOVA address computation and assignment,
simulation of async enqueue/dequeue operations, allocating and freeing crypto
ops, and even the mbuf linked list for scatter-gather buffers are too heavy
for SW crypto PMDs.

Creating this new synchronous API in cryptodev cannot avoid the problems listed
above. First, the API should not serve only part of the crypto (SW) PMDs; as you
know, it is Cryptodev. Users can expect some PMDs to support only part of the
overall algorithms, but not only part of the workload processing API.

Another reason is the assumptions made: first, when creating a crypto op we have
to allocate memory to hold the crypto op + sym op + IV; we cannot simply declare
an array of crypto ops at run time and discard it when processing is done. Also,
we need to fill in the AAD and digest HW addresses, which are not required for
SW at all.

Bottom line: using crypto ops will still have the 3-cache-line access performance problem.

So if we were to create the new API in Cryptodev instead of rte_security, we would
need to create a new crypto op structure only for the SW PMDs, carefully document
it so it is not confused with the existing cryptodev APIs, add new device feature
flags to indicate that the API is not supported by some PMDs, and again carefully
document these device feature flags.

Pushing these changes to rte_security instead resolves the above problems, and the
performance improvement from this change is big for smaller packets; I attached a
performance test app in the patchset.

For rte_security, we already have the inline-crypto type, which works quite close to
this new API; the only difference is that the workload is processed with CPU cycles.
As you may have already seen, the ipsec library has wrapped these changes, and
ipsec-secgw needs only minimal updates to adopt this change too. So for end users
who use IPsec, this patchset can be enabled seamlessly with just a command-line
update when creating an SA.

Regards,
Fan
 

  
Ananyev, Konstantin Sept. 6, 2019, 1:27 p.m. UTC | #5
Hi Akhil,

> > This action type allows a burst of symmetric crypto workload that uses the same
> > algorithm, key, and direction to be processed synchronously with CPU cycles.
> > This flexible action type does not require external hardware involvement,
> > processes the crypto workload synchronously, and is more performant than the
> > Cryptodev SW PMDs thanks to the cycles saved by removing the "async mode
> > simulation" as well as the 3 cache-line accesses of the crypto ops.
> 
> Does that mean the application will not call cryptodev_enqueue_burst and the corresponding dequeue burst?

Yes, instead it just calls rte_security_process_cpu_crypto_bulk(...)

> It would be a new API, something like process_packets, and it will have the crypto-processed packets when returning from the API?

Yes, though the plan is that the API will operate on raw data buffers, not mbufs.

> 
> I still do not understand why we cannot do this with the conventional crypto lib only.
> As far as I can understand, you are not doing any protocol processing or adding any value
> to the crypto processing. IMO, you just need a synchronous crypto processing API, which
> can be defined in cryptodev; you don't need to re-create a crypto session in the name of a
> security session in the driver just to do synchronous processing.

I suppose your question is why not have rte_crypto_process_cpu_crypto_bulk(...) instead?
The main reason is that it would require disruptive changes in the existing cryptodev API
(it would cause ABI/API breakage).
A session for RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO needs some extra information
that a normal crypto_sym_xform doesn't contain
(the cipher offset from the start of the buffer, and possibly something extra in the future).
Also, right now there is no way to add a new type of crypto_sym_session without
either breaking the existing cryptodev ABI/API or introducing a new structure
(rte_crypto_sym_cpu_session or so) for that.
rte_security, on the other hand, is designed so that we can add new session types and
related parameters without causing API/ABI breakage.

BTW, what is your concern with the proposed approach (via rte_security)?
From my perspective it is a lightweight change, and it is entirely optional
for the crypto PMDs to support it or not.
Konstantin 

  
Akhil Goyal Sept. 10, 2019, 10:44 a.m. UTC | #6
Hi Konstantin,
> 
> Hi Akhil,
> 
> > > This action type allows a burst of symmetric crypto workload that uses
> > > the same algorithm, key, and direction to be processed synchronously
> > > with CPU cycles. This flexible action type does not require external
> > > hardware involvement, processes the crypto workload synchronously, and
> > > is more performant than the Cryptodev SW PMDs thanks to the cycles saved
> > > by removing the "async mode simulation" as well as the 3 cache-line
> > > accesses of the crypto ops.
> >
> > Does that mean the application will not call cryptodev_enqueue_burst and
> > the corresponding dequeue burst?
> 
> Yes, instead it just calls rte_security_process_cpu_crypto_bulk(...)
> 
> > It would be a new API, something like process_packets, and it will have
> > the crypto-processed packets when returning from the API?
> 
> Yes, though the plan is that the API will operate on raw data buffers, not mbufs.
> 
> >
> > I still do not understand why we cannot do this with the conventional
> > crypto lib only.
> > As far as I can understand, you are not doing any protocol processing or
> > adding any value to the crypto processing. IMO, you just need a
> > synchronous crypto processing API, which can be defined in cryptodev; you
> > don't need to re-create a crypto session in the name of a security session
> > in the driver just to do synchronous processing.
> 
> I suppose your question is why not have
> rte_crypto_process_cpu_crypto_bulk(...) instead?
> The main reason is that it would require disruptive changes in the existing
> cryptodev API (it would cause ABI/API breakage).
> A session for RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO needs some extra
> information that a normal crypto_sym_xform doesn't contain
> (the cipher offset from the start of the buffer, and possibly something
> extra in the future).

Cipher offset will be part of rte_crypto_op. If you intend not to use rte_crypto_op,
you can pass this as an argument in the new cryptodev API.
Something extra will also cause ABI breakage in security as well.
So it will be the same.

> Also, right now there is no way to add a new type of crypto_sym_session without
> either breaking the existing cryptodev ABI/API or introducing a new structure
> (rte_crypto_sym_cpu_session or so) for that.

What extra info is required in rte_cryptodev_sym_session to get the rte_crypto_sym_cpu_session?
I don't think there is any.
I believe the same crypto session will be able to work synchronously as well. We would only need
a new API to perform synchronous actions. That would significantly reduce the code duplication
in the driver needed to support 2 different kinds of APIs with similar code inside.
Please correct me in case I am missing something.


> rte_security, on the other hand, is designed so that we can add new session types and
> related parameters without causing API/ABI breakage.

Yes, the intent is to add new sessions based on the various protocols that can be supported by the driver.
It is not that we should treat it as an alternative to cryptodev and use it just because it will not cause
ABI/API breakage. IMO the code should be placed where its intent is.

  
Akhil Goyal Sept. 10, 2019, 11:25 a.m. UTC | #7
Hi Fan,
> 
> Hi Akhil,
> 
> You are right: the new API will process the crypto workload directly; no heavy
> enqueue/dequeue operations are required.
> 
> Cryptodev is designed to support multiple crypto devices, including HW and SW
> ones. The 3-cache-line access, IOVA address computation and assignment,
> simulation of async enqueue/dequeue operations, allocating and freeing crypto
> ops, and even the mbuf linked list for scatter-gather buffers are too heavy
> for SW crypto PMDs.

Why can't we have a synchronous cryptodev API that works on plain buffers, as in your suggested
API, and uses the same crypto sym_session creation logic as before? It would perform
the same as it does in this series.

> 
> Creating this new synchronous API in cryptodev cannot avoid the problems listed
> above. First, the API should not serve only part of the crypto (SW) PMDs; as
> you know, it is Cryptodev. Users can expect some PMDs to support only part of
> the overall algorithms, but not only part of the workload processing API.

Why can't we have an optional data path in cryptodev for synchronous behavior if the
underlying PMD supports it? It is up to the PMD to decide whether to support it or not.
Only a feature flag would be needed to decide that.
One more option could be a PMD API which the application can call directly if the
mode is supported by only very few PMDs. This could be a fallback if there is a
requirement for a deprecation notice etc.

> 
> Another reason is the assumptions made: first, when creating a crypto op we
> have to allocate memory to hold the crypto op + sym op + IV; we cannot simply
> declare an array of crypto ops at run time and discard it when processing is
> done. Also, we need to fill in the AAD and digest HW addresses, which are not
> required for SW at all.

We are defining a new API, which may have its own parameters and requirements that
need to be fulfilled. If it were an rte_security API, you would also be defining a new way
of packet execution and new API params. So it would be the same.
You can reduce the cache-line accesses as needed in the new API.
The session logic need not be changed from crypto session to security session.
Only the data path needs to be altered as per the new API.

> 
> Bottom line: using crypto ops will still have the 3-cache-line access
> performance problem.
> 
> So if we were to create the new API in Cryptodev instead of rte_security, we
> would need to create a new crypto op structure only for the SW PMDs, carefully
> document it so it is not confused with the existing cryptodev APIs, add new
> device feature flags to indicate that the API is not supported by some PMDs,
> and again carefully document these device feature flags.

The new API will need to be explained in the security API case as well. In that case you would also need
to add more explanation for the session, which already exists in cryptodev.

> 
> Pushing these changes to rte_security instead resolves the above problems, and
> the performance improvement from this change is big for smaller packets; I
> attached a performance test app in the patchset.

I believe there won't be any performance gap if an optimized new cryptodev API is used.

> 
> For rte_security, we already have the inline-crypto type, which works quite
> close to this new API; the only difference is that the workload is processed
> with CPU cycles. As you may have already seen, the ipsec library has wrapped
> these changes, and ipsec-secgw needs only minimal updates to adopt this change
> too. So for end users who use IPsec, this patchset can be enabled seamlessly
> with just a command-line update when creating an SA.

In the IPsec application I do not see the changes with respect to the new execution API.
So the data path is not handled there; it looks incomplete. The user experience
of using the new API will definitely change.

So I believe this patchset is not required in rte_security; we can have it in cryptodev, unless
I have missed something.

  
Ananyev, Konstantin Sept. 11, 2019, 12:29 p.m. UTC | #8
Hi Akhil,
> >
> > > > This action type allows the burst of symmetric crypto workload using the
> > same
> > > > algorithm, key, and direction being processed by CPU cycles synchronously.
> > > > This flexible action type does not require external hardware involvement,
> > > > having the crypto workload processed synchronously, and is more
> > performant
> > > > than Cryptodev SW PMD due to the saved cycles on removed "async mode
> > > > simulation" as well as 3 cacheline access of the crypto ops.
> > >
> > > Does that mean application will not call the cryptodev_enqueue_burst and
> > corresponding dequeue burst.
> >
> > Yes, instead it just call rte_security_process_cpu_crypto_bulk(...)
> >
> > > It would be a new API something like process_packets and it will have the
> > crypto processed packets while returning from the API?
> >
> > Yes, though the plan is that API will operate on raw data buffers, not mbufs.
> >
> > >
> > > I still do not understand why we cannot do with the conventional crypto lib
> > only.
> > > As far as I can understand, you are not doing any protocol processing or any
> > value add
> > > To the crypto processing. IMO, you just need a synchronous crypto processing
> > API which
> > > Can be defined in cryptodev, you don't need to re-create a crypto session in
> > the name of
> > > Security session in the driver just to do a synchronous processing.
> >
> > I suppose your question is why not to have
> > rte_crypto_process_cpu_crypto_bulk(...) instead?
> > The main reason is that would require disruptive changes in existing cryptodev
> > API
> > (would cause ABI/API breakage).
> > Session for  RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO need some extra
> > information
> > that normal crypto_sym_xform doesn't contain
> > (cipher offset from the start of the buffer, might be something extra in future).
> 
> Cipher offset will be part of rte_crypto_op.

Filling/reading crypto-ops (plus alloc/free) is one of the main things that slows down the current crypto-op approach.
That's why the general idea is to have all data that doesn't change from packet to packet
included in the session and set up once at session_init().
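A minimal sketch of that idea, combined with the synchronous contract Fan describes further down (on return, each buffer is either processed or has an error code at its index of the status array). Everything here is hypothetical: `struct cpu_sess`, `struct raw_buf`, `cpu_sess_init()` and `cpu_process_bulk()` are illustrative names loosely modelled on the proposed rte_security_process_cpu_crypto_bulk(), not its real signature, and the XOR loop is only a placeholder for a real SW cipher.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical CPU-crypto session: data that is constant for every
 * packet of the flow is captured once, at session init. */
struct cpu_sess {
	uint8_t key;            /* toy 1-byte key: placeholder, NOT real crypto */
	uint32_t cipher_offset; /* bytes to skip before ciphering starts */
};

/* Hypothetical raw data buffer: no rte_crypto_op, no mbuf. */
struct raw_buf {
	uint8_t *data;
	uint32_t len;
};

static void
cpu_sess_init(struct cpu_sess *s, uint8_t key, uint32_t cipher_offset)
{
	s->key = key;
	s->cipher_offset = cipher_offset;
}

/* Process a burst synchronously. Per-call arguments carry only what
 * changes per packet (the buffers); the offset comes from the session.
 * Returns the number of buffers processed successfully; status[i] is 0
 * on success or a negative errno value on failure. */
static uint32_t
cpu_process_bulk(const struct cpu_sess *s, struct raw_buf buf[],
		 int status[], uint32_t num)
{
	uint32_t i, j, ok = 0;

	for (i = 0; i < num; i++) {
		if (buf[i].len < s->cipher_offset) {
			status[i] = -EINVAL; /* buffer shorter than the offset */
			continue;
		}
		/* XOR stands in for the real cipher of a SW PMD */
		for (j = s->cipher_offset; j < buf[i].len; j++)
			buf[i].data[j] ^= s->key;
		status[i] = 0;
		ok++;
	}
	return ok;
}
```

Because the offset lives in the session, adding another per-session parameter later only changes the session setup, not the data-path signature.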

> If you intend not to use rte_crypto_op
> You can pass this as an argument in the new cryptodev API.

You mean an extra parameter in rte_security_process_cpu_crypto_bulk()?
In theory it could be, but that solution looks a bit ugly:
	why pass on each call something that is constant per session?
	Also, having that value constant per session might allow some extra optimisations
	that would be hard to achieve in the dynamic case.
It is also not extendable:
suppose tomorrow we need to add something extra (support for some new algorithm or so).
With what you are proposing we would need to add a new parameter to the function,
which means API breakage.

> Something extra will also cause ABI breakage in security as well.
> So it will be same.

I don't think it would.
AFAIK, right now this patch doesn't introduce any API/ABI breakage.
Inside struct rte_security_session_conf we have a union of xforms
depending on the session type.
So as long as cpu_crypto_xform doesn't exceed the size of the other xforms,
I believe no ABI breakage will appear.
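The union argument can be checked at build time. The sketch below uses simplified, hypothetical stand-ins for the session configuration (member names and sizes are invented, not the real rte_security_session_conf layout) and asserts that adding a new union member leaves the overall structure size unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for existing per-action xforms (sizes invented). */
struct ipsec_xform_v1 { uint32_t spi; uint32_t salt; uint64_t options; uint8_t pad[48]; };
struct pdcp_xform_v1  { uint8_t bearer; uint8_t domain; uint8_t pad[30]; };

/* Hypothetical new member for the CPU crypto action type. */
struct cpu_crypto_xform { uint32_t cipher_offset; uint32_t reserved[3]; };

enum action_type { ACT_INLINE_CRYPTO, ACT_LOOKASIDE_PROTO, ACT_CPU_CRYPTO };

/* Layout before the change. */
struct session_conf_old {
	enum action_type action_type;
	union {
		struct ipsec_xform_v1 ipsec;
		struct pdcp_xform_v1 pdcp;
	};
	void *crypto_xform;
	void *userdata;
};

/* Layout after the change: one more union member. */
struct session_conf_new {
	enum action_type action_type;
	union {
		struct ipsec_xform_v1 ipsec;
		struct pdcp_xform_v1 pdcp;
		struct cpu_crypto_xform cpuc; /* new member */
	};
	void *crypto_xform;
	void *userdata;
};

/* Build-time guards: if the new member grew the union, the size and the
 * offsets of the following fields would change and old binaries break. */
_Static_assert(sizeof(struct cpu_crypto_xform) <= sizeof(struct ipsec_xform_v1),
	       "cpu_crypto_xform would enlarge the union");
_Static_assert(sizeof(struct session_conf_new) == sizeof(struct session_conf_old),
	       "session conf size changed: ABI break");
_Static_assert(offsetof(struct session_conf_new, userdata) ==
	       offsetof(struct session_conf_old, userdata),
	       "field offsets changed: ABI break");
```

Alignment matters as much as size: a new member with a stricter alignment requirement could also shift the following fields, which the offsetof() check would catch.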


> 
> > Also right now there is no way to add new type of crypto_sym_session without
> > either breaking existing crypto-dev ABI/API or introducing new structure
> > (rte_crypto_sym_cpu_session or so) for that.
> 
> What extra info is required in rte_cryptodev_sym_session to get the rte_crypto_sym_cpu_session.

Right now, just cipher_offset (see above).
What else in the future (if any), I don't know.

> I don't think there is any.
> I believe the same crypto session will be able to work synchronously as well.

Exactly the same session would be problematic, see above.

> We would only need  a new API to perform synchronous actions.
> That will reduce the duplication code significantly
> in the driver to support 2 different kind of APIs with similar code inside.
> Please correct me in case I am missing something.

Adding a new API to crypto-dev would also require changes in the PMDs;
it wouldn't come totally free, and I believe it would require roughly the same amount of changes.

> 
> 
> > While rte_security is designed in a way that we can add new session types and
> > related parameters without causing API/ABI breakage.
> 
> Yes the intent is to add new sessions based on various protocols that can be supported by the driver.

Various protocols and different types of sessions (and the devices they belong to).
Let's say right now we have INLINE_CRYPTO, INLINE_PROTO, LOOKASIDE_PROTO, etc.
Here we introduce a new type of session.

> It is not that we should find it as an alternative to cryptodev and using it just because it will not cause
> ABI/API breakage.

I am not considering this new API as an alternative to the existing ones, but as an extension.
The existing crypto-op API has its own advantages (it is generic), and I think we should keep it supported by all crypto-devs.
On the other hand, rte_security is an extensible framework that suits the purpose:
it allows us to easily (and yes, without ABI breakage) introduce a new API for a special type of crypto-dev (SW based).


 


> IMO the code should be placed where its intent is.
> 
> >
> > BTW, what is your concern with proposed approach (via rte_security)?
> > From my perspective it is a lightweight change and it is totally optional
> > for the crypto PMDs to support it or not.
> > Konstantin
> >
> > > >
> > > > AESNI-GCM and AESNI-MB PMDs are updated with this support. There is a
> > small
> > > > performance test app under app/test/security_aesni_gcm(mb)_perftest to
> > > > prove.
> > > >
> > > > For the new API
> > > > The packet is sent to the crypto device for symmetric crypto
> > > > processing. The device will encrypt or decrypt the buffer based on the
> > session
> > > > data specified and preprocessed in the security session. Different
> > > > than the inline or lookaside modes, when the function exits, the user will
> > > > expect the buffers are either processed successfully, or having the error
> > number
> > > > assigned to the appropriate index of the status array.
> > > >
> > > > Will update the program's guide in the v1 patch.
> > > >
> > > > Regards,
> > > > Fan
> > > >
> > > > > -----Original Message-----
> > > > > From: Akhil Goyal [mailto:akhil.goyal@nxp.com]
> > > > > Sent: Wednesday, September 4, 2019 11:33 AM
> > > > > To: Zhang, Roy Fan <roy.fan.zhang@intel.com>; dev@dpdk.org
> > > > > Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Doherty,
> > Declan
> > > > > <declan.doherty@intel.com>; De Lara Guarch, Pablo
> > > > > <pablo.de.lara.guarch@intel.com>
> > > > > Subject: RE: [RFC PATCH 1/9] security: introduce CPU Crypto action type
> > and
> > > > > API
> > > > >
> > > > > Hi Fan,
> > > > >
> > > > > >
> > > > > > This patch introduce new RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO
> > > > > action
> > > > > > type to security library. The type represents performing crypto
> > > > > > operation with CPU cycles. The patch also includes a new API to
> > > > > > process crypto operations in bulk and the function pointers for PMDs.
> > > > > >
> > > > > I am not able to get the flow of execution for this action type. Could you
> > > > > please elaborate the flow in the documentation. If not in documentation
> > > > > right now, then please elaborate the flow in cover letter.
> > > > > Also I see that there are new APIs for processing crypto operations in bulk.
> > > > > What does that mean. How are they different from the existing APIs which
> > > > > are also handling bulk crypto ops depending on the budget.
> > > > >
> > > > >
> > > > > -Akhil
  
Ananyev, Konstantin Sept. 11, 2019, 1:01 p.m. UTC | #9
Hi lads,
> >
> > You are right, the new API will process the crypto workload, no heavy enqueue
> > Dequeue operations required.
> >
> > Cryptodev tends to support multiple crypto devices, including HW and SW.
> > The 3-cache line access, iova address computation and assignment, simulation
> > of async enqueue/dequeue operations, allocate and free crypto ops, even the
> > mbuf linked-list for scatter-gather buffers are too heavy for SW crypto PMDs.
> 
> Why cant we have a cryptodev synchronous API which work on plain bufs as your suggested
> API and use the same crypto sym_session creation logic as it was before? It will perform
> same as it is doing in this series.

I tried to summarize our reasons in another mail in that thread.

> 
> >
> > To create this new synchronous API in cryptodev cannot avoid the problem
> > listed above:  first the API shall not serve only to part of the crypto (SW) PMDs -
> > as you know, it is Cryptodev. The users can expect some PMD only support part
> > of the overall algorithms, but not the workload processing API.
> 
> Why cant we have an optional data path in cryptodev for synchronous behavior if the
> underlying PMD support it. It depends on the PMD to decide whether it can have it supported or not.
> Only a feature flag will be needed to decide that.
> One more option could be a PMD API which the application can directly call if the
> mode is only supported in very few PMDs. This could be a backup if there is a
> requirement of deprecation notice etc.
> 
> >
> > Another reason is, there is assumption made, first when creating a crypto op
> > we have to allocate the memory to hold crypto op + sym op + iv, - we cannot
> > simply declare an array of crypto ops in the run-time and discard it when
> > processing
> > is done. Also we need to fill aad and digest HW address, which is not required for
> > SW at all.
> 
> We are defining a new API which may have its own parameters and requirements which
> Need to be fulfilled. In case it was a rte_security API, then also you are defining a new way
> Of packet execution and API params. So it would be same.
> You can reduce the cache line accesses as you need in the new API.
> The session logic need not be changed from crypto session to security session.
> Only the data patch need to be altered as per the new API.
> 
> >
> > Bottom line: using crypto op will still have 3 cache-line access performance
> > problem.
> >
> > So if we to create the new API in Cryptodev instead of rte_security, we need to
> > create new crypto op structure only for the SW PMDs, carefully document them
> > to not confuse with existing cryptodev APIs, make new device feature flags to
> > indicate the API is not supported by some PMDs, and again carefully document
> > them of these device feature flags.
> 
> The explanation of the new API will also happen in case it is a security API. Instead you need
> to add more explanation for session also which is already there in cryptodev.
> 
> >
> > So, to push these changes to rte_security instead the above problem can be
> > resolved,
> > and the performance improvement because of this change is big for smaller
> > packets
> > - I attached a performance test app in the patchset.
> 
> I believe there wont be any perf gap in case the optimized new cryptodev API is used.
> 
> >
> > For rte_security, we already have inline-crypto type that works quite close to the
> > this
> > new API, the only difference is that it is processed by the CPU cycles. As you may
> > have already seen the ipsec-library has wrapped these changes, and ipsec-secgw
> > has only minimum updates to adopt this change too. So to the end user, if they
> > use IPSec this patchset can seamlessly enabled with just commandline update
> > when
> > creating an SA.
> 
> In the IPSec application I do not see the changes wrt the new execution API.
> So the data path is not getting handled there. It looks incomplete. The user experience
> to use the new API will definitely be changed.

I believe we do support it for librte_ipsec mode.
librte_ipsec hides all the processing complexity inside and
calls rte_security_process_cpu_crypto_bulk() internally.
That's why for librte_ipsec it is literally a 2-line change:
--- a/examples/ipsec-secgw/ipsec_process.c
+++ b/examples/ipsec-secgw/ipsec_process.c
@@ -101,7 +101,8 @@  fill_ipsec_session(struct rte_ipsec_session *ss, struct ipsec_ctx *ctx,
 		}
 		ss->crypto.ses = sa->crypto_session;
 	/* setup session action type */
-	} else if (sa->type == RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL) {
+	} else if (sa->type == RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL ||
+			sa->type == RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO) {
 		if (sa->sec_session == NULL) {
 			rc = create_lookaside_session(ctx, sa);
 			if (rc != 0)
@@ -227,8 +228,8 @@  ipsec_process(struct ipsec_ctx *ctx, struct ipsec_traffic *trf)
 
 		/* process packets inline */
 		else if (sa->type == RTE_SECURITY_ACTION_TYPE_INLINE_CRYPTO ||
-				sa->type ==
-				RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL) {
+			sa->type == RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL ||
+			sa->type == RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO) {
 
 			satp = rte_ipsec_sa_type(ips->sa);
  
Akhil Goyal Sept. 12, 2019, 2:12 p.m. UTC | #10
Hi Konstantin,

> Hi Akhil,
> > >
> > > > > This action type allows the burst of symmetric crypto workload using the
> > > same
> > > > > algorithm, key, and direction being processed by CPU cycles
> synchronously.
> > > > > This flexible action type does not require external hardware involvement,
> > > > > having the crypto workload processed synchronously, and is more
> > > performant
> > > > > than Cryptodev SW PMD due to the saved cycles on removed "async
> mode
> > > > > simulation" as well as 3 cacheline access of the crypto ops.
> > > >
> > > > Does that mean application will not call the cryptodev_enqueue_burst and
> > > corresponding dequeue burst.
> > >
> > > Yes, instead it just call rte_security_process_cpu_crypto_bulk(...)
> > >
> > > > It would be a new API something like process_packets and it will have the
> > > crypto processed packets while returning from the API?
> > >
> > > Yes, though the plan is that API will operate on raw data buffers, not mbufs.
> > >
> > > >
> > > > I still do not understand why we cannot do with the conventional crypto lib
> > > only.
> > > > As far as I can understand, you are not doing any protocol processing or
> any
> > > value add
> > > > To the crypto processing. IMO, you just need a synchronous crypto
> processing
> > > API which
> > > > Can be defined in cryptodev, you don't need to re-create a crypto session
> in
> > > the name of
> > > > Security session in the driver just to do a synchronous processing.
> > >
> > > I suppose your question is why not to have
> > > rte_crypto_process_cpu_crypto_bulk(...) instead?
> > > The main reason is that would require disruptive changes in existing
> cryptodev
> > > API
> > > (would cause ABI/API breakage).
> > > Session for  RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO need some extra
> > > information
> > > that normal crypto_sym_xform doesn't contain
> > > (cipher offset from the start of the buffer, might be something extra in
> future).
> >
> > Cipher offset will be part of rte_crypto_op.
> 
> fill/read (+ alloc/free) is one of the main things that slowdown current crypto-op
> approach.
> That's why the general idea - have all data that wouldn't change from packet to
> packet
> included into the session and setup it once at session_init().

I agree that you cannot use crypto-op.
You can have the new API in cryptodev.
As per the current patch, you only need cipher_offset, which you can pass as a parameter until
you get it approved in the crypto xform. I believe it will be beneficial for other crypto use cases as well.
We can have the cipher offset in both places (crypto-op and cipher_xform). That gives the user the flexibility to
override it.
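A minimal sketch of "cipher offset in both places, per-op value can override": the session carries the default, and a reserved sentinel in the per-op field means "no override". The names and the choice of UINT32_MAX as the sentinel are assumptions for illustration, not existing cryptodev fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel: "no per-operation override, use the session value". */
#define CIPHER_OFFSET_USE_SESSION UINT32_MAX

struct sess_cipher { uint32_t cipher_offset; }; /* set once at session init */
struct op_cipher   { uint32_t cipher_offset; }; /* optional per-op value */

/* Resolve the offset actually used for one operation. */
static uint32_t
effective_cipher_offset(const struct sess_cipher *s, const struct op_cipher *op)
{
	/* A per-op value wins only when it is present and not the sentinel. */
	if (op != NULL && op->cipher_offset != CIPHER_OFFSET_USE_SESSION)
		return op->cipher_offset;
	return s->cipher_offset;
}
```

The cost of this flexibility is one compare per operation; a PMD could still specialise the common "never overridden" case at session setup.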


> 
> > If you intend not to use rte_crypto_op
> > You can pass this as an argument in the new cryptodev API.
> 
> You mean extra parameter in rte_security_process_cpu_crypto_bulk()?
> It can be in theory, but that solution looks a bit ugly:
> 	why to pass for each call something that would be constant per session?
> 	Again having that value constant per session might allow some extra
> optimisations
> 	That would be hard to achieve for dynamic case.
> and not extendable:
> Suppose tomorrow will need to add something extra (some new algorithm
> support or so).
> With what you proposing will need to new parameter to the function,
> which means API breakage.
> 
> > Something extra will also cause ABI breakage in security as well.
> > So it will be same.
> 
> I don't think it would.
> AFAIK, right now this patch doesn't introduce any API/ABI breakage.
> Inside struct rte_security_session_conf we have a union of xforms
> depending on session type.
> So as long as cpu_crypto_xform wouldn't exceed sizes of other xform -
> I believe no ABI breakage will appear.
Agreed, it will not break the ABI in the security case as long as we do not exceed the current size.

Is avoiding an ABI/API breakage more important, or placing the code in the correct place?
We need to find a tradeoff. Others can comment on this.
@Thomas Monjalon, @De Lara Guarch, Pablo: any comments?

> 
> 
> >
> > > Also right now there is no way to add new type of crypto_sym_session
> without
> > > either breaking existing crypto-dev ABI/API or introducing new structure
> > > (rte_crypto_sym_cpu_session or so) for that.
> >
> > What extra info is required in rte_cryptodev_sym_session to get the
> rte_crypto_sym_cpu_session.
> 
> Right now - just cipher_offset (see above).
> What else in future (if any) - don't know.
> 
> > I don't think there is any.
> > I believe the same crypto session will be able to work synchronously as well.
> 
> Exactly the same - problematically, see above.
> 
> > We would only need  a new API to perform synchronous actions.
> > That will reduce the duplication code significantly
> > in the driver to support 2 different kind of APIs with similar code inside.
> > Please correct me in case I am missing something.
> 
> To add new API into crypto-dev would also require changes in the PMD,
> it wouldn't come totally free and I believe would require roughly the same
> amount of changes.

It will be required only in the PMDs which support it, and would be minimal.
You would need a feature flag and support for that synchronous API. The session information will
already be there in the session. The changes w.r.t. cipher_offset need to be added,
but with some default value to identify whether an override will be done or not.
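A minimal sketch of the feature-flag gating described above: the application checks once, at setup time, whether a device advertises the synchronous path and falls back to enqueue/dequeue otherwise. The flag bit, `struct dev_ops` and `toy_process_sync()` are hypothetical; no such RTE_CRYPTODEV_FF_* flag or PMD op is implied to exist.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature bit; not an existing RTE_CRYPTODEV_FF_* flag. */
#define DEV_FF_SYNC_CPU_CRYPTO (1ULL << 40)

struct dev_ops {
	uint64_t feature_flags;
	/* NULL when the PMD does not implement the synchronous path */
	uint32_t (*process_sync)(void *sess, void *bufs[], int status[], uint32_t num);
};

enum path { PATH_SYNC, PATH_ENQ_DEQ };

/* Placeholder synchronous handler for a SW PMD: marks every buffer done. */
static uint32_t
toy_process_sync(void *sess, void *bufs[], int status[], uint32_t num)
{
	uint32_t i;

	(void)sess;
	(void)bufs;
	for (i = 0; i < num; i++)
		status[i] = 0; /* pretend every buffer succeeded */
	return num;
}

/* Choose the data path once, at setup time, rather than per packet. */
static enum path
select_path(const struct dev_ops *ops)
{
	if ((ops->feature_flags & DEV_FF_SYNC_CPU_CRYPTO) && ops->process_sync != NULL)
		return PATH_SYNC;
	return PATH_ENQ_DEQ; /* fall back to enqueue_burst/dequeue_burst */
}
```

Requiring both the flag and a non-NULL handler guards against a PMD that advertises the feature without wiring it up.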

> 
> >
> >
> > > While rte_security is designed in a way that we can add new session types
> and
> > > related parameters without causing API/ABI breakage.
> >
> > Yes the intent is to add new sessions based on various protocols that can be
> supported by the driver.
> 
> Various protocols and different types of sessions (and devices they belong to).
> Let say right now we have INLINE_CRYPTO, INLINE_PROTO, LOOKASIDE_PROTO,
> etc.
> Here we introduce new type of session.

What is the new value added to the existing sessions? The changes we are doing
here are just to avoid an API/ABI breakage. Synchronous processing can happen on both
crypto and security sessions. This means only the processing API needs to be defined;
everything else should already be there in the sessions.
In all the other cases:
INLINE - the eth device did not have any format to perform a crypto op.
LOOKASIDE_PROTO - adds protocol-specific sessions, which are not available in crypto.

> 
> > It is not that we should find it as an alternative to cryptodev and using it just
> because it will not cause
> > ABI/API breakage.
> 
> I am considering this new API as an alternative to existing ones, but as an
> extension.
> Existing crypto-op API has its own advantages (generic), and I think we should
> keep it supported by all crypto-devs.
> From other side rte_security is an extendable framework that suits the purpose:
> allows easily (and yes without ABI breakage) introduce new API for special type
> of crypto-dev (SW based).
> 
> 

Adding a synchronous processing API is understandable and can be added in both
cryptodev and security, but a new action type for it is not required.
Whether supporting it causes an ABI/API breakage is a different issue,
and we may have to deal with it if there is no other option.

> 
> 
> 
> > IMO the code should be placed where its intent is.
> >
> > >
> > > BTW, what is your concern with proposed approach (via rte_security)?
> > > From my perspective it is a lightweight change and it is totally optional
> > > for the crypto PMDs to support it or not.
> > > Konstantin
> > >
> > > > >
> > > > > AESNI-GCM and AESNI-MB PMDs are updated with this support. There is
> a
> > > small
> > > > > performance test app under app/test/security_aesni_gcm(mb)_perftest
> to
> > > > > prove.
> > > > >
> > > > > For the new API
> > > > > The packet is sent to the crypto device for symmetric crypto
> > > > > processing. The device will encrypt or decrypt the buffer based on the
> > > session
> > > > > data specified and preprocessed in the security session. Different
> > > > > than the inline or lookaside modes, when the function exits, the user will
> > > > > expect the buffers are either processed successfully, or having the error
> > > number
> > > > > assigned to the appropriate index of the status array.
> > > > >
> > > > > Will update the program's guide in the v1 patch.
> > > > >
> > > > > Regards,
> > > > > Fan
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Akhil Goyal [mailto:akhil.goyal@nxp.com]
> > > > > > Sent: Wednesday, September 4, 2019 11:33 AM
> > > > > > To: Zhang, Roy Fan <roy.fan.zhang@intel.com>; dev@dpdk.org
> > > > > > Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Doherty,
> > > Declan
> > > > > > <declan.doherty@intel.com>; De Lara Guarch, Pablo
> > > > > > <pablo.de.lara.guarch@intel.com>
> > > > > > Subject: RE: [RFC PATCH 1/9] security: introduce CPU Crypto action
> type
> > > and
> > > > > > API
> > > > > >
> > > > > > Hi Fan,
> > > > > >
> > > > > > >
> > > > > > > This patch introduce new
> RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO
> > > > > > action
> > > > > > > type to security library. The type represents performing crypto
> > > > > > > operation with CPU cycles. The patch also includes a new API to
> > > > > > > process crypto operations in bulk and the function pointers for PMDs.
> > > > > > >
> > > > > > I am not able to get the flow of execution for this action type. Could
> you
> > > > > > please elaborate the flow in the documentation. If not in
> documentation
> > > > > > right now, then please elaborate the flow in cover letter.
> > > > > > Also I see that there are new APIs for processing crypto operations in
> bulk.
> > > > > > What does that mean. How are they different from the existing APIs
> which
> > > > > > are also handling bulk crypto ops depending on the budget.
> > > > > >
> > > > > >
> > > > > > -Akhil
  
Ananyev, Konstantin Sept. 16, 2019, 2:53 p.m. UTC | #11
Hi Akhil,

> > > > > > This action type allows the burst of symmetric crypto workload using the
> > > > same
> > > > > > algorithm, key, and direction being processed by CPU cycles
> > synchronously.
> > > > > > This flexible action type does not require external hardware involvement,
> > > > > > having the crypto workload processed synchronously, and is more
> > > > performant
> > > > > > than Cryptodev SW PMD due to the saved cycles on removed "async
> > mode
> > > > > > simulation" as well as 3 cacheline access of the crypto ops.
> > > > >
> > > > > Does that mean application will not call the cryptodev_enqueue_burst and
> > > > corresponding dequeue burst.
> > > >
> > > > Yes, instead it just call rte_security_process_cpu_crypto_bulk(...)
> > > >
> > > > > It would be a new API something like process_packets and it will have the
> > > > crypto processed packets while returning from the API?
> > > >
> > > > Yes, though the plan is that API will operate on raw data buffers, not mbufs.
> > > >
> > > > >
> > > > > I still do not understand why we cannot do with the conventional crypto lib
> > > > only.
> > > > > As far as I can understand, you are not doing any protocol processing or
> > any
> > > > value add
> > > > > To the crypto processing. IMO, you just need a synchronous crypto
> > processing
> > > > API which
> > > > > Can be defined in cryptodev, you don't need to re-create a crypto session
> > in
> > > > the name of
> > > > > Security session in the driver just to do a synchronous processing.
> > > >
> > > > I suppose your question is why not to have
> > > > rte_crypto_process_cpu_crypto_bulk(...) instead?
> > > > The main reason is that would require disruptive changes in existing
> > cryptodev
> > > > API
> > > > (would cause ABI/API breakage).
> > > > Session for  RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO need some extra
> > > > information
> > > > that normal crypto_sym_xform doesn't contain
> > > > (cipher offset from the start of the buffer, might be something extra in
> > future).
> > >
> > > Cipher offset will be part of rte_crypto_op.
> >
> > fill/read (+ alloc/free) is one of the main things that slowdown current crypto-op
> > approach.
> > That's why the general idea - have all data that wouldn't change from packet to
> > packet
> > included into the session and setup it once at session_init().
> 
> I agree that you cannot use crypto-op.
> You can have the new API in crypto.
> As per the current patch, you only need cipher_offset which you can have it as a parameter until
> You get it approved in the crypto xform. I believe it will be beneficial in case of other crypto cases as well.
> We can have cipher offset at both places(crypto-op and cipher_xform). It will give flexibility to the user to
> override it.

After giving your proposal another thought:
perhaps we can introduce new rte_crypto_sym_xform_types for CPU-related stuff here?
Let's say we have:
enum rte_crypto_sym_xform_type {
        RTE_CRYPTO_SYM_XFORM_NOT_SPECIFIED = 0, /**< No xform specified */
        RTE_CRYPTO_SYM_XFORM_AUTH,              /**< Authentication xform */
        RTE_CRYPTO_SYM_XFORM_CIPHER,            /**< Cipher xform  */
-       RTE_CRYPTO_SYM_XFORM_AEAD               /**< AEAD xform  */
+       RTE_CRYPTO_SYM_XFORM_AEAD,              /**< AEAD xform  */
+       RTE_CRYPTO_SYM_XFORM_CPU = INT32_MIN,
+       RTE_CRYPTO_SYM_XFORM_CPU_AEAD = (RTE_CRYPTO_SYM_XFORM_CPU | RTE_CRYPTO_SYM_XFORM_AEAD),
+       /* same for auth and cipher xforms */
};

Then we can either redefine some values in struct rte_crypto_aead_xform (via unions),
or even have a new struct rte_crypto_cpu_aead_xform (same for cipher and auth xforms).
Then, if a PMD wants to support the new sync API, it would need to recognize the new xform types,
and internally it might end up with a different session structure (one for sync, another for async mode).
I think that should allow us to introduce cpu_crypto as part of the crypto-dev API without ABI breakage.
What do you think?
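A minimal sketch of how a PMD might recognise such xform types in its session-configure path, reading RTE_CRYPTO_SYM_XFORM_CPU_AEAD as the CPU bit OR-ed with the AEAD type. The enum below is a hypothetical mirror of the proposal (these CPU values are not part of DPDK), and the bit tests are done on uint32_t to avoid signed bitwise arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of the proposal; the CPU values are NOT in DPDK. */
enum sym_xform_type {
	XFORM_NOT_SPECIFIED = 0,
	XFORM_AUTH,
	XFORM_CIPHER,
	XFORM_AEAD,
	XFORM_CPU = INT32_MIN, /* high bit marks a synchronous CPU xform */
	XFORM_CPU_AUTH = (XFORM_CPU | XFORM_AUTH),
	XFORM_CPU_CIPHER = (XFORM_CPU | XFORM_CIPHER),
	XFORM_CPU_AEAD = (XFORM_CPU | XFORM_AEAD),
};

/* Non-zero if the xform asks for the synchronous (CPU) session layout. */
static int
is_cpu_xform(enum sym_xform_type type)
{
	return ((uint32_t)type & UINT32_C(0x80000000)) != 0;
}

/* Strip the CPU bit to get the underlying auth/cipher/AEAD type. */
static int
base_xform(enum sym_xform_type type)
{
	return (int)((uint32_t)type & UINT32_C(0x7fffffff));
}
```

A PMD that does not know the new values would simply reject them as unsupported xform types, which is what keeps existing drivers unaffected.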
Konstantin 
 
> 
> >
> > > If you intend not to use rte_crypto_op
> > > You can pass this as an argument in the new cryptodev API.
> >
> > You mean extra parameter in rte_security_process_cpu_crypto_bulk()?
> > It can be in theory, but that solution looks a bit ugly:
> > 	why to pass for each call something that would be constant per session?
> > 	Again having that value constant per session might allow some extra
> > optimisations
> > 	That would be hard to achieve for dynamic case.
> > and not extendable:
> > Suppose tomorrow will need to add something extra (some new algorithm
> > support or so).
> > With what you proposing will need to new parameter to the function,
> > which means API breakage.
> >
> > > Something extra will also cause ABI breakage in security as well.
> > > So it will be same.
> >
> > I don't think it would.
> > AFAIK, right now this patch doesn't introduce any API/ABI breakage.
> > Inside struct rte_security_session_conf we have a union of xforms
> > depending on session type.
> > So as long as cpu_crypto_xform wouldn't exceed sizes of other xform -
> > I believe no ABI breakage will appear.
> Agreed, it will not break ABI in case of security till we do not exceed current size.
> 
> Saving an ABI/API breakage is more important or placing the code at the correct place.
> We need to find a tradeoff. Others can comment on this.
> @Thomas Monjalon, @De Lara Guarch, Pablo Any comments?
> 
> >
> >
> > >
> > > > Also right now there is no way to add new type of crypto_sym_session
> > without
> > > > either breaking existing crypto-dev ABI/API or introducing new structure
> > > > (rte_crypto_sym_cpu_session or so) for that.
> > >
> > > What extra info is required in rte_cryptodev_sym_session to get the
> > rte_crypto_sym_cpu_session.
> >
> > Right now - just cipher_offset (see above).
> > What else in future (if any) - don't know.
> >
> > > I don't think there is any.
> > > I believe the same crypto session will be able to work synchronously as well.
> >
> > Exactly the same - problematically, see above.
> >
> > > We would only need  a new API to perform synchronous actions.
> > > That will reduce the duplication code significantly
> > > in the driver to support 2 different kind of APIs with similar code inside.
> > > Please correct me in case I am missing something.
> >
> > To add new API into crypto-dev would also require changes in the PMD,
> > it wouldn't come totally free and I believe would require roughly the same
> > amount of changes.
> 
> It will be required only in the PMDs which support it and would be minimal.
> You would need a feature flag, support  for that synchronous API. Session information will
> already be there in the session. The changes wrt cipher_offset need to be added
> but with some default value to identify override will be done or not.
> 
> >
> > >
> > >
> > > > While rte_security is designed in a way that we can add new session types
> > and
> > > > related parameters without causing API/ABI breakage.
> > >
> > > Yes the intent is to add new sessions based on various protocols that can be
> > supported by the driver.
> >
> > Various protocols and different types of sessions (and devices they belong to).
> > Let say right now we have INLINE_CRYPTO, INLINE_PROTO, LOOKASIDE_PROTO,
> > etc.
> > Here we introduce new type of session.
> 
> What is the new value add to the existing sessions. The changes that we are doing
> here is just to avoid an API/ABI breakage. The synchronous processing can happen on both
> crypto and security session. This would mean, only the processing API should be defined,
> rest all should be already there in the sessions.
> In All other cases, INLINE - eth device was not having any format to perform crypto op
> LOOKASIDE - PROTO - add protocol specific sessions which is not available in crypto.
> 
> >
> > > It is not that we should find it as an alternative to cryptodev and using it just
> > because it will not cause
> > > ABI/API breakage.
> >
> > I am considering this new API as an alternative to existing ones, but as an
> > extension.
> > Existing crypto-op API has its own advantages (generic), and I think we should
> > keep it supported by all crypto-devs.
> > From other side rte_security is an extendable framework that suits the purpose:
> > allows easily (and yes without ABI breakage) introduce new API for special type
> > of crypto-dev (SW based).
> >
> >
> 
> Adding a synchronous processing API is understandable and can be added in both
> Crypto as well as Security, but a new action type for it is not required.
> Now whether to support that, we have ABI/API breakage, that is a different issue.
> And we may have to deal with it if no other option is there.
> 
> >
> >
> >
> > > IMO the code should be placed where its intent is.
> > >
> > > >
> > > > BTW, what is your concern with proposed approach (via rte_security)?
> > > > From my perspective it is a lightweight change and it is totally optional
> > > > for the crypto PMDs to support it or not.
> > > > Konstantin
> > > >
> > > > > >
> > > > > > AESNI-GCM and AESNI-MB PMDs are updated with this support. There is
> > a
> > > > small
> > > > > > performance test app under app/test/security_aesni_gcm(mb)_perftest
> > to
> > > > > > prove.
> > > > > >
> > > > > > For the new API
> > > > > > The packet is sent to the crypto device for symmetric crypto
> > > > > > processing. The device will encrypt or decrypt the buffer based on the
> > > > session
> > > > > > data specified and preprocessed in the security session. Different
> > > > > > than the inline or lookaside modes, when the function exits, the user will
> > > > > > expect the buffers are either processed successfully, or having the error
> > > > number
> > > > > > assigned to the appropriate index of the status array.
> > > > > >
> > > > > > Will update the program's guide in the v1 patch.
> > > > > >
> > > > > > Regards,
> > > > > > Fan
> > > > > >
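To make the flow Fan describes more concrete, here is a minimal C sketch of the synchronous bulk semantics: a single call, buffers processed in place, and a per-index status array that is fully filled before the call returns, with no enqueue/dequeue pair and no crypto op to allocate. All names here (cpu_crypto_sess, cpu_buf, cpu_crypto_process_bulk) are hypothetical and are not the RFC's rte_security_process_cpu_crypto_bulk() prototype. A toy XOR stands in for the real cipher, and the per-session cipher offset reflects the "extra session information" discussed in this thread.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical session: per-flow constants are set up once, not per call. */
struct cpu_crypto_sess {
	uint8_t key;          /* toy key for the XOR stand-in cipher */
	uint32_t cipher_off;  /* bytes to skip before ciphering starts */
};

/* Hypothetical raw-buffer descriptor: virtual address + length, no mbuf. */
struct cpu_buf {
	uint8_t *data;
	uint32_t len;
};

/*
 * Processes num buffers synchronously and in place. When the call returns,
 * every status[i] is 0 (success) or a negative errno value, and the return
 * value is the number of buffers processed successfully.
 */
uint32_t
cpu_crypto_process_bulk(const struct cpu_crypto_sess *sess,
		struct cpu_buf *bufs, int *status, uint32_t num)
{
	uint32_t i, j, ok = 0;

	for (i = 0; i < num; i++) {
		if (bufs[i].data == NULL || bufs[i].len < sess->cipher_off) {
			status[i] = -EINVAL;
			continue;
		}
		/* XOR stands in for the real cipher (e.g. AES-GCM). */
		for (j = sess->cipher_off; j < bufs[i].len; j++)
			bufs[i].data[j] ^= sess->key;
		status[i] = 0;
		ok++;
	}
	return ok;
}
```

For example, a three-buffer burst whose second entry has a NULL data pointer returns 2, with status[1] set to -EINVAL and the other two buffers already processed when the call returns.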
  
Ananyev, Konstantin Sept. 16, 2019, 3:08 p.m. UTC | #12
> Hi Akhil,
> 
> > > > > > > This action type allows the burst of symmetric crypto workload using the
> > > > > > > same algorithm, key, and direction being processed by CPU cycles
> > > > > > > synchronously.
> > > > > > > This flexible action type does not require external hardware involvement,
> > > > > > > having the crypto workload processed synchronously, and is more
> > > > > > > performant than Cryptodev SW PMD due to the saved cycles on removed "async
> > > > > > > mode simulation" as well as 3 cacheline access of the crypto ops.
> > > > > >
> > > > > > Does that mean application will not call the cryptodev_enqueue_burst and
> > > > > > corresponding dequeue burst.
> > > > >
> > > > > Yes, instead it just call rte_security_process_cpu_crypto_bulk(...)
> > > > >
> > > > > > It would be a new API something like process_packets and it will have the
> > > > > > crypto processed packets while returning from the API?
> > > > >
> > > > > Yes, though the plan is that API will operate on raw data buffers, not mbufs.
> > > > >
> > > > > >
> > > > > > I still do not understand why we cannot do with the conventional crypto lib
> > > > > > only.
> > > > > > As far as I can understand, you are not doing any protocol processing or
> > > > > > any value add
> > > > > > To the crypto processing. IMO, you just need a synchronous crypto
> > > > > > processing API which
> > > > > > Can be defined in cryptodev, you don't need to re-create a crypto session
> > > > > > in the name of
> > > > > > Security session in the driver just to do a synchronous processing.
> > > > >
> > > > > I suppose your question is why not to have
> > > > > rte_crypto_process_cpu_crypto_bulk(...) instead?
> > > > > The main reason is that would require disruptive changes in existing
> > > > > cryptodev API
> > > > > (would cause ABI/API breakage).
> > > > > Session for RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO need some extra
> > > > > information
> > > > > that normal crypto_sym_xform doesn't contain
> > > > > (cipher offset from the start of the buffer, might be something extra in
> > > > > future).
> > > >
> > > > Cipher offset will be part of rte_crypto_op.
> > >
> > > fill/read (+ alloc/free) is one of the main things that slowdown current crypto-op
> > > approach.
> > > That's why the general idea - have all data that wouldn't change from packet to
> > > packet
> > > included into the session and setup it once at session_init().
> >
> > I agree that you cannot use crypto-op.
> > You can have the new API in crypto.
> > As per the current patch, you only need cipher_offset which you can have it as a parameter until
> > You get it approved in the crypto xform. I believe it will be beneficial in case of other crypto cases as well.
> > We can have cipher offset at both places(crypto-op and cipher_xform). It will give flexibility to the user to
> > override it.
> 
> After having another thought on your proposal:
> Probably we can introduce new rte_crypto_sym_xform_types for CPU related stuff here?
> Let say we can have :
> enum rte_crypto_sym_xform_type {
>         RTE_CRYPTO_SYM_XFORM_NOT_SPECIFIED = 0, /**< No xform specified */
>         RTE_CRYPTO_SYM_XFORM_AUTH,              /**< Authentication xform */
>         RTE_CRYPTO_SYM_XFORM_CIPHER,            /**< Cipher xform  */
>         RTE_CRYPTO_SYM_XFORM_AEAD               /**< AEAD xform  */
> +     RTE_CRYPTO_SYM_XFORM_CPU = INT32_MIN,
> +    RTE_CRYPTO_SYM_XFORM_CPU_AEAD = (RTE_CRYPTO_SYM_XFORM_CPU | RTE_CRYPTO_SYM_XFORM_CPU),
I meant, of course:
RTE_CRYPTO_SYM_XFORM_CPU_AEAD = (RTE_CRYPTO_SYM_XFORM_CPU | RTE_CRYPTO_SYM_XFORM_AEAD),

>       /* same for auth and crypto xforms */
> };
> 
> Then we either can re-define some values in struct rte_crypto_aead_xform (via unions),
> or even have new  struct rte_crypto_cpu_aead_xform (same for crypto and auth xforms).
> Then if PMD wants to support new sync API it would need to recognize new xform types
> and internally  it might end up with different session structure (one for sync, another for async mode).
> That I think should allow us to introduce cpu_crypto as part of crypto-dev API without ABI breakage.
> What do you think?
> Konstantin
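Konstantin's enum proposal, using the corrected CPU_AEAD definition, can be sketched as follows. The identifiers are hypothetical stand-ins for the RTE_CRYPTO_SYM_XFORM_* values, not the actual DPDK enum. The sketch shows that one high flag bit marks the CPU/sync variants, so the existing values keep their numbers and a PMD can recover the base xform type with a mask.

```c
#include <stdint.h>

/*
 * Hypothetical mirror of the proposed rte_crypto_sym_xform_type extension.
 * Existing values are unchanged; the sign bit (INT32_MIN) marks CPU/sync.
 */
enum sym_xform_type {
	XFORM_NOT_SPECIFIED = 0,
	XFORM_AUTH,
	XFORM_CIPHER,
	XFORM_AEAD,
	XFORM_CPU = INT32_MIN,
	XFORM_CPU_AEAD = XFORM_CPU | XFORM_AEAD,  /* corrected definition */
};

/* Nonzero if the xform requests the synchronous CPU path. */
int xform_is_cpu(enum sym_xform_type t)
{
	return ((uint32_t)t & (uint32_t)XFORM_CPU) != 0;
}

/* Strips the CPU flag, yielding the classic AUTH/CIPHER/AEAD type. */
enum sym_xform_type xform_base_type(enum sym_xform_type t)
{
	return (enum sym_xform_type)((uint32_t)t & ~(uint32_t)XFORM_CPU);
}
```

Because the flag occupies the sign bit, the masking is done on uint32_t values so that the bit operations are well defined regardless of the enum's signed representation.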
> 
> >
> > >
> > > > If you intend not to use rte_crypto_op
> > > > You can pass this as an argument in the new cryptodev API.
> > >
> > > You mean extra parameter in rte_security_process_cpu_crypto_bulk()?
> > > It can be in theory, but that solution looks a bit ugly:
> > > 	why to pass for each call something that would be constant per session?
> > > 	Again having that value constant per session might allow some extra
> > > optimisations
> > > 	That would be hard to achieve for dynamic case.
> > > and not extendable:
> > > Suppose tomorrow will need to add something extra (some new algorithm
> > > support or so).
> > > With what you proposing will need to new parameter to the function,
> > > which means API breakage.
> > >
> > > > Something extra will also cause ABI breakage in security as well.
> > > > So it will be same.
> > >
> > > I don't think it would.
> > > AFAIK, right now this patch doesn't introduce any API/ABI breakage.
> > > Inside struct rte_security_session_conf we have a union of xforms
> > > depending on session type.
> > > So as long as cpu_crypto_xform wouldn't exceed sizes of other xform -
> > > I believe no ABI breakage will appear.
> > Agreed, it will not break ABI in case of security till we do not exceed current size.
> >
> > Saving an ABI/API breakage is more important or placing the code at the correct place.
> > We need to find a tradeoff. Others can comment on this.
> > @Thomas Monjalon, @De Lara Guarch, Pablo Any comments?
> >
> > >
> > >
> > > >
> > > > > Also right now there is no way to add new type of crypto_sym_session
> > > > > without either breaking existing crypto-dev ABI/API or introducing new structure
> > > > > (rte_crypto_sym_cpu_session or so) for that.
> > > >
> > > > What extra info is required in rte_cryptodev_sym_session to get the
> > > > rte_crypto_sym_cpu_session.
> > >
> > > Right now - just cipher_offset (see above).
> > > What else in future (if any) - don't know.
> > >
> > > > I don't think there is any.
> > > > I believe the same crypto session will be able to work synchronously as well.
> > >
> > > Exactly the same - problematically, see above.
> > >
> > > > We would only need  a new API to perform synchronous actions.
> > > > That will reduce the duplication code significantly
> > > > in the driver to support 2 different kind of APIs with similar code inside.
> > > > Please correct me in case I am missing something.
> > >
> > > To add new API into crypto-dev would also require changes in the PMD,
> > > it wouldn't come totally free and I believe would require roughly the same
> > > amount of changes.
> >
> > It will be required only in the PMDs which support it and would be minimal.
> > You would need a feature flag, support  for that synchronous API. Session information will
> > already be there in the session. The changes wrt cipher_offset need to be added
> > but with some default value to identify override will be done or not.
> >
> > >
> > > >
> > > >
> > > > > While rte_security is designed in a way that we can add new session types
> > > > > and related parameters without causing API/ABI breakage.
> > > >
> > > > Yes the intent is to add new sessions based on various protocols that can be
> > > > supported by the driver.
> > >
> > > Various protocols and different types of sessions (and devices they belong to).
> > > Let say right now we have INLINE_CRYPTO, INLINE_PROTO, LOOKASIDE_PROTO,
> > > etc.
> > > Here we introduce new type of session.
> >
> > What is the new value add to the existing sessions. The changes that we are doing
> > here is just to avoid an API/ABI breakage. The synchronous processing can happen on both
> > crypto and security session. This would mean, only the processing API should be defined,
> > rest all should be already there in the sessions.
> > In All other cases, INLINE - eth device was not having any format to perform crypto op
> > LOOKASIDE - PROTO - add protocol specific sessions which is not available in crypto.
> >
> > >
> > > > It is not that we should find it as an alternative to cryptodev and using it just
> > > > because it will not cause
> > > > ABI/API breakage.
> > >
> > > I am not considering this new API as an alternative to existing ones, but as an
> > > extension.
> > > Existing crypto-op API has its own advantages (generic), and I think we should
> > > keep it supported by all crypto-devs.
> > > From other side rte_security is an extendable framework that suits the purpose:
> > > allows easily (and yes without ABI breakage) introduce new API for special type
> > > of crypto-dev (SW based).
> > >
> > >
> >
> > Adding a synchronous processing API is understandable and can be added in both
> > Crypto as well as Security, but a new action type for it is not required.
> > Now whether to support that, we have ABI/API breakage, that is a different issue.
> > And we may have to deal with it if no other option is there.
> >
> > >
> > >
> > >
> > > > IMO the code should be placed where its intent is.
> > > >
> > > > >
> > > > > BTW, what is your concern with proposed approach (via rte_security)?
> > > > > From my perspective it is a lightweight change and it is totally optional
> > > > > for the crypto PMDs to support it or not.
> > > > > Konstantin
> > > > >
> > > > > > >
> > > > > > > AESNI-GCM and AESNI-MB PMDs are updated with this support. There is a
> > > > > > > small performance test app under app/test/security_aesni_gcm(mb)_perftest
> > > > > > > to prove.
> > > > > > >
> > > > > > > For the new API
> > > > > > > The packet is sent to the crypto device for symmetric crypto
> > > > > > > processing. The device will encrypt or decrypt the buffer based on the
> > > > > > > session data specified and preprocessed in the security session. Different
> > > > > > > than the inline or lookaside modes, when the function exits, the user will
> > > > > > > expect the buffers are either processed successfully, or having the error
> > > > > > > number assigned to the appropriate index of the status array.
> > > > > > >
> > > > > > > Will update the program's guide in the v1 patch.
> > > > > > >
> > > > > > > Regards,
> > > > > > > Fan
> > > > > > >
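Akhil suggests keeping a cipher offset both in the session (cipher_xform) and per operation, with a default value that signals whether an override is requested. One way to express that is sketched below. The sentinel value and all names are assumptions made for illustration and do not come from the patch or from cryptodev.

```c
#include <stdint.h>

/* Hypothetical sentinel: "no per-call override, use the session value". */
#define CIPHER_OFF_USE_SESSION UINT32_MAX

/* Hypothetical session holding the offset configured once at session init. */
struct sync_sess {
	uint32_t cipher_off;
};

/*
 * Resolves the offset for one buffer: an explicit per-call value wins,
 * otherwise the per-session default applies.
 */
uint32_t resolve_cipher_off(const struct sync_sess *sess, uint32_t call_off)
{
	return call_off == CIPHER_OFF_USE_SESSION ? sess->cipher_off : call_off;
}
```

This variant still carries Konstantin's objection: the per-call argument, and the check on it, are paid on every call even when the session value is used.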
  
Akhil Goyal Sept. 17, 2019, 6:02 a.m. UTC | #13
Hi Konstantin,
> 
> Hi Akhil,
> 
> After having another thought on your proposal:
> Probably we can introduce new rte_crypto_sym_xform_types for CPU related
> stuff here?

I also thought of adding new xforms, but that may not serve the purpose in all cases.
You would still need all the information currently available in the existing xforms,
so if you add new fields to the new xform, its size will exceed that of the union of xforms
and the ABI breakage would still be there.

If you think the AEAD xform can be validly compressed, then the same can be done for each of the
other xforms, and we would have a solution to this issue.
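Akhil's size argument can be checked mechanically with sizeof. The structs below are hypothetical and heavily simplified, not the real rte_crypto_*_xform layouts. Because a union is as large as its largest member, an xform that keeps every AEAD field and appends a cipher offset (option A) enlarges the union. An xform that reuses an existing field (option B, in the spirit of the iv.offset idea raised later in the thread) leaves its size unchanged, and with it the layout of anything that embeds the union.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the existing xform structs. */
struct cipher_xf {
	const uint8_t *key;
	uint16_t key_len;
	uint16_t iv_offset;
	uint16_t iv_len;
};

struct aead_xf {
	const uint8_t *key;
	uint16_t key_len;
	uint16_t iv_offset;
	uint16_t iv_len;
	uint16_t digest_len;
	uint16_t aad_len;
};

/* Existing layout: the union is as large as its largest member (aead). */
union xf_v1 {
	struct cipher_xf cipher;
	struct aead_xf aead;
};

/* Option A: a new xform needing every AEAD field plus one extra field. */
struct cpu_aead_xf_a {
	struct aead_xf aead;
	uint32_t cipher_offset;
};

union xf_v2a {
	struct cipher_xf cipher;
	struct aead_xf aead;
	struct cpu_aead_xf_a cpu_aead;  /* larger than aead: union grows */
};

/* Option B: no new field; an existing one (iv_offset) is reinterpreted. */
union xf_v2b {
	struct cipher_xf cipher;
	struct aead_xf aead;
	struct aead_xf cpu_aead;        /* same size as aead: union unchanged */
};

/* Compile-time guard a library could keep to catch accidental growth. */
_Static_assert(sizeof(union xf_v2b) == sizeof(union xf_v1),
	       "option B must not change the union size");
```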

> Let say we can have :
> enum rte_crypto_sym_xform_type {
>         RTE_CRYPTO_SYM_XFORM_NOT_SPECIFIED = 0, /**< No xform specified */
>         RTE_CRYPTO_SYM_XFORM_AUTH,              /**< Authentication xform */
>         RTE_CRYPTO_SYM_XFORM_CIPHER,            /**< Cipher xform  */
>         RTE_CRYPTO_SYM_XFORM_AEAD               /**< AEAD xform  */
> +     RTE_CRYPTO_SYM_XFORM_CPU = INT32_MIN,
> +    RTE_CRYPTO_SYM_XFORM_CPU_AEAD = (RTE_CRYPTO_SYM_XFORM_CPU | RTE_CRYPTO_SYM_XFORM_CPU),

Instead of CPU, I believe SYNC would be a better name.

>       /* same for auth and crypto xforms */
> };
> 
> Then we either can re-define some values in struct rte_crypto_aead_xform (via
> unions),
> or even have new  struct rte_crypto_cpu_aead_xform (same for crypto and auth
> xforms).
> Then if PMD wants to support new sync API it would need to recognize new
> xform types
> and internally  it might end up with different session structure (one for sync,
> another for async mode).
> That I think should allow us to introduce cpu_crypto as part of crypto-dev API
> without ABI breakage.
> What do you think?
> Konstantin
> 
  
Ananyev, Konstantin Sept. 18, 2019, 7:44 a.m. UTC | #14
Hi Akhil,

> > After having another thought on your proposal:
> > Probably we can introduce new rte_crypto_sym_xform_types for CPU related
> > stuff here?
> 
> I also thought of adding new xforms, but that wont serve the purpose for may be all the cases.
> You would be needing all information currently available in the current xforms.
> So if you are adding new fields in the new xform, the size will be more than that of the union of xforms.
> ABI breakage would still be there.
> 
> If you think a valid compression of the AEAD xform can be done, then that can be done for each of the
> Xforms and we can have a solution to this issue.

I think that we can re-use iv.offset for our purposes (as the cipher offset).
So for now we can make that path work without any ABI breakage.
Fan, please feel free to correct me if I missed something.
If in the future we need to add some extra information, it might
require ABI breakage, though right now I don't envision anything in particular to add.
Anyway, if there is no objection to going that way, we can try to make
these changes for v2.
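Here is a rough sketch of the iv.offset idea. At session init the PMD copies what it needs from the xform once, reading iv.offset as the cipher offset for the CPU/sync path, so nothing per-flow has to be passed on each call. In the asynchronous API, iv.offset locates the IV relative to the start of the rte_crypto_op, which a raw-buffer synchronous call does not have; that is presumably why the field is free to reuse. The trimmed xform and session structs are hypothetical. Only the iv.offset/iv.length, digest_length and aad_length field names mirror rte_crypto_aead_xform.

```c
#include <stdint.h>

/* Hypothetical, trimmed AEAD xform: only the fields this sketch needs. */
struct aead_xform_lite {
	struct {
		uint16_t offset;  /* async: IV offset from start of crypto op */
		uint16_t length;
	} iv;
	uint16_t digest_length;
	uint16_t aad_length;
};

/* Hypothetical sync-session state, filled once at session init. */
struct sync_aead_sess {
	uint32_t cipher_offset;
	uint16_t iv_length;
	uint16_t digest_length;
	uint16_t aad_length;
};

/*
 * For a CPU/sync xform there is no crypto op to hold the IV, so this
 * sketch reinterprets iv.offset as the cipher offset into the raw buffer.
 */
void sync_aead_sess_init(struct sync_aead_sess *s,
		const struct aead_xform_lite *xf)
{
	s->cipher_offset = xf->iv.offset;
	s->iv_length = xf->iv.length;
	s->digest_length = xf->digest_length;
	s->aad_length = xf->aad_length;
}
```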

> 
> > Let say we can have :
> > enum rte_crypto_sym_xform_type {
> >         RTE_CRYPTO_SYM_XFORM_NOT_SPECIFIED = 0, /**< No xform specified */
> >         RTE_CRYPTO_SYM_XFORM_AUTH,              /**< Authentication xform */
> >         RTE_CRYPTO_SYM_XFORM_CIPHER,            /**< Cipher xform  */
> >         RTE_CRYPTO_SYM_XFORM_AEAD               /**< AEAD xform  */
> > +     RTE_CRYPTO_SYM_XFORM_CPU = INT32_MIN,
> > +    RTE_CRYPTO_SYM_XFORM_CPU_AEAD = (RTE_CRYPTO_SYM_XFORM_CPU | RTE_CRYPTO_SYM_XFORM_CPU),
> 
> Instead of CPU I believe SYNC would be better.

I don't mind renaming it to SYNC, but I'd like to point out
that it is really more of a CPU API than a generic SYNC API
(it doesn't pass IOVAs for data buffers, only VAs).

> 
> >       /* same for auth and crypto xforms */
> > };
> >
> > Then we either can re-define some values in struct rte_crypto_aead_xform (via
> > unions),
> > or even have new  struct rte_crypto_cpu_aead_xform (same for crypto and auth
> > xforms).
> > Then if PMD wants to support new sync API it would need to recognize new
> > xform types
> > and internally  it might end up with different session structure (one for sync,
> > another for async mode).
> > That I think should allow us to introduce cpu_crypto as part of crypto-dev API
> > without ABI breakage.
> > What do you think?
> > Konstantin
> >
> > >
> > > >
> > > > > If you intend not to use rte_crypto_op
> > > > > You can pass this as an argument in the new cryptodev API.
> > > >
> > > > You mean extra parameter in rte_security_process_cpu_crypto_bulk()?
> > > > It can be in theory, but that solution looks a bit ugly:
> > > > 	why to pass for each call something that would be constant per session?
> > > > 	Again having that value constant per session might allow some extra
> > > > optimisations
> > > > 	That would be hard to achieve for dynamic case.
> > > > and not extendable:
> > > > Suppose tomorrow will need to add something extra (some new algorithm
> > > > support or so).
> > > > With what you proposing will need to new parameter to the function,
> > > > which means API breakage.
> > > >
> > > > > Something extra will also cause ABI breakage in security as well.
> > > > > So it will be same.
> > > >
> > > > I don't think it would.
> > > > AFAIK, right now this patch doesn't introduce any API/ABI breakage.
> > > > Inside struct rte_security_session_conf we have a union of xforms
> > > > depending on session type.
> > > > So as long as cpu_crypto_xform doesn't exceed the size of the other xforms,
> > > > I believe no ABI breakage will appear.
> > > Agreed, it will not break ABI in case of security as long as we do not exceed
> > > the current size.
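The no-ABI-breakage argument rests on the fact that adding a union member only changes the union's layout if that member is larger or more strictly aligned than the existing ones. A compile-time guard along these lines could enforce it; the member types below are hypothetical stand-ins, not the real `rte_security_session_conf` xforms.

```c
#include <stdint.h>

/* Hypothetical stand-in for an existing per-action-type xform. */
struct sketch_ipsec_xform {
	uint32_t spi;
	uint32_t salt;
	uint64_t options;
	uint8_t reserved[32];
};

/* Hypothetical new member for the CPU crypto action type. */
struct sketch_cpu_crypto_xform {
	uint16_t cipher_offset;
	uint8_t reserved[6];
};

/* The union before the new member was added. */
union sketch_conf_old {
	struct sketch_ipsec_xform ipsec;
};

/* The union after adding the new member. */
union sketch_conf_new {
	struct sketch_ipsec_xform ipsec;
	struct sketch_cpu_crypto_xform cpu_crypto;
};

/* Compilation fails if the new member would grow or re-align the union,
 * i.e. if it would change the layout of any struct embedding it. */
_Static_assert(sizeof(union sketch_conf_new) == sizeof(union sketch_conf_old),
	       "new xform must not grow the union");
_Static_assert(_Alignof(union sketch_conf_new) == _Alignof(union sketch_conf_old),
	       "new xform must not change the union alignment");
```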
> > >
> > > Which is more important: avoiding an ABI/API breakage, or placing the code in
> > > the correct place? We need to find a tradeoff. Others can comment on this.
> > > @Thomas Monjalon, @De Lara Guarch, Pablo Any comments?
> > >
> > > >
> > > >
> > > > >
> > > > > > Also right now there is no way to add a new type of crypto_sym_session
> > > > > > without either breaking the existing crypto-dev ABI/API or introducing a
> > > > > > new structure (rte_crypto_sym_cpu_session or so) for that.
> > > > >
> > > > > What extra info is required in rte_cryptodev_sym_session to get the
> > > > > rte_crypto_sym_cpu_session?
> > > >
> > > > Right now - just cipher_offset (see above).
> > > > What else in future (if any) - don't know.
> > > >
> > > > > I don't think there is any.
> > > > > I believe the same crypto session will be able to work synchronously as well.
> > > >
> > > > Using exactly the same session would be problematic, see above.
> > > >
> > > > > We would only need a new API to perform synchronous actions.
> > > > > That will significantly reduce the code duplication in the driver
> > > > > that supporting 2 different kinds of APIs with similar code inside would cause.
> > > > > Please correct me in case I am missing something.
> > > >
> > > > Adding a new API to crypto-dev would also require changes in the PMD;
> > > > it wouldn't come totally free, and I believe it would require roughly the
> > > > same amount of changes.
> > >
> > > It will be required only in the PMDs which support it, and would be minimal.
> > > You would need a feature flag and support for that synchronous API. Session
> > > information will already be there in the session. The changes wrt cipher_offset
> > > need to be added, but with some default value to identify whether the override
> > > will be done or not.
> > >
> > > >
> > > > >
> > > > >
> > > > > > While rte_security is designed in a way that we can add new session types
> > > > > > and related parameters without causing API/ABI breakage.
> > > > >
> > > > > Yes, the intent is to add new sessions based on various protocols that can
> > > > > be supported by the driver.
> > > >
> > > > Various protocols and different types of sessions (and devices they belong to).
> > > > Let's say right now we have INLINE_CRYPTO, INLINE_PROTO, LOOKASIDE_PROTO,
> > > > etc.
> > > > Here we introduce a new type of session.
> > >
> > > What is the added value over the existing sessions? The changes that we are
> > > doing here are just to avoid an API/ABI breakage. The synchronous processing can
> > > happen on both crypto and security sessions. This would mean that only the
> > > processing API should be defined; everything else should already be there in
> > > the sessions.
> > > In all other cases:
> > > INLINE - the eth device did not have any format to perform crypto ops.
> > > LOOKASIDE_PROTO - adds protocol-specific sessions which are not available in crypto.
> > >
> > > >
> > > > > It is not that we should treat it as an alternative to cryptodev and use it
> > > > > just because it will not cause ABI/API breakage.
> > > >
> > > > I am not considering this new API as an alternative to the existing ones, but
> > > > as an extension.
> > > > The existing crypto-op API has its own advantages (it is generic), and I think
> > > > we should keep it supported by all crypto-devs.
> > > > On the other side, rte_security is an extendable framework that suits the
> > > > purpose: it allows us to easily (and yes, without ABI breakage) introduce a new
> > > > API for a special type of crypto-dev (SW based).
> > > >
> > > >
> > >
> > > Adding a synchronous processing API is understandable and can be done in both
> > > crypto and security, but a new action type for it is not required.
> > > Whether supporting it causes ABI/API breakage is a different issue,
> > > and we may have to deal with it if no other option is there.
> > >
> > > >
> > > >
> > > >
> > > > > IMO the code should be placed where its intent is.
> > > > >
> > > > > >
> > > > > > BTW, what is your concern with the proposed approach (via rte_security)?
> > > > > > From my perspective it is a lightweight change and it is totally optional
> > > > > > for the crypto PMDs to support it or not.
> > > > > > Konstantin
> > > > > >
> > > > > > > >
> > > > > > > > AESNI-GCM and AESNI-MB PMDs are updated with this support. There is a
> > > > > > > > small performance test app under app/test/security_aesni_gcm(mb)_perftest
> > > > > > > > to prove it.
> > > > > > > >
> > > > > > > > For the new API:
> > > > > > > > The packet is sent to the crypto device for symmetric crypto
> > > > > > > > processing. The device will encrypt or decrypt the buffer based on the
> > > > > > > > session data specified and preprocessed in the security session. Unlike
> > > > > > > > the inline or lookaside modes, when the function exits, the user can
> > > > > > > > expect each buffer to be either processed successfully or to have an
> > > > > > > > error number assigned to the appropriate index of the status array.
> > > > > > > >
> > > > > > > > Will update the programmer's guide in the v1 patch.
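Fan's description implies a per-buffer contract: on return, every buffer has either been processed or has an error code in the matching slot of the status array. A minimal C sketch of that contract follows, with a hypothetical session type and a toy XOR transform standing in for the real cipher (it is not cryptography). Returning the success count is an assumption of this sketch; the later draft typedef `rte_crypto_cpu_sym_process_t` returns void.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical session: per-session constants set up once at init. */
struct sketch_session {
	uint8_t key_byte;   /* toy "key"; a real PMD holds expanded keys */
	size_t min_len;     /* buffers shorter than this are rejected */
};

/* Hypothetical raw buffer descriptor (the RFC works on raw buffers). */
struct sketch_buf {
	uint8_t *data;
	size_t len;
};

/*
 * Synchronous bulk processing: when this returns, status[i] is 0 if buf[i]
 * was transformed in place, or a negative errno value otherwise.
 * Returns the number of successfully processed buffers.
 */
static uint32_t
sketch_process_bulk(const struct sketch_session *sess,
		    struct sketch_buf buf[], int status[], uint32_t num)
{
	uint32_t i, ok = 0;

	for (i = 0; i != num; i++) {
		if (buf[i].data == NULL || buf[i].len < sess->min_len) {
			status[i] = -EINVAL;
			continue;
		}
		/* Toy stand-in for encrypt/decrypt: XOR every byte. */
		for (size_t j = 0; j != buf[i].len; j++)
			buf[i].data[j] ^= sess->key_byte;
		status[i] = 0;
		ok++;
	}
	return ok;
}
```

Because there is no enqueue/dequeue pair, a failure in one buffer does not stall the burst: the caller simply scans `status[]` after the call.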
> > > > > > > >
> > > > > > > > Regards,
> > > > > > > > Fan
> > > > > > > >
> > > > > > > > > -----Original Message-----
> > > > > > > > > From: Akhil Goyal [mailto:akhil.goyal@nxp.com]
> > > > > > > > > Sent: Wednesday, September 4, 2019 11:33 AM
> > > > > > > > > To: Zhang, Roy Fan <roy.fan.zhang@intel.com>; dev@dpdk.org
> > > > > > > > > Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Doherty, Declan
> > > > > > > > > <declan.doherty@intel.com>; De Lara Guarch, Pablo
> > > > > > > > > <pablo.de.lara.guarch@intel.com>
> > > > > > > > > Subject: RE: [RFC PATCH 1/9] security: introduce CPU Crypto action type and API
> > > > > > > > >
> > > > > > > > > Hi Fan,
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > This patch introduces a new RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO action
> > > > > > > > > > type to the security library. The type represents performing crypto
> > > > > > > > > > operations with CPU cycles. The patch also includes a new API to
> > > > > > > > > > process crypto operations in bulk and the function pointers for PMDs.
> > > > > > > > > >
> > > > > > > > > I am not able to get the flow of execution for this action type. Could you
> > > > > > > > > please elaborate the flow in the documentation? If not in the documentation
> > > > > > > > > right now, then please elaborate the flow in the cover letter.
> > > > > > > > > Also I see that there are new APIs for processing crypto operations in bulk.
> > > > > > > > > What does that mean? How are they different from the existing APIs, which
> > > > > > > > > also handle bulk crypto ops depending on the budget?
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > -Akhil
  
Ananyev, Konstantin Sept. 25, 2019, 6:24 p.m. UTC | #15
> > > > > > > > > This action type allows a burst of symmetric crypto workload using the
> > > > > > > > > same algorithm, key, and direction to be processed by CPU cycles
> > > > > > > > > synchronously.
> > > > > > > > > This flexible action type does not require external hardware involvement,
> > > > > > > > > having the crypto workload processed synchronously, and is more performant
> > > > > > > > > than Cryptodev SW PMD due to the cycles saved on the removed "async mode
> > > > > > > > > simulation" as well as 3 cacheline accesses of the crypto ops.
> > > > > > > >
> > > > > > > > Does that mean the application will not call cryptodev_enqueue_burst and
> > > > > > > > the corresponding dequeue burst?
> > > > > > >
> > > > > > > Yes, instead it just calls rte_security_process_cpu_crypto_bulk(...)
> > > > > > >
> > > > > > > > It would be a new API, something like process_packets, and it will have
> > > > > > > > the crypto-processed packets when returning from the API?
> > > > > > >
> > > > > > > Yes, though the plan is that the API will operate on raw data buffers, not mbufs.
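Operating on raw buffers rather than mbufs means a segmented packet would arrive as an array of `struct iovec` (the draft later wraps this in `struct rte_crypto_vec`). The sketch below uses hypothetical names and a toy XOR stand-in instead of a real cipher, and shows how a PMD could apply the transform starting at a per-session cipher offset across segment boundaries.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Hypothetical mirror of the draft's vector descriptor. */
struct sketch_vec {
	struct iovec *vec;
	uint32_t num;
};

/*
 * Apply a toy XOR transform to every byte at or after cipher_offset,
 * walking the segments in order. Returns the number of bytes transformed.
 */
static size_t
sketch_xor_from_offset(const struct sketch_vec *v, size_t cipher_offset,
		       uint8_t key_byte)
{
	size_t skip = cipher_offset, done = 0;

	for (uint32_t i = 0; i != v->num; i++) {
		uint8_t *p = v->vec[i].iov_base;
		size_t len = v->vec[i].iov_len;

		if (skip >= len) {      /* whole segment precedes the offset */
			skip -= len;
			continue;
		}
		for (size_t j = skip; j != len; j++)
			p[j] ^= key_byte;
		done += len - skip;
		skip = 0;               /* later segments start at byte 0 */
	}
	return done;
}
```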
> > > > > > >
> > > > > > > >
> > > > > > > > I still do not understand why we cannot do this with the conventional crypto
> > > > > > > > lib only.
> > > > > > > > As far as I can understand, you are not doing any protocol processing or any
> > > > > > > > value add to the crypto processing. IMO, you just need a synchronous crypto
> > > > > > > > processing API, which can be defined in cryptodev; you don't need to
> > > > > > > > re-create a crypto session in the name of a security session in the driver
> > > > > > > > just to do synchronous processing.
> > > > > > >
> > > > > > > I suppose your question is why not to have
> > > > > > > rte_crypto_process_cpu_crypto_bulk(...) instead?
> > > > > > > The main reason is that it would require disruptive changes in the existing
> > > > > > > cryptodev API (it would cause ABI/API breakage).
> > > > > > > A session for RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO needs some extra
> > > > > > > information that the normal crypto_sym_xform doesn't contain
> > > > > > > (cipher offset from the start of the buffer, maybe something extra in
> > > > > > > future).
> > > > > >
> > > > > > Cipher offset will be part of rte_crypto_op.
> > > > >
> > > > > fill/read (+ alloc/free) is one of the main things that slow down the current
> > > > > crypto-op approach.
> > > > > That's why the general idea is to have all data that wouldn't change from
> > > > > packet to packet included in the session and set up once at session_init().
> > > >
> > > > I agree that you cannot use crypto-op.
> > > > You can have the new API in crypto.
> > > > As per the current patch, you only need cipher_offset, which you can have as
> > > > a parameter until you get it approved in the crypto xform. I believe it will be
> > > > beneficial in other crypto cases as well.
> > > > We can have the cipher offset in both places (crypto-op and cipher_xform). It
> > > > will give the user flexibility to override it.
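The "both places" idea, together with the "default value" mentioned earlier for cipher_offset, can be expressed as a per-operation value that overrides the session default unless it holds a reserved sentinel. A minimal sketch, with hypothetical names (`SKETCH_OFFSET_NONE`, `sketch_session`):

```c
#include <stdint.h>

/* Hypothetical sentinel: per-op value meaning "use the session default". */
#define SKETCH_OFFSET_NONE UINT16_MAX

struct sketch_session {
	uint16_t cipher_offset;   /* default, fixed once at session init */
};

/* Pick the offset for one operation: a per-op value wins if present. */
static inline uint16_t
sketch_effective_offset(const struct sketch_session *sess, uint16_t op_offset)
{
	return op_offset != SKETCH_OFFSET_NONE ? op_offset : sess->cipher_offset;
}
```

Reserving `UINT16_MAX` as the sentinel costs one representable offset; a separate flag would avoid that at the price of another field.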
> > >
> > > After having another thought on your proposal:
> > > probably we can introduce new rte_crypto_sym_xform_types for CPU-related
> > > stuff here?
> >
> > I also thought of adding new xforms, but that may not serve the purpose in all cases.
> > You would need all the information currently available in the current xforms.
> > So if you are adding new fields in the new xform, the size will be more than that of the union of xforms.
> > ABI breakage would still be there.
> >
> > If you think a valid compression of the AEAD xform can be done, then that can be done for each of the
> > xforms and we can have a solution to this issue.
> 
> I think that we can re-use iv.offset for our purposes (for crypto offset).
> So for now we can make that path work without any ABI breakage.
> Fan, please feel free to correct me here, if I missed something.
> If in future we would need to add some extra information it might
> require ABI breakage, though for now I don't envision anything particular to add.
> Anyway, if there is no objection to go that way, we can try to make
> these changes for v2.
> 

Actually, after looking at it more deeply, it appears not as easy as I thought it would be :)
Below is a very rough draft of the proposed API additions.
I think it avoids ABI breakages right now and provides enough flexibility for future extensions (if any).
For now, it doesn't address your comments about naming conventions (_CPU_ vs _SYNC_), etc.,
but I suppose it is comprehensive enough to convey the main idea behind it.
Akhil and other interested parties, please try to review and provide feedback ASAP,
as the related changes will take some time and we would still like to hit the 19.11 deadline.
Konstantin

diff --git a/lib/librte_cryptodev/rte_crypto_sym.h b/lib/librte_cryptodev/rte_crypto_sym.h
index bc8da2466..c03069e23 100644
--- a/lib/librte_cryptodev/rte_crypto_sym.h
+++ b/lib/librte_cryptodev/rte_crypto_sym.h
@@ -103,6 +103,9 @@ rte_crypto_cipher_operation_strings[];
  *
  * This structure contains data relating to Cipher (Encryption and Decryption)
  *  use to create a session.
+ * Actually I was wrong in saying that we don't have free space inside xforms.
+ * Making the key struct packed (see below) allows us to regain 6B that could be
+ * used for future extensions.
  */
 struct rte_crypto_cipher_xform {
        enum rte_crypto_cipher_operation op;
@@ -116,7 +119,25 @@ struct rte_crypto_cipher_xform {
        struct {
                const uint8_t *data;    /**< pointer to key data */
                uint16_t length;        /**< key length in bytes */
-       } key;
+       } __attribute__((__packed__)) key;
+
+       /**
+        * Offset for cipher to start within the user-provided data buffer.
+        * Fan suggested another (and less space-consuming) way:
+        * reuse the iv.offset space below, by changing:
+        * struct {uint16_t offset, length;} iv;
+        * to an unnamed union:
+        * union {
+        *      struct {uint16_t offset, length;} iv;
+        *      struct {uint16_t iv_len, crypto_offset;} cpu_crypto_param;
+        * };
+        * Both approaches seem OK to me in general.
+        * Comments/suggestions are welcome.
+        */
+       uint16_t offset;
+
+       uint8_t reserved1[4];
+
        /**< Cipher key
         *
         * For the RTE_CRYPTO_CIPHER_AES_F8 mode of operation, key.data will
@@ -284,7 +305,7 @@ struct rte_crypto_auth_xform {
        struct {
                const uint8_t *data;    /**< pointer to key data */
                uint16_t length;        /**< key length in bytes */
-       } key;
+       } __attribute__((__packed__)) key;
        /**< Authentication key data.
         * The authentication key length MUST be less than or equal to the
         * block size of the algorithm. It is the callers responsibility to
@@ -292,6 +313,8 @@ struct rte_crypto_auth_xform {
         * (for example RFC 2104, FIPS 198a).
         */

+       uint8_t reserved1[6];
+
        struct {
                uint16_t offset;
                /**< Starting point for Initialisation Vector or Counter,
@@ -376,7 +399,12 @@ struct rte_crypto_aead_xform {
        struct {
                const uint8_t *data;    /**< pointer to key data */
                uint16_t length;        /**< key length in bytes */
-       } key;
+       } __attribute__((__packed__)) key;
+
+       /** offset for cipher to start within data buffer */
+       uint16_t cipher_offset;
+
+       uint8_t reserved1[4];

        struct {
                uint16_t offset;
diff --git a/lib/librte_cryptodev/rte_cryptodev.h b/lib/librte_cryptodev/rte_cryptodev.h
index e175b838c..c0c7bfed7 100644
--- a/lib/librte_cryptodev/rte_cryptodev.h
+++ b/lib/librte_cryptodev/rte_cryptodev.h
@@ -1272,6 +1272,101 @@ void *
 rte_cryptodev_sym_session_get_user_data(
                                        struct rte_cryptodev_sym_session *sess);

+/*
+ * After several thoughts, decided not to try to squeeze CPU_CRYPTO
+ * into the existing rte_crypto_sym_session structure/API, but instead
+ * introduce an extension to it via a new fully opaque
+ * struct rte_crypto_cpu_sym_session and additional related API.
+ * Main points:
+ * - The current crypto-dev API is reasonably mature and it is desirable
+ *   to keep it unchanged (API/ABI stability). On the other side, this
+ *   new sync API is a new one and would probably require extra changes.
+ *   Having it as a new one allows us to mark it as experimental, without
+ *   affecting the existing one.
+ * - A fully opaque cpu_sym_session structure gives more flexibility
+ *   to the PMD writers and again allows us to avoid ABI breakages in future.
+ * - A process() function per set of xforms
+ *   allows exposing different process() functions for different
+ *   xform combinations. The PMD writer can decide whether to
+ *   push all supported algorithms into one process() function,
+ *   or spread them across several ones.
+ *   I.e. more flexibility for the PMD writer.
+ * - Not storing the process() pointer inside the session
+ *   allows the user to choose whether to store a process() pointer
+ *   per session, or per group of sessions for that device that share
+ *   the same input xforms. I.e. extra flexibility for the user,
+ *   plus it allows us to keep cpu_sym_session totally opaque, see above.
+ * Sketched usage model:
+ * ....
+ * // control path, alloc/init session
+ * int32_t sz = rte_crypto_cpu_sym_session_size(dev_id, &xform);
+ * struct rte_crypto_cpu_sym_session *ses = user_alloc(..., sz);
+ * rte_crypto_cpu_sym_process_t process =
+ *     rte_crypto_cpu_sym_session_func(dev_id, &xform);
+ * rte_crypto_cpu_sym_session_init(dev_id, ses, &xform);
+ * ...
+ * // data path
+ * process(ses, ....);
+ * ....
+ * // control path, terminate/free session
+ * rte_crypto_cpu_sym_session_fini(dev_id, ses);
+ */
+
+/**
+ * vector structure, contains pointer to vector array and the length
+ * of the array
+ */
+struct rte_crypto_vec {
+       struct iovec *vec;
+       uint32_t num;
+};
+
+/*
+ * Data-path bulk process crypto function.
+ */
+typedef void (*rte_crypto_cpu_sym_process_t)(
+               struct rte_crypto_cpu_sym_session *sess,
+               struct rte_crypto_vec buf[], void *iv[], void *aad[],
+               void *digest[], int status[], uint32_t num);
+/*
+ * For the given device, return the process function specific to the input xforms.
+ * On error, return NULL and set the rte_errno value.
+ * Note that the same input xforms for the same device should return
+ * the same process function.
+ */
+__rte_experimental
+rte_crypto_cpu_sym_process_t
+rte_crypto_cpu_sym_session_func(uint8_t dev_id,
+                       const struct rte_crypto_sym_xform *xforms);
+
+/*
+ * Return the required session size in bytes for the given set of xforms.
+ * If xforms == NULL, then return the max possible session size
+ * that would fit a session for any algorithm supported by the device.
+ * If CPU mode is not supported at all, or the algorithm requested in the
+ * xform is not supported, then return -ENOTSUP.
+ */
+__rte_experimental
+int
+rte_crypto_cpu_sym_session_size(uint8_t dev_id,
+                       const struct rte_crypto_sym_xform *xforms);
+
+/*
+ * Initialize session.
+ * It is the caller's responsibility to allocate enough space for it.
+ * See rte_crypto_cpu_sym_session_size above.
+ */
+__rte_experimental
+int rte_crypto_cpu_sym_session_init(uint8_t dev_id,
+                       struct rte_crypto_cpu_sym_session *sess,
+                       const struct rte_crypto_sym_xform *xforms);
+
+__rte_experimental
+void
+rte_crypto_cpu_sym_session_fini(uint8_t dev_id,
+                       struct rte_crypto_cpu_sym_session *sess);
+
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_cryptodev/rte_cryptodev_pmd.h b/lib/librte_cryptodev/rte_cryptodev_pmd.h
index defe05ea0..ed7e63fab 100644
--- a/lib/librte_cryptodev/rte_cryptodev_pmd.h
+++ b/lib/librte_cryptodev/rte_cryptodev_pmd.h
@@ -310,6 +310,20 @@ typedef void (*cryptodev_sym_free_session_t)(struct rte_cryptodev *dev,
 typedef void (*cryptodev_asym_free_session_t)(struct rte_cryptodev *dev,
                struct rte_cryptodev_asym_session *sess);

+typedef int (*cryptodev_cpu_sym_session_size_t) (struct rte_cryptodev *dev,
+                       const struct rte_crypto_sym_xform *xforms);
+
+typedef int (*cryptodev_cpu_sym_session_init_t) (struct rte_cryptodev *dev,
+                       struct rte_crypto_cpu_sym_session *sess,
+                       const struct rte_crypto_sym_xform *xforms);
+
+typedef void (*cryptodev_cpu_sym_session_fini_t) (struct rte_cryptodev *dev,
+                       struct rte_crypto_cpu_sym_session *sess);
+
+typedef rte_crypto_cpu_sym_process_t (*cryptodev_cpu_sym_session_func_t) (
+                       struct rte_cryptodev *dev,
+                       const struct rte_crypto_sym_xform *xforms);
+
 /** Crypto device operations function pointer table */
 struct rte_cryptodev_ops {
        cryptodev_configure_t dev_configure;    /**< Configure device. */
@@ -343,6 +357,11 @@ struct rte_cryptodev_ops {
        /**< Clear a Crypto sessions private data. */
        cryptodev_asym_free_session_t asym_session_clear;
        /**< Clear a Crypto sessions private data. */
+
+       cryptodev_cpu_sym_session_size_t sym_cpu_session_get_size;
+       cryptodev_cpu_sym_session_func_t sym_cpu_session_get_func;
+       cryptodev_cpu_sym_session_init_t sym_cpu_session_init;
+       cryptodev_cpu_sym_session_fini_t sym_cpu_session_fini;
 };
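The "Sketched usage model" comment in the draft above can be fleshed out into a self-contained mock. Everything below is hypothetical: the `sketch_*` functions stand in for the proposed `rte_crypto_cpu_sym_session_*` calls, there is a single toy device, and the XOR "cipher" only exists to make the flow observable. It illustrates the intended split: size query, user allocation, process-pointer lookup and init on the control path, and only the cached function pointer on the data path.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical xform: just what the toy device needs. */
struct sketch_xform {
	uint8_t key_byte;
	uint16_t cipher_offset;
};

/* The application only relies on this forward declaration (opaque). */
struct sketch_cpu_session;

typedef void (*sketch_process_t)(struct sketch_cpu_session *sess,
				 uint8_t *buf[], const uint32_t len[],
				 int status[], uint32_t num);

/* ---- mock PMD side: private session layout and process function ---- */
struct sketch_cpu_session {
	uint8_t key_byte;
	uint16_t cipher_offset;
};

static void
sketch_xor_process(struct sketch_cpu_session *s, uint8_t *buf[],
		   const uint32_t len[], int status[], uint32_t num)
{
	for (uint32_t i = 0; i != num; i++) {
		if (len[i] < s->cipher_offset) {
			status[i] = -EINVAL;
			continue;
		}
		for (uint32_t j = s->cipher_offset; j != len[i]; j++)
			buf[i][j] ^= s->key_byte;
		status[i] = 0;
	}
}

/* ---- mock "library" API for a single device ---- */
static int
sketch_session_size(const struct sketch_xform *x)
{
	(void)x;  /* toy device: same size for every supported xform */
	return (int)sizeof(struct sketch_cpu_session);
}

static sketch_process_t
sketch_session_func(const struct sketch_xform *x)
{
	(void)x;  /* a real PMD could pick a function per xform chain */
	return sketch_xor_process;
}

static int
sketch_session_init(struct sketch_cpu_session *s, const struct sketch_xform *x)
{
	s->key_byte = x->key_byte;
	s->cipher_offset = x->cipher_offset;
	return 0;
}

static void
sketch_session_fini(struct sketch_cpu_session *s)
{
	/* Clear key material; a real PMD should use a non-elidable wipe. */
	memset(s, 0, sizeof(*s));
}

/* Application side: control path once, data path per burst. */
static int
sketch_usage(uint8_t *buf[], const uint32_t len[], int status[], uint32_t num)
{
	struct sketch_xform xf = { .key_byte = 0xAA, .cipher_offset = 1 };
	int sz = sketch_session_size(&xf);
	if (sz < 0)
		return sz;

	struct sketch_cpu_session *ses = malloc((size_t)sz);
	if (ses == NULL)
		return -ENOMEM;

	sketch_process_t process = sketch_session_func(&xf);
	if (sketch_session_init(ses, &xf) != 0) {
		free(ses);
		return -EINVAL;
	}

	process(ses, buf, len, status, num);   /* data path */

	sketch_session_fini(ses);
	free(ses);
	return 0;
}
```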
  
Akhil Goyal Sept. 27, 2019, 9:26 a.m. UTC | #16
Hi Konstantin,

> -----Original Message-----
> From: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> Sent: Wednesday, September 25, 2019 11:54 PM
> To: Akhil Goyal <akhil.goyal@nxp.com>; 'dev@dpdk.org' <dev@dpdk.org>; De
> Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>; 'Thomas Monjalon'
> <thomas@monjalon.net>
> Cc: Zhang, Roy Fan <roy.fan.zhang@intel.com>; Doherty, Declan
> <declan.doherty@intel.com>; 'Anoob Joseph' <anoobj@marvell.com>
> Subject: RE: [RFC PATCH 1/9] security: introduce CPU Crypto action type and API
> 
> 
> > > > > > > > > > This action type allows a burst of symmetric crypto workload using
> > > > > > > > > > the same algorithm, key, and direction to be processed by CPU cycles
> > > > > > > > > > synchronously.
> > > > > > > > > > This flexible action type does not require external hardware
> > > > > > > > > > involvement, having the crypto workload processed synchronously, and
> > > > > > > > > > is more performant than Cryptodev SW PMD due to the cycles saved on
> > > > > > > > > > the removed "async mode simulation" as well as 3 cacheline accesses
> > > > > > > > > > of the crypto ops.
> > > > > > > > >
> > > > > > > > > Does that mean the application will not call cryptodev_enqueue_burst
> > > > > > > > > and the corresponding dequeue burst?
> > > > > > > >
> > > > > > > > Yes, instead it just calls rte_security_process_cpu_crypto_bulk(...)
> > > > > > > >
> > > > > > > > > It would be a new API, something like process_packets, and it will
> > > > > > > > > have the crypto-processed packets when returning from the API?
> > > > > > > >
> > > > > > > > Yes, though the plan is that the API will operate on raw data buffers,
> > > > > > > > not mbufs.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > I still do not understand why we cannot do this with the conventional
> > > > > > > > > crypto lib only.
> > > > > > > > > As far as I can understand, you are not doing any protocol processing
> > > > > > > > > or any value add to the crypto processing. IMO, you just need a
> > > > > > > > > synchronous crypto processing API, which can be defined in cryptodev;
> > > > > > > > > you don't need to re-create a crypto session in the name of a security
> > > > > > > > > session in the driver just to do synchronous processing.
> > > > > > > >
> > > > > > > > I suppose your question is why not to have
> > > > > > > > rte_crypto_process_cpu_crypto_bulk(...) instead?
> > > > > > > > The main reason is that it would require disruptive changes in the
> > > > > > > > existing cryptodev API (it would cause ABI/API breakage).
> > > > > > > > A session for RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO needs some extra
> > > > > > > > information that the normal crypto_sym_xform doesn't contain
> > > > > > > > (cipher offset from the start of the buffer, maybe something extra
> > > > > > > > in future).
> > > > > > >
> > > > > > > Cipher offset will be part of rte_crypto_op.
> > > > > >
> > > > > > fill/read (+ alloc/free) is one of the main things that slow down the
> > > > > > current crypto-op approach.
> > > > > > That's why the general idea is to have all data that wouldn't change from
> > > > > > packet to packet included in the session and set up once at session_init().
> > > > >
> > > > > I agree that you cannot use crypto-op.
> > > > > You can have the new API in crypto.
> > > > > As per the current patch, you only need cipher_offset, which you can have
> > > > > as a parameter until you get it approved in the crypto xform. I believe it
> > > > > will be beneficial in other crypto cases as well.
> > > > > We can have the cipher offset in both places (crypto-op and cipher_xform).
> > > > > It will give the user flexibility to override it.
> > > >
> > > > After having another thought on your proposal:
> > > > probably we can introduce new rte_crypto_sym_xform_types for CPU-related
> > > > stuff here?
> > >
> > > I also thought of adding new xforms, but that may not serve the purpose in all
> > > cases.
> > > You would need all the information currently available in the current xforms.
> > > So if you are adding new fields in the new xform, the size will be more than
> > > that of the union of xforms.
> > > ABI breakage would still be there.
> > >
> > > If you think a valid compression of the AEAD xform can be done, then that can
> > > be done for each of the xforms and we can have a solution to this issue.
> >
> > I think that we can re-use iv.offset for our purposes (for crypto offset).
> > So for now we can make that path work without any ABI breakage.
> > Fan, please feel free to correct me here, if I missed something.
> > If in future we would need to add some extra information it might
> > require ABI breakage, though for now I don't envision anything particular to add.
> > Anyway, if there is no objection to go that way, we can try to make
> > these changes for v2.
> >
> 
> Actually, after looking at it more deeply, it appears not as easy as I thought it
> would be :)
> Below is a very rough draft of the proposed API additions.
> I think it avoids ABI breakages right now and provides enough flexibility for
> future extensions (if any).
> For now, it doesn't address your comments about naming conventions (_CPU_
> vs _SYNC_), etc., but I suppose it is comprehensive enough to convey the main
> idea behind it.
> Akhil and other interested parties, please try to review and provide feedback
> ASAP, as the related changes will take some time and we would still like to hit
> the 19.11 deadline.
> Konstantin
> 
> diff --git a/lib/librte_cryptodev/rte_crypto_sym.h b/lib/librte_cryptodev/rte_crypto_sym.h
> index bc8da2466..c03069e23 100644
> --- a/lib/librte_cryptodev/rte_crypto_sym.h
> +++ b/lib/librte_cryptodev/rte_crypto_sym.h
> @@ -103,6 +103,9 @@ rte_crypto_cipher_operation_strings[];
>   *
>   * This structure contains data relating to Cipher (Encryption and Decryption)
>   *  use to create a session.
> + * Actually I was wrong in saying that we don't have free space inside xforms.
> + * Making the key struct packed (see below) allows us to regain 6B that could be
> + * used for future extensions.
>   */
>  struct rte_crypto_cipher_xform {
>         enum rte_crypto_cipher_operation op;
> @@ -116,7 +119,25 @@ struct rte_crypto_cipher_xform {
>         struct {
>                 const uint8_t *data;    /**< pointer to key data */
>                 uint16_t length;        /**< key length in bytes */
> -       } key;
> +       } __attribute__((__packed__)) key;
> +
> +       /**
> +        * Offset for cipher to start within the user-provided data buffer.
> +        * Fan suggested another (and less space-consuming) way:
> +        * reuse the iv.offset space below, by changing:
> +        * struct {uint16_t offset, length;} iv;
> +        * to an unnamed union:
> +        * union {
> +        *      struct {uint16_t offset, length;} iv;
> +        *      struct {uint16_t iv_len, crypto_offset;} cpu_crypto_param;
> +        * };
> +        * Both approaches seem OK to me in general.
No strong opinions here. OK with this one.
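Both layouts discussed here can be checked mechanically. The sketch below models the cipher xform as `op`, `algo`, `key`, `iv` (an assumption about the current layout, with `uint32_t` standing in for the enums; type names are hypothetical) and checks where `iv` lands in the packed-key variant and in the `iv` union variant.

```c
#include <stddef.h>
#include <stdint.h>

struct sketch_key {
	const uint8_t *data;
	uint16_t length;
};

/* Same members, packed: no tail padding after length. */
struct sketch_key_packed {
	const uint8_t *data;
	uint16_t length;
} __attribute__((__packed__));

struct sketch_iv {
	uint16_t offset;
	uint16_t length;
};

/* Assumed current layout. */
struct sketch_cipher_xform_old {
	uint32_t op;    /* stands in for enum rte_crypto_cipher_operation */
	uint32_t algo;  /* stands in for enum rte_crypto_cipher_algorithm */
	struct sketch_key key;
	struct sketch_iv iv;
};

/* Variant 1: packed key, regained bytes used for offset + reserved. */
struct sketch_cipher_xform_packed {
	uint32_t op;
	uint32_t algo;
	struct sketch_key_packed key;
	uint16_t offset;
	uint8_t reserved1[4];
	struct sketch_iv iv;
};

/* Variant 2: key unchanged, iv space reinterpreted for CPU crypto. */
struct sketch_cipher_xform_union {
	uint32_t op;
	uint32_t algo;
	struct sketch_key key;
	union {
		struct sketch_iv iv;
		struct {
			uint16_t iv_len;
			uint16_t crypto_offset;
		} cpu_crypto_param;
	};
};
```

On an LP64 target both variants keep `iv` at byte 24. The union variant leaves the struct size unchanged, while the packed variant drops the struct's alignment from 8 to 4 and so shrinks it from 32 to 28 bytes, which cannot grow an enclosing union but may be worth checking wherever the xform is copied by size. On a 32-bit target the packed variant moves `iv`, so the 6B figure is LP64-specific.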

> +        * Comments/suggestions are welcome.
> +         */
> +       uint16_t offset;
> +
> +       uint8_t reserved1[4];
> +
>         /**< Cipher key
>          *
>          * For the RTE_CRYPTO_CIPHER_AES_F8 mode of operation, key.data will
> @@ -284,7 +305,7 @@ struct rte_crypto_auth_xform {
>         struct {
>                 const uint8_t *data;    /**< pointer to key data */
>                 uint16_t length;        /**< key length in bytes */
> -       } key;
> +       } __attribute__((__packed__)) key;
>         /**< Authentication key data.
>          * The authentication key length MUST be less than or equal to the
>          * block size of the algorithm. It is the callers responsibility to
> @@ -292,6 +313,8 @@ struct rte_crypto_auth_xform {
>          * (for example RFC 2104, FIPS 198a).
>          */
> 
> +       uint8_t reserved1[6];
> +
>         struct {
>                 uint16_t offset;
>                 /**< Starting point for Initialisation Vector or Counter,
> @@ -376,7 +399,12 @@ struct rte_crypto_aead_xform {
>         struct {
>                 const uint8_t *data;    /**< pointer to key data */
>                 uint16_t length;        /**< key length in bytes */
> -       } key;
> +       } __attribute__((__packed__)) key;
> +
> +       /** offset for cipher to start within data buffer */
> +       uint16_t cipher_offset;
> +
> +       uint8_t reserved1[4];
> 
>         struct {
>                 uint16_t offset;
> diff --git a/lib/librte_cryptodev/rte_cryptodev.h b/lib/librte_cryptodev/rte_cryptodev.h
> index e175b838c..c0c7bfed7 100644
> --- a/lib/librte_cryptodev/rte_cryptodev.h
> +++ b/lib/librte_cryptodev/rte_cryptodev.h
> @@ -1272,6 +1272,101 @@ void *
>  rte_cryptodev_sym_session_get_user_data(
>                                         struct rte_cryptodev_sym_session *sess);
> 
> +/*
> + * After several thoughts, decided not to try to squeeze CPU_CRYPTO
> + * into the existing rte_crypto_sym_session structure/API, but instead
> + * introduce an extension to it via a new fully opaque
> + * struct rte_crypto_cpu_sym_session and additional related API.


What exactly do we need to squeeze?
In this proposal I do not see the new struct cpu_sym_session defined here.
I believe you will have the same lib API/struct for cpu_sym_session and sym_session.
I am not sure if that would be needed.
It would be internal to the driver: if synchronous processing is supported (from the feature flag)
and the relevant fields in the xform (the newly added ones, which are packed as per your suggestion) are set,
it will create that type of session.


> + * Main points:
> + * - The current crypto-dev API is reasonably mature and it is desirable
> + *   to keep it unchanged (API/ABI stability). On the other side, this
> + *   new sync API is a new one and would probably require extra changes.
> + *   Having it as a new one allows us to mark it as experimental, without
> + *   affecting the existing one.
> + * - A fully opaque cpu_sym_session structure gives more flexibility
> + *   to the PMD writers and again allows us to avoid ABI breakages in future.
> + * - A process() function per set of xforms
> + *   allows exposing different process() functions for different
> + *   xform combinations. The PMD writer can decide whether to
> + *   push all supported algorithms into one process() function,
> + *   or spread them across several ones.
> + *   I.e. more flexibility for the PMD writer.

Which process function should be chosen is internal to the PMD; how would that info
be visible to the application or the library? These will get stored in the session private
data. It would be up to the PMD writer to store the per-session process function in
the session private data.

The process function would be a dev op just like the enq/deq operations, and it should call
the respective process API stored in the session private data.

I am not sure you would need a new session init API for this, as nothing would be visible to
the app or lib.

> + * - Not storing the process() pointer inside the session
> + *   allows the user to choose whether to store a process() pointer
> + *   per session, or per group of sessions for that device that share
> + *   the same input xforms. I.e. extra flexibility for the user,
> + *   plus it allows us to keep cpu_sym_session totally opaque, see above.

If multiple sessions need to be processed via the same process function,
the PMD would save the same process function in all the sessions; I don't think there would
be any perf overhead with that.

> + * Sketched usage model:
> + * ....
> + * // control path: alloc/init session
> + * int32_t sz = rte_crypto_cpu_sym_session_size(dev_id, &xform);
> + * struct rte_crypto_cpu_sym_session *ses = user_alloc(..., sz);
> + * rte_crypto_cpu_sym_process_t process =
> + *     rte_crypto_cpu_sym_session_func(dev_id, &xform);
> + * rte_crypto_cpu_sym_session_init(dev_id, ses, &xform);
> + * ...
> + * // data path
> + * process(ses, ....);
> + * ....
> + * // control path: terminate/free session
> + * rte_crypto_cpu_sym_session_fini(dev_id, ses);
> + */
> +
> +/**
> + * vector structure, contains pointer to vector array and the length
> + * of the array
> + */
> +struct rte_crypto_vec {
> +       struct iovec *vec;
> +       uint32_t num;
> +};
> +
> +/*
> + * Data-path bulk process crypto function.
> + */
> +typedef void (*rte_crypto_cpu_sym_process_t)(
> +               struct rte_crypto_cpu_sym_session *sess,
> +               struct rte_crypto_vec buf[], void *iv[], void *aad[],
> +               void *digest[], int status[], uint32_t num);
> +/*
> + * For the given device, return the process function specific to the input
> + * xforms; on error, return NULL and set rte_errno.
> + * Note that the same device should return the same process function
> + * for the same input xforms.
> + */
> +__rte_experimental
> +rte_crypto_cpu_sym_process_t
> +rte_crypto_cpu_sym_session_func(uint8_t dev_id,
> +                       const struct rte_crypto_sym_xform *xforms);
> +
> +/*
> + * Return the required session size in bytes for the given set of xforms.
> + * If xforms == NULL, return the maximum possible session size,
> + * large enough for a session with any algorithm supported by the device.
> + * If CPU mode is not supported at all, or the algorithm requested in the
> + * xform is not supported, return -ENOTSUP.
> + */
> +__rte_experimental
> +int
> +rte_crypto_cpu_sym_session_size(uint8_t dev_id,
> +                       const struct rte_crypto_sym_xform *xforms);
> +
> +/*
> + * Initialize a session.
> + * It is the caller's responsibility to allocate enough space for it.
> + * See rte_crypto_cpu_sym_session_size above.
> + */
> +__rte_experimental
> +int rte_crypto_cpu_sym_session_init(uint8_t dev_id,
> +                       struct rte_crypto_cpu_sym_session *sess,
> +                       const struct rte_crypto_sym_xform *xforms);
> +
> +__rte_experimental
> +void
> +rte_crypto_cpu_sym_session_fini(uint8_t dev_id,
> +                       struct rte_crypto_cpu_sym_session *sess);
> +
> +
>  #ifdef __cplusplus
>  }
>  #endif
> diff --git a/lib/librte_cryptodev/rte_cryptodev_pmd.h b/lib/librte_cryptodev/rte_cryptodev_pmd.h
> index defe05ea0..ed7e63fab 100644
> --- a/lib/librte_cryptodev/rte_cryptodev_pmd.h
> +++ b/lib/librte_cryptodev/rte_cryptodev_pmd.h
> @@ -310,6 +310,20 @@ typedef void (*cryptodev_sym_free_session_t)(struct rte_cryptodev *dev,
>  typedef void (*cryptodev_asym_free_session_t)(struct rte_cryptodev *dev,
>                 struct rte_cryptodev_asym_session *sess);
> 
> +typedef int (*cryptodev_cpu_sym_session_size_t) (struct rte_cryptodev *dev,
> +                       const struct rte_crypto_sym_xform *xforms);
> +
> +typedef int (*cryptodev_cpu_sym_session_init_t) (struct rte_cryptodev *dev,
> +                       struct rte_crypto_cpu_sym_session *sess,
> +                       const struct rte_crypto_sym_xform *xforms);
> +
> +typedef void (*cryptodev_cpu_sym_session_fini_t) (struct rte_cryptodev *dev,
> +                       struct rte_crypto_cpu_sym_session *sess);
> +
> +typedef rte_crypto_cpu_sym_process_t (*cryptodev_cpu_sym_session_func_t) (
> +                       struct rte_cryptodev *dev,
> +                       const struct rte_crypto_sym_xform *xforms);
> +
>  /** Crypto device operations function pointer table */
>  struct rte_cryptodev_ops {
>         cryptodev_configure_t dev_configure;    /**< Configure device. */
> @@ -343,6 +357,11 @@ struct rte_cryptodev_ops {
>         /**< Clear a Crypto sessions private data. */
>         cryptodev_asym_free_session_t asym_session_clear;
>         /**< Clear a Crypto sessions private data. */
> +
> +       cryptodev_cpu_sym_session_size_t sym_cpu_session_get_size;
> +       cryptodev_cpu_sym_session_func_t sym_cpu_session_get_func;
> +       cryptodev_cpu_sym_session_init_t sym_cpu_session_init;
> +       cryptodev_cpu_sym_session_fini_t sym_cpu_session_fini;
>  };
> 
> 
>
  
Ananyev, Konstantin Sept. 30, 2019, 12:22 p.m. UTC | #17
Hi Akhil,

> > > > > > > > > > > This action type allows the burst of symmetric crypto workload using the
> > > > > > > > > > > same algorithm, key, and direction being processed by CPU cycles
> > > > > > > > > > > synchronously. This flexible action type does not require external
> > > > > > > > > > > hardware involvement, having the crypto workload processed synchronously,
> > > > > > > > > > > and is more performant than Cryptodev SW PMD due to the saved cycles on
> > > > > > > > > > > removed "async mode simulation" as well as 3 cacheline access of the
> > > > > > > > > > > crypto ops.
> > > > > > > > > >
> > > > > > > > > > Does that mean the application will not call cryptodev_enqueue_burst
> > > > > > > > > > and the corresponding dequeue burst?
> > > > > > > > >
> > > > > > > > > Yes, instead it just calls rte_security_process_cpu_crypto_bulk(...).
> > > > > > > > >
> > > > > > > > > > It would be a new API, something like process_packets, and it will
> > > > > > > > > > have the crypto-processed packets when returning from the API?
> > > > > > > > >
> > > > > > > > > Yes, though the plan is that the API will operate on raw data buffers,
> > > > > > > > > not mbufs.
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > I still do not understand why we cannot do this with the conventional
> > > > > > > > > > crypto lib only.
> > > > > > > > > > As far as I can understand, you are not doing any protocol processing
> > > > > > > > > > or adding any value to the crypto processing. IMO, you just need a
> > > > > > > > > > synchronous crypto processing API, which can be defined in cryptodev;
> > > > > > > > > > you don't need to re-create a crypto session in the name of a security
> > > > > > > > > > session in the driver just to do synchronous processing.
> > > > > > > > >
> > > > > > > > > I suppose your question is why not to have
> > > > > > > > > rte_crypto_process_cpu_crypto_bulk(...) instead?
> > > > > > > > > The main reason is that it would require disruptive changes in the
> > > > > > > > > existing cryptodev API (it would cause ABI/API breakage).
> > > > > > > > > A session for RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO needs some extra
> > > > > > > > > information that the normal crypto_sym_xform doesn't contain
> > > > > > > > > (the cipher offset from the start of the buffer; there might be
> > > > > > > > > something extra in the future).
> > > > > > > >
> > > > > > > > Cipher offset will be part of rte_crypto_op.
> > > > > > >
> > > > > > > Filling/reading (+ alloc/free) crypto ops is one of the main things that
> > > > > > > slows down the current crypto-op approach.
> > > > > > > That's why the general idea is to have all data that doesn't change from
> > > > > > > packet to packet included in the session and set up once at session_init().
> > > > > >
> > > > > > I agree that you cannot use crypto-op.
> > > > > > You can have the new API in crypto.
> > > > > > As per the current patch, you only need cipher_offset, which you can have
> > > > > > as a parameter until you get it approved in the crypto xform. I believe it
> > > > > > will be beneficial for other crypto cases as well.
> > > > > > We can have the cipher offset in both places (crypto-op and cipher_xform).
> > > > > > It will give the user the flexibility to override it.
> > > > >
> > > > > After having another thought on your proposal:
> > > > > Probably we can introduce new rte_crypto_sym_xform_types for CPU-related
> > > > > stuff here?
> > > >
> > > > I also thought of adding new xforms, but that may not serve the purpose in
> > > > all cases.
> > > > You would need all the information currently available in the current xforms.
> > > > So if you add new fields in the new xform, its size will be more than that
> > > > of the union of xforms.
> > > > ABI breakage would still be there.
> > > >
> > > > If you think a valid compression of the AEAD xform can be done, then that
> > > > can be done for each of the xforms and we can have a solution to this issue.
> > >
> > > I think that we can re-use iv.offset for our purposes (for the crypto offset).
> > > So for now we can make that path work without any ABI breakage.
> > > Fan, please feel free to correct me here if I missed something.
> > > If in the future we need to add some extra information, it might
> > > require ABI breakage, though for now I don't envision anything in particular
> > > to add.
> > > Anyway, if there is no objection to going that way, we can try to make
> > > these changes for v2.
> > >
> >
> > Actually, after looking at it more deeply, it appears not as easy as I thought
> > it would be :)
> > Below is a very rough draft of the proposed API additions.
> > I think it avoids ABI breakages right now and provides enough flexibility for
> > future extensions (if any).
> > For now, it doesn't address your comments about naming conventions (_CPU_
> > vs _SYNC_), etc.,
> > but I suppose it is comprehensive enough to convey the main idea behind it.
> > Akhil and other interested parties, please try to review and provide feedback
> > ASAP, as the related changes will take some time and we would still like to
> > hit the 19.11 deadline.
> > Konstantin
> >
> > diff --git a/lib/librte_cryptodev/rte_crypto_sym.h b/lib/librte_cryptodev/rte_crypto_sym.h
> > index bc8da2466..c03069e23 100644
> > --- a/lib/librte_cryptodev/rte_crypto_sym.h
> > +++ b/lib/librte_cryptodev/rte_crypto_sym.h
> > @@ -103,6 +103,9 @@ rte_crypto_cipher_operation_strings[];
> >   *
> >   * This structure contains data relating to Cipher (Encryption and Decryption)
> >   *  use to create a session.
> > + * Actually, I was wrong in saying that we don't have free space inside xforms.
> > + * Making the key struct packed (see below) allows us to regain 6B that could
> > + * be used for future extensions.
> >   */
> >  struct rte_crypto_cipher_xform {
> >         enum rte_crypto_cipher_operation op;
> > @@ -116,7 +119,25 @@ struct rte_crypto_cipher_xform {
> >         struct {
> >                 const uint8_t *data;    /**< pointer to key data */
> >                 uint16_t length;        /**< key length in bytes */
> > -       } key;
> > +       } __attribute__((__packed__)) key;
> > +
> > +       /**
> > +        * Offset for the cipher to start within the user-provided data buffer.
> > +        * Fan suggested another (less space-consuming) way:
> > +        * reuse the iv.offset space below, by changing:
> > +        * struct {uint16_t offset, length;} iv;
> > +        * to an unnamed union:
> > +        * union {
> > +        *      struct {uint16_t offset, length;} iv;
> > +        *      struct {uint16_t iv_len, crypto_offset;} cpu_crypto_param;
> > +        * };
> > +        * Both approaches seem OK to me in general.
> 
> No strong opinions here. OK with this one.
> 
> > +        * Comments/suggestions are welcome.
> > +        */
> > +       uint16_t offset;

After another thought, it is probably a bit better to have the offset as a separate field.
That way we can use the same xforms to create both types of sessions.

> > +
> > +       uint8_t reserved1[4];
> > +
> >         /**< Cipher key
> >          *
> >          * For the RTE_CRYPTO_CIPHER_AES_F8 mode of operation, key.data will
> > @@ -284,7 +305,7 @@ struct rte_crypto_auth_xform {
> >         struct {
> >                 const uint8_t *data;    /**< pointer to key data */
> >                 uint16_t length;        /**< key length in bytes */
> > -       } key;
> > +       } __attribute__((__packed__)) key;
> >         /**< Authentication key data.
> >          * The authentication key length MUST be less than or equal to the
> >          * block size of the algorithm. It is the callers responsibility to
> > @@ -292,6 +313,8 @@ struct rte_crypto_auth_xform {
> >          * (for example RFC 2104, FIPS 198a).
> >          */
> >
> > +       uint8_t reserved1[6];
> > +
> >         struct {
> >                 uint16_t offset;
> >                 /**< Starting point for Initialisation Vector or Counter,
> > @@ -376,7 +399,12 @@ struct rte_crypto_aead_xform {
> >         struct {
> >                 const uint8_t *data;    /**< pointer to key data */
> >                 uint16_t length;        /**< key length in bytes */
> > -       } key;
> > +       } __attribute__((__packed__)) key;
> > +
> > +       /** offset for cipher to start within data buffer */
> > +       uint16_t cipher_offset;
> > +
> > +       uint8_t reserved1[4];
> >
> >         struct {
> >                 uint16_t offset;
> > diff --git a/lib/librte_cryptodev/rte_cryptodev.h b/lib/librte_cryptodev/rte_cryptodev.h
> > index e175b838c..c0c7bfed7 100644
> > --- a/lib/librte_cryptodev/rte_cryptodev.h
> > +++ b/lib/librte_cryptodev/rte_cryptodev.h
> > @@ -1272,6 +1272,101 @@ void *
> >  rte_cryptodev_sym_session_get_user_data(
> >                                         struct rte_cryptodev_sym_session *sess);
> >
> > +/*
> > + * After several thoughts, we decided not to squeeze CPU_CRYPTO
> > + * into the existing rte_crypto_sym_session structure/API, but instead
> > + * to introduce an extension to it via a new, fully opaque
> > + * struct rte_crypto_cpu_sym_session and additional related API.
> 
> 
> What exactly do we need to squeeze in?
> In this proposal I do not see the new struct cpu_sym_session defined here.

The plan is to have it totally opaque to the user, i.e. just:
struct rte_crypto_cpu_sym_session;
in public header files.

> I believe you could have the same lib API/struct for cpu_sym_session and sym_session.

I thought about that approach, but there are a few things that look clumsy to me:
1. Right now there is no 'type' (or similar) field inside rte_cryptodev_sym_session,
so it is not easy to distinguish which kind of session you have: lksd_sym or cpu_sym.
In theory, there is a 4B hole inside rte_cryptodev_sym_session, so we could add an extra field
there, but in that case we wouldn't be able to use the same xform for both lksd_sym and cpu_sym
(which seems a very reasonable thing to want).
2. The majority of rte_cryptodev_sym_session fields are, I think, unnecessary for rte_crypto_cpu_sym_session:
sess_data[], opaque_data, user_data, nb_drivers.
All of that consumes space that could be used elsewhere.
3. I am a bit reluctant to touch the existing rte_cryptodev API, to avoid any breakages I can't foresee right now.
On the other hand, if we add new functions/structs for cpu_sym_session, we can mark them as experimental
and keep them that way for some time, so further changes (if needed) would still be possible.

> I am not sure a new one would be needed.
> It would be internal to the driver: if synchronous processing is supported (per the feature flag) and
> the relevant fields in the xform (the newly added ones, packed as per your suggestion) are set,
> it will create that type of session.
> 
> 
> > + * Main points:
> > + * - The current crypto-dev API is reasonably mature and it is desirable
> > + *   to keep it unchanged (API/ABI stability). On the other hand, this
> > + *   sync API is brand new and will probably require further changes.
> > + *   Keeping it separate allows us to mark it as experimental without
> > + *   affecting the existing one.
> > + * - A fully opaque cpu_sym_session structure gives more flexibility
> > + *   to PMD writers and again helps avoid ABI breakages in the future.
> > + * - A process() function per set of xforms
> > + *   allows exposing different process() functions for different
> > + *   xform combinations. The PMD writer can decide whether to
> > + *   push all supported algorithms into one process() function
> > + *   or spread them across several.
> > + *   I.e. more flexibility for the PMD writer.
> 
> Which process function should be chosen is internal to the PMD; how would that info
> be visible to the application or the library? These will get stored in the session private
> data. It would be up to the PMD writer to store the per-session process function in
> the session private data.
>
> The process function would be a dev op, just like the enqueue/dequeue operations, and it should call
> the respective process API stored in the session private data.

That model (via dev ops) is possible, but it has several drawbacks from my perspective:

1. It means we'll need to pass dev_id as a parameter to the process() function,
though in fact dev_id is not relevant information for us here
(all we need is a pointer to the session and a pointer to the function to call),
and I tried to avoid using it in data-path functions for this API.
2. As you pointed out, in that case there will be just one process() function per device.
So if a PMD would like to have several process() functions for different types of sessions
(let's say one per alg), the first thing it has to do inside its process() is read the session data and,
based on that, jump to/call a particular internal subroutine.
Something like:
driver_id = get_pmd_driver_id();
priv_sess = ses->sess_data[driver_id];
Then either:
switch(priv_sess->alg) {case XXX: process_XXX(priv_sess, ...);break;...}
OR
priv_sess->process(priv_sess, ...);

to select and call the proper function.
That looks like totally unnecessary overhead to me.
However, if we have the ability to query/extract some sort of session_ops based on the xform,
we can avoid this extra dereference + jump/call.

> 
> I am not sure you would need a new session init API for this, as nothing would be visible to
> the app or lib.
> 
> > + * - Not storing the process() pointer inside the session
> > + *   allows the user to choose whether to store a process() pointer
> > + *   per session, or per group of sessions for that device that share
> > + *   the same input xforms. I.e. extra flexibility for the user,
> > + *   plus it allows us to keep cpu_sym_session totally opaque, see above.
> 
> If multiple sessions need to be processed via the same process function,
> the PMD would save the same process function in all the sessions; I don't think there would
> be any perf overhead with that.

I think it would, see above.

> 
> > + * Sketched usage model:
> > + * ....
> > + * // control path: alloc/init session
> > + * int32_t sz = rte_crypto_cpu_sym_session_size(dev_id, &xform);
> > + * struct rte_crypto_cpu_sym_session *ses = user_alloc(..., sz);
> > + * rte_crypto_cpu_sym_process_t process =
> > + *     rte_crypto_cpu_sym_session_func(dev_id, &xform);
> > + * rte_crypto_cpu_sym_session_init(dev_id, ses, &xform);
> > + * ...
> > + * // data path
> > + * process(ses, ....);
> > + * ....
> > + * // control path: terminate/free session
> > + * rte_crypto_cpu_sym_session_fini(dev_id, ses);
> > + */
> > +
> > +/**
> > + * vector structure, contains pointer to vector array and the length
> > + * of the array
> > + */
> > +struct rte_crypto_vec {
> > +       struct iovec *vec;
> > +       uint32_t num;
> > +};
> > +
> > +/*
> > + * Data-path bulk process crypto function.
> > + */
> > +typedef void (*rte_crypto_cpu_sym_process_t)(
> > +               struct rte_crypto_cpu_sym_session *sess,
> > +               struct rte_crypto_vec buf[], void *iv[], void *aad[],
> > +               void *digest[], int status[], uint32_t num);
> > +/*
> > + * For the given device, return the process function specific to the input
> > + * xforms; on error, return NULL and set rte_errno.
> > + * Note that the same device should return the same process function
> > + * for the same input xforms.
> > + */
> > +__rte_experimental
> > +rte_crypto_cpu_sym_process_t
> > +rte_crypto_cpu_sym_session_func(uint8_t dev_id,
> > +                       const struct rte_crypto_sym_xform *xforms);
> > +
> > +/*
> > + * Return the required session size in bytes for the given set of xforms.
> > + * If xforms == NULL, return the maximum possible session size,
> > + * large enough for a session with any algorithm supported by the device.
> > + * If CPU mode is not supported at all, or the algorithm requested in the
> > + * xform is not supported, return -ENOTSUP.
> > + */
> > +__rte_experimental
> > +int
> > +rte_crypto_cpu_sym_session_size(uint8_t dev_id,
> > +                       const struct rte_crypto_sym_xform *xforms);
> > +
> > +/*
> > + * Initialize a session.
> > + * It is the caller's responsibility to allocate enough space for it.
> > + * See rte_crypto_cpu_sym_session_size above.
> > + */
> > +__rte_experimental
> > +int rte_crypto_cpu_sym_session_init(uint8_t dev_id,
> > +                       struct rte_crypto_cpu_sym_session *sess,
> > +                       const struct rte_crypto_sym_xform *xforms);
> > +
> > +__rte_experimental
> > +void
> > +rte_crypto_cpu_sym_session_fini(uint8_t dev_id,
> > +                       struct rte_crypto_cpu_sym_session *sess);
> > +
> > +
> >  #ifdef __cplusplus
> >  }
> >  #endif
> > diff --git a/lib/librte_cryptodev/rte_cryptodev_pmd.h b/lib/librte_cryptodev/rte_cryptodev_pmd.h
> > index defe05ea0..ed7e63fab 100644
> > --- a/lib/librte_cryptodev/rte_cryptodev_pmd.h
> > +++ b/lib/librte_cryptodev/rte_cryptodev_pmd.h
> > @@ -310,6 +310,20 @@ typedef void (*cryptodev_sym_free_session_t)(struct rte_cryptodev *dev,
> >  typedef void (*cryptodev_asym_free_session_t)(struct rte_cryptodev *dev,
> >                 struct rte_cryptodev_asym_session *sess);
> >
> > +typedef int (*cryptodev_cpu_sym_session_size_t) (struct rte_cryptodev *dev,
> > +                       const struct rte_crypto_sym_xform *xforms);
> > +
> > +typedef int (*cryptodev_cpu_sym_session_init_t) (struct rte_cryptodev *dev,
> > +                       struct rte_crypto_cpu_sym_session *sess,
> > +                       const struct rte_crypto_sym_xform *xforms);
> > +
> > +typedef void (*cryptodev_cpu_sym_session_fini_t) (struct rte_cryptodev *dev,
> > +                       struct rte_crypto_cpu_sym_session *sess);
> > +
> > +typedef rte_crypto_cpu_sym_process_t (*cryptodev_cpu_sym_session_func_t) (
> > +                       struct rte_cryptodev *dev,
> > +                       const struct rte_crypto_sym_xform *xforms);
> > +
> >  /** Crypto device operations function pointer table */
> >  struct rte_cryptodev_ops {
> >         cryptodev_configure_t dev_configure;    /**< Configure device. */
> > @@ -343,6 +357,11 @@ struct rte_cryptodev_ops {
> >         /**< Clear a Crypto sessions private data. */
> >         cryptodev_asym_free_session_t asym_session_clear;
> >         /**< Clear a Crypto sessions private data. */
> > +
> > +       cryptodev_cpu_sym_session_size_t sym_cpu_session_get_size;
> > +       cryptodev_cpu_sym_session_func_t sym_cpu_session_get_func;
> > +       cryptodev_cpu_sym_session_init_t sym_cpu_session_init;
> > +       cryptodev_cpu_sym_session_fini_t sym_cpu_session_fini;
> >  };
> >
> >
> >
  
Akhil Goyal Sept. 30, 2019, 1:43 p.m. UTC | #18
Hi Konstantin,
> After another thought, it is probably a bit better to have the offset as a separate
> field. That way we can use the same xforms to create both types of sessions.
ok
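The space argument in the quoted patch (packing the key struct regains 6 bytes, which the new offset field plus reserved bytes then fill) can be checked with a small standalone program. The structs below are simplified mock-ups of rte_crypto_cipher_xform; the mock_* names are hypothetical, not DPDK types, and the expected numbers assume an LP64 target such as x86-64 with GCC or Clang, where the enums are int-sized. The sketch only compares the size of the key member and the offset of iv. Note that packing also lowers the alignment of the enclosing mock struct (from 8 to 4 here), so its overall sizeof can change even though iv does not move.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the two enums at the start of rte_crypto_cipher_xform;
 * assumed to be int-sized, as with GCC/Clang on common targets. */
enum mock_cipher_op { MOCK_OP_ENCRYPT, MOCK_OP_DECRYPT };
enum mock_cipher_algo { MOCK_ALGO_AES_CBC };

/* Layout before the proposal: on LP64 the key struct holds 8 + 2 bytes
 * of members plus 6 bytes of tail padding, i.e. 16 bytes. */
struct mock_cipher_xform_old {
	enum mock_cipher_op op;
	enum mock_cipher_algo algo;
	struct {
		const uint8_t *data;
		uint16_t length;
	} key;
	struct {
		uint16_t offset;
		uint16_t length;
	} iv;
};

/* Layout after the proposal: the packed key is 10 bytes, and the new
 * offset field plus reserved1[4] occupy the 6 bytes that were padding. */
struct mock_cipher_xform_new {
	enum mock_cipher_op op;
	enum mock_cipher_algo algo;
	struct {
		const uint8_t *data;
		uint16_t length;
	} __attribute__((__packed__)) key;
	uint16_t offset;
	uint8_t reserved1[4];
	struct {
		uint16_t offset;
		uint16_t length;
	} iv;
};

/* Bytes regained by packing the key struct (sizeof does not evaluate
 * its operand, so the null-pointer member access is never executed). */
size_t regained_bytes(void)
{
	return sizeof(((struct mock_cipher_xform_old *)0)->key) -
	       sizeof(((struct mock_cipher_xform_new *)0)->key);
}

/* Non-zero if iv sits at the same offset in both layouts. */
int iv_offset_unchanged(void)
{
	return offsetof(struct mock_cipher_xform_old, iv) ==
	       offsetof(struct mock_cipher_xform_new, iv);
}
```

On a 32-bit (ILP32) target the padding is smaller, so only 2 bytes would be regained and iv would move; the 6-byte figure is specific to 64-bit pointers.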
> 
> > > +
> > > +       uint8_t reserved1[4];
> > > +
> > >         /**< Cipher key
> > >          *
> > >          * For the RTE_CRYPTO_CIPHER_AES_F8 mode of operation, key.data will
> > > @@ -284,7 +305,7 @@ struct rte_crypto_auth_xform {
> > >         struct {
> > >                 const uint8_t *data;    /**< pointer to key data */
> > >                 uint16_t length;        /**< key length in bytes */
> > > -       } key;
> > > +       } __attribute__((__packed__)) key;
> > >         /**< Authentication key data.
> > >          * The authentication key length MUST be less than or equal to the
> > >          * block size of the algorithm. It is the callers responsibility to
> > > @@ -292,6 +313,8 @@ struct rte_crypto_auth_xform {
> > >          * (for example RFC 2104, FIPS 198a).
> > >          */
> > >
> > > +       uint8_t reserved1[6];
> > > +
> > >         struct {
> > >                 uint16_t offset;
> > >                 /**< Starting point for Initialisation Vector or Counter,
> > > @@ -376,7 +399,12 @@ struct rte_crypto_aead_xform {
> > >         struct {
> > >                 const uint8_t *data;    /**< pointer to key data */
> > >                 uint16_t length;        /**< key length in bytes */
> > > -       } key;
> > > +       } __attribute__((__packed__)) key;
> > > +
> > > +       /** offset for cipher to start within data buffer */
> > > +       uint16_t cipher_offset;
> > > +
> > > +       uint8_t reserved1[4];
> > >
> > >         struct {
> > >                 uint16_t offset;
> > > diff --git a/lib/librte_cryptodev/rte_cryptodev.h
> > > b/lib/librte_cryptodev/rte_cryptodev.h
> > > index e175b838c..c0c7bfed7 100644
> > > --- a/lib/librte_cryptodev/rte_cryptodev.h
> > > +++ b/lib/librte_cryptodev/rte_cryptodev.h
> > > @@ -1272,6 +1272,101 @@ void *
> > >  rte_cryptodev_sym_session_get_user_data(
> > >                                         struct rte_cryptodev_sym_session *sess);
> > >
> > > +/*
> > > + * After several thoughts decided not to try to squeeze CPU_CRYPTO
> > > + * into existing rte_crypto_sym_session structure/API, but instead
> > > + * introduce an extension to it via a new fully opaque
> > > + * struct rte_crypto_cpu_sym_session and additional related API.
> >
> >
> > What all things do we need to squeeze?
> > In this proposal I do not see the new struct cpu_sym_session  defined here.
> 
> The plan is to have it totally opaque to the user, i.e. just:
> struct rte_crypto_cpu_sym_session;
> in public header files.
> 
> > I believe you will have the same lib API/struct for cpu_sym_session and sym_session.
> 
> I thought about such a way, but there are a few things that look clumsy to me:
> 1. Right now there is no 'type' (or similar) field inside rte_cryptodev_sym_session,
> so it is not possible to easily distinguish which session you have: lksd_sym or
> cpu_sym.
> In theory, there is a 4B hole inside rte_cryptodev_sym_session, so we could add
> an extra field there, but in that case we wouldn't be able to use the same xform
> for both lksd_sym and cpu_sym (which seems a really plausible thing to me).
> 2. The majority of rte_cryptodev_sym_session fields are, I think, unnecessary for
> rte_crypto_cpu_sym_session:
> sess_data[], opaque_data, user_data, nb_drivers.
> All of that consumes space that could be used somewhere else instead.
> 3. I am a bit reluctant to touch the existing rte_cryptodev API, to avoid any
> breakages I can't foresee right now.
> On the other side, if we add new functions/structs for cpu_sym_session we can
> mark them as experimental and keep them that way for some time, so further
> changes (if needed) would still be possible.
> 

OK, let us assume that you have a separate structure. But I have a few queries:
1. How can multiple drivers use the same session?
2. Can somebody use the scheduler PMD for scheduling different types of payloads for the same session?

With your proposal the APIs would be very specific to your use case only.
When you add more functionality to this sync API/struct, it will end up being the same API/struct.

Let us see how close/far we are from the existing APIs when the actual implementation is done.

> > I am not sure if that would be needed.
> > It would be internal to the driver: if synchronous processing is
> > supported (from the feature flag) and the relevant fields in the xform
> > (the newly added ones, which are packed as per your suggestion) are set,
> > it will create that type of session.
> >
> >
> > > + * Main points:
> > > + * - The current crypto-dev API is reasonably mature and it is desirable
> > > + *   to keep it unchanged (API/ABI stability). On the other side, this
> > > + *   sync API is a new one and would probably require extra changes.
> > > + *   Keeping it separate allows us to mark it as experimental without
> > > + *   affecting the existing one.
> > > + * - A fully opaque cpu_sym_session structure gives more flexibility
> > > + *   to PMD writers and again allows us to avoid ABI breakages in future.
> > > + * - A process() function per set of xforms
> > > + *   allows exposing different process() functions for different
> > > + *   xform combinations. The PMD writer can decide whether to
> > > + *   push all supported algorithms into one process() function,
> > > + *   or spread them across several ones.
> > > + *   I.e. more flexibility for the PMD writer.
> >
> > Which process function should be chosen is internal to the PMD; how would that
> > info be visible to the application or the library? These will get stored in the
> > session private data. It would be up to the PMD writer to store the per-session
> > process function in the session private data.
> >
> > The process function would be a dev op just like the enq/deq operations, and it
> > should call the respective process API stored in the session private data.
> 
> That model (via dev_ops) is possible, but has several drawbacks from my
> perspective:
> 
> 1. It means we'll need to pass dev_id as a parameter to the process() function,
> though in fact dev_id is not relevant information for us here
> (all we need is a pointer to the session and a pointer to the function to call),
> and I tried to avoid using it in data-path functions for that API.

You have a single vdev, but someone may have multiple vdevs, one per thread, or may
have the same dev with multiple queues, one per core.

> 2. As you pointed out, in that case there will be just one process() function per device.
> So if a PMD would like to have several process() functions for different types of
> sessions (let's say one per alg), the first thing it has to do inside its process() is
> read the session data and, based on that, jump to/call a particular internal sub-routine.
> Something like:
> driver_id = get_pmd_driver_id();
> priv_ses = ses->sess_data[driver_id];
> Then either:
> switch(priv_ses->alg) {case XXX: process_XXX(priv_ses, ...);break;...}
> OR
> priv_ses->process(priv_ses, ...);
> 
> to select and call the proper function.
> That looks like totally unnecessary overhead to me.
> Though if we have the ability to query/extract some sort of session_ops based on the
> xform, we can avoid this extra dereference+jump/call thing.

What is the issue with the priv_ses->process() approach?
I don't understand what you are saving by not doing this.
In any case you would need to identify which session corresponds to which process().
For that you would be doing it somewhere in your data path.

> 
> >
> > I am not sure if you would need a new session init API for this, as nothing would
> > be visible to the app or lib.
> >
> > > + * - Not storing the process() pointer inside the session
> > > + *   allows the user to choose whether to store a process() pointer
> > > + *   per session, or per group of sessions for that device that share
> > > + *   the same input xforms. I.e. extra flexibility for the user,
> > > + *   plus it allows us to keep cpu_sym_session totally opaque, see above.
> >
> > If multiple sessions need to be processed via the same process function,
> > PMD would save the same process in all the sessions, I don't think there would
> > be any perf overhead with that.
> 
> I think it would, see above.
> 
> >
> > > + * Sketched usage model:
> > > + * ....
> > > + * // control path: alloc/init session
> > > + * int32_t sz = rte_crypto_cpu_sym_session_size(dev_id, &xform);
> > > + * struct rte_crypto_cpu_sym_session *ses = user_alloc(..., sz);
> > > + * rte_crypto_cpu_sym_process_t process =
> > > + *     rte_crypto_cpu_sym_session_func(dev_id, &xform);
> > > + * rte_crypto_cpu_sym_session_init(dev_id, ses, &xform);
> > > + * ...
> > > + * // data path
> > > + * process(ses, ....);
> > > + * ....
> > > + * // control path: terminate/free session
> > > + * rte_crypto_cpu_sym_session_fini(dev_id, ses);
> > > + */
> > > +
> > > +/**
> > > + * vector structure, contains pointer to vector array and the length
> > > + * of the array
> > > + */
> > > +struct rte_crypto_vec {
> > > +       struct iovec *vec;
> > > +       uint32_t num;
> > > +};
> > > +
> > > +/*
> > > + * Data-path bulk process crypto function.
> > > + */
> > > +typedef void (*rte_crypto_cpu_sym_process_t)(
> > > +               struct rte_crypto_cpu_sym_session *sess,
> > > +               struct rte_crypto_vec buf[], void *iv[], void *aad[],
> > > +               void *digest[], int status[], uint32_t num);
> > > +/*
> > > + * For the given device, return the process function specific to the input xforms.
> > > + * On error, return NULL and set the rte_errno value.
> > > + * Note that the same input xforms for the same device should return
> > > + * the same process function.
> > > + */
> > > +__rte_experimental
> > > +rte_crypto_cpu_sym_process_t
> > > +rte_crypto_cpu_sym_session_func(uint8_t dev_id,
> > > +                       const struct rte_crypto_sym_xform *xforms);
> > > +
> > > +/*
> > > + * Return the required session size in bytes for the given set of xforms.
> > > + * If xforms == NULL, return the max possible session size,
> > > + * which would fit a session for any algorithm supported by the device.
> > > + * If CPU mode is not supported at all, or the algorithm requested in the
> > > + * xform is not supported, return -ENOTSUP.
> > > + */
> > > +__rte_experimental
> > > +int
> > > +rte_crypto_cpu_sym_session_size(uint8_t dev_id,
> > > +                       const struct rte_crypto_sym_xform *xforms);
> > > +
> > > +/*
> > > + * Initialize session.
> > > + * It is the caller's responsibility to allocate enough space for it.
> > > + * See rte_crypto_cpu_sym_session_size above.
> > > + */
> > > +__rte_experimental
> > > +int rte_crypto_cpu_sym_session_init(uint8_t dev_id,
> > > +                       struct rte_crypto_cpu_sym_session *sess,
> > > +                       const struct rte_crypto_sym_xform *xforms);
> > > +
> > > +__rte_experimental
> > > +void
> > > +rte_crypto_cpu_sym_session_fini(uint8_t dev_id,
> > > +                       struct rte_crypto_cpu_sym_session *sess);
> > > +
> > > +
> > >  #ifdef __cplusplus
> > >  }
> > >  #endif
> > > diff --git a/lib/librte_cryptodev/rte_cryptodev_pmd.h
> > > b/lib/librte_cryptodev/rte_cryptodev_pmd.h
> > > index defe05ea0..ed7e63fab 100644
> > > --- a/lib/librte_cryptodev/rte_cryptodev_pmd.h
> > > +++ b/lib/librte_cryptodev/rte_cryptodev_pmd.h
> > > @@ -310,6 +310,20 @@ typedef void (*cryptodev_sym_free_session_t)(struct rte_cryptodev *dev,
> > >  typedef void (*cryptodev_asym_free_session_t)(struct rte_cryptodev *dev,
> > >                 struct rte_cryptodev_asym_session *sess);
> > >
> > > +typedef int (*cryptodev_cpu_sym_session_size_t) (struct rte_cryptodev *dev,
> > > +                       const struct rte_crypto_sym_xform *xforms);
> > > +
> > > +typedef int (*cryptodev_cpu_sym_session_init_t) (struct rte_cryptodev *dev,
> > > +                       struct rte_crypto_cpu_sym_session *sess,
> > > +                       const struct rte_crypto_sym_xform *xforms);
> > > +
> > > +typedef void (*cryptodev_cpu_sym_session_fini_t) (struct rte_cryptodev *dev,
> > > +                       struct rte_crypto_cpu_sym_session *sess);
> > > +
> > > +typedef rte_crypto_cpu_sym_process_t (*cryptodev_cpu_sym_session_func_t) (
> > > +                       struct rte_cryptodev *dev,
> > > +                       const struct rte_crypto_sym_xform *xforms);
> > > +
> > >  /** Crypto device operations function pointer table */
> > >  struct rte_cryptodev_ops {
> > >         cryptodev_configure_t dev_configure;    /**< Configure device. */
> > > @@ -343,6 +357,11 @@ struct rte_cryptodev_ops {
> > >         /**< Clear a Crypto sessions private data. */
> > >         cryptodev_asym_free_session_t asym_session_clear;
> > >         /**< Clear a Crypto sessions private data. */
> > > +
> > > +       cryptodev_cpu_sym_session_size_t sym_cpu_session_get_size;
> > > +       cryptodev_cpu_sym_session_func_t sym_cpu_session_get_func;
> > > +       cryptodev_cpu_sym_session_init_t sym_cpu_session_init;
> > > +       cryptodev_cpu_sym_session_fini_t sym_cpu_session_fini;
> > >  };
> > >
> > >
> > >
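The sketched usage model in the quoted patch (query the session size, let the user allocate, look up process(), init, call process() on the data path, then fini) can be exercised end to end with a self-contained mock. Everything named mock_* is hypothetical and only stands in for the draft rte_crypto_cpu_sym_session_*() calls; the "cipher" is a one-byte XOR, not real crypto, chosen purely so that the effect of a session-level cipher offset across iovec segments, and the per-buffer status[] array, are visible. The process signature mirrors the proposed rte_crypto_cpu_sym_process_t, with iv/aad/digest left unused by the toy.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>

/* Same shape as the proposed struct rte_crypto_vec. */
struct mock_crypto_vec { struct iovec *vec; uint32_t num; };

/* Hypothetical xform: a toy one-byte XOR key (NOT real crypto) and the
 * offset at which ciphering starts in each buffer. */
struct mock_xform { uint8_t key; uint16_t cipher_offset; };

/* Opaque to the caller in the proposal; defined here only for the mock. */
struct mock_cpu_session { uint8_t key; uint16_t cipher_offset; };

/* Same argument list as the proposed rte_crypto_cpu_sym_process_t. */
typedef void (*mock_process_t)(struct mock_cpu_session *sess,
		struct mock_crypto_vec buf[], void *iv[], void *aad[],
		void *digest[], int status[], uint32_t num);

/* XOR each buffer from cipher_offset to its end, walking across iovec
 * segments, and write one status entry per buffer. */
static void mock_xor_process(struct mock_cpu_session *sess,
		struct mock_crypto_vec buf[], void *iv[], void *aad[],
		void *digest[], int status[], uint32_t num)
{
	(void)iv; (void)aad; (void)digest; /* unused by the toy cipher */
	for (uint32_t i = 0; i != num; i++) {
		size_t skip = sess->cipher_offset;

		for (uint32_t s = 0; s != buf[i].num; s++) {
			uint8_t *p = buf[i].vec[s].iov_base;
			size_t len = buf[i].vec[s].iov_len;
			size_t start = skip < len ? skip : len;

			skip -= start;
			for (size_t j = start; j != len; j++)
				p[j] ^= sess->key;
		}
		/* an offset past the end of the buffer is an error */
		status[i] = (skip == 0) ? 0 : -EINVAL;
	}
}

/* Control-path stand-ins for the proposed size/func/init/fini calls. */
static int mock_session_size(const struct mock_xform *xf)
{
	return xf != NULL ? (int)sizeof(struct mock_cpu_session) : -ENOTSUP;
}

static mock_process_t mock_session_func(const struct mock_xform *xf)
{
	return xf != NULL ? mock_xor_process : NULL;
}

static int mock_session_init(struct mock_cpu_session *sess,
		const struct mock_xform *xf)
{
	sess->key = xf->key;
	sess->cipher_offset = xf->cipher_offset;
	return 0;
}

static void mock_session_fini(struct mock_cpu_session *sess)
{
	memset(sess, 0, sizeof(*sess)); /* wipe key material */
}

/* Whole lifecycle on one two-segment buffer; returns status[0]. */
int mock_demo(uint8_t *seg0, size_t len0, uint8_t *seg1, size_t len1)
{
	struct mock_xform xf = { .key = 0xFF, .cipher_offset = 2 };
	int sz = mock_session_size(&xf);
	mock_process_t process = mock_session_func(&xf);
	struct iovec iov[2] = {
		{ .iov_base = seg0, .iov_len = len0 },
		{ .iov_base = seg1, .iov_len = len1 },
	};
	struct mock_crypto_vec v = { .vec = iov, .num = 2 };
	void *iv[1] = { NULL }, *aad[1] = { NULL }, *digest[1] = { NULL };
	int status[1] = { -1 };
	struct mock_cpu_session *ses;

	if (sz < 0 || process == NULL)
		return -ENOTSUP;
	ses = malloc((size_t)sz);                      /* user_alloc(..., sz) */
	if (ses == NULL || mock_session_init(ses, &xf) != 0) {
		free(ses);
		return -ENOMEM;
	}
	process(ses, &v, iv, aad, digest, status, 1);  /* data path */
	mock_session_fini(ses);                        /* control path */
	free(ses);
	return status[0];
}
```

Because the result is available as soon as process() returns, there is no enqueue/dequeue pair and no crypto-op to fill in, which is the synchronous behaviour the thread is discussing.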
  
Ananyev, Konstantin Oct. 1, 2019, 2:49 p.m. UTC | #19
Hi Akhil,

> > > > > > > > > > > > > This action type allows the burst of symmetric crypto workload
> > > > > > > > > > > > > using the same algorithm, key, and direction being processed by
> > > > > > > > > > > > > CPU cycles synchronously.
> > > > > > > > > > > > > This flexible action type does not require external hardware
> > > > > > > > > > > > > involvement, having the crypto workload processed synchronously,
> > > > > > > > > > > > > and is more performant than Cryptodev SW PMD due to the saved
> > > > > > > > > > > > > cycles on removed "async mode simulation" as well as 3 cacheline
> > > > > > > > > > > > > access of the crypto ops.
> > > > > > > > > > > >
> > > > > > > > > > > > Does that mean the application will not call
> > > > > > > > > > > > cryptodev_enqueue_burst and the corresponding dequeue burst?
> > > > > > > > > > >
> > > > > > > > > > > Yes, instead it just calls rte_security_process_cpu_crypto_bulk(...)
> > > > > > > > > > >
> > > > > > > > > > > > It would be a new API, something like process_packets, and it
> > > > > > > > > > > > will have the crypto-processed packets when returning from the API?
> > > > > > > > > > >
> > > > > > > > > > > Yes, though the plan is that the API will operate on raw data
> > > > > > > > > > > buffers, not mbufs.
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > I still do not understand why we cannot do this with the
> > > > > > > > > > > > conventional crypto lib only.
> > > > > > > > > > > > As far as I can understand, you are not doing any protocol
> > > > > > > > > > > > processing or adding any value to the crypto processing.
> > > > > > > > > > > > IMO, you just need a synchronous crypto processing API, which
> > > > > > > > > > > > can be defined in cryptodev; you don't need to re-create a
> > > > > > > > > > > > crypto session in the name of a security session in the driver
> > > > > > > > > > > > just to do synchronous processing.
> > > > > > > > > > >
> > > > > > > > > > > I suppose your question is why not have
> > > > > > > > > > > rte_crypto_process_cpu_crypto_bulk(...) instead?
> > > > > > > > > > > The main reason is that it would require disruptive changes in the
> > > > > > > > > > > existing cryptodev API (it would cause ABI/API breakage).
> > > > > > > > > > > A session for RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO needs some extra
> > > > > > > > > > > information that a normal crypto_sym_xform doesn't contain
> > > > > > > > > > > (the cipher offset from the start of the buffer, and maybe
> > > > > > > > > > > something extra in future).
> > > > > > > > > >
> > > > > > > > > > Cipher offset will be part of rte_crypto_op.
> > > > > > > > >
> > > > > > > > > Fill/read (+ alloc/free) is one of the main things that slows down
> > > > > > > > > the current crypto-op approach.
> > > > > > > > > That's why the general idea is to have all data that wouldn't change
> > > > > > > > > from packet to packet included in the session and set it up once
> > > > > > > > > at session_init().
> > > > > > > >
> > > > > > > > I agree that you cannot use crypto-op.
> > > > > > > > You can have the new API in crypto.
> > > > > > > > As per the current patch, you only need cipher_offset, which you can
> > > > > > > > have as a parameter until you get it approved in the crypto xform.
> > > > > > > > I believe it will be beneficial for other crypto cases as well.
> > > > > > > > We can have the cipher offset in both places (crypto-op and
> > > > > > > > cipher_xform). It will give the user the flexibility to override it.
> > > > > > >
> > > > > > > After another thought on your proposal:
> > > > > > > perhaps we can introduce new rte_crypto_sym_xform_types for
> > > > > > > CPU-related stuff here?
> > > > > >
> > > > > > I also thought of adding new xforms, but that won't serve the purpose
> > > > > > for maybe all the cases.
> > > > > > You would need all the information currently available in the
> > > > > > current xforms.
> > > > > > So if you add new fields in the new xform, the size will be more than
> > > > > > that of the union of xforms. ABI breakage would still be there.
> > > > > >
> > > > > > If you think a valid compression of the AEAD xform can be done, then
> > > > > > that can be done for each of the xforms and we can have a solution to
> > > > > > this issue.
> > > > >
> > > > > I think that we can re-use iv.offset for our purposes (for the crypto
> > > > > offset). So for now we can make that path work without any ABI breakage.
> > > > > Fan, please feel free to correct me here if I missed something.
> > > > > If in future we need to add some extra information it might
> > > > > require ABI breakage, though for now I don't envision anything in
> > > > > particular to add.
> > > > > Anyway, if there is no objection to going that way, we can try to make
> > > > > these changes for v2.
> > > > >
> > > >
> > > > Actually, after looking at it more deeply, it appears not as easy as I
> > > > thought it would be :)
> > > > Below is a very draft version of the proposed API additions.
> > > > I think it avoids ABI breakages right now and provides enough flexibility
> > > > for future extensions (if any).
> > > > For now, it doesn't address your comments about naming conventions
> > > > (_CPU_ vs _SYNC_), etc., but I suppose it is comprehensive enough to
> > > > convey the main idea behind it.
> > > > Akhil and other interested parties, please try to review and provide
> > > > feedback ASAP, as related changes would take some time and we would still
> > > > like to hit the 19.11 deadline.
> > > > Konstantin
> > > >
> > > >  diff --git a/lib/librte_cryptodev/rte_crypto_sym.h
> > > > b/lib/librte_cryptodev/rte_crypto_sym.h
> > > > index bc8da2466..c03069e23 100644
> > > > --- a/lib/librte_cryptodev/rte_crypto_sym.h
> > > > +++ b/lib/librte_cryptodev/rte_crypto_sym.h
> > > > @@ -103,6 +103,9 @@ rte_crypto_cipher_operation_strings[];
> > > >   *
> > > >   * This structure contains data relating to Cipher (Encryption and Decryption)
> > > >   *  use to create a session.
> > > > + * Actually I was wrong to say that we don't have free space inside xforms.
> > > > + * Making the key struct packed (see below) allows us to regain 6B that could be
> > > > + * used for future extensions.
> > > >   */
> > > >  struct rte_crypto_cipher_xform {
> > > >         enum rte_crypto_cipher_operation op;
> > > > @@ -116,7 +119,25 @@ struct rte_crypto_cipher_xform {
> > > >         struct {
> > > >                 const uint8_t *data;    /**< pointer to key data */
> > > >                 uint16_t length;        /**< key length in bytes */
> > > > -       } key;
> > > > +       } __attribute__((__packed__)) key;
> > > > +
> > > > +       /**
> > > > +        * Offset for cipher to start within the user-provided data buffer.
> > > > +        * Fan suggested another (less space-consuming) way:
> > > > +        * reuse the iv.offset space below, by changing:
> > > > +        * struct {uint16_t offset, length;} iv;
> > > > +        * to an unnamed union:
> > > > +        * union {
> > > > +        *      struct {uint16_t offset, length;} iv;
> > > > +        *      struct {uint16_t iv_len, crypto_offset;} cpu_crypto_param;
> > > > +        * };
> > > > +        * Both approaches seem OK to me in general.
> > >
> > > No strong opinions here. OK with this one.
> > >
> > > > +        * Comments/suggestions are welcome.
> > > > +        */
> > > > +       uint16_t offset;
> >
> > After another thought, it is probably a bit better to have the offset as a separate
> > field. In that case we can use the same xforms to create both types of sessions.
> ok
> >
> > > > +
> > > > +       uint8_t reserved1[4];
> > > > +
> > > >         /**< Cipher key
> > > >          *
> > > >          * For the RTE_CRYPTO_CIPHER_AES_F8 mode of operation, key.data will
> > > > @@ -284,7 +305,7 @@ struct rte_crypto_auth_xform {
> > > >         struct {
> > > >                 const uint8_t *data;    /**< pointer to key data */
> > > >                 uint16_t length;        /**< key length in bytes */
> > > > -       } key;
> > > > +       } __attribute__((__packed__)) key;
> > > >         /**< Authentication key data.
> > > >          * The authentication key length MUST be less than or equal to the
> > > >          * block size of the algorithm. It is the callers responsibility to
> > > > @@ -292,6 +313,8 @@ struct rte_crypto_auth_xform {
> > > >          * (for example RFC 2104, FIPS 198a).
> > > >          */
> > > >
> > > > +       uint8_t reserved1[6];
> > > > +
> > > >         struct {
> > > >                 uint16_t offset;
> > > >                 /**< Starting point for Initialisation Vector or Counter,
> > > > @@ -376,7 +399,12 @@ struct rte_crypto_aead_xform {
> > > >         struct {
> > > >                 const uint8_t *data;    /**< pointer to key data */
> > > >                 uint16_t length;        /**< key length in bytes */
> > > > -       } key;
> > > > +       } __attribute__((__packed__)) key;
> > > > +
> > > > +       /** offset for cipher to start within data buffer */
> > > > +       uint16_t cipher_offset;
> > > > +
> > > > +       uint8_t reserved1[4];
> > > >
> > > >         struct {
> > > >                 uint16_t offset;
> > > > diff --git a/lib/librte_cryptodev/rte_cryptodev.h
> > > > b/lib/librte_cryptodev/rte_cryptodev.h
> > > > index e175b838c..c0c7bfed7 100644
> > > > --- a/lib/librte_cryptodev/rte_cryptodev.h
> > > > +++ b/lib/librte_cryptodev/rte_cryptodev.h
> > > > @@ -1272,6 +1272,101 @@ void *
> > > >  rte_cryptodev_sym_session_get_user_data(
> > > >                                         struct rte_cryptodev_sym_session *sess);
> > > >
> > > > +/*
> > > > + * After several thoughts decided not to try to squeeze CPU_CRYPTO
> > > > + * into existing rte_crypto_sym_session structure/API, but instead
> > > > + * introduce an extension to it via a new fully opaque
> > > > + * struct rte_crypto_cpu_sym_session and additional related API.
> > >
> > >
> > > What all things do we need to squeeze?
> > > In this proposal I do not see the new struct cpu_sym_session  defined here.
> >
> > The plan is to have it totally opaque to the user, i.e. just:
> > struct rte_crypto_cpu_sym_session;
> > in public header files.
> >
> > > I believe you will have the same lib API/struct for cpu_sym_session and sym_session.
> >
> > I thought about such a way, but there are a few things that look clumsy to me:
> > 1. Right now there is no 'type' (or similar) field inside rte_cryptodev_sym_session,
> > so it is not possible to easily distinguish which session you have: lksd_sym or
> > cpu_sym.
> > In theory, there is a 4B hole inside rte_cryptodev_sym_session, so we could add
> > an extra field there, but in that case we wouldn't be able to use the same xform
> > for both lksd_sym and cpu_sym (which seems a really plausible thing to me).
> > 2. The majority of rte_cryptodev_sym_session fields are, I think, unnecessary for
> > rte_crypto_cpu_sym_session:
> > sess_data[], opaque_data, user_data, nb_drivers.
> > All of that consumes space that could be used somewhere else instead.
> > 3. I am a bit reluctant to touch the existing rte_cryptodev API, to avoid any
> > breakages I can't foresee right now.
> > On the other side, if we add new functions/structs for cpu_sym_session we can
> > mark them as experimental and keep them that way for some time, so further
> > changes (if needed) would still be possible.
> >
> 
> OK, let us assume that you have a separate structure. But I have a few queries:
> 1. How can multiple drivers use the same session?

As a short answer: they can't.
It is pretty much the same approach as with rte_security: each device needs to create/init its own session.
So the upper layer would need to maintain its own array (or similar) for such a case.
Though the question is: why would you want the same session over multiple SW-backed devices,
as it would anyway be just a synchronous function call executed on the same CPU?

> 2. Can somebody use the scheduler PMD for scheduling different types of payloads for the same session?

In theory, yes.
Though for that, the scheduler PMD would need to keep, inside its rte_crypto_cpu_sym_session, an array of pointers to
the underlying devices' sessions.
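The scheduler case described above can be sketched as follows: the scheduler's own opaque session keeps one session pointer and one process() pointer per underlying device, which is essentially the per-device array an upper layer would otherwise maintain itself. All names here are hypothetical, and round-robin is an arbitrary policy chosen only for illustration; a real scheduler might instead dispatch by payload type.

```c
#include <assert.h>
#include <stdint.h>

#define SCHED_MAX_CHILDREN 4

/* Hypothetical per-device ("child") session; the counter only exists
 * so the example can show which child handled a burst. */
struct child_session {
	int dev_id;
	uint32_t calls;
};

typedef void (*child_process_t)(struct child_session *sess, uint32_t num);

/* Hypothetical scheduler session: one session pointer and one process()
 * pointer per underlying device, plus round-robin state. */
struct sched_session {
	struct child_session *child[SCHED_MAX_CHILDREN];
	child_process_t process[SCHED_MAX_CHILDREN];
	uint32_t nb_children;
	uint32_t next;
};

/* Stand-in for a child PMD's process(): just records the call. */
void child_count_process(struct child_session *sess, uint32_t num)
{
	(void)num;
	sess->calls++;
}

/* The scheduler's own process(): hand the whole burst to the next child
 * session, using that child's process() function. */
void sched_process(struct sched_session *s, uint32_t num)
{
	uint32_t i = s->next;

	s->process[i](s->child[i], num);
	s->next = (i + 1) % s->nb_children;
}
```

Since every call is synchronous and runs on the calling CPU, "scheduling" here only chooses which child session and function to invoke; nothing is queued.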

> 
> With your proposal the APIs would be very specific to your use case only.

Yes, in some way.
I consider that API specific to SW-backed crypto PMDs.
I can hardly see how any 'real HW' PMDs (lksd-none, lksd-proto) would benefit from it.
The current crypto-op API is very much HW-oriented.
That's OK, since that is what it was intended for, but I think we also need one that is designed
with SW-backed implementations in mind.

> When you add more functionality to this sync API/struct, it will end up being the same API/struct.
> 
> Let us see how close/far we are from the existing APIs when the actual implementation is done.
> 
> > > I am not sure if that would be needed.
> > > It would be internal to the driver: if synchronous processing is
> > > supported (from the feature flag) and the relevant fields in the xform
> > > (the newly added ones, which are packed as per your suggestion) are set,
> > > it will create that type of session.
> > >
> > >
> > > > + * Main points:
> > > > + * - The current crypto-dev API is reasonably mature and it is desirable
> > > > + *   to keep it unchanged (API/ABI stability). On the other side, this
> > > > + *   sync API is a new one and would probably require extra changes.
> > > > + *   Keeping it separate allows us to mark it as experimental without
> > > > + *   affecting the existing one.
> > > > + * - A fully opaque cpu_sym_session structure gives more flexibility
> > > > + *   to PMD writers and again allows us to avoid ABI breakages in future.
> > > > + * - A process() function per set of xforms
> > > > + *   allows exposing different process() functions for different
> > > > + *   xform combinations. The PMD writer can decide whether to
> > > > + *   push all supported algorithms into one process() function,
> > > > + *   or spread them across several ones.
> > > > + *   I.e. more flexibility for the PMD writer.
> > >
> > > Which process function should be chosen is internal to the PMD; how would that
> > > info be visible to the application or the library? These will get stored in the
> > > session private data. It would be up to the PMD writer to store the per-session
> > > process function in the session private data.
> > >
> > > The process function would be a dev op just like the enq/deq operations, and it
> > > should call the respective process API stored in the session private data.
> >
> > That model (via dev_ops) is possible, but has several drawbacks from my
> > perspective:
> >
> > 1. It means we'll need to pass dev_id as a parameter to the process() function,
> > though in fact dev_id is not relevant information for us here
> > (all we need is a pointer to the session and a pointer to the function to call),
> > and I tried to avoid using it in data-path functions for that API.
> 
> You have a single vdev, but someone may have multiple vdevs, one per thread, or may
> have the same dev with multiple queues, one per core.

That's fine. As I said above, it is a SW-backed implementation.
Each session has to be a separate entity that contains all the information necessary
(keys, alg/mode info, etc.) to process input buffers.
Plus we need the actual function pointer to call.
I just don't see why we need a dev_id in that situation.
Again, here we don't need to care about queues and their pinning to cores.
If, let's say, someone would like to process buffers from the same IPsec SA on 2
different cores in parallel, he can just create 2 sessions for the same xform,
give one to thread #1 and the second to thread #2.
After that both threads are free to call process(this_thread_ses, ...) at will.
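The one-session-per-thread pattern described above can be sketched with POSIX threads. The toy session and its XOR "process" are hypothetical stand-ins, not DPDK code; the point is only that two sessions initialized from the same key (standing in for the same xform) can be driven concurrently without locks and without any dev_id, because each thread touches only its own session.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical session: key material plus per-session mutable state
 * (a byte counter), which is why one session per thread avoids locks. */
struct toy_session {
	uint8_t key;
	uint64_t bytes_done;
};

static void toy_session_init(struct toy_session *s, uint8_t key)
{
	s->key = key;
	s->bytes_done = 0;
}

/* Synchronous toy "process": XOR in place, then update session state. */
static void toy_process(struct toy_session *s, uint8_t *buf, size_t len)
{
	for (size_t i = 0; i < len; i++)
		buf[i] ^= s->key;
	s->bytes_done += len;
}

struct worker {
	struct toy_session ses;  /* this thread's private session */
	uint8_t data[64];
	int iterations;
	pthread_t tid;
};

static void *worker_main(void *arg)
{
	struct worker *w = arg;

	for (int i = 0; i < w->iterations; i++)
		toy_process(&w->ses, w->data, sizeof(w->data));
	return NULL;
}

/* Two sessions initialized from the same "xform" (same key), one per
 * thread; each thread calls process() on its own session at will. */
int run_two_workers(uint64_t done[2])
{
	struct worker w[2];
	int started = 0;

	for (int i = 0; i < 2; i++) {
		toy_session_init(&w[i].ses, 0x5A);
		memset(w[i].data, i, sizeof(w[i].data));
		w[i].iterations = 1000;
		if (pthread_create(&w[i].tid, NULL, worker_main, &w[i]) != 0)
			break;
		started++;
	}
	/* join whatever was started, even on a partial failure */
	for (int i = 0; i < started; i++)
		pthread_join(w[i].tid, NULL);
	if (started != 2)
		return -1;
	for (int i = 0; i < 2; i++)
		done[i] = w[i].ses.bytes_done;
	return 0;
}
```

On older toolchains this needs to be built with -pthread.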

> 
> > 2. As you pointed out, in that case there will be just one process() function per device.
> > So if a PMD would like to have several process() functions for different types of
> > sessions (let's say one per alg), the first thing it has to do inside its process() is
> > read the session data and, based on that, jump to/call a particular internal sub-routine.
> > Something like:
> > driver_id = get_pmd_driver_id();
> > priv_ses = ses->sess_data[driver_id];
> > Then either:
> > switch(priv_ses->alg) {case XXX: process_XXX(priv_ses, ...);break;...}
> > OR
> > priv_ses->process(priv_ses, ...);
> >
> > to select and call the proper function.
> > That looks like totally unnecessary overhead to me.
> > Though if we have the ability to query/extract some sort of session_ops based on the
> > xform, we can avoid this extra dereference+jump/call thing.
> 
> What is the issue with the priv_ses->process() approach?

Nothing at all.
What I am saying is that the scheme with dev_ops
dev[dev_id]->dev_ops.process(ses->priv_ses[driver_id], ...)
   |
   |-> priv_ses->process(...)

has a bigger overhead than just:
process(ses,...);

So why introduce an extra level of indirection here?
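The two call paths in the diagram can be written out as a small C sketch; every type and function name is hypothetical, not the cryptodev API. Scheme 1 reaches the algorithm routine through the device table, the per-driver session slot, the dev_ops pointer and finally priv_ses->process(), i.e. two indirect calls plus the loads needed to find them. Scheme 2 is a single indirect call through a pointer the caller obtained once on the control path. Whether the difference is measurable depends on the compiler, branch prediction and cache state; the sketch only makes the extra loads and calls explicit.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_DRIVERS 4

struct priv_session;
typedef int (*process_fn)(struct priv_session *s, uint32_t num);

/* Hypothetical PMD-private session holding its own process() pointer. */
struct priv_session {
	process_fn process;
	uint32_t handled;
};

/* Hypothetical generic session with per-driver private data slots. */
struct generic_session {
	struct priv_session *sess_data[MAX_DRIVERS];
};

struct mock_dev_ops {
	int (*process)(struct priv_session *priv, uint32_t num);
};

struct mock_dev {
	const struct mock_dev_ops *ops;
	uint8_t driver_id;
};

/* The actual per-algorithm routine. */
int alg_process(struct priv_session *s, uint32_t num)
{
	s->handled += num;
	return (int)num;
}

/* Generic dev_ops entry: must forward to the per-session routine. */
int devops_process(struct priv_session *priv, uint32_t num)
{
	return priv->process(priv, num);          /* 2nd indirect call */
}

/* Scheme 1: dev_id -> device -> driver_id -> private session -> dev_ops. */
int call_via_dev_ops(struct mock_dev devs[], uint8_t dev_id,
		struct generic_session *ses, uint32_t num)
{
	struct mock_dev *d = &devs[dev_id];
	struct priv_session *priv = ses->sess_data[d->driver_id];

	return d->ops->process(priv, num);        /* 1st indirect call */
}

/* Scheme 2: the caller already holds process() and the session. */
int call_direct(process_fn process, struct priv_session *s, uint32_t num)
{
	return process(s, num);                   /* single indirect call */
}
```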

> I don't understand what you are saving by not doing this.
> In any case you would need to identify which session corresponds to which process().

Yes, sure, but I think we can let the user store information about that relationship
in whatever way he likes: store a process() pointer for each session, or group sessions
that share the same process() somehow, or...

> For that you would be doing it somewhere in your data path.

Why in the data path?
Only once, at session creation/initialization time,
or maybe even once per group of sessions.
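One way a user could keep the session-to-process() relationship outside the session, resolving it once per group of sessions at control-path time as described above, is sketched below. The group structure and lookup_process() are hypothetical illustrations (lookup_process() merely plays the role of the proposed "get process() for these xforms" query); they are not part of the proposed API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GROUP_MAX 8

/* Hypothetical opaque session and process() type. */
struct gsess {
	uint8_t key;
	uint32_t processed;
};
typedef void (*gprocess_t)(struct gsess *s, uint32_t num);

static void toy_process_a(struct gsess *s, uint32_t num)
{
	s->processed += num;
}

/* Hypothetical stand-in for "give me process() for these xforms";
 * kind 0 is the only supported xform combination in this toy. */
static gprocess_t lookup_process(int xform_kind)
{
	return xform_kind == 0 ? toy_process_a : NULL;
}

/* Sessions created from equivalent xforms share one process() pointer,
 * resolved once on the control path instead of per session or packet. */
struct sess_group {
	gprocess_t process;
	struct gsess *members[GROUP_MAX];
	size_t count;
};

int group_init(struct sess_group *g, int xform_kind)
{
	g->process = lookup_process(xform_kind);
	g->count = 0;
	return g->process != NULL ? 0 : -1;
}

int group_add(struct sess_group *g, struct gsess *s)
{
	if (g->count == GROUP_MAX)
		return -1;
	g->members[g->count++] = s;
	return 0;
}

/* Data path: no lookup, just the cached pointer for every member. */
void group_process_all(const struct sess_group *g, uint32_t num)
{
	for (size_t i = 0; i < g->count; i++)
		g->process(g->members[i], num);
}
```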

> 
> >
> > >
> > > I am not sure if you would need a new session init API for this, as nothing would
> > > be visible to the app or lib.
> > >
> > > > + * - Not storing the process() pointer inside the session
> > > > + *   allows the user to choose whether to store a process() pointer
> > > > + *   per session, or per group of sessions for that device that share
> > > > + *   the same input xforms. I.e. extra flexibility for the user,
> > > > + *   plus it allows us to keep cpu_sym_session totally opaque, see above.
> > >
> > > If multiple sessions need to be processed via the same process function,
> > > PMD would save the same process in all the sessions, I don't think there would
> > > be any perf overhead with that.
> >
> > I think it would, see above.
> >
> > >
> > > > + * Sketched usage model:
> > > > + * ....
> > > > + * /* control path, alloc/init session */
> > > > + * int32_t sz = rte_crypto_cpu_sym_session_size(dev_id, &xform);
> > > > + * struct rte_crypto_cpu_sym_session *ses = user_alloc(..., sz);
> > > > + * rte_crypto_cpu_sym_process_t process =
> > > > + *     rte_crypto_cpu_sym_session_func(dev_id, &xform);
> > > > + * rte_crypto_cpu_sym_session_init(dev_id, ses, &xform);
> > > > + * ...
> > > > + * /* data-path*/
> > > > + * process(ses, ....);
> > > > + * ....
> > > > + * /* control path, terminate/free session */
> > > > + * rte_crypto_cpu_sym_session_fini(dev_id, ses);
> > > > + */
> > > > +
> > > > +/**
> > > > + * vector structure, contains pointer to vector array and the length
> > > > + * of the array
> > > > + */
> > > > +struct rte_crypto_vec {
> > > > +       struct iovec *vec;
> > > > +       uint32_t num;
> > > > +};
> > > > +
> > > > +/*
> > > > + * Data-path bulk process crypto function.
> > > > + */
> > > > +typedef void (*rte_crypto_cpu_sym_process_t)(
> > > > +               struct rte_crypto_cpu_sym_session *sess,
> > > > +               struct rte_crypto_vec buf[], void *iv[], void *aad[],
> > > > +               void *digest[], int status[], uint32_t num);
> > > > +/*
> > > > + * for given device return process function specific to input xforms
> > > > + * on error - return NULL and set rte_errno value.
> > > > + * Note that for same input xforms for the same device should return
> > > > + * the same process function.
> > > > + */
> > > > +__rte_experimental
> > > > +rte_crypto_cpu_sym_process_t
> > > > +rte_crypto_cpu_sym_session_func(uint8_t dev_id,
> > > > +                       const struct rte_crypto_sym_xform *xforms);
> > > > +
> > > > +/*
> > > > + * Return required session size in bytes for given set of xforms.
> > > > + * if xforms == NULL, then return the max possible session size,
> > > > + * that would fit session for any supported by the device algorithm.
> > > > + * if CPU mode is not supported at all, or requested in xform
> > > > + * algorithm is not supported, then return -ENOTSUP.
> > > > + */
> > > > +__rte_experimental
> > > > +int
> > > > +rte_crypto_cpu_sym_session_size(uint8_t dev_id,
> > > > +                       const struct rte_crypto_sym_xform *xforms);
> > > > +
> > > > +/*
> > > > + * Initialize session.
> > > > + * It is caller responsibility to allocate enough space for it.
> > > > + * See rte_crypto_cpu_sym_session_size above.
> > > > + */
> > > > +__rte_experimental
> > > > +int rte_crypto_cpu_sym_session_init(uint8_t dev_id,
> > > > +                       struct rte_crypto_cpu_sym_session *sess,
> > > > +                       const struct rte_crypto_sym_xform *xforms);
> > > > +
> > > > +__rte_experimental
> > > > +void
> > > > +rte_crypto_cpu_sym_session_fini(uint8_t dev_id,
> > > > +                       struct rte_crypto_cpu_sym_session *sess);
> > > > +
> > > > +
> > > >  #ifdef __cplusplus
> > > >  }
> > > >  #endif
> > > > diff --git a/lib/librte_cryptodev/rte_cryptodev_pmd.h b/lib/librte_cryptodev/rte_cryptodev_pmd.h
> > > > index defe05ea0..ed7e63fab 100644
> > > > --- a/lib/librte_cryptodev/rte_cryptodev_pmd.h
> > > > +++ b/lib/librte_cryptodev/rte_cryptodev_pmd.h
> > > > @@ -310,6 +310,20 @@ typedef void (*cryptodev_sym_free_session_t)(struct rte_cryptodev *dev,
> > > >  typedef void (*cryptodev_asym_free_session_t)(struct rte_cryptodev *dev,
> > > >                 struct rte_cryptodev_asym_session *sess);
> > > >
> > > > +typedef int (*cryptodev_cpu_sym_session_size_t) (struct rte_cryptodev *dev,
> > > > +                       const struct rte_crypto_sym_xform *xforms);
> > > > +
> > > > +typedef int (*cryptodev_cpu_sym_session_init_t) (struct rte_cryptodev *dev,
> > > > +                       struct rte_crypto_cpu_sym_session *sess,
> > > > +                       const struct rte_crypto_sym_xform *xforms);
> > > > +
> > > > +typedef void (*cryptodev_cpu_sym_session_fini_t) (struct rte_cryptodev *dev,
> > > > +                       struct rte_crypto_cpu_sym_session *sess);
> > > > +
> > > > +typedef rte_crypto_cpu_sym_process_t (*cryptodev_cpu_sym_session_func_t) (
> > > > +                       struct rte_cryptodev *dev,
> > > > +                       const struct rte_crypto_sym_xform *xforms);
> > > > +
> > > >  /** Crypto device operations function pointer table */
> > > >  struct rte_cryptodev_ops {
> > > >         cryptodev_configure_t dev_configure;    /**< Configure device. */
> > > > @@ -343,6 +357,11 @@ struct rte_cryptodev_ops {
> > > >         /**< Clear a Crypto sessions private data. */
> > > >         cryptodev_asym_free_session_t asym_session_clear;
> > > >         /**< Clear a Crypto sessions private data. */
> > > > +
> > > > +       cryptodev_cpu_sym_session_size_t sym_cpu_session_get_size;
> > > > +       cryptodev_cpu_sym_session_func_t sym_cpu_session_get_func;
> > > > +       cryptodev_cpu_sym_session_init_t sym_cpu_session_init;
> > > > +       cryptodev_cpu_sym_session_fini_t sym_cpu_session_fini;
> > > >  };
> > > >
> > > >
> > > >
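To make the sketched usage model concrete, here is a self-contained mock in C. It copies the shape of the proposed rte_crypto_vec and rte_crypto_cpu_sym_process_t (using struct iovec from <sys/uio.h>), but the session layout, the mock_* helpers and the XOR "cipher" are hypothetical placeholders, since the real session is meant to be opaque and PMD-defined. iv, aad and digest are ignored by the toy process function.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>  /* struct iovec (POSIX) */

/* Same shape as the proposed struct rte_crypto_vec. */
struct crypto_vec {
	struct iovec *vec;
	uint32_t num;
};

/* Opaque in the proposal; a toy layout is defined here only for the mock. */
struct cpu_sym_session {
	uint8_t key;
};

/* Same shape as the proposed rte_crypto_cpu_sym_process_t. */
typedef void (*cpu_sym_process_t)(struct cpu_sym_session *sess,
		struct crypto_vec buf[], void *iv[], void *aad[],
		void *digest[], int status[], uint32_t num);

/* Toy XOR "cipher" over every segment; iv/aad/digest are unused here. */
static void mock_process(struct cpu_sym_session *sess,
		struct crypto_vec buf[], void *iv[], void *aad[],
		void *digest[], int status[], uint32_t num)
{
	(void)iv; (void)aad; (void)digest;
	for (uint32_t i = 0; i < num; i++) {
		for (uint32_t j = 0; j < buf[i].num; j++) {
			uint8_t *p = buf[i].vec[j].iov_base;
			for (size_t k = 0; k < buf[i].vec[j].iov_len; k++)
				p[k] ^= sess->key;
		}
		status[i] = 0;
	}
}

/* Hypothetical stand-ins for the size/func/init/fini control-path calls. */
static int mock_session_size(void)
{
	return (int)sizeof(struct cpu_sym_session);
}

static cpu_sym_process_t mock_session_func(void)
{
	return mock_process;
}

static int mock_session_init(struct cpu_sym_session *s, uint8_t key)
{
	s->key = key;
	return 0;
}

static void mock_session_fini(struct cpu_sym_session *s)
{
	memset(s, 0, sizeof(*s));  /* e.g. wipe key material */
}

/* Follows the sketched model: control path, data path, teardown. */
static int run_example(uint8_t *data, size_t len)
{
	/* control path: size, allocate, fetch process(), init */
	int sz = mock_session_size();
	struct cpu_sym_session *ses = malloc((size_t)sz);
	if (ses == NULL)
		return -1;
	cpu_sym_process_t process = mock_session_func();
	if (mock_session_init(ses, 0x0f) != 0) {
		free(ses);
		return -1;
	}

	/* data path: one buffer made of one segment */
	struct iovec iov = { .iov_base = data, .iov_len = len };
	struct crypto_vec buf[1] = { { .vec = &iov, .num = 1 } };
	void *iv[1] = { NULL }, *aad[1] = { NULL }, *digest[1] = { NULL };
	int status[1] = { -1 };
	process(ses, buf, iv, aad, digest, status, 1);

	/* control path: terminate and free */
	mock_session_fini(ses);
	free(ses);
	return status[0];
}
```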
  
Akhil Goyal Oct. 3, 2019, 1:24 p.m. UTC | #20
Hi Konstantin,
> 
> Hi Akhil,
> 
> > > > > > > > > > > > > > This action type allows the burst of symmetric crypto workload using the
> > > > > > > > > > > > > > same algorithm, key, and direction being processed by CPU cycles synchronously.
> > > > > > > > > > > > > > This flexible action type does not require external hardware involvement,
> > > > > > > > > > > > > > having the crypto workload processed synchronously, and is more performant
> > > > > > > > > > > > > > than Cryptodev SW PMD due to the saved cycles on removed "async mode
> > > > > > > > > > > > > > simulation" as well as 3 cacheline access of the crypto ops.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Does that mean application will not call the cryptodev_enqueue_burst and
> > > > > > > > > > > > > corresponding dequeue burst.
> > > > > > > > > > > >
> > > > > > > > > > > > Yes, instead it just call rte_security_process_cpu_crypto_bulk(...)
> > > > > > > > > > > >
> > > > > > > > > > > > > It would be a new API something like process_packets and it will have the
> > > > > > > > > > > > > crypto processed packets while returning from the API?
> > > > > > > > > > > >
> > > > > > > > > > > > Yes, though the plan is that API will operate on raw data buffers, not mbufs.
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > I still do not understand why we cannot do with the conventional crypto lib only.
> > > > > > > > > > > > > As far as I can understand, you are not doing any protocol processing or any value add
> > > > > > > > > > > > > To the crypto processing. IMO, you just need a synchronous crypto processing API which
> > > > > > > > > > > > > Can be defined in cryptodev, you don't need to re-create a crypto session in the name of
> > > > > > > > > > > > > Security session in the driver just to do a synchronous processing.
> > > > > > > > > > > >
> > > > > > > > > > > > I suppose your question is why not to have
> > > > > > > > > > > > rte_crypot_process_cpu_crypto_bulk(...) instead?
> > > > > > > > > > > > The main reason is that would require disruptive changes in existing cryptodev API
> > > > > > > > > > > > (would cause ABI/API breakage).
> > > > > > > > > > > > Session for  RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO need some extra information
> > > > > > > > > > > > that normal crypto_sym_xform doesn't contain
> > > > > > > > > > > > (cipher offset from the start of the buffer, might be something extra in future).
> > > > > > > > > > >
> > > > > > > > > > > Cipher offset will be part of rte_crypto_op.
> > > > > > > > > >
> > > > > > > > > > fill/read (+ alloc/free) is one of the main things that slowdown current crypto-op approach.
> > > > > > > > > > That's why the general idea - have all data that wouldn't change from packet to packet
> > > > > > > > > > included into the session and setup it once at session_init().
> > > > > > > > >
> > > > > > > > > I agree that you cannot use crypto-op.
> > > > > > > > > You can have the new API in crypto.
> > > > > > > > > As per the current patch, you only need cipher_offset which you can have it as a parameter until
> > > > > > > > > You get it approved in the crypto xform. I believe it will be beneficial in case of other crypto cases as well.
> > > > > > > > > We can have cipher offset at both places(crypto-op and cipher_xform). It will give flexibility to the user to
> > > > > > > > > override it.
> > > > > > > >
> > > > > > > > After having another thought on your proposal:
> > > > > > > > Probably we can introduce new rte_crypto_sym_xform_types for CPU related
> > > > > > > > stuff here?
> > > > > > >
> > > > > > > I also thought of adding new xforms, but that wont serve the purpose for may be all the cases.
> > > > > > > You would be needing all information currently available in the current xforms.
> > > > > > > So if you are adding new fields in the new xform, the size will be more than that of the union of xforms.
> > > > > > > ABI breakage would still be there.
> > > > > > >
> > > > > > > If you think a valid compression of the AEAD xform can be done, then that can be done for each of the
> > > > > > > Xforms and we can have a solution to this issue.
> > > > > >
> > > > > > I think that we can re-use iv.offset for our purposes (for crypto offset).
> > > > > > So for now we can make that path work without any ABI breakage.
> > > > > > Fan, please feel free to correct me here, if I missed something.
> > > > > > If in future we would need to add some extra information it might
> > > > > > require ABI breakage, though by now I don't envision anything particular to
> > > > > > add.
> > > > > > Anyway, if there is no objection to go that way, we can try to make
> > > > > > these changes for v2.
> > > > > >
> > > > >
> > > > > Actually, after looking at it more deeply it appears not that easy as I thought it
> > > > > would be :)
> > > > > Below is a very draft version of proposed API additions.
> > > > > I think it avoids ABI breakages right now and provides enough flexibility for
> > > > > future extensions (if any).
> > > > > For now, it doesn't address your comments about naming conventions (_CPU_
> > > > > vs _SYNC_) , etc.
> > > > > but I suppose is comprehensive enough to provide a main idea beyond it.
> > > > > Akhil and other interested parties, please try to review and provide feedback ASAP,
> > > > > as related changes would take some time and we still like to hit 19.11 deadline.
> > > > > Konstantin
> > > > >
> > > > >  diff --git a/lib/librte_cryptodev/rte_crypto_sym.h b/lib/librte_cryptodev/rte_crypto_sym.h
> > > > > index bc8da2466..c03069e23 100644
> > > > > --- a/lib/librte_cryptodev/rte_crypto_sym.h
> > > > > +++ b/lib/librte_cryptodev/rte_crypto_sym.h
> > > > > @@ -103,6 +103,9 @@ rte_crypto_cipher_operation_strings[];
> > > > >   *
> > > > >   * This structure contains data relating to Cipher (Encryption and Decryption)
> > > > >   *  use to create a session.
> > > > > + * Actually I was wrong saying that we don't have free space inside xforms.
> > > > > + * Making key struct packed (see below) allows us to regain 6B that could be
> > > > > + * used for future extensions.
> > > > >   */
> > > > >  struct rte_crypto_cipher_xform {
> > > > >         enum rte_crypto_cipher_operation op;
> > > > > @@ -116,7 +119,25 @@ struct rte_crypto_cipher_xform {
> > > > >         struct {
> > > > >                 const uint8_t *data;    /**< pointer to key data */
> > > > >                 uint16_t length;        /**< key length in bytes */
> > > > > -       } key;
> > > > > +       } __attribute__((__packed__)) key;
> > > > > +
> > > > > +       /**
> > > > > +         * offset for cipher to start within user provided data buffer.
> > > > > +        * Fan suggested another (and less space consuming way) -
> > > > > +         * reuse iv.offset space below, by changing:
> > > > > +        * struct {uint16_t offset, length;} iv;
> > > > > +        * to unnamed union:
> > > > > +        * union {
> > > > > +        *      struct {uint16_t offset, length;} iv;
> > > > > +        *      struct {uint16_t iv_len, crypto_offset;} cpu_crypto_param;
> > > > > +        * };
> > > > > +        * Both approaches seem ok to me in general.
> > > >
> > > > No strong opinions here. OK with this one.
> > > >
> > > > > +        * Comments/suggestions are welcome.
> > > > > +         */
> > > > > +       uint16_t offset;
> > >
> > > After another thought - it is probably a bit better to have offset as a separate
> > > field.
> > > In that case we can use the same xforms to create both type of sessions.
> > ok
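A quick way to check the "6B regained by packing the key" claim in the proposal above, assuming an LP64 ABI and GCC or Clang (both support __attribute__((__packed__))). cipher_xform_old and cipher_xform_new are simplified stand-ins that model only the fields around the key, not the full rte_crypto_cipher_xform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Key as in the current xforms: on LP64, 8B pointer + 2B length
 * + 6B tail padding = 16B. */
struct key_plain {
	const uint8_t *data;
	uint16_t length;
};

/* Same fields packed (GCC/Clang attribute): 10B on LP64, alignment 1. */
struct key_packed {
	const uint8_t *data;
	uint16_t length;
} __attribute__((__packed__));

/* Simplified before/after layouts; the real xform has more members. */
struct cipher_xform_old {
	uint32_t op;
	uint32_t algo;
	struct key_plain key;
	struct { uint16_t offset, length; } iv;
};

struct cipher_xform_new {
	uint32_t op;
	uint32_t algo;
	struct key_packed key;
	uint16_t offset;        /* new field placed in the regained bytes */
	uint8_t reserved1[4];
	struct { uint16_t offset, length; } iv;
};

/* Bytes freed by packing the key (6 on LP64). */
static size_t key_bytes_regained(void)
{
	return sizeof(struct key_plain) - sizeof(struct key_packed);
}
```

One side effect shows up even in this toy model: packing lowers the key struct's alignment to 1, so the enclosing struct's alignment can change too, which is worth checking when assessing ABI impact.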
> > >
> > > > > +
> > > > > +       uint8_t reserved1[4];
> > > > > +
> > > > >         /**< Cipher key
> > > > >          *
> > > > >          * For the RTE_CRYPTO_CIPHER_AES_F8 mode of operation, key.data will
> > > > > @@ -284,7 +305,7 @@ struct rte_crypto_auth_xform {
> > > > >         struct {
> > > > >                 const uint8_t *data;    /**< pointer to key data */
> > > > >                 uint16_t length;        /**< key length in bytes */
> > > > > -       } key;
> > > > > +       } __attribute__((__packed__)) key;
> > > > >         /**< Authentication key data.
> > > > >          * The authentication key length MUST be less than or equal to the
> > > > >          * block size of the algorithm. It is the callers responsibility to
> > > > > @@ -292,6 +313,8 @@ struct rte_crypto_auth_xform {
> > > > >          * (for example RFC 2104, FIPS 198a).
> > > > >          */
> > > > >
> > > > > +       uint8_t reserved1[6];
> > > > > +
> > > > >         struct {
> > > > >                 uint16_t offset;
> > > > >                 /**< Starting point for Initialisation Vector or Counter,
> > > > > @@ -376,7 +399,12 @@ struct rte_crypto_aead_xform {
> > > > >         struct {
> > > > >                 const uint8_t *data;    /**< pointer to key data */
> > > > >                 uint16_t length;        /**< key length in bytes */
> > > > > -       } key;
> > > > > +       } __attribute__((__packed__)) key;
> > > > > +
> > > > > +       /** offset for cipher to start within data buffer */
> > > > > +       uint16_t cipher_offset;
> > > > > +
> > > > > +       uint8_t reserved1[4];
> > > > >
> > > > >         struct {
> > > > >                 uint16_t offset;
> > > > > diff --git a/lib/librte_cryptodev/rte_cryptodev.h b/lib/librte_cryptodev/rte_cryptodev.h
> > > > > index e175b838c..c0c7bfed7 100644
> > > > > --- a/lib/librte_cryptodev/rte_cryptodev.h
> > > > > +++ b/lib/librte_cryptodev/rte_cryptodev.h
> > > > > @@ -1272,6 +1272,101 @@ void *
> > > > >  rte_cryptodev_sym_session_get_user_data(
> > > > >                                         struct rte_cryptodev_sym_session *sess);
> > > > >
> > > > > +/*
> > > > > + * After several thoughts decided not to try to squeeze CPU_CRYPTO
> > > > > + * into existing rte_crypto_sym_session structure/API, but instead
> > > > > + * introduce an extension to it via new fully opaque
> > > > > + * struct rte_crypto_cpu_sym_session and additional related API.
> > > >
> > > >
> > > > What all things do we need to squeeze?
> > > > In this proposal I do not see the new struct cpu_sym_session  defined here.
> > >
> > > The plan is to have it totally opaque to the user, i.e. just:
> > > struct rte_crypto_cpu_sym_session;
> > > in public header files.
> > >
> > > > I believe you will have same lib API/struct for cpu_sym_session and sym_session.
> > >
> > > I thought about such way, but there are few things that looks clumsy to me:
> > > 1. Right now there is no 'type' (or so) field inside rte_cryptodev_sym_session,
> > > so it is not possible to easy distinguish what session do you have: lksd_sym or
> > > cpu_sym.
> > > In theory, there is a hole of 4B inside rte_cryptodev_sym_session, so we can add
> > > some extra field here, but in that case we wouldn't be able to use the same xform
> > > for both lksd_sym or cpu_sym (which seems really plausible thing for me).
> > > 2.  Majority of rte_cryptodev_sym_session fields I think are unnecessary for
> > > rte_crypto_cpu_sym_session:
> > > sess_data[], opaque_data, user_data, nb_drivers.
> > > All that consumes space, that could be used somewhere else instead.
> > > 3. I am a bit reluctant to touch existing rte_cryptodev API - to avoid any
> > > breakages I can't foresee right now.
> > > From other side - if we'll add new functions/structs for cpu_sym_session we can
> > > mark it and keep it for some time as experimental, so further changes (if needed)
> > > would still be possible.
> > >
> >
> > OK let us assume that you have a separate structure. But I have a few queries:
> > 1. how can multiple drivers use a same session
> 
> As a short answer: they can't.
> It is pretty much the same approach as with rte_security - each device needs to
> create/init its own session.
> So upper layer would need to maintain its own array (or so) for such case.
> Though the question is why would you like to have same session over multiple
> SW backed devices?
> As it would be anyway just a synchronous function call that will be executed on
> the same cpu.

I may have a single fat tunnel that is distributed over multiple
cores, with each core affined to a different SW device.
So a single session may be accessed by multiple devices.

Another example: depending on packet size, I may switch between
HW and SW PMDs with the same session.
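The model described here (one session usable from several devices) can be sketched as a generic session holding a private slot per driver, with dev_id used only to find that driver's slot, similar in spirit to the existing sess_data[] array. The device table, the 256-byte size threshold and all type names are illustrative assumptions, not cryptodev code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NB_DRIVERS 2
enum { DRV_SW = 0, DRV_HW = 1 };

/* Hypothetical per-driver private session data. */
struct priv_data {
	uint8_t key;
	const char *impl;  /* which driver initialised this slot */
};

/* One generic session with a private slot per driver (simplified). */
struct generic_sess {
	struct priv_data *sess_data[NB_DRIVERS];
};

/* Toy device table: dev_id -> driver_id. */
static const uint8_t dev_driver_id[] = { DRV_SW, DRV_HW };

/* dev_id is needed only to find this driver's slot in the session. */
static struct priv_data *sess_priv(struct generic_sess *s, uint8_t dev_id)
{
	return s->sess_data[dev_driver_id[dev_id]];
}

/* Example policy: small packets on the SW device, large ones on HW.
 * The threshold is an arbitrary assumption. */
static uint8_t pick_dev(size_t pkt_len)
{
	return pkt_len < 256 ? 0 : 1;
}
```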

> 
> > 2. Can somebody use the scheduler pmd for scheduling the different type of payloads for the same session?
> 
> In theory yes.
> Though for that scheduler pmd should have inside its
> rte_crypto_cpu_sym_session an array of pointers to
> the underlying devices sessions.
> 
> >
> > With your proposal the APIs would be very specific to your use case only.
> 
> Yes in some way.
> I consider that API specific for SW backed crypto PMDs.
> I can hardly see how any 'real HW' PMDs (lksd-none, lksd-proto) will benefit
> from it.
> Current crypto-op API is very much HW oriented.
> Which is OK, that's what it was intended for, but I think we also need one that
> would be designed with SW backed implementations in mind.

We may re-use your API for HW PMDs as well, for those that do not require
crypto-op/mbuf etc.
The return type of your new process API may carry a status that says 'processed'
or 'enqueued'. If it is 'enqueued', we may also need a new dequeue API for raw
bufs.

This requirement can apply to any hardware PMD, such as QAT, as well.
That is why a dev-op would be a better option.
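A hypothetical sketch of the "processed" vs "enqueued" status idea: a SW backend finishes each buffer inside the call, while a HW-like backend only queues it and hands it back through a companion raw-buffer dequeue. The raw_status codes, the toy_hw ring and both backends are invented for illustration; no such cryptodev API exists in this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-buffer result of a raw-buffer process() call. */
enum raw_status {
	RAW_PROCESSED = 0,  /* done on return (e.g. SW backend) */
	RAW_ENQUEUED = 1,   /* accepted; collect via raw dequeue (HW-like) */
	RAW_ERROR = -1,
};

#define QDEPTH 8u  /* power of two so the modulo survives index wrap */

/* Toy HW-like backend: buffers complete only when dequeued. */
struct toy_hw {
	uint8_t *pending[QDEPTH];
	uint32_t head;  /* next buffer to complete */
	uint32_t tail;  /* next free slot */
	uint8_t key;
};

static void hw_process(struct toy_hw *hw, uint8_t *bufs[], int status[],
		uint32_t num)
{
	for (uint32_t i = 0; i < num; i++) {
		if (hw->tail - hw->head == QDEPTH) {
			status[i] = RAW_ERROR;  /* ring full */
			continue;
		}
		hw->pending[hw->tail++ % QDEPTH] = bufs[i];
		status[i] = RAW_ENQUEUED;
	}
}

/* Companion raw-buffer dequeue: completes and returns up to max buffers. */
static uint32_t hw_dequeue(struct toy_hw *hw, uint8_t *out[], uint32_t max)
{
	uint32_t n = 0;

	while (n < max && hw->head != hw->tail) {
		uint8_t *b = hw->pending[hw->head++ % QDEPTH];
		b[0] ^= hw->key;  /* toy "crypto" on the first byte */
		out[n++] = b;
	}
	return n;
}

/* Toy SW backend: every buffer is finished before returning. */
static void sw_process(uint8_t key, uint8_t *bufs[], int status[], uint32_t num)
{
	for (uint32_t i = 0; i < num; i++) {
		bufs[i][0] ^= key;
		status[i] = RAW_PROCESSED;
	}
}
```

The caller checks the per-buffer status: RAW_PROCESSED buffers are ready immediately, RAW_ENQUEUED ones must be collected later through the dequeue call.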

> 
> > When you would add more functionality to this sync API/struct, it will end up being the same API/struct.
> >
> > Let us  see how close/ far we are from the existing APIs when the actual implementation is done.
> >
> > > > I am not sure if that would be needed.
> > > > It would be internal to the driver that if synchronous processing is supported(from feature flag) and
> > > > Have relevant fields in xform(the newly added ones which are packed as per your suggestions) set,
> > > > It will create that type of session.
> > > >
> > > >
> > > > > + * Main points:
> > > > > + * - Current crypto-dev API is reasonably mature and it is desirable
> > > > > + *   to keep it unchanged (API/ABI stability). From other side, this
> > > > > + *   new sync API is new one and probably would require extra changes.
> > > > > + *   Having it as a new one allows to mark it as experimental, without
> > > > > + *   affecting existing one.
> > > > > + * - Fully opaque cpu_sym_session structure gives more flexibility
> > > > > + *   to the PMD writers and again allows to avoid ABI breakages in future.
> > > > > + * - process() function per set of xforms
> > > > > + *   allows to expose different process() functions for different
> > > > > + *   xform combinations. PMD writer can decide whether he wants to
> > > > > + *   push all supported algorithms into one process() function,
> > > > > + *   or spread it across several ones.
> > > > > + *   I.E. More flexibility for PMD writer.
> > > >
> > > > Which process function should be chosen is internal to PMD, how would that info
> > > > be visible to the application or the library. These will get stored in the session private
> > > > data. It would be upto the PMD writer, to store the per session process function in
> > > > the session private data.
> > > >
> > > > Process function would be a dev ops just like enc/deq operations and it should call
> > > > The respective process API stored in the session private data.
> > >
> > > That model (via devops) is possible, but has several drawbacks from my
> > > perspective:
> > >
> > > 1. It means we'll need to pass dev_id as a parameter to process() function.
> > > Though in fact dev_id is not a relevant information for us here
> > > (all we need is pointer to the session and pointer to the fuction to call)
> > > and I tried to avoid using it in data-path functions for that API.
> >
> > You have a single vdev, but someone may have multiple vdevs for each thread, or may
> > Have same dev with multiple queues for each core.
> 
> That's fine. As I said above it is a SW backed implementation.
> Each session has to be a separate entity that contains all necessary information
> (keys, alg/mode info,  etc.)  to process input buffers.
> Plus we need the actual function pointer to call.
> I just don't see what for we need a dev_id in that situation.

To look up this device's private data within the session.

> Again, here we don't need to care about queues and their pinning to cores.
> If let say someone would like to process buffers from the same IPsec SA on 2
> different cores in parallel, he can just create 2 sessions for the same xform,
> give one to thread #1  and second to thread #2.
> After that both threads are free to call process(this_thread_ses, ...) at will.

Say you have a 16-core device handling 100G of traffic on a single tunnel.
Would we create 16 sessions with the same parameters?
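For comparison, the per-core alternative proposed earlier in the thread (one session per core created from the same parameters, each core touching only its own) might look like this. The per-session packet counter is an assumption standing in for any mutable per-session state; sessions_create() and core_process() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-core session: toy key plus a packet counter that
 * stands in for any mutable per-session state. */
struct core_sess {
	uint8_t key;
	uint64_t nb_pkts;
};

/* Control path: N sessions from the same parameters, one per core. */
static struct core_sess *sessions_create(unsigned int nb_cores, uint8_t key)
{
	struct core_sess *s = calloc(nb_cores, sizeof(*s));

	if (s == NULL)
		return NULL;
	for (unsigned int i = 0; i < nb_cores; i++)
		s[i].key = key;
	return s;
}

/* Data path on core 'lcore': touches only its own session, so no locking. */
static void core_process(struct core_sess *all, unsigned int lcore,
		uint8_t *buf, size_t len)
{
	struct core_sess *s = &all[lcore];

	for (size_t i = 0; i < len; i++)
		buf[i] ^= s->key;
	s->nb_pkts++;
}
```

Memory grows linearly with the core count (16 copies of the same key material in the 16-core case), which is the trade-off being debated here.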

> 
> >
> > > 2. As you pointed in that case it will be just one process() function per device.
> > > So if PMD would like to have several process() functions for different type of sessions
> > > (let say one per alg) first thing it has to do inside its process() - read session data and
> > > based on that, do a jump/call to particular internal sub-routine.
> > > Something like:
> > > driver_id = get_pmd_driver_id();
> > > priv_ses = ses->sess_data[driver_id];
> > > Then either:
> > > switch(priv_sess->alg) {case XXX: process_XXX(priv_sess, ...);break;...}
> > > OR
> > > priv_ses->process(priv_sess, ...);
> > >
> > > to select and call the proper function.
> > > Looks like totally unnecessary overhead to me.
> > > Though if we'll have ability to query/extract some sort of session_ops based on the xform -
> > > we can avoid this extra dereference+jump/call thing.
> >
> > What is the issue in the priv_ses->process(); approach?
> 
> Nothing at all.
> What I am saying is that the scheme with dev_ops:
> dev[dev_id]->dev_ops.process(ses->priv_ses[driver_id], ...)
>    |
>    |-> priv_ses->process(...)
> 
> has a bigger overhead than just:
> process(ses, ...);
> 
> So why introduce an extra level of indirection here?

Explained above.

> 
> > I don't understand what you are saving by not doing this.
> > In any case you would need to identify which session corresponds to which process().
> 
> Yes, sure, but I think we can let the user store that relationship in whatever way
> he likes: store a process() pointer for each session, group sessions that share
> the same process() somehow, or...

Whatever relationship the user has to create and store will only complicate his life.
If we can hide that information in the driver, what is the issue with that? The user
will not need to worry: he would just call process() and the driver will choose which
process function needs to be called.

I think we should have a POC around this and see the difference in cycle count.
IMO it would be negligible, and we would end up with a generic API set that
can be used by others as well.
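A possible starting point for such a POC, as a rough sketch only: it times N calls through a direct function pointer and through a dev_ops-style forwarder that then calls the per-session pointer, and checks that both produce the same result. It uses POSIX clock_gettime(CLOCK_MONOTONIC) wall-clock time rather than a cycle counter (inside DPDK one would more likely use rte_rdtsc()); the toy mixing function, all names and the volatile entry pointers are assumptions, the latter added so the compiler cannot resolve the calls statically.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

struct toy_sess;
typedef uint64_t (*toy_proc_t)(struct toy_sess *s, uint64_t x);

/* Hypothetical session carrying its own process pointer. */
struct toy_sess {
	toy_proc_t process;
	uint64_t key;
};

/* Toy work per call; stands in for the real crypto routine. */
static uint64_t toy_mix(struct toy_sess *s, uint64_t x)
{
	return (x ^ s->key) * 0x9E3779B97F4A7C15ull;
}

/* dev_ops-style entry that forwards to s->process (second indirection). */
static uint64_t devops_process(struct toy_sess *s, uint64_t x)
{
	return s->process(s, x);
}

/* volatile keeps the compiler from resolving these calls at build time. */
static toy_proc_t volatile direct_entry = toy_mix;
static toy_proc_t volatile devops_entry = devops_process;

static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Makes n calls through *entry; returns a checksum, stores elapsed ns. */
static uint64_t run(toy_proc_t volatile *entry, struct toy_sess *s,
		uint64_t n, uint64_t *elapsed_ns)
{
	uint64_t acc = 0;
	uint64_t t0 = now_ns();

	for (uint64_t i = 0; i < n; i++)
		acc += (*entry)(s, i);
	*elapsed_ns = now_ns() - t0;
	return acc;
}
```

Differences of a few nanoseconds per call can be within noise here; a meaningful comparison would need the actual PMD code paths, warm caches and repeated runs.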

> 
> > For that you would be doing it somewhere in your data path.
> 
> Why in the data path?
> Only once, at session creation/initialization time.
> Or perhaps even once per group of sessions.
> 
> >
> > >
> > > >
> > > > I am not sure if you would need a new session init API for this as nothing would
> > > > be visible to the app or lib.
> > > >
> > > > > + * - Not storing process() pointer inside the session -
> > > > > + *   Allows the user to choose whether to store a process() pointer
> > > > > + *   per session, or per group of sessions for that device that share
> > > > > + *   the same input xforms. I.E. extra flexibility for the user,
> > > > > + *   plus allows us to keep cpu_sym_session totally opaque, see above.
> > > >
> > > > If multiple sessions need to be processed via the same process function,
> > > > PMD would save the same process in all the sessions, I don't think there would
> > > > be any perf overhead with that.
> > >
> > > I think it would, see above.
> > >
> > > >
> > > > > + * Sketched usage model:
> > > > > + * ....
> > > > > + * /* control path, alloc/init session */
> > > > > + * int32_t sz = rte_crypto_cpu_sym_session_size(dev_id, &xform);
> > > > > + * struct rte_crypto_cpu_sym_session *ses = user_alloc(..., sz);
> > > > > + * rte_crypto_cpu_sym_process_t process =
> > > > > + *     rte_crypto_cpu_sym_session_func(dev_id, &xform);
> > > > > + * rte_crypto_cpu_sym_session_init(dev_id, ses, &xform);
> > > > > + * ...
> > > > > + * /* data-path*/
> > > > > + * process(ses, ....);
> > > > > + * ....
> > > > > + * /* control path, terminate/free session */
> > > > > + * rte_crypto_cpu_sym_session_fini(dev_id, ses);
> > > > > + */
> > > > > +
> > > > > +/**
> > > > > + * vector structure, contains pointer to vector array and the length
> > > > > + * of the array
> > > > > + */
> > > > > +struct rte_crypto_vec {
> > > > > +       struct iovec *vec;
> > > > > +       uint32_t num;
> > > > > +};
> > > > > +
> > > > > +/*
> > > > > + * Data-path bulk process crypto function.
> > > > > + */
> > > > > +typedef void (*rte_crypto_cpu_sym_process_t)(
> > > > > +               struct rte_crypto_cpu_sym_session *sess,
> > > > > +               struct rte_crypto_vec buf[], void *iv[], void *aad[],
> > > > > +               void *digest[], int status[], uint32_t num);
> > > > > +/*
> > > > > + * for given device return process function specific to input xforms
> > > > > + * on error - return NULL and set rte_errno value.
> > > > > + * Note that for same input xforms for the same device should return
> > > > > + * the same process function.
> > > > > + */
> > > > > +__rte_experimental
> > > > > +rte_crypto_cpu_sym_process_t
> > > > > +rte_crypto_cpu_sym_session_func(uint8_t dev_id,
> > > > > +                       const struct rte_crypto_sym_xform *xforms);
> > > > > +
> > > > > +/*
> > > > > + * Return required session size in bytes for given set of xforms.
> > > > > + * if xforms == NULL, then return the max possible session size,
> > > > > + * that would fit session for any supported by the device algorithm.
> > > > > + * if CPU mode is not supported at all, or requested in xform
> > > > > + * algorithm is not supported, then return -ENOTSUP.
> > > > > + */
> > > > > +__rte_experimental
> > > > > +int
> > > > > +rte_crypto_cpu_sym_session_size(uint8_t dev_id,
> > > > > +                       const struct rte_crypto_sym_xform *xforms);
> > > > > +
> > > > > +/*
> > > > > + * Initialize session.
> > > > > + * It is caller responsibility to allocate enough space for it.
> > > > > + * See rte_crypto_cpu_sym_session_size above.
> > > > > + */
> > > > > +__rte_experimental
> > > > > +int rte_crypto_cpu_sym_session_init(uint8_t dev_id,
> > > > > +                       struct rte_crypto_cpu_sym_session *sess,
> > > > > +                       const struct rte_crypto_sym_xform *xforms);
> > > > > +
> > > > > +__rte_experimental
> > > > > +void
> > > > > +rte_crypto_cpu_sym_session_fini(uint8_t dev_id,
> > > > > +                       struct rte_crypto_cpu_sym_session *sess);
> > > > > +
> > > > > +
> > > > >  #ifdef __cplusplus
> > > > >  }
> > > > >  #endif
> > > > > diff --git a/lib/librte_cryptodev/rte_cryptodev_pmd.h b/lib/librte_cryptodev/rte_cryptodev_pmd.h
> > > > > index defe05ea0..ed7e63fab 100644
> > > > > --- a/lib/librte_cryptodev/rte_cryptodev_pmd.h
> > > > > +++ b/lib/librte_cryptodev/rte_cryptodev_pmd.h
> > > > > @@ -310,6 +310,20 @@ typedef void (*cryptodev_sym_free_session_t)(struct rte_cryptodev *dev,
> > > > >  typedef void (*cryptodev_asym_free_session_t)(struct rte_cryptodev *dev,
> > > > >                 struct rte_cryptodev_asym_session *sess);
> > > > >
> > > > > +typedef int (*cryptodev_cpu_sym_session_size_t) (struct rte_cryptodev *dev,
> > > > > +                       const struct rte_crypto_sym_xform *xforms);
> > > > > +
> > > > > +typedef int (*cryptodev_cpu_sym_session_init_t) (struct rte_cryptodev *dev,
> > > > > +                       struct rte_crypto_cpu_sym_session *sess,
> > > > > +                       const struct rte_crypto_sym_xform *xforms);
> > > > > +
> > > > > +typedef void (*cryptodev_cpu_sym_session_fini_t) (struct rte_cryptodev *dev,
> > > > > +                       struct rte_crypto_cpu_sym_session *sess);
> > > > > +
> > > > > +typedef rte_crypto_cpu_sym_process_t (*cryptodev_cpu_sym_session_func_t) (
> > > > > +                       struct rte_cryptodev *dev,
> > > > > +                       const struct rte_crypto_sym_xform *xforms);
> > > > > +
> > > > >  /** Crypto device operations function pointer table */
> > > > >  struct rte_cryptodev_ops {
> > > > >         cryptodev_configure_t dev_configure;    /**< Configure device. */
> > > > > @@ -343,6 +357,11 @@ struct rte_cryptodev_ops {
> > > > >         /**< Clear a Crypto sessions private data. */
> > > > >         cryptodev_asym_free_session_t asym_session_clear;
> > > > >         /**< Clear a Crypto sessions private data. */
> > > > > +
> > > > > +       cryptodev_cpu_sym_session_size_t sym_cpu_session_get_size;
> > > > > +       cryptodev_cpu_sym_session_func_t sym_cpu_session_get_func;
> > > > > +       cryptodev_cpu_sym_session_init_t sym_cpu_session_init;
> > > > > +       cryptodev_cpu_sym_session_fini_t sym_cpu_session_fini;
> > > > >  };
> > > > >
> > > > >
> > > > >
  
Ananyev, Konstantin Oct. 7, 2019, 12:53 p.m. UTC | #21
Hi Akhil,

> > > > > > > > > > > > > > > This action type allows the burst of symmetric crypto workload using the
> > > > > > > > > > > > > > > same algorithm, key, and direction being processed by CPU cycles synchronously.
> > > > > > > > > > > > > > > This flexible action type does not require external hardware involvement,
> > > > > > > > > > > > > > > having the crypto workload processed synchronously, and is more performant
> > > > > > > > > > > > > > > than Cryptodev SW PMD due to the saved cycles on removed "async mode
> > > > > > > > > > > > > > > simulation" as well as 3 cacheline access of the crypto ops.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Does that mean application will not call the cryptodev_enqueue_burst and
> > > > > > > > > > > > > > corresponding dequeue burst.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Yes, instead it just call rte_security_process_cpu_crypto_bulk(...)
> > > > > > > > > > > > >
> > > > > > > > > > > > > > It would be a new API something like process_packets and it will have the
> > > > > > > > > > > > > > crypto processed packets while returning from the API?
> > > > > > > > > > > > >
> > > > > > > > > > > > > Yes, though the plan is that API will operate on raw data buffers, not mbufs.
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I still do not understand why we cannot do with the conventional crypto lib only.
> > > > > > > > > > > > > > As far as I can understand, you are not doing any protocol processing or any value add
> > > > > > > > > > > > > > To the crypto processing. IMO, you just need a synchronous crypto processing API which
> > > > > > > > > > > > > > Can be defined in cryptodev, you don't need to re-create a crypto session in the name of
> > > > > > > > > > > > > > Security session in the driver just to do a synchronous processing.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I suppose your question is why not to have
> > > > > > > > > > > > > rte_crypto_process_cpu_crypto_bulk(...) instead?
> > > > > > > > > > > > > The main reason is that would require disruptive changes in existing cryptodev API
> > > > > > > > > > > > > (would cause ABI/API breakage).
> > > > > > > > > > > > > Session for RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO needs some extra information
> > > > > > > > > > > > > that normal crypto_sym_xform doesn't contain
> > > > > > > > > > > > > (cipher offset from the start of the buffer, might be something extra in future).
> > > > > > > > > > > >
> > > > > > > > > > > > Cipher offset will be part of rte_crypto_op.
> > > > > > > > > > >
> > > > > > > > > > > fill/read (+ alloc/free) is one of the main things that slowdown current crypto-op
> > > > > > > > > > > approach.
> > > > > > > > > > > That's why the general idea - have all data that wouldn't change from packet to packet
> > > > > > > > > > > included into the session and setup it once at session_init().
> > > > > > > > > >
> > > > > > > > > > I agree that you cannot use crypto-op.
> > > > > > > > > > You can have the new API in crypto.
> > > > > > > > > > As per the current patch, you only need cipher_offset which you can have it as a parameter until
> > > > > > > > > > You get it approved in the crypto xform. I believe it will be beneficial in case of other crypto cases as well.
> > > > > > > > > > We can have cipher offset at both places(crypto-op and cipher_xform). It will give flexibility to the user to
> > > > > > > > > > override it.
> > > > > > > > >
> > > > > > > > > After having another thought on your proposal:
> > > > > > > > > Probably we can introduce new rte_crypto_sym_xform_types for CPU related
> > > > > > > > > stuff here?
> > > > > > > >
> > > > > > > > I also thought of adding new xforms, but that won't serve the purpose for maybe all the cases.
> > > > > > > > You would be needing all information currently available in the current xforms.
> > > > > > > > So if you are adding new fields in the new xform, the size will be more than that of the union of xforms.
> > > > > > > > ABI breakage would still be there.
> > > > > > > >
> > > > > > > > If you think a valid compression of the AEAD xform can be done, then that can be done for each of the
> > > > > > > > Xforms and we can have a solution to this issue.
> > > > > > >
> > > > > > > I think that we can re-use iv.offset for our purposes (for crypto offset).
> > > > > > > So for now we can make that path work without any ABI breakage.
> > > > > > > Fan, please feel free to correct me here, if I missed something.
> > > > > > > If in future we would need to add some extra information it might
> > > > > > > require ABI breakage, though by now I don't envision anything particular to add.
> > > > > > > Anyway, if there is no objection to go that way, we can try to make
> > > > > > > these changes for v2.
> > > > > > >
> > > > > >
> > > > > > Actually, after looking at it more deeply it appears not that easy as I thought it
> > > > > > would be :)
> > > > > > Below is a very draft version of proposed API additions.
> > > > > > I think it avoids ABI breakages right now and provides enough flexibility for
> > > > > > future extensions (if any).
> > > > > > For now, it doesn't address your comments about naming conventions (_CPU_
> > > > > > vs _SYNC_), etc.
> > > > > > but I suppose it is comprehensive enough to convey the main idea behind it.
> > > > > > Akhil and other interested parties, please try to review and provide feedback ASAP,
> > > > > > as related changes would take some time and we still would like to hit the 19.11 deadline.
> > > > > > Konstantin
> > > > > >
> > > > > >  diff --git a/lib/librte_cryptodev/rte_crypto_sym.h b/lib/librte_cryptodev/rte_crypto_sym.h
> > > > > > index bc8da2466..c03069e23 100644
> > > > > > --- a/lib/librte_cryptodev/rte_crypto_sym.h
> > > > > > +++ b/lib/librte_cryptodev/rte_crypto_sym.h
> > > > > > @@ -103,6 +103,9 @@ rte_crypto_cipher_operation_strings[];
> > > > > >   *
> > > > > >   * This structure contains data relating to Cipher (Encryption and
> > Decryption)
> > > > > >   *  use to create a session.
> > > > > > + * Actually I was wrong saying that we don't have free space inside xforms.
> > > > > > + * Making key struct packed (see below) allows us to regain 6B that could be
> > > > > > + * used for future extensions.
> > > > > >   */
> > > > > >  struct rte_crypto_cipher_xform {
> > > > > >         enum rte_crypto_cipher_operation op;
> > > > > > @@ -116,7 +119,25 @@ struct rte_crypto_cipher_xform {
> > > > > >         struct {
> > > > > >                 const uint8_t *data;    /**< pointer to key data */
> > > > > >                 uint16_t length;        /**< key length in bytes */
> > > > > > -       } key;
> > > > > > +       } __attribute__((__packed__)) key;
> > > > > > +
> > > > > > +       /**
> > > > > > +        * offset for cipher to start within user provided data buffer.
> > > > > > +        * Fan suggested another (and less space consuming) way -
> > > > > > +        * reuse iv.offset space below, by changing:
> > > > > > +        * struct {uint16_t offset, length;} iv;
> > > > > > +        * to unnamed union:
> > > > > > +        * union {
> > > > > > +        *      struct {uint16_t offset, length;} iv;
> > > > > > +        *      struct {uint16_t iv_len, crypto_offset;} cpu_crypto_param;
> > > > > > +        * };
> > > > > > +        * Both approaches seem ok to me in general.
> > > > >
> > > > > No strong opinions here. OK with this one.
> > > > >
> > > > > > +        * Comments/suggestions are welcome.
> > > > > > +         */
> > > > > > +       uint16_t offset;
> > > >
> > > > After another thought - it is probably a bit better to have offset as a separate field.
> > > > In that case we can use the same xforms to create both types of sessions.
> > > ok
> > > >
> > > > > > +
> > > > > > +       uint8_t reserved1[4];
> > > > > > +
> > > > > >         /**< Cipher key
> > > > > >          *
> > > > > >          * For the RTE_CRYPTO_CIPHER_AES_F8 mode of operation, key.data will
> > > > > > @@ -284,7 +305,7 @@ struct rte_crypto_auth_xform {
> > > > > >         struct {
> > > > > >                 const uint8_t *data;    /**< pointer to key data */
> > > > > >                 uint16_t length;        /**< key length in bytes */
> > > > > > -       } key;
> > > > > > +       } __attribute__((__packed__)) key;
> > > > > >         /**< Authentication key data.
> > > > > >          * The authentication key length MUST be less than or equal to the
> > > > > >          * block size of the algorithm. It is the callers responsibility to
> > > > > > @@ -292,6 +313,8 @@ struct rte_crypto_auth_xform {
> > > > > >          * (for example RFC 2104, FIPS 198a).
> > > > > >          */
> > > > > >
> > > > > > +       uint8_t reserved1[6];
> > > > > > +
> > > > > >         struct {
> > > > > >                 uint16_t offset;
> > > > > >                 /**< Starting point for Initialisation Vector or Counter,
> > > > > > @@ -376,7 +399,12 @@ struct rte_crypto_aead_xform {
> > > > > >         struct {
> > > > > >                 const uint8_t *data;    /**< pointer to key data */
> > > > > >                 uint16_t length;        /**< key length in bytes */
> > > > > > -       } key;
> > > > > > +       } __attribute__((__packed__)) key;
> > > > > > +
> > > > > > +       /** offset for cipher to start within data buffer */
> > > > > > +       uint16_t cipher_offset;
> > > > > > +
> > > > > > +       uint8_t reserved1[4];
> > > > > >
> > > > > >         struct {
> > > > > >                 uint16_t offset;
> > > > > > diff --git a/lib/librte_cryptodev/rte_cryptodev.h b/lib/librte_cryptodev/rte_cryptodev.h
> > > > > > index e175b838c..c0c7bfed7 100644
> > > > > > --- a/lib/librte_cryptodev/rte_cryptodev.h
> > > > > > +++ b/lib/librte_cryptodev/rte_cryptodev.h
> > > > > > @@ -1272,6 +1272,101 @@ void *
> > > > > >  rte_cryptodev_sym_session_get_user_data(
> > > > > >                                         struct rte_cryptodev_sym_session *sess);
> > > > > >
> > > > > > +/*
> > > > > > + * After several thoughts decided not to try to squeeze CPU_CRYPTO
> > > > > > + * into existing rte_crypto_sym_session structure/API, but instead
> > > > > > + * introduce an extension to it via new fully opaque
> > > > > > + * struct rte_crypto_cpu_sym_session and additional related API.
> > > > >
> > > > >
> > > > > What all things do we need to squeeze?
> > > > > In this proposal I do not see the new struct cpu_sym_session  defined here.
> > > >
> > > > The plan is to have it totally opaque to the user, i.e. just:
> > > > struct rte_crypto_cpu_sym_session;
> > > > in public header files.
> > > >
> > > > > I believe you will have same lib API/struct for cpu_sym_session and sym_session.
> > > >
> > > > I thought about such a way, but there are a few things that look clumsy to me:
> > > > 1. Right now there is no 'type' (or so) field inside rte_cryptodev_sym_session,
> > > > so it is not possible to easily distinguish what session you have: lksd_sym or
> > > > cpu_sym.
> > > > In theory, there is a hole of 4B inside rte_cryptodev_sym_session, so we can add
> > > > some extra field here, but in that case we wouldn't be able to use the same
> > > > xform for both lksd_sym and cpu_sym (which seems a really plausible thing to me).
> > > > 2. Majority of rte_cryptodev_sym_session fields I think are unnecessary for
> > > > rte_crypto_cpu_sym_session:
> > > > sess_data[], opaque_data, user_data, nb_drivers.
> > > > All that consumes space that could be used somewhere else instead.
> > > > 3. I am a bit reluctant to touch existing rte_cryptodev API - to avoid any
> > > > breakages I can't foresee right now.
> > > > From other side - if we'll add new functions/structs for cpu_sym_session we can
> > > > mark them and keep them for some time as experimental, so further changes (if
> > > > needed) would still be possible.
> > > >
> > >
> > > OK let us assume that you have a separate structure. But I have a few queries:
> > > 1. how can multiple drivers use a same session
> >
> > As a short answer: they can't.
> > It is pretty much the same approach as with rte_security - each device needs to
> > create/init its own session.
> > So upper layer would need to maintain its own array (or so) for such case.
> > Though the question is why would you like to have same session over multiple
> > SW backed devices?
> > As it would be anyway just a synchronous function call that will be executed on
> > the same cpu.
> 
> I may have single FAT tunnel which may be distributed over multiple
> Cores, and each core is affined to a different SW device.

If it is pure SW, then we don't need multiple devices for such a scenario.
In that case the device is a pure abstraction that we can skip.

> So a single session may be accessed by multiple devices.
> 
> One more example would be depending on packet sizes, I may switch between
> HW/SW PMDs with the same session.

Sure, but then we'll have multiple sessions.
BTW, we have the same thing now - these private session pointers are just stored
inside the same rte_crypto_sym_session.
And if the user wants to support this model, he would also need to store a <dev_id, queue_id>
pair for each HW device anyway.
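A minimal sketch of the bookkeeping described above, assuming the upper layer keeps one opaque rte_crypto_cpu_sym_session per device for each SA (a CPU-crypto session is created and initialized for exactly one device). demo_sa_ctx, demo_sa_attach() and demo_sa_session_for() are hypothetical helpers, not part of the draft API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_DEVS 8

/* Opaque, as in the draft: only the PMD knows the layout. */
struct rte_crypto_cpu_sym_session;

/* Hypothetical per-SA context kept by the application: one session slot
 * per device, indexed by dev_id. */
struct demo_sa_ctx {
	struct rte_crypto_cpu_sym_session *sess[DEMO_MAX_DEVS];
};

/* Record the session created for dev_id; -1 if out of range or already set. */
static int
demo_sa_attach(struct demo_sa_ctx *sa, uint8_t dev_id,
	       struct rte_crypto_cpu_sym_session *sess)
{
	assert(sa != NULL && sess != NULL);
	if (dev_id >= DEMO_MAX_DEVS || sa->sess[dev_id] != NULL)
		return -1;
	sa->sess[dev_id] = sess;
	return 0;
}

/* Data path: pick the session that belongs to the device in use. */
static struct rte_crypto_cpu_sym_session *
demo_sa_session_for(const struct demo_sa_ctx *sa, uint8_t dev_id)
{
	return dev_id < DEMO_MAX_DEVS ? sa->sess[dev_id] : NULL;
}
```

This is the same shape of table that rte_crypto_sym_session already holds internally (one private pointer per driver); the difference is only who owns it.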

> 
> >
> > > 2. Can somebody use the scheduler pmd for scheduling the different type of payloads for the same session?
> >
> > In theory yes.
> > Though for that, the scheduler pmd should have inside its
> > rte_crypto_cpu_sym_session an array of pointers to
> > the underlying devices' sessions.
> >
> > >
> > > With your proposal the APIs would be very specific to your use case only.
> >
> > Yes in some way.
> > I consider that API specific for SW backed crypto PMDs.
> > I can hardly see how any 'real HW' PMDs (lksd-none, lksd-proto) will benefit
> > from it.
> > Current crypto-op API is very much HW oriented.
> > Which is ok, that's what it was intended for, but I think we also need one that
> > would be designed with SW backed implementation in mind.
> 
> We may re-use your API for HW PMDs as well which do not have requirement of
> Crypto-op/mbuf etc.
> The return type of your new process API may have a status which say 'processed'
> Or can be say 'enqueued'. So if it is  'enqueued', we may have a new API for raw
> Bufs dequeue as well.
> 
> This requirement can be for any hardware PMDs like QAT as well.

I don't think it is a good idea to extend this API for async (lookaside) devices.
You'll need to:
 - provide dev_id and queue_id for each process (enqueue) and dequeue operation.
 - provide IOVA for all buffers passed to that function (data buffers, digest, IV, aad).
 - on dequeue, provide some way to associate dequeued data and digest buffers with
   the crypto-session that was used (and probably with the mbuf).
So most likely we'll end up with just another version of our current crypto-op structure.
If you'd like to get rid of the mbuf dependency within the current crypto-op API, that's understandable,
but I don't think we should have the same API for both sync (CPU) and async (lookaside) cases.
It doesn't seem feasible at all and defeats the whole purpose of this patch.

> That is why a dev-ops would be a better option.
> 
> >
> > > When you would add more functionality to this sync API/struct, it will end up being the same API/struct.
> > >
> > > Let us see how close/far we are from the existing APIs when the actual implementation is done.
> > >
> > > > > I am not sure if that would be needed.
> > > > > It would be internal to the driver that if synchronous processing is supported (from feature flag) and
> > > > > Have relevant fields in xform (the newly added ones which are packed as per your suggestions) set,
> > > > > It will create that type of session.
> > > > >
> > > > >
> > > > > > + * Main points:
> > > > > > + * - Current crypto-dev API is reasonably mature and it is desirable
> > > > > > + *   to keep it unchanged (API/ABI stability). From other side, this
> > > > > > + *   new sync API is new one and probably would require extra changes.
> > > > > > + *   Having it as a new one allows to mark it as experimental, without
> > > > > > + *   affecting existing one.
> > > > > > + * - Fully opaque cpu_sym_session structure gives more flexibility
> > > > > > + *   to the PMD writers and again allows to avoid ABI breakages in future.
> > > > > > + * - process() function per set of xforms
> > > > > > + *   allows to expose different process() functions for different
> > > > > > + *   xform combinations. PMD writer can decide whether to
> > > > > > + *   push all supported algorithms into one process() function,
> > > > > > + *   or spread them across several ones.
> > > > > > + *   I.E. More flexibility for PMD writer.
> > > > >
> > > > > Which process function should be chosen is internal to PMD; how would that info
> > > > > be visible to the application or the library? These will get stored in the session private
> > > > > data. It would be up to the PMD writer to store the per-session process function in
> > > > > the session private data.
> > > > >
> > > > > Process function would be a dev ops just like enq/deq operations and it should call
> > > > > The respective process API stored in the session private data.
> > > >
> > > > That model (via devops) is possible, but has several drawbacks from my perspective:
> > > >
> > > > 1. It means we'll need to pass dev_id as a parameter to process() function.
> > > > Though in fact dev_id is not a relevant information for us here
> > > > (all we need is pointer to the session and pointer to the function to call)
> > > > and I tried to avoid using it in data-path functions for that API.
> > >
> > > You have a single vdev, but someone may have multiple vdevs for each thread, or may
> > > Have same dev with multiple queues for each core.
> >
> > That's fine. As I said above it is a SW backed implementation.
> > Each session has to be a separate entity that contains all necessary information
> > (keys, alg/mode info,  etc.)  to process input buffers.
> > Plus we need the actual function pointer to call.
> > I just don't see what for we need a dev_id in that situation.
> 
> To iterate the session private data in the session.
> 
> > Again, here we don't need to care about queues and their pinning to cores.
> > If let say someone would like to process buffers from the same IPsec SA on 2
> > different cores in parallel, he can just create 2 sessions for the same xform,
> > give one to thread #1  and second to thread #2.
> > After that both threads are free to call process(this_thread_ses, ...) at will.
> 
> Say you have a 16core device to handle 100G of traffic on a single tunnel.
> Will we make 16 sessions with same parameters?

Exactly the same question can be asked about the current crypto-op API.
You have a lookaside crypto-dev with 16 HW queues, each queue serviced by a different CPU.
For the same SA, do you need a separate session per queue, or is it ok to reuse the current one?
AFAIK, right now this is a grey area, not clearly defined.
For the crypto-devs I am aware of, the user can reuse the same session (as the PMD uses it read-only).
But again, right now I think it is not clearly defined and is implementation specific.
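To illustrate the "one session per thread" option mentioned in this exchange, here is a minimal sketch assuming a hypothetical PMD-private session that, besides key material copied at init, carries a mutable byte counter updated by process(). Giving each worker its own session built from the same xform avoids sharing that mutable state; a session the PMD only reads could be shared instead. All demo_* names and the XOR transform are illustrative placeholders, not real crypto and not the draft API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the xform: a one-byte "key". */
struct demo_xform { uint8_t key; };

/* Hypothetical PMD-private session. */
struct demo_cpu_session {
	uint8_t key;               /* copied from the xform at init, read-only afterwards */
	uint64_t bytes_processed;  /* mutable: would race if one session were shared */
};

static void
demo_session_init(struct demo_cpu_session *s, const struct demo_xform *xf)
{
	assert(s != NULL && xf != NULL);
	memset(s, 0, sizeof(*s));
	s->key = xf->key;
}

/* Toy transform: XOR with the key byte (placeholder, not real crypto). */
static void
demo_process(struct demo_cpu_session *s, uint8_t *buf, size_t len)
{
	for (size_t i = 0; i < len; i++)
		buf[i] ^= s->key;
	s->bytes_processed += len;
}

#define DEMO_NB_WORKERS 4

/* Same SA on several cores: one session per worker, all from the same xform. */
static void
demo_setup_per_worker(struct demo_cpu_session sess[DEMO_NB_WORKERS],
		      const struct demo_xform *xf)
{
	for (int w = 0; w < DEMO_NB_WORKERS; w++)
		demo_session_init(&sess[w], xf);
}
```

Worker w would then only ever call demo_process(&sess[w], ...), so no locking is needed; the cost is one session's worth of memory per worker.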

> 
> >
> > >
> > > > 2. As you pointed out, in that case it will be just one process() function per device.
> > > > So if PMD would like to have several process() functions for different types of sessions
> > > > (let's say one per alg) first thing it has to do inside its process() - read session data and
> > > > based on that, do a jump/call to particular internal sub-routine.
> > > > Something like:
> > > > driver_id = get_pmd_driver_id();
> > > > priv_ses = ses->sess_data[driver_id];
> > > > Then either:
> > > > switch(priv_ses->alg) {case XXX: process_XXX(priv_ses, ...);break;...}
> > > > OR
> > > > priv_ses->process(priv_ses, ...);
> > > >
> > > > to select and call the proper function.
> > > > Looks like totally unnecessary overhead to me.
> > > > Though if we'll have ability to query/extract some sort of session_ops based on the xform -
> > > > we can avoid this extra de-reference+jump/call thing.
> > >
> > > What is the issue in the priv_ses->process(); approach?
> >
> > Nothing at all.
> > What I am saying that schema with dev_ops
> > dev[dev_id]->dev_ops.process(ses->priv_ses[driver_id], ...)
> >    |
> >    |-> priv_ses->process(...)
> >
> > Has bigger overhead than just:
> > process(ses,...);
> >
> > So what for to introduce extra-level of indirection here?
> 
> Explained above.
> 
> >
> > > I don't understand what are you saving by not doing this.
> > > In any case you would need to identify which session corresponds to which process().
> >
> > Yes, sure, but I think we can let the user store that relationship information
> > in a way he likes: store a process() pointer for each session, or group sessions
> > that share the same process() somehow, or...
> 
> So whatever relationship that user will make and store will make its life complicated.
> If we can hide that information in the driver, then what is the issue in that and user
> Will not need to worry. He would just call the process() and driver will choose which
> Process need to be called.

Driver can do that at config/init time.
Then at run-time we can avoid that choice at all and call already chosen function.
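The two options debated here (select the routine on every call versus choose it once at config/init time and call it directly) can be sketched in C as follows. This is a minimal illustration under stated assumptions: the algorithm enum, demo_session, the two toy process routines and demo_session_func() are hypothetical stand-ins, loosely modeled on the draft's rte_crypto_cpu_sym_session_func(); they are not real DPDK code and the transforms are not real crypto.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical algorithm IDs and xform; not the real rte_crypto enums. */
enum demo_alg { DEMO_ALG_A, DEMO_ALG_B, DEMO_ALG_UNSUPPORTED };
struct demo_xform { enum demo_alg alg; };
struct demo_session { enum demo_alg alg; uint8_t key; };

typedef void (*demo_process_t)(struct demo_session *s, uint8_t *buf, size_t len);

/* Toy per-algorithm routines (placeholders for real cipher code). */
static void
process_alg_a(struct demo_session *s, uint8_t *buf, size_t len)
{
	for (size_t i = 0; i < len; i++)
		buf[i] ^= s->key;
}

static void
process_alg_b(struct demo_session *s, uint8_t *buf, size_t len)
{
	for (size_t i = 0; i < len; i++)
		buf[i] = (uint8_t)(buf[i] + s->key);
}

/* Option 1: one entry point; the choice is repeated for every burst. */
static void
demo_process_switch(struct demo_session *s, uint8_t *buf, size_t len)
{
	switch (s->alg) {
	case DEMO_ALG_A: process_alg_a(s, buf, len); break;
	case DEMO_ALG_B: process_alg_b(s, buf, len); break;
	default: assert(0 && "session initialized with an unsupported algorithm");
	}
}

/* Option 2: choose once on the control path; NULL if the xform is unsupported.
 * The same xform always yields the same function pointer. */
static demo_process_t
demo_session_func(const struct demo_xform *xf)
{
	switch (xf->alg) {
	case DEMO_ALG_A: return process_alg_a;
	case DEMO_ALG_B: return process_alg_b;
	default: return NULL;
	}
}
```

With option 2 the data path holds a demo_process_t obtained once per session (or per group of sessions sharing an xform) and calls it directly, so the per-burst switch disappears; whether the saved branch matters in practice is what a POC with cycle counts would have to show.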

> 
> I think we should have a POC around this and see the difference in the cycle count.
> IMO it would be negligible and we would end up making a generic API set which
> can be used by others as well.
> 
> >
> > > For that you would be doing it somewhere in your data path.
> >
> > Why at data-path?
> > Only once at session creation/initialization time.
> > Or might be even once per group of sessions.
> >
> > >
> > > >
> > > > >
> > > > > I am not sure if you would need a new session init API for this as nothing would be visible to
> > > > > the app or lib.
> > > > >
> > > > > > + * - Not storing process() pointer inside the session -
> > > > > > + *   Allows user to choose whether to store a process() pointer
> > > > > > + *   per session, or per group of sessions for that device that share
> > > > > > + *   the same input xforms. I.E. extra flexibility for the user,
> > > > > > + *   plus allows us to keep cpu_sym_session totally opaque, see above.
> > > > >
> > > > > If multiple sessions need to be processed via the same process function,
> > > > > PMD would save the same process in all the sessions, I don't think there would
> > > > > be any perf overhead with that.
> > > >
> > > > I think it would, see above.
> > > >
> > > > >
> > > > > > + * Sketched usage model:
> > > > > > + * ....
> > > > > > + * /* control path, alloc/init session */
> > > > > > + * int32_t sz = rte_crypto_cpu_sym_session_size(dev_id, &xform);
> > > > > > + * struct rte_crypto_cpu_sym_session *ses = user_alloc(..., sz);
> > > > > > + * rte_crypto_cpu_sym_process_t process =
> > > > > > + *     rte_crypto_cpu_sym_session_func(dev_id, &xform);
> > > > > > + * rte_crypto_cpu_sym_session_init(dev_id, ses, &xform);
> > > > > > + * ...
> > > > > > + * /* data-path*/
> > > > > > + * process(ses, ....);
> > > > > > + * ....
> > > > > > + * /* control path, terminate/free session */
> > > > > > + * rte_crypto_cpu_sym_session_fini(dev_id, ses);
> > > > > > + */
> > > > > > +
> > > > > > +/**
> > > > > > + * vector structure, contains pointer to vector array and the length
> > > > > > + * of the array
> > > > > > + */
> > > > > > +struct rte_crypto_vec {
> > > > > > +       struct iovec *vec;
> > > > > > +       uint32_t num;
> > > > > > +};
> > > > > > +
> > > > > > +/*
> > > > > > + * Data-path bulk process crypto function.
> > > > > > + */
> > > > > > +typedef void (*rte_crypto_cpu_sym_process_t)(
> > > > > > +               struct rte_crypto_cpu_sym_session *sess,
> > > > > > +               struct rte_crypto_vec buf[], void *iv[], void *aad[],
> > > > > > +               void *digest[], int status[], uint32_t num);
> > > > > > +/*
> > > > > > + * for given device return process function specific to input xforms
> > > > > > + * on error - return NULL and set rte_errno value.
> > > > > > + * Note that the same input xforms for the same device should return
> > > > > > + * the same process function.
> > > > > > + */
> > > > > > +__rte_experimental
> > > > > > +rte_crypto_cpu_sym_process_t
> > > > > > +rte_crypto_cpu_sym_session_func(uint8_t dev_id,
> > > > > > +                       const struct rte_crypto_sym_xform *xforms);
> > > > > > +
> > > > > > +/*
> > > > > > + * Return required session size in bytes for given set of xforms.
> > > > > > + * if xforms == NULL, then return the max possible session size,
> > > > > > + * that would fit session for any supported by the device algorithm.
> > > > > > + * if CPU mode is not supported at all, or requested in xform
> > > > > > + * algorithm is not supported, then return -ENOTSUP.
> > > > > > + */
> > > > > > +__rte_experimental
> > > > > > +int
> > > > > > +rte_crypto_cpu_sym_session_size(uint8_t dev_id,
> > > > > > +                       const struct rte_crypto_sym_xform *xforms);
> > > > > > +
> > > > > > +/*
> > > > > > + * Initialize session.
> > > > > > + * It is the caller's responsibility to allocate enough space for it.
> > > > > > + * See rte_crypto_cpu_sym_session_size above.
> > > > > > + */
> > > > > > +__rte_experimental
> > > > > > +int rte_crypto_cpu_sym_session_init(uint8_t dev_id,
> > > > > > +                       struct rte_crypto_cpu_sym_session *sess,
> > > > > > +                       const struct rte_crypto_sym_xform *xforms);
> > > > > > +
> > > > > > +__rte_experimental
> > > > > > +void
> > > > > > +rte_crypto_cpu_sym_session_fini(uint8_t dev_id,
> > > > > > +                       struct rte_crypto_cpu_sym_session *sess);
> > > > > > +
> > > > > > +
> > > > > >  #ifdef __cplusplus
> > > > > >  }
> > > > > >  #endif
> > > > > > diff --git a/lib/librte_cryptodev/rte_cryptodev_pmd.h b/lib/librte_cryptodev/rte_cryptodev_pmd.h
> > > > > > index defe05ea0..ed7e63fab 100644
> > > > > > --- a/lib/librte_cryptodev/rte_cryptodev_pmd.h
> > > > > > +++ b/lib/librte_cryptodev/rte_cryptodev_pmd.h
> > > > > > @@ -310,6 +310,20 @@ typedef void (*cryptodev_sym_free_session_t)(struct rte_cryptodev *dev,
> > > > > >  typedef void (*cryptodev_asym_free_session_t)(struct rte_cryptodev *dev,
> > > > > >                 struct rte_cryptodev_asym_session *sess);
> > > > > >
> > > > > > +typedef int (*cryptodev_cpu_sym_session_size_t) (struct rte_cryptodev *dev,
> > > > > > +                       const struct rte_crypto_sym_xform *xforms);
> > > > > > +
> > > > > > +typedef int (*cryptodev_cpu_sym_session_init_t) (struct rte_cryptodev *dev,
> > > > > > +                       struct rte_crypto_cpu_sym_session *sess,
> > > > > > +                       const struct rte_crypto_sym_xform *xforms);
> > > > > > +
> > > > > > +typedef void (*cryptodev_cpu_sym_session_fini_t) (struct rte_cryptodev *dev,
> > > > > > +                       struct rte_crypto_cpu_sym_session *sess);
> > > > > > +
> > > > > > +typedef rte_crypto_cpu_sym_process_t (*cryptodev_cpu_sym_session_func_t) (
> > > > > > +                       struct rte_cryptodev *dev,
> > > > > > +                       const struct rte_crypto_sym_xform *xforms);
> > > > > > +
> > > > > >  /** Crypto device operations function pointer table */
> > > > > >  struct rte_cryptodev_ops {
> > > > > >         cryptodev_configure_t dev_configure;    /**< Configure device. */
> > > > > > @@ -343,6 +357,11 @@ struct rte_cryptodev_ops {
> > > > > >         /**< Clear a Crypto sessions private data. */
> > > > > >         cryptodev_asym_free_session_t asym_session_clear;
> > > > > >         /**< Clear a Crypto sessions private data. */
> > > > > > +
> > > > > > +       cryptodev_cpu_sym_session_size_t sym_cpu_session_get_size;
> > > > > > +       cryptodev_cpu_sym_session_func_t sym_cpu_session_get_func;
> > > > > > +       cryptodev_cpu_sym_session_init_t sym_cpu_session_init;
> > > > > > +       cryptodev_cpu_sym_session_fini_t sym_cpu_session_fini;
> > > > > >  };
> > > > > >
> > > > > >
> > > > > >
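The sketched usage model in the draft header (get size, get process function, init, process, fini) can be exercised end to end with a toy PMD. This is a minimal sketch assuming a hypothetical session layout (the draft keeps rte_crypto_cpu_sym_session opaque; it is defined here only so the mock PMD can use it) and a one-byte-key XOR transform in place of real crypto. The rte_crypto_vec and rte_crypto_cpu_sym_process_t definitions mirror the draft; demo_xform and all demo_* functions are illustrative.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/uio.h>   /* struct iovec */

/* Vector descriptor as in the draft: an iovec array and its length. */
struct rte_crypto_vec {
	struct iovec *vec;
	uint32_t num;
};

/* Hypothetical stand-ins: xform with a one-byte key, and a concrete
 * layout for the (normally opaque) session used by this mock PMD. */
struct demo_xform { uint8_t key; };
struct rte_crypto_cpu_sym_session { uint8_t key; };

typedef void (*rte_crypto_cpu_sym_process_t)(
		struct rte_crypto_cpu_sym_session *sess,
		struct rte_crypto_vec buf[], void *iv[], void *aad[],
		void *digest[], int status[], uint32_t num);

/* Toy PMD process(): XOR every segment of every buffer with the key.
 * iv/aad/digest are ignored here; a real AEAD PMD would use them. */
static void
demo_process(struct rte_crypto_cpu_sym_session *sess,
	     struct rte_crypto_vec buf[], void *iv[], void *aad[],
	     void *digest[], int status[], uint32_t num)
{
	(void)iv; (void)aad; (void)digest;
	assert(sess != NULL && status != NULL);
	for (uint32_t i = 0; i < num; i++) {
		if (buf[i].vec == NULL || buf[i].num == 0) {
			status[i] = -EINVAL;   /* per-buffer error, burst continues */
			continue;
		}
		for (uint32_t j = 0; j < buf[i].num; j++) {
			uint8_t *p = buf[i].vec[j].iov_base;
			for (size_t k = 0; k < buf[i].vec[j].iov_len; k++)
				p[k] ^= sess->key;
		}
		status[i] = 0;
	}
}

/* Control path of the mock PMD, following the draft's usage model. */
static int
demo_session_size(const struct demo_xform *xf)
{
	(void)xf;
	return (int)sizeof(struct rte_crypto_cpu_sym_session);
}

static rte_crypto_cpu_sym_process_t
demo_session_func(const struct demo_xform *xf)
{
	(void)xf;
	return demo_process;
}

static int
demo_session_init(struct rte_crypto_cpu_sym_session *s, const struct demo_xform *xf)
{
	s->key = xf->key;
	return 0;
}

static void
demo_session_fini(struct rte_crypto_cpu_sym_session *s)
{
	s->key = 0;   /* wipe key material before the caller frees the memory */
}
```

In this sketch a malformed buffer only sets its own status[] slot, so the rest of the burst is still processed; the draft does not specify that policy, so treat it as an assumption. Allocation stays with the caller (malloc() in the usage below stands in for the draft's user_alloc()).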
  
Akhil Goyal Oct. 9, 2019, 7:20 a.m. UTC | #22
Hi Konstantin,

> 
> 
> Hi Akhil,
> 
> > > > > > > > > > > > > > > > This action type allows the burst of symmetric crypto workload using the
> > > > > > > > > > > > > > > > same algorithm, key, and direction being processed by CPU cycles synchronously.
> > > > > > > > > > > > > > > > This flexible action type does not require external hardware involvement,
> > > > > > > > > > > > > > > > having the crypto workload processed synchronously, and is more performant
> > > > > > > > > > > > > > > > than Cryptodev SW PMD due to the saved cycles on removed "async mode
> > > > > > > > > > > > > > > > simulation" as well as 3 cacheline access of the crypto ops.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Does that mean application will not call the cryptodev_enqueue_burst and
> > > > > > > > > > > > > > > corresponding dequeue burst.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Yes, instead it just call rte_security_process_cpu_crypto_bulk(...)
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > It would be a new API something like process_packets and it will have the
> > > > > > > > > > > > > > > crypto processed packets while returning from the API?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Yes, though the plan is that API will operate on raw data buffers, not mbufs.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I still do not understand why we cannot do with the conventional crypto lib only.
> > > > > > > > > > > > > > > As far as I can understand, you are not doing any protocol processing or any value add
> > > > > > > > > > > > > > > To the crypto processing. IMO, you just need a synchronous crypto processing API which
> > > > > > > > > > > > > > > Can be defined in cryptodev, you don't need to re-create a crypto session in the name of
> > > > > > > > > > > > > > > Security session in the driver just to do a synchronous processing.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I suppose your question is why not to have
> > > > > > > > > > > > > > rte_crypto_process_cpu_crypto_bulk(...) instead?
> > > > > > > > > > > > > > The main reason is that would require disruptive changes in existing cryptodev API
> > > > > > > > > > > > > > (would cause ABI/API breakage).
> > > > > > > > > > > > > > Session for RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO needs some extra information
> > > > > > > > > > > > > > that normal crypto_sym_xform doesn't contain
> > > > > > > > > > > > > > (cipher offset from the start of the buffer, might be something extra in future).
> > > > > > > > > > > > >
> > > > > > > > > > > > > Cipher offset will be part of rte_crypto_op.
> > > > > > > > > > > >
> > > > > > > > > > > > Filling/reading (+ alloc/free) the crypto-op is one of the main things that slows down the current crypto-op approach.
> > > > > > > > > > > > That's why the general idea is to have all data that doesn't change from packet to packet included in the session and set it up once at session_init().
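The "set it up once at session_init()" idea can be shown with a minimal sketch. It is hypothetical and not code from this RFC. The `toy_*` names are invented, and the XOR transform only stands in for a real cipher. Per-SA constants such as the key and the cipher offset live in the session and are filled in once. The per-packet bulk call takes only buffer pointers, lengths and a status array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical SW session: all per-SA constants are captured once. */
struct toy_cpu_session {
	uint8_t key[16];
	uint16_t key_len;
	uint16_t cipher_offset; /* where ciphering starts in each buffer */
};

/* Control path: called once per SA, never per packet. */
static int
toy_session_init(struct toy_cpu_session *s, const uint8_t *key,
		 uint16_t key_len, uint16_t cipher_offset)
{
	if (key_len == 0 || key_len > sizeof(s->key))
		return -1;
	memcpy(s->key, key, key_len);
	s->key_len = key_len;
	s->cipher_offset = cipher_offset;
	return 0;
}

/*
 * Data path: no per-packet op structure is allocated, filled or read
 * back; only buffers, lengths and a status array are passed in bulk.
 * The XOR below is a placeholder, NOT real cryptography.
 */
static void
toy_process_bulk(const struct toy_cpu_session *s, uint8_t *buf[],
		 const size_t len[], int status[], uint32_t num)
{
	assert(s != NULL && s->key_len != 0);
	for (uint32_t i = 0; i < num; i++) {
		if (len[i] < s->cipher_offset) {
			status[i] = -1; /* buffer shorter than the offset */
			continue;
		}
		for (size_t j = s->cipher_offset; j < len[i]; j++)
			buf[i][j] ^= s->key[(j - s->cipher_offset) % s->key_len];
		status[i] = 0;
	}
}
```

Apart from the status array, nothing is allocated, filled or read back per packet. That per-packet work is the cost the crypto-op path pays, and this model tries to avoid it.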
> > > > > > > > > > >
> > > > > > > > > > > I agree that you cannot use crypto-op.
> > > > > > > > > > > You can have the new API in crypto.
> > > > > > > > > > > As per the current patch, you only need cipher_offset, which you can pass as a parameter until you get it approved in the crypto xform. I believe it will be beneficial for other crypto cases as well.
> > > > > > > > > > > We can have the cipher offset in both places (crypto-op and cipher_xform). It will give the user the flexibility to override it.
> > > > > > > > > >
> > > > > > > > > > After having another thought on your proposal:
> > > > > > > > > > Probably we can introduce new rte_crypto_sym_xform_types for CPU related stuff here?
> > > > > > > > >
> > > > > > > > > I also thought of adding new xforms, but that might not serve the purpose in all cases.
> > > > > > > > > You would need all the information currently available in the current xforms.
> > > > > > > > > So if you add new fields in the new xform, its size will be more than that of the union of xforms.
> > > > > > > > > ABI breakage would still be there.
> > > > > > > > >
> > > > > > > > > If you think a valid compression of the AEAD xform can be done, then that can be done for each of the xforms and we can have a solution to this issue.
> > > > > > > >
> > > > > > > > I think that we can re-use iv.offset for our purposes (for crypto offset).
> > > > > > > > So for now we can make that path work without any ABI breakage.
> > > > > > > > Fan, please feel free to correct me here, if I missed something.
> > > > > > > > If in the future we need to add some extra information, it might require ABI breakage, though right now I don't envision anything in particular to add.
> > > > > > > > Anyway, if there is no objection to going that way, we can try to make these changes for v2.
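The iv.offset reuse idea can be checked in isolation. This is a hypothetical sketch: the struct names are invented, and only the `iv`/`cpu_crypto_param` member layout follows the draft comment quoted in this thread. An unnamed C11 union overlays the existing `{offset, length}` IV pair with a CPU-specific view, and a `static_assert` confirms that the structure size is unchanged.

```c
#include <assert.h>
#include <stdint.h>

/* Existing layout: IV descriptor as an offset/length pair. */
struct toy_iv_only {
	struct { uint16_t offset, length; } iv;
};

/* Hypothetical overlay: the same 4 bytes with two interpretations. */
struct toy_iv_or_cpu {
	union {
		struct { uint16_t offset, length; } iv;
		struct { uint16_t iv_len, crypto_offset; } cpu_crypto_param;
	};
};

/* The overlay must not grow the structure (no ABI size change). */
static_assert(sizeof(struct toy_iv_or_cpu) == sizeof(struct toy_iv_only),
	      "union overlay must keep the original size");

/* Filling the CPU view overwrites iv.offset/iv.length. */
static void
toy_set_cpu_param(struct toy_iv_or_cpu *x, uint16_t iv_len,
		  uint16_t crypto_offset)
{
	x->cpu_crypto_param.iv_len = iv_len;
	x->cpu_crypto_param.crypto_offset = crypto_offset;
}
```

Both views share the same four bytes, so filling in `cpu_crypto_param` overwrites `iv`. One xform therefore cannot describe both session types at once. A separate field avoids that, at the cost of space.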
> > > > > > > >
> > > > > > >
> > > > > > > Actually, after looking at it more deeply, it appears not as easy as I thought it would be :)
> > > > > > > Below is a very rough draft of the proposed API additions.
> > > > > > > I think it avoids ABI breakages right now and provides enough flexibility for future extensions (if any).
> > > > > > > For now, it doesn't address your comments about naming conventions (_CPU_ vs _SYNC_), etc., but I suppose it is comprehensive enough to convey the main idea behind it.
> > > > > > > Akhil and other interested parties, please try to review and provide feedback ASAP, as the related changes would take some time and we would still like to hit the 19.11 deadline.
> > > > > > > Konstantin
> > > > > > >
> > > > > > > diff --git a/lib/librte_cryptodev/rte_crypto_sym.h b/lib/librte_cryptodev/rte_crypto_sym.h
> > > > > > > index bc8da2466..c03069e23 100644
> > > > > > > --- a/lib/librte_cryptodev/rte_crypto_sym.h
> > > > > > > +++ b/lib/librte_cryptodev/rte_crypto_sym.h
> > > > > > > @@ -103,6 +103,9 @@ rte_crypto_cipher_operation_strings[];
> > > > > > >   *
> > > > > > >   * This structure contains data relating to Cipher (Encryption and Decryption)
> > > > > > >   *  use to create a session.
> > > > > > > + * Actually I was wrong saying that we don't have free space inside xforms.
> > > > > > > + * Making key struct packed (see below) allows us to regain 6B that could be
> > > > > > > + * used for future extensions.
> > > > > > >   */
> > > > > > >  struct rte_crypto_cipher_xform {
> > > > > > >         enum rte_crypto_cipher_operation op;
> > > > > > > @@ -116,7 +119,25 @@ struct rte_crypto_cipher_xform {
> > > > > > >         struct {
> > > > > > >                 const uint8_t *data;    /**< pointer to key data */
> > > > > > >                 uint16_t length;        /**< key length in bytes */
> > > > > > > -       } key;
> > > > > > > +       } __attribute__((__packed__)) key;
> > > > > > > +
> > > > > > > +       /**
> > > > > > > +        * offset for cipher to start within user provided data buffer.
> > > > > > > +        * Fan suggested another (and less space consuming) way -
> > > > > > > +        * reuse iv.offset space below, by changing:
> > > > > > > +        * struct {uint16_t offset, length;} iv;
> > > > > > > +        * to unnamed union:
> > > > > > > +        * union {
> > > > > > > +        *      struct {uint16_t offset, length;} iv;
> > > > > > > +        *      struct {uint16_t iv_len, crypto_offset;} cpu_crypto_param;
> > > > > > > +        * };
> > > > > > > +        * Both approaches seem ok to me in general.
> > > > > >
> > > > > > No strong opinions here. OK with this one.
> > > > > >
> > > > > > > +        * Comments/suggestions are welcome.
> > > > > > > +         */
> > > > > > > +       uint16_t offset;
> > > > >
> > > > > After another thought - it is probably a bit better to have offset as a separate field.
> > > > > In that case we can use the same xforms to create both types of sessions.
> > > > ok
> > > > >
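This sketch shows where the "6B" regained by packing the key struct comes from, assuming a typical LP64 target with 8-byte pointers. The unpacked `{const uint8_t *data; uint16_t length;}` pair is padded to 16 bytes, while the packed one is 10 bytes. A new `uint16_t` plus 4 reserved bytes then fit where the padding was, and `iv` keeps its offset. The `toy_*` struct names are hypothetical, `__attribute__((__packed__))` is a GCC/Clang extension, and the checks are compiled only on 64-bit pointer targets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_key_plain {
	const uint8_t *data;
	uint16_t length;
};

struct toy_key_packed {
	const uint8_t *data;
	uint16_t length;
} __attribute__((__packed__));

/* Before: key struct followed directly by the IV descriptor. */
struct toy_cipher_xform_before {
	uint32_t op, algo;
	struct toy_key_plain key;
	struct { uint16_t offset, length; } iv;
};

/* After: packed key, new field and reserved bytes in the old padding. */
struct toy_cipher_xform_after {
	uint32_t op, algo;
	struct toy_key_packed key;
	uint16_t offset;      /* new: cipher offset */
	uint8_t reserved1[4]; /* spare room for later extensions */
	struct { uint16_t offset, length; } iv;
};

#if UINTPTR_MAX == UINT64_MAX
static_assert(sizeof(struct toy_key_plain) -
	      sizeof(struct toy_key_packed) == 6,
	      "packing the key struct frees 6 bytes");
static_assert(offsetof(struct toy_cipher_xform_before, iv) ==
	      offsetof(struct toy_cipher_xform_after, iv),
	      "iv stays at the same offset");
#endif
```

Packing also lowers the key struct's alignment to 1. In this standalone sketch, `struct toy_cipher_xform_after` is therefore 28 bytes against 32 for the `before` layout. Whether that matters depends on what surrounds the xform, so it is worth checking when judging ABI impact.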
> > > > > > > +
> > > > > > > +       uint8_t reserved1[4];
> > > > > > > +
> > > > > > >         /**< Cipher key
> > > > > > >          *
> > > > > > >          * For the RTE_CRYPTO_CIPHER_AES_F8 mode of operation, key.data will
> > > > > > > @@ -284,7 +305,7 @@ struct rte_crypto_auth_xform {
> > > > > > >         struct {
> > > > > > >                 const uint8_t *data;    /**< pointer to key data */
> > > > > > >                 uint16_t length;        /**< key length in bytes */
> > > > > > > -       } key;
> > > > > > > +       } __attribute__((__packed__)) key;
> > > > > > >         /**< Authentication key data.
> > > > > > >          * The authentication key length MUST be less than or equal to the
> > > > > > >          * block size of the algorithm. It is the callers responsibility to
> > > > > > > @@ -292,6 +313,8 @@ struct rte_crypto_auth_xform {
> > > > > > >          * (for example RFC 2104, FIPS 198a).
> > > > > > >          */
> > > > > > >
> > > > > > > +       uint8_t reserved1[6];
> > > > > > > +
> > > > > > >         struct {
> > > > > > >                 uint16_t offset;
> > > > > > >                 /**< Starting point for Initialisation Vector or Counter,
> > > > > > > @@ -376,7 +399,12 @@ struct rte_crypto_aead_xform {
> > > > > > >         struct {
> > > > > > >                 const uint8_t *data;    /**< pointer to key data */
> > > > > > >                 uint16_t length;        /**< key length in bytes */
> > > > > > > -       } key;
> > > > > > > +       } __attribute__((__packed__)) key;
> > > > > > > +
> > > > > > > +       /** offset for cipher to start within data buffer */
> > > > > > > +       uint16_t cipher_offset;
> > > > > > > +
> > > > > > > +       uint8_t reserved1[4];
> > > > > > >
> > > > > > >         struct {
> > > > > > >                 uint16_t offset;
> > > > > > > diff --git a/lib/librte_cryptodev/rte_cryptodev.h b/lib/librte_cryptodev/rte_cryptodev.h
> > > > > > > index e175b838c..c0c7bfed7 100644
> > > > > > > --- a/lib/librte_cryptodev/rte_cryptodev.h
> > > > > > > +++ b/lib/librte_cryptodev/rte_cryptodev.h
> > > > > > > @@ -1272,6 +1272,101 @@ void *
> > > > > > >  rte_cryptodev_sym_session_get_user_data(
> > > > > > >                                         struct rte_cryptodev_sym_session *sess);
> > > > > > >
> > > > > > > +/*
> > > > > > > + * After some thought, I decided not to try to squeeze CPU_CRYPTO
> > > > > > > + * into the existing rte_cryptodev_sym_session structure/API, but instead
> > > > > > > + * introduce an extension to it via a new fully opaque
> > > > > > > + * struct rte_crypto_cpu_sym_session and additional related API.
> > > > > >
> > > > > >
> > > > > > What exactly do we need to squeeze?
> > > > > > In this proposal I do not see the new struct cpu_sym_session defined here.
> > > > >
> > > > > The plan is to have it totally opaque to the user, i.e. just:
> > > > > struct rte_crypto_cpu_sym_session;
> > > > > in public header files.
> > > > >
> > > > > > I believe you will have the same lib API/struct for cpu_sym_session and sym_session.
> > > > >
> > > > > I thought about such a way, but there are a few things that look clumsy to me:
> > > > > 1. Right now there is no 'type' (or similar) field inside rte_cryptodev_sym_session, so it is not possible to easily distinguish which session you have: lksd_sym or cpu_sym.
> > > > > In theory, there is a hole of 4B inside rte_cryptodev_sym_session, so we can add some extra field here, but in that case we wouldn't be able to use the same xform for both lksd_sym and cpu_sym (which seems a really plausible thing to me).
> > > > > 2. The majority of rte_cryptodev_sym_session fields, I think, are unnecessary for rte_crypto_cpu_sym_session: sess_data[], opaque_data, user_data, nb_drivers.
> > > > > All that consumes space that could be used somewhere else instead.
> > > > > 3. I am a bit reluctant to touch the existing rte_cryptodev API - to avoid any breakages I can't foresee right now.
> > > > > On the other side, if we add new functions/structs for cpu_sym_session, we can mark them and keep them experimental for some time, so further changes (if needed) would still be possible.
> > > > >
> > > >
> > > > OK, let us assume that you have a separate structure. But I have a few queries:
> > > > 1. How can multiple drivers use the same session?
> > >
> > > As a short answer: they can't.
> > > It is pretty much the same approach as with rte_security - each device needs to create/init its own session.
> > > So the upper layer would need to maintain its own array (or similar) for such a case.
> > > Though the question is: why would you like to have the same session over multiple SW backed devices?
> > > It would anyway be just a synchronous function call executed on the same CPU.
> >
> > I may have a single FAT tunnel which may be distributed over multiple cores, and each core is affined to a different SW device.
> 
> If it is pure SW, then we don't need multiple devices for such a scenario.
> The device in that case is a pure abstraction that we can skip.

Yes, agreed, but that liberty is given to the application: whether it needs multiple devices with a single queue each, or a single device with multiple queues.
I think that independence should not be broken by this new API.

> 
> > So a single session may be accessed by multiple devices.
> >
> > One more example: depending on packet sizes, I may switch between HW/SW PMDs with the same session.
> 
> Sure, but then we'll have multiple sessions.

No, the session will be the same and it will have separate private data for each of the PMDs.

> BTW, we have the same thing now - these private session pointers are just stored inside the same rte_cryptodev_sym_session.
> And if the user wants to support this model, he would also need to store a <dev_id, queue_id> pair for each HW device anyway.

Yes, agreed, but how would that work with your new struct? You cannot support that.
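This hypothetical simplification sketches the model Akhil describes: one user-visible session carrying separate private data for each PMD. As mentioned in this thread, that is also how `rte_cryptodev_sym_session` uses its `sess_data[]` array. `TOY_MAX_DRIVERS`, `struct toy_priv` and the helper functions are invented. Real PMD private data would hold expanded keys and similar state. The lookup by `driver_id` is what makes an extra device/driver identifier necessary at call time in this model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_DRIVERS 4 /* hypothetical number of registered drivers */

/* Hypothetical per-driver private state (e.g. an expanded key schedule). */
struct toy_priv {
	uint8_t driver_id;
	uint32_t key_id;
};

/* One user-visible session, one private-data slot per driver. */
struct toy_shared_session {
	void *sess_data[TOY_MAX_DRIVERS];
};

/* Control path: a driver attaches its private data to the session. */
static int
toy_session_attach(struct toy_shared_session *s, uint8_t driver_id,
		   struct toy_priv *priv)
{
	if (driver_id >= TOY_MAX_DRIVERS || s->sess_data[driver_id] != NULL)
		return -1;
	priv->driver_id = driver_id;
	s->sess_data[driver_id] = priv;
	return 0;
}

/* Data path: each PMD needs its driver_id to find its own slot. */
static struct toy_priv *
toy_session_priv(const struct toy_shared_session *s, uint8_t driver_id)
{
	assert(driver_id < TOY_MAX_DRIVERS);
	return s->sess_data[driver_id];
}
```

The same session object can then be handed to a SW PMD and a HW PMD alike. Each PMD just reads a different slot, which is the flexibility Akhil wants preserved.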

> 
> >
> > >
> > > > 2. Can somebody use the scheduler PMD for scheduling different types of payloads for the same session?
> > >
> > > In theory, yes.
> > > Though for that, the scheduler PMD should have inside its rte_crypto_cpu_sym_session an array of pointers to the underlying devices' sessions.
> > >
> > > >
> > > > With your proposal the APIs would be very specific to your use case only.
> > >
> > > Yes, in some way.
> > > I consider that API specific to SW backed crypto PMDs.
> > > I can hardly see how any 'real HW' PMDs (lksd-none, lksd-proto) would benefit from it.
> > > The current crypto-op API is very much HW oriented.
> > > Which is ok, that's what it was intended for, but I think we also need one designed with SW backed implementations in mind.
> >
> > We may re-use your API for HW PMDs as well, which do not require crypto-op/mbuf etc.
> > The return type of your new process API may have a status which says 'processed' or, say, 'enqueued'. So if it is 'enqueued', we may have a new API for raw bufs dequeue as well.
> >
> > This requirement can be for any hardware PMDs like QAT as well.
> 
> I don't think it is a good idea to extend this API for async (lookaside) devices.
> You'll need to:
>  - provide dev_id and queue_id for each process (enqueue) and dequeue operation.
>  - provide IOVA for all buffers passed to that function (data buffers, digest, IV, aad).
>  - on dequeue, provide some way to associate the dequeued data and digest buffers with the crypto-session that was used (and probably with the mbuf).
>  So most likely we'll end up with just another version of our current crypto-op structure.
> If you'd like to get rid of the mbuf dependency within the current crypto-op API, that's understandable, but I don't think we should have the same API for both sync (CPU) and async (lookaside) cases.
> It doesn't seem feasible at all and voids the whole purpose of that patch.

At this moment we are not much concerned about the dequeue API and about HW PMD support. It is just that the new API should be generic enough to be used in some future scenarios as well. I am just highlighting the possible use cases which can be there in the future.

What is the issue that you face in making a dev-op for this new API? Do you see any performance impact with that?

> 
> > That is why a dev-op would be a better option.
> >
> > >
> > > > When you add more functionality to this sync API/struct, it will end up being the same API/struct.
> > > >
> > > > Let us see how close/far we are from the existing APIs when the actual implementation is done.
> > > >
> > > > > > I am not sure if that would be needed.
> > > > > > It would be internal to the driver: if synchronous processing is supported (from the feature flag) and the relevant fields in the xform (the newly added ones, which are packed as per your suggestions) are set, it will create that type of session.
> > > > > >
> > > > > >
> > > > > > > + * Main points:
> > > > > > > + * - Current crypto-dev API is reasonably mature and it is desirable
> > > > > > > + *   to keep it unchanged (API/ABI stability). On the other side, this
> > > > > > > + *   new sync API is a new one and would probably require extra changes.
> > > > > > > + *   Having it as a new one allows marking it as experimental, without
> > > > > > > + *   affecting the existing one.
> > > > > > > + * - Fully opaque cpu_sym_session structure gives more flexibility
> > > > > > > + *   to the PMD writers and again allows avoiding ABI breakages in future.
> > > > > > > + * - process() function per set of xforms
> > > > > > > + *   allows exposing different process() functions for different
> > > > > > > + *   xform combinations. The PMD writer can decide whether to
> > > > > > > + *   push all supported algorithms into one process() function,
> > > > > > > + *   or spread them across several ones.
> > > > > > > + *   I.e. more flexibility for the PMD writer.
> > > > > >
> > > > > > Which process function should be chosen is internal to the PMD; how would that info be visible to the application or the library? These will get stored in the session private data. It would be up to the PMD writer to store the per-session process function in the session private data.
> > > > > >
> > > > > > The process function would be a dev-op just like the enq/deq operations, and it should call the respective process API stored in the session private data.
> > > > >
> > > > > That model (via devops) is possible, but has several drawbacks from my perspective:
> > > > >
> > > > > 1. It means we'll need to pass dev_id as a parameter to the process() function.
> > > > > Though in fact dev_id is not relevant information for us here (all we need is a pointer to the session and a pointer to the function to call), and I tried to avoid using it in data-path functions for that API.
> > > >
> > > > You have a single vdev, but someone may have multiple vdevs, one for each thread, or may have the same dev with multiple queues, one for each core.
> > >
> > > That's fine. As I said above, it is a SW backed implementation.
> > > Each session has to be a separate entity that contains all necessary information (keys, alg/mode info, etc.) to process input buffers.
> > > Plus we need the actual function pointer to call.
> > > I just don't see why we need a dev_id in that situation.
> >
> > To index the per-driver private data in the session.
> >
> > > Again, here we don't need to care about queues and their pinning to cores.
> > > If, let's say, someone would like to process buffers from the same IPsec SA on 2 different cores in parallel, he can just create 2 sessions for the same xform, give one to thread #1 and the second to thread #2.
> > > After that both threads are free to call process(this_thread_ses, ...) at will.
> >
> > Say you have a 16-core device to handle 100G of traffic on a single tunnel.
> > Will we make 16 sessions with the same parameters?
> 
> Absolutely the same question we can ask for the current crypto-op API.
> You have a lookaside crypto-dev with 16 HW queues, each queue serviced by a different CPU.
> For the same SA, do you need a separate session per queue, or is it ok to reuse the current one?
> AFAIK, right now this is a grey area, not clearly defined.
> For the crypto-devs I am aware of, the user can reuse the same session (as the PMD uses it read-only).
> But again, right now I think it is not clearly defined and is implementation specific.

The user can use the same session, that is what I am also insisting on, but it may have separate session private data. The cryptodev session create API provides that functionality and we can leverage it.

BTW, I can see a v2 of this RFC which is still based on the security library. When do you plan to submit the patches for the crypto-based APIs? We have the RC1 merge deadline for this patchset on 21st Oct.

As per my understanding, you only need a new dev-op for sync support. Session APIs will remain the same and you will have some extra fields packed in the xform structs.

The PMD will need to maintain a pointer to the per-session process function while creating the session, and it will be used by the dev-op API at runtime without any extra check.

> 
> >
> > >
> > > >
> > > > > 2. As you pointed out, in that case it will be just one process() function per device.
> > > > > So if the PMD would like to have several process() functions for different types of sessions (let's say one per alg), the first thing it has to do inside its process() is read the session data and, based on that, do a jump/call to a particular internal sub-routine.
> > > > > Something like:
> > > > > driver_id = get_pmd_driver_id();
> > > > > priv_ses = ses->sess_data[driver_id];
> > > > > Then either:
> > > > > switch(priv_ses->alg) {case XXX: process_XXX(priv_ses, ...);break;...}
> > > > > OR
> > > > > priv_ses->process(priv_ses, ...);
> > > > >
> > > > > to select and call the proper function.
> > > > > Looks like totally unnecessary overhead to me.
> > > > > Though if we have the ability to query/extract some sort of session_ops based on the xform, we can avoid this extra de-reference+jump/call thing.
> > > >
> > > > What is the issue in the priv_ses->process(); approach?
> > >
> > > Nothing at all.
> > > What I am saying is that the schema with dev_ops
> > > dev[dev_id]->dev_ops.process(ses->priv_ses[driver_id], ...)
> > >    |
> > >    |-> priv_ses->process(...)
> > >
> > > has bigger overhead than just:
> > > process(ses,...);
> > >
> > > So why introduce an extra level of indirection here?
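The two call chains being compared can be written down side by side. In this hypothetical sketch, all `toy_*` names are invented. The dev-op path goes from the device table to the driver's private session and then to the stored process pointer. The direct path calls a process pointer the application cached at session-init time. Both produce the same result. The sketch only shows the extra loads and indirect call; it does not measure their cost, which is what a cycle-count POC would have to do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_DRIVERS 2 /* hypothetical */

struct toy_priv_ses;
typedef void (*toy_process_t)(struct toy_priv_ses *ps, uint32_t *data,
			      uint32_t num);

/* Driver-private session: the PMD picks process() once, at init. */
struct toy_priv_ses {
	toy_process_t process;
	uint32_t key;
};

/* User-visible session with one private slot per driver. */
struct toy_ses {
	struct toy_priv_ses *priv_ses[TOY_MAX_DRIVERS];
};

struct toy_dev;
struct toy_dev_ops {
	void (*process)(const struct toy_dev *dev, struct toy_ses *ses,
			uint32_t *data, uint32_t num);
};

struct toy_dev {
	uint8_t driver_id;
	const struct toy_dev_ops *dev_ops;
};

/* Placeholder transform, NOT real cryptography. */
static void
toy_xor_process(struct toy_priv_ses *ps, uint32_t *data, uint32_t num)
{
	for (uint32_t i = 0; i < num; i++)
		data[i] ^= ps->key;
}

/* Generic dev-op: find this driver's private session, then call it. */
static void
toy_devop_process(const struct toy_dev *dev, struct toy_ses *ses,
		  uint32_t *data, uint32_t num)
{
	struct toy_priv_ses *ps = ses->priv_ses[dev->driver_id];

	ps->process(ps, data, num);
}

static const struct toy_dev_ops toy_ops = { .process = toy_devop_process };

/* Path 1: dev[dev_id]->dev_ops.process(...) -> priv_ses->process(...) */
static void
toy_run_via_devop(const struct toy_dev *dev, uint8_t dev_id,
		  struct toy_ses *ses, uint32_t *data, uint32_t num)
{
	dev[dev_id].dev_ops->process(&dev[dev_id], ses, data, num);
}

/* Path 2: process(ses, ...) with a pointer cached at init time. */
static void
toy_run_direct(toy_process_t process, struct toy_priv_ses *ps,
	       uint32_t *data, uint32_t num)
{
	process(ps, data, num);
}
```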
> >
> > Explained above.
> >
> > >
> > > > I don't understand what you are saving by not doing this.
> > > > In any case you would need to identify which session corresponds to which process().
> > >
> > > Yes, sure, but I think we can let the user store that relationship in a way he likes: store a process() pointer for each session, or group sessions that share the same process() somehow, or...
> >
> > So whatever relationship the user makes and stores will make his life complicated.
> > If we can hide that information in the driver, then what is the issue with that? The user will not need to worry. He would just call process() and the driver will choose which process needs to be called.
> 
> Driver can do that at config/init time.
> Then at run-time we can avoid that choice entirely and call the already chosen function.
> 
> >
> > I think we should have a POC around this and see the difference in the cycle count.
> > IMO it would be negligible and we would end up making a generic API set which can be used by others as well.
> >
> > >
> > > > For that you would be doing it somewhere in your data path.
> > >
> > > Why at data-path?
> > > Only once at session creation/initialization time.
> > > Or maybe even once per group of sessions.
> > >
> > > >
> > > > >
> > > > > >
> > > > > > I am not sure if you would need a new session init API for this, as nothing would be visible to the app or lib.
> > > > > >
> > > > > > > + * - Not storing process() pointer inside the session -
> > > > > > > + *   Allows user to choose whether he wants to store a process() pointer
> > > > > > > + *   per session, or per group of sessions for that device that share
> > > > > > > + *   the same input xforms. I.e. extra flexibility for the user,
> > > > > > > + *   plus allows us to keep cpu_sym_session totally opaque, see above.
> > > > > >
> > > > > > If multiple sessions need to be processed via the same process function, the PMD would save the same process in all the sessions; I don't think there would be any perf overhead with that.
> > > > >
> > > > > I think it would, see above.
> > > > >
> > > > > >
> > > > > > > + * Sketched usage model:
> > > > > > > + * ....
> > > > > > > + * // control path, alloc/init session
> > > > > > > + * int32_t sz = rte_crypto_cpu_sym_session_size(dev_id, &xform);
> > > > > > > + * struct rte_crypto_cpu_sym_session *ses = user_alloc(..., sz);
> > > > > > > + * rte_crypto_cpu_sym_process_t process =
> > > > > > > + *     rte_crypto_cpu_sym_session_func(dev_id, &xform);
> > > > > > > + * rte_crypto_cpu_sym_session_init(dev_id, ses, &xform);
> > > > > > > + * ...
> > > > > > > + * // data-path
> > > > > > > + * process(ses, ....);
> > > > > > > + * ....
> > > > > > > + * // control path, terminate/free session
> > > > > > > + * rte_crypto_cpu_sym_session_fini(dev_id, ses);
> > > > > > > + */
> > > > > > > +
> > > > > > > +/**
> > > > > > > + * vector structure, contains pointer to vector array and the length
> > > > > > > + * of the array
> > > > > > > + */
> > > > > > > +struct rte_crypto_vec {
> > > > > > > +       struct iovec *vec;
> > > > > > > +       uint32_t num;
> > > > > > > +};
> > > > > > > +
> > > > > > > +/*
> > > > > > > + * Data-path bulk process crypto function.
> > > > > > > + */
> > > > > > > +typedef void (*rte_crypto_cpu_sym_process_t)(
> > > > > > > +               struct rte_crypto_cpu_sym_session *sess,
> > > > > > > +               struct rte_crypto_vec buf[], void *iv[], void *aad[],
> > > > > > > +               void *digest[], int status[], uint32_t num);
> > > > > > > +/*
> > > > > > > + * For the given device, return the process function specific to the input xforms.
> > > > > > > + * On error, return NULL and set rte_errno.
> > > > > > > + * Note that for the same input xforms, the same device should return
> > > > > > > + * the same process function.
> > > > > > > + */
> > > > > > > +__rte_experimental
> > > > > > > +rte_crypto_cpu_sym_process_t
> > > > > > > +rte_crypto_cpu_sym_session_func(uint8_t dev_id,
> > > > > > > +                       const struct rte_crypto_sym_xform *xforms);
> > > > > > > +
> > > > > > > +/*
> > > > > > > + * Return the required session size in bytes for the given set of xforms.
> > > > > > > + * If xforms == NULL, then return the max possible session size,
> > > > > > > + * that would fit a session for any algorithm supported by the device.
> > > > > > > + * If CPU mode is not supported at all, or the algorithm requested in
> > > > > > > + * xforms is not supported, then return -ENOTSUP.
> > > > > > > + */
> > > > > > > +__rte_experimental
> > > > > > > +int
> > > > > > > +rte_crypto_cpu_sym_session_size(uint8_t dev_id,
> > > > > > > +                       const struct rte_crypto_sym_xform *xforms);
> > > > > > > +
> > > > > > > +/*
> > > > > > > + * Initialize session.
> > > > > > > + * It is the caller's responsibility to allocate enough space for it.
> > > > > > > + * See rte_crypto_cpu_sym_session_size above.
> > > > > > > + */
> > > > > > > +__rte_experimental
> > > > > > > +int rte_crypto_cpu_sym_session_init(uint8_t dev_id,
> > > > > > > +                       struct rte_crypto_cpu_sym_session *sess,
> > > > > > > +                       const struct rte_crypto_sym_xform *xforms);
> > > > > > > +
> > > > > > > +__rte_experimental
> > > > > > > +void
> > > > > > > +rte_crypto_cpu_sym_session_fini(uint8_t dev_id,
> > > > > > > +                       struct rte_crypto_cpu_sym_session *sess);
> > > > > > > +
> > > > > > > +
> > > > > > >  #ifdef __cplusplus
> > > > > > >  }
> > > > > > >  #endif
> > > > > > > diff --git a/lib/librte_cryptodev/rte_cryptodev_pmd.h
> > > > > > > b/lib/librte_cryptodev/rte_cryptodev_pmd.h
> > > > > > > index defe05ea0..ed7e63fab 100644
> > > > > > > --- a/lib/librte_cryptodev/rte_cryptodev_pmd.h
> > > > > > > +++ b/lib/librte_cryptodev/rte_cryptodev_pmd.h
> > > > > > > @@ -310,6 +310,20 @@ typedef void (*cryptodev_sym_free_session_t)(struct rte_cryptodev *dev,
> > > > > > >  typedef void (*cryptodev_asym_free_session_t)(struct rte_cryptodev *dev,
> > > > > > >                 struct rte_cryptodev_asym_session *sess);
> > > > > > >
> > > > > > > +typedef int (*cryptodev_cpu_sym_session_size_t) (struct rte_cryptodev *dev,
> > > > > > > +                       const struct rte_crypto_sym_xform *xforms);
> > > > > > > +
> > > > > > > +typedef int (*cryptodev_cpu_sym_session_init_t) (struct rte_cryptodev *dev,
> > > > > > > +                       struct rte_crypto_cpu_sym_session *sess,
> > > > > > > +                       const struct rte_crypto_sym_xform *xforms);
> > > > > > > +
> > > > > > > +typedef void (*cryptodev_cpu_sym_session_fini_t) (struct rte_cryptodev *dev,
> > > > > > > +                       struct rte_crypto_cpu_sym_session *sess);
> > > > > > > +
> > > > > > > +typedef rte_crypto_cpu_sym_process_t (*cryptodev_cpu_sym_session_func_t) (
> > > > > > > +                       struct rte_cryptodev *dev,
> > > > > > > +                       const struct rte_crypto_sym_xform *xforms);
> > > > > > > +
> > > > > > >  /** Crypto device operations function pointer table */
> > > > > > >  struct rte_cryptodev_ops {
> > > > > > >         cryptodev_configure_t dev_configure;    /**< Configure device. */
> > > > > > > @@ -343,6 +357,11 @@ struct rte_cryptodev_ops {
> > > > > > >         /**< Clear a Crypto sessions private data. */
> > > > > > >         cryptodev_asym_free_session_t asym_session_clear;
> > > > > > >         /**< Clear a Crypto sessions private data. */
> > > > > > > +
> > > > > > > +       cryptodev_cpu_sym_session_size_t sym_cpu_session_get_size;
> > > > > > > +       cryptodev_cpu_sym_session_func_t sym_cpu_session_get_func;
> > > > > > > +       cryptodev_cpu_sym_session_init_t sym_cpu_session_init;
> > > > > > > +       cryptodev_cpu_sym_session_fini_t sym_cpu_session_fini;
> > > > > > >  };
> > > > > > >
> > > > > > >
> > > > > > >
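The draft's sketched usage model can be exercised end to end with a toy SW backend. On the control path it does a size query, user allocation, process-function lookup and init. On the data path it makes a direct `process()` call, and it runs fini at teardown. Everything below is hypothetical. The `toy_*` functions only mirror the shape of the proposed `rte_crypto_cpu_sym_session_size/_func/_init/_fini` calls and `struct rte_crypto_vec`, and they drop the `dev_id` argument. An XOR placeholder replaces a real cipher, and the `iv`, `aad` and `digest` arrays are accepted but ignored.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>

/* Mirrors the draft's struct rte_crypto_vec: a scatter-gather list. */
struct toy_crypto_vec {
	struct iovec *vec;
	uint32_t num;
};

/* Opaque to the application in the proposal; defined here for the PMD side. */
struct toy_cpu_sym_session {
	uint8_t key;
};

typedef void (*toy_cpu_sym_process_t)(struct toy_cpu_sym_session *sess,
		struct toy_crypto_vec buf[], void *iv[], void *aad[],
		void *digest[], int status[], uint32_t num);

/* Data path: XOR placeholder over every segment of every buffer. */
static void
toy_xor_process(struct toy_cpu_sym_session *sess, struct toy_crypto_vec buf[],
		void *iv[], void *aad[], void *digest[], int status[],
		uint32_t num)
{
	assert(sess != NULL);
	(void)iv;     /* unused in this toy */
	(void)aad;
	(void)digest;
	for (uint32_t i = 0; i < num; i++) {
		for (uint32_t s = 0; s < buf[i].num; s++) {
			uint8_t *p = buf[i].vec[s].iov_base;

			for (size_t j = 0; j < buf[i].vec[s].iov_len; j++)
				p[j] ^= sess->key;
		}
		status[i] = 0;
	}
}

/* Control path helpers, shaped like the draft's proposed calls. */
static int
toy_session_size(void)
{
	return (int)sizeof(struct toy_cpu_sym_session);
}

static toy_cpu_sym_process_t
toy_session_func(void)
{
	return toy_xor_process;
}

static int
toy_session_init(struct toy_cpu_sym_session *sess, uint8_t key)
{
	if (sess == NULL)
		return -1;
	sess->key = key;
	return 0;
}

static void
toy_session_fini(struct toy_cpu_sym_session *sess)
{
	memset(sess, 0, sizeof(*sess)); /* wipe key material */
}
```

The calls follow the draft's order: `toy_session_size()`, allocate, `toy_session_func()`, `toy_session_init()`, then `process(ses, ...)` per burst, and finally `toy_session_fini()` and free. The process pointer is looked up once and reused, so no per-call dispatch on the session happens in this sketch.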
  
Ananyev, Konstantin Oct. 9, 2019, 1:43 p.m. UTC | #23
Hi Akhil,

> > > > > > > > > > > > > > > > > This action type allows a burst of symmetric crypto workload using the same algorithm, key, and direction to be processed synchronously by CPU cycles.
> > > > > > > > > > > > > > > > > This flexible action type does not require external hardware involvement, having the crypto workload processed synchronously, and is more performant than the Cryptodev SW PMDs due to the cycles saved by removing the "async mode simulation" as well as the 3 cacheline accesses of the crypto ops.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Does that mean the application will not call the cryptodev_enqueue_burst and corresponding dequeue burst?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Yes, instead it just calls rte_security_process_cpu_crypto_bulk(...)
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > It would be a new API, something like process_packets, and it will have the crypto-processed packets when returning from the API?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Yes, though the plan is that the API will operate on raw data buffers, not mbufs.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I still do not understand why we cannot do it with the conventional crypto lib only.
> > > > > > > > > > > > > > > > As far as I can understand, you are not doing any protocol processing or any value add to the crypto processing. IMO, you just need a synchronous crypto processing API which can be defined in cryptodev; you don't need to re-create a crypto session in the name of a security session in the driver just to do synchronous processing.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I suppose your question is why not have
> > > > > > > > > > > > > > > rte_crypto_process_cpu_crypto_bulk(...) instead?
> > > > > > > > > > > > > > > The main reason is that it would require disruptive changes in the
> > > > > > > > > > > > > > > existing cryptodev API (it would cause ABI/API breakage).
> > > > > > > > > > > > > > > A session for RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO needs some extra
> > > > > > > > > > > > > > > information that the normal crypto_sym_xform doesn't contain
> > > > > > > > > > > > > > > (the cipher offset from the start of the buffer; there might be
> > > > > > > > > > > > > > > something extra in the future).
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Cipher offset will be part of rte_crypto_op.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Fill/read (+ alloc/free) is one of the main things that slows down the
> > > > > > > > > > > > > current crypto-op approach.
> > > > > > > > > > > > > That's why the general idea is to have all data that doesn't change
> > > > > > > > > > > > > from packet to packet included in the session and set up once at
> > > > > > > > > > > > > session_init().
> > > > > > > > > > > >
> > > > > > > > > > > > I agree that you cannot use crypto-op.
> > > > > > > > > > > > You can have the new API in crypto.
> > > > > > > > > > > > As per the current patch, you only need cipher_offset, which you can
> > > > > > > > > > > > have as a parameter until you get it approved in the crypto xform.
> > > > > > > > > > > > I believe it will be beneficial for other crypto cases as well.
> > > > > > > > > > > > We can have the cipher offset in both places (crypto-op and
> > > > > > > > > > > > cipher_xform). It will give the user the flexibility to override it.
> > > > > > > > > > >
> > > > > > > > > > > After having another thought on your proposal:
> > > > > > > > > > > Probably we can introduce new rte_crypto_sym_xform_types for
> > > > > > > > > > > CPU-related stuff here?
> > > > > > > > > >
> > > > > > > > > > I also thought of adding new xforms, but that won't serve the purpose
> > > > > > > > > > in all cases.
> > > > > > > > > > You would need all the information available in the current xforms.
> > > > > > > > > > So if you add new fields in the new xform, its size will exceed that
> > > > > > > > > > of the union of xforms.
> > > > > > > > > > ABI breakage would still be there.
> > > > > > > > > >
> > > > > > > > > > If you think a valid compression of the AEAD xform can be done, then
> > > > > > > > > > the same can be done for each of the xforms and we can have a solution
> > > > > > > > > > to this issue.
> > > > > > > > >
> > > > > > > > > I think that we can reuse iv.offset for our purposes (for the crypto offset).
> > > > > > > > > So for now we can make that path work without any ABI breakage.
> > > > > > > > > Fan, please feel free to correct me here if I missed something.
> > > > > > > > > If in the future we need to add some extra information, it might
> > > > > > > > > require ABI breakage, though for now I don't envision anything in
> > > > > > > > > particular to add.
> > > > > > > > > Anyway, if there is no objection to going that way, we can try to make
> > > > > > > > > these changes for v2.
> > > > > > > > >
> > > > > > > >
> > > > > > > > Actually, after looking at it more deeply, it appears not as easy as I
> > > > > > > > thought it would be :)
> > > > > > > > Below is a very rough draft of the proposed API additions.
> > > > > > > > I think it avoids ABI breakages right now and provides enough flexibility
> > > > > > > > for future extensions (if any).
> > > > > > > > For now, it doesn't address your comments about naming conventions
> > > > > > > > (_CPU_ vs _SYNC_), etc., but I suppose it is comprehensive enough to
> > > > > > > > convey the main idea behind it.
> > > > > > > > Akhil and other interested parties, please try to review and provide
> > > > > > > > feedback ASAP, as the related changes will take some time and we would
> > > > > > > > still like to hit the 19.11 deadline.
> > > > > > > > Konstantin
> > > > > > > >
> > > > > > > > diff --git a/lib/librte_cryptodev/rte_crypto_sym.h b/lib/librte_cryptodev/rte_crypto_sym.h
> > > > > > > > index bc8da2466..c03069e23 100644
> > > > > > > > --- a/lib/librte_cryptodev/rte_crypto_sym.h
> > > > > > > > +++ b/lib/librte_cryptodev/rte_crypto_sym.h
> > > > > > > > @@ -103,6 +103,9 @@ rte_crypto_cipher_operation_strings[];
> > > > > > > >   *
> > > > > > > >   * This structure contains data relating to Cipher (Encryption and Decryption)
> > > > > > > >   *  use to create a session.
> > > > > > > > + * Actually I was wrong saying that we don't have free space inside xforms.
> > > > > > > > + * Making the key struct packed (see below) allows us to regain 6B that
> > > > > > > > + * could be used for future extensions.
> > > > > > > >   */
> > > > > > > >  struct rte_crypto_cipher_xform {
> > > > > > > >         enum rte_crypto_cipher_operation op;
> > > > > > > > @@ -116,7 +119,25 @@ struct rte_crypto_cipher_xform {
> > > > > > > >         struct {
> > > > > > > >                 const uint8_t *data;    /**< pointer to key data */
> > > > > > > >                 uint16_t length;        /**< key length in bytes */
> > > > > > > > -       } key;
> > > > > > > > +       } __attribute__((__packed__)) key;
> > > > > > > > +
> > > > > > > > +       /**
> > > > > > > > +        * offset for cipher to start within user provided data buffer.
> > > > > > > > +        * Fan suggested another (and less space-consuming) way:
> > > > > > > > +        * reuse the iv.offset space below, by changing:
> > > > > > > > +        * struct {uint16_t offset, length;} iv;
> > > > > > > > +        * to an unnamed union:
> > > > > > > > +        * union {
> > > > > > > > +        *      struct {uint16_t offset, length;} iv;
> > > > > > > > +        *      struct {uint16_t iv_len, crypto_offset;} cpu_crypto_param;
> > > > > > > > +        * };
> > > > > > > > +        * Both approaches seem ok to me in general.
> > > > > > >
> > > > > > > No strong opinions here. OK with this one.
> > > > > > >
> > > > > > > > +        * Comments/suggestions are welcome.
> > > > > > > > +         */
> > > > > > > > +       uint16_t offset;
> > > > > >
> > > > > > After another thought, it is probably a bit better to have the offset as a
> > > > > > separate field.
> > > > > > In that case we can use the same xforms to create both types of sessions.
> > > > > ok
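The size arithmetic behind both options can be checked with a small sketch. The type names below are hypothetical stand-ins for the `key` and `iv` members of the xforms in the draft, `__attribute__((__packed__))` is a GCC/Clang extension, and the byte count assumes a pointer's alignment equals its size (true on common LP64 and ILP32 targets). Under that assumption, packing the key regains `sizeof(void *) - 2` bytes (6 on LP64), while the union adds no bytes but makes the two views share storage.

```c
#include <assert.h>
#include <stdint.h>

/* Key layout as currently defined: padded up to pointer alignment. */
struct key_unpacked {
	const uint8_t *data;
	uint16_t length;
};

/* Same members, packed (GCC/Clang extension): the tail padding disappears. */
struct key_packed {
	const uint8_t *data;
	uint16_t length;
} __attribute__((__packed__));

/* Bytes regained by packing the key struct. */
#define KEY_BYTES_REGAINED \
	(sizeof(struct key_unpacked) - sizeof(struct key_packed))

/* Alternative: overlay the CPU-crypto parameters on the existing iv fields. */
union iv_or_cpu_param {
	struct { uint16_t offset, length; } iv;
	struct { uint16_t iv_len, crypto_offset; } cpu_crypto_param;
};

/* The union is no larger than the original iv struct, so the xform size is unchanged. */
static_assert(sizeof(union iv_or_cpu_param) == 2 * sizeof(uint16_t),
	      "union must not grow the xform");
```

The overlap is the trade-off behind the reply above: with the union, one xform cannot describe an IV offset and a crypto offset at the same time, whereas a separate field lets the same xform serve both session types.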
> > > > > >
> > > > > > > > +
> > > > > > > > +       uint8_t reserved1[4];
> > > > > > > > +
> > > > > > > >         /**< Cipher key
> > > > > > > >          *
> > > > > > > >          * For the RTE_CRYPTO_CIPHER_AES_F8 mode of operation, key.data will
> > > > > > > > @@ -284,7 +305,7 @@ struct rte_crypto_auth_xform {
> > > > > > > >         struct {
> > > > > > > >                 const uint8_t *data;    /**< pointer to key data */
> > > > > > > >                 uint16_t length;        /**< key length in bytes */
> > > > > > > > -       } key;
> > > > > > > > +       } __attribute__((__packed__)) key;
> > > > > > > >         /**< Authentication key data.
> > > > > > > >          * The authentication key length MUST be less than or equal to the
> > > > > > > >          * block size of the algorithm. It is the callers responsibility to
> > > > > > > > @@ -292,6 +313,8 @@ struct rte_crypto_auth_xform {
> > > > > > > >          * (for example RFC 2104, FIPS 198a).
> > > > > > > >          */
> > > > > > > >
> > > > > > > > +       uint8_t reserved1[6];
> > > > > > > > +
> > > > > > > >         struct {
> > > > > > > >                 uint16_t offset;
> > > > > > > >                 /**< Starting point for Initialisation Vector or Counter,
> > > > > > > > @@ -376,7 +399,12 @@ struct rte_crypto_aead_xform {
> > > > > > > >         struct {
> > > > > > > >                 const uint8_t *data;    /**< pointer to key data */
> > > > > > > >                 uint16_t length;        /**< key length in bytes */
> > > > > > > > -       } key;
> > > > > > > > +       } __attribute__((__packed__)) key;
> > > > > > > > +
> > > > > > > > +       /** offset for cipher to start within data buffer */
> > > > > > > > +       uint16_t cipher_offset;
> > > > > > > > +
> > > > > > > > +       uint8_t reserved1[4];
> > > > > > > >
> > > > > > > >         struct {
> > > > > > > >                 uint16_t offset;
> > > > > > > > diff --git a/lib/librte_cryptodev/rte_cryptodev.h b/lib/librte_cryptodev/rte_cryptodev.h
> > > > > > > > index e175b838c..c0c7bfed7 100644
> > > > > > > > --- a/lib/librte_cryptodev/rte_cryptodev.h
> > > > > > > > +++ b/lib/librte_cryptodev/rte_cryptodev.h
> > > > > > > > @@ -1272,6 +1272,101 @@ void *
> > > > > > > >  rte_cryptodev_sym_session_get_user_data(
> > > > > > > >                                         struct rte_cryptodev_sym_session *sess);
> > > > > > > >
> > > > > > > > +/*
> > > > > > > > + * After some thought, we decided not to try to squeeze CPU_CRYPTO
> > > > > > > > + * into the existing rte_crypto_sym_session structure/API, but instead
> > > > > > > > + * to introduce an extension to it via a new, fully opaque
> > > > > > > > + * struct rte_crypto_cpu_sym_session and additional related API.
> > > > > > >
> > > > > > >
> > > > > > > What exactly do we need to squeeze?
> > > > > > > In this proposal I do not see the new struct cpu_sym_session defined here.
> > > > > >
> > > > > > The plan is to have it totally opaque to the user, i.e. just:
> > > > > > struct rte_crypto_cpu_sym_session;
> > > > > > in public header files.
> > > > > >
> > > > > > > I believe you will have the same lib API/struct for cpu_sym_session and
> > > > > > > sym_session.
> > > > > >
> > > > > > I thought about such a way, but there are a few things that look clumsy to me:
> > > > > > 1. Right now there is no 'type' (or similar) field inside rte_cryptodev_sym_session,
> > > > > > so it is not possible to easily distinguish which session you have: lksd_sym or
> > > > > > cpu_sym.
> > > > > > In theory, there is a 4B hole inside rte_cryptodev_sym_session, so we can add
> > > > > > some extra field there, but in that case we wouldn't be able to use the same
> > > > > > xform for both lksd_sym and cpu_sym
> > > > > > (which seems a really plausible thing to me).
> > > > > > 2. The majority of rte_cryptodev_sym_session fields, I think, are unnecessary for
> > > > > > rte_crypto_cpu_sym_session:
> > > > > > sess_data[], opaque_data, user_data, nb_drivers.
> > > > > > All that consumes space that could be used somewhere else instead.
> > > > > > 3. I am a bit reluctant to touch the existing rte_cryptodev API, to avoid any
> > > > > > breakages I can't foresee right now.
> > > > > > On the other hand, if we add new functions/structs for cpu_sym_session, we can
> > > > > > mark them experimental and keep them that way for some time, so further
> > > > > > changes (if needed) would still be possible.
> > > > > >
> > > > >
> > > > > OK, let us assume that you have a separate structure. But I have a few queries:
> > > > > 1. How can multiple drivers use the same session?
> > > >
> > > > As a short answer: they can't.
> > > > It is pretty much the same approach as with rte_security: each device needs to
> > > > create/init its own session.
> > > > So the upper layer would need to maintain its own array (or similar) for such a case.
> > > > Though the question is: why would you want the same session over multiple
> > > > SW-backed devices? It would anyway be just a synchronous function call that
> > > > is executed on the same CPU.
> > >
> > > I may have a single FAT tunnel which may be distributed over multiple
> > > cores, and each core is affined to a different SW device.
> >
> > If it is pure SW, then we don't need multiple devices for such a scenario.
> > The device in that case is a pure abstraction that we can skip.
> 
> Yes, agreed, but the application is given the liberty to choose whether it needs
> multiple devices with a single queue or a single device with multiple queues.
> I think that independence should not be broken in this new API.
> >
> > > So a single session may be accessed by multiple devices.
> > >
> > > One more example: depending on packet sizes, I may switch between
> > > HW/SW PMDs with the same session.
> >
> > Sure, but then we'll have multiple sessions.
> 
> No, the session will be the same, and it will hold separate private data for each PMD.
> 
> > BTW, we have the same thing now: these private session pointers are just stored
> > inside the same rte_crypto_sym_session.
> > And if the user wants to support this model, he would also need to store a
> > <dev_id, queue_id> pair for each HW device anyway.
> 
> Yes, agreed, but how would that work with your new struct? You cannot support that.

The user can store all this info in his own struct.
That's exactly what we have right now.
Let's say ipsec-secgw has to store, for each IPsec SA:
a pointer to the crypto session and/or a pointer to the security session,
plus (for lookaside devices) cdev_id_qp, which allows it to extract
the dev_id + queue_id information.
As I understand it, that works for now, as each ipsec_sa uses only one
dev+queue. Though if someone would like to use multiple devices/queues
for the same SA, he would need to have an array of these <dev+queue> pairs.
So even right now rte_cryptodev_sym_session is not self-contained and
requires extra information to be maintained by the user.
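The application-side bookkeeping described above can be sketched as a per-SA context. All names here (`sa_ctx`, `sa_devq`, `sa_add_devq`, `sa_pick_devq`, `SA_MAX_DEVQ`) are hypothetical and are not taken from ipsec-secgw; the sketch only shows that the <dev_id, queue_id> pairs live outside the crypto session, next to the opaque session pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SA_MAX_DEVQ 4	/* hypothetical per-SA limit */

/* One <dev_id, queue_id> pair, as the application tracks it. */
struct sa_devq {
	uint8_t dev_id;
	uint16_t queue_id;
};

/* Hypothetical per-SA context kept by the application, not by the library. */
struct sa_ctx {
	void *crypto_ses;	/* opaque crypto session, may be NULL */
	void *sec_ses;		/* opaque security session, may be NULL */
	uint32_t nb_devq;
	struct sa_devq devq[SA_MAX_DEVQ];
};

/* Register one more <dev, queue> pair for this SA; -1 when the table is full. */
static int
sa_add_devq(struct sa_ctx *sa, uint8_t dev_id, uint16_t queue_id)
{
	assert(sa != NULL);
	if (sa->nb_devq >= SA_MAX_DEVQ)
		return -1;
	sa->devq[sa->nb_devq].dev_id = dev_id;
	sa->devq[sa->nb_devq].queue_id = queue_id;
	sa->nb_devq++;
	return 0;
}

/* Map a worker (e.g. an lcore index) to one of the SA's pairs, round-robin. */
static const struct sa_devq *
sa_pick_devq(const struct sa_ctx *sa, unsigned int worker)
{
	if (sa->nb_devq == 0)
		return NULL;
	return &sa->devq[worker % sa->nb_devq];
}
```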

> 
> >
> > >
> > > >
> > > > > 2. Can somebody use the scheduler PMD for scheduling different types of
> > > > > payloads for the same session?
> > > >
> > > > In theory, yes.
> > > > Though for that, the scheduler PMD would need to keep, inside its
> > > > rte_crypto_cpu_sym_session, an array of pointers to
> > > > the underlying devices' sessions.
> > > >
> > > > >
> > > > > With your proposal the APIs would be very specific to your use case only.
> > > >
> > > > Yes, in some way.
> > > > I consider that API specific to SW-backed crypto PMDs.
> > > > I can hardly see how any 'real HW' PMDs (lksd-none, lksd-proto) would benefit
> > > > from it.
> > > > The current crypto-op API is very much HW-oriented,
> > > > which is OK, that's what it was intended for, but I think we also need one
> > > > designed with SW-backed implementations in mind.
> > >
> > > We may reuse your API for HW PMDs as well, which do not require
> > > crypto-op/mbuf etc.
> > > The return value of your new process API may carry a status that says 'processed'
> > > or 'enqueued'. So if it is 'enqueued', we may have a new API for raw
> > > buffer dequeue as well.
> > >
> > > This requirement can apply to any hardware PMD, like QAT, as well.
> >
> > I don't think it is a good idea to extend this API for async (lookaside) devices.
> > You'll need to:
> >  - provide dev_id and queue_id for each process (enqueue) and dequeue operation.
> >  - provide IOVA for all buffers passed to that function (data buffers, digest, IV,
> >    AAD).
> >  - on dequeue, provide some way to associate the dequeued data and digest buffers
> >    with the crypto session that was used (and probably with the mbuf).
> > So most likely we'll end up with just another version of our current crypto-op
> > structure.
> > If you'd like to get rid of the mbuf dependency within the current crypto-op API,
> > that's understandable, but I don't think we should have the same API for both
> > sync (CPU) and async (lookaside) cases.
> > It doesn't seem feasible at all and voids the whole purpose of that patch.
> 
> At this moment we are not much concerned about the dequeue API and about
> HW PMD support. It is just that the new API should be generic enough to be used in
> some future scenarios as well. I am just highlighting the possible use cases which may
> arise in the future.

Sorry, but I strongly disagree with such an approach.
We should stop adding/modifying API 'just in case' and because 'it might be useful for some future HW'.
Inside DPDK we already have too many dev-level APIs without any implementations.
That's quite bad practice and very disorienting for end users.
I think to justify API additions/changes we need at least one proper implementation for it,
or at least some strong evidence that people are really committed to supporting it in the near future.
BTW, that's what the TB agreed on nearly a year ago.

This new API (if we go ahead with it, of course) would stay experimental for some time anyway,
to make sure we don't miss anything needed (I think for about a one-year time frame).
So if you guys *really* want to extend it to support _async_ devices too,
I am open to modifications/additions here.
Though personally I think such an addition would over-complicate things and we'd end up with
another reincarnation of the current crypto-op.
We actually discussed it internally, and decided to drop that idea because of that.
Again, in my opinion, for lookaside devices it might be better to try to optimize
the current crypto-op path (remove the mbuf requirement, probably add the ability to
group by session on enqueue/dequeue, etc.).

> 
> What is the issue you face in making a dev-op for this new API? Do you see any
> performance impact with that?

There are two main things:
1. The user would need to maintain, and provide for each process() call, dev_id+queue_id.
That means extra (and, for SW, totally unnecessary) overhead.
2. Yes, I would expect some perf overhead too: it would be an extra call or branch.
Again, as there would be a data dependency, most likely the CPU wouldn't be able to pipeline
it efficiently:

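int
rte_crypto_sym_process(uint8_t dev_id, uint16_t qp_id, struct rte_cryptodev_sym_session *sess, ...)
{
     /* load the device, then its driver_id, then the per-driver session */
     struct rte_cryptodev *dev = &rte_cryptodevs[dev_id];
     return (*dev->process)(sess->sess_data[dev->driver_id].data, ...);
}

int
driver_specific_process(struct driver_specific_sym_session *sess, ...)
{
     /* second indirect call: the routine chosen at session init */
     return sess->process(sess, ...);
}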

I didn't make any exact measurements, but it would surely be slower than just:
session_udata->process(session_udata->sess, ...);
Again, it would be much more noticeable on low-end CPUs.
Let's say here: http://mails.dpdk.org/archives/dev/2019-September/144350.html
Jerin claims a 1.5-3% drop from introducing an extra call by hiding the eth_dev contents;
I suppose we would have something similar here.
I do realize that in the majority of cases crypto is more expensive than RX/TX, but still.

If it were a really unavoidable tradeoff (to support an already existing API, or similar),
I wouldn't mind, but I don't see any real need for it right now.
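The two call chains can be compared in a small self-contained model. Everything here (the struct names, `dev_op_process`, `direct_process`, `fake_cipher`) is hypothetical and only mirrors the shape of the pseudocode above. Model 1 needs a device lookup, a load of driver_id, a load of the private session and two indirect calls before any crypto work starts; model 2 needs one indirect call. Whether that difference is measurable is exactly what is being debated here, so no cycle counts are claimed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DRIVERS 2
#define MAX_DEVS 2

/* Hypothetical per-driver private session; process is chosen at session init. */
struct priv_sess {
	int (*process)(struct priv_sess *s, uint32_t num);
	uint32_t key_id;
};

/* Hypothetical generic session: private data indexed by driver_id. */
struct sym_sess {
	struct priv_sess *data[MAX_DRIVERS];
};

/* Hypothetical device: its driver_id plus a dev-op "process" entry. */
struct dev {
	uint8_t driver_id;
	int (*process)(struct priv_sess *s, uint32_t num);
};

static struct dev devs[MAX_DEVS];

/* Stand-in for real crypto work: derives a number from its inputs. */
static int
fake_cipher(struct priv_sess *s, uint32_t num)
{
	return (int)(num + s->key_id);
}

/* Driver dev-op: forwards through the pointer stored in the private session. */
static int
drv_process(struct priv_sess *s, uint32_t num)
{
	return s->process(s, num);
}

/* Model 1 (dev-op): device -> driver_id -> private session, two indirect calls. */
static int
dev_op_process(uint8_t dev_id, uint16_t qp_id, struct sym_sess *ses, uint32_t num)
{
	struct dev *d = &devs[dev_id];

	(void)qp_id;	/* unused by a SW PMD, but every caller must still supply it */
	assert(d->process != NULL);
	return d->process(ses->data[d->driver_id], num);
}

/* Model 2 (direct): the caller already holds the function and the session. */
static int
direct_process(int (*fn)(struct priv_sess *, uint32_t),
	       struct priv_sess *s, uint32_t num)
{
	return fn(s, num);
}
```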

> 
> >
> > > That is why a dev-op would be a better option.
> > >
> > > >
> > > > > When you add more functionality to this sync API/struct, it will end up
> > > > > being the same API/struct.
> > > > >
> > > > > Let us see how close/far we are from the existing APIs when the actual
> > > > > implementation is done.
> > > > >
> > > > > > > I am not sure that would be needed.
> > > > > > > It would be internal to the driver: if synchronous processing is
> > > > > > > supported (from the feature flag) and the relevant fields in the xform
> > > > > > > (the newly added ones, packed as per your suggestion) are set,
> > > > > > > it will create that type of session.
> > > > > > >
> > > > > > >
> > > > > > > > + * Main points:
> > > > > > > > + * - The current crypto-dev API is reasonably mature and it is desirable
> > > > > > > > + *   to keep it unchanged (API/ABI stability). On the other hand, this
> > > > > > > > + *   sync API is a new one and would probably require extra changes.
> > > > > > > > + *   Having it as a separate API allows us to mark it as experimental
> > > > > > > > + *   without affecting the existing one.
> > > > > > > > + * - A fully opaque cpu_sym_session structure gives more flexibility
> > > > > > > > + *   to the PMD writers and again allows us to avoid ABI breakages in
> > > > > > > > + *   the future.
> > > > > > > > + * - A process() function per set of xforms
> > > > > > > > + *   allows exposing different process() functions for different
> > > > > > > > + *   xform combinations. The PMD writer can decide whether to
> > > > > > > > + *   push all supported algorithms into one process() function,
> > > > > > > > + *   or spread them across several ones.
> > > > > > > > + *   I.e. more flexibility for the PMD writer.
> > > > > > >
> > > > > > > Which process function should be chosen is internal to the PMD; how would
> > > > > > > that info be visible to the application or the library? These will get
> > > > > > > stored in the session private data. It would be up to the PMD writer to
> > > > > > > store the per-session process function in the session private data.
> > > > > > >
> > > > > > > The process function would be a dev op, just like enq/deq operations, and
> > > > > > > it should call the respective process API stored in the session private data.
> > > > > >
> > > > > > That model (via dev ops) is possible, but has several drawbacks from my
> > > > > > perspective:
> > > > > >
> > > > > > 1. It means we'll need to pass dev_id as a parameter to the process() function.
> > > > > > In fact, though, dev_id is not relevant information for us here
> > > > > > (all we need is a pointer to the session and a pointer to the function to call),
> > > > > > and I tried to avoid using it in data-path functions for that API.
> > > > >
> > > > > You have a single vdev, but someone may have multiple vdevs, one for each
> > > > > thread, or may have the same dev with multiple queues, one for each core.
> > > >
> > > > That's fine. As I said above, it is a SW-backed implementation.
> > > > Each session has to be a separate entity that contains all the necessary
> > > > information (keys, alg/mode info, etc.) to process input buffers.
> > > > Plus we need the actual function pointer to call.
> > > > I just don't see why we need a dev_id in that situation.
> > >
> > > To iterate the session private data in the session.
> > >
> > > > Again, here we don't need to care about queues and their pinning to cores.
> > > > If, let's say, someone would like to process buffers from the same IPsec SA on 2
> > > > different cores in parallel, he can just create 2 sessions for the same xform,
> > > > give one to thread #1 and the second to thread #2.
> > > > After that both threads are free to call process(this_thread_ses, ...) at will.
> > >
> > > Say you have a 16-core device to handle 100G of traffic on a single tunnel.
> > > Will we make 16 sessions with the same parameters?
> >
> > We can ask exactly the same question about the current crypto-op API.
> > You have a lookaside crypto-dev with 16 HW queues, each queue serviced by a
> > different CPU.
> > For the same SA, do you need a separate session per queue, or is it OK to reuse
> > the current one?
> > AFAIK, right now this is a grey area, not clearly defined.
> > For the crypto-devs I am aware of, the user can reuse the same session (as the PMD
> > uses it read-only).
> > But again, right now I think it is not clearly defined and is implementation
> > specific.
> 
> The user can use the same session, that is what I am also insisting on, but it may have
> separate session private data. The cryptodev session create API provides that
> functionality and we can leverage that.

rte_cryptodev_sym_session.sess_data[] is indexed by driver_id, which means we can't use
the same rte_cryptodev_sym_session to hold sessions for both sync and async mode
for the same device. Of course we can add a hard requirement that any driver that wants to
support process() has to create sessions that can handle both process and enqueue/dequeue,
but then again, why create such overhead?

BTW, to be honest, I don't consider the current rte_cryptodev_sym_session construct for multiple device_ids:
__extension__ struct {
                void *data;
                uint16_t refcnt;
        } sess_data[0];
        /**< Driver specific session material, variable size */

as an advantage.
It looks too error-prone to me:
1. Simultaneous session initialization/de-initialization for devices with the same driver_id is not possible.
2. It assumes that all device drivers will be loaded before we start to create session pools.

Right now it seems OK, as no one requires such functionality, but I don't know how it will be in the future.
For me, the rte_security session model, where for each security context the user has to create a new session,
looks much more robust.
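The constraints listed above can be illustrated with a simplified model of a driver_id-indexed session. This is not the cryptodev implementation: the names are hypothetical, it uses a C99 flexible array member instead of the `__extension__` zero-length array, and its refcount handling is simplified (it only counts repeated attachments of the same private data). It shows two things: a driver has exactly one slot, so a second, differently-typed private session for the same driver has nowhere to go, and a driver_id that was not known when the session was sized cannot be attached at all.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified model of a session whose private data is indexed by driver_id. */
struct sym_session {
	uint16_t nb_drivers;	/* fixed when the session is created */
	struct {
		void *data;
		uint16_t refcnt;
	} sess_data[];		/* C99 flexible array member */
};

static struct sym_session *
session_create(uint16_t nb_drivers)
{
	struct sym_session *s = calloc(1, sizeof(*s) +
			(size_t)nb_drivers * sizeof(s->sess_data[0]));

	if (s != NULL)
		s->nb_drivers = nb_drivers;
	return s;
}

/*
 * Attach private data for driver_id. Returns -1 if driver_id was not known
 * when the session was created, or if the slot already holds other data.
 */
static int
session_set_priv(struct sym_session *s, uint16_t driver_id, void *priv)
{
	assert(s != NULL);
	if (driver_id >= s->nb_drivers)
		return -1;
	if (s->sess_data[driver_id].data != NULL &&
	    s->sess_data[driver_id].data != priv)
		return -1;
	s->sess_data[driver_id].data = priv;
	s->sess_data[driver_id].refcnt++;
	return 0;
}
```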
 
> 
> BTW, I can see a v2 of this RFC which is still based on the security library.

Yes, v2 concentrated on fixing the issues found and some code restructuring,
i.e. changes that would be needed anyway, whatever API approach we choose.

> When do you plan to submit the patches for the crypto-based APIs? We have the RC1
> merge deadline for this patchset on 21st Oct.

We'd like to start working on it ASAP, but it seems we still have a major disagreement
about what this crypto-dev API should look like.
Which makes me think: should we return to our original proposal via rte_security?
It still looks to me like a clean and straightforward way to enable this new API,
and probably wouldn't cause that much controversy.
What do you think?

> 
> As per my understanding, you only need a new dev-op for sync support. The session APIs
> will remain the same and you will have some extra fields packed in the xform structs.
>
> The PMD will need to maintain a pointer to the per-session process function while creating
> the session, and it will be used by the dev-op API at runtime without any extra check.
> 
> >
> > >
> > > >
> > > > >
> > > > > > 2. As you pointed out, in that case it will be just one process() function per device.
> > > > > > So if a PMD would like to have several process() functions for different types of
> > > > > > sessions (let's say one per alg), the first thing it has to do inside its process()
> > > > > > is read the session data and, based on that, do a jump/call to a particular
> > > > > > internal sub-routine. Something like:
> > > > > > driver_id = get_pmd_driver_id();
> > > > > > priv_sess = ses->sess_data[driver_id].data;
> > > > > > Then either:
> > > > > > switch (priv_sess->alg) {case XXX: process_XXX(priv_sess, ...); break; ...}
> > > > > > OR
> > > > > > priv_sess->process(priv_sess, ...);
> > > > > >
> > > > > > to select and call the proper function.
> > > > > > Looks like totally unnecessary overhead to me.
> > > > > > Though if we have the ability to query/extract some sort of session_ops based
> > > > > > on the xform, we can avoid this extra de-reference+jump/call thing.
> > > > >
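The two ways of selecting a per-algorithm routine can be contrasted in a few lines. The names are hypothetical and the "routines" are trivial stand-ins; the sketch only shows where the decision is made: on every call (switch on the algorithm) or once at session init (stored function pointer).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical algorithm ids and private session. */
enum alg { ALG_AES_CBC, ALG_AES_GCM };

struct priv_sess;
typedef int (*process_t)(const struct priv_sess *s, int in);

struct priv_sess {
	enum alg alg;
	process_t process;	/* resolved once, at session init */
};

/* Stand-ins for per-algorithm routines. */
static int process_cbc(const struct priv_sess *s, int in) { (void)s; return in + 1; }
static int process_gcm(const struct priv_sess *s, int in) { (void)s; return in + 2; }

/* Option 1: decide on every call, by branching on the algorithm. */
static int
process_switch(const struct priv_sess *s, int in)
{
	switch (s->alg) {
	case ALG_AES_CBC:
		return process_cbc(s, in);
	case ALG_AES_GCM:
		return process_gcm(s, in);
	}
	return -1;
}

/* Option 2: decide once at init; the data path calls s->process directly. */
static void
priv_sess_init(struct priv_sess *s, enum alg a)
{
	assert(s != NULL);
	s->alg = a;
	s->process = (a == ALG_AES_CBC) ? process_cbc : process_gcm;
}
```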
> > > > > What is the issue in the priv_ses->process(); approach?
> > > >
> > > > Nothing at all.
> > > > What I am saying is that the scheme with dev_ops
> > > > dev[dev_id]->dev_ops.process(ses->priv_ses[driver_id], ...)
> > > >    |
> > > >    |-> priv_ses->process(...)
> > > >
> > > > has a bigger overhead than just:
> > > > process(ses,...);
> > > >
> > > > So why introduce an extra level of indirection here?
> > >
> > > Explained above.
> > >
> > > >
> > > > > I don't understand what you are saving by not doing this.
> > > > > In any case you would need to identify which session corresponds to which
> > > > > process().
> > > >
> > > > Yes, sure, but I think we can let the user store that relationship information
> > > > in whatever way he likes: store a process() pointer for each session, or group
> > > > sessions that share the same process() somehow, or...
> > >
> > > So whatever relationship the user makes and stores will make his life complicated.
> > > If we can hide that information in the driver, then what is the issue with that? The user
> > > will not need to worry. He would just call process() and the driver will choose which
> > > process function needs to be called.
> >
> > Driver can do that at config/init time.
> > Then at run-time we can avoid that choice at all and call already chosen function.
> >
> > >
> > > I think we should have a POC around this and see the difference in the cycle count.
> > > IMO it would be negligible, and we would end up with a generic API set which
> > > can be used by others as well.
> > >
> > > >
> > > > > For that you would be doing it somewhere in your data path.
> > > >
> > > > Why at data-path?
> > > > Only once at session creation/initialization time.
> > > > Or might be even once per group of sessions.
> > > >
> > > > >
> > > > > >
> > > > > > >
> > > > > > > I am not sure you would need a new session init API for this, as nothing
> > > > > > > would be visible to the app or lib.
> > > > > > >
> > > > > > > > + * - Not storing the process() pointer inside the session
> > > > > > > > + *   allows the user to choose whether to store a process() pointer
> > > > > > > > + *   per session, or per group of sessions for that device that share
> > > > > > > > + *   the same input xforms. I.e. extra flexibility for the user,
> > > > > > > > + *   plus it allows us to keep cpu_sym_session totally opaque, see above.
> > > > > > >
> > > > > > > If multiple sessions need to be processed via the same process function,
> > > > > > > the PMD would save the same process function in all the sessions; I don't
> > > > > > > think there would be any perf overhead with that.
> > > > > >
> > > > > > I think it would, see above.
> > > > > >
> > > > > > >
> > > > > > > > + * Sketched usage model:
> > > > > > > > + * ....
> > > > > > > > + * (control path: alloc/init session)
> > > > > > > > + * int32_t sz = rte_crypto_cpu_sym_session_size(dev_id, &xform);
> > > > > > > > + * struct rte_crypto_cpu_sym_session *ses = user_alloc(..., sz);
> > > > > > > > + * rte_crypto_cpu_sym_process_t process =
> > > > > > > > + *     rte_crypto_cpu_sym_session_func(dev_id, &xform);
> > > > > > > > + * rte_crypto_cpu_sym_session_init(dev_id, ses, &xform);
> > > > > > > > + * ...
> > > > > > > > + * (data path)
> > > > > > > > + * process(ses, ....);
> > > > > > > > + * ....
> > > > > > > > + * (control path: terminate/free session)
> > > > > > > > + * rte_crypto_cpu_sym_session_fini(dev_id, ses);
> > > > > > > > + */
> > > > > > > > +
> > > > > > > > +/**
> > > > > > > > + * vector structure, contains pointer to vector array and the length
> > > > > > > > + * of the array
> > > > > > > > + */
> > > > > > > > +struct rte_crypto_vec {
> > > > > > > > +       struct iovec *vec;
> > > > > > > > +       uint32_t num;
> > > > > > > > +};
> > > > > > > > +
> > > > > > > > +/*
> > > > > > > > + * Data-path bulk process crypto function.
> > > > > > > > + */
> > > > > > > > +typedef void (*rte_crypto_cpu_sym_process_t)(
> > > > > > > > +               struct rte_crypto_cpu_sym_session *sess,
> > > > > > > > +               struct rte_crypto_vec buf[], void *iv[], void *aad[],
> > > > > > > > +               void *digest[], int status[], uint32_t num);
> > > > > > > > +/*
> > > > > > > > + * For the given device, return the process function specific to the input xforms.
> > > > > > > > + * On error, return NULL and set rte_errno.
> > > > > > > > + * Note that the same input xforms for the same device should return
> > > > > > > > + * the same process function.
> > > > > > > > + */
> > > > > > > > +__rte_experimental
> > > > > > > > +rte_crypto_cpu_sym_process_t
> > > > > > > > +rte_crypto_cpu_sym_session_func(uint8_t dev_id,
> > > > > > > > +                       const struct rte_crypto_sym_xform *xforms);
> > > > > > > > +
> > > > > > > > +/*
> > > > > > > > + * Return required session size in bytes for given set of xforms.
> > > > > > > > + * If xforms == NULL, then return the max possible session size
> > > > > > > > + * that would fit a session for any algorithm supported by the device.
> > > > > > > > + * If CPU mode is not supported at all, or the algorithm requested in
> > > > > > > > + * xforms is not supported, then return -ENOTSUP.
> > > > > > > > + */
> > > > > > > > +__rte_experimental
> > > > > > > > +int
> > > > > > > > +rte_crypto_cpu_sym_session_size(uint8_t dev_id,
> > > > > > > > +                       const struct rte_crypto_sym_xform *xforms);
> > > > > > > > +
> > > > > > > > +/*
> > > > > > > > + * Initialize session.
> > > > > > > > + * It is caller responsibility to allocate enough space for it.
> > > > > > > > + * See rte_crypto_cpu_sym_session_size above.
> > > > > > > > + */
> > > > > > > > +__rte_experimental
> > > > > > > > +int rte_crypto_cpu_sym_session_init(uint8_t dev_id,
> > > > > > > > +                       struct rte_crypto_cpu_sym_session *sess,
> > > > > > > > +                       const struct rte_crypto_sym_xform *xforms);
> > > > > > > > +
> > > > > > > > +__rte_experimental
> > > > > > > > +void
> > > > > > > > +rte_crypto_cpu_sym_session_fini(uint8_t dev_id,
> > > > > > > > +                       struct rte_crypto_cpu_sym_session *sess);
> > > > > > > > +
> > > > > > > > +
> > > > > > > >  #ifdef __cplusplus
> > > > > > > >  }
> > > > > > > >  #endif
> > > > > > > > diff --git a/lib/librte_cryptodev/rte_cryptodev_pmd.h b/lib/librte_cryptodev/rte_cryptodev_pmd.h
> > > > > > > > index defe05ea0..ed7e63fab 100644
> > > > > > > > --- a/lib/librte_cryptodev/rte_cryptodev_pmd.h
> > > > > > > > +++ b/lib/librte_cryptodev/rte_cryptodev_pmd.h
> > > > > > > > @@ -310,6 +310,20 @@ typedef void (*cryptodev_sym_free_session_t)(struct rte_cryptodev *dev,
> > > > > > > >  typedef void (*cryptodev_asym_free_session_t)(struct rte_cryptodev *dev,
> > > > > > > >                 struct rte_cryptodev_asym_session *sess);
> > > > > > > >
> > > > > > > > +typedef int (*cryptodev_cpu_sym_session_size_t) (struct rte_cryptodev *dev,
> > > > > > > > +                       const struct rte_crypto_sym_xform *xforms);
> > > > > > > > +
> > > > > > > > +typedef int (*cryptodev_cpu_sym_session_init_t) (struct rte_cryptodev *dev,
> > > > > > > > +                       struct rte_crypto_cpu_sym_session *sess,
> > > > > > > > +                       const struct rte_crypto_sym_xform *xforms);
> > > > > > > > +
> > > > > > > > +typedef void (*cryptodev_cpu_sym_session_fini_t) (struct rte_cryptodev *dev,
> > > > > > > > +                       struct rte_crypto_cpu_sym_session *sess);
> > > > > > > > +
> > > > > > > > +typedef rte_crypto_cpu_sym_process_t (*cryptodev_cpu_sym_session_func_t) (
> > > > > > > > +                       struct rte_cryptodev *dev,
> > > > > > > > +                       const struct rte_crypto_sym_xform *xforms);
> > > > > > > > +
> > > > > > > >  /** Crypto device operations function pointer table */
> > > > > > > >  struct rte_cryptodev_ops {
> > > > > > > >         cryptodev_configure_t dev_configure;    /**< Configure device. */
> > > > > > > > @@ -343,6 +357,11 @@ struct rte_cryptodev_ops {
> > > > > > > >         /**< Clear a Crypto sessions private data. */
> > > > > > > >         cryptodev_asym_free_session_t asym_session_clear;
> > > > > > > >         /**< Clear a Crypto sessions private data. */
> > > > > > > > +
> > > > > > > > +       cryptodev_cpu_sym_session_size_t sym_cpu_session_get_size;
> > > > > > > > +       cryptodev_cpu_sym_session_func_t sym_cpu_session_get_func;
> > > > > > > > +       cryptodev_cpu_sym_session_init_t sym_cpu_session_init;
> > > > > > > > +       cryptodev_cpu_sym_session_fini_t sym_cpu_session_fini;
> > > > > > > >  };
> > > > > > > >
> > > > > > > >
> > > > > > > >
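To make the sketched usage model above concrete, here is a minimal self-contained C mock, assuming a toy XOR transform in place of real crypto. The rte_crypto_vec layout and the process() argument list mirror the RFC hunk; toy_cpu_sym_session, toy_xor_process() and toy_session_setup() are hypothetical names, not DPDK APIs, and a real PMD would pick the function per xform and honour iv/aad/digest.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Same layout as the rte_crypto_vec proposed in the RFC hunk above. */
struct rte_crypto_vec {
	struct iovec *vec;
	uint32_t num;
};

/* Hypothetical stand-in for the opaque rte_crypto_cpu_sym_session. */
struct toy_cpu_sym_session {
	uint8_t key;
};

/* Same argument list as the proposed rte_crypto_cpu_sym_process_t. */
typedef void (*toy_cpu_sym_process_t)(struct toy_cpu_sym_session *sess,
		struct rte_crypto_vec buf[], void *iv[], void *aad[],
		void *digest[], int status[], uint32_t num);

/* Toy "cipher": XOR every byte of every segment with the session key.
 * It ignores iv/aad/digest; a real PMD would use them. */
static void
toy_xor_process(struct toy_cpu_sym_session *sess, struct rte_crypto_vec buf[],
		void *iv[], void *aad[], void *digest[], int status[], uint32_t num)
{
	(void)iv;
	(void)aad;
	(void)digest;
	assert(sess != NULL);

	for (uint32_t i = 0; i != num; i++) {
		for (uint32_t j = 0; j != buf[i].num; j++) {
			uint8_t *p = buf[i].vec[j].iov_base;
			for (size_t k = 0; k != buf[i].vec[j].iov_len; k++)
				p[k] ^= sess->key;
		}
		status[i] = 0; /* per-buffer result: 0 means success */
	}
}

/* Control path: plays the role of session_size()/session_func()/
 * session_init() from the sketched usage model. The caller owns the
 * session memory; the returned pointer is what the data path calls. */
static toy_cpu_sym_process_t
toy_session_setup(struct toy_cpu_sym_session *sess, uint8_t key)
{
	sess->key = key;
	return toy_xor_process;
}
```

The property the mock illustrates is that the data path needs only the session pointer and the function pointer obtained on the control path; no dev_id or queue_id is passed per call.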
  
Akhil Goyal Oct. 11, 2019, 1:23 p.m. UTC | #24
Hi Konstantin,

> 
> Hi Akhil,
> 
..[snip]

> > > > > > OK let us assume that you have a separate structure. But I have a few queries:
> > > > > > 1. how can multiple drivers use a same session
> > > > >
> > > > > As a short answer: they can't.
> > > > > It is pretty much the same approach as with rte_security - each device needs to
> > > > > create/init its own session.
> > > > > So upper layer would need to maintain its own array (or so) for such case.
> > > > > Though the question is why would you like to have same session over multiple
> > > > > SW backed devices?
> > > > > As it would be anyway just a synchronous function call that will be executed on
> > > > > the same cpu.
> > > >
> > > > I may have single FAT tunnel which may be distributed over multiple
> > > > Cores, and each core is affined to a different SW device.
> > >
> > > If it is pure SW, then we don't need multiple devices for such scenario.
> > > Device in that case is pure abstraction that we can skip.
> >
> > Yes agreed, but that liberty is given to the application whether it need multiple
> > devices with single queue or a single device with multiple queues.
> > I think that independence should not be broken in this new API.
> > >
> > > > So a single session may be accessed by multiple devices.
> > > >
> > > > One more example would be depending on packet sizes, I may switch between
> > > > HW/SW PMDs with the same session.
> > >
> > > Sure, but then we'll have multiple sessions.
> >
> > No, the session will be same and it will have multiple private data for each of the PMD.
> >
> > > BTW, we have same thing now - these private session pointers are just stored
> > > inside the same rte_crypto_sym_session.
> > > And if user wants to support this model, he would also need to store <dev_id, queue_id>
> > > pair for each HW device anyway.
> >
> > Yes agreed, but how is that thing happening in your new struct, you cannot support that.
> 
> User can store all these info in his own struct.
> That's exactly what we have right now.
> Let say ipsec-secgw has to store for each IPsec SA:
> pointer to crypto-session and/or pointer to security session
> plus (for lookaside-devices) cdev_id_qp that allows it to extract
> dev_id + queue_id information.
> As I understand that works for now, as each ipsec_sa uses only one
> dev+queue. Though if someone would like to use multiple devices/queues
> for the same SA - he would need to have an array of these <dev+queue> pairs.
> So even right now rte_cryptodev_sym_session is not self-consistent and
> requires extra information to be maintained by user.
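The per-SA bookkeeping described above could look roughly like this. It is only a sketch: struct app_sa, struct cdev_qp_ref, SA_MAX_QPS and the lcore-modulo selection are hypothetical and do not reflect ipsec-secgw's actual cdev_id_qp layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SA_MAX_QPS 8 /* hypothetical per-SA limit */

/* One <device, queue pair> the SA may be processed on. */
struct cdev_qp_ref {
	uint8_t dev_id;
	uint16_t qp_id;
};

/* Hypothetical application-side SA record: the crypto session alone does
 * not say where to enqueue, so the application keeps the pairs itself. */
struct app_sa {
	void *crypto_session; /* opaque session pointer */
	struct cdev_qp_ref qp[SA_MAX_QPS];
	uint32_t nb_qp;
};

/* Register one more <dev_id, qp_id> pair for this SA. */
static int
app_sa_add_qp(struct app_sa *sa, uint8_t dev_id, uint16_t qp_id)
{
	assert(sa != NULL);
	if (sa->nb_qp == SA_MAX_QPS)
		return -1;
	sa->qp[sa->nb_qp].dev_id = dev_id;
	sa->qp[sa->nb_qp].qp_id = qp_id;
	sa->nb_qp++;
	return 0;
}

/* Pick the pair a given lcore should use (simple modulo mapping). */
static const struct cdev_qp_ref *
app_sa_qp_for_lcore(const struct app_sa *sa, unsigned int lcore_idx)
{
	if (sa->nb_qp == 0)
		return NULL;
	return &sa->qp[lcore_idx % sa->nb_qp];
}
```

Whatever the exact layout, the point stands that the <dev, queue> mapping lives outside the crypto session today.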

Why are you increasing the complexity for the user application?
The new APIs and structs should be such that the stack needs minimal changes,
so that the stack stays portable across multiple vendors.
You should try to hide as much complexity as possible in the driver or library, to give the user simple APIs.

Having the same session for multiple devices was added by Intel, only for some use cases.
And we had split the session create API into two. If those are not useful now, shall we move back
to the single API? I think @Doherty, Declan and @De Lara Guarch, Pablo can comment on this.

> 
> >
> > >
> > > >
> > > > >
> > > > > > 2. Can somebody use the scheduler pmd for scheduling the different type of
> > > > > > payloads for the same session?
> > > > >
> > > > > In theory yes.
> > > > > Though for that scheduler pmd should have inside it's
> > > > > rte_crypto_cpu_sym_session an array of pointers to
> > > > > the underlying devices sessions.
> > > > >
> > > > > >
> > > > > > With your proposal the APIs would be very specific to your use case only.
> > > > >
> > > > > Yes in some way.
> > > > > I consider that API specific for SW backed crypto PMDs.
> > > > > I can hardly see how any 'real HW' PMDs (lksd-none, lksd-proto) will benefit
> > > > > from it.
> > > > > Current crypto-op API is very much HW oriented.
> > > > > Which is ok, that's for it was intended for, but I think we also need one that
> > > > > would be designed
> > > > > for SW backed implementation in mind.
> > > >
> > > > We may re-use your API for HW PMDs as well which do not have requirement of
> > > > Crypto-op/mbuf etc.
> > > > The return type of your new process API may have a status which say 'processed'
> > > > Or can be say 'enqueued'. So if it is  'enqueued', we may have a new API for raw
> > > > Bufs dequeue as well.
> > > >
> > > > This requirement can be for any hardware PMDs like QAT as well.
> > >
> > > I don't think it is a good idea to extend this API for async (lookaside) devices.
> > > You'll need to:
> > >  - provide dev_id and queue_id for each process(enqueue) and dequeuer operation.
> > >  - provide IOVA for all buffers passing to that function (data buffers, digest, IV, aad).
> > >  - On dequeue provide some way to associate dequed data and digest buffers with
> > >    crypto-session that was used  (and probably with mbuf).
> > >  So most likely we'll end up with another just version of our current crypto-op structure.
> > > If you'd like to get rid of mbufs dependency within current crypto-op API that understandable,
> > > but I don't think we should have same API for both sync (CPU) and async (lookaside) cases.
> > > It doesn't seem feasible at all and voids whole purpose of that patch.
> >
> > At this moment we are not much concerned about the dequeue API and about the
> > HW PMD support. It is just that the new API should be generic enough to be used in
> > some future scenarios as well. I am just highlighting the possible usecases which can
> > be there in future.
> 
> Sorry, but I strongly disagree with such approach.
> We should stop adding/modifying API 'just in case' and because 'it might be
> useful for some future HW'.
> Inside DPDK we already do have too many dev level APIs without any
> implementations.
> That's quite bad practice and very dis-orienting for end-users.
> I think to justify API additions/changes we need at least one proper
> implementation for it,
> or at least some strong evidence that people are really committed to support it
> in nearest future.
> BTW, that what TB agreed on, nearly a year ago.
> 
> This new API (if we'll go ahead with it of course) would stay experimental for
> some time anyway
> to make sure we don't miss anything needed (I think for about a year time-
> frame).
> So if you guys *really* want to extend it support _async_ devices too -
> I am open for modifications/additions here.
> Though personally I think such addition would over-complicate things and we'll
> end up with
> another reincarnation of current crypto-op.
> We actually discussed it internally, and decided to drop that idea because of that.
> Again, my opinion - for lookaside devices it might be better to try to optimize
> current crypto-op path (remove mbuf requirement, probably add  ability to
> group by session on enqueue/dequeue, etc.).

I agree that the new API is experimental and can be modified later, so there is no issue with that,
but we can keep some things in mind while defining the APIs. These were some comments from
my side; if they impact the current scenario, you can drop them. We will take care of them
later.

> 
> >
> > What is the issue that you face in making a dev-op for this new API. Do you see any
> > performance impact with that?
> 
> There are two main things:
> 1. user would need to maintain and provide for each process() call
> dev_id+queue_id.
> That's means extra (and totally unnecessary for SW) overhead.

You are using a crypto device to perform the processing, so
you must use dev_id to identify which SW device it is. This is how the DPDK
framework works.

> 2. yes I would expect some perf overhead too - it would be extra call or branch.
> Again as it would be data-dependency - most likely cpu wouldn't be able to
> pipeline it efficiently:
> 
> rte_crypto_sym_process(uint8_t dev_id, uint16_t qp_id,
>          struct rte_crypto_sym_session *sess, ...)
> {
>      struct rte_cryptodev *dev = &rte_cryptodevs[dev_id];
>      return (*dev->process)(sess->data[dev->driver_id], ...);
> }
> 
> driver_specific_process(driver_specific_sym_session *sess)
> {
>    return sess->process(sess, ...) ;
> }
> 
> I didn't make any exact measurements but sure it would be slower than just:
> session_udata->process(session->udata->sess, ...);
> Again it would be much more noticeable on low end cpus.
> Let say here:
> http://mails.dpdk.org/archives/dev/2019-September/144350.html
> Jerin claims 1.5-3% drop for introducing extra call via hiding eth_dev contents -
> I suppose we would have something similar here.
> I do realize that in majority of cases crypto is more expensive than RX/TX, but
> still.
> 
> If it would be a really unavoidable tradeoff (support already existing API, or so)
> I wouldn't mind, but I don't see any real need for it right now.

Calling session_udata->process(session->udata->sess, ...) from the application, with the
application keeping the process() function of each PMD in its own memory, will make
the application non-portable to other vendors.

What we are doing here is defining another way to create sessions for the same thing
that is already done. This makes applications non-portable and confusing for the application
writer.

I would say you should do some profiling first. As you also mentioned, the crypto workload is more
cycle-consuming, so the extra call will not impact this case.
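For anyone profiling this, the two call paths being compared can be modelled in plain C. This is a sketch under stated assumptions: toy_dev, gen_session, drv_session and toy_add_key() are hypothetical stand-ins, not rte_cryptodev internals, and the snippet only shows the extra dependent loads and the second indirect call; it says nothing about measured cost.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_DEVS    4
#define TOY_MAX_DRIVERS 4

/* Driver-private session: carries the function chosen at init time. */
struct drv_session {
	int (*process)(struct drv_session *s, int x);
	int key;
};

/* Generic session with one private-data slot per driver_id
 * (a simplified model, not the real rte_cryptodev_sym_session). */
struct gen_session {
	struct drv_session *data[TOY_MAX_DRIVERS];
};

/* Hypothetical device entry exposing a process dev-op. */
struct toy_dev {
	uint8_t driver_id;
	int (*process)(struct drv_session *s, int x);
};

static struct toy_dev toy_devs[TOY_MAX_DEVS];

/* Path 1 (dev-op): device lookup and private-data lookup, then two
 * dependent indirect calls before any crypto work starts. */
static int
devop_process(uint8_t dev_id, struct gen_session *ses, int x)
{
	struct toy_dev *dev = &toy_devs[dev_id];
	struct drv_session *s = ses->data[dev->driver_id];

	assert(s != NULL);
	return dev->process(s, x);
}

/* What the dev-op would typically forward to inside the driver. */
static int
drv_dispatch(struct drv_session *s, int x)
{
	return s->process(s, x);
}

/* Path 2 (direct): a single indirect call through the session. */
static inline int
direct_process(struct drv_session *s, int x)
{
	return s->process(s, x);
}

/* Toy work function standing in for the actual crypto routine. */
static int
toy_add_key(struct drv_session *s, int x)
{
	return x + s->key;
}
```

Both paths compute the same result; timing them with a real cipher in place of toy_add_key() on the target CPU would answer the question either way.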


> 
> >
> > >
> > > > That is why a dev-ops would be a better option.
> > > >
> > > > >
> > > > > > When you would add more functionality to this sync API/struct, it will end up
> > > > > > being the same API/struct.
> > > > > >
> > > > > > Let us  see how close/ far we are from the existing APIs when the actual
> > > > > > implementation is done.
> > > > > >
> > > > > > > > I am not sure if that would be needed.
> > > > > > > > It would be internal to the driver that if synchronous processing is
> > > > > > > > supported(from feature flag) and
> > > > > > > > Have relevant fields in xform(the newly added ones which are packed as
> > > > > > > > per your suggestions) set,
> > > > > > > > It will create that type of session.
> > > > > > > >
> > > > > > > >
> > > > > > > > > + * Main points:
> > > > > > > > > + * - Current crypto-dev API is reasonably mature and it is desirable
> > > > > > > > > + *   to keep it unchanged (API/ABI stability). From other side, this
> > > > > > > > > + *   new sync API is new one and probably would require extra changes.
> > > > > > > > > + *   Having it as a new one allows to mark it as experimental, without
> > > > > > > > > + *   affecting existing one.
> > > > > > > > > + * - Fully opaque cpu_sym_session structure gives more flexibility
> > > > > > > > > + *   to the PMD writers and again allows to avoid ABI breakages in future.
> > > > > > > > > + * - process() function per set of xforms
> > > > > > > > > + *   allows to expose different process() functions for different
> > > > > > > > > + *   xform combinations. PMD writer can decide, does he wants to
> > > > > > > > > + *   push all supported algorithms into one process() function,
> > > > > > > > > + *   or spread it across several ones.
> > > > > > > > > + *   I.E. More flexibility for PMD writer.
> > > > > > > >
> > > > > > > > Which process function should be chosen is internal to PMD, how would that info
> > > > > > > > be visible to the application or the library. These will get stored in the session private
> > > > > > > > data. It would be upto the PMD writer, to store the per session process function in
> > > > > > > > the session private data.
> > > > > > > >
> > > > > > > > Process function would be a dev ops just like enc/deq operations and it should call
> > > > > > > > The respective process API stored in the session private data.
> > > > > > >
> > > > > > > That model (via devops) is possible, but has several drawbacks from my
> > > > > > > perspective:
> > > > > > >
> > > > > > > 1. It means we'll need to pass dev_id as a parameter to process() function.
> > > > > > > Though in fact dev_id is not a relevant information for us here
> > > > > > > (all we need is pointer to the session and pointer to the function to call)
> > > > > > > and I tried to avoid using it in data-path functions for that API.
> > > > > >
> > > > > > You have a single vdev, but someone may have multiple vdevs for each thread, or may
> > > > > > Have same dev with multiple queues for each core.
> > > > >
> > > > > That's fine. As I said above it is a SW backed implementation.
> > > > > Each session has to be a separate entity that contains all necessary information
> > > > > (keys, alg/mode info,  etc.)  to process input buffers.
> > > > > Plus we need the actual function pointer to call.
> > > > > I just don't see what for we need a dev_id in that situation.
> > > >
> > > > To iterate the session private data in the session.
> > > >
> > > > > Again, here we don't need care about queues and their pinning to cores.
> > > > > If let say someone would like to process buffers from the same IPsec SA on 2
> > > > > different cores in parallel, he can just create 2 sessions for the same xform,
> > > > > give one to thread #1  and second to thread #2.
> > > > > After that both threads are free to call process(this_thread_ses, ...) at will.
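The one-session-per-thread model described above can be sketched with POSIX threads. Assumptions: toy_session and toy_process() are hypothetical, XOR stands in for the cipher, and the byte counter exists only to show mutable per-session state that stays thread-local when each thread owns its session. Build with -pthread.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical SW session: everything process() needs, plus a mutable
 * counter that would need locking if two threads shared one session. */
struct toy_session {
	uint8_t key;
	uint64_t bytes_done;
};

static void
toy_process(struct toy_session *s, uint8_t *buf, size_t len)
{
	for (size_t i = 0; i != len; i++)
		buf[i] ^= s->key;
	s->bytes_done += len;
}

struct worker_arg {
	struct toy_session *ses; /* this thread's own session */
	uint8_t buf[64];
	int iterations;
};

static void *
worker(void *p)
{
	struct worker_arg *a = p;

	for (int i = 0; i != a->iterations; i++)
		toy_process(a->ses, a->buf, sizeof(a->buf));
	return NULL;
}

/* Two sessions built from the same parameters (same key), one per
 * thread, so the data path needs no synchronisation at all. */
static int
run_two_workers(uint8_t key, int iterations, struct toy_session ses[2])
{
	pthread_t tid[2];
	struct worker_arg args[2];

	assert(iterations >= 0);
	for (int i = 0; i != 2; i++) {
		ses[i].key = key;
		ses[i].bytes_done = 0;
		args[i].ses = &ses[i];
		args[i].iterations = iterations;
		for (size_t j = 0; j != sizeof(args[i].buf); j++)
			args[i].buf[j] = (uint8_t)j;
		if (pthread_create(&tid[i], NULL, worker, &args[i]) != 0) {
			/* join what was started before args goes out of scope */
			for (int k = 0; k != i; k++)
				pthread_join(tid[k], NULL);
			return -1;
		}
	}
	for (int i = 0; i != 2; i++)
		pthread_join(tid[i], NULL);
	return 0;
}
```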
> > > >
> > > > Say you have a 16core device to handle 100G of traffic on a single tunnel.
> > > > Will we make 16 sessions with same parameters?
> > >
> > > Absolutely same question we can ask for current crypto-op API.
> > > You have lookaside crypto-dev with 16 HW queues, each queue is serviced by
> > > different CPU.
> > > For the same SA, do you need a separate session per queue, or is it ok to reuse
> > > current one?
> > > AFAIK, right now this is a grey area not clearly defined.
> > > For crypto-devs I am aware - user can reuse the same session (as PMD uses it
> > > read-only).
> > > But again, right now I think it is not clearly defined and is implementation
> > > specific.
> >
> > User can use the same session, that is what I am also insisting, but it may have separate
> > Session private data. Cryptodev session create API provide that functionality and we can
> > Leverage that.
> 
> rte_cryptodev_sym_session.sess_data[] is indexed by driver_id, which means we can't use
> the same rte_cryptodev_sym_session to hold sessions for both sync and async mode
> for the same device. Of course we can add a hard requirement that any driver that wants to
> support process() has to create sessions that can handle both process and enqueue/dequeue,
> but then again, why create such overhead?
> 
> BTW, to be honest, I don't consider current rte_cryptodev_sym_session
> construct for multiple device_ids:
> __extension__ struct {
>                 void *data;
>                 uint16_t refcnt;
>         } sess_data[0];
>         /**< Driver specific session material, variable size */
> 
Yes I also feel the same. I was also not in favor of this when it was introduced.
Please go ahead and remove this. I have no issues with that.

> as an advantage.
> It looks too error prone for me:
> 1. Simultaneous session initialization/de-initialization for devices with the same
> driver_id is not possible.
> 2. It assumes that all device driver will be loaded before we start to create
> session pools.
> 
> Right now it seems ok, as no-one requires such functionality, but I don't know
> how it will be in future.
> For me rte_security session model, where for each security context user have to
> create new session
> looks much more robust.
Agreed
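The collision described above (one private-data slot per driver_id) can be reproduced with a simplified model of the quoted layout. sym_session_model and its helpers are hypothetical; the real rte_cryptodev_sym_session has more fields and its own init path, and treating refcnt as "number of devices of this driver sharing the slot" is an assumption made for the sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified model of the quoted layout: one private-data slot per
 * driver_id, stored in a flexible array member. */
struct sym_session_model {
	uint16_t nb_drivers;
	struct {
		void *data;
		uint16_t refcnt; /* assumed: devices of this driver sharing the slot */
	} sess_data[];
};

/* The array is sized from the drivers known at creation time. */
static struct sym_session_model *
session_model_create(uint16_t nb_drivers)
{
	struct sym_session_model *s = calloc(1, sizeof(*s) +
			nb_drivers * sizeof(s->sess_data[0]));

	if (s != NULL)
		s->nb_drivers = nb_drivers;
	return s;
}

static int
session_model_attach(struct sym_session_model *s, uint16_t driver_id,
		void *priv)
{
	assert(s != NULL && priv != NULL);
	if (driver_id >= s->nb_drivers)
		return -1; /* driver unknown when the session was sized */
	if (s->sess_data[driver_id].data != NULL &&
	    s->sess_data[driver_id].data != priv)
		return -2; /* one slot per driver: sync and async data collide */
	s->sess_data[driver_id].data = priv;
	s->sess_data[driver_id].refcnt++;
	return 0;
}
```

The -1 case mirrors point 2 above (drivers must be known before sessions are sized), and the -2 case mirrors the sync/async conflict for the same driver_id.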

> 
> >
> > BTW, I can see a v2 to this RFC which is still based on security library.
> 
> Yes, v2 was concentrated on fixing found issues, some code restructuring,
> i.e. - changes that would be needed anyway whatever API approach we'll choose.
> 
> > When do you plan
> > To submit the patches for crypto based APIs. We have RC1 merge deadline for this
> > patchset on 21st Oct.
> 
> We'd like to start working on it ASAP, but it seems we still have a major
> disagreement
> about how this crypto-dev API should look like.
> Which makes me think - should we return to our original proposal via
> rte_security?
> It still looks to me like clean and straightforward way to enable this new API,
> and probably wouldn't cause that much controversy.
> What do you think?

I cannot spend more time discussing this until the RC1 date; I have some other stuff pending.
You can send the patches early next week with the approach that I mentioned, or else we
can discuss this after RC1 (which would mean deferring to 20.02).

But moving back to security is not acceptable to me. The code should be put where it is
intended and not where it is easy to put. You are not doing any rte_security stuff.


Regards,
Akhil
  
Fan Zhang Oct. 13, 2019, 11:07 p.m. UTC | #25
Hi Akhil,

Thanks for the review and comments!
Knowing you are extremely busy, here are my points in brief.
I think placing the CPU synchronous crypto in rte_security makes sense, because:

1. rte_security already contains the inline crypto and lookaside crypto action types, so adding a cpu_crypto action type is reasonable.
2. rte_security contains security features that may not be supported by all devices, such as crypto, IPsec, and PDCP. cpu_crypto falls into this category; again, it is crypto.
3. Placing the CPU synchronous crypto API in rte_security is natural, as inline mode works synchronously, too. However, cryptodev doesn't.
4. Placing the CPU synchronous crypto API in rte_security helps boost SW crypto performance. I have already provided a simple perf test inside the unit test in the patchset for the user to try out: just compare its output against the DPDK crypto perf app output.
5. Placing the CPU synchronous crypto API in cryptodev will never serve HW lookaside crypto PMDs, as making them work synchronously has a huge performance penalty. However, the cryptodev framework's existing design provides APIs that work with all crypto PMDs (rte_cryptodev_enqueue_burst / dequeue_burst, for example), so an API that only SW PMDs can implement does not fit cryptodev's principle.
6. Placing the CPU synchronous crypto API in cryptodev confuses the user, as:
	- the session created for async mode may not work in sync mode
	- both enqueue/dequeue and cpu_crypto_process do the same crypto processing, but one PMD may support only one API (set), another may support the other, and a third PMD may support both. We would have to provide another API to let the user query which PMD supports which.
	- there are two completely different code paths for async/sync mode.
7. You said at the end of your email that placing the CPU synchronous crypto API into rte_security is not acceptable because it does not do any rte_security stuff. Isn't crypto rte_security stuff? You may call this a quibble, but in my view, both PMD implementations in the patchset do offload the work to the CPU's special circuitry dedicated to accelerating crypto processing.

To me, cryptodev is not where the CPU synchronous crypto API should go; rte_security is.

Regards,
Fan

> -----Original Message-----
> From: Akhil Goyal [mailto:akhil.goyal@nxp.com]
> Sent: Friday, October 11, 2019 2:24 PM
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; 'dev@dpdk.org'
> <dev@dpdk.org>; De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>;
> 'Thomas Monjalon' <thomas@monjalon.net>; Zhang, Roy Fan
> <roy.fan.zhang@intel.com>; Doherty, Declan <declan.doherty@intel.com>
> Cc: 'Anoob Joseph' <anoobj@marvell.com>
> Subject: RE: [RFC PATCH 1/9] security: introduce CPU Crypto action type and
> API
> 
> Hi Konstantin,
> 
> >
> > Hi Akhil,
> >
> ..[snip]
> 
> > > > > > > OK let us assume that you have a separate structure. But I
> > > > > > > have a few queries:
> > > > > > > 1. how can multiple drivers use a same session
> > > > > >
> > > > > > As a short answer: they can't.
> > > > > > It is pretty much the same approach as with rte_security -
> > > > > > each device needs to create/init its own session.
> > > > > > So upper layer would need to maintain its own array (or so) for such case.
> > > > > > Though the question is why would you like to have same session
> > > > > > over multiple SW backed devices?
> > > > > > As it would be anyway just a synchronous function call that
> > > > > > will be executed on the same cpu.
> > > > >
> > > > > I may have single FAT tunnel which may be distributed over
> > > > > multiple Cores, and each core is affined to a different SW device.
> > > >
> > > > If it is pure SW, then we don't need multiple devices for such scenario.
> > > > Device in that case is pure abstraction that we can skip.
> > >
> > > Yes agreed, but that liberty is given to the application whether it
> > > need multiple devices with single queue or a single device with multiple queues.
> > > I think that independence should not be broken in this new API.
> > > >
> > > > > So a single session may be accessed by multiple devices.
> > > > >
> > > > > One more example would be depending on packet sizes, I may
> > > > > switch between HW/SW PMDs with the same session.
> > > >
> > > > Sure, but then we'll have multiple sessions.
> > >
> > > No, the session will be same and it will have multiple private data
> > > for each of the PMD.
> > >
> > > > BTW, we have same thing now - these private session pointers are
> > > > just stored inside the same rte_crypto_sym_session.
> > > > And if user wants to support this model, he would also need to
> > > > store <dev_id, queue_id> pair for each HW device anyway.
> > >
> > > Yes agreed, but how is that thing happening in your new struct, you
> > > cannot support that.
> >
> > User can store all these info in his own struct.
> > That's exactly what we have right now.
> > Let say ipsec-secgw has to store for each IPsec SA:
> > pointer to crypto-session and/or pointer to security session plus (for
> > lookaside-devices) cdev_id_qp that allows it to extract dev_id +
> > queue_id information.
> > As I understand that works for now, as each ipsec_sa uses only one
> > dev+queue. Though if someone would like to use multiple devices/queues
> > for the same SA - he would need to have an array of these <dev+queue> pairs.
> > So even right now rte_cryptodev_sym_session is not self-consistent and
> > requires extra information to be maintained by user.
> 
> Why are you increasing the complexity for the user application.
> The new APIs and struct should be such that it need to do minimum changes
> in the stack so that stack is portable on multiple vendors.
> You should try to hide as much complexity in the driver or lib to give the user
> simple APIs.
> 
> Having a same session for multiple devices was added by Intel only for some
> use cases.
> And we had split that session create API into 2. Now if those are not useful
> shall we move back to the single API. I think @Doherty, Declan and @De Lara
> Guarch, Pablo can comment on this.
> 
> >
> > >
> > > >
> > > > >
> > > > > >
> > > > > > > 2. Can somebody use the scheduler pmd for scheduling the
> > > > > > > different type of payloads for the same session?
> > > > > >
> > > > > > In theory yes.
> > > > > > Though for that scheduler pmd should have inside it's
> > > > > > rte_crypto_cpu_sym_session an array of pointers to the
> > > > > > underlying devices sessions.
> > > > > >
> > > > > > >
> > > > > > > With your proposal the APIs would be very specific to your
> > > > > > > use case only.
> > > > > >
> > > > > > Yes in some way.
> > > > > > I consider that API specific for SW backed crypto PMDs.
> > > > > > I can hardly see how any 'real HW' PMDs (lksd-none,
> > > > > > lksd-proto) will benefit from it.
> > > > > > Current crypto-op API is very much HW oriented.
> > > > > > Which is ok, that's for it was intended for, but I think we
> > > > > > also need one that would be designed
> > > > > > for SW backed implementation in mind.
> > > > >
> > > > > We may re-use your API for HW PMDs as well which do not have
> > > > > requirement of Crypto-op/mbuf etc.
> > > > > The return type of your new process API may have a status which
> > > > > say 'processed'
> > > > > Or can be say 'enqueued'. So if it is  'enqueued', we may have a
> > > > > new API for raw
> > > > > Bufs dequeue as well.
> > > > >
> > > > > This requirement can be for any hardware PMDs like QAT as well.
> > > >
> > > > I don't think it is a good idea to extend this API for async (lookaside) devices.
> > > > You'll need to:
> > > >  - provide dev_id and queue_id for each process(enqueue) and
> > > > dequeuer operation.
> > > >  - provide IOVA for all buffers passing to that function (data
> > > > buffers, digest, IV, aad).
> > > >  - On dequeue provide some way to associate dequed data and digest
> > > > buffers with
> > > >    crypto-session that was used  (and probably with mbuf).
> > > >  So most likely we'll end up with another just version of our
> > > > current crypto-op structure.
> > > > If you'd like to get rid of mbufs dependency within current
> > > > crypto-op API that understandable, but I don't think we should
> > > > have same API for both sync (CPU) and async
> > > > (lookaside) cases.
> > > > It doesn't seem feasible at all and voids whole purpose of that patch.
> > >
> > > At this moment we are not much concerned about the dequeue API and
> > > about the HW PMD support. It is just that the new API should be generic enough
> > > to be used in some future scenarios as well. I am just highlighting the possible
> > > usecases which can be there in future.
> >
> > Sorry, but I strongly disagree with such approach.
> > We should stop adding/modifying API 'just in case' and because 'it
> > might be useful for some future HW'.
> > Inside DPDK we already do have too many dev level APIs without any
> > implementations.
> > That's quite bad practice and very dis-orienting for end-users.
> > I think to justify API additions/changes we need at least one proper
> > implementation for it, or at least some strong evidence that people
> > are really committed to support it in nearest future.
> > BTW, that what TB agreed on, nearly a year ago.
> >
> > This new API (if we'll go ahead with it of course) would stay
> > experimental for some time anyway to make sure we don't miss anything
> > needed (I think for about a year time- frame).
> > So if you guys *really* want to extend it support _async_ devices too
> > - I am open for modifications/additions here.
> > Though personally I think such addition would over-complicate things
> > and we'll end up with another reincarnation of current crypto-op.
> > We actually discussed it internally, and decided to drop that idea because of that.
> > Again, my opinion - for lookaside devices it might be better to try to
> > optimize current crypto-op path (remove mbuf requirement, probably add
> > ability to group by session on enqueue/dequeue, etc.).
> 
> I agree that the new API is experimental and can be modified later. So no
> issues in that, but we can keep some things in mind while defining APIs.
> These were some comments from my side, if those are impacting the current
> scenario, you can drop those. We will take care of those later.
> 
> >
> > >
> > > What is the issue that you face in making a dev-op for this new API?
> > > Do you see any performance impact with that?
> >
> > There are two main things:
> > 1. The user would need to maintain and provide dev_id+queue_id for each
> > process() call.
> > That means extra (and totally unnecessary for SW) overhead.
> 
> You are using a crypto device to perform the processing, so you must use
> dev_id to identify which SW device it is. This is how the DPDK framework
> works.
> 
> > 2. Yes, I would expect some perf overhead too - it would be an extra call
> > or branch.
> > Again, as it would be a data dependency, most likely the CPU wouldn't be
> > able to pipeline it efficiently:
> >
> > rte_crypto_sym_process(uint8_t dev_id, uint16_t qp_id,
> >         struct rte_crypto_sym_session *sess, ...) {
> >      struct rte_cryptodev *dev = &rte_cryptodevs[dev_id];
> >      return (*dev->process)(sess->data[dev->driver_id], ...);
> > }
> >
> > driver_specific_process(struct driver_specific_sym_session *sess) {
> >      return sess->process(sess, ...);
> > }
> >
> > I didn't make any exact measurements, but it would surely be slower than
> > just:
> > session_udata->process(session->udata->sess, ...);
> > Again, it would be much more noticeable on low-end CPUs.
> > For example, here:
> > http://mails.dpdk.org/archives/dev/2019-September/144350.html
> > Jerin claims a 1.5-3% drop for introducing an extra call via hiding eth_dev
> > contents - I suppose we would have something similar here.
> > I do realize that in the majority of cases crypto is more expensive than
> > RX/TX, but still.
> >
> > If it were a really unavoidable tradeoff (to support an already existing
> > API, or so), I wouldn't mind, but I don't see any real need for it right now.
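To make the two call paths being compared concrete, here is a minimal, self-contained C sketch. Every type and function in it (`toy_dev`, `toy_generic_sess`, `devop_process()`, `direct_process()` and so on) is a hypothetical stand-in, not a real cryptodev or rte_security definition, and queue_id is left out for brevity. The dev-op path looks up the device by dev_id, then the private session by driver_id, and makes two indirect calls; the direct path makes one indirect call through the pointer the PMD stored in the session.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical SW session: the PMD stores the process() it picked. */
struct toy_sw_sess;
typedef uint32_t (*toy_process_t)(struct toy_sw_sess *sess,
				  uint8_t *buf, uint32_t len);

struct toy_sw_sess {
	toy_process_t process;
	uint8_t key;            /* toy "key", for illustration only */
};

/* Toy per-session transform (not a real cipher): XOR each byte with the key. */
static uint32_t
toy_xor_process(struct toy_sw_sess *sess, uint8_t *buf, uint32_t len)
{
	for (uint32_t i = 0; i < len; i++)
		buf[i] ^= sess->key;
	return len;
}

/* Hypothetical generic session: private data indexed by driver_id. */
#define TOY_MAX_DRIVERS 4
struct toy_generic_sess {
	void *data[TOY_MAX_DRIVERS];
};

/* Hypothetical device exposing process() as a dev-op. */
struct toy_dev {
	uint8_t driver_id;
	uint32_t (*process)(void *priv, uint8_t *buf, uint32_t len);
};

#define TOY_MAX_DEVS 4
static struct toy_dev toy_devs[TOY_MAX_DEVS];

/* Driver-level dev-op: forwards to the function stored in its session. */
static uint32_t
toy_drv_process(void *priv, uint8_t *buf, uint32_t len)
{
	struct toy_sw_sess *sess = priv;

	return sess->process(sess, buf, len);
}

/* Path 1 (dev-op): dev_id -> device -> driver_id -> private session,
 * followed by two dependent indirect calls. */
static uint32_t
devop_process(uint8_t dev_id, struct toy_generic_sess *sess,
	      uint8_t *buf, uint32_t len)
{
	struct toy_dev *dev = &toy_devs[dev_id];

	return dev->process(sess->data[dev->driver_id], buf, len);
}

/* Path 2 (direct): a single indirect call through the session itself. */
static uint32_t
direct_process(struct toy_sw_sess *sess, uint8_t *buf, uint32_t len)
{
	return sess->process(sess, buf, len);
}
```

Both paths produce the same output; what differs is the extra loads and the extra, data-dependent indirect call on path 1. Whether that cost is measurable for real crypto workloads is what the profiling request in this thread is about.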
> 
> Calling session_udata->process(session->udata->sess, ...); from the
> application, with the application keeping the process() API of each PMD
> in its memory, will make the application non-portable to other vendors.
> 
> What we are doing here is defining another way to create sessions for the
> same stuff that is already done. This makes applications non-portable and
> confusing for the application writer.
> 
> I would say you should do some profiling first. As you also mentioned, the
> crypto workload is more cycle-consuming, so it will not impact this case.
> 
> 
> >
> > >
> > > >
> > > > > That is why a dev-op would be a better option.
> > > > >
> > > > > >
> > > > > > > When you add more functionality to this sync API/struct, it will
> > > > > > > end up being the same API/struct.
> > > > > > >
> > > > > > > Let us see how close/far we are from the existing APIs when the
> > > > > > > actual implementation is done.
> > > > > > >
> > > > > > > > > I am not sure if that would be needed.
> > > > > > > > > It would be internal to the driver: if synchronous processing is
> > > > > > > > > supported (from the feature flag) and the relevant fields in the
> > > > > > > > > xform (the newly added ones, which are packed as per your
> > > > > > > > > suggestions) are set, it will create that type of session.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > + * Main points:
> > > > > > > > > > + * - Current crypto-dev API is reasonably mature and it is desirable
> > > > > > > > > > + *   to keep it unchanged (API/ABI stability). From other side, this
> > > > > > > > > > + *   new sync API is new one and probably would require extra changes.
> > > > > > > > > > + *   Having it as a new one allows to mark it as experimental, without
> > > > > > > > > > + *   affecting existing one.
> > > > > > > > > > + * - Fully opaque cpu_sym_session structure gives more flexibility
> > > > > > > > > > + *   to the PMD writers and again allows to avoid ABI breakages in future.
> > > > > > > > > > + * - process() function per set of xforms
> > > > > > > > > > + *   allows to expose different process() functions for different
> > > > > > > > > > + *   xform combinations. PMD writer can decide, does he wants to
> > > > > > > > > > + *   push all supported algorithms into one process() function,
> > > > > > > > > > + *   or spread it across several ones.
> > > > > > > > > > + *   I.E. More flexibility for PMD writer.
> > > > > > > > >
> > > > > > > > > Which process function should be chosen is internal to the PMD; how
> > > > > > > > > would that info be visible to the application or the library? These
> > > > > > > > > will get stored in the session private data. It would be up to the
> > > > > > > > > PMD writer to store the per-session process function in the session
> > > > > > > > > private data.
> > > > > > > > >
> > > > > > > > > Process function would be a dev op just like enq/deq operations, and
> > > > > > > > > it should call the respective process API stored in the session
> > > > > > > > > private data.
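A minimal C sketch of the model both sides describe here: at session creation the PMD inspects the xform, picks a process() function once, and stores the pointer in the session private data, so the data path never has to choose. The algorithm enum, session layout and function names are hypothetical, and the two "algorithms" are toy byte transforms, not real ciphers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical xform: which (toy) algorithm to use, and its key. */
enum toy_algo { TOY_ALGO_XOR, TOY_ALGO_ADD };

struct toy_xform {
	enum toy_algo algo;
	uint8_t key;
};

struct toy_sess;
typedef void (*toy_fn_t)(const struct toy_sess *s, uint8_t *buf, size_t len);

/* Hypothetical session private data: key plus the selected function. */
struct toy_sess {
	uint8_t key;
	toy_fn_t process;
};

static void
toy_xor(const struct toy_sess *s, uint8_t *buf, size_t len)
{
	for (size_t i = 0; i < len; i++)
		buf[i] ^= s->key;
}

static void
toy_add(const struct toy_sess *s, uint8_t *buf, size_t len)
{
	for (size_t i = 0; i < len; i++)
		buf[i] = (uint8_t)(buf[i] + s->key);
}

/* Session init: the PMD selects process() once, based on the xform. */
static int
toy_sess_init(struct toy_sess *s, const struct toy_xform *xf)
{
	switch (xf->algo) {
	case TOY_ALGO_XOR:
		s->process = toy_xor;
		break;
	case TOY_ALGO_ADD:
		s->process = toy_add;
		break;
	default:
		return -1;   /* unsupported xform combination */
	}
	s->key = xf->key;
	return 0;
}
```

Whether the caller then reaches `s->process` directly or through a dev-op wrapper is exactly the point still under debate in this thread; the sketch only shows the per-session selection both designs share.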
> > > > > > > >
> > > > > > > > That model (via devops) is possible, but has several drawbacks from
> > > > > > > > my perspective:
> > > > > > > >
> > > > > > > > 1. It means we'll need to pass dev_id as a parameter to the
> > > > > > > > process() function.
> > > > > > > > Though in fact dev_id is not relevant information for us here (all
> > > > > > > > we need is a pointer to the session and a pointer to the function
> > > > > > > > to call), and I tried to avoid using it in data-path functions for
> > > > > > > > that API.
> > > > > > >
> > > > > > > You have a single vdev, but someone may have multiple vdevs, one for
> > > > > > > each thread, or may have the same dev with multiple queues, one for
> > > > > > > each core.
> > > > > >
> > > > > > That's fine. As I said above, it is a SW backed implementation.
> > > > > > Each session has to be a separate entity that contains all necessary
> > > > > > information (keys, alg/mode info, etc.) to process input buffers.
> > > > > > Plus we need the actual function pointer to call.
> > > > > > I just don't see why we need a dev_id in that situation.
> > > > >
> > > > > To iterate the session private data in the session.
> > > > >
> > > > > > Again, here we don't need to care about queues and their pinning to
> > > > > > cores. If, let's say, someone would like to process buffers from the
> > > > > > same IPsec SA on 2 different cores in parallel, he can just create 2
> > > > > > sessions for the same xform, give one to thread #1 and the second to
> > > > > > thread #2. After that, both threads are free to call
> > > > > > process(this_thread_ses, ...) at will.
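The one-session-per-thread model described above can be sketched in C as follows. Two sessions are initialized from the same xform, and each would be owned by one thread, so per-session mutable state (here a hypothetical packet counter) never needs locking. All types and names are hypothetical, the transform is a toy XOR rather than a real cipher, and the threads themselves are omitted to keep the example self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_KEY_LEN 16

/* Hypothetical xform shared by both sessions (the same IPsec SA). */
struct toy_xform {
	uint8_t key[TOY_KEY_LEN];
};

/* Hypothetical CPU-crypto session owned by exactly one thread. */
struct toy_cpu_sess {
	uint8_t key[TOY_KEY_LEN];
	uint64_t nb_processed;  /* mutable; only the owner thread writes it */
};

static void
toy_sess_init(struct toy_cpu_sess *s, const struct toy_xform *xf)
{
	memcpy(s->key, xf->key, sizeof(s->key));
	s->nb_processed = 0;
}

/* Toy process(): XOR with the repeating key, then bump the counter. */
static void
toy_process(struct toy_cpu_sess *s, uint8_t *buf, size_t len)
{
	for (size_t i = 0; i < len; i++)
		buf[i] ^= s->key[i % TOY_KEY_LEN];
	s->nb_processed++;
}
```

The trade-off raised in the reply below is memory and setup cost: N cores on one SA means N copies of the same key material.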
> > > > >
> > > > > Say you have a 16-core device handling 100G of traffic on a single
> > > > > tunnel. Will we make 16 sessions with the same parameters?
> > > >
> > > > We can ask absolutely the same question about the current crypto-op API.
> > > > You have a lookaside crypto-dev with 16 HW queues, each queue
> > > > serviced by a different CPU.
> > > > For the same SA, do you need a separate session per queue, or is it ok
> > > > to reuse the current one?
> > > > AFAIK, right now this is a grey area, not clearly defined.
> > > > For the crypto-devs I am aware of, the user can reuse the same session
> > > > (as the PMD uses it read-only).
> > > > But again, right now I think it is not clearly defined and is
> > > > implementation specific.
> > >
> > > The user can use the same session, that is what I am also insisting on,
> > > but it may have separate session private data. The cryptodev session
> > > create API provides that functionality and we can leverage that.
> >
> > rte_cryptodev_sym_session.sess_data[] is indexed by driver_id, which
> > means we can't use the same rte_cryptodev_sym_session to hold sessions
> > for both sync and async mode for the same device. Of course we can
> > add a hard requirement that any driver that wants to support process()
> > has to create sessions that can handle both process and
> > enqueue/dequeue, but then again, why create such overhead?
> >
> > BTW, to be honest, I don't consider current rte_cryptodev_sym_session
> > construct for multiple device_ids:
> > __extension__ struct {
> >                 void *data;
> >                 uint16_t refcnt;
> >         } sess_data[0];
> >         /**< Driver specific session material, variable size */
> >
> Yes I also feel the same. I was also not in favor of this when it was introduced.
> Please go ahead and remove this. I have no issues with that.
> 
> > as an advantage.
> > It looks too error-prone to me:
> > 1. Simultaneous session initialization/de-initialization for devices
> > with the same driver_id is not possible.
> > 2. It assumes that all device drivers will be loaded before we start to
> > create session pools.
> >
> > Right now it seems ok, as no one requires such functionality, but I
> > don't know how it will be in the future.
> > To me, the rte_security session model, where the user has to create a new
> > session for each security context, looks much more robust.
> Agreed
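A minimal sketch of why a driver_id-indexed layout cannot hold two different private sessions for one driver. There is one slot per driver_id, so devices of the same driver share that slot, and a second, different private session for the same driver (for example sync vs. async) has nowhere to go. The struct is a hypothetical simplification that uses a fixed-size array instead of the zero-length `sess_data[0]` extension, and `toy_sess_attach()` is invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NB_DRIVERS 4

/* Hypothetical simplification of a multi-driver session. */
struct toy_multi_sess {
	struct {
		void *data;       /* driver-specific private session */
		uint16_t refcnt;  /* devices of this driver using it */
	} sess_data[TOY_NB_DRIVERS];
};

/* Attach private data for one device of driver driver_id.
 * Returns 0 on success, -1 if the slot already holds other data. */
static int
toy_sess_attach(struct toy_multi_sess *s, uint8_t driver_id, void *priv)
{
	if (driver_id >= TOY_NB_DRIVERS)
		return -1;
	if (s->sess_data[driver_id].data != NULL &&
	    s->sess_data[driver_id].data != priv)
		return -1;
	s->sess_data[driver_id].data = priv;
	s->sess_data[driver_id].refcnt++;
	return 0;
}
```

With a per-context session, as in the rte_security model, each context owns its own private data and the collision cannot occur.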
> 
> >
> > >
> > > BTW, I can see a v2 of this RFC which is still based on the security library.
> >
> > Yes, v2 concentrated on fixing the issues found and some code
> > restructuring, i.e. changes that would be needed anyway, whatever API
> > approach we choose.
> >
> > > When do you plan to submit the patches for crypto-based APIs? We have
> > > an RC1 merge deadline for this patchset on 21st Oct.
> >
> > We'd like to start working on it ASAP, but it seems we still have a
> > major disagreement about what this crypto-dev API should look like.
> > Which makes me think - should we return to our original proposal via
> > rte_security?
> > It still looks to me like a clean and straightforward way to enable this
> > new API, and probably wouldn't cause that much controversy.
> > What do you think?
> 
> I cannot spend more time discussing this until the RC1 date. I have some
> other stuff pending.
> You can send the patches early next week with the approach that I
> mentioned, or else we can discuss this post RC1 (which would mean
> deferring to 20.02).
> 
> But moving back to security is not acceptable to me. The code should be put
> where it is intended and not where it is easy to put. You are not doing any
> rte_security stuff.
> 
> 
> Regards,
> Akhil
  
Ananyev, Konstantin Oct. 14, 2019, 11:10 a.m. UTC | #26
> Hi Akhil,
> 
> Thanks for the review and comments!
> Knowing you are extremely busy, here is my point in brief:
> I think placing the CPU synchronous crypto in rte_security makes sense, as:
> 
> 1. rte_security already contains the inline crypto and lookaside crypto action types, so adding a cpu_crypto action type is reasonable.
> 2. rte_security contains security features that may not be supported by all devices, such as crypto, IPsec, and PDCP. cpu_crypto follows this category, again crypto.
> 3. placing CPU synchronous crypto API in rte_security is natural - as inline mode works synchronously, too. However cryptodev doesn't.
> 4. placing CPU synchronous crypto API in rte_security helps boost SW crypto performance. I have already provided a simple perf test
> inside the unit test in the patchset for the user to try out - just compare its output against the DPDK crypto perf app output.
> 5. placing CPU synchronous crypto API in cryptodev will never serve HW lookaside crypto PMDs, as making them work synchronously
> has a huge performance penalty. However, the cryptodev framework's existing design provides APIs that work with all crypto PMDs
> (rte_cryptodev_enqueue_burst / dequeue_burst, for example), so a sync-only API does not fit cryptodev's principle.
> 6. placing CPU synchronous crypto API in cryptodev confuses the user, as:
> 	- the session created for async mode may not work in sync mode
> 	- both enqueue/dequeue and cpu_crypto_process do the same crypto processing, but one PMD may support only one API (set),
> another may support the other, and a third PMD may support both. We would have to provide another API to let the user query which PMD
> supports which.
> 	- two completely different code paths for async/sync mode.
> 7. You said at the end of the email that placing the CPU synchronous crypto API into rte_security is not acceptable as it does not do any
> rte_security stuff - but isn't crypto rte_security stuff? You may call this a quibble, but in my view, in the patchset both PMDs' implementations offload the work
> to the CPU's special circuits dedicated to accelerating crypto processing.
> 
> To me, cryptodev is the place the CPU synchronous crypto API should not go into; rte_security is.

I also don't understand why rte_security is not an option here.
We do have inline-crypto right now, why we can't have cpu-crypto with new process() API here?
Actually, I would like to hear more opinions from the community here -
what do other interested parties think is the best way to introduce a cpu-crypto specific API?

Konstantin

> 
> Regards,
> Fan
> 
> > -----Original Message-----
> > From: Akhil Goyal [mailto:akhil.goyal@nxp.com]
> > Sent: Friday, October 11, 2019 2:24 PM
> > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; 'dev@dpdk.org'
> > <dev@dpdk.org>; De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>;
> > 'Thomas Monjalon' <thomas@monjalon.net>; Zhang, Roy Fan
> > <roy.fan.zhang@intel.com>; Doherty, Declan <declan.doherty@intel.com>
> > Cc: 'Anoob Joseph' <anoobj@marvell.com>
> > Subject: RE: [RFC PATCH 1/9] security: introduce CPU Crypto action type and
> > API
> >
> > Hi Konstantin,
> >
> > >
> > > Hi Akhil,
> > >
> > ..[snip]
> >
> > > > > > > > OK, let us assume that you have a separate structure. But I have a
> > > > > > > > few queries:
> > > > > > > > 1. How can multiple drivers use the same session?
> > > > > > >
> > > > > > > As a short answer: they can't.
> > > > > > > It is pretty much the same approach as with rte_security - each
> > > > > > > device needs to create/init its own session.
> > > > > > > So the upper layer would need to maintain its own array (or so) for
> > > > > > > such a case.
> > > > > > > Though the question is why would you like to have the same session
> > > > > > > over multiple SW backed devices?
> > > > > > > As it would anyway be just a synchronous function call that will be
> > > > > > > executed on the same CPU.
> > > > > >
> > > > > > I may have a single FAT tunnel which may be distributed over
> > > > > > multiple cores, and each core is affined to a different SW device.
> > > > >
> > > > > If it is pure SW, then we don't need multiple devices for such a scenario.
> > > > > The device in that case is a pure abstraction that we can skip.
> > > >
> > > > Yes, agreed, but the application is given the liberty to choose whether
> > > > it needs multiple devices with a single queue or a single device with
> > > > multiple queues.
> > > > I think that independence should not be broken in this new API.
> > > > >
> > > > > > So a single session may be accessed by multiple devices.
> > > > > >
> > > > > > One more example: depending on packet sizes, I may switch between
> > > > > > HW/SW PMDs with the same session.
> > > > >
> > > > > Sure, but then we'll have multiple sessions.
> > > >
> > > > No, the session will be the same and it will have separate private data
> > > > for each of the PMDs.
> > > >
> > > > > BTW, we have the same thing now - these private session pointers are
> > > > > just stored inside the same rte_crypto_sym_session.
> > > > > And if the user wants to support this model, he would also need to
> > > > > store a <dev_id, queue_id> pair for each HW device anyway.
> > > >
> > > > Yes, agreed, but how would that happen with your new struct? You
> > > > cannot support that.
> > >
> > > The user can store all this info in his own struct.
> > > That's exactly what we have right now.
> > > Let's say ipsec-secgw has to store for each IPsec SA:
> > > a pointer to the crypto-session and/or a pointer to the security session,
> > > plus (for lookaside devices) cdev_id_qp, which allows it to extract
> > > dev_id + queue_id information.
> > > As I understand it, that works for now, as each ipsec_sa uses only one
> > > dev+queue. Though if someone would like to use multiple devices/queues
> > > for the same SA, he would need to have an array of these <dev+queue>
> > > pairs.
> > > So even right now rte_cryptodev_sym_session is not self-consistent and
> > > requires extra information to be maintained by the user.
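The per-SA bookkeeping described above might look roughly like this in an application. The struct and helper names are hypothetical (they are not ipsec-secgw's actual definitions). The point is that spreading one SA over several devices/queues means the application keeps its own array of <dev_id, queue_id> pairs next to the session pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical <dev_id, queue_id> pair. */
struct toy_dev_qp {
	uint8_t dev_id;
	uint16_t qp_id;
};

#define TOY_MAX_QPS_PER_SA 8

/* Hypothetical per-SA state kept by the application. */
struct toy_app_sa {
	void *crypto_sess;   /* cryptodev session, if any */
	void *sec_sess;      /* security session, if any */
	uint16_t nb_qps;
	struct toy_dev_qp qps[TOY_MAX_QPS_PER_SA];
};

static int
toy_sa_add_qp(struct toy_app_sa *sa, uint8_t dev_id, uint16_t qp_id)
{
	if (sa->nb_qps >= TOY_MAX_QPS_PER_SA)
		return -1;
	sa->qps[sa->nb_qps].dev_id = dev_id;
	sa->qps[sa->nb_qps].qp_id = qp_id;
	sa->nb_qps++;
	return 0;
}

/* Pick the pair a given worker core should use for this SA. */
static const struct toy_dev_qp *
toy_sa_pick_qp(const struct toy_app_sa *sa, unsigned int lcore)
{
	if (sa->nb_qps == 0)
		return NULL;
	return &sa->qps[lcore % sa->nb_qps];
}
```

The modulo mapping from core to queue is just one illustrative policy; an application could equally pin each core to a fixed entry.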
> >
> > Why are you increasing the complexity for the user application?
> > The new APIs and struct should be such that the stack needs minimal
> > changes, so that it is portable across multiple vendors.
> > You should try to hide as much complexity as possible in the driver or
> > lib, to give the user simple APIs.
> >
> > Having the same session for multiple devices was added by Intel only for
> > some use cases.
> > And we had split that session create API into 2. Now, if those are not
> > useful, shall we move back to the single API? I think @Doherty, Declan and
> > @De Lara Guarch, Pablo can comment on this.
> >
> > >
> > > >
> > > > >
> > > > > >
> > > > > > >
> > > > > > > > 2. Can somebody use the scheduler PMD for scheduling different types
> > > > > > > > of payloads for the same session?
> > > > > > >
> > > > > > > In theory, yes.
> > > > > > > Though for that, the scheduler PMD should have, inside its
> > > > > > > rte_crypto_cpu_sym_session, an array of pointers to the
> > > > > > > underlying devices' sessions.
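A rough sketch of the scheduler case, assuming a hypothetical scheduler session that keeps an array of pointers to the underlying devices' sessions and picks one per call by payload size. The names, the size-threshold policy and the child process() signature are illustrative assumptions, not the scheduler PMD's real design.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical session of an underlying (child) device. */
struct toy_child_sess {
	int dev_tag;   /* identifies the child device owning this session */
	int (*process)(struct toy_child_sess *s, uint8_t *buf, uint32_t len);
};

#define TOY_SCHED_NB_CHILDREN 2

/* Hypothetical scheduler session: pointers to the children's sessions. */
struct toy_sched_sess {
	uint32_t large_threshold;  /* payloads >= this go to child[1] */
	struct toy_child_sess *child[TOY_SCHED_NB_CHILDREN];
};

/* Toy child process(): only reports which child handled the buffer. */
static int
toy_child_process(struct toy_child_sess *s, uint8_t *buf, uint32_t len)
{
	(void)buf;
	(void)len;
	return s->dev_tag;
}

/* Dispatch to the child session chosen by payload size. */
static int
toy_sched_process(struct toy_sched_sess *s, uint8_t *buf, uint32_t len)
{
	struct toy_child_sess *c = s->child[len >= s->large_threshold ? 1 : 0];

	return c->process(c, buf, len);
}
```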
> > > > > > >
> > > > > > > >
> > > > > > > > With your proposal, the APIs would be very specific to your use
> > > > > > > > case only.
> > > > > > >
> > > > > > > Yes, in some way.
> > > > > > > I consider that API specific to SW backed crypto PMDs.
> > > > > > > I can hardly see how any 'real HW' PMDs (lksd-none, lksd-proto)
> > > > > > > will benefit from it.
> > > > > > > The current crypto-op API is very much HW oriented.
> > > > > > > Which is ok, that's what it was intended for, but I think we also
> > > > > > > need one that would be designed with SW backed implementations in
> > > > > > > mind.
> > > > > >
> > > > > > We may reuse your API for HW PMDs as well, which do not have a
> > > > > > requirement of crypto-op/mbuf etc.
> > > > > > The return type of your new process API may have a status which
> > > > > > says 'processed' or 'enqueued'. So if it is 'enqueued', we may have
> > > > > > a new API for raw bufs dequeue as well.
> > > > > >
> > > > > > This requirement can be for any hardware PMDs like QAT as well.
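The suggestion of a status-returning process() can be sketched as follows. Everything here is hypothetical: the status enum, the toy PMD that either completes the buffer immediately (SW) or only queues it (HW-like), and the raw-buffer dequeue helper. Even this toy async path needs per-buffer bookkeeping to hand results back, which is the crypto-op-like state objected to in the reply that follows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-call status for a process() usable by sync and async PMDs. */
enum toy_proc_status {
	TOY_PROC_PROCESSED,  /* done in the calling thread (SW PMD) */
	TOY_PROC_ENQUEUED,   /* handed to the device; dequeue later (HW PMD) */
	TOY_PROC_ERROR,
};

#define TOY_RING_SIZE 8

/* Hypothetical PMD state; the ring only matters for the async case. */
struct toy_pmd {
	int is_async;
	uint8_t *ring[TOY_RING_SIZE];   /* buffers awaiting completion */
	uint32_t head, tail;
};

static enum toy_proc_status
toy_pmd_process(struct toy_pmd *p, uint8_t *buf, uint32_t len)
{
	if (buf == NULL || len == 0)
		return TOY_PROC_ERROR;
	if (!p->is_async) {
		buf[0] ^= 0xff;            /* toy "crypto" done synchronously */
		return TOY_PROC_PROCESSED;
	}
	if (p->tail - p->head == TOY_RING_SIZE)
		return TOY_PROC_ERROR;     /* ring full */
	p->ring[p->tail++ % TOY_RING_SIZE] = buf;
	return TOY_PROC_ENQUEUED;
}

/* Hypothetical raw-buffer dequeue: completes up to max queued buffers. */
static uint32_t
toy_pmd_dequeue(struct toy_pmd *p, uint8_t **out, uint32_t max)
{
	uint32_t n = 0;

	while (n < max && p->head != p->tail) {
		uint8_t *buf = p->ring[p->head++ % TOY_RING_SIZE];

		buf[0] ^= 0xff;            /* the "device" finishes the work */
		out[n++] = buf;
	}
	return n;
}
```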
> > > > >
> > > > > I don't think it is a good idea to extend this API for async
> > > > > (lookaside) devices. You'll need to:
> > > > >  - provide dev_id and queue_id for each process (enqueue) and
> > > > >    dequeue operation.
> > > > >  - provide IOVA for all buffers passed to that function (data
> > > > >    buffers, digest, IV, AAD).
> > > > >  - on dequeue, provide some way to associate dequeued data and digest
> > > > >    buffers with the crypto-session that was used (and probably with
> > > > >    the mbuf).
> > > > > So most likely we'll end up with just another version of our
> > > > > current crypto-op structure.
> > > > > If you'd like to get rid of the mbuf dependency within the current
> > > > > crypto-op API, that's understandable, but I don't think we should
> > > > > have the same API for both sync (CPU) and async (lookaside) cases.
> > > > > It doesn't seem feasible at all and voids the whole purpose of that patch.
> > > >
> > > > At this moment we are not much concerned about the dequeue API and
> > > > about the HW PMD support. It is just that the new API should be
> > > > generic enough to be used in some future scenarios as well. I am just
> > > > highlighting the possible use cases which can be there in future.
> > >
> > > Sorry, but I strongly disagree with such approach.
> > > We should stop adding/modifying API 'just in case' and because 'it
> > > might be useful for some future HW'.
> > > Inside DPDK we already do have too many dev level APIs without any
> > > implementations.
> > > That's quite bad practice and very disorienting for end-users.
> > > I think to justify API additions/changes we need at least one proper
> > > implementation for it, or at least some strong evidence that people
> > > are really committed to supporting it in the near future.
> > > BTW, that is what the TB agreed on, nearly a year ago.
> > >
> > > This new API (if we'll go ahead with it of course) would stay
> > > experimental for some time anyway to make sure we don't miss anything
> > > needed (I think for about a year time frame).
> > > So if you guys *really* want to extend it to support _async_ devices too,
> > > I am open to modifications/additions here.
> > > Though personally I think such an addition would over-complicate things
> > > and we'll end up with another reincarnation of the current crypto-op.
> > > We actually discussed it internally, and decided to drop that idea because
> > > of that.
> > > Again, my opinion - for lookaside devices it might be better to try to
> > > optimize current crypto-op path (remove mbuf requirement, probably add
> > > ability to group by session on enqueue/dequeue, etc.).
> >
> > I agree that the new API is experimental and can be modified later. So no
> > issues in that, but we can keep some things in mind while defining APIs.
> > These were some comments from my side, if those are impacting the current
> > scenario, you can drop those. We will take care of those later.
> >
> > >
> > > >
> > > > What is the issue that you face in making a dev-op for this new API?
> > > > Do you see any performance impact with that?
> > >
> > > There are two main things:
> > > 1. The user would need to maintain and provide dev_id+queue_id for each
> > > process() call.
> > > That means extra (and totally unnecessary for SW) overhead.
> >
> > You are using a crypto device to perform the processing, so you must use
> > dev_id to identify which SW device it is. This is how the DPDK framework
> > works.
> >
> > > 2. Yes, I would expect some perf overhead too - it would be an extra call
> > > or branch.
> > > Again, as it would be a data dependency, most likely the CPU wouldn't be
> > > able to pipeline it efficiently:
> > >
> > > rte_crypto_sym_process(uint8_t dev_id, uint16_t qp_id,
> > >         struct rte_crypto_sym_session *sess, ...) {
> > >      struct rte_cryptodev *dev = &rte_cryptodevs[dev_id];
> > >      return (*dev->process)(sess->data[dev->driver_id], ...);
> > > }
> > >
> > > driver_specific_process(struct driver_specific_sym_session *sess) {
> > >      return sess->process(sess, ...);
> > > }
> > >
> > > I didn't make any exact measurements, but it would surely be slower than
> > > just:
> > > session_udata->process(session->udata->sess, ...);
> > > Again, it would be much more noticeable on low-end CPUs.
> > > For example, here:
> > > http://mails.dpdk.org/archives/dev/2019-September/144350.html
> > > Jerin claims a 1.5-3% drop for introducing an extra call via hiding eth_dev
> > > contents - I suppose we would have something similar here.
> > > I do realize that in the majority of cases crypto is more expensive than
> > > RX/TX, but still.
> > >
> > > If it were a really unavoidable tradeoff (to support an already existing
> > > API, or so), I wouldn't mind, but I don't see any real need for it right now.
> >
> > Calling session_udata->process(session->udata->sess, ...); from the
> > application, with the application keeping the process() API of each PMD
> > in its memory, will make the application non-portable to other vendors.
> >
> > What we are doing here is defining another way to create sessions for the
> > same stuff that is already done. This makes applications non-portable and
> > confusing for the application writer.
> >
> > I would say you should do some profiling first. As you also mentioned, the
> > crypto workload is more cycle-consuming, so it will not impact this case.
> >
> >
> > >
> > > >
> > > > >
> > > > > > That is why a dev-op would be a better option.
> > > > > >
> > > > > > >
> > > > > > > > When you add more functionality to this sync API/struct, it will
> > > > > > > > end up being the same API/struct.
> > > > > > > >
> > > > > > > > Let us see how close/far we are from the existing APIs when the
> > > > > > > > actual implementation is done.
> > > > > > > >
> > > > > > > > > > I am not sure if that would be needed.
> > > > > > > > > > It would be internal to the driver: if synchronous processing is
> > > > > > > > > > supported (from the feature flag) and the relevant fields in the
> > > > > > > > > > xform (the newly added ones, which are packed as per your
> > > > > > > > > > suggestions) are set, it will create that type of session.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > + * Main points:
> > > > > > > > > > > + * - Current crypto-dev API is reasonably mature and it is desirable
> > > > > > > > > > > + *   to keep it unchanged (API/ABI stability). From other side, this
> > > > > > > > > > > + *   new sync API is new one and probably would require extra changes.
> > > > > > > > > > > + *   Having it as a new one allows to mark it as experimental, without
> > > > > > > > > > > + *   affecting existing one.
> > > > > > > > > > > + * - Fully opaque cpu_sym_session structure gives more flexibility
> > > > > > > > > > > + *   to the PMD writers and again allows to avoid ABI breakages in future.
> > > > > > > > > > > + * - process() function per set of xforms
> > > > > > > > > > > + *   allows to expose different process() functions for different
> > > > > > > > > > > + *   xform combinations. PMD writer can decide, does he wants to
> > > > > > > > > > > + *   push all supported algorithms into one process() function,
> > > > > > > > > > > + *   or spread it across several ones.
> > > > > > > > > > > + *   I.E. More flexibility for PMD writer.
> > > > > > > > > >
> > > > > > > > > > Which process function should be chosen is internal to the PMD; how
> > > > > > > > > > would that info be visible to the application or the library? These
> > > > > > > > > > will get stored in the session private data. It would be up to the
> > > > > > > > > > PMD writer to store the per-session process function in the session
> > > > > > > > > > private data.
> > > > > > > > > >
> > > > > > > > > > Process function would be a dev op just like enq/deq operations, and
> > > > > > > > > > it should call the respective process API stored in the session
> > > > > > > > > > private data.
> > > > > > > > >
> > > > > > > > > That model (via devops) is possible, but has several drawbacks from
> > > > > > > > > my perspective:
> > > > > > > > >
> > > > > > > > > 1. It means we'll need to pass dev_id as a parameter to the
> > > > > > > > > process() function.
> > > > > > > > > Though in fact dev_id is not relevant information for us here (all
> > > > > > > > > we need is a pointer to the session and a pointer to the function
> > > > > > > > > to call), and I tried to avoid using it in data-path functions for
> > > > > > > > > that API.
> > > > > > > >
> > > > > > > > You have a single vdev, but someone may have multiple vdevs, one for
> > > > > > > > each thread, or may have the same dev with multiple queues, one for
> > > > > > > > each core.
> > > > > > >
> > > > > > > That's fine. As I said above, it is a SW backed implementation.
> > > > > > > Each session has to be a separate entity that contains all necessary
> > > > > > > information (keys, alg/mode info, etc.) to process input buffers.
> > > > > > > Plus we need the actual function pointer to call.
> > > > > > > I just don't see why we need a dev_id in that situation.
> > > > > >
> > > > > > To iterate the session private data in the session.
> > > > > >
> > > > > > > Again, here we don't need to care about queues and their pinning to
> > > > > > > cores. If, let's say, someone would like to process buffers from the
> > > > > > > same IPsec SA on 2 different cores in parallel, he can just create 2
> > > > > > > sessions for the same xform, give one to thread #1 and the second to
> > > > > > > thread #2. After that, both threads are free to call
> > > > > > > process(this_thread_ses, ...) at will.
> > > > > >
> > > > > > Say you have a 16-core device handling 100G of traffic on a single
> > > > > > tunnel. Will we make 16 sessions with the same parameters?
> > > > >
> > > > > We can ask absolutely the same question about the current crypto-op API.
> > > > > You have a lookaside crypto-dev with 16 HW queues, each queue
> > > > > serviced by a different CPU.
> > > > > For the same SA, do you need a separate session per queue, or is it ok
> > > > > to reuse the current one?
> > > > > AFAIK, right now this is a grey area, not clearly defined.
> > > > > For the crypto-devs I am aware of, the user can reuse the same session
> > > > > (as the PMD uses it read-only).
> > > > > But again, right now I think it is not clearly defined and is
> > > > > implementation specific.
> > > >
> > > > The user can use the same session, that is what I am also insisting on,
> > > > but it may have separate session private data. The cryptodev session
> > > > create API provides that functionality and we can leverage that.
> > >
> > > rte_cryptodev_sym_session.sess_data[] is indexed by driver_id, which
> > > means we can't use the same rte_cryptodev_sym_session to hold sessions
> > > for both sync and async mode for the same device. Of course we can
> > > add a hard requirement that any driver that wants to support process()
> > > has to create sessions that can handle both process and
> > > enqueue/dequeue, but then again, why create such overhead?
> > >
> > > BTW, to be honest, I don't consider current rte_cryptodev_sym_session
> > > construct for multiple device_ids:
> > > __extension__ struct {
> > >                 void *data;
> > >                 uint16_t refcnt;
> > >         } sess_data[0];
> > >         /**< Driver specific session material, variable size */
> > >
> > Yes I also feel the same. I was also not in favor of this when it was introduced.
> > Please go ahead and remove this. I have no issues with that.
> >
> > > as an advantage.
> > > It looks too error-prone to me:
> > > 1. Simultaneous session initialization/de-initialization for devices
> > > with the same driver_id is not possible.
> > > 2. It assumes that all device drivers will be loaded before we start to
> > > create session pools.
> > >
> > > Right now it seems ok, as no one requires such functionality, but I
> > > don't know how it will be in the future.
> > > To me, the rte_security session model, where the user has to create a new
> > > session for each security context, looks much more robust.
> > Agreed
> >
> > >
> > > >
> > > > BTW, I can see a v2 of this RFC which is still based on the security library.
> > >
> > > Yes, v2 concentrated on fixing the issues found and some code
> > > restructuring, i.e. changes that would be needed anyway, whatever API
> > > approach we choose.
> > >
> > > > When do you plan to submit the patches for crypto-based APIs? We have
> > > > an RC1 merge deadline for this patchset on 21st Oct.
> > >
> > > We'd like to start working on it ASAP, but it seems we still have a
> > > major disagreement about what this crypto-dev API should look like.
> > > Which makes me think - should we return to our original proposal via
> > > rte_security?
> > > It still looks to me like a clean and straightforward way to enable this
> > > new API, and probably wouldn't cause that much controversy.
> > > What do you think?
> >
> > I cannot spend more time discussing this until the RC1 date. I have some other
> > stuff pending.
> > You can send the patches early next week with the approach that I
> > mentioned, or else we can discuss this post RC1 (which would mean deferring
> > to 20.02).
> >
> > But moving back to security is not acceptable to me. The code should be put
> > where it is intended and not where it is easy to put. You are not doing any
> > rte_security stuff.
> >
> >
> > Regards,
> > Akhil
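To make the driver_id indexing concern above concrete, here is a simplified, hypothetical model. The `model_*` names are invented for illustration and this is not a copy of the real rte_cryptodev_sym_session layout or init path; it only mirrors the idea that private session material lives in a slot selected by driver_id. Because every device with the same driver_id maps to the same slot, one session object cannot hold two different private sessions (for example, an async one and a sync one) for that driver.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model of a session whose driver-private
 * material is indexed by driver_id (not the real DPDK struct). */
#define MODEL_MAX_DRIVERS 4

struct model_sess_slot {
	void *data;      /* driver-private session material */
	uint16_t refcnt; /* how many devices share this slot */
};

struct model_sym_session {
	struct model_sess_slot sess_data[MODEL_MAX_DRIVERS];
};

/* Attach private data for a device of the given driver. All devices with
 * the same driver_id land in the same slot, so attaching a *different*
 * private session (e.g. a "sync" variant) for that driver fails. */
static int model_session_init(struct model_sym_session *s,
		uint8_t driver_id, void *priv)
{
	struct model_sess_slot *slot;

	if (driver_id >= MODEL_MAX_DRIVERS)
		return -1;
	slot = &s->sess_data[driver_id];
	if (slot->data != NULL && slot->data != priv)
		return -1; /* slot already holds another private session */
	slot->data = priv;
	slot->refcnt++;
	return 0;
}
```

In this model, the only way to serve both modes for one driver is to make the single private session capable of both, which is the trade-off discussed in the thread.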
  
Akhil Goyal Oct. 15, 2019, 3 p.m. UTC | #27
Hi Fan,

> 
> Hi Akhil,
> 
> Thanks for the review and comments!
> Knowing you are extremely busy, here are my points in brief:
> I think placing CPU synchronous crypto in rte_security makes sense, as
> 
> 1. rte_security already contains inline crypto and lookaside crypto action types,
> so adding a cpu_crypto action type is reasonable.

The argument here is not about cpu-crypto; any SW PMD is nothing but cpu-crypto.
Hence cryptodev already supports that.
Here we are concerned only with synchronous processing for crypto workloads.

> 2. rte_security contains security features that may not be supported by all devices,
> such as crypto, IPsec, and PDCP. cpu_crypto falls into this category, again crypto.

I do not get the intent of this comment. Looking at your patchset, what I get is that
you need a synchronous API for crypto workloads.
If sync processing is required for security payloads, we can add a sync API there as well.
I have made that comment before also. We can have sync API in both security and cryptodev.

> 3. placing the CPU synchronous crypto API in rte_security is natural, as inline mode
> works synchronously, too. However, cryptodev doesn't.

It is a valid use case for all the cryptodev SW PMDs
that there should be a synchronous API for crypto processing, and that is what your use case
needs.

> 4. placing the CPU synchronous crypto API in rte_security helps boost SW crypto
> performance. I have already provided a simple perf test inside the unit test in the
> patchset for the user to try out: just compare its output against the DPDK crypto
> perf app output.

I don't expect any performance difference while moving this from security to cryptodev.
Have you done any profiling?

> 5. placing the CPU synchronous crypto API in cryptodev will never serve HW
> lookaside crypto PMDs, as making them work synchronously has a huge
> performance penalty. However, the cryptodev framework's existing design
> provides APIs that work in all crypto PMDs (rte_cryptodev_enqueue_burst /
> dequeue_burst for example), so this does not fit cryptodev's principle.

Agreed that it is not for HW PMDs; however, the op may be NULL in those cases.
Why is it against the cryptodev principle? There are some ops which
PMDs may or may not support.

> 6. placing the CPU synchronous crypto API in cryptodev confuses the user, as:
> 	- the session created for async mode may not work in sync mode

Why? The whole idea for my conversations on this patchset talks about that.
Same session should work for both sync and async processing.

> 	- both enqueue/dequeue and cpu_crypto_process do the same crypto
> processing, but one PMD may support only one API (set), another may support
> the other, and a third PMD may support both. We have to provide another API to
> let the user query which PMD supports which.

This should be based on a feature flag. It would be up to the application developer
to decide which (sync/async) processing is required for which type of flows that
it is configuring.
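A minimal sketch of the feature-flag approach described above, assuming the device reports a capability bitmask and the application chooses a path per flow. The flag `MODEL_FF_SYNC_PROCESS`, `struct model_dev_info`, and `model_choose_path()` are hypothetical names invented for this illustration, not existing cryptodev identifiers.

```c
#include <stdint.h>

/* Hypothetical capability bit; the name and bit position are invented. */
#define MODEL_FF_SYNC_PROCESS (UINT64_C(1) << 0)

enum model_path { MODEL_PATH_ASYNC, MODEL_PATH_SYNC };

/* Hypothetical per-device info, standing in for what a device-info
 * query would report. */
struct model_dev_info {
	const char *name;
	uint64_t feature_flags;
};

/* Use the synchronous process() path only if the application wants it
 * for this flow AND the device advertises support; otherwise fall back
 * to the enqueue/dequeue path. */
static enum model_path model_choose_path(const struct model_dev_info *info,
		int app_wants_sync)
{
	if (app_wants_sync && (info->feature_flags & MODEL_FF_SYNC_PROCESS))
		return MODEL_PATH_SYNC;
	return MODEL_PATH_ASYNC;
}
```

With this shape, a HW lookaside PMD simply never sets the bit, so no extra query API is needed beyond the existing capability report.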

> 	- two completely different code paths for async/sync mode.
> 7. You said at the end of the email that placing the CPU synchronous crypto API into
> rte_security is not acceptable as it does not do any rte_security stuff. Isn't crypto
> rte_security stuff? You may call this a quibble, but in my view, in the patchset both PMDs'
> implementations did offload the work to the CPU's special circuitry dedicated
> to accelerating the crypto processing.

This is specific to Intel SW PMDs only. IMO, if you are talking about SW PMDs,
openssl can also benefit from this.

> 
> To me, cryptodev is not where the CPU synchronous crypto API should go;
> rte_security is.
> 
> Regards,
> Fan
> 
Regards,
Akhil
  
Akhil Goyal Oct. 15, 2019, 3:02 p.m. UTC | #28
> 
> 
> > Hi Akhil,
> >
> > Thanks for the review and comments!
> > Knowing you are extremely busy, here are my points in brief:
> > I think placing CPU synchronous crypto in rte_security makes sense, as
> >
> > 1. rte_security already contains inline crypto and lookaside crypto action types,
> > so adding a cpu_crypto action type is reasonable.
> > 2. rte_security contains security features that may not be supported by all devices,
> > such as crypto, IPsec, and PDCP. cpu_crypto falls into this category, again crypto.
> > 3. placing the CPU synchronous crypto API in rte_security is natural, as inline
> > mode works synchronously, too. However, cryptodev doesn't.
> > 4. placing the CPU synchronous crypto API in rte_security helps boost SW
> > crypto performance. I have already provided a simple perf test inside the unit
> > test in the patchset for the user to try out: just compare its output against
> > the DPDK crypto perf app output.
> > 5. placing the CPU synchronous crypto API in cryptodev will never serve HW
> > lookaside crypto PMDs, as making them work synchronously has a huge
> > performance penalty. However, the cryptodev framework's existing design
> > provides APIs that work in all crypto PMDs (rte_cryptodev_enqueue_burst /
> > dequeue_burst for example), so this does not fit cryptodev's principle.
> > 6. placing the CPU synchronous crypto API in cryptodev confuses the user, as:
> > 	- the session created for async mode may not work in sync mode
> > 	- both enqueue/dequeue and cpu_crypto_process do the same crypto
> > processing, but one PMD may support only one API (set), another may support
> > the other, and a third PMD may support both. We have to provide another API to
> > let the user query which PMD supports which.
> > 	- two completely different code paths for async/sync mode.
> > 7. You said at the end of the email that placing the CPU synchronous crypto API into
> > rte_security is not acceptable as it does not do any rte_security stuff. Isn't crypto
> > rte_security stuff? You may call this a quibble, but in my view, in the patchset both PMDs'
> > implementations did offload the work to the CPU's special circuitry dedicated
> > to accelerating the crypto processing.
> >
> > To me, cryptodev is not where the CPU synchronous crypto API should go;
> > rte_security is.
> 
> I also don't understand why rte_security is not an option here.
> We do have inline-crypto right now, so why can't we have cpu-crypto with a new
> process() API here?
> Actually, I would like to hear more opinions from the community here:
> what do other interested parties think is the best way to introduce a cpu-crypto
> specific API?

I have raised this concern in the weekly status meeting as well. But it looks like nobody
is interested.
  
Ananyev, Konstantin Oct. 16, 2019, 1:04 p.m. UTC | #29
> -----Original Message-----
> From: Akhil Goyal [mailto:akhil.goyal@nxp.com]
> Sent: Tuesday, October 15, 2019 4:02 PM
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Zhang, Roy Fan <roy.fan.zhang@intel.com>; 'dev@dpdk.org' <dev@dpdk.org>;
> De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>; 'Thomas Monjalon' <thomas@monjalon.net>; Doherty, Declan
> <declan.doherty@intel.com>
> Cc: 'Anoob Joseph' <anoobj@marvell.com>; Jerin Jacob <jerinj@marvell.com>; Hemant Agrawal <hemant.agrawal@nxp.com>
> Subject: RE: [RFC PATCH 1/9] security: introduce CPU Crypto action type and API
> 
> 
> 
> >
> >
> > > Hi Akhil,
> > >
> > > Thanks for the review and comments!
> > > Knowing you are extremely busy, here are my points in brief:
> > > I think placing CPU synchronous crypto in rte_security makes sense, as
> > >
> > > 1. rte_security already contains inline crypto and lookaside crypto action types,
> > > so adding a cpu_crypto action type is reasonable.
> > > 2. rte_security contains security features that may not be supported by all devices,
> > > such as crypto, IPsec, and PDCP. cpu_crypto falls into this category, again crypto.
> > > 3. placing the CPU synchronous crypto API in rte_security is natural, as inline
> > > mode works synchronously, too. However, cryptodev doesn't.
> > > 4. placing the CPU synchronous crypto API in rte_security helps boost SW
> > > crypto performance. I have already provided a simple perf test inside the unit
> > > test in the patchset for the user to try out: just compare its output against
> > > the DPDK crypto perf app output.
> > > 5. placing the CPU synchronous crypto API in cryptodev will never serve HW
> > > lookaside crypto PMDs, as making them work synchronously has a huge
> > > performance penalty. However, the cryptodev framework's existing design
> > > provides APIs that work in all crypto PMDs (rte_cryptodev_enqueue_burst /
> > > dequeue_burst for example), so this does not fit cryptodev's principle.
> > > 6. placing the CPU synchronous crypto API in cryptodev confuses the user, as:
> > > 	- the session created for async mode may not work in sync mode
> > > 	- both enqueue/dequeue and cpu_crypto_process do the same crypto
> > > processing, but one PMD may support only one API (set), another may support
> > > the other, and a third PMD may support both. We have to provide another API to
> > > let the user query which PMD supports which.
> > > 	- two completely different code paths for async/sync mode.
> > > 7. You said at the end of the email that placing the CPU synchronous crypto API into
> > > rte_security is not acceptable as it does not do any rte_security stuff. Isn't crypto
> > > rte_security stuff? You may call this a quibble, but in my view, in the patchset both PMDs'
> > > implementations did offload the work to the CPU's special circuitry dedicated
> > > to accelerating the crypto processing.
> > >
> > > To me, cryptodev is not where the CPU synchronous crypto API should go;
> > > rte_security is.
> >
> > I also don't understand why rte_security is not an option here.
> > We do have inline-crypto right now, so why can't we have cpu-crypto with a new
> > process() API here?
> > Actually, I would like to hear more opinions from the community here:
> > what do other interested parties think is the best way to introduce a cpu-crypto
> > specific API?
> 
> I have raised this concern in the weekly status meeting as well. But it looks like nobody
> is interested.

That's really a pity...
CC-ing it to the TB members; hopefully someone will be interested,
or can at least forward it to an interested person.
Konstantin
  
Ananyev, Konstantin Oct. 16, 2019, 10:07 p.m. UTC | #30
Hi Akhil,

> > > User can use the same session, that is what I am also insisting on, but it may have
> > > separate session private data. The cryptodev session create API provides that
> > > functionality and we can leverage that.
> >
> > rte_cryptodev_sym_session.sess_data[] is indexed by driver_id, which means
> > we can't use the same rte_cryptodev_sym_session to hold sessions for both
> > sync and async mode for the same device. Of course we can add a hard
> > requirement that any driver that wants to support process() has to create
> > sessions that can handle both process and enqueue/dequeue,
> > but then again, why create such overhead?
> >
> > BTW, to be honest, I don't consider current rte_cryptodev_sym_session
> > construct for multiple device_ids:
> > __extension__ struct {
> >                 void *data;
> >                 uint16_t refcnt;
> >         } sess_data[0];
> >         /**< Driver specific session material, variable size */
> >
> Yes I also feel the same. I was also not in favor of this when it was introduced.
> Please go ahead and remove this. I have no issues with that.

If you are not happy with that structure, and admit there are issues with it,
why do you push for reusing it for the cpu-crypto API?
Why not take a step back, take the current drawbacks into account,
and define something that (hopefully) would suit us better?
Again, the new API will be experimental for some time, so we'll
have some opportunity to see whether it works and, if not, fix it.

About removing data[] from the existing rte_cryptodev_sym_session:
personally, I would like to do that, but the change seems too massive.
We are definitely not ready for such an effort right now.

> 
> > as an advantage.
> > It looks too error prone for me:
> > 1. Simultaneous session initialization/de-initialization for devices with the same
> > driver_id is not possible.
> > 2. It assumes that all device drivers will be loaded before we start to create
> > session pools.
> >
> > Right now it seems OK, as no one requires such functionality, but I don't know
> > how it will be in the future.
> > To me, the rte_security session model, where the user has to create a new
> > session for each security context, looks much more robust.
> Agreed
> 
> >
> > >
> > > BTW, I can see a v2 to this RFC which is still based on security library.
> >
> > Yes, v2 concentrated on fixing the issues found and some code restructuring,
> > i.e. changes that would be needed anyway, whichever API approach we choose.
> >
> > > When do you plan
> > > to submit the patches for the crypto based APIs? We have the RC1 merge deadline
> > > for this patchset on 21st Oct.
> >
> > We'd like to start working on it ASAP, but it seems we still have a major
> > disagreement about what this crypto-dev API should look like.
> > Which makes me think: should we return to our original proposal via
> > rte_security?
> > It still looks to me like a clean and straightforward way to enable this new API,
> > and probably wouldn't cause that much controversy.
> > What do you think?
> 
> I cannot spend more time discussing this until the RC1 date. I have some other stuff pending.
> You can send the patches early next week with the approach that I mentioned, or else we
> can discuss this post RC1 (which would mean deferring to 20.02).
> 
> But moving back to security is not acceptable to me. The code should be put where it is
> intended and not where it is easy to put. You are not doing any rte_security stuff.
> 

OK, then my suggestion:
let's at least write down all the points about the crypto-dev approach where we
disagree, and then try to resolve them one by one.
If we fail to reach agreement or make progress within the next week or so
(and there are no more reviews from the community),
we will have to bring the subject to the TB meeting to decide.
Sounds fair to you?

List is below.
Please add to or correct it if I missed something.

Konstantin

1. extra input parameters to create/init rte_(cpu)_sym_session.

We will leverage the existing 6B gap inside rte_crypto_*_xform between the 'algo' and 'key' fields.
New fields will be optional and would be used by the PMD only when a cpu-crypto session is requested.
For an lksd-crypto session the PMD is free to ignore these fields.
No ABI breakage is required.

Hopefully no controversy here with #1.
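The "6B gap" argument in #1 relies on a general C layout property: placing new fields into existing padding leaves the struct size and the offsets of all existing members unchanged. The sketch below demonstrates this on two invented structs; `model_xform_v1`/`model_xform_v2` are hypothetical and are not the real rte_crypto_*_xform layouts, so the actual location and size of the gap should be checked against the real headers. The key pointer is forced to 8-byte alignment so the 6-byte hole exists on both 32-bit and 64-bit targets.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "before" xform: two 1-byte fields, then a key pointer
 * forced to 8-byte alignment, leaving a 6-byte hole at offsets 2..7. */
struct model_xform_v1 {
	uint8_t op;
	uint8_t algo;
	alignas(8) const uint8_t *key;
	uint16_t key_len;
};

/* Hypothetical "after" xform: new optional fields placed in the hole. */
struct model_xform_v2 {
	uint8_t op;
	uint8_t algo;
	uint16_t new_a; /* offset 2 */
	uint16_t new_b; /* offset 4 */
	uint16_t new_c; /* offset 6 */
	alignas(8) const uint8_t *key;
	uint16_t key_len;
};

/* Size and offsets of pre-existing members are unchanged. */
_Static_assert(sizeof(struct model_xform_v1) == sizeof(struct model_xform_v2),
	"size changed");
_Static_assert(offsetof(struct model_xform_v1, key) ==
	offsetof(struct model_xform_v2, key), "key moved");
_Static_assert(offsetof(struct model_xform_v1, key_len) ==
	offsetof(struct model_xform_v2, key_len), "key_len moved");
```

One caveat consistent with the "optional" wording in #1: existing applications may leave those padding bytes uninitialized, so a PMD should only read the new fields when a cpu-crypto session is explicitly requested.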

2. cpu-crypto create/init.
    a) Our suggestion - introduce new API for that:
        - rte_crypto_cpu_sym_init() that would init a completely opaque rte_crypto_cpu_sym_session.
        - struct rte_crypto_cpu_sym_session_ops {(*process)(...); (*clear); /*whatever else we'll need */};
        - rte_crypto_cpu_sym_get_ops(const struct rte_crypto_sym_xform *xforms)
          that would return const struct rte_crypto_cpu_sym_session_ops * based on input xforms.
	Advantages:
	1) totally opaque data structure (no ABI breakages in future); the PMD writer is totally free
	     with its format and contents.
	2) each session entity is self-contained; the user doesn't need to bring along dev_id etc.
	    dev_id is needed only at the init stage; after that the user will use session ops to perform
	    all operations on that session (process(), clear(), etc.).
	3) the user can decide whether to store the ops[] pointer on a per-session basis,
	    or per group of identical sessions, or...
	4) No mandatory mempools for private sessions. The user can allocate memory for a cpu-crypto
	    session whenever he likes.
	Disadvantages:
	5) Extra changes in control path
	6) User has to store session_ops pointer explicitly.
     b) Your suggestion - reuse existing rte_cryptodev_sym_session_init() and existing rte_cryptodev_sym_session
      structure.
	Advantages:
	1) allows to reuse same struct and init/create/clear() functions.
	    Probably less changes in control path.
	Disadvantages:
	2) rte_cryptodev_sym_session.sess_data[] is indexed by driver_id, which means that
	    we can't use the same rte_cryptodev_sym_session to hold private session pointers
	    for both sync and async mode for the same device.
	    So the only option we have is to make PMD devops->sym_session_configure()
	    always create a session that can work in both cpu and lksd modes.
	    For some implementations that would probably mean that under the hood the PMD would create
	    2 different session structs (sync/async) and then use one or the other depending on which API was called.
	    Seems doable, but ...:
	    - it will contradict the statement from 1:
	      "New fields will be optional and would be used by PMD only when cpu-crypto session is requested."
	      Now it becomes mandatory for all apps to specify cpu-crypto related parameters too,
	      even if they don't plan to use that mode, i.e. a behavior change and an existing app change.
	    - it might cause extra space overhead.
	3) not possible to store device (not driver) specific data within the session, but I think it is not really needed right now.
	    So probably minor compared to 2.b.2.

Actually #3 follows from #2, but decided to have them separated.
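Proposal 2.a can be sketched as follows: an opaque per-session blob in caller-provided memory, plus a const ops table selected from the xform. All `model_*` names are hypothetical stand-ins for the proposed rte_crypto_cpu_sym_init(), rte_crypto_cpu_sym_get_ops() and rte_crypto_cpu_sym_session_ops; the signatures are invented for illustration. The "cipher" is a placeholder XOR, not real cryptography, and the dev_id that the proposal passes only at init time is omitted here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_ALGO_XOR 1

/* Hypothetical stand-in for struct rte_crypto_sym_xform. */
struct model_xform {
	int algo;
	uint8_t key[16];
	size_t key_len;
};

/* Opaque to the user: only the "PMD" knows this layout. */
struct model_cpu_sess {
	uint8_t key[16];
	size_t key_len;
};

/* Per-algorithm ops table (illustrative names and signatures only). */
struct model_cpu_sess_ops {
	int (*process)(void *sess, uint8_t *buf, size_t len);
	void (*clear)(void *sess);
};

/* Placeholder transform: XOR with the key. NOT real cryptography. */
static int xor_process(void *sess, uint8_t *buf, size_t len)
{
	const struct model_cpu_sess *s = sess;
	size_t i;

	if (s->key_len == 0)
		return -1;
	for (i = 0; i < len; i++)
		buf[i] ^= s->key[i % s->key_len];
	return 0;
}

static void xor_clear(void *sess)
{
	memset(sess, 0, sizeof(struct model_cpu_sess));
}

static const struct model_cpu_sess_ops xor_ops = { xor_process, xor_clear };

/* Analogue of the proposed get_ops(): pick the ops table from the xform. */
static const struct model_cpu_sess_ops *
model_cpu_get_ops(const struct model_xform *x)
{
	return x->algo == MODEL_ALGO_XOR ? &xor_ops : NULL;
}

/* Analogue of the proposed init(): memory comes from the caller (no
 * mandatory mempool); the key material is copied into the opaque blob. */
static int model_cpu_init(void *mem, size_t mem_len, const struct model_xform *x)
{
	struct model_cpu_sess *s = mem;

	if (mem_len < sizeof(*s) || x->key_len == 0 || x->key_len > sizeof(s->key))
		return -1;
	memcpy(s->key, x->key, x->key_len);
	s->key_len = x->key_len;
	return 0;
}
```

Note that the caller must keep the returned ops pointer next to the session memory, which is exactly disadvantage 2.a.6.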

3. process() parameters/behavior
    a) Our suggestion: user stores ptr to session ops (or to (*process) itself) and just does:
        session_ops->process(sess, ...);
	Advantages:
	1) fastest possible execution path
	2) no need to carry dev_id in the data path
	Disadvantages:
	3) the user has to carry the session_ops pointer explicitly
    b) Your suggestion: add (*cpu_process) inside rte_cryptodev_ops and then:
        rte_crypto_cpu_sym_process(uint8_t dev_id, struct rte_cryptodev_sym_session *sess, /*data parameters*/) {...
                     rte_cryptodevs[dev_id].dev_ops->cpu_process(sess, ...);
                     /* and then inside the PMD-specific process: */
                     pmd_private_session = sess->sess_data[this_pmd_driver_id].data;
                     /* and then most likely either */
                     pmd_private_session->process(pmd_private_session, ...);
                     /* or jump based on session/input data */
	Advantages:
	1) don't see any...
	Disadvantages:
	2) the user has to carry dev_id in the data path
	3) Extra level of indirection (plus data dependency), both for data and instructions.
	    Possible slowdown compared to a) (not measured).
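The difference between 3.a and 3.b can be sketched as two call paths with the same placeholder "processing". This only illustrates the extra dependent loads on path 3.b (device entry, then driver_id, then the private session); it says nothing about real performance, which, as noted above, was not measured. All names (`model_path_a`, `model_path_b`, `model_devs`, and so on) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef int (*model_process_fn)(void *priv, uint32_t *acc, uint32_t n);

/* Trivial placeholder "processing": adds n to an accumulator. */
static int model_pmd_process(void *priv, uint32_t *acc, uint32_t n)
{
	(void)priv;
	*acc += n;
	return 0;
}

/* 3.a: the user holds the function pointer and calls it directly. */
static int model_path_a(model_process_fn process, void *priv,
		uint32_t *acc, uint32_t n)
{
	return process(priv, acc, n);
}

/* 3.b: dispatch via a device table (dev_id), then look up the
 * driver-private session by driver_id, then call. */
#define MODEL_MAX_DEVS 2
#define MODEL_MAX_DRV 2

struct model_dev {
	uint8_t driver_id;
	model_process_fn cpu_process;
};

struct model_session {
	void *sess_data[MODEL_MAX_DRV];
};

static struct model_dev model_devs[MODEL_MAX_DEVS];

static int model_path_b(uint8_t dev_id, struct model_session *sess,
		uint32_t *acc, uint32_t n)
{
	const struct model_dev *dev;
	void *priv;

	if (dev_id >= MODEL_MAX_DEVS)
		return -1;
	dev = &model_devs[dev_id];              /* load 1: device entry */
	if (dev->driver_id >= MODEL_MAX_DRV || dev->cpu_process == NULL)
		return -1;
	priv = sess->sess_data[dev->driver_id]; /* load 2: depends on load 1 */
	if (priv == NULL)
		return -1;
	return dev->cpu_process(priv, acc, n);  /* indirect call */
}
```

Both paths end in the same indirect call; 3.b adds the device lookup and the driver_id-indexed load in front of it, and requires dev_id on every data-path call.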
  
Ananyev, Konstantin Oct. 17, 2019, 12:49 p.m. UTC | #31
> 
> > > > User can use the same session, that is what I am also insisting, but it may have
> > > separate
> > > > Session private data. Cryptodev session create API provide that functionality
> > > and we can
> > > > Leverage that.
> > >
> > > rte_cryptodev_sym_session.sess_data[] is indexed by driver_id, which means
> > > we can't use the same rte_cryptodev_sym_session to hold sessions for both
> > > sync and async mode for the same device. Of course we can add a hard
> > > requirement that any driver that wants to support process() has to create
> > > sessions that can handle both process and enqueue/dequeue,
> > > but then again, why create such overhead?
> > >
> > > BTW, to be honest, I don't consider current rte_cryptodev_sym_session
> > > construct for multiple device_ids:
> > > __extension__ struct {
> > >                 void *data;
> > >                 uint16_t refcnt;
> > >         } sess_data[0];
> > >         /**< Driver specific session material, variable size */
> > >
> > Yes I also feel the same. I was also not in favor of this when it was introduced.
> > Please go ahead and remove this. I have no issues with that.
> 
> If you are not happy with that structure, and admit there are issues with it,
> why do you push for reusing it for the cpu-crypto API?
> Why not take a step back, take the current drawbacks into account,
> and define something that (hopefully) would suit us better?
> Again, the new API will be experimental for some time, so we'll
> have some opportunity to see whether it works and, if not, fix it.
>
> About removing data[] from the existing rte_cryptodev_sym_session:
> personally, I would like to do that, but the change seems too massive.
> We are definitely not ready for such an effort right now.
> 
> >
> > > as an advantage.
> > > It looks too error prone for me:
> > > 1. Simultaneous session initialization/de-initialization for devices with the same
> > > driver_id is not possible.
> > > 2. It assumes that all device drivers will be loaded before we start to create
> > > session pools.
> > >
> > > Right now it seems OK, as no one requires such functionality, but I don't know
> > > how it will be in the future.
> > > To me, the rte_security session model, where the user has to create a new
> > > session for each security context, looks much more robust.
> > Agreed
> >
> > >
> > > >
> > > > BTW, I can see a v2 to this RFC which is still based on security library.
> > >
> > > Yes, v2 concentrated on fixing the issues found and some code restructuring,
> > > i.e. changes that would be needed anyway, whichever API approach we choose.
> > >
> > > > When do you plan
> > > > to submit the patches for the crypto based APIs? We have the RC1 merge deadline
> > > > for this patchset on 21st Oct.
> > >
> > > We'd like to start working on it ASAP, but it seems we still have a major
> > > disagreement about what this crypto-dev API should look like.
> > > Which makes me think: should we return to our original proposal via
> > > rte_security?
> > > It still looks to me like a clean and straightforward way to enable this new API,
> > > and probably wouldn't cause that much controversy.
> > > What do you think?
> >
> > I cannot spend more time discussing this until the RC1 date. I have some other stuff pending.
> > You can send the patches early next week with the approach that I mentioned, or else we
> > can discuss this post RC1 (which would mean deferring to 20.02).
> >
> > But moving back to security is not acceptable to me. The code should be put where it is
> > intended and not where it is easy to put. You are not doing any rte_security stuff.
> >
> 
> OK, then my suggestion:
> let's at least write down all the points about the crypto-dev approach where we
> disagree, and then try to resolve them one by one.
> If we fail to reach agreement or make progress within the next week or so
> (and there are no more reviews from the community),
> we will have to bring the subject to the TB meeting to decide.
> Sounds fair to you?
>
> List is below.
> Please add to or correct it if I missed something.
> 
> Konstantin
> 
> 1. extra input parameters to create/init rte_(cpu)_sym_session.
> 
> We will leverage the existing 6B gap inside rte_crypto_*_xform between the 'algo' and 'key' fields.
> New fields will be optional and would be used by the PMD only when a cpu-crypto session is requested.
> For an lksd-crypto session the PMD is free to ignore these fields.
> No ABI breakage is required.
> 
> Hopefully no controversy here with #1.
> 
> 2. cpu-crypto create/init.
>     a) Our suggestion - introduce new API for that:
>         - rte_crypto_cpu_sym_init() that would init a completely opaque rte_crypto_cpu_sym_session.
>         - struct rte_crypto_cpu_sym_session_ops {(*process)(...); (*clear); /*whatever else we'll need */};
>         - rte_crypto_cpu_sym_get_ops(const struct rte_crypto_sym_xform *xforms)
>           that would return const struct rte_crypto_cpu_sym_session_ops * based on input xforms.
> 	Advantages:
> 	1) totally opaque data structure (no ABI breakages in future); the PMD writer is totally free
> 	     with its format and contents.
> 	2) each session entity is self-contained; the user doesn't need to bring along dev_id etc.
> 	    dev_id is needed only at the init stage; after that the user will use session ops to perform
> 	    all operations on that session (process(), clear(), etc.).
> 	3) the user can decide whether to store the ops[] pointer on a per-session basis,
> 	    or per group of identical sessions, or...
> 	4) No mandatory mempools for private sessions. The user can allocate memory for a cpu-crypto
> 	    session whenever he likes.
> 	Disadvantages:
> 	5) Extra changes in control path
> 	6) User has to store session_ops pointer explicitly.

On second thought, if 2.a.6 is really that big a deal, we can have a small shim layer on top:

rte_crypto_cpu_sym_session { void *ses; struct rte_crypto_cpu_sym_session_ops * const ops; }
OR even
rte_crypto_cpu_sym_session { void *ses; struct rte_crypto_cpu_sym_session_ops ops; }

And merge rte_crypto_cpu_sym_init() and rte_crypto_cpu_sym_get_ops() into one (init).

Then process() can become a wrapper:
rte_crypto_cpu_sym_process(ses, ...) {return ses->ops->process(ses->ses, ...);}
OR
rte_crypto_cpu_sym_process(ses, ...) {return ses->ops.process(ses->ses, ...);}

If that would help reach consensus, it works for me.
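The shim layer described above can be sketched like this: the user-visible session bundles the opaque PMD session with its ops, init fills both, and process() becomes a thin inline wrapper, so the user no longer stores the ops pointer separately. The `model_*` names are hypothetical; this sketch uses a pointer to a const ops table, a slight variation on the `* const ops` form in the message, and the "processing" (increment every byte) is a placeholder, not crypto.

```c
#include <stddef.h>
#include <stdint.h>

struct model_ops {
	int (*process)(void *ses, uint8_t *buf, size_t len);
};

/* Shim: opaque PMD session pointer plus its ops table. */
struct model_cpu_sym_session {
	void *ses;
	const struct model_ops *ops;
};

/* Thin wrapper, mirroring rte_crypto_cpu_sym_process(ses, ...) above. */
static inline int
model_cpu_sym_process(struct model_cpu_sym_session *s, uint8_t *buf, size_t len)
{
	return s->ops->process(s->ses, buf, len);
}

/* Placeholder PMD process: increments every byte. Not crypto. */
static int model_inc_process(void *ses, uint8_t *buf, size_t len)
{
	size_t i;

	(void)ses;
	for (i = 0; i < len; i++)
		buf[i]++;
	return 0;
}

static const struct model_ops model_inc_ops = { model_inc_process };

/* Merged init + get_ops: fills both the private pointer and the ops. */
static int model_cpu_sym_init(struct model_cpu_sym_session *s, void *priv)
{
	s->ses = priv;
	s->ops = &model_inc_ops;
	return 0;
}
```

The cost of the wrapper is one extra load of `ops` per call, which is still independent of any dev_id lookup.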

>      b) Your suggestion - reuse existing rte_cryptodev_sym_session_init() and existing rte_cryptodev_sym_session
>       structure.
> 	Advantages:
> 	1) allows to reuse same struct and init/create/clear() functions.
> 	    Probably less changes in control path.
> 	Disadvantages:
> 	2) rte_cryptodev_sym_session.sess_data[] is indexed by driver_id, which means that
> 	    we can't use the same rte_cryptodev_sym_session to hold private session pointers
> 	    for both sync and async mode for the same device.
> 	    So the only option we have is to make PMD devops->sym_session_configure()
> 	    always create a session that can work in both cpu and lksd modes.
> 	    For some implementations that would probably mean that under the hood the PMD would create
> 	    2 different session structs (sync/async) and then use one or the other depending on which API was called.
> 	    Seems doable, but ...:
> 	    - it will contradict the statement from 1:
> 	      "New fields will be optional and would be used by PMD only when cpu-crypto session is requested."
> 	      Now it becomes mandatory for all apps to specify cpu-crypto related parameters too,
> 	      even if they don't plan to use that mode, i.e. a behavior change and an existing app change.
> 	    - it might cause extra space overhead.
> 	3) not possible to store device (not driver) specific data within the session, but I think it is not really needed right now.
> 	    So probably minor compared to 2.b.2.
> 
> Actually #3 follows from #2, but decided to have them separated.
> 
> 3. process() parameters/behavior
>     a) Our suggestion: user stores ptr to session ops (or to (*process) itself) and just does:
>         session_ops->process(sess, ...);
> 	Advantages:
> 	1) fastest possible execution path
> 	2) no need to carry dev_id in the data path
> 	Disadvantages:
> 	3) the user has to carry the session_ops pointer explicitly
>     b) Your suggestion: add (*cpu_process) inside rte_cryptodev_ops and then:
>         rte_crypto_cpu_sym_process(uint8_t dev_id, struct rte_cryptodev_sym_session *sess, /*data parameters*/) {...
>                      rte_cryptodevs[dev_id].dev_ops->cpu_process(sess, ...);
>                      /* and then inside the PMD-specific process: */
>                      pmd_private_session = sess->sess_data[this_pmd_driver_id].data;
>                      /* and then most likely either */
>                      pmd_private_session->process(pmd_private_session, ...);
>                      /* or jump based on session/input data */
> 	Advantages:
> 	1) don't see any...
> 	Disadvantages:
> 	2) the user has to carry dev_id in the data path
> 	3) Extra level of indirection (plus data dependency), both for data and instructions.
> 	    Possible slowdown compared to a) (not measured).
>
  
Akhil Goyal Oct. 18, 2019, 1:17 p.m. UTC | #32
Hi Konstantin,

Added my comments inline with your draft.
> 
> 
> Hi Akhil,
> 
> > > BTW, to be honest, I don't consider current rte_cryptodev_sym_session
> > > construct for multiple device_ids:
> > > __extension__ struct {
> > >                 void *data;
> > >                 uint16_t refcnt;
> > >         } sess_data[0];
> > >         /**< Driver specific session material, variable size */
> > >
> > Yes I also feel the same. I was also not in favor of this when it was introduced.
> > Please go ahead and remove this. I have no issues with that.
> 
> If you are not happy with that structure, and admit there are issues with it,
> why do you push for reusing it for the cpu-crypto API?
> Why not take a step back, take the current drawbacks into account,
> and define something that (hopefully) would suit us better?
> Again, the new API will be experimental for some time, so we'll
> have some opportunity to see whether it works and, if not, fix it.

[Akhil] This structure serves a use case which was agreed upon in the
community; we cannot just remove a feature altogether. Rather, it is Intel's
use case only.

> 
> About removing data[] from the existing rte_cryptodev_sym_session:
> personally, I would like to do that, but the change seems too massive.
> We are definitely not ready for such an effort right now.
> 

[snip]..

> 
> OK, then my suggestion:
> let's at least write down all the points about the crypto-dev approach where we
> disagree, and then try to resolve them one by one.
> If we fail to reach agreement or make progress within the next week or so
> (and there are no more reviews from the community),
> we will have to bring the subject to the TB meeting to decide.
> Sounds fair to you?
Agreed
> 
> List is below.
> Please add to or correct it if I missed something.
> 
> Konstantin

Before going into comparison, we should define the requirement as well.
What I understood from the patchset,
"You need a synchronous API to perform crypto operations on raw data using SW PMDs"
So,
- no crypto-ops,
- no separate enq-deq, only single process API for data path
- Do not need any value addition to the session parameters.
  (You would need some parameters from the crypto-op which
   are constant per session, and since you won't use crypto-ops,
   you need some place to store them.)

Now as per your mail, the comparison
1. extra input parameters to create/init rte_(cpu)_sym_session.

We will leverage the existing 6B gap inside rte_crypto_*_xform between the 'algo' and 'key' fields.
New fields will be optional and would be used by the PMD only when a cpu-crypto session is requested.
For an lksd-crypto session the PMD is free to ignore these fields.
No ABI breakage is required.

[Akhil] Agreed, no issues.

2. cpu-crypto create/init.
    a) Our suggestion - introduce new API for that:
        - rte_crypto_cpu_sym_init() that would init a completely opaque rte_crypto_cpu_sym_session.
        - struct rte_crypto_cpu_sym_session_ops {(*process)(...); (*clear); /*whatever else we'll need */};
        - rte_crypto_cpu_sym_get_ops(const struct rte_crypto_sym_xform *xforms)
          that would return const struct rte_crypto_cpu_sym_session_ops * based on input xforms.
	Advantages:
	1) totally opaque data structure (no ABI breakages in future); the PMD writer is totally free
	     with its format and contents.

[Akhil] It will have breakage at some point till we don't hit the union size.
Besides, I don't expect more parameters to be added.
Or do we really care about ABI breakage when the argument is about
the correct place to add a piece of code? Or do we really agree to add code
anywhere just to avoid that breakage?

	2) each session entity is self-contained; the user doesn't need to bring along dev_id etc.
	    dev_id is needed only at the init stage; after that the user will use session ops to perform
	    all operations on that session (process(), clear(), etc.).

[Akhil] There is no such thing as session ops in current DPDK. What you are proposing
is a new concept which doesn't have any extra benefit; rather, it adds complexity
by having two different code paths for session creation.


	3) the user can decide whether to store the ops[] pointer on a per-session basis,
	    or per group of identical sessions, or...

[Akhil] Will the user really care which process API should be called from the PMD?
Rather, it should be the driver's responsibility to store that in the session private data,
which would be opaque to the user. As per my suggestion, the same process function can
be added to multiple sessions, or a single session can be managed inside the PMD.


	4) No mandatory mempools for private sessions. The user can allocate memory for a cpu-crypto
	    session whenever they like.

[Akhil] You mean session private data? You would need that memory anyway; the user will be
allocating it already. You do not need to manage that.

	Disadvantages:
	5) Extra changes in control path
	6) User has to store session_ops pointer explicitly.

[Akhil] More disadvantages:
- All supporting PMDs will need to maintain TWO types of session for the
same crypto processing. If a fix or a new feature (or algo) is added, the PMD owner
will need to add code to both session create APIs, hence more maintenance and
more error-prone.
- Stacks which will be using these new APIs will also need to maintain two
code paths for the same processing while doing session initialization
for sync and async.
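A minimal sketch of proposal 2a with a single toy "PMD": the caller sees the session only as opaque memory of a size the PMD reports, initializes it with one call, and obtains the ops table from the xform with another. `struct cpu_sym_session_ops` is modeled on the proposed `rte_crypto_cpu_sym_session_ops`, but every name and signature here (`cpu_sym_init()`, `cpu_sym_get_ops()`, `cpu_sym_session_size()`, `struct toy_xform`) is hypothetical, and the XOR "cipher" is a placeholder, not cryptography.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified xform: one toy algorithm plus a key. */
enum toy_algo { TOY_ALGO_XOR = 1 };

struct toy_xform {
	enum toy_algo algo;
	const uint8_t *key;
	size_t key_len;
};

/* Modeled on the proposed rte_crypto_cpu_sym_session_ops;
 * the signatures are made up for illustration. */
struct cpu_sym_session_ops {
	/* Process a burst of raw buffers in place; returns number processed. */
	uint32_t (*process)(void *sess, uint8_t *bufs[], const size_t lens[],
			    uint32_t num);
	void (*clear)(void *sess);
};

/* PMD-private session layout: never visible to the caller. */
struct toy_xor_session {
	uint8_t key[16];
	size_t key_len;
};

/* Toy "cipher": XOR with a repeating key. Not real cryptography. */
static uint32_t toy_xor_process(void *sess, uint8_t *bufs[],
				const size_t lens[], uint32_t num)
{
	const struct toy_xor_session *s = sess;

	assert(s->key_len > 0);
	for (uint32_t i = 0; i < num; i++)
		for (size_t j = 0; j < lens[i]; j++)
			bufs[i][j] ^= s->key[j % s->key_len];
	return num;
}

static void toy_xor_clear(void *sess)
{
	memset(sess, 0, sizeof(struct toy_xor_session));
}

static const struct cpu_sym_session_ops toy_xor_ops = {
	.process = toy_xor_process,
	.clear = toy_xor_clear,
};

/* The only layout detail exposed: how much memory to provide. */
size_t cpu_sym_session_size(void)
{
	return sizeof(struct toy_xor_session);
}

/* Init an opaque session in caller-provided memory (no mempool needed). */
int cpu_sym_init(void *mem, const struct toy_xform *xf)
{
	struct toy_xor_session *s = mem;

	if (xf->algo != TOY_ALGO_XOR || xf->key_len == 0 ||
	    xf->key_len > sizeof(s->key))
		return -1;
	memcpy(s->key, xf->key, xf->key_len);
	s->key_len = xf->key_len;
	return 0;
}

/* Select the ops for an xform; NULL if this "PMD" does not support it. */
const struct cpu_sym_session_ops *cpu_sym_get_ops(const struct toy_xform *xf)
{
	return xf->algo == TOY_ALGO_XOR ? &toy_xor_ops : NULL;
}
```

Because `cpu_sym_get_ops()` depends only on the xform, a caller could fetch the ops once and share the pointer across a group of sessions created from the same xform, which is the flexibility advantage 3) refers to. The price is that the caller must carry that pointer (disadvantage 6) and that init and ops lookup are two separate calls.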


     b) Your suggestion - reuse existing rte_cryptodev_sym_session_init() and existing rte_cryptodev_sym_session
      structure.
	Advantages:
	1) allows reusing the same struct and init/create/clear() functions.
	    Probably less changes in control path.
	Disadvantages:
	2) rte_cryptodev_sym_session.sess_data[] is indexed by driver_id, which means that
	    we can't use the same rte_cryptodev_sym_session to hold private session pointers
	    for both sync and async mode for the same device.
	    So the only option we have is to make PMD devops->sym_session_configure()
	    always create a session that can work in both cpu and lksd modes.
	    For some implementations that would probably mean that under the hood the PMD would create
	    2 different session structs (sync/async) and then use one or the other depending on which API it was called from.
	    Seems doable, but...:
	    - it will contradict the statement from 1:
	      "New fields will be optional and would be used by PMD only when cpu-crypto session is requested."
	      Now it becomes mandatory for all apps to specify cpu-crypto related parameters too,
	      even if they don't plan to use that mode, i.e. a behavior change and existing app changes.
	    - it might cause extra space overhead.

[Akhil] It will not contradict #1. PMDs which support this mode will only need a few checks in session init
to find the appropriate values and set the appropriate process() in the session.
The user should be able to call the legacy enq-deq as well as the new process() without any issue.
The user will be able to change the datapath at runtime.
So this is not a disadvantage; it would be additional flexibility for the user.


	3) not possible to store device-specific (not driver-specific) data within the session, but I think it is not really needed right now.
	    So probably minor compared to 2.b.2.

[Akhil] So let's omit this from the current discussion. I hope we can find some way to deal with it.


Actually #3 follows from #2, but I decided to keep them separate.

3. process() parameters/behavior
    a) Our suggestion: user stores ptr to session ops (or to (*process) itself) and just does:
        session_ops->process(sess, ...);
	Advantages:
	1) fastest possible execution path
	2) no need to carry dev_id on the data path

[Akhil] I don't see any overhead in carrying dev_id; at least it would be in line with the
current DPDK methodology.
What you are suggesting is a new way to get things done without much benefit.
Also, I don't see any performance difference, as the crypto workload is heavier than
the code cycles, so that won't matter.
So IMO, there is no advantage in your suggestion either.


	Disadvantages:
	3) the user has to carry the session_ops pointer explicitly
    b) Your suggestion: add (*cpu_process) inside rte_cryptodev_ops and then:
        rte_crypto_cpu_sym_process(uint8_t dev_id, struct rte_cryptodev_sym_session *sess, /* data parameters */) {...
                     rte_cryptodevs[dev_id].dev_ops->cpu_process(sess, ...);
                     /* and then inside the PMD-specific process: */
                     pmd_private_session = sess->sess_data[this_pmd_driver_id].data;
                     /* and then most likely either */
                     pmd_private_session->process(pmd_private_session, ...);
                     /* or jump based on session/input data */
	Advantages:
	1) don't see any...
	Disadvantages:
	2) the user has to carry dev_id inside the data path
	3) Extra level of indirection (plus data dependency) - both for data and instructions.
	    Possible slowdown compared to a) (not measured). 
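To make the indirection in 3a and 3b concrete, here is a minimal sketch of the two call chains, using hypothetical stand-ins for the pseudo-code above (`struct dev`, `struct dev_ops`, `THIS_PMD_DRIVER_ID` and the session structs are illustrative, not the real rte_cryptodev definitions). Both chains end in the same PMD function; 3b needs dev_id and several dependent loads (device, dev_ops, per-driver slot, private session) before the final indirect call, while 3a starts from the private session. Whether that difference is measurable is exactly what is marked "not measured" above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins modeled on the pseudo-code above. */
#define MAX_DRIVERS 4
#define THIS_PMD_DRIVER_ID 2

struct pmd_private_session {
	uint32_t (*process)(struct pmd_private_session *p, uint32_t num);
	uint32_t processed; /* toy state: total ops handled */
};

static uint32_t pmd_process(struct pmd_private_session *p, uint32_t num)
{
	p->processed += num;
	return num;
}

/* Generic session with one slot per driver, like sess_data[]. */
struct generic_sym_session {
	struct {
		void *data;
		uint16_t refcnt;
	} sess_data[MAX_DRIVERS];
};

struct dev_ops {
	uint32_t (*cpu_process)(struct generic_sym_session *sess, uint32_t num);
};

struct dev {
	const struct dev_ops *dev_ops;
};

/* PMD entry point: find this driver's private session, then call it. */
static uint32_t pmd_cpu_process(struct generic_sym_session *sess, uint32_t num)
{
	struct pmd_private_session *p = sess->sess_data[THIS_PMD_DRIVER_ID].data;

	return p->process(p, num);
}

static const struct dev_ops pmd_dev_ops = { .cpu_process = pmd_cpu_process };
static const struct dev devs[1] = { { .dev_ops = &pmd_dev_ops } };

/* Option 3b: dev_id -> device -> dev_ops -> cpu_process() ->
 * sess_data[driver_id] -> private session -> process(). */
uint32_t cpu_process_via_dev(uint8_t dev_id, struct generic_sym_session *sess,
			     uint32_t num)
{
	return devs[dev_id].dev_ops->cpu_process(sess, num);
}

/* Option 3a: the caller already holds the private session, whose
 * process pointer was chosen at init time. */
uint32_t cpu_process_direct(struct pmd_private_session *p, uint32_t num)
{
	return p->process(p, num);
}
```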
	 
Having said all this, if the disagreements cannot be resolved, you can go for a PMD-specific API
for your PMDs, because as per my understanding the solution doesn't look scalable to other PMDs.
Your approach is aligned only with Intel and will not benefit others like openssl, which is used by all
vendors.

Regards,
Akhil
  
Ananyev, Konstantin Oct. 21, 2019, 1:47 p.m. UTC | #33
Hi Akhil,

 
> Added my comments inline with your draft.
> >
> >
> > Hi Akhil,
> >
> > > > BTW, to be honest, I don't consider current rte_cryptodev_sym_session
> > > > construct for multiple device_ids:
> > > > __extension__ struct {
> > > >                 void *data;
> > > >                 uint16_t refcnt;
> > > >         } sess_data[0];
> > > >         /**< Driver specific session material, variable size */
> > > >
> > > Yes I also feel the same. I was also not in favor of this when it was introduced.
> > > Please go ahead and remove this. I have no issues with that.
> >
> > If you are not happy with that structure, and admit there are issues with it,
> > why do you push for reusing it for the cpu-crypto API?
> > Why not take a step back, take into account the current drawbacks,
> > and define something that (hopefully) would suit us better?
> > Again, the new API will be experimental for some time, so we'll
> > have some opportunity to see whether it works and, if not, fix it.
> 
> [Akhil] This structure is serving some use case which is agreed upon in the
> community; we cannot just remove a feature altogether.

I understand that, but we don't suggest removing anything that is already here.
We are talking about extending the existing API/adding a new one.
All our debates are about how much we can reuse from the existing one and what new
needs to be added.

> Rather, it is Intel's use case only.
> 
> >
> > About removing data[] from existing rte_cryptodev_sym_session -
> > Personally, I would like to do that, but the change seems too massive.
> > Definitely not ready for such effort right now.
> >
> 
> [snip]..
> 
> >
> > Ok, then my suggestion:
> > Let's at least write down all points about crypto-dev approach where we
> > disagree and then probably try to resolve them one by one....
> > If we fail to make an agreement/progress in next week or so,
> > (and no more reviews from the community)
> > we will have to bring that subject to the TB meeting to decide.
> > Sounds fair to you?
> Agreed
> >
> > List is below.
> > Please add/correct me, if I missed something.
> >
> > Konstantin
> 
> Before going into comparison, we should define the requirement as well.

Good point.

> What I understood from the patchset,
> "You need a synchronous API to perform crypto operations on raw data using SW PMDs"
> So,
> - no crypto-ops,
> - no separate enq-deq, only a single process API for the data path
> - Do not need any value addition to the session parameters.
>   (You would need some parameters from the crypto-op which
>    are constant per session, and since you won't use crypto-ops,
>    you need some place to store them)

Yes, this is correct, I think.

> 
> Now, as per your mail, the comparison:
> 1. extra input parameters to create/init rte_(cpu)_sym_session.
> 
> We will leverage the existing 6B gap inside rte_crypto_*_xform between the 'algo' and 'key' fields.
> New fields will be optional and would be used by the PMD only when a cpu-crypto session is requested.
> For an lksd-crypto session, the PMD is free to ignore these fields.
> No ABI breakage is required.
> 
> [Akhil] Agreed, no issues.
> 
> 2. cpu-crypto create/init.
>     a) Our suggestion - introduce a new API for that:
>         - rte_crypto_cpu_sym_init() that would init a completely opaque rte_crypto_cpu_sym_session.
>         - struct rte_crypto_cpu_sym_session_ops {(*process)(...); (*clear)(...); /* whatever else we'll need */};
>         - rte_crypto_cpu_sym_get_ops(const struct rte_crypto_sym_xform *xforms)
>           that would return a const struct rte_crypto_cpu_sym_session_ops * based on the input xforms.
> 	Advantages:
> 	1) totally opaque data structure (no ABI breakages in future); the PMD writer is totally free
> 	     with its format and contents.
> 
> [Akhil] It will have breakage at some point till we don't hit the union size.

Not sure what union you are talking about?

> Rather, I don't expect more parameters to be added.
> Or do we really care about ABI breakage when the argument is about
> the correct place to add a piece of code? Do we really agree to add code
> anywhere just to avoid that breakage?

I am talking about maintaining it in the future.
If your struct is not visible externally, there is no chance of introducing an ABI breakage.

> 
> 	2) each session entity is self-contained; the user doesn't need to bring along dev_id etc.
> 	    dev_id is needed only at the init stage; after that the user will use session ops to perform
> 	    all operations on that session (process(), clear(), etc.).
> 
> [Akhil] There is no such thing as session ops in current DPDK.

True, but it doesn't mean we can't/shouldn't have it.

> What you are proposing
> is a new concept which doesn't have any extra benefit; rather, it adds complexity
> by having two different code paths for session creation.
> 
> 
> 	3) the user can decide whether to store the ops[] pointer on a per-session basis,
> 	    or per group of identical sessions, or...
> 
> [Akhil] Will the user really care which process API should be called from the PMD?
> Rather, it should be the driver's responsibility to store that in the session private data,
> which would be opaque to the user. As per my suggestion, the same process function can
> be added to multiple sessions, or a single session can be managed inside the PMD.

In that case we either need to have a function per session (stored internally),
or make decisions (branches) at run time.
But as I said in the other mail, I am OK with adding a small shim structure here:
either rte_crypto_cpu_sym_session { void *ses; struct rte_crypto_cpu_sym_session_ops ops; }
or rte_crypto_cpu_sym_session { void *ses; struct rte_crypto_cpu_sym_session_ops *ops; }
and merge rte_crypto_cpu_sym_init() and rte_crypto_cpu_sym_get_ops() into one (init).
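A minimal sketch of the pointer variant of this shim, with init and ops lookup merged into one call. All names and signatures are hypothetical simplifications: a toy enum stands in for rte_crypto_sym_xform, the caller provides the private memory, and the XOR/NOT "algorithms" are placeholders, not cryptography. The point it illustrates is that the per-session function is chosen once at init, so the data path is a single indirect call with no dev_id and no per-call algorithm branch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy algorithms standing in for real xforms. */
enum toy_algo { TOY_ALGO_XOR, TOY_ALGO_NOT };

struct cpu_sym_session_ops {
	void (*process)(void *ses, uint8_t *buf, size_t len);
};

/* The shim: opaque PMD state plus a pointer to its ops. */
struct cpu_sym_session {
	void *ses;
	const struct cpu_sym_session_ops *ops;
};

struct toy_priv {
	uint8_t key;
};

static void xor_process(void *ses, uint8_t *buf, size_t len)
{
	const struct toy_priv *p = ses;

	for (size_t i = 0; i < len; i++)
		buf[i] ^= p->key;
}

static void not_process(void *ses, uint8_t *buf, size_t len)
{
	(void)ses; /* this toy algorithm needs no per-session state */
	for (size_t i = 0; i < len; i++)
		buf[i] = (uint8_t)~buf[i];
}

static const struct cpu_sym_session_ops xor_ops = { xor_process };
static const struct cpu_sym_session_ops not_ops = { not_process };

/* Merged init: fills the private state AND picks the ops in one call. */
int cpu_sym_session_init(struct cpu_sym_session *s, struct toy_priv *priv,
			 enum toy_algo algo, uint8_t key)
{
	switch (algo) {
	case TOY_ALGO_XOR:
		s->ops = &xor_ops;
		break;
	case TOY_ALGO_NOT:
		s->ops = &not_ops;
		break;
	default:
		return -1;
	}
	priv->key = key;
	s->ses = priv;
	return 0;
}

/* Data path: one indirect call, no dev_id. */
static inline void cpu_sym_process(const struct cpu_sym_session *s,
				   uint8_t *buf, size_t len)
{
	s->ops->process(s->ses, buf, len);
}
```

Embedding the ops by value (the first variant) would save one dependent load per call at the cost of a larger session struct; the pointer variant lets many sessions share one const ops table.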

> 
> 
> 	4) No mandatory mempools for private sessions. The user can allocate memory for a cpu-crypto
> 	    session whenever they like.
> 
> [Akhil] You mean session private data?

Yes.

> You would need that memory anyway; the user will be
> allocating it already. You do not need to manage that.

What I am saying is that right now the user has no choice but to allocate it via a mempool,
which is probably not the best option for all cases.

> 
> 	Disadvantages:
> 	5) Extra changes in control path
> 	6) User has to store session_ops pointer explicitly.
> 
> [Akhil] More disadvantages:
> - All supporting PMDs will need to maintain TWO types of session for the
> same crypto processing. If a fix or a new feature (or algo) is added, the PMD owner
> will need to add code to both session create APIs, hence more maintenance and
> more error-prone.

I think the majority of the code for both paths will be common; plus, even if we reuse the current sym_session_init(),
changes in the PMD session_init() code will be unavoidable.
But yes, it will be a new entry in devops that the PMD will have to support.
OK to add it as 7) to the list.

> - Stacks which will be using these new APIs will also need to maintain two
> code paths for the same processing while doing session initialization
> for sync and async.

That's the same as #5 above, I think.

> 
> 
>      b) Your suggestion - reuse existing rte_cryptodev_sym_session_init() and existing rte_cryptodev_sym_session
>       structure.
> 	Advantages:
> 	1) allows reusing the same struct and init/create/clear() functions.
> 	    Probably less changes in control path.
> 	Disadvantages:
> 	2) rte_cryptodev_sym_session.sess_data[] is indexed by driver_id, which means that
> 	    we can't use the same rte_cryptodev_sym_session to hold private session pointers
> 	    for both sync and async mode for the same device.
> 	    So the only option we have is to make PMD devops->sym_session_configure()
> 	    always create a session that can work in both cpu and lksd modes.
> 	    For some implementations that would probably mean that under the hood the PMD would create
> 	    2 different session structs (sync/async) and then use one or the other depending on which API it was called from.
> 	    Seems doable, but...:
> 	    - it will contradict the statement from 1:
> 	      "New fields will be optional and would be used by PMD only when cpu-crypto session is requested."
> 	      Now it becomes mandatory for all apps to specify cpu-crypto related parameters too,
> 	      even if they don't plan to use that mode, i.e. a behavior change and existing app changes.
> 	    - it might cause extra space overhead.
> 
> [Akhil] It will not contradict #1. PMDs which support this mode will only need a few checks in session init
> to find the appropriate values and set the appropriate process() in the session.
> The user should be able to call the legacy enq-deq as well as the new process() without any issue.
> The user will be able to change the datapath at runtime.
> So this is not a disadvantage; it would be additional flexibility for the user.

OK, but that's what I am saying: if the PMD would *always* have to create a session that can handle
both modes (sync/async), then the user would *always* have to provide parameters for both modes too.
Otherwise, if, let's say, the user didn't set up sync-specific parameters at all, what should the PMD do?
  - return with an error?
  - init a session that can be used with the async path only?
My current assumption is #1.
If #2, then how will the user be able to distinguish whether the session is valid for both modes or only for one?


> 
> 
> 	3) not possible to store device-specific (not driver-specific) data within the session, but I think it is not really needed right now.
> 	    So probably minor compared to 2.b.2.
> 
> [Akhil] So let's omit this from the current discussion. I hope we can find some way to deal with it.

I don't think there is an easy way to fix that with the existing API.

> 
> 
> Actually #3 follows from #2, but I decided to keep them separate.
> 
> 3. process() parameters/behavior
>     a) Our suggestion: user stores ptr to session ops (or to (*process) itself) and just does:
>         session_ops->process(sess, ...);
> 	Advantages:
> 	1) fastest possible execution path
> 	2) no need to carry dev_id on the data path
> 
> [Akhil] I don't see any overhead in carrying dev_id; at least it would be in line with the
> current DPDK methodology.

If we add process() into rte_cryptodev itself (as we have for enqueue_burst/dequeue_burst),
then it will be an ABI breakage.
Also, there are discussions about getting rid of that approach completely:
http://mails.dpdk.org/archives/dev/2019-September/144674.html
So I am not sure this is the recommended way these days.

> What you are suggesting is a new way to get things done without much benefit.

It would help with ABI stability plus better performance; isn't that enough?

> Also, I don't see any performance difference, as the crypto workload is heavier than
> the code cycles, so that won't matter.

It depends.
Suppose a function call costs you ~30 cycles.
If you have a burst of big packets (let's say crypto for each takes ~2K cycles) that belong
to the same session, then yes, you wouldn't notice these extra 30 cycles at all.
If you have a burst of small packets (let's say crypto for each takes ~300 cycles) and each
belongs to a different session, then it will cost you ~10% extra.
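The ~10% figure is a ratio of extra dispatch cycles to useful crypto cycles. The sketch below only reproduces that arithmetic: the 30/300/2K cycle figures are the illustrative assumptions from the paragraph above, and the burst size of 32 is an additional assumption, not a measurement.

```c
#include <assert.h>

/*
 * Extra dispatch cost relative to useful crypto work, in percent.
 *   extra_cycles : cycles added per dispatch (e.g. one extra call)
 *   dispatches   : how many times that cost is paid for the burst
 *   pkts         : packets in the burst
 *   crypto_cycles: crypto cost per packet
 */
double dispatch_overhead_pct(double extra_cycles, unsigned dispatches,
			     unsigned pkts, double crypto_cycles)
{
	return 100.0 * extra_cycles * dispatches / (pkts * crypto_cycles);
}
```

With 32 small packets each needing its own dispatch, 30 × 32 / (32 × 300) gives 10%. For big packets the per-packet worst case is 30 / 2000 = 1.5%, and if one dispatch covers a same-session burst of 32 it drops to about 0.05%.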

> So IMO, there is no advantage in your suggestion either.
> 
> 
> 	Disadvantages:
> 	3) the user has to carry the session_ops pointer explicitly
>     b) Your suggestion: add (*cpu_process) inside rte_cryptodev_ops and then:
>         rte_crypto_cpu_sym_process(uint8_t dev_id, struct rte_cryptodev_sym_session *sess, /* data parameters */) {...
>                      rte_cryptodevs[dev_id].dev_ops->cpu_process(sess, ...);
>                      /* and then inside the PMD-specific process: */
>                      pmd_private_session = sess->sess_data[this_pmd_driver_id].data;
>                      /* and then most likely either */
>                      pmd_private_session->process(pmd_private_session, ...);
>                      /* or jump based on session/input data */
> 	Advantages:
> 	1) don't see any...
> 	Disadvantages:
> 	2) the user has to carry dev_id inside the data path
> 	3) Extra level of indirection (plus data dependency) - both for data and instructions.
> 	    Possible slowdown compared to a) (not measured).
> 
> Having said all this, if the disagreements cannot be resolved, you can go for a PMD-specific API
> for your PMDs,

I don't think it is a good idea.
A PMD-specific API is a sort of deprecated path; also, there is no clean way to use it within the libraries.

> because as per my understanding the solution doesn't look scalable to other PMDs.
> Your approach is aligned only with Intel and will not benefit others like openssl, which is used by all
> vendors.

I feel quite the opposite: from my perspective, the majority of SW-backed PMDs will benefit from it.
And I don't see anything Intel-specific in my proposals above.
About the openssl PMD: I am not an expert here, but looking at the code, I think it will fit really well.
Look for yourself at its internal functions: process_openssl_auth_op/process_openssl_crypto_op.
I think they do exactly the same: they use a sync API underneath, and they are session based
(AFAIK you don't need any device/queue data; everything needed for crypto/auth is stored inside the session).

Konstantin
  
Akhil Goyal Oct. 22, 2019, 1:31 p.m. UTC | #34
Hi Konstantin,
> 
> 
> Hi Akhil,
> 
> 
> > Added my comments inline with your draft.
> > [snip]..
> >
> > >
> > > Ok, then my suggestion:
> > > Let's at least write down all points about crypto-dev approach where we
> > > disagree and then probably try to resolve them one by one....
> > > If we fail to make an agreement/progress in next week or so,
> > > (and no more reviews from the community)
> > > we will have to bring that subject to the TB meeting to decide.
> > > Sounds fair to you?
> > Agreed
> > >
> > > List is below.
> > > Please add/correct me, if I missed something.
> > >
> > > Konstantin
> >
> > Before going into comparison, we should define the requirement as well.
> 
> Good point.
> 
> > What I understood from the patchset,
> > "You need a synchronous API to perform crypto operations on raw data using SW PMDs"
> > So,
> > - no crypto-ops,
> > - no separate enq-deq, only a single process API for the data path
> > - Do not need any value addition to the session parameters.
> >   (You would need some parameters from the crypto-op which
> >    are constant per session, and since you won't use crypto-ops,
> >    you need some place to store them)
> 
> Yes, this is correct, I think.
> 
> >
> > Now, as per your mail, the comparison:
> > 1. extra input parameters to create/init rte_(cpu)_sym_session.
> >
> > We will leverage the existing 6B gap inside rte_crypto_*_xform between the 'algo' and 'key' fields.
> > New fields will be optional and would be used by the PMD only when a cpu-crypto session is requested.
> > For an lksd-crypto session, the PMD is free to ignore these fields.
> > No ABI breakage is required.
> >
> > [Akhil] Agreed, no issues.
> >
> > 2. cpu-crypto create/init.
> >     a) Our suggestion - introduce a new API for that:
> >         - rte_crypto_cpu_sym_init() that would init a completely opaque rte_crypto_cpu_sym_session.
> >         - struct rte_crypto_cpu_sym_session_ops {(*process)(...); (*clear)(...); /* whatever else we'll need */};
> >         - rte_crypto_cpu_sym_get_ops(const struct rte_crypto_sym_xform *xforms)
> >           that would return a const struct rte_crypto_cpu_sym_session_ops * based on the input xforms.
> > 	Advantages:
> > 	1) totally opaque data structure (no ABI breakages in future); the PMD writer is totally free
> > 	     with its format and contents.
> >
> > [Akhil] It will have breakage at some point till we don't hit the union size.
> 
> Not sure what union you are talking about?

The union of xforms in rte_security_session_conf.

> 
> > Rather, I don't expect more parameters to be added.
> > Or do we really care about ABI breakage when the argument is about
> > the correct place to add a piece of code? Do we really agree to add code
> > anywhere just to avoid that breakage?
> 
> I am talking about maintaining it in the future.
> If your struct is not visible externally, there is no chance of introducing an ABI breakage.
> 
> >
> > 	2) each session entity is self-contained; the user doesn't need to bring along dev_id etc.
> > 	    dev_id is needed only at the init stage; after that the user will use session ops to perform
> > 	    all operations on that session (process(), clear(), etc.).
> >
> > [Akhil] There is no such thing as session ops in current DPDK.
> 
> True, but it doesn't mean we can't/shouldn't have it.

We can have it if it is not adding complexity for the user. Creating 2 different code
paths for the user is not desirable for stack developers.

> 
> > What you are proposing
> > is a new concept which doesn't have any extra benefit; rather, it adds complexity
> > by having two different code paths for session creation.
> >
> >
> > 	3) the user can decide whether to store the ops[] pointer on a per-session basis,
> > 	    or per group of identical sessions, or...
> >
> > [Akhil] Will the user really care which process API should be called from the PMD?
> > Rather, it should be the driver's responsibility to store that in the session private data,
> > which would be opaque to the user. As per my suggestion, the same process function can
> > be added to multiple sessions, or a single session can be managed inside the PMD.
> 
> In that case we either need to have a function per session (stored internally),
> or make decisions (branches) at run time.
> But as I said in the other mail, I am OK with adding a small shim structure here:
> either rte_crypto_cpu_sym_session { void *ses; struct rte_crypto_cpu_sym_session_ops ops; }
> or rte_crypto_cpu_sym_session { void *ses; struct rte_crypto_cpu_sym_session_ops *ops; }
> and merge rte_crypto_cpu_sym_init() and rte_crypto_cpu_sym_get_ops() into one (init).

Again, that will be a separate API call from the user's perspective, which is not good.

> 
> >
> >
> > 	4) No mandatory mempools for private sessions. The user can allocate memory for a cpu-crypto
> > 	    session whenever they like.
> >
> > [Akhil] You mean session private data?
> 
> Yes.
> 
> > You would need that memory anyway; the user will be
> > allocating it already. You do not need to manage that.
> 
> What I am saying is that right now the user has no choice but to allocate it via a mempool,
> which is probably not the best option for all cases.
> 
> >
> > 	Disadvantages:
> > 	5) Extra changes in control path
> > 	6) User has to store session_ops pointer explicitly.
> >
> > [Akhil] More disadvantages:
> > - All supporting PMDs will need to maintain TWO types of session for the
> > same crypto processing. If a fix or a new feature (or algo) is added, the PMD owner
> > will need to add code to both session create APIs, hence more maintenance and
> > more error-prone.
> 
> I think the majority of the code for both paths will be common; plus, even if we reuse the current sym_session_init(),
> changes in the PMD session_init() code will be unavoidable.
> But yes, it will be a new entry in devops that the PMD will have to support.
> OK to add it as 7) to the list.
> 
> > - Stacks which will be using these new APIs will also need to maintain two
> > code paths for the same processing while doing session initialization
> > for sync and async.
> 
> That's the same as #5 above, I think.
> 
> >
> >
> >      b) Your suggestion - reuse existing rte_cryptodev_sym_session_init() and existing rte_cryptodev_sym_session
> >       structure.
> > 	Advantages:
> > 	1) allows reusing the same struct and init/create/clear() functions.
> > 	    Probably less changes in control path.
> > 	Disadvantages:
> > 	2) rte_cryptodev_sym_session.sess_data[] is indexed by driver_id, which means that
> > 	    we can't use the same rte_cryptodev_sym_session to hold private session pointers
> > 	    for both sync and async mode for the same device.
> > 	    So the only option we have is to make PMD devops->sym_session_configure()
> > 	    always create a session that can work in both cpu and lksd modes.
> > 	    For some implementations that would probably mean that under the hood the PMD would create
> > 	    2 different session structs (sync/async) and then use one or the other depending on which API it was called from.
> > 	    Seems doable, but...:
> > 	    - it will contradict the statement from 1:
> > 	      "New fields will be optional and would be used by PMD only when cpu-crypto session is requested."
> > 	      Now it becomes mandatory for all apps to specify cpu-crypto related parameters too,
> > 	      even if they don't plan to use that mode, i.e. a behavior change and existing app changes.
> > 	    - it might cause extra space overhead.
> >
> > [Akhil] It will not contradict #1. PMDs which support this mode will only need a few checks in session init
> > to find the appropriate values and set the appropriate process() in the session.
> > The user should be able to call the legacy enq-deq as well as the new process() without any issue.
> > The user will be able to change the datapath at runtime.
> > So this is not a disadvantage; it would be additional flexibility for the user.
> 
> OK, but that's what I am saying: if the PMD would *always* have to create a session that can handle
> both modes (sync/async), then the user would *always* have to provide parameters for both modes too.
> Otherwise, if, let's say, the user didn't set up sync-specific parameters at all, what should the PMD do?
>   - return with an error?
>   - init a session that can be used with the async path only?
> My current assumption is #1.
> If #2, then how will the user be able to distinguish whether the session is valid for both modes or only for one?

I would say a 3rd option: do nothing if sync params are not set.
Probably have a debug print in the PMD (which supports sync mode) to specify that the
session is not configured properly for sync mode.
Internally, the PMD will not store the process() API in the session priv data,
and when processing the first packet, devops->process will assert that the session
is not configured for sync mode. The session validation would be done in either case,
your suggestion or mine, so there is no extra overhead at runtime.
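A minimal sketch of this third option, assuming a hypothetical PMD whose session configure sets a process() pointer only when sync parameters are supplied. Where the proposal above uses an assert on the first sync call, this sketch returns an error code instead so the caller can observe the failure; all names (`toy_xform`, `sync_params_set`, `toy_session_configure()`, `toy_dev_process()`) are illustrative, and the XOR step is a placeholder, not real crypto.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical xform: a toy key plus a flag saying whether the
 * (made-up) sync/cpu-crypto parameters were filled in. */
struct toy_xform {
	uint8_t key;
	bool sync_params_set;
};

struct toy_priv_sess {
	uint8_t key;
	/* NULL when the session was not configured for sync mode. */
	int (*process)(struct toy_priv_sess *s, uint8_t *buf, size_t len);
};

static int toy_sync_process(struct toy_priv_sess *s, uint8_t *buf, size_t len)
{
	for (size_t i = 0; i < len; i++)
		buf[i] ^= s->key; /* placeholder, not real crypto */
	return 0;
}

/* Session configure: async setup always succeeds; sync is optional. */
int toy_session_configure(struct toy_priv_sess *s, const struct toy_xform *xf)
{
	s->key = xf->key;
	if (xf->sync_params_set) {
		s->process = toy_sync_process;
	} else {
		s->process = NULL;
		fprintf(stderr, "debug: session not configured for sync mode\n");
	}
	return 0;
}

/* Sync entry point: validates the session before processing. */
int toy_dev_process(struct toy_priv_sess *s, uint8_t *buf, size_t len)
{
	if (s->process == NULL)
		return -1;
	return s->process(s, buf, len);
}
```

In this sketch `toy_session_configure()` succeeds for both kinds of session; only the call to the sync entry point distinguishes them, and the NULL check it performs is the per-call validation both proposals would need in some form.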

> 
> 
> >
> >
> > 	3) not possible to store device-specific (not driver-specific) data within the session, but I think it is not really needed right now.
> > 	    So probably minor compared to 2.b.2.
> >
> > [Akhil] So let's omit this from the current discussion. I hope we can find some way to deal with it.
> 
> I don't think there is an easy way to fix that with the existing API.
> 
> >
> >
> > Actually #3 follows from #2, but I decided to keep them separate.
> >
> > 3. process() parameters/behavior
> >     a) Our suggestion: user stores ptr to session ops (or to (*process) itself) and just does:
> >         session_ops->process(sess, ...);
> > 	Advantages:
> > 	1) fastest possible execution path
> > 	2) no need to carry dev_id on the data path
> >
> > [Akhil] I don't see any overhead in carrying dev_id; at least it would be in line with the
> > current DPDK methodology.
> 
> If we add process() into rte_cryptodev itself (as we have for enqueue_burst/dequeue_burst),
> then it will be an ABI breakage.
> Also, there are discussions about getting rid of that approach completely:
> http://mails.dpdk.org/archives/dev/2019-September/144674.html
> So I am not sure this is the recommended way these days.

We can have it either in rte_cryptodev or in rte_cryptodev_ops, whichever
is good for you.

Whether it is an ABI breakage or not, as per your requirements this is the correct
approach. Do you agree with this or not?

Now, handling the API/ABI breakage is a separate story. In the 19.11 release we
are not much concerned about ABI breakages; this was discussed in the
community. So adding a new dev_ops wouldn't have been an issue.
Now, since we are so close to the RC1 deadline, we should come up with some
other solution for the next release. Maybe have a PMD API in 20.02 and
convert it into a formal one in 20.11.


> 
> > What you are suggesting is a new way to get things done without much benefit.
> 
> It would help with ABI stability plus better performance; isn't that enough?
> 
> > Also, I don't see any performance difference, as the crypto workload is heavier than
> > the code cycles, so that won't matter.
> 
> It depends.
> Suppose a function call costs you ~30 cycles.
> If you have a burst of big packets (let's say crypto for each takes ~2K cycles) that belong
> to the same session, then yes, you wouldn't notice these extra 30 cycles at all.
> If you have a burst of small packets (let's say crypto for each takes ~300 cycles) and each
> belongs to a different session, then it will cost you ~10% extra.

Let us do some profiling on openssl with both approaches and find out the
difference.

> 
> > So IMO, there is no advantage in your suggestion either.
> >
> >
> > 	Disadvantages:
> > 	3) the user has to carry the session_ops pointer explicitly
> >     b) Your suggestion: add (*cpu_process) inside rte_cryptodev_ops and then:
> >         rte_crypto_cpu_sym_process(uint8_t dev_id, struct rte_cryptodev_sym_session *sess, /* data parameters */) {...
> >                      rte_cryptodevs[dev_id].dev_ops->cpu_process(sess, ...);
> >                      /* and then inside the PMD-specific process: */
> >                      pmd_private_session = sess->sess_data[this_pmd_driver_id].data;
> >                      /* and then most likely either */
> >                      pmd_private_session->process(pmd_private_session, ...);
> >                      /* or jump based on session/input data */
> > 	Advantages:
> > 	1) don't see any...
> > 	Disadvantages:
> > 	2) the user has to carry dev_id inside the data path
> > 	3) Extra level of indirection (plus data dependency) - both for data and instructions.
> > 	    Possible slowdown compared to a) (not measured).
> >
> > Having said all this, if the disagreements cannot be resolved, you can go for a PMD-specific API
> > for your PMDs,
> 
> I don't think it is a good idea.
> A PMD-specific API is a sort of deprecated path; also, there is no clean way to use it within the libraries.

I know that this is a deprecated path; we can use it while we are not allowed
to break ABI/API.

> 
> > because as per my understanding the solution doesn't look scalable to other PMDs.
> > Your approach is aligned only to Intel , will not benefit others like openssl which is used by all
> > vendors.
> 
> I feel quite opposite, from my perspective majority of SW backed PMDs will
> benefit from it.
> And I don't see anything Intel specific in my proposals above.
> About openssl PMD: I am not an expert here, but looking at the code, I think it
> will fit really well.
> Look yourself at its internal functions:
> process_openssl_auth_op/process_openssl_crypto_op,
> I think they doing exactly the same - they use sync API underneath, and they are
> session based
> (AFAIK you don't need any device/queue data, everything that needed for
> crypto/auth is stored inside session).
> 
By vendor specific, I mean:
- no PMD would like to have two different variants of the session init API doing the same thing.
- stacks will become vendor specific if they use two separate session create APIs. No stack would
like to support two variants of session create, one for HW PMDs and one for SW PMDs.

-Akhil
  
Ananyev, Konstantin Oct. 22, 2019, 5:44 p.m. UTC | #35
Hi Akhil,


> > > Added my comments inline with your draft.
> > > [snip]..
> > >
> > > >
> > > > Ok, then my suggestion:
> > > > Let's at least write down all points about crypto-dev approach where we
> > > > disagree and then probably try to resolve them one by one....
> > > > If we fail to make an agreement/progress in next week or so,
> > > > (and no more reviews from the community)
> > > > will have bring that subject to TB meeting to decide.
> > > > Sounds fair to you?
> > > Agreed
> > > >
> > > > List is below.
> > > > Please add/correct me, if I missed something.
> > > >
> > > > Konstantin
> > >
> > > Before going into comparison, we should define the requirement as well.
> >
> > Good point.
> >
> > > What I understood from the patchset,
> > > "You need a synchronous API to perform crypto operations on raw data using
> > SW PMDs"
> > > So,
> > > - no crypto-ops,
> > > - no separate enq-deq, only single process API for data path
> > > - Do not need any value addition to the session parameters.
> > >   (You would need some parameters from the crypto-op which
> > >    Are constant per session and since you wont use crypto-op,
> > >    You need some place to store that)
> >
> > Yes, this is correct, I think.
> >
> > >
> > > Now as per your mail, the comparison
> > > 1. extra input parameters to create/init rte_(cpu)_sym_session.
> > >
> > > Will leverage existing 6B gap inside rte_crypto_*_xform between 'algo' and
> > 'key' fields.
> > > New fields will be optional and would be used by PMD only when cpu-crypto
> > session is requested.
> > > For lksd-crypto session PMD is free to ignore these fields.
> > > No ABI breakage is required.
> > >
> > > [Akhil] Agreed, no issues.
> > >
> > > 2. cpu-crypto create/init.
> > >     a) Our suggestion - introduce new API for that:
> > >         - rte_crypto_cpu_sym_init() that would init completely opaque rte_crypto_cpu_sym_session.
> > >         - struct rte_crypto_cpu_sym_session_ops {(*process)(...); (*clear); /*whatever else we'll need */};
> > >         - rte_crypto_cpu_sym_get_ops(const struct rte_crypto_sym_xform *xforms)
> > >           that would return const struct rte_crypto_cpu_sym_session_ops * based on input xforms.
> > > 	Advantages:
> > > 	1)  totally opaque data structure (no ABI breakages in future), PMD
> > writer is totally free
> > > 	     with it format and contents.
> > >
> > > [Akhil] It will have breakage at some point till we don't hit the union size.
> >
> > Not sure, what union you are talking about?
> 
> Union of xforms in rte_security_session_conf

Hmm, how does it relate here?
I thought we were discussing pure rte_cryptodev_sym_session, no?

> 
> >
> > > Rather I don't suspect there will be more parameters added.
> > > Or do we really care about the ABI breakage when the argument is about
> > > the correct place to add a piece of code or do we really agree to add code
> > > anywhere just to avoid that breakage.
> >
> > I am talking about maintaining it in future.
> > if your struct is not seen externally, no chances to introduce ABI breakage.
> >
> > >
> > > 	2) each session entity is self-contained, user doesn't need to bring along
> > dev_id etc.
> > > 	    dev_id is needed  only at init stage, after that user will use session ops
> > to perform
> > > 	    all operations on that session (process(), clear(), etc.).
> > >
> > > [Akhil] There is nothing called as session ops in current DPDK.
> >
> > True, but it doesn't mean we can't/shouldn't have it.
> 
> We can have it if it is not adding complexity for the user. Creating 2 different code
> Paths for user is not desirable for the stack developers.
> 
> >
> > > What you are proposing
> > > is a new concept which doesn't have any extra benefit, rather it is adding
> > complexity
> > > to have two different code paths for session create.
> > >
> > >
> > > 	3) User can decide does he wants to store ops[] pointer on a per session
> > basis,
> > > 	    or on a per group of same sessions, or...
> > >
> > > [Akhil] Will the user really care which process API should be called from the
> > PMD.
> > > Rather it should be driver's responsibility to store that in the session private
> > data
> > > which would be opaque to the user. As per my suggestion same process
> > function can
> > > be added to multiple sessions or a single session can be managed inside the
> > PMD.
> >
> > In that case we either need to have a function per session (stored internally),
> > or make decision (branches) at run-time.
> > But as I said in other mail - I am ok to add small shim structure here:
> > either rte_crypto_cpu_sym_session { void *ses; struct rte_crypto_cpu_sym_session_ops ops; }
> > or rte_crypto_cpu_sym_session { void *ses; struct rte_crypto_cpu_sym_session_ops *ops; }
> > And merge rte_crypto_cpu_sym_init() and rte_crypto_cpu_sym_get_ops() into
> > one (init).
> 
> Again that will be a separate API call from the user perspective which is not good.
> 
> >
> > >
> > >
> > > 	4) No mandatory mempools for private sessions. User can allocate
> > memory for cpu-crypto
> > > 	    session whenever he likes.
> > >
> > > [Akhil] you mean session private data?
> >
> > Yes.
> >
> > > You would need that memory anyways, user will be
> > > allocating that already.  You do not need to manage that.
> >
> > What I am saying - right now user has no choice but to allocate it via mempool.
> > Which is probably not the best options for all cases.
> >
> > >
> > > 	Disadvantages:
> > > 	5) Extra changes in control path
> > > 	6) User has to store session_ops pointer explicitly.
> > >
> > > [Akhil] More disadvantages:
> > > - All supporting PMDs will need to maintain TWO types of session for the
> > > same crypto processing. Suppose a fix or a new feature(or algo) is added, PMD
> > owner
> > > will need to add code in both the session create APIs. Hence more
> > maintenance and
> > > error prone.
> >
> > I think majority of code for both paths will be common, plus even we'll reuse
> > current sym_session_init() -
> > changes in PMD session_init() code will be unavoidable.
> > But yes, it will be new entry in devops, that PMD will have to support.
> > Ok to add it as 7) to the list.
> >
> > > - Stacks which will be using these new APIs also need to maintain two
> > > code path for the same processing while doing session initialization
> > > for sync and async
> >
> > That's the same as #5 above, I think.
> >
> > >
> > >
> > >      b) Your suggestion - reuse existing rte_cryptodev_sym_session_init() and
> > existing rte_cryptodev_sym_session
> > >       structure.
> > > 	Advantages:
> > > 	1) allows to reuse same struct and init/create/clear() functions.
> > > 	    Probably less changes in control path.
> > > 	Disadvantages:
> > > 	2) rte_cryptodev_sym_session. sess_data[] is indexed by driver_id,
> > which means that
> > > 	    we can't use the same rte_cryptodev_sym_session to hold private
> > sessions pointers
> > > 	    for both sync and async mode  for the same device.
> > >                    So the only option we have - make PMD devops->sym_session_configure()
> > > 	    always create a session that can work in both cpu and lksd modes.
> > > 	    For some implementations that would probably mean that under the
> > hood  PMD would create
> > > 	    2 different session structs (sync/async) and then use one or another
> > depending on from what API been called.
> > > 	    Seems doable, but ...:
> > >                    - will contradict with statement from 1:
> > > 	      " New fields will be optional and would be used by PMD only when
> > cpu-crypto session is requested."
> > >                       Now it becomes mandatory for all apps to specify cpu-crypto
> > related parameters too,
> > > 	       even if they don't plan to use that mode - i.e. behavior change,
> > existing app change.
> > >                      - might cause extra space overhead.
> > >
> > > [Akhil] It will not contradict with #1, you will only have few checks in the
> > session init PMD
> > > Which support this mode, find appropriate values and set the appropriate
> > process() in it.
> > > User should be able to call, legacy enq-deq as well as the new process()
> > without any issue.
> > > User would be at runtime will be able to change the datapath.
> > > So this is not a disadvantage, it would be additional flexibility for the user.
> >
> > Ok, but that's what I am saying - if PMD would *always* have to create a
> > session that can handle
> > both modes (sync/async), then user would *always* have to provide parameters
> > for both modes too.
> > Otherwise if let say user didn't setup sync specific parameters at all, what PMD
> > should do?
> >   - return with error?
> >   - init session that can be used with async path only?
> > My current assumption is #1.
> > If #2, then how user will be able to distinguish is that session valid for both
> > modes, or only for one?
> 
> I would say a 3rd option, do nothing if sync params are not set.
> Probably have a debug print in the PMD(which support sync mode) to specify that
> session is not configured properly for sync mode.

So, just print a warning and proceed with initializing a session that can be used with the async path only?
Then it sounds the same as #2 above,
which actually means that sync-mode parameters for sym_session_init() become optional.
Then we need an API that tells the user which modes
(sync+async or async only) are supported by that session for a given dev_id.
The user would have to query/retain this information on the control path
and store it somewhere in user space together with the session pointer and dev_ids,
to use later on the data path (same as we do now for the session type).
That definitely requires changes in the control path to start using it.
Plus, the fact that this value can differ between dev_ids for the same session
doesn't make things easier here.
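A minimal sketch of the user-side bookkeeping described above, assuming a hypothetical per-dev_id query that reports which modes a session supports. None of these names exist in DPDK; they only illustrate that the result has to be stored per (session, dev_id) pair and checked again on the data path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mode flags; not part of any DPDK release. */
#define SESS_MODE_ASYNC (1u << 0)
#define SESS_MODE_SYNC  (1u << 1)

/*
 * Hypothetical application-side record filled on the control path.
 * The same session pointer may appear in several entries, because the
 * supported modes can differ from one dev_id to another.
 */
struct app_sess_entry {
	void *sess;      /* opaque session handle */
	uint8_t dev_id;  /* device the session was initialised for */
	uint32_t modes;  /* result of a hypothetical per-dev_id mode query */
};

/* Data-path check: take the sync path only if this (session, dev_id)
 * pair was reported to support it; otherwise fall back to enqueue/dequeue. */
static bool use_sync_path(const struct app_sess_entry *e)
{
	return (e->modes & SESS_MODE_SYNC) != 0;
}
```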

> Internally the PMD will not store the process() API in the session priv data
> And while calling the first packet, devops->process will give an assert that session
> Is not configured for sync mode. The session validation would be done in any case
> your suggestion or mine. So no extra overhead at runtime.

I believe that after session_init() the user should get either an error or
a valid session handle that can be used at runtime.
Pushing session validation to runtime doesn't seem like a good idea.
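The fail-fast behaviour argued for above can be sketched as follows. All types and functions here are hypothetical: init either returns -EINVAL or leaves a session whose process pointer is always set, so the data path never has to validate it. The alternative discussed earlier (no process() stored, assert on the first packet) would move that check into the data path instead.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sync-mode parameters supplied at session init. */
struct sync_params {
	uint16_t iv_offset;  /* illustrative field only */
	int valid;           /* non-zero when the caller filled them in */
};

typedef int (*process_fn)(void *priv, void **bufs, unsigned int n);

/* Hypothetical PMD-private session: process is never NULL after a
 * successful init. */
struct priv_sess {
	process_fn process;
	uint16_t iv_offset;
};

/* Stand-in for real crypto work: reports all n buffers as processed. */
static int dummy_process(void *priv, void **bufs, unsigned int n)
{
	(void)priv;
	(void)bufs;
	return (int)n;
}

/* Fail-fast init: return an error, or leave a fully usable session, so the
 * data path can call s->process() without a validity check. */
static int sess_init_sync(struct priv_sess *s, const struct sync_params *p)
{
	if (s == NULL || p == NULL || !p->valid)
		return -EINVAL;
	s->iv_offset = p->iv_offset;
	s->process = dummy_process;
	return 0;
}
```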

> 
> >
> >
> > >
> > >
> > > 	3) not possible to store device (not driver) specific data within the
> > session, but I think it is not really needed right now.
> > > 	    So probably minor compared to 2.b.2.
> > >
> > > [Akhil] So lets omit this for current discussion. And I hope we can find some
> > way to deal with it.
> >
> > I don't think there is an easy way to fix that with existing API.
> >
> > >
> > >
> > > Actually #3 follows from #2, but decided to have them separated.
> > >
> > > 3. process() parameters/behavior
> > >     a) Our suggestion: user stores ptr to session ops (or to (*process) itself) and
> > just does:
> > >         session_ops->process(sess, ...);
> > > 	Advantages:
> > > 	1) fastest possible execution path
> > > 	2) no need to carry on dev_id for data-path
> > >
> > > [Akhil] I don't see any overhead of carrying dev id, at least it would be inline
> > with the
> > > current DPDK methodology.
> >
> > If we'll add process() into rte_cryptodev itself (same as we have
> > enqueue_burst/dequeue_burst),
> > then it will be an ABI breakage.
> > Also there are discussions to get rid of that approach completely:
> > http://mails.dpdk.org/archives/dev/2019-September/144674.html
> > So I am not sure this is a recommended way these days.
> 
> We can either have it in rte_cryptodev or in rte_cryptodev_ops whichever
> is good for you.
> 
> Whether it is ABI breakage or not, as per your requirements, this is the correct
> approach. Do you agree with this or not?

I think it is a possible approach, but not the best one:
it looks quite flaky to me (see all the uncertainty with sym_session_init above),
plus it introduces extra overhead on the data path.
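To make the data-path difference between 3a and 3b concrete, here is a minimal sketch with hypothetical structure names, loosely modelled on the proposals in this thread rather than on any released DPDK API. In (a) the user holds a shim { ses, ops } and makes one indirect call; in (b) the call goes dev_id -> device ops -> sess_data[driver_id] -> PMD routine, adding dependent loads before the crypto work starts. No performance figures are implied.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_DRIVERS 4 /* hypothetical upper bound on driver ids */

/* (a) session-ops approach: hypothetical shim { ses, ops } */
typedef uint32_t (*cpu_process_t)(void *priv, uint32_t n);

struct cpu_sym_session_ops {
	cpu_process_t process;
};

struct cpu_sym_session {
	void *ses;                             /* PMD-private, opaque */
	const struct cpu_sym_session_ops *ops; /* set once at init */
};

/* (b) device-ops approach: hypothetical mirror of the session layout */
struct sym_session {
	struct { void *data; } sess_data[MAX_DRIVERS]; /* indexed by driver_id */
};

struct dev_ops {
	uint32_t (*cpu_process)(struct sym_session *s, uint8_t driver_id, uint32_t n);
};

struct dev {
	uint8_t driver_id;
	const struct dev_ops *dev_ops;
};

/* Stand-in for the PMD's crypto routine: reports n buffers processed. */
static uint32_t pmd_process(void *priv, uint32_t n)
{
	(void)priv;
	return n;
}

/* (b) PMD entry point: must first look up its private session. */
static uint32_t pmd_dev_cpu_process(struct sym_session *s, uint8_t driver_id, uint32_t n)
{
	void *priv = s->sess_data[driver_id].data; /* extra dependent load */

	return pmd_process(priv, n);
}

static const struct cpu_sym_session_ops pmd_ops = { pmd_process };
static const struct dev_ops pmd_dev_ops = { pmd_dev_cpu_process };
static const struct dev devs[] = { { 0, &pmd_dev_ops } };

/* (a): one indirect call through the session's own ops pointer. */
static uint32_t process_a(const struct cpu_sym_session *s, uint32_t n)
{
	return s->ops->process(s->ses, n);
}

/* (b): dev_id -> device ops -> sess_data[driver_id] -> PMD routine. */
static uint32_t process_b(uint8_t dev_id, struct sym_session *s, uint32_t n)
{
	const struct dev *d = &devs[dev_id];

	return d->dev_ops->cpu_process(s, d->driver_id, n);
}
```

In a real PMD, path (b) would likely add one more jump (to a per-session routine or a switch on session data), as the quoted pseudocode notes; the sketch leaves that out.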

> 
> Now handling the API/ABI breakage is a separate story. In 19.11 release we
> Are not much concerned about the ABI breakages, this was discussed in
> community. So adding a new dev_ops wouldn't have been an issue.
> Now since we are so close to RC1 deadline, we should come up with some
> other solution for next release. May be having a pmd API in 20.02 and
> converting it into formal one in 20.11
> 
> 
> >
> > > What you are suggesting is a new way to get the things done without much
> > benefit.
> >
> > Would help with ABI stability plus better performance, isn't it enough?
> >
> > > Also I don't see any performance difference as crypto workload is heavier than
> > > Code cycles, so that wont matter.
> >
> > It depends.
> > Suppose function call costs you ~30 cycles.
> > If you have burst of big packets (let say crypto for each will take ~2K cycles) that
> > belong
> > to the same session, then yes you wouldn't notice these extra 30 cycles at all.
> > If you have burst of small packets (let say crypto for each will take ~300 cycles)
> > each
> > belongs to different session, then it will cost you ~10% extra.
> 
> Let us do some profiling on openssl with both the approaches and find out the
> difference.
> 
> >
> > > So IMO, there is no advantage in your suggestion as well.
> > >
> > >
> > > 	Disadvantages:
> > > 	3) user has to carry on session_ops pointer explicitly
> > >     b) Your suggestion: add  (*cpu_process) inside rte_cryptodev_ops and then:
> > >         rte_crypto_cpu_sym_process(uint8_t dev_id, rte_cryptodev_sym_session *sess, /*data parameters*/) {...
> > >                      rte_cryptodevs[dev_id].dev_ops->cpu_process(ses, ...);
> > >                       /*and then inside PMD specifc process: */
> > >                      pmd_private_session = sess->sess_data[this_pmd_driver_id].data;
> > >                      /* and then most likely either */
> > >                      pmd_private_session->process(pmd_private_session, ...);
> > >                      /* or jump based on session/input data */
> > > 	Advantages:
> > > 	1) don't see any...
> > > 	Disadvantages:
> > > 	2) User has to carry on dev_id inside data-path
> > > 	3) Extra level of indirection (plus data dependency) - both for data and
> > instructions.
> > > 	    Possible slowdown compared to a) (not measured).
> > >
> > > Having said all this, if the disagreements cannot be resolved, you can go for a
> > pmd API specific
> > > to your PMDs,
> >
> > I don't think it is good idea.
> > PMD specific API is sort of deprecated path, also there is no clean way to use it
> > within the libraries.
> 
> I know that this is a deprecated path, we can use it until we are not allowed
> to break ABI/API
> 
> >
> > > because as per my understanding the solution doesn't look scalable to other
> > PMDs.
> > > Your approach is aligned only to Intel , will not benefit others like openssl
> > which is used by all
> > > vendors.
> >
> > I feel quite opposite, from my perspective majority of SW backed PMDs will
> > benefit from it.
> > And I don't see anything Intel specific in my proposals above.
> > About openssl PMD: I am not an expert here, but looking at the code, I think it
> > will fit really well.
> > Look yourself at its internal functions:
> > process_openssl_auth_op/process_openssl_crypto_op,
> > I think they doing exactly the same - they use sync API underneath, and they are
> > session based
> > (AFAIK you don't need any device/queue data, everything that needed for
> > crypto/auth is stored inside session).
> >
> By vendor specific, I mean,
> - no PMD would like to have 2 different variants of session Init APIs for doing the same stuff.
> - stacks will become vendor specific while using 2 separate session create APIs. No stack would
> Like to support 2 variants of session create- one for HW PMDs and one for SW PMDs.

I think what you refer to has nothing to do with 'vendor specific'.
I would call it 'extra overhead for PMD and stack writers'.
Yes, there is certainly extra overhead (as always with a new API)
for both the producer (PMD writer) and the consumer (stack writer):
new function(s) to support, probably more tests to create/run, etc.
Though this API is optional: if a PMD/stack maintainer doesn't see
value in it, they are free not to support it.
On the other hand, reusing rte_cryptodev_sym_session_init()
wouldn't help anyway: both the data path and the control path would differ
from async mode regardless.
BTW, right now, to support different HW flavors
we already have 4 different control and data paths in both
ipsec-secgw and librte_ipsec:
lksd-none/lksd-proto/inline-crypto/inline-proto.
And that is considered to be OK.
Honestly, I don't understand why SW-backed implementations
can't have their own path that suits them best.
Konstantin
  
Ananyev, Konstantin Oct. 22, 2019, 10:21 p.m. UTC | #36
> > > > Added my comments inline with your draft.
> > > > [snip]..
> > > >
> > > > >
> > > > > Ok, then my suggestion:
> > > > > Let's at least write down all points about crypto-dev approach where we
> > > > > disagree and then probably try to resolve them one by one....
> > > > > If we fail to make an agreement/progress in next week or so,
> > > > > (and no more reviews from the community)
> > > > > will have bring that subject to TB meeting to decide.
> > > > > Sounds fair to you?
> > > > Agreed
> > > > >
> > > > > List is below.
> > > > > Please add/correct me, if I missed something.
> > > > >
> > > > > Konstantin
> > > >
> > > > Before going into comparison, we should define the requirement as well.
> > >
> > > Good point.
> > >
> > > > What I understood from the patchset,
> > > > "You need a synchronous API to perform crypto operations on raw data using
> > > SW PMDs"
> > > > So,
> > > > - no crypto-ops,
> > > > - no separate enq-deq, only single process API for data path
> > > > - Do not need any value addition to the session parameters.
> > > >   (You would need some parameters from the crypto-op which
> > > >    Are constant per session and since you wont use crypto-op,
> > > >    You need some place to store that)
> > >
> > > Yes, this is correct, I think.
> > >
> > > >
> > > > Now as per your mail, the comparison
> > > > 1. extra input parameters to create/init rte_(cpu)_sym_session.
> > > >
> > > > Will leverage existing 6B gap inside rte_crypto_*_xform between 'algo' and
> > > 'key' fields.
> > > > New fields will be optional and would be used by PMD only when cpu-crypto
> > > session is requested.
> > > > For lksd-crypto session PMD is free to ignore these fields.
> > > > No ABI breakage is required.
> > > >
> > > > [Akhil] Agreed, no issues.
> > > >
> > > > 2. cpu-crypto create/init.
> > > >     a) Our suggestion - introduce new API for that:
> > > >         - rte_crypto_cpu_sym_init() that would init completely opaque rte_crypto_cpu_sym_session.
> > > >         - struct rte_crypto_cpu_sym_session_ops {(*process)(...); (*clear); /*whatever else we'll need */};
> > > >         - rte_crypto_cpu_sym_get_ops(const struct rte_crypto_sym_xform *xforms)
> > > >           that would return const struct rte_crypto_cpu_sym_session_ops * based on input xforms.
> > > > 	Advantages:
> > > > 	1)  totally opaque data structure (no ABI breakages in future), PMD
> > > writer is totally free
> > > > 	     with it format and contents.
> > > >
> > > > [Akhil] It will have breakage at some point till we don't hit the union size.
> > >
> > > Not sure, what union you are talking about?
> >
> > Union of xforms in rte_security_session_conf
> 
> Hmm, how does it relates here?
> I thought we discussing pure rte_cryptodev_sym_session, no?
> 
> >
> > >
> > > > Rather I don't suspect there will be more parameters added.
> > > > Or do we really care about the ABI breakage when the argument is about
> > > > the correct place to add a piece of code or do we really agree to add code
> > > > anywhere just to avoid that breakage.
> > >
> > > I am talking about maintaining it in future.
> > > if your struct is not seen externally, no chances to introduce ABI breakage.
> > >
> > > >
> > > > 	2) each session entity is self-contained, user doesn't need to bring along
> > > dev_id etc.
> > > > 	    dev_id is needed  only at init stage, after that user will use session ops
> > > to perform
> > > > 	    all operations on that session (process(), clear(), etc.).
> > > >
> > > > [Akhil] There is nothing called as session ops in current DPDK.
> > >
> > > True, but it doesn't mean we can't/shouldn't have it.
> >
> > We can have it if it is not adding complexity for the user. Creating 2 different code
> > Paths for user is not desirable for the stack developers.
> >
> > >
> > > > What you are proposing
> > > > is a new concept which doesn't have any extra benefit, rather it is adding
> > > complexity
> > > > to have two different code paths for session create.
> > > >
> > > >
> > > > 	3) User can decide does he wants to store ops[] pointer on a per session
> > > basis,
> > > > 	    or on a per group of same sessions, or...
> > > >
> > > > [Akhil] Will the user really care which process API should be called from the
> > > PMD.
> > > > Rather it should be driver's responsibility to store that in the session private
> > > data
> > > > which would be opaque to the user. As per my suggestion same process
> > > function can
> > > > be added to multiple sessions or a single session can be managed inside the
> > > PMD.
> > >
> > > In that case we either need to have a function per session (stored internally),
> > > or make decision (branches) at run-time.
> > > But as I said in other mail - I am ok to add small shim structure here:
> > > either rte_crypto_cpu_sym_session { void *ses; struct rte_crypto_cpu_sym_session_ops ops; }
> > > or rte_crypto_cpu_sym_session { void *ses; struct rte_crypto_cpu_sym_session_ops *ops; }
> > > And merge rte_crypto_cpu_sym_init() and rte_crypto_cpu_sym_get_ops() into
> > > one (init).
> >
> > Again that will be a separate API call from the user perspective which is not good.
> >
> > >
> > > >
> > > >
> > > > 	4) No mandatory mempools for private sessions. User can allocate
> > > memory for cpu-crypto
> > > > 	    session whenever he likes.
> > > >
> > > > [Akhil] you mean session private data?
> > >
> > > Yes.
> > >
> > > > You would need that memory anyways, user will be
> > > > allocating that already.  You do not need to manage that.
> > >
> > > What I am saying - right now user has no choice but to allocate it via mempool.
> > > Which is probably not the best options for all cases.
> > >
> > > >
> > > > 	Disadvantages:
> > > > 	5) Extra changes in control path
> > > > 	6) User has to store session_ops pointer explicitly.
> > > >
> > > > [Akhil] More disadvantages:
> > > > - All supporting PMDs will need to maintain TWO types of session for the
> > > > same crypto processing. Suppose a fix or a new feature(or algo) is added, PMD
> > > owner
> > > > will need to add code in both the session create APIs. Hence more
> > > maintenance and
> > > > error prone.
> > >
> > > I think majority of code for both paths will be common, plus even we'll reuse
> > > current sym_session_init() -
> > > changes in PMD session_init() code will be unavoidable.
> > > But yes, it will be new entry in devops, that PMD will have to support.
> > > Ok to add it as 7) to the list.
> > >
> > > > - Stacks which will be using these new APIs also need to maintain two
> > > > code path for the same processing while doing session initialization
> > > > for sync and async
> > >
> > > That's the same as #5 above, I think.
> > >
> > > >
> > > >
> > > >      b) Your suggestion - reuse existing rte_cryptodev_sym_session_init() and
> > > existing rte_cryptodev_sym_session
> > > >       structure.
> > > > 	Advantages:
> > > > 	1) allows to reuse same struct and init/create/clear() functions.
> > > > 	    Probably less changes in control path.
> > > > 	Disadvantages:
> > > > 	2) rte_cryptodev_sym_session. sess_data[] is indexed by driver_id,
> > > which means that
> > > > 	    we can't use the same rte_cryptodev_sym_session to hold private
> > > sessions pointers
> > > > 	    for both sync and async mode  for the same device.
> > > >                    So the only option we have - make PMD devops->sym_session_configure()
> > > > 	    always create a session that can work in both cpu and lksd modes.
> > > > 	    For some implementations that would probably mean that under the
> > > hood  PMD would create
> > > > 	    2 different session structs (sync/async) and then use one or another
> > > depending on from what API been called.
> > > > 	    Seems doable, but ...:
> > > >                    - will contradict with statement from 1:
> > > > 	      " New fields will be optional and would be used by PMD only when
> > > cpu-crypto session is requested."
> > > >                       Now it becomes mandatory for all apps to specify cpu-crypto
> > > related parameters too,
> > > > 	       even if they don't plan to use that mode - i.e. behavior change,
> > > existing app change.
> > > >                      - might cause extra space overhead.
> > > >
> > > > [Akhil] It will not contradict with #1, you will only have few checks in the
> > > session init PMD
> > > > Which support this mode, find appropriate values and set the appropriate
> > > process() in it.
> > > > User should be able to call, legacy enq-deq as well as the new process()
> > > without any issue.
> > > > User would be at runtime will be able to change the datapath.
> > > > So this is not a disadvantage, it would be additional flexibility for the user.
> > >
> > > Ok, but that's what I am saying - if PMD would *always* have to create a
> > > session that can handle
> > > both modes (sync/async), then user would *always* have to provide parameters
> > > for both modes too.
> > > Otherwise if let say user didn't setup sync specific parameters at all, what PMD
> > > should do?
> > >   - return with error?
> > >   - init session that can be used with async path only?
> > > My current assumption is #1.
> > > If #2, then how user will be able to distinguish is that session valid for both
> > > modes, or only for one?
> >
> > I would say a 3rd option, do nothing if sync params are not set.
> > Probably have a debug print in the PMD(which support sync mode) to specify that
> > session is not configured properly for sync mode.
> 
> So, just print warning and proceed with init session that can be used with async path only?
> Then it sounds the same as #2 above.
> Which actually means that sync mode parameters for sym_session_init() becomes optional.
> Then we need an API to provide to the user information what modes
> (sync+async/async only) is supported by that session for given dev_id.
> And user would have to query/retain this information at control-path,
> and store it somewhere in user-space together with session pointer and dev_ids
> to use later at data-path (same as we do now for session type).
> That definitely requires changes in control-path to start using it.
> Plus the fact that this value can differ for different dev_ids for the same session -
> doesn't make things easier here.
> 
> > Internally the PMD will not store the process() API in the session priv data
> > And while calling the first packet, devops->process will give an assert that session
> > Is not configured for sync mode. The session validation would be done in any case
> > your suggestion or mine. So no extra overhead at runtime.
> 
> I believe that after session_init() user should get either an error or
> valid  session handler that he can use at runtime.
> Pushing session validation to runtime doesn't seem like a good idea.
> 
> >
> > >
> > >
> > > >
> > > >
> > > > 	3) not possible to store device (not driver) specific data within the
> > > session, but I think it is not really needed right now.
> > > > 	    So probably minor compared to 2.b.2.
> > > >
> > > > [Akhil] So lets omit this for current discussion. And I hope we can find some
> > > way to deal with it.
> > >
> > > I don't think there is an easy way to fix that with existing API.
> > >
> > > >
> > > >
> > > > Actually #3 follows from #2, but decided to have them separated.
> > > >
> > > > 3. process() parameters/behavior
> > > >     a) Our suggestion: user stores ptr to session ops (or to (*process) itself) and
> > > just does:
> > > >         session_ops->process(sess, ...);
> > > > 	Advantages:
> > > > 	1) fastest possible execution path
> > > > 	2) no need to carry on dev_id for data-path
> > > >
> > > > [Akhil] I don't see any overhead of carrying dev id, at least it would be inline
> > > with the
> > > > current DPDK methodology.
> > >
> > > If we'll add process() into rte_cryptodev itself (same as we have
> > > enqueue_burst/dequeue_burst),
> > > then it will be an ABI breakage.
> > > Also there are discussions to get rid of that approach completely:
> > > http://mails.dpdk.org/archives/dev/2019-September/144674.html
> > > So I am not sure this is a recommended way these days.
> >
> > We can either have it in rte_cryptodev or in rte_cryptodev_ops whichever
> > is good for you.
> >
> > Whether it is ABI breakage or not, as per your requirements, this is the correct
> > approach. Do you agree with this or not?
> 
> I think it is possible approach, but not the best one:
> it looks quite flakey to me (see all these uncertainty with sym_session_init above),
> plus introduces extra overhead at data-path.
> 
> >
> > Now handling the API/ABI breakage is a separate story. In 19.11 release we
> > Are not much concerned about the ABI breakages, this was discussed in
> > community. So adding a new dev_ops wouldn't have been an issue.
> > Now since we are so close to RC1 deadline, we should come up with some
> > other solution for next release. May be having a pmd API in 20.02 and
> > converting it into formal one in 20.11
> >
> >
> > >
> > > > What you are suggesting is a new way to get the things done without much
> > > benefit.
> > >
> > > Would help with ABI stability plus better performance, isn't it enough?
> > >
> > > > Also I don't see any performance difference as crypto workload is heavier than
> > > > Code cycles, so that wont matter.
> > >
> > > It depends.
> > > Suppose function call costs you ~30 cycles.
> > > If you have burst of big packets (let say crypto for each will take ~2K cycles) that
> > > belong
> > > to the same session, then yes you wouldn't notice these extra 30 cycles at all.
> > > If you have burst of small packets (let say crypto for each will take ~300 cycles)
> > > each
> > > belongs to different session, then it will cost you ~10% extra.
> >
> > Let us do some profiling on openssl with both the approaches and find out the
> > difference.
> >
> > >
> > > > So IMO, there is no advantage in your suggestion as well.
> > > >
> > > >
> > > > 	Disadvantages:
> > > > 	3) user has to carry on session_ops pointer explicitly
> > > >     b) Your suggestion: add  (*cpu_process) inside rte_cryptodev_ops and then:
> > > >         rte_crypto_cpu_sym_process(uint8_t dev_id, rte_cryptodev_sym_session *sess, /*data parameters*/) {...
> > > >                      rte_cryptodevs[dev_id].dev_ops->cpu_process(sess, ...);
> > > >                       /*and then inside PMD specific process: */
> > > >                      pmd_private_session = sess->sess_data[this_pmd_driver_id].data;
> > > >                      /* and then most likely either */
> > > >                      pmd_private_session->process(pmd_private_session, ...);
> > > >                      /* or jump based on session/input data */
> > > > 	Advantages:
> > > > 	1) don't see any...
> > > > 	Disadvantages:
> > > > 	2) User has to carry on dev_id inside data-path
> > > > 	3) Extra level of indirection (plus data dependency) - both for data and instructions.
> > > > 	    Possible slowdown compared to a) (not measured).
> > > >
> > > > Having said all this, if the disagreements cannot be resolved, you can go for a pmd API specific
> > > > to your PMDs,
> > >
> > > I don't think it is good idea.
> > > PMD specific API is sort of deprecated path, also there is no clean way to use it
> > > within the libraries.
> >
> > I know that this is a deprecated path, we can use it until we are not allowed
> > to break ABI/API
> >
> > >
> > > > because as per my understanding the solution doesn't look scalable to other PMDs.
> > > > Your approach is aligned only to Intel , will not benefit others like openssl which is used by all
> > > > vendors.
> > >
> > > I feel quite opposite, from my perspective majority of SW backed PMDs will
> > > benefit from it.
> > > And I don't see anything Intel specific in my proposals above.
> > > About openssl PMD: I am not an expert here, but looking at the code, I think it
> > > will fit really well.
> > > Look yourself at its internal functions:
> > > process_openssl_auth_op/process_openssl_crypto_op,
> > > I think they doing exactly the same - they use sync API underneath, and they are
> > > session based
> > > (AFAIK you don't need any device/queue data, everything that needed for
> > > crypto/auth is stored inside session).

Looked at drivers/crypto/armv8 - same story here, I believe.

> > >
> > By vendor specific, I mean,
> > - no PMD would like to have 2 different variants of session Init APIs for doing the same stuff.
> > - stacks will become vendor specific while using 2 separate session create APIs. No stack would
> > Like to support 2 variants of session create- one for HW PMDs and one for SW PMDs.
> 
> I think what you refer on has nothing to do with 'vendor specific'.
> I would name it 'extra overhead for PMD and stack writers'.
> Yes, for sure there is extra overhead (as always with new API) -
> for both producer (PMD writer) and consumer (stack writer):
> New function(s) to support,  probably more tests to create/run, etc.
> Though this API is optional - if PMD/stack maintainer doesn't see
> value in it, they are free not to support it.
> From other side, re-using  rte_cryptodev_sym_session_init()
> wouldn't help anyway - both data-path and control-path would differ
> from async mode anyway.
> BTW, right now to support different HW flavors
> we do have 4 different control and data-paths for both
> ipsec-secgw and librte_ipsec:
> lkds-none/lksd-proto/inline-crypto/inline-proto.
> And that is considered to be ok.
> Honestly, I don't understand why SW backed implementations
> can't have their own path that would suite them most.
> Konstantin
  
Akhil Goyal Oct. 23, 2019, 10:05 a.m. UTC | #37
Hi Konstantin,
> 
> Hi Akhil,
> 
> 
> > > > Added my comments inline with your draft.
> > > > [snip]..
> > > >
> > > > >
> > > > > Ok, then my suggestion:
> > > > > Let's at least write down all points about crypto-dev approach where we
> > > > > disagree and then probably try to resolve them one by one....
> > > > > If we fail to make an agreement/progress in next week or so,
> > > > > (and no more reviews from the community)
> > > > > will have bring that subject to TB meeting to decide.
> > > > > Sounds fair to you?
> > > > Agreed
> > > > >
> > > > > List is below.
> > > > > Please add/correct me, if I missed something.
> > > > >
> > > > > Konstantin
> > > >
> > > > Before going into comparison, we should define the requirement as well.
> > >
> > > Good point.
> > >
> > > > What I understood from the patchset,
> > > > "You need a synchronous API to perform crypto operations on raw data using
> > > > SW PMDs"
> > > > So,
> > > > - no crypto-ops,
> > > > - no separate enq-deq, only single process API for data path
> > > > - Do not need any value addition to the session parameters.
> > > >   (You would need some parameters from the crypto-op which
> > > >    Are constant per session and since you wont use crypto-op,
> > > >    You need some place to store that)
> > >
> > > Yes, this is correct, I think.
> > >
> > > >
> > > > Now as per your mail, the comparison
> > > > 1. extra input parameters to create/init rte_(cpu)_sym_session.
> > > >
> > > > Will leverage existing 6B gap inside rte_crypto_*_xform between 'algo' and 'key' fields.
> > > > New fields will be optional and would be used by PMD only when cpu-crypto session is requested.
> > > > For lksd-crypto session PMD is free to ignore these fields.
> > > > No ABI breakage is required.
> > > >
> > > > [Akhil] Agreed, no issues.
> > > >
> > > > 2. cpu-crypto create/init.
> > > >     a) Our suggestion - introduce new API for that:
> > > >         - rte_crypto_cpu_sym_init() that would init completely opaque rte_crypto_cpu_sym_session.
> > > >         - struct rte_crypto_cpu_sym_session_ops {(*process)(...); (*clear); /*whatever else we'll need */};
> > > >         - rte_crypto_cpu_sym_get_ops(const struct rte_crypto_sym_xform *xforms)
> > > >           that would return const struct rte_crypto_cpu_sym_session_ops * based on input xforms.
> > > > 	Advantages:
> > > > 	1)  totally opaque data structure (no ABI breakages in future), PMD writer is totally free
> > > > 	     with it format and contents.
> > > >
> > > > [Akhil] It will have breakage at some point till we don't hit the union size.
> > >
> > > Not sure, what union you are talking about?
> >
> > Union of xforms in rte_security_session_conf
> 
> Hmm, how does it relates here?
> I thought we discussing pure rte_cryptodev_sym_session, no?
> 
> >
> > >
> > > > Rather I don't suspect there will be more parameters added.
> > > > Or do we really care about the ABI breakage when the argument is about
> > > > the correct place to add a piece of code or do we really agree to add code
> > > > anywhere just to avoid that breakage.
> > >
> > > I am talking about maintaining it in future.
> > > if your struct is not seen externally, no chances to introduce ABI breakage.
> > >
> > > >
> > > > 	2) each session entity is self-contained, user doesn't need to bring along dev_id etc.
> > > > 	    dev_id is needed  only at init stage, after that user will use session ops to perform
> > > > 	    all operations on that session (process(), clear(), etc.).
> > > >
> > > > [Akhil] There is nothing called as session ops in current DPDK.
> > >
> > > True, but it doesn't mean we can't/shouldn't have it.
> >
> > We can have it if it is not adding complexity for the user. Creating 2 different code
> > Paths for user is not desirable for the stack developers.
> >
> > >
> > > > What you are proposing
> > > > is a new concept which doesn't have any extra benefit, rather it is adding complexity
> > > > to have two different code paths for session create.
> > > >
> > > >
> > > > 	3) User can decide does he wants to store ops[] pointer on a per session basis,
> > > > 	    or on a per group of same sessions, or...
> > > >
> > > > [Akhil] Will the user really care which process API should be called from the PMD.
> > > > Rather it should be driver's responsibility to store that in the session private data
> > > > which would be opaque to the user. As per my suggestion same process function can
> > > > be added to multiple sessions or a single session can be managed inside the PMD.
> > >
> > > In that case we either need to have a function per session (stored internally),
> > > or make decision (branches) at run-time.
> > > But as I said in other mail - I am ok to add small shim structure here:
> > > either rte_crypto_cpu_sym_session { void *ses; struct rte_crypto_cpu_sym_session_ops ops; }
> > > or rte_crypto_cpu_sym_session { void *ses; struct rte_crypto_cpu_sym_session_ops *ops; }
> > > And merge rte_crypto_cpu_sym_init() and rte_crypto_cpu_sym_get_ops() into one (init).
> >
> > Again that will be a separate API call from the user perspective which is not good.
> >
> > >
> > > >
> > > >
> > > > 	4) No mandatory mempools for private sessions. User can allocate memory for cpu-crypto
> > > > 	    session whenever he likes.
> > > >
> > > > [Akhil] you mean session private data?
> > >
> > > Yes.
> > >
> > > > You would need that memory anyways, user will be
> > > > allocating that already.  You do not need to manage that.
> > >
> > > What I am saying - right now user has no choice but to allocate it via mempool.
> > > Which is probably not the best options for all cases.
> > >
> > > >
> > > > 	Disadvantages:
> > > > 	5) Extra changes in control path
> > > > 	6) User has to store session_ops pointer explicitly.
> > > >
> > > > [Akhil] More disadvantages:
> > > > - All supporting PMDs will need to maintain TWO types of session for the
> > > > same crypto processing. Suppose a fix or a new feature(or algo) is added, PMD owner
> > > > will need to add code in both the session create APIs. Hence more maintenance and
> > > > error prone.
> > >
> > > I think majority of code for both paths will be common, plus even we'll reuse
> > > current sym_session_init() -
> > > changes in PMD session_init() code will be unavoidable.
> > > But yes, it will be new entry in devops, that PMD will have to support.
> > > Ok to add it as 7) to the list.
> > >
> > > > - Stacks which will be using these new APIs also need to maintain two
> > > > code path for the same processing while doing session initialization
> > > > for sync and async
> > >
> > > That's the same as #5 above, I think.
> > >
> > > >
> > > >
> > > >      b) Your suggestion - reuse existing rte_cryptodev_sym_session_init() and existing rte_cryptodev_sym_session
> > > >       structure.
> > > > 	Advantages:
> > > > 	1) allows to reuse same struct and init/create/clear() functions.
> > > > 	    Probably less changes in control path.
> > > > 	Disadvantages:
> > > > 	2) rte_cryptodev_sym_session.sess_data[] is indexed by driver_id, which means that
> > > > 	    we can't use the same rte_cryptodev_sym_session to hold private sessions pointers
> > > > 	    for both sync and async mode  for the same device.
> > > >                    So the only option we have - make PMD devops->sym_session_configure()
> > > > 	    always create a session that can work in both cpu and lksd modes.
> > > > 	    For some implementations that would probably mean that under the hood  PMD would create
> > > > 	    2 different session structs (sync/async) and then use one or another depending on from what API been called.
> > > > 	    Seems doable, but ...:
> > > >                    - will contradict with statement from 1:
> > > > 	      " New fields will be optional and would be used by PMD only when cpu-crypto session is requested."
> > > >                       Now it becomes mandatory for all apps to specify cpu-crypto related parameters too,
> > > > 	       even if they don't plan to use that mode - i.e. behavior change, existing app change.
> > > >                      - might cause extra space overhead.
> > > >
> > > > [Akhil] It will not contradict with #1, you will only have few checks in the session init PMD
> > > > Which support this mode, find appropriate values and set the appropriate process() in it.
> > > > User should be able to call, legacy enq-deq as well as the new process() without any issue.
> > > > User would be at runtime will be able to change the datapath.
> > > > So this is not a disadvantage, it would be additional flexibility for the user.
> > >
> > > Ok, but that's what I am saying - if PMD would *always* have to create a
> > > session that can handle
> > > both modes (sync/async), then user would *always* have to provide parameters
> > > for both modes too.
> > > Otherwise if let say user didn't setup sync specific parameters at all, what PMD
> > > should do?
> > >   - return with error?
> > >   - init session that can be used with async path only?
> > > My current assumption is #1.
> > > If #2, then how user will be able to distinguish is that session valid for both
> > > modes, or only for one?
> >
> > I would say a 3rd option, do nothing if sync params are not set.
> > Probably have a debug print in the PMD(which support sync mode) to specify that
> > session is not configured properly for sync mode.
> 
> So, just print warning and proceed with init session that can be used with async
> path only?
> Then it sounds the same as #2 above.
> Which actually means that sync mode parameters for sym_session_init()
> becomes optional.
> Then we need an API to provide to the user information what modes
> (sync+async/async only) is supported by that session for given dev_id.
> And user would have to query/retain this information at control-path,
> and store it somewhere in user-space together with session pointer and dev_ids
> to use later at data-path (same as we do now for session type).
> That definitely requires changes in control-path to start using it.
> Plus the fact that this value can differ for different dev_ids for the same session -
> doesn't make things easier here.

An API won't be required to specify that. A feature flag will be sufficient, and it is not a big change
from the application perspective.

Here is some pseudo code just to elaborate my understanding. This will need some

From the application:
if (dev_info->feature_flags & RTE_CRYPTODEV_FF_SYNC) {
	/* set additional params in crypto xform */
}

Now in the driver:
pmd_sym_session_configure(dev, xform, sess, mempool) {
	...
	if ((dev_info->feature_flags & RTE_CRYPTODEV_FF_SYNC)
			&& /* sync params are set in xform */) {
		/* assign process function pointer in sess->priv_data */
	}
	/* It may return an error if FF_SYNC is set and the params are not correct.
	 * It would be up to the driver whether it supports both SYNC and ASYNC. */
}

Now the new sync API:

pmd_process(...) {
	if ((dev_info->feature_flags & RTE_CRYPTODEV_FF_SYNC)
			&& sess_priv->process != NULL)
		sess_priv->process(...);
	else
		RTE_ASSERT(0 && "sync mode not configured properly or not supported");
}

In the data path, no extra processing is happening.
Even with your suggestion, you would need these types of error checks;
you cannot blindly trust the application to pass correct pointers.
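A minimal compilable sketch of the feature-flag approach in the pseudocode above. It assumes a hypothetical `FF_SYNC` bit (the flag name comes from the pseudocode and is treated here as hypothetical) and a hypothetical PMD-private session that carries its own `process` pointer. A toy XOR routine stands in for real crypto, and the sketch returns `-ENOTSUP` instead of asserting, so a session that was not configured for sync mode is reported to the caller at call time:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature bit standing in for RTE_CRYPTODEV_FF_SYNC. */
#define FF_SYNC (UINT64_C(1) << 0)

struct pmd_session;
typedef int (*sync_process_fn)(struct pmd_session *s, const uint8_t *in,
			       uint8_t *out, size_t len);

/* Hypothetical PMD-private session: process stays NULL unless the
 * session was configured for sync mode. */
struct pmd_session {
	sync_process_fn process;
	uint8_t key; /* toy key for the demo routine */
};

/* Toy XOR routine standing in for a real synchronous cipher. */
static int xor_process(struct pmd_session *s, const uint8_t *in,
		       uint8_t *out, size_t len)
{
	for (size_t i = 0; i < len; i++)
		out[i] = in[i] ^ s->key;
	return (int)len;
}

/* Control path: attach the sync handler only when the device advertises
 * FF_SYNC and the caller supplied sync parameters. Otherwise the session
 * stays valid for the async path only. */
static int pmd_sym_session_configure(uint64_t dev_flags, int sync_params_set,
				     uint8_t key, struct pmd_session *s)
{
	s->key = key;
	s->process = NULL;
	if ((dev_flags & FF_SYNC) && sync_params_set)
		s->process = xor_process;
	return 0;
}

/* Data path: a flag test and a NULL check, then one indirect call. */
static int pmd_process(uint64_t dev_flags, struct pmd_session *s,
		       const uint8_t *in, uint8_t *out, size_t len)
{
	if ((dev_flags & FF_SYNC) && s->process != NULL)
		return s->process(s, in, out, len);
	return -ENOTSUP; /* sync mode not configured or not supported */
}
```

Note that in this sketch a missing sync configuration surfaces only on the first `pmd_process()` call, which is the "validation pushed to runtime" behavior Konstantin questions in the quoted text below.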

> 
> > Internally the PMD will not store the process() API in the session priv data
> > And while calling the first packet, devops->process will give an assert that session
> > Is not configured for sync mode. The session validation would be done in any case
> > your suggestion or mine. So no extra overhead at runtime.
> 
> I believe that after session_init() user should get either an error or
> valid  session handler that he can use at runtime.
> Pushing session validation to runtime doesn't seem like a good idea.
> 
The user may get a warning from the PMD that FF_SYNC is set but the params are not
correct/available. See above.

> >
> > >
> > >
> > > >
> > > >
> > > > 	3) not possible to store device (not driver) specific data within the session, but I think it is not really needed right now.
> > > > 	    So probably minor compared to 2.b.2.
> > > >
> > > > [Akhil] So lets omit this for current discussion. And I hope we can find some way to deal with it.
> > >
> > > I don't think there is an easy way to fix that with existing API.
> > >
> > > >
> > > >
> > > > Actually #3 follows from #2, but decided to have them separated.
> > > >
> > > > 3. process() parameters/behavior
> > > >     a) Our suggestion: user stores ptr to session ops (or to (*process) itself) and just does:
> > > >         session_ops->process(sess, ...);
> > > > 	Advantages:
> > > > 	1) fastest possible execution path
> > > > 	2) no need to carry on dev_id for data-path
> > > >
> > > > [Akhil] I don't see any overhead of carrying dev id, at least it would be inline with the
> > > > current DPDK methodology.
> > >
> > > If we'll add process() into rte_cryptodev itself (same as we have
> > > enqueue_burst/dequeue_burst),
> > > then it will be an ABI breakage.
> > > Also there are discussions to get rid of that approach completely:
> > > http://mails.dpdk.org/archives/dev/2019-September/144674.html
> > > So I am not sure this is a recommended way these days.
> >
> > We can either have it in rte_cryptodev or in rte_cryptodev_ops whichever
> > is good for you.
> >
> > Whether it is ABI breakage or not, as per your requirements, this is the correct
> > approach. Do you agree with this or not?
> 
> I think it is possible approach, but not the best one:
> it looks quite flakey to me (see all these uncertainty with sym_session_init
> above),
> plus introduces extra overhead at data-path.

The uncertainties can be handled appropriately using a feature flag,
and as per my understanding there is no extra overhead in the data path.
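To make the indirection argument concrete, here is a sketch of the two dispatch paths listed under 3a and 3b in the quoted comparison. The structures are simplified and hypothetical, modeled on the quoted pseudocode rather than on the real DPDK definitions, and a toy routine stands in for the PMD's crypto. The sketch only shows the chain of dependent loads and indirect calls each path performs; it measures nothing, so it does not settle which path is faster:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy routine: priv points at an int "key" that is added to x. */
typedef int (*process_fn)(void *priv, int x);
static int toy_process(void *priv, int x) { return x + *(int *)priv; }

/* Path (a), hypothetical shim: the user keeps {ses, ops} and calls directly. */
struct cpu_sym_session_ops { process_fn process; };
struct cpu_sym_session {
	void *ses;
	const struct cpu_sym_session_ops *ops;
};

static int process_a(const struct cpu_sym_session *s, int x)
{
	/* loads: s->ops, then ops->process; then one indirect call */
	return s->ops->process(s->ses, x);
}

/* Path (b), hypothetical: dev_id -> device table -> dev_ops -> cpu_process,
 * then the PMD looks up its private session by driver_id. */
#define MAX_DRIVERS 4
struct priv_session { process_fn process; int key; };
struct sym_session { void *sess_data[MAX_DRIVERS]; };
struct dev_ops { int (*cpu_process)(struct sym_session *s, int x); };
struct device { const struct dev_ops *dev_ops; };

static struct device devices[2];
static const uint8_t this_pmd_driver_id = 1;

static int pmd_cpu_process(struct sym_session *s, int x)
{
	/* loads: sess_data[driver_id], then p->process; second indirect call */
	struct priv_session *p = s->sess_data[this_pmd_driver_id];
	return p->process(&p->key, x);
}

static int process_b(uint8_t dev_id, struct sym_session *s, int x)
{
	/* loads: devices[dev_id].dev_ops, then dev_ops->cpu_process;
	 * first indirect call, into the PMD lookup above */
	return devices[dev_id].dev_ops->cpu_process(s, x);
}
```

In this sketch path (b) makes two dependent indirect calls where path (a) makes one; whether that difference is visible next to the crypto work itself is what the profiling proposed in the thread would have to show.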

> 
> >
> > Now handling the API/ABI breakage is a separate story. In 19.11 release we
> > Are not much concerned about the ABI breakages, this was discussed in
> > community. So adding a new dev_ops wouldn't have been an issue.
> > Now since we are so close to RC1 deadline, we should come up with some
> > other solution for next release. May be having a pmd API in 20.02 and
> > converting it into formal one in 20.11
> >
> >
> > >
> > > > What you are suggesting is a new way to get the things done without much benefit.
> > >
> > > Would help with ABI stability plus better performance, isn't it enough?
> > >
> > > > Also I don't see any performance difference as crypto workload is heavier than
> > > > Code cycles, so that wont matter.
> > >
> > > It depends.
> > > Suppose function call costs you ~30 cycles.
> > > If you have burst of big packets (let say crypto for each will take ~2K cycles) that belong
> > > to the same session, then yes you wouldn't notice these extra 30 cycles at all.
> > > If you have burst of small packets (let say crypto for each will take ~300 cycles) each
> > > belongs to different session, then it will cost you ~10% extra.
> >
> > Let us do some profiling on openssl with both the approaches and find out the
> > difference.
> >
> > >
> > > > So IMO, there is no advantage in your suggestion as well.
> > > >
> > > >
> > > > 	Disadvantages:
> > > > 	3) user has to carry on session_ops pointer explicitly
> > > >     b) Your suggestion: add  (*cpu_process) inside rte_cryptodev_ops and then:
> > > >         rte_crypto_cpu_sym_process(uint8_t dev_id, rte_cryptodev_sym_session *sess, /*data parameters*/) {...
> > > >                      rte_cryptodevs[dev_id].dev_ops->cpu_process(sess, ...);
> > > >                       /*and then inside PMD specific process: */
> > > >                      pmd_private_session = sess->sess_data[this_pmd_driver_id].data;
> > > >                      /* and then most likely either */
> > > >                      pmd_private_session->process(pmd_private_session, ...);
> > > >                      /* or jump based on session/input data */
> > > > 	Advantages:
> > > > 	1) don't see any...
> > > > 	Disadvantages:
> > > > 	2) User has to carry on dev_id inside data-path
> > > > 	3) Extra level of indirection (plus data dependency) - both for data and instructions.
> > > > 	    Possible slowdown compared to a) (not measured).
> > > >
> > > > Having said all this, if the disagreements cannot be resolved, you can go for a pmd API specific
> > > > to your PMDs,
> > >
> > > I don't think it is good idea.
> > > PMD specific API is sort of deprecated path, also there is no clean way to use it
> > > within the libraries.
> >
> > I know that this is a deprecated path, we can use it until we are not allowed
> > to break ABI/API
> >
> > >
> > > > because as per my understanding the solution doesn't look scalable to other PMDs.
> > > > Your approach is aligned only to Intel , will not benefit others like openssl which is used by all
> > > > vendors.
> > >
> > > I feel quite opposite, from my perspective majority of SW backed PMDs will
> > > benefit from it.
> > > And I don't see anything Intel specific in my proposals above.
> > > About openssl PMD: I am not an expert here, but looking at the code, I think it
> > > will fit really well.
> > > Look yourself at its internal functions:
> > > process_openssl_auth_op/process_openssl_crypto_op,
> > > I think they doing exactly the same - they use sync API underneath, and they are
> > > session based
> > > (AFAIK you don't need any device/queue data, everything that needed for
> > > crypto/auth is stored inside session).
> > >
> > By vendor specific, I mean,
> > - no PMD would like to have 2 different variants of session Init APIs for doing the same stuff.
> > - stacks will become vendor specific while using 2 separate session create APIs. No stack would
> > Like to support 2 variants of session create- one for HW PMDs and one for SW PMDs.
> 
> I think what you refer on has nothing to do with 'vendor specific'.
> I would name it 'extra overhead for PMD and stack writers'.
> Yes, for sure there is extra overhead (as always with new API) -
> for both producer (PMD writer) and consumer (stack writer):
> New function(s) to support,  probably more tests to create/run, etc.
> Though this API is optional - if PMD/stack maintainer doesn't see
> value in it, they are free not to support it.
> From other side, re-using  rte_cryptodev_sym_session_init()
> wouldn't help anyway - both data-path and control-path would differ
> from async mode anyway.
> BTW, right now to support different HW flavors
> we do have 4 different control and data-paths for both
> ipsec-secgw and librte_ipsec:
> lkds-none/lksd-proto/inline-crypto/inline-proto.
> And that is considered to be ok.

No, that is not ok. We cannot add new paths for every other case.
Those 4 are controlled using 2 sets of APIs. We should try our best to
keep the overhead for the application writer to a minimum. This pain was also discussed
in one of the DPDK conferences.
DPDK is not a standalone entity; there are always stacks running over it.
We should not add an API for every other use case when we have an alternative
approach with the existing API set.

Introducing another one now would add to that pain and create a lot of work for
both producer and consumer.
It would be interesting to see how much performance difference there will be between the
two approaches. As per my understanding it won't be much compared to the
extra work that you will be inducing.
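For reference, the arithmetic behind the ~30-cycle call estimate quoted earlier in the thread: the relative overhead is the call cost divided by the crypto work that call is paid against. The 30/300/2000-cycle figures are the thread's own rough estimates; the 32-packet burst is an illustrative assumption, not a number from this discussion:

```c
#include <assert.h>

/* Fixed per-call dispatch overhead as a percentage of the crypto work
 * that one call performs. */
static double overhead_pct(double call_cycles, double crypto_cycles)
{
	return 100.0 * call_cycles / crypto_cycles;
}

/* One dispatch call amortized over a burst of packets that share a
 * session, so the call is paid once per burst rather than once per packet. */
static double burst_overhead_pct(double call_cycles, double crypto_cycles_per_pkt,
				 unsigned int pkts_per_call)
{
	return overhead_pct(call_cycles, crypto_cycles_per_pkt * pkts_per_call);
}
```

With these numbers, `overhead_pct(30, 300)` gives 10% (small packets, one call each), `overhead_pct(30, 2000)` gives 1.5%, and `burst_overhead_pct(30, 2000, 32)` gives about 0.047%, which is why the per-session/per-packet mix matters more than the absolute call cost.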

-Akhil

> Honestly, I don't understand why SW backed implementations
> can't have their own path that would suite them most.
> Konstantin
  
Ananyev, Konstantin Oct. 30, 2019, 2:23 p.m. UTC | #38
Hi Akhil,

> > > > > Added my comments inline with your draft.
> > > > > [snip]..
> > > > >
> > > > > >
> > > > > > Ok, then my suggestion:
> > > > > > Let's at least write down all points about crypto-dev approach where we
> > > > > > disagree and then probably try to resolve them one by one....
> > > > > > If we fail to make an agreement/progress in next week or so,
> > > > > > (and no more reviews from the community)
> > > > > > will have bring that subject to TB meeting to decide.
> > > > > > Sounds fair to you?
> > > > > Agreed
> > > > > >
> > > > > > List is below.
> > > > > > Please add/correct me, if I missed something.
> > > > > >
> > > > > > Konstantin
> > > > >
> > > > > Before going into comparison, we should define the requirement as well.
> > > >
> > > > Good point.
> > > >
> > > > > What I understood from the patchset,
> > > > > "You need a synchronous API to perform crypto operations on raw data using
> > > > > SW PMDs"
> > > > > So,
> > > > > - no crypto-ops,
> > > > > - no separate enq-deq, only single process API for data path
> > > > > - Do not need any value addition to the session parameters.
> > > > >   (You would need some parameters from the crypto-op which
> > > > >    Are constant per session and since you wont use crypto-op,
> > > > >    You need some place to store that)
> > > >
> > > > Yes, this is correct, I think.
> > > >
> > > > >
> > > > > Now as per your mail, the comparison
> > > > > 1. extra input parameters to create/init rte_(cpu)_sym_session.
> > > > >
> > > > > Will leverage existing 6B gap inside rte_crypto_*_xform between 'algo' and 'key' fields.
> > > > > New fields will be optional and would be used by PMD only when cpu-crypto session is requested.
> > > > > For lksd-crypto session PMD is free to ignore these fields.
> > > > > No ABI breakage is required.
> > > > >
> > > > > [Akhil] Agreed, no issues.
> > > > >
> > > > > 2. cpu-crypto create/init.
> > > > >     a) Our suggestion - introduce new API for that:
> > > > >         - rte_crypto_cpu_sym_init() that would init completely opaque rte_crypto_cpu_sym_session.
> > > > >         - struct rte_crypto_cpu_sym_session_ops {(*process)(...); (*clear); /*whatever else we'll need */};
> > > > >         - rte_crypto_cpu_sym_get_ops(const struct rte_crypto_sym_xform *xforms)
> > > > >           that would return const struct rte_crypto_cpu_sym_session_ops * based on input xforms.
> > > > > 	Advantages:
> > > > > 	1)  totally opaque data structure (no ABI breakages in future), PMD writer is totally free
> > > > > 	     with it format and contents.
> > > > >
> > > > > [Akhil] It will have breakage at some point till we don't hit the union size.
> > > >
> > > > Not sure, what union you are talking about?
> > >
> > > Union of xforms in rte_security_session_conf
> >
> > Hmm, how does it relates here?
> > I thought we discussing pure rte_cryptodev_sym_session, no?
> >
> > >
> > > >
> > > > > Rather I don't suspect there will be more parameters added.
> > > > > Or do we really care about the ABI breakage when the argument is about
> > > > > the correct place to add a piece of code or do we really agree to add code
> > > > > anywhere just to avoid that breakage.
> > > >
> > > > I am talking about maintaining it in future.
> > > > if your struct is not seen externally, no chances to introduce ABI breakage.
> > > >
> > > > >
> > > > > 	2) each session entity is self-contained, user doesn't need to bring along dev_id etc.
> > > > > 	    dev_id is needed  only at init stage, after that user will use session ops to perform
> > > > > 	    all operations on that session (process(), clear(), etc.).
> > > > >
> > > > > [Akhil] There is nothing called as session ops in current DPDK.
> > > >
> > > > True, but it doesn't mean we can't/shouldn't have it.
> > >
> > > We can have it if it is not adding complexity for the user. Creating 2 different code
> > > Paths for user is not desirable for the stack developers.
> > >
> > > >
> > > > > What you are proposing
> > > > > is a new concept which doesn't have any extra benefit, rather it is adding complexity
> > > > > to have two different code paths for session create.
> > > > >
> > > > >
> > > > > 	3) User can decide does he wants to store ops[] pointer on a per session basis,
> > > > > 	    or on a per group of same sessions, or...
> > > > >
> > > > > [Akhil] Will the user really care which process API should be called from the PMD.
> > > > > Rather it should be driver's responsibility to store that in the session private data
> > > > > which would be opaque to the user. As per my suggestion same process function can
> > > > > be added to multiple sessions or a single session can be managed inside the PMD.
> > > >
> > > > In that case we either need to have a function per session (stored internally),
> > > > or make decision (branches) at run-time.
> > > > But as I said in other mail - I am ok to add small shim structure here:
> > > > either rte_crypto_cpu_sym_session { void *ses; struct rte_crypto_cpu_sym_session_ops ops; }
> > > > or rte_crypto_cpu_sym_session { void *ses; struct rte_crypto_cpu_sym_session_ops *ops; }
> > > > And merge rte_crypto_cpu_sym_init() and rte_crypto_cpu_sym_get_ops() into one (init).
> > >
> > > Again that will be a separate API call from the user perspective which is not good.
> > >
> > > >
> > > > >
> > > > >
> > > > > 	4) No mandatory mempools for private sessions. User can allocate memory for cpu-crypto
> > > > > 	    session whenever he likes.
> > > > >
> > > > > [Akhil] you mean session private data?
> > > >
> > > > Yes.
> > > >
> > > > > You would need that memory anyways, user will be
> > > > > allocating that already.  You do not need to manage that.
> > > >
> > > > What I am saying - right now user has no choice but to allocate it via mempool.
> > > > Which is probably not the best options for all cases.
> > > >
> > > > >
> > > > > 	Disadvantages:
> > > > > 	5) Extra changes in control path
> > > > > 	6) User has to store session_ops pointer explicitly.
> > > > >
> > > > > [Akhil] More disadvantages:
> > > > > - All supporting PMDs will need to maintain TWO types of session for the
> > > > > same crypto processing. Suppose a fix or a new feature(or algo) is added, PMD owner
> > > > > will need to add code in both the session create APIs. Hence more maintenance and
> > > > > error prone.
> > > >
> > > > I think majority of code for both paths will be common, plus even we'll reuse
> > > > current sym_session_init() -
> > > > changes in PMD session_init() code will be unavoidable.
> > > > But yes, it will be new entry in devops, that PMD will have to support.
> > > > Ok to add it as 7) to the list.
> > > >
> > > > > - Stacks which will be using these new APIs also need to maintain two
> > > > > code path for the same processing while doing session initialization
> > > > > for sync and async
> > > >
> > > > That's the same as #5 above, I think.
> > > >
> > > > >
> > > > >
> > > > >      b) Your suggestion - reuse existing rte_cryptodev_sym_session_init() and existing rte_cryptodev_sym_session
> > > > >       structure.
> > > > > 	Advantages:
> > > > > 	1) allows to reuse same struct and init/create/clear() functions.
> > > > > 	    Probably less changes in control path.
> > > > > 	Disadvantages:
> > > > > 	2) rte_cryptodev_sym_session.sess_data[] is indexed by driver_id, which means that
> > > > > 	    we can't use the same rte_cryptodev_sym_session to hold private sessions pointers
> > > > > 	    for both sync and async mode  for the same device.
> > > > >                    So the only option we have - make PMD devops->sym_session_configure()
> > > > > 	    always create a session that can work in both cpu and lksd modes.
> > > > > 	    For some implementations that would probably mean that under the hood  PMD would create
> > > > > 	    2 different session structs (sync/async) and then use one or another depending on from what API been called.
> > > > > 	    Seems doable, but ...:
> > > > >                    - will contradict with statement from 1:
> > > > > 	      " New fields will be optional and would be used by PMD only when
> > > > cpu-crypto session is requested."
> > > > >                       Now it becomes mandatory for all apps to specify cpu-crypto
> > > > related parameters too,
> > > > > 	       even if they don't plan to use that mode - i.e. behavior change,
> > > > existing app change.
> > > > >                      - might cause extra space overhead.
> > > > >
> > > > > [Akhil] It will not contradict #1: you will only have a few checks in the session init of a PMD
> > > > > which supports this mode, to find the appropriate values and set the appropriate process() in it.
> > > > > User should be able to call legacy enq-deq as well as the new process() without any issue.
> > > > > User will be able to change the datapath at runtime.
> > > > > So this is not a disadvantage; it would be additional flexibility for the user.
> > > >
> > > > Ok, but that's what I am saying - if the PMD would *always* have to create a
> > > > session that can handle
> > > > both modes (sync/async), then the user would *always* have to provide parameters
> > > > for both modes too.
> > > > Otherwise, if, let's say, the user didn't set up sync-specific parameters at all, what should
> > > > the PMD do?
> > > >   - return with error?
> > > >   - init session that can be used with async path only?
> > > > My current assumption is #1.
> > > > If #2, then how will the user be able to distinguish whether that session is valid for both
> > > > modes, or only for one?
> > >
> > > I would say a 3rd option: do nothing if sync params are not set.
> > > Probably have a debug print in the PMD (which supports sync mode) to specify that
> > > the session is not configured properly for sync mode.
> >
> > So, just print a warning and proceed with init of a session that can be used with the async
> > path only?
> > Then it sounds the same as #2 above.
> > Which actually means that sync mode parameters for sym_session_init()
> > become optional.
> > Then we need an API to provide the user information about which modes
> > (sync+async/async only) are supported by that session for a given dev_id.
> > And the user would have to query/retain this information at control-path,
> > and store it somewhere in user-space together with session pointer and dev_ids
> > to use later at data-path (same as we do now for session type).
> > That definitely requires changes in control-path to start using it.
> > Plus the fact that this value can differ for different dev_ids for the same session
> > doesn't make things easier here.
> 
> An API won't be required to specify that. A feature flag will be sufficient, not a big change
> from the application perspective.
> 
> Here is some pseudo code just to elaborate my understanding. This will need some
> 
> From application,
> if (dev_info->feature_flags & RTE_CRYPTODEV_FF_SYNC) {
> 	/* set additional params in crypto xform */
> }
> 
> Now in the driver,
> pmd_sym_session_configure(dev,xform,sess,mempool) {
> 	...
> 	if (dev_info->feature_flags & RTE_CRYPTODEV_FF_SYNC
> 		&& xform->/*sync params are set*/) {
> 		/*Assign process function pointer in sess->priv_data*/
> 	} /* It may return error if FF_SYNC is set and params are not correct.

Then all apps will always *have to* set up sync parameters in xform.
What you suggest is *mandatory* sync mode: the user always has to set up sync
mode params if the PMD does support it (whether or not he plans to use sync mode).
Which means a behavior change in existing apps.

> 	        It would be up to the driver whether it supports both SYNC and ASYNC.*/
> }
> 
> Now the new sync API
> 
> pmd_process(...) {
> 	if (dev_info->feature_flags & RTE_CRYPTODEV_FF_SYNC
> 			 && sess_priv->process != NULL)
> 		sess_priv->process(...);
> 	else
> 		ASSERT("sync mode not configured properly or not supported");
> }
> 
> In the data path, there is no extra processing happening.
> Even in case of your suggestion, you should have these types of error checks;
> you cannot blindly trust the application that the pointers are correct.
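A minimal compilable sketch of the feature-flag approach in the pseudocode above, assuming hypothetical names throughout: `HYP_FF_SYNC` stands in for the proposed `RTE_CRYPTODEV_FF_SYNC`, and `hyp_xform`, `hyp_sess_priv`, `hyp_session_configure()` and `hyp_process()` are illustrations, not DPDK API. The PMD stores a process() pointer in its private session only when the device supports sync mode and the sync-only parameters are set; the data-path entry rejects sessions where that pointer is NULL. The XOR loop is only a placeholder for real crypto.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the proposed RTE_CRYPTODEV_FF_SYNC feature flag. */
#define HYP_FF_SYNC (1ULL << 0)

/* Hypothetical xform: iv_len/digest_len play the role of sync-only params (0 = not set). */
struct hyp_xform {
	uint8_t algo;
	uint16_t iv_len;
	uint16_t digest_len;
};

struct hyp_sess_priv;
typedef int (*hyp_process_fn)(struct hyp_sess_priv *s, uint8_t *buf, size_t len);

/* PMD-private session data: process stays NULL for an async-only session. */
struct hyp_sess_priv {
	uint8_t algo;
	hyp_process_fn process;
};

/* Placeholder "cipher": XOR with the algo byte, standing in for real crypto. */
static int hyp_xor_process(struct hyp_sess_priv *s, uint8_t *buf, size_t len)
{
	for (size_t i = 0; i < len; i++)
		buf[i] ^= s->algo;
	return 0;
}

/* Control path: assign process() only if the device supports sync and params are set. */
static int hyp_session_configure(uint64_t dev_ff, const struct hyp_xform *x,
				 struct hyp_sess_priv *s)
{
	s->algo = x->algo;
	s->process = NULL;
	if (!(dev_ff & HYP_FF_SYNC))
		return 0;
	if (x->iv_len != 0 && x->digest_len != 0)
		s->process = hyp_xor_process;
	else
		fprintf(stderr, "warning: sync params not set, session is async-only\n");
	return 0;
}

/* Data path: reject sessions that were not configured for sync mode. */
static int hyp_process(uint64_t dev_ff, struct hyp_sess_priv *s, uint8_t *buf, size_t len)
{
	if (!(dev_ff & HYP_FF_SYNC) || s->process == NULL)
		return -1;
	return s->process(s, buf, len);
}
```

In this sketch the sync parameters are optional, and an async-only session is detected only on the data path (the -1 return), which is exactly the runtime validation being debated in this thread.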
> 
> >
> > > Internally the PMD will not store the process() API in the session priv data,
> > > and while processing the first packet, devops->process will give an assert that the session
> > > is not configured for sync mode. The session validation would be done in any case,
> > > your suggestion or mine. So no extra overhead at runtime.
> >
> > I believe that after session_init() user should get either an error or
> > valid session handler that he can use at runtime.
> > Pushing session validation to runtime doesn't seem like a good idea.
> >
> It may get a warning from the PMD that FF_SYNC is set but params are not
> correct/available. See above.

I think warning is not enough.
There should be a clear way (API) for the developer to determine whether the created session
can be used by the sync API data-path or not.
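One way to give the developer the explicit answer asked for here is a control-path query on the session. The sketch below is hypothetical: `hyp_sym_session`, `HYP_MAX_DRIVERS` and `hyp_session_is_sync_capable()` do not exist in DPDK. It models per-driver private data indexed by driver_id, as `rte_cryptodev_sym_session.sess_data[]` is described in this thread, so the answer can differ between drivers for the same session.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical PMD-private data: process is NULL when only the async path was configured. */
struct hyp_sess_priv {
	int (*process)(void *sess, void *data);
};

#define HYP_MAX_DRIVERS 4

/* Hypothetical session holding per-driver private data, indexed by driver_id. */
struct hyp_sym_session {
	struct hyp_sess_priv *sess_data[HYP_MAX_DRIVERS];
};

/* Control-path query: can this session be used with the sync API on driver_id? */
static bool hyp_session_is_sync_capable(const struct hyp_sym_session *sess,
					unsigned int driver_id)
{
	const struct hyp_sess_priv *priv;

	if (sess == NULL || driver_id >= HYP_MAX_DRIVERS)
		return false;
	priv = sess->sess_data[driver_id];
	return priv != NULL && priv->process != NULL;
}
```

The application would have to call such a query at control-path time and keep the result next to the session pointer, which is the extra bookkeeping described earlier in the thread.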

> 
> > >
> > > >
> > > >
> > > > >
> > > > >
> > > > > 	3) not possible to store device (not driver) specific data within the session, but I think it is not really needed right now.
> > > > > 	    So probably minor compared to 2.b.2.
> > > > >
> > > > > [Akhil] So let's omit this for the current discussion. And I hope we can find some way to deal with it.
> > > >
> > > > I don't think there is an easy way to fix that with existing API.
> > > >
> > > > >
> > > > >
> > > > > Actually #3 follows from #2, but decided to have them separated.
> > > > >
> > > > > 3. process() parameters/behavior
> > > > >     a) Our suggestion: user stores ptr to session ops (or to (*process) itself) and just does:
> > > > >         session_ops->process(sess, ...);
> > > > > 	Advantages:
> > > > > 	1) fastest possible execution path
> > > > > 	2) no need to carry on dev_id for data-path
> > > > >
> > > > > [Akhil] I don't see any overhead of carrying dev id; at least it would be in line with the
> > > > > current DPDK methodology.
> > > >
> > > > If we'll add process() into rte_cryptodev itself (same as we have
> > > > enqueue_burst/dequeue_burst),
> > > > then it will be an ABI breakage.
> > > > Also there are discussions to get rid of that approach completely:
> > > >
> > > > http://mails.dpdk.org/archives/dev/2019-September/144674.html
> > > > So I am not sure this is a recommended way these days.
> > >
> > > We can either have it in rte_cryptodev or in rte_cryptodev_ops whichever
> > > is good for you.
> > >
> > > Whether it is ABI breakage or not, as per your requirements, this is the correct
> > > approach. Do you agree with this or not?
> >
> > I think it is a possible approach, but not the best one:
> > it looks quite flaky to me (see all this uncertainty with sym_session_init above),
> > plus it introduces extra overhead at data-path.
> 
> Uncertainties can be handled appropriately using a feature flag
> 
> >
> > >
> > > Now handling the API/ABI breakage is a separate story. In the 19.11 release we
> > > are not much concerned about ABI breakages; this was discussed in the
> > > community. So adding a new dev_ops wouldn't have been an issue.
> > > Now since we are so close to the RC1 deadline, we should come up with some
> > > other solution for the next release. Maybe having a PMD API in 20.02 and
> > > converting it into a formal one in 20.11.
> > >
> > >
> > > >
> > > > > What you are suggesting is a new way to get things done without much benefit.
> > > >
> > > > It would help with ABI stability plus better performance; isn't that enough?
> > > >
> > > > > Also I don't see any performance difference, as the crypto workload is heavier than
> > > > > the code cycles, so that won't matter.
> > > >
> > > > It depends.
> > > > Suppose a function call costs you ~30 cycles.
> > > > If you have a burst of big packets (let's say crypto for each will take ~2K cycles) that belong
> > > > to the same session, then yes, you wouldn't notice these extra 30 cycles at all.
> > > > If you have a burst of small packets (let's say crypto for each will take ~300 cycles), each
> > > > belonging to a different session, then it will cost you ~10% extra.
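Taking the illustrative figures above at face value (they are estimates from the thread, not measurements), the relative cost of one extra ~30-cycle call per packet works out as:

```latex
\text{overhead} \approx \frac{c_{\text{call}}}{c_{\text{crypto}}}, \qquad
\frac{30}{2000} = 1.5\%, \qquad
\frac{30}{300} = 10\%
```

So the extra call only becomes significant when the per-packet crypto cost is small.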
> > >
> > > Let us do some profiling on openssl with both the approaches and find out the
> > > difference.
> > >
> > > >
> > > > > So IMO, there is no advantage in your suggestion as well.
> > > > >
> > > > >
> > > > > 	Disadvantages:
> > > > > 	3) user has to carry on session_ops pointer explicitly
> > > > >     b) Your suggestion: add (*cpu_process) inside rte_cryptodev_ops and then:
> > > > >         rte_crypto_cpu_sym_process(uint8_t dev_id, rte_cryptodev_sym_session *sess, /*data parameters*/) {...
> > > > >                      rte_cryptodevs[dev_id].dev_ops->cpu_process(sess, ...);
> > > > >                      /*and then inside PMD specific process: */
> > > > >                      pmd_private_session = sess->sess_data[this_pmd_driver_id].data;
> > > > >                      /* and then most likely either */
> > > > >                      pmd_private_session->process(pmd_private_session, ...);
> > > > >                      /* or jump based on session/input data */
> > > > > 	Advantages:
> > > > > 	1) don't see any...
> > > > > 	Disadvantages:
> > > > > 	2) User has to carry on dev_id inside data-path
> > > > > 	3) Extra level of indirection (plus data dependency) - both for data and instructions.
> > > > > 	    Possible slowdown compared to a) (not measured).
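The difference in dispatch between a) and b) can be sketched in plain C. All names are hypothetical stand-ins for the structures discussed above. The sketch omits b)'s further lookup of the private session via `sess_data[driver_id]`, so it only shows the extra dependent load of a per-device ops table before the indirect call; it says nothing about actual cycle counts, which were not measured in this thread.

```c
#include <stddef.h>
#include <stdint.h>

struct hyp_cpu_sess;
typedef uint32_t (*hyp_process_t)(struct hyp_cpu_sess *s, uint8_t *buf[], size_t num);

/* Option a): the user keeps this ops table alongside the opaque session. */
struct hyp_cpu_sess_ops {
	hyp_process_t process;
};

struct hyp_cpu_sess {
	uint8_t key;
};

/* PMD burst function; XOR of the first byte is a placeholder for real crypto. */
static uint32_t hyp_pmd_process(struct hyp_cpu_sess *s, uint8_t *buf[], size_t num)
{
	for (size_t i = 0; i < num; i++)
		buf[i][0] ^= s->key;
	return (uint32_t)num;
}

/* Option b): a global per-device array whose entry points at the PMD's ops. */
#define HYP_MAX_DEVS 2
struct hyp_dev_ops { hyp_process_t cpu_process; };
struct hyp_dev { const struct hyp_dev_ops *dev_ops; };
static const struct hyp_dev_ops hyp_pmd_ops = { .cpu_process = hyp_pmd_process };
static struct hyp_dev hyp_devs[HYP_MAX_DEVS] = { { &hyp_pmd_ops }, { &hyp_pmd_ops } };

/* a) one indirect call straight into the PMD */
static inline uint32_t hyp_process_a(const struct hyp_cpu_sess_ops *ops,
				     struct hyp_cpu_sess *s, uint8_t *buf[], size_t num)
{
	return ops->process(s, buf, num);
}

/* b) load dev_ops by dev_id first: one more dependent load before the call */
static inline uint32_t hyp_process_b(uint8_t dev_id, struct hyp_cpu_sess *s,
				     uint8_t *buf[], size_t num)
{
	return hyp_devs[dev_id].dev_ops->cpu_process(s, buf, num);
}
```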
> > > > >
> > > > > Having said all this, if the disagreements cannot be resolved, you can go for a PMD API specific
> > > > > to your PMDs,
> > > >
> > > > I don't think it is a good idea.
> > > > A PMD specific API is sort of a deprecated path; also there is no clean way to use it
> > > > within the libraries.
> > >
> > > I know that this is a deprecated path; we can use it as long as we are not allowed
> > > to break ABI/API.
> > >
> > > >
> > > > > because as per my understanding the solution doesn't look scalable to other PMDs.
> > > > > Your approach is aligned only to Intel and will not benefit others like openssl, which is used by all
> > > > > vendors.
> > > >
> > > > I feel quite the opposite; from my perspective the majority of SW backed PMDs will
> > > > benefit from it.
> > > > And I don't see anything Intel specific in my proposals above.
> > > > About the openssl PMD: I am not an expert here, but looking at the code, I think it
> > > > will fit really well.
> > > > Look yourself at its internal functions:
> > > > process_openssl_auth_op/process_openssl_crypto_op,
> > > > I think they are doing exactly the same - they use a sync API underneath, and they are
> > > > session based
> > > > (AFAIK you don't need any device/queue data, everything that is needed for
> > > > crypto/auth is stored inside the session).
> > > >
> > > By vendor specific, I mean:
> > > - no PMD would like to have 2 different variants of session init APIs for doing the same stuff.
> > > - stacks will become vendor specific while using 2 separate session create APIs. No stack would
> > > like to support 2 variants of session create - one for HW PMDs and one for SW PMDs.
> >
> > I think what you refer to has nothing to do with 'vendor specific'.
> > I would name it 'extra overhead for PMD and stack writers'.
> > Yes, for sure there is extra overhead (as always with new API) -
> > for both producer (PMD writer) and consumer (stack writer):
> > new function(s) to support, probably more tests to create/run, etc.
> > Though this API is optional - if a PMD/stack maintainer doesn't see
> > value in it, they are free not to support it.
> > On the other side, re-using rte_cryptodev_sym_session_init()
> > wouldn't help anyway - both data-path and control-path would differ
> > from async mode anyway.
> > BTW, right now to support different HW flavors
> > we do have 4 different control and data-paths for both
> > ipsec-secgw and librte_ipsec:
> > lksd-none/lksd-proto/inline-crypto/inline-proto.
> > And that is considered to be ok.
> 
> No, that is not ok. We cannot add new paths for every other case.

What I am saying: if, let's say, lookaside-proto/inline-crypto/inline-proto
deserve their own case in the rte_security/rte_crypto API,
I don't understand why cpu-crypto doesn't.

> Those 4 are controlled using 2 sets of APIs.

Yes, there are 2 API sets (rte_cryptodev/rte_security),
but in fact if you look at ipsec-secgw and librte_ipsec we have 4 different code paths.
For both create_session() and ipsec_enqueue() we have a big switch() with 4 different cases.
Nearly the same for librte_ipsec - we have different prepare/process
function pointers for each security type.
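The shape described here (one switch on the control path, one function pointer per type on the data path) can be sketched as follows. The enum and functions are hypothetical and only loosely mirror the four modes named above; they are not the actual ipsec-secgw or librte_ipsec code.

```c
/* Hypothetical enum mirroring the four modes listed above. */
enum hyp_action_type {
	HYP_LKSD_NONE,     /* lookaside crypto through rte_cryptodev */
	HYP_LKSD_PROTO,    /* lookaside protocol offload through rte_security */
	HYP_INLINE_CRYPTO, /* inline crypto through rte_security */
	HYP_INLINE_PROTO,  /* inline protocol offload through rte_security */
	HYP_ACTION_MAX
};

/* Control path: four cases, but only two API sets behind them. */
static const char *hyp_create_session(enum hyp_action_type type)
{
	switch (type) {
	case HYP_LKSD_NONE:
		return "rte_cryptodev";
	case HYP_LKSD_PROTO:
	case HYP_INLINE_CRYPTO:
	case HYP_INLINE_PROTO:
		return "rte_security";
	default:
		return "unknown";
	}
}

/* Data path: a separate process function per type, selected once per session. */
typedef int (*hyp_process_fn)(void);

static int hyp_proc_lksd_none(void) { return HYP_LKSD_NONE; }
static int hyp_proc_lksd_proto(void) { return HYP_LKSD_PROTO; }
static int hyp_proc_inline_crypto(void) { return HYP_INLINE_CRYPTO; }
static int hyp_proc_inline_proto(void) { return HYP_INLINE_PROTO; }

static const hyp_process_fn hyp_process_table[HYP_ACTION_MAX] = {
	[HYP_LKSD_NONE] = hyp_proc_lksd_none,
	[HYP_LKSD_PROTO] = hyp_proc_lksd_proto,
	[HYP_INLINE_CRYPTO] = hyp_proc_inline_crypto,
	[HYP_INLINE_PROTO] = hyp_proc_inline_proto,
};
```

In this shape, a cpu-crypto mode would mean one more enumerator, one more case and one more table entry on each side.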

> We should try our best to
> have minimum overhead for the application writer. This pain was also discussed
> in one of the DPDK conferences.
> DPDK is not a standalone entity; there are always stacks running over it.
> We should not add an API for every other use case when we have an alternative
> approach with the existing API set.
> 
> Now introducing another one would add to that pain and a lot of work for
> both producer and consumer.

If I could see a clean approach to implement the desired functionality
without introducing a new API - I would definitely support it.
The problem is that from my perspective,
what you are suggesting with the existing API will bring more drawbacks than positives.
BTW, our first approach (via rte_security) does reuse the existing API,
so if adding a new API is the main concern - let's reconsider that path.

> It would be interesting to see how much performance difference there will be between the
> two approaches. As per my understanding it won't be much compared to the
> extra work that you will be inducing.
> 
> -Akhil
> 
> > Honestly, I don't understand why SW-backed implementations
> > can't have their own path that would suit them best.
> > Konstantin
  
Akhil Goyal Nov. 1, 2019, 1:53 p.m. UTC | #39
Hi Konstantin,

> 
> 
> Hi Akhil,
> 
> > > > > > Added my comments inline with your draft.
> > > > > > [snip]..
> > > > > >
> > > > > > >
> > > > > > > Ok, then my suggestion:
> > > > > > > Let's at least write down all points about the crypto-dev approach where we
> > > > > > > disagree and then probably try to resolve them one by one....
> > > > > > > If we fail to make an agreement/progress in the next week or so
> > > > > > > (and there are no more reviews from the community),
> > > > > > > we will have to bring that subject to the TB meeting to decide.
> > > > > > > Sounds fair to you?
> > > > > > Agreed
> > > > > > >
> > > > > > > List is below.
> > > > > > > Please add/correct me, if I missed something.
> > > > > > >
> > > > > > > Konstantin
> > > > > >
> > > > > > Before going into comparison, we should define the requirement as well.
> > > > >
> > > > > Good point.
> > > > >
> > > > > > What I understood from the patchset,
> > > > > > "You need a synchronous API to perform crypto operations on raw data using SW PMDs"
> > > > > > So,
> > > > > > - no crypto-ops,
> > > > > > - no separate enq-deq, only single process API for data path
> > > > > > - Do not need any value addition to the session parameters.
> > > > > >   (You would need some parameters from the crypto-op which
> > > > > >    are constant per session and since you won't use crypto-op,
> > > > > >    you need some place to store that)
> > > > >
> > > > > Yes, this is correct, I think.
> > > > >
> > > > > >
> > > > > > Now as per your mail, the comparison
> > > > > > 1. extra input parameters to create/init rte_(cpu)_sym_session.
> > > > > >
> > > > > > Will leverage existing 6B gap inside rte_crypto_*_xform between 'algo' and 'key' fields.
> > > > > > New fields will be optional and would be used by PMD only when cpu-crypto session is requested.
> > > > > > For lksd-crypto session PMD is free to ignore these fields.
> > > > > > No ABI breakage is required.
> > > > > >
> > > > > > [Akhil] Agreed, no issues.
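Why filling existing padding need not break ABI can be illustrated with two hypothetical layouts. These structs are not the real `rte_crypto_*_xform` definitions, and the 6-byte hole shown here exists only on an LP64 target (8-byte pointers); on other ABIs the numbers differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "before" layout: on LP64 the compiler pads 6 bytes after algo
 * so that the 8-byte pointer is naturally aligned. */
struct hyp_xform_v1 {
	uint16_t algo;
	const uint8_t *key;
};

/* Hypothetical "after" layout: the hole now holds optional sync-only fields.
 * Zero could be treated by a PMD as "not set" for zero-initialized old callers. */
struct hyp_xform_v2 {
	uint16_t algo;
	uint16_t iv_len;     /* optional, 0 = not set */
	uint32_t digest_len; /* optional, 0 = not set */
	const uint8_t *key;
};

/* Returns 1 when the new layout keeps the struct size and the key offset unchanged. */
static int hyp_layout_compatible(void)
{
	return sizeof(struct hyp_xform_v1) == sizeof(struct hyp_xform_v2) &&
	       offsetof(struct hyp_xform_v1, key) == offsetof(struct hyp_xform_v2, key);
}
```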
> > > > > >
> > > > > > 2. cpu-crypto create/init.
> > > > > >     a) Our suggestion - introduce new API for that:
> > > > > >         - rte_crypto_cpu_sym_init() that would init completely opaque rte_crypto_cpu_sym_session.
> > > > > >         - struct rte_crypto_cpu_sym_session_ops {(*process)(...); (*clear); /*whatever else we'll need */};
> > > > > >         - rte_crypto_cpu_sym_get_ops(const struct rte_crypto_sym_xform *xforms)
> > > > > >           that would return const struct rte_crypto_cpu_sym_session_ops * based on input xforms.
> > > > > > 	Advantages:
> > > > > > 	1)  totally opaque data structure (no ABI breakages in future), PMD writer is totally free
> > > > > > 	     with its format and contents.
> > > > > >
> > > > > > [Akhil] It will have breakage at some point till we don't hit the union size.
> > > > >
> > > > > Not sure, what union you are talking about?
> > > >
> > > > Union of xforms in rte_security_session_conf
> > >
> > > Hmm, how does it relates here?
> > > I thought we discussing pure rte_cryptodev_sym_session, no?
> > >
> > > >
> > > > >
> > > > > > Rather, I don't expect there will be more parameters added.
> > > > > > Or do we really care about the ABI breakage when the argument is about
> > > > > > the correct place to add a piece of code, or do we really agree to add code
> > > > > > anywhere just to avoid that breakage?
> > > > >
> > > > > I am talking about maintaining it in future.
> > > > > If your struct is not seen externally, there is no chance to introduce ABI breakage.
> > > > >
> > > > > >
> > > > > > 	2) each session entity is self-contained, user doesn't need to bring along dev_id etc.
> > > > > > 	    dev_id is needed only at init stage, after that user will use session ops to perform
> > > > > > 	    all operations on that session (process(), clear(), etc.).
> > > > > >
> > > > > > [Akhil] There is nothing called as session ops in current DPDK.
> > > > >
> > > > > True, but it doesn't mean we can't/shouldn't have it.
> > > >
> > > > We can have it if it is not adding complexity for the user. Creating 2 different code
> > > > paths for the user is not desirable for the stack developers.
> > > >
> > > > >
> > > > > > What you are proposing
> > > > > > is a new concept which doesn't have any extra benefit; rather it is adding complexity
> > > > > > to have two different code paths for session create.
> > > > > >
> > > > > >
> > > > > > 	3) User can decide whether he wants to store ops[] pointer on a per session basis,
> > > > > > 	    or on a per group of same sessions, or...
> > > > > >
> > > > > > [Akhil] Will the user really care which process API should be called from the PMD?
> > > > > > Rather it should be the driver's responsibility to store that in the session private data
> > > > > > which would be opaque to the user. As per my suggestion, the same process function can
> > > > > > be added to multiple sessions, or a single session can be managed inside the PMD.
> > > > >
> > > > > In that case we either need to have a function per session (stored internally),
> > > > > or make decisions (branches) at run-time.
> > > > > But as I said in the other mail - I am ok to add a small shim structure here:
> > > > > either rte_crypto_cpu_sym_session { void *ses; struct rte_crypto_cpu_sym_session_ops ops; }
> > > > > or rte_crypto_cpu_sym_session { void *ses; struct rte_crypto_cpu_sym_session_ops *ops; }
> > > > > And merge rte_crypto_cpu_sym_init() and rte_crypto_cpu_sym_get_ops() into
> > > > > one (init).
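A compilable sketch of the shim-structure variant with the merged init, assuming hypothetical names modelled on the ones above (`hyp_cpu_sym_session`, `hyp_cpu_sym_session_ops`, `hyp_cpu_sym_init()`); none of these are existing DPDK API. dev_id is consumed only at init, the private data comes from plain malloc() rather than a mempool (point 4 in the list), and the XOR loop is a placeholder for a real cipher.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-session ops table, as in the proposal above. */
struct hyp_cpu_sym_session_ops {
	int (*process)(void *ses, uint8_t *data, size_t len);
	void (*clear)(void *ses);
};

/* Hypothetical shim: opaque PMD data plus a pointer to its ops. */
struct hyp_cpu_sym_session {
	void *ses;
	const struct hyp_cpu_sym_session_ops *ops;
};

/* PMD side: private layout invisible to the application. */
struct hyp_pmd_priv {
	uint8_t key;
};

/* Placeholder "cipher": XOR with the key byte. */
static int hyp_pmd_process(void *ses, uint8_t *data, size_t len)
{
	const struct hyp_pmd_priv *p = ses;

	for (size_t i = 0; i < len; i++)
		data[i] ^= p->key;
	return 0;
}

static void hyp_pmd_clear(void *ses)
{
	free(ses);
}

static const struct hyp_cpu_sym_session_ops hyp_pmd_ops = {
	.process = hyp_pmd_process,
	.clear = hyp_pmd_clear,
};

/* Merged init: dev_id is used only here; memory comes from malloc(), not a mempool. */
static int hyp_cpu_sym_init(struct hyp_cpu_sym_session *cs, uint8_t dev_id, uint8_t key)
{
	struct hyp_pmd_priv *p;

	(void)dev_id; /* a real implementation would select the PMD from dev_id */
	p = malloc(sizeof(*p));
	if (p == NULL)
		return -1;
	p->key = key;
	cs->ses = p;
	cs->ops = &hyp_pmd_ops;
	return 0;
}
```

With the pointer variant, all sessions created by one PMD can share a single const ops table, and the data path is one indirect call through cs->ops.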
> > > >
> > > > Again that will be a separate API call from the user perspective, which is not good.
> > > >
> > > > >
> > > > > >
> > > > > >
> > > > > > 	4) No mandatory mempools for private sessions. User can allocate memory for cpu-crypto
> > > > > > 	    session whenever he likes.
> > > > > >
> > > > > > [Akhil] you mean session private data?
> > > > >
> > > > > Yes.
> > > > >
> > > > > > You would need that memory anyways, user will be
> > > > > > allocating that already.  You do not need to manage that.
> > > > >
> > > > > What I am saying - right now the user has no choice but to allocate it via mempool.
> > > > > Which is probably not the best option for all cases.
> > > > >
> > > > > >
> > > > > > 	Disadvantages:
> > > > > > 	5) Extra changes in control path
> > > > > > 	6) User has to store session_ops pointer explicitly.
> > > > > >
> > > > > > [Akhil] More disadvantages:
> > > > > > - All supporting PMDs will need to maintain TWO types of session for the
> > > > > > same crypto processing. Suppose a fix or a new feature (or algo) is added: the PMD owner
> > > > > > will need to add code in both the session create APIs. Hence more maintenance and
> > > > > > more error prone.
> > > > >
> > > > > I think the majority of code for both paths will be common; plus, even if we reuse
> > > > > the current sym_session_init(),
> > > > > changes in PMD session_init() code will be unavoidable.
> > > > > But yes, it will be a new entry in devops that the PMD will have to support.
> > > > > Ok to add it as 7) to the list.
> > > > >
> > > > > > - Stacks which will be using these new APIs also need to maintain two
> > > > > > code paths for the same processing while doing session initialization
> > > > > > for sync and async
> > > > >
> > > > > That's the same as #5 above, I think.
> > > > >
> > > > > >
> > > > > >
> > > > > >      b) Your suggestion - reuse existing rte_cryptodev_sym_session_init() and existing rte_cryptodev_sym_session
> > > > > >       structure.
> > > > > > 	Advantages:
> > > > > > 	1) allows to reuse same struct and init/create/clear() functions.
> > > > > > 	    Probably less changes in control path.
> > > > > > 	Disadvantages:
> > > > > > 	2) rte_cryptodev_sym_session.sess_data[] is indexed by driver_id, which means that
> > > > > > 	    we can't use the same rte_cryptodev_sym_session to hold private session pointers
> > > > > > 	    for both sync and async mode for the same device.
> > > > > > 	    So the only option we have is to make PMD devops->sym_session_configure()
> > > > > > 	    always create a session that can work in both cpu and lksd modes.
> > > > > > 	    For some implementations that would probably mean that under the hood the PMD would create
> > > > > > 	    2 different session structs (sync/async) and then use one or the other depending on which API it was called from.
> > > > > > 	    Seems doable, but ...:
> > > > > > 	    - it will contradict the statement from 1:
> > > > > > 	      "New fields will be optional and would be used by PMD only when cpu-crypto session is requested."
> > > > > > 	      Now it becomes mandatory for all apps to specify cpu-crypto related parameters too,
> > > > > > 	      even if they don't plan to use that mode - i.e. behavior change, existing app change.
> > > > > > 	    - it might cause extra space overhead.
> > > > > >
> > > > > > [Akhil] It will not contradict #1: you will only have a few checks in the session init of a PMD
> > > > > > which supports this mode, to find the appropriate values and set the appropriate process() in it.
> > > > > > User should be able to call legacy enq-deq as well as the new process() without any issue.
> > > > > > User will be able to change the datapath at runtime.
> > > > > > So this is not a disadvantage; it would be additional flexibility for the user.
> > > > >
> > > > > Ok, but that's what I am saying - if the PMD would *always* have to create a
> > > > > session that can handle
> > > > > both modes (sync/async), then the user would *always* have to provide parameters
> > > > > for both modes too.
> > > > > Otherwise, if, let's say, the user didn't set up sync-specific parameters at all, what should
> > > > > the PMD do?
> > > > >   - return with error?
> > > > >   - init session that can be used with async path only?
> > > > > My current assumption is #1.
> > > > > If #2, then how will the user be able to distinguish whether that session is valid for both
> > > > > modes, or only for one?
> > > >
> > > > I would say a 3rd option: do nothing if sync params are not set.
> > > > Probably have a debug print in the PMD (which supports sync mode) to specify that
> > > > the session is not configured properly for sync mode.
> > >
> > > So, just print a warning and proceed with init of a session that can be used with the async
> > > path only?
> > > Then it sounds the same as #2 above.
> > > Which actually means that sync mode parameters for sym_session_init()
> > > become optional.
> > > Then we need an API to provide the user information about which modes
> > > (sync+async/async only) are supported by that session for a given dev_id.
> > > And the user would have to query/retain this information at control-path,
> > > and store it somewhere in user-space together with session pointer and dev_ids
> > > to use later at data-path (same as we do now for session type).
> > > That definitely requires changes in control-path to start using it.
> > > Plus the fact that this value can differ for different dev_ids for the same session
> > > doesn't make things easier here.
> >
> > An API won't be required to specify that. A feature flag will be sufficient, not a big change
> > from the application perspective.
> >
> > Here is some pseudo code just to elaborate my understanding. This will need some
> >
> > From application,
> > if (dev_info->feature_flags & RTE_CRYPTODEV_FF_SYNC) {
> > 	/* set additional params in crypto xform */
> > }
> >
> > Now in the driver,
> > pmd_sym_session_configure(dev,xform,sess,mempool) {
> > 	...
> > 	if (dev_info->feature_flags & RTE_CRYPTODEV_FF_SYNC
> > 		&& xform->/*sync params are set*/) {
> > 		/*Assign process function pointer in sess->priv_data*/
> > 	} /* It may return error if FF_SYNC is set and params are not correct.
> 
> Then all apps will always *have to* set up sync parameters in xform.
> What you suggest is *mandatory* sync mode: the user always has to set up sync
> mode params if the PMD does support it (whether or not he plans to use sync mode).
> Which means a behavior change in existing apps.

We are adding new params in xform; the user may not fill those params, and they default
to 0. Or we can pack a flag in xform when all sync params are set.
It can be dealt with when we write the code.

I am not saying the user will always have to set the params when sync mode is supported.
It will be a warning from the PMD, and the user may ignore it if he doesn't want to use sync mode.


> 
> > 	        It would be up to the driver whether it supports both SYNC and ASYNC.*/
> > }
> >
> > Now the new sync API
> >
> > pmd_process(...) {
> > 	if (dev_info->feature_flags & RTE_CRYPTODEV_FF_SYNC
> > 			 && sess_priv->process != NULL)
> > 		sess_priv->process(...);
> > 	else
> > 		ASSERT("sync mode not configured properly or not supported");
> > }
> >
> > In the data path, there is no extra processing happening.
> > Even in case of your suggestion, you should have these types of error checks;
> > you cannot blindly trust the application that the pointers are correct.
> >
> > >
> > > > Internally the PMD will not store the process() API in the session priv data,
> > > > and while processing the first packet, devops->process will give an assert that the session
> > > > is not configured for sync mode. The session validation would be done in any case,
> > > > your suggestion or mine. So no extra overhead at runtime.
> > >
> > > I believe that after session_init() user should get either an error or
> > > valid session handler that he can use at runtime.
> > > Pushing session validation to runtime doesn't seem like a good idea.
> > >
> > It may get a warning from the PMD that FF_SYNC is set but params are not
> > correct/available. See above.
> 
> I think warning is not enough.
> There should be a clear way (API) for the developer to determine whether the created session
> can be used by the sync API data-path or not.

A warning is a clear notification to the user that SYNC mode can be supported by the device
but the user does not want to use it.
Moreover, when the first packet is sent, the sync API will throw an error. So what is the issue?

> 
> >
> > > >
> > > > >
> > > > >
> > > > > >
> > > > > >
> > > > > > 	3) not possible to store device (not driver) specific data within the session, but I think it is not really needed right now.
> > > > > > 	    So probably minor compared to 2.b.2.
> > > > > >
> > > > > > [Akhil] So let's omit this for the current discussion. And I hope we can find some way to deal with it.
> > > > >
> > > > > I don't think there is an easy way to fix that with existing API.
> > > > >
> > > > > >
> > > > > >
> > > > > > Actually #3 follows from #2, but decided to have them separated.
> > > > > >
> > > > > > 3. process() parameters/behavior
> > > > > >     a) Our suggestion: user stores ptr to session ops (or to (*process) itself) and just does:
> > > > > >         session_ops->process(sess, ...);
> > > > > > 	Advantages:
> > > > > > 	1) fastest possible execution path
> > > > > > 	2) no need to carry on dev_id for data-path
> > > > > >
> > > > > > [Akhil] I don't see any overhead of carrying dev id; at least it would be in line with the
> > > > > > current DPDK methodology.
> > > > >
> > > > > If we'll add process() into rte_cryptodev itself (same as we have
> > > > > enqueue_burst/dequeue_burst),
> > > > > then it will be an ABI breakage.
> > > > > Also there are discussions to get rid of that approach completely:
> > > > >
> > > > > http://mails.dpdk.org/archives/dev/2019-September/144674.html
> > > > > So I am not sure this is a recommended way these days.
> > > >
> > > > We can either have it in rte_cryptodev or in rte_cryptodev_ops whichever
> > > > is good for you.
> > > >
> > > > Whether it is ABI breakage or not, as per your requirements, this is the correct
> > > > approach. Do you agree with this or not?
> > >
> > > I think it is a possible approach, but not the best one:
> > > it looks quite flaky to me (see all this uncertainty with sym_session_init above),
> > > plus it introduces extra overhead at data-path.
> >
> > Uncertainties can be handled appropriately using a feature flag
> >
> > >
> > > >
> > > > Now handling the API/ABI breakage is a separate story. In the 19.11 release we
> > > > are not much concerned about ABI breakages; this was discussed in the
> > > > community. So adding a new dev_ops wouldn't have been an issue.
> > > > Now since we are so close to the RC1 deadline, we should come up with some
> > > > other solution for the next release. Maybe having a PMD API in 20.02 and
> > > > converting it into a formal one in 20.11.
> > > >
> > > >
> > > > >
> > > > > > What you are suggesting is a new way to get things done without much benefit.
> > > > >
> > > > > It would help with ABI stability plus better performance; isn't that enough?
> > > > >
> > > > > > Also I don't see any performance difference, as the crypto workload is
> > > > > > heavier than the code cycles, so that won't matter.
> > > > >
> > > > > It depends.
> > > > > Suppose a function call costs you ~30 cycles.
> > > > > If you have a burst of big packets (let's say crypto for each takes ~2K
> > > > > cycles) that belong to the same session, then yes, you wouldn't notice
> > > > > these extra 30 cycles at all.
> > > > > If you have a burst of small packets (let's say crypto for each takes ~300
> > > > > cycles) and each belongs to a different session, then it will cost you
> > > > > ~10% extra.
> > > >
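The cost argument above is simple arithmetic, so here is a small C helper that reproduces it. This is a sketch under stated assumptions: `burst_overhead_pct` is a hypothetical helper, not a DPDK function; the cycle counts are the thread's own rough estimates, not measurements; and it assumes one extra call per session group in a burst.

```c
#include <assert.h>

/*
 * Extra dispatch cost for one burst, as a percentage of the crypto work in
 * that burst. All cycle counts are caller-supplied estimates.
 *
 * call_cycles           cost of one extra (indirect) call, e.g. ~30
 * crypto_cycles_per_pkt crypto cost per packet, e.g. ~2000 or ~300
 * n_pkts                packets in the burst
 * n_calls               extra calls paid for the burst: 1 if all packets
 *                       share one session, n_pkts if each has its own
 */
double burst_overhead_pct(double call_cycles, double crypto_cycles_per_pkt,
			  unsigned int n_pkts, unsigned int n_calls)
{
	assert(crypto_cycles_per_pkt > 0.0 && n_pkts > 0);
	return 100.0 * (call_cycles * n_calls) /
	       (crypto_cycles_per_pkt * n_pkts);
}
```

With these numbers, a 32-packet burst of big packets sharing one session pays about 0.05% (one 30-cycle call against 32 x 2K cycles). The same burst of small packets, each on its own session, pays 10% (30/300 per packet). Even one call per big packet would only cost 1.5% (30/2000).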
> > > > Let us do some profiling on openssl with both approaches and find out the
> > > > difference.
> > > >
> > > > >
> > > > > > So IMO, there is no advantage in your suggestion either.
> > > > > >
> > > > > >
> > > > > > 	Disadvantages:
> > > > > > 	3) User has to carry the session_ops pointer explicitly
> > > > > >     b) Your suggestion: add (*cpu_process) inside rte_cryptodev_ops and then:
> > > > > >         rte_crypto_cpu_sym_process(uint8_t dev_id, struct rte_cryptodev_sym_session *sess, /*data parameters*/) {...
> > > > > >                      rte_cryptodevs[dev_id].dev_ops->cpu_process(sess, ...);
> > > > > >                      /* and then inside the PMD-specific process: */
> > > > > >                      pmd_private_session = sess->sess_data[this_pmd_driver_id].data;
> > > > > >                      /* and then most likely either */
> > > > > >                      pmd_private_session->process(pmd_private_session, ...);
> > > > > >                      /* or jump based on session/input data */
> > > > > > 	Advantages:
> > > > > > 	1) Don't see any...
> > > > > > 	Disadvantages:
> > > > > > 	2) User has to carry dev_id inside the data path
> > > > > > 	3) Extra level of indirection (plus data dependency) - both for data and instructions.
> > > > > > 	    Possible slowdown compared to a) (not measured).
> > > > > >
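Option a) has the caller carry a session_ops pointer; option b) adds a cpu_process slot to the device ops and reaches it through dev_id. The two dispatch paths can be modelled in stripped-down C. All names here (`sym_session_ops`, `sym_devs`, `toy_process`, `TOY_DRIVER_ID`) are hypothetical, not DPDK APIs, and the XOR "cipher" is only a placeholder for PMD work on its private session data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DRIVERS 4
#define MAX_DEVS 8
#define TOY_DRIVER_ID 1

struct sym_session {
	/* Per-driver private data, indexed by driver id. */
	struct {
		void *data;
	} sess_data[MAX_DRIVERS];
};

typedef int (*sym_process_t)(struct sym_session *sess, uint8_t *data,
			     size_t len);

/* Option a): ops table handed back with the session; caller carries it. */
struct sym_session_ops {
	sym_process_t process;
};

/* Option b): new slot in the per-device ops, reached via dev_id. */
struct sym_dev_ops {
	sym_process_t cpu_process;
};

struct sym_dev {
	const struct sym_dev_ops *dev_ops;
};

struct sym_dev sym_devs[MAX_DEVS];

/* Option a): one indirect call through the caller-held ops pointer. */
static inline int
cpu_process_a(const struct sym_session_ops *ops, struct sym_session *sess,
	      uint8_t *data, size_t len)
{
	return ops->process(sess, data, len);
}

/* Option b): dev_id -> device -> dev_ops -> PMD callback; the callback
 * then still has to look up its private session data. */
static inline int
cpu_process_b(uint8_t dev_id, struct sym_session *sess, uint8_t *data,
	      size_t len)
{
	assert(dev_id < MAX_DEVS && sym_devs[dev_id].dev_ops != NULL);
	return sym_devs[dev_id].dev_ops->cpu_process(sess, data, len);
}

/* Toy PMD: private data is a one-byte key; XOR stands in for crypto. */
struct toy_priv {
	uint8_t key;
};

static int
toy_process(struct sym_session *sess, uint8_t *data, size_t len)
{
	const struct toy_priv *priv = sess->sess_data[TOY_DRIVER_ID].data;
	size_t i;

	for (i = 0; i < len; i++)
		data[i] ^= priv->key;
	return 0;
}

static const struct sym_session_ops toy_sess_ops = { .process = toy_process };
static const struct sym_dev_ops toy_dev_ops = { .cpu_process = toy_process };
```

In b), two dependent loads (`sym_devs[dev_id].dev_ops`, then `->cpu_process`) precede the call, versus one in a). That is the "extra level of indirection (plus data dependency)" listed above; whether it is measurable is the open question in the thread.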
> > > > > > Having said all this, if the disagreements cannot be resolved, you can go
> > > > > > for a PMD-specific API for your PMDs,
> > > > >
> > > > > I don't think it is a good idea.
> > > > > A PMD-specific API is sort of a deprecated path; also, there is no clean way
> > > > > to use it within the libraries.
> > > >
> > > > I know that this is a deprecated path; we can use it as long as we are not
> > > > allowed to break ABI/API.
> > > >
> > > > >
> > > > > > because as per my understanding the solution doesn't look scalable to
> > > > > > other PMDs.
> > > > > > Your approach is aligned only to Intel and will not benefit others like
> > > > > > openssl, which is used by all vendors.
> > > > >
> > > > > I feel quite the opposite: from my perspective the majority of SW-backed
> > > > > PMDs will benefit from it.
> > > > > And I don't see anything Intel-specific in my proposals above.
> > > > > About the openssl PMD: I am not an expert here, but looking at the code, I
> > > > > think it will fit really well.
> > > > > Look yourself at its internal functions:
> > > > > process_openssl_auth_op/process_openssl_crypto_op.
> > > > > I think they do exactly the same - they use a sync API underneath, and they
> > > > > are session based
> > > > > (AFAIK you don't need any device/queue data; everything that is needed for
> > > > > crypto/auth is stored inside the session).
> > > > >
> > > > By vendor specific, I mean:
> > > > - no PMD would like to have 2 different variants of session init APIs for
> > > >   doing the same stuff.
> > > > - stacks will become vendor specific while using 2 separate session create
> > > >   APIs. No stack would like to support 2 variants of session create: one
> > > >   for HW PMDs and one for SW PMDs.
> > >
> > > I think what you are referring to has nothing to do with 'vendor specific'.
> > > I would call it 'extra overhead for PMD and stack writers'.
> > > Yes, for sure there is extra overhead (as always with a new API) -
> > > for both producer (PMD writer) and consumer (stack writer):
> > > new function(s) to support, probably more tests to create/run, etc.
> > > Though this API is optional - if a PMD/stack maintainer doesn't see
> > > value in it, they are free not to support it.
> > > On the other side, re-using rte_cryptodev_sym_session_init()
> > > wouldn't help anyway - both the data path and the control path would differ
> > > from async mode anyway.
> > > BTW, right now, to support different HW flavors
> > > we do have 4 different control and data paths for both
> > > ipsec-secgw and librte_ipsec:
> > > lksd-none/lksd-proto/inline-crypto/inline-proto.
> > > And that is considered to be OK.
> >
> > No, that is not OK. We cannot add new paths for every other case.
> 
> What I am saying: if, let's say, lookaside-proto/inline-crypto/inline-proto
> each deserve their own case in the rte_security/rte_crypto API,
> I don't understand why cpu-crypto doesn't.

Because cpu-crypto is done by a crypto device, and for that we have lookaside none.
SW PMDs are registered as crypto devices, and we should leverage the crypto framework.
I would suggest that in the future, maybe 20.11, we can have a security wrapper over the
cryptodev API for the lookaside-none use case, so that we will have a single API set for all cases.

> 
> > Those 4 are controlled using 2 sets of APIs.
> 
> Yes, there are 2 API sets (rte_cryptodev/rte_security),
> but in fact if you look at ipsec-secgw and librte_ipsec we have 4 different code
> paths.
> For both create_session() and ipsec_enqueue() we have a big switch() with 4
> different cases.
> Nearly the same for librte_ipsec - we have different prepare/process
> function pointers for each security type.
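The "big switch() with 4 different cases" can be made concrete with a simplified sketch of how a stack might branch on the action type, including the fifth case this patch adds. The enum only mirrors the `rte_security_session_action_type` names so the sketch builds without DPDK headers. `select_sa_path` and the strings are an assumed, coarse summary of each path, not code taken from ipsec-secgw or librte_ipsec.

```c
#include <assert.h>
#include <stddef.h>

/* Local mirror of the action type names after this patch. */
enum action_type {
	ACTION_TYPE_NONE,
	ACTION_TYPE_INLINE_CRYPTO,
	ACTION_TYPE_INLINE_PROTOCOL,
	ACTION_TYPE_LOOKASIDE_PROTOCOL,
	ACTION_TYPE_CPU_CRYPTO,
};

/* Coarse summary of one code path (assumed, simplified). */
struct sa_path {
	const char *session_api;   /* where the session is created */
	const char *datapath;      /* how packets reach the crypto engine */
	int result_in_call;        /* 1 if output is ready when the call returns */
};

struct sa_path
select_sa_path(enum action_type type)
{
	struct sa_path p = { NULL, NULL, 0 };

	switch (type) {
	case ACTION_TYPE_NONE:
		p.session_api = "rte_cryptodev";
		p.datapath = "cryptodev enqueue/dequeue";
		break;
	case ACTION_TYPE_INLINE_CRYPTO:
	case ACTION_TYPE_INLINE_PROTOCOL:
		p.session_api = "rte_security (ethdev ctx)";
		p.datapath = "ethdev rx/tx with offload";
		break;
	case ACTION_TYPE_LOOKASIDE_PROTOCOL:
		p.session_api = "rte_security (cryptodev ctx)";
		p.datapath = "cryptodev enqueue/dequeue";
		break;
	case ACTION_TYPE_CPU_CRYPTO:
		/* The extra case every such switch would need with this patch. */
		p.session_api = "rte_security";
		p.datapath = "rte_security_process_cpu_crypto_bulk";
		p.result_in_call = 1;
		break;
	default:
		assert(!"unknown action type");
	}
	return p;
}
```

Only the CPU_CRYPTO case returns processed data from the call itself. That synchronous property is the crux of the disagreement over whether it needs its own case.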
> 
> > We should try our best to
> > have minimum overhead for the application writer. This pain was also
> > discussed at one of the DPDK conferences.
> > DPDK is not a standalone entity; there are always stacks running over it.
> > We should not add an API for every other use case when we have an alternative
> > approach with the existing API set.
> >
> > Now introducing another one would add to that pain and a lot of work for
> > both producer and consumer.
> 
> If I saw a clean approach to implement the desired functionality
> without introducing a new API, I would definitely support it.
> The problem is that, from my perspective,
> what you are suggesting with the existing API will bring more drawbacks than positives.

From my perspective I see more benefits than negatives:
- fewer changes in driver/app
- no major performance gap
- easier migration for the stack from one SoC to another.

The main argument from my side is this:
you need synchronous processing for SW PMDs, which is a data-path concern.
Why do you need a special session control path to do that? You should have some extra
params packed into the same control API.

> BTW, our first approach (via rte_security) does reuse the existing API,
> so if adding a new API is the main concern, let's reconsider that path.
> 
That will be possible only if we have a security wrapper over cryptodev session create
for the lookaside-none use case. But the issue would still remain the same:
no special session create for supporting sync mode.


> > It would be interesting to see how much performance difference there will be
> > between the two approaches. As per my understanding, it won't be much compared
> > to the extra work that you will be inducing.
> >
> > -Akhil
> >
> > > Honestly, I don't understand why SW-backed implementations
> > > can't have their own path that would suit them best.
> > > Konstantin

Patch

diff --git a/lib/librte_security/rte_security.c b/lib/librte_security/rte_security.c
index bc81ce15d..0f85c1b59 100644
--- a/lib/librte_security/rte_security.c
+++ b/lib/librte_security/rte_security.c
@@ -141,3 +141,19 @@  rte_security_capability_get(struct rte_security_ctx *instance,
 
 	return NULL;
 }
+
+void
+rte_security_process_cpu_crypto_bulk(struct rte_security_ctx *instance,
+		struct rte_security_session *sess,
+		struct rte_security_vec buf[], void *iv[], void *aad[],
+		void *digest[], int status[], uint32_t num)
+{
+	uint32_t i;
+
+	for (i = 0; i < num; i++)
+		status[i] = -1;
+
+	RTE_FUNC_PTR_OR_RET(*instance->ops->process_cpu_crypto_bulk);
+	instance->ops->process_cpu_crypto_bulk(sess, buf, iv,
+			aad, digest, status, num);
+}
diff --git a/lib/librte_security/rte_security.h b/lib/librte_security/rte_security.h
index 96806e3a2..5a0f8901b 100644
--- a/lib/librte_security/rte_security.h
+++ b/lib/librte_security/rte_security.h
@@ -18,6 +18,7 @@  extern "C" {
 #endif
 
 #include <sys/types.h>
+#include <sys/uio.h>
 
 #include <netinet/in.h>
 #include <netinet/ip.h>
@@ -272,6 +273,20 @@  struct rte_security_pdcp_xform {
 	uint32_t hfn_threshold;
 };
 
+struct rte_security_cpu_crypto_xform {
+	/** For cipher/authentication crypto operations the authentication may
+	 * cover more content than the cipher. E.g., for IPsec ESP encryption
+	 * with AES-CBC and SHA1-HMAC, the encryption starts after the ESP
+	 * header and IV, while authentication also covers the ESP header and IV.
+	 * The cipher_offset field is used to derive the cipher data pointer
+	 * from the buffer to be processed.
+	 *
+	 * NOTE: this parameter shall be ignored by AEAD algorithms, since they
+	 * use the same offset for cipher and authentication.
+	 */
+	int32_t cipher_offset;
+};
+
 /**
  * Security session action type.
  */
@@ -286,10 +301,14 @@  enum rte_security_session_action_type {
 	/**< All security protocol processing is performed inline during
 	 * transmission
 	 */
-	RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL
+	RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL,
 	/**< All security protocol processing including crypto is performed
 	 * on a lookaside accelerator
 	 */
+	RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO
+	/**< Crypto processing for the security protocol is performed by the CPU
+	 * synchronously
+	 */
 };
 
 /** Security session protocol definition */
@@ -315,6 +334,7 @@  struct rte_security_session_conf {
 		struct rte_security_ipsec_xform ipsec;
 		struct rte_security_macsec_xform macsec;
 		struct rte_security_pdcp_xform pdcp;
+		struct rte_security_cpu_crypto_xform cpucrypto;
 	};
 	/**< Configuration parameters for security session */
 	struct rte_crypto_sym_xform *crypto_xform;
@@ -639,6 +659,35 @@  const struct rte_security_capability *
 rte_security_capability_get(struct rte_security_ctx *instance,
 			    struct rte_security_capability_idx *idx);
 
+/**
+ * Security vector structure: contains a pointer to an iovec array and the
+ * number of elements in that array
+ */
+struct rte_security_vec {
+	struct iovec *vec;
+	uint32_t num;
+};
+
+/**
+ * Process a bulk crypto workload with the CPU
+ *
+ * @param	instance	security instance.
+ * @param	sess		security session.
+ * @param	buf		array of buffer SGL vectors.
+ * @param	iv		array of IV pointers.
+ * @param	aad		array of AAD pointers.
+ * @param	digest		array of digest pointers.
+ * @param	status		array of per-element status values set by the function.
+ * @param	num		number of elements in each array.
+ *
+ */
+__rte_experimental
+void
+rte_security_process_cpu_crypto_bulk(struct rte_security_ctx *instance,
+		struct rte_security_session *sess,
+		struct rte_security_vec buf[], void *iv[], void *aad[],
+		void *digest[], int status[], uint32_t num);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_security/rte_security_driver.h b/lib/librte_security/rte_security_driver.h
index 1b561f852..70fcb0c26 100644
--- a/lib/librte_security/rte_security_driver.h
+++ b/lib/librte_security/rte_security_driver.h
@@ -132,6 +132,23 @@  typedef int (*security_get_userdata_t)(void *device,
 typedef const struct rte_security_capability *(*security_capabilities_get_t)(
 		void *device);
 
+/**
+ * Process crypto operations in bulk using the CPU.
+ *
+ * @param	sess		Security session structure.
+ * @param	buf		Array of buffer SGL vectors to be processed.
+ * @param	iv		IV pointers.
+ * @param	aad		AAD pointers.
+ * @param	digest		Digest pointers.
+ * @param	status		Array of status values.
+ * @param	num		Number of elements in each array.
+ */
+
+typedef void (*security_process_cpu_crypto_bulk_t)(
+		struct rte_security_session *sess,
+		struct rte_security_vec buf[], void *iv[], void *aad[],
+		void *digest[], int status[], uint32_t num);
+
 /** Security operations function pointer table */
 struct rte_security_ops {
 	security_session_create_t session_create;
@@ -150,6 +167,8 @@  struct rte_security_ops {
 	/**< Get userdata associated with session which processed the packet. */
 	security_capabilities_get_t capabilities_get;
 	/**< Get security capabilities. */
+	security_process_cpu_crypto_bulk_t process_cpu_crypto_bulk;
+	/**< Process crypto workload in bulk using the CPU. */
 };
 
 #ifdef __cplusplus
diff --git a/lib/librte_security/rte_security_version.map b/lib/librte_security/rte_security_version.map
index 53267bf3c..2132e7a00 100644
--- a/lib/librte_security/rte_security_version.map
+++ b/lib/librte_security/rte_security_version.map
@@ -18,4 +18,5 @@  EXPERIMENTAL {
 	rte_security_get_userdata;
 	rte_security_session_stats_get;
 	rte_security_session_update;
+	rte_security_process_cpu_crypto_bulk;
 };
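Two details of the patch can be exercised without DPDK using a minimal mock: the status convention in `rte_security_process_cpu_crypto_bulk()` (every entry preset to -1, left untouched if the PMD lacks the op) and the role of `cipher_offset`. Only the `rte_security_vec` layout, the op's parameter list and the wrapper's control flow come from the patch. `mock_session`, `mock_ops` and `toy_process_bulk` are hypothetical. The toy op inverts bytes instead of running a cipher, ignores IV/AAD/digest, and looks only at the first SGL segment.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Same layout as struct rte_security_vec added by this patch. */
struct sec_vec {
	struct iovec *vec;
	uint32_t num;
};

/* Hypothetical session stand-in; only cipher_offset is modelled. */
struct mock_session {
	int32_t cipher_offset;
};

/* Same parameter list as security_process_cpu_crypto_bulk_t. */
typedef void (*mock_process_bulk_t)(struct mock_session *sess,
		struct sec_vec buf[], void *iv[], void *aad[],
		void *digest[], int status[], uint32_t num);

struct mock_ops {
	mock_process_bulk_t process_cpu_crypto_bulk;
};

/* Mirrors the wrapper's control flow: preset every status to -1, then
 * return early if the PMD does not implement the op. */
void
mock_process_cpu_crypto_bulk(const struct mock_ops *ops,
		struct mock_session *sess, struct sec_vec buf[], void *iv[],
		void *aad[], void *digest[], int status[], uint32_t num)
{
	uint32_t i;

	for (i = 0; i < num; i++)
		status[i] = -1;

	if (ops->process_cpu_crypto_bulk == NULL)
		return;
	ops->process_cpu_crypto_bulk(sess, buf, iv, aad, digest, status, num);
}

/* Toy PMD op: derives the cipher data pointer from cipher_offset in the
 * first segment and inverts the bytes after it. */
void
toy_process_bulk(struct mock_session *sess, struct sec_vec buf[],
		void *iv[], void *aad[], void *digest[], int status[],
		uint32_t num)
{
	uint32_t i;

	(void)iv;
	(void)aad;
	(void)digest;

	for (i = 0; i < num; i++) {
		struct iovec *seg;
		uint8_t *cipher;
		size_t off, j;

		if (buf[i].num == 0 || sess->cipher_offset < 0) {
			status[i] = -EINVAL;
			continue;
		}
		seg = &buf[i].vec[0];
		off = (size_t)sess->cipher_offset;
		if (off > seg->iov_len) {
			status[i] = -EINVAL;
			continue;
		}
		/* Cipher region starts cipher_offset bytes into the buffer;
		 * the bytes before it would be authenticated only. */
		cipher = (uint8_t *)seg->iov_base + off;
		for (j = 0; j < seg->iov_len - off; j++)
			cipher[j] = (uint8_t)~cipher[j];
		status[i] = 0;
	}
}
```

The wrapper returns void, so the caller can only detect a missing PMD op through `status[]`. In that case every entry stays -1, which looks the same as a generic per-element failure.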