event/octeontx2: use wfe while waiting for head

Message ID 20191023161244.3284-1-pbhagavatula@marvell.com (mailing list archive)
State Accepted, archived
Delegated to: Jerin Jacob
Series event/octeontx2: use wfe while waiting for head

Checks

Context Check Description
ci/checkpatch warning coding style issues
ci/iol-intel-Performance success Performance Testing PASS
ci/iol-compilation success Compile Testing PASS
ci/iol-mellanox-Performance success Performance Testing PASS
ci/travis-robot success Travis build: passed
ci/Intel-compilation success Compilation OK

Commit Message

Pavan Nikhilesh Bhagavatula Oct. 23, 2019, 4:12 p.m. UTC
  From: Pavan Nikhilesh <pbhagavatula@marvell.com>

Use wfe to save power while waiting for tag to become head.

SSO signals EVENTI so that cores can exit from wfe while they are
waiting for specific operations, one of which is the setting of
the HEAD bit in GWS_TAG.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
---
 drivers/event/octeontx2/otx2_worker.h | 30 ++++++++++++++++++++++++---
 1 file changed, 27 insertions(+), 3 deletions(-)
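
For illustration only, and not part of the patch: the condition being waited on is bit 35 (HEAD) of the value read from the workslot's tag register. The sketch below is a minimal, hardware-free model of the plain polling loop that the non-arm64 path keeps. The names fake_tag_reg, fake_read64 and head_wait_poll are hypothetical stand-ins, and the fake register simply reports HEAD after a fixed number of reads. The returned poll count shows the repeated reads that a wfe-based wait is meant to replace with a low-power wait.

```c
#include <stdint.h>

/* Same bit the patch tests with BIT_ULL(35). */
#define HEAD_BIT (UINT64_C(1) << 35)

/*
 * Hypothetical stand-in for the GWS_TAG register: it reports HEAD once
 * it has been read reads_until_head times, so the loop can be exercised
 * without hardware.
 */
struct fake_tag_reg {
	uint64_t value;
	unsigned int reads_until_head;
	unsigned int reads;
};

/* Assumed replacement for otx2_read64() in this model. */
static uint64_t fake_read64(struct fake_tag_reg *reg)
{
	reg->reads++;
	if (reg->reads >= reg->reads_until_head)
		reg->value |= HEAD_BIT;
	return reg->value;
}

/* Busy-poll until HEAD is set; returns how many reads were needed. */
static unsigned int head_wait_poll(struct fake_tag_reg *reg)
{
	unsigned int polls = 1;

	while (!(fake_read64(reg) & HEAD_BIT))
		polls++;
	return polls;
}
```

With reads_until_head set to 3, head_wait_poll() performs three reads before it returns.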
  

Comments

Gavin Hu Oct. 24, 2019, 3:53 p.m. UTC | #1
Hi Pavan,

> -----Original Message-----
> From: pbhagavatula@marvell.com <pbhagavatula@marvell.com>
> Sent: Thursday, October 24, 2019 12:13 AM
> To: Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>;
> jerinj@marvell.com; Pavan Nikhilesh <pbhagavatula@marvell.com>
> Cc: dev@dpdk.org
> Subject: [dpdk-dev] [PATCH] event/octeontx2: use wfe while waiting for
> head
> 
> From: Pavan Nikhilesh <pbhagavatula@marvell.com>
> 
> Use wfe to save power while waiting for tag to become head.
> 
> SSO signals EVENTI to allow cores to exit from wfe when they
> are waiting for specific operations in which one of them is
> setting HEAD bit in GWS_TAG.
> 
> Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
> ---
>  drivers/event/octeontx2/otx2_worker.h | 30 ++++++++++++++++++++++++--
> -
>  1 file changed, 27 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/event/octeontx2/otx2_worker.h
> b/drivers/event/octeontx2/otx2_worker.h
> index 4e971f27c..7a55caca5 100644
> --- a/drivers/event/octeontx2/otx2_worker.h
> +++ b/drivers/event/octeontx2/otx2_worker.h
> @@ -226,10 +226,34 @@ otx2_ssogws_swtag_wait(struct otx2_ssogws *ws)
>  }
> 
>  static __rte_always_inline void
> -otx2_ssogws_head_wait(struct otx2_ssogws *ws, const uint8_t wait_flag)
> +otx2_ssogws_head_wait(struct otx2_ssogws *ws)
>  {
> -	while (wait_flag && !(otx2_read64(ws->tag_op) & BIT_ULL(35)))
> +#ifdef RTE_ARCH_ARM64
> +	uint64_t tag;
> +
> +	asm volatile (
> +			"	ldr %[tag], [%[tag_op]]		\n"
"ldxr" should be used, exclusive-load is required to "monitor" the location, then a write to the location will cause clear of the exclusive monitor, thus a wake up event is generated implicitly.
You can find more explanation here:
http://inbox.dpdk.org/dev/AM0PR08MB5363F9D1BA158B66B803EA068F6B0@AM0PR08MB5363.eurprd08.prod.outlook.com/ 
/Gavin
> +			"	tbnz %[tag], 35, done%=		\n"
> +			"	sevl				\n"
> +			"rty%=:	wfe				\n"
> +			"	ldr %[tag], [%[tag_op]]		\n"
> +			"	tbz %[tag], 35, rty%=		\n"
> +			"done%=:				\n"
> +			: [tag] "=&r" (tag)
> +			: [tag_op] "r" (ws->tag_op)
> +			);
> +#else
> +	/* Wait for the HEAD to be set */
> +	while (!(otx2_read64(ws->tag_op) & BIT_ULL(35)))
>  		;
> +#endif
> +}
> +
> +static __rte_always_inline void
> +otx2_ssogws_order(struct otx2_ssogws *ws, const uint8_t wait_flag)
> +{
> +	if (wait_flag)
> +		otx2_ssogws_head_wait(ws);
> 
>  	rte_cio_wmb();
What ordering is this barrier meant to enforce? If the pattern is a write followed by a wait for some kind of response, should this barrier move before otx2_ssogws_head_wait?
/Gavin
>  }
> @@ -258,7 +282,7 @@ otx2_ssogws_event_tx(struct otx2_ssogws *ws,
> struct rte_event ev[],
> 
>  	/* Perform header writes before barrier for TSO */
>  	otx2_nix_xmit_prepare_tso(m, flags);
> -	otx2_ssogws_head_wait(ws, !ev->sched_type);
> +	otx2_ssogws_order(ws, !ev->sched_type);
>  	otx2_ssogws_prepare_pkt(txq, m, cmd, flags);
> 
>  	if (flags & NIX_TX_MULTI_SEG_F) {
> --
> 2.17.1
  
Pavan Nikhilesh Bhagavatula Oct. 25, 2019, 4:26 a.m. UTC | #2
Hi Gavin,

>-----Original Message-----
>From: dev <dev-bounces@dpdk.org> On Behalf Of Gavin Hu (Arm
>Technology China)
>Sent: Thursday, October 24, 2019 9:23 PM
>To: Pavan Nikhilesh Bhagavatula <pbhagavatula@marvell.com>; Jerin
>Jacob Kollanukkaran <jerinj@marvell.com>
>Cc: dev@dpdk.org; nd <nd@arm.com>
>Subject: Re: [dpdk-dev] [PATCH] event/octeontx2: use wfe while
>waiting for head
>
>Hi Pavan,
>
>> -----Original Message-----
>> From: pbhagavatula@marvell.com <pbhagavatula@marvell.com>
>> Sent: Thursday, October 24, 2019 12:13 AM
>> To: Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>;
>> jerinj@marvell.com; Pavan Nikhilesh <pbhagavatula@marvell.com>
>> Cc: dev@dpdk.org
>> Subject: [dpdk-dev] [PATCH] event/octeontx2: use wfe while waiting
>for
>> head
>>
>> From: Pavan Nikhilesh <pbhagavatula@marvell.com>
>>
>> Use wfe to save power while waiting for tag to become head.
>>
>> SSO signals EVENTI to allow cores to exit from wfe when they
>> are waiting for specific operations in which one of them is
>> setting HEAD bit in GWS_TAG.
>>
>> Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
>> ---
>>  drivers/event/octeontx2/otx2_worker.h | 30
>++++++++++++++++++++++++--
>> -
>>  1 file changed, 27 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/event/octeontx2/otx2_worker.h
>> b/drivers/event/octeontx2/otx2_worker.h
>> index 4e971f27c..7a55caca5 100644
>> --- a/drivers/event/octeontx2/otx2_worker.h
>> +++ b/drivers/event/octeontx2/otx2_worker.h
>> @@ -226,10 +226,34 @@ otx2_ssogws_swtag_wait(struct
>otx2_ssogws *ws)
>>  }
>>
>>  static __rte_always_inline void
>> -otx2_ssogws_head_wait(struct otx2_ssogws *ws, const uint8_t
>wait_flag)
>> +otx2_ssogws_head_wait(struct otx2_ssogws *ws)
>>  {
>> -	while (wait_flag && !(otx2_read64(ws->tag_op) &
>BIT_ULL(35)))
>> +#ifdef RTE_ARCH_ARM64
>> +	uint64_t tag;
>> +
>> +	asm volatile (
>> +			"	ldr %[tag], [%[tag_op]]		\n"
>"ldxr" should be used, exclusive-load is required to "monitor" the
>location, then a write to the location will cause clear of the exclusive
>monitor, thus a wake up event is generated implicitly.

As I have mentioned in the commit log:
"SSO signals EVENTI to allow cores to exit from wfe when they
are waiting for specific operations in which one of them is
setting HEAD bit in GWS_TAG."

The address need not be tracked by the global monitor.

>You can find more explanation is here:
>https://urldefense.proofpoint.com/v2/url?u=http-
>3A__inbox.dpdk.org_dev_AM0PR08MB5363F9D1BA158B66B803EA068F
>6B0-
>40AM0PR08MB5363.eurprd08.prod.outlook.com_&d=DwIFAg&c=nKjW
>ec2b6R0mOyPaz7xtfQ&r=1cjuAHrGh745jHNmj2fD85sUMIJ2IPIDsIJzo6F
>N6Z0&m=JMzT-4V2megNsFYxaO0V2wE0-
>GlK9UPUvE1K0pPA9aQ&s=JajU2VklhV_jFE0WKAZ076KjjWymIC-
>iTiJXU0Vwxr4&e=
>/Gavin
>> +			"	tbnz %[tag], 35, done%=
>	\n"
>> +			"	sevl				\n"
>> +			"rty%=:	wfe				\n"
>> +			"	ldr %[tag], [%[tag_op]]		\n"
>> +			"	tbz %[tag], 35, rty%=		\n"
>> +			"done%=:				\n"
>> +			: [tag] "=&r" (tag)
>> +			: [tag_op] "r" (ws->tag_op)
>> +			);
>> +#else
>> +	/* Wait for the HEAD to be set */
>> +	while (!(otx2_read64(ws->tag_op) & BIT_ULL(35)))
>>  		;
>> +#endif
>> +}
>> +
>> +static __rte_always_inline void
>> +otx2_ssogws_order(struct otx2_ssogws *ws, const uint8_t
>wait_flag)
>> +{
>> +	if (wait_flag)
>> +		otx2_ssogws_head_wait(ws);
>>
>>  	rte_cio_wmb();
>What ordering does this barrier try to keep?  If there is a write then wait
>for kind of response, should this barrier move before
>otx2_ssogws_head_wait?

The barrier is used to flush the write buffer out to the LLC (the octeontx2 point of
coherence) so that NIX Tx picks up all the modifications made to the packet.

>/Gavin
>>  }
>> @@ -258,7 +282,7 @@ otx2_ssogws_event_tx(struct otx2_ssogws
>*ws,
>> struct rte_event ev[],
>>
>>  	/* Perform header writes before barrier for TSO */
>>  	otx2_nix_xmit_prepare_tso(m, flags);
>> -	otx2_ssogws_head_wait(ws, !ev->sched_type);
>> +	otx2_ssogws_order(ws, !ev->sched_type);
>>  	otx2_ssogws_prepare_pkt(txq, m, cmd, flags);
>>
>>  	if (flags & NIX_TX_MULTI_SEG_F) {
>> --
>> 2.17.1
  
Gavin Hu Oct. 25, 2019, 4:34 p.m. UTC | #3
Hi Pavan,

> -----Original Message-----
> From: Pavan Nikhilesh Bhagavatula <pbhagavatula@marvell.com>
> Sent: Friday, October 25, 2019 12:26 PM
> To: Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>;
> jerinj@marvell.com
> Cc: dev@dpdk.org; nd <nd@arm.com>
> Subject: RE: [dpdk-dev] [PATCH] event/octeontx2: use wfe while waiting for
> head
> 
> Hi Gavin,
> 
> >-----Original Message-----
> >From: dev <dev-bounces@dpdk.org> On Behalf Of Gavin Hu (Arm
> >Technology China)
> >Sent: Thursday, October 24, 2019 9:23 PM
> >To: Pavan Nikhilesh Bhagavatula <pbhagavatula@marvell.com>; Jerin
> >Jacob Kollanukkaran <jerinj@marvell.com>
> >Cc: dev@dpdk.org; nd <nd@arm.com>
> >Subject: Re: [dpdk-dev] [PATCH] event/octeontx2: use wfe while
> >waiting for head
> >
> >Hi Pavan,
> >
> >> -----Original Message-----
> >> From: pbhagavatula@marvell.com <pbhagavatula@marvell.com>
> >> Sent: Thursday, October 24, 2019 12:13 AM
> >> To: Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>;
> >> jerinj@marvell.com; Pavan Nikhilesh <pbhagavatula@marvell.com>
> >> Cc: dev@dpdk.org
> >> Subject: [dpdk-dev] [PATCH] event/octeontx2: use wfe while waiting
> >for
> >> head
> >>
> >> From: Pavan Nikhilesh <pbhagavatula@marvell.com>
> >>
> >> Use wfe to save power while waiting for tag to become head.
> >>
> >> SSO signals EVENTI to allow cores to exit from wfe when they
> >> are waiting for specific operations in which one of them is
> >> setting HEAD bit in GWS_TAG.
> >>
> >> Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
> >> ---
> >>  drivers/event/octeontx2/otx2_worker.h | 30
> >++++++++++++++++++++++++--
> >> -
> >>  1 file changed, 27 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/drivers/event/octeontx2/otx2_worker.h
> >> b/drivers/event/octeontx2/otx2_worker.h
> >> index 4e971f27c..7a55caca5 100644
> >> --- a/drivers/event/octeontx2/otx2_worker.h
> >> +++ b/drivers/event/octeontx2/otx2_worker.h
> >> @@ -226,10 +226,34 @@ otx2_ssogws_swtag_wait(struct
> >otx2_ssogws *ws)
> >>  }
> >>
> >>  static __rte_always_inline void
> >> -otx2_ssogws_head_wait(struct otx2_ssogws *ws, const uint8_t
> >wait_flag)
> >> +otx2_ssogws_head_wait(struct otx2_ssogws *ws)
> >>  {
> >> -	while (wait_flag && !(otx2_read64(ws->tag_op) &
> >BIT_ULL(35)))
> >> +#ifdef RTE_ARCH_ARM64
> >> +	uint64_t tag;
> >> +
> >> +	asm volatile (
> >> +			"	ldr %[tag], [%[tag_op]]		\n"
> >"ldxr" should be used, exclusive-load is required to "monitor" the
> >location, then a write to the location will cause clear of the exclusive
> >monitor, thus a wake up event is generated implicitly.
> 
> As I have mentioned in the commit log:
> "SSO signals EVENTI to allow cores to exit from wfe when they
> are waiting for specific operations in which one of them is
> setting HEAD bit in GWS_TAG."
If you have other expected wake-up sources, that is ok. Just curious: is this signal sent explicitly to exit WFE?
Also, implicit event (clearing of the exclusive monitor) vs. explicit signal: which has the shorter latency?
/Gavin
> 
> The address need not be tracked by the global monitor.
> 
> >You can find more explanation is here:
> >https://urldefense.proofpoint.com/v2/url?u=http-
> >3A__inbox.dpdk.org_dev_AM0PR08MB5363F9D1BA158B66B803EA068F
> >6B0-
> >40AM0PR08MB5363.eurprd08.prod.outlook.com_&d=DwIFAg&c=nKjW
> >ec2b6R0mOyPaz7xtfQ&r=1cjuAHrGh745jHNmj2fD85sUMIJ2IPIDsIJzo6F
> >N6Z0&m=JMzT-4V2megNsFYxaO0V2wE0-
> >GlK9UPUvE1K0pPA9aQ&s=JajU2VklhV_jFE0WKAZ076KjjWymIC-
> >iTiJXU0Vwxr4&e=
> >/Gavin
> >> +			"	tbnz %[tag], 35, done%=
> >	\n"
> >> +			"	sevl				\n"
> >> +			"rty%=:	wfe				\n"
> >> +			"	ldr %[tag], [%[tag_op]]		\n"
> >> +			"	tbz %[tag], 35, rty%=		\n"
> >> +			"done%=:				\n"
> >> +			: [tag] "=&r" (tag)
> >> +			: [tag_op] "r" (ws->tag_op)
> >> +			);
> >> +#else
> >> +	/* Wait for the HEAD to be set */
> >> +	while (!(otx2_read64(ws->tag_op) & BIT_ULL(35)))
> >>  		;
> >> +#endif
> >> +}
> >> +
> >> +static __rte_always_inline void
> >> +otx2_ssogws_order(struct otx2_ssogws *ws, const uint8_t
> >wait_flag)
> >> +{
> >> +	if (wait_flag)
> >> +		otx2_ssogws_head_wait(ws);
> >>
> >>  	rte_cio_wmb();
> >What ordering does this barrier try to keep?  If there is a write then wait
> >for kind of response, should this barrier move before
> >otx2_ssogws_head_wait?
> 
> The barrier is used to flush out write buffer to LLC (octeontx2 point of
> coherence) so
> that NIX Tx picks up all the modifications done to the packet.
Looking at the otx2_ssogws_event_tx function, it seems that only the header has been written by the time rte_cio_wmb is reached?
Should the barrier be delayed until after the whole packet is written and before the submission?
If NIX does not fall within the SMP configuration, should it be rte_io_wmb instead?
/Gavin
> >>  }
> >> @@ -258,7 +282,7 @@ otx2_ssogws_event_tx(struct otx2_ssogws
> >*ws,
> >> struct rte_event ev[],
> >>
> >>  	/* Perform header writes before barrier for TSO */
> >>  	otx2_nix_xmit_prepare_tso(m, flags);
> >> -	otx2_ssogws_head_wait(ws, !ev->sched_type);
> >> +	otx2_ssogws_order(ws, !ev->sched_type);
> >>  	otx2_ssogws_prepare_pkt(txq, m, cmd, flags);
> >>
> >>  	if (flags & NIX_TX_MULTI_SEG_F) {
> >> --
> >> 2.17.1
  
Pavan Nikhilesh Bhagavatula Oct. 25, 2019, 5:06 p.m. UTC | #4
Hi Gavin,
>Hi Pavan,
>
>> -----Original Message-----
>> From: Pavan Nikhilesh Bhagavatula <pbhagavatula@marvell.com>
>> Sent: Friday, October 25, 2019 12:26 PM
>> To: Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>;
>> jerinj@marvell.com
>> Cc: dev@dpdk.org; nd <nd@arm.com>
>> Subject: RE: [dpdk-dev] [PATCH] event/octeontx2: use wfe while
>waiting for
>> head
>>
>> Hi Gavin,
>>
>> >-----Original Message-----
>> >From: dev <dev-bounces@dpdk.org> On Behalf Of Gavin Hu (Arm
>> >Technology China)
>> >Sent: Thursday, October 24, 2019 9:23 PM
>> >To: Pavan Nikhilesh Bhagavatula <pbhagavatula@marvell.com>;
>Jerin
>> >Jacob Kollanukkaran <jerinj@marvell.com>
>> >Cc: dev@dpdk.org; nd <nd@arm.com>
>> >Subject: Re: [dpdk-dev] [PATCH] event/octeontx2: use wfe while
>> >waiting for head
>> >
>> >Hi Pavan,
>> >
>> >> -----Original Message-----
>> >> From: pbhagavatula@marvell.com <pbhagavatula@marvell.com>
>> >> Sent: Thursday, October 24, 2019 12:13 AM
>> >> To: Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>;
>> >> jerinj@marvell.com; Pavan Nikhilesh
><pbhagavatula@marvell.com>
>> >> Cc: dev@dpdk.org
>> >> Subject: [dpdk-dev] [PATCH] event/octeontx2: use wfe while
>waiting
>> >for
>> >> head
>> >>
>> >> From: Pavan Nikhilesh <pbhagavatula@marvell.com>
>> >>
>> >> Use wfe to save power while waiting for tag to become head.
>> >>
>> >> SSO signals EVENTI to allow cores to exit from wfe when they
>> >> are waiting for specific operations in which one of them is
>> >> setting HEAD bit in GWS_TAG.
>> >>
>> >> Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
>> >> ---
>> >>  drivers/event/octeontx2/otx2_worker.h | 30
>> >++++++++++++++++++++++++--
>> >> -
>> >>  1 file changed, 27 insertions(+), 3 deletions(-)
>> >>
>> >> diff --git a/drivers/event/octeontx2/otx2_worker.h
>> >> b/drivers/event/octeontx2/otx2_worker.h
>> >> index 4e971f27c..7a55caca5 100644
>> >> --- a/drivers/event/octeontx2/otx2_worker.h
>> >> +++ b/drivers/event/octeontx2/otx2_worker.h
>> >> @@ -226,10 +226,34 @@ otx2_ssogws_swtag_wait(struct
>> >otx2_ssogws *ws)
>> >>  }
>> >>
>> >>  static __rte_always_inline void
>> >> -otx2_ssogws_head_wait(struct otx2_ssogws *ws, const uint8_t
>> >wait_flag)
>> >> +otx2_ssogws_head_wait(struct otx2_ssogws *ws)
>> >>  {
>> >> -	while (wait_flag && !(otx2_read64(ws->tag_op) &
>> >BIT_ULL(35)))
>> >> +#ifdef RTE_ARCH_ARM64
>> >> +	uint64_t tag;
>> >> +
>> >> +	asm volatile (
>> >> +			"	ldr %[tag], [%[tag_op]]		\n"
>> >"ldxr" should be used, exclusive-load is required to "monitor" the
>> >location, then a write to the location will cause clear of the exclusive
>> >monitor, thus a wake up event is generated implicitly.
>>
>> As I have mentioned in the commit log:
>> "SSO signals EVENTI to allow cores to exit from wfe when they
>> are waiting for specific operations in which one of them is
>> setting HEAD bit in GWS_TAG."
>If you have other expected wake up sources, that is ok. Just curious is
>this signal explicitly sent to quit WFE?

AFAIK yes, explicitly sent to quit WFE.

>Just wondering, implicit event(Clear of exclusive monitor) vs explicit
>signal, which has shorter latency?

Not really sure, but SSO has a dedicated bus inside each core.

>/Gavin
>>
>> The address need not be tracked by the global monitor.
>>
>> >You can find more explanation is here:
>> >https://urldefense.proofpoint.com/v2/url?u=http-
>>
>>3A__inbox.dpdk.org_dev_AM0PR08MB5363F9D1BA158B66B803EA068
>F
>> >6B0-
>>
>>40AM0PR08MB5363.eurprd08.prod.outlook.com_&d=DwIFAg&c=nKj
>W
>>
>>ec2b6R0mOyPaz7xtfQ&r=1cjuAHrGh745jHNmj2fD85sUMIJ2IPIDsIJzo6
>F
>> >N6Z0&m=JMzT-4V2megNsFYxaO0V2wE0-
>> >GlK9UPUvE1K0pPA9aQ&s=JajU2VklhV_jFE0WKAZ076KjjWymIC-
>> >iTiJXU0Vwxr4&e=
>> >/Gavin
>> >> +			"	tbnz %[tag], 35, done%=
>> >	\n"
>> >> +			"	sevl				\n"
>> >> +			"rty%=:	wfe				\n"
>> >> +			"	ldr %[tag], [%[tag_op]]		\n"
>> >> +			"	tbz %[tag], 35, rty%=		\n"
>> >> +			"done%=:				\n"
>> >> +			: [tag] "=&r" (tag)
>> >> +			: [tag_op] "r" (ws->tag_op)
>> >> +			);
>> >> +#else
>> >> +	/* Wait for the HEAD to be set */
>> >> +	while (!(otx2_read64(ws->tag_op) & BIT_ULL(35)))
>> >>  		;
>> >> +#endif
>> >> +}
>> >> +
>> >> +static __rte_always_inline void
>> >> +otx2_ssogws_order(struct otx2_ssogws *ws, const uint8_t
>> >wait_flag)
>> >> +{
>> >> +	if (wait_flag)
>> >> +		otx2_ssogws_head_wait(ws);
>> >>
>> >>  	rte_cio_wmb();
>> >What ordering does this barrier try to keep?  If there is a write then
>wait
>> >for kind of response, should this barrier move before
>> >otx2_ssogws_head_wait?
>>
>> The barrier is used to flush out write buffer to LLC (octeontx2 point of
>> coherence) so
>> that NIX Tx picks up all the modifications done to the packet.

>Looking at the otx2_ssogws_event_tx function, so far at the point of
>rte_cio_wmb, only the header is written?
>Should it be delayed after the whole packet written and before the
>submission?

We only care that the writes to the actual packet buffer, e.g. the start of the Ethernet
header, are committed.
The rest of the mbuf fields are translated into a HW command after the barrier and written
to an LMTLINE using ldeor.

>If NIX is not falling within the SMP configuration, should it be
>rte_io_wmb instead?

Octeontx2 has only a single shareability domain, i.e. it makes no distinction between
outer and inner shareable domains.
Since all IO devices are interpreted to be in the outer shareable domain, we prefer to use
rte_cio_(r/w)mb for IO devices.

>/Gavin

Regards,
Pavan.

>> >>  }
>> >> @@ -258,7 +282,7 @@ otx2_ssogws_event_tx(struct otx2_ssogws
>> >*ws,
>> >> struct rte_event ev[],
>> >>
>> >>  	/* Perform header writes before barrier for TSO */
>> >>  	otx2_nix_xmit_prepare_tso(m, flags);
>> >> -	otx2_ssogws_head_wait(ws, !ev->sched_type);
>> >> +	otx2_ssogws_order(ws, !ev->sched_type);
>> >>  	otx2_ssogws_prepare_pkt(txq, m, cmd, flags);
>> >>
>> >>  	if (flags & NIX_TX_MULTI_SEG_F) {
>> >> --
>> >> 2.17.1
  
Gavin Hu Oct. 27, 2019, 9:12 a.m. UTC | #5
Hi Pavan,

> -----Original Message-----
> From: Pavan Nikhilesh Bhagavatula <pbhagavatula@marvell.com>
> Sent: Saturday, October 26, 2019 1:06 AM
> To: Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>;
> jerinj@marvell.com
> Cc: dev@dpdk.org; nd <nd@arm.com>; nd <nd@arm.com>
> Subject: RE: [dpdk-dev] [PATCH] event/octeontx2: use wfe while waiting for
> head
> 
> Hi Gavin,
> >Hi Pavan,
> >
> >> -----Original Message-----
> >> From: Pavan Nikhilesh Bhagavatula <pbhagavatula@marvell.com>
> >> Sent: Friday, October 25, 2019 12:26 PM
> >> To: Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>;
> >> jerinj@marvell.com
> >> Cc: dev@dpdk.org; nd <nd@arm.com>
> >> Subject: RE: [dpdk-dev] [PATCH] event/octeontx2: use wfe while
> >waiting for
> >> head
> >>
> >> Hi Gavin,
> >>
> >> >-----Original Message-----
> >> >From: dev <dev-bounces@dpdk.org> On Behalf Of Gavin Hu (Arm
> >> >Technology China)
> >> >Sent: Thursday, October 24, 2019 9:23 PM
> >> >To: Pavan Nikhilesh Bhagavatula <pbhagavatula@marvell.com>;
> >Jerin
> >> >Jacob Kollanukkaran <jerinj@marvell.com>
> >> >Cc: dev@dpdk.org; nd <nd@arm.com>
> >> >Subject: Re: [dpdk-dev] [PATCH] event/octeontx2: use wfe while
> >> >waiting for head
> >> >
> >> >Hi Pavan,
> >> >
> >> >> -----Original Message-----
> >> >> From: pbhagavatula@marvell.com <pbhagavatula@marvell.com>
> >> >> Sent: Thursday, October 24, 2019 12:13 AM
> >> >> To: Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>;
> >> >> jerinj@marvell.com; Pavan Nikhilesh
> ><pbhagavatula@marvell.com>
> >> >> Cc: dev@dpdk.org
> >> >> Subject: [dpdk-dev] [PATCH] event/octeontx2: use wfe while
> >waiting
> >> >for
> >> >> head
> >> >>
> >> >> From: Pavan Nikhilesh <pbhagavatula@marvell.com>
> >> >>
> >> >> Use wfe to save power while waiting for tag to become head.
> >> >>
> >> >> SSO signals EVENTI to allow cores to exit from wfe when they
> >> >> are waiting for specific operations in which one of them is
> >> >> setting HEAD bit in GWS_TAG.
> >> >>
> >> >> Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
> >> >> ---
> >> >>  drivers/event/octeontx2/otx2_worker.h | 30
> >> >++++++++++++++++++++++++--
> >> >> -
> >> >>  1 file changed, 27 insertions(+), 3 deletions(-)
> >> >>
> >> >> diff --git a/drivers/event/octeontx2/otx2_worker.h
> >> >> b/drivers/event/octeontx2/otx2_worker.h
> >> >> index 4e971f27c..7a55caca5 100644
> >> >> --- a/drivers/event/octeontx2/otx2_worker.h
> >> >> +++ b/drivers/event/octeontx2/otx2_worker.h
> >> >> @@ -226,10 +226,34 @@ otx2_ssogws_swtag_wait(struct
> >> >otx2_ssogws *ws)
> >> >>  }
> >> >>
> >> >>  static __rte_always_inline void
> >> >> -otx2_ssogws_head_wait(struct otx2_ssogws *ws, const uint8_t
> >> >wait_flag)
> >> >> +otx2_ssogws_head_wait(struct otx2_ssogws *ws)
> >> >>  {
> >> >> -	while (wait_flag && !(otx2_read64(ws->tag_op) &
> >> >BIT_ULL(35)))
> >> >> +#ifdef RTE_ARCH_ARM64
> >> >> +	uint64_t tag;
> >> >> +
> >> >> +	asm volatile (
> >> >> +			"	ldr %[tag], [%[tag_op]]		\n"
> >> >"ldxr" should be used, exclusive-load is required to "monitor" the
> >> >location, then a write to the location will cause clear of the exclusive
> >> >monitor, thus a wake up event is generated implicitly.
> >>
> >> As I have mentioned in the commit log:
> >> "SSO signals EVENTI to allow cores to exit from wfe when they
> >> are waiting for specific operations in which one of them is
> >> setting HEAD bit in GWS_TAG."
> >If you have other expected wake up sources, that is ok. Just curious is
> >this signal explicitly sent to quit WFE?
> 
> AFAIK yes, explicitly sent to quit WFE.
> 
> >Just wondering, implicit event(Clear of exclusive monitor) vs explicit
> >signal, which has shorter latency?
> 
> Not really sure but SSO has dedicated bus inside each core.
That's ok.
> 
> >/Gavin
> >>
> >> The address need not be tracked by the global monitor.
> >>
> >> >You can find more explanation is here:
> >> >https://urldefense.proofpoint.com/v2/url?u=http-
> >>
> >>3A__inbox.dpdk.org_dev_AM0PR08MB5363F9D1BA158B66B803EA068
> >F
> >> >6B0-
> >>
> >>40AM0PR08MB5363.eurprd08.prod.outlook.com_&d=DwIFAg&c=nKj
> >W
> >>
> >>ec2b6R0mOyPaz7xtfQ&r=1cjuAHrGh745jHNmj2fD85sUMIJ2IPIDsIJzo6
> >F
> >> >N6Z0&m=JMzT-4V2megNsFYxaO0V2wE0-
> >> >GlK9UPUvE1K0pPA9aQ&s=JajU2VklhV_jFE0WKAZ076KjjWymIC-
> >> >iTiJXU0Vwxr4&e=
> >> >/Gavin
> >> >> +			"	tbnz %[tag], 35, done%=
> >> >	\n"
> >> >> +			"	sevl				\n"
> >> >> +			"rty%=:	wfe				\n"
> >> >> +			"	ldr %[tag], [%[tag_op]]		\n"
> >> >> +			"	tbz %[tag], 35, rty%=		\n"
> >> >> +			"done%=:				\n"
> >> >> +			: [tag] "=&r" (tag)
> >> >> +			: [tag_op] "r" (ws->tag_op)
> >> >> +			);
> >> >> +#else
> >> >> +	/* Wait for the HEAD to be set */
> >> >> +	while (!(otx2_read64(ws->tag_op) & BIT_ULL(35)))
> >> >>  		;
> >> >> +#endif
> >> >> +}
> >> >> +
> >> >> +static __rte_always_inline void
> >> >> +otx2_ssogws_order(struct otx2_ssogws *ws, const uint8_t
> >> >wait_flag)
> >> >> +{
> >> >> +	if (wait_flag)
> >> >> +		otx2_ssogws_head_wait(ws);
> >> >>
> >> >>  	rte_cio_wmb();
> >> >What ordering does this barrier try to keep?  If there is a write then
> >wait
> >> >for kind of response, should this barrier move before
> >> >otx2_ssogws_head_wait?
> >>
> >> The barrier is used to flush out write buffer to LLC (octeontx2 point of
> >> coherence) so
> >> that NIX Tx picks up all the modifications done to the packet.
> 
> >Looking at the otx2_ssogws_event_tx function, so far at the point of
> >rte_cio_wmb, only the header is written?
> >Should it be delayed after the whole packet written and before the
> >submission?
> 
> We only care that the writes to the actual packet buffer ex. Start of ethernet
> header
> are committed.
> The rest of mbuf fields are translated into a HW command after the barrier
> and written
> to a LMTLINE using ldoer.
> 
> >If NIX is not falling within the SMP configuration, should it be
> >rte_io_wmb instead?
> 
> Octeontx2 has only single shareability domain i.e. it makes no distinction
> between
> Outer and inner sharable domains.
> Since all IO devices are interpreted to be on outer sharable domain, we like
> to use
> rte_cio_(r/w)mb for IO devices.
Yes, for an integral part of the outer shareable domain, rte_cio_(r/w)mb is sufficient.
> 
> Regards,
> Pavan.
> 
> >> >>  }
> >> >> @@ -258,7 +282,7 @@ otx2_ssogws_event_tx(struct otx2_ssogws
> >> >*ws,
> >> >> struct rte_event ev[],
> >> >>
> >> >>  	/* Perform header writes before barrier for TSO */
> >> >>  	otx2_nix_xmit_prepare_tso(m, flags);
> >> >> -	otx2_ssogws_head_wait(ws, !ev->sched_type);
> >> >> +	otx2_ssogws_order(ws, !ev->sched_type);
> >> >>  	otx2_ssogws_prepare_pkt(txq, m, cmd, flags);
> >> >>
> >> >>  	if (flags & NIX_TX_MULTI_SEG_F) {
> >> >> --
> >> >> 2.17.1

Reviewed-by: Gavin Hu <gavin.hu@arm.com>
  
Jerin Jacob Oct. 30, 2019, 1:33 p.m. UTC | #6
On Sun, Oct 27, 2019 at 2:42 PM Gavin Hu (Arm Technology China)
<Gavin.Hu@arm.com> wrote:
>
> Hi Pavan,
>
> > -----Original Message-----
> > From: Pavan Nikhilesh Bhagavatula <pbhagavatula@marvell.com>
> > Sent: Saturday, October 26, 2019 1:06 AM
> > To: Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>;
> > jerinj@marvell.com
> > Cc: dev@dpdk.org; nd <nd@arm.com>; nd <nd@arm.com>
> > Subject: RE: [dpdk-dev] [PATCH] event/octeontx2: use wfe while waiting for
> > head
> >

> Reviewed-by: Gavin Hu <gavin.hu@arm.com>

Applied to dpdk-next-eventdev/master. Thanks.

>
  
Honnappa Nagarahalli Dec. 18, 2019, 5:42 p.m. UTC | #7
<snip>

> >> >>
> >> >>  static __rte_always_inline void
> >> >> -otx2_ssogws_head_wait(struct otx2_ssogws *ws, const uint8_t
> >> >wait_flag)
> >> >> +otx2_ssogws_head_wait(struct otx2_ssogws *ws)
> >> >>  {
> >> >> -	while (wait_flag && !(otx2_read64(ws->tag_op) &
> >> >BIT_ULL(35)))
> >> >> +#ifdef RTE_ARCH_ARM64
> >> >> +	uint64_t tag;
> >> >> +
> >> >> +	asm volatile (
> >> >> +			"	ldr %[tag], [%[tag_op]]		\n"
> >> >"ldxr" should be used, exclusive-load is required to "monitor" the
> >> >location, then a write to the location will cause clear of the
> >> >exclusive monitor, thus a wake up event is generated implicitly.
> >>
> >> As I have mentioned in the commit log:
> >> "SSO signals EVENTI to allow cores to exit from wfe when they are
> >> waiting for specific operations in which one of them is setting HEAD
> >> bit in GWS_TAG."
> >If you have other expected wake up sources, that is ok. Just curious is
> >this signal explicitly sent to quit WFE?
> 
> AFAIK yes, explicitly sent to quit WFE.
Pavan, is the wake up event sent to the particular core that is waiting on this head or is it sent to all the cores?
  

Patch

diff --git a/drivers/event/octeontx2/otx2_worker.h b/drivers/event/octeontx2/otx2_worker.h
index 4e971f27c..7a55caca5 100644
--- a/drivers/event/octeontx2/otx2_worker.h
+++ b/drivers/event/octeontx2/otx2_worker.h
@@ -226,10 +226,34 @@  otx2_ssogws_swtag_wait(struct otx2_ssogws *ws)
 }
 
 static __rte_always_inline void
-otx2_ssogws_head_wait(struct otx2_ssogws *ws, const uint8_t wait_flag)
+otx2_ssogws_head_wait(struct otx2_ssogws *ws)
 {
-	while (wait_flag && !(otx2_read64(ws->tag_op) & BIT_ULL(35)))
+#ifdef RTE_ARCH_ARM64
+	uint64_t tag;
+
+	asm volatile (
+			"	ldr %[tag], [%[tag_op]]		\n"
+			"	tbnz %[tag], 35, done%=		\n"
+			"	sevl				\n"
+			"rty%=:	wfe				\n"
+			"	ldr %[tag], [%[tag_op]]		\n"
+			"	tbz %[tag], 35, rty%=		\n"
+			"done%=:				\n"
+			: [tag] "=&r" (tag)
+			: [tag_op] "r" (ws->tag_op)
+			);
+#else
+	/* Wait for the HEAD to be set */
+	while (!(otx2_read64(ws->tag_op) & BIT_ULL(35)))
 		;
+#endif
+}
+
+static __rte_always_inline void
+otx2_ssogws_order(struct otx2_ssogws *ws, const uint8_t wait_flag)
+{
+	if (wait_flag)
+		otx2_ssogws_head_wait(ws);
 
 	rte_cio_wmb();
 }
@@ -258,7 +282,7 @@  otx2_ssogws_event_tx(struct otx2_ssogws *ws, struct rte_event ev[],
 
 	/* Perform header writes before barrier for TSO */
 	otx2_nix_xmit_prepare_tso(m, flags);
-	otx2_ssogws_head_wait(ws, !ev->sched_type);
+	otx2_ssogws_order(ws, !ev->sched_type);
 	otx2_ssogws_prepare_pkt(txq, m, cmd, flags);
 
 	if (flags & NIX_TX_MULTI_SEG_F) {
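
As a reading aid, and not part of the applied patch: the refactoring splits the old function into an unconditional wait and a wrapper that always issues the write barrier. In DPDK, RTE_SCHED_TYPE_ORDERED is 0, so the `!ev->sched_type` argument is non-zero only for ordered events, and only those wait for HEAD. The sketch below models that control flow without hardware. Everything prefixed with fake_ is hypothetical, the counters exist only so the behaviour can be observed, and the C11 release fence is an assumed stand-in for rte_cio_wmb(), not its actual definition.

```c
#include <stdatomic.h>
#include <stdint.h>

#define HEAD_BIT (UINT64_C(1) << 35)

/* Same numeric values as DPDK's RTE_SCHED_TYPE_* constants. */
enum sched_type { SCHED_ORDERED = 0, SCHED_ATOMIC = 1, SCHED_PARALLEL = 2 };

/* Hypothetical workslot stand-in; the counters are for the demo only. */
struct fake_ws {
	uint64_t tag;
	unsigned int head_waits;
	unsigned int barriers;
};

/* Models otx2_ssogws_head_wait(): no wait_flag parameter any more. */
static void fake_head_wait(struct fake_ws *ws)
{
	ws->head_waits++;
	ws->tag |= HEAD_BIT; /* pretend the hardware grants HEAD at once */
}

/* Models otx2_ssogws_order(): optional wait, then always a barrier. */
static void fake_order(struct fake_ws *ws, uint8_t wait_flag)
{
	if (wait_flag)
		fake_head_wait(ws);

	/* Assumed stand-in for rte_cio_wmb(). */
	atomic_thread_fence(memory_order_release);
	ws->barriers++;
}
```

Calling fake_order(&ws, !SCHED_ORDERED) waits and then issues the barrier, while fake_order(&ws, !SCHED_ATOMIC) issues only the barrier.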