[dpdk-dev] eal/armv8: high-resolution cycle counter

Message ID 1471521090-21067-1-git-send-email-jerin.jacob@caviumnetworks.com
State Accepted, archived
Delegated to: Thomas Monjalon

Commit Message

Jerin Jacob Aug. 18, 2016, 11:51 a.m. UTC
The existing cntvct_el0-based rte_rdtsc() provides a portable
means to read a wall-clock counter from user space. Typically,
this counter runs at <= 100MHz.
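To put the <= 100MHz figure in perspective, here is a minimal sketch (not part of the patch) of converting counter ticks to nanoseconds. The helper names tick_ns(), ticks_to_ns() and cntfrq() are hypothetical; on arm64 the generic-timer frequency is read from cntfrq_el0, and on other hosts that helper is compiled out.

```c
#include <stdint.h>

/* Nanoseconds represented by one tick of a counter running at freq_hz. */
static double tick_ns(uint64_t freq_hz)
{
	return 1e9 / (double)freq_hz;
}

/*
 * Convert a tick delta to nanoseconds. The delta is split into whole
 * seconds and a remainder so that multiplying by 1e9 cannot overflow
 * 64 bits for counter frequencies below roughly 18 GHz.
 */
static uint64_t ticks_to_ns(uint64_t ticks, uint64_t freq_hz)
{
	uint64_t sec = ticks / freq_hz;
	uint64_t rem = ticks % freq_hz;

	return sec * UINT64_C(1000000000) +
	       rem * UINT64_C(1000000000) / freq_hz;
}

#if defined(__aarch64__)
/* Frequency of the generic timer behind cntvct_el0, as set by firmware. */
static uint64_t cntfrq(void)
{
	uint64_t freq;

	asm volatile("mrs %0, cntfrq_el0" : "=r" (freq));
	return freq;
}
#endif
```

At 100MHz one cntvct_el0 tick is 10 ns, so ticks_to_ns(250, 100000000) gives 2500 ns, while a PMU cycle counter on a CPU clocked at an assumed 2 GHz resolves 0.5 ns per cycle.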

An alternative way to back rte_rdtsc() with a high-resolution
counter is through the armv8 PMU subsystem.
The PMU cycle counter runs at CPU frequency. However,
user-space access to the PMU cycle counter is not enabled
by default in the arm64 Linux kernel.
User-space access to the cycle counter can be enabled
by configuring the PMU from privileged mode (kernel space).
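Because an EL0 read of pmccntr_el0 traps when the kernel has not enabled user-space access, a program built with the PMU option may want to check at startup whether the counter is usable. The sketch below is an illustration, not DPDK code: it assumes Linux delivers SIGILL for the trapped read and recovers with sigsetjmp()/siglongjmp(); pmu_cycle_counter_usable() is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stdbool.h>
#include <stdint.h>

#if defined(__aarch64__)
static sigjmp_buf probe_env;

/* SIGILL handler: jump back out of the trapped mrs instruction. */
static void
probe_sigill(int sig)
{
	(void)sig;
	siglongjmp(probe_env, 1);
}
#endif

/*
 * Hypothetical helper: returns true if pmccntr_el0 can be read from user
 * space, false if the read traps (assumed SIGILL) or the host is not arm64.
 */
static bool
pmu_cycle_counter_usable(void)
{
#if defined(__aarch64__)
	struct sigaction sa = { .sa_handler = probe_sigill };
	struct sigaction old;
	volatile bool ok = false;

	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGILL, &sa, &old) != 0)
		return false;

	if (sigsetjmp(probe_env, 1) == 0) {
		uint64_t v;

		asm volatile("mrs %0, pmccntr_el0" : "=r" (v));
		(void)v;
		ok = true; /* only reached if the read did not trap */
	}
	/* Restore whatever SIGILL disposition was installed before. */
	sigaction(SIGILL, &old, NULL);
	return ok;
#else
	return false; /* no ARMv8 PMU on this architecture */
#endif
}
```

Calling such a check once before relying on the PMU-based rte_rdtsc() turns a crash into a clear error message; it cannot tell whether 'perf' is also using the PMU.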

By default, the rte_rdtsc() implementation uses the portable
cntvct_el0 scheme. An application can choose the PMU-based
implementation with CONFIG_RTE_ARM_EAL_RDTSC_USE_PMU.
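The build-time switch can be sketched as follows. This is an illustrative variant rather than the patch itself: demo_rdtsc() is a hypothetical name, the non-arm64 CLOCK_MONOTONIC fallback is an addition so the sketch builds anywhere, and it assumes DPDK's make-based configuration turns CONFIG_RTE_ARM_EAL_RDTSC_USE_PMU=y into an RTE_ARM_EAL_RDTSC_USE_PMU macro (for a standalone test it can be passed with -D).

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/*
 * Hypothetical stand-in for rte_rdtsc(). Build with
 * -DRTE_ARM_EAL_RDTSC_USE_PMU on arm64 to select the PMU cycle counter.
 */
static inline uint64_t
demo_rdtsc(void)
{
#if defined(__aarch64__) && defined(RTE_ARM_EAL_RDTSC_USE_PMU)
	uint64_t tsc;

	/* CPU cycles; traps unless the kernel enabled EL0 access. */
	asm volatile("mrs %0, pmccntr_el0" : "=r" (tsc));
	return tsc;
#elif defined(__aarch64__)
	uint64_t tsc;

	/* Portable generic timer, typically <= 100MHz. */
	asm volatile("mrs %0, cntvct_el0" : "=r" (tsc));
	return tsc;
#else
	/* Not arm64: fall back to a monotonic clock in nanoseconds. */
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * UINT64_C(1000000000) +
	       (uint64_t)ts.tv_nsec;
#endif
}
```

Selecting at compile time keeps the counter read a single inline instruction on the hot path; the trade-off is that a PMU-enabled binary faults if the kernel has not enabled user-space access.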

Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
---

The PMU-based scheme is useful for high-accuracy performance profiling.
Below are example steps to configure the PMU-based cycle counter on an
armv8 machine.

# git clone https://github.com/jerinjacobk/armv8_pmu_cycle_counter_el0
# cd armv8_pmu_cycle_counter_el0
# make
# sudo insmod pmu_el0_cycle_counter.ko
# cd $DPDK_DIR
# make config T=arm64-armv8a-linuxapp-gcc
# echo "CONFIG_RTE_ARM_EAL_RDTSC_USE_PMU=y" >> build/.config
# make -j 4

---
 .../common/include/arch/arm/rte_cycles_64.h        | 33 ++++++++++++++++++++++
 1 file changed, 33 insertions(+)
  

Comments

Nipun Gupta Aug. 19, 2016, 9:43 a.m. UTC | #1
Hi Jerin,

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jerin Jacob
> Sent: Thursday, August 18, 2016 17:22
> To: dev@dpdk.org
> Cc: thomas.monjalon@6wind.com; jianbo.liu@linaro.org;
> viktorin@rehivetech.com; Jerin Jacob <jerin.jacob@caviumnetworks.com>
> Subject: [dpdk-dev] [PATCH] eal/armv8: high-resolution cycle counter
> 
> The existing cntvct_el0-based rte_rdtsc() provides a portable
> means to read a wall-clock counter from user space. Typically,
> this counter runs at <= 100MHz.
> 
> An alternative way to back rte_rdtsc() with a high-resolution
> counter is through the armv8 PMU subsystem.
> The PMU cycle counter runs at CPU frequency. However,
> user-space access to the PMU cycle counter is not enabled
> by default in the arm64 Linux kernel.
> User-space access to the cycle counter can be enabled
> by configuring the PMU from privileged mode (kernel space).
> 
> By default, the rte_rdtsc() implementation uses the portable
> cntvct_el0 scheme. An application can choose the PMU-based
> implementation with CONFIG_RTE_ARM_EAL_RDTSC_USE_PMU.
> 
> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> ---
> 
> The PMU-based scheme is useful for high-accuracy performance profiling.
> Below are example steps to configure the PMU-based cycle counter on an
> armv8 machine.
> 
> # git clone https://github.com/jerinjacobk/armv8_pmu_cycle_counter_el0
> # cd armv8_pmu_cycle_counter_el0
> # make
> # sudo insmod pmu_el0_cycle_counter.ko
> # cd $DPDK_DIR
> # make config T=arm64-armv8a-linuxapp-gcc
> # echo "CONFIG_RTE_ARM_EAL_RDTSC_USE_PMU=y" >> build/.config
> # make -j 4

Can we make this kernel module part of DPDK as well? Maybe in linuxapp, so that it is compiled along with DPDK?

> 
> ---
>  .../common/include/arch/arm/rte_cycles_64.h        | 33 ++++++++++++++++++++++
>  1 file changed, 33 insertions(+)
> 
> diff --git a/lib/librte_eal/common/include/arch/arm/rte_cycles_64.h
> b/lib/librte_eal/common/include/arch/arm/rte_cycles_64.h
> index 14f2612..867a946 100644
> --- a/lib/librte_eal/common/include/arch/arm/rte_cycles_64.h
> +++ b/lib/librte_eal/common/include/arch/arm/rte_cycles_64.h
> @@ -45,6 +45,11 @@ extern "C" {
>   * @return
>   *   The time base for this lcore.
>   */
> +#ifndef RTE_ARM_EAL_RDTSC_USE_PMU
> +/**
> + * This call is portable to any ARMv8 architecture; however, cntvct_el0
> + * typically runs at <= 100MHz and may be imprecise for some tasks.
> + */
>  static inline uint64_t
>  rte_rdtsc(void)
>  {
> @@ -53,6 +58,34 @@ rte_rdtsc(void)
>  	asm volatile("mrs %0, cntvct_el0" : "=r" (tsc));
>  	return tsc;
>  }
> +#else
> +/**
> + * This is an alternative method to enable rte_rdtsc() with the high-resolution
> + * PMU cycle counter. The cycle counter runs at CPU frequency; this scheme uses
> + * the ARMv8 PMU subsystem to read the cycle counter from user space. However,
> + * user-space access to the PMU cycle counter is not enabled by default in the
> + * arm64 Linux kernel.
> + * User-space access can be enabled by configuring the PMU from privileged
> + * mode (kernel space), for example:
> + *
> + * asm volatile("msr pmintenset_el1, %0" : : "r" ((u64)(0 << 31)));
> + * asm volatile("msr pmcntenset_el0, %0" : : "r" (BIT(31)));
> + * asm volatile("msr pmuserenr_el0, %0" : : "r"(BIT(0) | BIT(2)));
> + * asm volatile("mrs %0, pmcr_el0" : "=r" (val));
> + * val |= (BIT(0) | BIT(2));
> + * isb();
> + * asm volatile("msr pmcr_el0, %0" : : "r" (val));

In your git repo, I see that the cycle counter is not disabled (PMCNTENCLR_EL0) on cleanup. It would be better to also disable the cycle counter at module exit.

> + *
> + */
> +static inline uint64_t
> +rte_rdtsc(void)
> +{
> +	uint64_t tsc;
> +
> +	asm volatile("mrs %0, pmccntr_el0" : "=r"(tsc));
> +	return tsc;
> +}
> +#endif
> 
>  static inline uint64_t
>  rte_rdtsc_precise(void)
> --
> 2.5.5

Do you also plan to support performance monitor event counters?

Regards,
Nipun
  
Jerin Jacob Aug. 19, 2016, 11:46 a.m. UTC | #2
On Fri, Aug 19, 2016 at 09:43:36AM +0000, Nipun Gupta wrote:
> Hi Jerin,
> 

Hi Nipun,

> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jerin Jacob
> > Sent: Thursday, August 18, 2016 17:22
> > To: dev@dpdk.org
> > Cc: thomas.monjalon@6wind.com; jianbo.liu@linaro.org;
> > viktorin@rehivetech.com; Jerin Jacob <jerin.jacob@caviumnetworks.com>
> > Subject: [dpdk-dev] [PATCH] eal/armv8: high-resolution cycle counter
> > 
> > The existing cntvct_el0-based rte_rdtsc() provides a portable
> > means to read a wall-clock counter from user space. Typically,
> > this counter runs at <= 100MHz.
> > 
> > An alternative way to back rte_rdtsc() with a high-resolution
> > counter is through the armv8 PMU subsystem.
> > The PMU cycle counter runs at CPU frequency. However,
> > user-space access to the PMU cycle counter is not enabled
> > by default in the arm64 Linux kernel.
> > User-space access to the cycle counter can be enabled
> > by configuring the PMU from privileged mode (kernel space).
> > 
> > By default, the rte_rdtsc() implementation uses the portable
> > cntvct_el0 scheme. An application can choose the PMU-based
> > implementation with CONFIG_RTE_ARM_EAL_RDTSC_USE_PMU.
> > 
> > Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> > ---
> > 
> > The PMU-based scheme is useful for high-accuracy performance profiling.
> > Below are example steps to configure the PMU-based cycle counter on an
> > armv8 machine.
> > 
> > # git clone https://github.com/jerinjacobk/armv8_pmu_cycle_counter_el0
> > # cd armv8_pmu_cycle_counter_el0
> > # make
> > # sudo insmod pmu_el0_cycle_counter.ko
> > # cd $DPDK_DIR
> > # make config T=arm64-armv8a-linuxapp-gcc
> > # echo "CONFIG_RTE_ARM_EAL_RDTSC_USE_PMU=y" >> build/.config
> > # make -j 4
> 
> Can we make this kernel module part of DPDK as well? Maybe in linuxapp, so that it is compiled along with DPDK?

I thought so. Later, I realized it may not be a good idea to add yet
another out-of-tree module to the DPDK repo, as DPDK is trying to get rid
of its existing out-of-tree modules.

> 
> > 
> > ---
> >  .../common/include/arch/arm/rte_cycles_64.h        | 33 ++++++++++++++++++++++
> >  1 file changed, 33 insertions(+)
> > 
> > diff --git a/lib/librte_eal/common/include/arch/arm/rte_cycles_64.h
> > b/lib/librte_eal/common/include/arch/arm/rte_cycles_64.h
> > index 14f2612..867a946 100644
> > --- a/lib/librte_eal/common/include/arch/arm/rte_cycles_64.h
> > +++ b/lib/librte_eal/common/include/arch/arm/rte_cycles_64.h
> > @@ -45,6 +45,11 @@ extern "C" {
> >   * @return
> >   *   The time base for this lcore.
> >   */
> > +#ifndef RTE_ARM_EAL_RDTSC_USE_PMU
> > +/**
> > + * This call is portable to any ARMv8 architecture; however, cntvct_el0
> > + * typically runs at <= 100MHz and may be imprecise for some tasks.
> > + */
> >  static inline uint64_t
> >  rte_rdtsc(void)
> >  {
> > @@ -53,6 +58,34 @@ rte_rdtsc(void)
> >  	asm volatile("mrs %0, cntvct_el0" : "=r" (tsc));
> >  	return tsc;
> >  }
> > +#else
> > +/**
> > + * This is an alternative method to enable rte_rdtsc() with the high-resolution
> > + * PMU cycle counter. The cycle counter runs at CPU frequency; this scheme uses
> > + * the ARMv8 PMU subsystem to read the cycle counter from user space. However,
> > + * user-space access to the PMU cycle counter is not enabled by default in the
> > + * arm64 Linux kernel.
> > + * User-space access can be enabled by configuring the PMU from privileged
> > + * mode (kernel space), for example:
> > + *
> > + * asm volatile("msr pmintenset_el1, %0" : : "r" ((u64)(0 << 31)));
> > + * asm volatile("msr pmcntenset_el0, %0" : : "r" (BIT(31)));
> > + * asm volatile("msr pmuserenr_el0, %0" : : "r"(BIT(0) | BIT(2)));
> > + * asm volatile("mrs %0, pmcr_el0" : "=r" (val));
> > + * val |= (BIT(0) | BIT(2));
> > + * isb();
> > + * asm volatile("msr pmcr_el0, %0" : : "r" (val));
> 
> In your git repo, I see that the cycle counter is not disabled (PMCNTENCLR_EL0) on cleanup. It would be better to also disable the cycle counter at module exit.

OK

> 
> > + *
> > + */
> > +static inline uint64_t
> > +rte_rdtsc(void)
> > +{
> > +	uint64_t tsc;
> > +
> > +	asm volatile("mrs %0, pmccntr_el0" : "=r"(tsc));
> > +	return tsc;
> > +}
> > +#endif
> > 
> >  static inline uint64_t
> >  rte_rdtsc_precise(void)
> > --
> > 2.5.5
> 
> Do you also plan to support performance monitor event counters?

No. This patch was inspired by the armv7 PMU scheme, which is already part
of DPDK. The sole reason for adding this support is to catch performance
regressions through the app/test application. Other than that, I think the
existing cntvct_el0-based scheme is good enough for all use cases.
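As a rough illustration of the app/test use case, a regression check can compare the average cost of an operation against a stored baseline. The helpers below (monotonic_ns(), avg_ticks(), is_regression()) are hypothetical and not part of DPDK's app/test; the counter is passed as a function pointer so the same code could run with rte_rdtsc() on arm64 or, as here, with a CLOCK_MONOTONIC stand-in elsewhere.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

typedef uint64_t (*counter_fn)(void);
typedef void (*work_fn)(void);

/* Stand-in counter for non-arm64 hosts: CLOCK_MONOTONIC in nanoseconds. */
static uint64_t monotonic_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * UINT64_C(1000000000) +
	       (uint64_t)ts.tv_nsec;
}

/* Average counter ticks per call of fn over iters calls (iters > 0). */
static uint64_t avg_ticks(counter_fn now, work_fn fn, unsigned int iters)
{
	uint64_t start = now();

	for (unsigned int i = 0; i < iters; i++)
		fn();
	return (now() - start) / iters;
}

/* True if measured exceeds baseline by more than tolerance_pct percent. */
static bool is_regression(uint64_t measured, uint64_t baseline,
			  unsigned int tolerance_pct)
{
	return measured * 100 > baseline * (100 + (uint64_t)tolerance_pct);
}
```

With a 10 ns tick, a short operation only shows up after averaging over many iterations; a cycle counter running at CPU frequency makes small per-call differences visible with far fewer samples.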

> 
> Regards,
> Nipun
>
  
Jan Viktorin Aug. 19, 2016, 12:24 p.m. UTC | #3
On Fri, 19 Aug 2016 17:16:12 +0530
Jerin Jacob <jerin.jacob@caviumnetworks.com> wrote:

> On Fri, Aug 19, 2016 at 09:43:36AM +0000, Nipun Gupta wrote:
> > Hi Jerin,
> >   
> 
> Hi Nipun,
> 
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jerin Jacob
> > > Sent: Thursday, August 18, 2016 17:22
> > > To: dev@dpdk.org
> > > Cc: thomas.monjalon@6wind.com; jianbo.liu@linaro.org;
> > > viktorin@rehivetech.com; Jerin Jacob <jerin.jacob@caviumnetworks.com>
> > > Subject: [dpdk-dev] [PATCH] eal/armv8: high-resolution cycle counter
> > > 
> > > The existing cntvct_el0-based rte_rdtsc() provides a portable
> > > means to read a wall-clock counter from user space. Typically,
> > > this counter runs at <= 100MHz.
> > > 
> > > An alternative way to back rte_rdtsc() with a high-resolution
> > > counter is through the armv8 PMU subsystem.
> > > The PMU cycle counter runs at CPU frequency. However,
> > > user-space access to the PMU cycle counter is not enabled
> > > by default in the arm64 Linux kernel.
> > > User-space access to the cycle counter can be enabled
> > > by configuring the PMU from privileged mode (kernel space).
> > > 
> > > By default, the rte_rdtsc() implementation uses the portable
> > > cntvct_el0 scheme. An application can choose the PMU-based
> > > implementation with CONFIG_RTE_ARM_EAL_RDTSC_USE_PMU.
> > > 
> > > Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> > > ---
> > > 
> > > The PMU-based scheme is useful for high-accuracy performance profiling.
> > > Below are example steps to configure the PMU-based cycle counter on an
> > > armv8 machine.
> > > 
> > > # git clone https://github.com/jerinjacobk/armv8_pmu_cycle_counter_el0
> > > # cd armv8_pmu_cycle_counter_el0
> > > # make
> > > # sudo insmod pmu_el0_cycle_counter.ko
> > > # cd $DPDK_DIR
> > > # make config T=arm64-armv8a-linuxapp-gcc
> > > # echo "CONFIG_RTE_ARM_EAL_RDTSC_USE_PMU=y" >> build/.config
> > > # make -j 4  
> > 
> > Can we make this kernel module part of DPDK as well? Maybe in linuxapp, so that it is compiled along with DPDK?
> 
> I thought so. Later, I realized it may not be a good idea to add yet
> another out-of-tree module to the DPDK repo, as DPDK is trying to get rid
> of its existing out-of-tree modules.

This has also been my way of thinking. However, if we discover that such a
kernel module would be really useful, I think we can do it.

> 
> >   
> > > 
> > > ---
> > >  .../common/include/arch/arm/rte_cycles_64.h        | 33 ++++++++++++++++++++++
> > >  1 file changed, 33 insertions(+)
> > > 
> > > diff --git a/lib/librte_eal/common/include/arch/arm/rte_cycles_64.h
> > > b/lib/librte_eal/common/include/arch/arm/rte_cycles_64.h
> > > index 14f2612..867a946 100644
> > > --- a/lib/librte_eal/common/include/arch/arm/rte_cycles_64.h
> > > +++ b/lib/librte_eal/common/include/arch/arm/rte_cycles_64.h
> > > @@ -45,6 +45,11 @@ extern "C" {
> > >   * @return
> > >   *   The time base for this lcore.
> > >   */
> > > +#ifndef RTE_ARM_EAL_RDTSC_USE_PMU
> > > +/**
> > > + * This call is portable to any ARMv8 architecture; however, cntvct_el0
> > > + * typically runs at <= 100MHz and may be imprecise for some tasks.
> > > + */
> > >  static inline uint64_t
> > >  rte_rdtsc(void)
> > >  {
> > > @@ -53,6 +58,34 @@ rte_rdtsc(void)
> > >  	asm volatile("mrs %0, cntvct_el0" : "=r" (tsc));
> > >  	return tsc;
> > >  }
> > > +#else
> > > +/**
> > > + * This is an alternative method to enable rte_rdtsc() with the high-resolution
> > > + * PMU cycle counter. The cycle counter runs at CPU frequency; this scheme uses
> > > + * the ARMv8 PMU subsystem to read the cycle counter from user space. However,
> > > + * user-space access to the PMU cycle counter is not enabled by default in the
> > > + * arm64 Linux kernel.
> > > + * User-space access can be enabled by configuring the PMU from privileged
> > > + * mode (kernel space), for example:
> > > + *
> > > + * asm volatile("msr pmintenset_el1, %0" : : "r" ((u64)(0 << 31)));
> > > + * asm volatile("msr pmcntenset_el0, %0" : : "r" (BIT(31)));
> > > + * asm volatile("msr pmuserenr_el0, %0" : : "r"(BIT(0) | BIT(2)));
> > > + * asm volatile("mrs %0, pmcr_el0" : "=r" (val));
> > > + * val |= (BIT(0) | BIT(2));
> > > + * isb();
> > > + * asm volatile("msr pmcr_el0, %0" : : "r" (val));  
> > 
> > In your git repo, I see that the cycle counter is not disabled (PMCNTENCLR_EL0) on cleanup. It would be better to also disable the cycle counter at module exit.
> 
> OK

+1

I've got a private kernel driver that enables and (hopefully) properly
disables this for ARMv7. If we'd like to merge it, I'd like to have a single
module, or at least a single module with two implementations.

I can post it if it would be helpful.

Regards
Jan

> 
> >   
> > > + *
> > > + */
> > > +static inline uint64_t
> > > +rte_rdtsc(void)
> > > +{
> > > +	uint64_t tsc;
> > > +
> > > +	asm volatile("mrs %0, pmccntr_el0" : "=r"(tsc));
> > > +	return tsc;
> > > +}
> > > +#endif
> > > 
> > >  static inline uint64_t
> > >  rte_rdtsc_precise(void)
> > > --
> > > 2.5.5  
> > 
> > Do you also plan to support performance monitor event counters?  
> 
> No. This patch was inspired by the armv7 PMU scheme, which is already part
> of DPDK. The sole reason for adding this support is to catch performance
> regressions through the app/test application. Other than that, I think the
> existing cntvct_el0-based scheme is good enough for all use cases.
> 
> > 
> > Regards,
> > Nipun
> >
  
Jerin Jacob Aug. 19, 2016, 12:52 p.m. UTC | #4
On Fri, Aug 19, 2016 at 02:24:58PM +0200, Jan Viktorin wrote:
> On Fri, 19 Aug 2016 17:16:12 +0530
> Jerin Jacob <jerin.jacob@caviumnetworks.com> wrote:
> 
> 
> I've got a private kernel driver that enables and (hopefully) properly
> disables this for ARMv7. If we'd like to merge it, I'd like to have a single
> module, or at least a single module with two implementations.
> 
> I can post it if it would be helpful.

I don't think we can use this in production, as it may alter the PMU state
used by 'perf' etc. I think we should keep it as a debug interface for armv7
and armv8 and disable it by default.


> 
> Regards
> Jan
> 
> > 
> > >   
> > > > + *
> > > > + */
> > > > +static inline uint64_t
> > > > +rte_rdtsc(void)
> > > > +{
> > > > +	uint64_t tsc;
> > > > +
> > > > +	asm volatile("mrs %0, pmccntr_el0" : "=r"(tsc));
> > > > +	return tsc;
> > > > +}
> > > > +#endif
> > > > 
> > > >  static inline uint64_t
> > > >  rte_rdtsc_precise(void)
> > > > --
> > > > 2.5.5  
> > > 
> > > Do you also plan to support performance monitor event counters?  
> > 
> > No. This patch was inspired by the armv7 PMU scheme, which is already part
> > of DPDK. The sole reason for adding this support is to catch performance
> > regressions through the app/test application. Other than that, I think the
> > existing cntvct_el0-based scheme is good enough for all use cases.
> > 
> > > 
> > > Regards,
> > > Nipun
> > >   
> 
> 
> 
> -- 
>    Jan Viktorin                  E-mail: Viktorin@RehiveTech.com
>    System Architect              Web:    www.RehiveTech.com
>    RehiveTech
>    Brno, Czech Republic
  
Hemant Agrawal Aug. 23, 2016, 10:01 a.m. UTC | #5
> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jerin Jacob
> Sent: Thursday, August 18, 2016 5:22 PM
> To: dev@dpdk.org
> Cc: thomas.monjalon@6wind.com; jianbo.liu@linaro.org;
> viktorin@rehivetech.com; Jerin Jacob <jerin.jacob@caviumnetworks.com>
> Subject: [dpdk-dev] [PATCH] eal/armv8: high-resolution cycle counter
> 
> The existing cntvct_el0-based rte_rdtsc() provides a portable means to read a
> wall-clock counter from user space. Typically, this counter runs at <= 100MHz.
> 
> An alternative way to back rte_rdtsc() with a high-resolution counter is
> through the armv8 PMU subsystem.
> The PMU cycle counter runs at CPU frequency. However, user-space access to the
> PMU cycle counter is not enabled by default in the arm64 Linux kernel.
> User-space access to the cycle counter can be enabled by configuring the PMU
> from privileged mode (kernel space).
> 
> By default, the rte_rdtsc() implementation uses the portable
> cntvct_el0 scheme. An application can choose the PMU-based implementation with
> CONFIG_RTE_ARM_EAL_RDTSC_USE_PMU.
> 
> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>

Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
  
Thomas Monjalon Oct. 4, 2016, 8:42 a.m. UTC | #6
2016-08-19 18:22, Jerin Jacob:
> On Fri, Aug 19, 2016 at 02:24:58PM +0200, Jan Viktorin wrote:
> > On Fri, 19 Aug 2016 17:16:12 +0530
> > Jerin Jacob <jerin.jacob@caviumnetworks.com> wrote:
> > 
> > 
> > I've got a private kernel driver that enables and (hopefully) properly
> > disables this for ARMv7. If we'd like to merge it, I'd like to have a single
> > module, or at least a single module with two implementations.
> > 
> > I can post it if it would be helpful.
> 
> I don't think we can use this in production, as it may alter the PMU state
> used by 'perf' etc. I think we should keep it as a debug interface for armv7
> and armv8 and disable it by default.

Could you please document the use of the PMU for debugging and how it
interferes with the kernel's use of the counters?
A patch in doc/guides/prog_guide/profile_app.rst would be welcome.

Ideally, it would be much better to have a sysfs entry to enable the PMU
counter with an upstream kernel.
  
Thomas Monjalon Oct. 4, 2016, 8:46 a.m. UTC | #7
> > The existing cntvct_el0-based rte_rdtsc() provides a portable means to read a
> > wall-clock counter from user space. Typically, this counter runs at <= 100MHz.
> > 
> > An alternative way to back rte_rdtsc() with a high-resolution counter is
> > through the armv8 PMU subsystem.
> > The PMU cycle counter runs at CPU frequency. However, user-space access to the
> > PMU cycle counter is not enabled by default in the arm64 Linux kernel.
> > User-space access to the cycle counter can be enabled by configuring the PMU
> > from privileged mode (kernel space).
> > 
> > By default, the rte_rdtsc() implementation uses the portable
> > cntvct_el0 scheme. An application can choose the PMU-based implementation with
> > CONFIG_RTE_ARM_EAL_RDTSC_USE_PMU.
> > 
> > Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> 
> Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>

Applied, thanks

Please do not forget documentation and upstreaming efforts.
  

Patch

diff --git a/lib/librte_eal/common/include/arch/arm/rte_cycles_64.h b/lib/librte_eal/common/include/arch/arm/rte_cycles_64.h
index 14f2612..867a946 100644
--- a/lib/librte_eal/common/include/arch/arm/rte_cycles_64.h
+++ b/lib/librte_eal/common/include/arch/arm/rte_cycles_64.h
@@ -45,6 +45,11 @@  extern "C" {
  * @return
  *   The time base for this lcore.
  */
+#ifndef RTE_ARM_EAL_RDTSC_USE_PMU
+/**
+ * This call is portable to any ARMv8 architecture; however, cntvct_el0
+ * typically runs at <= 100MHz and may be imprecise for some tasks.
+ */
 static inline uint64_t
 rte_rdtsc(void)
 {
@@ -53,6 +58,34 @@  rte_rdtsc(void)
 	asm volatile("mrs %0, cntvct_el0" : "=r" (tsc));
 	return tsc;
 }
+#else
+/**
+ * This is an alternative method to enable rte_rdtsc() with the high-resolution
+ * PMU cycle counter. The cycle counter runs at CPU frequency; this scheme uses
+ * the ARMv8 PMU subsystem to read the cycle counter from user space. However,
+ * user-space access to the PMU cycle counter is not enabled by default in the
+ * arm64 Linux kernel.
+ * User-space access can be enabled by configuring the PMU from privileged
+ * mode (kernel space), for example:
+ *
+ * asm volatile("msr pmintenset_el1, %0" : : "r" ((u64)(0 << 31)));
+ * asm volatile("msr pmcntenset_el0, %0" : : "r" (BIT(31)));
+ * asm volatile("msr pmuserenr_el0, %0" : : "r"(BIT(0) | BIT(2)));
+ * asm volatile("mrs %0, pmcr_el0" : "=r" (val));
+ * val |= (BIT(0) | BIT(2));
+ * isb();
+ * asm volatile("msr pmcr_el0, %0" : : "r" (val));
+ *
+ */
+static inline uint64_t
+rte_rdtsc(void)
+{
+	uint64_t tsc;
+
+	asm volatile("mrs %0, pmccntr_el0" : "=r"(tsc));
+	return tsc;
+}
+#endif
 
 static inline uint64_t
 rte_rdtsc_precise(void)
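For reference, the constants in the register sequence in the comment above correspond to the ARMv8-A PMU register bits below. The macro names other than BIT() are hypothetical. PMCNTENSET_EL0/PMCNTENCLR_EL0 and PMINTENSET_EL1/PMINTENCLR_EL1 are write-one-to-set and write-one-to-clear pairs, so writing zero to PMINTENSET_EL1 changes nothing, masking the cycle-counter overflow interrupt means writing bit 31 to PMINTENCLR_EL1, and the module-exit cleanup suggested in review would write the same bit 31 to PMCNTENCLR_EL0.

```c
#include <stdint.h>

#define BIT(n)		(UINT64_C(1) << (n))

/* PMCR_EL0 */
#define PMCR_E		BIT(0)	/* enable the counters */
#define PMCR_C		BIT(2)	/* reset PMCCNTR_EL0 to zero */

/* PMUSERENR_EL0 */
#define PMUSERENR_EN	BIT(0)	/* allow EL0 access to the PMU registers */
#define PMUSERENR_CR	BIT(2)	/* allow EL0 reads of PMCCNTR_EL0 */

/* PMCNTENSET_EL0, PMCNTENCLR_EL0, PMINTENSET_EL1, PMINTENCLR_EL1 */
#define PMU_CYCLE_CNT	BIT(31)	/* bit C: the cycle counter */

/* New PMCR_EL0 value: keep the existing fields, then set E and C. */
static inline uint64_t
pmcr_enable(uint64_t pmcr)
{
	return pmcr | PMCR_E | PMCR_C;
}
```

The read-modify-write of PMCR_EL0 preserves implementation-defined fields, which is why the sequence reads the register first instead of writing a constant.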