[v3,3/5] net/octeontx: fix build with sve enabled

Message ID 20210112025709.1121523-4-ruifeng.wang@arm.com (mailing list archive)
State Accepted, archived
Delegated to: David Marchand
Series lpm lookup with sve support

Checks

Context        Check    Description
ci/checkpatch  warning  coding style issues

Commit Message

Ruifeng Wang Jan. 12, 2021, 2:57 a.m. UTC
Building with GCC 10.2 with the SVE extension enabled fails with the
following assembler errors:

{standard input}: Assembler messages:
{standard input}:91: Error: selected processor does not support `addvl x4,x8,#-1'
{standard input}:95: Error: selected processor does not support `ptrue p1.d,all'
{standard input}:135: Error: selected processor does not support `whilelo p2.d,xzr,x5'
{standard input}:137: Error: selected processor does not support `decb x1'

This happens because the inline assembly explicitly resets the CPU model
with a .cpu directive that does not include SVE. The assembler therefore
rejects the SVE instructions generated by the compiler's auto-vectorization.

Add SVE to the CPU model specified by the inline assembly when the build
has SVE enabled. The inline assembly is not replaced with C atomics because
the driver relies on specific LSE instructions to interface with the
co-processor [1].
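
As a minimal sketch of the approach (not the patch itself), the directive
text can be chosen at compile time from __ARM_FEATURE_SVE. LSE_PREAMBLE,
lse_preamble() and lse_preamble_has_sve() are hypothetical names used only
for illustration. The helpers are there so the choice can be inspected on
any host without executing LSE instructions.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical macro name for this sketch. When the compiler targets SVE
 * it predefines __ARM_FEATURE_SVE, so the directive must keep SVE enabled;
 * otherwise the assembler would reject compiler-generated SVE code that
 * follows the directive.
 */
#if defined(__ARM_FEATURE_SVE)
#define LSE_PREAMBLE " .cpu generic+lse+sve\n"
#else
#define LSE_PREAMBLE " .cpu generic+lse\n"
#endif

/* Return the directive text selected at compile time. */
static inline const char *
lse_preamble(void)
{
	return LSE_PREAMBLE;
}

/* Return 1 if the selected directive keeps SVE enabled, 0 otherwise. */
static inline int
lse_preamble_has_sve(void)
{
	return strstr(LSE_PREAMBLE, "+sve") != NULL;
}
```

In the driver, the selected string is placed as the first line of each
__asm__ block, ahead of the LSE instruction.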

Fixes: f0c7bb1bf778 ("net/octeontx/base: add octeontx IO operations")
Cc: jerinj@marvell.com
Cc: stable@dpdk.org

[1] https://mails.dpdk.org/archives/dev/2021-January/196092.html

Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
v3:
Keep inline assembly and add sve extension to fix issue. (Pavan)

 drivers/net/octeontx/base/octeontx_io.h | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)
  

Comments

Jerin Jacob Jan. 12, 2021, 4:39 a.m. UTC | #1
On Tue, Jan 12, 2021 at 8:28 AM Ruifeng Wang <ruifeng.wang@arm.com> wrote:
>
> Building with gcc 10.2 with SVE extension enabled got error:
>
> {standard input}: Assembler messages:
> {standard input}:91: Error: selected processor does not support `addvl x4,x8,#-1'
> {standard input}:95: Error: selected processor does not support `ptrue p1.d,all'
> {standard input}:135: Error: selected processor does not support `whilelo p2.d,xzr,x5'
> {standard input}:137: Error: selected processor does not support `decb x1'
>
> This is because inline assembly code explicitly resets cpu model to
> not have SVE support. Thus SVE instructions generated by compiler
> auto vectorization got rejected by assembler.
>
> Added SVE to the cpu model specified by inline assembly for SVE support.
> Not replacing the inline assembly with C atomics because the driver relies
> on specific LSE instruction to interface to co-processor [1].
>
> Fixes: f0c7bb1bf778 ("net/octeontx/base: add octeontx IO operations")
> Cc: jerinj@marvell.com
> Cc: stable@dpdk.org
>
> [1] https://mails.dpdk.org/archives/dev/2021-January/196092.html
>
> Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>


Reviewed-by: Jerin Jacob <jerinj@marvell.com>



Patch

diff --git a/drivers/net/octeontx/base/octeontx_io.h b/drivers/net/octeontx/base/octeontx_io.h
index 04b9ce191..d0b9cfbc6 100644
--- a/drivers/net/octeontx/base/octeontx_io.h
+++ b/drivers/net/octeontx/base/octeontx_io.h
@@ -52,6 +52,11 @@  do {							\
 #endif
 
 #if defined(RTE_ARCH_ARM64)
+#if defined(__ARM_FEATURE_SVE)
+#define __LSE_PREAMBLE " .cpu	generic+lse+sve\n"
+#else
+#define __LSE_PREAMBLE " .cpu	generic+lse\n"
+#endif
 /**
  * Perform an atomic fetch-and-add operation.
  */
@@ -61,7 +66,7 @@  octeontx_reg_ldadd_u64(void *addr, int64_t off)
 	uint64_t old_val;
 
 	__asm__ volatile(
-		" .cpu		generic+lse\n"
+		__LSE_PREAMBLE
 		" ldadd	%1, %0, [%2]\n"
 		: "=r" (old_val) : "r" (off), "r" (addr) : "memory");
 
@@ -98,12 +103,13 @@  octeontx_reg_lmtst(void *lmtline_va, void *ioreg_va, const uint64_t cmdbuf[],
 
 		/* LDEOR initiates atomic transfer to I/O device */
 		__asm__ volatile(
-			" .cpu		generic+lse\n"
+			__LSE_PREAMBLE
 			" ldeor	xzr, %0, [%1]\n"
 			: "=r" (result) : "r" (ioreg_va) : "memory");
 	} while (!result);
 }
 
+#undef __LSE_PREAMBLE
 #else
 
 static inline uint64_t