BUILD bug hidden in SFC driver.

Message ID 20231111085634.5df512bf@hermes.local (mailing list archive)
State Not Applicable, archived
Delegated to: Ferruh Yigit
Headers
Series BUILD bug hidden in SFC driver. |

Checks

Context Check Description
ci/loongarch-compilation warning apply patch failure
ci/Intel-compilation warning apply issues
ci/iol-testing warning apply patch failure

Commit Message

Stephen Hemminger Nov. 11, 2023, 4:56 p.m. UTC
  While examining the use of VLA in DPDK, ran into a bug in sfc driver.

If DPDK is built with -Wvla, then the RTE_BUILD_BUG_ON() macro won't work
as written. Experimenting with a better more portable version of that macro
as:
	#define RTE_BUILD_BUG_ON(e) _Static_assert(!(e),  #e)

revealed that the SFC driver was calling RTE_BUILD_BUG_ON with non constant
expression.

../drivers/net/sfc/sfc_ef100_tx.c: In function ‘sfc_ef100_tx_pkt_descs_max’:
../lib/eal/include/rte_common.h:585:20: warning: comparison of integer expressions of different signedness: ‘unsigned int’ and ‘int’ [-Wsign-compare]
  585 |                 _a < _b ? _a : _b; \
      |                    ^
../lib/eal/include/rte_common.h:498:46: note: in definition of macro ‘RTE_BUILD_BUG_ON’
  498 | #define RTE_BUILD_BUG_ON(e) _Static_assert(!(e),  #e)
      |                                              ^
../drivers/net/sfc/sfc_ef100_tx.c:566:34: note: in expansion of macro ‘RTE_MIN’
  566 |                                  RTE_MIN((unsigned int)EFX_MAC_PDU_MAX,
      |                                  ^~~~~~~
../lib/eal/include/rte_common.h:585:32: warning: operand of ‘?:’ changes signedness from ‘int’ to ‘unsigned int’ due to unsignedness of other operand [-Wsign-compare]
  585 |                 _a < _b ? _a : _b; \
      |                                ^~
../lib/eal/include/rte_common.h:498:46: note: in definition of macro ‘RTE_BUILD_BUG_ON’
  498 | #define RTE_BUILD_BUG_ON(e) _Static_assert(!(e),  #e)
      |                                              ^
../drivers/net/sfc/sfc_ef100_tx.c:566:34: note: in expansion of macro ‘RTE_MIN’
  566 |                                  RTE_MIN((unsigned int)EFX_MAC_PDU_MAX,
      |                                  ^~~~~~~
../lib/eal/include/rte_common.h:498:44: error: expression in static assertion is not constant
  498 | #define RTE_BUILD_BUG_ON(e) _Static_assert(!(e),  #e)
      |                                            ^~~~
../drivers/net/sfc/sfc_ef100_tx.c:565:17: note: in expansion of macro ‘RTE_BUILD_BUG_ON’


The problem is that Gcc does not evaluate a ternary operator expression
with all constants at compile time to produce a constant value! Apparently,
the language standards leave this as ambiguous.

If the code is expanded into two assertions as:


Then a new problem arises:
In file included from ../lib/mbuf/rte_mbuf.h:36,
                 from ../drivers/net/sfc/sfc_ef100_tx.c:12:
../drivers/net/sfc/sfc_ef100_tx.c: In function ‘sfc_ef100_tx_pkt_descs_max’:
../lib/eal/include/rte_common.h:498:29: error: static assertion failed: "SFC_EF100_TX_SEND_DESC_LEN_MAX < SFC_MBUF_SEG_LEN_MAX"
  498 | #define RTE_BUILD_BUG_ON(e) _Static_assert(!(e),  #e)
      |                             ^~~~~~~~~~~~~~
../drivers/net/sfc/sfc_ef100_tx.c:566:17: note: in expansion of macro ‘RTE_BUILD_BUG_ON’
  566 |                 RTE_BUILD_BUG_ON(SFC_EF100_TX_SEND_DESC_LEN_MAX < SFC_MBUF_SEG_LEN_MAX);
      |                 ^~~~~~~~~~~~~~~~

Building a little program to unwind the #defines reveals:

SFC_EF100_TX_SEND_DESC_LEN_MAX = 16383
EFX_MAC_PDU_MAX = 9240
SFC_MBUF_SEG_LEN_MAX = 65535

I.e:
	RTE_BUILD_BUG_ON(16383 < RTE_MIN(9240, 65535));
	

Therefore the current driver should be getting build bug, but the existing macro
hides it.
  

Comments

Stephen Hemminger Nov. 11, 2023, 5:29 p.m. UTC | #1
On Sat, 11 Nov 2023 08:56:34 -0800
Stephen Hemminger <stephen@networkplumber.org> wrote:

> While examining the use of VLA in DPDK, ran into a bug in sfc driver.
> 
> If DPDK is built with -Wvla, then the RTE_BUILD_BUG_ON() macro won't work
> as written. Experimenting with a better more portable version of that macro
> as:
> 	#define RTE_BUILD_BUG_ON(e) _Static_assert(!(e),  #e)

Sending this as RFC also catches another driver with hidden failure.

from ../drivers/event/opdl/opdl_ring.c:10:
./drivers/event/opdl/opdl_ring.c: In function ‘opdl_ring_create’:
./lib/eal/include/rte_common.h:498:44: error: expression in static assertion is not constant
#define RTE_BUILD_BUG_ON(e) _Static_assert(!(e), #e)
^~~~
./drivers/event/opdl/opdl_ring.c:913:2: note: in expansion of macro ‘RTE_BUILD_BUG_ON’
RTE_BUILD_BUG_ON(!rte_is_power_of_2(OPDL_DISCLAIMS_PER_LCORE));
  
Ivan Malov Nov. 13, 2023, 12:04 p.m. UTC | #2
Hi Stephen,

On Sat, 11 Nov 2023, Stephen Hemminger wrote:

> While examining the use of VLA in DPDK, ran into a bug in sfc driver.
>
> If DPDK is built with -Wvla, then the RTE_BUILD_BUG_ON() macro won't work
> as written. Experimenting with a better more portable version of that macro
> as:
> 	#define RTE_BUILD_BUG_ON(e) _Static_assert(!(e),  #e)

First of all, thanks for the effort. Very helpful.
Please see below.

>
> revealed that the SFC driver was calling RTE_BUILD_BUG_ON with non constant
> expression.
>
> ../drivers/net/sfc/sfc_ef100_tx.c: In function ‘sfc_ef100_tx_pkt_descs_max’:
> ../lib/eal/include/rte_common.h:585:20: warning: comparison of integer expressions of different signedness: ‘unsigned int’ and ‘int’ [-Wsign-compare]
>  585 |                 _a < _b ? _a : _b; \
>      |                    ^
> ../lib/eal/include/rte_common.h:498:46: note: in definition of macro ‘RTE_BUILD_BUG_ON’
>  498 | #define RTE_BUILD_BUG_ON(e) _Static_assert(!(e),  #e)
>      |                                              ^
> ../drivers/net/sfc/sfc_ef100_tx.c:566:34: note: in expansion of macro ‘RTE_MIN’
>  566 |                                  RTE_MIN((unsigned int)EFX_MAC_PDU_MAX,
>      |                                  ^~~~~~~
> ../lib/eal/include/rte_common.h:585:32: warning: operand of ‘?:’ changes signedness from ‘int’ to ‘unsigned int’ due to unsignedness of other operand [-Wsign-compare]
>  585 |                 _a < _b ? _a : _b; \
>      |                                ^~
> ../lib/eal/include/rte_common.h:498:46: note: in definition of macro ‘RTE_BUILD_BUG_ON’
>  498 | #define RTE_BUILD_BUG_ON(e) _Static_assert(!(e),  #e)
>      |                                              ^
> ../drivers/net/sfc/sfc_ef100_tx.c:566:34: note: in expansion of macro ‘RTE_MIN’
>  566 |                                  RTE_MIN((unsigned int)EFX_MAC_PDU_MAX,
>      |                                  ^~~~~~~
> ../lib/eal/include/rte_common.h:498:44: error: expression in static assertion is not constant
>  498 | #define RTE_BUILD_BUG_ON(e) _Static_assert(!(e),  #e)
>      |                                            ^~~~
> ../drivers/net/sfc/sfc_ef100_tx.c:565:17: note: in expansion of macro ‘RTE_BUILD_BUG_ON’
>
>
> The problem is that Gcc does not evaluate a ternary operator expression
> with all constants at compile time to produce a constant value! Apparently,
> the language standards leave this as ambiguous.
>
> If the code is expanded into two assertions as:
>
> diff --git a/drivers/net/sfc/sfc_ef100_tx.c b/drivers/net/sfc/sfc_ef100_tx.c
> index 1b6374775f07..25e6633d6679 100644
> --- a/drivers/net/sfc/sfc_ef100_tx.c
> +++ b/drivers/net/sfc/sfc_ef100_tx.c
> @@ -562,9 +562,8 @@ sfc_ef100_tx_pkt_descs_max(const struct rte_mbuf *m)
>                 * Make sure that the first segment does not need fragmentation
>                 * (split into many Tx descriptors).
>                 */
> -               RTE_BUILD_BUG_ON(SFC_EF100_TX_SEND_DESC_LEN_MAX <
> -                                RTE_MIN((unsigned int)EFX_MAC_PDU_MAX,
> -                                SFC_MBUF_SEG_LEN_MAX));
> +               RTE_BUILD_BUG_ON(SFC_EF100_TX_SEND_DESC_LEN_MAX < EFX_MAC_PDU_MAX);
> +               RTE_BUILD_BUG_ON(SFC_EF100_TX_SEND_DESC_LEN_MAX < SFC_MBUF_SEG_LEN_MAX);
>        }
>
>        if (m->ol_flags & sfc_dp_mport_override) {
>
> Then a new problem arises:
> In file included from ../lib/mbuf/rte_mbuf.h:36,
>                 from ../drivers/net/sfc/sfc_ef100_tx.c:12:
> ../drivers/net/sfc/sfc_ef100_tx.c: In function ‘sfc_ef100_tx_pkt_descs_max’:
> ../lib/eal/include/rte_common.h:498:29: error: static assertion failed: "SFC_EF100_TX_SEND_DESC_LEN_MAX < SFC_MBUF_SEG_LEN_MAX"
>  498 | #define RTE_BUILD_BUG_ON(e) _Static_assert(!(e),  #e)
>      |                             ^~~~~~~~~~~~~~
> ../drivers/net/sfc/sfc_ef100_tx.c:566:17: note: in expansion of macro ‘RTE_BUILD_BUG_ON’
>  566 |                 RTE_BUILD_BUG_ON(SFC_EF100_TX_SEND_DESC_LEN_MAX < SFC_MBUF_SEG_LEN_MAX);
>      |                 ^~~~~~~~~~~~~~~~
>
> Building a little program to unwind the #defines reveals:
>
> SFC_EF100_TX_SEND_DESC_LEN_MAX = 16383
> EFX_MAC_PDU_MAX = 9240
> SFC_MBUF_SEG_LEN_MAX = 65535
>
> I.e:
> 	RTE_BUILD_BUG_ON(16383 < RTE_MIN(9240, 65535));
>
>
> Therefore the current driver should be getting build bug, but the existing macro
> hides it.

As far as I understand, the intention behind this check is to make sure
that SFC_EF100_TX_SEND_DESC_LEN_MAX represents enough room to
accommodate either EFX_MAC_PDU_MAX or SFC_MBUF_SEG_LEN_MAX bytes,
whichever is smaller. Is 16383 sufficient to accommodate 9240?
I think so. Do you agree?

That being said, indeed, applying the "more portable version" of yours
results in me seeing the warning about a non-constant expression.

Applying the following patch makes all errors disappear
when building with either version of RTE_BUILD_BUG_ON:

diff --git a/drivers/net/sfc/sfc_ef100_tx.c b/drivers/net/sfc/sfc_ef100_tx.c
index 1b6374775f..01f37c2616 100644
--- a/drivers/net/sfc/sfc_ef100_tx.c
+++ b/drivers/net/sfc/sfc_ef100_tx.c
@@ -563,7 +563,7 @@ sfc_ef100_tx_pkt_descs_max(const struct rte_mbuf *m)
                  * (split into many Tx descriptors).
                  */
                 RTE_BUILD_BUG_ON(SFC_EF100_TX_SEND_DESC_LEN_MAX <
-                                RTE_MIN((unsigned int)EFX_MAC_PDU_MAX,
+                                MIN((unsigned int)EFX_MAC_PDU_MAX,
                                  SFC_MBUF_SEG_LEN_MAX));
         }

with MIN being defined in drivers/common/sfc_efx/efsys.h as
#define MIN(v1, v2)	((v1) < (v2) ? (v1) : (v2))

Would that be an acceptable fix? Or am I missing something?

Thank you.
  
Tyler Retzlaff Nov. 13, 2023, 4:28 p.m. UTC | #3
On Sat, Nov 11, 2023 at 08:56:34AM -0800, Stephen Hemminger wrote:
> While examining the use of VLA in DPDK, ran into a bug in sfc driver.
> 
> If DPDK is built with -Wvla, then the RTE_BUILD_BUG_ON() macro won't work
> as written. Experimenting with a better more portable version of that macro
> as:
> 	#define RTE_BUILD_BUG_ON(e) _Static_assert(!(e),  #e)
> 
> revealed that the SFC driver was calling RTE_BUILD_BUG_ON with non constant
> expression.
> 
> ../drivers/net/sfc/sfc_ef100_tx.c: In function ‘sfc_ef100_tx_pkt_descs_max’:
> ../lib/eal/include/rte_common.h:585:20: warning: comparison of integer expressions of different signedness: ‘unsigned int’ and ‘int’ [-Wsign-compare]
>   585 |                 _a < _b ? _a : _b; \
>       |                    ^
> ../lib/eal/include/rte_common.h:498:46: note: in definition of macro ‘RTE_BUILD_BUG_ON’
>   498 | #define RTE_BUILD_BUG_ON(e) _Static_assert(!(e),  #e)
>       |                                              ^
> ../drivers/net/sfc/sfc_ef100_tx.c:566:34: note: in expansion of macro ‘RTE_MIN’
>   566 |                                  RTE_MIN((unsigned int)EFX_MAC_PDU_MAX,
>       |                                  ^~~~~~~
> ../lib/eal/include/rte_common.h:585:32: warning: operand of ‘?:’ changes signedness from ‘int’ to ‘unsigned int’ due to unsignedness of other operand [-Wsign-compare]
>   585 |                 _a < _b ? _a : _b; \
>       |                                ^~
> ../lib/eal/include/rte_common.h:498:46: note: in definition of macro ‘RTE_BUILD_BUG_ON’
>   498 | #define RTE_BUILD_BUG_ON(e) _Static_assert(!(e),  #e)
>       |                                              ^
> ../drivers/net/sfc/sfc_ef100_tx.c:566:34: note: in expansion of macro ‘RTE_MIN’
>   566 |                                  RTE_MIN((unsigned int)EFX_MAC_PDU_MAX,
>       |                                  ^~~~~~~
> ../lib/eal/include/rte_common.h:498:44: error: expression in static assertion is not constant
>   498 | #define RTE_BUILD_BUG_ON(e) _Static_assert(!(e),  #e)
>       |                                            ^~~~
> ../drivers/net/sfc/sfc_ef100_tx.c:565:17: note: in expansion of macro ‘RTE_BUILD_BUG_ON’
> 
> 
> The problem is that Gcc does not evaluate a ternary operator expression
> with all constants at compile time to produce a constant value! Apparently,
> the language standards leave this as ambiguous.

the problem is the expression being asserted was *never* constant or at
least not the expression you thought was being evaluated.

when the non-constant value is provided to the macro it produces a VLA
where the sizeof that VLA is the result of the expansion. i'm aware of
multiple instances of this bug outside of the use of this macro in dpdk.

the reason the expression is non-constant is because all the RTE_XXX
macros are expanded to statement expressions which themselves do not
evaluate as constant expressions. any RTE_BUILD_BUG_ON() expanding an
RTE_XXX macro will be impacted.

ideally we would move the source tree to disable VLAs entirely to wipe
out this and other similar bugs i've found. it would also be nice to
find a way to eliminate the use of statement expressions because of the
extremely subtle ways they can explode when seemingly innocuous control
flow mechanisms are present.

we should immediately convert to _Static_assert even if that means
temporarily disabling some of the RTE_BUILD_BUG_ON entries (which as we
can see have no value as written).

we should enable -Wvla and then handle re-enablement as an exception to
allow us to transition to a tree without VLAs to squash new additions
and catch other unexpected VLA generation.

> 
> If the code is expanded into two assertions as:
> 
> diff --git a/drivers/net/sfc/sfc_ef100_tx.c b/drivers/net/sfc/sfc_ef100_tx.c
> index 1b6374775f07..25e6633d6679 100644
> --- a/drivers/net/sfc/sfc_ef100_tx.c
> +++ b/drivers/net/sfc/sfc_ef100_tx.c
> @@ -562,9 +562,8 @@ sfc_ef100_tx_pkt_descs_max(const struct rte_mbuf *m)
>                  * Make sure that the first segment does not need fragmentation
>                  * (split into many Tx descriptors).
>                  */
> -               RTE_BUILD_BUG_ON(SFC_EF100_TX_SEND_DESC_LEN_MAX <
> -                                RTE_MIN((unsigned int)EFX_MAC_PDU_MAX,
> -                                SFC_MBUF_SEG_LEN_MAX));
> +               RTE_BUILD_BUG_ON(SFC_EF100_TX_SEND_DESC_LEN_MAX < EFX_MAC_PDU_MAX);
> +               RTE_BUILD_BUG_ON(SFC_EF100_TX_SEND_DESC_LEN_MAX < SFC_MBUF_SEG_LEN_MAX);
>         }
>  
>         if (m->ol_flags & sfc_dp_mport_override) {
> 
> Then a new problem arises:
> In file included from ../lib/mbuf/rte_mbuf.h:36,
>                  from ../drivers/net/sfc/sfc_ef100_tx.c:12:
> ../drivers/net/sfc/sfc_ef100_tx.c: In function ‘sfc_ef100_tx_pkt_descs_max’:
> ../lib/eal/include/rte_common.h:498:29: error: static assertion failed: "SFC_EF100_TX_SEND_DESC_LEN_MAX < SFC_MBUF_SEG_LEN_MAX"
>   498 | #define RTE_BUILD_BUG_ON(e) _Static_assert(!(e),  #e)
>       |                             ^~~~~~~~~~~~~~
> ../drivers/net/sfc/sfc_ef100_tx.c:566:17: note: in expansion of macro ‘RTE_BUILD_BUG_ON’
>   566 |                 RTE_BUILD_BUG_ON(SFC_EF100_TX_SEND_DESC_LEN_MAX < SFC_MBUF_SEG_LEN_MAX);
>       |                 ^~~~~~~~~~~~~~~~
> 
> Building a little program to unwind the #defines reveals:
> 
> SFC_EF100_TX_SEND_DESC_LEN_MAX = 16383
> EFX_MAC_PDU_MAX = 9240
> SFC_MBUF_SEG_LEN_MAX = 65535
> 
> I.e:
> 	RTE_BUILD_BUG_ON(16383 < RTE_MIN(9240, 65535));
> 	
> 
> Therefore the current driver should be getting build bug, but the existing macro
> hides it.
  
Stephen Hemminger Nov. 13, 2023, 4:30 p.m. UTC | #4
On Mon, 13 Nov 2023 16:04:51 +0400 (+04)
Ivan Malov <ivan.malov@arknetworks.am> wrote:

> diff --git a/drivers/net/sfc/sfc_ef100_tx.c b/drivers/net/sfc/sfc_ef100_tx.c
> index 1b6374775f..01f37c2616 100644
> --- a/drivers/net/sfc/sfc_ef100_tx.c
> +++ b/drivers/net/sfc/sfc_ef100_tx.c
> @@ -563,7 +563,7 @@ sfc_ef100_tx_pkt_descs_max(const struct rte_mbuf *m)
>                   * (split into many Tx descriptors).
>                   */
>                  RTE_BUILD_BUG_ON(SFC_EF100_TX_SEND_DESC_LEN_MAX <
> -                                RTE_MIN((unsigned int)EFX_MAC_PDU_MAX,
> +                                MIN((unsigned int)EFX_MAC_PDU_MAX,
>                                   SFC_MBUF_SEG_LEN_MAX));
>          }
> 
> with MIN being defined in drivers/common/sfc_efx/efsys.h as
> #define MIN(v1, v2)	((v1) < (v2) ? (v1) : (v2))

That should work.
  

Patch

diff --git a/drivers/net/sfc/sfc_ef100_tx.c b/drivers/net/sfc/sfc_ef100_tx.c
index 1b6374775f07..25e6633d6679 100644
--- a/drivers/net/sfc/sfc_ef100_tx.c
+++ b/drivers/net/sfc/sfc_ef100_tx.c
@@ -562,9 +562,8 @@  sfc_ef100_tx_pkt_descs_max(const struct rte_mbuf *m)
                 * Make sure that the first segment does not need fragmentation
                 * (split into many Tx descriptors).
                 */
-               RTE_BUILD_BUG_ON(SFC_EF100_TX_SEND_DESC_LEN_MAX <
-                                RTE_MIN((unsigned int)EFX_MAC_PDU_MAX,
-                                SFC_MBUF_SEG_LEN_MAX));
+               RTE_BUILD_BUG_ON(SFC_EF100_TX_SEND_DESC_LEN_MAX < EFX_MAC_PDU_MAX);
+               RTE_BUILD_BUG_ON(SFC_EF100_TX_SEND_DESC_LEN_MAX < SFC_MBUF_SEG_LEN_MAX);
        }
 
        if (m->ol_flags & sfc_dp_mport_override) {