[RFC] eal_debug: do not use malloc in rte_dump_stack

Message ID 20220129011039.264377-1-stephen@networkplumber.org (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Headers
Series [RFC] eal_debug: do not use malloc in rte_dump_stack |

Checks

Context Check Description
ci/checkpatch warning coding style issues
ci/Intel-compilation warning apply issues

Commit Message

Stephen Hemminger Jan. 29, 2022, 1:10 a.m. UTC
  The glibc backtrace_symbols() calls malloc which makes it
dangerous to use rte_dump_stack() in a signal handler that
is handling errors that maybe due to memory corruption.

Instead, use dladdr() to lookup up symbols incrementally.

The format of the messages is based on what X org server
has been doing for many years. It changes from bottom up
to top down order.

Bugzilla ID: 929
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/eal/linux/eal_debug.c | 45 ++++++++++++++++++++++++++++-----------
 1 file changed, 32 insertions(+), 13 deletions(-)
  

Comments

Morten Brørup Jan. 29, 2022, 8:25 a.m. UTC | #1
> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Saturday, 29 January 2022 02.11
> 
> The glibc backtrace_symbols() calls malloc which makes it
> dangerous to use rte_dump_stack() in a signal handler that
> is handling errors that maybe due to memory corruption.

Yes. We have experienced that problem with backtrace_symbols(); so as a workaround, our failure signal handler dumps all other information first, and calls backtrace_symbols() last, in case it crashes.

> 
> Instead, use dladdr() to lookup up symbols incrementally.

I took a brief look at the dladdr() source code, and it looks good to me.

> 
> The format of the messages is based on what X org server
> has been doing for many years. It changes from bottom up
> to top down order.

Good idea. Seems more logical.

> 
> Bugzilla ID: 929
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
>  lib/eal/linux/eal_debug.c | 45 ++++++++++++++++++++++++++++-----------
>  1 file changed, 32 insertions(+), 13 deletions(-)
> 
> diff --git a/lib/eal/linux/eal_debug.c b/lib/eal/linux/eal_debug.c
> index 64dab4e0da24..bf232f72f402 100644
> --- a/lib/eal/linux/eal_debug.c
> +++ b/lib/eal/linux/eal_debug.c
> @@ -4,6 +4,7 @@
> 
>  #ifdef RTE_BACKTRACE
>  #include <execinfo.h>
> +#include <dlfcn.h>
>  #endif
>  #include <stdarg.h>
>  #include <signal.h>
> @@ -18,26 +19,44 @@
> 
>  #define BACKTRACE_SIZE 256
> 
> -/* dump the stack of the calling core */
> +/* Dump the stack of the calling core
> + *
> + * Note: this requires some careful usage in order to
> + * stay safe in case where called from a signal
> + * handler and the malloc pool may be corrupted.
> + */
>  void rte_dump_stack(void)
>  {
>  #ifdef RTE_BACKTRACE
>  	void *func[BACKTRACE_SIZE];
> -	char **symb = NULL;
> -	int size;
> +	int i, size;
> 
>  	size = backtrace(func, BACKTRACE_SIZE);
> -	symb = backtrace_symbols(func, size);
> -
> -	if (symb == NULL)
> -		return;
> 
> -	while (size > 0) {
> -		rte_log(RTE_LOG_ERR, RTE_LOGTYPE_EAL,
> -			"%d: [%s]\n", size, symb[size - 1]);
> -		size --;
> +	for (i = 0; i < size; i++) {
> +		void *pc = func[i];
> +		const char *fname;
> +		Dl_info info;
> +
> +		if (dladdr(pc, &info) == 0) {
> +			rte_log(RTE_LOG_ERR, RTE_LOGTYPE_EAL,
> +				"%d: ?? [%p]\n", i, pc);
> +			continue;
> +		}
> +
> +		fname = (info.dli_fname && *info.dli_fname) ?
> info.dli_fname : "(vdso)";
> +		if (info.dli_saddr != NULL)
> +			rte_log(RTE_LOG_ERR, RTE_LOGTYPE_EAL,
> +				"%d: %s (%s+%#tx) [%p]\n",
> +				i, fname, info.dli_sname,
> +				(ptrdiff_t)((uintptr_t)pc -
> (uintptr_t)info.dli_saddr),
> +				pc);
> +		else
> +			rte_log(RTE_LOG_ERR, RTE_LOGTYPE_EAL,
> +				"%d: %s (%p+%#tx) [%p]\n",
> +				i, fname, info.dli_fbase,
> +				(ptrdiff_t)((uintptr_t)pc -
> (uintptr_t)info.dli_fbase),
> +				pc);
>  	}
> -
> 	free(symb);

Probably something is lost in formatting here, but free(symb) must also be removed.

>  #endif /* RTE_BACKTRACE */
>  }
> --
> 2.34.1
> 

Great improvement, Stephen!

Acked-by: Morten Brørup <mb@smartsharesystems.com>
  

Patch

diff --git a/lib/eal/linux/eal_debug.c b/lib/eal/linux/eal_debug.c
index 64dab4e0da24..bf232f72f402 100644
--- a/lib/eal/linux/eal_debug.c
+++ b/lib/eal/linux/eal_debug.c
@@ -4,6 +4,7 @@ 
 
 #ifdef RTE_BACKTRACE
 #include <execinfo.h>
+#include <dlfcn.h>
 #endif
 #include <stdarg.h>
 #include <signal.h>
@@ -18,26 +19,44 @@ 
 
 #define BACKTRACE_SIZE 256
 
-/* dump the stack of the calling core */
+/* Dump the stack of the calling core
+ *
+ * Note: this requires some careful usage in order to
+ * stay safe in case where called from a signal
+ * handler and the malloc pool may be corrupted.
+ */
 void rte_dump_stack(void)
 {
 #ifdef RTE_BACKTRACE
 	void *func[BACKTRACE_SIZE];
-	char **symb = NULL;
-	int size;
+	int i, size;
 
 	size = backtrace(func, BACKTRACE_SIZE);
-	symb = backtrace_symbols(func, size);
-
-	if (symb == NULL)
-		return;
 
-	while (size > 0) {
-		rte_log(RTE_LOG_ERR, RTE_LOGTYPE_EAL,
-			"%d: [%s]\n", size, symb[size - 1]);
-		size --;
+	for (i = 0; i < size; i++) {
+		void *pc = func[i];
+		const char *fname;
+		Dl_info info;
+
+		if (dladdr(pc, &info) == 0) {
+			rte_log(RTE_LOG_ERR, RTE_LOGTYPE_EAL,
+				"%d: ?? [%p]\n", i, pc);
+			continue;
+		}
+
+		fname = (info.dli_fname && *info.dli_fname) ? info.dli_fname : "(vdso)";
+		if (info.dli_saddr != NULL)
+			rte_log(RTE_LOG_ERR, RTE_LOGTYPE_EAL,
+				"%d: %s (%s+%#tx) [%p]\n",
+				i, fname, info.dli_sname,
+				(ptrdiff_t)((uintptr_t)pc - (uintptr_t)info.dli_saddr),
+				pc);
+		else
+			rte_log(RTE_LOG_ERR, RTE_LOGTYPE_EAL,
+				"%d: %s (%p+%#tx) [%p]\n",
+				i, fname, info.dli_fbase,
+				(ptrdiff_t)((uintptr_t)pc - (uintptr_t)info.dli_fbase),
+				pc);
 	}
-
	free(symb);
 #endif /* RTE_BACKTRACE */
 }