[RFC] eal_debug: do not use malloc in rte_dump_stack
Checks
Commit Message
The glibc backtrace_symbols() calls malloc which makes it
dangerous to use rte_dump_stack() in a signal handler that
is handling errors that maybe due to memory corruption.
Instead, use dladdr() to lookup up symbols incrementally.
The format of the messages is based on what X org server
has been doing for many years. It changes from bottom up
to top down order.
Bugzilla ID: 929
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
lib/eal/linux/eal_debug.c | 45 ++++++++++++++++++++++++++++-----------
1 file changed, 32 insertions(+), 13 deletions(-)
Comments
> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Saturday, 29 January 2022 02.11
>
> The glibc backtrace_symbols() calls malloc which makes it
> dangerous to use rte_dump_stack() in a signal handler that
> is handling errors that maybe due to memory corruption.
Yes. We have experienced that problem with backtrace_symbols(); so as a workaround, our failure signal handler dumps all other information first, and calls backtrace_symbols() last, in case it crashes.
>
> Instead, use dladdr() to lookup up symbols incrementally.
I took a brief look at the dladdr() source code, and it looks good to me.
>
> The format of the messages is based on what X org server
> has been doing for many years. It changes from bottom up
> to top down order.
Good idea. Seems more logical.
>
> Bugzilla ID: 929
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
> lib/eal/linux/eal_debug.c | 45 ++++++++++++++++++++++++++++-----------
> 1 file changed, 32 insertions(+), 13 deletions(-)
>
> diff --git a/lib/eal/linux/eal_debug.c b/lib/eal/linux/eal_debug.c
> index 64dab4e0da24..bf232f72f402 100644
> --- a/lib/eal/linux/eal_debug.c
> +++ b/lib/eal/linux/eal_debug.c
> @@ -4,6 +4,7 @@
>
> #ifdef RTE_BACKTRACE
> #include <execinfo.h>
> +#include <dlfcn.h>
> #endif
> #include <stdarg.h>
> #include <signal.h>
> @@ -18,26 +19,44 @@
>
> #define BACKTRACE_SIZE 256
>
> -/* dump the stack of the calling core */
> +/* Dump the stack of the calling core
> + *
> + * Note: this requires some careful usage in order to
> + * stay safe in case where called from a signal
> + * handler and the malloc pool may be corrupted.
> + */
> void rte_dump_stack(void)
> {
> #ifdef RTE_BACKTRACE
> void *func[BACKTRACE_SIZE];
> - char **symb = NULL;
> - int size;
> + int i, size;
>
> size = backtrace(func, BACKTRACE_SIZE);
> - symb = backtrace_symbols(func, size);
> -
> - if (symb == NULL)
> - return;
>
> - while (size > 0) {
> - rte_log(RTE_LOG_ERR, RTE_LOGTYPE_EAL,
> - "%d: [%s]\n", size, symb[size - 1]);
> - size --;
> + for (i = 0; i < size; i++) {
> + void *pc = func[i];
> + const char *fname;
> + Dl_info info;
> +
> + if (dladdr(pc, &info) == 0) {
> + rte_log(RTE_LOG_ERR, RTE_LOGTYPE_EAL,
> + "%d: ?? [%p]\n", i, pc);
> + continue;
> + }
> +
> + fname = (info.dli_fname && *info.dli_fname) ?
> info.dli_fname : "(vdso)";
> + if (info.dli_saddr != NULL)
> + rte_log(RTE_LOG_ERR, RTE_LOGTYPE_EAL,
> + "%d: %s (%s+%#tx) [%p]\n",
> + i, fname, info.dli_sname,
> + (ptrdiff_t)((uintptr_t)pc -
> (uintptr_t)info.dli_saddr),
> + pc);
> + else
> + rte_log(RTE_LOG_ERR, RTE_LOGTYPE_EAL,
> + "%d: %s (%p+%#tx) [%p]\n",
> + i, fname, info.dli_fbase,
> + (ptrdiff_t)((uintptr_t)pc -
> (uintptr_t)info.dli_fbase),
> + pc);
> }
> -
> free(symb);
Probably something is lost in formatting here, but free(symb) must also be removed.
> #endif /* RTE_BACKTRACE */
> }
> --
> 2.34.1
>
Great improvement, Stephen!
Acked-by: Morten Brørup <mb@smartsharesystems.com>
@@ -4,6 +4,7 @@
#ifdef RTE_BACKTRACE
#include <execinfo.h>
+#include <dlfcn.h>
#endif
#include <stdarg.h>
#include <signal.h>
@@ -18,26 +19,44 @@
#define BACKTRACE_SIZE 256
-/* dump the stack of the calling core */
+/* Dump the stack of the calling core
+ *
+ * Note: this requires some careful usage in order to
+ * stay safe in case where called from a signal
+ * handler and the malloc pool may be corrupted.
+ */
void rte_dump_stack(void)
{
#ifdef RTE_BACKTRACE
void *func[BACKTRACE_SIZE];
- char **symb = NULL;
- int size;
+ int i, size;
size = backtrace(func, BACKTRACE_SIZE);
- symb = backtrace_symbols(func, size);
-
- if (symb == NULL)
- return;
- while (size > 0) {
- rte_log(RTE_LOG_ERR, RTE_LOGTYPE_EAL,
- "%d: [%s]\n", size, symb[size - 1]);
- size --;
+ for (i = 0; i < size; i++) {
+ void *pc = func[i];
+ const char *fname;
+ Dl_info info;
+
+ if (dladdr(pc, &info) == 0) {
+ rte_log(RTE_LOG_ERR, RTE_LOGTYPE_EAL,
+ "%d: ?? [%p]\n", i, pc);
+ continue;
+ }
+
+ fname = (info.dli_fname && *info.dli_fname) ? info.dli_fname : "(vdso)";
+ if (info.dli_saddr != NULL)
+ rte_log(RTE_LOG_ERR, RTE_LOGTYPE_EAL,
+ "%d: %s (%s+%#tx) [%p]\n",
+ i, fname, info.dli_sname,
+ (ptrdiff_t)((uintptr_t)pc - (uintptr_t)info.dli_saddr),
+ pc);
+ else
+ rte_log(RTE_LOG_ERR, RTE_LOGTYPE_EAL,
+ "%d: %s (%p+%#tx) [%p]\n",
+ i, fname, info.dli_fbase,
+ (ptrdiff_t)((uintptr_t)pc - (uintptr_t)info.dli_fbase),
+ pc);
}
-
free(symb);
#endif /* RTE_BACKTRACE */
}