[v2] ci: catch coredumps

Message ID 20210125150539.27537-1-david.marchand@redhat.com (mailing list archive)
State Accepted, archived
Delegated to: David Marchand
Headers
Series [v2] ci: catch coredumps |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation success Compilation OK
ci/intel-Testing success Testing PASS
ci/iol-broadcom-Functional success Functional Testing PASS
ci/iol-broadcom-Performance success Performance Testing PASS
ci/iol-intel-Functional success Functional Testing PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/iol-mellanox-Functional success Functional Testing PASS
ci/iol-testing warning Testing issues

Commit Message

David Marchand Jan. 25, 2021, 3:05 p.m. UTC
  Parts of the unit tests code rely on forked/secondary processes
(expectedly) failing.
A crash in those situations could be missed so add a check on coredumps
presence after unit tests have run.
When unit tests fail, it can also help checking for coredumps as it
could give more insights on what happened.

In some situations (like explicit call to rte_panic), coredump generation
must be disabled to avoid false positives.

Signed-off-by: David Marchand <david.marchand@redhat.com>
---
Changelog since RFC v1
- removed RFC,
- pushed coredumps into gdb for in-situ analysis,
- gdb presence is used to enable the check. Travis config is left
  untouched for now,

---
 .ci/linux-build.sh          | 37 ++++++++++++++++++++++++++++++++++---
 .github/workflows/build.yml |  4 ++++
 app/test/test_debug.c       | 11 +++++++++--
 app/test/test_mbuf.c        |  9 ++++++++-
 4 files changed, 55 insertions(+), 6 deletions(-)
  

Comments

Luca Boccassi March 2, 2021, 4:16 p.m. UTC | #1
On Mon, 2021-01-25 at 16:05 +0100, David Marchand wrote:
> Parts of the unit tests code rely on forked/secondary processes
> (expectedly) failing.
> A crash in those situations could be missed so add a check on coredumps
> presence after unit tests have run.
> When unit tests fail, it can also help checking for coredumps as it
> could give more insights on what happened.
> 
> In some situations (like explicit call to rte_panic), coredump generation
> must be disabled to avoid false positives.
> 
> Signed-off-by: David Marchand <david.marchand@redhat.com>
> ---
> Changelog since RFC v1
> - removed RFC,
> - pushed coredumps into gdb for in-situ analysis,
> - gdb presence is used to enable the check. Travis config is left
>   untouched for now,
> 
> ---
>  .ci/linux-build.sh          | 37 ++++++++++++++++++++++++++++++++++---
>  .github/workflows/build.yml |  4 ++++
>  app/test/test_debug.c       | 11 +++++++++--
>  app/test/test_mbuf.c        |  9 ++++++++-
>  4 files changed, 55 insertions(+), 6 deletions(-)

Acked-by: Luca Boccassi <bluca@debian.org>

This fixes an issue for me, with running the debug_autotest under a
reproducible build CI environment.
  
Aaron Conole March 2, 2021, 9:17 p.m. UTC | #2
David Marchand <david.marchand@redhat.com> writes:

> Parts of the unit tests code rely on forked/secondary processes
> (expectedly) failing.
> A crash in those situations could be missed so add a check on coredumps
> presence after unit tests have run.
> When unit tests fail, it can also help checking for coredumps as it
> could give more insights on what happened.
>
> In some situations (like explicit call to rte_panic), coredump generation
> must be disabled to avoid false positives.
>
> Signed-off-by: David Marchand <david.marchand@redhat.com>
> ---

I had this in my backlog and accidentally marked it 'visited' - sorry
about that.

Acked-by: Aaron Conole <aconole@redhat.com>
  
David Marchand March 3, 2021, 9:03 a.m. UTC | #3
On Mon, Jan 25, 2021 at 4:06 PM David Marchand
<david.marchand@redhat.com> wrote:
>
> Parts of the unit tests code rely on forked/secondary processes
> (expectedly) failing.
> A crash in those situations could be missed so add a check on coredumps
> presence after unit tests have run.
> When unit tests fail, it can also help checking for coredumps as it
> could give more insights on what happened.
>
> In some situations (like explicit call to rte_panic), coredump generation
> must be disabled to avoid false positives.
>

As suggested by Luca, marking for backport.
Cc: stable@dpdk.org

> Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Luca Boccassi <bluca@debian.org>
Acked-by: Aaron Conole <aconole@redhat.com>
  

Patch

diff --git a/.ci/linux-build.sh b/.ci/linux-build.sh
index d2c821adf3..26c30a2301 100755
--- a/.ci/linux-build.sh
+++ b/.ci/linux-build.sh
@@ -4,7 +4,10 @@  on_error() {
     if [ $? = 0 ]; then
         exit
     fi
-    FILES_TO_PRINT="build/meson-logs/testlog.txt build/.ninja_log build/meson-logs/meson-log.txt"
+    FILES_TO_PRINT="build/meson-logs/testlog.txt"
+    FILES_TO_PRINT="$FILES_TO_PRINT build/.ninja_log"
+    FILES_TO_PRINT="$FILES_TO_PRINT build/meson-logs/meson-log.txt"
+    FILES_TO_PRINT="$FILES_TO_PRINT build/gdb.log"
 
     for pr_file in $FILES_TO_PRINT; do
         if [ -e "$pr_file" ]; then
@@ -30,6 +33,26 @@  install_libabigail() {
     rm ${version}.tar.gz
 }
 
+configure_coredump() {
+    # No point in configuring coredump without gdb
+    which gdb >/dev/null || return 0
+    ulimit -c unlimited
+    sudo sysctl -w kernel.core_pattern=/tmp/dpdk-core.%e.%p
+}
+
+catch_coredump() {
+    ls /tmp/dpdk-core.*.* 2>/dev/null || return 0
+    for core in /tmp/dpdk-core.*.*; do
+        binary=$(sudo readelf -n $core |grep $(pwd)/build/ 2>/dev/null |head -n1)
+        [ -x $binary ] || binary=
+        sudo gdb $binary -c $core \
+            -ex 'info threads' \
+            -ex 'thread apply all bt full' \
+            -ex 'quit'
+    done |tee -a build/gdb.log
+    return 1
+}
+
 if [ "$AARCH64" = "true" ]; then
     # convert the arch specifier
     OPTS="$OPTS --cross-file config/arm/arm64_armv8_linux_gcc"
@@ -57,7 +80,11 @@  meson build --werror $OPTS
 ninja -C build
 
 if [ "$AARCH64" != "true" ]; then
-    devtools/test-null.sh
+    failed=
+    configure_coredump
+    devtools/test-null.sh || failed="true"
+    catch_coredump
+    [ "$failed" != "true" ]
 fi
 
 if [ "$ABI_CHECKS" = "true" ]; then
@@ -102,5 +129,9 @@  if [ "$ABI_CHECKS" = "true" ]; then
 fi
 
 if [ "$RUN_TESTS" = "true" ]; then
-    sudo meson test -C build --suite fast-tests -t 3
+    failed=
+    configure_coredump
+    sudo meson test -C build --suite fast-tests -t 3 || failed="true"
+    catch_coredump
+    [ "$failed" != "true" ]
 fi
diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml
index a5b579adda..786525e5a3 100644
--- a/.github/workflows/build.yml
+++ b/.github/workflows/build.yml
@@ -104,6 +104,9 @@  jobs:
       if: env.AARCH64 == 'true'
       run: sudo apt install -y gcc-aarch64-linux-gnu libc6-dev-arm64-cross
         pkg-config-aarch64-linux-gnu
+    - name: Install test tools packages
+      if: env.AARCH64 != 'true' || env.RUN_TESTS == 'true'
+      run: sudo apt install -y gdb
     - name: Install doc generation packages
       if: env.BUILD_DOCS == 'true'
       run: sudo apt install -y doxygen graphviz python3-sphinx
@@ -124,3 +127,4 @@  jobs:
           build/meson-logs/testlog.txt
           build/.ninja_log
           build/meson-logs/meson-log.txt
+          build/gdb.log
diff --git a/app/test/test_debug.c b/app/test/test_debug.c
index 834a7386f5..23b24db177 100644
--- a/app/test/test_debug.c
+++ b/app/test/test_debug.c
@@ -4,6 +4,8 @@ 
 
 #include <stdio.h>
 #include <stdint.h>
+#include <sys/resource.h>
+#include <sys/time.h>
 #include <sys/wait.h>
 #include <unistd.h>
 
@@ -28,9 +30,14 @@  test_panic(void)
 
 	pid = fork();
 
-	if (pid == 0)
+	if (pid == 0) {
+		struct rlimit rl;
+
+		/* No need to generate a coredump when panicking. */
+		rl.rlim_cur = rl.rlim_max = 0;
+		setrlimit(RLIMIT_CORE, &rl);
 		rte_panic("Test Debug\n");
-	else if (pid < 0){
+	} else if (pid < 0) {
 		printf("Fork Failed\n");
 		return -1;
 	}
diff --git a/app/test/test_mbuf.c b/app/test/test_mbuf.c
index a40f7d4883..47a7b197d7 100644
--- a/app/test/test_mbuf.c
+++ b/app/test/test_mbuf.c
@@ -1174,6 +1174,8 @@  test_refcnt_mbuf(void)
 }
 
 #include <unistd.h>
+#include <sys/resource.h>
+#include <sys/time.h>
 #include <sys/wait.h>
 
 /* use fork() to test mbuf errors panic */
@@ -1186,9 +1188,14 @@  verify_mbuf_check_panics(struct rte_mbuf *buf)
 	pid = fork();
 
 	if (pid == 0) {
+		struct rlimit rl;
+
+		/* No need to generate a coredump when panicking. */
+		rl.rlim_cur = rl.rlim_max = 0;
+		setrlimit(RLIMIT_CORE, &rl);
 		rte_mbuf_sanity_check(buf, 1); /* should panic */
 		exit(0);  /* return normally if it doesn't panic */
-	} else if (pid < 0){
+	} else if (pid < 0) {
 		printf("Fork Failed\n");
 		return -1;
 	}