Message ID | 20200104013341.19809-1-stephen@networkplumber.org (mailing list archive) |
---|---|
Headers |
From: Stephen Hemminger <stephen@networkplumber.org> To: dev@dpdk.org Cc: Stephen Hemminger <stephen@networkplumber.org> Date: Fri, 3 Jan 2020 17:33:27 -0800 Message-Id: <20200104013341.19809-1-stephen@networkplumber.org> Subject: [dpdk-dev] [PATCH 00/14] cleanup resources on shutdown List-Id: DPDK patches and discussions <dev.dpdk.org> List-Archive: <http://mails.dpdk.org/archives/dev/> |
Series |
cleanup resources on shutdown
Message
Stephen Hemminger
Jan. 4, 2020, 1:33 a.m. UTC
Recently started using valgrind with DPDK, and the results are not clean.

The DPDK has a function that applications can use to tell it to cleanup resources on shutdown (rte_eal_cleanup). But the current coverage of that API is spotty. Many internal parts of DPDK leave files and allocated memory behind.

This patch set is a start at getting the sub-parts of DPDK to cleanup after themselves. These are the easier ones, the harder and more critical ones are in the drivers and the memory subsystem.

There are no visible API or ABI changes here.

Stephen Hemminger (14):
  eal: log: close on cleanup
  eal: log: free dynamic state on cleanup
  eal: alarm: close timerfd on eal cleanup
  eal: cleanup threads
  eal: intr: cleanup resources
  eal: mp: end the multiprocess thread during cleanup
  eal: interrupts close epoll fd on shutdown
  eal: vfio: cleanup the mp sync handle
  eal: close mem config on cleanup
  tap: close netlink socket on device close
  eal: cleanup plugins data
  ethdev: raise priority of old driver warning
  eal: hotplug: cleanup multiprocess resources
  eal: malloc: cleanup mp resources

 drivers/net/tap/rte_eth_tap.c | 7 ++++-
 lib/librte_eal/common/eal_common_log.c | 30 +++++++++++++++++-
 lib/librte_eal/common/eal_common_options.c | 12 +++++++
 lib/librte_eal/common/eal_common_proc.c | 17 +++++++---
 lib/librte_eal/common/eal_options.h | 1 +
 lib/librte_eal/common/eal_private.h | 30 ++++++++++++++++++
 lib/librte_eal/common/hotplug_mp.c | 5 +++
 lib/librte_eal/common/hotplug_mp.h | 6 ++++
 lib/librte_eal/common/malloc_heap.c | 6 ++++
 lib/librte_eal/common/malloc_heap.h | 3 ++
 lib/librte_eal/common/malloc_mp.c | 12 +++++++
 lib/librte_eal/common/malloc_mp.h | 3 ++
 lib/librte_eal/linux/eal/eal.c | 28 +++++++++++++++++
 lib/librte_eal/linux/eal/eal_alarm.c | 11 +++++++
 lib/librte_eal/linux/eal/eal_interrupts.c | 35 ++++++++++++++++++---
 lib/librte_eal/linux/eal/eal_log.c | 14 +++++++++
 lib/librte_eal/linux/eal/eal_vfio.h | 1 +
 lib/librte_eal/linux/eal/eal_vfio_mp_sync.c | 8 +++++
 lib/librte_ethdev/rte_ethdev.c | 2 +-
 19 files changed, 218 insertions(+), 13 deletions(-)
Comments
Hello Stephen,

On Sat, Jan 4, 2020 at 2:34 AM Stephen Hemminger <stephen@networkplumber.org> wrote:
> Recently started using valgrind with DPDK, and the results
> are not clean.
> [...]
> There are no visible API or ABI changes here.

Could you share what you did to run a dpdk application with valgrind?

I tried with testpmd and a 3.15 valgrind (fc30), but I get an init failure on the cpu flags.

$ LD_LIBRARY_PATH=/home/dmarchan/builds/build-gcc-shared/install/usr/local/lib64 valgrind /home/dmarchan/builds/build-gcc-shared/install/usr/local/bin/dpdk-testpmd -c 3 --no-huge -m 20 -d librte_mempool_ring.so -d librte_pmd_null.so -w 0:0.0 --vdev net_null1 --vdev net_null2 -- --no-mlockall --total-num-mbufs=2048 -ia
==10258== Memcheck, a memory error detector
==10258== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==10258== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==10258== Command: /home/dmarchan/builds/build-gcc-shared/install/usr/local/bin/dpdk-testpmd -c 3 --no-huge -m 20 -d librte_mempool_ring.so -d librte_pmd_null.so -w 0:0.0 --vdev net_null1 --vdev net_null2 -- --no-mlockall --total-num-mbufs=2048 -ia
==10258==
ERROR: This system does not support "RDSEED".
Please check that RTE_MACHINE is set correctly.
EAL: FATAL: unsupported cpu type.
EAL: unsupported cpu type.
EAL: Error - exiting with code: 1
  Cause: Cannot init EAL: Operation not supported
==10258==
==10258== HEAP SUMMARY:
==10258==     in use at exit: 1,388 bytes in 49 blocks
==10258==   total heap usage: 97 allocs, 48 frees, 89,426 bytes allocated
==10258==
==10258== LEAK SUMMARY:
==10258==    definitely lost: 0 bytes in 0 blocks
==10258==    indirectly lost: 0 bytes in 0 blocks
==10258==      possibly lost: 0 bytes in 0 blocks
==10258==    still reachable: 1,388 bytes in 49 blocks
==10258==         suppressed: 0 bytes in 0 blocks
==10258== Rerun with --leak-check=full to see details of leaked memory
==10258==
==10258== For lists of detected and suppressed errors, rerun with: -s
==10258== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Thanks.
--
David Marchand
On Wed, 5 Feb 2020 10:32:49 +0100 David Marchand <david.marchand@redhat.com> wrote:
> Could you share what you did to run a dpdk application with valgrind?
>
> I tried with testpmd and a 3.15 valgrind (fc30), but I get an init
> failure on the cpu flags.
> [...]
> ERROR: This system does not support "RDSEED".
> Please check that RTE_MACHINE is set correctly.
> EAL: FATAL: unsupported cpu type.
> [...]

I am testing with valgrind on ARM.
It should be possible on x86 but you need to dial down the RTE_MACHINE choice to something valgrind understands.
On Wed, Feb 5, 2020 at 1:07 PM Stephen Hemminger <stephen@networkplumber.org> wrote:
> [...]
> I am testing with valgrind on ARM.
> It should be possible on x86 but you need to dial down the RTE_MACHINE
> choice to something valgrind understands.

Ok, so no black magic in valgrind :-)
Yeah I managed to run with the x86-default target we have in test-meson-builds.sh.

Thanks.
--
David Marchand
On Sat, Jan 4, 2020 at 2:34 AM Stephen Hemminger <stephen@networkplumber.org> wrote:
> [...]
> This patch set is a start at getting the sub-parts of
> DPDK to cleanup after themselves. These are the easier ones,
> the harder and more critical ones are in the drivers
> and the memory subsystem.

I am too short on time to check and integrate this big series in rc2, and from now it would be too risky to take in 20.02.
Can you respin it for 20.05 with FreeBSD fixes too?

Thanks.
On Thu, 6 Feb 2020 15:06:56 +0100 David Marchand <david.marchand@redhat.com> wrote:
> [...]
> I am too short on time to check and integrate this big series in rc2,
> and from now it would be too risky to take in 20.02.
> Can you respin it for 20.05 with FreeBSD fixes too?

OK, but if this kind of patch can't be reviewed then the DPDK process is still broken.

I don't see how FreeBSD matters here. It can be leaky but that is ok.

I split it out to get review, then you complain it is too big :-(
On Sat, Jan 4, 2020 at 2:34 AM Stephen Hemminger <stephen@networkplumber.org> wrote:
> Recently started using valgrind with DPDK, and the results
> are not clean.
> [...]
> There are no visible API or ABI changes here.

I was about to push the series (except patch 10), but I hit a crash when passing an invalid option to test-null.sh.

Reproduced with:

Core was generated by `/home/dmarchan/builds/x86_64-native-linux-gcc+shared+kmods/app/testpmd -c 0x3 --log-level='.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fd5231dba64 in pthread_cancel () from /usr/lib64/libpthread.so.0
Missing separate debuginfos, use: dnf debuginfo-install elfutils-libelf-0.178-7.fc30.x86_64 glibc-2.29-28.fc30.x86_64 jansson-2.12-2.fc30.x86_64 libgcc-9.2.1-1.fc30.x86_64 libpcap-1.9.1-1.fc30.x86_64 numactl-libs-2.0.12-2.fc30.x86_64 zlib-1.2.11-19.fc30.x86_64
(gdb) bt full
#0  0x00007fd5231dba64 in pthread_cancel () from /usr/lib64/libpthread.so.0
No symbol table info available.
#1  0x00007fd52320c586 in rte_eal_cleanup () at /home/dmarchan/dpdk/lib/librte_eal/linux/eal.c:1339
        i = 1
#2  0x00007fd523215f5e in rte_exit (exit_code=exit_code@entry=1, format=format@entry=0x47ada4 "Cannot init EAL: %s\n") at /home/dmarchan/dpdk/lib/librte_eal/linux/eal_debug.c:83
        ap = {{gp_offset = 24, fp_offset = 48, overflow_arg_area = 0x7ffecdf7aa70, reg_save_area = 0x7ffecdf7a9a0}}
#3  0x000000000043535b in main (argc=21, argv=0x7ffecdf7abc8) at /home/dmarchan/dpdk/app/test-pmd/testpmd.c:3647
        diag = -1
        port_id = <optimized out>
        count = <optimized out>
        ret = <optimized out>
(gdb) f 1
#1  0x00007fd52320c586 in rte_eal_cleanup () at /home/dmarchan/dpdk/lib/librte_eal/linux/eal.c:1339
1339            pthread_cancel(lcore_config[i].thread_id);
(gdb) p lcore_config[1].thread_id
$1 = 0

rte_eal_cleanup() is called from rte_exit() by testpmd.
But since rte_eal_init() failed at parsing, lcore_config[*].thread_id are invalid, and we crash on pthread_cancel.

I have no quick idea to fix this, series postponed to rc2.
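One possible way to avoid cancelling a thread that was never created is to record, per lcore, whether thread creation succeeded, and have cleanup skip the rest. The sketch below is self-contained and is not the fix adopted for the series; `struct demo_lcore`, its `started` flag, and the `demo_*` functions are hypothetical. A separate flag is used because POSIX does not define an "invalid" `pthread_t` value to compare against.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <unistd.h>

#define DEMO_MAX_LCORE 4

/* Hypothetical per-lcore record; the 'started' flag is the assumption here. */
struct demo_lcore {
	pthread_t thread_id;
	bool started;
};

static struct demo_lcore demo_lcores[DEMO_MAX_LCORE];

/* Worker that blocks forever; pause() is a POSIX cancellation point. */
static void *demo_worker(void *arg)
{
	(void)arg;
	for (;;)
		pause();
	return NULL;
}

/* Create the worker for slot i and mark it started only on success. */
static int demo_start(unsigned int i)
{
	assert(i < DEMO_MAX_LCORE);
	if (pthread_create(&demo_lcores[i].thread_id, NULL, demo_worker, NULL) != 0)
		return -1;
	demo_lcores[i].started = true;
	return 0;
}

/*
 * Cancel and join only the threads that were actually created.
 * Returns the number of threads stopped, so an early-exit path where
 * init failed before any thread existed simply returns 0.
 */
static int demo_cleanup(void)
{
	int stopped = 0;

	for (unsigned int i = 0; i < DEMO_MAX_LCORE; i++) {
		if (!demo_lcores[i].started)
			continue;
		pthread_cancel(demo_lcores[i].thread_id);
		pthread_join(demo_lcores[i].thread_id, NULL);
		demo_lcores[i].started = false;
		stopped++;
	}
	return stopped;
}
```

Calling `demo_cleanup()` before any `demo_start()` mirrors the failing path in the backtrace (cleanup reached after a failed init) and touches no thread handle.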