Subject: [dpdk-dev] Mellanox ConnectX-5 crashes and mbuf leak
From: Martin Weiser
To: Adrien Mazarguil, Nelio Laranjeiro
Cc: dev@dpdk.org
Date: Tue, 26 Sep 2017 11:23:45 +0200
Message-ID: <5d1f07c4-5933-806d-4d11-8fdfabc701d7@allegro-packets.com>

Hi,

we are currently testing the Mellanox ConnectX-5 100G NIC with DPDK 17.08 as
well as dpdk-next-net and are experiencing mbuf leaks as well as crashes (and
in some instances even kernel panics in a mlx5 module) under certain load
conditions.

We initially saw these issues only in our own DPDK-based application, and it
took some effort to reproduce them in one of the DPDK example applications.
With the attached patch to the load_balancer example, however, we can
reproduce the issues reliably.

The patch may look odd at first, so let me explain the changes:

* The sleep introduced in the worker threads simulates heavy processing, which
  causes the software rx rings to fill up under load. If the rings are large
  enough (I increased the ring sizes with the load_balancer command-line
  option, as you can see in the example call further down), the mbuf pool may
  run empty, and I believe this leads to a malfunction in the mlx5 driver. As
  soon as this happens the NIC stops forwarding traffic, probably because the
  driver can no longer allocate mbufs for the packets received by the NIC.
  Unfortunately, most of the mbufs then never return to the mbuf pool, so even
  after the traffic stops the pool remains almost empty and the application
  will not forward traffic even at a very low rate.

* The use of the mbuf reference count, in addition to the situation described
  above, is what makes the mlx5 driver crash almost immediately under load. In
  our application we rely on this feature to forward a packet quickly while
  still handing it to a worker thread for analysis, freeing it only once the
  analysis is done. Here I simulated this by incrementing the mbuf reference
  count immediately after receiving the mbuf from the driver and then calling
  rte_pktmbuf_free() in the worker thread, which should only decrement the
  reference count again and not actually free the mbuf (see the sketch after
  this list).
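For clarity, here is a rough sketch of the rx/worker pattern described above.
This is only an illustration, not the actual patch or our application code:
the worker ring and the two helper functions are made up for this example,
but rte_pktmbuf_refcnt_update(), rte_ring_sp_enqueue() and rte_pktmbuf_free()
are the regular DPDK calls involved:

    #include <unistd.h>
    #include <rte_mbuf.h>
    #include <rte_ring.h>

    /* Illustration only: these helpers and the worker ring are not part of
     * the attached patch. */

    /* I/O lcore: take an extra reference before handing the mbuf to a
     * worker, so the worker's rte_pktmbuf_free() only drops that reference
     * and the I/O path can still forward the packet. */
    static void
    hand_off_to_worker(struct rte_ring *worker_ring, struct rte_mbuf *m)
    {
            rte_pktmbuf_refcnt_update(m, 1);          /* refcnt 1 -> 2 */
            if (rte_ring_sp_enqueue(worker_ring, m) < 0)
                    rte_pktmbuf_refcnt_update(m, -1); /* ring full, drop our ref */
    }

    /* Worker lcore: the analysis is simulated by usleep() in the patch; the
     * free here should only bring the refcount back to 1 and must not
     * actually return the mbuf to the pool. */
    static void
    worker_process(struct rte_mbuf *m)
    {
            usleep(20);             /* simulate heavy per-packet processing */
            rte_pktmbuf_free(m);    /* refcnt 2 -> 1, no real free */
    }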
We executed the patched load_balancer application with the following command
line:

    ./build/load_balancer -l 3-7 -n 4 -- --rx "(0,0,3),(1,0,3)" --tx "(0,3),(1,3)" --w "4" --lpm "16.0.0.0/8=>0; 48.0.0.0/8=>1;" --pos-lb 29 --rsz "1024, 32768, 1024, 1024"

We then generated traffic with the t-rex traffic generator using the sfr test
case. On our machine the issues start to appear once the traffic exceeds
~6 Gbps, but this may vary depending on how powerful the test machine is (by
the way, we were able to reproduce this on several different types of
hardware).

A typical stack trace looks like this:

    Thread 1 "load_balancer" received signal SIGSEGV, Segmentation fault.
    0x0000000000614475 in _mm_storeu_si128 (__B=..., __P=<optimized out>) at /usr/lib/gcc/x86_64-linux-gnu/5/include/emmintrin.h:716
    716       __builtin_ia32_storedqu ((char *)__P, (__v16qi)__B);
    (gdb) bt
    #0  0x0000000000614475 in _mm_storeu_si128 (__B=..., __P=<optimized out>) at /usr/lib/gcc/x86_64-linux-gnu/5/include/emmintrin.h:716
    #1  rxq_cq_decompress_v (elts=0x7fff3732bef0, cq=0x7ffff7f99380, rxq=0x7fff3732a980) at /root/dpdk-next-net/drivers/net/mlx5/mlx5_rxtx_vec_sse.c:679
    #2  rxq_burst_v (pkts_n=<optimized out>, pkts=0xa7c7b0, rxq=0x7fff3732a980) at /root/dpdk-next-net/drivers/net/mlx5/mlx5_rxtx_vec_sse.c:1242
    #3  mlx5_rx_burst_vec (dpdk_rxq=0x7fff3732a980, pkts=<optimized out>, pkts_n=<optimized out>) at /root/dpdk-next-net/drivers/net/mlx5/mlx5_rxtx_vec_sse.c:1277
    #4  0x000000000043c11d in rte_eth_rx_burst (nb_pkts=3599, rx_pkts=0xa7c7b0, queue_id=0, port_id=0 '\000') at /root/dpdk-next-net//x86_64-native-linuxapp-gcc/include/rte_ethdev.h:2781
    #5  app_lcore_io_rx (lp=lp@entry=0xa7c700, n_workers=n_workers@entry=1, bsz_rd=bsz_rd@entry=144, bsz_wr=bsz_wr@entry=144, pos_lb=pos_lb@entry=29 '\035') at /root/dpdk-next-net/examples/load_balancer/runtime.c:198
    #6  0x0000000000447dc0 in app_lcore_main_loop_io () at /root/dpdk-next-net/examples/load_balancer/runtime.c:485
    #7  app_lcore_main_loop (arg=<optimized out>) at /root/dpdk-next-net/examples/load_balancer/runtime.c:669
    #8  0x0000000000495e8b in rte_eal_mp_remote_launch ()
    #9  0x0000000000441e0d in main (argc=<optimized out>, argv=<optimized out>) at /root/dpdk-next-net/examples/load_balancer/main.c:99

The crash does not always happen at exactly the same spot, but in our tests it
always occurred in the same function. In a few instances, instead of an
application crash, the system froze completely with what appeared to be a
kernel panic. The last output looked like a crash in the interrupt handler of
a mlx5 module, but unfortunately I cannot provide the exact output right now.

All tests were performed under Ubuntu 16.04 server with a 4.4.0-96-generic
kernel and the latest Mellanox OFED
(MLNX_OFED_LINUX-4.1-1.0.2.0-ubuntu16.04-x86_64).

Any help with this issue is greatly appreciated.
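One more note in case it helps with reproducing: the mbuf leak itself (as
opposed to the crash) is easiest to see by watching the rx mempool counters
while the traffic is running and after it has stopped. A minimal sketch,
assuming a pointer to the rx mbuf pool is at hand; rte_mempool_avail_count()
and rte_mempool_in_use_count() are the standard mempool accessors:

    #include <stdio.h>
    #include <unistd.h>
    #include <rte_mempool.h>

    /* Sketch only: print the pool usage once per second. After the traffic
     * has stopped, the in-use count should drop back towards zero; with the
     * problem described above it stays high and the pool remains almost
     * empty. */
    static void
    dump_pool_usage(const struct rte_mempool *mp)
    {
            for (;;) {
                    printf("%s: avail=%u in_use=%u\n", mp->name,
                           rte_mempool_avail_count(mp),
                           rte_mempool_in_use_count(mp));
                    sleep(1);
            }
    }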
Best regards,
Martin


diff --git a/config/common_base b/config/common_base
index 439f3cc..12b71e9 100644
--- a/config/common_base
+++ b/config/common_base
@@ -220,7 +220,7 @@ CONFIG_RTE_LIBRTE_MLX4_TX_MP_CACHE=8

 #
 # Compile burst-oriented Mellanox ConnectX-4 & ConnectX-5 (MLX5) PMD
 #
-CONFIG_RTE_LIBRTE_MLX5_PMD=n
+CONFIG_RTE_LIBRTE_MLX5_PMD=y
 CONFIG_RTE_LIBRTE_MLX5_DEBUG=n
 CONFIG_RTE_LIBRTE_MLX5_TX_MP_CACHE=8
diff --git a/examples/load_balancer/runtime.c b/examples/load_balancer/runtime.c
index e54b785..d448100 100644
--- a/examples/load_balancer/runtime.c
+++ b/examples/load_balancer/runtime.c
@@ -41,6 +41,7 @@
 #include
 #include
 #include
+#include <unistd.h>

 #include
 #include
@@ -133,6 +134,8 @@ app_lcore_io_rx_buffer_to_send (
        uint32_t pos;
        int ret;

+       rte_pktmbuf_refcnt_update(mbuf, 1);
+
        pos = lp->rx.mbuf_out[worker].n_mbufs;
        lp->rx.mbuf_out[worker].array[pos ++] = mbuf;
        if (likely(pos < bsz)) {
@@ -521,6 +524,8 @@ app_lcore_worker(
                continue;
 #endif

+               usleep(20);
+
                APP_WORKER_PREFETCH1(rte_pktmbuf_mtod(lp->mbuf_in.array[0], unsigned char *));
                APP_WORKER_PREFETCH0(lp->mbuf_in.array[1]);

@@ -530,6 +535,8 @@ app_lcore_worker(
                        uint32_t ipv4_dst, pos;
                        uint32_t port;

+                       rte_pktmbuf_free(lp->mbuf_in.array[j]);
+
                        if (likely(j < bsz_rd - 1)) {
                                APP_WORKER_PREFETCH1(rte_pktmbuf_mtod(lp->mbuf_in.array[j+1], unsigned char *));
                        }