From patchwork Tue Oct 23 17:48:09 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: bugzilla@dpdk.org
X-Patchwork-Id: 47261
Return-Path:
X-Original-To: patchwork@dpdk.org
Delivered-To: patchwork@dpdk.org
Received: from [92.243.14.124] (localhost [127.0.0.1])
 by dpdk.org (Postfix) with ESMTP id B4BD21B3B5;
 Tue, 23 Oct 2018 19:48:09 +0200 (CEST)
Received: by dpdk.org (Postfix, from userid 33)
 id 23CB51B3B5; Tue, 23 Oct 2018 19:48:09 +0200 (CEST)
From: bugzilla@dpdk.org
To: dev@dpdk.org
Date: Tue, 23 Oct 2018 17:48:09 +0000
X-Bugzilla-Reason: AssignedTo
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: DPDK
X-Bugzilla-Component: core
X-Bugzilla-Version: 18.08
X-Bugzilla-Keywords:
X-Bugzilla-Severity: critical
X-Bugzilla-Who: yskoh@mellanox.com
X-Bugzilla-Status: CONFIRMED
X-Bugzilla-Resolution:
X-Bugzilla-Priority: Normal
X-Bugzilla-Assigned-To: dev@dpdk.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: bug_id short_desc product version rep_platform
 op_sys bug_status bug_severity priority component assigned_to reporter
 target_milestone
Message-ID:
X-Bugzilla-URL: http://bugs.dpdk.org/
Auto-Submitted: auto-generated
X-Auto-Response-Suppress: All
Subject: [dpdk-dev] [Bug 97] rte_memcpy() moves data incorrectly on Ubuntu
 18.04 on Intel Skylake
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: dev-bounces@dpdk.org
Sender: "dev"

https://bugs.dpdk.org/show_bug.cgi?id=97

            Bug ID: 97
           Summary: rte_memcpy() moves data incorrectly on Ubuntu 18.04 on
                    Intel Skylake
           Product: DPDK
           Version: 18.08
          Hardware: x86
                OS: Linux
            Status: CONFIRMED
          Severity: critical
          Priority: Normal
         Component: core
          Assignee: dev@dpdk.org
          Reporter: yskoh@mellanox.com
  Target Milestone: ---

Reported by:
https://mails.dpdk.org/archives/dev/2018-September/111522.html

We've recently encountered a weird issue with Ubuntu 18.04 on a Skylake
server. I can always reproduce the crash and I have narrowed it down. I
suspect it could be a GCC issue.

[1] How to reproduce
- ConnectX-4Lx/ConnectX-5 with mlx5 PMD in DPDK 18.02/18.05/18.08
- Ubuntu 18.04 on Intel Skylake server
- gcc (Ubuntu 7.3.0-16ubuntu3) 7.3.0
- Testpmd crashes when it starts to forward traffic. Easy to reproduce.
- Only happens on the Skylake server.

[2] Failure point
The attached patch gives some insight into why it crashes. The output of
the patch and of the GDB commands follows.

In summary, rte_memcpy() doesn't work as expected. In
__mempool_generic_put(), rte_memcpy() moves the array of objects to the
lcore cache. If I run memcmp() right after rte_memcpy(dst, src, n), the data
in dst differs from the data in src, and some of the data appears to be
shifted by a few bytes, as you can see below.

[GDB command]
$dst = 0x7ffff4e09ea8
$src = 0x7fffce3fb970
$n = 256
x/32gx 0x7ffff4e09ea8
x/32gx 0x7fffce3fb970
testpmd: /home/mlnxtest/dpdk/build/include/rte_mempool.h:1140: __mempool_generic_put: Assertion `0' failed.

Thread 4 "lcore-slave-1" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffce3ff700 (LWP 69913)]

(gdb) x/32gx 0x7ffff4e09ea8
0x7ffff4e09ea8: 0x00007fffaac38ec0      0x00007fffaac38500
0x7ffff4e09eb8: 0x00007fffaac37b40      0x00007fffaac37180
0x7ffff4e09ec8: 0x850000007fffaac3      0x7b4000007fffaac3
0x7ffff4e09ed8: 0x00007fffaac35440      0x00007fffaac34a80
0x7ffff4e09ee8: 0xaac3850000007fff      0xaac37b4000007fff
0x7ffff4e09ef8: 0x00007fffaac32d40      0x00007fffaac32380
0x7ffff4e09f08: 0x7fffaac385000000      0x7fffaac37b400000
0x7ffff4e09f18: 0x00007fffaac30640      0x00007fffaac2fc80
0x7ffff4e09f28: 0x00007fffaac2f2c0      0x00007fffaac2e900
0x7ffff4e09f38: 0x00007fffaac2df40      0x00007fffaac2d580
0x7ffff4e09f48: 0x00007fffaac2cbc0      0x00007fffaac2c200
0x7ffff4e09f58: 0x00007fffaac2b840      0x00007fffaac2ae80
0x7ffff4e09f68: 0x00007fffaac2a4c0      0x00007fffaac29b00
0x7ffff4e09f78: 0x00007fffaac29140      0x00007fffaac28780
0x7ffff4e09f88: 0x00007fffaac27dc0      0x00007fffaac27400
0x7ffff4e09f98: 0x00007fffaac26a40      0x00007fffaac26080

(gdb) x/32gx 0x7fffce3fb970
0x7fffce3fb970: 0x00007fffaac38ec0      0x00007fffaac38500
0x7fffce3fb980: 0x00007fffaac37b40      0x00007fffaac37180
0x7fffce3fb990: 0x00007fffaac367c0      0x00007fffaac35e00
0x7fffce3fb9a0: 0x00007fffaac35440      0x00007fffaac34a80
0x7fffce3fb9b0: 0x00007fffaac340c0      0x00007fffaac33700
0x7fffce3fb9c0: 0x00007fffaac32d40      0x00007fffaac32380
0x7fffce3fb9d0: 0x00007fffaac319c0      0x00007fffaac31000
0x7fffce3fb9e0: 0x00007fffaac30640      0x00007fffaac2fc80
0x7fffce3fb9f0: 0x00007fffaac2f2c0      0x00007fffaac2e900
0x7fffce3fba00: 0x00007fffaac2df40      0x00007fffaac2d580
0x7fffce3fba10: 0x00007fffaac2cbc0      0x00007fffaac2c200
0x7fffce3fba20: 0x00007fffaac2b840      0x00007fffaac2ae80
0x7fffce3fba30: 0x00007fffaac2a4c0      0x00007fffaac29b00
0x7fffce3fba40: 0x00007fffaac29140      0x00007fffaac28780
0x7fffce3fba50: 0x00007fffaac27dc0      0x00007fffaac27400
0x7fffce3fba60: 0x00007fffaac26a40      0x00007fffaac26080

AFAIK, AVX512F support is disabled by default in DPDK as it is still
experimental (CONFIG_RTE_ENABLE_AVX512=n).
But with GCC optimization, the AVX2 version of rte_memcpy() seems to be
compiled with 512-bit instructions. If I disable this by adding
EXTRA_CFLAGS="-mno-avx512f", it works fine and doesn't crash.

Do you have any idea regarding this issue, or are you already aware of it?

Thanks,
Yongseok

$ git diff
diff --git a/config/common_base b/config/common_base
index ad03cf433..f512b5a88 100644
--- a/config/common_base
+++ b/config/common_base
@@ -275,8 +275,8 @@ CONFIG_RTE_LIBRTE_MLX4_TX_MP_CACHE=8
 #
 # Compile burst-oriented Mellanox ConnectX-4 & ConnectX-5 (MLX5) PMD
 #
-CONFIG_RTE_LIBRTE_MLX5_PMD=n
-CONFIG_RTE_LIBRTE_MLX5_DEBUG=n
+CONFIG_RTE_LIBRTE_MLX5_PMD=y
+CONFIG_RTE_LIBRTE_MLX5_DEBUG=y
 CONFIG_RTE_LIBRTE_MLX5_DLOPEN_DEPS=n
 CONFIG_RTE_LIBRTE_MLX5_TX_MP_CACHE=8

@@ -597,7 +597,7 @@ CONFIG_RTE_RING_USE_C11_MEM_MODEL=n
 #
 CONFIG_RTE_LIBRTE_MEMPOOL=y
 CONFIG_RTE_MEMPOOL_CACHE_MAX_SIZE=512
-CONFIG_RTE_LIBRTE_MEMPOOL_DEBUG=n
+CONFIG_RTE_LIBRTE_MEMPOOL_DEBUG=y

 #
 # Compile Mempool drivers
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index 8b1b7f7ed..9f48028d9 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -39,6 +39,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
@@ -1123,6 +1124,22 @@ __mempool_generic_put(struct rte_mempool *mp, void * const *obj_table,

 	/* Add elements back into the cache */
 	rte_memcpy(&cache_objs[0], obj_table, sizeof(void *) * n);
+	if(memcmp(&cache_objs[0], obj_table, sizeof(void *) * n)) {
+		printf("[GDB command] \n"
+		       "$dst = %p\n"
+		       "$src = %p\n"
+		       "$n = %ld\n"
+		       "x/%ldgx %p\n"
+		       "x/%ldgx %p\n",
+		       (void *)&cache_objs[0],
+		       (const void *)obj_table,
+		       sizeof(void *) * n,
+		       sizeof(void *) * n / 8, (void *)&cache_objs[0],
+		       sizeof(void *) * n / 8, (const void *)obj_table
+		       );
+		assert(0);
+	}
+
 	cache->len += n;

 	if (cache->len >= cache->flushthresh) {
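
For anyone who wants to try the same check outside of testpmd, the
copy-then-compare step from the debug patch above can be sketched as a small
standalone C fragment. This is only a sketch under assumptions: plain memcpy()
stands in for rte_memcpy() so that it builds without DPDK, and the helper
names first_mismatch() and copy_and_check() are hypothetical, not DPDK APIs.
To exercise the real code path, the memcpy() call would have to be replaced
with rte_memcpy() and built with the same compiler flags as the failing
binary.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of DPDK): return the offset of the first
 * byte at which dst and src differ, or -1 if all n bytes are identical.
 */
static long first_mismatch(const void *dst, const void *src, size_t n)
{
	const unsigned char *d = dst;
	const unsigned char *s = src;
	size_t i;

	for (i = 0; i < n; i++) {
		if (d[i] != s[i])
			return (long)i;
	}
	return -1;
}

/*
 * Hypothetical helper: copy n object pointers into the cache array and
 * verify the result with memcmp(), as the debug patch does. Returns -1
 * when the copy is intact, otherwise the byte offset of the first
 * difference.
 */
static long copy_and_check(void **cache_objs, void * const *obj_table,
			   unsigned int n)
{
	size_t len = sizeof(void *) * n;

	/* Assumption: memcpy() is a stand-in for rte_memcpy() here. */
	memcpy(cache_objs, obj_table, len);

	if (memcmp(cache_objs, obj_table, len) != 0)
		return first_mismatch(cache_objs, obj_table, len);

	return -1;
}
```

Returning the byte offset instead of calling assert(0) makes it possible to
see where the corruption starts without attaching GDB.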