From patchwork Wed Sep 6 02:31:00 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jieqiang Wang
X-Patchwork-Id: 131183
X-Patchwork-Delegate: david.marchand@redhat.com
Return-Path:
X-Original-To: patchwork@inbox.dpdk.org
Delivered-To: patchwork@inbox.dpdk.org
Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124])
 by inbox.dpdk.org (Postfix) with ESMTP id 529FD42504;
 Wed, 6 Sep 2023 04:31:23 +0200 (CEST)
Received: from mails.dpdk.org (localhost [127.0.0.1])
 by mails.dpdk.org (Postfix) with ESMTP id D7A80406B7;
 Wed, 6 Sep 2023 04:31:22 +0200 (CEST)
Received: from foss.arm.com (foss.arm.com [217.140.110.172])
 by mails.dpdk.org (Postfix) with ESMTP id 17A3A4029E;
 Wed, 6 Sep 2023 04:31:21 +0200 (CEST)
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])
 by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 63E0111FB;
 Tue, 5 Sep 2023 19:31:58 -0700 (PDT)
Received: from net-x86-dell-8268.shanghai.arm.com
 (net-x86-dell-8268.shanghai.arm.com [10.169.210.116])
 by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 903893F64C;
 Tue, 5 Sep 2023 19:31:16 -0700 (PDT)
From: Jieqiang Wang
To: Yipeng Wang, Sameh Gobriel, Bruce Richardson, Vladimir Medvedkin,
 Honnappa Nagarahalli, Dharmik Thakkar
Cc: dev@dpdk.org, nd@arm.com, Jieqiang Wang, stable@dpdk.org,
 Feifei Wang, Ruifeng Wang
Subject: [PATCH] hash: fix SSE comparison
Date: Wed, 6 Sep 2023 10:31:00 +0800
Message-Id: <20230906023100.3618303-1-jieqiang.wang@arm.com>
X-Mailer: git-send-email 2.25.1
MIME-Version: 1.0
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: DPDK patches and discussions
List-Unsubscribe:
List-Archive:
List-Post:
List-Help:
List-Subscribe:
Errors-To: dev-bounces@dpdk.org

_mm_cmpeq_epi16 returns 0xFFFF for each pair of 16-bit elements that
are equal.
In the original SSE2 implementation of compare_signatures,
_mm_movemask_epi8 builds the mask from the MSB of every 8-bit element,
whereas only the MSB of the lower byte of each 16-bit element should
contribute. For example, when all signatures match, the SSE2 path
returns 0xFFFF while the NEON and default scalar paths return 0x5555.

This bug currently has no negative effect because the caller only
examines the trailing zeros of each match mask, but we recommend this
fix to keep the SSE2 behavior consistent with the NEON and default
scalar code.

Fixes: c7d93df552c2 ("hash: use partial-key hashing")
Cc: yipeng1.wang@intel.com
Cc: stable@dpdk.org

Signed-off-by: Feifei Wang
Signed-off-by: Jieqiang Wang
Reviewed-by: Ruifeng Wang
---
 lib/hash/rte_cuckoo_hash.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/lib/hash/rte_cuckoo_hash.c b/lib/hash/rte_cuckoo_hash.c
index d92a903bb3..acaa8b74bd 100644
--- a/lib/hash/rte_cuckoo_hash.c
+++ b/lib/hash/rte_cuckoo_hash.c
@@ -1862,17 +1862,19 @@ compare_signatures(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches,
 	/* For match mask the first bit of every two bits indicates the match */
 	switch (sig_cmp_fn) {
 #if defined(__SSE2__)
-	case RTE_HASH_COMPARE_SSE:
+	case RTE_HASH_COMPARE_SSE: {
 		/* Compare all signatures in the bucket */
-		*prim_hash_matches = _mm_movemask_epi8(_mm_cmpeq_epi16(
-				_mm_load_si128(
+		__m128i shift_mask = _mm_set1_epi16(0x0080);
+		__m128i prim_cmp = _mm_cmpeq_epi16(_mm_load_si128(
 				(__m128i const *)prim_bkt->sig_current),
-				_mm_set1_epi16(sig)));
+				_mm_set1_epi16(sig));
+		*prim_hash_matches = _mm_movemask_epi8(_mm_and_si128(prim_cmp, shift_mask));
 		/* Compare all signatures in the bucket */
-		*sec_hash_matches = _mm_movemask_epi8(_mm_cmpeq_epi16(
-				_mm_load_si128(
+		__m128i sec_cmp = _mm_cmpeq_epi16(_mm_load_si128(
 				(__m128i const *)sec_bkt->sig_current),
-				_mm_set1_epi16(sig)));
+				_mm_set1_epi16(sig));
+		*sec_hash_matches = _mm_movemask_epi8(_mm_and_si128(sec_cmp, shift_mask));
+		}
 		break;
 #elif defined(__ARM_NEON)
 	case RTE_HASH_COMPARE_NEON: {
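
The difference between the two mask layouts can be reproduced outside
DPDK with a small standalone sketch. The helper names
(match_mask_unmasked, match_mask_masked, first_match_index) are
hypothetical and not part of rte_cuckoo_hash.c. The sketch assumes a
bucket of eight 16-bit signatures and uses an unaligned load so callers
need no aligned buffer. On targets without SSE2 it falls back to a
scalar model that yields the same masks.

```c
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics */

/* Hypothetical helper: pre-patch behaviour, no masking before movemask. */
static uint32_t
match_mask_unmasked(const uint16_t sigs[8], uint16_t sig)
{
	__m128i cmp = _mm_cmpeq_epi16(
			_mm_loadu_si128((const __m128i *)sigs),
			_mm_set1_epi16((short)sig));
	/* Both bytes of an equal lane are 0xFF: two mask bits per match. */
	return (uint32_t)_mm_movemask_epi8(cmp);
}

/* Hypothetical helper: post-patch behaviour, keep only the low-byte MSB. */
static uint32_t
match_mask_masked(const uint16_t sigs[8], uint16_t sig)
{
	__m128i keep_low_msb = _mm_set1_epi16(0x0080);
	__m128i cmp = _mm_cmpeq_epi16(
			_mm_loadu_si128((const __m128i *)sigs),
			_mm_set1_epi16((short)sig));
	/* Only bit 7 of each 16-bit lane survives: one mask bit per match. */
	return (uint32_t)_mm_movemask_epi8(_mm_and_si128(cmp, keep_low_msb));
}
#else
/* Scalar model of the same two layouts for targets without SSE2. */
static uint32_t
match_mask_unmasked(const uint16_t sigs[8], uint16_t sig)
{
	uint32_t mask = 0;
	for (unsigned int i = 0; i < 8; i++)
		if (sigs[i] == sig)
			mask |= 3u << (2 * i);
	return mask;
}

static uint32_t
match_mask_masked(const uint16_t sigs[8], uint16_t sig)
{
	uint32_t mask = 0;
	for (unsigned int i = 0; i < 8; i++)
		if (sigs[i] == sig)
			mask |= 1u << (2 * i);
	return mask;
}
#endif

/*
 * Hypothetical helper: entry index of the first match, derived from the
 * number of trailing zeros. Returns 8 when the mask is empty.
 */
static unsigned int
first_match_index(uint32_t mask)
{
	unsigned int tz = 0;

	if (mask == 0)
		return 8;
	while ((mask & 1u) == 0) {
		mask >>= 1;
		tz++;
	}
	return tz >> 1;
}
```

With eight signatures that all equal sig, match_mask_unmasked yields
0xFFFF and match_mask_masked yields 0x5555. When only entry 2 matches,
the masks are 0x30 and 0x10, and first_match_index returns 2 for both.
A caller that only looks at trailing zeros therefore sees no difference
between the two layouts.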