From patchwork Thu Oct 21 08:51:32 2021
X-Patchwork-Submitter: Eli Britstein
X-Patchwork-Id: 102539
X-Patchwork-Delegate: thomas@monjalon.net
From: Eli Britstein
Cc: Matan Azrad, Asaf Penso, Slava Ovsiienko, Thomas Monjalon,
 Eli Britstein
Date: Thu, 21 Oct 2021 11:51:32 +0300
Message-ID: <20211021085132.12672-3-elibr@nvidia.com>
X-Mailer: git-send-email 2.18.1
In-Reply-To: <20211021085132.12672-1-elibr@nvidia.com>
References: <20210713064910.12793-1-elibr@nvidia.com>
 <20211021085132.12672-1-elibr@nvidia.com>
Subject: [dpdk-dev] [PATCH V2 3/3] eal/x86: avoid cast-align warning in x86
 memcpy functions
List-Id: DPDK patches and discussions
Sender: "dev"

Functions and macros in x86 rte_memcpy.h may cause cast-align warnings
when using the strict cast-align flag with a gcc that supports it:

gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
CFLAGS="-Wcast-align=strict" make V=1 -C examples/l2fwd clean static

For example:

In file included from main.c:24:
/dpdk/build/include/rte_memcpy.h: In function 'rte_mov16':
/dpdk/build/include/rte_memcpy.h:306:25: warning: cast increases required alignment of target type [-Wcast-align]
  306 |  xmm0 = _mm_loadu_si128((const __m128i *)src);
      |                         ^

As the code assumes correct alignment, first add a (void *) or
(const void *) cast to avoid the warnings.

Fixes: 9484092baad3 ("eal/x86: optimize memcpy for AVX512 platforms")
Cc: stable@dpdk.org

Signed-off-by: Eli Britstein
---
 lib/eal/x86/include/rte_memcpy.h | 80 ++++++++++++++++++--------------
 1 file changed, 44 insertions(+), 36 deletions(-)

diff --git a/lib/eal/x86/include/rte_memcpy.h b/lib/eal/x86/include/rte_memcpy.h
index 79f381dd9b..1b6c6e585f 100644
--- a/lib/eal/x86/include/rte_memcpy.h
+++ b/lib/eal/x86/include/rte_memcpy.h
@@ -303,8 +303,8 @@ rte_mov16(uint8_t *dst, const uint8_t *src)
 {
 	__m128i xmm0;
 
-	xmm0 = _mm_loadu_si128((const __m128i *)src);
-	_mm_storeu_si128((__m128i *)dst, xmm0);
+	xmm0 = _mm_loadu_si128((const __m128i *)(const void *)src);
+	_mm_storeu_si128((__m128i *)(void *)dst, xmm0);
 }
 
 /**
@@ -316,8 +316,8 @@ rte_mov32(uint8_t *dst, const uint8_t *src)
 {
 	__m256i ymm0;
 
-	ymm0 = _mm256_loadu_si256((const __m256i *)src);
-	_mm256_storeu_si256((__m256i *)dst, ymm0);
+	ymm0 = _mm256_loadu_si256((const __m256i *)(const void *)src);
+	_mm256_storeu_si256((__m256i *)(void *)dst, ymm0);
 }
 
 /**
@@ -354,16 +354,24 @@ rte_mov128blocks(uint8_t *dst, const uint8_t *src, size_t n)
 	__m256i ymm0, ymm1, ymm2, ymm3;
 
 	while (n >= 128) {
-		ymm0 = _mm256_loadu_si256((const __m256i *)((const uint8_t *)src + 0 * 32));
+		ymm0 = _mm256_loadu_si256((const __m256i *)(const void *)
+				((const uint8_t *)src + 0 * 32));
 		n -= 128;
-		ymm1 = _mm256_loadu_si256((const __m256i *)((const uint8_t *)src + 1 * 32));
-		ymm2 = _mm256_loadu_si256((const __m256i *)((const uint8_t *)src + 2 * 32));
-		ymm3 = _mm256_loadu_si256((const __m256i *)((const uint8_t *)src + 3 * 32));
+		ymm1 = _mm256_loadu_si256((const __m256i *)(const void *)
+				((const uint8_t *)src + 1 * 32));
+		ymm2 = _mm256_loadu_si256((const __m256i *)(const void *)
+				((const uint8_t *)src + 2 * 32));
+		ymm3 = _mm256_loadu_si256((const __m256i *)(const void *)
+				((const uint8_t *)src + 3 * 32));
 		src = (const uint8_t *)src + 128;
-		_mm256_storeu_si256((__m256i *)((uint8_t *)dst + 0 * 32), ymm0);
-		_mm256_storeu_si256((__m256i *)((uint8_t *)dst + 1 * 32), ymm1);
-		_mm256_storeu_si256((__m256i *)((uint8_t *)dst + 2 * 32), ymm2);
-		_mm256_storeu_si256((__m256i *)((uint8_t *)dst + 3 * 32), ymm3);
+		_mm256_storeu_si256((__m256i *)(void *)
+				((uint8_t *)dst + 0 * 32), ymm0);
+		_mm256_storeu_si256((__m256i *)(void *)
+				((uint8_t *)dst + 1 * 32), ymm1);
+		_mm256_storeu_si256((__m256i *)(void *)
+				((uint8_t *)dst + 2 * 32), ymm2);
+		_mm256_storeu_si256((__m256i *)(void *)
+				((uint8_t *)dst + 3 * 32), ymm3);
 		dst = (uint8_t *)dst + 128;
 	}
 }
@@ -496,8 +504,8 @@ rte_mov16(uint8_t *dst, const uint8_t *src)
 {
 	__m128i xmm0;
 
-	xmm0 = _mm_loadu_si128((const __m128i *)(const __m128i *)src);
-	_mm_storeu_si128((__m128i *)dst, xmm0);
+	xmm0 = _mm_loadu_si128((const __m128i *)(const void *)src);
+	_mm_storeu_si128((__m128i *)(void *)dst, xmm0);
 }
 
 /**
@@ -581,25 +589,25 @@ rte_mov256(uint8_t *dst, const uint8_t *src)
 __extension__ ({ \
 	size_t tmp; \
 	while (len >= 128 + 16 - offset) { \
-		xmm0 = _mm_loadu_si128((const __m128i *)((const uint8_t *)src - offset + 0 * 16)); \
+		xmm0 = _mm_loadu_si128((const __m128i *)(const void *)((const uint8_t *)src - offset + 0 * 16)); \
 		len -= 128; \
-		xmm1 = _mm_loadu_si128((const __m128i *)((const uint8_t *)src - offset + 1 * 16)); \
-		xmm2 = _mm_loadu_si128((const __m128i *)((const uint8_t *)src - offset + 2 * 16)); \
-		xmm3 = _mm_loadu_si128((const __m128i *)((const uint8_t *)src - offset + 3 * 16)); \
-		xmm4 = _mm_loadu_si128((const __m128i *)((const uint8_t *)src - offset + 4 * 16)); \
-		xmm5 = _mm_loadu_si128((const __m128i *)((const uint8_t *)src - offset + 5 * 16)); \
-		xmm6 = _mm_loadu_si128((const __m128i *)((const uint8_t *)src - offset + 6 * 16)); \
-		xmm7 = _mm_loadu_si128((const __m128i *)((const uint8_t *)src - offset + 7 * 16)); \
-		xmm8 = _mm_loadu_si128((const __m128i *)((const uint8_t *)src - offset + 8 * 16)); \
+		xmm1 = _mm_loadu_si128((const __m128i *)(const void *)((const uint8_t *)src - offset + 1 * 16)); \
+		xmm2 = _mm_loadu_si128((const __m128i *)(const void *)((const uint8_t *)src - offset + 2 * 16)); \
+		xmm3 = _mm_loadu_si128((const __m128i *)(const void *)((const uint8_t *)src - offset + 3 * 16)); \
+		xmm4 = _mm_loadu_si128((const __m128i *)(const void *)((const uint8_t *)src - offset + 4 * 16)); \
+		xmm5 = _mm_loadu_si128((const __m128i *)(const void *)((const uint8_t *)src - offset + 5 * 16)); \
+		xmm6 = _mm_loadu_si128((const __m128i *)(const void *)((const uint8_t *)src - offset + 6 * 16)); \
+		xmm7 = _mm_loadu_si128((const __m128i *)(const void *)((const uint8_t *)src - offset + 7 * 16)); \
+		xmm8 = _mm_loadu_si128((const __m128i *)(const void *)((const uint8_t *)src - offset + 8 * 16)); \
 		src = (const uint8_t *)src + 128; \
-		_mm_storeu_si128((__m128i *)((uint8_t *)dst + 0 * 16), _mm_alignr_epi8(xmm1, xmm0, offset)); \
-		_mm_storeu_si128((__m128i *)((uint8_t *)dst + 1 * 16), _mm_alignr_epi8(xmm2, xmm1, offset)); \
-		_mm_storeu_si128((__m128i *)((uint8_t *)dst + 2 * 16), _mm_alignr_epi8(xmm3, xmm2, offset)); \
-		_mm_storeu_si128((__m128i *)((uint8_t *)dst + 3 * 16), _mm_alignr_epi8(xmm4, xmm3, offset)); \
-		_mm_storeu_si128((__m128i *)((uint8_t *)dst + 4 * 16), _mm_alignr_epi8(xmm5, xmm4, offset)); \
-		_mm_storeu_si128((__m128i *)((uint8_t *)dst + 5 * 16), _mm_alignr_epi8(xmm6, xmm5, offset)); \
-		_mm_storeu_si128((__m128i *)((uint8_t *)dst + 6 * 16), _mm_alignr_epi8(xmm7, xmm6, offset)); \
-		_mm_storeu_si128((__m128i *)((uint8_t *)dst + 7 * 16), _mm_alignr_epi8(xmm8, xmm7, offset)); \
+		_mm_storeu_si128((__m128i *)(void *)((uint8_t *)dst + 0 * 16), _mm_alignr_epi8(xmm1, xmm0, offset)); \
+		_mm_storeu_si128((__m128i *)(void *)((uint8_t *)dst + 1 * 16), _mm_alignr_epi8(xmm2, xmm1, offset)); \
+		_mm_storeu_si128((__m128i *)(void *)((uint8_t *)dst + 2 * 16), _mm_alignr_epi8(xmm3, xmm2, offset)); \
+		_mm_storeu_si128((__m128i *)(void *)((uint8_t *)dst + 3 * 16), _mm_alignr_epi8(xmm4, xmm3, offset)); \
+		_mm_storeu_si128((__m128i *)(void *)((uint8_t *)dst + 4 * 16), _mm_alignr_epi8(xmm5, xmm4, offset)); \
+		_mm_storeu_si128((__m128i *)(void *)((uint8_t *)dst + 5 * 16), _mm_alignr_epi8(xmm6, xmm5, offset)); \
+		_mm_storeu_si128((__m128i *)(void *)((uint8_t *)dst + 6 * 16), _mm_alignr_epi8(xmm7, xmm6, offset)); \
+		_mm_storeu_si128((__m128i *)(void *)((uint8_t *)dst + 7 * 16), _mm_alignr_epi8(xmm8, xmm7, offset)); \
 		dst = (uint8_t *)dst + 128; \
 	} \
 	tmp = len; \
@@ -609,13 +617,13 @@ __extension__ ({
 		dst = (uint8_t *)dst + tmp; \
 	if (len >= 32 + 16 - offset) { \
 		while (len >= 32 + 16 - offset) { \
-			xmm0 = _mm_loadu_si128((const __m128i *)((const uint8_t *)src - offset + 0 * 16)); \
+			xmm0 = _mm_loadu_si128((const __m128i *)(const void *)((const uint8_t *)src - offset + 0 * 16)); \
 			len -= 32; \
-			xmm1 = _mm_loadu_si128((const __m128i *)((const uint8_t *)src - offset + 1 * 16)); \
-			xmm2 = _mm_loadu_si128((const __m128i *)((const uint8_t *)src - offset + 2 * 16)); \
+			xmm1 = _mm_loadu_si128((const __m128i *)(const void *)((const uint8_t *)src - offset + 1 * 16)); \
+			xmm2 = _mm_loadu_si128((const __m128i *)(const void *)((const uint8_t *)src - offset + 2 * 16)); \
 			src = (const uint8_t *)src + 32; \
-			_mm_storeu_si128((__m128i *)((uint8_t *)dst + 0 * 16), _mm_alignr_epi8(xmm1, xmm0, offset)); \
-			_mm_storeu_si128((__m128i *)((uint8_t *)dst + 1 * 16), _mm_alignr_epi8(xmm2, xmm1, offset)); \
+			_mm_storeu_si128((__m128i *)(void *)((uint8_t *)dst + 0 * 16), _mm_alignr_epi8(xmm1, xmm0, offset)); \
+			_mm_storeu_si128((__m128i *)(void *)((uint8_t *)dst + 1 * 16), _mm_alignr_epi8(xmm2, xmm1, offset)); \
 			dst = (uint8_t *)dst + 32; \
 		} \
 		tmp = len; \