From patchwork Thu Dec 30 14:37:39 2021
X-Patchwork-Submitter: Dmitry Kozlyuk
X-Patchwork-Id: 105530
X-Patchwork-Delegate: david.marchand@redhat.com
From: Dmitry Kozlyuk
To: dev@dpdk.org
Cc: Anatoly Burakov, Thomas Monjalon
Subject: [RFC PATCH 1/6] doc: add hugepage mapping details
Date: Thu, 30 Dec 2021 16:37:39 +0200
Message-ID: <20211230143744.3550098-2-dkozlyuk@nvidia.com>
In-Reply-To: <20211230143744.3550098-1-dkozlyuk@nvidia.com>
References: <20211230143744.3550098-1-dkozlyuk@nvidia.com>

Hugepage mapping is a layer that EAL malloc builds upon.
There were implicit references to its details,
like mentions of segment file descriptors,
but no explicit description of its modes and operation.
Add an overview of the mechanics used on each supported OS.
Convert memory management subsections from list items
to level 4 headers: they are big and important enough.

Signed-off-by: Dmitry Kozlyuk
---
 .../prog_guide/env_abstraction_layer.rst      | 85 +++++++++++++++++--
 1 file changed, 76 insertions(+), 9 deletions(-)

diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
index 29f6fefc48..6cddb86467 100644
--- a/doc/guides/prog_guide/env_abstraction_layer.rst
+++ b/doc/guides/prog_guide/env_abstraction_layer.rst
@@ -86,7 +86,7 @@ See chapter
 Memory Mapping Discovery and Memory Reservation
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-The allocation of large contiguous physical memory is done using the hugetlbfs kernel filesystem.
+The allocation of large contiguous physical memory is done using hugepages.
 The EAL provides an API to reserve named memory zones in this contiguous memory.
 The physical address of the reserved memory for that memory zone is also returned to the user by the memory zone reservation API.
 
@@ -95,11 +95,12 @@ and legacy mode.
 Both modes are explained below.
 
 .. note::
 
-    Memory reservations done using the APIs provided by rte_malloc are also backed by pages from the hugetlbfs filesystem.
+    Memory reservations done using the APIs provided by rte_malloc are also backed by hugepages.
 
-+ Dynamic memory mode
+Dynamic Memory Mode
+^^^^^^^^^^^^^^^^^^^
 
-Currently, this mode is only supported on Linux.
+Currently, this mode is only supported on Linux and Windows.
 
 In this mode, usage of hugepages by DPDK application will grow and shrink based
 on application's requests. Any memory allocation through ``rte_malloc()``,
@@ -155,7 +156,8 @@ of memory that can be used by DPDK application.
 :ref:`Multi-process Support <Multi-process_Support>` for more details about
 DPDK IPC.
 
-+ Legacy memory mode
+Legacy Memory Mode
+^^^^^^^^^^^^^^^^^^
 
 This mode is enabled by specifying ``--legacy-mem`` command-line switch to the
 EAL. This switch will have no effect on FreeBSD as FreeBSD only supports
@@ -168,7 +170,8 @@ not allow acquiring or releasing hugepages from the system at runtime.
 
 If neither ``-m`` nor ``--socket-mem`` were specified, the entire available
 hugepage memory will be preallocated.
 
-+ Hugepage allocation matching
+Hugepage Allocation Matching
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 This behavior is enabled by specifying the ``--match-allocations`` command-line
 switch to the EAL. This switch is Linux-only and not supported with
@@ -182,7 +185,8 @@ matching can be used by these types of applications to satisfy both of these
 requirements. This can result in some increased memory usage which is very
 dependent on the memory allocation patterns of the application.
 
-+ 32-bit support
+32-bit Support
+^^^^^^^^^^^^^^
 
 Additional restrictions are present when running in 32-bit mode.
 In dynamic memory mode, by default maximum of 2 gigabytes of VA space will be preallocated,
@@ -192,7 +196,8 @@ used.
 In legacy mode, VA space will only be preallocated for segments that were
 requested (plus padding, to keep IOVA-contiguousness).
 
-+ Maximum amount of memory
+Maximum Amount of Memory
+^^^^^^^^^^^^^^^^^^^^^^^^
 
 All possible virtual memory space that can ever be used for hugepage mapping in
 a DPDK process is preallocated at startup, thereby placing an upper limit on how
@@ -222,7 +227,68 @@ Normally, these options do not need to be changed.
     can later be mapped into that preallocated VA space (if dynamic memory mode
     is enabled), and can optionally be mapped into it at startup.
 
-+ Segment file descriptors
+Hugepage Mapping
+^^^^^^^^^^^^^^^^
+
+Below is an overview of the methods used on each OS to obtain hugepages,
+explaining why certain limitations and options exist in EAL.
+See the user guide for a specific OS for configuration details.
+
+FreeBSD uses the ``contigmem`` kernel module
+to reserve a fixed number of hugepages at system start,
+which are mapped by EAL at initialization using a specific ``sysctl()``.
+
+Windows EAL allocates hugepages from the OS as needed using the Win32 API,
+so the available amount depends on the system load.
+It uses the ``virt2phys`` kernel module to obtain physical addresses,
+unless running in IOVA-as-VA mode (e.g. forced with ``--iova-mode=va``).
+
+Linux implements a variety of methods:
+
+* mapping each hugepage from its own file in hugetlbfs;
+* mapping multiple hugepages from a shared file in hugetlbfs;
+* anonymous mapping.
+
+Mapping hugepages from files in hugetlbfs is essential for multi-process,
+because secondary processes need to map the same hugepages.
+EAL creates files like ``rtemap_0``
+in directories specified with the ``--huge-dir`` option
+(or in the mount point for a specific hugepage size).
+The ``rtemap_`` prefix can be changed using ``--file-prefix``.
+This may be needed for running multiple primary processes
+that share a hugetlbfs mount point.
+Each backing file by default corresponds to one hugepage,
+and it is opened and locked for the entire time the hugepage is used.
+See the :ref:`segment-file-descriptors` section
+on how the number of open backing file descriptors can be reduced.
+
+Backing files may persist after the corresponding hugepage is freed
+and even after the application terminates,
+reducing the number of hugepages available to other processes.
+EAL removes existing files at startup
+and can remove newly created files before mapping them with ``--huge-unlink``.
+However, since ``--huge-unlink`` disables multi-process anyway,
+using anonymous mapping (``--in-memory``) is recommended instead.
+
+The :ref:`EAL memory allocator <malloc>` relies on hugepages being zero-filled.
+Hugepages are cleared by the kernel when a file in hugetlbfs, or a part of it,
+is mapped for the first time system-wide
+to prevent data leaks from previous users of the same hugepage.
+EAL ensures this behavior by removing existing backing files at startup
+and by recreating them before opening for mapping (as a precaution).
+
+Anonymous mapping does not allow a multi-process architecture,
+but it is free of filename conflicts and leftover files on hugetlbfs.
+If memfd_create(2) is supported both at build and run time,
+the DPDK memory manager can provide file descriptors for memory segments,
+which are required for Virtio with the vhost-user backend.
+This means open file descriptor issues may also affect this mode,
+with the same solution.
+
+.. _segment-file-descriptors:
+
+Segment File Descriptors
+^^^^^^^^^^^^^^^^^^^^^^^^
 
 On Linux, in most cases, EAL will store segment file descriptors in EAL. This
 can become a problem when using smaller page sizes due to underlying limitations
@@ -731,6 +797,7 @@ We expect only 50% of CPU spend on packet IO.
    echo 100000 > pkt_io/cpu.cfs_period_us
    echo 50000 > pkt_io/cpu.cfs_quota_us
 
+.. _malloc:
 
 Malloc
 ------
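As an aside, the anonymous-mapping path described in the new text can be
pictured with a minimal standalone sketch. It is not part of the patch; it
assumes the glibc memfd_create() wrapper (glibc >= 2.27) and a system with
2 MiB hugepages already reserved:

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>

#ifndef MFD_HUGETLB
#define MFD_HUGETLB 0x0004U /* from linux/memfd.h, for older headers */
#endif

int main(void)
{
	size_t sz = 2 * 1024 * 1024; /* one 2 MiB hugepage */

	/* The fd identifies the segment without any file in hugetlbfs;
	 * it is what could be handed to a vhost-user backend.
	 */
	int fd = memfd_create("rtemap_sketch", MFD_HUGETLB);
	if (fd < 0 || ftruncate(fd, sz) < 0) {
		perror("memfd_create/ftruncate");
		return EXIT_FAILURE;
	}
	void *va = mmap(NULL, sz, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (va == MAP_FAILED) {
		perror("mmap");
		return EXIT_FAILURE;
	}
	/* A hugepage mapped for the first time is zero-filled by the kernel. */
	printf("first byte: %d\n", ((unsigned char *)va)[0]);
	munmap(va, sz);
	close(fd);
	return EXIT_SUCCESS;
}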
From patchwork Thu Dec 30 14:37:40 2021
X-Patchwork-Submitter: Dmitry Kozlyuk
X-Patchwork-Id: 105531
X-Patchwork-Delegate: david.marchand@redhat.com
From: Dmitry Kozlyuk
To: dev@dpdk.org
Cc: Anatoly Burakov
Subject: [RFC PATCH 2/6] mem: add dirty malloc element support
Date: Thu, 30 Dec 2021 16:37:40 +0200
Message-ID: <20211230143744.3550098-3-dkozlyuk@nvidia.com>
In-Reply-To: <20211230143744.3550098-1-dkozlyuk@nvidia.com>
References: <20211230143744.3550098-1-dkozlyuk@nvidia.com>

The EAL malloc layer assumed that the content of all free elements
is filled with zeros ("clean"), as opposed to uninitialized ("dirty").
This assumption was ensured in two ways:

1. The EAL memalloc layer always returned clean memory.
2. Freed memory was cleared before being returned into the heap.

Clearing the memory can be as slow as around 14 GiB/s.
To avoid doing so, the memalloc layer is now allowed to return dirty memory.
Such segments are marked with RTE_MEMSEG_FLAG_DIRTY.
The allocator tracks elements that contain dirty memory
using the new flag in the element header.
When clean memory is requested via rte_zmalloc*()
and a suitable element is dirty, it is cleared on allocation.
When memory is deallocated, the freed element is joined
with adjacent free elements, and the dirty flag is updated:

    dirty + freed + dirty = dirty  =>  no need to clean
            freed + dirty = dirty      the freed memory

    clean + freed + clean = clean  =>  freed memory
    clean + freed         = clean      must be cleared
            freed + clean = clean
            freed         = clean

As a result, memory is either cleared on free, as before,
or it will be cleared on allocation if need be, but never twice.
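In other words, dirtiness propagates through joins as a logical OR, and
clearing happens at most once. A compact sketch of that rule follows; the
element layout and helper names here are hypothetical, the real logic lives
in join_elem() and malloc_elem_free() in the diff below:

#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for struct malloc_elem. */
struct elem {
	bool dirty; /* meaningful only while the element is free */
	size_t len;
	unsigned char *data;
};

/* Joining two free elements: dirtiness propagates as a logical OR. */
static void
join(struct elem *dst, const struct elem *src)
{
	dst->dirty |= src->dirty;
}

/* Freeing: the freed element itself counts as clean, so after joining
 * with its free neighbors the result is clean only if both neighbors
 * were clean (or absent). Only then is the memory cleared here; a dirty
 * result defers clearing to the next rte_zmalloc*() of this element.
 */
static void
free_elem(struct elem *e, struct elem *prev, struct elem *next)
{
	e->dirty = false;
	if (prev != NULL)
		join(e, prev);
	if (next != NULL)
		join(e, next);
	if (!e->dirty)
		memset(e->data, 0, e->len); /* cleared once, on free */
}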
Signed-off-by: Dmitry Kozlyuk
---
 lib/eal/common/malloc_elem.c | 22 +++++++++++++++++++---
 lib/eal/common/malloc_elem.h | 11 +++++++++--
 lib/eal/common/malloc_heap.c | 18 ++++++++++++------
 lib/eal/common/rte_malloc.c  | 21 ++++++++++++++-------
 lib/eal/include/rte_memory.h |  8 ++++++--
 5 files changed, 60 insertions(+), 20 deletions(-)

diff --git a/lib/eal/common/malloc_elem.c b/lib/eal/common/malloc_elem.c
index bdd20a162e..e04e0890fb 100644
--- a/lib/eal/common/malloc_elem.c
+++ b/lib/eal/common/malloc_elem.c
@@ -129,7 +129,7 @@ malloc_elem_find_max_iova_contig(struct malloc_elem *elem, size_t align)
 void
 malloc_elem_init(struct malloc_elem *elem, struct malloc_heap *heap,
 		struct rte_memseg_list *msl, size_t size,
-		struct malloc_elem *orig_elem, size_t orig_size)
+		struct malloc_elem *orig_elem, size_t orig_size, bool dirty)
 {
 	elem->heap = heap;
 	elem->msl = msl;
@@ -137,6 +137,7 @@ malloc_elem_init(struct malloc_elem *elem, struct malloc_heap *heap,
 	elem->next = NULL;
 	memset(&elem->free_list, 0, sizeof(elem->free_list));
 	elem->state = ELEM_FREE;
+	elem->dirty = dirty;
 	elem->size = size;
 	elem->pad = 0;
 	elem->orig_elem = orig_elem;
@@ -300,7 +301,7 @@ split_elem(struct malloc_elem *elem, struct malloc_elem *split_pt)
 	const size_t new_elem_size = elem->size - old_elem_size;
 
 	malloc_elem_init(split_pt, elem->heap, elem->msl, new_elem_size,
-			 elem->orig_elem, elem->orig_size);
+			 elem->orig_elem, elem->orig_size, elem->dirty);
 	split_pt->prev = elem;
 	split_pt->next = next_elem;
 	if (next_elem)
@@ -506,6 +507,7 @@ join_elem(struct malloc_elem *elem1, struct malloc_elem *elem2)
 	else
 		elem1->heap->last = elem1;
 	elem1->next = next;
+	elem1->dirty |= elem2->dirty;
 	if (elem1->pad) {
 		struct malloc_elem *inner = RTE_PTR_ADD(elem1, elem1->pad);
 		inner->size = elem1->size - elem1->pad;
@@ -579,6 +581,14 @@ malloc_elem_free(struct malloc_elem *elem)
 	ptr = RTE_PTR_ADD(elem, MALLOC_ELEM_HEADER_LEN);
 	data_len = elem->size - MALLOC_ELEM_OVERHEAD;
 
+	/*
+	 * Consider the element clean for the purposes of joining.
+	 * If both neighbors are clean or non-existent,
+	 * the joint element will be clean,
+	 * which means the memory should be cleared.
+	 * There is no need to clear the memory if the joint element is dirty.
+	 */
+	elem->dirty = false;
 	elem = malloc_elem_join_adjacent_free(elem);
 
 	malloc_elem_free_list_insert(elem);
@@ -588,8 +598,14 @@ malloc_elem_free(struct malloc_elem *elem)
 	/* decrease heap's count of allocated elements */
 	elem->heap->alloc_count--;
 
-	/* poison memory */
+#ifndef RTE_MALLOC_DEBUG
+	/* Normally clear the memory when needed. */
+	if (!elem->dirty)
+		memset(ptr, 0, data_len);
+#else
+	/* Always poison the memory in debug mode. */
 	memset(ptr, MALLOC_POISON, data_len);
+#endif
 
 	return elem;
 }
diff --git a/lib/eal/common/malloc_elem.h b/lib/eal/common/malloc_elem.h
index 15d8ba7af2..f2aa98821b 100644
--- a/lib/eal/common/malloc_elem.h
+++ b/lib/eal/common/malloc_elem.h
@@ -27,7 +27,13 @@ struct malloc_elem {
 	LIST_ENTRY(malloc_elem) free_list;
 	/**< list of free elements in heap */
 	struct rte_memseg_list *msl;
-	volatile enum elem_state state;
+	/** Element state, @c dirty and @c pad validity depends on it. */
+	/* An extra bit is needed to represent enum elem_state as signed int. */
+	enum elem_state state : 3;
+	/** If state == ELEM_FREE: the memory is not filled with zeroes. */
+	uint32_t dirty : 1;
+	/** Reserved for future use. */
+	uint32_t reserved : 28;
 	uint32_t pad;
 	size_t size;
 	struct malloc_elem *orig_elem;
@@ -320,7 +326,8 @@ malloc_elem_init(struct malloc_elem *elem,
 		struct rte_memseg_list *msl,
 		size_t size,
 		struct malloc_elem *orig_elem,
-		size_t orig_size);
+		size_t orig_size,
+		bool dirty);
 
 void
 malloc_elem_insert(struct malloc_elem *elem);
diff --git a/lib/eal/common/malloc_heap.c b/lib/eal/common/malloc_heap.c
index 55aad2711b..24080fc473 100644
--- a/lib/eal/common/malloc_heap.c
+++ b/lib/eal/common/malloc_heap.c
@@ -93,11 +93,11 @@ malloc_socket_to_heap_id(unsigned int socket_id)
  */
 static struct malloc_elem *
 malloc_heap_add_memory(struct malloc_heap *heap, struct rte_memseg_list *msl,
-		void *start, size_t len)
+		void *start, size_t len, bool dirty)
 {
 	struct malloc_elem *elem = start;
 
-	malloc_elem_init(elem, heap, msl, len, elem, len);
+	malloc_elem_init(elem, heap, msl, len, elem, len, dirty);
 
 	malloc_elem_insert(elem);
 
@@ -135,7 +135,8 @@ malloc_add_seg(const struct rte_memseg_list *msl,
 
 	found_msl = &mcfg->memsegs[msl_idx];
 
-	malloc_heap_add_memory(heap, found_msl, ms->addr, len);
+	malloc_heap_add_memory(heap, found_msl, ms->addr, len,
+			ms->flags & RTE_MEMSEG_FLAG_DIRTY);
 
 	heap->total_size += len;
 
@@ -303,7 +304,8 @@ alloc_pages_on_heap(struct malloc_heap *heap, uint64_t pg_sz, size_t elt_size,
 	struct rte_memseg_list *msl;
 	struct malloc_elem *elem = NULL;
 	size_t alloc_sz;
-	int allocd_pages;
+	int allocd_pages, i;
+	bool dirty = false;
 	void *ret, *map_addr;
 
 	alloc_sz = (size_t)pg_sz * n_segs;
@@ -372,8 +374,12 @@ alloc_pages_on_heap(struct malloc_heap *heap, uint64_t pg_sz, size_t elt_size,
 		goto fail;
 	}
 
+	/* Element is dirty if it contains at least one dirty page. */
+	for (i = 0; i < allocd_pages; i++)
+		dirty |= ms[i]->flags & RTE_MEMSEG_FLAG_DIRTY;
+
 	/* add newly minted memsegs to malloc heap */
-	elem = malloc_heap_add_memory(heap, msl, map_addr, alloc_sz);
+	elem = malloc_heap_add_memory(heap, msl, map_addr, alloc_sz, dirty);
 
 	/* try once more, as now we have allocated new memory */
 	ret = find_suitable_element(heap, elt_size, flags, align, bound,
@@ -1260,7 +1266,7 @@ malloc_heap_add_external_memory(struct malloc_heap *heap,
 	memset(msl->base_va, 0, msl->len);
 
 	/* now, add newly minted memory to the malloc heap */
-	malloc_heap_add_memory(heap, msl, msl->base_va, msl->len);
+	malloc_heap_add_memory(heap, msl, msl->base_va, msl->len, false);
 
 	heap->total_size += msl->len;
 
diff --git a/lib/eal/common/rte_malloc.c b/lib/eal/common/rte_malloc.c
index d0bec26920..71a3f7ecb4 100644
--- a/lib/eal/common/rte_malloc.c
+++ b/lib/eal/common/rte_malloc.c
@@ -115,15 +115,22 @@ rte_zmalloc_socket(const char *type, size_t size, unsigned align, int socket)
 {
 	void *ptr = rte_malloc_socket(type, size, align, socket);
 
+	if (ptr != NULL) {
+		struct malloc_elem *elem = malloc_elem_from_data(ptr);
+
+		if (elem->dirty) {
+			memset(ptr, 0, size);
+		} else {
 #ifdef RTE_MALLOC_DEBUG
-	/*
-	 * If DEBUG is enabled, then freed memory is marked with poison
-	 * value and set to zero on allocation.
-	 * If DEBUG is not enabled then memory is already zeroed.
-	 */
-	if (ptr != NULL)
-		memset(ptr, 0, size);
+			/*
+			 * If DEBUG is enabled, then freed memory is marked
+			 * with a poison value and set to zero on allocation.
+			 * If DEBUG is disabled then memory is already zeroed.
+			 */
+			memset(ptr, 0, size);
 #endif
+		}
+	}
 	rte_eal_trace_mem_zmalloc(type, size, align, socket, ptr);
 	return ptr;
diff --git a/lib/eal/include/rte_memory.h b/lib/eal/include/rte_memory.h
index 6d018629ae..d76e7ba780 100644
--- a/lib/eal/include/rte_memory.h
+++ b/lib/eal/include/rte_memory.h
@@ -19,6 +19,7 @@
 extern "C" {
 #endif
 
+#include <rte_bitops.h>
 #include <rte_common.h>
 #include <rte_compat.h>
 #include <rte_config.h>
@@ -37,11 +38,14 @@ extern "C" {
 
 #define SOCKET_ID_ANY -1 /**< Any NUMA socket. */
 
+/** Prevent this segment from being freed back to the OS. */
+#define RTE_MEMSEG_FLAG_DO_NOT_FREE RTE_BIT32(0)
+/** This segment is not filled with zeros. */
+#define RTE_MEMSEG_FLAG_DIRTY RTE_BIT32(1)
+
 /**
  * Physical memory segment descriptor.
  */
-#define RTE_MEMSEG_FLAG_DO_NOT_FREE (1 << 0)
-/**< Prevent this segment from being freed back to the OS. */
 struct rte_memseg {
 	rte_iova_t iova; /**< Start IO address. */
 	RTE_STD_C11
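For a sense of how the new flag surfaces to applications: a minimal sketch
using the long-standing rte_memseg_walk() iterator (the counting and the
printout are illustrative, not part of the series):

#include <stdio.h>
#include <rte_common.h>
#include <rte_eal.h>
#include <rte_memory.h>

static int
count_dirty(const struct rte_memseg_list *msl __rte_unused,
		const struct rte_memseg *ms, void *arg)
{
	unsigned int *dirty = arg;

	if (ms->flags & RTE_MEMSEG_FLAG_DIRTY)
		(*dirty)++;
	return 0; /* continue walking */
}

int
main(int argc, char **argv)
{
	unsigned int dirty = 0;

	if (rte_eal_init(argc, argv) < 0)
		return 1;
	rte_memseg_walk(count_dirty, &dirty);
	printf("%u dirty memory segments\n", dirty);
	rte_eal_cleanup();
	return 0;
}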
From patchwork Thu Dec 30 14:37:41 2021
X-Patchwork-Submitter: Dmitry Kozlyuk
X-Patchwork-Id: 105532
X-Patchwork-Delegate: david.marchand@redhat.com
From: Dmitry Kozlyuk
To: dev@dpdk.org
Cc: Anatoly Burakov
Subject: [RFC PATCH 3/6] eal: refactor --huge-unlink storage
Date: Thu, 30 Dec 2021 16:37:41 +0200
Message-ID: <20211230143744.3550098-4-dkozlyuk@nvidia.com>
In-Reply-To: <20211230143744.3550098-1-dkozlyuk@nvidia.com>
References: <20211230143744.3550098-1-dkozlyuk@nvidia.com>

In preparation for extending the --huge-unlink option semantics,
refactor how the option is stored in the internal configuration.
This makes future changes more isolated.
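The shape of the refactoring, side by side (a sketch distilled from the
diff below, not a literal excerpt; the *_before/*_after struct names are
only for this comparison):

#include <stdbool.h>

/* Before: a bare flag in struct internal_config. */
struct internal_config_before {
	unsigned hugepage_unlink; /**< true to unlink backing files */
};

/* After: related knobs (more are added later in the series)
 * grouped into a dedicated struct.
 */
struct hugepage_file_discipline {
	/** Unlink files before mapping them to leave no trace in hugetlbfs. */
	bool unlink_before_mapping;
};

struct internal_config_after {
	struct hugepage_file_discipline hugepage_file;
};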
Signed-off-by: Dmitry Kozlyuk
---
 lib/eal/common/eal_common_options.c | 9 +++++----
 lib/eal/common/eal_internal_cfg.h   | 8 +++++++-
 lib/eal/linux/eal_memalloc.c        | 7 ++++---
 lib/eal/linux/eal_memory.c          | 2 +-
 4 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/lib/eal/common/eal_common_options.c b/lib/eal/common/eal_common_options.c
index 1cfdd75f3b..7520ebda8e 100644
--- a/lib/eal/common/eal_common_options.c
+++ b/lib/eal/common/eal_common_options.c
@@ -1737,7 +1737,7 @@ eal_parse_common_option(int opt, const char *optarg,
 
 	/* long options */
 	case OPT_HUGE_UNLINK_NUM:
-		conf->hugepage_unlink = 1;
+		conf->hugepage_file.unlink_before_mapping = true;
 		break;
 
 	case OPT_NO_HUGE_NUM:
@@ -1766,7 +1766,7 @@ eal_parse_common_option(int opt, const char *optarg,
 		conf->in_memory = 1;
 		/* in-memory is a superset of noshconf and huge-unlink */
 		conf->no_shconf = 1;
-		conf->hugepage_unlink = 1;
+		conf->hugepage_file.unlink_before_mapping = true;
 		break;
 
 	case OPT_PROC_TYPE_NUM:
@@ -2050,7 +2050,8 @@ eal_check_common_options(struct internal_config *internal_cfg)
 			"be specified together with --"OPT_NO_HUGE"\n");
 		return -1;
 	}
-	if (internal_cfg->no_hugetlbfs && internal_cfg->hugepage_unlink &&
+	if (internal_cfg->no_hugetlbfs &&
+			internal_cfg->hugepage_file.unlink_before_mapping &&
 			!internal_cfg->in_memory) {
 		RTE_LOG(ERR, EAL, "Option --"OPT_HUGE_UNLINK" cannot "
 			"be specified together with --"OPT_NO_HUGE"\n");
@@ -2061,7 +2062,7 @@ eal_check_common_options(struct internal_config *internal_cfg)
 			" is only supported in non-legacy memory mode\n");
 	}
 	if (internal_cfg->single_file_segments &&
-			internal_cfg->hugepage_unlink &&
+			internal_cfg->hugepage_file.unlink_before_mapping &&
 			!internal_cfg->in_memory) {
 		RTE_LOG(ERR, EAL, "Option --"OPT_SINGLE_FILE_SEGMENTS" is "
 			"not compatible with --"OPT_HUGE_UNLINK"\n");
diff --git a/lib/eal/common/eal_internal_cfg.h b/lib/eal/common/eal_internal_cfg.h
index d6c0470eb8..b5e6942578 100644
--- a/lib/eal/common/eal_internal_cfg.h
+++ b/lib/eal/common/eal_internal_cfg.h
@@ -40,6 +40,12 @@ struct simd_bitwidth {
 	uint16_t bitwidth; /**< bitwidth value */
 };
 
+/** Hugepage backing files discipline. */
+struct hugepage_file_discipline {
+	/** Unlink files before mapping them to leave no trace in hugetlbfs. */
+	bool unlink_before_mapping;
+};
+
 /**
  * internal configuration
  */
@@ -48,7 +54,7 @@ struct internal_config {
 	volatile unsigned force_nchannel; /**< force number of channels */
 	volatile unsigned force_nrank;    /**< force number of ranks */
 	volatile unsigned no_hugetlbfs;   /**< true to disable hugetlbfs */
-	unsigned hugepage_unlink;         /**< true to unlink backing files */
+	struct hugepage_file_discipline hugepage_file;
 	volatile unsigned no_pci;         /**< true to disable PCI */
 	volatile unsigned no_hpet;        /**< true to disable HPET */
 	volatile unsigned vmware_tsc_map; /**< true to use VMware TSC mapping
diff --git a/lib/eal/linux/eal_memalloc.c b/lib/eal/linux/eal_memalloc.c
index 337f2bc739..abbe605e49 100644
--- a/lib/eal/linux/eal_memalloc.c
+++ b/lib/eal/linux/eal_memalloc.c
@@ -564,7 +564,7 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
 				__func__, strerror(errno));
 			goto resized;
 		}
-		if (internal_conf->hugepage_unlink &&
+		if (internal_conf->hugepage_file.unlink_before_mapping &&
 				!internal_conf->in_memory) {
 			if (unlink(path)) {
 				RTE_LOG(DEBUG, EAL, "%s(): unlink() failed: %s\n",
@@ -697,7 +697,7 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
 		close_hugefile(fd, path, list_idx);
 	} else {
 		/* only remove file if we can take out a write lock */
-		if (internal_conf->hugepage_unlink == 0 &&
+		if (!internal_conf->hugepage_file.unlink_before_mapping &&
 				internal_conf->in_memory == 0 &&
 				lock(fd, LOCK_EX) == 1)
 			unlink(path);
@@ -756,7 +756,8 @@ free_seg(struct rte_memseg *ms, struct hugepage_info *hi,
 	/* if we're able to take out a write lock, we're the last one
 	 * holding onto this page.
 	 */
-	if (!internal_conf->in_memory && !internal_conf->hugepage_unlink) {
+	if (!internal_conf->in_memory &&
+			!internal_conf->hugepage_file.unlink_before_mapping) {
 		ret = lock(fd, LOCK_EX);
 		if (ret >= 0) {
 			/* no one else is using this page */
diff --git a/lib/eal/linux/eal_memory.c b/lib/eal/linux/eal_memory.c
index 03a4f2dd2d..83eec078a4 100644
--- a/lib/eal/linux/eal_memory.c
+++ b/lib/eal/linux/eal_memory.c
@@ -1428,7 +1428,7 @@ eal_legacy_hugepage_init(void)
 	}
 
 	/* free the hugepage backing files */
-	if (internal_conf->hugepage_unlink &&
+	if (internal_conf->hugepage_file.unlink_before_mapping &&
 		unlink_hugepage_files(tmp_hp,
 			internal_conf->num_hugepage_sizes) < 0) {
 		RTE_LOG(ERR, EAL, "Unlinking hugepage files failed!\n");
 		goto fail;
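For context, a minimal sketch of how the refactored option is driven from
the command line. The EAL flag is as documented; hugepages configured on
the system are assumed, and the program itself is only illustrative:

#include <rte_eal.h>

int
main(void)
{
	/* --huge-unlink ends up setting
	 * internal_config.hugepage_file.unlink_before_mapping = true
	 * in the code above.
	 */
	char *eal_argv[] = {"app", "--huge-unlink", NULL};

	if (rte_eal_init(2, eal_argv) < 0)
		return 1;
	/* ... rte_malloc() users see no difference; backing files
	 * simply leave no trace in hugetlbfs ...
	 */
	rte_eal_cleanup();
	return 0;
}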
From patchwork Thu Dec 30 14:37:42 2021
X-Patchwork-Submitter: Dmitry Kozlyuk
X-Patchwork-Id: 105533
X-Patchwork-Delegate: david.marchand@redhat.com
From: Dmitry Kozlyuk
To: dev@dpdk.org
Cc: Anatoly Burakov
Subject: [RFC PATCH 4/6] eal/linux: allow hugepage file reuse
Date: Thu, 30 Dec 2021 16:37:42 +0200
Message-ID: <20211230143744.3550098-5-dkozlyuk@nvidia.com>
In-Reply-To: <20211230143744.3550098-1-dkozlyuk@nvidia.com>
References: <20211230143744.3550098-1-dkozlyuk@nvidia.com>

Linux EAL ensured that mapped hugepages are clean
by always mapping from newly created files:
existing hugepage backing files were always removed.
In this case, the kernel clears the page to prevent data leaks,
because the mapped memory may contain leftover data
from the previous process that was using this memory.
Clearing takes the bulk of the time spent in mmap(2),
increasing EAL initialization time.

Introduce a mode to keep existing files and reuse them
in order to speed up initial memory allocation in EAL.
Hugepages mapped from such files may contain data
left by the previous process that used this memory,
so RTE_MEMSEG_FLAG_DIRTY is set for their segments.
If multiple hugepages are mapped from the same file:

1. When fallocate(2) is used, all memory mapped from this file
   is considered dirty, because it is unknown
   which parts of the file are holes.
2. When ftruncate(3) is used, memory mapped from this file
   is considered dirty unless the file is extended
   to create a new mapping, which implies clean memory.
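The two rules can be condensed into one predicate. A sketch with
hypothetical helper and parameter names; the real logic lives in
resize_hugefile_in_filesystem() in the diff below:

#include <stdbool.h>
#include <stdint.h>

static bool
mapping_is_dirty(bool use_fallocate, bool file_reused,
		uint64_t new_size, uint64_t cur_size)
{
	if (use_fallocate)
		/* Holes in a reused file are unknown: assume the worst. */
		return file_reused;
	/* ftruncate(): extending the file guarantees fresh, zeroed pages;
	 * anything within the old size may carry leftover data.
	 */
	return new_size <= cur_size;
}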
Signed-off-by: Dmitry Kozlyuk
---
 lib/eal/common/eal_internal_cfg.h |   2 +
 lib/eal/linux/eal_hugepage_info.c |  59 +++++++----
 lib/eal/linux/eal_memalloc.c      | 157 ++++++++++++++++++------------
 3 files changed, 140 insertions(+), 78 deletions(-)

diff --git a/lib/eal/common/eal_internal_cfg.h b/lib/eal/common/eal_internal_cfg.h
index b5e6942578..3685aa7c52 100644
--- a/lib/eal/common/eal_internal_cfg.h
+++ b/lib/eal/common/eal_internal_cfg.h
@@ -44,6 +44,8 @@ struct simd_bitwidth {
 struct hugepage_file_discipline {
 	/** Unlink files before mapping them to leave no trace in hugetlbfs. */
 	bool unlink_before_mapping;
+	/** Reuse existing files, never delete or re-create them. */
+	bool keep_existing;
 };
 
 /**
diff --git a/lib/eal/linux/eal_hugepage_info.c b/lib/eal/linux/eal_hugepage_info.c
index 9fb0e968db..55debdedf0 100644
--- a/lib/eal/linux/eal_hugepage_info.c
+++ b/lib/eal/linux/eal_hugepage_info.c
@@ -84,7 +84,7 @@ static int get_hp_sysfs_value(const char *subdir, const char *file, unsigned long
 /* this function is only called from eal_hugepage_info_init which itself
  * is only called from a primary process */
 static uint32_t
-get_num_hugepages(const char *subdir, size_t sz)
+get_num_hugepages(const char *subdir, size_t sz, unsigned int reusable_pages)
 {
 	unsigned long resv_pages, num_pages, over_pages, surplus_pages;
 	const char *nr_hp_file = "free_hugepages";
@@ -116,7 +116,7 @@ get_num_hugepages(const char *subdir, size_t sz)
 	else
 		over_pages = 0;
 
-	if (num_pages == 0 && over_pages == 0)
+	if (num_pages == 0 && over_pages == 0 && reusable_pages == 0)
 		RTE_LOG(WARNING, EAL, "No available %zu kB hugepages reported\n",
 				sz >> 10);
 
@@ -124,6 +124,10 @@ get_num_hugepages(const char *subdir, size_t sz)
 	if (num_pages < over_pages) /* overflow */
 		num_pages = UINT32_MAX;
 
+	num_pages += reusable_pages;
+	if (num_pages < reusable_pages) /* overflow */
+		num_pages = UINT32_MAX;
+
 	/* we want to return a uint32_t and more than this looks suspicious
 	 * anyway ... */
 	if (num_pages > UINT32_MAX)
@@ -298,12 +302,12 @@ get_hugepage_dir(uint64_t hugepage_sz, char *hugedir, int len)
 }
 
 /*
- * Clear the hugepage directory of whatever hugepage files
- * there are. Checks if the file is locked (i.e.
- * if it's in use by another DPDK process).
+ * Search the hugepage directory for whatever hugepage files there are.
+ * Check if the file is in use by another DPDK process.
+ * If not, either remove it, or keep and count the page as reusable.
 */
 static int
-clear_hugedir(const char * hugedir)
+clear_hugedir(const char *hugedir, bool keep, unsigned int *reusable_pages)
 {
 	DIR *dir;
 	struct dirent *dirent;
@@ -346,8 +350,12 @@ clear_hugedir(const char * hugedir)
 		lck_result = flock(fd, LOCK_EX | LOCK_NB);
 
 		/* if lock succeeds, remove the file */
-		if (lck_result != -1)
-			unlinkat(dir_fd, dirent->d_name, 0);
+		if (lck_result != -1) {
+			if (keep)
+				(*reusable_pages)++;
+			else
+				unlinkat(dir_fd, dirent->d_name, 0);
+		}
 		close (fd);
 		dirent = readdir(dir);
 	}
@@ -375,7 +383,8 @@ compare_hpi(const void *a, const void *b)
 }
 
 static void
-calc_num_pages(struct hugepage_info *hpi, struct dirent *dirent)
+calc_num_pages(struct hugepage_info *hpi, struct dirent *dirent,
+		unsigned int reusable_pages)
 {
 	uint64_t total_pages = 0;
 	unsigned int i;
@@ -388,8 +397,15 @@ calc_num_pages(struct hugepage_info *hpi, struct dirent *dirent)
 	 * in one socket and sorting them later */
 	total_pages = 0;
 
-	/* we also don't want to do this for legacy init */
-	if (!internal_conf->legacy_mem)
+	/*
+	 * We also don't want to do this for legacy init.
+	 * When there are hugepage files to reuse it is unknown
+	 * what NUMA node the pages are on.
+	 * This could be determined by mapping,
+	 * but it is precisely what hugepage file reuse is trying to avoid.
+	 */
+	if (!internal_conf->legacy_mem && reusable_pages == 0)
 		for (i = 0; i < rte_socket_count(); i++) {
 			int socket = rte_socket_id_by_idx(i);
 			unsigned int num_pages =
@@ -405,7 +421,7 @@ calc_num_pages(struct hugepage_info *hpi, struct dirent *dirent)
 	 */
 	if (total_pages == 0) {
 		hpi->num_pages[0] = get_num_hugepages(dirent->d_name,
-				hpi->hugepage_sz);
+				hpi->hugepage_sz, reusable_pages);
 
 #ifndef RTE_ARCH_64
 	/* for 32-bit systems, limit number of hugepages to
@@ -421,6 +437,7 @@ hugepage_info_init(void)
 	const char dirent_start_text[] = "hugepages-";
 	const size_t dirent_start_len = sizeof(dirent_start_text) - 1;
 	unsigned int i, num_sizes = 0;
+	unsigned int reusable_pages;
 	DIR *dir;
 	struct dirent *dirent;
 	struct internal_config *internal_conf =
@@ -454,7 +471,7 @@ hugepage_info_init(void)
 			uint32_t num_pages;
 
 			num_pages = get_num_hugepages(dirent->d_name,
-					hpi->hugepage_sz);
+					hpi->hugepage_sz, 0);
 			if (num_pages > 0)
 				RTE_LOG(NOTICE, EAL,
 					"%" PRIu32 " hugepages of size "
@@ -473,7 +490,7 @@ hugepage_info_init(void)
 				"hugepages of size %" PRIu64 " bytes "
 				"will be allocated anonymously\n",
 				hpi->hugepage_sz);
-			calc_num_pages(hpi, dirent);
+			calc_num_pages(hpi, dirent, 0);
 			num_sizes++;
 		}
 #endif
@@ -489,11 +506,17 @@ hugepage_info_init(void)
 				"Failed to lock hugepage directory!\n");
 			break;
 		}
 
-		/* clear out the hugepages dir from unused pages */
-		if (clear_hugedir(hpi->hugedir) == -1)
-			break;
-		calc_num_pages(hpi, dirent);
+		/*
+		 * Check for existing hugepage files and either remove them
+		 * or count how many of them can be reused.
+		 */
+		reusable_pages = 0;
+		if (clear_hugedir(hpi->hugedir,
+				internal_conf->hugepage_file.keep_existing,
+				&reusable_pages) == -1)
+			break;
+
+		calc_num_pages(hpi, dirent, reusable_pages);
 
 		num_sizes++;
 	}
diff --git a/lib/eal/linux/eal_memalloc.c b/lib/eal/linux/eal_memalloc.c
index abbe605e49..cbd7c9cbee 100644
--- a/lib/eal/linux/eal_memalloc.c
+++ b/lib/eal/linux/eal_memalloc.c
@@ -287,12 +287,19 @@ get_seg_memfd(struct hugepage_info *hi __rte_unused,
 
 static int
 get_seg_fd(char *path, int buflen, struct hugepage_info *hi,
-		unsigned int list_idx, unsigned int seg_idx)
+		unsigned int list_idx, unsigned int seg_idx,
+		bool *dirty)
 {
 	int fd;
+	int *out_fd;
+	struct stat st;
+	int ret;
 	const struct internal_config *internal_conf =
 		eal_get_internal_configuration();
 
+	if (dirty != NULL)
+		*dirty = false;
+
 	/* for in-memory mode, we only make it here when we're sure we support
 	 * memfd, and this is a special case.
 	 */
@@ -300,66 +307,68 @@ get_seg_fd(char *path, int buflen, struct hugepage_info *hi,
 		return get_seg_memfd(hi, list_idx, seg_idx);
 
 	if (internal_conf->single_file_segments) {
-		/* create a hugepage file path */
+		out_fd = &fd_list[list_idx].memseg_list_fd;
 		eal_get_hugefile_path(path, buflen, hi->hugedir, list_idx);
-
-		fd = fd_list[list_idx].memseg_list_fd;
-
-		if (fd < 0) {
-			fd = open(path, O_CREAT | O_RDWR, 0600);
-			if (fd < 0) {
-				RTE_LOG(ERR, EAL, "%s(): open failed: %s\n",
-					__func__, strerror(errno));
-				return -1;
-			}
-			/* take out a read lock and keep it indefinitely */
-			if (lock(fd, LOCK_SH) < 0) {
-				RTE_LOG(ERR, EAL, "%s(): lock failed: %s\n",
-					__func__, strerror(errno));
-				close(fd);
-				return -1;
-			}
-			fd_list[list_idx].memseg_list_fd = fd;
-		}
 	} else {
-		/* create a hugepage file path */
+		out_fd = &fd_list[list_idx].fds[seg_idx];
 		eal_get_hugefile_path(path, buflen, hi->hugedir,
 				list_idx * RTE_MAX_MEMSEG_PER_LIST + seg_idx);
+	}
+	fd = *out_fd;
+	if (fd >= 0)
+		return fd;
 
-		fd = fd_list[list_idx].fds[seg_idx];
-
-		if (fd < 0) {
-			/* A primary process is the only one creating these
-			 * files. If there is a leftover that was not cleaned
-			 * by clear_hugedir(), we must *now* make sure to drop
-			 * the file or we will remap old stuff while the rest
-			 * of the code is built on the assumption that a new
-			 * page is clean.
-			 */
-			if (rte_eal_process_type() == RTE_PROC_PRIMARY &&
-					unlink(path) == -1 &&
-					errno != ENOENT) {
+	/*
+	 * The kernel clears a hugepage only when it is mapped
+	 * from a particular file for the first time.
+	 * If the file already exists, the mapped memory
+	 * will contain the old content of the hugepages.
+	 * If the memory manager assumes all mapped pages to be clean,
+	 * the file must be removed and created anew.
+	 * Otherwise the primary caller must be notified
+	 * that mapped pages will be dirty (secondary callers
+	 * receive the segment state from the primary one).
+	 * When multiple hugepages are mapped from the same file,
+	 * whether they will be dirty depends on the part that is mapped.
+	 *
+	 * There is no TOCTOU between stat() and unlink()/open()
+	 * because the hugepage directory is locked.
+	 */
+	if (!internal_conf->single_file_segments) {
+		ret = stat(path, &st);
+		if (ret < 0 && errno != ENOENT) {
+			RTE_LOG(DEBUG, EAL, "%s(): stat() for '%s' failed: %s\n",
+				__func__, path, strerror(errno));
+			return -1;
+		}
+		if (rte_eal_process_type() == RTE_PROC_PRIMARY && ret == 0) {
+			if (internal_conf->hugepage_file.keep_existing &&
+					dirty != NULL) {
+				*dirty = true;
+			/* coverity[toctou] */
+			} else if (unlink(path) < 0) {
 				RTE_LOG(DEBUG, EAL, "%s(): could not remove '%s': %s\n",
 					__func__, path, strerror(errno));
 				return -1;
 			}
-
-			fd = open(path, O_CREAT | O_RDWR, 0600);
-			if (fd < 0) {
-				RTE_LOG(DEBUG, EAL, "%s(): open failed: %s\n",
-					__func__, strerror(errno));
-				return -1;
-			}
-			/* take out a read lock */
-			if (lock(fd, LOCK_SH) < 0) {
-				RTE_LOG(ERR, EAL, "%s(): lock failed: %s\n",
-					__func__, strerror(errno));
-				close(fd);
-				return -1;
-			}
-			fd_list[list_idx].fds[seg_idx] = fd;
 		}
 	}
+
+	/* coverity[toctou] */
+	fd = open(path, O_CREAT | O_RDWR, 0600);
+	if (fd < 0) {
+		RTE_LOG(DEBUG, EAL, "%s(): open failed: %s\n",
+			__func__, strerror(errno));
+		return -1;
+	}
+	/* take out a read lock */
+	if (lock(fd, LOCK_SH) < 0) {
+		RTE_LOG(ERR, EAL, "%s(): lock failed: %s\n",
+			__func__, strerror(errno));
+		close(fd);
+		return -1;
+	}
+	*out_fd = fd;
 	return fd;
 }
 
@@ -385,8 +394,10 @@ resize_hugefile_in_memory(int fd, uint64_t fa_offset,
 
 static int
 resize_hugefile_in_filesystem(int fd, uint64_t fa_offset, uint64_t page_sz,
-		bool grow)
+		bool grow, bool *dirty)
 {
+	const struct internal_config *internal_conf =
+		eal_get_internal_configuration();
 	bool again = false;
 
 	do {
@@ -405,6 +416,8 @@ resize_hugefile_in_filesystem(int fd, uint64_t fa_offset, uint64_t page_sz,
 			uint64_t cur_size = get_file_size(fd);
 
 			/* fallocate isn't supported, fall back to ftruncate */
+			if (dirty != NULL)
+				*dirty = new_size <= cur_size;
 			if (new_size > cur_size &&
 					ftruncate(fd, new_size) < 0) {
 				RTE_LOG(DEBUG, EAL, "%s(): ftruncate() failed: %s\n",
@@ -447,8 +460,17 @@ resize_hugefile_in_filesystem(int fd, uint64_t fa_offset, uint64_t page_sz,
 						strerror(errno));
 					return -1;
 				}
-			} else
+			} else {
 				fallocate_supported = 1;
+				/*
+				 * It is unknown which portions of an existing
+				 * hugepage file were allocated previously,
+				 * so all pages within the file are considered
+				 * dirty, unless the file is a fresh one.
+				 */
+				if (dirty != NULL)
+					*dirty = internal_conf->hugepage_file.keep_existing;
+			}
 		}
 	} while (again);
 
@@ -475,7 +497,8 @@ close_hugefile(int fd, char *path, int list_idx)
 }
 
 static int
-resize_hugefile(int fd, uint64_t fa_offset, uint64_t page_sz, bool grow)
+resize_hugefile(int fd, uint64_t fa_offset, uint64_t page_sz, bool grow,
+		bool *dirty)
 {
 	/* in-memory mode is a special case, because we can be sure that
 	 * fallocate() is supported.
@@ -483,12 +506,15 @@ resize_hugefile(int fd, uint64_t fa_offset, uint64_t page_sz, bool grow)
 	const struct internal_config *internal_conf =
 		eal_get_internal_configuration();
 
-	if (internal_conf->in_memory)
+	if (internal_conf->in_memory) {
+		if (dirty != NULL)
+			*dirty = false;
 		return resize_hugefile_in_memory(fd, fa_offset, page_sz,
 				grow);
+	}
 
 	return resize_hugefile_in_filesystem(fd, fa_offset, page_sz,
-			grow);
+			grow, dirty);
 }
 
 static int
@@ -505,6 +531,7 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
 	char path[PATH_MAX];
 	int ret = 0;
 	int fd;
+	bool dirty;
 	size_t alloc_sz;
 	int flags;
 	void *new_addr;
@@ -534,6 +561,7 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
 		pagesz_flag = pagesz_flags(alloc_sz);
 		fd = -1;
+		dirty = false;
 		mmap_flags = in_memory_flags | pagesz_flag;
 
 		/* single-file segments codepath will never be active
@@ -544,7 +572,8 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
 		map_offset = 0;
 	} else {
 		/* takes out a read lock on segment or segment list */
-		fd = get_seg_fd(path, sizeof(path), hi, list_idx, seg_idx);
+		fd = get_seg_fd(path, sizeof(path), hi, list_idx, seg_idx,
+				&dirty);
 		if (fd < 0) {
 			RTE_LOG(ERR, EAL, "Couldn't get fd on hugepage file\n");
 			return -1;
@@ -552,7 +581,8 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
 
 		if (internal_conf->single_file_segments) {
 			map_offset = seg_idx * alloc_sz;
-			ret = resize_hugefile(fd, map_offset, alloc_sz, true);
+			ret = resize_hugefile(fd, map_offset, alloc_sz, true,
+					&dirty);
 			if (ret < 0)
 				goto resized;
 
@@ -662,6 +692,7 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
 	ms->nrank = rte_memory_get_nrank();
 	ms->iova = iova;
 	ms->socket_id = socket_id;
+	ms->flags = dirty ? RTE_MEMSEG_FLAG_DIRTY : 0;
 
 	return 0;
 
@@ -689,7 +720,7 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
 		return -1;
 
 	if (internal_conf->single_file_segments) {
-		resize_hugefile(fd, map_offset, alloc_sz, false);
+		resize_hugefile(fd, map_offset, alloc_sz, false, NULL);
 		/* ignore failure, can't make it any worse */
 
 		/* if refcount is at zero, close the file */
@@ -739,13 +770,13 @@ free_seg(struct rte_memseg *ms, struct hugepage_info *hi,
 	 * segment and thus drop the lock on original fd, but hugepage dir is
 	 * now locked so we can take out another one without races.
 	 */
-	fd = get_seg_fd(path, sizeof(path), hi, list_idx, seg_idx);
+	fd = get_seg_fd(path, sizeof(path), hi, list_idx, seg_idx, NULL);
 	if (fd < 0)
 		return -1;
 
 	if (internal_conf->single_file_segments) {
 		map_offset = seg_idx * ms->len;
-		if (resize_hugefile(fd, map_offset, ms->len, false))
+		if (resize_hugefile(fd, map_offset, ms->len, false, NULL))
 			return -1;
 
 		if (--(fd_list[list_idx].count) == 0)
@@ -1743,6 +1774,12 @@ eal_memalloc_init(void)
 			RTE_LOG(ERR, EAL, "Using anonymous memory is not supported\n");
 			return -1;
 		}
+		/* safety net, should be impossible to configure */
+		if (internal_conf->hugepage_file.unlink_before_mapping &&
+				internal_conf->hugepage_file.keep_existing) {
+			RTE_LOG(ERR, EAL, "Unable both to keep existing hugepage files and to unlink them.\n");
+			return -1;
+		}
 	}
 
 	/* initialize all of the fd lists */
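To show how the new flag surfaces to applications, here is a minimal sketch of counting dirty segments after EAL initialization. It assumes the RTE_MEMSEG_FLAG_DIRTY flag added by this series; the helper names are hypothetical and the code is not part of the patches.

#include <rte_common.h>
#include <rte_memory.h>

/* Callback for rte_memseg_walk(): count segments whose backing
 * hugepages were reused and therefore may hold stale data.
 */
static int
count_dirty_cb(const struct rte_memseg_list *msl __rte_unused,
		const struct rte_memseg *ms, void *arg)
{
	unsigned int *dirty = arg;

	/* RTE_MEMSEG_FLAG_DIRTY is assumed from this series. */
	if (ms->flags & RTE_MEMSEG_FLAG_DIRTY)
		(*dirty)++;
	return 0; /* 0 = continue walking the remaining segments */
}

/* Hypothetical helper: returns how many segments are dirty. */
static unsigned int
count_dirty_segments(void)
{
	unsigned int dirty = 0;

	rte_memseg_walk(count_dirty_cb, &dirty);
	return dirty;
}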
From patchwork Thu Dec 30 14:48:28 2021
X-Patchwork-Submitter: Dmitry Kozlyuk
X-Patchwork-Id: 105534
X-Patchwork-Delegate: david.marchand@redhat.com
From: Dmitry Kozlyuk
To: dev@dpdk.org
Cc: Anatoly Burakov
Subject: [RFC PATCH 5/6] eal: allow hugepage file reuse with --huge-unlink
Date: Thu, 30 Dec 2021 16:48:28 +0200
Message-ID: <20211230144828.3550807-1-dkozlyuk@nvidia.com>
In-Reply-To: <20211230143744.3550098-1-dkozlyuk@nvidia.com>
References: <20211230143744.3550098-1-dkozlyuk@nvidia.com>
MIME-Version: 1.0
Expose the Linux EAL ability to reuse existing hugepage files
via the --huge-unlink=never switch.
The default behavior is unchanged; it can also be requested
explicitly with --huge-unlink=existing for consistency.
The old --huge-unlink switch is kept as an alias
for --huge-unlink=always.

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
---
 doc/guides/linux_gsg/linux_eal_parameters.rst | 21 ++++++++--
 .../prog_guide/env_abstraction_layer.rst      |  9 +++++
 doc/guides/rel_notes/release_22_03.rst        |  7 ++++
 lib/eal/common/eal_common_options.c           | 39 +++++++++++++++++--
 4 files changed, 69 insertions(+), 7 deletions(-)

diff --git a/doc/guides/linux_gsg/linux_eal_parameters.rst b/doc/guides/linux_gsg/linux_eal_parameters.rst
index 74df2611b5..64cd73b497 100644
--- a/doc/guides/linux_gsg/linux_eal_parameters.rst
+++ b/doc/guides/linux_gsg/linux_eal_parameters.rst
@@ -84,10 +84,23 @@ Memory-related options
     Use specified hugetlbfs directory instead of autodetected ones. This can be
     a sub-directory within a hugetlbfs mountpoint.
 
-*   ``--huge-unlink``
-
-    Unlink hugepage files after creating them (implies no secondary process
-    support).
+*   ``--huge-unlink[=existing|always|never]``
+
+    No ``--huge-unlink`` option or ``--huge-unlink=existing`` is the default:
+    existing hugepage files are removed and re-created
+    to ensure the kernel clears the memory and prevents any data leaks.
+
+    With ``--huge-unlink`` (no value) or ``--huge-unlink=always``,
+    hugepage files are also removed after creating them,
+    so that the application leaves no files in hugetlbfs.
+    This mode implies no multi-process support.
+
+    When ``--huge-unlink=never`` is specified, existing hugepage files
+    are removed neither before nor after mapping them.
+    This makes restart faster by saving the time needed to clear memory
+    at initialization, but it may slow down zeroed allocations later.
+    Reused hugepages can contain data from previous processes that used them,
+    which may be a security concern.
 
 *   ``--match-allocations``
 
diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
index 6cddb86467..d8940f5e2e 100644
--- a/doc/guides/prog_guide/env_abstraction_layer.rst
+++ b/doc/guides/prog_guide/env_abstraction_layer.rst
@@ -277,6 +277,15 @@ to prevent data leaks from previous users of the same hugepage.
 EAL ensures this behavior by removing existing backing files at startup
 and by recreating them before opening for mapping (as a precaution).
 
+One exception is the ``--huge-unlink=never`` mode.
+It is used to speed up EAL initialization, usually on application restart.
+Clearing memory constitutes more than 95% of hugepage mapping time.
+EAL can save it by remapping existing backing files
+with all the data left in the mapped hugepages ("dirty" memory).
+Such segments are marked with ``RTE_MEMSEG_FLAG_DIRTY``.
+The memory allocator detects dirty segments and handles them accordingly,
+in particular, it clears memory requested with ``rte_zmalloc*()``.
+
 Anonymous mapping does not allow multi-process architecture,
 but it is free of filename conflicts and leftover files on hugetlbfs.
 If memfd_create(2) is supported both at build and run time,
diff --git a/doc/guides/rel_notes/release_22_03.rst b/doc/guides/rel_notes/release_22_03.rst
index 6d99d1eaa9..0b882362cf 100644
--- a/doc/guides/rel_notes/release_22_03.rst
+++ b/doc/guides/rel_notes/release_22_03.rst
@@ -55,6 +55,13 @@ New Features
      Also, make sure to start the actual text at the margin.
      =======================================================
 
+* **Added ability to reuse hugepages in Linux.**
+
+  It is possible to reuse files in hugetlbfs to speed up hugepage mapping,
+  which may be useful for fast restart and large allocations.
+  The new mode is activated with ``--huge-unlink=never``
+  and has security implications; refer to the user and programmer guides.
+
 
 Removed Items
 -------------
 
diff --git a/lib/eal/common/eal_common_options.c b/lib/eal/common/eal_common_options.c
index 7520ebda8e..905a7769bd 100644
--- a/lib/eal/common/eal_common_options.c
+++ b/lib/eal/common/eal_common_options.c
@@ -74,7 +74,7 @@ eal_long_options[] = {
 	{OPT_FILE_PREFIX,       1, NULL, OPT_FILE_PREFIX_NUM },
 	{OPT_HELP,              0, NULL, OPT_HELP_NUM        },
 	{OPT_HUGE_DIR,          1, NULL, OPT_HUGE_DIR_NUM    },
-	{OPT_HUGE_UNLINK,       0, NULL, OPT_HUGE_UNLINK_NUM },
+	{OPT_HUGE_UNLINK,       2, NULL, OPT_HUGE_UNLINK_NUM },
 	{OPT_IOVA_MODE,         1, NULL, OPT_IOVA_MODE_NUM   },
 	{OPT_LCORES,            1, NULL, OPT_LCORES_NUM      },
 	{OPT_LOG_LEVEL,         1, NULL, OPT_LOG_LEVEL_NUM   },
@@ -1596,6 +1596,28 @@ available_cores(void)
 	return str;
 }
 
+#define HUGE_UNLINK_NEVER "never"
+
+static int
+eal_parse_huge_unlink(const char *arg, struct hugepage_file_discipline *out)
+{
+	if (arg == NULL || strcmp(arg, "always") == 0) {
+		out->unlink_before_mapping = true;
+		return 0;
+	}
+	if (strcmp(arg, "existing") == 0) {
+		/* same as not specifying the option */
+		return 0;
+	}
+	if (strcmp(arg, HUGE_UNLINK_NEVER) == 0) {
+		RTE_LOG(WARNING, EAL, "Using --"OPT_HUGE_UNLINK"="
+			HUGE_UNLINK_NEVER" may create data leaks.\n");
+		out->keep_existing = true;
+		return 0;
+	}
+	return -1;
+}
+
 int
 eal_parse_common_option(int opt, const char *optarg,
 			struct internal_config *conf)
@@ -1737,7 +1759,10 @@ eal_parse_common_option(int opt, const char *optarg,
 
 	/* long options */
 	case OPT_HUGE_UNLINK_NUM:
-		conf->hugepage_file.unlink_before_mapping = true;
+		if (eal_parse_huge_unlink(optarg, &conf->hugepage_file) < 0) {
+			RTE_LOG(ERR, EAL, "invalid --"OPT_HUGE_UNLINK" option\n");
+			return -1;
+		}
 		break;
 
 	case OPT_NO_HUGE_NUM:
@@ -2068,6 +2093,12 @@ eal_check_common_options(struct internal_config *internal_cfg)
 			"not compatible with --"OPT_HUGE_UNLINK"\n");
 		return -1;
 	}
+	if (internal_cfg->hugepage_file.keep_existing &&
+			internal_cfg->in_memory) {
+		RTE_LOG(ERR, EAL, "Option --"OPT_IN_MEMORY" is not compatible "
+			"with --"OPT_HUGE_UNLINK"="HUGE_UNLINK_NEVER"\n");
+		return -1;
+	}
 	if (internal_cfg->legacy_mem && internal_cfg->in_memory) {
 		RTE_LOG(ERR, EAL, "Option --"OPT_LEGACY_MEM" is not compatible "
@@ -2200,7 +2231,9 @@ eal_common_usage(void)
 	       "  --"OPT_NO_TELEMETRY"      Disable telemetry support\n"
 	       "  --"OPT_FORCE_MAX_SIMD_BITWIDTH" Force the max SIMD bitwidth\n"
 	       "\nEAL options for DEBUG use only:\n"
-	       "  --"OPT_HUGE_UNLINK"       Unlink hugepage files after init\n"
+	       "  --"OPT_HUGE_UNLINK"[=existing|always|never]\n"
+	       "                      When to unlink files in hugetlbfs\n"
+	       "                      ('existing' by default, no value means 'always')\n"
 	       "  --"OPT_NO_HUGE"           Use malloc instead of hugetlbfs\n"
 	       "  --"OPT_NO_PCI"            Disable PCI\n"
 	       "  --"OPT_NO_HPET"           Disable HPET\n"
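As a usage illustration before the next patch: once this option is in place, an application could request the reuse mode through the EAL argument vector. This sketch is not part of the series, and the program name "app" is a placeholder.

#include <rte_eal.h>

int
main(void)
{
	/* Reuse hugepage files left by a previous run, if any;
	 * EAL warns that reused pages may hold stale data.
	 */
	char *argv[] = { "app", "--huge-unlink=never", NULL };
	int argc = 2;

	if (rte_eal_init(argc, argv) < 0)
		return 1;
	/* ... application work ... */
	rte_eal_cleanup();
	return 0;
}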
From patchwork Thu Dec 30 14:49:10 2021
X-Patchwork-Submitter: Dmitry Kozlyuk
X-Patchwork-Id: 105535
X-Patchwork-Delegate: david.marchand@redhat.com
From: Dmitry Kozlyuk
To: dev@dpdk.org
Cc: Aaron Conole, Viacheslav Ovsiienko
Subject: [RFC PATCH 6/6] app/test: add allocator performance benchmark
Date: Thu, 30 Dec 2021 16:49:10 +0200
Message-ID: <20211230144910.3551027-1-dkozlyuk@nvidia.com>
In-Reply-To: <20211230143744.3550098-1-dkozlyuk@nvidia.com>
References: <20211230143744.3550098-1-dkozlyuk@nvidia.com>
MIME-Version: 1.0

Memory allocator performance is crucial to applications
that deal with large amounts of memory or allocate frequently.
DPDK allocator performance is affected by EAL options,
the API used and, not least, the allocation size.
The new autotest is intended to be run with different EAL options.
It measures performance with a range of sizes
for different APIs: rte_malloc, rte_zmalloc, and rte_memzone_reserve.
Work distribution between allocation and deallocation depends
on EAL options. The test prints both times and the total time
to ease comparison.

Memory can be filled with zeroes at different points
of the allocation path, but it always takes a considerable fraction
of the overall time. This is why the test measures filling speed
and prints how long clearing takes for each size as a reference
(for rte_memzone_reserve, estimations are printed).

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Reviewed-by: Viacheslav Ovsiienko
---
 app/test/meson.build        |   2 +
 app/test/test_malloc_perf.c | 174 ++++++++++++++++++++++++++++++++++++
 2 files changed, 176 insertions(+)
 create mode 100644 app/test/test_malloc_perf.c

diff --git a/app/test/meson.build b/app/test/meson.build
index 2b480adfba..899034fc2a 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -88,6 +88,7 @@ test_sources = files(
         'test_lpm6_perf.c',
         'test_lpm_perf.c',
         'test_malloc.c',
+        'test_malloc_perf.c',
         'test_mbuf.c',
         'test_member.c',
         'test_member_perf.c',
@@ -295,6 +296,7 @@ extra_test_names = [
 
 perf_test_names = [
         'ring_perf_autotest',
+        'malloc_perf_autotest',
         'mempool_perf_autotest',
         'memcpy_perf_autotest',
         'hash_perf_autotest',
diff --git a/app/test/test_malloc_perf.c b/app/test/test_malloc_perf.c
new file mode 100644
index 0000000000..9686fc8af5
--- /dev/null
+++ b/app/test/test_malloc_perf.c
@@ -0,0 +1,174 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2021 NVIDIA Corporation & Affiliates
+ */
+
+#include <inttypes.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include <rte_cycles.h>
+#include <rte_malloc.h>
+#include <rte_memzone.h>
+
+#include "test.h"
+
+#define TEST_LOG(level, ...) RTE_LOG(level, USER1, __VA_ARGS__)
+
+typedef void * (alloc_t)(const char *name, size_t size, unsigned int align);
+typedef void (free_t)(void *addr);
+typedef void * (memset_t)(void *addr, int value, size_t size);
+
+static const uint64_t KB = 1 << 10;
+static const uint64_t GB = 1 << 30;
+
+static double
+tsc_to_us(uint64_t tsc, size_t runs)
+{
+	return (double)tsc / rte_get_tsc_hz() * US_PER_S / runs;
+}
+
+static int
+test_memset_perf(double *us_per_gb)
+{
+	static const size_t RUNS = 20;
+
+	void *ptr;
+	size_t i;
+	uint64_t tsc;
+
+	TEST_LOG(INFO, "Reference: memset\n");
+
+	ptr = rte_malloc(NULL, GB, 0);
+	if (ptr == NULL) {
+		TEST_LOG(ERR, "rte_malloc(size=%"PRIx64") failed\n", GB);
+		return -1;
+	}
+
+	tsc = rte_rdtsc_precise();
+	for (i = 0; i < RUNS; i++)
+		memset(ptr, 0, GB);
+	tsc = rte_rdtsc_precise() - tsc;
+
+	*us_per_gb = tsc_to_us(tsc, RUNS);
+	TEST_LOG(INFO, "Result: %.3f GiB/s <=> %.2f us/MiB\n",
+			US_PER_S / *us_per_gb, *us_per_gb / KB);
+
+	rte_free(ptr);
+	TEST_LOG(INFO, "\n");
+	return 0;
+}
+
+static int
+test_alloc_perf(const char *name, alloc_t *alloc_fn, free_t *free_fn,
+		memset_t *memset_fn, double memset_gb_us, size_t max_runs)
+{
+	static const size_t SIZES[] = {
+			1 << 6, 1 << 7, 1 << 10, 1 << 12, 1 << 16, 1 << 20,
+			1 << 21, 1 << 22, 1 << 24, 1 << 30 };
+
+	size_t i, j;
+	void **ptrs;
+
+	TEST_LOG(INFO, "Performance: %s\n", name);
+
+	ptrs = calloc(max_runs, sizeof(ptrs[0]));
+	if (ptrs == NULL) {
+		TEST_LOG(ERR, "Cannot allocate memory for pointers");
+		return -1;
+	}
+
+	TEST_LOG(INFO, "%12s%8s%12s%12s%12s%17s\n", "Size (B)", "Runs",
+			"Alloc (us)", "Free (us)", "Total (us)",
+			memset_fn != NULL ?
"memset (us)" : "est.memset (us)"); + for (i = 0; i < RTE_DIM(SIZES); i++) { + size_t size = SIZES[i]; + size_t runs_done; + uint64_t tsc_start, tsc_alloc, tsc_memset = 0, tsc_free; + double alloc_time, free_time, memset_time; + + tsc_start = rte_rdtsc_precise(); + for (j = 0; j < max_runs; j++) { + ptrs[j] = alloc_fn(NULL, size, 0); + if (ptrs[j] == NULL) + break; + } + tsc_alloc = rte_rdtsc_precise() - tsc_start; + + if (j == 0) { + TEST_LOG(INFO, "%12zu Interrupted: out of memory.\n", + size); + break; + } + runs_done = j; + + if (memset_fn != NULL) { + tsc_start = rte_rdtsc_precise(); + for (j = 0; j < runs_done && ptrs[j] != NULL; j++) + memset_fn(ptrs[j], 0, size); + tsc_memset = rte_rdtsc_precise() - tsc_start; + } + + tsc_start = rte_rdtsc_precise(); + for (j = 0; j < runs_done && ptrs[j] != NULL; j++) + free_fn(ptrs[j]); + tsc_free = rte_rdtsc_precise() - tsc_start; + + alloc_time = tsc_to_us(tsc_alloc, runs_done); + free_time = tsc_to_us(tsc_free, runs_done); + memset_time = memset_fn != NULL ? + tsc_to_us(tsc_memset, runs_done) : + memset_gb_us * size / GB; + TEST_LOG(INFO, "%12zu%8zu%12.2f%12.2f%12.2f%17.2f\n", + size, runs_done, alloc_time, free_time, + alloc_time + free_time, memset_time); + + memset(ptrs, 0, max_runs * sizeof(ptrs[0])); + } + + free(ptrs); + TEST_LOG(INFO, "\n"); + return 0; +} + +static void * +memzone_alloc(const char *name __rte_unused, size_t size, unsigned int align) +{ + const struct rte_memzone *mz; + char gen_name[RTE_MEMZONE_NAMESIZE]; + + snprintf(gen_name, sizeof(gen_name), "test-mz-%"PRIx64, rte_rdtsc()); + mz = rte_memzone_reserve_aligned(gen_name, size, SOCKET_ID_ANY, + RTE_MEMZONE_1GB | RTE_MEMZONE_SIZE_HINT_ONLY, align); + return (void *)(uintptr_t)mz; +} + +static void +memzone_free(void *addr) +{ + rte_memzone_free((struct rte_memzone *)addr); +} + +static int +test_malloc_perf(void) +{ + static const size_t MAX_RUNS = 10000; + + double memset_us_gb; + + if (test_memset_perf(&memset_us_gb) < 0) + return -1; + + if (test_alloc_perf("rte_malloc", rte_malloc, rte_free, memset, + memset_us_gb, MAX_RUNS) < 0) + return -1; + if (test_alloc_perf("rte_zmalloc", rte_zmalloc, rte_free, memset, + memset_us_gb, MAX_RUNS) < 0) + return -1; + + if (test_alloc_perf("rte_memzone_reserve", memzone_alloc, memzone_free, + NULL, memset_us_gb, RTE_MAX_MEMZONE - 1) < 0) + return -1; + + return 0; +} + +REGISTER_TEST_COMMAND(malloc_perf_autotest, test_malloc_perf);