From patchwork Tue Apr 14 19:44:16 2020
X-Patchwork-Submitter: Dmitry Kozlyuk
X-Patchwork-Id: 68443
X-Patchwork-Delegate: thomas@monjalon.net
From: Dmitry Kozlyuk
To: dev@dpdk.org
Cc: "Dmitry Malloy (MESHCHANINOV)", Narcisa Ana Maria Vasile, Fady Bader, Tal Shnaiderman, Dmitry Kozlyuk, Ranjit Menon
Date: Tue, 14 Apr 2020 22:44:16 +0300
Message-Id: <20200414194426.1640704-2-dmitry.kozliuk@gmail.com>
In-Reply-To: <20200414194426.1640704-1-dmitry.kozliuk@gmail.com>
References: <20200410164342.1194634-1-dmitry.kozliuk@gmail.com> <20200414194426.1640704-1-dmitry.kozliuk@gmail.com>
Subject: [dpdk-dev] [PATCH v3 1/1] virt2phys: virtual to physical address translator for Windows

This driver supports Windows EAL memory management by translating
virtual addresses of the current process to physical addresses (IOVA).
A standalone virt2phys allows using DPDK without a PMD and provides
a reference implementation.

Suggested-by: Ranjit Menon
Signed-off-by: Dmitry Kozlyuk
Reviewed-by: Ranjit Menon
Acked-by: Ranjit Menon
---
 windows/README.rst                          | 103 +++++++++
 windows/virt2phys/virt2phys.c               | 129 +++++++++++
 windows/virt2phys/virt2phys.h               |  34 +++
 windows/virt2phys/virt2phys.inf             |  64 ++++++
 windows/virt2phys/virt2phys.sln             |  27 +++
 windows/virt2phys/virt2phys.vcxproj         | 228 ++++++++++++++++++++
 windows/virt2phys/virt2phys.vcxproj.filters |  36 ++++
 7 files changed, 621 insertions(+)
 create mode 100644 windows/README.rst
 create mode 100644 windows/virt2phys/virt2phys.c
 create mode 100644 windows/virt2phys/virt2phys.h
 create mode 100644 windows/virt2phys/virt2phys.inf
 create mode 100644 windows/virt2phys/virt2phys.sln
 create mode 100644 windows/virt2phys/virt2phys.vcxproj
 create mode 100644 windows/virt2phys/virt2phys.vcxproj.filters
diff --git a/windows/README.rst b/windows/README.rst
new file mode 100644
index 0000000..45a1d80
--- /dev/null
+++ b/windows/README.rst
@@ -0,0 +1,103 @@
+Developing Windows Drivers
+==========================
+
+Prerequisites
+-------------
+
+Building Windows Drivers is only possible on Windows.
+
+1. Visual Studio 2019 Community or Professional Edition
+2. Windows Driver Kit (WDK) for Windows 10, version 1903
+
+Follow the official instructions to obtain all of the above:
+https://docs.microsoft.com/en-us/windows-hardware/drivers/download-the-wdk
+
+
+Build the Drivers
+-----------------
+
+Build from Visual Studio
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+Open a solution (``*.sln``) with Visual Studio and build it (Ctrl+Shift+B).
+
+
+Build from Command-Line
+~~~~~~~~~~~~~~~~~~~~~~~
+
+Run *Developer Command Prompt for VS 2019* from the Start menu.
+
+Navigate to the solution directory (with ``*.sln``), then run:
+
+.. code-block:: console
+
+    msbuild
+
+To build a particular combination of configuration and platform:
+
+.. code-block:: console
+
+    msbuild -p:Configuration=Debug;Platform=x64
+
+
+Install the Drivers
+-------------------
+
+Disable Driver Signature Enforcement
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+By default, Windows prohibits installing and loading drivers without a
+`digital signature`_ obtained from Microsoft. For development, signature
+enforcement may be disabled as follows.
+
+In an Elevated Command Prompt (from this point, sufficient privileges are
+assumed):
+
+.. code-block:: console
+
+    bcdedit -set loadoptions DISABLE_INTEGRITY_CHECKS
+    bcdedit -set TESTSIGNING ON
+    shutdown -r -t 0
+
+Upon reboot, an overlay message should appear on the desktop informing you
+that Windows is in test mode, which means it allows loading unsigned drivers.
+
+.. _digital signature: https://docs.microsoft.com/en-us/windows-hardware/drivers/install/driver-signing
+
+Install, List, and Remove Drivers
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The driver package is by default located in a subdirectory of its source
+tree, e.g. ``x64\Debug\virt2phys\virt2phys`` (note the two levels of
+``virt2phys``).
+
+To install the driver and bind associated devices to it:
+
+.. code-block:: console
+
+    pnputil /add-driver x64\Debug\virt2phys\virt2phys\virt2phys.inf /install
+
+A graphical confirmation to load an unsigned driver will still appear.
+
+On Windows Server, additional steps are required if the driver uses
+a custom setup class:
+
+1. From the "Device Manager" "Action" menu, select "Add legacy hardware".
+2. This launches the "Add Hardware Wizard". Click "Next".
+3. Select the second option, "Install the hardware that I manually select
+   from a list (Advanced)".
+4. On the next screen, locate the driver device class.
+5. Select it and click "Next".
+6. The previously installed drivers will now be installed for
+   the appropriate devices (software devices will be created).
+
+To list installed drivers:
+
+.. code-block:: console
+
+    pnputil /enum-drivers
+
+To remove the driver package and to uninstall its devices:
+
+.. code-block:: console
+
+    pnputil /delete-driver oem2.inf /uninstall
diff --git a/windows/virt2phys/virt2phys.c b/windows/virt2phys/virt2phys.c
new file mode 100644
index 0000000..e157e9c
--- /dev/null
+++ b/windows/virt2phys/virt2phys.c
@@ -0,0 +1,129 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020 Dmitry Kozlyuk
+ */
+
+#include <ntddk.h>
+#include <wdf.h>
+#include <wdmsec.h>
+#include <initguid.h>
+
+#include "virt2phys.h"
+
+DRIVER_INITIALIZE DriverEntry;
+EVT_WDF_DRIVER_DEVICE_ADD virt2phys_driver_EvtDeviceAdd;
+EVT_WDF_IO_IN_CALLER_CONTEXT virt2phys_device_EvtIoInCallerContext;
+
+NTSTATUS
+DriverEntry(
+	IN PDRIVER_OBJECT driver_object, IN PUNICODE_STRING registry_path)
+{
+	WDF_DRIVER_CONFIG config;
+	WDF_OBJECT_ATTRIBUTES attributes;
+	NTSTATUS status;
+
+	PAGED_CODE();
+
+	WDF_DRIVER_CONFIG_INIT(&config, virt2phys_driver_EvtDeviceAdd);
+	WDF_OBJECT_ATTRIBUTES_INIT(&attributes);
+	status = WdfDriverCreate(
+		driver_object, registry_path,
+		&attributes, &config, WDF_NO_HANDLE);
+	if (!NT_SUCCESS(status)) {
+		KdPrint(("WdfDriverCreate() failed, status=%08x\n", status));
+	}
+
+	return status;
+}
+
+_Use_decl_annotations_
+NTSTATUS
+virt2phys_driver_EvtDeviceAdd(
+	WDFDRIVER driver, PWDFDEVICE_INIT init)
+{
+	WDF_OBJECT_ATTRIBUTES attributes;
+	WDFDEVICE device;
+	NTSTATUS status;
+
+	UNREFERENCED_PARAMETER(driver);
+
+	PAGED_CODE();
+
+	WdfDeviceInitSetIoType(
+		init, WdfDeviceIoNeither);
+	WdfDeviceInitSetIoInCallerContextCallback(
+		init, virt2phys_device_EvtIoInCallerContext);
+
+	WDF_OBJECT_ATTRIBUTES_INIT(&attributes);
+
+	status = WdfDeviceCreate(&init, &attributes, &device);
+	if (!NT_SUCCESS(status)) {
+		KdPrint(("WdfDeviceCreate() failed, status=%08x\n", status));
+		return status;
+	}
+
+	status = WdfDeviceCreateDeviceInterface(
+		device, &GUID_DEVINTERFACE_VIRT2PHYS, NULL);
+	if (!NT_SUCCESS(status)) {
+		KdPrint(("WdfDeviceCreateDeviceInterface() failed, "
+			"status=%08x\n", status));
+		return status;
+	}
status; + } + + return STATUS_SUCCESS; +} + +_Use_decl_annotations_ +VOID +virt2phys_device_EvtIoInCallerContext( + IN WDFDEVICE device, IN WDFREQUEST request) +{ + WDF_REQUEST_PARAMETERS params; + ULONG code; + PVOID *virt; + PHYSICAL_ADDRESS *phys; + size_t size; + NTSTATUS status; + + UNREFERENCED_PARAMETER(device); + + PAGED_CODE(); + + WDF_REQUEST_PARAMETERS_INIT(¶ms); + WdfRequestGetParameters(request, ¶ms); + + if (params.Type != WdfRequestTypeDeviceControl) { + KdPrint(("bogus request type=%u\n", params.Type)); + WdfRequestComplete(request, STATUS_NOT_SUPPORTED); + return; + } + + code = params.Parameters.DeviceIoControl.IoControlCode; + if (code != IOCTL_VIRT2PHYS_TRANSLATE) { + KdPrint(("bogus IO control code=%lu\n", code)); + WdfRequestComplete(request, STATUS_NOT_SUPPORTED); + return; + } + + status = WdfRequestRetrieveInputBuffer( + request, sizeof(*virt), (PVOID *)&virt, &size); + if (!NT_SUCCESS(status)) { + KdPrint(("WdfRequestRetrieveInputBuffer() failed, " + "status=%08x\n", status)); + WdfRequestComplete(request, status); + return; + } + + status = WdfRequestRetrieveOutputBuffer( + request, sizeof(*phys), (PVOID *)&phys, &size); + if (!NT_SUCCESS(status)) { + KdPrint(("WdfRequestRetrieveOutputBuffer() failed, " + "status=%08x\n", status)); + WdfRequestComplete(request, status); + return; + } + + *phys = MmGetPhysicalAddress(*virt); + + WdfRequestCompleteWithInformation( + request, STATUS_SUCCESS, sizeof(*phys)); +} diff --git a/windows/virt2phys/virt2phys.h b/windows/virt2phys/virt2phys.h new file mode 100644 index 0000000..4bb2b4a --- /dev/null +++ b/windows/virt2phys/virt2phys.h @@ -0,0 +1,34 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright (c) 2020 Dmitry Kozlyuk + */ + +/** + * @file virt2phys driver interface + */ + +/** + * Driver device interface GUID {539c2135-793a-4926-afec-d3a1b61bbc8a}. + */ +DEFINE_GUID(GUID_DEVINTERFACE_VIRT2PHYS, + 0x539c2135, 0x793a, 0x4926, + 0xaf, 0xec, 0xd3, 0xa1, 0xb6, 0x1b, 0xbc, 0x8a); + +/** + * Driver device type for IO control codes. + */ +#define VIRT2PHYS_DEVTYPE 0x8000 + +/** + * Translate a valid non-paged virtual address to a physical address. + * + * Note: A physical address zero (0) is reported if input address + * is paged out or not mapped. However, if input is a valid mapping + * of I/O port 0x0000, output is also zero. There is no way + * to distinguish between these cases by return value only. + * + * Input: a non-paged virtual address (PVOID). + * + * Output: the corresponding physical address (LARGE_INTEGER). 
+ */ +#define IOCTL_VIRT2PHYS_TRANSLATE CTL_CODE( \ + VIRT2PHYS_DEVTYPE, 0x800, METHOD_BUFFERED, FILE_ANY_ACCESS) diff --git a/windows/virt2phys/virt2phys.inf b/windows/virt2phys/virt2phys.inf new file mode 100644 index 0000000..e35765e --- /dev/null +++ b/windows/virt2phys/virt2phys.inf @@ -0,0 +1,64 @@ +; SPDX-License-Identifier: BSD-3-Clause +; Copyright (c) 2020 Dmitry Kozlyuk + +[Version] +Signature = "$WINDOWS NT$" +Class = %ClassName% +ClassGuid = {78A1C341-4539-11d3-B88D-00C04FAD5171} +Provider = %ManufacturerName% +CatalogFile = virt2phys.cat +DriverVer = + +[DestinationDirs] +DefaultDestDir = 12 + +; ================= Class section ===================== + +[ClassInstall32] +Addreg = virt2phys_ClassReg + +[virt2phys_ClassReg] +HKR,,,0,%ClassName% +HKR,,Icon,,-5 + +[SourceDisksNames] +1 = %DiskName%,,,"" + +[SourceDisksFiles] +virt2phys.sys = 1,, + +;***************************************** +; Install Section +;***************************************** + +[Manufacturer] +%ManufacturerName%=Standard,NT$ARCH$ + +[Standard.NT$ARCH$] +%virt2phys.DeviceDesc%=virt2phys_Device, Root\virt2phys + +[virt2phys_Device.NT] +CopyFiles = Drivers_Dir + +[Drivers_Dir] +virt2phys.sys + +;-------------- Service installation +[virt2phys_Device.NT.Services] +AddService = virt2phys,%SPSVCINST_ASSOCSERVICE%, virt2phys_Service_Inst + +; -------------- virt2phys driver install sections +[virt2phys_Service_Inst] +DisplayName = %virt2phys.SVCDESC% +ServiceType = 1 ; SERVICE_KERNEL_DRIVER +StartType = 3 ; SERVICE_DEMAND_START +ErrorControl = 1 ; SERVICE_ERROR_NORMAL +ServiceBinary = %12%\virt2phys.sys + +[Strings] +SPSVCINST_ASSOCSERVICE = 0x00000002 +ManufacturerName = "Dmitry Kozlyuk" +ClassName = "Kernel bypass" +DiskName = "virt2phys Installation Disk" +virt2phys.DeviceDesc = "Virtual to physical address translator" +virt2phys.SVCDESC = "virt2phys Service" diff --git a/windows/virt2phys/virt2phys.sln b/windows/virt2phys/virt2phys.sln new file mode 100644 index 0000000..0f5ecdc --- /dev/null +++ b/windows/virt2phys/virt2phys.sln @@ -0,0 +1,27 @@ + +Microsoft Visual Studio Solution File, Format Version 12.00 +# Visual Studio Version 16 +VisualStudioVersion = 16.0.29613.14 +MinimumVisualStudioVersion = 10.0.40219.1 +Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "virt2phys", "virt2phys.vcxproj", "{0EEF826B-9391-43A8-A722-BDD6F6115137}" +EndProject +Global + GlobalSection(SolutionConfigurationPlatforms) = preSolution + Debug|x64 = Debug|x64 + Release|x64 = Release|x64 + EndGlobalSection + GlobalSection(ProjectConfigurationPlatforms) = postSolution + {0EEF826B-9391-43A8-A722-BDD6F6115137}.Debug|x64.ActiveCfg = Debug|x64 + {0EEF826B-9391-43A8-A722-BDD6F6115137}.Debug|x64.Build.0 = Debug|x64 + {0EEF826B-9391-43A8-A722-BDD6F6115137}.Debug|x64.Deploy.0 = Debug|x64 + {0EEF826B-9391-43A8-A722-BDD6F6115137}.Release|x64.ActiveCfg = Release|x64 + {0EEF826B-9391-43A8-A722-BDD6F6115137}.Release|x64.Build.0 = Release|x64 + {0EEF826B-9391-43A8-A722-BDD6F6115137}.Release|x64.Deploy.0 = Release|x64 + EndGlobalSection + GlobalSection(SolutionProperties) = preSolution + HideSolutionNode = FALSE + EndGlobalSection + GlobalSection(ExtensibilityGlobals) = postSolution + SolutionGuid = {845012FB-4471-4A12-A1C4-FF7E05C40E8E} + EndGlobalSection +EndGlobal diff --git a/windows/virt2phys/virt2phys.vcxproj b/windows/virt2phys/virt2phys.vcxproj new file mode 100644 index 0000000..fa51916 --- /dev/null +++ b/windows/virt2phys/virt2phys.vcxproj @@ -0,0 +1,228 @@ + + + + + Debug + Win32 + + + Release + Win32 + + + Debug + x64 + + + 
Release + x64 + + + Debug + ARM + + + Release + ARM + + + Debug + ARM64 + + + Release + ARM64 + + + + + + + + + + + + + {0EEF826B-9391-43A8-A722-BDD6F6115137} + {497e31cb-056b-4f31-abb8-447fd55ee5a5} + v4.5 + 12.0 + Debug + Win32 + virt2phys + + + + Windows10 + true + WindowsKernelModeDriver10.0 + Driver + KMDF + Universal + + + Windows10 + false + WindowsKernelModeDriver10.0 + Driver + KMDF + Universal + + + Windows10 + true + WindowsKernelModeDriver10.0 + Driver + KMDF + Universal + + + Windows10 + false + WindowsKernelModeDriver10.0 + Driver + KMDF + Universal + + + Windows10 + true + WindowsKernelModeDriver10.0 + Driver + KMDF + Universal + + + Windows10 + false + WindowsKernelModeDriver10.0 + Driver + KMDF + Universal + + + Windows10 + true + WindowsKernelModeDriver10.0 + Driver + KMDF + Universal + + + Windows10 + false + WindowsKernelModeDriver10.0 + Driver + KMDF + Universal + + + + + + + + + + + DbgengKernelDebugger + + + DbgengKernelDebugger + + + DbgengKernelDebugger + + + DbgengKernelDebugger + + + DbgengKernelDebugger + + + DbgengKernelDebugger + + + DbgengKernelDebugger + + + DbgengKernelDebugger + + + + true + true + trace.h + true + + + + + true + true + trace.h + true + + + + + false + true + trace.h + true + + + $(DDK_LIB_PATH)wdmsec.lib;%(AdditionalDependencies) + + + 0.1 + + + + + true + true + trace.h + true + + + + + true + true + trace.h + true + + + + + true + true + trace.h + true + + + + + true + true + trace.h + true + + + + + true + true + trace.h + true + + + + + + + + + \ No newline at end of file diff --git a/windows/virt2phys/virt2phys.vcxproj.filters b/windows/virt2phys/virt2phys.vcxproj.filters new file mode 100644 index 0000000..0fe65fc --- /dev/null +++ b/windows/virt2phys/virt2phys.vcxproj.filters @@ -0,0 +1,36 @@ + + + + + {4FC737F1-C7A5-4376-A066-2A32D752A2FF} + cpp;c;cc;cxx;def;odl;idl;hpj;bat;asm;asmx + + + {93995380-89BD-4b04-88EB-625FBE52EBFB} + h;hpp;hxx;hm;inl;inc;xsd + + + {67DA6AB6-F800-4c08-8B7A-83BB121AAD01} + rc;ico;cur;bmp;dlg;rc2;rct;bin;rgs;gif;jpg;jpeg;jpe;resx;tiff;tif;png;wav;mfcribbon-ms + + + {8E41214B-6785-4CFE-B992-037D68949A14} + inf;inv;inx;mof;mc; + + + + + Driver Files + + + + + Header Files + + + + + Source Files + + +
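For reference, here is a minimal user-mode sketch (an editorial illustration, not part of the patch) of how a client might invoke the translation IOCTL declared in virt2phys.h above. It assumes the device handle has already been obtained with CreateFile on the interface path enumerated for GUID_DEVINTERFACE_VIRT2PHYS (e.g. via the SetupAPI); the helper name `translate` is hypothetical.

#include <windows.h>
#include <winioctl.h>
#include <initguid.h>

#include "virt2phys.h"

/* Translate `virt` to a physical address through an open virt2phys handle.
 * The input buffer carries the pointer value itself; the driver fills the
 * output buffer with a LARGE_INTEGER physical address.
 */
static int
translate(HANDLE device, PVOID virt, LARGE_INTEGER *phys)
{
	DWORD bytes_returned;

	if (!DeviceIoControl(device, IOCTL_VIRT2PHYS_TRANSLATE,
			&virt, sizeof(virt), phys, sizeof(*phys),
			&bytes_returned, NULL))
		return -1; /* inspect GetLastError() for the cause */

	return 0;
}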
From patchwork Tue Apr 14 19:44:17 2020
X-Patchwork-Submitter: Dmitry Kozlyuk
X-Patchwork-Id: 68444
X-Patchwork-Delegate: thomas@monjalon.net
From: Dmitry Kozlyuk
To: dev@dpdk.org
Cc: "Dmitry Malloy (MESHCHANINOV)", Narcisa Ana Maria Vasile, Fady Bader, Tal Shnaiderman, Dmitry Kozlyuk, Harini Ramakrishnan, Omar Cardona, Pallavi Kadam, Ranjit Menon, Thomas Monjalon, Anand Rawat
Date: Tue, 14 Apr 2020 22:44:17 +0300
Message-Id: <20200414194426.1640704-3-dmitry.kozliuk@gmail.com>
In-Reply-To: <20200414194426.1640704-1-dmitry.kozliuk@gmail.com>
References: <20200410164342.1194634-1-dmitry.kozliuk@gmail.com> <20200414194426.1640704-1-dmitry.kozliuk@gmail.com>
Subject: [dpdk-dev] [PATCH v3 02/10] eal/windows: do not expose private EAL facilities

The goal of rte_os.h is to mitigate OS differences for EAL users.
In Windows EAL, rte_os.h did excessive things:

1. It included platform SDK headers (windows.h, etc). Those files are
   huge, require a specific inclusion order, and are generally unused by
   the code including rte_os.h. Declarations from the platform SDK may
   break otherwise platform-independent code, e.g. min, max, ERROR.

2. It included pthread.h, which is clearly not always required.

3. It defined functions private to Windows EAL.

Reorganize Windows EAL includes in the following way:

1. Create rte_windows.h to properly import Windows-specific facilities.
   Primary users are bus drivers, tests, and external applications.

2. Remove platform SDK includes from rte_os.h to prevent breaking
   otherwise portable code by including rte_os.h on Windows.
   Copy necessary definitions to avoid including those headers.

3. Remove the pthread.h include from rte_os.h.

4. Move declarations private to Windows EAL into eal_windows.h.
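For illustration (an editorial aside, not part of the patch), the kind of breakage point 1 refers to looks like this: wingdi.h, pulled in via windows.h, defines ERROR as a macro, silently corrupting otherwise portable declarations.

#include <windows.h>

/* Breaks after including windows.h: wingdi.h defines ERROR as the
 * macro 0, so this expands to "enum status { 0, WARNING }" and fails
 * to compile. windows.h likewise defines min()/max() function-like
 * macros unless NOMINMAX is set.
 */
enum status { ERROR, WARNING };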
Fixes: 428eb983f5f7 ("eal: add OS specific header file") Signed-off-by: Dmitry Kozlyuk --- lib/librte_eal/windows/eal.c | 2 + lib/librte_eal/windows/eal_lcore.c | 2 + lib/librte_eal/windows/eal_thread.c | 1 + lib/librte_eal/windows/eal_windows.h | 29 +++++++++++++ lib/librte_eal/windows/include/meson.build | 1 + lib/librte_eal/windows/include/pthread.h | 2 + lib/librte_eal/windows/include/rte_os.h | 44 ++++++-------------- lib/librte_eal/windows/include/rte_windows.h | 41 ++++++++++++++++++ 8 files changed, 91 insertions(+), 31 deletions(-) create mode 100644 lib/librte_eal/windows/eal_windows.h create mode 100644 lib/librte_eal/windows/include/rte_windows.h diff --git a/lib/librte_eal/windows/eal.c b/lib/librte_eal/windows/eal.c index e4b50df3b..2cf7a04ef 100644 --- a/lib/librte_eal/windows/eal.c +++ b/lib/librte_eal/windows/eal.c @@ -18,6 +18,8 @@ #include #include +#include "eal_windows.h" + /* Allow the application to print its usage message too if set */ static rte_usage_hook_t rte_application_usage_hook; diff --git a/lib/librte_eal/windows/eal_lcore.c b/lib/librte_eal/windows/eal_lcore.c index b3a6c63af..82ee45413 100644 --- a/lib/librte_eal/windows/eal_lcore.c +++ b/lib/librte_eal/windows/eal_lcore.c @@ -2,12 +2,14 @@ * Copyright(c) 2019 Intel Corporation */ +#include #include #include #include "eal_private.h" #include "eal_thread.h" +#include "eal_windows.h" /* global data structure that contains the CPU map */ static struct _wcpu_map { diff --git a/lib/librte_eal/windows/eal_thread.c b/lib/librte_eal/windows/eal_thread.c index 9e4bbaa08..e149199a6 100644 --- a/lib/librte_eal/windows/eal_thread.c +++ b/lib/librte_eal/windows/eal_thread.c @@ -14,6 +14,7 @@ #include #include "eal_private.h" +#include "eal_windows.h" RTE_DEFINE_PER_LCORE(unsigned int, _lcore_id) = LCORE_ID_ANY; RTE_DEFINE_PER_LCORE(unsigned int, _socket_id) = (unsigned int)SOCKET_ID_ANY; diff --git a/lib/librte_eal/windows/eal_windows.h b/lib/librte_eal/windows/eal_windows.h new file mode 100644 index 000000000..fadd676b2 --- /dev/null +++ b/lib/librte_eal/windows/eal_windows.h @@ -0,0 +1,29 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright (c) 2020 Dmitry Kozlyuk + */ + +#ifndef _EAL_WINDOWS_H_ +#define _EAL_WINDOWS_H_ + +/** + * @file Facilities private to Windows EAL + */ + +#include + +/** + * Create a map of processors and cores on the system. + */ +void eal_create_cpu_map(void); + +/** + * Create a thread. + * + * @param thread + * The location to store the thread id if successful. + * @return + * 0 for success, -1 if the thread is not created. 
+ */ +int eal_thread_create(pthread_t *thread); + +#endif /* _EAL_WINDOWS_H_ */ diff --git a/lib/librte_eal/windows/include/meson.build b/lib/librte_eal/windows/include/meson.build index 7d18dd52f..5fb1962ac 100644 --- a/lib/librte_eal/windows/include/meson.build +++ b/lib/librte_eal/windows/include/meson.build @@ -5,4 +5,5 @@ includes += include_directories('.') headers += files( 'rte_os.h', + 'rte_windows.h', ) diff --git a/lib/librte_eal/windows/include/pthread.h b/lib/librte_eal/windows/include/pthread.h index b9dd18e56..cfd53f0b8 100644 --- a/lib/librte_eal/windows/include/pthread.h +++ b/lib/librte_eal/windows/include/pthread.h @@ -5,6 +5,8 @@ #ifndef _PTHREAD_H_ #define _PTHREAD_H_ +#include + /** * This file is required to support the common code in eal_common_proc.c, * eal_common_thread.c and common\include\rte_per_lcore.h as Microsoft libc diff --git a/lib/librte_eal/windows/include/rte_os.h b/lib/librte_eal/windows/include/rte_os.h index e1e0378e6..510e39e03 100644 --- a/lib/librte_eal/windows/include/rte_os.h +++ b/lib/librte_eal/windows/include/rte_os.h @@ -8,20 +8,18 @@ /** * This is header should contain any function/macro definition * which are not supported natively or named differently in the - * Windows OS. Functions will be added in future releases. + * Windows OS. It must not include Windows-specific headers. */ +#include +#include +#include + #ifdef __cplusplus extern "C" { #endif -#include -#include -#include -#include - -/* limits.h replacement */ -#include +/* limits.h replacement, value as in */ #ifndef PATH_MAX #define PATH_MAX _MAX_PATH #endif @@ -31,8 +29,6 @@ extern "C" { /* strdup is deprecated in Microsoft libc and _strdup is preferred */ #define strdup(str) _strdup(str) -typedef SSIZE_T ssize_t; - #define strtok_r(str, delim, saveptr) strtok_s(str, delim, saveptr) #define index(a, b) strchr(a, b) @@ -40,22 +36,14 @@ typedef SSIZE_T ssize_t; #define strncasecmp(s1, s2, count) _strnicmp(s1, s2, count) -/** - * Create a thread. - * This function is private to EAL. - * - * @param thread - * The location to store the thread id if successful. - * @return - * 0 for success, -1 if the thread is not created. - */ -int eal_thread_create(pthread_t *thread); +/* cpu_set macros implementation */ +#define RTE_CPU_AND(dst, src1, src2) CPU_AND(dst, src1, src2) +#define RTE_CPU_OR(dst, src1, src2) CPU_OR(dst, src1, src2) +#define RTE_CPU_FILL(set) CPU_FILL(set) +#define RTE_CPU_NOT(dst, src) CPU_NOT(dst, src) -/** - * Create a map of processors and cores on the system. - * This function is private to EAL. - */ -void eal_create_cpu_map(void); +/* as in */ +typedef long long ssize_t; #ifndef RTE_TOOLCHAIN_GCC static inline int @@ -86,12 +74,6 @@ asprintf(char **buffer, const char *format, ...) 
} #endif /* RTE_TOOLCHAIN_GCC */ -/* cpu_set macros implementation */ -#define RTE_CPU_AND(dst, src1, src2) CPU_AND(dst, src1, src2) -#define RTE_CPU_OR(dst, src1, src2) CPU_OR(dst, src1, src2) -#define RTE_CPU_FILL(set) CPU_FILL(set) -#define RTE_CPU_NOT(dst, src) CPU_NOT(dst, src) - #ifdef __cplusplus } #endif
diff --git a/lib/librte_eal/windows/include/rte_windows.h b/lib/librte_eal/windows/include/rte_windows.h
new file mode 100644
index 000000000..ed6e4c148
--- /dev/null
+++ b/lib/librte_eal/windows/include/rte_windows.h
@@ -0,0 +1,41 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2020 Dmitry Kozlyuk
+ */
+
+#ifndef _RTE_WINDOWS_H_
+#define _RTE_WINDOWS_H_
+
+/**
+ * @file Windows-specific facilities
+ *
+ * This file should be included by DPDK libraries and applications
+ * that need access to the Windows API. It includes platform SDK headers
+ * in a compatible order with proper options, and defines error-handling
+ * macros.
+ */
+
+/* Disable excessive libraries. */
+#ifndef WIN32_LEAN_AND_MEAN
+#define WIN32_LEAN_AND_MEAN
+#endif
+
+/* Must come first. */
+#include <windows.h>
+
+#include <basetsd.h>
+#include <psapi.h>
+
+/* Have GUIDs defined. */
+#ifndef INITGUID
+#define INITGUID
+#endif
+#include <initguid.h>
+
+/**
+ * Log GetLastError() with context, usually a Win32 API function and arguments.
+ */
+#define RTE_LOG_WIN32_ERR(...) \
+	RTE_LOG(DEBUG, EAL, RTE_FMT("GetLastError()=%lu: " \
+		RTE_FMT_HEAD(__VA_ARGS__,) "\n", GetLastError(), \
+		RTE_FMT_TAIL(__VA_ARGS__,)))
+
+#endif /* _RTE_WINDOWS_H_ */
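As a usage sketch (editorial, not part of the patch), a Windows-only DPDK source file would now include rte_windows.h instead of relying on rte_os.h pulling in the platform SDK; the helper below is hypothetical and assumes rte_log.h for the RTE_LOG machinery the macro expands to.

#include <rte_windows.h> /* windows.h and friends, in the right order */
#include <rte_log.h>

/* Example: lock a region in physical memory, reporting Win32 errors
 * through the new RTE_LOG_WIN32_ERR() helper.
 */
static int
lock_region(void *addr, size_t size)
{
	if (!VirtualLock(addr, size)) {
		RTE_LOG_WIN32_ERR("VirtualLock(%p, %zu)", addr, size);
		return -1;
	}
	return 0;
}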
From patchwork Tue Apr 14 19:44:18 2020
X-Patchwork-Submitter: Dmitry Kozlyuk
X-Patchwork-Id: 68445
X-Patchwork-Delegate: thomas@monjalon.net
From: Dmitry Kozlyuk
To: dev@dpdk.org
Cc: "Dmitry Malloy (MESHCHANINOV)", Narcisa Ana Maria Vasile, Fady Bader, Tal Shnaiderman, Dmitry Kozlyuk, Harini Ramakrishnan, Omar Cardona, Pallavi Kadam, Ranjit Menon, Anand Rawat, Jeff Shaw
Date: Tue, 14 Apr 2020 22:44:18 +0300
Message-Id: <20200414194426.1640704-4-dmitry.kozliuk@gmail.com>
In-Reply-To: <20200414194426.1640704-1-dmitry.kozliuk@gmail.com>
References: <20200410164342.1194634-1-dmitry.kozliuk@gmail.com> <20200414194426.1640704-1-dmitry.kozliuk@gmail.com>
Subject: [dpdk-dev] [PATCH v3 03/10] eal/windows: improve CPU and NUMA node detection

1. Map CPU cores to their respective NUMA nodes as reported by the system.
2. Support systems with more than 64 cores (multiple processor groups).
3. Fix magic constants, styling issues, and compiler warnings.
4. Add an EAL private function to map a DPDK socket ID to a NUMA node number.

Fixes: 53ffd9f080fc ("eal/windows: add minimum viable code")

Signed-off-by: Dmitry Kozlyuk
---
 lib/librte_eal/windows/eal_lcore.c   | 185 +++++++++++++++++----------
 lib/librte_eal/windows/eal_windows.h |  10 ++
 2 files changed, 124 insertions(+), 71 deletions(-)
diff --git a/lib/librte_eal/windows/eal_lcore.c b/lib/librte_eal/windows/eal_lcore.c index 82ee45413..9d931d50a 100644 --- a/lib/librte_eal/windows/eal_lcore.c +++ b/lib/librte_eal/windows/eal_lcore.c @@ -3,103 +3,146 @@ */ #include +#include #include #include +#include +#include +#include #include "eal_private.h" #include "eal_thread.h" #include "eal_windows.h" -/* global data structure that contains the CPU map */ -static struct _wcpu_map { - unsigned int total_procs; - unsigned int proc_sockets; - unsigned int proc_cores; - unsigned int reserved; - struct _win_lcore_map { - uint8_t socket_id; - uint8_t core_id; - } wlcore_map[RTE_MAX_LCORE]; -} wcpu_map = { 0 }; - -/* - * Create a map of all processors and associated cores on the system - */ +/** Number of logical processors (cores) in a processor group (32 or 64).
*/ +#define EAL_PROCESSOR_GROUP_SIZE (sizeof(KAFFINITY) * CHAR_BIT) + +struct lcore_map { + uint8_t socket_id; + uint8_t core_id; +}; + +struct socket_map { + uint16_t node_id; +}; + +struct cpu_map { + unsigned int socket_count; + unsigned int lcore_count; + struct lcore_map lcores[RTE_MAX_LCORE]; + struct socket_map sockets[RTE_MAX_NUMA_NODES]; +}; + +static struct cpu_map cpu_map = { 0 }; + void -eal_create_cpu_map() +eal_create_cpu_map(void) { - wcpu_map.total_procs = - GetActiveProcessorCount(ALL_PROCESSOR_GROUPS); - - LOGICAL_PROCESSOR_RELATIONSHIP lprocRel; - DWORD lprocInfoSize = 0; - BOOL ht_enabled = FALSE; - - /* First get the processor package information */ - lprocRel = RelationProcessorPackage; - /* Determine the size of buffer we need (pass NULL) */ - GetLogicalProcessorInformationEx(lprocRel, NULL, &lprocInfoSize); - wcpu_map.proc_sockets = lprocInfoSize / 48; - - lprocInfoSize = 0; - /* Next get the processor core information */ - lprocRel = RelationProcessorCore; - GetLogicalProcessorInformationEx(lprocRel, NULL, &lprocInfoSize); - wcpu_map.proc_cores = lprocInfoSize / 48; - - if (wcpu_map.total_procs > wcpu_map.proc_cores) - ht_enabled = TRUE; - - /* Distribute the socket and core ids appropriately - * across the logical cores. For now, split the cores - * equally across the sockets. - */ - unsigned int lcore = 0; - for (unsigned int socket = 0; socket < - wcpu_map.proc_sockets; ++socket) { - for (unsigned int core = 0; - core < (wcpu_map.proc_cores / wcpu_map.proc_sockets); - ++core) { - wcpu_map.wlcore_map[lcore] - .socket_id = socket; - wcpu_map.wlcore_map[lcore] - .core_id = core; - lcore++; - if (ht_enabled) { - wcpu_map.wlcore_map[lcore] - .socket_id = socket; - wcpu_map.wlcore_map[lcore] - .core_id = core; - lcore++; + SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX *infos, *info; + DWORD infos_size; + bool full = false; + + infos_size = 0; + if (!GetLogicalProcessorInformationEx( + RelationNumaNode, NULL, &infos_size)) { + DWORD error = GetLastError(); + if (error != ERROR_INSUFFICIENT_BUFFER) { + rte_panic("cannot get NUMA node info size, error %lu", + GetLastError()); + } + } + + infos = malloc(infos_size); + if (infos == NULL) { + rte_panic("cannot allocate memory for NUMA node information"); + return; + } + + if (!GetLogicalProcessorInformationEx( + RelationNumaNode, infos, &infos_size)) { + rte_panic("cannot get NUMA node information, error %lu", + GetLastError()); + } + + info = infos; + while ((uint8_t *)info - (uint8_t *)infos < infos_size) { + unsigned int node_id = info->NumaNode.NodeNumber; + GROUP_AFFINITY *cores = &info->NumaNode.GroupMask; + struct lcore_map *lcore; + unsigned int i, socket_id; + + /* NUMA node may be reported multiple times if it includes + * cores from different processor groups, e. g. 80 cores + * of a physical processor comprise one NUMA node, but two + * processor groups, because group size is limited by 32/64. 
+ */ + for (socket_id = 0; socket_id < cpu_map.socket_count; + socket_id++) { + if (cpu_map.sockets[socket_id].node_id == node_id) + break; + } + + if (socket_id == cpu_map.socket_count) { + if (socket_id == RTE_DIM(cpu_map.sockets)) { + full = true; + goto exit; } + + cpu_map.sockets[socket_id].node_id = node_id; + cpu_map.socket_count++; + } + + for (i = 0; i < EAL_PROCESSOR_GROUP_SIZE; i++) { + if ((cores->Mask & ((KAFFINITY)1 << i)) == 0) + continue; + + if (cpu_map.lcore_count == RTE_DIM(cpu_map.lcores)) { + full = true; + goto exit; + } + + lcore = &cpu_map.lcores[cpu_map.lcore_count]; + lcore->socket_id = socket_id; + lcore->core_id = + cores->Group * EAL_PROCESSOR_GROUP_SIZE + i; + cpu_map.lcore_count++; } + + info = (SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX *)( + (uint8_t *)info + info->Size); + } + +exit: + if (full) { + /* RTE_LOG() may not be available, but this is important. */ + fprintf(stderr, "Enumerated maximum of %u NUMA nodes and %u cores\n", + cpu_map.socket_count, cpu_map.lcore_count); } + + free(infos); } -/* - * Check if a cpu is present by the presence of the cpu information for it - */ int eal_cpu_detected(unsigned int lcore_id) { - return (lcore_id < wcpu_map.total_procs); + return lcore_id < cpu_map.lcore_count; } -/* - * Get CPU socket id for a logical core - */ unsigned eal_cpu_socket_id(unsigned int lcore_id) { - return wcpu_map.wlcore_map[lcore_id].socket_id; + return cpu_map.lcores[lcore_id].socket_id; } -/* - * Get CPU socket id (NUMA node) for a logical core - */ unsigned eal_cpu_core_id(unsigned int lcore_id) { - return wcpu_map.wlcore_map[lcore_id].core_id; + return cpu_map.lcores[lcore_id].core_id; +} + +unsigned int +eal_socket_numa_node(unsigned int socket_id) +{ + return cpu_map.sockets[socket_id].node_id; } diff --git a/lib/librte_eal/windows/eal_windows.h b/lib/librte_eal/windows/eal_windows.h index fadd676b2..390d2fd66 100644 --- a/lib/librte_eal/windows/eal_windows.h +++ b/lib/librte_eal/windows/eal_windows.h @@ -26,4 +26,14 @@ void eal_create_cpu_map(void); */ int eal_thread_create(pthread_t *thread); +/** + * Get system NUMA node number for a socket ID. + * + * @param socket_id + * Valid EAL socket ID. + * @return + * NUMA node number to use with Win32 API. 
+ */
+unsigned int eal_socket_numa_node(unsigned int socket_id);
+
 #endif /* _EAL_WINDOWS_H_ */
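To make the processor-group arithmetic above concrete (an editorial illustration, not part of the patch): with a 64-bit KAFFINITY, EAL_PROCESSOR_GROUP_SIZE is 64, so bit i set in the GROUP_AFFINITY mask of group g denotes logical CPU g * 64 + i. A standalone sketch of the same enumeration (`print_group_cpus` is a hypothetical helper):

#include <limits.h>
#include <stdio.h>
#include <windows.h>

/* Enumerate the logical CPU numbers encoded in one GROUP_AFFINITY,
 * the same way eal_create_cpu_map() in the patch does.
 */
static void
print_group_cpus(const GROUP_AFFINITY *ga)
{
	const unsigned int group_size = sizeof(KAFFINITY) * CHAR_BIT;
	unsigned int i;

	for (i = 0; i < group_size; i++) {
		if ((ga->Mask & ((KAFFINITY)1 << i)) == 0)
			continue;
		printf("logical CPU %u\n", ga->Group * group_size + i);
	}
}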
From patchwork Tue Apr 14 19:44:19 2020
X-Patchwork-Submitter: Dmitry Kozlyuk
X-Patchwork-Id: 68446
X-Patchwork-Delegate: thomas@monjalon.net
From: Dmitry Kozlyuk
To: dev@dpdk.org
Cc: "Dmitry Malloy (MESHCHANINOV)", Narcisa Ana Maria Vasile, Fady Bader, Tal Shnaiderman, Dmitry Kozlyuk, Thomas Monjalon, Harini Ramakrishnan, Omar Cardona, Pallavi Kadam, Ranjit Menon, John McNamara, Marko Kovacevic
Date: Tue, 14 Apr 2020 22:44:19 +0300
Message-Id: <20200414194426.1640704-5-dmitry.kozliuk@gmail.com>
In-Reply-To: <20200414194426.1640704-1-dmitry.kozliuk@gmail.com>
References: <20200410164342.1194634-1-dmitry.kozliuk@gmail.com> <20200414194426.1640704-1-dmitry.kozliuk@gmail.com>
Subject: [dpdk-dev] [PATCH v3 04/10] eal/windows: initialize hugepage info

Add hugepage discovery ("large pages" in Windows terminology)
and update the documentation for the required privilege setup.

Signed-off-by: Dmitry Kozlyuk
---
 config/meson.build                     |   2 +
 doc/guides/windows_gsg/build_dpdk.rst  |  20 -----
 doc/guides/windows_gsg/index.rst       |   1 +
 doc/guides/windows_gsg/run_apps.rst    |  47 +++++++++++
 lib/librte_eal/windows/eal.c           |  14 ++++
 lib/librte_eal/windows/eal_hugepages.c | 108 +++++++++++++++++++++++++
 lib/librte_eal/windows/meson.build     |   1 +
 7 files changed, 173 insertions(+), 20 deletions(-)
 create mode 100644 doc/guides/windows_gsg/run_apps.rst
 create mode 100644 lib/librte_eal/windows/eal_hugepages.c
diff --git a/config/meson.build b/config/meson.build
index 58421342b..4607655d9 100644
--- a/config/meson.build
+++ b/config/meson.build
@@ -263,6 +263,8 @@ if is_windows
 	if cc.get_id() == 'gcc'
 		add_project_arguments('-D__USE_MINGW_ANSI_STDIO', language: 'c')
 	endif
+
+	add_project_link_arguments('-ladvapi32', language: 'c')
 endif
 
 if get_option('b_lto')
diff --git a/doc/guides/windows_gsg/build_dpdk.rst b/doc/guides/windows_gsg/build_dpdk.rst
index d46e84e3f..650483e3b 100644
--- a/doc/guides/windows_gsg/build_dpdk.rst
+++ b/doc/guides/windows_gsg/build_dpdk.rst
@@ -111,23 +111,3 @@ Depending on the distribution, paths in this file may need adjustments.
 
     meson --cross-file config/x86/meson_mingw.txt -Dexamples=helloworld build
     ninja -C build
-
-
-Run the helloworld example
-==========================
-
-Navigate to the examples in the build directory and run `dpdk-helloworld.exe`.
-
-.. code-block:: console
-
-    cd C:\Users\me\dpdk\build\examples
-    dpdk-helloworld.exe
-    hello from core 1
-    hello from core 3
-    hello from core 0
-    hello from core 2
-
-Note for MinGW-w64: applications are linked to ``libwinpthread-1.dll``
-by default. To run the example, either add toolchain executables directory
-to the PATH or copy the library to the working directory.
-Alternatively, static linking may be used (mind the LGPLv2.1 license).
diff --git a/doc/guides/windows_gsg/index.rst b/doc/guides/windows_gsg/index.rst
index d9b7990a8..e94593572 100644
--- a/doc/guides/windows_gsg/index.rst
+++ b/doc/guides/windows_gsg/index.rst
@@ -12,3 +12,4 @@ Getting Started Guide for Windows
 
     intro
     build_dpdk
+    run_apps
diff --git a/doc/guides/windows_gsg/run_apps.rst b/doc/guides/windows_gsg/run_apps.rst
new file mode 100644
index 000000000..21ac7f6c1
--- /dev/null
+++ b/doc/guides/windows_gsg/run_apps.rst
@@ -0,0 +1,47 @@
+.. SPDX-License-Identifier: BSD-3-Clause
+   Copyright(c) 2020 Dmitry Kozlyuk
+
+Running DPDK Applications
+=========================
+
+Grant *Lock pages in memory* Privilege
+--------------------------------------
+
+Use of hugepages ("large pages" in Windows terminology) requires
+``SeLockMemoryPrivilege`` for the user running an application.
+
+1. Open the *Local Security Policy* snap-in, either:
+
+   * Control Panel / Computer Management / Local Security Policy;
+   * or Win+R, type ``secpol``, press Enter.
+
+2. Open *Local Policies / User Rights Assignment / Lock pages in memory.*
+
+3. Add desired users or groups to the list of grantees.
+
+4. The privilege is applied upon the next logon. In particular, if it has
+   been granted to the current user, a logoff is required before it
+   becomes available.
+
+See `Large-Page Support`_ in MSDN for details.
+
+.. _Large-page Support: https://docs.microsoft.com/en-us/windows/win32/memory/large-page-support
+
+
+Run the ``helloworld`` Example
+------------------------------
+
+Navigate to the examples in the build directory and run `dpdk-helloworld.exe`.
+
+.. code-block:: console
+
+    cd C:\Users\me\dpdk\build\examples
+    dpdk-helloworld.exe
+    hello from core 1
+    hello from core 3
+    hello from core 0
+    hello from core 2
+
+Note for MinGW-w64: applications are linked to ``libwinpthread-1.dll``
+by default. To run the example, either add the toolchain executables
+directory to the PATH or copy the library to the working directory.
+Alternatively, static linking may be used (mind the LGPLv2.1 license).
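One quick way to check that the privilege is in effect for the current session (an editorial note, not part of the patch) is to list the token's privileges and look for ``SeLockMemoryPrivilege``:

.. code-block:: console

    whoami /priv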
diff --git a/lib/librte_eal/windows/eal.c b/lib/librte_eal/windows/eal.c index 2cf7a04ef..63461f51a 100644 --- a/lib/librte_eal/windows/eal.c +++ b/lib/librte_eal/windows/eal.c @@ -18,8 +18,11 @@ #include #include +#include "eal_hugepages.h" #include "eal_windows.h" +#define MEMSIZE_IF_NO_HUGE_PAGE (64ULL * 1024ULL * 1024ULL) + /* Allow the application to print its usage message too if set */ static rte_usage_hook_t rte_application_usage_hook; @@ -242,6 +245,17 @@ rte_eal_init(int argc, char **argv) if (fctret < 0) exit(1); + if (!internal_config.no_hugetlbfs && (eal_hugepage_info_init() < 0)) { + rte_eal_init_alert("Cannot get hugepage information"); + rte_errno = EACCES; + return -1; + } + + if (internal_config.memory == 0 && !internal_config.force_sockets) { + if (internal_config.no_hugetlbfs) + internal_config.memory = MEMSIZE_IF_NO_HUGE_PAGE; + } + eal_thread_init_master(rte_config.master_lcore); RTE_LCORE_FOREACH_SLAVE(i) { diff --git a/lib/librte_eal/windows/eal_hugepages.c b/lib/librte_eal/windows/eal_hugepages.c new file mode 100644 index 000000000..b099d13f9 --- /dev/null +++ b/lib/librte_eal/windows/eal_hugepages.c @@ -0,0 +1,108 @@ +#include +#include +#include +#include +#include + +#include "eal_filesystem.h" +#include "eal_hugepages.h" +#include "eal_internal_cfg.h" +#include "eal_windows.h" + +static int +hugepage_claim_privilege(void) +{ + static const wchar_t privilege[] = L"SeLockMemoryPrivilege"; + + HANDLE token; + LUID luid; + TOKEN_PRIVILEGES tp; + int ret = -1; + + if (!OpenProcessToken(GetCurrentProcess(), + TOKEN_ADJUST_PRIVILEGES, &token)) { + RTE_LOG_WIN32_ERR("OpenProcessToken()"); + return -1; + } + + if (!LookupPrivilegeValueW(NULL, privilege, &luid)) { + RTE_LOG_WIN32_ERR("LookupPrivilegeValue(\"%S\")", privilege); + goto exit; + } + + tp.PrivilegeCount = 1; + tp.Privileges[0].Luid = luid; + tp.Privileges[0].Attributes = SE_PRIVILEGE_ENABLED; + + if (!AdjustTokenPrivileges( + token, FALSE, &tp, sizeof(tp), NULL, NULL)) { + RTE_LOG_WIN32_ERR("AdjustTokenPrivileges()"); + goto exit; + } + + ret = 0; + +exit: + CloseHandle(token); + + return ret; +} + +static int +hugepage_info_init(void) +{ + struct hugepage_info *hpi; + unsigned int socket_id; + int ret = 0; + + /* Only one hugepage size available in Windows. */ + internal_config.num_hugepage_sizes = 1; + hpi = &internal_config.hugepage_info[0]; + + hpi->hugepage_sz = GetLargePageMinimum(); + if (hpi->hugepage_sz == 0) + return -ENOTSUP; + + /* Assume all memory on each NUMA node available for hugepages, + * because Windows neither advertises additional limits, + * nor provides an API to query them. + */ + for (socket_id = 0; socket_id < rte_socket_count(); socket_id++) { + ULONGLONG bytes; + unsigned int numa_node; + + numa_node = eal_socket_numa_node(socket_id); + if (!GetNumaAvailableMemoryNodeEx(numa_node, &bytes)) { + RTE_LOG_WIN32_ERR("GetNumaAvailableMemoryNodeEx(%u)", + numa_node); + continue; + } + + hpi->num_pages[socket_id] = bytes / hpi->hugepage_sz; + RTE_LOG(DEBUG, EAL, + "Found %u hugepages of %zu bytes on socket %u\n", + hpi->num_pages[socket_id], hpi->hugepage_sz, socket_id); + } + + /* No hugepage filesystem in Windows. 
*/
+	hpi->lock_descriptor = -1;
+	memset(hpi->hugedir, 0, sizeof(hpi->hugedir));
+
+	return ret;
+}
+
+int
+eal_hugepage_info_init(void)
+{
+	if (hugepage_claim_privilege() < 0) {
+		RTE_LOG(ERR, EAL, "Cannot claim hugepage privilege\n");
+		return -1;
+	}
+
+	if (hugepage_info_init() < 0) {
+		RTE_LOG(ERR, EAL, "Cannot get hugepage information\n");
+		return -1;
+	}
+
+	return 0;
+}
diff --git a/lib/librte_eal/windows/meson.build b/lib/librte_eal/windows/meson.build
index 09dd4ab2f..5f118bfe2 100644
--- a/lib/librte_eal/windows/meson.build
+++ b/lib/librte_eal/windows/meson.build
@@ -6,6 +6,7 @@ subdir('include')
 sources += files(
 	'eal.c',
 	'eal_debug.c',
+	'eal_hugepages.c',
 	'eal_lcore.c',
 	'eal_thread.c',
 	'getopt.c',
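For context (an illustrative sketch, not part of the patch): once ``SeLockMemoryPrivilege`` has been claimed as above, large pages can be allocated with VirtualAlloc, with the size rounded up to a multiple of GetLargePageMinimum(). The helper name `alloc_large_pages` is hypothetical.

#include <windows.h>

/* Allocate `size` bytes backed by large pages; assumes the privilege
 * has been granted and enabled for the current process token.
 */
static void *
alloc_large_pages(size_t size)
{
	size_t page = GetLargePageMinimum();

	if (page == 0)
		return NULL; /* large pages not supported */

	/* Round up; the large-page minimum is a power of two. */
	size = (size + page - 1) & ~(page - 1);
	return VirtualAlloc(NULL, size,
		MEM_RESERVE | MEM_COMMIT | MEM_LARGE_PAGES,
		PAGE_READWRITE);
}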
From patchwork Tue Apr 14 19:44:20 2020
X-Patchwork-Submitter: Dmitry Kozlyuk
X-Patchwork-Id: 68447
X-Patchwork-Delegate: thomas@monjalon.net
From: Dmitry Kozlyuk
To: dev@dpdk.org
Cc: "Dmitry Malloy (MESHCHANINOV)", Narcisa Ana Maria Vasile, Fady Bader, Tal Shnaiderman, Dmitry Kozlyuk, Harini Ramakrishnan, Omar Cardona, Pallavi Kadam, Ranjit Menon
Date: Tue, 14 Apr 2020 22:44:20 +0300
Message-Id: <20200414194426.1640704-6-dmitry.kozliuk@gmail.com>
In-Reply-To: <20200414194426.1640704-1-dmitry.kozliuk@gmail.com>
References: <20200410164342.1194634-1-dmitry.kozliuk@gmail.com> <20200414194426.1640704-1-dmitry.kozliuk@gmail.com>
Subject: [dpdk-dev] [PATCH v3 05/10] eal: introduce internal wrappers for file operations

EAL common code uses file locking and truncation. Introduce
OS-independent wrappers in order to support both Linux/FreeBSD
and Windows:

* eal_file_lock: lock or unlock an open file.
* eal_file_truncate: enforce a given size for an open file.

The wrappers follow POSIX semantics, but the interface is not POSIX,
so that it can be made cleaner, e.g. by not mixing the locking operation
with the behavior on conflict.

The implementation for Linux and FreeBSD is placed in the "unix"
subdirectory, which is intended for code common to the two. Files should
be named after the ones from which the code is factored in the OS
subdirectory.

Signed-off-by: Dmitry Kozlyuk
---
 lib/librte_eal/common/eal_private.h | 45 ++++++++++++++++
 lib/librte_eal/meson.build          |  4 ++
 lib/librte_eal/unix/eal.c           | 47 ++++++++++++++++
 lib/librte_eal/unix/meson.build     |  6 +++
 lib/librte_eal/windows/eal.c        | 83 +++++++++++++++++++++++++++++
 5 files changed, 185 insertions(+)
 create mode 100644 lib/librte_eal/unix/eal.c
 create mode 100644 lib/librte_eal/unix/meson.build
diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
index ddcfbe2e4..65d61ff13 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -443,4 +443,49 @@ rte_option_usage(void);
 uint64_t
 eal_get_baseaddr(void);
 
+/** File locking operation. */
+enum eal_flock_op {
+	EAL_FLOCK_SHARED,    /**< Acquire a shared lock. */
+	EAL_FLOCK_EXCLUSIVE, /**< Acquire an exclusive lock. */
+	EAL_FLOCK_UNLOCK     /**< Release a previously taken lock. */
+};
+
+/** Behavior on file locking conflict. */
+enum eal_flock_mode {
+	EAL_FLOCK_WAIT,  /**< Wait until the file gets unlocked to lock it. */
+	EAL_FLOCK_RETURN /**< Return immediately if the file is locked. */
+};
+
+/**
+ * Lock or unlock the file.
+ *
+ * On failure @code rte_errno @endcode is set to the error code
+ * specified by the POSIX flock(3) description.
+ *
+ * @param fd
+ *  Opened file descriptor.
+ * @param op
+ *  Operation to perform.
+ * @param mode
+ *  Behavior on conflict.
+ * @return
+ *  0 on success, (-1) on failure.
+ */
+int eal_file_lock(int fd, enum eal_flock_op op, enum eal_flock_mode mode);
+
+/**
+ * Truncate or extend the file to the specified size.
+ *
+ * On failure @code rte_errno @endcode is set to the error code
+ * specified by the POSIX ftruncate(3) description.
+ *
+ * @param fd
+ *  Opened file descriptor.
+ * @param size
+ *  Desired file size.
+ * @return
+ *  0 on success, (-1) on failure.
+ */ +int eal_file_truncate(int fd, ssize_t size); + #endif /* _EAL_PRIVATE_H_ */ diff --git a/lib/librte_eal/meson.build b/lib/librte_eal/meson.build index 9d219a0e6..1f89efb88 100644 --- a/lib/librte_eal/meson.build +++ b/lib/librte_eal/meson.build @@ -6,6 +6,10 @@ subdir('include') subdir('common') +if not is_windows + subdir('unix') +endif + dpdk_conf.set('RTE_EXEC_ENV_' + exec_env.to_upper(), 1) subdir(exec_env) diff --git a/lib/librte_eal/unix/eal.c b/lib/librte_eal/unix/eal.c new file mode 100644 index 000000000..a337b59b1 --- /dev/null +++ b/lib/librte_eal/unix/eal.c @@ -0,0 +1,47 @@ +#include +#include +#include + +#include + +#include "eal_private.h" + +int +eal_file_truncate(int fd, ssize_t size) +{ + int ret; + + ret = ftruncate(fd, size); + if (ret) + rte_errno = errno; + + return ret; +} + +int +eal_file_lock(int fd, enum eal_flock_op op, enum eal_flock_mode mode) +{ + int sys_flags = 0; + int ret; + + if (mode == EAL_FLOCK_RETURN) + sys_flags |= LOCK_NB; + + switch (op) { + case EAL_FLOCK_EXCLUSIVE: + sys_flags |= LOCK_EX; + break; + case EAL_FLOCK_SHARED: + sys_flags |= LOCK_SH; + break; + case EAL_FLOCK_UNLOCK: + sys_flags |= LOCK_UN; + break; + } + + ret = flock(fd, sys_flags); + if (ret) + rte_errno = errno; + + return ret; +} diff --git a/lib/librte_eal/unix/meson.build b/lib/librte_eal/unix/meson.build new file mode 100644 index 000000000..13564838e --- /dev/null +++ b/lib/librte_eal/unix/meson.build @@ -0,0 +1,6 @@ +# SPDX-License-Identifier: BSD-3-Clause +# Copyright(c) 2020 Dmitry Kozlyuk + +sources += files( + 'eal.c', +) diff --git a/lib/librte_eal/windows/eal.c b/lib/librte_eal/windows/eal.c index 63461f51a..9dba895e7 100644 --- a/lib/librte_eal/windows/eal.c +++ b/lib/librte_eal/windows/eal.c @@ -224,6 +224,89 @@ rte_eal_init_alert(const char *msg) RTE_LOG(ERR, EAL, "%s\n", msg); } +int +eal_file_truncate(int fd, ssize_t size) +{ + HANDLE handle; + DWORD ret; + LONG low = (LONG)((size_t)size); + LONG high = (LONG)((size_t)size >> 32); + + handle = (HANDLE)_get_osfhandle(fd); + if (handle == INVALID_HANDLE_VALUE) { + rte_errno = EBADF; + return -1; + } + + ret = SetFilePointer(handle, low, &high, FILE_BEGIN); + if (ret == INVALID_SET_FILE_POINTER) { + RTE_LOG_WIN32_ERR("SetFilePointer()"); + rte_errno = EINVAL; + return -1; + } + + return 0; +} + +static int +lock_file(HANDLE handle, enum eal_flock_op op, enum eal_flock_mode mode) +{ + DWORD sys_flags = 0; + OVERLAPPED overlapped; + + if (op == EAL_FLOCK_EXCLUSIVE) + sys_flags |= LOCKFILE_EXCLUSIVE_LOCK; + if (mode == EAL_FLOCK_RETURN) + sys_flags |= LOCKFILE_FAIL_IMMEDIATELY; + + memset(&overlapped, 0, sizeof(overlapped)); + if (!LockFileEx(handle, sys_flags, 0, 0, 0, &overlapped)) { + if ((sys_flags & LOCKFILE_FAIL_IMMEDIATELY) && + (GetLastError() == ERROR_IO_PENDING)) { + rte_errno = EWOULDBLOCK; + } else { + RTE_LOG_WIN32_ERR("LockFileEx()"); + rte_errno = EINVAL; + } + return -1; + } + + return 0; +} + +static int +unlock_file(HANDLE handle) +{ + if (!UnlockFileEx(handle, 0, 0, 0, NULL)) { + RTE_LOG_WIN32_ERR("UnlockFileEx()"); + rte_errno = EINVAL; + return -1; + } + return 0; +} + +int +eal_file_lock(int fd, enum eal_flock_op op, enum eal_flock_mode mode) +{ + HANDLE handle = (HANDLE)_get_osfhandle(fd); + + if (handle == INVALID_HANDLE_VALUE) { + rte_errno = EBADF; + return -1; + } + + switch (op) { + case EAL_FLOCK_EXCLUSIVE: + case EAL_FLOCK_SHARED: + return lock_file(handle, op, mode); + case EAL_FLOCK_UNLOCK: + return unlock_file(handle); + default: + rte_errno = EINVAL; + return -1; + } +} + /* 
Launch threads, called at application init(). */ int rte_eal_init(int argc, char **argv) From patchwork Tue Apr 14 19:44:21 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dmitry Kozlyuk X-Patchwork-Id: 68448 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id 42ADBA0577; Tue, 14 Apr 2020 21:45:40 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id CC9B71D44F; Tue, 14 Apr 2020 21:44:54 +0200 (CEST) Received: from mail-lf1-f66.google.com (mail-lf1-f66.google.com [209.85.167.66]) by dpdk.org (Postfix) with ESMTP id DDF271D41F for ; Tue, 14 Apr 2020 21:44:49 +0200 (CEST) Received: by mail-lf1-f66.google.com with SMTP id f8so704721lfe.12 for ; Tue, 14 Apr 2020 12:44:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=I81Zpr8DhSudc6XjZRMGewQbKbXeKv7t0wqUCZDaq6w=; b=kld3l7oxWi29L8qjz4oy1Ub0aSkgroFmdPpj5EppHcUt0ZHwwnxHPGcY/z8hwOyOXu 6VsWfZ5QZR7uVuxvRi4F+exFntaoqqx5fOlHQV0yjLkwpfBBD/bcJ82qI/RaZ/g0XEe0 MWqOyisUDUZelP/aUjdlWhrWHRWjkZPl0HnQI4cUIs7u0i4iUwRzxMi+Q6jgECb5X1+0 +WyaE5OGa6NiitbKeAerx6AMr1uizdpH8O0rxNmqXyZZ1goSpi1wRuE8Y9bB3QelBZQN tZr9UHd1CRj4vv5UQsDJOkyzHxprK3imjbj5S9la6MH3b/ImE5W2I/7o4MDNQvqvmBXK /Z9Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=I81Zpr8DhSudc6XjZRMGewQbKbXeKv7t0wqUCZDaq6w=; b=r/IAA26Ngki8GYmA+44yFgze5WGeNutL8zwPyHAxk//M0S6K8kxCjA/C1LtwReE0xO GWSz7QKHahLwff0Iq/JuqvpltPc6+jX++m/CNyyZ/7SmmMzCtwjcptlogudL4L/pL5zh s4kdsjMChC1CHtUCfsmJrSksocyHmyXR14oh0hzXyaemUyUHZgF+ay+LHDc673bc0WMg 0t9edM6ntpjsr81mvs/ZNLB5wKE9WPZDpOJ7LGD9neveoqYzYajNABjwj3lZ7HShbWLj odCj2FJNbl1cVlrDQR/zmUxhxLO9jQbccQrOC8dokmfERQ0QSyuIpaD4sEFgLmD1Eish Z/rw== X-Gm-Message-State: AGi0PuZwpOfctqMop9YmqbpZ3vnEslWYkLSekNBnGUhyTreA6zOClrTt TlFvkSBB7wHzrrz9X35ok27P9JW5tiUguQ== X-Google-Smtp-Source: APiQypLB4TETmCtvrRH2Cv2xpS0iQlIBt6lebsYX+uv/lMAjknmt+DCBF104n8XF1TlRdwAfKbGKLQ== X-Received: by 2002:ac2:4d10:: with SMTP id r16mr828441lfi.180.1586893488495; Tue, 14 Apr 2020 12:44:48 -0700 (PDT) Received: from localhost.localdomain (broadband-37-110-65-23.ip.moscow.rt.ru. 
[37.110.65.23]) by smtp.googlemail.com with ESMTPSA id h14sm11272426lfm.60.2020.04.14.12.44.47 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 14 Apr 2020 12:44:47 -0700 (PDT) From: Dmitry Kozlyuk To: dev@dpdk.org Cc: "Dmitry Malloy (MESHCHANINOV)" , Narcisa Ana Maria Vasile , Fady Bader , Tal Shnaiderman , Dmitry Kozlyuk , Thomas Monjalon , Anatoly Burakov , Harini Ramakrishnan , Omar Cardona , Pallavi Kadam , Ranjit Menon Date: Tue, 14 Apr 2020 22:44:21 +0300 Message-Id: <20200414194426.1640704-7-dmitry.kozliuk@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200414194426.1640704-1-dmitry.kozliuk@gmail.com> References: <20200410164342.1194634-1-dmitry.kozliuk@gmail.com> <20200414194426.1640704-1-dmitry.kozliuk@gmail.com> MIME-Version: 1.0 Subject: [dpdk-dev] [PATCH v3 06/10] eal: introduce memory management wrappers X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev"
System memory management is implemented differently for POSIX and Windows. Introduce wrapper functions for operations used across DPDK:
* rte_mem_map() Create memory mapping for a regular file or a page file (swap). This supports mapping to a reserved memory region even on Windows.
* rte_mem_unmap() Remove mapping created with rte_mem_map().
* rte_get_page_size() Obtain default system page size.
* rte_mem_lock() Make arbitrary-sized memory region non-swappable.
Wrappers follow POSIX semantics limited to DPDK tasks, but their signatures deliberately differ from POSIX ones to be safer and more expressive.
Signed-off-by: Dmitry Kozlyuk --- config/meson.build | 10 +- lib/librte_eal/common/eal_private.h | 51 +++- lib/librte_eal/include/rte_memory.h | 68 +++++ lib/librte_eal/rte_eal_exports.def | 4 + lib/librte_eal/rte_eal_version.map | 4 + lib/librte_eal/unix/eal_memory.c | 113 +++++++ lib/librte_eal/unix/meson.build | 1 + lib/librte_eal/windows/eal.c | 6 + lib/librte_eal/windows/eal_memory.c | 437 +++++++++++++++++++++++++++ lib/librte_eal/windows/eal_windows.h | 67 ++++ lib/librte_eal/windows/meson.build | 1 + 11 files changed, 758 insertions(+), 4 deletions(-) create mode 100644 lib/librte_eal/unix/eal_memory.c create mode 100644 lib/librte_eal/windows/eal_memory.c
diff --git a/config/meson.build b/config/meson.build index 4607655d9..bceb5ef7b 100644 --- a/config/meson.build +++ b/config/meson.build @@ -256,14 +256,20 @@ if is_freebsd endif if is_windows - # Minimum supported API is Windows 7. - add_project_arguments('-D_WIN32_WINNT=0x0601', language: 'c') + # VirtualAlloc2() is available since Windows 10 / Server 2016. + add_project_arguments('-D_WIN32_WINNT=0x0A00', language: 'c') # Use MinGW-w64 stdio, because DPDK assumes ANSI-compliant formatting. if cc.get_id() == 'gcc' add_project_arguments('-D__USE_MINGW_ANSI_STDIO', language: 'c') endif + # Contrary to docs, VirtualAlloc2() is exported by mincore.lib + # in Windows SDK, while MinGW exports it from advapi32.a.
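+ # (When the toolchain headers predate VirtualAlloc2(), the function is
+ # instead loaded at run time from kernelbase.dll by
+ # eal_mem_win32api_init() in lib/librte_eal/windows/eal_memory.c.)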
+ if is_ms_linker + add_project_link_arguments('-lmincore', language: 'c') + endif + add_project_link_arguments('-ladvapi32', language: 'c') endif diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h index 65d61ff13..1e89338f2 100644 --- a/lib/librte_eal/common/eal_private.h +++ b/lib/librte_eal/common/eal_private.h @@ -11,6 +11,7 @@ #include #include +#include /** * Structure storing internal configuration (per-lcore) @@ -202,6 +203,16 @@ int rte_eal_alarm_init(void); */ int rte_eal_check_module(const char *module_name); +/** + * Memory reservation flags. + */ +enum eal_mem_reserve_flags { + /**< Reserve hugepages (support may be limited or missing). */ + EAL_RESERVE_HUGEPAGES = 1 << 0, + /**< Fail if requested address is not available. */ + EAL_RESERVE_EXACT_ADDRESS = 1 << 1 +}; + /** * Get virtual area of specified size from the OS. * @@ -232,8 +243,8 @@ int rte_eal_check_module(const char *module_name); #define EAL_VIRTUAL_AREA_UNMAP (1 << 2) /**< immediately unmap reserved virtual area. */ void * -eal_get_virtual_area(void *requested_addr, size_t *size, - size_t page_sz, int flags, int mmap_flags); +eal_get_virtual_area(void *requested_addr, size_t *size, size_t page_sz, + int flags, int mmap_flags); /** * Get cpu core_id. @@ -488,4 +499,40 @@ int eal_file_lock(int fd, enum eal_flock_op op, enum eal_flock_mode mode); */ int eal_file_truncate(int fd, ssize_t size); +/** + * Reserve a region of virtual memory. + * + * Use eal_mem_free() to free reserved memory. + * + * @param requested_addr + * A desired reservation address. The system may not respect it. + * NULL means the address will be chosen by the system. + * @param size + * Reservation size. Must be a multiple of system page size. + * @param flags + * Reservation options. + * @returns + * Starting address of the reserved area on success, NULL on failure. + * Callers must not access this memory until remapping it. + */ +void *eal_mem_reserve(void *requested_addr, size_t size, + enum eal_mem_reserve_flags flags); + +/** + * Free memory obtained by eal_mem_reserve() or eal_mem_alloc(). + * + * If @code virt @endcode and @code size @endcode describe a part of the + * reserved region, only this part of the region is freed (accurately + * up to the system page size). If @code virt @endcode points to allocated + * memory, @code size @endcode must match the one specified on allocation. + * The behavior is undefined if the memory pointed by @code virt @endcode + * is obtained from another source than listed above. + * + * @param virt + * A virtual address in a region previously reserved. + * @param size + * Number of bytes to unreserve. + */ +void eal_mem_free(void *virt, size_t size); + #endif /* _EAL_PRIVATE_H_ */ diff --git a/lib/librte_eal/include/rte_memory.h b/lib/librte_eal/include/rte_memory.h index 3d8d0bd69..1b7c3e5df 100644 --- a/lib/librte_eal/include/rte_memory.h +++ b/lib/librte_eal/include/rte_memory.h @@ -85,6 +85,74 @@ struct rte_memseg_list { struct rte_fbarray memseg_arr; }; +/** + * Memory protection flags. + */ +enum rte_mem_prot { + RTE_PROT_READ = 1 << 0, /**< Read access. */ + RTE_PROT_WRITE = 1 << 1, /**< Write access. */ + RTE_PROT_EXECUTE = 1 << 2 /**< Code execution. */ +}; + +/** + * Memory mapping additional flags. + * + * In Linux and FreeBSD, each flag is semantically equivalent + * to OS-specific mmap(3) flag with the same or similar name. + * In Windows, POSIX and MAP_ANONYMOUS semantics are followed. 
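+ *
+ * Illustrative combination (a sketch only; @code fd @endcode is an open
+ * descriptor owned by the caller, @code len @endcode the mapping size):
+ * @code
+ *	void *va = rte_mem_map(NULL, len, RTE_PROT_READ | RTE_PROT_WRITE,
+ *		RTE_MAP_SHARED, fd, 0);
+ *	if (va == NULL)
+ *		rte_panic("cannot map: %s\n", rte_strerror(rte_errno));
+ * @endcode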
+ */ +enum rte_map_flags { + /** Changes of mapped memory are visible to other processes. */ + RTE_MAP_SHARED = 1 << 0, + /** Mapping is not backed by a regular file. */ + RTE_MAP_ANONYMOUS = 1 << 1, + /** Copy-on-write mapping, changes are invisible to other processes. */ + RTE_MAP_PRIVATE = 1 << 2, + /** Fail if requested address cannot be taken. */ + RTE_MAP_FIXED = 1 << 3 +}; + +/** + * OS-independent implementation of POSIX mmap(3) + * with MAP_ANONYMOUS Linux/FreeBSD extension. + */ +__rte_experimental +void *rte_mem_map(void *requested_addr, size_t size, enum rte_mem_prot prot, + enum rte_map_flags flags, int fd, size_t offset); + +/** + * OS-independent implementation of POSIX munmap(3). + */ +__rte_experimental +int rte_mem_unmap(void *virt, size_t size); + +/** + * Get system page size. This function never failes. + * + * @return + * Positive page size in bytes. + */ +__rte_experimental +int rte_get_page_size(void); + +/** + * Lock region in physical memory and prevent it from swapping. + * + * @param virt + * The virtual address. + * @param size + * Size of the region. + * @return + * 0 on success, negative on error. + * + * @note Implementations may require @p virt and @p size to be multiples + * of system page size. + * @see rte_get_page_size() + * @see rte_mem_lock_page() + */ +__rte_experimental +int rte_mem_lock(const void *virt, size_t size); + /** * Lock page in physical memory and prevent from swapping. * diff --git a/lib/librte_eal/rte_eal_exports.def b/lib/librte_eal/rte_eal_exports.def index 12a6c79d6..bacf9a107 100644 --- a/lib/librte_eal/rte_eal_exports.def +++ b/lib/librte_eal/rte_eal_exports.def @@ -5,5 +5,9 @@ EXPORTS rte_eal_mp_remote_launch rte_eal_mp_wait_lcore rte_eal_remote_launch + rte_get_page_size rte_log + rte_mem_lock + rte_mem_map + rte_mem_unmap rte_vlog diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map index f9ede5b41..07128898f 100644 --- a/lib/librte_eal/rte_eal_version.map +++ b/lib/librte_eal/rte_eal_version.map @@ -337,5 +337,9 @@ EXPERIMENTAL { rte_thread_is_intr; # added in 20.05 + rte_get_page_size; rte_log_can_log; + rte_mem_lock; + rte_mem_map; + rte_mem_unmap; }; diff --git a/lib/librte_eal/unix/eal_memory.c b/lib/librte_eal/unix/eal_memory.c new file mode 100644 index 000000000..6bd087d94 --- /dev/null +++ b/lib/librte_eal/unix/eal_memory.c @@ -0,0 +1,113 @@ +#include +#include +#include + +#include +#include +#include + +#include "eal_private.h" + +static void * +mem_map(void *requested_addr, size_t size, int prot, int flags, + int fd, size_t offset) +{ + void *virt = mmap(requested_addr, size, prot, flags, fd, offset); + if (virt == MAP_FAILED) { + RTE_LOG(ERR, EAL, + "Cannot mmap(%p, 0x%zx, 0x%x, 0x%x, %d, 0x%zx): %s\n", + requested_addr, size, prot, flags, fd, offset, + strerror(errno)); + rte_errno = errno; + return NULL; + } + return virt; +} + +static int +mem_unmap(void *virt, size_t size) +{ + int ret = munmap(virt, size); + if (ret < 0) { + RTE_LOG(ERR, EAL, "Cannot munmap(%p, 0x%zx): %s\n", + virt, size, strerror(errno)); + rte_errno = errno; + } + return ret; +} + +void * +eal_mem_reserve(void *requested_addr, size_t size, + enum eal_mem_reserve_flags flags) +{ + int sys_flags = MAP_PRIVATE | MAP_ANONYMOUS; + +#ifdef MAP_HUGETLB + if (flags & EAL_RESERVE_HUGEPAGES) + sys_flags |= MAP_HUGETLB; +#endif + if (flags & EAL_RESERVE_EXACT_ADDRESS) + sys_flags |= MAP_FIXED; + + return mem_map(requested_addr, size, PROT_NONE, sys_flags, -1, 0); +} + +void +eal_mem_free(void *virt, size_t size) +{ + 
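+	/* Unix note: unreserving is plain munmap(); the kernel splits
+	 * mappings at page granularity, so freeing only a part of a
+	 * reservation behaves as documented for eal_mem_free().
+	 */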
mem_unmap(virt, size); +} + +static int +mem_rte_to_sys_prot(enum rte_mem_prot prot) +{ + int sys_prot = 0; + + if (prot & RTE_PROT_READ) + sys_prot |= PROT_READ; + if (prot & RTE_PROT_WRITE) + sys_prot |= PROT_WRITE; + if (prot & RTE_PROT_EXECUTE) + sys_prot |= PROT_EXEC; + + return sys_prot; +} + +void * +rte_mem_map(void *requested_addr, size_t size, enum rte_mem_prot prot, + enum rte_map_flags flags, int fd, size_t offset) +{ + int sys_prot = 0; + int sys_flags = 0; + + sys_prot = mem_rte_to_sys_prot(prot); + + if (flags & RTE_MAP_SHARED) + sys_flags |= MAP_SHARED; + if (flags & RTE_MAP_ANONYMOUS) + sys_flags |= MAP_ANONYMOUS; + if (flags & RTE_MAP_PRIVATE) + sys_flags |= MAP_PRIVATE; + if (flags & RTE_MAP_FIXED) + sys_flags |= MAP_FIXED; + + return mem_map(requested_addr, size, sys_prot, sys_flags, fd, offset); +} + +int +rte_mem_unmap(void *virt, size_t size) +{ + return mem_unmap(virt, size); +} + +int +rte_get_page_size(void) +{ + return getpagesize(); +} + +int +rte_mem_lock(const void *virt, size_t size) +{ + return mlock(virt, size); +} diff --git a/lib/librte_eal/unix/meson.build b/lib/librte_eal/unix/meson.build index 13564838e..50c019a56 100644 --- a/lib/librte_eal/unix/meson.build +++ b/lib/librte_eal/unix/meson.build @@ -3,4 +3,5 @@ sources += files( 'eal.c', + 'eal_memory.c', ) diff --git a/lib/librte_eal/windows/eal.c b/lib/librte_eal/windows/eal.c index 9dba895e7..cf55b56da 100644 --- a/lib/librte_eal/windows/eal.c +++ b/lib/librte_eal/windows/eal.c @@ -339,6 +339,12 @@ rte_eal_init(int argc, char **argv) internal_config.memory = MEMSIZE_IF_NO_HUGE_PAGE; } + if (eal_mem_win32api_init() < 0) { + rte_eal_init_alert("Cannot access Win32 memory management"); + rte_errno = ENOTSUP; + return -1; + } + eal_thread_init_master(rte_config.master_lcore); RTE_LCORE_FOREACH_SLAVE(i) { diff --git a/lib/librte_eal/windows/eal_memory.c b/lib/librte_eal/windows/eal_memory.c new file mode 100644 index 000000000..5697187ce --- /dev/null +++ b/lib/librte_eal/windows/eal_memory.c @@ -0,0 +1,437 @@ +#include + +#include +#include + +#include "eal_private.h" +#include "eal_windows.h" + +/* MinGW-w64 headers lack VirtualAlloc2() in some distributions. + * Provide a copy of definitions and code to load it dynamically. + * Note: definitions are copied verbatim from Microsoft documentation + * and don't follow DPDK code style. 
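+ *
+ * When the headers do provide MEM_PRESERVE_PLACEHOLDER, everything
+ * under this #ifndef compiles out and the no-op stub of
+ * eal_mem_win32api_init() in the #else branch is used instead.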
+ */ +#ifndef MEM_PRESERVE_PLACEHOLDER + +/* https://docs.microsoft.com/en-us/windows/win32/api/winnt/ne-winnt-mem_extended_parameter_type */ +typedef enum MEM_EXTENDED_PARAMETER_TYPE { + MemExtendedParameterInvalidType, + MemExtendedParameterAddressRequirements, + MemExtendedParameterNumaNode, + MemExtendedParameterPartitionHandle, + MemExtendedParameterMax, + MemExtendedParameterUserPhysicalHandle, + MemExtendedParameterAttributeFlags +} *PMEM_EXTENDED_PARAMETER_TYPE; + +#define MEM_EXTENDED_PARAMETER_TYPE_BITS 4 + +/* https://docs.microsoft.com/en-us/windows/win32/api/winnt/ns-winnt-mem_extended_parameter */ +typedef struct MEM_EXTENDED_PARAMETER { + struct { + DWORD64 Type : MEM_EXTENDED_PARAMETER_TYPE_BITS; + DWORD64 Reserved : 64 - MEM_EXTENDED_PARAMETER_TYPE_BITS; + } DUMMYSTRUCTNAME; + union { + DWORD64 ULong64; + PVOID Pointer; + SIZE_T Size; + HANDLE Handle; + DWORD ULong; + } DUMMYUNIONNAME; +} MEM_EXTENDED_PARAMETER, *PMEM_EXTENDED_PARAMETER; + +/* https://docs.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-virtualalloc2 */ +typedef PVOID (*VirtualAlloc2_type)( + HANDLE Process, + PVOID BaseAddress, + SIZE_T Size, + ULONG AllocationType, + ULONG PageProtection, + MEM_EXTENDED_PARAMETER *ExtendedParameters, + ULONG ParameterCount +); + +/* VirtualAlloc2() flags. */ +#define MEM_COALESCE_PLACEHOLDERS 0x00000001 +#define MEM_PRESERVE_PLACEHOLDER 0x00000002 +#define MEM_REPLACE_PLACEHOLDER 0x00004000 +#define MEM_RESERVE_PLACEHOLDER 0x00040000 + +/* Named exactly as the function, so that user code does not depend + * on it being found at compile time or dynamically. + */ +static VirtualAlloc2_type VirtualAlloc2; + +int +eal_mem_win32api_init(void) +{ + static const char library_name[] = "kernelbase.dll"; + static const char function[] = "VirtualAlloc2"; + + OSVERSIONINFO info; + HMODULE library = NULL; + int ret = 0; + + /* Already done. */ + if (VirtualAlloc2 != NULL) + return 0; + + /* IsWindows10OrGreater() may also be unavailable. */ + memset(&info, 0, sizeof(info)); + info.dwOSVersionInfoSize = sizeof(info); + GetVersionEx(&info); + + /* Checking for Windows 10+ will also detect Windows Server 2016+. + * Do not abort, because Windows may report false version depending + * on executable manifest, compatibility mode, etc. + */ + if (info.dwMajorVersion < 10) + RTE_LOG(DEBUG, EAL, "Windows 10+ or Windows Server 2016+ " + "is required for advanced memory features\n"); + + library = LoadLibraryA(library_name); + if (library == NULL) { + RTE_LOG_WIN32_ERR("LoadLibraryA(\"%s\")", library_name); + return -1; + } + + VirtualAlloc2 = (VirtualAlloc2_type)( + (void *)GetProcAddress(library, function)); + if (VirtualAlloc2 == NULL) { + RTE_LOG_WIN32_ERR("GetProcAddress(\"%s\", \"%s\")\n", + library_name, function); + ret = -1; + } + + FreeLibrary(library); + + return ret; +} + +#else + +/* Stub in case VirtualAlloc2() is provided by the compiler. */ +int +eal_mem_win32api_init(void) +{ + return 0; +} + +#endif /* no VirtualAlloc2() */ + +/* Approximate error mapping from VirtualAlloc2() to POSIX mmap(3). */ +static void +set_errno_from_win32_alloc_error(DWORD code) +{ + switch (code) { + case ERROR_SUCCESS: + rte_errno = 0; + break; + + case ERROR_INVALID_ADDRESS: + /* A valid requested address is not available. */ + case ERROR_COMMITMENT_LIMIT: + /* May occcur when committing regular memory. */ + case ERROR_NO_SYSTEM_RESOURCES: + /* Occurs when the system runs out of hugepages. 
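+	 * All three cases in this group map to ENOMEM, mirroring what
+	 * POSIX mmap() reports in the analogous out-of-resources cases.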
*/ + rte_errno = ENOMEM; + break; + + case ERROR_INVALID_PARAMETER: + default: + rte_errno = EINVAL; + break; + } +} + +void * +eal_mem_reserve(void *requested_addr, size_t size, + enum eal_mem_reserve_flags flags) +{ + void *virt; + + /* Windows requires hugepages to be committed. */ + if (flags & EAL_RESERVE_HUGEPAGES) { + RTE_LOG(ERR, EAL, "Hugepage reservation is not supported\n"); + rte_errno = ENOTSUP; + return NULL; + } + + virt = VirtualAlloc2(GetCurrentProcess(), requested_addr, size, + MEM_RESERVE | MEM_RESERVE_PLACEHOLDER, PAGE_NOACCESS, + NULL, 0); + if (virt == NULL) { + DWORD err = GetLastError(); + RTE_LOG_WIN32_ERR("VirtualAlloc2()"); + set_errno_from_win32_alloc_error(err); + } + + if ((flags & EAL_RESERVE_EXACT_ADDRESS) && (virt != requested_addr)) { + if (!VirtualFree(virt, 0, MEM_RELEASE)) + RTE_LOG_WIN32_ERR("VirtualFree()"); + rte_errno = ENOMEM; + return NULL; + } + + return virt; +} + +void * +eal_mem_alloc(size_t size, enum rte_page_sizes page_size) +{ + if (page_size != 0) + return eal_mem_alloc_socket(size, SOCKET_ID_ANY); + + return VirtualAlloc( + NULL, size, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE); +} + +void * +eal_mem_alloc_socket(size_t size, int socket_id) +{ + DWORD flags = MEM_RESERVE | MEM_COMMIT; + void *addr; + + flags = MEM_RESERVE | MEM_COMMIT | MEM_LARGE_PAGES; + addr = VirtualAllocExNuma(GetCurrentProcess(), NULL, size, flags, + PAGE_READWRITE, eal_socket_numa_node(socket_id)); + if (addr == NULL) + rte_errno = ENOMEM; + return addr; +} + +void* +eal_mem_commit(void *requested_addr, size_t size, int socket_id) +{ + MEM_EXTENDED_PARAMETER param; + DWORD param_count = 0; + DWORD flags; + void *addr; + + if (requested_addr != NULL) { + MEMORY_BASIC_INFORMATION info; + if (VirtualQuery(requested_addr, &info, sizeof(info)) == 0) { + RTE_LOG_WIN32_ERR("VirtualQuery()"); + return NULL; + } + + /* Split reserved region if only a part is committed. */ + flags = MEM_RELEASE | MEM_PRESERVE_PLACEHOLDER; + if ((info.RegionSize > size) && + !VirtualFree(requested_addr, size, flags)) { + RTE_LOG_WIN32_ERR("VirtualFree(%p, %zu, " + ")", requested_addr, size); + return NULL; + } + } + + if (socket_id != SOCKET_ID_ANY) { + param_count = 1; + memset(¶m, 0, sizeof(param)); + param.Type = MemExtendedParameterNumaNode; + param.ULong = eal_socket_numa_node(socket_id); + } + + flags = MEM_RESERVE | MEM_COMMIT | MEM_LARGE_PAGES; + if (requested_addr != NULL) + flags |= MEM_REPLACE_PLACEHOLDER; + + addr = VirtualAlloc2(GetCurrentProcess(), requested_addr, size, + flags, PAGE_READWRITE, ¶m, param_count); + if (addr == NULL) { + DWORD err = GetLastError(); + RTE_LOG_WIN32_ERR("VirtualAlloc2(%p, %zu, " + ")", addr, size); + set_errno_from_win32_alloc_error(err); + return NULL; + } + + return addr; +} + +int +eal_mem_decommit(void *addr, size_t size) +{ + if (!VirtualFree(addr, size, MEM_RELEASE | MEM_PRESERVE_PLACEHOLDER)) { + RTE_LOG_WIN32_ERR("VirtualFree(%p, %zu, ...)", addr, size); + return -1; + } + return 0; +} + +/** + * Free a reserved memory region in full or in part. + * + * @param addr + * Starting address of the area to free. + * @param size + * Number of bytes to free. Must be a multiple of page size. + * @param reserved + * Fail if the region is not in reserved state. + * @return + * * 0 on successful deallocation; + * * 1 if region mut be in reserved state but it is not; + * * (-1) on system API failures. 
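+ *
+ * (Used by eal_mem_free() with @code reserved @endcode == false and by
+ * rte_mem_map() below with @code reserved @endcode == true, where the
+ * target region must still be a placeholder reservation.)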
+ */ +static int +mem_free(void *addr, size_t size, bool reserved) +{ + MEMORY_BASIC_INFORMATION info; + if (VirtualQuery(addr, &info, sizeof(info)) == 0) { + RTE_LOG_WIN32_ERR("VirtualQuery()"); + return -1; + } + + if (reserved && (info.State != MEM_RESERVE)) + return 1; + + /* Free complete region. */ + if ((addr == info.AllocationBase) && (size == info.RegionSize)) { + if (!VirtualFree(addr, 0, MEM_RELEASE)) { + RTE_LOG_WIN32_ERR("VirtualFree(%p, 0, MEM_RELEASE)", + addr); + } + return 0; + } + + /* Split the part to be freed and the remaining reservation. */ + if (!VirtualFree(addr, size, MEM_RELEASE | MEM_PRESERVE_PLACEHOLDER)) { + RTE_LOG_WIN32_ERR("VirtualFree(%p, %zu, " + "MEM_RELEASE | MEM_PRESERVE_PLACEHOLDER)", addr, size); + return -1; + } + + /* Actually free reservation part. */ + if (!VirtualFree(addr, 0, MEM_RELEASE)) { + RTE_LOG_WIN32_ERR("VirtualFree(%p, 0, MEM_RELEASE)", addr); + return -1; + } + + return 0; +} + +void +eal_mem_free(void *virt, size_t size) +{ + mem_free(virt, size, false); +} + +void * +rte_mem_map(void *requested_addr, size_t size, enum rte_mem_prot prot, + enum rte_map_flags flags, int fd, size_t offset) +{ + HANDLE file_handle = INVALID_HANDLE_VALUE; + HANDLE mapping_handle = INVALID_HANDLE_VALUE; + DWORD sys_prot = 0; + DWORD sys_access = 0; + DWORD size_high = (DWORD)(size >> 32); + DWORD size_low = (DWORD)size; + DWORD offset_high = (DWORD)(offset >> 32); + DWORD offset_low = (DWORD)offset; + LPVOID virt = NULL; + + if (prot & RTE_PROT_EXECUTE) { + if (prot & RTE_PROT_READ) { + sys_prot = PAGE_EXECUTE_READ; + sys_access = FILE_MAP_READ | FILE_MAP_EXECUTE; + } + if (prot & RTE_PROT_WRITE) { + sys_prot = PAGE_EXECUTE_READWRITE; + sys_access = FILE_MAP_WRITE | FILE_MAP_EXECUTE; + } + } else { + if (prot & RTE_PROT_READ) { + sys_prot = PAGE_READONLY; + sys_access = FILE_MAP_READ; + } + if (prot & RTE_PROT_WRITE) { + sys_prot = PAGE_READWRITE; + sys_access = FILE_MAP_WRITE; + } + } + + if (flags & RTE_MAP_PRIVATE) + sys_access |= FILE_MAP_COPY; + + if ((flags & RTE_MAP_ANONYMOUS) == 0) + file_handle = (HANDLE)_get_osfhandle(fd); + + mapping_handle = CreateFileMapping( + file_handle, NULL, sys_prot, size_high, size_low, NULL); + if (mapping_handle == INVALID_HANDLE_VALUE) { + RTE_LOG_WIN32_ERR("CreateFileMapping()"); + return NULL; + } + + /* TODO: there is a race for the requested_addr between mem_free() + * and MapViewOfFileEx(). MapViewOfFile3() that can replace a reserved + * region with a mapping in a single operation, but it does not support + * private mappings. 
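+	 * Until then there is a window in which another thread can grab
+	 * the just-released placeholder, so mapping at a fixed address
+	 * should be treated as best-effort by callers.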
+ */ + if (requested_addr != NULL) { + int ret = mem_free(requested_addr, size, true); + if (ret) { + if (ret > 0) { + RTE_LOG(ERR, EAL, "Cannot map memory " + "to a region not reserved\n"); + rte_errno = EADDRNOTAVAIL; + } + return NULL; + } + } + + virt = MapViewOfFileEx(mapping_handle, sys_access, + offset_high, offset_low, size, requested_addr); + if (!virt) { + RTE_LOG_WIN32_ERR("MapViewOfFileEx()"); + return NULL; + } + + if ((flags & RTE_MAP_FIXED) && (virt != requested_addr)) { + BOOL ret = UnmapViewOfFile(virt); + virt = NULL; + if (!ret) + RTE_LOG_WIN32_ERR("UnmapViewOfFile()"); + } + + if (!CloseHandle(mapping_handle)) + RTE_LOG_WIN32_ERR("CloseHandle()"); + + return virt; +} + +int +rte_mem_unmap(void *virt, size_t size) +{ + RTE_SET_USED(size); + + if (!UnmapViewOfFile(virt)) { + rte_errno = GetLastError(); + RTE_LOG_WIN32_ERR("UnmapViewOfFile()"); + return -1; + } + return 0; +} + +int +rte_get_page_size(void) +{ + SYSTEM_INFO info; + GetSystemInfo(&info); + return info.dwPageSize; +} + +int +rte_mem_lock(const void *virt, size_t size) +{ + /* VirtualLock() takes `void*`, work around compiler warning. */ + void *addr = (void *)((uintptr_t)virt); + + if (!VirtualLock(addr, size)) { + RTE_LOG_WIN32_ERR("VirtualLock()"); + return -1; + } + + return 0; +} diff --git a/lib/librte_eal/windows/eal_windows.h b/lib/librte_eal/windows/eal_windows.h index 390d2fd66..b202a1aa5 100644 --- a/lib/librte_eal/windows/eal_windows.h +++ b/lib/librte_eal/windows/eal_windows.h @@ -36,4 +36,71 @@ int eal_thread_create(pthread_t *thread); */ unsigned int eal_socket_numa_node(unsigned int socket_id); +/** + * Locate Win32 memory management routines in system libraries. + * + * @return 0 on success, (-1) on failure. + */ +int eal_mem_win32api_init(void); + +/** + * Allocate a contiguous chunk of virtual memory. + * + * Use eal_mem_free() to free allocated memory. + * + * @param size + * Number of bytes to allocate. + * @param page_size + * If non-zero, means memory must be allocated in hugepages + * of the specified size. The @code size @endcode parameter + * must then be a multiple of the largest hugepage size requested. + * @return + * Address of allocated memory or NULL on failure (rte_errno is set). + */ +void *eal_mem_alloc(size_t size, enum rte_page_sizes page_size); + +/** + * Allocate new memory in hugepages on the specified NUMA node. + * + * @param size + * Number of bytes to allocate. Must be a multiple of huge page size. + * @param socket_id + * Socket ID. + * @return + * Address of the memory allocated on success or NULL on failure. + */ +void *eal_mem_alloc_socket(size_t size, int socket_id); + +/** + * Commit memory previously reserved with @ref eal_mem_reserve() + * or decommitted from hugepages by @ref eal_mem_decommit(). + * + * @param requested_addr + * Address within a reserved region. Must not be NULL. + * @param size + * Number of bytes to commit. Must be a multiple of page size. + * @param socket_id + * Socket ID to allocate on. Can be SOCKET_ID_ANY. + * @return + * On success, address of the committed memory, that is, requested_addr. + * On failure, NULL and @code rte_errno @endcode is set. + */ +void *eal_mem_commit(void *requested_addr, size_t size, int socket_id); + +/** + * Put allocated or committed memory back into reserved state. + * + * @param addr + * Address of the region to decommit. + * @param size + * Number of bytes to decommit. + * + * The @code addr @endcode and @code param @endcode must match + * location and size of previously allocated or committed region. 
+ * + * @return + * 0 on success, (-1) on failure. + */ +int eal_mem_decommit(void *addr, size_t size); + #endif /* _EAL_WINDOWS_H_ */ diff --git a/lib/librte_eal/windows/meson.build b/lib/librte_eal/windows/meson.build index 5f118bfe2..81d3ee095 100644 --- a/lib/librte_eal/windows/meson.build +++ b/lib/librte_eal/windows/meson.build @@ -8,6 +8,7 @@ sources += files( 'eal_debug.c', 'eal_hugepages.c', 'eal_lcore.c', + 'eal_memory.c', 'eal_thread.c', 'getopt.c', ) From patchwork Tue Apr 14 19:44:22 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dmitry Kozlyuk X-Patchwork-Id: 68449 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id 20B1AA0577; Tue, 14 Apr 2020 21:45:51 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 6038B1D483; Tue, 14 Apr 2020 21:44:56 +0200 (CEST) Received: from mail-lf1-f65.google.com (mail-lf1-f65.google.com [209.85.167.65]) by dpdk.org (Postfix) with ESMTP id 975EB1D427 for ; Tue, 14 Apr 2020 21:44:50 +0200 (CEST) Received: by mail-lf1-f65.google.com with SMTP id j14so726909lfg.9 for ; Tue, 14 Apr 2020 12:44:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=hfl6HnDUalJ3DI06sowQDoUqinHg5jz18NO73eRnfds=; b=H9xFDMsc1b7j6K6oOrhCm74R45q1rXoDrOAahT5qm/O3Zwr/KhspVS/1qeEwgzzXE3 kt4bDSPLP0GXDR5ca6Rxo3SwM7koskCqViblu6vrHgaOmUq44Ul7ibFdaQ4crlXLZIfF Kp83iDRHrUNj682qsj3LkOeZGEw7QrDIoJNYKpiwucJ54YmLDqPLJPbNHfKaNni0mr+v YAqe0MnOMeQEO1oIfw8r5JNIvlOAj2GEbnqhn8nnQNM7xWlHqStjOcMOGWeuIjN6OGg0 Ivs8d7DHrerZojKTmPzozYwvJ9UtxX9JgvqCXslS2N5+FA0QAUsttbaRqShNi6SRp+qM V1og== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=hfl6HnDUalJ3DI06sowQDoUqinHg5jz18NO73eRnfds=; b=rCmqb94dEyZ8A5i6cbdSnOUx5qgJsHRDTGJG5/ihSTVftcWKtC/v/zf2lilBy9CXIL tD8dOURKt8p74KVQNpqP7Xk2xngwc4F6OQ4dF98cReN2joixRRP7hyDJ2kkdjHQtUhGU 4aY+5sKhceYCZJXSHTF0aeR6cjqTXZ1rndwj59ykzGpNuhUagDYMwtWyHqRHU3SMz9NI 3vNPxVm54ctRAR3ogPhvQ0M6LSlf/6kBvpXm+qKRp2BJNzXdx2qyPUSYtM9dXsmoA/Z9 ZLzauBLoQmQIBLSbxP3369XMFLfYmcL2SWhgHd1NAkYSRU/VzdOR8FpI5wujQjczZ95Q q6Yg== X-Gm-Message-State: AGi0PuY4+MCPKXlWb6ylpL6hb5EFdqyQ99tx5MaUBe4eI0pnm3OylDpx nrtuBeCRi2SDB+18+dLlSHC1gvlyrFIxVQ== X-Google-Smtp-Source: APiQypIrZInQRIMsd4PrpEJo+emzsfxMQQ8LUWDOxd3wXPaSroT0PTImagDKBfWf99eub481DTFl6w== X-Received: by 2002:a19:c216:: with SMTP id l22mr764933lfc.172.1586893489691; Tue, 14 Apr 2020 12:44:49 -0700 (PDT) Received: from localhost.localdomain (broadband-37-110-65-23.ip.moscow.rt.ru. 
[37.110.65.23]) by smtp.googlemail.com with ESMTPSA id h14sm11272426lfm.60.2020.04.14.12.44.48 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 14 Apr 2020 12:44:49 -0700 (PDT) From: Dmitry Kozlyuk To: dev@dpdk.org Cc: "Dmitry Malloy (MESHCHANINOV)" , Narcisa Ana Maria Vasile , Fady Bader , Tal Shnaiderman , Dmitry Kozlyuk , Anatoly Burakov , Bruce Richardson Date: Tue, 14 Apr 2020 22:44:22 +0300 Message-Id: <20200414194426.1640704-8-dmitry.kozliuk@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200414194426.1640704-1-dmitry.kozliuk@gmail.com> References: <20200410164342.1194634-1-dmitry.kozliuk@gmail.com> <20200414194426.1640704-1-dmitry.kozliuk@gmail.com> MIME-Version: 1.0 Subject: [dpdk-dev] [PATCH v3 07/10] eal: extract common code for memseg list initialization X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" All supported OS create memory segment lists (MSL) and reserve VA space for them in a nearly identical way. Move common code into EAL private functions to reduce duplication. Signed-off-by: Dmitry Kozlyuk --- lib/librte_eal/common/eal_common_memory.c | 54 ++++++++++++++++++ lib/librte_eal/common/eal_private.h | 34 ++++++++++++ lib/librte_eal/freebsd/eal_memory.c | 54 +++--------------- lib/librte_eal/linux/eal_memory.c | 68 +++++------------------ 4 files changed, 110 insertions(+), 100 deletions(-) diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c index cc7d54e0c..d9764681a 100644 --- a/lib/librte_eal/common/eal_common_memory.c +++ b/lib/librte_eal/common/eal_common_memory.c @@ -25,6 +25,7 @@ #include "eal_private.h" #include "eal_internal_cfg.h" #include "eal_memcfg.h" +#include "eal_options.h" #include "malloc_heap.h" /* @@ -182,6 +183,59 @@ eal_get_virtual_area(void *requested_addr, size_t *size, return aligned_addr; } +int +eal_reserve_memseg_list(struct rte_memseg_list *msl, + enum eal_mem_reserve_flags flags) +{ + uint64_t page_sz; + size_t mem_sz; + void *addr; + + page_sz = msl->page_sz; + mem_sz = page_sz * msl->memseg_arr.len; + + addr = eal_get_virtual_area(msl->base_va, &mem_sz, page_sz, 0, flags); + if (addr == NULL) { + if (rte_errno == EADDRNOTAVAIL) + RTE_LOG(ERR, EAL, "Cannot reserve %llu bytes at [%p] - " + "please use '--" OPT_BASE_VIRTADDR "' option\n", + (unsigned long long)mem_sz, msl->base_va); + else + RTE_LOG(ERR, EAL, "Cannot reserve memory\n"); + return -1; + } + msl->base_va = addr; + msl->len = mem_sz; + + return 0; +} + +int +eal_alloc_memseg_list(struct rte_memseg_list *msl, uint64_t page_sz, + int n_segs, int socket_id, int type_msl_idx, bool heap) +{ + char name[RTE_FBARRAY_NAME_LEN]; + + snprintf(name, sizeof(name), MEMSEG_LIST_FMT, page_sz >> 10, socket_id, + type_msl_idx); + if (rte_fbarray_init(&msl->memseg_arr, name, n_segs, + sizeof(struct rte_memseg))) { + RTE_LOG(ERR, EAL, "Cannot allocate memseg list: %s\n", + rte_strerror(rte_errno)); + return -1; + } + + msl->page_sz = page_sz; + msl->socket_id = socket_id; + msl->base_va = NULL; + msl->heap = heap; + + RTE_LOG(DEBUG, EAL, "Memseg list allocated: 0x%zxkB at socket %i\n", + (size_t)page_sz >> 10, socket_id); + + return 0; +} + static struct rte_memseg * virt2memseg(const void *addr, const struct rte_memseg_list *msl) { diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h index 1e89338f2..76938e379 
100644 --- a/lib/librte_eal/common/eal_private.h +++ b/lib/librte_eal/common/eal_private.h @@ -246,6 +246,40 @@ void * eal_get_virtual_area(void *requested_addr, size_t *size, size_t page_sz, int flags, int mmap_flags); +/** + * Reserve VA space for a memory segment list. + * + * @param msl + * Memory segment list with page size defined. + * @param flags + * Extra memory reservation flags. Can be 0 if unnecessary. + * @return + * 0 on success, (-1) on failure and rte_errno is set. + */ +int +eal_reserve_memseg_list(struct rte_memseg_list *msl, + enum eal_mem_reserve_flags flags); + +/** + * Initialize a memory segment list with its backing storage. + * + * @param msl + * Memory segment list to be filled. + * @param page_sz + * Size of segment pages in the MSL. + * @param n_segs + * Number of segments. + * @param socket_id + * Socket ID. Must not be SOCKET_ID_ANY. + * @param type_msl_idx + * Index of the MSL among other MSLs of the same socket and page size. + * @param heap + * Mark MSL as pointing to a heap. + */ +int +eal_alloc_memseg_list(struct rte_memseg_list *msl, uint64_t page_sz, + int n_segs, int socket_id, int type_msl_idx, bool heap); + /** * Get cpu core_id. * diff --git a/lib/librte_eal/freebsd/eal_memory.c b/lib/librte_eal/freebsd/eal_memory.c index a97d8f0f0..5174f9cd0 100644 --- a/lib/librte_eal/freebsd/eal_memory.c +++ b/lib/librte_eal/freebsd/eal_memory.c @@ -336,61 +336,23 @@ get_mem_amount(uint64_t page_sz, uint64_t max_mem) return RTE_ALIGN(area_sz, page_sz); } -#define MEMSEG_LIST_FMT "memseg-%" PRIu64 "k-%i-%i" static int -alloc_memseg_list(struct rte_memseg_list *msl, uint64_t page_sz, +memseg_list_alloc(struct rte_memseg_list *msl, uint64_t page_sz, int n_segs, int socket_id, int type_msl_idx) { - char name[RTE_FBARRAY_NAME_LEN]; - - snprintf(name, sizeof(name), MEMSEG_LIST_FMT, page_sz >> 10, socket_id, - type_msl_idx); - if (rte_fbarray_init(&msl->memseg_arr, name, n_segs, - sizeof(struct rte_memseg))) { - RTE_LOG(ERR, EAL, "Cannot allocate memseg list: %s\n", - rte_strerror(rte_errno)); - return -1; - } - - msl->page_sz = page_sz; - msl->socket_id = socket_id; - msl->base_va = NULL; - - RTE_LOG(DEBUG, EAL, "Memseg list allocated: 0x%zxkB at socket %i\n", - (size_t)page_sz >> 10, socket_id); - - return 0; + return eal_alloc_memseg_list( + msl, page_sz, n_segs, socket_id, type_msl_idx, false); } static int -alloc_va_space(struct rte_memseg_list *msl) +memseg_list_reserve(struct rte_memseg_list *msl) { - uint64_t page_sz; - size_t mem_sz; - void *addr; - int flags = 0; + enum eal_reserve_flags flags = 0; #ifdef RTE_ARCH_PPC_64 - flags |= MAP_HUGETLB; + flags |= EAL_RESERVE_HUGEPAGES; #endif - - page_sz = msl->page_sz; - mem_sz = page_sz * msl->memseg_arr.len; - - addr = eal_get_virtual_area(msl->base_va, &mem_sz, page_sz, 0, flags); - if (addr == NULL) { - if (rte_errno == EADDRNOTAVAIL) - RTE_LOG(ERR, EAL, "Could not mmap %llu bytes at [%p] - " - "please use '--" OPT_BASE_VIRTADDR "' option\n", - (unsigned long long)mem_sz, msl->base_va); - else - RTE_LOG(ERR, EAL, "Cannot reserve memory\n"); - return -1; - } - msl->base_va = addr; - msl->len = mem_sz; - - return 0; + return eal_reserve_memseg_list(msl, flags); } @@ -479,7 +441,7 @@ memseg_primary_init(void) cur_max_mem); n_segs = cur_mem / hugepage_sz; - if (alloc_memseg_list(msl, hugepage_sz, n_segs, + if (memseg_list_alloc(msl, hugepage_sz, n_segs, 0, type_msl_idx)) return -1; diff --git a/lib/librte_eal/linux/eal_memory.c b/lib/librte_eal/linux/eal_memory.c index 7a9c97ff8..a01a7ce76 100644 --- 
a/lib/librte_eal/linux/eal_memory.c +++ b/lib/librte_eal/linux/eal_memory.c @@ -802,7 +802,7 @@ get_mem_amount(uint64_t page_sz, uint64_t max_mem) } static int -free_memseg_list(struct rte_memseg_list *msl) +memseg_list_free(struct rte_memseg_list *msl) { if (rte_fbarray_destroy(&msl->memseg_arr)) { RTE_LOG(ERR, EAL, "Cannot destroy memseg list\n"); @@ -812,58 +812,18 @@ free_memseg_list(struct rte_memseg_list *msl) return 0; } -#define MEMSEG_LIST_FMT "memseg-%" PRIu64 "k-%i-%i" static int -alloc_memseg_list(struct rte_memseg_list *msl, uint64_t page_sz, +memseg_list_alloc(struct rte_memseg_list *msl, uint64_t page_sz, int n_segs, int socket_id, int type_msl_idx) { - char name[RTE_FBARRAY_NAME_LEN]; - - snprintf(name, sizeof(name), MEMSEG_LIST_FMT, page_sz >> 10, socket_id, - type_msl_idx); - if (rte_fbarray_init(&msl->memseg_arr, name, n_segs, - sizeof(struct rte_memseg))) { - RTE_LOG(ERR, EAL, "Cannot allocate memseg list: %s\n", - rte_strerror(rte_errno)); - return -1; - } - - msl->page_sz = page_sz; - msl->socket_id = socket_id; - msl->base_va = NULL; - msl->heap = 1; /* mark it as a heap segment */ - - RTE_LOG(DEBUG, EAL, "Memseg list allocated: 0x%zxkB at socket %i\n", - (size_t)page_sz >> 10, socket_id); - - return 0; + return eal_alloc_memseg_list( + msl, page_sz, n_segs, socket_id, type_msl_idx, true); } static int -alloc_va_space(struct rte_memseg_list *msl) +memseg_list_reserve(struct rte_memseg_list *msl) { - uint64_t page_sz; - size_t mem_sz; - void *addr; - int flags = 0; - - page_sz = msl->page_sz; - mem_sz = page_sz * msl->memseg_arr.len; - - addr = eal_get_virtual_area(msl->base_va, &mem_sz, page_sz, 0, flags); - if (addr == NULL) { - if (rte_errno == EADDRNOTAVAIL) - RTE_LOG(ERR, EAL, "Could not mmap %llu bytes at [%p] - " - "please use '--" OPT_BASE_VIRTADDR "' option\n", - (unsigned long long)mem_sz, msl->base_va); - else - RTE_LOG(ERR, EAL, "Cannot reserve memory\n"); - return -1; - } - msl->base_va = addr; - msl->len = mem_sz; - - return 0; + return eal_reserve_memseg_list(msl, 0); } /* @@ -1009,12 +969,12 @@ prealloc_segments(struct hugepage_file *hugepages, int n_pages) } /* now, allocate fbarray itself */ - if (alloc_memseg_list(msl, page_sz, n_segs, socket, + if (memseg_list_alloc(msl, page_sz, n_segs, socket, msl_idx) < 0) return -1; /* finally, allocate VA space */ - if (alloc_va_space(msl) < 0) + if (memseg_list_reserve(msl) < 0) return -1; } } @@ -2191,7 +2151,7 @@ memseg_primary_init_32(void) max_pagesz_mem); n_segs = cur_mem / hugepage_sz; - if (alloc_memseg_list(msl, hugepage_sz, n_segs, + if (memseg_list_alloc(msl, hugepage_sz, n_segs, socket_id, type_msl_idx)) { /* failing to allocate a memseg list is * a serious error. @@ -2200,13 +2160,13 @@ memseg_primary_init_32(void) return -1; } - if (alloc_va_space(msl)) { + if (memseg_list_reserve(msl)) { /* if we couldn't allocate VA space, we * can try with smaller page sizes. 
*/ RTE_LOG(ERR, EAL, "Cannot allocate VA space for memseg list, retrying with different page size\n"); /* deallocate memseg list */ - if (free_memseg_list(msl)) + if (memseg_list_free(msl)) return -1; break; } @@ -2395,11 +2355,11 @@ memseg_primary_init(void) } msl = &mcfg->memsegs[msl_idx++]; - if (alloc_memseg_list(msl, pagesz, n_segs, + if (memseg_list_alloc(msl, pagesz, n_segs, socket_id, cur_seglist)) goto out; - if (alloc_va_space(msl)) { + if (memseg_list_reserve(msl)) { RTE_LOG(ERR, EAL, "Cannot allocate VA space for memseg list\n"); goto out; } @@ -2433,7 +2393,7 @@ memseg_secondary_init(void) } /* preallocate VA space */ - if (alloc_va_space(msl)) { + if (memseg_list_reserve(msl)) { RTE_LOG(ERR, EAL, "Cannot preallocate VA space for hugepage memory\n"); return -1; } From patchwork Tue Apr 14 19:44:23 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dmitry Kozlyuk X-Patchwork-Id: 68450 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id 9483CA0577; Tue, 14 Apr 2020 21:46:02 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 3DE931D429; Tue, 14 Apr 2020 21:44:59 +0200 (CEST) Received: from mail-lf1-f66.google.com (mail-lf1-f66.google.com [209.85.167.66]) by dpdk.org (Postfix) with ESMTP id 6CBEE1D429 for ; Tue, 14 Apr 2020 21:44:51 +0200 (CEST) Received: by mail-lf1-f66.google.com with SMTP id j14so726934lfg.9 for ; Tue, 14 Apr 2020 12:44:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=Iork4+fw+umgfjcoLg8piSqfkvrIXDOmesUDuo40mjk=; b=p7hruKcFxLUSWAR29/W679JJ4VzTW/lsc269VWX384ZPTSlkWB9uYBP4BoyhewpTE5 3bAOGKdGvtMivxbsJ8oQH6VRFBg2lr29ZHsuuY0N+cEzMxgQRoLPLd3cJAy2NDdCbUkF 90zk6WmSo9tIz9WAV6kodaVuiW06gzN8/A/90dggyUqnyJ8/TxLhNmMKfpX9MOIFM1qc ErDgVI97QfUoEkl5Y+PPpKFIVGarYP1eCO4y7AZJYozpDqw+KGd4rvf0oMRcKbXS4Pn/ +F5swgIHThHktRk5xBFTlkM74j1uiUW1+xBc84JlVEWc5Ld5gtdMbRTVHxlASiXx58kR wEKw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=Iork4+fw+umgfjcoLg8piSqfkvrIXDOmesUDuo40mjk=; b=HTl+xXyF5OEUe3jHjdhFwYHD+IAVgM50OsFwW1ikzxvxLtRZn8vwPIU09dXURf/qNC DS2d6fXX554DcH0OdkcHid10lkHA8BwkbsVB752nImpateM+b5NcgY+Z7XeHbh5oBRCO Qoxyqs/sMk73Io5TgEexMeKsK/rNbpQWF0A2+7zwwvwlE1i8rhOjYpe76ftNoo0gf8cE +qlyUnhK/nZXYvTG/8+I7nPx4YpqsEgLX8Y9Qn+e02b3StLP205IuwvORcb9AFiMukuD ZG54RQVHvHlEgdnNCAkqXkXOCSaIaNx4Cgc3BkaqkpUxU4nw6ykaPuX37hEUnnBASXSA k3Hw== X-Gm-Message-State: AGi0PuZWaj6gE2Z+Og+ZFcRqwe8QbDPVBOvgQS3WD3asllpwZOOwbDR0 8zVAHQ+62B0XyB/HAiaaWRXKi4P9FP4LCA== X-Google-Smtp-Source: APiQypKKUruN4UpNtUDxC94m3izwkEdFL6NsZAlI9bGmIoXWIFpRHay2w2+PxXwZLQZmRiny8UiL2g== X-Received: by 2002:a19:5e46:: with SMTP id z6mr769100lfi.74.1586893490748; Tue, 14 Apr 2020 12:44:50 -0700 (PDT) Received: from localhost.localdomain (broadband-37-110-65-23.ip.moscow.rt.ru. 
[37.110.65.23]) by smtp.googlemail.com with ESMTPSA id h14sm11272426lfm.60.2020.04.14.12.44.49 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 14 Apr 2020 12:44:50 -0700 (PDT) From: Dmitry Kozlyuk To: dev@dpdk.org Cc: "Dmitry Malloy (MESHCHANINOV)" , Narcisa Ana Maria Vasile , Fady Bader , Tal Shnaiderman , Dmitry Kozlyuk , Anatoly Burakov Date: Tue, 14 Apr 2020 22:44:23 +0300 Message-Id: <20200414194426.1640704-9-dmitry.kozliuk@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200414194426.1640704-1-dmitry.kozliuk@gmail.com> References: <20200410164342.1194634-1-dmitry.kozliuk@gmail.com> <20200414194426.1640704-1-dmitry.kozliuk@gmail.com> MIME-Version: 1.0 Subject: [dpdk-dev] [PATCH v3 08/10] eal/windows: fix rte_page_sizes with Clang on Windows X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Clang on Windows follows MS ABI where enum values are limited to 2^31-1. Enum rte_page_size has members valued above this limit, which get wrapped to zero, resulting in compilation error (duplicate values in enum). Using MS ABI is mandatory for Windows EAL to call Win32 APIs. Define these values outside of the enum for Clang on Windows only. This does not affect runtime, because Windows doesn't run on machines with 4GiB and 16GiB hugepages. Signed-off-by: Dmitry Kozlyuk --- lib/librte_eal/include/rte_memory.h | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/lib/librte_eal/include/rte_memory.h b/lib/librte_eal/include/rte_memory.h index 1b7c3e5df..3ec673f51 100644 --- a/lib/librte_eal/include/rte_memory.h +++ b/lib/librte_eal/include/rte_memory.h @@ -34,8 +34,14 @@ enum rte_page_sizes { RTE_PGSIZE_256M = 1ULL << 28, RTE_PGSIZE_512M = 1ULL << 29, RTE_PGSIZE_1G = 1ULL << 30, +/* Work around Clang on Windows being limited to 32-bit underlying type. */ +#if !defined(RTE_CC_CLANG) || !defined(RTE_EXEC_ENV_WINDOWS) RTE_PGSIZE_4G = 1ULL << 32, RTE_PGSIZE_16G = 1ULL << 34, +#else +#define RTE_PGSIZE_4G (1ULL << 32) +#define RTE_PGSIZE_16G (1ULL << 34) +#endif }; #define SOCKET_ID_ANY -1 /**< Any NUMA socket. 
*/ From patchwork Tue Apr 14 19:44:24 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dmitry Kozlyuk X-Patchwork-Id: 68451 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id EC9F1A0577; Tue, 14 Apr 2020 21:46:08 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 980D61D514; Tue, 14 Apr 2020 21:45:00 +0200 (CEST) Received: from mail-lj1-f196.google.com (mail-lj1-f196.google.com [209.85.208.196]) by dpdk.org (Postfix) with ESMTP id 8644F1D44A for ; Tue, 14 Apr 2020 21:44:53 +0200 (CEST) Received: by mail-lj1-f196.google.com with SMTP id q22so1180602ljg.0 for ; Tue, 14 Apr 2020 12:44:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=IbsXLNiJ9aVm86S7Wc/IFSEiWqZM4CcxGlj8p036YBs=; b=oa8l3VBiqVwW1FH4km4LF418rX5gTrquZzTUAS97SeyljnTxuTBriUCUjFM+pS2maV WIiBYO3MlLsNjIl4HY6J0wjuL/PSQD4pW1MEI3nrXI9hXtrboqzIJF4IAMzB1inq5QnB mJU6jh8BAz+quqcwGldaLlXOJAS17eLNtIqORR4jYhqQp4I20r3r4KR+TssvZUuPjL0a omRWXQlZHcwTqlQXPo9FceJ3UAaZIpdCJESxHazIYQDtMxspSW7ef7PEs7MSiVSdr+BA OPtGoqfyvZHiFxEXbagpK5RhenHQjNqJAiomlxxZxHdhs1VMmnKdOgEUpeAhMeUQ8iGT gpLQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=IbsXLNiJ9aVm86S7Wc/IFSEiWqZM4CcxGlj8p036YBs=; b=dhAUIcc7IUVHsZKLBAFNuF3hBvPLLtAdgipqVb11MOSo6JIt+GdxKtzpF5REJwnmlp k5+8pAEISGG/D9E527+f3036lsaPtTVU5dD2AoEo7ip+ZU6e4yGPb8m1OWNdg/zMvhci AieSq7WSswka4qR+muuQO7XH9MSSWcA15ZsY8SUgA3XdnrKGxwSX+Xz0Pdw8+TcY1B6d F4q6XLPExZd6SxxaDg0Pt6OMEqLcz/NQqkzkxdMuVkN1yPKRs1cPezhu4e/VJrWRzZrO qAHL3yd9Glo1oB0qyVksx85k02rZ0ZiUkXsXsvtgkuZ2y6uxJUyCucy5eSIjsU2hh2vw f0ew== X-Gm-Message-State: AGi0PuaPuGwu9nhxsfyRwIsheCsf3RHQZenZygi6MGCs7FXITI/ZMIXY 4SHI6k+EAXQMSQbFAynNcE8T22kbwFnkOw== X-Google-Smtp-Source: APiQypL4YcJP+70hUOm2QmsYHd6l/JzgM8v84voZCvyRRePicjwkd/7ciwq8hZve8w/ZoGxjvyQbhg== X-Received: by 2002:a2e:b70d:: with SMTP id j13mr996431ljo.252.1586893491944; Tue, 14 Apr 2020 12:44:51 -0700 (PDT) Received: from localhost.localdomain (broadband-37-110-65-23.ip.moscow.rt.ru. 
[37.110.65.23]) by smtp.googlemail.com with ESMTPSA id h14sm11272426lfm.60.2020.04.14.12.44.50 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 14 Apr 2020 12:44:51 -0700 (PDT) From: Dmitry Kozlyuk To: dev@dpdk.org Cc: "Dmitry Malloy (MESHCHANINOV)" , Narcisa Ana Maria Vasile , Fady Bader , Tal Shnaiderman , Dmitry Kozlyuk , Harini Ramakrishnan , Omar Cardona , Pallavi Kadam , Ranjit Menon Date: Tue, 14 Apr 2020 22:44:24 +0300 Message-Id: <20200414194426.1640704-10-dmitry.kozliuk@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200414194426.1640704-1-dmitry.kozliuk@gmail.com> References: <20200410164342.1194634-1-dmitry.kozliuk@gmail.com> <20200414194426.1640704-1-dmitry.kozliuk@gmail.com> MIME-Version: 1.0 Subject: [dpdk-dev] [PATCH v3 09/10] eal/windows: replace sys/queue.h with a complete one from FreeBSD X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Limited version imported previously lacks at least SLIST macros. Import a complete file from FreeBSD, since its license exception is already approved by Technical Board. Signed-off-by: Dmitry Kozlyuk --- lib/librte_eal/windows/include/sys/queue.h | 663 +++++++++++++++++++-- 1 file changed, 601 insertions(+), 62 deletions(-) diff --git a/lib/librte_eal/windows/include/sys/queue.h b/lib/librte_eal/windows/include/sys/queue.h index a65949a78..9756bee6f 100644 --- a/lib/librte_eal/windows/include/sys/queue.h +++ b/lib/librte_eal/windows/include/sys/queue.h @@ -8,7 +8,36 @@ #define _SYS_QUEUE_H_ /* - * This file defines tail queues. + * This file defines four types of data structures: singly-linked lists, + * singly-linked tail queues, lists and tail queues. + * + * A singly-linked list is headed by a single forward pointer. The elements + * are singly linked for minimum space and pointer manipulation overhead at + * the expense of O(n) removal for arbitrary elements. New elements can be + * added to the list after an existing element or at the head of the list. + * Elements being removed from the head of the list should use the explicit + * macro for this purpose for optimum efficiency. A singly-linked list may + * only be traversed in the forward direction. Singly-linked lists are ideal + * for applications with large datasets and few or no removals or for + * implementing a LIFO queue. + * + * A singly-linked tail queue is headed by a pair of pointers, one to the + * head of the list and the other to the tail of the list. The elements are + * singly linked for minimum space and pointer manipulation overhead at the + * expense of O(n) removal for arbitrary elements. New elements can be added + * to the list after an existing element, at the head of the list, or at the + * end of the list. Elements being removed from the head of the tail queue + * should use the explicit macro for this purpose for optimum efficiency. + * A singly-linked tail queue may only be traversed in the forward direction. + * Singly-linked tail queues are ideal for applications with large datasets + * and few or no removals or for implementing a FIFO queue. + * + * A list is headed by a single forward pointer (or an array of forward + * pointers for a hash table header). The elements are doubly linked + * so that an arbitrary element can be removed without a need to + * traverse the list. 
New elements can be added to the list before + * or after an existing element or at the head of the list. A list + * may be traversed in either direction. * * A tail queue is headed by a pair of pointers, one to the head of the * list and the other to the tail of the list. The elements are doubly @@ -17,65 +46,93 @@ * after an existing element, at the head of the list, or at the end of * the list. A tail queue may be traversed in either direction. * + * For details on the use of these macros, see the queue(3) manual page. + * * Below is a summary of implemented functions where: * + means the macro is available * - means the macro is not available * s means the macro is available but is slow (runs in O(n) time) * - * TAILQ - * _HEAD + - * _CLASS_HEAD + - * _HEAD_INITIALIZER + - * _ENTRY + - * _CLASS_ENTRY + - * _INIT + - * _EMPTY + - * _FIRST + - * _NEXT + - * _PREV + - * _LAST + - * _LAST_FAST + - * _FOREACH + - * _FOREACH_FROM + - * _FOREACH_SAFE + - * _FOREACH_FROM_SAFE + - * _FOREACH_REVERSE + - * _FOREACH_REVERSE_FROM + - * _FOREACH_REVERSE_SAFE + - * _FOREACH_REVERSE_FROM_SAFE + - * _INSERT_HEAD + - * _INSERT_BEFORE + - * _INSERT_AFTER + - * _INSERT_TAIL + - * _CONCAT + - * _REMOVE_AFTER - - * _REMOVE_HEAD - - * _REMOVE + - * _SWAP + + * SLIST LIST STAILQ TAILQ + * _HEAD + + + + + * _CLASS_HEAD + + + + + * _HEAD_INITIALIZER + + + + + * _ENTRY + + + + + * _CLASS_ENTRY + + + + + * _INIT + + + + + * _EMPTY + + + + + * _FIRST + + + + + * _NEXT + + + + + * _PREV - + - + + * _LAST - - + + + * _LAST_FAST - - - + + * _FOREACH + + + + + * _FOREACH_FROM + + + + + * _FOREACH_SAFE + + + + + * _FOREACH_FROM_SAFE + + + + + * _FOREACH_REVERSE - - - + + * _FOREACH_REVERSE_FROM - - - + + * _FOREACH_REVERSE_SAFE - - - + + * _FOREACH_REVERSE_FROM_SAFE - - - + + * _INSERT_HEAD + + + + + * _INSERT_BEFORE - + - + + * _INSERT_AFTER + + + + + * _INSERT_TAIL - - + + + * _CONCAT s s + + + * _REMOVE_AFTER + - + - + * _REMOVE_HEAD + - + - + * _REMOVE s + s + + * _SWAP + + + + * */ - -#ifdef __cplusplus -extern "C" { +#ifdef QUEUE_MACRO_DEBUG +#warn Use QUEUE_MACRO_DEBUG_TRACE and/or QUEUE_MACRO_DEBUG_TRASH +#define QUEUE_MACRO_DEBUG_TRACE +#define QUEUE_MACRO_DEBUG_TRASH #endif -/* - * List definitions. 
- */ -#define LIST_HEAD(name, type) \ -struct name { \ - struct type *lh_first; /* first element */ \ -} +#ifdef QUEUE_MACRO_DEBUG_TRACE +/* Store the last 2 places the queue element or head was altered */ +struct qm_trace { + unsigned long lastline; + unsigned long prevline; + const char *lastfile; + const char *prevfile; +}; + +#define TRACEBUF struct qm_trace trace; +#define TRACEBUF_INITIALIZER { __LINE__, 0, __FILE__, NULL } , + +#define QMD_TRACE_HEAD(head) do { \ + (head)->trace.prevline = (head)->trace.lastline; \ + (head)->trace.prevfile = (head)->trace.lastfile; \ + (head)->trace.lastline = __LINE__; \ + (head)->trace.lastfile = __FILE__; \ +} while (0) +#define QMD_TRACE_ELEM(elem) do { \ + (elem)->trace.prevline = (elem)->trace.lastline; \ + (elem)->trace.prevfile = (elem)->trace.lastfile; \ + (elem)->trace.lastline = __LINE__; \ + (elem)->trace.lastfile = __FILE__; \ +} while (0) + +#else /* !QUEUE_MACRO_DEBUG_TRACE */ #define QMD_TRACE_ELEM(elem) #define QMD_TRACE_HEAD(head) #define TRACEBUF #define TRACEBUF_INITIALIZER +#endif /* QUEUE_MACRO_DEBUG_TRACE */ +#ifdef QUEUE_MACRO_DEBUG_TRASH +#define QMD_SAVELINK(name, link) void **name = (void *)&(link) +#define TRASHIT(x) do {(x) = (void *)-1;} while (0) +#define QMD_IS_TRASHED(x) ((x) == (void *)(intptr_t)-1) +#else /* !QUEUE_MACRO_DEBUG_TRASH */ +#define QMD_SAVELINK(name, link) #define TRASHIT(x) #define QMD_IS_TRASHED(x) 0 - -#define QMD_SAVELINK(name, link) +#endif /* QUEUE_MACRO_DEBUG_TRASH */ #ifdef __cplusplus /* @@ -86,6 +143,445 @@ struct name { \ #define QUEUE_TYPEOF(type) struct type #endif +/* + * Singly-linked List declarations. + */ +#define SLIST_HEAD(name, type) \ +struct name { \ + struct type *slh_first; /* first element */ \ +} + +#define SLIST_CLASS_HEAD(name, type) \ +struct name { \ + class type *slh_first; /* first element */ \ +} + +#define SLIST_HEAD_INITIALIZER(head) \ + { NULL } + +#define SLIST_ENTRY(type) \ +struct { \ + struct type *sle_next; /* next element */ \ +} + +#define SLIST_CLASS_ENTRY(type) \ +struct { \ + class type *sle_next; /* next element */ \ +} + +/* + * Singly-linked List functions. + */ +#if (defined(_KERNEL) && defined(INVARIANTS)) +#define QMD_SLIST_CHECK_PREVPTR(prevp, elm) do { \ + if (*(prevp) != (elm)) \ + panic("Bad prevptr *(%p) == %p != %p", \ + (prevp), *(prevp), (elm)); \ +} while (0) +#else +#define QMD_SLIST_CHECK_PREVPTR(prevp, elm) +#endif + +#define SLIST_CONCAT(head1, head2, type, field) do { \ + QUEUE_TYPEOF(type) *curelm = SLIST_FIRST(head1); \ + if (curelm == NULL) { \ + if ((SLIST_FIRST(head1) = SLIST_FIRST(head2)) != NULL) \ + SLIST_INIT(head2); \ + } else if (SLIST_FIRST(head2) != NULL) { \ + while (SLIST_NEXT(curelm, field) != NULL) \ + curelm = SLIST_NEXT(curelm, field); \ + SLIST_NEXT(curelm, field) = SLIST_FIRST(head2); \ + SLIST_INIT(head2); \ + } \ +} while (0) + +#define SLIST_EMPTY(head) ((head)->slh_first == NULL) + +#define SLIST_FIRST(head) ((head)->slh_first) + +#define SLIST_FOREACH(var, head, field) \ + for ((var) = SLIST_FIRST((head)); \ + (var); \ + (var) = SLIST_NEXT((var), field)) + +#define SLIST_FOREACH_FROM(var, head, field) \ + for ((var) = ((var) ? (var) : SLIST_FIRST((head))); \ + (var); \ + (var) = SLIST_NEXT((var), field)) + +#define SLIST_FOREACH_SAFE(var, head, field, tvar) \ + for ((var) = SLIST_FIRST((head)); \ + (var) && ((tvar) = SLIST_NEXT((var), field), 1); \ + (var) = (tvar)) + +#define SLIST_FOREACH_FROM_SAFE(var, head, field, tvar) \ + for ((var) = ((var) ? 
(var) : SLIST_FIRST((head))); \ + (var) && ((tvar) = SLIST_NEXT((var), field), 1); \ + (var) = (tvar)) + +#define SLIST_FOREACH_PREVPTR(var, varp, head, field) \ + for ((varp) = &SLIST_FIRST((head)); \ + ((var) = *(varp)) != NULL; \ + (varp) = &SLIST_NEXT((var), field)) + +#define SLIST_INIT(head) do { \ + SLIST_FIRST((head)) = NULL; \ +} while (0) + +#define SLIST_INSERT_AFTER(slistelm, elm, field) do { \ + SLIST_NEXT((elm), field) = SLIST_NEXT((slistelm), field); \ + SLIST_NEXT((slistelm), field) = (elm); \ +} while (0) + +#define SLIST_INSERT_HEAD(head, elm, field) do { \ + SLIST_NEXT((elm), field) = SLIST_FIRST((head)); \ + SLIST_FIRST((head)) = (elm); \ +} while (0) + +#define SLIST_NEXT(elm, field) ((elm)->field.sle_next) + +#define SLIST_REMOVE(head, elm, type, field) do { \ + QMD_SAVELINK(oldnext, (elm)->field.sle_next); \ + if (SLIST_FIRST((head)) == (elm)) { \ + SLIST_REMOVE_HEAD((head), field); \ + } \ + else { \ + QUEUE_TYPEOF(type) *curelm = SLIST_FIRST(head); \ + while (SLIST_NEXT(curelm, field) != (elm)) \ + curelm = SLIST_NEXT(curelm, field); \ + SLIST_REMOVE_AFTER(curelm, field); \ + } \ + TRASHIT(*oldnext); \ +} while (0) + +#define SLIST_REMOVE_AFTER(elm, field) do { \ + SLIST_NEXT(elm, field) = \ + SLIST_NEXT(SLIST_NEXT(elm, field), field); \ +} while (0) + +#define SLIST_REMOVE_HEAD(head, field) do { \ + SLIST_FIRST((head)) = SLIST_NEXT(SLIST_FIRST((head)), field); \ +} while (0) + +#define SLIST_REMOVE_PREVPTR(prevp, elm, field) do { \ + QMD_SLIST_CHECK_PREVPTR(prevp, elm); \ + *(prevp) = SLIST_NEXT(elm, field); \ + TRASHIT((elm)->field.sle_next); \ +} while (0) + +#define SLIST_SWAP(head1, head2, type) do { \ + QUEUE_TYPEOF(type) *swap_first = SLIST_FIRST(head1); \ + SLIST_FIRST(head1) = SLIST_FIRST(head2); \ + SLIST_FIRST(head2) = swap_first; \ +} while (0) + +/* + * Singly-linked Tail queue declarations. + */ +#define STAILQ_HEAD(name, type) \ +struct name { \ + struct type *stqh_first;/* first element */ \ + struct type **stqh_last;/* addr of last next element */ \ +} + +#define STAILQ_CLASS_HEAD(name, type) \ +struct name { \ + class type *stqh_first; /* first element */ \ + class type **stqh_last; /* addr of last next element */ \ +} + +#define STAILQ_HEAD_INITIALIZER(head) \ + { NULL, &(head).stqh_first } + +#define STAILQ_ENTRY(type) \ +struct { \ + struct type *stqe_next; /* next element */ \ +} + +#define STAILQ_CLASS_ENTRY(type) \ +struct { \ + class type *stqe_next; /* next element */ \ +} + +/* + * Singly-linked Tail queue functions. + */ +#define STAILQ_CONCAT(head1, head2) do { \ + if (!STAILQ_EMPTY((head2))) { \ + *(head1)->stqh_last = (head2)->stqh_first; \ + (head1)->stqh_last = (head2)->stqh_last; \ + STAILQ_INIT((head2)); \ + } \ +} while (0) + +#define STAILQ_EMPTY(head) ((head)->stqh_first == NULL) + +#define STAILQ_FIRST(head) ((head)->stqh_first) + +#define STAILQ_FOREACH(var, head, field) \ + for((var) = STAILQ_FIRST((head)); \ + (var); \ + (var) = STAILQ_NEXT((var), field)) + +#define STAILQ_FOREACH_FROM(var, head, field) \ + for ((var) = ((var) ? (var) : STAILQ_FIRST((head))); \ + (var); \ + (var) = STAILQ_NEXT((var), field)) + +#define STAILQ_FOREACH_SAFE(var, head, field, tvar) \ + for ((var) = STAILQ_FIRST((head)); \ + (var) && ((tvar) = STAILQ_NEXT((var), field), 1); \ + (var) = (tvar)) + +#define STAILQ_FOREACH_FROM_SAFE(var, head, field, tvar) \ + for ((var) = ((var) ? 
(var) : STAILQ_FIRST((head))); \ + (var) && ((tvar) = STAILQ_NEXT((var), field), 1); \ + (var) = (tvar)) + +#define STAILQ_INIT(head) do { \ + STAILQ_FIRST((head)) = NULL; \ + (head)->stqh_last = &STAILQ_FIRST((head)); \ +} while (0) + +#define STAILQ_INSERT_AFTER(head, tqelm, elm, field) do { \ + if ((STAILQ_NEXT((elm), field) = STAILQ_NEXT((tqelm), field)) == NULL)\ + (head)->stqh_last = &STAILQ_NEXT((elm), field); \ + STAILQ_NEXT((tqelm), field) = (elm); \ +} while (0) + +#define STAILQ_INSERT_HEAD(head, elm, field) do { \ + if ((STAILQ_NEXT((elm), field) = STAILQ_FIRST((head))) == NULL) \ + (head)->stqh_last = &STAILQ_NEXT((elm), field); \ + STAILQ_FIRST((head)) = (elm); \ +} while (0) + +#define STAILQ_INSERT_TAIL(head, elm, field) do { \ + STAILQ_NEXT((elm), field) = NULL; \ + *(head)->stqh_last = (elm); \ + (head)->stqh_last = &STAILQ_NEXT((elm), field); \ +} while (0) + +#define STAILQ_LAST(head, type, field) \ + (STAILQ_EMPTY((head)) ? NULL : \ + __containerof((head)->stqh_last, \ + QUEUE_TYPEOF(type), field.stqe_next)) + +#define STAILQ_NEXT(elm, field) ((elm)->field.stqe_next) + +#define STAILQ_REMOVE(head, elm, type, field) do { \ + QMD_SAVELINK(oldnext, (elm)->field.stqe_next); \ + if (STAILQ_FIRST((head)) == (elm)) { \ + STAILQ_REMOVE_HEAD((head), field); \ + } \ + else { \ + QUEUE_TYPEOF(type) *curelm = STAILQ_FIRST(head); \ + while (STAILQ_NEXT(curelm, field) != (elm)) \ + curelm = STAILQ_NEXT(curelm, field); \ + STAILQ_REMOVE_AFTER(head, curelm, field); \ + } \ + TRASHIT(*oldnext); \ +} while (0) + +#define STAILQ_REMOVE_AFTER(head, elm, field) do { \ + if ((STAILQ_NEXT(elm, field) = \ + STAILQ_NEXT(STAILQ_NEXT(elm, field), field)) == NULL) \ + (head)->stqh_last = &STAILQ_NEXT((elm), field); \ +} while (0) + +#define STAILQ_REMOVE_HEAD(head, field) do { \ + if ((STAILQ_FIRST((head)) = \ + STAILQ_NEXT(STAILQ_FIRST((head)), field)) == NULL) \ + (head)->stqh_last = &STAILQ_FIRST((head)); \ +} while (0) + +#define STAILQ_SWAP(head1, head2, type) do { \ + QUEUE_TYPEOF(type) *swap_first = STAILQ_FIRST(head1); \ + QUEUE_TYPEOF(type) **swap_last = (head1)->stqh_last; \ + STAILQ_FIRST(head1) = STAILQ_FIRST(head2); \ + (head1)->stqh_last = (head2)->stqh_last; \ + STAILQ_FIRST(head2) = swap_first; \ + (head2)->stqh_last = swap_last; \ + if (STAILQ_EMPTY(head1)) \ + (head1)->stqh_last = &STAILQ_FIRST(head1); \ + if (STAILQ_EMPTY(head2)) \ + (head2)->stqh_last = &STAILQ_FIRST(head2); \ +} while (0) + + +/* + * List declarations. + */ +#define LIST_HEAD(name, type) \ +struct name { \ + struct type *lh_first; /* first element */ \ +} + +#define LIST_CLASS_HEAD(name, type) \ +struct name { \ + class type *lh_first; /* first element */ \ +} + +#define LIST_HEAD_INITIALIZER(head) \ + { NULL } + +#define LIST_ENTRY(type) \ +struct { \ + struct type *le_next; /* next element */ \ + struct type **le_prev; /* address of previous next element */ \ +} + +#define LIST_CLASS_ENTRY(type) \ +struct { \ + class type *le_next; /* next element */ \ + class type **le_prev; /* address of previous next element */ \ +} + +/* + * List functions. + */ + +#if (defined(_KERNEL) && defined(INVARIANTS)) +/* + * QMD_LIST_CHECK_HEAD(LIST_HEAD *head, LIST_ENTRY NAME) + * + * If the list is non-empty, validates that the first element of the list + * points back at 'head.' 
+ */ +#define QMD_LIST_CHECK_HEAD(head, field) do { \ + if (LIST_FIRST((head)) != NULL && \ + LIST_FIRST((head))->field.le_prev != \ + &LIST_FIRST((head))) \ + panic("Bad list head %p first->prev != head", (head)); \ +} while (0) + +/* + * QMD_LIST_CHECK_NEXT(TYPE *elm, LIST_ENTRY NAME) + * + * If an element follows 'elm' in the list, validates that the next element + * points back at 'elm.' + */ +#define QMD_LIST_CHECK_NEXT(elm, field) do { \ + if (LIST_NEXT((elm), field) != NULL && \ + LIST_NEXT((elm), field)->field.le_prev != \ + &((elm)->field.le_next)) \ + panic("Bad link elm %p next->prev != elm", (elm)); \ +} while (0) + +/* + * QMD_LIST_CHECK_PREV(TYPE *elm, LIST_ENTRY NAME) + * + * Validates that the previous element (or head of the list) points to 'elm.' + */ +#define QMD_LIST_CHECK_PREV(elm, field) do { \ + if (*(elm)->field.le_prev != (elm)) \ + panic("Bad link elm %p prev->next != elm", (elm)); \ +} while (0) +#else +#define QMD_LIST_CHECK_HEAD(head, field) +#define QMD_LIST_CHECK_NEXT(elm, field) +#define QMD_LIST_CHECK_PREV(elm, field) +#endif /* (_KERNEL && INVARIANTS) */ + +#define LIST_CONCAT(head1, head2, type, field) do { \ + QUEUE_TYPEOF(type) *curelm = LIST_FIRST(head1); \ + if (curelm == NULL) { \ + if ((LIST_FIRST(head1) = LIST_FIRST(head2)) != NULL) { \ + LIST_FIRST(head2)->field.le_prev = \ + &LIST_FIRST((head1)); \ + LIST_INIT(head2); \ + } \ + } else if (LIST_FIRST(head2) != NULL) { \ + while (LIST_NEXT(curelm, field) != NULL) \ + curelm = LIST_NEXT(curelm, field); \ + LIST_NEXT(curelm, field) = LIST_FIRST(head2); \ + LIST_FIRST(head2)->field.le_prev = &LIST_NEXT(curelm, field); \ + LIST_INIT(head2); \ + } \ +} while (0) + +#define LIST_EMPTY(head) ((head)->lh_first == NULL) + +#define LIST_FIRST(head) ((head)->lh_first) + +#define LIST_FOREACH(var, head, field) \ + for ((var) = LIST_FIRST((head)); \ + (var); \ + (var) = LIST_NEXT((var), field)) + +#define LIST_FOREACH_FROM(var, head, field) \ + for ((var) = ((var) ? (var) : LIST_FIRST((head))); \ + (var); \ + (var) = LIST_NEXT((var), field)) + +#define LIST_FOREACH_SAFE(var, head, field, tvar) \ + for ((var) = LIST_FIRST((head)); \ + (var) && ((tvar) = LIST_NEXT((var), field), 1); \ + (var) = (tvar)) + +#define LIST_FOREACH_FROM_SAFE(var, head, field, tvar) \ + for ((var) = ((var) ? 
(var) : LIST_FIRST((head))); \ + (var) && ((tvar) = LIST_NEXT((var), field), 1); \ + (var) = (tvar)) + +#define LIST_INIT(head) do { \ + LIST_FIRST((head)) = NULL; \ +} while (0) + +#define LIST_INSERT_AFTER(listelm, elm, field) do { \ + QMD_LIST_CHECK_NEXT(listelm, field); \ + if ((LIST_NEXT((elm), field) = LIST_NEXT((listelm), field)) != NULL)\ + LIST_NEXT((listelm), field)->field.le_prev = \ + &LIST_NEXT((elm), field); \ + LIST_NEXT((listelm), field) = (elm); \ + (elm)->field.le_prev = &LIST_NEXT((listelm), field); \ +} while (0) + +#define LIST_INSERT_BEFORE(listelm, elm, field) do { \ + QMD_LIST_CHECK_PREV(listelm, field); \ + (elm)->field.le_prev = (listelm)->field.le_prev; \ + LIST_NEXT((elm), field) = (listelm); \ + *(listelm)->field.le_prev = (elm); \ + (listelm)->field.le_prev = &LIST_NEXT((elm), field); \ +} while (0) + +#define LIST_INSERT_HEAD(head, elm, field) do { \ + QMD_LIST_CHECK_HEAD((head), field); \ + if ((LIST_NEXT((elm), field) = LIST_FIRST((head))) != NULL) \ + LIST_FIRST((head))->field.le_prev = &LIST_NEXT((elm), field);\ + LIST_FIRST((head)) = (elm); \ + (elm)->field.le_prev = &LIST_FIRST((head)); \ +} while (0) + +#define LIST_NEXT(elm, field) ((elm)->field.le_next) + +#define LIST_PREV(elm, head, type, field) \ + ((elm)->field.le_prev == &LIST_FIRST((head)) ? NULL : \ + __containerof((elm)->field.le_prev, \ + QUEUE_TYPEOF(type), field.le_next)) + +#define LIST_REMOVE(elm, field) do { \ + QMD_SAVELINK(oldnext, (elm)->field.le_next); \ + QMD_SAVELINK(oldprev, (elm)->field.le_prev); \ + QMD_LIST_CHECK_NEXT(elm, field); \ + QMD_LIST_CHECK_PREV(elm, field); \ + if (LIST_NEXT((elm), field) != NULL) \ + LIST_NEXT((elm), field)->field.le_prev = \ + (elm)->field.le_prev; \ + *(elm)->field.le_prev = LIST_NEXT((elm), field); \ + TRASHIT(*oldnext); \ + TRASHIT(*oldprev); \ +} while (0) + +#define LIST_SWAP(head1, head2, type, field) do { \ + QUEUE_TYPEOF(type) *swap_tmp = LIST_FIRST(head1); \ + LIST_FIRST((head1)) = LIST_FIRST((head2)); \ + LIST_FIRST((head2)) = swap_tmp; \ + if ((swap_tmp = LIST_FIRST((head1))) != NULL) \ + swap_tmp->field.le_prev = &LIST_FIRST((head1)); \ + if ((swap_tmp = LIST_FIRST((head2))) != NULL) \ + swap_tmp->field.le_prev = &LIST_FIRST((head2)); \ +} while (0) + /* * Tail queue declarations. */ @@ -123,10 +619,58 @@ struct { \ /* * Tail queue functions. */ +#if (defined(_KERNEL) && defined(INVARIANTS)) +/* + * QMD_TAILQ_CHECK_HEAD(TAILQ_HEAD *head, TAILQ_ENTRY NAME) + * + * If the tailq is non-empty, validates that the first element of the tailq + * points back at 'head.' + */ +#define QMD_TAILQ_CHECK_HEAD(head, field) do { \ + if (!TAILQ_EMPTY(head) && \ + TAILQ_FIRST((head))->field.tqe_prev != \ + &TAILQ_FIRST((head))) \ + panic("Bad tailq head %p first->prev != head", (head)); \ +} while (0) + +/* + * QMD_TAILQ_CHECK_TAIL(TAILQ_HEAD *head, TAILQ_ENTRY NAME) + * + * Validates that the tail of the tailq is a pointer to pointer to NULL. + */ +#define QMD_TAILQ_CHECK_TAIL(head, field) do { \ + if (*(head)->tqh_last != NULL) \ + panic("Bad tailq NEXT(%p->tqh_last) != NULL", (head)); \ +} while (0) + +/* + * QMD_TAILQ_CHECK_NEXT(TYPE *elm, TAILQ_ENTRY NAME) + * + * If an element follows 'elm' in the tailq, validates that the next element + * points back at 'elm.' 
+ */ +#define QMD_TAILQ_CHECK_NEXT(elm, field) do { \ + if (TAILQ_NEXT((elm), field) != NULL && \ + TAILQ_NEXT((elm), field)->field.tqe_prev != \ + &((elm)->field.tqe_next)) \ + panic("Bad link elm %p next->prev != elm", (elm)); \ +} while (0) + +/* + * QMD_TAILQ_CHECK_PREV(TYPE *elm, TAILQ_ENTRY NAME) + * + * Validates that the previous element (or head of the tailq) points to 'elm.' + */ +#define QMD_TAILQ_CHECK_PREV(elm, field) do { \ + if (*(elm)->field.tqe_prev != (elm)) \ + panic("Bad link elm %p prev->next != elm", (elm)); \ +} while (0) +#else #define QMD_TAILQ_CHECK_HEAD(head, field) #define QMD_TAILQ_CHECK_TAIL(head, headname) #define QMD_TAILQ_CHECK_NEXT(elm, field) #define QMD_TAILQ_CHECK_PREV(elm, field) +#endif /* (_KERNEL && INVARIANTS) */ #define TAILQ_CONCAT(head1, head2, field) do { \ if (!TAILQ_EMPTY(head2)) { \ @@ -191,9 +735,8 @@ struct { \ #define TAILQ_INSERT_AFTER(head, listelm, elm, field) do { \ QMD_TAILQ_CHECK_NEXT(listelm, field); \ - TAILQ_NEXT((elm), field) = TAILQ_NEXT((listelm), field); \ - if (TAILQ_NEXT((listelm), field) != NULL) \ - TAILQ_NEXT((elm), field)->field.tqe_prev = \ + if ((TAILQ_NEXT((elm), field) = TAILQ_NEXT((listelm), field)) != NULL)\ + TAILQ_NEXT((elm), field)->field.tqe_prev = \ &TAILQ_NEXT((elm), field); \ else { \ (head)->tqh_last = &TAILQ_NEXT((elm), field); \ @@ -217,8 +760,7 @@ struct { \ #define TAILQ_INSERT_HEAD(head, elm, field) do { \ QMD_TAILQ_CHECK_HEAD(head, field); \ - TAILQ_NEXT((elm), field) = TAILQ_FIRST((head)); \ - if (TAILQ_FIRST((head)) != NULL) \ + if ((TAILQ_NEXT((elm), field) = TAILQ_FIRST((head))) != NULL) \ TAILQ_FIRST((head))->field.tqe_prev = \ &TAILQ_NEXT((elm), field); \ else \ @@ -250,21 +792,24 @@ struct { \ * you may want to prefetch the last data element. */ #define TAILQ_LAST_FAST(head, type, field) \ - (TAILQ_EMPTY(head) ? NULL : __containerof((head)->tqh_last, \ - QUEUE_TYPEOF(type), field.tqe_next)) + (TAILQ_EMPTY(head) ? NULL : __containerof((head)->tqh_last, QUEUE_TYPEOF(type), field.tqe_next)) #define TAILQ_NEXT(elm, field) ((elm)->field.tqe_next) #define TAILQ_PREV(elm, headname, field) \ (*(((struct headname *)((elm)->field.tqe_prev))->tqh_last)) +#define TAILQ_PREV_FAST(elm, head, type, field) \ + ((elm)->field.tqe_prev == &(head)->tqh_first ? 
NULL : \ + __containerof((elm)->field.tqe_prev, QUEUE_TYPEOF(type), field.tqe_next)) + #define TAILQ_REMOVE(head, elm, field) do { \ QMD_SAVELINK(oldnext, (elm)->field.tqe_next); \ QMD_SAVELINK(oldprev, (elm)->field.tqe_prev); \ QMD_TAILQ_CHECK_NEXT(elm, field); \ QMD_TAILQ_CHECK_PREV(elm, field); \ if ((TAILQ_NEXT((elm), field)) != NULL) \ - TAILQ_NEXT((elm), field)->field.tqe_prev = \ + TAILQ_NEXT((elm), field)->field.tqe_prev = \ (elm)->field.tqe_prev; \ else { \ (head)->tqh_last = (elm)->field.tqe_prev; \ @@ -277,26 +822,20 @@ struct { \ } while (0) #define TAILQ_SWAP(head1, head2, type, field) do { \ - QUEUE_TYPEOF(type) * swap_first = (head1)->tqh_first; \ - QUEUE_TYPEOF(type) * *swap_last = (head1)->tqh_last; \ + QUEUE_TYPEOF(type) *swap_first = (head1)->tqh_first; \ + QUEUE_TYPEOF(type) **swap_last = (head1)->tqh_last; \ (head1)->tqh_first = (head2)->tqh_first; \ (head1)->tqh_last = (head2)->tqh_last; \ (head2)->tqh_first = swap_first; \ (head2)->tqh_last = swap_last; \ - swap_first = (head1)->tqh_first; \ - if (swap_first != NULL) \ + if ((swap_first = (head1)->tqh_first) != NULL) \ swap_first->field.tqe_prev = &(head1)->tqh_first; \ else \ (head1)->tqh_last = &(head1)->tqh_first; \ - swap_first = (head2)->tqh_first; \ - if (swap_first != NULL) \ + if ((swap_first = (head2)->tqh_first) != NULL) \ swap_first->field.tqe_prev = &(head2)->tqh_first; \ else \ (head2)->tqh_last = &(head2)->tqh_first; \ } while (0) -#ifdef __cplusplus -} -#endif - -#endif /* _SYS_QUEUE_H_ */ +#endif /* !_SYS_QUEUE_H_ */ From patchwork Tue Apr 14 19:44:25 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dmitry Kozlyuk X-Patchwork-Id: 68452 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id E00FCA0577; Tue, 14 Apr 2020 21:46:19 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 94CAA1D52D; Tue, 14 Apr 2020 21:45:02 +0200 (CEST) Received: from mail-lj1-f193.google.com (mail-lj1-f193.google.com [209.85.208.193]) by dpdk.org (Postfix) with ESMTP id DE2071D42B for ; Tue, 14 Apr 2020 21:44:56 +0200 (CEST) Received: by mail-lj1-f193.google.com with SMTP id v9so1089799ljk.12 for ; Tue, 14 Apr 2020 12:44:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=1kN5TY+Zl6QEN2hfRs8+DrUJ4teAQmQxRco9vDWaEFk=; b=nB5jPdGwioZdBYmP2KQCLfLvRqv947cdMKTSpxnv+HWKZO7MTbSHrCjPfEzJ5l0t2w TTaaWx+AnvOopVM4qVb96D154e2M3Kx60LVE8bIDbeF3u5sM68NwYVMv1eTv2JmlB0v/ cNlFZl/T1mLffqvhK0QoJ/ggLn8OJYLtmd8CCF539g6uJkORsCQRdS7yfsv5DF1KcTDR NgP6lRGPVYJ6RpOwEQkQS7eB70ifDD/RlKWZfWs+AoDWrLOwHyo2kPnJvyiBGr8Rtwrc xMriQCzeQnJZos4HUc5zaCeEhxzJcz5Z3SLui+5SUchNebZ1i+C+ojH6vOVKxx7iiucm wjgA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=1kN5TY+Zl6QEN2hfRs8+DrUJ4teAQmQxRco9vDWaEFk=; b=qYA+bOHe/P1wIKekb+NrdaX3RDeCf12S8Xu7exIkIEMEVNiNtF4qzRTDB5n9wdJQ5x dmn9eNlGvSTXhZ6hMhnua7xk306KD1aokw74UEm9jWxIGm0mUfgk8zBWMcXMCZ7rgVAo qZpg9N2EPrF5hPmx9zYHOxS3EKqjCtsH17rJIJ01PTjnSOw6OK91H17RS4t1yuSu4jvv 4zyoFaGSrxHuvH1xnHJ7ERvPLDuKzbTHNw1Z3i1+xWwD1k+Qs8KmadUQxVaRsPnIvenw 
OdynWTWIaXxcEo6e6v5/MbvsS5OUbQ6j+lZojWU/JCwouQCOzVb/5EMy1A0vABT88I4h lPJA== X-Gm-Message-State: AGi0PuYa8PgU7mZw3UI7CNG5hUrHNk5GcCnE/hdDdf6kwuSO9PAXDwkh l2nOOc+tcUaQnL7LGTrru3FAoIoK3x7+fw== X-Google-Smtp-Source: APiQypKwbmLD3xi4/eThs2FdHTdEF+i6zVHJ8BDNtwQizaHmq/8Vg/n59J+fXV03VIPbU1j9hM2PvQ== X-Received: by 2002:a05:651c:50a:: with SMTP id o10mr949196ljp.163.1586893493806; Tue, 14 Apr 2020 12:44:53 -0700 (PDT) Received: from localhost.localdomain (broadband-37-110-65-23.ip.moscow.rt.ru. [37.110.65.23]) by smtp.googlemail.com with ESMTPSA id h14sm11272426lfm.60.2020.04.14.12.44.52 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 14 Apr 2020 12:44:52 -0700 (PDT) From: Dmitry Kozlyuk To: dev@dpdk.org Cc: "Dmitry Malloy (MESHCHANINOV)" , Narcisa Ana Maria Vasile , Fady Bader , Tal Shnaiderman , Dmitry Kozlyuk , Thomas Monjalon , Harini Ramakrishnan , Omar Cardona , Pallavi Kadam , Ranjit Menon , John McNamara , Marko Kovacevic , Anatoly Burakov , Bruce Richardson Date: Tue, 14 Apr 2020 22:44:25 +0300 Message-Id: <20200414194426.1640704-11-dmitry.kozliuk@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200414194426.1640704-1-dmitry.kozliuk@gmail.com> References: <20200410164342.1194634-1-dmitry.kozliuk@gmail.com> <20200414194426.1640704-1-dmitry.kozliuk@gmail.com> MIME-Version: 1.0 Subject: [dpdk-dev] [PATCH v3 10/10] eal/windows: implement basic memory management X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Basic memory management supports core libraries and PMDs operating in IOVA as PA mode. It uses a kernel-mode driver, virt2phys, to obtain IOVAs of hugepages allocated from user-mode. 
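For illustration, the sketch below mirrors the user-mode half of this translation as implemented by rte_mem_virt2phy() in this patch: the virtual address is handed to the virt2phys driver via DeviceIoControl(). This is a simplified sketch, not the patch code itself: the device handle is assumed to come from the SetupDi* enumeration done in eal_mem_virt2iova_init(), and error handling is trimmed.

.. code-block:: c

	#include <stdint.h>
	#include <windows.h>

	#include <rte_virt2phys.h> /* IOCTL_VIRT2PHYS_TRANSLATE; header added by this patch */

	/* Sketch only: translate a virtual address of the calling process
	 * to a physical address, the way rte_mem_virt2phy() does it.
	 */
	static uint64_t
	translate(HANDLE device, const void *virt)
	{
		LARGE_INTEGER phys;
		DWORD bytes_returned;

		if (!DeviceIoControl(device, IOCTL_VIRT2PHYS_TRANSLATE,
				&virt, sizeof(virt), &phys, sizeof(phys),
				&bytes_returned, NULL))
			return UINT64_MAX; /* translation failed */
		return phys.QuadPart;
	}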
Signed-off-by: Dmitry Kozlyuk --- config/meson.build | 2 +- doc/guides/windows_gsg/run_apps.rst | 43 +- lib/librte_eal/common/eal_common_fbarray.c | 57 +- lib/librte_eal/common/eal_common_memory.c | 50 +- lib/librte_eal/common/eal_private.h | 6 +- lib/librte_eal/common/malloc_heap.c | 1 + lib/librte_eal/common/meson.build | 9 + lib/librte_eal/freebsd/eal_memory.c | 1 - lib/librte_eal/rte_eal_exports.def | 119 ++- lib/librte_eal/windows/eal.c | 55 ++ lib/librte_eal/windows/eal_memalloc.c | 418 +++++++++++ lib/librte_eal/windows/eal_memory.c | 706 +++++++++++++++++- lib/librte_eal/windows/eal_mp.c | 103 +++ lib/librte_eal/windows/eal_windows.h | 23 + lib/librte_eal/windows/include/meson.build | 1 + lib/librte_eal/windows/include/rte_os.h | 4 + .../windows/include/rte_virt2phys.h | 34 + lib/librte_eal/windows/include/rte_windows.h | 2 + lib/librte_eal/windows/include/unistd.h | 3 + lib/librte_eal/windows/meson.build | 4 + 20 files changed, 1571 insertions(+), 70 deletions(-) create mode 100644 lib/librte_eal/windows/eal_memalloc.c create mode 100644 lib/librte_eal/windows/eal_mp.c create mode 100644 lib/librte_eal/windows/include/rte_virt2phys.h diff --git a/config/meson.build b/config/meson.build index bceb5ef7b..7b8baa788 100644 --- a/config/meson.build +++ b/config/meson.build @@ -270,7 +270,7 @@ if is_windows add_project_link_arguments('-lmincore', language: 'c') endif - add_project_link_arguments('-ladvapi32', language: 'c') + add_project_link_arguments('-ladvapi32', '-lsetupapi', language: 'c') endif if get_option('b_lto') diff --git a/doc/guides/windows_gsg/run_apps.rst b/doc/guides/windows_gsg/run_apps.rst index 21ac7f6c1..323d050dc 100644 --- a/doc/guides/windows_gsg/run_apps.rst +++ b/doc/guides/windows_gsg/run_apps.rst @@ -7,10 +7,10 @@ Running DPDK Applications Grant *Lock pages in memory* Privilege -------------------------------------- -Use of hugepages ("large pages" in Windows terminolocy) requires +Use of hugepages ("large pages" in Windows terminology) requires ``SeLockMemoryPrivilege`` for the user running an application. -1. Open *Local Security Policy* snap in, either: +1. Open *Local Security Policy* snap-in, either: * Control Panel / Computer Management / Local Security Policy; * or Win+R, type ``secpol``, press Enter. @@ -24,7 +24,44 @@ Use of hugepages ("large pages" in Windows terminolocy) requires See `Large-Page Support`_ in MSDN for details. -.. _Large-page Support: https://docs.microsoft.com/en-us/windows/win32/memory/large-page-support +.. _Large-Page Support: https://docs.microsoft.com/en-us/windows/win32/memory/large-page-support + + +Load virt2phys Driver +--------------------- + +Access to physical addresses is provided by a kernel-mode driver, virt2phys. +It is mandatory at least for using hardware PMDs, but may also be required +for mempools. + +Refer to documentation in ``dpdk-kmods`` repository for details on system +setup, driver build and installation. This driver is not signed, so signature +checking must be disabled to load it. + +.. warning:: + + Disabling driver signature enforcement weakens OS security. + It is discouraged in production environments. + +Compiled package consists of ``virt2phys.inf``, ``virt2phys.cat``, +and ``virt2phys.sys``. It can be installed as follows +from Elevated Command Prompt: + +.. code-block:: console + + pnputil /add-driver Z:\path\to\virt2phys.inf /install + +When loaded successfully, the driver is shown in *Device Manager* as *Virtual +to physical address translator* device under *Kernel bypass* category. 
+Installed driver persists across reboots. + +If DPDK is unable to communicate with the driver, a warning is printed +on initialization (debug-level logs provide more details): + +.. code-block:: text + + EAL: Cannot open virt2phys driver interface + Run the ``helloworld`` Example diff --git a/lib/librte_eal/common/eal_common_fbarray.c b/lib/librte_eal/common/eal_common_fbarray.c index 1312f936b..236db9cb7 100644 --- a/lib/librte_eal/common/eal_common_fbarray.c +++ b/lib/librte_eal/common/eal_common_fbarray.c @@ -5,15 +5,15 @@ #include #include #include -#include #include #include -#include #include +#include #include -#include #include +#include +#include #include #include @@ -85,19 +85,16 @@ resize_and_map(int fd, void *addr, size_t len) char path[PATH_MAX]; void *map_addr; - if (ftruncate(fd, len)) { + if (eal_file_truncate(fd, len)) { RTE_LOG(ERR, EAL, "Cannot truncate %s\n", path); /* pass errno up the chain */ rte_errno = errno; return -1; } - map_addr = mmap(addr, len, PROT_READ | PROT_WRITE, - MAP_SHARED | MAP_FIXED, fd, 0); + map_addr = rte_mem_map(addr, len, RTE_PROT_READ | RTE_PROT_WRITE, + RTE_MAP_SHARED | RTE_MAP_FIXED, fd, 0); if (map_addr != addr) { - RTE_LOG(ERR, EAL, "mmap() failed: %s\n", strerror(errno)); - /* pass errno up the chain */ - rte_errno = errno; return -1; } return 0; @@ -735,7 +732,7 @@ rte_fbarray_init(struct rte_fbarray *arr, const char *name, unsigned int len, return -1; } - page_sz = sysconf(_SC_PAGESIZE); + page_sz = rte_get_page_size(); if (page_sz == (size_t)-1) { free(ma); return -1; @@ -756,9 +753,12 @@ rte_fbarray_init(struct rte_fbarray *arr, const char *name, unsigned int len, if (internal_config.no_shconf) { /* remap virtual area as writable */ - void *new_data = mmap(data, mmap_len, PROT_READ | PROT_WRITE, - MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS, fd, 0); - if (new_data == MAP_FAILED) { + void *new_data = rte_mem_map( + data, mmap_len, + RTE_PROT_READ | RTE_PROT_WRITE, + RTE_MAP_FIXED | RTE_MAP_PRIVATE | RTE_MAP_ANONYMOUS, + fd, 0); + if (new_data == NULL) { RTE_LOG(DEBUG, EAL, "%s(): couldn't remap anonymous memory: %s\n", __func__, strerror(errno)); goto fail; @@ -778,7 +778,8 @@ rte_fbarray_init(struct rte_fbarray *arr, const char *name, unsigned int len, __func__, path, strerror(errno)); rte_errno = errno; goto fail; - } else if (flock(fd, LOCK_EX | LOCK_NB)) { + } else if (eal_file_lock( + fd, EAL_FLOCK_EXCLUSIVE, EAL_FLOCK_RETURN)) { RTE_LOG(DEBUG, EAL, "%s(): couldn't lock %s: %s\n", __func__, path, strerror(errno)); rte_errno = EBUSY; @@ -789,10 +790,8 @@ rte_fbarray_init(struct rte_fbarray *arr, const char *name, unsigned int len, * still attach to it, but no other process could reinitialize * it. 
*/ - if (flock(fd, LOCK_SH | LOCK_NB)) { - rte_errno = errno; + if (eal_file_lock(fd, EAL_FLOCK_SHARED, EAL_FLOCK_RETURN)) goto fail; - } if (resize_and_map(fd, data, mmap_len)) goto fail; @@ -824,7 +823,7 @@ rte_fbarray_init(struct rte_fbarray *arr, const char *name, unsigned int len, return 0; fail: if (data) - munmap(data, mmap_len); + rte_mem_unmap(data, mmap_len); if (fd >= 0) close(fd); free(ma); @@ -862,7 +861,7 @@ rte_fbarray_attach(struct rte_fbarray *arr) return -1; } - page_sz = sysconf(_SC_PAGESIZE); + page_sz = rte_get_page_size(); if (page_sz == (size_t)-1) { free(ma); return -1; @@ -895,10 +894,8 @@ rte_fbarray_attach(struct rte_fbarray *arr) } /* lock the file, to let others know we're using it */ - if (flock(fd, LOCK_SH | LOCK_NB)) { - rte_errno = errno; + if (eal_file_lock(fd, EAL_FLOCK_SHARED, EAL_FLOCK_RETURN)) goto fail; - } if (resize_and_map(fd, data, mmap_len)) goto fail; @@ -916,7 +913,7 @@ rte_fbarray_attach(struct rte_fbarray *arr) return 0; fail: if (data) - munmap(data, mmap_len); + rte_mem_unmap(data, mmap_len); if (fd >= 0) close(fd); free(ma); @@ -944,8 +941,7 @@ rte_fbarray_detach(struct rte_fbarray *arr) * really do anything about it, things will blow up either way. */ - size_t page_sz = sysconf(_SC_PAGESIZE); - + size_t page_sz = rte_get_page_size(); if (page_sz == (size_t)-1) return -1; @@ -964,7 +960,7 @@ rte_fbarray_detach(struct rte_fbarray *arr) goto out; } - munmap(arr->data, mmap_len); + rte_mem_unmap(arr->data, mmap_len); /* area is unmapped, close fd and remove the tailq entry */ if (tmp->fd >= 0) @@ -999,8 +995,7 @@ rte_fbarray_destroy(struct rte_fbarray *arr) * really do anything about it, things will blow up either way. */ - size_t page_sz = sysconf(_SC_PAGESIZE); - + size_t page_sz = rte_get_page_size(); if (page_sz == (size_t)-1) return -1; @@ -1025,7 +1020,7 @@ rte_fbarray_destroy(struct rte_fbarray *arr) * has been detached by all other processes */ fd = tmp->fd; - if (flock(fd, LOCK_EX | LOCK_NB)) { + if (eal_file_lock(fd, EAL_FLOCK_EXCLUSIVE, EAL_FLOCK_RETURN)) { RTE_LOG(DEBUG, EAL, "Cannot destroy fbarray - another process is using it\n"); rte_errno = EBUSY; ret = -1; @@ -1042,14 +1037,14 @@ rte_fbarray_destroy(struct rte_fbarray *arr) * we're still holding an exclusive lock, so drop it to * shared. 
*/ - flock(fd, LOCK_SH | LOCK_NB); + eal_file_lock(fd, EAL_FLOCK_SHARED, EAL_FLOCK_RETURN); ret = -1; goto out; } close(fd); } - munmap(arr->data, mmap_len); + rte_mem_unmap(arr->data, mmap_len); /* area is unmapped, remove the tailq entry */ TAILQ_REMOVE(&mem_area_tailq, tmp, next); diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c index d9764681a..5c3cf1f75 100644 --- a/lib/librte_eal/common/eal_common_memory.c +++ b/lib/librte_eal/common/eal_common_memory.c @@ -11,7 +11,6 @@ #include #include #include -#include #include #include @@ -44,7 +43,7 @@ static uint64_t system_page_sz; #define MAX_MMAP_WITH_DEFINED_ADDR_TRIES 5 void * eal_get_virtual_area(void *requested_addr, size_t *size, - size_t page_sz, int flags, int mmap_flags) + size_t page_sz, int flags, enum eal_mem_reserve_flags reserve_flags) { bool addr_is_hint, allow_shrink, unmap, no_align; uint64_t map_sz; @@ -52,9 +51,7 @@ eal_get_virtual_area(void *requested_addr, size_t *size, uint8_t try = 0; if (system_page_sz == 0) - system_page_sz = sysconf(_SC_PAGESIZE); - - mmap_flags |= MAP_PRIVATE | MAP_ANONYMOUS; + system_page_sz = rte_get_page_size(); RTE_LOG(DEBUG, EAL, "Ask a virtual area of 0x%zx bytes\n", *size); @@ -98,24 +95,24 @@ eal_get_virtual_area(void *requested_addr, size_t *size, return NULL; } - mapped_addr = mmap(requested_addr, (size_t)map_sz, PROT_NONE, - mmap_flags, -1, 0); - if (mapped_addr == MAP_FAILED && allow_shrink) - *size -= page_sz; + mapped_addr = eal_mem_reserve( + requested_addr, (size_t)map_sz, reserve_flags); + if ((mapped_addr == NULL) && allow_shrink) + *size -= page_sz; - if (mapped_addr != MAP_FAILED && addr_is_hint && - mapped_addr != requested_addr) { + if ((mapped_addr != NULL) && addr_is_hint && + (mapped_addr != requested_addr)) { try++; next_baseaddr = RTE_PTR_ADD(next_baseaddr, page_sz); if (try <= MAX_MMAP_WITH_DEFINED_ADDR_TRIES) { /* hint was not used. Try with another offset */ - munmap(mapped_addr, map_sz); - mapped_addr = MAP_FAILED; + eal_mem_free(mapped_addr, map_sz); + mapped_addr = NULL; requested_addr = next_baseaddr; } } } while ((allow_shrink || addr_is_hint) && - mapped_addr == MAP_FAILED && *size > 0); + (mapped_addr == NULL) && (*size > 0)); /* align resulting address - if map failed, we will ignore the value * anyway, so no need to add additional checks. 
@@ -125,20 +122,17 @@ eal_get_virtual_area(void *requested_addr, size_t *size, if (*size == 0) { RTE_LOG(ERR, EAL, "Cannot get a virtual area of any size: %s\n", - strerror(errno)); - rte_errno = errno; + strerror(rte_errno)); return NULL; - } else if (mapped_addr == MAP_FAILED) { + } else if (mapped_addr == NULL) { RTE_LOG(ERR, EAL, "Cannot get a virtual area: %s\n", - strerror(errno)); - /* pass errno up the call chain */ - rte_errno = errno; + strerror(rte_errno)); return NULL; } else if (requested_addr != NULL && !addr_is_hint && aligned_addr != requested_addr) { RTE_LOG(ERR, EAL, "Cannot get a virtual area at requested address: %p (got %p)\n", requested_addr, aligned_addr); - munmap(mapped_addr, map_sz); + eal_mem_free(mapped_addr, map_sz); rte_errno = EADDRNOTAVAIL; return NULL; } else if (requested_addr != NULL && addr_is_hint && @@ -154,7 +148,7 @@ eal_get_virtual_area(void *requested_addr, size_t *size, aligned_addr, *size); if (unmap) { - munmap(mapped_addr, map_sz); + eal_mem_free(mapped_addr, map_sz); } else if (!no_align) { void *map_end, *aligned_end; size_t before_len, after_len; @@ -172,12 +166,12 @@ eal_get_virtual_area(void *requested_addr, size_t *size, /* unmap space before aligned mmap address */ before_len = RTE_PTR_DIFF(aligned_addr, mapped_addr); if (before_len > 0) - munmap(mapped_addr, before_len); + eal_mem_free(mapped_addr, before_len); /* unmap space after aligned end mmap address */ after_len = RTE_PTR_DIFF(map_end, aligned_end); if (after_len > 0) - munmap(aligned_end, after_len); + eal_mem_free(aligned_end, after_len); } return aligned_addr; @@ -586,10 +580,10 @@ rte_eal_memdevice_init(void) int rte_mem_lock_page(const void *virt) { - unsigned long virtual = (unsigned long)virt; - int page_size = getpagesize(); - unsigned long aligned = (virtual & ~(page_size - 1)); - return mlock((void *)aligned, page_size); + uintptr_t virtual = (uintptr_t)virt; + int page_size = rte_get_page_size(); + uintptr_t aligned = (virtual & ~(page_size - 1)); + return rte_mem_lock((void *)aligned, page_size); } int diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h index 76938e379..59ac41916 100644 --- a/lib/librte_eal/common/eal_private.h +++ b/lib/librte_eal/common/eal_private.h @@ -226,8 +226,8 @@ enum eal_mem_reserve_flags { * Page size on which to align requested virtual area. * @param flags * EAL_VIRTUAL_AREA_* flags. - * @param mmap_flags - * Extra flags passed directly to mmap(). + * @param reserve_flags + * Extra flags passed directly to rte_mem_reserve(). * * @return * Virtual area address if successful. @@ -244,7 +244,7 @@ enum eal_mem_reserve_flags { /**< immediately unmap reserved virtual area. */ void * eal_get_virtual_area(void *requested_addr, size_t *size, size_t page_sz, - int flags, int mmap_flags); + int flags, enum eal_mem_reserve_flags reserve_flags); /** * Reserve VA space for a memory segment list. 
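As an aside on the rte_mem_lock_page() rework above: the bit mask rounds an address down to the start of its page before locking. A minimal standalone sketch of that arithmetic, assuming a power-of-two page size (the values below are illustrative only; EAL obtains the real size from rte_get_page_size()):

.. code-block:: c

	#include <stdint.h>
	#include <stdio.h>

	int
	main(void)
	{
		uintptr_t page_size = 4096;   /* example page size */
		uintptr_t virt = 0x7ffe1234;  /* arbitrary address */
		/* Clearing the low log2(page_size) bits rounds down
		 * to the page boundary: 0x7ffe1234 -> 0x7ffe1000.
		 */
		uintptr_t aligned = virt & ~(page_size - 1);

		printf("%#lx -> %#lx\n",
			(unsigned long)virt, (unsigned long)aligned);
		return 0;
	}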
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c index 842eb9de7..6534c895c 100644 --- a/lib/librte_eal/common/malloc_heap.c +++ b/lib/librte_eal/common/malloc_heap.c @@ -729,6 +729,7 @@ malloc_heap_alloc(const char *type, size_t size, int socket_arg, if (ret != NULL) return ret; } + return NULL; } diff --git a/lib/librte_eal/common/meson.build b/lib/librte_eal/common/meson.build index 02d9280cc..6dcdcc890 100644 --- a/lib/librte_eal/common/meson.build +++ b/lib/librte_eal/common/meson.build @@ -9,11 +9,20 @@ if is_windows 'eal_common_class.c', 'eal_common_devargs.c', 'eal_common_errno.c', + 'eal_common_fbarray.c', 'eal_common_launch.c', 'eal_common_lcore.c', 'eal_common_log.c', + 'eal_common_mcfg.c', + 'eal_common_memalloc.c', + 'eal_common_memory.c', + 'eal_common_memzone.c', 'eal_common_options.c', + 'eal_common_tailqs.c', 'eal_common_thread.c', + 'malloc_elem.c', + 'malloc_heap.c', + 'rte_malloc.c', 'rte_option.c', ) subdir_done() diff --git a/lib/librte_eal/freebsd/eal_memory.c b/lib/librte_eal/freebsd/eal_memory.c index 5174f9cd0..99bf6ec9e 100644 --- a/lib/librte_eal/freebsd/eal_memory.c +++ b/lib/librte_eal/freebsd/eal_memory.c @@ -355,7 +355,6 @@ memseg_list_reserve(struct rte_memseg_list *msl) return eal_reserve_memseg_list(msl, flags); } - static int memseg_primary_init(void) { diff --git a/lib/librte_eal/rte_eal_exports.def b/lib/librte_eal/rte_eal_exports.def index bacf9a107..854b83bcd 100644 --- a/lib/librte_eal/rte_eal_exports.def +++ b/lib/librte_eal/rte_eal_exports.def @@ -1,13 +1,128 @@ EXPORTS __rte_panic + rte_calloc + rte_calloc_socket rte_eal_get_configuration + rte_eal_has_hugepages rte_eal_init + rte_eal_iova_mode rte_eal_mp_remote_launch rte_eal_mp_wait_lcore + rte_eal_process_type rte_eal_remote_launch - rte_get_page_size + rte_eal_tailq_lookup + rte_eal_tailq_register + rte_eal_using_phys_addrs + rte_free rte_log + rte_malloc + rte_malloc_dump_stats + rte_malloc_get_socket_stats + rte_malloc_set_limit + rte_malloc_socket + rte_malloc_validate + rte_malloc_virt2iova + rte_mcfg_mem_read_lock + rte_mcfg_mem_read_unlock + rte_mcfg_mem_write_lock + rte_mcfg_mem_write_unlock + rte_mcfg_mempool_read_lock + rte_mcfg_mempool_read_unlock + rte_mcfg_mempool_write_lock + rte_mcfg_mempool_write_unlock + rte_mcfg_tailq_read_lock + rte_mcfg_tailq_read_unlock + rte_mcfg_tailq_write_lock + rte_mcfg_tailq_write_unlock + rte_mem_lock_page + rte_mem_virt2iova + rte_mem_virt2phy + rte_memory_get_nchannel + rte_memory_get_nrank + rte_memzone_dump + rte_memzone_free + rte_memzone_lookup + rte_memzone_reserve + rte_memzone_reserve_aligned + rte_memzone_reserve_bounded + rte_memzone_walk + rte_vlog + rte_realloc + rte_zmalloc + rte_zmalloc_socket + + rte_mp_action_register + rte_mp_action_unregister + rte_mp_reply + rte_mp_sendmsg + + rte_fbarray_attach + rte_fbarray_destroy + rte_fbarray_detach + rte_fbarray_dump_metadata + rte_fbarray_find_contig_free + rte_fbarray_find_contig_used + rte_fbarray_find_idx + rte_fbarray_find_next_free + rte_fbarray_find_next_n_free + rte_fbarray_find_next_n_used + rte_fbarray_find_next_used + rte_fbarray_get + rte_fbarray_init + rte_fbarray_is_used + rte_fbarray_set_free + rte_fbarray_set_used + rte_malloc_dump_heaps + rte_mem_alloc_validator_register + rte_mem_alloc_validator_unregister + rte_mem_check_dma_mask + rte_mem_event_callback_register + rte_mem_event_callback_unregister + rte_mem_iova2virt + rte_mem_virt2memseg + rte_mem_virt2memseg_list + rte_memseg_contig_walk + rte_memseg_list_walk + rte_memseg_walk + 
rte_mp_request_async + rte_mp_request_sync + + rte_fbarray_find_prev_free + rte_fbarray_find_prev_n_free + rte_fbarray_find_prev_n_used + rte_fbarray_find_prev_used + rte_fbarray_find_rev_contig_free + rte_fbarray_find_rev_contig_used + rte_memseg_contig_walk_thread_unsafe + rte_memseg_list_walk_thread_unsafe + rte_memseg_walk_thread_unsafe + + rte_malloc_heap_create + rte_malloc_heap_destroy + rte_malloc_heap_get_socket + rte_malloc_heap_memory_add + rte_malloc_heap_memory_attach + rte_malloc_heap_memory_detach + rte_malloc_heap_memory_remove + rte_malloc_heap_socket_is_external + rte_mem_check_dma_mask_thread_unsafe + rte_mem_set_dma_mask + rte_memseg_get_fd + rte_memseg_get_fd_offset + rte_memseg_get_fd_offset_thread_unsafe + rte_memseg_get_fd_thread_unsafe + + rte_extmem_attach + rte_extmem_detach + rte_extmem_register + rte_extmem_unregister + + rte_fbarray_find_biggest_free + rte_fbarray_find_biggest_used + rte_fbarray_find_rev_biggest_free + rte_fbarray_find_rev_biggest_used + + rte_get_page_size rte_mem_lock rte_mem_map rte_mem_unmap - rte_vlog diff --git a/lib/librte_eal/windows/eal.c b/lib/librte_eal/windows/eal.c index cf55b56da..38f17f09c 100644 --- a/lib/librte_eal/windows/eal.c +++ b/lib/librte_eal/windows/eal.c @@ -93,6 +93,24 @@ eal_proc_type_detect(void) return ptype; } +enum rte_proc_type_t +rte_eal_process_type(void) +{ + return rte_config.process_type; +} + +int +rte_eal_has_hugepages(void) +{ + return !internal_config.no_hugetlbfs; +} + +enum rte_iova_mode +rte_eal_iova_mode(void) +{ + return rte_config.iova_mode; +} + /* display usage */ static void eal_usage(const char *prgname) @@ -328,6 +346,13 @@ rte_eal_init(int argc, char **argv) if (fctret < 0) exit(1); + /* Prevent creation of shared memory files. */ + if (internal_config.no_shconf == 0) { + RTE_LOG(WARNING, EAL, "Multi-process support is requested, " + "but not available.\n"); + internal_config.no_shconf = 1; + } + if (!internal_config.no_hugetlbfs && (eal_hugepage_info_init() < 0)) { rte_eal_init_alert("Cannot get hugepage information"); rte_errno = EACCES; @@ -345,6 +370,36 @@ rte_eal_init(int argc, char **argv) return -1; } + if (eal_mem_virt2iova_init() < 0) { + /* Non-fatal error if physical addresses are not required. */ + RTE_LOG(WARNING, EAL, "Cannot access virt2phys driver, " + "PA will not be available\n"); + } + + if (rte_eal_memzone_init() < 0) { + rte_eal_init_alert("Cannot init memzone"); + rte_errno = ENODEV; + return -1; + } + + if (rte_eal_memory_init() < 0) { + rte_eal_init_alert("Cannot init memory"); + rte_errno = ENOMEM; + return -1; + } + + if (rte_eal_malloc_heap_init() < 0) { + rte_eal_init_alert("Cannot init malloc heap"); + rte_errno = ENODEV; + return -1; + } + + if (rte_eal_tailqs_init() < 0) { + rte_eal_init_alert("Cannot init tail queues for objects"); + rte_errno = EFAULT; + return -1; + } + eal_thread_init_master(rte_config.master_lcore); RTE_LCORE_FOREACH_SLAVE(i) { diff --git a/lib/librte_eal/windows/eal_memalloc.c b/lib/librte_eal/windows/eal_memalloc.c new file mode 100644 index 000000000..e72e785b8 --- /dev/null +++ b/lib/librte_eal/windows/eal_memalloc.c @@ -0,0 +1,418 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright (c) 2020 Dmitry Kozlyuk + */ + +#include +#include +#include + +#include "eal_internal_cfg.h" +#include "eal_memalloc.h" +#include "eal_memcfg.h" +#include "eal_private.h" +#include "eal_windows.h" + +int +eal_memalloc_get_seg_fd(int list_idx, int seg_idx) +{ + /* Hugepages have no associated files in Windows. 
*/ + RTE_SET_USED(list_idx); + RTE_SET_USED(seg_idx); + EAL_LOG_NOT_IMPLEMENTED(); + return -1; +} + +int +eal_memalloc_get_seg_fd_offset(int list_idx, int seg_idx, size_t *offset) +{ + /* Hugepages have no associated files in Windows. */ + RTE_SET_USED(list_idx); + RTE_SET_USED(seg_idx); + RTE_SET_USED(offset); + EAL_LOG_NOT_IMPLEMENTED(); + return -1; +} + +static int +alloc_seg(struct rte_memseg *ms, void *requested_addr, int socket_id, + struct hugepage_info *hi) +{ + HANDLE current_process; + unsigned int numa_node; + size_t alloc_sz; + void *addr; + rte_iova_t iova = RTE_BAD_IOVA; + PSAPI_WORKING_SET_EX_INFORMATION info; + PSAPI_WORKING_SET_EX_BLOCK *page; + + if (ms->len > 0) { + /* If a segment is already allocated as needed, return it. */ + if ((ms->addr == requested_addr) && + (ms->socket_id == socket_id) && + (ms->hugepage_sz == hi->hugepage_sz)) { + return 0; + } + + /* Bugcheck, should not happen. */ + RTE_LOG(DEBUG, EAL, "Attempted to reallocate segment %p " + "(size %zu) on socket %d", ms->addr, + ms->len, ms->socket_id); + return -1; + } + + current_process = GetCurrentProcess(); + numa_node = eal_socket_numa_node(socket_id); + alloc_sz = hi->hugepage_sz; + + if (requested_addr == NULL) { + /* Request a new chunk of memory from OS. */ + addr = eal_mem_alloc_socket(alloc_sz, socket_id); + if (addr == NULL) { + RTE_LOG(DEBUG, EAL, "Cannot allocate %zu bytes " + "on socket %d\n", alloc_sz, socket_id); + return -1; + } + } else { + /* Requested address is already reserved, commit memory. */ + addr = eal_mem_commit(requested_addr, alloc_sz, socket_id); + if (addr == NULL) { + RTE_LOG(DEBUG, EAL, "Cannot commit reserved memory %p " + "(size %zu) on socket %d\n", + requested_addr, alloc_sz, socket_id); + return -1; + } + } + + /* Force OS to allocate a physical page and select a NUMA node. + * Hugepages are not pageable in Windows, so there's no race + * for physical address. + */ + *(volatile int *)addr = *(volatile int *)addr; + + /* Only try to obtain IOVA if it's available, so that applications + * that do not need IOVA can use this allocator. + */ + if (rte_eal_using_phys_addrs()) { + iova = rte_mem_virt2iova(addr); + if (iova == RTE_BAD_IOVA) { + RTE_LOG(DEBUG, EAL, + "Cannot get IOVA of allocated segment\n"); + goto error; + } + } + + /* Only "Ex" function can handle hugepages. */ + info.VirtualAddress = addr; + if (!QueryWorkingSetEx(current_process, &info, sizeof(info))) { + RTE_LOG_WIN32_ERR("QueryWorkingSetEx()"); + goto error; + } + + page = &info.VirtualAttributes; + if (!page->Valid || !page->LargePage) { + RTE_LOG(DEBUG, EAL, "Got regular page instead of a hugepage\n"); + goto error; + } + if (page->Node != numa_node) { + RTE_LOG(DEBUG, EAL, + "NUMA node hint %u (socket %d) not respected, got %u\n", + numa_node, socket_id, page->Node); + goto error; + } + + ms->addr = addr; + ms->hugepage_sz = hi->hugepage_sz; + ms->len = alloc_sz; + ms->nchannel = rte_memory_get_nchannel(); + ms->nrank = rte_memory_get_nrank(); + ms->iova = iova; + ms->socket_id = socket_id; + + return 0; + +error: + /* Only jump here when `addr` and `alloc_sz` are valid. */ + eal_mem_decommit(addr, alloc_sz); + return -1; +} + +static int +free_seg(struct rte_memseg *ms) +{ + if (eal_mem_decommit(ms->addr, ms->len)) + return -1; + + /* Must clear the segment, because alloc_seg() inspects it. 
*/ + memset(ms, 0, sizeof(*ms)); + return 0; +} + +struct alloc_walk_param { + struct hugepage_info *hi; + struct rte_memseg **ms; + size_t page_sz; + unsigned int segs_allocated; + unsigned int n_segs; + int socket; + bool exact; +}; + +static int +alloc_seg_walk(const struct rte_memseg_list *msl, void *arg) +{ + struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; + struct alloc_walk_param *wa = arg; + struct rte_memseg_list *cur_msl; + size_t page_sz; + int cur_idx, start_idx, j; + unsigned int msl_idx, need, i; + + if (msl->page_sz != wa->page_sz) + return 0; + if (msl->socket_id != wa->socket) + return 0; + + page_sz = (size_t)msl->page_sz; + + msl_idx = msl - mcfg->memsegs; + cur_msl = &mcfg->memsegs[msl_idx]; + + need = wa->n_segs; + + /* try finding space in memseg list */ + if (wa->exact) { + /* if we require exact number of pages in a list, find them */ + cur_idx = rte_fbarray_find_next_n_free( + &cur_msl->memseg_arr, 0, need); + if (cur_idx < 0) + return 0; + start_idx = cur_idx; + } else { + int cur_len; + + /* we don't require exact number of pages, so we're going to go + * for best-effort allocation. that means finding the biggest + * unused block, and going with that. + */ + cur_idx = rte_fbarray_find_biggest_free( + &cur_msl->memseg_arr, 0); + if (cur_idx < 0) + return 0; + start_idx = cur_idx; + /* adjust the size to possibly be smaller than original + * request, but do not allow it to be bigger. + */ + cur_len = rte_fbarray_find_contig_free( + &cur_msl->memseg_arr, cur_idx); + need = RTE_MIN(need, (unsigned int)cur_len); + } + + for (i = 0; i < need; i++, cur_idx++) { + struct rte_memseg *cur; + void *map_addr; + + cur = rte_fbarray_get(&cur_msl->memseg_arr, cur_idx); + map_addr = RTE_PTR_ADD(cur_msl->base_va, cur_idx * page_sz); + + if (alloc_seg(cur, map_addr, wa->socket, wa->hi)) { + RTE_LOG(DEBUG, EAL, "attempted to allocate %i segments, " + "but only %i were allocated\n", need, i); + + /* if exact number wasn't requested, stop */ + if (!wa->exact) + goto out; + + /* clean up */ + for (j = start_idx; j < cur_idx; j++) { + struct rte_memseg *tmp; + struct rte_fbarray *arr = &cur_msl->memseg_arr; + + tmp = rte_fbarray_get(arr, j); + rte_fbarray_set_free(arr, j); + + if (free_seg(tmp)) + RTE_LOG(DEBUG, EAL, "Cannot free page\n"); + } + /* clear the list */ + if (wa->ms) + memset(wa->ms, 0, sizeof(*wa->ms) * wa->n_segs); + + return -1; + } + if (wa->ms) + wa->ms[i] = cur; + + rte_fbarray_set_used(&cur_msl->memseg_arr, cur_idx); + } + +out: + wa->segs_allocated = i; + if (i > 0) + cur_msl->version++; + + /* if we didn't allocate any segments, move on to the next list */ + return i > 0; +} + +struct free_walk_param { + struct hugepage_info *hi; + struct rte_memseg *ms; +}; +static int +free_seg_walk(const struct rte_memseg_list *msl, void *arg) +{ + struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; + struct rte_memseg_list *found_msl; + struct free_walk_param *wa = arg; + uintptr_t start_addr, end_addr; + int msl_idx, seg_idx, ret; + + start_addr = (uintptr_t) msl->base_va; + end_addr = start_addr + msl->len; + + if ((uintptr_t)wa->ms->addr < start_addr || + (uintptr_t)wa->ms->addr >= end_addr) + return 0; + + msl_idx = msl - mcfg->memsegs; + seg_idx = RTE_PTR_DIFF(wa->ms->addr, start_addr) / msl->page_sz; + + /* msl is const */ + found_msl = &mcfg->memsegs[msl_idx]; + found_msl->version++; + + rte_fbarray_set_free(&found_msl->memseg_arr, seg_idx); + + ret = free_seg(wa->ms); + + return (ret < 0) ? 
(-1) : 1; +} + +int +eal_memalloc_alloc_seg_bulk(struct rte_memseg **ms, int n_segs, + size_t page_sz, int socket, bool exact) +{ + unsigned int i; + int ret = -1; + struct alloc_walk_param wa; + struct hugepage_info *hi = NULL; + + if (internal_config.legacy_mem) { + RTE_LOG(ERR, EAL, "dynamic allocation not supported in legacy mode\n"); + return -ENOTSUP; + } + + for (i = 0; i < internal_config.num_hugepage_sizes; i++) { + struct hugepage_info *hpi = &internal_config.hugepage_info[i]; + if (page_sz == hpi->hugepage_sz) { + hi = hpi; + break; + } + } + if (!hi) { + RTE_LOG(ERR, EAL, "cannot find relevant hugepage_info entry\n"); + return -1; + } + + memset(&wa, 0, sizeof(wa)); + wa.exact = exact; + wa.hi = hi; + wa.ms = ms; + wa.n_segs = n_segs; + wa.page_sz = page_sz; + wa.socket = socket; + wa.segs_allocated = 0; + + /* memalloc is locked, so it's safe to use thread-unsafe version */ + ret = rte_memseg_list_walk_thread_unsafe(alloc_seg_walk, &wa); + if (ret == 0) { + RTE_LOG(ERR, EAL, "cannot find suitable memseg_list\n"); + ret = -1; + } else if (ret > 0) { + ret = (int)wa.segs_allocated; + } + + return ret; +} + +struct rte_memseg * +eal_memalloc_alloc_seg(size_t page_sz, int socket) +{ + struct rte_memseg *ms = NULL; + eal_memalloc_alloc_seg_bulk(&ms, 1, page_sz, socket, true); + return ms; +} + +int +eal_memalloc_free_seg_bulk(struct rte_memseg **ms, int n_segs) +{ + int seg, ret = 0; + + /* dynamic free not supported in legacy mode */ + if (internal_config.legacy_mem) + return -1; + + for (seg = 0; seg < n_segs; seg++) { + struct rte_memseg *cur = ms[seg]; + struct hugepage_info *hi = NULL; + struct free_walk_param wa; + size_t i; + int walk_res; + + /* if this page is marked as unfreeable, fail */ + if (cur->flags & RTE_MEMSEG_FLAG_DO_NOT_FREE) { + RTE_LOG(DEBUG, EAL, "Page is not allowed to be freed\n"); + ret = -1; + continue; + } + + memset(&wa, 0, sizeof(wa)); + + for (i = 0; i < RTE_DIM(internal_config.hugepage_info); + i++) { + hi = &internal_config.hugepage_info[i]; + if (cur->hugepage_sz == hi->hugepage_sz) + break; + } + if (i == RTE_DIM(internal_config.hugepage_info)) { + RTE_LOG(ERR, EAL, "Can't find relevant hugepage_info entry\n"); + ret = -1; + continue; + } + + wa.ms = cur; + wa.hi = hi; + + /* memalloc is locked, so it's safe to use thread-unsafe version + */ + walk_res = rte_memseg_list_walk_thread_unsafe(free_seg_walk, + &wa); + if (walk_res == 1) + continue; + if (walk_res == 0) + RTE_LOG(ERR, EAL, "Couldn't find memseg list\n"); + ret = -1; + } + return ret; +} + +int +eal_memalloc_free_seg(struct rte_memseg *ms) +{ + return eal_memalloc_free_seg_bulk(&ms, 1); +} + +int +eal_memalloc_sync_with_primary(void) +{ + /* No multi-process support. */ + EAL_LOG_NOT_IMPLEMENTED(); + return -1; +} + +int +eal_memalloc_init(void) +{ + /* No action required. */ + return 0; +} diff --git a/lib/librte_eal/windows/eal_memory.c b/lib/librte_eal/windows/eal_memory.c index 5697187ce..82f3dafe6 100644 --- a/lib/librte_eal/windows/eal_memory.c +++ b/lib/librte_eal/windows/eal_memory.c @@ -1,11 +1,23 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright (c) 2010-2014 Intel Corporation (functions from Linux EAL) + * Copyright (c) 2020 Dmitry Kozlyuk (Windows specifics) + */ + +#include #include #include #include +#include "eal_internal_cfg.h" +#include "eal_memalloc.h" +#include "eal_memcfg.h" +#include "eal_options.h" #include "eal_private.h" #include "eal_windows.h" +#include + /* MinGW-w64 headers lack VirtualAlloc2() in some distributions. 
* Provide a copy of definitions and code to load it dynamically. * Note: definitions are copied verbatim from Microsoft documentation @@ -120,6 +132,119 @@ eal_mem_win32api_init(void) #endif /* no VirtualAlloc2() */ +static HANDLE virt2phys_device = INVALID_HANDLE_VALUE; + +int +eal_mem_virt2iova_init(void) +{ + HDEVINFO list = INVALID_HANDLE_VALUE; + SP_DEVICE_INTERFACE_DATA ifdata; + SP_DEVICE_INTERFACE_DETAIL_DATA *detail = NULL; + DWORD detail_size; + int ret = -1; + + list = SetupDiGetClassDevs( + &GUID_DEVINTERFACE_VIRT2PHYS, NULL, NULL, + DIGCF_DEVICEINTERFACE | DIGCF_PRESENT); + if (list == INVALID_HANDLE_VALUE) { + RTE_LOG_WIN32_ERR("SetupDiGetClassDevs()"); + goto exit; + } + + ifdata.cbSize = sizeof(ifdata); + if (!SetupDiEnumDeviceInterfaces( + list, NULL, &GUID_DEVINTERFACE_VIRT2PHYS, 0, &ifdata)) { + RTE_LOG_WIN32_ERR("SetupDiEnumDeviceInterfaces()"); + goto exit; + } + + if (!SetupDiGetDeviceInterfaceDetail( + list, &ifdata, NULL, 0, &detail_size, NULL)) { + if (GetLastError() != ERROR_INSUFFICIENT_BUFFER) { + RTE_LOG_WIN32_ERR( + "SetupDiGetDeviceInterfaceDetail(probe)"); + goto exit; + } + } + + detail = malloc(detail_size); + if (detail == NULL) { + RTE_LOG(ERR, EAL, "Cannot allocate virt2phys " + "device interface detail data\n"); + goto exit; + } + + detail->cbSize = sizeof(*detail); + if (!SetupDiGetDeviceInterfaceDetail( + list, &ifdata, detail, detail_size, NULL, NULL)) { + RTE_LOG_WIN32_ERR("SetupDiGetDeviceInterfaceDetail(read)"); + goto exit; + } + + RTE_LOG(DEBUG, EAL, "Found virt2phys device: %s\n", detail->DevicePath); + + virt2phys_device = CreateFile( + detail->DevicePath, 0, 0, NULL, OPEN_EXISTING, 0, NULL); + if (virt2phys_device == INVALID_HANDLE_VALUE) { + RTE_LOG_WIN32_ERR("CreateFile()"); + goto exit; + } + + /* Indicate success. */ + ret = 0; + +exit: + if (detail != NULL) + free(detail); + if (list != INVALID_HANDLE_VALUE) + SetupDiDestroyDeviceInfoList(list); + + return ret; +} + +phys_addr_t +rte_mem_virt2phy(const void *virt) +{ + LARGE_INTEGER phys; + DWORD bytes_returned; + + if (virt2phys_device == INVALID_HANDLE_VALUE) + return RTE_BAD_PHYS_ADDR; + + if (!DeviceIoControl( + virt2phys_device, IOCTL_VIRT2PHYS_TRANSLATE, + &virt, sizeof(virt), &phys, sizeof(phys), + &bytes_returned, NULL)) { + RTE_LOG_WIN32_ERR("DeviceIoControl(IOCTL_VIRT2PHYS_TRANSLATE)"); + return RTE_BAD_PHYS_ADDR; + } + + return phys.QuadPart; +} + +/* Windows currently only supports IOVA as PA. */ +rte_iova_t +rte_mem_virt2iova(const void *virt) +{ + phys_addr_t phys; + + if (virt2phys_device == INVALID_HANDLE_VALUE) + return RTE_BAD_IOVA; + + phys = rte_mem_virt2phy(virt); + if (phys == RTE_BAD_PHYS_ADDR) + return RTE_BAD_IOVA; + + return (rte_iova_t)phys; +} + +/* Always using physical addresses under Windows if they can be obtained. */ +int +rte_eal_using_phys_addrs(void) +{ + return virt2phys_device != INVALID_HANDLE_VALUE; +} + /* Approximate error mapping from VirtualAlloc2() to POSIX mmap(3). */ static void set_errno_from_win32_alloc_error(DWORD code) @@ -253,6 +378,10 @@ eal_mem_commit(void *requested_addr, size_t size, int socket_id) int eal_mem_decommit(void *addr, size_t size) { + /* Decommit memory, which might be a part of a larger reserved region. + * Allocator commits hugepage-sized placeholders, so there's no need + * to coalesce placeholders back into region, they can be reused as is. 
@@ -253,6 +378,10 @@ eal_mem_commit(void *requested_addr, size_t size, int socket_id)
 
 int
 eal_mem_decommit(void *addr, size_t size)
 {
+	/* Decommit memory, which might be a part of a larger reserved region.
+	 * Allocator commits hugepage-sized placeholders, so there's no need
+	 * to coalesce placeholders back into region, they can be reused as is.
+	 */
 	if (!VirtualFree(addr, size, MEM_RELEASE | MEM_PRESERVE_PLACEHOLDER)) {
 		RTE_LOG_WIN32_ERR("VirtualFree(%p, %zu, ...)", addr, size);
 		return -1;
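
The placeholder lifecycle referenced above, reduced to bare Win32 calls
for illustration (function and parameters are hypothetical, sizes are
assumed to be allocation-granularity multiples, error handling elided):

    /* Reserve a region as a placeholder, then split a page-sized
     * placeholder out of it; eal_mem_decommit() later returns
     * committed pages to exactly this reusable placeholder state.
     */
    static void *
    example_reserve_and_split(size_t size, size_t page_sz)
    {
        void *region = VirtualAlloc2(
            GetCurrentProcess(), NULL, size,
            MEM_RESERVE | MEM_RESERVE_PLACEHOLDER, PAGE_NOACCESS,
            NULL, 0);

        /* carve the first page_sz bytes into their own placeholder */
        VirtualFree(region, page_sz,
            MEM_RELEASE | MEM_PRESERVE_PLACEHOLDER);
        return region;
    }
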
@@ -364,7 +493,7 @@ rte_mem_map(void *requested_addr, size_t size, enum rte_mem_prot prot,
 		return NULL;
 	}
 
-	/* TODO: there is a race for the requested_addr between mem_free()
+	/* There is a race for the requested_addr between mem_free()
 	 * and MapViewOfFileEx(). MapViewOfFile3() can replace a reserved
 	 * region with a mapping in a single operation, but it does not support
 	 * private mappings.
@@ -414,6 +543,16 @@ rte_mem_unmap(void *virt, size_t size)
 	return 0;
 }
 
+uint64_t
+eal_get_baseaddr(void)
+{
+	/* Windows strategy for memory allocation is undocumented.
+	 * Returning 0 here effectively disables address guessing
+	 * unless the user provides an address hint.
+	 */
+	return 0;
+}
+
 int
 rte_get_page_size(void)
 {
@@ -435,3 +574,568 @@ rte_mem_lock(const void *virt, size_t size)
 
 	return 0;
 }
+
+static int
+memseg_list_alloc(struct rte_memseg_list *msl, uint64_t page_sz,
+	int n_segs, int socket_id, int type_msl_idx)
+{
+	return eal_alloc_memseg_list(
+		msl, page_sz, n_segs, socket_id, type_msl_idx, true);
+}
+
+static int
+memseg_list_reserve(struct rte_memseg_list *msl)
+{
+	return eal_reserve_memseg_list(msl, 0);
+}
+
+/*
+ * Remaining code in this file largely duplicates Linux EAL.
+ * Although Windows EAL supports only one hugepage size currently,
+ * code structure and comments are preserved so that changes may be
+ * easily ported until duplication is removed.
+ */
+
+static int
+memseg_primary_init(void)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct memtype {
+		uint64_t page_sz;
+		int socket_id;
+	} *memtypes = NULL;
+	int i, hpi_idx, msl_idx, ret = -1; /* fail unless told to succeed */
+	struct rte_memseg_list *msl;
+	uint64_t max_mem, max_mem_per_type;
+	unsigned int max_seglists_per_type;
+	unsigned int n_memtypes, cur_type;
+
+	/* no-huge does not need this at all */
+	if (internal_config.no_hugetlbfs)
+		return 0;
+
+	/*
+	 * figuring out amount of memory we're going to have is a long and very
+	 * involved process. the basic element we're operating with is a memory
+	 * type, defined as a combination of NUMA node ID and page size (so that
+	 * e.g. 2 sockets with 2 page sizes yield 4 memory types in total).
+	 *
+	 * deciding amount of memory going towards each memory type is a
+	 * balancing act between maximum segments per type, maximum memory per
+	 * type, and number of detected NUMA nodes. the goal is to make sure
+	 * each memory type gets at least one memseg list.
+	 *
+	 * the total amount of memory is limited by RTE_MAX_MEM_MB value.
+	 *
+	 * the total amount of memory per type is limited by either
+	 * RTE_MAX_MEM_MB_PER_TYPE, or by RTE_MAX_MEM_MB divided by the number
+	 * of detected NUMA nodes. additionally, maximum number of segments per
+	 * type is also limited by RTE_MAX_MEMSEG_PER_TYPE. this is because for
+	 * smaller page sizes, it can take hundreds of thousands of segments to
+	 * reach the above specified per-type memory limits.
+	 *
+	 * additionally, each type may have multiple memseg lists associated
+	 * with it, each limited by either RTE_MAX_MEM_MB_PER_LIST for bigger
+	 * page sizes, or RTE_MAX_MEMSEG_PER_LIST segments for smaller ones.
+	 *
+	 * the number of memseg lists per type is decided based on the above
+	 * limits, and also taking number of detected NUMA nodes, to make sure
+	 * that we don't run out of memseg lists before we populate all NUMA
+	 * nodes with memory.
+	 *
+	 * we do this in three stages. first, we collect the number of types.
+	 * then, we figure out memory constraints and populate the list of
+	 * would-be memseg lists. then, we go ahead and allocate the memseg
+	 * lists.
+	 */
+
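+	/*
+	 * illustration (assumed default limits): 2 sockets with 2 MB and
+	 * 1 GB pages yield 4 memory types; with RTE_MAX_MEM_MB = 524288
+	 * (512 GB), each type is capped at min(RTE_MAX_MEM_MB_PER_TYPE,
+	 * 524288 / 4) MB and at RTE_MAX_MEMSEG_LISTS / 4 memseg lists.
+	 */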
+	/* create space for mem types */
+	n_memtypes = internal_config.num_hugepage_sizes * rte_socket_count();
+	memtypes = calloc(n_memtypes, sizeof(*memtypes));
+	if (memtypes == NULL) {
+		RTE_LOG(ERR, EAL, "Cannot allocate space for memory types\n");
+		return -1;
+	}
+
+	/* populate mem types */
+	cur_type = 0;
+	for (hpi_idx = 0; hpi_idx < (int) internal_config.num_hugepage_sizes;
+			hpi_idx++) {
+		struct hugepage_info *hpi;
+		uint64_t hugepage_sz;
+
+		hpi = &internal_config.hugepage_info[hpi_idx];
+		hugepage_sz = hpi->hugepage_sz;
+
+		for (i = 0; i < (int) rte_socket_count(); i++, cur_type++) {
+			int socket_id = rte_socket_id_by_idx(i);
+
+			memtypes[cur_type].page_sz = hugepage_sz;
+			memtypes[cur_type].socket_id = socket_id;
+
+			RTE_LOG(DEBUG, EAL, "Detected memory type: "
+				"socket_id:%u hugepage_sz:%" PRIu64 "\n",
+				socket_id, hugepage_sz);
+		}
+	}
+	/* number of memtypes could have been lower due to no NUMA support */
+	n_memtypes = cur_type;
+
+	/* set up limits for types */
+	max_mem = (uint64_t)RTE_MAX_MEM_MB << 20;
+	max_mem_per_type = RTE_MIN((uint64_t)RTE_MAX_MEM_MB_PER_TYPE << 20,
+			max_mem / n_memtypes);
+
+	/*
+	 * limit maximum number of segment lists per type to ensure there's
+	 * space for memseg lists for all NUMA nodes with all page sizes
+	 */
+	max_seglists_per_type = RTE_MAX_MEMSEG_LISTS / n_memtypes;
+
+	if (max_seglists_per_type == 0) {
+		RTE_LOG(ERR, EAL, "Cannot accommodate all memory types, please increase %s\n",
+			RTE_STR(CONFIG_RTE_MAX_MEMSEG_LISTS));
+		goto out;
+	}
+
+	/* go through all mem types and create segment lists */
+	msl_idx = 0;
+	for (cur_type = 0; cur_type < n_memtypes; cur_type++) {
+		unsigned int cur_seglist, n_seglists, n_segs;
+		unsigned int max_segs_per_type, max_segs_per_list;
+		struct memtype *type = &memtypes[cur_type];
+		uint64_t max_mem_per_list, pagesz;
+		int socket_id;
+
+		pagesz = type->page_sz;
+		socket_id = type->socket_id;
+
+		/*
+		 * we need to create segment lists for this type. we must take
+		 * into account the following things:
+		 *
+		 * 1. total amount of memory we can use for this memory type
+		 * 2. total amount of memory per memseg list allowed
+		 * 3. number of segments needed to fit the amount of memory
+		 * 4. number of segments allowed per type
+		 * 5. number of segments allowed per memseg list
+		 * 6. number of memseg lists we are allowed to take up
+		 */
+
+		/* calculate how many segments we will need in total */
+		max_segs_per_type = max_mem_per_type / pagesz;
+		/* limit number of segments to maximum allowed per type */
+		max_segs_per_type = RTE_MIN(max_segs_per_type,
+				(unsigned int)RTE_MAX_MEMSEG_PER_TYPE);
+		/* limit number of segments to maximum allowed per list */
+		max_segs_per_list = RTE_MIN(max_segs_per_type,
+				(unsigned int)RTE_MAX_MEMSEG_PER_LIST);
+
+		/* calculate how much memory we can have per segment list */
+		max_mem_per_list = RTE_MIN(max_segs_per_list * pagesz,
+				(uint64_t)RTE_MAX_MEM_MB_PER_LIST << 20);
+
+		/* calculate how many segments each segment list will have */
+		n_segs = RTE_MIN(max_segs_per_list, max_mem_per_list / pagesz);
+
+		/* calculate how many segment lists we can have */
+		n_seglists = RTE_MIN(max_segs_per_type / n_segs,
+				max_mem_per_type / max_mem_per_list);
+
+		/* limit number of segment lists according to our maximum */
+		n_seglists = RTE_MIN(n_seglists, max_seglists_per_type);
+
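+		/*
+		 * worked example, assuming 2 MB pages and illustrative
+		 * limits of 64 GB per type, 32768 segments per type,
+		 * 8192 segments per list and 32 GB per list:
+		 * max_segs_per_type = 32768, max_segs_per_list = 8192,
+		 * max_mem_per_list = 16 GB, n_segs = 8192,
+		 * n_seglists = min(32768 / 8192, 64 / 16) = 4.
+		 */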
+		RTE_LOG(DEBUG, EAL, "Creating %i segment lists: "
+				"n_segs:%i socket_id:%i hugepage_sz:%" PRIu64 "\n",
+			n_seglists, n_segs, socket_id, pagesz);
+
+		/* create all segment lists */
+		for (cur_seglist = 0; cur_seglist < n_seglists; cur_seglist++) {
+			if (msl_idx >= RTE_MAX_MEMSEG_LISTS) {
+				RTE_LOG(ERR, EAL,
+					"No more space in memseg lists, please increase %s\n",
+					RTE_STR(CONFIG_RTE_MAX_MEMSEG_LISTS));
+				goto out;
+			}
+			msl = &mcfg->memsegs[msl_idx++];
+
+			if (memseg_list_alloc(msl, pagesz, n_segs,
+					socket_id, cur_seglist))
+				goto out;
+
+			if (memseg_list_reserve(msl)) {
+				RTE_LOG(ERR, EAL, "Cannot allocate VA space for memseg list\n");
+				goto out;
+			}
+		}
+	}
+	/* we're successful */
+	ret = 0;
+out:
+	free(memtypes);
+	return ret;
+}
+
+static int
+memseg_secondary_init(void)
+{
+	EAL_LOG_NOT_IMPLEMENTED();
+	return -1;
+}
+
+int
+rte_eal_memseg_init(void)
+{
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY)
+		return memseg_primary_init();
+	return memseg_secondary_init();
+}
+
+static inline uint64_t
+get_socket_mem_size(int socket)
+{
+	uint64_t size = 0;
+	unsigned int i;
+
+	for (i = 0; i < internal_config.num_hugepage_sizes; i++) {
+		struct hugepage_info *hpi = &internal_config.hugepage_info[i];
+		size += hpi->hugepage_sz * hpi->num_pages[socket];
+	}
+
+	return size;
+}
+
+static int
+calc_num_pages_per_socket(uint64_t *memory,
+		struct hugepage_info *hp_info,
+		struct hugepage_info *hp_used,
+		unsigned int num_hp_info)
+{
+	unsigned int socket, j, i = 0;
+	unsigned int requested, available;
+	int total_num_pages = 0;
+	uint64_t remaining_mem, cur_mem;
+	uint64_t total_mem = internal_config.memory;
+
+	if (num_hp_info == 0)
+		return -1;
+
+	/* if specific memory amounts per socket weren't requested */
+	if (internal_config.force_sockets == 0) {
+		size_t total_size;
+		int cpu_per_socket[RTE_MAX_NUMA_NODES];
+		size_t default_size;
+		unsigned int lcore_id;
+
+		/* Compute number of cores per socket */
+		memset(cpu_per_socket, 0, sizeof(cpu_per_socket));
+		RTE_LCORE_FOREACH(lcore_id) {
+			cpu_per_socket[rte_lcore_to_socket_id(lcore_id)]++;
+		}
+
+		/*
+		 * Automatically spread requested memory amongst detected
+		 * sockets according to number of cores from cpu mask present
+		 * on each socket.
+		 */
+		total_size = internal_config.memory;
+		for (socket = 0; socket < RTE_MAX_NUMA_NODES && total_size != 0;
+				socket++) {
+
+			/* Set memory amount per socket */
+			default_size = internal_config.memory *
+				cpu_per_socket[socket] / rte_lcore_count();
+
+			/* Limit to maximum available memory on socket */
+			default_size = RTE_MIN(
+				default_size, get_socket_mem_size(socket));
+
+			/* Update sizes */
+			memory[socket] = default_size;
+			total_size -= default_size;
+		}
+
+		/*
+		 * If some memory is remaining, try to allocate it by getting
+		 * all available memory from sockets, one after the other.
+		 */
+		for (socket = 0; socket < RTE_MAX_NUMA_NODES && total_size != 0;
+				socket++) {
+			/* take whatever is available */
+			default_size = RTE_MIN(
+				get_socket_mem_size(socket) - memory[socket],
+				total_size);
+
+			/* Update sizes */
+			memory[socket] += default_size;
+			total_size -= default_size;
+		}
+	}
+
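+	/*
+	 * illustration: "-m 2048" with three lcores on socket 0 and one
+	 * on socket 1 first assigns 2048 * 3 / 4 = 1536 MB to socket 0
+	 * and 512 MB to socket 1; the second loop above then tops
+	 * sockets up in order if part of the request could not be
+	 * placed locally.
+	 */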
" + "Requested: %uMB, available: %uMB\n", + requested, available); + return -1; + } + return total_num_pages; +} + +/* Limit is checked by validator itself, nothing left to analyze.*/ +static int +limits_callback(int socket_id, size_t cur_limit, size_t new_len) +{ + RTE_SET_USED(socket_id); + RTE_SET_USED(cur_limit); + RTE_SET_USED(new_len); + return -1; +} + +static int +eal_hugepage_init(void) +{ + struct hugepage_info used_hp[MAX_HUGEPAGE_SIZES]; + uint64_t memory[RTE_MAX_NUMA_NODES]; + int hp_sz_idx, socket_id; + + memset(used_hp, 0, sizeof(used_hp)); + + for (hp_sz_idx = 0; + hp_sz_idx < (int) internal_config.num_hugepage_sizes; + hp_sz_idx++) { + /* also initialize used_hp hugepage sizes in used_hp */ + struct hugepage_info *hpi; + hpi = &internal_config.hugepage_info[hp_sz_idx]; + used_hp[hp_sz_idx].hugepage_sz = hpi->hugepage_sz; + } + + /* make a copy of socket_mem, needed for balanced allocation. */ + for (socket_id = 0; socket_id < RTE_MAX_NUMA_NODES; socket_id++) + memory[socket_id] = internal_config.socket_mem[socket_id]; + + /* calculate final number of pages */ + if (calc_num_pages_per_socket(memory, + internal_config.hugepage_info, used_hp, + internal_config.num_hugepage_sizes) < 0) + return -1; + + for (hp_sz_idx = 0; + hp_sz_idx < (int)internal_config.num_hugepage_sizes; + hp_sz_idx++) { + for (socket_id = 0; socket_id < RTE_MAX_NUMA_NODES; + socket_id++) { + struct rte_memseg **pages; + struct hugepage_info *hpi = &used_hp[hp_sz_idx]; + unsigned int num_pages = hpi->num_pages[socket_id]; + unsigned int num_pages_alloc; + + if (num_pages == 0) + continue; + + RTE_LOG(DEBUG, EAL, + "Allocating %u pages of size %" PRIu64 "M on socket %i\n", + num_pages, hpi->hugepage_sz >> 20, socket_id); + + /* we may not be able to allocate all pages in one go, + * because we break up our memory map into multiple + * memseg lists. therefore, try allocating multiple + * times and see if we can get the desired number of + * pages from multiple allocations. 
+static int
+eal_hugepage_init(void)
+{
+	struct hugepage_info used_hp[MAX_HUGEPAGE_SIZES];
+	uint64_t memory[RTE_MAX_NUMA_NODES];
+	int hp_sz_idx, socket_id;
+
+	memset(used_hp, 0, sizeof(used_hp));
+
+	for (hp_sz_idx = 0;
+			hp_sz_idx < (int) internal_config.num_hugepage_sizes;
+			hp_sz_idx++) {
+		/* also initialize hugepage sizes in used_hp */
+		struct hugepage_info *hpi;
+		hpi = &internal_config.hugepage_info[hp_sz_idx];
+		used_hp[hp_sz_idx].hugepage_sz = hpi->hugepage_sz;
+	}
+
+	/* make a copy of socket_mem, needed for balanced allocation. */
+	for (socket_id = 0; socket_id < RTE_MAX_NUMA_NODES; socket_id++)
+		memory[socket_id] = internal_config.socket_mem[socket_id];
+
+	/* calculate final number of pages */
+	if (calc_num_pages_per_socket(memory,
+			internal_config.hugepage_info, used_hp,
+			internal_config.num_hugepage_sizes) < 0)
+		return -1;
+
+	for (hp_sz_idx = 0;
+			hp_sz_idx < (int)internal_config.num_hugepage_sizes;
+			hp_sz_idx++) {
+		for (socket_id = 0; socket_id < RTE_MAX_NUMA_NODES;
+				socket_id++) {
+			struct rte_memseg **pages;
+			struct hugepage_info *hpi = &used_hp[hp_sz_idx];
+			unsigned int num_pages = hpi->num_pages[socket_id];
+			unsigned int num_pages_alloc;
+
+			if (num_pages == 0)
+				continue;
+
+			RTE_LOG(DEBUG, EAL,
+				"Allocating %u pages of size %" PRIu64 "M on socket %i\n",
+				num_pages, hpi->hugepage_sz >> 20, socket_id);
+
+			/* we may not be able to allocate all pages in one go,
+			 * because we break up our memory map into multiple
+			 * memseg lists. therefore, try allocating multiple
+			 * times and see if we can get the desired number of
+			 * pages from multiple allocations.
+			 */
+
+			num_pages_alloc = 0;
+			do {
+				int i, cur_pages, needed;
+
+				needed = num_pages - num_pages_alloc;
+
+				pages = malloc(sizeof(*pages) * needed);
+
+				/* do not request exact number of pages */
+				cur_pages = eal_memalloc_alloc_seg_bulk(pages,
+					needed, hpi->hugepage_sz,
+					socket_id, false);
+				if (cur_pages <= 0) {
+					free(pages);
+					return -1;
+				}
+
+				/* mark preallocated pages as unfreeable */
+				for (i = 0; i < cur_pages; i++) {
+					struct rte_memseg *ms = pages[i];
+					ms->flags |=
+						RTE_MEMSEG_FLAG_DO_NOT_FREE;
+				}
+				free(pages);
+
+				num_pages_alloc += cur_pages;
+			} while (num_pages_alloc != num_pages);
+		}
+	}
+	/* if socket limits were specified, set them */
+	if (internal_config.force_socket_limits) {
+		unsigned int i;
+		for (i = 0; i < RTE_MAX_NUMA_NODES; i++) {
+			uint64_t limit = internal_config.socket_limit[i];
+			if (limit == 0)
+				continue;
+			if (rte_mem_alloc_validator_register("socket-limit",
+					limits_callback, i, limit))
+				RTE_LOG(ERR, EAL, "Failed to register socket "
+					"limits validator callback\n");
+		}
+	}
+	return 0;
+}
+
+static int
+eal_nohuge_init(void)
+{
+	struct rte_mem_config *mcfg;
+	struct rte_memseg_list *msl;
+	int n_segs, cur_seg;
+	uint64_t page_sz;
+	void *addr;
+	struct rte_fbarray *arr;
+	struct rte_memseg *ms;
+
+	mcfg = rte_eal_get_configuration()->mem_config;
+
+	/* nohuge mode is legacy mode */
+	internal_config.legacy_mem = 1;
+
+	/* create a memseg list */
+	msl = &mcfg->memsegs[0];
+
+	page_sz = RTE_PGSIZE_4K;
+	n_segs = internal_config.memory / page_sz;
+
+	if (rte_fbarray_init(&msl->memseg_arr, "nohugemem", n_segs,
+			sizeof(struct rte_memseg))) {
+		RTE_LOG(ERR, EAL, "Cannot allocate memseg list\n");
+		return -1;
+	}
+
+	addr = eal_mem_alloc(internal_config.memory, 0);
+	if (addr == NULL) {
+		RTE_LOG(ERR, EAL, "Cannot allocate %zu bytes\n",
+			internal_config.memory);
+		return -1;
+	}
+
+	msl->base_va = addr;
+	msl->page_sz = page_sz;
+	msl->socket_id = 0;
+	msl->len = internal_config.memory;
+	msl->heap = 1;
+
+	/* populate memsegs. each memseg is one page long */
+	for (cur_seg = 0; cur_seg < n_segs; cur_seg++) {
+		arr = &msl->memseg_arr;
+
+		ms = rte_fbarray_get(arr, cur_seg);
+		ms->iova = RTE_BAD_IOVA;
+		ms->addr = addr;
+		ms->hugepage_sz = page_sz;
+		ms->socket_id = 0;
+		ms->len = page_sz;
+
+		rte_fbarray_set_used(arr, cur_seg);
+
+		addr = RTE_PTR_ADD(addr, (size_t)page_sz);
+	}
+
+	if (mcfg->dma_maskbits &&
+			rte_mem_check_dma_mask_thread_unsafe(mcfg->dma_maskbits)) {
+		RTE_LOG(ERR, EAL,
+			"%s(): couldn't allocate memory due to IOVA "
+			"exceeding limits of current DMA mask.\n", __func__);
+		return -1;
+	}
+
+	return 0;
+}
+
+int
+rte_eal_hugepage_init(void)
+{
+	return internal_config.no_hugetlbfs ?
+		eal_nohuge_init() : eal_hugepage_init();
+}
+
+int
+rte_eal_hugepage_attach(void)
+{
+	EAL_LOG_NOT_IMPLEMENTED();
+	return -1;
+}
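
Since rte_eal_hugepage_init() dispatches on --no-huge, the non-hugepage
path above can be exercised by running any EAL application with that
flag, for example (application name is illustrative):

    testpmd --no-huge -m 512

With these options eal_nohuge_init() backs the single memseg list with
512 MB of regular 4 KiB pages, leaving all IOVAs as RTE_BAD_IOVA.
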
diff --git a/lib/librte_eal/windows/eal_mp.c b/lib/librte_eal/windows/eal_mp.c
new file mode 100644
index 000000000..16a5e8ba0
--- /dev/null
+++ b/lib/librte_eal/windows/eal_mp.c
@@ -0,0 +1,103 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2020 Dmitry Kozlyuk
+ */
+
+/**
+ * @file Multiprocess support stubs
+ *
+ * Stubs must log an error until implemented. If success is required
+ * for non-multiprocess operation, the stub must log a warning and a
+ * comment must document what requires success emulation.
+ */
+
+#include
+#include
+
+#include "eal_private.h"
+#include "eal_windows.h"
+#include "malloc_mp.h"
+
+void
+rte_mp_channel_cleanup(void)
+{
+	EAL_LOG_NOT_IMPLEMENTED();
+}
+
+int
+rte_mp_action_register(const char *name, rte_mp_t action)
+{
+	RTE_SET_USED(name);
+	RTE_SET_USED(action);
+	EAL_LOG_NOT_IMPLEMENTED();
+	return -1;
+}
+
+void
+rte_mp_action_unregister(const char *name)
+{
+	RTE_SET_USED(name);
+	EAL_LOG_NOT_IMPLEMENTED();
+}
+
+int
+rte_mp_sendmsg(struct rte_mp_msg *msg)
+{
+	RTE_SET_USED(msg);
+	EAL_LOG_NOT_IMPLEMENTED();
+	return -1;
+}
+
+int
+rte_mp_request_sync(struct rte_mp_msg *req, struct rte_mp_reply *reply,
+	const struct timespec *ts)
+{
+	RTE_SET_USED(req);
+	RTE_SET_USED(reply);
+	RTE_SET_USED(ts);
+	EAL_LOG_NOT_IMPLEMENTED();
+	return -1;
+}
+
+int
+rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts,
+	rte_mp_async_reply_t clb)
+{
+	RTE_SET_USED(req);
+	RTE_SET_USED(ts);
+	RTE_SET_USED(clb);
+	EAL_LOG_NOT_IMPLEMENTED();
+	return -1;
+}
+
+int
+rte_mp_reply(struct rte_mp_msg *msg, const char *peer)
+{
+	RTE_SET_USED(msg);
+	RTE_SET_USED(peer);
+	EAL_LOG_NOT_IMPLEMENTED();
+	return -1;
+}
+
+int
+register_mp_requests(void)
+{
+	/* Non-stub function succeeds if multi-process is not supported. */
+	EAL_LOG_STUB();
+	return 0;
+}
+
+int
+request_to_primary(struct malloc_mp_req *req)
+{
+	RTE_SET_USED(req);
+	EAL_LOG_NOT_IMPLEMENTED();
+	return -1;
+}
+
+int
+request_sync(void)
+{
+	/* Common memory allocator depends on this function success. */
+	EAL_LOG_STUB();
+	return 0;
+}
diff --git a/lib/librte_eal/windows/eal_windows.h b/lib/librte_eal/windows/eal_windows.h
index b202a1aa5..083ab8b93 100644
--- a/lib/librte_eal/windows/eal_windows.h
+++ b/lib/librte_eal/windows/eal_windows.h
@@ -9,8 +9,24 @@
  * @file Facilities private to Windows EAL
  */
 
+#include
 #include
 
+/**
+ * Log current function as not implemented and set rte_errno.
+ */
+#define EAL_LOG_NOT_IMPLEMENTED() \
+	do { \
+		RTE_LOG(DEBUG, EAL, "%s() is not implemented\n", __func__); \
+		rte_errno = ENOTSUP; \
+	} while (0)
+
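+/* For example, a hypothetical caller can tell "unsupported on Windows"
+ * apart from other failures by the errno value this macro sets:
+ *
+ *	if (rte_mp_sendmsg(&msg) < 0 && rte_errno == ENOTSUP)
+ *		... multi-process is not available on this platform ...
+ */
+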
* diff --git a/lib/librte_eal/windows/include/meson.build b/lib/librte_eal/windows/include/meson.build index 5fb1962ac..b3534b025 100644 --- a/lib/librte_eal/windows/include/meson.build +++ b/lib/librte_eal/windows/include/meson.build @@ -5,5 +5,6 @@ includes += include_directories('.') headers += files( 'rte_os.h', + 'rte_virt2phys.h', 'rte_windows.h', ) diff --git a/lib/librte_eal/windows/include/rte_os.h b/lib/librte_eal/windows/include/rte_os.h index 510e39e03..62805a307 100644 --- a/lib/librte_eal/windows/include/rte_os.h +++ b/lib/librte_eal/windows/include/rte_os.h @@ -36,6 +36,10 @@ extern "C" { #define strncasecmp(s1, s2, count) _strnicmp(s1, s2, count) +#define open _open +#define close _close +#define unlink _unlink + /* cpu_set macros implementation */ #define RTE_CPU_AND(dst, src1, src2) CPU_AND(dst, src1, src2) #define RTE_CPU_OR(dst, src1, src2) CPU_OR(dst, src1, src2) diff --git a/lib/librte_eal/windows/include/rte_virt2phys.h b/lib/librte_eal/windows/include/rte_virt2phys.h new file mode 100644 index 000000000..4bb2b4aaf --- /dev/null +++ b/lib/librte_eal/windows/include/rte_virt2phys.h @@ -0,0 +1,34 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright (c) 2020 Dmitry Kozlyuk + */ + +/** + * @file virt2phys driver interface + */ + +/** + * Driver device interface GUID {539c2135-793a-4926-afec-d3a1b61bbc8a}. + */ +DEFINE_GUID(GUID_DEVINTERFACE_VIRT2PHYS, + 0x539c2135, 0x793a, 0x4926, + 0xaf, 0xec, 0xd3, 0xa1, 0xb6, 0x1b, 0xbc, 0x8a); + +/** + * Driver device type for IO control codes. + */ +#define VIRT2PHYS_DEVTYPE 0x8000 + +/** + * Translate a valid non-paged virtual address to a physical address. + * + * Note: A physical address zero (0) is reported if input address + * is paged out or not mapped. However, if input is a valid mapping + * of I/O port 0x0000, output is also zero. There is no way + * to distinguish between these cases by return value only. + * + * Input: a non-paged virtual address (PVOID). + * + * Output: the corresponding physical address (LARGE_INTEGER). + */ +#define IOCTL_VIRT2PHYS_TRANSLATE CTL_CODE( \ + VIRT2PHYS_DEVTYPE, 0x800, METHOD_BUFFERED, FILE_ANY_ACCESS) diff --git a/lib/librte_eal/windows/include/rte_windows.h b/lib/librte_eal/windows/include/rte_windows.h index ed6e4c148..899ed7d87 100644 --- a/lib/librte_eal/windows/include/rte_windows.h +++ b/lib/librte_eal/windows/include/rte_windows.h @@ -23,6 +23,8 @@ #include #include +#include +#include /* Have GUIDs defined. */ #ifndef INITGUID diff --git a/lib/librte_eal/windows/include/unistd.h b/lib/librte_eal/windows/include/unistd.h index 757b7f3c5..6b33005b2 100644 --- a/lib/librte_eal/windows/include/unistd.h +++ b/lib/librte_eal/windows/include/unistd.h @@ -9,4 +9,7 @@ * as Microsoft libc does not contain unistd.h. This may be removed * in future releases. */ + +#include + #endif /* _UNISTD_H_ */ diff --git a/lib/librte_eal/windows/meson.build b/lib/librte_eal/windows/meson.build index 81d3ee095..0bd56cd8f 100644 --- a/lib/librte_eal/windows/meson.build +++ b/lib/librte_eal/windows/meson.build @@ -8,7 +8,11 @@ sources += files( 'eal_debug.c', 'eal_hugepages.c', 'eal_lcore.c', + 'eal_memalloc.c', 'eal_memory.c', + 'eal_mp.c', 'eal_thread.c', 'getopt.c', ) + +dpdk_conf.set10('RTE_EAL_NUMA_AWARE_HUGEPAGES', true)